Building, Configuring, and Using Isearch-cgi

Original Version: Erik Scott

Revised: Archie Warnock

Isearch-cgi is an add-on for Isearch. Isearch-cgi lets you do all the cool stuff that Isearch does, only you can do it from a page on the World Wide Web. This means your textbase can be accessed by anyone in the world who has a web browser (and these days, that's pretty much everyone).

It's important to understand that Isearch-cgi is almost useless by itself: you must already have Isearch installed. This document describes Isearch-cgi version 1.45, now distributed with Isearch version 1.45 and Isite 2.05.

INSTALLATION:

Installing Isearch-cgi requires that you have already obtained and built Isearch. If you haven't, get it now, build it, and learn to use it. The Isearch package is available by anonymous ftp from ftp.cnidr.org in the directory /pub/software/Isearch or /pub/software/Isite.

From this point on, we're going to make some assumptions:

  1. You have Isearch, and it's installed as /local/project/Isearch-1.45
  2. You know how to use (at least) Isearch and Iindex
  3. You already have the Apache Web server installed
  4. You pretty much just took the defaults when you installed httpd, like placing your HTML files in /home/httpd/html.

Note that if you're using one of Netscape's web servers, these instructions will probably be close, but not quite precisely correct. You were warned.

So, let's get Isearch-cgi:

sti-gw% ftp ftp.awcubed.com
Connected to ftp.awcubed.com.
220 screamer Microsoft FTP Service (Version 4.0).
331 Anonymous access allowed, send identity (e-mail name) as password.
230-Welcome to AWCubed FTP Site
230 Anonymous user logged in.
ftp> cd Software
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> get Isite-2.05.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for Isite-2.05.tar.gz (1328832 bytes).
226 Transfer complete.
ftp> quit
221 Goodbye.

What we just did was this: used ftp to connect to ftp.awcubed.com, logged in as "anonymous", typed our email address for a password (which of course didn't show up here), and changed directories to where Isite (which includes Isearch and Isearch-cgi) is kept. Then we made sure we were using "binary" mode, and we got the distribution.

Now we need to uncompress the distribution:

% gunzip Isite-2.05.tar.gz

This creates a file named "Isite-2.05.tar". Now, we'll move that tar file to a good place to work from:

% mv Isite-2.05.tar /local/project
% cd /local/project

Finally (for now) we'll unpack that tar file:

% tar xf Isite-2.05.tar
% cd Isite/Isearch
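
The gunzip and tar steps above can also be combined into one, which avoids leaving an uncompressed tar file on disk. Here is a minimal sketch, assuming a POSIX shell. The function name "unpack_dist" is made up for this guide and is not part of the distribution:

```shell
# unpack_dist is a hypothetical helper, not part of Isite.
# Arguments: the compressed archive, then the directory to unpack into.
unpack_dist() {
    archive=$1
    workdir=$2
    # Decompress to stdout and let tar read the stream from stdin.
    # The cd happens only in the subshell, so the archive path is
    # resolved relative to the directory you started in.
    gunzip -c "$archive" | (cd "$workdir" && tar xf -)
}
```

You would call it like this, and the original .tar.gz file is left untouched:

% unpack_dist Isite-2.05.tar.gz /local/project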

We're now ready to start the main part of building Isearch and Isearch-cgi.

The first thing to do is to edit the file "Makefile". Basically, load the Makefile in your favorite editor and follow the directions. At one point, you'll see a line that says "That's all! Type 'make'". Don't edit below that line unless you really know what you're doing.

Now it's time to type "make". Don't be surprised if the compiler prints some warnings: no one is perfect. If all goes well, the Makefile will print:

Welcome to CNIDR Isearch!

Read the README file for configuration and installation instructions

Which is pretty sound advice, even if you're armed with this guide, since small details change from time to time.

At this point, you have built "isrch_fetch", "isrch_srch", "isrch_html" and "search_form". You'll need to build some shell scripts now, and you'll do it with the "Configure" script provided. Configure will need one argument, the directory where your Isearch indexes are stored. The Isearch indexes are the files created when you run Iindex, and the name was set by the "-d" option to Iindex. We're going to assume you have an index named "tester" in the directory "/local/project/Isite/db/":

% Configure /local/project/Isite/db

That created the scripts "isearch", "ihtml" and "ifetch". Copy these three scripts to wherever you put your cgi-bin applications. For RedHat Linux, this will most likely be /home/httpd/cgi-bin.
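
The copy step can be sketched as a small shell function. This is only an illustration: "install_cgi_scripts" is a made-up name, and it assumes Configure wrote the three scripts into the directory you pass as the first argument. The 755 mode is also an assumption about what your web server needs, so adjust it to your site's policy:

```shell
# install_cgi_scripts is a hypothetical helper, not part of Isearch-cgi.
# Arguments: the directory holding the generated scripts, then the
# cgi-bin directory to copy them into.
install_cgi_scripts() {
    srcdir=$1
    cgidir=$2
    for script in isearch ihtml ifetch; do
        if [ ! -f "$srcdir/$script" ]; then
            echo "missing $srcdir/$script (did Configure run?)" >&2
            return 1
        fi
        cp "$srcdir/$script" "$cgidir/" || return 1
        # The web server must be able to execute each script.
        chmod 755 "$cgidir/$script" || return 1
    done
}
```

For example, if you ran Configure in the current directory:

% install_cgi_scripts . /home/httpd/cgi-bin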

Now, following the outline of the README, we'll take a moment to make an index to some interesting files:

% cd /local/project/Isite
% cd bin
% Iindex -d ../db/tester -t sgmltag /etc/motd

That made a searchable index of the login banner. Boring example, yes, but one that anyone can handle.
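
Before going on, it can help to confirm that the index landed in the directory you gave to Configure. The sketch below assumes that Iindex names its files by adding extensions to the "-d" name (so "tester" would produce files matching "tester.*"); check your own db directory if that assumption doesn't hold. The function name "index_exists" is made up for this guide:

```shell
# index_exists is a hypothetical helper, not part of Isearch.
# Arguments: the index directory, then the index name.
# Succeeds if at least one file named <name>.<something> is present.
index_exists() {
    dbdir=$1
    name=$2
    for f in "$dbdir/$name".*; do
        # If nothing matches, the pattern stays literal and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

For example:

% index_exists /local/project/Isite/db tester && echo "index found"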

The one remaining step is to make a web page that contains the buttons and text fields and so forth we need to actually do the searching. The program "search_form" will do that for us. Here's a simple example:

% search_form -simple /local/project/Isite/db tester > form.html

This creates a form named "form.html" that knows to use the "tester" textbase in the "/local/project/Isite/db" directory. Copy form.html to your httpd document directory (under RedHat Linux, it will probably be /home/httpd/html).

Now point your favorite web browser at that page (probably you'll use a URL like http://myhost.mydomain.com/form.html). You should see a form with blanks to fill in for search terms and a button that says "Submit Query". When you click on that button, the query will run and you'll get your search results. By default, the page you get will have brief "headlines" for the matching documents, and you'll be given links to click on to see the full document if you want to.

There is a chance that, after searching, you will see a message in your web browser to the effect that it can't read "/cgi-bin/isearch". If this happens, check the entry for "ScriptAlias" in srm.conf, which is probably /usr/local/etc/httpd/conf/srm.conf. Make sure it contains the fully qualified path name to your cgi-bin directory.
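
A quick way to see what your server currently has configured is to pull the ScriptAlias lines out of the file. This is a sketch only; "show_script_alias" is a made-up name, and the location of srm.conf on your system may differ:

```shell
# show_script_alias is a hypothetical helper for this guide.
# Argument: the path to the server configuration file.
show_script_alias() {
    conf=$1
    # Print every ScriptAlias line that is not commented out.
    # grep exits non-zero when there is no such line.
    grep -i '^[[:space:]]*ScriptAlias[[:space:]]' "$conf"
}
```

For example:

% show_script_alias /usr/local/etc/httpd/conf/srm.conf

With the directory layout assumed in this guide, you would hope to see a line like "ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/", which maps URLs beginning with /cgi-bin/ onto that directory.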

That's pretty much the whole game; the rest is just details.

DETAILS

As promised, here's the rest of the story.

More Advanced Search Forms:

First, the simple search page we created above, form.html, is really simple. It allows you to enter three search terms, and they'll all be "or"-ed together at search time. It's not all that wonderful, which is why it's called a "simple" search form. Fortunately, there are three other kinds of search forms: "html", "boolean" and "advanced".

The HTML search form is specifically designed to handle textbases of HTML documents. To generate that kind of form, use the "-html" option to search_form:

% search_form -html /local/project/Isite/db tester >form_html.html

A Boolean search form lets you specify two search terms and whether they are "and"-ed, "or"-ed, or "andnot"-ed. To generate that kind of form, use the "-boolean" option to search_form:

% search_form -boolean /local/project/Isite/db tester >form2.html

You can also create an advanced search form. This allows you to type free-form, infix boolean queries, like "((cheese and wine) or caviar) andnot sherry". To generate this kind of page, use:

% search_form -advanced /local/project/Isite/db tester >form3.html
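
If you want all four kinds of form at once, the commands above can be wrapped in a loop. This is a sketch, assuming search_form is on your PATH. The function name "make_forms" and the output file naming scheme (index name, a dash, then the form type) are made up for this guide:

```shell
# make_forms is a hypothetical helper, not part of Isearch-cgi.
# Arguments: the index directory, the index name, and the directory
# to write the generated forms into.
make_forms() {
    dbdir=$1
    name=$2
    outdir=$3
    for kind in simple html boolean advanced; do
        # One output file per form type, e.g. tester-boolean.html,
        # so no form overwrites another.
        search_form "-$kind" "$dbdir" "$name" > "$outdir/$name-$kind.html" || return 1
    done
}
```

For example:

% make_forms /local/project/Isite/db tester /home/httpd/html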

Better Looking Forms:

The search forms that are generated are pretty plain. You'll probably want to add button bars and pictures of your company headquarters and so forth. Feel free. Go nuts.

The search results page is generated by the code in isrch_srch.cxx (or by isrch_html.cxx for the HTML search form). Look at the "cout" statements and get a feel for what is going on, and then hack away. You can make the output as simple or as fancy as you'd like. You can have a little Java animation of yourself pointing at your favorite document when it gets found, for instance. Or color code the hits, red for the highest score through purple to blue for the lowest. Again, go nuts.

Fast, Clean Links:

You can also modify the search form so that fetches aren't done through ifetch, but go straight to the HTML files that were indexed. This assumes, of course, that the files you indexed are part of your normal htdocs tree. If not, you're out of luck. But if you just indexed your web site, add the line:

<input name="HTTP_PATH" type=hidden value="/path/to/http/docs">

to your search form (like form3.html, above, for instance). Make sure you edit the pathname, though. This technique will make the Isearch-cgi results point to the real files instead of always going through ifetch. This is good for two reasons:

  1. It protects the integrity of the link. This is nice from a philosophical standpoint.
  2. It is much faster and places much less load on your server. This is nice from a job security standpoint.
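
If you regenerate your forms often, adding the HTTP_PATH line by hand gets old. Here is one way to script it, as a sketch only. It assumes the generated form has its closing </form> tag on a line that does not also hold the opening <form> tag, and it inserts the hidden field just before that line. The function name "add_http_path" is made up for this guide:

```shell
# add_http_path is a hypothetical helper, not part of Isearch-cgi.
# Arguments: the form file, then the pathname to use as the value.
# The modified form is written to standard output; the input file
# is not changed.
add_http_path() {
    form=$1
    docroot=$2
    awk -v root="$docroot" '
        # Just before the first line holding </form> (in any case),
        # emit the hidden HTTP_PATH field.
        !inserted && tolower($0) ~ /<\/form>/ {
            printf "<input name=\"HTTP_PATH\" type=hidden value=\"%s\">\n", root
            inserted = 1
        }
        { print }
    ' "$form"
}
```

For example:

% add_http_path form3.html /path/to/http/docs > form3-direct.html

As with the hand edit, make sure you substitute your real pathname.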

That concludes the Isearch-cgi Users' Guide. Make sure you subscribe to the Isearch mailing list (isearch@franklin.oit.unc.edu) or the Isite mailing list (listproc@list.netprovisions.com) for the most current information.