SCRIPT Databases

The SCRIPT search engine was initially developed simply to test the Search API. However, we felt that others might find it useful, so we left it in.

The purpose of the SCRIPT search engine is to allow system administrators to offer new, interesting databases quickly and easily without having a specific database system or tool and without having to extend the source code of Isite.

We chose to implement this by describing a simple interface between the Search API and external applications. There are three components involved in this interface: the Database Information Group, the external application, and the results file.

A Database Information Group is an entry in a sapi.ini file that describes a database. For SCRIPT-type databases, the database information group must have the directives Type, Location and Results. Consider:

[ManPages]
Type=SCRIPT
Location=/usr/local/bin/SearchScript
GetFull=/usr/local/bin/GetScript
Results=/tmp/results
FieldMaps=/home/cnidr/Isite-geo-beta/bin/gils.map

In this example, the database name is ManPages and the database type is SCRIPT. For SCRIPT-type databases, the Location directive is mandatory and specifies the fully-qualified pathname of an external application or shell script that performs the search. The GetFull directive specifies the fully-qualified pathname of an external application or shell script that, given a record key, retrieves a full record from the database.

The Results directive is also mandatory and acts as a prefix for a temporary storage file for the results of the search. When the Search API receives a request to search the ManPages database for a term of 'strcmp', for example, it constructs a command of the form:

/usr/local/bin/SearchScript /tmp/results.<pid> strcmp[attr1,value1]

and executes that command with a system() call. Calling a SCRIPT search engine is therefore quite simple; however, the results file must be structured so that the Search API can read the search results.
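
As a rough illustration of how the Results prefix and the query term end up on the command line, the following Python sketch assembles a command string of the form shown above. It is not the Search API's own code; the function name build_search_command is hypothetical, and reading <pid> as the calling process's ID is an assumption.

```python
import os


def build_search_command(location, results_prefix, term, pid=None):
    """Return a command line in the form shown above."""
    if pid is None:
        pid = os.getpid()  # assumption: <pid> is the calling process's ID
    # Location path, then "<Results prefix>.<pid>", then the query term.
    return f"{location} {results_prefix}.{pid} {term}"


print(build_search_command("/usr/local/bin/SearchScript", "/tmp/results",
                           "strcmp[attr1,value1]", pid=1234))
```

Because a string like this is handed to a shell, a query term containing shell metacharacters would need quoting. The sketch does not attempt that.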

The Results file must adhere to the following format:

[Default]
HitCount=3
Diagnostic=0
Separator=##separator string - your choice##
[Data]
Key-for-record1
Record data for record number 1
##separator string - your choice##
Key-for-record2
Record data for record number 2
##separator string - your choice##
Key-for-record3
Record data for record number 3

If the Results file is not of this format, you can expect unexpected results! The file begins with a group named "Default". Within the Default group, "HitCount" is the number of documents matching the user's query and available for retrieval. As of this writing, "Diagnostic" can take one of two values: 0 indicates success and 1 indicates failure. The "Separator" directive should be a string that does not occur anywhere in the data records; it is used to separate the data records from one another. This provides the Search API with a dynamic mechanism for retrieving records based on a caller's request.

Next, we have a group named "Data". The actual data records (HitCount of them) are listed sequentially after the Data group name, separated by a single line containing only the Separator value.
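
To make the layout concrete, the following Python sketch reads a results file in the format described above. It only illustrates the format and is not the Search API's own reader. The function name parse_results is hypothetical, and treating the first line of each record as its key is an assumption drawn from the sample file.

```python
def parse_results(text):
    """Parse results-file text into (header, records).

    header  -- dict of the directives found before the [Data] group
    records -- list of (key, data) tuples from the [Data] group
    """
    lines = text.splitlines()
    data_start = lines.index("[Data]")  # raises ValueError if the group is missing

    # Collect the name=value directives of the Default group.
    header = {}
    for line in lines[:data_start]:
        if "=" in line:
            name, value = line.split("=", 1)
            header[name.strip()] = value.strip()

    # Split the Data group on lines that contain only the separator.
    separator = header["Separator"]
    records, current = [], []
    for line in lines[data_start + 1:]:
        if line == separator:
            records.append(current)
            current = []
        else:
            current.append(line)
    if current:
        records.append(current)

    # Assumption: the first line of each record is its key, the rest is data.
    return header, [(rec[0], "\n".join(rec[1:])) for rec in records if rec]


sample = """[Default]
HitCount=2
Diagnostic=0
Separator=##SEP##
[Data]
strcmp-key
strcmp(3) - compare two strings
##SEP##
strncmp-key
strncmp(3) - compare part of two strings
"""

header, records = parse_results(sample)
print(header["HitCount"], [key for key, _ in records])
```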

Therefore, if you wish to write a SCRIPT-type search engine to be used with the Search API (and hence with any application that uses the Search API), you need to do the following:

- Write a script or program that accepts two command-line arguments: a temporary file name and a query term.

- In the script, perform whatever operation you wish and then write the results of your operation into the temp file you received on the command line. You must adhere to the file format described above! 

- Edit your sapi.ini file to register the new database: add the database name to the DBList directive and add a Database Information Group as described above.
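
The first two steps might be sketched in Python as follows. This is a minimal illustration that searches a small in-memory table; the PAGES table, the search and write_results helpers, and the separator string are all hypothetical, and a real script would query whatever data source you want to expose. The sketch also assumes that anything from the first "[" onward in the query argument is attribute information, which it ignores.

```python
#!/usr/bin/env python3
import sys

SEPARATOR = "##END-OF-RECORD##"  # must not occur inside any record

# Hypothetical stand-in for a real data source: record key -> headline.
PAGES = {
    "strcmp-key": "strcmp(3) - compare two strings",
    "strncmp-key": "strncmp(3) - compare part of two strings",
    "memcpy-key": "memcpy(3) - copy memory area",
}


def search(term):
    """Return (key, headline) pairs whose headline contains the term."""
    # Assumption: a trailing "[attr,value]" suffix is not part of the term.
    term = term.split("[", 1)[0].lower()
    return [(k, v) for k, v in PAGES.items() if term in v.lower()]


def write_results(path, hits):
    """Write the hits to path in the results-file format described above."""
    lines = ["[Default]", f"HitCount={len(hits)}", "Diagnostic=0",
             f"Separator={SEPARATOR}", "[Data]"]
    for i, (key, headline) in enumerate(hits):
        if i > 0:
            lines.append(SEPARATOR)  # separator goes between records only
        lines += [key, headline]
    with open(path, "w") as out:
        out.write("\n".join(lines) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1] is the temp file name, argv[2] the query term.
    write_results(sys.argv[1], search(sys.argv[2]))
```

Saved as an executable file and named in the Location directive, a script like this would be invoked with the two arguments described in the first step.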

Any applications (zserver, for example) that use the Search API should be made aware of the Search API's new databases. Refer to the documentation for each application for more details. That should be all!

The Search API expects to find the brief records (or headlines) in the temporary file that was named on the Location script's command line. These results are passed back to the user, who will be offered an opportunity to retrieve the full record corresponding to one or more of the hits. A request to retrieve a full record is handled by the script named in the GetFull directive, which the Search API runs with a system() call in the same way it runs the Location script. The GetFull script is passed a temporary file name and the record key (as returned by the search script) that identifies the full record to retrieve.

When the Search API receives a request to retrieve a full record from the ManPages database, for example, it constructs a command of the form:

/usr/local/bin/GetScript /tmp/<tmpname> strcmp-key

and executes that command with a system() call. It is expected that the record will already be formatted correctly.
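
A GetFull script could be sketched along the same lines. This documentation does not define the layout of the file a GetFull script writes, so the Python sketch below simply assumes that the raw record text is written to the temporary file. The FULL_RECORDS table and the write_full_record helper are hypothetical.

```python
#!/usr/bin/env python3
import sys

# Hypothetical stand-in for a real data source: record key -> full record text.
FULL_RECORDS = {
    "strcmp-key": "STRCMP(3)\n\nNAME\n    strcmp - compare two strings\n",
}


def write_full_record(path, key):
    """Write the full record for key to path; return True if the key was found."""
    record = FULL_RECORDS.get(key)
    # Assumption: the file holds just the record text, and an unknown key
    # yields an empty file. Neither behavior is specified by the documentation.
    with open(path, "w") as out:
        out.write(record if record is not None else "")
    return record is not None


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1] is the temp file name, argv[2] the record key.
    write_full_record(sys.argv[1], sys.argv[2])
```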