
CGI Scripts


CGI stands for Common Gateway Interface. It provides a standard for Web servers to interact with programs or scripts which are not part of the server but may produce output which you wish to serve.

16.1 Do You Need a CGI Script?

Many functions which are done by CGI scripts on other servers are built-in features of WN. If your needs can be met by these features, then not only will you save yourself considerable effort in creating, setting up, and maintaining scripts or programs, but the built-in feature will perform much more efficiently and much more securely than a CGI script.

These features include the ability to respond with different text or entirely different documents based on the client request, the client's hostname, IP address, user-agent, or the "referer" (the document containing the link). For information about this see the chapter on parsed text. Support for "imagemaps", or clickable images, is also built in, so there is no need to use CGI for this. See the chapter on imagemaps. Finally, WN supports a variety of methods of searching your data, including by title, keyword, or full text. See the chapter on searches.

If these features do not meet your needs and something like a CGI script will, then you may wish to consider using a WN filter. Filters have most of the functionality of CGI scripts but are somewhat more secure, and they have one further advantage: the output of filters can be parsed while CGI output cannot.

16.2 How Does the Server Recognize a CGI Script?

It would be nice if one could simply indicate in the appropriate index file that a particular file is a CGI program which should be executed rather than served. Unfortunately, the CGI protocol makes it impossible to implement this in an efficient way.

There are two mechanisms in fairly common use with other servers for indicating that a file is a CGI script and WN supports them both. The first is to give the file name a special extension (by default it is ".cgi") which indicates that it is a CGI script. Thus any file you serve with the name "something.cgi" will be treated as a CGI script. The special extension ".cgi" can be redefined by editing the file config.h and recompiling.

The second mechanism is to have specially named directories with the property that any file in that directory will be assumed to be a CGI script. The default for this special name is "cgi-bin". Thus, if you have a directory /cgi-bin in your hierarchy, the server will assume that any file served from that directory is a CGI script. Of course, as always, only files listed in that directory's index file will be servable. No files in subdirectories of cgi-bin can be served. This is because the server will always interpret a request for "/cgi-bin/foo/bar" as meaning run the script /cgi-bin/foo with the PATH_INFO environment variable set to "bar". Thus if foo is actually a directory and bar a file in it, the request will fail.

There is no need for cgi-bin/ to be at the top of your hierarchy. It could be anywhere in the hierarchy. And, in fact, you can have as many directories named "cgi-bin" as you like. They will all be treated the same. The name "cgi-bin" can be changed by editing config.h and recompiling.
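To illustrate how a request such as "/cgi-bin/foo/bar" looks from the script's side, here is a minimal sketch of a script that could be installed as /cgi-bin/foo. The script name and the helper extra_path are hypothetical. It relies only on the standard CGI variable PATH_INFO and strips any leading slash, since it makes no assumption about whether the server includes one.

```perl
#!/usr/bin/perl
# Hypothetical script installed as /cgi-bin/foo (the name is illustrative).
use strict;
use warnings;

# Return the extra path that followed the script name in the URL,
# without a leading slash; return the empty string if there was none.
sub extra_path {
    my ($env) = @_;
    my $path = $env->{PATH_INFO};
    $path = '' unless defined $path;
    $path =~ s{^/+}{};
    return $path;
}

my $extra = extra_path(\%ENV);

print "Content-type: text/plain\r\n";
print "\r\n";
print "Extra path: ", ($extra eq '' ? "(none)" : $extra), "\n";
```

Because the extra path comes from the client, the sketch sends it back as plain text rather than embedding it in HTML.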

16.3 How Does a CGI Script Work?

It is beyond the scope of this document to provide an extensive tutorial in writing CGI scripts. There is an online tutorial at www.charm.net and another available from NCSA. A collection of links to CGI information is available at www.stars.com.

We will provide only a simple example of a CGI script written in Perl. More examples can be found in the /docs/examples directory of the WN distribution.


#!/usr/bin/perl
# Simple example of a CGI script.
use strict;
use warnings;

# Escape HTML metacharacters so that client-supplied values
# (referer, user agent, query string) cannot inject markup.
sub esc {
    my $s = shift;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

print "Content-type: text/html\r\n";
# The first line must specify the content type. Other
# optional headers might go here.

print "\r\n";
# A blank line ends the headers. All header lines should
# end with CRLF ("\r\n"), but other lines don't need to.

# From now on everything goes to the client.

print "<body>\n";
print "<h2>A few CGI environment variables:</h2>\n\n";

foreach my $name (qw(REMOTE_HOST HTTP_REFERER HTTP_USER_AGENT QUERY_STRING)) {
    print "$name = ", esc($ENV{$name}), "<br>\n";
}
print "<p>\n";

print "</body>\n";

Notice that the first thing the script does is provide the HTTP "Content-type:" header line. It may be followed by other optional headers you want the server to send. The end of these headers is indicated by a blank line. Of course the server will add additional headers.

By default the WN server assumes that the output of any CGI script is "dynamic", or different each time the script is run, and is also "non-cachable". Hence the server behaves as if the "Attributes=dynamic,noncachable" directive had been used. The dynamic attribute causes the server not to send a last modified date or a content length since they might be constantly changing. The noncachable attribute attempts to dissuade clients and proxies from caching the output by sending an appropriate HTTP header. If, in fact, the output of your script is always the same, you can use the "Attributes=nondynamic" directive. Also, if you wish it to be cached you must use the "Attributes=cachable" directive. In particular, if you want the browser "back" button to return users to a CGI-generated page after they have followed a link, you may need "Attributes=cachable" (especially with a POST form) since otherwise the browser may not even cache the page in memory.

The script above is a good example of one which should not be cached, as it prints out the client's hostname, user agent, and the URL of the document which contains the link to this CGI script. The CGI script gets this information about the client from environment variables set by the server. A complete list of the standard CGI environment variables and a description of what they contain, plus a description of some additional non-standard ones supplied by the WN server, can be found in Appendix D: Environment Variables.

In addition to setting these environment variables appropriately the server will change the current working directory of the CGI process to the directory in which the CGI script is located.
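As a further illustration of reading these environment variables, the following minimal sketch splits QUERY_STRING into name/value pairs. The helper names url_decode and parse_query are hypothetical, and the sketch assumes the usual form encoding ("+" for a space, %XX for an escaped byte). It is not taken from the WN distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split a query string such as "name=Ann+Lee&lang=en" into a hash.
# If a name is repeated, the last value wins.
sub parse_query {
    my ($query) = @_;
    my %params;
    return %params unless defined($query) && length($query);
    foreach my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $params{ url_decode($key) } = url_decode($value);
    }
    return %params;
}

my %params = parse_query($ENV{QUERY_STRING});

print "Content-type: text/plain\r\n";
print "\r\n";
foreach my $key (sort keys %params) {
    print "$key = $params{$key}\n";
}
```

The decoded values are still untrusted client input, which is why the sketch replies with text/plain instead of inserting them into HTML.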

Note: In general a CGI script has complete control over its output, so it is responsible for doing things which the server might do for a static document. This means that you cannot use many of the WN features with CGI output. In particular, the server will not use a filter or parse it for <!-- #include -->, etc. The CGI script must do these things for itself. Also, the server will not provide ranges specified in the Range: header. Instead the contents of this header are passed to the script in the environment variable HTTP_RANGE, so the script can do the range processing.
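The range processing mentioned in the note could be sketched as follows. This is only an illustration under stated assumptions: the helper parse_range is hypothetical, it handles a single byte range only, and $body is a stand-in for whatever the script would really produce.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a single-range value such as "bytes=0-99", "bytes=100-" or
# "bytes=-50" against a body of $length bytes. Returns (start, end) as
# inclusive offsets, or an empty list if the value is missing,
# malformed, multi-range, or cannot be satisfied.
sub parse_range {
    my ($header, $length) = @_;
    return unless defined($header) && $length > 0;
    return unless $header =~ /^\s*bytes\s*=\s*(\d*)\s*-\s*(\d*)\s*$/;
    my ($first, $last) = ($1, $2);
    if ($first eq '') {
        return if $last eq '' || $last == 0;   # "bytes=-" or "bytes=-0"
        my $start = $length - $last;           # suffix form: last N bytes
        $start = 0 if $start < 0;
        return ($start, $length - 1);
    }
    return if $first >= $length;
    $last = $length - 1 if $last eq '' || $last >= $length;
    return if $last < $first;
    return ($first, $last);
}

my $body = "0123456789" x 10;   # stand-in for the real output (100 bytes)
my ($start, $end) = parse_range($ENV{HTTP_RANGE}, length $body);

print "Content-type: text/plain\r\n";
print "\r\n";
if (defined $start) {
    print substr($body, $start, $end - $start + 1);
} else {
    print $body;                # no usable range: send everything
}
```

A complete implementation would also have to tell the client that the response is partial (in HTTP, a 206 status and a Content-Range header). How a script signals that through WN is not covered by this sketch.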

One thing you should be aware of in writing scripts is that the WN server does not send the standard error output to the error log file, but leaves it at its default, the terminal from which the server is invoked. This allows the maintainer to redirect it to a file of her choice or leave it directed to the console window in which swn was invoked. To redirect it to a file called my.errs simply run swn with a command like "swn <options> 2>my.errs" if you are using a Bourne-like shell. This can be useful when debugging CGI scripts because their errors are typically sent to stderr, so you can easily view them with a command like "tail -f my.errs" rather than have them buried in a log file.
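A script can take advantage of this by writing its own diagnostics to standard error. The following minimal sketch assumes the script inherits the server's standard error stream, as described above. The helper debug_line is hypothetical and simply tags each message so that output from different scripts can be told apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one diagnostic line tagged with the script name and process id.
sub debug_line {
    my ($message) = @_;
    return sprintf("[%s pid %d] %s\n", $0, $$, $message);
}

my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '(unset)';

# Diagnostics go to STDERR, never to STDOUT, so they cannot
# end up in the response sent to the client.
print STDERR debug_line("QUERY_STRING=$query");

print "Content-type: text/plain\r\n";
print "\r\n";
print "Done.\n";
```

With swn started as "swn <options> 2>my.errs", each request to such a script should add one line that "tail -f my.errs" will show.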

16.4 CGI Handlers

Sometimes you may have a number of files which are to be processed by the same CGI script or program. In that case you might consider designating a "handler" for these files instead of putting the name of the CGI program in the URL for each of them.

The file directive

CGI-Handler=bar.cgi
causes the script "bar.cgi" to be run and its output to be served in place of the document requested. This is a way to designate a CGI script to handle a file somewhat like a filter. The name of the script need not be in the URL since it is in the index file. So when http://host/foo.html is requested this will cause the "handler", bar.cgi, to be run with PATH_INFO set to /path2/foo.html. In normal use the script bar.cgi will do something to the file foo.html and serve the output. It is useful if you want a number of files in a directory to be handled by the same CGI script. Note that the file foo.html need not be used in any way by the script, but it must exist or else the server will treat the request as one for a non-existent file.

The directory directive Default-CGI-Handler=handler.cgi specifies that all files in the directory should be treated as if the CGI-Handler file directive had been set to handler.cgi. To override this setting and specify no CGI handler use the "CGI-Handler=<none>" directive.
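A handler script of the kind described above might look like the following minimal sketch, which serves the requested document with its lines numbered. It rests on an assumption not stated in this chapter: that the server sets the standard CGI variable PATH_TRANSLATED to the file-system path of the requested document. The helpers esc and render_file are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical handler playing the role of "bar.cgi" in the text.
use strict;
use warnings;

# Escape HTML metacharacters in a line of the document.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Read the file at $path and return it as an HTML fragment with
# numbered lines, or undef if the file cannot be opened.
sub render_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $html = "<pre>\n";
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $html .= sprintf("%4d  %s\n", ++$n, esc($line));
    }
    close $fh;
    return $html . "</pre>\n";
}

# Assumption: PATH_TRANSLATED holds the file-system path of the document.
my $path = $ENV{PATH_TRANSLATED};
my $html = defined($path) ? render_file($path) : undef;

print "Content-type: text/html\r\n";
print "\r\n";
print defined($html) ? $html : "<p>Document not available.</p>\n";
```

If the server supplies the document's location in some other way, only the line that reads PATH_TRANSLATED would need to change.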

16.5 How Can CGI Scripts Be Made Safe?

This is an extremely important issue, but one which is beyond the scope of this document. I highly recommend the CGI security FAQ maintained by Paul Phillips and the WWW Security FAQ maintained by Lincoln Stein.


John Franks <john@math.nwu.edu>