An Overview of the WN server
An HTTP server should do more than just serve files. It should
play an active role in both navigation and presentation issues. It is
my hope that this server provides better tools for the creative
webmaster. - John Franks
WN is a server for the Hypertext Transfer Protocol (HTTP). Its
primary design goals are security, robustness, and flexibility, in
that order. One of its objectives is to provide functionality usually
available only with complex CGI scripts without the necessity of
writing or using these scripts. (Of course CGI/1.1 is fully supported
for those who want it.) Despite this extensive functionality the WN
executable is substantially smaller than the CERN, NCSA or Apache
servers.
WN was planned with a focus on serving HTML documents. This means such
things as enabling full text searching of a single logical HTML
document which may consist of many files on the server, or allowing
users to search all titles on the server and obtain a menu of
matching items, or allowing users to download a total logical document
for printing which, in fact, consists of many linked files on the
server. All of these are done in a way which is transparent to the
user (and largely transparent to the maintainer!). The User's Guide
provides a good example of many of these features.
Another feature not found in many other servers is conditionally served
text. Often a server maintainer may wish to serve different versions
of a document to different clients. By adding simple HTML comments to
documents and marking those documents to be "parsed" by the server,
the maintainer can arrange that different sections or entirely
different documents are sent to clients, based on such things as the
client's domain name, IP address, browser type, browser "Accept"
header, "Cookie" header, etc. This feature is described in more
detail in the user guide chapter on parsed documents.
But these are only examples of many new tools WN
makes available to webmasters.
The design and security mechanisms of WN differ substantially
from those of the httpd servers available from CERN and NCSA, so a
brief description of how they work is useful.
Files served by an HTTP server may have many attributes relevant to
their serving. These attributes include content-type, optional title,
optional expiration date, optional keywords, whether the file should
be parsed for server-side includes, access restrictions, etc. Some
servers try to encode this information in ad hoc ways, in a file name
suffix, or in a global "configuration file." The approach of WN is to
keep this information in small databases, one for each directory in
the document hierarchy.
The WN maintainer never needs to understand the format of these
database files (named index.cache by default), but this format is very
simple and a brief description will indicate how WN works. When the
server receives a request, say for /dir/foo.html, it looks in the file
/dir/index.cache which contains lines like
file=foo.html&content=text/html&title=whatever...
If the server finds a line starting with "file=foo.html" then the file
will be served. If such a line does not exist the file will not be
served (unless special permission to serve all files in the directory
has been granted). This is the basis of WN security. Unlike other
servers, the default action for WN is to deny access to a file. A file
can only be served if explicit permission to do so has been granted by
entering it in the index.cache database or if explicit permission to
serve all files in /dir has been given in the index.cache file in /dir.
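The lookup described above can be sketched in a few lines of Python. This is only an illustration of the deny-by-default idea, not WN's actual code: the function names are hypothetical, the record is assumed to be a simple "&"-separated list of key=value fields as in the sample line, and the directory-wide "serve all files" permission is left out.

```python
def parse_cache_line(line):
    """Split one 'key=value&key=value' record into a dict (assumed format)."""
    record = {}
    for field in line.strip().split("&"):
        key, sep, value = field.partition("=")
        if sep:
            record[key] = value
    return record


def lookup(cache_text, filename):
    """Return the record for filename, or None if the file is not listed.

    Returning None models the default action: a file that has no entry
    in index.cache is simply not served.
    """
    for line in cache_text.splitlines():
        record = parse_cache_line(line)
        if record.get("file") == filename:
            return record
    return None
```

Under these assumptions a request for /dir/foo.html succeeds only if lookup() on the text of /dir/index.cache returns a record for "foo.html".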
This database also provides other security functions. For example,
restricting the execution of CGI scripts can be done on the basis of
the ownership (or group ownership) of their index.cache files. There
is no need to limit execution to scripts located in particular
designated directories. The location of a file in the data hierarchy
should be orthogonal to security restrictions on it and this is the
case with the WN server.
The index.cache database file has a number of other functions beyond
its security role. Attributes of foo.html which can be computed
before it is served and which don't often change are stored in the
fields of the line starting file=foo.html. For example, the MIME
content type "text/html" must be deduced from the filename suffix ".html".
This is done once at the time index.cache is created and need not be
done every time the file is served.
The title of a file is another example. With the WN server every file
served has a title (even binaries) and optionally has a list of
keywords, an expiration date, and other fields associated with it.
For an HTML document the title and the keywords are automatically
extracted from the header of the document and stored in fields of that
file's line in its index.cache file. These are used for the built-in
keyword and title searches which the server supports. The maintainer
also has the option of adding his own fields to this database file.
They could contain such things as document author, document id number,
etc. These user defined fields can be searched with the built-in WN
searches or their contents can be inserted into the document, on the
fly, as it is served.
So how are the index.cache databases created? Their format is quite simple
and a maintainer is free to create them any way she chooses, but normally
they are created by the utility wndex (pronounced "windex").
This program, which is part of the WN distribution, is designed
to produce the index.cache file from a file with a friendlier format
with the default name index. A very simple index file might
look like
File=foo.html
File=clap.au
Title=Sound of one hand clapping
File=hand
Title=Picture of one hand clapping
Content-type=image/gif
Of course if the file hand were named hand.gif the
content-type line would not be necessary as wndex could deduce the
type from the .gif suffix. Likewise it is not necessary to give a
title for foo.html because wndex will read the HTML header from that
file and extract the title and perhaps other things like keywords and
expiration date.
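As a rough sketch of the kind of work wndex does, the Python below groups the lines of an index file into one record per File= entry and deduces a content type from the filename suffix when none is given. It is not the real wndex: the function names are hypothetical, every File= line is assumed to start a new record, Python's standard mimetypes table stands in for WN's own suffix table, and the fallback type is an assumption.

```python
import mimetypes


def parse_index(text):
    """Group 'Key=value' lines into one dict per File= entry."""
    records = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        if key == "File":
            records.append({"File": value})
        elif records:
            records[-1][key] = value
    return records


def content_type(record):
    """Use an explicit Content-type if present, else guess from the suffix."""
    if "Content-type" in record:
        return record["Content-type"]
    guessed, _encoding = mimetypes.guess_type(record["File"])
    return guessed or "application/octet-stream"  # assumed fallback
```

With this sketch a record for hand.gif needs no Content-type line, while a record for the suffix-less file hand does.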
1.2 Features of WN
The WN server has several features which are not available
with other servers or only available through the use of CGI scripts.
One of the design goals of WN is to provide the maintainer with
tools to create extensive navigational aids for the server. A variety
of search mechanisms are available.
- Title searches
- In response to the URL <http://host/dir/search=title>
the server will provide an HTML form (automatically generated or
prepared by the maintainer) asking for a regular
expression search term. When one is supplied, the server will search
the index.cache files in /dir and designated subdirectories for
items whose titles contain a match for the search term. An
HTML document with a menu of these items is returned.
- Keyword searches
- Like title searches except matches are sought in keywords instead
of titles. Keywords for HTML documents are automatically obtained
from headers. For other documents (or HTML documents) they
can be manually supplied in the index file.
- Title/keyword search
- Like the above except the match can be either in the keyword or
the title.
- User supplied field searches
- Like keyword searches except matches are sought in user supplied
fields. The user supplied fields can contain any text and are
attached to a document by entering them in that document's
record in the index file. Their purpose is to include
items like a document id number, or document author in the index.cache
database. A field search could then produce all documents by a
given author, for example. Or a regular expression in the
search term could produce a list of all documents whose id numbers
satisfy certain criteria.
- Context searches
- Unlike the title and keyword searches this is a full text search
of all text/* documents in one directory (not subdirectories).
The returned HTML document contains a list of all the titles of
documents containing a match together with a sublist of the
lines from those documents containing the match. This provides
one line of context for the match. For HTML documents the matched
expression in each of these lines will be a highlighted anchor.
Selecting one takes you to the document with your viewer focused
on the matching location. The primary intent of this feature is
to provide full text searching for an HTML "document" which might
consist of a substantial number of files.
- File context and grep searches
- A file context search is just like a context search, except limited
to a single file. The file grep search returns a text/html
document containing the lines in the file matching
the regular expression.
- List searches
- The server will search an HTML document
looking for an unordered list of anchors linking to WWW objects.
The contents of each anchor will be searched for a match to the
supplied regular expression. The search returns an HTML document
containing an unordered list of those anchors with a match. This
is quite useful with the digest utility which creates HTML
documents to be searched in this way from files with internal structure
like mail or news digests, mailing lists, etc.
- Index searches
- This is a mechanism by which arbitrary search engines can be
linked to WN through a "search-module". The server will provide
the search term to the search-module and expects an HTML list of
links to matching items to be returned.
All of the searching methods listed above except the index searches
are built into the server and require no additional effort for the
maintainer. They are simply referenced with URLs like
<http://host/dir/search=context> where /dir is any directory
containing files to be served and an index.cache listing them. Of
course search permission can be denied for any directory or any file
contained in that directory.
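The title, keyword, and user-supplied field searches all share one idea: match a regular expression against selected fields of each index.cache record and return a menu of links. A minimal Python sketch of that idea follows. The function names and the lowercase field names "file", "title", and "keywords" are assumptions for illustration, and the exact HTML that WN returns is not reproduced here.

```python
import html
import re


def search_records(records, pattern, fields=("title",)):
    """Return the records in which any of the named fields matches pattern."""
    regex = re.compile(pattern)
    return [
        record
        for record in records
        if any(regex.search(record.get(field, "")) for field in fields)
    ]


def as_menu(matches):
    """Render the matching records as an unordered HTML list of links."""
    items = []
    for record in matches:
        href = html.escape(record["file"], quote=True)
        label = html.escape(record["title"])
        items.append('<li><a href="' + href + '">' + label + "</a></li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Passing fields=("title", "keywords") gives the combined title/keyword search, and passing the name of a user-defined field gives a field search.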
The WN server has extensive capabilities for automatically
including files in one which is being served or "wrapping" a served
file with another, i.e. pre-pending and post-pending information to a
file being served. This latter is useful if you wish to place a
standard message at the beginning or end (or both) of a large
collection of files. For security all files included in a file or
used as a wrapper for it are listed in that file's index.cache file.
This, combined with various available security options, like requiring
that a served file and all its includes and wrappers have the same
owner (or group owner) as the index.cache file listing them, provides a
safe and productive Web environment.
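The ownership option mentioned above amounts to comparing file owners. The Python below is a sketch of such a check, assuming a POSIX-style file system; the function name and its parameters are hypothetical and WN's own implementation may differ.

```python
import os


def same_owner(cache_path, served_paths, by_group=False):
    """True if every served path has the same owner as the index.cache file.

    served_paths would hold the file itself plus its includes and
    wrappers. With by_group=True the group owner is compared instead.
    """
    cache_stat = os.stat(cache_path)
    expected = cache_stat.st_gid if by_group else cache_stat.st_uid
    for path in served_paths:
        path_stat = os.stat(path)
        actual = path_stat.st_gid if by_group else path_stat.st_uid
        if actual != expected:
            return False
    return True
```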
One important application of wrappers is to customize the HTML documents
returned listing the successful search matches. If a search item is
given a wrapper the server assumes that it contains text describing
the search and it merely inserts an unordered list of links to the
matching items.
In addition to including files the output of programs may be inserted
and the value of any user defined field in the index.cache database
entry for a file may be inserted.
Also parsed text may conditionally insert items with a simple if - else - endif construct, based on Accept
headers, User-Agent headers, Referer headers, etc.
An arbitrary "filter" can be assigned to any file to be served. A
filter is a program which reads the file; the output of the program is
served rather than the content of the file. The name of the filter is
another field in the file's line in its index.cache file. One common
use of this feature is for on-the-fly decompression. For example, a
file can be stored in its compressed form and assigned a filter like
zcat which uncompresses it. Then the client is served the
uncompressed file but only the compressed version is stored on disk.
As another example, you might use "nroff -man" as a filter to process
UNIX man files before serving. There are many other interesting uses
of filters. Be creative!
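The filter mechanism can be pictured as running a program with the stored file on its standard input and serving whatever it writes to standard output. The Python below is a sketch of that picture only. The function name is hypothetical, and how WN actually invokes a filter (for instance whether the file arrives on standard input or as an argument) is an assumption here.

```python
import subprocess


def serve_filtered(path, filter_argv):
    """Run the filter with the file on stdin and return the filter's output.

    filter_argv is the filter command as a list, for example ["zcat"]
    or ["nroff", "-man"]. A non-zero exit status raises
    subprocess.CalledProcessError rather than serving partial output.
    """
    with open(path, "rb") as source:
        result = subprocess.run(
            filter_argv,
            stdin=source,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout
```

On a system where zcat is installed, serve_filtered("paper.ps.gz", ["zcat"]) would return the uncompressed bytes of a hypothetical compressed file while only the compressed copy sits on disk.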
If the server is accessed via a URL like
<http://host/dir/foo;lines=20-30> and foo is any text/*
document, it will return a text/plain document consisting of lines 20
through 30 of file foo. This is very useful for structured text files
like address lists or digests of mail and news. A WN utility
called digest will produce an
HTML document with a list of links to separate sections (line ranges)
of the structured file. The digest utility is executed with two
regular expressions as arguments: one to match the section separator
and the other to match the section title. For a mail digest, for
example, these could be ^From and ^Subject:
respectively. Then the sections of the virtual documents would be
delimited by a line starting with "From" and would have the message
subject as their title. A similar mechanism provides byte ranges from
files.
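To make the line-range and digest ideas concrete, here is a small Python sketch. It is not the WN digest utility: the function names are hypothetical, line numbers are assumed to be 1-based and inclusive, and a section title is assumed to be whatever follows the title pattern on the first line that matches it.

```python
import re


def parse_line_range(url_path):
    """Split '/dir/foo;lines=20-30' into ('/dir/foo', 20, 30)."""
    match = re.fullmatch(r"(.+);lines=(\d+)-(\d+)", url_path)
    if match is None:
        raise ValueError("no line range in " + repr(url_path))
    return match.group(1), int(match.group(2)), int(match.group(3))


def select_lines(text, first, last):
    """Return lines first..last of text (1-based, inclusive)."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[first - 1:last])


def digest(text, separator, title):
    """Return one {'title', 'first', 'last'} dict per section of text.

    A line matching separator starts a new section. The first line in a
    section that matches title supplies the section title.
    """
    separator_re = re.compile(separator)
    title_re = re.compile(title)
    sections = []
    for number, line in enumerate(text.splitlines(), start=1):
        if separator_re.search(line):
            sections.append({"title": None, "first": number, "last": number})
        elif sections:
            current = sections[-1]
            current["last"] = number
            if current["title"] is None and title_re.search(line):
                current["title"] = title_re.sub("", line, count=1).strip()
    return sections
```

Each section found by digest() corresponds to a link of the form foo;lines=FIRST-LAST, which the first two functions then resolve to the text of that section.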
John Franks <john@math.nwu.edu>