WWWOFFLE VERSION 2.5 - FREQUENTLY ASKED QUESTIONS AND ANSWERS

This file contains a list of frequently asked questions and their answers relating to WWWOFFLE version 2.5. Not all of the questions here are real users' questions; some of them have been made up to give some help to people trying to use the program who find that the README documentation is insufficient.

Section 0 - Why doesn't this FAQ answer my question?

Section 1 - What does WWWOFFLE do (and what it doesn't)

Q 1.1 Does WWWOFFLE support http, ftp, finger, https, gopher, ...?

Q 1.2 Does WWWOFFLE run on systems other than UNIX?

Q 1.3 Can you change WWWOFFLE so that in the pages that it generates ...?

Section 2 - How to use WWWOFFLE to serve an intranet

Q 2.1 Can the WWWOFFLE proxy be accessed by clients other than localhost?

Q 2.2 Why can't remote clients access the WWWOFFLE proxy?

Q 2.3 Why can't remote clients follow all of the links?

Q 2.4 What are the security issues with WWWOFFLE in a multi-user environment?

Q 2.5 How can I have different configurations for different groups of users?

Section 3 - What to look for when WWWOFFLE fails

Q 3.1 Why does my browser return an empty page with WWWOFFLE but not without?

Q 3.2 Why can't WWWOFFLE find a host when the browser without it can?

Q 3.3 Why does my browser say "Connection reset by peer" when browsing?

Q 3.4 Why does following a link on an FTP site go to the wrong server?

Section 4 - Applet handling

Q 4.1 Why doesn't my Browser start applet XYZ?

Q 4.2 Are unicoded applet names supported?

Q 4.3 Why does my Netscape Browser throw the trustProxy security exception?

Section 5 - How to make most use of WWWOFFLE features

Q 5.1 How can I see what monitored pages were downloaded last time online?

Q 5.2 How can I do a recursive fetch on a regular interval?

Q 5.3 How can I stop users from accessing the index?

Q 5.4 How can I use JunkBuster with WWWOFFLE?

Section 6 - More information about WWWOFFLE

Q 6.1 Who wrote WWWOFFLE, When and Why?

Q 6.2 What WWWOFFLE mailing lists are available?

Q 6.3 How do I report bugs in WWWOFFLE?


Section 0 - Why doesn't this FAQ answer my question?

This FAQ is released with each new version of the WWWOFFLE program, so if you are reading the supplied version and your question is one that is frequently asked about this new version, then by definition you will not find the answer here. This FAQ is also available on the WWWOFFLE homepage along with much other information about the program: http://www.gedanken.demon.co.uk/wwwoffle/version-2.5/

Section 1 - What does WWWOFFLE do (and what it doesn't)

Q 1.1 Does WWWOFFLE support http, ftp, finger, https, gopher, ...?

Some of these are supported and some are not.

http : Yes
        The original version of WWWOFFLE only supported http.

ftp : Yes
        Since version 2.0 there has been support for ftp URLs.

finger : Yes
        Since version 2.1 there has been support for finger.  Although this is
        not a standard protocol for proxying there is no reason that it cannot
        usefully be performed.

https : Yes
        Since version 2.4 there has been support for transparent proxying of
        Secure Socket Layer (SSL) connections.  This includes the https
        protocol.

gopher : No
        This is a protocol that is less popular now that the WWW has really
        taken off.  From looking at browsers that support it, it would seem to
        be not impossible, but the market for it seems to be limited.

Q 1.2 Does WWWOFFLE run on systems other than UNIX?

For example DOS / Win3 / Win95 / WinNT / OS/2.

UNIX    = Yes
        This is the system that the program was designed and initially written
        for; it should work on many versions of UNIX.
        I know that it works on Linux, SunOS 4.1.x, Solaris 2.x, *BSD.

DOS/Win3 = No
        The program was not designed for DOS, the filenames used and the
        multi-process nature of the program do not allow this.

Win95/Win98/WinNT = Yes (Partly)
        A Windows 32-bit version of the program is now available thanks to the 
        Cygwin development kit that provides a UNIX system call library
        available on MS Windows.

OS/2    = Maybe
        I do not know of an equivalent of the Cygwin product for OS/2; if one
        exists then it should be possible to port the program as it was for
        Windows 95 / Windows NT above.

Q 1.3 Can you change WWWOFFLE so that in the pages that it generates ...?

This is a question that gets asked a lot.  People want to see Javascript,
images, different colours ... on the web pages that WWWOFFLE generates.

From version 2.2 this is no longer an issue since it is possible to customise
all of the web-pages that WWWOFFLE itself generates.  This means that the
background colour and the font size can all be changed to suit your preferences.
To find out how to do this look in the /var/spool/wwwoffle/html/messages
directory and read the README file.

Section 2 - How to use WWWOFFLE to serve an intranet

Q 2.1 Can the WWWOFFLE proxy be accessed by clients other than localhost?

Yes it can, that facility has been present from the beginning.

The other clients can be any type of computer that is connected to the server
that is running the wwwoffled program.  The only requirement is that they are
networked to the server and that they have browsers on them configured to access
the WWWOFFLE proxy.

Q 2.2 Why can't remote clients access the WWWOFFLE proxy?

The default situation in the wwwoffle.conf file is to not allow any clients to
access the proxy other than localhost.  To allow them to access the proxy the
wwwoffle.conf file needs to be edited as described below and the new
configuration loaded.

The AllowedConnect section of the configuration file contains a list of hosts
that are allowed to connect to the WWWOFFLE proxy.  These names are matched
against the name that WWWOFFLE gets when the connection is made and access is
allowed or denied.  A form of wildcard matching is applied to the entries in
this list but no extra name lookups are performed.

For example, if you are using the private IP address space 192.168.*.* for your
intranet then the AllowedConnect section in the configuration file should look
like this.

AllowedConnect
{
 192.168.*
}

This will allow all hosts that come from this set of IP addresses to connect to
the WWWOFFLE proxy.

Q 2.3 Why can't remote clients follow all of the links?

Some of the links that are generated in the web pages that come out of the
WWWOFFLE proxy need to point to other pages on the proxy.  To be able to do this
the name of the host running the proxy needs to be specified in the LocalHost
section of the configuration file.

For example if the computer running the WWWOFFLE proxy is called www-proxy then
the LocalHost section of the configuration file would look like this.

LocalHost
{
 www-proxy
 localhost
 127.0.0.1
}

The first of the names is what is used by WWWOFFLE to generate these links.  The
others are used for servers that do not get cached by the proxy.

Q 2.4 What are the security issues with WWWOFFLE in a multi-user environment?

Security is a feature that I have considered to some extent when writing
WWWOFFLE although it has not been one of my biggest concerns.  The issues are
listed below.

For the Win32 version it should be noted that on Win95/98 there is not the user
level security that is provided by UNIX.  It is not possible therefore to create
files that are readable by WWWOFFLE and not by other users.  The security
features that are present in WWWOFFLE are therefore inapplicable to these
systems.

Configuration file password
   This file can have a password specified in it in the StartUp section that is
   used to limit access to the control features of WWWOFFLE.  If set this
   password must be used to put WWWOFFLE online, put it offline, purge the
   cache, stop the server, edit the configuration file etc.  If you have set a
   password then you should also make the file readable only by authorised users.
   The password is sent as plain text when using the wwwoffle program to control
   the wwwoffled server.  The encryption used for the web page authentication is
   trivial.

Proxy Authentication
   With the ability to control access to WWWOFFLE using the HTTP/1.1 Proxy
   Authentication method come added security risks.  They are basically the
   same as for the configuration file password: the usernames and passwords are
   in plain text in the configuration file and the password is sent to the
   server using the same trivial encryption method.

WWWOFFLE server uid/gid
   The uid and gid of the wwwoffled server process can be controlled by the
   run-uid and run-gid options in the StartUp section of the configuration file.
   This uid/gid needs to be able to read the configuration file (write is not
   required unless the interactive edit page is used) and have read/write access
   to the spool directory.  If this option is used then the server must be
   started by root.

Deleting requested URLs
   Only the user that makes a request for a page can delete that request, and
   then only when the deletion is done immediately.  This is because a password
   is made by hashing the contents of the file in the outgoing directory.  This
   means that read access to this directory must be denied for this to be secure.

The built in web server
   This is a very simple server and will follow symbolic links; as a security
   feature only files that are world readable can be accessed.  They must also
   be in a directory that the wwwoffled server can read.  A check is not made for
   each directory component, so world readable files in a directory readable
   only by the uid that runs wwwoffled are not safe.

Accessing the cache
   There is in general no problem with allowing users access to the cache
   provided it is read only (but see URLs with password below).  The only
   concern is that if purging is done using the access time of the files then
   running grep on the cache will spoil this.

URLs with Passwords
   The URLs that use usernames and passwords need to be stored in the cache.
   For simplicity they are not hidden in any way.  This means that any URL that
   uses a username/password in it can show up in the log file (with Debug or
   ExtraDebug levels only).  The files in the cache also contain the username/
   password information and should be made inaccessible to users for that reason.

Q 2.5 How can I have different configurations for different groups of users?

When there are two groups of users that will access the same WWWOFFLE cache but
where each group has different WWWOFFLE configurations it is possible to run two
instances of WWWOFFLE.

For example in a school it may be required that the students can access the
cache but they cannot request new pages.  The teachers must be able to access
the same cache and to be able to use WWWOFFLE online and request pages while
offline.

The two WWWOFFLE configuration files will be the same in most respects, but
there will be differences as shown below.

-- wwwoffle.student.conf --               -- wwwoffle.teacher.conf --
StartUp                                 | StartUp 
{                                       | {
 http-port     = 8080                   |  http-port     = 9080
 wwwoffle-port = 8081                   |  wwwoffle-port = 9081
 password      = secret                 |  password      = teacher
}                                       | }
                                        | 
DontRequestOffline                      | DontRequestOffline
{                                       | {
 *://*/*                                | 
}                                       | }
                                        | 
AllowedConnectUsers                     | AllowedConnectUsers
{                                       | {
                                        |  teacher1:password1
                                        |  teacher2:password2
}                                       | }
                                        | 
AllowedConnectHosts                     | AllowedConnectHosts
{                                       | {
                                        |  teacher1pc
                                        |  teacher2pc
}                                       | }

The two copies of WWWOFFLE must use different port numbers.  They use the same
spool directory and therefore the same web-pages are available to both sets of
users.  You will need to have a password on the students' version of WWWOFFLE to
stop them editing the configuration file, but for the teachers it may not be
required.  To keep the students from accessing the teachers' version of WWWOFFLE
you must use either the AllowedConnectHosts or the AllowedConnectUsers sections
in the configuration file.  These will restrict access to either the set of
machines that the teachers have access to or will require a username/password to
be entered before browsing starts.

In the example above the students are not allowed to request any pages when
offline.  This version of WWWOFFLE is never used in online mode so there is
never any way that the students can browse while online.  Only the teachers'
version of WWWOFFLE is ever used in online mode.

Section 3 - What to look for when WWWOFFLE fails

Q 3.1 Why does my browser return an empty page with WWWOFFLE but not without?

When using a browser to visit a web-page nothing is returned when WWWOFFLE is
used as a proxy but when the site is accessed directly without WWWOFFLE the page
is visible.

This can have a number of causes (all reported to me or tested myself):

a) The web server that you are accessing requires the User-Agent header.  If it
   is not present or is set to an uncommon value (not Netscape or IE) then the
   server returns an empty page.
   In this case, if you have the CensorHeader configuration file section set to
   remove the User-Agent header, then you should either not censor this header
   line or set a replacement string that is acceptable.

b) As above, but any value of the User-Agent header is enough for it to return
   a non-empty page.
   The solution is the same except that any User-Agent string can be used.

c) The web server uses cookies to maintain state.  This is common on sites that
   are more concerned with form than content, often without warning.

d) The browser and server are trying to use HTTP/1.1 extensions that WWWOFFLE is
   ignoring.

Q 3.2 Why can't WWWOFFLE find a host when the browser without it can?

There are two possible reasons for this.

1) A Non-authoritative DNS server.
2) A change in the DNS server configuration since wwwoffle was started.

When WWWOFFLE looks up a hostname it uses the standard UNIX library (libc)
function call gethostbyname().  This will only return the host information if
the name that it receives from the domain name server (DNS) is authoritative.  A
non-authoritative answer is not returned, but an error status is set.

Large browser projects (Netscape in particular) will use a non-authoritative
answer if one is available.  This means that it can access sites that are not
available to WWWOFFLE.  The source code for doing this is not obvious and
requires quite low level functions in the name resolver library (libresolv).

This problem only happens when the name server that you are using has poor
connectivity to other name servers or some other name resolving problem.  If
possible the solution is to use a different name server, or complain to the
manager of the one that you do use.

[If anybody has source code for non-authoritative name lookups please tell me.]

The other possible reason is that the DNS server that was configured when
WWWOFFLE was started is no longer valid.  This would happen for example if the
file /etc/resolv.conf was changed after wwwoffled was run.  This is not a
WWWOFFLE-only problem; it will affect most programs that use name lookups.

The reason is that the name lookup part of the standard UNIX library (libc) is
initialised when the program is first started.  When the name lookup is
performed later it will still use the same configuration that was in place when
the program was first started.

This may happen without you being aware of it since some of the user friendly
PPP setup programs will change the /etc/resolv.conf file depending on which ISP
you are connecting to.

Q 3.3 Why does my browser say "Connection reset by peer" when browsing?

This happens when using Netscape to access some web-pages.  The cause is not
known, but the problem is only seen when WWWOFFLE is used and not when a direct
connection is made.

[I believe that this problem (a peculiarity of Netscape) has been fixed in
 version 2.2c of WWWOFFLE; please tell me if this is not the case.]

Q 3.4 Why does following a link on an FTP site go to the wrong server?

If there is a directory called '/dir' on an ftp server and you load the page
'ftp://server/' you get a directory listing that includes a link to '/dir'.
Following this link should take the browser to 'ftp://server/dir/', but on some
browsers it goes to 'ftp://dir/' instead.

I think that this behaviour is due to the browser and not WWWOFFLE.  If you went
to 'http://server/' and followed the link to '/dir/' then you would expect to go
to 'http://server/dir/' and not to 'http://dir/'.  This is just common sense.
Why the browser is different for ftp than http I am not sure.

[This should be fixed in version 2.1 of WWWOFFLE, so is not really applicable to
 this version of the FAQ]

Section 4 - Applet handling

Q 4.1 Why doesn't my Browser start applet XYZ?

[Walter Pfannenmueller <pfn@online.de> writes:]

I suppose you have enabled Java support.  Your Browser says something like
"Can't start Applet XYZ.class".  Check if the file has been successfully
downloaded by WWWOFFLE.  If the file is accessible, open a Java console (your
browser should provide something like that) and get more details on the problem.
Probably it's a security violation.  Every Browser has its own SecurityManager
class and you should consult the manual to find out how to lower these
restrictions.  If your applet, however, tries to get in contact with some server
functionality (servlets, RMI, CORBA), we are at the end of the possibilities of
an offline reader.

Q 4.2 Are unicoded applet names supported?

[Walter Pfannenmueller <pfn@online.de> writes:]

I don't know.  I transform those names to UTF8 encoding and the rest depends on
what your filesystem or the host filesystem does with it.  Java compilers do
have problems with unicode, too, even though it should be supported.  I'd
appreciate any information that helps enlighten the dark.  I'd like to know how
to code a Unicode to UTF8 transformation.  The implementation in javaclass.c
looks somewhat awkward.

Q 4.3 Why does my Netscape Browser throw the trustProxy security exception?

[Walter Pfannenmueller <pfn@online.de> writes:]

The error message should be

Could not resolve IP for host ... See the trustProxy property.

The Netscape Browser tries to verify the applet's source host IP address.
While offline this is not possible, so you have to persuade the Browser to
trust the proxy.  To do this you have to find the preferences file,
preferences.js on UNIX or prefs.js on Windows.  Edit the file, even though it
says "don't edit", and insert the line

user_pref("security.lower_java_network_security_by_trusting_proxies", true);

somewhere.  Be sure to have closed all browser windows, because the
preferences file will be overwritten on closing.  This should work for
all versions of Netscape 4.0x and 4.5.
For more information have a look at
http://developer.netscape.com/docs/technote/security/sectn3.html

Section 5 - How to make most use of WWWOFFLE features

Q 5.1 How can I see what monitored pages were downloaded last time online?

The easiest way to do this is to go to the monitored web pages index and sort
the pages by "Access Time" (http://localhost:8080/index/monitor/?atime). Each
page is accessed when it is monitored so the most recently monitored ones are
the ones at the top of this listing.

Q 5.2 How can I do a recursive fetch on a regular interval?

This is a combination of the recursive fetch option and the monitor option.  If
you select the page that you want in the recursive fetch index
(http://localhost:8080/refresh-options/) with the options that you want and
press the button you will be presented with a page telling you that the request
has been recorded.  There is a link on here to allow you to monitor this
request, which takes you to the normal monitor page
(http://localhost:8080/monitor-options) but with the URL already filled in.

Q 5.3 How can I stop users from accessing the index?

Access to the indexes can be denied to users by using the configuration file
DontGet section.

DontGet
{
 http://localhost:8080/index
}

You must make sure that the hostname that you give is the first one in the
LocalHost section since this is what will be checked.

Q 5.4 How can I use JunkBuster with WWWOFFLE?

The Internet Junk Buster is a program that can filter out many of the junk
adverts and other features of web-pages.

The most recent versions of WWWOFFLE add in many of the features of the
JunkBuster program but not all of them.  If you look at the options that
WWWOFFLE has you may decide that you don't need to use JunkBuster.

If you decide that you do want to use both programs then there are two options:

1) Browser <-> WWWOFFLE <-> JunkBuster <-> Internet

Any pages that the user requests that JunkBuster blocks will have the JunkBuster
error message stored in the WWWOFFLE cache.  Any recursive fetching or fetching
of images that WWWOFFLE does in the background are passed through JunkBuster and
the JunkBuster error messages are cached.

2) Browser <-> JunkBuster <-> WWWOFFLE <-> Internet

Any pages that the user requests that JunkBuster blocks will not be stored in
the WWWOFFLE cache.  Any recursive fetching or fetching of images that WWWOFFLE
does in the background are not passed through JunkBuster and they will be stored
in the WWWOFFLE cache but blocked when the browser tries to view them.

If you decide that WWWOFFLE will be doing lots of fetching because you are using
it to browse offline then the 1st method is best.  If you decide that you will
be only using it while online and not requesting pages when offline then the 2nd
method is best.

If reducing bandwidth is the most important feature of JunkBuster then the 1st
option is the best since it will stop WWWOFFLE fetching the junk pages.

Section 6 - More information about WWWOFFLE

Q 6.1 Who wrote WWWOFFLE, When and Why?

The WWWOFFLE program was written by Andrew M. Bishop (amb@gedanken.demon.co.uk)
in 1996, 1997 and 1998.

There is a WWWOFFLE home-page on the World Wide Web, available via the author's
home-page at http://www.gedanken.demon.co.uk/ .  This is kept updated with news
about the program, as new versions become available.

An earlier program by the same author written in perl had been used for a while
but it was realised that the functionality of that version was insufficient
except for a small amount of use.  Work on the WWWOFFLE program itself started
in the Christmas holiday in 1996, initially as a hack to improve the perl
version.

After the release of the Beta version 0.9 at the beginning of January 1997 there
was a lot of interest generated which led to the release of version 1.0 later
that same month.  More versions followed until December that year when version
2.0 was released.  This contained several large new features (like FTP) and
included a re-write of a large proportion of the code to make it easier to
maintain and build on, this included changing completely the cache format.
Version 2.1 was released in March 1998 with some more new features, version 2.2
in June 1998 with more features and version 2.3 in August 1998 with even more
features.

The Win32 version of the program was made possible by version beta-20 of the
Cygwin development kit at the end of October 1998 when version 2.3e of WWWOFFLE
was released.

The WWWOFFLE program can be freely distributed according to the terms of the GNU
General Public License (see the file `COPYING').

Q 6.2 What WWWOFFLE mailing lists are available?

There are now four mailing lists available for WWWOFFLE.  They can be subscribed
to in two different ways - on the WWWOFFLE users web-page and via e-mail.

wwwoffle-announce       For announcements of new versions of WWWOFFLE.

wwwoffle-beta-announce  For pre-announcements of new versions of WWWOFFLE for
                        beta-testers.  (Only subscribe if you are willing to give
                        time to testing WWWOFFLE).

wwwoffle-users          For discussion of WWWOFFLE features, excluding operating
                        system specific features.

wwwoffle-win32          For discussion of WWWOFFLE on the Win32 system.

The first two are only for announcements from the author of WWWOFFLE; there is
no discussion allowed on them.  The latter two are open for posting from members
of the list and others who are not subscribed.

To subscribe by e-mail send a message to majordomo@gedanken.demon.co.uk with the
message 'subscribe <group-name>' in the body, e.g. 'subscribe wwwoffle-announce'.

Q 6.3 How do I report bugs in WWWOFFLE?

By e-mail, send them to me at amb@gedanken.demon.co.uk and put WWWOFFLE somewhere
in the subject line.  You can also report bugs or provide comments via the
feedback form on the WWWOFFLE home-page on the World Wide Web accessible via
http://www.gedanken.demon.co.uk/ .

Before doing this, you should check the FAQ and the WWWOFFLE web-page to see if
the answer is there.  If it is not and you want to report it to me then it helps
if you can reproduce the error, in particular if you start wwwoffled as
'wwwoffled -d 5 -c wwwoffle.conf' and capture the debugging output for the
session that shows the error.