Section 0 - Why doesn't this FAQ answer my question?
Section 1 - What does WWWOFFLE do (and what it doesn't)
Q 1.1 Does WWWOFFLE support http, ftp, finger, https, gopher, ...?
Q 1.2 Does WWWOFFLE run on systems other than UNIX?
Q 1.3 Can you change WWWOFFLE so that in the pages that it generates ...?
Section 2 - How to use WWWOFFLE to serve an intranet
Q 2.1 Can the WWWOFFLE proxy be accessed by clients other than localhost?
Q 2.2 Why can't remote clients access the WWWOFFLE proxy?
Q 2.3 Why can't remote clients follow all of the links?
Q 2.4 What are the security issues with WWWOFFLE in a multi-user environment?
Section 3 - What to look for when WWWOFFLE fails
Q 3.1 Why does my browser return an empty page with WWWOFFLE but not without?
Q 3.2 Why can't WWWOFFLE find a host when the browser without it can?
Q 3.3 Why does my browser say "Connection reset by peer" when browsing?
Q 3.4 Why does following a link on an FTP site go to the wrong server?
Section 4 - Undocumented features of WWWOFFLE
Q 4.1 How can I see what monitored pages were downloaded last time online?
Q 4.2 How can I do a recursive fetch on a regular interval?
Q 4.3 How can I stop users from accessing the index?
Section 5 - More information about WWWOFFLE
Q 5.1 Who wrote WWWOFFLE, When and Why?
Q 5.2 How do I report bugs in WWWOFFLE?
Section 1 - What does WWWOFFLE do (and what it doesn't)

Q 1.1 Does WWWOFFLE support http, ftp, finger, https, gopher, ...?

Some of these are supported and some are not.

http   : Yes
   The original version of WWWOFFLE only supported http.

ftp    : Yes
   Support for ftp URLs has been present since version 2.0.

finger : Yes
   Support for finger has been present since version 2.1. Although finger is not normally a proxied protocol, there is no reason why proxying it cannot be useful.

https  : No
   This is not really something that you would want to be cached anyway. There is a defined method of proxying this protocol (without caching), so it is possible, and I have the information. It will be added when time permits.

gopher : No
   This protocol is less popular now that the WWW has really taken off. Judging from the browsers that support it, proxying it does not seem impossible, but the demand for it seems limited.
Q 1.2 Does WWWOFFLE run on systems other than UNIX?
(For example DOS / Win3 / Win95 / WinNT / OS/2.)

UNIX        = Yes
   This is the system that the program was designed and initially written for; it should work on many versions of UNIX. I know that it works on Linux, SunOS 4.1.x and Solaris 2.x.

DOS/Win3    = No
   The program was not designed for DOS; the filenames that it uses and its multi-process design do not allow this.

Win95/WinNT = Maybe
   Since Windows 95 and Windows NT claim to be real 32-bit multi-tasking operating systems and support the long filenames that are required, it should in theory be possible to get the program working on them.

OS/2        = Maybe
   As for Windows 95 / Windows NT above.
Q 1.3 Can you change WWWOFFLE so that in the pages that it generates ...?

This question gets asked a lot. People want to see Javascript, images, different colours ... on the web pages that WWWOFFLE generates.

From version 2.2 this is no longer an issue, since all of the web pages that WWWOFFLE itself generates can be customised. This means that, for example, the background colour and the font size can be changed to suit your preferences. To find out how to do this, read the README file in the /var/spool/wwwoffle/html/messages directory.
Section 2 - How to use WWWOFFLE to serve an intranet

Q 2.1 Can the WWWOFFLE proxy be accessed by clients other than localhost?

Yes it can; that facility has been present from the beginning. The other clients can be any type of computer that is connected to the server running the wwwoffled program. The only requirements are that they are networked to the server and that their browsers are configured to use the WWWOFFLE proxy.
Q 2.2 Why can't remote clients access the WWWOFFLE proxy?

The default wwwoffle.conf file does not allow any client other than localhost to access the proxy. To let other clients in, the wwwoffle.conf file must be edited as described below and the new configuration loaded.

The AllowedConnect section of the configuration file contains a list of hosts that are allowed to connect to the WWWOFFLE proxy. Each entry is matched against the name that WWWOFFLE obtains for the client when the connection is made, and access is allowed or denied accordingly. A form of wildcard matching is applied to the entries in this list, but no extra name lookups are performed.

For example, if you are using the private IP address space 192.168.*.* for your intranet, then the AllowedConnect section of the configuration file should look like this:

  AllowedConnect
  {
   192.168
  }

This allows all hosts with addresses in that range to connect to the WWWOFFLE proxy.
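The "192.168" example suggests that entries match whole dot-separated components of the client's name or address. The sketch below is one plausible reading of that rule, not WWWOFFLE's actual matching code: entry_matches and host_allowed are hypothetical names, and the second rule (an entry such as "example.com" matching the trailing components of a host name) is an assumption added for illustration. Like the proxy, it performs no name lookups.

```c
#include <string.h>

/* Hypothetical matching rule for one AllowedConnect entry (an assumption,
   not WWWOFFLE's code). Returns 1 if the entry matches the client's host. */
static int entry_matches(const char *entry, const char *host)
{
    size_t elen = strlen(entry), hlen = strlen(host);

    if (strcmp(entry, host) == 0)
        return 1;   /* exact name or address */
    if (hlen > elen && strncmp(host, entry, elen) == 0 && host[elen] == '.')
        return 1;   /* leading components: "192.168" matches "192.168.1.20" */
    if (hlen > elen && strcmp(host + hlen - elen, entry) == 0 &&
        host[hlen - elen - 1] == '.')
        return 1;   /* trailing components (assumed): "example.com" matches "pc1.example.com" */
    return 0;
}

/* Returns 1 if any of the n entries allows the host. Only string
   comparisons are made; no DNS lookups are performed. */
int host_allowed(const char *const entries[], int n, const char *host)
{
    int i;

    for (i = 0; i < n; i++)
        if (entry_matches(entries[i], host))
            return 1;
    return 0;
}
```

Matching on component boundaries rather than raw string prefixes is what keeps "192.168" from also admitting an address such as "192.1680.0.1".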
Q 2.3 Why can't remote clients follow all of the links?

Some of the links in the web pages that come out of the WWWOFFLE proxy need to point to other pages on the proxy itself. To generate them, WWWOFFLE needs to know the name of the host running the proxy, and this is specified in the LocalHost section of the configuration file. For example, if the computer running the WWWOFFLE proxy is called www-proxy, then the LocalHost section of the configuration file would look like this:

  LocalHost
  {
   www-proxy
   localhost
   127.0.0.1
  }

The first of the names is the one that WWWOFFLE uses to generate these links; the others are used for servers that are not cached by the proxy.
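The reason the order of the LocalHost entries matters can be shown with a small sketch in which links to the proxy's own pages are built from the first entry only. make_proxy_link is a hypothetical helper, not part of WWWOFFLE, and port 8080 is assumed to be the proxy's port, as in the URLs used elsewhere in this FAQ.

```c
#include <stdio.h>

/* Hypothetical helper: build a link to one of the proxy's own pages.
   Only the FIRST LocalHost entry is used, so the order of the section
   decides which name remote browsers are sent to.
   Returns the link length, or -1 on error or if buf is too small. */
int make_proxy_link(char *buf, size_t len, const char *const localhost[],
                    int n, int port, const char *path)
{
    int w;

    if (n < 1)
        return -1;      /* no LocalHost entries configured */
    w = snprintf(buf, len, "http://%s:%d%s", localhost[0], port, path);
    return (w < 0 || (size_t)w >= len) ? -1 : w;
}
```

With the example section above, a link to "/index/" becomes "http://www-proxy:8080/index/". If "localhost" were listed first, remote clients would be sent to their own machine instead.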
Q 2.4 What are the security issues with WWWOFFLE in a multi-user environment?

Security is something that I have considered to some extent when writing WWWOFFLE, although it has not been one of my biggest concerns. The issues are listed below.

Configuration file password
   A password can be specified in the StartUp section of the configuration file to limit access to the control features of WWWOFFLE. If it is set, this password must be used to put WWWOFFLE online, put it offline, purge the cache, stop the server, edit the configuration file, etc. If you have set a password, you should also make the configuration file readable only by authorised users.
   Note: the password is sent as plain text when the wwwoffle program is used to control the server, and the encryption used for the web page authentication is trivial.

WWWOFFLE server uid/gid
   The uid and gid of the wwwoffled server process can be controlled by the run-uid and run-gid options in the StartUp section of the configuration file. This uid/gid needs read access to the configuration file (write access is required only if the interactive edit page is used) and read/write access to the spool directory. If these options are used, the server must be started by root.

Deleting requested URLs
   Only the user that requested a page can delete that request, and then only if the deletion is done immediately. This is because the password for the deletion is made by hashing the contents of the request file in the outgoing directory, so read access to that directory must be denied for this to be secure.

The built-in web server
   This is a very simple server and will follow symbolic links. As a security feature, only files that are world readable can be accessed, and they must also be in a directory that the wwwoffled server can read. A check is not made on each directory component, so world-readable files in a directory that is readable only by the uid that runs wwwoffled are not safe.

Accessing the cache
   In general there is no problem with allowing users access to the cache, provided that it is read only. The only concern is that if purging is based on the access time of the files, then running grep on the cache will update those access times and upset the purging.

URLs with passwords
   URLs that contain usernames and passwords need to be stored in the cache, and for simplicity they are not hidden in any way. This means that any URL containing a username/password can show up in the log file (with the Debug or ExtraDebug levels only). The files in the cache also contain the username/password information and should be made inaccessible to users for that reason.
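The world-readable rule for the built-in web server can be illustrated with the POSIX stat() call. This is a sketch, not WWWOFFLE's code: world_readable_file is a hypothetical name. Like the behaviour described under "The built-in web server", it deliberately does not check the directory components of the path, and because stat() follows symbolic links, a link is judged by the file it points to.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical check: return 1 if path is a regular file whose mode grants
   read permission to "other" users (S_IROTH), else 0. stat() follows
   symbolic links, so a link is judged by its target. The directories
   leading to the file are NOT examined. */
int world_readable_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;   /* missing or inaccessible */
    return S_ISREG(st.st_mode) && (st.st_mode & S_IROTH) != 0;
}
```

A file with mode 0644 passes and one with mode 0600 does not, even if it sits in a directory that only the wwwoffled uid can enter, which is exactly the gap noted above.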
Section 3 - What to look for when WWWOFFLE fails

Q 3.1 Why does my browser return an empty page with WWWOFFLE but not without?

When a browser visits a web page through the WWWOFFLE proxy nothing is returned, but when the site is accessed directly without WWWOFFLE the page is visible. This can have a number of causes (all either reported to me or tested myself):

a) The web server requires the User-Agent header. If the header is missing or set to an uncommon value (not Netscape or IE), the server returns an empty page. In this case, if you have the CensorHeader section of the configuration file set to remove the User-Agent header, you should either stop censoring this header or set a replacement string that the server accepts.

b) As above, except that the server only needs the header to be present, whatever its value, to return a non-empty page. The solution is the same, except that any User-Agent string can be used.

c) The web server uses cookies to maintain state. This is common on sites that are more concerned with form than content, often without warning.
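To make causes (a) and (b) concrete, the sketch below builds the request a server would see with and without the User-Agent line. build_request is a hypothetical function, and the request is a simplified HTTP/1.0 GET, not the exact request that WWWOFFLE forwards.

```c
#include <stdio.h>

/* Hypothetical sketch: write a minimal HTTP/1.0 GET request into buf.
   If user_agent is NULL the User-Agent line is omitted, which is what
   the server receives when that header is censored.
   Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t len, const char *host,
                  const char *path, const char *user_agent)
{
    int n;

    if (user_agent != NULL)
        n = snprintf(buf, len,
                     "GET %s HTTP/1.0\r\nHost: %s\r\nUser-Agent: %s\r\n\r\n",
                     path, host, user_agent);
    else
        n = snprintf(buf, len, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A server of type (a) answers the first form only for particular values of user_agent; a server of type (b) answers it for any value but returns an empty page for the second form.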
Q 3.2 Why can't WWWOFFLE find a host when the browser without it can?

When WWWOFFLE looks up a hostname it uses the standard UNIX library (libc) function gethostbyname(). This only returns the host information if the answer that it receives from the domain name server (DNS) is authoritative. A non-authoritative answer is not returned; instead an error status is set.

Large browser projects (Netscape in particular) will use a non-authoritative answer if one is available, which means that they can access sites that are not available to WWWOFFLE. The source code for doing this is not obvious and requires quite low-level functions in the name resolver library (libresolv).

This problem only happens when the name server that you are using has poor connectivity to other name servers, or has some other name-resolving problem. If possible, the solution is to use a different name server, or to complain to the manager of the one that you use.

[If anybody has source code for non-authoritative name lookups, please tell me.]
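A minimal sketch of the lookup described above, using the real gethostbyname() call and reporting the h_errno status when it fails. The helper names lookup_host and lookup_error_text are hypothetical. The description given for TRY_AGAIN follows the C library's own comment on that code ("Non-Authoritative Host not found, or SERVERFAIL"), which is the error status this answer refers to. gethostbyname() is marked obsolete in later POSIX versions but is still provided by common C libraries.

```c
#define _DEFAULT_SOURCE     /* expose gethostbyname() and h_errno in glibc */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up name with gethostbyname(). On success,
   write the first IPv4 address as dotted-decimal text into addr and
   return 0; on failure return the h_errno value set by the resolver. */
int lookup_host(const char *name, char *addr, size_t len)
{
    struct hostent *he = gethostbyname(name);
    struct in_addr a;

    if (he == NULL)
        return h_errno != 0 ? h_errno : NO_RECOVERY;
    if (he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return NO_DATA;
    memcpy(&a, he->h_addr_list[0], sizeof a);
    snprintf(addr, len, "%s", inet_ntoa(a));
    return 0;
}

/* Hypothetical helper: human-readable text for an h_errno value. */
const char *lookup_error_text(int err)
{
    switch (err) {
    case HOST_NOT_FOUND: return "host not found (authoritative answer)";
    case TRY_AGAIN:      return "non-authoritative host not found or server failure; try again later";
    case NO_RECOVERY:    return "non-recoverable name server error";
    case NO_DATA:        return "name is valid but has no address";
    default:             return "unknown resolver error";
    }
}
```

When the problem in this answer occurs, a lookup that a browser completes will typically fail here with TRY_AGAIN rather than HOST_NOT_FOUND.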
Q 3.3 Why does my browser say "Connection reset by peer" when browsing?

This happens when using Netscape to access some web pages. The cause is not known, but the problem is only seen when WWWOFFLE is used, not when a direct connection is made.

[I believe that this problem (a peculiarity of Netscape) has been fixed in version 2.2c; please tell me if this is not the case.]
Q 3.4 Why does following a link on an FTP site go to the wrong server?

If there is a directory called '/dir' on an ftp server and you load the page 'ftp://server/', you get a directory listing that includes a link to '/dir'. Following this link should take the browser to 'ftp://server/dir/', but on some browsers it goes to 'ftp://dir/' instead.

I think that this behaviour is due to the browser and not WWWOFFLE. If you went to 'http://server/' and followed the link to '/dir/', you would expect to go to 'http://server/dir/' and not to 'http://dir/'. This is just common sense. Why the browser treats ftp differently from http I am not sure.

[This should be fixed in version 2.1 of WWWOFFLE, so it is not really applicable to this version of the FAQ.]
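The expected behaviour for the '/dir/' link follows the usual rule for resolving an absolute-path link: keep the scheme and host of the current page and replace only the path. A minimal sketch, assuming a base URL of the form scheme://host/path; resolve_abs_path is a hypothetical name, and query strings, fragments and '..' segments are ignored. It refuses links beginning with '//', because under the URL rules such a link names a different host.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: resolve an absolute-path reference such as "/dir/"
   against a base URL such as "ftp://server/", keeping the base's scheme
   and host and replacing its path. Returns 0 on success, -1 if ref is
   not an absolute path, base has no "://", or out is too small. */
int resolve_abs_path(const char *base, const char *ref, char *out, size_t len)
{
    const char *sep, *auth_end;
    size_t prefix_len;
    int n;

    if (ref[0] != '/' || ref[1] == '/')
        return -1;      /* relative path, or "//host/..." network-path reference */
    sep = strstr(base, "://");
    if (sep == NULL)
        return -1;
    auth_end = strchr(sep + 3, '/');    /* first '/' after the host part */
    prefix_len = auth_end ? (size_t)(auth_end - base) : strlen(base);
    n = snprintf(out, len, "%.*s%s", (int)prefix_len, base, ref);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The same rule gives 'ftp://server/dir/' for the ftp case and 'http://server/dir/' for the http case, which is the consistency the answer above expects.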
Section 4 - Undocumented features of WWWOFFLE

Q 4.1 How can I see what monitored pages were downloaded last time online?

The easiest way to do this is to go to the monitored web pages index and sort the pages by "Access Time" (http://localhost:8080/index/monitor/?atime). Each page is accessed when it is monitored, so the most recently monitored pages appear at the top of this listing.
Q 4.2 How can I do a recursive fetch on a regular interval?

This is a combination of the recursive fetch option and the monitor option. Select the page that you want on the recursive fetch page (http://localhost:8080/refresh-options/), choose the options that you want and press the button; you will be presented with a page telling you that the request has been recorded. This page has a link that allows you to monitor the request; it takes you to the normal monitor page (http://localhost:8080/monitor-options) with the URL already filled in.
Q 4.3 How can I stop users from accessing the index?

Access to the indexes can be denied to users by using the DontGet section of the configuration file:

  DontGet
  {
   http://localhost:8080/index
  }

You must make sure that the hostname that you give is the first one in the LocalHost section, since this is the one that will be checked.
Section 5 - More information about WWWOFFLE

Q 5.1 Who wrote WWWOFFLE, When and Why?

The WWWOFFLE program was written by Andrew M. Bishop (amb@gedanken.demon.co.uk) in 1996, 1997 and 1998.

There is a WWWOFFLE home page on the World Wide Web, available via the author's home page at http://www.gedanken.demon.co.uk/ . It is kept updated with news about the program as new versions become available.

An earlier program by the same author, written in perl, had been used for a while, but it was realised that its functionality was insufficient for anything more than light use. Work on the WWWOFFLE program itself started during the Christmas holiday of 1996, initially as a hack to improve the perl version. The release of the Beta version 0.9 at the beginning of January 1997 generated a lot of interest, which led to the release of version 1.0 later that same month. More versions followed until December of that year, when version 2.0 was released. This contained several large new features (like FTP) and included a rewrite of a large proportion of the code to make it easier to maintain and build on; as part of this the cache format was changed completely. Version 2.1 was released in March 1998 with some more new features, version 2.2 in June 1998 with more features, and version 2.3 in August 1998 with even more features.

The WWWOFFLE program can be freely distributed according to the terms of the GNU General Public License (see the file `COPYING').
Q 5.2 How do I report bugs in WWWOFFLE?

By e-mail: send them to me at amb@gedanken.demon.co.uk with WWWOFFLE somewhere in the subject line. You can also report bugs or provide comments via the feedback form on the WWWOFFLE home page on the World Wide Web, accessible via http://www.gedanken.demon.co.uk/ .

Before doing this, check the FAQ and the WWWOFFLE web page to see whether the answer is already there. If it is not and you want to report the problem to me, it helps if you can reproduce the error; in particular, start wwwoffled as 'wwwoffled -d5 -c wwwoffle.conf' and capture the debugging output for the session that shows the error.