dtd2html is a Perl program that generates an HTML document describing an SGML document type definition (DTD) and allowing hypertext navigation of that DTD.
dtd2html generates various HTML files for hypertext navigation of an SGML DTD. The files generated are as follows:
DTD-HOME.html
This file is the home page of the HTML document. It contains the basic links to start navigating through the DTD. The name of this file can be changed with the -homefile option. User text may be added to this page via the Description File.
TOP-ELEM.html
This file lists the top-most elements of the DTD and contains the links to the element pages describing each top-most element. The name of this file can be changed with the -topfile option.
ALL-ELEM.html
This file contains a list of all elements defined in the DTD. This page allows quick access to any individual element description page. The name of this file can be changed with the -allfile option.
ENTS.html
(Optional) This file contains a list of general entities defined in the DTD. It is only generated if the -ents option is specified during program invocation. The name of this file can be changed with the -entsfile option.
DTD-TREE.html
(Optional) This file contains the content hierarchy tree(s) of the top-most element(s) in the DTD. It is only generated if the -tree option is specified during program invocation. The name of this file can be changed with the -treefile option.
element.html
For each element defined in the DTD, an element description file is generated with a filename of the element name suffixed by ".html". User text may be added to this page via the Description File.
element.attr.html
For each element defined in the DTD, a file is generated describing the attributes defined for the element. User text may be added to this page via the Description File.
element.cont.html
For each element defined in the DTD, a file is generated listing the content model declaration of the element as declared in the DTD.
Once all the files are generated, one needs only to create a link to the DTD-HOME page in the Web server being used.
More information on the content of each file is in the HTML File Descriptions section.
dtd2html is invoked from a command-line shell, with the following syntax:
% dtd2html [options] filename
filename is the SGML DTD to be parsed for generating the HTML files. The following is the list of options available:
-allfile filename
Set the filename for the file listing all elements in the DTD to filename. The default name is "ALL-ELEM.html".
-catalog filename
Use filename as the file for mapping public identifiers and external entities to system files. If -catalog is not specified, "catalog" is used as the default filename. See Resolving External Entities for more information.
-contnosort
The base content list of the element.html page is listed as declared in the content model declaration. Normally, the elements are listed in sorted order and with no group delimiters, group connectors, or occurrence indicators.
-descfile filename
Use filename as the source for element descriptions in the DTD. If this argument is not specified, no description file is used. See Description File for more information.
-docurl URL
Use URL for the location of documentation on dtd2html. The default URL is "file:/usr/doc/perlsgml/dtd2html.html".
-dtdname string
Set the name of the DTD to string. If not specified, dtd2html determines the name of the DTD from its filename with the extension stripped off. If reading from standard input, this argument should be specified; otherwise, "Unknown" is used. The string " DTD" will be appended to the name of the DTD. If the -qref option is specified, then the string " DTD Quick Reference" is appended to represent the title of the quick reference document.
-elemlist
Generate a blank description file to standard output. See Description File for more information.
-ents
Generate a general entities page. The general entities types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite large and provide little usefulness to the document.
-entsfile filename
Set the filename for the general entities page to filename. The default name is "ENTS.html".
-entslist
Generate a blank description file to standard output containing ONLY general entity entries. This differs from -elemlist in that -elemlist outputs ONLY entries for elements and attributes. See Description File for more information.
-help
Print out a terse description of all options available. No HTML files are generated and all other options are ignored when this option is specified.
-homefile filename
Set the filename for the HTML home page for the DTD to filename. The default name is "DTD-HOME.html".
-keepold
This option is only valid if -updateel is specified. This option tells dtd2html to preserve any old descriptions when updating a description file.
-level #
Set the prune level of the content hierarchy tree to #. This option is only valid if -tree is specified.
-modelwidth #
Set the maximum output width for content model declarations to # for element.cont.html pages. The default value is 65.
-nodocurl
Do not insert a hyperlink to the dtd2html documentation in the DTD-HOME page.
-noreport
This option is only valid if -updateel is specified. This option tells dtd2html not to output a report when updating a description file.
-outdir path
Set the destination of the generated HTML files to path. Defaults to the current working directory.
-qref
Output a quick reference document of the DTD. The document is written to standard output (STDOUT). When this option is specified, only the quick reference document is generated. Therefore, the tree page and the -outdir option are ignored. See Quick Reference Mode for more information on the -qref option.
-qrefdl
Output a quick reference document of the DTD using the <DL> (definition list) HTML tag. When this option is specified, only the quick reference document is generated. Therefore, the tree page and the -outdir option are ignored. See Quick Reference Mode for more information. This option overrides the behavior of the -qref option.
-qrefhtag htag
Use htag as the header tag for the element names when the -qref option is specified. Defaults to '<H2>'.
-reportonly
This option is only valid if -updateel is specified. This option tells dtd2html to generate only a report when the -updateel option is specified.
-topfile filename
Set the filename for the file listing the top-most elements in the DTD to filename. The default name is "TOP-ELEM.html".
-tree
Generate the content hierarchy of the top-most elements defined in the DTD.
-treelink
Create an anchor in HTML pages to the tree page, even if -tree is not specified.
-treefile filename
Set the filename for the file containing the content hierarchy tree(s) of the DTD to filename. The default name is "DTD-TREE.html". This option is only valid if -tree is specified.
-treeonly
Create only the tree page. This option implies -tree.
-treetop string
Set the top-most elements to string. String is a comma-separated list of elements that dtd2html should treat as the top-most elements when printing the content hierarchy tree(s), and/or which elements get listed in the TOP-ELEM page. Normally, dtd2html computes the top-most elements of the DTD. This option overrides that computation.
-updateel file
Perform an update of the description file specified by file. This option allows one to update an element description file to contain any new elements/attributes that have been added to the DTD without affecting element descriptions already defined. See Updating Description File for more information.
-verbose
Print status messages to standard error on what dtd2html is doing. This option generates much output and is used mainly for debugging purposes.
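The title rule described under -dtdname can be sketched in Python as follows. This is only an illustration of the documented behavior, not dtd2html's Perl code: the function name dtd_title and its parameters are hypothetical, and dropping any leading directory from the filename is an assumption.

```python
import os


def dtd_title(filename=None, dtdname=None, qref=False):
    """Build the document title from -dtdname or the DTD filename.

    Hypothetical helper: filename=None stands for reading the DTD from
    standard input; dtdname stands for the value given to -dtdname.
    """
    if dtdname is None:
        if filename is None:
            dtdname = "Unknown"  # standard input and no -dtdname
        else:
            # Assumption: directories are dropped before stripping the extension.
            dtdname = os.path.splitext(os.path.basename(filename))[0]
    return dtdname + (" DTD Quick Reference" if qref else " DTD")


print(dtd_title("html.dtd"))
print(dtd_title(None))
print(dtd_title("html.dtd", dtdname="HTML 2.0", qref=True))
```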
All HTML files/pages generated contain hypertext links at the end of the page to the DTD-HOME, TOP-ELEM, ALL-ELEM, ENTS (optional), and DTD-TREE (optional) pages, unless stated otherwise.
The DTD-HOME page is the root of the HTML document. It contains the links to the other main pages as described above.
One can add documentation to the home page via the Description File or by manually editing the file.
The TOP-ELEM page contains the list of all top-most elements defined in the DTD. A top-most element is defined as an element that cannot be contained by another element, or that can be contained only by itself.
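The definition of a top-most element can be illustrated with a short Python sketch. It is not how dtd2html computes the list internally; the content mapping (element name to the set of elements its content model allows) and the element names are made up for the example.

```python
def top_most_elements(content):
    """Return the elements that no *other* element can contain.

    `content` is a hypothetical mapping: element name -> set of element
    names allowed in its content model.
    """
    contained_by_other = set()
    for parent, children in content.items():
        for child in children:
            if child != parent:  # being contained by itself does not count
                contained_by_other.add(child)
    return sorted(name for name in content if name not in contained_by_other)


content = {
    "book": {"chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": {"emph"},
    "emph": {"emph"},          # contained by itself and by para
    "note": {"note", "para"},  # contained only by itself
}
print(top_most_elements(content))  # ['book', 'note']
```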
The ALL-ELEM page contains an alphabetical list of all elements defined in the DTD.
The ENTS page contains an alphabetical list of general entities defined in the DTD. The general entity types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite large and of little use to the document. Also, entities are not handled when updating a description file.
The DTD-TREE page contains the content hierarchy tree(s) of the top-most elements of the DTD. The maximum depth of the tree can be set via the -level command-line option.
The tree shows the overall content hierarchy for an element. Content hierarchies of descendants are also shown. Elements that exist at a higher (or equal) level are pruned, as are elements at the maximum depth. The string "..." is appended to an element if it has been pruned due to pre-existence at a higher (or equal) level. The content of the pruned element can be determined by searching for the complete tree of the element (i.e., the element entry without "..."). Elements pruned because the maximum depth has been reached will not have "..." appended.
Example:
|__section+)
    |_(effect?, ...
    |__title, ...
    |__toc?, ...
    |__epc-fig*,
    |   |_(effect?, ...
    |   |__figure,
    |   |   |_(effect?, ...
    |   |   |__title, ...
    |   |   |__graphic+, ...
    |   |   |__assoc-text?)
Pruning must be done to avoid a combinatorial explosion. It is common for DTDs to define content hierarchies of infinite depth. Even with a predefined maximum depth, the generated tree can become very large.
Since the tree output is static, the inclusion and exclusion sets of elements are treated specially. Inclusion and exclusion elements inherited from ancestors are not propagated down to determine what elements are printed, but special markup is presented at a given element if inclusion or exclusion elements from ancestors exist. Inclusions and exclusions are not propagated down because of the pruning done. Since an element may occur in multiple contexts -- and have different ancestral inclusions and exclusions in effect -- an element without "..." may be the only place of reference to see the content hierarchy of the element.
Example:
D1
 |  {+} idx needbegin needend newline
 |
 |_(head,
 |   | {A+} idx needbegin needend newline
 |   | {-} needbegin needend
 |   |
 |   |_(((#PCDATA
 |   |____((acro
 |   |      | {A+} idx needbegin needend newline
 |   |      | {A-} needbegin needend
 |   |      |
 |   |      |_(((#PCDATA
 |   |      |____((super
 |  ...
 |   |      |______sub)))*))
...
Ignoring the lines starting with {}'s, one gets the content hierarchy of an element as defined by the DTD without concern for where it may occur in the overall structure. The {} lines give additional information regarding the element with respect to its existence within a specific context. For example, when an ACRO element occurs within D1,HEAD -- along with its normal content -- it can contain IDX and NEWLINE elements due to inclusions from ancestors. However, it cannot contain NEEDBEGIN and NEEDEND regardless of its defined content since an ancestor excludes them.
NEEDBEGIN, NEEDEND are excluded from ACRO.
Explanation of {}'s keys:
{+}
{+}
appended
to the subelement entry.
{A+}
{-}
{-}
appended to the subelement
listing.
{A-}
The element page describes the content of element. The element page is divided into the following sections:
The element.attr page describes the attributes of element. The element.attr page is divided into the following sections:
This page is not created if no attributes are defined for element.
The element.cont page gives the element's content model declaration as defined in the DTD. The element.cont page is divided into the following sections:
The content models are reformatted for better readability. The maximum width to use when reformatting is set by the -modelwidth option. Each element listed in the content model is a hyperlink to that element's page.
Here's an example of how dtd2html formats content model declarations:
(((#PCDATA|
   ((acro|book|emph|location|not|parm|term|var))|
   ((super|sub))|
   ((link|xref))|
   ((computer|cursor|display|keycap|softkey|user))|
   ((footnote|ineqn|ingraphic|fillin))|
   ((nobreak)))*))
This page is not created if element is defined with empty content.
dtd2html supports the ability to add documentation to the HTML files generated from a DTD through the -descfile option. Documentation can be added to the element pages, the attribute pages, and/or the ENTS page.
The basic syntax of the description file is as follows:
<?DTD2HTML identifier>
<P>
Description of identifier here.
</P>
<?DTD2HTML identifier>
<P>
Description of identifier here.
</P>
...
The line <?DTD2HTML identifier> signifies the beginning of a description entry for identifier. All text up to the next <?DTD2HTML ...> instruction or end-of-file is used as the identifier description.
The identifier can be one of the following formats:
element
An element name in the DTD. The following description text will go at the top of the element's page.
element*
An element in the DTD followed by a `*'. The following description text will go at the top of the element's attribute page.
element*attribute
An element in the DTD followed by a `*', which is followed by an attribute name of the element. The following description text will go below the attribute heading of the element's attribute page.
element+
An element in the DTD followed by a `+'. The following description text goes after each listing of the element in ALL-ELEM and in element pages. Because the description text appears inside a <LI> element, it is best to keep the description to a single sentence.
*attribute
A `*' followed by an attribute name. The following description text will go to any attribute named attribute, unless a specific description is given to the attribute via an element*attribute identifier. This identifier allows descriptions to be added to commonly shared attributes in one place.
entity&
A general entity followed by a `&'. The following description text will go after each entity listed in the ENTS page. Because the description text appears inside a <LI> element, it is best to keep the description to a single sentence.
identifier,identifier,...
A sequence of identifiers separated by commas, `,'. This allows a description to be shared among multiple identifiers. Note: there should be NO whitespace between the identifiers and the commas.
If the special element, -HOME-, is specified in the description file, then its description text will be put on the DTD-HOME page.
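The identifier formats listed above can be told apart by where the `*', `+', and `&' characters appear. The following Python sketch shows one way to do that; the function name and the returned labels are hypothetical and only mirror the descriptions in this section, not dtd2html's own code.

```python
def classify_identifier(identifier):
    """Name the page or list entry that a description identifier targets."""
    if identifier == "-HOME-":
        return "home page"
    if identifier.endswith("+"):
        return "element list entry"
    if identifier.endswith("&"):
        return "entity list entry"
    if identifier.startswith("*"):
        return "shared attribute"
    if identifier.endswith("*"):
        return "attribute page"
    if "*" in identifier:
        return "element attribute"
    return "element page"


# A comma-separated identifier shares one description among several targets.
for ident in "a,a*,a*href,a+,*id,amp&,-HOME-".split(","):
    print(ident, "->", classify_identifier(ident))
```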
dtd2html provides special instructions that may be used in a description file to control how dtd2html processes the file.
Special instructions follow a syntax similar to that of descriptive instructions:
<?DTD2HTML #instruction argument>
The following special instructions are defined:
#include argument
The include directive tells dtd2html to treat the argument as the name of a file containing description entries. Example:
<?DTD2HTML #include ents.dsc>
The example instructs dtd2html to open a file called ents.dsc and read it for description entries.
SGML comments are also supported in the description file. Comments are skipped by dtd2html. The syntax for a comment is the following:
<!-- This is a comment -->
dtd2html can only handle a comment that spans a single line (to make the parsing simple). Therefore, the following will cause dtd2html to add the comment text beyond the first line of the comment to an identifier's description:
<!-- This is a comment that spans
     more than one line. -->
If you want to put line breaks in the description file without them being applied to an identifier's description, then use the SGML short comment: <!>.
<!-- Include external descriptions -->
<!>
<?DTD2HTML #include ents.dsc>
<!>
<!-- A short description -->
<!>
<?DTD2HTML a+ >
Anchor; source and/or destination of a link
<!>
<!-- A shared description -->
<!>
<?DTD2HTML h1,h2,h3,h4,h5,h6 >
<p>
The six heading elements, <H1> through <H6>, denote section headings. Although the order and occurrence of headings is not constrained by the HTML DTD, documents should not skip levels (for example, from H1 to H3), as converting such documents to other representations is often problematic.
</p>
<!>
<!-- Element and attribute descriptions -->
<!>
<?DTD2HTML a >
<p>
The <A> element indicates a hyperlink anchor. At least one of the NAME and HREF attributes should be present.
</p>
<?DTD2HTML a* >
<?DTD2HTML a*href >
<p>
Gives the URI of the head anchor of a hyperlink.
</p>
<?DTD2HTML a*methods >
<p>
Specifies methods to be used in accessing the destination, as a whitespace-separated list of names. The set of applicable names is a function of the scheme of the URI in the HREF attribute. For similar reasons as for the <a href="title.html">TITLE</a> attribute, it may be useful to include the information in advance in the link. For example, the HTML user agent may chose a different rendering as a function of the methods allowed; for example, something that is searchable may get a different icon.
</p>
dtd2html ignores element descriptions that are empty or contain only the <P> tag.
If duplicate descriptions exist, the first one defined is used (in versions prior to 1.3.0, the last description defined was used).
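The rules above (entries delimited by <?DTD2HTML ...> lines, single-line comments skipped, empty descriptions ignored, first duplicate wins) can be sketched as a small Python parser. This is an illustration under assumptions, not dtd2html's implementation: special instructions such as #include merely end the current entry and are not followed, and "only the <P> tag" is interpreted as only opening or closing P tags.

```python
import re

ENTRY_RE = re.compile(r"^<\?DTD2HTML\s+([^#\s>][^\s>]*)\s*>$")
INSTRUCTION_RE = re.compile(r"^<\?DTD2HTML\s+#")
COMMENT_RE = re.compile(r"^<!(--.*--)?\s*>$")  # one-line comment or <!>


def parse_descriptions(text):
    """Map each identifier to its description text (first definition wins)."""
    entries = {}
    current_ids, body_lines = [], []

    def flush():
        body = "\n".join(body_lines).strip()
        # Ignore descriptions that are empty or hold nothing but P tags.
        if re.sub(r"</?p>", "", body, flags=re.IGNORECASE).strip():
            for ident in current_ids:
                entries.setdefault(ident, body)

    for raw in text.splitlines():
        line = raw.strip()
        if COMMENT_RE.match(line):
            continue
        entry = ENTRY_RE.match(line)
        if entry or INSTRUCTION_RE.match(line):
            flush()
            # A comma-separated identifier shares one description.
            current_ids = entry.group(1).split(",") if entry else []
            body_lines = []
        elif current_ids:
            body_lines.append(raw)
    flush()
    return entries


sample = """\
<!-- A short description -->
<!>
<?DTD2HTML a+ >
Anchor; source and/or destination of a link
<!>
<?DTD2HTML #include ents.dsc>
<?DTD2HTML h1,h2 >
<p>
Section headings.
</p>
<?DTD2HTML a* >
<?DTD2HTML h1 >
A later duplicate that is not used.
"""
print(sorted(parse_descriptions(sample)))  # ['a+', 'h1', 'h2']
```

A comment whose first line does not end with "-->" fails the one-line pattern, so its text would end up in the current description, which matches the limitation described earlier.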
To get started with a description file for a DTD, you can use the -elemlist option to dtd2html to generate a file with all elements and attributes defined in the DTD with empty descriptions.
To get a list of general entities, you can use the -entslist option to dtd2html to generate a file with general entities defined in the DTD with empty descriptions.
dtd2html supports the ability to generate a quick reference document of a DTD with the -qref option. The document generated is sent to standard output (STDOUT). Therefore, one should redirect STDOUT to a file. Example:
% dtd2html -qref html.dtd > htmlqref.html
No other output/files are generated while in quick reference mode.
The format of the quick reference document is as follows:
The title is determined by the -dtdname option (or the filename of the DTD if the option is not specified).
Each element is listed in an <H2> tag (or the tag specified by the -qrefhtag option) wrapped with the '<>' characters.
Any element description text follows the element heading if defined in a description file.
All elements are listed in alphabetical order.
Each element in the <H2> tag is wrapped with the <A NAME="element"> tag so one may cross-reference the element if desired. Example:
<H2><A NAME="body"><body></A></H2>.
An alternative format for the quick reference document may be generated with the -qrefdl command-line option. The format of the document shares the same properties as those of the -qref option, with the following exceptions:
Each element is still wrapped in a <A NAME> statement to allow cross-referencing.
Keep element descriptions as brief as possible. The quick reference document may get quite large for large DTDs. Care must also be taken if using the -qrefdl option; less HTML markup is available while in a <DL>.
Keep a separate description file just for the quick reference. Usually, the description file used in the normal dtd2html output would be inappropriate for a quick reference.
The -HOME- element description identifier may be used to place text before the list of elements. One could add a link to the DTD-HOME page that is generated by dtd2html when the -qref option is not used.
As a DTD changes, one can automatically update the element description file for the DTD to reflect the changes via the -updateel command-line option. The new updated description file is sent to standard output (STDOUT). Therefore, one should redirect STDOUT to a file. Example:
% dtd2html -updateel html.desc html.dtd > html-new.desc
When updating a description file, a report is prepended to the new description file. The report is contained in SGML comment declaration statements. Here's an example of what the report looks like:
<!-- Element Description File Update -->
<!-- Source File: sgm/html.desc -->
<!-- Source DTD: sgm/html.2.0/html.dtd -->
<!-- Deleting Old? Yes -->
<!-- Date: Mon Jun 27 00:25:41 EDT 1994 -->
<!-- New identifiers: -->
<!-- br, dl*, dl*compact, form, form*, form*action, form*enctype, -->
<!-- form*method, img*ismap, input, input*, input*align, -->
<!-- input*checked, input*maxlength, input*name, input*size, -->
<!-- input*src, input*type, input*value, option, option*, -->
<!-- option*selected, option*value, select, select*, -->
<!-- select*multiple, select*name, select*size, strike, textarea, -->
<!-- textarea*, textarea*cols, textarea*name, textarea*rows -->
<!-- Old identifiers: -->
<!-- dir*, dir*compact, key, link*name, menu*, menu*compact, ol*, -->
<!-- ol*compact, u, ul*, ul*compact -->
<!-- -->
Entity descriptions are NOT checked, and are excluded from the output. Only elements and attributes are processed.
If the description file processed contains "#include" instructions, these instructions are not preserved in the output. The output is a merging of all description entries processed.
If "#include" instructions are used, it may be best to use the -reportonly option. That way, you can determine what has changed and update the description file(s) manually.
The report will specify any new identifiers that were created, and any old identifier no longer applicable to the DTD.
By default, any old identifiers are removed in the new element description file. This can be overridden by the -keepold option.
The report will state if old identifiers are deleted or not.
ALL non-deleted identifiers keep all the description text specified in the source (original) description file.
If you desire no report, use the -noreport option.
If all you desire is to see what changes exist without creating a new description file, then use the -reportonly option. This option causes only the report to be generated. This may be used to help keep track of changes in a DTD.
Any user entered comments in the source element description file are lost in the update.
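The update rules above amount to a set comparison between the identifiers in the old description file and those derived from the current DTD. The Python sketch below illustrates that logic only; the function name, the dictionary representation of a description file, and the keep_old parameter (standing in for -keepold) are assumptions for the example.

```python
def update_descriptions(old_entries, dtd_identifiers, keep_old=False):
    """Return (updated entries, new identifiers, old identifiers).

    old_entries: hypothetical mapping identifier -> description text.
    dtd_identifiers: element and attribute identifiers in the current DTD.
    """
    dtd_ids = set(dtd_identifiers)
    new_ids = sorted(dtd_ids - set(old_entries))   # added to the DTD
    old_ids = sorted(set(old_entries) - dtd_ids)   # no longer in the DTD

    updated = {}
    for ident, text in old_entries.items():
        if ident in dtd_ids or keep_old:
            updated[ident] = text  # existing description text is preserved
    for ident in new_ids:
        updated[ident] = ""        # new identifiers start with no description
    return updated, new_ids, old_ids


old = {"a": "<p>Anchor.</p>", "u": "<p>Underline.</p>"}
updated, new_ids, old_ids = update_descriptions(old, ["a", "a*", "br"])
print(new_ids)          # ['a*', 'br']
print(old_ids)          # ['u']
print(sorted(updated))  # ['a', 'a*', 'br']
```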
The mapping of external entities to system files may be defined via the -catalog command-line option. The catalog provides you with the capability of mapping public identifiers to system identifiers (files) or of mapping entity names to system identifiers.
Catalog Syntax
The syntax of a catalog is a subset of SGML catalogs (as defined in SGML Open Draft Technical Resolution 9401:1994).
A catalog contains a sequence of the following types of entries:
PUBLIC public_id system_id
This maps public_id to system_id.
ENTITY name system_id
This maps a general entity whose name is name to system_id.
ENTITY % name system_id
This maps a parameter entity whose name is name to system_id.
Syntax Notes
A system_id string cannot contain any spaces. The system_id is treated as the pathname of a file.
Any line in a catalog file that does not follow the previously mentioned entries is ignored.
In case of duplicate entries, the first entry defined is used.
Example catalog file:
-- ISO public identifiers --
PUBLIC "ISO 8879-1986//ENTITIES General Technical//EN" iso-tech.ent
PUBLIC "ISO 8879-1986//ENTITIES Publishing//EN" iso-pub.ent
PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN" iso-num.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Letters//EN" iso-grk1.ent
PUBLIC "ISO 8879-1986//ENTITIES Diacritical Marks//EN" iso-dia.ent
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" iso-lat1.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Symbols//EN" iso-grk3.ent
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN" ISOlat2
PUBLIC "ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN" ISOamso
-- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN" html.dtd
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" ISOlat1.ent
ENTITY "%html-0" html-0.dtd
ENTITY "%html-1" html-1.dtd
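The catalog rules (three entry types, unrecognized lines ignored, first duplicate wins) can be sketched in Python as below. This is a simplified reader written for illustration, not the parser used by dtd2html; in particular, storing parameter entity names with a leading "%" and accepting both the `ENTITY % name` and `ENTITY "%name"` spellings are assumptions based on the syntax and example shown here.

```python
import re

TOKEN_RE = re.compile(r'"[^"]*"|\S+')  # a quoted string or a bare word


def parse_catalog(text):
    """Return (public, entity) mappings to system identifiers."""
    public, entity = {}, {}
    for line in text.splitlines():
        tokens = TOKEN_RE.findall(line)
        if len(tokens) == 3 and tokens[0] == "PUBLIC":
            public.setdefault(tokens[1].strip('"'), tokens[2])
        elif len(tokens) == 3 and tokens[0] == "ENTITY" and tokens[1] != "%":
            entity.setdefault(tokens[1].strip('"'), tokens[2])
        elif len(tokens) == 4 and tokens[:2] == ["ENTITY", "%"]:
            entity.setdefault("%" + tokens[2].strip('"'), tokens[3])
        # Any other line (comments included) is ignored.
    return public, entity


sample = '''\
-- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN" html.dtd
PUBLIC "-//IETF//DTD HTML//EN" duplicate-is-ignored.dtd
ENTITY "%html-0" html-0.dtd
ENTITY % html-1 html-1.dtd
'''
public, entity = parse_catalog(sample)
print(public)
print(entity)
```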
Environment Variables
The following envariables (i.e., environment variables) are supported:
P_SGML_PATH
This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. For example, if a system identifier is not an absolute pathname, then the paths listed in P_SGML_PATH are used to find the file.
SGML_CATALOG_FILES
This envariable is a colon (semi-colon for MSDOS users) separated list of catalog files to read. If a file in the list is not an absolute path, then the file is searched for in the paths listed in P_SGML_PATH and SGML_SEARCH_PATH.
SGML_SEARCH_PATH
This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. This envariable serves the same function as P_SGML_PATH. If both are defined, paths listed in P_SGML_PATH are searched before any paths in SGML_SEARCH_PATH.
The use of P_SGML_PATH is for compatibility with earlier versions.
SGML_CATALOG_FILES and SGML_SEARCH_PATH are supported for compatibility with James Clark's nsgmls(1).
The file specified by -catalog is read before any files specified by SGML_CATALOG_FILES.
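The lookup order described in this section can be summarized with a Python sketch. It illustrates the documented ordering only: the function names are hypothetical, returning None when nothing is found is an assumption, and whether dtd2html also tries the current directory is not stated here, so the sketch does not.

```python
import os


def search_dirs(env, sep=os.pathsep):
    """Directories to search: P_SGML_PATH first, then SGML_SEARCH_PATH."""
    dirs = []
    for var in ("P_SGML_PATH", "SGML_SEARCH_PATH"):
        dirs.extend(p for p in env.get(var, "").split(sep) if p)
    return dirs


def resolve(system_id, env, exists=os.path.exists, sep=os.pathsep):
    """Find a system identifier, using the search paths if it is relative."""
    if os.path.isabs(system_id):
        return system_id
    for directory in search_dirs(env, sep):
        candidate = os.path.join(directory, system_id)
        if exists(candidate):
            return candidate
    return None  # assumption: the sketch just reports "not found"


def catalog_files(cli_catalog, env, sep=os.pathsep):
    """Catalogs in reading order: -catalog (default "catalog") comes first."""
    files = [cli_catalog or "catalog"]
    files.extend(f for f in env.get("SGML_CATALOG_FILES", "").split(sep) if f)
    return files


env = {"P_SGML_PATH": "/usr/lib/sgml", "SGML_SEARCH_PATH": "/opt/sgml"}
print(search_dirs(env, sep=":"))
print(catalog_files(None, {"SGML_CATALOG_FILES": "site.cat:local.cat"}, sep=":"))
```

Passing os.environ as env applies the same logic to the real environment; the exists parameter is there so the lookup can be tried without touching the file system.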
This program is part of the perlSGML package; see <URL:file:/usr/doc/perlsgml/perlSGML.html>