HMMER
User's Guide
Next: Sequence files
Up: File formats
Previous: HMMER null model files
Observed counts of emissions (residues) and transitions (insertions
and deletions) in a multiple alignment are combined with
Dirichlet priors to convert them to probabilities
in an HMM.
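As a rough illustration of what "combining counts with a Dirichlet prior" means in the single-component case, the sketch below computes the standard mean posterior estimate, (count + alpha) / (total counts + total alphas). The function name, the counts, and the alpha values are made up for illustration; they are not HMMER's defaults, and this is not a transcription of HMMER's own code.

```python
def posterior_mean(counts, alphas):
    """Mean posterior probabilities under a single-component Dirichlet prior."""
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    total = sum(counts) + sum(alphas)
    # Each alpha acts like a pseudocount added to the observed count.
    return [(c + a) / total for c, a in zip(counts, alphas)]


# Hypothetical observed counts for A, C, G, T in one alignment column,
# with a hypothetical uniform prior (illustrative values only).
counts = [6.0, 1.0, 0.0, 1.0]
alphas = [1.0, 1.0, 1.0, 1.0]
probs = posterior_mean(counts, alphas)
print(probs)
```

Note that the residue with zero observed counts still receives a nonzero probability, which is the practical reason for using a prior at all.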
For protein models, by default, HMMER uses a nine-component mixture
Dirichlet prior for match emissions, and single-component Dirichlet
priors for insert emissions and transitions. The nine-component match
emission mixture Dirichlet comes from the work of Kimmen Sjölander
[Sjölander et al., 1996].
For DNA/RNA models, by default, HMMER uses single-component
Dirichlets.
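For a mixture Dirichlet, one common way to turn counts into probabilities is to weight each component's posterior mean by the posterior probability of that component given the counts. The sketch below follows that textbook approach; the function names and the two-component prior are hypothetical, and HMMER's actual implementation may differ in detail.

```python
import math


def log_evidence(counts, alphas):
    """log P(counts | alphas), omitting a term that is the same for every component."""
    n, a = sum(counts), sum(alphas)
    result = math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alphas):
        result += math.lgamma(c + al) - math.lgamma(al)
    return result


def mixture_posterior_mean(counts, components):
    """components: list of (mixture_coefficient, alphas) pairs, coefficients > 0."""
    log_w = [math.log(q) + log_evidence(counts, al) for q, al in components]
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    n = sum(counts)
    probs = [0.0] * len(counts)
    for wk, (_, al) in zip(w, components):
        denom = n + sum(al)
        for i, (c, a) in enumerate(zip(counts, al)):
            probs[i] += (wk / z) * (c + a) / denom
    return probs


# Hypothetical two-component prior over A, C, G, T (illustrative values only).
components = [
    (0.5, [2.0, 0.5, 0.5, 0.5]),  # component favoring A
    (0.5, [0.5, 0.5, 0.5, 2.0]),  # component favoring T
]
probs = mixture_posterior_mean([5.0, 0.0, 0.0, 0.0], components)
print(probs)
```

With a single component, this reduces to the plain (count + alpha) / (total counts + total alphas) estimate.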
Two example prior files,
amino.pri and nucleic.pri, are provided
in the Demos subdirectory of the HMMER distribution. (They are
copies of the internal default HMMER prior settings.)
Prior files are parsed the same way as null model files: everything
after a # on a line is a comment, the order in which the fields
occur is significant, and fields must be separated
by either blanks or newlines.
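The parsing rules above can be sketched as a small tokenizer. This is an illustrative Python helper, not HMMER's C code; the name tokenize is hypothetical, and it assumes that any run of whitespace (including tabs) separates fields, which is slightly broader than "blanks or newlines".

```python
def tokenize(text):
    """Split prior-file text into an ordered list of fields, dropping '#' comments."""
    fields = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # discard everything after the first '#'
        fields.extend(line.split())   # split the remainder on whitespace
    return fields


sample = """\
Dirichlet   # strategy
Amino       # alphabet type
1           # number of transition mixture components
"""
fields = tokenize(sample)
print(fields)
```

Because only the order of the fields matters, the layout of a file across lines is free; line breaks and comments serve readability only.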
A prior file consists of the following fields:
- [Strategy] Must be the keyword Dirichlet. Currently
this is the only available prior strategy in the public HMMER release.
- [Alphabet type] Must be either Amino or
Nucleic.
- [Transition priors] 1 + 8a fields, where a
is the number of transition mixture components. The first field is the
number of transition prior components, a (often just 1). Then, for each
component, eight fields follow: the prior probability of that mixture
component (1.0 if there is only one component), then the Dirichlet
alpha parameters for the seven transitions, in the order
M→M, M→I, M→D, I→M, I→I, D→M, D→D
(where M, I, and D denote match, insert, and delete states).
- [Match emission priors] 1 + (5 or 21)b fields,
where b is the number of match emission mixture components. The first
field is the number of match emission mixture components, b. Then, for
each component, 5 fields (Nucleic) or 21 fields (Amino) follow: the
prior probability of that mixture component (1.0 if there is only one
component), then the Dirichlet alpha parameters for the 4 or 20
residue types, in alphabetical order.
- [Insert emission priors] 1 + (5 or 21)c fields,
where c is the number of insert emission mixture components. The
first field is the number of insert emission mixture components,
c. Then, for each component, 5 fields (Nucleic) or 21 fields (Amino)
follow: the prior probability of that mixture component (1.0 if there
is only one component), then the Dirichlet alpha parameters for the
4 or 20 residue types, in alphabetical order.
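The field layout listed above can be sketched as a small reader. This is a minimal Python illustration, not a port of HMMER's parser: the names tokenize, read_mixture, and parse_prior are hypothetical, keywords are matched exactly as spelled above, and rejecting leftover fields is a choice made for this sketch. The sample file uses made-up alpha values, not HMMER's defaults.

```python
def tokenize(text):
    """Ordered fields of a prior file, with '#' comments removed."""
    return [f for line in text.splitlines() for f in line.split("#", 1)[0].split()]


def read_mixture(fields, pos, width):
    """Read a component count, then per component one coefficient and `width` alphas."""
    if pos >= len(fields):
        raise ValueError("prior file ended early")
    n = int(fields[pos])
    pos += 1
    components = []
    for _ in range(n):
        chunk = fields[pos:pos + 1 + width]
        if len(chunk) != 1 + width:
            raise ValueError("prior file ended early")
        values = [float(x) for x in chunk]
        components.append((values[0], values[1:]))
        pos += 1 + width
    return components, pos


def parse_prior(text):
    fields = tokenize(text)
    if len(fields) < 2 or fields[0] != "Dirichlet":
        raise ValueError("strategy must be Dirichlet")
    if fields[1] not in ("Amino", "Nucleic"):
        raise ValueError("alphabet type must be Amino or Nucleic")
    size = 20 if fields[1] == "Amino" else 4
    transitions, pos = read_mixture(fields, 2, 7)   # 1 + 8a fields
    match, pos = read_mixture(fields, pos, size)    # 1 + (5 or 21)b fields
    insert, pos = read_mixture(fields, pos, size)   # 1 + (5 or 21)c fields
    if pos != len(fields):
        raise ValueError("unexpected extra fields")  # strictness chosen for this sketch
    return {"alphabet": fields[1], "transitions": transitions,
            "match": match, "insert": insert}


# Hypothetical nucleic prior file; all numeric values are illustrative only.
SAMPLE = """\
Dirichlet             # strategy
Nucleic               # alphabet type
1                     # one transition component
1.0                   # its mixture coefficient
0.7 0.1 0.1 0.2 0.1 0.9 0.5   # seven transition alphas
1                     # one match emission component
1.0
1.0 1.0 1.0 1.0       # alphas for A C G T
1                     # one insert emission component
1.0
1.0 1.0 1.0 1.0
"""
prior = parse_prior(SAMPLE)
print(prior["alphabet"], len(tokenize(SAMPLE)))
```

With a = b = c = 1 and the Nucleic alphabet, the sample contains 2 + 9 + 6 + 6 = 23 fields, which matches the counts given in the list above.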
In the code, prior files are parsed by prior.c:P7ReadPrior().
Direct comments and questions to <eddy@genetics.wustl.edu>