The speechtools package provides routines that are frequently used in automatic speech recognition applications.
Eventually, a complete set of components for building speech recognisers will be available. At present, only N-gram language models are fully implemented and tested.
At present, the speech tools contain two unfinished components for acoustic modeling.
The Acoustic_Model class provides a template for generalised acoustic modeling, and is used by the stack decoder (see section Decoding).
The beginnings of a Hidden Markov Model class are available in the HMM class.
N-gram language models are supported by the EST_Ngrammar class (see section EST_Ngrammar C++ Class) and its associated routines and programs. The programs described here supersede ngram, which is described in section Executable Programs. As with all Speech Tools programs, the command line option -help prints a summary of options.
The program build_ngram estimates N-gram language models from data. The data can be in any of the formats described in section Ngrammar data formats, and the language model can be saved in any of the formats described in section Ngrammar file formats.
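As an illustration of what estimating an N-gram model from data involves, the following is a minimal sketch of maximum-likelihood bigram estimation from a word sequence. It does not use the EST_Ngrammar class and is not taken from the build_ngram source; the names BigramModel and estimate_bigram are hypothetical, and no smoothing or backoff is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container (not part of the Speech Tools API):
// maps (previous word, word) to the estimated P(word | previous word).
typedef std::map<std::pair<std::string, std::string>, double> BigramModel;

// Maximum-likelihood estimate: P(w | v) = count(v, w) / count(v as history).
// No smoothing is applied, so unseen bigrams are simply absent from the model.
BigramModel estimate_bigram(const std::vector<std::string> &words)
{
    assert(words.size() >= 2);  // need at least one bigram

    std::map<std::pair<std::string, std::string>, int> pair_count;
    std::map<std::string, int> history_count;

    for (std::size_t i = 1; i < words.size(); ++i)
    {
        ++pair_count[std::make_pair(words[i - 1], words[i])];
        ++history_count[words[i - 1]];
    }

    BigramModel model;
    std::map<std::pair<std::string, std::string>, int>::const_iterator it;
    for (it = pair_count.begin(); it != pair_count.end(); ++it)
        model[it->first] =
            static_cast<double>(it->second) / history_count[it->first.first];
    return model;
}
```

For the word sequence a b a c a b, the history a occurs three times and is followed by b twice, so this sketch gives P(b | a) = 2/3.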
The program test_ngram computes the entropy and perplexity of a language model on test data. At present, it can only load language models in CSTR format (see section Ngrammar file formats).
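In the usual definition, entropy and perplexity are related as follows: given the probability p_i that the model assigns to each of the N test words, the per-word entropy is H = -(1/N) sum log2 p_i, and the perplexity is 2 to the power H. The sketch below computes both from a list of word probabilities. It is illustrative only: the function names are hypothetical and the code is not taken from the test_ngram source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-word entropy in bits: H = -(1/N) * sum_i log2 p_i,
// where p_i is the model probability of the i-th test word given its history.
double entropy_bits(const std::vector<double> &word_probs)
{
    assert(!word_probs.empty());
    double log_sum = 0.0;
    for (std::size_t i = 0; i < word_probs.size(); ++i)
    {
        assert(word_probs[i] > 0.0);  // a zero probability gives infinite entropy
        log_sum += std::log(word_probs[i]) / std::log(2.0);
    }
    return -log_sum / static_cast<double>(word_probs.size());
}

// Perplexity is 2 raised to the per-word entropy.
double perplexity(const std::vector<double> &word_probs)
{
    return std::pow(2.0, entropy_bits(word_probs));
}
```

A model that assigns probability 0.25 to every test word has an entropy of 2 bits and a perplexity of 4, which matches the intuition of choosing uniformly among four words.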
The program ch_ngram converts N-gram language models between file formats and performs linear interpolation of N-gram language models.
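Linear interpolation combines two models by taking a weighted average of their probabilities, P(w) = lambda * P1(w) + (1 - lambda) * P2(w). The following is a minimal sketch for a single history. The Distribution type and the interpolate function are hypothetical names introduced for illustration, and the code does not reflect how ch_ngram is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation: a distribution over next words for one
// fixed history, mapping each word to its probability.
typedef std::map<std::string, double> Distribution;

// Linear interpolation: P(w) = lambda * P1(w) + (1 - lambda) * P2(w).
// A word missing from one model is treated as having probability 0 there.
Distribution interpolate(const Distribution &p1, const Distribution &p2,
                         double lambda)
{
    assert(lambda >= 0.0 && lambda <= 1.0);
    Distribution mixed;
    Distribution::const_iterator it;
    for (it = p1.begin(); it != p1.end(); ++it)
        mixed[it->first] += lambda * it->second;
    for (it = p2.begin(); it != p2.end(); ++it)
        mixed[it->first] += (1.0 - lambda) * it->second;
    return mixed;
}
```

If both input distributions sum to one, so does the result, for any lambda between 0 and 1.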
A basic stack decoder template is implemented in the Decoder class. It uses template acoustic and language model classes, which are designed to be as general as possible (not limited to HMMs and N-grams, for example).
This decoder may form the basis for future speech recognition work at CSTR.
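To make the idea of a stack decoder parameterised by model classes concrete, the following is a minimal sketch of best-first search over word sequences. It is not the interface of the Decoder or Acoustic_Model classes: the Hypothesis struct, the stack_decode function, the member functions log_score and log_prob, and the "<s>" sentence-start token are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// A partial hypothesis: the words so far and their accumulated log score.
struct Hypothesis
{
    std::vector<std::string> words;
    double log_score;
    // Order so that std::priority_queue pops the highest score first.
    bool operator<(const Hypothesis &other) const
    {
        return log_score < other.log_score;
    }
};

// Best-first (stack) search over word sequences of a fixed length.
// AM and LM are hypothetical model types, assumed to provide:
//   double AM::log_score(std::size_t position, const std::string &word) const
//   double LM::log_prob(const std::string &previous, const std::string &word) const
// Both must return values <= 0 (log probabilities), so that extending a
// hypothesis never raises its score and the first complete hypothesis
// popped from the stack is the best one.
template <class AM, class LM>
Hypothesis stack_decode(const AM &am, const LM &lm,
                        const std::vector<std::string> &vocabulary,
                        std::size_t length)
{
    assert(!vocabulary.empty());
    std::priority_queue<Hypothesis> stack;
    Hypothesis start;
    start.log_score = 0.0;
    stack.push(start);

    while (true)
    {
        Hypothesis best = stack.top();
        stack.pop();
        if (best.words.size() == length)
            return best;  // complete, and no other hypothesis can beat it

        // "<s>" is an assumed sentence-start token for the first word.
        const std::string previous =
            best.words.empty() ? std::string("<s>") : best.words.back();
        for (std::size_t i = 0; i < vocabulary.size(); ++i)
        {
            Hypothesis next = best;
            next.words.push_back(vocabulary[i]);
            next.log_score += am.log_score(best.words.size(), vocabulary[i]) +
                              lm.log_prob(previous, vocabulary[i]);
            stack.push(next);
        }
    }
}
```

Because the search only relies on the two member functions named above, any model type that supplies them can be plugged in, which is the sense in which such a decoder is not tied to HMMs or N-grams. A practical decoder would also prune the stack; this sketch keeps every hypothesis.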
A straightforward Viterbi decoder is already available; see (viterbi code description HERE), or see section Executable Programs.