

C++ classes for speech and linguistic objects

The EST provides C++ classes for most of the commonly used speech objects. In addition, there is a set of classes for complex linguistic data structures.

Waveform C++ class

This class is meant to take all the hassles out of waveforms...

Features include:

Member Functions

Constructors

There are two constructors. EST_Wave() is the default constructor, and EST_Wave(const EST_Wave &a) is the copy constructor.

Access member functions

File i/o

The EST_Wave class has a considerable file i/o library and can read and write nist, ulaw, ESPS sd, CSTR vox, Sun/Next snd, Microsoft riff, Apple aiff and unheadered (raw) files.

Manipulation

General Information

Other Useful functions for the EST_Wave Class

Some other useful (non-member) functions for the wave class are described.

Track C++ class

The Track class is intended for representing data as a function of time. Possible examples include F0 contours, cepstra and LPC parameters. Features include:

File i/o support for various file formats including ESPS FEA, htk, xmg, ascii, snns (Stuttgart neural net software) and xgraph.

Different algorithms require data to be in different file formats. For instance, some want the points evenly spaced, while others require a set of x/y values. The Track class provides the means to change easily from one format to another.

Breaks can be placed in the Track. A break at a point indicates that there is no value at that particular point. This can be used to indicate unvoiced regions in F0 contours and formants, for instance.

By default, Tracks have a single channel, but any number of channels can be used. The channels all share the same timing and break information.

Some algorithms want their data in millisecond spacing, others in second spacing. By default all times in EST are in seconds, but functions exist which automatically provide a millisecond equivalent.

Consult the header file `include/EST_Track.h' for a full list of member functions.

Design

The data is stored in three arrays: time, amplitude and value. The amplitude array stores the actual values, while the time array stores the times at which those values occur. All times are in seconds. In many applications, the points on the time axis are evenly spaced and the time array is therefore somewhat redundant. However, some applications (such as pitch synchronous marking) require the points to be at uneven intervals, and this is the main use of the time array.

The amplitude array is two dimensional: one dimension is indexed by frame, the other by channel. By default, it is assumed that the Track has a single channel.

Often a data type has missing values. For instance, in an F0 contour there are no F0 values during unvoiced regions of speech. The value array is used to store this information. A value of 1 indicates that the track has a proper value at that point; a value of 0 indicates that there is no proper value. These 0 values are referred to as breaks.
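As a rough illustration of the three arrays described above, the following sketch models them with standard containers. It is not the EST_Track implementation: the names SimpleTrack, fill_fixed_times() and count_values() are hypothetical, and starting the time axis at zero is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three arrays described above; not EST_Track.
struct SimpleTrack {
    std::vector<float> time;                    // time of each frame, in seconds
    std::vector<std::vector<float>> amplitude;  // amplitude[frame][channel]
    std::vector<int> value;                     // 1 = proper value, 0 = break

    explicit SimpleTrack(std::size_t frames, std::size_t channels = 1)
        : time(frames, 0.0f),
          amplitude(frames, std::vector<float>(channels, 0.0f)),
          value(frames, 1) {}

    std::size_t num_frames() const { return time.size(); }

    // Mark a frame as having no proper value.
    void set_break(std::size_t frame) {
        assert(frame < num_frames());
        value[frame] = 0;
    }

    bool is_break(std::size_t frame) const {
        assert(frame < num_frames());
        return value[frame] == 0;
    }
};

// Fill the time array with evenly spaced points, `shift` seconds apart,
// starting at zero (an assumption of this sketch).
void fill_fixed_times(SimpleTrack &t, float shift) {
    for (std::size_t i = 0; i < t.num_frames(); ++i)
        t.time[i] = static_cast<float>(i) * shift;
}

// Count the frames that hold a proper value (i.e. are not breaks).
std::size_t count_values(const SimpleTrack &t) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < t.num_frames(); ++i)
        if (!t.is_break(i))
            ++n;
    return n;
}
```

All channels of a frame share one entry in the time and value arrays, which mirrors the statement that channels share timing and break information.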

Different algorithms require the data to be in different formats, and a general function change_type() is used for this. Two major variations are provided. The first converts the time axis to fixed interval spacing, and the second gives the option of having as many breaks as there are time points, or only one break between stretches of contour.

Initialisation

There are four constructor functions for the Track class:

Member functions

Controlling breaks

Track configuration

Other variables

EST_Stream_Item C++ Class

A Stream_Item object contains information relating to a single label. The principal fields are name, relating to the identity of the unit (e.g. phoneme name), and end which is the end point in seconds.

There are a number of secondary fields which are used to store extra information and to allow items in one stream to be linked to items in another.

Consult `include/EST_Stream.h' for definitions and member functions.

Storing Additional Information

Obviously one often needs to store more information about an item than simply its name and end point. There are various mechanisms within the EST_Stream_Item class which make provision for this.

Fields

The simplest method is to store the additional information using the fields mechanism, which allows a single EST_String to be tagged to the EST_Stream_Item. The set_field_names() function can be used to set this and fields() is used to read it.

If the literal flag is set when the EST_Stream.load() function is called on an xlabel file, any information following the separator (";" by default) after the label name will be placed in the fields variable.

Contents

Because you may wish to associate arbitrary information with a Stream Item, we provide a method to store arbitrary data in a Stream Item.

The member function set_contents() takes a void * pointer to data and a pointer to a garbage collection function that will delete the contents appropriately when called with them. Reference counts are used to keep track of users of the contents, and the garbage collection function is called only when the last instantiation of an EST_Stream_Item referencing the contents is deleted.

For example suppose we wished to associate an EST_Wave with a stream item. We can define a gc function as

static void gc_wave(void *w) { delete (EST_Wave *)w; }

The following code will set the contents to a given wave

   EST_Wave *w;
   EST_Stream_Item i;

   w = new EST_Wave(wave);
   i.set_contents(w,gc_wave);

Note that the given contents must not be deleted by any other destructor so here we copy the wave into a new wave before setting the contents.

The member function contents() returns a void * pointer to the data, which should be cast to whatever structure or class is in the contents.

Note that the reason we offer this, rather than depending on some inherited class or on named fields of specific classes in the stream item, is that we wish to allow arbitrary classes and structures to be held as contents without having to recompile the speech tools.
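The reference counting described above can be sketched independently of the speech tools. The Contents class below is a hypothetical stand-in, not the EST_Stream_Item code: it shares one void * pointer between copies and calls the user-supplied gc function only when the last copy is destroyed. The helper gc_int() and the counter ints_collected exist only to make the behaviour observable.

```cpp
#include <cassert>

typedef void (*gc_function)(void *);

// Hypothetical holder: shares one void * between copies and calls the
// user-supplied gc function when the last copy goes away.
class Contents {
    struct Shared {
        void *data;
        gc_function gc;
        int refcount;
    };
    Shared *s;

    // Drop this object's reference; collect the data if it was the last one.
    void release() {
        if (s != nullptr && --s->refcount == 0) {
            if (s->gc != nullptr)
                s->gc(s->data);
            delete s;
        }
        s = nullptr;
    }

public:
    Contents() : s(nullptr) {}
    Contents(const Contents &c) : s(c.s) {
        if (s != nullptr)
            ++s->refcount;
    }
    ~Contents() { release(); }

    Contents &operator=(const Contents &c) {
        Shared *incoming = c.s;  // remember first: c may be *this
        if (incoming != nullptr)
            ++incoming->refcount;
        release();
        s = incoming;
        return *this;
    }

    // Take ownership of data; gc will be called on it exactly once.
    void set_contents(void *data, gc_function gc) {
        assert(data != nullptr);
        release();
        s = new Shared;
        s->data = data;
        s->gc = gc;
        s->refcount = 1;
    }

    void *contents() const { return s != nullptr ? s->data : nullptr; }
};

// Example payload type and gc function; the counter records each call.
static int ints_collected = 0;
static void gc_int(void *p) {
    delete static_cast<int *>(p);
    ++ints_collected;
}
```

Copying a Contents object is cheap because only the count changes, which is also why the data handed to set_contents() must not be deleted by anyone else.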

Features

To allow a more general feature mechanism for stream items, a list of names and feature values may be associated with an item. Feature names are strings while values may be integers, floats or strings. The class EST_Val is a general class that will assign and convert between ints, floats and strings as required.

EST_Stream C++ Class

The EST_Stream class is used to represent lists of linguistic objects, such as phones, syllables or words. It is based on a speech synthesis architecture developed by Paul Taylor and Alan Black at ATR (black94). This is a C++ implementation.

An EST_Stream object is essentially a list of EST_Stream_Items, with some additional functions for file i/o etc.

It has two additional fields, stream_name and pos_name. stream_name is used for storing the type of the stream, e.g. "phoneme" or "syllable". pos_name is used for storing the name of a single special label. The reason for this is rather obscure, but it is useful when dealing with files where the labels represent binary features.

The header file `include/EST_Stream.h' contains the full class definition.

Accessing an EST_Stream

The easiest way to access the information in an EST_Stream is to use a for loop iteration idiom similar to that used in the EST_TList class.

Declare an iteration variable as a pointer to EST_Stream_Item:


        EST_Stream s;
        EST_Stream_Item *p;

        for (p = s.head(); p != 0; p = next(p))
                cout << p->name();

Note that, unlike the EST_TList class, there is no data encapsulation and access is provided directly through the pointer. This may be changed in later versions.

EST_Utterance C++ Class

At present the Utterance class consists of a list of streams, some accessing functions and a load and save function.

Any number of streams can be accommodated within an Utterance object. They must be initialised before they can be used, and this is done with the create_stream() member function. This function takes the name of the stream to be created as an argument. Internally the streams are kept as a list.

The header file `include/EST_Utterance.h' contains the full class definition.

The following member functions also exist:

The << operator prints the structure.

Utterance files are a slight variation on ESPS/xwaves label files. An example is given below.


separator ;
nfields 1
# Word
	1.6 26 	example; a3;  Syllable 3 4 5 ;  Phoneme 7 8 9 10 11 12 13 14 
# Syllable
	1.0 26 	S; a3;  Word 3 ;  Phoneme 7 8 9 
	1.3 26 	S; a4;  Word 3 ;  Phoneme 10 11 
	1.5 26 	S; a5;  Word 3 ;  Phoneme 12 13 14 
# Phoneme
	0.8 26 	e; a7;  Syllable 3 ;  Word 3 
	0.9 26 	g; a8;  Syllable 3 ;  Word 3 
	1.0 26 	z; a9;  Syllable 3 ;  Word 3 
	1.1 26 	a; a10;  Syllable 4 ;  Word 3 
	1.2 26 	m; a11;  Syllable 4 ;  Word 3 
	1.3 26 	p; a12;  Syllable 5 ;  Word 3 
	1.4 26 	e; a13;  Syllable 5 ;  Word 3 
	1.5 26 	l; a14;  Syllable 5 ;  Word 3 

Every time a "#" occurs, a new stream is started, named by the string after the "#". Each unit is on a separate line. The first field is its end point in seconds. The second field is its colour and is irrelevant (all queries to Entropic about this!). The third field is the name of the unit. After the name is a separator (";" in this example) indicating a separate xlabel display field. The first field after the separator is a number indicating the unit's address. Addresses need not be sequential or start at zero; they only need to be unique in the stream. The remaining fields are relations. Each relations field starts with the name of the stream it refers to, followed by a list of the addresses that the unit is linked to. There can be arbitrarily many relations, each introduced by a separator and consisting of a name and a list of addresses.
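To make the layout of a single unit line concrete, here is a small parsing sketch written against the example file above. It is not the EST_Utterance loader: UnitLine and parse_unit_line() are hypothetical names, the address field is kept verbatim as text (e.g. "a3"), and header lines such as "separator ;" or "# Word" are assumed to be handled elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one unit line of an utterance file.
struct UnitLine {
    double end;           // end point in seconds
    int colour;           // xlabel colour, not otherwise used
    std::string name;     // name of the unit
    std::string address;  // address field, kept as written in the file
    std::map<std::string, std::vector<int> > relations;  // stream -> addresses
};

// Parse "end colour name; address; Stream a b c ; Stream d e ..." into u.
// Returns false if the line does not have at least a name and an address.
bool parse_unit_line(const std::string &line, UnitLine &u, char separator = ';') {
    std::istringstream in(line);
    if (!(in >> u.end >> u.colour))
        return false;

    // Split the remainder of the line on the separator character.
    std::string rest;
    std::getline(in, rest);
    std::vector<std::string> fields;
    std::istringstream field_stream(rest);
    std::string field;
    while (std::getline(field_stream, field, separator))
        fields.push_back(field);
    if (fields.size() < 2)
        return false;

    std::istringstream name_stream(fields[0]);
    std::istringstream address_stream(fields[1]);
    if (!(name_stream >> u.name) || !(address_stream >> u.address))
        return false;

    // Every further field is a relation: a stream name, then addresses.
    u.relations.clear();
    for (std::size_t i = 2; i < fields.size(); ++i) {
        std::istringstream relation_stream(fields[i]);
        std::string stream_name;
        if (!(relation_stream >> stream_name))
            continue;  // empty trailing field
        std::vector<int> &addresses = u.relations[stream_name];
        int a;
        while (relation_stream >> a)
            addresses.push_back(a);
    }
    return true;
}
```

Applied to the Word line of the example, this yields the name "example", the address "a3", three Syllable addresses and eight Phoneme addresses.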

EST_Ngrammar C++ Class

The EST_Ngrammar class provides N-gram language models of various types. There is no built-in limit on N, rather it is limited by how much memory you have available. Vocabulary items are internally indexed by ints, which allows large vocabularies. Executable programs for building and testing N-gram language models are described above.

Vocabulary

The first N-1 items (predictors) in the N-gram are used to predict the probability distribution of the N'th item (predictee). By default, predictor(s) and predictee are taken from the same vocabulary, but this need not be the case (except for backed-off models).

Internal representation

The class constructors take as one of their arguments the internal representation to be used. This is of type EST_Ngrammar::representation_t

Dense

The most basic representation, where all possible N-grams are stored. Internally, this is a simple array of values, indexed by a value computed from the N-gram predictor/predictee indices.
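One common way to compute such an index is to treat the word indices as digits of a number in base V, where V is the vocabulary size. The sketch below shows that scheme. It is an assumption made for illustration, not necessarily the formula EST_Ngrammar uses, and it assumes predictors and predictee share one vocabulary; dense_index() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map an N-gram, given as word indices, to one slot of a flat array holding
// vocab_size^N values. Assumes predictors and predictee share one vocabulary.
std::size_t dense_index(const std::vector<int> &word_ids, std::size_t vocab_size) {
    std::size_t index = 0;
    for (std::size_t i = 0; i < word_ids.size(); ++i) {
        assert(word_ids[i] >= 0 &&
               static_cast<std::size_t>(word_ids[i]) < vocab_size);
        // Shift the index one "digit" left, then add the next word.
        index = index * vocab_size + static_cast<std::size_t>(word_ids[i]);
    }
    return index;
}
```

With this layout the array grows as V to the power N, which is the storage cost that motivates the sparse alternative.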

Sparse

A more efficient alternative to the dense representation, where only N-grams with non-zero frequencies are stored. Additional overheads mean that the sparse representation is only more efficient if enough N-grams have zero frequencies. The tradeoff point is left to the user to determine. Internally, the representation is a tree whose root is the first item in the N-gram. At this time, the sparse representation is not fully working.

Backoff

This form is used for backed-off N-gram models (katz87). Internally, the representation is a tree whose root is the most recent item in the N-gram. Backoff weights are stored in the same tree. This representation limits the predictor and predictee vocabularies to be the same.

Threshold

The threshold for including an N-gram (its minimum frequency) is settable, but is the same for all orders.

Discounting methods

At this time, only an ad-hoc Good-Turing based method of computing discounts is available. This method involves computing a Good-Turing smoothed frequency-of-frequencies distribution. The discount for each frequency is then given by the difference between the smoothed and unsmoothed values, with the zeroton (frequency = 0) frequency being taken directly from the smoothed F-of-F distribution. Limits on the maximum frequency for Good-Turing smoothing, and a limit for discounting can be set. Alternative methods from the literature will be added in future releases.

Limitations

Unfortunately, backed-off Ngrammars can only be saved as such in ARPA format files (see section Ngrammar file formats), the only "defined" standard (we use the term loosely), and ARPA files cannot, at this time, be read. Saving in CSTR format involves expansion to a dense representation. See section Ngrammar data formats.

Building grammars from data

The member function accumulate is used to build N-gram models from data, but the higher level member function build is more useful.

Start/end tags

To deal with start and end of data/sentence, the tags 'prev_prev', 'prev' and 'next' can be given. These are used to fill up the sliding window when building from 'sentence_per_line' or 'sentence_per_file' data (see section Ngrammar data formats). For example (the tags take their default values here):

N = 2
Sentence is : Hello world
Mode is : sentence per line 
Using default tags 
Ngrams accumulated are : (!ENTER Hello), (Hello world), (world !EXIT)
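The windowing in this example can be sketched as follows. This is an illustration rather than the accumulate or build code itself: sentence_ngrams() is a hypothetical function, and the padding scheme (N-2 copies of the prev_prev tag, one prev tag, the sentence, then one next tag) is an assumption that reproduces the N = 2 case shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Ngram;

// Pad one sentence with start/end tags and return every window of n words.
// Assumed padding: n-2 copies of prev_prev, one prev, the sentence, one next.
std::vector<Ngram> sentence_ngrams(const std::vector<std::string> &sentence,
                                   std::size_t n,
                                   const std::string &prev_prev,
                                   const std::string &prev,
                                   const std::string &next) {
    assert(n >= 2);
    std::vector<std::string> padded(n - 2, prev_prev);
    padded.push_back(prev);
    padded.insert(padded.end(), sentence.begin(), sentence.end());
    padded.push_back(next);

    // Slide a window of n words over the padded sentence.
    std::vector<Ngram> ngrams;
    for (std::size_t i = 0; i + n <= padded.size(); ++i)
        ngrams.push_back(Ngram(padded.begin() + i, padded.begin() + i + n));
    return ngrams;
}
```

For N = 2 the prev_prev tag is never used, so only the prev and next tags appear in the output.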

Ngrammar data formats

Data can be in one of the following formats: ngram_per_line, sentence_per_line or sentence_per_file.

Ngrammar file formats

Dense and sparse Ngrammars can be saved and loaded in CSTR's own format (either ascii or binary). Backoff grammars can be saved in ARPA format, or in CSTR format which involves conversion to dense format. Compressed output is available via `gzip` (the GNU zip program).

CSTR ascii format

Header

The header starts with the magic number "Ngram_2" followed by the order (N). The next two lines give predictor and predictee vocabularies. For example (predictor and predictee vocabularies are the same):

Ngram_2 1
acknowledge align check !ENTER !EXIT
acknowledge align check !ENTER !EXIT

Data

Each line contains an ngram followed by its frequency (may be non-integer). For example:

acknowledge align : 33
check clarify : 72

Zero frequency ngrams are not stored. White space may be any number of spaces or tabs. Blank lines are ignored.

CSTR binary format

Header

The header starts with a magic number defined by EST_NGRAMBIN_MAGIC in the `EST_Ngrammar.h' header file, followed by mBin_2, then the order (N).

Data

Frequencies are written in binary (floating point) form, and since only dense format is supported, the ngram "words" need not be written. Simple run-length encoding is used to reduce file size, where repeated values are written as "value, -n" where n is the number of repetitions. This is possible because the values (frequencies) themselves are never negative.
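The run-length scheme can be illustrated on an in-memory vector. The sketch below is not the EST file writer: rle_encode() and rle_decode() are hypothetical names, and it assumes that "-n" means the preceding value occurs n further times, which is one reading of the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encode non-negative frequencies; a run of k equal values (k > 1) becomes
// "value, -(k-1)". The meaning of the count is an assumption of this sketch.
std::vector<double> rle_encode(const std::vector<double> &freqs) {
    std::vector<double> coded;
    std::size_t i = 0;
    while (i < freqs.size()) {
        assert(freqs[i] >= 0.0);  // negative numbers are reserved for counts
        std::size_t run = 1;
        while (i + run < freqs.size() && freqs[i + run] == freqs[i])
            ++run;
        coded.push_back(freqs[i]);
        if (run > 1)
            coded.push_back(-static_cast<double>(run - 1));
        i += run;
    }
    return coded;
}

// Invert rle_encode(): a negative entry repeats the previous value.
std::vector<double> rle_decode(const std::vector<double> &coded) {
    std::vector<double> freqs;
    for (std::size_t i = 0; i < coded.size(); ++i) {
        if (coded[i] < 0.0) {
            assert(!freqs.empty());
            const double previous = freqs.back();
            const std::size_t extra = static_cast<std::size_t>(-coded[i]);
            freqs.insert(freqs.end(), extra, previous);
        } else {
            freqs.push_back(coded[i]);
        }
    }
    return freqs;
}
```

Dense grammars typically contain long runs of zero frequencies, which is where this encoding saves the most space.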

ARPA format

The only "standard" format defined for backed-off grammars. We know of no written definition of this format, and hence do not attempt to give one here!


This file documents the speech tools library developed at the Centre for Speech Technology Research at the University of Edinburgh.

Copyright (C) 1996 University of Edinburgh

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

This part describes base classes in the system.

TList C++ Class

TList is a generic template doubly-linked list class. See `include/EST_TList.h'.

The list is made up of a series of "items" of class Titem. Each of these has a value, val, and next and previous pointers. At present, a pointer to TBI is used to iterate through the list. The best way to iterate through the list is to use for loop style syntax.

Member functions of TList

See `include/EST_TList.h' for the full list.

EST_TList includes most of what you'd expect from a list:

In addition, the following overloaded operators are provided:

Examples of usage:


EST_TList<int> l;  // declare a list "l" of integers.
EST_TBI *p;        // declare an iteration pointer for this list.
int i;

for (i = 0; i < 10; ++i) // fill with some values.
    l.append(i);

for (p = l.head(); p != 0; p = next(p)) // iterate through list.
    cout << l(p); // access the item via () overloading.

cout << l.length(); // print out length of list (in this case 10).

Instantiation of EST_TList

C++ does not have a standard for template instantiation, which makes it difficult to arbitrarily define new template types. Within the speech_tools library, new EST_TList template classes should be defined as follows. Suppose you have a class called Thing and you wish to make an EST_TList of it. Add the following to your file:

#if defined(INSTANTIATE_TEMPLATES)
#include "../base_class/EST_TList.cc"
template class EST_TList<Thing>;
#endif

And add the name of that file to the make variable TSRCS in your `Makefile'.

Future Changes

We hope this class has become stable though some more member functions may be added (e.g. sort etc.).

KVL C++ Class

KVL (short for Key/Value List) is a template class of a list of key/value pairs. There are two specifiable types for the KVL class, the key and the value, e.g. KVL<EST_String,EST_String> produces a list of string pairs. KVL uses the TList class to actually store its data.

KVL has much the same functionality as EST_TList, but has some additional features which make use easier. A crucial difference between a KVL and a normal list is that each key in the KVL is unique.

See `include/KVL.h'.

Member functions of KVL

In addition, the following overloaded operators are provided:

Examples of usage:


KVL<int, int> x; // declare key value list.
EST_TBI *p;      // declare iteration pointer.
int k, v;        // key and value read from the user.

// read some values from standard input, stopping at end of input.
cout << "type key then val\n";
while (cin >> k >> v)
{
    x.add_item(k, v);
    cout << "type key then val\n";
}

// is key 9 in the list?
cout << (x.present(9) ? "true" : "false") << endl;

// print out all keys and values in list.
for (p = x.list.head(); p != 0; p = next(p)) // iterate through list.
   cout << x.key(p) << " " << x.val(p) << endl;

Option C++ Class

The EST_Option class provides a uniform way to access options in a program. The most obvious source of options is the command line. The function parse_command_line2(...) takes the C command line variables (argc, argv) and produces an EST_Option object. Specifically, it allows option names, value types, defaults and documentation to be specified for the options in a program.

The EST_Option class is derived from KVL<EST_String, EST_String>, so all the KVL member functions also work with EST_Option. It provides some useful extra functionality.

Member functions of EST_Option

All the options are stored as key value pairs of EST_Strings. However, it is often useful to have other types, e.g. integers. This is possible, but it is the EST_String form of the integer that is actually stored. Additional member functions, e.g. add_item(), do the conversion automatically.

The member function ival(const EST_String &key) will return the value as an integer and fval(const EST_String &key) as a float.

File i/o

It is sometimes convenient to store options in files, and the options class supports a simple file format for this.

There is one key/value pair per line. Lines can be commented by starting them with the comment character (by default this is ";", but this can be set using the load() function). Each line must start with the key. The remainder, which may appear as a list in the file, is taken as the value. Option files can be included in other option files by using the #include filename directive.

If a particular key appears more than once when loading, the value of the last occurrence is used. Files are loaded using the load(...) function. The first argument to this is the file name, and the second (optional) argument is the comment character. The load function merely appends to the existing options (while overriding the values of duplicate keys). If an entirely new set of options is to be loaded, call the clear() member function first.
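The loading rules above can be sketched with standard containers. This is not the EST_Option implementation: Options, trim() and load_options() are hypothetical names, the input is any std::istream rather than a file name, and the #include directive is deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> Options;

// Strip leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string &s) {
    const char *ws = " \t\r";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Read one "key value..." pair per line into opts. Existing entries are kept
// unless a loaded key overrides them; the last occurrence of a key wins.
void load_options(std::istream &in, Options &opts, char comment = ';') {
    assert(comment != ' ' && comment != '\t');
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == comment)
            continue;  // blank line or comment
        const std::string::size_type split = line.find_first_of(" \t");
        const std::string key = line.substr(0, split);
        const std::string value =
            (split == std::string::npos) ? "" : trim(line.substr(split));
        opts[key] = value;  // the remainder of the line is the value
    }
}
```

Because load_options() only adds to or overrides entries in opts, a caller wanting a fresh set of options would empty the map first, much as clear() is used above.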

The EST_Option class inherits the member functions of the KVL class. In addition, the following exist:

EST_TVector

A simple vector class is provided. Member functions are given in `include/EST_TVector.h'.

EST_TMatrix

The EST_TMatrix class allows the creation of standard matrices.

See `include/EST_TMatrix.h' for member functions.

There is a derived class EST_FMatrix for floats; derivation is used rather than a simple template to allow loading and saving to files.

EST_Chunk

The EST_Chunk class offers a reference counting system for arbitrary segments of memory. This is primarily used by the EST_String class.

EST_String

This class was written for a number of reasons. It offers a string class functionally identical to the GNU libg++ String class. We chose to write our own string class rather than use the one provided with GNU G++ for the following reasons. The String class in libg++ differs between versions, which causes a lot of confusion when compiling the system with different versions of `libg++'. If we depended on the GNU String class, we would have to provide `libg++' on all platforms we compile the system on. This and the Regex class are the only libg++ classes we relied on, so by writing our own we allow much greater portability. The GNU String class typically copies string values around, while our replacement uses reference counts. Because of the way we use strings in the speech tools and Festival, reference counting allows a much more efficient implementation of strings. Thus our replacement string class is faster than the GNU equivalent on substantial benchmarks of Festival.

The member functions of EST_String follow those of the GNU `libg++' String class as closely as possible (we designed it as a drop-in replacement for our current use of String).

EST_Regex

As we wished to remove our dependence on GNU libg++, as described in the previous section, we have provided a regular expression class which for the most part follows that of the GNU libg++ Regex class. This implementation uses the regex functions from BSD4.4-lite (and earlier) written by Henry Spencer.

EST_TNamedEnum

A class which relates names (EST_String) to enums.

EST_StringTrie

EST_StringTrie builds a tree index from string keys to arbitrary objects. Thus objects may be indexed efficiently by strings. The strings must be ascii (the eighth bit is ignored).

For example, the following builds an index of regular expressions based on their character form so that they need not be recompiled.

EST_StringTrie regexes;

EST_Regex *make_regex(const char *r)
{
    // Access previously generated Regex or make new one
    // and add to index
    EST_Regex *rx;

    if ((rx = (EST_Regex *)regexes.lookup(r)) == 0)
    {
        EST_Regex *nr = new EST_Regex(r);
        regexes.add(r,(void *)nr);
        rx = nr;
    }

    return rx;
}

A StringTrie may be explicitly cleared with the function clear(). The contents of the trie may be deleted by passing a garbage collection function to clear(), which will be passed each item in the trie as a void *. The type of the user provided garbage collection function is

    void (*deletecontents)(void *n);
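For readers unfamiliar with tries, the following sketch shows the general idea behind such an index, including a clear() that accepts a garbage collection function of the type given above. It is a simplified stand-in, not the EST_StringTrie source: SimpleTrie, delete_int() and ints_deleted are hypothetical, and each node simply holds 128 child pointers, one per 7-bit character.

```cpp
#include <cassert>
#include <string>

// Hypothetical 128-way trie from ascii strings to void * values.
class SimpleTrie {
    struct Node {
        Node *child[128];
        void *value;
        Node() : value(nullptr) {
            for (int i = 0; i < 128; ++i)
                child[i] = nullptr;
        }
    };
    Node *root;

    // Free a subtree, passing each stored value to deletecontents if given.
    static void destroy(Node *n, void (*deletecontents)(void *)) {
        if (n == nullptr)
            return;
        for (int i = 0; i < 128; ++i)
            destroy(n->child[i], deletecontents);
        if (deletecontents != nullptr && n->value != nullptr)
            deletecontents(n->value);
        delete n;
    }

public:
    SimpleTrie() : root(new Node) {}
    ~SimpleTrie() { destroy(root, nullptr); }
    SimpleTrie(const SimpleTrie &) = delete;
    SimpleTrie &operator=(const SimpleTrie &) = delete;

    void add(const std::string &key, void *value) {
        Node *n = root;
        for (std::string::size_type i = 0; i < key.size(); ++i) {
            const int c = key[i] & 0x7f;  // ignore the eighth bit
            if (n->child[c] == nullptr)
                n->child[c] = new Node;
            n = n->child[c];
        }
        n->value = value;
    }

    // Returns the stored value, or a null pointer if the key is absent.
    void *lookup(const std::string &key) const {
        const Node *n = root;
        for (std::string::size_type i = 0; i < key.size(); ++i) {
            n = n->child[key[i] & 0x7f];
            if (n == nullptr)
                return nullptr;
        }
        return n->value;
    }

    // Empty the trie; stored values are only deleted if a function is given.
    void clear(void (*deletecontents)(void *) = nullptr) {
        destroy(root, deletecontents);
        root = new Node;
    }
};

// Example garbage collection function; the counter records each call.
static int ints_deleted = 0;
static void delete_int(void *n) {
    assert(n != nullptr);
    delete static_cast<int *>(n);
    ++ints_deleted;
}
```

Lookup time depends only on the length of the key, not on the number of keys stored, which is what makes this structure attractive as a string index.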

EST_Token and EST_TokenStream

EST_Token with EST_TokenStream provides a method for reading files as whitespace separated tokens. A token consists of four parts, some of which may be empty: preceding whitespace, preceding punctuation, the name (the actual token) and succeeding punctuation. The definitions of whitespace and punctuation are user definable. There is also support for single character symbols and quoted tokens.

A token stream, from which tokens are read, may be a file or a string.

For example, the following reads a file and outputs each token on a new line:

   EST_TokenStream ts;

   ts.open("myfile");
   while (!ts.eof())
        cout << ts.get() << endl;

Although token streams support one symbol lookahead via peek(), they do not support unget.

Punctuation (pre and post) may be set after opening a stream. The defaults are empty. Any characters defined as punctuation found around a token are stripped and saved in the punctuation fields. Single character symbols will cause a token break whenever they occur (i.e. separating whitespace is not required); again, by default these are empty. Whitespace by default is defined as space, horizontal tab, carriage return and line feed.

Quoting mode is off by default but may be started by calling set_quotes with a quote character and an escape character (typically " and \). When in quote mode, a token starting with the quote character will continue until the next unescaped quote character, including whitespace and punctuation.

Although a whole file's contents, including all its whitespace, may be recorded by tokens from a token stream, any final whitespace after the last real token may be mistakenly omitted unless care is taken. In many cases you will just want the final whitespace before end of file to set end of file, which is the default. In quotes mode all tokens, including this last token with an empty name, will be returned before eof is set.
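The effect of quote mode on token boundaries can be sketched with a small stand-alone function. This is an illustration only, not EST_TokenStream: split_tokens() is a hypothetical name, punctuation and single character symbols are ignored, preceding whitespace is not recorded, and returning quoted tokens without their surrounding quotes is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split text on whitespace, treating a token that starts with the quote
// character as running to the next unescaped quote.
std::vector<std::string> split_tokens(const std::string &text,
                                      char quote = '"', char escape = '\\') {
    assert(quote != escape);
    const std::string whitespace = " \t\r\n";
    std::vector<std::string> tokens;
    std::string::size_type i = 0;
    while (i < text.size()) {
        if (whitespace.find(text[i]) != std::string::npos) {
            ++i;  // skip separating whitespace
            continue;
        }
        std::string token;
        if (text[i] == quote) {
            ++i;  // skip the opening quote
            while (i < text.size() && text[i] != quote) {
                if (text[i] == escape && i + 1 < text.size())
                    ++i;  // take the escaped character literally
                token += text[i++];
            }
            if (i < text.size())
                ++i;  // skip the closing quote
        } else {
            while (i < text.size() &&
                   whitespace.find(text[i]) == std::string::npos)
                token += text[i++];
        }
        tokens.push_back(token);
    }
    return tokens;
}
```

A quoted token may therefore contain whitespace, and an escaped quote character inside it does not end the token.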

