

Stream-Based Architecture

Stream-based Architecture Formalism

This page contains an overview of the formal structure of the (yet to have a nice name) stream-based architecture. There is also a page on the C++ implementation of this architecture.

An Utterance structure contains all the information for a speech utterance. An utterance can be of any length, from a phoneme to a paragraph or longer. Each utterance contains lists of items, each belonging to one of four linguistic types: tree, stream, track and waveform. There are no fixed rules saying how each should be used, but general conventions do apply. Trees are used to represent hierarchical context-free structures (e.g. syntax trees), streams are used to represent linear lists of linguistic objects (e.g. phonemes), tracks are used to represent functions of a variable over time (e.g. an F0 contour) and waveforms are used to represent acoustic data.

The tree facility has yet to be implemented.

A stream is a list of a single class of linguistic object. An utterance structure generally has several streams. A stream has a name by which it is referred to, and a list of linguistic units.

A typical utterance structure in a speech synthesizer would have four streams: "word", "phoneme", "syllable" and "intonation".
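As a rough illustration of the description above, an utterance holding several named streams might be modelled as follows. This is a minimal sketch and not the actual implementation: the Unit, Stream and Utterance structs and the find_stream(), stream() and make_synth_utterance() helpers are hypothetical names introduced here, and trees, tracks and waveforms are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the EST classes.
struct Unit {
    std::string name;  // label of the unit, e.g. a word or a phoneme
    int addr;          // address of the unit within its stream
};

struct Stream {
    std::string name;         // name by which the stream is referred to
    std::vector<Unit> units;  // linear list of one class of linguistic object
};

struct Utterance {
    std::vector<Stream> streams;

    // Return the stream with the given name, or nullptr if there is none.
    Stream* find_stream(const std::string& name) {
        for (Stream& s : streams)
            if (s.name == name) return &s;
        return nullptr;
    }

    // Return the named stream; the name must exist.
    Stream& stream(const std::string& name) {
        Stream* s = find_stream(name);
        assert(s != nullptr && "unknown stream name");
        return *s;
    }
};

// Build an empty utterance with the four streams of a typical synthesizer.
inline Utterance make_synth_utterance() {
    const char* names[] = {"word", "phoneme", "syllable", "intonation"};
    Utterance u;
    for (const char* n : names)
        u.streams.push_back(Stream{n, {}});
    return u;
}
```

Looking streams up by name, rather than by position, mirrors the way the text refers to streams throughout.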

Each unit contains three kinds of information:

Generic Properties

All units have three basic properties:

Class Specific Properties

Each unit can also make use of an indefinitely complex set of additional features. This is implemented by defining a class specific to the word or syllable etc, and having a pointer in the unit to this.

Relations

A unit can be linked to units in different streams by using relations. For example, a word will be related to its syllables, and its syllables to its phonemes. Each unit has a set of relations, one relation for every stream in the Utterance structure. Each relation has a name, indicating which stream it refers to, and is a list of links. Each link is an integer, which refers to the address of the unit that is being linked.

For example:


	Name		Address 	Relations

Stream: Word
	example		1		Syllable 1 2 3

Stream: Syllable
	S		1		Word 1		Phoneme 1 2 3 
	S		2		Word 1		Phoneme 4 5
	S		3		Word 1		Phoneme 6 7 8

Stream: Phoneme
	e		1		Syllable 1
	g		2		Syllable 1
	z		3		Syllable 1
	a		4		Syllable 2
	m		5		Syllable 2
	p		6		Syllable 3
	e		7		Syllable 3
	l		8		Syllable 3

Links should always be mutual: if unit a is linked to unit b, then unit b should be linked to unit a.
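The listing above and the mutual-link rule can be sketched in code as follows. This is only an illustration under stated assumptions: Unit, Links, Stream, Utterance, add_unit(), unit_at(), make_example() and links_are_mutual() are hypothetical names, and addresses are assumed to run from 1 upwards in stream order, as they do in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// relation name (the stream pointed to) -> addresses of the linked units
using Links = std::map<std::string, std::vector<int>>;

// Hypothetical model of one unit; not the EST classes.
struct Unit {
    std::string name;
    int addr;
    Links relations;
};

using Stream = std::vector<Unit>;                 // units in address order
using Utterance = std::map<std::string, Stream>;  // stream name -> stream

// Append a unit to a stream, giving it the next address (starting at 1).
inline void add_unit(Utterance& u, const std::string& stream,
                     const std::string& name, Links relations) {
    Stream& s = u[stream];
    int addr = static_cast<int>(s.size()) + 1;
    s.push_back(Unit{name, addr, std::move(relations)});
}

// Look a unit up by stream name and address.
inline const Unit& unit_at(const Utterance& u, const std::string& stream, int addr) {
    const Stream& s = u.at(stream);
    assert(addr >= 1 && addr <= static_cast<int>(s.size()));
    return s[addr - 1];
}

// Build the utterance for the word "example" shown in the listing.
inline Utterance make_example() {
    Utterance u;
    add_unit(u, "Word", "example", {{"Syllable", {1, 2, 3}}});
    add_unit(u, "Syllable", "S", {{"Word", {1}}, {"Phoneme", {1, 2, 3}}});
    add_unit(u, "Syllable", "S", {{"Word", {1}}, {"Phoneme", {4, 5}}});
    add_unit(u, "Syllable", "S", {{"Word", {1}}, {"Phoneme", {6, 7, 8}}});
    const char* phones[] = {"e", "g", "z", "a", "m", "p", "e", "l"};
    const int syllable_of[] = {1, 1, 1, 2, 2, 3, 3, 3};
    for (int i = 0; i < 8; ++i)
        add_unit(u, "Phoneme", phones[i], {{"Syllable", {syllable_of[i]}}});
    return u;
}

// True if every link from a unit a to a unit b is matched by a link from b to a.
inline bool links_are_mutual(const Utterance& u) {
    for (const auto& entry : u) {
        const std::string& stream_name = entry.first;
        for (const Unit& a : entry.second) {
            for (const auto& rel : a.relations) {
                for (int addr : rel.second) {
                    const Unit& b = unit_at(u, rel.first, addr);
                    auto back = b.relations.find(stream_name);
                    if (back == b.relations.end()) return false;
                    const std::vector<int>& v = back->second;
                    if (std::find(v.begin(), v.end(), a.addr) == v.end())
                        return false;
                }
            }
        }
    }
    return true;
}
```

A check of this kind is useful after building a structure by hand, since a missing back-link is easy to overlook.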

Implementation of Stream-based Architecture

The Stream-based architecture makes use of the following C++ classes:

EST_Stream_Item

Each stream contains a list of units of one type of linguistic object (e.g. phoneme). These are implemented in the EST_Stream_Item class. The header file, `include/EST_Stream.h', contains the full class definition. As explained in the formal description, each has three basic properties, which are accessed by the following member functions:

Each unit has a String which specifies the type of the stream, e.g. "phoneme" or "word". This is set by the init() function which must be called.

The following member functions are also present:

Note that none of the make_link() functions cause reciprocal linking. Given two units "a" and "b", calling

a.make_link(b.stream_name(), b.addr())

will cause b to be included in a's relation, but not vice versa. To ensure reciprocal linking, use the non-member function link().
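The difference between one-way and reciprocal linking can be sketched as follows. The Item class below is a hypothetical stand-in, not EST_Stream_Item: only the call shape a.make_link(b.stream_name(), b.addr()) is taken from the text, while is_linked_to() and the two-argument signature of link() are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream unit; not the EST_Stream_Item class.
class Item {
public:
    Item(std::string stream, int addr)
        : stream_(std::move(stream)), addr_(addr) {}

    const std::string& stream_name() const { return stream_; }
    int addr() const { return addr_; }

    // One-way link: record addr under the relation named after stream.
    // Only this item is changed; the target item knows nothing about it.
    void make_link(const std::string& stream, int addr) {
        std::vector<int>& links = relations_[stream];
        if (std::find(links.begin(), links.end(), addr) == links.end())
            links.push_back(addr);
    }

    // True if this item holds a link to other.
    bool is_linked_to(const Item& other) const {
        auto it = relations_.find(other.stream_name());
        if (it == relations_.end()) return false;
        const std::vector<int>& links = it->second;
        return std::find(links.begin(), links.end(), other.addr()) != links.end();
    }

private:
    std::string stream_;
    int addr_;
    std::map<std::string, std::vector<int>> relations_;
};

// Reciprocal link (assumed signature): make each item point at the other.
inline void link(Item& a, Item& b) {
    a.make_link(b.stream_name(), b.addr());
    b.make_link(a.stream_name(), a.addr());
    assert(a.is_linked_to(b) && b.is_linked_to(a));
}
```

In this sketch a caller who uses make_link() alone must remember to add the back-link, whereas link() maintains the mutual-link convention in one call.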

Stream items may also be associated with other data through the fields, features and contents attributes. See section EST_Stream_Item C++ Class.

Relation

The relation class has a name and a list of integers. The name is the name of the stream that is being pointed to. The integers in the list are the addresses of the units in the named stream that are being pointed to.

Relation is derived from the EST_TList<int> class.
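The shape of such a class, a named list of integer addresses, can be sketched as below. This is an assumption-laden illustration and not the real class: std::list<int> stands in for EST_TList<int>, and the constructor, name() and first() are hypothetical members. Deriving publicly from a standard container is done here only to mirror the inheritance described above.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical sketch: std::list<int> stands in for EST_TList<int>.
class Relation : public std::list<int> {
public:
    explicit Relation(std::string name) : name_(std::move(name)) {}

    // Name of the stream that the addresses in this list point into.
    const std::string& name() const { return name_; }

    // Address of the first linked unit; the relation must not be empty.
    int first() const {
        assert(!empty());
        return front();
    }

private:
    std::string name_;
};
```

Because the relation is itself a list, the usual list operations (appending, iterating, taking the size) apply to the addresses directly.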

