

Executable Programs

This section briefly describes the executable programs available with the speech tools. Most of these programs are thin main() wrappers around library routines.

Many of these programs have man pages. Please consult the man pages for more detailed information. Most programs print a summary of their command line options when given the -help flag. Some programs are "finished", while others are still "in progress". The finished programs should be well documented and stable. The "in progress" programs are near completion but typically still require some work regarding user interfaces and documentation.

Data manipulation programs

ch_wave

Changes waveform file formats, performs re-sampling and scaling, prints information on waveform headers etc.

ch_track

Changes track file formats, converts track files into label files, smooths tracks, re-samples tracks. Tracks hold data such as F0, LPC coefficients and cepstra.

ch_lab

Changes label file formats, converts label files into track files, performs one-to-one mapping of labels from one set to another, performs context sensitive label re-writing.
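The one-to-one label mapping mentioned above can be pictured with a small sketch. This is not ch_lab's implementation or file format: the function name `map_labels`, the (end time, name) pair representation and the example label names are all assumptions made for illustration.

```python
def map_labels(labels, mapping):
    """Rename each label through `mapping`; labels without an entry pass through unchanged."""
    return [(end_time, mapping.get(name, name)) for end_time, name in labels]


# Hypothetical data: (end time in seconds, label name) pairs.
labels = [(0.10, "pau"), (0.25, "ax"), (0.40, "t")]
# Hypothetical mapping from one label set to another.
mapping = {"pau": "sil", "ax": "@"}

mapped = map_labels(labels, mapping)
```

Only the names change; the times are left untouched, which is what makes the mapping one-to-one.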

comp_lab (in progress)

Comparison of two label files, e.g. for scoring a test transcription against a reference one.

comp_track (in progress)

Comparison of two track files.

comp_wave (in progress)

Comparison of two waveform files.

Audio Playback

na_play

Plays arbitrary waveform files on a variety of hardware audio devices. Can perform re-sampling to match audio device capability. `na_play' has support for a number of audio devices. Compile time options specify which devices are supported. Note you must actually have these devices on your machine before `na_play' can play any waveform.

The audio device is selected with the `-p' option; which devices are available depends on the compile-time options.

The default audio device is netaudio if it is supported. If not, the platform-specific audio mode is the default (e.g. sun16audio, linux16audio, freebsd16audio or mplayeraudio). If none of these is supported, sunaudio is the default. The Audio_Command method is always an option.

Signal Processing

fvgen

Generates frame-based feature vectors, including cepstra, delta cepstra and energy. More types will be added soon, including formants.
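As a rough illustration of what a delta feature is, the sketch below computes a simple central difference over neighbouring frames. This is one common textbook definition and is an assumption here: it is not taken from fvgen, which may use a different window or regression formula. The function name `delta` and the example values are hypothetical.

```python
def delta(frames):
    """Central-difference deltas; the first and last frames reuse themselves as the missing neighbour."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out


# Hypothetical data: three frames of two coefficients each.
frames = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
deltas = delta(frames)
```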

pda

Pitch tracker based on super resolution pitch determination (srpd). Takes waveforms (of any type) as input and produces F0 contours.

icda

Pitch tracker with smoothing based on super resolution pitch determination (srpd). Takes waveforms (of any type) as input and produces F0 contours. Smoothing involves median smoothing of the pda output and interpolation through unvoiced regions.
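The two smoothing steps named above can be sketched as follows. This is a minimal illustration, not icda's code: it assumes an F0 contour stored as a list with 0 marking unvoiced frames, a window width of 3, and linear interpolation, none of which is stated in this section. The function names are hypothetical.

```python
from statistics import median


def median_smooth(values, width=3):
    """Replace each frame by the median of a window centred on it (shorter at the edges)."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def interpolate_unvoiced(f0):
    """Fill frames where f0 == 0 by linear interpolation between the nearest voiced frames."""
    out = list(f0)
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return out  # nothing to interpolate from
    # Before the first and after the last voiced frame, hold the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Between consecutive voiced frames, interpolate linearly across any gap.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = f0[a] + t * (f0[b] - f0[a])
    return out


# Hypothetical contour in Hz: one spurious spike (300) and a two-frame unvoiced gap.
raw_f0 = [100, 100, 300, 100, 0, 0, 120, 120]
contour = interpolate_unvoiced(median_smooth(raw_f0))
```

In this example the median step removes the isolated 300 Hz spike, and the interpolation step then bridges the unvoiced gap between 100 Hz and 120 Hz.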

lpc_analysis (in progress)

lpc_synthesis (in progress)

spectgen (in progress)

Speech Recognition

viterbi (in progress)

A straightforward Viterbi decoder, using an ngram language model (which can be estimated using build_ngram) and a sequence of observation probability vectors.
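To make the idea concrete, here is a minimal sketch of Viterbi decoding over a bigram model and a sequence of observation probability vectors. It is not the viterbi program's implementation: the function signature, the dictionary-based data layout and the toy probabilities are assumptions chosen for illustration, and the language model is restricted to bigrams.

```python
import math


def viterbi(obs_probs, initial, bigram):
    """Return the most probable state sequence.

    obs_probs: one dict per frame, state -> P(observation | state)
    initial:   state -> P(state) for the first frame
    bigram:    (previous, current) -> P(current | previous)
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    states = list(initial)
    score = {s: log(initial[s]) + log(obs_probs[0][s]) for s in states}
    backpointers = []
    for frame in obs_probs[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            # Best predecessor for `cur` under the bigram model.
            best = max(states, key=lambda p: score[p] + log(bigram.get((p, cur), 0.0)))
            new_score[cur] = score[best] + log(bigram.get((best, cur), 0.0)) + log(frame[cur])
            pointers[cur] = best
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state example.
initial = {"a": 0.5, "b": 0.5}
bigram = {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 0.1, ("b", "b"): 0.9}
obs_probs = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}, {"a": 0.9, "b": 0.1}]
best_path = viterbi(obs_probs, initial, bigram)
```

In the toy data the middle frame slightly favours state "b", but the bigram model strongly prefers staying in the same state, so the decoded path stays in "a" throughout.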

build_ngram

Build ngram language models.
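As a reminder of what estimating an ngram model involves, the sketch below builds maximum-likelihood bigram probabilities from counts. It is an illustration only and does not reflect build_ngram's options, file formats or smoothing: the function name `build_bigram`, the `<s>` and `</s>` boundary markers and the toy sentences are assumptions, and no smoothing or backoff is applied.

```python
from collections import Counter


def build_bigram(sentences):
    """Return {(previous, current): P(current | previous)} estimated by relative frequency."""
    context_counts, pair_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        context_counts.update(padded[:-1])          # every word that has a successor
        pair_counts.update(zip(padded, padded[1:]))  # adjacent word pairs
    return {pair: count / context_counts[pair[0]] for pair, count in pair_counts.items()}


# Hypothetical training data: two tokenised sentences.
model = build_bigram([["a", "b"], ["a", "c"]])
```

Unseen pairs get no entry at all here, which is why practical ngram models add smoothing or backoff.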

test_ngram

Test an ngram on text data.

ch_ngram (in progress)

Modify ngrams, e.g. interpolation of two ngram models.

hmm_decode (in progress)

HMM decoder.

Statistical Analysis

wagon

A classification and regression tree building program following the techniques described in breiman84. See section Wagon.

multistats (in progress)

Analysis of multivariate data.

normalise (in progress)

Normalisation of data.

popanalysis (in progress)

Population analysis.

cluster (in progress)

Simple clustering programs.

confusion (in progress)

Confusion matrix generation.
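For readers unfamiliar with the term, a confusion matrix counts how often each reference class was assigned each predicted class. The sketch below is a generic illustration, not the confusion program's input format or output: the function name, the sparse (reference, hypothesis) count representation and the example labels are assumptions.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis):
    """Count (reference label, hypothesised label) pairs for two aligned sequences."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be aligned and of equal length")
    return Counter(zip(reference, hypothesis))


# Hypothetical aligned label sequences.
reference = ["a", "a", "b", "b", "a"]
hypothesis = ["a", "b", "b", "b", "a"]
counts = confusion_matrix(reference, hypothesis)
```

Entries with equal reference and hypothesis labels are the correct classifications; all other entries are confusions.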

Intonation

ev_param (in progress)

ev_synth (in progress)

