Decoding Graphs and Acoustic Models

The KeenASR speech recognizer finds the most likely sequence of words spoken in the recorded audio input. It takes into account both the acoustic match (how closely the features extracted from the audio match the acoustic model) and the language model (how likely a given sequence of words is).
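As a rough illustration of how the two sources of evidence can be weighed against each other, the sketch below ranks candidate transcripts by a weighted sum of log-scores. This is a generic textbook formulation, not KeenASR's internal implementation; the function names, the lm_weight parameter, and the example scores are all hypothetical.

```python
# Generic illustration only: NOT the KeenASR API or its internal scoring.
# All names and numbers below are hypothetical.

def combined_score(acoustic_logprob, lm_logprob, lm_weight=1.0):
    """Weighted sum of the acoustic and language model log-scores."""
    return acoustic_logprob + lm_weight * lm_logprob


def best_hypothesis(hypotheses, lm_weight=1.0):
    """Return the text of the hypothesis with the highest combined score.

    Each hypothesis is a tuple: (text, acoustic_logprob, lm_logprob).
    """
    best = max(hypotheses, key=lambda h: combined_score(h[1], h[2], lm_weight))
    return best[0]


# Made-up scores: the second candidate matches the audio slightly better,
# but the language model considers it far less likely.
hypotheses = [
    ("recognize speech", -120.0, -4.0),
    ("wreck a nice beach", -118.0, -12.0),
]

print(best_hypothesis(hypotheses, lm_weight=1.0))
print(best_hypothesis(hypotheses, lm_weight=0.0))
```

With the language model included, the first candidate wins (-124 versus -130); with lm_weight set to 0, only the acoustic match counts and the second candidate wins.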

The decoding graph combines the language model with other resources (acoustic models, lexicon) in a manner that simplifies the decoding process.

The KeenASR framework supports the programmatic creation of decoding graphs from a set of phrases or words that users are likely to say. The API provides a number of configuration parameters that allow optimization for specific scenarios (e.g., a specific task such as oral reading, or more or less strictness with accented speech).

If the number of words (n-grams, to be precise) is large, creating a decoding graph on a mobile device may exhaust the device's memory or take too long, especially on devices with slower CPUs and less than 1 GB of RAM. In such cases, we recommend that you create the decoding graph ahead of time in your development sandbox and then bundle it with your app. Contact us if you are interested in large vocabulary dictation and need help with creating decoding graphs for large language models.
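To get a feel for how large a phrase list is in n-gram terms, a minimal sketch like the one below counts the distinct n-grams in a set of phrases. It is a standalone estimate, not part of the KeenASR SDK: the helper names are hypothetical, the tokenization is a simple whitespace split, sentence-boundary tokens are ignored, and the limit passed to should_prebuild is a placeholder you would need to determine by testing on your target devices.

```python
# Standalone estimate only: NOT part of the KeenASR SDK.
# Helper names and the limit value are hypothetical.

def count_ngrams(phrases, max_order=3):
    """Count distinct n-grams of order 1..max_order across all phrases."""
    seen = set()
    for phrase in phrases:
        words = phrase.lower().split()  # naive whitespace tokenization
        for order in range(1, max_order + 1):
            for i in range(len(words) - order + 1):
                seen.add(tuple(words[i:i + order]))
    return len(seen)


def should_prebuild(phrases, limit, max_order=3):
    """True if the n-gram count exceeds a limit chosen by the developer."""
    return count_ngrams(phrases, max_order) > limit


phrases = ["turn on the light", "turn off the light"]
print(count_ngrams(phrases, max_order=2))  # 5 unigrams + 5 bigrams = 10
print(should_prebuild(phrases, limit=1000, max_order=2))  # False
```

A count like this only indicates relative size; the actual memory and time needed to build a graph depend on the device and on the options used when the graph is created.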

ASR Bundles contain acoustic models, a lexicon, and various configuration files. Each ASR Bundle is specific to a language and recognizer type. The acoustic models are typically trained on thousands of hours of transcribed speech data.

Note: We can provide ASR Bundles that work better in adverse acoustic environments, or that have a smaller memory footprint and lower CPU utilization. Contact us for details.