KeenASR Framework Reference Documentation

The KeenASR framework provides on-device automated speech recognition for iOS devices. Speech recognition is performed entirely on the device; no internet connectivity or backend/cloud support is needed. On recent Apple devices (iPhone 6s and newer), recognition occurs in real time for tasks with vocabularies of up to 80,000 words, using deep neural network recognizers.

The framework currently supports GMM, NNet2, and NNet3 (including chain) Kaldi acoustic models. We provide ASR bundles (acoustic models) based on publicly available Kaldi acoustic models from http://kaldi-asr.org/downloads/all/, as well as custom-built acoustic models for different populations and tasks. If you are interested in custom ASR bundles (e.g. NNet3 chain) or population- or domain-specific models, please contact us.

NNet decoders are based on deep neural networks and provide significantly better accuracy on medium and large vocabulary tasks. They are also more CPU intensive than GMM decoders, and for larger language models (more than a few thousand words) they are likely to fully utilize the CPU on recent iOS devices (e.g. iPhone 6) while the recognizer is listening/decoding. NNet3 chain acoustic models utilize about 40-50% of a single CPU core on an iPhone 6 for medium-size vocabularies (a couple of thousand words).

To get started, review the Installation, Quick Start, and Decoding Graphs & Acoustic Models documents.