Quick Start Document

After you have installed the framework, you can start using it in your app by following these steps:

1. IMPORT HEADER FILE

Import the framework header file wherever you use the framework:

#import "KeenASR/KeenASR.h"


2. INITIALIZE THE ENGINE

Initialize the engine, preferably in your AppDelegate. For example:

// maybe change the log levels (default is warn)
// [KIOSRecognizer setLogLevel:KIOSRecognizerLogLevelInfo];

if (! [KIOSRecognizer sharedInstance]) {
   [KIOSRecognizer initWithASRBundle:@"librispeech-gmm-en-us"];

   // assumes we want to capture audio in the file (default is NO)
   // you can change this at any time, but be aware that if it's set
   // to YES, it will start filling up the device with audio recordings
   [KIOSRecognizer sharedInstance].createAudioRecordings = YES;
}

3. ASSIGN DELEGATE TO THE RECOGNIZER AND IMPLEMENT PROTOCOL METHODS

Make sure the controller that handles ASR activity implements the KIOSRecognizerDelegate protocol, and set it as the recognizer’s delegate:

 @interface MyViewController : UIViewController <KIOSRecognizerDelegate>

 ....
 // in init, or viewDidLoad
 [KIOSRecognizer sharedInstance].delegate = self;

Also implement at least the [KIOSRecognizerDelegate recognizerFinalResult:forRecognizer:] method to obtain the recognition result when the recognizer stops listening. You can also receive partial recognition results via [KIOSRecognizerDelegate recognizerPartialResult:forRecognizer:].

To handle audio interrupts properly, you should also implement the [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:] method. If an audio interrupt occurs (an incoming phone call, an SMS notification, the app going to the background), the framework will automatically stop listening. Once the interrupt is over, the framework will call [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:]; you can use this method to prepare the UI (e.g. re-enable the Start button if you have one) or to call startListening if your app is always listening.
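
A minimal sketch of the two delegate methods discussed above, continuing the MyViewController example. The parameter types (KIOSResult *, KIOSRecognizer *) are assumptions that this document does not state; check the KIOSRecognizerDelegate declaration in your version of the framework for the exact signatures.

```objc
#import "KeenASR/KeenASR.h"

// Assumes MyViewController is declared as shown earlier and adopts
// KIOSRecognizerDelegate.
@implementation MyViewController

// Called when the recognizer stops listening.
// Assumption: the result is delivered as a KIOSResult object.
- (void)recognizerFinalResult:(KIOSResult *)result
                forRecognizer:(KIOSRecognizer *)recognizer {
  NSLog(@"Final result: %@", result);
}

// Called once an audio interrupt (phone call, backgrounding) is over.
- (void)recognizerReadyToListenAfterInterrupt:(KIOSRecognizer *)recognizer {
  // Either re-enable your UI here, or resume right away if the app
  // is meant to be always listening:
  [recognizer startListening];
}

@end
```

Whether to resume listening immediately or wait for the user depends on the app; an app with a Start button would typically only re-enable the button here.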


4. CREATE DECODING GRAPH

A decoding graph combines the language model with all the other recognition resources (acoustic models, lexicon), and it is used by the recognizer to perform recognition on live audio or audio from a file. You can build a decoding graph dynamically, from within your app, in two different ways:

  1. By providing an ARPA language model file that is bundled with your app. In this case you will need to use the createDecodingGraphFromArpaFileAtURL:forRecognizer:andSaveWithName: method. We assume the ARPA language model was built in your development sandbox.

  2. By providing a list of sentences/phrases users are likely to say. In this case, the SDK will first build the ARPA language model and then create the decoding graph; you will need to use the createDecodingGraphFromSentences:forRecognizer:andSaveWithName: method.

You can also create a decoding graph offline, in your development sandbox, and bundle it with your app. This approach is recommended if your app listens for more than a couple of thousand words, because in that case building the decoding graph may require significant memory and may take too long on a mobile device.

The following example shows how a custom decoding graph can be created dynamically on the device, using information available on the device (the music library in this case); for details, check the source code on GitHub.

 // Since we are using data from the phone's music library, we will first
 // compose a list of relevant phrases
 NSArray *sentences = [self createMusicDemoSentences];
 if ([sentences count] == 0) {
   self.statusLabel.text = @"Unable to access music library";
   return;
 }

 // and create custom decoding graph named 'music' using those phrases
 if (! [KIOSDecodingGraph createDecodingGraphFromSentences:sentences forRecognizer:self.recognizer andSaveWithName:@"music"]) {
   NSLog(@"Error occurred while creating decoding graph from user's music library");
   return;
 }

This will create a custom decoding graph called ‘music’ and save it in the file system. Later on you can refer to this decoding graph by its name. You will typically create a decoding graph only once, or perhaps re-create it periodically when you know that the data used to build it may have changed.
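
For the ARPA-file option listed earlier, a comparable sketch might look as follows. The file name music.arpa and the helper function are hypothetical, and the sketch assumes that createDecodingGraphFromArpaFileAtURL:forRecognizer:andSaveWithName: is a KIOSDecodingGraph class method returning a BOOL, like the sentence-based method in the example above.

```objc
#import <Foundation/Foundation.h>
#import "KeenASR/KeenASR.h"

// Hypothetical helper: builds a decoding graph named 'music' from an ARPA
// file bundled with the app. Returns NO on failure.
static BOOL CreateMusicGraphFromBundledArpa(KIOSRecognizer *recognizer) {
  // "music.arpa" is a hypothetical file name; use your own bundled file
  NSURL *arpaURL = [[NSBundle mainBundle] URLForResource:@"music"
                                           withExtension:@"arpa"];
  if (arpaURL == nil) {
    NSLog(@"ARPA file not found in the app bundle");
    return NO;
  }

  // Assumption: class method on KIOSDecodingGraph that returns a BOOL
  if (! [KIOSDecodingGraph createDecodingGraphFromArpaFileAtURL:arpaURL
                                                  forRecognizer:recognizer
                                                andSaveWithName:@"music"]) {
    NSLog(@"Error occurred while creating decoding graph from ARPA file");
    return NO;
  }
  return YES;
}
```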

The KIOSDecodingGraph class requires extended ASR bundles (a lang/ subdirectory with various files needs to exist in the ASR bundle directory).

For large vocabulary support and offline creation of decoding graphs, see the Decoding Graphs and Acoustic Models section.

IMPORTANT: If sentences contain words that are not in the lexicon (ASRBUNDLE/lang/lexicon.txt), their phonetic transcriptions will be assigned algorithmically. Since this method is not perfect, and wrong phonetic transcriptions will affect recognition performance, you should aim to cover as many words as possible in the original lexicon by manually assigning accurate phonetic transcriptions. Contact us if you need help.


5. PREPARE TO LISTEN

Before starting to listen, you will need to tell KIOSRecognizer which decoding graph to use by calling the prepareForListeningWithCustomDecodingGraphWithName: or prepareForListeningWithCustomDecodingGraphAtPath: method. For a given decoding graph, you do this only once, before you start listening.

NSString *dgName = @"music";
if (! [self.recognizer prepareForListeningWithCustomDecodingGraphWithName:dgName]) {
   NSLog(@"Unable to prepare for listening with custom decoding graph called %@", dgName);
   return;
}


6. START/STOP LISTENING

To start capturing audio from the device mic and decoding it, call the recognizer’s startListening method. To stop, call its stopListening method. For example, if you are analyzing recognition results in the recognizerPartialResult:forRecognizer: method, you may want to stop the recognizer once the partial result conveys enough information for your app logic to proceed. Alternatively, you can wait until the recognizerFinalResult:forRecognizer: callback is called, indicating that the user stopped talking (according to the VAD parameters).

While the recognizer is listening, it will periodically (every 100-200 ms) call the delegate’s recognizerPartialResult:forRecognizer: method IF there are partial recognition results AND they differ from the most recent partial result.
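
The early-stop pattern described above could be sketched as follows. The KIOSResult parameter type and its text property are assumptions that this document does not confirm, and the trigger word is a hypothetical example; adapt both to your version of the framework and to your app logic.

```objc
// Place inside the @implementation of the class that acts as the
// recognizer's delegate (e.g. MyViewController).
- (void)recognizerPartialResult:(KIOSResult *)result
                  forRecognizer:(KIOSRecognizer *)recognizer {
  // Assumption: result.text holds the partial transcript as an NSString
  NSString *text = result.text;

  // Hypothetical app logic: once the user has said "play", there is
  // enough information to act, so stop without waiting for the VAD.
  // A nil text evaluates to NO here.
  if ([text localizedCaseInsensitiveContainsString:@"play"]) {
    [recognizer stopListening];
  }
}
```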

The recognizer will automatically stop listening when one of the Voice Activity Detection (VAD) rules triggers. You can control the VAD configuration parameters via the setVADParameter:toValue: method. When the recognizer stops listening because a VAD rule triggered, it will call the recognizerFinalResult:forRecognizer: method.

See the KIOSVadParameter constants for more details on the different VAD settings.
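
As a rough illustration, VAD settings might be adjusted as shown below. The two constant names and the assumption that values are expressed in seconds are not taken from this document; verify them against the KIOSVadParameter constants in your version of the framework before relying on this sketch.

```objc
#import "KeenASR/KeenASR.h"

// Hypothetical helper that tunes two VAD timeouts.
// Assumptions: the constant names below exist in KIOSVadParameter and
// the values are in seconds.
static void ConfigureVAD(KIOSRecognizer *recognizer) {
  // Stop sooner after trailing silence when the result looks like a good match
  [recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:1.0];

  // Never listen longer than this, regardless of what is being said
  [recognizer setVADParameter:KIOSVadTimeoutMaxDuration toValue:20.0];
}
```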

If an audio interrupt occurs (for example, a phone call comes in or the app goes to the background), the recognizer will automatically stop listening. No callbacks will be triggered when this happens, both because there is little time to react and because the interrupt occurs at an arbitrary moment, so recognition results may not properly reflect what was said. You can implement the [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:] method to be notified when the audio interrupt is over.

If you set the recognizer’s createAudioRecordings property to YES, you can use the recordingsDir and lastRecordingFilename properties of the recognizer to determine the path to the recorded file. The framework only creates the recording; it is your responsibility to do something with the audio file once it’s created. For example, you may want to play it back to the user (and eventually delete it), or send it to the backend (and delete it locally), so that you can assess how users are interacting with your app.
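
One way to locate and clean up the most recent recording is sketched below. The helper names are hypothetical, and the sketch assumes that recordingsDir and lastRecordingFilename are NSString properties; because this document does not say whether lastRecordingFilename is a bare file name or a full path, the code handles both.

```objc
#import <Foundation/Foundation.h>
#import "KeenASR/KeenASR.h"

// Returns the path of the most recent recording, or nil if none is available.
// Assumption: both properties are NSString values.
static NSString *LastRecordingPath(KIOSRecognizer *recognizer) {
  NSString *name = recognizer.lastRecordingFilename;
  if (name.length == 0) return nil;

  // Use the value directly if it is already a full path
  if ([name isAbsolutePath]) return name;

  NSString *dir = recognizer.recordingsDir;
  return dir ? [dir stringByAppendingPathComponent:name] : nil;
}

// Deletes the most recent recording once the app is done with it
// (for example, after playback or after uploading it to a backend).
static void DeleteLastRecording(KIOSRecognizer *recognizer) {
  NSString *path = LastRecordingPath(recognizer);
  if (path == nil) return;

  NSError *error = nil;
  if (! [[NSFileManager defaultManager] removeItemAtPath:path error:&error]) {
    NSLog(@"Unable to delete recording %@: %@", path, error);
  }
}
```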


7. SWITCHING DECODING GRAPHS

If your app needs to support multiple decoding graphs, you can either build them dynamically or bundle them with your app. At any time while the recognizer is not listening, you can call one of the prepareForListening methods to load a different decoding graph.
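
A minimal sketch of switching graphs, using only the methods named in this document. The helper function is hypothetical, and it assumes the named decoding graph has already been created and saved.

```objc
#import <Foundation/Foundation.h>
#import "KeenASR/KeenASR.h"

// Hypothetical helper: loads a different decoding graph and resumes listening.
// Call it only while the recognizer is not listening, for example after the
// final result callback has been received.
static BOOL SwitchToDecodingGraph(KIOSRecognizer *recognizer, NSString *dgName) {
  if (! [recognizer prepareForListeningWithCustomDecodingGraphWithName:dgName]) {
    NSLog(@"Unable to prepare for listening with custom decoding graph called %@", dgName);
    return NO;
  }
  [recognizer startListening];
  return YES;
}
```

For example, an app that has saved both a ‘music’ graph and a second, hypothetical ‘contacts’ graph could call SwitchToDecodingGraph([KIOSRecognizer sharedInstance], @"contacts") when the user moves to a different screen.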


8. OTHER

Our proof-of-concept app on GitHub showcases the use of the KeenASR framework in a few domains.

For more details on how to specify what the recognizer is listening for (the decoding graph), see Decoding Graphs and Acoustic Models.

NOTE: Make sure you are building/testing for the device, since the KeenASR framework doesn’t support the simulator.

NOTE: The trial version of the framework will force the app to exit (“crash”) after 10 minutes of use.