After you have installed the framework, you can start using it in your app by following these steps:

1. Import Header File

Import the framework header file wherever you use the framework:

#import "KeenASR/KeenASR.h"

2. Initialize the SDK

Initialize the SDK, preferably in your AppDelegate. For example:

// optionally change the log level (default is warn)
// [KIOSRecognizer setLogLevel:KIOSRecognizerLogLevelInfo];
     
if (! [KIOSRecognizer sharedInstance]) {
  [KIOSRecognizer initWithASRBundle:@"librispeechQT-nnet2-en-us"];

  // save audio recordings to files (default is NO); you can change this
  // at any time, but be aware that while it's set to YES, recordings will
  // keep accumulating on the device until you delete them
  [KIOSRecognizer sharedInstance].createAudioRecordings = YES;

  // you can also set other parameters here, like VAD (Voice Activity Detection)
  // settings
}

3. Assign Delegate to the Recognizer Instance and Implement Protocol Methods

Make sure the controller that handles ASR activity implements the KIOSRecognizerDelegate protocol, and set it as the delegate for the recognizer:

@interface MyViewController : UIViewController <KIOSRecognizerDelegate>

....
// in init, or viewDidLoad
[KIOSRecognizer sharedInstance].delegate = self;

You will also need to implement at least the recognizerFinalResult:forRecognizer: method to obtain the recognition result when the recognizer stops listening. You can also get notified about partial recognition results via recognizerPartialResult:forRecognizer:.
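
A minimal sketch of these two delegate methods, assuming the result object is a KIOSResult exposing a text property (check the framework reference for the exact types and fields):

- (void)recognizerPartialResult:(KIOSResult *)result forRecognizer:(KIOSRecognizer *)recognizer {
  // update the UI as the hypothesis evolves
  NSLog(@"Partial result: %@", result.text);
}

- (void)recognizerFinalResult:(KIOSResult *)result forRecognizer:(KIOSRecognizer *)recognizer {
  // act on the final hypothesis, re-enable UI controls, etc.
  NSLog(@"Final result: %@", result.text);
}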

In order to properly handle audio interrupts, you should also implement the recognizerReadyToListenAfterInterrupt: method. If an audio interrupt occurs (an incoming phone call, an SMS notification, the app going to the background), the SDK will automatically stop listening. Once the interrupt is over, it will call recognizerReadyToListenAfterInterrupt:; you can use this method to prepare the UI (e.g. re-enable the Start button if you have one) or to call startListening if your app is always listening.
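
For example (a sketch; startButton is a hypothetical UI element, and the always-listening branch assumes you track that mode yourself):

- (void)recognizerReadyToListenAfterInterrupt:(KIOSRecognizer *)recognizer {
  // the audio stack has been restored; it's safe to listen again
  self.startButton.enabled = YES;
  // or, if your app is always listening:
  // [recognizer startListening];
}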

4. Create Decoding Graph

A decoding graph combines the language model with all the other recognition resources (acoustic model, lexicon) and provides the recognizer with a data structure that simplifies the decoding process. You can build a decoding graph dynamically, from within your app, in two different ways:

  1. By providing a list of sentences/phrases users are likely to say. In this case, the SDK will first build an ARPA language model, and then create the decoding graph. This functionality is provided via the createDecodingGraphFromSentences:forRecognizer:andSaveWithName: method of the KIOSDecodingGraph class.

  2. By providing an ARPA language model file bundled with your app. In this case you will need to use the createDecodingGraphFromArpaFileAtURL:forRecognizer:andSaveWithName: method, as sketched below.
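
A minimal sketch of the second approach; the file name myLM.arpa and the graph name 'MyArpaGraph' are placeholders for your own resources:

// locate the ARPA language model file bundled with the app
NSURL *arpaURL = [[NSBundle mainBundle] URLForResource:@"myLM" withExtension:@"arpa"];
if (! arpaURL ||
    ! [KIOSDecodingGraph createDecodingGraphFromArpaFileAtURL:arpaURL
                                                forRecognizer:self.recognizer
                                              andSaveWithName:@"MyArpaGraph"]) {
   NSLog(@"Error occurred while creating decoding graph from ARPA file");
   return;
}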

A decoding graph can also be created offline, in a development sandbox, and bundled with the app or downloaded after the app has been installed. This approach is recommended if your app targets use cases where more than a few thousand words need to be recognized; in such cases, building the decoding graph may require significant memory and may take too long on a mobile device.

The following is an example of how a custom decoding graph can be created dynamically on the device:


// getPhrases may be a hard-coded list of commands, a more elaborate list of phrases
// (e.g. movie titles for a voice-search), or it may be based on data retrieved on the
// device (contact names, songs in the library, etc.)
NSArray *phrases = [self getPhrases];
if ([phrases count] == 0) {
   self.statusLabel.text = @"Unable to retrieve sentences for language model/decoding graph";
   return;
}
  
// and create custom decoding graph named 'MyDecodingGraph' using those phrases
if (! [KIOSDecodingGraph createDecodingGraphFromSentences:phrases forRecognizer:self.recognizer andSaveWithName:@"MyDecodingGraph"]) {
   NSLog(@"Error occured while creating decoding graph");
   return;
}

This will create a custom decoding graph called ‘MyDecodingGraph’ and save it in the file system. Later on you can refer to this decoding graph by its name. You will typically create a decoding graph only once, or perhaps re-create it periodically when you know that the data used to build it may have changed.

The KIOSDecodingGraph class requires an extended ASR bundle (a lang/ subdirectory with various files needs to exist in the ASR bundle directory).

For large vocabulary support and offline creation of decoding graphs, see the Decoding Graphs and Acoustic Models section.

5. Prepare to Listen

Before you start listening, you will need to tell the KIOSRecognizer instance which decoding graph to use by calling the prepareForListeningWithCustomDecodingGraphWithName: or prepareForListeningWithCustomDecodingGraphAtPath: method. For a given decoding graph you do this only once, before you start listening.

NSString *dgName = @"MyDecodingGraph";
if (! [self.recognizer prepareForListeningWithCustomDecodingGraphWithName:dgName]) {
   NSLog(@"Unable to prepare for listening with custom decoding graph called %@", dgName);
   return;
}

6. Start/Stop Listening

To start capturing audio from the device mic and decoding it, call the recognizer’s startListening method. While you can stop it by explicitly calling its stopListening method, we highly recommend that you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.
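
For example, a sketch assuming a hypothetical Start button wired to this action:

- (IBAction)startButtonTapped:(id)sender {
  // disable the button; re-enable it in recognizerFinalResult:forRecognizer:
  self.startButton.enabled = NO;
  [self.recognizer startListening];
}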

While the recognizer is listening, it will periodically (every 100-200ms) call the delegate’s recognizerPartialResult:forRecognizer: method, but only if there are partial recognition results and they differ from the most recent partial result.

The recognizer will automatically stop listening when one of the Voice Activity Detection (VAD) rules triggers. You can control VAD configuration parameters via the setVADParameter:toValue: method. When the recognizer stops listening due to VAD triggering, it will call the recognizerFinalResult:forRecognizer: method.

See KIOSVadParameter constants for more details on different VAD settings.
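
For example, a sketch that adjusts two of the timeouts; the specific KIOSVadParameter constants and the values shown are illustrative, so check the reference for the full list:

// wait for 1s of silence after a good match before concluding
[self.recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:1.0f];
// never listen for longer than 10s in a single session
[self.recognizer setVADParameter:KIOSVadTimeoutMaxDuration toValue:10.0f];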

If an audio interrupt occurs (for example, a phone call comes in or the app goes to the background), the recognizer will automatically stop listening and unwind its audio stack before the app goes to the background. No recognizer callbacks are triggered when this happens, both because there is little time and because the interrupt occurred at a random point (the recognition result may not properly reflect what was said). You can implement the recognizerReadyToListenAfterInterrupt: method to be notified when the audio interrupt is over.

If you set the recognizer’s createAudioRecordings property to YES, you can use the lastRecordingFilename property of the recognizer to determine the path to the recorded file.

The framework only creates the recording; it is your responsibility to do something with the audio file once it’s created. For example, you may want to play it back to the user (and eventually delete it), or send it to your backend (and delete it locally), so that you can assess how users are interacting with your app.
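
For example, a sketch that plays back the most recent recording with AVFoundation; self.audioPlayer is a hypothetical strong property you would add, and error handling is omitted:

#import <AVFoundation/AVFoundation.h>

NSString *path = self.recognizer.lastRecordingFilename;
// keep a strong reference (e.g. a property) so the player isn't deallocated mid-playback
self.audioPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:path]
                                                           error:nil];
[self.audioPlayer play];

// when you no longer need the file, delete it locally:
// [[NSFileManager defaultManager] removeItemAtPath:path error:nil];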

You can also control whether recognition result metadata should be stored on the device. If createJSONResultMetadata is set to YES, the recognizer instance will create a JSON file with various metadata about the device and the recognition result.
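
For example, alongside the other initialization settings:

[KIOSRecognizer sharedInstance].createJSONResultMetadata = YES;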

Keen Research customers have access to Dashboard, a cloud-based tool for development support. Using the KIOSUploader class you can set up a background upload thread that pushes audio recordings and JSON recognition results to Dashboard for further analysis.

7. Switching Decoding Graphs

If your app needs to support multiple decoding graphs, you can either build them dynamically or bundle them with your app. At any time while the recognizer is not listening, you can call one of the prepareForListening methods to load a different decoding graph.
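
For example (a sketch; 'MyOtherDecodingGraph' is a placeholder for another graph you have already created and saved):

// switch graphs between listening sessions, e.g. after receiving a final result
if ([self.recognizer prepareForListeningWithCustomDecodingGraphWithName:@"MyOtherDecodingGraph"]) {
   [self.recognizer startListening];
}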

8. Other

Our proof-of-concept app on GitHub showcases the use of the KeenASR framework in a few domains.

For more details on how to specify what the recognizer is listening for (the decoding graph), see Decoding Graphs and Acoustic Models.