After you have installed the framework, you can start using it in your app by following these steps:
1. Import the Header File
Import the framework header file wherever you are using the framework.
#import "KeenASR/KeenASR.h"
2. Initialize the SDK
Initialize the SDK, preferably in your AppDelegate. For example:
// consider changing the log level (default is warn)
// [KIOSRecognizer setLogLevel:KIOSRecognizerLogLevelInfo];
if (! [KIOSRecognizer sharedInstance]) {
  [KIOSRecognizer initWithASRBundle:@"keenB2mQT-nnet3chain-en-us"];
  // set to YES if you want captured audio saved to a file (default is NO);
  // you can change this at any time, but be aware that when the parameter is
  // set to YES, the app will keep filling the device with audio recordings
  [KIOSRecognizer sharedInstance].createAudioRecordings = YES;
  [KIOSRecognizer sharedInstance].createJSONResultMetadata = YES;
  // you can also set other parameters here, for example VAD (Voice Activity
  // Detection) settings
}
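If you initialize the SDK in your app delegate, the code above would typically go into application:didFinishLaunchingWithOptions:. A minimal sketch (class and file names follow the standard Xcode template; the bundle name matches the example above and should be replaced with the ASR bundle you ship):
// AppDelegate.m
#import "KeenASR/KeenASR.h"

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application
    didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
  if (! [KIOSRecognizer sharedInstance]) {
    [KIOSRecognizer initWithASRBundle:@"keenB2mQT-nnet3chain-en-us"];
    [KIOSRecognizer sharedInstance].createAudioRecordings = YES;
    [KIOSRecognizer sharedInstance].createJSONResultMetadata = YES;
  }
  return YES;
}

@end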
3. Assign Delegate to the Recognizer Instance and Implement Protocol Methods
Make sure the controller that handles the ASR activity implements the KIOSRecognizerDelegate protocol, and set it as the recognizer's delegate:
@interface MyViewController : UIViewController <KIOSRecognizerDelegate>
....
// in init, or viewDidLoad
[KIOSRecognizer sharedInstance].delegate = self;
You will also need to implement at least the recognizerFinalResult:forRecognizer: method, to obtain the recognition result when the recognizer stops listening. You can also get notified about partial recognition results via recognizerPartialResult:forRecognizer:.
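A minimal sketch of these two delegate methods (the KIOSResult argument type is assumed here; check the protocol declaration in the framework header for the exact signatures):
- (void)recognizerPartialResult:(KIOSResult *)result forRecognizer:(KIOSRecognizer *)recognizer {
  // called periodically while the recognizer is listening; update the UI with
  // the hypothesis so far (we rely on the result object's description here)
  NSLog(@"Partial result: %@", result);
}

- (void)recognizerFinalResult:(KIOSResult *)result forRecognizer:(KIOSRecognizer *)recognizer {
  // called once, after the recognizer stops listening
  NSLog(@"Final result: %@", result);
  // act on the result here (update the UI, trigger app logic, etc.)
}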
If you are planning to use trigger phrase functionality, you can implement the optional recognizerTriggerPhraseRecognized: method, which will be called when a trigger phrase has been recognized. In this callback method you can provide visual (and/or audio) feedback to the user to indicate that the trigger phrase has been recognized.
In order to properly handle audio interrupts, you should also implement the recognizerReadyToListenAfterInterrupt: method. If an audio interrupt (incoming phone call, SMS notification, the app going to the background) occurs, the SDK automatically stops listening. Once the interrupt is over, the recognizerReadyToListenAfterInterrupt: method is triggered; you can use it to prepare the UI (for example, to re-enable the Start button) or to call startListening if your app is always listening. Note that audio feedback may affect speech recognition performance, since audio played by the app will also be captured by the microphone. If users are wearing a headset, audio feedback may be more appropriate.
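A sketch of the interrupt callback (startButton is a hypothetical outlet in your controller):
- (void)recognizerReadyToListenAfterInterrupt:(KIOSRecognizer *)recognizer {
  // the audio interrupt is over; re-enable the UI so the user can start
  // listening again
  self.startButton.enabled = YES;
  // alternatively, if your app is always listening, you could call
  // [recognizer startListening] here instead
}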
4. Create the Decoding Graph
The decoding graph combines the language model with all other recognition resources (acoustic models, lexicon) and provides the recognizer with a data structure that simplifies the decoding process. You can build the decoding graph dynamically from within your app in two different ways:
- By providing a list of sentences/phrases users are likely to say. In this case, the SDK will first build the n-gram language model and then create the decoding graph. This functionality is provided through the createDecodingGraphFromSentences:forRecognizer:andSaveWithName: method of the KIOSDecodingGraph class.
- By providing an ARPA language model file bundled with your app. In this case you will need to use the createDecodingGraphFromArpaFileAtURL:forRecognizer:andSaveWithName: method.
You can also create the decoding graph offline in a development sandbox and bundle it with your app or have the app download the decoding graph after the app has been installed. This approach is recommended if your app is targeting use cases where more than a few thousand words need to be recognized. For large vocabulary recognition tasks building the decoding graph may be too memory intensive to be performed on a mobile device.
The following example illustrates how you can create a custom decoding graph dynamically on the device:
// getPhrases may be a hard-coded list of commands, a more elaborate list of phrases
// (e.g., movie titles for voice search), or it may be based on data retrieved on the
// device (contact names, songs in the library, etc.)
NSArray *phrases = [self getPhrases];
if ([phrases count] == 0) {
  self.statusLabel.text = @"Unable to retrieve sentences for language model/decoding graph";
  return;
}
// create a custom decoding graph named 'MyDecodingGraph' using those phrases
if (! [KIOSDecodingGraph createDecodingGraphFromSentences:phrases
                                            forRecognizer:self.recognizer
                                          andSaveWithName:@"MyDecodingGraph"]) {
  NSLog(@"Error occurred while creating decoding graph");
  return;
}
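The getPhrases helper in the example above is app specific; a trivial version that returns a hard-coded list of commands might look like this (purely illustrative):
- (NSArray *)getPhrases {
  // a hard-coded list of commands; in a real app this could be built from
  // contact names, song titles, or data fetched from your back end
  return @[@"turn on the lights",
           @"turn off the lights",
           @"what is the temperature"];
}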
This example creates a custom decoding graph called ‘MyDecodingGraph’ and saves it in the file system. Later on you can refer to this decoding graph by its name. You will typically create the decoding graph only once, and re-create it only when you know that the data used to build it may have changed.
Note that the KIOSDecodingGraph class requires extended ASR bundles (a lang/ subdirectory with various files must exist in the ASR bundle directory).
For trigger phrase support, you would create a decoding graph using the createDecodingGraphFromSentences:withTriggerPhrase:forRecognizer:andSaveWithName: method of the KIOSDecodingGraph class.
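For example (the trigger phrase string and graph name are illustrative):
NSArray *phrases = [self getPhrases];
if (! [KIOSDecodingGraph createDecodingGraphFromSentences:phrases
                                         withTriggerPhrase:@"hey computer"
                                             forRecognizer:self.recognizer
                                           andSaveWithName:@"MyTriggerGraph"]) {
  NSLog(@"Error occurred while creating trigger-phrase decoding graph");
}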
For information on large vocabulary support and offline creation of decoding graphs, see Decoding Graphs and Acoustic Models.
5. Prepare to Listen
Before starting to listen you will need to tell the KIOSRecognizer instance which decoding graph to use, by calling the prepareForListeningWithCustomDecodingGraphWithName: or the prepareForListeningWithCustomDecodingGraphAtPath: method. For a given decoding graph you need to do this only once before you start listening.
NSString *dgName = @"MyDecodingGraph";
if (! [self.recognizer prepareForListeningWithCustomDecodingGraphWithName:dgName]) {
  NSLog(@"Unable to prepare for listening with custom decoding graph called %@", dgName);
  return;
}
6. Start/Stop Listening
To start capturing audio from the device microphone and decoding it, call the recognizer’s startListening method. While you can stop the recognizer explicitly by calling the stopListening method, we highly recommend you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.
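For example, wired to a button action (the IBAction name and startButton outlet are illustrative):
- (IBAction)startListeningButtonTapped:(id)sender {
  // disable the button so the user can't tap it again while listening
  self.startButton.enabled = NO;
  [self.recognizer startListening];
}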
While the recognizer is listening, it periodically (every 100-200 ms) calls the delegate’s recognizerPartialResult:forRecognizer: method, if there are partial recognition results and they differ from the most recent partial result.
The recognizer automatically stops listening when one of the Voice Activity Detection (VAD) rules is triggered. You can control VAD configuration parameters through the setVADParameter:toValue: method. When the recognizer stops listening due to VAD triggering, it will call the recognizerFinalResult:forRecognizer: method.
Refer to the KIOSVadParameter constants for information on the different VAD settings.
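For example, to allow a longer pause before the recognizer stops listening, you could adjust one of the end-silence timeouts. The constant name below is one of the KIOSVadParameter values as best we recall; verify it against the framework header:
// allow up to 1.2 seconds of silence after a good match before the
// recognizer stops listening (constant name assumed; check KIOSVadParameter)
[self.recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:1.2];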
When using a decoding graph created with trigger phrase support, the logic is slightly different. After you call the startListening method, the recognizer listens continuously until the trigger phrase is recognized. At that point the recognizerTriggerPhraseDetectedForRecognizer method is called and the recognizer switches to regular listening mode, in which it reports partial results and stops listening after one of the VAD rules triggers. In your recognizerFinalResult:forRecognizer: callback method you might act upon the command and then, depending on the logic of your application, call startListening again to continue listening.
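Extending the final-result sketch from step 3, a trigger-phrase setup might look like this (handleCommand: is a hypothetical helper in your controller; the KIOSResult argument type is assumed):
- (void)recognizerFinalResult:(KIOSResult *)result forRecognizer:(KIOSRecognizer *)recognizer {
  // act on the recognized command
  [self handleCommand:result];
  // if the app should keep listening for the trigger phrase, start again
  [recognizer startListening];
}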
If an audio interrupt occurs (for example, a phone call comes in or the app goes to the background), the recognizer automatically stops listening and unwinds its audio stack before the app moves to the background. No recognizer callbacks will be triggered when this happens: there is no time to deliver them before the app is backgrounded, and because the interrupt occurs at an arbitrary point, the recognition results may not correctly reflect what was said. You can implement the recognizerReadyToListenAfterInterrupt: method to be notified when the audio interrupt is over.
If you set the recognizer’s createAudioRecordings property to YES, you can use the recognizer’s lastRecordingFilename property to determine the path to the recorded file.
The framework only creates the recording; it is your responsibility to do something with the audio file once it is created. For example, you may want to play it back to the user (and delete it eventually), or send it to the back end and delete it locally, so that you can assess how users are interacting with your app.
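For example, after the recognizer has stopped listening:
// retrieve the path to the most recent audio recording
NSString *recordingPath = [KIOSRecognizer sharedInstance].lastRecordingFilename;
if (recordingPath) {
  NSURL *recordingURL = [NSURL fileURLWithPath:recordingPath];
  // play it back, upload it, or delete it -- the file is your responsibility
  NSLog(@"Last audio recording saved at %@", recordingURL);
}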
You can control whether recognition result metadata should be stored on the device. If createJSONResultMetadata is set to YES, the recognizer instance will create a JSON file populated with metadata from the device and the recognition result.
Keen Research customers have access to Dashboard, a cloud-based tool for development support. Using the KIOSUploader class you can set up a background upload thread which will push audio recordings and JSON recognition results to Dashboard for further analysis.
7. Switching Decoding Graphs
If your app needs to support multiple decoding graphs, you can either dynamically build multiple decoding graphs or bundle multiple decoding graphs with your app. At any time while the recognizer is not listening, you can call one of the prepareForListening methods to load a different decoding graph.
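For example (the graph name is illustrative; the graph must have been created previously):
// while the recognizer is not listening, switch to a different decoding graph
if (! [self.recognizer prepareForListeningWithCustomDecodingGraphWithName:@"MyOtherDecodingGraph"]) {
  NSLog(@"Unable to switch to decoding graph MyOtherDecodingGraph");
}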
8. Other
The proof-of-concept app on GitHub showcases the use of the KeenASR framework in a few domains.
For information on how to specify what the recognizer is listening for (decoding graph), refer to Decoding Graphs and Acoustic Models.