After you have installed the framework, you can start using it in your app by following these steps:
1. Import the Header File
Import the framework header file wherever you are using the framework.
#import "KeenASR/KeenASR.h"
2. Initialize the SDK
Initialize the SDK, preferably in your AppDelegate. For example:
// consider changing the log levels (default is warn)
// [KIOSRecognizer setLogLevel:KIOSRecognizerLogLevelInfo];
if (! [KIOSRecognizer sharedInstance]) {
    [KIOSRecognizer initWithASRBundle:@"keenB2mQT-nnet3chain-en-us"];
    // you can also set other parameters here, like VAD (Voice Activity Detection)
    // settings
}
3. Assign Delegate to the Recognizer Instance and Implement Protocol Methods
Make sure the controller that handles the ASR activity implements the KIOSRecognizerDelegate protocol, and set it as the delegate for the recognizer:
@interface MyViewController : UIViewController <KIOSRecognizerDelegate>
....
// in init, or viewDidLoad
[KIOSRecognizer sharedInstance].delegate = self;
You will also need to implement at least the recognizerFinalResponse:forRecognizer: method to obtain the response, which contains the recognition result, when the recognizer stops listening. You can also get notifications about partial recognition results via recognizerPartialResult:forRecognizer:.
If you are planning to use trigger phrase functionality, you can implement the optional recognizerTriggerPhraseDetectedForRecognizer: method, which is called when a trigger phrase has been recognized. In this callback you can provide visual (and/or audio) feedback to the user to indicate that the trigger phrase has been recognized. Note that audio feedback may affect speech recognition performance, since audio played by the app will also be captured by the microphone. If users are using a headset, audio feedback might be more appropriate.
To handle audio interrupts properly, you should also implement the recognizerReadyToListenAfterInterrupt: method. If an audio interrupt occurs (an incoming phone call, an SMS notification, or the app going to the background), the SDK automatically stops listening. Once the interrupt is over, recognizerReadyToListenAfterInterrupt: is called; you can use it to prepare the UI (for example, to re-enable the Start button) or to call startListening if your app is always listening.
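As a rough illustration of the delegate methods described above, the sketch below shows one way a view controller might implement them. The parameter types, the result and text properties, and the startButton outlet are assumptions made for illustration; confirm the exact signatures against the KIOSRecognizerDelegate reference.

```objc
// Sketch only. Parameter types, response.result, result.text, and the
// startButton outlet are assumptions; check the SDK headers for exact signatures.
#import <UIKit/UIKit.h>
#import "KeenASR/KeenASR.h"

@interface MyViewController : UIViewController <KIOSRecognizerDelegate>
@property (nonatomic, weak) IBOutlet UIButton *startButton; // hypothetical outlet
@end

@implementation MyViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Register this controller as the recognizer's delegate.
    [KIOSRecognizer sharedInstance].delegate = self;
}

// Called once the recognizer has stopped listening.
- (void)recognizerFinalResponse:(KIOSResponse *)response
                  forRecognizer:(KIOSRecognizer *)recognizer {
    NSLog(@"Final result: %@", response.result.text);
}

// Called while listening, whenever the partial result has changed.
- (void)recognizerPartialResult:(KIOSResult *)result
                  forRecognizer:(KIOSRecognizer *)recognizer {
    NSLog(@"Partial result: %@", result.text);
}

// Called when an audio interrupt is over; re-enable the UI here.
- (void)recognizerReadyToListenAfterInterrupt:(KIOSRecognizer *)recognizer {
    self.startButton.enabled = YES;
}

@end
```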
4. Create the Decoding Graph
The decoding graph combines the language model with all other recognition resources (acoustic models, lexicon) and provides the recognizer with a data structure that simplifies the decoding process. You can build the decoding graph dynamically from within your app by providing a list of phrases users are likely to say. In this case, the SDK will first build the n-gram language model and then create the decoding graph. This functionality is provided through the createDecodingGraphFromPhrases:forRecognizer:andSaveWithName: method of the KIOSDecodingGraph class; there are several variants of this method that provide additional configuration parameters for building decoding graphs.
You can also create the decoding graph offline in a development sandbox and either bundle it with your app or have the app download it after installation. This approach is recommended if your app targets use cases where the decoding graph covers a larger domain (more than a few thousand words). For large vocabulary recognition tasks, building the decoding graph may be too memory intensive to be performed on a mobile device.
The following example illustrates how you can create a custom decoding graph dynamically on the device:
// getPhrases may be a hard-coded list of commands, a more elaborate list of phrases
// (e.g., movie titles for a voice-search), or it may be based on data retrieved on the
// device (contact names, songs in the library, etc.)
NSArray *phrases = [self getPhrases];
if ([phrases count] == 0) {
    self.statusLabel.text = @"Unable to retrieve sentences for language model/decoding graph";
    return;
}
// create a custom decoding graph named 'MyDecodingGraph' using those phrases
if (! [KIOSDecodingGraph createDecodingGraphFromPhrases:phrases forRecognizer:self.recognizer andSaveWithName:@"MyDecodingGraph"]) {
    NSLog(@"Error occurred while creating decoding graph");
    return;
}
This example creates a custom decoding graph called ‘MyDecodingGraph’ and saves it in the file system. Later on you can refer to this decoding graph by its name. You will typically create the decoding graph only once, and re-create it only when the data used to build it may have changed.
For trigger phrase support, create the decoding graph using the createDecodingGraphFromPhrases:withTriggerPhrase:forRecognizer:andSaveWithName: method of the KIOSDecodingGraph class.
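A minimal sketch of building a graph with trigger phrase support might look like the following. The selector is the one named above; the phrase list, the trigger phrase, the graph name, the NSString argument type, and the BOOL return value (assumed by analogy with the variant without a trigger phrase) are illustrative assumptions.

```objc
// Sketch only. Phrases, trigger phrase, and graph name are made-up examples;
// the argument types and BOOL return value are assumptions.
#import "KeenASR/KeenASR.h"

static BOOL BuildTriggerPhraseGraph(void) {
    NSArray *phrases = @[@"turn on the lights", @"turn off the lights"];
    NSString *triggerPhrase = @"hey computer";

    BOOL ok = [KIOSDecodingGraph createDecodingGraphFromPhrases:phrases
                                              withTriggerPhrase:triggerPhrase
                                                  forRecognizer:[KIOSRecognizer sharedInstance]
                                                andSaveWithName:@"MyTriggerGraph"];
    if (!ok) {
        NSLog(@"Error occurred while creating decoding graph with trigger phrase");
    }
    return ok;
}
```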
For information on large vocabulary support and offline creation of decoding graphs, see Decoding Graphs and Acoustic Models.
To define alternative pronunciations for words, create instances of the KIOSAlternativePronunciation
class, and pass those to the methods that create decoding graphs. Contact us if you need help.
5. Prepare to Listen
Before starting to listen, you need to tell the KIOSRecognizer instance which decoding graph to use by calling either the prepareForListeningWithDecodingGraphWithName:withGoPComputation: or the prepareForListeningWithDecodingGraphAtPath:withGoPComputation: method. For a given decoding graph, you do this only once before you start listening.
NSString *dgName = @"MyDecodingGraph";
if (! [self.recognizer prepareForListeningWithDecodingGraphWithName:dgName withGoPComputation:NO]) {
    NSLog(@"Unable to prepare for listening with custom decoding graph called %@", dgName);
    return;
}
When you want to switch to a different decoding graph, call this method again with the name of that graph. This unloads the current graph from memory and loads the new one.
6. Start/Stop Listening
To start capturing audio from the device microphone and decoding it, call the recognizer’s startListening method. While you can stop listening explicitly by calling the stopListening method, we highly recommend that you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.
While the recognizer is listening, it periodically (every 100-200 ms) calls the delegate’s recognizerPartialResult:forRecognizer: method, but only if there are partial recognition results and they differ from the most recent partial result.
The recognizer automatically stops listening when one of the Voice Activity Detection (VAD) rules is triggered. You can control VAD configuration parameters through the setVADParameter:toValue: method. When the recognizer stops listening due to VAD triggering, it calls the recognizerFinalResponse:forRecognizer: method.
Refer to the KIOSVadParameter constants for information on different VAD settings.
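The following sketch shows how VAD parameters might be adjusted before listening starts. The KIOSVadParameter constant names and the timeout values used here are assumptions and should be verified against the SDK reference; the sketch also assumes the recognizer has already been prepared with a decoding graph.

```objc
// Sketch only. The constant names and values below are assumptions;
// confirm them against the KIOSVadParameter reference before use.
#import "KeenASR/KeenASR.h"

static void ConfigureVADAndStartListening(void) {
    KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];

    // Stop after about 1 second of silence following a good match (assumed constant).
    [recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:1.0];
    // Stop after at most 10 seconds of listening (assumed constant).
    [recognizer setVADParameter:KIOSVadTimeoutMaxDuration toValue:10.0];

    // Start capturing audio; the recognizer stops on its own when a VAD rule triggers.
    [recognizer startListening];
}
```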
When using a decoding graph created with trigger phrase support, the logic is slightly different. After you call the startListening method, the recognizer listens continuously until the trigger phrase is recognized. At that point it calls the recognizerTriggerPhraseDetectedForRecognizer: method and switches back to regular listening mode. In regular listening mode, the recognizer reports partial results and stops listening after one of the VAD rules triggers. In your recognizerFinalResponse:forRecognizer: callback you might act upon the command and then, depending on the logic of your application, call startListening again to continue listening.
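The trigger phrase flow described above could be sketched roughly as follows. The CommandListener class and its handleCommand: method are hypothetical, and the parameter types and the result.text property are assumptions rather than confirmed SDK signatures.

```objc
// Sketch only. CommandListener and handleCommand: are hypothetical;
// parameter types and response.result.text are assumptions.
#import <Foundation/Foundation.h>
#import "KeenASR/KeenASR.h"

@interface CommandListener : NSObject <KIOSRecognizerDelegate>
@end

@implementation CommandListener

// The trigger phrase was heard; the recognizer now listens for a command.
- (void)recognizerTriggerPhraseDetectedForRecognizer:(KIOSRecognizer *)recognizer {
    NSLog(@"Trigger phrase detected; listening for a command");
}

// A VAD rule stopped the recognizer: act on the command, then resume listening
// so the recognizer waits for the trigger phrase again.
- (void)recognizerFinalResponse:(KIOSResponse *)response
                  forRecognizer:(KIOSRecognizer *)recognizer {
    [self handleCommand:response.result.text];
    [recognizer startListening];
}

// Hypothetical app-specific command handling.
- (void)handleCommand:(NSString *)text {
    NSLog(@"Command: %@", text);
}

@end
```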
If an audio interrupt occurs (for example, a phone call comes in or the app goes to the background), the recognizer automatically stops listening and unwinds its audio stack before the app moves to the background. No recognizer callbacks are triggered when this happens, both because there is no time and because the interrupt occurs at an arbitrary point, so the recognition results may not correctly reflect what was said. You can implement the recognizerReadyToListenAfterInterrupt: method to be notified when the audio interrupt is over.
The recognizerFinalResponse:forRecognizer: callback provides you with the KIOSResponse object, which contains a KIOSResult as well as various other information about the most recent interaction. KIOSResponse also provides ways to save the corresponding audio and JSON in the filesystem.
Keen Research customers have access to Dashboard, a cloud-based tool for development support. Using the KIOSUploader class, you can set up a background upload thread which will automatically push audio recordings and JSON recognition results to Dashboard for further analysis.
7. Switching Decoding Graphs
If your app needs to support multiple decoding graphs, you can either build them dynamically or bundle them with your app. At any time while the recognizer is not listening, you can call one of the methods from the prepareForListening family to unload the current graph and load a new one.
8. Other
The proof of concept app on GitHub showcases the use of the KeenASR framework in a few domains.
For information on how to specify what the recognizer is listening for (decoding graph), refer to Decoding Graphs and Acoustic Models.