KeenASR Framework v2.1 (8b72cc4)
Keen Research
#include <KIOSRecognizer.h>
Instance Methods | |
Config Parameters | |
| (void) | - setVADParameter:toValue: |
Deprecated methods and properties | |
| (NSString *) | - lastRecordingFilename __deprecated |
| (NSString *) | - lastJSONMetadataFilename __deprecated |
| (BOOL) | - prepareForListeningWithCustomDecodingGraphWithName: |
| (BOOL) | - prepareForListeningWithCustomDecodingGraphAtPath: |
| (BOOL) | - startListeningFromAudioFile: |
| (void) | - enableBluetoothOutput: |
| (void) | - enableBluetoothA2DPOutput: |
Class Methods | |
Other | |
| (nonnull NSString *) | + version |
| (void) | + setLogLevel: |
Properties | |
| id< KIOSRecognizerDelegate > | delegate |
| KIOSRecognizerState | recognizerState |
| NSString * | asrBundlePath |
| NSString * | asrBundleName |
| NSString * | currentDecodingGraphName |
| NSString * | recordingsDir |
| NSString * | miscDataDirectory |
| BOOL | rescore |
| (nullable KIOSRecognizer *) | + sharedInstance |
Audio Handling | |
| BOOL | handleNotifications |
| (BOOL) | + echoCancellationAvailable |
| (float) | - inputLevel |
| (BOOL) | - performEchoCancellation: |
| (void) | - setBluetoothA2DPOutput: |
| (BOOL) | - deactivateAudioStack |
| (BOOL) | - activateAudioStack |
| (void) | - reinitAudioStack |
Initialization, Preparing, Starting, and Stopping Recognition | |
| (BOOL) | + initWithASRBundle: |
| (BOOL) | + initWithASRBundleAtPath: |
| (instancetype) | + new __attribute__((unavailable("new not available, call sharedInstance instead"))) |
| (BOOL) | + teardown |
| (void) | - setVadGating: |
| (BOOL) | - prepareForListeningWithDecodingGraphWithName:withGoPComputation: |
| (BOOL) | - prepareForListeningWithDecodingGraphAtPath:withGoPComputation: |
| (BOOL) | - prepareForListeningWithContextualDecodingGraphWithName:andContextId:withGoPComputation: |
| (BOOL) | - prepareForListeningWithContextualDecodingGraphAtPath:andContextId:withGoPComputation: |
| (BOOL) | - startListening: |
| (void) | - stopListening |
Speaker Adaptation | |
| (BOOL) | + removeAllSpeakerAdaptationProfiles |
| (BOOL) | + removeSpeakerAdaptationProfiles: |
| (void) | - adaptToSpeakerWithName: |
| (void) | - resetSpeakerAdaptation |
| (void) | - saveSpeakerAdaptationProfile |
An instance of the KIOSRecognizer class manages recognizer resources and provides speech recognition capabilities to your application.
You typically initialize the engine at app startup by calling the +initWithASRBundle: or +initWithASRBundleAtPath: method, and then use the sharedInstance method whenever you need to access the recognizer.
Recognition results are provided via callbacks. To obtain results, one of your classes will need to adopt the [KIOSRecognizerDelegate protocol](KIOSRecognizerDelegate) and implement some of its methods.
In order to properly handle audio interrupts, you will need to implement the [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:] callback method, in which you should perform audio playback cleanup (stop playing audio). This allows the KeenASR SDK to properly deactivate the audio session before the app goes to the background.
You can optionally implement the [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:] callback method, which will trigger after KIOSRecognizer is fully set up when the app comes back to the foreground. This is where you may refresh the UI state of the app.
Initialization example:
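The sketch below is a minimal illustration, assuming the ASR Bundle name used elsewhere in this document ("librispeechQT-nnet2-en-us") is packaged with the app; substitute the name of your own bundle.

```objectivec
// Typically in application:didFinishLaunchingWithOptions: (or equivalent
// startup code). The bundle name is a placeholder.
if (![KIOSRecognizer initWithASRBundle:@"librispeechQT-nnet2-en-us"]) {
    NSLog(@"Failed to initialize KeenASR with the given ASR Bundle");
    return NO;
}
// Later, anywhere in the app:
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];
recognizer.delegate = self; // self adopts the KIOSRecognizerDelegate protocol
```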
After initialization, audio captured while the recognizer is listening will be used for online speaker adaptation. You can name speaker adaptation profiles via adaptToSpeakerWithName:, persist profiles in the filesystem via saveSpeakerAdaptationProfile, and reset adaptation via resetSpeakerAdaptation.
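A typical adaptation flow using the three methods above might look like the following sketch (the speaker name "user1" is a placeholder):

```objectivec
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];

// Tie adaptation to the current (pseudo)user before listening starts.
[recognizer adaptToSpeakerWithName:@"user1"];

// ... run one or more listening sessions ...

// Persist the adapted profile so future sessions start from it.
[recognizer saveSpeakerAdaptationProfile];

// When a different, unknown user takes over, fall back to the baseline.
[recognizer resetSpeakerAdaptation];
```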
| - (BOOL) activateAudioStack |
Activates the audio stack that was previously deactivated via the deactivateAudioStack method. This method should be called after all other audio systems have been set up, to make sure AVAudioSession is properly initialized for audio capture.
| - (void) adaptToSpeakerWithName: | (nonnull NSString *) | speakerName |
Defines the name that will be used to uniquely identify the speaker adaptation profile. When the recognizer starts to listen, it will try to find a matching speaker profile in the filesystem (profiles are matched based on the speaker name, ASR Bundle, and audio route). When the saveSpeakerAdaptationProfile method is called, this name is used to uniquely identify the profile file saved in the filesystem.
| speakerName | (pseudo)name of the speaker for which adaptation is to be performed. Default value is 'default'. |
The name used here does not have to correspond to the user's real name (thus we call it a pseudo name). The exact value does not matter as long as you can match it to a specific user in your app. For example, you could use 'user1', 'user2', etc.
In-memory speaker adaptation profile can always be reset by calling resetSpeakerAdaptation.
If this method is called while recognizer is listening, it will only affect subsequent calls to startListening methods.
| - (BOOL) deactivateAudioStack |
Deactivates the audio session and the KeenASR audio stack. If handleNotifications is set to YES, you will not need to use this method or its counterpart activateAudioStack; the KeenASR Framework will handle audio interrupts and notifications when the app goes to the background/foreground.
If your app handles notifications explicitly (handleNotifications is set to NO), you may want to call this method when an audio interrupt occurs. If the recognizer is listening, this method will automatically stop listening and then deactivate the audio stack. When the app becomes active again or the audio interrupt ends, you will need to call activateAudioStack.
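A sketch of manual handling, assuming your app observes AVAudioSession interruption notifications itself (the notification hooks shown in comments are the standard AVAudioSession ones, not part of this SDK):

```objectivec
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];
recognizer.handleNotifications = NO; // app takes over interrupt handling

// On AVAudioSessionInterruptionTypeBegan (or when the app is backgrounded):
[recognizer stopListening];
[recognizer deactivateAudioStack];

// On AVAudioSessionInterruptionTypeEnded (or when the app is foregrounded):
[recognizer activateAudioStack];
```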
| + (BOOL) echoCancellationAvailable |
Provides information about echo cancellation support on the device.
| + (BOOL) initWithASRBundle: | (nonnull NSString *) | bundleName |
Initialize the ASR engine with the ASR Bundle, which provides all the resources necessary for initialization. You will use this initialization method if you included the ASR Bundle with your application. See also initWithASRBundleAtPath: for scenarios where the ASR Bundle is not included with the app but downloaded after the app has been installed. SDK initialization needs to occur before any other work can be performed.
| bundleName | name of the ASR Bundle, a directory containing all the resources necessary for the specific recognizer type. This will typically include all acoustic model related files and configuration files. The bundle directory should contain the decode.conf configuration file, which can be augmented with additional config params; currently, that is the only way to pass various settings to the decoder. All path references in config files should be relative to the app root directory (e.g. librispeechQT-nnet2-en-us/mfcc.conf). The init method will initialize the appropriate recognizer type based on the name and content of the ASR bundle. |
| + (BOOL) initWithASRBundleAtPath: | (nonnull NSString *) | pathToASRBundle |
Initialize ASR engine with the ASR Bundle located at provided path. This is an alternative method to initialize the SDK, which you would use if you did not package ASR Bundle with your application but instead downloaded it after the app has been installed. SDK initialization needs to occur before any other work can be performed.
| pathToASRBundle | full path to the ASR Bundle. For more details about ASR Bundles see initWithASRBundle: |
| - (float) inputLevel |
Retrieves the peak RMS audio level computed from the most recent audio buffer processed by the recognizer. RMS is computed on 25 ms chunks of audio, and the peak value from the most recent audio buffer is returned.
If the recognizer is not listening, or if no valid RMS level has been computed, returns NaN (see std::nan).
| - (BOOL) performEchoCancellation: | (BOOL) | value |
EXPERIMENTAL Specifies if echo cancellation should be performed. If value is set to YES and the device supports echo cancellation, then audio played by the application will be removed from the audio captured via the microphone.
| value | set to YES to turn on echo cancellation processing, NO to turn it off. Default is NO. |
| - (BOOL) prepareForListeningWithContextualDecodingGraphAtPath: | (nonnull NSString *) | dgPath | |
| andContextId: | (nonnull NSNumber *) | contextId | |
| withGoPComputation: | (BOOL) | computeGoP |
Prepare for recognition by loading a custom decoding graph that was typically bundled with the application.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
Goodness of pronunciation scoring (GoP) requires ASR Bundle with relevant models; if such models are not available in the ASR Bundle, GoP scores will not be computed regardless of the computeGoP setting.
| dgPath | absolute path to the decoding graph directory which was created ahead of time and packaged with the app. |
| contextId | id of the context that should be used. This number will be in the range 0 to contextualPhrases.length, where contextualPhrases is the <NSArray<NSArray *>> used to build the contextual graph. |
| computeGoP | goodness of pronunciation scores will be computed if this parameter is set to TRUE |
| - (BOOL) prepareForListeningWithContextualDecodingGraphWithName: | (nonnull NSString *) | dgName | |
| andContextId: | (nonnull NSNumber *) | contextId | |
| withGoPComputation: | (BOOL) | computeGoP |
Prepare for recognition by loading decoding graph that was prepared via [createContextualDecodingGraphFromPhrases:forRecognizer:usingAlternativePronunciations:andTask:andSaveWithName:] family of methods.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
Goodness of pronunciation scoring (GoP) requires ASR Bundle with relevant models; if such models are not available in the ASR Bundle, GoP scores will not be computed regardless of the computeGoP setting.
| dgName | name of the decoding graph |
| contextId | id of the context that should be used. This number will be in the range 0 to contextualPhrases.length, where contextualPhrases is the <NSArray<NSArray *>> used to build the contextual graph. |
| computeGoP | goodness of pronunciation scores will be computed if this parameter is set to TRUE |
| - (BOOL) prepareForListeningWithDecodingGraphAtPath: | (nonnull NSString *) | pathToDecodingGraphDirectory | |
| withGoPComputation: | (BOOL) | computeGoP |
Prepare for recognition by loading a custom decoding graph that was typically bundled with the application. You will typically use this approach for large-vocabulary tasks, where it would take too long to build the decoding graph on the mobile device.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
Goodness of pronunciation scoring (GoP) requires ASR Bundle with relevant models; if such models are not available in the ASR Bundle, GoP scores will not be computed regardless of the computeGoP setting.
| pathToDecodingGraphDirectory | absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app. |
| computeGoP | goodness of pronunciation scores will be computed if this parameter is set to TRUE |
| - (BOOL) prepareForListeningWithDecodingGraphWithName: | (nonnull NSString *) | dgName | |
| withGoPComputation: | (BOOL) | computeGoP |
Prepare for recognition by loading a decoding graph that was prepared via the [createDecodingGraphFromPhrases:forRecognizer:usingAlternativePronunciations:andTask:andSaveWithName:] family of methods.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
Goodness of pronunciation scoring (GoP) requires ASR Bundle with relevant models; if such models are not available in the ASR Bundle, GoP scores will not be computed regardless of the computeGoP setting.
| dgName | name of the decoding graph |
| computeGoP | goodness of pronunciation scores will be computed if this parameter is set to TRUE |
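A minimal sketch of preparing and checking readiness; the graph name "commands" is a placeholder for a decoding graph created earlier via the createDecodingGraphFromPhrases:... family of methods:

```objectivec
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];

// "commands" is a hypothetical graph name; use whatever name you passed
// to andSaveWithName: when the graph was created.
if ([recognizer prepareForListeningWithDecodingGraphWithName:@"commands"
                                          withGoPComputation:NO]) {
    // The graph is loaded into memory; call startListening: when ready.
} else {
    NSLog(@"Could not load decoding graph 'commands'");
}
```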
| - (void) reinitAudioStack |
Reinitializes the audio stack. Calling this method is equivalent to calling deactivateAudioStack followed by activateAudioStack.
| + (BOOL) removeAllSpeakerAdaptationProfiles |
Remove all adaptation profiles for all speakers.
| + (BOOL) removeSpeakerAdaptationProfiles: | (nonnull NSString *) | speakerName |
Removes all adaptation profiles for the speaker with name speakerName.
| speakerName | name of the speaker whose profiles should be removed |
| - (void) resetSpeakerAdaptation |
Resets the speaker adaptation profile in the current recognizer session. Calling this method will also reset the speakerName to 'default'. If a corresponding speaker adaptation profile exists in the filesystem for the 'default' speaker, it will be used. If not, the initial models from the ASR Bundle will be the baseline.
You would typically use this method if your app starts a new activity that may entail a new speaker. For example, a practice view is started and there is a good chance a different user may be using the app.
If speaker (pseudo)identities are known, you don't need to call this method; you can just switch speakers by calling adaptToSpeakerWithName: with the appropriate speakerName.
Keep the following tradeoffs in mind when using this method:
Calls to this method will be ignored if the recognizer is in the LISTENING state.
If you are resetting the adaptation profile and you know the user's (pseudo)identity, you may want to call the saveSpeakerAdaptationProfile method prior to calling this method, so that on subsequent user switches adaptation profiles can be reloaded and recognition starts with a speaker profile trained on audio from previous sessions.
| - (void) saveSpeakerAdaptationProfile |
Saves speaker profile (used for adaptation) in the filesystem.
The speaker profile will be saved in the filesystem, in the Caches/KaldiIOS-speaker-profiles/ directory. The profile filename is composed of the speakerName, asrBundle, and audioRoute.
| - (void) setBluetoothA2DPOutput: | (BOOL) | value |
Enables or disables Bluetooth output via the AVAudioSessionCategoryOptionAllowBluetoothA2DP category option of AVAudioSession.
| value | set to YES to enable Bluetooth A2DP output, or NO to disable it. |
| + (void) setLogLevel: | (KIOSRecognizerLogLevel) | logLevel |
Set log level for the framework.
| logLevel | one of KIOSRecognizerLogLevel |
Default value is KIOSRecognizerLogLevelWarning.
| - (void) setVadGating: | (BOOL) | value |
Enables or disables Voice Activity Detection gating. If set to TRUE, the recognizer will utilize a simple Voice Activity Detection module and no recognition will occur until voice activity is detected. From the moment voice activity is detected, the recognizer operates in standard mode.
All the information in KIOSResponse (audio file, ASR result, etc.) is measured from the moment of voice activity detection, NOT from the moment of the startListening call.
This should be set to YES primarily in always-on listening mode, to minimize the number of listening restarts as well as battery utilization.
| value | TRUE to enable VAD gating, FALSE to disable it |
| - (void) setVADParameter: | (KIOSVadParameter) | parameter | |
| toValue: | (float) | value |
Set any of KIOSVadParameter Voice Activity Detection parameters. These parameters can be set at any time. If they are set while the recognizer is listening, they will be used immediately.
| parameter | one of KIOSVadParameter |
| value | duration in seconds for the parameter |
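For illustration, a sketch of tuning end-of-speech behavior; the enum values shown (KIOSVadTimeoutEndSilenceForGoodMatch, KIOSVadTimeoutForNoSpeech) are assumptions based on the KIOSVadParameter naming convention and should be verified against your SDK headers:

```objectivec
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];

// Stop listening after 1 s of silence following a good match
// (parameter names are assumptions; values are durations in seconds).
[recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:1.0f];

// Give up if no speech is detected within 5 s of starting to listen.
[recognizer setVADParameter:KIOSVadTimeoutForNoSpeech toValue:5.0f];
```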
| + (nullable KIOSRecognizer *) sharedInstance |
Returns the shared instance of the recognizer.
| - (BOOL) startListening: | (NSString *_Nullable *_Nullable) | responseId |
Start processing audio from the microphone.
After calling this method, the recognizer will listen to and decode audio coming through the microphone, using the decoding graph you specified via one of the prepareForListening methods.
For example:
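A minimal sketch, assuming the recognizer has already been prepared via one of the prepareForListening methods:

```objectivec
KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];
NSString *responseId = nil;

// Pass the address of an NSString pointer to receive the responseId,
// or NULL if you don't need it.
if ([recognizer startListening:&responseId]) {
    NSLog(@"Listening started, responseId: %@", responseId);
} else {
    NSLog(@"Unable to start listening");
}
```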
The listening process will stop when you call stopListening, when Voice Activity Detection triggers the end of listening, or when an audio interrupt occurs.
When the recognizer stops listening due to VAD triggering, it will call [recognizerFinalResponse:forRecognizer:]([KIOSRecognizerDelegate recognizerFinalResponse:forRecognizer:]) callback method.
When the recognizer stops listening due to an audio interrupt, no callback methods will be triggered until the audio interrupt is over.
If the decoding graph was created with trigger phrase support, the recognizer will listen continuously until the trigger phrase is recognized; it will then switch over to standard mode, with partial results reported via the [recognizerPartialResult:forRecognizer:]([KIOSRecognizerDelegate recognizerPartialResult:forRecognizer:]) callback.
VAD settings can be modified via setVADParameter:toValue: method.
| responseId | address of the pointer to an NSString, which will be set to a responseId if startListening is successful. responseId is a unique identifier of the response. You can pass NULL to this method if you don't need responseId. |
| - (void) stopListening |
Stop the recognizer from processing incoming audio.
| + (BOOL) teardown |
Tear down the recognizer and all related resources. This method would typically be called when you want to create a new recognizer that uses a different ASR Bundle (currently the SDK supports only one recognizer instance at a time). For example, if you are using an English recognizer and want to switch to a Spanish recognizer, you would tear down the English recognizer and then create the Spanish one.
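The English-to-Spanish switch described above might be sketched as follows; both bundle names are placeholders for whatever ASR Bundles your app actually ships or downloads:

```objectivec
// Stop any active session before tearing down (bundle names are placeholders).
[[KIOSRecognizer sharedInstance] stopListening];

if ([KIOSRecognizer teardown]) {
    // Re-initialize with a different ASR Bundle, e.g. a Spanish one.
    if (![KIOSRecognizer initWithASRBundle:@"spanish-asr-bundle"]) {
        NSLog(@"Failed to initialize the Spanish recognizer");
    }
}
```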
| + (nonnull NSString *) version |
Version of the KeenASR framework.
| NSString * asrBundleName | read, nonatomic, assign |
Name of the ASR Bundle (the name of the directory that contains all the ASR resources). This will be the last component of the asrBundlePath.
| NSString * asrBundlePath | read, nonatomic, assign |
Absolute path to the ASR Bundle where acoustic models, config files, etc. reside.
| NSString * currentDecodingGraphName | read, nonatomic, assign |
Name of the decoding graph currently used by the recognizer.
| id< KIOSRecognizerDelegate > delegate | readwrite, nonatomic, weak |
Delegate, which handles KIOSRecognizerDelegate protocol methods.
| BOOL handleNotifications | readwrite, nonatomic, assign |
If set to YES (default behavior), the SDK will handle various notifications related to app activity. When the app goes to the background, or a phone call or an audio interrupt comes through, the SDK will stop listening and tear down the internal audio stack; upon the app coming back to the foreground or the interrupt ending, it will reinitialize the internal audio stack.
If set to NO, it is the developer's responsibility to handle notifications that may affect audio capture. In this case, you will need to stop listening and deactivate the KeenASR audio stack if an audio interrupt comes through, and then reinitialize the audio stack when the interrupt is over. Setting handleNotifications to NO allows the SDK to work in background mode; you will still need to properly handle audio interrupts using the deactivateAudioStack, activateAudioStack or reinitAudioStack, and stopListening methods.
| NSString * miscDataDirectory | read, nonatomic, assign |
Path to the directory where miscellaneous data will be saved.
| KIOSRecognizerState recognizerState | read, atomic, assign |
State of the recognizer, a read-only property that takes one of the KIOSRecognizerState values.
| NSString * recordingsDir | read, nonatomic, assign |
Path to the directory where audio/JSON files will be saved for Dashboard uploads.
| BOOL rescore | readwrite, nonatomic, assign |
If set to YES, the recognizer will perform rescoring for the final result, using the rescoring language model provided in the custom decoding graph bundled with the application.
Default is YES.