KIOSRecognizer Class Reference

Inherits from NSObject
Declared in KIOSRecognizer.h
KIOSRecognizer.mm

Overview

An instance of the KIOSRecognizer class, called recognizer, manages recognizer resources and provides speech recognition capabilities to your application.

You typically initialize the engine at app startup by calling the initWithASRBundle: or initWithASRBundleAtPath: method, and then use the sharedInstance method whenever you need to access the recognizer.

Recognition results are provided via callbacks. To obtain results, one of your classes needs to adopt the KIOSRecognizerDelegate protocol and implement some of its methods.

To handle audio interrupts properly, you need to implement the [KIOSRecognizerDelegate unwindAppAudioBeforeAudioInterrupt] callback method, in which you perform audio playback cleanup (stop playing audio). This allows the KeenASR SDK to properly deactivate the audio session before the app goes to the background.

You can optionally implement the [KIOSRecognizerDelegate recognizerReadyToListenAfterInterrupt:] callback method, which is triggered once KIOSRecognizer is fully set up after the app returns to the foreground. This is where you may refresh the UI state of the app.

Initialization example:

 if (! [KIOSRecognizer sharedInstance]) {
     [KIOSRecognizer initWithASRBundle:@"librispeech-nnet2-en-us"];
 }
 // for convenience our class keeps a local reference of the recognizer
 self.recognizer = [KIOSRecognizer sharedInstance];

 // this class will also be implementing methods from KIOSRecognizerDelegate 
 // protocol
 self.recognizer.delegate = self;

 // recordings will be saved on the device
 self.recognizer.createAudioRecordings = YES;

 // after 0.8sec of silence, recognizer will automatically stop listening
 [self.recognizer setVADParameter:KIOSVadTimeoutEndSilenceForGoodMatch toValue:.8];

 // define callbacks for KIOSRecognizerDelegate

After initialization, audio data from all sessions during which the recognizer is listening will be used for online speaker adaptation. You can name speaker adaptation profiles via adaptToSpeakerWithName:, persist profiles in the filesystem via saveSpeakerAdaptationProfile, and reset adaptation via resetSpeakerAdaptation.

Warning: Only a single instance of the recognizer can exist at any given time.

Properties

+ sharedInstance

Returns shared instance of the recognizer

+ (nullable KIOSRecognizer *)sharedInstance

Return Value

The shared recognizer instance

Discussion

Warning: If the engine has not been initialized by calling initWithASRBundle: or initWithASRBundleAtPath:, this method will return nil.

Declared In

KIOSRecognizer.h

  delegate

Delegate, which handles KIOSRecognizerDelegate protocol methods

@property (nonatomic, weak, nullable) id<KIOSRecognizerDelegate> delegate

Declared In

KIOSRecognizer.h

  recognizerState

State of the recognizer, a read-only property that takes one of KIOSRecognizerState values

@property (assign, readonly) KIOSRecognizerState recognizerState

Declared In

KIOSRecognizer.h

  asrBundlePath

Absolute path to the ASR bundle where acoustic models, config, etc. reside

@property (nonatomic, readonly, nonnull) NSString *asrBundlePath

Declared In

KIOSRecognizer.h

  asrBundleName

Name of the ASR Bundle (the name of the directory that contains all the ASR resources). This will be the last component of the asrBundlePath.

@property (nonatomic, readonly, nonnull) NSString *asrBundleName

Declared In

KIOSRecognizer.h

  recognizerType

Type of the recognizer. It makes sense to query this property only after the recognizer has been initialized.

@property (nonatomic, assign, readonly) KIOSRecognizerType recognizerType

Declared In

KIOSRecognizer.h

  rescore

If set to YES, the recognizer will rescore the final result using the rescoring language model provided in the custom decoding graph that’s bundled with the application.

@property (nonatomic, assign) BOOL rescore

Discussion

Default is YES.

Warning: If the resources necessary for rescoring are not available in the custom decoding graph directory bundled with the app, and rescore is set to YES, rescoring step will be skipped.

Declared In

KIOSRecognizer.h

Initialization, Preparing, Starting, and Stopping Recognition

+ initWithASRBundle:

Initialize the ASR engine with the ASR Bundle, which provides all the resources necessary for initialization. Use this initialization method if you included the ASR Bundle with your application. See also initWithASRBundleAtPath: for scenarios in which the ASR Bundle is not included with the app but downloaded after the app has been installed. SDK initialization needs to occur before any other work can be performed.

+ (BOOL)initWithASRBundle:(nonnull NSString *)bundleName

Parameters

bundleName

Name of the ASR Bundle, a directory containing all the resources necessary for the specific recognizer type. This typically includes all acoustic-model-related files and configuration files. The bundle directory should contain a decode.conf configuration file, which can be augmented with additional config params; currently, that is the only way to pass various settings to the decoder. All path references in config files should be relative to the app root directory (e.g. librispeech-gmm-en-us/mfcc.conf). The init method will initialize the appropriate recognizer type based on the name and content of the ASR bundle.

Return Value

TRUE if successful, FALSE otherwise.

Discussion

Warning: When initializing the recognizer, you need to make sure that bundle directory contains all the necessary resources needed for the specific recognizer type. If your app is dynamically creating decoding graphs, ASR bundle directory needs to contain lang subdirectory with relevant resources (lexicon, etc.).

Declared In

KIOSRecognizer.h

+ initWithASRBundleAtPath:

Initialize ASR engine with the ASR Bundle located at provided path. This is an alternative method to initialize the SDK, which you would use if you did not package ASR Bundle with your application but instead downloaded it after the app has been installed. SDK initialization needs to occur before any other work can be performed.

+ (BOOL)initWithASRBundleAtPath:(nonnull NSString *)pathToASRBundle

Parameters

pathToASRBundle

full path to the ASR Bundle. For more details about ASR Bundles see initWithASRBundle:

Return Value

TRUE if successful, FALSE otherwise.

Discussion

Warning: When initializing the recognizer, make sure that the bundle directory contains all the necessary resources needed for the specific recognizer type. If your app is dynamically creating decoding graphs, ASR bundle directory needs to contain lang subdirectory with relevant resources (lexicon, etc.).
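The path-based initialization described above might be wrapped as in the following sketch. It is an illustration under stated assumptions, not part of the SDK: the umbrella header name, the helper function name, and the download location (Application Support/asr-bundles/) are all hypothetical; only initWithASRBundleAtPath: and sharedInstance come from this reference.

```objective-c
#import <Foundation/Foundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: initializes the SDK from an ASR Bundle that the app
// downloaded after installation. Returns the shared recognizer, or nil if
// the bundle is missing or initialization fails.
static KIOSRecognizer *InitializeRecognizerFromDownloadedBundle(void) {
    // sharedInstance returns nil until the SDK has been initialized,
    // so a non-nil value means there is nothing left to do.
    KIOSRecognizer *existing = [KIOSRecognizer sharedInstance];
    if (existing != nil) {
        return existing;
    }

    // Assumed download location; use whatever directory your app chose.
    NSString *appSupport = NSSearchPathForDirectoriesInDomains(
        NSApplicationSupportDirectory, NSUserDomainMask, YES).firstObject;
    NSString *bundlePath =
        [[appSupport stringByAppendingPathComponent:@"asr-bundles"]
            stringByAppendingPathComponent:@"librispeech-nnet2-en-us"];

    // Fail early if the bundle directory is not there.
    BOOL isDirectory = NO;
    BOOL exists = [[NSFileManager defaultManager] fileExistsAtPath:bundlePath
                                                       isDirectory:&isDirectory];
    if (!exists || !isDirectory) {
        NSLog(@"ASR Bundle not found at %@", bundlePath);
        return nil;
    }

    // The init method reports failure via its BOOL return value.
    if (![KIOSRecognizer initWithASRBundleAtPath:bundlePath]) {
        NSLog(@"Unable to initialize the recognizer with bundle at %@", bundlePath);
        return nil;
    }
    return [KIOSRecognizer sharedInstance];
}
```

Checking for the directory first only produces a clearer log message; the init method itself still decides whether the bundle content is complete.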

Declared In

KIOSRecognizer.h

– prepareForListeningWithCustomDecodingGraphWithName:

- (BOOL)prepareForListeningWithCustomDecodingGraphWithName:(nonnull NSString *)dgName

Parameters

dgName

name of the custom decoding graph

Return Value

TRUE if successful, FALSE otherwise

Discussion

After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

Declared In

KIOSRecognizer.h

– prepareForListeningWithCustomDecodingGraphAtPath:

Prepare for recognition by loading custom decoding graph that was bundled with the application. You will typically use this approach for large vocabulary tasks, where it would take too long to build the decoding graph on the mobile device.

- (BOOL)prepareForListeningWithCustomDecodingGraphAtPath:(nonnull NSString *)pathToDecodingGraphDirectory

Parameters

pathToDecodingGraphDirectory

absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app.

Return Value

TRUE if successful, FALSE otherwise.

Discussion

After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

Warning: If the custom decoding graph was built with rescoring capability, all the resources will be loaded regardless of how the rescore parameter is set.

Declared In

KIOSRecognizer.h

– startListening

Start processing incoming audio.

- (BOOL)startListening

Return Value

TRUE if successful, FALSE otherwise

Discussion

After calling this method, the recognizer will listen to and decode audio coming through the microphone, using the decoding graph you specified via one of the prepareForListening methods. Listening stops when: a) stopListening is called explicitly, b) one of the Voice Activity Detection (VAD) module rules is triggered (for example, max duration without speech or end-silence), or c) an audio interrupt occurs (phone call, audible notification, app goes to background, etc.).

When the recognizer stops listening due to VAD triggering, it will call the recognizerFinalResult:forRecognizer: callback method.

When the recognizer stops listening due to an audio interrupt, no callback methods will be triggered until the audio interrupt is over.

VAD settings can be modified via the setVADParameter:toValue: method.

Warning: You will need to call either prepareForListeningWithCustomDecodingGraphWithName: or prepareForListeningWithCustomDecodingGraphAtPath: before calling this method. You will also need to make sure that the user has granted audio recording permission before calling this method; see AVAudioSessionRecordPermission and [AVAudioSession requestRecordPermission:] in the AVFoundation framework for details.
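The required order of operations (permission, prepare, start) could be sketched as follows. This is a minimal illustration, assuming the SDK has already been initialized; the helper name and the umbrella header name are hypothetical, and hopping to the main queue is a conservative choice rather than a requirement stated in this reference.

```objective-c
#import <AVFoundation/AVFoundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: asks for microphone permission, loads the named
// decoding graph, and starts listening. Failures are only logged.
static void StartListeningWithGraphNamed(NSString *graphName) {
    [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
        if (!granted) {
            NSLog(@"Microphone permission denied; not starting the recognizer");
            return;
        }
        // The permission block may run on an arbitrary queue.
        dispatch_async(dispatch_get_main_queue(), ^{
            KIOSRecognizer *recognizer = [KIOSRecognizer sharedInstance];
            if (recognizer == nil) {
                NSLog(@"Recognizer has not been initialized");
                return;
            }
            // A decoding graph must be loaded before startListening.
            if (![recognizer prepareForListeningWithCustomDecodingGraphWithName:graphName]) {
                NSLog(@"Unable to prepare decoding graph %@", graphName);
                return;
            }
            if (![recognizer startListening]) {
                NSLog(@"Unable to start listening");
            }
        });
    }];
}
```

Newer iOS releases deprecate requestRecordPermission: on AVAudioSession in favor of a newer API; the call above matches the one this reference points to.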

Declared In

KIOSRecognizer.h

– startListeningFromAudioFile:

Performs speech recognition on the audio file. This is an asynchronous method: it performs basic validation (valid WAV file, sampling frequency of the audio matches that of the ASR Bundle), then starts recognition in the background and returns. Recognition results can be obtained via the recognizerFinalResult:forRecognizer: and recognizerPartialResult:forRecognizer: methods.

- (BOOL)startListeningFromAudioFile:(nonnull NSString *)pathToAudioFile

Parameters

pathToAudioFile

Full path to the audio file in WAV format. The file should be mono (single channel), and its sampling frequency should match the sampling frequency used for training the ASR Bundle (typically 16 kHz).

Return Value

TRUE if the audio file is a valid WAV file, its sampling frequency matches the one in the ASR Bundle, and the recording duration is less than 100 sec; FALSE otherwise.

Discussion

Note: The whole audio file is loaded into memory, so the length is currently limited to 100 sec. If the file is longer than 100 sec, no processing will occur and the method will return FALSE.
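Because the method returns a bare BOOL, an app may want to check the documented constraints itself to get a more specific failure reason. The sketch below does that with AVAudioFile before handing the file to the recognizer. The helper name, the umbrella header name, and the 16 kHz expected rate are assumptions for illustration, and the code assumes a decoding graph has already been prepared.

```objective-c
#import <AVFoundation/AVFoundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: pre-checks channel count, sample rate and duration,
// then starts file-based recognition. Returns NO if any check fails.
static BOOL RecognizeAudioFileAtPath(KIOSRecognizer *recognizer, NSString *path) {
    const double expectedSampleRate = 16000.0; // assumed; must match your ASR Bundle
    const double maxDurationSeconds = 100.0;   // limit stated in this reference

    NSError *error = nil;
    AVAudioFile *file =
        [[AVAudioFile alloc] initForReading:[NSURL fileURLWithPath:path] error:&error];
    if (file == nil) {
        NSLog(@"Unable to open %@: %@", path, error);
        return NO;
    }

    // AVAudioFile opens several formats, so this does not prove the file is
    // WAV; the recognizer still performs its own validation.
    AVAudioFormat *format = file.fileFormat;
    double duration = (double)file.length / format.sampleRate;
    if (format.channelCount != 1) {
        NSLog(@"Expected mono audio, got %u channels", (unsigned)format.channelCount);
        return NO;
    }
    if (format.sampleRate != expectedSampleRate) {
        NSLog(@"Expected %.0f Hz audio, got %.0f Hz", expectedSampleRate, format.sampleRate);
        return NO;
    }
    if (duration >= maxDurationSeconds) {
        NSLog(@"Audio is %.1f sec long; the limit is %.0f sec", duration, maxDurationSeconds);
        return NO;
    }

    // Results arrive asynchronously through the delegate callbacks.
    return [recognizer startListeningFromAudioFile:path];
}
```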

Declared In

KIOSRecognizer.h

– stopListening

Stop the recognizer from processing incoming audio.

- (void)stopListening

Discussion

Warning: Calling this method will not trigger recognizerFinalResult delegate call. Use stopListeningAndReturnFinalResult if you are interested in obtaining the final result directly.

Declared In

KIOSRecognizer.h

– stopListeningAndReturnFinalResult

Stop the recognizer from processing incoming audio and return the final result.

- (nullable KIOSResult *)stopListeningAndReturnFinalResult

Return Value

Final result of the recognition.

Discussion

Warning: This method runs synchronously. For large decoding graphs there may be a noticeable delay (a few hundred ms) on lower-end devices. This method will return nil if the recognizer is already in KIOSRecognizerStateFinalProcessing (for example, due to VAD rules triggering automatically).
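A caller has to be prepared for the nil case described above. The following sketch shows one way to handle it; the helper name and the umbrella header name are hypothetical, and the result is logged via its description only because this reference does not list the KIOSResult properties.

```objective-c
#import <Foundation/Foundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: stops the recognizer and logs the final result,
// handling the case in which no result is returned directly.
static void StopListeningAndLogResult(KIOSRecognizer *recognizer) {
    // Synchronous call; may take a few hundred ms for large decoding graphs.
    KIOSResult *result = [recognizer stopListeningAndReturnFinalResult];
    if (result == nil) {
        if (recognizer.recognizerState == KIOSRecognizerStateFinalProcessing) {
            // VAD most likely stopped the recognizer first; in that case the
            // final result is expected through recognizerFinalResult:forRecognizer:.
            NSLog(@"Final processing already in progress; waiting for the delegate callback");
        } else {
            NSLog(@"No final result was returned");
        }
        return;
    }
    NSLog(@"Final result: %@", result);
}
```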

Declared In

KIOSRecognizer.h

Speaker Adaptation

– adaptToSpeakerWithName:

Defines the name that will be used to uniquely identify speaker adaptation profile. When recognizer starts to listen, it will try to find a matching speaker profile in the filesystem (profiles are matched based on speakername, asrbundle, and audio route). When saveSpeakerAdaptationProfile method is called, it uses the name to uniquely identify the profile file that will be saved in the filesystem.

- (void)adaptToSpeakerWithName:(nonnull NSString *)speakerName

Parameters

speakerName

(pseudo)name of the speaker for which adaptation is to be performed. Default value is ‘default’.

The name used here does not have to correspond to the real name of the user (thus we call it a pseudo name). The exact value does not matter as long as you can match the value to the specific user in your app. For example, you could use ‘user1’, ‘user2’, etc.

Discussion

Warning: If you cannot match names to your users, it’s recommended to not use this method, and to not save adaptation profiles between sessions. Adaptation will still be performed throughout the session, but each new session (activity after initialization of recognizer) will start from the baseline models.

In-memory speaker adaptation profile can always be reset by calling resetSpeakerAdaptation.

If this method is called while recognizer is listening, it will only affect subsequent calls to startListening methods.

Declared In

KIOSRecognizer.h

– resetSpeakerAdaptation

Resets speaker adaptation profile in the current recognizer session. Calling this method will also reset the speakerName to ‘default’. If the corresponding speaker adaptation profile exists in the filesystem for ‘default’ speaker, it will be used. If not, initial models from the ASR Bundle will be the baseline.

- (void)resetSpeakerAdaptation

Discussion

You would typically use this method if a new instance of some activity starts in your app that may involve a new speaker. For example, a practice view is started and there is a good chance a different user may be using the app.

If speaker (pseudo)identities are known, you don’t need to call this method; you can just switch speakers by calling adaptToSpeakerWithName: with the appropriate speakerName.

Following are the tradeoffs when using this method:

  • the downside of resetting user profile for the existing user is that ASR performance will be reset to the baseline (no adaptation), which may slightly degrade performance in the first few interactions

  • the downside of NOT resetting user profile for a new user is that, depending on the characteristics of the new user’s voice, ASR performance may initially be degraded slightly (when comparing to the baseline case of no adaptation)

Calls to this method will be ignored if recognizer is in LISTENING state.

If you are resetting the adaptation profile and you know the user’s (pseudo)identity, you may want to call the saveSpeakerAdaptationProfile method prior to calling this method, so that on subsequent user switches adaptation profiles can be reloaded and recognition starts with the speaker profile trained on audio from previous sessions.
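The save-then-switch pattern described above might look like the following sketch. The helper name and the umbrella header name are hypothetical. The code assumes the recognizer is not currently listening, since reset calls are ignored in that state and a name change only affects subsequent startListening calls.

```objective-c
#import <Foundation/Foundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: persists the current speaker's profile, then either
// switches to a known speaker or falls back to the 'default' profile.
// Pass nil for newSpeakerName when the next user's identity is unknown.
static void SwitchSpeaker(KIOSRecognizer *recognizer, NSString *newSpeakerName) {
    // Save what has been learned so far, so that it can be reloaded the
    // next time this speaker is selected.
    [recognizer saveSpeakerAdaptationProfile];

    if (newSpeakerName != nil) {
        // Known (pseudo)identity: a matching profile, if one exists in the
        // filesystem, is looked up when the recognizer starts to listen.
        [recognizer adaptToSpeakerWithName:newSpeakerName];
    } else {
        // Unknown user: reset in-memory adaptation and the speaker name.
        [recognizer resetSpeakerAdaptation];
    }
}
```

For example, SwitchSpeaker(recognizer, @"user2") would be called when the app's own user selection changes, with ‘user2’ being whatever pseudo name the app maps to that user.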

Declared In

KIOSRecognizer.h

– saveSpeakerAdaptationProfile

Saves speaker profile (used for adaptation) in the filesystem.

- (void)saveSpeakerAdaptationProfile

Discussion

Speaker profile will be saved in the file system, in Caches/KaldiIOS-speaker-profiles/ directory. Profile filename is composed of the speakerName, asrBundle, and audioRoute.

Declared In

KIOSRecognizer.h

+ removeAllSpeakerAdaptationProfiles

Remove all adaptation profiles for all speakers.

+ (BOOL)removeAllSpeakerAdaptationProfiles

Declared In

KIOSRecognizer.h

+ removeSpeakerAdaptationProfiles:

Removes all adaptation profiles for the speaker with name speakerName.

+ (BOOL)removeSpeakerAdaptationProfiles:(nonnull NSString *)speakerName

Parameters

speakerName

name of the speaker whose profiles should be removed

Declared In

KIOSRecognizer.h

File Audio Recording Management

  createAudioRecordings

Set to YES if you want to keep audio recordings in the file system. Default is NO.

@property (nonatomic, assign) BOOL createAudioRecordings

Declared In

KIOSRecognizer.h

  recordingsDir

Directory in which recordings will be stored. Default is Library/Caches/KaldiIOS-recordings.

@property (nonatomic, copy, nonnull) NSString *recordingsDir

Declared In

KIOSRecognizer.h

  lastRecordingFilename

Filename of the last recording. If createAudioRecordings was set to TRUE, you can read the filename of the latest recording via this property.

@property (nonatomic, readonly, nullable) NSString *lastRecordingFilename

Declared In

KIOSRecognizer.h

Other

– inputLevel

The most recent signal input level in dB

- (float)inputLevel

Return Value

signal input level in dB

Declared In

KIOSRecognizer.h

+ echoCancellationAvailable

Provides information about echo cancellation support on the device.

+ (BOOL)echoCancellationAvailable

Return Value

YES if echo cancellation is supported, NO otherwise

Declared In

KIOSRecognizer.h

– performEchoCancellation:

EXPERIMENTAL: Specifies whether echo cancellation should be performed. If the value is set to YES and the device supports echo cancellation, audio played by the application will be removed from the audio captured via the microphone.

- (BOOL)performEchoCancellation:(BOOL)value

Parameters

value

set to YES to turn on echo cancellation processing, NO to turn it off. Default is NO.

Return Value

TRUE if the value was successfully set, FALSE otherwise. If the device does not support echo cancellation and you pass YES to this method, it will return FALSE.

Discussion

Warning: Calls to this method while the recognizer is listening will be ignored and the method will return FALSE.
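A guarded way to turn the feature on could look like the sketch below. The helper name and the umbrella header name are hypothetical; the code is meant to be called while the recognizer is not listening, since calls made during listening are ignored.

```objective-c
#import <Foundation/Foundation.h>
#import <KeenASR/KeenASR.h> // assumed umbrella header name; adjust to your setup

// Hypothetical helper: enables echo cancellation when the device supports
// it. Returns YES only if the setting was actually applied.
static BOOL EnableEchoCancellationIfAvailable(KIOSRecognizer *recognizer) {
    // Device capability is reported by a class method.
    if (![KIOSRecognizer echoCancellationAvailable]) {
        NSLog(@"Echo cancellation is not supported on this device");
        return NO;
    }
    // Returns FALSE if the value could not be set, for example because
    // the recognizer is currently listening.
    BOOL enabled = [recognizer performEchoCancellation:YES];
    if (!enabled) {
        NSLog(@"Echo cancellation could not be enabled");
    }
    return enabled;
}
```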

Declared In

KIOSRecognizer.h

+ version

Version of the KeenASR framework.

+ (nonnull NSString *)version

Declared In

KIOSRecognizer.h

+ setLogLevel:

Set log level for the framework.

+ (void)setLogLevel:(KIOSRecognizerLogLevel)logLevel

Parameters

logLevel

one of KIOSRecognizerLogLevel

Default value is KIOSRecognizerLogLevelWarning.

Declared In

KIOSRecognizer.h

Config Parameters

– setVADParameter:toValue:

Set any of KIOSVadParameter Voice Activity Detection parameters. These parameters can be set at any time and they will go into effect immediately.

- (void)setVADParameter:(KIOSVadParameter)parameter toValue:(float)value

Parameters

parameter

one of KIOSVadParameter

value

duration in seconds for the parameter

Discussion

Warning: Setting VAD rules in the config file within the ASR bundle will NOT have any effect. Values for these parameters are set to their defaults upon initialization of KIOSRecognizer. They can only be changed programmatically, using this method.

Declared In

KIOSRecognizer.h

Deprecated methods and properties

  listening

Indicates whether the recognizer is listening to and decoding the incoming audio. This property has been deprecated and replaced by recognizerState.

@property (assign, readonly) BOOL listening

Declared In

KIOSRecognizer.h