Class KASRRecognizer
You typically initialize the engine at app startup by calling the initWithASRBundleAtPath(String, Context) method, and then use the sharedInstance() static method whenever you need to access the recognizer.
Recognition results are provided via callbacks. To obtain results, one of your classes needs to implement the KASRRecognizerListener interface and some of its methods.
Note: Only a single instance of the recognizer can exist at any given time.
-
Nested Class Summary
Modifier and Type | Class | Description
static enum | | These constants indicate the log levels for the framework.
static enum | KASRRecognizerState | These constants indicate different states the recognizer can assume.
static enum | KASRRecognizer.KASRVadParameter | These constants correspond to different Voice Activity Detection parameters that are used for endpointing during recognition.
-
Method Summary
Modifier and Type | Method | Description
boolean | activateAudioStack() | Activate audio stack that has been previously deactivated.
void | adaptToSpeakerWithName(String speakerName) | Defines the name that will be used to uniquely identify speaker adaptation profile.
void | addListener(KASRRecognizerListener listener) | Adds listener.
void | addTriggerPhraseListener | Adds trigger phrase listener.
boolean | deactivateAudioStack() | Deactivate audio stack.
String | getAsrBundleName() | Obtains a name of the ASR Bundle used to initialize the SDK.
String | getAsrBundlePath() | Obtains a path to the directory where ASR Bundle used to initialize the SDK is stored.
 | getDecodingGraphName() | Returns name of the decoding graph that's used for recognition.
float | getInputLevel() | The most recent signal input level in dB.
KASRRecognizerState | getRecognizerState() | Returns recognizer state, one of KASRRecognizerState values.
 | getRescore() | Value of the rescoring flag.
static boolean | initWithASRBundleAtPath(String pathToAsrBundle, android.content.Context context) | Initialize ASR engine with the ASR Bundle located at the provided path.
boolean | isEchoCancellationAvailable() |
boolean | performEchoCancellation(boolean value) | EXPERIMENTAL Specifies if echo cancellation should be performed.
boolean | prepareForListeningWithContextualDecodingGraphAtPath(String pathToDecodingGraphDirectory, Integer contextId, boolean computeGoP) | Prepare for recognition by loading contextual decoding graph that was bundled with the application.
boolean | prepareForListeningWithContextualDecodingGraphWithName(String dgName, Integer contextId, boolean computeGoP) | Prepare for recognition by loading contextual decoding graph that was prepared via the KASRDecodingGraph.createContextualDecodingGraphFromPhrases(ArrayList, KASRRecognizer, ArrayList, KASRDecodingGraph.KASRSpeakingTask, float, String) method.
boolean | prepareForListeningWithDecodingGraphAtPath(String pathToDecodingGraphDirectory, boolean computeGoP) | Prepare for recognition by loading decoding graph that's stored in the filesystem (for example, downloaded or copied from the app bundle).
boolean | prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP) | Prepare for recognition by loading decoding graph that was prepared via one of the methods that create decoding graphs.
boolean | removeAllSpeakerAdaptationProfiles() | Remove all adaptation profiles for all speakers.
void | removeListener(KASRRecognizerListener listener) | Removes listener.
boolean | removeSpeakerAdaptationProfiles(String speakerName) | Removes all adaptation profiles for the speaker with name speakerName.
void | removeTriggerPhraseListener | Removes trigger phrase listener.
void | resetSpeakerAdaptation() | Resets speaker adaptation profile in the current recognizer session.
void | saveSpeakerAdaptation() | Saves speaker profile (used for adaptation) in the filesystem.
static void | setLogLevel | Set log level for the framework.
void | setRescore(Boolean value) | If set to true, recognizer will perform rescoring for the final result, using rescoring language model provided in the custom decoding graph that's bundled with the application.
boolean | setVADGating(boolean value) | VAD (voice activity detection) gating introduces a super lightweight voice activity detection before speech recognition.
void | setVADParameter(KASRRecognizer.KASRVadParameter parameter, float value) | Set any of the KASRRecognizer.KASRVadParameter Voice Activity Detection parameters.
static KASRRecognizer | sharedInstance() |
boolean | startListening() | Start processing incoming audio.
boolean | stopListening() | Stop the recognizer from processing incoming audio.
 | stopListeningAndReturnFinalResult() | Deprecated. This method is deprecated.
static boolean | teardown() | Teardown current singleton instance of the recognizer and all associated resources.
static String | version() | Version of the KeenASR framework.
-
Method Details
-
initWithASRBundleAtPath
public static boolean initWithASRBundleAtPath(String pathToAsrBundle, android.content.Context context)
Initialize the ASR engine with the ASR Bundle located at the provided path. SDK initialization needs to occur before any other work can be performed.
Parameters:
pathToAsrBundle - full path to the ASR Bundle
context - application context
Returns:
True if successful, false otherwise.
Note: When initializing the recognizer, make sure that the bundle directory contains all the resources needed for the specific recognizer type. If your app creates decoding graphs dynamically, the ASR Bundle directory needs to contain a lang subdirectory with the relevant resources (lexicon, etc.).
-
teardown
public static boolean teardown()
Tear down the current singleton instance of the recognizer and all associated resources. This method will return false if the recognizer is actively listening.
Returns:
True if successful, false otherwise.
-
activateAudioStack
public boolean activateAudioStack()
Activate audio stack that has been previously deactivated.
Returns:
True if successful, false otherwise.
-
deactivateAudioStack
public boolean deactivateAudioStack()
Deactivate audio stack. You will typically call this method when there is an audio route change (e.g. a Bluetooth headset has been connected). In such a scenario you will typically stop listening, deactivate the audio stack, activate the audio stack again so that the KeenASR SDK picks up the default route, and then start listening again. This method will return false if the recognizer instance is actively listening or if the audio stack has already been deactivated.
Returns:
True if successful, false otherwise.
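The route-change sequence described for deactivateAudioStack can be sketched as a small helper. This is only an illustration: the RecognizerAudio interface and the restartAfterRouteChange method are hypothetical names introduced here so the example compiles without the SDK; the four methods they declare mirror the documented KASRRecognizer methods, which in an app you would call on KASRRecognizer.sharedInstance().

```java
/**
 * Sketch of the audio-route-change sequence. RecognizerAudio is a hypothetical
 * stand-in (not part of the SDK) that mirrors four documented KASRRecognizer methods.
 */
final class AudioRouteChangeSketch {

    /** Hypothetical stand-in; in an app these calls go to KASRRecognizer.sharedInstance(). */
    interface RecognizerAudio {
        boolean stopListening();
        boolean deactivateAudioStack();
        boolean activateAudioStack();
        boolean startListening();
    }

    /**
     * Runs stop, deactivate, activate, start in that order and aborts at the
     * first call that returns false. Returns true only if listening restarted.
     */
    static boolean restartAfterRouteChange(RecognizerAudio recognizer) {
        // deactivateAudioStack() returns false while the recognizer is listening,
        // so listening has to be stopped first.
        if (!recognizer.stopListening()) {
            return false;
        }
        if (!recognizer.deactivateAudioStack()) {
            return false;
        }
        // Reactivating lets the SDK pick up the new default audio route.
        if (!recognizer.activateAudioStack()) {
            return false;
        }
        return recognizer.startListening();
    }
}
```

The helper assumes the recognizer was listening when the route changed; if it was not, you would likely skip the stopListening and startListening steps.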
-
prepareForListeningWithDecodingGraphWithName
public boolean prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP)
Prepare for recognition by loading a decoding graph that was prepared via one of the methods that create decoding graphs. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will load the decoding graph into memory and it will be ready to start listening via the startListening method.
Parameters:
dgName - name of the custom decoding graph
computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
Returns:
True if successful, false otherwise.
-
prepareForListeningWithDecodingGraphAtPath
public boolean prepareForListeningWithDecodingGraphAtPath(String pathToDecodingGraphDirectory, boolean computeGoP)
Prepare for recognition by loading a decoding graph that's stored in the filesystem (for example, downloaded or copied from the app bundle). You will typically use this approach for large vocabulary tasks, where it would take too long to build the decoding graph on the mobile device. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will load the decoding graph into memory and it will be ready to start listening via the startListening method.
Parameters:
pathToDecodingGraphDirectory - absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app.
computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
Returns:
True if successful, false otherwise.
Note: If the custom decoding graph was built with rescoring capability, all the resources will be loaded regardless of how the rescore parameter is set.
-
prepareForListeningWithContextualDecodingGraphWithName
public boolean prepareForListeningWithContextualDecodingGraphWithName(String dgName, Integer contextId, boolean computeGoP)
Prepare for recognition by loading a contextual decoding graph that was prepared via the KASRDecodingGraph.createContextualDecodingGraphFromPhrases(ArrayList, KASRRecognizer, ArrayList, KASRDecodingGraph.KASRSpeakingTask, float, String) method. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will load the decoding graph into memory and it will be ready to start listening via the startListening method.
Parameters:
dgName - name of the custom decoding graph
contextId - 0-based index of the context group
computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
Returns:
True if successful, false otherwise.
-
prepareForListeningWithContextualDecodingGraphAtPath
public boolean prepareForListeningWithContextualDecodingGraphAtPath(String pathToDecodingGraphDirectory, Integer contextId, boolean computeGoP)
Prepare for recognition by loading a contextual decoding graph that was bundled with the application. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will load the decoding graph into memory and it will be ready to start listening via the startListening method.
Parameters:
pathToDecodingGraphDirectory - absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app.
contextId - 0-based index of the context group
computeGoP - true if Goodness of Pronunciation scores at the phoneme level should be computed, false otherwise
Returns:
True if successful, false otherwise.
Note: If the custom decoding graph was built with rescoring capability, all the resources will be loaded regardless of how the rescore parameter is set.
-
setRescore
public void setRescore(Boolean value)
If set to true, the recognizer will perform rescoring for the final result, using the rescoring language model provided in the custom decoding graph that's bundled with the application. Default is true.
Note: If the resources necessary for rescoring are not available in the custom decoding graph directory bundled with the app and rescore is set to true, the rescoring step will be skipped.
Parameters:
value - boolean value that determines if rescoring is to be performed
-
getRescore
Value of the rescoring flag.
Returns:
True if rescoring is set, false otherwise.
-
startListening
public boolean startListening()
Start processing incoming audio.
Returns:
True if successful, false otherwise.
After calling this method, the recognizer will listen to and decode audio coming through the microphone using the decoding graph you specified via one of the prepareForListening methods. For decoding graphs created without trigger phrase support, the listening process will stop either due to: a) an explicit call to stopListening, or b) one of the Voice Activity Detection rules being triggered (for example, max duration without speech, or end-silence, etc.). We recommend using VAD-based endpointing instead of explicitly calling stopListening.
If the decoding graph was created with trigger phrase support, the SDK will listen continuously until the trigger phrase is recognized; it will then switch over to the standard mode, with partial results reported via the onPartialResult callback.
When the recognizer stops listening due to VAD triggering, it will call the KASRRecognizerListener.onFinalResponse(KASRRecognizer, KASRResponse) callback method. When the recognizer stops listening due to an audio interrupt, *no callback methods* will be triggered until the audio interrupt is over.
VAD settings can be modified via the setVADParameter(KASRVadParameter, float) method.
Note: You will need to call either prepareForListeningWithDecodingGraphWithName(String, boolean) or prepareForListeningWithDecodingGraphAtPath(String, boolean) before calling this method. You will also need to make sure that the user has granted audio recording permission before calling this method; see android.support.v4.app.ActivityCompat#requestPermissions for details.
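The ordering constraints in the note above (permission granted, then a prepareForListening call, then startListening) can be illustrated with a minimal sketch. The Recognizer interface, the prepareAndStart helper, and the micPermissionGranted flag are hypothetical names used only so the example compiles without the SDK or Android; the two declared methods mirror the documented KASRRecognizer signatures.

```java
/**
 * Sketch of the call order required before listening. Recognizer is a hypothetical
 * stand-in (not part of the SDK) for two documented KASRRecognizer methods.
 */
final class StartListeningSketch {

    /** Hypothetical stand-in; in an app these calls go to KASRRecognizer.sharedInstance(). */
    interface Recognizer {
        boolean prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP);
        boolean startListening();
    }

    /**
     * Prepares the named decoding graph and starts listening.
     * micPermissionGranted is assumed to come from the app's own permission check.
     */
    static boolean prepareAndStart(Recognizer recognizer, String dgName, boolean micPermissionGranted) {
        // Audio recording permission must be granted before startListening is called.
        if (!micPermissionGranted) {
            return false;
        }
        // A decoding graph has to be loaded first; GoP scores are not requested here.
        if (!recognizer.prepareForListeningWithDecodingGraphWithName(dgName, false)) {
            return false;
        }
        return recognizer.startListening();
    }
}
```

Once listening has started, results arrive through the KASRRecognizerListener callbacks rather than through the return value of startListening.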
-
stopListening
public boolean stopListening()
Stop the recognizer from processing incoming audio.
Returns:
True if successful, false otherwise.
Note: Calling this method will not trigger recognizerFinalResult delegate call.
-
stopListeningAndReturnFinalResult
Deprecated. This method is deprecated. Use KASRRecognizer.KASRVadParameter to instruct the recognizer to stop listening and obtain KASRResponse via callback.
Stop the recognizer from processing incoming audio and return the final result.
Returns:
Final result of the recognition.
If your application is using Voice Activity Detection parameters, it is possible that this method doesn't return the result if one of the Voice Activity Detection thresholds triggers. In that case, recognizerFinalResult delegate will be called.
Note: This method runs synchronously. For large decoding graphs there may be a noticeable delay (a few hundred ms) on lower-end devices.
-
getRecognizerState
Returns recognizer state, one of KASRRecognizerState values.
Returns:
state of the recognizer, one of KASRRecognizerState values
-
getDecodingGraphName
Returns name of the decoding graph that's used for recognition.
Returns:
name of the decoding graph that's used for recognition, null if the recognizer is not prepared for listening.
-
addListener
public void addListener(KASRRecognizerListener listener)
Adds listener.
Parameters:
listener - listener that should be added
-
removeListener
public void removeListener(KASRRecognizerListener listener)
Removes listener.
Parameters:
listener - listener to be removed
-
addTriggerPhraseListener
Adds trigger phrase listener.
Parameters:
listener - listener that should be added
-
removeTriggerPhraseListener
Removes trigger phrase listener.
Parameters:
listener - listener to be removed
-
getAsrBundlePath
Obtains the path to the directory where the ASR Bundle used to initialize the SDK is stored.
Returns:
String containing a full path to the ASR Bundle directory
-
getAsrBundleName
Obtains the name of the ASR Bundle used to initialize the SDK.
Returns:
String containing the name of the ASR Bundle
-
adaptToSpeakerWithName
public void adaptToSpeakerWithName(String speakerName)
Defines the name that will be used to uniquely identify the speaker adaptation profile. When the recognizer starts to listen, it will try to find a matching speaker profile in the filesystem (profiles are matched based on speaker name, ASR Bundle, and audio route). When the saveSpeakerAdaptation method is called, it uses the name to uniquely identify the profile file that will be saved in the filesystem.
Parameters:
speakerName - (pseudo)name of the speaker for which adaptation is to be performed. Default value is 'default'.
The name used here does not have to correspond to the real name of the user (thus we call it a pseudo name). The exact value does not matter as long as you can match the value to the specific user in your app. For example, you could use 'user1', 'user2', etc.
Note: If you cannot match names to your users, it's recommended not to use this method and not to save adaptation profiles between sessions. Adaptation will still be performed throughout the session, but each new session (activity after initialization of the recognizer) will start from the baseline models.
The in-memory speaker adaptation profile can always be reset by calling the resetSpeakerAdaptation method.
If this method is called while the recognizer is listening, it will only affect subsequent calls to startListening.
-
resetSpeakerAdaptation
public void resetSpeakerAdaptation()
Resets the speaker adaptation profile in the current recognizer session. Calling this method will also reset the speakerName to 'default'. If the corresponding speaker adaptation profile exists in the filesystem for the 'default' speaker, it will be used. If not, the initial models from the ASR Bundle will be the baseline.
You would typically use this method if there is a new start of a certain activity in your app that may entail a new speaker. For example, a practice view is started and there is a good chance a different user may be using the app. If speaker (pseudo)identities are known, you don't need to call this method; you can just switch speakers by calling adaptToSpeakerWithName with the appropriate speakerName.
Following are the tradeoffs when using this method:
- the downside of resetting the user profile for the existing user is that ASR performance will be reset to the baseline (no adaptation), which may slightly degrade performance in the first few interactions
- the downside of NOT resetting the user profile for a new user is that, depending on the characteristics of the new user's voice, ASR performance may initially be degraded slightly (when compared to the baseline case of no adaptation)
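The choice between switching speakers and resetting adaptation can be sketched as follows. The SpeakerAdaptation interface and the onPracticeStarted helper are hypothetical names introduced only for illustration; the two declared methods mirror the documented KASRRecognizer methods, and the rule "reset when no pseudo-name is known" is one possible policy, not a recommendation from the SDK.

```java
/**
 * Sketch of choosing between adaptToSpeakerWithName and resetSpeakerAdaptation.
 * SpeakerAdaptation is a hypothetical stand-in (not part of the SDK).
 */
final class SpeakerSwitchSketch {

    /** Hypothetical stand-in; in an app these calls go to KASRRecognizer.sharedInstance(). */
    interface SpeakerAdaptation {
        void adaptToSpeakerWithName(String speakerName);
        void resetSpeakerAdaptation();
    }

    /**
     * Called when an activity that may involve a different speaker starts.
     * pseudoName is the app's own identifier for the user, or null/empty if unknown.
     */
    static void onPracticeStarted(SpeakerAdaptation recognizer, String pseudoName) {
        if (pseudoName != null && !pseudoName.isEmpty()) {
            // Known (pseudo)identity: switch profiles, no reset needed.
            recognizer.adaptToSpeakerWithName(pseudoName);
        } else {
            // Unknown speaker: fall back to the 'default' profile or baseline models.
            recognizer.resetSpeakerAdaptation();
        }
    }
}
```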
-
saveSpeakerAdaptation
public void saveSpeakerAdaptation()
Saves the speaker profile (used for adaptation) in the filesystem. The speaker profile will be saved in the files/KeenASR-speaker-profiles/ directory. The profile filename is composed of the speakerName, asrBundle, and audioRoute.
-
removeAllSpeakerAdaptationProfiles
public boolean removeAllSpeakerAdaptationProfiles()
Remove all adaptation profiles for all speakers.
Returns:
true if all the profiles for all the speakers were successfully removed
-
removeSpeakerAdaptationProfiles
public boolean removeSpeakerAdaptationProfiles(String speakerName)
Removes all adaptation profiles for the speaker with name speakerName.
Parameters:
speakerName - name of the speaker whose profiles should be removed
Returns:
true if successfully removed, false otherwise
-
setLogLevel
Set log level for the framework.
Parameters:
logLevel - one of KIOSRecognizerLogLevel. Default value is KIOSRecognizerLogLevelWarning.
-
setVADGating
public boolean setVADGating(boolean value)
VAD (voice activity detection) gating introduces a super lightweight voice activity detection before speech recognition. After you call the startListening method, if VAD gating is turned on, recognition will start only after voice has been detected by the VAD gating module. You would typically use this only in always-on listening scenarios when you want to preserve battery life.
Parameters:
value - true if VAD gating should be turned on, false otherwise.
Returns:
true if successfully set, false otherwise
-
setVADParameter
public void setVADParameter(KASRRecognizer.KASRVadParameter parameter, float value)
Set any of the KASRRecognizer.KASRVadParameter Voice Activity Detection parameters. These parameters can be set at any time and they will go into effect immediately.
Parameters:
parameter - one of KASRVadParameter
value - duration in seconds for the parameter
Note: Setting VAD rules in the config file within the ASR Bundle will NOT have any effect. Values for these parameters are set to their defaults upon initialization of KASRRecognizer. They can only be changed programmatically, using this method.
-
getInputLevel
public float getInputLevel()
The most recent signal input level in dB.
Returns:
signal input level in dB
-
isEchoCancellationAvailable
public boolean isEchoCancellationAvailable()
Returns:
True if the device natively supports echo cancellation, false otherwise.
-
performEchoCancellation
public boolean performEchoCancellation(boolean value)
EXPERIMENTAL Specifies if echo cancellation should be performed. If value is set to true and the device supports echo cancellation, then audio played by the application will be removed from the audio captured via the microphone.
Parameters:
value - set to true to turn on echo cancellation processing, false to turn it off. Default is false.
Returns:
true if value was successfully set, false otherwise. If the device does not support echo cancellation and you pass true to this method, it will return false.
WARNING: Calls to this method while the recognizer is listening will be ignored and the method will return false.
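A guarded way to turn on echo cancellation, based only on the behaviour described above, might look like the sketch below. The EchoControl interface and the enableIfSupported helper are hypothetical names introduced so the example compiles without the SDK; the two declared methods mirror the documented KASRRecognizer methods.

```java
/**
 * Sketch of enabling echo cancellation only when the device supports it.
 * EchoControl is a hypothetical stand-in (not part of the SDK).
 */
final class EchoCancellationSketch {

    /** Hypothetical stand-in; in an app these calls go to KASRRecognizer.sharedInstance(). */
    interface EchoControl {
        boolean isEchoCancellationAvailable();
        boolean performEchoCancellation(boolean value);
    }

    /**
     * Returns true if echo cancellation was turned on. Intended to be called
     * while the recognizer is not listening, since calls made during listening
     * are documented to be ignored and to return false.
     */
    static boolean enableIfSupported(EchoControl recognizer) {
        // Skip the call entirely on devices without native support.
        if (!recognizer.isEchoCancellationAvailable()) {
            return false;
        }
        return recognizer.performEchoCancellation(true);
    }
}
```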
-
version
public static String version()
Version of the KeenASR framework.
Returns:
string containing SDK version