Class KASRRecognizer

java.lang.Object
com.keenresearch.keenasr.KASRRecognizer

public class KASRRecognizer extends Object
An instance of the KASRRecognizer class, called recognizer, manages recognizer resources and provides speech recognition capabilities to your application.

You typically initialize the engine at app startup by calling the initWithASRBundleAtPath(String, Context) method, and then use the static sharedInstance() method whenever you need to access the recognizer.

Recognition results are provided via callbacks. To obtain results, one of your classes will need to implement the KASRRecognizerListener interface and some of its methods.

Note: Only a single instance of the recognizer can exist at any given time.

  • Method Details

    • sharedInstance

      public static KASRRecognizer sharedInstance()
      Returns:
      singleton instance of KASRRecognizer. If you previously initialized the recognizer you can use this method to obtain the instance of that recognizer.
    • initWithASRBundleAtPath

      public static boolean initWithASRBundleAtPath(String pathToAsrBundle, android.content.Context context)
      Initialize ASR engine with the ASR Bundle located at the provided path. SDK initialization needs to occur before any other work can be performed.
      Parameters:
      pathToAsrBundle - full path to the ASR Bundle
      context - application context
      Returns:
      True if successful, false otherwise.

      Note: When initializing the recognizer, make sure that the bundle directory contains all the resources needed for the specific recognizer type. If your app creates decoding graphs dynamically, the ASR Bundle directory needs to contain a lang subdirectory with the relevant resources (lexicon, etc.).

    • teardown

      public static boolean teardown()
      Tears down the current singleton instance of the recognizer and all associated resources. This method returns false if the recognizer is actively listening.
      Returns:
      True if successful, false otherwise.
    • activateAudioStack

      public boolean activateAudioStack()
      Activate audio stack that has been previously deactivated.
      Returns:
      True if successful, false otherwise
    • deactivateAudioStack

      public boolean deactivateAudioStack()
      Deactivate the audio stack. You will typically call this method when there is an audio route change (e.g. a Bluetooth headset has been connected). In such a scenario you will typically stop listening, deactivate the audio stack, activate the audio stack so that the KeenASR SDK picks up the default route, and then start listening again. This method returns false if the recognizer instance is actively listening or if the audio stack has already been deactivated.
      Returns:
      True if successful, false otherwise
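
      The route-change sequence described above could be sketched roughly as follows. This is only an illustration: AudioStackControl, FakeRecognizer and restartAfterRouteChange are hypothetical names introduced here so the example can run without the SDK; in an app, the four calls would go to the KASRRecognizer instance instead.

```java
import java.util.ArrayList;
import java.util.List;

class AudioRouteChangeSketch {

    /** Hypothetical stand-in for the four KASRRecognizer methods used below. */
    interface AudioStackControl {
        boolean stopListening();
        boolean deactivateAudioStack();
        boolean activateAudioStack();
        boolean startListening();
    }

    /**
     * Runs the documented order: stop, deactivate, activate, start.
     * Returns false as soon as one step reports failure.
     */
    static boolean restartAfterRouteChange(AudioStackControl recognizer, boolean wasListening) {
        if (wasListening && !recognizer.stopListening()) {
            return false;
        }
        if (!recognizer.deactivateAudioStack()) {
            return false; // still listening, or the stack was already deactivated
        }
        if (!recognizer.activateAudioStack()) {
            return false;
        }
        // Only resume listening if we interrupted an active session.
        return !wasListening || recognizer.startListening();
    }

    /** Fake recognizer that records the call order; for demonstration only. */
    static class FakeRecognizer implements AudioStackControl {
        final List<String> calls = new ArrayList<>();
        boolean listening = true;
        boolean active = true;

        public boolean stopListening() { calls.add("stop"); listening = false; return true; }

        public boolean deactivateAudioStack() {
            calls.add("deactivate");
            if (listening || !active) {
                return false; // mirrors the documented failure conditions
            }
            active = false;
            return true;
        }

        public boolean activateAudioStack() { calls.add("activate"); active = true; return true; }

        public boolean startListening() { calls.add("start"); listening = true; return true; }
    }

    public static void main(String[] args) {
        FakeRecognizer fake = new FakeRecognizer();
        boolean ok = restartAfterRouteChange(fake, true);
        System.out.println(ok + " " + fake.calls);
    }
}
```

      With the fake above, main prints "true [stop, deactivate, activate, start]". Checking each return value matters because deactivateAudioStack() returns false while the recognizer is still listening.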
    • prepareForListeningWithDecodingGraphWithName

      public boolean prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP)
      Prepare for recognition by loading a decoding graph that was prepared via one of the methods that create decoding graphs. Calls to this method will be ignored if the recognizer is listening.

      After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

      Parameters:
      dgName - name of the custom decoding graph
      computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
      Returns:
      True if successful, false otherwise.
    • prepareForListeningWithDecodingGraphAtPath

      public boolean prepareForListeningWithDecodingGraphAtPath(String pathToDecodingGraphDirectory, boolean computeGoP)
      Prepare for recognition by loading a decoding graph that's stored in the filesystem (for example, downloaded or copied from the app bundle). You will typically use this approach for large vocabulary tasks, where it would take too long to build the decoding graph on the mobile device. Calls to this method will be ignored if the recognizer is listening.

      After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

      Parameters:
      pathToDecodingGraphDirectory - absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app.
      computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
      Returns:
      True if successful, false otherwise.

      Note: If the custom decoding graph was built with rescoring capability, all the resources will be loaded regardless of how the rescore parameter is set.

    • prepareForListeningWithContextualDecodingGraphWithName

      public boolean prepareForListeningWithContextualDecodingGraphWithName(String dgName, Integer contextId, boolean computeGoP)
      Prepare for recognition by loading contextual decoding graph that was prepared via KASRDecodingGraph.createContextualDecodingGraphFromPhrases(ArrayList, KASRRecognizer, ArrayList, KASRDecodingGraph.KASRSpeakingTask, float, String) method. Calls to this method will be ignored if the recognizer is listening.

      After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

      Parameters:
      dgName - name of the custom decoding graph
      contextId - 0-based index of the context group
      computeGoP - set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
      Returns:
      True if successful, false otherwise.
    • prepareForListeningWithContextualDecodingGraphAtPath

      public boolean prepareForListeningWithContextualDecodingGraphAtPath(String pathToDecodingGraphDirectory, Integer contextId, boolean computeGoP)
      Prepare for recognition by loading contextual decoding graph that was bundled with the application. Call to this method will be ignored if the recognizer is listening.

      After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.

      Parameters:
      pathToDecodingGraphDirectory - absolute path to the custom decoding graph directory which was created ahead of time and packaged with the app.
      contextId - 0-based index of the context group
      computeGoP - true if Goodness of Pronunciation scores at the phoneme level should be computed, false otherwise
      Returns:
      True if successful, false otherwise.

      Note: If the custom decoding graph was built with rescoring capability, all the resources will be loaded regardless of how the rescore parameter is set.

    • setRescore

      public void setRescore(Boolean value)
      If set to true, recognizer will perform rescoring for the final result, using rescoring language model provided in the custom decoding graph that's bundled with the application.

      Default is true.

      Note: If the resources necessary for rescoring are not available in the custom decoding graph directory bundled with the app, and rescore is set to true, rescoring step will be skipped.

      Parameters:
      value - boolean value that determines if rescoring is to be performed
    • getRescore

      public Boolean getRescore()
      Value of the rescoring flag
      Returns:
      True if rescoring is set, false otherwise
    • startListening

      public boolean startListening()
      Start processing incoming audio.
      Returns:
      True if successful, false otherwise.

      After calling this method, the recognizer will listen to and decode audio coming through the microphone, using the decoding graph you specified via one of the prepareForListening methods. For decoding graphs created without trigger phrase support, listening will stop either due to: a) an explicit call to stopListening, or b) one of the Voice Activity Detection rules being triggered (for example, max duration without speech, or end-silence, etc.). We recommend using VAD-based end-pointing instead of explicitly calling stopListening.

      If the decoding graph was created with trigger phrase support, the SDK will listen continuously until the trigger phrase is recognized, and will then switch over to the standard mode, with partial results being reported via the onPartialResult callback.

      When the recognizer stops listening due to VAD triggering, it will call KASRRecognizerListener.onFinalResponse(KASRRecognizer, KASRResponse) callback method.

      When the recognizer stops listening due to audio interrupt, *no callback methods* will be triggered until audio interrupt is over.

      VAD settings can be modified via setVADParameter(KASRVadParameter, float) method.

      Note: You will need to call either prepareForListeningWithDecodingGraphWithName(String, boolean) or prepareForListeningWithDecodingGraphAtPath(String, boolean) before calling this method. You will also need to make sure that user has granted audio recording permission before calling this method; see android.support.v4.app.ActivityCompat#requestPermissions for details.
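
      The required order (permission check, prepare, then start) could be sketched roughly as follows. ListeningControl, Outcome, FakeRecognizer and prepareAndStart are hypothetical names introduced for illustration so the example runs without the SDK or Android; the two interface methods mirror the documented KASRRecognizer signatures, and the "commands" decoding graph name is made up.

```java
class StartListeningSketch {

    /** Hypothetical stand-in mirroring two documented KASRRecognizer methods. */
    interface ListeningControl {
        boolean prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP);
        boolean startListening();
    }

    /** Illustrative result type; not part of the SDK. */
    enum Outcome { NO_PERMISSION, PREPARE_FAILED, START_FAILED, LISTENING }

    static Outcome prepareAndStart(ListeningControl recognizer, String dgName,
                                   boolean recordPermissionGranted) {
        if (!recordPermissionGranted) {
            return Outcome.NO_PERMISSION; // ask the user for the permission before retrying
        }
        // GoP scores are left off here because they add overhead and need extra assets.
        if (!recognizer.prepareForListeningWithDecodingGraphWithName(dgName, false)) {
            return Outcome.PREPARE_FAILED;
        }
        return recognizer.startListening() ? Outcome.LISTENING : Outcome.START_FAILED;
    }

    /** Fake that knows a single decoding graph name; for demonstration only. */
    static class FakeRecognizer implements ListeningControl {
        private boolean prepared = false;

        public boolean prepareForListeningWithDecodingGraphWithName(String dgName, boolean computeGoP) {
            prepared = "commands".equals(dgName);
            return prepared;
        }

        public boolean startListening() {
            return prepared; // starting only works after a successful prepare call
        }
    }

    public static void main(String[] args) {
        System.out.println(prepareAndStart(new FakeRecognizer(), "commands", true));
        System.out.println(prepareAndStart(new FakeRecognizer(), "missing", true));
        System.out.println(prepareAndStart(new FakeRecognizer(), "commands", false));
    }
}
```

      With the fake above, main prints LISTENING, PREPARE_FAILED and NO_PERMISSION, one per line.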
    • stopListening

      public boolean stopListening()
      Stop the recognizer from processing incoming audio.
      Returns:
      True if successful, false otherwise.

      Note: Calling this method will not trigger the KASRRecognizerListener.onFinalResponse(KASRRecognizer, KASRResponse) callback.

    • stopListeningAndReturnFinalResult

      @Deprecated public KASRResult stopListeningAndReturnFinalResult()
      Deprecated.
      This method is deprecated. Use KASRRecognizer.KASRVadParameter to instruct recognizer to stop listening and obtain KASRResponse via callback.
      Stop the recognizer from processing incoming audio and return the final result.
      Returns:
      Final result of the recognition.

      If your application is using Voice Activity Detection parameters, it is possible that this method doesn't return the result because one of the Voice Activity Detection thresholds triggers first. In that case, the KASRRecognizerListener.onFinalResponse(KASRRecognizer, KASRResponse) callback will be called.

      Note: This method runs synchronously. For large decoding graphs there may be a noticeable delay (a few hundred ms) on lower-end devices.

    • getRecognizerState

      public KASRRecognizer.KASRRecognizerState getRecognizerState()
      Returns recognizer state, one of KASRRecognizerState values
      Returns:
      state of the recognizer, one of KASRRecognizerState values
    • getDecodingGraphName

      public String getDecodingGraphName()
      Returns name of the decoding graph that's used for recognition.
      Returns:
      name of the decoding graph that's used for recognition, null if the recognizer is not prepared for listening.
    • addListener

      public void addListener(KASRRecognizerListener listener)
      Adds listener.
      Parameters:
      listener - listener that should be added
    • removeListener

      public void removeListener(KASRRecognizerListener listener)
      Removes listener.
      Parameters:
      listener - listener to be removed
    • addTriggerPhraseListener

      public void addTriggerPhraseListener(KASRRecognizerTriggerPhraseListener listener)
      Adds trigger phrase listener.
      Parameters:
      listener - listener that should be added
    • removeTriggerPhraseListener

      public void removeTriggerPhraseListener(KASRRecognizerTriggerPhraseListener listener)
      Removes trigger phrase listener.
      Parameters:
      listener - listener to be removed
    • getAsrBundlePath

      public String getAsrBundlePath()
      Obtains a path to the directory where ASR Bundle used to initialize the SDK is stored
      Returns:
      String containing a full path to the ASR Bundle directory
    • getAsrBundleName

      public String getAsrBundleName()
      Obtains a name of the ASR Bundle used to initialize the SDK.
      Returns:
      String containing a name of the ASR Bundle
    • adaptToSpeakerWithName

      public void adaptToSpeakerWithName(String speakerName)
      Defines the name that will be used to uniquely identify the speaker adaptation profile. When the recognizer starts to listen, it will try to find a matching speaker profile in the filesystem (profiles are matched based on speaker name, ASR Bundle, and audio route). When the saveSpeakerAdaptation() method is called, it uses the name to uniquely identify the profile file that will be saved in the filesystem.
      Parameters:
      speakerName - (pseudo)name of the speaker for which adaptation is to be performed. Default value is 'default'.

      The name used here does not have to correspond to the real name of the user (thus we call it a pseudo name). The exact value does not matter as long as you can match the value to the specific user in your app. For example, you could use 'user1', 'user2', etc.

      Note: If you cannot match names to your users, it's recommended to not use this method, and to not save adaptation profiles between sessions. Adaptation will still be performed throughout the session, but each new session (activity after initialization of recognizer) will start from the baseline models.

      In-memory speaker adaptation profile can always be reset by calling resetSpeakerAdaptation method.

      If this method is called while recognizer is listening, it will only affect subsequent calls to startListening methods.

    • resetSpeakerAdaptation

      public void resetSpeakerAdaptation()
      Resets the speaker adaptation profile in the current recognizer session. Calling this method will also reset the speakerName to 'default'. If the corresponding speaker adaptation profile exists in the filesystem for the 'default' speaker, it will be used. If not, initial models from the ASR Bundle will be the baseline. You would typically use this method if a new activity is starting in your app that may involve a new speaker. For example, a practice view is started and there is a good chance a different user may be using the app. If speaker (pseudo)identities are known, you don't need to call this method; you can just switch speakers by calling adaptToSpeakerWithName(String) with the appropriate speakerName. The tradeoffs when using this method are:
      • the downside of resetting user profile for the existing user is that ASR performance will be reset to the baseline (no adaptation), which may slightly degrade performance in the first few interactions
      • the downside of NOT resetting user profile for a new user is that, depending on the characteristics of the new user's voice, ASR performance may initially be degraded slightly (when comparing to the baseline case of no adaptation)
      Calls to this method will be ignored if the recognizer is in the LISTENING state. If you are resetting the adaptation profile and you know the user's (pseudo)identity, you may want to call the saveSpeakerAdaptation() method prior to calling this method, so that on subsequent user switches adaptation profiles can be reloaded and recognition starts with the speaker profile trained on audio from previous sessions.
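
      One possible way to combine the calls above when the person using the app changes is sketched below. SpeakerAdaptation, RecordingRecognizer and switchSpeaker are hypothetical names introduced for illustration; the three interface methods mirror the documented KASRRecognizer signatures. The sketch assumes the recognizer is not in the LISTENING state when it runs.

```java
import java.util.ArrayList;
import java.util.List;

class SpeakerSwitchSketch {

    /** Hypothetical stand-in mirroring three documented KASRRecognizer methods. */
    interface SpeakerAdaptation {
        void adaptToSpeakerWithName(String speakerName);
        void resetSpeakerAdaptation();
        void saveSpeakerAdaptation();
    }

    /**
     * Saves the current profile only when the current speaker is known, then either
     * switches to the next known speaker or resets adaptation for an unknown one.
     */
    static void switchSpeaker(SpeakerAdaptation recognizer, boolean currentSpeakerKnown,
                              String nextSpeakerName) {
        if (currentSpeakerKnown) {
            recognizer.saveSpeakerAdaptation(); // keep what was learned in this session
        }
        if (nextSpeakerName != null) {
            recognizer.adaptToSpeakerWithName(nextSpeakerName);
        } else {
            recognizer.resetSpeakerAdaptation(); // unknown speaker: back to 'default'
        }
    }

    /** Records calls instead of touching real profiles; for demonstration only. */
    static class RecordingRecognizer implements SpeakerAdaptation {
        final List<String> calls = new ArrayList<>();

        public void adaptToSpeakerWithName(String speakerName) { calls.add("adapt:" + speakerName); }

        public void resetSpeakerAdaptation() { calls.add("reset"); }

        public void saveSpeakerAdaptation() { calls.add("save"); }
    }

    public static void main(String[] args) {
        RecordingRecognizer known = new RecordingRecognizer();
        switchSpeaker(known, true, "user2");
        System.out.println(known.calls);

        RecordingRecognizer unknown = new RecordingRecognizer();
        switchSpeaker(unknown, false, null);
        System.out.println(unknown.calls);
    }
}
```

      With the recording fake, main prints "[save, adapt:user2]" followed by "[reset]".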
    • saveSpeakerAdaptation

      public void saveSpeakerAdaptation()
      Saves the speaker profile (used for adaptation) in the filesystem, in the files/KeenASR-speaker-profiles/ directory. The profile filename is composed of the speakerName, asrBundle, and audioRoute.
    • removeAllSpeakerAdaptationProfiles

      public boolean removeAllSpeakerAdaptationProfiles()
      Remove all adaptation profiles for all speakers.
      Returns:
      true if successfully removed all the profiles for all the speakers
    • removeSpeakerAdaptationProfiles

      public boolean removeSpeakerAdaptationProfiles(String speakerName)
      Removes all adaptation profiles for the speaker with name speakerName.
      Parameters:
      speakerName - name of the speaker whose profiles should be removed
      Returns:
      true if successfully removed, false otherwise
    • setLogLevel

      public static void setLogLevel(KASRRecognizer.KASRRecognizerLogLevel logLevel)
      Set log level for the framework.
      Parameters:
      logLevel - one of the KASRRecognizer.KASRRecognizerLogLevel values

      Default is the warning level.

    • setVADGating

      public boolean setVADGating(boolean value)
      VAD (voice activity detection) gating adds a very lightweight voice activity detection step before speech recognition. After you call the startListening method, if VAD gating is turned on, recognition will start only after voice has been detected by the VAD gating module. You would typically use this only in always-on listening scenarios where you want to preserve battery life.
      Parameters:
      value - true if VAD Gating should be turned on, false otherwise.
      Returns:
      true if successfully set, false otherwise
    • setVADParameter

      public void setVADParameter(KASRRecognizer.KASRVadParameter parameter, float value)
      Set any of KASRRecognizer.KASRVadParameter Voice Activity Detection parameters. These parameters can be set at any time and they will go into effect immediately.
      Parameters:
      parameter - one of KASRVadParameter
      value - duration in seconds for the parameter

      Note: Setting VAD rules in the config file within the ASR bundle will NOT have any effect. Values for these parameters are set to their defaults upon initialization of KASRRecognizer. They can only be changed programmatically, using this method.
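
      Because these values can only be changed programmatically, an app may want to keep its VAD settings in one place and apply them after initialization. A minimal sketch of that idea follows. PlaceholderVadParameter and its constants are hypothetical placeholders (the real constants are defined by KASRRecognizer.KASRVadParameter and are not listed in this section), VadControl is a stand-in for the recognizer, and the durations are illustrative only.

```java
import java.util.EnumMap;
import java.util.Map;

class VadSettingsSketch {

    /** Hypothetical placeholder for KASRRecognizer.KASRVadParameter; names are made up. */
    enum PlaceholderVadParameter { END_SILENCE, MAX_DURATION }

    /** Hypothetical stand-in mirroring the shape of the documented setVADParameter method. */
    interface VadControl {
        void setVADParameter(PlaceholderVadParameter parameter, float value);
    }

    /** Applies every configured duration (in seconds) and returns how many were set. */
    static int applyAll(VadControl recognizer, Map<PlaceholderVadParameter, Float> secondsByParameter) {
        int applied = 0;
        for (Map.Entry<PlaceholderVadParameter, Float> entry : secondsByParameter.entrySet()) {
            recognizer.setVADParameter(entry.getKey(), entry.getValue());
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<PlaceholderVadParameter, Float> settings = new EnumMap<>(PlaceholderVadParameter.class);
        settings.put(PlaceholderVadParameter.END_SILENCE, 1.0f);   // illustrative value
        settings.put(PlaceholderVadParameter.MAX_DURATION, 20.0f); // illustrative value

        // The lambda stands in for the recognizer and simply records what it receives.
        Map<PlaceholderVadParameter, Float> received = new EnumMap<>(PlaceholderVadParameter.class);
        int applied = applyAll((parameter, value) -> received.put(parameter, value), settings);

        System.out.println(applied + " " + received.equals(settings));
    }
}
```

      With the recording lambda, main prints "2 true".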

    • getInputLevel

      public float getInputLevel()
      The most recent signal input level in dB
      Returns:
      signal input level in dB
    • isEchoCancellationAvailable

      public boolean isEchoCancellationAvailable()
      Returns:
      True if the device natively supports echo cancellation, false otherwise.
    • performEchoCancellation

      public boolean performEchoCancellation(boolean value)
      EXPERIMENTAL Specifies if echo cancellation should be performed. If value is set to true and the device supports echo cancellation, then audio played by the application will be removed from the audio captured via the microphone.
      Parameters:
      value - set to true to turn on echo cancellation processing, false to turn it off. Default is false.
      Returns:
      true if value was successfully set, false otherwise. If the device does not support echo cancellation and you pass true to this method, it will return false. WARNING: Calls to this method while the recognizer is listening will be ignored and the method will return false.
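
      A guarded way to turn on this experimental feature could look roughly like the sketch below. EchoControl, FakeRecognizer and enableEchoCancellationIfSupported are hypothetical names introduced for illustration; the two interface methods mirror the documented KASRRecognizer signatures, and the fake only imitates the documented return-value rules.

```java
class EchoCancellationSketch {

    /** Hypothetical stand-in mirroring two documented KASRRecognizer methods. */
    interface EchoControl {
        boolean isEchoCancellationAvailable();
        boolean performEchoCancellation(boolean value);
    }

    /** Requests echo cancellation only when the device reports support for it. */
    static boolean enableEchoCancellationIfSupported(EchoControl recognizer) {
        if (!recognizer.isEchoCancellationAvailable()) {
            return false; // nothing to enable on this device
        }
        // This can still fail, for example when the recognizer is currently listening.
        return recognizer.performEchoCancellation(true);
    }

    /** Fake that imitates the documented return values; for demonstration only. */
    static class FakeRecognizer implements EchoControl {
        private final boolean available;
        private final boolean listening;
        boolean enabled = false;

        FakeRecognizer(boolean available, boolean listening) {
            this.available = available;
            this.listening = listening;
        }

        public boolean isEchoCancellationAvailable() { return available; }

        public boolean performEchoCancellation(boolean value) {
            if (listening || (value && !available)) {
                return false; // ignored while listening, or unsupported on this device
            }
            enabled = value;
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(enableEchoCancellationIfSupported(new FakeRecognizer(true, false)));
        System.out.println(enableEchoCancellationIfSupported(new FakeRecognizer(false, false)));
        System.out.println(enableEchoCancellationIfSupported(new FakeRecognizer(true, true)));
    }
}
```

      With the fake above, main prints true, false and false: the feature is enabled only when it is available and the recognizer is not listening.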
    • version

      public static String version()
      Version of the KeenASR framework.
      Returns:
      string containing SDK version