1. Review the KeenASR API Reference

Download the Javadoc for the SDK or review the documentation online.

2. Import Relevant Packages

import com.keenresearch.keenasr.KASRDecodingGraph;
import com.keenresearch.keenasr.KASRRecognizer;
import com.keenresearch.keenasr.KASRResult;
import com.keenresearch.keenasr.KASRRecognizerListener;
// if trigger phrase functionality is used
//import com.keenresearch.keenasr.KASRRecognizerTriggerPhraseListener;
import com.keenresearch.keenasr.KASRBundle;

3. Copy the ASR Bundle from the Assets folder to Device Internal Storage

If the ASR Bundle was included in your app as an asset, it will need to be copied to the device's internal (or external) storage. The way Android provides read access to assets is not compatible with Keen Research's native libraries, so you need to copy the ASR Bundle to regular device storage before you can initialize the SDK.


KASRBundle asrBundle = new KASRBundle(this.context);
ArrayList<String> assets = new ArrayList<String>();

// you will need to make sure all individual assets are added to the ArrayList
assets.add("keenB2mQT-nnet3chain-en-us/decode.conf");
assets.add("keenB2mQT-nnet3chain-en-us/final.dubm");
assets.add("keenB2mQT-nnet3chain-en-us/final.ie");
assets.add("keenB2mQT-nnet3chain-en-us/final.mat");
assets.add("keenB2mQT-nnet3chain-en-us/final.mdl");
assets.add("keenB2mQT-nnet3chain-en-us/global_cmvn.stats");
assets.add("keenB2mQT-nnet3chain-en-us/ivector_extractor.conf");
assets.add("keenB2mQT-nnet3chain-en-us/mfcc.conf");
assets.add("keenB2mQT-nnet3chain-en-us/online_cmvn.conf");
assets.add("keenB2mQT-nnet3chain-en-us/splice.conf");
assets.add("keenB2mQT-nnet3chain-en-us/splice_opts");
assets.add("keenB2mQT-nnet3chain-en-us/wordBoundaries.int");
assets.add("keenB2mQT-nnet3chain-en-us/words.txt");

assets.add("keenB2mQT-nnet3chain-en-us/lang/lexicon.txt");
assets.add("keenB2mQT-nnet3chain-en-us/lang/phones.txt");
assets.add("keenB2mQT-nnet3chain-en-us/lang/tree");


String asrBundleRootPath = getApplicationInfo().dataDir;
String asrBundlePath = asrBundleRootPath + "/keenB2mQT-nnet3chain-en-us";
try {
  asrBundle.installASRBundle(assets, asrBundleRootPath);
} catch (IOException e) {
  Log.e(TAG, "Error occurred when installing ASR bundle: " + e);
}

4. Make Sure the App has Microphone Access

When an Android app captures audio from the microphone, it must ask the user for permission. This logic needs to be built into your app; the proof-of-concept app on GitHub shows one way to do it. Note that this is a very simple proof-of-concept approach. In a real app you would want to gracefully handle the case where a user declines to grant access to the microphone.
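A minimal sketch of a runtime permission check, using the standard AndroidX compatibility APIs (the request code is an arbitrary value you choose; your Activity should also override onRequestPermissionsResult to react to the user's choice):

```java
import android.Manifest;
import android.content.pm.PackageManager;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

// In your Activity, before initializing the recognizer:
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1; // arbitrary request code

private void ensureMicrophonePermission() {
  if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
      != PackageManager.PERMISSION_GRANTED) {
    // Shows the system permission dialog; the result arrives asynchronously
    // in onRequestPermissionsResult(...)
    ActivityCompat.requestPermissions(this,
        new String[]{Manifest.permission.RECORD_AUDIO},
        PERMISSIONS_REQUEST_RECORD_AUDIO);
  }
}
```

Your AndroidManifest.xml must also declare `<uses-permission android:name="android.permission.RECORD_AUDIO" />`.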

5. Initialize the SDK, Set up the Listener, and Set the VAD Parameters and Other Settings

Initialize the SDK using the path to the ASR Bundle (on internal or external storage). This process takes a few seconds, so you may want to run it in a background thread (e.g. an AsyncTask).

KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
if (recognizer != null) {
   recognizer.addListener(MainActivity.instance);
//   recognizer.addTriggerPhraseListener(MainActivity.instance);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForGoodMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForAnyMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutMaxDuration, 10.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutForNoSpeech, 3.0f);
   recognizer.setCreateAudioRecordings(true);
}
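Since initialization blocks for a few seconds, one way to keep it off the main thread is a plain worker thread (shown here instead of an AsyncTask for brevity; this sketch assumes the asrBundlePath from step 3 and runs inside an Activity):

```java
// Run the slow SDK initialization off the UI thread, then finish the
// recognizer setup back on the UI thread.
new Thread(() -> {
  KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
  runOnUiThread(() -> {
    KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
    if (recognizer != null) {
      recognizer.addListener(MainActivity.instance);
      // set VAD parameters etc., as shown above
    }
  });
}).start();
```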

6. Implement Listener Methods

The onPartialResult method will fire every 100-200ms and provide partial recognition results in real time. The partial result object does not contain word timing information, confidence statistics, or JSON representation.

The onFinalResult method is called after the recognizer has stopped listening due to one of the Voice Activity Detection rules triggering.

public void onPartialResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "   Partial result: " + result.getText());
  // you may do something else here, e.g. show the partial result to the user
  // or analyze it and set KASRVadTimeoutEndSilenceForGoodMatch and KASRVadTimeoutEndSilenceForAnyMatch to a lower value
}

public void onFinalResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "Final result: " + result);
  Log.i(TAG, "audioFile is in " + recognizer.getLastRecordingFilename());
  // you would probably do a bit more, e.g. re-enable "Start Listening" button, show text to the user
  // perform some activity in the app based on the recognized text
}

If you use the trigger phrase functionality, be sure to implement the KASRRecognizerTriggerPhraseListener interface, in particular the onTriggerPhrase method. This method is called when the trigger phrase is recognized.

public void onTriggerPhrase(KASRRecognizer recognizer) {
  Log.i(TAG, "   Trigger phrase detected");
  // you may do something else here, e.g. use visual effect to indicate to the user that the trigger phrase has been recognized.
}

7. Create Decoding Graphs and Prepare for Listening

Here we assume that the getPhrases method returns an array of strings with phrases that the recognizer should listen to.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
// getPhrases is a method that returns an array of phrases
String[] phrases = MainActivity.getPhrases();

if (recognizer != null) {
   String dgName = "words";
   KASRDecodingGraph.createDecodingGraphFromSentences(phrases, recognizer, dgName);
   // alternatively, if you are using trigger phrase functionality
//   KASRDecodingGraph.createDecodingGraphFromSentencesWithTriggerPhrase(phrases, "hey computer", recognizer, dgName);   
   recognizer.prepareForListeningWithCustomDecodingGraphWithName(dgName);
} else {
   Log.e(TAG, "Unable to retrieve recognizer");
}

8. Start/Stop Listening

To start capturing audio from the device mic and decoding it, call the recognizer’s startListening method.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
recognizer.startListening();

You can make the app stop listening by explicitly calling the stopListening() method. However, we highly recommend you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.

While the recognizer is listening, it will periodically (every 100-200ms) call the listener’s onPartialResult() method IF there are partial recognition results AND they differ from the most recent partial result.

The Recognizer automatically stops listening when one of the Voice Activity Detection (VAD) rules is triggered. You can control the VAD configuration parameters through the setVADParameter() method. When the recognizer stops listening due to VAD triggering, it will call onFinalResult() method.

See KASRVadParameter constants for more information on configuring VAD settings.

When using a decoding graph created with trigger phrase support, the logic is slightly different. After you call the startListening method, the recognizer will listen continuously. Then, when the trigger phrase is recognized, the onTriggerPhrase method is called, after which the recognizer switches back to the regular listening mode. In regular listening mode partial results are reported via onPartialResult and the recognizer stops listening after one of the VAD rules is triggered. In your onFinalResult method you would act upon the command and then most likely call startListening again to continue listening (this will depend on the logic of your application).
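In trigger-phrase mode, the end of a command is still signaled via onFinalResult, so resuming continuous listening typically happens there. A minimal sketch (handleCommand is a hypothetical app-specific method):

```java
public void onFinalResult(KASRRecognizer recognizer, final KASRResult result) {
  handleCommand(result.getText()); // hypothetical: act on the recognized command
  // resume continuous listening so the next trigger phrase is caught
  recognizer.startListening();
}
```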

You can direct the recognizer to create audio recordings of the audio that is captured from the microphone by calling setCreateAudioRecordings(true). Use the getLastRecordingFilename() method to determine the path to the recorded file.

The SDK only creates the recording; it is your responsibility to do something with the audio file once it has been created. For example, you might play the audio back to the user and delete it afterwards. Or you could delete the audio locally only after sending it to your backend to assess how users are interacting with your app.
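Cleaning up the recording is plain java.io file handling. A minimal sketch, where the path would come from getLastRecordingFilename() and deleteRecording is a hypothetical helper of your own:

```java
import java.io.File;
import java.io.IOException;

public class RecordingCleanup {
  // Deletes the recording once the app is done with it; returns true on success.
  // In the app, path would come from recognizer.getLastRecordingFilename().
  public static boolean deleteRecording(String path) {
    if (path == null) {
      return false;
    }
    File f = new File(path);
    return f.exists() && f.delete();
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("kasr-rec", ".wav");
    System.out.println(deleteRecording(tmp.getAbsolutePath())); // true
    System.out.println(deleteRecording(tmp.getAbsolutePath())); // false, already deleted
  }
}
```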

As a Keen Research customer, you have access to Dashboard, a cloud-based tool for development support. Future releases of the Android SDK will provide ways to automatically upload recordings and ASR metadata to Dashboard for review and further analysis.