1. Review the KeenASR API Reference

Download the Javadoc for the SDK, or review the documentation online.

2. Import Relevant Packages

import com.keenresearch.keenasr.KASRDecodingGraph;
import com.keenresearch.keenasr.KASRRecognizer;
import com.keenresearch.keenasr.KASRResult;
import com.keenresearch.keenasr.KASRResponse;
import com.keenresearch.keenasr.KASRRecognizerListener;
import com.keenresearch.keenasr.KASRBundle;

// if trigger phrase functionality is used
//import com.keenresearch.keenasr.KASRRecognizerTriggerPhraseListener;

3. Copy the ASR Bundle from the Assets folder to Device Internal Storage

If the ASR Bundle was included in your app as an asset, it will need to be copied to the device's internal (or external) storage. The way Android provides read access to assets is not compatible with the Keen Research native libraries. You therefore need to copy the ASR Bundle to regular device storage before you can initialize the SDK.


KASRBundle asrBundle = new KASRBundle(this.context);
ArrayList<String> assets = new ArrayList<String>();

// you will need to make sure all individual assets are added to the ArrayList
String asrBundleName = "keenB2mQT-nnet3chain-en-us";
assets.add(asrBundleName + "/decode.conf");
assets.add(asrBundleName + "/final.dubm");
assets.add(asrBundleName + "/final.ie");
assets.add(asrBundleName + "/final.mat");
assets.add(asrBundleName + "/final.mdl");
assets.add(asrBundleName + "/global_cmvn.stats");
assets.add(asrBundleName + "/ivector_extractor.conf");
assets.add(asrBundleName + "/mfcc.conf");
assets.add(asrBundleName + "/online_cmvn.conf");
assets.add(asrBundleName + "/splice.conf");
assets.add(asrBundleName + "/splice_opts");
assets.add(asrBundleName + "/wordBoundaries.int");
assets.add(asrBundleName + "/words.txt");

assets.add(asrBundleName + "/lang/lexicon.txt");
assets.add(asrBundleName + "/lang/phones.txt");
assets.add(asrBundleName + "/lang/tree");
assets.add(asrBundleName + "/lang/unk_inv.fst");


String asrBundleRootPath = getApplicationInfo().dataDir;
String asrBundlePath = asrBundleRootPath + "/" + asrBundleName;
try {
  asrBundle.installASRBundle(assets, asrBundleRootPath);
} catch (IOException e) {
  Log.e(TAG, "Error occurred when installing ASR bundle: " + e);
}
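The repetitive assets.add(...) calls above can be generated with a small helper. The buildAssetPaths method below is a hypothetical convenience utility, not part of the SDK; the file list still has to match the bundle contents exactly.

```java
import java.util.ArrayList;
import java.util.List;

public class AssetPaths {
  // Prefixes each bundle-relative file name with the bundle directory name.
  static List<String> buildAssetPaths(String bundleName, String... fileNames) {
    List<String> paths = new ArrayList<>();
    for (String fileName : fileNames) {
      paths.add(bundleName + "/" + fileName);
    }
    return paths;
  }
}
```

You could then write, for example, assets.addAll(AssetPaths.buildAssetPaths(asrBundleName, "decode.conf", "final.dubm", "lang/lexicon.txt")) and extend the list with the remaining files.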

4. Make Sure the App has Microphone Access

When an Android app captures audio from the microphone, it must ask the user for permission. This logic needs to be built into your app; the proof-of-concept app on GitHub shows one way to do this. Note that this is a very simple proof-of-concept approach. In a real app you would want to gracefully handle the case where the user declines to provide access to the microphone, and make sure access has been granted before you start using the SDK.
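As a minimal sketch using standard Android APIs, a runtime permission check and request might look like this (REQUEST_RECORD_AUDIO is an arbitrary request code you define; the manifest also needs the android.permission.RECORD_AUDIO uses-permission entry):

```java
// In your Activity, before initializing the SDK:
if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
    != PackageManager.PERMISSION_GRANTED) {
  ActivityCompat.requestPermissions(this,
      new String[]{Manifest.permission.RECORD_AUDIO}, REQUEST_RECORD_AUDIO);
}

// Handle the user's decision:
@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults) {
  if (requestCode == REQUEST_RECORD_AUDIO
      && grantResults.length > 0
      && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
    // Safe to initialize the SDK and start listening.
  } else {
    // Explain to the user why the app cannot listen, or disable ASR features.
  }
}
```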

5. Initialize the SDK, Set up the Listener, and Set the VAD Parameters and Other Settings

Initialize the SDK using the path to the ASR Bundle (on the internal or external storage). This process can take a couple of seconds, so you may need to run it on a background thread (for example, via an AsyncTask).

KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
if (recognizer != null) {
   recognizer.addListener(MainActivity.instance);
//   recognizer.addTriggerPhraseListener(MainActivity.instance);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForGoodMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForAnyMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutMaxDuration, 10.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutForNoSpeech, 3.0f);
}
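The initialization above can be moved off the main thread. Since AsyncTask is deprecated on recent Android versions, a plain ExecutorService works as well; this is a sketch, assuming you post back to the main thread before touching any UI:

```java
ExecutorService executor = Executors.newSingleThreadExecutor();
Handler mainHandler = new Handler(Looper.getMainLooper());

executor.execute(() -> {
  // Slow: loads the acoustic model and related resources.
  KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
  mainHandler.post(() -> {
    // Back on the main thread; e.g. enable the "Start Listening" button.
  });
});
```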

6. Implement Listener Methods

The onPartialResult method will fire every 100-200ms and provide partial recognition results in real time. The partial result object does not contain word timing information, confidence statistics, or JSON representation.

The onFinalResponse method is called after the recognizer has stopped listening because one of the Voice Activity Detection rules triggered.

public void onPartialResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "   Partial result: " + result.getText());
  // you may do something else here, e.g. show the partial result to the user
  // or analyze it and set KASRVadTimeoutEndSilenceForGoodMatch and KASRVadTimeoutEndSilenceForAnyMatch to a lower value
}
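As one example of the comment above, you could shorten the end-silence timeouts as soon as a partial result already matches a complete phrase your app expects, so the recognizer finalizes sooner. The helper and the phrase list below are hypothetical; the setVADParameter call in the comment follows the pattern from step 5.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PartialResultCheck {
  // Hypothetical example phrases; substitute the phrases your app listens for.
  static final Set<String> EXPECTED_PHRASES =
      new HashSet<>(Arrays.asList("turn on the lights", "turn off the lights"));

  // True if the partial hypothesis already matches a complete expected phrase.
  static boolean isExpectedPhrase(String partialText) {
    return EXPECTED_PHRASES.contains(partialText.trim().toLowerCase());
  }

  // In onPartialResult you might then do:
  //   if (PartialResultCheck.isExpectedPhrase(result.getText())) {
  //     recognizer.setVADParameter(
  //         KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForGoodMatch, 0.5f);
  //   }
}
```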

public void onFinalResponse(KASRRecognizer recognizer, final KASRResponse response) {
  Log.i(TAG, "Final result: " + response.result);
  // you would probably do a bit more, e.g. re-enable "Start Listening" button, show text to the user
  // perform some activity in the app based on the recognized text

  // you can also save the audio and JSON that correspond to this response in the filesystem
  File dir = this.getApplication().getApplicationContext().getCacheDir();
  response.saveAudio(dir);
  response.saveJson(dir);
  // files with these names will be in the directory specified by dir
  Log.i(TAG, "audioFilepath:" + response.getAudioFilename());
  Log.i(TAG, "jsonFilepath:" + response.getJsonFilename());
}

If you use the trigger phrase functionality, be sure to implement the KASRRecognizerTriggerPhraseListener interface; its onTriggerPhrase method is called when the trigger phrase is recognized.

public void onTriggerPhrase(KASRRecognizer recognizer) {
  Log.i(TAG, "   Trigger phrase detected");
  // you may do something else here, e.g. use visual effect to indicate to the user that the trigger phrase has been recognized.
}

7. Create Decoding Graphs and Prepare for Listening

Here we assume that the getPhrases method returns an array of strings with phrases that the recognizer should listen to.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();

if (recognizer != null) {
   // getPhrases is a method that returns an array of phrases
   String[] phrases = MainActivity.getPhrases();
   
   String graphName = "words";
   if (KASRDecodingGraph.createDecodingGraphFromPhrases(phrases, recognizer, graphName)) {
     // prepare for listening with the created decoding graph
     if (! recognizer.prepareForListeningWithDecodingGraphWithName(graphName, false /* we are not computing GoP scores*/) ) {
       Log.e(TAG, "Unable to prepare for listening with graph " + graphName);
     }
   } else {
     Log.e(TAG, "Unable to create decoding graph " + graphName);
   }
} else {
   Log.e(TAG, "Unable to retrieve recognizer");
}

8. Start/Stop Listening

To start capturing audio from the device mic and decoding it, call the recognizer’s startListening method.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
recognizer.startListening();

You can make the app stop listening by explicitly calling the stopListening() method. However, we highly recommend you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.

While the recognizer is listening, it will periodically (every 100-200ms) call the listener’s onPartialResult() method IF there are partial recognition results AND they differ from the most recent partial result.

The Recognizer automatically stops listening when one of the Voice Activity Detection (VAD) rules is triggered. You can control the VAD configuration parameters through the setVADParameter() method. When the recognizer stops listening due to VAD triggering, it will call onFinalResponse() method.

See KASRVadParameter constants for more information on configuring VAD settings.

When using a decoding graph created with trigger phrase support, the logic is slightly different. After you call the startListening method, the recognizer will listen continuously. Then, when the trigger phrase is recognized, the onTriggerPhrase method is called, after which the recognizer switches back to the regular listening mode. In regular listening mode partial results are reported via onPartialResult and the recognizer stops listening after one of the VAD rules is triggered. In your onFinalResponse method you would act upon the command and then most likely call startListening again to continue listening (this will depend on the logic of your application).
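In a trigger-phrase setup, the end of onFinalResponse therefore often restarts the recognizer, along these lines (a sketch; handleCommand is a hypothetical app method, and whether you restart depends on your application logic):

```java
public void onFinalResponse(KASRRecognizer recognizer, final KASRResponse response) {
  handleCommand(response.result); // hypothetical method that acts on the recognized command
  recognizer.startListening();    // resume waiting for the trigger phrase
}
```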

As a Keen Research customer, you have access to Dashboard, a cloud-based tool for development support. Future releases of the Android SDK will provide ways to automatically upload recordings and ASR metadata to Dashboard for review and further analysis.