1. Review API Reference

Download or review the online Javadoc for the SDK.

2. Import Relevant Packages

import com.keenresearch.keenasr.KASRDecodingGraph;
import com.keenresearch.keenasr.KASRRecognizer;
import com.keenresearch.keenasr.KASRResult;
import com.keenresearch.keenasr.KASRRecognizerListener;
// if trigger phrase functionality is used
//import com.keenresearch.keenasr.KASRRecognizerTriggerPhraseListener;
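import com.keenresearch.keenasr.KASRBundle;
// standard Android/Java classes used in the snippets below
import android.util.Log;
import java.io.IOException;
import java.util.ArrayList;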

3. Copy ASR Bundle from Assets Directory to Device Internal Storage

If the ASR Bundle was included in your app as assets, it needs to be copied to the device's internal (or external) storage. The way Android provides read access to assets is not compatible with our native libraries, so the ASR Bundle must be copied to regular device storage before the SDK can be initialized.


KASRBundle asrBundle = new KASRBundle(getApplicationContext());
ArrayList<String> assets = new ArrayList<>();

// you will need to make sure all individual assets are added to the ArrayList
assets.add("keenB2mQT-nnet3chain-en-us/decode.conf");
assets.add("keenB2mQT-nnet3chain-en-us/final.dubm");
assets.add("keenB2mQT-nnet3chain-en-us/final.ie");
assets.add("keenB2mQT-nnet3chain-en-us/final.mat");
assets.add("keenB2mQT-nnet3chain-en-us/final.mdl");
assets.add("keenB2mQT-nnet3chain-en-us/global_cmvn.stats");
assets.add("keenB2mQT-nnet3chain-en-us/ivector_extractor.conf");
assets.add("keenB2mQT-nnet3chain-en-us/mfcc.conf");
assets.add("keenB2mQT-nnet3chain-en-us/online_cmvn.conf");
assets.add("keenB2mQT-nnet3chain-en-us/splice.conf");
assets.add("keenB2mQT-nnet3chain-en-us/splice_opts");
assets.add("keenB2mQT-nnet3chain-en-us/wordBoundaries.int");
assets.add("keenB2mQT-nnet3chain-en-us/words.txt");

assets.add("keenB2mQT-nnet3chain-en-us/lang/lexicon.txt");
assets.add("keenB2mQT-nnet3chain-en-us/lang/phones.txt");
assets.add("keenB2mQT-nnet3chain-en-us/lang/tree");


// install the bundle under the app's private data directory
String asrBundleRootPath = getApplicationInfo().dataDir;
String asrBundlePath = asrBundleRootPath + "/keenB2mQT-nnet3chain-en-us";
try {
  asrBundle.installASRBundle(assets, asrBundleRootPath);
} catch (IOException e) {
  Log.e(TAG, "Error occurred when installing ASR bundle", e);
}
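Listing every file by hand is error-prone when you switch to a different ASR Bundle. A minimal sketch, assuming you keep the relative file names in one list: the hypothetical helper AssetListBuilder.build() below prefixes each name with the bundle directory and returns an ArrayList<String> of the shape passed to installASRBundle() above. It is plain Java and does not call the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the SDK): expands relative file names into
// asset paths of the form "<bundleDir>/<file>", as passed to installASRBundle().
class AssetListBuilder {
    static ArrayList<String> build(String bundleDir, List<String> relativeFiles) {
        ArrayList<String> assets = new ArrayList<>();
        for (String file : relativeFiles) {
            // asset paths are relative to the app's assets folder and always use '/'
            assets.add(bundleDir + "/" + file);
        }
        return assets;
    }

    public static void main(String[] args) {
        // the same 16 files listed by hand in the snippet above
        List<String> files = Arrays.asList(
                "decode.conf", "final.dubm", "final.ie", "final.mat", "final.mdl",
                "global_cmvn.stats", "ivector_extractor.conf", "mfcc.conf",
                "online_cmvn.conf", "splice.conf", "splice_opts",
                "wordBoundaries.int", "words.txt",
                "lang/lexicon.txt", "lang/phones.txt", "lang/tree");
        ArrayList<String> assets = build("keenB2mQT-nnet3chain-en-us", files);
        // prints: 16 assets, first: keenB2mQT-nnet3chain-en-us/decode.conf
        System.out.println(assets.size() + " assets, first: " + assets.get(0));
    }
}
```

Android's AssetManager.list() can also enumerate an assets folder at runtime; if you go that route, note that it returns subfolder names (such as lang) as well as file names, so you would need to recurse into them.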

4. Make Sure App has Microphone Access

When an Android app captures audio from the microphone, it needs to ask the user for permission to do so. See the example code in the proof-of-concept app on GitHub for one way to do this. Note that this is a rudimentary, proof-of-concept approach; in a real app you would want to gracefully handle the case when the user does not grant access to the microphone.

5. Initialize the SDK, Setup Listener, and Set VAD Parameters and Other Settings

Initialize the SDK using the path to the ASR Bundle (on internal/external storage). This takes a couple of seconds, so it’s better to do it on a background thread (for example, in an AsyncTask).

KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
if (recognizer != null) {
   recognizer.addListener(MainActivity.instance);
   // uncomment if trigger phrase functionality is used
//   recognizer.addTriggerPhraseListener(MainActivity.instance);
   // VAD timeouts, in seconds
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForGoodMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForAnyMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutMaxDuration, 10.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutForNoSpeech, 3.0f);
   // save captured audio to a file; see getLastRecordingFilename()
   recognizer.setCreateAudioRecordings(true);
} else {
   Log.e(TAG, "Unable to initialize recognizer");
}
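Because initialization takes a couple of seconds, it should not run on the UI thread. AsyncTask, mentioned above, is deprecated as of Android API level 30, so a sketch using a plain java.util.concurrent executor may be useful. The BackgroundInit class below is hypothetical: the initializer Runnable stands in for the KASRRecognizer.initWithASRBundleAtPath() call plus the listener and VAD setup, and onDone receives null on success or the exception that was thrown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: runs slow initialization on a single background thread.
class BackgroundInit {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // initializer: stands in for initWithASRBundleAtPath() plus listener/VAD setup
    // onDone: receives null on success, or the RuntimeException the initializer threw
    Future<?> initAsync(Runnable initializer, Consumer<Throwable> onDone) {
        return executor.submit(() -> {
            Throwable error = null;
            try {
                initializer.run();
            } catch (RuntimeException e) {
                error = e;
            }
            // onDone runs on the background thread; on Android, switch to the
            // main thread (e.g. runOnUiThread) before touching any views
            onDone.accept(error);
        });
    }

    // call when the executor is no longer needed so its thread can exit
    void shutdown() {
        executor.shutdown();
    }
}
```

In an Activity you would typically call initAsync() from onCreate() and, in onDone, hop back to the main thread with runOnUiThread() before enabling the "Start Listening" button.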

6. Implement Listener Methods

The onPartialResult() method fires every 100-200 ms and provides partial recognition results in real time. The partial result object does not contain word timings, confidences, or a JSON representation. onFinalResult() is called after the recognizer has stopped listening because one of the Voice Activity Detection (VAD) rules triggered.

public void onPartialResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "   Partial result: " + result.getText());
  // you may do something else here, e.g. show the partial result to the user
  // or analyze it and set KASRVadTimeoutEndSilenceForGoodMatch and KASRVadTimeoutEndSilenceForAnyMatch to a lower value
}

public void onFinalResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "Final result: " + result);
  Log.i(TAG, "Audio file is in " + recognizer.getLastRecordingFilename());
  // you would probably do a bit more here, e.g. re-enable the "Start Listening" button,
  // show the text to the user, or perform some activity in the app based on the recognized text
}

If trigger phrase functionality is used, you will need to implement the KASRRecognizerTriggerPhraseListener interface, i.e. its onTriggerPhrase() method, which is called when the trigger phrase has been recognized.

public void onTriggerPhrase(KASRRecognizer recognizer) {
  Log.i(TAG, "   Trigger phrase detected");
  // you may do something else here, e.g. use a visual effect to indicate to the user that the trigger phrase has been recognized
}
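The comment in onPartialResult suggests analyzing partial results, for example to react sooner once the user has already said a complete command. A minimal sketch of that check, assuming you still have the phrase list used to build the decoding graph in step 7: the hypothetical PhraseMatcher class compares the partial text (what result.getText() returns) against those phrases after normalizing case and whitespace. How you act on a match, such as lowering the end-silence VAD timeouts, is up to your app.

```java
import java.util.Locale;

// Hypothetical helper (not part of the SDK): tells whether a partial result
// already equals one of the phrases the decoding graph was built from.
class PhraseMatcher {
    private final String[] normalizedPhrases;

    PhraseMatcher(String[] phrases) {
        normalizedPhrases = new String[phrases.length];
        for (int i = 0; i < phrases.length; i++) {
            normalizedPhrases[i] = normalize(phrases[i]);
        }
    }

    // lower-case, trim, and collapse runs of whitespace to a single space
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.US);
    }

    boolean isCompletePhrase(String partialText) {
        String candidate = normalize(partialText);
        for (String phrase : normalizedPhrases) {
            if (phrase.equals(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside onPartialResult() you could then call matcher.isCompletePhrase(result.getText()), where matcher is built once from the same array passed to createDecodingGraphFromSentences().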

7. Create Decoding Graphs and Prepare for Listening

Here we assume that the getPhrases() method returns an array of strings containing the phrases the recognizer should listen for.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
// getPhrases is a method that returns an array of phrases
String[] phrases = MainActivity.getPhrases();

if (recognizer != null) {
   String dgName = "words";
   KASRDecodingGraph.createDecodingGraphFromSentences(phrases, recognizer, dgName);
   // alternatively, if you are using trigger phrase functionality, create the graph with a trigger phrase:
//   KASRDecodingGraph.createDecodingGraphFromSentencesWithTriggerPhrase(phrases, "hey computer", recognizer, dgName);
   recognizer.prepareForListeningWithCustomDecodingGraphWithName(dgName);
} else {
   Log.e(TAG, "Unable to retrieve recognizer");
}
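The example assumes a getPhrases() method without showing one. As a sketch, a hypothetical implementation for a small command-and-control app might expand a few verbs and objects into every combination, which keeps the list easy to maintain as commands are added. The verbs, objects, and the CommandPhrases class name are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MainActivity.getPhrases(): builds
// "<verb> the <object>" commands plus a standalone one.
class CommandPhrases {
    static String[] getPhrases() {
        String[] verbs = {"turn on", "turn off"};
        String[] objects = {"light", "fan", "radio"};
        List<String> phrases = new ArrayList<>();
        for (String verb : verbs) {
            for (String object : objects) {
                phrases.add(verb + " the " + object);
            }
        }
        phrases.add("stop"); // standalone commands can be mixed in
        return phrases.toArray(new String[0]);
    }
}
```

With two verbs and three objects this yields six combined commands plus "stop", seven phrases in total.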

8. Start/Stop Listening

To start capturing audio from the device microphone and decoding it, call the recognizer’s startListening() method.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
recognizer.startListening();

Although you can stop the recognizer explicitly by calling its stopListening() method, we highly recommend relying on Voice Activity Detection and letting the recognizer stop automatically when one of the VAD rules is triggered.

While the recognizer is listening, it will periodically (every 100-200 ms) call the listener’s onPartialResult() method IF there are partial recognition results AND they are different from the most recent partial result.

The recognizer will automatically stop listening when one of the Voice Activity Detection (VAD) rules triggers. You can control the VAD configuration via the setVADParameter() method. When the recognizer stops listening because a VAD rule triggered, it calls the listener’s onFinalResult() method.

See the KASRVadParameter constants in the API reference for details on the individual VAD settings.
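To build intuition for how the four VAD parameters from step 5 interact, here is a simplified, hypothetical model of the stopping decision. It is an illustration under stated assumptions, not the SDK's implementation: in particular, whether the current hypothesis is a "good match" is decided inside the recognizer, so the model simply takes it as an input. The field defaults mirror the values set in step 5.

```java
// Simplified, hypothetical model of when listening stops; NOT the SDK's
// implementation. Field defaults mirror the values set in step 5 (seconds).
class VadModel {
    enum StopReason { NONE, NO_SPEECH, MAX_DURATION, END_SILENCE_GOOD_MATCH, END_SILENCE_ANY_MATCH }

    double endSilenceGoodMatch = 1.0; // KASRVadTimeoutEndSilenceForGoodMatch
    double endSilenceAnyMatch = 1.0;  // KASRVadTimeoutEndSilenceForAnyMatch
    double maxDuration = 10.0;        // KASRVadTimeoutMaxDuration
    double noSpeechTimeout = 3.0;     // KASRVadTimeoutForNoSpeech

    // elapsed: seconds since listening started
    // speechDetected: whether any speech has been heard so far
    // trailingSilence: seconds of silence since speech was last heard
    // goodMatch: whether the current hypothesis is a good match (assumed input)
    StopReason check(double elapsed, boolean speechDetected, double trailingSilence, boolean goodMatch) {
        if (elapsed >= maxDuration) {
            return StopReason.MAX_DURATION;
        }
        if (!speechDetected) {
            return elapsed >= noSpeechTimeout ? StopReason.NO_SPEECH : StopReason.NONE;
        }
        if (goodMatch && trailingSilence >= endSilenceGoodMatch) {
            return StopReason.END_SILENCE_GOOD_MATCH;
        }
        if (trailingSilence >= endSilenceAnyMatch) {
            return StopReason.END_SILENCE_ANY_MATCH;
        }
        return StopReason.NONE;
    }
}
```

Under this model, setting the good-match end silence lower than the any-match end silence would let complete commands end quickly while giving incomplete utterances more time to finish.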

When using a decoding graph created with trigger phrase support, the logic is slightly different. After you call the startListening() method, the recognizer listens continuously until it recognizes the trigger phrase, at which point it calls the onTriggerPhrase() method. The recognizer then switches back to the regular listening mode: partial results are reported via onPartialResult(), and the recognizer stops listening after one of the VAD rules triggers. In your onFinalResult() method you would act upon the command and then, most likely, call startListening() again to continue listening (this depends on the logic of your application).
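The trigger-phrase flow described above can be summarized as a small state machine. The TriggerFlow class below is a hypothetical sketch that does not talk to the SDK: its onTriggerPhrase() and onFinalResult() methods mirror the listener callbacks, the startListening Runnable stands in for calling recognizer.startListening(), and keepListening captures the application-specific decision of whether to listen again after handling a command.

```java
import java.util.function.Consumer;

// Hypothetical sketch of the trigger-phrase flow; it does not call the SDK.
class TriggerFlow {
    enum State { IDLE, WAITING_FOR_TRIGGER, LISTENING_FOR_COMMAND }

    private final Runnable startListening;        // stands in for recognizer.startListening()
    private final Consumer<String> handleCommand; // acts on the recognized command text
    private final boolean keepListening;          // app-specific: listen again after a command?
    private State state = State.IDLE;

    TriggerFlow(Runnable startListening, Consumer<String> handleCommand, boolean keepListening) {
        this.startListening = startListening;
        this.handleCommand = handleCommand;
        this.keepListening = keepListening;
    }

    State getState() {
        return state;
    }

    void start() {
        state = State.WAITING_FOR_TRIGGER;
        startListening.run();
    }

    // mirrors onTriggerPhrase(): the trigger phrase was heard, now expect a command
    void onTriggerPhrase() {
        if (state == State.WAITING_FOR_TRIGGER) {
            state = State.LISTENING_FOR_COMMAND;
        }
    }

    // mirrors onFinalResult(): a VAD rule ended the command; act on it, then maybe restart
    void onFinalResult(String text) {
        if (state != State.LISTENING_FOR_COMMAND) {
            return; // ignore results that did not follow a trigger phrase
        }
        handleCommand.accept(text);
        if (keepListening) {
            start();
        } else {
            state = State.IDLE;
        }
    }
}
```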

You can direct the recognizer to create recordings of the audio captured from the microphone by calling setCreateAudioRecordings(true); use the recognizer’s getLastRecordingFilename() method to determine the path to the recorded file.

The SDK only creates the recording; it is your responsibility to do something with the audio file once it’s created. For example, you may want to play it back to the user (and eventually delete it), or send it to your backend (and delete the local copy) so that you can assess how users are interacting with your app.
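As one example of that housekeeping, the hypothetical helper below keeps only the newest maxFiles files in a directory and deletes the rest. It assumes you pass the directory containing the recordings (for example, the parent directory of the path returned by getLastRecordingFilename()) and that every file in it is a recording you have already played back or uploaded.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper (not part of the SDK): keeps the newest maxFiles files in
// dir and deletes the older ones. Returns how many files were deleted.
class RecordingPruner {
    static int pruneOldest(File dir, int maxFiles) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null || files.length <= maxFiles) {
            return 0; // not a directory, I/O error, or nothing to prune
        }
        // oldest first, by last-modified time
        Arrays.sort(files, (a, b) -> Long.compare(a.lastModified(), b.lastModified()));
        int deleted = 0;
        for (int i = 0; i < files.length - maxFiles; i++) {
            if (files[i].delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```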

Keen Research customers have access to Dashboard, a cloud-based tool for development support. A future release of the Android SDK will provide ways to automatically upload recordings and ASR metadata to Dashboard.