1. Review API Reference

Download the Javadoc for the SDK or review it online.

2. Import Relevant Packages

import com.keenresearch.keenasr.KASRDecodingGraph;
import com.keenresearch.keenasr.KASRRecognizer;
import com.keenresearch.keenasr.KASRResult;
import com.keenresearch.keenasr.KASRRecognizerListener;
import com.keenresearch.keenasr.KASRBundle;

3. Load Native Libraries

In the class that initializes the KeenASR SDK, add the following static block:

static {  
  System.loadLibrary("c++_shared");  
  System.loadLibrary("keenasr-jni");  
}  

See the example in the PoC app on GitHub.

4. Copy ASR Bundle from Assets Directory to Device Internal Storage

If the ASR Bundle was included in your app as an asset, it needs to be copied to the device's internal (or external) storage. The way Android provides read access to assets is not compatible with our native libraries, so the ASR Bundle must be copied to regular device storage before the SDK can be initialized.


KASRBundle asrBundle = new KASRBundle(this.context);
ArrayList<String> assets = new ArrayList<String>();

// you will need to make sure all individual assets are added to the ArrayList
assets.add("librispeechQT-nnet2-en-us/decode.conf");
assets.add("librispeechQT-nnet2-en-us/final.dubm");
assets.add("librispeechQT-nnet2-en-us/final.ie");
assets.add("librispeechQT-nnet2-en-us/final.mat");
assets.add("librispeechQT-nnet2-en-us/final.mdl");
assets.add("librispeechQT-nnet2-en-us/global_cmvn.stats");
assets.add("librispeechQT-nnet2-en-us/ivector_extractor.conf");
assets.add("librispeechQT-nnet2-en-us/mfcc.conf");
assets.add("librispeechQT-nnet2-en-us/online_cmvn.conf");
assets.add("librispeechQT-nnet2-en-us/splice.conf");
assets.add("librispeechQT-nnet2-en-us/splice_opts");
assets.add("librispeechQT-nnet2-en-us/wordBoundaries.int");
assets.add("librispeechQT-nnet2-en-us/words.txt");

assets.add("librispeechQT-nnet2-en-us/lang/lexicon.txt");
assets.add("librispeechQT-nnet2-en-us/lang/phones.txt");
assets.add("librispeechQT-nnet2-en-us/lang/tree");


String asrBundleRootPath = getApplicationInfo().dataDir;
String asrBundlePath = asrBundleRootPath + "/librispeechQT-nnet2-en-us";
try {
  asrBundle.installASRBundle(assets, asrBundleRootPath);
} catch (IOException e) {
  // pass the exception as the third argument so the stack trace is logged
  Log.e(TAG, "Error occurred when installing ASR bundle", e);
}
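
The asset list above repeats the bundle directory name on every line. As a minimal sketch, a small helper can build the same list from the bundle name and the relative file names. BundleAssets and forBundle are hypothetical names used for illustration; they are not part of the KeenASR SDK.

```java
import java.util.ArrayList;

// Hypothetical helper (not part of the KeenASR SDK): builds the asset path
// list for a bundle from the file names relative to the bundle directory.
final class BundleAssets {
    private BundleAssets() {}

    // Prefixes every relative path with "<bundleName>/".
    static ArrayList<String> forBundle(String bundleName, String... relativePaths) {
        ArrayList<String> assets = new ArrayList<>();
        for (String path : relativePaths) {
            assets.add(bundleName + "/" + path);
        }
        return assets;
    }
}
```

The result can be passed to installASRBundle in place of the hand-built list, for example BundleAssets.forBundle("librispeechQT-nnet2-en-us", "decode.conf", "final.mdl", "lang/tree"). Every file in the bundle still has to be listed explicitly.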

5. Make Sure App has Microphone Access

When an Android app captures audio from the microphone, it needs to ask the user for permission to do so. The proof-of-concept app on GitHub shows one way to do this. Note that it is a rudimentary, proof-of-concept approach; in a real app you would want to gracefully handle the case when the user does not grant access to the microphone.
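
One way to handle a denied request is to inspect the arrays that Android passes to the activity's onRequestPermissionsResult callback before touching the recognizer. The sketch below is plain Java so it can be read on its own; MicPermission and isGranted are hypothetical names, and the two constants mirror the values of Android's Manifest.permission.RECORD_AUDIO and PackageManager.PERMISSION_GRANTED.

```java
// Hypothetical helper: decides whether microphone access was granted, given
// the arrays delivered to Activity.onRequestPermissionsResult.
final class MicPermission {
    // Same values as Manifest.permission.RECORD_AUDIO and
    // PackageManager.PERMISSION_GRANTED on Android.
    static final String RECORD_AUDIO = "android.permission.RECORD_AUDIO";
    static final int PERMISSION_GRANTED = 0;

    private MicPermission() {}

    static boolean isGranted(String[] permissions, int[] grantResults) {
        // Empty arrays mean the request was cancelled; treat that as denied.
        int n = Math.min(permissions.length, grantResults.length);
        for (int i = 0; i < n; i++) {
            if (RECORD_AUDIO.equals(permissions[i])) {
                return grantResults[i] == PERMISSION_GRANTED;
            }
        }
        return false;
    }
}
```

If isGranted returns false, the app could keep the "Start Listening" control disabled and explain why the microphone is needed, rather than initializing the SDK.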

6. Initialize the SDK, Set Up Listener, and Set VAD Parameters and Other Settings

Initialize the SDK using the path to the ASR Bundle (on internal or external storage). Initialization takes a couple of seconds, so it is better to run it on a background thread (for example, with AsyncTask).

KASRRecognizer.initWithASRBundleAtPath(asrBundlePath, getApplicationContext());
KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
if (recognizer != null) {
   recognizer.addListener(MainActivity.instance);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForGoodMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutEndSilenceForAnyMatch, 1.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutMaxDuration, 10.0f);
   recognizer.setVADParameter(KASRRecognizer.KASRVadParameter.KASRVadTimeoutForNoSpeech, 3.0f);
   recognizer.setCreateAudioRecordings(true);
}
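
AsyncTask was deprecated in Android API level 30, so an alternative is to run the initialization on an executor thread. The following is a minimal sketch using only java.util.concurrent; BackgroundInit and runInBackground are hypothetical names, and the Callable stands in for the code that calls initWithASRBundleAtPath and configures the recognizer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs a slow initialization task off the calling thread.
final class BackgroundInit {
    private BackgroundInit() {}

    static <T> Future<T> runInBackground(Callable<T> initTask) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(initTask);
        } finally {
            // shutdown() lets the already submitted task finish, then the
            // worker thread exits; no new tasks are accepted.
            executor.shutdown();
        }
    }
}

// Usage (assumption: the SDK calls from this step go inside the lambda):
//   Future<Boolean> ready = BackgroundInit.runInBackground(() -> {
//       /* initWithASRBundleAtPath(...), addListener(...), setVADParameter(...) */
//       return true;
//   });
```

Avoid calling get() on the returned Future from the main thread, since that blocks until initialization completes; notify the UI from the task itself instead.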

7. Implement Listener Methods

The onPartialResult method fires every 100-200 ms and provides partial recognition results in real time. The partial result object does not contain word timings or confidences. onFinalResult is called after the recognizer has stopped listening because one of the Voice Activity Detection rules triggered.

public void onPartialResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "   Partial result: " + result);
  // you may do something else here, e.g. show the partial result to the user
  // or analyze it and set KASRVadTimeoutEndSilenceForGoodMatch and KASRVadTimeoutEndSilenceForAnyMatch to a lower value
}

public void onFinalResult(KASRRecognizer recognizer, final KASRResult result) {
  Log.i(TAG, "Final result: " + result);
  Log.i(TAG, "audioFile is in " + recognizer.getLastRecordingFilename());
  // you would probably do a bit more, e.g. re-enable "Start Listening" button, show text to the user
  // perform some activity in the app based on the recognized text
}
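
The comment in onPartialResult suggests lowering the end-silence timeouts once the partial result looks complete. A minimal sketch of that decision, as plain Java: EndSilencePolicy, timeoutFor, and the 0.5 second value are assumptions for illustration, and how the recognized text is read from KASRResult is left out because it depends on the SDK's API.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical policy: pick a shorter end-silence timeout when the partial
// text already matches one of the phrases the app is waiting for.
final class EndSilencePolicy {
    static final float DEFAULT_TIMEOUT_SEC = 1.0f; // value used when the SDK is initialized
    static final float SHORT_TIMEOUT_SEC = 0.5f;   // assumed value; tune for your app

    private EndSilencePolicy() {}

    // completePhrases is expected to hold lower-case phrases.
    static float timeoutFor(String partialText, Set<String> completePhrases) {
        String normalized = partialText.trim().toLowerCase(Locale.ROOT);
        return completePhrases.contains(normalized) ? SHORT_TIMEOUT_SEC : DEFAULT_TIMEOUT_SEC;
    }
}
```

The returned value could then be passed to setVADParameter for KASRVadTimeoutEndSilenceForGoodMatch and KASRVadTimeoutEndSilenceForAnyMatch.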

8. Create Decoding Graphs and Prepare for Listening

Here we assume that the getPhrases method returns an array of strings containing the phrases the recognizer should listen for.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
// getPhrases is a method that returns an array of phrases
String[] phrases = MainActivity.getPhrases();

if (recognizer != null) {
   String dgName = "words";
   KASRDecodingGraph.createDecodingGraphFromSentences(phrases, recognizer, dgName);
   recognizer.prepareForListeningWithCustomDecodingGraphWithName(dgName);
} else {
   Log.e(TAG, "Unable to retrieve recognizer");
}
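
If the phrases come from user data or a file, it may help to tidy them before building the decoding graph. This sketch trims whitespace and drops blank and duplicate entries; PhraseList and clean are hypothetical names, and whether the SDK needs this cleanup is an assumption, so treat it as a precaution rather than a requirement.

```java
import java.util.LinkedHashSet;

// Hypothetical helper: normalizes a phrase array before it is handed to
// createDecodingGraphFromSentences.
final class PhraseList {
    private PhraseList() {}

    static String[] clean(String[] rawPhrases) {
        // LinkedHashSet removes duplicates while keeping the original order.
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String phrase : rawPhrases) {
            if (phrase == null) {
                continue;
            }
            // Trim the ends and collapse runs of whitespace to one space.
            String trimmed = phrase.trim().replaceAll("\\s+", " ");
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return unique.toArray(new String[0]);
    }
}
```

For example, the array returned by getPhrases could be wrapped as PhraseList.clean(MainActivity.getPhrases()) before it is used.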

9. Start/Stop Listening

To start capturing audio from the device mic and decoding it, call the recognizer’s startListening method.

KASRRecognizer recognizer = KASRRecognizer.sharedInstance();
recognizer.startListening();

You can stop the recognizer explicitly by calling its stopListening() method, but we highly recommend relying on Voice Activity Detection and letting the recognizer stop automatically when one of the VAD rules is triggered.

While the recognizer is listening, it will periodically (every 100-200 ms) call the listener’s onPartialResult() method, but only if there are partial recognition results and they differ from the most recent partial result.

The recognizer will automatically stop listening when one of the Voice Activity Detection (VAD) rules triggers. You can control the VAD configuration parameters via the setVADParameter() method. When the recognizer stops listening because a VAD rule triggered, it will call the onFinalResult() method.

See the KASRVadParameter constants for more details on the different VAD settings.

You can direct the recognizer to record the audio captured from the microphone by calling setCreateAudioRecordings(true); use the recognizer's getLastRecordingFilename() method to determine the path to the recorded file.

The framework only creates the recording; it is your responsibility to do something with the audio file once it’s created. For example, you may want to play it back to the user (and eventually delete it), or send it to the backend (and delete it locally), so that you can assess how users are interacting with your app.
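
Since cleanup is left to the app, a minimal sketch of the "delete locally" step might look like the following. RecordingCleanup and deleteRecording are hypothetical names; the argument is assumed to be the path returned by getLastRecordingFilename().

```java
import java.io.File;

// Hypothetical helper: removes a recording once the app is done with it
// (for example, after playback or a successful upload).
final class RecordingCleanup {
    private RecordingCleanup() {}

    // Returns true only if a regular file existed at the path and was deleted.
    static boolean deleteRecording(String recordingPath) {
        if (recordingPath == null || recordingPath.isEmpty()) {
            return false;
        }
        File recording = new File(recordingPath);
        return recording.isFile() && recording.delete();
    }
}
```

Calling it only after the upload or playback has finished avoids removing a file that is still in use.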

Keen Research customers have access to Dashboard, a cloud-based tool for development support. Using the KIOSUploader class, you can set up a background upload thread that pushes audio recordings and JSON recognition results to Dashboard for further analysis.