After you have installed the KeenASR SDK for Web library, you can start using it in your web app by following these steps:
1. Import the Library
Import the keenasr-web.js file in each JavaScript module (for example, a script loaded with type="module") where you want to use the SDK.
import KeenASR from "path-to-sdk/keenasr-web.js";
2. Prepare the SDK
Prepare the SDK. This retrieves and instantiates the WebAssembly (wasm) binary and initializes the local file system.
await KeenASR.prepare(); // this method is async, so make sure to await it
3. Fetch ASR Bundle
We provide a helper method that fetches the ASR Bundle from the server and stores it locally.
await KeenASR.fetchASRBundle("path-to-archived-bundle.tgz");
To avoid downloading the ASR Bundle on every page load, use the helper method KeenASR.isASRBundleAvailable(bundleName); if this method returns true, you don’t need to fetch the bundle from the server.
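// Name of the ASR Bundle; assumed here to match the archive name without ".tgz"
const bundleName = "keenAK1mPron4-nnet3chain-en-us";
const isASRBundleAvailable = KeenASR.isASRBundleAvailable(bundleName);
if (!isASRBundleAvailable) {
  await KeenASR.fetchASRBundle("path-to-sdk/keenAK1mPron4-nnet3chain-en-us.tgz");
}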
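Because fetching the bundle is a network request, it can fail on a flaky connection. The following is a minimal sketch, not part of the SDK, that wraps the check and download above into one function with a simple retry. It assumes asr is the imported KeenASR object and that isASRBundleAvailable() and fetchASRBundle() behave as described above; ensureASRBundle, retries, and delayMs are hypothetical names chosen for illustration.

```javascript
// Sketch, assuming `asr` is the imported KeenASR object and that
// isASRBundleAvailable() and fetchASRBundle() behave as described above.
// ensureASRBundle and its retry options are hypothetical, not SDK features.
async function ensureASRBundle(asr, bundleName, bundleUrl, { retries = 2, delayMs = 1000 } = {}) {
  // `await` also works if isASRBundleAvailable() returns a plain boolean
  if (await asr.isASRBundleAvailable(bundleName)) {
    return false; // already stored locally, nothing to download
  }
  for (let attempt = 0; ; attempt++) {
    try {
      await asr.fetchASRBundle(bundleUrl);
      return true; // downloaded and stored locally
    } catch (err) {
      if (attempt >= retries) throw err; // give up after `retries` extra attempts
      // wait a little longer after each failure before trying again
      await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
}
```

For example, after await KeenASR.prepare() you could call await ensureASRBundle(KeenASR, bundleName, "path-to-sdk/keenAK1mPron4-nnet3chain-en-us.tgz"). The function returns true only when it actually downloaded the bundle.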
4. Initialize the SDK
Initialize the SDK using the name of the ASR Bundle; this process runs asynchronously and may take several seconds.
await KeenASR.initialize(bundleName);
Once the recognizer is initialized, you need to set up handlers (callback functions) for partial results, final results, and the audio recorder.
KeenASR.setResultHandlers(
handlePartialResult,
handleFinalResult,
handleAudioRecorder
);
Each of these is a function you define; the SDK calls it when the corresponding event occurs.
At this point you can also configure the recognizer further, for example by setting Voice Activity Detection (VAD) parameters.
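The initialization and handler registration above can be combined into one function. This is a sketch under stated assumptions: asr is the imported KeenASR object, initialize() and setResultHandlers() behave as described above, and the handlers object with its onPartialResult, onFinalResult, and onAudioRecorder properties is a hypothetical convention. It checks the handlers before starting the potentially slow initialization, so a missing or misspelled handler fails immediately. The values the SDK passes to each handler are not covered here.

```javascript
// Sketch, assuming `asr` is the imported KeenASR object and that initialize()
// and setResultHandlers() behave as described above. The `handlers` object and
// its property names are hypothetical; only the argument order (partial,
// final, audio recorder) comes from the example above.
async function initializeRecognizer(asr, bundleName, handlers) {
  const { onPartialResult, onFinalResult, onAudioRecorder } = handlers;
  // Fail fast on a missing or misspelled handler, before the slow initialize()
  for (const [name, fn] of Object.entries({ onPartialResult, onFinalResult, onAudioRecorder })) {
    if (typeof fn !== "function") {
      throw new TypeError(`${name} must be a function`);
    }
  }
  await asr.initialize(bundleName); // may take several seconds
  asr.setResultHandlers(onPartialResult, onFinalResult, onAudioRecorder);
}
```

For example: await initializeRecognizer(KeenASR, bundleName, { onPartialResult: handlePartialResult, onFinalResult: handleFinalResult, onAudioRecorder: handleAudioRecorder }).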
5. Create the Decoding Graph
The decoding graph combines the language model with all other recognition resources (acoustic models, lexicon) and provides the recognizer with a data structure that simplifies the decoding process. You can build the decoding graph dynamically from within your web app by providing a list of sentences or phrases that users are likely to say. In this case, the SDK first builds an n-gram language model from these phrases and then creates the decoding graph. This functionality is provided through the createDecodingGraphFromPhrases method:
const phrases = ["Once upon a time there was an old mother pig who had three little pigs and not enough food to feed them.", "This is just another phrase"];
await KeenASR.createDecodingGraphFromPhrases(phrases, [], KeenASR.SpeakingTask.SpeakingTaskOralReading, 'reading');
This example code creates a custom decoding graph named reading, optimizes it for the oral reading task (via KeenASR.SpeakingTask.SpeakingTaskOralReading), and saves it in the local file system. Later on, you can refer to this decoding graph by its name. You typically create a decoding graph only once and re-create it only when the data used to build it may have changed. You can use the decodingGraphWithNameExists method to check whether a graph with the given name already exists; this way you don’t need to create the decoding graph on every page load.
if (!KeenASR.decodingGraphWithNameExists('reading')) {
await KeenASR.createDecodingGraphFromPhrases(phrases, [], KeenASR.SpeakingTask.SpeakingTaskOralReading, 'reading');
}
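The text above recommends rebuilding a decoding graph only when the data used to build it may have changed. One way to detect that, sketched here as an assumption rather than an SDK feature, is to store a fingerprint of the phrase list under the graph name in a localStorage-like object and rebuild when the fingerprint differs. The sketch assumes createDecodingGraphFromPhrases() overwrites an existing graph with the same name; phrasesFingerprint, ensureDecodingGraph, and the storage key format are hypothetical. The hash is a simple non-cryptographic FNV-1a-style checksum, which is enough to notice edits to the phrase list but not to guard against deliberate collisions.

```javascript
// Sketch, assuming `asr` is the imported KeenASR object, that
// createDecodingGraphFromPhrases() replaces an existing graph with the same
// name, and that `storage` is localStorage-like (getItem/setItem).
// The storage key and fingerprint scheme are hypothetical.

// FNV-1a-style 32-bit hash over the UTF-16 code units of the JSON-encoded phrases
function phrasesFingerprint(phrases) {
  const text = JSON.stringify(phrases);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash.toString(16);
}

// Builds the graph only if it is missing or was built from different phrases.
// Returns true when the graph was (re)built.
async function ensureDecodingGraph(asr, graphName, phrases, storage) {
  const key = `decodingGraphFingerprint:${graphName}`;
  const fingerprint = phrasesFingerprint(phrases);
  // `await` also works if decodingGraphWithNameExists() returns a plain boolean
  const exists = await asr.decodingGraphWithNameExists(graphName);
  if (exists && storage.getItem(key) === fingerprint) {
    return false; // up to date, reuse the stored graph
  }
  await asr.createDecodingGraphFromPhrases(
    phrases, [], asr.SpeakingTask.SpeakingTaskOralReading, graphName);
  storage.setItem(key, fingerprint);
  return true;
}
```

In a page you might call await ensureDecodingGraph(KeenASR, 'reading', phrases, window.localStorage). A graph that already exists but has no stored fingerprint is rebuilt once, after which the fingerprint is recorded.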
6. Prepare to Listen
Before you start listening, you need to tell the SDK which decoding graph to use by calling the prepareForListeningWithCustomDecodingGraphWithName(graphName) method. For a given decoding graph, you need to do this only once before you start listening.
if (!KeenASR.prepareForListeningWithCustomDecodingGraphWithName('reading')) {
  throw new Error("SDK is not prepared for listening");
}
7. Start/Stop Listening
To start capturing audio from the device microphone and decoding it, call the SDK’s startListening() method. Although you can stop listening explicitly by calling the stopListening() method, we highly recommend that you rely on Voice Activity Detection and let the recognizer stop automatically when one of the VAD rules is triggered.
While the recognizer is listening, it periodically (every 100-200 ms) calls your partial result handler (handlePartialResult in the example above) if there are partial recognition results and they differ from the most recent partial result.
The recognizer automatically stops listening when one of the Voice Activity Detection (VAD) rules is triggered. You can control VAD configuration parameters through the setVADParameter() method. When the recognizer stops listening because a VAD rule was triggered, it calls your final result handler (handleFinalResult in the example above).
Refer to the KeenASR.VADParameter constants for information on the different VAD settings.
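Because the recognizer reports the end of a VAD-terminated session through the final result handler, it can be convenient to turn one listening session into a Promise. The sketch below assumes asr is the imported KeenASR object and that the final result handler is called once per session; createListeningSession and listenOnce are hypothetical names, and the arguments passed to the final result handler are forwarded unchanged because their shape is not covered here. If the handler is never called for a session, the returned promise stays pending.

```javascript
// Sketch, assuming `asr` is the imported KeenASR object and that the final
// result handler is called once per listening session. createListeningSession
// and listenOnce are hypothetical; handler arguments are forwarded as-is.
function createListeningSession(asr) {
  let resolvePending = null;
  return {
    // Register this as the final result handler (second argument of
    // setResultHandlers). It uses no `this`, so it can be passed unbound.
    handleFinalResult(...args) {
      if (resolvePending) {
        const resolve = resolvePending;
        resolvePending = null;
        resolve(args);
      }
    },
    // Starts listening; resolves with the final result handler's arguments
    listenOnce() {
      if (resolvePending) {
        return Promise.reject(new Error("Already listening"));
      }
      return new Promise((resolve, reject) => {
        resolvePending = resolve;
        const fail = (err) => {
          if (resolvePending === resolve) resolvePending = null;
          reject(err);
        };
        try {
          // startListening() may or may not return a promise; handle both
          Promise.resolve(asr.startListening()).catch(fail);
        } catch (err) {
          fail(err); // synchronous failure
        }
      });
    },
  };
}
```

To use it, pass session.handleFinalResult as the second argument of KeenASR.setResultHandlers(), then await session.listenOnce() from, for example, a button click handler. Rejecting a second listenOnce() call while a session is active keeps two sessions from competing for the same final result.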
8. Switching Decoding Graphs
If your app needs to support multiple decoding graphs, you can build them dynamically. At any time while the recognizer is not listening, you can call one of the prepareForListening methods to load a different decoding graph.
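If the app switches between several decoding graphs, a small wrapper can track which graph is currently prepared, skip redundant preparation, and refuse to switch while the recognizer is listening. This is a sketch: DecodingGraphSelector and its listening flag are hypothetical, and the app is assumed to set listening itself (true right after startListening(), false in the final result handler), since this section does not describe an SDK property for querying the listening state. It also assumes prepareForListeningWithCustomDecodingGraphWithName() returns false on failure, as in the example above.

```javascript
// Sketch, assuming `asr` is the imported KeenASR object and that
// prepareForListeningWithCustomDecodingGraphWithName() returns false on
// failure. DecodingGraphSelector and `listening` are hypothetical; the app
// updates `listening` around startListening() and the final result handler.
class DecodingGraphSelector {
  constructor(asr) {
    this.asr = asr;
    this.currentGraph = null; // name of the graph most recently prepared
    this.listening = false;
  }

  // Prepares `graphName` for listening unless it is already the active graph.
  use(graphName) {
    if (this.listening) {
      throw new Error("Cannot switch decoding graphs while listening");
    }
    if (this.currentGraph === graphName) {
      return; // already prepared; no need to prepare it again
    }
    if (!this.asr.prepareForListeningWithCustomDecodingGraphWithName(graphName)) {
      throw new Error(`Could not prepare decoding graph "${graphName}"`);
    }
    this.currentGraph = graphName;
  }
}
```

Tracking only the current graph keeps the logic simple: switching from reading to another graph and back prepares reading again, which matches the rule that a graph must be prepared before listening with it.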
9. Other
When a web page that uses the KeenASR SDK for Web loads, the browser asks the user to allow the page to access the microphone. Because the browser’s explanation may be terse and confusing, you might want to prime the user by explaining in simple language why microphone access is required before you initialize the SDK.
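To decide whether to show your own explanation before the browser’s microphone prompt, you can check the current permission state with the standard Permissions API (navigator.permissions.query), which is a browser API and not part of the KeenASR SDK. Support for the "microphone" permission name varies between browsers, so this sketch treats any failure or missing API as "prompt". The helper names and the returned "explain" and "blocked" codes are hypothetical.

```javascript
// Sketch using the standard Permissions API (not part of the KeenASR SDK).
// Returns "granted", "denied", or "prompt"; falls back to "prompt" when the
// API or the "microphone" permission name is not supported.
async function microphonePermissionState(permissions = globalThis.navigator?.permissions) {
  if (!permissions || typeof permissions.query !== "function") {
    return "prompt";
  }
  try {
    const status = await permissions.query({ name: "microphone" });
    return status.state;
  } catch {
    return "prompt"; // e.g. the browser does not recognize the "microphone" name
  }
}

// Hypothetical helper: decide which explanation, if any, to show before
// initializing the SDK.
async function microphoneExplanationNeeded(permissions) {
  const state = await microphonePermissionState(permissions);
  if (state === "granted") return null; // access already allowed
  if (state === "denied") return "blocked"; // ask the user to re-enable it in browser settings
  return "explain"; // the browser will ask; explain why first
}
```

In the page you would typically call microphoneExplanationNeeded() with no argument, so it uses navigator.permissions, and show your message before calling KeenASR.initialize().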
For information on how to specify what the recognizer is listening for (decoding graph), refer to Decoding Graphs and Acoustic Models.
You can also review the oral reading demo and view its source code.