Initialize the SDK with a bundle URL and options (e.g. whether to perform echo cancellation).
initialization parameters.
The full URL of the ASR bundle (e.g., https://.../bundle-name.tgz). The bundle needs to be a gzipped tar file, and your web server needs to be configured to serve these files with Content-Encoding set to gzip. For more details, see the SDK Installation docs.
Optional
doEchoCancellation?: boolean
Whether to enable echo cancellation for audio input. Defaults to false.
Optional
onASRBundleReady?: () => void
Callback invoked when the ASR bundle is ready (downloaded and stored in the local filesystem). Can be used for reporting initialization progress in the UI.
Optional
onCoreReady?: () => void
Callback invoked when the core module is ready. Can be used, for example, to initialize KeenASR logging or report initialization progress in the UI.
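A minimal sketch of initialization with these options; the initialize entry point and import shape are assumptions, while the option names follow the parameters documented above:

```ts
// Hypothetical sketch: assumes the SDK exposes an initialize() entry point
// that accepts the parameters documented above.
import { KeenASR } from "keenasr-web";

await KeenASR.initialize({
  bundleUrl: "https://example.com/asr-bundles/bundle-name.tgz", // placeholder URL
  doEchoCancellation: true,
  onCoreReady: () => console.log("core module ready"),
  onASRBundleReady: () => console.log("ASR bundle downloaded and stored"),
});
```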
Check if the browser can run the keenasr-web library. Currently, only Chrome 113 (or higher) and Safari 18 (or higher) are supported. More specifically, this function checks whether cross-origin isolation is enabled and whether WebAssembly is supported.
Cross-origin isolation needs to be enabled for the use of SharedArrayBuffer by the Web SDK. This JavaScript object allows shared memory between web workers and the main thread. Although web workers run in the same process as the main thread, they have separate execution contexts and memory spaces by default. For this reason, SharedArrayBuffer provides a way for different execution contexts to access the same memory buffer, enabling efficient and fast data exchange without copying.
For SharedArrayBuffer browser compatibility, visit this page.
For WebAssembly browser compatibility, visit this page.
Cross-origin isolation is enabled via the following headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
true if the browser can run the keenasr-web library, false otherwise
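For illustration, one way to send these headers from a Node/Express server (a sketch; Express is an assumption about your stack, not part of the SDK):

```ts
import express from "express";

const app = express();

// Send the two headers that enable cross-origin isolation,
// which SharedArrayBuffer requires.
app.use((_req, res, next) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
});

app.use(express.static("public")); // serve your app files
app.listen(8080);
```

You can verify the result at runtime: in a cross-origin isolated page, the global crossOriginIsolated is true.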
Get or set the audio input ID that will be matched against MediaDeviceInfo.deviceId to find the audio input device for recording.
If the device with the specified ID is not found, recording will fall back to the system default audio input device.
Filters the result of MediaDevices.enumerateDevices where kind == 'audioinput', requesting microphone access if needed.
Optional
getAudioInputDeviceId: (audioInputDeviceId: string) => void
Callback that returns the audio input device ID.
Status of whether the browser microphone permissions have been granted. Note: the navigator.permissions API will not work for 'microphone' in Firefox.
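A sketch of picking an input device with the standard MediaDevices API; the KeenASR.audioInputDeviceId property name is an assumption based on the getter/setter described above:

```ts
// Ask for microphone access once so device labels and IDs are populated.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach((t) => t.stop()); // release the mic immediately

// List available audio inputs, mirroring the SDK's device
// enumeration (kind === 'audioinput').
const devices = await navigator.mediaDevices.enumerateDevices();
const inputs = devices.filter((d) => d.kind === "audioinput");

// Pick a device; if this ID is not found at recording time, the SDK
// falls back to the system default input.
if (inputs.length > 0) {
  KeenASR.audioInputDeviceId = inputs[0].deviceId; // assumed property name
}
```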
Optional
onFinalResponse
Callback invoked when the recognizer stops listening due to one of the VADParameter thresholds being met. The callback will provide an instance of ASRResponse, which contains an ASRResult and various other data related to the most recent interaction, including access to the raw audio via ASRResponse.getAudio.
Optional
onPartialResult
Callback invoked each time a partial result is available. This callback will execute periodically while the recognizer is listening and provide a partial result (text only, without more detailed information such as timing or phonemes).
LogLevel defines logging levels used by the SDK.
Meant to be used with the KeenASR.setLogLevel method.
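For example (the import shape and the Info member name are assumptions; use whichever members LogLevel actually defines):

```ts
import { KeenASR, LogLevel } from "keenasr-web"; // import shape assumed

// Raise SDK logging verbosity, e.g. inside the onCoreReady callback.
KeenASR.setLogLevel(LogLevel.Info); // member name assumed
```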
VAD (voice activity detection) gating introduces a lightweight voice activity detection stage before speech recognition. If VAD gating is turned on, then after you call the startListening method recognition will start only once the VAD gating module detects voice. You would typically use this only in always-on listening scenarios where you want to preserve battery life.
true if VAD gating should be turned on, false otherwise.
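A sketch of turning the feature on; the setVADGating method name is an assumption based on the boolean parameter described above:

```ts
// Hypothetical call: enable the VAD gating stage for always-on listening.
KeenASR.setVADGating(true); // method name assumed
```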
Set any of the VADParameter Voice Activity Detection parameters. These parameters can be set at any time and go into effect immediately.
a combination of Voice Activity Detection parameters and their corresponding values.
Optional
Readonly
timeoutEndSilenceForAnyMatch?: number
Timeout after this many seconds if we had any match (even if the final state has not been reached). Default is 2 seconds.
Optional
Readonly
timeoutEndSilenceForGoodMatch?: number
Timeout after this many seconds if we had a good (high-probability) match to the final state. Default is 1 second.
Optional
Readonly
timeoutForNoSpeech?: number
Timeout after this many seconds even if nothing has been recognized. Default is 10 seconds.
Optional
Readonly
timeoutMaxDuration?: number
Timeout after this many seconds regardless of what has been recognized. This is effectively an upper bound on the duration of recognition. Default value is 20 seconds.
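A sketch of tightening these thresholds; the setVADParameters method name is an assumption, while the parameter names are as documented above:

```ts
// Hypothetical call: stop listening sooner after a good match,
// and cap a single interaction at 15 seconds.
KeenASR.setVADParameters({
  timeoutEndSilenceForGoodMatch: 0.8,
  timeoutEndSilenceForAnyMatch: 1.5,
  timeoutForNoSpeech: 5,
  timeoutMaxDuration: 15,
});
```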
SpeakingTask defines the type of speaking task that will be handled. It is primarily used to indicate to the methods that create decoding graphs what type of task they need to handle, so that appropriate customization can be done when creating the language model and decoding graph.
Create a contextual decoding graph from an array of contexts, for a specific task, using a provided array of alternative pronunciations, and save it in the filesystem for later use. Contextual decoding graphs can be referenced by their name by various methods in the SDK.
the name of the contextual decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of contexts (where each context is an array of phrases). You can switch between contexts via KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId, where contextIds use 0-based indexing. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of the SpeakingTask values, specifying the type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
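A sketch of creating a contextual graph, following the argument order shown later in these docs for KeenASR.createContextualDecodingGraphFromPhrases; the SpeakingTask member name is an assumption:

```ts
// Two contexts; contextIds are 0-based (0 = colors, 1 = counts).
const contexts: string[][] = [
  ["red apple", "green apple"],
  ["one apple", "two apples"],
];

// Signature as referenced elsewhere in these docs:
// (phrases, altProns, speakingTask, name)
await KeenASR.createContextualDecodingGraphFromPhrases(
  contexts,
  [],                       // no alternative pronunciations
  SpeakingTask.OralReading, // assumed enum member name
  "my-graph",
);
```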
Create a decoding graph from an array of phrases, for a specific task, using a provided array of word mispronunciations, and save it in the filesystem for later use.
the name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of SpeakingTask specifying a type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
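A comparable sketch for the phrase-based variant; the createDecodingGraphFromPhrases name and argument order are assumptions modeled on the contextual method:

```ts
// Hypothetical call: a flat list of phrases instead of contexts.
const phrases = ["turn the lights on", "turn the lights off"];

await KeenASR.createDecodingGraphFromPhrases(
  phrases,
  [],                       // altProns
  SpeakingTask.OralReading, // assumed enum member name
  "lights-graph",
);
```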
Create a decoding graph from an array of phrases, using the specified triggerPhrase, and save it in the filesystem for later use. Decoding graphs can be referenced by their name by various methods in the SDK. When using decoding graphs created with trigger phrase support, upon calling the startListening method the SDK will listen continuously until it hears the trigger phrase; only then will partial results start occurring.
the name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
a String representing the trigger phrase used to initiate recognition when using this decoding graph, for example "Hey computer". When using a decoding graph with a trigger phrase, the recognizer will listen continuously until it hears the trigger phrase. No partial result callbacks will be provided until the trigger phrase is recognized.
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of SpeakingTask specifying a type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
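A sketch for the trigger-phrase variant; the method name is an assumption, and the trigger phrase comes from the example above:

```ts
// Hypothetical call: recognition stays gated until "Hey computer" is heard.
await KeenASR.createDecodingGraphFromPhrasesWithTriggerPhrase(
  ["what time is it", "set a timer"],
  "Hey computer",
  [],                       // altProns
  SpeakingTask.OralReading, // assumed enum member name
  "assistant-graph",
);
```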
Returns true if a valid decoding graph with the given name exists in the filesystem.
name of the decoding graph.
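Hypothetical usage, assuming the check is exposed as decodingGraphWithNameExists; creating graphs only when missing avoids repeating the relatively expensive creation step:

```ts
// Only build the graph if it is not already stored in the filesystem.
if (!KeenASR.decodingGraphWithNameExists("my-graph")) { // method name assumed
  // ...create it via one of the createDecodingGraph... methods above
}
```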
Optional
onFinalResponse
Callback invoked when the recognizer stops listening due to one of the VADParameter thresholds being met. The callback will provide an instance of ASRResponse, which contains an ASRResult and various other data related to the most recent interaction, including access to the raw audio via ASRResponse.getAudio.
Optional
onPartialResult
Callback invoked each time a partial result is available. This callback will execute periodically while the recognizer is listening and provide a partial result (text only, without more detailed information such as timing or phonemes).
Prepare for recognition by loading a contextual decoding graph that was created via the KeenASR.createContextualDecodingGraphFromPhrases(phrases: string[][], altProns: AlternativePronunciation[], speakingTask: SpeakingTask, name: string) method. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will be ready to start listening via the startListening method.
name of the decoding graph.
context identifier.
set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
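For example (method name as referenced earlier in these docs; contextId 0 selects the first context):

```ts
// Load the contextual graph and select context 0; the third argument
// enables Goodness of Pronunciation scoring (requires extra bundle assets).
await KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId(
  "my-graph",
  0,
  false,
);
```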
Prepare for recognition by loading decoding graph that was created via one of the methods that create decoding graphs. Calls to this method will be ignored if the recognizer is listening.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
name of the decoding graph.
set to true to compute Goodness of Pronunciation scores in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
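A sketch for the non-contextual variant; the prepareForListeningWithDecodingGraphWithName name is an assumption modeled on the contextual method:

```ts
// Hypothetical call: load a stored decoding graph without GoP scoring.
await KeenASR.prepareForListeningWithDecodingGraphWithName("lights-graph", false);
```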
Starts audio capture with the default audio input and recognition of the incoming audio. The listening process will stop when either: a) there is an explicit call to the stopListening method, or b) one of the Voice Activity Detection thresholds triggers (for example, end-silence); in the latter case the onFinalResponse callback will be executed, whereas calling stopListening will not compute a final response.
Stops the recognizer from processing incoming audio and stops audio capture. You would typically use this method when you need to stop the recognizer as soon as possible (for example, because the user is navigating away from the activity). If you use this method you will not be able to obtain the result for the current listening session.
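Putting the two together (a sketch; passing the callbacks directly to startListening, and the callback names shown, are assumptions based on the optional callback parameters documented above):

```ts
await KeenASR.startListening({
  onPartialResult: (text) => console.log("partial:", text), // callback name assumed
  onFinalResponse: (response) => {
    console.log("final:", response.result); // field name assumed
  },
});

// Later, e.g. when the user navigates away; no final result is computed.
KeenASR.stopListening();
```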
Align reference and hypothesis (recognized) text.
reference words.
hypothesis words.
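Hypothetical usage, assuming the method is exposed as KeenASR.align and takes arrays of words:

```ts
// Compare what the user was supposed to say with what was recognized.
const alignment = KeenASR.align(
  ["two", "hundred", "dollars"], // reference words
  ["two", "hundred", "dollar"],  // hypothesis (recognized) words
);
```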
Returns the recognizer state, one of the RecognizerState values.
state of the recognizer, one of RecognizerState values.
The KeenASR class is a high-level JavaScript module that provides core ASR functionality.
Installation of the package is described here.
Typically, you will follow these steps when using the SDK:
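For example, an end-to-end sketch of that flow (combining the partly assumed method names from the sections above):

```ts
import { KeenASR, SpeakingTask } from "keenasr-web"; // import shape assumed

// 1. Initialize with an ASR bundle (entry point name assumed).
await KeenASR.initialize({ bundleUrl: "https://example.com/bundle-name.tgz" });

// 2. Create a decoding graph once, if it does not already exist.
if (!KeenASR.decodingGraphWithNameExists("demo-graph")) { // method name assumed
  await KeenASR.createContextualDecodingGraphFromPhrases(
    [["yes", "no"]],
    [],
    SpeakingTask.OralReading, // assumed enum member name
    "demo-graph",
  );
}

// 3. Prepare the recognizer, then start listening.
await KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId(
  "demo-graph",
  0,
  false,
);
await KeenASR.startListening();
```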