Initialize the SDK with a bundle URL and options (e.g. whether to perform echo cancellation).
initialization parameters.
The full URL of the ASR bundle (e.g., https://.../bundle-name.tgz). The bundle needs to be a gzipped tar file, and your web server needs to be configured to serve these files with Content-Encoding set to gzip. For more details, see the SDK Installation docs.
Optional
doEchoCancellation?: boolean
Whether to enable echo cancellation for audio input. Defaults to false.
Optional
onASRBundleReady?: () => void
Callback invoked when the ASR bundle is ready (downloaded and stored in the local filesystem). Can be used for reporting initialization progress in the UI.
Optional
onCoreReady?: () => void
Callback invoked when the core module is ready. Can be used, for example, to initialize KeenASR logging or report initialization progress in the UI.
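A minimal sketch of initialization with these options; the initialize entry point and import shape are assumptions, while the option names follow the parameters documented above:

```ts
// Hypothetical sketch: assumes the SDK exposes an initialize() entry point
// that accepts the parameters documented above.
import { KeenASR } from "keenasr-web";

await KeenASR.initialize({
  bundleUrl: "https://example.com/asr-bundles/bundle-name.tgz", // placeholder URL
  doEchoCancellation: true,
  onCoreReady: () => console.log("core module ready"),
  onASRBundleReady: () => console.log("ASR bundle downloaded and stored"),
});
```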
Check if the browser can run the keenasr-web library. Currently, only Chrome 113 (or higher) and Safari 18 (or higher) are supported. More specifically, this function checks whether cross-origin isolation is enabled and whether WebAssembly is supported.
Cross-origin isolation needs to be enabled for the use of SharedArrayBuffer by the Web SDK. This JavaScript object allows shared memory between web workers and the main thread. Although web workers run in the same process as the main thread, they have separate execution contexts and memory spaces by default. For this reason, SharedArrayBuffer provides a way for different execution contexts to access the same memory buffer, enabling efficient and fast data exchange without copying.
For SharedArrayBuffer browser compatibility, visit this page.
For WebAssembly browser compatibility, visit this page.
Cross-origin isolation is enabled via the following headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
true if the browser can run the keenasr-web library, false otherwise
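For illustration, one way to send these headers from a Node/Express server (a sketch; Express is an assumption about your stack, not part of the SDK):

```ts
import express from "express";

const app = express();

// Send the two headers that enable cross-origin isolation,
// which SharedArrayBuffer requires.
app.use((_req, res, next) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
});

app.use(express.static("public")); // serve your app files
app.listen(8080);
```

You can verify the result at runtime: in a cross-origin isolated page, the global crossOriginIsolated is true.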
Get or set the audio input ID that will be matched against MediaDeviceInfo.deviceId to find the audio input device for recording.
If the device with the specified ID is not found, recording will fall back to the system default audio input device.
Filters the result of MediaDevices.enumerateDevices where kind == 'audioinput', requesting microphone access if needed.
Optional
getAudioInputDeviceId: (audioInputDeviceId: string) => void
Callback that returns the audio input device ID.
Status of whether the browser microphone permissions have been granted. Note: the navigator.permissions API will not work for 'microphone' in Firefox.
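A sketch of picking an input device with the standard MediaDevices API; the KeenASR.audioInputDeviceId property name is an assumption based on the getter/setter described above:

```ts
// Ask for microphone access once so device labels and IDs are populated.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach((t) => t.stop()); // release the mic immediately

// List available audio inputs, mirroring the SDK's device
// enumeration (kind === 'audioinput').
const devices = await navigator.mediaDevices.enumerateDevices();
const inputs = devices.filter((d) => d.kind === "audioinput");

// Pick a device; if this ID is not found at recording time, the SDK
// falls back to the system default input.
if (inputs.length > 0) {
  KeenASR.audioInputDeviceId = inputs[0].deviceId; // assumed property name
}
```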
Optional
onFinalResponse
Callback invoked when the recognizer stops listening due to one of the VADParameter thresholds being met. The callback will provide an instance of ASRResponse, which contains an ASRResult and various other data related to the most recent interaction, including access to the raw audio via ASRResponse.getAudio.
Optional
onPartialResult
Callback invoked each time a partial result is available. This callback will execute periodically while the recognizer is listening and provide a partial result (text only, without more detailed information such as timing or phonemes).
LogLevel defines logging levels used by the SDK.
Meant to be used with the KeenASR.setLogLevel method.
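For example (the import shape and the Info member name are assumptions; use whichever members LogLevel actually defines):

```ts
import { KeenASR, LogLevel } from "keenasr-web"; // import shape assumed

// Raise SDK logging verbosity, e.g. inside the onCoreReady callback.
KeenASR.setLogLevel(LogLevel.Info); // member name assumed
```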
VAD (voice activity detection) gating introduces a lightweight voice activity detection stage before speech recognition. If VAD gating is turned on, then after you call the startListening method recognition will start only once the VAD gating module detects voice. You would typically use this only in always-on listening scenarios where you want to preserve battery life.
true if VAD gating should be turned on, false otherwise.
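A sketch of turning the feature on; the setVADGating method name is an assumption based on the boolean parameter described above:

```ts
// Hypothetical call: enable the VAD gating stage for always-on listening.
KeenASR.setVADGating(true); // method name assumed
```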
Set any of the VADParameter Voice Activity Detection parameters. These parameters can be set at any time and go into effect immediately.
a combination of Voice Activity Detection parameters and their corresponding values.
Optional
Readonly
timeoutEndSilenceForAnyMatch?: number
Timeout after this many seconds if we had any match (even if the final state has not been reached). Default is 2 seconds.
Optional
Readonly
timeoutEndSilenceForGoodMatch?: number
Timeout after this many seconds if we had a good (high-probability) match to the final state. Default is 1 second.
Optional
Readonly
timeoutForNoSpeech?: number
Timeout after this many seconds even if nothing has been recognized. Default is 10 seconds.
Optional
Readonly
timeoutMaxDuration?: number
Timeout after this many seconds regardless of what has been recognized. This is effectively an upper bound on the duration of recognition. Default value is 20 seconds.
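A sketch of tightening these thresholds; the setVADParameters method name is an assumption, while the parameter names are as documented above:

```ts
// Hypothetical call: stop listening sooner after a good match,
// and cap a single interaction at 15 seconds.
KeenASR.setVADParameters({
  timeoutEndSilenceForGoodMatch: 0.8,
  timeoutEndSilenceForAnyMatch: 1.5,
  timeoutForNoSpeech: 5,
  timeoutMaxDuration: 15,
});
```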
SpeakingTask defines the type of speaking task that will be handled. It is primarily used to indicate to the methods that create decoding graphs what type of task they need to handle, so that appropriate customization can be done when creating the language model and decoding graph.
Create a contextual decoding graph from an array of contexts, for a specific task, using a provided array of alternative pronunciations, and save it in the filesystem for later use. Contextual decoding graphs can be referenced by their name by various methods in the SDK.
the name of the contextual decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of contexts (where each context is an array of phrases). You can switch between contexts via KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId, where contextIds use 0-based indexing. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of the SpeakingTask values, specifying the type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
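A sketch of creating a contextual graph, following the argument order shown later in these docs for KeenASR.createContextualDecodingGraphFromPhrases; the SpeakingTask member name is an assumption:

```ts
// Two contexts; contextIds are 0-based (0 = colors, 1 = counts).
const contexts: string[][] = [
  ["red apple", "green apple"],
  ["one apple", "two apples"],
];

// Signature as referenced elsewhere in these docs:
// (phrases, altProns, speakingTask, name)
await KeenASR.createContextualDecodingGraphFromPhrases(
  contexts,
  [],                       // no alternative pronunciations
  SpeakingTask.OralReading, // assumed enum member name
  "my-graph",
);
```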
Create a decoding graph from an array of phrases, for a specific task, using a provided array of word mispronunciations, and save it in the filesystem for later use.
the name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of SpeakingTask specifying a type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
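A comparable sketch for the phrase-based variant; the createDecodingGraphFromPhrases name and argument order are assumptions modeled on the contextual method:

```ts
// Hypothetical call: a flat list of phrases instead of contexts.
const phrases = ["turn the lights on", "turn the lights off"];

await KeenASR.createDecodingGraphFromPhrases(
  phrases,
  [],                       // altProns
  SpeakingTask.OralReading, // assumed enum member name
  "lights-graph",
);
```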
Create a decoding graph from an array of phrases, using the specified triggerPhrase, and save it in the filesystem for later use. Decoding graphs can be referenced by their name by various methods in the SDK. When using decoding graphs created with trigger phrase support, upon calling the startListening method the SDK will listen continuously until it hears the trigger phrase; only then will partial results start occurring.
the name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.
an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not $200).
a String representing the trigger phrase used to initiate recognition when using this decoding graph, for example "Hey computer". When using a decoding graph with a trigger phrase, the recognizer will listen continuously until it hears the trigger phrase. No partial result callbacks will be provided until the trigger phrase is recognized.
configuration parameters.
Optional
altProns?: AlternativePronunciation[]
An optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If these specific pronunciations were recognized, the words will be reported in the partial/final result with the #tag suffix appended to the word (e.g. CHOICE#WRONG, if WRONG was the tag).
one of SpeakingTask specifying a type of interaction.
Optional
spokenNoiseProb?: number
An optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 will tune down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 will make <SPOKEN_NOISE> recognition as likely as any other word.
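A sketch for the trigger-phrase variant; the method name is an assumption, and the trigger phrase comes from the example above:

```ts
// Hypothetical call: recognition stays gated until "Hey computer" is heard.
await KeenASR.createDecodingGraphFromPhrasesWithTriggerPhrase(
  ["what time is it", "set a timer"],
  "Hey computer",
  [],                       // altProns
  SpeakingTask.OralReading, // assumed enum member name
  "assistant-graph",
);
```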
Returns true if a valid decoding graph with the given name exists in the filesystem.
name of the decoding graph.
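Hypothetical usage, assuming the check is exposed as decodingGraphWithNameExists; creating graphs only when missing avoids repeating the relatively expensive creation step:

```ts
// Only build the graph if it is not already stored in the filesystem.
if (!KeenASR.decodingGraphWithNameExists("my-graph")) { // method name assumed
  // ...create it via one of the createDecodingGraph... methods above
}
```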
Optional
onFinalResponse
Callback invoked when the recognizer stops listening due to one of the VADParameter thresholds being met. The callback will provide an instance of ASRResponse, which contains an ASRResult and various other data related to the most recent interaction, including access to the raw audio via ASRResponse.getAudio.
Optional
onPartialResult
Callback invoked each time a partial result is available. This callback will execute periodically while the recognizer is listening and provide a partial result (text only, without more detailed information such as timing or phonemes).
Prepare for recognition by loading a contextual decoding graph that was created via the KeenASR.createContextualDecodingGraphFromPhrases(phrases: string[][], altProns: AlternativePronunciation[], speakingTask: SpeakingTask, name: string) method. Calls to this method will be ignored if the recognizer is listening.
After calling this method, the recognizer will be ready to start listening via the startListening method.
name of the decoding graph.
context identifier.
set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
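For example (method name as referenced earlier in these docs; contextId 0 selects the first context):

```ts
// Load the contextual graph and select context 0; the third argument
// enables Goodness of Pronunciation scoring (requires extra bundle assets).
await KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId(
  "my-graph",
  0,
  false,
);
```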
Prepare for recognition by loading decoding graph that was created via one of the methods that create decoding graphs. Calls to this method will be ignored if the recognizer is listening.
After calling this method, recognizer will load the decoding graph into memory and it will be ready to start listening via startListening method.
name of the decoding graph.
set to true to compute Goodness of Pronunciation scores in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.
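A sketch for the non-contextual variant; the prepareForListeningWithDecodingGraphWithName name is an assumption modeled on the contextual method:

```ts
// Hypothetical call: load a stored decoding graph without GoP scoring.
await KeenASR.prepareForListeningWithDecodingGraphWithName("lights-graph", false);
```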
Starts audio capture with the default audio input and recognition of the incoming audio. The listening process will stop when either: a) there is an explicit call to the stopListening method, or b) one of the Voice Activity Detection thresholds triggers (for example, end-silence); in the latter case the onFinalResponse callback will be executed, whereas calling stopListening will not compute a final response.
Stops the recognizer from processing incoming audio and stops audio capture. You would typically use this method when you need to stop the recognizer as soon as possible (for example, because the user is navigating away from the activity). If you use this method you will not be able to obtain the result for the current listening session.
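Putting the two together (a sketch; passing the callbacks directly to startListening, and the callback names shown, are assumptions based on the optional callback parameters documented above):

```ts
await KeenASR.startListening({
  onPartialResult: (text) => console.log("partial:", text), // callback name assumed
  onFinalResponse: (response) => {
    console.log("final:", response.result); // field name assumed
  },
});

// Later, e.g. when the user navigates away; no final result is computed.
KeenASR.stopListening();
```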
Align reference and hypothesis (recognized) text.
reference words.
hypothesis words.
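Hypothetical usage, assuming the method is exposed as KeenASR.align and takes arrays of words:

```ts
// Compare what the user was supposed to say with what was recognized.
const alignment = KeenASR.align(
  ["two", "hundred", "dollars"], // reference words
  ["two", "hundred", "dollar"],  // hypothesis (recognized) words
);
```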
Returns the recognizer state, one of the RecognizerState values.
state of the recognizer, one of RecognizerState values.
The KeenASR class is a high-level JavaScript module that provides core ASR functionality.
Installation of the package is described here.
Typically, you will follow these steps when using the SDK:
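For example, an end-to-end sketch of that flow (combining the partly assumed method names from the sections above):

```ts
import { KeenASR, SpeakingTask } from "keenasr-web"; // import shape assumed

// 1. Initialize with an ASR bundle (entry point name assumed).
await KeenASR.initialize({ bundleUrl: "https://example.com/bundle-name.tgz" });

// 2. Create a decoding graph once, if it does not already exist.
if (!KeenASR.decodingGraphWithNameExists("demo-graph")) { // method name assumed
  await KeenASR.createContextualDecodingGraphFromPhrases(
    [["yes", "no"]],
    [],
    SpeakingTask.OralReading, // assumed enum member name
    "demo-graph",
  );
}

// 3. Prepare the recognizer, then start listening.
await KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId(
  "demo-graph",
  0,
  false,
);
await KeenASR.startListening();
```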