keenasr-web - v2.0.5

    Class KeenASR

    The KeenASR class is a high-level JavaScript module that provides core ASR functionality.

    Installation of the package is described here.

    Typically, you will follow these steps when using the SDK:

    1. Initialize the SDK with the initialize method.
    2. Configure recognizer options and callbacks. See the Configuration category and the onPartialResult and onFinalResponse callbacks.
    3. Create one or more decoding graphs. See the Decoding Graphs category for relevant methods.
    4. Prepare the recognizer for listening with a specific graph. See the Listening category for the different prepareForListening methods.
    5. Call startListening to start audio capture and recognition. The onPartialResult callback will be called periodically, and you can use it to display what has been recognized so far. The onFinalResponse callback will be called when the recognizer stops listening, and it will provide the final response, which includes the final result. A sketch of this flow follows below.
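
    A minimal TypeScript sketch of this flow. It assumes the static API shown on this page; the import path, bundle URL, graph name, and the SpeakingTask member are placeholders, and createContextualDecodingGraphFromPhrases follows the parameter order documented in the Decoding Graph category.

      import { KeenASR } from 'keenasr-web'; // import path is an assumption

      async function run() {
        // 1. Initialize the SDK with the ASR bundle URL (placeholder URL).
        await KeenASR.initialize({
          asrBundleURL: 'https://example.com/asr-bundle.tgz',
          onCoreReady: () => console.log('core ready'),
          onASRBundleReady: () => console.log('ASR bundle downloaded'),
        });

        // 2. Configure callbacks.
        KeenASR.onPartialResult = (result) => console.log('partial:', result);
        KeenASR.onFinalResponse = (response) => console.log('final:', response);

        // 3. Create a contextual decoding graph (a single context here).
        await KeenASR.createContextualDecodingGraphFromPhrases('demo-graph',
          [['yes please', 'no thank you']],
          { speakingTaskType: KeenASR.SpeakingTask.DEFAULT }, // member name is hypothetical
        );

        // 4. Prepare the recognizer with the graph, contextId 0 (0-based).
        KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId('demo-graph', 0);

        // 5. Start audio capture and recognition.
        await KeenASR.startListening();
      }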

    Hierarchy

    • default
      • KeenASR

    INITIALIZATION

    • Initialize the SDK with a bundle URL and options (e.g. perform echo cancellation).

      Parameters

      • params: {
            asrBundleURL: string | URL;
            doEchoCancellation?: boolean;
            onASRBundleReady?: () => void;
            onCoreReady?: () => void;
        }

        initialization parameters.

        • asrBundleURL: string | URL

          The full URL of the ASR bundle (e.g., https://.../bundle-name.tgz). The bundle needs to be in a tar gzipped file, and your web server needs to be configured to serve these files with Content-Encoding set to gzip. For more details, see SDK Installation docs.

        • Optional doEchoCancellation?: boolean

          Whether to enable echo cancellation for audio input. Defaults to false.

        • Optional onASRBundleReady?: () => void

          Callback invoked when the ASR bundle is ready (downloaded and stored in the local filesystem).

          Can be used for reporting initialization progress in the UI.

        • Optional onCoreReady?: () => void

          Callback invoked when the core module is ready.

          Can be used, for example, to initialize KeenASR logging or to report initialization progress in the UI.

      Returns Promise<void>

    • Check if the browser can run the keenasr-web library. Currently, only Chrome 113 (or higher) and Safari 18 (or higher) are supported. More specifically, this function checks whether cross-origin isolation is enabled and whether WebAssembly is supported.

      Cross-origin isolation needs to be enabled because the Web SDK uses SharedArrayBuffer. This JavaScript object allows shared memory between web workers and the main thread. Although web workers run in the same process as the main thread, by default they have separate execution contexts and memory spaces. For this reason, SharedArrayBuffer provides a way for different execution contexts to access the same memory buffer, enabling efficient and fast data exchange without copying.

      For SharedArrayBuffer browser compatibility, see this page.

      For WebAssembly browser compatibility, see this page.

      Cross-origin isolation is enabled via the following headers:

      Cross-Origin-Opener-Policy: same-origin
      Cross-Origin-Embedder-Policy: require-corp

      Returns boolean

      true if the browser can run the keenasr-web library, false otherwise
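
      For example, a minimal Express sketch that serves the app with the headers above (Express itself is an assumption, not part of keenasr-web):

      import express from 'express';

      const app = express();
      // Enable cross-origin isolation so SharedArrayBuffer is available.
      app.use((_req, res, next) => {
        res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
        res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
        next();
      });
      app.use(express.static('public')); // your built web app
      app.listen(8080);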

    MICROPHONE ACCESS

    audioInputDeviceId: string = ''

    Get or set the audio input ID that will be matched against MediaDeviceInfo.deviceId to find the audio input device for recording.

    If a device with the specified ID is not found, recording falls back to the system default audio input device.

    • Filters the result of MediaDevices.enumerateDevices to devices where kind == 'audioinput'.

      Parameters

      • requestIfNeeded: boolean = false

        request microphone access if needed

      • Optional getAudioInputDeviceId: (audioInputDeviceId: string) => void

        callback that returns the audio input device ID.

      Returns Promise<MediaDeviceInfo[]>
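
      An equivalent browser-API sketch of what this method does, per the description above, combined with the documented audioInputDeviceId property (KeenASR assumed imported as earlier):

      async function pickFirstAudioInput(): Promise<void> {
        // Device labels and IDs may be empty before microphone permission
        // has been granted.
        const devices = await navigator.mediaDevices.enumerateDevices();
        const inputs = devices.filter((d) => d.kind === 'audioinput');
        if (inputs.length > 0) {
          // Unknown IDs fall back to the system default input device.
          KeenASR.audioInputDeviceId = inputs[0].deviceId;
        }
      }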

    • Status of whether browser microphone permissions have been granted. Note: the navigator.permissions API does not work for 'microphone' in Firefox.

      Returns Promise<boolean>
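
      A hedged sketch of the underlying browser check; per the note above, navigator.permissions does not support 'microphone' in Firefox, so a fallback is needed there:

      async function micPermissionGranted(): Promise<boolean> {
        try {
          const status = await navigator.permissions.query({
            name: 'microphone' as PermissionName, // cast: not in all TS DOM typings
          });
          return status.state === 'granted';
        } catch {
          return false; // e.g. Firefox: 'microphone' query is unsupported
        }
      }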

    • Request microphone permissions from user.

      Parameters

      • Optional getAudioInputDeviceId: (audioInputDeviceId: string) => void

        callback to get the audio input device ID.

      Returns Promise<void>

    CONFIGURATION


    onFinalResponse?: (response: ASRResponse) => void

    Callback invoked when the recognizer stops listening due to one of the VADParameter thresholds being met. The callback will provide an instance of the ASRResponse object, which contains an ASRResult and various other data related to the most recent interaction. This includes access to raw audio via ASRResponse.getAudio.

    onPartialResult?: (result: ASRResult) => void

    Callback invoked each time a partial result is available. This callback executes periodically while the recognizer is listening and provides a partial result (text only, without more detailed information such as timing or phonemes).

    • get LogLevel(): typeof LogLevel

      LogLevel defines logging levels used by the SDK.

      Meant to be used with the KeenASR.setLogLevel method.

      Returns typeof LogLevel

    • Set the level of logs emitted by the SDK. Default is WARNING.

      Parameters

      Returns void
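
      For example (the INFO member is an assumption; WARNING is the documented default, so check the LogLevel enum for actual values):

      KeenASR.setLogLevel(KeenASR.LogLevel.INFO); // log INFO and above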

    • VAD (voice activity detection) gating introduces very lightweight voice activity detection before speech recognition. If VAD gating is turned on, then after you call the startListening method recognition will start only after voice has been detected by the VAD gating module. You would typically use this only in always-on listening scenarios where you want to preserve battery life.

      Parameters

      • value: boolean

        true if VAD Gating should be turned on, false otherwise.

      Returns void
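
      A call sketch; the setter name is an assumption based on the description above:

      KeenASR.setVADGating(true); // name hypothetical: enable VAD gating for always-on listening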

    • Set any of the VADParameter voice activity detection parameters. These parameters can be set at any time and take effect immediately.

      Parameters

      • parameters: {
            timeoutEndSilenceForAnyMatch?: number;
            timeoutEndSilenceForGoodMatch?: number;
            timeoutForNoSpeech?: number;
            timeoutMaxDuration?: number;
        }

        a combination of Voice Activity Detection parameters and their corresponding values.

        • Optional Readonly timeoutEndSilenceForAnyMatch?: number

          Timeout after this many seconds if we had any match (even if final state has not been reached). Default is 2 seconds.

        • Optional Readonly timeoutEndSilenceForGoodMatch?: number

          Timeout after this many seconds if we had a good (high probability) match to the final state. Default is 1 second.

        • Optional Readonly timeoutForNoSpeech?: number

          Timeout after this many seconds even if nothing has been recognized. Default is 10 seconds.

        • Optional Readonly timeoutMaxDuration?: number

          Timeout after this many seconds regardless of what has been recognized. This is effectively an upper bound on the duration of recognition. Default value is 20 seconds.

      Returns void
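
      A call sketch; the parameter names are documented above, while the setter name itself is an assumption:

      KeenASR.setVADParameters({            // method name hypothetical
        timeoutForNoSpeech: 5,              // stop after 5 s with no speech (default 10)
        timeoutEndSilenceForGoodMatch: 1,   // default 1 s
        timeoutEndSilenceForAnyMatch: 2,    // default 2 s
        timeoutMaxDuration: 20,             // hard upper bound (default 20 s)
      });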

    DECODING GRAPH

    • get SpeakingTask(): typeof SpeakingTask

      SpeakingTask defines the type of speaking task that will be handled. It is primarily used to indicate to the methods that create decoding graphs what type of task they need to handle, so that appropriate customization can be done when creating the language model and decoding graph.

      Returns typeof SpeakingTask

    • Create a contextual decoding graph from an array of contexts, for a specific task, using a provided array of alternative pronunciations, and save it in the filesystem for later use. Contextual decoding graphs can be referenced by name by various methods in the SDK.

      Parameters

      • name: string

        a name of the contextual decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.

      • contexts: string[][]

        an array of contexts (where each context is an array of phrases). You can switch between contexts via KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId, where contextIds use 0-based indexing. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not '$200').

      • config: {
            altProns?: AlternativePronunciation[];
            speakingTaskType: SpeakingTask;
            spokenNoiseProb?: number;
        }

        configuration parameters.

        • Optional altProns?: AlternativePronunciation[]

          an optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If one of these specific pronunciations is recognized, the word will be reported in the partial/final result with #tag appended (e.g. CHOICE#WRONG, if WRONG was the tag).

        • speakingTaskType: SpeakingTask

          one of SpeakingTask specifying a type of interaction.

        • Optional spokenNoiseProb?: number

          an optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 tunes down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 makes <SPOKEN_NOISE> recognition as likely as any other word.

      Returns Promise<void>
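
      A sketch with two contexts; the method name follows the signature shown in the Listening category (createContextualDecodingGraphFromPhrases), and the SpeakingTask member is hypothetical:

      await KeenASR.createContextualDecodingGraphFromPhrases('reading-graph',
        [
          ['the cat sat on the mat'],    // contextId 0
          ['the dog slept by the door'], // contextId 1
        ],
        {
          speakingTaskType: KeenASR.SpeakingTask.DEFAULT, // hypothetical member
          spokenNoiseProb: 0.5,                           // clipped to [0, 1]
        },
      );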

    • Create a decoding graph from an array of phrases, for a specific task, using a provided array of alternative pronunciations, and save it in the filesystem for later use.

      Parameters

      • name: string

        a name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.

      • phrases: string[]

        an array of strings that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not '$200').

      • config: {
            altProns?: AlternativePronunciation[];
            speakingTaskType: SpeakingTask;
            spokenNoiseProb?: number;
        }

        configuration parameters.

        • Optional altProns?: AlternativePronunciation[]

          an optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If one of these specific pronunciations is recognized, the word will be reported in the partial/final result with #tag appended (e.g. CHOICE#WRONG, if WRONG was the tag).

        • speakingTaskType: SpeakingTask

          one of SpeakingTask specifying a type of interaction.

        • Optional spokenNoiseProb?: number

          an optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 tunes down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 makes <SPOKEN_NOISE> recognition as likely as any other word.

      Returns Promise<void>
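
      A sketch; this method's name is not shown on this page, so createDecodingGraphFromPhrases is an assumption. Note the normalized phrases:

      await KeenASR.createDecodingGraphFromPhrases('payments-graph', // name hypothetical
        ['pay two hundred dollars', 'cancel the payment'],           // not '$200'
        { speakingTaskType: KeenASR.SpeakingTask.DEFAULT },          // hypothetical member
      );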

    • Create a decoding graph from an array of phrases, using the specified triggerPhrase, and save it in the filesystem for later use. Decoding graphs can be referenced by name by various methods in the SDK. When using decoding graphs created with trigger phrase support, upon calling the startListening method the SDK will listen continuously until it hears the trigger phrase; only then will partial results start occurring.

      Parameters

      • name: string

        a name of the decoding graph. All graph resources will be stored in the local file system, in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME.

      • phrases: string[]

        an array of strings that specify the phrases the recognizer should listen for. These phrases are used to create an ngram language model, from which the decoding graph is created. Text in phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars' not '$200').

      • trigger_phrase: string

        a string representing the trigger phrase used to initiate recognition when using this decoding graph, for example "Hey computer". When using a decoding graph with a trigger phrase, the recognizer will listen continuously until it hears the trigger phrase. No partial result callbacks will be provided until the trigger phrase is recognized.

      • config: {
            altProns?: AlternativePronunciation[];
            speakingTaskType: SpeakingTask;
            spokenNoiseProb?: number;
        }

        configuration parameters.

        • Optional altProns?: AlternativePronunciation[]

          an optional array of AlternativePronunciation objects specifying alternative pronunciations for words, and (optional) tags that can be used to identify those pronunciations. If one of these specific pronunciations is recognized, the word will be reported in the partial/final result with #tag appended (e.g. CHOICE#WRONG, if WRONG was the tag).

        • speakingTaskType: SpeakingTask

          one of SpeakingTask specifying a type of interaction.

        • Optional spokenNoiseProb?: number

          an optional value that defines the <SPOKEN_NOISE> probability. The value will be clipped to the [0, 1] range. A value close to 0 tunes down the probability of recognizing <SPOKEN_NOISE>; a value close to 1 makes <SPOKEN_NOISE> recognition as likely as any other word.

      Returns Promise<void>
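
      A sketch with a hypothetical method name; the trigger phrase example comes from the parameter description above:

      await KeenASR.createDecodingGraphFromPhrasesWithTriggerPhrase('assistant-graph', // name hypothetical
        ['turn on the lights', 'what time is it'],
        'Hey computer',
        { speakingTaskType: KeenASR.SpeakingTask.DEFAULT }, // hypothetical member
      );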

    • Returns true if a valid decoding graph with the given name exists in the filesystem.

      Parameters

      • name: string

        name of the decoding graph.

      Returns boolean

      • True if a decoding graph with that name exists, false otherwise. This method also checks for the existence of all the necessary files in the decoding graph directory.
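
      A usage sketch with a hypothetical method name, creating the graph only when it does not already exist in the filesystem:

      if (!KeenASR.decodingGraphExists('payments-graph')) { // name hypothetical
        // ...create it via one of the Decoding Graph methods above...
      }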

    LISTENING


    • Prepare for recognition by loading a contextual decoding graph that was created via the

      KeenASR.createContextualDecodingGraphFromPhrases(phrases: string[][], altProns: AlternativePronunciation[], speakingTask: SpeakingTask, name: string)
      

      method. Calls to this method will be ignored if the recognizer is listening.

      After calling this method, the recognizer will be ready to start listening via the startListening method.

      Parameters

      • decodingGraphName: string

        name of the decoding graph.

      • contextId: number

        context identifier.

      • computeGop: boolean = false

        set to true if you would like Goodness of Pronunciation scores to be computed in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.

      Returns void
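
      For example, switching to contextId 1 of the 'reading-graph' created earlier (contextIds are 0-based) and then starting to listen, with computeGop left at its default:

      KeenASR.prepareForListeningWithContextualDecodingGraphWithNameAndContextId('reading-graph', 1);
      await KeenASR.startListening();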

    • Prepare for recognition by loading a decoding graph that was created via one of the methods that create decoding graphs. Calls to this method will be ignored if the recognizer is listening.

      After calling this method, the recognizer will load the decoding graph into memory and be ready to start listening via the startListening method.

      Parameters

      • decodingGraphName: string

        name of the decoding graph.

      • computeGop: boolean = false

        set to true to compute Goodness of Pronunciation scores in the final result. There is additional overhead when computing these scores, and they require additional assets to be present in the ASR Bundle.

      Returns void

    • Starts audio capture with the default audio input and recognition of the incoming audio. The listening process will stop when either: a) there is an explicit call to the stopListening method, or b) one of the voice activity detection thresholds triggers (for example, end-silence). In the latter case the onFinalResponse callback will be executed, whereas calling stopListening will not compute a final response.

      Returns Promise<void>

    • Stops the recognizer from processing incoming audio and stops audio capture. You would typically use this method when you need to stop the recognizer as soon as possible (for example, because the user is navigating away from the activity). If you use this method, you will not be able to obtain the result for the current listening session.

      Returns Promise<void>
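
      A lifecycle sketch; per the descriptions above, a VAD-triggered stop invokes onFinalResponse, whereas an explicit stopListening call does not compute a final response:

      await KeenASR.startListening();
      // ...later, e.g. when the user navigates away from the page:
      await KeenASR.stopListening(); // no final response for this session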

    Other

    • Align reference and hypothesis (recognized) text.

      Parameters

      • refWords: string[]

        reference words.

      • hypWords: string[]

        hypothesis words.

      • useChardiff: boolean
      • insertToken: string

      Returns { alignedHyp: string[]; alignedRef: string[] }
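
      A call sketch with a hypothetical method name, since the name is not shown on this page:

      const { alignedRef, alignedHyp } = KeenASR.alignWords( // name hypothetical
        ['the', 'quick', 'brown', 'fox'], // refWords
        ['the', 'brown', 'fox'],          // hypWords
        false,                            // useChardiff
        '*',                              // insertToken (placeholder)
      );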

    VERSION

    • Returns string

      The git commit hash of the current build.

    • Returns string

      The version of the SDK.