KeenASR is an offline speech recognition SDK for iOS and Android. This Unity plugin provides a C# API that wraps the native SDK, giving you on-device speech recognition without requiring a network connection.

Quick Start

using UnityEngine;
using KeenResearch;

public class QuickStartExample : MonoBehaviour {
    void Start() {
        // 1. Subscribe to the initialization event before calling Initialize
        KeenASR.onInitializedReceived += OnInit;

        // 2. Initialize with an ASR bundle from StreamingAssets
        KeenASR.Initialize("keenA1m-nnet3chain-en-us");
    }

    void OnInit(bool success) {
        if (!success) {
            Debug.LogError("KeenASR initialization failed");
            return;
        }

        // 3. Subscribe to results
        KeenASR.Instance.onFinalResponseReceived += OnResponse;

        // 4. Create a decoding graph with expected phrases
        string[] phrases = { "YES", "NO", "HELLO", "GOODBYE" };
        KeenASR.Instance.CreateDecodingGraphFromPhrases("myDG", phrases);

        // 5. Load the decoding graph and prepare the recognizer
        KeenASR.Instance.PrepareForListeningWithDecodingGraph("myDG");

        // 6. Start listening
        KeenASR.Instance.StartListening();

        // Steps 4-6 do not have to run here; they can run at any later
        // point once initialization has succeeded
    }

    void OnResponse(ASRResponse response) {
        Debug.Log("Recognized: " + response.result.text);

        // do something with the response/result

        // Always dispose the response to release native resources
        response.Dispose();
    }
}

Key Concepts

ASR Bundle

An ASR bundle is a language-specific asset that contains a pre-trained speech recognition model. Store it in Assets/StreamingAssets/ and pass its name to KeenASR.Initialize().

Decoding Graph

A decoding graph defines what the recognizer can recognize. Create one from a list of phrases using CreateDecodingGraphFromPhrases(). Graphs are saved on the device and can be referenced by name and reused across sessions.

Recognizer Lifecycle

The recognizer progresses through these states (see RecognizerState):

NeedsDecodingGraph — initialized, but no decoding graph loaded
ReadyToListen — decoding graph loaded, ready to start recognition
Listening — actively capturing and decoding audio
FinalProcessing — transient state, audio capture stopped, computing final result
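
As a rough illustration of the four states listed above, the sketch below maps each one to a short status message. It assumes RecognizerState is an enum in the KeenResearch namespace with exactly these member names. The RecognizerStateText class and its Describe method are hypothetical helpers, and how you obtain the current state value is left to the API reference.

```csharp
using KeenResearch;

// Hypothetical helper, not part of the plugin.
public static class RecognizerStateText {
    // Returns a short, human-readable description of a recognizer state.
    // Assumes RecognizerState is an enum with the member names used below.
    public static string Describe(RecognizerState state) {
        switch (state) {
            case RecognizerState.NeedsDecodingGraph:
                return "Initialized; load a decoding graph before listening.";
            case RecognizerState.ReadyToListen:
                return "Decoding graph loaded; StartListening() can be called.";
            case RecognizerState.Listening:
                return "Capturing and decoding audio.";
            case RecognizerState.FinalProcessing:
                return "Audio capture stopped; computing the final result.";
            default:
                // Covers any value not described in this document.
                return "Unrecognized state: " + state;
        }
    }
}
```

A message like this can be useful in a debug overlay while tuning when to call StartListening().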

How Listening Stops

Listening stops in one of three ways:

VAD threshold triggered — e.g. end-silence or max duration (delivers final result via onFinalResponseReceived)
Audio interrupt — phone call, notification, app backgrounded (no callback)
Explicit StopListening() call — stops audio processing but does not trigger onFinalResponseReceived. To get the final result, set short VAD timeouts instead so the recognizer stops naturally.

Recognition Results

Results are delivered through two events:

onPartialASRResultReceived — called repeatedly during recognition with intermediate text
onFinalResponseReceived — called when recognition completes via VAD, with an ASRResponse containing the full result, per-word timing, phoneme-level detail, and audio quality metrics

Response Lifecycle

Each ASRResponse holds a reference to native resources. The caller owns the response and must call Dispose() when done with it. If you do not dispose it, the native memory stays allocated until the garbage collector runs the finalizer, which can be much later.

void OnResponse(ASRResponse response) {
    // Use the response...
    string text = response.result.cleanText;
 
    // Save audio/JSON if needed
    response.SaveAudioFile(Application.persistentDataPath);
    response.SaveJsonFile(Application.persistentDataPath);
 
    // Or queue for Dashboard upload
    response.QueueForUpload();
 
    // Always dispose when done
    response.Dispose();
}
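
If the handler can throw before it reaches Dispose(), the response is left for the finalizer. One way to guard against that is a try/finally block, sketched below using only the members shown in this document. The SafeResponseHandler class name and its Register method are hypothetical, and Register is assumed to be called after initialization has succeeded.

```csharp
using System;
using UnityEngine;
using KeenResearch;

// Hypothetical example component, not part of the plugin.
public class SafeResponseHandler : MonoBehaviour {
    // Call once initialization has succeeded (KeenASR.Instance is then available).
    public void Register() {
        KeenASR.Instance.onFinalResponseReceived += OnResponse;
    }

    void OnResponse(ASRResponse response) {
        try {
            // Application logic that might throw goes here.
            Debug.Log("Recognized: " + response.result.text);
        } catch (Exception e) {
            // Log the failure instead of letting it escape the callback.
            Debug.LogException(e);
        } finally {
            // Runs whether or not the code above threw.
            response.Dispose();
        }
    }
}
```

The finally block keeps the release of native resources independent of whatever the application does with the result.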

Voice Activity Detection (VAD)

VAD parameters control when the recognizer automatically stops listening:

TimeoutForNoSpeech — stop if no speech detected within this many seconds
TimeoutEndSilenceForGoodMatch — seconds of trailing silence for a confident match
TimeoutEndSilenceForAnyMatch — seconds of trailing silence for any match
TimeoutMaxDuration — maximum total listening duration

For practical purposes, TimeoutEndSilenceForGoodMatch and TimeoutEndSilenceForAnyMatch can be treated the same way.

Goodness of Pronunciation (GoP)

When computeGoP is set to true in PrepareForListeningWithDecodingGraph(), the final result includes phoneme-level pronunciation scores (0-1) in each ASRWord.phones array. This is useful for pronunciation assessment applications. GoP scoring works only if the ASR bundle supports it.
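
A minimal sketch of enabling GoP follows. It assumes that computeGoP is an optional parameter of PrepareForListeningWithDecodingGraph() that can be passed by name after the graph name, that a graph called "myDG" has already been created, and that the bundle supports GoP. The GoPSetupExample class is hypothetical; check the API reference for the exact signature.

```csharp
using UnityEngine;
using KeenResearch;

// Hypothetical example component, not part of the plugin.
public class GoPSetupExample : MonoBehaviour {
    // Call after initialization has succeeded and "myDG" has been created.
    public void PrepareWithGoP() {
        // Assumption: computeGoP can be supplied as a named argument.
        KeenASR.Instance.PrepareForListeningWithDecodingGraph("myDG", computeGoP: true);
        KeenASR.Instance.StartListening();
    }
}
```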

Dashboard Upload

To upload recognition responses to the KeenASR Dashboard for analysis:

// Start the uploader once during setup
KeenASR.StartDataUploader("YOUR_APP_KEY");
 
// Queue individual responses for upload
void OnResponse(ASRResponse response) {
    response.QueueForUpload();
    response.Dispose();
}

Platform Notes

Feature	iOS	Android
Initialization	Synchronous	Asynchronous
Speaking task graphs	Supported	Supported
Echo cancellation	Supported	Not yet implemented

Teardown

To fully release the SDK and all resources:

KeenASR.Teardown();

After teardown, KeenASR.Instance returns null. The SDK can be re-initialized by calling KeenASR.Initialize() again.
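
The teardown and re-initialization sequence might look like the sketch below. It assumes that Teardown() returns true on success and that the same initialization event is raised again after the second Initialize() call. The ReinitExample class and its Restart method are hypothetical.

```csharp
using UnityEngine;
using KeenResearch;

// Hypothetical example component, not part of the plugin.
public class ReinitExample : MonoBehaviour {
    // Tears the SDK down and initializes it again with the given bundle name.
    public void Restart(string bundleName) {
        // Assumption: a false return value means teardown did not complete.
        if (!KeenASR.Teardown()) {
            Debug.LogWarning("KeenASR teardown failed; skipping re-initialization");
            return;
        }

        // KeenASR.Instance is null from here until initialization succeeds.
        // Remove any earlier subscription first so OnInit runs only once.
        KeenASR.onInitializedReceived -= OnInit;
        KeenASR.onInitializedReceived += OnInit;
        KeenASR.Initialize(bundleName);
    }

    void OnInit(bool success) {
        KeenASR.onInitializedReceived -= OnInit;
        if (!success) {
            Debug.LogError("KeenASR re-initialization failed");
            return;
        }
        // Instance handlers must be attached again after re-initialization.
        Debug.Log("KeenASR re-initialized");
    }
}
```

Because KeenASR.Instance is null after teardown, any code that caches the instance or its event subscriptions needs to set them up again in OnInit.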