Once you have installed the plugin, you can start using it in your app by following these steps. This guide walks through a minimal integration: initialize the SDK, create a decoding graph, listen for speech, and handle results.

1. Import the Namespace

using KeenResearch;

2. Initialize the SDK

Initialization should happen early in your app lifecycle. On Android, initialization is asynchronous, so always use the callback pattern:

using System.Collections;    // IEnumerator
using UnityEngine;
#if UNITY_ANDROID
using UnityEngine.Android;   // Permission
#endif

IEnumerator Start() {
    // Request microphone permission on Android
#if UNITY_ANDROID
    if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
        Permission.RequestUserPermission(Permission.Microphone);
    while (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
        yield return null;
#else
    yield return null;
#endif

    // Optional: set log level before initialization
    KeenASR.SetLogLevel(LogLevel.Info);

    // Register the initialization callback
    KeenASR.onInitializedReceived += OnKeenASRInitialized;

    // Initialize with the ASR bundle name (must match the directory
    // name in Assets/StreamingAssets/)
    KeenASR.Initialize("keenA1m-nnet3chain-en-us");
}

3. Set Up Events and Create a Decoding Graph

Once initialized, register event handlers and create a decoding graph from your target phrases:

void OnKeenASRInitialized(bool success) {
    if (!success) {
        Debug.LogError("KeenASR initialization failed");
        return;
    }

    KeenASR recognizer = KeenASR.Instance;

    // Register event handlers
    recognizer.onPartialASRResultReceived += OnPartialResult;
    recognizer.onFinalResponseReceived += OnFinalResponse;

    // Create a decoding graph, which defines which phrases the
    // recognizer can detect. The recognizer will match spoken audio
    // against these phrases.
    string[] phrases = new string[] {
        "YES", "NO", "START", "STOP", "HELLO", "GOODBYE"
    };
    recognizer.CreateDecodingGraphFromPhrases("myGraph", phrases);

    // Prepare the recognizer with this graph. Pass computeGoP: true
    // here if you need phoneme-level pronunciation scores (see step 5).
    recognizer.PrepareForListeningWithDecodingGraph("myGraph");

    // Configure Voice Activity Detection: how long to wait after
    // the user stops speaking before finalizing the result
    recognizer.SetVADParameter(VadParameter.TimeoutEndSilenceForGoodMatch, 1.0f);
    recognizer.SetVADParameter(VadParameter.TimeoutEndSilenceForAnyMatch, 1.0f);
}

4. Start/Stop Listening

To start capturing audio and decoding it, call the recognizer’s StartListening() method:

KeenASR.Instance.StartListening();

While the recognizer is listening, it fires onPartialASRResultReceived every 100-200ms with interim results. The recognizer automatically stops listening when one of the VAD rules is triggered (for example, end-of-silence timeout after a good match), and fires onFinalResponseReceived with the final result.

If you need to stop recognition immediately (for example, the user is navigating away from the screen), call StopListening(). This cancels recognition and does not produce a final result.

KeenASR.Instance.StopListening();
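If you drive recognition from UI, you can wire these two calls to buttons. A minimal sketch (startButton and stopButton are hypothetical UnityEngine.UI.Button references assigned in your scene, not part of the plugin API):

using UnityEngine;
using UnityEngine.UI;
using KeenResearch;

public class ListenButtons : MonoBehaviour {
    public Button startButton;  // assign in the Inspector
    public Button stopButton;

    void Awake() {
        // Start capturing and decoding audio on tap
        startButton.onClick.AddListener(() => KeenASR.Instance.StartListening());
        // Cancel recognition immediately; no final result is produced
        stopButton.onClick.AddListener(() => KeenASR.Instance.StopListening());
    }
}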

5. Handle Results

Partial results arrive every 100-200ms while the user is speaking:

void OnPartialResult(string text) {
    // Update the UI with the interim result
    // (resultLabel here stands in for a UI text field in your scene)
    resultLabel.text = text;
}

Final response arrives when VAD determines the user has finished speaking. The response includes the recognized text, word timing, and optionally phoneme-level pronunciation scores:

void OnFinalResponse(ASRResponse response) {
    ASRResult result = response.result;

    Debug.Log("Recognized: " + result.cleanText);

    // Word-level details
    foreach (ASRWord word in result.words) {
        Debug.Log("Word: " + word.text +
                  " start=" + word.startTime + "s" +
                  " duration=" + word.duration + "s");

        // Phoneme-level pronunciation scores (if computeGoP was true)
        if (word.phones != null) {
            foreach (ASRPhone phone in word.phones) {
                Debug.Log("  " + phone.text +
                          " score=" + phone.pronunciationScore);
            }
        }
    }

    // Always dispose the response when done to release native resources
    response.Dispose();
}

6. Switching Decoding Graphs

You can create multiple decoding graphs and switch between them when the recognizer is not listening:

// Create another graph
string[] colorPhrases = new string[] {
    "RED", "BLUE", "GREEN", "YELLOW", "ORANGE"
};
recognizer.CreateDecodingGraphFromPhrases("colors", colorPhrases);

// Switch to it (must not be listening)
recognizer.PrepareForListeningWithDecodingGraph("colors");

7. Contextual Decoding Graphs

Contextual decoding graphs let you group phrases into contexts and switch between them at runtime without rebuilding:

// Create a contextual graph with three contexts
string[][] contexts = new string[][] {
    new string[] { "YES", "NO" },           // context 0
    new string[] { "HELLO", "GOODBYE" },    // context 1
    new string[] { "LEFT", "RIGHT", "UP" }  // context 2
};
recognizer.CreateContextualDecodingGraphFromPhrases("myContextGraph", contexts);

// Prepare with a specific context
recognizer.PrepareForListeningWithContextualDecodingGraph("myContextGraph", 0);

// Later, switch to a different context (must not be listening)
recognizer.PrepareForListeningWithContextualDecodingGraph("myContextGraph", 2);

8. Alternative Pronunciations

For reading assessment or language learning, you can define alternative pronunciations for words. When an alternative pronunciation is recognized, the word appears in the result with its tag appended:

// Define a mispronunciation: if "PEAK" is pronounced as "P IH0 K",
// the result will show "PEAK#WRONG"
var altPronunciations = new WordPronunciation[] {
    new WordPronunciation("PEAK", "P IH0 K", "WRONG")
};

// Pass alternatives when creating the decoding graph
string[] phrases = new string[] { "PEAK", "PEEK", "PICK" };
recognizer.CreateDecodingGraphFromPhrases("reading", phrases,
    alternativePronunciations: altPronunciations);

Phone symbols must come from the ASR bundle’s lexicon (lang/phones.txt). Use WordPronunciation.IsValid() to verify a pronunciation at runtime.
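For example (a sketch, assuming IsValid() takes no arguments and returns a bool, as described above):

// Validate a pronunciation before using it to build a graph
var wp = new WordPronunciation("PEAK", "P IH0 K", "WRONG");
if (!wp.IsValid())
    Debug.LogWarning("Pronunciation for PEAK uses phones not in the lexicon");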

9. Dashboard Integration

To upload recognition responses to the Keen Research Dashboard for analysis:

// Start the uploader once after initialization (requires your app key)
KeenASR.StartDataUploader("YOUR_APP_KEY");

// In your final response handler, queue the response before disposing
void OnFinalResponse(ASRResponse response) {
    // ... process result ...

    response.QueueForUpload();
    response.Dispose();
}

// Optionally pause/resume or stop the uploader
KeenASR.PauseUploader();
KeenASR.ResumeUploader();
KeenASR.StopUploader();

10. Teardown

When your app no longer needs speech recognition, release native resources:

KeenASR.Teardown();

After teardown, you can re-initialize with KeenASR.Initialize() if needed.
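A common place to call Teardown() is in OnDestroy() of the component that owns the recognizer (a sketch; adjust the placement to your app's lifecycle):

void OnDestroy() {
    // Release native ASR resources when this component goes away
    KeenASR.Teardown();
}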

Tips

  • VAD tuning: Lower TimeoutEndSilenceForGoodMatch for faster finalization at the cost of potentially cutting off the user. Higher values wait longer for additional speech.
  • Audio interrupts: Register onRecognizerReadyToListenAfterInterruptReceived to handle phone calls or other audio interruptions gracefully.
  • Editor testing: The plugin compiles in the Unity Editor using a stub that logs warnings. Build to a device or simulator for actual recognition.
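For example, to resume listening once an audio interruption ends (a sketch; the event location and handler signature shown here are assumptions — check the plugin's event declaration):

void OnEnable() {
    KeenASR.Instance.onRecognizerReadyToListenAfterInterruptReceived += OnReadyAfterInterrupt;
}

void OnReadyAfterInterrupt() {
    // The audio session is available again; resume listening if desired
    KeenASR.Instance.StartListening();
}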

Complete Example

See Assets/TestKeenASR.cs in the plugin distribution for a complete working example with UI, microphone permission handling, and recognition callbacks.