Common questions about KeenASR SDK, organized by topic.

Product

1. What platforms does the SDK support?

KeenASR SDK runs natively on iOS, Android, web (via JavaScript/WebAssembly), and Linux. A Unity plugin enables cross-platform game and app development, currently supporting iOS and Android. For platform-specific details, see the Platforms section of the documentation.

2. What are the minimum device requirements?

Requirements vary by platform. The SDK runs on a wide range of devices, including older hardware such as iPhone 8 and comparable Android phones. As a general guideline, the CPU clock should be roughly 1.6 GHz or higher. The SDK requires approximately 30-80MB of storage for the ASR Bundle and 120-160MB of RAM during recognition. The SDK runs on a single CPU core and uses 25-80% of that core while recognition is active (closer to 25% on faster, newer processors and up to 80% on older, slower ones). For specific OS version requirements, see the introduction pages for iOS, Android, and Web.

3. Does the SDK work offline?

Yes. All speech recognition processing happens on the device, so no internet connection is required. The SDK delivers real-time results with no network latency and works in environments with limited or no connectivity.

4. Is the SDK compliant with COPPA, GDPR, and other privacy regulations?

Because all audio processing happens locally and no speech data leaves the device, the SDK is well suited for apps that need to meet COPPA, GDPR, or other privacy requirements. No audio is sent to external servers or third-party cloud services for speech recognition processing.

5. How does KeenASR compare to the built-in speech recognition on iOS and Android?

The native speech recognition APIs on iOS (Apple Speech) and Android (Google Speech) are general-purpose and primarily designed for dictation. KeenASR differs in several important ways: it runs fully on-device with no cloud dependency, it allows precise control over what the recognizer listens for (through custom decoding graphs), and it provides consistent behavior across platforms. The SDK also supports custom acoustic models for specialized environments, children’s voices, and domain-specific vocabulary, none of which are available through the platform APIs. In addition, KeenASR provides a number of features designed specifically for EdTech use cases such as reading instruction and assessment, including phoneme-level pronunciation scoring, that native speech recognition services do not offer.

6. How can I try the SDK?

The quickest way to experience KeenASR is through the live web demos, which run entirely in the browser. Among them, the Developer Demo exposes most of the SDK API through a GUI and is useful for experimentation and for getting familiar with the SDK. For a deeper evaluation, proof-of-concept apps for iOS and Android are available on GitHub and use the trial SDK. See Try KeenASR SDK for download links and instructions.

7. What languages are supported?

Keen Research currently provides ASR Bundles for English, Spanish, German, and French. An ASR Bundle optimized for children’s voices is also available for English. ASR Bundles for most major spoken languages can be provided upon request within 6-8 weeks.

8. Does the SDK support multiple languages simultaneously?

The SDK supports one ASR Bundle at a time. To switch between languages, a different ASR Bundle needs to be loaded. This is a fast operation and can be done between recognition sessions, but the recognizer cannot process multiple languages within a single session.

9. Does the SDK provide real-time streaming results?

Yes. While recognition is in progress, the SDK delivers partial results through a callback, allowing the app to display or act on what has been recognized so far. Partial results update as more audio is processed. A final result is delivered once the recognizer determines that the utterance is complete. For more details, see Recognition Results and Callbacks.
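The callback flow can be sketched with a small mock. The listener names and result shape below are hypothetical stand-ins, not the actual KeenASR API; the point is only to show partial results arriving incrementally before a single final result.

```typescript
// Illustrative only: callback names and the Result shape are hypothetical,
// not the actual KeenASR API.
type Result = { text: string; isFinal: boolean };

class MockRecognizer {
  onPartialResult?: (r: Result) => void;
  onFinalResult?: (r: Result) => void;

  // Simulate recognizing "read the book": partials are emitted as audio is
  // processed, and a final result once the utterance is complete.
  run(): void {
    this.onPartialResult?.({ text: "read", isFinal: false });
    this.onPartialResult?.({ text: "read the", isFinal: false });
    this.onFinalResult?.({ text: "read the book", isFinal: true });
  }
}

const recognizer = new MockRecognizer();
const partials: string[] = [];
let finalText = "";
recognizer.onPartialResult = (r) => partials.push(r.text); // update UI as words arrive
recognizer.onFinalResult = (r) => (finalText = r.text);    // act on the complete utterance
recognizer.run();
```

In a real app, the partial callback typically drives live UI updates, while scoring and app logic run on the final result.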

10. Can custom acoustic models be trained for a specific domain?

Yes. Keen Research can train custom acoustic models tailored to specific environments, accents, or use cases. Custom language models can also be created for domain-specific vocabulary and terminology. This is available through a Professional Services Agreement.

11. Can the SDK recognize made-up words?

Yes. The SDK can recognize any word or phrase that can be represented phonetically, including fictional words, brand names, and invented languages. Custom pronunciations can be defined when creating a decoding graph, so the recognizer knows how each word is expected to sound.

12. Does the SDK provide reading fluency or pronunciation scoring metrics?

The SDK does not provide pre-packaged fluency or pronunciation scores at the response level. Instead, the recognition result includes all the building blocks needed to compute these metrics: word-level and phoneme-level timing, and phoneme-level pronunciation scores (Goodness of Pronunciation). This gives developers full control over how to define and calculate fluency, accuracy, or pronunciation quality for their specific use case and pedagogical approach. For details on what is included in the result, see Recognition Results and Callbacks.
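As an illustration, these building blocks can be combined into simple metrics such as words per minute and an average pronunciation score. The result shape below is hypothetical; actual SDK field names and score ranges will differ.

```typescript
// Hypothetical result shape; actual SDK field names and score ranges differ.
interface Word {
  text: string;
  startTime: number; // seconds
  endTime: number;   // seconds
  phonemes: { phone: string; score: number }[]; // per-phoneme scores, 0..1 here
}

// Words per minute over the spoken span, plus an average phoneme score.
function fluencyMetrics(words: Word[]) {
  const durationMin = (words[words.length - 1].endTime - words[0].startTime) / 60;
  const wpm = words.length / durationMin;
  const scores = words.flatMap((w) => w.phonemes.map((p) => p.score));
  const avgScore = scores.reduce((a, b) => a + b, 0) / scores.length;
  return { wpm, avgScore };
}

const words: Word[] = [
  { text: "read",  startTime: 0.0, endTime: 0.4,
    phonemes: [{ phone: "r", score: 0.9 }, { phone: "iy", score: 0.8 }, { phone: "d", score: 0.7 }] },
  { text: "books", startTime: 0.5, endTime: 1.0,
    phonemes: [{ phone: "b", score: 0.9 }, { phone: "uh", score: 0.9 }, { phone: "k", score: 0.8 }, { phone: "s", score: 0.8 }] },
];
const m = fluencyMetrics(words); // 2 words over 1.0 s of speech = 120 wpm
```

How pauses, self-corrections, or omitted words factor into the metric is a pedagogical decision left to the app.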

Technical

13. How is the recognizer configured to listen for specific words and phrases?

The recognizer listens for words and phrases defined in a decoding graph. The SDK provides createDecodingGraph methods that build a decoding graph from a list of phrases. Once created, decoding graphs are persisted on the file system and can be reused across app sessions. To activate a decoding graph, call prepareForListening with a reference to it before starting recognition.

For most use cases, especially those with limited vocabularies, creating decoding graphs directly on the device via the SDK API is the simplest approach. This also works well when the content changes dynamically. For larger vocabularies (more than a few thousand words), it is generally better to create decoding graphs ahead of time using CLI tools in a development environment, and include them with the app.
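The on-device lifecycle described above (create once, persist, reuse, activate) can be sketched as follows. The class and method bodies are mocks; only the method names are modeled on the ones mentioned above, and the real SDK persists graphs to the file system rather than in memory.

```typescript
// Mock of the decoding-graph lifecycle. Method names are modeled on the ones
// mentioned above (createDecodingGraph, prepareForListening); the bodies are
// stand-ins, not the actual SDK.
class MockSDK {
  private graphs = new Map<string, string[]>(); // real SDK: saved on the file system

  decodingGraphExists(name: string): boolean {
    return this.graphs.has(name);
  }

  createDecodingGraph(name: string, phrases: string[]): void {
    this.graphs.set(name, phrases); // build and persist the graph
  }

  prepareForListening(name: string): boolean {
    return this.graphs.has(name); // activate the graph before starting recognition
  }
}

const sdk = new MockSDK();
// Create the graph once; later sessions reuse it by name.
if (!sdk.decodingGraphExists("commands")) {
  sdk.createDecodingGraph("commands", ["turn left", "turn right", "stop"]);
}
const ready = sdk.prepareForListening("commands");
```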

14. What information is included in a recognition result?

A recognition result includes the recognized text, overall confidence score, along with word-level and phoneme-level timing, and phoneme-level pronunciation scores (Goodness of Pronunciation). Additional metadata at the response level, such as audio quality indicators, is also available. This information can be used for display, scoring, analytics, or driving app logic. For a full description, see Recognition Results and Callbacks.
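A hypothetical JSON rendering of such a result, and one way an app might act on it, is sketched below. The field names and structure are illustrative assumptions, not the SDK's actual schema.

```typescript
// Hypothetical JSON shape of a recognition result; the actual SDK's field
// names and structure will differ.
const resultJson = `{
  "text": "turn left",
  "confidence": 0.94,
  "words": [
    { "text": "turn", "start": 0.10, "end": 0.42 },
    { "text": "left", "start": 0.50, "end": 0.85 }
  ],
  "audioQuality": { "clipping": false }
}`;

const result = JSON.parse(resultJson);
// Drive app logic from the parsed result, e.g. require a minimum confidence
// and reject clipped audio.
const accepted = result.confidence >= 0.8 && !result.audioQuality.clipping;
```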

15. What happens when the recognizer encounters a word not in the decoding graph?

The recognizer can only recognize words and phrases defined in the active decoding graph. If a speaker says something outside that vocabulary, the SDK will either map it to a special <SPOKEN_NOISE> token or to the acoustically closest word in the decoding graph. The API provides ways to control how lenient or strict the recognizer should be with the <SPOKEN_NOISE> token, which is useful for being more forgiving with accented speech and mispronunciations. Designing the decoding graph to cover the expected vocabulary for each recognition context is an important part of integration. For guidance, see Decoding Graphs & Acoustic Models.
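For display purposes, apps often strip the <SPOKEN_NOISE> token from the recognized text before showing it to the user. A minimal sketch (the token name comes from the SDK; the word-array shape is a hypothetical simplification):

```typescript
// Strip <SPOKEN_NOISE> tokens before showing recognized text to the user.
// The token name matches the SDK's; the word-array shape is hypothetical.
const SPOKEN_NOISE = "<SPOKEN_NOISE>";

function displayText(words: string[]): string {
  return words.filter((w) => w !== SPOKEN_NOISE).join(" ");
}

const recognized = displayText(["turn", SPOKEN_NOISE, "left"]); // "turn left"
```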

16. Do I need to capture the audio and feed it to the SDK?

No. The SDK manages audio capture from the microphone internally. Once the recognizer is started, it automatically captures and processes audio. There is no need to record audio separately or pass audio buffers to the SDK.

17. What sample rate and audio format does the SDK expect?

The SDK handles audio capture internally and configures the microphone for the correct sample rate and format. In most cases, no manual audio configuration is needed. For details on how audio is managed, see the Audio Handling section on the Getting Started page.

18. Can the audio stack be controlled from the app?

For the most part, no. The KeenASR SDK assumes control of the audio stack on most platforms to guarantee that audio is captured in exactly the format the SDK expects. The audio stack is initialized for the most general use, and callbacks are provided so that other SDKs or audio modules can be set up without interfering. For more information, see the Audio Handling section on the Getting Started page.

For specific use cases that do not match current SDK capabilities, contact us.

19. Can the SDK be set up to work in always-on listening mode?

Yes. The SDK can be configured for continuous listening by restarting recognition in the final response callback. This allows the recognizer to stay active over extended periods, which is useful for hands-free applications, voice-driven workflows, and scenarios where the app needs to listen for commands or phrases over long sessions. For more details, see Continuous Listening.
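The restart-from-the-final-callback pattern can be sketched with a mock recognizer. The class and method names below are stand-ins, not the actual SDK API; the point is the control flow of restarting recognition inside the final-result callback.

```typescript
// Sketch of continuous listening: restart recognition from the final-result
// callback. This recognizer is a mock; the real SDK's method names differ.
class MockContinuousRecognizer {
  onFinalResult?: () => void;
  sessionsStarted = 0;

  startListening(): void {
    this.sessionsStarted++;
  }

  // Simulate the recognizer finalizing the current utterance.
  finishUtterance(): void {
    this.onFinalResult?.();
  }
}

const rec = new MockContinuousRecognizer();
let keepListening = true;
rec.onFinalResult = () => {
  if (keepListening) rec.startListening(); // immediately begin the next session
};
rec.startListening();  // session 1
rec.finishUtterance(); // final result -> session 2 starts
rec.finishUtterance(); // final result -> session 3 starts
keepListening = false;
rec.finishUtterance(); // no restart: listening stops
```

An app-level flag like keepListening gives the app a clean way to exit the loop, for example when the user leaves the voice-driven screen.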

20. Does the SDK support echo cancellation?

Yes. When the app plays audio through the device speaker while the recognizer is listening, echo cancellation (AEC) can be enabled to filter out the device’s own audio output from the microphone signal. This is available on iOS and Android. For setup details and platform-specific considerations, see Echo Cancellation.

21. Can the SDK run in background mode?

Yes. On iOS, notification handling in the SDK can be disabled by setting KIOSRecognizer’s handleNotifications property to NO. Audio interrupts still need to be handled properly, and the audio stack should be deactivated and activated as necessary (for example, when a phone call comes through) using the activateAudioStack and deactivateAudioStack KIOSRecognizer methods.

22. How much will the SDK and ASR Bundle increase the app size?

An ASR Bundle’s on-disk size is typically between 30MB and 80MB, depending on which acoustic model is used and whether optional components are included. Compressed, the corresponding size is approximately 20-45MB, which is how much the app download size increases if the ASR Bundle is embedded in the app.

If size is a concern, the app can download the ASR Bundle on first launch instead of embedding it.

23. What is the memory footprint and CPU utilization when the SDK is running?

Memory footprint is driven by the size of the deep neural network acoustic model and by the size of the decoding graph (primarily by the language model used to build it). For typical use cases, this results in roughly 120-160MB of RAM utilization.

CPU utilization also depends on the size of the model: there is a fixed processing cost related to the acoustic model, since for each frame of audio, features are pushed through the deep neural network. The other factor is graph search, which depends on the size of the graph and various configuration parameters. The KeenASR SDK will typically use a fraction of a single CPU core, ranging from roughly 25% on recent devices to 90% or more on older devices.

24. What are some tips for the development workflow?

Log complete recognition results in partial and final callbacks. Displaying this information on screen during debugging and early user testing helps catch errors early, even if it will not be shown in the final product.

Review all console/logcat outputs as well as log messages with warning or error levels.

Use visual and audio indicators when the app is actively listening; this is a good general UX practice, not just during debugging.

During user testing, the app can be connected to the Dashboard to automatically send audio data and recognition results for further analysis.

For assessing speech recognition performance, the best approach is to collect a small amount of test data and run the SDK against it in a controlled manner to evaluate recognition accuracy.

25. Can the SDK be tested against custom speech data?

Yes. Keen Research provides batch testing tooling packaged in a Docker container that can run the SDK against a set of audio files. Setting up and running batch tests does not require development skills, only configuration files that define the experiment. Typically, Keen Research will set up the initial experiment with a subset of the data and share both the setup and results, so the evaluation can then be continued independently. For guidance on evaluating ASR systems, see Evaluating ASR Systems, Part 1: The Big Picture.

26. Can recordings be captured for later analysis?

Yes. The SDK provides an API to save response audio and JSON data to a local directory, allowing the app to process, export, or analyze recordings as needed. Additionally, the SDK can be configured to upload recordings and recognition results to the Dashboard, where they can be played back, transcribed, and analyzed. Dashboard uploads are disabled by default and should not be enabled in production.

27. What is Dashboard?

Dashboard is a webapp and a private web API that provides visibility into what is happening on the device during speech recognition testing. It allows seamless sync of response data from the app to the webapp, where developers can listen to recorded audio, review recognition results, transcribe recordings, and inspect the recognizer configuration for each session. Dashboard uploads are disabled by default and are intended for development use only. See Dashboard for details.

28. The list of phrases was updated, but new words and phrases are not recognized. Why?

The decoding graph must be recreated after updating the list of phrases. The SDK provides a way to check whether a decoding graph already exists on the file system, so it does not need to be created every time it’s used. However, when the input to the decoding graph creation methods changes (for example, by augmenting or modifying the phrase list during development), the decoding graph must be recreated with the updated input data. A common mistake is skipping the creation call because a decoding graph was already created in a previous run.
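One way to avoid this mistake is to key the "does the graph already exist" check on the phrase list itself, for example by storing a fingerprint of the phrases alongside the graph. This caching scheme is a suggestion for app code, not an SDK feature:

```typescript
// Recreate the decoding graph whenever the phrase list changes, by storing a
// fingerprint of the phrases alongside the graph. This scheme is app-level
// bookkeeping, not an SDK feature.
function fingerprint(phrases: string[]): string {
  // The joined list itself serves as the fingerprint here; for very large
  // phrase lists, a hash of this string could be stored instead.
  return phrases.join("\n");
}

const savedFingerprints = new Map<string, string>(); // persist with each graph

function needsRecreation(name: string, phrases: string[]): boolean {
  return savedFingerprints.get(name) !== fingerprint(phrases);
}

// First run: no fingerprint saved, so the graph must be created.
if (needsRecreation("commands", ["yes", "no"])) {
  // ...create the decoding graph, then record what it was built from:
  savedFingerprints.set("commands", fingerprint(["yes", "no"]));
}
// Same phrases: the graph can be reused. Changed phrases: recreate.
```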

Licensing & Services

29. Is there a trial or evaluation SDK available?

Yes. A trial SDK is available for download and can be used to evaluate KeenASR in a development environment. The trial includes full SDK functionality. See Try KeenASR SDK for details and download links.

30. How much does the SDK license cost?

For mobile and web apps, licensing is typically structured as a yearly recurring fee that covers use of the SDK in a specific product. For custom hardware devices, the typical licensing model is a per-device fee.

Contact us with details about the product and intended use to start the conversation.

A KeenASR license covers all supported platforms (iOS, Android, web, and Unity), access to ASR Bundles trained in-house (including support for additional languages), access to new SDK releases, and use of the Dashboard service during development. For mobile and web apps, there are no fees associated with the number of installations, number of users, or usage.

31. Does Keen Research offer app development services?

Keen Research does not offer general software development services. However, through a Professional Services Agreement, Keen Research can assist with proof-of-concept development (typically focused on the voice user interface), evaluation of a specific task domain, SDK integration, training of custom acoustic and language models, and porting to custom hardware platforms.

32. What kind of support is included with a license?

Licensed customers have access to engineering support for SDK integration, troubleshooting, and configuration guidance. Bug fixes and SDK updates are included as part of the license. Support is provided directly by the team that builds the SDK. For more information, see the Support page.