Frequently Asked Questions

How much will the SDK and ASR Bundle increase the size of my app?

An ASR Bundle's on-disk size is typically between 30MB and 80MB, depending on which acoustic model is used and whether optional components are included in the ASR Bundle. When compressed, the corresponding size is approximately 20MB to 45MB, which is how much the app size would increase if the ASR Bundle were embedded in your app.

If size is a concern, you can have the app download the ASR Bundle when the user launches it for the first time, instead of including the ASR Bundle in the app itself.

What languages do you support?

Keen Research currently provides ASR Bundles for English, Spanish, German, and French. We also provide an ASR Bundle for English that's optimized for children's voices. We can provide ASR Bundles upon request for most major spoken languages within 6-8 weeks.

How does your SDK perform with children's voices?

We provide an English ASR Bundle optimized for kids' voices; its performance on children's speech is equivalent to performance on adult speech on comparable test sets. A number of our customers use the KeenASR SDK in apps for children.

Even though I updated the list of phrases that are passed to the method for creating the decoding graph, the new words and phrases are not recognized. Why?

Make sure that the decoding graph is recreated after you update the list of phrases. The SDK allows you to check if the decoding graph exists in the file system, so that you don't have to create it every time the app is started. However, when you change the input to the decoding graph creation methods (for example, by augmenting or modifying the list of phrases during development), you need to make sure that the decoding graph is recreated using the updated input data. In other words, make sure you are not skipping the call to create the decoding graph simply because it has been created in the past.
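One way to avoid a stale decoding graph is to store a fingerprint of the phrase list next to the graph and recreate the graph whenever the fingerprint changes. The following is a language-agnostic sketch written in Python; it does not use the KeenASR SDK, and the helper names (phrases_fingerprint, needs_rebuild, record_build) and the fingerprint file are hypothetical. In a real app the same logic would be written in Swift, Objective-C, Java, or Kotlin around the SDK's own graph creation call.

```python
import hashlib
import json
from pathlib import Path


def phrases_fingerprint(phrases):
    """Return a stable SHA-256 hex digest of the phrase list."""
    # Serialize deterministically so the same list always hashes the same way.
    payload = json.dumps(list(phrases), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_rebuild(phrases, fingerprint_file):
    """True if no fingerprint is stored, or the stored one does not match."""
    path = Path(fingerprint_file)
    if not path.exists():
        return True
    stored = path.read_text(encoding="utf-8").strip()
    return stored != phrases_fingerprint(phrases)


def record_build(phrases, fingerprint_file):
    """Store the fingerprint once the decoding graph has been (re)created."""
    Path(fingerprint_file).write_text(
        phrases_fingerprint(phrases), encoding="utf-8"
    )
```

With this approach the app would create the decoding graph when the graph is missing or needs_rebuild returns True, and call record_build afterwards. Note that the fingerprint is order-sensitive, so reordering the phrases also triggers a rebuild.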

Can I control the audio stack from my app?

For the most part, no, you cannot (or should not) control the audio stack. The KeenASR SDK assumes control of the audio stack on most platforms. This way we can guarantee that the audio is captured in exactly the way the KeenASR SDK expects it. We typically initiate the audio stack for the most general use and then provide callbacks that allow you to set up other SDKs or audio modules in a way that will not interfere with our SDK. For more information, see the Audio Handling section on the Getting Started page.

If you have a specific use case that does not match current SDK capabilities, drop us a line.

Do you have any tips for the development workflow?

Keen Research recommends you log complete recognition results in partial and final callbacks. It is useful to display this information on the screen during debugging and early user testing. Even if you don't plan to show this information in the final product, it will help you catch errors early on.

Review all console/logcat outputs as well as log messages with warning or error levels.

Use visual and audio indicators for when the app is actively listening; this is a good general UX practice, not just during debugging.

If you are conducting user testing, you can connect your app to the Dashboard and automatically send audio data and recognition results to the cloud for further analysis.

For assessing speech recognition performance, the best approach is to collect a small amount of test data. You can run the SDK against this test data set in a controlled manner to evaluate how well the recognition works.
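A common way to summarize results on such a test set is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal, SDK-independent Python implementation; the function name and the simple lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat" with the recognized text "the dog sat" gives one substitution over three reference words. WER can exceed 1.0 when the recognizer inserts many extra words.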

When can I create a decoding graph on the device?

The answer depends on several factors: (1) the size of your language model, (2) the amount of available CPU and RAM on your device, and (3) how often you need to create decoding graphs. If you are dealing with more than 10,000 words, or if the content you are using to create the decoding graph is large (> 25,000 phrases) you will probably want to work with us and create decoding graphs ahead of time in your development sandbox. Otherwise, you can leverage the SDK to create decoding graphs on the device, as needed.

Consider this answer as a baseline; ultimately, you will need to validate your approach on devices that will be used in a production setting.
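The rule of thumb above can be turned into a quick check to run on your phrase list during development. This is only a sketch: the thresholds are the ones quoted in this answer, the function name is hypothetical, and counting unique whitespace-separated words is an assumption about how vocabulary size is measured.

```python
MAX_ON_DEVICE_WORDS = 10_000     # vocabulary threshold quoted above
MAX_ON_DEVICE_PHRASES = 25_000   # phrase-count threshold quoted above


def suggest_graph_build_location(phrases):
    """Return 'ahead-of-time' or 'on-device' based on the baseline thresholds."""
    # Assumption: vocabulary size = number of unique lowercase tokens.
    vocabulary = {word for phrase in phrases for word in phrase.lower().split()}
    too_many_words = len(vocabulary) > MAX_ON_DEVICE_WORDS
    too_many_phrases = len(phrases) > MAX_ON_DEVICE_PHRASES
    if too_many_words or too_many_phrases:
        return "ahead-of-time"
    return "on-device"
```

The check does not account for available CPU and RAM or for how often graphs are created, so it complements rather than replaces testing on target devices.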

What is the memory footprint and CPU utilization when the SDK is running?

Memory footprint is driven by the size of the deep neural network acoustic model and by the size of the decoding graph (primarily by the language model that was used to build the decoding graph). This will result in roughly 100-150MB of RAM utilization for typical use cases.

CPU utilization will also depend on the size of the model (there is a fixed CPU processing burden related to the size of the acoustic model; for each frame of the audio, audio features are pushed through the deep neural network). The other factor affecting CPU utilization is graph search, which depends on the size of the graph (size of the language model) as well as on various configuration parameters. KeenASR SDK will typically take a fraction of a single CPU core; this will range from ~25 percent (on recent devices) to 90+ percent of a single core on older devices.

Keen Research is working on a number of optimizations for mobile devices that will significantly reduce memory footprint and CPU utilization.

Can the SDK run in background mode?

Yes, as of version 1.5 you can disable notification handling in the SDK by setting KIOSRecognizer's handleNotifications property to NO. You will still need to make sure audio interrupts are properly handled and the audio stack is deactivated and activated as necessary (for example, when a phone call comes through) by using the activateAudioStack and deactivateAudioStack KIOSRecognizer methods.

How much does the SDK license cost?

For mobile apps, licensing is typically structured as a yearly recurring fee per product you want to voice-enable. Depending on the type of your product or app, this structure may vary and can include revenue sharing, per-device installation fees, or a one-time fee.

Keen Research's pricing and licensing structure depends on a number of factors; we don't have a single pricing model for SDK licensing. To start the conversation, let us know what you are building and how you plan to use our SDK.

A KeenASR license includes SDKs for iOS and Android, access to ASR Bundles trained in-house (including support for additional languages), access to new SDK releases, and the use of the Dashboard service during the app development phase. There are no limits on the number of installations, number of users, usage, etc.

Can you build a voice-enabled app for me?

Keen Research does not provide software development services, so most likely the answer is no.

Through our Professional Services Agreement, we can assist with proofs of concept (these will typically be rudimentary apps with a focus on the voice user interface), evaluation of a specific task domain, integration of the SDK in your app, training of custom acoustic and language models, and porting to custom hardware platforms.

Why is my Android app crashing during the KeenASR SDK initialization (Windows dev setup)?

Please check this page for potential problems with line endings in text files that can happen with GitHub on Windows.

Why am I getting a linker error when building my proof-of-concept app?

Make sure you are building the app for a real device, since the KeenASR SDK will not run on the simulator.

Another reason may be that you cloned the iOS PoC project from GitHub without having git-lfs installed on your development machine. Due to the large size of the KeenASR framework, some of its files are managed via git-lfs. If git-lfs is not installed when you clone the project, you will end up with a git-lfs reference file (a small text file) instead of the binary file. You can check the size of the library file (KeenASR.framework/KeenASR): if properly checked out, it will be about 100MB; it should not be very small (a few thousand bytes).

The library in the iOS PoC app is also not properly checked out if you just download the project zip file from GitHub.
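Besides checking the file size, you can inspect the start of the library file: a git-lfs reference (pointer) file is plain text that begins with a "version https://git-lfs.github.com/spec/v1" line. The following development-machine sketch in Python tests for that prefix; the function name is hypothetical, and the path you pass in depends on where you cloned the project.

```python
# First line of every git-lfs pointer file, per the git-lfs pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file starts like a git-lfs pointer instead of real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If is_lfs_pointer returns True for KeenASR.framework/KeenASR in your checkout, the binary was not fetched; installing git-lfs and running git lfs pull in the repository should replace the pointer with the actual library.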