An ASR Bundle's on-disk size is typically between 30MB and 60MB, depending on which acoustic model is used. When compressed, the corresponding size is approximately 20MB to 45MB, which is how much the app size would increase if the ASR Bundle were embedded in your app.
If size is a concern, you can have the app download the ASR Bundle when the user launches it for the first time, instead of including the ASR Bundle in the app itself.
Even though I updated the list of phrases that are passed to the method for creating the decoding graph, the new words and phrases are not recognized. Why?
For the most part, no, you cannot (or should not) control the audio stack. The KeenASR SDK assumes control of the audio stack on most platforms. This way we can guarantee that the audio is captured in exactly the way the KeenASR SDK expects it. We typically initiate the audio stack for the most general use and then provide callbacks that allow you to set up other SDKs or audio modules in a way that will not interfere with our SDK. For more information, see the Audio Handling section on the Getting Started page.
If you have a specific use case that does not match current SDK capabilities, drop us a line.
Keen Research recommends you log complete recognition results in partial and final callbacks. It is useful to display this information on the screen during debugging and early user testing. Even if you don't plan to show this information in the final product, it will help you catch errors early on.
Review all console/logcat outputs as well as log messages with warning or error levels.
Use visual and audio indicators for when the app is actively listening; this is a good general UX practice, not just during debugging.
If you are conducting user testing, you can connect your app to the Dashboard and automatically send audio data and recognition results to the cloud for further analysis.
For assessing speech recognition performance, the best approach is to collect a small amount of test data. You can run the SDK against this test data set in a controlled manner to evaluate how well the recognition works.
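One common way to score such a test set is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The sketch below is a minimal, SDK-independent illustration in Python; the function name and the lowercase/whitespace normalization are assumptions, not part of the KeenASR SDK.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by
    the number of words in the reference."""
    # Assumption: lowercase + whitespace tokenization is adequate for the test set.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Running this over every (reference transcript, final recognition result) pair in the test set and averaging gives a single number you can track as you change the decoding graph or configuration.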
The answer depends on several factors: (1) the size of your language model, (2) the amount of available CPU and RAM on your device, and (3) how often you need to create decoding graphs. If you are dealing with more than 10,000 words, or if the content you are using to create the decoding graph is large (more than 25,000 phrases), you will probably want to work with us and create decoding graphs ahead of time in your development sandbox.
Note that decoding graphs created on the device do not use the rescoring approach in decoding. Also note that this answer relates to the process of creating decoding graphs, which is typically done once.
Consider this answer as a baseline; ultimately, you will need to validate your approach on devices that will be used in a production setting.
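As a rough pre-check, the thresholds mentioned above (more than 10,000 words or more than 25,000 phrases) can be applied to your phrase list before deciding where to build the decoding graph. The following Python sketch is only an illustration: the function name is hypothetical, and counting distinct words by lowercasing and splitting on whitespace is an assumption about how vocabulary size is measured.

```python
WORD_LIMIT = 10_000     # threshold taken from the guidance above
PHRASE_LIMIT = 25_000   # threshold taken from the guidance above


def should_prebuild_decoding_graph(phrases):
    """Return True if the phrase list is large enough that building the
    decoding graph ahead of time is probably the better option."""
    # Assumption: vocabulary size = number of distinct whitespace-separated,
    # lowercased tokens across all phrases.
    vocabulary = {word for phrase in phrases for word in phrase.lower().split()}
    return len(vocabulary) > WORD_LIMIT or len(phrases) > PHRASE_LIMIT
```

A result of False does not guarantee that on-device graph creation will be fast enough; it only means the input is below the sizes named above, so device testing is still needed.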
CPU utilization will also depend on the size of the model (there is a fixed CPU processing burden related to the size of the acoustic model; for each frame of the audio, audio features are pushed through the deep neural network). The other factor affecting CPU utilization is graph search, which depends on the size of the graph (size of the language model) as well as on various configuration parameters.
For medium vocabulary tasks (such as searching a movie library with ~7000 titles), the memory footprint for an SDK with the standard ASR Bundle would be 100-150MB. CPU utilization would be approximately 40% of a single core on the iPhone 6s.
Keen Research is working on a number of optimizations for mobile devices that will significantly reduce memory footprint and CPU utilization.
Keen Research's pricing and licensing structure depends on a number of factors; we don't have a single pricing model for SDK licensing. To start the conversation, let us know what you are building and how you plan to use our SDK.
A KeenASR license includes SDKs for iOS and Android, access to ASR Bundles trained in-house (including support for additional languages), access to new SDK releases, and the use of the Dashboard service during the app development phase. There are no limits on the number of installations, number of users, usage, etc.
Through our Professional Services Agreement, we can assist with proof-of-concepts (these will typically be rudimentary apps with focus on voice user interface), evaluation of a specific task domain, integration of the SDK in your app, training of custom acoustic and language models, and porting to custom hardware platforms.
Another reason may be that you cloned the iOS PoC project from GitHub without having git-lfs installed on your development box. Due to the large size of the KeenASR framework, some of the files in the iOS KeenASR framework are managed via git-lfs. If git-lfs is not installed when you clone the project, you will end up with a git-lfs reference file (a small text file) instead of the binary file. You can check the size of the library file (KeenASR.framework/KeenASR): if properly checked out, it will be 100MB or so; it should not be very small (a few thousand bytes).
The library in the iOS PoC app is also not properly checked out if you just download the project zip file from GitHub.
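To check this without inspecting the file size by eye, you can test whether the library file is a git-lfs pointer. Pointer files are small text files that begin with the line `version https://git-lfs.github.com/spec/v1`. The Python sketch below is a minimal check run on the development box; the function names are hypothetical and the path you pass in is whatever your checkout uses.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file at `path` is a git-lfs pointer file
    rather than the real binary."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def describe_checkout(path) -> str:
    """Return a short human-readable verdict for the library file."""
    if is_lfs_pointer(path):
        return f"{path}: git-lfs pointer only; install git-lfs and run 'git lfs pull'"
    return f"{path}: not a git-lfs pointer"
```

For example, `print(describe_checkout("KeenASR.framework/KeenASR"))` run from the directory that contains the framework reports whether the binary was actually fetched.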