Decoding graphs for large vocabulary tasks (more than ~10,000 words) need to be built ahead of time and either bundled with the app or downloaded after the app has been installed. It is currently not feasible to create large decoding graphs on the device due to memory and CPU constraints. Contact us for more details and for help with creating these decoding graphs.

In the future we plan to provide tools that will allow you to create such decoding graphs, as well as a set of decoding graphs (generic and domain-specific) for large vocabulary tasks. We can also help create language models and decoding graphs for domain-specific tasks (e.g. medical, construction, industrial) using existing data from your enterprise systems.

The SDK provides a startListening method, which will usually be user-initiated; for example, the user taps a button on the screen to start a speech recognition session. You can call stopListening to explicitly stop listening from the app, but we recommend you rely on Voice Activity Detection and let the SDK stop listening automatically when the user stops speaking. See the Getting Started section, which describes this in more detail.
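
A minimal sketch of this flow on iOS is below. The KIOSRecognizer type and its sharedInstance() accessor are assumptions based on the iOS flavor of the SDK; startListening/stopListening are the calls described above. Check the SDK reference for exact names.

```swift
import UIKit

// Minimal sketch. `KIOSRecognizer` and `sharedInstance()` are assumptions
// based on the iOS flavor of the SDK; verify names against the reference.
class ListeningViewController: UIViewController {

    // User-initiated: tapping the microphone button starts a session.
    @IBAction func micButtonTapped(_ sender: UIButton) {
        KIOSRecognizer.sharedInstance()?.startListening()
    }

    // Usually unnecessary: Voice Activity Detection stops listening
    // automatically when the user stops speaking. Call stopListening()
    // explicitly only when you must, e.g. the user leaves the screen.
    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        KIOSRecognizer.sharedInstance()?.stopListening()
    }
}
```
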
ASR Bundle on-disk size will typically be between 30MB and 60MB, depending on which acoustic model is used. When compressed, the corresponding size is about 20MB to 45MB, which is the amount by which the app download size grows.

If this is still a concern, instead of including the ASR Bundle in the app you can download it when the app is run for the first time.
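
A sketch of such a first-run download is below; the hosting URL, local directory name, and archive handling are all placeholders you would replace with your own.

```swift
import Foundation

// Sketch: download the (compressed) ASR Bundle on first launch instead of
// shipping it in the app. The URL and local names are hypothetical, and
// unpacking the archive is left to your preferred library.
func downloadASRBundleIfNeeded(completion: @escaping (URL?) -> Void) {
    let fm = FileManager.default
    let docs = fm.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let bundleDir = docs.appendingPathComponent("keenasr-en-us")  // hypothetical

    // Bundle already present from a previous run -- nothing to do.
    if fm.fileExists(atPath: bundleDir.path) {
        completion(bundleDir)
        return
    }

    // Hypothetical URL: host the compressed bundle on your own server/CDN.
    let url = URL(string: "https://example.com/asr-bundles/keenasr-en-us.zip")!
    URLSession.shared.downloadTask(with: url) { tempURL, _, error in
        guard let tempURL = tempURL, error == nil else {
            completion(nil)
            return
        }
        // Unpack tempURL into bundleDir here (archive handling omitted),
        // then hand bundleDir to the SDK setup call.
        completion(bundleDir)
    }.resume()
}
```
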
We currently provide ASR Bundles for English (US and UK). We can provide ASR Bundles for the majority of frequently spoken languages within 4-6 weeks.
Children's voices are different from adult voices, so we recommend using acoustic models trained on children's speech data; we are in the process of creating such acoustic models. Note that for small vocabulary tasks, even adult acoustic models may work well for children.
Make sure that the decoding graph is (re)created after you update the list of phrases. The SDK provides ways to check whether a decoding graph already exists in the file system, so you don't have to create it every time the app starts. When you change the input to the decoding graph creation methods (by augmenting or modifying the list of phrases, for example), typically during development, make sure the decoding graph is created from the new input data; that is, don't skip the call to create the decoding graph just because it was created in the past.
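
One way to handle this is sketched below, using a versioned graph name so that changing the phrase list forces a rebuild. The KIOSDecodingGraph method names are assumptions based on the iOS flavor of the SDK; verify them against the class reference.

```swift
// `recognizer` is your initialized recognizer instance; the
// KIOSDecodingGraph method names below are assumptions -- verify
// against the SDK reference.
let phrases = ["play", "pause", "next song", "previous song"]
// Bump the version suffix whenever `phrases` changes, so an old graph
// on the file system is never mistaken for the new one.
let graphName = "commands-v2"

if !KIOSDecodingGraph.decodingGraphExists(withName: graphName,
                                          forRecognizer: recognizer) {
    // Runs once; later app launches find the graph on the file system.
    KIOSDecodingGraph.createDecodingGraph(fromSentences: phrases,
                                          forRecognizer: recognizer,
                                          andSaveWithName: graphName)
}
```
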
For the most part, no, you cannot; our SDK takes control of the audio stack on most platforms. This way we can guarantee that audio is captured in the way the KeenASR SDK expects. We typically initialize the audio stack for the most general use and provide callbacks that allow you to set up other SDKs/audio modules in a way that will not interfere with ours. For more details, see the Audio Handling section of the Getting Started document.

If you have a specific use case that doesn't match current capabilities, drop us a line.
We recommend you log the complete recognition result in the partial and final callbacks. It is also useful to display this information on the screen during debugging and early user testing. Even if you don't plan to show this information in the final product, it will help you catch errors early on.
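
On iOS this might look like the sketch below; the delegate protocol, callback names, and the KIOSResult cleanText property are assumptions based on the iOS flavor of the SDK, so verify them against the reference docs.

```swift
// Sketch of logging from the partial and final callbacks. The
// KIOSRecognizerDelegate methods and KIOSResult.cleanText are
// assumptions -- verify against the SDK reference. MyViewController
// is assumed to own a UILabel named resultLabel.
extension MyViewController: KIOSRecognizerDelegate {

    func recognizerPartialResult(_ result: KIOSResult,
                                 forRecognizer recognizer: KIOSRecognizer) {
        print("partial: \(result)")          // log everything in development
        resultLabel.text = result.cleanText  // on-screen feedback for testing
    }

    func recognizerFinalResult(_ result: KIOSResult,
                               forRecognizer recognizer: KIOSRecognizer) {
        print("final: \(result)")
        resultLabel.text = result.cleanText
    }
}
```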

It's also useful to have a visual and audio indication of when the app is listening; this is good general practice, not just during debugging.

If you are doing user testing, you can connect your app to Dashboard and automatically send audio data and recognition results to the cloud for further analysis.

For assessing speech recognition performance, the best approach is to collect a small amount of test data. You can then run the SDK against these files to assess, in a controlled manner, how well recognition works.
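
If you also collect reference transcripts for the test files, a simple word error rate (WER) computation gives you a repeatable metric. The helper below is plain Swift with no SDK dependency; feed it the final result text the recognizer produced for each file.

```swift
// Sketch: word error rate (WER) helper for scoring recognition output
// against reference transcripts. Plain Swift, no SDK dependency.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0.0 : 1.0 }
    guard !hyp.isEmpty else { return 1.0 }  // every reference word deleted

    // Levenshtein distance over words, single-row formulation;
    // substitutions, insertions, and deletions all cost 1.
    var dist = Array(0...hyp.count)
    for i in 1...ref.count {
        var prevDiag = dist[0]
        dist[0] = i
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            let next = min(prevDiag + cost, dist[j] + 1, dist[j - 1] + 1)
            prevDiag = dist[j]
            dist[j] = next
        }
    }
    return Double(dist[hyp.count]) / Double(ref.count)
}

// One substitution over a four-word reference -> WER of 0.25.
let wer = wordErrorRate(reference: "play the next song",
                        hypothesis: "play the best song")
```
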
This depends on several factors: 1) how big your language model is, 2) how much CPU and RAM you have, and 3) how often you need to create decoding graphs. In general, if you are dealing with more than 10,000 words and/or the content you are using to create the decoding graph is large (more than 25,000 entries), you will probably want to create decoding graphs ahead of time on your development box (you will need to work with us).

If you can run this process in a background thread, in a controlled manner, and on more recent devices (iPhone 7 or iPhone X, and equivalent Android devices), you may be able to create larger decoding graphs on the device.

Note that decoding graphs created on the device do not use a rescoring approach in decoding. Also note that this answer relates to the process of creating decoding graphs, which is typically done once.

This answer provides a baseline, but ultimately you will want to test on the devices that will be used in the production setting.
Memory footprint will be driven by the size of the deep neural network acoustic model and the size of the decoding graph (primarily the language model that was used to build the decoding graph).

CPU utilization will also depend on the size of the model (there is a fixed CPU cost related to the size of the acoustic model: for each frame of audio, audio features are pushed through the deep neural network). The other factor in CPU utilization is graph search, which depends on the size of the graph (i.e. the size of the language model) as well as various configuration parameters.

For a medium vocabulary task (e.g. searching a movie library with ~7,000 titles), the memory footprint with some of our in-house models will be around 100MB, and CPU utilization will be around 40% of a single core on an iPhone 6s.

We are working on a number of optimizations for mobile devices that will significantly reduce memory footprint and CPU utilization.