Release Notes Document
Release 1.3, Sept 20th 2017
Enhancement: we reduced the memory footprint of the recognizer by about 20%. In addition, version 1.3 of the SDK supports quantized ASR Bundles, which take 3 to 3.5 times less disk space. Currently, this reduction in ASR Bundle size applies only to on-disk storage (i.e. it affects your app download size). Future releases will bring a similar reduction in memory footprint.
Enhancement: when running recognition from an audio file using the startListeningFromAudioFile: method, we now verify that the sampling frequency, bits per sample, and number of channels match the expected values (audio files should always be mono and 16 bits per sample, and the sampling frequency should match Fs from the ASR Bundle).
Enhancement / Bug fix: we introduced the read-only recognizerState property for KIOSRecognizer instances. This property is meant to replace the listening property, which did not provide sufficient information. For more details, check the different KIOSRecognizerState values that recognizerState can take.
Feature: you can now direct KIOSRecognizer to perform echo cancellation, if available on the device, to remove audio played by the app through the speaker from the signal captured by the microphone. For details, see the performEchoCancellation: and echoCancellationAvailable methods of KIOSRecognizer. This feature is experimental.
Bug fix: cleaned up the audio interruption logic, which was causing crashes on older iOS versions and prevented compilation with older versions of Xcode. The current implementation takes complete control of AVAudioSession management and provides two callback methods via the KIOSRecognizerDelegate protocol that allow developers to: 1) clean up audio playback resources (stop playing audio, save app state as necessary) before the app goes to the background, which allows us to properly deactivate AVAudioSession before the app goes to the background, and 2) set up the UI, etc. after the audio interrupt is over and the app comes to the foreground. For more details see unwindAppAudioBeforeAudioInterrupt and recognizerReadyToListenAfterInterrupt:. unwindAppAudioBeforeAudioInterrupt is not optional, which means that you will have to update your controllers that conform to the KIOSRecognizerDelegate protocol.
Bug fix: the SDK crashed if prepareForListeningWithCustomDecodingGraphWithName: or prepareForListeningWithCustomDecodingGraphAtPath: was called while the app was listening. This is now fixed.
Bug fix: there were a couple of circumstances that would cause a crash when using the startListeningFromAudioFile: method to perform recognition from audio files. These are now fixed.
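If you prepare audio files on a development machine before passing them to startListeningFromAudioFile:, a quick offline check can catch format mismatches early. The following is a minimal Python sketch, assuming standard PCM WAV files, that uses the standard-library wave module to report whether a file is mono, 16-bit, and at the expected sampling frequency. check_wav_format and make_wav_bytes are hypothetical helpers, not part of the SDK, and expected_fs should be the Fs of your ASR Bundle (16000 is used below only as an example).

```python
import io
import wave


def check_wav_format(source, expected_fs):
    """Return a list of format problems; an empty list means the file looks usable.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes == 16 bits
        fs = wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono audio, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if fs != expected_fs:
        problems.append(f"expected {expected_fs} Hz, got {fs} Hz")
    return problems


def make_wav_bytes(channels, sample_width, fs, n_frames=160):
    """Build a silent in-memory WAV file (hypothetical helper for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(fs)
        wav.writeframes(b"\x00" * (n_frames * sample_width * channels))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_wav_bytes(1, 2, 16000), 16000))  # []
    print(check_wav_format(make_wav_bytes(2, 2, 44100), 16000))
```

In practice you would pass a path, e.g. check_wav_format("utterance.wav", 16000), and only hand files with an empty problem list to the recognizer.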
Release 1.2, June 20th 2017
Enhancement: Cleaned up logging and reduced the log level (info->debug) for some messages.
Bug fix: Fixed a bug in audio interruption handling where recognition wasn’t automatically stopped on an audio interrupt or when the app went to the background.
Release 1.1, June 15th 2017
Enhancement: Further improvements to logging so that the format is consistent across the SDK
Enhancement: we introduced another initialization method, initWithASRBundleAtPath. If you don’t want to embed the ASR Bundle in your app (to minimize app size), you can instead download the ASR Bundle after the app has been installed and use this method to initialize the SDK.
Enhancement: Improved handling of out-of-vocabulary words (primarily for non-public ASR bundles)
Enhancement: Minor improvements in CPU utilization (future releases will focus on CPU and memory optimizations)
Bug fix: SDK now gracefully handles audio interruptions (phone calls, the app going to the background, etc.). If the app is listening via KeenASR SDK, it will stop listening as soon as the interrupt occurs (without triggering any callbacks, due to lack of time). Once the interrupt is over and the app resumes, you can implement a callback method to get notified; this is where you would update any UI elements and/or start listening again if your app listens all the time.
Bug fix: resolved another edge case where partial results may not have been reported if they were the same as the last partial result from the previous recognition run.
Release 1.0, April 27th 2017
We changed the name of the framework to KeenASR since the old name occasionally created confusion and we are now working on features that are not Kaldi specific. This change is reflected in two places: 1) the name of the framework is now KeenASR.framework, 2) the main include file is KeenASR.h. There were no changes in class and method names, so the switchover should be simple. To install release 1.0 of the SDK you will need to: 1) remove KaldiIOS.framework from your project, 2) add KeenASR.framework to your project, 3) replace includes of “KaldiIOS/KaldiIOS.h” with “KeenASR/KeenASR.h” in your source code.
Bug fix: partial results were not reported if they were the same as the last partial result from the previous recognition run. For example, if you ran a recognition session and said “doctor”, which was reported via a partial result and then via the final result, and then ran another session and said “doctor” again, the partial result callback would not trigger; the final result callback would still trigger properly.
Enhancement: improved logging; some messages were previously logged in a non-standard format. (Log messages from one of the modules we use are still reported via NSLog, in a different format than the rest of the log messages; this will be changed in future releases.)
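Step 3 can be scripted. Below is a minimal Python sketch, assuming your Objective-C sources (.h, .m, .mm, including a Swift bridging header) reference the umbrella header through #import or #include with either angle brackets or quotes; migrate_includes and migrate_tree are hypothetical helper names. It does not touch Swift `import` statements or `@import` module imports, and steps 1 and 2 still have to be done in Xcode.

```python
import re
from pathlib import Path

# Matches `#import` / `#include` of the old umbrella header with <...> or "..." delimiters.
OLD_HEADER = re.compile(r'(#\s*(?:import|include)\s*)([<"])KaldiIOS/KaldiIOS\.h([>"])')
SOURCE_SUFFIXES = {".h", ".m", ".mm"}


def migrate_includes(source: str) -> str:
    """Point includes of KaldiIOS/KaldiIOS.h at KeenASR/KeenASR.h, keeping the delimiters."""
    return OLD_HEADER.sub(r"\1\2KeenASR/KeenASR.h\3", source)


def migrate_tree(root: str) -> list:
    """Rewrite matching includes in place under `root`; return the paths that changed."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            updated = migrate_includes(text)
            if updated != text:
                path.write_text(updated, encoding="utf-8")
                changed.append(str(path))
    return changed
```

Since the files are rewritten in place, run migrate_tree on a clean working copy under version control so the changes are easy to review.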
Bug fix: release 0.9 introduced a bug which made some of the (non-public) ASR Bundles incompatible with the SDK. You would experience this bug only if you received custom ASR bundles from us.
Release 0.9, March 11th 2017
We switched to the libc++ standard library. When you replace the framework in your Xcode project, you will also need to follow these two steps: 1) Replace the libstdc++.6.0.9.tbd library with the libc++.tbd library: under Targets, choose your target, select the Build Phases tab, open Link Binary With Libraries, click on +, select libc++.tbd, and click Add; then delete libstdc++.6.0.9.tbd. 2) Under Targets, choose your target, select the Build Settings tab, search for “c++”, and make sure that C++ Language Dialect is set to C++11 and C++ Standard Library is set to libc++.
Enhancement: we added support for lattice rescoring when the decoding graph is bundled with the app. The whole process is opaque and driven by the existence of a rescoring const arpa file in the decoding graph directory. We plan to provide additional documentation and make the command line tool for creating decoding graphs available in the near future. If large vocabulary recognition is of interest to you, please contact us. With this approach, the SDK currently supports real-time recognition with a vocabulary of about 80k words on iPhone 6s.
Bug fix: on certain occasions, confidence scores and timings were not provided in the final result. This bug would sometimes occur when one of the words in the decoding graph was not in the lexicon (words.txt file in the ASR Bundle) and its pronunciation was automatically derived by the SDK.
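Because pronunciations for words missing from the lexicon are derived automatically, it can be useful to know in advance which words in your decoding graph fall into that category. A minimal Python sketch, assuming words.txt uses the Kaldi symbol-table layout of one `word integer-id` pair per line; load_vocabulary and out_of_vocabulary are hypothetical helpers. The comparison is case-sensitive, so normalize your sentences to the casing used in your ASR Bundle first.

```python
def load_vocabulary(words_txt: str) -> set:
    """Collect the words from a Kaldi-style symbol table ("word id" per line)."""
    vocab = set()
    for line in words_txt.splitlines():
        fields = line.split()
        if fields:
            vocab.add(fields[0])
    return vocab


def out_of_vocabulary(sentences, vocab) -> list:
    """Return the sorted set of words in `sentences` that are not in `vocab`."""
    return sorted({word for sentence in sentences for word in sentence.split()
                   if word not in vocab})


if __name__ == "__main__":
    vocab = load_vocabulary("<eps> 0\nCALL 1\nDOCTOR 2\n")
    print(out_of_vocabulary(["CALL DOCTOR", "CALL ZYRTEC"], vocab))  # ['ZYRTEC']
```

With a real bundle you would pass the contents of its words.txt, for example Path(bundle_dir, "words.txt").read_text(), where bundle_dir is the location of your ASR Bundle.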
Release 0.8, February 27th 2017
Enhancement: log messages from the framework include KeenASR label to enable easier filtering in apps with numerous log entries
Enhancement: cleanup of log messages (removed extraneous messages and improved the text of some messages)
Enhancement: audio capture now works with Bluetooth devices. NOTE: the quality of audio captured via Bluetooth devices can vary significantly depending on the quality of the Bluetooth device and the type of noise-cancellation algorithms that may be used. Low-quality Bluetooth devices will likely have a significant negative impact on recognition accuracy.
Bug fix: only responses with high confidence are now used to update the adaptation state. This bug may have affected the performance of your apps in use cases with a significant number of (low-confidence) misrecognitions, for example in very noisy environments.
Release 0.7, January 17th 2017
Enhancement: we removed the startListeningWithDecodingGraph method from the KIOSRecognizer class and added a couple of methods that prepare for recognition with either custom decoding graphs built in the app or custom decoding graphs bundled with the app. As a result, this release is not backward compatible with previous versions of the framework.
Enhancement: the KIOSDecodingGraph class has gone through some refactoring. The majority of its methods are now class methods.
Feature: By the end of January we plan to release a command line tool for Mac OS X, called dgBuilder, which will allow you to build custom decoding graphs in your development sandbox and then bundle them with your app. This will allow you to better handle large vocabulary tasks (decoding graph creation on a mobile device takes too long and is memory intensive). Contact us if you are interested in beta testing this tool.
Enhancement: We’ve updated various documentation (Quick Start, Decoding Graphs, etc.) based on your feedback.
Enhancement: In addition to providing versioning information in the application log file, the framework now contains a text file called VERSION.txt with this information. Documentation also specifies framework version, both in the header and the footer.
Release 0.6, December 8th 2016
NOTE: this release requires the updated ASR Bundles; when you update the framework, you will have to download and update ASR bundle(s) as well
Feature: Recognition from the audio file. KIOSRecognizer now provides a few methods to perform recognition from the wav files. While we envision the framework being primarily used for real-time audio, the file-based recognition can be useful for controlled evaluation purposes.
Feature: A more detailed KIOSResult class with an overall confidence score and, for each word, its confidence, start time, and duration.
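To make the shape of the new result data concrete, here is a minimal Python model of what KIOSResult carries according to this note: an overall confidence plus per-word confidence, start time, and duration. The class and field names are hypothetical and do not match the Objective-C API; see the KIOSResult class documentation for the real properties. The numbers in the example are invented, and the 0.7 threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordResult:
    text: str
    confidence: float   # per-word confidence
    start_time: float   # seconds from the start of the audio
    duration: float     # seconds

    @property
    def end_time(self) -> float:
        return self.start_time + self.duration


@dataclass
class RecognitionResult:
    text: str
    confidence: float   # overall confidence for the utterance
    words: List[WordResult]

    def low_confidence_words(self, threshold: float) -> List[str]:
        """Words whose confidence falls below `threshold`, in spoken order."""
        return [w.text for w in self.words if w.confidence < threshold]


if __name__ == "__main__":
    result = RecognitionResult(
        text="call doctor",
        confidence=0.82,
        words=[WordResult("call", 0.95, 0.30, 0.25),
               WordResult("doctor", 0.64, 0.60, 0.40)],
    )
    print(result.low_confidence_words(0.7))  # ['doctor']
    print(result.words[1].end_time)
```

Per-word timings and confidences like these are what make it possible, for example, to highlight or re-prompt only the uncertain part of an utterance instead of rejecting the whole result.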
Feature: Refactored and partially optimized creation of decoding graph.
Enhancement: modified decoding graph creation to dynamically create various resources that were previously stored in the ASR Bundles; unnecessary files have been removed from the new bundles. Version 0.6 of the framework will not work with older ASR Bundles.
Deprecated: the KIOSDecodingGraph method createDecodingGraphFromBigramURL:andSaveWithName has been renamed to createDecodingGraphFromArpaURL:andSaveWithName: to properly reflect the underlying functionality. The old method is deprecated and will be removed in a future release.
Release 0.5, October 20th 2016
Feature: Support for Kaldi NNet3 models, including chain models. With chain models we can now support larger decoding graphs, and language models with over 30k words on iPhone 6.
Feature: Add support for weighting down silence phones when doing adaptation via iVectors. This tends to improve recognition performance for NNet recognizers.
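The idea behind silence down-weighting can be illustrated with a weighted average: frames aligned to silence phones contribute much less to the statistics used for adaptation, so long pauses do not dominate the speaker estimate. The Python sketch below is a deliberately simplified stand-in (a weighted mean of feature vectors, not actual iVector extraction, and not the SDK's or Kaldi's code); the phone labels, the silence phone set, and the default weight of 0.001 are hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def silence_weighted_mean(frames: Sequence[Sequence[float]],
                          phones: Iterable[str],
                          silence_phones: Set[str],
                          silence_weight: float = 0.001) -> List[float]:
    """Average per-frame feature vectors, scaling frames aligned to silence phones.

    silence_weight=1.0 gives a plain mean; 0.0 ignores silence frames entirely.
    """
    dim = len(frames[0])
    totals = [0.0] * dim
    weight_sum = 0.0
    for vector, phone in zip(frames, phones):
        weight = silence_weight if phone in silence_phones else 1.0
        weight_sum += weight
        for i, value in enumerate(vector):
            totals[i] += weight * value
    if weight_sum == 0.0:
        raise ValueError("all frames have zero weight")
    return [total / weight_sum for total in totals]


if __name__ == "__main__":
    frames = [[1.0], [1.0], [100.0]]
    phones = ["K", "AO", "SIL"]
    print(silence_weighted_mean(frames, phones, {"SIL"}, silence_weight=1.0))  # [34.0]
    print(silence_weighted_mean(frames, phones, {"SIL"}, silence_weight=0.0))  # [1.0]
```

With no down-weighting, the single silence frame pulls the estimate from 1.0 to 34.0; with a small weight, the estimate stays close to the speech frames.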
Enhancement: Simplify initialization of the SDK. The type of the recognizer is determined from the ASR Bundle name and doesn’t need to be passed to the init method any more.
Enhancement: Increase the timeout for the trial version of the framework from 5min to 10min.
Bug fix: Fix issue with adaptation state not being carried over to subsequent interactions.
Other: Config file for the DNN ASR Bundle (librispeech-nnet-en-us) has been updated with the parameters that enforce down-weighting of the silence phones for iVector computation.
Release 0.4.2, September 28th 2016
Bug fix: Fix crashes that would happen on stopListening on certain occasions
Enhancement: Remove hard-coded bundle id from the trial version of the framework. From now on, trial version of the framework will work with any app bundle id, but the app will timeout (exit) after 5 minutes. If you need a fully functioning version of the framework, tied to your app bundle id, contact us at email@example.com.
Release 0.4.1, September 11th 2016
Bug fix: Decoding graph creation code would trigger a crash in low-memory conditions
Bug fix: Removed non-public API for logging which caused iTunes Connect to complain and reject apps
Support for iOS 10 and Xcode 8
Note that iOS 10 requires apps to specify the “Privacy - Microphone Usage Description” key in the Info.plist file when the app requires microphone access.
Release 0.4, July 20th 2016
Bug fix: creation of custom decoding graph was intermittently failing
Bug fix: custom decoding graph directory was not properly cleaned up if creation failed
New Feature: KIOSRecognizer now supports Speaker Adaptation control via adaptToSpeakerWithName, resetSpeakerAdaptation, saveSpeakerAdaptationProfile, etc. methods. For details see Speaker Adaptation section of the KIOSRecognizer class.
Release 0.3.2, July 11th 2016
Improved handling of sentences passed to KIOSDecodingGraph methods (removal of irrelevant punctuation, reducing accented words to their ASCII representation, better number interpretation)
Fixed a bug for 8000Hz models (audio was still sampled at 16000Hz)
A couple of “under-the-hood” bug fixes that may have caused the framework to fail when building the decoding graph
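The kind of text cleanup described here can be sketched with Python's standard library: Unicode NFKD decomposition splits an accented letter into its base letter plus a combining mark, and encoding to ASCII with errors ignored then drops the mark. This is an illustration of the general technique only, not the SDK's implementation; normalize_sentence is a hypothetical name, it keeps apostrophes so contractions survive, it silently drops characters that have no ASCII decomposition (such as “ø”), and it does not attempt the number interpretation the SDK performs.

```python
import re
import unicodedata


def normalize_sentence(sentence: str) -> str:
    """Strip accents and punctuation (except apostrophes) and collapse whitespace."""
    # "é" decomposes to "e" + U+0301 under NFKD; the ASCII encode drops the combining mark.
    ascii_text = (unicodedata.normalize("NFKD", sentence)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    # Replace any character that is not a word character, whitespace, or apostrophe.
    ascii_text = re.sub(r"[^\w\s']", " ", ascii_text)
    return " ".join(ascii_text.split())


if __name__ == "__main__":
    print(normalize_sentence("Café, s'il vous plaît!"))  # Cafe s'il vous plait
```

Casing is left untouched on purpose, since the expected casing depends on the lexicon of the ASR Bundle you use.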
- Speaker Adaptation (control via API)
- Bug fix for an exception triggered when stopListening is called from the recognizerPartialResult callback
Release 0.3.1, Jun 20th 2016
Control of Voice Activity Detection parameters via KIOSRecognizer’s setVADParameter:toValue: method
New [KIOSRecognizer initWithRecognizerType:andASRBundle:] method in KIOSRecognizer that allows initialization of KIOSRecognizer without passing a relative path to the decoding graph. If you are not bundling decoding graphs with your app, you will most likely use this method to initialize the engine.
Audio sampling frequency is no longer hard-coded to 16kHz. That is the default value, but if the mfcc.conf file specifies a different value via sample-frequency, the engine will use that value. This is only relevant if you are using the framework with your own acoustic models.
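The rule described above (16 kHz unless mfcc.conf sets sample-frequency) is easy to mirror offline, for example to confirm which rate your own acoustic model expects before preparing audio for it. A minimal Python sketch, assuming Kaldi-style config lines of the form `--option=value` with `#` comments and assuming the last occurrence of an option wins; sample_frequency_from_conf is a hypothetical helper, not the SDK's parser.

```python
DEFAULT_SAMPLE_FREQUENCY = 16000  # default described in the release note


def sample_frequency_from_conf(conf_text: str) -> int:
    """Return the sampling frequency set in an mfcc.conf, or the 16 kHz default."""
    fs = DEFAULT_SAMPLE_FREQUENCY
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("--sample-frequency="):
            # Kaldi stores this option as a float, so accept values like "8000.0".
            fs = int(float(line.split("=", 1)[1]))
    return fs


if __name__ == "__main__":
    conf = "--use-energy=false\n--sample-frequency=8000  # telephone-band model\n"
    print(sample_frequency_from_conf(conf))  # 8000
    print(sample_frequency_from_conf("--use-energy=false\n"))  # 16000
```

Options other than sample-frequency are ignored, so the helper only answers the one question this note is about.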
Several performance enhancements and bug fixes in different methods of
We’ve also updated the proof of concept app on Github with several demos that show how to create custom decoding graphs.
Release 0.3, Jun 12th 2016
Support for dynamic creation of decoding graphs. Using an instance of the KIOSDecodingGraph class, you can now create a decoding graph in your app by providing a list of sentences or a bigram language model (see QuickStart and the class documentation for more details)
Consolidated logging; you can now control the level of logging from the framework (defaults to WARN). See logLevel property of the KIOSRecognizer class
Several bug fixes and performance enhancements
Note that you will need to also download and update ASR Bundles in order to use the dynamic creation of the decoding graph (ASR Bundles now contain additional information that allows building of decoding graphs)