How We Started With On-Device Speech Recognition: The Story of Keen Research
How the Frustration of Teaching Multiplication Facts Led to a New Force in On-Device Speech Recognition
Author: Ognjen Todic | September 14, 2022
It all began about six years ago when a friend jokingly complained to me about the repetitive chore of helping his daughter practice her multiplication facts. “Why isn’t there an app for that?” he asked. That got me thinking. I began toying with prototypes for just such an app – a voice-enabled solution that would quiz a learner on multiplication facts using a mobile phone or tablet. Among other things, the app would have to run anywhere, even without an internet connection, which meant it would need on-device speech recognition.
After a couple of months, the multiplication app was in the App Store. That’s when a realization hit me: many developers would likely face the same challenge, yet there was no good solution on the market for on-device speech recognition.
So I decided – then and there – to build my own.
By that time, I already had a good deal of experience in speech recognition. Beginning in the late 1990s, I had spent years in speech recognition R&D roles at Entropic Research Lab and Ordinate Corporation (now part of Pearson).
I took a deep dive into recent advancements in speech recognition technology, including the use of deep neural networks, and before too long I had a working iOS prototype of what eventually became the KeenASR SDK. What began as a fun engineering challenge – could I build a working offline speech recognition system for mobile devices? – became something much more significant when I considered the ramifications.
An effective on-device speech recognition SDK would be able to:
- allow applications to run in locations without reliable Internet access.
- protect privacy and, out of the box, support compliance with regulations such as GDPR and COPPA by not sending personal data off the device.
- allow app developers to offer more affordable apps that don’t require backend speech processing from costly third-party cloud services. On-device processing also scales naturally, because each device performs its own recognition.
- deliver mobile apps with a superior user experience, thanks to low latency and always-on listening.
- enable developers to customize the SDK to fit their own use cases.
Encouraged by these opportunities, Keen Research morphed from a software development shop into a dedicated on-device automatic speech recognition SDK provider.
Our growth has been fueled by customer demand and our ongoing commitment to making our SDK more versatile, powerful, and efficient. The EdTech and KidTech markets were the first to show interest, so we developed acoustic models optimized for children’s voices, as well as a number of features to support oral reading instruction and assessment and language learning use cases. We also added support for Android and broadened the SDK’s game support by creating a Unity plugin for on-device speech recognition. In addition to English, we extended language support to include Spanish, German, and French.
It’s been a busy and eventful six years. Today, KeenASR SDK powers voice-interactive apps that are used by millions of people. Examples include: the voice-activated digital metronome “Hey Metronome”; Reading Adventure from Osmo; Noggin from Nickelodeon; the award-winning Novel Effect augmented audio reality app for storytelling; the PBS Kids “The Cat in the Hat Invents” app; the 4.7-star-rated Rally Reader coaching tool; MiraCheck, a virtual assistant for sports pilots; the highly lauded Readability Tutor; eoStar, a warehouse management solution with voice-picking support; Ambifi, an ambient intelligent guidance app supporting digital checklists, workflows, and procedures… and many more.
Along the way, Keen Research grew from a team of one (me, wearing way too many hats) to a team of four. (We are hiring, by the way.) We are bootstrapped and growing the business organically.
Our goal is to provide developers with solutions that are cost-effective, easy to integrate, and work reliably across many different use cases. We’re helping kids learn how to read and learn languages. We’re enabling people with disabilities to easily control their medical devices using their voices. We’re making frontline workers’ jobs easier and safer, and much, much more.
We’re excited by our progress and the opportunities that lie ahead. And to think: it all started because a tired dad wanted a better way to teach his daughter multiplication.
We’d love to hear how we can help you find a better way. Try our KeenASR SDK and let us know.