It is more than half a century since the first computerised voice recognition program, Audrey, made its début. Designed by Bell Laboratories (of Alexander Graham Bell fame), it boasted the ability to understand digits that were read aloud by a single person: a sensational breakthrough at the time.
We are now several leagues ahead in terms of sophistication, and voice recognition is beginning to play a significant role in our daily lives. Finding its way into everything from mobile phones to dedicated home devices, voice is quickly becoming one of the fundamental tools at our disposal for controlling the technology around us.
From the smartphone to the smart home
The first flickers of the voice recognition revolution emerged in the early 2010s with the launch of both Google Now and Siri. Initially devised as spoken search tools, these interfaces – along with competing software such as Microsoft Cortana – have since become intelligent personal assistants embedded within our phones and computers, able to carry out basic commands such as sending texts, locating restaurants and scheduling meetings.
The global voice recognition market is still dominated by many of these players: Apple, Microsoft, Google and IBM are all among the leaders in terms of global market share. Yet this landscape could be about to change, as voice recognition becomes more prevalent as a control interface for the Internet of Things.
The best-known iteration of voice control is the Amazon Echo which, despite not being available anywhere outside the USA, has become the centre of attention in the smart home market. This voice-controlled smart speaker has also proven convenient as a hub for the connected home, enabling several IoT-based home technology systems to be managed from a single device.
While the Echo itself has proven popular among smart home enthusiasts, the potential of Amazon’s Alexa Voice Service is greater still. Last year, Amazon opened the service up for integration with third-party hardware developers; now an array of Alexa-based products is making its way onto the market.
The chasing pack
While Amazon may have swooped in and stolen a march, others are now making inroads into the connected home. Chief among them is Google, which recently announced a direct competitor to the Echo in the form of Google Home, its proprietary smart speaker-hub hybrid. This is paired with Google Assistant, an evolution of its Now voice recognition platform that takes a more AI-based, conversational approach. This allows it to understand the context of secondary questions and offer an appropriate answer without users needing to repeat themselves. Apple, meanwhile, is reportedly planning on creating its own Siri-based speaker, although this is still in development.
Another company looking to build a more sophisticated alternative to Alexa is Viv, from the original creators of Siri. Like Google’s offering, this can stack requests for information and analyse context – but it can also generate programs dynamically in order to devise the optimum means of answering the specific question being asked.
Even the luxury smart home space – where seamless control is a crucial part of the homeowner experience – is seeing voice recognition emerge as a way of simplifying the user experience. Josh.ai is targeting the premium connected home environment with its Josh box, a $14,000 (£9,500) device that promises to control several luxury connected home systems such as Crestron, Lutron and Kaleidescape.
Crestron itself has already moved into voice recognition, with its latest touchscreen control panel supporting speech-based instruction that can ‘control any function’ within the property. Neither of these supports far-field always-listening technology, although Josh.ai has stated that it is on its roadmap for implementation at a later date and Crestron is making speech control ‘a very important part’ of its development pathway.
“Voice recognition is a highly compelling new frontier of control, but the key is making sure it is integrated in ways that are intuitive and natural, which will be our continued focus of development going forward.
There will always be a drive to make voice recognition more intelligent, and the technology has already seen advances that will eventually translate into compelling new applications. One can imagine uses where the subtleties of voice – such as inflection – could add a new dimension to voice recognition and control.”
– Byron Wendling, technology manager for touchscreens and user interfaces at Crestron
What is the value of voice recognition?
While touchscreens and button interfaces have long been the de facto control medium for home technology, there are several scenarios where they are not the optimal solution. When you want to quickly access information or look something up, for example, it can be frustrating to navigate menus or input search terms. Likewise, when performing manual tasks such as cooking or gardening it is inconvenient to have to stop what you’re doing to adjust a setting or locate a certain piece of information.
The meteoric rise of cloud-based music and video streaming in the home also presents an opportunity for voice command. Research conducted by AI platform MindMeld (which took a representative sample of 1,800 US-based users of intelligent voice assistants) indicated that 50% of those surveyed wanted ‘voice enablement’ in their music apps, allowing them to multi-task at home and express commands without having to be in close proximity to a control device. This has been a central tenet of the new Google Home device, which links up with Google’s Chromecast devices to make it easier for homeowners to manage their media consumption from a central point.
While the home market is ripe for a voice recognition explosion, the automotive industry is also beginning to reap the benefits of hands-free interaction. Maserati, which exhibited an Android Auto voice-enabled interface at Google I/O, is among several car manufacturers that are aiming to integrate connected vehicle technology into their product lines. Adoption is predicted to rise swiftly, with more than half of all automobiles expected to have speech recognition built in by 2019.
One of the main areas that Amazon is looking to crack is voice ordering. Its Echo hubs work in a similar manner to its one-click ordering process, allowing Prime members to order an Amazon product directly through the device and track the status of orders in real time.
Where does voice recognition go next?
Science fiction has done much of the heavy lifting in terms of devising the ideal speech recognition tools – think Doctor Who’s TARDIS or Star Trek’s Universal Translator. While these are idealised plot devices, their applications have real-life counterparts that are attempting to bring these dreams to life.
The concept of a universal translator that could help people understand each other, no matter the languages involved, has long been a pipe dream of writers and businesspeople alike. The idea is so pervasive that even the US military has invested resources into development of such a tool, teaming up with IBM on a $32.8m project named BOLT, or the Broad Operational Language Translation project, to help its forces work more effectively in the countries it operates in.
It is not just governments seeking to create a universal communication device: Microsoft has released an instant translation tool for its voice calling platform Skype, albeit with limited functionality, while Waverly Labs is taking pre-orders for a smart earpiece that translates directly into a user’s ear like a real-life Babel Fish.
Yet translation isn’t the only barrier that researchers and corporations alike are trying to overcome. Nuance continues to lead the way in terms of voice dictation with its Dragon software, while several companies are examining the possibility of using speech recognition for the purpose of identity verification and voice biometrics.
With many of these, the end goal is to develop systems that can offer better-than-human accuracy. Graduate students from the University of Rochester have developed software that correctly identifies the emotions behind 700 human utterances with a 72% success rate, compared to 60% from the human control group. While such projects are still largely at the R&D stage, it is not unreasonable to predict that we could be on the cusp of a rapid rise in the sophistication of speech recognition technology.
Is voice recognition the future of the connected home?
“The home is a complex environment where voice is sometimes, but not always, the right solution. The room might be loud, pressing pause might be quicker than telling Alexa to stop, and remote access will be vital.
Because of this we believe physical switches and dimmers will remain, mobile apps will continue to rise, and visual interfaces such as touch screens and tablets will get more sophisticated. More importantly, we believe the system will get smarter through big data and machine learning algorithms, automating and predicting tasks without the need to even say a word.”
– Alex Capecelatro, co-founder and CEO at Josh.ai
Voice recognition undoubtedly has a huge part to play in our connected futures, but there is still a way to go before it becomes the easiest way for us to control our homes. Significant work is needed across all platforms to improve the various technologies that combine to create a successful speech recognition platform, including deep learning algorithms, signal acquisition and natural language processing. Yet as these platforms gain more users, they can become more useful tools themselves: the growing volume of data gathered can be used to refine their algorithms and help them work more effectively with humans.
One of the drawbacks that the likes of Siri and other mobile-based voice platforms have found thus far has been the reluctance of users to use speech recognition tools in public spaces. This is particularly true for the younger generation, which generally prefers texting to calling. The privacy offered by the home may provide a more welcoming environment for people to engage with voice control, allowing homeowners to become comfortable using this technology without the fear of embarrassment.
There can be no doubt that voice control has arrived in the connected home; the only question is how long it will take until it becomes the principal means by which we communicate with the technology in our properties. It’s not there yet, but it might be slightly closer than you think.