September 9, 2019
Computers have been a part of our lives for decades now. Despite giant leaps in computing power, there have been two constant peripherals – the keyboard and mouse. Using a computer means flirting with carpal tunnel as you interact with one or both of these elements in some way. The advent of the touchscreen and incorporation of hands-free computing has started to change all of this.
Our smallest devices are now the most innovative, while desktops remain static and dated. As these technologies begin to mature and converge across the devices we use, the hands-free future of science fiction is tumbling towards reality.
Apple’s Siri and its direct competitor Google Now are two great examples of how hands-free computing – in the form of speech – has become a part of everyday life. The frustrating reality is that using voice to interact with machines still has a way to go before it’s flawless. These days, Siri is still liable to mistake your “open my email” command for “look up some kale” – and a misheard command is a useless one.
This is changing though. Increasingly, the experience of speaking to your mobile device and having it understand your request and execute it seamlessly is eliciting surprise. Voice recognition in machines is getting better. The next few years in voice and speech recognition will see hands-free computing at the forefront of human-machine interaction.
As more voice usage data becomes available, speech recognition accuracy will keep improving. It’s a trend known as the “virtuous cycle of AI,” enabled by deep learning and massive amounts of data. It works like this: the more people use voice interfaces, the more data is gathered, and the more accurate the recognition becomes. Just this year, IBM announced that it reached a new industry record in conversational speech recognition.
The IBM team’s system achieved a 5.5 percent word error rate – down from 6.9 percent in 2016. This benchmark was measured on a difficult task with the machine deciphering conversations between humans discussing day-to-day topics like buying a car. It’s a test known as SWITCHBOARD and has been around for more than two decades. So, how did IBM achieve this? Through deep learning.
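Word error rate, the metric behind that 5.5 percent figure, is the standard yardstick for speech recognition: the minimum number of word substitutions, insertions, and deletions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. As a rough sketch (IBM’s exact evaluation pipeline isn’t described here, and the example sentences are ours), it can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words
    # rather than characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("open my email", "open my email"))      # 0.0 – a perfect transcript
print(wer("open my email", "look up some kale"))  # ≈ 1.33 – every word wrong, plus an extra word
```

Note that WER can exceed 100 percent when the transcript inserts extra words, which is why “5.5 percent” on spontaneous conversation is such a strong result.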
Julia Hirschberg, a professor at the Department of Computer Science at Columbia University, says that this is a major development in light of the challenges facing hands-free voice computing. “The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex.”
The continued improvement of voice recognition and hands-free computing gives huge momentum to the proliferation of digital assistants. Assistants like Alexa, Siri, Cortana, and Google Assistant could spell the end for screens as a computing interface and replace individual apps with a single point of contact.
Increasingly, these assistants are becoming the maestros of our daily routines, using AI, big data, and voice prompts to arrange and organize our lives. Cortana, for example, mines your emails, calendars, and digital workspace to learn about your day-to-day activities. Don’t get a fright the next time your digital assistant reminds you, unprompted, of that appointment you completely forgot about.
Machines already see better than humans can, recognize objects faster, and hear more accurately. Eventually, they’ll be able to understand and interpret more accurately too. What does a world where computers listen to everything we say look like? Will it change the way we live? It’ll certainly change how we interact with our devices. A conference room, vehicle, or wearable device that listens to our conversations, understands them, and interprets them into what we need will eventually become the norm. The question is, will we be willing to accept this?
If you’re curious about how the development of hands-free computing technologies like voice recognition is shaping our lives, read our latest article. It outlines how AI is similarly disrupting the healthcare industry through automation and deep learning.