Recent research from search giant Google suggests that smartphone users may soon be able to dictate text and issue voice commands without a network connection. Researchers at the company have built a lightweight yet accurate speech-recognition system that runs entirely on a Nexus 5 smartphone. It is small enough to run “faster than real time” on that phone without an Internet connection.
New Google AI Brings Voice Recognition Commands Without the Need for an Internet Connection
This new Google AI system does not rely on computation in a remote data center. That lets it get around a well-known obstacle for AI assistants, which need a reliable network connection to handle voice commands or speech recognition. The system can be used on- or offline on a smartphone, a smartwatch, or any other memory-constrained device.
The project is described in a new paper by a team of Google researchers. Their main goal was to build a lightweight yet accurate speech-recognition system that runs on the device itself and handles spoken commands without a network connection.
The system is lightweight in both storage and memory, with a footprint of just 20.3MB. Tested on a Nexus 5 with a 2.26GHz CPU and 2GB of RAM, it achieved a 13.5-percent word error rate on an open-ended dictation task.
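Word error rate, the metric behind the 13.5-percent figure, is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard edit-distance calculation; it is not Google's evaluation code, and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Invented example: one substitution ("reschedule" -> "schedule") in an 8-word reference.
reference = "send an email to alex can we reschedule"
hypothesis = "send an email to alex can we schedule"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints "12.5%"
```

Read the same way, a 13.5-percent word error rate works out to roughly 13 or 14 word errors for every 100 words actually spoken.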
Using a variety of techniques, the researchers compressed the acoustic model to one-tenth of its original size. This keeps system requirements low across the two domains the system targets: dictation and voice commands. As the researchers note, embedded speech-recognition systems that work offline can already handle certain commands. For instance, a user can say, “Send an email to <name of contact person>. Can we reschedule?” The system transcribes the request immediately and then carries out the command later, once the device has a reliable Internet connection.
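The article does not say how the deferred email is handled internally, but the behavior it describes (transcribe now, act later) can be pictured as a simple queue. The Python sketch below is purely hypothetical: `PendingCommand`, `DeferredCommandQueue`, and the `"send_email"` action name are invented for illustration and are not part of any Google API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingCommand:
    action: str     # hypothetical action name, e.g. "send_email"
    recipient: str
    body: str


class DeferredCommandQueue:
    """Hypothetical holder for commands transcribed offline, run once a connection returns."""

    def __init__(self):
        self._pending = deque()

    def submit(self, command, online, execute):
        """Execute immediately if online; otherwise keep the command for later."""
        if online:
            execute(command)
        else:
            self._pending.append(command)

    def flush(self, execute):
        """Run queued commands in arrival order; return how many were executed.

        A command is removed only after execute() returns, so one that raises
        stays at the front of the queue for the next attempt.
        """
        executed = 0
        while self._pending:
            execute(self._pending[0])
            self._pending.popleft()
            executed += 1
        return executed

    def __len__(self):
        return len(self._pending)


sent = []
queue = DeferredCommandQueue()
command = PendingCommand("send_email", "<name of contact person>", "Can we reschedule?")
queue.submit(command, online=False, execute=sent.append)
print(len(queue), len(sent))  # prints "1 0"
queue.flush(execute=sent.append)
print(len(queue), len(sent))  # prints "0 1"
```

The key design point is the split between recognition and execution: speech recognition happens on the device right away, while the step that genuinely needs the network is postponed.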
To train the Google AI, the researchers used three million utterances, about 2,000 hours of audio, drawn from Google's voice search traffic. To make recognition more robust, they also added noise samples taken from YouTube videos to the training audio.
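A common way to add background noise to training audio is to scale a noise clip so the mix reaches a chosen signal-to-noise ratio (SNR) and then add it sample by sample. The Python sketch below shows that general technique using only the standard library; the function names, the 16 kHz sine wave standing in for speech, and the 5 to 25 dB SNR range are all assumptions for illustration, not details from Google's paper.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the result has the requested SNR in dB.

    Assumes both signals share a sample rate; the noise clip is repeated
    (tiled) or truncated to match the speech length.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR_dB = 20 * log10(rms(speech) / rms(scaled noise))
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + scale * n for s, n in zip(speech, tiled)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix by treating (mixed - speech) as the noise."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 20 * math.log10(rms(speech) / rms(residual))


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s stand-in
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]                      # stand-in noise clip
snr = rng.uniform(5.0, 25.0)  # assumed SNR range, picked per training example
mixed = mix_at_snr(speech, noise, snr)
print(f"target {snr:.2f} dB, measured {measured_snr_db(speech, mixed):.2f} dB")
```

Drawing a fresh noise clip and SNR for each utterance exposes the model to many different acoustic conditions from the same clean recordings.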