Sometimes you need a Speech-to-Text (STT) technique so that machines can recognize spoken commands. This tutorial introduces the simplest way to do speech recognition in JavaScript. It is aimed at beginners and shows you how to create web pages that recognize what people say by using the Web Speech API.
None of the code here is complicated, so you can follow it easily even if you are still a student. To help your learning, we provide a download link to a zip file containing all the source code for future use.
BONUS
Source Code Download
We have released it under the MIT license, so feel free to use it in your own projects or school homework.
Download Guideline
- Prepare an HTTP server such as XAMPP or WAMP in your Windows environment.
- Download the zip file and unzip it into a folder that the HTTP server can access.
SECTION 1
The Basics
Let's study an experimental technology, the Web Speech API, for speech recognition in JavaScript. However, it does not work in every browser on every OS platform. At the time of writing, it works only in Chrome on Windows and Android.
Web Speech API
The Web Speech API, as documented by Mozilla (MDN), provides interfaces for recognizing what people say. It converts voice from the microphone into text. However, it relies on the browser's own speech recognition service, so it is available only in the browsers listed as supported in the compatibility table.
Speech recognition is accessed via the SpeechRecognition interface, which provides the ability to recognize voice context from an audio input (normally via the device’s default speech recognition service) and respond appropriately. Generally, you’ll use the interface’s constructor to create a new SpeechRecognition object, which has a number of event handlers available for detecting when speech is input through the device’s microphone. The SpeechGrammar interface represents a container for a particular set of grammar that your app should recognise. Grammar is defined using JSpeech Grammar Format (JSGF).
In principle, specifying a list of expected words is meant to improve recognition accuracy. When audio arrives, the results include the recognized words and their confidence scores. The word list must follow the JSpeech Grammar Format (JSGF).
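To make the JSGF requirement concrete, here is a minimal sketch of a helper that turns a plain word list into a JSGF string shaped like the grammar used later in this tutorial. The function name buildJsgfGrammar and its parameters are our own illustration, not part of the Web Speech API. It does no escaping, so it assumes the words are simple tokens.

```javascript
// Hypothetical helper (not part of the Web Speech API): builds a JSGF V1.0
// grammar string with one public rule whose alternatives are the given words.
// Assumes the words are simple tokens that need no escaping.
function buildJsgfGrammar(grammarName, ruleName, words) {
  if (words.length === 0) {
    throw new Error("A JSGF rule needs at least one alternative");
  }
  // Alternatives in a JSGF rule are separated by "|".
  const alternatives = words.join(" | ");
  return "#JSGF V1.0; grammar " + grammarName + "; public <" + ruleName + "> = " +
    alternatives + " ;";
}

// Usage: a short animal grammar.
const sample = buildJsgfGrammar("animals", "animal", ["kangaroo", "monkey", "zebra"]);
console.log(sample);
// prints: #JSGF V1.0; grammar animals; public <animal> = kangaroo | monkey | zebra ;
```

Generating the string from an array keeps the word list in one place, so the same array can also be reused when checking results.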
Browser Compatibility
Although the browser compatibility table lists Microsoft Edge as supported, many people using Edge get unexpected results, as reported on the Microsoft forum. Our tests produced the same results. Therefore, our example focuses only on Chrome for Windows and Chrome for Android.
SECTION 2
Speech Recognition
In this section, we build a recognition module in JavaScript with the Web Speech API. First, we show how to configure the recognizer. Then, when an audio stream arrives from the microphone, the module translates it into a word or command.
Using jQuery Mobile
jQuery Mobile makes the page friendlier on mobile devices running Chrome for Android. However, you also need to check that your jQuery and jQuery Mobile versions are compatible.
<link rel="stylesheet" href="https://ajax.googleapis.com/ajax/libs/jquerymobile/1.4.5/jquery.mobile.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquerymobile/1.4.5/jquery.mobile.min.js"></script>
</head>
Using the webkit Prefix
In Chrome, you must create the SpeechRecognition object through its webkit-prefixed constructor, webkitSpeechRecognition. With that, speech recognition in JavaScript works in Chrome. So remember to map the prefixed objects to the standard names, as below.
var SpeechRecognition = webkitSpeechRecognition;
var SpeechGrammarList = webkitSpeechGrammarList;
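Because this tutorial targets only Chrome, the prefixed names are enough. If you also want the page to use an unprefixed SpeechRecognition where a browser provides one, a common pattern is to check both names and fail gracefully. The helper getSpeechApi below is hypothetical. It takes the global object as a parameter, which is normally `window`, so that it can also be tried outside a browser.

```javascript
// Hypothetical helper: returns the speech recognition constructors available
// on the given global object (normally `window`). It prefers the standard
// names and falls back to Chrome's webkit-prefixed ones.
function getSpeechApi(globalObj) {
  const Recognition = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  const GrammarList = globalObj.SpeechGrammarList || globalObj.webkitSpeechGrammarList;
  if (!Recognition) {
    // Not supported: the caller should show a message instead of a mic button.
    return null;
  }
  // The grammar list constructor may be missing in some browsers.
  return { Recognition: Recognition, GrammarList: GrammarList || null };
}

// In the page, you would call it with the real window object:
// const api = getSpeechApi(window);
// if (api) { recog = new api.Recognition(); }
```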
When the user clicks a button, the page starts listening to the microphone. Once speech is received, it is recognized.
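One detail of the click-to-listen flow is that the specification makes start() throw an InvalidStateError if recognition is already running. The sketch below shows one way to ignore repeated clicks. ListenButtonController is a hypothetical name. It relies only on the standard start() method and the onstart/onend handlers, and it assumes the button has the id "listen".

```javascript
// Hypothetical controller for a "listen" button. It wraps any object with the
// SpeechRecognition start() method and onstart/onend handlers, and ignores
// extra clicks while a session is running.
class ListenButtonController {
  constructor(recog) {
    this.recog = recog;
    this.listening = false;
    recog.onstart = () => { this.listening = true; };
    recog.onend = () => { this.listening = false; };
  }

  // Call this from the button's click handler. Returns true if it started.
  handleClick() {
    if (this.listening) {
      return false; // already listening; ignore the click
    }
    this.listening = true; // set early so a fast double click is also ignored
    try {
      this.recog.start();
    } catch (err) {
      this.listening = false; // start failed, so allow another attempt
      throw err;
    }
    return true;
  }
}

// In the page (assuming a configured recog and a button with id "listen"):
// const controller = new ListenButtonController(recog);
// document.getElementById("listen").addEventListener("click", () => controller.handleClick());
```

Because onend also fires after recog.stop() in the callbacks, the button becomes usable again as soon as one recognition finishes.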
Grammar or Commands
Next, the code defines the grammar, or commands, that we want to recognize, written in the JSpeech Grammar Format (JSGF). The variable below holds our grammar:
var grammar = '#JSGF V1.0; grammar animals; public <animal> = kangaroo' +
  ' | monkey | zebra | snake | panda | hippo | bull | elephant | lion | antelope' +
  ' | gorilla | giraffe | tiger | eagle | ostrich | rabbit | parrot | turtle' +
  ' | chameleon | horse | donkey | rooster | chicken | goat | duck ;';
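A grammar does not guarantee that the transcript will be one of the listed words, so it can be worth checking the result against your own command list before acting on it. This is a minimal sketch under that assumption. The helper matchCommand and its normalization rules, which trim and lowercase the transcript, are our own choices.

```javascript
// The animal words from the grammar above, kept as an array so the page can
// check transcripts against them.
const ANIMALS = [
  "kangaroo", "monkey", "zebra", "snake", "panda", "hippo", "bull", "elephant",
  "lion", "antelope", "gorilla", "giraffe", "tiger", "eagle", "ostrich",
  "rabbit", "parrot", "turtle", "chameleon", "horse", "donkey", "rooster",
  "chicken", "goat", "duck"
];

// Hypothetical helper: normalizes a transcript (trim + lowercase) and returns
// the matching command word, or null if the transcript is not in the list.
function matchCommand(transcript, words) {
  const normalized = transcript.trim().toLowerCase();
  return words.indexOf(normalized) !== -1 ? normalized : null;
}

console.log(matchCommand(" Zebra ", ANIMALS)); // prints: zebra
console.log(matchCommand("cat", ANIMALS));     // prints: null
```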
A recognized spoken word is shown as illustrated below.
Recognition Configuration
Our example configures four major properties and two callback functions. The properties are:
- continuous: If true, return continuous results for each recognition rather than a single result. Default is false.
- lang: Sets the language of the recognition, as a DOMString containing a BCP 47 language tag (for example, en-US).
- interimResults: If true, return interim (not yet final) results; otherwise, return only final results. Default is false.
- maxAlternatives: Sets the maximum number of alternative matches returned for each result. Default is 1.
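The four properties and their defaults can be collected in one options object. The sketch below uses a hypothetical applyRecognitionOptions helper that copies your choices onto a recognizer and fills in the defaults listed above. In the page you would pass the real SpeechRecognition instance. Here any plain object works, which makes the behavior easy to check.

```javascript
// Defaults as listed above. lang is left out on purpose: when it is not set,
// the browser falls back to the page's <html lang> value or the user agent's language.
const RECOGNITION_DEFAULTS = {
  continuous: false,
  interimResults: false,
  maxAlternatives: 1
};

// Hypothetical helper: copies the given options onto a recognizer object,
// using the defaults above for anything not specified.
function applyRecognitionOptions(recog, options) {
  const settings = Object.assign({}, RECOGNITION_DEFAULTS, options);
  recog.continuous = settings.continuous;
  recog.interimResults = settings.interimResults;
  recog.maxAlternatives = settings.maxAlternatives;
  if (settings.lang) {
    recog.lang = settings.lang; // BCP 47 tag, e.g. "en-US"
  }
  return recog;
}

// In the page: applyRecognitionOptions(new SpeechRecognition(), { lang: "en-US" });
const configured = applyRecognitionOptions({}, { lang: "en-US", maxAlternatives: 2 });
console.log(configured);
```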
To increase accuracy, configure the recognizer to match your needs. In particular, setting the correct language reduces ambiguity in word matching.
function speech_to_text_config() {
  // .....
  // recog is assumed to be declared at page level (e.g. var recog;)
  // so that onResult and onError can call recog.stop().
  recog = new SpeechRecognition();
  var speechRecognitionList = new SpeechGrammarList();
  speechRecognitionList.addFromString(grammar, 1); // weight from 0 to 1
  recog.grammars = speechRecognitionList;
  recog.lang = 'en-US';
  recog.continuous = false;
  recog.interimResults = false;
  recog.maxAlternatives = 1;
  recog.onresult = function (event) { onResult(event); };
  recog.onerror = function (event) { onError(event); };
  console.log("Speech To Text Configuration Done.");
}
onresult defines the callback that retrieves the result and its confidence score, while onerror defines a function that catches errors. Both callbacks end the recognition process by calling recog.stop().
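function onResult(event) {
  console.log(event); // inspect the full event with F12
  // First result, first (best) alternative
  var animal = event.results[0][0].transcript;
  var confidence = event.results[0][0].confidence.toFixed(4);
  $("#recognized").html(animal + " " + confidence).css("color", "blue");
  recog.stop(); // end this recognition session
}
function onError(event) {
  console.log(event);
  // event.error holds a short code such as "no-speech" or "not-allowed"
  $("#recognized").html("OnError: " + event.error).css("color", "red");
  recog.stop();
}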
Using F12 Inspect, as shown below, you can check the two properties transcript and confidence. They hold the recognized word from the grammar list and the confidence score of the match (a number between 0 and 1), respectively.
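The onResult function above reads event.results[0][0] directly. Here is a minimal sketch of the same lookup as a pure function, readTopResult, which is a hypothetical name. It makes the nesting explicit: results is a list of results, each result is a list of alternatives, and each alternative carries transcript and confidence. The sketch also uses the event's resultIndex and the result's isFinal flag. Both are standard properties, and they matter once continuous or interimResults is turned on. Plain arrays stand in for the browser's array-like lists, so you can try the function outside the browser.

```javascript
// Hypothetical helper mirroring onResult: reads the best alternative of the
// result at event.resultIndex and formats the confidence to 4 decimal places.
function readTopResult(event) {
  const result = event.results[event.resultIndex];
  const best = result[0];
  return {
    transcript: best.transcript,
    confidence: best.confidence.toFixed(4),
    isFinal: result.isFinal
  };
}

// Plain-object stand-in for a real SpeechRecognitionEvent.
const fakeEvent = {
  resultIndex: 0,
  results: [Object.assign([{ transcript: "zebra", confidence: 0.912345 }], { isFinal: true })]
};
console.log(readTopResult(fakeEvent));
```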
More Configuration
You can set recog.maxAlternatives = 2 to get one more alternative matching word.
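With maxAlternatives greater than 1, each result holds several alternatives, which you can read by index just like [0]. The sketch below collects them and sorts them by confidence rather than relying on the order the browser returns. listAlternatives is a hypothetical helper, and the plain array stands in for event.results[0].

```javascript
// Hypothetical helper: collects every alternative in one result (there are
// up to maxAlternatives of them) and sorts them by confidence, highest first.
function listAlternatives(result) {
  const alternatives = [];
  for (let i = 0; i < result.length; i++) {
    alternatives.push({
      transcript: result[i].transcript,
      confidence: result[i].confidence
    });
  }
  return alternatives.sort((a, b) => b.confidence - a.confidence);
}

// Plain-array stand-in for event.results[0] with maxAlternatives = 2.
const fakeResult = [
  { transcript: "hippo", confidence: 0.41 },
  { transcript: "zebra", confidence: 0.87 }
];
const ranked = listAlternatives(fakeResult);
console.log(ranked.map((alt) => alt.transcript).join(", ")); // prints: zebra, hippo
```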
FINAL
Conclusion
Browser-based speech recognition in JavaScript is indeed easier than doing it in Java on Android. In the next post, we will introduce the reverse: Speech Synthesis, or Text-to-Speech (TTS), which turns written text into speech.
Thank you for reading, and we have suggested more helpful articles here. If you want to share anything, please feel free to comment below. Good luck and happy coding!
Learning Tips
Here is an excellent way to learn HTML and scripts: use Google Chrome's F12 Inspect (Inspect Element) to study the code.
In Google Chrome, there are two ways to inspect a web page using the browser built-in Chrome DevTools:
- Right-click an element on the page or in a blank area, then select Inspect.
- Go to the Chrome menu, then select More Tools > Developer Tools.
Suggested Reading
TRY IT
Quick Experience
That is all for this project. Here is a link that lets you try the program. Please leave your comments to help us improve.
Try It Yourself
Click here to run the source code, so you can check whether it is worth your time before studying the downloaded code.
Thanks for the article. Where can I find a list of all the potential default grammars?
So far I have seen colors, numbers, and animals. Are there more?
In practice, grammars can be ignored. Try recognizing without a grammar, and you won't find any difference.