JavaScript Speech Recognition for Beginners

Speech-to-Text (STT) lets machines recognize spoken commands. This tutorial introduces one of the simplest ways to do speech recognition in JavaScript. It is written for beginners and shows how to build a web page that recognizes what people say by using the Web Speech API.

None of the code here is complicated, so you should be able to follow it even if you are still a student. To help you learn, we provide a download link to a zip file containing all the source code for future use.

Estimated reading time: 6 minutes

EXPLORE THIS ARTICLE
TABLE OF CONTENTS

BONUS Source Code Download

1 The Basics

2 Speech Recognition

FINAL Conclusion

TRY IT Quick Experience

BONUS
Source Code Download

We have released it under the MIT license, so feel free to use it in your own project or your school homework.

Download Guideline

Prepare an HTTP server such as XAMPP or WAMP in your Windows environment.
Download the zip file and unzip it into a folder that the HTTP server can access.

DOWNLOAD SOURCE

SECTION 1
The Basics

Let’s study the Web Speech API, an experimental JavaScript technology for speech recognition. It is not supported by every browser on every OS platform. At the time of writing, our example works only in Chrome on Windows and Android.

Web Speech API

The Web Speech API, as documented by Mozilla on MDN, provides interfaces for recognizing what people say: it converts voice input from a microphone into text. The feature relies on the browser’s speech recognition service, so it is available only in the browsers listed in the compatibility table.

Speech recognition is accessed via the SpeechRecognition interface, which provides the ability to recognize voice context from an audio input (normally via the device’s default speech recognition service) and respond appropriately. Generally you’ll use the interface’s constructor to create a new SpeechRecognition object, which has a number of event handlers available for detecting when speech is input through the device’s microphone. The SpeechGrammar interface represents a container for a particular set of grammar that your app should recognise. Grammar is defined using JSpeech Grammar Format (JSGF).

Specifying a list of predefined words is intended to improve recognition accuracy. When audio arrives, the results include the recognized words and their corresponding confidence scores. The word list must follow the JSpeech Grammar Format (JSGF).

Browser Compatibility

Even though the browser compatibility table lists Microsoft Edge as supported, many people using Edge get unexpected results, according to posts on the Microsoft forum. Our tests showed the same problem. Therefore, our example focuses only on Chrome on Windows and Chrome on Android.
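
Because support varies between browsers, it can help to check for the API before using it. The following is a minimal sketch of such a check, not part of the downloadable project; the helper name getSpeechRecognition is our own assumption, and the global object is passed in as a parameter so that the function can also be tried outside a browser.

```javascript
// Returns the SpeechRecognition constructor exposed by the given global
// object (standard or webkit-prefixed name), or null if neither exists.
function getSpeechRecognition(globalObj) {
    return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser, pass `window`. Outside a browser there is nothing to detect.
if (typeof window !== "undefined") {
    var Recognition = getSpeechRecognition(window);
    if (!Recognition) {
        console.log("Speech recognition is not supported in this browser.");
    }
}
```

If the helper returns null, the page can hide its microphone button or show a notice instead of failing with a script error.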

SECTION 2
Speech Recognition

The Web Speech API provides a recognition module for JavaScript. This section first shows how to configure it before recognizing. Then, when an audio stream comes from the microphone, the module converts it into a word or command.

Using jQuery Mobile

jQuery Mobile makes the page friendlier on mobile devices running Chrome for Android. Make sure the jQuery and jQuery Mobile versions you load are compatible with each other.

index.html

<head>
<link rel="stylesheet" href="https://ajax.googleapis.com/ajax/libs/jquerymobile/1.4.5/jquery.mobile.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquerymobile/1.4.5/jquery.mobile.min.js"></script>
</head>

Using Web Browser Engine webkit

Chrome exposes the SpeechRecognition object under the webkit prefix, so the code must reference the prefixed names for speech recognition in JavaScript to work in Chrome, as shown below.

index.html

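// Use the standard names where available and fall back to Chrome's webkit-prefixed ones.
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;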

Clicking a button starts listening to the microphone. Once speech is received, recognition begins.
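
The button wiring can be sketched roughly as follows. This is an illustration under assumptions rather than the project’s actual code: the helper name startOnClick and the element id start-button in the usage note are hypothetical, and the recognizer is any object with a start() method, such as a SpeechRecognition instance.

```javascript
// Registers a click handler on `button` that starts `recognizer`.
// `button` needs addEventListener(); `recognizer` needs start().
function startOnClick(button, recognizer) {
    button.addEventListener("click", function () {
        try {
            recognizer.start();
        } catch (err) {
            // start() throws if recognition is already running on this object.
            console.log("Could not start recognition: " + err.name);
        }
    });
}
```

In a page, this could be called as startOnClick(document.getElementById("start-button"), recog), where recog is the configured recognition object.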

Grammar or Commands

Next, the code defines the grammar, that is, the commands we want to recognize. The grammar is written in the JSpeech Grammar Format (JSGF). The variable below holds our grammar:

index.html

// A single-quoted string cannot span several lines, so the pieces are concatenated.
var grammar = '#JSGF V1.0; grammar animals; public <animal> = kangaroo'
    + ' | monkey | zebra | snake | panda | hippo | bull | elephant | lion | antelope'
    + ' | gorilla | giraffe | tiger | eagle | ostrich | rabbit | parrot | turtle'
    + ' | chameleon | horse | donkey | rooster | chicken | goat | duck ;';
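
For longer word lists, the same kind of string can be generated from an array instead of being typed by hand. The sketch below is only an illustration; the function name buildJsgfGrammar and its parameters are our own assumptions, not part of the Web Speech API.

```javascript
// Builds a one-rule JSGF grammar string from an array of words.
// `grammarName` and `ruleName` fill the same slots as "animals" and "animal" above.
function buildJsgfGrammar(grammarName, ruleName, words) {
    return "#JSGF V1.0; grammar " + grammarName + "; public <" + ruleName + "> = "
        + words.join(" | ") + " ;";
}

var animalGrammar = buildJsgfGrammar("animals", "animal", ["kangaroo", "monkey", "zebra"]);
console.log(animalGrammar);
```

The resulting string can be passed to addFromString() in the same way as the hand-written grammar variable.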

Once a spoken word is recognized, it is shown on the page.

Recognition Configuration

Our example configures four properties and two callback functions. The properties are:

continuous: If true, results are captured continuously rather than as a single result. Default is false.
lang: Sets the recognition language, given as a string holding a BCP 47 language tag such as en-US.
interimResults: If true, interim results are returned; otherwise, only final results are returned. Default is false.
maxAlternatives: Sets the maximum number of alternative matches returned per result. Default is 1.

To increase accuracy, configure the recognizer to match your needs. In particular, setting the correct language reduces ambiguity in word matching.

index.html

function speech_to_text_config() {
    // .....
    // recog is kept global so the callbacks can stop it later.
    recog = new SpeechRecognition();
    // Register the grammar with weight 1.
    speechRecognitionList = new SpeechGrammarList();
    speechRecognitionList.addFromString(grammar, 1);
    recog.grammars = speechRecognitionList;
    recog.lang = 'en-US';
    recog.continuous = false;       // stop after a single result
    recog.interimResults = false;   // report final results only
    recog.maxAlternatives = 1;      // one candidate per result
    recog.onresult = function (event) { onResult(event); };
    recog.onerror = function (event) { onError(event); };
    console.log("Speech To Text Configuration Done.");
}

onresult sets the callback that retrieves the result and its confidence score, while onerror sets the callback that handles errors. Both callbacks end the recognition process by calling recog.stop().

index.html

function onResult(event) {
    console.log(event);
    var animal = event.results[0][0].transcript;
    var confidence = event.results[0][0].confidence.toFixed(4);
    $("#recognized").html(animal+" "+confidence).css("color", "blue");
    recog.stop();
}
function onError(event) {
    console.log(event);
    $("#recognized").html("OnError: " + event.error).css("color", "red");
    recog.stop();
}
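
The onError callback above displays the raw error code. As a hedged sketch, a small lookup can turn some of the error codes defined by the Web Speech API specification into friendlier messages. The names SPEECH_ERROR_MESSAGES and describeSpeechError, and the message wording, are our own assumptions and are not part of the original project.

```javascript
// Friendly messages for a few error codes that event.error may contain.
var SPEECH_ERROR_MESSAGES = {
    "no-speech": "No speech was detected. Please try again.",
    "audio-capture": "No microphone was found or it could not be used.",
    "not-allowed": "Permission to use the microphone was denied.",
    "network": "A network error interrupted recognition."
};

// Returns a readable message for an error code, falling back to the raw code.
function describeSpeechError(code) {
    if (Object.prototype.hasOwnProperty.call(SPEECH_ERROR_MESSAGES, code)) {
        return SPEECH_ERROR_MESSAGES[code];
    }
    return "Recognition error: " + code;
}

console.log(describeSpeechError("no-speech"));
```

Inside onError, the message could then be shown with describeSpeechError(event.error) instead of the bare code.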

Using F12 Inspect, you can check the two properties transcript and confidence in the logged event. For speech recognition in JavaScript, they hold the recognized words and the confidence score of the match, respectively.

More Configuration

You can set recog.maxAlternatives = 2 to get up to one more alternative match for each result.
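
With more than one alternative, each entry of event.results holds several candidates, indexed from 0 and counted by its length property. The sketch below shows one way to collect them; the helper name listAlternatives is hypothetical, and the sample object only imitates the shape of a real result so that the code can run anywhere.

```javascript
// Copies every alternative of one recognition result into a plain array.
// `result` is array-like: a length plus indexed items, each with
// transcript and confidence properties.
function listAlternatives(result) {
    var alternatives = [];
    for (var i = 0; i < result.length; i++) {
        alternatives.push({
            transcript: result[i].transcript,
            confidence: result[i].confidence
        });
    }
    return alternatives;
}

// Imitation of event.results[0] with two alternatives (made-up values).
var sampleResult = {
    length: 2,
    0: { transcript: "lion", confidence: 0.91 },
    1: { transcript: "line", confidence: 0.42 }
};
console.log(listAlternatives(sampleResult));
```

In onResult, the call would be listAlternatives(event.results[0]).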

FINAL
Conclusion

Browser-based speech recognition in JavaScript is indeed easier than in Java on Android. In the next post, we will introduce the reverse technique, Speech Synthesis or Text-to-Speech (TTS), which turns written words into speech.

Thank you for reading. We have suggested more helpful articles here. If you want to share anything, please feel free to comment below. Good luck and happy coding!

Learning Tips

Let us suggest an excellent way to study HTML and scripts. Google Chrome’s F12 Inspect, or Inspect Element, helps you study the code.

In Google Chrome, there are two ways to inspect a web page using the browser’s built-in Chrome DevTools:

Right-click an element on the page or in a blank area, then select Inspect.
Go to the Chrome menu, then select More Tools > Developer Tools.

TRY IT
Quick Experience

That is all for this project. Here is a link that lets you try the program. Please leave your comments to help us improve it.

Try It Yourself

Click here to run the source code, so that you can check whether it is worth your time before studying the downloaded code.

2 thoughts on “JavaScript Speech Recognition for Beginners”

sumesh

April 16, 2022 at 6:20 pm

thanks for the article. Where can i find a list of all potential default grammars?
So far i have seen colors, numbers and animals. Are there more?
- Editorial Staff
  
  April 17, 2022 at 5:52 pm
  
  Truly, grammars can be ignored. Try to recognize without grammars, and you won’t find any difference.
