JavaScript Speech Recognition Example (Speech to Text)

With the Web Speech API, we can recognize speech using JavaScript. It is super easy to recognize speech in a browser with JavaScript and then get the text from the speech to use as user input. We have already covered How to Convert Text to Speech in JavaScript.

But support for this API is limited to the Chrome browser only, so if you are viewing this example in some other browser, the live example below might not work.

JavaScript speech recognition - speech to text

This tutorial will walk through a basic speech-to-text example. We will ask the user to speak something, use the SpeechRecognition object to convert the speech into text, and then display the text on the screen.

The Web Speech API in JavaScript can be used for many other use cases. We can provide a list of rules for words or sentences as a grammar using the SpeechGrammarList object, which will be used to recognize and validate user input from speech.

For example, consider a webpage that shows a quiz with a question and 4 options, where the user has to select the correct option. Here, we can restrict the speech recognition grammar to only the options for that question, so whatever the user speaks, if it is not one of the 4 options, it will not be recognized.

We can use grammar to define rules for speech recognition, configuring what our app understands and what it doesn't.

JavaScript Speech to Text

In the code example below, we will use the SpeechRecognition object. We haven't used too many properties and are relying on the default values. We have a simple HTML webpage in the example, where we have a button to initiate the speech recognition.

The main JavaScript code, which listens to what the user speaks and then converts it to text, is this:
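A minimal sketch of that code (the element IDs start and output, and the exact messages, are assumptions):

```js
// Use the standard or the webkit-prefixed constructor, whichever is available.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

const startButton = document.getElementById('start');
const output = document.getElementById('output');

startButton.addEventListener('click', () => {
  recognition.start(); // begin listening for speech
});

recognition.onstart = () => {
  output.textContent = 'Listening... please speak into the microphone.';
};

recognition.onresult = (event) => {
  // event.results[0][0] is the most likely alternative of the recognized result
  const transcript = event.results[0][0].transcript;
  const confidence = event.results[0][0].confidence;
  output.textContent = `You said: "${transcript}" (confidence: ${confidence.toFixed(2)})`;
};

recognition.onspeechend = () => {
  recognition.stop(); // stop recognizing once the user stops speaking
};
```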

In the above code, we have used:

The recognition.start() method is used to start the speech recognition.

Once we begin speech recognition, the onstart event handler can be used to inform the user that speech recognition has started and that they should speak into the microphone.

When the user is done speaking, the onresult event handler receives the result. The SpeechRecognitionEvent results property returns a SpeechRecognitionResultList object, which contains SpeechRecognitionResult objects and has a getter so it can be accessed like an array; the first [0] returns the SpeechRecognitionResult at the last position. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that hold the individual results; these also have getters, so the second [0] returns the SpeechRecognitionAlternative at position 0. We then read its transcript property to get the recognized text.

The same is done with the confidence property to get the accuracy of the result as estimated by the API.

We have many event handlers for the events surrounding the speech recognition process. One of them is onspeechend, which we use in our code to call the stop() method of the SpeechRecognition object and end the recognition process.

Now let's see the running code:
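The page around it can be as simple as this (the markup and file name are assumptions):

```html
<!DOCTYPE html>
<html>
  <body>
    <button id="start">Start Speech Recognition</button>
    <p id="output">Click the button and speak...</p>
    <!-- the recognition code shown above -->
    <script src="script.js"></script>
  </body>
</html>
```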

When you run the code, the browser will ask for permission to use your microphone, so click Allow and then say something to see the script in action.

Conclusion:

So in this tutorial we learned how to use JavaScript to write a small application that converts speech into text and displays the text output on screen. We also made the whole process more interactive by using the various event handlers available in the SpeechRecognition interface. In the future I will try to cover some simple web application ideas using this feature of JavaScript to help you understand where it can be used.

If you face any issue running the above script, post in the comment section below. Remember, only the Chrome browser supports it.



Voice driven web apps - Introduction to the Web Speech API

The new JavaScript Web Speech API makes it easy to add speech recognition to your web pages. This API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later. Here's an example with the recognized text appearing almost immediately while speaking.

Web Speech API demo


Let’s take a look under the hood. First, we check to see if the browser supports the Web Speech API by checking if the webkitSpeechRecognition object exists. If not, we suggest the user upgrades their browser. (Since the API is still experimental, it's currently vendor prefixed.) Lastly, we create the webkitSpeechRecognition object which provides the speech interface, and set some of its attributes and event handlers.
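A sketch of that setup, close to the demo source (upgrade() stands in for whatever "please upgrade your browser" message you show; handler bodies are omitted):

```js
if (!('webkitSpeechRecognition' in window)) {
  upgrade(); // suggest the user upgrades their browser
} else {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;     // keep listening even if the user pauses
  recognition.interimResults = true; // report interim (non-final) results

  recognition.onstart = function() { /* ... */ };
  recognition.onresult = function(event) { /* ... */ };
  recognition.onerror = function(event) { /* ... */ };
  recognition.onend = function() { /* ... */ };
}
```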

The default value for continuous is false, meaning that when the user stops talking, speech recognition will end. This mode is great for simple text like short input fields. In this demo , we set it to true, so that recognition will continue even if the user pauses while speaking.

The default value for interimResults is false, meaning that the only results returned by the recognizer are final and will not change. The demo sets it to true so we get early, interim results that may change. Watch the demo carefully: the grey text is interim and does sometimes change, whereas the black text consists of responses from the recognizer that are marked final and will not change.

To get started, the user clicks on the microphone button, which triggers this code:
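Roughly, the handler sets the language and starts the recognizer (select_dialect is the assumed name of the language drop-down):

```js
function startButton(event) {
  final_transcript = '';                    // reset the global final transcript
  recognition.lang = select_dialect.value;  // BCP-47 tag chosen in the drop-down, e.g. "en-US"
  recognition.start();
}
```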

We set the spoken language for the speech recognizer, lang, to the BCP-47 value that the user has selected via the selection drop-down list, for example "en-US" for English (United States). If this is not set, it defaults to the lang of the HTML document root element and hierarchy. Chrome speech recognition supports numerous languages (see the "langs" table in the demo source), as well as some right-to-left languages that are not included in this demo, such as he-IL and ar-EG.

After setting the language, we call recognition.start() to activate the speech recognizer. Once it begins capturing audio, it calls the onstart event handler, and then for each new set of results, it calls the onresult event handler.

This handler concatenates all the results received so far into two strings: final_transcript and interim_transcript . The resulting strings may include "\n", such as when the user speaks “new paragraph”, so we use the linebreak function to convert these to HTML tags <br> or <p> . Finally it sets these strings as the innerHTML of their corresponding <span> elements: final_span which is styled with black text, and interim_span which is styled with gray text.

interim_transcript is a local variable, and is completely rebuilt each time this event is called because it’s possible that all interim results have changed since the last onresult event. We could do the same for final_transcript simply by starting the for loop at 0. However, because final text never changes, we’ve made the code here a bit more efficient by making final_transcript a global, so that this event can start the for loop at event.resultIndex and only append any new final text.
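A sketch of the onresult handler along those lines (linebreak, final_span and interim_span are the demo's helper and elements mentioned above):

```js
recognition.onresult = function(event) {
  let interim_transcript = '';
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;   // final text never changes
    } else {
      interim_transcript += event.results[i][0].transcript; // interim text may still change
    }
  }
  final_span.innerHTML = linebreak(final_transcript);     // styled with black text
  interim_span.innerHTML = linebreak(interim_transcript); // styled with gray text
};
```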

That’s it! The rest of the code is there just to make everything look pretty. It maintains state, shows the user some informative messages, and swaps the GIF image on the microphone button between the static microphone, the mic-slash image, and mic-animate with the pulsating red dot.

The mic-slash image is shown when recognition.start() is called, and then replaced with mic-animate when onstart fires. Typically this happens so quickly that the slash is not noticeable, but the first time speech recognition is used, Chrome needs to ask the user for permission to use the microphone, in which case onstart only fires when and if the user allows permission. Pages hosted on HTTPS do not need to ask repeatedly for permission, whereas HTTP hosted pages do.

So make your web pages come alive by enabling them to listen to your users!

We’d love to hear your feedback...

  • For comments on the W3C Web Speech API specification: email , mailing archive , community group
  • For comments on Chrome’s implementation of this spec: email , mailing archive

Refer to the Chrome Privacy Whitepaper to learn how Google is handling voice data from this API.



JoelBonetR 🥇

Posted on Aug 22, 2022 • Updated on Aug 25, 2022

Speech Recognition with JavaScript


Some time ago, the speech recognition API was added to the specs and we got partial support in Chrome, Safari, Baidu Browser, Android WebView, iOS Safari, Samsung Internet and KaiOS browsers (see browser support in detail).

Disclaimer: This implementation won't work in Opera (as it doesn't support the constructor) and also won't work in Firefox (because it doesn't support any of it), so if you're using one of those, I suggest you use Chrome, or any other compatible browser, if you want to give it a try.

Speech recognition code and PoC

Edit: I realised that for some reason it won't work when embedded, so here's the link to open it directly.

The implementation I made currently supports English and Spanish, just as a showcase.

Quick instructions and feature overview:

  • Choose one of the languages from the drop down.
  • Hit the mic icon and it will start recording (you'll notice a weird animation).
  • Once you finish a sentence it will write it down in the box.
  • When you want it to stop recording, simply press the mic again (animation stops).
  • You can also hit the box to copy the text to your clipboard.

Speech Recognition in the Browser with JavaScript - key code blocks:
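A sketch of those key blocks (the element IDs, option values and the append/copy logic are assumptions, not the original pen):

```js
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

recognition.continuous = true;      // keep recording until the mic is pressed again
recognition.interimResults = false; // only write finished sentences to the box

const micButton = document.getElementById('mic');
const box = document.getElementById('box');
const languageSelect = document.getElementById('language');

let recording = false;

micButton.addEventListener('click', () => {
  if (recording) {
    recognition.stop();
  } else {
    recognition.lang = languageSelect.value; // e.g. 'en-GB' or 'es-ES'
    recognition.start();
  }
  recording = !recording;
});

// Each finished sentence is appended to the box.
recognition.onresult = (event) => {
  const sentence = event.results[event.results.length - 1][0].transcript;
  box.value += sentence + '. ';
};

// Copy the box content to the clipboard when it is clicked.
box.addEventListener('click', () => navigator.clipboard.writeText(box.value));
```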

This implementation currently supports the following languages for speech recognition:

If you want me to add support for more languages, tell me in the comment section and I'll update it in a blink so you can test it in your own language 😁

That's all for today. I hope you enjoyed it; I sure enjoyed making it.

Top comments (20)

venkatgadicherla:

It's cool mate. Very good

joelbonetr:

Thank you! 🤖

Can you add Telugu, an Indian language? :)

I can try, do you know the IETF/ISO language code for it? 😁

nngosoftware:

This is really awesome. Could you please add the Turkish language? I would definitely like to try this in my native language and use it in my projects.

polterguy:

Cool. I once created a speech based speech recognition thing based upon MySQL and SoundEx allowing me to create code by speaking through my headphones. It was based upon creating a hierarchical “menu” where I could say “Create button”. Then the machine would respond with “what button”, etc. The thing of course produced Hyperlambda though. I doubt it can be done without meta programming.

One thing that bothers me is that this was 5 years ago, and speech support has basically stood 100% perfectly still in all browsers since then … 😕

Not in all of them (e.g. Opera Mini, Firefox mobile). It's a nice-to-have in browsers, especially for accessibility, but screen readers for blind people do the job and, on the other hand, most implementations for any other purpose send data to a backend using streams, so they can process the incoming speech plus use the user feedback to train an AI, among other things, without hurting performance.

...allowing me to create code by speaking through my headphones... ... I doubt it can be done without meta programming.

I agree on this. The concept of "metaprogramming" is broad and covers different ways in which it can work (or be implemented), and by its own definition it is a building block for this kind of application.

mamsoares:

Thank you 🙏. I'd like you to add Brazilian Portuguese too.

Added both European and Brazilian Portuguese 😁

samuelrivaldo:

Thank you 🙏. I'd like you to add French too.

Thank you! 😁

symeon:

Thank you very much for your useful article and implementation. Does it support Greek? Have a nice (programming) day

Hi Symeon, added support for Greek el-GR , try it out! 😃

arantisjr:

I added support for some extra languages in the meantime 😁

aheedkhan:

Can you please add the Urdu language?

Hi @aheedkhan I'm not maintaining this anymore but feel free to fork the pen! 😄



Voice commands and speech synthesis made easy

Artyom.js is a useful wrapper of the speechSynthesis and webkitSpeechRecognition APIs.

Besides, artyom.js also lets you add voice commands to your website easily, so you can build your own Google Now, Siri or Cortana!

Installation

If you don't use a module bundler like Browserify, RequireJS, etc., just include the artyom script in the head tag of your document and you are ready to go!

The Artyom class will now be available and you can instantiate it:
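For example (assuming artyom.js has been loaded globally via a script tag):

```js
const artyom = new Artyom();
```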

Note: You need to load artyom.js in the head tag to preload the voices if you want to use the speechSynthesis API. Otherwise you can still load it at the end of the body tag.


Depending on your browser, speech synthesis and speech recognition may be available separately or not at all; use the artyom.speechSupported and artyom.recognizingSupported methods to find out.


Voice commands

Before initialization, we need to add some commands to be processed. Use the artyom.addCommands(commands) method to add commands.

A command is an object literal with some properties. There are two types of commands: normal and smart.

A smart command allows you to retrieve a value from the spoken string as a wildcard. Every command can be triggered by any of the identifiers given in its indexes array.
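A sketch of both command types (the phrases and actions are just examples):

```js
artyom.addCommands([
  {
    // Normal command: triggered by any of the identifiers in "indexes"
    indexes: ['hello', 'good morning'],
    action: function (i) {
      artyom.say('Hello, how are you?');
    }
  },
  {
    // Smart command: the * wildcard captures part of the spoken string
    smart: true,
    indexes: ['repeat after me *'],
    action: function (i, wildcard) {
      artyom.say('You said: ' + wildcard);
    }
  }
]);
```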

Pro tip: You can add commands dynamically while artyom is active. The commands are stored in an array, so you can add them whenever you want and they'll be processed.

Start artyom

Now that artyom has commands, these can be processed. Artyom can work in continuous and non-continuous mode.
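A sketch of a typical initialization in continuous mode (the option values are examples):

```js
artyom.initialize({
  lang: 'en-GB',     // language used for recognition and synthesis
  continuous: true,  // keep listening permanently (requires HTTPS)
  debug: true,       // log useful information to the console
  listen: true       // start the speech recognition right away
});
```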

Remember that artyom gives you the possibility to process the commands with a server-side language instead of JavaScript: you can enable the remote mode of artyom and use the artyom.remoteProcessorService method.

Note: You'll need an SSL certificate on your website (an HTTPS connection) in order to use continuous mode, otherwise you'll be prompted for permission to access the microphone every time the recognition ends.
Pro tip: Always set the debug property to true if you're working with artyom locally; you'll find convenient, valuable messages and information in the browser console.

Speech text

Use artyom.say to speak text. The language is retrieved at initialization from the lang property.
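For example (the text and callbacks are placeholders):

```js
artyom.say('Hello, this text will be spoken out loud.', {
  onStart: function () { console.log('Started speaking'); },
  onEnd: function () { console.log('Finished speaking'); }
});
```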

Note: Artyom removes the limitation of the native API (about 150 characters max; read more about this issue here). With artyom you can read very long chunks of text without being cut off, and the onEnd and onStart callbacks will still be respected.
Pro tip: Split the text yourself in whatever way you want and call artyom.say several times to reduce the chance of hitting character limits in the spoken text.


Speech to text

Convert what you say into text easily with the dictation object.

Note: You'll need to stop artyom with artyom.fatality before starting a new dictation, as two instances of webkitSpeechRecognition cannot run at the same time.
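A sketch of a dictation (the callbacks are placeholders):

```js
const dictation = artyom.newDictation({
  continuous: true, // keep dictating until stopped
  onResult: function (text) {
    console.log('Recognized text:', text);
  },
  onStart: function () { console.log('Dictation started'); },
  onEnd: function () { console.log('Dictation ended'); }
});

dictation.start();
// ... later, when you are done:
// dictation.stop();
```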

Simulate instructions without saying a word

You can simulate a command without using the microphone by calling artyom.simulateInstruction("command identifier") for testing purposes (or if you don't have a microphone to test with).

Try simulating any of the commands in this document, like "hello" or "go to github".

Get spoken text while artyom is active

If you want to show the user the recognized text while artyom is active, you can redirect the output of the speech recognition of artyom using artyom.redirectRecognizedTextOutput .
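For example (the element ID is an assumption):

```js
artyom.redirectRecognizedTextOutput(function (recognized, isFinal) {
  if (isFinal) {
    document.getElementById('spoken-text').textContent = recognized;
  } else {
    console.log('Interim text:', recognized);
  }
});
```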


Pause and resume commands recognition

You can pause the command recognition, not the underlying speechRecognition: the text recognition will continue, but the execution of commands will be paused, using the artyom.dontObey method.

To resume command recognition, use artyom.obey. Alternatively, use the obeyKeyword property at initialization to resume it with your voice.

Useful keywords

Use the executionKeyword at initialization to execute a command immediately even though you are still talking. Use the obeyKeyword to resume command recognition if you used the pause method (artyom.dontObey): if you say this keyword while artyom is paused, artyom will resume and continue processing commands automatically.


Thanks for reading everything!

Support the project

Did you like artyom? If so, please consider giving the GitHub repository a star and sharing this project with your developer friends!


I'm here to help you

Issues and troubleshooting.

If you need help while implementing artyom and something is not working, or you have suggestions, please open a ticket in the issues area on GitHub and I'll try to help you ASAP.


💬Speech recognition for your React app

JamesBrill/react-speech-recognition


react-speech-recognition

A React hook that converts speech from the microphone to text and makes it available to your React components.


How it works

useSpeechRecognition is a React hook that gives a component access to a transcript of speech picked up from the user's microphone.

SpeechRecognition manages the global state of the Web Speech API, exposing functions to turn the microphone on and off.

Under the hood, it uses Web Speech API . Note that browser support for this API is currently limited, with Chrome having the best experience - see supported browsers for more information.

This version requires React 16.8 so that React hooks can be used. If you're used to version 2.x of react-speech-recognition or want to use an older version of React, you can see the old README here . If you want to migrate to version 3.x, see the migration guide here .

Useful links

  • Basic example
  • Why you should use a polyfill with this library
  • Cross-browser example
  • Supported browsers
  • Troubleshooting

  • Version 3 migration guide
  • TypeScript declaration file in DefinitelyTyped

Installation

To install:
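```
npm install --save react-speech-recognition
```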

To import in your React code:
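```js
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'
```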

The most basic example of a component using this hook would be:
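A sketch of such a component (the component name is arbitrary; it relies on the hook values and SpeechRecognition functions described below):

```jsx
import React from 'react';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

const Dictaphone = () => {
  const {
    transcript,
    listening,
    resetTranscript,
    browserSupportsSpeechRecognition
  } = useSpeechRecognition();

  if (!browserSupportsSpeechRecognition) {
    return <span>Browser doesn't support speech recognition.</span>;
  }

  return (
    <div>
      <p>Microphone: {listening ? 'on' : 'off'}</p>
      <button onClick={() => SpeechRecognition.startListening()}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  );
};

export default Dictaphone;
```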

You can see more examples in the example React app attached to this repo. See Developing .

By default, speech recognition is not supported in all browsers, with the best native experience being available on desktop Chrome. To avoid the limitations of native browser speech recognition, it's recommended that you combine react-speech-recognition with a speech recognition polyfill . Why? Here's a comparison with and without polyfills:

  • ✅ With a polyfill, your web app will be voice-enabled on all modern browsers (except Internet Explorer)
  • ❌ Without a polyfill, your web app will only be voice-enabled on the browsers listed here
  • ✅ With a polyfill, your web app will have a consistent voice experience across browsers
  • ❌ Without a polyfill, different native implementations will produce different transcriptions, have different levels of accuracy, and have different formatting styles
  • ✅ With a polyfill, you control who is processing your users' voice data
  • ❌ Without a polyfill, your users' voice data will be sent to big tech companies like Google or Apple to be transcribed
  • ✅ With a polyfill, react-speech-recognition will be suitable for use in commercial applications
  • ❌ Without a polyfill, react-speech-recognition will still be fine for personal projects or use cases where cross-browser support is not needed

react-speech-recognition currently supports polyfills for the following cloud providers:

Speechly

You can find the full guide for setting up a polyfill here . Alternatively, here is a quick (and free) example using Speechly:

  • Install @speechly/speech-recognition-polyfill in your web app
  • You will need a Speechly app ID. To get one of these, sign up for free with Speechly and follow the guide here
  • Here's a component for a push-to-talk button. The basic example above would also work fine.
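A sketch of the polyfill setup plus a push-to-talk component, assuming the polyfill's createSpeechlySpeechRecognition export and this library's applyPolyfill helper (the app ID is a placeholder):

```jsx
import React from 'react';
import { createSpeechlySpeechRecognition } from '@speechly/speech-recognition-polyfill';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

// Replace with your own Speechly app ID.
const appId = '<INSERT_SPEECHLY_APP_ID_HERE>';
const SpeechlySpeechRecognition = createSpeechlySpeechRecognition(appId);
SpeechRecognition.applyPolyfill(SpeechlySpeechRecognition);

const Dictaphone = () => {
  const { transcript, listening } = useSpeechRecognition();
  const startListening = () => SpeechRecognition.startListening({ continuous: true });

  return (
    <div>
      <p>Microphone: {listening ? 'on' : 'off'}</p>
      <button
        onTouchStart={startListening}
        onMouseDown={startListening}
        onTouchEnd={SpeechRecognition.stopListening}
        onMouseUp={SpeechRecognition.stopListening}
      >Hold to talk</button>
      <p>{transcript}</p>
    </div>
  );
};

export default Dictaphone;
```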

Detecting browser support for Web Speech API

If you choose not to use a polyfill, this library still fails gracefully on browsers that don't support speech recognition. It is recommended that you render some fallback content if it is not supported by the user's browser:
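For example, using the browserSupportsSpeechRecognition flag returned by the hook:

```jsx
const { transcript, browserSupportsSpeechRecognition } = useSpeechRecognition();

if (!browserSupportsSpeechRecognition) {
  // Render fallback content for unsupported browsers
  return <span>Browser doesn't support speech recognition.</span>;
}
```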

Without a polyfill, the Web Speech API is largely only supported by Google browsers. As of May 2021, the following browsers support the Web Speech API:

  • Chrome (desktop): this is by far the smoothest experience
  • Safari 14.1
  • Microsoft Edge
  • Chrome (Android): a word of warning about this platform, which is that there can be an annoying beeping sound when turning the microphone on. This is part of the Android OS and cannot be controlled from the browser
  • Android webview
  • Samsung Internet

For all other browsers, you can render fallback content using the SpeechRecognition.browserSupportsSpeechRecognition function described above. Alternatively, as mentioned before, you can integrate a polyfill .

Detecting when the user denies access to the microphone

Even if the browser supports the Web Speech API, the user still has to give permission for their microphone to be used before transcription can begin. They are asked for permission when react-speech-recognition first tries to start listening. At this point, you can detect when the user denies access via the isMicrophoneAvailable state. When this becomes false , it's advised that you disable voice-driven features and indicate that microphone access is needed for them to work.

Controlling the microphone

Before consuming the transcript, you should be familiar with SpeechRecognition , which gives you control over the microphone. The state of the microphone is global, so any functions you call on this object will affect all components using useSpeechRecognition .

Turning the microphone on

To start listening to speech, call the startListening function.

This is an asynchronous function, so it will need to be awaited if you want to do something after the microphone has been turned on.
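For example:

```js
// Start listening (fire and forget):
SpeechRecognition.startListening();

// Or, inside an async function, wait until the microphone is on:
await SpeechRecognition.startListening();
console.log('Microphone is now on');
```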

Turning the microphone off

To turn the microphone off, but still finish processing any speech in progress, call stopListening .

To turn the microphone off, and cancel the processing of any speech in progress, call abortListening .
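For example:

```js
// Finish processing speech in progress, then turn the microphone off:
SpeechRecognition.stopListening();

// Turn the microphone off and discard any speech in progress:
SpeechRecognition.abortListening();
```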

Consuming the microphone transcript

To make the microphone transcript available in your component, simply add:
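```js
const { transcript } = useSpeechRecognition();
```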

Resetting the microphone transcript

To set the transcript to an empty string, you can call the resetTranscript function provided by useSpeechRecognition . Note that this is local to your component and does not affect any other components using Speech Recognition.

To respond when the user says a particular phrase, you can pass in a list of commands to the useSpeechRecognition hook. Each command is an object with the following properties:

  • command : This is a string or RegExp representing the phrase you want to listen for. If you want to use the same callback for multiple commands, you can also pass in an array here, with each value being a string or RegExp
  • callback : The function that is executed when the command is spoken. The last argument it receives is an object with the following properties:
    • command : The command phrase that was matched. This can be useful when you provide an array of command phrases for the same callback and need to know which one triggered it
    • resetTranscript : A function that sets the transcript to an empty string
  • matchInterim : Boolean that determines whether "interim" results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely - i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.
  • isFuzzyMatch : Boolean that determines whether the comparison between speech and command is based on similarity rather than an exact match. When this is true, the arguments passed to the callback are:
    • The value of command (with any special characters removed)
    • The speech that matched command
    • The similarity between command and the speech
    • The object mentioned in the callback description above
  • fuzzyMatchingThreshold : If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true . It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8 .
  • bestMatchOnly : Boolean that, when isFuzzyMatch is true , determines whether the callback should only be triggered by the command phrase that best matches the speech, rather than being triggered by all matching fuzzy command phrases. This is useful for fuzzy commands with multiple command phrases assigned to the same callback function - you may only want the callback to be triggered once for each spoken command. You should set this only if isFuzzyMatch is true . The default value is false .

Command symbols

To make commands easier to write, the following symbols are supported:

  • Splats: the * symbol captures multi-word speech. Example: 'I would like to order *'
    • The words that match the splat will be passed into the callback, one argument per splat
  • Named variables: words prefixed with : capture a single spoken word. Example: 'I am :height metres tall'
    • The one word that matches the named variable will be passed into the callback
  • Optional words: phrases wrapped in parentheses are not required for the command to match. Example: 'Pass the salt (please)'
    • The above example would match both 'Pass the salt' and 'Pass the salt please'

Example with commands
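A sketch of a component using commands (the phrases and the message state are example choices):

```jsx
import React, { useState } from 'react';
import { useSpeechRecognition } from 'react-speech-recognition';

const Commands = () => {
  const [message, setMessage] = useState('');
  const commands = [
    {
      command: 'I would like to order *',
      callback: (food) => setMessage(`Your order is for: ${food}`)
    },
    {
      command: 'Pass the salt (please)',
      callback: () => setMessage('My pleasure')
    },
    {
      command: ['Hello', 'Hi'],
      callback: ({ command }) => setMessage(`Hi there! You said: "${command}"`),
      matchInterim: true
    }
  ];

  const { transcript } = useSpeechRecognition({ commands });

  return (
    <div>
      <p>{message}</p>
      <p>{transcript}</p>
    </div>
  );
};

export default Commands;
```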

Continuous listening

By default, the microphone will stop listening when the user stops speaking. This reflects the approach taken by "press to talk" buttons on modern devices.

If you want to listen continuously, set the continuous property to true when calling startListening . The microphone will continue to listen, even after the user has stopped speaking.
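For example:

```js
SpeechRecognition.startListening({ continuous: true });
```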

Be warned that not all browsers have good support for continuous listening. Chrome on Android in particular constantly restarts the microphone, leading to a frustrating and noisy (from the beeping) experience. To avoid enabling continuous listening on these browsers, you can make use of the browserSupportsContinuousListening state from useSpeechRecognition to detect support for this feature.

Alternatively, you can try one of the polyfills to enable continuous listening on these browsers.

Changing language

To listen for a specific language, you can pass a language tag (e.g. 'zh-CN' for Chinese) when calling startListening . See here for a list of supported languages.
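For example:

```js
SpeechRecognition.startListening({ language: 'zh-CN' });
```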

regeneratorRuntime is not defined

If you see the error regeneratorRuntime is not defined when using this library, you will need to ensure your web app installs regenerator-runtime :

  • npm i --save regenerator-runtime
  • If you are using NextJS, put this at the top of your _app.js file: import 'regenerator-runtime/runtime' . For any other framework, put it at the top of your index.js file

How to use react-speech-recognition offline?

Unfortunately, speech recognition will not function in Chrome when offline. According to the Web Speech API docs : On Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

If you are building an offline web app, you can detect when the browser is offline by inspecting the value of navigator.onLine . If it is true , you can render the transcript generated by React Speech Recognition. If it is false , it's advisable to render offline fallback content that signifies that speech recognition is disabled. The online/offline API is simple to use - you can read how to use it here .

You can run an example React app that uses react-speech-recognition with:

On http://localhost:3000 , you'll be able to speak into the microphone and see your speech as text on the web page. There are also controls for turning speech recognition on and off. You can make changes to the web app itself in the example directory. Any changes you make to the web app or react-speech-recognition itself will be live reloaded in the browser.

View the API docs here or follow the guide above to learn how to use react-speech-recognition .



Demo: JavaScript Speech Recognition

Allow access to your microphone and then say something -- the Speech Recognition API may echo back what you said! Also: check out the Dev Tools console to follow events.


SpeechRecognition 3.10.4

pip install SpeechRecognition

Released: May 5, 2024

Library for performing speech recognition, with support for several engines and APIs, online and offline.


License: BSD License (BSD)

Author: Anthony Zhang (Uberi)

Tags speech, recognition, voice, sphinx, google, wit, bing, api, houndify, ibm, snowboy

Requires: Python >=3.8

Classifiers

  • 5 - Production/Stable
  • OSI Approved :: BSD License
  • MacOS :: MacOS X
  • Microsoft :: Windows
  • POSIX :: Linux
  • Python :: 3
  • Python :: 3.8
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Multimedia :: Sound/Audio :: Speech
  • Software Development :: Libraries :: Python Modules

Project description


UPDATE 2022-02-09 : Hey everyone! This project started as a tech demo, but these days it needs more time than I have to keep up with all the PRs and issues. Therefore, I’d like to put out an open invite for collaborators - just reach out at me @ anthonyz . ca if you’re interested!

Speech recognition engine/API support:

Quickstart: pip install SpeechRecognition . See the “Installing” section for more details.

To quickly try it out, run python -m speech_recognition after installing.

Project links:

Library Reference

The library reference documents every publicly accessible object in the library. This document is also included under reference/library-reference.rst .

See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst .

You have to install Vosk models in order to use Vosk. Models are available for download; you have to place them in the models folder of your project, like "your-project-folder/models/your-vosk-model".

See the examples/ directory in the repository root for usage examples:
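One of those examples recognizes speech input from the microphone; a minimal sketch of it, using the default Google Web Speech API recognizer (microphone input requires PyAudio):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Capture a single phrase from the default microphone.
with sr.Microphone() as source:
    print("Say something!")
    audio = recognizer.listen(source)

# Send the audio to the Google Web Speech API and print the transcription.
try:
    print("You said: " + recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print(f"Recognition request failed: {e}")
```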

First, make sure you have all the requirements listed in the “Requirements” section.

The easiest way to install this is using pip install SpeechRecognition .

Otherwise, download the source distribution from PyPI , and extract the archive.

In the folder, run python setup.py install .

Requirements

To use all of the functionality of the library, you should have:

The following requirements are optional, but can improve or extend functionality in some situations:

The following sections go over the details of each requirement.

The first software requirement is Python 3.8+ . This is required to use the library.

PyAudio (for microphone users)

PyAudio is required if and only if you want to use microphone input ( Microphone ). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations.

If not installed, everything in the library will still work, except attempting to instantiate a Microphone object will raise an AttributeError .

The installation instructions on the PyAudio website are quite good - for convenience, they are summarized below:

PyAudio wheel packages for common 64-bit Python versions on Windows and Linux are included for convenience, under the third-party/ directory in the repository root. To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the repository root directory .

PocketSphinx-Python (for Sphinx users)

PocketSphinx-Python is required if and only if you want to use the Sphinx recognizer ( recognizer_instance.recognize_sphinx ).

PocketSphinx-Python wheel packages for 64-bit Python 3.4, and 3.5 on Windows are included for convenience, under the third-party/ directory . To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the SpeechRecognition folder.

On Linux and other POSIX systems (such as OS X), follow the instructions under “Building PocketSphinx-Python from source” in Notes on using PocketSphinx for installation instructions.

Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.

Vosk (for Vosk users)

Vosk API is required if and only if you want to use Vosk recognizer ( recognizer_instance.recognize_vosk ).

You can install it with python3 -m pip install vosk .

You also have to install Vosk Models:

Models are available for download; you have to place them in the models folder of your project, like "your-project-folder/models/your-vosk-model".

Google Cloud Speech Library for Python (for Google Cloud Speech API users)

Google Cloud Speech library for Python is required if and only if you want to use the Google Cloud Speech API ( recognizer_instance.recognize_google_cloud ).

If not installed, everything in the library will still work, except calling recognizer_instance.recognize_google_cloud will raise a RequestError .

According to the official installation instructions , the recommended way to install this is using Pip : execute pip install google-cloud-speech (replace pip with pip3 if using Python 3).

FLAC (for some systems)

A FLAC encoder is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), this is already bundled with this library - you do not need to install anything .

Otherwise, ensure that you have the flac command line tool, which is often available through the system package manager. For example, this would usually be sudo apt-get install flac on Debian-derivatives, or brew install flac on OS X with Homebrew.

Whisper (for Whisper users)

Whisper is required if and only if you want to use whisper ( recognizer_instance.recognize_whisper ).

You can install it with python3 -m pip install SpeechRecognition[whisper-local] .

Whisper API (for Whisper API users)

The library openai is required if and only if you want to use Whisper API ( recognizer_instance.recognize_whisper_api ).

If not installed, everything in the library will still work, except calling recognizer_instance.recognize_whisper_api will raise a RequestError .

You can install it with python3 -m pip install SpeechRecognition[whisper-api] .

Troubleshooting

The recognizer tries to recognize speech even when I'm not speaking, or after I'm done speaking.

Try increasing the recognizer_instance.energy_threshold property. This is basically how sensitive the recognizer is to when recognition should start. Higher values mean that it will be less sensitive, which is useful if you are in a loud room.

This value depends entirely on your microphone or audio data. There is no one-size-fits-all value, but good values typically range from 50 to 4000.

Also, check on your microphone volume settings. If it is too sensitive, the microphone may be picking up a lot of ambient noise. If it is too insensitive, the microphone may be rejecting speech as just noise.

The recognizer can’t recognize speech right after it starts listening for the first time.

The recognizer_instance.energy_threshold property is probably set to a value that is too high to start off with, and then being adjusted lower automatically by dynamic energy threshold adjustment. Before it is at a good level, the energy threshold is so high that speech is just considered ambient noise.

The solution is to decrease this threshold, or call recognizer_instance.adjust_for_ambient_noise beforehand, which will set the threshold to a good value automatically.
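A minimal sketch of that calibration step:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    # Sample ambient noise for about a second and set the energy threshold accordingly.
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)
```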

The recognizer doesn’t understand my particular language/dialect.

Try setting the recognition language to your language/dialect. To do this, see the documentation for recognizer_instance.recognize_sphinx , recognizer_instance.recognize_google , recognizer_instance.recognize_wit , recognizer_instance.recognize_bing , recognizer_instance.recognize_api , recognizer_instance.recognize_houndify , and recognizer_instance.recognize_ibm .

For example, if your language/dialect is British English, it is better to use "en-GB" as the language rather than "en-US" .

The recognizer hangs on recognizer_instance.listen ; specifically, when it’s calling Microphone.MicrophoneStream.read .

This usually happens when you’re using a Raspberry Pi board, which doesn’t have audio input capabilities by itself. This causes the default microphone used by PyAudio to simply block when we try to read it. If you happen to be using a Raspberry Pi, you’ll need a USB sound card (or USB microphone).

Once you do this, change all instances of Microphone() to Microphone(device_index=MICROPHONE_INDEX) , where MICROPHONE_INDEX is the hardware-specific index of the microphone.

To figure out what the value of MICROPHONE_INDEX should be, run the following code:
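A sketch along those lines, using Microphone.list_microphone_names():

```python
import speech_recognition as sr

# Print every microphone the system knows about, together with its device_index.
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f'Microphone "{name}" found for Microphone(device_index={index})')
```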

This will print out something like the following:
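The exact list depends on your hardware; hypothetically it could look like this (the Snowball entry corresponds to the example below):

```
Microphone "HDA Intel HDMI: 0 (hw:0,3)" found for Microphone(device_index=0)
Microphone "HDA Intel HDMI: 1 (hw:0,7)" found for Microphone(device_index=1)
Microphone "HDA Intel HDMI: 2 (hw:0,8)" found for Microphone(device_index=2)
Microphone "Blue Snowball: USB Audio (hw:1,0)" found for Microphone(device_index=3)
```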

Now, to use the Snowball microphone, you would change Microphone() to Microphone(device_index=3) .

Calling Microphone() gives the error IOError: No Default Input Device Available .

As the error says, the program doesn’t know which microphone to use.

To proceed, either use Microphone(device_index=MICROPHONE_INDEX, ...) instead of Microphone(...) , or set a default microphone in your OS. You can obtain possible values of MICROPHONE_INDEX using the code in the troubleshooting entry right above this one.

The program doesn’t run when compiled with PyInstaller .

As of PyInstaller version 3.0, SpeechRecognition is supported out of the box. If you’re getting weird issues when compiling your program using PyInstaller, simply update PyInstaller.

You can easily do this by running pip install --upgrade pyinstaller .

On Ubuntu/Debian, I get annoying output in the terminal saying things like “bt_audio_service_open: […] Connection refused” and various others.

The “bt_audio_service_open” error means that you have a Bluetooth audio device, but as a physical device is not currently connected, we can’t actually use it - if you’re not using a Bluetooth microphone, then this can be safely ignored. If you are, and audio isn’t working, then double check to make sure your microphone is actually connected. There does not seem to be a simple way to disable these messages.

For errors of the form “ALSA lib […] Unknown PCM”, see this StackOverflow answer . Basically, to get rid of an error of the form “Unknown PCM cards.pcm.rear”, simply comment out pcm.rear cards.pcm.rear in /usr/share/alsa/alsa.conf , ~/.asoundrc , and /etc/asound.conf .

For “jack server is not running or cannot be started” or “connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)” or “attempt to connect to server failed”, these are caused by ALSA trying to connect to JACK, and can be safely ignored. I’m not aware of any simple way to turn those messages off at this time, besides entirely disabling printing while starting the microphone .

On OS X, I get a ChildProcessError saying that it couldn’t find the system FLAC converter, even though it’s installed.

Installing FLAC for OS X directly from the source code will not work, since it doesn’t correctly add the executables to the search path.

Installing FLAC using Homebrew ensures that the search path is correctly updated. First, ensure you have Homebrew, then run brew install flac to install the necessary files.

To hack on this library, first make sure you have all the requirements listed in the “Requirements” section.

To install/reinstall the library locally, run python -m pip install -e .[dev] in the project root directory .

Before a release, the version number is bumped in README.rst and speech_recognition/__init__.py . Version tags are then created using git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE" .

Releases are done by running make-release.sh VERSION_GOES_HERE to build the Python source packages, sign them, and upload them to PyPI.

To run all the tests:

To run static analysis:

To ensure RST is well-formed:

Testing is also done automatically by GitHub Actions, upon every push.

FLAC Executables

The included flac-win32 executable is the official FLAC 1.3.2 32-bit Windows binary .

The included flac-linux-x86 and flac-linux-x86_64 executables are built from the FLAC 1.3.2 source code with Manylinux to ensure that it’s compatible with a wide variety of distributions.

The built FLAC executables should be bit-for-bit reproducible. To rebuild them, run the following inside the project directory on a Debian-like system:

The included flac-mac executable is extracted from xACT 2.39 , which is a frontend for FLAC 1.3.2 that conveniently includes binaries for all of its encoders. Specifically, it is a copy of xACT 2.39/xACT.app/Contents/Resources/flac in xACT2.39.zip .

Please report bugs and suggestions at the issue tracker !

How to cite this library (APA style):

Zhang, A. (2017). Speech Recognition (Version 3.8) [Software]. Available from https://github.com/Uberi/speech_recognition#readme .

How to cite this library (Chicago style):

Zhang, Anthony. 2017. Speech Recognition (version 3.8).

Also check out the Python Baidu Yuyin API , which is based on an older version of this project, and adds support for Baidu Yuyin . Note that Baidu Yuyin is only available inside China.

Copyright 2014-2017 Anthony Zhang (Uberi) . The source code for this library is available online at GitHub .

SpeechRecognition is made available under the 3-clause BSD license. See LICENSE.txt in the project’s root directory for more information.

For convenience, all the official distributions of SpeechRecognition already include a copy of the necessary copyright notices and licenses. In your project, you can simply say that licensing information for SpeechRecognition can be found within the SpeechRecognition README, and make sure SpeechRecognition is visible to users if they wish to see it .

SpeechRecognition distributes source code, binaries, and language files from CMU Sphinx . These files are BSD-licensed and redistributable as long as copyright notices are correctly retained. See speech_recognition/pocketsphinx-data/*/LICENSE*.txt and third-party/LICENSE-Sphinx.txt for license details for individual parts.

SpeechRecognition distributes source code and binaries from PyAudio . These files are MIT-licensed and redistributable as long as copyright notices are correctly retained. See third-party/LICENSE-PyAudio.txt for license details.

SpeechRecognition distributes binaries from FLAC - speech_recognition/flac-win32.exe , speech_recognition/flac-linux-x86 , and speech_recognition/flac-mac . These files are GPLv2-licensed and redistributable, as long as the terms of the GPL are satisfied. The FLAC binaries are an aggregate of separate programs , so these GPL restrictions do not apply to the library or your programs that use the library, only to FLAC itself. See LICENSE-FLAC.txt for license details.

Project details

Release history release notifications | rss feed.

May 5, 2024

Mar 30, 2024

Mar 28, 2024

Dec 6, 2023

Mar 13, 2023

Dec 4, 2022

Dec 5, 2017

Jun 27, 2017

Apr 13, 2017

Mar 11, 2017

Jan 7, 2017

Nov 21, 2016

May 22, 2016

May 11, 2016

May 10, 2016

Apr 9, 2016

Apr 4, 2016

Apr 3, 2016

Mar 5, 2016

Mar 4, 2016

Feb 26, 2016

Feb 20, 2016

Feb 19, 2016

Feb 4, 2016

Nov 5, 2015

Nov 2, 2015

Sep 2, 2015

Sep 1, 2015

Aug 30, 2015

Aug 24, 2015

Jul 26, 2015

Jul 12, 2015

Jul 3, 2015

May 20, 2015

Apr 24, 2015

Apr 14, 2015

Apr 7, 2015

Apr 5, 2015

Apr 4, 2015

Mar 31, 2015

Dec 10, 2014

Nov 17, 2014

Sep 11, 2014

Sep 6, 2014

Aug 25, 2014

Jul 6, 2014

Jun 10, 2014

Jun 9, 2014

May 29, 2014

Apr 23, 2014

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages .

Source Distribution

Uploaded May 5, 2024 Source

Built Distribution

Uploaded May 5, 2024 Python 2 Python 3

Hashes for speechrecognition-3.10.4.tar.gz

Hashes for speechrecognition-3.10.4.tar.gz
Algorithm Hash digest
SHA256
MD5
BLAKE2b-256

Hashes for SpeechRecognition-3.10.4-py2.py3-none-any.whl

Hashes for SpeechRecognition-3.10.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256
MD5
BLAKE2b-256
  • português (Brasil)

Supported by

speechrecognition js

react-speech-recognition

  • 0 Dependencies
  • 59 Dependents
  • 41 Versions

A React hook that converts speech from the microphone to text and makes it available to your React components.

npm version

How it works

useSpeechRecognition is a React hook that gives a component access to a transcript of speech picked up from the user's microphone.

SpeechRecognition manages the global state of the Web Speech API, exposing functions to turn the microphone on and off.

Under the hood, it uses Web Speech API . Note that browser support for this API is currently limited, with Chrome having the best experience - see supported browsers for more information.

This version requires React 16.8 so that React hooks can be used. If you're used to version 2.x of react-speech-recognition or want to use an older version of React, you can see the old README here . If you want to migrate to version 3.x, see the migration guide here .

Useful links

Basic example, why you should use a polyfill with this library, cross-browser example, supported browsers, troubleshooting.

  • Version 3 migration guide
  • TypeScript declaration file in DefinitelyTyped

Installation

To install:

npm install --save react-speech-recognition

To import in your React code:

import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'

The most basic example of a component using this hook would be:

You can see more examples in the example React app attached to this repo. See Developing .

By default, speech recognition is not supported in all browsers, with the best native experience being available on desktop Chrome. To avoid the limitations of native browser speech recognition, it's recommended that you combine react-speech-recognition with a speech recognition polyfill . Why? Here's a comparison with and without polyfills:

  • ✅ With a polyfill, your web app will be voice-enabled on all modern browsers (except Internet Explorer)
  • ❌ Without a polyfill, your web app will only be voice-enabled on the browsers listed here
  • ✅ With a polyfill, your web app will have a consistent voice experience across browsers
  • ❌ Without a polyfill, different native implementations will produce different transcriptions, have different levels of accuracy, and have different formatting styles
  • ✅ With a polyfill, you control who is processing your users' voice data
  • ❌ Without a polyfill, your users' voice data will be sent to big tech companies like Google or Apple to be transcribed
  • ✅ With a polyfill, react-speech-recognition will be suitable for use in commercial applications
  • ❌ Without a polyfill, react-speech-recognition will still be fine for personal projects or use cases where cross-browser support is not needed

react-speech-recognition currently supports polyfills for the following cloud providers:

Speechly

You can find the full guide for setting up a polyfill here . Alternatively, here is a quick (and free) example using Speechly:

  • Install @speechly/speech-recognition-polyfill in your web app
  • You will need a Speechly app ID. To get one of these, sign up for free with Speechly and follow the guide here
  • Here's a component for a push-to-talk button. The basic example above would also work fine.

Detecting browser support for Web Speech API

If you choose not to use a polyfill, this library still fails gracefully on browsers that don't support speech recognition. It is recommended that you render some fallback content if it is not supported by the user's browser:

Without a polyfill, the Web Speech API is largely only supported by Google browsers. As of May 2021, the following browsers support the Web Speech API:

  • Chrome (desktop): this is by far the smoothest experience
  • Safari 14.1
  • Microsoft Edge
  • Chrome (Android): a word of warning about this platform, which is that there can be an annoying beeping sound when turning the microphone on. This is part of the Android OS and cannot be controlled from the browser
  • Android webview
  • Samsung Internet

For all other browsers, you can render fallback content using the SpeechRecognition.browserSupportsSpeechRecognition function described above. Alternatively, as mentioned before, you can integrate a polyfill .

Detecting when the user denies access to the microphone

Even if the browser supports the Web Speech API, the user still has to give permission for their microphone to be used before transcription can begin. They are asked for permission when react-speech-recognition first tries to start listening. At this point, you can detect when the user denies access via the isMicrophoneAvailable state. When this becomes false , it's advised that you disable voice-driven features and indicate that microphone access is needed for them to work.

Controlling the microphone

Before consuming the transcript, you should be familiar with SpeechRecognition , which gives you control over the microphone. The state of the microphone is global, so any functions you call on this object will affect all components using useSpeechRecognition .

Turning the microphone on

To start listening to speech, call the startListening function.

This is an asynchronous function, so it will need to be awaited if you want to do something after the microphone has been turned on.

Turning the microphone off

To turn the microphone off, but still finish processing any speech in progress, call stopListening .

To turn the microphone off, and cancel the processing of any speech in progress, call abortListening .

Consuming the microphone transcript

To make the microphone transcript available in your component, simply add:

Resetting the microphone transcript

To set the transcript to an empty string, you can call the resetTranscript function provided by useSpeechRecognition . Note that this is local to your component and does not affect any other components using Speech Recognition.

To respond when the user says a particular phrase, you can pass in a list of commands to the useSpeechRecognition hook. Each command is an object with the following properties:

  • command : This is a string or RegExp representing the phrase you want to listen for. If you want to use the same callback for multiple commands, you can also pass in an array here, with each value being a string or RegExp
  • command : The command phrase that was matched. This can be useful when you provide an array of command phrases for the same callback and need to know which one triggered it
  • resetTranscript : A function that sets the transcript to an empty string
  • matchInterim : Boolean that determines whether "interim" results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely - i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.
  • The value of command (with any special characters removed)
  • The speech that matched command
  • The similarity between command and the speech
  • The object mentioned in the callback description above
  • fuzzyMatchingThreshold : If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true . It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8 .
  • bestMatchOnly : Boolean that, when isFuzzyMatch is true , determines whether the callback should only be triggered by the command phrase that best matches the speech, rather than being triggered by all matching fuzzy command phrases. This is useful for fuzzy commands with multiple command phrases assigned to the same callback function - you may only want the callback to be triggered once for each spoken command. You should set this only if isFuzzyMatch is true . The default value is false .

Command symbols

To make commands easier to write, the following symbols are supported:

  • Splats ( * ): match a multi-word section of speech. Example: 'I would like to order *' . The words that match the splat will be passed into the callback, one argument per splat.
  • Named variables ( :name ): match a single word. Example: 'I am :height metres tall' . The one word that matches the named variable will be passed into the callback.
  • Optional words, wrapped in parentheses: words that may or may not be spoken. Example: 'Pass the salt (please)' . This example would match both 'Pass the salt' and 'Pass the salt please'.

Example with commands
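The original example is not included above; the sketch below simply combines the command properties just described (the command phrases and component name are arbitrary):

```jsx
import React, { useState } from 'react';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

const Dashboard = () => {
  const [message, setMessage] = useState('');

  const commands = [
    {
      // The splat captures whatever the user orders and passes it to the callback
      command: 'I would like to order *',
      callback: (food) => setMessage(`Your order is for: ${food}`),
    },
    {
      // Fuzzy matching tolerates small differences in the spoken phrase
      command: 'clear the message',
      callback: () => setMessage(''),
      isFuzzyMatch: true,
    },
  ];

  const { transcript } = useSpeechRecognition({ commands });

  return (
    <div>
      <button onClick={() => SpeechRecognition.startListening()}>Start listening</button>
      <p>Message: {message}</p>
      <p>Transcript: {transcript}</p>
    </div>
  );
};

export default Dashboard;
```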

Continuous listening

By default, the microphone will stop listening when the user stops speaking. This reflects the approach taken by "press to talk" buttons on modern devices.

If you want to listen continuously, set the continuous property to true when calling startListening . The microphone will continue to listen, even after the user has stopped speaking.
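For example:

```js
import SpeechRecognition from 'react-speech-recognition';

// Keep the microphone on until stopListening or abortListening is called
SpeechRecognition.startListening({ continuous: true });
```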

Be warned that not all browsers have good support for continuous listening. Chrome on Android in particular constantly restarts the microphone, leading to a frustrating and noisy (from the beeping) experience. To avoid enabling continuous listening on these browsers, you can make use of the browserSupportsContinuousListening state from useSpeechRecognition to detect support for this feature.

Alternatively, you can try one of the polyfills to enable continuous listening on these browsers.

Changing language

To listen for a specific language, you can pass a language tag (e.g. 'zh-CN' for Chinese) when calling startListening . See here for a list of supported languages.
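For example:

```js
import SpeechRecognition from 'react-speech-recognition';

// Listen for Chinese; the language option takes a BCP 47 language tag
SpeechRecognition.startListening({ language: 'zh-CN' });
```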

regeneratorRuntime is not defined

If you see the error regeneratorRuntime is not defined when using this library, you will need to ensure your web app installs regenerator-runtime :

  • npm i --save regenerator-runtime
  • If you are using NextJS, put this at the top of your _app.js file: import 'regenerator-runtime/runtime' . For any other framework, put it at the top of your index.js file

How to use react-speech-recognition offline?

Unfortunately, speech recognition will not function in Chrome when offline. According to the Web Speech API docs : On Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

If you are building an offline web app, you can detect when the browser is offline by inspecting the value of navigator.onLine . If it is true , you can render the transcript generated by React Speech Recognition. If it is false , it's advisable to render offline fallback content that signifies that speech recognition is disabled. The online/offline API is simple to use - you can read how to use it here .
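A minimal sketch of that fallback (component name and message are illustrative):

```jsx
import React from 'react';
import { useSpeechRecognition } from 'react-speech-recognition';

const TranscriptOrFallback = () => {
  const { transcript } = useSpeechRecognition();

  // navigator.onLine is false when the browser has no network connection
  if (!navigator.onLine) {
    return <p>Speech recognition needs an internet connection.</p>;
  }
  return <p>{transcript}</p>;
};

export default TranscriptOrFallback;
```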

You can run an example React app that uses react-speech-recognition (see the repository for the exact commands to launch it).

On http://localhost:3000 , you'll be able to speak into the microphone and see your speech as text on the web page. There are also controls for turning speech recognition on and off. You can make changes to the web app itself in the example directory. Any changes you make to the web app or react-speech-recognition itself will be live reloaded in the browser.

View the API docs here or follow the guide above to learn how to use react-speech-recognition .


How to Create WebVTT Files for Videos in Node.js

Learn how to create WebVTT subtitle files for videos using Node.js in this easy-to-follow guide.


WebVTT ( .vtt ), or Web Video Text Tracks Format, is a widely used and supported format for subtitles in videos.
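The original article shows the opening lines of the WebVTT file for a specific YouTube video, which is not reproduced here. In general, a WebVTT file starts with a WEBVTT header followed by timestamped cues, for example (illustrative cue text):

```
WEBVTT

00:00:00.000 --> 00:00:03.200
Hello and welcome to this video.

00:00:03.200 --> 00:00:07.500
Today we're looking at automatic subtitles.
```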

In this guide, you'll learn how to create WebVTT files for videos using Node.js and the AssemblyAI API.

Step 1: Set up your development environment

First, install Node.js 18 or higher on your system. Next, create a new project folder, change directories to it, and initialize a new Node.js project:
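For example (the folder name is arbitrary):

```bash
mkdir vtt-subtitles
cd vtt-subtitles
npm init -y
```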

Open the package.json file and add type: "module", to the list of properties.

This will tell Node.js to use the ES Module syntax for exporting and importing modules, and not to use the old CommonJS syntax.
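After the change, package.json should contain something like this (other fields omitted; the name comes from the folder created above):

```json
{
  "name": "vtt-subtitles",
  "version": "1.0.0",
  "type": "module"
}
```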

Then, install the AssemblyAI JavaScript SDK which makes it easier to interact with the AssemblyAI API:
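The SDK is published on npm as assemblyai:

```bash
npm install assemblyai
```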

Next, you need an AssemblyAI API key that you can find on your dashboard . If you don't have an AssemblyAI account, first sign up for free . Once you’ve copied your API key, configure it as the ASSEMBLYAI_API_KEY environment variable on your machine:
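For example, on macOS or Linux (replace the placeholder with your own key):

```bash
export ASSEMBLYAI_API_KEY=<YOUR_API_KEY>
```

On Windows PowerShell, use `$env:ASSEMBLYAI_API_KEY="<YOUR_API_KEY>"` instead.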

Step 2: Transcribe your video

Now that your development environment is ready, you can start transcribing your video files. In this tutorial, you'll use this video in MP4 format . The AssemblyAI SDK can transcribe any audio or video file that’s publicly accessible via a URL, but you can also specify local files. Create a file called index.js and add the following code:
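The original listing is not included above; a minimal sketch using the SDK's transcripts.transcribe method looks like this (the video URL is a placeholder):

```js
// index.js
import { AssemblyAI } from 'assemblyai';

// The API key is read from the environment variable configured in Step 1
const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});

// Publicly accessible URL (or local path) of the file to transcribe
const audioUrl = 'https://example.com/video.mp4';

// Upload the file for transcription and wait for the result
const transcript = await client.transcripts.transcribe({
  audio: audioUrl,
});
```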

If the transcription is successful, the transcript object will be populated with the transcript text and many additional properties. However, you should verify whether an error occurred and log the error.

Add the following lines of JavaScript:
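A check along these lines is enough (the status and error properties come from the transcript returned by the API):

```js
if (transcript.status === 'error') {
  console.error(`Transcription failed: ${transcript.error}`);
  process.exit(1);
}
```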

Step 3: Generate the WebVTT file

Now that you have a transcript, you can generate the subtitles in WebVTT format. Add the following import which you'll need to save the WebVTT file to disk.

Then add the following code to generate the WebVTT subtitles from the transcript and download the VTT file to disk.
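A sketch of both steps, assuming the SDK's transcripts.subtitles method (the fs import belongs at the top of index.js):

```js
import { writeFileSync } from 'fs';

// Generate WebVTT subtitles from the completed transcript
const vtt = await client.transcripts.subtitles(transcript.id, 'vtt');

// Save the subtitles next to the script
writeFileSync('./subtitles.vtt', vtt);
```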

You can customize the maximum number of characters per caption by specifying the third parameter ( chars_per_caption ).

SRT subtitle format

SRT is another widely supported and popular subtitle format. To generate SRT instead of WebVTT, replace `"vtt"` with `"srt"` and save the file with the .srt extension.

Step 4: Run the script

To run the script, go back to your shell and run:
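Assuming the script is called index.js as above:

```bash
node index.js
```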

After a couple of seconds, you'll see a new file on disk, subtitles.vtt , containing the generated captions.

Now that you have your subtitle file, you can configure it in your video player, or if you're creating a YouTube video, upload it to YouTube Studio. You can also use other tools to bundle or even burn the subtitles into your video.

Check out our Audio Intelligence models and LeMUR to add even more capabilities to your audio and video applications.

Alternatively, check out our blog or YouTube channel for educational content on AI and Machine Learning, or join us on Twitter or Discord to stay in the loop when we release new content.



Speech Recognition API in Microsoft Edge (Not defined)

I have been attempting to use the SpeechRecognition API ( https://wicg.github.io/speech-api/#examples-recognition ) in a recent project. I am currently using Microsoft Edge, and according to https://caniuse.com/#feat=speech-recognition the API is only partially supported there. From the looks of it, the text-to-speech feature (SpeechSynthesis) is supported in Edge but the speech recognition feature is not: no matter what prefix I use for the SpeechRecognition (speech-to-text) API in Edge, it is not recognised and reports that it "is not defined".

Anyone have any clarity on this situation, or know how to get the Speech Recognition to work with edge in JavaScript?

  • speech-recognition
  • microsoft-edge
  • speech-to-text


  • 1 It is partially supported in Edge Chromium - not the older Edge versions. Which Edge version do you have? –  fredrik Commented Mar 6, 2020 at 10:05
  • As far as i'm aware I have the most up to date version. I have seen it is partially supported in edge and I have the "Text to speech" api working in edge, so I assumed as the Speech to Text API wasn't working, that was one feature not yet working in edge. –  Ethan Venencia Commented Mar 6, 2020 at 10:09
  • 1 Check the about page to know which version it is you have. If you don't have the Chromium version there is probably nothing that can be done about it. –  fredrik Commented Mar 6, 2020 at 10:11
  • 1 If you don't have Edge Chromium there is probably no way of adding support for the speech recoginition api. You can always try to find a polyfill if you want. –  fredrik Commented Mar 6, 2020 at 10:18
  • 1 SpeechRecognition API indeed has some issues in Edge Chromium. I've seen other threads about this . I also provide feedback about the issues using SpeechRecognition API in Edge Chromium. Let's wait and see if the Edge team will fix it in future versions. –  Yu Zhou Commented Mar 31, 2020 at 7:48

UPDATE: As of 1/18/2022 the Speech Recognition part of the JavaScript Web Speech API seems to be working in Edge Chromium. Microsoft seems to be experimenting with it in Edge. It is automatically adding punctuation and there seems to be no way to disable auto punctuation. I'm not sure about all the languages it supports. But it seems to be working so far in English, Spanish, German, French, Chinese Simplified and Japanese. I'm leaving the information below for history.

As of 6/4/2020 Edge Chromium does not really support the Speech Recognition part of the Web Speech API. Microsoft seems to be working on it for Edge Chromium. It will probably never work for Edge Legacy (non-Chromium).

developer.microsoft.com says incorrectly that it is "Supported" but also says, "Working draft or equivalent". (UPDATE: As of 2/18/2021 it now says: "NOT SUPPORTED")

developer.mozilla.org compatibility table also incorrectly says that it is supported in Edge.

caniuse correctly shows that it is not supported in Edge Chromium: the API appears to be present, but the proper events are not fired.

The only other browsers besides Chrome and Chromium in which I have seen the Speech Recognition part of the Web Speech API work are Brave and Yandex. Yandex probably connects to a server in Russia to process the speech recognition; it does not do a good job, at least in English. At the moment Brave is returning a "Network" error. According to this GitHub Brave discussion, Brave would have to pay Google in order to get the speech-to-text service.

Here is some quick code that can be used to test if Speech Recognition works in a browser and display all the errors and events in the body. It only works with https protocol. It does not seem to work with codepen or jsfiddle.
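The answer's snippet is not reproduced above; a sketch in the same spirit, which wires a logger to every SpeechRecognition event handler, looks like this:

```js
var sr = window.SpeechRecognition || window.webkitSpeechRecognition;
var recognition = new sr();

// Event handler properties defined on the SpeechRecognition interface
var event_list = [
  'onaudiostart', 'onaudioend', 'onend', 'onerror', 'onnomatch',
  'onresult', 'onsoundstart', 'onsoundend', 'onspeechstart',
  'onspeechend', 'onstart'
];

event_list.forEach(function (e) {
  recognition[e] = function (event) {
    // Show each event (and any error) directly in the page body
    var text = e + (event && event.error ? ': ' + event.error : '');
    document.body.appendChild(document.createTextNode(text));
    document.body.appendChild(document.createElement('br'));
  };
});

recognition.start();
```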


  • How does chromium perform speech recognition? Is it done by a library? If so which one? Is it some internal work? Does it rely on a server for doing speech recognition? –  Stephan Commented Jul 20, 2020 at 19:37
  • 2 @Stephan Google programmed Chrome and Chromium to use the Web Speech API . It uses javascript. It does contact a server that Google has setup. That is not on the front end though and Google does not really mention it. –  Jeff Baker Commented Jul 21, 2020 at 1:08


SpeechRecognition: lang property

The lang property of the SpeechRecognition interface returns and sets the language of the current SpeechRecognition . If not specified, this defaults to the HTML lang attribute value, or the user agent's language setting if that isn't set either.

Its value is a string representing the BCP 47 language tag for the current SpeechRecognition .

This code is excerpted from our Speech color changer example.
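The excerpt itself is not included above; a minimal sketch of setting lang looks like this:

```js
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// Recognize British English; any BCP 47 language tag works here
recognition.lang = 'en-GB';
```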
