
Java Speech API Frequently Asked Questions

This collection of frequently asked questions (FAQ) provides brief answers to many common questions about the Java Speech API (JSAPI).

Question Index

Download Questions

Where can I get the Java Speech API (JSAPI)?

The Java Speech API (JSAPI) is not part of the JDK, and Sun does not ship an implementation of JSAPI. Instead, we work with third-party speech companies to encourage the availability of multiple implementations.

API Questions

What is the Java Speech API (JSAPI)?

The Java Speech API allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API to support command and control recognizers, dictation systems and speech synthesizers.

What does the Java Speech API specification include?

The Java Speech API specification includes the Javadoc-style API documentation for the approximately 70 classes and interfaces in the API. The specification also includes a detailed Programmer's Guide which explains both introductory and advanced speech application programming with JSAPI. Two companion specifications are available: JSML and JSGF.

The specification does not yet include the .class files needed to compile applications against JSAPI.

What are JSML and JSGF?

The Java Speech API Markup Language (JSML) and the Java Speech API Grammar Format (JSGF) are companion specifications to the Java Speech API. JSML (currently in beta) defines a standard text format for marking up text for input to a speech synthesizer. JSGF version 1.0 defines a standard text format for providing a grammar to a speech recognizer.
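As a rough illustration (element and rule syntax paraphrased from the two specifications; consult them for the exact grammar), a minimal JSGF grammar looks like this:

```
#JSGF V1.0;
grammar commands;

// a command is an action followed by an object
public <command> = <action> <object>;
<action> = open | close;
<object> = [the] (door | window);
```

and marked-up synthesizer input in JSML looks like this:

```
<jsml>
  <div type="sent">
    The meeting starts at <sayas class="time">9:30</sayas>.
    <emp>Please</emp> be on time.
  </div>
</jsml>
```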

How was the JSAPI specification developed?

Sun Microsystems, Inc. worked in partnership with leading speech technology companies to define the initial specification of the Java Speech API, JSML and JSGF. Sun is grateful for the contributions of:

  • Apple Computer, Inc.
  • Dragon Systems, Inc.
  • IBM Corporation
  • Novell, Inc.
  • Philips Speech Processing
  • Texas Instruments Incorporated

How does JSAPI relate to other Java APIs?

The Java Speech API is part of a family of APIs that work together as a suite to provide customers with enhanced graphics and extended communications capabilities. These APIs include the

  • Java 2D API
  • Java 3D API
  • Java Advanced Imaging API
  • Java Sound API
  • Java Telephony API

Implementation Questions

What JSAPI implementations are now available?

The Java Speech API is a freely available specification and therefore anyone is welcome to develop an implementation. The following implementations are known to exist.

Note: Sun Microsystems, Inc. makes no representations or warranties about the suitability of the software listed here, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. The implementations listed here have not been tested with regard to compliance with the JSAPI specification, nor does their appearance on this page imply any form of endorsement of compliance on the part of Sun.

FreeTTS

  • Description: Open source speech synthesizer written entirely in the Java programming language.
  • Requirements: JDK 1.4. Read about further requirements on the FreeTTS web site.
IBM Speech for Java

  • Description: Implementation based on IBM's ViaVoice product, which supports continuous dictation, command and control, and speech synthesis. It supports all the European language versions of ViaVoice -- US & UK English, French, German, Italian and Spanish -- plus Japanese.
  • Requirements: JDK 1.1.7 or later, or JDK 1.2, on Windows 95 with 32MB or Windows NT with 48MB. Both platforms also require an installation of ViaVoice 98.
Speech for Java on Linux

  • Description: Beta version of "Speech for Java" on Linux. Currently only supports speech recognition.
  • Requirements: Red Hat Linux 6.0 with 32MB, and Blackdown JDK 1.1.7 with native thread support.
  • Description: Implementation for use with any recognition/TTS speech engine compliant with Microsoft's SAPI5 (with SAPI4 support for TTS engines only). An additional package allows redirection of audio data to/from Files, Lines and remote clients (using the javax.sound.sampled package). Some examples demonstrate its use in applets in Netscape and IE browsers.
  • Requirements: JDK 1.1 or better, Windows 98, Me, 2000 or NT, and any SAPI 5.1, 5.0 or 4.0 compliant speech engine (some of which can be downloaded from Microsoft's web site).

Lernout & Hauspie's TTS for Java Speech API

  • Description: Implementations based upon ASR1600 and TTS3000 engines, which support command and control and speech synthesis. Supports 10 different voices and associated whispering voices for the English language. Provides control for pitch, pitch range, speaking rate, and volume.
  • Requirements: Sun Solaris OS version 2.4 or later, JDK 1.1.5. Sun Swing package (free download) for graphical Type-n-Talk demo.
  • More information: Contact Edmund Kwan, Director of Sales, Western Region Speech and Language Technologies and Solutions ([email protected])

Conversa Web 3.0

  • Description: Conversa Web is a voice-enabled Web browser that provides a range of facilities for voice-navigation of the web by speech recognition and text-to-speech. The developers of Conversa Web chose to write a JSAPI implementation for the speech support.
  • Requirements: Windows 95/98 or NT 4.0 running on Intel Pentium 166 MHz processor or faster (or equivalent). Minimum of 32 MB RAM (64 MB recommended). Multimedia system: sound card and speakers. Microsoft Internet Explorer 4.0 or higher.
Festival

  • Description: Festival is a general multi-lingual speech synthesis system developed by the Centre for Speech Technology Research at the University of Edinburgh. It offers a full text-to-speech system with various APIs, as well as an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control and provides a binding to the Java Speech API. Supports the English (British and American), Spanish and Welsh languages.
  • Requirements: Festival runs on Suns (SunOS and Solaris), FreeBSD, Linux, SGIs, HPs and DEC Alphas, and is portable to other Unix machines. Preliminary support is available for Windows 95 and NT. For details and requirements see the Festival download page.

Elan Speech Cube

  • Description: Elan Speech Cube is a multilingual, multichannel, cross-operating-system text-to-speech software component for client-server architectures. Speech Cube is available with two TTS technologies (Elan Tempo: diphone concatenation, and Elan Sayso: unit selection), covering 11 languages. The Speech Cube native Java client supports JSAPI/JSML.
  • Requirements: JDK 1.3 or later on Windows NT/2000/XP, Linux or Solaris 2.7/2.8, Speech Cube V4.2 and higher.
  • About Elan Speech: Elan Speech is an established worldwide provider of text-to-speech technology (TTS). Elan TTS transforms any IT generated text into speech and reads it out loud.

How do I use JSAPI in an applet?

It is possible to use JSAPI in an applet, but users will need the Java Plug-in. The reason is that JSAPI implementations require access to the AWT EventQueue, and the built-in JDK support in the browsers we've worked with denies any applet access to the AWT EventQueue. The Java Plug-in doesn't have this restriction, and users can configure it to grant or deny applet access to the AWT EventQueue.

If you are using JRE 1.1:

Have your users follow these steps if your applet is based upon JDK 1.1:

  • Obtain a JDK 1.1.7 or better Java Runtime Environment (JRE). We have had problems with applet security being denied under JDK 1.1.6. Note that the user needs the JRE, not the JDK. The JRE is freely available for download from the following URL:
  • Before running the browser, have the user modify their CLASSPATH environment variable to include the supporting classes for JSAPI. For example, if the user has IBM's Speech for Java, have the user include the ibmjs.jar file in CLASSPATH.
  • Make sure any shared libraries for the JSAPI support are in the user's PATH. For example, if the user has IBM's Speech For Java, have the user include the ibmjs lib directory in their PATH (e.g., c:\ibmjs\lib).
  • Have the user copy the speech.properties to their home directory. A user can determine their home directory by enabling the console for the Java Plug-in. When the user accesses a page that uses the Java Plug-in, the Java Plug-in console will tell the user what it thinks the user's home directory is.
  • Use javakey to add your identity to their signature database (i.e., identitydb.obj). This will tell the Java Plug-in to trust applets signed by you.
  • Copy the identitydb.obj that was created or updated in previous step to the user's home directory (the same place where the user copied speech.properties).

Then perform these steps on your applet:

  • Use javakey to both create a signature database for your system and to sign your applet's jar file. This will allow the applet to participate in the security model.
  • Create an HTML page that uses your applet in the Plug-in.
  • If the user experiences a "checkread" exception while attempting to run your applet, it's most likely due to a mismatch between the user's identitydb.obj file and the signature on your applet's jar file. A way to remedy this is to recreate your identitydb.obj and re-sign your jar file.

If you are using JRE 1.2:

The Java 2 platform's security model allows signing as done with JDK 1.1, but it also permits finer grained access control. The following are just some examples, and we recommend you read the Java Security Architecture Specification at the following URL before deciding what to do:

For a quick start, have your users do the following if your applet uses the Java 2 (i.e., JDK 1.2) platform:

  • Obtain the JDK 1.2 Plug-in.
  • Grant all applets the AllPermission property. This is extremely dangerous and is only provided as an example. To do this, have the user modify their java.policy file to contain only the following lines:

    grant {
      permission java.security.AllPermission;
    };

  • Grant permissions to a particular URL (e.g., the URL containing your applet). To do this, have the user add the following lines to their java.policy file:

    grant codeBase "http://your.url.here" {
      permission java.security.AllPermission;
    };

The information in this FAQ is not meant to be a complete tutorial on the JDK 1.1 and JDK 1.2 security architectures. Rather, it is hopefully enough to get you started with running JSAPI applets in a browser. We suggest you visit the following URLs to obtain more information on the Java security models:

Java Security Home Page: link

Tutorial on JDK 1.1 Security: link

Tutorial on JDK 1.2 Security: link

Why does Netscape Navigator or Internet Explorer throw a security exception when I use JSAPI in an applet?

JSAPI implementations require access to the AWT EventQueue. The built-in Java platform support in the browsers we've worked with denies an applet access to the AWT EventQueue. As a result, JSAPI implementations will be denied access to the AWT EventQueue. In addition, we are not aware of a way to configure the built-in Java platform support in these environments to allow access to the AWT EventQueue.

The Java Plug-in (see link), however, can be configured to allow an applet the permissions it needs to use an implementation of JSAPI. As a result, we currently recommend using the Java Plug-in for applets that use JSAPI.

I'm concerned about JSAPI applets "bugging" my office. What are the plans for JSAPI and security on JDK 1.2?

The JSAPI 1.0 specification includes the SpeechPermission class that currently only supports one SpeechPermission: javax.speech. When that permission is granted, an application or applet has access to all the capabilities provided by installed speech recognizers and synthesizers. Without that permission, an application or applet has no access to speech capabilities.
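As a sketch, a user could grant this permission to applets from a specific host using the Java 2 policy-file syntax (the target string "javax.speech" follows the description above; check your implementation's documentation for the exact form):

```
grant codeBase "http://your.url.here" {
  permission javax.speech.SpeechPermission "javax.speech";
};
```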

As speech technology matures it is anticipated that a finer-grained permission model will be introduced to provide access by applications and applets to some, but not all, speech capabilities.

Before granting speech permission, developers and users should consider the potential impact of the grant.

Does JSAPI allow me to control the audio input source of a recognizer or redirect the audio output of a speech synthesizer?

This support is currently not in JSAPI. We plan to use the Java Sound API to help provide this support in the future. We purposely left room for expansion in the javax.speech.AudioManager interface and will further investigate this support after the Java Sound API is finalized.

Introduction to the Java Speech API

By Nathan Tippy, OCI Senior Software Engineer

Speech Synthesis

Speech synthesis, also known as text-to-speech (TTS) conversion, is the process of converting text into human recognizable speech based on language and other vocal requirements. Speech synthesis can be used to enhance the user experience in many situations but care must be taken to ensure the user is comfortable with its use.

Speech synthesis has proven to be a great benefit in many ways. It is often used to assist the visually impaired as well as provide safety and efficiency in situations where the user needs to keep his eyes focused elsewhere. In the most successful applications of speech synthesis it is often central to the product requirements. If it is added on as an afterthought or a novelty it is rarely appreciated; people have high expectations when it comes to speech.

Natural sounding speech synthesis has been the goal of many development teams for a long time, yet it remains a significant challenge. People learn to speak at a very young age and continue to use their speaking and listening skills over the course of their lives, so it is very easy for people to recognize even the most minor flaws in speech synthesis.

As humans, it is easy to take our ability to speak for granted, but it is really a very complex process. There are a few different ways to implement a speech synthesis engine, but in general they all complete the following steps:

[Figure: the processing steps of a speech synthesis engine]
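One of those steps, text normalization (expanding abbreviations and digits into speakable words), can be sketched in plain Java. This is a toy illustration, not part of JSAPI or any real engine:

```java
import java.util.HashMap;
import java.util.Map;

public class Normalize {
    // toy expansion table: a real engine uses large lexicons
    // and context-sensitive rules rather than a flat lookup
    static final Map<String, String> EXPANSIONS = new HashMap<>();
    static {
        EXPANSIONS.put("Dr.", "Doctor");
        EXPANSIONS.put("St.", "Street");
        EXPANSIONS.put("3", "three");
    }

    // replace each whitespace-separated token that has a known expansion
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            out.append(EXPANSIONS.getOrDefault(word, word)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Dr. Smith lives at 3 Elm St."));
        // → Doctor Smith lives at three Elm Street
    }
}
```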

There are many voices available to developers today. Most of them are very good and a few are quite exceptional in how natural they sound. I put together a collection of both commercial and non-commercial voices so you can listen to them without having to set up or install anything.

Unfortunately, the best voices (as of the time of this writing) are commercial, so works produced using them cannot be redistributed without fees. Depending on how many voices you use and what you are using them for, the annual costs for distribution rights can run from hundreds to thousands of dollars each year. Many vendors also provide different fee schedules for distributing applications that use a voice versus audio files and/or streams produced from the voices.

Java Speech API (JSAPI)

The goal of JSAPI is to enable cross-platform development of voice applications. The JSAPI enables developers to write applications that do not depend on the proprietary features of one platform or one speech engine.

Decoupling the engine from the application is important. As you can hear from the voice demo page, there is a wide variety of voices with different characteristics. Some users will be comfortable with a deep male voice while others may be more comfortable with a British female voice. The choice of speech engine and voice is subjective and may be expensive. In most cases, end users will use a single speech engine for multiple applications, so they will expect any new speech-enabled applications to integrate easily.

The Java Speech API 1.0 was first released by Sun in 1998 and defines packages for both speech recognition and speech synthesis. To remain brief, the remainder of this article will focus on the speech synthesis package; if you would like to know more about speech recognition, visit the CMU Sphinx sourceforge.net project.

All the JSAPI implementations available today are compliant with 1.0 or a subset of 1.0, but work is progressing on version 2.0 (JSR113) of the API. We will be using the open source implementation from FreeTTS for our demo app, but there are other implementations, such as the one from CloudGarden, which provides support for the SAPI5 voices that Microsoft Windows uses.

Important Classes and Interfaces

Class: javax.speech.Central

This singleton class is the main interface for access to the speech engine facilities. It has a bad name (much too generic), but as part of the upgrade to version 2.0 it will be renamed EngineManager, a much better name for what it does.

For our example, we will only use the availableSynthesizers and createSynthesizer methods. Both of these methods need a mode description, which is the next class we will use.

Class: javax.speech.synthesis.SynthesizerModeDesc

This simple bean holds all the required properties of the Synthesizer. When requesting a specific Synthesizer or a list of available Synthesizers, this object can be passed in with specific properties to restrict the results to only those Synthesizers matching the defined properties. The list of properties includes the engine name, mode name, locale and running synthesizer.

The mode name property is not implemented with a typesafe enumeration, and it should only be set to the string value 'general' or 'time' when using the FreeTTS implementation. The mode name is specific to the engine, and in this case restricts the synthesizer to those that can speak any text or those that can only speak the time. If a time-only synthesizer is used for reading general text, it will attempt to read it, printing error messages when it encounters phonemes it cannot pronounce.

The locale property can be used to select among international synthesizers, which may support many languages. See the MBROLA project for some international examples.

The running synthesizer property is used to limit the synthesizers returned to only those that are already loaded into memory. Because some synthesizers can take a long time to load into memory this feature may be helpful in limiting runtime delays.

Class: javax.speech.synthesis.Synthesizer

This class is used for converting text into speech using the selected voice. Synthesizers must be allocated before they can be used, and this may take some time if high-quality voices backed by large data files are supported. It is recommended that the allocate method be called on startup from a background thread, and that deallocate be called only when the application is about to exit. Once you have an allocated synthesizer, it can be kept for the life of the application. Note, in the chart below, the ALLOCATING RESOURCES and DEALLOCATING RESOURCES states that the synthesizer passes through while completing the allocate and deallocate operations, respectively.

[Figure: Synthesizer Allocate/Deallocate state diagram]
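The advice to allocate on a background thread can be sketched without the speech libraries. SlowEngine here is a hypothetical stand-in for a JSAPI Synthesizer whose allocate() call takes a long time to return:

```java
public class BackgroundAllocate {
    static class SlowEngine {
        private volatile boolean allocated = false;

        // stand-in for Synthesizer.allocate(): slow, blocking call
        void allocate() {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            allocated = true;
        }

        boolean isAllocated() { return allocated; }
    }

    public static void main(String[] args) throws InterruptedException {
        SlowEngine engine = new SlowEngine();
        // kick off allocation early, during application startup
        Thread allocator = new Thread(engine::allocate);
        allocator.start();
        // ... build the UI, parse arguments, etc. ...
        // block only at the point where speech is actually needed
        allocator.join();
        System.out.println("allocated=" + engine.isAllocated());
    }
}
```

The same pattern applies in reverse on shutdown: deallocate before exiting so native resources are released.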

Class: javax.speech.synthesis.Voice

This simple bean holds the properties of the voice. The name, age and gender can be set along with a Boolean to indicate that only voices already loaded into memory should be used. The  setVoice  method uses these properties to select a voice matching the required properties. After a voice is selected the  getVoice  method can be called to get the properties of the voice currently being used.

Note that the age and gender parameters are integers and do not use a typesafe enumeration. If an invalid value is used, a PropertyVetoException will be thrown. The valid constants for these fields are defined on the Voice class.

Interface: javax.speech.synthesis.Speakable

This interface should be implemented by any object that will produce marked-up text to be spoken. The specification for JSML can be found online and is very similar to the W3C's Speech Synthesis Markup Language (SSML) specification, which will be used instead of JSML for the 2.0 release.

Interface: javax.speech.synthesis.SpeakableListener

This interface should be implemented by any object wishing to listen to speech events. Notifications for events such as starting, stopping, pausing, resuming and others can be used to keep the application in sync with what the speech engine is doing.

Hello World

To try the demo you will need to set up the following:

Download freetts-1.2.1-bin.zip from http://sourceforge.net/projects/freetts/. FreeTTS only supports a subset of JSAPI 1.0, but it works well and has an easy-to-understand voice. Our JSML inflections will be ignored, but the markup will be parsed correctly.

Unzip the freetts-1.2.1-bin.zip file to a local folder. The D:\apps\ folder will be used for this example.

Go to D:\apps\freetts-1.2.1\lib and run jsapi.exe. This will create the jsapi.jar from Sun Microsystems; it is packaged this way because it uses a different license than FreeTTS's BSD license.

Add this new jar and all the other jars found in the D:\apps\freetts-1.2.1\lib folder to your classpath. This will give us the engine, the JSAPI interfaces and three voices to use in our demo.

Copy the  D:\apps\freetts-1.2.1\speech.properties  file to your  %user.home%  or  %java.home%/lib  folders. This file is used by JSAPI to determine which speech engine will be used.
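For reference, speech.properties is a plain Java properties file that points JSAPI's Central class at an engine's EngineCentral implementation. The FreeTTS copy contains a line similar to the following (property name and class taken from the FreeTTS 1.2 distribution; verify against your copy):

```
FreeTTSSynthEngineCentral=com.sun.speech.freetts.jsapi.FreeTTSEngineCentral
```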

Compile the three demo files below and run BriefVoiceDemo from the command line.


  • package com.ociweb.jsapi ;
  • import java.beans.PropertyVetoException ;
  • import java.io.File ;
  • import java.text.DateFormat ;
  • import java.text.SimpleDateFormat ;
  • import java.util.Date ;
  • import java.util.Locale ;
  • import javax.speech.AudioException ;
  • import javax.speech.Central ;
  • import javax.speech.EngineException ;
  • import javax.speech.EngineList ;
  • import javax.speech.EngineModeDesc ;
  • import javax.speech.EngineStateError ;
  • import javax.speech.synthesis.JSMLException ;
  • import javax.speech.synthesis.Speakable ;
  • import javax.speech.synthesis.SpeakableListener ;
  • import javax.speech.synthesis.Synthesizer ;
  • import javax.speech.synthesis.SynthesizerModeDesc ;
  • import javax.speech.synthesis.Voice ;
  • public class BriefVoiceDemo {
  • Synthesizer synthesizer ;
  • public static void main ( String [ ] args ) {
  • //default synthesizer values
  • SynthesizerModeDesc modeDesc = new SynthesizerModeDesc (
  • null , // engine name
  • "general" , // mode name use 'general' or 'time'
  • Locale . US , // locale, see MBROLA Project for i18n examples
  • null , // prefer a running synthesizer (Boolean)
  • null ) ; // preload these voices (Voice[])
  • //default voice values
  • Voice voice = new Voice (
  • "kevin16" , //name for this voice
  • Voice. AGE_DONT_CARE , //age for this voice
  • Voice. GENDER_DONT_CARE , //gender for this voice
  • null ) ; //prefer a running voice (Boolean)
  • boolean error = false ;
  • for ( int r = 0 ; r < args. length ; r ++ ) {
  • String token = args [ r ] ;
  • String value = token. substring ( 2 ) ;
  • //overide some of the default synthesizer values
  • if ( token. startsWith ( "-E" ) ) {
  • modeDesc. setEngineName ( value ) ;
  • } else if ( token. startsWith ( "-M" ) ) {
  • modeDesc. setModeName ( value ) ;
  • } else
  • //overide some of the default voice values
  • if ( token. startsWith ( "-V" ) ) {
  • voice. setName ( value ) ;
  • } else if ( token. startsWith ( "-GF" ) ) {
  • voice. setGender ( Voice. GENDER_FEMALE ) ;
  • } else if ( token. startsWith ( "-GM" ) ) {
  • voice. setGender ( Voice. GENDER_MALE ) ;
  • //dont recognize this value so flag it and break out
  • System . out . println ( token +
  • " was not recognized as a supported parameter" ) ;
  • error = true ;
  • //The example starts here
  • BriefVoiceDemo briefExample = new BriefVoiceDemo ( ) ;
  • if ( error ) {
  • System . out . println ( "BriefVoiceDemo -E<ENGINENAME> " +
  • "-M<time|general> -V<VOICENAME> -GF -GM" ) ;
  • //list all the available voices for the user
  • briefExample. listAllVoices ( ) ;
  • System . exit ( 1 ) ;
  • //select synthesizer by the required parameters
  • briefExample. createSynthesizer ( modeDesc ) ;
  • //print the details of the selected synthesizer
  • briefExample. printSelectedSynthesizerModeDesc ( ) ;
  • //allocate all the resources needed by the synthesizer
  • briefExample. allocateSynthesizer ( ) ;
  • //change the synthesisers state from PAUSED to RESUME
  • briefExample. resumeSynthesizer ( ) ;
  • //set the voice
  • briefExample. selectVoice ( voice ) ;
  • //print the details of the selected voice
  • briefExample. printSelectedVoice ( ) ;
  • //create a listener to be notified of speech events.
  • SpeakableListener optionalListener = new BriefListener ( ) ;
  • //The Date and Time can be spoken by any of the selected voices
  • SimpleDateFormat formatter = new SimpleDateFormat ( "h mm" ) ;
  • String dateText = "The time is now " + formatter. format ( new Date ( ) ) ;
  • briefExample. speakTextSynchronously ( dateText, optionalListener ) ;
  • //General text like this can only be spoken by general voices
  • if ( briefExample. isModeGeneral ( ) ) {
  • //speak plain text
  • String plainText =
  • "Hello World, This is an example of plain text," +
  • " any markup like <jsml></jsml> will be spoken as is" ;
  • briefExample. speakTextSynchronously ( plainText, optionalListener ) ;
  • //speak marked-up text from Speakable object
  • Speakable speakableExample = new BriefSpeakable ( ) ;
  • briefExample. speakSpeakableSynchronously ( speakableExample,
  • optionalListener ) ;
  • //must deallocate the synthesizer before leaving
  • briefExample. deallocateSynthesizer ( ) ;
  •   * Select voice supported by this synthesizer that matches the required
  •   * properties found in the voice object. If no matching voice can be
  •   * found the call is ignored and the previous or default voice will be used.
  •   * @param voice required voice properties.
  • private void selectVoice ( Voice voice ) {
  • synthesizer. getSynthesizerProperties ( ) . setVoice ( voice ) ;
  • } catch ( PropertyVetoException e ) {
  • System . out . println ( "unsupported voice" ) ;
  • exit ( e ) ;
  •   * This method prepares the synthesizer for speech by moving it from the
  •   * PAUSED state to the RESUMED state. This is needed because all newly
  •   * created synthesizers start in the PAUSED state.
  •   * See Pause/Resume state diagram.
  •   * The pauseSynthesizer method is not shown but looks like you would expect
  •   * and can be used to pause any speech in process.
  • private void resumeSynthesizer ( ) {
  • //leave the PAUSED state, see state diagram
  • synthesizer. resume ( ) ;
  • } catch ( AudioException e ) {
  •   * The allocate method may take significant time to return depending on the
  •   * size and capabilities of the selected synthesizer. In a production
  •   * application this would probably be done on startup with a background thread.
  •   * This method moves the synthesizer from the DEALLOCATED state to the
  •   * ALLOCATING RESOURCES state and returns only after entering the ALLOCATED
  •   * state. See Allocate/Deallocate state diagram.
  • private void allocateSynthesizer ( ) {
  • //ensure that we only do this when in the DEALLOCATED state
  • if ( ( synthesizer. getEngineState ( ) & Synthesizer . DEALLOCATED ) != 0 )
  • //this call may take significant time
  • synthesizer. getEngineState ( ) ;
  • synthesizer. allocate ( ) ;
  • } catch ( EngineException e ) {
  • e. printStackTrace ( ) ;
  • } catch ( EngineStateError e ) {
  •   * deallocate the synthesizer. This must be done before exiting or
  •   * you will run the risk of having a resource leak.
  •   * This method moves the synthesizer from the ALLOCATED state to the
  •   * DEALLOCATING RESOURCES state and returns only after entering the
  •   * DEALLOCATED state. See Allocate/Deallocate state diagram.
  • private void deallocateSynthesizer ( ) {
  • //ensure that we only do this when in the ALLOCATED state
  • if ( ( synthesizer. getEngineState ( ) & Synthesizer . ALLOCATED ) != 0 )
  • //free all the resources used by the synthesizer
  • synthesizer. deallocate ( ) ;
  •   * Helper method to ensure the synthesizer is always deallocated before
  •   * existing the VM. The synthesiser may be holding substantial native
  •   * resources that must be explicitly released.
  •   * @param e exception to print before exiting.
  • private void exit ( Exception e ) {
  • deallocateSynthesizer ( ) ;
  •   * create a synthesiser with the required properties. The Central class
  •   * requires the speech.properties file to be in the user.home or the
  •   * java.home/lib folders before it can create a synthesizer.
  •   * @param modeDesc required properties for the created synthesizer
  • private void createSynthesizer ( SynthesizerModeDesc modeDesc ) {
  • //Create a Synthesizer with specified required properties.
  • //if none can be found null is returned.
  • synthesizer = Central. createSynthesizer ( modeDesc ) ;
  • catch ( IllegalArgumentException e1 ) {
  • e1. printStackTrace ( ) ;
  • } catch ( EngineException e1 ) {
  • if ( synthesizer == null ) {
  • System . out . println ( "Unable to create synthesizer with " +
  • "the required properties" ) ;
  • System . out . println ( ) ;
  • System . out . println ( "Be sure to check that the \" speech.properties \" " +
  • " file is in one of these locations:" ) ;
  • System . out . println ( " user.home : " + System . getProperty ( "user.home" ) ) ;
  • System . out . println ( " java.home/lib : " + System . getProperty ( "java.home" )
  • + File . separator + "lib" ) ;
  •   * is the selected synthesizer capable of speaking general text
  •   * @return is Mode General
  • private boolean isModeGeneral ( ) {
  • String mode = this . synthesizer . getEngineModeDesc ( ) . getModeName ( ) ;
  • return "general" . equals ( mode ) ;
  •   * Speak the marked-up text provided by the Speakable object and wait for
  •   * synthesisers queue to empty. Support for specific markup tags is
  •   * dependent upon the selected synthesizer. The text will be read as
  •   * though the mark up was not present if unsuppored tags are encounterd by
  •   * the selected synthesizer.
  •   * @param speakable
  •   * @param optionalListener
  • private void speakSpeakableSynchronously (
  • Speakable speakable,
  • SpeakableListener optionalListener ) {
  • this . synthesizer . speak ( speakable, optionalListener ) ;
  • } catch ( JSMLException e ) {
  • //wait for the queue to empty
  • this . synthesizer . waitEngineState ( Synthesizer . QUEUE_EMPTY ) ;
  • } catch ( IllegalArgumentException e ) {
  • } catch ( InterruptedException e ) {
  •   * Speak plain text 'as is' and wait until the synthesizer queue is empty
  •   * @param plainText that will be spoken ignoring any markup
  •   * @param optionalListener will be notified of voice events
  • private void speakTextSynchronously ( String plainText,
  • this . synthesizer . speakPlainText ( plainText, optionalListener ) ;
  •   * Print all the properties of the selected voice
  • private void printSelectedVoice ( ) {
  • Voice voice = this . synthesizer . getSynthesizerProperties ( ) . getVoice ( ) ;
  • System . out . println ( "Selected Voice:" + voice. getName ( ) ) ;
  • System . out . println ( " Style:" + voice. getStyle ( ) ) ;
  • System . out . println ( " Gender:" + genderToString ( voice. getGender ( ) ) ) ;
  • System . out . println ( " Age:" + ageToString ( voice. getAge ( ) ) ) ;
    /**
     * Helper method to convert gender constants to strings.
     * @param gender as defined by the Voice constants
     * @return gender description
     */
    private String genderToString(int gender) {
        switch (gender) {
            case Voice.GENDER_FEMALE:
                return "Female";
            case Voice.GENDER_MALE:
                return "Male";
            case Voice.GENDER_NEUTRAL:
                return "Neutral";
            case Voice.GENDER_DONT_CARE:
            default:
                return "Unknown";
        }
    }
    /**
     * Helper method to convert age constants to strings.
     * @param age as defined by the Voice constants
     * @return age description
     */
    private String ageToString(int age) {
        switch (age) {
            case Voice.AGE_CHILD:
                return "Child";
            case Voice.AGE_MIDDLE_ADULT:
                return "Middle Adult";
            case Voice.AGE_NEUTRAL:
                return "Neutral";
            case Voice.AGE_OLDER_ADULT:
                return "Older Adult";
            case Voice.AGE_TEENAGER:
                return "Teenager";
            case Voice.AGE_YOUNGER_ADULT:
                return "Younger Adult";
            case Voice.AGE_DONT_CARE:
            default:
                return "Unknown";
        }
    }
    /**
     * Print all the properties of the selected synthesizer.
     */
    private void printSelectedSynthesizerModeDesc() {
        EngineModeDesc description = this.synthesizer.getEngineModeDesc();
        System.out.println("Selected Synthesizer: " + description.getEngineName());
        System.out.println("    Mode: " + description.getModeName());
        System.out.println("    Locale: " + description.getLocale());
        System.out.println("    IsRunning: " + description.getRunning());
    }
    /**
     * List all the available synthesizers and voices.
     */
    public void listAllVoices() {
        System.out.println("All available JSAPI Synthesizers and Voices:");
        // Do not set any properties so all the synthesizers will be returned
        SynthesizerModeDesc emptyDesc = new SynthesizerModeDesc();
        EngineList engineList = Central.availableSynthesizers(emptyDesc);
        // loop over all the synthesizers
        for (int e = 0; e < engineList.size(); e++) {
            SynthesizerModeDesc desc = (SynthesizerModeDesc) engineList.get(e);
            // loop over all the voices for this synthesizer
            Voice[] voices = desc.getVoices();
            for (int v = 0; v < voices.length; v++) {
                System.out.println(
                        desc.getEngineName() +
                        " Voice: " + voices[v].getName() +
                        " Gender: " + genderToString(voices[v].getGender()));
            }
        }
    }


/**
 * Simple Speakable.
 * Returns marked-up text to be spoken.
 */
public class BriefSpeakable implements Speakable {

    /**
     * Returns marked-up text. The markup is used to help the voice engine.
     */
    public String getJSMLText() {
        return "<jsml><para>This Speech <sayas class='literal'>API</sayas> " +
            "can integrate with <emp> most </emp> " +
            "of the speech engines on the market today.</para>" +
            "<break msecs='300'/><para>Keep on top of the latest developments " +
            "by reading all you can about " +
            "<sayas class='literal'>JSR113</sayas></para></jsml>";
    }

    /**
     * Implemented so the listener can print out the source.
     */
    public String toString() {
        return getJSMLText();
    }
}


import javax.speech.synthesis.SpeakableEvent;
import javax.speech.synthesis.SpeakableListener;

/**
 * Simple SpeakableListener.
 * Prints the event type and the source object's toString().
 */
public class BriefListener implements SpeakableListener {

    private String formatEvent(SpeakableEvent event) {
        return event.paramString() + ": " + event.getSource();
    }

    public void markerReached(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void speakableCancelled(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void speakableEnded(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void speakablePaused(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void speakableResumed(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void speakableStarted(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void topOfQueue(SpeakableEvent event) { System.out.println(formatEvent(event)); }
    public void wordStarted(SpeakableEvent event) { System.out.println(formatEvent(event)); }
}

Further work on version 2.0 continues under JSR 113. The primary goal of the upcoming 2.0 spec is to bring JSAPI to J2ME, but a few other overdue changes, such as class renaming, have been made as well.

My impression after using JSAPI is that it would be much easier to use if it used unchecked exceptions, which would make client code easier to read and write. Overall, I think the API is on the right track and adds a needed abstraction layer for any project using speech synthesis.

As computer performance continues to improve and Java becomes embedded in more devices, interfaces such as voice synthesis and recognition that make computers easier to use for non-technical people will become ubiquitous. I recommend that anyone who might be working with embedded Java in the future keep an eye on JSR 113.

  • [1] JSML http://java.sun.com/products/java-media/speech/forDevelopers/JSML/
  • [2] FreeTTS JSAPI setup http://freetts.sourceforge.net/
  • [3] JSAPI http://java.sun.com/products/java-media/speech/news/index.html (JSAPI Guide, JSAPI JavaDoc, Overview)
  • [4] Diagrams http://JavaNut.com/BlogDraw

Easy Way to Learn Speech Recognition in Java With a Speech-To-Text API


Here we show how to use a speech-to-text API with two Java examples.

We will be using the Rev AI API (free for your first 5 hours), which offers two different speech-to-text APIs:

  • Asynchronous API – For pre-recorded audio or video
  • Streaming API – For live (streaming) audio or video

Asynchronous Rev AI API Java Code Example

We will use the Rev AI Java SDK located here. We use this short audio, on the exciting topic of HR recruiting.

First, sign up for Rev AI for free and get an access token.

Create a Java project with whatever editor you normally use.  Then add this dependency to the Maven pom.xml manifest:
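The coordinates below are illustrative; check the SDK's README for the current groupId, artifactId, and version before copying them:

```xml
<dependency>
    <groupId>ai.rev</groupId>
    <artifactId>revai-java-sdk</artifactId>
    <!-- replace with the latest released version -->
    <version>2.0.0</version>
</dependency>
```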

The code sample below is here . We explain it and show the output.

Submit the job from a URL:

Most of the Rev AI options are self-explanatory. If you don't want to use the polling method we use in this example, you can use the callback option to kick off downloading the transcription in another program that is on standby, listening on HTTP.

Put the program in a loop and check the job status.  Download the transcription when it is done.
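The polling step can be sketched generically in plain Java. Here `fetchStatus` is a stand-in for the SDK's job-details call (a hypothetical hook, not the real SDK method), and the status strings mirror the job states Rev AI reports:

```java
import java.util.function.Supplier;

public class JobPoller {
    /**
     * Poll fetchStatus until a terminal status ("transcribed" or "failed")
     * is returned, sleeping between attempts. Returns the final status.
     */
    public static String waitForCompletion(Supplier<String> fetchStatus,
                                           long intervalMillis) {
        while (true) {
            String status = fetchStatus.get();
            if ("transcribed".equals(status) || "failed".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(intervalMillis); // wait before polling again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return status; // give up if interrupted
            }
        }
    }
}
```

In the real program, `fetchStatus` would wrap the SDK call that retrieves the job's current status by job id.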

The SDK returns captions as well as text.

Here is the complete code:

It responds:

You can get the transcript with Java.

Or go get it later with curl, noting the job id from stdout above.
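As a sketch, the curl request looks like the following; the job id and access token are placeholders, and the Accept header requests the JSON transcript format described in Rev AI's API docs:

```shell
curl -s "https://api.rev.ai/speechtotext/v1/jobs/<job_id>/transcript" \
  -H "Authorization: Bearer $REVAI_ACCESS_TOKEN" \
  -H "Accept: application/vnd.rev.transcript.v1.0+json"
```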

This returns the transcription in JSON format: 

Streaming Rev AI API Java Code Example

A stream is a websocket connection from your video or audio server to the Rev AI speech-to-text engine.

We can emulate this connection by streaming a .raw file from the local hard drive to Rev AI.

On Ubuntu, run:

Download the audio, then convert it from .wav to .raw format with the following ffmpeg command:
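A command along these lines performs the conversion (file names are placeholders); `-f f32le` with `-acodec pcm_f32le` writes raw little-endian 32-bit float samples, and `-ar 48000` sets the 48 kHz sampling rate the text below refers to:

```shell
ffmpeg -i recruiting.wav -f f32le -acodec pcm_f32le -ar 48000 recruiting.raw
```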

As it runs, ffmpeg prints key information about the audio file:

To explain: first we set up a websocket connection and start streaming the file:

The important items to set here are the sampling rate (not bit rate) and format. We take this information from the ffmpeg output: Audio: pcm_f32le, 48000 Hz.

After the client connects, the onConnected event sends a message. We can get the job id from there. This will let us download the transcription later if we don't want to get it in real time.

To get the transcription in real time, listen for the onHypothesis event:

Here is what the output looks like:

What is the Best Speech Recognition API for Java?

Accuracy is what you want in a speech-to-text API, and Rev AI is a one-of-a-kind speech-to-text API in that regard.

You might ask, “So what?  Siri and Alexa already do speech-to-text, and Google has a speech cloud API.”

That’s true.  But there’s one game-changing difference: 

The data that powers Rev AI is manually collected and carefully edited .  Rev pays 50,000 freelancers to transcribe audio & caption videos for its 99% accurate transcription & captioning services . Rev AI is trained with this human-sourced data, and this produces transcripts that are far more accurate than those compiled simply by collecting audio, as Siri and Alexa do.


Rev AI’s accuracy is also snowballing, in a sense. Rev’s speech recognition system and API is constantly improving its accuracy rates as its dataset grows and its world-class engineers constantly improve the product.


Labelled Data and Machine Learning

Why is human transcription important?

If you are familiar with machine learning then you know that converting audio to text is a classification problem.  

To train the computer to transcribe audio, ML programmers feed feature-label data into their model. This data is called a training set.

Features (sound) are input and labels (the corresponding letter) are output, calculated by the classification algorithm.

Alexa and Siri vacuum up this data all day long.  So you would think they would have the largest and therefore most accurate training data.  

But that’s only half of the equation.  It takes many hours of manual work to type in the labels that correspond to the audio.  In other words, a human must listen to the audio and type the corresponding letter and word.  

This is what Rev AI has done.

It’s a business model that has taken off, because it fills a very specific need.

For example, look at closed captioning on YouTube. YouTube can automatically add captions to its audio. But it’s not always clear. You will notice that some of what it says is nonsense. It’s just like Google Translate: it works most of the time, but not all of the time.

The giant tech companies use statistical analysis, like the frequency distribution of words, to help their models.

But they are consistently outperformed by manually trained audio-to-voice training models.



Converting Text to Speech in Java

Java Speech API: The Java Speech API allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API to support command and control recognizers, dictation systems and speech synthesizers.

Java Speech supports speech synthesis, which means the process of generating spoken language by machine on the basis of written input.

It is important to keep in mind that Java Speech is only a specification, i.e., no implementation is included; third parties provide the implementations. The javax.speech package defines the common functionality of recognizers, synthesizers, and other speech engines. The package javax.speech.synthesis extends this basic functionality for synthesizers.

Let us look at what is required for a Java application to convert text to speech:

  • Engine: The Engine interface is available inside the speech package. "Speech engine" is the generic term for a system designed to deal with either speech input or speech output. import javax.speech.Engine;
  • Central: Central provides the ability to locate, select and create speech recognizers and speech synthesizers. import javax.speech.Central;
  • SynthesizerModeDesc: SynthesizerModeDesc extends the EngineModeDesc with the properties that are specific to speech synthesizers. import javax.speech.synthesis.SynthesizerModeDesc;
  • Synthesizer: The Synthesizer interface provides primary access to speech synthesis capabilities. SynthesizerModeDesc adds two properties: a list of voices provided by the synthesizer, and the voice to be loaded when the synthesizer is started. import javax.speech.synthesis.Synthesizer;

Below is an open-source implementation of Java Speech Synthesis called FreeTTS in the form of steps:

  • Download the FreeTTS in the form of zip folder from here
  • Extract the zip file and go to freetts-1.2.2-bin/freetts-1.2/lib/jsapi.exe
  • Open the jsapi.exe file and install it.
  • This will create a JAR file by the name jsapi.jar. This JAR contains the JSAPI interfaces and must be included in the project along with the FreeTTS libraries.
  • Create a new Java project in your IDE.
  • Include this jsapi.jar file into your project.
  • Now copy the below code into your project
  • Execute the project to get the below expected output.

Below is the code for the above project:
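The article's listing was not preserved here. The following is a typical minimal FreeTTS/JSAPI example of the kind the steps above describe; the voice directory and engine-central class names follow the FreeTTS 1.2 distribution, and the program must be compiled and run with jsapi.jar and the FreeTTS jars on the classpath:

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class TextToSpeechExample {
    public static void main(String[] args) {
        try {
            // Tell FreeTTS which voices to load and register its engine central
            System.setProperty("freetts.voices",
                    "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
            Central.registerEngineCentral(
                    "com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");

            // Create and start a synthesizer for US English
            Synthesizer synthesizer =
                    Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
            synthesizer.allocate();
            synthesizer.resume();

            // Speak the text and wait until the queue is empty
            synthesizer.speakPlainText("Hello, this is the Java Speech API.", null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

            synthesizer.deallocate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```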


  • https://docs.oracle.com/cd/E17802_01/products/products/java-media/speech/forDevelopers/jsapi-doc/javax/speech/package-summary.html
  • https://www.javatpoint.com/q/5931/java-code-for-converting-audio-to-text-and-video-to-audio
  • http://www.oracle.com/technetwork/java/jsapifaq-135248.html

Related article: Convert Text to Speech in Python



Use the Java Speech API (JSAPI) Tag(s): IO


Text to Audio API: Quick Start Tutorial for App Integration Through Widely Used Java Libraries

Unreal Speech


The auditory dimension of applications has never been more accessible to developers, thanks to Text to Audio APIs. Integrating these APIs using Java is a strategic move to imbue applications with the power of speech. Widely used Java resources like the JavaTpoint tutorials and the Eclipse IDE provide the scaffolding for simple to complex voice-enabled features. For software engineers and game developers, this merger of Java's robust functionality with the expressiveness of audio opens up a new frontier in app development.

Whether the goal is to narrate interactive stories, provide vocal instructions, or offer responsive spoken feedback within an application, Text to Audio APIs are the cornerstone of voice interaction. Starting with the foundational JavaTpoint, and extending to GitHub repositories filled with Java examples, the provided libraries make the integration with Text to Audio APIs seamless. These resources, coupled with Java's widespread use across platforms, ensure a smooth journey from silent text blocks to engaging audio dialogs.


Understanding Text to Audio APIs

In the ever-evolving landscape of application development, Text to Audio APIs constitute a vital toolset that enriches user interactions through speech. To fully leverage these APIs, it's crucial for developers to familiarize themselves with key terminologies. Here's a glossary that elucidates several core concepts developers may encounter when integrating Text to Audio functionalities into their Java applications.

Text to Audio API: A programmatic interface that enables conversion of text into spoken audio, often used in applications for accessibility and enhanced user experience.

Java Library: A collection of prewritten code and methods that Java programmers can use to perform common tasks related to audio processing and speech synthesis without reinventing the wheel.

JavaTpoint: An educational website that provides tutorials and resources for various technologies including Java, specifically pertaining to text-to-speech conversion in this context.

GitHub Repository: An online storage space hosted on GitHub where developers can manage, share, and collaborate on code, often including libraries and examples related to audio API integration.

Eclipse: An integrated development environment (IDE) widely used by Java developers that can support text to audio API development through plugins and tools.

Text to Speech Conversion: The process of transforming written text into synthetic speech, typically utilizing a Text to Audio API within an application's architecture.

Java Libraries for Text to Audio Conversion: A Guide

Navigating JavaTpoint Libraries for TTS Integration

JavaTpoint offers a wealth of tutorials and resources that simplify the integration of text to audio capabilities into Java applications. These libraries provide a range of functions, enabling developers to execute text to speech tasks easily:

  • Overview of JavaTpoint library capabilities for text to audio conversion
  • Step-by-step tutorials on implementing TTS features
  • Tips for optimizing speech quality and performance

Best Practices for Java GitHub Repositories in Text to Audio

GitHub is a treasure trove of Java repositories filled with code examples and libraries for text to audio API integration. These repositories can significantly accelerate development workflows:

  • Finding the most reliable and widely used Java repositories for TTS
  • Guidelines for cloning, forking, and contributing to repositories
  • Recommendations on repository maintenance and keeping dependencies up to date

Java-Based In-App Text to Audio Implementation

Implementing text to audio within Java applications has the potential to elevate the user experience by providing dynamic auditory feedback. The process involves utilizing Java's robust audio processing capabilities to transform text into clear and natural-sounding speech. Tailored specifically for Java applications, the integration not only offers accessibility solutions but also expands the application's functionality to include notifications, guides, and interactive voice responses.

Java's standard libraries and third-party APIs offer various mechanisms to achieve seamless text to audio implementation. These methods ensure secure, efficient processing of text inputs and management of audio outputs, making the development of sophisticated voice-enabled features straightforward for Java developers.

Quick Start Tutorials for TTS Java Integration

Integrating Text to Audio API in Java Using JavaTpoint

JavaTpoint provides a practical and straightforward approach to incorporating text to audio features within Java applications. By following their tutorials, developers can access a wealth of information and step-by-step guidance on utilizing Java libraries to convert text into spoken word, streamlining the development process and enhancing application interfaces with auditory feedback.

Building Text to Audio Applications in Java with Eclipse

Utilizing Eclipse as an IDE for building text to audio applications entails setting up the necessary Java libraries and SDKs for TTS functionality. Eclipse offers a conducive environment for coding, testing, and debugging, ensuring developers can work with text to audio APIs efficiently. Its rich set of features and plugin support makes it an ideal choice for working on complex Java projects with TTS requirements.

Text to Speech Java Code Examples for Beginners

Beginners seeking to learn text to speech integration in Java can leverage numerous code examples available across various platforms. These examples provide a hands-on approach to learning, allowing new developers to experiment with simple TTS implementations and gradually progress to more advanced applications of the API.

Open Source Java Libraries for Text to Speech Conversion

Open source Java libraries offer an extensive array of options for text to speech conversion, catering to a wide range of application needs. These libraries, often community-driven, provide flexibility and the opportunity for customization, enabling developers to tailor TTS functionality to the specific needs of their projects.

Java and Audio API Resources for Effective Implementation

Java's ecosystem provides a robust environment for the deployment of audio APIs. Developers can take advantage of a wide variety of resources for effective implementation of text to audio in their applications. These resources include comprehensive libraries that facilitate easy access to TTS services, detailed documentation to guide users through the integration process and community forums where developers can share insights and seek solutions to complex challenges.

Moreover, Java's strong community support plays a crucial role in the continuous improvement of audio API resources. Developers can find updated Java Speech API libraries that include bug fixes, performance enhancements, and new features. Tools such as Maven repositories also offer reliable ways to manage project dependencies related to text to audio APIs, making sure that developers have the most compatible and recent versions for their Java applications.

Common Questions Re: Text to Audio in Java

How to Convert Text Into Voice in Java

To convert text into voice in Java, utilize Java libraries such as FreeTTS or Open Source APIs that can synthesize speech. Instantiate a Voice object from one of these libraries and call the speak method with the desired text.

How Do I Create a Text to Speech API?

Creating a text to speech API involves selecting a speech synthesis engine, setting up an interface for text input, and programming the engine to process and output spoken audio. This can be done independently or by leveraging existing services and libraries.

What Is the Java API for Voice Recognition?

The Java API for voice recognition is the Java Speech API (JSAPI), which allows Java applications to incorporate speech-to-text functionalities. JSAPI provides a framework for developers to add voice commands and dictation capabilities into their Java applications.

How Do I Turn Text into Audio?

To turn text into audio in a Java application, you can use a Text to Speech (TTS) library or API. Invoke the library's methods to input text and output the synthesized speech as audio, which can then be played back or stored as a file.


Using the Web Speech API

Speech recognition

Speech recognition involves receiving speech through a device's microphone, which is then checked by a speech recognition service against a list of grammar (basically, the vocabulary you want to have recognized in a particular app.) When a word or phrase is successfully recognized, it is returned as a result (or list of results) as a text string, and further actions can be initiated as a result.

The Web Speech API has a main controller interface for this — SpeechRecognition — plus a number of closely-related interfaces for representing grammar, results, etc. Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS, Siri on iOS, Cortana on Windows 10, Android Speech, etc.

Note: On some browsers, such as Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

To show simple usage of Web speech recognition, we've written a demo called Speech color changer . When the screen is tapped/clicked, you can say an HTML color keyword, and the app's background color will change to that color.

The UI of an app titled Speech Color changer. It invites the user to tap the screen and say a color, and then it turns the background of the app that color. In this case it has turned the background red.

To run the demo, navigate to the live demo URL in a supporting mobile browser (such as Chrome).


The HTML and CSS for the app is really trivial. We have a title, instructions paragraph, and a div into which we output diagnostic messages.

The CSS provides a very simple responsive styling so that it looks OK across devices.

Let's look at the JavaScript in a bit more detail.

Prefixed properties

Browsers currently support speech recognition with prefixed properties. Therefore at the start of our code we include these lines to allow for both prefixed properties and unprefixed versions that may be supported in future:

The grammar

The next part of our code defines the grammar we want our app to recognize. The following variable is defined to hold our grammar:

The grammar format used is JSpeech Grammar Format ( JSGF ) — you can find a lot more about it at the previous link to its spec. However, for now let's just run through it quickly:

  • The lines are separated by semicolons, just like in JavaScript.
  • The first line — #JSGF V1.0; — states the format and version used. This always needs to be included first.
  • The second line indicates a type of term that we want to recognize. public declares that it is a public rule, the string in angle brackets defines the recognized name for this term ( color ), and the list of items that follow the equals sign are the alternative values that will be recognized and accepted as appropriate values for the term. Note how each is separated by a pipe character.
  • You can have as many terms defined as you want on separate lines following the above structure, and include fairly complex grammar definitions. For this basic demo, we are just keeping things simple.
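Putting those rules together, a shortened version of the demo's color grammar looks like this (the full demo lists many more color words):

```
#JSGF V1.0;
grammar colors;
public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate ;
```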

Plugging the grammar into our speech recognition

The next thing to do is define a speech recognition instance to control the recognition for our application. This is done using the SpeechRecognition() constructor. We also create a new speech grammar list to contain our grammar, using the SpeechGrammarList() constructor.

We add our grammar to the list using the SpeechGrammarList.addFromString() method. This accepts as parameters the string we want to add, plus optionally a weight value that specifies the importance of this grammar in relation to other grammars available in the list (can be from 0 to 1 inclusive.) The added grammar is available in the list as a SpeechGrammar object instance.

We then add the SpeechGrammarList to the speech recognition instance by setting it to the value of the SpeechRecognition.grammars property. We also set a few other properties of the recognition instance before we move on:

  • SpeechRecognition.continuous : Controls whether continuous results are captured ( true ), or just a single result each time recognition is started ( false ).
  • SpeechRecognition.lang : Sets the language of the recognition. Setting this is good practice, and therefore recommended.
  • SpeechRecognition.interimResults : Defines whether the speech recognition system should return interim results, or just final results. Final results are good enough for this simple demo.
  • SpeechRecognition.maxAlternatives : Sets the number of alternative potential matches that should be returned per result. This can sometimes be useful, say if a result is not completely clear and you want to display a list of alternatives for the user to choose the correct one from. But it is not needed for this simple demo, so we are just specifying one (which is actually the default anyway.)

Starting the speech recognition

After grabbing references to the output <div> and the HTML element (so we can output diagnostic messages and update the app background color later on), we implement an onclick handler so that when the screen is tapped/clicked, the speech recognition service will start. This is achieved by calling SpeechRecognition.start() . The forEach() method is used to output colored indicators showing what colors to try saying.

Receiving and handling results

Once the speech recognition is started, there are many event handlers that can be used to retrieve results, and other pieces of surrounding information (see the SpeechRecognition events .) The most common one you'll probably use is the result event, which is fired once a successful result is received:

The second line here is a bit complex-looking, so let's explain it step by step. The SpeechRecognitionEvent.results property returns a SpeechRecognitionResultList object containing SpeechRecognitionResult objects. It has a getter so it can be accessed like an array — so the first [0] returns the SpeechRecognitionResult at position 0. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that contain individual recognized words. These also have getters so they can be accessed like arrays — the second [0] therefore returns the SpeechRecognitionAlternative at position 0. We then return its transcript property to get a string containing the individual recognized result as a string, set the background color to that color, and report the color recognized as a diagnostic message in the UI.

We also use the speechend event to stop the speech recognition service from running (using SpeechRecognition.stop() ) once a single word has been recognized and it has finished being spoken:

Handling errors and unrecognized speech

The last two handlers are there to handle cases where speech was recognized that wasn't in the defined grammar, or an error occurred. The nomatch event seems to be supposed to handle the first case mentioned, although note that at the moment it doesn't seem to fire correctly; it just returns whatever was recognized anyway:

The error event handles cases where there is an actual error with the recognition — the SpeechRecognitionErrorEvent.error property contains the actual error returned:

Speech synthesis

Speech synthesis (aka text-to-speech, or TTS) involves taking text contained within an app and synthesizing it into speech, then playing it out of a device's speaker or audio output connection.

The Web Speech API has a main controller interface for this — SpeechSynthesis — plus a number of closely-related interfaces for representing text to be synthesized (known as utterances), voices to be used for the utterance, etc. Again, most OSes have some kind of speech synthesis system, which will be used by the API for this task as available.

To show simple usage of Web speech synthesis, we've provided a demo called Speak easy synthesis . This includes a set of form controls for entering text to be synthesized, and setting the pitch, rate, and voice to use when the text is uttered. After you have entered your text, you can press Enter / Return to hear it spoken.

UI of an app called speak easy synthesis. It has an input field in which to input text to be synthesized, slider controls to change the rate and pitch of the speech, and a drop down menu to choose between different voices.

To run the demo, navigate to the live demo URL in a supporting mobile browser.

The HTML and CSS are again pretty trivial, containing a title, some instructions for use, and a form with some simple controls. The <select> element is initially empty, but is populated with <option>s via JavaScript (see later on).

Let's investigate the JavaScript that powers this app.

Setting variables

First of all, we capture references to all the DOM elements involved in the UI, but more interestingly, we capture a reference to Window.speechSynthesis . This is the API's entry point — it returns an instance of SpeechSynthesis , the controller interface for web speech synthesis.
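That setup step might look like the following; the guard lets the line run outside a browser, and the commented selectors are illustrative of the demo's controls rather than fixed names:

```javascript
// Grab the SpeechSynthesis controller; null outside a browser.
const synth = globalThis.speechSynthesis ?? null;

// References to the demo's form controls would be captured alongside it,
// for example:
// const inputTxt = document.querySelector(".txt");
// const voiceSelect = document.querySelector("select");
```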

Populating the select element

To populate the <select> element with the different voice options the device has available, we've written a populateVoiceList() function. We first invoke SpeechSynthesis.getVoices() , which returns a list of all the available voices, represented by SpeechSynthesisVoice objects. We then loop through this list — for each voice we create an <option> element and set its text content to display the name of the voice (grabbed from SpeechSynthesisVoice.name ) and the language of the voice (grabbed from SpeechSynthesisVoice.lang ), appending "-- DEFAULT" if the voice is the default for the synthesis engine (checked by seeing whether SpeechSynthesisVoice.default returns true ).

We also create data- attributes for each option, containing the name and language of the associated voice, so we can grab them easily later on, and then append the options as children of the select.
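A standalone sketch of that function, with a stubbed synth and plain objects in place of real <option> elements (the voice names are invented for illustration):

```javascript
// Stub for SpeechSynthesis.getVoices(); real voices come from the OS.
const synth = {
  getVoices() {
    return [
      { name: "Alice", lang: "en-US", default: true },
      { name: "Bruno", lang: "it-IT", default: false },
    ];
  },
};

function populateVoiceList(synth) {
  const options = [];
  for (const voice of synth.getVoices()) {
    // Stand-in for document.createElement("option").
    const option = { textContent: `${voice.name} (${voice.lang})`, dataset: {} };
    if (voice.default) {
      option.textContent += " -- DEFAULT";
    }
    // data- attributes let us look the voice up again when speaking.
    option.dataset.name = voice.name;
    option.dataset.lang = voice.lang;
    options.push(option);
  }
  return options; // in the demo these are appended to the <select>
}
```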

Some older browsers don't support the voiceschanged event, and simply return a list of voices when SpeechSynthesis.getVoices() is called. On others, such as Chrome, you have to wait for the event to fire before populating the list. To allow for both cases, we run the function as shown below:
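The pattern can be sketched with a stub: call the function once up front, and also attach it to voiceschanged where the property exists (the stub simulates a Chrome-like engine that fires the event once voices load):

```javascript
// Stub: Chrome-like engines expose onvoiceschanged (initially null).
const synth = { onvoiceschanged: null };
let calls = 0;
function populateVoiceList() { calls++; }

populateVoiceList(); // older engines: voices are ready immediately
if (synth.onvoiceschanged !== undefined) {
  // Chrome and friends: repopulate once the voices actually load.
  synth.onvoiceschanged = populateVoiceList;
}

// Simulate the browser firing voiceschanged later on.
synth.onvoiceschanged();
```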

Speaking the entered text

Next, we create an event handler to start speaking the text entered into the text field. We are using an onsubmit handler on the form so that the action happens when Enter / Return is pressed. We first create a new SpeechSynthesisUtterance() instance using its constructor — this is passed the text input's value as a parameter.

Next, we need to figure out which voice to use. We use the HTMLSelectElement selectedOptions property to return the currently selected <option> element. We then use this element's data-name attribute, finding the SpeechSynthesisVoice object whose name matches this attribute's value. We set the matching voice object to be the value of the SpeechSynthesisUtterance.voice property.

Finally, we set the SpeechSynthesisUtterance.pitch and SpeechSynthesisUtterance.rate to the values of the relevant range form elements. Then, with all necessary preparations made, we start the utterance being spoken by invoking SpeechSynthesis.speak() , passing it the SpeechSynthesisUtterance instance as a parameter.
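The core of that handler can be sketched as one function, with stubs for the synth, the utterance, and the selected option's data-name value (the voice names are invented; in the browser the utterance would come from `new SpeechSynthesisUtterance(text)`):

```javascript
// Illustrative voice list, as returned by getVoices() in a browser.
const voices = [
  { name: "Alice", lang: "en-US" },
  { name: "Bruno", lang: "it-IT" },
];

function speak(synth, text, selectedName, pitch, rate) {
  // Stand-in for `new SpeechSynthesisUtterance(text)`.
  const utterance = { text, voice: null, pitch: 1, rate: 1 };
  // Match the selected <option>'s data-name against the voice list.
  utterance.voice = voices.find((v) => v.name === selectedName) ?? null;
  utterance.pitch = pitch;
  utterance.rate = rate;
  synth.speak(utterance); // browser: SpeechSynthesis.speak(utterThis)
  return utterance;
}

// Exercise the flow with a recording stub in place of the real synth.
const spoken = [];
const synthStub = { speak: (u) => spoken.push(u) };
const utter = speak(synthStub, "Hello there", "Bruno", 1.2, 0.9);
```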

In the final part of the handler, we include a pause event handler to demonstrate how SpeechSynthesisEvent can be put to good use. When SpeechSynthesis.pause() is invoked, this returns a message reporting the character number and name that the speech was paused at.
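A sketch of that handler's message-building, factored into a pure function so the charIndex lookup is visible (the mock event below stands in for a real SpeechSynthesisEvent):

```javascript
// SpeechSynthesisEvent carries charIndex; the character itself can be
// recovered from the utterance's text.
function pauseMessage(event) {
  const char = event.utterance.text.charAt(event.charIndex);
  return `Speech paused at character ${event.charIndex}, which is "${char}".`;
}

// In the demo this would be attached roughly as:
// utterThis.onpause = (event) => console.log(pauseMessage(event));
```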

Finally, we call blur() on the text input. This is mainly to hide the keyboard on Firefox OS.

Updating the displayed pitch and rate values

The last part of the code updates the pitch / rate values displayed in the UI, each time the slider positions are moved.
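That update step is a simple mirror from slider to label; the stub objects below stand in for the demo's <input type="range"> and its value <span>:

```javascript
// Stubs for the demo's pitch slider and the <span> showing its value.
const pitch = { value: "1.4" };
const pitchValue = { textContent: "" };

function updatePitchDisplay() {
  pitchValue.textContent = pitch.value; // mirror the slider into the UI
}

// In the browser this would run on every change:
// pitch.onchange = updatePitchDisplay;
updatePitchDisplay();
```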

