Google Speech-to-Text for streaming audio with Java - java

I am trying to use the Google Speech-to-Text API to do some voice-to-voice translation (also using the Translation and Text-to-Speech APIs). I would like a person to speak into the microphone and have that speech transcribed to text. I used the streaming audio tutorial found in the Google documentation as the base for this method. I would also like the audio stream to stop when the person has stopped speaking.
Here is the modified method:
public static String streamingMicRecognize(String language) throws Exception {
ResponseObserver<StreamingRecognizeResponse> responseObserver = null;
try (SpeechClient client = SpeechClient.create()) {
responseObserver =
new ResponseObserver<StreamingRecognizeResponse>() {
ArrayList<StreamingRecognizeResponse> responses = new ArrayList<>();
public void onStart(StreamController controller) {}
public void onResponse(StreamingRecognizeResponse response) {
responses.add(response);
}
public void onComplete() {
SPEECH_TO_TEXT_ANSWER = "";
for (StreamingRecognizeResponse response : responses) {
StreamingRecognitionResult result = response.getResultsList().get(0);
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
SPEECH_TO_TEXT_ANSWER = SPEECH_TO_TEXT_ANSWER + alternative.getTranscript();
}
}
public void onError(Throwable t) {
System.out.println(t);
}
};
ClientStream<StreamingRecognizeRequest> clientStream =
client.streamingRecognizeCallable().splitCall(responseObserver);
RecognitionConfig recognitionConfig =
RecognitionConfig.newBuilder()
.setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
.setLanguageCode(language)
.setSampleRateHertz(16000)
.build();
StreamingRecognitionConfig streamingRecognitionConfig =
StreamingRecognitionConfig.newBuilder().setConfig(recognitionConfig).build();
StreamingRecognizeRequest request =
StreamingRecognizeRequest.newBuilder()
.setStreamingConfig(streamingRecognitionConfig)
.build(); // The first request in a streaming call has to be a config
clientStream.send(request);
// SampleRate:16000Hz, SampleSizeInBits: 16, Number of channels: 1, Signed: true,
// bigEndian: false
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
DataLine.Info targetInfo =
new Info(
TargetDataLine.class,
audioFormat); // Set the system information to read from the microphone audio stream
if (!AudioSystem.isLineSupported(targetInfo)) {
System.out.println("Microphone not supported");
System.exit(0);
}
// Target data line captures the audio stream the microphone produces.
TargetDataLine targetDataLine = (TargetDataLine) AudioSystem.getLine(targetInfo);
targetDataLine.open(audioFormat);
targetDataLine.start();
System.out.println("Start speaking");
playMP3("beep-07.mp3");
long startTime = System.currentTimeMillis();
// Audio Input Stream
AudioInputStream audio = new AudioInputStream(targetDataLine);
long estimatedTime = 0, estimatedTimeStoppedSpeaking = 0, startStopSpeaking = 0;
int currentSoundLevel = 0;
Boolean hasSpoken = false;
while (true) {
estimatedTime = System.currentTimeMillis() - startTime;
byte[] data = new byte[6400];
audio.read(data);
currentSoundLevel = calculateRMSLevel(data);
System.out.println(currentSoundLevel);
if (currentSoundLevel > 20) {
estimatedTimeStoppedSpeaking = 0;
startStopSpeaking = 0;
hasSpoken = true;
}
else {
if (startStopSpeaking == 0) {
startStopSpeaking = System.currentTimeMillis();
}
estimatedTimeStoppedSpeaking = System.currentTimeMillis() - startStopSpeaking;
}
if ((estimatedTime > 15000) || (estimatedTimeStoppedSpeaking > 1000 && hasSpoken)) { // 15 seconds or stopped speaking for 1 second
playMP3("beep-07.mp3");
System.out.println("Stop speaking.");
targetDataLine.stop();
targetDataLine.drain();
targetDataLine.close();
break;
}
request =
StreamingRecognizeRequest.newBuilder()
.setAudioContent(ByteString.copyFrom(data))
.build();
clientStream.send(request);
}
} catch (Exception e) {
System.out.println(e);
}
responseObserver.onComplete();
String ans = SPEECH_TO_TEXT_ANSWER;
return ans;
}
The output is supposed to be the transcribed text as a string. However, it is very inconsistent. Most of the time it returns an empty string, but sometimes the program does work and returns the transcribed text.
I have also tried recording the audio separately while the program is running. Although the method returned an empty string, when I saved the separately recorded audio to a file and sent it directly through the API, the correct transcribed text came back.
I do not understand why/how the program only works some of the time.
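One thing worth checking (a guess, not something from the original post): InputStream.read(byte[]) is not guaranteed to fill the buffer, so sending the whole 6400-byte array can push stale or empty bytes into the stream, and the RMS check then runs over partially valid data. A minimal sketch of a read/send step that only forwards the bytes actually read, reusing the clientStream and audio objects from the code above; the closeSend() call signals the service that no more audio is coming before the responses are collected:
// inside the capture loop
byte[] data = new byte[6400];
int bytesRead = audio.read(data);   // may return fewer than 6400 bytes
if (bytesRead > 0) {
    clientStream.send(
        StreamingRecognizeRequest.newBuilder()
            .setAudioContent(ByteString.copyFrom(data, 0, bytesRead))
            .build());
}

// after breaking out of the loop, before reading the collected responses
clientStream.closeSend();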

Related

Missing character off incoming string from Bluetooth

I am currently trying to create a water level readout as a progress bar in a simple Android app. Currently, I am using an Arduino Mega 2560 with an HC-05 to transmit the readout of the water level sensor. To simplify things, the Arduino code just counts up from 0 to 1000 and back down, as follows.
void setup() {
// put your setup code here, to run once:
Serial.begin(9600);
Serial.println("Test for Water Sensor");
Serial1.begin(9600);
}
void loop() {
// put your main code here, to run repeatedly:
for (int i = 0; i <= 1000; i++)
{
Serial1.println(i);
Serial.println(i);
delay(100);
}
for (int i = 1000; i >= 0; i--)
{
Serial1.println(i);
Serial.println(i);
delay(100);
}
}
On the Android end, I am using this to convert the value to an int and then update the progress bar. It also currently displays the unconverted message in a TextView.
mHandler = new Handler(Looper.getMainLooper()){
@Override
public void handleMessage(Message msg){
if(msg.what == MESSAGE_READ){
String readMessage = null;
try {
readMessage = new String((byte[]) msg.obj, "UTF-8");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
mReadBuffer.setText(readMessage);
try {
waterLevelValue = NumberFormat.getInstance().parse(readMessage).intValue();
waterLevel.setProgress(waterLevelValue);
} catch (ParseException e) {
e.printStackTrace();
}
}
if(msg.what == CONNECTING_STATUS){
if(msg.arg1 == 1)
mBluetoothStatus.setText("Connected to Device: " + msg.obj);
else
mBluetoothStatus.setText("Connection Failed");
}
}
};
The issue I am getting is that quite often (maybe 1-2 times a second) it does not read the first digit. I can see on the Serial Monitor that all digits arrive there, but the Android app will sometimes miss the first digit (e.g. 443, 444, 45, 446, 447, etc.).
What could be causing the issue here? I am very new to Bluetooth, so please help! More than happy to send more portions of code if needed.
EDIT: Adding code for reading input stream. Probably was important in the first place.
public void run() {
byte[] buffer = new byte[1024]; // buffer store for the stream
int bytes; // bytes returned from read()
// Keep listening to the InputStream until an exception occurs
while (true) {
try {
// Read from the InputStream
bytes = mmInStream.available();
if(bytes != 0) {
SystemClock.sleep(100); //pause and wait for rest of data. Adjust this depending on your sending speed.
bytes = mmInStream.available(); // how many bytes are ready to be read?
bytes = mmInStream.read(buffer, 0, bytes); // record how many bytes we actually read
mHandler.obtainMessage(MESSAGE_READ, bytes, -1, buffer)
.sendToTarget(); // Send the obtained bytes to the UI activity
}
} catch (IOException e) {
e.printStackTrace();
break;
}
}
}
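Not from the original post, but one common way to avoid split readings with a stream like this is to frame the data on the newline that println() already sends, and only dispatch complete lines to the handler. A sketch of such a read loop, assuming the same mmInStream, mHandler and MESSAGE_READ as above (the handler would then receive one complete line per message, as a String rather than a raw byte[]):
StringBuilder lineBuffer = new StringBuilder();
byte[] buffer = new byte[1024];
while (true) {
    try {
        int bytes = mmInStream.read(buffer);      // blocks until data arrives
        if (bytes == -1) break;                   // stream closed
        for (int i = 0; i < bytes; i++) {
            char c = (char) buffer[i];
            if (c == '\n') {
                String line = lineBuffer.toString().trim();  // drops the trailing '\r'
                lineBuffer.setLength(0);
                if (!line.isEmpty()) {
                    mHandler.obtainMessage(MESSAGE_READ, line).sendToTarget();
                }
            } else {
                lineBuffer.append(c);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        break;
    }
}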

Java audio Stream Closed error

I am trying to add sound to a game I am making, but every time I try to load the sound, I get a Stream Closed Exception. I don't understand why this is happening.
Loads the sound:
public class WavPlayer extends Thread {
/*
* @param s The path of the wav file.
* @return The sound data loaded into the WavSound object
*/
public static WavSound loadSound(String s){
// Get an input stream
InputStream is = WavPlayer.class.getClassLoader().getResourceAsStream(s);
AudioInputStream audioStream;
try {
// Buffer the input stream
BufferedInputStream bis = new BufferedInputStream(is);
// Create the audio input stream and audio format
audioStream = AudioSystem.getAudioInputStream(bis); //!Stream Closed Exception occurs here
AudioFormat format = audioStream.getFormat();
// The length of the audio file
int length = (int) (audioStream.getFrameLength() * format.getFrameSize());
// The array to store the samples in
byte[] samples = new byte[length];
// Read the samples into array to reduce disk access
// (fast-execution)
DataInputStream dis = new DataInputStream(audioStream);
dis.readFully(samples);
// Create a sound container
WavSound sound = new WavSound(samples, format, (int) audioStream.getFrameLength());
// Don't start the sound on load
sound.setState(SoundState.STATE_STOPPED);
// Create a new player for each sound
new WavPlayer(sound);
return sound;
} catch (Exception e) {
// An error. Mustn't happen
}
return null;
}
// Private variables
private WavSound sound = null;
/**
* Constructs a new player with a sound and with an optional looping
*
* @param s The WavSound object
*/
public WavPlayer(WavSound s) {
sound = s;
start();
}
/**
* Runs the player in a separate thread
*/
@Override
public void run(){
// Get the byte samples from the container
byte[] data = sound.getData();
InputStream is = new ByteArrayInputStream(data);
try {
// Create a line for the required audio format
SourceDataLine line = null;
AudioFormat format = sound.getAudioFormat();
// Calculate the buffer size and create the buffer
int bufferSize = sound.getLength();
// System.out.println(bufferSize);
byte[] buffer = new byte[bufferSize];
// Create a new data line to write the samples onto
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
line = (SourceDataLine) AudioSystem.getLine(info);
// Open and start playing on the line
try {
if (!line.isOpen()) {
line.open();
}
line.start();
} catch (Exception e){}
// The total bytes read
int numBytesRead = 0;
boolean running = true;
while (running) {
// Destroy this player if the sound is destroyed
if (sound.getState() == SoundState.STATE_DESTROYED) {
running = false;
// Release the line and release any resources used
line.drain();
line.close();
}
// Write the data only if the sound is playing or looping
if ((sound.getState() == SoundState.STATE_PLAYING)
|| (sound.getState() == SoundState.STATE_LOOPING)) {
numBytesRead = is.read(buffer, 0, buffer.length);
if (numBytesRead != -1) {
line.write(buffer, 0, numBytesRead);
} else {
// The samples are ended. So reset the position of the
// stream
is.reset();
// If the sound is not looping, stop it
if (sound.getState() == SoundState.STATE_PLAYING) {
sound.setState(SoundState.STATE_STOPPED);
}
}
} else {
// Not playing. so wait for a few moments
Thread.sleep(Math.min(1000 / Global.FRAMES_PER_SECOND, 10));
}
}
} catch (Exception e) {
// Do nothing
}
}
The error message I get is: "Exception in thread "main" java.io.IOException: Stream closed
at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:134)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at com.sun.media.sound.WaveFileReader.getFMT(WaveFileReader.java:224)
at com.sun.media.sound.WaveFileReader.getAudioInputStream(WaveFileReader.java:140)
at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1094)
at stm.sounds.WavPlayer.loadSound(WavPlayer.java:42)
at stm.STM.(STM.java:265)
at stm.STM.main(STM.java:363)"
Most probably the file path in this line is not correct:
WavPlayer sound1 = WavPlayer.loadSound("coin.wav");
You should pass the path of the 'coin.wav' file instead of just its name.
For instance, if it is under a folder named sounds, say right under the project root, that parameter should be 'sounds/coin.wav'.
The problem is in your static method loadSound. This method returns null when an exception is thrown: you catch the exception but do nothing with it.
NEVER use an empty catch block.
Catch specific exceptions.
I would change your loadSound method signature to
public static WavSound loadSound(String s) throws Exception // or, better, a more specific exception
and then write the method body without the try-catch.
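A sketch of what that could look like, reusing the WavSound / WavPlayer types from the question; the null check is an addition that makes a wrong classpath path fail with a clear message instead of a stream error further down:
public static WavSound loadSound(String s) throws Exception {
    InputStream is = WavPlayer.class.getClassLoader().getResourceAsStream(s);
    if (is == null) {
        throw new FileNotFoundException("Resource not found on classpath: " + s);
    }
    // Buffer the stream and decode the WAV header
    AudioInputStream audioStream = AudioSystem.getAudioInputStream(new BufferedInputStream(is));
    AudioFormat format = audioStream.getFormat();
    // Read all samples into memory to reduce disk access
    int length = (int) (audioStream.getFrameLength() * format.getFrameSize());
    byte[] samples = new byte[length];
    new DataInputStream(audioStream).readFully(samples);
    WavSound sound = new WavSound(samples, format, (int) audioStream.getFrameLength());
    sound.setState(SoundState.STATE_STOPPED);
    new WavPlayer(sound);
    return sound;
}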

Play audio in multiple outputs

I need to play music on different audio outputs.
For instance, I have two tracks, music1 and music2,
and they have to play in separate threads on different speakers,
assuming that I have more than one audio device that is able
to play sound.
I found this method (here - it is the BasicPlayer):
protected void createLine() throws LineUnavailableException
{
log.info("Create Line");
if (m_line == null)
{
AudioFormat sourceFormat = m_audioInputStream.getFormat();
log.info("Create Line : Source format : " + sourceFormat.toString());
int nSampleSizeInBits = sourceFormat.getSampleSizeInBits();
if (nSampleSizeInBits <= 0) nSampleSizeInBits = 16;
if ((sourceFormat.getEncoding() == AudioFormat.Encoding.ULAW) || (sourceFormat.getEncoding() == AudioFormat.Encoding.ALAW)) nSampleSizeInBits = 16;
if (nSampleSizeInBits != 8) nSampleSizeInBits = 16;
AudioFormat targetFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, sourceFormat.getSampleRate(), nSampleSizeInBits, sourceFormat.getChannels(), sourceFormat.getChannels() * (nSampleSizeInBits / 8), sourceFormat.getSampleRate(), false);
log.info("Create Line : Target format: " + targetFormat);
// Keep a reference on encoded stream to progress notification.
m_encodedaudioInputStream = m_audioInputStream;
try
{
// Get total length in bytes of the encoded stream.
encodedLength = m_encodedaudioInputStream.available();
}
catch (IOException e)
{
log.error("Cannot get m_encodedaudioInputStream.available()", e);
}
// Create decoded stream.
m_audioInputStream = AudioSystem.getAudioInputStream(targetFormat, m_audioInputStream);
AudioFormat audioFormat = m_audioInputStream.getFormat();
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat, AudioSystem.NOT_SPECIFIED);
Mixer mixer = getMixer(m_mixerName);
if (mixer != null)
{
log.info("Mixer : "+mixer.getMixerInfo().toString());
m_line = (SourceDataLine) mixer.getLine(info);
}
else
{
m_line = (SourceDataLine) AudioSystem.getLine(info);
m_mixerName = null;
}
log.info("Line : " + m_line.toString());
log.debug("Line Info : " + m_line.getLineInfo().toString());
log.debug("Line AudioFormat: " + m_line.getFormat().toString());
}
}
With a little debugging, I've found out that the mixer is always null. Why is that?
Shouldn't the mixer be the device that outputs sound through a target line?
This program always plays back on the default device set on my computer. What can I do to change that?
I've actually just started working with the Java Sound API for one of my own projects, but from what I understand, Mixer is just an interface, not an object. That may explain part of your problem.
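To go a bit further (this is not from the answer above, just a sketch of the standard javax.sound.sampled approach): you can enumerate the installed mixers and open a SourceDataLine on a specific one, so each player thread can target a different output device. The device-name fragment is a placeholder; the main method simply lists what names exist on your machine:
import javax.sound.sampled.*;

public class MixerPicker {

    // Opens a SourceDataLine on the first mixer whose name contains the given fragment.
    public static SourceDataLine openLineOn(String mixerNameFragment, AudioFormat format)
            throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        for (Mixer.Info mixerInfo : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(mixerInfo);
            if (mixerInfo.getName().contains(mixerNameFragment) && mixer.isLineSupported(info)) {
                SourceDataLine line = (SourceDataLine) mixer.getLine(info);
                line.open(format);
                return line;
            }
        }
        throw new LineUnavailableException("No mixer matching: " + mixerNameFragment);
    }

    public static void main(String[] args) {
        // Print the device names Java Sound can see, to use as name fragments above.
        for (Mixer.Info mixerInfo : AudioSystem.getMixerInfo()) {
            System.out.println(mixerInfo.getName() + " - " + mixerInfo.getDescription());
        }
    }
}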

DDS DataReader Cache Breaks And Is No Longer Accessible

Operating with a DDS library on i386, trying to pull samples repeatedly. I am explicitly 'reading', not 'taking', the samples, so they should never expire or be removed.
Start two blackboard applications, (1) and (2)
Perform a read in both applications. This will return "Cache is empty".
Write from (1), sensor id: 1, event id: 1, value: 1.
Read from (1), confirm values
Read from (2), confirm values
Write from (2), sensor id: 1, event id: 1, value: 2.
Read from (2), "cache is empty"
Read from (1), "cache is empty"
It seems like I "broke" it! I believe the lifetime for samples should be infinity (or so I have come to understand... but cannot confirm!) -- but I can't set it explicitly. topicQos.lifespan.duration is of the type Duration_t, but I cannot set it to a "new Duration_t(Duration_t.DURATION_INFINITY_SEC, Duration_t.DURATION_INFINITY_NSEC)" because it is already finalized?
public class Main {
private static final String EVENT_TOPIC_NAME = "EVENTS";
private static BufferedReader in = null;
private static PrintStream out = null;
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
in = new BufferedReader(new InputStreamReader(System.in));
out = new PrintStream(new BufferedOutputStream(System.out));
DomainParticipantFactory factory = DomainParticipantFactory.TheParticipantFactory;
DomainParticipant participant = factory.create_participant(100,
DomainParticipantFactory.PARTICIPANT_QOS_DEFAULT,
null,
StatusKind.STATUS_MASK_NONE);
EventTypeSupport.register_type(participant, EventTypeSupport.get_type_name());
TopicQos topicQos = new TopicQos();
topicQos.durability.direct_communication = true;
topicQos.durability.kind = DurabilityQosPolicyKind.TRANSIENT_DURABILITY_QOS;
topicQos.reliability.kind = ReliabilityQosPolicyKind.RELIABLE_RELIABILITY_QOS;
topicQos.resource_limits.max_instances = 100;
topicQos.resource_limits.max_samples = 100;
topicQos.resource_limits.max_samples_per_instance = 1;
topicQos.ownership.kind = OwnershipQosPolicyKind.SHARED_OWNERSHIP_QOS;
topicQos.history.kind = HistoryQosPolicyKind.KEEP_LAST_HISTORY_QOS;
topicQos.history.depth = 1;
topicQos.history.refilter = RefilterQosPolicyKind.ALL_REFILTER_QOS;
// Since this is on the same computer, and being typed by a human, we can expect source timestamps to be useful in ordering
topicQos.destination_order.kind = DestinationOrderQosPolicyKind.BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS;
Topic topic =
participant.create_topic(EVENT_TOPIC_NAME,
EventTypeSupport.get_type_name(),
topicQos,
new EventTopicListener(),
StatusKind.STATUS_MASK_ALL);
exitIfNullBecause(topic, "Could not create topic");
Subscriber subscriber = participant.create_subscriber(DomainParticipant.SUBSCRIBER_QOS_DEFAULT,
null,
StatusKind.STATUS_MASK_NONE);
exitIfNullBecause(subscriber, "Could not create subscriber");
DataReader reader = subscriber.create_datareader(participant.lookup_topicdescription(EVENT_TOPIC_NAME),
subscriber.DATAREADER_QOS_USE_TOPIC_QOS,
null,
StatusKind.STATUS_MASK_NONE);
exitIfNullBecause(reader, "Could not create reader");
EventDataReader eventReader = (EventDataReader) reader;
Publisher publisher = participant.create_publisher(DomainParticipant.PUBLISHER_QOS_DEFAULT,
null,
StatusKind.STATUS_MASK_NONE);
exitIfNullBecause(publisher, "Could not create publisher");
DataWriter writer = publisher.create_datawriter(topic,
publisher.DATAWRITER_QOS_USE_TOPIC_QOS,
null,
StatusKind.STATUS_MASK_NONE);
exitIfNullBecause(writer, "Could not create writer");
EventDataWriter eventWriter = (EventDataWriter)writer;
boolean loop = true;
byte inputBuffer[] = new byte[1024];
String command;
while(loop){
print("Enter action [read|write|exit]: ");
command = in.readLine();
if(command.startsWith("r")){
dumpCache(eventReader);
} else if(command.startsWith("w")) {
writeCache(eventWriter);
} else if(command.startsWith("e")){
println("exiting...");
System.exit(0);
} else {
println("Unknown: '" + command + "'");
}
}
System.exit(0);
}
private static void print(String output){
out.print(output);
out.flush();
}
private static void println(String output){
out.println(output);
out.flush();
}
private static void exitIfNullBecause(Object thing, String string) {
if (thing == null) {
println("ERROR: " + string);
System.exit(1);
}
}
private static void dumpCache(EventDataReader eventReader) {
// Something interesting here: I can create it with a collection as a parameter. TODO: Investigate!
EventSeq eventSeq = new EventSeq();
SampleInfoSeq infoSeq = new SampleInfoSeq();
Event event = null;
SampleInfo info = null;
try{
eventReader.read(eventSeq, infoSeq, 100, SampleStateKind.ANY_SAMPLE_STATE, ViewStateKind.ANY_VIEW_STATE, InstanceStateKind.ANY_INSTANCE_STATE);
} catch (Exception e){
println("Cache is empty");
return;
}
Iterator<SampleInfo> infoIter = infoSeq.iterator();
out.printf("| Sensor ID | Event ID | Value |\n");
for(int i=0; i<infoSeq.size(); i++){
event = (Event)eventSeq.get(i);
out.printf("| %9d | %8d | %5d |\n", event.sensor_id, event.event_id, event.value);
}
out.flush();
}
private static void writeCache(EventDataWriter eventWriter) throws IOException {
Event event = new Event();
print("Sensor ID: ");
String sensor_id_str = in.readLine();
print("Event ID: ");
String event_id_str = in.readLine();
print("Value: ");
String value_str = in.readLine();
Event sample = new Event();
sample.sensor_id = Integer.parseInt(sensor_id_str);
sample.event_id = Integer.parseInt(event_id_str);
sample.value = Integer.parseInt(value_str);
InstanceHandle_t handle = eventWriter.register_instance(sample);
// eventWriter.write(sample, handle);
eventWriter.write_w_timestamp(sample, handle, Time_t.now());
out.printf("SensorID: %s, EventID: %s, Value: %s\n",sensor_id_str,event_id_str,value_str); out.flush();
}
}
The problem does not seem to be related to lifespan.
I'm not sure which DDS implementation you are using, but according to the DDS spec you are performing a zero-copy operation in your dumpCache method. Maybe the implementation you use behaves like this if you forget to return the loan.
You should normally call return_loan after a read/take with zero-copy.
So please add the following code at the end of your dumpCache method:
try{
eventReader.return_loan(eventSeq, infoSeq);
} catch (Exception e){
println("Error returning loan");
return;
}
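For what it's worth, one way to make sure the loan is always given back, even if something throws while iterating the results, is to put the return_loan call in a finally block. A sketch of the reworked dumpCache body, using the same types as in the question:
try {
    eventReader.read(eventSeq, infoSeq, 100,
            SampleStateKind.ANY_SAMPLE_STATE,
            ViewStateKind.ANY_VIEW_STATE,
            InstanceStateKind.ANY_INSTANCE_STATE);
    out.printf("| Sensor ID | Event ID | Value |\n");
    for (int i = 0; i < infoSeq.size(); i++) {
        Event event = (Event) eventSeq.get(i);
        out.printf("| %9d | %8d | %5d |\n", event.sensor_id, event.event_id, event.value);
    }
    out.flush();
} catch (Exception e) {
    println("Cache is empty");
} finally {
    try {
        eventReader.return_loan(eventSeq, infoSeq);
    } catch (Exception e) {
        // nothing was loaned (e.g. the read itself failed), so there is nothing to return
    }
}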

How to tell when AudioTrack object has finished playing?

I'm trying to play a PCM file in Android using the AudioTrack class. I can get the file to play just fine, but I cannot reliably tell when playback has finished. AudioTrack.getPlayState says playback has stopped when it hasn't finished playing. I'm having the same problem with AudioTrack.setNotificationMarkerPosition, and I'm pretty sure my marker is set to the end of the file (although I'm not completely sure I'm doing it right). Likewise, playback continues when getPlaybackHeadPosition is at the end of the file and has stopped incrementing. Can anyone help?
I found that using audioTrack.setNotificationMarkerPosition(audioLength) and audioTrack.setPlaybackPositionUpdateListener worked for me. See the following code:
// Get the length of the audio stored in the file (16 bit so 2 bytes per short)
// and create a short array to store the recorded audio.
int audioLength = (int) (pcmFile.length() / 2);
short[] audioData = new short[audioLength];
DataInputStream dis = null;
try {
// Create a DataInputStream to read the audio data back from the saved file.
InputStream is = new FileInputStream(pcmFile);
BufferedInputStream bis = new BufferedInputStream(is);
dis = new DataInputStream(bis);
// Read the file into the music array.
int i = 0;
while (dis.available() > 0) {
audioData[i] = dis.readShort();
i++;
}
// Create a new AudioTrack using the same parameters as the AudioRecord.
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, RECORDER_SAMPLE_RATE, RECORDER_CHANNEL_OUT,
RECORDER_AUDIO_ENCODING, audioLength, AudioTrack.MODE_STREAM);
audioTrack.setNotificationMarkerPosition(audioLength);
audioTrack.setPlaybackPositionUpdateListener(new OnPlaybackPositionUpdateListener() {
@Override
public void onPeriodicNotification(AudioTrack track) {
// nothing to do
}
@Override
public void onMarkerReached(AudioTrack track) {
Log.d(LOG_TAG, "Audio track end of file reached...");
messageHandler.sendMessage(messageHandler.obtainMessage(PLAYBACK_END_REACHED));
}
});
// Start playback
audioTrack.play();
// Write the music buffer to the AudioTrack object
audioTrack.write(audioData, 0, audioLength);
} catch (Exception e) {
Log.e(LOG_TAG, "Error playing audio.", e);
} finally {
if (dis != null) {
try {
dis.close();
} catch (IOException e) {
// don't care
}
}
}
This works for me:
int x;
do { // Monitor playback to find out when it is done
    x = audioTrack.getPlaybackHeadPosition();
} while (x < pcmFile.length() / 2);
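The same idea with a short sleep between polls, so the loop does not spin at full CPU speed while waiting (a sketch; pcmFile.length() / 2 is the frame count for 16-bit mono PCM):
int totalFrames = (int) (pcmFile.length() / 2);
while (audioTrack.getPlaybackHeadPosition() < totalFrames) {
    try {
        Thread.sleep(10); // poll roughly every 10 ms
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}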
