I'm building a voice assistant. I have a working audio recorder that I can stop and start easily.
I just want to detect when the user is actually speaking (i.e. the input isn't silent) so that I only record what they say, stopping the recording when they stop speaking.
I've been struggling to find a way of doing this. Does anyone have any ideas?
Edit:
So far, Sphinx4 is the only thing I've found that can detect voice input, but it's been flaky at best, and I haven't been able to use it to trigger my sound recorder.
I also had a similar project and am using Sphinx4. As the OP said, using Sphinx4 to detect when a user is speaking (and just as importantly when they stop speaking) was an issue. As they said, it was "flaky" at best. Fortunately, I came up with a solution that hopefully helps others that stumble along here.
There are two solid ways I found that worked with Sphinx:
1st Solution: Get the Sphinx4 source and copy the classes from the package edu.cmu.sphinx.api into your project. There are about nine files in there, starting with AbstractSpeechRecognizer and ending with StreamSpeechRecognizer. Since the OP is talking about voice recognition, it is assumed that their input stream is from a microphone, which makes the LiveSpeechRecognizer and Microphone classes the important ones. Change the imports in your main Java class to point at that package (or at individual classes as necessary).
From there, there are multiple options. I ended up writing a method in LiveSpeechRecognizer and Microphone that used the output of LiveSpeechRecognizer.getResult() and set a shared boolean to record when a voice is detected. LiveSpeechRecognizer returns a result whenever someone finishes speaking, so essentially you set the boolean when the first audio is detected and detect the voice once the next result comes in. Add a timer (I used Java's Executor) on a separate thread to track how long it has been since words were last detected (e.g. 2 seconds). That way, if something goes wrong with the mic or the user says a very short sentence, you still detect the "end".
In this solution, extending sphinx.api isn't strictly necessary; however, I found the response a lot faster when modifying Microphone directly rather than waiting for results in the main class.
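The shared-boolean-plus-timer idea can be sketched without any Sphinx code at all; the class and method names below are mine, not Sphinx4's. Every time the recognizer hands back a result you record the timestamp, and a watchdog on a separate thread declares the "end" once no result has arrived within the timeout:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Fires a callback once no speech result has arrived for timeoutMillis. */
public class EndOfSpeechWatchdog {
    private final long timeoutMillis;
    private final Runnable onSpeechEnded;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private volatile long lastResultAt = Long.MIN_VALUE;   // MIN_VALUE = no speech seen yet

    public EndOfSpeechWatchdog(long timeoutMillis, Runnable onSpeechEnded) {
        this.timeoutMillis = timeoutMillis;
        this.onSpeechEnded = onSpeechEnded;
        // Poll a few times per second; cheap next to the recognizer itself.
        timer.scheduleAtFixedRate(this::check, timeoutMillis, 250, TimeUnit.MILLISECONDS);
    }

    /** Call this from the loop that consumes recognizer.getResult(). */
    public void resultArrived(long nowMillis) {
        lastResultAt = nowMillis;
    }

    /** True once the most recent result is older than the timeout. */
    boolean timedOut(long nowMillis) {
        return lastResultAt != Long.MIN_VALUE && nowMillis - lastResultAt >= timeoutMillis;
    }

    private void check() {
        if (timedOut(System.currentTimeMillis())) {
            lastResultAt = Long.MIN_VALUE;   // reset so we fire once per utterance
            onSpeechEnded.run();
        }
    }

    public void shutdown() {
        timer.shutdownNow();
    }
}
```

Wire resultArrived(System.currentTimeMillis()) into whatever consumes getResult(), and start/stop your recorder from the onSpeechEnded callback.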
2nd Solution: You could also modify the Sphinx4 source so that when "noise" is detected below a certain level, it means the user has stopped talking. Sphinx continuously monitors the microphone through Java's TargetDataLine. Experiment with the threshold at which it filters out noise, and implement a listener that fires when the level changes too much. This approach is absolutely awful for voice recognition, but the OP wanted to detect when a person starts and stops talking, which this will do.
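The threshold idea from the 2nd solution can also be tried directly on raw audio rather than on Sphinx internals. This is a minimal sketch under my own assumptions (16-bit signed PCM samples, a hand-tuned threshold, and a "hangover" window so brief pauses don't cut the recording); none of it comes from Sphinx4:

```java
/** Toy voice-activity detector: "speaking" while RMS energy stays above a threshold. */
public class SimpleVad {
    private final double threshold;     // RMS level that counts as speech (hand-tuned)
    private final long hangoverMillis;  // how long silence must last before "stopped"
    private long lastVoiceTime = -1;

    public SimpleVad(double threshold, long hangoverMillis) {
        this.threshold = threshold;
        this.hangoverMillis = hangoverMillis;
    }

    /** RMS energy of a buffer of 16-bit signed samples. */
    static double rms(short[] samples) {
        double sum = 0;
        for (short s : samples) sum += (double) s * s;
        return Math.sqrt(sum / samples.length);
    }

    /** Feed one buffer; returns true while the user is considered to be speaking. */
    public boolean process(short[] samples, long nowMillis) {
        if (rms(samples) >= threshold) {
            lastVoiceTime = nowMillis;
            return true;
        }
        // Still "speaking" during the hangover window, so short pauses don't end the utterance.
        return lastVoiceTime >= 0 && nowMillis - lastVoiceTime < hangoverMillis;
    }
}
```

In a real recorder you would fill samples from a TargetDataLine.read(...) loop and start/stop the recording on the rising and falling edges of process(...).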
I am trying to build an application that can open an IP camera stream and, if there are failures in that stream, be notified of the nature of the failure so it can take the appropriate action. I got this working pretty well with VLCJ. But since I may detect a failure on one camera while an alternate camera is available, I need to be able to run multiple media player instances in the same application. This is why I am trying out the trial version 1.2 of VLCJ pro.
However, I am not able to access the logging mechanism like NativeLog in VLCJ. When a stream fails, the logs are printed out on closing the JFrame that holds the canvas, but how do I check these strings in my application and use this information to execute alternate actions? I have gone through the VLCJ pro user guide, but there doesn't seem to be any mention of this. Any ideas would be very helpful and much appreciated. Thank you.
Edit: I don't want the logs to be stored in a logfile, rather something like a runtime access to generated log messages.
This is not supported.
It might be possible for this to be added in the future, but there are currently no plans to do so.
For my weekend project I'm trying to create a simple program that waits for a program/process to output any sound, and when/if it does then do something.
In pseudocode:
if (application.outputsSound()) {
/* Do something */
}
For starters it could be any sound coming from the specific application, but if it's within reason to detect a specific sound based on a stored audio file, that would be really cool.
My thoughts:
I guess that I need some kind of native library (JNI / JNA), but since I'm new to that, it would be really neat if someone could point me in the right direction.
On Windows you could use the IAudioSessionEnumerator interface
https://msdn.microsoft.com/en-us/library/windows/desktop/dd368281(v=vs.85).aspx
Although this is not perfect, as third-party audio stacks like ASIO will not show up.
On Linux it depends on what kind of environment you are using.
How can I make Eclipse automatically update my code in a window as I edit it? I've seen this feature in YouTube videos but I cannot find it. For example: I change a JApplet rectangle's width from 20 to 10, and I want to see it update immediately.
I've seen Notch do this in development videos (Minecraft); it's awesome, but I don't know exactly how he does it.
-- EDIT --
This has been bugging me so I went and googled "how does notch code" and found this on a blog page https://gun.io/blog/what-i-learned-from-watching-notch-code/. It doesn't say exactly how it was done but gives a good hint (HotSwap) and makes it seem like he set it up himself without external software. Here's the most relevant section:
Incredibly Fast Testing
He began by building the engine, and to do this he used the ‘HotSwap’ functionality of the Java JVM 1.4.2, which continuously updates the running code when it detects that a class has changed.
When building the engine, Notch wrote a function which would continuously pan the camera around and clip through the walls and keep the view on top, so he could make changes to the code and see the effects they made in real time. I’m used to testing by writing a function, building it, installing it on the device I’m testing on, and then seeing the result, which can take up to a minute at a time, so it’s easy to see how HotSwapping could save a lot of development time.
--- ORIGINAL POST CONTINUED ---
I get a similar effect by using groovysh, though; it works smoothly and can use all your Java classes as-is.
What I'll usually do is write all my code in Java, then fire up groovysh, which gives you a little window to enter commands (you may have to ensure the classpath works correctly outside of Eclipse). I can then "new" any of my classes and call methods on them one line at a time. When you do myFrame.setSize(100, 100) you will see it change immediately.
A good test is to just run groovysh and type something like:
import javax.swing.*
f=new JFrame()
f.setVisible(true)
f.setSize(100,100)
or the groovier version:
f=new JFrame(visible:true, size:[100,100])
and you will see your frame resize on the screen. You can even drag it bigger and then do something like:
println f.getWidth()
to show your new width. It's fun to interact this way, but it's more complicated if you want to actually change your class definition and see it pick up the change; I have no idea how Notch did that. I looked into it a little, and it's possible he was using something like JRebel.
It requires something special, since you would have to dynamically reload the class file into your running system on every save, which would normally run into serious classloader issues.
By the way, there is also a way to get your Java program to throw out a little GroovyConsole, which will let you inspect and modify all the variables in your running code (but again, you can't replace definitions of existing classes).
Also see answer here:
Change a method at runtime via a hot swap mechanism
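If you want to try plain JVM HotSwap yourself, here is a minimal sketch (this class is hypothetical, not how Notch actually set things up). Run it under a debugger (e.g. Eclipse "Debug As"), then edit the body of tick() and save: the debugger hot-swaps the method into the running JVM and the printed output changes without a restart.

```java
/** Minimal HotSwap playground: a loop that keeps calling a method you can edit live. */
public class HotSwapDemo {

    /** Edit this body while debugging and save; HotSwap updates the running loop. */
    static String tick() {
        return "width=20";
    }

    public static void main(String[] args) throws InterruptedException {
        // Infinite loop on purpose: it keeps calling tick() so edits show up immediately.
        while (true) {
            System.out.println(tick());
            Thread.sleep(500);
        }
    }
}
```

Note that HotSwap can only replace method bodies; adding or removing methods or fields still requires a restart, which is where tools like JRebel come in.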
Here is the stack trace:
java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:433)
at org.tritonus.share.sampled.file.TAudioFileReader.getAudioInputStream(TAudioFileReader.java:324)
at javazoom.spi.mpeg.sampled.file.MpegAudioFileReader.getAudioInputStream(Unknown Source)
at javazoom.spi.mpeg.sampled.file.MpegAudioFileReader.getAudioInputStream(Unknown Source)
at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1179)
at javazoom.jlgui.basicplayer.BasicPlayer.initAudioInputStream(Unknown Source)
at javazoom.jlgui.basicplayer.BasicPlayer.initAudioInputStream(Unknown Source)
at javazoom.jlgui.basicplayer.BasicPlayer.open(Unknown Source)
at BasicPlayerDemo.play(BasicPlayerDemo.java:49)
at BasicPlayerDemo.main(BasicPlayerDemo.java:24)
Seems that other people are also having this problem:
Jukebox: no sound?
Stack trace on player state update
Any reason for this? I am trying to make a simple Java Swing music player using JavaZoom classes.
There is a solution for this problem on the pelzkuh.de blog. It is in German, but essentially it says the cause is an outdated library, mp3spi1.9.4.jar, which you have to replace with the newer mp3spi1.9.5.jar. Links are provided in the blog entry.
The thread with answers alludes to something I have struck before.
MP3 is hardly any sort of 'standard'; there are many extensions to the basic format. Java Sound based apps will generally only deal with some of those types, and even if that were not the case:
Media players generally go to considerable effort to play 'any rubbish file' (including invalid ones) thrown at them. It would be a major effort to replicate that ability.
So is there no simple solution to this? Should I just ignore such MP3s?
No.
That sounds pretty easy (simple): skip them and go to the next track. Pop up a dialog or add it to a log if the user chooses 'high feedback' in the player options.
Actually, I'm making a mini-project for my college, so it doesn't look good if this player doesn't play certain files.
I'd check that with the people marking it. If they are expecting you to provide support for 'any file thrown at it' in a college project, they need to pull their head out of the clouds. I'll bet I could make files that play on one 'major player' but cause the next to lock up & die (OK.. thinking more of some recent video attempts, but the same basic principle applies). 'Handling media' is tricky.
..the college isn't expecting anything, since I chose to create this myself. I didn't know that handling media is tricky. Now if they ask, I can tell them that!
I suggest supplying the player complete with a playlist and media controlled by you (and sure to be compatible with your player). You can find 3 basic, distributable MP3 tracks on my media page. Those are:
(parsable by) the JMF MP3 codec
..so JavaZoom should be able to load them as well.
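The skip-and-log approach suggested above is only a few lines. In the sketch below the open callback is a hypothetical stand-in for whatever your player does to open a track (e.g. a try/catch around BasicPlayer.open(...), which throws BasicPlayerException on files it cannot read):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Walks a playlist, skipping tracks the player cannot open. */
public class PlaylistSkipper {

    /** Returns the tracks that opened; failures go into skippedOut for the log/dialog. */
    static List<String> playable(List<String> playlist,
                                 Predicate<String> open,   // stand-in for BasicPlayer.open(...)
                                 List<String> skippedOut) {
        List<String> ok = new ArrayList<>();
        for (String track : playlist) {
            if (open.test(track)) {
                ok.add(track);
            } else {
                skippedOut.add(track);   // report instead of crashing the whole player
            }
        }
        return ok;
    }
}
```

In a real player the skippedOut list is what you would surface in the 'high feedback' dialog or log.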
I'm currently working on a remote desktop administration project. I'm using the Robot class to capture images and send them over the network. It works well, but it's a bit slow.
Capturing and sending the full image every time is too costly. Is it possible to detect only the portion of the screen that has changed and send just that portion?
Can anyone guide me on this? Thank you!
The keyword you're looking for (in order to be able to look this up and figure the solution yourself) is dirty rectangles.
You can look into some code here.
I looked into this a while back, and the image capture is implemented particularly inefficiently. I don't recall the specific details, but the way they did it was pretty bad. I felt at the time that the only way to do it better would be to implement it in JNI, which you could use JNA to shortcut.
I don't know if any platform's screen-capture routines will report only the changed sections, but you could implement a decent image diff, although that can get expensive too. You would really need to measure what's going on to see if it works for you.
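A rough image diff along those lines: split each captured frame into tiles and resend only the tiles that differ from the previous frame. This is a sketch under my own assumptions (the tile size, and consecutive frames already captured as BufferedImages via Robot.createScreenCapture(...)), not how any particular remote-desktop tool does it:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Tile-based "dirty rectangle" detector: reports which tiles changed between frames. */
public class DirtyTiles {
    static final int TILE = 64;   // tile size in pixels (tune for your bandwidth/CPU trade-off)

    /** Returns {x, y, w, h} rectangles for the tiles that differ between prev and next. */
    static List<int[]> diff(BufferedImage prev, BufferedImage next) {
        List<int[]> dirty = new ArrayList<>();
        for (int y = 0; y < next.getHeight(); y += TILE) {
            for (int x = 0; x < next.getWidth(); x += TILE) {
                // Clamp tile size at the right and bottom edges of the screen.
                int w = Math.min(TILE, next.getWidth() - x);
                int h = Math.min(TILE, next.getHeight() - y);
                int[] a = prev.getRGB(x, y, w, h, null, 0, w);
                int[] b = next.getRGB(x, y, w, h, null, 0, w);
                if (!Arrays.equals(a, b)) {
                    dirty.add(new int[]{x, y, w, h});
                }
            }
        }
        return dirty;
    }
}
```

On the wire you would then send only next.getSubimage(x, y, w, h) for each dirty rectangle; keeping a hash per tile (e.g. Arrays.hashCode) instead of the full previous frame would cut the memory cost.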