Java voice recognition for very small dictionary

Java voice recognition for very small dictionary - java

I have MP3 audio files that contain voicemails that are left by a computer.
The message content is always in same format and left by the same computer voice with only a slight variation in content:
"You sold 4 cars today" (where the 4 can be anything from 0 to 9).
I have be trying to set up Sphinx, but the out-of-the-box models did not work too good.
I then tried to write my own acoustic model and haven't had much better success yet (30% unrecognized is my best).
I am wondering if voice recognition might be overkill for this task since I have exactly ONE voice, an expected audio pattern and a very limited dictionary that would need to be recognized.
I have access to each of the ten sounds (spoken numbers) that I would need to search for in the message.
Is there a non-VR approach to finding sounds in an audio file (I can convert MP3 to another format if necessary).
Update: My solution to this task follows
After working with Nikolay directly, I learned that the answer to my original question is irrelevant since the desired results may be achieved (with 100% accuracy) using Sphinx4 and a JSGF grammar.
1: Since the speech in my audo files is very limited, I created a JSGF grammar (salesreport.gram) to describe it. All of the information I needed to create the following grammar was available on this JSpeech Grammar Format page.
#JSGF V1.0;
grammar salesreport;
public <salesreport> = (<intro> | <sales> | <closing>)+;
<intro> = this is your automated automobile sales report;
<sales> = you sold <digit> cars today;
<closing> = thank you for using this system;
<digit> = zero | one | two | three | four | five | six | seven | eight | nine;
NOTE: Sphinx does not support JSGF tags in the grammar. If necessary, a regular expression may be used to extract specific information (the number of sales in my case).
2: It is very important that your audio files are properly formatted. The default sample rate for Sphinx is 16Khz (16Khz means there are 16000 samples collected every second). I converted my MP3 audio files to WAV format using FFmpeg.
ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
Unfortunately, FFmpeg renders this solution OS dependent. I am still looking for a way to convert the files using Java and will update this post if/when I find it.
Although it was not required to complete this task, I found Audacity helpful for working with audio files. It includes many utilities for working with the audio files (checking sample rate and bandwidth, file format conversion, etc).
3: Since telephone audio has a maximum bandwidth (the range of frequencies included in the audio) of 8kHz, I used the Sphinx en-us-8khz acoustic model.
4: I generated my dictionary, salesreport.dic, using lmtool
5: Using the files mentioned in the previous steps and the following code (modified version of Nikolay's example), my speech is recognized with 100% accuracy every time.
public String parseAudio(File voiceFile) throws FileNotFoundException, IOException
{
String retVal = null;
StringBuilder resultSB = new StringBuilder();
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:acoustic_models/en-us-8khz");
configuration.setDictionaryPath("file:salesreport.dic");
configuration.setGrammarPath("file:salesreportResources/")
configuration.setGrammarName("salesreport");
configuration.setUseGrammar(true);
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
try (InputStream stream = new FileInputStream(voiceFile))
{
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null)
{
System.out.format("Hypothesis: %s\n", result.getHypothesis());
resultSB.append(result.getHypothesis()
+ " ");
}
recognizer.stopRecognition();
}
return resultSB.toString().trim();
}

The accuracy on such task must be 100%. Here is the code sample to use with the grammar:
public class TranscriberDemoGrammar {
public static void main(String[] args) throws Exception {
System.out.println("Loading models...");
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:en-us-8khz");
configuration.setDictionaryPath("cmu07a.dic");
configuration.setGrammarPath("file:./");
configuration.setGrammarName("digits");
configuration.setUseGrammar(true);
StreamSpeechRecognizer recognizer =
new StreamSpeechRecognizer(configuration);
InputStream stream = new FileInputStream(new File("file.wav"));
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
System.out.format("Hypothesis: %s\n",
result.getHypothesis());
}
recognizer.stopRecognition();
}
}
You also need to make sure that both sample rate and audio bandwidth matches the decoder configuration
http://cmusphinx.sourceforge.net/wiki/faq#qwhat_is_sample_rate_and_how_does_it_affect_accuracy

First of all, Sphinx only work with WAVE file. For very limited vocabulary, Sphinx should generate good result when using a JSGF grammar file (but not that good in dictation mode). The main issue I found is that it does not provide confidence score (it is currently bugged). You might want to check three other alternative:
SpeechRecognizer from Windows platform. It provide easy to use recognition with confidence score and support grammar. This is C#, but you could build a native wrapper or custom server.
Google Speech API is an online speech recognition engine, free up to 50 request per day. There is several API for this, but I like JARVIS. Be careful though, since there is no official support or documentation about this and Google might (and already have in the past) close this engine whenever they want. Of course, you will have some privacy issue (is it okay to send this audio data to a third party ?).
I recently came through ISpeech and got good result with it. It provides its own Java wrapper API, free for mobile app. Same privacy issue as Google API.
I myself choose to go with the first option and build a speech recognition service in a custom http server. I found it to be the most effective way to tackle speech recognition from Java until Sphinx scoring issue is fixed.

Related

CMUSphinx German command & control app, bad accuracy

I'm trying to implement a German command and control application with CMUSphinx and Java. So far, the application should recognize only a few words (numbers from 1 to 9, yes/no).
Unfortunately the accuracy is very bad. It seems, if a word is recognized correctly, it is only by chance.
Here is my java code so far (adapted from the tutorial):
public static void main(String[] args) throws IOException {
// Configuration Object
Configuration configuration = new Configuration();
// Set path to the acoustic model.
configuration.setAcousticModelPath("resource:/cmusphinx-de-voxforge-5.2");
// Set path to the dictionary.
configuration.setDictionaryPath("resource:/cmusphinx-voxforge-de.dic");
// use grammar
configuration.setGrammarPath("resource:/");
configuration.setGrammarName("dialog");
configuration.setUseGrammar(true);
LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
recognizer.startRecognition(true);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
System.out.format("Hypothesis: %s\n", result.getHypothesis());
}
recognizer.stopRecognition();
}
Here is my grammer file:
#JSGF V1.0;
grammar dialog;
public <digit> = 1 | 2 | 3 | 4 |5 | 6 | 7 | 8 | 9 | ja | nein;
I've downloaded the German acoustic model and dictionary from here: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/
Is there something obvious I'm missing here? Where is the problem?
Thanks in advance and kind regards.

I have tried to use pocketsphinx with Eng and German model and accuracy is good when it comes with predefined/limited set of phrases! You can forget about general things like "could you please find me a restaurant in the downtown".
To achieve good accuracy with a pocketshinx:
Check that your mic, audio device, file and everything are 16 kHz while general model is trained with such acoustic examples.
You should create your own limited dictionary you cannot use cmusphinx-voxforge-de.dic while accuracy is dramatically dropped.
You should create your own language model.
You can try to modify pronunciation files to fit your accent.
You can search for Jasper project on GitLab to see how it's implemented.
Also you can check the documentation

Well, accuracy is not great, probably the original database didn't have many examples like yours. Partially your dialect also contributes, Germans say 7 with z, not with s. Partially echo in your room contributes too. I am not sure how you recorded your audio, if you used some compression or codec in between it might also contribute to bad accuracy.
You might want to collect few hundred samples and perform MAP adaptation to improve the accuracy.

Programatically embed a video in a slideshow using Apache Open Office API

I want to create a plugin that adds a video on the current slide in an open instance of Open Office Impress by specifying the location of the video automatically. I have successfully added shapes to the slide. But I cannot find a way to embed a video.
Using the .uno:InsertAVMedia I can take user input to choose a file and it works. How do I want to specify the location of the file programmatically?
CONCLUSION:
This is not supported by the API. Images and audio can be inserted without user intervention but videos cannot be done this way. Hope this feature is released in subsequent versions.

You requested information about an extension, even though the code you are using is quite different, using a file stream reader and POI.
If you really do want to develop an extension, then start with one of the Java samples. An example that uses Impress is https://wiki.openoffice.org/wiki/File:SDraw.zip.
Inserting videos into an Impress presentation can be difficult. First be sure you can get it to work manually. The most obvious way to do that seems to be Insert -> Media -> Audio or Video. However many people use links to files instead of actually embedding the file. See also https://ask.libreoffice.org/en/question/1898/how-to-embed-video-into-impress-presentation/.
If embedding is working for your needs and you want to automate the embedding by using an extension (which seems to be what your question is asking), then there is a dispatcher method called InsertAVMedia that does this.
I do not know offhand what the parameters are for the call. See https://forum.openoffice.org/en/forum/viewtopic.php?f=20&t=61127 for how to look up parameters for dispatcher calls.
EDIT
Here is some Basic code that inserts a video.
sub insert_video
dim document as object
dim dispatcher as object
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
dispatcher.executeDispatch(document, ".uno:InsertAVMedia", "", 0, Array())
end sub
From looking at InsertAVMedia in sfx.sdi, it seems that this call does not take any parameters.
EDIT 2
Sorry but InsertVideo and InsertImage do not take parameters either. From svx.sdi it looks like the following calls take parameters of some sort: InsertGalleryPic, InsertGraphic, InsertObject, InsertPlugin, AVMediaToolBox.
However according to https://wiki.openoffice.org/wiki/Documentation/OOoAuthors_User_Manual/Getting_Started/Sometimes_the_macro_recorder_fails, it is not possible to specify a file for InsertObject. That documentation also mentions that you never know what will work until you try it.
InsertGraphic takes a FileName parameter, so I would think that should work.

It is possible to add an XPlayer on the current slide. It looks like this will allow you to play a video, and you can specify the file's URL automatically.
Here is an example using createPlayer: https://forum.openoffice.org/en/forum/viewtopic.php?f=20&t=57699.
EDIT:
This Basic code works on my system. To play the video, simply call the routine.
sub play_video
If Video_flag = 0 Then
video =converttoURL( _
"C:\Users\JimStandard\Downloads\H264_test1_Talkinghead_avi_480x360.avi")
Video_flag = 1
'for windows:
oManager = CreateUnoService("com.sun.star.media.Manager_DirectX")
' for Linux
' oManager = CreateUnoService("com.sun.star.media.Manager_GStreamer")
oPlayer = oManager.createPlayer( video )
' oPlayer.CreatePlayerwindow(array()) ' crashes?
'oPlayer.setRate(1.1)
oPlayer.setPlaybackLoop(False)
oPlayer.setMediaTime(0.0)
oPlayer.setVolumeDB(GetSoundVolume())
oPlayer.start() ' Lecture
Player_flag = 1
Else
oPlayer.start() ' Lecture
Player_flag = 1
End If
End Sub

Testing HLS using JMeter

I am using JMeter to test HLS playback from a Streaming Server. So, the first HTTP request is for a master manifest file(m3u8). Say,
http://myserver/application1/subpath1/file1.m3u8
The reply to this will result in a playlist something like,
subsubFolder/360p/file1.m3u8
subsubFolder/480p/file1.m3u8
subsubFolder/720p/file1.m3u8
So, next set of URLs become
http://myserver/application1/subpath1/subsubFolder/360p/file1.m3u8
http://myserver/application1/subpath1/subsubFolder/480p/file1.m3u8
http://myserver/application1/subpath1/subsubFolder/720p/file1.m3u8
Now, individual reply to these further will be an index of chunks, like
0/file1.ts
1/file1.ts
2/file2.ts
3/file3.ts
Again, we have next set of URLs as
http://myserver/application1/subpath1/subsubFolder/360p/0/file1.ts
http://myserver/application1/subpath1/subsubFolder/360p/1/file1.ts
http://myserver/application1/subpath1/subsubFolder/360p/2/file1.ts
http://myserver/application1/subpath1/subsubFolder/360p/3/file1.ts
This is just the case of one set(360p). There will be 2 more sets like these(for 480p, 720p).
I hope the requirement statement is clear uptill this.
Now, the problem statement.
Using http://myserver/application1 as static part, regex(.+?).m3u8 is applied at 1st reply which gives subpath1/subsubFolder/360p/file1. This, is then added to the static part again, to get http://myserver/application1/subpath1/subsubFolder/360p/file1 + .m3u8
The problem comes at the next stage. As, you can see, with parts extracted previously, all I'm getting is
http://myserver/application1/subpath1/subsubFolder/360p/file1/0/file1.ts
The problem is obvious, an extra file1, 360p/file1 in place of 360p/0.
Any suggestions, inputs or alternate approaches appreciated.

If I understood the problem correctly, all you need is the file name as the other URLs can be constructed with it. Rather than using http://myserver/application1 as static part of your regex, I would try to get the filename directly:
([^\/.]+)\.m3u8$
# match one or more characters that are not a forward slash or a period
# followed by a period
# followed by the file extension (m3u8)
# anchor the whole match to the end
Now consider your urls, e.g. http://myserver/application1/subpath1/subsubFolder/360p/file1.m3u8, the above regex will capture file1, see a working demo here. Now you can construct the other URLs, e.g. (pseudo code):
http://myserver/application1/subpath1/subsubFolder/360p/ + filename + .m3u8
http://myserver/application1/subpath1/subsubFolder/360p/ + filename + /0/ + filename + .ts
Is this what you were after?

Make sure you use:
(.*?) - as Regular Expression (change plus to asterisk in your regex)
-1 - as Match No.
$1$- as template
See How to Load Test HTTP Live Media Streaming (HLS) with JMeter article for detailed instructions.

If you are ready to pay for a commercial plugin, then there is an easy and much more realistic solution which is a plugin for Apache JMeter provided by UbikLoadPack:
Besides doing this job for you, it will simulate the way a player would read the file. It will also scale much better than any custom script or player solution.
It supports VOD and Live which are quite difficult to script.
See:
http://www.ubik-ingenierie.com/blog/easy-and-realistic-load-testing-of-http-live-streaming-hls-with-apache-jmeter/
http://www.ubik-ingenierie.com/blog/ubikloadpack-http-live-streaming-plugin-jmeter-videostreaming-mpegdash/
Disclaimer, we are the providers of this solution

Naive Bayes Text Classification Algorithm

Hye there! I just need the help for implementing Naive Bayes Text Classification Algorithm in Java to just test my Data Set for research purposes. It is compulsory to implement the algorithm in Java; rather using Weka or Rapid Miner tools to get the results!
My Data Set has the following type of Data:
Doc Words Category
Means that I have the Training Words and Categories for each training (String) known in advance. Some of the Data Set is given below:
Doc Words Category
Training
1 Integration Communities Process Oriented Structures...(more string) A
2 Integration Communities Process Oriented Structures...(more string) A
3 Theory Upper Bound Routing Estimate global routing...(more string) B
4 Hardware Design Functional Programming Perfect Match...(more string) C
.
.
.
Test
5 Methodology Toolkit Integrate Technological Organisational
6 This test contain string naive bayes test text text test
SO the Data Set comes from a MySQL DataBase and it may contain multiple training strings and test strings as well! The thing is I just need to implement Naive Bayes Text Classification Algorithm in Java.
The algorithm should follow the following example mentioned here Table 13.1
Source: Read here
The thing is that I can implement the algorithm in Java Code myself but i just need to know if it is possible that there exist some kind a Java library with source code documentation available to allow me to just test the results.
The problem is I just need the results for just one time only means its just a test for results.
So, come to the point can somebody tell me about any good java library that helps my code this algorithm in Java and that could made my dataset possible to process the results, or can somebody give me any good ideas how to do it easily...something good that can help me.
I will be thankful for your help.
Thanks in advance

As per your requirement, you can use the Machine learning library MLlib from apache. The MLlib is Spark’s scalable machine learning library consisting of common learning algorithms and utilities. There is also a java code template to implement the algorithm utilizing the library. So to begin with, you can:
Implement the java skeleton for the Naive Bayes provided on their site as given below.
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.mllib.classification.NaiveBayes;
import org.apache.spark.mllib.classification.NaiveBayesModel;
import org.apache.spark.mllib.regression.LabeledPoint;
import scala.Tuple2;
JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set
final NaiveBayesModel model = NaiveBayes.train(training.rdd(), 1.0);
JavaPairRDD<Double, Double> predictionAndLabel =
test.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
#Override public Tuple2<Double, Double> call(LabeledPoint p) {
return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
}
});
double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
#Override public Boolean call(Tuple2<Double, Double> pl) {
return pl._1().equals(pl._2());
}
}).count() / (double) test.count();
For testing your datasets, there is no best solution here than use the Spark SQL. MLlib fits into Spark's APIs perfectly. To start using it, I would recommend you to go through the MLlib API first, implementing the Algorithm according to your needs. This is pretty easy using the library.
For the next step to allow the processing of your datasets possible, just use the Spark SQL.
I will recommend you to stick to this. I too have hunted down multiple options before settling for this easy to use library and it's seamless support for inter-operations with some other technologies. I would have posted the complete code here to perfectly fit your answer. But I think you are good to go.

You can use the Weka Java API and include it in your project if you do not want to use the GUI.
Here's a link to the documentation to incorporate a classifier in your code:
https://weka.wikispaces.com/Use+WEKA+in+your+Java+code

Please take a look at the Bow toolkit.
It has a Gnu license and source code. Some of its code includes
Setting word vector weights according to Naive Bayes, TFIDF, and several other methods.
Performing test/train splits, and automatic classification tests.
It's not a Java library, but you could compile the C code to ensure that you Java had similar results for a given corpus.
I also spotted a decent Dr. Dobbs article that implements in Perl. Once again, not the desired Java, but will give you the one-time results that you are asking for.

Hi I thinks Spark would help you a lot:
http://spark.apache.org/docs/1.2.0/mllib-naive-bayes.html
you can even choose the language you think is the most appropriate to your needs Java / Python / Scala!

You may want to take a look at this.
https://mahout.apache.org/users/classification/bayesian.html

Please use scipy from python. There is already an implementation of what you need:
class sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)¶
scipy

You can use an algorithm platform like KNIME, it has variety of classification algorithms (Naive bayed included). You can run it with a GUI or Java API.

If you want to implement Naive Bayes Text Classification Algorithm in Java, then WEKA Java API will be a better solution. The data set should have to be in .arff format. Creating an .arff file from mySql database is very easy. Here is the attachment of the java code for the classifier a link of a sample .arff file.
Create a new Text document. Open it with Notepad. Copy and paste all the texts below the link. Save it as DataSet.arff. http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.arff
Download Weka Java API: http://www.java2s.com/Code/Jar/w/weka.htm
Code for the classifier:
public static void main(String[] args) {
try {
StringBuilder txtAreaShow = new StringBuilder();
//reads the arff file
BufferedReader breader = null;
breader = new BufferedReader(new FileReader("DataSet.arff"));
//if 40 attributes availabe then 39 will be the class index/attribuites(yes/no)
Instances train = new Instances(breader);
train.setClassIndex(train.numAttributes() - 1);
breader.close();
//
NaiveBayes nB = new NaiveBayes();
nB.buildClassifier(train);
Evaluation eval = new Evaluation(train);
eval.crossValidateModel(nB, train, 10, new Random(1));
System.out.println("Run Information\n=====================");
System.out.println("Scheme: " + train.getClass().getName());
System.out.println("Relation: ");
System.out.println("\nClassifier Model(full training set)\n===============================");
System.out.println(nB);
System.out.println(eval.toSummaryString("\nSummary Results\n==================", true));
System.out.println(eval.toClassDetailsString());
System.out.println(eval.toMatrixString());
//txtArea output
txtAreaShow.append("\n\n\n");
txtAreaShow.append("Run Information\n===================\n");
txtAreaShow.append("Scheme: " + train.getClass().getName());
txtAreaShow.append("\n\nClassifier Model(full training set)"
+ "\n======================================\n");
txtAreaShow.append("" + nB);
txtAreaShow.append(eval.toSummaryString("\n\nSummary Results\n==================\n", true));
txtAreaShow.append(eval.toClassDetailsString());
txtAreaShow.append(eval.toMatrixString());
txtAreaShow.append("\n\n\n");
System.out.println(txtAreaShow.toString());
} catch (FileNotFoundException ex) {
System.err.println("File not found");
System.exit(1);
} catch (IOException ex) {
System.err.println("Invalid input or output.");
System.exit(1);
} catch (Exception ex) {
System.err.println("Exception occured!");
System.exit(1);
}

You can take a look at Blayze - It's a pretty minimal Naive Bayes library for the JVM written in Kotlin. Should be easy to follow.
Full disclosure: I'm one of the authors of Blayze

sphinx speech recognition delay

I am using the open source sphinx sdk to do some voice recognition. I am currently running the HelloWorld example. However response is very sluggish, it takes several attempts to recognize a word, and sometimes it recognizes it but takes a little to output what I have said. Any ideas how to improve this? Also when I change the grammer file it doesn't update and recognize my new words.
Thanks

Basically you can use Sphinx in several configurations. If you know the pattern of the voice that you have to recognize then you can use the configuration with custom grammar.
In that configuration its having higher response rate than normal configuration, since it only listen for predefine words with pre-define pattern. (a Grammar)
You can define your own grammar file by following the JSGF standards. (more)
Sample Configuration
Configuration configuration = new Configuration();
configuration.setAcousticModelPath(ACOUSTIC_MODEL);
configuration.setDictionaryPath(DICTIONARY_PATH);
configuration.setGrammarPath(GRAMMAR_PATH);
configuration.setUseGrammar(true);
configuration.setGrammarName("mygrammar");
LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
Sample Grammar File
#JSGF V1.0;
grammar mygrammar;
public <COMMON_COMMAND> = [please] turn (on | off) lighs;

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.