I'm working on a modified speech-to-text feature that should take in a user's speech and convert it to text, but I want the output text to be exactly what the user is saying. This means I want to detect word disfluencies such as stammers like “sstttop” and “pppplease”. I've already written a Java program that does the speech-to-text, but I need to know if it's possible to modify it to detect speech disfluency. Any input and help would be much appreciated.
I think it would be better to improve the structure of the text produced from a stammerer's speech.
My first guess would be that you would have to analyze the time that a user spends producing each specific sound. For example, one 's' could be the 's' sound held for half a second, whereas two 's's could be represented by the user producing the sound for one second. I understand that this is not completely accurate, but it's the best guess I can think of.
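A minimal sketch of that duration idea, assuming the recognizer can already report each sound together with how long it was held. The `DisfluencySketch` class, its parallel-array input and the 500 ms baseline are all hypothetical and only there for illustration; a real baseline would probably differ per sound and per speaker.

```java
final class DisfluencySketch {
    // Assumed baseline: how long one normally spoken sound lasts, in milliseconds.
    static final long TYPICAL_MS = 500;

    /** Estimates how many times a sound was "repeated" from how long it was held. */
    static int repeatCount(long durationMs) {
        return (int) Math.max(1, Math.round((double) durationMs / TYPICAL_MS));
    }

    /** Builds the literal transcript: each sound is written once per estimated repeat. */
    static String transcribe(String[] sounds, long[] durationsMs) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sounds.length; i++) {
            int repeats = repeatCount(durationsMs[i]);
            for (int r = 0; r < repeats; r++) {
                out.append(sounds[i]);
            }
        }
        return out.toString();
    }
}
```

With these assumptions, an 's' held for one second and a 't' held for one and a half seconds come out as "ss" and "ttt", while sounds close to the baseline are written once.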
Related
Hello Guys
I am developing a dictionary application where users can search Arabic <-> Turkish. I'm getting the data from Firebase, no problem there. In my algorithm, the user's keyboard language is read when the user taps the search view. If this language is Turkish, the text entered in the search view is treated as Turkish and the matching entries are listed in the RecyclerView. If the language is Arabic, I list the matches as Arabic. By the way, you can think of the data I list as key & value: the Turkish equivalent of each Arabic word is on the same line. So far the app works fine for me because I am using my phone's default keyboard and I can get the keyboard language.
The problem starts here;
I can't get the keyboard language when the user uses a custom keyboard published on the Play Store, and without it I can't list the results. I opened a thread on Stack Overflow but was told that I can't access the language of these custom keyboards in any way. So how can I work out whether the user is searching in Arabic or Turkish, without reading the keyboard language and without asking the user which language to search in? Thanks in advance and good work.
You will have to maintain the translations on your server. When the user searches in one language, the corresponding meaning in the other language should be searched as well; the corresponding meanings can be stored on the server (or on the client).
If you can't reliably get the keyboard's locale, that seems like a no-go for what you want to do. But even if someone's using a Turkish keyboard, that doesn't mean they're typing Turkish text, right? Since it basically covers the Latin character set, they could even be typing in romanised Arabic! (I don't know how likely that is, but it's possible.)
You might want to look into a library that detects languages. From a quick search there are a few, and ML Kit is a Google library that people seem to recommend for it.
I think whatever you do, you probably want the user to be able to set their input language explicitly: give them the final say (and responsibility for ensuring it's correct!). Similar to how Google Translate does it, you can type and it can guess what language you're using (and it says something like (automatically detected) next to it), but the user gets to explicitly choose.
Edit: since you really want this to be automatic (I'd really recommend giving the user control over this, just in case), could you do something like checking whether the characters they've entered are Arabic script?
That doesn't help with romanised Arabic (I don't know if that's really used much at all!), but if you can assume Arabic uses Arabic script and Turkish is anything else (or you could do the same with the Turkish characters), then maybe you could take a guess just by comparing their input to a set of potential characters. There might even be a convenient Unicode grouping you can check, but I'm not sure off-hand. Might be worth looking into.
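As a rough sketch of the character check: Java does expose a Unicode grouping for this, `Character.UnicodeScript`. The `ScriptGuess` class and `looksArabic` method names are made up for illustration, and the rule "any Arabic-script letter means an Arabic query" is an assumption that will misclassify romanised Arabic.

```java
final class ScriptGuess {
    /** Returns true if at least one letter in the query belongs to the Unicode Arabic script. */
    static boolean looksArabic(String query) {
        return query.codePoints()
                .filter(Character::isLetter) // ignore digits, spaces and punctuation
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC);
    }
}
```

The search code could then look up the Arabic column when `looksArabic(text)` is true and the Turkish column otherwise, still leaving room for an explicit user override.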
I saw a macro program that types words anywhere when the user presses a key on the keyboard (it writes words that the user typed in advance).
But when I studied programming, I only learned to print to the console or inside my own project, e.g. with "System.out" or "setText/print".
problem
I tried to get the cursor focus and write words, but I can't get any further.
Is it possible to write something from a Java project into another program?
I'm not sure whether I understood your question correctly, but you can 'inject' user inputs with the AWT Robot API:
https://docs.oracle.com/javase/7/docs/api/java/awt/Robot.html
Here's an example you can take a look at.
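A minimal sketch of typing text into whichever window has focus with `java.awt.Robot`. The `RobotTyper` class and its methods are hypothetical names, and the mapping only covers ASCII letters, digits and spaces; it relies on the `VK_A`..`VK_Z` and `VK_0`..`VK_9` key codes matching the ASCII codes of the upper-case letters and digits.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

final class RobotTyper {
    /** Maps a character to a virtual key code, or -1 if this sketch does not handle it. */
    static int keyCodeFor(char c) {
        if (c < 128 && Character.isLetterOrDigit(c)) {
            return Character.toUpperCase(c); // VK_A..VK_Z and VK_0..VK_9 equal their ASCII codes
        }
        if (c == ' ') {
            return KeyEvent.VK_SPACE;
        }
        return -1;
    }

    /** Sends the text as key presses to the currently focused window. */
    static void type(Robot robot, String text) {
        for (char c : text.toCharArray()) {
            int code = keyCodeFor(c);
            if (code < 0) {
                continue; // unsupported characters are skipped in this sketch
            }
            boolean shift = Character.isUpperCase(c);
            if (shift) robot.keyPress(KeyEvent.VK_SHIFT);
            robot.keyPress(code);
            robot.keyRelease(code);
            if (shift) robot.keyRelease(KeyEvent.VK_SHIFT);
        }
    }

    /** Waits three seconds so you can click into the target program, then types. */
    static void demo() throws AWTException {
        Robot robot = new Robot();
        robot.delay(3000);
        type(robot, "Hello from Java");
    }
}
```

Calling `RobotTyper.demo()` from a `main` method and clicking into a text editor during the delay should make the text appear there. `Robot` needs a desktop session, so this will not work in a headless environment.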
For Android, we have Google's voice input APIs where we speak into the microphone and it outputs the k most likely things that we said. Is there a way that, instead, we give voice input a set of valid "commands" or sentences and it outputs the most likely one?
For example, a list of valid commands would be:
"Play song"
"Pause"
"Next"
"Previous"
However, no other words in the language would be considered as options.
Does anyone know how this could be accomplished?
You can achieve that with CMUSphinx: the decoder can take a grammar specifying the language to recognize as a parameter:
c.setString("-jsgf","grammar.jsgf");
The grammar is specified in JSGF format.
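As an illustration, the command list from the question could be turned into a small JSGF grammar like this. The `CommandGrammar` helper and the rule name `<command>` are made up for this sketch; the words are lower-cased on the assumption that the recognizer's dictionary uses lower-case entries, and every word still has to exist in that dictionary.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class CommandGrammar {
    /** Builds JSGF text with one public rule that accepts exactly the given commands. */
    static String toJsgf(String grammarName, List<String> commands) {
        String alternatives = commands.stream()
                .map(c -> c.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(" | "));
        return "#JSGF V1.0;\n"
                + "grammar " + grammarName + ";\n"
                + "public <command> = " + alternatives + ";\n";
    }
}
```

For the four commands above this yields a rule of the form `public <command> = play song | pause | next | previous;`, which can be saved as grammar.jsgf and passed to the decoder.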
I have several devices that install as HID keyboard devices in most any operating system and, when used, send a string of text back, just like a keyboard. Is there any way in a Swing app to listen only to a chosen device, ignoring the standard keyboard, and do it without a TextComponent to capture the data? Thanks!
For anyone who comes across this via Google, etc., here is the solution I finally found.
(This is a solution to the second part of my question, how to capture input without a TextComponent).
I followed this tutorial and attached a KeyListener to my program. This allowed me to capture and parse input, albeit rather awkwardly. I have yet to find a smoother solution to this.
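A rough sketch of what such a listener could look like, not the code from the tutorial. It assumes the device ends each string with an Enter key, which is common for keyboard-emulating devices but not guaranteed; `ScanCapture` is a hypothetical name. Note that a `KeyListener` cannot tell which physical device a key came from, so this only covers capturing input without a text component.

```java
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import java.util.function.Consumer;

final class ScanCapture {
    private final StringBuilder buffer = new StringBuilder();
    private final Consumer<String> onLine;

    ScanCapture(Consumer<String> onLine) {
        this.onLine = onLine;
    }

    /** Feeds one typed character; Enter flushes the buffered text to the callback. */
    void accept(char c) {
        if (c == '\n' || c == '\r') {
            if (buffer.length() > 0) {
                onLine.accept(buffer.toString());
                buffer.setLength(0);
            }
        } else if (c != KeyEvent.CHAR_UNDEFINED) {
            buffer.append(c);
        }
    }

    /** Adapter that can be passed to addKeyListener on a focusable component. */
    KeyAdapter asKeyListener() {
        return new KeyAdapter() {
            @Override
            public void keyTyped(KeyEvent e) {
                accept(e.getKeyChar());
            }
        };
    }
}
```

Usage would be along the lines of `frame.addKeyListener(new ScanCapture(System.out::println).asKeyListener())`, with the frame (or another component) holding keyboard focus.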
I may come back and add code to this. Please leave a comment if I have not yet done so and you would find it helpful.
I have thousands of non-English songs as MP3s and thousands of PPTs, each displaying the lyrics of a song. I want to write a Java program that would match up the correct MP3 with its PPT.
I could think of 2 strategies. I have no idea if they're even close to being implementable:-
1. Extract lyrics from song via some sort of non-English voice recognition & compare it to lyrics extracted as text from PPTs.
2. Play the song to generate a waveform. Make a "Windows Narrator" kinda program read the lyrics from PPTs to get its waveform & then try to match the two.
Notes:-
1. None of the MP3s have embedded lyrics.
2. The lyrics in the PPTs were typed in, they aren't images, so no need to think OCR.
I want to know about any other strategies you guys could come up with and, most importantly, please write about any Java packages that could be of any help.
Is Fourier Transform involved in any way?
Another approach: write something to extract the text from PPT and put the text plus name of the PPT file it came from into a database, text file or really, anything searchable.
Write another little app to hand a user one song file at a time; give them a way to play bits of the song, a text box to type in lyrics, and a search button that'll search through your PPT lyrics for a match. When they find a match, another button records the name of the MP3 against the matching lyrics file in your database or other file.
Hire a couple of intelligent college/high school kids to do the listening/searching.
I'm betting the project would be finished in far less time and cost than what you're considering doing.
If you find some good open source software for this, please post it. I doubt such a thing exists.
Keep in mind that reading and singing would have quite different waveforms, not to speak of the music that would have to be filtered out and the differences between voices.
Additionally, keep in mind that you might have to perform some similarity calculations, since the sung text is not always identical to the written lyrics.
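One possible form of such a similarity calculation is an edit distance between the recognized text and the lyrics from the PPT. This is only a sketch using Levenshtein distance, which is a choice made here rather than something the thread prescribes; the `LyricSimilarity` name is hypothetical, and long lyrics would probably be compared word by word or in chunks instead.

```java
final class LyricSimilarity {
    /** Classic Levenshtein distance: minimum number of single-character edits from a to b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Normalizes the distance to a score between 0.0 (nothing in common) and 1.0 (identical). */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }
}
```

Each MP3's recognized text could then be matched to the PPT with the highest score, with a minimum threshold below which a pair is left for manual review.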
I'd say programming such a task is quite extensive and it would require a whole team to implement. Sure you can handle that?