I have thousands of non-English songs as MP3s and thousands of PPTs, each displaying the lyrics of one song. I want to write a Java program that matches each MP3 with its PPT.
I could think of two strategies, though I have no idea whether they are even close to implementable:-
1. Extract the lyrics from the song with some sort of non-English voice recognition and compare them with the lyrics extracted as text from the PPTs.
2. Play the song to generate a waveform. Have a "Windows Narrator"-style program read the lyrics from the PPTs to get their waveform, then try to match the two.
Notes:-
1. None of the MP3s have embedded lyrics.
2. The lyrics in the PPTs were typed in; they aren't images, so there is no need for OCR.
I'd like to hear about any other strategies you can come up with and, most importantly, about any Java packages that could help.
Is Fourier Transform involved in any way?
Another approach: write something to extract the text from PPT and put the text plus name of the PPT file it came from into a database, text file or really, anything searchable.
Write another little app to hand a user one song file at a time; give them a way to play bits of the song, a text box to type in lyrics, and a search button that'll search through your PPT lyrics for a match. When they find a match, another button records the name of the MP3 against the matching lyrics file in your database or other file.
Hire a couple of intelligent college/high school kids to do the listening/searching.
I'm betting the project would be finished in far less time and cost than what you're considering doing.
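The searchable store described above could start as small as an in-memory map. The sketch below is only an illustration: the class name `LyricsIndex` and the file name are hypothetical, and it assumes the lyrics text has already been pulled out of each PPT (for example with a library such as Apache POI, not shown here).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory index of lyrics text keyed by the PPT file it came from. */
class LyricsIndex {
    private final Map<String, String> textByFile = new LinkedHashMap<>();

    /** Lower-cases the text and turns runs of punctuation/whitespace into single spaces. */
    static String normalize(String s) {
        // \p{L}, \p{M}, \p{N} keep letters, combining marks and digits of any script.
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{M}\\p{N}]+", " ")
                .trim();
    }

    /** Stores the (already extracted) lyrics of one PPT file. */
    void add(String pptFileName, String lyrics) {
        textByFile.put(pptFileName, normalize(lyrics));
    }

    /** Returns the names of all PPT files whose lyrics contain the typed fragment. */
    List<String> search(String typedFragment) {
        String needle = normalize(typedFragment);
        List<String> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits;
        }
        for (Map.Entry<String, String> e : textByFile.entrySet()) {
            if (e.getValue().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LyricsIndex index = new LyricsIndex();
        index.add("song1.ppt", "Twinkle, twinkle little star!"); // made-up sample data
        System.out.println(index.search("twinkle LITTLE"));
    }
}
```

Normalizing both the stored lyrics and the typed fragment means the listener does not have to reproduce punctuation or capitalization exactly.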
If you find some good open source software for this, please post it. I doubt such a thing exists.
Keep in mind that reading and singing would have quite different wave forms - not to speak of the music that would have to be filtered out and the differences between voices.
Additionally, keep in mind that you might have to perform some similarity calculations, since the sung text is not always identical to the written lyrics.
I'd say programming such a task is quite extensive and it would require a whole team to implement. Sure you can handle that?
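As a rough idea of what such a similarity calculation could look like, here is a minimal sketch using word-level Jaccard similarity. This is one possible measure picked for illustration, not something the posts above prescribe; the class and method names are hypothetical, and it assumes a (possibly imperfect) text transcript of the song is already available.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper comparing a song transcript with written lyrics. */
class LyricsSimilarity {

    /** Splits text into a set of lower-cased words, ignoring punctuation. */
    static Set<String> words(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{M}\\p{N}]+", " ")
                .trim();
        Set<String> out = new HashSet<>();
        if (!cleaned.isEmpty()) {
            out.addAll(Arrays.asList(cleaned.split(" ")));
        }
        return out;
    }

    /** Jaccard similarity: size of the intersection over size of the union, in [0, 1]. */
    static double jaccard(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(wa);
        intersection.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return (double) intersection.size() / union.size();
    }

    /** Returns the file whose lyrics score highest against the transcript, or null if the map is empty. */
    static String bestMatch(String transcript, Map<String, String> lyricsByFile) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, String> e : lyricsByFile.entrySet()) {
            double score = jaccard(transcript, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Because the score tolerates missing or extra words, a transcript that only partly agrees with the typed lyrics can still point at the right file.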
Related
I’m working on a modified speech-to-text feature that should take in a user's speech and convert it to text, but I want the output text to be exactly what the user is saying. This means I want to detect word disfluencies such as stammers like “sstttop” and “pppplease”. I've already written a Java program that does the speech-to-text part, but I need to know whether it’s possible to modify it to detect speech disfluency. Any input and help would be much appreciated.
I think it's better to improve the structure of the text from the speech delivered by stammer
My first guess would be that you would have to analyze the time that a user spends producing each specific sound. For example, one 's' could be the 's' sound held for half a second, whereas two 's's could be represented by the user producing the sound for one second. I understand that this is not completely accurate, but it is the best guess I can think of.
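The duration idea can be put into a few lines of arithmetic. The sketch below assumes, hypothetically, that the recognizer can report how long each sound was held and that a typical duration per sound is known; neither is guaranteed by any particular speech library, and the names are made up.

```java
/** Hypothetical duration-based estimate of how often a sound was repeated. */
class StammerEstimator {

    /**
     * Estimates the number of repetitions as held duration divided by the
     * typical duration of a single occurrence, rounded, and at least 1.
     */
    static int estimateRepeats(double heldSeconds, double typicalSeconds) {
        if (typicalSeconds <= 0) {
            throw new IllegalArgumentException("typicalSeconds must be positive");
        }
        return Math.max(1, (int) Math.round(heldSeconds / typicalSeconds));
    }

    /** Renders the sound as repeated letters, e.g. for a transcript like "sstop". */
    static String render(char sound, double heldSeconds, double typicalSeconds) {
        return String.valueOf(sound).repeat(estimateRepeats(heldSeconds, typicalSeconds));
    }

    public static void main(String[] args) {
        // An 's' held for 1.0 s when a single 's' typically lasts 0.5 s.
        System.out.println(render('s', 1.0, 0.5)); // prints "ss"
    }
}
```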
I am trying to write a program that takes the source code from an HTML file and manipulates the data. So far, I have been able to write code that strips all the HTML markup and writes the output to a .txt file, which is great.
But I am struggling with the next part of my code. I am trying to model a class for entertainment review (the website I am taking info from is a blog about movie, play, and film reviews). I want to be able to make it organized so that the new .txt file will be sorted by the type of entertainment they watched (i.e. film, play, etc.), then display the cost of the movie and also show whether it was recommended or not. I have attached what some of the source code looks like below.
I really don't even know where to start here, or how to manipulate the data to be organized. I feel I am supposed to make a constructor that takes each line of data, but I could be wrong.
<p>Tonight I saw <em class="film">You Will Meet a Tall Dark Stranger,</em> a film by Woody Allen but without him anywhere in it. I'd say it's okay to see once, but not critical.... lots of intertwining relationship stories with endings you can never anticipate.
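One possible starting point, sketched under assumptions: in the sample above the entertainment type appears as the `class` attribute of the `<em>` tag and the title as its content, so a small `Review` class could be filled from those two pieces before the HTML is stripped. The class and its fields are hypothetical, the sample does not show how cost or recommendation are encoded, so they are left out, and a regular expression is a brittle way to read HTML compared with a real parser.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of one review: the entertainment type and the title. */
class Review {
    final String type;
    final String title;

    Review(String type, String title) {
        this.type = type;
        this.title = title;
    }

    // Matches <em class="TYPE">TITLE</em>, as in the sample line.
    private static final Pattern EM =
            Pattern.compile("<em\\s+class=\"([^\"]+)\">(.*?)</em>", Pattern.DOTALL);

    /** Extracts every tagged title from the HTML and sorts by type, then title. */
    static List<Review> parseAll(String html) {
        List<Review> out = new ArrayList<>();
        Matcher m = EM.matcher(html);
        while (m.find()) {
            // Drop trailing commas/periods that sit inside the tag.
            String title = m.group(2).trim().replaceAll("[,.]+$", "");
            out.add(new Review(m.group(1), title));
        }
        out.sort(Comparator.comparing((Review r) -> r.type).thenComparing(r -> r.title));
        return out;
    }

    @Override
    public String toString() {
        return type + ": " + title;
    }
}
```

Sorting the parsed list by `type` gives the grouping by film, play and so on; writing each `Review` to the .txt file is then a simple loop.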
I have a txt file with 200,000 lines. I want to show only the city name and country in an AutoCompleteTextView. My idea is to show only the alphabetic characters of each line. How can I do that?
Example of line: (4463523 Denver 35.531250 -81.029800 US).
In this example I want to show Denver and US.
Well, unfortunately, there is no good way other than reading the whole file in (using, for example, a Scanner). Then you can, for example, store the parts you want to access quickly in a HashMap. It depends on your specific use case and how much memory/CPU time you want to spend.
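A minimal sketch of the parsing step, assuming every line has the shape shown in the question (id, city name, latitude, longitude, country code, separated by whitespace, without the surrounding parentheses). The class name is hypothetical. Taking the last token as the country and everything between the id and the two coordinates as the name lets multi-word city names through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper turning "id name lat lon country" lines into "name, country" labels. */
class CityLines {

    /** Returns "City, CC" for a well-formed line, or null if the line has too few fields. */
    static String toLabel(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 5) {
            return null;
        }
        String country = parts[parts.length - 1];
        // Fields 1 .. length-4 make up the (possibly multi-word) city name.
        String name = String.join(" ", Arrays.copyOfRange(parts, 1, parts.length - 3));
        return name + ", " + country;
    }

    /** Converts all lines, skipping malformed ones. */
    static List<String> toLabels(List<String> lines) {
        List<String> labels = new ArrayList<>(lines.size());
        for (String line : lines) {
            String label = toLabel(line);
            if (label != null) {
                labels.add(label);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(toLabel("4463523 Denver 35.531250 -81.029800 US")); // prints "Denver, US"
    }
}
```

The resulting list of strings is what would then be handed to the adapter behind the AutoCompleteTextView.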
I am writing some information in a file and I want to update some part of this information. For instance:
Assume the current contents of the file are the following sentences:
This cake is made by Mary. 12
Students play football.12908
She is a teacher.546
Then I want to change 12908 to 765 in that file and write two new lines. The contents of the file after the changes would look like this:
This cake is made by Mary. 12
Students play football. 765
She is a teacher.546
I love my mother.
The sky is blue. 897
I want to update information many times in a file. How can I do this?
If we are talking about reading and writing only a few lines, you can use Java's RandomAccessFile class, which fits your needs exactly. From experience, though, I can tell you that it is very inefficient if you need to deal with a considerable amount of data I/O. In that case, as suggested by alfasin: "Start by reading the file line-by-line and writing each line (or corrected line) into another file, and take it from there".
Here's a tutorial on how to use RandomAccessFile.
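The line-by-line approach quoted above could look roughly like the following sketch. It uses `java.nio.file.Files` rather than RandomAccessFile, reads the whole file into memory for brevity (fine for small files, an assumption here), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that rewrites a text file with a replacement and extra lines. */
class FileUpdater {

    /**
     * Replaces every occurrence of oldText with newText, appends extraLines,
     * writes the result to a temporary file and then moves it over the original.
     */
    static void replaceAndAppend(Path file, String oldText, String newText,
                                 List<String> extraLines) throws IOException {
        List<String> updated = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            updated.add(line.replace(oldText, newText));
        }
        updated.addAll(extraLines);

        // Write next to the original so the final move stays on the same file system.
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "update", ".tmp");
        Files.write(tmp, updated, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling `replaceAndAppend(path, "12908", " 765", List.of("I love my mother.", "The sky is blue. 897"))` on the example file would produce the desired contents; the call can be repeated for each later update.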
I'm trying to write an application that adds some noises (or markers) to various parts of a video clip and triggers an action once a section (marker) has been reached.
I don't think technologies like audio steganography can help with this purpose. As far as I understand it, it hides a text value in unused sections of a WAV file and extracts it from the file again.
I also learned that frequencies below 20 Hz and above 20 kHz cannot be heard by the human ear. Using an audio analysis library like musicg gave me the idea of recognizing and encoding those frequencies with an algorithm like the FFT, and triggering an action based on that encoded inaudible frequency.
That is all I could find out after a week of investigation. Unfortunately, I don't know how to go further, and I would appreciate it if somebody who has had similar experience in this field could help me.
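For detecting a single known marker frequency, a full FFT is not strictly needed. The sketch below swaps in the Goertzel algorithm, which measures the strength of one frequency in a block of samples. It is only an illustration under assumptions: the class name, the 21 kHz marker and the 44.1 kHz sample rate are made up, the samples are assumed to be decoded already into a `double[]` in the range -1 to 1, and the audio track is assumed to preserve content that close to the Nyquist limit.

```java
/** Hypothetical single-frequency detector based on the Goertzel algorithm. */
class ToneDetector {

    /** Estimates the amplitude of the component at targetHz in the given samples. */
    static double amplitudeAt(double[] samples, double sampleRate, double targetHz) {
        double omega = 2.0 * Math.PI * targetHz / sampleRate;
        double coeff = 2.0 * Math.cos(omega);
        double s1 = 0.0; // filter state one step back
        double s2 = 0.0; // filter state two steps back
        for (double x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        double power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
        // Scale so that a pure sine of amplitude A yields roughly A.
        return 2.0 * Math.sqrt(Math.max(power, 0.0)) / samples.length;
    }

    /** Generates n samples of a unit-amplitude sine wave (used here as test input). */
    static double[] sine(double hz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.sin(2.0 * Math.PI * hz * i / sampleRate);
        }
        return out;
    }

    /** True if the marker frequency is present above the given amplitude threshold. */
    static boolean hasMarker(double[] samples, double sampleRate, double markerHz, double threshold) {
        return amplitudeAt(samples, sampleRate, markerHz) >= threshold;
    }

    public static void main(String[] args) {
        double[] block = sine(21000.0, 44100.0, 4410); // 0.1 s of a 21 kHz tone
        System.out.println(hasMarker(block, 44100.0, 21000.0, 0.5));
    }
}
```

Running the detector over successive short blocks of the clip's audio and triggering the action when `hasMarker` first returns true would give the marker behaviour described above.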