I am a musician/singer/songwriter, and I was hoping someone might know of software already out there that does some, if not all, of what I'm trying to achieve.
I record song ideas into raw digital wav files using only my voice to emulate instruments (vocal melody, bass, guitar, drums, etc.), arranged into a song structure (verse, chorus, bridge).
I was hoping that Java and an FFT could be used to slice each millisecond into an array that could be broken down into the notes and riffs that I am singing.
Here is a list of some of the steps I see that need to be done with my wav files.
Find out the note that I'm singing. The software would take each note and nudge it to the nearest "true note" (A4 = 440 Hz); see the sketch after this list.
It would take the notes and find out which key or possible keys the song may be in.
From a very large database of real songs, the software would make chord suggestions and placement suggestions depending on the genre the song is in.
It would take the riffs (any sequence of more than 3 notes repeated more than 3 times in a song) and create loops, with a drop-down box of alternative voicings and randomization.
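To make the first step concrete, here is a minimal sketch of the "nudge to the nearest true note" idea, assuming you already have a detected frequency in Hz (e.g. from a pitch detector); in equal temperament every note sits a whole number of semitones from A4 = 440 Hz, so you round to the nearest semitone and convert back:

// Illustrative sketch: snap a detected frequency to the nearest
// equal-tempered note, relative to A4 = 440 Hz.
static double nearestTrueNote(double frequencyHz) {
    // Semitones away from A4, rounded to the nearest whole semitone
    long semitones = Math.round(12.0 * Math.log(frequencyHz / 440.0) / Math.log(2.0));
    return 440.0 * Math.pow(2.0, semitones / 12.0); // the snapped frequency
}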
There is much more, but this should show you the basics of what I’m trying to do.
If there aren't programs already written that do all or part of this, would it be possible for me to write a program that uses Java and an FFT to slice every millisecond into an array to determine notes?
I have read some Java/FFT material, and most of it is way over my head (I have studied a little Java), but I was hoping someone might be able to point me in the right direction.
I can get an array of all the bytes of a wav file; I just want to know how to decode the raw sound data into something I can use to tell when the singer is singing, and on what beat (I don't know the proper musical terms, sorry).
If there is an API or tutorial out there that someone could link me to, that would be swell since I can't seem to find anything good.
Will you know this beat in advance? If so, you could cross-correlate the two signals, and the highest peak in the output would correspond to the time delay.
Other than that, depending on the sound before the beat starts, you could convert to the frequency domain (via an FFT), look at which frequencies are present, and see whether there's a significant change when the beat begins.
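To make the cross-correlation idea concrete, here is a minimal brute-force sketch, assuming both the recording and the known beat template have already been decoded into double[] sample arrays (the names are illustrative):

// Slide the known beat template over the signal; the lag with the
// highest correlation is the estimated time delay.
static int bestLag(double[] signal, double[] template) {
    int bestLag = 0;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int lag = 0; lag <= signal.length - template.length; lag++) {
        double score = 0;
        for (int i = 0; i < template.length; i++) {
            score += signal[lag + i] * template[i];
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag; // delay in samples; divide by the sample rate for seconds
}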
Some examples/extra detail would help.
If you're trying to detect the tempo of said beat, please ignore most of what I've said.
In general, detecting "the instants when something beats" in a wave file is not as simple as one might first imagine.
A possible first step is to transform your .wav into a so-called "spectrogram."
I don't think Java has a dedicated API for this purpose, but googling "java spectrogram" would give you a number of third-party examples.
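To give a flavor of what a spectrogram involves, here is a sketch of computing a single column of one: the magnitude spectrum of one windowed frame of samples. It uses a naive DFT for clarity; a real program would swap in a fast FFT library (JTransforms is one third-party Java option):

// One spectrogram column: magnitudes of the frequencies in one frame.
// Naive O(n^2) DFT for readability; use an FFT library in practice.
static double[] magnitudeSpectrum(double[] frame) {
    int n = frame.length;
    double[] mags = new double[n / 2];
    for (int k = 0; k < n / 2; k++) {
        double re = 0, im = 0;
        for (int t = 0; t < n; t++) {
            double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * t / (n - 1)); // Hann window
            re += frame[t] * hann * Math.cos(2 * Math.PI * k * t / n);
            im -= frame[t] * hann * Math.sin(2 * Math.PI * k * t / n);
        }
        mags[k] = Math.sqrt(re * re + im * im);
    }
    return mags; // bin k corresponds to frequency k * sampleRate / n
}

Computing this for successive (usually overlapping) frames and stacking the columns gives the spectrogram.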
I also found this question might be relevant.
P.S. I'm not a specialist in signal processing, so corrections are welcome.
I've recently begun trying to create a mobile app (iOS/Android) that will automatically beat match (http://en.wikipedia.org/wiki/Beatmatching) two songs.
I know that this exists out there, and there have been others who have had some success, but I'm running into issues related to the accuracy of the players.
Specifically, I run into "sync" issues where the "beats" don't line up. The various methods used to date are:
Calculate the BPM in advance, identify a "beat" (using something like sonicapi.com), line the tracks up appropriately, and begin mixing in the second track with its playback rate adjusted (tempo adjustment)
Utilizing a bunch of meta data to trigger specific starts and stops
What does NOT work:
Leveraging echonest's API (it beat matches on the server, we want to do it on the client)
Something like pydub (does not do it in realtime)
Who uses this algorithm today:
iwebdj
Traktor
Does anyone have any suggestions on how to solve this problem? I've seen lots of people do it, but doing it in real time on a mobile device seems to be an issue.
There are lots of methods for solving this problem, some of which work better than others. Matthew Davies has published several papers on the matter, among many others. Glancing at this article, it seems to break down some of the steps necessary for doing this. I built a beat tracker in Matlab (unfortunately...) with a fellow student, and our goal was to create an outro/intro between 2 songs so that the tempo was seamless between them. We wanted to do this for songs whose BPM differed by a small amount (±7 or so BPM between the two). Our method went sort of like this:
Find two songs in our database that had an overlapping 'key center'. So let's say 2 songs, both in Am.
Find this particular overlap of key centers between the two: say, 30 seconds into song 1 and 60 seconds into song 2.
Now create a beat map, using an onset-detection algorithm with peak picking (see the sketch after this list); this was also helpful for us.
Pick the first 'beat' of each track, and overlap the two tracks at that point. Since their BPMs differ slightly, the beats won't really line up with each other.
From this, we created a sort of map that gave us the sample offsets between the beats of song A and the beats of song B. We then wanted to time-stretch the fade-in region of song B so that each of its onsets (beats, in this case) landed at the same sample index as the corresponding onset in song A's fade-out region. For example, if onset 2 of song B was 5,000 samples ahead of onset 2 of song A, we simply stretched that 5,000-sample region so that onset 2 matched exactly between both songs.
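Here is a rough sketch of the beat-map step referenced above: spectral-flux onset detection with simple peak picking. The spectra input (one magnitude spectrum per frame, e.g. from an FFT library) and the threshold parameter are illustrative assumptions:

import java.util.ArrayList;
import java.util.List;

// spectra[f] is the magnitude spectrum of frame f; threshold is tuned
// by experiment. Returns the frame indices of detected onsets.
static List<Integer> detectOnsets(double[][] spectra, double threshold) {
    double[] flux = new double[spectra.length];
    for (int f = 1; f < spectra.length; f++) {
        for (int k = 0; k < spectra[f].length; k++) {
            double diff = spectra[f][k] - spectra[f - 1][k];
            if (diff > 0) flux[f] += diff; // count only increases in energy
        }
    }
    List<Integer> onsets = new ArrayList<>();
    for (int f = 1; f < flux.length - 1; f++) {
        // A local maximum above the threshold is an onset candidate
        if (flux[f] > threshold && flux[f] > flux[f - 1] && flux[f] > flux[f + 1]) {
            onsets.add(f);
        }
    }
    return onsets; // multiply by the hop size to get sample offsets
}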
This seems like it would sound weird, but it actually sounded pretty good. Although this was done entirely offline in Matlab, I am also looking for a way to do this in real-time in a mobile app. Not entirely sure about libraries you can use for this in Android world, but I imagine that it would be most efficient in C++.
A couple of libraries I have come across would be good for prototyping something, or at least studying the source code to get a better understanding of how you could do this in a mobile app:
Essentia (great community, open-source)
Aubio (also seems to be maintained pretty well, open-source)
Additional things to read up on for doing this kind of stuff in iOS land:
vDSP Programming guide
This article may also help
I came across this project that does some beat detection. Although it unfortunately seems pretty outdated, it may offer some additional insights.
Unfortunately it isn't as simple as just 'pressing play' at the same time to align beats, unless you are assuming very specific aspects about them (exact tempos, etc.).
If you reallllly have some time on your hands, you should check out the thesis of Tristan Jehan (founder of Echonest); it is jam-packed with algorithms and methods for beat detection and the like.
So basically I want to get at the 60 - 150 Hz range, which is the general area where the bass in a song lies. Whenever the signal is in this range, I want to trigger a function, and only in that range. My problem is that I have tried to look up the functions needed to do this, but with no luck. If someone could show me here, or point me to a good article or explanation, that would be great! I appreciate all the help, and I will continue looking on my own. If more explanation is needed, I can provide whatever information is required!
Austin.
UPDATE: I simplified an algorithm here:
User selects the song they want
Song loads onto player
A function scans the song, finds the lower frequencies throughout it, and outputs the result as a pattern.
Step 1) Do a fast Fourier transform: http://en.wikipedia.org/wiki/Fast_Fourier_transform
An FFT takes a piece of sound and transforms it into the frequency domain: which frequencies are present, how intense they are, and (when applied to successive chunks) during which parts of the sound. This is a useful mathematical operation that relies on the property that all sound, no matter how complex, can fundamentally be constructed out of one or more sine waves of different frequencies and amplitudes.
If you've ever looked at a spectrogram, for example in foobar2000, it is implemented using an FFT.
I suggest that, instead of trying to implement the FFT yourself, you find a library that is well tested and fast, such as http://en.wikipedia.org/wiki/FFTW, which is written in C.
Step 2) Now that you've FFTed the part of the sound that the user is listening to, you can simply inspect the frequency bins and do whatever you want! Detecting bass kicks is not as simple as asking 'is this frequency bin a high value?', though, because you may then mistake bass lines for bass kicks. You may need to do further testing and research to get it to work juuust right.
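As a hedged starting point, here is a sketch of inspecting the bins for the 60 - 150 Hz band. Bin k of an N-point FFT at sample rate fs covers roughly frequency k * fs / N; the method and parameter names here are illustrative:

// Sum the magnitude in the 60-150 Hz band of one FFT frame. "mags" is
// the magnitude spectrum (e.g. from FFTW via JNI, or a Java FFT library).
static double bassEnergy(double[] mags, float sampleRate, int fftSize) {
    int lowBin = (int) Math.floor(60.0 * fftSize / sampleRate);
    int highBin = (int) Math.ceil(150.0 * fftSize / sampleRate);
    double energy = 0;
    for (int k = lowBin; k <= highBin && k < mags.length; k++) {
        energy += mags[k];
    }
    return energy; // compare against a running average to trigger your function
}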
EDIT: Delyan suggests http://www.clear.rice.edu/elec301/Projects01/beat_sync/beatalgo.html and it looks pretty good.
I am developing a desktop application in Java. This application is for teaching English to school kids: a user can upload some English audio, which may be in any format, and it needs to be converted into a text file so the kids can read the text.
I've found some APIs, but I am not sure about them.
http://cmusphinx.sourceforge.net/wiki/
I've seen many questions on Stack Overflow regarding this, but none were helpful. If someone can help with this, I will be very grateful.
Thank you.
There are many technologies and services available to perform speech recognition. For an intro to some of the choices see https://stackoverflow.com/a/6351055/90236.
I'm not sure that the results will be acceptable for teaching children English as a second language, but it is worth trying.
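For CMU Sphinx specifically, a minimal transcription sketch based on the Sphinx4 tutorial might look like the following (the model paths are the default US-English models bundled with the sphinx4-data artifact and may differ between versions; the input should be a 16 kHz mono WAV):

import java.io.FileInputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class Transcriber {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(config);
        recognizer.startRecognition(new FileInputStream("speech.wav"));
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis()); // recognized text
        }
        recognizer.stopRecognition();
    }
}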
What you seek is currently bleeding-edge technology. Tools like cmusphinx can detect words from a dedicated, limited dictionary (so you can teach it to understand, say, 15 words and that's it; you can't teach it to understand all of English).
Basically, those tools try to find patterns in the sound waves that you feed them. They don't understand anything; they just run the same algorithm on everything and try to find the closest match. This works well for small sets of words, but as the number of words increases, the differences between them shrink and the job gets ever harder (without even getting into words like whether and weather, or C and see).
What you might consider is "repeat after me" software. Here, you need to record all words for the test as templates. Then you can record the words from the pupils and then compute the difference. If the difference is not too large, the word is correct. But again: This is simple repetition to improve pronunciation - not English.
There is desktop software which can understand a lot of English (for example the products from Nuance, Dragon Naturally Speaking being one of the most prominent). They do offer server solutions but that software isn't free or cheap if you're on a tight budget.
I just want to play a very simple, straightforward note by giving my computer a certain frequency as an integer, and from there I can figure out how to make the note play longer or shorter. It does not necessarily have to come out of the actual sound card; if it's generated and output by the internal speaker, that's okay.
I looked at the MIDI libraries that Java includes, and they do way more than I want. This just needs to be very basic.
Look into JFugue -- it's really easy to do some basic stuff, and the capabilities are there if you want to expand later.
import org.jfugue.Player; // JFugue 4; in JFugue 5 the class is org.jfugue.player.Player

Player player = new Player();
player.play("A C# E");
This example constructs and plays an A major arpeggio (the notes A, C#, and E).
As far as I know, there is no way to do this without some boilerplate code in Java. The simplest API is probably provided by the Applet class (it can be used by non-Applets as well) in the form of the static newAudioClip(URL url) method. However, this only gives you the ability to play predefined audio clips, and you have very little control over the audio. If you just need to play audio from a small, predefined set of clips, it might suffice (you could have a set of wav files containing your notes and play them this way; AudioClips can be looped if desired).
Other than that, both the MIDI and sampled-audio APIs are much more powerful, but the flexibility comes at a price: you need considerably more code to set them up.
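If you do go the sampled-audio route, here is a minimal sketch of playing a sine tone at a given integer frequency with javax.sound.sampled (the format and buffer choices are just illustrative defaults):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class TonePlayer {
    // Plays a sine tone at the given frequency (Hz) for the given duration (ms).
    public static void playTone(int frequencyHz, int durationMs) throws LineUnavailableException {
        float sampleRate = 44100f;
        AudioFormat format = new AudioFormat(sampleRate, 8, 1, true, false); // 8-bit signed mono
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();

        int numSamples = (int) (durationMs * sampleRate / 1000);
        byte[] buffer = new byte[numSamples];
        for (int i = 0; i < numSamples; i++) {
            double angle = 2.0 * Math.PI * frequencyHz * i / sampleRate;
            buffer[i] = (byte) (Math.sin(angle) * 127);
        }
        line.write(buffer, 0, buffer.length);
        line.drain(); // wait until playback finishes
        line.close();
    }

    public static void main(String[] args) throws LineUnavailableException {
        playTone(440, 1000); // concert A for one second
    }
}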