I can already get an array of all the bytes of a WAV file. I just want to know how to decode the raw sound data into something I can use to tell when the singer is singing or where the beat falls (I don't know the proper musical terms, sorry).
If there is an API or tutorial out there that someone could link me to, that would be swell since I can't seem to find anything good.
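For the decoding step, here is a minimal sketch. It assumes the bytes are 16-bit, little-endian, mono PCM with the WAV header already stripped (for example by reading through `javax.sound.sampled.AudioSystem.getAudioInputStream`). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecoder {
    /** Converts 16-bit little-endian mono PCM bytes to samples in [-1, 1). */
    static double[] decode16BitLittleEndian(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            // Each sample is a signed 16-bit value; scale it to a double.
            samples[i] = buf.getShort() / 32768.0;
        }
        return samples;
    }
}
```

Other formats (8-bit, 24-bit, stereo, big-endian) need different handling, so check the file's actual format before relying on this.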
Will you know this beat in advance? If so, you could cross-correlate the two signals; the highest peak in the output would correspond to the time delay at which the beat occurs.
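A rough sketch of that cross-correlation idea, assuming both the recording and the known beat are already decoded into `double` samples at the same sample rate. `CrossCorrelation` and `bestLag` are hypothetical names, and this is the direct O(N·M) form rather than a faster FFT-based one.

```java
final class CrossCorrelation {
    /** Returns the lag (in samples) at which template best matches signal. */
    static int bestLag(double[] signal, double[] template) {
        int bestLag = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        // Slide the template along the signal and score each alignment.
        for (int lag = 0; lag + template.length <= signal.length; lag++) {
            double score = 0.0;
            for (int i = 0; i < template.length; i++) {
                score += signal[lag + i] * template[i];
            }
            if (score > bestScore) {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag;
    }
}
```

Dividing the returned lag by the sample rate gives the delay in seconds. If the signal is shorter than the template, this sketch simply returns 0.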
Other than that, depending on the sound before the beat starts, you could convert to frequency domain (via FFT) and have a look at what frequencies are present and see whether there's a significant change when the beat begins.
Some examples/extra detail would help.
If you're trying to detect the tempo of said beat, please ignore most of what I've said.
In general, detecting "the instances when something beats" in a wave file is not as simple as one may imagine at first thought.
A possible first step is to transform your .wav into a so-called "spectrogram."
I don't think Java has a dedicated API for this purpose, but googling "java spectrogram" would give you a number of third-party examples.
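To give an idea of what a spectrogram computation involves, here is a small sketch, not a production implementation. It uses a plain (slow) DFT with a Hann window on each frame, and `Spectrogram.compute`, `frameSize` and `hop` are illustrative names. A real program would likely use an FFT library instead.

```java
final class Spectrogram {
    /** Magnitude spectrogram: result[frame][bin], bins 0..frameSize/2. */
    static double[][] compute(double[] samples, int frameSize, int hop) {
        int frames = samples.length < frameSize ? 0 : 1 + (samples.length - frameSize) / hop;
        double[][] out = new double[frames][frameSize / 2 + 1];
        for (int f = 0; f < frames; f++) {
            int start = f * hop;
            for (int k = 0; k <= frameSize / 2; k++) {
                double re = 0.0, im = 0.0;
                for (int n = 0; n < frameSize; n++) {
                    // Hann window reduces leakage caused by cutting the signal into frames.
                    double w = 0.5 - 0.5 * Math.cos(2 * Math.PI * n / (frameSize - 1));
                    double angle = 2 * Math.PI * k * n / frameSize;
                    re += w * samples[start + n] * Math.cos(angle);
                    im -= w * samples[start + n] * Math.sin(angle);
                }
                out[f][k] = Math.hypot(re, im);
            }
        }
        return out;
    }
}
```

Bin `k` corresponds to the frequency `k * sampleRate / frameSize`, and frame `f` starts at sample `f * hop`.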
I also found this question, which might be relevant.
P.S. I'm not a specialist in signal processing, so corrections are welcome.
Related
I am a musician/singer/songwriter,
I was hoping someone might know of information already out that does some if not all of what I'm trying to achieve.
I record song ideas into raw digital wav files using only my voice to emulate instruments ( vocal melody, bass, guitar, drums, etc.) into a song structure (verse, chorus, bridge).
I was hoping that Java/FFT could be used to slice each millisecond into an array that could be broken down into the notes and riffs that I am singing.
Here is a list of some of the steps I see that need to be done with my wav files.
Find out the note that I'm singing. The software would take each note and nudge it into the nearest "true note" (A4 = 440 Hz).
It would take the notes and find out which key or possible keys the song may be in.
From a very large database of real songs, the software would make chord suggestions and placement suggestions depending on the genre the song is in.
It would take the riffs (any sequence of more than 3 notes done more than 3 times in a song) and create loops with a drop-down box of alternative voicings and randomizing.
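The first step in the list above (nudging a detected pitch to the nearest "true note") is simple arithmetic once a frequency estimate exists. Here is a sketch assuming 12-tone equal temperament with A4 = 440 Hz mapped to MIDI note 69. `NoteSnapper` and its methods are made-up names, and the frequency is assumed to be in the audible range.

```java
final class NoteSnapper {
    private static final String[] NAMES =
        {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

    /** Nearest MIDI note number for a frequency, with A4 = 440 Hz = MIDI 69. */
    static int nearestMidi(double hz) {
        return (int) Math.round(69 + 12 * Math.log(hz / 440.0) / Math.log(2));
    }

    /** Exact equal-tempered frequency of a MIDI note number. */
    static double midiToHz(int midi) {
        return 440.0 * Math.pow(2.0, (midi - 69) / 12.0);
    }

    /** Note name with octave, e.g. 69 becomes "A4" (non-negative MIDI numbers only). */
    static String name(int midi) {
        return NAMES[midi % 12] + (midi / 12 - 1);
    }
}
```

For example, a slightly sharp 445 Hz snaps to MIDI 69, which `name` reports as A4 and `midiToHz` maps back to 440 Hz.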
There is much more, but this should show you the basics of what I’m trying to do.
If there aren’t programs already written that already do all or part of this, would it be possible for me to write a program that uses java and fft to slice every millisecond into an array to determine notes?
I have read some java/fft material and it is way over my head mostly (I have studied a little java) but I was hoping someone might be able to lead me in the right direction.
I would like to do some audio and video analysis in Java.
In a bit more detail, I would like to identify the points in audio/video that have either been monotonous for quite some time or have drastically changed compared to some previous state.
If you want to look at it in a mathematical way, I can try to explain it like this:
Example:
You have an audio file. You should extract the waveform of that
audio file. You could try to approximate that waveform with some
simpler function, that can be expressed as a closed formula. Let's
call that function f(t).
Now, to find out how your function behaves (whether it is increasing or decreasing) at some point or interval, I guess I could use the first derivative, f'(t). If I'd like even more information, I assume the second derivative, f''(t), would also come in handy.
So, if we assume we can do that then I guess I'd have 1 piece of information about the audio.
However, if I'm not mistaken, audio files can also have spectrograms, so I'm unsure how they fall into all of this.
So, the real question goes here: Is there a way to do this in Java (efficiently)? I've been doing some digging and I've found MusicG, however, the last update date is July 2012, which leads me to believe this may be abandoned.
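One simple way to approach the "monotonous versus drastically changed" question for audio, without any library, is to look at how the loudness of short frames changes over time. This is a swapped-in, much cruder technique than fitting a closed-form f(t), and the names `EnergyChange`, `frameRms` and `changePoints` as well as the threshold are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class EnergyChange {
    /** Root-mean-square level of each non-overlapping frame. */
    static double[] frameRms(double[] samples, int frameSize) {
        double[] rms = new double[samples.length / frameSize];
        for (int f = 0; f < rms.length; f++) {
            double sum = 0.0;
            for (int i = 0; i < frameSize; i++) {
                double s = samples[f * frameSize + i];
                sum += s * s;
            }
            rms[f] = Math.sqrt(sum / frameSize);
        }
        return rms;
    }

    /** Indices of frames whose RMS differs from the previous frame by more than threshold. */
    static List<Integer> changePoints(double[] rms, double threshold) {
        List<Integer> points = new ArrayList<>();
        for (int f = 1; f < rms.length; f++) {
            // A discrete first difference plays the role of f'(t) here.
            if (Math.abs(rms[f] - rms[f - 1]) > threshold) {
                points.add(f);
            }
        }
        return points;
    }
}
```

Long runs of frames with no change points would be the "monotonous" stretches. The same differencing could be applied per frequency band of a spectrogram instead of to overall loudness.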
The second part refers to video files, but without their audio component.
This is where I'll have more questions, so I'm just gonna go and shoot them.
How do you identify points of change in "pace" in videos?
Here's an example:
Imagine the video shows car driver's point of view while he's driving
on a long, straight road. Since the surroundings are mostly the same,
the pace could be described as "not changing much". At one point, the
road begins to curve but the driver, due to him falling asleep, is not
following the road that precisely, so the surroundings start to change
somewhat, and so does the pace. At the apex of that curve there is a
tree, which grows bigger and bigger as the car is approaching it.
Here, the POV (and the pace) is changing quite a lot, since the tree
is getting bigger and bigger. In the end, the car crashes into the tree,
all hell breaks loose, the car starts to roll uncontrollably, which
indicates a really intense pace.
I'm assuming one way could be to do an image segmentation and somehow determine which portions of the frames are changing, and how big are those portions to try to determine pace, but I'd like additional input.
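Before going as far as segmentation, a much simpler baseline for "pace" is plain frame differencing: measure how much the pixels change between consecutive frames. The sketch below assumes each frame is already available as an array of grayscale values in 0..255, however you decode the video. `FrameDifference` is a hypothetical name, not part of JavaCV or any other library.

```java
final class FrameDifference {
    /** Mean absolute difference between two equally sized grayscale frames (values 0..255). */
    static double meanAbsDiff(int[] previous, int[] current) {
        if (previous.length != current.length) {
            throw new IllegalArgumentException("frames must have the same size");
        }
        long sum = 0;
        for (int i = 0; i < current.length; i++) {
            sum += Math.abs(current[i] - previous[i]);
        }
        return (double) sum / current.length;
    }
}
```

Plotting this value over time would give a rough pace curve: near zero on the straight road, rising in the curve, and spiking during the crash. It cannot tell camera motion from scene motion, which is where segmentation or optical flow would come in.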
If anyone has had prior experience doing any sort of related work in Java, what approaches did you explore and/or use? One thing that immediately comes to my mind is JavaCV, but as I said, with my limited experience, I'm unsure what to actually try.
I pretty much have no idea how I would go about doing this- I need to find the frequency of a played sine wave at a specific point in time.
I've been doing some research and see that it involves a Fourier transform. I've found a DFT class but am not really sure how to use it.
My questions are these:
How do I save a sound file from a mic at a specific point in time?
What format will this be saved in?
From this, how would I go about using an FFT/DFT to find the frequencies in that sample?
I don't really have a clear understanding of the mathematics behind the FFT (and yes, I've tried to understand it), so am not really sure what outputs/inputs represent.
Alternatively if there are any libraries that support this type of thing, a link would be much appreciated.
Thanks in advance
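As a sketch of the last step (finding the frequency of a sine wave in a block of samples), the code below evaluates a plain DFT and reports the strongest bin. It assumes the samples are already decoded to `double` values; capturing them from the microphone would go through `javax.sound.sampled` (for instance `AudioSystem.getTargetDataLine`) and is not shown. `DominantFrequency.find` is an illustrative name, and a real program would use an FFT library for speed.

```java
final class DominantFrequency {
    /** Frequency in Hz of the strongest DFT bin (excluding DC) in the given samples. */
    static double find(double[] samples, double sampleRate) {
        int n = samples.length;
        int bestBin = 1;
        double bestPower = -1.0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(angle);
                im -= samples[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                bestBin = k;
            }
        }
        // Bin k corresponds to k * sampleRate / n Hz.
        return bestBin * sampleRate / n;
    }
}
```

The input is the block of samples; the output bins are spaced `sampleRate / n` Hz apart, so a longer block gives finer frequency resolution at the cost of coarser timing.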
So basically I want to detect the range of 60-150 Hz, which is the general area where the bass lies in a song. Whenever the sound is in this range, and only in this range, I want it to call a function. My problem is that I have tried to look up the functions needed to do so, but with no luck. If someone could show me here, or point me to a good article or explanation, that would be great! I appreciate all the help and I will continue looking on my own. If more explanation is needed, I can provide whatever information is needed!
Austin.
UPDATE: I simplified an algorithm here:
User selects the song they want
Song loads onto player
Function scans song and finds the lower frequencies throughout the song and the output is a pattern.
Step 1) Do a fast Fourier transform: http://en.wikipedia.org/wiki/Fast_Fourier_transform
An FFT takes a piece of sound and transforms it into the frequency domain, as in, which frequencies are playing and how intensely. Running it on successive short windows of the sound also tells you during what parts of the sound they occur. This is a useful mathematical operation that relies upon the property that all sound, no matter how complex, can be fundamentally constructed out of one or more sine waves of different frequencies and amplitudes.
If you've ever looked at a spectrogram, for example in foobar2000, it is implemented using an FFT.
I suggest instead of trying to implement FFT yourself you find a library that is well tested and fast, such as http://en.wikipedia.org/wiki/FFTW which is written in C
Step 2) Now that you've FFTed the part of the sound that the user is listening to, you can simply inspect the frequency bins and do whatever you want! Although detecting bass kicks is not as simple as 'is this frequency bin a high value?' because then you may mistake bass lines for bass kicks. You may need to do further testing and research to get it to work juuust right.
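To make "inspect the frequency bins" concrete for the 60-150 Hz range, here is a hedged sketch. Instead of a full FFT it uses the Goertzel recurrence, which computes the power of individual DFT bins and is a reasonable fit when only a handful of bins matter. `BassBand`, `binPower` and `bandPower` are made-up names, and any threshold on the result would have to be tuned by experiment.

```java
final class BassBand {
    /** Power of DFT bin k, computed with the Goertzel recurrence. */
    static double binPower(double[] x, int k) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * k / x.length);
        double s1 = 0.0, s2 = 0.0;
        for (double sample : x) {
            double s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** Sum of bin powers for all bins whose centre frequency lies in [lowHz, highHz]. */
    static double bandPower(double[] x, double sampleRate, double lowHz, double highHz) {
        int first = (int) Math.ceil(lowHz * x.length / sampleRate);
        int last = (int) Math.floor(highHz * x.length / sampleRate);
        double total = 0.0;
        for (int k = first; k <= last; k++) {
            total += binPower(x, k);
        }
        return total;
    }
}
```

Calling `bandPower(window, sampleRate, 60, 150)` on each window of the song gives a bass-energy value per window. As noted above, a high value alone cannot distinguish a sustained bass line from a kick, so a sudden rise from one window to the next is probably the more useful signal.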
EDIT: Delyan suggests http://www.clear.rice.edu/elec301/Projects01/beat_sync/beatalgo.html and it looks pretty good.
Is there a library for detecting the currently playing song with Java? Not only with WinAMP or WMP, but more generally, a technique to listen to the audio output, for example?
Thank you.
Edit: No, I just want to listen to the audio output and decide whether there's a song playing right now or not.
Not identifying. Just there's a song or not (playing right now).
You can take the Fourier transform of the audio input and compare it to known frequency distributions of various songs in your database. If they are close enough, you can say that they are the same.
This obviously has some flaws, but it's an idea you can work off of.
CLAMP is a good choice for WinAMP. People misunderstood the question so hard.
Two things you could try: 1. do an STFT (short-time Fourier transform) and look for harmonic relationships. These appear in ordinary speech as well, of course, so you'll have to experiment with what kinds of harmonic relationships exist in music vs speech. 2. analyse the envelope for repeating rhythmic patterns.
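For the second suggestion, one possible sketch is to autocorrelate the loudness envelope and look for the lag with the strongest match. It assumes the envelope has already been computed (for example one loudness value per short frame). `RhythmDetector`, `minLag` and `maxLag` are illustrative names, and deciding whether the peak is strong enough to count as "music" is left to experimentation.

```java
final class RhythmDetector {
    /** Lag (in envelope frames) with the strongest autocorrelation of the mean-removed envelope. */
    static int dominantPeriod(double[] envelope, int minLag, int maxLag) {
        double mean = 0.0;
        for (double v : envelope) {
            mean += v;
        }
        mean /= envelope.length;

        int bestLag = minLag;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag && lag < envelope.length; lag++) {
            double sum = 0.0;
            for (int i = 0; i + lag < envelope.length; i++) {
                sum += (envelope[i] - mean) * (envelope[i + lag] - mean);
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag;
    }
}
```

A clear peak at some lag suggests a repeating rhythmic pattern with that period. Speech or silence would tend to give a flat autocorrelation with no dominant lag.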