I want to calculate the beats per minute of an audio file in Android. My only clue so far is the Visualizer library, which creates a digital bar effect from an audio file's waveform; I could check for the beat with that. Is this the correct solution, or is there a proper way to do this? I want to categorize audio files in a proper way, according to the beats per minute in each file.
Any help would be appreciated.
Beats per minute can be calculated at several levels of sophistication. The simple energy calculator you are referring to, something like a sound level meter or a VAD (voice/audio activity detector), can be fairly simple to make. A proper pitch detector, on the other hand, is a complex process, and isolating the beat of a music segment is hard because the perception of a beat is itself complex.
If you are simply interested in an energy-based, beat-like feature, you can keep two running averages and check how large one is relative to the other.
Let X = [x1 … xn] be the input audio samples, and separate the input into smaller segments of, say, n = 100 samples.
Take the absolute value of this array, abs(X).
A simple one-pole smoothing filter can be made with:
X_filtered_long = X_filtered_long * (1 - alpha) + abs(X) * alpha // alpha is 0.02; the value depends on the sample rate, the signal, and the beat of interest
Create a second filtered signal:
X_filtered_short = X_filtered_short * (1 - beta) + abs(X) * beta // beta is 0.2
if (X_filtered_short > X_filtered_long) {
    detectedBeat = 1;
    insideBeat += 1;
} else {
    detectedBeat = 0;
    insideBeat = 0;
}
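Here is a minimal Java sketch of the same two-running-averages idea; the buffer size, alpha, and beta are just the values suggested above and would need tuning for your sample rate and material:

// A minimal sketch of the two-running-averages detector described above.
public class EnergyBeatDetector {
    private double longAvg;                 // slow envelope
    private double shortAvg;                // fast envelope
    private static final double ALPHA = 0.02;
    private static final double BETA = 0.2;

    // Feed one segment of samples (e.g. 100 of them); returns true while a beat is active.
    public boolean process(float[] segment) {
        double sum = 0.0;                   // mean absolute value stands in for abs(X)
        for (float s : segment) sum += Math.abs(s);
        double level = sum / segment.length;

        longAvg  = longAvg  * (1 - ALPHA) + level * ALPHA;
        shortAvg = shortAvg * (1 - BETA)  + level * BETA;
        return shortAvg > longAvg;
    }
}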
As for "I want to categorize audio files in a proper way, according to the beats per minute in a file": this can only be done reliably by fingerprinting the audio with parameters such as MFCCs.
A good reference would be: Scaringella, N.; Zoia, G.; Mlynek, D., "Automatic genre classification of music content: a survey", IEEE Signal Processing Magazine, vol. 23, no. 2.
What you're asking here is tremendously difficult.
Audio analysis to get the beats is usually done with complex mathematical manipulation of the audio data, by transforming the audio signal from the time-domain to the frequency-domain using signal-processing techniques. There are whole books dedicated to this subject.
Visualizers like the one you mention internally use many of these DSP techniques and it would probably be an even worse nightmare to analyze the visualizer data than the audio data.
Even if you manage to find a library that does this for you, the results would be very unreliable. Maybe for electronic music where the beats are more obvious you would get better results than for classical music or jazz.
The smallest unit of a digital image is a pixel.
What is the smallest unit of digital sound?
What can be considered the "pixel" of sound?
How can we use Java to manipulate it?
The smallest unit of sound is known as a frame. For 8-bit mono it is a single byte; for 16-bit stereo it is 4 bytes.
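If you are working with javax.sound.sampled, the AudioFormat class will report the frame size for you; a quick sketch:

import javax.sound.sampled.AudioFormat;

public class FrameSizeDemo {
    public static void main(String[] args) {
        // 8-bit mono: one byte per frame
        AudioFormat mono8 = new AudioFormat(44100f, 8, 1, false, false);
        // 16-bit stereo (CD quality): four bytes per frame
        AudioFormat stereo16 = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println(mono8.getFrameSize());    // prints 1
        System.out.println(stereo16.getFrameSize()); // prints 4
    }
}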
How can we use Java to manipulate it?
That depends on what you want to do with it. You will need to be a lot more specific to get reasonable answers.
Some possible operations are:
Volume change
Pan
Speeding or slowing the play rate, with or without pitch shift
Spectrum analysis
How many hertz or samples can the speaker produce?
That depends largely on the speaker. Speakers have all different types of dynamic ranges, usually in a kind of 'bell curve' with no absolute upper or lower limits.
Does that mean it takes 44KB to store 1 second of music that is CD Quality?
Each frame of CD-quality sound contains 4 bytes, given that it is 16-bit stereo. Multiply 4 bytes by 44,100 frames per second: 176,400 bytes, or roughly 172 KB per second of music, about four times your 44 KB estimate (44,100 is the number of frames, not bytes).
What's the difference between mono and stereo?
Mono has one channel, stereo has two.
What I want to do is manipulate individual units of sound and also - to create a custom musical instrument/synth.
It is not so hard to generate a simple sinusoidal sound in code. See Beeper for an example.
A lot of other effects can be created by playing around with the ADSR (Attack, Decay, Sustain, Release) envelope of a sound. For example, applying the ADSR envelope of a guitar note to a piano tone, will make it sound uncannily like a guitar, and vice versa.
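As a sketch of both ideas, here is a variant of the Beeper approach that shapes a sine tone with a crude linear ADSR envelope; the segment lengths and sustain level are arbitrary choices, not modeled on any real instrument:

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class AdsrBeeper {
    public static void main(String[] args) throws LineUnavailableException {
        float rate = 44100f;
        double freq = 440.0;                 // A4
        int n = (int) rate;                  // one second of samples
        byte[] pcm = new byte[n * 2];        // 16-bit mono, little-endian

        int attack = n / 20, decay = n / 10, release = n / 4;
        double sustain = 0.6;

        for (int i = 0; i < n; i++) {
            double env;                      // piecewise-linear ADSR envelope
            if (i < attack)              env = i / (double) attack;
            else if (i < attack + decay) env = 1.0 - (1.0 - sustain) * (i - attack) / decay;
            else if (i < n - release)    env = sustain;
            else                         env = sustain * (n - i) / (double) release;

            short s = (short) (env * Math.sin(2 * Math.PI * freq * i / rate) * 32767);
            pcm[2 * i]     = (byte) s;
            pcm[2 * i + 1] = (byte) (s >> 8);
        }

        AudioFormat fmt = new AudioFormat(rate, 16, 1, true, false);
        try (SourceDataLine line = AudioSystem.getSourceDataLine(fmt)) {
            line.open(fmt);
            line.start();
            line.write(pcm, 0, pcm.length);
            line.drain();
        }
    }
}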
What is channel? Is it like speaker - Left speaker is one channel and right speaker is another?
Pretty much. Mono sounds like rubbish (IMO), while stereo can make the different instruments sound like they are coming from different positions, just like if the band were sitting right in front of you.
5.1-channel sound is a little more complicated, and usually it "cheats" by simply:
Putting the left channel through the left speaker(s).
Putting the right channel through the right speaker(s).
Mixing them both equally and putting that through the center speaker.
Filtering for just the low frequency sound and putting that through the single woofer or bass speaker. The human ear cannot easily tell where low frequency sounds are coming from, so that is acceptable. The woofer can be placed anywhere in the room, and still sound just the same.
To be honest, I do not know of any sound format that actually stores 5 or 6 channels for the sound; I think it is all separated out (for the woofer) or mixed together (for the center speaker) in hardware at run time. Java Sound will only deal with one or two channels directly, in any case.
The smallest unit of digital sound is a sample -- the signal level at a particular point in time. [But see addendum below.]
To use Java to manipulate it: If you have to ask this question, you probably want to go looking for libraries someone else has written.
But if you want to know in general what's involved: Read in the sound file. If it was in a compressed format (such as MP3), unpack it. That will give you a very long array/vector of samples. You can cut-and-paste sections of that to edit the recording, or scale it to make it softer or louder (beware of "clipping", which results when you try to exceed the maximum volume). More complicated manipulations are possible, but that's a full course in digital signal processing which I'm not going to try to do here -- websearch that phrase, especially in conjunction with sound or audio or music should find more information.
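For example, the "scale it, beware of clipping" step for 16-bit samples might look like this:

// Scales 16-bit samples by gain, clamping to the legal range
// instead of letting the values wrap around (which sounds awful).
public static void applyGain(short[] samples, double gain) {
    for (int i = 0; i < samples.length; i++) {
        int v = (int) Math.round(samples[i] * gain);
        if (v > Short.MAX_VALUE) v = Short.MAX_VALUE; // clip high
        if (v < Short.MIN_VALUE) v = Short.MIN_VALUE; // clip low
        samples[i] = (short) v;
    }
}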
You can also generate your own audio by producing the samples programmatically. A signal which varies sinusoidally from sample to sample produces a pure tone. Other repeating shapes add overtones of various kinds. Varying the frequency of the repetition changes the pitch. Adding several signals together (while watching out for clipping) mixes them into a single signal. And so on.
Note that MIDI is not "digital sound" -- it's a digital score. It describes what notes should be played when, but it's up to the synth to turn that into sound.
ADDENDUM: I haven't heard the term "frame" before (see Andrew's answer), but I'll believe it. I think of samples because I'm thinking at the hardware layer, but distinguishing that from sample meaning an audio clip is a Good Thing so I'd bet frame is indeed more correct/current.
In Java you'd typically work with AudioInputStream instances (which you get from classes defined by the Java Sound API). Those are read byte-wise for playback.
I have never done manipulation myself, but as far as I know this is mostly done through Java Sound's mixer classes.
The tutorial below should have all the info you're looking for:
http://docs.oracle.com/javase/tutorial/sound/playing.html
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single Stack Overflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which are easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explanation, including code samples, check out this GameDev article.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignoring the decreases). This gives you a 1D time-varying function called the "spectral flux" (a sketch of this computation follows the list).
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
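Before the downsides, here is a Java sketch of step 2, assuming the per-frame magnitude spectra from the short-time FFT have already been computed (any FFT library will do):

// Spectral flux from magnitude spectra, magnitudes[frame][bin]:
// sum only the per-bin increases between consecutive frames.
public static double[] spectralFlux(double[][] magnitudes) {
    double[] flux = new double[magnitudes.length];
    for (int t = 1; t < magnitudes.length; t++) {
        double sum = 0.0;
        for (int k = 0; k < magnitudes[t].length; k++) {
            double diff = magnitudes[t][k] - magnitudes[t - 1][k];
            if (diff > 0) sum += diff;  // ignore the decreases
        }
        flux[t] = sum;
    }
    return flux;
}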
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL
There are many different algorithms that do this, but they are all fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (high-frequency content), or you can look for large changes in spectral content (spectral difference); there are a couple of papers further down that I recommend you look into. Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms you can use once you have that time localization of the beats. You can turn them into a pulse train (create a signal that is zero for all time and 1 only when a beat happens), then apply an FFT to that, and BAM: you have the frequency of onsets at the largest peak.
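As a rough sketch of the pulse-train idea (the 100 Hz pulse-train resolution and the 60-180 BPM search range are my own assumptions), you can evaluate the DFT directly at candidate tempo frequencies instead of calling an FFT library:

// Brute-force tempo estimate: build a pulse train from onset times
// (in seconds), then pick the candidate tempo with the largest DFT magnitude.
public static double estimateBpm(double[] onsetTimes, double duration) {
    int fs = 100;                                    // pulse-train resolution, Hz
    int n = (int) Math.ceil(duration * fs);
    double[] pulses = new double[n];
    for (double t : onsetTimes) {
        int idx = (int) Math.round(t * fs);
        if (idx >= 0 && idx < n) pulses[idx] = 1.0;
    }
    double bestBpm = 0, bestMag = -1;
    for (double bpm = 60; bpm <= 180; bpm += 0.5) {  // candidate tempo range
        double f = bpm / 60.0;                       // beats per second
        double re = 0, im = 0;
        for (int i = 0; i < n; i++) {
            if (pulses[i] == 0) continue;
            double phase = 2 * Math.PI * f * i / fs;
            re += Math.cos(phase);
            im -= Math.sin(phase);
        }
        double mag = Math.hypot(re, im);
        if (mag > bestMag) { bestMag = mag; bestBpm = bpm; }
    }
    return bestBpm;
}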
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned applying a machine learning algorithm: basically, collect a bunch of features from the onset detection functions (mentioned above), combine them with the raw signal in a neural network/logistic regression, and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures; there is actually an online distance course).
If you can manage to interface with python code in your project, Echo Nest Remix API is a pretty slick API for python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
Another edit: it is worth noting that lossy compression formats like MP3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
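That is only a couple of lines of arithmetic; a sketch (note that N taps give N - 1 intervals):

// BPM from a list of tap timestamps in milliseconds.
public static double tapTempo(long[] tapMillis) {
    if (tapMillis.length < 2) return 0;
    double minutes = (tapMillis[tapMillis.length - 1] - tapMillis[0]) / 60000.0;
    return (tapMillis.length - 1) / minutes;  // intervals per minute
}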
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them, it has a good reputation and it's written in C with a C++ wrapper so you can integrate it easily with a cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other. Once the similarity matrix is computed, you can take the average similarity between all pairs of samples separated by a given lag T: this is the beat spectrum. The first high peak in the beat spectrum is, most of the time, the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
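A Java sketch of the diagonal-averaging step, assuming the similarity matrix has already been computed (e.g. as cosine similarity between per-frame feature vectors):

// Beat spectrum from a precomputed similarity matrix sim[i][j]:
// average similarity over all frame pairs at each lag. The first
// strong peak at lag > 0 is usually the beat period in frames.
public static double[] beatSpectrum(double[][] sim) {
    int n = sim.length;
    double[] spectrum = new double[n];
    for (int lag = 0; lag < n; lag++) {
        double sum = 0.0;
        for (int i = 0; i + lag < n; i++) sum += sim[i][i + lag];
        spectrum[lag] = sum / (n - lag);  // normalize by the number of pairs
    }
    return spectrum;
}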
I'd imagine this will be easiest in 4/4 dance music, as there should be a single low-frequency thud about twice a second.
I am starting a project that would allow me to use Java to read sound samples and, depending on the properties of each sample (I'm thinking of focusing on decibels for the moment for the sake of simplification, or finding some way to compute the overall "volume" of a specific sample or set of samples), return a value from 0-255, where 0 would be silence and 255 would be the highest sound pressure (compared to a reference point, I suppose? I have no idea how to word this). I then want to have these values returned as bytes and sent to an Arduino in order to control the intensity of LEDs using PWM, and visually "see" the music.
I am not any sort of audio file format expert, and have no particular understanding of how the data is stored in a music file. As such, I am having trouble finding out how to read a sample and find a way to represent its overall volume level as a byte. I have looked through the javax.sound.sampled package and it is all very confusing to me. Any insight as to how I could accomplish this would be greatly appreciated.
First, I suggest you read about pulse-code modulation (PCM), which is the format used to store data in a .wav file (the simplest format to begin with).
Next, there is a post on how to get PCM data from a WAV file in Java here.
Finally, to get the "volume" (which is actually closer to the energy), apply the energy equation.
Hope this helps.
As Bastyen (+1 from me) indicates, calculating decibels is actually NOT simple, but requires looking at a large number of samples. However, since sound samples run MUCH more frequently than visual frames in an animation, making an aggregate measure works out rather neatly.
A nice visual animation rate, for example, updates 60 times per second, and the most common sampling rate for sound is 44100 times per second. So, 735 samples (44100 / 60 = 735) might end up being a good choice for interfacing with a visualizer.
By the way, of all the official Java tutorials I've read (I am a big fan), I have found the ones that accompany the javax.sound.sampled to be the most difficult. http://docs.oracle.com/javase/tutorial/sound/TOC.html
But they are still worth reading. If I were in charge of a rewrite, there would be many more code examples. Some of the best code examples are in several sections deep, e.g., the "Using Files and Format Converters" discussion.
If you don't wish to compute the RMS, a hack would be to store the local high and/or low value for the given number of samples. Relating these numbers to decibels would be dubious, but MAYBE they could be useful after you give them a mapping of your choice for the visualizer. Part of the problem is that values at a single point on a given wave can range wildly. The local high might be due more to the phases of the constituent harmonics happening to line up than to the energy or volume.
Your PCM top and bottom values would probably NOT be 0 and 256, more likely -128 to 127 for 8-bit encoding. More common still is 16-bit encoding (-32768 to 32767). But you will get the hang of this if you follow Bastyen's links. To make your code independent of the bit-encoding, you would likely normalize the data (convert to floats between -1 and 1) before doing any other calculations.
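Putting those pieces together, here is a sketch that normalizes one block of 16-bit samples, takes the RMS, and maps it onto the 0-255 range the Arduino expects (the 735-sample block size comes from the 44100 / 60 calculation above):

// Maps one block of 16-bit samples (e.g. 735 of them) to a 0-255 intensity.
public static int blockIntensity(short[] block) {
    double sumSquares = 0.0;
    for (short s : block) {
        double norm = s / 32768.0;       // normalize to [-1, 1)
        sumSquares += norm * norm;
    }
    double rms = Math.sqrt(sumSquares / block.length);
    return (int) Math.min(255, Math.round(rms * 255));
}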
First question: What would be the best language to create a video player in? Can anyone point me in the direction of a tutorial that would help me write said player?
Second question: How can I code such a player to echo an embed code for each video, à la YouTube/Break/Vimeo?
What's amazing to me is that I searched Google for a day and a half and haven't come close to finding an explanation of how to build a video player, let alone one with an option to spit out an embed code or any other sharing options.
Usage info: Once the player is finished it will be imported into wordpress, so I can have total control of each video and manage them accordingly. Not asking for help for importing to WP but any tips would be great.
{Please don't point me to VideoJS or any other video service, as I will make my own and not pay for a license.}
In general, a video player is a picture gallery where twenty-four (or more) pictures are displayed in order every second for the entire duration of the film. Twenty-four is roughly the lower limit at which a person visually confuses static pictures with motion; for better effect I would recommend thirty or more.
The second component of a video player is typically an audio player, which plays many "frames" of sound per second that blend, through the digital-to-analog playback system, into something resembling continuous sound.
Getting these two subsystems to operate without letting one get ahead of the other is generally required for a "video playback" system. There are many "already done" systems, but it sounds like you envision building your own (to add in unique "features").
Keep in mind that there are very large volumes of data moving around in "video playback". This means that if it is possible, compressing the data is vital for reasonable performance. Compression routines are not as simple as they seem, and the major video codecs are those that do a good job of balancing CPU cycles to decompress, file size, and resulting image quality.
Assuming you really don't want to write a video player, but just want to use someone else's video player "with enhancements", you will be at the mercy of how well built the existing video player is, whether or not it supports any kind of customization, and if it does, how well it supports the customization you have in mind.
Since speed is such a consideration, even though more advanced languages exist, traditionally these things are done in C, assembly, or even hardware acceleration chips.
These are my thoughts, although you should try to search a little harder... tutorials are very easy to find...
You could use Flash / ActionScript to create a custom video player. It's still common on the net, although more and more non-flash players are rising (HTML5). I still prefer Flash because of the performance, but keep in mind that iPhone / iPad doesn't support Flash...
If you are going to script your own videoplayer in Flash, this tutorial will set you off to create your own implementation...
For your second question:
Just create a database with a unique ID for every video URL your player will have. When you create the embed code you can include this unique ID as a URL var to the main video player.
From there on you can call your player page with URL vars (example: http://www.yourlink.com?videoid=ID).
When you embed your SWF object you can then pass the videoid along as a FlashVar, or prefetch the matching video URL and send that URL to your SWF with a FlashVar. It's not so complicated; more info can be found here.
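For the embed code itself, any string templating will do. A hypothetical Java sketch (the domain, markup, and player dimensions are placeholders, not a real service):

// Builds a Flash-era embed snippet for a given video id.
public static String embedCode(int videoId) {
    String playerUrl = "http://www.yourlink.com?videoid=" + videoId;
    return "<object width=\"640\" height=\"360\">"
         + "<param name=\"movie\" value=\"" + playerUrl + "\"/>"
         + "<embed src=\"" + playerUrl + "\" width=\"640\" height=\"360\"/>"
         + "</object>";
}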
Try osmf.org. You can either use the Strobe Media Playback with it or build your own player around it. OSMF is pretty robust.
I would like to write programs that play music (audio or MIDI, or even pure tones would be OK), but using threads: one thread plays a sound while another thread plays a different sound.
Level 1: a thread can play pure tones at different intensities and frequencies (to form a more complex envelope, creating the "timbre" of the sound).
Level 2: a group of level-1 threads could play different notes in a given timbre (to form chords from an instrument sound).
Level 3: a group of level-2 threads could play chords on different notes (to represent a musician).
Level 4: a group of level-3 threads could become an orchestra! =)
The hard part here, I think, is that I want to output different sounds at the same time. Pre-mixing would be the typical way, but if the mixing can be done live, it becomes much more interesting.
Any ideas, experiences, libraries or info would help, thanks in advance!
I don't think threads are what you want here; the synchronization would be too difficult. What you probably want to do (and what I did for a similar application years ago) is maintain a data structure of active notes (it could be implemented with class instances, or closures, or whatever works), and for each sample, call each item in the structure and sum the output (I'd recommend using signed 16-bit math at this point, so your values are in the range -32768 to +32767). To mix, just sum the various signals.
Something like the following:
# ts = a clock (e.g. in seconds), passed in to your calls for generation purposes
sample = sum([notefunc(ts) for notefunc in notes])
#Now convert the sample to whatever format needed for your media lib
#Update notes array
... and repeat that loop 44,100 times per second. Some sort of buffering would probably be needed; actual real time was tricky. Back when I was playing around with this stuff (around 2000, on a 233 MHz G3 PowerBook) I could get real time with one or two simple notes, but not more.
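For comparison, here is a rough Java Sound sketch of the same single-mix-loop idea; the note representation and buffer sizes are my own choices, not a definitive design:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleUnaryOperator;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;

public class MixLoop {
    static final float RATE = 44100f;
    // Notes are functions of time; other threads may add or remove them,
    // but the mixing itself stays in one loop.
    static final List<DoubleUnaryOperator> notes = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws Exception {
        notes.add(t -> 0.5 * Math.sin(2 * Math.PI * 440 * t)); // one example note

        AudioFormat fmt = new AudioFormat(RATE, 16, 1, true, false);
        try (SourceDataLine line = AudioSystem.getSourceDataLine(fmt)) {
            line.open(fmt, 4096);        // small buffer keeps latency down
            line.start();
            byte[] buf = new byte[512];
            long n = 0;                  // global sample clock
            while (true) {               // runs until the process is killed
                for (int i = 0; i < buf.length; i += 2, n++) {
                    double ts = n / RATE;
                    double sum = 0;
                    for (DoubleUnaryOperator note : notes) sum += note.applyAsDouble(ts);
                    // Clip rather than wrap on overflow
                    int v = (int) (Math.max(-1.0, Math.min(1.0, sum)) * 32767);
                    buf[i] = (byte) v;              // little-endian 16-bit
                    buf[i + 1] = (byte) (v >> 8);
                }
                line.write(buf, 0, buf.length);
            }
        }
    }
}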
You may want to have a look at the GStreamer framework. It allows you to model audio streams as "pipelines" composed of elements. Parallel elements will automatically be processed in different threads, and elements can be kept in sync using "clocks".
Have a look at the manual. The first 10 chapters will give you a good overview of the possibilities. (And it reads quickly.)
Looking at the list of plugins there seems to be some support for midi.
jMusic seems to have a comprehensive library. The links page on their site has further resources too.
[n.b. I haven't used this in anger; I looked at it some years back and went for a commercial package instead...]
hth, R
Here is an interesting blog that joins the music and software together. This page of the blog is dedicated to threading and lock free algorithms in musical software and there is a list of libraries. Also here is another list that you will be interested in.
Consider the JUCE library (http://www.rawmaterialsoftware.com/juce.php).
It's a cross-platform C++ library.
It has many different features (http://www.rawmaterialsoftware.com/jucefeatures.php) in addition to audio functionality:
Thread synchronization functions
GUI building and graphics features
Support for VST plugins
Midi support
Dual licensing (GPL 2.0 or a proprietary license) allows you to redistribute your work or write closed-source applications.
A lot of professional audio applications are written with this library, such as MAX/MSP (http://en.wikipedia.org/wiki/Max_%28software%29).
I would recommend JFugue.
I have used this library myself for programming music using multiple threads.
As an experiment, I have adapted an existing Piano module that is also using JFugue.