The smallest unit of a digital image is a pixel.
What is the smallest unit of digital sound?
What can be considered the "pixel" of sound?
How can we use Java to manipulate it?
The smallest unit of sound is known as a frame. For 8-bit mono it will be a single byte. For 16-bit stereo it will be 4 bytes.
How can we use Java to manipulate it?
That depends on what you want to do with it. You will need to be a lot more specific to get reasonable answers.
Some possible operations are:
Volume change
Pan
Speeding or slowing the play rate, with or without a corresponding
Pitch shift
Spectrum analysis
.. how many hertz or samples can the speaker produce?
That depends largely on the speaker. Speakers have all different types of dynamic ranges, usually in a kind of 'bell curve' with no absolute upper or lower limits.
Does that mean it takes 44KB to store 1 second of music that is CD Quality?
Each frame of CD-quality sound contains 4 bytes, given that it is stereo and 16-bit. Multiply 4 bytes by 44,100 frames per second: 4 × 44,100 = 176,400 bytes, so more like 176 KB per second than 44 KB.
What's the difference between mono and stereo?
Mono has one channel, stereo has two.
What I want to do is manipulate individual units of sound, and also to create a custom musical instrument/synth.
It is not so hard to generate a simple sinusoidal sound in code. See Beeper for an example.
A lot of other effects can be created by playing around with the ADSR (Attack, Decay, Sustain, Release) envelope of a sound. For example, applying the ADSR envelope of a guitar note to a piano tone, will make it sound uncannily like a guitar, and vice versa.
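For instance, here is a minimal, hypothetical sketch along the lines of the Beeper example: it generates one second of a 440 Hz sine tone, shapes it with a crude linear attack/release (a poor man's ADSR), and plays it through a SourceDataLine. The constants are illustrative only.

```java
import javax.sound.sampled.*;

public class ToneSketch {
    public static void main(String[] args) throws LineUnavailableException {
        float sampleRate = 44100f;
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, true); // 16-bit mono, signed, big-endian
        int frames = (int) sampleRate;              // 1 second of audio
        byte[] buffer = new byte[frames * 2];       // 2 bytes per 16-bit mono frame

        for (int i = 0; i < frames; i++) {
            double t = i / (double) sampleRate;
            double sample = Math.sin(2 * Math.PI * 440 * t);            // pure tone
            double env = Math.min(1.0, Math.min(i / 2000.0,             // ~45 ms attack
                                                (frames - i) / 8000.0)); // ~180 ms release
            short s = (short) (sample * env * Short.MAX_VALUE * 0.8);   // leave some headroom
            buffer[2 * i]     = (byte) (s >> 8);                        // high byte (big-endian)
            buffer[2 * i + 1] = (byte) s;                               // low byte
        }

        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();
        line.write(buffer, 0, buffer.length);
        line.drain();
        line.close();
    }
}
```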
What is channel? Is it like speaker - Left speaker is one channel and right speaker is another?
Pretty much. Mono sounds like rubbish (IMO), while stereo can make the different instruments sound like they are coming from different positions, just like if the band were sitting right in front of you.
5.1-channel sound is a little more complicated, and usually it 'cheats' by simply:
Putting the left channel through the left speaker(s).
Putting the right channel through the right speaker(s).
Mixing them both equally and putting that through the center speaker.
Filtering for just the low frequency sound and putting that through the single woofer or bass speaker. The human ear cannot easily tell where low frequency sounds are coming from, so that is acceptable. The woofer can be placed anywhere in the room, and still sound just the same.
To be honest, I do not know of any sound format that actually stores 5 or 6 channels; I think the audio is all separated out (for the woofer) or mixed together (for the center speaker) in hardware at run-time. Java Sound will only deal with one or two channels directly, in any case.
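For illustration only, a hypothetical helper showing the 'cheat' described above on stereo PCM floats in the range -1 to 1: an equal mix for the center speaker, and a crude one-pole low-pass standing in for the woofer's crossover filter (real decoders do this properly in hardware or dedicated DSP).

```java
class SurroundCheat {
    // Derive a center channel and a crude LFE signal from stereo PCM floats (-1..1).
    static float[][] deriveCenterAndLfe(float[] left, float[] right) {
        float[] center = new float[left.length];
        float[] lfe = new float[left.length];
        float smoothed = 0f;
        final float alpha = 0.01f;                    // heavy smoothing keeps only low frequencies
        for (int i = 0; i < left.length; i++) {
            float mix = 0.5f * (left[i] + right[i]);  // mix both channels equally for the center
            center[i] = mix;
            smoothed += alpha * (mix - smoothed);     // one-pole low-pass for the woofer
            lfe[i] = smoothed;
        }
        return new float[][] { center, lfe };
    }
}
```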
The smallest unit of digital sound is a sample -- the signal level at a particular point in time. [But see addendum below.]
To use Java to manipulate it: If you have to ask this question, you probably want to go looking for libraries someone else has written.
But if you want to know in general what's involved: Read in the sound file. If it was in a compressed format (such as MP3), unpack it. That will give you a very long array/vector of samples. You can cut-and-paste sections of that to edit the recording, or scale it to make it softer or louder (beware of "clipping", which results when you try to exceed the maximum volume). More complicated manipulations are possible, but that's a full course in digital signal processing which I'm not going to try to do here -- websearch that phrase, especially in conjunction with sound or audio or music should find more information.
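As a minimal sketch of the volume-scaling step, assuming the recording has already been decoded to normalized floats in the range -1 to 1 (the names are illustrative):

```java
class GainUtil {
    // Scale every sample by a gain factor, hard-limiting to avoid clipping.
    static void applyGain(float[] samples, float gain) {
        for (int i = 0; i < samples.length; i++) {
            float v = samples[i] * gain;
            if (v > 1f)  v = 1f;   // clip guard: never exceed full scale
            if (v < -1f) v = -1f;
            samples[i] = v;
        }
    }
}
```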
You can also generate your own audio by producing the samples programmatically. A signal which varies sinusoidally from sample to sample produces a pure tone. Other repeating shapes add overtones of various kinds. Varying the frequency of the repetition changes the pitch. Adding several signals together (while watching out for clipping) mixes them into a single signal. And so on.
Note that MIDI is not "digital sound" -- it's a digital score. It describes what notes should be played when, but it's up to the synth to turn that into sound.
ADDENDUM: I haven't heard the term "frame" before (see Andrew's answer), but I'll believe it. I think of samples because I'm thinking at the hardware layer, but distinguishing that from sample meaning an audio clip is a Good Thing so I'd bet frame is indeed more correct/current.
In Java you'd typically work with AudioInputStream instances (that you get out of classes defined by the Java Sound API). Those are read byte-wise for playback.
I have never done manipulation myself, but as far as I know, this is mostly done through Java Sound's Mixer class.
The tutorial below should have all the info you're looking for:
http://docs.oracle.com/javase/tutorial/sound/playing.html
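As a rough sketch of the pattern that tutorial describes ("sound.wav" is a placeholder file name): read an AudioInputStream and push the bytes to a SourceDataLine, manipulating them in between if desired.

```java
import javax.sound.sampled.*;
import java.io.File;

public class PlayWav {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("sound.wav"));
        AudioFormat format = in.getFormat();
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();

        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            // This is the point where the bytes could be inspected or manipulated
            // before being handed to the output line.
            line.write(buffer, 0, read);
        }
        line.drain();
        line.close();
        in.close();
    }
}
```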
Related
I am attempting to write a very simple DAW in Java but am having trouble playing an audio clip in a sequence. I have looked into both the sampled and MIDI classes in Java Sound but what I really need is a hybrid of the two.
It seems that with the MIDI classes you cannot use a sequencer for example, to play your own audio clip.
I have attempted to write my own sequencer using scheduling to play a javax.sound.sampled.Clip in a sequence but the timings vary far too much. It is not really a viable option as it doesn't keep time.
Does anybody have any suggestions of how I could get around this?
I can attest that an audio mixing system combining aspects of MIDI and samples can be written in Java, as I wrote my own and it currently works with samples and a couple real-time synths that I also wrote.
The key is making the audio data of the samples available on a per-frame basis and a frame-counting command-processor/audio-mixer that both manages the execution of "commands," and collects and mixes the audio frame data. With 44100 fps, that's accuracy in the vicinity of 0.02 milliseconds. I can describe in more detail if requested.
Another way to go, probably saner, though I haven't done it personally, would be to make use of a Java bridge to a system such as Jack.
EDIT: Answering questions in comment (12/8/19).
Audio sample data in Java is usually either held in memory (Java uses Clip) or read from a .wav file. Because the individual frames are not exposed by Clip, I wrote an alternate, and use it to hold the data as signed floats ranging -1 to 1. Signed floats are a common way to hold audio data that we are going to perform multiple operations upon.
For playback of .wav audio, Java combines reading the data with AudioInputStream and outputting with SourceDataLine. Your system will have to sit in the middle, intercepting the AudioInputStream, converting to PCM float frames, and counting the frames as you go.
A number of sources or tracks can be processed at the same time, and merged (simple addition of the normalized floats) to a single signal. This signal can be converted back to bytes and sent out for playback via a single SourceDataLine.
Counting output frames from an arbitrary 0th frame from the single SourceDataLine will help with keeping constituent incoming tracks coordinated, and will provide the frame number reference used to schedule any additional commands that you wish to execute prior to that frame being output (e.g., changing a volume/pan of a source, or a setting on a synth).
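Here is a minimal sketch of that per-frame mixing idea, assuming each source exposes one normalized mono float per frame (the Track interface here is hypothetical, not part of Java Sound):

```java
// Hypothetical source: each call returns the next normalized (-1..1) mono sample.
interface Track { float nextFrame(); }

class FrameMixer {
    // Sum one buffer's worth of frames from all tracks, clamp, and convert
    // to 16-bit little-endian PCM bytes ready for a SourceDataLine.
    static byte[] mixOneBuffer(Track[] tracks, int frames) {
        byte[] out = new byte[frames * 2];
        for (int f = 0; f < frames; f++) {
            float sum = 0f;
            for (Track t : tracks) sum += t.nextFrame();  // simple addition of normalized floats
            if (sum > 1f)  sum = 1f;                      // clamp to avoid overflow
            if (sum < -1f) sum = -1f;
            short s = (short) (sum * Short.MAX_VALUE);
            out[2 * f]     = (byte) s;                    // low byte (little-endian)
            out[2 * f + 1] = (byte) (s >> 8);             // high byte
        }
        return out;
    }
}
```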
My personal alternate to a Clip is very similar to AudioCue which you are welcome to inspect and use. The main difference is that for better or worse, I'm processing everything one frame at a time in my system, and AudioCue and its "Mixer" process buffer loads. I've had several very credible people criticize my personal per-frame system as inefficient, so when I made the public API for AudioCue, I bowed to that preconception. [There are ways to add buffering to a per-frame system to recapture that efficiency, and per-frame makes scheduling simpler. So I'm sticking with my per-frame logical scheme.]
No, you can't use a sequencer to play your own clips directly.
In the MIDI world, you have to deal with samples, instruments, and soundbanks.
Very quickly: a sample is the audio data plus information such as looping points, the note range covered by the sample, base volume, envelopes, etc.
An instrument is a set of samples, and a soundbank contains a set of instruments.
If you want to use your own sounds to play some music, you must make a soundbank out of them.
You will also need to use an implementation other than the default provided by Java, because that default only reads soundbanks in a proprietary format that has been gone for at least 15 and perhaps even 20 years.
Back in 2008-2009 there existed, for example, Gervill, which was able to read SF2 and DLS soundbanks. SF2 and DLS are two popular soundbank formats; several programs, free or paid, exist to edit them.
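If the synthesizer implementation supports it (Gervill was later bundled with the JDK and reads SF2/DLS), loading a custom soundbank looks roughly like this sketch; the file name is a placeholder:

```java
import javax.sound.midi.*;
import java.io.File;

public class LoadSoundbank {
    public static void main(String[] args) throws Exception {
        Synthesizer synth = MidiSystem.getSynthesizer();
        synth.open();
        Soundbank bank = MidiSystem.getSoundbank(new File("myInstruments.sf2"));
        if (synth.isSoundbankSupported(bank)) {
            synth.unloadAllInstruments(synth.getDefaultSoundbank());
            synth.loadAllInstruments(bank);   // replace the default instruments with yours
        }
        MidiChannel channel = synth.getChannels()[0];
        channel.noteOn(60, 93);               // middle C at moderate velocity
        Thread.sleep(1000);
        channel.noteOff(60);
        synth.close();
    }
}
```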
If you want to go the other way round, starting with sampled audio, then as you have noticed you can't rely on timers, task scheduling, Thread.sleep and the like for enough precision.
The best precision you can achieve with those is around 10 ms, which is of course far too coarse to be acceptable for music.
The usual way to go here is to generate the audio of your music by mixing your audio clips yourself into the final signal; that way you can achieve frame precision.
In fact, that is very roughly what a MIDI synthesizer does.
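A minimal sketch of that approach, with illustrative names: convert a musical start time to a frame offset and add the pre-decoded clip into a master mix buffer at that offset.

```java
class OfflineMixer {
    // Place a pre-decoded clip (normalized mono floats) into a master mix buffer
    // at a position derived from musical time, with frame precision.
    static void addClipAt(float[] master, float[] clip, double startSeconds, float sampleRate) {
        int startFrame = (int) Math.round(startSeconds * sampleRate);
        for (int i = 0; i < clip.length && startFrame + i < master.length; i++) {
            master[startFrame + i] += clip[i];   // mix by addition; clamp once, before export
        }
    }
}
```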
Android provides a default of 15 steps for its sound systems, which you can access through AudioManager. However, I would like to have finer control.
One method of doing so seems to be altering specific files within the Android system to divide the sound levels even further than the default. I would like to achieve the same effect programmatically using Java.
Fine Volume Control is an example of an app that divides the sound levels into one hundred distinct intervals. How do I achieve this?
One way, in Java, to get very precise volume adjustment is to access the PCM data directly and multiply it by some factor, usually from 0 up to 1. Another is to try and access the line's volume control, if it has one. I've given up trying to do the latter. The precision is okay in terms of amplitude, but the timing is terrible. One can only have one volume change per audio buffer read.
To access the PCM data directly, one has to iterate through the audio read buffer, translate the bytes into PCM, perform the multiplication then translate back to bytes. But this gives you per-frame control, so very smooth and fast fades can be made.
EDIT: To do this in Java, first check out the sample code snippet at the start of this java tutorial link, in particular, the section with the comment
// Here, do something useful with the audio data that's now in the audioBytes array...
There are several StackOverflow questions that show code for the math to convert audio bytes to PCM and back, using Java. Should not be hard to uncover with a search.
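For reference, a sketch of that round trip for 16-bit little-endian mono PCM, with a gain factor between 0 and 1 (names are illustrative); recomputing the factor every frame is what makes smooth, fast fades possible:

```java
class PcmGain {
    static void applyGain(byte[] audioBytes, double factor) {
        for (int i = 0; i < audioBytes.length; i += 2) {
            // reassemble the signed 16-bit sample (low byte first)
            int sample = (audioBytes[i] & 0xFF) | (audioBytes[i + 1] << 8);
            sample = (int) (sample * factor);              // scale
            audioBytes[i]     = (byte) sample;             // low byte
            audioBytes[i + 1] = (byte) (sample >> 8);      // high byte
        }
    }
}
```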
Pretty late to the party, but I'm currently trying to solve this issue as well. If you are making your own media player app and are running an instance of a MediaPlayer, then you can use the method setVolume(leftScalar, rightScalar), where leftScalar and rightScalar are floats in the range 0.0 to 1.0, representing a logarithmic-scale volume for each respective channel.
However, this means that you must have a reference to the currently active MediaPlayer instance. If you are making a music app, no biggie. If you're trying to run a background service that gives users finer precision over all media output, I'm not sure how to apply this in that scenario.
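As a sketch, assuming you do hold a reference to the player, a hypothetical mapping from a 0-100 step to setVolume() might look like this (the log curve is just one illustrative choice so the low steps remain audible changes):

```java
import android.media.MediaPlayer;

class FineVolume {
    // step: 0..100; maps 0 -> silence, 100 -> full volume on a log-ish curve.
    static void apply(MediaPlayer player, int step) {
        float volume = 1f - (float) (Math.log(101 - step) / Math.log(101));
        player.setVolume(volume, volume);   // same scalar for left and right
    }
}
```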
Hope this helps.
I want to calculate the beats per minute of an audio file in Android. I have just a small clue: there is a Visualizer library which creates a digital bar effect from the audio file's waveform, and I could check for the beat with this. Is this the correct solution, or is there a proper way to do it? I want to categorize audio files properly, according to the beats per minute in each file.
Any help would be greatly appreciated.
Beats per minute can be calculated at several levels. A simple energy calculator, which is what you are referring to with the sound-level meter, or a VAD (voice/audio activity detector), is fairly simple to make, whereas a proper pitch detector is a complex process, and isolating the beat of a music segment is also complex, since the perception of a beat is itself complex.
If you are simply interested in an energy-calculator / beat-like feature, what you can do is keep two running averages and see how large one is relative to the other:
Let X = [x1 … xn] be the input audio samples, split into small buffers of, say, n = 100 samples.
Take the absolute value of each sample, abs(X).
A simple one-pole smoothing filter gives a slow-moving energy estimate:
X_filtered_long = X_filtered_long * (1 - alpha) + abs(X) * alpha   // alpha is about 0.02; the value depends on the sample rate, the signal and the beat of interest
Create a second, faster filtered signal:
X_filtered_short = X_filtered_short * (1 - beta) + abs(X) * beta   // beta is about 0.2
if (X_filtered_short > X_filtered_long)
    detected_beat = 1;
    insideBeat += 1;
else
    detected_beat = 0;
    insideBeat = 0;
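For completeness, a rough Java rendering of the two-running-averages idea above, for normalized mono samples in the range -1 to 1 (alpha and beta are the same illustrative constants as in the pseudocode):

```java
class SimpleBeatDetector {
    private double filteredLong = 0, filteredShort = 0;
    private final double alpha = 0.02, beta = 0.2;

    /** Returns true while the short-term energy is above the long-term energy. */
    boolean process(float sample) {
        double a = Math.abs(sample);
        filteredLong  = filteredLong  * (1 - alpha) + a * alpha;  // slow envelope
        filteredShort = filteredShort * (1 - beta)  + a * beta;   // fast envelope
        return filteredShort > filteredLong;
    }
}
```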
If you want to, "I want to categorize Audio files in a proper way. According to the Beat/minute in a File." This can only be done with finger printing the audio with parameters such as MFCC.
A good reference would be:
Scaringella, N.; Zoia, G.; Mlynek, D., "Automatic genre classification of music content: a survey", IEEE Signal Processing Magazine, Volume 23, Issue 2.
What you're asking here is tremendously difficult.
Audio analysis to get the beats is usually done with complex mathematical manipulation of the audio data, by transforming the audio signal from the time-domain to the frequency-domain using signal-processing techniques. There are whole books dedicated to this subject.
Visualizers like the one you mention internally use many of these DSP techniques and it would probably be an even worse nightmare to analyze the visualizer data than the audio data.
Even if you manage to find a library that does this for you, the results would be very unreliable. Maybe for electronic music where the beats are more obvious you would get better results than for classical music or jazz.
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which is easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explication including code samples, check this GameDev article out.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux" (a sketch of this step follows the list below).
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
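As promised above, a minimal sketch of the spectral-flux step, assuming the magnitude spectra of consecutive short-time FFT frames have already been computed with whatever FFT library you prefer:

```java
class SpectralFlux {
    // magnitudes[t][k] is the magnitude of FFT bin k at frame t.
    static double[] compute(double[][] magnitudes) {
        double[] flux = new double[magnitudes.length];
        for (int t = 1; t < magnitudes.length; t++) {
            double sum = 0;
            for (int k = 0; k < magnitudes[t].length; k++) {
                double rise = magnitudes[t][k] - magnitudes[t - 1][k];
                if (rise > 0) sum += rise;   // ignore decreases
            }
            flux[t] = sum;
        }
        return flux;
    }
}
```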
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly, I did find a very good site with many algorithms and code for beat detection a few years ago, but I've totally failed to find it again.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL.
There are many different algorithms that do this, but they are all fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (high frequency content), or you can look for large changes in spectral content (spectral difference); there are a couple of papers recommended further down. Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms that you can use once you've gotten that time localization of the beat. You can turn it into a pulse train (create a signal that is zero everywhere and 1 only where a beat happens), then apply an FFT to it, and the largest peak gives you the frequency of onsets.
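A rough sketch of that pulse-train idea, using a plain autocorrelation over a plausible tempo range instead of an FFT (the names and bounds are illustrative):

```java
class TempoFromOnsets {
    // onsetFrames holds onset positions in analysis frames; frameRate is frames per second.
    static double estimateBpm(int[] onsetFrames, int totalFrames, double frameRate) {
        double[] pulses = new double[totalFrames];
        for (int f : onsetFrames) pulses[f] = 1.0;          // the pulse train

        int minLag = (int) (frameRate * 60 / 200);          // cap the search at 200 BPM
        int maxLag = (int) (frameRate * 60 / 40);           // floor it at 40 BPM
        int bestLag = minLag;
        double best = -1;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < totalFrames; i++) sum += pulses[i] * pulses[i + lag];
            if (sum > best) { best = sum; bestLag = lag; }
        }
        return 60.0 * frameRate / bestLag;                  // strongest lag in frames -> BPM
    }
}
```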
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned looking into applying a machine learning algorithm: Basically collect a bunch of features from the onset detection functions (mentioned above) and combine them with the raw signal in a neural network/logistic regression and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures; there is actually an online distance course).
If you can manage to interface with python code in your project, Echo Nest Remix API is a pretty slick API for python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
Another edit: it is worth noting that lossy compression formats like MP3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
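That is, roughly (a hypothetical sketch; the names are illustrative):

```java
class TapTempo {
    private long firstTap = -1;
    private int taps = 0;

    // Call tap() on every button press; returns the running BPM estimate.
    double tap() {
        long now = System.currentTimeMillis();
        if (firstTap < 0) {
            firstTap = now;
            taps = 1;
            return 0;                        // no estimate from a single tap
        }
        taps++;
        double minutes = (now - firstTap) / 60000.0;
        return (taps - 1) / minutes;         // intervals per minute = BPM
    }
}
```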
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them; it has a good reputation, and it's written in C with a C++ wrapper, so you can integrate it easily with a Cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other. Once the similarity matrix is computed, it is possible to get the average similarity between all sample pairs {S(T); S(T+L)} for each lag L: this is the beat spectrum. The first high peak in the beat spectrum is most of the time the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
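A minimal sketch of the beat-spectrum computation, assuming per-window feature vectors (e.g., short-time spectra) are already available; it averages cosine similarity along each diagonal (lag) rather than materializing the full matrix:

```java
class BeatSpectrum {
    // features[t] is the feature vector of analysis window t.
    static double[] compute(double[][] features) {
        int n = features.length;
        double[] spectrum = new double[n];
        for (int lag = 0; lag < n; lag++) {
            double sum = 0;
            for (int t = 0; t + lag < n; t++) sum += cosine(features[t], features[t + lag]);
            spectrum[lag] = sum / (n - lag);      // average similarity at this lag
        }
        return spectrum;                           // first strong peak at lag > 0 ~ beat period
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na * nb) + 1e-12); // small epsilon avoids division by zero
    }
}
```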
I'd imagine this will be easiest in 4-4 dance music, as there should be a single low frequency thud about twice a second.
I am starting a project which would allow me to use Java to read sound samples, and depending on the properties of each sample (I'm thinking focusing on decibels at the moment for the sake of simplification, or finding some way to compute the overall 'volume' of a specific sample or set of samples), return a value from 0-255 where 0 would be silence and 255 would be the highest sound pressure (Compared to a reference point, I suppose? I have no idea how to word this). I want to then have these values returned as bytes and sent to an Arduino in order to control the intensity of LED's using PWM, and visually 'see' the music.
I am not any sort of audio file format expert, and have no particular understanding of how the data is stored in a music file. As such, I am having trouble finding out how to read a sample and find a way to represent its overall volume level as a byte. I have looked through the javax.sound.sampled package and it is all very confusing to me. Any insight as to how I could accomplish this would be greatly appreciated.
First, I suggest you read about pulse-code modulation (PCM), which is the format used to store data in a .wav file (the simplest format to begin with).
Next, there is a post on how to get PCM data from a .wav file in Java here.
Finally, to get the "volume" (which is actually closer to the energy), apply this energy equation.
I hope this helps.
As Bastyen (+1 from me) indicates, calculating decibels is actually NOT simple, but requires looking at a large number of samples. However, since sound samples run MUCH more frequently than visual frames in an animation, making an aggregate measure works out rather neatly.
A nice visual animation rate, for example, updates 60 times per second, and the most common sampling rate for sound is 44100 times per second. So, 735 samples (44100 / 60 = 735) might end up being a good choice for interfacing with a visualizer.
By the way, of all the official Java tutorials I've read (I am a big fan), I have found the ones that accompany javax.sound.sampled to be the most difficult. http://docs.oracle.com/javase/tutorial/sound/TOC.html
But they are still worth reading. If I were in charge of a rewrite, there would be many more code examples. Some of the best code examples are several sections deep, e.g., in the "Using Files and Format Converters" discussion.
If you don't wish to compute the RMS, a hack would be to store the local high and/or low value for the given number of samples. Relating these numbers to decibels would be dubious, but they MIGHT be useful after applying a mapping of your choice for the visualizer. Part of the problem is that values for a single point on a given wave can range wildly. The local high might be due more to the phase of the constituent harmonics happening to line up than to the energy or volume.
Your PCM top and bottom values would probably NOT be 0 and 256, more likely -128 to 127 for 8-bit encoding. More common still is 16-bit encoding (-32768 to 32767). But you will get the hang of this if you follow Bastyen's links. To make your code independent of the bit-encoding, you would likely normalize the data (convert to floats between -1 and 1) before doing any other calculations.
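Putting those pieces together, a minimal sketch (names illustrative): normalize 16-bit little-endian mono PCM to floats, take the RMS over one visual frame's worth of samples (735 at 44100 Hz / 60 fps), and map it to a 0-255 value for the Arduino.

```java
class LevelMeter {
    static int rmsLevel(byte[] pcm, int offset, int frames) {
        double sumOfSquares = 0;
        for (int i = 0; i < frames; i++) {
            int lo = pcm[offset + 2 * i] & 0xFF;
            int hi = pcm[offset + 2 * i + 1] << 8;        // sign-extends the high byte
            double sample = (hi | lo) / 32768.0;          // normalize to -1..1
            sumOfSquares += sample * sample;
        }
        double rms = Math.sqrt(sumOfSquares / frames);    // 0..1
        return (int) Math.min(255, Math.round(rms * 255)); // 0 = silence, 255 = full scale
    }
}
```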