What is the actual low-level format of sound data when read from a stream in Java? For example, take the following data line with a 44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, and big-endian byte order.
TargetDataLine tdLine = AudioSystem.getTargetDataLine(new AudioFormat(44100, 16, 2, true, true));
I understand that it is sampling 44100 times a second and that each sample is 16 bits. What I don't understand is what the 16 bits, or each of the 16 bits, represent. Also, does each channel have its own 16-bit sample?
I'll start with your last question: yes, each channel has its own 16-bit sample in each of the 44100 sample frames per second.
As for your first question, you have to know a little about the hardware inside a speaker. There is a diaphragm and an electromagnet. The diaphragm is the big round part you can see if you take the cover off. When the electromagnet is charged, it pulls or pushes a ferrous plate attached to the diaphragm, causing it to move. That movement becomes sound.
The value of each sample determines how much electricity is sent to the speaker. When a sample is zero, the diaphragm is at rest. When it is positive, the diaphragm is pushed one way; when it is negative, the other way. The larger the sample, the farther the diaphragm moves.
If you graphed all of the samples in your data, you would have a graph of the speaker's movement over time.
You should learn the basics of digital audio (the Wikipedia article gives you a start and plenty of links for further reading). After that, "44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, big-endian" should immediately tell you the low-level format.
In this case it means 44100 sample frames per second, with each sample stored as a 16-bit signed integer; the endianness determines the order in which the two bytes of each 16-bit value are put into the stream (big-endian = most significant byte first).
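To make that concrete, here is a minimal sketch (the class name and buffer size are just placeholders) that opens such a line and assembles each 4-byte frame into the two signed 16-bit channel values:

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class FrameDecodeDemo {
    public static void main(String[] args) throws Exception {
        AudioFormat fmt = new AudioFormat(44100, 16, 2, true, true); // signed, big-endian
        TargetDataLine line = AudioSystem.getTargetDataLine(fmt);
        line.open(fmt);
        line.start();

        byte[] buf = new byte[4096];               // one frame = 4 bytes: L-hi, L-lo, R-hi, R-lo
        int n = line.read(buf, 0, buf.length);

        for (int i = 0; i + 3 < n; i += 4) {
            // big-endian: most significant byte first; mask the low byte to avoid sign extension
            short left  = (short) ((buf[i]     << 8) | (buf[i + 1] & 0xFF));
            short right = (short) ((buf[i + 2] << 8) | (buf[i + 3] & 0xFF));
            // left and right each range from -32768 to 32767
        }
        line.close();
    }
}

The & 0xFF mask on the low-order byte matters: without it, Java's sign extension would corrupt the assembled value.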
I am new to handling audio. I am trying to understand how a WAV audio file works. I read the bytes with Java code and then chart the first 1500 samples in an Excel file. This is the image from Audacity:
[Audacity waveform screenshot]
And this is the representation in Excel:
[Excel chart of the first 1500 values]
I can see the wave but I don't know what the peaks mixed with the original signal are. Can someone explain this to me please?
You may still need another important step for the Excel chart to be meaningful. If the WAV data uses 16-bit, 24-bit, or 32-bit encoding, the individual bytes need to be assembled into PCM sample values. 8-bit values (roughly ±128) are not used much any more for encoding waveforms; shorts (16-bit, roughly ±32767) give much better fidelity. Check the format for the encoding, the byte order (it may be either big-endian or little-endian), and the number of channels (mono or stereo), assemble your PCM values accordingly, and I bet you will get the desired result.
The tutorial provided by Oracle, Overview of the Sampled Package, goes over basic concepts and tools that will be helpful in the Java context.
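As a rough illustration of that assembly step (the file names are placeholders, and it assumes 16-bit samples), the following reads the decoded audio through an AudioInputStream, combines each byte pair into a signed short using the format's byte order, and writes one value per row for charting in Excel; for a stereo file the rows will alternate left/right:

import java.io.File;
import java.io.PrintWriter;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class DumpSamples {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("input.wav")); // placeholder name
        AudioFormat fmt = in.getFormat();
        boolean bigEndian = fmt.isBigEndian();

        byte[] buf = new byte[3000];                 // 1500 16-bit samples' worth of bytes
        int n = in.read(buf);

        try (PrintWriter out = new PrintWriter("samples.csv")) {   // placeholder name
            for (int i = 0; i + 1 < n; i += 2) {
                int hi = bigEndian ? buf[i] : buf[i + 1];
                int lo = bigEndian ? buf[i + 1] : buf[i];
                short sample = (short) ((hi << 8) | (lo & 0xFF));
                out.println(sample);                 // one PCM value per row
            }
        }
        in.close();
    }
}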
Is there a way to visualize audio in Java in a kind of wave?
How should I start? I have already set up microphone selection and a Thread that reads the bytes from the TargetDataLine into a buffer.
But what should I do now?
Any help would be appreciated.
If you are using the Java Sound API, the data you have read is 8- or 16-bit PCM. If it is 8-bit, each byte can be used as-is; otherwise you may need to deal with the endianness.
If you are reading 8-bit PCM, each byte is one sample, and the value of that byte is the sound amplitude. If you are reading 16-bit PCM, the samples are packed either as hi,lo,hi,lo or lo,hi,lo,hi (where hi and lo are the high- and low-order bytes), depending on the endianness. In that case you should combine each pair of bytes into a short value.
For plotting you will need a 3rd party library, such as freechart or jahuwaldt.plot. (I used the latter on a real time wave visualization program).
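If you just want a quick look without pulling in a charting library, a bare-bones Swing sketch along these lines (assuming you have already combined the buffered bytes into a short[] of samples) can draw the wave:

import java.awt.Graphics;
import javax.swing.JFrame;
import javax.swing.JPanel;

class WavePanel extends JPanel {
    private final short[] samples;   // 16-bit PCM, already decoded from the byte buffer

    WavePanel(short[] samples) { this.samples = samples; }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        int w = getWidth(), h = getHeight(), mid = h / 2;
        for (int x = 1; x < w; x++) {
            // map pixel columns onto the sample array and scale amplitude to the panel height
            int i0 = (x - 1) * samples.length / w;
            int i1 = x * samples.length / w;
            int y0 = mid - samples[i0] * mid / 32768;
            int y1 = mid - samples[i1] * mid / 32768;
            g.drawLine(x - 1, y0, x, y1);
        }
    }

    static void show(short[] samples) {
        JFrame f = new JFrame("Waveform");
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        f.add(new WavePanel(samples));
        f.setSize(800, 300);
        f.setVisible(true);
    }
}

Call WavePanel.show(samples) once a buffer's worth of samples is ready; for a live display you would repaint as new buffers arrive.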
I was wondering if anyone knew how to convert continuous input from the microphone of an Android device into a byte array, or time-amplitude coordinates. What I want to do is get an array of data so that
array[time]=amplitude
This must work on live input, which is one of the major obstacles in my path, as most audio waveform graphers rely on closed files. Can anyone point me in the right direction?
Do you have any special requirements for what time is supposed to be? A PCM stream (which is what you get when using the AudioRecord class) is by definition a digital representation of the input signal's amplitude sampled at regular intervals.
So if you record at 48 kHz mono, sample N in the array of PCM data that you read from the AudioRecord represents the audio signal's amplitude at time N × 20.83 µs (that is, N/48000 seconds).
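As a rough sketch of that (it needs the RECORD_AUDIO permission, and the class and field names are just illustrative), reading 48 kHz mono 16-bit PCM with AudioRecord and mapping each sample index to a time looks something like this:

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class MicReader {
    private volatile boolean recording = true;   // set to false from elsewhere to stop

    // Sketch: read 16-bit mono PCM at 48 kHz; sample i of the whole stream occurs
    // at time i / 48000.0 seconds (about i * 20.83 microseconds) after the start.
    public void capture() {
        int sampleRate = 48000;
        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);

        short[] buffer = new short[minBuf / 2];
        long samplesSoFar = 0;
        recorder.startRecording();
        while (recording) {
            int read = recorder.read(buffer, 0, buffer.length);
            for (int i = 0; i < read; i++) {
                double timeSeconds = (samplesSoFar + i) / (double) sampleRate;
                short amplitude = buffer[i];     // the "array[time] = amplitude" pair
            }
            samplesSoFar += read;
        }
        recorder.stop();
        recorder.release();
    }
}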
I'm trying to write some basic sound-editing programs in Java, but I've been having a huge amount of trouble with my 16-bit WAVE file format.
When I asked Java how many samples my sound file had, it gave me a number twice as big as I expected. When I told Java to generate a sine wave from 80000 bytes of sample data, it played for 1 second instead of 2 (even though the sample rate was about 40000 samples per second).
After some more searching, I realized that the "frame size" of my file was 2, that a "sample" was actually 2 bytes instead of one, and that this is called a 16-bit audio file. As an experiment, I wrote my sound file to an array of bytes, set every other byte to 0, and played back the result. When I kept only the odd bytes, the sound file played back with a tiny bit of static noise. When I kept only the even ones, that static noise played back on its own without the sound file. This makes me think that the even bytes contain the exact inverse of the static in the odd bytes, which contain the actual sound to be played. When played back together, the even bytes silence the static in the odd bytes, which increases the sound's fidelity.
This website has a pretty good explanation of the basics of 16-bit sound encodings. However, it's not quite good enough for me to go ahead and start editing the file byte by byte. How can I do byte-by-byte editing of a 16-bit (or larger) sound file while still preserving its higher fidelity? What's the formula for encoding sound with 16 bits per sample instead of just 8?
How can I do byte-by-byte editing of a 16-bit (or larger) sound file...?
That question does not make any sense. When you say "byte-by-byte editing", you really should be saying "sample-by-sample". In this case, every sample is 16 bits (or two bytes), and it does not make sense to split the samples apart. That would be like trying to edit only the top halves of each letter in a text editor.
A single channel of a digital audio stream is a sequence of numbers (a.k.a., samples). Each sample is a representation of the pressure exerted on a microphone diaphragm by the sound wave at some instant in time. In an eight bit sound file, there are only 256 possible values, whereas in a 16-bit sound file, there are 65536 possible values. A 16-bit file has much greater resolution.
This makes me think that the even bytes contain the exact inverse of the static in the odd bytes, which contain the actual sound to be played.
There's a kernel of truth to that. The definition of "noise" in signal processing is the difference between what you hear and what you wanted to hear. When you zeroed out all of the odd-numbered bytes, you were stomping on the low-order half of each sample. By changing the samples, you were introducing something you didn't want to hear (i.e., noise). When you zeroed out the even-numbered bytes, you killed all of the high-order bits and therefore most of the signal. What remained in the low-order bytes was the exact inverse of the noise that you had introduced in your first experiment. (Your ears can't tell the difference between a given sound wave and the inverse of the same wave.)
There is no absolute mapping between sample values and pressure, but there are a couple of things you should know:
1) Are the samples signed or are they unsigned? Every sample has a value that must lie between some minimum and some maximum. If the (16-bit) samples are signed, then the minimum value is -32768 (0x8000), the maximum is 32767 (0x7FFF), and 0 is right in the middle. If the samples are unsigned, then the minimum is 0, and the maximum is 65535 (0xFFFF). Get it wrong, and you will know immediately because all you will hear is massive noise.
2) Are the samples linear? The sample values are always proportional to something. If they are directly proportional to the sound pressure level, that is called "linear encoding." But they may instead be proportional to the logarithm of the sound pressure, or to some other function of it. Non-linear encodings are almost always 8-bit, and they are usually only encountered in specialized applications like telephony. If you are dealing with 16-bit or larger samples, they are almost certainly linear.
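To make "sample-by-sample" concrete, here is a small sketch that edits every 16-bit sample, in this case simply halving the volume, and writes the result back into the byte array. It assumes signed little-endian samples (the usual case for WAV data); swap the roles of the two bytes for big-endian data:

public class GainEdit {
    // Sketch: apply a gain to every 16-bit signed sample in a little-endian PCM byte array.
    static void applyGain(byte[] pcm, double gain) {
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            // assemble the two bytes into one signed 16-bit sample (low byte first)
            int sample = (pcm[i + 1] << 8) | (pcm[i] & 0xFF);
            // edit the sample as a whole, then clamp to the valid 16-bit range
            int scaled = (int) Math.round(sample * gain);
            scaled = Math.max(-32768, Math.min(32767, scaled));
            // split the edited sample back into its low and high bytes
            pcm[i]     = (byte) (scaled & 0xFF);
            pcm[i + 1] = (byte) ((scaled >> 8) & 0xFF);
        }
    }
}

Calling applyGain(pcm, 0.5) halves the volume; the clamp keeps edited values inside the legal 16-bit range so they don't wrap around and turn into loud clicks.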
I am facing a problem while working with audio files. I am implementing an algorithm that deals with audio, and it requires its input to be a 5 kHz mono audio file.
Most of the audio files I have are PCM, 44.1 kHz, 16-bit stereo, so my problem is how to convert 44.1 kHz stereo files into 5 kHz mono files.
I would be grateful if anyone could point me to a tutorial that explains the DSP basics behind this, or to any Java libraries.
Just to augment what was already said by Prasad: you should low-pass filter the signal at 2.5 kHz before downsampling, to prevent aliasing in the result. If there is a 4 kHz tone in the original signal, it can't possibly be represented at a 5 kHz sample rate, and will be folded back across the 2.5 kHz Nyquist limit, creating a false ("aliased") tone at 1 kHz.
See related: How to implement low pass filter using java
Also, if you're downsampling from 44100 Hz to 5000 Hz, you'll be keeping only one out of every 8.82 original samples, which is not a nice integer ratio. This means you should also employ some form of interpolation, since you'll effectively be sampling the original signal at non-integer positions.
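Here is a bare-bones sketch of that interpolation step. It assumes the signal has already been low-pass filtered below 2.5 kHz and converted to a mono double[]; a real resampler would use a proper interpolation filter rather than the linear interpolation shown here:

public class Downsampler {
    // Sketch: resample a mono signal from srcRate to dstRate using linear interpolation.
    // Assumes 'input' was already low-pass filtered below dstRate/2 to prevent aliasing.
    static double[] resample(double[] input, double srcRate, double dstRate) {
        int outLength = (int) (input.length * dstRate / srcRate);
        double[] out = new double[outLength];
        double step = srcRate / dstRate;          // 44100/5000 = 8.82 source samples per output sample
        for (int i = 0; i < outLength; i++) {
            double pos = i * step;                // non-integer position in the source signal
            int k = (int) pos;
            double frac = pos - k;
            double next = (k + 1 < input.length) ? input[k + 1] : input[k];
            out[i] = (1 - frac) * input[k] + frac * next;
        }
        return out;
    }
}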
Java Sound API (javax.sound.*) contains a lot of useful functions to manipulate sounds.
http://download.oracle.com/javase/tutorial/sound/index.html
You can find already-implemented Java code to easily downsample your audio file HERE.
In the stereo PCM I have handled, every other 16-bit value in the PCM byte array is a data point belonging to one particular stereo channel; this is called interleaving. So first grab every other 16-bit value to extract a single-channel (mono) PCM array.
As for the sample-rate conversion: if you were to play a 44100 Hz audio file as if it were a 5000 Hz file, you would have too much data and it would sound slowed down. So keep only one sample out of every int(44100/5000) to downsample it to a 5 kHz signal.
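A small sketch of that channel-extraction step might look like this (it assumes 16-bit little-endian samples, which you should check against your files, and it simply takes the left channel rather than mixing the two):

public class Deinterleave {
    // Sketch: pull the left channel out of interleaved 16-bit little-endian stereo PCM.
    // One stereo frame is 4 bytes: left-lo, left-hi, right-lo, right-hi.
    static short[] leftChannel(byte[] stereo) {
        short[] mono = new short[stereo.length / 4];
        for (int frame = 0; frame < mono.length; frame++) {
            int i = frame * 4;
            mono[frame] = (short) ((stereo[i + 1] << 8) | (stereo[i] & 0xFF));
        }
        return mono;
    }
}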