Audio programming and how to invert a sound wave? - Java

I'm trying to invert a sound wave (phase shift 180 degrees), but I'm not exactly sure how I would go about doing this. Can any audio programmers point me in the right direction?

Inverting a sound wave is generally easy if you have access to the byte array that makes up the sound: you simply negate each sample in the stream.
Audio streams come in many different flavors, so it's impossible to be specific. However, if it were a 16-bit PCM stream, which is a sequence of 2-byte values, you'd loop over the data and, for each pair of bytes: assemble them into a short, negate it, and write it back into the byte stream.
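A minimal sketch of that loop, assuming signed 16-bit little-endian PCM (the byte order and signedness are assumptions; check your actual AudioFormat):

    // Inverts signed 16-bit little-endian PCM in place.
    static void invert16BitPcm(byte[] data) {
        for (int i = 0; i + 1 < data.length; i += 2) {
            short sample = (short) ((data[i] & 0xFF) | (data[i + 1] << 8));
            // Negating -32768 overflows in 16 bits, so clamp it to 32767.
            sample = (sample == Short.MIN_VALUE) ? Short.MAX_VALUE : (short) -sample;
            data[i] = (byte) sample;
            data[i + 1] = (byte) (sample >> 8);
        }
    }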

Related

visualize microphone audio in Java

Is there a way to visualize audio in Java in a kind of wave?
How should I start? I have already set up a microphone selection and a Thread to read the bytes from the TargetDataLine into a buffer.
But what should I do now?
Any help would be appreciated.
If you are using the Java Sound API, the data that you have read is 8- or 16-bit PCM. If it is 8-bit you can use the bytes directly; otherwise you need to account for the endianness.
If you are reading 8-bit PCM, each byte is a sample, and the value of that byte is the sound sample. If you are reading 16-bit PCM, then the samples are packed either as hi,lo,hi,lo or lo,hi,lo,hi (where hi and lo are the high- and low-order bytes) depending on the endianness. In that case you should combine each pair of bytes into a short value.
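A minimal sketch of that conversion, assuming the buffer was just filled from the TargetDataLine (use AudioFormat.isBigEndian() to get the byte order):

    // Combines pairs of 16-bit PCM bytes into short samples.
    static short[] toSamples(byte[] buffer, int bytesRead, boolean bigEndian) {
        short[] samples = new short[bytesRead / 2];
        for (int i = 0; i < samples.length; i++) {
            int hi = buffer[2 * i + (bigEndian ? 0 : 1)];        // sign-extended high byte
            int lo = buffer[2 * i + (bigEndian ? 1 : 0)] & 0xFF; // unsigned low byte
            samples[i] = (short) ((hi << 8) | lo);
        }
        return samples;
    }

Each element of samples is then one point on the waveform you plot.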
For plotting you will need a third-party library, such as JFreeChart or jahuwaldt.plot. (I used the latter in a real-time wave visualization program.)

Active Array of Streaming Audio Amplitude

I was wondering if anyone knew how to convert a continuous input into the mic of an Android device into a byte array, or time-amplitude coordinates. What I want to do is get an array of data so that
array[time]=amplitude
This must work on a live stream, which is one of the major obstacles in my path, as most audio waveform graphers rely on completed files. Can anyone point me in the right direction?
Do you have any special requirements for what time is supposed to be? A PCM stream (which is what you get when using the AudioRecord class) is by definition a digital representation of the input signal's amplitude, sampled at regular intervals.
So if you record at 48 kHz mono, sample number N in the array of PCM data that you read from the AudioRecord represents the audio signal's amplitude at time N * 20.83 µs (i.e., N / 48000 s).
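A minimal recording loop along those lines, assuming 48 kHz mono 16-bit PCM via android.media.AudioRecord (the RECORD_AUDIO permission and error handling are omitted, and the recording flag stands in for your own stop condition):

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    volatile boolean recording = true; // cleared elsewhere to stop the loop

    // Fills 'samples' with live mic data; samples[i] in one read is the
    // amplitude at time (framesReadSoFar + i) / 48000 seconds.
    void recordLoop() {
        int sampleRate = 48000;
        int bufSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufSize);
        short[] samples = new short[bufSize / 2];
        recorder.startRecording();
        while (recording) {
            int n = recorder.read(samples, 0, samples.length);
            // samples[0..n-1]: one amplitude value every 1/48000 s (~20.83 µs)
        }
        recorder.stop();
        recorder.release();
    }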

How do 16+ bit audio formats work?

I'm trying to write some basic sound-editing programs in Java, but I've been having a huge amount of trouble with my 16-bit WAVE file format.
When I asked Java how many samples it thought my sound file had, it gave me a number twice as big as I expected. When I told Java to generate a sine wave from 80000 bytes of sample data, it played for 1 second instead of 2 (even though the sample rate was about 40000 samples per second).
After some more searching, I realized that the "frame size" of my file was 2, that a "sample" was actually 2 bytes instead of one, and that this is called a 16-bit audio file. As an experiment, I wrote my sound file to an array of bytes, set every other byte to 0, and played back the result. When I kept only the odd bytes, the sound file played back with a little static noise. When I kept only the even ones, that static noise played back on its own, without the sound file. This makes me think that the even bytes contain the exact inverse of the static in the odd bytes, which contain the actual sound to be played. When played back together, the even bytes silence the static in the odd bytes, which increases the sound's fidelity.
This website has a pretty good explanation of the basics of 16-bit sound encodings. However, it's not quite good enough for me to go ahead and start editing the file byte by byte. How can I do byte-by-byte editing of a 16-bit (or larger) sound file while still preserving its higher fidelity? What's the formula for encoding sound with 16 bits per sample instead of just 8?
How can I do byte-by-byte editing of a 16-bit (or larger) sound file...?
That question does not make any sense. When you say "byte-by-byte editing", you really should be saying "sample-by-sample". In this case, every sample is 16 bits (or two bytes), and it does not make sense to split the samples apart. That would be like trying to edit only the top halves of each letter in a text editor.
A single channel of a digital audio stream is a sequence of numbers (a.k.a., samples). Each sample is a representation of the pressure exerted on a microphone diaphragm by the sound wave at some instant in time. In an eight bit sound file, there are only 256 possible values, whereas in a 16-bit sound file, there are 65536 possible values. A 16-bit file has much greater resolution.
This makes me think that the even bytes contain the exact inverse of the static in the odd bytes, which contain the actual sound to be played.
There's a kernel of truth to that. The definition of "noise" in signal processing is the difference between what you hear and what you wanted to hear. When you zeroed out all of the odd-numbered bytes, you were stomping on the low-order halves of each sample. By changing the samples, you were introducing something you didn't want to hear (i.e., noise). When you zeroed out the even-numbered bytes, you killed all of the high-order bits, and therefore most of the signal. What remained in the low-order bytes was the exact inverse of the noise that you had introduced in your first experiment. (Your ears can't tell the difference between a given sound wave and the inverse of the same sound wave.)
There is no absolute mapping between sample values and pressure, but there are a couple of things you should know:
1) Are the samples signed or unsigned? Every sample has a value that must lie between some minimum and some maximum. If the (16-bit) samples are signed, then the minimum value is -32768 (0x8000), the maximum is 32767 (0x7FFF), and 0 is right in the middle. If the samples are unsigned, then the minimum is 0 and the maximum is 65535 (0xFFFF). Get it wrong, and you will know immediately, because all you will hear is massive noise. (A decoding sketch follows this list.)
2) Are the samples linear? Sample values are always proportional to something. If they are directly proportional to the sound pressure level, that's called "linear encoding". But they may instead be proportional to the logarithm of the sound pressure, or to some other function of it. Non-linear encodings are almost always 8-bit, and they are usually encountered only in specialized applications like telephony. If you are dealing with 16-bit or larger samples, they are almost certainly linear.
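To make the signed-linear 16-bit case concrete, here is a minimal sketch that unpacks the samples and scales them to [-1.0, 1.0] (the little-endian byte order is an assumption; swap the two indices for big-endian):

    // Unpacks signed 16-bit little-endian PCM and scales to [-1.0, 1.0].
    static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            short s = (short) ((pcm[2 * i] & 0xFF) | (pcm[2 * i + 1] << 8));
            out[i] = s / 32768f; // -32768..32767 maps to -1.0..~0.99997
        }
        return out;
    }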

format of sound data

What is the actual low-level format of sound data when read from a stream in Java? For example, take the following data line with a 44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, and big-endian byte order:
AudioFormat format = new AudioFormat(44100, 16, 2, true, true);
TargetDataLine tdLine = (TargetDataLine) AudioSystem.getLine(new DataLine.Info(TargetDataLine.class, format));
I understand that it is sampling 44100 times a second and each sample is 16 bits. What I don't understand is what the 16 bits, or each of the 16 bits, represent. Also, does each channel have its own 16-bit sample?
I'll take your last question first: yes, each channel has its own 16-bit sample for each of the 44100 samples per second.
As for your first question, you have to know about the hardware inside a speaker. There is a diaphragm and an electromagnet. The diaphragm is the big round part you can see if you take the cover off. When the electromagnet is charged, it pulls or pushes a ferrous plate attached to the diaphragm, causing it to move. That movement becomes sound.
The value of each sample is how much electricity is sent to the speaker. So when a sample is zero, the diaphragm is at rest. When it is positive it is pushed one way and when it is negative, the other way. The larger the sample, the more the diaphragm is moved.
If you graph all of the samples in your data, you would have a graph of the movement of the speaker over time.
You should read up on digital audio basics (Wikipedia gives you a start and plenty of links for further reading). After that, "44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, big-endian format" should immediately tell you the low-level format.
In this case it means 44100 samples per second per channel, with each sample a 16-bit signed integer; the endianness determines the order in which the two bytes of each 16-bit int are put into the stream (big-endian = most significant byte first).
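As a sketch, here is how one 4-byte frame of that stream decodes, assuming the usual Java Sound layout of interleaved channels (left sample first, then right):

    // One frame = 4 bytes: left hi, left lo, right hi, right lo (big-endian).
    static short[] decodeFrame(byte[] buf, int frameStart) {
        short left  = (short) ((buf[frameStart]     << 8) | (buf[frameStart + 1] & 0xFF));
        short right = (short) ((buf[frameStart + 2] << 8) | (buf[frameStart + 3] & 0xFF));
        return new short[] { left, right };
    }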

Create a wav with hidden binary data in it and read it (Java)

What I want to do is convert a text string into a WAV file using high frequencies (18500 Hz and up): this will be the encoder.
Then I want to build an engine that decodes the text string from a WAV-format recording, with error control, since obviously I will not be reading the original file back but a recording of that sound.
Thanks
An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples in a WAV file). The simplest approach would be to take each byte of the source string and store it as a sample in the WAV file (one byte per sample for an 8-bit recording; a 16-bit recording holds two bytes per sample, and a stereo 16-bit recording holds four bytes per frame). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach, but as you say, you want to be able to make a (presumably analog) recording of the sound, then read it back into a WAV file and still recover the data.
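A minimal sketch of that simple approach using the Java Sound API (the 8 kHz mono unsigned 8-bit format and US-ASCII encoding are assumptions):

    import javax.sound.sampled.*;
    import java.io.*;
    import java.nio.charset.StandardCharsets;

    // Writes each byte of the message as one unsigned 8-bit mono sample.
    static void writeMessageAsWav(String message, File out) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.US_ASCII);
        AudioFormat fmt = new AudioFormat(8000f, 8, 1, false, false);
        AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(data), fmt, data.length);
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
    }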
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone touch tones (DTMF) worked: each key generated a unique pair of tones and sent them across the wire. The tones are long enough, and far enough apart pitch-wise, that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then generate a sine wave of that frequency lasting length milliseconds. This encodes a lot more redundancy than the one-byte-per-sample approach above, since each byte takes up many samples, and losing some samples doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
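A sketch of both directions under that scheme, with a crude zero-crossing estimator standing in for a real frequency detector (all names and parameter values here are illustrative assumptions):

    // Encodes each byte as lengthMs of a sine tone at baseHz + value * deltaHz.
    static short[] encode(byte[] msg, int sampleRate, double baseHz,
                          double deltaHz, int lengthMs) {
        int perByte = sampleRate * lengthMs / 1000;
        short[] out = new short[msg.length * perByte];
        for (int b = 0; b < msg.length; b++) {
            double freq = baseHz + (msg[b] & 0xFF) * deltaHz;
            for (int i = 0; i < perByte; i++) {
                out[b * perByte + i] =
                        (short) (16000 * Math.sin(2 * Math.PI * freq * i / sampleRate));
            }
        }
        return out;
    }

    // Estimates one block's frequency by counting zero crossings,
    // then maps it back to the nearest byte value.
    static int decodeBlock(short[] block, int sampleRate,
                           double baseHz, double deltaHz) {
        int crossings = 0;
        for (int i = 1; i < block.length; i++) {
            if ((block[i - 1] < 0) != (block[i] < 0)) crossings++;
        }
        double freq = crossings * sampleRate / (2.0 * block.length);
        return (int) Math.round((freq - baseHz) / deltaHz);
    }

In practice, a per-block Goertzel filter or FFT would be far more robust against recording noise than zero-crossing counting.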
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).
