Visualize microphone audio in Java

Is there a way to visualize audio in Java as a kind of waveform?
How should I start? I have already set up microphone selection and a Thread that reads the bytes from the TargetDataLine into a buffer.
But what should I do now?
Any help would be appreciated.

If you are using the Java Sound API, the data you have read is 8- or 16-bit PCM. If it is 8-bit you can use the bytes directly; otherwise you may need to account for endianness.
If you are reading 8-bit PCM, each byte is one sample, and the value of that byte is the sound sample. If you are reading 16-bit PCM, the samples are packed either as hi,lo,hi,lo or lo,hi,lo,hi (where hi and lo are the high- and low-order bytes), depending on endianness. In that case you should combine each pair of bytes into a short value.
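A minimal sketch of that conversion (the method name toSamples and the signed, mono assumption are mine, not part of the Java Sound API):

// Combine byte pairs from a 16-bit PCM buffer into short samples.
// Pass the endianness from your AudioFormat.
static short[] toSamples(byte[] buf, int bytesRead, boolean bigEndian) {
    short[] samples = new short[bytesRead / 2];
    for (int i = 0; i < samples.length; i++) {
        int hi = buf[bigEndian ? 2 * i : 2 * i + 1];
        int lo = buf[bigEndian ? 2 * i + 1 : 2 * i];
        samples[i] = (short) ((hi << 8) | (lo & 0xFF)); // sign carried by hi
    }
    return samples;
}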
For plotting you will need a third-party library, such as JFreeChart or jahuwaldt.plot. (I used the latter in a real-time waveform visualization program.)
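If you would rather avoid a dependency, a bare-bones Swing panel can also draw the buffer as a polyline (WavePanel is my own illustration, not a library class):

class WavePanel extends javax.swing.JPanel {
    volatile short[] samples = new short[0]; // replaced by your capture thread

    @Override protected void paintComponent(java.awt.Graphics g) {
        super.paintComponent(g);
        int w = getWidth(), h = getHeight();
        short[] s = samples; // stable local reference
        for (int x = 1; x < w && x < s.length; x++) {
            int y0 = h / 2 - s[x - 1] * h / 65536; // map +/-32768 to panel height
            int y1 = h / 2 - s[x] * h / 65536;
            g.drawLine(x - 1, y0, x, y1);
        }
    }
}

Call repaint() from your read thread each time you swap in a new samples array.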

Related

Understanding bytes from wav audio data

I am new to handling audio. I am trying to understand how a WAV audio file works. I read the bytes with Java code and then plotted the first 1500 samples in an Excel file. This is the image from Audacity:
[Audacity waveform screenshot]
And this is the representation in Excel:
[Excel chart of the first 1500 samples]
I can see the wave, but I don't know what the peaks mixed in with the original signal are. Can someone explain this to me, please?
You may yet need another important step before the Excel image is meaningful. If the WAV data is 16-, 24-, or 32-bit encoded, the bytes need to be assembled into PCM values. 8-bit values (±128) are not used very often any more for encoding waveforms; shorts (16-bit, ±32767) give much better fidelity. Check the format for the encoding, the byte order (it may be either big-endian or little-endian), and the number of tracks (mono or stereo), assemble your PCM values accordingly, and I bet you will get the desired result.
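As a first step, those fields can be read straight off the file with the Java Sound API (the file name is a placeholder):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavInfo {
    public static void main(String[] args) throws Exception {
        // Print the fields needed to assemble PCM values correctly.
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("input.wav"));
        AudioFormat fmt = in.getFormat();
        System.out.println("Encoding:        " + fmt.getEncoding());
        System.out.println("Bits per sample: " + fmt.getSampleSizeInBits());
        System.out.println("Channels:        " + fmt.getChannels());
        System.out.println("Big-endian:      " + fmt.isBigEndian());
    }
}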
The tutorial provided by Oracle, Overview of the Sampled Package, goes over basic concepts and tools that will be helpful in the Java context.

Active Array of Streaming Audio Amplitude

I was wondering if anyone knew how to convert continuous input to the mic of an Android device into a byte array, or into time-amplitude coordinates. What I want is an array of data such that
array[time] = amplitude
This must work on a live stream, which is one of the major obstacles in my path, as most audio waveform graphers rely on closed files. Can anyone guide me in the right direction?
Do you have any special requirements for what time is supposed to be? A PCM stream (which is what you get when using the AudioRecord class) is by definition a digital representation of the input signal's amplitude, sampled at regular intervals.
So if you record at 48 kHz mono, sample N in the array of PCM data that you read from the AudioRecord represents the audio signal's amplitude at time N × 20.83 µs.
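A rough sketch of that with AudioRecord (the recording flag and constants are mine; AudioFormat here is android.media.AudioFormat, and the app needs the RECORD_AUDIO permission):

int sampleRate = 48000;
int minBuf = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, minBuf);
short[] buffer = new short[minBuf / 2];
long totalRead = 0; // index of buffer[0] within the whole stream
recorder.startRecording();
while (recording) { // your own stop flag
    int n = recorder.read(buffer, 0, buffer.length);
    // buffer[i] is the amplitude at time (totalRead + i) / 48000.0 seconds
    totalRead += n;
}
recorder.stop();
recorder.release();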

format of sound data

What is the actual low-level format of sound data when read from a stream in Java? For example, take the following line with a 44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, and big-endian byte order:
TargetDataLine tdLine = AudioSystem.getTargetDataLine(new AudioFormat(44100, 16, 2, true, true));
I understand that it is sampling 44100 times a second and each sample is 16 bits. What I don't understand is what the 16 bits, or each of the 16 bits, represent. Also, does each channel have its own 16-bit sample?
I'll start with your last question: yes, each channel has its own 16-bit sample for each of the 44100 samples per second.
As for your first question, you have to know about the hardware inside a speaker. There is a diaphragm and an electromagnet. The diaphragm is the big round part you can see if you take the cover off. When the electromagnet is charged it pulls or pushes a ferrous plate attached to the diaphragm, causing it to move. That movement produces sound.
The value of each sample is how much electricity is sent to the speaker. So when a sample is zero, the diaphragm is at rest. When it is positive it is pushed one way and when it is negative, the other way. The larger the sample, the more the diaphragm is moved.
If you graph all of the samples in your data, you would have a graph of the movement of the speaker over time.
You should learn about digital audio basics (Wikipedia gives you a start and lots of links for further reading). After that, "44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, big-endian format" should immediately tell you the low-level format.
In this case it means 44100 samples per second per channel, with each sample represented by a 16-bit signed integer; endianness determines the order in which the two bytes of a 16-bit int are put into the stream (big-endian = most significant byte first).
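For the stream above that works out to 4 bytes per frame (2 bytes × 2 channels); a sketch of pulling one frame apart, assuming the usual left-channel-first interleaving:

byte[] frame = new byte[4]; // one frame: left sample then right sample
int n = tdLine.read(frame, 0, frame.length);
// big-endian: most significant byte first within each 16-bit sample
short left = (short) ((frame[0] << 8) | (frame[1] & 0xFF));
short right = (short) ((frame[2] << 8) | (frame[3] & 0xFF));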

Audio files down-sample

I am facing a problem while working with audio files. I am implementing an algorithm that deals with audio files, and it requires its input to be a 5 kHz mono audio file.
Most of the audio files I have are PCM 44.1 kHz 16-bit stereo, so my problem is how to convert 44.1 kHz stereo files to 5 kHz mono files.
I would be grateful if anyone could provide a tutorial that explains the basics of the DSP behind the idea, or any Java libraries.
Just to augment what was already said by Prasad, you should low-pass filter the signal at 2.5 kHz before downsampling to prevent aliasing in the result. If there is a 4 kHz tone in the original signal, it can't possibly be represented at a 5 kHz sample rate; it will be folded back across the 2.5 kHz Nyquist limit, creating a false ("aliased") tone at 1.5 kHz.
See related: How to implement low pass filter using java
Also, if you're downsampling from 44100 Hz to 5000 Hz, you'll be keeping one sample for every 8.82 original samples, which is not a nice integer ratio. This means you should also employ some type of interpolation, since you'll be sampling the original signal at non-integer positions.
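A sketch of that interpolation (linear, and assuming the low-pass filtering above has already been done; the method name resample is mine):

// Resample by stepping through the input at a fractional stride and
// linearly interpolating between neighbouring samples.
static short[] resample(short[] in, int srcRate, int dstRate) {
    double step = (double) srcRate / dstRate; // 8.82 for 44100 -> 5000
    short[] out = new short[(int) (in.length / step)];
    for (int i = 0; i < out.length; i++) {
        double pos = i * step;
        int i0 = (int) pos;
        int i1 = Math.min(i0 + 1, in.length - 1);
        double frac = pos - i0;
        out[i] = (short) ((1 - frac) * in[i0] + frac * in[i1]);
    }
    return out;
}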
Java Sound API (javax.sound.*) contains a lot of useful functions to manipulate sounds.
http://download.oracle.com/javase/tutorial/sound/index.html
You can find already-implemented Java code to easily downsample your audio file HERE.
With the stereo PCM I have handled, every other 16-bit value in the PCM byte array is usually a data point belonging to one particular stereo channel; this is called interleaving. So first grab every other value from the stereo data to extract a mono PCM array.
As for the sample-rate downsampling, if you were to play a 44100 Hz audio file as if it were a 5000 Hz audio file, you'd have too much data, and it would sound slowed down. So take samples in increments of int(44100/5000) to downsample it to a 5 kHz signal.
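A naive sketch of both steps (monoDecimate is my own name; note that int(44100/5000) = 8 actually gives an effective rate of 5512.5 Hz, which is why the interpolation suggested above is more accurate):

static short[] monoDecimate(short[] stereo) {
    // Step 1: de-interleave, keeping only the left channel.
    short[] mono = new short[stereo.length / 2];
    for (int i = 0; i < mono.length; i++) {
        mono[i] = stereo[2 * i];
    }
    // Step 2: keep every 8th sample (crude decimation, no interpolation).
    int step = 44100 / 5000; // == 8
    short[] out = new short[mono.length / step];
    for (int i = 0; i < out.length; i++) {
        out[i] = mono[i * step];
    }
    return out;
}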

Create a wav with hidden binary data in it and read it (Java)

What I want to do is convert a text string into a WAV file using high frequencies (18,500 Hz and above): this will be the encoder.
Then I want to build an engine that decodes the text string from a recording of that sound, with error control, since obviously I will not be reading the same file back but a recording of it.
Thanks
An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples to a WAV file). Obviously, the simplest approach would be to take each byte of the source string and store it as a sample in the WAV file (one byte per sample in an 8-bit recording, two bytes per sample in a 16-bit recording, four bytes per sample in a 16-bit stereo recording). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach, but as you say, you want to be able to make a (presumably analog) recording of the sound, then read it back into a WAV file and still be able to recover the data.
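For illustration, that byte-per-sample storage can be done entirely with the Java Sound API (class name, message, and file name are placeholders):

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;

public class NaiveEncoder {
    public static void main(String[] args) throws Exception {
        // Each byte of the message becomes one signed 8-bit mono sample.
        byte[] message = "hello".getBytes("US-ASCII");
        AudioFormat fmt = new AudioFormat(8000f, 8, 1, true, false);
        AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(message), fmt, message.length);
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File("message.wav"));
    }
}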
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone dial tones worked: each key generated a unique tone and sent it across the wire. The tones are long enough, and far enough apart pitch-wise that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then, generate a sine wave for length milliseconds of that frequency. This encodes a lot more redundancy than the above one-byte-per-sample approach, since each byte takes up many samples, and if you lose some samples, it doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
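A sketch of that scheme (every constant here is an assumption to tune; a zero-crossing count is one crude way to estimate the frequency, and the 18.5 kHz+ range from the question would force a much smaller frequency-delta):

static final double BASE_HZ = 1000;  // frequency for byte value 0
static final double DELTA_HZ = 50;   // frequency-delta between byte values
static final int LENGTH_MS = 100;    // tone length per byte
static final int RATE = 44100;

// Encoder: one sine tone of LENGTH_MS per byte.
static short[] encode(byte[] data) {
    int perByte = RATE * LENGTH_MS / 1000;
    short[] out = new short[data.length * perByte];
    int k = 0;
    for (byte b : data) {
        double hz = BASE_HZ + (b & 0xFF) * DELTA_HZ;
        for (int i = 0; i < perByte; i++) {
            out[k++] = (short) (16000 * Math.sin(2 * Math.PI * hz * i / RATE));
        }
    }
    return out;
}

// Decoder: estimate the frequency of one LENGTH_MS chunk from its
// zero crossings, then map to the nearest byte value.
static byte decodeChunk(short[] chunk) {
    int crossings = 0;
    for (int i = 1; i < chunk.length; i++) {
        if ((chunk[i - 1] < 0) != (chunk[i] < 0)) crossings++;
    }
    double hz = crossings * RATE / (2.0 * chunk.length);
    return (byte) Math.round((hz - BASE_HZ) / DELTA_HZ);
}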
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).
