Active Array of Streaming Audio Amplitude - java

I was wondering if anyone knew how to convert continuous input from the mic of an Android device into a byte array, or time-amplitude coordinates. What I want to do is get an array of data so that
array[time]=amplitude
This must happen live, which is one of the major obstacles in my path, as most audio waveform graphers rely on closed (already recorded) files. Can anyone point me in the right direction?

Do you have any special requirements for what the time axis is supposed to be? A PCM stream (which is what you get when using the AudioRecord class) is by definition a digital representation of the input signal's amplitude, sampled at regular intervals.
So if you record at 48 kHz mono, sample N in the array of PCM data that you read from the AudioRecord represents the audio signal's amplitude at time N * 20.83 µs (i.e. N / 48000 seconds).
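A minimal sketch of that on Android (the method name is made up; this belongs inside an Activity or Service and needs the RECORD_AUDIO permission):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

// Sketch: capture ~1 s of 16-bit mono PCM at 48 kHz.
// Index i of the result corresponds to time i / 48000.0 seconds.
void captureOneSecond() {
    int sampleRate = 48000;
    int bufSize = AudioRecord.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
            sampleRate, AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT, bufSize);

    short[] buffer = new short[bufSize / 2];
    short[] amplitudeByIndex = new short[sampleRate];   // array[time] = amplitude
    int totalRead = 0;

    recorder.startRecording();
    while (totalRead < amplitudeByIndex.length) {
        int n = recorder.read(buffer, 0,
                Math.min(buffer.length, amplitudeByIndex.length - totalRead));
        if (n <= 0) break;                               // error or end of stream
        System.arraycopy(buffer, 0, amplitudeByIndex, totalRead, n);
        totalRead += n;
    }
    recorder.stop();
    recorder.release();
    // amplitudeByIndex[i] is the amplitude at time i / (double) sampleRate seconds
}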

Related

Combine 2 wav files as one simultaneous song

I have a program that records input from a mic while music is playing. When done, the voice is saved as a wav file. I would like to combine that wav file with the wav file of the song that was playing during recording. Example: song1.wav plays in the background while a person is singing. The recording of that person singing is now recording1.wav. I want to combine song1.wav and recording1.wav so that they play simultaneously and become one combined song, finalsong1.wav. I am a beginner in Java and have yet to find a solution, or even a starting point besides how to concatenate them, which is the opposite of what I'd like to do.
I am going to list the steps involved. These steps have been covered multiple times and should be straightforward to research.
For both wav files, read the data in using an AudioInputStream.
As the data arrives, convert the bytes to signed PCM. If the format is 16-bit, the PCM can be signed shorts, or scaled/normalized to floats that range from -1 to 1.
Use addition to combine the corresponding data (e.g., frame 1 right channel of voice with frame 1 right channel of music), as in the sketch after this answer. If the signals are too "hot" it may be necessary to clamp the sums with a min and max so the data does not exceed the bounds of the range, as it sounds really terrible when that happens.
Convert the summed, signed PCM data back to bytes according to the audio format.
Write to a wav file.
I know of tools that can mix tracks for playback, but I don't recall one that will automatically save the results to wav. One likely exists somewhere, or should be easy to write or commission.
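A rough sketch of the steps above using javax.sound.sampled (it assumes both files already share the same 16-bit signed PCM format, sample rate and channel count; the file names are just the ones from the question):

import javax.sound.sampled.*;
import java.io.*;

public class WavMixer {
    public static void main(String[] args) throws Exception {
        AudioInputStream inA = AudioSystem.getAudioInputStream(new File("song1.wav"));
        AudioInputStream inB = AudioSystem.getAudioInputStream(new File("recording1.wav"));
        AudioFormat fmt = inA.getFormat();
        boolean bigEndian = fmt.isBigEndian();

        byte[] a = readAll(inA);
        byte[] b = readAll(inB);
        int len = Math.min(a.length, b.length);
        byte[] mixed = new byte[len];

        for (int i = 0; i + 1 < len; i += 2) {
            int sum = toShort(a, i, bigEndian) + toShort(b, i, bigEndian);
            // Clamp (the "min and max" step) so hot signals don't wrap around.
            sum = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
            putShort((short) sum, mixed, i, bigEndian);
        }

        AudioInputStream out = new AudioInputStream(
                new ByteArrayInputStream(mixed), fmt, mixed.length / fmt.getFrameSize());
        AudioSystem.write(out, AudioFileFormat.Type.WAVE, new File("finalsong1.wav"));
    }

    static byte[] readAll(AudioInputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) bos.write(buf, 0, n);
        return bos.toByteArray();
    }

    static short toShort(byte[] d, int i, boolean bigEndian) {
        return bigEndian ? (short) ((d[i] << 8) | (d[i + 1] & 0xFF))
                         : (short) ((d[i + 1] << 8) | (d[i] & 0xFF));
    }

    static void putShort(short v, byte[] d, int i, boolean bigEndian) {
        if (bigEndian) { d[i] = (byte) (v >> 8); d[i + 1] = (byte) v; }
        else           { d[i + 1] = (byte) (v >> 8); d[i] = (byte) v; }
    }
}

Because the samples are interleaved, mixing the corresponding 16-bit values in order automatically lines up the matching frames and channels.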

visualize microphone audio in Java

Is there a way to visualize audio in Java as a kind of waveform?
How should I start? I have already set up microphone selection and a Thread that reads the bytes from the TargetDataLine into a buffer.
But what should I do now?
Any help would be appreciated.
If you are using the Java Sound API, the data that you have read is 8- or 16-bit PCM. If it is 8-bit you can use the bytes directly; otherwise you need to take the endianness into account.
If you are reading 8-bit PCM, each byte is a sample, and the value of that byte is the sound sample. If you are reading 16-bit PCM, then the samples are packed either as hi,lo,hi,lo or lo,hi,lo,hi (where hi and lo are the high- and low-order bytes), depending on the endianness. In that case you should combine each byte pair into a short value.
For plotting you will need a 3rd-party library, such as JFreeChart or jahuwaldt.plot. (I used the latter in a real-time waveform visualization program.)
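If your line delivers 16-bit data, a small sketch (method name made up) of turning the raw bytes from your buffer into short samples you can plot:

// Convert 16-bit PCM bytes (mono) into short samples for plotting.
// 'bigEndian' should match yourLine.getFormat().isBigEndian().
static short[] toSamples(byte[] raw, int bytesRead, boolean bigEndian) {
    short[] samples = new short[bytesRead / 2];
    for (int i = 0; i < samples.length; i++) {
        int hi = raw[bigEndian ? 2 * i     : 2 * i + 1];
        int lo = raw[bigEndian ? 2 * i + 1 : 2 * i];
        samples[i] = (short) ((hi << 8) | (lo & 0xFF));   // value in [-32768, 32767]
    }
    return samples;                                        // plot against index (time)
}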

format of sound data

What is the actual low-level format of sound data when read from a stream in Java? For example, take the following data line with a 44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, big-endian format.
TargetDataLine tdLine = AudioSystem.getTargetDataLine(new AudioFormat(44100, 16, 2, true, true));
I understand that it is sampling 44100 times a second and each sample is 16 bits. What I don't understand is what the 16 bits, or each of the 16 bits, represent. Also, does each channel have its own 16-bit sample?
I'll start with your last question: yes, each channel has its own 16-bit sample for each of the 44100 frames per second.
As for your first question, it helps to know about the hardware inside a speaker. There is a diaphragm and an electromagnet. The diaphragm is the big round part you can see if you take the cover off. When the electromagnet is energized it pulls or pushes a ferrous plate that is attached to the diaphragm, causing it to move. That movement produces the sound.
The value of each sample is how much electricity is sent to the speaker. So when a sample is zero, the diaphragm is at rest. When it is positive it is pushed one way and when it is negative, the other way. The larger the sample, the more the diaphragm is moved.
If you graph all of the samples in your data, you would have a graph of the movement of the speaker over time.
You should read up on digital audio basics (Wikipedia gives you a start and lots of links for further reading). After that, "44.1 kHz sample rate, 16-bit sample depth, 2 channels, signed data, big-endian format" should immediately tell you the low-level format.
In this case it means 44100 sample frames per second, each sample a 16-bit signed integer, and the endianness determines the order in which the two bytes of each 16-bit int are put into the stream (big-endian = most significant byte first).
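As a concrete illustration of that layout (helper name made up), each 4-byte frame of this big-endian 16-bit stereo stream unpacks like this:

// Unpack frame f (4 bytes: leftHi, leftLo, rightHi, rightLo) from a buffer of
// 16-bit signed big-endian stereo PCM. Returns {left, right} as signed shorts.
static short[] unpackFrame(byte[] buf, int f) {
    int i = f * 4;                               // 2 channels * 2 bytes per sample
    short left  = (short) ((buf[i]     << 8) | (buf[i + 1] & 0xFF));
    short right = (short) ((buf[i + 2] << 8) | (buf[i + 3] & 0xFF));
    return new short[] { left, right };
}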

Audio files down-sample

I am facing a problem while working with audio files. I am implementing an algorithm that deals with audio files, and the algorithm requires the input to be a 5 kHz mono audio file.
Most of the audio files I have are PCM 44.1 kHz 16-bit stereo, so my problem is how to convert 44.1 kHz stereo files to 5 kHz mono files?
I would be grateful if anyone could provide a tutorial that explains the basics of the DSP behind the idea, or any Java libraries.
Just to augment what was already said by Prasad, you should low-pass filter the signal at 2.5 kHz before downsampling to prevent aliasing in the result. If there is some 4 kHz tone in the original signal, it can't possibly be represented at a 5 kHz sample rate, and will be folded back across the 2.5 kHz Nyquist limit, creating a false ("aliased") tone at 1.5 kHz.
See related: How to implement low pass filter using java
Also, if you're downsampling from 44100 Hz to 5000 Hz, you'll be keeping one out of every 8.82 original samples, which is not a nice integer ratio. This means you should also employ some type of interpolation, since you'll be sampling at non-integer positions in the original signal.
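A rough sketch of that fractional-step resampling (method and variable names are made up; it assumes the signal is already mono, converted to floats, and low-pass filtered at 2.5 kHz as described above):

// Resample mono float samples from srcRate to dstRate using linear interpolation.
static float[] resample(float[] in, int srcRate, int dstRate) {
    int outLen = (int) ((long) in.length * dstRate / srcRate);
    float[] out = new float[outLen];
    double step = (double) srcRate / dstRate;     // 8.82 for 44100 -> 5000
    for (int i = 0; i < outLen; i++) {
        double pos = i * step;                    // non-integer position in the source
        int i0 = (int) pos;
        int i1 = Math.min(i0 + 1, in.length - 1);
        double frac = pos - i0;
        out[i] = (float) ((1 - frac) * in[i0] + frac * in[i1]);
    }
    return out;
}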
Java Sound API (javax.sound.*) contains a lot of useful functions to manipulate sounds.
http://download.oracle.com/javase/tutorial/sound/index.html
You can find already-implemented Java code to easily downsample your audio file HERE.
In the stereo PCM I have handled, the channels are interleaved: every other 16-bit value in the PCM byte array is a data point belonging to a particular channel. So first grab every other sample to extract a mono PCM array.
As for the sample-rate downsampling, if you were to play a 44100 Hz audio file as if it were a 5000 Hz audio file, you'd have too much data, which would make it sound slowed down. So keep only every int(44100/5000)-th sample to downsample it to a 5 kHz signal.
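A naive sketch of that approach (names made up; input is assumed to be decoded interleaved 16-bit samples, and there is no filtering or interpolation, so the caveats in the previous answer still apply):

// Interleaved stereo shorts (L,R,L,R,...) at 44100 Hz -> mono shorts at ~5000 Hz,
// taking one channel (every other value) and keeping every 8th frame.
static short[] downsample(short[] stereo44k) {
    int step = 44100 / 5000;                         // int(44100/5000) = 8
    short[] mono5k = new short[stereo44k.length / 2 / step];
    for (int i = 0; i < mono5k.length; i++) {
        int frame = i * step;                        // which source frame to keep
        mono5k[i] = stereo44k[2 * frame];            // left channel of that frame
    }
    return mono5k;
}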

Create a wav with hidden binary data in it and read it (Java)

What I want to do is convert a text string into a wav file using high frequencies (18500 Hz and above): this will be the encoder.
And I want to create an engine to decode this text string from a wav-format recording, with error control, since I will obviously not be reading back the same file but rather a recording of that sound.
Thanks
An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples to a WAV file). Obviously, the simplest approach would be to simply take each byte of the source string, and store it as a sample in the WAV file (assuming an 8-bit recording. If it's a 16-bit recording, you can store two bytes per sample. If it's a stereo 16-bit recording, you can store four bytes per sample). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach but as you say, you want to be able to make a (presumably analog) recording of the sound, and then read it back into a WAV file, and still be able to read the data.
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone dial tones worked: each key generated a unique tone and sent it across the wire. The tones are long enough, and far enough apart pitch-wise that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then, generate a sine wave for length milliseconds of that frequency. This encodes a lot more redundancy than the above one-byte-per-sample approach, since each byte takes up many samples, and if you lose some samples, it doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
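A sketch of the encoding side (all names and parameter values are placeholders to experiment with, not recommendations from the answer above):

// Encode each byte as lengthMs milliseconds of a sine tone at
// baseFreq + (byte value) * freqDelta Hz, one unique tone per byte value.
static float[] encode(byte[] message, float sampleRate,
                      double baseFreq, double freqDelta, int lengthMs) {
    int samplesPerByte = (int) (sampleRate * lengthMs / 1000);
    float[] out = new float[message.length * samplesPerByte];
    int pos = 0;
    for (byte b : message) {
        double freq = baseFreq + (b & 0xFF) * freqDelta;   // tone for this byte value
        for (int i = 0; i < samplesPerByte; i++) {
            out[pos++] = (float) Math.sin(2 * Math.PI * freq * i / sampleRate);
        }
    }
    return out;     // scale to 16-bit PCM and write as a WAV to play it back
}

For decoding, split the recording into length-millisecond chunks, estimate each chunk's dominant frequency (for example with an FFT peak or a Goertzel filter per candidate frequency), and map it to the byte value with the nearest frequency.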
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).
