Combine 2 wav files as one simultaneous song - java

I have a program that records input from a mic while music is playing. When done, the voice is saved as a wav file. I would like to combine that wav file with the wav file of the song that was playing during recording. Example: song1.wav plays in the background as a person is singing. The recording of that person singing is now recording1.wav. I want to combine song1.wav and recording1.wav so they play simultaneously and become one song recording, finalsong1.wav. I am a beginner in Java and have yet to find a solution, or even a starting point, besides how to concatenate them, which is the opposite of what I'd like to do.

I am going to list the steps involved. These steps have been covered multiple times and should be straightforward to research.
For both wav files, read the data in using an AudioInputStream.
As the data arrives, convert the bytes to signed PCM. If the format is 16-bit, the PCM can be signed shorts, or scaled/normalized to floats that range from -1 to 1.
Use addition to combine the corresponding data (e.g., frame 1 right channel of the voice with frame 1 right channel of the music). If the signals are too "hot", it may be necessary to clamp the sums with min and max functions so the data does not exceed the bounds of the range, as overflow sounds really terrible when it happens.
Convert the summed, signed PCM data back to bytes according to the audio format.
Write to a wav file.
I know of tools that can mix tracks for playback, but I don't recall one that will automatically save the results to wav. One likely exists somewhere, or should be easy to write or commission.
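To make those steps concrete, here is a minimal sketch, assuming both files are 16-bit PCM with the same sample rate, channel count, and byte order; the shorter file is padded with silence, and the class and method names are only illustrative:

    import javax.sound.sampled.*;
    import java.io.*;

    public class WavMixer {

        // Mixes two wav files of the same 16-bit PCM format by summing their samples.
        public static void mixWav(File a, File b, File out) throws Exception {
            AudioInputStream inA = AudioSystem.getAudioInputStream(a);
            AudioInputStream inB = AudioSystem.getAudioInputStream(b);
            AudioFormat fmt = inA.getFormat();          // assumed identical for both files

            byte[] bytesA = inA.readAllBytes();
            byte[] bytesB = inB.readAllBytes();
            int len = Math.max(bytesA.length, bytesB.length);
            byte[] mixed = new byte[len];
            boolean big = fmt.isBigEndian();

            for (int i = 0; i + 1 < len; i += 2) {
                int sum = readSample(bytesA, i, big) + readSample(bytesB, i, big);
                // Clamp to the 16-bit range so "hot" passages don't wrap around.
                sum = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
                writeSample(mixed, i, (short) sum, big);
            }

            AudioInputStream result = new AudioInputStream(
                    new ByteArrayInputStream(mixed), fmt, len / fmt.getFrameSize());
            AudioSystem.write(result, AudioFileFormat.Type.WAVE, out);
        }

        private static int readSample(byte[] buf, int i, boolean big) {
            if (i + 1 >= buf.length) return 0;          // the shorter file contributes silence
            return big ? (short) ((buf[i] << 8) | (buf[i + 1] & 0xFF))
                       : (short) ((buf[i + 1] << 8) | (buf[i] & 0xFF));
        }

        private static void writeSample(byte[] buf, int i, short s, boolean big) {
            if (big) { buf[i] = (byte) (s >> 8); buf[i + 1] = (byte) s; }
            else     { buf[i] = (byte) s;        buf[i + 1] = (byte) (s >> 8); }
        }
    }

A call such as WavMixer.mixWav(new File("song1.wav"), new File("recording1.wav"), new File("finalsong1.wav")) would then produce the combined file.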

Related

Crossfading songs when streaming to Icecast2 in Java

Some months ago, I wrote my own stream source client in Java for streaming playlists to an Icecast2 server.
The logic is simple:
You have multiple "channels", and every channel has a playlist (in this case a folder filled with mp3 files). After a channel has started, it begins streaming by picking the first song and streaming it via HTTP to the Icecast2 server. As you can imagine, after a song ends, the next one is picked.
Here is the code which I am currently using for sending audio to icecast:
https://gist.github.com/z3ttee/e40f89b80af16715efa427ace43ed0b4
What I would like to achieve is to implement a crossfade between two songs. So when a song ends, it should fade out and fade in the next one simultaneously.
I am relatively new when it comes to working with audio in Java. What I do know is that I have to rework the way the audio is sent to Icecast. But here is the problem: I have no clue how or where to start.
If you have any idea where or how to start, feel free to share your experience.
Thank you in advance!
I think for cross-fading, you are likely going to have to use a library that works with the audio at the PCM level. If you wish to write your own mixer, the basic steps are as follows:
read the data via the input stream
using the audio format of the stream, convert the audio to PCM
as PCM, the audio values can be mixed by simple addition -- so over the course of the cross fade, ramp one side up from zero and the other down to zero
convert the audio back to the original format and stream that
A linear cross fade, i.e., one where the audio data is multiplied by factors that progress linearly from 0 to 1 or vice versa (e.g., 0.1, 0.2, 0.3, ...), will tend to leave the midpoint quieter than either track running solo. A sine function is often used instead to keep the summed volume steady.
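As a small sketch of that idea (assuming the samples are already normalized floats in the range -1 to 1; the method name and parameters are only illustrative):

    // Returns the mixed sample at frame 'pos' of a crossfade lasting 'fadeFrames' frames,
    // using an equal-power sine/cosine curve so the summed volume stays roughly steady.
    static float crossfade(float outgoingSample, float incomingSample,
                           long pos, long fadeFrames) {
        double t = Math.min(1.0, pos / (double) fadeFrames);   // progress, 0 to 1
        float outGain = (float) Math.cos(t * Math.PI / 2.0);   // ramps 1 -> 0
        float inGain  = (float) Math.sin(t * Math.PI / 2.0);   // ramps 0 -> 1
        return outGain * outgoingSample + inGain * incomingSample;
    }

A linear crossfade would simply use (1 - t) and t as the gains instead.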
There are two libraries I know of that might be helpful for mixing, but would likely require some modification. One is TinySound, the other is AudioCue (which I wrote). The modifications required for AudioCue might be relatively painless. The output of the mixer is enclosed in the class AudioMixerPlayer, a runnable that is located on line 268 of AudioMixer.java. A possible plan would be to modify the output line of this code, substituting your broadcast line for the SourceDataLine.
I should add, the songs to be played would first be loaded into the AudioCue class, which then exposes the capability of real-time volume control. But it might be necessary to tinker with the manner in which the volume commands are issued.
I'm really interested in having this work and could offer some assistance. I'm just now getting involved in projects with Socket and SocketServer, and would like to get some hands-on with streaming audio.

How can I play an audio clip in a (MIDI) sequence in Java?

I am attempting to write a very simple DAW in Java but am having trouble playing an audio clip in a sequence. I have looked into both the sampled and MIDI classes in Java Sound but what I really need is a hybrid of the two.
It seems that with the MIDI classes you cannot use a sequencer for example, to play your own audio clip.
I have attempted to write my own sequencer using scheduling to play a javax.sound.sampled.Clip in a sequence but the timings vary far too much. It is not really a viable option as it doesn't keep time.
Does anybody have any suggestions of how I could get around this?
I can attest that an audio mixing system combining aspects of MIDI and samples can be written in Java, as I wrote my own and it currently works with samples and a couple real-time synths that I also wrote.
The key is making the audio data of the samples available on a per-frame basis, and having a frame-counting command-processor/audio-mixer that both manages the execution of "commands" and collects and mixes the audio frame data. At 44100 fps, that gives timing accuracy in the vicinity of 0.02 milliseconds. I can describe this in more detail if requested.
Another way to go, probably saner, though I haven't done it personally, would be to make use of a Java bridge to a system such as Jack.
EDIT: Answering questions in comment (12/8/19).
Audio sample data in Java is usually either held in memory (Java's Clip) or read from a .wav file. Because the individual frames are not exposed by Clip, I wrote an alternative and use it to hold the data as signed floats ranging from -1 to 1. Signed floats are a common way to hold audio data that we are going to perform multiple operations upon.
For playback of .wav audio, Java combines reading the data with AudioInputStream and outputting with SourceDataLine. Your system will have to sit in the middle, intercepting the AudioInputStream, converting to PCM float frames, and counting the frames as you go.
A number of sources or tracks can be processed at the same time, and merged (simple addition of the normalized floats) to a single signal. This signal can be converted back to bytes and sent out for playback via a single SourceDataLine.
Counting output frames from an arbitrary 0th frame of the single SourceDataLine will help keep the constituent incoming tracks coordinated, and will provide the frame-number reference used to schedule any additional commands that you wish to execute prior to that frame being output (e.g., changing the volume/pan of a source, or a setting on a synth).
My personal alternate to a Clip is very similar to AudioCue which you are welcome to inspect and use. The main difference is that for better or worse, I'm processing everything one frame at a time in my system, and AudioCue and its "Mixer" process buffer loads. I've had several very credible people criticize my personal per-frame system as inefficient, so when I made the public API for AudioCue, I bowed to that preconception. [There are ways to add buffering to a per-frame system to recapture that efficiency, and per-frame makes scheduling simpler. So I'm sticking with my per-frame logical scheme.]
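To illustrate the per-frame scheme (this is not the actual AudioCue code; the Source and Command interfaces here are hypothetical, and the sketch is mono for brevity):

    import java.util.*;

    // A frame-counting mixer: each iteration advances one frame, executes any commands
    // scheduled for that frame, then sums one normalized float from each source.
    interface Source  { float nextFrame(); }     // next sample in the range -1 to 1
    interface Command { void execute(); }        // e.g., a volume or pan change

    class FrameMixer {
        private final List<Source> sources = new ArrayList<>();
        private final Map<Long, List<Command>> schedule = new HashMap<>();
        private long frameCount = 0;             // frames output since an arbitrary 0th frame

        void addSource(Source s) { sources.add(s); }

        void scheduleAt(long frame, Command c) {
            schedule.computeIfAbsent(frame, k -> new ArrayList<>()).add(c);
        }

        // Fills 'buffer' with mixed frames; the caller converts them to bytes
        // and writes them to a single SourceDataLine.
        void read(float[] buffer) {
            for (int i = 0; i < buffer.length; i++) {
                List<Command> due = schedule.remove(frameCount);
                if (due != null) for (Command c : due) c.execute();

                float sum = 0;
                for (Source s : sources) sum += s.nextFrame();
                buffer[i] = Math.max(-1f, Math.min(1f, sum));   // clamp the summed signal
                frameCount++;
            }
        }
    }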
No, you can't use a sequencer to play your own clips directly.
In the MIDI world, you have to deal with samples, instruments, and soundbanks.
Very quickly, a sample is the audio data plus information such as looping points, the note range covered by the sample, base volume and envelopes, etc.
An instrument is a set of samples, and a soundbank contains a set of instruments.
If you want to use your own sounds to play some music, you must make a soundbank out of them.
You will also need to use an implementation other than the default provided by Java, because that default only reads soundbanks in a proprietary format which has been gone for at least 15, perhaps even 20, years.
Back in 2008-2009 there was, for example, Gervill, which was able to read SF2 and DLS soundbanks. SF2 and DLS are two popular soundbank formats; several programs exist on the market, free or paid, to edit them.
If you want to go the other way round, starting with the sampled classes, then as you have noticed, you can't rely on timers, task scheduling, Thread.sleep and the like to have enough precision.
The best precision you can achieve with those is around 10 ms, which is of course far too coarse to be acceptable for music.
The usual way to go here is to generate the audio of your music by mixing your audio clips yourself into the final clip. That way you can achieve frame precision.
In fact, that's very roughly what a MIDI synthesizer does.
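A rough sketch of that mixing-into-a-final-clip approach, assuming mono clips already loaded as normalized float arrays and start positions given in frames (all names here are made up for illustration):

    // Renders pre-loaded clips into one output buffer at frame-accurate positions
    // by simple addition, then clamps the result to the -1..1 range.
    static float[] renderSequence(float[][] clips, long[] startFrames, long totalFrames) {
        float[] out = new float[(int) totalFrames];
        for (int c = 0; c < clips.length; c++) {
            long start = startFrames[c];
            for (int i = 0; i < clips[c].length && start + i < totalFrames; i++) {
                out[(int) (start + i)] += clips[c][i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(-1f, Math.min(1f, out[i]));
        }
        return out;
    }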

Matching two audio files using FFT (Android Studio)

I've been working on a part of my app for the past few days where I need to simultaneously play and record an audio file. The task I need to accomplish is just to compare the recording to the audio file played and return a matching percentage. Here's what I have done so far and some context to my questions:
The target API is >15
I decided to use a .wav audio file format to simplify decoding the file
I'm using AudioRecord for recording and MediaPlayer for playing the audio file
I created a decoder class in order to pass in my audio file and convert it to PCM in order to perform the matching analysis
I'm using the following specs for the recording AudioFormat (CHANNEL_MONO, 16 BIT, SAMPLE_RATE = 44100)
After I pass the audio file to the decoder, I then proceed to pass it to an FFT class in order to get the frequency domain data needed for my analysis.
And below are a few questions that I have:
When I record the audio using AudioRecord, is the format PCM by default or do I need to specify this somehow?
I'm trying to pass the recording to the FFT class in order to acquire the frequency domain data to perform my matching analysis. Is there a way to do this without saving the recording on the user's device?
After performing the FFT analysis on both files, do I need to store the data in a text file in order to perform the matching analysis? What are some options or possible ways to do this?
After doing a fair amount of research, all the sources that I found cover how to match the recording with a song/music contained within a data base. My goal is to see how closely two specific audio files match, how would I go about this? - Do I need to create/use hash functions in order to accomplish my goal? A detailed answer to this would be really helpful
Currently I have a separate thread for recording; separate activity for decoding the audio file; separate activity for the FFT analysis. I plan to run the matching analysis in a separate thread as well or an AsyncTask. Do you think this structure is optimal or is there a better way to do it? Also, should I pass my audio file to the decoder in a separate thread as well or can I do it in the recording thread or MatchingAnalysis thread?
Do I need to perform windowing in my operations on audio files before I can do matching comparison?
Do I need to decode the .wav file or can I just compare 2 .wav files directly instead?
Do I need to perform low-pitching operations on audio files before comparison?
In order to perform my matching comparison, what data exactly do I need to generate (power spectrum, energy spectrum, spectrogram etc)?
Am I going about this the right way or am I missing something?
In apps like Shazam and Midomi, audio matching is done using a technique called audio fingerprinting, which uses a spectrogram and hashing.
Your first step of computing the FFT is correct, but then you will need to build a 2D map of frequency against time, called a spectrogram.
This spectrogram array contains more than a million values, and we can't work on that much data directly. So we find peaks in the amplitudes. A peak is a (time, frequency) pair corresponding to an amplitude value that is the greatest in a local neighborhood around it. Peak finding is a computationally expensive process, and different apps or projects do it in different ways. We use peaks because they are less sensitive to background noise.
Now, different songs can have the same peaks, but the difference lies in the order and timing of their occurrence. So we combine these peaks into unique hashes and save them in a database.
Perform the above process for each audio file you want your app to recognise, and match recordings against your database. Matching is not simple, though: the time offset also has to be taken into account, because the recording can start at any instant while we have the fingerprint of the full song. This is not a problem, however, because the fingerprint stores relative time differences.
It is a somewhat detailed process, and you can find more explanation in this paper: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
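As a very rough sketch of the peak-hashing step (the Peak class and constants are hypothetical, and the FFT and peak-picking are left to your existing FFT class; a real matcher would also check that the time offsets of matching hashes are consistent, as described in the paper above):

    import java.util.*;

    // A spectrogram peak: the frame (time) index and frequency bin of a local maximum.
    class Peak {
        final int frame, bin;
        Peak(int frame, int bin) { this.frame = frame; this.bin = bin; }
    }

    class Fingerprinter {

        // Combine each peak (anchor) with a few following peaks (targets) into hashes.
        // Each hash encodes (anchorBin, targetBin, timeDelta); absolute time is not stored,
        // so a recording that starts mid-song can still match.
        static Set<Long> hashes(List<Peak> peaks, int fanOut) {
            Set<Long> result = new HashSet<>();
            for (int i = 0; i < peaks.size(); i++) {
                for (int j = i + 1; j <= i + fanOut && j < peaks.size(); j++) {
                    long dt = peaks.get(j).frame - peaks.get(i).frame;
                    long hash = ((long) peaks.get(i).bin << 40)
                              | ((long) peaks.get(j).bin << 20)
                              | (dt & 0xFFFFF);
                    result.add(hash);
                }
            }
            return result;
        }

        // Crude similarity score: the percentage of the recording's hashes
        // that also appear in the reference file's hashes.
        static double matchPercent(Set<Long> reference, Set<Long> recording) {
            if (recording.isEmpty()) return 0;
            long hits = recording.stream().filter(reference::contains).count();
            return 100.0 * hits / recording.size();
        }
    }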
There are some libraries that can do it for you: dejavu (https://github.com/worldveil/dejavu) and Chromaprint (it's in C++). Musicg by Google is in Java, but it doesn't perform well with background noise.
Matching two audio files is a complicated process, and like the comments above, I would also advise you to try it first on a PC and then on phones.

Active Array of Streaming Audio Amplitude

I was wondering if anyone knew how to convert continuous input from the mic of an Android device into a byte array, or time-amplitude coordinates. What I want to do is get an array of data so that
array[time]=amplitude
This must be active, which is one of the major obstacles in my path, as most audio waveform graphers rely on closed files. Can anyone guide me in the right direction?
Do you have any special requirements for what time is supposed to be? A PCM stream (which is what you get when using the AudioRecord class) is by definition a digital representation of the input signal's amplitude sampled at regular intervals.
So if you record at 48 kHz mono, sample N in the array of PCM data that you read from the AudioRecord will represent the audio signal's amplitude at time N * 20.83 us.
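A minimal sketch of that with AudioRecord (48 kHz, 16-bit mono; keepRecording() stands in for whatever stop condition you use, and the RECORD_AUDIO permission is assumed to be granted):

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    // Reads live 16-bit mono PCM from the mic; sample index N since the start of
    // recording corresponds to time N / sampleRate seconds (20.83 us per sample at 48 kHz).
    void captureAmplitudes() {
        final int sampleRate = 48000;
        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);
        short[] buffer = new short[minBuf / 2];
        long framesSoFar = 0;                        // running sample count since start

        recorder.startRecording();
        while (keepRecording()) {                    // your own stop condition
            int n = recorder.read(buffer, 0, buffer.length);   // blocking read of live audio
            for (int i = 0; i < n; i++) {
                double timeSeconds = (framesSoFar + i) / (double) sampleRate;
                short amplitude = buffer[i];         // effectively array[time] = amplitude
                // ... hand (timeSeconds, amplitude) to your grapher here ...
            }
            framesSoFar += n;
        }
        recorder.stop();
        recorder.release();
    }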

Create a wav with hidden binary data in it and read it (Java)

What I want to do is convert a text string into a wav file using high frequencies (18500 Hz+): this will be the encoder.
And create an engine to decode this text string from a wav-formatted recording, with support for error control, since obviously I will not read back the same file but a recording of that sound.
Thanks
An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples to a WAV file). Obviously, the simplest approach would be to take each byte of the source string and store it as a sample in the WAV file (assuming an 8-bit recording; if it's a 16-bit recording, you can store two bytes per sample, and if it's a stereo 16-bit recording, four bytes per frame). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach, but as you say, you want to be able to make a (presumably analog) recording of the sound, then read it back into a WAV file, and still be able to read the data.
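For what it's worth, that naive byte-per-sample approach might look like this (8-bit mono, names only illustrative); it only survives a bit-exact copy of the file, not an analog re-recording:

    import javax.sound.sampled.*;
    import java.io.*;
    import java.nio.charset.StandardCharsets;

    // Each byte of the message becomes one 8-bit unsigned PCM sample.
    public class ByteWav {

        public static void encode(String message, File out) throws IOException {
            byte[] data = message.getBytes(StandardCharsets.UTF_8);
            AudioFormat fmt = new AudioFormat(8000f, 8, 1, false, false);  // 8-bit unsigned mono
            AudioInputStream ais = new AudioInputStream(
                    new ByteArrayInputStream(data), fmt, data.length);
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }

        public static String decode(File in) throws Exception {
            AudioInputStream ais = AudioSystem.getAudioInputStream(in);
            return new String(ais.readAllBytes(), StandardCharsets.UTF_8);
        }
    }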
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone dial tones worked: each key generated a unique tone and sent it across the wire. The tones are long enough, and far enough apart pitch-wise that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then, generate a sine wave for length milliseconds of that frequency. This encodes a lot more redundancy than the above one-byte-per-sample approach, since each byte takes up many samples, and if you lose some samples, it doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
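Here is a sketch of that tone-per-byte scheme (the constants are placeholders to tune, no redundancy or error correction is included, and the decoder estimates each burst's frequency with a Goertzel filter rather than a full FFT):

    // Byte value b is sent as a sine burst at BASE_FREQ + b * FREQ_DELTA Hz lasting TONE_MS ms.
    // The decoder tests all 256 candidate frequencies over each burst and picks the strongest.
    public class ToneCodec {
        static final float  SAMPLE_RATE = 44100f;
        static final double BASE_FREQ   = 1000.0;   // frequency of byte value 0
        static final double FREQ_DELTA  = 50.0;     // spacing between adjacent byte values
        static final int    TONE_MS     = 100;      // duration of each tone

        // Generate one normalized float buffer containing a tone per byte of the message.
        static float[] encode(byte[] message) {
            int perTone = (int) (SAMPLE_RATE * TONE_MS / 1000);
            float[] out = new float[message.length * perTone];
            for (int b = 0; b < message.length; b++) {
                double freq = BASE_FREQ + (message[b] & 0xFF) * FREQ_DELTA;
                for (int i = 0; i < perTone; i++) {
                    out[b * perTone + i] =
                            (float) Math.sin(2 * Math.PI * freq * i / SAMPLE_RATE);
                }
            }
            return out;
        }

        // Decode one tone-sized window back to a byte.
        static byte decodeTone(float[] window) {
            int best = 0;
            double bestPower = -1;
            for (int b = 0; b < 256; b++) {
                double p = goertzelPower(window, BASE_FREQ + b * FREQ_DELTA);
                if (p > bestPower) { bestPower = p; best = b; }
            }
            return (byte) best;
        }

        // Goertzel algorithm: relative power of a single frequency within the window.
        static double goertzelPower(float[] window, double freq) {
            double coeff = 2 * Math.cos(2 * Math.PI * freq / SAMPLE_RATE);
            double s0, s1 = 0, s2 = 0;
            for (float sample : window) {
                s0 = sample + coeff * s1 - s2;
                s2 = s1;
                s1 = s0;
            }
            return s1 * s1 + s2 * s2 - coeff * s1 * s2;
        }
    }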
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).
