I am developing a Java voice chat for a game, and I have a problem with the audio mix when several players are talking at the same time. The audio is only sent to nearby players, so the server sends the speaker's id along with each voice packet and the client stores each user's buffers separately. To play the audio back, I go through the list of users and check the buffers of the existing users. However, I am probably mixing the audio incorrectly. How should I mix these audio packets? The audio is 16-bit PCM. When several players talk together, there is a lot of noise/hiss and the audio is practically unintelligible.
What would be the correct algorithm to apply to this mixer?
Based on my limited experience with microphone inputs, my starting point would be to try the following steps:
(1) convert 16-bit bytes to PCM
(2) (consider applying a low-pass filter to the PCM, and possibly volume gain)
(3) add the PCM values together from each line being mixed
(4) convert PCM back to bytes
I can't tell from your description if you are doing step 1 correctly.
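The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming 16-bit signed little-endian mono PCM and buffers of equal length; the class and method names are hypothetical. Summing the raw bytes instead of the decoded samples, or decoding with the wrong byte order, tends to produce exactly this kind of noise, so the byte-to-sample conversion is worth double-checking. The sum is clamped so that loud passages clip instead of wrapping around.

```java
class PcmMixer {
    /** Mixes equal-length buffers of 16-bit signed little-endian PCM (assumed format). */
    static byte[] mix(byte[][] lines) {
        int length = lines[0].length;
        byte[] out = new byte[length];
        for (int i = 0; i + 1 < length; i += 2) {
            int sum = 0;
            for (byte[] line : lines) {
                // Combine low and high byte into one signed 16-bit sample.
                int sample = (short) ((line[i] & 0xFF) | (line[i + 1] << 8));
                sum += sample;
            }
            // Clamp to the 16-bit range so overflow clips instead of wrapping.
            sum = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
            // Convert the mixed sample back to two little-endian bytes.
            out[i] = (byte) sum;
            out[i + 1] = (byte) (sum >> 8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {0x10, 0x27}; // one sample with value 10000
        byte[] b = {0x10, 0x27}; // one sample with value 10000
        byte[] mixed = mix(new byte[][] {a, b});
        int value = (short) ((mixed[0] & 0xFF) | (mixed[1] << 8));
        System.out.println(value);
    }
}
```

If the mix clips often, scaling each line down before the addition (the optional gain in step 2) is a simple thing to try.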
A possible place for further research might be jitsi.org. Their service is written in Java and is open source. It would be interesting to know how they handle this. But it seems to me the most usual thing is that only one line is selected and played at any one time. There may be good reasons for this limitation, but I don't know if it's technical (e.g., the noise accumulates in a way that drowns out the voices) or if it's just that people talking at the same time create mass confusion quite easily. I suppose there may be echo/feedback considerations as well. I will be looking forward to seeing what information other people might contribute on this.
Some months ago, I wrote my own stream source client in Java for streaming playlists to an Icecast2 server.
The logic is simple:
You have multiple "Channels" and every channel has a playlist (in this case a folder filled with mp3 files). After a channel has started, it begins streaming by picking the first song and streaming it via HTTP to the Icecast2 server. As you can imagine, after a song has ended, the next one is picked.
Here is the code which I am currently using for sending audio to icecast:
https://gist.github.com/z3ttee/e40f89b80af16715efa427ace43ed0b4
What I would like to achieve is to implement a crossfade between two songs. So when a song ends, it should fade out and fade in the next one simultaneously.
I am relatively new when it comes to working with audio in Java. What I do know is that I have to rework the way the audio is sent to Icecast. But here is the problem: I have no clue how or where to start.
If you have any idea where or how to start, feel free to share your experience.
Thank you in advance!
I think for cross-fading, you are likely going to have to use a library that works with the audio at the PCM level. If you wish to write your own mixer, the basic steps are as follows:
read the data via the input stream
using the audio format of stream, convert the audio to pcm
as pcm, the audio values can be mixed by simple addition -- so over the course of the cross fade, ramp one side up from zero and the other down to zero
convert the audio back to the original format and stream that
A linear cross fade, where the audio data is multiplied by factors that progress linearly from 0 to 1 or vice versa (e.g., 0.1, 0.2, 0.3,...), will tend to leave the midpoint quieter than either track played solo. A sine function is often used instead to keep the combined volume steady.
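As a rough illustration of the sine-based ramp, here is a minimal sketch of an equal-power cross fade. It assumes both tracks have already been converted to normalized floats in the range -1 to 1 and that the two arrays cover only the overlapping fade region; the class and method names are hypothetical.

```java
class CrossFade {
    /**
     * Mixes the tail of the outgoing track with the head of the incoming
     * track. Gains follow cos/sin curves, so gainOut^2 + gainIn^2 == 1.
     */
    static float[] equalPower(float[] fadeOut, float[] fadeIn) {
        int n = Math.min(fadeOut.length, fadeIn.length);
        float[] mixed = new float[n];
        for (int i = 0; i < n; i++) {
            // t runs from 0 at the start of the fade to 1 at the end.
            double t = (n == 1) ? 1.0 : (double) i / (n - 1);
            double gainOut = Math.cos(t * Math.PI / 2);
            double gainIn = Math.sin(t * Math.PI / 2);
            mixed[i] = (float) (fadeOut[i] * gainOut + fadeIn[i] * gainIn);
        }
        return mixed;
    }

    public static void main(String[] args) {
        float[] out = {1f, 1f, 1f};
        float[] in = {0f, 0f, 0f};
        float[] mixed = equalPower(out, in);
        System.out.println(mixed[0] + " " + mixed[1] + " " + mixed[2]);
    }
}
```

Note that at the midpoint both gains are about 0.707, so two very similar signals can briefly sum above full scale; clamping or a little headroom may be needed before converting back to bytes.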
There are two libraries I know of that might be helpful for mixing, but would likely require some modification. One is TinySound, the other is AudioCue (which I wrote). The modifications required for AudioCue might be relatively painless. The output of the mixer is enclosed in the class AudioMixerPlayer, a runnable that is located on line 268 of AudioMixer.java. A possible plan would be to modify the output line of this code, substituting your broadcast line for the SourceDataLine.
I should add, the songs to be played would first be loaded into the AudioCue class, which then exposes the capability of real-time volume control. But it might be necessary to tinker with the manner in which the volume commands are issued.
I'm really interested in having this work and could offer some assistance. I'm just now getting involved in projects with Socket and SocketServer, and would like to get some hands-on with streaming audio.
I am attempting to write a very simple DAW in Java but am having trouble playing an audio clip in a sequence. I have looked into both the sampled and MIDI classes in Java Sound but what I really need is a hybrid of the two.
It seems that with the MIDI classes you cannot use a sequencer for example, to play your own audio clip.
I have attempted to write my own sequencer using scheduling to play a javax.sound.sampled.Clip in a sequence but the timings vary far too much. It is not really a viable option as it doesn't keep time.
Does anybody have any suggestions of how I could get around this?
I can attest that an audio mixing system combining aspects of MIDI and samples can be written in Java, as I wrote my own and it currently works with samples and a couple real-time synths that I also wrote.
The key is making the audio data of the samples available on a per-frame basis and a frame-counting command-processor/audio-mixer that both manages the execution of "commands," and collects and mixes the audio frame data. With 44100 fps, that's accuracy in the vicinity of 0.02 milliseconds. I can describe in more detail if requested.
Another way to go, probably saner, though I haven't done it personally, would be to make use of a Java bridge to a system such as Jack.
EDIT: Answering questions in comment (12/8/19).
Audio sample data in Java is usually either held in memory (Java uses Clip) or read from a .wav file. Because the individual frames are not exposed by Clip, I wrote an alternate, and use it to hold the data as signed floats ranging -1 to 1. Signed floats are a common way to hold audio data that we are going to perform multiple operations upon.
For playback of .wav audio, Java combines reading the data with AudioInputStream and outputting with SourceDataLine. Your system will have to sit in the middle, intercepting the AudioInputStream, converting to PCM float frames, and counting the frames as you go.
A number of sources or tracks can be processed at the same time, and merged (simple addition of the normalized floats) to a single signal. This signal can be converted back to bytes and sent out for playback via a single SourceDataLine.
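A minimal sketch of that merge step, assuming mono tracks of equal length held as normalized floats and 16-bit signed little-endian output (the class and method names are hypothetical, not part of AudioCue or Java Sound):

```java
class FloatMixer {
    /** Sums equal-length tracks of normalized floats and clamps to -1..1. */
    static float[] mix(float[][] tracks) {
        float[] out = new float[tracks[0].length];
        for (float[] track : tracks) {
            for (int i = 0; i < out.length; i++) {
                out[i] += track[i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(-1f, Math.min(1f, out[i]));
        }
        return out;
    }

    /** Converts normalized floats to 16-bit signed little-endian bytes. */
    static byte[] toPcm16(float[] samples) {
        byte[] out = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            int v = Math.round(samples[i] * 32767f);
            out[2 * i] = (byte) v;            // low byte
            out[2 * i + 1] = (byte) (v >> 8); // high byte
        }
        return out;
    }

    public static void main(String[] args) {
        float[] mixed = mix(new float[][] {{0.25f, 0.5f}, {0.25f, 0.75f}});
        byte[] bytes = toPcm16(mixed);
        System.out.println(mixed[0] + " " + mixed[1] + " -> " + bytes.length + " bytes");
    }
}
```

The resulting byte array is what would be passed to SourceDataLine.write(), provided the line was opened with a matching AudioFormat.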
Counting output frames from an arbitrary 0th frame from the single SourceDataLine will help with keeping constituent incoming tracks coordinated, and will provide the frame number reference used to schedule any additional commands that you wish to execute prior to that frame being output (e.g., changing a volume/pan of a source, or a setting on a synth).
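To make the frame-counting idea more concrete, here is a small sketch of a command queue keyed by output frame number. It is not the author's actual system; the class, the Command type, and the single gain setting are all hypothetical, and it assumes a mono float buffer.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class FrameScheduler {
    /** Hypothetical command: an action to run once a given frame is reached. */
    static final class Command {
        final long frame;
        final Runnable action;

        Command(long frame, Runnable action) {
            this.frame = frame;
            this.action = action;
        }
    }

    private final PriorityQueue<Command> queue =
            new PriorityQueue<>(Comparator.comparingLong((Command c) -> c.frame));
    private long frameCount = 0; // frames output since the arbitrary 0th frame
    private float gain = 1f;

    void schedule(long frame, Runnable action) {
        queue.add(new Command(frame, action));
    }

    void setGain(float newGain) {
        gain = newGain;
    }

    /** Processes one buffer in place, running due commands before each frame. */
    void process(float[] buffer) {
        for (int i = 0; i < buffer.length; i++) {
            while (!queue.isEmpty() && queue.peek().frame <= frameCount) {
                queue.poll().action.run();
            }
            buffer[i] *= gain;
            frameCount++;
        }
    }

    public static void main(String[] args) {
        FrameScheduler scheduler = new FrameScheduler();
        scheduler.schedule(2, () -> scheduler.setGain(0f)); // mute from frame 2 on
        float[] buffer = {1f, 1f, 1f, 1f};
        scheduler.process(buffer);
        System.out.println(java.util.Arrays.toString(buffer));
    }
}
```

Because the counter keeps running across calls to process(), a command lands on the same frame regardless of where the buffer boundaries happen to fall.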
My personal alternate to a Clip is very similar to AudioCue which you are welcome to inspect and use. The main difference is that for better or worse, I'm processing everything one frame at a time in my system, and AudioCue and its "Mixer" process buffer loads. I've had several very credible people criticize my personal per-frame system as inefficient, so when I made the public API for AudioCue, I bowed to that preconception. [There are ways to add buffering to a per-frame system to recapture that efficiency, and per-frame makes scheduling simpler. So I'm sticking with my per-frame logical scheme.]
No, you can't use a sequencer to play your own clips directly.
In the MIDI world, you have to deal with samples, instruments, and soundbanks.
Very quickly, a sample is the audio data plus information such as looping points, the note range covered by the sample, base volume and envelopes, etc.
An instrument is a set of samples, and a soundbank contains a set of instruments.
If you want to use your own sounds to play some music, you must make a soundbank out of them.
You will also need to use an implementation other than the default provided by Java, because that default only reads soundbanks in a proprietary format that has been gone for at least 15 and perhaps even 20 years.
Back in 2008-2009, there existed for example Gervill. It was able to read SF2 and DLS soundbanks. SF2 and DLS are two popular soundbank formats, several programs exist in the market, free or paid, to edit them.
If you want to go the other way round, starting with sampled audio, you are also right: as you have noticed, you can't rely on timers, task scheduling, Thread.sleep and the like to get enough precision.
The best precision you can achieve with those is around 10 ms, which is of course far too coarse to be acceptable for music.
The usual way to go here is to generate the audio of your music by mixing your audio clips yourself into the final clip. So you can achieve frame precision.
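A minimal sketch of that approach, assuming mono clips already decoded to normalized floats (the class and method names are hypothetical). Each clip is added into one output buffer at a start position expressed in frames, so the timing depends only on arithmetic and not on any timer.

```java
class ClipSequencer {
    /** Converts a time in seconds to a frame index at the given sample rate. */
    static int framesAt(double seconds, float sampleRate) {
        return (int) Math.round(seconds * sampleRate);
    }

    /** Adds each clip into the output buffer starting at its frame offset. */
    static float[] render(float[][] clips, int[] startFrames, int totalFrames) {
        float[] out = new float[totalFrames];
        for (int c = 0; c < clips.length; c++) {
            for (int i = 0; i < clips[c].length; i++) {
                int pos = startFrames[c] + i;
                if (pos >= totalFrames) {
                    break; // clip runs past the end of the output
                }
                out[pos] += clips[c][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] clips = {{0.5f, 0.5f}, {0.25f}};
        int[] starts = {0, 1};
        float[] mix = render(clips, starts, 3);
        System.out.println(java.util.Arrays.toString(mix));
        System.out.println(framesAt(0.5, 44100f));
    }
}
```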
In fact, that's very roughly what a MIDI synthesizer does.
I've gone through the tutorials for the Java Sound API and I've successfully read off data from my microphone.
I would now like to go a step further and get data synchronously from multiple microphones in a microphone array (like a PS3 Eye or Respeaker)
I could get a TargetDataLine for each microphone and open/start/write the input to buffers - but I don't know how to do this in a way that will give me data that I can then line up time-wise (I would like to eventually do beamforming)
When reading from something like ALSA I would get the bytes from the different microphones simultaneously, so I know that each byte from each microphone is from the same time instant. But the Java Sound API seems to have an abstraction that obscures this, because you are just reading data out of separate line buffers and processing it, and each line acts separately. You don't interact with the whole device/mic-array at once.
However, I've found someone who managed to do beamforming in Java with the Kinect 1.0, so I know it should be possible. The problem is that the secret sauce is inside a custom Mixer object inside a .jar that was pulled out of some other software, so I don't have any easy way to figure out how they pulled it off.
You will only be able to align data from multiple sources with the time synchronous accuracy to perform beam-forming if this is supported by the underlying hardware drivers.
If the underlying hardware provides you with multiple, synchronised, data-streams (e.g. recording in 2 channels - in stereo), then your array data will be time synchronised.
If you are relying on the OS to simply provide you with two independent streams, then maybe you can rely on timestamping. Do you get the timestamp of the first element? If so, then you can re-align data by dropping samples based on your sample rate. There may be a final difference (delta-t) that you will have to factor in to your beam-forming algorithm.
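The arithmetic behind that re-alignment can be sketched as below. This assumes you somehow obtain a start timestamp in microseconds for each stream, which Java Sound does not necessarily provide; the class and method names are hypothetical.

```java
class StreamAligner {
    /** Whole frames by which the second stream starts later than the first. */
    static long frameOffset(long firstStartMicros, long secondStartMicros, float sampleRate) {
        double deltaMicros = secondStartMicros - firstStartMicros;
        return Math.round(deltaMicros * sampleRate / 1_000_000.0);
    }

    /** Remaining sub-frame difference (delta-t) in microseconds after dropping whole frames. */
    static double residualMicros(long firstStartMicros, long secondStartMicros, float sampleRate) {
        long frames = frameOffset(firstStartMicros, secondStartMicros, sampleRate);
        double deltaMicros = secondStartMicros - firstStartMicros;
        return deltaMicros - frames * 1_000_000.0 / sampleRate;
    }

    public static void main(String[] args) {
        // Second stream started 1010 microseconds after the first, at 48 kHz.
        System.out.println(frameOffset(0, 1010, 48000f));
        System.out.println(residualMicros(0, 1010, 48000f));
    }
}
```

Dropping frameOffset frames from the start of the earlier stream lines the two up to within one frame; the residual is the delta-t left for the beam-forming algorithm to account for.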
Reading about the PS3 Eye (which has an array of microphones), you will be able to do this if the audio driver provides all the channels at once.
For Java, this probably means "Can you open the line with an AudioFormat that includes 4 channels?" If yes, then each frame you read will contain one sample per channel, and the decoded frame data will (almost certainly) be time aligned.
To quote the Java docs : "A frame contains the data for all channels at a particular time".
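A hedged sketch of how that check might look with the standard Java Sound classes. The 16 kHz sample rate and 16-bit little-endian format are assumptions, not known properties of the PS3 Eye, and the class and method names are hypothetical. Whether the line is reported as supported depends entirely on the driver and on the mixers installed on the machine.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

class MultiChannelCapture {
    /** Assumed format: 16 kHz, 16-bit, 4 channels, signed, little-endian. */
    static AudioFormat fourChannelFormat() {
        return new AudioFormat(16000f, 16, 4, true, false);
    }

    /** Extracts one sample per channel from an interleaved 16-bit frame. */
    static short[] decodeFrame(byte[] buffer, int frameIndex, int channels) {
        short[] samples = new short[channels];
        int base = frameIndex * channels * 2;
        for (int ch = 0; ch < channels; ch++) {
            int i = base + ch * 2;
            samples[ch] = (short) ((buffer[i] & 0xFF) | (buffer[i + 1] << 8));
        }
        return samples;
    }

    public static void main(String[] args) {
        AudioFormat format = fourChannelFormat();
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        System.out.println("Frame size in bytes: " + format.getFrameSize());
        System.out.println("4-channel capture supported: " + AudioSystem.isLineSupported(info));
    }
}
```

If the line is supported, every frame read from the TargetDataLine carries all four channel samples for the same instant, which is what decodeFrame relies on.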
IDK what "beamforming" is, but if there is hardware that can provide synchronization, using that would obviously be the best solution.
Here, for what it is worth, is what should be a plausible algorithmic way to manage synchronization.
(1) Set up a frame counter for each TargetDataLine. You will have to convert bytes to PCM as part of this process.
(2) Set up some code to monitor the volume level on each line, some sort of RMS algorithm I would assume, on the PCM data.
(3) Create a loud, instantaneous burst that reaches each microphone at the same time, one that the RMS algorithm is able to detect and to give the frame count for the onset.
(4) Adjust the frame counters as needed, and reference them going forward on each line of incoming data.
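Steps (2) and (3) might look something like the sketch below. It assumes the PCM has already been converted to normalized floats, and the window size and threshold are arbitrary values you would have to tune; the class and method names are hypothetical.

```java
class OnsetDetector {
    /** Root-mean-square level of samples[start .. start + length - 1]. */
    static double rms(float[] samples, int start, int length) {
        double sumSquares = 0;
        for (int i = start; i < start + length; i++) {
            sumSquares += samples[i] * samples[i];
        }
        return Math.sqrt(sumSquares / length);
    }

    /**
     * Returns the frame index where the burst begins, or -1 if no window
     * reaches the threshold. The first loud window is located by RMS, then
     * the first sample in it at or above the threshold is taken as the onset.
     */
    static long findOnset(float[] samples, int window, double threshold) {
        for (int start = 0; start + window <= samples.length; start += window) {
            if (rms(samples, start, window) >= threshold) {
                for (int i = start; i < start + window; i++) {
                    if (Math.abs(samples[i]) >= threshold) {
                        return i;
                    }
                }
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] line = new float[16];
        line[9] = 0.9f;
        line[10] = -0.9f;
        line[11] = 0.9f;
        System.out.println(findOnset(line, 4, 0.5));
    }
}
```

Running this on each line and subtracting the onset frames gives the per-line offset to apply to the frame counters in step (4).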
Rationale: Java doesn't offer real-time guarantees, as explained in this article on real-time, low latency audio processing. But in my experience, the correspondence between the byte data and time (per the sample rate) is very accurate on lines closest to where Java interfaces with external audio services.
How long would frame counting remain accurate without drifting? I have never done any tests to research this. But on a practical level, I have coded a fully satisfactory "audio event" scheduler based on frame-counting, for playing multipart scores via real-time synthesis (all done with Java), and the timing is impeccable for the longest compositions attempted (6-7 minutes in length).
Android provides a default of 15 steps for its sound systems which you can access through Audio Manager. However, I would like to have finer control.
One method of doing so seems to be altering specific files within the Android system to divide the sound levels even further than the default. I would like to achieve the same effect programmatically using Java.
By fine volume control I mean, for example, the app being able to divide the sound levels into one hundred distinct intervals. How do I achieve this?
One way, in Java, to get very precise volume adjustment is to access the PCM data directly and multiply it by some factor, usually from 0 up to 1. Another is to try and access the line's volume control, if it has one. I've given up trying to do the latter. The precision is okay in terms of amplitude, but the timing is terrible. One can only have one volume change per audio buffer read.
To access the PCM data directly, one has to iterate through the audio read buffer, translate the bytes into PCM, perform the multiplication then translate back to bytes. But this gives you per-frame control, so very smooth and fast fades can be made.
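A minimal sketch of that loop, assuming the buffer holds 16-bit signed little-endian mono PCM (the class and method names are hypothetical). The gain is interpolated per frame across the buffer, which is what allows a smooth fade instead of one step per buffer.

```java
class PcmGain {
    /** Scales the buffer in place, ramping the gain linearly from startGain to endGain. */
    static void applyGain(byte[] audioBytes, float startGain, float endGain) {
        int frames = audioBytes.length / 2;
        for (int f = 0; f < frames; f++) {
            int i = f * 2;
            // Bytes to PCM: low byte first, then the signed high byte.
            int sample = (short) ((audioBytes[i] & 0xFF) | (audioBytes[i + 1] << 8));
            float t = frames > 1 ? (float) f / (frames - 1) : 1f;
            float gain = startGain + (endGain - startGain) * t;
            int scaled = Math.round(sample * gain);
            scaled = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
            // PCM back to bytes.
            audioBytes[i] = (byte) scaled;
            audioBytes[i + 1] = (byte) (scaled >> 8);
        }
    }

    public static void main(String[] args) {
        byte[] audioBytes = {0x10, 0x27, 0x10, 0x27}; // two samples of 10000
        applyGain(audioBytes, 1f, 0f);                // fade out over the buffer
        System.out.println(java.util.Arrays.toString(audioBytes));
    }
}
```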
EDIT: To do this in Java, first check out the sample code snippet at the start of this java tutorial link, in particular, the section with the comment
// Here, do something useful with the audio data that's now in the audioBytes array...
There are several StackOverflow questions that show code for the math to convert audio bytes to PCM and back, using Java. Should not be hard to uncover with a search.
Pretty late to the party, but I'm currently trying to solve this issue as well. IF you are making your own media player app and are running an instance of a MediaPlayer, then you can use the function setVolume(leftScalar, rightScalar), where leftScalar and rightScalar are floats in the range 0.0 to 1.0 giving the volume for the left and right channel respectively. These are raw scalars, so a UI control should be mapped onto them logarithmically.
HOWEVER, this means that you must have a reference to the currently active MediaPlayer instance. If you are making a music app, no biggie. If you're trying to run a background service that allows users to give higher precision over all media output, I'm not sure how to use this in that scenario.
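For the hundred-step case, one possible mapping from a step number to a scalar is sketched below in plain Java, with no Android dependency. It treats the steps as evenly spaced in decibels over an assumed 60 dB range, which is a design choice of this sketch and not something Android prescribes; the class and method names are hypothetical.

```java
class FineVolume {
    /** Assumed attenuation at the lowest non-zero step, in decibels. */
    static final double RANGE_DB = 60.0;

    /** Maps step 0..maxSteps to a gain scalar 0.0..1.0, evenly spaced in dB. */
    static float scalarForStep(int step, int maxSteps) {
        if (step <= 0) {
            return 0f; // treat the bottom step as mute
        }
        if (step >= maxSteps) {
            return 1f;
        }
        double db = -RANGE_DB * (1.0 - (double) step / maxSteps);
        return (float) Math.pow(10.0, db / 20.0);
    }

    public static void main(String[] args) {
        for (int step = 0; step <= 100; step += 25) {
            System.out.println(step + " -> " + scalarForStep(step, 100));
        }
    }
}
```

In an app, the result would be passed for both arguments, e.g. mediaPlayer.setVolume(v, v), where mediaPlayer is your own MediaPlayer instance.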
Hope this helps.
Due to (quite annoying) limitations on many J2ME phones, audio files cannot be played until they are fully downloaded. So, in order to play live streams, I'm forced to download chunks at a time, and construct ByteArrayInputStreams, which I then feed to Players.
This works well, except that there's an annoying gap of about 1/4 of a second every time a stream ends and a new one is needed. Is there any way to solve this problem, or the problem above?
The only good way to play long (3 minutes and more) tracks with J2ME JSR135, moderately reliably, on the largest number of handsets out there, is to use a "file://" url when you create the player, or to have the inputstream actually come from a FileConnection.
Recent BlackBerry phones can use a ByteArrayInputStream only when they have a large Java heap available.
A lot of phones running the Symbian operating system will allow you to put files in a private area for the J2ME application while still being able to play tracks from that location.
Unfortunately you can't get rid of these gaps, at least not on any device I've tried it on. It's very annoying indeed. It's part of the spec that you can't stream audio or video over HTTP.
If you want to stream from a server, the only way to do it is to use an RTSP server instead though you'll need to check support for this on your device.
And faking RTSP using a local server on the device (rtsp://localhost...) doesn't work either. I tried that too.
EDIT2: Or you could just look at this which seems to be exactly what you want: http://java.sun.com/javame/reference/apis/jsr135/javax/microedition/media/protocol/DataSource.html
I would create two Player classes and make sure that I had received enough chunks before I started playing them. Then I would start playing the first chunk through player one and load the second one into player two. Then I would use the TimeBase class to keep track of how much time has passed and when I knew the first chunk would end (you should know how long each chunk has to play) then I would start playing the second chunk through the second player and load the third chunk into the first and so on and so forth until there are no more chunks to play.
The key here is using the TimeBase class properly to know when to make the transition. I think that should get rid of the annoying 1/4 second gap between chunks. I hope that works; let me know if it does, because it sounds really interesting.
EDIT: Player.prefetch() could also be useful here in reducing latency.