I need to read streaming audio and detect generated ultrasound signals.
How can I find a certain sequence of sounds in streaming audio?
At first I thought in the direction of DTMF, but then rejected it because the human ear can hear it.
If you have any other ideas, I'll be happy to hear them.
The straightforward way would be to use a Fourier transform, which turns periodic signals into a nice frequency chart. Chop your incoming signal into short portions, apply an FFT, and see whether you have high enough levels in the right part of the spectrum. This will of course only work for signals that are long enough.
But detecting ultrasound with stock PC audio input may be tricky; it's standard to sample the incoming sound at 44100 Hz, so you'll only get very distorted traces of near-ultrasound. Newer cards are capable of higher sampling rates, such as 192 kHz.
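As a rough illustration of the band-checking idea (this code is not from the original answer): if you only care about energy at one known frequency, the Goertzel algorithm is a lightweight stand-in for a full FFT. The 19 kHz target, block size and threshold below are made-up values.

// Minimal sketch: Goertzel algorithm as a lightweight stand-in for an FFT
// when only one target frequency matters. All constants are illustrative.
public class ToneDetector {

    // Returns the squared magnitude of the target frequency in the given block.
    static double goertzelPower(float[] block, double targetHz, double sampleRate) {
        double w = 2.0 * Math.PI * targetHz / sampleRate;
        double coeff = 2.0 * Math.cos(w);
        double s0 = 0, s1 = 0, s2 = 0;
        for (float sample : block) {
            s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;           // stock sound card rate
        double targetHz   = 19000.0;           // hypothetical near-ultrasonic marker
        float[] block = new float[2048];       // one chunk of normalized samples (-1..1)
        // ... fill 'block' from your capture loop ...
        double power = goertzelPower(block, targetHz, sampleRate);
        boolean present = power > 1.0;         // threshold must be calibrated empirically
        System.out.println("19 kHz energy: " + power + " present=" + present);
    }
}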
I am looking for a way to recognise the frequency of a musical-scale sound (e.g. A4 = 440 Hz) recorded in a .wav file, which I load like this:
AudioInputStream stream = AudioSystem.getAudioInputStream(new File("file_a4.wav"));
I have read a lot about FFT, but it has been suggested that the frequencies of the musical scale do not line up exactly with the FFT bins.
I have also heard about the DTFT. What should I use to recognise the frequency from a sound file?
What I understand from your question is that you want to recognize the musical note(s) an instrument is playing in a wav file. If that is the case, there are several algorithms for doing that, and you could always train a neural network for the task, too.
Some important things to take into account:
Any instrument (and the same goes for musical sounds produced by the human voice) has its own particular "color" when producing a note. This color is called the timbre (https://en.wikipedia.org/wiki/Timbre), and it is composed of the harmonic and inharmonic frequencies that surround the frequency you psychoacoustically perceive when listening to that specific note. This is why you cannot just look for the peak of an FFT to detect the musical note, and it is also the reason why a piano sounds different from a guitar when playing the same note.
The analysis of an audio signal is often performed by windowing the signal and calculating the DFT of each windowed part. Each window then produces its own spectrum, and it is from the analysis of each individual spectrum and/or the analysis of how they interact that you (or your CNN, for example) will obtain your conclusions/results. This process of windowing the signal and calculating the DFTs produces a spectrogram (https://en.wikipedia.org/wiki/Spectrogram).
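Purely as an illustration of that windowing step (not from the original answer), here is a naive DFT of one Hann-windowed block; a real spectrogram implementation would use an FFT library such as JTransforms and hop the window along the signal with some overlap.

// Sketch: magnitude spectrum of one Hann-windowed block via a naive DFT.
// A real spectrogram would use an FFT library and overlapping windows.
static double[] magnitudeSpectrum(float[] block) {
    int n = block.length;
    double[] windowed = new double[n];
    for (int i = 0; i < n; i++) {
        double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / (n - 1)));
        windowed[i] = block[i] * hann;
    }
    double[] mags = new double[n / 2];          // bins up to Nyquist
    for (int k = 0; k < n / 2; k++) {
        double re = 0, im = 0;
        for (int i = 0; i < n; i++) {
            double phase = 2 * Math.PI * k * i / n;
            re += windowed[i] * Math.cos(phase);
            im -= windowed[i] * Math.sin(phase);
        }
        mags[k] = Math.sqrt(re * re + im * im); // bin k corresponds to k * sampleRate / n Hz
    }
    return mags;
}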
After that short introduction, here are some simple algorithms for identifying single notes in a wav file; you will be able to find implementations of these, and many others, on the internet (a rough sketch of the autocorrelation approach follows the list). Detecting the notes of chords is more complex but can be done with other algorithms or neural networks.
On the use of autocorrelation analysis for pitch detection: https://ieeexplore.ieee.org/document/1162905
YIN algorithm: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
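As a rough sketch of the autocorrelation idea (simplified, with an assumed search range of roughly 50-1000 Hz; real implementations such as YIN add a difference function, thresholding and interpolation):

// Sketch: crude autocorrelation pitch estimate on a block of normalized samples.
// Search range is an assumption; no octave-error handling is done here.
static double estimatePitchHz(float[] block, double sampleRate) {
    int minLag = (int) (sampleRate / 1000.0);   // ~1000 Hz upper bound (assumption)
    int maxLag = (int) (sampleRate / 50.0);     // ~50 Hz lower bound (assumption)
    int bestLag = -1;
    double bestCorr = 0;
    for (int lag = minLag; lag <= maxLag && lag < block.length; lag++) {
        double corr = 0;
        for (int i = 0; i + lag < block.length; i++) {
            corr += block[i] * block[i + lag];
        }
        if (corr > bestCorr) {
            bestCorr = corr;
            bestLag = lag;
        }
    }
    return bestLag > 0 ? sampleRate / bestLag : -1;   // -1 means no pitch found
}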
I am attempting to write a very simple DAW in Java but am having trouble playing an audio clip in a sequence. I have looked into both the sampled and MIDI classes in Java Sound but what I really need is a hybrid of the two.
It seems that with the MIDI classes you cannot, for example, use a Sequencer to play your own audio clips.
I have attempted to write my own sequencer using scheduling to play a javax.sound.sampled.Clip in a sequence but the timings vary far too much. It is not really a viable option as it doesn't keep time.
Does anybody have any suggestions of how I could get around this?
I can attest that an audio mixing system combining aspects of MIDI and samples can be written in Java, as I wrote my own and it currently works with samples and a couple real-time synths that I also wrote.
The key is making the audio data of the samples available on a per-frame basis, and having a frame-counting command-processor/audio-mixer that both manages the execution of "commands" and collects and mixes the audio frame data. At 44100 frames per second, that gives an accuracy in the vicinity of 0.02 milliseconds. I can describe this in more detail if requested.
Another way to go, probably saner, though I haven't done it personally, would be to make use of a Java bridge to a system such as Jack.
EDIT: Answering questions in comment (12/8/19).
Audio sample data in Java is usually either held in memory (Java provides Clip for this) or read from a .wav file. Because the individual frames are not exposed by Clip, I wrote an alternative and use it to hold the data as signed floats ranging from -1 to 1. Signed floats are a common way to hold audio data that we are going to perform multiple operations on.
For playback of .wav audio, Java combines reading the data with an AudioInputStream and outputting it with a SourceDataLine. Your system will have to sit in the middle, intercepting the AudioInputStream, converting the bytes to PCM float frames, and counting the frames as you go.
A number of sources or tracks can be processed at the same time, and merged (simple addition of the normalized floats) to a single signal. This signal can be converted back to bytes and sent out for playback via a single SourceDataLine.
Counting output frames from an arbitrary 0th frame from the single SourceDataLine will help with keeping constituent incoming tracks coordinated, and will provide the frame number reference used to schedule any additional commands that you wish to execute prior to that frame being output (e.g., changing a volume/pan of a source, or a setting on a synth).
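Here is a stripped-down sketch of that idea, not the author's actual code; the Track interface and all names are illustrative. Each track exposes one normalized float frame at a time, the mixer sums them, converts to 16-bit little-endian PCM, and keeps a running frame count that scheduled commands could be compared against.

import javax.sound.sampled.*;

// Sketch of a per-frame mixer: sums normalized float frames from several tracks,
// converts to 16-bit PCM and counts frames so commands can be scheduled by frame number.
interface Track { float nextFrame(); }   // returns the next sample in the range -1..1

class MiniMixer {
    static void run(Track[] tracks, long framesToPlay) throws LineUnavailableException {
        AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, false); // mono, 16-bit, little-endian
        SourceDataLine out = AudioSystem.getSourceDataLine(fmt);
        out.open(fmt);
        out.start();

        byte[] buffer = new byte[2048];
        int pos = 0;
        for (long frame = 0; frame < framesToPlay; frame++) {
            // A real system would check its command queue here, e.g.
            // while commands scheduled for 'frame' exist, execute them.
            float sum = 0;
            for (Track t : tracks) sum += t.nextFrame();
            sum = Math.max(-1f, Math.min(1f, sum));   // naive clipping
            int pcm = (int) (sum * 32767);
            buffer[pos++] = (byte) (pcm & 0xFF);
            buffer[pos++] = (byte) ((pcm >> 8) & 0xFF);
            if (pos == buffer.length) {               // flush a buffer's worth of frames
                out.write(buffer, 0, pos);
                pos = 0;
            }
        }
        out.write(buffer, 0, pos);
        out.drain();
        out.close();
    }
}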
My personal alternate to a Clip is very similar to AudioCue which you are welcome to inspect and use. The main difference is that for better or worse, I'm processing everything one frame at a time in my system, and AudioCue and its "Mixer" process buffer loads. I've had several very credible people criticize my personal per-frame system as inefficient, so when I made the public API for AudioCue, I bowed to that preconception. [There are ways to add buffering to a per-frame system to recapture that efficiency, and per-frame makes scheduling simpler. So I'm sticking with my per-frame logical scheme.]
No, you can't use a sequencer to play your own clips directly.
In the MIDI world, you have to deal with samples, instruments, and soundbanks.
Very quickly: a sample is the audio data plus information such as looping points, the note range covered by the sample, base volume, envelopes, etc.
An instrument is a set of samples, and a soundbank contains a set of instruments.
If you want to use your own sounds to play some music, you must make a soundbank out of them.
You will also need to use an implementation other than the default provided by Java, because the default one only reads soundbanks in a proprietary format that has been gone for at least 15, and perhaps even 20, years.
Back in 2008-2009 there was, for example, Gervill, which was able to read SF2 and DLS soundbanks. SF2 and DLS are two popular soundbank formats, and several programs, free or paid, exist to edit them.
If you want to go the other way round, starting with sampled audio, then it is also true, as you have noticed, that you can't rely on timers, task scheduling, Thread.sleep and the like to get enough precision.
The best precision you can achieve with those is around 10 ms, which is of course far too coarse to be acceptable for music.
The usual way to go here is to generate the audio of your music by mixing your audio clips yourself into the final clip; that way you can achieve frame precision.
In fact, that is very roughly what a MIDI synthesizer does.
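A minimal sketch of that mixing idea, with made-up names: each clip's samples, already decoded to normalized floats, are added into an output buffer at a start frame computed from the desired start time, which gives sample-accurate placement.

// Sketch: offline, frame-accurate mixing of pre-decoded clips into one output buffer.
// 'clips' holds normalized float samples per clip; 'startTimesSec' are the desired onsets.
static float[] mixDown(float[][] clips, double[] startTimesSec, double sampleRate) {
    int totalFrames = 0;
    for (int i = 0; i < clips.length; i++) {
        int end = (int) Math.round(startTimesSec[i] * sampleRate) + clips[i].length;
        totalFrames = Math.max(totalFrames, end);
    }
    float[] mix = new float[totalFrames];
    for (int i = 0; i < clips.length; i++) {
        int offset = (int) Math.round(startTimesSec[i] * sampleRate);  // frame-accurate onset
        for (int j = 0; j < clips[i].length; j++) {
            mix[offset + j] += clips[i][j];    // simple additive mixing; clip or normalize afterwards
        }
    }
    return mix;
}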
I've gone through the tutorials for the Java Sound API and I've successfully read off data from my microphone.
I would now like to go a step further and get data synchronously from multiple microphones in a microphone array (like a PS3 Eye or Respeaker).
I could get a TargetDataLine for each microphone and open/start/read the input into buffers - but I don't know how to do this in a way that will give me data that I can then line up time-wise (I would like to eventually do beamforming).
When reading from something like ALSA I would get the bytes from the different microphones simultaneously, so I know that each byte from each microphone is from the same time instant - but the Java Sound API seems to have an abstraction that obscures this, because you are just dumping/writing data out of separate line buffers and processing it, and each line acts separately. You don't interact with the whole device/mic-array at once.
However, I've found someone who managed to do beamforming in Java with the Kinect 1.0, so I know it should be possible. The problem is that the secret sauce is inside a custom Mixer object inside a .jar that was pulled out of some other software, so I don't have any easy way to figure out how they pulled it off.
You will only be able to align data from multiple sources with enough time-synchronous accuracy to perform beamforming if this is supported by the underlying hardware and drivers.
If the underlying hardware provides you with multiple, synchronised, data-streams (e.g. recording in 2 channels - in stereo), then your array data will be time synchronised.
If you are relying on the OS to simply provide you with two independent streams, then maybe you can rely on timestamping. Do you get the timestamp of the first element? If so, you can re-align the data by dropping samples based on your sample rate. There may be a residual difference (delta-t) that you will have to factor into your beamforming algorithm.
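For instance, with hypothetical start timestamps in seconds, the realignment is just a sample-count offset:

// Sketch: realign two independently captured streams using their start timestamps.
// Drop this many samples from the stream that started earlier (timestamps in seconds).
static int samplesToDrop(double startA, double startB, double sampleRate) {
    double deltaT = Math.abs(startA - startB);
    return (int) Math.round(deltaT * sampleRate);
    // Any sub-sample remainder is the residual delta-t to feed into the beamforming step.
}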
Reading about the PS3 Eye (which has an array of microphones), you will be able to do this if the audio driver provides all the channels at once.
For Java, this probably means: can you open the line with an AudioFormat that includes 4 channels? If yes, then each frame of your data will contain a sample for every channel, and the decoded frame data will (almost certainly) be time-aligned.
To quote the Java docs : "A frame contains the data for all channels at a particular time".
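For illustration only, and assuming the driver really does expose the array as one 4-channel device, opening such a line could look roughly like this; each frame then holds one 16-bit sample per microphone, so the channels stay aligned frame by frame.

import javax.sound.sampled.*;

// Sketch: capture a 4-channel microphone array as one line, so each frame
// holds one sample per microphone and the channels stay time-aligned.
// Whether such a format is actually available depends on the hardware/driver.
public class ArrayCapture {
    public static void main(String[] args) throws Exception {
        AudioFormat fmt = new AudioFormat(16000f, 16, 4, true, false); // 4 channels, 16-bit
        TargetDataLine line = AudioSystem.getTargetDataLine(fmt);      // throws if unsupported
        line.open(fmt);
        line.start();

        int frameSize = fmt.getFrameSize();            // 4 channels * 2 bytes = 8 bytes
        byte[] buf = new byte[frameSize * 512];
        int read = line.read(buf, 0, buf.length);
        int[] frameSamples = new int[4];
        for (int f = 0; f + frameSize <= read; f += frameSize) {
            for (int ch = 0; ch < 4; ch++) {
                int lo = buf[f + 2 * ch] & 0xFF;
                int hi = buf[f + 2 * ch + 1];          // little-endian, signed high byte
                frameSamples[ch] = (hi << 8) | lo;     // sample for channel 'ch' at frame f/frameSize
            }
        }
        line.close();
    }
}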
IDK what "beamforming" is, but if there is hardware that can provide synchronization, using that would obviously be the best solution.
Here, for what it is worth, is what should be a plausible algorithmic way to manage synchronization; a rough sketch of the burst-detection step follows the steps.
(1) Set up a frame counter for each TargetDataLine. You will have to convert bytes to PCM as part of this process.
(2) Set up some code to monitor the volume level on each line, some sort of RMS algorithm I would assume, on the PCM data.
(3) Create a loud, instantaneous burst that reaches each microphone at the same time, one that the RMS algorithm is able to detect and to give the frame count for the onset.
(4) Adjust the frame counters as needed, and reference them going forward on each line of incoming data.
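A possible sketch of steps (2) and (3); the names, window size and threshold are placeholders:

// Sketch of steps (2) and (3): find the frame index where a loud burst first appears,
// using RMS over small windows of normalized PCM floats.
static long detectOnsetFrame(float[] pcm, int windowSize, double rmsThreshold) {
    for (int start = 0; start + windowSize <= pcm.length; start += windowSize) {
        double sumSquares = 0;
        for (int i = start; i < start + windowSize; i++) {
            sumSquares += pcm[i] * pcm[i];
        }
        double rms = Math.sqrt(sumSquares / windowSize);
        if (rms > rmsThreshold) {
            return start;      // frame count (since the line's 0th frame) at which the burst begins
        }
    }
    return -1;                 // no burst found
}

// Running this per TargetDataLine gives one onset per microphone; the differences
// between those onsets are the per-line offsets to apply to each frame counter.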
Rationale: Java doesn't offer real-time guarantees, as explained in this article on real-time, low latency audio processing. But in my experience, the correspondence between the byte data and time (per the sample rate) is very accurate on lines closest to where Java interfaces with external audio services.
How long would frame counting remain accurate without drifting? I have never done any tests to research this. But on a practical level, I have coded a fully satisfactory "audio event" scheduler based on frame-counting, for playing multipart scores via real-time synthesis (all done with Java), and the timing is impeccable for the longest compositions attempted (6-7 minutes in length).
So I am writing a program that splits an audio clip into multiple parts wherever no sound is playing, so that all the clips created from the sound file contain only sections with sound. How would I accomplish this using Java? I plan on using FLAC, but the program currently supports WAV as well. Would RMS be the best way to determine this? Bonus points for any code.
You can roughly approximate the 'loudness' of a sampled waveform by averaging the squares of the differences between samples n and n+1. This will give you a rough indicator of how loud these samples will appear to the listener.
The method is more sensitive to high frequencies than to low ones, which is why it can be off quite a bit if the sound has a very extreme frequency distribution.
For a precise solution you will need to take the FFT approach and also weight the extracted frequencies by a model of the listener's ear (not all frequencies are perceived as equally loud at the same dB level).
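A minimal sketch of that rough estimate (the names, window size and threshold are mine and would need tuning for your material):

// Sketch of the rough estimate described above: the mean squared difference between
// neighbouring samples in each window, compared against a silence threshold.
static boolean[] markLoudWindows(float[] samples, int windowSize, double threshold) {
    int windows = samples.length / windowSize;
    boolean[] loud = new boolean[windows];
    for (int w = 0; w < windows; w++) {
        double sum = 0;
        int start = w * windowSize;
        for (int i = start + 1; i < start + windowSize; i++) {
            double d = samples[i] - samples[i - 1];
            sum += d * d;
        }
        loud[w] = (sum / (windowSize - 1)) > threshold;   // above threshold = keep this window
    }
    return loud;        // consecutive 'true' runs are the sections to cut into clips
}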
I am working on a small example application for my fourth-year project (dealing with Functional Reactive Programming). The idea is to create a simple program that can play a .wav file and then show a 'bouncing' animation of the current volume of the playing song (like in audio recording software). I'm building this in Scala, so I have mainly been looking at Java libraries and existing solutions.
Currently, I have managed to play a .wav file easily, but I can't seem to achieve the second goal. Basically, is there a way I can decode a .wav file so that I have some way of accessing
the 'volume' at any given time? By volume I think I mean its amplitude, but I may be wrong about this - Higher Physics was a while ago....
Clearly, I don't know much about this at all so it would be great if someone could point me in the right direction!
In digital audio processing you typically refer to the momentary peak amplitude of the signal (this is also called PPM -- peak programme metering). Depending on how accurate you want to be or if you wish to model some standardised metering or not, you could either
just use a sliding window of sample frames (find the maximum absolute value per window)
implement some sort of peak-hold mechanism that retains the last peak value for a given duration and then start to have the value 'fall' by a given amount of decibels per second.
The other measuring mode is RMS, which is calculated by integrating over a certain time window (add the squared sample values, divide by the window length, and take the square root, hence root-mean-square, RMS). This gives a better idea of the 'energy' of the signal, moving more smoothly than peak measurements, but not capturing the maximum values observed. This mode is sometimes called a VU meter as well. You can approximate it with a sort of lagging (lowpass) filter, e.g. y[i] = y[i-1]*a + |x[i]|*(1-a), for some value 0 < a < 1.
You typically display the values logarithmically, i.e. in decibels, as this corresponds better with our perception of signal strength and also for most signals produces a more regular coverage of your screen space.
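For what it's worth, here is a small sketch of both meters in plain Java, assuming you already have a window of normalized float samples; the -96 dB floor is an arbitrary display choice.

// Sketch: peak and RMS level of one window of normalized samples, expressed in dB.
// The -96 dB floor for silence is an arbitrary choice for display purposes.
static double peakDb(float[] window) {
    double peak = 0;
    for (float s : window) peak = Math.max(peak, Math.abs(s));
    return toDb(peak);
}

static double rmsDb(float[] window) {
    double sum = 0;
    for (float s : window) sum += s * s;
    return toDb(Math.sqrt(sum / window.length));
}

static double toDb(double amplitude) {
    return amplitude > 0 ? 20.0 * Math.log10(amplitude) : -96.0;
}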
Three projects I'm involved with may help you:
ScalaAudioFile which you can use to read the sample frames from an AIFF or WAVE file
ScalaAudioWidgets, a still young and incomplete project that provides some audio application widgets on top of scala-swing, including a PPM view -- just use a sliding window and set the window's current peak value (and optionally RMS) at a regular interval, and the view will take care of peak-hold and fall times
(ScalaCollider, a client for the SuperCollider sound synthesis system, which you might use to play back the sound file and measure the peak and RMS amplitudes in real time. The latter is probably overkill for your project and would involve a serious learning curve if you have never heard of SuperCollider. The advantage would be that you don't need to worry about synchronising your sound playback with the meter display.)
In a wav file, the data at a given point in the stream IS the volume (shifted by half of the dynamic range). In other words, if you know what type of wav file it is (for example 8-bit mono), each byte represents a single sample. If you know the sample rate (say 44100 Hz), then multiply the time by 44100 and that is the byte you want to look at.
The value of the byte is the volume (its distance from the middle: 0 and 255 are the peaks, 128 is zero). This is assuming that the encoding is not mu-law. I found some good info on how to tell the difference, or better yet, convert between these formats, here:
http://www.gnu.org/software/octave/doc/interpreter/Audio-Processing.html
You may want to average these samples over a window of some fixed number of samples, though.
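Here is a small sketch of that idea for linear (non-mu-law) 8-bit mono PCM; the file name, time and window length are placeholders.

import javax.sound.sampled.*;
import java.io.File;

// Sketch: for linear 8-bit mono PCM, read the whole file, pick the byte nearest a given time,
// and average the deviation from the midpoint (128) over a small window around it.
public class WavLevelAt {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("sound.wav")); // hypothetical file
        AudioFormat fmt = in.getFormat();                   // assumed: 8-bit, mono, unsigned PCM
        byte[] data = in.readAllBytes();

        double timeSec = 1.5;                               // point of interest (example value)
        int index = (int) (timeSec * fmt.getSampleRate());
        int window = 441;                                   // ~10 ms at 44100 Hz (arbitrary)

        double sum = 0;
        int count = 0;
        for (int i = Math.max(0, index - window); i < Math.min(data.length, index + window); i++) {
            sum += Math.abs((data[i] & 0xFF) - 128);        // distance from the midpoint
            count++;
        }
        System.out.println("average level around " + timeSec + "s: " + (sum / count));
        in.close();
    }
}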