Display audio waveform and zoom - java

I'm able to display a waveform, but I don't know how to implement zooming in on the waveform.
Any idea?
Thanks piccolo

By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum and the maximum sample value for that pixel. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for each chunk.
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio, perhaps 512:1 or 1024:1. When you are drawing at a zoom ratio of more than 1024 samples per pixel, you use the pre-calculated table; below that ratio, you get the data directly from the file. If you don't do this, you will find that your drawing code becomes too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning; slowness here will make your whole program feel sluggish. It's the disk I/O that matters: the CPU has no trouble keeping up, so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once, and you want to read it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible, and many of the reasons for repainting your window will not require you to rescan your min/max tables. The memory cost of holding on to them is low compared to the disk I/O cost of building them in the first place.
Then you draw the waveform as a series of 1-pixel-wide vertical lines, each running between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre-built min/max tables.
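As a minimal sketch of the scanning step in Java (assuming 16-bit mono samples already loaded into a short[]; the method name and fixed block size here are illustrative, not from the original answer):

    // One sequential pass over the samples; for multi-channel audio you would
    // accumulate one min/max pair per channel inside the same loop.
    static short[][] buildMinMaxTable(short[] samples, int samplesPerEntry) {
        int entries = (samples.length + samplesPerEntry - 1) / samplesPerEntry;
        short[][] table = new short[entries][2]; // [e][0] = min, [e][1] = max
        for (int e = 0; e < entries; e++) {
            short min = Short.MAX_VALUE, max = Short.MIN_VALUE;
            int start = e * samplesPerEntry;
            int end = Math.min(start + samplesPerEntry, samples.length);
            for (int i = start; i < end; i++) {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            table[e][0] = min;
            table[e][1] = max;
        }
        return table;
    }

To draw, you would map each x pixel to a span of table entries, take the min of the mins and the max of the maxes in that span, and draw one vertical line between them.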
Answered by https://stackoverflow.com/users/234815/John%20Knoeller

Working on this right now; it's C# with a little LINQ, but it should be easy enough to read and understand. The idea here is to have an array of float values from -1 to 1 representing the amplitude of every sample in the WAV file. Then, knowing how many samples there are per second, we need a scaling factor: segments per second. At that point you are simply reducing the data points and smoothing them out. To zoom in really tight, use something like 1000 segments per second; to zoom way out, maybe 5-10. Note that right now I'm just doing a plain average; this needs to be made much more efficient and should probably use RMS (root-mean-square) averaging to be done properly.
private List<float> BuildAverageSegments(float[] aryRawValues, int iSamplesPerSecond, int iSegmentsPerSecond)
{
    double nDurationInSeconds = aryRawValues.Length / (double) iSamplesPerSecond;
    int iNumSegments = (int) Math.Round(iSegmentsPerSecond * nDurationInSeconds);
    // total number of samples divided by the total number of segments
    int iSamplesPerSegment = (int) Math.Round(aryRawValues.Length / (double) iNumSegments);
    List<float> colAvgSegVals = new List<float>();
    for (int i = 0; i < iNumSegments - 1; i++)
    {
        int iStartIndex = i * iSamplesPerSegment;
        int iEndIndex = (i + 1) * iSamplesPerSegment;
        float fAverageSegVal = aryRawValues.Skip(iStartIndex).Take(iEndIndex - iStartIndex).Average();
        colAvgSegVals.Add(fAverageSegVal);
    }
    return colAvgSegVals;
}
Outside of this, you need to get your audio into WAV format; you should be able to find source code everywhere to read that data. Then use something like this to convert the raw byte data to floats. Again, this is horribly rough and inefficient, but clear:
public float[] GetFloatData()
{
    // Scale factor - SignificantBitsPerSample
    if (Data != null && Data.Length > 0)
    {
        float nMaxValue = (float) Math.Pow((double) 2, SignificantBitsPerSample);
        float[] aryFloats = new float[Data[0].Length];
        for (int i = 0; i < Data[0].Length; i++)
        {
            aryFloats[i] = Data[0][i] / nMaxValue;
        }
        return aryFloats;
    }
    else
    {
        return null;
    }
}

Related

How to find the length of a short array to fill a video with audio using Xuggler?

I'm trying to add audio to a video, where I need a single short array representing the audio, but I don't know how to get the length of this array.
I've found an estimate of 91 shorts per millisecond, but I don't know how to get an exact value instead of guessing and checking.
Here's the relevant code:
IMediaWriter writer = ToolFactory.makeWriter(file.getAbsolutePath());
writer.addVideoStream(0, 0, IRational.make(fps, 1), animation.getWidth(), animation.getHeight());
writer.addAudioStream(1, 0, 2, 44100);
...
int scale = 1000 / 11; // TODO
short[] audio = new short[animation.getLength() * scale];
animation.getLength() is the length of the video in milliseconds
What's the formula for calculating the scale variable?
The reason a list of shorts is needed is since this is an animation library that supports adding lots of sounds into the outputted video. Thus, I loop through all the requested sounds, turn the sounds into short lists, and then add their values to the correct spot in the main audio list. Not using a short list would make it so I can't stack several sounds on top of each other and make the timing more difficult.
The scale is what's known as the audio sampling rate which is normally measured in Hertz (Hz) which corresponds to "samples per second".
Assuming each element in your array is a single audio sample, you can estimate the array size by multiplying the audio sampling rate by the animation duration in seconds.
For example, if your audio sampling rate is 48,000 Hz:
int audioSampleRate = 48000;
double samplesPerMillisecond = (double) audioSampleRate / 1000;
int estimatedArrayLength = (int) (animation.getLength() * samplesPerMillisecond);
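Note that the stream in the question is created with 2 channels at 44100 Hz. If the short array is interleaved (one short per channel per sample frame, which is an assumption about the layout Xuggler expects), the exact count also includes the channel count. A sketch:

    int sampleRate = 44100;
    int channelCount = 2; // matches writer.addAudioStream(1, 0, 2, 44100)
    long lengthMs = animation.getLength(); // video length in milliseconds
    // 2 channels * 44.1 samples/ms = 88.2 shorts per millisecond,
    // which is close to the ~91 shorts/ms estimate from the question.
    short[] audio = new short[(int) (lengthMs * sampleRate * channelCount / 1000)];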

Tone generator generates a second tone

I am trying to make a simple signal generator for isochronic pulsating sounds.
Basically, a "beatFrequency" controls the amplitude variation of the main (pitch) frequency.
It works pretty well, except that for some pitch frequencies above 4-5 kHz, a second tone with a lower frequency is generated.
It's not for all frequencies, but for quite a few I can definitely hear a second tone.
What can this be? Some kind of resonance? I tried increasing the sampling rate, but it doesn't change anything, and 44100 should be enough up to around 20 kHz, if I understand correctly?
I really can't figure it out on my own, so I'm thankful for any help!
Here is example code with a beat frequency of 1 Hz, a pitch frequency of 5000 Hz, and a sample rate of 44100.
public void playSound() {
    double beatFreq = 1;
    double pitch = 5000;
    int mSampleRate = 44100;
    AudioTrack mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, mSampleRate,
            AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
            256, AudioTrack.MODE_STREAM);
    int loopLength = 2 * this.mSampleRate;
    while (isPlaying) {
        double[] mSound = new double[loopLength];
        for (int i = 0; i < loopLength; i = i + 1) {
            mSound[i] = beatFreq * Math.sin((1.0 * Math.PI * i / (this.mSampleRate / pitch)));
            mBuffer[i] = (short) (mSound[i] * Short.MAX_VALUE);
        }
        mAudioTrack.play();
        mAudioTrack.write(mBuffer, 0, loopLength);
    }
}
Here is (added) an image of the frequencies when I play the tone 4734 Hz. For example, there is a rather large peak at around 1100 Hz, as well as several higher ones.
The code is now using just the pitch; I have removed the beatFreq.
In your code you are using beatFreq*Math.sin((1.0*Math.PI * i/(this.mSampleRate/pitch))) to determine the frequency (there is some sort of assignment missing here).
However, mSampleRate and pitch are int values, so mSampleRate/pitch is an integer division instead of a double division. For particular combinations of pitch and sample rate, this results in a frequency different from the intended one, and the effect gets worse for higher pitch values.
Try using double instead of int; that should get rid of the problem.
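A minimal sketch of that fix, changing only the phase expression from the question's loop (as an aside, a full sine cycle is 2*Math.PI, so the 1.0*Math.PI in the original actually produces a tone at half the nominal pitch):

    // Cast to double so mSampleRate / pitch is no longer truncated.
    double samplesPerCycle = (double) mSampleRate / pitch;
    mSound[i] = beatFreq * Math.sin(2.0 * Math.PI * i / samplesPerCycle);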
Rewriting my answer after better understanding the question: the OP wants to do amplitude modulation.
Yes, Java can do amplitude modulation. I've made a passable cricket sound, for example, by taking a 4.9 kHz tone, modulating the volume at 66 Hz, and giving the resulting tone an AR envelope.
In your code, the variable beatFreq remains constant over an entire for-loop. Isn't your intention to vary this as well over the course of time?
I think you should simultaneously compute the beatFreq wave value in its own function (but also using the varying i), and multiply that result (scaled to range from 0 to 1) against the value computed for the faster tone.
EDIT: To move redundant calculations out of the inner loop, the following is a possibility.
Have the following as instance variables:
private double incr;
private double incrSum;
private final double TWO_PI = Math.PI * 2;
Have the following calculation done only once, in your constructor:
incr = pitch / audioFmt.getSampleRate();
incr *= TWO_PI;
This assumes pitch is a value in Hertz and audioFmt is the Java AudioFormat being used. With Android, I don't know how the audio format's sample rate is stored or accessed.
With this in place you can have a method that returns the next double PCM value with some very simple code:
private double getNextSinePCM()
{
    incrSum += incr;
    if (incrSum > TWO_PI)
    {
        incrSum -= TWO_PI;
    }
    return Math.sin(incrSum);
}
Note: do not reset incrSum to zero as you stated in your comment; this can introduce a discontinuity.
If you have a second tone, give it its own increment and running sum. You can then multiply the two results to get amplitude modulation.
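As a sketch, assuming a hypothetical twin method getNextLFOPCM() built exactly like getNextSinePCM() but driven by the beat frequency's own increment and running sum, the modulation is just a product:

    private double getNextModulatedPCM()
    {
        double carrier = getNextSinePCM();          // the audible pitch
        double lfo = (getNextLFOPCM() + 1.0) / 2.0; // rescaled from [-1, 1] to [0, 1]
        return carrier * lfo;                       // amplitude modulation
    }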
Now, as to the question as how to properly convert the PCM double value returned to something Android can use, I can't give you a definitive answer as I am not running Android.
Just a comment: it seems to me you are enthusiastic about working with sound, but maybe self-taught or lagging a bit in basic programming techniques. Moving redundant calculations outside of a loop is kind of fundamental. So is the ability to make simple test cases for testing assumptions and trouble-shooting. As you go forward, I want to encourage you to dedicate some time to developing these fundamentals as well as pursuing the interest in sound! You might check out the StackOverflow code reading group as a resource for more tips. I am also self-taught and have learned a lot from there as well as the code-reading course at JavaRanch called "CattleDrive".

Improve Histogram

I've made this method for getting the pixel values of an image; I'm using it to compare one image against 50 other images. However, it takes forever to produce output. Does anyone know of a way I can speed this method up? Would converting the images to grayscale be quicker? If anyone could help with code, that would be great!
public static double[] GetHistogram (BufferedImage img) {
    double[] myHistogram = new double[16777216];
    for (int y = 0; y < img.getHeight(); y += 1)
    {
        for (int x = 0; x < img.getWidth(); x += 1)
        {
            int clr = img.getRGB(x, y);
            Color c = new Color(img.getRGB(x, y));
            int pixelIntValue = (int) c.getBlue() * 65536 + c.getGreen() * 256 + c.getRed();
            myHistogram[pixelIntValue]++;
        }
    }
    return myHistogram;
}
TLDR: use a smaller image and read this paper.
You should try to eliminate any unnecessary function calls, as @Piglet mentioned, but you should definitely keep the colors in one histogram instead of a separate histogram each for R, G, and B. Aside from getting rid of the extra function calls, there are four things you can do to speed up your algorithm (both creating and comparing the histograms) and reduce the memory usage, because less page caching means less disk thrashing and more speed.
Use a smaller image
One of the advantages of color histogram indexing is that it is relatively independent of resolution. The color of an object does not change with the size of the image. Obviously, there are limits to this: imagine trying to match objects using a 1×1 image. However, if your images have millions of pixels (like the images from most smartphones these days), you should definitely resize them. These authors found that an image resolution of only 16×11 still produced very good results [see page 17], but even resizing down to ~100×100 pixels should still provide a significant speed-up.
BufferedImage inherits the method getScaledInstance from Image, which you can use to get a smaller image.
double scalingFactor = 0.25; // you need to choose this value to suit your images
int aSmallHeight = (int) (myBigImage.getHeight() * scalingFactor);
int aSmallWidth = (int) (myBigImage.getWidth() * scalingFactor);
Image smallerImage = myBigImage.getScaledInstance(aSmallWidth, aSmallHeight, Image.SCALE_FAST);
Reducing your image size is the single most effective thing you can do to speed up your algorithm. If you do nothing else, at least do this.
Use less information from each color channel
This won't make as much difference for generating your histograms, because it actually requires a little more computation, but it will dramatically speed up comparing the histograms. The general idea is called quantization. Basically, if you have red values in the range 0..255, they can be represented as one byte, and within that byte, some bits are more important than others.
Consider a color sample image with a mostly arbitrary shade of red in the top left and, in each of the other corners, the same color with one or more bits of the red channel ignored (indicated by underscores in the color byte). I intentionally chose a color with lots of one bits in it so that I could show the "worst" case of ignoring a bit. (The "best" case, ignoring a zero bit, has no effect on the color.)
There's not much difference between the upper right and upper left corners, even though we ignored one bit. The upper left and lower left have a visible but minimal difference, even though we ignored 3 bits. The upper left and lower right corners are very different even though we ignored only one bit, because it was the most significant bit. By strategically ignoring less significant bits, you can reduce the size of your histogram, which means there's less for the JVM to move around and fewer bins when it comes time to compare them.
Here are some solid numbers. Currently, you have 2^8 × 2^8 × 2^8 = 16,777,216 bins. If you ignore the 3 least significant bits of each color channel, you get 2^5 × 2^5 × 2^5 = 32,768 bins, which is 1/512 of the number of bins you are currently using. You may need to experiment with your set of images to see what level of quantization still produces acceptable results.
Quantization is very simple to implement: you just ignore the rightmost bits by performing bit-shift operations.
int numBits = 3;
int quantizedRed = pixelColor.getRed() >> numBits;
int quantizedGreen = pixelColor.getGreen() >> numBits;
int quantizedBlue = pixelColor.getBlue() >> numBits;
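Combining the quantized channels back into a single bin index could then look like this sketch (with numBits = 3, each channel keeps 5 bits, so the combined index fits in 15 bits; histogram is assumed to be an int[] sized accordingly):

    int bitsLeft = 8 - numBits; // bits kept per channel
    int bin = (quantizedRed << (2 * bitsLeft))
            | (quantizedGreen << bitsLeft)
            | quantizedBlue;
    histogram[bin]++; // histogram sized 1 << (3 * bitsLeft) = 32768 bins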
Use a different color space
While grayscale might be quicker, you should not use it, because you would lose all of your color information. When you're matching objects using color histograms, the actual hue or chromaticity is more important than how light or dark something is. (One reason for this is that the lighting intensity can vary across an image, or even between images.) There are other representations of color that don't require you to use 3 color channels.
For example, L*a*b* (see also this) uses one channel (L) to encode brightness and two channels (a, b) to encode color. The a and b channels each range from -100 to 100, so a histogram using only a and b would need just 40,000 bins. The disadvantage of a histogram over only a and b is that you lose the ability to record black and white pixels. Other color spaces each have their own advantages and disadvantages for your algorithm.
It is generally not very difficult to convert between color spaces because there are many existing implementations of color space conversion functions that are freely available on the internet. For example, here is a Java conversion from RGB to L*a*b*.
If you do choose to use a different color space, be careful using quantization as well. You should apply any quantization after you do the color space conversion, and you will need to test different quantization levels because the new color space might be more or less sensitive to quantization than RGB. My preference would be to leave the image in RGB because quantization is already so effective at reducing the number of bins.
Use different data types
I did some investigating and noticed that BufferedImage stores the image as a Raster, which uses a SampleModel to describe how pixels are stored in the data buffer. This means there is a lot of overhead just to retrieve the value of one pixel. You will achieve faster results if your image is stored as a byte[] or int[]. You can get the byte array using
byte[] pixels = ((DataBufferByte) bufferedImage.getRaster().getDataBuffer()).getData();
See the answer to this previous question for more information and some sample code to convert it to a 2D array.
This last thing might not make much difference, but I noticed that you are using double for storing your histogram. You should consider whether int would work instead. In Java, int has a maximum value greater than 2 billion, so overflow shouldn't be an issue (unless you are making a histogram of an image with more than 2 billion pixels, in which case, see my first point). An int uses only half as much memory as a double, which is a big deal when you have thousands or millions of histogram bins, and for many math operations int can be faster, though this depends on your hardware.
If you want to read more about color histograms for object matching, go straight to the source and read Swain and Ballard's Color Indexing paper from 1991.
Calculating a histogram with 16,777,216 classes is quite unusual. Most histograms are calculated for each channel separately, resulting in one 256-class histogram each for R, G, and B, or just one if you convert the image to grayscale.
I am no expert in Java and don't know how cleverly the compiler optimizes code, but you call img.getHeight() for every row and img.getWidth() for every column of your image. I don't know how often those expressions are actually evaluated, but you may save some processing time by assigning the width and height of your image to two variables before you start your loops.
You also call img.getRGB(x,y) twice for every pixel. Same story: it is probably faster to do it once. Function calls are usually slower than reading variables from memory.
You should also think about what you are doing here. img.getRGB(x,y) gives you an integer representation of a color. You then pass that integer to a constructor to make a Color object out of it, and then use c.getBlue() and so on to get integer values for red, green, and blue out of that Color object, just to put them back together into an integer again? You could use the return value of getRGB directly and save at least 4 function calls, 3 multiplications, and 3 additions.
So, given that I last programmed Java about 10 years ago, my function would look more like this:
public static double[] GetHistogram (BufferedImage img) {
    double[] myHistogram = new double[16777216];
    int width = img.getWidth();
    int height = img.getHeight();
    for (int y = 0; y < height; y += 1)
    {
        for (int x = 0; x < width; x += 1)
        {
            int clr = img.getRGB(x, y) & 0xFFFFFF; // strip the alpha byte so the value fits the table
            myHistogram[clr]++;
        }
    }
    return myHistogram;
}
Of course, the array type and size may not be what you finally want, and that whole 16,777,216-class histogram doesn't make much sense, but maybe this helps you speed things up a bit.
I'd just use a bit mask to get the red, green, and blue values out of that integer and create three histograms.
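A sketch of that per-channel version, pulling the channels out of the packed ARGB int that getRGB returns with shifts and masks:

    public static int[][] getChannelHistograms(BufferedImage img) {
        int[][] hist = new int[3][256]; // [0] = red, [1] = green, [2] = blue
        int width = img.getWidth();
        int height = img.getHeight();
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int clr = img.getRGB(x, y);
                hist[0][(clr >> 16) & 0xFF]++; // red byte
                hist[1][(clr >> 8) & 0xFF]++;  // green byte
                hist[2][clr & 0xFF]++;         // blue byte
            }
        }
        return hist;
    }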

Generate a single period of a frequency?

I would like to be able to take a frequency (e.g. 1000 Hz, 250 Hz, 100 Hz) and play it out through the phone hardware.
I know that Android's AudioTrack will allow me to play 16-bit PCM if I can calculate an array of bytes or shorts. I would like to calculate only a single period, so that later I can loop it without any issues and keep the calculation time down.
How could this be achieved?
Looping a single period isn't necessarily a good idea: the cycle may not fit nicely into an exact number of samples, so you might get an undesirable discontinuity at the end of each cycle, or worse, the audible frequency may end up slightly off.
That said, the math isn't hard:
float sampleRate = 44100;
float samplesPerCycle = sampleRate / frequency;
int samplesToProduce = ...; // choose how many samples to generate
short[] sample = new short[samplesToProduce];
for (int i = 0; i < samplesToProduce; ++i) {
    sample[i] = (short) Math.floor(32767.0 * Math.sin(2 * Math.PI * i / samplesPerCycle));
}
To see what I meant above about the frequency, take the standard tuning pitch of 440 Hz.
Strictly, this needs 100.227 samples per cycle, but the code above would produce 100. So if you repeat your 100 samples over and over, you'll actually play the cycle 441 times per second, and your pitch will be off by 1 Hz.
To avoid the problem, you'd really need to calculate several periods of the waveform, although I don't know how many are needed to fool the ear into hearing the right pitch.
Ideally it would be as many as are needed such that:
i / samplesPerCycle
is a whole number, so that the last sample (technically the one after the last sample) ends exactly on a cycle boundary. I think if your input frequencies are all whole numbers, then producing exactly one second's worth would work.
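For whole-number frequencies you can compute the smallest seamless loop directly: the number of cycles needed is frequency / gcd(sampleRate, frequency). A sketch:

    // Smallest number of whole cycles that ends exactly on a sample boundary.
    // For 440 Hz at 44100 Hz: gcd = 20, so 22 cycles = 2205 samples.
    static int cyclesForSeamlessLoop(int sampleRate, int frequency) {
        int a = sampleRate, b = frequency;
        while (b != 0) { int t = a % b; a = b; b = t; } // Euclid's gcd
        return frequency / a;
    }

The loop buffer then holds cycles * sampleRate / frequency samples, which is exact by construction.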

int to float for mediaplayer.setVolume()

I am using a SeekBar to change the volume of my MediaPlayer. I am using the progress level, which gives an int from 1 to 100. I need to convert that into the float range of 0.0f to 1.0f. What is the correct way of doing that?
Thanks guys
float fVal = (float)val / 100; should do the trick.
Divide by 100:
int intValue = 1;
float floatValue = intValue / 100.0f; // 100.0f, not 100.0: dividing by a double would not compile without a cast
Very late in the day, but as I was just reading up on this very subject, I thought it worth posting an answer that doesn't simply scale the slider value 0-100 linearly into a float 0.0-1.0, and explaining why you should do it differently.
So the API documentation for MediaPlayer.setVolume(float, float) states, in passing, that "UI controls should be scaled logarithmically" but doesn't explain why.
Sound as we hear it is measured in decibels (dB), on a logarithmic scale. In simplified terms (over-simplified if you are an audio buff), twice the decibels equals twice the volume. But because the decibel scale is logarithmic, the distance on the scale from, for example, 0 to 3 dB is bigger than the distance from 3 dB to 6 dB.
The most obvious effect of using linear instead of logarithmic scaling is that the volume with the slider at maximum is much more than twice as loud as at the halfway position; most of the noticeable change in volume happens in roughly the lower three quarters of the slider range rather than evenly across the full range. This is why straight linear scaling isn't quite the right way to translate a slider position into a value for the setVolume method.
Here's a simple function that will take your slider value (assumed to lie in the range 0 - 100), convert it into a logarithmic value and scale it:
private float scaleVolume(int sliderValue) {
    if (sliderValue <= 0) return 0f; // guard: log10(0) is negative infinity
    double dSliderValue = sliderValue;
    double logSliderValue = Math.log10(dSliderValue / 10);
    double logMaxSliderValue = Math.log10(10);
    float scaledVolume = (float) (logSliderValue / logMaxSliderValue);
    return Math.max(0f, scaledVolume); // slider values below 10 would otherwise go negative
}
Now, the slider at 50 (the center position) will produce a sound that is about half as loud as when the slider is at 100 (the top position), and the slider at 25 will produce a sound that is half as loud as when the slider is at 50.
Be aware that your perception of what is "twice as loud" will be affected by the kind of audio you are playing as well as the quality of the speaker and how hard it is being pushed...
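Wiring this into the SeekBar might look like the following sketch (seekBar and mediaPlayer are assumed to already exist; the listener is the standard Android SeekBar.OnSeekBarChangeListener):

    seekBar.setOnSeekBarChangeListener(new SeekBar.OnSeekBarChangeListener() {
        @Override
        public void onProgressChanged(SeekBar bar, int progress, boolean fromUser) {
            float volume = scaleVolume(progress);
            mediaPlayer.setVolume(volume, volume); // same gain for left and right
        }
        @Override public void onStartTrackingTouch(SeekBar bar) {}
        @Override public void onStopTrackingTouch(SeekBar bar) {}
    });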
To map linearly to the range 0.0 to 1.0 use
int n = <some value>;
float val = (float)(n - 1)/99;
