I am trying to join byte arrays of WAV sound, and it works except for background noise. Does anyone know an algorithm for adding two byte arrays of sound?
This is what I have tried so far:
for (int i = 0; i < bArr1.length; i++)
{
    bArrJoined[i] = bArr1[i] + bArr2[i];
}
I also tried dividing by 2 to keep the values from getting too high:
for (int i = 0; i < bArr1.length; i++)
{
    bArrJoined[i] = (bArr1[i] + bArr2[i]) / 2;
}
Does anyone know how to make this work without the noise?
A number of things could cause artifacts here. Different audio sampling rates or data bit sizes could do it.
Assuming those are non-issues, be aware that the sum of two bytes does not fit in a byte: the result wraps around when it is stored back (in an unsigned byte 256 becomes 0; in Java's signed byte 128 becomes -128). So do the arithmetic on int values. Clipping will occur if you exceed the maximum volume, so dividing by 2 is a smart way to prevent that. Perform the division on the int values as well, and only cast back to byte at the end.
However, if you aren't working with 8-bit audio, then a byte is not your atomic unit. For example, 16-bit audio uses 2 bytes and you would need to convert every two consecutive bytes to an int (with respect to proper endianness) before you perform any mathematical operations on the values. 32-bit audio data occupies 4 consecutive bytes for each single numeric value. Just having an array of bytes does not in itself tell you where the data boundaries are.
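As a minimal sketch of the advice above, the following mixes two buffers of 16-bit signed little-endian PCM, the usual layout of WAV sample data. The class and method names are illustrative, and it assumes both arrays hold raw samples only (no file header) with the same sample rate and channel layout.

```java
// Minimal sketch: mix two buffers of 16-bit signed little-endian PCM.
// Assumes raw sample data (no WAV header) in the same format in both buffers.
class PcmMixer {
    static byte[] mix16BitLittleEndian(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length) & ~1; // whole samples only
        byte[] out = new byte[length];
        for (int i = 0; i < length; i += 2) {
            // Rebuild each signed 16-bit sample: low byte first, then high byte.
            int sampleA = (short) ((a[i] & 0xff) | (a[i + 1] << 8));
            int sampleB = (short) ((b[i] & 0xff) | (b[i + 1] << 8));
            // Average in int arithmetic so the sum cannot overflow.
            int mixed = (sampleA + sampleB) / 2;
            out[i] = (byte) mixed;            // low byte
            out[i + 1] = (byte) (mixed >> 8); // high byte
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xe8, 0x03}; // sample value 1000
        byte[] b = {(byte) 0xd0, 0x07}; // sample value 2000
        byte[] out = mix16BitLittleEndian(a, b);
        int sample = (short) ((out[0] & 0xff) | (out[1] << 8));
        System.out.println(sample); // prints 1500
    }
}
```

Averaging halves the volume of each source; summing and clamping to the range -32768..32767 is a common alternative when the inputs are quiet.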
I just found out that there is a BitSet class in Java. There are already arrays and similar data structures. Where can BitSet be used?
As the above answer only explains what a BitSet is, I am providing here an answer on how I use BitSet and why. At first, I did not know that the BitSet construct existed. I have a QR Code generator in C++, and for flexibility I don't want to use a specific bitmap structure when returning the QR Code to the caller. The QR Code is just black and white and can be represented as a series of bits. The problem was that in the JNI C++ code I have to return the byte array that represents this series of bits, and I also have to return the count of bits. Note that the size of the byte array alone cannot tell you the count of bits. In effect, I am faced with a scenario in which my JNI C++ code has to return two values:
the byte[] array
the count of bits
My first solution was to return an array of boolean. The contents of this array are the QR Code pixels, and the square root of the length of the array is the length of the side. Of course this worked, but it felt wasteful because the data is supposed to be a series of bits. My next attempt was to return a Pair<int, byte[]> object which, after lots of hair pulling, I was not able to make work in C++. Here comes the BitSet(145) construct. By returning this BitSet object, I convey both pieces of information listed above. But there is a minor trick. If the QR Code has 144 pixels in total, because one side is 12, then you have to allocate BitSet(145) and do obj.set(144). That is, we introduce an artificial last bit that we then set, but this last bit is not part of the QR Code pixels. This ensures that BitSet::length(), which reports the index of the highest set bit plus one, always returns the pixel count plus one, even when the trailing pixels are unset. So in Kotlin:
var pixels:BitSet = getqrpixels(inputdata)
var pixels_len = pixels.length() - 1
var side = sqrt(pixels_len.toFloat()).toInt()
drawSquareBitmap(pixels, side)
And that is my unexpected use case for this mysterious BitSet.
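The sentinel-bit trick can be sketched in plain Java as follows. The makePixels method is a hypothetical stand-in for the native QR generator and simply sets one example pixel; only the BitSet calls themselves are standard API.

```java
import java.util.BitSet;

class SentinelBitSetDemo {
    // Hypothetical stand-in for the native generator: a side x side grid.
    static BitSet makePixels(int side) {
        int pixelCount = side * side;
        BitSet pixels = new BitSet(pixelCount + 1);
        pixels.set(0);          // an example black pixel
        pixels.set(pixelCount); // sentinel bit, not part of the image
        return pixels;
    }

    // Recovers the side length from the position of the sentinel bit.
    static int sideOf(BitSet pixels) {
        int pixelCount = pixels.length() - 1; // drop the sentinel
        return (int) Math.sqrt(pixelCount);
    }

    public static void main(String[] args) {
        BitSet pixels = makePixels(12);
        System.out.println(pixels.length()); // prints 145
        System.out.println(sideOf(pixels));  // prints 12
    }
}
```

Without the sentinel, length() would report 1 here, because only bit 0 is set and trailing unset bits are not counted.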
Take a look at this:
https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html
A BitSet is a vector of bits. Each entry in the list is either true (1) or false (0). The BitSet class comes with methods that resemble the bitwise operators. It is a little bit more flexible than a normal binary type.
BitSet, unlike a boolean[], is actually a dynamically sized bitmask. Essentially, instead of using booleans to store values, it uses longs, where each of a long's 64 bits stores one boolean value.
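A short sketch of both points, using illustrative class and method names: the and method plays the role of the & operator, and setting a bit beyond index 63 makes the set grow to a second long.

```java
import java.util.BitSet;

class BitSetBasics {
    // Returns a new BitSet holding the bits set in both arguments (like a & b).
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // and() modifies in place, so copy first
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        a.set(70); // beyond 64 bits: the set grows automatically

        BitSet b = new BitSet();
        b.set(1);
        b.set(2);

        System.out.println(intersect(a, b));        // prints {1}
        System.out.println(a.toLongArray().length); // prints 2
    }
}
```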
I know that there are a lot of resources online explaining how to deinterleave PCM data. In the course of my current project I have looked at most of them...but I have no background in audio processing and I have had a very hard time finding a detailed explanation of how exactly this common form of audio is stored.
I do understand that my audio will have two channels and thus the samples will be stored in the format [left][right][left][right]...
What I don't understand is what exactly this means. I have also read that each sample is stored in the format [left MSB][left LSB][right MSB][right LSB]. Does this mean that each 16 bit integer actually encodes two 8 bit frames, or is each 16 bit integer its own frame destined for either the left or right channel?
Thank you everyone. Any help is appreciated.
Edit: If you choose to give examples please refer to the following.
Method Context
Specifically what I have to do is convert an interleaved short[] to two float[]'s each representing the left or right channel. I will be implementing this in Java.
public static float[][] deinterleaveAudioData(short[] interleavedData) {
    //initialize the channel arrays
    float[] left = new float[interleavedData.length / 2];
    float[] right = new float[interleavedData.length / 2];
    //iterate through the buffer
    for (int i = 0; i < interleavedData.length; i++) {
        //THIS IS WHERE I DON'T KNOW WHAT TO DO
    }
    //return the separated left and right channels
    return new float[][]{left, right};
}
My Current Implementation
I have tried playing the audio that results from this. It's very close, close enough that you could understand the words of a song, but is still clearly not the correct method.
public static float[][] deinterleaveAudioData(short[] interleavedData) {
    //initialize the channel arrays
    float[] left = new float[interleavedData.length / 2];
    float[] right = new float[interleavedData.length / 2];
    //iterate through the buffer
    for (int i = 0; i < left.length; i++) {
        left[i] = (float) interleavedData[2 * i];
        right[i] = (float) interleavedData[2 * i + 1];
    }
    //return the separated left and right channels
    return new float[][]{left, right};
}
Format
If anyone would like more information about the format of the audio the following is everything I have.
Format is PCM 2 channel interleaved big endian linear int16
Sample rate is 44100
Number of shorts per short[] buffer is 2048
Number of frames per short[] buffer is 1024
Frames per packet is 1
I do understand that my audio will have two channels and thus the samples will be stored in the format [left][right][left][right]... What I don't understand is what exactly this means.
Interleaved PCM data is stored one sample per channel, in channel order before going on to the next sample. A PCM frame is made up of a group of samples for each channel. If you have stereo audio with left and right channels, then one sample from each together make a frame.
Frame 0: [left sample][right sample]
Frame 1: [left sample][right sample]
Frame 2: [left sample][right sample]
Frame 3: [left sample][right sample]
etc...
Each sample is a measurement and digital quantization of pressure at an instantaneous point in time. That is, if you have 8 bits per sample, you have 256 possible levels of precision that the pressure can be sampled at. Knowing that sound waves are... waves... with peaks and valleys, we are going to want to be able to measure distance from the center. So, we can define center at 127 or so and subtract and add from there (0 to 255, unsigned) or we can treat those 8 bits as signed (same values, just different interpretation of them) and go from -128 to 127.
Using 8 bits per sample with single channel (mono) audio, we use one byte per sample meaning one second of audio sampled at 44.1kHz uses exactly 44,100 bytes of storage.
Now, let's assume 8 bits per sample, but in stereo at 44.1 kHz. The bytes alternate: one byte for the left channel, then one for the right.
LRLRLRLRLRLRLRLRLRLRLR...
Scale it up to 16 bits, and you have two bytes per sample (samples set up with brackets [ and ], spaces indicate frame boundaries)
[LL][RR] [LL][RR] [LL][RR] [LL][RR] [LL][RR] [LL][RR]...
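The layouts above reduce to simple index arithmetic. A small sketch, with illustrative names, assuming channel 0 is left and channel 1 is right:

```java
class PcmLayout {
    // Byte offset of one sample inside an interleaved PCM buffer.
    static int byteOffset(int frame, int channel, int channels, int bytesPerSample) {
        return (frame * channels + channel) * bytesPerSample;
    }

    // Storage needed for one second of uncompressed PCM audio.
    static int bytesPerSecond(int sampleRate, int channels, int bytesPerSample) {
        return sampleRate * channels * bytesPerSample;
    }

    public static void main(String[] args) {
        // Right-channel sample of frame 2 in 16-bit stereo.
        System.out.println(byteOffset(2, 1, 2, 2));       // prints 10
        // 8-bit mono at 44.1 kHz.
        System.out.println(bytesPerSecond(44100, 1, 1));  // prints 44100
        // 16-bit stereo at 44.1 kHz.
        System.out.println(bytesPerSecond(44100, 2, 2));  // prints 176400
    }
}
```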
I have also read that each sample is stored in the format [left MSB][left LSB][right MSB][right LSB].
Not necessarily. The audio can be stored in any endianness. Little endian is the most common, but that isn't a magic rule. I do think though that all channels go in order always, and front left would be channel 0 in most cases.
Does this mean that each 16 bit integer actually encodes two 8 bit frames, or is each 16 bit integer its own frame destined for either the left or right channel?
Each value (16-bit integer in this case) is destined for a single channel. Never would you have two multi-byte values smashed into each other.
I hope that's helpful. I can't run your code, but given your description, I suspect you have an endian problem and that your samples aren't actually big endian.
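To illustrate what an endian mismatch does to a sample, this sketch reads the same two bytes in both byte orders using java.nio.ByteBuffer. The class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Interprets two bytes as one signed 16-bit sample in the given byte order.
    static short read(byte[] twoBytes, ByteOrder order) {
        return ByteBuffer.wrap(twoBytes).order(order).getShort();
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00};
        System.out.println(read(raw, ByteOrder.BIG_ENDIAN));    // prints 256
        System.out.println(read(raw, ByteOrder.LITTLE_ENDIAN)); // prints 1
        // Swapping the bytes of an already decoded short has the same effect.
        System.out.println(Short.reverseBytes((short) 256));    // prints 1
    }
}
```

A quiet sample read in the wrong order can turn into a loud one and vice versa, which is heard as heavy distortion.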
Let's start by getting some terminology out of the way
A channel is a monaural stream of samples. The term does not necessarily imply that the samples are contiguous in the data stream.
A frame is a set of co-incident samples. For stereo audio (e.g. L & R channels) a frame contains two samples.
A packet is 1 or more frames, and is typically the minimum number of frames that can be processed by a system at once. For PCM audio, a packet often contains 1 frame, but for compressed audio it will be larger.
Interleaving is a term typically used for stereo audio, in which the data stream consists of consecutive frames of audio. The stream therefore looks like L1R1L2R2L3R3......LnRn
Both big- and little-endian audio formats exist, and which one is used depends on the use case. However, byte order is generally only an issue when exchanging data between systems: you'll always use native byte order when processing audio or interfacing with operating system audio components.
You don't say whether you're using a little or big endian system, but I suspect it's probably the former. In which case you need to byte-reverse the samples.
Although it is not set in stone, floating-point samples are usually in the range -1.0<x<+1.0, so you want to divide the samples by 1<<15 (32768). When 16-bit linear types are used, they are typically signed.
Taking care of byte-swapping and format conversions:
int s = (int) interleavedData[2 * i];
// swap the two bytes of the 16-bit sample
short revS = (short) (((s & 0xff) << 8) | ((s >> 8) & 0xff));
// scale by 1<<15 so the result lies in [-1.0, 1.0)
left[i] = ((float) revS) / 32768.0f;
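Put together for both channels, a complete version might look like the sketch below. It uses Short.reverseBytes for the swap, and the swapBytes flag is an assumption added for the example so the same method can be tried with and without byte reversal.

```java
class Deinterleaver {
    // Splits interleaved 16-bit stereo samples into two float channels in [-1.0, 1.0).
    static float[][] deinterleave(short[] interleaved, boolean swapBytes) {
        int frames = interleaved.length / 2;
        float[] left = new float[frames];
        float[] right = new float[frames];
        for (int i = 0; i < frames; i++) {
            short l = interleaved[2 * i];
            short r = interleaved[2 * i + 1];
            if (swapBytes) {
                // Only needed when the data's byte order differs from what was decoded.
                l = Short.reverseBytes(l);
                r = Short.reverseBytes(r);
            }
            left[i] = l / 32768.0f;
            right[i] = r / 32768.0f;
        }
        return new float[][] {left, right};
    }

    public static void main(String[] args) {
        short[] data = {16384, -32768, 0, 8192};
        float[][] channels = deinterleave(data, false);
        System.out.println(channels[0][0] + " " + channels[1][0]); // prints 0.5 -1.0
        System.out.println(channels[0][1] + " " + channels[1][1]); // prints 0.0 0.25
    }
}
```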
Actually, you are dealing with an almost typical WAVE file at audio CD quality, that is to say:
2 channels
a sampling rate of 44100 Hz (44.1 kHz)
each amplitude sample quantized as a 16-bit signed integer
I said almost because big-endianness is usually used in AIFF files (Mac world), not in WAVE files (PC world). And I don't know without searching how to deal with endianness in Java, so I will leave this part to you.
How the samples are stored is quite simple:
each sample takes 16 bits (an integer from -32768 to +32767)
if channels are interleaved: (L,1),(R,1),(L,2),(R,2),...,(L,n),(R,n)
if channels are not: (L,1),(L,2),...,(L,n),(R,1),(R,2),...,(R,n)
Then, to feed an audio callback, you are usually required to provide 32-bit floating-point samples ranging from -1 to +1. This may be what is missing in your algorithm. Dividing your integers by 32768 (2^(16-1)) should make it sound as expected.
I ran into a similar issue with de-interleaving the short[] frames that came in through Spotify Android SDK's onAudioDataDelivered().
The documentation for onAudioDataDelivered was poorly written a year ago. See GitHub issue. They've since updated the docs with a better description and more accurate parameter names:
onAudioDataDelivered(short[] samples, int sampleCount, int sampleRate, int channels)
What can be confusing is that samples.length can be 4096. However, it contains only sampleCount valid samples. If you're receiving stereo audio, and sampleCount = 2048 there are only 1024 frames (each frame has two samples) of audio in samples array!
So you'll need to update your implementation to make sure you're working with sampleCount and not samples.length.
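A sketch of that change, reusing the parameter names from the signature above. The class and method are illustrative and not part of any SDK; the point is only that the frame count comes from sampleCount rather than samples.length.

```java
class ValidSampleDeinterleaver {
    // Deinterleaves only the first sampleCount entries of the buffer.
    static float[][] deinterleave(short[] samples, int sampleCount, int channels) {
        int frames = sampleCount / channels;
        float[][] out = new float[channels][frames];
        for (int frame = 0; frame < frames; frame++) {
            for (int ch = 0; ch < channels; ch++) {
                out[ch][frame] = samples[frame * channels + ch] / 32768.0f;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        short[] samples = new short[4096]; // buffer larger than the valid data
        float[][] channels = deinterleave(samples, 2048, 2);
        System.out.println(channels[0].length); // prints 1024
    }
}
```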
I am writing a program in Java which takes input from the user as to how many times a given image should be compressed, and compresses the image that many times.
Initially we convert the image into a pixel matrix, find the probability of each pixel value appearing in that matrix, and apply Huffman coding to obtain a code in the form of 0s and 1s.
Now if we try to compress it a second time, we will have only two probabilities, i.e. those of 0 and 1. Hence we can't apply the Huffman code now.
So what can be done in this situation?
You could apply an arithmetic code on two symbols. If, for example, there are many more zeros than ones then an arithmetic code would reduce the total number of bits by encoding the zeros each with less than one bit, and the ones with more than one bit. (This is done by considering the output bits to be a binary fraction, and each new input bit reducing the range of the binary fraction.)
However you will find after compressing using Huffman codes, that you will have very close to the same number of ones as zeros. It will not be compressible this way. Or really any way.
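One way to see this is to compute the Shannon entropy of the bit stream, treating the bits as independent: H(p) = -p log2(p) - (1-p) log2(1-p), where p is the fraction of ones. The sketch below uses illustrative names; H is the best achievable average number of output bits per input bit under that assumption.

```java
class BitEntropy {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Entropy in bits per symbol of a two-symbol source where P(1) = pOne.
    static double binaryEntropy(double pOne) {
        if (pOne <= 0.0 || pOne >= 1.0) {
            return 0.0; // a constant stream carries no information
        }
        double pZero = 1.0 - pOne;
        return -(pOne * log2(pOne) + pZero * log2(pZero));
    }

    public static void main(String[] args) {
        // Skewed stream: roughly 0.47 bits per input bit, so it can shrink.
        System.out.println(binaryEntropy(0.9));
        // Balanced stream, as after Huffman coding: about 1 bit per input bit.
        System.out.println(binaryEntropy(0.5));
    }
}
```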
I read this line in the Java tutorial:
byte: The byte data type is an 8-bit signed two's complement integer. It has
a minimum value of -128 and a maximum value of 127 (inclusive). The
byte data type can be useful for saving memory in large arrays, where
the memory savings actually matters. They can also be used in place of
int where their limits help to clarify your code; the fact that a
variable's range is limited can serve as a form of documentation.
I don't clearly understand the bold line. Can somebody explain it for me?
Byte has a (signed) range from -128 to 127, whereas int has an (also signed) range of −2,147,483,648 to 2,147,483,647.
What it means is that if the values you're going to use will always fall within that range, then by using the byte type you're telling anyone reading your code that this value will always be between -128 and 127, without having to document it.
Still, proper documentation is always key and you should only use it in the case specified for readability purposes, not as a replacement for documentation.
If you're using a variable whose maximum value is 127, you can use byte instead of int so that others know, without reading any later if conditions that may check the boundaries, that this variable can only have a value between -128 and 127.
So it's kind of self-documenting code - as mentioned in the text you're citing.
Personally, I do not recommend this kind of "documentation": the fact that a variable can only hold a maximum value of 127 doesn't reveal its real purpose.
Integers in Java are stored in 32 bits; bytes are stored in 8 bits.
Let's say you have an array with one million entries. Yikes! That's huge!
int[] foo = new int[1000000];
Now, for each of these integers in foo, you use 32 bits or 4 bytes of memory. In total, that's 4 million bytes, or 4MB.
Remember that an integer in Java is a whole number between -2,147,483,648 and 2,147,483,647 inclusively. What if your array foo only needs to contain whole numbers between, say, 1 and 100? That's a whole lot of numbers you aren't using, by declaring foo as an int array.
This is when byte becomes helpful. Bytes store whole numbers between -128 and 127 inclusively, which is perfect for what you need! But why choose bytes? Because they use one-fourth of the space of integers. Now your array is wasting less memory:
byte[] foo = new byte[1000000];
Now each entry in foo takes up 8 bits or 1 byte of memory, so in total, foo takes up only 1 million bytes or 1MB of memory.
That's a huge improvement over using int[] - you just saved 3MB of memory.
Clearly, you wouldn't want to use this for arrays that hold numbers that would exceed 127, so another way of reading the bold line you mentioned is, Since bytes are limited in range, this lets developers know that the variable is strictly limited to these bounds. There is no reason for a developer to assume that a number stored as a byte would ever exceed 127 or be less than -128. Using appropriate data types saves space and informs other developers of the limitations imposed on the variable.
I imagine one can use byte for anything dealing with actual bytes.
Also, the parts (red, green and blue) of colors commonly have a range of 0-255 (although byte is technically -128 to 127, but that's the same amount of numbers).
There may also be other uses.
The general objection I have to using byte (and probably why it isn't seen as often as it could be) is that a lot of casting is needed. For example, whenever you do arithmetic operations on a byte (except compound assignments such as +=), it is automatically promoted to int (even byte+byte), so you have to cast the result if you want to put it back into a byte.
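A small sketch of the promotion rule, with an illustrative class name. The result of byte + byte has type int, the explicit cast wraps it back into the byte range, and a compound assignment performs the same narrowing implicitly.

```java
class BytePromotion {
    // a + b has type int, so a narrowing cast is required to return a byte.
    static byte addWrapped(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte x = 100;
        byte y = 100;
        int wide = x + y;               // 200, no cast needed for an int target
        byte narrow = addWrapped(x, y); // -56, i.e. 200 - 256
        x += y;                         // compound assignment casts implicitly
        System.out.println(wide + " " + narrow + " " + x); // prints 200 -56 -56
    }
}
```

Writing byte z = x + y; without the cast does not compile.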
A very elementary example:
FileInputStream::read returns a byte wrapped in an int (or -1). This can be cast to a byte to make it clearer. I'm not supporting this example as such (because, at this moment, I don't really see the point of doing the below), just saying something similar may make sense.
It could also have returned a byte in the first place (and possibly thrown an exception if end-of-file). This may have been even clearer, but the way it was done does make sense.
FileInputStream file = new FileInputStream("Somefile.txt");
int val;
while ((val = file.read()) != -1)
{
byte b = (byte)val;
// ...
}
If you don't know much about FileInputStream, you may not know what read returns, so you see an int and you may assume the valid range is the entire range of int (-2^31 to 2^31-1), or possibly the range of a char (0-65535) (not a bad assumption for file operations), but then you see the cast to byte and you give that a second thought.
If the return type were to have been byte, you would know the valid range from the start.
Another example:
One of Color's constructors could have been changed from three ints to three bytes instead, since their range is limited to 0-255.
It means that knowing a value is explicitly declared as a very small number might help you recall its purpose.
Write real docs when you have to document your code, though; relying on data types is not documentation.
An int can hold 4,294,967,296 (2 to the 32nd power) distinct values; in Java it is signed, so the range is -2,147,483,648 to 2,147,483,647. This is a huge range, and if you are scoring a test that is out of 100, you are wasting that extra space, since all of your numbers are between 0 and 100. Ints simply take more memory and hard disk space to store, and in serious data-driven applications this translates to money wasted if you are not using the extra range that ints provide.
Byte data types are generally used when you want to handle data in the form of streams, either from a file or from the network. The reason is that networks and files work on the concept of bytes.
Example: FileOutputStream's write methods accept a byte array.
I need to read in a couple of extremely large strings which are composed of binary digits. These strings can be extremely large (up to 100,000 digits), and I need to store them, manipulate them (flip bits) and add them together. My first thought was to split the string into 8 character chunks, convert them to bytes and store them in an array. This would allow me to flip bits with relative ease given the index of the bit to be flipped, but with this approach I'm unsure how I would go about adding the entirety of the two values together.
Can anyone see a way of storing these values in a memory efficient manner which would still allow me to perform calculations on them?
EDIT:
"add together" (concatenate? arithmetic addition?) - arithmetic addition
My problem is that in the hardest case I have two 100,000 bit numbers (stored in an array of 12,500 bytes). Storing and manually flipping bits isn't an issue, but I need the sum of both numbers and then to be able to find out what the xth bit of this is.
"Strings of binary digits" definitely sound like byte arrays to me. To "add" two such byte arrays together, you'd just allocate a new byte array which is big enough to hold everything, and copy the contents using System.arraycopy.
However that assumes each "string" is a multiple of 8 bits. If you want to "add" a string of 15 bits to another string of 15 bits, you'll need to do bit-shifting. Is that likely to be a problem for you? Depending on what operations you need, you may even want to just keep an object which knows about two byte arrays and can find an arbitrary bit in the logically joined "string".
Either way, byte[] is going to be the way forward - or possibly BitSet.
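What about
// Addition, byte by byte (no carry between bytes)
byte[] resArr = new byte[byteArr1.length];
for (int i = 0; i < byteArr1.length; i++)
{
    // byte + byte is promoted to int, so cast back to byte
    resArr[i] = (byte) (byteArr1[i] + byteArr2[i]);
}
?
Is it something like this you are trying to do?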
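Since the question's edit specifies arithmetic addition, one option not mentioned in the answers above is java.math.BigInteger, which parses a string of binary digits, handles the carries, and can test or flip individual bits. This is only a sketch with illustrative names; note that BigInteger numbers bits from the least significant end, so bit 0 is the last character of the string.

```java
import java.math.BigInteger;

class BinaryStringSum {
    // Parses a string of '0' and '1' characters as a non-negative number.
    static BigInteger parse(String bits) {
        return new BigInteger(bits, 2);
    }

    public static void main(String[] args) {
        BigInteger a = parse("1011"); // 11
        BigInteger b = parse("0110"); // 6

        BigInteger sum = a.add(b);               // 17
        System.out.println(sum.toString(2));     // prints 10001
        System.out.println(sum.testBit(4));      // prints true
        System.out.println(sum.testBit(1));      // prints false

        BigInteger flipped = a.flipBit(0);       // 1011 -> 1010
        System.out.println(flipped.toString(2)); // prints 1010
    }
}
```

BigInteger values are immutable, so each flip creates a new object; if bits are flipped very often, a byte[] or BitSet with a manual carry loop may be worth comparing.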