Compressing a given image n times using the Huffman coding technique

I am writing a Java program that asks the user how many times a given image should be compressed and then compresses it that many times.
Initially we convert the image into a pixel matrix, find the probability of each pixel value appearing in that matrix, and apply Huffman coding to obtain a code made up of 0s and 1s.
If we now try to compress a second time, there are only two symbols with probabilities, 0 and 1, so Huffman coding cannot shorten anything.
What can be done in this situation?

You could apply an arithmetic code on two symbols. If, for example, there are many more zeros than ones then an arithmetic code would reduce the total number of bits by encoding the zeros each with less than one bit, and the ones with more than one bit. (This is done by considering the output bits to be a binary fraction, and each new input bit reducing the range of the binary fraction.)
However you will find after compressing using Huffman codes, that you will have very close to the same number of ones as zeros. It will not be compressible this way. Or really any way.
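As a rough illustration of the point above, the following sketch computes the zeroth-order Shannon entropy of a bit stream from its counts of zeros and ones. The class and method names (`BitEntropy`, `entropyPerBit`) are made up for this example. A result well below 1.0 means a two-symbol arithmetic coder has room to shrink the data. A result near 1.0, which is what Huffman output typically gives, means it does not. This measure looks only at the counts, not at patterns between bits.

```java
final class BitEntropy {
    // Entropy in bits per input bit, given how many zeros and ones were seen.
    // A stream containing only one symbol carries no information, so it yields 0.
    static double entropyPerBit(long zeros, long ones) {
        long total = zeros + ones;
        if (zeros == 0 || ones == 0) {
            return 0.0;
        }
        double p0 = (double) zeros / total;
        double p1 = (double) ones / total;
        return -(p0 * log2(p0) + p1 * log2(p1));
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }
}
```

For example, 900 zeros and 100 ones give roughly 0.47 bits per bit, so an ideal coder could approach half the original size, whereas 500 zeros and 500 ones give about 1.0 and leave nothing to gain.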

Related

How to store a n*n matrix consisting of integers into a bitmask value?

I have an n*n matrix; let us take n as 3.
So there will be a total of 9 elements. My matrix stores integers. When I change the integer values of the 9 elements, I get distinct matrices.
Now I want to use these grids as keys in a HashMap.
An easy way to do this would be to convert my matrix into some compact format (maybe bits). So the matrix would be encoded, and when I retrieve it I must decode it.
I read that bit masks work well here, but I am unable to figure out how to create a bit mask out of a grid in Java.
I am looking for a solution that shows, through an example, how to do it, along with the logic flow for encoding and decoding the grid to and from a bit mask.

Joining two (or more) byte[] of wav-sound. Gives backgroundnoise

I am trying to join byte arrays of WAV sound, and it works except for background noise. Does anyone know an algorithm for adding two byte arrays of sound?
This is what I have tried so far:
for(int i=0;i<bArr1.length;i++)
{
    bArrJoined[i]=bArr1[i] + bArr2[i];
}
I also tried dividing by 2 so the numbers do not get too high:
for(int i=0;i<bArr1.length;i++)
{
    bArrJoined[i]=(bArr1[i] + bArr2[i]) / 2;
}
Does anyone know how to make this work without the noise?

A number of things could cause artifacts here. Different audio sampling rates or data bit sizes could do it.
Assuming those are non-issues, you should be aware you can't add a byte with another byte without overflow (256 will become 0, etc.). So convert to int before adding. Clipping will occur if you exceed the max volume, so your divide by 2 operation is smart and should stop that issue. The divide operation should occur with the int versions. Only cast back to byte at the end.
However, if you aren't working with 8-bit audio, then a byte is not your atomic unit. For example, 16-bit audio uses 2 bytes and you would need to convert every two consecutive bytes to an int (with respect to proper endianness) before you perform any mathematical operations on the values. 32-bit audio data occupies 4 consecutive bytes for each single numeric value. Just having an array of bytes does not in itself tell you where the data boundaries are.
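A minimal sketch of the 16-bit case described above, assuming both arrays contain only raw sample data (no WAV header), use signed little-endian 16-bit samples (the usual layout for PCM WAV data), and share the same sample rate and channel count. `PcmMixer` and `mix16BitLittleEndian` are illustrative names, not part of any library.

```java
final class PcmMixer {
    // Averages two buffers of signed 16-bit little-endian PCM samples.
    // Only whole samples present in both buffers are mixed.
    static byte[] mix16BitLittleEndian(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length) & ~1; // round down to an even byte count
        byte[] out = new byte[length];
        for (int i = 0; i < length; i += 2) {
            // Rebuild each sample: low byte first, high byte carries the sign.
            int sampleA = (short) ((a[i] & 0xFF) | (a[i + 1] << 8));
            int sampleB = (short) ((b[i] & 0xFF) | (b[i + 1] << 8));
            // Do the arithmetic in int, then halve so the result cannot clip.
            int mixed = (sampleA + sampleB) / 2;
            // Split back into two bytes only at the very end.
            out[i] = (byte) mixed;
            out[i + 1] = (byte) (mixed >> 8);
        }
        return out;
    }
}
```

Averaging halves the volume of each source. Summing and clamping to the range -32768..32767 is a common alternative when the inputs are quiet.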

Q: use rejection sampling for true random number generation in a range (entropy from radioactive decay)

Hi everyone, I have been doing some reading and have come across true random number generation using the entropy from radioactive decay. I have written a helper tool that returns the next random byte. It uses a server that provides bits from such a setup; I believe its data comes from decaying cesium. I have done quite a bit of searching and have not really been able to figure out how to use this to generate numbers in a range from 0..n-1.
A user on the unofficial SO IRC told me this:
If you have a random byte, 0..255 evenly distributed, and you want a random number in the range 0..5, there are 6 values in the output range and 256 in the input range. The greatest multiple of 6 that is <= 256 is 252, so you would sample your random byte until you get a number in the range 0..251; then you could take the number MOD 6 to get your output number.
I'm not really sure how to sample the byte. Do I use a single byte, or do I have to continually request more bytes? I'm really just having a hard time wrapping my head around this, so any thorough explanation that avoids obscure math notation would be extremely appreciated.
Thanks.

"Sampling" means (Disclaimer: not the dictionary definition) "repeatedly checking for a value", so in your case you'd read bytes until you get one in the proper range, discarding the others.
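The procedure from the quoted IRC advice can be sketched as follows. The byte source is passed in as an `IntSupplier`, which stands in for the asker's helper tool; `RejectionSampler` and `nextInRange` are names invented for this example, and the sketch only covers ranges of up to 256 values.

```java
import java.util.function.IntSupplier;

final class RejectionSampler {
    // Returns a uniformly distributed value in 0..n-1, for 1 <= n <= 256,
    // given a source of uniformly distributed bytes.
    static int nextInRange(IntSupplier nextByte, int n) {
        if (n < 1 || n > 256) {
            throw new IllegalArgumentException("n must be between 1 and 256");
        }
        // Greatest multiple of n that is <= 256, e.g. 252 for n = 6.
        int limit = 256 - (256 % n);
        while (true) {
            int value = nextByte.getAsInt() & 0xFF; // treat the byte as 0..255
            if (value < limit) {
                return value % n;
            }
            // value fell into the leftover tail (252..255 for n = 6): discard it
            // and request a fresh byte.
        }
    }
}
```

Each attempt uses one byte, and a new byte is requested only when the previous one is rejected. For n = 6 that happens with probability 4/256, so most calls consume exactly one byte. Ranges larger than 256 would need several bytes combined into one wider value before the same test is applied.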

Integer compression in java

I have a sequence of Integers in the following format:
Integer1 Integer2 Integer3 Integer4 Integer5 ....
Every four consecutive integers correspond to the values of a single record, so I cannot really reorder them.
What would be the best way to compress such a file?
Updates:
1- The values are independent of each other. Every 4 consecutive integers represent a record, for example:
CustomerId PurchaseId Products MoneySpent
Each hold an integer value.
2- Ideally I would like to have it compressed as an object and on disk.
Thanks

The simplest and most compatible approach is to GZIP the file as you write it, by wrapping your output stream in a GZIPOutputStream, and to read it back through a GZIPInputStream.
// Reading (classes from java.io and java.util.zip): decompress on the fly.
InputStream in = new BufferedInputStream(new GZIPInputStream(new FileInputStream(filename)));
// Writing: compress on the fly; close the stream so the GZIP trailer is written.
OutputStream out = new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(filename)));
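A small round-trip sketch of this approach, writing the integers through a `DataOutputStream` wrapped around the GZIP stream. It compresses to an in-memory byte array so that it is self-contained. The class name `GzipInts` and the choice to store the count first are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipInts {
    // Writes the element count followed by each int, all through GZIP.
    static byte[] compress(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after the stream is closed
    }

    // Reads back what compress() wrote.
    static int[] decompress(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To store the data on disk, the `ByteArrayOutputStream` and `ByteArrayInputStream` can be replaced with buffered `FileOutputStream` and `FileInputStream` instances as in the two lines above.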

Using GZip this way is not optimal. Your OrderId, PurchaseId, ProductId and MoneySpent values differ from each other, but all OrderIds have something in common, as do all PurchaseIds, ProductIds and MoneySpent values. So it is best to store those values column-wise rather than row-wise.
Since the table you are about to store usually has a sort order, one column can be expressed by delta values. For example, if you sort your values by OrderId, you can express the sequence 10, 23, 44, 53 as +10, +13, +21, +9. These numbers are smaller and more likely to repeat than the original numbers.
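A minimal sketch of delta encoding for one sorted column. `DeltaCodec` is an illustrative name, and the first delta is taken relative to 0 so that the first value is stored as-is.

```java
final class DeltaCodec {
    // Replaces each value with its difference from the previous one.
    static int[] encode(int[] sorted) {
        int[] deltas = new int[sorted.length];
        int previous = 0;
        for (int i = 0; i < sorted.length; i++) {
            deltas[i] = sorted[i] - previous;
            previous = sorted[i];
        }
        return deltas;
    }

    // Rebuilds the original values with a running sum.
    static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }
}
```

Encoding the sequence 10, 23, 44, 53 produces 10, 13, 21, 9, and decoding restores the original. The other columns of each record would have to be reordered along with the sorted column so that the rows still line up.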
Integer values can be expressed as variable-bit-length information. First you store the number of bits of the value and then the actual value. This way you save a lot of leading zeros.
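The idea can be sketched as follows. To keep the example readable, the bits are collected in a `String` of '0' and '1' characters rather than packed into bytes, and the length prefix is a fixed 5 bits, which is enough for non-negative `int` values. Both choices, and the name `BitLengthCodec`, are assumptions made for this illustration.

```java
final class BitLengthCodec {
    static final int PREFIX_BITS = 5; // holds bit lengths 1..31

    // For each value: 5 bits giving its bit length, then exactly that many value bits.
    static String encode(int[] values) {
        StringBuilder bits = new StringBuilder();
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("only non-negative values are supported");
            }
            int length = Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
            appendBits(bits, length, PREFIX_BITS);
            appendBits(bits, v, length);
        }
        return bits.toString();
    }

    // Reads back 'count' values written by encode().
    static int[] decode(String bits, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int length = Integer.parseInt(bits.substring(pos, pos + PREFIX_BITS), 2);
            pos += PREFIX_BITS;
            values[i] = Integer.parseInt(bits.substring(pos, pos + length), 2);
            pos += length;
        }
        return values;
    }

    // Appends the lowest 'width' bits of value, most significant first.
    private static void appendBits(StringBuilder sb, int value, int width) {
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
    }
}
```

The four deltas 10, 13, 21, 9 take 37 bits in this form instead of 128 bits as four plain 32-bit integers. A real implementation would pack the bits into a byte array.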
For money spent, you can also exploit the repetition of typical cent values such as 99, 25, 50, 49 and so on. A product is more likely to be priced at 49.99 than at 51.23. So splitting the money integer into two values lets you use Huffman encoding, treating the special values as symbols and the rest as run-length bits.
To express the bit length, you can also use different encoding schemes. One would be, yet again, a Huffman code of 64 symbols (64 different lengths) trained on your data. This way you end up with far fewer bits than when writing plain integers or even longs.
The remainder can be passed to gzip. How well this works depends on how you express the bit length, since leading zeros are easier to compress than varying bit-length information, but every compression step has a cost.
Another coding scheme for bit lengths is using the min max approach.
For example, for the above sequence 10, 23, 44, 53 we store 10, +43 (53), +13, +23. The idea is that once 10 and 53 are known, every remaining value lies within a range of 43, so the next value needs at most 6 bits (2^6 = 64). This way there is no need for bit-length information. You just store the sequence in the order first minimum, next maximum, next minimum, next maximum and so on.
A more efficient scheme uses the order minimum, maximum, middle, middle left, middle right, middle left left, middle left right, middle right left, middle right right, and so on. This gives you the best chance of knowing the smallest possible bit length in advance, and it results in very small integers without additional bit-length information.
With such schemes, GZip can often reduce the result by less than a further 10%, so GZip may be omitted altogether.
[Summary]
So GZip is simple. If you need to squeeze out more, go column-wise instead of row-wise and use special knowledge of each column. If a column is sorted, represent it by deltas. Expressing bit-length information with Huffman codes (one per column) and splitting product prices into dollar and cent values often gives very good compression. Store sorted columns as deltas and use the tree-wise storage order, which gives very good knowledge of the bit length to expect next.

How to manage and manipulate extremely large binary values

I need to read in a couple of extremely large strings composed of binary digits. These strings can be up to 100,000 digits long, and I need to store them, manipulate them (flip bits) and add them together. My first thought was to split each string into 8-character chunks, convert them to bytes and store them in an array. This would let me flip bits with relative ease given the index of the bit to flip, but with this approach I'm unsure how I would add the two values together in their entirety.
Can anyone see a way of storing these values in a memory-efficient manner that would still allow me to perform calculations on them?
EDIT:
"add together" (concatenate? arithmetic addition?) - arithmetic addition
My problem is that in the hardest case I have two 100,000 bit numbers (stored in an array of 12,500 bytes). Storing and manually flipping bits isn't an issue, but I need the sum of both numbers and then to be able to find out what the xth bit of this is.

"Strings of binary digits" definitely sound like byte arrays to me. To "add" two such byte arrays together, you'd just allocate a new byte array which is big enough to hold everything, and copy the contents using System.arraycopy.
However that assumes each "string" is a multiple of 8 bits. If you want to "add" a string of 15 bits to another string of 15 bits, you'll need to do bit-shifting. Is that likely to be a problem for you? Depending on what operations you need, you may even want to just keep an object which knows about two byte arrays and can find an arbitrary bit in the logically joined "string".
Either way, byte[] is going to be the way forward - or possibly BitSet.

What about
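// Addition
byte[] resArr = new byte[byteArr1.length];
for (int i = 0; i < byteArr1.length; i++)
{
    // byte + byte yields an int in Java, so cast back; no carry is propagated between bytes here
    resArr[i] = (byte) (byteArr1[i] + byteArr2[i]);
}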
?
Is it something like this you are trying to do?
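Since the edit to the question asks for arithmetic addition, the per-byte sum needs to carry into the next byte. The following is a sketch under stated assumptions: both numbers are unsigned and stored most significant byte first, which is what you get when a binary string is cut into 8-character chunks from the left after padding it to a multiple of 8. The names `BigBinary`, `add` and `testBit` are chosen for this example.

```java
final class BigBinary {
    // Adds two unsigned big-endian numbers. The result has one extra leading
    // byte so that a final carry always fits.
    static byte[] add(byte[] a, byte[] b) {
        int n = Math.max(a.length, b.length);
        byte[] sum = new byte[n + 1];
        int carry = 0;
        // Walk from the least significant byte (the end of each array) upward.
        for (int i = 0; i < n; i++) {
            int x = i < a.length ? (a[a.length - 1 - i] & 0xFF) : 0;
            int y = i < b.length ? (b[b.length - 1 - i] & 0xFF) : 0;
            int s = x + y + carry;
            sum[n - i] = (byte) s; // keep the low 8 bits
            carry = s >>> 8;       // 0 or 1
        }
        sum[0] = (byte) carry;
        return sum;
    }

    // Returns bit number bitIndex, where bit 0 is the least significant bit.
    static boolean testBit(byte[] number, int bitIndex) {
        int byteIndex = number.length - 1 - bitIndex / 8;
        return ((number[byteIndex] >> (bitIndex % 8)) & 1) == 1;
    }
}
```

Two 12,500-byte inputs are summed in a single pass, after which `testBit(sum, x)` answers the "xth bit" question. The standard library class `java.math.BigInteger` offers `add`, `flipBit` and `testBit` as well, and can be built from a binary string with `new BigInteger(s, 2)`, which may be simpler than maintaining this by hand.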
