Create a BitSet's toByteArray method (Wrong byte order) - java

So I'm working on a BitTorrent project. I need to create a bitfield according to the number of pieces, so I use BitSet. The problem is that the toByteArray method doesn't return the bytes in the order I want.
Eg:
//number of piece=11
bitSet.set(5,16); //following bittorrent specification
bitSet.toByteArray() -> 0xe0ff (these are the bytes that I get)
But what I want is 0xffe0
Thanks in advance.

This is due to a big-endian vs. little-endian mismatch. BitSet is strictly bit-wise little-endian: bit n lives in byte n/8 at bit position n%8. BitSet#toByteArray() packs the bits into bytes for you, but it outputs those bytes in little-endian order. So you'll have to rearrange the bytes yourself to match the desired order. The desired order will depend on the size of the "word" in your output data structure.
If the output is a short, you swap bytes 2 at a time; if it's an int you'll need to reverse each set of 4 output bytes, and for a long each set of 8. It's possible to do this with a ByteBuffer but it's probably more work, unless you're already using ByteBuffers.
Unfortunately, there isn't a BitSet.toShortArray(), which would do what you want.
You might find the Wikipedia article Endianness useful.
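As a minimal sketch of the byte rearrangement described above, the following reverses the whole array returned by toByteArray(). It assumes the bitfield is treated as one big-endian word, which matches the 2-byte example in the question; the class and method names are made up for illustration. Because toByteArray() drops trailing zero bytes, the helper takes the expected length explicitly.

```java
import java.util.BitSet;

final class BitfieldBytes {
    // Returns the BitSet as exactly `length` bytes, in the reverse of the
    // byte order produced by BitSet.toByteArray().
    static byte[] toReversedBytes(BitSet bits, int length) {
        byte[] littleEndian = bits.toByteArray(); // trailing zero bytes are omitted
        if (littleEndian.length > length) {
            throw new IllegalArgumentException("BitSet does not fit in " + length + " bytes");
        }
        byte[] out = new byte[length];
        for (int i = 0; i < littleEndian.length; i++) {
            out[length - 1 - i] = littleEndian[i];
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet(16);
        bitSet.set(5, 16); // same call as in the question
        byte[] bytes = toReversedBytes(bitSet, 2);
        System.out.printf("%02x%02x%n", bytes[0], bytes[1]); // prints ffe0
    }
}
```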

Related

I wanted to Convert any length String to fixed 32 Bytes

I want to convert any length of String to byte32 in Java.
Code
String s="9c46267273a4999031c1d0f7e40b2a59233ce59427c4b9678d6c3a4de49b6052e71f6325296c4bddf71ea9e00da4e88c4d4fcbf241859d6aeb41e1714a0e";
//Convert into byte32
From the comments it became clear that you want to reduce the storage space of that string to 32 bytes.
The given string can easily be compressed from the 124 bytes to 62 bytes by doing a hexadecimal conversion.
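A small sketch of that hexadecimal conversion, assuming the input contains only hex digits: every two characters become one byte. The class and method names are hypothetical.

```java
final class HexToBytes {
    // Decodes a string of hex digits into half as many bytes.
    static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit in pair " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "9c46267273a4999031c1d0f7e40b2a59233ce59427c4b9678d6c3a4de49b6052"
                 + "e71f6325296c4bddf71ea9e00da4e88c4d4fcbf241859d6aeb41e1714a0e";
        byte[] raw = decodeHex(s);
        System.out.println(s.length() + " chars -> " + raw.length + " bytes"); // 124 chars -> 62 bytes
    }
}
```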
However, there is no algorithm and there will not be an algorithm that can compress any data to 32 bytes. Imagine that would be possible: it would have been implemented and you would be able to get ZIP files of just 32 bytes for any file you compress.
So, unfortunately, the answer is: it's not possible.
You can not convert any length string to a byte array of length 32.
Java uses UTF-16 as its string encoding, so in order to store 100% of the string 1:1 in a fixed-length 32-byte array, you would at first glance be limited to 16 characters.
If you are willing to live with the limitation of 16 characters, byte[] bytes = s.getBytes(); should give you a variable length byte array, but it uses the platform default charset, so it's best to specify an explicit encoding, e.g. byte[] array2 = str.getBytes("UTF-16BE"); (plain "UTF-16" prepends a 2-byte byte-order mark, which would count against the 32 bytes).
This doesn't completely solve your problem. You will now likely have to check that the byte array doesn't exceed 32 bytes, and come up with strategies for padding and possibly null termination (which may eat into your character budget).
Now, if you don't need the entire UTF-16 string space that Java uses for strings by default, you can get away with longer strings, by using other encodings.
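The length check and padding could look roughly like this. It is only a sketch under two assumptions that the thread does not settle: the encoding is UTF-8, and short values are padded on the right with zero bytes. The names are hypothetical, and any external standard may prescribe something different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedBytes32 {
    // Encodes s as UTF-8 and pads it with trailing zero bytes to exactly 32 bytes.
    // Strings that need more than 32 bytes are rejected rather than truncated.
    static byte[] toBytes32(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > 32) {
            throw new IllegalArgumentException(
                "needs " + encoded.length + " bytes, limit is 32");
        }
        return Arrays.copyOf(encoded, 32); // copyOf zero-fills the extra positions
    }

    public static void main(String[] args) {
        byte[] fixed = toBytes32("hello");
        System.out.println(fixed.length); // prints 32
    }
}
```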
If this is to be used for some other standard (I see references to Ethereum being thrown around), then you will need to follow that standard.
Unless you are writing your own library for dealing with it directly, I highly recommend using a library that already exists, and appears to be well tested, and used.
You can achieve this with the following call:
byte[] bytes = s.getBytes();

Where do we use BitSet and why do we use it in java?

I just found out that there is BitSet in java. There are already arrays and similar data structures. Where can BitSet be used?
As the other answer only explains what a BitSet is, I am providing here an answer on how I use BitSet and why. At first, I did not know that BitSet existed. I have a QR Code generator in C++, and for flexibility I don't want to use a specific bitmap structure when returning the QR Code to the caller. The QR Code is just black and white and can be represented as a series of bits. The problem was that in the JNI C++ code I have to return the byte array that represents this series of bits, and I also have to return the count of bits. Note that the size of the byte array alone cannot tell you the count of bits. In effect, I am faced with a scenario in which my JNI C++ code has to return two values:
the byte[] array
the count of bits
My first solution was to return an array of booleans. The contents of this array are the QR Code pixels, and the square root of the array's length is the length of a side. Of course this worked, but it felt wasteful because the data is supposed to be a series of bits. My next attempt was to return a Pair<int, byte[]> object which, after lots of hair pulling, I was not able to make work in C++. Here comes the BitSet(145) construct. By returning this BitSet object, I am conveying both pieces of information listed above. But there is a minor trick. If the QR Code has 144 pixels in total, because one side is 12, then you have to allocate BitSet(145) and call obj.set(144). That is, we introduce an artificial last bit that is always set but is not part of the QR Code pixels. This ensures that BitSet::length() returns the pixel count plus one, whatever the pixel values are. So in Kotlin:
val pixels: BitSet = getqrpixels(inputdata)
val pixelsLen = pixels.length() - 1
val side = sqrt(pixelsLen.toFloat()).toInt()
drawSquareBitmap(pixels, side)
And that is my unexpected use case for this mysterious BitSet.
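The sentinel-bit trick can be reproduced in plain Java as follows. This is a sketch only: the class and method names are hypothetical, and a boolean[] stands in for whatever the native side actually produces.

```java
import java.util.BitSet;

final class SentinelBitDemo {
    // Packs the pixels into a BitSet and sets one extra bit just past the last
    // pixel, so that length() does not depend on the pixel values.
    static BitSet pack(boolean[] pixels) {
        BitSet bits = new BitSet(pixels.length + 1);
        for (int i = 0; i < pixels.length; i++) {
            if (pixels[i]) {
                bits.set(i);
            }
        }
        bits.set(pixels.length); // sentinel, not a pixel
        return bits;
    }

    // length() is the index of the highest set bit plus one, i.e. pixels + sentinel.
    static int pixelCount(BitSet bits) {
        return bits.length() - 1;
    }

    public static void main(String[] args) {
        BitSet bits = pack(new boolean[144]); // all-white 12 x 12 code
        int count = pixelCount(bits);
        System.out.println(count + " pixels, side " + (int) Math.sqrt(count)); // 144 pixels, side 12
    }
}
```

Without the sentinel, an all-false pixel array would give a BitSet whose length() is 0, so the size would be lost.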
Take a look at this:
https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html
A BitSet is a vector of bits. Each entry in the list is either true (1) or false (0). The BitSet class comes with methods that resemble the bitwise operators. It is a little bit more flexible than a normal binary type.
BitSet, unlike a boolean[], is actually a dynamically sized bitmask. Essentially, instead of using one boolean per value, it uses longs, where each of a long's 64 bits stores one value.
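To make the long-based storage visible, here is a small sketch that sets a few bits and then inspects the result through toLongArray(). The helper name is made up for the example.

```java
import java.util.BitSet;

final class BitSetWords {
    // Sets the given bit indices in a fresh BitSet and returns its backing
    // words as exposed by toLongArray(): bit n ends up in word n / 64.
    static long[] wordsFor(int... indices) {
        BitSet bits = new BitSet(); // no size given; the set grows as bits are set
        for (int index : indices) {
            bits.set(index);
        }
        return bits.toLongArray();
    }

    public static void main(String[] args) {
        long[] words = wordsFor(0, 65);
        System.out.println(Long.toBinaryString(words[0])); // prints 1  (bit 0 of word 0)
        System.out.println(Long.toBinaryString(words[1])); // prints 10 (bit 1 of word 1)
        System.out.println(wordsFor(1000).length);         // prints 16 (words 0..15)
    }
}
```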

Huffman compress file (Got the tree but can't compress)- Java

Alright so I am trying to do a file compress using the Huffman tree.
We got the tree that is working just fine but we are unable to figure out how to write the binary string we get into the file.
So for example our tree returns: '110', it should mean this byte: '00000110' right?
And if it returns: '11111111 11111110', what should that mean? Should we just write it out as bytes?
So the question is: how do we convert the binary string we get into bytes so we can write it to the file?
Thanks a lot,
Ara
So for example our tree returns: '110', it should mean this byte:
'00000110' right?
Wrong. You should have a byte buffer of bits into which you write your bits. Write the three bits 110 into the byte. (You will need to decide on a convention for bit ordering in the byte.) You still have five unused bits in the byte, so there it sits. Now you write 10 into the buffer. The byte buffer now has 11010, and three unused bits. So still it sits. Now you try to write 111011 into the byte buffer. The first three bits go into the byte buffer, giving you 11010111. You now have filled the buffer, so only now do you write out your byte to the file. You are left with 011. You clear your byte buffer of bits since you wrote it out, and put in the remaining 011 from your last code. Your byte buffer now has three bits in it, and five bits unused. Continue in this manner.
The buffer does not have to be one byte. 16-bit or 32-bit buffers are common and are more efficient. You write out bytes whenever the bits therein are eight or more, and shift the remaining 0-7 bits to the start of the buffer.
The only tricky part is what to do at the end, since you may have unused bits in your last byte. Your Huffman codes should have an end symbol to mark the end of the stream. Then you know when you should stop looking for more Huffman codes. If you do not have an end code, then you need to ensure somehow that the remaining bits in the byte cannot be a complete Huffman code, or you need to indicate in some other way where the stream of bits ends.
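The bit buffer described above might be sketched like this. The assumptions are mine, not the answer's: codes arrive as strings of '0' and '1', bits are filled from the most significant end of each byte, and the final partial byte is padded with zero bits. The class name is hypothetical, and the end-of-stream marker still has to be handled separately.

```java
import java.io.ByteArrayOutputStream;

final class BitWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer; // pending bits, right-aligned
    private int count;  // number of pending bits, always 0..7 between calls

    // Appends one Huffman code; a byte is emitted each time 8 bits are pending.
    void write(String code) {
        for (int i = 0; i < code.length(); i++) {
            buffer = (buffer << 1) | (code.charAt(i) == '1' ? 1 : 0);
            if (++count == 8) {
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }
    }

    // Pads the last partial byte with zero bits and returns everything written.
    byte[] finish() {
        if (count > 0) {
            out.write(buffer << (8 - count));
            buffer = 0;
            count = 0;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BitWriter writer = new BitWriter();
        writer.write("110");
        writer.write("10");
        writer.write("111011"); // completes 11010111, leaves 011 pending
        for (byte b : writer.finish()) {
            System.out.printf("%02x ", b); // prints d7 60
        }
        System.out.println();
    }
}
```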

Why no readUnsignedInt in RandomAccessFile class?

I just found there is no readUnsignedInt() method in the RandomAccessFile class. Why? Is there any workaround to read an unsigned int out from the file?
Edit:
I want to read an unsigned int from file and put it into a long space.
Edit2:
I cannot use readLong(); it will read 8 bytes, not 4. The file stores each unsigned int in 4 bytes.
Edit3:
Found answer here: http://www.petefreitag.com/item/183.cfm
Edit4:
What if the data file is little-endian? Do we need to swap the bytes first?
I'd do it like this:
long l = file.readInt() & 0xFFFFFFFFL;
The bit mask is necessary because widening the int to a long sign-extends negative values.
Concerning the endianness: to the best of my knowledge all I/O in Java is done in big-endian fashion. Of course, often it doesn't matter (byte arrays, UTF-8 encoding, etc. are not affected by endianness) but many methods of DataInput are. If your number is stored in little endian, you have to convert it yourself. The only facility in standard Java I know of that allows configuration of endianness is ByteBuffer via the order() method, but then you open the gate to NIO and I don't have a lot of experience with that.
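One way to do that conversion by hand is Integer.reverseBytes, applied to the value that readInt() returned. The sketch below uses a DataInputStream over an in-memory array as a stand-in for the file, since RandomAccessFile exposes the same readInt() through DataInput; the helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class UnsignedIntReads {
    // Interprets a big-endian readInt() result as an unsigned 32-bit value.
    static long unsignedBigEndian(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    // Same, for data stored little-endian: swap the four bytes first.
    static long unsignedLittleEndian(int raw) {
        return Integer.reverseBytes(raw) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] fileBytes = {(byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            int raw = in.readInt(); // always read as big-endian
            System.out.println(unsignedBigEndian(raw));    // prints 4278190079
            System.out.println(unsignedLittleEndian(raw)); // prints 4294967294
        }
    }
}
```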
Edited to remove readLong():
You could use readFully(byte[] b, int off, int len) and then convert to Long with the methods here: How to convert a byte array to its numeric value (Java)?
Because there is no unsigned int type in java?
Why not readLong() ?
You can readLong and then take first 32 bits.
Edit
You can try
long value = Long.parseLong(Integer.toHexString(file.readInt()), 16);
Depending on what you are doing with the int, you may not need to turn it into a long. You just need to be aware of the operations you are performing. After all, it's just 32 bits and you can treat it as signed or unsigned as you wish.
If you want to play with the ByteOrder, the simplest thing to do may be to use ByteBuffer which allows you to set a byte order. If your file is less than 2 GB, you can map the entire file into memory and access the ByteBuffer randomly.
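A minimal sketch of the ByteBuffer approach: here a wrapped byte array stands in for the mapped file, since a buffer obtained from FileChannel.map offers the same order() and getInt(int) methods. The helper name and its parameters are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferUnsigned {
    // Reads the 4 bytes at `offset` in the given byte order and widens them
    // to a long without sign extension. The caller's buffer is left untouched
    // because the order is set on a duplicate.
    static long readUnsignedInt(ByteBuffer buf, int offset, ByteOrder order) {
        return buf.duplicate().order(order).getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
            new byte[] {(byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF});
        System.out.println(readUnsignedInt(buf, 0, ByteOrder.LITTLE_ENDIAN)); // prints 4294967294
        System.out.println(readUnsignedInt(buf, 0, ByteOrder.BIG_ENDIAN));    // prints 4278190079
    }
}
```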

Convert arbitrary size of byte[] to BigInteger[] and then safely convert back to exactly the same byte[], any clues?

I believe converting exactly to BigInteger[] would be optimal in my case. Has anyone done this or found it written in Java, and is willing to share?
So imagine I have arbitrary size byte[] = {0xff,0x3e,0x12,0x45,0x1d,0x11,0x2a,0x80,0x81,0x45,0x1d,0x11,0x2a,0x80,0x81}
How do I convert it to an array of BigIntegers and then safely recover the original byte array?
Thanks in advance.
Use BigInteger.toByteArray() and BigInteger(byte[]).
According to the javadoc, the latter ...
Translates a byte array containing the two's-complement binary representation of a BigInteger into a BigInteger. The input array is assumed to be in big-endian byte-order: the most significant byte is in the zeroth element.
If your byte-wise representation is different, you may need to apply some extra transformations.
EDIT - if you need to preserve leading (i.e. non-significant) zeros, do the following:
When you convert from the byte array to a BigInteger, also make a note of the size of the byte array. This information is not encoded in the BigInteger value.
When you convert from the BigInteger to a byte array, sign-extend the byte array out to the same length as the original byte array.
EDIT 2 - if you want to turn a byte array into an array of BigIntegers with at most N bytes in each one, you need to create a temporary array of size N, repeatedly 1) fill it with bytes from the input byte array (left-padding the final chunk if it is shorter than N) and 2) use it to create BigInteger values using the constructor above. Maybe 20 lines of code?
But I'm frankly baffled that you would (apparently) pick a value for N based on memory usage rather than based on the mathematical algorithm you are trying to implement.
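The two edits above can be sketched together as follows. This is one possible reading, not the answer's exact recipe: instead of left-padding the final chunk, it lets the last BigInteger cover a shorter chunk and relies on the recorded total length when converting back. All names are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class BigIntegerChunks {
    // Sign-extends the minimal two's-complement form of v to exactly len bytes.
    static byte[] toFixedLength(BigInteger v, int len) {
        byte[] min = v.toByteArray();
        if (min.length > len) {
            throw new IllegalArgumentException("value needs " + min.length + " bytes");
        }
        byte[] out = new byte[len];
        if (v.signum() < 0) {
            Arrays.fill(out, 0, len - min.length, (byte) 0xFF); // negative: pad with 1-bits
        }
        System.arraycopy(min, 0, out, len - min.length, min.length);
        return out;
    }

    // Splits data into chunks of at most n bytes, one BigInteger per chunk.
    static BigInteger[] split(byte[] data, int n) {
        BigInteger[] parts = new BigInteger[(data.length + n - 1) / n];
        for (int i = 0; i < parts.length; i++) {
            int from = i * n;
            int to = Math.min(data.length, from + n);
            parts[i] = new BigInteger(Arrays.copyOfRange(data, from, to));
        }
        return parts;
    }

    // Rebuilds the original bytes; totalLength must be remembered by the caller
    // because it is not encoded in the BigInteger values.
    static byte[] join(BigInteger[] parts, int n, int totalLength) {
        byte[] out = new byte[totalLength];
        for (int i = 0; i < parts.length; i++) {
            int from = i * n;
            int len = Math.min(n, totalLength - from);
            System.arraycopy(toFixedLength(parts[i], len), 0, out, from, len);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x00, 0x12, (byte) 0xFF, (byte) 0x80};
        BigInteger[] parts = split(original, 2);
        byte[] restored = join(parts, 2, original.length);
        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```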
