So I want to write in a file integers with, for example, 10 bits each in Little Endian format. They also shouldn't be aligned to the byte.
The following image may help you understand the structure.
I looked at ByteBuffer (I'm coding in Java) but it doesn't seem to do this.
This is not possible out of the box. Java has no bit type, so the smallest units you get are byte and boolean. You could write a utility class that holds 10 booleans (as bits) in whatever order you like, but beyond that, Java does not provide this functionality.
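Packing the bits by hand with shifts and masks is another way to get the unaligned layout the question describes. The following is only a minimal sketch: the class name BitPacker is hypothetical, and it assumes "Little Endian" means that the least significant bit of each value is written first and that the final partial byte is padded with zeros.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: packs fixed-width integers back to back, least significant bit first. */
final class BitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long acc = 0;     // pending bits, oldest bits in the lowest positions
    private int accBits = 0;  // number of valid bits in acc (always < 8 between calls)

    /** Appends the low `width` bits of `value`; assumes 1 <= width <= 32. */
    void write(int value, int width) {
        long masked = value & ((1L << width) - 1);
        acc |= masked << accBits;
        accBits += width;
        // Emit every complete byte that has accumulated.
        while (accBits >= 8) {
            out.write((int) (acc & 0xFF));
            acc >>>= 8;
            accBits -= 8;
        }
    }

    /** Flushes the final partial byte (zero padded) and returns all bytes written so far. */
    byte[] toByteArray() {
        if (accBits > 0) {
            out.write((int) (acc & 0xFF));
            acc = 0;
            accBits = 0;
        }
        return out.toByteArray();
    }
}
```

The resulting array can then be written to a file, for example with java.nio.file.Files.write. Four 10-bit values occupy exactly 5 bytes with this layout.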
I began working through the first problem set over at https://cryptopals.com the other day. I'm trying to learn Clojure simultaneously, so I figured I'd implement all of the exercises in Clojure. These exercises are for learning purposes of course, but I'm going out of my way to not use any libraries besides clojure.core and the Java standard library.
The first exercise asks you to write code that takes in a string encoded in hexadecimal and spits out a string encoded in base64. The algorithm for doing this is fairly straightforward:
Get the byte associated with each couplet of hex digits (for example, the hex 49 becomes 01001001).
Once all bytes for the hex string have been retrieved, turn the list of bytes into a sequence of individual bits.
For every 6 bits, return a base64 character (they're all represented as units of 6 bits).
I'm having trouble actually representing/working-with bits and bytes in Clojure (operating on raw bytes is one of the requirements of the exercise). I know I can do byte-array on the initial hex values and get back an array of bytes, but how do I access the raw bits so that I can translate from a series of bytes into a base64 encoded string?
Any help or direction would be greatly appreciated.
Always keep a browser tab open to the Clojure CheatSheet.
For detailed bit work, you want functions like bit-and, bit-test, etc.
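To illustrate the kind of bit work involved, here is a rough sketch in Java rather than Clojure; the masks and shifts map onto Clojure's bit-and, bit-or and bit-shift functions. The class and method names are hypothetical, and the sketch assumes the input is valid hex with an even number of digits.

```java
final class HexToBase64 {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Turns each pair of hex digits into one byte (assumes valid, even-length input). */
    static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }

    /** Encodes bytes as base64 by regrouping each 24-bit chunk into four 6-bit values. */
    static String toBase64(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int remaining = Math.min(3, data.length - i);
            int chunk = 0;
            for (int j = 0; j < 3; j++) {
                chunk <<= 8;
                if (j < remaining) {
                    chunk |= data[i + j] & 0xFF; // mask to avoid sign extension
                }
            }
            for (int j = 0; j < 4; j++) {
                if (j <= remaining) {
                    sb.append(ALPHABET.charAt((chunk >>> (18 - 6 * j)) & 0x3F));
                } else {
                    sb.append('='); // padding for a short final chunk
                }
            }
        }
        return sb.toString();
    }
}
```

For example, the hex string 49276d (the bytes of "I'm") encodes to SSdt.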
If you are just parsing a hex string, see java.lang.BigInteger with the radix option: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/math/BigInteger.html#%3Cinit%3E(java.lang.String,int)
java.lang.Long/parseLong (called with a string and a radix) is also useful.
For the base64 part, you may be interested in the tupelo.base64 functions. This library function is all you really need to convert a string of hex into a base-64 string, although it may not count for your homework!
Please note that Java includes base-64 functions:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Base64.html
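As a small sketch of how the built-in encoder can be used from Java (and therefore from Clojure through interop): the class and method names below are hypothetical, and the input is assumed to be valid hex with an even number of digits.

```java
import java.util.Base64;

final class StdlibHexToBase64 {
    /** Parses two hex digits per byte, then lets java.util.Base64 do the encoding. */
    static String convert(String hex) {
        byte[] raw = new byte[hex.length() / 2];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return Base64.getEncoder().encodeToString(raw);
    }
}
```

This leans on the standard library for the encoding step, so it may not satisfy an exercise that requires doing the bit regrouping yourself.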
Remember, also, that you can get ideas by looking at the source code for both Clojure & the Tupelo lib.
And also, keep in mind that one of Clojure's super-powers is the ability to write low-level or performance-critical code in native Java and then link all the *.clj and *.java files together into one program (you can use Leiningen to compile & link everything in one step).
I just found out that there is BitSet in java. There are already arrays and similar data structures. Where can BitSet be used?
As the above answer only explains what a BitSet is, here is how I use BitSet and why. At first, I did not know that BitSet existed. I have a QR Code generator in C++, and for flexibility I don't want to return the QR Code to the caller in a specific bitmap structure. The QR Code is just black and white and can be represented as a series of bits. The problem was that in the JNI C++ code, I have to return the byte array that represents this series of bits, and I also have to return the count of bits. Note that the size of the byte array alone cannot tell you the count of bits. In effect, I am faced with a scenario in which my JNI C++ code has to return two values:
the byte[] array
the count of bits
My first solution was to return an array of boolean. The contents of this array are the QR Code pixels, and the square root of the length of the array is the length of a side. Of course this worked, but it felt wasteful because the data is supposed to be a series of bits. My next attempt was to return a Pair<int, byte[]> object, which, after lots of hair pulling, I was not able to make work in C++. Here comes the BitSet(145) construct. By returning this BitSet object, I convey both pieces of information listed above. But there is a minor trick. If the QR Code has 144 pixels in total, because one side is 12, then you have to allocate BitSet(145) and call obj.set(144). That is, we introduce an artificial last bit that we then set, but this last bit is not part of the QR Code pixels. This ensures that BitSet::length() always returns the bit count plus one, whatever the pixel values are. So in Kotlin:
var pixels:BitSet = getqrpixels(inputdata)
var pixels_len = pixels.length() - 1
var side = sqrt(pixels_len.toFloat()).toInt()
drawSquareBitmap(pixels, side)
And that is my unexpected use case for this mysterious BitSet.
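The sentinel-bit trick can be sketched on the Java side as follows. The helper names are hypothetical; the sketch relies only on the documented behaviour that BitSet.length() returns the index of the highest set bit plus one.

```java
import java.util.BitSet;

final class SentinelBitSet {
    /** Copies the payload bits into a BitSet and sets one extra bit just past the payload. */
    static BitSet withSentinel(boolean[] payload) {
        BitSet bits = new BitSet(payload.length + 1);
        for (int i = 0; i < payload.length; i++) {
            bits.set(i, payload[i]);
        }
        bits.set(payload.length); // sentinel bit, not part of the payload
        return bits;
    }

    /** Recovers the payload length, assuming the BitSet was built by withSentinel. */
    static int payloadLength(BitSet bits) {
        return bits.length() - 1;
    }
}
```

Without the sentinel, a payload that ends in unset bits would report a shorter length, and an all-zero payload would report a length of 0.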
Take a look at this:
https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html
A BitSet is a vector of bits. Each entry in the vector is either true (1) or false (0). The BitSet class comes with methods that resemble the bitwise operators. It is a little more flexible than a plain integer used as a bitmask.
BitSet, unlike a boolean[], is a dynamically sized bitmask. Instead of using one boolean per value, it uses longs, where each of a long's 64 bits stores a single true/false value.
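A short sketch of that behaviour, using a hypothetical demo class: setting a bit beyond index 63 makes the set grow into a second 64-bit word, which shows up in toLongArray().

```java
import java.util.BitSet;

final class BitSetDemo {
    /** Builds a BitSet with bits 1 and 70 set; bit 70 falls in the second 64-bit word. */
    static BitSet demo() {
        BitSet bits = new BitSet(); // grows on demand as higher bits are set
        bits.set(1);
        bits.set(70);
        return bits;
    }
}
```

Here length() is 71 (highest set bit plus one), cardinality() is 2, and toLongArray() has two elements.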
A while ago I read that a Java byte, which is 8 bits, is stored internally as an int. I can't seem to find any info online that confirms this.
Thank you for taking the time to answer my question!
What about C++ char? Is it stored as an 8 bits or 32 bits?
How a byte (or any other Java value, for that matter) is stored is not specified by the JLS or the JVMS (the closest you'll find is the abstract specification on the level of the JVM, but that still doesn't say how it's stored natively). It is usually stored in the way which is most appropriate to the hardware architecture at hand, and that is usually 32 bits (or even 64).
Well, if you look at how methods are represented in a class file, you will notice that method parameters are loaded onto a method frame's operand stack with the same bytecode instruction whether they are bytes, ints, booleans, shorts or chars. This implies that they need to take the same size within a method frame, which is usually 32 bits.
As for storing bytes on the heap, byte arrays are typically stored with 8 bits per array entry, while the layout of an individual byte field (packed, or padded to a larger slot) is left to the JVM implementation. None of this is specified in the JLS or the JVMS. If you wanted to implement your own JVM, you could use any number of bits to store a byte and still pass the Java TCK compatibility tests.
In short: what you read is not a guaranteed truth, but it is still correct most of the time.
I'd like to know if there is a simple way to "cast" a byte array containing a data-structure of a known layout to an Object. The byte[] consists of BCD packed values, 1 or 2-byte integer values and character values. I'm obtaining the byte[] via reading a file with a FileInputStream.
People who've worked on IBM-Mainframe systems will know what I mean right away - the problem is I have to do the same in Java.
Any suggestions welcome.
No, because the object layout can vary depending on what VM you're using, what architecture the code is running on etc.
Relying on an in-memory representation has always felt brittle to me...
I suggest you look at DataInputStream - that will be the simplest way to parse your data, I suspect.
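A minimal sketch of the DataInputStream approach follows. The record layout (2-byte big-endian unsigned id, 1-byte unsigned code, 4-byte text field) and all names are made up for illustration, and the text is decoded as US-ASCII only to keep the sketch simple; mainframe data would more likely need an EBCDIC charset.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class RecordReader {
    /** Hypothetical record type holding the parsed fields. */
    static final class Rec {
        final int id;
        final int code;
        final String name;

        Rec(int id, int code, String name) {
            this.id = id;
            this.code = code;
            this.name = name;
        }
    }

    /** Parses one record from the start of `data`, field by field. */
    static Rec parse(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readUnsignedShort();  // 2 bytes, big-endian
            int code = in.readUnsignedByte(); // 1 byte
            byte[] text = new byte[4];
            in.readFully(text);               // fixed-width character field
            return new Rec(id, code, new String(text, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same DataInputStream can wrap a FileInputStream directly when reading from a file.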
Not immediately, but you can write one pretty easily if you know exactly what the bytes represent.
To convert a BCD packed number you need to extract the two digits encoded. The four lower bits encode the lowest digit and you get that by &'ing with 15 (1111 binary). The four upper bits encode the highest digit which you get by shifting right 4 bits and &'ing with 15.
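The nibble extraction described above might look like this in Java. The class and method names are hypothetical, and the sketch assumes plain unsigned packed BCD with two digits per byte; it does not handle a trailing sign nibble as used in some mainframe packed-decimal formats.

```java
final class Bcd {
    /** Decodes one packed BCD byte into a value from 0 to 99. */
    static int unpackByte(byte b) {
        int high = (b >> 4) & 0x0F; // upper four bits: most significant digit
        int low = b & 0x0F;         // lower four bits: least significant digit
        return high * 10 + low;
    }

    /** Decodes a sequence of packed BCD bytes, most significant byte first. */
    static long unpack(byte[] packed) {
        long value = 0;
        for (byte b : packed) {
            value = value * 100 + unpackByte(b);
        }
        return value;
    }
}
```

The mask after the shift matters because byte is signed in Java, so shifting a byte such as 0x99 right would otherwise carry sign bits into the result.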
Also note that IBM most likely has tooling available if this is what you are actually doing. For the IBM i, look for jt400, the IBM Toolbox for Java.
I'm trying to implement some compression algorithms, and I need to deal with bits in Java.
What I need to do is that when I write the value 1 then the value 2, those numbers are stored in the file as bits, so the file size will be 1 byte instead of 2, as 1 is stored in 1 bit and 2 is stored in 2 bits.
Is it possible? Thanks very much
All the I/O methods have a byte as the lowest granularity. You can write bits, but you have to pack them into bytes by yourself. Maybe a one-byte buffer that you write out to the file once it fills up would be appropriate.
Also note that there is no way to know the length of the file in bits (you do not know if the last byte was "full"). So your application needs to take care of that somehow.
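A one-byte buffer of this kind could be sketched as below. This is an illustrative class, not a standard library one; it assumes bits are written most significant bit first and that the last byte is padded with zero bits on close. A reader cannot tell that padding apart from data, so the bit count would have to be recorded separately, for instance in a header of your own design.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical bit-level writer that packs bits into bytes, most significant bit first. */
final class BitOutputStream implements AutoCloseable {
    private final OutputStream out;
    private int buffer = 0; // bits collected for the current byte
    private int count = 0;  // how many bits are in the buffer (0..7)

    BitOutputStream(OutputStream out) {
        this.out = out;
    }

    /** Writes the lowest bit of `bit`. */
    void writeBit(int bit) throws IOException {
        buffer = (buffer << 1) | (bit & 1);
        count++;
        if (count == 8) {
            out.write(buffer);
            buffer = 0;
            count = 0;
        }
    }

    /** Writes the low `width` bits of `value`, highest bit first. */
    void writeBits(int value, int width) throws IOException {
        for (int i = width - 1; i >= 0; i--) {
            writeBit(value >>> i);
        }
    }

    /** Pads the last partial byte with zero bits, writes it, and closes the target stream. */
    @Override
    public void close() throws IOException {
        if (count > 0) {
            out.write(buffer << (8 - count));
            buffer = 0;
            count = 0;
        }
        out.close();
    }
}
```

Writing the value 1 in 1 bit and then the value 2 in 2 bits produces a single byte, as the question hoped, with the remaining five bits as padding.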
You can also google for "BitOutputStream", of which there are a few, though not in libraries that are very common. Maybe just use one of those.
Finally, the file you will be creating will not be a "Text" file, it will be very much binary (even more so than usual...)