How to parse byte[] (including BCD coded values) to Object in Java

How to parse byte[] (including BCD coded values) to Object in Java - java

I'd like to know if there is a simple way to "cast" a byte array containing a data-structure of a known layout to an Object. The byte[] consists of BCD packed values, 1 or 2-byte integer values and character values. I'm obtaining the byte[] via reading a file with a FileInputStream.
People who've worked on IBM-Mainframe systems will know what I mean right away - the problem is I have to do the same in Java.
Any suggestions welcome.

No, because the object layout can vary depending on what VM you're using, what architecture the code is running on etc.
Relying on an in-memory representation has always felt brittle to me...
I suggest you look at DataInputStream - that will be the simplest way to parse your data, I suspect.

Not immediately, but you can write one pretty easily if you know exactly what the bytes represent.
To convert a BCD packed number you need to extract the two digits encoded. The four lower bits encode the lowest digit and you get that by &'ing with 15 (1111 binary). The four upper bits encode the highest digit which you get by shifting right 4 bits and &'ing with 15.
Also note that IBM most likely have tooling available if you this is what you are actually doing. For the IBM i look for the jt400 IBM Toolbox for Java.

Related

How is a variable actually stored in memory for Java?

I'm learning about Text I/O and Binary I/O in java right now. I read that each value that you write to a file is initially stored in binary. For text I/O, the individual digits are converted to it's corresponding Unicode values and then encoded to the file-specific encoding such as ASCII. For binary I/O, the binary value is directly represented in the file. For example, 199 would be represented as 0xC7 which in binary is 11000111. Now I'm confused on one part. If a variable is initially stored as a binary format, does each digit represent a separate byte that is stored or is the entirety of the number stored as a single byte. For example, is 199 originally stored as 0xc7 which would be 11000111 in binary? Or would it be stored in 3 bytes with each byte representing the binary value for the digit. If it was stored in 3 separate bytes, does binary I/O convert that 3 byte number to a single byte? If it's stored in a single byte, how does text I/O translate that single byte into 3 separate byte values. I'm just confused on how to word this. Hope you can understand what I'm getting at. Thanks

The only thing which a computer is capable of dealing with are sets of 0/1 bits which are stored in memory or, if you wish on a storage device. Those bits can be streamed to monitors and converted to characters by graphical hardware. Sams story with keyboards, you type a key and a few bits of data will be send to the computer.
Bits are stored in memory and are accessible by memory addresses. The addresses are also sets of bits.
For practical reasons the bits are grouped into bytes, words, long words, ... A byte used to be the smallest addressable unit of bits and historically ended up as a group of 8 bits, which is currently used in most of the hardware. Modern memory can store data in multiple byte addressable chunks. Same for the disk, you store data there, using specific addressing mechanisms. But in any case those are just sets of bits.
What you are confused about is the interpretation of those bits. They can represent integer numbers, floating point numbers, characters, addresses, ... The way they are interpreted only depends on the program which uses them.
Characters do not exist in the computer. They are just an abstraction which is provided by programming languages. The programs interpret the bits stored on the computer. There are standards. For example the ASCII encoding maps English characters plus a few special characters into numbers from 0 to 127. Those fit into a single byte (leaving number 128 to 255 for special use). A print command will read those bytes one by one and send them to graphics to form letters on the screen as specified in the encoding standard. Different encoding scheme will display the same bytes differently.
If you write a program wit the "hello world" sting in it, the program will convert the symbols between quotes into a set of 11 ascii bytes. (In 'c' it will add yet another byte which is equal to '0' and ends the string this way). Unicode is yet another way to represent characters. Every unicode character is represented by multiple bytes of data. There are other schemes as well. One thing to pay attention to. If you write strings on the disk using certain encoding, you should read them with the same encoding, or your prints will give you garbage. But you can always read and copy then as binary data without interpretation.
So, any variable of any type is just an abstraction and always consists of bytes of data which your program knows how to interpret based on the data type and/or operations it wants to perform. Variables of type int, double, any java object, including String, are just sets of bytes of different sizes. Only the program (and java interpreter is a program) knows what to do with them, use them in calculations or display as characters.

How can I directly work with bits in Clojure?

I began working through the first problem set over at https://cryptopals.com the other day. I'm trying to learn Clojure simultaneously, so I figured I'd implement all of the exercises in Clojure. These exercises are for learning purposes of course, but I'm going out of my way to not use any libraries besides clojure.core and the Java standard library.
The first exercise asks you to write code that takes in a string encoded in hexadecimal and spit out a string encoded in base64. The algorithm for doing this is fairly straightforward:
Get the byte associated with each couplet of hex digits (for example, the hex 49 becomes 01001001).
Once all bytes for the hex string have been retrieved, turn the list of bytes into a sequence of individual bits.
For every 6 bits, return a base64 character (they're all represented as units of 6 bits).
I'm having trouble actually representing/working-with bits and bytes in Clojure (operating on raw bytes is one of the requirements of the exercise). I know I can do byte-array on the initial hex values and get back an array of bytes, but how do I access the raw bits so that I can translate from a series of bytes into a base64 encoded string?
Any help or direction would be greatly appreciated.

Always keep a browser tab open to the Clojure CheatSheet.
For detailed bit work, you want functions like bit-and, bit-test, etc.
If you are just parsing a hex string, see java.lang.BigInteger withe the radix option: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/math/BigInteger.html#%3Cinit%3E(java.lang.String,int)
java.lang.Long/parse( string, radix ) is also useful.
For the base64 part, you may be interested in the tupelo.base64 functions. This library function is all you really need to convert a string of hex into a base-64 string, although it may not count for your homework!
Please note that Java includes base-64 functions:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Base64.html
Remember, also, that you can get ideas by looking at the source code for both Clojure & the Tupelo lib.
And also, keep in mind that one of Clojure's super-powers is the ability to write low-level or performance-critical code in native Java and then link all the *.clj and *.java files together into one program (you can use Leiningen to compile & link everything in one step).

How to write ints with specific amount of bits in Java

So I want to write in a file integers with, for example, 10 bits each in Little Endian format. They also shouldn't be aligned to the byte.
The following image may help you understand the scructure.
I looked at ByteBuffer (I'm coding in Java) but it doesn't seem to do this.

This is not possible by default. Java doesn't have a bit type, so the closest you are going to get is Byte or Boolean. You can make a util class with 10 booleans (as bits) in whatever order you would like, but other than that, Java does not hold this functionality.

What integer format when reading binary data from Java DataOutputStream in PHP?

I'm aware that this is probably not the best idea but I've been playing around trying to read a file in PHP that was encoded using Java's DataOutputStream.
Specifically, in Java I use:
dataOutputStream.writeInt(number);
Then in PHP I read the file using:
$data = fread($handle, 4);
$number = unpack('N', $data);
The strange thing is that the only format character in PHP that gives the correct value is 'N', which is supposed to represent "unsigned long (always 32 bit, big endian byte order)". I thought that int in java was always signed?
Is it possible to reliably read data encoded in Java in this way or not? In this case the integer will only ever need to be positive. It may also need to be quite large so writeShort() is not possible. Otherwise of course I could use XML or JSON or something.

This is fine, as long as you don't need that extra bit. l (instead of N) would work on a big endian machine.
Note, however, that the maximum number that you can store is 2,147,483,647 unless you want to do some math on the Java side to get the proper negative integer to represent the desired unsigned integer.
Note that a signed Java integer uses the two's complement method to represent a negative number, so it's not as easy as flipping a bit.

DataOutputStream.writeInt:
Writes an int to the underlying output stream as four bytes, high byte
first.
The formats available for the unpack function for signed integers all use machine dependent byte order. My guess is that your machine uses a different byte order than Java. If that is true, the DataOutputStream + unpack combination will not work for any signed primitive.

Reading and Writing Bits to Text Files in Java

I'm trying to implement some compression algorithms, and I need to deal with bits in Java.
What I need to do is that when I write the value 1 then the value 2, those numbers are stored in the file as bits, so the file size will be 1 byte instead of 2, as 1 is stored in 1 bit and 2 is stored in 2 bits.
Is it possible? Thanks very much

All the I/O methods have a byte as the lowest granularity. You can write bits, but you have to pack them into bytes by yourself. Maybe a one-byte buffer that you write out to the file once it fills up would be appropriate.
Also note that there is no way to know the length of the file in bits (you do not know if the last byte was "full"). So your application needs to take care of that somehow.
You can also google for "BitOutputStream", of which there are a few, though not in libraries that are very common. Maybe just use one of those.
Finally, the file you will be creating will not be a "Text" file, it will be very much binary (even more so than usual...)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.