Bitmask in big endian

Bitmask in big endian - java

This isn't a question as much as it's a sanity check!
If you needed to read 4 bytes into Java as a bitmask in Big endian and those bytes were:
0x00, 0x01, 0xB6, 0x02.
Making that into an int would be: 112130
The binary would be: 00000000000000011010011000000010
The endian of a series of bytes wouldn't affect the bit position, would it?
Thanks
Tony

Endian-ness reflects the ordering of bytes, but not the ordering of the bits within those bytes.
Let's say I want to represent the (two-byte) word 0x9001.
If I just type this out in binary, that would be 1001000000000001.
If I dump the bytes (from lower address to higher) on a big-endian machine, I would see 10010000 00000001.
If I dump the bytes (from lower address to higher) on a little-endian machine, I would see 00000001 10010000.

In general, if the thing you're reading from is giving you whole bytes, then you don't need to worry about the order of bits making up those bytes: it is just the order of the bytes that matters, as you correctly suppose.
The time you might have to worry about the "endianness" of individual bits is where you're actually reading/writing a stream of bits rather than whole bytes (e.g. if you were writing a compression algorithm that operated at the bit level, you'd have to make a decision about what order to write the bits in).

The only thing you have to pay attention is how exactly you "read 4 bytes into Java" - that's where endianness matters and you can mess it up (DataInputStream assumes big endian). Once the value you've read has become the int 112130, you're set.

Related

How to convert an unsigned byte to a signed byte

I am trying to convert unsigned bytes from a file to signed bytes in java. This is the current arrangement I have for reading unsigned bytes from a file in java:
ByteArrayOutputStream output = new ByteArrayOutputStream();
for (String string : fileKeyString) {
output.write(Integer.valueOf(string).byteValue());
}
return output.toByteArray();
Note: I have to use Java 8 and fileKeyString is a String Array that gets created when reading from a file. The variable string holds the unsigned byte. It outputs a byte array which is required.
How would I exactly convert this from an unsigned byte to signed bytes before it gets placed into output.write and evaluated by .byteValue()?
I dont have too much expereience with bytes so any help is appreciated.
Thankyou.

Found it! You just subtract by 256 if it exceeds by 128.
if (byte >= 128) { byte -= 256; }

You seem to misunderstand how computers work.
A byte is what it is. Just 01001100 on disk or in memory. What does 01001100 mean? Is that signed or unsigned? The byte doesn't know. Bytes just are 8 bits, that's it. That's all they ever are. It's things that interact with the byte that decide how one is to read it. Is that signed? The byte has no idea - the software (or the human eyeballs) that look at it decide whether it is or not.
Let's make it more interesting and work with the byte 10000000.
What is that? The byte has no idea. Perhaps you have some software that reads this byte and shows the value of it on screen.
Depending on which software you use, you might see any of the following and they are all equally correct:
128 (interpretation: It's an unsigned byte, show it in decimal)
-128 (interpretation: It's a 2s complement signed byte, show in decimal)
80 (interpretation: Show it in hexadecimal, unsigned)
-80 (interpretation: Show in hex, signed)
� (interpretation: It's a unicode character. The 128th item in the unicode table is 'control', and not really a character perse).
-127 (interpretation: It's a 1s complement signed byte, show in decimal)
Nothing appears on screen, instead, the dulcet tones of Unchained Melody blast out of the speaker (interpretation: It's an id of a song, and Unchained Melody's ID is bit sequence 10000000).
Given a file containing just 1 byte, with bitsequence 10000000 (which is just a sequence of bytes, no metadata), you have no idea which of the above interpretation is correct. In that sense they are ALL correct. I can make you a file which, if you name it 'foo.zip' and unzip it produces 1 file with the collected works of shakespeare in plain text inside. If you rename the .zip to .png, and open it, you see the mona lisa. Same bytes in either case - it's the app that reads them that causes those exact same bytes to mean something completely different.
The exact same principle (it's not the byte itself, it's the software or human eyeballs that decide what it means) applies in reverse as well: If I want to 'write' Unchained Melody to disk, it's the software that decides how to do it.
With that in mind, therefore:
How would I exactly convert this from an unsigned byte to signed bytes before it gets placed into output.write and evaluated by .byteValue()?
That question makes no sense. If I have the number -128 and I want to write it to disk, presumably you just write the bit sequence 10000000 to disk and, yup, that doesn't mean anything unless the user of the computer opens that file again with your app. Or any other app that knows that it is to be interpreted as a signed 2's complement byte.
The code you have already writes 1 byte to disk whose bit sequence is 10000000; you're already doing it, your code is fine as is.
If you are opening it with something and that says 'this file contains +128', and you want that to say '-128' instead, there is nothing you can change in your file writing code. Instead, you need to find different software to open it, or configure that software differently.

Does Java read a single byte in big endian bit order?

So we can talk about the endianness of both the bit and byte order.
When I read the next byte from FileInputStream, for example, I practically get an 8-bit signed integer, but I have no idea what is the bit order with which Java calculates the byte's integer value. Which comes first, the most significant or the least significant bit?
(sign bit, 2^6 ..... 2^0)
Or...
(2^0, ..... 2^6, sign bit)

Endianness only really applies when a unit is broken down into other units. So if you were transmitting a byte over a bit stream, you could observe whether the least significant bit was transmitted first or last. And at that point we could say that the stream was little-endian or big-endian.
But within a byte-addressable machine, i.e., where the byte is the smallest unit of storage, there is no "endianness" within the byte. No bit of the byte is "before" any other bit of the byte.
Note that another term for endianness is "byte order". The order of bytes within larger entities.
It is true we like to number bits (0 to 7, for an 8-bit byte) so we can talk about them, but this really does not define endianness, even though the numbering is often chosen to match the byte order of the machine; this is convention.
With respect to FileInputStream - according to its documentation, that transfers bytes: no part of the byte is sent before any other part, at least not as far as FileInputStream is concerned. If the byte has to be sent bitwise over some interconnect (say, a SATA cable), then the decision about which bit goes first is a matter for the hardware. The higher layer code is dealing in bytes (or even blocks).

in int first bit is the sign, the rest is the value, the last bit is the least significant bit.

Android LEB128 type size

I'm confused about LEB128 or Little Endian Base 128 format. In the AOSP source code Leb128.java, its read function's return type whether signed or unsigned is int. I know the the size of int in java is 4 bytes aka 32bits. But the max length of LEB128 in AOSP is 5 bytes aka 35 bits. So where are the other lost 3bits.
Thanks for your reply.

Each byte of data in LEB only accounts for 7 bits in the actual output - the remaining bit is used to indicate whether or not it's the end.
From Wikipedia:
To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits (such that the most significant 7 bits are not all 0). Break the number up into groups of 7 bits. Output one encoded byte for each 7 bit group, from least significant to most significant group.
The extra bits aren't so much "lost" as "used to indicate whether or not it's the end of the data".
You can't hope to encode arbitrary 32-bit values and some of them taking less than 4 bytes without some of them taking more than 4 bytes.

Huffman compress file (Got the tree but can't compress)- Java

Alright so I am trying to do a file compress using the Huffman tree.
We got the tree that is working just fine but we are unable to figure out how to write the binary string we get into the file.
So for example our tree returns: '110', it should mean this byte: '00000110' right?
And if the returns: '11111111 11111110' it should mean what? Should we just write it in in byte?
So the question is how do we convert the binary string we get into bytes so we can write it on the file?
Thanks alot,
Ara

So for example our tree returns: '110', it should mean this byte:
'00000110' right?
Wrong. You should have a byte buffer of bits into which you write your bits. Write the three bits 110 into the byte. (You will need to decide on a convention for bit ordering in the byte.) You still have five unused bits in the byte, so there it sits. Now you write 10 into the buffer. The byte buffer now has 11010, and three unused bits. So still it sits. Now you try to write 111011 into the byte buffer. The first three bits go into the byte buffer, giving you 11010111. You now have filled the buffer, so only now do you write out your byte to the file. You are left with 011. You clear your byte buffer of bits since you wrote it out, and put in the remaining 011 from your last code. Your byte buffer now has three bits in it, and five bits unused. Continue in this manner.
The buffer does not have to be one byte. 16-bit or 32-bit buffers are common and are more efficient. You write out bytes whenever the bits therein are eight or more, and shift the remaining 0-7 bits to the start of the buffer.
The only tricky part is what to do at the end, since you may have unused bits in your last byte. Your Huffman codes should have an end symbol to mark the end of the stream. Then you know when you should stop looking for more Huffman codes. If you do not have an end code, then you need to assure somehow that either the remaining bits in the byte cannot be a complete Huffman code, or you need to indicate in some other way where the stream of bits end.

CRC calculation in Java

I'm reading a file from serialport using x-modem protocol and 133 bytes packet. I'm reading
in that
1 byte is SOH
2 byte packet number
3 byte nagative of packet number
next 128 bytes data
2 bytes CRC sent from other side.
I have to calculate CRC of 128 bytes data and 2 bytes crc sent from other side that I have to make it single byte and have to comapare with my calculated crc. How can I do this in java?

Try using Jacksum.

Sun JDK 1.6 contains sun.misc.CRC16, but there is a possibility this is not the CRC16 you're looking for, since there's several different polynomials in use.

Here is my C code, which is trivial to port to Java - you are free to use it in any way you like. The references to word are for a 16 bit unsigned value - you should be able to use a char instead in Java.
It's been too long since I worked with 16 bit CRC's so I don't recall if there are variations based on seeding. I am pretty sure I used this code in a C implementation of X-Modem way back when.
The source is posted on tech.dolhub.com.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.