I'm making a hash algorithm; the message block is 512 bits.
In C/C++ I can store it in a char[64], but a Java char takes 2 bytes.
Question: is 512 bits of information a char[32] or a char[64]?
char is 16 bits in Java, so char[32] should be enough for 512 bits.
I think using byte[64] is better, though, because everyone knows a byte is 8 bits and char[32] makes the code harder to read. Also, you aren't storing characters but bits.
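The arithmetic is easy to check with the SIZE constants on the wrapper classes; a minimal sketch:

```java
public class BlockSize {
    public static void main(String[] args) {
        // Character.SIZE == 16 bits, Byte.SIZE == 8 bits
        System.out.println(32 * Character.SIZE); // 512
        System.out.println(64 * Byte.SIZE);      // 512
    }
}
```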
From the documentation:
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
So in order to store 512 bits, you should have an array of size 32.
Why use a char[]?
A hash value consists of bytes so the logical choice would be to use a byte[64].
The datatype char is intended to be used as a character and not as a number.
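A minimal sketch of what a byte[64] message block might look like. The 0x80 padding marker here is only illustrative (it mirrors MD5/SHA-style padding); the actual padding scheme depends on the hash being implemented:

```java
import java.nio.charset.StandardCharsets;

public class MessageBlock {
    public static void main(String[] args) {
        byte[] block = new byte[64];                                // 64 * 8 = 512 bits
        byte[] message = "abc".getBytes(StandardCharsets.US_ASCII); // raw input bytes
        System.arraycopy(message, 0, block, 0, message.length);
        block[message.length] = (byte) 0x80; // illustrative MD5/SHA-style padding marker
        System.out.println(block.length * Byte.SIZE); // 512
    }
}
```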
Related
I am reading a book about Java programming and in the first chapter it says: "The number 149 is stored in the byte at address 16" - is storing three characters, the 1, the 4, and the 9 in one byte possible?
No; the size of a character in Java is 2 bytes, so obviously 6 bytes cannot fit into 1.
I think the book was really asking whether the number 149 could fit into a byte, and the answer is yes and no: an unsigned byte can hold a value of 255 at most, while a two's complement (signed) byte can only hold a value up to 127.
Info about primitive data type
Storing the number 149 and storing the characters '1', '4', and '9' are completely different things. Storing the character '1' actually stores its ASCII value 49, and the ASCII values 52 and 57 represent '4' and '9' respectively. The size of each character in Java is 2 bytes, so three characters with a total size of 6 bytes cannot fit into a single byte.
The byte data type is only 8 bits, and therefore it can store numbers from -128 to +127. That means the maximum value of a byte (Byte.MAX_VALUE) is 127, and since 149 is bigger than 127 it cannot fit into a byte; you have to use at least a short to store 149. A short is 2 bytes in Java.
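A quick sketch of those ranges in action. Note what happens when 149 is forced into a byte with a cast:

```java
public class FitCheck {
    public static void main(String[] args) {
        short s = 149;                       // fits: a short ranges -32768..32767
        byte b = (byte) 149;                 // needs an explicit cast, and the value wraps
        System.out.println(Byte.MAX_VALUE);  // 127
        System.out.println(s);               // 149
        System.out.println(b);               // -107 (149 - 256)
    }
}
```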
I highly encourage you to read this Java documentation on data types. It's very short, but pretty useful to understand everything about data types.
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
In general, a byte is 8 bits and, treated as unsigned, can store numbers from 0 to 255. Bytes are often used in raw data processing, and bytes are how data is stored in memory. When storing characters or a "String", you are storing a sequence of bytes that represents a sequence of characters.
The number 149 in binary byte form is 10010101.
Decimal to Binary Converter
But storing characters is different from storing numbers. To address your question: storing the characters '1', '4', and '9' in a single byte is not possible, but storing the number 149 is.
Also, the number of bytes that a given character or string uses is highly dependent on which encoding you are using.
Java String see .getBytes(Charset charset)
All this being said, a byte in Java is signed: its range goes from -128 to +127 inclusive, so a byte can store 256 unique values. You can think of them as numbers, individual flags, whatever you want. I have no context for the OP's original problem, but a Java primitive byte cannot by default hold the number 149. If you are talking about a sequence of 8 bits, it can.
Java Primitive Datatypes
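The same 8-bit pattern can be read back as 149 by masking off the sign extension; a small sketch:

```java
public class UnsignedView {
    public static void main(String[] args) {
        byte b = (byte) 149;                       // bit pattern 10010101
        System.out.println(b);                     // -107 when read as signed
        System.out.println(b & 0xFF);              // 149 when read as unsigned
        System.out.println(Byte.toUnsignedInt(b)); // 149, the same mask (Java 8+)
    }
}
```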
While searching for the difference between InputStream and Reader, I found this answer:
InputStream: Byte-Base ( read byte by byte )
Reader: Character-Base ( read char by char )
I pasted an á character into a file; its value in ASCII (or maybe some other charset) is 225 on my OS, and a byte's MAX_VALUE is 127. I used FileInputStream and just called read(), so why is it returning 225? How is it able to read more than one byte, when the read() method reads only one byte at a time?
Or, what is actually the difference between InputStream and Reader?
á does indeed have a Unicode value of 225 (that's its code point, which is unrelated to its encoding). When you cast that down to a byte, you'll get -31. But if you take a careful look at the docs for InputStream.read, you'll see:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255.
(emphasis added) The read method returns an int, not a byte, but that int essentially represents an unsigned byte. If you cast that int down to a char, you'll get back to á. If you cast that int down to a byte, it'll wrap down to -31.
A bit more detail:
á has a unicode value of 225.
chars in Java are represented as UTF-16, which for 225 has a binary representation of 00000000 11100001
if you cast that down to a byte, it'll drop the high byte, leaving you with 11100001. This has a value of -31 if treated as a signed byte, but 225 if treated as unsigned.
InputStream.read returns an int so that it can represent the stream's end as -1. But if the int is non-negative, then only its bottom 8 bits are set (decimal values 0-255)
When you cast that int down to a byte, Java will drop all but the lowest 8 bits -- leaving you again with 11100001
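The cast chain described above can be reproduced directly:

```java
public class CastChain {
    public static void main(String[] args) {
        int fromStream = 225;              // what InputStream.read() returned
        byte asByte = (byte) fromStream;   // keep the low 8 bits, signed view
        char asChar = (char) fromStream;   // treat 225 as a code point
        System.out.println(asByte);        // -31
        System.out.println(asChar);        // á
        System.out.println(asByte & 0xFF); // 225 again
    }
}
```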
The difference is that an InputStream will read the contents of the file as is, with no interpretation: the raw bytes.
A Reader on the other hand will use a CharsetDecoder to process the byte input and turn it into a sequence of chars instead. And the way it will process the byte input will depend on the Charset used.
And this is not a 1 <-> 1 relationship!
Also, forget about "ASCII values"; Java doesn't use ASCII, it uses Unicode, and a char is in fact a UTF-16 code unit. It was a full code point when Java began, but then Unicode defined code points outside the BMP and Java had to adapt: code points above U+FFFF are now represented using a surrogate pair, i.e. two chars.
See here for a more detailed explanation.
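The difference is easy to see in memory, without touching the filesystem. This sketch feeds the UTF-8 bytes of á to both an InputStream and a Reader:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class StreamVsReader {
    public static void main(String[] args) throws IOException {
        byte[] data = "\u00e1".getBytes(StandardCharsets.UTF_8); // á -> 0xC3 0xA1
        System.out.println(data.length); // 2 bytes for 1 character

        // InputStream: raw bytes, no interpretation
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        System.out.println(in.read()); // 195 (0xC3)
        System.out.println(in.read()); // 161 (0xA1)

        // Reader: a CharsetDecoder turns the two bytes into one char
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println((char) reader.read()); // á
    }
}
```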
InputStream.read() returns an int: a value between 0 and 255, or -1 at end of stream.
Byte.MAX_VALUE is 127, but Byte.MIN_VALUE is -128, which in binary is 10000000. Java does not support unsigned primitives, so the most significant bit is always the sign bit.
I was trying to store a byte value in a variable and to perform some logic based on it.
byte mByteValue = -129; // holding a byte value
The problem is I always get the value 127, due to which my logic fails every time.
Is there any specific reason behind this? Why is it behaving strangely in my case?
A byte in Java is a signed 8-bit value. 8 bits give you 256 possible values, but since a byte is signed, those 256 values are split across the possible positive and negative numbers: -128 to 127. So you can't store negative values past -128; in particular, don't expect to be able to store -129.
What you're actually observing when your byte has the value 127 is known as overflow (see this wiki article)
If you need to manipulate values outside this range, as in your example code, or e.g. an unsigned byte, at some point you'll need to make use of a wider integer type, like short.
The standard libraries provide these limits as Byte.MIN_VALUE and Byte.MAX_VALUE (docs here and here).
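Overflow is easy to demonstrate. Note that the literal assignment byte b = -129; would not even compile, so the wrap-around in the question presumably happened through a cast or arithmetic; a sketch under that assumption:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        byte b = Byte.MAX_VALUE;         // 127
        b++;                             // wraps past the top of the range
        System.out.println(b);           // -128 (Byte.MIN_VALUE)

        short wide = -129;               // a short holds -129 with no trouble
        System.out.println((byte) wide); // 127: -129 wraps modulo 256
    }
}
```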
The range of byte is from -128 to 127. You cannot store any value beyond this range.
This is because a byte is 8 bits. The maximum positive number stored in a byte is
2^7 - 1 = 127  // since the first bit is the sign bit; 0 for positive
And the minimum negative number stored in a byte is
-2^7 = -128  // since the first bit is the sign bit; 1 for negative
And if you used an unsigned byte, the maximum would be 255.
To correctly convert a byte to an int, use mByteValue & 0xFF. You can read more about two's complement here: https://en.wikipedia.org/wiki/Two%27s_complement.
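A short sketch of that mask in action:

```java
public class MaskDemo {
    public static void main(String[] args) {
        byte b = (byte) 200;              // stored bit pattern 11001000
        int signExtended = b;             // plain widening sign-extends
        int unsigned = b & 0xFF;          // the mask keeps only the low 8 bits
        System.out.println(signExtended); // -56
        System.out.println(unsigned);     // 200
    }
}
```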
I've found a few answers about this but none of them seem to apply to my issue.
I'm using the NDK, and the C++ side expects an unsigned char array of 1024 elements, so I need to create this in Java to pass it as a parameter.
The unsigned char array is expected to contain both numbers and characters.
I have tried this:
byte[] lMessage = new byte[1024];
lMessage[4] = 'a';
The problem is that the element at index 4 then gets stored as a numerical value instead of keeping the 'a' character.
I have also tried
char[] lMessage = new char[1024];
lMessage[4] = 'a';
While this retains the character, it doubles the size of the array in bytes, since each char element is 16 bits instead of 8.
I need the output to be an 8-bit unsigned ASCII array.
Any suggestions?
Thanks.
It is wrong to say that the element "gets added as a numerical value". The only thing that you can say for sure is that it gets added as electrostatic charges in eight cells of your RAM.
How you choose to represent those eight bits (01100001) in order to visualize them has little to do with what they really are, so if you choose to see them as a numerical value, then you might be tricked into believing that they are in fact a numerical value. (Kind of like a self-fulfilling prophecy (wikipedia).)
But in fact they are nothing but 8 electrostatic charges, interpretable in whatever way we like. We can choose to interpret them as a two's complement number (97), we can choose to interpret them as a binary-coded decimal number (61), we can choose to interpret them as an ASCII character ('a'), we can choose to interpret them as an x86 instruction opcode (popa), the list goes on.
The closest thing to an unsigned char in C++ is a byte in Java. That's because the fundamental characteristic of these small data types is how many bits long they are. A char in C++ is 8 bits long, and the only type in Java which is also 8 bits long is the byte.
Unfortunately, a byte in Java tends to be thought of as a numerical quantity rather than as a character, so tools (such as debuggers) that display bytes will display them as little numbers. But this is just an arbitrary convention: they could just as easily have chosen to display bytes as ASCII (8-bit) characters, and then you would be seeing an actual 'a' in lMessage[4].
So, don't be fooled by what the tools are showing, all that counts is that it is an 8-bit quantity. And if the tools are showing 97 (0x61), then you know that the bit pattern stored in those 8 memory cells can just as legitimately be thought of as an 'a', because the ASCII code of 'a' is 97.
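That duality is trivial to check:

```java
public class SameBits {
    public static void main(String[] args) {
        byte b = (byte) 'a';          // narrowing keeps the low 8 bits
        System.out.println(b);        // 97, the ASCII code of 'a'
        System.out.println((char) b); // a, the same bits shown as a character
    }
}
```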
So, finally, to answer your question, what you need to do is find a way to convert a java string, which consists of 16-bit unicode characters, to an array of ASCII characters, which would be bytes in java. You can try this:
String s = "TooManyEduardos";
byte[] bytes = s.getBytes(StandardCharsets.US_ASCII); // unlike getBytes("US-ASCII"), this does not throw a checked exception
Or you can read the answers to this question: Convert character to ASCII numeric value in java for more ideas.
This will work for ASCII chars (note that new String('a') does not compile; use a string literal or a cast):
lMessage[4] = "a".getBytes(StandardCharsets.US_ASCII)[0]; // or simply: lMessage[4] = (byte) 'a';
I wanted to use DataOutputStream#writeBytes, but was running into errors. Description of writeBytes(String) from the Java Documentation:
Writes out the string to the underlying output stream as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.
I think the problem I'm running into is due to the part about "discarding its high eight bits". What does that mean, and why does it work that way?
Most Western programmers tend to think in terms of ASCII, where one character equals one byte, but Java Strings are 16-bit Unicode. writeBytes just writes out the lower byte, which for ASCII/ISO-8859-1 is the "character" in the C sense.
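A sketch of what writeBytes actually emits, using an in-memory stream so the bytes can be inspected:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteBytesDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeBytes("A\u00e1");          // 'A' = 0x0041, 'á' = 0x00E1 in UTF-16
        byte[] written = buffer.toByteArray();
        System.out.println(written.length); // 2: exactly one byte per char
        System.out.println(written[0]);     // 65  (0x41, the low byte of 'A')
        System.out.println(written[1]);     // -31 (0xE1 as a signed byte)
    }
}
```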
The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive). The byte data type, however, is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). That is why this function writes the low-order byte of each char in the string, from first to last. Any information in the high-order byte is lost. In other words, it assumes the string contains only characters whose values are between 0 and 255.
You may look into the writeUTF(String s) method, which retains the information in the high-order byte as well as the length of the string. First it writes the length of the encoded string onto the underlying output stream as a 2-byte unsigned int between 0 and 65,535. Next it encodes the string in (modified) UTF-8 and writes the bytes of the encoded string to the underlying output stream. This allows a data input stream reading those bytes to completely reconstruct the string.
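A round trip through writeUTF/readUTF, sketched in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteUtfDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF("\u00e1");            // á: 2-byte length prefix + UTF-8 body

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(in.readUTF());  // á, fully reconstructed
        System.out.println(buffer.size()); // 4: 2 length bytes + 2 encoded bytes
    }
}
```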