Char into byte? (Java)

Char into byte? (Java) - java

How come this happens:
char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?
Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (Value is the same after casting into an int)
Thanks in advance for answering this little ignorant question of mine. :P
EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?

As trojanfoe states, your confusion on the results of your code is partly due to sign-extension. I'll try to add a more detailed explanation that may help with your confusion.
char a = '\uffff';
byte b = (byte)a; // b = 0xFF
As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits".
The result is: 0xFFFF -> 0xFF
char c = (char)b; // c = 0xFFFF
Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
int d = (int)c; // d = 0x0000FFFF
Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
The three links I provided are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward). It details exactly what java will do behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused with any step.

It's sign extension. Try \u1234 instead of \uffff and see what happens.

java byte is signed. it's counter intuitive. in almost all situations where a byte is used, programmers would want an unsigned byte instead. it's extremely likely a bug if a byte is cast to int directly.
This does the intended conversion correctly in almost all programs:
int c = 0xff & b ;
Empirically, the choice of signed byte is a mistake.

Some rather strange stuff going on your machine. Take a look at Java language specification, chapter 4.2.1:
The values of the integral types are
integers in the following ranges:
For byte, from -128 to 127, inclusive
... snip others...
If your JVM is standards compliant, then your output should be -1.

Related

Java type casting problems [duplicate]

//key & hash are both byte[]
int leftPos = 0, rightPos = 31;
while(leftPos < 16) {
//possible loss of precision. required: byte, found: int
key[leftPos] = hash[leftPos] ^ hash[rightPos];
leftPos++;
rightPos--;
}
Why would a bitwise operation on two bytes in Java return an int? I know I could just cast it back to byte, but it seems silly.

Because the language spec says so. It gives no reason, but I suspect that these are the most likely intentions:
To have a small and simple set of rules to cover arithmetic operations involving all possible combinations of types
To allow an efficient implementation - 32 bit integers are what CPUs use internally, and everything else requires conversions, explicit or implicit.

If it's correct and there are no value that can cause this loss of precision, in other words : "impossible loss of precision" the compiler should shut up ... and need to be corrected, and no cast should be added in this :
byte a = (byte) 0xDE;
byte b = (byte) 0xAD;
byte r = (byte) ( a ^ b);

There is no Java bitwise operations on two bytes. Your code implicitly and silently converts those bytes to a larger integer type (int), and the result is of that type as well.
You may now question the sanity of leaving bitwise operations on bytes undefined.

This was somewhere down in the answers to one of the similar questions that people have already pointed out:
http://blogs.msdn.com/oldnewthing/archive/2004/03/10/87247.aspx

Is byte not signed two's complement?

Given that byte,short and int are signed, why do byte and short in Java not get the usual signed two's complement treatment ? For instance 0xff is illegal for byte.
This has been discussed before here but I couldn't find a reason for why this is the case.

If you look at the actual memory used to store -1 in signed byte, then you will see that it is 0xff. However, in the language itself, rather than the binary representation, 0xff is simply out of range for a byte. The binary representation of -1 will indeed use two's complement but you are shielded from that implementation detail.
The language designers simply took the stance that trying to store 255 in a data type that can only hold -128 to 127 should be considered an error.
You ask in comments why Java allows:
int i = 0xffffffff;
The literal 0xffffffff is an int literal and is interpreted using two's complement. The reason that you cannot do anything similar for a byte is that the language does not provide syntax for specifying that a literal is of type byte, or indeed short.
I don't know why the decision not to offer more literal types was made. I expect it was made for reasons of simplicity. One of the goals of the language was to avoid unnecessary complexity.

You can write
int i = 0xFFFFFFFF;
but you can't write
byte b = 0xFF;
as the 0xFF is an int value not a byte so its equal to 255. There is no way to define a byte or short literal, so you have to cast it.
BTW You can do
byte b = 0;
b += 0xFF;
b ^= 0xFF;
even
byte b = 30;
b *= 1.75; // b = 52.

it is legal, but you need to cast it to byte explicitly, i.e. (byte)0xff because it is out of range.

You can literally set a byte, but surprisingly, you have to use more digits:
byte bad = 0xff; // doesn't work
byte b = 0xffffffff; // fine
The logic is, that 0xff is implicitly 0x000000ff, which exceeds the range of a byte. (255)
It's not the first idea you get, but it has some logic. The longer number is a smaller number (and smaller absolute value).
byte b = 0xffffffff; // -1
byte c = 0xffffff81; // -127
byte c = 0xffffff80; // -128

The possible values for a byte range from -128 to 127.
It would be possible to let values outside the range be assigned to the variable, and silently throw away the overflow, but that would rather be confusing than conventient.
Then we would have:
byte b = 128;
if (b < 0) {
// yes, the value magically changed from 128 to -128...
}
In most situations it's better to have the compiler tell you that the value is outside the range than to "fix" it like that.

Purpose of char in java

I'm seeing both char and short are two bytes each,
also the following is valid:
char a = 'x';
int b = a;
long c = a;
However when I do short d = a; I'm getting an error that cannot convert from char to short.
even though both are two bytes.
One use of char I can see is when you print it, it displays the character. Is that the only use of char?

In Java the data type char is an unsigned 2 byte variable while a short is signed 2 byte variable. Because of this difference you cannot convert a char to a short because a char has that extra bit that isn't the signed bit.
This means that if you convert char to a short you would have to truncate the char which would cause the complaining.

You should probably want to read about data types and type casting to have a clear understanding.
When you tried to do short d=a; you are trying to do the 'Narrowing Primitive Conversion'.
5.1.3 Narrowing Primitive Conversions
The following 22 specific conversions
on primitive types are called the
narrowing primitive conversions:
short to byte or char
char to byte or short
int to byte, short, or char
long to byte, short, char, or int
float to byte, short, char, int, or long
double to byte, short, char, int, long, or float
Narrowing conversions may lose
information about the overall
magnitude of a numeric value and may
also lose precision.

While it is true that char is an unsigned 16-bit type, it is not designed to be used as such. Conceptually, char is a Unicode character (more precisely, a character from the Unicode subset that can be represented with 16 bits in UTF-16) which is something more abstract than just a number. You should only ever use chars for text characters. I think that making it an integer type is a kind of bad design on Java's side. It shouldn't have allowed any conversions between char and integer types except by using special utility methods. Like boolean is impossible to convert to integers and back.

char is generally only used for text, yes.
char is unsigned, while short is signed.

check out the Java Docs
it's all there... Char and Short aren't the same.

Normally, I use char for text, and short for numbers. (If I use short. I'd rather use int/long when they're available).
You may want to "cast" the variable. It's done like this:
char a = 'x';
short sVar = (short)a;
System.out.println(a);
System.out.println(sVar);

char is a Unicode-character and, therefore, must be unsigned. short is like all other numeric primitive types signed. Therefore, short and char cover two different ranges of values causing the compiler to complain about the assignment.

Why does the xor operator on two bytes produce an int?

//key & hash are both byte[]
int leftPos = 0, rightPos = 31;
while(leftPos < 16) {
//possible loss of precision. required: byte, found: int
key[leftPos] = hash[leftPos] ^ hash[rightPos];
leftPos++;
rightPos--;
}
Why would a bitwise operation on two bytes in Java return an int? I know I could just cast it back to byte, but it seems silly.

Because the language spec says so. It gives no reason, but I suspect that these are the most likely intentions:
To have a small and simple set of rules to cover arithmetic operations involving all possible combinations of types
To allow an efficient implementation - 32 bit integers are what CPUs use internally, and everything else requires conversions, explicit or implicit.

If it's correct and there are no value that can cause this loss of precision, in other words : "impossible loss of precision" the compiler should shut up ... and need to be corrected, and no cast should be added in this :
byte a = (byte) 0xDE;
byte b = (byte) 0xAD;
byte r = (byte) ( a ^ b);

There is no Java bitwise operations on two bytes. Your code implicitly and silently converts those bytes to a larger integer type (int), and the result is of that type as well.
You may now question the sanity of leaving bitwise operations on bytes undefined.

This was somewhere down in the answers to one of the similar questions that people have already pointed out:
http://blogs.msdn.com/oldnewthing/archive/2004/03/10/87247.aspx

Unsigned short in Java

How can I declare an unsigned short value in Java?

You can't, really. Java doesn't have any unsigned data types, except char.
Admittedly you could use char - it's a 16-bit unsigned type - but that would be horrible in my view, as char is clearly meant to be for text: when code uses char, I expect it to be using it for UTF-16 code units representing text that's interesting to the program, not arbitrary unsigned 16-bit integers with no relationship to text.

If you really need a value with exactly 16 bits:
Solution 1: Use the available signed short and stop worrying about the sign, unless you need to do comparison (<, <=, >, >=) or division (/, %, >>) operations. See this answer for how to handle signed numbers as if they were unsigned.
Solution 2 (where solution 1 doesn't apply): Use the lower 16 bits of int and remove the higher bits with & 0xffff where necessary.

This is a really stale thread, but for the benefit of anyone coming after. The char is a numeric type. It supports all of the mathematical operators, bit operations, etc. It is an unsigned 16.
We process signals recorded by custom embedded hardware so we handle a lot of unsigned 16 from the A-D's. We have been using chars all over the place for years and have never had any problems.

You can use a char, as it is an unsigned 16 bit value (though technically it is a unicode character so could potnetially change to be a 24 bit value in the future)... the other alternative is to use an int and make sure it is within range.
Don't use a char - use an int :-)
And here is a link discussing Java and the lack of unsigned.

From DataInputStream.java
public final int readUnsignedShort() throws IOException {
int ch1 = in.read();
int ch2 = in.read();
if ((ch1 | ch2) < 0)
throw new EOFException();
return (ch1 << 8) + (ch2 << 0);
}

It is not possible to declare a type unsigned short, but in my case, I needed to get the unsigned number to use it in a for loop. There is the method toUnsignedInt in the class Short that returns "the argument converted to int by an unsigned conversion":
short signedValue = -4767;
System.out.println(signedValue ); // prints -4767
int unsignedValue = Short.toUnsignedInt(signedValue);
System.out.println(unsingedValue); // prints 60769
Similar methods exist for Integer and Long:
Integer.toUnsignedLong
Long.toUnsignedString : In this case it ends up in a String because there isn't a bigger numeric type.

No such type in java

Yep no such thing if you want to use the value in code vs. bit operations.

"In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 232-1." However this only applies to int and long but not short :(

If using a third party library is an option, there is jOOU (a spin off library from jOOQ), which offers wrapper types for unsigned integer numbers in Java. That's not exactly the same thing as having primitive type (and thus byte code) support for unsigned types, but perhaps it's still good enough for your use-case.
import static org.joou.Unsigned.*;
// and then...
UShort s = ushort(1);
(Disclaimer: I work for the company behind these libraries)

No, really there is no such method, java is a high-level language. That's why Java doesn't have any unsigned data types.

He said he wanted to create a multi-dimensional short array. Yet no one suggested bitwise operators? From what I read you want to use 16 bit integers over 32 bit integers to save memory?
So firstly to begin 10,000 x 10,000 short values is 1,600,000,000 bits, 200,000,000 bytes, 200,000 kilobytes, 200 megabytes.
If you need something with 200MB of memory consumption you may want to redesign this idea. I also do not believe that will even compile let alone run. You should never initialize large arrays like that if anything utilize 2 features called On Demand Loading and Data Caching. Essentially on demand loading refers to the idea to only load data as it is needed. Then data caching does the same thing, but utilizes a custom frame work for delete old memory and adding new information as needed. This one is tricky to have GOOD speed performance. There are other things you can do, but those two are my favorite when done right.
Alright back to what I was saying about bitwise operators.
So a 32bit integer or in Java "int". You can store what are called "bits" to this so let's say you had 32 Boolean values which in Java all values take up 32 bits (except long) or for arrays they take up 8 for byte, 16 for short, and 32 for int. So unless you have arrays you don't get any memory benefits from using a byte or short. This does not mean you shouldn't use it as its a way to ensure you and others know the data range this value should have.
Now as I was saying you could effectively store 32 Booleans into a single integer by doing the following:
int many_booleans = -1; //All are true;
int many_booleans = 0; //All are false;
int many_booleans = 1 | 2 | 8; //Bits 1, 2, and 4 are true the rest are false;
So now a short consists of 16 bits so 16 + 16 = 32 which fits PERFECTLY within a 32bit integer. So every int value can consist of 2 short values.
int two_shorts = value | (value2 << 16);
So what the above is doing is value is something between -32768 and 32767 or as an unsigned value 0 - 65535. So let's say value equaled -1 so as an unsigned value it was 65535. This would mean bits 1 through 16 are turned on, but when actually performing the math consider the range 0 - 15.
So we need to then activate bits 17 - 32. So we must begin at something larger than 15 bits. So we begin at 16 bits. So by taking value2 and multiplying it by 65536 which is what "<< 16" does. We now would have let's say value2 equaled 3 it would be OR'd 3x65536 = 196608. So our integer value would equal 262143.
int assumed_value = 262143;
so let's say we want to retrieve the two 16bit integer values.
short value1 = (short)(assumed_value & 0xFFFF); //-1
short value2 = (short)(assumed_value >> 16); //=3
Also basically think of bitwise operators as powers of 2. That is all they really are. Never look at it terms of 0's and 1's. I mostly posted this to assist anyone who may come across this searching for unsigned short or even possibly multi-dimensional arrays. If there are any typo's I apologize quickly wrote this up.

Java does not have unsigned types. What do you need it for?
Java does have the 'byte' data type, however.

You can code yourself up a ShortUnsigned class and define methods for those operators you want. You won't be able to overload + and - and the others on them, nor have implicit type conversion with other primitive or numeric object types, alas.
Like some of the other answerers, I wonder why you have this pressing need for unsigned short that no other data type will fill.

Simple program to show why unsigned numbers are needed:
package shifttest;
public class ShiftTest{
public static void main(String[] args){
short test = -15000;
System.out.format ("0x%04X 0x%04X 0x%04X 0x%04X 0x%04X\n",
test, test>>1, test>>2, test>>3, test>>4);
}
}
results:
0xC568 0xFFFFE2B4 0xFFFFF15A 0xFFFFF8AD 0xFFFFFC56
Now for those that are not system types:
JAVA does an arithmetic shift because the operand is signed, however, there are cases where a logical shift would be appropriate but JAVA (Sun in particular), deemed it unnecessary, too bad for us on their short sightedness. Shift, And, Or, and Exclusive Or are limited tools when all you have are signed longer numbers. This is a particular problem when interfacing to hardware devices that talk "REAL" computer bits that are 16 bits or more. "char" is not guaranteed to work (it is two bytes wide now) but in several eastern gif based languages such as Chinese, Korean, and Japanese, require at least 3 bytes. I am not acquainted with the number need for sandscript style languages. The number of bytes does not depend on the programmer rather the standards committee for JAVA. So basing char as 16 bits has a downstream risk. To safely implement unsigned shorts JAVA, as special class is the best solution based on the aforementioned ambiguities. The downside of the class is the inability of overloading the mathematical operations for this special class. Many of the contributors for this thread of accurately pointed out these issues but my contribution is a working code example and my experience with 3 byte gifs languages in C++ under Linux.

//вот метод для получения аналога unsigned short
public static int getShortU(byte [] arr, int i ) throws Exception
{
try
{
byte [] b = new byte[2];
b[1] = arr[i];
b[0] = arr[i+1];
int k = ByteBuffer.wrap(b).getShort();
//if this:
//int k = ((int)b[0] << 8) + ((int)b[1] << 0);
//65536 = 2**16
if ( k <0) k = 65536+ k;
return k;
}
catch(Throwable t)
{
throw new Exception ("from getShort: i=" + i);
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.