I'm working in Hadoop, and I need to provide a comparator to sort objects as raw network order byte arrays. This is easy for me to do with integers -- I just compare each byte in order. I also need to do this for floats. I think, but I can't find a reference, that the IEEE 754 format for floats used by Java can be sorted by just comparing each byte as a signed 8 bit value.
Can anyone confirm or refute this?
Edit: the representation is IEEE 754 32 bit floating point. I actually have a (larger) byte buffer and an offset and length within that buffer. I found some utility methods already there that make it easy to turn this into a float, so I guess this question is moot. I'm still curious if anyone knows the answer.
Positive floats have the same ordering as their bit representations viewed as two's complement integers. Negative floats do not.
For example, the bit representation of -2.0f is 0xc0000000 and -1.0f is 0xbf800000. If you try to use a comparison on the representations, you get -2.0f > -1.0f, which is incorrect.
There's also the issue of NaNs (which compare unordered against all floating point data, whereas the representations do not), but you may not care about them.
This almost works:
int bits = Float.floatToIntBits(x);
bits ^= (bits >> 31) & Integer.MAX_VALUE;
Here negative floats have bits 0-30 inverted (because you want the opposite order to what the original sign/magnitude representation would give you, whilst preserving the sign bit.)
Caveats:
NaNs are included in the ordering (best to consider the results undefined if NaNs are involved.)
+0 now compares as greater than -0 (the built-in relational operators consider them equal.)
It works for all other values though, including denormals and infinities.
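Putting the transform together, a comparator along these lines could be used (a sketch; the class and method names here are illustrative, not part of any Hadoop API):

```java
// Sketch: compare floats by mapping each one to an int whose signed
// ordering matches float ordering (NaN behavior undefined, and -0.0f
// sorts below +0.0f, per the caveats above).
public final class FloatBitsComparator {
    static int sortableBits(float x) {
        int bits = Float.floatToIntBits(x);
        // Negative floats: flip bits 0-30 to reverse the sign/magnitude order.
        bits ^= (bits >> 31) & Integer.MAX_VALUE;
        return bits;
    }

    public static int compare(float a, float b) {
        return Integer.compare(sortableBits(a), sortableBits(b));
    }
}
```

The same transform can be applied byte-by-byte to a serialized big-endian float, which is what makes it useful for raw comparators.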
Use Float.floatToIntBits(float) and compare the integers.
Edit: This works directly only for positive numbers (including positive infinity, but not NaN). For negative numbers you have to reverse the ordering, and positive numbers of course compare greater than negative numbers.
Well, if you are transmitting data over the network, you should have some form of semantic representation for when you are transmitting an int and when you are transmitting a float. Since the information is machine-agnostic, the data type width should also be defined somewhere or fixed by the spec (e.g. 32-bit or 64-bit floats). So what you should really do is accumulate your bytes into the appropriate data type, then use the language's native types to do the comparison.
To be really accurate with an answer, though, we would need to see your transmit and receive code to see whether you are autoboxing primitives via some kind of decorated I/O stream or some such thing. For a better answer, please provide more detail.
Related
I am a little confused on when to use which primitives. If I am defining a number, how do I know whether to use byte, short, int, or long? I know they are different sizes, but does that mean that I can only use one of them for a certain number?
So simply, my question is, when do I use each of the four primitives listed above?
If I am, let's say, defining a number, how do I know what to use: byte, short, int, or long?
It depends on your use case. There's no huge penalty to using an int as opposed to a short, unless you have billions of numbers. Simply consider the range a variable might use. In most cases it is reasonable to use int, whose range is -2,147,483,648 to 2,147,483,647, while long handles numbers up to about ±9.22×10^18. If you aren't sure, long won't specifically hurt.
The only reasons you might want to use byte specifically is if you are storing byte data such as parts of a file, or are doing something like network communication or serialization where the number of bytes is important. Remember that Java's bytes are also signed (-128 to 127). Same for short--might be useful to save 2GB of memory for a billion-element array, but not specifically useful for much other than, again, serialization with a specific byte alignment.
does that mean that I can only use one of them for a certain number?
No, you can use any that is large enough to handle that number. Of course, decimal values need a double or a float--double is usually ideal due to its higher precision and few drawbacks. Some libraries (e.g. 3D drawing) might use floats however. Remember that you can cast numeric types (e.g. (byte) someInt or (float) functionReturningADouble();
A byte can be a number from -128 to 127
A short can be a number from -32,768 to 32,767
An int can be from -2^31 to 2^31 - 1
A long is -2^63 to 2^63 - 1
Usually unless space is an issue an int is fine.
In order to use a primitive you need to know the data range of the number stored in it. This depends heavily on the thing that you are modeling:
Use byte when your number is in the range [-128..127]
Use short when your number is in the range [-32768..32767]
Use int when your number is in the range [-2^31..2^31-1]
Use long when your number is in the range [-2^63..2^63-1]
In addition, byte data type is used for representing "raw" binary data, in which case it is not used like a number.
When you want an unlimited range, use BigInteger. This flexibility comes at a cost: operations on BigInteger can be orders of magnitude more expensive than identical operations on primitives.
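As a quick sanity check, the bounds above are all available as constants on the wrapper classes, and BigInteger picks up where long stops (the class name `Ranges` here is just for illustration):

```java
import java.math.BigInteger;

public class Ranges {
    public static void main(String[] args) {
        // Each wrapper class exposes the exact bounds of its primitive.
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);     // -128 .. 127
        System.out.println(Short.MIN_VALUE + " .. " + Short.MAX_VALUE);   // -32768 .. 32767
        System.out.println(Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println(Long.MIN_VALUE + " .. " + Long.MAX_VALUE);
        // Beyond long: BigInteger has no fixed range, at a performance cost.
        BigInteger big = BigInteger.valueOf(Long.MAX_VALUE).multiply(BigInteger.TEN);
        System.out.println(big);
    }
}
```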
Do you know what values your variable will potentially hold?
If it will be between -128 and 127, use a byte.
If it will be between -32,768 and 32,767, use a short.
If it will be between -2^31 and 2^31 - 1, use an int.
If it will be between -2^63 and 2^63 - 1, use a long.
Generally an int will be fine for most cases, but you might use a byte or a short if memory is a concern, or a long if you need larger values.
Try writing a little test program. What happens when you put a value that's too large in one of these types of variables?
Source: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
See also: the year 2038 problem
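Such a test program shows that Java's integer arithmetic silently wraps around rather than failing (a minimal sketch):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;   // 2147483647
        System.out.println(big + 1);   // wraps around to -2147483648
        byte b = (byte) 200;           // 200 doesn't fit in a signed byte...
        System.out.println(b);         // ...so the cast keeps the low 8 bits: -56
    }
}
```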
The general guidance should definitely be: If you don't have any good reason to pick a specific primitive, pick int.
This has at least three advantages:
Every kind of arithmetic or logical operation will return an int. Hence if you use anything smaller you will have to cast a lot to actually assign values back.
It's the accepted standard for "some number". If you pick anything else, other developers will wonder about the reasons.
It will be the most efficient, although it's likely that the performance difference from using anything else will be minuscule.
That said if you have an array of millions of elements yes you definitely should figure out what primitive fits the input range - I refer you to the other answers for a summary of how large each one is.
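The first advantage in practice: arithmetic on byte or short operands is performed as int, so narrower types force casts on assignment (a small sketch):

```java
public class PromotionDemo {
    public static void main(String[] args) {
        byte a = 3, c = 4;
        // byte sum = a + c;        // does not compile: a + c is an int
        byte sum = (byte) (a + c);  // explicit cast required to narrow back
        int isum = a + c;           // int absorbs the result with no cast
        System.out.println(sum + " " + isum);
    }
}
```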
@dasblinkenlight
When you are working on a large project, you need to optimize the code for each requirement, and part of that is reducing the memory the project occupies.
Hence using int, short, long, and byte appropriately plays a role in optimizing memory. When an integer holds only small values you can declare it as byte or short; you can technically still choose long, float, or double, but to be a good programmer, try to follow the convention of picking the smallest type that fits.
byte - It's 8 bit, so it can hold a value from -128 to 127.
short - It's 16 bit, so it holds -32,768 to 32,767.
long - It's 64 bit: -2^63 to 2^63 - 1.
float and double - for decimal values (double has more precision).
boolean - for yes/no decisions; it holds true or false.
char - for characters.
Hope this answer helps you.
I need to read Adobe's signed 32-bit fixed-point number, with 8 bits for the integer part followed by 24 bits for the fractional part. This is a "path point", as defined in Adobe Photoshop File Formats Specification.
This is how I'd do it in Ruby, but I need to do it in Java.
read(1).unpack('c*')[0].to_f +
(read(3).unpack('B*')[0].to_i(2).to_f / (2 ** 24)).to_f
Based on some discussion in the comments of David Wallace's answer, there is a significantly faster way to compute the correct double.
Probably the very fastest way to convert it is like this:
double intFixedPoint24ToDouble(int bits){
    return ((double) bits) / (1 << 24);
}
(The parentheses around 1 << 24 matter: / binds tighter than <<, so without them the expression doesn't even compile.)
The reason that this is faster is because of the way double-precision floating-point arithmetic works. In this case, the division by a constant power of two reduces to extremely simple steps. When this gets run, the actual steps look like this:
Convert an int (bits) to a double (done on FPU, usually). This is quite fast.
Subtract 0x01800000 (that is, 24 << 20, i.e. 24 taken off the exponent field) from the upper 32 bits of that result. This is extremely fast.
A very similar optimization can be applied whenever you multiply or divide any floating point number by any compile-time constant integer that is a power of two.
This compiler optimization does not apply if you are instead dividing by a double, or if you divide by a non-compile-time-constant expression (any expression involving anything other than final compile-time-constant variables, literal numbers, or operators). In that case, it must be performed as a double-precision floating-point division, which is probably the slowest single operation, except for block transfers and advanced mathematical functions.
However, as you can see, 1<<24 is a compile-time constant power of two, and so the optimization does apply in this case.
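You can observe the exponent-only adjustment directly in the bit pattern: for normal (non-denormal, non-underflowing) doubles, dividing by 2^24 subtracts exactly 24 from the 11-bit exponent field, which starts at bit 52:

```java
public class ExponentDemo {
    public static void main(String[] args) {
        double x = 123.5;
        long before = Double.doubleToLongBits(x);
        long after  = Double.doubleToLongBits(x / (1 << 24));
        // Only the exponent field changed: it dropped by exactly 24.
        System.out.println((before - after) == (24L << 52)); // true
    }
}
```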
Read your bytes into an int (which is always 32 bits in Java), then use this. You need double, not float, because single-precision floating point doesn't have enough significand bits to hold every 32-bit fixed-point value exactly.
double toFixedPoint(int bytes){
return bytes / Math.pow(2, 24);
}
If speed is a concern, then work out Math.pow(2,24) outside of this method and store it.
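To get that int in the first place from the four bytes of the 8.24 value (big-endian, per the Photoshop spec), a sketch along these lines could work; `read8dot24` is an illustrative name:

```java
public class FixedPoint {
    // Assemble four big-endian bytes into a signed 32-bit int,
    // then scale by 2^24 to recover the 8.24 fixed-point value.
    static double read8dot24(byte b0, byte b1, byte b2, byte b3) {
        int bits = ((b0 & 0xFF) << 24) | ((b1 & 0xFF) << 16)
                 | ((b2 & 0xFF) << 8)  |  (b3 & 0xFF);
        return bits / (double) (1 << 24);
    }

    public static void main(String[] args) {
        // 0x01800000 is 1.5 in 8.24 fixed point; 0xFF000000 is -1.0.
        System.out.println(read8dot24((byte) 0x01, (byte) 0x80, (byte) 0x00, (byte) 0x00)); // 1.5
        System.out.println(read8dot24((byte) 0xFF, (byte) 0x00, (byte) 0x00, (byte) 0x00)); // -1.0
    }
}
```

The `& 0xFF` masks are essential: Java bytes are signed, so without them a byte like 0x80 would sign-extend and corrupt the assembled int.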
I'm porting some C++ code to Java code.
There is no unsigned datatype in java which can hold 64 bits.
I have a hashcode which is stored in Java's long datatype (which of course is signed).
long vp = hashcode / 38; // hashcode is of type 'long'
Since 38 here is at least 2, the resulting quotient is below 2^63 and fits in a non-negative signed long, so it can be safely used for any other arithmetic in Java.
The question is what if the signed bit in 'hashcode' is set to 1. I don't want to get a negative value in variable vp. I wanted a positive value as if the datatype is an unsigned one.
P.S.: I don't want to use BigInteger for this purpose because of performance concerns.
Java's primitive integral types are signed, and there isn't really anything you can do about it. However, depending on what you need it for, this may not matter.
Since the integers are all done in two's complement, signed and unsigned values are exactly the same at the binary level. The difference is how you interpret them in certain operations. Specifically, right shift, division, modulus and comparison differ (addition, subtraction and multiplication do not). Unsigned right shifts can be done with the >>> operator. As long as you don't need one of the differing operators, you can use longs perfectly well.
If you can use third-party libraries, you can e.g. use Guava's UnsignedLongs class to treat long values as unsigned for many purposes, including division. (Disclosure: I contribute to Guava.)
Well, here is how I solved this: unsigned-right-shift hashcode by 1 bit with >>> (an unsigned division by 2), then divide that shifted number by 19 (which is 38/2). Essentially this divides the number by 38 exactly as it is done in C++, and I got the same value as in C++.
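This works because floor(floor(x/2)/19) equals floor(x/38) for non-negative x, and >>> makes the first halving unsigned. A quick check against Long.divideUnsigned (Java 8); the names here are illustrative:

```java
public class Div38 {
    // Unsigned division by 38 without unsigned support:
    // halve with >>> (an unsigned divide by 2), then divide by 19.
    static long div38(long hashcode) {
        return (hashcode >>> 1) / 19;
    }

    public static void main(String[] args) {
        long h = 0x8000000000000001L;  // sign bit set
        System.out.println(div38(h) == Long.divideUnsigned(h, 38)); // true
    }
}
```

Note this only works because 38 is even; an odd divisor can't be split this way.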
I know this is a n00b question, but I want to define a variable with a value between 0 and 100 in Java. I could use int or byte as the data type -- but which one would be the best to use, and why? What are the benefits?
Either int or byte would work for storing a number in that range.
Which is best depends on what you are aiming to do.
If you need to store lots of these numbers in an array, then a byte[] will take less space than an int[] with the same length.
(Surprisingly) a single byte variable takes the same amount of space as an int, due to the way the JVM does the stack frame / object frame layout.
If you are doing lots of arithmetic, etc using these values, then int is more convenient than byte. In Java, arithmetic and bitwise operations automatically promote the operands to int, so when you need to assign back to a byte you need to use a (byte) type-cast.
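Concretely, a small sketch of that promotion rule and the narrowing cast it forces:

```java
public class ByteMath {
    public static void main(String[] args) {
        byte x = 9;
        byte y = 20;
        int sum = x + y;              // fine: byte operands promote to int
        byte narrow = (byte) (x * y); // 180 doesn't fit in a byte; wraps to -76
        System.out.println(sum + " " + narrow);
    }
}
```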
Use the byte datatype to store values between -128 and 127.
I think since you have used 0.00 with decimals, this means you want to allow decimals.
The best way to do this is by using a floating point type: double or float.
However, if you don't need decimals, and the number is always an integer, you can use either int or byte. A byte can hold anything from -128 up to +127, and in an array takes up much less space than an int: with 32-bit ints, each byte takes a quarter of the space!
The disadvantages are that you have to be careful when doing arithmetic on bytes - they can overflow, and cause unexpected results if they go out of range.
If you are using basic I/O functions, like creating sounds or reading old file formats, byte is invaluable. But watch out for the sign convention: negative values have the high-order bit set. However, when doing normal numerical calculations I always use int, since it allows easy extension to larger values if needed, and on 32-bit CPUs there is minimal speed cost. The only exception is if you are storing a large quantity of them.
Citing Primitive Data Types:
byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.
int: The int data type is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice unless there is a reason (like the above) to choose something else. This data type will most likely be large enough for the numbers your program will use, but if you need a wider range of values, use long instead.
Well, it depends on what you're doing... out of the blue like that, I would say use int because it's the standard way to declare an integer. But if you're doing something specific that requires memory optimization, then maybe byte could be better in that case.
Technically, neither one is appropriate--if you need to store the decimal part, you want a float.
I'd personally argue that unless you're actually getting into resource-constrained issues (embedded system, big data, etc.) your code will be slightly clearer if you go with int. Good style is about being legible to people, so indicating that you're working with integers might make more sense than prematurely optimizing and thus forcing future maintainers of your code (including yourself) to speculate as to the true nature of that variable.
I heard C, C++, and Java use two's complement for binary representation. Why not use one's complement? Is there any advantage of two's complement over one's complement?
Working with two's complement signed integers is a lot cleaner. You can basically add signed values as if they were unsigned and have things work as you might expect, rather than having to explicitly deal with an additional carry addition. It is also easier to check whether a value is 0, because two's complement has only one representation of 0, whereas one's complement has both a positive and a negative zero.
As for the additional carry addition, think about adding a positive number to a smallish negative number. Because of the one's complement representation, the smallish negative number will actually be fairly large when viewed as an unsigned quantity. Adding the two together might lead to an overflow with a carry bit. Unlike unsigned addition, this doesn't necessarily mean that the value is too large to represent in the one's complement number, just that the representation temporarily exceeded the number of bits available. To compensate for this, you add the carry bit back in after adding the two one's complement numbers together.
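The end-around carry can be simulated; a sketch of 8-bit one's complement addition, where the class and method names are illustrative:

```java
public class OnesComplement {
    // Add two 8-bit one's complement values, folding the carry back in.
    static int addOnes(int a, int b) {
        int sum = (a & 0xFF) + (b & 0xFF);
        if (sum > 0xFF) {
            sum = (sum & 0xFF) + 1;  // end-around carry
        }
        return sum & 0xFF;
    }

    public static void main(String[] args) {
        // In 8-bit one's complement, -1 is 0xFE (the bitwise NOT of 1).
        int minusOne = 0xFE;
        int five = 0x05;
        // 5 + (-1): the raw sum 0x103 overflows 8 bits, so the carry
        // is added back in: 0x03 + 1 = 0x04, i.e. 4.
        System.out.println(addOnes(five, minusOne)); // 4
    }
}
```

Two's complement hardware needs none of this: a plain binary adder gives the right answer directly.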
The internal representation of numbers is not part of any of those languages, it's a feature of the architecture of the machine itself. Most implementations use 2's complement because it makes addition and subtraction the same binary operation (signed and unsigned operations are identical).
Is this a homework question? If so, think of how you would represent 0 in a 1's complement system.
The answer is different for different languages.
In the case of C, you could in theory implement the language on a 1's complement machine ... if you could still find a working 1's complement machine to run your programs! Using 1's complement would introduce portability issues, but that's the norm for C. I'm not sure what the deal is for C++, but I wouldn't be surprised if it is the same.
In the case of Java, the language specification sets out precise sizes and representations for the primitive types, and precise behaviour for the arithmetic operators. This is done to eliminate the portability issues that arise when you make these things implementation specific. The Java designers specified 2's complement arithmetic because all modern CPU architectures implement 2's complement and not 1's complement integers.
For reasons why modern hardware implements 2's complement and not 1's complement, take a look at (for example) the Wikipedia pages on the subject. See if you can figure out the implications of the alternatives.
At least C and C++ offer 1's complement negation (which is the same as bitwise negation) via the language's ~ operator. Most processors - and all modern ones - use 2's complement representation for a couple reasons:
Adding positive and negative numbers is the same as adding unsigned integers
No "wasted" values due to two representations of 0 (+0 and -0)
Edit: The draft of C++0x does not specify whether signed integer types are 1's complement or 2's complement, which means it's highly unlikely that earlier versions of C and C++ did specify it. What you have observed is implementation-defined behavior, which is 2's complement on at least modern processors for performance reasons.
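Java makes the first point observable: int addition is the same bit-level operation whether you read the operands as signed or unsigned (a small sketch):

```java
public class TwosComplementAdd {
    public static void main(String[] args) {
        int a = 0xFFFFFFFE;  // -2 signed, 4294967294 unsigned
        int b = 3;
        int sum = a + b;     // one adder circuit serves both readings
        System.out.println(sum);                         // 1 (signed: -2 + 3)
        System.out.println(Integer.toUnsignedLong(sum)); // 1 (unsigned: wraps mod 2^32)
    }
}
```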
Almost all existing CPU hardware uses two's complement, so it makes sense that most programming languages do, too.
C and C++ support one's complement, if the hardware provides it.
It has to do with zero and rounding. If you use one's complement, you can end up with two zeros. See here for more info.
Sign-magnitude representation would be much better for numeric code. The lack of symmetry is a real problem with two's complement and also precludes a lot of useful (numerically oriented) bit-hacks. Two's complement also introduces tricky situations where an arithmetic operation may not give you the result you think it should, so you must be mindful with regard to division, shifting and negation.