How to check the exponent of a number in Java

I want to check the exponent of numbers generated in my program (they look like 2.333E-4 when I print them). I want to find the numbers whose exponent is smaller than E-4 and increase them by multiplying by 10 until they reach E-4.
Any ideas how to achieve this? Even if someone can just tell me how to extract the exponent, it will be helpful.

You can use Double.doubleToRawLongBits. From the Java docs:
public static long doubleToRawLongBits(double value)
Returns a representation of the specified floating-point value according to the IEEE 754 floating-point "double format" bit layout, preserving Not-a-Number (NaN) values.
Bit 63 (the bit that is selected by the mask 0x8000000000000000L) represents the sign of the floating-point number.
Bits 62-52 (the bits that are selected by the mask 0x7ff0000000000000L) represent the exponent.
Bits 51-0 (the bits that are selected by the mask 0x000fffffffffffffL) represent the significand (sometimes called the mantissa) of the floating-point number.
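So you could get the double value into a long, mask it with 0x7ff0000000000000L, shift it by 13*4 = 52 bits (13 hex digits = 13*4 bits) to the right, and subtract the bias of 1023 to get the exponent. Note that this is a power-of-two exponent, not the decimal exponent you see in 2.333E-4.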
This link will help with the decoding: IEEE754 double FP format - exponent encoding
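As a rough sketch of the bit-level approach above (the class and method names are made up for illustration), the exponent field can be masked out, shifted down by 52 bits and un-biased by 1023. Keep in mind that the result is the power-of-two exponent, not the decimal exponent printed in 2.333E-4. For normal, finite values it should agree with the built-in Math.getExponent.

```java
class BinaryExponent {
    // Hypothetical helper: returns the unbiased power-of-two exponent of a
    // normal, finite double by reading the IEEE 754 exponent field.
    static int unbiasedExponent(double value) {
        long bits = Double.doubleToRawLongBits(value);
        // Keep bits 62-52 only (this also drops the sign bit), then move them down.
        int biased = (int) ((bits & 0x7ff0000000000000L) >>> 52);
        return biased - 1023; // remove the IEEE 754 bias
    }

    public static void main(String[] args) {
        double number = 2.333E-4;
        System.out.println(unbiasedExponent(number)); // binary exponent from the raw bits
        System.out.println(Math.getExponent(number)); // same value via the standard library
    }
}
```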

Another way to extract the exponent is to use the logarithm (base 10) function (Math.log10(double a) in Java).
For the input value you give, it will return -3.6320852612062473. For the input value 1e-4, it will return -4. Therefore, here is what you could do (assuming that "number" is the double value you want to test in the code below):
// number must be positive: Math.log10(0) is -Infinity, so 0 would loop forever
while (number > 0 && Math.log10(number) < -4) {
    number = number * 10;
}
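If you only need the decimal exponent as an integer, one alternative sketch reads it off the shortest decimal form that Double.toString produces, via BigDecimal. The class and method names below are hypothetical, and the code assumes finite input (NaN and infinities are not handled). Zero is returned unchanged because it has no meaningful exponent.

```java
import java.math.BigDecimal;

class DecimalExponent {
    // Hypothetical helper: decimal exponent of a finite, non-zero double,
    // i.e. the N in d.dddEN, taken from its Double.toString form.
    static int decimalExponent(double value) {
        BigDecimal bd = new BigDecimal(Double.toString(value));
        // precision = number of significant digits, scale = digits after the point
        return bd.precision() - bd.scale() - 1;
    }

    // Multiplies by 10 until the decimal exponent is at least minExponent.
    static double raiseExponent(double value, int minExponent) {
        while (value != 0.0 && decimalExponent(value) < minExponent) {
            value *= 10;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decimalExponent(2.333E-4));
        System.out.println(raiseExponent(0.01234e-4, -4));
    }
}
```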

You can do this
double d = 0.01234e-4;
if (d < 1e-4)
    System.out.printf("%fE-4", d * 1e4);
else
    System.out.printf("%g", d);
prints
0.012340E-4

Related

Type casting in Java. Double to long

Why is the output of System.out.println((long)Math.pow(2,63)); and System.out.println((long)(Math.pow(2,63)-1)); the same in Java?
The output is the same because double does not have enough bits to represent 2^63 - 1 exactly.
The mantissa of a double has only 52 stored bits (53 counting the implied leading 1).
This gives you at most 17 decimal digits of precision. 2^63 itself is a power of two, so it is stored exactly as 9223372036854775808 (Java prints it as 9.223372036854776E18):
Mantissa is set to 1.0 (1 in front is implied)
Exponent is set to 1086 (1023 is implicitly subtracted to yield 63)
The mantissa of the representation of 1 is the same, while the exponent is 1023 for the effective value of zero, i.e. the exponents of the two numbers differ by 63, which is more than the size of the mantissa.
The subtraction of 1 happens while your number is represented as a double. Since the magnitude of the minuend is much larger than that of the subtrahend, the result rounds back to 2^63 and the subtraction has no effect. The cast to long then clamps 2^63, which is one more than Long.MAX_VALUE, to 9223372036854775807 in both cases.
You would get the same result after subtracting much larger numbers, all the way to 512, which is 2^9. Beyond that, the subtrahend is more than half the spacing between adjacent doubles just below 2^63 (which is 1024), so you would start getting different results.
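The rounding behaviour described above can be checked with a small experiment. This is only an illustrative sketch; the class name and the absorbed helper are invented for the example.

```java
import java.math.BigDecimal;

class PowCastDemo {
    // Hypothetical helper: true when subtracting delta from x rounds back to x itself.
    static boolean absorbed(double x, double delta) {
        return x - delta == x;
    }

    public static void main(String[] args) {
        double p = Math.pow(2, 63);
        System.out.println(new BigDecimal(p));      // exact value held by the double
        System.out.println(p - Math.nextDown(p));   // gap to the next double below p
        System.out.println(absorbed(p, 1));         // is subtracting 1 lost?
        System.out.println(absorbed(p, 512));       // exactly half the gap: the tie rounds back to p
        System.out.println(absorbed(p, 513));       // more than half the gap
        System.out.println((long) p == Long.MAX_VALUE); // the cast clamps to the long range
    }
}
```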
Math.pow(double, double) returns a double value.
double in Java is a 64-bit IEEE 754 floating-point type (https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
If you look here: https://en.wikipedia.org/wiki/Double-precision_floating-point_format you will find, that this format is composed of:
1 bit sign
11 bit exponent
53 bit significand precision (52 bits explicitly stored)
The value 2^63 - 1 would need a higher precision (63 bits) to be stored exactly.
Basically the 1 you subtract is below this precision threshold.
In contrast, long has 64-bit precision.
To make it clearer, let's assume we are working in decimal and not in base 2:
In some imaginary small float datatype with a precision of 2 decimal places, the value 1000 would be stored as 1.00e3. If you add 1, it would have to be stored as 1.001e3. But since we only have a precision of 2, it can only store 1.00e3 and nothing changes. So 1.00e3 + 1 == 1.00e3
The same happens in your example, only that we are dealing with larger numbers and base 2, of course.
You should use parentheses so that the cast to long happens first and 1 is subtracted afterwards, like this:
System.out.println((long)Math.pow(2,63));
System.out.println(((long)(Math.pow(2,63))-1));
Output:
9223372036854775807
9223372036854775806
For the long data type in Java, the maximum value is 9,223,372,036,854,775,807 (inclusive), which is 2^63 - 1.
So even if you try
System.out.println((long)Math.pow(2,65));
System.out.println((long)(Math.pow(2,63)-1));
output
9223372036854775807
9223372036854775807

Determine smallest floating point type that can hold a string value

I'm working on a method that translates a string into an appropriate Number type, depending upon the format of the number. If the number appears to be a floating point value, then I need to return the smallest type I can use without sacrificing precision (Float, Double or BigDecimal).
Based on How many significant digits have floats and doubles in java? (and other resources), I've learned that Float values have 23 bits for the mantissa. Based on this, I used the following method to return the bit length for a given value:
private static int getBitLengthOfSignificand(String integerPart,
String fractionalPart) {
return new BigInteger(integerPart + fractionalPart).bitLength();
}
If the result of this test is below 24, I return a Float. If below 53 I return a Double, otherwise a BigDecimal.
However, I'm confused by the result when I consider Float.MAX_VALUE, which is 3.4028235E38. The bit length of the significand is 26 according to my method (where integerPart = 3 and fractionalPart = 4028235). This triggers my method to return a Double, when clearly Float would suffice.
Can someone highlight the flaw in my thinking or implementation? Another idea I had was to convert the string to a BigDecimal and scale down using floatValue() and doubleValue(), testing for overflow (which is represented by infinite values). But that loses precision, so isn't appropriate for me.
The significand is stored in binary, and you can think of it as a number in its decimal representation only if you don't let it confuse you.
The exponent is a binary exponent that does not represent a multiplication by a power of ten but by a power of two. For this reason, the E38 in the number you used as example is only a convenience: the real significand is in binary and should be multiplied by a power of two to obtain the actual number. Powers of two and powers of ten aren't the same, so “3.4028235” is not the real significand.
The real significand of Float.MAX_VALUE is, in hexadecimal notation, 0x1.fffffe, and its associated exponent is 127, meaning that Float.MAX_VALUE is actually 0x1.fffffe * 2^127.
Looking at the decimal representation to choose a binary floating-point type to put the value in, as you are trying to do, doesn't work. For one thing, the number of decimal digits that one is sure to recover from a float is different from the number of decimal digits one may need to write to distinguish a float from its neighbors (6 and 9 respectively). You chose to write “3.4028235E38” but you could have written 3.40282E38, which for your algorithm looks easier to represent, when it isn't, really. When people write that “3.4028235E38” is the largest finite value of the float type, they mean that if you round this decimal number to float, you will arrive at the largest float. If you parse “3.4028235E38” as a double-precision number it won't even be equal to Float.MAX_VALUE.
To put it differently: another way to write Float.MAX_VALUE is 3.4028234663852885981170418348451692544E38. It is still representable as a float (it represents the exact same value as 3.4028235E38). It looks like it has many digits because these are decimal digits that appear for a decimal exponent, when in fact the number is represented internally with a binary exponent.
(By the way, your approach does not check that the exponent is in range to represent a number in the chosen type, which is another condition for a type to be able to represent the number from a string.)
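The points made in this answer about Float.MAX_VALUE can be observed directly. The snippet below is only a sketch; the class name and the sameValue helper are hypothetical.

```java
import java.math.BigDecimal;

class FloatMaxDemo {
    // Hypothetical helper: does the decimal string denote exactly the value stored in f?
    static boolean sameValue(String decimal, float f) {
        return new BigDecimal(decimal).compareTo(new BigDecimal(f)) == 0;
    }

    public static void main(String[] args) {
        // Binary significand and exponent of the largest float
        System.out.println(Float.toHexString(Float.MAX_VALUE));
        // Exact decimal expansion of the stored value
        System.out.println(new BigDecimal(Float.MAX_VALUE));
        // The short decimal form rounds to MAX_VALUE as a float...
        System.out.println(Float.parseFloat("3.4028235E38") == Float.MAX_VALUE);
        // ...but parsed as a double it is a different number
        System.out.println(Double.parseDouble("3.4028235E38") == Float.MAX_VALUE);
        // The short form is not the exact stored value, the long form is
        System.out.println(sameValue("3.4028235E38", Float.MAX_VALUE));
        System.out.println(sameValue("3.4028234663852885981170418348451692544E38", Float.MAX_VALUE));
    }
}
```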
I would work in terms of the difference between the actual value and the nearest float. BigDecimal can store any finite length decimal fraction exactly and do arithmetic on it:
Convert the String to the nearest float x. If x is infinite, but the value has a finite double representation use that.
Convert the String exactly to BigDecimal y.
If y is zero, use float, which can represent zero exactly.
If not, convert the float x to BigDecimal, z.
Calculate, in BigDecimal to a reasonable number of decimal places, the absolute value of (y-z)/z. That is the relative rounding error due to using float. If it is small enough for your purposes, less than some value you pick, use float. If not, use double.
If you literally want no sacrifice in precision, it is much simpler. Convert to both float and double. Compare them for equality. The comparison will be done in double. If they compare equal, go with the float. If not, go with the double.
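A minimal sketch of the "convert to both and compare" idea might look like this. The class and method names are invented, and the code only chooses between Float and Double: it does not test whether the double itself matches the decimal string exactly, so the BigDecimal fallback from the question is left out.

```java
class SmallestFloatingType {
    // Hypothetical helper: returns a Float when the float and double parses
    // of the text are the same number, otherwise a Double.
    static Number parseSmallest(String text) {
        float f = Float.parseFloat(text);
        double d = Double.parseDouble(text);
        // f is widened to double here, so the comparison is done in double
        if (f == d) {
            return Float.valueOf(f);
        }
        return Double.valueOf(d);
    }

    public static void main(String[] args) {
        System.out.println(parseSmallest("0.5").getClass().getSimpleName());
        System.out.println(parseSmallest("0.1").getClass().getSimpleName());
        System.out.println(parseSmallest("3.4028235E38").getClass().getSimpleName());
    }
}
```

Note that text such as "1e400" overflows both types to infinity, so the two parses compare equal; a real implementation would probably want to treat that case separately.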

Calculate the maximum value of a Java Double (5)?

Calculating this value for a long is easy:
It is simply 2 to the power of n-1, and then minus 1. n is the number of bits in the type. For a long this is defined as 64 bits. Because we must represent negative numbers as well, we use n-1 instead of n. Because 0 must be accounted for, we subtract 1. So the maximum value is:
MAX = 2^(n-1)-1
What is the equivalent thought process for a double:
Double.MAX_VALUE
comes to be
1.7976931348623157E308
The maximum finite value for a double is, in hexadecimal format, 0x1.fffffffffffffp1023, representing the product of a number just below 2 (1.ff… in hexadecimal notation) by 2^1023. When written this way, it is easy to see that it is made of the largest possible significand and the largest possible exponent, in a way very similar to the way you build the largest possible long in your question.
If you want a formula where all numbers are written in the decimal notation, here is one:
Double.MAX_VALUE = (2 - 1/2^52) * 2^1023
Or if you prefer a formula that makes it clear that Double.MAX_VALUE is an integer:
Double.MAX_VALUE = 2^1024 - 2^971
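Both formulas can be checked against Double.MAX_VALUE. The following is just a sketch with made-up class and method names; it uses hexadecimal floating-point literals (0x1p-52 is 2^-52, 0x1p1023 is 2^1023) so that every intermediate value is exact.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class DoubleMaxDemo {
    // (2 - 1/2^52) * 2^1023, computed in double arithmetic.
    static double fromSignificandAndExponent() {
        return (2 - 0x1p-52) * 0x1p1023;
    }

    // 2^1024 - 2^971 as an exact integer.
    static BigInteger asInteger() {
        return BigInteger.ONE.shiftLeft(1024).subtract(BigInteger.ONE.shiftLeft(971));
    }

    public static void main(String[] args) {
        System.out.println(fromSignificandAndExponent() == Double.MAX_VALUE);
        System.out.println(0x1.fffffffffffffp1023 == Double.MAX_VALUE);
        // new BigDecimal(double) gives the exact value stored in the double
        BigInteger exactMax = new BigDecimal(Double.MAX_VALUE).toBigIntegerExact();
        System.out.println(asInteger().equals(exactMax));
    }
}
```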
If we look at the representation provided by Oracle:
0x1.fffffffffffffp1023
or
(2-2^-52)·2^1023
We can see that
fffffffffffff
is 13 hexadecimal digits that can be represented as 52 binary digits ( 13 * 4 ).
If each is set to 1 as it is ( F = 1111 ), we obtain the maximum fractional part.
The fractional part is always 52 bits as defined by
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
1 bit is for sign
and the remaining 11 bits make up the exponent.
Because the exponent must be both positive and negative and it must represent 0, it too can have a maximum value of:
2^10 - 1
or
1023
Doubles (and floats) are represented internally as binary fractions according to the IEEE standard 754
and can therefore not represent decimal fractions exactly:
http://mindprod.com/jgloss/floatingpoint.html
http://www.math.byu.edu/~schow/work/IEEEFloatingPoint.htm
http://en.wikipedia.org/wiki/Computer_numbering_formats
So there is no equivalent calculation.
Just take a look at the documentation. Basically, the MAX_VALUE computation for Double uses a different formula because of the finite number of real numbers that can be represented on 64 bits. For an extensive justification, you can consult this article about data representation.

Why is (long)9223372036854665200d giving me 9223372036854665216?

I know about weird stuff with precision errors, but I can't fathom,
Why is (long)9223372036854665200d giving me 9223372036854665216 ?
9223372036854665200d is a constant of type double. However, 9223372036854665200 does not fit in a double without loss of precision. A double only has 52 bits of mantissa (53 counting the implied leading 1), whereas the number in question needs more significant bits than that to be represented exactly.
The nearest double to 9223372036854665200d is the number whose mantissa equals 1.1111111111111111111111111111111111111111111110010100 in binary and whose exponent is 63 (decimal). This number is none other than 9223372036854665216 (call it U).
If we decrease the mantissa one notch to 1.1...0011, we get 9223372036854664192 (call it L).
The original number is between L and U and is much closer to U than it is to L, so it rounds to U.
Finally, if you think that this truncation of the mantissa ought to result in a number that ends in a bunch of zeros, you're right. Only it happens in binary, not in decimal: U in base-16 is 0x7ffffffffffe5000 and L is 0x7ffffffffffe4c00.
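The two neighbours U and L mentioned above can be reproduced with Math.nextDown. This is an illustrative sketch only; the class and the neighbours helper are hypothetical names.

```java
class NearestDoubleDemo {
    // Hypothetical helper: returns { L, U } as longs, where U is the double
    // the argument was rounded to and L is the next double below it.
    static long[] neighbours(double d) {
        return new long[] { (long) Math.nextDown(d), (long) d };
    }

    public static void main(String[] args) {
        long[] lu = neighbours(9223372036854665200d);
        System.out.println("L = " + lu[0] + " = 0x" + Long.toHexString(lu[0]));
        System.out.println("U = " + lu[1] + " = 0x" + Long.toHexString(lu[1]));
        System.out.println("gap = " + (lu[1] - lu[0])); // spacing between adjacent doubles here
    }
}
```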
Because doubles don't have that much precision. Why are you doing such a strange thing? Change the d to l.
Doubles have 53 bits of precision (52 of them stored explicitly), whereas a long has 64 bits of precision (for integers only). The bits a double gives up are used to store the exponent, which allows a double to represent larger/smaller numbers than a long can.
Your number is 19 digits long, whereas a double can only store roughly 16 digits of (decimal) integer data. Thus the final number ends up being rounded.
Reference: Double - Wikipedia
Because doubles have limited precision. Your constant has more significant digits than a double can keep track of, so it loses them.
You are assuming that limited precision means that it is represented in decimal so is limited to 15 or 16 digits. Actually it is represented in binary and limited to 53 bits of precision. double takes the closest representable value.
double d = 9223372036854665200d;
System.out.println(d +" is actually\n" + new BigDecimal(d)+" so when cast to (long) is\n"+(long) d);
prints
9.2233720368546652E18 is actually
9223372036854665216 so when cast to (long) is
9223372036854665216

Given IEEE binary representation of a real how to get its true binary representation in Java?

I am just wonder how to use bit operations to achieve the goal: given an IEEE binary representation of a real, for example, 40AC0000 (5.375 in decimal), how to get its true binary representation (expecting 101.011 for the example) in Java?
This is kind of a tough question, especially if you don't already know about IEEE floats.
Since there are 4 bytes in your number, it's single precision. This means it has a structure of 1 sign bit, 8 exponent bits and 23 mantissa bits. The sign bit is obvious. The meaning of the exponent bits affects how you interpret the mantissa bits. First check the 8 exponent bits. If they are all 0, you have a denormalized number; if they are all 1, you have an infinity value or a NaN; otherwise, it is normalized.
In the normalized case, take the exponent bits, interpret them as an 8-bit number and subtract 127 (0x7f) from it. This is your exponent. Then take the remaining mantissa bits and add a leading 1. Your result is then (-1)^[Sign] * 1.[Mantissa] * 2^[Exponent].
If it is a denormalized number, your exponent is -126 (1-127). In this case, interpret as (-1)^[Sign] * 0.[Mantissa] * 2^[Exponent].
In the remaining cases, if the Mantissa is all 0s, your number is (-1)^[Sign] * infinity. Otherwise, your float is a NaN.
Hope that helps.
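The decoding steps in this answer could be sketched roughly as follows, producing a string such as 101.011 from the raw bits. The class and method names are invented for the example, and BigInteger is used only so that very large or very small exponents need no special handling.

```java
import java.math.BigInteger;

class IeeeSingleToBinary {
    // Hypothetical helper: positional binary notation for an IEEE 754 single given as raw bits.
    static String toBinaryString(int bits) {
        int sign = bits >>> 31;
        int expField = (bits >>> 23) & 0xff;
        int fraction = bits & 0x7fffff;
        if (expField == 0xff) { // all ones: infinity or NaN
            return fraction != 0 ? "NaN" : (sign == 1 ? "-Infinity" : "Infinity");
        }
        // Normalized numbers get the implied leading 1; denormalized ones do not.
        int significand = expField == 0 ? fraction : (fraction | 0x800000);
        // value = significand * 2^exponent, with the 23 fraction bits folded in
        int exponent = (expField == 0 ? -126 : expField - 127) - 23;
        BigInteger sig = BigInteger.valueOf(significand);
        String intPart;
        String fracPart = "";
        if (exponent >= 0) {
            intPart = sig.shiftLeft(exponent).toString(2);
        } else {
            int shift = -exponent;
            intPart = sig.shiftRight(shift).toString(2);
            BigInteger mask = BigInteger.ONE.shiftLeft(shift).subtract(BigInteger.ONE);
            String low = sig.and(mask).toString(2);
            StringBuilder padded = new StringBuilder();
            for (int i = low.length(); i < shift; i++) {
                padded.append('0'); // restore leading zeros of the fractional part
            }
            padded.append(low);
            fracPart = padded.toString().replaceAll("0+$", ""); // drop trailing zeros
        }
        String digits = fracPart.isEmpty() ? intPart : intPart + "." + fracPart;
        return sign == 1 ? "-" + digits : digits;
    }

    public static void main(String[] args) {
        System.out.println(toBinaryString(0x40AC0000)); // the example from the question (5.375)
    }
}
```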
Do you mean Float.floatToIntBits() and Float.intBitsToFloat() ?
What do you mean by "true binary representation"? There is nothing "untrue" about the hex representation (40AC0000).
You can convert between different radixes (hex, binary, decimal) using the methods on Integer:
Float.floatToIntBits(5.375f);
// = 1085014016
Integer.toString(1085014016, 16);
// = "40ac0000"
Integer.valueOf("40AC0000", 16);
// = 1085014016
Integer.toString(1085014016, 2);
// returns 1000000101011000000000000000000
