I have the following code:
double d = 23d;
byte b = 52;
int result;
I am trying to store the result of d + b in result. I have to cast to do that, but I have found two ways to do it and I don't know what the difference is.
//First method
result = (int) d + b;
//Second Method
result = (int) (d+b);
What is the difference?
Thanks
In the first case, d is cast to int, and then added to b:
result = (int) d + b;
// is also equivalent to
result = ((int) d) + b;
This works because adding a byte to an int requires no cast. On the other hand, int result = (int) b + d would fail: casting b to int and then adding the double d yields a double, which cannot be stored in an int.
In the second case, the addition is done first, in double arithmetic, and then the result is cast to an int.
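To make the distinction concrete, here is a minimal sketch with values of my own choosing (not from the question), picked so the two forms disagree; truncation toward zero happens before the addition in the first form and after it in the second:

```java
public class CastOrder {
    public static void main(String[] args) {
        double d = -0.5;
        byte b = 1;
        System.out.println((int) d + b);   // (int) -0.5 is 0, plus 1 -> prints 1
        System.out.println((int) (d + b)); // -0.5 + 1 is 0.5, (int) 0.5 -> prints 0
        // int bad = (int) b + d; // would not compile: int + double is a double
    }
}
```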
The first method casts only the first operand, which in this case is d. The second method casts the result of the sum of d and b.
For most values, both expressions return the same result. However, take a look at this code, which illustrates the difference:
double d = 1 / (1 - (0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1)); // equal to 9.007199254740992E15
byte b = 1;
System.out.println((long) d + b); // prints 9007199254740993
System.out.println((long) (d + b)); // prints 9007199254740992
Now what happened ?
in the first expression, d is converted to a long first and then added to b
in the second expression, d + b is calculated as a double value and then converted to a long
Why does it matter ?
I did not pick d randomly. It is a value for which d + 1 is an integer but cannot be represented as a double, because the number is large and double has limited precision. As a result, for this particular value, d + 1 == d! A long, however, represents every integer that fits in 64 bits exactly, so converting d to a long before adding 1 gives the true sum without rounding error.
PS: I used long instead of int because this number would not fit in an int and would overflow. The two expressions would still return two different results, but it would make my point less clear.
Thanks! The explanation of my understanding below may help.
1st Method Simplification:
d is cast to int first, truncating its fractional part. Then b is implicitly promoted to int and added, producing an int result.
int d1 = (int) d;
result = d1 + b;
2nd Method Simplification:
b is implicitly converted to double and added to d, producing a double. That double sum is then cast to int, truncating the fractional part of the sum.
double d1 = d + b;
result = (int) d1;
So the difference comes down to whether the truncation to int happens before or after the addition.
Related
I am working on scaled-down RSA encryption and decryption methods, and everything seems to be working well until I try to take the modulus of a number. The modulus operator isn't returning the expected result.
for (int k = 0; k < sA.length; k++) {
    int value = Integer.parseInt(sA[k]);
    System.out.println("value : " + value);
    double mToE = Math.pow(value, e);
    System.out.println("mToE: " + mToE);
    double c = mToE % n;
    System.out.println("C: " + c);
}
sA is an array containing the values {06707708, 30670320, 50050050}.
mToE represents M (in this case, each string in sA) raised to the power of e (13).
C = M^e mod n where n is input as a parameter.
These specific lines output:
value : 6707708
mToE: 5.56498415666044E88
C: 5.2630797E7
0
value : 30670320
mToE: 2.1248975415575414E97
C: 8.973537E7
1
value : 50050050
mToE: 1.2366838911816623E100
C: 3.4150233E7
2
For example the first value of c should be:
c: 3.2059734E7
or 32059734
What reason could there be for getting this result?
Thanks in advance for all of your advice.
The double type has a lot of precision, 53 bits, but that's not enough to resolve differences as small as 1 at the very large magnitudes you're seeing. The values of mToE are merely the double values closest to the true results of the calculation.
With the Math.ulp method (unit in last place), we can determine the precision of a double value of the magnitude 5.56498415666044E88.
System.out.println(Math.ulp(mToE));
This outputs
7.067388259113537E72
Because of this, your value can be off by an amount on the order of 10^72. That of course completely scrambles the value of c, which is taken from the mod % operation.
For the necessary precision, you will need to use BigIntegers. Also, BigInteger has its own modPow method built specifically for your purpose.
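A minimal sketch of that inner step with BigInteger; note that the modulus n below is a made-up stand-in, since the question doesn't show its actual value:

```java
import java.math.BigInteger;

public class RsaModPow {
    public static void main(String[] args) {
        BigInteger m = BigInteger.valueOf(6707708);   // first value from sA
        BigInteger e = BigInteger.valueOf(13);
        BigInteger n = BigInteger.valueOf(100000007); // hypothetical modulus
        // modPow computes m^e mod n exactly, without losing precision
        BigInteger c = m.modPow(e, n);
        System.out.println("C: " + c);
    }
}
```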
My problem:
int a = 30, b = 12, c = 2, d = 5, e = 1;
double result = (double) (a - b / (c * d) + e);
System.out.print("Result = " + (double) result + " - " + result);
Result:
Result = 30.0 - 30.0
What I want to see is: 29.8!
I have integers but I want to evaluate an expression which I need to have in double precision.
Is there a simple way to do what I tried?
You have to cast one of the integers in your expression to double. But not just any of them will do: the important one to cast is an operand of the division, since the integer division is what causes the precision loss you're seeing.
For example :
double result = (a - (double)b / (c * d) + e);
or
double result = (a - b / (double)(c * d) + e);
This will ensure the division is done using floating point arithmetic.
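A quick sketch of the difference, using the question's values:

```java
public class DivCast {
    public static void main(String[] args) {
        int a = 30, b = 12, c = 2, d = 5, e = 1;
        // b / (c * d) is 12 / 10, an int division: the fraction is discarded
        System.out.println((double) (a - b / (c * d) + e)); // prints 30.0
        // casting one operand of the division forces floating-point division
        System.out.println(a - (double) b / (c * d) + e);   // prints 29.8
    }
}
```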
Why doesn't Java automatically cast int to double in the expression?
Because the expression where you're not getting a fractional value is evaluated based purely on the operands you give it, not the greater context (which would be horribly complicated). So a - b, c * d, and ... + e all work with int values. Only once you have a final result do you cast it to double, and the compiler doesn't look at the greater context and guess that you wanted to do that earlier (which is a Good Thing(tm)).
If you want the operations to happen with double values, you have to tell the compiler that. In this case, you can put that cast in any of several places, which will then "bubble up" to any expressions the result is used in. Here's one place you can put it:
double result = (a - b / (double)(c * d) + e);
The result of an operation between two ints is an int. So 12 / 10 is the int 1; if you want the double 1.2, you must cast one of the operands to double, like double result = (a - (double) b / (c * d) + e);.
I was recently reading about storing floating point values in the memory. And I've written a small program to test what I've read. And I noticed that there is a difference in the way Java processes the floating point values.
public class Test
{
    public static void main(String args[])
    {
        double a = 0.90;
        System.out.println(a);
        System.out.println(2.00 - 1.10);
    }
}
The above program is printing
0.9
0.8999999999999999
Why are these two statements not printing the same value? I know some floating-point values can't be represented exactly, but in that case both should still give the same value.
Why are these two statements not printing the same value?
The result is not the same.
I know some floating values can't be represented exactly.
So you should assume that the result of an operation can depend on the amount of representation error of the values you use.
for (long l = 1; l <= 1e16; l *= 10) {
    double a = l + 2;
    double b = l + 1.1;
    System.out.println(a + " - " + b + " is " + (a - b));
}
As the value gets larger, the representation error increases and becomes large compared with the expected result of 0.9:
3.0 - 2.1 is 0.8999999999999999
12.0 - 11.1 is 0.9000000000000004
102.0 - 101.1 is 0.9000000000000057
1002.0 - 1001.1 is 0.8999999999999773
10002.0 - 10001.1 is 0.8999999999996362
100002.0 - 100001.1 is 0.8999999999941792
1000002.0 - 1000001.1 is 0.9000000000232831
1.0000002E7 - 1.00000011E7 is 0.900000000372529
1.00000002E8 - 1.000000011E8 is 0.9000000059604645
1.000000002E9 - 1.0000000011E9 is 0.8999999761581421
1.0000000002E10 - 1.00000000011E10 is 0.8999996185302734
1.00000000002E11 - 1.000000000011E11 is 0.899993896484375
1.000000000002E12 - 1.0000000000011E12 is 0.9000244140625
1.0000000000002E13 - 1.00000000000011E13 is 0.900390625
1.00000000000002E14 - 1.000000000000011E14 is 0.90625
1.000000000000002E15 - 1.0000000000000011E15 is 0.875
1.0000000000000002E16 - 1.0000000000000002E16 is 0.0
And here is when the representation error gets so large that the operation does nothing at all:
for (double d = 1; d < Double.MAX_VALUE; d *= 2) {
    if (d == d + 1) {
        System.out.println(d + " + 1 == " + (d + 1));
        break;
    }
}
for (double d = 1; d < Double.MAX_VALUE; d *= 2) {
    if (d == d - 1) {
        System.out.println(d + " - 1 == " + (d - 1));
        break;
    }
}
prints
9.007199254740992E15 + 1 == 9.007199254740992E15
1.8014398509481984E16 - 1 == 1.8014398509481984E16
When “0.90” is converted to double, the result is .9 plus some small error, e0. Thus a equals .9+e0.
When “1.10” is converted to double, the result is 1.1 plus some small error, e1, so the result is 1.1+e1.
These two errors, e0 and e1, are generally unrelated to each other. Simply put, different decimal numbers are different distances away from binary floating-point numbers. When you evaluate 2.00 - 1.10, the result is 2 - (1.1 + e1) = .9 - e1. So one of your numbers is .9 + e0, and the other is .9 - e1, and there is no reason to expect them to be the same.
(As it happens in this case, e0 is .00000000000000002220446049250313080847263336181640625, and e1 is .000000000000000088817841970012523233890533447265625. Also, subtracting 1.1 from 2 introduces no new error, after the conversion of “1.1” to double, by Sterbenz’ Lemma.)
Additional details:
In binary, .9 is .11100110011001100110011001100110011001100110011001100 11001100… The bits before the break fit into a double; the trailing bits do not, so the number is rounded at that point. That causes a difference between the exact value of .9 and the value of “.9” represented as a double.
In binary, 1.1 is 1.00011001100110011001100110011001100110011001 10011001… Again, the number is rounded. But observe that the amount of rounding is different. For .9, 1100 1100… was rounded up to 1 0000 0000…, which adds 00110011… at that position. For 1.1, 1001 1001… was rounded up to 1 0000 0000…, which adds 01100110… at that position (and causes a carry into the earlier bits).
The two rounding positions are also different: 1.1 starts to the left of the radix point, so it looks like 1.[52 bits here][place where rounding occurs], while .9 starts to the right of the radix point, so it looks like .[53 bits here][place where rounding occurs]. So the rounding for 1.1, besides being 01100110… instead of 00110011…, is also doubled, because it occurs one bit to the left of the .9 rounding.
You therefore have two effects making e0 different from e1: the trailing bits that were rounded are different, and the place where the rounding occurs is different.
I know some floating values can't be represented exactly
Well, that is your answer (or more precisely, as pointed out by Mark Byers, some decimal values can't be represented exactly as a double)! Neither 0.9 nor 1.1 can be represented exactly as a double, so you get rounding errors.
You can check the exact value of the various doubles with BigDecimal:
public static void main(String args[]) {
    double a = 0.9d;
    System.out.println(a);
    System.out.println(new BigDecimal(a));

    double b = 2d - 1.1d;
    System.out.println(b);
    System.out.println(new BigDecimal(2.0d));
    System.out.println(new BigDecimal(1.1d));
    System.out.println(new BigDecimal(b));
}
which outputs:
0.9
0.90000000000000002220446049250313080847263336181640625
0.8999999999999999
2
1.100000000000000088817841970012523233890533447265625
0.899999999999999911182158029987476766109466552734375
Your reasoning is that even if 0.9 can't be represented precisely by a double, it should have exactly the same double value as 2.0 - 1.1, and so print the same value. That's the flaw in the reasoning: this subtraction does not yield the double represented by "0.9" (nor the exact value 0.9).
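A one-line check of that point, comparing the two doubles directly rather than their printed forms:

```java
public class CompareNines {
    public static void main(String[] args) {
        // the double nearest 0.9 and the result of 2.0 - 1.1 are different doubles
        System.out.println(0.9 == 2.0 - 1.1); // prints false
    }
}
```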
given the following code:
long l = 1234567890123;
double d = (double) l;
is the following expression guaranteed to be true?
l == (long) d
I would think not, because as the numbers get larger, the gaps between adjacent doubles grow beyond 1, so the conversion back can yield a different long value. Depending on which neighboring double the conversion rounds to, this may even happen earlier.
Is there a definitive answer to that?
Nope, absolutely not. There are plenty of long values which aren't exactly representable by double. In fact, that has to be the case, given that both types are represented in 64 bits, and there are obviously plenty of double values which aren't representable in long (e.g. 0.5)
Simple example (Java and then C#):
// Java
class Test {
public static void main(String[] args) {
long x = Long.MAX_VALUE - 1;
double d = x;
long y = (long) d;
System.out.println(x == y);
}
}
// C#
using System;
class Test
{
static void Main()
{
long x = long.MaxValue;
double d = x;
long y = (long) d;
Console.WriteLine(x == y);
}
}
I observed something really strange when doing this though... in C#, long.MaxValue "worked" in terms of printing False... whereas in Java, I had to use Long.MAX_VALUE - 1. My guess is that this is due to some inlining and 80-bit floating point operations in some cases... but it's still odd :)
You can test this, as there are a finite number of long values.
for (long l = Long.MIN_VALUE; l < Long.MAX_VALUE; l++) {
    double d = (double) l;
    if (l != (long) d) {
        System.out.println("long " + l + " fails test");
    }
}
It doesn't take many iterations to prove that:
l = -9223372036854775805
d = -9.223372036854776E18
(long)d = -9223372036854775808
My code started with 0 and incremented by 100,000,000. The smallest number that failed the test was found to be 2,305,843,009,300,000,000 (19 digits). So, any positive long less than 2,305,843,009,200,000,000 is representable exactly by doubles. In particular, 18-digit longs are also representable exactly by doubles.
By the way, the reason I was interested in this question is that I wondered if I can use doubles to represent timestamps (in milliseconds). Since current timestamps are on the order of 13 digits (and it will take for them rather long time to get to 18 digits), I'll do that.
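A small helper sketch for checking a particular long (the name is my own). One caveat, related to the oddity in the earlier answer: the cast from double back to long saturates at Long.MAX_VALUE, so Long.MAX_VALUE itself spuriously passes this check even though it isn't exactly representable.

```java
public class RoundTrip {
    // true if converting to double and back preserves the value
    static boolean roundTripsViaDouble(long l) {
        return (long) (double) l == l;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsViaDouble(1234567890123L));     // true: 13 digits, well under 2^53
        System.out.println(roundTripsViaDouble((1L << 53) + 1));     // false: first long a double can't hold
        System.out.println(roundTripsViaDouble(Long.MAX_VALUE - 1)); // false
    }
}
```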
I need to do some input validation but have run into a question I can't seem to find an answer to (even with Google). The problem is simple: I have 2 positive integers on the input, and I need to check whether their product fits in Java's int type.
One of my attempts was to compare product with Integer.MAX_VALUE, but it seems if the product is too big for integer, value becomes negative.
I wanted to detect that the product is too big by the change in sign, but it seems that if the product is "way too big" it becomes positive again.
Could someone advise me how to detect if number becomes too big?
Many thanks in advance!
If you are doing a UI, you are presumably in no particular hurry. So you could use a BigInteger and then test the product against MAX_VALUE.
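A sketch of that approach (the method name is my own; it assumes the two inputs are positive, as the question states):

```java
import java.math.BigInteger;

public class BigProductCheck {
    static boolean productFitsInt(int a, int b) {
        BigInteger product = BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        // for positive inputs, the product fits iff it is <= Integer.MAX_VALUE
        return product.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(productFitsInt(46340, 46340)); // true:  2147395600 fits
        System.out.println(productFitsInt(46341, 46341)); // false: 2147488281 overflows
    }
}
```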
Cast the value to int and see if the value is the same. A simple check looks like:
double d = ...;      // some double value
long l = ...;        // some long value
BigInteger bi = ...; // some BigInteger value

if (d == (int) d)  // can be represented as an int
if (l == (int) l)  // can be represented as an int

int i = bi.intValue();
if (bi.equals(BigInteger.valueOf(i)))  // can be represented as an int
If the value is the same when cast back, there is no loss of information and you can use an int value.
Searched and found the following:
Java is cavalier about overflow. There are no compile-time warnings or run-time exceptions to let you know when your calculations have become too big to store back in an int or long. There is no warning for float or double overflow either.
/**
 * Multiplies the two parameters, throwing a MyOverflowException if the result is not an int.
 * @param a multiplier
 * @param b multiplicand
 * @return product
 */
public static int multSafe(int a, int b) throws MyOverflowException
{
long result = (long)a * (long)b;
int desiredhibits = - ((int)( result >>> 31 ) & 1);
int actualhibits = (int)( result >>> 32 );
if ( desiredhibits == actualhibits )
{
return (int) result;
}
else
{
throw new MyOverflowException( a + " * " + b + " = " + result );
}
}
You could create a BigInteger from your input value and use its intValue() method to convert. If the BigInteger is too big to fit in an int, only the low-order 32 bits are returned. So you need to compare the resulting value to your input value to ensure it was not truncated.
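The same truncation check also works with a long intermediate, and on Java 8+ the standard library's Math.multiplyExact does the detection for you. A brief sketch (the method name fitsInt is my own):

```java
public class IntProduct {
    // widen to long, multiply, and see if the result survives a cast back to int
    static boolean fitsInt(int a, int b) {
        long p = (long) a * b;
        return p == (int) p;
    }

    public static void main(String[] args) {
        System.out.println(fitsInt(100000, 100000)); // false: 10^10 overflows int
        System.out.println(fitsInt(46340, 46340));   // true:  2147395600 fits
        try {
            Math.multiplyExact(100000, 100000);      // Java 8+: throws on int overflow
        } catch (ArithmeticException ex) {
            System.out.println("overflow detected");
        }
    }
}
```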