I want to run a particle simulation with periodic boundary conditions - for simplicity, let's assume a 1D simulation with a region of length 1.0. I could enforce these conditions using the following short snippet:
if (x > 1.0)
x -= 1.0;
if (x < 0.0)
x += 1.0;
but it feels "clumsy"¹ - especially when generalizing this to higher dimensions. I tried doing something like
x = x % 1.0;
which takes good care of the case x > 1.0 but doesn't do what I want for x < 0.0². A few examples of the output of the "modulus" version and the "manual" version to show the difference:
Value: 1.896440, mod: 0.896440, manual: 0.896440
Value: -0.449115, mod: -0.449115, manual: 0.550885
Value: 1.355568, mod: 0.355568, manual: 0.355568
Value: -0.421918, mod: -0.421918, manual: 0.578082
1) For my current application, the "clumsy" way is probably good enough. But in my pursuit of becoming a better programmer, I'd still like to learn if there's a "better" (or at least better-looking...) way to do this.
2) Yes, I've read this post, so I know why it doesn't do what I want, and that it's not supposed to. My question is not about this, but rather about what to do instead.
You can use % with this slight modification: x = (x + 1.0) % 1.0
The best approach is probably to subtract the floor of the value from the value itself. Computing a floating-point remainder in compliance with IEEE standards is rather complicated, and unless one is on a system which can detect and accelerate "easy cases", especially those where the divisor is a power of two, a pair of if statements is apt to be faster.
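A minimal sketch of that floor-based wrap in Java, generalized to a region of length L (the helper name and the length parameter are illustrative, not from the answer above):

// Wrap x into [0, L); for the unit-length case this reduces to x -= Math.floor(x).
// Note: for a tiny negative x the result can round up to exactly L.
static double wrap(double x, double L) {
    return x - Math.floor(x / L) * L;
}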
It might be interesting, though, to consider why fmod was designed the annoying way it was: if fmod were designed to return a value between 0 and the divisor, then the precision of the result when the dividend is a very small positive number would be much better than the precision when the dividend is a very small negative number (the precision would be limited to that of the divisor). The advantages of having fmod's precision be relatively symmetric about zero probably outweigh the advantages of having the results be non-negative, but that doesn't imply the IEEE approach is the only good way to design a range-limiting function.
An alternative approach which would combine the advantages of the IEEE's approach and the zero-to-divisor approach would be to specify that a mod function must yield a result whose numerical value was (for positive d) less than d/2, but no less than -d/2. Such a definition would always yield a result that was representable in the source operands' type (if d is a very small value such that d/2 is not precisely representable, the range of the modulus would still be symmetrical). Unfortunately, I know of no library mod functions that work this way.
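A minimal sketch of such a symmetric range reduction in Java (a hypothetical helper, not a standard library function), assuming d > 0:

// Returns a value congruent to x mod d, lying in [-d/2, d/2).
static double symmetricMod(double x, double d) {
    double r = x % d;          // Java's % keeps the sign of x, with |r| < d
    if (r >= d / 2) {
        r -= d;
    } else if (r < -d / 2) {
        r += d;
    }
    return r;
}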
I want to multiply or divide big numbers (BigIntegers) by 1.6.
Currently I'm using:
Division:
BigDecimal dez = new BigDecimal("1.6");
BigDecimal bigd = new BigDecimal(big);
return bigd.divide(dez,10000, RoundingMode.FLOOR).toBigInteger();
Multiplication:
BigDecimal dez = new BigDecimal("1.6");
BigDecimal bigd = new BigDecimal(big);
return bigd.multiply(dez).toBigInteger();
Usually my numbers are 1000 <= x <= 10000 bytes. Caching doesn't help because the numbers don't repeat.
Is there a bit hack to do this faster?
For dividing by 1.6, absolutely there's a bit hack. Mathematically, x ÷ 1.6 = x ÷ 2 + x ÷ 8
So you can write this as
myBigInteger.shiftRight(1).add(myBigInteger.shiftRight(3))
There is no similar hack for multiplying by 1.6.
You will have to worry about rounding errors, of course, as bits disappear off the right hand end of the number. You'll have to think about what to do with these.
In particular, this expression is "off by one" if the last three bits of the original number are 101 or 111. So a better expression would be
myBigInteger.shiftRight(1).add(myBigInteger.shiftRight(3)).add(myBigInteger.testBit(0) && myBigInteger.testBit(2) ? BigInteger.ONE : BigInteger.ZERO)
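For example, a quick sanity check of the corrected expression against plain long arithmetic (illustrative only; assumes non-negative inputs):

for (long v = 0; v < 100; v++) {
    BigInteger x = BigInteger.valueOf(v);
    BigInteger approx = x.shiftRight(1).add(x.shiftRight(3))
            .add(x.testBit(0) && x.testBit(2) ? BigInteger.ONE : BigInteger.ZERO);
    long expected = (v * 5) / 8;   // floor(v / 1.6), since 1/1.6 == 5/8
    if (approx.longValueExact() != expected) {
        throw new AssertionError(v + ": got " + approx + ", expected " + expected);
    }
}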
Multiplication by 1.6 is equivalent to multiplication by ¹⁶⁄₁₀ or ⁸⁄₅, and division by 1.6 is equivalent to multiplication by ⁵⁄₈ or ¹⁰⁄₁₆
So to multiply you can do either of these:
big.shiftLeft(4).divide(BigInteger.TEN);
big.shiftLeft(3).divide(BigInteger.valueOf(5));
No need to use BigDecimal.
Since the divisor is constant you can even convert the division into a multiplication by the inverse, although it might not be worth it in this case.
Similarly, division can be done like this:
big.multiply(BigInteger.TEN).shiftRight(4);
big.multiply(BigInteger.valueOf(5)).shiftRight(3);
If the value is negative you'll need a small change though, because right shift rounds down (towards negative infinity) instead of rounding towards zero
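A minimal sketch of that adjustment, assuming the desired behaviour is truncation towards zero (the helper name is illustrative):

// Divide big by 1.6 (multiply by 10, divide by 16), truncating towards zero.
static BigInteger divideBy1Point6(BigInteger big) {
    BigInteger scaled = big.multiply(BigInteger.TEN);
    BigInteger result = scaled.shiftRight(4);              // rounds towards negative infinity
    // For negative, inexact divisions, shiftRight gave one less than truncation;
    // add one back to emulate rounding towards zero.
    if (big.signum() < 0 && scaled.getLowestSetBit() < 4) {
        result = result.add(BigInteger.ONE);
    }
    return result;
}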
But if you need to do this millions of times then Java is probably the wrong choice. There are many excellent native big-integer libraries; just call them via JNI. They're fast because they're native, and some can use SIMD, so they'll be significantly faster. For example y-cruncher can use AVX-512 for multiplication. See also Fast Multiple-Precision Integer Division Using Intel AVX-512
If you have multiple independent BigIntegers then you can also process them in parallel using multiple threads
I know that in Java (and probably other languages), Math.pow is defined on doubles and returns a double. I'm wondering why on earth the folks who wrote Java didn't also write an int-returning pow(int, int) method, which seems to this mathematician-turned-novice-programmer like a forehead-slapping (though obviously easily fixable) omission. I can't help but think that there's some behind-the-scenes reason based on the intricacies of CS that I just don't know, because otherwise... huh?
On a similar topic, ceil and floor by definition return integers, so how come they don't return ints?
Thanks to all for helping me understand this. It's totally minor, but has been bugging me for years.
java.lang.Math is just a port of what the C math library does.
For C, I think it comes down to the fact that CPUs have special instructions to do Math.pow for floating-point numbers (but not for integers).
Of course, the language could still add an int implementation. BigInteger has one, in fact. It makes sense there, too, because pow tends to result in rather big numbers.
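For example, the BigInteger version is exact and never overflows (a small illustrative snippet):

BigInteger big = BigInteger.valueOf(2).pow(100);
System.out.println(big);   // 1267650600228229401496703205376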
ceil and floor by definition return integers, so how come they don't return ints
Floating point numbers can represent integers outside of the range of int. So if you take a double argument that is too big to fit into an int, there is no good way for floor to deal with it.
From a mathematical perspective, you're going to overflow your int if it's larger than 2^31 - 1, and overflow your long if it's larger than 2^63 - 1. It doesn't take much to overflow it, either.
Doubles are nice in that they can represent numbers from ~10^-308 to ~10^308 with 53 bits of precision. There may be some fringe conversion issues (such as the next full integer in a double may not exactly be representable), but by and large you're going to get a much larger range of numbers than you would if you strictly dealt with integers or longs.
On a similar topic, ceil and floor by definition return integers, so how come they don't return ints?
For the same reason outlined above - overflow. If I have an integral value that's larger than what I can represent in a long, I'd have to use something that could represent it. A similar thing occurs when I have an integral value that's smaller than what I can represent in a long.
Optimal implementations of integer pow() and floating-point pow() are very different, and C's math library was probably developed around the time when floating-point coprocessors were a consideration. The optimal floating-point implementation is to shift the number closer to 1 (to force quicker convergence of the power series) and then shift the result back. For an integer power, a more accurate result can be had in O(log(p)) time by doing something like this:
// Exponentiation by squaring: O(log p) multiplications.
// p is a positive integer power set somewhere above, n is the number to raise to power p
int result = 1;
while (p != 0) {
    if ((p & 1) != 0) {     // lowest bit of p is set: multiply the current factor in
        result *= n;
    }
    n = n * n;              // square n for the next bit
    p = p >> 1;
}
Because all ints can be upcast to a double without loss, and the pow function on a double is no less efficient than that on an int.
The reason lies in the implementation of Math.pow() (a JNI call in the default implementation). The CPU has exponentiation support that works with doubles as input and output. Why should Java convert that for you when you have much better control over this yourself?
For floor and ceil the reasons are the same, but note that:
(int) Math.floor(d) == (int) d; // d > 0
(int) Math.ceil(d) == -(int)(-d); // d < 0
This holds for most cases (no guarantee at or beyond Integer.MAX_VALUE or Integer.MIN_VALUE).
Java leaves you with
(int) Math.pow(a,b)
because the result of Math.pow may even be NaN or Infinity depending on input.
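A small illustration of why that cast can surprise you: narrowing casts send NaN to 0 and saturate infinities (outputs are what a typical JVM prints):

System.out.println((int) Math.pow(-1, 0.5));   // NaN       -> 0
System.out.println((int) Math.pow(10, 400));   // +Infinity -> 2147483647 (Integer.MAX_VALUE)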
I am confused about using the expm1 function in Java.
The Oracle Javadoc for Math.expm1 says:
Returns exp(x) - 1. Note that for values of x near 0, the exact sum of
expm1(x) + 1 is much closer to the true result of e^x than exp(x).
but this page says:
However, for negative values of x, roughly -4 and lower, the algorithm
used to calculate Math.exp() is relatively ill-behaved and subject to
round-off error. It's more accurate to calculate e^x - 1 with a
different algorithm and then add 1 to the final result.
should we use expm1(x) for negative x values or near 0 values?
The implementation of double at the bit level means that you can store doubles near 0 with much more precision than doubles near 1. That's why expm1 can give you much more accuracy for near-zero powers than exp can, because double doesn't have enough precision to store very accurate numbers very close to 1.
I don't believe the article you're citing is correct, as far as the accuracy of Math.exp goes (modulo the limitations of double). The Math.exp specification guarantees that the result is within 1 ulp of the exact value, which means -- to oversimplify a bit -- a relative error of at most 2^-52, ish.
You use expm1(x) for anything close to 0. Positive or negative.
The reason is because exp(x) of anything close to 0 will be very close to 1. Therefore exp(x) - 1 will suffer from destructive cancellation when x is close to 0.
expm1(x) is properly optimized to avoid this destructive cancellation.
From the mathematical side: If exp is implemented using its Taylor Series, then expm1(x) can be done by simply omitting the first +1.
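A small demonstration of the cancellation (the printed values are approximate and may differ in the last digits):

double x = 1e-10;
System.out.println(Math.exp(x) - 1);   // ~1.00000008274037E-10  (cancellation: only ~8 accurate digits)
System.out.println(Math.expm1(x));     // ~1.00000000005E-10     (close to the true x + x^2/2 + ...)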
double x = 1;
double y = 3 * (1.0 / 3);
x == y
In a PowerPoint I am studying, it said the statement is logically questionable. I cannot figure out why. I mean, you use == for primitives, correct? Or is it logically questionable because doubles are not stored exactly, or am I missing something obvious? Thanks
I think you've got it: since the data types are doubles, rather than int or Integer, the resulting x and y may not be precisely equal.
It is logically questionable because there is no guarantee that the comparison at the end evaluates to true. Doubles are stored as a sum of powers of two, so values like 1/2, 1/4 and 1/8 can be expressed exactly in floating-point format, but not 1/3. It gets approximated by 1/4 + 1/16 + 1/64 + ..., and no finite approximation of that kind is exactly 1/3.
The correct way to compare floats for (approximate) equality is like this:
Math.abs(x - y) < tol
where tol is set to something sufficiently small, depending on your application. For example most graphics applications work well with tol = 0.00001
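Applied to the question's snippet, a minimal sketch (the tolerance 1e-9 is just an illustrative choice):

double x = 1;
double y = 3 * (1.0 / 3);
double tol = 1e-9;
System.out.println(Math.abs(x - y) < tol);   // true: equal to within the tolerance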
Because 1.0 / 3 is 0.3333..., up to the capacity of a double. 3 * 0.3333... is 0.9999..., up to the capacity of a double.
So we have the question 1 == 0.9999..., which I guess you could call "logically questionable".
It's because of roundoff error. The problem is analogous to the precision problems you have with decimals when dealing with numbers that cannot be precisely expressed in the format you are using.
For example, with six digits of decimal precision, the best you can do for 1/3 is .333333. But:
1/3 + 1/3 + 1/3 -> .333333 + .333333 + .333333 = .999999 != 1.000000
Ouch. For 2/3, you can use either .666666 or .666667 but either way, you have problems.
If 2/3 -> .666666 then:
2/3 + 1/3 -> .333333 + .666666 != 1.000000
Ouch.
And if 2/3. -> .666667 then:
1/3 * 2 - 2/3 -> .333333 * 2.00000 - .666667 = .666666 - .666667 != 0
Ouch.
It's analogous with doubles. This paper is considered the authoritative work on the subject. In simple terms -- never compare floating point numbers for equality unless you know exactly what you're doing.
Logically questionable because doubles are not stored exactly
More or less.
As a general rule, a double cannot represent the precise value of a Real number. The value 1/3 is an example.
Some numbers can be represented precisely as double values. 1.0 and 3.0 are examples. However, the division operator produces a number (in this case) that cannot be represented.
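For instance, printing the exact value a double actually holds (via the BigDecimal(double) constructor) makes this visible; the digits shown here are truncated:

System.out.println(new BigDecimal(3.0));       // 3 (exactly representable)
System.out.println(new BigDecimal(1.0 / 3));   // 0.3333333333333333148296162562... (not exactly 1/3)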
In general, any code that uses == to compare double or float values is questionable ... in the sense that you need to analyse each case carefully to know whether the use of == is correct. (And the analysis is not intuitive for people who were taught at a very early age to do arithmetic in base 10!)
From the software engineering standpoint, the fact that you need to do a case-by-case analysis of == usage makes it questionable practice. Good software engineering is (in part) about eliminating mistakes and sources of mistakes.
I have cells whose numeric value can be anything between 0 and Integer.MAX_VALUE. I would like to color-code these cells correspondingly.
If the value = 0, then r = 0. If the value is Integer.MAX_VALUE, then r = 255. But what about the values in between?
I'm thinking I need a function whose limit as x => Integer.MAX_VALUE is 255. What is this function? Or is there a better way to do this?
I could just do (value / (Integer.MAX_VALUE / 255)) but that will cause many low values to be zero. So perhaps I should do it with a log function.
Most of my values will be in the range [0, 10,000]. So I want to highlight the differences there.
The "fairest" linear scaling is actually done like this:
floor(256 * value / (Integer.MAX_VALUE + 1))
Note that this is just pseudocode and assumes floating-point calculations.
If we assume that Integer.MAX_VALUE + 1 is 2^31, and that / will give us integer division, then it simplifies to
value / 8388608
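In Java that might look like the following sketch (assuming value is a non-negative int):

int r = value / 8388608;                        // 8388608 = 2^31 / 256
// or, without the precomputed constant:
int r2 = (int) ((256L * value) / (1L << 31));   // same result for value >= 0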
Why other answers are wrong
Some answers (as well as the question itself) suggested a variation of (255 * value / Integer.MAX_VALUE). Presumably this has to be converted to an integer, either using round() or floor().
If using floor(), the only value that produces 255 is Integer.MAX_VALUE itself. This distribution is uneven.
If using round(), 0 and 255 will each get hit half as many times as 1-254. Also uneven.
Using the scaling method I mention above, no such problem occurs.
Non-linear methods
If you want to use logs, try this:
255 * log(value + 1) / log(Integer.MAX_VALUE + 1)
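As a Java sketch of that formula (Integer.MAX_VALUE + 1 is written as 2147483648.0 to avoid int overflow):

int r = (int) (255 * Math.log(value + 1.0) / Math.log(2147483648.0));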
You could also just take the square root of the value (this wouldn't go all the way to 255, but you could scale it up if you wanted to).
I figured a log fit would be good for this, but looking at the results, I'm not so sure.
However, Wolfram|Alpha is great for experimenting with this sort of thing:
I started with that, and ended up with:
r(x) = floor(((11.5553 * log(14.4266 * (x + 1.0))) - 30.8419) / 0.9687)
Interestingly, it turns out that this gives nearly identical results to Artelius's answer of:
r(x) = floor(255 * log(x + 1) / log(2^31 + 1))
IMHO, you'd be best served with a split function for 0-10000 and 10000-2^31.
For a linear mapping of the range 0-2^32 to 0-255, just take the high-order byte. Here is how that would look using binary & and bit-shifting:
r = (value & 0xff000000) >>> 24
Using mod 256 will certainly return a value 0-255, but you won't be able to draw any grouping sense from the results - 1, 257, 513, 1025 will all map to the scaled value 1, even though they are far from each other.
If you want to be more discriminating among low values, and merge many more large values together, then a log expression will work:
r = log(value)/log(pow(2,32))*256
EDIT: Yikes, my high school algebra teacher Mrs. Buckenmeyer would faint! log(pow(2,32)) is the same as 32*log(2), and much cheaper to evaluate. And now we can also factor this better, since 256/32 is a nice even 8:
r = 8 * log(value)/log(2)
log(value)/log(2) is actually log-base-2 of value, which log does for us very neatly:
r = 8 * log(value,2)
There, Mrs. Buckenmeyer - your efforts weren't entirely wasted!
In general (since it's not clear to me if this is a Java or Language-Agnostic question) you would divide the value you have by Integer.MAX_VALUE, multiply by 255 and convert to an integer.
This works! r = value / 8421504;
8421504 is actually the 'magic' number, which equals MAX_VALUE/255. Thus, MAX_VALUE/8421504 = 255 (and some change, but small enough that integer math will get rid of it).
If you want one that doesn't have magic numbers in it, this should work (and with equal performance, since any good compiler will replace it with the actual value):
r= value/ (Integer.MAX_VALUE/255);
The nice part is, this will not require any floating-point values.
The value you're looking for is: r = 255 * (value / Integer.MAX_VALUE). So you'd have to turn this into a double, then cast back to an int.
Note that if you want "brighter and brighter", luminosity is not linear, so a straight mapping from value to color will not give a good result.
The Color class has a method to make a brighter color. Have a look at that.
The linear implementation is discussed in most of these answers, and Artelius' answer seems to be the best. But the best formula would depend on what you are trying to achieve and the distribution of your values. Without knowing that it is difficult to give an ideal answer.
But just to illustrate, any of these might be the best for you:
Linear distribution, each colour mapping onto a range which is 1/256th of the overall range.
Logarithmic distribution (skewed towards low values) which will highlight the differences in the lower magnitudes and diminish differences in the higher magnitudes
Reverse logarithmic distribution (skewed towards high values) which will highlight differences in the higher magnitudes and diminish differences in the lower magnitudes.
Normalised distribution of colour incidence, where each colour appears the same number of times as every other colour.
Again, you need to determine what you are trying to achieve & what the data will be used for. If you have been tasked to build this then I would strongly recommend you get this clarified to ensure that it is as useful as possible - and to avoid having to redevelop it later on.
Ask yourself the question, "What value should map to 128?"
If the answer is about a billion (I doubt that it is) then use linear.
If the answer is in the range of 10-100 thousand, then consider square root or log.
Another answer suggested this (I can't comment or vote yet). I agree.
r = log(value)/log(pow(2,32))*256
Here are a bunch of algorithms for scaling, normalizing, ranking, etc. numbers by using Extension Methods in C#, although you can adapt them to other languages:
http://www.redowlconsulting.com/Blog/post/2011/07/28/StatisticalTricksForLists.aspx
There are explanations and graphics that explain when you might want to use one method or another.
The best answer really depends on the behavior you want.
If you want each cell just to generally have a color different than the neighbor, go with what akf said in the second paragraph and use a modulo (x % 256).
If you want the color to have some bearing on the actual value (like "blue means smaller values" all the way to "red means huge values"), you would have to post something about your expected distribution of values. Since you worry about many low values being zero I might guess that you have lots of them, but that would only be a guess.
In this second scenario, you really want to distribute your likely responses into 256 "percentiles" and assign a color to each one (where an equal number of likely responses fall into each percentile).
If you are complaining that the low numbers are becoming zero, then you might want to normalize the values to 255 rather than the entire range of the values.
The formula would become:
currentValue / (max value of the set)
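As a sketch in Java (maxValueInSet is a hypothetical variable holding the largest value currently in the set, assumed positive; 10000 is just the question's typical upper bound):

int maxValueInSet = 10000;                       // illustrative
int r = (int) (255L * value / maxValueInSet);    // 255L avoids int overflow in the product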
I could just do (value / (Integer.MAX_VALUE / 255)) but that will cause many low values to be zero.
One approach you could take is to use the modulo operator (r = value%256;). Although this wouldn't ensure that Integer.MAX_VALUE turns out as 255, it would guarantee a number between 0 and 255. It would also allow for low numbers to be distributed across the 0-255 range.
EDIT:
Funnily, as I test this, Integer.MAX_VALUE % 256 does result in 255 (I had originally mistakenly tested against % 255, which yielded the wrong results). This seems like a pretty straightforward solution.