Well, I am working on a big dataset, and after some calculations I am getting feature values like 4.4E-5. I read somewhere that such a value means 0.000044, that is, 4.4 times ten to the power minus 5. So my question is: whenever I want to use these values for further processing, will they behave the same as ordinary floats, or do I need some other data type?
Yes - the E notation is just a different way of writing a value of the same binary floating point data type.
Both 4.4E-5 and 0.000044 are the same value. And that value is only approximated in binary floating point, as a sum of powers of 2: 2^-15 + ...
Multiplying lots of small numbers leads to underflow. Take the logs and add them instead. This technique is ubiquitous in computer science. Many of the Google hits for "underflow log" are useful, including the Stack Overflow hits, other techniques for dealing with it, etc.
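A minimal sketch of the log trick described above (the class and method names here are illustrative, not from any library):

```java
public class LogDomainDemo {
    // Product of probabilities computed as a sum of logs: the log of the
    // product stays in a comfortable range even when the direct product
    // would underflow to 0.0.
    public static double logProduct(double[] probabilities) {
        double logSum = 0.0;
        for (double p : probabilities) {
            logSum += Math.log(p);   // multiply -> add in log space
        }
        return logSum;
    }

    public static void main(String[] args) {
        double[] probs = new double[500];
        java.util.Arrays.fill(probs, 1e-5);

        double direct = 1.0;
        for (double p : probs) direct *= p;   // 1e-2500 underflows to 0.0

        System.out.println(direct);
        System.out.println(logProduct(probs));  // 500 * ln(1e-5), still finite
    }
}
```

The direct product is exactly 0.0 here, while the log-domain version keeps full information about the magnitude.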
I am writing a basic neural network in Java, and I am writing the activation functions (currently I have just written the sigmoid function). I am trying to use doubles (as opposed to BigDecimal) in the hope that training will take a reasonable amount of time. However, I've noticed that the function doesn't work with larger inputs. Currently my function is:
public static double sigmoid(double t) {
    return 1 / (1 + Math.pow(Math.E, -t));
}
This function returns pretty precise values all the way down to t = -100, but for t >= 37 it returns exactly 1.0. In a typical neural network where the input is normalized, is this fine? Will a neuron ever get inputs summing to more than ~37? If the size of the summed input fed into the activation function varies from network to network, what are some of the factors that affect it? Also, is there any way to make this function more precise? Is there an alternative that is more precise and/or faster?
Yes, in a normalized network double is fine to use. But this depends on your input: if your input layer is bigger, the input sum will of course be bigger.
I encountered the same problem in C++: once t becomes big, the computation effectively ignores e^-t and returns plain 1, as it only calculates the 1/1 part. I tried dividing the already-normalized input by 1000 to 1000000, and it worked sometimes, but sometimes it did not, as I was using randomized input for the first epoch and my input layer was a 784x784 matrix. Nevertheless, if your input layer is small and your input is normalized, this will help you.
The surprising answer is that double actually gives you more precision than you need. This blog article by Pete Warden claims that even 8 bits are enough precision. And it's not just an academic idea: NVIDIA's new Pascal chips emphasize their single-precision performance above everything else, because that is what matters for deep-learning training.
You should be normalizing your input neuron values. If extreme values still happen, it is fine to clamp them to -1 or +1. In fact, this answer shows doing that explicitly. (The other answers on that question are also interesting - for example, the suggestion to just pre-calculate 100 or so values and not use Math.exp() or Math.pow() at all!)
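That pre-calculation idea could look something like the following sketch. The table size of 100 and the clamp range of ±10 are arbitrary choices for illustration, not from the linked answer:

```java
public class SigmoidTable {
    private static final int SIZE = 100;
    private static final double RANGE = 10.0;  // sigmoid is ~0 or ~1 outside [-10, 10]
    private static final double[] TABLE = new double[SIZE + 1];

    static {
        // Precompute sigmoid at SIZE+1 evenly spaced points in [-RANGE, RANGE].
        for (int i = 0; i <= SIZE; i++) {
            double t = -RANGE + 2.0 * RANGE * i / SIZE;
            TABLE[i] = 1.0 / (1.0 + Math.exp(-t));
        }
    }

    public static double sigmoid(double t) {
        if (t <= -RANGE) return 0.0;   // clamp the saturated tails
        if (t >=  RANGE) return 1.0;
        int i = (int) ((t + RANGE) * SIZE / (2.0 * RANGE));  // nearest lower table entry
        return TABLE[i];
    }
}
```

The table lookup trades a small amount of accuracy (here up to a few hundredths, set by the table spacing) for avoiding Math.exp() in the inner training loop; linear interpolation between entries would tighten it considerably.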
I was wondering how to replace common trigonometric values in an expression. To put this in context: I am making a calculator that needs to evaluate user inputs such as "sin(Math.PI)" or "sin(6 * Math.PI/2)". The problem is that floating point values aren't exact, and when I input sin(Math.PI) the calculator ends up with:
1.2245457991473532E-16
But I want it to return 0. I know I could try replacing sin(Math.PI) and other common expressions in the input with 0, 1, etc., except that I would have to check every multiple of Math.PI/2. Can anyone give me some guidance on how to return the proper values to the user?
You're running into the problem that a number like pi cannot be expressed exactly in a fixed number of bits, so with the available machine precision the computation gives you a small but non-zero number. Math.PI is in any case only an approximation of pi, which is an irrational number. To clean up your answer for display purposes, one possibility is rounding. You could also try adding 1 and then subtracting 1, which may well round the answer to zero.
This question here may help you further:
Java Strange Behavior with Sin and ToRadians
Your problem is that 1.2245457991473532E-16 is in fact zero for many purposes. What about simply rounding the result returned by sin? With suitable rounding you can achieve what you want, and even get 0.5, -0.5 and the other important sine values relatively easily.
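One way that rounding could be implemented (the choice of 10 decimal places is arbitrary; pick whatever suits your calculator's display):

```java
public class TrigRounding {
    // Snap the result of Math.sin to 10 decimal places, so that
    // near-zero floating point noise becomes exactly 0.0 and values
    // like 0.49999999999999994 become exactly 0.5.
    public static double roundedSin(double x) {
        double raw = Math.sin(x);
        return Math.round(raw * 1e10) / 1e10;
    }
}
```

With this, roundedSin(Math.PI) is 0.0 and roundedSin(Math.PI / 6) is 0.5, while ordinary inputs are unaffected to 10 places.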
If you really want to replace those functions, as your title suggests, you can't do that in Java itself. Your best bet would be to create an SPI specification for the common functions, with implementations that either fall back to the standard Java methods or use your own, replacing the Java ones.
Then users of your solution would retrieve one of the implementations via dependency injection or explicit references to a factory method.
I have to solve a problem in Java whose input consists of 10^100 digits.
How can I take such a large input and process it? I am using Java as my programming language.
Are all those digits actually significant? Or do you just have a value like 1.234567890123456789 * 10^100?
As others have noted, having 10^100 essential digits would mean you can stop now and write the problem off as uncomputable. You've either misunderstood it, or you shouldn't be approaching it via brute-force number crunching. Or both.
If you don't need all the lower-order digits, then floats or doubles may do the job for you. If you need more digits of precision than a double can handle (but still a REASONABLE number), an extended-precision floating point package such as BigFloat might get you there.
If you told us what you were actually trying to do, we could tell you more about whether there's any reasonable way to do it.
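For instance, if the value really is of the form 1.234...×10^100 rather than 10^100 literal digits, Java's BigDecimal stores it compactly as (unscaled digits, scale) and supports exact arithmetic on it. A sketch:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class HugeValueDemo {
    public static void main(String[] args) {
        // ~19 significant digits at a magnitude of 10^100: cheap to store.
        BigDecimal huge = new BigDecimal("1.234567890123456789E+100");

        // Exact arithmetic works regardless of magnitude.
        BigDecimal doubled = huge.multiply(BigDecimal.valueOf(2));

        // Round only for display.
        System.out.println(doubled.round(new MathContext(5)));
    }
}
```

A double could also hold a magnitude of 10^100 (its range runs to about 1.8×10^308), but with only ~15-16 significant decimal digits.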
I have something similar to a spreadsheet column in mind. A spreadsheet column has transparent data typing: text or any kinds of numbers.
But no matter how the typing is implemented internally, they allow round-off-safe operations: e.g. adding up a column of hundreds of numbers with decimal points, and other arithmetic operations. And they do it efficiently, too.
What way of handling numbers can make them:
transparent to the user
round-off safe
support efficient arithmetic, aggregation, sorting
handled by datastores and applications with Java primitive types?
I have in mind using a 64-bit long that is internally multiplied by 1000 to provide 3 decimal places. For example, 123.456 is internally stored as 123456, and 1 is stored as 1000. Reinventing floating point numbers seems clunky, though; I would have to reinvent multiplication, for example.
Miscellany: I actually have in mind a document tagging system. A number tag is conceptually similar to a spreadsheet column that is used to store numbers.
I do want to know how spreadsheets handle it, and I would have titled the question as such.
I am using two datastores that use Java primitive types. Point #4 wasn't hypothetical.
Unless you really need to use primitives, BigDecimal should handle that for you.
Excel uses double precision floats internally, then rounds the display portion in each cell according to the formatting options. It uses the double values for any calculations (unless the Precision as Displayed option is enabled - in which case it uses the rounded displayed value) and then rounds the result when displayed.
You could certainly use a long normalized to the max number of decimals you want to support - but then you're stuck with fixed-precision. That may or may not be acceptable. If you can use BigDecimal, that could work - but I don't think that qualifies as a Java primitive type.
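A minimal sketch of that scaled-long scheme, with the 3 decimal places the question describes (class and method names are made up; BigDecimal is used only at the conversion boundary):

```java
public class Fixed3 {
    public static final long SCALE = 1000;  // 3 fixed decimal places

    // "123.456" -> 123456; throws if the input has more than 3 decimals.
    public static long fromString(String s) {
        return new java.math.BigDecimal(s).movePointRight(3).longValueExact();
    }

    // Addition of scaled values is exact - this is the round-off safety
    // the question asks for when summing a column of numbers.
    public static long add(long a, long b) {
        return a + b;
    }

    // Multiplication must divide the scale factor back out; this truncates
    // below 0.001, and a*b can overflow for large operands (a real
    // implementation would guard with Math.multiplyHigh or BigDecimal).
    public static long mul(long a, long b) {
        return a * b / SCALE;
    }

    public static String toDisplay(long v) {
        return java.math.BigDecimal.valueOf(v).movePointLeft(3).toPlainString();
    }
}
```

Note how 0.1 + 0.2 comes out as exactly 0.300 here, where plain doubles give 0.30000000000000004 - the classic spreadsheet-column requirement.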
I am working on probabilistic models, and when doing inference on those models, the estimated probabilities can become very small. In order to avoid underflow, I am currently working in the log domain (I store the log of the probabilities). Multiplying probabilities is equivalent to an addition, and summing is done by using the formula:
log(exp(a) + exp(b)) = log(exp(a - m) + exp(b - m)) + m
where m = max(a, b).
I use some very large matrices, and I have to take the element-wise exponential of those matrices to compute matrix-vector multiplications. This step is quite expensive, and I was wondering if there exist other methods to deal with underflow, when working with probabilities.
Edit: for efficiency reasons, I am looking for a solution using primitive types and not objects storing arbitrary-precision representation of real numbers.
Edit 2: I am looking for a faster solution than the log domain trick, not a more accurate solution. I am happy with the accuracy I currently get, but I need a faster method. Particularly, summations happen during matrix-vector multiplications, and I would like to be able to use efficient BLAS methods.
Solution: after a discussion with Jonathan Dursi, I decided to factor each matrix and vector by its largest element, and to store that factor in the log domain. Multiplications are straightforward. Before additions, I have to rescale one of the operands by the ratio of the two factors. I update the factor every ten operations.
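The pairwise summation formula quoted in this question can be implemented directly; a sketch (the class name is made up):

```java
public class LogSum {
    // log(exp(a) + exp(b)) computed without overflow or underflow:
    // shifting both arguments by m = max(a, b) keeps them <= 0,
    // so exp never overflows, and the larger term contributes exp(0) = 1.
    public static double logSumExp(double a, double b) {
        double m = Math.max(a, b);
        if (m == Double.NEGATIVE_INFINITY) return m;  // both probabilities are zero
        return Math.log(Math.exp(a - m) + Math.exp(b - m)) + m;
    }
}
```

This returns sensible values even for log probabilities like -1000, where exp(a) alone would underflow to zero.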
This issue has come up recently on the Computational Science Stack Exchange as well; although the immediate worry there was overflow, the issues are more or less the same.
Transforming into log space is certainly one reasonable approach. Whatever space you're in, there are a couple of methods you can use to do a large number of sums more accurately. Compensated summation approaches, most famously Kahan summation, keep both a running sum and what is effectively a remainder; they give you some of the advantages of higher-precision arithmetic without all of the cost (and use only primitive types). The remainder term also gives you some indication of how well you're doing.
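A sketch of Kahan summation as described above:

```java
public class KahanSum {
    public static double sum(double[] values) {
        double sum = 0.0;
        double c = 0.0;            // the "remainder": low-order bits lost so far
        for (double v : values) {
            double y = v - c;      // apply the correction to the incoming term
            double t = sum + y;    // big + small: low-order bits of y are lost...
            c = (t - sum) - y;     // ...and algebraically recovered here
            sum = t;
        }
        return sum;
    }
}
```

In a quick check, compensated summation of ten million copies of 0.1 stays far closer to the exact total of 1,000,000 than a naive accumulation loop does.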
In addition to improving the actual mechanics of your addition, changing the order in which you add your terms can make a big difference. Sorting your terms so that you sum from smallest to largest can help, because then you add terms of very different magnitude less often (which can cause significant roundoff problems); in some cases, doing log2(N) levels of pairwise sums can also be an improvement over a straight linear sum, depending on what your terms look like.
The usefulness of all these approaches depends a lot on the properties of your data. Arbitrary-precision math libraries, while enormously expensive in compute time (and possibly memory), have the advantage of being a fairly general solution.
I ran into a similar problem years ago. The solution was to develop an approximation of log(1+exp(-x)). The range of the approximation does not need to be all that large (x from 0 to 40 will more than suffice), and at least in my case the accuracy didn't need to be particularly high, either.
In your case, it looks like you need to compute log(1+exp(-x1)+exp(-x2)+...). Throw out those large negative values. For example, suppose a, b, and c are three log probabilities, with 0>a>b>c. You can ignore c if a-c>38. It's not going to contribute to your joint log probability at all, at least not if you are working with doubles.
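A sketch of that pruning applied to a multi-term log-sum; the cutoff of 38 comes from the answer above (exp(-38) is below double precision relative to the dominant term):

```java
public class PrunedLogSum {
    // log(exp(x1) + exp(x2) + ...) with terms far below the max skipped,
    // saving the expensive exp() calls that cannot affect the result.
    public static double logSumExp(double[] logs) {
        double m = Double.NEGATIVE_INFINITY;
        for (double x : logs) m = Math.max(m, x);
        if (m == Double.NEGATIVE_INFINITY) return m;  // all probabilities zero

        double s = 0.0;
        for (double x : logs) {
            if (m - x > 38) continue;  // negligible at double precision
            s += Math.exp(x - m);
        }
        return m + Math.log(s);
    }
}
```

Skipped or not, a term 38 below the max changes the sum by less than one unit in the last place, so the pruned result matches the full computation.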
Option 1: Commons Math - The Apache Commons Mathematics Library
Commons Math is a library of lightweight, self-contained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang.
Note: The API protects the constructors to force a factory pattern while naming the factory DfpField (rather than the somewhat more intuitive DfpFac or DfpFactory). So you have to use
new DfpField(numberOfDigits).newDfp(myNormalNumber)
to instantiate a Dfp, then you can call .multiply or whatever on this. I thought I'd mention this because it's a bit confusing.
Option 2: GNU Scientific Library or Boost C++ Libraries.
In these cases you should use JNI in order to call these native libraries.
Option 3: If you are free to use other programs and/or languages, you could consider using programs/languages for numerical computations such as Octave, Scilab, and similar.
Option 4: BigDecimal of Java.
Rather than storing values in logarithmic form, I think you'd probably be better off using the same concept as doubles, namely, floating-point representation. For example, you might store each value as two longs, one for sign-and-mantissa and one for the exponent. (Real floating-point has a carefully tuned design to support lots of edge cases and avoid wasting a single bit; but you probably don't need to worry so much about any of those, and can focus on designing it in a way that's simple to implement.)
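A toy sketch of that two-long idea: value = mantissa × 2^exponent. Keeping the mantissa below 2^31 means the product of two mantissas still fits in a long, and the 64-bit exponent never over- or underflows in practice. Only multiplication is shown; the name and layout are illustrative, not a real format:

```java
public class MiniFloat {
    public final long mantissa;   // signed significand
    public final long exponent;   // power of two

    public MiniFloat(long mantissa, long exponent) {
        // Normalize so |mantissa| < 2^31, shifting excess bits into the exponent.
        while (Math.abs(mantissa) >= (1L << 31)) {
            mantissa >>= 1;
            exponent++;
        }
        this.mantissa = mantissa;
        this.exponent = exponent;
    }

    public MiniFloat times(MiniFloat o) {
        // Both mantissas are < 2^31, so the product fits in a long.
        return new MiniFloat(mantissa * o.mantissa, exponent + o.exponent);
    }
}
```

Multiplying two thousand copies of one half this way leaves mantissa 1 and exponent -2000, where a plain double would have underflowed at roughly 2^-1074.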
I don't understand why this works, but this formula seems to work and is simpler:
c = a + log(1 + exp(b - a))
where c = log(exp(a) + exp(b)).
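The formula follows from factoring out the larger term: log(exp(a) + exp(b)) = log(exp(a) * (1 + exp(b - a))) = a + log(1 + exp(b - a)), and taking a >= b keeps exp(b - a) <= 1 so nothing overflows. A sketch using Math.log1p, which is slightly more accurate than log(1 + x) when x is small:

```java
public class LogSumExp2 {
    // log(exp(a) + exp(b)) via the factored form above.
    public static double logAdd(double a, double b) {
        if (a < b) {                      // ensure a is the larger argument
            double t = a; a = b; b = t;
        }
        if (a == Double.NEGATIVE_INFINITY) return a;  // both probabilities zero
        return a + Math.log1p(Math.exp(b - a));
    }
}
```

This is the same quantity as the max-based formula earlier in the thread, just with the max folded in by the swap.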