Is there any library or open-source function that approximates the area under a line described only by values sampled at irregular intervals?
ActionScript would be preferred, but Java might work fine as well.
You could use the as3mathlib math library. Here's the relevant class:
http://code.google.com/p/as3mathlib/source/browse/trunk/src/com/vizsage/as3mathlib/math/calc/Integral.as
It includes the most common integral approximation methods.
Edit for more explanation (based on comments below):
Use timestamp values for each date; only convert to anything else if you need to display it to the user, and do so at the very end.
Hopefully the differences between adjacent timestamps share a convenient greatest common divisor (GCD). In other words, hopefully each pair of adjacent timestamps differs by a whole number of days; if so, 1 day divides every gap and can serve as the step (the true GCD may be larger). If the gaps are not like this, you'll have to calculate the GCD on the fly.
Then, use the GCD value in combination with the delta between the first and last timestamps to determine n, the number of partitions. Then, in f (your function to be integrated), determine whether the passed x corresponds to a defined timestamp. If so, return the numeric_value associated with that timestamp. If not, interpolate between the numeric_values of the nearest two defined timestamps, and return that.
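A minimal Java sketch of the steps above, since Java was acceptable to the asker. It does not call as3mathlib's Integral class; it computes the GCD step, defines f with the lookup-or-interpolate rule, and applies the trapezoidal rule itself. The class and method names are hypothetical, and timestamps are assumed to be strictly increasing longs (e.g. milliseconds).

```java
import java.util.Arrays;

// Hypothetical helper: timestamps must be strictly increasing, values[i] belongs to timestamps[i].
class IrregularIntegral {
    private final long[] timestamps;
    private final double[] values;

    IrregularIntegral(long[] timestamps, double[] values) {
        if (timestamps.length != values.length || timestamps.length < 2) {
            throw new IllegalArgumentException("need at least two (timestamp, value) pairs");
        }
        this.timestamps = timestamps.clone();
        this.values = values.clone();
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD of all gaps between adjacent timestamps: the partition width.
    long commonStep() {
        long g = 0;
        for (int i = 1; i < timestamps.length; i++) {
            g = gcd(g, timestamps[i] - timestamps[i - 1]);
        }
        return g;
    }

    // f(x): the stored value if x is a known timestamp, otherwise linear
    // interpolation between the two nearest known timestamps.
    double f(long x) {
        int i = Arrays.binarySearch(timestamps, x);
        if (i >= 0) {
            return values[i];
        }
        int hi = -i - 1; // insertion point: index of the first timestamp greater than x
        int lo = hi - 1;
        if (lo < 0 || hi >= timestamps.length) {
            throw new IllegalArgumentException("x is outside the data range");
        }
        double t = (double) (x - timestamps[lo]) / (timestamps[hi] - timestamps[lo]);
        return values[lo] + t * (values[hi] - values[lo]);
    }

    // Trapezoidal rule with n = (last - first) / step partitions.
    double integrate() {
        long step = commonStep();
        long first = timestamps[0];
        long n = (timestamps[timestamps.length - 1] - first) / step;
        double sum = 0.5 * (f(first) + f(first + n * step));
        for (long k = 1; k < n; k++) {
            sum += f(first + k * step);
        }
        return sum * step;
    }

    public static void main(String[] args) {
        IrregularIntegral integral =
                new IrregularIntegral(new long[] {0, 2, 6}, new double[] {0, 2, 2});
        System.out.println(integral.integrate()); // triangle (2) + rectangle (8)
    }
}
```

Because f is piecewise linear and every data point lies on a multiple of the step, the trapezoidal rule here gives the same result as applying it directly to the raw points; the GCD partitioning mainly matters if you feed f into a library routine (such as Simpson's rule) that expects evenly spaced samples.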
First of all, I know what the Euclidean distance is and what it calculates between two vectors.
But my question is how to calculate the distance between two class objects, for example in Java or any other OOP language. I have read quite a lot about machine learning and have already written a classifier using libraries, etc., but I want to know how the Euclidean distance is calculated when I have, for example, this object:
class Object {
    String name;
    Color color;
    int price;
    int anotherProperty;
    double something;
    List<AnotherObject> another;
}
What I already know (if I am not wrong!) is that I have to convert this object to a vector/array representing its properties, or 'features' (as they are called in machine learning?).
But how can I do this? This is the one piece of the puzzle I am missing.
Do I have to collect all possible values of a property, convert each to a number and write it into the array/vector?
Example:
I guess the above object would be represented by a 6-dimensional array, or a smaller one, depending on which features are needed for the calculation.
Let's say color, name and price are the necessary features, with the following data:
color: green (let's say an enum with 5 possible values where green is the third one)
name: "foo" (I would not know how to convert this one; maybe by adding up the ASCII codes, 102 + 111 + 111 = 324?)
price: 14 (just take the integer?)
Would the array/vector then look like this?
[3,324,14]
And if I do this for every object of the same class, I am able to calculate the Euclidean distance. Am I right, did I misunderstand something, or is it completely wrong?
For each data type you need to choose an appropriate method of determining the distance. In many cases a data type may itself have to be treated as a vector.
For colour, for example, you could express the colour as an RGB value and then take the Euclidean distance (take the 3 differences, square them, sum and then square root). You might want to choose a different colour space than RGB (e.g., HSI). See here: Colour Difference.
Comparing two strings is easier: a common method is the Levenshtein distance. There is a method for it, getLevenshteinDistance, in the Apache Commons StringUtils class.
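A small Java sketch of the RGB case: unpack the three channels and take the Euclidean distance. It assumes each colour is a packed 0xRRGGBB int (the layout java.awt.Color.getRGB() uses, with alpha in the top byte, which the masks below ignore); the class name is hypothetical.

```java
// Hypothetical helper: Euclidean distance between two colours in RGB space.
class RgbDistance {
    static double distance(int rgb1, int rgb2) {
        // Extract each 8-bit channel; masking with 0xFF drops any alpha byte.
        int dr = ((rgb1 >> 16) & 0xFF) - ((rgb2 >> 16) & 0xFF);
        int dg = ((rgb1 >> 8) & 0xFF) - ((rgb2 >> 8) & 0xFF);
        int db = (rgb1 & 0xFF) - (rgb2 & 0xFF);
        // Take the 3 differences, square them, sum and then square root.
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    public static void main(String[] args) {
        System.out.println(distance(0xFF0000, 0x000000)); // red vs black: 255.0
    }
}
```

Switching to another colour space would only change how the three components are obtained; the distance formula stays the same.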
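If you would rather not add the Apache Commons dependency, the algorithm is short enough to write yourself. This is a sketch of the standard dynamic-programming Levenshtein distance (keeping only two rows of the table); the class name is hypothetical and it does not use StringUtils.

```java
// Hypothetical helper: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
class Levenshtein {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b's prefix: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a's prefix into an empty string: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```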
Numbers - just take the difference.
Every type will require some consideration of the best way either to generate a distance directly or to calculate a numeric value that can then be subtracted to give a "distance".
Once you have a vector of all of the "values" of all of the fields for each object, you can calculate the Euclidean distance (square the differences, sum and square root the sum).
In your case, if you have:
object 1: [3,324,14]
object 2: [5,123,10]
The Euclidean distance is:
sqrt( (3-5)^2 + (324-123)^2 + (14-10)^2 ) = sqrt(4 + 40401 + 16) = sqrt(40421) ≈ 201.05
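The same calculation as a small Java method, a sketch assuming every feature has already been converted to a double; the class name is hypothetical.

```java
// Hypothetical helper: Euclidean distance between two feature vectors.
class EuclideanDistance {
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // square the differences and sum them
        }
        return Math.sqrt(sum); // square root the sum
    }

    public static void main(String[] args) {
        double[] object1 = {3, 324, 14};
        double[] object2 = {5, 123, 10};
        System.out.println(distance(object1, object2)); // sqrt(40421), about 201.05
    }
}
```

Note how the name feature contributes 201 of the roughly 201.05 total, so with raw values like these it dominates the distance.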
But in the case of comparing strings, the Levenshtein algorithm gives you the distance directly without intermediate numbers for the fields.
Think about this problem as a statistics problem. Classify all the attributes into nominal, ordinal, and scale variables. Once you have done that, it is just a multidimensional distance problem.
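One way to turn that classification into code, as a sketch: the per-kind rules below are an assumption (a common convention, not something this answer prescribes). Nominal attributes contribute 0 when equal and 1 otherwise; ordinal attributes are passed as ranks and scale attributes as plain numbers, and both contribute their difference. Under this rule a colour enum would be nominal, so "green = 3" no longer implies green is closer to 4 than to 1. All names are hypothetical.

```java
import java.util.Objects;

// Hypothetical helper combining attribute kinds into one Euclidean distance.
class MixedDistance {
    enum Kind { NOMINAL, ORDINAL, SCALE }

    static double distance(Kind[] kinds, Object[] a, Object[] b) {
        double sum = 0;
        for (int i = 0; i < kinds.length; i++) {
            double d;
            if (kinds[i] == Kind.NOMINAL) {
                // Nominal values can only be "same" or "different".
                d = Objects.equals(a[i], b[i]) ? 0 : 1;
            } else {
                // Ordinal values must already be ranks (e.g. S=0, M=1, L=2);
                // scale values are used as they are.
                d = ((Number) a[i]).doubleValue() - ((Number) b[i]).doubleValue();
            }
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Kind[] kinds = {Kind.NOMINAL, Kind.NOMINAL, Kind.SCALE};
        Object[] x = {"GREEN", "foo", 14};
        Object[] y = {"RED", "bar", 10};
        System.out.println(distance(kinds, x, y)); // sqrt(1 + 1 + 16)
    }
}
```

Scale attributes are left unnormalised here, so a large-valued attribute can still dominate.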
I was wondering how to replace common trigonometric values in an expression. To give more context, I am making a calculator that needs to be able to evaluate user inputs such as "sin(Math.PI)" or "sin(6 * math.PI/2)". The problem is that floating-point values aren't exact, and when I input sin(Math.PI), the calculator ends up with:
1.2245457991473532E-16
But I want it to return 0. I know I could try replacing in the expression all sin(Math.PI) and other common expressions with 0, 1, etc., except I have to check all multiples of Math.PI/2. Can any of you give me some guidance on how to return the user the proper values?
You're running into the fact that a number like pi cannot be expressed exactly in a fixed number of bits, so at the available machine precision the computation gives you a small but non-zero number. Math.PI is in any case only an approximation of pi, which is an irrational number. To clean up your answer for display purposes, one possibility is to use rounding. You could also try adding 1 and then subtracting 1, which rounds away any value smaller than about 1.1E-16 (half the spacing of doubles near 1.0); note that this is not quite enough for sin(Math.PI), which is about 1.22E-16.
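A sketch of rounding for display with BigDecimal; the 10 decimal places are an arbitrary choice and the class name is hypothetical. The last lines show why the "+1 then -1" trick does not clear this particular value.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for cleaning up results before showing them to the user.
class DisplayRounding {
    static double roundForDisplay(double x, int places) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            return x; // BigDecimal cannot represent NaN or infinity
        }
        return new BigDecimal(x).setScale(places, RoundingMode.HALF_EVEN).doubleValue();
    }

    public static void main(String[] args) {
        double raw = Math.sin(Math.PI);
        System.out.println(raw);                      // about 1.22E-16, not 0
        System.out.println(roundForDisplay(raw, 10)); // 0.0
        // "+1 then -1" only clears values below half the spacing of doubles near 1.0:
        System.out.println(Math.ulp(1.0) / 2);        // 1.1102230246251565E-16
        System.out.println((raw + 1.0) - 1.0);        // 2.220446049250313E-16, still not 0
    }
}
```

Keep the unrounded value for further calculations and round only what you display; otherwise rounding errors accumulate in chained expressions.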
This question here may help you further:
Java Strange Behavior with Sin and ToRadians
Your problem is that 1.2245457991473532E-16 is in fact zero for many purposes. What about simply rounding the result yielded by sin? With enough rounding, you may achieve what you want and even get 0.5, -0.5 and other important sin values relatively easily.
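A variation on rounding, sketched in Java: snap a result to the nearest multiple of 0.5 only when it is within a small tolerance, so 0, ±0.5 and ±1 come out exact while values like sin(PI/3) pass through untouched. The helper name and the tolerance are assumptions for illustration.

```java
// Hypothetical helper: snap x to the nearest multiple of 0.5 if it is very close to it.
class SnapSin {
    static double snapToHalf(double x, double tolerance) {
        double nearest = Math.rint(x * 2) / 2; // nearest multiple of 0.5
        // Adding 0.0 turns a -0.0 result into 0.0 so it prints as "0.0".
        return Math.abs(x - nearest) <= tolerance ? nearest + 0.0 : x;
    }

    public static void main(String[] args) {
        System.out.println(snapToHalf(Math.sin(Math.PI), 1e-12));     // 0.0
        System.out.println(snapToHalf(Math.sin(Math.PI / 6), 1e-12)); // 0.5
        System.out.println(snapToHalf(Math.sin(Math.PI / 3), 1e-12)); // unchanged
    }
}
```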
If you really want to replace those functions as your title suggests, then you can't do that in Java. Your best bet would be to create an SPI specification for common functions that could either fall back to the standard Java implementation or use your own implementation, which replaces the Java one.
Users of your solution would then obtain one of the implementations either through dependency injection or through explicit references to a factory method.
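A minimal sketch of that design, with every name hypothetical: an interface for the functions, a fallback that delegates to Math, a custom implementation that returns exact values at multiples of Math.PI/2 (the case from the question), and an explicit factory method. A full SPI would more typically discover implementations with java.util.ServiceLoader; a plain factory keeps this self-contained.

```java
// Hypothetical service interface: the calculator calls sin only through it.
interface TrigProvider {
    double sin(double x);
}

// Falls back to the standard Java implementation.
class StandardTrig implements TrigProvider {
    @Override
    public double sin(double x) {
        return Math.sin(x);
    }
}

// Own implementation: exact results at (near-)multiples of PI/2, Math.sin otherwise.
class ExactQuarterTurnTrig implements TrigProvider {
    private static final double[] SIN_AT_QUARTER_TURNS = {0.0, 1.0, 0.0, -1.0};
    private static final double TOLERANCE = 1e-12; // arbitrary choice for this sketch

    @Override
    public double sin(double x) {
        double quarterTurns = x / (Math.PI / 2);
        double nearest = Math.rint(quarterTurns);
        // Only snap for moderate inputs: for huge doubles every value looks "integral".
        if (Math.abs(quarterTurns) < 1e9 && Math.abs(quarterTurns - nearest) < TOLERANCE) {
            return SIN_AT_QUARTER_TURNS[(int) Math.floorMod((long) nearest, 4L)];
        }
        return Math.sin(x);
    }
}

final class TrigProviders {
    private TrigProviders() {}

    // Explicit factory method choosing the implementation.
    static TrigProvider create(boolean exactQuarterTurns) {
        return exactQuarterTurns ? new ExactQuarterTurnTrig() : new StandardTrig();
    }

    public static void main(String[] args) {
        TrigProvider trig = create(true);
        System.out.println(trig.sin(Math.PI));          // 0.0
        System.out.println(trig.sin(6 * Math.PI / 2)); // 0.0
        System.out.println(create(false).sin(Math.PI)); // Math.sin's tiny non-zero result
    }
}
```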
Let's say I have a double variable d. Is there a way to get the next or previous value that is representable on the CPU architecture?
As a trivial example, if the value was 10.1245125 and the precision of the architecture was fixed to 7 decimal places, then the next value would be 10.1245126 and the previous value would be 10.1245124.
Obviously on floating-point architectures this is not that simple. How would I be able to achieve this (in Java)?
Actually, an IEEE 754 floating-point architecture makes this easy: thanks to the standard, the function is called nextafter in nearly all languages that support it, and this uniformity allowed me to write an answer to your question with very little familiarity with Java:
The java.lang.Math.nextAfter(double start, double direction) returns the floating-point number adjacent to the first argument in the direction of the second argument.
Remember that -infinity and +infinity are floating-point values, and these values are convenient to give the direction (second argument). Do not make the common mistake of writing something like Math.nextAfter(x, x+1), which only works as long as 1 is greater than the ULP of x.
Anyone who writes the above probably means instead Math.nextAfter(x, Double.POSITIVE_INFINITY), which saves an addition and works for all values of x.
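A short Java demonstration of that pitfall. For x = 1e17 the gap between adjacent doubles is 16, so x + 1 rounds back to x; Math.nextAfter(x, x + 1) then has equal arguments and returns x unchanged, while the infinity direction works. The class name is hypothetical.

```java
// Hypothetical demo class comparing the two ways of asking for the next double.
class NextAfterDemo {
    public static void main(String[] args) {
        double x = 1e17;                                        // neighbouring doubles are 16 apart here
        double wrong = Math.nextAfter(x, x + 1);                // x + 1 == x, so this returns x itself
        double next = Math.nextAfter(x, Double.POSITIVE_INFINITY);
        double previous = Math.nextAfter(x, Double.NEGATIVE_INFINITY);
        System.out.println(wrong == x);    // true
        System.out.println(next - x);      // 16.0
        System.out.println(x - previous);  // 16.0
    }
}
```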
Math.nextUp and Math.nextDown can be used to get the next/previous element, which are equivalent to the proposed methods in the accepted answer, but more concise.
(this info was originally provided in a comment by @BjörnZurmaar)
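Applied to the value from the question, a sketch using Math.nextUp and Math.nextDown (the latter requires Java 8 or later). Math.ulp shows how far apart the neighbours really are; it is far smaller than the 7-decimal-place step in the question's analogy. The class name is hypothetical.

```java
// Hypothetical demo: neighbours of the question's example value.
class NextUpDownDemo {
    public static void main(String[] args) {
        double d = 10.1245125;
        double next = Math.nextUp(d);       // same as Math.nextAfter(d, Double.POSITIVE_INFINITY)
        double previous = Math.nextDown(d); // same as Math.nextAfter(d, Double.NEGATIVE_INFINITY)
        System.out.println(previous + " < " + d + " < " + next);
        System.out.println("spacing: " + Math.ulp(d)); // 2^-49 for values in [8, 16)
    }
}
```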
I have read the Point set registration page and would like to implement it for my simple line matching. However, I only have very basic maths knowledge and cannot really understand the equations on that page.
Assume I am able to extract points from 2 images, find the nearest pairs by brute-force looping, and get a list of pairs with their corresponding distances.
What is the next step to calculate a single index from the data obtained above?
My current idea is to simply average all the distances, but I believe there are better approaches. Or should I capture more data for the calculation?
Your instincts are almost correct.
Generally, the metric is the sum of squared distances, and the goal is to find the least-squares fit (the one that minimizes the sum of all the individual squared distances). Essentially this minimizes the standard deviation (strictly it minimizes the variance, but the end effect is the same).
So take all your corresponding pairs and calculate the squared distance for each (a fast calculation with no sqrt involved, so faster than calculating actual distances), then add them up: the lower the total, the better. If your point sets differ in count, you may wish to divide by the count to get a proper variance value.
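A sketch of that score in Java. The pair layout {x1, y1, x2, y2} (a point from image 1 and its matched point from image 2) is an assumption for illustration, as are the names.

```java
// Hypothetical helper: registration score from matched 2D point pairs.
class RegistrationScore {
    // pairs[i] = {x1, y1, x2, y2}
    static double sumOfSquaredDistances(double[][] pairs) {
        double sum = 0;
        for (double[] p : pairs) {
            double dx = p[0] - p[2];
            double dy = p[1] - p[3];
            sum += dx * dx + dy * dy; // squared distance: no sqrt needed
        }
        return sum;
    }

    // Dividing by the count makes scores comparable between pair lists of different sizes.
    static double meanSquaredDistance(double[][] pairs) {
        return pairs.length == 0 ? 0 : sumOfSquaredDistances(pairs) / pairs.length;
    }

    public static void main(String[] args) {
        double[][] pairs = {{0, 0, 3, 4}, {1, 1, 1, 2}};
        System.out.println(sumOfSquaredDistances(pairs)); // 25 + 1 = 26.0
        System.out.println(meanSquaredDistance(pairs));   // 13.0
    }
}
```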
This metric applies to pretty much any registration algorithm.
By the way, if you already have a point correspondence and you know there is no scaling/skewing, you might also be interested in Horn's method, a closed-form (non-iterative) algorithm that directly produces the least-squares fit. It's very efficient.
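Horn's method works in 3D using unit quaternions. For 2D image points the same closed-form least-squares idea reduces to a few sums, sketched below. This is the 2D analogue (rotation plus translation, no scaling), not Horn's algorithm itself, and the names are hypothetical: centre both point sets, take the rotation angle from the summed dot and cross products, then derive the translation from the centroids.

```java
// Hypothetical helper: closed-form 2D rigid least-squares fit.
class RigidFit2D {
    // Returns {theta, tx, ty} minimising sum |R(theta) p[i] + t - q[i]|^2,
    // where p[i] and q[i] are corresponding points {x, y}.
    static double[] fit(double[][] p, double[][] q) {
        int n = p.length;
        double pxm = 0, pym = 0, qxm = 0, qym = 0;
        for (int i = 0; i < n; i++) {
            pxm += p[i][0]; pym += p[i][1];
            qxm += q[i][0]; qym += q[i][1];
        }
        pxm /= n; pym /= n; qxm /= n; qym /= n; // centroids
        double dot = 0, cross = 0;
        for (int i = 0; i < n; i++) {
            double ax = p[i][0] - pxm, ay = p[i][1] - pym; // centred source point
            double bx = q[i][0] - qxm, by = q[i][1] - qym; // centred target point
            dot += ax * bx + ay * by;
            cross += ax * by - ay * bx;
        }
        double theta = Math.atan2(cross, dot); // best rotation angle
        double c = Math.cos(theta), s = Math.sin(theta);
        double tx = qxm - (c * pxm - s * pym); // t = centroid(q) - R * centroid(p)
        double ty = qym - (s * pxm + c * pym);
        return new double[] {theta, tx, ty};
    }

    public static void main(String[] args) {
        double[][] p = {{0, 0}, {1, 0}, {0, 1}};
        double[][] q = {{2, 3}, {2, 4}, {1, 3}}; // p rotated by 90 degrees, shifted by (2, 3)
        double[] r = fit(p, q);
        System.out.println(Math.toDegrees(r[0]) + " " + r[1] + " " + r[2]);
    }
}
```

After applying the fitted transform, the sum of squared distances described above gives the remaining misalignment.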
(P.S. For a very simple explanation of why the variance is a better indicator than the mean distance, check out this page).
I read in "Lucene in Action" that NumericRangeQuery is better than TermRangeQuery for handling date range queries, but I could not find the reason. I want to know the reason behind it.
I used both TermRangeQuery and NumericRangeQuery for date range queries and found that searching is faster with NumericRangeQuery.
My second point: to query with NumericRangeQuery I have to index with NumericField, which lets me index dates at millisecond resolution. But what if I want to reduce the resolution to an hour or a day?
Why is numeric so much faster than term?
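As you have noted, there is a "precision step". Each numeric value is indexed not only at full precision but also at several lower precisions (with more low-order bits dropped at each precision step), forming a trie. A range query can then cover most of the range with a few coarse terms and needs full-precision terms only near its edges, so it visits a (very) limited number of terms. According to the documentation, a range query rarely has to visit more than about 300 terms. Check out the Wikipedia article on tries if you are interested in the theory.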
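A conceptual Java sketch of the trie idea, not Lucene's API: for each shift (0, precisionStep, 2 * precisionStep, ...) keep the value's remaining high bits. Lucene's real encoding (in its NumericUtils class) differs in detail. The point is that, with a step of 4, all 16 values from 0x1230 to 0x123F share the coarse term 0x123, so a query covering that block can match one term instead of 16.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of trie-style terms for a non-negative long.
class TrieTermsSketch {
    static List<Long> prefixTerms(long value, int precisionStep) {
        List<Long> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(value >>> shift); // drop the lowest `shift` bits
        }
        return terms;
    }

    public static void main(String[] args) {
        // Terms at shifts 0 and 4 for two values in the same 16-value block:
        System.out.println(Long.toHexString(prefixTerms(0x1230L, 4).get(1))); // 123
        System.out.println(Long.toHexString(prefixTerms(0x123FL, 4).get(1))); // 123
    }
}
```

A smaller precision step produces more terms per value (a larger index) but fewer terms per query, which is the trade-off the precisionStep parameter controls.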
How can you reduce precision?
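The NumericField class takes a precisionStep parameter in its constructor. Note that NumericRangeQuery also takes a precisionStep parameter, and the two must be the same. That JavaDoc page links to a paper about the implementation that explains more about what the precision step means. Keep in mind that precisionStep only controls how many lower-precision terms are indexed per value; it does not truncate the values themselves, so to index dates at hour or day resolution you still need to round the value before indexing.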
The explanation by @Xodarap about NumericField is correct. Essentially, precision is dropped for the numbers to reduce the actual term space. Also, I suppose TermRangeQuery uses string comparison whereas NumericRangeQuery works with integers, which should squeeze out some more performance.
You can index at any desirable resolution - millisecond to day. Date.getTime() gives you milliseconds since epoch. You can divide this number by 1000 to get time with resolution at second. Or you can divide by 60,000 to get resolution at minute. And so on.
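A plain-Java sketch of that division (no Lucene calls; the names are hypothetical). It uses Math.floorDiv rather than / so that dates before 1970, whose getTime() is negative, also round down. Because the epoch is midnight UTC, "day" here means a UTC day, not a local calendar day. Whatever unit you index with, convert the query bounds with the same function.

```java
import java.util.Date;

// Hypothetical helper: reduce a Date to a coarser resolution before indexing.
class DateResolution {
    static final long MS_PER_SECOND = 1000L;
    static final long MS_PER_MINUTE = 60_000L;
    static final long MS_PER_HOUR = 3_600_000L;
    static final long MS_PER_DAY = 86_400_000L;

    // Every instant within the same unit maps to the same number.
    static long truncate(Date date, long unitMillis) {
        return Math.floorDiv(date.getTime(), unitMillis);
    }

    public static void main(String[] args) {
        Date d = new Date(90_061_001L); // 1 day, 1 h, 1 min, 1 s and 1 ms after the epoch
        System.out.println(truncate(d, MS_PER_SECOND)); // 90061
        System.out.println(truncate(d, MS_PER_MINUTE)); // 1501
        System.out.println(truncate(d, MS_PER_HOUR));   // 25
        System.out.println(truncate(d, MS_PER_DAY));    // 1
    }
}
```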