Given the polynomial regression coefficients produced by the code below, when I evaluate the fitted polynomial at any x point, the value obtained is far away from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so high, whether it can be minimized, or whether there is a flaw in my understanding? The current requirement is a difference of no more than 150 at every point.
import numpy as np
x=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y=[0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
z=np.polyfit(x,y,3)
print(z)
I have also tried various implementations available in Java, but the coefficient values are the same everywhere for this data. Please help me understand. For example:
0.019168 * N^3 - 5.540901 * N^2 + 579.846493 * N - 1119.339450
N = 5: value = 1643.76649, y value = 885
N = 10: value = 4144.20338, y value = 3517
N = 100: value = 20624.29985, y value = 20746
The polynomial fit performs as expected. There is no error here, just large scatter in your data. You might want to rescale your data, though. If you pass the parameter full=True to np.polyfit, you will receive additional information, including the residuals, which are essentially the sum of the squared fit errors. See this other SO post for more details.
import matplotlib.pyplot as plt
import numpy as np
x = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y = [0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
m = max(y)
y = [p/m for p in y] # rescaled y such that max(y)=1, and dimensionless
z, residuals, rank, sing_vals, cond_thres = np.polyfit(x,y,3,full=True)
print("Z: ",z) # [ 9.23914285e-07 -2.67082878e-04 2.79497972e-02 -5.39544708e-02]
print("resi:", residuals) # 0.02188 : quite decent, depending on WHAT you're measuring ..
Z = [z[3] + q*z[2] + q*q*z[1] + q*q*q*z[0] for q in x]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x,y)
ax.plot(x,Z,'r')
plt.show()
After reviewing the answer from @Magnus, I plotted the data together with a third-order polynomial fit over a reduced range. As the plot shows, the points at x = 25 and x = 30 (where y drops from 11897 to 10125) cannot both lie on a smooth curve together with the nearby data. While I could fit smooth curves such as a Hill sigmoidal equation through the data, the variance (noise) in the data itself appears to be the limiting factor in achieving a peak absolute error of 150 with this data set.
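To make the size of the gap concrete, here is a minimal Java sketch (Java, since the question mentions trying Java implementations) that evaluates the quoted cubic at every data point and reports the largest absolute error; the class and variable names are illustrative, and the coefficients are simply the ones quoted above:
public class FitErrorCheck {
    public static void main(String[] args) {
        int[] x = {0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
                   55, 60, 65, 70, 75, 80, 85, 90, 95, 100};
        int[] y = {0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925, 16837,
                   17535, 18017, 18285, 18328, 18914, 19432, 19879, 20249, 20539, 20746};
        // Coefficients of the cubic quoted in the question (highest degree first).
        double a3 = 0.019168, a2 = -5.540901, a1 = 579.846493, a0 = -1119.339450;
        double maxAbsError = 0;
        for (int i = 0; i < x.length; i++) {
            double n = x[i];
            double fitted = ((a3 * n + a2) * n + a1) * n + a0; // Horner evaluation
            maxAbsError = Math.max(maxAbsError, Math.abs(fitted - y[i]));
        }
        // At N = 5 alone the error is |1643.77 - 885|, roughly 759, so the
        // maximum error is far above the required 150.
        System.out.println("max |fitted - y| = " + maxAbsError);
    }
}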
Related
I have a set of quadratic equations, like
x² + x + y = 7
x + 3y = -3
y² + x² + z = 11
All coefficients are integers. The set of equations could have no solution, one solution, or a whole set of solutions.
Does anybody know a method to find out whether these equations have a solution?
My first idea was to solve these equations one by one in double precision and use the results to solve the other equations. The problem is rounding errors: if, in theory, I have two equations
x + y = 5
x = 5 - y
there would be plenty of solutions. But if my method results in
x + y = 4.999999
x = 5 - y
the system suddenly has no solution. In the next step, I could add epsilons to compensate for rounding errors, but I am not sure how large they should be. Any ideas or better approaches?
PS: The background is to find intersection points of a complicated set of circles and lines in the plane.
Since you have exact input with integers, you can use exact algorithms.
You could, for instance, compute a Groebner basis of the polynomials corresponding to your equations, e.g.
x² + x + y - 7
x + 3y + 3
y² + x² + z - 11
With lexicographic ordering of the terms you will get a Groebner basis in a kind of "triangular" form, where the first polynomial contains as few variables as possible, e.g.
81z² - 176z + 92
2y + 9z - 8
2x - 27z + 30
This gives you two real roots for z, and unique values for y and x once z is fixed. If the first polynomial of the computed basis does not contain any variable, then your set of equations has no solutions. If the first polynomial of the computed basis contains two variables, then you have an infinite number of solutions (possibly complex).
You can experiment with Groebner bases online with Wolfram Alpha (e.g. compute the basis for your example). A Groebner basis can be computed using the Buchberger algorithm, for which a few implementations are available in Java.
Note: the worst case complexity of the Buchberger algorithm is double exponential in the maximal total degree of the input, but in your application this might not matter.
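As a minimal sketch of how the triangular basis is used numerically (hard-coding the example basis above; a real implementation would compute the basis first), solve the quadratic for z and back-substitute into the two linear polynomials:
public class TriangularSolve {
    public static void main(String[] args) {
        // First basis polynomial: 81z^2 - 176z + 92 = 0
        double a = 81, b = -176, c = 92;
        double disc = b * b - 4 * a * c; // negative => no real solutions
        if (disc < 0) {
            System.out.println("no real solutions");
            return;
        }
        for (double sign : new double[] {1, -1}) {
            double z = (-b + sign * Math.sqrt(disc)) / (2 * a);
            double y = (8 - 9 * z) / 2;   // from 2y + 9z - 8 = 0
            double x = (27 * z - 30) / 2; // from 2x - 27z + 30 = 0
            System.out.printf("x = %.6f, y = %.6f, z = %.6f%n", x, y, z);
        }
    }
}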
I need some algorithm ideas here. I have a big set of data (~100k entries) of two-dimensional Points as objects, with a third variable as the "value" at that point of the plane. Now I want to calculate the interpolated value at some given coordinates.
My only idea so far would be to iterate over the whole dataset and collect a list of the 4 nearest Points to the one I want to calculate, and then to interpolate between them.
How do I interpolate between multiple data points, weighted by distance in this case, so that points nearer to the query point contribute more to the result?
Do you have any other ideas on how to get the interpolated value at a given coordinate in this case? It should be relatively exact (but does not have to be 100%) and should not take ages to calculate, since this has to be done some 10,000 times and I don't want the user to wait too long.
About the data: the Points are nearly in a grid, and the values of the two-dimensional Points are actually height values, so the values of two nearby points are also nearly equal.
The data is stored in a HashMap<Vector2f, Float>, where Vector2f is a simple class consisting of two floats. Nothing is sorted.
This is a non-obvious problem. It is a lot easier for 3 points; with 4 points you will get into situations which are ambiguous. Imagine (for points surrounding your sample in a square) the ul and br corners having value 1 and the other corners having value 5. This can be interpreted as a valley of height 1 going through the center, a ridge of height 5 going through it, or some kind of fancy spline saddle shape. If you take the irregularity of the grid into account, it becomes even more fun, with the 4 closest points possibly all lying on the same side of the sample (so you cannot just choose the 4 closest points; you need to find 4 bounding points).
For the 3-point case, please check https://en.wikipedia.org/wiki/Barycentric_coordinate_system#Application:_Interpolation_on_a_triangular_unstructured_grid (a sketch follows below).
For the 4-point case on a regular grid, you can use https://en.wikipedia.org/wiki/Bilinear_interpolation to generate a kind of 'fair' interpolation.
For 4 points on an irregular grid, you can look into a solution like
http://www.ahinson.com/algorithms_general/Sections/InterpolationRegression/InterpolationIrregularBilinear.pdf
but be careful with finding proper bounding points (as I mentioned, finding the closest ones won't work).
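For the 3-point case, here is a minimal Java sketch of barycentric interpolation on a triangle, following the formula on the linked Wikipedia page (the method name and float types are illustrative):
// Value at (x, y) inside the triangle with vertices (x1, y1), (x2, y2),
// (x3, y3) carrying the heights f1, f2, f3.
static float barycentric(float x, float y,
                         float x1, float y1, float f1,
                         float x2, float y2, float f2,
                         float x3, float y3, float f3) {
    float den = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3);
    float l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / den;
    float l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / den;
    float l3 = 1 - l1 - l2; // the three weights sum to 1
    // Inside the triangle all three weights lie in [0, 1].
    return l1 * f1 + l2 * f2 + l3 * f3;
}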
This looks like bilinear interpolation. There are a few general algorithms, and if you have constraints on your data you might be able to use them to simplify the algorithm. Since "the Points are nearly in a grid", I made the approximation that they are ("It should be relatively exact (but does not have to be 100%) and should not take ages to calculate").
Given
two floats x, y as the point you want to interpolate at,
four objects of type Vector2f named v1 to v4, assuming the coordinates of each can be accessed as v1.x and v1.y, and that they lie approximately on a grid:
v1.x == v3.x, v2.x == v4.x, v1.y == v2.y, v3.y == v4.y
and that the value of each Vector2f has been retrieved as float h1 = map.get(v1) (h stands for "height"), you could write something like this:
float n1 = h1 * (v2.x - x) * (v3.y - y); // largest when (x, y) is near v1
float n2 = h2 * (x - v1.x) * (v3.y - y); // largest near v2
float n3 = h3 * (v2.x - x) * (y - v1.y); // largest near v3
float n4 = h4 * (x - v1.x) * (y - v1.y); // largest near v4
float den = (v2.x - v1.x) * (v3.y - v1.y); // cell area; note v3.y - v1.y, since v1.y == v2.y
float height = (n1 + n2 + n3 + n4) / den;
As a side note, you might also want to look into making your class strictfp.
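For completeness, a hedged usage sketch: assuming the grid spacing is a (roughly) constant step, you could snap the query point down to the grid to obtain the four bounding vertices. Both step and the use of exact HashMap keys are assumptions about your data; since your points are only nearly on a grid, exact key lookups may miss, and you may need a tolerance search or a precomputed index instead.
float step = 1.0f; // assumed (approximate) grid spacing
float gx = (float) Math.floor(x / step) * step;
float gy = (float) Math.floor(y / step) * step;
Vector2f v1 = new Vector2f(gx, gy);               // one corner of the cell
Vector2f v2 = new Vector2f(gx + step, gy);        // same row as v1
Vector2f v3 = new Vector2f(gx, gy + step);        // same column as v1
Vector2f v4 = new Vector2f(gx + step, gy + step); // opposite corner
float h1 = map.get(v1), h2 = map.get(v2), h3 = map.get(v3), h4 = map.get(v4);
// ...then apply the interpolation snippet above.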
I have the following implemented in Java:
[1,1][1,2][1,3][1,4]
[2,1][2,2][ B ][2,4]
[ A ][3,2][3,3][3,4]
I want to be able to calculate the minimum distance between [ A ] and [ B ] without moving diagonally. I have searched online, but I'm not really sure how to word what I'm looking for. So far I have been able to calculate the diagonal distance using:
dMin = Math.min(dMin, Math.abs((xDistance - yDistance)));
Could someone please give me the name of an algorithm I could look up online? Any help is appreciated. Thanks for your time :)
Expected output is:
Distance = 3 //Not Distance = 2 (as it would be diagonally).
It's called Manhattan Distance and can easily be computed by:
distance = abs(ydistance) + abs(xdistance)
That is, the number of cells you must travel vertically, plus the number of cells you must travel horizontally, much like a taxi driving through a grid of city streets.
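Applied to the grid in the question, where A sits at [3,1] and B at [2,3], a two-line check reproduces the expected output:
int distance = Math.abs(3 - 2) + Math.abs(1 - 3); // A = [3,1], B = [2,3]
System.out.println("Distance = " + distance);     // prints: Distance = 3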
You want the absolute difference between the x values of the points, plus the absolute difference between the y values of the points.
ie:
dMin = Math.abs(A.x - B.x) + Math.abs(A.y - B.y)
This is known as Manhattan Distance.
You want the difference along the X axis plus the difference along the Y axis. Something like this:
int minDistance = Math.abs(A.x - B.x) + Math.abs(A.y - B.y);
I need to minimize a complex linear multivariable function under some constraints.
Let x be an array of complex numbers of length L.
a[0], a[1], ..., a[L-1] are complex coefficients and
F is the complex function F(x)= x[0]*a[0] + x[1]*a[1] + ... + x[L-1]*a[L-1] that has to be minimized.
b[0], b[1], ..., b[L-1] are complex coefficients and there is a constraint
1 = complexConjugate(x[0])*x[0] + complexConjugate(x[1])*x[1] + ... + complexConjugate(x[L-1])*x[L-1] that has to be fulfilled.
I have already had a detailed look at http://math.nist.gov/javanumerics/ and went through a lot of documentation, but I couldn't find a library that does minimization for complex functions.
You want to minimize a differentiable real-valued function f on a smooth hypersurface S. If such a minimum exists - in the situation after the edit it is guaranteed to exist because the hypersurface is compact - it occurs at a critical point of the restriction f|S of f to S.
The critical points of a differentiable function f defined in the ambient space restricted to a manifold M are those points where the gradient of f is orthogonal to the tangent space T(M) to the manifold. For the general case, read up on Lagrange multipliers.
In the case where the manifold is a hypersurface (it has real codimension 1) defined (locally) by an equation g(x) = 0 for a smooth function g, this is particularly easy to detect: the critical points of f|S are the points x on S where grad(f)|x is collinear with grad(g)|x.
Now the problem is actually a real (as in concerns the real numbers) problem and not a complex (as in concerning complex numbers) one.
Stripping off the unnecessary imaginary parts, we have:
the hypersurface S, which conveniently is the unit sphere, globally defined by (x|x) = 1, where (a|b) denotes the scalar product a_1*b_1 + ... + a_k*b_k; the gradient of g at x is just 2*x;
a real linear function L(x) = (c|x) = c_1*x_1 + ... + c_k*x_k; the gradient of L is c, independent of x.
So there are two critical points of L on the sphere (unless c = 0, in which case L is constant): the points where the line through the origin and c intersects the sphere, namely c/|c| and -c/|c|.
Obviously L(c/|c|) = (1/|c|)*(c|c) = |c| and L(-c/|c|) = -(1/|c|)*(c|c) = -|c|, so the minimum occurs at -c/|c|, and the value there is -|c|.
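A minimal sketch of this closed-form result, treating c as a plain real array (the method name is illustrative; c would be the coefficient vector after stripping the problem down to real components):
// Minimizes L(x) = (c|x) over the unit sphere (x|x) = 1.
// The minimizer is -c/|c| and the minimum value is -|c|.
static double[] minimizeOnSphere(double[] c) {
    double normSq = 0;
    for (double ci : c) normSq += ci * ci;
    if (normSq == 0) throw new IllegalArgumentException("L is constant for c = 0");
    double norm = Math.sqrt(normSq);
    double[] x = new double[c.length];
    for (int i = 0; i < c.length; i++) x[i] = -c[i] / norm;
    return x; // L at this point equals -norm
}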
Each complex variable x can be considered as two real variables, representing the real and imaginary part, respectively, of x.
My recommendation is that you reformulate your objective function and constraint using the real and imaginary parts of each variable or coefficient as independent components.
According to the comments, you only intend to optimize the real part of the objective function, so you end up with a single real-valued objective function subject to optimization.
The constraint can be split into two, where the "real" constraint should equal 1 and the "imaginary" constraint should equal 0.
After having reformulated the optimization problem this way, you should be able to apply any optimization algorithm that is applicable to the reformulated problem. For example, there is a decent set of optimizers in the Apache Commons Math library, and the SuanShu library also contains some optimization algorithms.
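A minimal sketch of the reformulation itself, assuming each complex vector is stored as two parallel double arrays of real and imaginary parts (the array layout and method names are illustrative):
// Re F(x) = sum_i Re(x[i]*a[i]) = sum_i (xRe[i]*aRe[i] - xIm[i]*aIm[i])
static double objectiveRealPart(double[] xRe, double[] xIm,
                                double[] aRe, double[] aIm) {
    double sum = 0;
    for (int i = 0; i < xRe.length; i++)
        sum += xRe[i] * aRe[i] - xIm[i] * aIm[i];
    return sum;
}

// sum_i conj(x[i])*x[i] = sum_i (xRe[i]^2 + xIm[i]^2); this sum is already
// real, so the "imaginary" constraint is identically 0 and only the
// real constraint (value 1) remains.
static double constraintValue(double[] xRe, double[] xIm) {
    double sum = 0;
    for (int i = 0; i < xRe.length; i++)
        sum += xRe[i] * xRe[i] + xIm[i] * xIm[i];
    return sum; // must equal 1
}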
If I have an object with properties x and y, how can I tell which point in an array is closest to it without using the distance formula?
You can't get an accurate result without using some variant of the distance formula. But you can save a few cycles by not taking the square root after the fact; the comparison will remain valid.
r² = dx² + dy²
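A minimal sketch of that idea, assuming a simple Point class with public double fields x and y, and a java.util.List of candidates:
// Returns the point closest to target, comparing squared distances
// so the square root is never taken.
static Point closest(List<Point> points, Point target) {
    Point best = null;
    double bestSq = Double.POSITIVE_INFINITY;
    for (Point p : points) {
        double dx = p.x - target.x;
        double dy = p.y - target.y;
        double distSq = dx * dx + dy * dy; // r² = dx² + dy²
        if (distSq < bestSq) {
            bestSq = distSq;
            best = p;
        }
    }
    return best;
}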
If you don't care about the exact distance, you could perhaps take the difference between the x and y coordinates of your source and destination points to provide you with some ordering.
//The following code does not return the closest point by Euclidean
//distance, but it somewhat does what you need and complies with
//your requirement to not use the distance formula:
//it minimizes the sum of the absolute x and y displacements.
Point destination = ...
Point nearestPoint = points.get(0);
int minCoefficient = Integer.MAX_VALUE;
for (Point p : points) {
    int closenessCoefficient = Math.abs(destination.x - p.x) + Math.abs(destination.y - p.y);
    if (closenessCoefficient < minCoefficient) {
        minCoefficient = closenessCoefficient;
        nearestPoint = p;
    }
}
return nearestPoint;
If you have to find exactly the closest neighbour, there is no way around evaluating the distance formula, at least for a couple of points. As already pointed out, you can avoid evaluating the expensive sqrt most of the time by simply comparing the squared distance r² = dx² + dy².
However, if you have a big number of points spread out over a large range of distances, you can first use an approximation like the ones shown at http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml and then calculate the real distance only for the points that are closest according to the approximation. On architectures where multiplication is also expensive, this can make a big difference. On modern x86/x86-64 architectures it should not matter much, though.
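A hedged sketch of that two-pass idea. The cheap bound below, max(|dx|,|dy|) + min(|dx|,|dy|)/2, is chosen for illustration and is not necessarily the exact formula from the linked article; it never underestimates the true distance and overestimates it by at most about 12%, so keeping every point within a factor of 1.12 of the best approximate value is safe before the exact pass.
// First pass: cheap upper-bound distance, no sqrt.
// Second pass: exact squared distance, only for the survivors.
static Point closestTwoPass(List<Point> points, Point target) {
    double[] approx = new double[points.size()];
    double bestApprox = Double.POSITIVE_INFINITY;
    for (int i = 0; i < points.size(); i++) {
        double dx = Math.abs(points.get(i).x - target.x);
        double dy = Math.abs(points.get(i).y - target.y);
        approx[i] = Math.max(dx, dy) + Math.min(dx, dy) / 2;
        bestApprox = Math.min(bestApprox, approx[i]);
    }
    Point best = null;
    double bestSq = Double.POSITIVE_INFINITY;
    for (int i = 0; i < points.size(); i++) {
        if (approx[i] > 1.12 * bestApprox) continue; // filtered out cheaply
        double dx = points.get(i).x - target.x;
        double dy = points.get(i).y - target.y;
        double distSq = dx * dx + dy * dy;
        if (distSq < bestSq) {
            bestSq = distSq;
            best = points.get(i);
        }
    }
    return best;
}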