Using the code below to get polynomial regression coefficients, I calculate the regression value at a given x and find that it is far from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so large, whether it can be reduced, or whether there is a flaw in my understanding? The current requirement is a difference of no more than 150 at every point.
import numpy as np
x=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y=[0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
z=np.polyfit(x,y,3)
print(z)
I have also tried various implementations available in Java, but the coefficient values are the same everywhere for this data. Please help me understand. For example:
0.019168 * N^3 + -5.540901 * N^2 + 579.846493 * N + -1119.339450
N = 5: value = 1643.76649, Y value = 885
N = 10: value = 4144.20338, Y value = 3517
N = 100: value = 20624.29985, Y value = 20746
The polynomial fit performs as expected. There is no error here, just large scatter in your data. You might want to rescale your data, though. If you pass full=True to np.polyfit, you will receive additional information, including the residuals, which is essentially the sum of the squared fit errors. See this other SO post for more details.
import matplotlib.pyplot as plt
import numpy as np
x = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y = [0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
m = max(y)
y = [p/m for p in y] # rescaled y such that max(y)=1, and dimensionless
z, residuals, rank, sing_vals, cond_thres = np.polyfit(x,y,3,full=True)
print("Z: ",z) # [ 9.23914285e-07 -2.67082878e-04 2.79497972e-02 -5.39544708e-02]
print("resi:", residuals) # 0.02188 : quite decent, depending on WHAT you're measuring ..
Z = [z[3] + q*z[2] + q*q*z[1] + q*q*q*z[0] for q in x]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x,y)
ax.plot(x,Z,'r')
plt.show()
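As a rough check of the per-point deviation discussed above, the cubic quoted in the question can be evaluated at every x and compared with y. This is only a sketch in plain Python: the coefficients are copied from the question's output, and the names `coeffs`, `cubic`, `errors` and `worst` are made up for illustration.

```python
# Data and cubic coefficients copied from the question (highest power first).
x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925, 16837,
     17535, 18017, 18285, 18328, 18914, 19432, 19879, 20249, 20539, 20746]
coeffs = [0.019168, -5.540901, 579.846493, -1119.339450]


def cubic(n, c=coeffs):
    """Evaluate c[0]*n^3 + c[1]*n^2 + c[2]*n + c[3] using Horner's scheme."""
    result = 0.0
    for coefficient in c:
        result = result * n + coefficient
    return result


# Signed error (fitted value minus measured value) at each data point.
errors = [cubic(xi) - yi for xi, yi in zip(x, y)]
worst = max(abs(e) for e in errors)

for xi, e in zip(x, errors):
    print(f"x = {xi:3d}  error = {e:10.2f}")
print("largest absolute error:", round(worst, 2))
```

Any point whose absolute error exceeds 150 violates the stated requirement; the question's own numbers already show this at N = 5 (1643.77 against 885).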
After I reviewed the answer of @Magnus, I reduced the limits used for the data in a 3rd order polynomial. As you can see, the points within my crudely drawn red circle cannot both lie on a smooth line with the nearby data. While I could fit smooth lines such as a Hill sigmoidal equation through the data, the data variance (noise) itself appears to be the limiting factor in achieving a peak absolute error of 150 with this data set.
I have a target number X (0-999) and six random numbers (each < 50).
I need a program that automatically finds X (or the number closest to X if X cannot be reached) using basic mathematical operations, brackets, and those six random numbers.
Can someone recommend a way to approach this problem? I read somewhere that I should use postfix notation and genetic algorithms, but I don't know much about either.
Postfix notation avoids the complications that come with using brackets. It allows you to model the equation as
every permutation of the six numbers (6! = 720 permutations in all), followed by
every combination of five operators, where each operator is one of four choices (4^5 = 1024 combinations in all)
The total number of possible equations is 720*1024 = 737280, so I see no reason to use a genetic algorithm; you can simply try all the possibilities. After finding the best postfix solution, you'll need to convert to infix with the appropriate brackets.
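The enumeration described above can be sketched in Python as follows. This is an illustration, not a complete solver: it follows the model exactly as stated (all six numbers first, then five operators), which corresponds to a single bracket shape, and it does not try expressions that use fewer than six numbers. The function names `eval_postfix` and `closest` are hypothetical, and exact fractions are used so that division does not introduce rounding.

```python
from fractions import Fraction
from itertools import permutations, product
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_postfix(tokens):
    """Evaluate a postfix token list; return None on division by zero."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            if tok == "/" and right == 0:
                return None
            stack.append(OPS[tok](left, right))
        else:
            stack.append(Fraction(tok))
    return stack[0]


def closest(numbers, target):
    """Try every permutation and operator choice; return (value, tokens) nearest to target."""
    best = None
    for perm in permutations(numbers):
        for ops in product(OPS, repeat=len(numbers) - 1):
            tokens = list(perm) + list(ops)
            value = eval_postfix(tokens)
            if value is None:
                continue
            if best is None or abs(value - target) < abs(best[0] - target):
                best = (value, tokens)
                if value == target:
                    return best  # exact hit, no need to continue
    return best


value, tokens = closest([2, 3, 4], 14)
print(value, tokens)
```

With six numbers the loop visits at most 720*1024 token lists, which is slow in pure Python but still finite and small enough to run to completion.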
If you are asking about an arbitrary equation you need to solve, your program can solve it just like you would. Example:
(10x + 5x + 8)*8 = (x+2)^2 + 3
step 1 work out brackets: 80x + 40x + 64 = x² + 4x + 4 + 3
step 2 move everything to the left: 80x + 40x + 64 - x² - 4x - 4 - 3 = 0
step 3 simplify: -x² + 116x + 57 = 0
step 4 use the quadratic formula
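As a sketch of the last step, the quadratic formula can be applied to the simplified form of the example, which works out to -x² + 116x + 57 = 0 when the brackets are expanded and everything is moved to the left. The helper names `solve_quadratic`, `lhs` and `rhs` are made up for illustration; the roots are substituted back into the original equation as a sanity check.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0 (a != 0), smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def lhs(x):
    return (10 * x + 5 * x + 8) * 8


def rhs(x):
    return (x + 2) ** 2 + 3


roots = solve_quadratic(-1, 116, 57)
for r in roots:
    # The difference should be close to zero for a genuine root.
    print(r, lhs(r) - rhs(r))
```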
I'm trying to calculate conditional entropy in order to calculate information gain for decision trees. I'm having a little trouble with the implementation in Java. An example may look like:
X Y f(x)
1 0 A
1 0 A
0 1 B
Given this example, how would I go about calculating conditional entropy in Java? I understand the math behind it but am just confused on the implementation.
An example can be found here: http://en.wikipedia.org/wiki/Conditional_entropy
Conditional entropy for variable Y:
(Probability of Y = 0)(Entropy of f(x) when Y=0) + (Prob. of Y = 1)(Entropy of f(x) when Y=1)
In your example:
(2/3)*(-(2/2)*log(2/2)) + (1/3)*(-(1/1)*log(1/1)) = (2/3)*0 + (1/3)*0 = 0
i.e. this is a bad example because your conditional entropy is always 0. Maybe this will help: http://www.onlamp.com/pub/a/php/2005/03/24/joint_entropy.html?page=3
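The weighted sum above can be sketched in code as follows. The question asks about Java, but the sketch is in Python to keep it short; the same structure (group rows by the value of Y, compute the entropy of f(x) within each group, weight by group size) carries over directly. The function names `entropy` and `conditional_entropy` are illustrative, and base-2 logarithms are an assumption.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def conditional_entropy(feature, labels):
    """Entropy of labels given feature: sum over values v of P(v) * H(labels | v)."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(group) / n * entropy(group) for group in groups.values())


# Column Y and f(x) from the question's table.
Y = [0, 0, 1]
fx = ["A", "A", "B"]
print(conditional_entropy(Y, fx))
```

Information gain for a split on Y would then be `entropy(fx) - conditional_entropy(Y, fx)`.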
I'm looking to incorporate some sort of implementation of numerical solving for linear algebraic solutions in Java, like this:
5x + 4 = 2x + 3
Ideally, I would prefer to parse as little as possible, and avoid using traditional "human" methods of solutions (i.e. combine like terms, etc). I've been looking into Newton's Method and plugging in values for x to approximate a solution.
I'm having trouble getting it to work though.
Does anyone know the best general way to do this, and how it should be done in code (preferably Java)?
Additional
In Newton's Method, you iterate until the approximation is accurate enough. The formula looks like this:
x1 = x0 - f(x0) / f'(x0)
where x1 is the next value for x in the iteration, and x0 is the current value (or starting guess if on first iteration).
What is f prime? Assuming f(x0) is the function of your current x estimation, what expression does f'(x0) represent?
Clarification
This is still a question of how to PROGRAM this math evaluation, not simply how to do the math.
f'(x0) is the derivative of f evaluated at x0. You can compute an approximation to f' by evaluating:
f'(x0) ~ (f(x0+epsilon) - f(x0))/epsilon
for a suitably tiny value epsilon (because f is linear, any reasonable value of epsilon will give essentially the same result; for more general functions f the subtlety of choosing a good epsilon to use is entirely too subtle to be discussed in a S.O. post -- enroll in an upper-division undergraduate numerical analysis course).
However, since you want to avoid "human" methods, I should point out that for the specific case of linear equations, Newton's method always converges in a single iteration, and is in fact essentially equivalent to the usual algebraic solution technique.
To illustrate this, consider your example. To use Newton's method, one needs to transform the equation so that it looks like f(x) = 0:
5x + 4 = 2x + 3
5x + 4 - (2x + 3) = 0
So f(x) = 5x + 4 - (2x + 3). The derivative of f(x) is f'(x) = 5 - 2 = 3. If we start with an initial guess x0 = 0, then Newton's method gives us:
x1 = x0 - f(x0)/f'(x0)
= 0 - (5*0 + 4 - (2*0 + 3))/3
= 0 - (4-3)/3
= -1/3
This is actually exactly the same operations that a human would use to solve the equation, somewhat subtly disguised. Taking the derivative isolated the x terms (5x - 2x = 3x), and evaluating at zero isolated the terms without an x (4-3 = 1). Then we divided the constant coefficient by the linear coefficient and negated to get x.
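Since the question is about how to program this, here is a minimal sketch of Newton's method using the forward-difference estimate of f' described above. It is written in Python for brevity rather than Java, and the function name `newton` and the parameters `epsilon`, `tolerance` and `max_iterations` are illustrative choices, not part of any library.

```python
def newton(f, x0, epsilon=1e-6, tolerance=1e-12, max_iterations=50):
    """Approximate a root of f, estimating f' by a forward difference."""
    x = x0
    for _ in range(max_iterations):
        fx = f(x)
        if abs(fx) < tolerance:
            break  # close enough to a root
        slope = (f(x + epsilon) - fx) / epsilon  # f'(x) ~ (f(x+eps) - f(x)) / eps
        if slope == 0:
            raise ZeroDivisionError("derivative estimate is zero")
        x = x - fx / slope  # Newton update: x1 = x0 - f(x0) / f'(x0)
    return x


def f(x):
    # 5x + 4 = 2x + 3 rewritten as f(x) = 0
    return 5 * x + 4 - (2 * x + 3)


root = newton(f, x0=0.0)
print(root)
```

For this linear f the first update already lands next to -1/3; any further iteration only removes rounding error from the derivative estimate.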
Assuming that you don't want to invent new algorithms or rewrite old ones, you can use an existing equation solver.
Suppose I have a simple equation of the form:
7x + 4y = n
where n is chosen by us and x, y and n are all non-negative integers. This is the only equation given to us. Among the possible solutions, we need the solution (x, y) in which x is the smallest, e.g.
7x + 4y = 14, then (2, 0) is the solution
7x + 4y = 15, then (1, 2) is the solution
7x + 4y = 32, then (4, 1) and (0, 8) are the possible solutions,
of which (0, 8) is the correct solution
I would like to design an algorithm to calculate it in the least possible running time. The current algorithm which I have in mind goes something like this:
Given an input n
Calculate max(x) = n/7
for i = 0 to max(x)
    If the equation 7*i + 4*y = n holds
        return value of i and y
    else
        continue
This algorithm, I presume, can have a running time of up to O(n) in the worst case. Is there a better algorithm to compute the solution?
Let us consider the more general problem
For two coprime positive integers a and b, given a positive integer n, find the pair (x,y) of nonnegative integers such that a*x + b*y = n with minimal x. (If there is one. There need not be, e.g. 7*x + 4*y = 5 has no solution with nonnegative x and y.)
Disregarding the nonnegativity for the moment, given any solution
a*x0 + b*y0 = n
all solutions have the form (x0 - k*b, y0 + k*a) for some integer k. So the remainder of x modulo b and of y modulo a is an invariant of the solutions, and we have
a*x ≡ n (mod b), and b*y ≡ n (mod a)
So we need to solve the equation a*x ≡ n (mod b) - the other one follows.
Let 0 < c be an integer with a*c ≡ 1 (mod b). You find it for example by the extended Euclidean algorithm, or (equivalently) the continued fraction expansion of a/b in O(log b) steps. Both algorithms naturally yield the unique c < b with that property.
Then the minimal candidate for x is the remainder x0 of n*c modulo b.
The problem has a solution with nonnegative x and y if and only if x0*a <= n, and then x0 is the minimal nonnegative x appearing in any solution with nonnegative x and y.
Of course, for small a and b like 7 and 4, the brute force is no slower than calculating the inverse of a modulo b.
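The modular-inverse approach above can be sketched in Python as follows. The function name `min_x_solution` is hypothetical, and the sketch assumes Python 3.8 or later, where the built-in `pow(a, -1, b)` returns the inverse of a modulo b (it raises `ValueError` if a and b are not coprime).

```python
def min_x_solution(a, b, n):
    """Return (x, y) with a*x + b*y == n, x, y >= 0 and x minimal, or None.

    Assumes a and b are coprime positive integers.
    """
    c = pow(a, -1, b)      # a*c ≡ 1 (mod b)
    x0 = (n * c) % b       # smallest nonnegative x with a*x ≡ n (mod b)
    if a * x0 > n:
        return None        # y would have to be negative
    return x0, (n - a * x0) // b


for n in (14, 15, 32, 5):
    print(n, min_x_solution(7, 4, n))
```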
We have
7(x-4)+4(y+7)=7x+4y
So if (x, y) is a solution, then (x-4,y+7) is also a solution. Hence if there is a solution then there is one with x<4. That's why you only need to test x=0..3 which runs in constant time.
This can be extended to any equation of the form ax+by=n, you only need to test x=0..b-1.
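A small sketch of this bounded search, in Python, with the hypothetical function name `smallest_x`: it tries x = 0..b-1 in increasing order and returns the first x for which the remainder n - a*x is nonnegative and divisible by b.

```python
def smallest_x(a, b, n):
    """Return (x, y) with a*x + b*y == n, x, y >= 0 and x minimal, or None."""
    for x in range(b):             # a solution, if any, has one with x < b
        remainder = n - a * x
        if remainder < 0:
            break                  # larger x only makes it more negative
        if remainder % b == 0:
            return x, remainder // b
    return None


for n in (14, 15, 32, 5):
    print(n, smallest_x(7, 4, n))
```

The loop runs at most b times regardless of n, so for fixed a and b the running time does not grow with n.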
I would recommend checking out the Simplex method in the Numerical Recipes in C book. You can easily treat the C code like pseudo-code and make a Java version. The version of the simplex you want is the "constrained-simplex", which deals in positive values only. The book is available online for free. Start with section 10.8 and read forward.
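O(n):
int y = n / 4;  // start from the largest possible y, which gives the smallest x
while ((n - 4*y) % 7 != 0 && y != 0) {
    y--;
}
// x is valid only if (n - 4*y) % 7 == 0 here; otherwise there is no solution
int x = (n - 4*y) / 7;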