Java: Reduce array to specific number of averages - java

The main issue which needs to be solved is:
Let's say I have an array with 8 numbers, e.g. [2,4,8,3,5,4,9,2] and I use them as values for my x axis in an coordinate system to draw a line. But I can only display 3 of this points.
What I need to do now is do reduce the number of points (8) to 3, without manipulating the line too much - so using an average should be an option.
I am NOT looking for the average of the array in a whole - I still need 3 points of the amount of 8 in total.
For an array like [2,4,2,4,2,4,2,4] and 4 numbers out of that array, I could simply use the average "3" of each pair - but that's not possible if the number is uneven.
But how would I do that? Do you know how this process is called in a mathematical way?
To give you some more realistic details about this issue: I have an x axis, which is 720px long and let's say I get 1000 points. Now I have to reduce this 1000 points (2 arrays, one for x and one for y values) to a maximum of 720 points.
Thought about interpolation and stuff like that, but I'm still not quite sure if this is what I am looking for.

Interpolation is good idea. You input your points and get a polynomial function as an output. Then you can use it to draw your line. Check more here : Interpolation over an array (or two)

I would recommend that you fit all the points you have in some fashion and then evaluate at the particular points you need for the display.
There are a myriad of choices for fitting:
Least squares
Piecewise using polynomials or splines
You should consult a text or find a library to help you - something like Apache Commons Math.

It sounds like you are looking for a more advanced mathematical function than a simple average.
I would suggest trying to identify potential algorithms via Mathematica Stack Exchange and then trying to find a Java library that implements any of the potential choices (maybe a new question here).

since its for an X-axis, why not use the
MIN, MAX and (MIN+MAX)/2
for your three points?

Related

boxcar averaging algorithm of the specified weight

Ok, I need to write a java algorithm which simulates the SMOOTH function written in IDL. But I'm not quite sure how that algorithm works. The smooth equation is given by:
I know there is already a similar post regarding boxcar averaging. But the algorithm seems to be different.
What I understand in this equation is that there is two states (if statement), the first one is calculating the weight average, the second one is to ignore the boundary.
In the first equation, I think I got the summation notation, it starts from 0 to (w - 1).
What I don't get is the one inside summation Ai+j-w/2.
The following is the sample data (just corner part of large data) that was calculated using IDL. I used weight 5 to calculate this.
Please, explain me how that algorithm works.
Thanks
You want the i'th average to be from a window around the i'th point. So it has to start before that point, and end after.
Subtracting off w/2 in the index causes j=0 to be the start of the window you want, and j=w-1 to be the end of the window you want.
It would be entirely equivalent to sum from j=-w/2 to j=w/2-1 instead.

Finding an equation to connect the max number of possible points

I would like to find a way, given a set of any points on a 2 dimensional (or 3 dimensional if possible) plane, to connect as many of these points as possible with an equation, preferably in the form of X^n+BX^n and so on. X being of course a variable, and b and n being any numbers.
This would hopefully work in a way that given say, 50 random points, I would be able to use the equation to draw a line that would pass through as many of these points as possible.
I plan on using this in a compression format where data is converted to X,Y coordinate pairs, the goal is then to create equations that can reproduce these points. The equation would then be stored and the data would be replaced with a pointer to the equation as well as the number to enter into the equation to get the data back.
Any feedback is nice, this is just an idea I thought of during class and wanted to see if it would be possible to implement in a usable format.
To connect n points you need a polynomial of at most degree n-1. You can use Polynomial Regression to form your line.

looking for a single index to show the similarity in point matching algorithm

I did read the Point set registration and would like to implement it for my simple line matching. However, I only got very basic maths knowledge and cannot really understand the equations on the page.
Assuming I am able to extract points from 2 images, searching nearest pair by brute force looping and got a list of pairs with corresponding distances.
What is the next step to calculate a single index by utilizing the above data obtained?
The idea I currently come up with is to simply average all the distance. I believe this are many better approach. Or I should capture more data for the calculation?
Your instincts are almost correct.
Generally, the metric is the sum of squared distances; with the goal of finding the least-squares fit (minimizing the sum of all the individual square distances). Essentially this minimizes the standard deviation (actually it minimizes variance, but same end effect).
So take all your corresponding pairs, calculate the distance squared between them (fast calculation, no sqrt involved; faster than calculating actual distances) add them up and the lower the better. If your point sets differ in count you may wish to divide by the count to get a proper variance value.
This metric applies to pretty much any registration algorithm.
By the way, if you already have a point correspondance and you know there is no scaling/skewing, you might also be interested in Horn's method, which is a closed-form (non-iterative) algorithm that just spits out the least-squared fit directly. It's very efficient.
(P.S. For a very simple explanation of why the variance is a better indicator than the mean distance, check out this page).

Best data structure to store and manipulate my data?

I am writing a simple Java program that will input a text file which will have some numbers representing a (n x n) matrix where numbers are separated by spaces. for ex:
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
I then want to store these numbers in a data structure that I will then use to manipulate the data (which will include, comparing adjecent numbers and also deleting certain numbers based on specific rules.
If a number is deleted, all the other numbers above it fall down the amount of spaces.
For the example above, if say i delete 8 and 9, then the result would be:
() 2 3 ()
1 6 7 4
5 1 2 3
4 5 6 7
so the numbers fall down in their columns.
And lastly, the matrix given will always be square (so always n x n, where n will be always given and will always be positive), therefore, the data structure has to be flexible to virtually accept any n-value.
I was originally implementing it in a 2-d array, but I was wandering if someone had an idea of a better data structure that I could use in order to improve efficiency (something that will allow me to more quickly access all the adjacent numbers in the matrix (rows and columns).
Ultimately, mu program will automatically check adjacent numbers against the rules, I delete numbers, re-format the matrix, and keep going, and in the end i want to be able to create an AI that will remove as many numbers from the matrix as possible in the least amount of moves as possible, for any n x n matrix.
In my opinion, you yo know the length of your array when you start, you are better off using an array. A simple dataType will be easier to navigate (direct access). Then again, using LinkedLists, you will be able to remove a middle value without having to re-arrange the data inside you matrix. This will leave you "top" value as null. in your example :
null 2 3 null
1 6 7 4
5 1 2 3
4 5 6 7
Hope this helps.
You could use one dimensional array with the size n*n.
int []myMatrix = new myMatrix[n * n];
To access element with coordinates (i,j) use myMatrix[i + j * n]. To fall elements use System.arraycopy to move lines.
Use special value (e.g. Integer.MIN_VALUE) as a mark for the () hole.
I expect it would be fastest and most memory efficient solution.
Array access is pretty fast. Accessing adjacent elements is easy, as you just increment the relevant index(s) (being cognizant of boundaries). You could write methods to encapsulate those operations that are well tested. Having elements 'fall down' though might get complicated, but shouldn't be too bad if you modularize it out by writing well tested methods.
All that said, if you don't need the absolute best speed, there are other options.
You also might want to consider a modified circularly linked list. When implementing a sudoku solver, I used the structure outlined here. Looking at the image, you will see that this will allow you to modify your 2d array as you want, since all you need to do is move pointers around.
I'll post a screen shot of relevant picture describing the datastructure here, although I would appreciate it if someone will warn me if I am violating some sort of copy right or other rights of the author, in which case I'll take it down...
Try a Array of LinkedLists.
If you want the numbers to auto-fall, I suggest you to use list for the coloumns.

How to find function range?

I have an arbitrary function or inequality (consisting of a number of trigonometrical, logarithmical, exponential, and arithmetic terms) that takes several arguments and I want to get its range knowing the domains of all the arguments. Are there any Java libraries that can help to solve a problem? What are the best practices to do that? Am I right that for an arbitrary function the only thing can be done is a brute-force approximation? Also, I'm interested in functions that can build intersections and complements for given domains.
Upd. The functions are entered by the user so the complexity cannot be predicted. However, if the library will treat at least simple cases (1-2 variables, 1-2 terms) it will be OK. I suggest the functions will mostly define the intervals and contain at most 2 independent variables. For instance, definitions like
y > (x+3), x ∈ [-7;8]
y <= 2x, x ∈ [-∞; ∞]
y = x, x ∈ {1,2,3}
will be treated in 99% of cases and covering them will be enough for now.
Well, maybe it's faster to write a simple brute-force for treating such cases. Probably it will be satisfactory for my case but if there are better options I would like to learn them.
Notational remark: I assume you want to find the range of the function, i.e. the set of values that the function can take.
I think this problem is not simple. I don't think that "brute force" is a solution at all, what does "brute force" even mean when we have continuous intervals (i.e infinitely many points!).
However, there might be some special cases where this is actually possible. For example, when you take a sin(F(x)) function, you know that its range is [-1,1], regardless of the inner function F(x) or when you take Exp(x) you know the range is (0,+inf).
You could try constructing a syntax tree with information about the ranges associated to each node. Then you could try going bottom-up through the tree to try to compute the information about the actual intervals in which the function values lie.
For example, for the function Sin(x)+Exp(x) and x in (-inf, +inf) you would get a tree
+ range: [left range] union [right range]
/ \
sin exp range [-1, 1] , range: (0,+inf)
| |
x x
so here the result would be [-1, 1] union (0, +inf) = [-1, +inf).
Of course there are many problems with this approach, for example the operation on ranges for + is not always union. Say you have two functions F(x) = Sin(x) and G(x) = 1-Sin(x). Both have ranges [-1,1], but their sum collapses to {1}. You need to detect and take care of such behaviour, otherwise you will get only an upper bound on the possible range (So sort of codomain).
If you provide more examples, maybe someone can propose a better solution, I guess a lot depends on the details of the functions.
#High Performance Mark: I had a look at JAS and it seems that its main purpose is to deal with multivariate polynomial rings, but the question mentioned trigonometric, logarithmic and other transcendental functions so pure polynomial arithmetic will not be sufficient.
Here's another approach and depending on how crazy your function can be (see EDIT) it might give you the universal solution to your problem.
Compose the final expression, which might be rather complex.
After that use numerical methods to find minimum and maximum of the function - this should give you the resulting range.
EDIT: Only in the case that your final expression is not continuous the above would not work and you would have to divide into continuous sections for each you would have to find min and max. At the end you would have to union those.
I would have thought that this is a natural problem to tackle with a Computer Algebra System. I googled around and JAS seems to be the most-cited Java CAS.
If I had to confine myelf to numeric approaches, then I'd probably tackle it with some variety of interval computations. So: the codomain of sin is [-1,1], the codomain of exp is (0,+Inf), and the codomain of exp(sin(expression)) is ...
over to you, this is where I'd reach for Mathematica (which is callable from Java).

Categories

Resources