Programmatic way of counting floating point operations (Java) - java

I'm looking for a programmatic way of counting the number of floating point operations (flops) in a call to a function, in Java.
There are several closely related questions, asking about what floating points are, and how to do big-O computational complexity analysis, for example here, here and here. But note that in my application I don't want a big-O number, I want to know for a particular run of a function (i.e. a particular input data size) how many flops did it take.
The two closest solutions I can find are (1) suggestions to use a run-time profiler to count the number of flops, but this does not suit my needs as I need to use the result later in the program and (2) a library of computation functions which can be called to increment a counter, and a closely related suggestion here.
These last two suggestions would meet my needs but involve a lot of manual modifications to the code I need to count. An alternative would be to just use CPU run-time which would be very quick and easy, but also quite rough.
Is anyone aware of a programmatic way of counting the flops executed by a section of code?
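Option (2) above, a small library of arithmetic functions that increment a counter, can be sketched as follows. This is only an illustration of the idea: the class name FlopCounter and its methods are hypothetical, it counts one flop per call, and it is not thread-safe.

```java
/** Hypothetical helper: arithmetic routed through this object is counted. */
final class FlopCounter {
    private long count; // number of counted operations so far

    double add(double a, double b) { count++; return a + b; }
    double sub(double a, double b) { count++; return a - b; }
    double mul(double a, double b) { count++; return a * b; }
    double div(double a, double b) { count++; return a / b; }

    long flops() { return count; }
    void reset() { count = 0; }

    /** Example of instrumented code: dot product of two equal-length vectors. */
    static double dot(FlopCounter fc, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            // one multiplication and one addition per element
            sum = fc.add(sum, fc.mul(x[i], y[i]));
        }
        return sum;
    }
}
```

Because the count is an ordinary field, the program can read fc.flops() right after the call and use the value later, which a run-time profiler does not allow. The cost is exactly the manual rewriting mentioned above: every +, -, * and / in the measured code has to go through the counter.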

Related

Genetic Algorithm - convergence

I have a few questions about my genetic algorithm and GAs overall.
I have created a GA that, when given points on a curve, tries to figure out what function produced this curve.
An example is the following
Points
{{-2, 4},{-1, 1},{0, 0},{1, 1},{2, 4}}
Function
x^2
Sometimes I will give it points that will never produce a function, or will sometimes produce a function. It can even depend on how deep the initial trees are.
Some questions:
Why does the tree depth matter in trying to evaluate the points and
produce a satisfactory function?
Why do I sometimes get a premature convergence and the GA never
breaks out of the cycle?
What can I do to prevent a premature convergence?
What about annealing? How can I use it?
Can you take a quick look at my code and tell me if anything is obviously wrong with it? (This is test code, I need to do some code clean up.)
https://github.com/kevkid/GeneticAlgorithmTest
Source: http://www.gp-field-guide.org.uk/
EDIT:
Looks like Thomas's suggestions worked well: I get very fast results and less premature convergence. I feel like increasing the gene pool gives better results, but I am not exactly sure whether it is actually getting better with every generation or whether the randomness simply allows it to find a correct solution.
EDIT 2:
Following Thomas's suggestions I was able to get it to work properly. It seems I had an issue with selecting survivors and expanding my gene pool. Also, I recently added constants to my GA test, if anyone else wants to look at it.
In order to avoid premature convergence you can also use multiple-subpopulations. Each sub-population will evolve independently. At the end of each generation you can exchange some individuals between subpopulations.
I did an implementation with multiple-subpopulations for a Genetic Programming variant: http://www.mepx.org/source_code.html
I don't have the time to dig into your code but I'll try to answer from what I remember on GAs:
Sometimes I will give it points that will never produce a function, or will sometimes produce a function. It can even depend on how deep the initial trees are.
I'm not sure what the question is here, but if you need a result you could try to select the function that has the smallest distance to the given points (the distance could be a sum, a mean, a number of points etc., depending on your needs).
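As a rough illustration of "least distance to the given points", here is a minimal sketch that scores a candidate function by its sum of squared errors. The class and method names are made up for this example, and the sum of squares is just one of the possible distance measures.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical fitness helper for the curve-fitting GA. */
final class CurveFitness {
    /**
     * Sum of squared differences between f(x) and y over all sample
     * points {x, y}. Lower is better; 0 means an exact fit.
     */
    static double sumSquaredError(DoubleUnaryOperator f, double[][] points) {
        double sse = 0.0;
        for (double[] p : points) {
            double diff = f.applyAsDouble(p[0]) - p[1];
            sse += diff * diff;
        }
        return sse;
    }
}
```

With the points from the question, the candidate x -> x * x scores 0, while any other candidate gets a positive score that can be used to rank it.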
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
I'm not sure what tree depth you mean but it could affect two things:
accuracy: i.e. the higher the depth the more accurate the solution might be or the more possibilities for mutations are given
performance: depending on what tree you mean a higher depth might increase performance (allowing for more educated guesses on the function) or decrease it (requiring more solutions to be generated and compared).
Why do I sometimes get a premature convergence and the GA never breaks out of the cycle?
That might be due to too few mutations. If you have a set of solutions that all converge around a local optimum, slight mutations alone might not move the resulting solutions far enough from that local optimum to break out.
What can I do to prevent a premature convergence?
You could allow for bigger mutations, e.g. when solutions start to converge. Alternatively you could throw entirely new solutions into the mix (think of it as "immigration").
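The "immigration" idea could look roughly like this. It is a sketch under assumptions, not part of the code linked in the question: the names are hypothetical, fitness is expressed as an error to minimize, and the worst individuals are the ones replaced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper that injects fresh individuals into a population. */
final class Immigration {
    /**
     * Returns a new population of the same size in which the `immigrants`
     * individuals with the highest error are replaced by new ones.
     */
    static <T> List<T> withImmigrants(List<T> population,
                                      ToDoubleFunction<T> error,
                                      Supplier<T> freshIndividual,
                                      int immigrants) {
        List<T> sorted = new ArrayList<>(population);
        sorted.sort(Comparator.comparingDouble(error)); // lowest error first
        int keep = Math.max(0, sorted.size() - immigrants);
        List<T> next = new ArrayList<>(sorted.subList(0, keep));
        while (next.size() < population.size()) {
            next.add(freshIndividual.get()); // e.g. a newly generated random tree
        }
        return next;
    }
}
```

Calling this only when the population's diversity or improvement rate drops below some threshold keeps the disruption small during normal generations.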
What about annealing? How can I use it?
Annealing could be used to gradually improve your solutions once they start to converge on a certain point/optimum, i.e. you'd improve the solutions in a more controlled way than "random" mutations.
You can also use it to break out of a local optimum depending on how those are distributed. As an example, you could use your GA until solutions start to converge, then use annealing and/or larger mutations and/or completely new solutions (you could generate several sets of solutions with different approaches and compare them at the end), create your new population and if the convergence is broken, start a new iteration with the GA. If the solutions still converge at the same optimum then you could stop since no bigger improvement is to be expected.
Besides all that, heuristic algorithms may still hit a local optimum but that's the tradeoff they provide: performance vs. accuracy.

Existing Algorithm for Scheduling Problems?

Let's say I want to build a function that would properly schedule three bus drivers to drive in a week with the following constraints:
Each driver must not drive more than five times per week
There must be two drivers driving every day
They will rest one day each week (will not clash with other drivers' rest day)
What kind of algorithm would be used to solve a problem like this?
I looked through several sites and I found these:
1) Backtracking algorithm (brute force)
2) Genetic algorithm
3) Constraint programming
Frankly, these are all "culture shock" for me as I have never learnt any kind of linear programming in the past. There are three things I want to know:
1) Which algorithm will best suit the case scenario above?
2) What would be the simplest algorithm to solve this problem?
3) Please suggest any other algorithms I can look into to solve the above problem.
1) I agree brute force is bad.
2) Your Problem is an Integer Problem. They can be solved with Linear Programming though.
3) You can distinguish two different approaches: heuristics and exact approaches.
Heuristics provide good solutions in reasonable computation time. They are used when there are strict requirements on the computation time or when the problem is too hard to solve to optimality. Genetic algorithms are one such heuristic.
As your Problem is comparably simple, you would probably go with an exact approach.
4) The standard way to solve this exactly is to embed a Linear Program in a Branch & Bound search tree. There is lots of literature on it. The procedure can be outlined as follows:
1. Solve the Linear Program with the Simplex-Algorithm
2. Find a fractional variable for branching, e.g. x=1.5
3. Create two new nodes and add the constraints x<=1 and x>=2 respectively
4. Go into one node (selected by some strategy)
5. Go to point 1
Additionally, at every node in the tree, after point 1, the algorithm checks whether the node can be pruned. Pruning means that the search stops going 'deeper' from this node on, because
a) the problem has become infeasible,
b) a better solution already exists,
c) an integer solution is found. The objective value of this solution is used to determine point b.
The procedure finishes when all nodes are pruned.
Luckily, as Nicolas stated, there are free implementations that do just this. All you have to do is to create your model. Code its objective and constraints in some tool and let it solve.
First of all, this is a discrete optimization problem, so plain linear programming is probably not a good idea (since it is meant for continuous optimization). You can still solve this using linear programming (it will become an integer or mixed-integer program), but that is exponentially hard in the worst case (if your input size is small then it is OK).
Now back to the comparison:
Brute force : worst.
Genetic: Can not guarantee optimality. The algorithm may not be able to solve the problem.
Constraint programming: definitely the best in this case (and in many discrete optimization problems). There is a super efficient implementation of it in the IBM ILOG CPLEX solver (it is not free in general, but it is free for academia or for testing).
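For an instance as small as the one in the question, a plain backtracking search that checks the constraints while assigning is enough to show the idea behind constraint-based search. The sketch below is not a CPLEX or LP model, and it rests on one interpretation of the constraints: since two of the three drivers drive each day, exactly one rests per day (so rest days never clash), and each driver may drive at most five days. The class name is hypothetical.

```java
/** Hypothetical backtracking solver for the three-driver weekly schedule. */
final class BusSchedule {
    static final int DAYS = 7;
    static final int DRIVERS = 3;
    static final int MAX_DRIVES = 5;

    /** Returns rest[day] = index of the driver resting that day, or null if none exists. */
    static int[] solve() {
        int[] rest = new int[DAYS];
        int[] drives = new int[DRIVERS];
        return assign(0, rest, drives) ? rest : null;
    }

    private static boolean assign(int day, int[] rest, int[] drives) {
        if (day == DAYS) {
            return true; // every day has been scheduled
        }
        for (int r = 0; r < DRIVERS; r++) {
            // If driver r rests, the other two drive; reject if either exceeds the limit.
            boolean feasible = true;
            for (int d = 0; d < DRIVERS; d++) {
                if (d != r && drives[d] + 1 > MAX_DRIVES) {
                    feasible = false;
                }
            }
            if (!feasible) {
                continue;
            }
            for (int d = 0; d < DRIVERS; d++) {
                if (d != r) drives[d]++;
            }
            rest[day] = r;
            if (assign(day + 1, rest, drives)) {
                return true;
            }
            for (int d = 0; d < DRIVERS; d++) {
                if (d != r) drives[d]--; // undo and try the next driver
            }
        }
        return false;
    }
}
```

The early feasibility check is what separates this from pure brute force: a partial schedule that already breaks the five-day limit is abandoned without exploring the remaining days. A real constraint solver adds much stronger propagation on top of the same search idea.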

How to calculate max value of function in range?

I have some function (for example, double function(double value)), and some range (for example, from A to B). I need to calculate the max value of the function in this range. Are there existing libraries for it? Please give me advice.
If the function needs to handle floating-point values, you're going to have to use something like Golden section search. Note that for this specific method, there are significant limitations regarding the functions that can be handled (specifically it must be unimodal). There are some adjustments you can make to the algorithm which extend it to more functions, specifically these modifications will allow it to work for continuous functions.
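A minimal sketch of golden section search for a maximum, assuming the function is unimodal on [a, b]. The class name, method name and tolerance parameter are choices made for this example.

```java
import java.util.function.DoubleUnaryOperator;

/** Golden section search for the maximum of a unimodal function. */
final class GoldenSection {
    private static final double INV_PHI = (Math.sqrt(5.0) - 1.0) / 2.0; // about 0.618

    /** Returns an x in [a, b] that approximately maximizes f. */
    static double argMax(DoubleUnaryOperator f, double a, double b, double tol) {
        double c = b - INV_PHI * (b - a);
        double d = a + INV_PHI * (b - a);
        double fc = f.applyAsDouble(c);
        double fd = f.applyAsDouble(d);
        while (b - a > tol) {
            if (fc > fd) {
                // maximum lies in [a, d]; the old c becomes the new d
                b = d;
                d = c;
                fd = fc;
                c = b - INV_PHI * (b - a);
                fc = f.applyAsDouble(c);
            } else {
                // maximum lies in [c, b]; the old d becomes the new c
                a = c;
                c = d;
                fc = fd;
                d = a + INV_PHI * (b - a);
                fd = f.applyAsDouble(d);
            }
        }
        return (a + b) / 2.0;
    }
}
```

Each iteration reuses one of the two previous evaluations, so only one new call to f is needed per step. The maximum value itself is f applied to the returned x. If f has several peaks in the range, this returns one local maximum, not necessarily the global one.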
Is this a continuous function, or a set of discrete values? If discrete values, then you can either iterate over all values, and set max/min flags as 808sound suggests, or you can load all values into an array.
If it's a continuous function, then you can either populate an array with the function's values at discrete inputs and find the max as above, or, if it's differentiable, you can use basic calculus to find the points at which df(x)/dx is 0 (and compare them with the values at the endpoints A and B). The latter case is a little more abstract, and probably more complicated than you want, though?
A quick google search led me to this:
http://code.google.com/p/javacalculus/
But I've never used it myself, so I don't know if that implements the required functionality. It does differential equations, though, so I assume they'd have "baby stuff" like basic differentiation.
I do not know if there are any libraries in Java for your problem.
But I know you can easily do that with MatLab (or Octave for the OpenSource equivalent).
If you do not have any indication of what the function's inner workings are (i.e. the function is a black box that accepts an input and produces an output), there is no "easy" way to find the global maximum.
There are an infinite number of points to choose for your input (technically) so "iterating over all possible inputs" is not feasible mathematically.
There are various algorithms that will give you an estimated maximum value of a function like this:
The hill climbing algorithm, and the firefly algorithm are two, but there are many more. This is a fairly well documented/studied computer science problem and there is a lot of material online for you to look at. I suggest starting with the hill climbing algorithm, and maybe expanding out to other global optimization algorithms.
Note: These algorithms do not guarantee that the result is the maximum, but provide an estimate of its value.
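A simple deterministic variant of hill climbing in one dimension might look like this. It is a sketch with hypothetical names: it moves to the better neighbour while one exists and halves the step size otherwise.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical one-dimensional hill climber restricted to [lo, hi]. */
final class HillClimb {
    static double argMax(DoubleUnaryOperator f, double lo, double hi,
                         double start, double step, double minStep) {
        double x = Math.min(hi, Math.max(lo, start));
        double fx = f.applyAsDouble(x);
        while (step > minStep) {
            double left = Math.max(lo, x - step);
            double right = Math.min(hi, x + step);
            double fl = f.applyAsDouble(left);
            double fr = f.applyAsDouble(right);
            if (fl > fx && fl >= fr) {
                x = left;   // the left neighbour is the best uphill move
                fx = fl;
            } else if (fr > fx) {
                x = right;  // the right neighbour is the best uphill move
                fx = fr;
            } else {
                step /= 2.0; // no uphill neighbour: look closer to x
            }
        }
        return x;
    }
}
```

The result depends on the starting point: on a function with several peaks this stops at whichever peak it climbs first. Running it from several starting points and keeping the best result is a common way to reduce that risk, though it still gives no guarantee.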

Most Efficient way to calculate integrals/derivatives of inputted functions in Java?

My current idea is to store the function as a string, calculate the integral by hand myself, and then ask the user what the definite integral is, but that isn't a real solution.
I was wondering if there was a way to input a function and output an integral/derivative (depending on user choice). My initial step was to put it into an array somehow, but given the many types of functions, this wasn't happening.
I researched everywhere, and I haven't found a method that actually does this with no additional code, nor any code that actually does this, period.
Also, I want to see if there was a way to make a GUI interface and plot inputted functions on to that, if that's possible too.
Thanks :)
What you're describing is known as symbolic integration. There's currently no fully general way to implement it, but there are some techniques available. One such is the Risch algorithm.
Alternatively, an easier problem than symbolic integration is symbolic differentiation -- and, if the derivative of the user's input is equivalent* to the expression which they were asked to integrate, then their integral is probably correct.
You may also want to consider using an existing CAS**, such as Mathematica, to implement this. They've already implemented most of the tools you're after.
*: Keep in mind, though, that two mathematical expressions may be equivalent without being identical, either in trivial ways (e.g, terms in a different order), more complex ones (e.g, large expressions factored differently), or fundamentally (e.g, trig functions replaced with complex exponentials or vice versa).
**: Computer algebra system
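If symbolic machinery is not available, a cheaper numerical stand-in for the differentiation check is to compare a finite-difference derivative of the user's answer with the integrand at several sample points. Note that this swaps symbolic differentiation for a numerical approximation, so it can only say "probably correct". The names, the step size h and the tolerance are assumptions of this sketch.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical numerical check of a proposed antiderivative. */
final class AntiderivativeCheck {
    /**
     * Returns true if the central-difference derivative of `candidate`
     * matches `integrand` within `tol` at `samples` points in [a, b].
     */
    static boolean looksLikeAntiderivative(DoubleUnaryOperator candidate,
                                           DoubleUnaryOperator integrand,
                                           double a, double b,
                                           int samples, double tol) {
        final double h = 1e-5; // finite-difference step, chosen for illustration
        for (int i = 0; i < samples; i++) {
            double x = a + (b - a) * (i + 0.5) / samples; // midpoint of the i-th slice
            double derivative =
                (candidate.applyAsDouble(x + h) - candidate.applyAsDouble(x - h)) / (2.0 * h);
            if (Math.abs(derivative - integrand.applyAsDouble(x)) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

Because it only evaluates functions, this check sidesteps the equivalence problem described in the first footnote: differently written but equivalent expressions give the same values. It also accepts any constant of integration, since constants vanish in the difference.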
Javacalculus is what you are looking for.
Good luck!

For given operations on a large set of data, is there a way to determine if the data can be decomposed into mapreduce operations?

We do stats and such on large sets of data. Right now it is all done on one machine. We're studying the feasibility of moving to a map-reduce paradigm where we decompose the data into subsets, run some operations on that, then combine the results.
Is there any sort of mathematical test that can be applied to a set of operations to determine if the data they operate on can be decomposed?
Or maybe a list somewhere saying what can and cannot be decomposed?
For instance, I didn't think there was a way to decompose standard deviation, but there is...
edit: added tags
Take a look at this paper: http://www.janinebennett.org/index_files/ParallelStatisticsAlgorithms.pdf . They have algorithms for many common statistical problems, and there is open source code available.
Variance, as well as the mean, can be calculated online (in a single pass); see Wikipedia. There's also a parallel algorithm.
Parallel computing is best suited to problems which are "embarrassingly parallel", i.e., there is no dependency between any two tasks.
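To illustrate how variance decomposes, here is a minimal sketch of the single-pass (Welford) update together with the pairwise merge used by the parallel variant. The class name RunningStats is hypothetical; each mapper would fill one instance and the reducer would merge them.

```java
/** Hypothetical accumulator for count, mean and variance. */
final class RunningStats {
    private long n;      // number of values seen
    private double mean; // running mean
    private double m2;   // sum of squared deviations from the mean

    /** Single-pass (online) update with one value. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Combines two partial results; this is the "reduce" step. */
    static RunningStats merge(RunningStats a, RunningStats b) {
        RunningStats r = new RunningStats();
        r.n = a.n + b.n;
        if (r.n == 0) {
            return r;
        }
        double delta = b.mean - a.mean;
        r.mean = a.mean + delta * b.n / r.n;
        r.m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / r.n;
        return r;
    }

    long count() { return n; }
    double mean() { return mean; }
    double sampleVariance() { return n > 1 ? m2 / (n - 1) : 0.0; }
    double sampleStdDev() { return Math.sqrt(sampleVariance()); }
}
```

The point for the map-reduce question is that each subset is summarized by three numbers (count, mean, m2), and those summaries can be merged in any grouping to give the same result as a single pass over all the data, up to floating point rounding.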
Please check out http://en.wikipedia.org/wiki/Embarrassingly_parallel
Also, in cases where the operations are both commutative and associative, MapReduce programs can easily be optimized for better performance, because partial results can be combined in any order before the final reduce step.
