Partition problem with discontinuous subsets

I'm trying to solve a variant of the partition problem. I have two important twists. I need to solve for k partitions, not just 2, as in the classic partition problem.
The following code does that:
https://gist.github.com/ishikawa/21680
I also need the freedom to reorder the items so that I can get the optimal solution. Whereas the classic problem requires that the order of the elements be left intact, and the array is simply split at near-optimal points, I need to allow the array to be reordered so that the difference between the partitions is as small as possible.
How can I tackle this? Both twists are necessary for this real world application. I'd be extremely happy if I could find a Java library that already handles this.
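One common starting point for this reordered, k-way variant is the greedy "largest item first" heuristic: sort the values in descending order and always drop the next value into the currently lightest partition. The sketch below is a heuristic, not the optimal solver the question asks for, and the names GreedyPartition, Bin and partition are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class GreedyPartition {
    /** One partition: the items placed in it and their running sum. */
    static final class Bin {
        final List<Integer> items = new ArrayList<>();
        long sum = 0;
    }

    /** Splits values into k bins, largest value first, always into the lightest bin. */
    static List<Bin> partition(int[] values, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        int[] sorted = values.clone();
        Arrays.sort(sorted); // ascending; the loop below walks it backwards

        // Min-heap keyed on the bin sum, so poll() yields the lightest bin.
        PriorityQueue<Bin> bySum =
                new PriorityQueue<>(Comparator.comparingLong((Bin b) -> b.sum));
        List<Bin> bins = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            Bin b = new Bin();
            bins.add(b);
            bySum.add(b);
        }
        for (int i = sorted.length - 1; i >= 0; i--) {
            Bin lightest = bySum.poll(); // remove before mutating the key
            lightest.items.add(sorted[i]);
            lightest.sum += sorted[i];
            bySum.add(lightest);         // re-insert with the updated sum
        }
        return bins;
    }

    public static void main(String[] args) {
        for (Bin b : partition(new int[] {8, 7, 6, 5, 4}, 2)) {
            System.out.println(b.sum + " " + b.items);
        }
    }
}
```

For the input in main the heuristic ends with sums 17 and 13, while the split {8, 7} and {6, 5, 4} gives 15 and 15, so an exact method is still needed if the optimum really matters.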

Related

Finding all paths though a graph. Is it possible (how?) to create an iterator of results, and would it be more efficient?

Let's say I have a graph and a starting node. Each node has a weight.
The problem is to find all paths from the starting node where the sum of the weights of the nodes is, say, five.
A simple approach is to do a depth-first search and, on discovering that the sum of the current path is five, copy the path into a list of solutions.
This simple solution would require you to search the entire tree until there are no more possibilities. What if you only needed one solution? Or two? Or the best out of 100? Wouldn't this approach potentially waste a large amount of memory storing all solutions?
I imagine you could write some sort of Iterator, where .next() simply continues the search until it finds a new path. This way you waste no storage or computation time.
I figured I'd ask if such a known pattern, or solution, or algorithm exists before trying to reinvent the wheel.
Additionally:
My actual problem is a special Iterator which finds all trees matching a certain condition, but I assumed the answer to the more general path problem would lead me closer to the solution to my problem. Any information on this would also be very much appreciated.

You could simply pass around a count in your function and return when you've reached that count.
void dfs(List<...> solutions, int count, ...)
{
    // check for solution and add to solutions
    if (solutions.size() == count)
        return; // enough solutions collected, stop searching
    ...
    dfs(solutions, count, ...);
    if (solutions.size() == count)
        return; // the recursive call filled the quota, so unwind
    ...
}
This is probably the solution requiring the least changes (C# has functionality that allows you to have "some sort of Iterator, where .next() simply continues the search until it finds a new path", but Java doesn't, to my knowledge).
Alternate solutions are:
Have multiple threads. Let one thread run your function, adding items to a global list. Let another thread access this list and presumably add some functionality to have the one running your function stop after adding an element to the list, and let the other thread resume it, or something like that.
If you're using a recursive function, convert this to an iterative one, and store the Stack in a class variable after generating a single solution, and return, allowing you to continue again.
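The last alternative (an iterative search that keeps its stack between calls) can be sketched as a plain java.util.Iterator. This is only an illustration under assumptions the question does not state: nodes are numbered 0..n-1, the graph is an adjacency list, and a path may not visit the same node twice. The class name PathIterator and its fields are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Lazily enumerates simple paths from a start node whose node weights sum to a target. */
class PathIterator implements Iterator<List<Integer>> {
    private final List<List<Integer>> adj; // adj.get(v) lists the successors of node v
    private final int[] weight;            // weight[v] is the weight of node v
    private final int target;
    private final Deque<int[]> stack = new ArrayDeque<>(); // frames: {node, next successor index}
    private final boolean[] onPath;        // keeps paths simple, so cycles cannot loop forever
    private int sum = 0;                   // weight of the path currently on the stack
    private List<Integer> pending;         // next result, found but not yet handed out

    PathIterator(List<List<Integer>> adj, int[] weight, int start, int target) {
        this.adj = adj;
        this.weight = weight;
        this.target = target;
        this.onPath = new boolean[weight.length];
        push(start);
        if (sum == target) pending = snapshot(); // the start node alone may already match
    }

    private void push(int v) {
        stack.push(new int[] {v, 0});
        onPath[v] = true;
        sum += weight[v];
    }

    private void pop() {
        int v = stack.pop()[0];
        onPath[v] = false;
        sum -= weight[v];
    }

    /** Copies the current path, start node first. */
    private List<Integer> snapshot() {
        List<Integer> path = new ArrayList<>();
        Iterator<int[]> it = stack.descendingIterator(); // bottom of the stack first
        while (it.hasNext()) path.add(it.next()[0]);
        return path;
    }

    /** Resumes the search where it stopped; returns null once the graph is exhausted. */
    private List<Integer> advance() {
        while (!stack.isEmpty()) {
            int[] frame = stack.peek();
            List<Integer> successors = adj.get(frame[0]);
            if (frame[1] < successors.size()) {
                int child = successors.get(frame[1]++);
                if (!onPath[child]) {
                    push(child);
                    if (sum == target) return snapshot();
                }
            } else {
                pop(); // all successors tried, backtrack
            }
        }
        return null;
    }

    @Override
    public boolean hasNext() {
        if (pending == null) pending = advance();
        return pending != null;
    }

    @Override
    public List<Integer> next() {
        if (!hasNext()) throw new NoSuchElementException();
        List<Integer> result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = List.of(
                List.of(1, 2), List.of(3), List.of(3, 4), List.<Integer>of(), List.<Integer>of());
        Iterator<List<Integer>> paths = new PathIterator(adj, new int[] {1, 2, 2, 2, 4}, 0, 5);
        if (paths.hasNext()) System.out.println(paths.next()); // the search stops after one match
    }
}
```

Only the current path is kept in memory, and no work is done beyond the solutions actually requested. If all weights are known to be positive, the search could also stop extending a path once its sum exceeds the target; the sketch does not assume that.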

There exists an algorithm used to solve exactly your problem.
It is called Dijkstra's algorithm and is used to solve shortest-path problems.
You can find implementations of it on the web, and probably learn more as well.

What's the best way to classify a high dimensional int-vector with the weka API?

I have some high dimensional (30000 dimensions) vectors of integer numbers. I have 2 classes: [YES, NO]. I have 6000 samples of the YES-class and 50000 samples of the NO-class. I would like to train a classifier, to classify new samples in future automatically to one of these classes.
I know how to use the Weka Java API, but I am not sure which algorithms in which order to use. Can anyone give me advice on the following questions:
Are the vectors too high dimensional or do I have too many samples to do this efficiently in Weka?
Should I reduce the dimensionality before I start? What algorithm can I use to identify significant elements of my feature vector?
What classifier would be best for this kind of data? I think a decision tree should work fine, but maybe Naive Bayes is faster to train?
Since every element must have a name in weka, how can I assign a name to each of my 30000 features?
Any advice is appreciated. Thanks.

The number of dimensions in this problem is certainly large, but I believe Weka should be able to handle it. The number of samples should not be a problem, but there are a lot more NO class samples than YES class samples, so balancing the two might assist in classifying the NO class cases better.
If you believe that there are redundant dimensions, or that some of the dimensions may contain noise, then reducing the dimensionality would certainly help.
A decision tree shouldn't be too much of a problem. There are a number of algorithms available in Weka, but I wouldn't recommend neural networks given the dimensionality of the problem.
If you have saved the data in a CSV file, you can assign attribute names in the first row of the data. Given the number of dimensions, you would likely call them a1 to a30000, with output for the class attribute.
Hope this Helps!
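If you go the CSV route, the header row does not have to be typed by hand. A minimal sketch in plain Java (it does not touch the Weka API; the class name CsvHeader and the method header are invented here, and the names a1..aN and output simply follow the suggestion above):

```java
class CsvHeader {
    /** Builds "a1,a2,...,aN,output" for a file with N feature columns and one class column. */
    static String header(int featureCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= featureCount; i++) {
            sb.append('a').append(i).append(',');
        }
        return sb.append("output").toString();
    }

    public static void main(String[] args) {
        // Write this line first, then one row per sample.
        System.out.println(header(3));
    }
}
```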

Existing Algorithm for Scheduling Problems?

Let's say I want to build a function that would properly schedule three bus drivers to drive in a week with the following constraints:
Each driver must not drive more than five times per week
There must be two drivers driving everyday
They will rest one day each week (will not clash with other drivers' rest day)
What kind of algorithm would be used to solve a problem like this?
I looked through several sites and I found these:
1) Backtracking algorithm (brute force)
2) Genetic algorithm
3) Constraint programming
Frankly, these are all a "culture shock" for me, as I have never learnt any kind of linear programming in the past. There are three things I want to know:
1) Which algorithm will best suit the case scenario above?
2) What would be the simplest algorithm to solve this problem?
3) Please suggest any other algorithms I can look into to solve the above problem.

1) I agree brute force is bad.
2) Your problem is an integer problem. Such problems can still be solved with linear programming, though.
3) You can distinguish two different approaches: heuristics and exact approaches.
Heuristics provide good solutions in reasonable computation time. They are used when there are strict requirements on the computation time or when the problem is too hard to solve to optimality. A genetic algorithm is a heuristic.
As your problem is comparatively simple, you would probably go with an exact approach.
4) The standard way to solve this exactly is to embed a linear program in a branch-and-bound search tree. There is lots of literature on it. The procedure can be outlined as follows:
1. Solve the linear program with the simplex algorithm
2. Find a fractional variable for branching, e.g. x = 1.5
3. Create two new nodes and add the constraints x <= 1 and x >= 2 respectively
4. Go into one node (selected by some strategy)
5. Go to point 1
Additionally, at every node in the tree, after point 1, the algorithm checks whether the node can be pruned. That means it stops searching 'deeper' from this node on, because
a) the problem has become infeasible,
b) a better solution already exists,
c) an integer solution is found. The objective value of this solution is used to determine point b.
The procedure finishes when all nodes are pruned.
Luckily, as Nicolas stated, there are free implementations that do just this. All you have to do is to create your model. Code its objective and constraints in some tool and let it solve.

First of all, this is a discrete optimization problem, so linear programming is probably not a good idea (since it is meant for continuous optimization). You can still solve this using linear programming (it will become an integer or mixed-integer program), but that is exponentially hard (if your input size is small then it is OK).
Now back to the comparison:
Brute force : worst.
Genetic: cannot guarantee optimality. The algorithm may not be able to solve the problem.
Constraint programming: definitely the best in this case (and in many discrete optimization problems). There is a very efficient implementation of it in the IBM ILOG CPLEX solver (it is not free, though it is free for academia or for testing).
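For an instance as small as the one in the question, even plain backtracking is enough to see the constraints at work: with three drivers and two driving each day, exactly one driver rests per day, so there are at most 3^7 = 2187 weekly assignments to try. The sketch below is not constraint programming and not a solver model; it reads the rest-day rule as "no driver exceeds five drives, so everyone rests at least two days", which is an interpretation, and all names in it are invented.

```java
import java.util.Arrays;

class DriverSchedule {
    static final int DRIVERS = 3;
    static final int DAYS = 7;
    static final int MAX_DRIVES = 5;

    /** Returns rest[day] = the driver who rests that day, or null if no schedule exists. */
    static int[] solve() {
        int[] rest = new int[DAYS];
        return place(0, rest, new int[DRIVERS]) ? rest : null;
    }

    /** Tries every resting driver for the given day and recurses on the next day. */
    private static boolean place(int day, int[] rest, int[] drives) {
        if (day == DAYS) return true; // all days filled without breaking a constraint
        for (int r = 0; r < DRIVERS; r++) {
            if (!othersCanDrive(r, drives)) continue; // someone would exceed the cap
            adjust(r, drives, +1);
            rest[day] = r;
            if (place(day + 1, rest, drives)) return true;
            adjust(r, drives, -1); // undo and try the next resting driver
        }
        return false;
    }

    /** True if every driver except the resting one is still under the weekly cap. */
    private static boolean othersCanDrive(int resting, int[] drives) {
        for (int d = 0; d < DRIVERS; d++) {
            if (d != resting && drives[d] >= MAX_DRIVES) return false;
        }
        return true;
    }

    /** Adds delta to the drive count of everyone except the resting driver. */
    private static void adjust(int resting, int[] drives, int delta) {
        for (int d = 0; d < DRIVERS; d++) {
            if (d != resting) drives[d] += delta;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve()));
    }
}
```

Once the real problem grows (more drivers, shifts, preferences), the search space explodes and one of the solver-based approaches discussed above becomes the practical choice.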

R-Tree implementation in Java [duplicate]

I have been searching for the last few days for a stable implementation of the R-Tree with support for an unlimited number of dimensions (20 or so would be enough). I only found this http://sourceforge.net/projects/jsi/ but it only supports 2 dimensions.
Another option would be a multidimensional implementation of an interval tree.
Maybe I'm completely wrong with the idea of using an R-Tree or interval tree for my problem, so I will state the problem in short so that you can send me your thoughts about it.
The Problem I need to solve is some kind of nearest-neighbour search. I have a set of Antennas and rooms and for each antenna an interval of Integers. E.g. antenna 1, min -92, max -85. In fact it could be represented as room -> set of antennas -> interval for antenna.
The idea was that each room spans a box in the R-Tree over the dimension of the antennas and in each dimension by the interval.
If I get a query with N antennas and a value for each antenna, I could then just represent the information as a query point in that space and retrieve the rooms "nearest" to the point.
I hope you got an idea of the problem and of my approach.

Be aware that R-Trees can degrade badly when you have discrete data. The first thing you really need to find out is an appropriate data representation, then test if your queries work on a subset of the data.
R-Trees will only make your queries faster. If they don't work in the first place, it will not help. You should test your approach without using R-Trees first. Unless you hit a large amount of data (say, 100,000 objects), a linear scan in memory can easily outperform an R-Tree, in particular when you need some adapter layer because it is not well integrated with your code.
The obvious approach here is to just use bounding rectangles, and linearly scan over them. If they work, you can then store the MBRs in an R-Tree to get some performance improvements. But if it doesn't work with a linear scan, it won't work with an R-Tree either (it will not work faster.)
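A linear scan over bounding boxes could look roughly like this. It is a sketch under assumptions the question leaves open: every room has an interval for every antenna, a room is stored as a pair of arrays {min[], max[]}, and "nearest" means the smallest squared Euclidean distance from the query point to the box. The names RoomScan, squaredDistance and nearest are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RoomScan {
    /** Squared distance from a point to an axis-aligned box; 0 if the point lies inside. */
    static long squaredDistance(int[] point, int[] min, int[] max) {
        long sum = 0;
        for (int i = 0; i < point.length; i++) {
            long gap = 0;
            if (point[i] < min[i]) gap = min[i] - point[i];
            else if (point[i] > max[i]) gap = point[i] - max[i];
            sum += gap * gap;
        }
        return sum;
    }

    /** Scans all rooms and returns the name of the closest one, or null if there are none. */
    static String nearest(Map<String, int[][]> rooms, int[] query) {
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, int[][]> room : rooms.entrySet()) {
            int[][] box = room.getValue(); // box[0] = per-antenna minimum, box[1] = maximum
            long d = squaredDistance(query, box[0], box[1]);
            if (d < bestDistance) {
                bestDistance = d;
                best = room.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, int[][]> rooms = new LinkedHashMap<>();
        rooms.put("A", new int[][] {{-92, -80}, {-85, -70}});
        rooms.put("B", new int[][] {{-60, -95}, {-50, -90}});
        System.out.println(nearest(rooms, new int[] {-88, -75}));
    }
}
```

If this scan returns sensible rooms on real measurements, the same boxes can later be moved into an index; if it does not, the data representation is the thing to fix first.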

I'm not entirely clear on what your exact problem is, but an R-Tree or interval tree would not work well in 20 dimensions. That's not a huge number of dimensions, but it is large enough for the curse of dimensionality to begin showing up.
To see what I mean, consider just trying to look at all of the neighbors of a box, including ones off of corners and edges. With 20 dimensions, you'll have 3^20 - 1 or 3,486,784,400 neighboring boxes. (You get that by realizing that along each axis a neighbor can be -1 unit, 0 unit, or +1 unit, but (0,0,0) is not a neighbor because it represents the original box.)
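The count is easy to check: each of the d axes offers three offsets, and the all-zero offset is the box itself. A tiny sketch (the class and method names are invented):

```java
class NeighborCount {
    /** Number of boxes adjacent to a box in a d-dimensional grid: 3^d - 1. */
    static long neighbors(int dimensions) {
        long combinations = 1;
        for (int i = 0; i < dimensions; i++) {
            combinations *= 3; // offsets -1, 0, +1 along this axis
        }
        return combinations - 1; // exclude the all-zero offset (the box itself)
    }

    public static void main(String[] args) {
        System.out.println(neighbors(2));  // the familiar 8 neighbours of a grid cell
        System.out.println(neighbors(20));
    }
}
```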
I'm sorry, but you either need to accept brute force searching, or else analyze your problem better and come up with a cleverer solution.

I have found this R*-Tree implementation in Java which seems to offer many features:
https://github.com/davidmoten/rtree
You might want to check it out!

Another good implementation in Java is ELKI: https://elki-project.github.io/.

You can use PostgreSQL's Generalized Search Tree (GiST) indexing facility.
