I am writing a program which has to be able to sort up to 1 billion random Squares. I wrote a small example program below that creates a random ArrayList of Squares and then sorts it with two different methods.
When I was looking for an efficient method of sorting, I read that a merge sort was meant to be the most efficient/quickest. However, when comparing a merge sort to a custom sort which I wrote (I don't know if this sort of sort has a name), I found that the sort I wrote was more efficient.
The output I got from my program was
Time in nanoseconds for comparator sort: 2346757466
Time in nanoseconds for merge sort: 24156585699
Standard Sort is faster
So why is the sort I wrote so much quicker than a merge sort?
Can either of the used sorts be improved to make a faster, more efficient sort?
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Objects;
public class SortSquares {
public void run() {
ArrayList<Square> list = new ArrayList<Square>();
SecureRandom rand = new SecureRandom();
int randSize = 10;
for(int i = 1; i <= 10000000; i++)
list.add(new Square(i + rand.nextInt(randSize), i + rand.nextInt(randSize)));
//Create shallow copies to allow for timing
ArrayList<Square> comp = new ArrayList<Square>(list);
ArrayList<Square> merge = new ArrayList<Square>(list);
long startTime = System.nanoTime();
comp.sort(new SquareSort());
long endTime = System.nanoTime();
long duration = (endTime - startTime);
System.out.println("Time in nanoseconds for comparator sort: " + duration);
long startTime1 = System.nanoTime();
merge = mergeSort(merge);
long endTime1 = System.nanoTime();
long duration1 = (endTime1 - startTime1);
System.out.println("Time in nanoseconds for merge sort: " + duration1);
if(duration < duration1)
System.out.println("Standard Sort is faster");
else if(duration == duration1)
System.out.println("The sorts are the same");
else
System.out.println("Merge Sort is faster");
}
private class SquareSort implements Comparator<Square> {
@Override
public int compare(Square s1, Square s2) {
if(s1.getLocation()[0] > s2.getLocation()[0]) {
return 1;
} else if(s1.getLocation()[0] == s2.getLocation()[0]) {
if(s1.getLocation()[1] > s2.getLocation()[1]) {
return 1;
} else if(s1.getLocation()[1] == s2.getLocation()[1]) {
return 0;
} else {
return -1;
}
} else {
return -1;
}
}
}
public ArrayList<Square> mergeSort(ArrayList<Square> whole) {
ArrayList<Square> left = new ArrayList<Square>();
ArrayList<Square> right = new ArrayList<Square>();
int center;
if (whole.size() <= 1) {
return whole;
} else {
center = whole.size()/2;
for (int i = 0; i < center; i++) {
left.add(whole.get(i));
}
for (int i = center; i < whole.size(); i++) {
right.add(whole.get(i));
}
left = mergeSort(left);
right = mergeSort(right);
merge(left, right, whole);
}
return whole;
}
private void merge(ArrayList<Square> left, ArrayList<Square> right, ArrayList<Square> whole) {
int leftIndex = 0;
int rightIndex = 0;
int wholeIndex = 0;
while (leftIndex < left.size() && rightIndex < right.size()) {
if ((left.get(leftIndex).compareTo(right.get(rightIndex))) < 0) {
whole.set(wholeIndex, left.get(leftIndex));
leftIndex++;
} else {
whole.set(wholeIndex, right.get(rightIndex));
rightIndex++;
}
wholeIndex++;
}
ArrayList<Square> rest;
int restIndex;
if (leftIndex >= left.size()) {
rest = right;
restIndex = rightIndex;
} else {
rest = left;
restIndex = leftIndex;
}
for (int i = restIndex; i < rest.size(); i++) {
whole.set(wholeIndex, rest.get(i));
wholeIndex++;
}
}
private class Square {
private int[] location = new int[2];
public Square(int x, int y) {
location[0] = x;
location[1] = y;
}
public int[] getLocation() {
return location;
}
@Override
public boolean equals(Object obj) {
if(obj instanceof Square)
if(getLocation()[0] == ((Square) obj).getLocation()[0] &&
getLocation()[1] == ((Square) obj).getLocation()[1])
return true;
return false;
}
@Override
public int hashCode() {
return Objects.hash(getLocation()[0], getLocation()[1]);
}
public int compareTo(Square arg0) {
if(getLocation()[0] > arg0.getLocation()[0]) {
return 1;
} else if(getLocation()[0] == arg0.getLocation()[0]) {
if(getLocation()[1] > arg0.getLocation()[1]) {
return 1;
} else if(getLocation()[1] == arg0.getLocation()[1]) {
return 0;
} else {
return -1;
}
} else {
return -1;
}
}
}
public static void main(String[] args) {
SortSquares e = new SortSquares();
e.run();
}
}
You can use the java.util.Collections.sort( List list ) method from the JDK. As mentioned above, it uses a merge sort with complexity O(n log n).
To measure the performance of your implementation and compare it against other implementations, I would suggest using JMH (http://openjdk.java.net/projects/code-tools/jmh/). Please find a short example below.
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.*;
import java.util.concurrent.TimeUnit;
@BenchmarkMode( Mode.AverageTime )
@OutputTimeUnit( TimeUnit.NANOSECONDS )
@State( Scope.Benchmark )
@Warmup( iterations = 5)
@Measurement( iterations = 5 )
@Fork( value = 1)
public class SortingPerformanceBenchmark
{
private final int[] dataArray = new int[10_000_000];
List<Integer> arrayList;
@Setup
public void load() {
Random rand = new Random();
for (int i = 0; i < dataArray.length; ++i) {
dataArray[i] = rand.nextInt();
}
}
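@Benchmark
public List<Integer> Benchmark_SortObjects() {
// Arrays.asList(int[]) would yield a List<int[]>, so box the values explicitly
arrayList = new ArrayList<Integer>( dataArray.length );
for (int value : dataArray) {
arrayList.add( value );
}
Collections.sort( arrayList );
return arrayList;
}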
public static void main(String... args) throws Exception {
Options opts = new OptionsBuilder()
.include(SortingPerformanceBenchmark.class.getSimpleName())
.build();
new Runner( opts).run();
}
}
The opposite is true: the standard method is much faster.
First, you create two new lists in each call to the recursive function mergeSort. The standard one probably merges the elements in place in the original array and uses indices for the beginning and the end of a range.
Second, the JDK can use several threads on multicore machines if you sort with Arrays.parallelSort; the plain List.sort call itself runs on a single thread.
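To illustrate the first point, here is a minimal sketch of a merge sort that works on index ranges and reuses one scratch buffer instead of allocating two new lists per call. The class name IndexedMergeSort and its helper methods are made up for this example, and this is not the JDK's actual implementation (which is more elaborate); it only shows the idea of passing indices around.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class, not taken from the question or the JDK.
public class IndexedMergeSort {

    // Sorts the list itself; a single scratch buffer is allocated once and reused.
    public static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        List<T> scratch = new ArrayList<>(list);
        sort(list, scratch, 0, list.size(), cmp);
    }

    // Sorts the half-open index range [from, to) of the list.
    private static <T> void sort(List<T> list, List<T> scratch,
                                 int from, int to, Comparator<? super T> cmp) {
        if (to - from <= 1) {
            return; // zero or one element: already sorted
        }
        int mid = (from + to) >>> 1;
        sort(list, scratch, from, mid, cmp);
        sort(list, scratch, mid, to, cmp);
        merge(list, scratch, from, mid, to, cmp);
    }

    // Merges the sorted ranges [from, mid) and [mid, to) back into the list.
    private static <T> void merge(List<T> list, List<T> scratch,
                                  int from, int mid, int to, Comparator<? super T> cmp) {
        for (int i = from; i < to; i++) {
            scratch.set(i, list.get(i)); // copy the range being merged
        }
        int left = from;
        int right = mid;
        int out = from;
        while (left < mid && right < to) {
            // "<=" takes from the left on ties, which keeps the sort stable
            if (cmp.compare(scratch.get(left), scratch.get(right)) <= 0) {
                list.set(out++, scratch.get(left++));
            } else {
                list.set(out++, scratch.get(right++));
            }
        }
        while (left < mid) {
            list.set(out++, scratch.get(left++));
        }
        while (right < to) {
            list.set(out++, scratch.get(right++));
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(5, 3, 9, 1, 3));
        sort(data, Comparator.naturalOrder());
        System.out.println(data); // prints [1, 3, 3, 5, 9]
    }
}
```

This version still needs O(n) extra memory for the scratch buffer, but it avoids creating roughly 2n short-lived lists, which is likely where much of the time in the question's mergeSort goes.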
Considering algorithms, it depends largely on the data.
Supposedly your sort method is quicksort.
That gives you O(n^2) worst-case runtime and O(n log n) average-case runtime.
Mergesort is always O(n log n), so its running time is predictable. It is also a stable sort. That's why it was chosen for sorting in the Java collections.
Both sort and the mergesort you implemented are the same algorithm (sort in the Java collections is based on merge sort). You need to run the same code many times and warm up your JVM first to get more reliable results.
Once you have made sure that your custom mergesort is efficient, you can compare it with the collections one.
In any case, you don't have to implement your own merge sort for something this simple.
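As a rough illustration of relying on the built-in sort, the sketch below orders objects by x and then by y with a Comparator built from Comparator.comparingInt and thenComparingInt. The Square class here is a simplified, hypothetical stand-in for the one in the question (it stores x and y directly), and the class name BuiltInSortSketch is made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; Square is a simplified stand-in for the question's class.
public class BuiltInSortSketch {

    static final class Square {
        private final int x;
        private final int y;

        Square(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int getX() { return x; }

        int getY() { return y; }

        @Override
        public String toString() {
            return "(" + x + "," + y + ")";
        }
    }

    // Compare by x first; fall back to y only when the x values are equal.
    static final Comparator<Square> BY_X_THEN_Y =
            Comparator.comparingInt(Square::getX).thenComparingInt(Square::getY);

    public static void main(String[] args) {
        List<Square> squares = new ArrayList<>();
        squares.add(new Square(2, 1));
        squares.add(new Square(1, 5));
        squares.add(new Square(2, 0));

        squares.sort(BY_X_THEN_Y); // List.sort uses the JDK's own merge-sort variant
        System.out.println(squares); // prints [(1,5), (2,0), (2,1)]
    }
}
```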
Write a method to return the Toy that occurs most frequently in the list and another method to sort the toys by count.
This is my code
import java.util.ArrayList;
public class ToyStore {
private ArrayList<Toy> toyList;
public ToyStore() {
}
public void loadToys(String toys) {
toyList = new ArrayList<Toy>();
for (String item : toys.split(" ")) {
Toy t = getThatToy(item);
if (t == null) {
toyList.add(new Toy(item));
} else {
t.setCount(t.getCount() + 1);
}
}
}
public Toy getThatToy(String nm) {
for (Toy item : toyList) {
if (item.getName().equals(nm)) {
return item;
}
}
return null;
}
public String getMostFrequentToy() {
int position = 0;
int maximum = Integer.MIN_VALUE;
for (int i = toyList.size() - 1; i >= 0; i--) {
if (toyList.get(i).getCount() > maximum)
maximum = toyList.get(i).getCount();
position = i;
}
return toyList.get(position).getName();
}
public void sortToysByCount() {
ArrayList<Toy> t = new ArrayList<Toy>();
int count = 0;
int size = toyList.size();
for (int i = size; i > 0; i--) {
t.add(new Toy(getMostFrequentToy()));
t.get(count).setCount(getThatToy(getMostFrequentToy()).getCount());
toyList.remove(getThatToy(getMostFrequentToy()));
count++;
}
toyList = t;
}
public String toString() {
return toyList + "" + "\n" + "max == " + getMostFrequentToy();
}
}
Here is the method I care about
public void sortToysByCount() {
ArrayList<Toy> t = new ArrayList<Toy>();
int count = 0;
int size = toyList.size();
for (int i = size; i > 0; i--) {
t.add(new Toy(getMostFrequentToy()));
t.get(count).setCount(getThatToy(getMostFrequentToy()).getCount());
toyList.remove(getThatToy(getMostFrequentToy()));
count++;
}
toyList = t;
}
Here is my output
[sorry 4, bat 1, train 2, teddy 2, ball 2]
Here is what I want
[sorry 4, train 2, teddy 2, ball 2, bat 1]
What is wrong in my code? How do I do it?
The problem is in your getMostFrequentToy() method:
Replace
if (toyList.get(i).getCount() > maximum)
maximum = toyList.get(i).getCount();
position = i;
with
if (toyList.get(i).getCount() > maximum) {
maximum = toyList.get(i).getCount();
position = i;
}
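because you want position to change only when a new maximum is found. Without the braces, only the first statement belongs to the if, so position = i runs on every iteration.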
You have some inefficiencies in your code. Every single time you call getMostFrequentToy(), you iterate over the whole list, which may be fine since you are constantly removing objects, but you really don't need to create new Toy objects for those that already exist in the list.
So, this is "better", but I'm still not sure you need getThatToy when you should already know which toy is the most frequent.
String frequent;
for (int i = size; i > 0; i--) {
frequent = getMostFrequentToy();
t.add(new Toy(frequent));
t.get(count).setCount(getThatToy(frequent).getCount());
toyList.remove(getThatToy(frequent));
count++;
}
Anyways, I think the instructions asked you to return the Toy object, not its name.
It's quite simple, just keep track of the max count.
public Toy getMostFrequentToy() {
Toy mostFrequent = null;
int maximum = Integer.MIN_VALUE;
for (Toy t : toyList) {
if (t.getCount() > maximum) {
// remember both the new maximum and the toy that has it
maximum = t.getCount();
mostFrequent = t;
}
}
return mostFrequent;
}
Now, the above code can become
public void sortToysByCount() {
ArrayList<Toy> t = new ArrayList<Toy>();
// int count = 0;
int size = toyList.size();
Toy frequent;
for (int i = size; i > 0; i--) {
frequent = getMostFrequentToy();
t.add(frequent);
// t.get(count).setCount(frequent.getCount()); // Not sure about this
toyList.remove(frequent);
// count++;
}
toyList.clear();
toyList.addAll(t);
}
Realistically, though, when you want to sort, you really should see how to create a Comparator for your Toy objects.
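A minimal sketch of that Comparator approach could look like the following. The Toy class shown here is an assumption (the question does not include it); it only has a name, a count, and a toString that prints both. The class name ToySortSketch and the method sortByCountDescending are likewise made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; Toy is an assumed minimal version of the asker's class.
public class ToySortSketch {

    static final class Toy {
        private final String name;
        private final int count;

        Toy(String name, int count) {
            this.name = name;
            this.count = count;
        }

        String getName() { return name; }

        int getCount() { return count; }

        @Override
        public String toString() {
            return name + " " + count;
        }
    }

    // Highest count first. List.sort is stable, so toys with equal
    // counts keep the relative order they had before sorting.
    static void sortByCountDescending(List<Toy> toys) {
        toys.sort(Comparator.comparingInt(Toy::getCount).reversed());
    }

    public static void main(String[] args) {
        List<Toy> toys = new ArrayList<>();
        toys.add(new Toy("sorry", 4));
        toys.add(new Toy("bat", 1));
        toys.add(new Toy("train", 2));
        toys.add(new Toy("teddy", 2));
        toys.add(new Toy("ball", 2));

        sortByCountDescending(toys);
        System.out.println(toys); // prints [sorry 4, train 2, teddy 2, ball 2, bat 1]
    }
}
```

With this, no toys need to be removed from or re-added to the list, and the most frequent toy is simply the first element after sorting.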
I have a set of K elements and I need to create combinations of N ordered elements.
For example, if K = 1 and I have {X1, emptyset} and n = 2, then each node is an ordered pair and I need to produce this:
Example1:
({},{})
({X1},{}), ({},{X1})
({X1},{X1})
Note that I need to get the elements in this order: first the element whose two sets contain 0 elements in total, second the elements with 1 in total, etc.
My idea is to build the power set of the initial set, adding one element at a time, but I'm losing my mind. Any suggestions? I need to do this in Java.
EDIT 1:
In other words, I need to create a Hasse diagram:
http://en.wikipedia.org/wiki/Hasse_diagram
where every node is built from elements of the power set and the partial order is inclusion applied to every subset, like this:
Example2:
ni = (S1i, S2i) ⊆ nj = (S1j, S2j) only if S1i ⊆ S1j AND S2i ⊆ S2j
EDIT2: @RONALD:
If I have K = 2 for a set S = {1, 2} and n = 2, I need this output:
level0: ({}, {})
level1: ({1}, {}); ({2}, {}); ({}, {1}); ({}, {2})
level2: ({1,2}, {}); ({1}, {1}); ({1}, {2}); ({2}, {1}); ({2}, {2}); ({}, {1,2});
[..]
The order is important between levels, but not within a level. For example:
If at level 1 I have
({1}, {}); ({2}, {}); ({}, {1}); ({}, {2})
OR
({}, {2}); ({}, {1}); ({2}, {}); ({1}, {});
it is the same thing. But it's important that at level 2 I have all supersets of the level 1 nodes, where a superset is as explained in Example 2.
EDIT3:
If my set is S = {x, y, z} and I have only one set per node, the result (starting from the bottom) is this:
http://upload.wikimedia.org/wikipedia/commons/e/ea/Hasse_diagram_of_powerset_of_3.svg
If I have S = {1, 2} and two sets per node, the result is this (thanks to Ronald for the diagram):
http://www.independit.de/Downloads/hasse.pdf
EDIT4:
Because this is a super-exponential problem, my idea is this: I compute one level at a time (in order!) and, using some rule, I prune a node and all its supersets. Another stopping rule may be to stop at a certain level. For this, it is essential to compute the combinations directly in order rather than computing them all and then reordering.
EDIT5:
Marco13's code works fine. I have made some modifications:
I use a PowerSet function because it is helpful for building all combinations of at most K elements of a set S (I only need the first few elements of the power set to do this).
Now the algorithm does everything, but I need to speed it up. Is there any way to parallelize the computation, for example with the MapReduce paradigm (Apache Hadoop implementation)?
package utilis;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class HasseDiagramTest4
{
public static void main(String[] args)
{
int numberOfSetsPerNode = 3;
List<Integer> set = Arrays.asList(1, 2, 3, 4, 5,6);
List<Set<Integer>> powerSet = computePowerSet(set);
powerSet = KPowerSet(powerSet, 3);
List<List<Set<Integer>>> prunedNodes =
new ArrayList<List<Set<Integer>>>();
List<Set<Integer>> prunedNode = new ArrayList<Set<Integer>>();
HashSet<Integer> s = new HashSet<Integer>();
HashSet<Integer> s_vuoto = new HashSet<Integer>();
s.add(1);
s.add(2);
prunedNode.add(s);
prunedNode.add(s_vuoto);
prunedNode.add(s);
prunedNodes.add(prunedNode);
compute(ordina(powerSet), numberOfSetsPerNode, prunedNodes);
}
private static <T> HashMap<Integer, List<Set<T>>> ordina(List<Set<T>> powerSet) {
HashMap<Integer, List<Set<T>>> hs = new HashMap<Integer, List<Set<T>>>();
for(Set<T> l: powerSet)
{
List<Set<T>> lput = new ArrayList<Set<T>>();
if(hs.containsKey(l.size()))
{
lput = hs.get(l.size());
lput.add(l);
hs.put(l.size(), lput);
}
else
{
lput.add(l);
hs.put(l.size(), lput);
}
}
return hs;
}
private static <T> List<Set<T>> KPowerSet(List<Set<T>> powerSet, int k)
{
List<Set<T>> result = new ArrayList<Set<T>>();
for(Set<T>s:powerSet)
{
if(s.size() <= k)
{
result.add(s);
}
}
return result;
}
private static <T> List<Set<T>> computePowerSet(List<T> set)
{
List<Set<T>> result = new ArrayList<Set<T>>();
int numElements = 1 << set.size();
for (int j=0; j<numElements; j++)
{
Set<T> element = new HashSet<T>();
for (int i = 0; i < set.size(); i++)
{
long b = 1 << i;
if ((j & b) != 0)
{
element.add(set.get(i));
}
}
result.add(element);
}
return result;
}
private static List<Integer> createList(int numberOfElements)
{
List<Integer> list = new ArrayList<Integer>();
for (int i=0; i<numberOfElements; i++)
{
list.add(i+1);
}
return list;
}
private static <T> void compute(
HashMap<Integer, List<Set<T>>> powerSet, int numberOfSetsPerNode,
List<List<Set<T>>> prunedNodes)
{
Set<List<Set<T>>> level0 = createLevel0(numberOfSetsPerNode);
System.out.println("Level 0:");
print(level0);
Set<List<Set<T>>> currentLevel = level0;
int level = 0;
while (true)
{
Set<List<Set<T>>> nextLevel =
createNextLevel(currentLevel, powerSet, prunedNodes);
if (nextLevel.size() == 0)
{
break;
}
System.out.println("Next level: "+nextLevel.size()+" nodes");
print(nextLevel);
currentLevel = nextLevel;
level++;
}
}
private static <T> Set<List<Set<T>>> createLevel0(int numberOfSetsPerNode)
{
Set<List<Set<T>>> level0 =
new LinkedHashSet<List<Set<T>>>();
List<Set<T>> level0element = new ArrayList<Set<T>>();
for (int i=0; i<numberOfSetsPerNode; i++)
{
level0element.add(new LinkedHashSet<T>());
}
level0.add(level0element);
return level0;
}
private static <T> List<Set<T>> getNext(Set<T> current, HashMap<Integer, List<Set<T>>> powerSet)
{
ArrayList<Set<T>> ritorno = new ArrayList<Set<T>>();
int level = current.size();
List<Set<T>> listnext = powerSet.get(level+1);
if(listnext != null)
{
for(Set<T> next: listnext)
{
if(next.containsAll(current))
{
ritorno.add(next);
}
}
}
return ritorno;
}
private static <T> Set<List<Set<T>>> createNextLevel(
Set<List<Set<T>>> currentLevel, HashMap<Integer, List<Set<T>>> powerSet,
List<List<Set<T>>> prunedNodes)
{
Set<List<Set<T>>> nextLevel = new LinkedHashSet<List<Set<T>>>();
// For each node of the current level
for (List<Set<T>> currentLevelElement : currentLevel)
{
// For each set of the node under consideration
for (int i=0; i<currentLevelElement.size(); i++)
{
List<Set<T>> listOfnext = getNext (currentLevelElement.get(i), powerSet);
for (Set<T> element : listOfnext)
{
List<Set<T>> nextLevelElement = copy(currentLevelElement);
Set<T> next = element;
nextLevelElement.remove(i);
nextLevelElement.add(i, next);
boolean pruned = false;
for (List<Set<T>> prunedNode : prunedNodes)
{
if (isSuccessor(prunedNode, nextLevelElement))
{
pruned = true;
}
}
if (!pruned)
{
nextLevel.add(nextLevelElement);
}
else
{
System.out.println("Pruned "+nextLevelElement+ " due to "+prunedNodes);
}
}
}
}
return nextLevel;
}
private static <T> boolean isSuccessor(
List<Set<T>> list, List<Set<T>> successor)
{
for (int i=0; i<list.size(); i++)
{
Set<T> set = list.get(i);
Set<T> successorSet = successor.get(i);
//System.out.println("Successor:" + successorSet + "pruned:" + set);
if (!successorSet.containsAll(set))
{
return false;
}
}
return true;
}
private static <T> List<Set<T>> copy(List<Set<T>> list)
{
List<Set<T>> result = new ArrayList<Set<T>>();
for (Set<T> element : list)
{
result.add(new LinkedHashSet<T>(element));
}
return result;
}
private static <T> void print(
Iterable<? extends Collection<? extends Collection<T>>> sequence)
{
for (Collection<? extends Collection<T>> collections : sequence)
{
System.out.println(" "+collections);
}
}
}
After 4 EDITs and a lot of discussion, it's slowly becoming clearer what the goal of this application is. Indeed, one would have to think about an appropriate formalization, but in the end it does not seem to be so difficult.
In contrast to my first answer ( https://stackoverflow.com/a/22092523 ) this new one iteratively computes the next level from the previous level (and the core of this, createNextLevel, is just 10 lines of code).
In the compute method, the pruning that was asked for in "EDIT4" could be integrated into the while loop.
EDIT: Still more requests in the comments. Integrated them. But this is becoming ridiculous. You can take care of the rest yourself.
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
public class HasseDiagramTest2
{
public static void main(String[] args)
{
int numberOfElements = 2;
int numberOfSetsPerNode = 2;
List<Integer> list = createList(numberOfElements);
List<List<Set<Integer>>> prunedNodes =
new ArrayList<List<Set<Integer>>>();
List<Set<Integer>> prunedNode = new ArrayList<Set<Integer>>();
prunedNode.add(Collections.singleton(1));
prunedNode.add(Collections.singleton(1));
prunedNodes.add(prunedNode);
compute(list, numberOfSetsPerNode, prunedNodes);
}
private static List<Integer> createList(int numberOfElements)
{
List<Integer> list = new ArrayList<Integer>();
for (int i=0; i<numberOfElements; i++)
{
list.add(i+1);
}
return list;
}
private static <T> void compute(
List<T> elements, int numberOfSetsPerNode,
List<List<Set<T>>> prunedNodes)
{
Set<List<Set<T>>> level0 = createLevel0(numberOfSetsPerNode);
System.out.println("Level 0:");
print(level0);
Set<List<Set<T>>> currentLevel = level0;
int level = 0;
while (true)
{
Set<List<Set<T>>> nextLevel =
createNextLevel(currentLevel, elements, prunedNodes);
if (nextLevel.size() == 0)
{
break;
}
System.out.println("Next level: "+nextLevel.size()+" nodes");
print(nextLevel);
currentLevel = nextLevel;
level++;
}
}
private static <T> Set<List<Set<T>>> createLevel0(int numberOfSetsPerNode)
{
Set<List<Set<T>>> level0 =
new LinkedHashSet<List<Set<T>>>();
List<Set<T>> level0element = new ArrayList<Set<T>>();
for (int i=0; i<numberOfSetsPerNode; i++)
{
level0element.add(new LinkedHashSet<T>());
}
level0.add(level0element);
return level0;
}
private static <T> Set<List<Set<T>>> createNextLevel(
Set<List<Set<T>>> currentLevel, List<T> elements,
List<List<Set<T>>> prunedNodes)
{
Set<List<Set<T>>> nextLevel = new LinkedHashSet<List<Set<T>>>();
for (List<Set<T>> currentLevelElement : currentLevel)
{
for (int i=0; i<currentLevelElement.size(); i++)
{
for (T element : elements)
{
List<Set<T>> nextLevelElement = copy(currentLevelElement);
Set<T> next = nextLevelElement.get(i);
boolean changed = next.add(element);
if (!changed)
{
continue;
}
boolean pruned = false;
for (List<Set<T>> prunedNode : prunedNodes)
{
if (isSuccessor(prunedNode, nextLevelElement))
{
pruned = true;
}
}
if (!pruned)
{
nextLevel.add(nextLevelElement);
}
else
{
// System.out.println(
// "Pruned "+nextLevelElement+
// " due to "+prunedNodes);
}
}
}
}
return nextLevel;
}
private static <T> boolean isSuccessor(
List<Set<T>> list, List<Set<T>> successor)
{
for (int i=0; i<list.size(); i++)
{
Set<T> set = list.get(i);
Set<T> successorSet = successor.get(i);
if (!successorSet.containsAll(set))
{
return false;
}
}
return true;
}
private static <T> List<Set<T>> copy(List<Set<T>> list)
{
List<Set<T>> result = new ArrayList<Set<T>>();
for (Set<T> element : list)
{
result.add(new LinkedHashSet<T>(element));
}
return result;
}
private static <T> void print(
Iterable<? extends Collection<? extends Collection<T>>> sequence)
{
for (Collection<? extends Collection<T>> collections : sequence)
{
System.out.println(" "+collections);
}
}
}
As mentioned in the comments, I'm rather sure that the formalization of what actually should be done is either unclear or plainly wrong. The criterion for comparing the "nodes" does not match the examples. However, once the sorting criterion (in form of a Comparator) has been specified, this should be rather easy to implement.
Here, the criterion for comparing two "nodes" is the sum of the sizes of all sets in the node, which matches the example that was given (although it intuitively does not make sense, because it does not correspond to any real subset relationship....)
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
public class HasseDiagramTest
{
public static void main(String[] args)
{
List<Integer> set = Arrays.asList(1, 2);
List<List<Integer>> powerSet = computePowerSet(set);
List<List<List<Integer>>> combinations =
computeCombinations(powerSet, 2);
Comparator<List<List<Integer>>> comparator = createComparator();
Collections.sort(combinations, comparator);
List<List<List<List<Integer>>>> levels = createLevels(combinations);
for (List<List<List<Integer>>> level : levels)
{
System.out.println(level);
}
}
private static <T> List<List<List<List<T>>>> createLevels(
List<List<List<T>>> sortedCombinations)
{
List<List<List<List<T>>>> levels = new ArrayList<List<List<List<T>>>>();
int previousTotalSize = -1;
List<List<List<T>>> currentLevel = null;
for (int i=0; i<sortedCombinations.size(); i++)
{
List<List<T>> combination = sortedCombinations.get(i);
int totalSize = totalSize(combination);
if (previousTotalSize != totalSize)
{
previousTotalSize = totalSize;
currentLevel = new ArrayList<List<List<T>>>();
levels.add(currentLevel);
}
currentLevel.add(combination);
}
return levels;
}
private static <T> Comparator<List<List<T>>> createComparator()
{
return new Comparator<List<List<T>>>()
{
@Override
public int compare(List<List<T>> list0, List<List<T>> list1)
{
return Integer.compare(totalSize(list0), totalSize(list1));
}
};
}
private static <T> int totalSize(List<List<T>> lists)
{
int totalSize = 0;
for (List<T> list : lists)
{
totalSize += list.size();
}
return totalSize;
}
private static <T> List<List<T>> computePowerSet(List<T> set)
{
List<List<T>> result = new ArrayList<List<T>>();
int numElements = 1 << set.size();
for (int j=0; j<numElements; j++)
{
List<T> element = new ArrayList<T>();
for (int i = 0; i < set.size(); i++)
{
long b = 1 << i;
if ((j & b) != 0)
{
element.add(set.get(i));
}
}
result.add(element);
}
return result;
}
private static <T> List<List<T>> computeCombinations(List<T> list, int sampleSize)
{
int numElements = (int) Math.pow(list.size(), sampleSize);
int chosen[] = new int[sampleSize];
List<List<T>> result = new ArrayList<List<T>>();
for (int current = 0; current < numElements; current++)
{
List<T> element = new ArrayList<T>(sampleSize);
for (int i = 0; i < sampleSize; i++)
{
element.add(list.get(chosen[i]));
}
result.add(element);
increase(chosen, list.size());
}
return result;
}
private static void increase(int chosen[], int inputSize)
{
int index = chosen.length - 1;
while (index >= 0)
{
if (chosen[index] < inputSize - 1)
{
chosen[index]++;
return;
}
chosen[index] = 0;
index--;
}
}
}
So if you have a basic set S = {1, 2}, then K = 2 and the set of subsets of S is {{}, {1}, {2}, {1,2}}. Assume n is still 2. Then your output will be something like
({}, {})
({1}, {}); ({2}, {}); ({}, {1}); ({}, {2})
({1,2}, {}); ({}, {1,2})
({1}, {1}); ({1}, {2}); ({2}, {1}); ({2}, {2})
({1}, {1,2}); ({1,2}, {1}); ({2}, {1,2}); ({1,2}, {2})
({1,2}, {1,2})
Correct? The ordering with the output is a bit difficult because the result isn't fully ordered. But it still boils down to counting. Not, as I initially thought, (K+1)-ary but more (2^K)-ary.
In order to determine if one set is a subset of another, using primes might be an idea.
You assign a prime number to each element of your original set. In my example, that would be 2 and 3. The set of subsets can be built by generating all products of the prime numbers. In my example that would be {1 /* empty set */, 2, 3, 6}.
If you have two sets, represented by your product it is easy to test the inclusion:
if (a % b == 0) then b is a subset of a
It's just a bunch of ideas, but they might help you find a solution. Of course, the prime trick only works for a relatively small number of elements in your original set, but as soon as K and N grow, you'll get problems anyway. (The number of elements in your output will be (2^K)^N = 2^(NK). If K == N == 5, you'll have 2^(5 * 5) = 2^25, about 33.5 million output elements. And here the prime idea still works.)
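As a small sketch of the prime idea, the code below encodes a subset as the product of the primes assigned to its elements and tests inclusion with a single modulo operation. The class name PrimeSubsetSketch and the methods encode and isSubset are made up for this illustration, and it assumes each element index is passed at most once per subset.

```java
// Hypothetical example class illustrating the prime-product encoding.
public class PrimeSubsetSketch {

    // One prime per element of the base set (enough for up to 6 elements here).
    private static final int[] PRIMES = {2, 3, 5, 7, 11, 13};

    // Encodes a subset, given as distinct indices into the base set,
    // as the product of the corresponding primes. The empty set is 1.
    static long encode(int... elementIndices) {
        long product = 1;
        for (int index : elementIndices) {
            product *= PRIMES[index];
        }
        return product;
    }

    // small is a subset of large exactly when small's product divides large's.
    static boolean isSubset(long small, long large) {
        return large % small == 0;
    }

    public static void main(String[] args) {
        long empty = encode();       // {}      -> 1
        long first = encode(0);      // {e0}    -> 2
        long second = encode(1);     // {e1}    -> 3
        long both = encode(0, 1);    // {e0,e1} -> 6

        System.out.println(isSubset(first, both));   // prints true
        System.out.println(isSubset(both, first));   // prints false
        System.out.println(isSubset(empty, second)); // prints true
        System.out.println(isSubset(first, second)); // prints false
    }
}
```

A bit mask per subset would support the same test with (small & large) == small and would not overflow as quickly, but the prime encoding is what the text above describes.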
Edit: Well I wrote a small Java Program to show my ideas.
save it to Hasse.java
compile it: javac Hasse.java
run it: java Hasse > hasse.dot
run dot: dot -Tpdf -ohasse.pdf hasse.dot
view it: acroread hasse.pdf
Source Code:
import java.lang.*;
import java.util.*;
public class Hasse {
private static int K[] = { 1, 2, 3 };
private static int N = 2;
private static int prime[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
//
// PK[0][] is the array of "subsets"
// PK[1][] is the array of number of elements of K participating in the subset
//
private static int PK[][];
// some constants; the initialization is clear enough
private static final long twoNK = pow(2, N * K.length);
private static final int twoK = (int) pow(2, K.length);
private static final int NK = N * K.length;
private static final long NKf = fac(NK);
//
// this power function isn't suitable for large powers
// but in the range we are working, it's OK
//
public static long pow(int b, int p)
{
long result = 1;
for (int i = 0; i < p; ++i)
result *= b;
return result;
}
// fac calculates n! (needed for the a over b calculation)
public static long fac(int n)
{
long result = 1;
for (int i = n; i > 0; --i) result *= i;
return result;
}
//
// constructPK builds the set of subsets of K
// a subset is represented by a product of primes
// each element k_i of K has an associated prime p_i
// since the prime factorization of a number is unique,
// the product can be translated into a subset and vice versa
//
public static void constructPK()
{
int i, cnt;
int numElms = twoK;
PK = new int[2][numElms];
for (i = 0; i < numElms; ++i) {
int j = i;
cnt = 0;
PK[0][i] = 1;
PK[1][i] = 0;
while (j > 0) {
if (j % 2 == 1) {
PK[0][i] *= prime[cnt];
PK[1][i]++;
}
cnt++;
j /= 2;
}
}
}
// we have a k-ary number (that is: binary if k == 2, octal if k == 8,
// and so on)
// the addOne() function calculates the next number based on the input
public static void addOne(int kAry[])
{
int i = 0;
kAry[i] += 1;
while (kAry[i] >= twoK) {
kAry[i] = 0;
++i;
kAry[i] += 1;
}
}
// the addN() function is similar to the addOne() function
// with the difference that it adds n to the input, not just 1
public static void addN(int kAry[], int n)
{
int i = 0;
kAry[i] += n;
for (i = 0; i < N - 1; ++i) {
while (kAry[i] >= twoK) {
kAry[i] -= twoK;
kAry[i+1] += 1;
}
}
}
// from the k-ary number, which represents a node in the graph,
// the "level" is calculated.
public static int getLevel(int kAry[])
{
int level = 0;
for (int i = 0; i < N; ++i) {
level += PK[1][kAry[i]];
}
return level;
}
// output function for a node
public static String renderNode(int kAry[])
{
StringBuffer sb = new StringBuffer();
String sep = "";
sb.append("(");
for (int i = 0; i < N; ++i) {
String setSep = "";
int p = PK[0][kAry[i]];
sb.append(sep);
sb.append("{");
for (int j = 0; j < K.length; ++j) {
if (p % prime[j] == 0) {
sb.append(setSep + K[j]);
setSep = ", ";
}
}
sb.append("}");
sep = ", ";
}
sb.append(")");
return sb.toString();
}
// This function calculates the numerical representation
// of a node, addressed by its level and position within the level,
// in the k-ary number system
// if there's a more elegant way of finding the node, it would
// largely speed up the calculation, since this function is needed
// for calculating the edges
public static int[] getKAry(int level, int node)
{
int kAry[] = new int[N];
int nodesSoFar = 0;
for (int i = 0; i < N; ++i) kAry[i] = 0;
for (int cnt = 0; cnt < twoNK; ++cnt) {
if (getLevel(kAry) == level) {
if (nodesSoFar == node) {
return kAry;
} else
nodesSoFar++;
}
if (cnt + 1 < twoNK)
addOne(kAry);
}
return null;
}
// this function converts the decimal nodeNumber to
// its k-ary representation
public static int[] getKAry(int nodeNumber)
{
int kAry[] = new int[N];
for (int i = 0; i < N; ++i) kAry[i] = 0;
addN(kAry, nodeNumber);
return kAry;
}
public static String getLabel(int level, int node)
{
int kAry[] = getKAry(level, node);
return (kAry == null ? "Oops!" : renderNode(kAry));
}
public static void printPK()
{
System.out.println("# Number of elements: " + PK[0].length);
for (int i = 0; i < PK[0].length; ++i) {
System.out.println("# PK[0][" + i + "] = " + PK[0][i] + ",\tPK[1][" + i + "] = " + PK[1][i]);
}
}
public static void printPreamble()
{
System.out.println("digraph G {");
System.out.println("ranksep = 3");
System.out.println();
}
public static void printEnd()
{
System.out.println("}");
}
public static void printNodes()
{
int numNodes;
for (int i = 0; i <= NK; ++i) {
int level = i + 1;
numNodes = (int) (NKf / (fac(i) * fac(NK - i)));
for (int j = 0; j < numNodes; ++j) {
System.out.println("level_" + level + "_" + (j+1) + " [shape=box,label=\"" + getLabel(i, j) + "\"];");
}
System.out.println();
}
System.out.println();
}
// having two vectors of "sets", this function determines
// if each set in the ss (small set) vector is a subset of
// the corresponding set in the ls (large set) vector
public static boolean isSubset(int ss[], int ls[])
{
for (int i = 0; i < N; ++i)
if (PK[0][ls[i]] % PK[0][ss[i]] != 0) return false;
return true;
}
// this function finds and prints the edges
// it is called about twoNK times (once for each node)
// therefore performance optimizations have to be done here
public static void printEdges(int level, int node, int nodeNumber)
{
// look the node up by level and position, matching the labels from printNodes()
int kAry[] = getKAry(level, node);
int nlAry[];
int numNodes = (int) (NKf / (fac(level + 1) * fac(NK - level - 1)));
String myNode = "level_" + (level + 1) + "_" + (node + 1);
for (int i = 0; i < numNodes; ++i) {
nlAry = getKAry(level + 1, i);
if (nlAry == null) System.exit(1);
if (isSubset(kAry, nlAry)) {
System.out.println(myNode + " -> level_" + (level + 2) + "_" + (i + 1));
}
}
}
// this function renders the dot file
// first some initial text (preamble),
// then the nodes and the edges
// and finally the closing brace
public static void renderDot()
{
int numNodes;
int nodeNumber = 0;
printPreamble();
printNodes();
for (int level = 0; level < NK; ++level) {
numNodes = (int) (NKf / (fac(level) * fac(NK - level)));
for (int node = 0; node < numNodes; ++node) {
// find the edges to the nodes on the next level
printEdges(level, node, nodeNumber);
++nodeNumber;
}
System.out.println();
}
printEnd();
}
public static void main (String argv[])
{
constructPK();
renderDot();
}
}
I have a continuously running thread in my application that holds a HashSet of all the symbols used by the application. As originally designed, the thread's while (true) loop iterates over the HashSet continuously and updates the database for every symbol it contains.
The HashSet may contain up to about 6000 symbols. I don't want to update the DB with all 6000 symbols at once. Instead, I want to divide the HashSet into subsets of 500 each (12 sets), process each subset individually, and have the thread sleep for 15 minutes after each subset, so that I can reduce the pressure on the database.
How can I partition a set into smaller subsets and process them? (I have seen examples for partitioning an ArrayList or a TreeSet, but didn't find any for a HashSet.)
This is my code (sample snippet):
package com.ubsc.rewji.threads;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;
public class TaskerThread extends Thread {
private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
String symbols[] = new String[] { "One", "Two", "Three", "Four" };
Set<String> allSymbolsSet = Collections
.synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));
public void addsymbols(String commaDelimSymbolsList) {
if (commaDelimSymbolsList != null) {
String[] symAr = commaDelimSymbolsList.split(",");
for (int i = 0; i < symAr.length; i++) {
priorityBlocking.add(symAr[i]);
}
}
}
public void run() {
while (true) {
try {
while (priorityBlocking.peek() != null) {
String symbol = priorityBlocking.poll();
allSymbolsSet.add(symbol);
}
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
while (ite.hasNext()) {
String symbol = ite.next();
if (symbol != null && symbol.trim().length() > 0) {
try {
updateDB(symbol);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Thread.sleep(2000);
} catch (Exception e) {
e.printStackTrace();
}
}
}
public void updateDB(String symbol) {
System.out.println("THE SYMBOL BEING UPDATED IS" + " " + symbol);
}
public static void main(String args[]) {
TaskerThread taskThread = new TaskerThread();
taskThread.start();
String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
taskThread.addsymbols(commaDelimSymbolsList);
}
}
With Guava:
for (List<String> partition : Iterables.partition(yourSet, 500)) {
// ... handle partition ...
}
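Or Apache Commons Collections, whose ListUtils.partition takes a List, so copy the set into a list first:
for (List<String> partition : ListUtils.partition(new ArrayList<>(yourSet), 500)) {
// ... handle partition ...
}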
Do something like this:
private static final int PARTITIONS_COUNT = 12;
List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
theSets.add(new HashSet<Type>());
}
int index = 0;
for (Type object : originalSet) {
theSets.get(index++ % PARTITIONS_COUNT).add(object);
}
Now you have partitioned the originalSet into 12 other HashSets.
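As a rough, self-contained illustration of the round-robin idea above, the following sketch distributes the elements over a fixed number of sets. The class name `RoundRobinPartitionSketch` and the method `roundRobin` are made up for this example. Note that this approach fixes the number of subsets rather than their size: 6000 symbols give 12 sets of 500, but a smaller input simply gives 12 smaller sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RoundRobinPartitionSketch {

    // Distributes the elements of 'source' over 'count' new HashSets in turn.
    // (Hypothetical helper, not part of any library.)
    static <T> List<Set<T>> roundRobin(Set<T> source, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        List<Set<T>> buckets = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            buckets.add(new HashSet<>());
        }
        int index = 0;
        for (T element : source) {
            // Element number 'index' goes to bucket index % count.
            buckets.get(index % count).add(element);
            index++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        for (int i = 0; i < 6000; i++) {
            symbols.add("SYM" + i);
        }
        List<Set<String>> parts = roundRobin(symbols, 12);
        for (Set<String> part : parts) {
            System.out.println(part.size()); // 500 for each of the 12 sets
        }
    }
}
```

If the subset size matters more than the number of subsets, a size-based split is the better fit.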
We can use the following approach to divide a Set.
For the sample input in main, we will get the output
[a, b]
[c, d]
[e]
// requires: java.util.ArrayList, java.util.HashSet, java.util.Iterator, java.util.List, java.util.Set
private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
List<Set<String>> list = new ArrayList<>();
Iterator<String> iterator = set.iterator();
while(iterator.hasNext())
{
// start a new subset and fill it with up to partitionSize elements
Set<String> newSet = new HashSet<>();
for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
{
newSet.add(iterator.next());
}
list.add(newSet);
}
return list;
}
public static void main(String[] args)
{
Set<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");
set.add("d");
set.add("e");
int size = 2;
List<Set<String>> list = partitionSet(set, size);
for(int i = 0; i < list.size(); i++)
{
Set<String> s = list.get(i);
System.out.println(s);
}
}
If extra memory is not a concern, you can do this cleanly with Guava's Lists.partition:
List<List<T>> partitionList = Lists.partition(new ArrayList<>(inputSet), PARTITION_SIZE);
List<Set<T>> partitionSet = partitionList.stream().<Set<T>>map(HashSet::new).collect(Collectors.toList());
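If Guava is not on the classpath, the same copy-then-slice idea can be sketched with the JDK alone, using `List.subList` in place of `Lists.partition`. This is only an illustration: the class name `SubListPartitionSketch` and its `partition` method are invented here, and the extra list copy is the memory cost mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SubListPartitionSketch {

    // Copies the set into a list, then wraps consecutive slices of at most
    // 'partitionSize' elements in new HashSets. (Hypothetical helper.)
    static <T> List<Set<T>> partition(Set<T> input, int partitionSize) {
        if (partitionSize < 1) {
            throw new IllegalArgumentException("partitionSize must be at least 1");
        }
        List<T> copy = new ArrayList<>(input);
        List<Set<T>> result = new ArrayList<>();
        for (int from = 0; from < copy.size(); from += partitionSize) {
            int to = Math.min(from + partitionSize, copy.size());
            result.add(new HashSet<>(copy.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> input = new HashSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        // Five elements in chunks of two give three subsets (sizes 2, 2 and 1).
        for (Set<String> part : partition(input, 2)) {
            System.out.println(part);
        }
    }
}
```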
The Guava solution from @Andrey_chaschev seems the best, but if you cannot use Guava, I believe the following would help (note that it streams over the set again for every chunk):
public static List<Set<String>> partition(Set<String> set, int chunk) {
if(set == null || set.isEmpty() || chunk < 1)
return new ArrayList<>();
List<Set<String>> partitionedList = new ArrayList<>();
double loopsize = Math.ceil((double) set.size() / (double) chunk);
for(int i =0; i < loopsize; i++) {
partitionedList.add(set.stream().skip((long)i * chunk).limit(chunk).collect(Collectors.toSet()));
}
return partitionedList;
}
A very simple way for your actual problem would be to change your code as follows:
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int remaining = 500; // process at most 500 symbols in this pass
while (remaining-- > 0 && ite.hasNext()) {
A general method would be to use the iterator to take the elements out one by one in a simple loop:
int remaining = 500; // take at most 500 elements
while (remaining-- > 0 && ite.hasNext()) {
sublist.add(ite.next());
ite.remove();
}
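For the question's actual goal (batches of 500 with a pause in between), the counting loop can be wrapped in an outer loop that keeps using the same iterator, so each batch continues where the previous one stopped. The sketch below is one possible arrangement, not the only one: the class `BatchedUpdateSketch`, the method `processInBatches` and its parameters are hypothetical, the `updater` callback stands in for `updateDB`, and it iterates over a snapshot instead of removing elements from the set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class BatchedUpdateSketch {

    // Walks the symbols once with a single iterator, passing them to 'updater'
    // in batches of at most 'batchSize' and sleeping between batches.
    // Returns the number of batches that were started.
    static int processInBatches(Set<String> symbols, int batchSize,
                                long pauseMillis, Consumer<String> updater) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Copy under the set's lock so later additions cannot disturb the
        // iteration (Collections.synchronizedSet locks on the wrapper itself).
        List<String> snapshot;
        synchronized (symbols) {
            snapshot = new ArrayList<>(symbols);
        }
        Iterator<String> ite = snapshot.iterator();
        int batches = 0;
        while (ite.hasNext()) {
            int remaining = batchSize;
            while (remaining-- > 0 && ite.hasNext()) {
                updater.accept(ite.next());
            }
            batches++;
            if (ite.hasNext() && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis); // e.g. 15 minutes in the question
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    break;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        Set<String> symbols = Collections.synchronizedSet(
                new HashSet<>(Arrays.asList("a", "b", "c", "d", "e")));
        // Batch size 2 and no pause, just to keep the demo fast.
        int batches = processInBatches(symbols, 2, 0L,
                s -> System.out.println("updating " + s));
        System.out.println(batches + " batches"); // 5 symbols in batches of 2: 3 batches
    }
}
```

In the thread from the question, this call would replace the inner iteration in run(), with a batch size of 500 and the pause set to the desired 15 minutes.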