Partition a Set into smaller Subsets and process as batch - java

I have a continuously running thread in my application that holds a HashSet of all the symbols in the application. As currently designed, the thread's while (true) loop iterates over the HashSet continuously and updates the database for every symbol in it.
The HashSet will hold at most around 6000 symbols. Rather than updating the DB with all 6000 symbols at once, I want to divide the HashSet into subsets of 500 each (12 sets), process each subset individually, and have the thread sleep for 15 minutes after each subset, to reduce the load on the database.
How can I partition a set into smaller subsets and process each one? (I have seen examples of partitioning an ArrayList or a TreeSet, but didn't find any for a HashSet.)
This is my code (a sample snippet):
package com.ubsc.rewji.threads;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;
public class TaskerThread extends Thread {
private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
String symbols[] = new String[] { "One", "Two", "Three", "Four" };
Set<String> allSymbolsSet = Collections
.synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));
public void addsymbols(String commaDelimSymbolsList) {
if (commaDelimSymbolsList != null) {
String[] symAr = commaDelimSymbolsList.split(",");
for (int i = 0; i < symAr.length; i++) {
priorityBlocking.add(symAr[i]);
}
}
}
public void run() {
while (true) {
try {
while (priorityBlocking.peek() != null) {
String symbol = priorityBlocking.poll();
allSymbolsSet.add(symbol);
}
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
while (ite.hasNext()) {
String symbol = ite.next();
if (symbol != null && symbol.trim().length() > 0) {
try {
updateDB(symbol);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Thread.sleep(2000);
} catch (Exception e) {
e.printStackTrace();
}
}
}
public void updateDB(String symbol) {
System.out.println("THE SYMBOL BEING UPDATED IS" + " " + symbol);
}
public static void main(String args[]) {
TaskerThread taskThread = new TaskerThread();
taskThread.start();
String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
taskThread.addsymbols(commaDelimSymbolsList);
}
}

With Guava:
for (List<String> partition : Iterables.partition(yourSet, 500)) {
// ... handle partition ...
}
Or Apache Commons Collections (ListUtils.partition takes a List, so copy the set into one first):
for (List<String> partition : ListUtils.partition(new ArrayList<>(yourSet), 500)) {
// ... handle partition ...
}

Do something like this (round-robin distribution, where Type is your element type):
private static final int PARTITIONS_COUNT = 12;
List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
theSets.add(new HashSet<Type>());
}
int index = 0;
for (Type object : originalSet) {
theSets.get(index++ % PARTITIONS_COUNT).add(object);
}
Now you have partitioned the originalSet into 12 other HashSets.
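The same round-robin idea can be wrapped as a reusable generic method. This is a minimal sketch; partitionRoundRobin is a hypothetical name. Note that it splits by count, not by size: 6000 symbols give 12 sets of 500, but a smaller set gives 12 correspondingly smaller sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RoundRobinPartition {

    // Deals the elements of `original` into `count` sets one at a time,
    // so the set sizes differ by at most one.
    public static <T> List<Set<T>> partitionRoundRobin(Set<T> original, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        List<Set<T>> parts = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            parts.add(new HashSet<>());
        }
        int index = 0;
        for (T element : original) {
            parts.get(index++ % count).add(element);
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        for (int i = 0; i < 6000; i++) {
            symbols.add("SYM" + i);
        }
        List<Set<String>> parts = partitionRoundRobin(symbols, 12);
        for (Set<String> part : parts) {
            System.out.println(part.size()); // 500 for each of the 12 sets
        }
    }
}
```

If you need a fixed batch size of 500 regardless of how many symbols there are, partition by size instead (as in the Guava and iterator-based answers).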

We can use the following approach to divide a Set into chunks of a given size.
For the five-element example in main below, the output is:
[a, b]
[c, d]
[e]
private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
if (partitionSize < 1)
{
// a size below 1 would never advance the iterator and loop forever
throw new IllegalArgumentException("partitionSize must be >= 1");
}
List<Set<String>> list = new ArrayList<>();
Iterator<String> iterator = set.iterator();
while(iterator.hasNext())
{
Set<String> newSet = new HashSet<>();
for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
{
newSet.add(iterator.next());
}
list.add(newSet);
}
return list;
}
public static void main(String[] args)
{
Set<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");
set.add("d");
set.add("e");
int size = 2;
List<Set<String>> list = partitionSet(set, size);
for(int i = 0; i < list.size(); i++)
{
Set<String> s = list.get(i);
System.out.println(s);
}
}

If you are not too worried about the extra memory (the set is copied into a list first), you can do it cleanly with Guava's Lists.partition:
List<List<T>> partitionList = Lists.partition(new ArrayList<>(inputSet), PARTITION_SIZE);
List<Set<T>> partitionSet = partitionList.stream().<Set<T>>map(HashSet::new).collect(Collectors.toList());

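The Guava solution from @Andrey_chaschev seems the best, but if Guava is not available, the following should help. Note that every chunk re-streams the set from the beginning (roughly n²/(2·chunk) element visits in total), and it relies on the set not being modified between chunks so that it iterates in the same order each time: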
public static List<Set<String>> partition(Set<String> set, int chunk) {
if(set == null || set.isEmpty() || chunk < 1)
return new ArrayList<>();
List<Set<String>> partitionedList = new ArrayList<>();
double loopsize = Math.ceil((double) set.size() / (double) chunk);
for(int i =0; i < loopsize; i++) {
partitionedList.add(set.stream().skip((long)i * chunk).limit(chunk).collect(Collectors.toSet()));
}
return partitionedList;
}

A very simple way for your actual problem would be to change your code as follows:
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int i = 0;
while (i++ < 500 && ite.hasNext()) {
// ... rest of the loop body unchanged ...
A general method is to use the iterator to take the elements out one by one in a simple loop (note that ite.remove() also removes them from the original set):
List<String> sublist = new ArrayList<>();
int i = 0;
while (i++ < 500 && ite.hasNext()) {
sublist.add(ite.next());
ite.remove();
}
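Putting the batching together for the question's thread: a minimal sketch, assuming it is acceptable to process a snapshot of the symbols on each pass (symbols added during a pass are picked up on the next one). processInBatches and its parameters are hypothetical names. The documentation of Collections.synchronizedSet requires callers to synchronize on the set while iterating it, which is why the snapshot is taken inside a synchronized block.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class BatchUpdater {

    // Processes a snapshot of syncSet in batches of batchSize, calling update for each
    // element and sleeping pauseMillis between batches. Returns the number of batches.
    public static <T> int processInBatches(Set<T> syncSet, int batchSize, long pauseMillis,
                                           Consumer<T> update) throws InterruptedException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<T> snapshot;
        // Collections.synchronizedSet requires holding the set's lock while iterating it.
        synchronized (syncSet) {
            snapshot = new ArrayList<>(syncSet);
        }
        int batches = 0;
        for (int from = 0; from < snapshot.size(); from += batchSize) {
            int to = Math.min(from + batchSize, snapshot.size());
            for (T item : snapshot.subList(from, to)) {
                update.accept(item);
            }
            batches++;
            if (to < snapshot.size()) {
                Thread.sleep(pauseMillis); // pause only between batches, not after the last one
            }
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> symbols = Collections.synchronizedSet(new HashSet<>());
        for (int i = 0; i < 1200; i++) {
            symbols.add("SYM" + i);
        }
        List<String> updated = new ArrayList<>();
        // A 10 ms pause keeps the demo fast; the question's thread would pause 15 minutes.
        int batches = processInBatches(symbols, 500, 10L, updated::add);
        System.out.println(batches + " batches, " + updated.size() + " updates"); // 3 batches, 1200 updates
    }
}
```

In the question's run() method this could replace the inner iterator loop, for example processInBatches(allSymbolsSet, 500, TimeUnit.MINUTES.toMillis(15), this::updateDB), with InterruptedException handled by the existing catch block.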

Related

How to save permutation in a Set Java

I have a method that prints all permutations of the Set I pass in as a parameter. However, I need to save them in two separate sets and compare them. For instance, given [5,6,3,1] and [5,6,1,3], I can add each one to a separate BST and use the compareTo function to check whether their level order is the same. But I am having trouble saving the permutations produced by my method into a set in main. Does anyone know how to do that?
What I have now:
import edu.princeton.cs.algs4.BST;
import java.util.*;
public class MyBST {
public static void main(String[] args) {
int size = 4;
BST<Integer, Integer> bst1 = new BST<Integer, Integer>();
BST<Integer, Integer> bst2 = new BST<Integer, Integer>();
Random r = new Random();
Set<Integer> tes = new LinkedHashSet<>(size);
Stack<Integer> stack = new Stack<>();
while (tes.size() < size) {
tes.add(r.nextInt(10));
}
System.out.println(tes);
System.out.println("possible combinations");
Iterator<Integer> it = tes.iterator();
for (int i = 0; i < tes.toArray().length; i++) {
Integer key = it.next();
bst1.put(key, 0);
}
combos(tes, stack, tes.size());
}
}
and here is the method I use:
public static void combos(Set<Integer> items, Stack<Integer> stack, int size) {
if (stack.size() == size) {
System.out.println(stack);
}
Integer[] itemz = items.toArray(new Integer[0]);
for (Integer i : itemz) {
stack.push(i);
items.remove(i);
combos(items, stack, size);
items.add(stack.pop());
}
}
And this is the output:
I'm not sure I understood your idea, but maybe this will help:
Your combos method can return a set of all permutations (as Stacks):
...
for (int i = 0; i < tes.toArray().length; i++) {
Integer key = it.next();
bst1.put(key, 0);
}
Set<Stack<Integer>> combos = combos(tes, stack, tes.size()); //there you have set with all Stacks
}
}
public static Set<Stack<Integer>> combos(Set<Integer> items, Stack<Integer> stack, int size) {
Set<Stack<Integer>> set = new HashSet<>();
if(stack.size() == size) {
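System.out.println(stack);
Stack<Integer> copy = new Stack<>();
copy.addAll(stack); // copy, because stack keeps changing as the recursion continues
set.add(copy);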
}
Integer[] itemz = items.toArray(new Integer[0]);
for(Integer i : itemz) {
stack.push(i);
items.remove(i);
set.addAll(combos(items, stack, size));
items.add(stack.pop());
}
return set;
}
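A self-contained sketch of the same idea without the algs4 BST or the Stack: permutations (a hypothetical helper) returns a List<List<T>> in generation order instead of printing. Because List.equals compares element by element, two saved permutations can be compared directly, and each one can then be inserted into its own BST in order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Permutations {

    // Returns every ordering of `items`; each permutation is saved as its own list.
    public static <T> List<List<T>> permutations(Set<T> items) {
        List<List<T>> result = new ArrayList<>();
        collect(new ArrayList<>(items), new boolean[items.size()], new ArrayList<>(), result);
        return result;
    }

    // Backtracking: extend `current` with each unused item, recurse, then undo the choice.
    private static <T> void collect(List<T> items, boolean[] used, List<T> current,
                                    List<List<T>> result) {
        if (current.size() == items.size()) {
            result.add(new ArrayList<>(current)); // copy, because `current` keeps changing
            return;
        }
        for (int i = 0; i < items.size(); i++) {
            if (!used[i]) {
                used[i] = true;
                current.add(items.get(i));
                collect(items, used, current, result);
                current.remove(current.size() - 1);
                used[i] = false;
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> tes = new LinkedHashSet<>(List.of(5, 6, 3, 1));
        List<List<Integer>> all = permutations(tes);
        System.out.println(all.size()); // 24
        System.out.println(all.get(0)); // [5, 6, 3, 1]
        System.out.println(all.get(1)); // [5, 6, 1, 3]
    }
}
```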

How to overcome this stack overflow issue when finding SCCs?

This is the code I wrote to find SCCs using Kosaraju's two-pass algorithm. When I run the main method, I get a StackOverflowError in SCC.revDFS. How can I avoid the stack overflow error when there is a large number of recursive calls?
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Arrays;
import java.util.Scanner;
public class SCC {
int n = 875714;
Map<Integer,List<Integer>> adjList;
Map<Integer,List<Integer>> adjListRev;
int[] ft;
int t;
int s;
boolean[] marked;
int[] leaders;
public SCC() {
init();
t = 0;
s = 0;
marked = new boolean[n + 1];
leaders = new int[n + 1];
}
void init() {
adjList = new HashMap<Integer,List<Integer>>();
adjListRev = new HashMap<Integer,List<Integer>>();
ft = new int[n + 1];
List<Integer> adj;
try {
Scanner scanner = new Scanner (new InputStreamReader(this.getClass().
getClassLoader().getResourceAsStream("SCC.txt")));
while(scanner.hasNextLine()) {
String s = scanner.nextLine().trim();
String[] num = s.split(" ");
if (!adjList.containsKey(Integer.parseInt(num[0]))) {
adjList.put(Integer.parseInt(num[0]), new ArrayList<Integer>());
}
adj = adjList.get(Integer.parseInt(num[0]));
adj.add(Integer.parseInt(num[1]));
adjList.put(Integer.parseInt(num[0]), adj);
if (!adjListRev.containsKey(Integer.parseInt(num[1]))) {
adjListRev.put(Integer.parseInt(num[1]), new ArrayList<Integer>());
}
adj = adjListRev.get(Integer.parseInt(num[1]));
adj.add(Integer.parseInt(num[0]));
adjListRev.put(Integer.parseInt(num[1]), adj);
}
} catch (Exception e) {
e.printStackTrace();
}
}
public void DFS_Loop() {
for (int i = 1; i < n + 1; i++) {
marked[i] = false;
}
for (int i = n; i > 0; i--) {
if (!marked[i]) {
revDFS(i);
}
}
for (int i = 1; i < n + 1; i++) {
marked[i] = false;
leaders[i] = 0;
}
for (int i = n; i > 0; i--) {
if (!marked[ft[i]]) {
s = ft[i];
DFS(ft[i]);
}
}
}
public void revDFS(int i) {
marked[i] = true;
List<Integer> edges = adjListRev.get(i);
if (edges != null) {
for (int j: edges) {
if (!marked[j]) {
revDFS(j);
}
}
}
t += 1;
ft[t] = i;
}
public void DFS(int i) {
marked[i] = true;
leaders[s] += 1;
List<Integer> edges = adjList.get(i);
if (edges != null) {
for (int j: edges) {
if (!marked[j]) {
DFS(j);
}
}
}
}
public static void main(String[] args) {
SCC scc = new SCC();
scc.DFS_Loop();
Arrays.sort(scc.leaders);
for (int i = scc.n; i > scc.n - 5; i--) { // the five largest SCC sizes
System.out.println(scc.leaders[i]);
}
}
}
Maybe you can try converting the logic to an iterative approach. Also, check that base and edge cases are handled properly.
The basic idea behind converting a recursive function into an iterative one is that a recursive function consumes its arguments from a stack (the call stack).
So you can create your own stack, push the values onto it, and consume them in a loop.
public void _revDFS(int _i) {
LinkedList<Integer> stack = new LinkedList<>(); // needs import java.util.LinkedList
stack.push(_i);
while(!stack.isEmpty()){
int i = stack.pop();
if (marked[i]) {
continue; // a vertex can be pushed several times before it is first popped
}
marked[i] = true;
List<Integer> edges = adjListRev.get(i);
if (edges != null) {
for (int j: edges) {
if (!marked[j]) {
stack.push(j);
//revDFS(j);
}
}
}
t += 1;
ft[t] = i;
}
}
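I can't really test this to check for mistakes, and revDFS is a function with a lot of side effects that does not return a value, so it is a bit difficult to reason about. One difference to be aware of: this version records ft[t] = i when a vertex is popped (pre-order), whereas the recursive version records it only after all of the vertex's descendants are finished (post-order), and Kosaraju's second pass depends on those finishing times.
But the gist is that, instead of having the function call itself, you push the neighbouring vertex indexes onto the stack and then consume them.
The child vertices will be processed in reverse order, so if you want to keep the processing order of the original, read the edges in reverse order: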
ListIterator<Integer> li = edges.listIterator(edges.size());
while(li.hasPrevious()){
int j = li.previous();
if (!marked[j]) {
stack.push(j);
//revDFS(j);
}
}
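For ft[] to hold the same finishing order as the recursive revDFS, a vertex must be recorded only after all of its neighbours have been explored. One way to do that iteratively is to keep, for each vertex on the stack, the index of the next neighbour to look at. The sketch below is standalone: finishingOrder is a hypothetical name, and the Map-based graph is an assumption mirroring adjListRev. It tries start vertices from n down to 1 like DFS_Loop, and the returned list corresponds to ft[1..n].

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterativeFinishingOrder {

    // Iterative DFS that records vertices in the order they FINISH (post-order).
    // Vertices are numbered 1..n; `graph` maps a vertex to its neighbours.
    public static List<Integer> finishingOrder(Map<Integer, List<Integer>> graph, int n) {
        boolean[] marked = new boolean[n + 1];
        List<Integer> finished = new ArrayList<>();
        // Each frame is {vertex, index of the next neighbour to examine}.
        Deque<int[]> stack = new ArrayDeque<>();
        for (int start = n; start > 0; start--) {
            if (marked[start]) {
                continue;
            }
            marked[start] = true;
            stack.push(new int[] {start, 0});
            while (!stack.isEmpty()) {
                int[] frame = stack.peek();
                List<Integer> edges = graph.getOrDefault(frame[0], Collections.emptyList());
                if (frame[1] < edges.size()) {
                    int next = edges.get(frame[1]++);
                    if (!marked[next]) {
                        marked[next] = true;
                        stack.push(new int[] {next, 0});
                    }
                } else {
                    stack.pop();
                    finished.add(frame[0]); // all neighbours done: the vertex finishes now
                }
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, List.of(2));
        graph.put(2, List.of(3));
        graph.put(3, List.of(1));
        graph.put(4, List.of(3));
        System.out.println(finishingOrder(graph, 4)); // [2, 1, 3, 4]
    }
}
```

The explicit stack lives on the heap, so the depth of the traversal is no longer limited by the thread's call-stack size.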
You have implemented your DFS function recursively, which causes the stack overflow: with 875,714 vertices the recursion can go deeper than the thread's call stack allows. To overcome this, implement it with an explicit stack data structure.
See the link below for more examples:
https://github.com/sinamalakouti/MyFavoriteAlgorithmProblems

[Hackerrank][Performance Improvement] Similar Destinations

I am currently solving a challenge that I found on Hackerrank and need some help with code optimization and performance. I've managed to get my code working and returning the right results, but it fails the final test case with a timeout error. The input is quite large, which explains why the code is taking longer than expected.
Problem statement: Similar Destinations
I've tried to think of different ways of pruning my (intermediate) result set but could not come up with anything I did not already have. I believe the find function could use a bit more tweaking. I've tried my best to reduce the number of paths the recursive function has to take, but ultimately it has to look at every destination to come up with the right results. However, I do terminate a recursive path if the number of tags in common between destinations is below the min limit. Is there anything else I could do here?
My code is as follows:
static class Destination {
String dest;
List<String> tags;
public Destination(String dest, List<String> tags) {
this.dest = dest;
this.tags = tags;
}
@Override
public String toString() {
return dest;
}
}
static List<Destination> allDest = new ArrayList<Destination>();
static int min;
static Set<String> keysTracker = new HashSet<String>();
static Set<String> tagsTracker = new HashSet<String>();
static Map<String, List<String>> keysAndTags = new HashMap<String, List<String>>();
static void find(List<String> commonKey, List<String> commonTags, int index) {
if (index >= allDest.size())
return;
if (commonTags.size() < min)
return;
if (tagsTracker.contains(commonTags.toString()) || keysTracker.contains(commonKey.toString())) {
return;
}
String dest = allDest.get(index).dest;
commonKey.add(dest);
for (int i = index + 1; i < allDest.size(); ++i) {
List<String> tempKeys = new ArrayList<String>(commonKey);
List<String> tags = allDest.get(i).tags;
List<String> tempTags = new ArrayList<String>(commonTags);
tempTags.retainAll(tags);
find(tempKeys, tempTags, i);
if (tempTags.size() >= min) {
if (!tagsTracker.contains(tempTags.toString())
&& !keysTracker.contains(tempKeys.toString())) {
tagsTracker.add(tempTags.toString());
keysTracker.add(tempKeys.toString());
StringBuilder sb = new StringBuilder();
for (int j = 0; j < tempKeys.size(); ++j) {
sb.append(tempKeys.get(j));
if (j + 1 < tempKeys.size())
sb.append(",");
}
keysAndTags.put(sb.toString(), tempTags);
}
}
}
}
public static void main(String[] args) {
init();
sort();
calculate();
answer();
}
static void init() {
Scanner s = new Scanner(System.in);
min = s.nextInt();
s.nextLine();
String line;
while (s.hasNextLine()) {
line = s.nextLine();
if (line.isEmpty())
break;
String[] tokens = line.split(":");
String dest = tokens[0];
tokens = tokens[1].split(",");
List<String> tags = new ArrayList<String>();
for (int j = 0; j < tokens.length; ++j)
tags.add(tokens[j]);
Collections.sort(tags);
Destination d = new Destination(dest, tags);
allDest.add(d);
}
s.close();
}
static void sort() {
Collections.sort(allDest, new Comparator<Destination>() {
@Override
public int compare(Destination d1, Destination d2) {
return d1.dest.compareTo(d2.dest);
}
});
}
static void calculate() {
for (int i = 0; i < allDest.size() - 1; ++i) {
find(new ArrayList<String>(), new ArrayList<String>(allDest.get(i).tags), i);
}
}
static void answer() {
List<Map.Entry<String, List<String>>> mapInListForm = sortAnswer();
for (Map.Entry<String, List<String>> entry : mapInListForm) {
System.out.print(entry.getKey() + ":");
for (int i = 0; i < entry.getValue().size(); ++i) {
System.out.print(entry.getValue().get(i));
if (i + 1 < entry.getValue().size())
System.out.print(",");
}
System.out.println();
}
}
static List<Map.Entry<String, List<String>>> sortAnswer() {
List<Map.Entry<String, List<String>>> mapInListForm =
new LinkedList<Map.Entry<String, List<String>>>(keysAndTags.entrySet());
Collections.sort(mapInListForm, new Comparator<Map.Entry<String, List<String>>>() {
public int compare(Map.Entry<String, List<String>> e1, Map.Entry<String, List<String>> e2) {
if (e1.getValue().size() > e2.getValue().size()) {
return -1;
} else if (e1.getValue().size() < e2.getValue().size()) {
return 1;
}
return e1.getKey().compareTo(e2.getKey());
}
});
return mapInListForm;
}
Any help is greatly appreciated. Thanks!
I've managed to solve the problem after a bit of selective profiling. It seems my initial hunch was right: the problem had less to do with the algorithm and more to do with the data structures I was using. The culprit was in the find method, specifically the call to retainAll on two lists. I had forgotten that intersecting two lists this way takes O(n^2) time, because every element of one list is looked up with a linear scan of the other. That was why it was slow. I then changed the lists into HashSets; a HashSet has an expected O(1) time complexity for contains. The retainAll call stayed, but instead of finding the intersection between two lists, it now finds the intersection between two sets. That shaved a couple of seconds off the total elapsed runtime, and all the tests passed. :)
The find method now looks like this:
static void find(List<String> commonKey, List<String> commonTags, int index) {
if (index >= allDest.size())
return;
if (commonTags.size() < min)
return;
if (tagsTracker.contains(commonTags.toString()) || keysTracker.contains(commonKey.toString())) {
return;
}
String dest = allDest.get(index).dest;
commonKey.add(dest);
for (int i = index + 1; i < allDest.size(); ++i) {
List<String> tempKeys = new ArrayList<String>(commonKey);
List<String> tags = allDest.get(i).tags;
Set<String> tempTagsSet1 = new HashSet<String>(commonTags);
Set<String> tempTagsSet2 = new HashSet<String>(tags);
tempTagsSet1.retainAll(tempTagsSet2);
List<String> tempTags = new ArrayList<String>(tempTagsSet1);
if (tempTags.size() >= min)
Collections.sort(tempTags);
find(tempKeys, tempTags, i);
if (tempTags.size() >= min) {
if (!tagsTracker.contains(tempTags.toString())
&& !keysTracker.contains(tempKeys.toString())) {
tagsTracker.add(tempTags.toString());
keysTracker.add(tempKeys.toString());
StringBuilder sb = new StringBuilder();
for (int j = 0; j < tempKeys.size(); ++j) {
sb.append(tempKeys.get(j));
if (j + 1 < tempKeys.size())
sb.append(",");
}
keysAndTags.put(sb.toString(), tempTags);
}
}
}
}
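A possible variant of this fix: only the argument of retainAll needs to be a hash set, because ArrayList.retainAll keeps each of its own elements for which the argument's contains returns true. Intersecting a copy of the already-sorted commonTags list this way keeps the sorted order, so the extra Collections.sort should become unnecessary. intersect is a hypothetical helper name; storing each destination's tags as a HashSet once, instead of building one per call, would also avoid repeated allocations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class SortedIntersection {

    // Returns the tags of commonTags that also occur in otherTags, in commonTags' order.
    // ArrayList.retainAll calls contains() on its argument for every element, so a HashSet
    // argument makes each check an expected O(1) lookup instead of a list scan.
    public static List<String> intersect(List<String> commonTags, List<String> otherTags) {
        List<String> result = new ArrayList<>(commonTags);
        result.retainAll(new HashSet<>(otherTags));
        return result;
    }

    public static void main(String[] args) {
        List<String> sortedTags = List.of("beach", "food", "museum", "nightlife");
        List<String> otherTags = List.of("nightlife", "beach", "hiking");
        System.out.println(intersect(sortedTags, otherTags)); // [beach, nightlife]
    }
}
```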

How to optimize performance when repeatedly looping over a big list of objects

I have a simple file that contains two integer values per line (a source integer and a target integer). Each line represents a relation between two values. The file is not sorted and the actual file contains about 4 million lines. After sorting it may look like this:
sourceId;targetId
1;5
2;3
4;7
7;4
8;7
9;5
My goal is to create a new object that will represent all unique related integers in a list with a unique identifier. The expected output of this example should be the following three objects:
0, [1, 5, 9]
1, [2, 3]
2, [4, 7, 8]
So groupId 0 contains a group of relations (1, 5 and 9).
Below is my current way to create a list of these objects. The list of Relation objects contains all the lines in memory. And the list of GroupedRelation should be the end result.
public class GroupedRelationBuilder {
private List<Relation> relations;
private List<GroupedRelation> groupedRelations;
private List<String> ids;
private int groupId;
public void build() {
relations = new ArrayList<>();
relations.add(new Relation(1, 5));
relations.add(new Relation(4, 7));
relations.add(new Relation(8, 7));
relations.add(new Relation(7, 4));
relations.add(new Relation(9, 5));
relations.add(new Relation(2, 3));
// sort
relations.sort(Comparator.comparing(Relation::getSource).thenComparing(Relation::getTarget));
// build the groupedRelations
groupId = 0;
groupedRelations = new ArrayList<>();
for (int i = 0; relations.size() > 0;) {
ids = new ArrayList<>();
int compareSource = relations.get(i).getSource();
int compareTarget = relations.get(i).getTarget();
ids.add(Integer.toString(compareSource));
ids.add(Integer.toString(compareTarget));
relations.remove(i);
for (int j = 0; j < relations.size(); j++) {
int source = relations.get(j).getSource();
int target = relations.get(j).getTarget();
if ((source == compareSource || source == compareTarget) && !ids.contains(Integer.toString(target))) {
ids.add(Integer.toString(target));
relations.remove(j);
continue;
}
if ((target == compareSource || target == compareTarget) && !ids.contains(Integer.toString(source))) {
ids.add(Integer.toString(source));
relations.remove(j);
continue;
}
}
if (relations.size() > 0) {
groupedRelations.add(new GroupedRelation(groupId++, ids));
}
}
}
class GroupedRelation {
private int groupId;
private List<String> relatedIds;
public GroupedRelation(int groupId, List<String> relations) {
this.groupId = groupId;
this.relatedIds = relations;
}
public int getGroupId() {
return groupId;
}
public List<String> getRelatedIds() {
return relatedIds;
}
}
class Relation {
private int source;
private int target;
public Relation(int source, int target) {
this.source = source;
this.target = target;
}
public int getSource() {
return source;
}
public void setSource(int source) {
this.source = source;
}
public int getTarget() {
return target;
}
public void setTarget(int target) {
this.target = target;
}
}
}
When I run this small example program, it takes 15 seconds to create 1000 GroupedRelation objects. To create 1 million GroupedRelation objects it would take 250 minutes.
I am looking for help optimizing my code, which does produce the result I want but simply takes too long.
Is it possible to optimize the iteration in such a way that the expected result is the same but the time it takes to get the expected result is reduced significantly? If this is possible, how would you go about it?
The current implementation is slow due to the ids.contains step.
The time complexity of the ArrayList.contains method is O(n):
to check if it contains an element it checks the elements one by one,
in the worst case scanning the entire list.
You can greatly improve the performance if you change the type of ids from List<String> to Set<String>, and use HashSet<String> instances.
The expected time complexity of HashSet.contains is O(1),
significantly faster compared to a list.
As much as possible, I would attempt to do it in a single pass over the source file.
import java.io.*;
import java.util.*;
/**
* Created by peter on 10/07/16.
*/
public class GroupedRelationBuilder {
public static List<List<Integer>> load(File file) throws IOException {
Map<Integer, Group> idToGroupMap = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
br.readLine();
for (String line; (line = br.readLine()) != null; ) {
String[] parts = line.split(";");
Integer source = Integer.parseInt(parts[0]);
Integer target = Integer.parseInt(parts[1]);
Group grp0 = idToGroupMap.get(source);
Group grp1 = idToGroupMap.get(target);
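if (grp0 == null) {
if (grp1 == null) {
Group grp = new Group();
List<Integer> list = grp.ids;
list.add(source);
if (!target.equals(source)) {
list.add(target); // avoid a duplicate id for a self-relation
}
idToGroupMap.put(source, grp);
idToGroupMap.put(target, grp);
} else {
grp1.ids.add(source);
idToGroupMap.put(source, grp1);
}
} else if (grp1 == null) {
grp0.ids.add(target);
idToGroupMap.put(target, grp0);
} else if (grp0 != grp1) {
// merge the smaller group into the larger one and re-point its members,
// so no id keeps referring to a stale group
if (grp0.ids.size() < grp1.ids.size()) {
Group tmp = grp0;
grp0 = grp1;
grp1 = tmp;
}
grp0.ids.addAll(grp1.ids);
for (Integer id : grp1.ids) {
idToGroupMap.put(id, grp0);
}
}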
}
}
Set<List<Integer>> idsSet = Collections.newSetFromMap(new IdentityHashMap<>());
for (Group group : idToGroupMap.values()) {
idsSet.add(group.ids);
}
return new ArrayList<>(idsSet);
}
static class Group {
List<Integer> ids = new ArrayList<>();
}
public static void main(String[] args) throws IOException {
File file = File.createTempFile("deleteme", ".txt");
file.deleteOnExit();
Set<String> pairs = new HashSet<>();
try (PrintWriter pw = new PrintWriter(file)) {
pw.println("source;target");
Random rand = new Random();
int count = 1000000;
while (pairs.size() < count) {
int a = rand.nextInt(count);
int b = rand.nextInt(count);
if (a < b) {
int t = a;
a = b;
b = t;
}
pairs.add(a + ";" + b);
}
for (String pair : pairs) {
pw.println(pair);
}
}
System.out.println("Processing");
long start = System.currentTimeMillis();
List<List<Integer>> results = GroupedRelationBuilder.load(file);
System.out.println(results.size() + " took " + (System.currentTimeMillis() - start) / 1e3 + " sec");
}
}
For one million random pairs this prints (the exact group count and time vary between runs):
Processing
105612 took 12.719 sec
Your implementation is slow due to the Integer.toString() usage.
Converting each int to a String allocates a new object, and this is done 4-5 times per iteration of the inner loop.
Changing ids to hold the integers directly took it from 126 ms to 35 ms, roughly 3.6 times faster!
Several other things I see are:
The first for loop can be changed into while (!relations.isEmpty()).
The second loop could use an iterator, for (Iterator<Relation> iterator = relations.iterator(); iterator.hasNext();), removing items with iterator.remove(). With the current index-based loop, removing the element at index j shifts the next element into position j, and j++ then skips it.
Place the declaration of ids inside the loop.
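The grouping itself is a connected-components problem, which a union-find (disjoint-set) structure handles in a single pass over the relations. A minimal sketch, assuming the ids fit in an int; RelationGroups and its methods are hypothetical names, and it uses path compression only (no union by rank). On the question's sample data it reproduces the expected three groups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RelationGroups {

    // parent.get(x) is x's parent in the disjoint-set forest; a root is its own parent.
    private final Map<Integer, Integer> parent = new HashMap<>();

    // Returns the root (representative) of x's group, registering x if it is new.
    private int find(int x) {
        parent.putIfAbsent(x, x);
        int root = x;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every node on the way directly at the root.
        while (x != root) {
            int next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Records that a and b belong to the same group.
    public void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA != rootB) {
            parent.put(rootA, rootB);
        }
    }

    // Collects every id seen so far into its group; ids within a group are sorted
    // and groups are ordered by their smallest id, so group numbers are deterministic.
    public List<List<Integer>> groups() {
        Map<Integer, List<Integer>> byRoot = new HashMap<>();
        for (int id : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(id), k -> new ArrayList<>()).add(id);
        }
        List<List<Integer>> result = new ArrayList<>(byRoot.values());
        for (List<Integer> group : result) {
            Collections.sort(group);
        }
        result.sort((g1, g2) -> Integer.compare(g1.get(0), g2.get(0)));
        return result;
    }

    public static void main(String[] args) {
        int[][] relations = {{1, 5}, {4, 7}, {8, 7}, {7, 4}, {9, 5}, {2, 3}};
        RelationGroups uf = new RelationGroups();
        for (int[] r : relations) {
            uf.union(r[0], r[1]);
        }
        List<List<Integer>> result = uf.groups();
        for (int groupId = 0; groupId < result.size(); groupId++) {
            System.out.println(groupId + ", " + result.get(groupId));
        }
        // 0, [1, 5, 9]
        // 1, [2, 3]
        // 2, [4, 7, 8]
    }
}
```

Each line of the file can be passed to union as it is read, so the relations never need to be sorted or held in a list.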

Test all possible combinations of rows

The problem is the following. There are multiple rows that have non-unique identifiers:
id value
0: {1,2,3}
0: {1,2,2}
1: {1,2,3}
2: {1,2,3}
2: {1,1,3}
I have the function equals that can compare multiple rows with each other. I need to write code that selects the rows to pass as input to equals. The selected rows must have unique ids, BUT I should check all possible combinations of unique ids. For instance, if there are 5 rows with ids 0,0,1,2,3, then I should check the following two combinations of ids: 0,1,2,3 and 0,1,2,3, because 0 appears twice. Of course, each of these two combinations will use a different one of the rows with id=0.
My code snippet is the following:
public class Test {
public static void main(String[] args) {
ArrayList<Row> allRows = new ArrayList<Row>();
allRows.add(new Row(0,new Integer[]{1,2,3}));
allRows.add(new Row(0,new Integer[]{1,2,2}));
allRows.add(new Row(1,new Integer[]{1,2,3}));
allRows.add(new Row(2,new Integer[]{1,2,3}));
allRows.add(new Row(2,new Integer[]{1,1,3}));
boolean answer = hasEqualUniqueRows(allRows);
}
private static boolean hasEqualUniqueRows(ArrayList<Row> allTokens) {
for (int i=0; i<allTokens.size(); i++) {
ArrayList<Integer[]> rows = new ArrayList<Integer[]>();
rows = findUniqueRows(i,allTokens);
boolean answer = equalsExceptForNulls(rows);
if (answer) return true;
}
return false;
}
// Compare rows for similarities
public static <T> boolean equalsExceptForNulls(ArrayList<T[]> ts) {
for (int i=0; i<ts.size(); i++) {
for (int j=0; j<ts.size(); j++) {
if (i != j) {
boolean answer = equals(ts.get(i),ts.get(j));
if (!answer) return false;
}
}
}
return true;
}
public static <T> boolean equals(T[] ts1, T[] ts2) {
if (ts1.length != ts2.length) return false;
for(int i = 0; i < ts1.length; i++) {
T t1 = ts1[i], t2 = ts2[i];
if (t1 != null && t2 != null && !t1.equals(t2))
return false;
}
return true;
}
static class Row {
private int key;
private Integer[] values;
public Row(int k,Integer[] v) {
this.key = k;
this.values = v;
}
public int getKey() {
return this.key;
}
public Integer[] getValues() {
return this.values;
}
}
}
Since the number of rows with unique ids is not known a priori, I don't know how to solve this problem. Any suggestions? Thanks.
Edit #1
I updated the code, so it's more complete now. But it lacks the implementation of the function findUniqueRows. This function should select rows from the ArrayList that have unique keys (ids). Could someone help me develop this function? Thanks.
Assuming the objective is to find every combination without duplicates, you can do this with the following. The duplicate check only confirms that it doesn't generate any duplicates in the first place.
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
public class Main {
public static void main(String... args) {
Bag<Integer> b = new Bag<>();
b.countFor(1, 2);
b.countFor(2, 1);
b.countFor(3, 3);
Set<String> set = new LinkedHashSet<>();
for (List<Integer> list : b.combinations()) {
System.out.println(list);
String s = list.toString();
if (!set.add(s))
System.err.println("Duplicate entry " + s);
}
}
}
class Bag<E> {
final Map<E, AtomicInteger> countMap = new LinkedHashMap<>();
void countFor(E e, int n) {
countMap.put(e, new AtomicInteger(n));
}
void decrement(E e) {
AtomicInteger ai = countMap.get(e);
if (ai.decrementAndGet() < 1)
countMap.remove(e);
}
void increment(E e) {
AtomicInteger ai = countMap.get(e);
if (ai == null)
countMap.put(e, new AtomicInteger(1));
else
ai.incrementAndGet();
}
List<List<E>> combinations() {
List<List<E>> ret = new ArrayList<>();
List<E> current = new ArrayList<>();
combinations0(ret, current);
return ret;
}
private void combinations0(List<List<E>> ret, List<E> current) {
if (countMap.isEmpty()) {
ret.add(new ArrayList<E>(current));
return;
}
int position = current.size();
current.add(null);
List<E> es = new ArrayList<>(countMap.keySet());
if (es.get(0) instanceof Comparable)
Collections.sort((List) es);
for (E e : es) {
current.set(position, e);
decrement(e);
combinations0(ret, current);
increment(e);
}
current.remove(position);
}
}
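If the goal is instead to pick exactly one row for each distinct id (as in the question's 0,1,2,3 example), that is the Cartesian product of the rows grouped by id. A minimal sketch: oneRowPerId and this Row class are hypothetical stand-ins for the question's findUniqueRows and Row; each resulting combination's values arrays could then be passed to equalsExceptForNulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueIdCombinations {

    // Minimal stand-in for the question's Row: an id plus its values.
    public static final class Row {
        public final int id;
        public final Integer[] values;

        public Row(int id, Integer... values) {
            this.id = id;
            this.values = values;
        }

        @Override
        public String toString() {
            return id + ":" + Arrays.toString(values);
        }
    }

    // Returns every selection that contains exactly one row for each distinct id
    // (ids in first-seen order).
    public static List<List<Row>> oneRowPerId(List<Row> rows) {
        Map<Integer, List<Row>> byId = new LinkedHashMap<>();
        for (Row row : rows) {
            byId.computeIfAbsent(row.id, k -> new ArrayList<>()).add(row);
        }
        List<List<Row>> result = new ArrayList<>();
        choose(new ArrayList<>(byId.values()), 0, new ArrayList<>(), result);
        return result;
    }

    // Picks one row from groups[index], then recurses into the next id group.
    private static void choose(List<List<Row>> groups, int index, List<Row> current,
                               List<List<Row>> result) {
        if (index == groups.size()) {
            result.add(new ArrayList<>(current)); // copy: `current` is reused
            return;
        }
        for (Row row : groups.get(index)) {
            current.add(row);
            choose(groups, index + 1, current, result);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<Row> allRows = List.of(
                new Row(0, 1, 2, 3), new Row(0, 1, 2, 2), new Row(1, 1, 2, 3),
                new Row(2, 1, 2, 3), new Row(2, 1, 1, 3));
        for (List<Row> combination : oneRowPerId(allRows)) {
            System.out.println(combination);
        }
        // [0:[1, 2, 3], 1:[1, 2, 3], 2:[1, 2, 3]]
        // [0:[1, 2, 3], 1:[1, 2, 3], 2:[1, 1, 3]]
        // [0:[1, 2, 2], 1:[1, 2, 3], 2:[1, 2, 3]]
        // [0:[1, 2, 2], 1:[1, 2, 3], 2:[1, 1, 3]]
    }
}
```

The number of combinations is the product of the group sizes, so it grows quickly when many ids are duplicated.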
