I have a HashMap<GC, List<RR>> with sample data like:
key values
gc1 - rr1
- rr2
- rr3
gc2 - rr4
- rr5
gc3 - rr6
And I need to create all possible combinations of RR from different GC like:
Combination1: rr1, rr4, rr6
Combination2: rr1, rr5, rr6
Combination3: rr2, rr4, rr6
Combination4: rr2, rr5, rr6
Combination5: rr3, rr4, rr6
Combination6: rr3, rr5, rr6
What I've tried so far is, as @Sanket Makani suggests, to turn my HashMap<GC, List<RR>> into a List<List<RR>>, and then iterate through all the elements like:
List<List<RR>> inputList = new ArrayList<List<RR>>();
for (Map.Entry<GC, List<RR>> rrList : map.entrySet()) {
inputList.add(rrList.getValue());
}
List<List<RR>> combinationsList = new ArrayList<List<RR>>();
for (List<RR> rrList : inputList) {
List<RR> rrList1 = new ArrayList<RR>();
for (RR rr : rrList) {
rrList1.add(rr);
}
combinationsList.add(rrList1);
}
This is not working for me, as it groups all the RR inside one GC like:
Combination1: rr1, rr2, rr3
Combination2: rr4, rr5
Combination3: rr6
So my question is: how can I adapt my code to obtain the expected result?
PS: I'm working with Java 6 unfortunately, so no lambdas/streams allowed.
PS2: I've seen similar questions, but can't find an exact example of what I'm looking for.
EDIT:
This is my final implementation with @nandsito's answer:
//this method groups RRs by GC key with a given list
HashMap<GC, List<RR>> GCRRHashMap = groupRRsByGC(list);
List<Map.Entry<GC, List<RR>>> mapEntryList = new ArrayList<Map.Entry<GC, List<RR>>>(GCRRHashMap.entrySet());
List<List<RR>> combinationsList = new ArrayList<List<RR>>();
List<RR> combinations = new ArrayList<RR>();
generateCombinations(mapEntryList, combinations, combinationsList);
private void generateCombinations(
List<Map.Entry<GC, List<RR>>> mapEntryList,
List<RR> combinations, List<List<RR>> combinationsList) {
if (mapEntryList.isEmpty()) {
combinationsList.add(new ArrayList<RR>(combinations));
return;
}
Map.Entry<GC, List<RR>> entry = mapEntryList.remove(0);
List<RR> entryValue = new ArrayList<RR>(entry.getValue());
while (!entryValue.isEmpty()) {
RR rr = entryValue.remove(0);
combinations.add(rr);
generateCombinations(mapEntryList, combinations, combinationsList);
combinations.remove(combinations.size() - 1);
}
mapEntryList.add(0, entry);
}
Here's a recursive solution:
public static void main(String[] args) {
// your data map
Map<GC, List<RR>> map;
// the map entry set as list, which will help
// combining the elements
//
// note this is a modifiable list
List<Map.Entry<GC, List<RR>>> mapEntryList =
new ArrayList<Map.Entry<GC, List<RR>>>(map.entrySet());
// the combinations list, which will store
// the desired results
List<RR> combinations = new ArrayList<RR>();
doRecursion(mapEntryList, combinations);
}
private static void doRecursion(
List<Map.Entry<GC, List<RR>>> mapEntryList,
List<RR> combinations) {
// end of recursion
if (mapEntryList.isEmpty()) {
// do what you wish
//
// here i print each element of the combination
for (RR rr : combinations) {
System.out.println(rr);
}
System.out.println();
return;
}
// remove one GC from the entry list,
// then for each RR from the taken GC
// put RR in the combinations list,
// call recursion
// then remove RR from the combinations list
// end for each
// then put GC back into its list
Map.Entry<GC, List<RR>> entry = mapEntryList.remove(0);
List<RR> entryValue = new ArrayList<RR>(entry.getValue());
while (!entryValue.isEmpty()) {
RR rr = entryValue.remove(0);
combinations.add(rr);
doRecursion(mapEntryList, combinations);
combinations.remove(combinations.size() - 1);
}
mapEntryList.add(0, entry);
}
All you really need to do is work through an incrementing index list:
0,0,0
0,1,0
1,0,0
1,1,0
2,0,0
... etc.
It should be obvious how to translate each of those rows into values from your data structure, e.g. 0,0,0 maps to rr1, rr4, rr6. This will involve converting the map into a list so that indexes are consistent.
It's very much like a normal base-b count where you increment the rightmost column and if it overflows, set to zero and increment the next one. The only difference is that each column overflows at a different number.
So:
boolean increment(int[] indexes) {
int column = 0;
while(true) {
indexes[column]++;
if(indexes[column] < numberOfRRsInColumn(column)) {
return true; // finished
}
indexes[column]=0;
column++;
if(column == indexes.length) {
return false; // can't increment, no more.
}
}
}
This implementation uses indexes[0] as the "rightmost" column. I've glossed over numberOfRRsInColumn(), but it should be pretty obvious how to do it.
Then:
int[] indexes = new int[mapsize];
do {
output(translate(map, indexes));
} while (increment(indexes));
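The pieces glossed over above can be filled in along the following lines. This is only a sketch, assuming the map values have already been copied into a List<List<T>>; the class name IndexCounter, the sizes array (standing in for numberOfRRsInColumn()) and the combinations() helper are names made up for this illustration. It sticks to Java 6 syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class IndexCounter {
    // Advance the index array like an odometer whose column i wraps at sizes[i].
    // Returns false once every column has wrapped, i.e. nothing is left to visit.
    static boolean increment(int[] indexes, int[] sizes) {
        for (int column = 0; column < indexes.length; column++) {
            indexes[column]++;
            if (indexes[column] < sizes[column]) {
                return true;
            }
            indexes[column] = 0; // overflow: reset and carry into the next column
        }
        return false;
    }

    // Pick one element per list according to the current indexes.
    static <T> List<T> translate(List<List<T>> lists, int[] indexes) {
        List<T> row = new ArrayList<T>(indexes.length);
        for (int i = 0; i < indexes.length; i++) {
            row.add(lists.get(i).get(indexes[i]));
        }
        return row;
    }

    // Collect every combination that takes exactly one element from each list.
    static <T> List<List<T>> combinations(List<List<T>> lists) {
        List<List<T>> result = new ArrayList<List<T>>();
        int[] sizes = new int[lists.size()];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = lists.get(i).size();
            if (sizes[i] == 0) {
                return result; // an empty list means no combination exists
            }
        }
        int[] indexes = new int[sizes.length];
        do {
            result.add(translate(lists, indexes));
        } while (increment(indexes, sizes));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = new ArrayList<List<String>>();
        lists.add(Arrays.asList("rr1", "rr2", "rr3"));
        lists.add(Arrays.asList("rr4", "rr5"));
        lists.add(Arrays.asList("rr6"));
        for (List<String> combination : combinations(lists)) {
            System.out.println(combination);
        }
    }
}
```

Because indexes[0] is the fastest-moving column, the combinations come out in a different order than in the question, but the same six rows are produced for the sample data.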
In a way, it is the reverse of the problem of generating subsets of size k from an array containing k+1 elements.
For example, if somebody gives me the pairs {a,b}, {a,c}, {b,c}, {a,e}, {b,e}, {a,f}, I need an algorithm that will tell me the triplets {a,b,c} and {a,b,e} are completely covered for their pairwise combinations in the pairs given to me. I need to generalize from pair/triplet in my example to the case k/k+1.
My hunch was that there would be a well-documented and efficient algorithm that solves my problem. Sadly, searching the internet did not turn one up, and the questions already posted on Stack Overflow do not cover this problem, so I am posting this question to find a solution.
I'm not familiar with an established algorithm for this, and you didn't ask for a specific language, so I've written a C# algorithm that accomplishes what you've asked and matches the test values provided. It doesn't have much real-world error checking. I've got a .NET Fiddle you can run to see the results in a web browser: https://dotnetfiddle.net/ErwTeg
It works by converting your array of arrays (or similar container) to a dictionary with every unique value as a key and the value for each key being every value that is found within any list with the key. From your sample, a gets {b,c,e,f} (We'll call them friends, and this is what the GetFriends function does)
The AreFriendsWithEachother function indicates whether or not all passed values are friends with all other values.
The friends list is then fed to the MakeTeams function, which makes teams of a given size by enumerating every friend that a key has and trying every permutation of length size. For instance, in the original example a has friend permutations of {{a,b,c},{a,b,e},{a,b,f},{a,c,b},{a,c,e},{a,c,f},{a,e,b},{a,e,c},{a,e,f},{a,f,b},{a,f,c},{a,f,e}}. Of these, we make sure that all three values are friends by checking the friends list we created earlier. If all values within a permutation are friends, we add it to our results cache. The results are then culled of duplicate sets. This is handled in C# by using HashSet, which only adds items that aren't already in the set.
The MakeTeams function looks terrible because the number of nested loops (normally written as foreach) is only known at runtime. I roll up and down through enumerators and emulate the foreach loops myself.
I've included versions for MakeTeamsOf3 and MakeTeamsOf4, which show static loop structures that are easily adapted when you know your k value ahead of time.
The same code is provided here
using System;
using System.Collections.Generic;
using System.Linq;
namespace kfromkm1 // k elements from k minus 1
{
public class Program
{
static readonly string[][] pairs =
{
new string[] { "a", "b" },
new string[] { "a", "c" },
new string[] { "b", "c" },
new string[] { "a", "e" },
new string[] { "b", "e" },
new string[] { "a", "f" }
};
static readonly string[][] pairsExpectedResult =
{
new string[] { "a", "b", "c" },
new string[] { "a", "b", "e" }
};
static readonly string[][] triplets =
{
new string[] { "a", "b", "c" },
new string[] { "a", "b", "d" },
new string[] { "a", "c", "d" },
new string[] { "b", "c", "d" },
new string[] { "b", "c", "e" }
};
static readonly string[][] tripletsExpectedResults =
{
new string[] { "a", "b", "c", "d" }
};
public static void Main(string[] args)
{
Dictionary<string, HashSet<string>> friendsList = GetFriends(pairs);
Dump(nameof(pairs), pairs);
Console.WriteLine();
Dump(nameof(pairsExpectedResult), pairsExpectedResult);
Console.WriteLine();
HashSet<HashSet<string>> teams = MakeTeams(friendsList, 3);
Dump(nameof(teams), teams);
Console.WriteLine();
friendsList = GetFriends(triplets);
Dump(nameof(triplets), triplets);
Console.WriteLine();
Dump(nameof(tripletsExpectedResults), tripletsExpectedResults);
Console.WriteLine();
teams = MakeTeams(friendsList, 4);
Dump(nameof(teams), teams);
Console.ReadLine();
}
// helper function to display results
static void Dump<T>(string name, IEnumerable<IEnumerable<T>> values)
{
Console.WriteLine($"{name} =");
int line = 0;
bool notfirst;
foreach (IEnumerable<T> layer in values)
{
Console.Write($"{line}: {{");
notfirst = false;
foreach (T value in layer)
{
if (notfirst)
Console.Write($", {value}");
else
{
Console.Write(value);
notfirst = true;
}
}
Console.WriteLine("}");
line++;
}
}
// items are friends if they show up in a set (pair in the example) together
// list can be a list of lists, array of arrays, list of arrays, etc
// {a, b} means a and b are friends
// {a, b, c} means a is friends with b and c, b is friends with a and c, c is friends with a and b
static Dictionary<T, HashSet<T>> GetFriends<T>(IEnumerable<IEnumerable<T>> list) where T : IEquatable<T>
{
Dictionary<T, HashSet<T>> result = new Dictionary<T, HashSet<T>>();
foreach (IEnumerable<T> set in list) // one set at a time
{
foreach (T current in set) // enumerate the set from front to back
{
foreach (T other in set) // enumerate the set with a second pointer to compare every item
{
if (!current.Equals(other)) // ignore self
{
if (!result.ContainsKey(current)) // initialize this item's result hashset
result[current] = new HashSet<T>();
result[current].Add(other); // add friend (hashset will ignore duplicates)
}
}
}
}
return result;
}
// indicates whether or not all items are friends
static bool AreFriendsWithEachother<T>(Dictionary<T, HashSet<T>> friendsList, IEnumerable<T> values)
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
foreach (T first in values)
{
if (!friendsList.ContainsKey(first)) // not on list, has no friends
return false;
foreach (T other in values)
{
if (!friendsList[first].Contains(other) && !first.Equals(other)) // false if even one doesn't match, don't count self as non-friend for computational ease
return false;
}
}
return true; // all matched so true
}
// size represents how many items should be in each team
static HashSet<HashSet<T>> MakeTeams<T>(Dictionary<T, HashSet<T>> friendsList, int size) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
if (size < 2)
throw new ArgumentOutOfRangeException(nameof(size), size, "Size should be at least 2");
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values = new T[size];
IEnumerator<T>[] enumerators = new IEnumerator<T>[size - 1]; // gotta cache our own enumerators with a variable number of "foreach" layers
int layer;
bool moveNext;
foreach (T key in friendsList.Keys) // this is a mess because it's a runtime variable number of copies of enumerators running over the same list
{
values[0] = key;
for (int index = 0; index < size - 1; index++)
enumerators[index] = friendsList[key].GetEnumerator();
moveNext = true;
layer = 0;
while (moveNext)
{
while (layer < size - 1 && moveNext)
{
if (enumerators[layer].MoveNext())
layer++;
else
{
if (layer == 0)
moveNext = false;
else
{
enumerators[layer].Reset();
layer--;
}
}
}
for (int index = 1; index < size; index++)
values[index] = enumerators[index - 1].Current;
if (values.Distinct().Count() == size && AreFriendsWithEachother(friendsList, values))
result.Add(new HashSet<T>(values));
layer--;
}
}
return result;
}
// provided as an example
static HashSet<HashSet<T>> MakeTeamsOf3<T>(Dictionary<T, HashSet<T>> friendsList) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values;
foreach (T key in friendsList.Keys) // start with every key
{
foreach (T first in friendsList[key])
{
foreach (T second in friendsList[key])
{
values = new T[] { key, first, second };
if (values.Distinct().Count() == 3 && AreFriendsWithEachother(friendsList, values)) // there's no duplicates and they are friends
result.Add(new HashSet<T>(values));
}
}
}
return result;
}
// provided as an example
static HashSet<HashSet<T>> MakeTeamsOf4<T>(Dictionary<T, HashSet<T>> friendsList) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values;
foreach (T key in friendsList.Keys) // start with every key
{
foreach (T first in friendsList[key])
{
foreach (T second in friendsList[key])
{
foreach (T third in friendsList[key])
{
values = new T[] { key, first, second, third };
if (values.Distinct().Count() == 4 && AreFriendsWithEachother(friendsList, values)) // there's no duplicates and they are friends
result.Add(new HashSet<T>(values));
}
}
}
}
return result;
}
}
}
Function to generate SetOfkNbrdElementCombinations
//to generate outputs with k values greater than two (pairwise)
Take SetOfkNbrdElementCombinations as an input
//Example - {{a,b},{b,c},...} : here k is 2 (though variable name will retain the letter k); elements are a,b,c,..; sets {a,b}, {b,c} are 2-numbered combinations of elements
Take nextSize as an input
//nextSize should be bigger than the k in the input SetOfkNbrdElementCombinations by 1.
//For example above where k is 2, nextSize would be 3
//Logic:
Comb(SetOfkNbrdElementCombinations={S1,S2,...Sn},nextSize) = {S1,Comb({SetOfkNbrdElementCombinations-S1},nextSize-1)}
//The recursive algorithm specified in the line above generates sets containing unique nextSize numbered combinations of the combinations in SetOfkNbrdElementCombinations
//Code that implements the algorithm is available at Rosetta code
//In our example it would generate {{{a,b},{b,c},{b,e}},{{a,b},{b,c},{a,c}},...} (triplets of pairs)
//My logic to generate nextSize numbered combinations of elements is below
// Example of my output, based on the example input above, would be {{a,b,c},{a,c,e},...}
Initialize oputSetOfkNbrdElementCombinations to empty
For each nextSize sized combination of combinations generated above
Join the contained combinations in a union set
If the nbr of elements in the union is nextSize, add the union set to oputSetOfkNbrdElementCombinations
Output oputSetOfkNbrdElementCombinations
Here is the Java implementation of the algorithm. You can copy, paste and run on https://ide.geeksforgeeks.org/
/* This program takes k sized element combinations and generates the k+1 sized element combinations
that are possible.
For example if the program is given {a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}, {b,c,e}
which are 3 sized combinations, it will identify {a,b,c,d} the
4 sized combination that has all the 3 sized combinations of its elements covered
in what were provided to the program
The program can scale to higher values of k.
The program uses only the hashset data structure
*/
//AUTHOR: Suri Chitti
import java.util.*;
public class uppOrdCombsFromCombs {
//sample CSV strings...let us pretend they came from a file
//This is a sample of input to the program
static String[] csvStrings = new String[] {
"a,b,c",
"a,b,d",
"a,c,d",
"b,c,d",
"b,c,e"
};
/* //Another sample CSV strings...let us pretend they came from a file
//This is another sample of input to the program
static String[] csvStrings = new String[] {
"a,b",
"b,c",
"a,c",
"c,e",
"a,e"
};
*/ /////USE ONLY ONE SAMPLE
//Before we can generate a k+1 sized combination of elements from a bunch
//of k sized combinations we need to obtain groups containing k+1 number of
// k sized combinations
//The method below, called SetOfNxtSizeNbrdkElementCombinationsets will do it for us
//It takes a bunch of k sized combinations called the parameter hsSetOfkNbrdCombinationsetsPrm
//which is a hashset.
//It also takes k+1 as input called the parameter nextSize which is an integer
//Outputs k+1 sized groups of k sized element combinations as a variable called hsSetOfNxtSizeNbrdCombinationsets
//which is hashset
static HashSet SetOfNxtSizeNbrdCombinationsets(HashSet hsSetOfkNbrdCombinationsetsPrm, Integer nextSize){
HashSet hsSetOfNxtSizeNbrdCombinationsets = new HashSet<>();//is it better to have nested <HashSet> tokens in this declaration?
HashSet hsRecursor = new HashSet<>(hsSetOfkNbrdCombinationsetsPrm);
Iterator <HashSet> loopIterator1 = hsSetOfkNbrdCombinationsetsPrm.iterator();
while (loopIterator1.hasNext()) {
HashSet hsName = loopIterator1.next();
if(nextSize == 1){
hsSetOfNxtSizeNbrdCombinationsets.add(hsName);
}
else {
HashSet hsConc1 = new HashSet<>();
hsRecursor.remove(hsName);
hsConc1 = SetOfNxtSizeNbrdCombinationsets(hsRecursor,nextSize-1);
Iterator <HashSet> loopIterator2 = hsConc1.iterator();
while (loopIterator2.hasNext()) {
HashSet hsConc2 = new HashSet<>();
HashSet hsConc3 = new HashSet<>();
hsConc2 = loopIterator2.next();
Iterator <HashSet> loopIterator3 = hsConc2.iterator();
Object obj = loopIterator3.next();
if (String.class.isInstance(obj)) {
hsConc3.add(hsName);
hsConc3.add(hsConc2);
}
else {
loopIterator3 = hsConc2.iterator();
hsConc3.add(hsName);
while (loopIterator3.hasNext()) {
hsConc3.add(loopIterator3.next());
}
}
hsSetOfNxtSizeNbrdCombinationsets.add(hsConc3);
}
}
}
return hsSetOfNxtSizeNbrdCombinationsets;
}
//The method below takes the k+1 sized groupings of k sized element combinations
//generated by the method above and generates all possible K+1 sized combinations of
//elements contained in them
//Name of the method is SetOfkNbrdCombinationsets
//It takes the k+1 sized groupings in a parameter called hsSetOfNxtSizeNbrdCombinationsetsPrm which is a HashSet
//It takes the value k+1 as a parameter called nextSize which is an Integer
//It returns k+1 sized combinations as a variable called hsSetOfkNbrdCombinationsets which is a HashSet
//This is the intended output of the whole program
static HashSet SetOfkNbrdCombinationsets(HashSet hsSetOfNxtSizeNbrdCombinationsetsPrm, Integer nextSize){
HashSet hsSetOfkNbrdCombinationsets = new HashSet<>();
HashSet hsMember = new HashSet<>();
Iterator <HashSet> loopIteratorOverParam = hsSetOfNxtSizeNbrdCombinationsetsPrm.iterator();
while (loopIteratorOverParam.hasNext()) {
hsMember = loopIteratorOverParam.next();
HashSet hsInnerUnion = new HashSet<>();
Iterator <HashSet> loopIteratorOverMember = hsMember.iterator();
while (loopIteratorOverMember.hasNext()) {
HashSet hsInnerMemb = new HashSet<>(loopIteratorOverMember.next());
hsInnerUnion.addAll(hsInnerMemb);
}
if (hsInnerUnion.size()==nextSize) {
HashSet hsTemp = new HashSet<>(hsInnerUnion);
hsSetOfkNbrdCombinationsets.add(hsTemp);
}
hsInnerUnion.clear();
}
return hsSetOfkNbrdCombinationsets;
}
public static void main(String args[]) {
HashSet hsSetOfkNbrdCombinationsets = new HashSet<>();//should this have nested <HashSet> tokens?
HashSet hsSetOfNxtSizeNbrdCombinationsets = new HashSet<>();//should this have nested <HashSet> tokens?
Integer innerSize=0,nextSize = 0;
System.out.println("Ahoy");
//pretend we are looping through lines in a file here
for(String line : csvStrings)
{
String[] linePieces = line.split(",");
List<String> csvPieces = new ArrayList<String>(linePieces.length);
for(String piece : linePieces)
{
//System.out.println(piece); will print each piece in separate lines
csvPieces.add(piece);
}
innerSize = csvPieces.size();
Set<String> hsInner = new HashSet<String>(csvPieces);
hsSetOfkNbrdCombinationsets.add(hsInner);
}
nextSize = innerSize+1; //programmatically obtain nextSize
hsSetOfNxtSizeNbrdCombinationsets = SetOfNxtSizeNbrdCombinationsets(hsSetOfkNbrdCombinationsets,nextSize);
hsSetOfkNbrdCombinationsets = SetOfkNbrdCombinationsets(hsSetOfNxtSizeNbrdCombinationsets, nextSize);
System.out.println("The " + nextSize + " sized combinations from elements present in the input combinations are: " + hsSetOfkNbrdCombinationsets);
} //end of main
} //end of class
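On the question in the comments about nested generic types: a typed variant tends to be easier to follow. The following is a minimal sketch of the same goal rather than a rewrite of the program above. It uses a different enumeration: instead of building every group of k+1 input sets, it only considers unions of two input sets that have exactly k+1 elements, and then checks that every k-sized subset of that union was given. The class and method names are made up for this illustration, and all input sets are assumed to have the same size k.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name; a sketch, not the author's implementation.
class CoveredSupersets {
    // Returns every set of size k+1 whose k-sized subsets all appear in kSets.
    // Assumes every set in kSets has the same size k.
    static Set<Set<String>> coveredSupersets(Set<Set<String>> kSets) {
        Set<Set<String>> result = new HashSet<Set<String>>();
        for (Set<String> a : kSets) {
            for (Set<String> b : kSets) {
                Set<String> candidate = new HashSet<String>(a);
                candidate.addAll(b);
                // Two k-sets that differ in exactly one element form a (k+1)-set.
                if (candidate.size() == a.size() + 1
                        && allSubsetsPresent(candidate, kSets)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    // Drop each element in turn and check that the remaining k-set was given.
    static boolean allSubsetsPresent(Set<String> candidate, Set<Set<String>> kSets) {
        for (String dropped : candidate) {
            Set<String> subset = new HashSet<String>(candidate);
            subset.remove(dropped);
            if (!kSets.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    // Small convenience for building the sample input.
    static Set<String> setOf(String... items) {
        return new HashSet<String>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<Set<String>> triplets = new HashSet<Set<String>>();
        triplets.add(setOf("a", "b", "c"));
        triplets.add(setOf("a", "b", "d"));
        triplets.add(setOf("a", "c", "d"));
        triplets.add(setOf("b", "c", "d"));
        triplets.add(setOf("b", "c", "e"));
        System.out.println(coveredSupersets(triplets));
    }
}
```

With n input sets this examines n^2 pairs and does k+1 hash lookups per surviving candidate, so it avoids enumerating all groups of k+1 input sets.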
I am trying to write a while loop that will continue to iterate until the nodes list does not have a certain key in its maps. My code looks like this:
List<Map<Integer, Integer>> nodes = new LinkedList<Map<Integer, Integer>>();
List<Integer> parent = new LinkedList<Integer>();
.
.
.
while (parent != null) {
int vertex = parent.remove(0);
while(//The problem )
}
}
I will be pulling the integer from parent and placing it into vertex and will be using vertex to find the key in the nodes. What would the call look like to find the integer in nodes?
This might help
for (int vertex : parent) {
for (Map<Integer, Integer> entry : nodes) {
if (entry.containsKey(vertex)) {
// this map has the key, write your logic and return
return entry;
}
}
}
I'm not exactly sure if that's what you want but it might.
List<Map<Integer, Integer>> nodes = new LinkedList<Map<Integer, Integer>>();
LinkedList<Integer> parent = new LinkedList<Integer>();
// ^ or Queue to use .poll() which removes the first item
Integer parentItem;
while ((parentItem = parent.poll()) != null) {
// check if that item is somewhere in the maps
boolean inMaps = false;
for(Map<Integer, Integer> map : nodes) {
if (map.containsKey(parentItem)) {
inMaps = true;
break;
}
}
// if it is not do something special, maybe "return" or "break;"
if (!inMaps) {
// do something.
}
}
I have an algorithmic problem at hand. To easily explain the problem, I will be using a simple analogy.
I have an input file
Country,Exports
Australia,Sheep
US,Apple
Australia,Beef
End Goal:
I have to find the common products between the pairs of countries so
{"Australia","New Zealand"}:{"apple","sheep"}
{"Australia","US"}:{"apple"}
{"New Zealand","US"}:{"apple","milk"}
Process :
I read in the input and store it in a TreeMap<String, List<String>>, where the strings in each List are interned because there are many duplicates.
Essentially, I am aggregating by country, where the key is the country and the values are its exports.
{"australia":{"apple","sheep","koalas"}}
{"new zealand":{"apple","sheep","milk"}}
{"US":{"apple","beef","milk"}}
I have about 1200 keys (countries), and the total number of values (exports) is 80 million altogether.
I sort all the values of each key:
{"australia":{"apple","sheep","koalas"}} --> {"australia":{"apple","koalas","sheep"}}
This is fast as there are only 1200 Lists to sort.
for(k1:keys)
for(k2:keys)
if(k1.compareTo(k2) <0){ //Dont want to double compare
List<String> intersectList = intersectList_func(k1's exports,k2's exports);
countriespair.put({k1,k2},intersectList)
}
This code block takes very long. I realise it is O(n^2), with around 1200*1200 comparisons. It has been running for almost 3 hours so far.
Is there any way I can speed it up or optimise it?
Is a better algorithm the best option, or are there other technologies to consider?
Edit:
Since both Lists are sorted beforehand, intersectList is linear in the lengths of the two lists (at most listOne.size() + listTwo.size() comparisons), and NOT O(n^2) as discussed below.
private static List<String> intersectList(List<String> listOne, List<String> listTwo) {
int i = 0, j = 0;
List<String> listResult = new LinkedList<String>();
while (i != listOne.size() && j != listTwo.size()) {
int compareVal = listOne.get(i).compareTo(listTwo.get(j));
if (compareVal == 0) {
// same string in both lists: keep it and advance both cursors
listResult.add(listOne.get(i));
i++; j++;
}
else if (compareVal < 0) i++; // listOne is behind, advance it
else j++; // listTwo is behind, advance it
}
return listResult;
}
Update 22 Nov
My current implementation is still running for almost 18 hours. :|
Update 25 Nov
I ran the new implementation as suggested by Vikram and a few others. It has been running since Friday.
My question is: how does grouping by exports rather than by country reduce the computational complexity? I find that the complexity is the same. As Groo mentioned, the complexity of the second part is O(E*C^2), where E is the number of exports and C is the number of countries.
This can be done in one statement as a self-join using SQL:
Test data. First create a test data set:
Lines <- "Country,Exports
Austrailia,Sheep
Austrailia,Apple
New Zealand,Apple
New Zealand,Sheep
New Zealand,Milk
US,Apple
US,Milk
"
DF <- read.csv(text = Lines, as.is = TRUE)
sqldf. Now that we have DF, issue this command:
library(sqldf)
sqldf("select a.Country, b.Country, group_concat(Exports) Exports
from DF a, DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
")
giving this output:
Country Country Exports
1 Austrailia New Zealand Sheep,Apple
2 Austrailia US Apple
3 New Zealand US Apple,Milk
With index. If it's too slow, add an index to the Country column (and be sure not to forget the main. parts):
sqldf(c("create index idx on DF(Country)",
"select a.Country, b.Country, group_concat(Exports) Exports
from main.DF a, main.DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
"))
If you run out of memory, add the dbname = tempfile() sqldf argument so that it uses disk.
Store something like the following data structure (the following is pseudocode):
ValuesSet ={
apple = {"Austrailia","New Zealand"..}
sheep = {"Austrailia","New Zealand"..}
}
for k in ValuesSet
for k1 in k.values()
for k2 in k.values()
if(k1<k2)
Set(k1,k2).add(k)
Time complexity: O(number of distinct pairs with similar products)
Note: I might be wrong, but I do not think you can reduce this time complexity.
The following is a Java implementation for your problem:
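import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
public class PairMatching {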
HashMap Country;
ArrayList CountNames;
HashMap ProdtoIndex;
ArrayList ProdtoCount;
ArrayList ProdNames;
ArrayList[][] Pairs;
int products=0;
int countries=0;
public void readfile(String filename) {
try {
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String line;
CountNames = new ArrayList();
Country = new HashMap<String,Integer>();
ProdtoIndex = new HashMap<String,Integer>();
ProdtoCount = new ArrayList<ArrayList>();
ProdNames = new ArrayList();
products = countries = 0;
while((line=br.readLine())!=null) {
String[] s = line.split(",");
s[0] = s[0].trim();
s[1] = s[1].trim();
int k;
if(!Country.containsKey(s[0])) {
CountNames.add(s[0]);
Country.put(s[0],countries);
k = countries;
countries++;
}
else {
k =(Integer) Country.get(s[0]);
}
if(!ProdtoIndex.containsKey(s[1])) {
ProdNames.add(s[1]);
ArrayList n = new ArrayList();
ProdtoIndex.put(s[1],products);
n.add(k);
ProdtoCount.add(n);
products++;
}
else {
int ind =(Integer)ProdtoIndex.get(s[1]);
ArrayList c =(ArrayList) ProdtoCount.get(ind);
c.add(k);
}
}
System.out.println(CountNames);
System.out.println(ProdtoCount);
System.out.println(ProdNames);
} catch (FileNotFoundException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
}
}
void FindPairs() {
Pairs = new ArrayList[countries][countries];
for(int i=0;i<ProdNames.size();i++) {
ArrayList curr = (ArrayList)ProdtoCount.get(i);
for(int j=0;j<curr.size();j++) {
for(int k=j+1;k<curr.size();k++) {
int u =(Integer)curr.get(j);
int v = (Integer)curr.get(k);
//System.out.println(u+","+v);
if(Pairs[u][v]==null) {
if(Pairs[v][u]!=null)
Pairs[v][u].add(i);
else {
Pairs[u][v] = new ArrayList();
Pairs[u][v].add(i);
}
}
else Pairs[u][v].add(i);
}
}
}
for(int i=0;i<countries;i++) {
for(int j=0;j<countries;j++) {
if(Pairs[i][j]==null)
continue;
ArrayList a = Pairs[i][j];
System.out.print("\n{"+CountNames.get(i)+","+CountNames.get(j)+"} : ");
for(int k=0;k<a.size();k++) {
System.out.print(ProdNames.get((Integer)a.get(k))+" ");
}
}
}
}
public static void main(String[] args) {
PairMatching pm = new PairMatching();
pm.readfile("Input data/BigData.txt");
pm.FindPairs();
}
}
[Update] The algorithm presented here shouldn't improve time complexity compared to the OP's original algorithm. Both algorithms have the same asymptotic complexity, and iterating through sorted lists (as OP does) should generally perform better than using a hash table.
You need to group the items by product, not by country, in order to be able to quickly fetch all countries belonging to a certain product.
This would be the pseudocode:
inputList contains a list of pairs {country, product}
// group by product
prepare mapA (product) => (list_of_countries)
for each {country, product} in inputList
{
if mapA does not contain (product)
create a new empty (list_of_countries)
and add it to mapA with (product) as key
add this (country) to the (list_of_countries)
}
// now group by country_pair
prepare mapB (country_pair) => (list_of_products)
for each {product, list_of_countries} in mapA
{
for each pair {countryA, countryB} in list_of_countries
{
if mapB does not contain country_pair {countryA, countryB}
create a new empty (list_of_products)
and add it to mapB with country_pair {countryA, countryB} as key
add this (product) to the (list_of_products)
}
}
If your input list is length N, and you have C distinct countries and P distinct products, then the running time of this algorithm should be O(N) for the first part and O(P*C^2) for the second part. Since your final list needs to have pairs of countries mapping to lists of products, I don't think you will be able to lose the P*C^2 complexity in any case.
I don't code in Java too much, so I added a C# example which I believe you'll be able to port pretty easily:
// mapA maps each product to a list of countries
var mapA = new Dictionary<string, List<string>>();
foreach (var t in inputList)
{
List<string> countries = null;
if (!mapA.TryGetValue(t.Product, out countries))
{
countries = new List<string>();
mapA[t.Product] = countries;
}
countries.Add(t.Country);
}
// note (this is very important):
// CountryPair tuple must have value-type comparison semantics,
// i.e. you need to ensure that two CountryPairs are compared
// by value to allow hashing (mapping) to work correctly, in O(1).
// In C# you can also simply use a Tuple<string,string> to
// represent a pair of countries (which implements this correctly),
// but I used a custom class to emphasize the algorithm
// mapB maps each CountryPair to a list of products
var mapB = new Dictionary<CountryPair, List<string>>();
foreach (var kvp in mapA)
{
var product = kvp.Key;
var countries = kvp.Value;
for (int i = 0; i < countries.Count; i++)
{
for (int j = i + 1; j < countries.Count; j++)
{
var pair = CountryPair.Create(countries[i], countries[j]);
List<string> productsForCountryPair = null;
if (!mapB.TryGetValue(pair, out productsForCountryPair))
{
productsForCountryPair = new List<string>();
mapB[pair] = productsForCountryPair;
}
productsForCountryPair.Add(product);
}
}
}
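For reference, a Java port of the two parts could look roughly like this. It is a sketch under a few assumptions that are not in the original: each input row is a String[] of {country, product}, every (country, product) row appears only once, and a pair of countries is represented by a single "A|B" string key with the names in sorted order (a stand-in for the CountryPair class, assuming "|" never occurs in a country name). The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; a sketch of the same two-step grouping.
class CommonExports {
    // rows: each element is {country, product}; duplicates are assumed absent.
    static Map<String, List<String>> commonProducts(List<String[]> rows) {
        // Part 1: group countries by product.
        Map<String, List<String>> countriesByProduct = new HashMap<String, List<String>>();
        for (String[] row : rows) {
            List<String> countries = countriesByProduct.get(row[1]);
            if (countries == null) {
                countries = new ArrayList<String>();
                countriesByProduct.put(row[1], countries);
            }
            countries.add(row[0]);
        }
        // Part 2: add each product to every pair of countries that export it.
        Map<String, List<String>> productsByPair = new TreeMap<String, List<String>>();
        for (Map.Entry<String, List<String>> entry : countriesByProduct.entrySet()) {
            List<String> countries = entry.getValue();
            for (int i = 0; i < countries.size(); i++) {
                for (int j = i + 1; j < countries.size(); j++) {
                    String key = pairKey(countries.get(i), countries.get(j));
                    List<String> products = productsByPair.get(key);
                    if (products == null) {
                        products = new ArrayList<String>();
                        productsByPair.put(key, products);
                    }
                    products.add(entry.getKey());
                }
            }
        }
        return productsByPair;
    }

    // Order the two names so that (A, B) and (B, A) share one key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] { "Australia", "Sheep" });
        rows.add(new String[] { "Australia", "Apple" });
        rows.add(new String[] { "New Zealand", "Apple" });
        rows.add(new String[] { "New Zealand", "Sheep" });
        rows.add(new String[] { "US", "Apple" });
        System.out.println(commonProducts(rows));
    }
}
```

The string key keeps the example short; a small value class with equals() and hashCode() would be the closer equivalent of CountryPair.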
This is a great use case for MapReduce.
In the map phase you just collect all the exports that belong to each country.
Then the reducer sorts the products (all products handed to one reducer belong to the same country, because of the mapper).
You will benefit from a distributed, parallel algorithm that can be spread across a cluster.
You are actually taking O(n^2 * time required for one intersect).
Let's see if we can improve the time for the intersect. We can maintain a map for every country that stores its products, so you have n hash maps for n countries. You just need to iterate through all products once to initialize them. If you want quick lookup, maintain a map of maps as:
HashMap<String,HashMap<String,Boolean>> countryMap = new HashMap<String, HashMap<String,Boolean>>();
Now if you want to find the common products for countries str1 and str2 do:
HashMap<String,Boolean> map1 = countryMap.get(str1);
HashMap<String,Boolean> map2 = countryMap.get(str2);
ArrayList<String > common = new ArrayList<String>();
Iterator it = map1.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<String,Boolean> pairs = (Map.Entry)it.next();
//Add to common if it is there in other map
if(map2.containsKey(pairs.getKey()))
common.add(pairs.getKey());
}
So, total it will be O(n^2 * k) if there are k entries in one map assuming hash map lookup implementation is O(1) (I guess it is log k for java).
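The initialization pass is not shown above, so here is a minimal sketch of it. It assumes the input is a map from product to the list of countries that export it, and it swaps the HashMap<String,Boolean> for a plain Set<String>, which carries the same information. The class and method names (CountryIndex, invert, common) are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CountryIndex {

    // One pass over all products: builds country -> set of products.
    public static Map<String, Set<String>> invert(Map<String, List<String>> productToCountries) {
        Map<String, Set<String>> countryToProducts = new HashMap<String, Set<String>>();
        for (Map.Entry<String, List<String>> entry : productToCountries.entrySet()) {
            for (String country : entry.getValue()) {
                Set<String> products = countryToProducts.get(country);
                if (products == null) {
                    products = new HashSet<String>();
                    countryToProducts.put(country, products);
                }
                products.add(entry.getKey());
            }
        }
        return countryToProducts;
    }

    // Intersection of the two countries' product sets.
    // A TreeSet is used only so the result has a predictable (sorted) order.
    public static Set<String> common(Map<String, Set<String>> countryToProducts,
                                     String country1, String country2) {
        Set<String> first = countryToProducts.get(country1);
        Set<String> second = countryToProducts.get(country2);
        if (first == null || second == null) {
            return new TreeSet<String>(); // unknown country: nothing in common
        }
        Set<String> result = new TreeSet<String>(first);
        result.retainAll(second);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> productToCountries = new HashMap<String, List<String>>();
        productToCountries.put("Sheep", Arrays.asList("Australia", "UK"));
        productToCountries.put("Wool", Arrays.asList("Australia", "UK", "US"));
        productToCountries.put("Corn", Arrays.asList("US"));

        Map<String, Set<String>> index = invert(productToCountries);
        System.out.println(common(index, "Australia", "UK")); // [Sheep, Wool]
    }
}
```

The index is built once and reused for every pair of countries, so only the intersection itself is repeated per pair.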
Using hashmaps where necessary to speed things up:
1) Go through the data and create a map whose keys are items and whose values are the list of countries associated with each item. So e.g. Sheep: Australia, US, UK, New Zealand....
2) Create a hashmap whose keys are pairs of countries and whose values are (initially) empty lists.
3) For each item, retrieve the list of countries associated with it, and for each pair of countries within that list, add that item to the list created for that pair in step (2).
4) Now output the updated list for each pair of countries.
The largest costs are in steps (3) and (4), and both of these costs are linear in the amount of output produced, so I think this is not too far from optimal.
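Steps (2) to (4) could be sketched in Java roughly as follows. This is only an illustration under a few assumptions: step (1) has already produced the item-to-countries map, a country appears at most once per item, and the per-pair lists are created lazily on first use rather than up front for every possible pair. PairIndex, CountryPair and commonItems are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairIndex {

    // Hypothetical key type: an unordered pair of countries compared by value.
    public static final class CountryPair {
        final String first;
        final String second;

        public CountryPair(String a, String b) {
            // Normalise the order so (US, UK) and (UK, US) are the same key.
            if (a.compareTo(b) <= 0) {
                first = a;
                second = b;
            } else {
                first = b;
                second = a;
            }
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CountryPair)) {
                return false;
            }
            CountryPair other = (CountryPair) o;
            return first.equals(other.first) && second.equals(other.second);
        }

        @Override
        public int hashCode() {
            return 31 * first.hashCode() + second.hashCode();
        }

        @Override
        public String toString() {
            return first + "-" + second;
        }
    }

    // Steps (2) and (3): for every item, add it to the list of each country pair.
    public static Map<CountryPair, List<String>> commonItems(Map<String, List<String>> itemToCountries) {
        Map<CountryPair, List<String>> result = new LinkedHashMap<CountryPair, List<String>>();
        for (Map.Entry<String, List<String>> entry : itemToCountries.entrySet()) {
            List<String> countries = entry.getValue();
            for (int i = 0; i < countries.size(); i++) {
                for (int j = i + 1; j < countries.size(); j++) {
                    CountryPair pair = new CountryPair(countries.get(i), countries.get(j));
                    List<String> items = result.get(pair);
                    if (items == null) {
                        items = new ArrayList<String>();
                        result.put(pair, items);
                    }
                    items.add(entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> itemToCountries = new LinkedHashMap<String, List<String>>();
        itemToCountries.put("Sheep", Arrays.asList("Australia", "US", "UK"));
        itemToCountries.put("Wool", Arrays.asList("UK", "Australia"));

        // Step (4): output the list for each pair of countries.
        for (Map.Entry<CountryPair, List<String>> e : commonItems(itemToCountries).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The important part is that CountryPair overrides both equals and hashCode; without that, two pairs naming the same countries would land in different map entries.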
My program creates an ArrayList of 5000 to 60000 records depending on the time of day. I want to split it into as many ArrayLists as needed so that each one has 1000 records. I looked at many examples online and tried a few things, but I ran into strange problems. Can you please show me an example of this?
Regards!
public static <T> Collection<Collection<T>> split(Collection<T> bigCollection, int maxBatchSize) {
    Collection<Collection<T>> result = new ArrayList<Collection<T>>();
    ArrayList<T> currentBatch = null;
    for (T t : bigCollection) {
        if (currentBatch == null) {
            currentBatch = new ArrayList<T>();
        } else if (currentBatch.size() >= maxBatchSize) {
            result.add(currentBatch);
            currentBatch = new ArrayList<T>();
        }
        currentBatch.add(t);
    }
    if (currentBatch != null) {
        result.add(currentBatch);
    }
    return result;
}
Here's how we use it (assuming emails is a large ArrayList of email addresses):
Collection<Collection<String>> emailBatches = Helper.split(emails, 500);
for (Collection<String> emailBatch : emailBatches) {
    sendEmails(emailBatch);
    // do something else...
    // and something else ...
}
where sendEmails would iterate over each batch like this:
private static void sendEmails(Collection<String> emailBatch) {
    for (String email : emailBatch) {
        // send email code here.
    }
}
You can use subList (http://docs.oracle.com/javase/6/docs/api/java/util/List.html#subList) from List to split your ArrayList. subList gives you a view of the original list. If you really want to create a new list, separate from the old one, you could do something like:
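List<List<Record>> newLists = new ArrayList<List<Record>>();
int index = 0;
int increment = 1000;
while (index < bigList.size()) {
    // Clamp the end index so the last chunk does not run past the end of the list
    int end = Math.min(index + increment, bigList.size());
    newLists.add(new ArrayList<Record>(bigList.subList(index, end)));
    index += increment;
}
Note that the end index has to be clamped to the list size (as Math.min does here); otherwise subList throws an IndexOutOfBoundsException on the last chunk whenever the size is not a multiple of the increment. This is just a quick sample.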
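To make the view-versus-copy distinction concrete, here is a small self-contained sketch. The class name SubListSplit and the copy flag are invented for the example; only List.subList, Math.min and the ArrayList copy constructor come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListSplit {

    // Splits list into consecutive chunks of at most chunkSize elements.
    // With copy == false the chunks are subList views backed by the original list;
    // with copy == true each chunk is an independent ArrayList.
    public static <T> List<List<T>> partition(List<T> list, int chunkSize, boolean copy) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int start = 0; start < list.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, list.size());
            List<T> view = list.subList(start, end);
            chunks.add(copy ? new ArrayList<T>(view) : view);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> big = new ArrayList<String>(Arrays.asList("a", "b", "c", "d", "e"));

        List<List<String>> copies = partition(big, 2, true);
        List<List<String>> views = partition(big, 2, false);

        views.get(0).set(0, "X"); // writes through to the original list

        System.out.println(big);            // [X, b, c, d, e]
        System.out.println(copies.get(0));  // [a, b]
        System.out.println(views.size());   // 3
    }
}
```

Views are cheaper because nothing is copied, but they stop being usable if the original list is structurally modified afterwards (elements added or removed), so copying is the safer choice when the big list keeps changing.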
I'm having a tough time wrapping my head around the following situation. The best way to explain may be by example.
I have a Map<Column,Set<Row>> object.
Let's say it contains the following data:
ColumnA['abc','def']
ColumnB['efg','hij','klm']
ColumnC['nop']
ColumnD['qrs','tuv','wxy','zzz']
I am trying to generate the following output:
Row1[abc,efg,nop,qrs]
Row2[abc,efg,nop,tuv]
Row3[abc,efg,nop,wxy]
Row4[abc,efg,nop,zzz]
Row5[abc,hij,nop,qrs]
Row6[abc,hij,nop,tuv]
etc...
So in this case there would be 24 rows total.
However, the number of columns and rows are both dynamic. I feel like this needs to be recursively done somehow but I'm not sure where to start.
Any help would be appreciated.
Update - I made a Tree structure that seems to work.
DefaultMutableTreeNode root = new DefaultMutableTreeNode();
Set<DefaultMutableTreeNode> curNodes = new HashSet<DefaultMutableTreeNode>();
curNodes.add(root);
final Set<Column> keys = map.keySet();
for (final Column key : keys) {
    final Set<Row> rowSet = map.get(key);
    Set<DefaultMutableTreeNode> tmpNodes = new HashSet<DefaultMutableTreeNode>();
    for (final Row row : rowSet) {
        DefaultMutableTreeNode curNode = new DefaultMutableTreeNode();
        curNode.setUserObject(row);
        tmpNodes.add(curNode);
        for (DefaultMutableTreeNode n : curNodes) {
            n.add(curNode);
        }
    }
    curNodes = tmpNodes;
}
I hope this is not some student's homework.
First, to keep the order of the map's keys the same, use a SortedMap, like TreeMap.
Furthermore, in your initial map every Row contains just a single value like 'abc', so below I call those values Cells.
Recursion here is a depth-first traversal. The difficulty is that a map has no natural traversal order. For the rest, keep to-dos/candidates and dones/results; perform a step that changes the data, recurse, and afterwards undo the step.
Here I use the more familiar List, but a Stack would be nicer.
public List<Row> generateRows(SortedMap<Column, Set<Cell>> map) {
    List<Row> done = new ArrayList<Row>();
    List<Column> columnsToDo = new LinkedList<Column>(map.keySet());
    List<Cell> partialRow = new LinkedList<Cell>();
    generateRowsRec(map, columnsToDo, partialRow, done);
    return done;
}

void generateRowsRec(SortedMap<Column, Set<Cell>> map, List<Column> columnsToDo, List<Cell> partialRow, List<Row> done) {
    if (columnsToDo.isEmpty()) {
        done.add(new Row(partialRow));
        return;
    }
    Column firstColumn = columnsToDo.remove(0); // Step A
    for (Cell cell : map.get(firstColumn)) {
        partialRow.add(cell); // Step B
        generateRowsRec(map, columnsToDo, partialRow, done);
        partialRow.remove(partialRow.size() - 1); // Unstep B
    }
    columnsToDo.add(0, firstColumn); // Unstep A
}
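Since Column, Cell and Row are not defined here, this is a runnable sketch of the same backtracking idea with plain Strings standing in for columns and cells (an assumption made purely for illustration). It tracks the current column with a depth index instead of removing and re-adding list elements, and it copies the partial row when storing it, because the partial row keeps being modified afterwards. RowGenerator is a made-up class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class RowGenerator {

    public static List<List<String>> generateRows(SortedMap<String, Set<String>> map) {
        List<List<String>> done = new ArrayList<List<String>>();
        List<String> columns = new ArrayList<String>(map.keySet());
        generateRowsRec(map, columns, 0, new ArrayList<String>(), done);
        return done;
    }

    private static void generateRowsRec(SortedMap<String, Set<String>> map, List<String> columns,
                                        int depth, List<String> partialRow, List<List<String>> done) {
        if (depth == columns.size()) {
            // Copy the row: partialRow is reused and changed by the callers.
            done.add(new ArrayList<String>(partialRow));
            return;
        }
        for (String cell : map.get(columns.get(depth))) {
            partialRow.add(cell);                                   // step
            generateRowsRec(map, columns, depth + 1, partialRow, done);
            partialRow.remove(partialRow.size() - 1);               // undo the step
        }
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps the cells in the order they are listed in the question.
        SortedMap<String, Set<String>> map = new TreeMap<String, Set<String>>();
        map.put("ColumnA", new LinkedHashSet<String>(Arrays.asList("abc", "def")));
        map.put("ColumnB", new LinkedHashSet<String>(Arrays.asList("efg", "hij", "klm")));
        map.put("ColumnC", new LinkedHashSet<String>(Arrays.asList("nop")));
        map.put("ColumnD", new LinkedHashSet<String>(Arrays.asList("qrs", "tuv", "wxy", "zzz")));

        List<List<String>> rows = generateRows(map);
        System.out.println(rows.size());  // 24
        System.out.println(rows.get(0));  // [abc, efg, nop, qrs]
    }
}
```

With a plain HashSet the same 24 rows come out, only the order of cells within a column is no longer predictable.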