I have a .csv file containing a list of grocery items, each with a UPC code and a product name. I already wrote some code that puts each product's code and name into a map, and it works fine (code shown below). Without using a TreeMap or anything else that relies directly on the Map interface, how do I extract the key set of the map into a list, apply a sorting algorithm to it (quick sort, for example), and output the sorted keys with their assigned values? (I can write the sorting algorithm myself, so that's not a concern.) The keys and values are unique.
I could change the method to use an ArrayList instead of a map and apply the sorting algorithm to it directly. I also know there are various great ways to do this sorting with the Map interface, but this is just for practice and I'm curious how to implement something like that.
My item list looks like this (I use the second column, 'upc14', as the key and the last column, 'name', as the value):
grp_id,upc14,upc12,brand,name
1,00035200264013,035200264013,Riceland,Riceland American Jazmine Rice
2,00011111065925,011111065925,Caress,Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct
3,00023923330139,023923330139,Earth's Best,Earth's Best Organic Fruit Yogurt Smoothie Mixed Berry
4,00208528800007,208528800007,Boar's Head,Boar's Head Sliced White American Cheese - 120 Ct
5,00759283100036,759283100036,Back To Nature,Back To Nature Gluten Free White Cheddar Rice Thin Crackers
6,00074170388732,074170388732,Sally Hansen,Sally Hansen Nail Color Magnetic 903 Silver Elements
7,00070177154004,070177154004,Twinings Of London,Twinings Of London Classics Lady Grey Tea - 20 Ct
8,00051600080015,051600080015,Lea & Perrins,Lea & Perrins Marinade In-a-bag Cracked Peppercorn
9,00019600923015,019600923015,Van De Kamp's,Van De Kamp's Fillets Beer Battered - 10 Ct
10,00688267141676,688267141676,Ahold,Ahold Cocoa Almonds
public class ReadFile {
// File reading stuffs
private static Map<Long,String> readFromCSV() {
Map<Long,String> map = new LinkedHashMap<>();
try {
BufferedReader reader = new BufferedReader(new FileReader("Grocery_UPC_Database.csv"));
reader.readLine();
String line = reader.readLine();
while (line != null) {
String[] attributes = line.split(",");
for(int i = 1; i < attributes.length; i++) {
map.put(Long.parseLong(attributes[1]), attributes[4]);
}
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
return map;
}
private static void createFile(Map<Long,String> map) {
PrintWriter out = null;
try {
out = new PrintWriter("output.txt");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
for (Map.Entry<Long,String> entry : map.entrySet()) {
assert out != null;
out.println(entry.getValue());
}
assert out != null;
out.close();
}
public static void main(String[] args) {
Map<Long,String> map = readFromCSV();
createFile(map);
}
}
how do I extract the key set of the map ...
Call keySet().
... into a list ...
Copy the keys into a List using the ArrayList constructor made for the purpose:
List<Long> keyList = new ArrayList<>(map.keySet());
... and apply a sorting algorithm on it (like quick sort, for example) ...
Call sort() (Java 8+) or Collections.sort(), with a Comparator that implements the desired ordering, e.g.
keyList.sort((a, b) -> Long.compare(b, a)/*sort descending*/);
... and have an output of those sorted keys with their assigned value?
Iterate the list and print the values from the map, e.g.
for (Long key : keyList) {
System.out.println(key + " = " + map.get(key));
}
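Since the question asks for a hand-written sort rather than List.sort(), the steps above can be sketched end to end as follows. This is only a minimal sketch under assumptions: the class name KeySortDemo and the helpers quickSort, partition, swap and sortedKeys are hypothetical names, the quicksort uses a simple last-element pivot, and the three sample entries are taken from the CSV in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeySortDemo {

    // In-place quicksort over list[lo..hi], ascending order.
    static void quickSort(List<Long> list, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int p = partition(list, lo, hi);
        quickSort(list, lo, p - 1);
        quickSort(list, p + 1, hi);
    }

    // Moves everything smaller than the pivot (the last element) to its left
    // and returns the pivot's final index.
    static int partition(List<Long> list, int lo, int hi) {
        long pivot = list.get(hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (list.get(j) < pivot) {
                swap(list, i, j);
                i++;
            }
        }
        swap(list, i, hi);
        return i;
    }

    static void swap(List<Long> list, int a, int b) {
        Long tmp = list.get(a);
        list.set(a, list.get(b));
        list.set(b, tmp);
    }

    // Copies the keys out of the map and sorts the copy; the map is untouched.
    static List<Long> sortedKeys(Map<Long, String> map) {
        List<Long> keys = new ArrayList<>(map.keySet());
        quickSort(keys, 0, keys.size() - 1);
        return keys;
    }

    public static void main(String[] args) {
        Map<Long, String> map = new LinkedHashMap<>();
        map.put(35200264013L, "Riceland American Jazmine Rice");
        map.put(11111065925L, "Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct");
        map.put(23923330139L, "Earth's Best Organic Fruit Yogurt Smoothie Mixed Berry");
        // Look each sorted key up in the original map to print its value.
        for (Long key : sortedKeys(map)) {
            System.out.println(key + " = " + map.get(key));
        }
    }
}
```

Because only a copy of the key set is sorted, the map keeps its insertion order and each value is still fetched from it by key.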
If you are using Java 8, you can use streams and sort the entries with Map.Entry.comparingByKey().
Map<Long, String> sortedMap = map.entrySet().stream().sorted(Map.Entry.comparingByKey())
.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue(), (e1, e2) -> e2, LinkedHashMap::new));
When I wrote this piece of code, I was getting null values for the keys because of the pnValue.clear(); call. I read somewhere that adding the values of one map to another only copies references to the original objects, and that one has to use the clone() method to keep the two maps separate. The issue I am facing now, after cloning my map, is that when a key has multiple values they are being overwritten. E.g. the output I am expecting from processing a goldSentence is:
{PERSON = [James Fisher],ORGANIZATION=[American League, Chicago Bulls]}
but what I get is:
{PERSON = [James Fisher],ORGANIZATION=[Chicago Bulls]}
I wonder where I am going wrong, considering I am declaring my values as a Vector<String>.
for(WSDSentence goldSentence : goldSentences)
{
for (WSDElement word : goldSentence.getWsdElements()){
if (word.getPN()!=null){
if (word.getPN().equals("group")){
String newPNTag = word.getPN().replace("group", "organization");
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (Vector<String>) pnValue.clone();
annotationMap.put(newPNTag.toUpperCase(),newPNValue);
}
else{
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (Vector<String>) pnValue.clone();
annotationMap.put(word.getPN().toUpperCase(),newPNValue);
}
}
sentenceAnnotationMap = (LinkedHashMap<String, Vector<String>>) annotationMap.clone();
pnValue.clear();
}
EDITED CODE
Replaced Vector with List and removed cloning. However, this still doesn't solve my problem. It takes me back to square one, where my output is: {PERSON=[], ORGANIZATION=[]}
for(WSDSentence goldSentence : goldSentences)
{
for (WSDElement word : goldSentence.getWsdElements()){
if (word.getPN()!=null){
if (word.getPN().equals("group")){
String newPNTag = word.getPN().replace("group", "organization");
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (List<String>) pnValue;
annotationMap.put(newPNTag.toUpperCase(),newPNValue);
}
else{
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = pnValue;
annotationMap.put(word.getPN().toUpperCase(),newPNValue);
}
}
sentenceAnnotationMap = annotationMap;
}
pnValue.clear();
You're trying a bunch of stuff without really thinking through the logic behind it. There's no need to clear or clone anything, you just need to manage separate lists for separate keys. Here's the basic process for each new value:
If the map contains our key, get the list and add our value
Otherwise, create a new list, add our value, and add the list to the map
You've left out most of your variable declarations, so I won't try to show you the exact solution, but here's the general formula:
List<String> list = map.get(key); // try to get the list
if (list == null) { // list doesn't exist?
list = new ArrayList<>(); // create an empty list
map.put(key, list); // insert it into the map
}
list.add(value); // update the list
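On Java 8 or later, the same get-or-create pattern can also be written with Map.computeIfAbsent. A minimal sketch, assuming String keys and values; the class name MultiValueDemo and the addValue helper are hypothetical, and the sample data is borrowed from the expected output in the question:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValueDemo {

    // Adds value to the list stored under key, creating the list on first use.
    static void addValue(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> annotationMap = new LinkedHashMap<>();
        addValue(annotationMap, "PERSON", "James Fisher");
        addValue(annotationMap, "ORGANIZATION", "American League");
        addValue(annotationMap, "ORGANIZATION", "Chicago Bulls");
        // prints {PERSON=[James Fisher], ORGANIZATION=[American League, Chicago Bulls]}
        System.out.println(annotationMap);
    }
}
```

Each key owns its own list, so nothing needs to be cleared or cloned between iterations.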
I have got two ArrayLists, created from parsed html. First one contains jobs and is like
Job A
Job B
Job C
and the second one is like
Company A
Company B
Company C
What I need is a combination of Job A and Company A and so on, so that I get results like this (an ArrayList would be great too):
Job A : Company A
Job B : Company B
Job C : Company C
I didn't find a clear tutorial for this. Any ideas?
Are you sure you are looking at the correct data structure to achieve this?
Why not use a Map? You can define a key/value relationship going this route.
Map<Company, Job> jobMap = new HashMap<Company, Job>();
jobMap.put("Company A" /* or corresponding list item */, "Job A" /* or corresponding list item */);
You may even do something like this (swap out the strings to fit your implementation):
Map<Company, List<Job>> jobMap...;
List<Job> jobList = new ArrayList<Job>();
jobList.add("Job A");
jobList.add("Job B");
jobList.add("Job C");
jobMap.put("Company A", jobList);
What this will do is define a company as your key, so you can assign multiple jobs to a company.
if (jobs.size() != companies.size()) {
throw new IllegalArgumentException("Mismatch of jobs and companies");
}
for (int i = 0; i < jobs.size(); i++) {
combine(jobs.get(i), companies.get(i));
}
There are lots of ways to combine references between two kinds of objects. Here's a flexible example that will let you use one to look up the other. It's overkill if you know which you'd always be using to do the lookup. Using LinkedHashMap also preserves the insertion order. So if you decide to put them in B, C, A order, you can get them out in B, C, A order.
LinkedHashMap<Job, Company> jobToCompany = new LinkedHashMap<>();
LinkedHashMap<Company, Job> companyToJob = new LinkedHashMap<>();
private void combine(Job job, Company company) {
jobToCompany.put(job, company);
companyToJob.put(company, job);
}
If you really want to store the combined values in an ArrayList then the following code will work for you:
List<String> jobs = new ArrayList<>();
List<String> companies = new ArrayList<>();
List<String> mergedList = new ArrayList<>();
//assuming the value are populated for `jobs` and `companies`.
if(jobs.size() == companies.size()) {
int n = jobs.size();
for(int index=0; index<n; index++)
{
mergedList.add(jobs.get(index) + " : " + companies.get(index));
}
} else {
System.out.println("Cannot combine");
//Throw exception or take any action you need.
}
Keep in mind that searching for an item will be O(n), but I assume you were aware of that before deciding to go with an ArrayList.
If you're not willing to use a Map (not sure why you wouldn't be), my approach would be to create another class (let's call it CompanyJob) that contains both a Company and a Job attribute, and then simply keep a collection of CompanyJob instances (an ArrayList would do).
class CompanyJob {
private Company mCompany;
private Job mJob;
public CompanyJob(Company com, Job job) {
mCompany = com;
mJob = job;
}
// Some logic ...
}
// Then your list
private ArrayList<CompanyJob> yourList = new ArrayList<>();
int i = 0;
for (Company tmpCom : companyList) {
yourList.add(new CompanyJob(tmpCom, jobList.get(i)));
i++;
}
You need to create a new one
List<String> third = new ArrayList<String>();
Also need a counter.
int position = 0;
Then iterate through the first list (assuming both lists have the same size).
for (String item : firstList) {
third.add(item + " : " + secondList.get(position));
position ++;
}
Then the third will have the desired result.
To confirm:
for (String item:third){
//try to print "item" here
}
[updated code] (Sorry guys, I didn't provide the whole code, because in my experience large codes seem to "scare off" possible helpers.)
For an ExpandableListView I want to build a HashMap<String, ArrayList<String>> where String is the name of a category and ArrayList<String> the names of animals belonging to that category. I populate the HashMap as such:
HashMap<String, ArrayList<String>> map_groups_childs;
ArrayList<String> list_group_titles;
private void prepareListData(ArrayList<Id_triv_cat> search_results) {
list_group_titles = new ArrayList<String>(); // this is a list of group titles
map_groups_childs = new HashMap<String, ArrayList<String>>(); // this is a map. each group title gets a list of its respective childs
// temporary List of items for a certain category/group
ArrayList<String> temp_childs_list = new ArrayList<String>();
// "search_results" is an ArrayList of self defined objects each containing an ID, a name and a category name
int count = search_results.size();
int i_cat = 0;
int i=0;
// if category "i" is the same as the next category "i+1", add child to list
for (i=0; i<count-1; i++) {
// build group with category name
list_group_titles.add(search_results.get(i).get_type());
// while category does not change, add child to the temporary childs-array
while (i<=count && search_results.get(i).get_type().equals(search_results.get(i+1).get_type())) {
temp_childs_list.add(search_results.get(i).get_name());
i++;
}
// must be done, as the while loop does not get to the last "i" of every category
temp_childs_list.add(search_results.get(i).get_name());
Log.i("DEBUG", temp_childs_list.size()); // --> returns always more than 0
Log.i("DEBUG", temp_childs_list.toString()); // --> returns always something like [word1, word2, word3, ...]
Log.i("DEBUG", list_group_titles.get(i_cat)); // --> returns always a single word like "Insekten"
// add [group_title:list_of_childs] to map
map_groups_childs.put(list_group_titles.get(i_cat++), temp_childs_list);
// clear temp_list, otherwise former category's species will be added to new category
temp_childs_list.clear();
}
Log.i("DEBUG", map_groups_childs.containsKey("Insekten")); // --> returns true
Log.i("DEBUG", map_groups_childs.size()); // --> returns 10
Log.i("DEBUG", map_groups_childs.get("Insekten").size()); // --> returns 0
Log.i("DEBUG", map_groups_childs.toString()); // --> returns {Insekten=[], Reptilien=[], ...}
}
The use of the same i in the for- and while-loop may seem wrong or confusing, but it is okay. No i is skipped in any way or used twice.
All keys I put in the HashMap are there, but the ArrayList I want to get with (for example) map_groups_childs.get("Insekten") is empty. What am I doing wrong?
...
map_groups_childs.put(..., temp_childs_list);
temp_childs_list.clear();
}
Java passes object references by value, so the Map stores a reference to your List, not a copy of it. You are always putting the same List into the Map and clearing it after every iteration. Thus every value in the Map points to the same List, which is empty.
What you probably need is something like this:
for( ... ) {
List<String> tempChildsList = new ArrayList<>();
...
mapGroupChilds.put(..., tempChildsList);
}
Thus a new List is created on every iteration.
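The aliasing can be reproduced in a few lines. The sketch below is for illustration only: the class name SharedListDemo, both method names and the sample animal names are made up. The first method reuses and clears one list as the question's code does; the second creates a fresh list per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedListDemo {

    // Buggy variant: one list object is reused and cleared for every key.
    static Map<String, List<String>> reuseAndClear() {
        Map<String, List<String>> map = new HashMap<>();
        List<String> temp = new ArrayList<>();
        temp.add("Ameise");
        map.put("Insekten", temp);   // stores a reference to temp, not a copy
        temp.clear();                // empties the list the map refers to as well
        temp.add("Gecko");
        map.put("Reptilien", temp);  // same list object again
        temp.clear();
        return map;
    }

    // Fixed variant: a new list is created for each key.
    static Map<String, List<String>> freshListPerKey() {
        Map<String, List<String>> map = new HashMap<>();
        List<String> insects = new ArrayList<>();
        insects.add("Ameise");
        map.put("Insekten", insects);
        List<String> reptiles = new ArrayList<>();
        reptiles.add("Gecko");
        map.put("Reptilien", reptiles);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(reuseAndClear().get("Insekten"));   // prints []
        System.out.println(freshListPerKey().get("Insekten")); // prints [Ameise]
    }
}
```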
I also agree with @CandiedOrange: your code is a mess and probably overly complex. In general, the point of abstractions like List and Map is to not access things by counting numerical indexes all the time.
Note that in Java, the convention is that identifiers for variables are camelCase, not under_scored.
Near as I can tell what you need are the 5 lines in the second group below
HashMap<String, ArrayList<String>> map_groups_childs = new HashMap<String, ArrayList<String>>();
ArrayList<String> list_group_titles = new ArrayList<String>();
ArrayList<String> temp_childs_list = new ArrayList<String>();
list_group_titles.add("insects");
temp_childs_list.add("Swallowtail");
temp_childs_list.add("Small White");
temp_childs_list.add("Large White");
temp_childs_list.add("Silverfish");
int k = 0;
Log.i("DEBUG", list_group_titles.get(k)); // --> returns "insects"
Log.i("DEBUG", temp_childs_list); // --> returns [Swallowtail, Small White, Large White, Silverfish]
map_groups_childs.put(list_group_titles.get(k), temp_childs_list);
Log.i("DEBUG", map_groups_childs.size()); // --> returns 1 not 10
Log.i("DEBUG", map_groups_childs.containsKey("insects")); // --> returns true
Log.i("DEBUG", map_groups_childs.get("insects").size()); // --> returns 4 not 0
Even with the edit, your code is still missing huge clues about what is going on. I've made some guesses and if I'm right, you are making this way too hard.
I've inferred the existence of a class Id_triv_cat that has getters named get_name() and get_type().
static class Id_triv_cat {
String type;
String name;
Id_triv_cat( String type, String name )
{
this.name = name;
this.type = type;
}
public String get_type()
{
return type;
}
public String get_name()
{
return name;
}
}
I've also written this code to test your prepareListData() method.
public static void main(String[] args){
ArrayList<Id_triv_cat> search_results = new ArrayList<Id_triv_cat>();
search_results.add( new Id_triv_cat("type1", "name1") );
search_results.add( new Id_triv_cat("type2", "name2") );
search_results.add( new Id_triv_cat("Insekten", "insekten name1") );
search_results.add( new Id_triv_cat("Insekten", "insekten name2") );
search_results.add( new Id_triv_cat("type3", "name3") );
new Test().myPrepareListData( search_results );
}
And while I could fix your minor defect like a good SE denizen I'm going to risk having all of this migrated to Programmers because your biggest problem is a design problem. Your code is suffering from a lack of clear local identifiers. Instead of making locals you are having a dot fest with the java utility classes that's making your code pointlessly hard to follow.
If, as I suspect, you are trying to build a map from a list of Id_triv_cat's using their type as a key and name as a value then try this:
private void myPrepareListData(ArrayList<Id_triv_cat> search_results) {
// Each group title is mapped to a list of its respective children
HashMap<String, ArrayList<String>> my_map_groups_childs =
new HashMap<String, ArrayList<String>>();
for (Id_triv_cat search_result : search_results)
{
String type = search_result.get_type();
String name = search_result.get_name();
if ( my_map_groups_childs.containsKey(type) )
{
ArrayList<String> names = my_map_groups_childs.get(type);
names.add(name);
}
else
{
ArrayList<String> names = new ArrayList<String>();
names.add(name);
my_map_groups_childs.put(type, names);
}
}
Log.i("DEBUG", "my_map_groups_childs = " + my_map_groups_childs);
}
This displays: my_map_groups_childs = {Insekten=[insekten name1, insekten name2], type1=[name1], type3=[name3], type2=[name2]}
See how a few well-chosen locals can make life so much easier?
If that's not what you wanted you're going to have to make your question clearer.
And Radiodef is right. You really should use camelCase when you code in Java.
I have an algorithmic problem at hand. To easily explain the problem, I will be using a simple analogy.
I have an input file
Country,Exports
Austrailia,Sheep
US, Apple
Austrailia,Beef
End Goal:
I have to find the common products between the pairs of countries so
{"Austrailia,New Zealand"}:{"apple","sheep"}
{"Austrailia,US"}:{"apple"}
{"New Zealand","US"}:{"apple","milk"}
Process :
I read in the input and store it in a TreeMap<String, List<String>>, where the strings in each List are interned because there are many duplicates.
Essentially, I am aggregating by country: the key is the country and the values are its exports.
{"austrailia":{"apple","sheep","koalas"}}
{"new zealand":{"apple","sheep","milk"}}
{"US":{"apple","beef","milk"}}
I have about 1200 keys (countries) and total number of values(exports) is 80 million altogether.
I sort all the values of each key:
{"austrailia":{"apple","sheep","koalas"}} -- > {"austrailia":{"apple","koalas","sheep"}}
This is fast as there are only 1200 Lists to sort.
for(k1:keys)
for(k2:keys)
if(k1.compareTo(k2) <0){ //Dont want to double compare
List<String> intersectList = intersectList_func(k1's exports,k2's exports);
countriespair.put({k1,k2},intersectList)
}
This code block takes very long. I realise it is O(n^2) in the number of countries, around 1200*1200 comparisons, and it has been running for almost 3 hours so far.
Is there any way I can speed it up or optimise it?
Is a better algorithm the best option, or are there other technologies to consider?
Edit:
Since both lists are sorted beforehand, intersectList is O(n), where n is at most listOne.size() + listTwo.size(), and NOT O(n^2) as discussed below.
private static List<String> intersectList(List<String> listOne, List<String> listTwo) {
int i = 0, j = 0;
List<String> listResult = new LinkedList<String>();
// Both lists must already be sorted; advance whichever side holds the smaller element
while (i != listOne.size() && j != listTwo.size()) {
int compareVal = listOne.get(i).compareTo(listTwo.get(j));
if (compareVal == 0) {
listResult.add(listOne.get(i));
i++; j++;
}
else if (compareVal < 0) i++;
else j++;
}
return listResult;
}
Update 22 Nov
My current implementation is still running for almost 18 hours. :|
Update 25 Nov
I ran the new implementation as suggested by Vikram and a few others. It has been running since Friday.
My question is: how does grouping by exports rather than by country reduce the computational complexity? I find that the complexity is the same. As Groo mentioned, the complexity of the second part is O(E*C^2), where E is the number of exports and C is the number of countries.
This can be done in one statement as a self-join using SQL:
Test data. First create a test data set:
Lines <- "Country,Exports
Austrailia,Sheep
Austrailia,Apple
New Zealand,Apple
New Zealand,Sheep
New Zealand,Milk
US,Apple
US,Milk
"
DF <- read.csv(text = Lines, as.is = TRUE)
sqldf. Now that we have DF, issue this command:
library(sqldf)
sqldf("select a.Country, b.Country, group_concat(Exports) Exports
from DF a, DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
")
giving this output:
Country Country Exports
1 Austrailia New Zealand Sheep,Apple
2 Austrailia US Apple
3 New Zealand US Apple,Milk
With index. If it's too slow, add an index to the Country column (and be sure not to forget the main. parts):
sqldf(c("create index idx on DF(Country)",
"select a.Country, b.Country, group_concat(Exports) Exports
from main.DF a, main.DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
"))
If you run out of memory, add the dbname = tempfile() sqldf argument so that it uses disk.
Store something like the following data structure (the following is pseudocode):
ValuesSet ={
apple = {"Austrailia","New Zealand"..}
sheep = {"Austrailia","New Zealand"..}
}
for k in ValuesSet
for k1 in k.values()
for k2 in k.values()
if(k1<k2)
Set(k1,k2).add(k)
Time complexity: O(number of distinct pairs with similar products)
Note: I might be wrong, but I do not think you can reduce this time complexity.
The following is a Java implementation for your problem:
public class PairMatching {
HashMap Country;
ArrayList CountNames;
HashMap ProdtoIndex;
ArrayList ProdtoCount;
ArrayList ProdNames;
ArrayList[][] Pairs;
int products=0;
int countries=0;
public void readfile(String filename) {
try {
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String line;
CountNames = new ArrayList();
Country = new HashMap<String,Integer>();
ProdtoIndex = new HashMap<String,Integer>();
ProdtoCount = new ArrayList<ArrayList>();
ProdNames = new ArrayList();
products = countries = 0;
while((line=br.readLine())!=null) {
String[] s = line.split(",");
s[0] = s[0].trim();
s[1] = s[1].trim();
int k;
if(!Country.containsKey(s[0])) {
CountNames.add(s[0]);
Country.put(s[0],countries);
k = countries;
countries++;
}
else {
k =(Integer) Country.get(s[0]);
}
if(!ProdtoIndex.containsKey(s[1])) {
ProdNames.add(s[1]);
ArrayList n = new ArrayList();
ProdtoIndex.put(s[1],products);
n.add(k);
ProdtoCount.add(n);
products++;
}
else {
int ind =(Integer)ProdtoIndex.get(s[1]);
ArrayList c =(ArrayList) ProdtoCount.get(ind);
c.add(k);
}
}
System.out.println(CountNames);
System.out.println(ProdtoCount);
System.out.println(ProdNames);
} catch (FileNotFoundException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
}
}
void FindPairs() {
Pairs = new ArrayList[countries][countries];
for(int i=0;i<ProdNames.size();i++) {
ArrayList curr = (ArrayList)ProdtoCount.get(i);
for(int j=0;j<curr.size();j++) {
for(int k=j+1;k<curr.size();k++) {
int u =(Integer)curr.get(j);
int v = (Integer)curr.get(k);
//System.out.println(u+","+v);
if(Pairs[u][v]==null) {
if(Pairs[v][u]!=null)
Pairs[v][u].add(i);
else {
Pairs[u][v] = new ArrayList();
Pairs[u][v].add(i);
}
}
else Pairs[u][v].add(i);
}
}
}
for(int i=0;i<countries;i++) {
for(int j=0;j<countries;j++) {
if(Pairs[i][j]==null)
continue;
ArrayList a = Pairs[i][j];
System.out.print("\n{"+CountNames.get(i)+","+CountNames.get(j)+"} : ");
for(int k=0;k<a.size();k++) {
System.out.print(ProdNames.get((Integer)a.get(k))+" ");
}
}
}
}
public static void main(String[] args) {
PairMatching pm = new PairMatching();
pm.readfile("Input data/BigData.txt");
pm.FindPairs();
}
}
[Update] The algorithm presented here shouldn't improve time complexity compared to the OP's original algorithm. Both algorithms have the same asymptotic complexity, and iterating through sorted lists (as OP does) should generally perform better than using a hash table.
You need to group the items by product, not by country, in order to be able to quickly fetch all countries belonging to a certain product.
This would be the pseudocode:
inputList contains a list of pairs {country, product}
// group by product
prepare mapA (product) => (list_of_countries)
for each {country, product} in inputList
{
if mapA does not contain (product)
create a new empty (list_of_countries)
and add it to mapA with (product) as key
add this (country) to the (list_of_countries)
}
// now group by country_pair
prepare mapB (country_pair) => (list_of_products)
for each {product, list_of_countries} in mapA
{
for each pair {countryA, countryB} in list_of_countries
{
if mapB does not contain country_pair {countryA, countryB}
create a new empty (list_of_products)
and add it to mapB with country_pair {countryA, countryB} as key
add this (product) to the (list_of_products)
}
}
If your input list is length N, and you have C distinct countries and P distinct products, then the running time of this algorithm should be O(N) for the first part and O(P*C^2) for the second part. Since your final list needs to have pairs of countries mapping to lists of products, I don't think you will be able to lose the P*C^2 complexity in any case.
I don't code in Java too much, so I added a C# example which I believe you'll be able to port pretty easily:
// mapA maps each product to a list of countries
var mapA = new Dictionary<string, List<string>>();
foreach (var t in inputList)
{
List<string> countries = null;
if (!mapA.TryGetValue(t.Product, out countries))
{
countries = new List<string>();
mapA[t.Product] = countries;
}
countries.Add(t.Country);
}
// note (this is very important):
// CountryPair tuple must have value-type comparison semantics,
// i.e. you need to ensure that two CountryPairs are compared
// by value to allow hashing (mapping) to work correctly, in O(1).
// In C# you can also simply use a Tuple<string,string> to
// represent a pair of countries (which implements this correctly),
// but I used a custom class to emphasize the algorithm
// mapB maps each CountryPair to a list of products
var mapB = new Dictionary<CountryPair, List<string>>();
foreach (var kvp in mapA)
{
var product = kvp.Key;
var countries = kvp.Value;
for (int i = 0; i < countries.Count; i++)
{
for (int j = i + 1; j < countries.Count; j++)
{
var pair = CountryPair.Create(countries[i], countries[j]);
List<string> productsForCountryPair = null;
if (!mapB.TryGetValue(pair, out productsForCountryPair))
{
productsForCountryPair = new List<string>();
mapB[pair] = productsForCountryPair;
}
productsForCountryPair.Add(product);
}
}
}
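For readers who would rather not translate the C# themselves, here is a rough Java port of the same two-step idea. It is a sketch under stated assumptions and not part of the original answer: the class name CommonExports and the method commonExports are hypothetical, each input row is assumed to be a {country, product} array with no duplicate rows, and instead of a CountryPair class the pair key is simply the two names joined with "|" in alphabetical order (which assumes country names never contain "|").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CommonExports {

    static Map<String, List<String>> commonExports(List<String[]> rows) {
        // Step 1: group countries by product.
        Map<String, List<String>> countriesByProduct = new HashMap<>();
        for (String[] row : rows) {
            countriesByProduct.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }

        // Step 2: add each product to every pair of countries that exports it.
        Map<String, List<String>> productsByPair = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : countriesByProduct.entrySet()) {
            List<String> countries = entry.getValue();
            for (int i = 0; i < countries.size(); i++) {
                for (int j = i + 1; j < countries.size(); j++) {
                    String a = countries.get(i);
                    String b = countries.get(j);
                    // Order the names so that (A, B) and (B, A) share one key.
                    String pair = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
                    productsByPair.computeIfAbsent(pair, k -> new ArrayList<>())
                                  .add(entry.getKey());
                }
            }
        }
        return productsByPair;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Austrailia", "Sheep"},
            new String[] {"Austrailia", "Apple"},
            new String[] {"New Zealand", "Apple"},
            new String[] {"New Zealand", "Sheep"},
            new String[] {"New Zealand", "Milk"},
            new String[] {"US", "Apple"},
            new String[] {"US", "Milk"});
        commonExports(rows).forEach(
            (pair, products) -> System.out.println(pair + " : " + products));
    }
}
```

A String key already compares by value, which is the property the comment in the C# code insists on for CountryPair. The order of products within each list depends on HashMap iteration order and should not be relied on.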
This is a great use case for MapReduce.
In the map phase you just collect all the exports that belong to each country.
Then the reducer sorts the products (all products arriving at a reducer belong to the same country, because of the mapper).
You will benefit from a parallel algorithm that can be distributed across a cluster.
You are actually taking O(n^2 * time required for 1 intersect).
Let's see if we can improve the time for one intersect. We can maintain a map for every country which stores its products, so you have n hash maps for n countries. You just need to iterate through all products once to initialize them. If you want quick lookup, maintain a map of maps:
HashMap<String,HashMap<String,Boolean>> countryMap = new HashMap<String, HashMap<String,Boolean>>();
Now if you want to find the common products for countries str1 and str2 do:
Map<String,Boolean> map1 = countryMap.get(str1);
Map<String,Boolean> map2 = countryMap.get(str2);
List<String> common = new ArrayList<String>();
for (String product : map1.keySet()) {
// Add to common if it is also present in the other map
if (map2.containsKey(product))
common.add(product);
}
So, in total it will be O(n^2 * k) if there are k entries in one map, assuming hash map lookup is O(1) (which is the expected cost for Java's HashMap).
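If only membership tests are needed, a HashSet per country is enough and the intersection can be delegated to retainAll. A minimal sketch, where the class name SetIntersectDemo and the common helper are hypothetical; it copies the smaller set and keeps only the elements also found in the larger one, so the cost is proportional to the smaller export set (assuming expected O(1) hash lookups):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetIntersectDemo {

    // Returns a new set holding the elements present in both inputs.
    // Neither input set is modified.
    static Set<String> common(Set<String> first, Set<String> second) {
        Set<String> smaller = first.size() <= second.size() ? first : second;
        Set<String> larger = (smaller == first) ? second : first;
        Set<String> result = new HashSet<>(smaller);
        result.retainAll(larger);
        return result;
    }

    public static void main(String[] args) {
        Set<String> newZealand = new HashSet<>(Arrays.asList("apple", "sheep", "milk"));
        Set<String> us = new HashSet<>(Arrays.asList("apple", "beef", "milk"));
        // The result contains "apple" and "milk"; iteration order is unspecified.
        System.out.println(common(newZealand, us));
    }
}
```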
Using hashmaps where necessary to speed things up:
1) Go through the data and create a map with keys Items and values a list of countries associated with that item. So e.g. Sheep:Australia, US, UK, New Zealand....
2) Create a hashmap with keys each pair of countries and (initially) an empty list as values.
3) For each Item retrieve the list of countries associated with it and for each pair of countries within that list, add that item to the list created for that pair in step (2).
4) Now output the updated list for each pair of countries.
The largest costs are in steps (3) and (4) and both of these costs are linear in the amount of output produced, so I think this is not too far from optimal.