In Java I would like to keep my collection of fishes grouped by species at all times (hence the use of a HashMap) while being able to pick a random element from all species except one in constant time. For example, the following code does the job, but with O(number of elements) complexity:
import java.util.*;
HashMap<String, ArrayList<Fish>> fishesBySpecies = new HashMap<>();
// Insert some fishes...
// Fish has a String attribute that describes its species
// Now we want to pick a random Fish that isn't from the unwantedSpecies
String unwanted = "unwanted species";
ArrayList<Fish> wantedSpecies = new ArrayList<>();
for (String species : fishesBySpecies.keySet()) {
    if (!Objects.equals(species, unwanted)) {
        wantedSpecies.addAll(fishesBySpecies.get(species));
    }
}
// Finally!
int randomIndex = new Random().nextInt(wantedSpecies.size());
Fish randomElement = wantedSpecies.get(randomIndex);
Any idea how to do this in constant time, if possible? Thanks!
What you are performing is filtering, and when filtering you have to check each element to decide whether it should be taken out or not. You could sort the keys alphabetically and stop filtering once the current key is alphabetically larger than your filtering (unwanted) key.
Your code can also be considerably shortened by using Java streams:
HashMap<String, ArrayList<Fish>> fishesBySpecies = new HashMap<>();
// Insert some fishes...
// Fish has a String attribute that describes its species
// Now we want to pick a random Fish that isn't from the unwantedSpecies
String unwanted = "unwanted species";
List<Fish> wantedSpecies = new ArrayList<>(); // this declaration was missing
fishesBySpecies.keySet().stream() // Get the key set and create a stream out of it
        .filter(key -> !key.equalsIgnoreCase(unwanted)) // Keep the key only if it is not the unwanted one
        .forEach(filteredKey ->
                wantedSpecies.addAll(fishesBySpecies.get(filteredKey))); // For each key that was kept, fetch its fishes
OR
fishesBySpecies.keySet().stream() // Get the key set and create a stream out of it
        .forEach(key -> {
            if (!key.equalsIgnoreCase(unwanted)) {
                // note: get(key), not get(unwanted), or every pass would add the unwanted fishes
                wantedSpecies.addAll(fishesBySpecies.get(key));
            }
        }); // iterate and filter in a single pass
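If you prefer to avoid mutating a list from inside forEach, the same filtering can be written as a single collector pipeline. A sketch using the same names as above (requires java.util.stream.Collectors):
List<Fish> wantedSpecies = fishesBySpecies.entrySet().stream()
        .filter(e -> !e.getKey().equalsIgnoreCase(unwanted)) // drop the unwanted species
        .flatMap(e -> e.getValue().stream())                 // flatten the per-species lists
        .collect(Collectors.toList());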
The only way I can think of would consist in maintaining an ArrayList<Fish> as well as the map you already have. There is a drawback though: adding or removing fishes would be slightly more complex:
Map<String, List<Fish>> fishesBySpecies = new HashMap<>();
List<Fish> wantedFishes = new ArrayList<>();
//...
public void addFish(String species, Fish fish) {
    List<Fish> speciesFishes = fishesBySpecies.get(species);
    if (speciesFishes == null) {
        speciesFishes = new ArrayList<>();
        fishesBySpecies.put(species, speciesFishes);
    }
    speciesFishes.add(fish);
    // also maintain the list of wanted fishes
    if (!unwantedSpecies.equals(species)) {
        wantedFishes.add(fish);
    }
}
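With that parallel list in place, the random pick itself is constant time, and removal can stay constant time too if you don't mind losing list order. A minimal sketch (assuming a random field next to the two collections above; the swap-with-last trick is a standard O(1) removal for array lists):
private final Random random = new Random();

public Fish randomWantedFish() {
    // O(1): index directly into the maintained list
    return wantedFishes.get(random.nextInt(wantedFishes.size()));
}

public void removeWantedFish(int index) {
    // O(1) removal: swap the element with the last one, then drop the tail
    int last = wantedFishes.size() - 1;
    Collections.swap(wantedFishes, index, last);
    wantedFishes.remove(last);
}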
When I wrote this piece of code, the pnValue.clear() call meant the output I was getting had empty values for the keys. So I read somewhere that putting the values of one map into another merely copies references to the original, and that one has to use the clone() method to ensure the two maps are separate. Now the issue I am facing after cloning my map is that if I have multiple values for a particular key, they are being overwritten. E.g. the output I am expecting from processing a goldSentence is:
{PERSON = [James Fisher],ORGANIZATION=[American League, Chicago Bulls]}
but what I get is:
{PERSON = [James Fisher],ORGANIZATION=[Chicago Bulls]}
I wonder where I am going wrong, considering I am declaring my values as a Vector<String>:
for (WSDSentence goldSentence : goldSentences)
{
    for (WSDElement word : goldSentence.getWsdElements()) {
        if (word.getPN() != null) {
            if (word.getPN().equals("group")) {
                String newPNTag = word.getPN().replace("group", "organization");
                pnValue.add(word.getToken().replaceAll("_", " "));
                newPNValue = (Vector<String>) pnValue.clone();
                annotationMap.put(newPNTag.toUpperCase(), newPNValue);
            }
            else {
                pnValue.add(word.getToken().replaceAll("_", " "));
                newPNValue = (Vector<String>) pnValue.clone();
                annotationMap.put(word.getPN().toUpperCase(), newPNValue);
            }
        }
        sentenceAnnotationMap = (LinkedHashMap<String, Vector<String>>) annotationMap.clone();
        pnValue.clear();
    }
}
EDITED CODE
Replaced Vector with List and removed the cloning. However, this still doesn't solve my problem; it takes me back to square one, where my output is {PERSON=[], ORGANIZATION=[]}:
for (WSDSentence goldSentence : goldSentences)
{
    for (WSDElement word : goldSentence.getWsdElements()) {
        if (word.getPN() != null) {
            if (word.getPN().equals("group")) {
                String newPNTag = word.getPN().replace("group", "organization");
                pnValue.add(word.getToken().replaceAll("_", " "));
                newPNValue = (List<String>) pnValue;
                annotationMap.put(newPNTag.toUpperCase(), newPNValue);
            }
            else {
                pnValue.add(word.getToken().replaceAll("_", " "));
                newPNValue = pnValue;
                annotationMap.put(word.getPN().toUpperCase(), newPNValue);
            }
        }
        sentenceAnnotationMap = annotationMap;
    }
    pnValue.clear();
}
You're trying a bunch of stuff without really thinking through the logic behind it. There's no need to clear or clone anything; you just need to manage a separate list for each key. Here's the basic process for each new value:
If the map contains our key, get the list and add our value
Otherwise, create a new list, add our value, and add the list to the map
You've left out most of your variable declarations, so I won't try to show you the exact solution, but here's the general formula:
List<String> list = map.get(key); // try to get the list
if (list == null) {               // list doesn't exist?
    list = new ArrayList<>();     // create an empty list
    map.put(key, list);           // insert it into the map
}
list.add(value);                  // update the list
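On Java 8 and later, the same get-or-create pattern collapses into a single call to the standard Map.computeIfAbsent method:
map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);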
I have two maps:
Map<Date, List<Journey>> journeyMap = new TreeMap<Date, List<Journey>>();
Map<Date, List<Job>> jobMap = new TreeMap<Date, List<Job>>();
I used TreeMap because that means they're sorted by date, but I want to go through both maps at the same time, get the values of Journey/Job, and then do some work.
I think I could use generics, storing the Job/Journey as an Object and then checking instanceof, but I'm not sure if that's the solution?
Thanks.
Even though the others are right that there are better, safer and more comfortable ways to achieve whatever you want, it is possible to iterate over (the entries of) two Maps (aka Collections) at the same time.
// replace keySet() with your favorite method used in for-each loops
Iterator<Date> journeyIterator = journeyMap.keySet().iterator();
Iterator<Date> jobIterator = jobMap.keySet().iterator();
while (journeyIterator.hasNext() && jobIterator.hasNext()) {
    Date journeyDate = journeyIterator.next();
    Date jobDate = jobIterator.next();
    // ... do whatever you want with the data
}
This code does explicitly what a for-each loop can do implicitly for one Collection: it retrieves the Iterator and pulls the elements of the Collection from it, much like reading a file.
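If you need the values as well as the keys, the same lock-step iteration works on the entry sets. This is just a variant of the code above, under the same assumption that both maps are sorted the same way:
Iterator<Map.Entry<Date, List<Journey>>> journeys = journeyMap.entrySet().iterator();
Iterator<Map.Entry<Date, List<Job>>> jobs = jobMap.entrySet().iterator();
while (journeys.hasNext() && jobs.hasNext()) {
    Map.Entry<Date, List<Journey>> journeyEntry = journeys.next();
    Map.Entry<Date, List<Job>> jobEntry = jobs.next();
    // compare journeyEntry.getKey() with jobEntry.getKey() before assuming
    // the two entries actually describe the same date
}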
You're assuming that these maps keep their entries sorted in the very same way, but that is not guaranteed by the Map interface. If you want to write logic like this, you at least need to declare the implementing class as the reference type:
TreeMap<Date, List<Journey>> journeyMap = new TreeMap<Date, List<Journey>>();
TreeMap<Date, List<Job>> jobMap = new TreeMap<Date, List<Job>>();
but believe me, you don't want to do that.
You're right! Instead of creating 2 maps, create 1 holding a pair of Job/Journey objects: a JobJourneyHolder class which holds both objects will be a good solution.
Yes, defining a new class for that is definitely the solution, because it composes related objects together, which is very welcome in OOP. And you should not forget to implement hashCode() and equals() to make such a class work properly with Java collections:
public final class JourneyJob {
    final Journey journey;
    final Job job;

    public JourneyJob(Journey journey, Job job) {
        if (journey == null || job == null)
            throw new NullPointerException();
        this.journey = journey;
        this.job = job;
    }

    @Override
    public int hashCode() {
        return Objects.hash(journey, job); // requires java.util.Objects
    }

    @Override
    public boolean equals(Object obj) {
        // must take Object, not JourneyJob, or it won't override Object.equals
        // and hash-based collections will ignore it
        if (!(obj instanceof JourneyJob))
            return false;
        JourneyJob other = (JourneyJob) obj;
        return other.job.equals(job) && other.journey.equals(journey);
    }
}
To add elements to common Map:
Map<Date, List<JourneyJob>> map = new TreeMap<>();
...
if (map.containsKey(date)) {
    map.get(date).add(new JourneyJob(journey, job));
} else {
    map.put(date, new ArrayList<>(Arrays.asList(new JourneyJob(journey, job))));
}
...
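On Java 8+, that get-or-create branch can also be written as a single computeIfAbsent call:
map.computeIfAbsent(date, d -> new ArrayList<>()).add(new JourneyJob(journey, job));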
To retrieve JourneyJob objects:
for (List<JourneyJob> jjList : map.values()) {
    for (JourneyJob jj : jjList) {
        Journey journey = jj.journey;
        Job job = jj.job;
        // ... do your work here
    }
}
Or, if you use Java 8, this can be done using nested forEach():
map.values().stream().forEach(list ->
        list.stream().forEach(jj -> {
            Journey journey = jj.journey;
            Job job = jj.job;
            // ... do your work here
        })
);
I have an algorithmic problem at hand. To easily explain the problem, I will be using a simple analogy.
I have an input file
Country,Exports
Austrailia,Sheep
US, Apple
Austrailia,Beef
End Goal:
I have to find the common products between pairs of countries, so:
{"Austrailia","New Zealand"} : {"apple","sheep"}
{"Austrailia","US"} : {"apple"}
{"New Zealand","US"} : {"apple","milk"}
Process:
I read the input and store it in a TreeMap<String, List<String>>, where the Strings in the lists are interned due to the many duplicates.
Essentially, I am aggregating by country: the key is a country, the values are its exports.
{"austrailia":{"apple","sheep","koalas"}}
{"new zealand":{"apple","sheep","milk"}}
{"US":{"apple","beef","milk"}}
I have about 1200 keys (countries), and the total number of values (exports) is 80 million altogether.
I sort all the values of each key:
{"austrailia":{"apple","sheep","koalas"}} -- > {"austrailia":{"apple","koalas","sheep"}}
This is fast as there are only 1200 Lists to sort.
for (k1 : keys)
    for (k2 : keys)
        if (k1.compareTo(k2) < 0) { // don't want to compare each pair twice
            List<String> intersectList = intersectList_func(k1's exports, k2's exports);
            countriespair.put({k1, k2}, intersectList);
        }
This code block takes very long. I realise it is O(n^2), with around 1200*1200 comparisons; it has been running for almost 3 hours now.
Is there any way I can speed it up or optimise it?
An algorithmic improvement would be the best option, but are there other technologies to consider?
Edit:
Since both lists are sorted beforehand, intersectList is linear in the lengths of the two lists (the merge stops as soon as either list is exhausted), and NOT O(n^2) as discussed below:
private static List<String> intersectList(List<String> listOne, List<String> listTwo) {
    int i = 0, j = 0;
    List<String> listResult = new LinkedList<String>();
    while (i != listOne.size() && j != listTwo.size()) {
        int compareVal = listOne.get(i).compareTo(listTwo.get(j));
        if (compareVal == 0) {
            listResult.add(listOne.get(i));
            i++; j++;
        }
        else if (compareVal < 0) i++;
        else j++;
    }
    return listResult;
}
Update 22 Nov
My current implementation is still running, for almost 18 hours now. :|
Update 25 Nov
I ran the new implementation as suggested by Vikram and a few others; it's been running since this Friday.
My question is: how does grouping by exports rather than by country reduce the computational complexity? I find that the complexity is the same. As Groo mentioned, the complexity of the second part is O(E*C^2), where E is the number of exports and C is the number of countries.
This can be done in one statement as a self-join using SQL:
Test data. First, create a test data set:
Lines <- "Country,Exports
Austrailia,Sheep
Austrailia,Apple
New Zealand,Apple
New Zealand,Sheep
New Zealand,Milk
US,Apple
US,Milk
"
DF <- read.csv(text = Lines, as.is = TRUE)
sqldf. Now that we have DF, issue this command:
library(sqldf)
sqldf("select a.Country, b.Country, group_concat(Exports) Exports
from DF a, DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
")
giving this output:
Country Country Exports
1 Austrailia New Zealand Sheep,Apple
2 Austrailia US Apple
3 New Zealand US Apple,Milk
With an index. If it's too slow, add an index to the Country column (and be sure not to forget the main. parts):
sqldf(c("create index idx on DF(Country)",
"select a.Country, b.Country, group_concat(Exports) Exports
from main.DF a, main.DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
"))
If you run out of memory, then add the dbname = tempfile() argument to sqldf so that it uses disk.
Store something like the following data structure (the following is pseudocode):
ValuesSet = {
    apple = {"Austrailia", "New Zealand", ...}
    sheep = {"Austrailia", "New Zealand", ...}
}

for k in ValuesSet
    for k1 in k.values()
        for k2 in k.values()
            if (k1 < k2)
                Set(k1, k2).add(k)
Time complexity: O(number of distinct pairs with common products).
Note: I might be wrong, but I do not think you can reduce this time complexity.
The following is a Java implementation for your problem:
import java.io.*;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PairMatching {

    HashMap<String, Integer> Country;          // country name -> country index
    ArrayList<String> CountNames;              // country index -> country name
    HashMap<String, Integer> ProdtoIndex;      // product name -> product index
    ArrayList<ArrayList<Integer>> ProdtoCount; // product index -> indices of countries exporting it
    ArrayList<String> ProdNames;               // product index -> product name
    ArrayList<Integer>[][] Pairs;              // country pair -> indices of common products
    int products = 0;
    int countries = 0;

    public void readfile(String filename) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
            String line;
            CountNames = new ArrayList<>();
            Country = new HashMap<>();
            ProdtoIndex = new HashMap<>();
            ProdtoCount = new ArrayList<>();
            ProdNames = new ArrayList<>();
            products = countries = 0;
            while ((line = br.readLine()) != null) {
                String[] s = line.split(",");
                s[0] = s[0].trim();
                s[1] = s[1].trim();
                int k;
                if (!Country.containsKey(s[0])) {
                    CountNames.add(s[0]);
                    Country.put(s[0], countries);
                    k = countries;
                    countries++;
                } else {
                    k = Country.get(s[0]);
                }
                if (!ProdtoIndex.containsKey(s[1])) {
                    ProdNames.add(s[1]);
                    ArrayList<Integer> n = new ArrayList<>();
                    ProdtoIndex.put(s[1], products);
                    n.add(k);
                    ProdtoCount.add(n);
                    products++;
                } else {
                    int ind = ProdtoIndex.get(s[1]);
                    ProdtoCount.get(ind).add(k);
                }
            }
            System.out.println(CountNames);
            System.out.println(ProdtoCount);
            System.out.println(ProdNames);
        } catch (FileNotFoundException ex) {
            Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    @SuppressWarnings("unchecked")
    void FindPairs() {
        Pairs = new ArrayList[countries][countries];
        for (int i = 0; i < ProdNames.size(); i++) {
            ArrayList<Integer> curr = ProdtoCount.get(i);
            for (int j = 0; j < curr.size(); j++) {
                for (int k = j + 1; k < curr.size(); k++) {
                    int u = curr.get(j);
                    int v = curr.get(k);
                    if (Pairs[u][v] == null) {
                        if (Pairs[v][u] != null) {
                            Pairs[v][u].add(i);
                        } else {
                            Pairs[u][v] = new ArrayList<>();
                            Pairs[u][v].add(i);
                        }
                    } else {
                        Pairs[u][v].add(i);
                    }
                }
            }
        }
        for (int i = 0; i < countries; i++) {
            for (int j = 0; j < countries; j++) {
                if (Pairs[i][j] == null)
                    continue;
                ArrayList<Integer> a = Pairs[i][j];
                System.out.print("\n{" + CountNames.get(i) + "," + CountNames.get(j) + "} : ");
                for (int k = 0; k < a.size(); k++) {
                    System.out.print(ProdNames.get(a.get(k)) + " ");
                }
            }
        }
    }

    public static void main(String[] args) {
        PairMatching pm = new PairMatching();
        pm.readfile("Input data/BigData.txt");
        pm.FindPairs();
    }
}
[Update] The algorithm presented here shouldn't improve time complexity compared to the OP's original algorithm. Both algorithms have the same asymptotic complexity, and iterating through sorted lists (as OP does) should generally perform better than using a hash table.
You need to group the items by product, not by country, in order to be able to quickly fetch all countries belonging to a certain product.
This would be the pseudocode:
inputList contains a list of pairs {country, product}

// group by product
prepare mapA (product) => (list_of_countries)
for each {country, product} in inputList
{
    if mapA does not contain (product)
        create a new empty (list_of_countries)
        and add it to mapA with (product) as key
    add this (country) to the (list_of_countries)
}

// now group by country_pair
prepare mapB (country_pair) => (list_of_products)
for each {product, list_of_countries} in mapA
{
    for each pair {countryA, countryB} in list_of_countries
    {
        if mapB does not contain country_pair {countryA, countryB}
            create a new empty (list_of_products)
            and add it to mapB with country_pair {countryA, countryB} as key
        add this (product) to the (list_of_products)
    }
}
If your input list is length N, and you have C distinct countries and P distinct products, then the running time of this algorithm should be O(N) for the first part and O(P*C^2) for the second part. Since your final list needs to have pairs of countries mapping to lists of products, I don't think you will be able to lose the P*C^2 complexity in any case.
I don't code in Java too much, so I added a C# example which I believe you'll be able to port pretty easily:
// mapA maps each product to a list of countries
var mapA = new Dictionary<string, List<string>>();
foreach (var t in inputList)
{
    List<string> countries = null;
    if (!mapA.TryGetValue(t.Product, out countries))
    {
        countries = new List<string>();
        mapA[t.Product] = countries;
    }
    countries.Add(t.Country);
}
// note (this is very important):
// CountryPair tuple must have value-type comparison semantics,
// i.e. you need to ensure that two CountryPairs are compared
// by value to allow hashing (mapping) to work correctly, in O(1).
// In C# you can also simply use a Tuple<string,string> to
// represent a pair of countries (which implements this correctly),
// but I used a custom class to emphasize the algorithm
// mapB maps each CountryPair to a list of products
var mapB = new Dictionary<CountryPair, List<string>>();
foreach (var kvp in mapA)
{
    var product = kvp.Key;
    var countries = kvp.Value;
    for (int i = 0; i < countries.Count; i++)
    {
        for (int j = i + 1; j < countries.Count; j++)
        {
            var pair = CountryPair.Create(countries[i], countries[j]);
            List<string> productsForCountryPair = null;
            if (!mapB.TryGetValue(pair, out productsForCountryPair))
            {
                productsForCountryPair = new List<string>();
                mapB[pair] = productsForCountryPair;
            }
            productsForCountryPair.Add(product);
        }
    }
}
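For reference, a rough Java port of the two loops above. This is only a sketch: the InputPair type and the CountryPair class with value-based equals()/hashCode() are assumed, as the note in the code explains:
// mapA maps each product to a list of countries
Map<String, List<String>> mapA = new HashMap<>();
for (InputPair t : inputList) {
    mapA.computeIfAbsent(t.product, p -> new ArrayList<>()).add(t.country);
}

// mapB maps each CountryPair to a list of products
Map<CountryPair, List<String>> mapB = new HashMap<>();
for (Map.Entry<String, List<String>> e : mapA.entrySet()) {
    List<String> countries = e.getValue();
    for (int i = 0; i < countries.size(); i++) {
        for (int j = i + 1; j < countries.size(); j++) {
            CountryPair pair = new CountryPair(countries.get(i), countries.get(j));
            mapB.computeIfAbsent(pair, p -> new ArrayList<>()).add(e.getKey());
        }
    }
}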
This is a great example for using MapReduce.
In your map phase you just collect all the exports that belong to each country.
Then the reducer sorts the products (the products belong to the same country, because of the mapper).
You will benefit from a distributed, parallel algorithm that can be run on a cluster, as sketched below.
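A minimal single-machine sketch of those two phases (illustrative only: a real job would express the same steps through Hadoop or Spark APIs, and inputPairs is an assumed list of {country, product} rows):
// "map" phase: emit (country, product) pairs, grouped by country
Map<String, List<String>> exportsByCountry = new HashMap<>();
for (String[] pair : inputPairs) {
    exportsByCountry.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
}
// "reduce" phase: each country's products arrive at one reducer and get sorted
exportsByCountry.values().forEach(Collections::sort);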
You are actually taking O(n^2 * time required for one intersect).
Let's see if we can improve the time for the intersect. We can maintain a map for every country which stores its products, so you have n hash maps for n countries; you just need to iterate through all products once to initialize them. If you want quick lookup, maintain a map of maps:
HashMap<String,HashMap<String,Boolean>> countryMap = new HashMap<String, HashMap<String,Boolean>>();
Now if you want to find the common products of countries str1 and str2, do:
HashMap<String, Boolean> map1 = countryMap.get("str1");
HashMap<String, Boolean> map2 = countryMap.get("str2");
ArrayList<String> common = new ArrayList<String>();
Iterator<Map.Entry<String, Boolean>> it = map1.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, Boolean> pairs = it.next();
    // add to common if it is also present in the other map
    if (map2.containsKey(pairs.getKey()))
        common.add(pairs.getKey());
}
So in total it will be O(n^2 * k) if there are k entries in one map, assuming a hash map lookup is O(1) (it can degrade towards O(log k) in Java's worst case).
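Since the Boolean values are never used, a Set works just as well, and the whole intersection loop collapses into the standard retainAll call. A sketch, assuming countryMap is changed to a HashMap<String, Set<String>>:
Set<String> common = new HashSet<>(countryMap.get("str1"));
common.retainAll(countryMap.get("str2")); // keep only the products present in both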
Using hashmaps where necessary to speed things up:
1) Go through the data and create a map whose keys are items and whose values are lists of the countries associated with each item, e.g. Sheep: Australia, US, UK, New Zealand, ...
2) Create a hashmap whose keys are pairs of countries, with an (initially) empty list as each value.
3) For each item, retrieve the list of countries associated with it and, for each pair of countries within that list, add the item to the list created for that pair in step (2).
4) Now output the resulting list for each pair of countries.
The largest costs are in steps (3) and (4), and both of these costs are linear in the amount of output produced, so I think this is not too far from optimal.
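A minimal Java sketch of steps (1) to (3), assuming the input is available as a list of String[]{country, item} rows (the variable names are illustrative):
// Step 1: item -> countries that export it
Map<String, List<String>> countriesByItem = new HashMap<>();
for (String[] row : inputRows) {
    countriesByItem.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
}

// Steps 2 and 3: country pair -> common items
Map<String, List<String>> itemsByPair = new HashMap<>();
for (Map.Entry<String, List<String>> e : countriesByItem.entrySet()) {
    List<String> cs = e.getValue();
    for (int i = 0; i < cs.size(); i++) {
        for (int j = i + 1; j < cs.size(); j++) {
            String a = cs.get(i), b = cs.get(j);
            // normalize the pair key so (A,B) and (B,A) map to the same entry
            String pair = a.compareTo(b) < 0 ? a + "," + b : b + "," + a;
            itemsByPair.computeIfAbsent(pair, k -> new ArrayList<>()).add(e.getKey());
        }
    }
}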
A database call is made, and the result is a bunch of rows of two string columns of types A and B, e.g. (x_a, y_b), (x_a, y1_b), (x2_a, y_b).
The idea is to come up with a list of maps like {(x_a, {y_b, y1_b}), (x2_a, {y_b})}, where the objects of type A are not repeated, and to do this while pulling the results from the database.
Here's what I tried:
int i = 0;
List<String> type2 = new ArrayList<String>();
Map<String, List<String>> type1_type2 = new HashMap<String, List<String>>();
List<Map<String, List<String>>> list_type1_type2 = new ArrayList<Map<String, List<String>>>();
String[] type1Array = new String[100];
String[] type2Array = new String[100];
int trackStart = 0;
while (res.next()) {
    String type1Value = res.getString(1); // renamed: the original names shadowed the collections above
    String type2Value = res.getString(2);
    type1Array[i] = type1Value;
    type2Array[i] = type2Value;
    if (i > 0 && !type1Array[i].equals(type1Array[i - 1])) { // compare consecutive type1 values
        int trackStop = i;
        for (int j = trackStart; j < trackStop; j++) {
            type2.add(type2Array[j]);
        }
        type1_type2.put(type1Array[i - 1], type2);
        list_type1_type2.add(type1_type2);
        // debugging stuff
        String x = list_type1_type2.toString();
        System.out.println(x);
        System.out.println(" printing because " + type1Array[i] + " is not equal to " + type1Array[i - 1]);
        type2 = new ArrayList<String>();
        type1_type2 = new HashMap<String, List<String>>();
        trackStart = i;
    }
    i++;
}
This method does not work when the last type1 values of the result object are the same.
Is there a way to do this in the same spirit (within the while (res.next()) loop) without first storing the results of the database call in separate arrays, or adding an extra for loop outside the while loop to "patch it up"?
The simple way to do this is to use a Guava / Google Collections SetMultimap. This is essentially a mapping from a key (your 'A' objects) to a set of values (your 'B' objects).
[I'm not going to try to code it for you. Your current code is too horrible to read ... unless you were paying me :-) ]
However, a better idea would be to get the database to do the collation. If you can do that, you will reduce the amount of (redundant) data that gets sent across the database connection ... assuming that you are using JDBC.
If you don't want duplicates like {x_a: [y_b, y_b]}, then use a set as the value type of your map:
Map<String,Set<String>> type1_type2;
I don't know what the various other lists and arrays are for; you can probably get by with just the type1_type2 map. Process each (x, y) in pseudocode:
Set s = type1_type2.get(x)
if s == null:
    s = new Set()
    type1_type2.put(x, s)
s.add(y)
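In Java, and directly inside the while (res.next()) loop from the question, that pseudocode becomes the following sketch (LinkedHashMap/LinkedHashSet are used here only to preserve encounter order):
Map<String, Set<String>> type1_type2 = new LinkedHashMap<>();
while (res.next()) {
    String a = res.getString(1);
    String b = res.getString(2);
    // get-or-create the set for this key, then add the value
    type1_type2.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
}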