Iterate big hashmap in parallel

I have a LinkedHashMap that may contain up to 300k records. I want to iterate over this map in parallel to improve performance. The function iterates through the map of vectors and computes the dot product of a given vector against every other vector in the map. It also applies a check based on a date value, and it returns a nested HashMap.
This is the code using an iterator:
public HashMap<String,HashMap<String,Double>> function1(String key, int days) {
LocalDate date = LocalDate.now().minusDays(days);
HashMap<String,Double> ret = new HashMap<>();
HashMap<String,Double> ret2 = new HashMap<>();
OpenMapRealVector v0 = map.get(key).value;
for(Map.Entry<String, FixedTimeHashMap<OpenMapRealVector>> e: map.entrySet()) {
if(!e.getKey().equals(key)) {
Double d = v0.dotProduct(e.getValue().value);
d = Double.parseDouble(new DecimalFormat("###.##").format(d));
ret.put(e.getKey(),d);
if(e.getValue().date.isAfter(date)){
ret2.put(e.getKey(),d);
}
}
}
HashMap<String,HashMap<String,Double>> result = new HashMap<>();
result.put("dot",ret);
result.put("anomaly",ret2);
return result;
}
Update:
I looked into Java 8 streams, but I am running into ClassCastException and NullPointerException when using the parallel stream, because this map is being modified elsewhere.
Code:
public HashMap<String,HashMap<String,Double>> function1(String key, int days) {
LocalDate date = LocalDate.now().minusDays(days);
HashMap<String,Double> ret = new HashMap<>();
HashMap<String,Double> ret2 = new HashMap<>();
OpenMapRealVector v0 = map.get(key).value;
synchronized (map) {
map.entrySet().parallelStream().forEach(e -> {
if(!e.getKey().equals(key)) {
Double d = v0.dotProduct(e.getValue().value);
d = Double.parseDouble(new DecimalFormat("###.##").format(d));
ret.put(e.getKey(),d);
if(e.getValue().date.isAfter(date)) {
ret2.put(e.getKey(),d);
}
}
});
}
}
I have synchronized the map usage, but it still gives me the following errors:
java.util.concurrent.ExecutionException: java.lang.ClassCastException
Caused by: java.lang.ClassCastException
Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
Also, should I split the map into multiple pieces and process each piece on a different thread in parallel?

You need to retrieve the Set<Map.Entry<K, V>> from the map and stream over it.
Here's how you iterate over a Map using parallel streams in Java 8:
Map<String, String> myMap = new HashMap<> ();
myMap.entrySet ()
.parallelStream ()
.forEach (entry -> {
String key = entry.getKey ();
String value = entry.getValue ();
// add whatever processing you need here, using the retrieved key / value
// ret.put (....);
// ret2.put (....)
});
Clarification:
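The maps ret and ret2 should be declared as ConcurrentHashMap instances to allow concurrent inserts / updates from multiple threads. A plain HashMap is not thread-safe, and concurrent put calls can corrupt its internal structure, which is consistent with the HashMap$Node / HashMap$TreeNode ClassCastException shown in the question.
So the declarations of the two maps become: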
Map<String,Double> ret = new ConcurrentHashMap<> ();
Map<String,Double> ret2 = new ConcurrentHashMap<> ();
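Putting the pieces above together, here is a minimal sketch rather than a definitive implementation. It assumes that every writer of the shared map synchronizes on the map object, so a snapshot can be copied safely before streaming. Item, dot and similarities are hypothetical names, plain double[] arrays stand in for OpenMapRealVector, and Math.round replaces the DecimalFormat round trip from the question.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ParallelDotSketch {

    // Hypothetical stand-in for FixedTimeHashMap<OpenMapRealVector>.
    static class Item {
        final double[] value;
        final LocalDate date;

        Item(double[] value, LocalDate date) {
            this.value = value;
            this.date = date;
        }
    }

    // Plain dot product of two equally sized arrays.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static Map<String, Map<String, Double>> similarities(
            Map<String, Item> map, String key, LocalDate cutoff) {
        // Copy the map while holding its lock so the parallel stream never sees
        // a concurrent modification. Assumption: writers lock on the same object.
        Map<String, Item> snapshot;
        synchronized (map) {
            snapshot = new LinkedHashMap<>(map);
        }
        double[] v0 = snapshot.get(key).value;

        // Thread-safe result maps, written to from several worker threads.
        Map<String, Double> ret = new ConcurrentHashMap<>();
        Map<String, Double> ret2 = new ConcurrentHashMap<>();
        snapshot.entrySet().parallelStream()
                .filter(e -> !e.getKey().equals(key))
                .forEach(e -> {
                    // Round to two decimals.
                    double d = Math.round(dot(v0, e.getValue().value) * 100.0) / 100.0;
                    ret.put(e.getKey(), d);
                    if (e.getValue().date.isAfter(cutoff)) {
                        ret2.put(e.getKey(), d);
                    }
                });

        Map<String, Map<String, Double>> result = new HashMap<>();
        result.put("dot", ret);
        result.put("anomaly", ret2);
        return result;
    }
}
```

Copying 300k entries costs time and memory. If that is too expensive, making the shared map itself a ConcurrentHashMap is another option, at the price of losing insertion order.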

One possible solution using Java 8 would be,
Map<String, Double> dotMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
.collect(Collectors.toMap(Map.Entry::getKey, e -> Double
.parseDouble(new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value)))));
Map<String, Double> anomalyMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
.filter(e -> e.getValue().date.isAfter(date))
.collect(Collectors.toMap(Map.Entry::getKey, e -> Double
.parseDouble(new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value)))));
result.put("dot", dotMap);
result.put("anomaly", anomalyMap);
Update
Here's a more elegant solution:
Map<String, Map<String, Double>> resultMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
.collect(Collectors.groupingBy(e -> e.getValue().date.isAfter(date) ? "anomaly" : "dot",
Collectors.toMap(Map.Entry::getKey, e -> Double.parseDouble(
new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value))))));
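Here we first group the entries under "anomaly" or "dot", and then use a downstream Collector to create a Map for each group. Note that groupingBy puts each entry in exactly one group, so "dot" here holds only the entries that are not anomalies, whereas the original loop puts every entry in "dot". I have also updated the .filter() criteria based on the suggestions received.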
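If "dot" should keep every entry, as in the question's loop, one option is to build it first and derive "anomaly" from it. The following is only a sketch under simplifying assumptions: Scored is a hypothetical type whose score field stands in for the already rounded dot product, and collect is a hypothetical method name.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class DotAndAnomalySketch {

    // Hypothetical entry type: score stands in for the rounded dot product.
    static class Scored {
        final double score;
        final LocalDate date;

        Scored(double score, LocalDate date) {
            this.score = score;
            this.date = date;
        }
    }

    static Map<String, Map<String, Double>> collect(
            Map<String, Scored> map, String key, LocalDate cutoff) {
        // "dot": every entry except the query key.
        Map<String, Double> dot = map.entrySet().stream()
                .filter(e -> !e.getKey().equals(key))
                .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().score));

        // "anomaly": the subset of "dot" whose date is after the cutoff.
        Map<String, Double> anomaly = dot.entrySet().stream()
                .filter(e -> map.get(e.getKey()).date.isAfter(cutoff))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        Map<String, Map<String, Double>> result = new HashMap<>();
        result.put("dot", dot);
        result.put("anomaly", anomaly);
        return result;
    }
}
```

Each score is computed once and reused, which also avoids the duplicated dot product of the first two-stream version.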

Related

delete from parent hashmap from a nested hashmap condition

I have the following structure
HashMap<String, HashMap<String, String>> h = new HashMap<>();
HashMap<String, String> h1 = new HashMap<>();
h1.put("key10", "value10");
h1.put("key11", "value11");
h1.put("date", "2018-10-18T00:00:57.907Z");
h.put("1#100", h1);
HashMap<String, String> h2 = new HashMap<>();
h2.put("key20", "value20");
h2.put("key21", "value21");
h2.put("date", "2023-02-03T10:00:00.907Z");
h.put("2#000", h2);
Imagine I have many entries like the examples above.
At a certain moment (in a scheduler) I have this requirement:
check all nested hash maps (for each/stream)
see if the date condition is true
find the parent key and delete it from the main hash map
In this example the final hash map will be
h2.put("key20", "value20");
h2.put("key21", "value21");
h2.put("date", "2023-02-03T10:00:00.907Z");
h.put("2#000", h2);
h2 => {key20 => value20, key21 => value21, date => 2023-02-03T10:00:00.907Z}
I have this code right now:
h.forEach((k,v) -> {
v.entrySet()
.stream()
.filter(e -> e.getKey().equals("date"))
.filter(t -> Timestamp.from(Instant.now()).getTime() - Timestamp.valueOf(t.getValue()).getTime() > milisDiff)
//need now to access parent and delete with by k key
Can I do this in one step (lambda), or do I need an extra structure to collect the parent keys and then delete them in a separate loop?

This may do what you want. Just filter out bad elements and assign to the same map.
HashMap<String, HashMap<String, String>> h = new HashMap<>();
HashMap<String, String> h1 = new HashMap<>();
h1.put("key10", "value10");
h1.put("key11", "value11");
h1.put("date", "2018-10-18T00:00:57.907Z");
h.put("1#100", h1);
HashMap<String, String> h2 = new HashMap<>();
h2.put("key20", "value20");
h2.put("key21", "value21");
h2.put("date", "2023-02-04T10:00:00.907Z");
h.put("2#000", h2);
// any instant after `now` will pass the filter and be put in the map
Predicate<String> check = str -> Instant.parse(str)
.isAfter(Instant.now());
h = h.entrySet().stream()
.filter(e -> check.test(e.getValue().get("date")))
.collect(Collectors.toMap(Entry::getKey, Entry::getValue,
(a,b)->a,
HashMap::new));
h.values().forEach(m -> {
m.entrySet().forEach(System.out::println);
});
prints
date=2023-02-04T10:00:00.907Z
key21=value21
key20=value20
My predicate simply removes an entry if its date has passed. Yours uses a tighter threshold.
Updated
Here is another option in case building a new map takes too long. It uses an iterator to run through the entries and modifies the existing map by removing the nested maps with old dates.
Iterator<Entry<String,HashMap<String,String>>> it = h.entrySet().iterator();
while (it.hasNext()) {
Entry<String,HashMap<String, String>> e = it.next();
if (!check.test(e.getValue().get("date"))) {
it.remove();
}
}
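The iterator loop above can also be written with Collection.removeIf, which removes the matching parent entries in place. This is a minimal sketch that assumes every nested map has a "date" value in ISO-8601 form; removeExpired, now and maxAgeMillis are hypothetical names, with maxAgeMillis playing the role of milisDiff from the question. Instant.parse is used because Timestamp.valueOf expects the "yyyy-mm-dd hh:mm:ss" form and would reject these strings.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;

class ExpiredEntriesSketch {

    // Removes every parent entry whose nested "date" is older than maxAgeMillis.
    static void removeExpired(HashMap<String, HashMap<String, String>> h,
                              Instant now, long maxAgeMillis) {
        // values() is a live view, so removing from it removes from the parent map.
        h.values().removeIf(inner ->
                Duration.between(Instant.parse(inner.get("date")), now).toMillis() > maxAgeMillis);
    }
}
```

Passing the current time as a parameter keeps the method easy to test; a scheduler would simply pass Instant.now().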

Get Submap from Map using PredicateMap

I want to get a submap using a PredicatedMap.
I have tried this:
public class first {
public static void main(String[] args)
{
TreeMap<String, String> myMap = new TreeMap<String, String>();
Predicate onlyStrings = new InstanceofPredicate( String.class );
myMap.put("Key1","1");
myMap.put("Key2","2");
myMap.put("Key3","3");
System.out.println("Before using submap: "+ myMap );
Predicate pred1 = new EqualPredicate( "1" );
Predicate pred2 = new EqualPredicate( "2" );
Predicate rule = new OrPredicate( pred1, pred2 );
Map map = PredicatedMap.decorate( myMap, onlyStrings, rule );
System.out.println("After using submap: "+ map );
}
}
I am not able to get the desired submap, which is the following:
Initial Map: {key1=1, key2=2, key3=3}
Output (submap): {key2=2, key3=3}
Can someone please help with this?

It doesn't seem that PredicatedMap does what you want to achieve. It works more like a validator applied when new entries are added to the map.
If you want to extract some entries from a map based on a predicate, the Stream API from the JDK should be enough.
If you don't mind modifying the initial map:
myMap.entrySet().removeIf( e -> !(e.getValue().equals("1") || e.getValue().equals("2")));
If you want to keep the initial map and create a new one:
Map<String, String> collect = myMap.entrySet().stream().filter(x -> x.getValue().equals("1") || x.getValue().equals("2"))
.collect(Collectors.toMap(e -> e.getKey(),e -> e.getValue()));
If you have a bigger list of values that you want to keep, you can create a set of them (Set.of requires Java 9 or later):
Set<String> values = Set.of("1","2");
and filter based on this set:
collect = myMap.entrySet().stream().filter(x -> values.contains(x.getValue()))
.collect(Collectors.toMap(e -> e.getKey(),e -> e.getValue()));
Or, for the case of modifying the initial map:
myMap.entrySet().removeIf( e -> !values.contains(e.getValue()));
In my opinion it looks a bit clearer if you extract the values to keep into a set.
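One detail worth noting: the two-argument Collectors.toMap makes no guarantee about the type of map it returns, so the sorted order of the original TreeMap is not necessarily preserved. A possible sketch that keeps the result sorted uses the four-argument toMap with a TreeMap supplier. filterByValue is a hypothetical helper name, not part of any library.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class SubMapSketch {

    // Returns a new sorted map with only the entries whose value passes the predicate.
    static <K, V> TreeMap<K, V> filterByValue(Map<K, V> source, Predicate<? super V> keep) {
        return source.entrySet().stream()
                .filter(e -> keep.test(e.getValue()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,      // keys are unique, so this merge is never needed
                        TreeMap::new));   // keep the result sorted by key
    }
}
```

With the map from the question, filterByValue(myMap, values::contains) would leave myMap untouched and return only the entries whose values are in the set.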

How can I convert this source code to lambda?

The data consists of maps inside list objects. I match the entries that have the same ID by comparing them in nested loops. How can I convert this to lambdas?
List<Map<String, String>> combineList = new ArrayList<>(); // Temp List
for(Map<String, String> titleMap : titleList) { // Name List
for(Map<String, String> codeMap : codeList) { // Age List
if(titleMap.get("ID").equals(codeMap.get("ID"))) { // compare Id
Map<String,String> tempMap = new HashMap<>();
tempMap.put("ID", titleMap.get("ID"));
tempMap.put("NAME", titleMap.get("NAME"));
tempMap.put("AGE", codeMap.get("AGE"));
combineList.add(tempMap);
}
}
}

Your loop is already a reasonable way to do it. If you want, you could change the same code to just use stream().forEach, or, if you want to use streams more fully, do it as below:
titleList.stream()
.forEach(titleMap ->
combineList.addAll(
codeList.stream()
.filter(codeMap -> titleMap.get("ID").equals(codeMap.get("ID")))
.map(codeMap -> {
Map<String, String> tempMap = new HashMap<>();
tempMap.put("ID", titleMap.get("ID"));
tempMap.put("NAME", titleMap.get("NAME"));
tempMap.put("AGE", codeMap.get("AGE"));
return tempMap;
})
.collect(Collectors.toList())
)
);
Notice that you have to filter codeList for every element of titleList, because your condition compares each pair. Consider using a class in place of Map to make the code cleaner and more efficient.
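If the repeated filtering of codeList becomes a cost, one alternative is to index codeList by ID once and then look up each title. This is only a sketch, assuming both lists hold Map<String, String> and that every map has a non-null "ID"; combine and codesById are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JoinByIdSketch {

    static List<Map<String, String>> combine(List<Map<String, String>> titleList,
                                             List<Map<String, String>> codeList) {
        // Index the code maps by ID once, instead of scanning codeList per title.
        Map<String, List<Map<String, String>>> codesById = codeList.stream()
                .collect(Collectors.groupingBy(codeMap -> codeMap.get("ID")));

        return titleList.stream()
                .flatMap(titleMap -> codesById
                        .getOrDefault(titleMap.get("ID"), Collections.<Map<String, String>>emptyList())
                        .stream()
                        .map(codeMap -> {
                            Map<String, String> row = new HashMap<>();
                            row.put("ID", titleMap.get("ID"));
                            row.put("NAME", titleMap.get("NAME"));
                            row.put("AGE", codeMap.get("AGE"));
                            return row;
                        }))
                .collect(Collectors.toList());
    }
}
```

The result is returned rather than added to an outer list, so the stream has no side effects.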

Accumulator not working properly in parallel stream

I wrote a collector that reduces a stream to a map whose keys are the items that customers want to buy and whose values are the names of those customers. My implementation works properly with a sequential stream,
but when I try to use a parallel stream it does not work at all: the resulting sets always contain only one customer name.
List<Customer> customerList = this.mall.getCustomerList();
Supplier<Object> supplier = ConcurrentHashMap<String,Set<String>>::new;
BiConsumer<Object, Customer> accumulator = ((o, customer) -> customer.getWantToBuy().stream().map(Item::getName).forEach(
item -> ((ConcurrentHashMap<String,Set<String>>)o)
.merge(item,new HashSet<String>(Collections.singleton(customer.getName())),
(s,s2) -> {
HashSet<String> res = new HashSet<>(s);
res.addAll(s2);
return res;
})
));
BinaryOperator<Object> combiner = (o,o2) -> {
ConcurrentHashMap<String,Set<String>> res = new ConcurrentHashMap<>((ConcurrentHashMap<String,Set<String>>)o);
res.putAll((ConcurrentHashMap<String,Set<String>>)o2);
return res;
};
Function<Object, Map<String, Set<String>>> finisher = (o) -> new HashMap<>((ConcurrentHashMap<String,Set<String>>)o);
Collector<Customer, ?, Map<String, Set<String>>> toItemAsKey =
new CollectorImpl<>(supplier, accumulator, combiner, finisher, EnumSet.of(
Collector.Characteristics.CONCURRENT,
Collector.Characteristics.IDENTITY_FINISH));
Map<String, Set<String>> itemMap = customerList.stream().parallel().collect(toItemAsKey);
There is certainly a problem in my accumulator or in one of the other functions, but I cannot figure it out. Could anyone suggest what I should do?

Your combiner is not implemented correctly.
It overwrites every entry that has the same key. What you want is to add the values to the existing keys.
BinaryOperator<ConcurrentHashMap<String,Set<String>>> combiner = (o,o2) -> {
ConcurrentHashMap<String,Set<String>> res = new ConcurrentHashMap<>(o);
o2.forEach((key, set) -> set.forEach(string -> res.computeIfAbsent(key, k -> new HashSet<>())
.add(string)));
return res;
};
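For comparison, the same idea of merging sets in the combiner instead of overwriting them can be expressed with the public Collector.of factory and plain HashMap containers. This is a hedged sketch, not the asker's code: the Customer class here is a simplified hypothetical stand-in whose getWantToBuy returns item names directly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;

class ItemsByCustomerSketch {

    // Simplified, hypothetical customer: a name plus the names of wanted items.
    static class Customer {
        private final String name;
        private final List<String> wantToBuy;

        Customer(String name, List<String> wantToBuy) {
            this.name = name;
            this.wantToBuy = wantToBuy;
        }

        String getName() { return name; }
        List<String> getWantToBuy() { return wantToBuy; }
    }

    static Collector<Customer, ?, Map<String, Set<String>>> toItemAsKey() {
        return Collector.<Customer, Map<String, Set<String>>>of(
                HashMap::new,
                // Accumulator: add the customer's name under each wanted item.
                (map, customer) -> customer.getWantToBuy().forEach(item ->
                        map.computeIfAbsent(item, k -> new HashSet<>()).add(customer.getName())),
                // Combiner: union the name sets instead of overwriting them.
                (left, right) -> {
                    right.forEach((item, names) ->
                            left.merge(item, names, (a, b) -> { a.addAll(b); return a; }));
                    return left;
                });
    }
}
```

Because no CONCURRENT characteristic is declared, each worker thread fills its own HashMap and the partial maps are only joined in the combiner, so no thread-safe container is needed.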

Updating Values in Map on the basis of other map in Java

Map<String, String> map1 = new HashMap<>();
map1.put("k1", "v1");
map1.put("k2", "v2");
map1.put("k3", "v3");
Map<String, String> map2 = new HashMap<>();
map2.put("v1", "val1");
map2.put("v2", "val2");
map2.put("v3", "val3");
I want to update values of map1 so that it has entries:
"k1" , "val1",
"k2" , "val2",
"k3" , "val3"
My solution:
for (Map.Entry<String, String> entry : map1.entrySet()) {
map1.put(entry.getKey(), map2.get(entry.getValue()));
}
Is there any better way to do this?
Edit: I am using Java 7, but I am curious to know if there is any better way in Java 8.

Starting with Java 8, you can just have
map1.replaceAll((k, v) -> map2.get(v));
replaceAll(function) will replace all values from the map map1 with the result of applying the given function. In this case, the function simply retrieves the value from map2.
Note that this solution has the same issue as your initial code: if map2 doesn't have a corresponding mapping, null will be returned. You may want to call getOrDefault to have a default value in that case.
public static void main(String[] args) {
Map<String, String> map1 = new HashMap<>();
map1.put("k1", "v1");
map1.put("k2", "v2");
map1.put("k3", "v3");
Map<String, String> map2 = new HashMap<>();
map2.put("v1", "val1");
map2.put("v2", "val2");
map2.put("v3", "val3");
map1.replaceAll((k, v) -> map2.get(v));
System.out.println(map1); // prints "{k1=val1, k2=val2, k3=val3}"
}
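The getOrDefault variant mentioned above could look like the following sketch. Keeping the old value when map2 has no mapping is just one possible default, chosen here for illustration; remap is a hypothetical method name.

```java
import java.util.Map;

class RemapSketch {

    // Replaces each value of map1 with its translation from map2.
    // Assumption for this sketch: a value without a translation is kept as it is.
    static void remap(Map<String, String> map1, Map<String, String> map2) {
        map1.replaceAll((k, v) -> map2.getOrDefault(v, v));
    }
}
```

A fixed placeholder such as an empty string would work the same way by passing it as the second argument of getOrDefault.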

For Java 7 there is nothing more you can do; you are already doing it in the best way possible.
I'm adding this answer as a reference to show that, for this case, using lambda expressions in Java 8 turned out to be even slower in my test. See this example:
public static void main(String[] args) {
Map<String, String> map1 = new HashMap<>();
final Map<String, String> map2 = new HashMap<>();
for ( int i=0; i<100000; i++ ){
map1.put("k"+i, "v"+i);
map2.put("v"+i, "val"+i);
}
long time;
long prev_time = System.currentTimeMillis();
for (Map.Entry<String, String> entry : map1.entrySet()) {
map1.put(entry.getKey(), map2.get(entry.getValue()));
}
time = System.currentTimeMillis() - prev_time;
System.out.println("Time after for loop " + time);
map1 = new HashMap<>();
for ( int i=0; i<100000; i++ ){
map1.put("k"+i, "v"+i);
}
prev_time = System.currentTimeMillis();
map1.replaceAll((k, v) -> map2.get(v));
time = System.currentTimeMillis() - prev_time;
System.out.println("Time after replaceAll " + time);
}
The output for this will be:
Time after for loop 40
Time after replaceAll 100
The second time varies but is always bigger than the first one.
I'm not a lambda specialist, but I guess there is more to be processed with it than with the plain "foreach" of the first scenario.
Running this test case over and over, you will almost always get about twice the time for the lambda compared to the first "foreach" case.

In Java 8 you can write:
Map<String, String> updated = map1.entrySet()
.stream()
.map(entry -> new SimpleEntry<>(entry.getKey(), map2.get(entry.getValue())))
.collect(Collectors.toMap(entry -> entry.getKey(), entry -> entry.getValue()));
Not the nicest thing, but it is a non-mutating solution.
