I have a stream that processes some strings and collects them in a map, but I'm getting the following exception:
java.lang.IllegalStateException:
Duplicate key test#yahoo.com
(attempted merging values [test#yahoo.com] and [test#yahoo.com])
at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
I'm using the following code:
Map<String, List<String>> map = emails.stream()
        .collect(Collectors.toMap(
                Function.identity(),
                email -> processEmails(email)
        ));
The flavor of toMap() you're using (which expects only a keyMapper and a valueMapper) disallows duplicates simply because it has no way to handle them, and the exception message tells you that explicitly.
Judging by the resulting type Map<String, List<String>> and by the exception message, which shows strings enclosed in square brackets, we can conclude that processEmails(email) produces a List<String> (although that's not obvious from your description and is, IMO, worth specifying).
There are multiple ways to solve this problem. You can either:
Use the three-argument version toMap(keyMapper, valueMapper, mergeFunction), whose third argument, mergeFunction, is a function responsible for resolving duplicates.
Map<String, List<String>> map = emails.stream()
        .collect(Collectors.toMap(
                Function.identity(),
                email -> processEmails(email),
                (list1, list2) -> list1 // or a function that merges the two lists, depending on the duplicate-resolution logic you need (see the sketch below)
        ));
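If you want to keep the values from both duplicates rather than discard one, the merge function can concatenate the two lists. A minimal sketch, assuming emails and processEmails() behave as in the question and copying into a fresh list so neither input is mutated:
Map<String, List<String>> map = emails.stream()
        .collect(Collectors.toMap(
                Function.identity(),
                email -> processEmails(email),
                (list1, list2) -> {
                    // copy into a new list so neither input list is mutated
                    List<String> merged = new ArrayList<>(list1);
                    merged.addAll(list2);
                    return merged;
                }
        ));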
Make use of the collector groupingBy(classifier, downstream) to preserve all the emails retrieved by processEmails() that are associated with the same key by storing them in a List. As the downstream collector we can use a combination of flatMapping() (introduced in Java 9) and toList().
Map<String, List<String>> map = emails.stream()
        .collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.flatMapping(email -> processEmails(email).stream(),
                        Collectors.toList())
        ));
Note that the latter option makes sense only if processEmails() somehow generates different results for the same key; otherwise you would end up with a list of repeated values, which doesn't seem useful.
What you definitely shouldn't do in this case is use distinct(). It would unnecessarily increase memory consumption, because it eliminates duplicates by maintaining a LinkedHashSet under the hood. That's wasteful when you're already collecting into a Map, which is perfectly capable of dealing with duplicate keys.
You have duplicate emails. The toMap version you're using explicitly disallows duplicate keys; use the overload of toMap that takes a merge function. How to merge those processEmails results depends on your business logic.
Alternatively, call distinct() before collecting, because otherwise you'll probably end up sending some people multiple emails, as sketched below.
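For completeness, a minimal sketch of that variant, under the same assumptions as the question's code:
Map<String, List<String>> map = emails.stream()
        .distinct() // drop duplicate email strings before they become duplicate keys
        .collect(Collectors.toMap(
                Function.identity(),
                email -> processEmails(email)
        ));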
Try using
Collectors.toMap(Function keyFunction, Function valueFunction, BinaryOperator mergeFunction)
You obviously have to write your own merge logic; a simple mergeFunction could be
(x1, x2) -> x1
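Put together with the code from the question, that could look like this sketch, which keeps the first value for any duplicate key:
Map<String, List<String>> map = emails.stream()
        .collect(Collectors.toMap(
                Function.identity(),           // key: the email string itself
                email -> processEmails(email), // value: whatever processEmails returns
                (x1, x2) -> x1                 // on a duplicate key, keep the first value
        ));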
Related
I am doing a group by on a list of objects, as shown in the code below:
Map<String, List<InventoryAdjustmentsModel>> buildDrawNumEquipmentMap = equipmentsAndCargoDetails.stream()
        .collect(Collectors.groupingBy(InventoryAdjustmentsModel::getBuildDrawNum));
Now I know the values for all the keys would have only one element, so how can I reduce it to just
Map<String, InventoryAdjustmentsModel>
instead of having to iterate through or get the 0th element for all the keys.
You may use the toMap collector with a merge function, like this:
Map<String, InventoryAdjustmentsModel> resultMap = equipmentsAndCargoDetails.stream()
        .collect(Collectors.toMap(InventoryAdjustmentsModel::getBuildDrawNum,
                e -> e, (a, b) -> a));
Try it like this. By using toMap you can specify the key and the value. Since you said there are no duplicate keys, this does not include a merge function; that means you will get an exception if duplicate keys are ever discovered, which I presumed you would want to know about.
Map<String, InventoryAdjustmentsModel> buildDrawNumEquipmentMap =
        equipmentsAndCargoDetails.stream()
                .collect(Collectors.toMap(InventoryAdjustmentsModel::getBuildDrawNum,
                        model -> model));
Consider the following Java HashMap.
Map<String, String> unsortMap = new HashMap<String, String>();
unsortMap.put("Z", "z");
unsortMap.put("B", "b");
unsortMap.put("A", "a");
unsortMap.put("C", "c");
Now I wish to sort this Map by Key. One option is for me to use a TreeMap for this purpose.
Map<String, String> treeMap = new TreeMap<String, String>(unsortMap);
Another option is for me to use Java streams with sorted(), as follows.
Map<String, String> sortedMap = new HashMap<>();
unsortMap.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByKey())
        .forEachOrdered(x -> sortedMap.put(x.getKey(), x.getValue()));
Out of these two, which option is preferred, and why (perhaps in terms of performance)?
Thank you
As pointed out by others, dumping the sorted stream of entries into a regular HashMap would achieve nothing; a LinkedHashMap is the logical choice.
However, an alternative to the approaches above is to make full use of the Stream Collectors API.
Collectors has a toMap method that allows you to provide an alternative implementation for the Map. So instead of a HashMap you can ask for a LinkedHashMap like so:
unsortMap.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByKey())
        .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (v1, v2) -> v1, // the merge function is never invoked, as keys are unique
                LinkedHashMap::new
        ));
Between using a TreeMap and a LinkedHashMap: the complexity of construction is likely to be the same, something like O(n log n). Obviously the TreeMap solution is the better approach if you plan to keep adding more elements to it, although in that case you should probably have started with a TreeMap in the first place. The LinkedHashMap option has the advantage that lookup is O(1) on the linked (or the original unsorted) map, whereas a TreeMap's lookup is O(log n). So with a TreeMap you would need to keep the unsorted map around for efficient lookup, whereas if you build the LinkedHashMap you can toss the original unsorted map (thus saving some memory).
To make things a bit more efficient with LinkedHashMap, you should provide a good estimate of the required size at construction so that there is no need for dynamic resizing; so instead of LinkedHashMap::new you say () -> new LinkedHashMap<>(unsortMap.size()), as in the sketch below.
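A minimal sketch of the full pipeline with that pre-sized supplier (note this is an approximation: HashMap capacity also interacts with the load factor, so a resize can still occur):
Map<String, String> sortedMap = unsortMap.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByKey())
        .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (v1, v2) -> v1,
                () -> new LinkedHashMap<>(unsortMap.size()) // pre-sized to reduce rehashing
        ));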
In my opinion the use of a TreeMap is neater, as it keeps the code smaller; so unless there is an actual performance issue that could be addressed with the unsorted-map-plus-sorted-LinkedHashMap approach, I would use the TreeMap.
Your stream code won't even sort the map, because it is performing the operation against a HashMap, which is inherently unsorted. To make your second stream example work, you may use LinkedHashMap, which maintains insertion order:
Map<String, String> sortedMap = new LinkedHashMap<>();
unsortMap.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByKey())
        .forEachOrdered(x -> sortedMap.put(x.getKey(), x.getValue()));
But now your two examples don't even use the same underlying data structure. A TreeMap is backed by a tree (red-black, if I recall correctly); you would use a TreeMap if you wanted to iterate in sorted order or search quickly for a key. A LinkedHashMap is a HashMap with a linked list running through it; you would use it if you needed to maintain insertion order, for example when implementing a queue.
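A small sketch of that difference, using hypothetical sample entries:
Map<String, String> tree = new TreeMap<>();
Map<String, String> linked = new LinkedHashMap<>();
for (String key : new String[] {"Z", "B", "A"}) {
    tree.put(key, key.toLowerCase());
    linked.put(key, key.toLowerCase());
}
System.out.println(tree.keySet());   // [A, B, Z]  (always sorted by key)
System.out.println(linked.keySet()); // [Z, B, A]  (insertion order preserved)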
The second way does not work: when you call HashMap#put, it does not preserve the put order. You need a LinkedHashMap.
TreeMap vs. Stream (LinkedHashMap):
Code style: using TreeMap is cleaner, since you can achieve it in one line.
Space complexity: if the original map is a HashMap, you need to create a new Map with either method. If the original map is a LinkedHashMap, you only need to create a new Map with the first approach; you can re-use the LinkedHashMap with the second approach.
Time complexity: both should be O(n log n).
I have a Map<Person, Long> personToSalary which I would like to map to another Map<String, Long> lastnameToSalary.
Now, this can potentially mean that two Person objects which are not otherwise equal have the same lastname property, and this will cause a duplicate-key insertion into the new map. This is OK, as long as I can catch that exception and throw my own, but I'm not sure how to do so.
Here is the code.
Map<Person, Long> personToSalary = getMappings();
// lastname to salary
Map<String, Long> lastnameToSalary = personToSalary.entrySet().stream()
        .collect(toMap(
                e -> e.getKey().getLastname(),
                e -> e.getValue()));
While this works, it will potentially throw an exception on a duplicate key insertion (same lastname). How can I catch it? I can't declare a try-catch inside of toMap.
According to the Javadoc of Collectors.toMap():
If the mapped keys contains duplicates (according to Object.equals(Object)), an IllegalStateException is thrown when the collection operation is performed. If the mapped keys may have duplicates, use toMap(Function, Function, BinaryOperator) instead.
So the solution should not be to catch an exception, but to use the other toMap overload, specifically designed to handle duplicates. The documentation of that method gives an example close to your scenario:
There are multiple ways to deal with collisions between multiple elements mapping to the same key. The other forms of toMap simply use a merge function that throws unconditionally, but you can easily write more flexible merge policies. For example, if you have a stream of Person, and you want to produce a "phone book" mapping name to address, but it is possible that two persons have the same name, you can do as follows to gracefully deal with these collisions, and produce a Map mapping names to a concatenated list of addresses:
Map<String, String> phoneBook
        = people.stream().collect(toMap(Person::getName,
                                        Person::getAddress,
                                        (s, a) -> s + ", " + a));
Use toMap(keyMapper, valueMapper, mergeFunction): https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-
Map<String, Long> lastnameToSalary = personToSalary.entrySet().stream()
        .collect(toMap(
                e -> e.getKey().getLastname(),
                e -> e.getValue(),
                (a, b) -> a // just choose one of the duplicates, or add more logic to decide which one you need
        ));
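And since you mentioned wanting to throw your own exception, the merge function itself may throw instead of merging. A minimal sketch, where DuplicateLastnameException is a hypothetical unchecked exception of your own:
Map<String, Long> lastnameToSalary = personToSalary.entrySet().stream()
        .collect(toMap(
                e -> e.getKey().getLastname(),
                e -> e.getValue(),
                (a, b) -> {
                    // reached only when two entries share the same lastname
                    throw new DuplicateLastnameException(
                            "duplicate lastname with salaries " + a + " and " + b);
                }));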
Do I understand this correctly? Is this how Java developers do it? Or is there a better way?
So if I want to find a key in a map that matches a predicate, I must first get the key set from the map through the conveniently supplied method. THEN, I have to convert the set into a stream through the conveniently supplied method. THEN, I have to filter the stream with my predicate through the conveniently supplied method. THEN, I have to convert the stream into a container of a type of my choosing, possibly supplying a collector to do so, through the conveniently supplied method. THEN, I can at least check the container for empty to know if anything matched. Only then can I use the key(s) to extract the values of interest, or I could have used the entry set from the beginning and spare myself the extra step.
Is this the way, really? Because as far as I can tell, there are no other methods either built into the map or provided as a generic search algorithm over iterators or some other container abstraction.
I prefer entrySet myself as well. You should find this efficient:
Map<String, Integer> map; // some example map
// the map is filled here
List<Integer> valuesOfInterest = map.entrySet()
        .stream() // or parallelStream for big maps
        .filter(e -> e.getKey().startsWith("word")) // or some other predicate
        .map(Map.Entry::getValue) // get the values
        .collect(Collectors.toList()); // put them in a list
The list is empty if nothing matched. This is useful if multiple keys match the predicate.
In a nutshell, it is as simple as:
Predicate<T> predicate = (t -> <your predicate here>);
return myMap.keySet()
        .stream()
        .filter(predicate)
        .findAny()
        .map(myMap::get);
This returns an empty Optional if no key matches.
(Note: findAny is better than findFirst because it does not prevent parallelization where relevant, and findFirst is useless anyway since the Set of keys is not sorted in any meaningful way, unless your Map is a SortedMap.)
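A usage sketch with concrete types, assuming a hypothetical scores map:
Map<String, Integer> scores = new HashMap<>();
scores.put("alice", 3);
scores.put("bob", 5);

// find the value for any key longer than four characters
Optional<Integer> match = scores.keySet()
        .stream()
        .filter(name -> name.length() > 4)
        .findAny()
        .map(scores::get);

match.ifPresent(System.out::println); // prints 3 ("alice" is the only matching key)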
It’s not clear why you are shouting “THEN” so often. It’s the standard way of solving problems, to combine tools designed for broad variety of use cases to get your specific result. There is a built-in capability for traversing a sequence of elements and search for matches, the Stream API. Further, the Map interface provides you with the Collection views, keySet(), entrySet(), and values(), to be able to use arbitrary tools operating on Collections, the bridge to the Stream API being one of them.
So if you have a Map<Key,Value> and are interested in the values, whose keys match a predicate, you may use
List<Value> valuesOfInterest = map.entrySet().stream()
        .filter(e -> e.getKey().fulfillsCondition())
        .map(Map.Entry::getValue)
        .collect(Collectors.toList());
which consists of three main steps: filter to select matches, map to specify whether you are interested in the key, value, entry, or a converted value of each match, and collect(Collectors.toList()) to specify that you want to collect the results into a List.
Each of these steps could be replaced by a different choice and the entire stream pipeline could be augmented by additional processing steps. Since you want this specific combination of operations, there is nothing wrong with having to specify exactly these three steps instead of getting a convenience method for your specific use case.
The initial step of entrySet().stream() is required, as you have to select the entry set as the starting point and switch to the Stream API, which is the dedicated API for element processing that doesn't modify the source. The Collection API, on the other hand, provides you with methods which might mutate the source. If you are willing to use that, the alternative to the code above is
map.keySet().removeIf(key -> !key.fulfillsCondition());
Collection<Value> valuesOfInterest = map.values();
which differs in that the non-matching entries are actually removed from the source map. Surely you don't want to confuse these two, so it should be understandable why there is a clear separation between the Collection API and the Stream API.
I am trying to get a HashMap<String, String> from a CSV String value using the Java 8 Streams API. I am able to get the values, etc., but how do I add the index in the List as the key in my HashMap?
(HashMap<String, String>) Arrays.asList(sContent.split(",")).stream()
        .collect(Collectors.toMap(??????, i -> i));
So my map will contain key/value pairs as below:
0 -> Value1
1 -> Value2
2 -> Value3
...
Using normal Java I can do it easily, but I wanted to use the Java 8 Stream API.
That’s a strange requirement. When you call Arrays.asList(sContent.split(",")), you already have a data structure which maps int numbers to their Strings. The result is a List<String>, something on which you can invoke .get(intNumber) to get the desired value as you can with a Map<Integer,String>…
However, if it really has to be a Map and you want to use the stream API, you may use
Map<Integer, String> map = new HashMap<>();
Pattern.compile(",").splitAsStream(sContent).forEachOrdered(s -> map.put(map.size(), s));
To explain it, Pattern.compile(separator).splitAsStream(string) does the same as Arrays.stream(string.split(separator)) but doesn’t create an intermediate array, so it’s preferable. And you don’t need a separate counter as the map intrinsically maintains such a counter, its size.
The code above is the simplest code for creating such a map ad hoc, whereas a clean solution would avoid mutable state outside of the stream operation itself and return a new map on completion. But the clean solution is not always the most concise:
Map<Integer, String> map = Pattern.compile(",").splitAsStream(sContent)
        .collect(HashMap::new, (m, s) -> m.put(m.size(), s),
                (m1, m2) -> { int off = m1.size(); m2.forEach((k, v) -> m1.put(k + off, v)); }
        );
While the first two arguments to collect define an operation similar to the previous solution, the biggest obstacle is the third argument, a function only used when parallel processing is requested, though a single CSV line is unlikely to ever benefit from parallel processing; omitting it is simply not supported. If used, it merges two maps that are the results of two parallel operations. Since both used their own counter, the indices of the second map have to be adapted by adding the size of the first map.
You can use the approach below to get the required output:
private Map<Integer, String> getMapFromCSVString(String csvString) {
    AtomicInteger counter = new AtomicInteger();
    return Arrays.stream(csvString.split(","))
            .collect(Collectors.toMap(str -> counter.getAndIncrement(), str -> str));
}
I have written the test below to verify the output:
@Test
public void getCsvValuesIntoMap() {
    String csvString = "shirish,vilas,Nikhil";
    Map<Integer, String> expected = new HashMap<Integer, String>() {{
        put(0, "shirish");
        put(1, "vilas");
        put(2, "Nikhil");
    }};
    Map<Integer, String> result = getMapFromCSVString(csvString);
    System.out.println(result);
    assertEquals(expected, result);
}
You can do it by creating a range of indices, like this:
String[] values = sContent.split(",");
Map<Integer, String> result = IntStream.range(0, values.length)
        .boxed()
        .collect(toMap(Function.identity(), i -> values[i]));
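With the sample input from the question this yields {0=Value1, 1=Value2, 2=Value3}. Note that the bare toMap here assumes a static import of Collectors.toMap, as in the earlier answers.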