How to create HashMap with streams overriding duplicates?

I'm creating a HashMap using java8 stream API as follows:
Map<Integer, String> map = dao.findAll().stream()
.collect(Collectors.toMap(Entity::getType, Entity::getValue));
Now if an element is added to the collection where the key already exists, I just want to keep the existing element in the map and skip
the additional element. How can I achieve this? I probably have to make use of the BinaryOperator<U> parameter of toMap(), but could anyone provide
an example for my specific case?

Yes, you need that BinaryOperator<U>; pass it as the third argument to Collectors.toMap().
In case of a conflict (a key that already exists in the map) you get to choose between oldValue (the existing one) and newValue. In the code example we always take oldValue, but you are free to do anything else with these two values (take the larger one, merge the two, etc.).
The following example shows one possible solution where the existing value always remains in the map:
Map<Integer, String> map = dao.findAll().stream()
.collect(Collectors.toMap(Entity::getType, Entity::getValue, (oldValue, newValue) -> oldValue));
See the documentation for another example.
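To make the effect of the merge function concrete, here is a minimal, self-contained sketch. It uses plain Map.Entry pairs in place of the question's Entity and dao (neither is shown in the question), and the class and method names are made up for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the question's code base.
final class MergeFunctionSketch {

    // Collects (type, value) pairs into a map; onDuplicate decides which
    // value survives when two pairs share the same key.
    static Map<Integer, String> collect(List<Map.Entry<Integer, String>> rows,
                                        BinaryOperator<String> onDuplicate) {
        return rows.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, onDuplicate));
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> rows = List.of(
                Map.entry(1, "first"), Map.entry(2, "other"), Map.entry(1, "second"));

        // Keep the value that is already in the map.
        System.out.println(collect(rows, (oldValue, newValue) -> oldValue));
        // Overwrite with the later value instead.
        System.out.println(collect(rows, (oldValue, newValue) -> newValue));
        // Combine both values.
        System.out.println(collect(rows, (oldValue, newValue) -> oldValue + "," + newValue));
    }
}
```

For a sequential stream over an ordered source such as a List, oldValue is the value that came from the earlier element. Without the third argument, toMap() throws an IllegalStateException on the first duplicate key.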

Related

How to continually update a HashMap where both Key and Value are ArrayLists in Java

I am working on a project that involves minor combining and relating of datasets for some work, and have been stuck for some time.
I have groups of data, which are similar to other groups of data in a project. I have an array list of the groups' names. I am comparing these similar groups to another dataset describing similar things. (A collection of groups, which are similar to other groups in their own collection.)
I've been trying to solve this by using:
HashMap<ArrayList<String>, ArrayList<String>>
It is proving very difficult to add another group (by name, a String) when another relation is found.
If I find another group from each dataset and want to add to a current ArrayList (which is why I am using ArrayLists), it creates another entry, where the new key and value are the same as the previous but with the added element in each ArrayList.
Here is the current, relevant code:
...
for(ArrayList<String> similarGroupsDataset : map.keySet()) {
...
ArrayList<String> value = map.get(similarGroupsDataset);
ArrayList<String> key = similarGroups;
value.add(groupToAdd);
key.add(groupToAdd2);
map.remove(similarGroupsDataset);
map.put(key, value);
}
The idea is to store the ArrayList key and ArrayList value in variables, add the newly found pieces of data, remove the old entry, and add the updated version.
For some reason this seems not to remove the entry that lacks the newly found data.
So if I print out the map, it would look like
({1,2},{a,b}) , ({1,2,3},{a,b,c})
What it should look like is
({1,2,3} , {a,b,c}), taking out the irrelevant entry.
Where 1,2 in dataset1 are similar, which are similar still to a,b from dataset2, etc. if that makes sense.
I have tried to do
map.get(relevantGroupFromDataset2).add(data)
//adds the newly found similar group to the list of groups
//which are all similar to each other, from dataset1.
That works sometimes, but only for the value, not the key, it seems.
In the end, my goal is to remake these datasets with a new identifier that ties these groups together, rather than their current identifier, which doesn't tie them together in the way I want.
Am I doing something wrong here? Is there a better data structure to use in this scenario? Is a HashMap or similar structure the way to go?

If I find another group from each dataset and want to add to a current
ArrayList (which is why I am using ArrayLists), it creates another
entry, where the new key and value are the same as the previous but
with the added element in each ArrayList.
You use an ArrayList as the key.
In a map, keys are located through their hashCode() and equals() methods.
So when you change the content of the ArrayList key here:
ArrayList<String> value = map.get(similarGroupsDataset);
ArrayList<String> key = similarGroups;
value.add(groupToAdd);
key.add(groupToAdd2); // <-- here
hashCode() and equals() will no longer produce the same results.
As far as the map is concerned, the list is now a new key.
So map.put(key, value); will add a new entry.
It is hard to suggest a good workaround for your actual code, because the logic performed with the Map and the result you expect are not really clear.
For example :
value.add(groupToAdd);
key.add(groupToAdd2);
is either very poor naming, or you populate both the keys and the values of your map with groups only.
The general idea is that you should not use as a map key an object whose hashCode()/equals() result may change after the key has been added to the map.
To achieve this:
Either put the value with the ArrayList key at a time when you know that the key will not be modified any longer,
or remove the value with the old key and add it again with the new key.
In any case, to avoid this kind of error, you should create an unmodifiable List for the keys and pass it to the map:
map.put(Collections.unmodifiableList(keys), ...);
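The "remove with the old key, add again with the new key" advice could look roughly like the sketch below. The class name, the method name rekey and its parameters are hypothetical, and the sketch assumes the values are mutable lists. The key list is copied rather than modified in place, so the stored key never changes while it is inside the map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating remove-then-reinsert with an unmodifiable key.
final class RekeySketch {

    // Moves the value stored under oldKey to a new key that contains
    // oldKey's elements plus extraKeyElement, and appends extraValueElement
    // to the value list.
    static void rekey(Map<List<String>, List<String>> map, List<String> oldKey,
                      String extraKeyElement, String extraValueElement) {
        // Remove first, while oldKey still hashes the way it did when stored.
        List<String> value = map.remove(oldKey);
        if (value == null) {
            value = new ArrayList<>(); // no previous entry: start a fresh value list
        }
        value.add(extraValueElement);

        // Build the new key from a copy; the old key object is left untouched.
        List<String> newKey = new ArrayList<>(oldKey);
        newKey.add(extraKeyElement);
        map.put(Collections.unmodifiableList(newKey), value);
    }

    public static void main(String[] args) {
        Map<List<String>, List<String>> map = new HashMap<>();
        map.put(List.of("1", "2"), new ArrayList<>(List.of("a", "b")));

        rekey(map, List.of("1", "2"), "3", "c");
        System.out.println(map); // a single entry with the extended key and value
    }
}
```

Because the key handed to the map is unmodifiable, a later accidental key.add(...) fails immediately with an UnsupportedOperationException instead of silently leaving a stale entry behind.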

In a Map, it is better to keep the key object immutable.
When one changes the key object in a HashMap and the new hashCode is different, the map is corrupt.
So you have to remove the old key object and insert the new key object.
The data structure fitting your example would be a tree of (group, datum), where you extend the path to the leaves.
tree -> (a, 1)
+--> (x, 24)
+--> (b, 2)
+--> (c, 3)
And consider only all paths to a final leaf.
Admittedly a bit more work.

By key.add(groupToAdd2); the key is changed, but the keys of a map must be effectively immutable:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed...
Swap the lines key.add(groupToAdd2); and map.remove(similarGroupsDataset); to fix this or even better:
...
for (Entry<ArrayList<String>, ArrayList<String>> entry : map.entrySet()) {
...
map.remove(entry.getKey()); // remove from map before changing the key
entry.getKey().add(groupToAdd2);
entry.getValue().add(groupToAdd);
map.put(entry.getKey(), entry.getValue());
}
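One caveat with a loop of this shape: HashMap iterators are fail-fast, so calling map.remove(...) and map.put(...) on the map while iterating over its entrySet() is likely to end in a ConcurrentModificationException. A sketch that sidesteps this by building a new map is shown below. The names are hypothetical, and since the condition hidden behind the "..." in the question is unknown, the sketch simply extends every entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds the map instead of modifying it during iteration.
final class RebuildSketch {

    // Returns a new map in which every key list has keyExtra appended and
    // every value list has valueExtra appended. The source map and its
    // lists are not modified.
    static Map<List<String>, List<String>> extendAll(Map<List<String>, List<String>> source,
                                                     String keyExtra, String valueExtra) {
        Map<List<String>, List<String>> result = new HashMap<>();
        for (Map.Entry<List<String>, List<String>> entry : source.entrySet()) {
            List<String> newKey = new ArrayList<>(entry.getKey());     // copy, then extend
            newKey.add(keyExtra);
            List<String> newValue = new ArrayList<>(entry.getValue()); // copy, then extend
            newValue.add(valueExtra);
            result.put(newKey, newValue);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<List<String>, List<String>> map = new HashMap<>();
        map.put(new ArrayList<>(List.of("1", "2")), new ArrayList<>(List.of("a", "b")));
        System.out.println(extendAll(map, "3", "c"));
    }
}
```

The caller can then replace the old map with the returned one, which leaves no window in which a key is mutated while it is stored.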

Is there an efficient way of checking if HashMap contains keys that map to the same value?

I basically need to know if my HashMap has different keys that map to the same value. I was wondering if there is a way other than checking each key's value against all other values in the map.
Update:
Just some more information that will hopefully clarify what I'm trying to accomplish. Consider a String "azza". Say that I'm iterating over this String and storing each character as a key, and its corresponding value is some other String. Let's say I eventually get to the last occurrence of 'a' and the value is already in the map. This would be fine if the key corresponding to the value that is already in the map is also 'a'. My issue occurs when 'a' and 'z' both map to the same value, that is, only if different keys map to the same value.

Sure, the fastest to both code and execute is:
boolean hasDupeValues = new HashSet<>(map.values()).size() != map.size();
which executes in O(n) time.
Sets don't allow duplicates, so the set will be smaller than the values list if there are dupes.

Very similar to EJP's and Bohemian's answer above but with streams:
boolean hasDupeValues = map.values().stream().distinct().count() != map.size();

You could create a HashMap that maps values to lists of keys. This would take more space and require (slightly) more complex code, but with the benefit of greatly higher efficiency (amortized O(1) vs. O(n) for the method of just looping all values).
For example, say you currently have HashMap<Key, Value> map1, and you want to know which keys have the same value. You create another map, HashMap<Value, List<Key>> map2.
Then you just modify map1 and map2 together.
map1.put(key, value);
if(!map2.containsKey(value)) {
map2.put(value, new ArrayList<Key>());
}
map2.get(value).add(key);
Then to get all keys that map to value, you just do map2.get(value).
If you need to put/remove in many different places, to make sure that you don't forget to use map2 you could create your own data structure (i.e. a separate class) that contains 2 maps and implement put/remove/get/etc. for that.
Edit: I may have misunderstood the question. If you don't need an actual list of keys, just a simple "yes/no" answer to "does the map already contain this value?", and you want something better than O(n), you could keep a separate HashMap<Value, Integer> that simply counts up how many times the value occurs in the map. This would take considerably less space than a map of lists.
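The counting variant from the edit could be wrapped in a small class so that the two maps cannot drift apart. This is only a sketch under my own assumptions: the class name ValueCountingMap and its method names are invented, and it is not a full Map implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that keeps a key->value map and a value->count map in sync.
final class ValueCountingMap<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Map<V, Integer> valueCounts = new HashMap<>();

    // Stores key -> value; if the key was already present, its old value
    // is uncounted first so that overwriting does not inflate the counter.
    void put(K key, V value) {
        if (entries.containsKey(key)) {
            decrement(entries.get(key));
        }
        entries.put(key, value);
        valueCounts.merge(value, 1, Integer::sum);
    }

    void remove(K key) {
        if (entries.containsKey(key)) {
            decrement(entries.remove(key));
        }
    }

    V get(K key) {
        return entries.get(key);
    }

    // True if at least two different keys currently map to this value.
    boolean isShared(V value) {
        return valueCounts.getOrDefault(value, 0) > 1;
    }

    private void decrement(V value) {
        // Returning null from the remapping function removes the counter entry.
        valueCounts.computeIfPresent(value, (v, count) -> count == 1 ? null : count - 1);
    }
}
```

With the "azza" example from the question, putting 'a' twice with the same value keeps isShared false for that value, while putting 'z' with the value already held by 'a' turns it true. Every operation is an amortized O(1) hash lookup.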

You can check whether a map contains a value already by calling map.values().contains(value). This is not as efficient as looking up a key in the map, but still, it's O(n), and you don't need to create a new set just in order to count its elements.
However, what you seem to need is a BiMap. There is no such thing in the Java standard library, but you can build one relatively easily by using two HashMaps: one which maps keys to values and one which maps values to keys. Every time you map a key to a value, you can then check in amortized O(1) whether the value already is mapped to, and if it isn't, map the key to the value in the one map and the value to the key in the other.
If it is an option to create a new dependency for your project, some third-party libraries contain ready-made bimaps, such as Guava (BiMap) and Apache Commons (BidiMap).

You could iterate over the keys and save the current value in the Set.
But, before inserting that value in a Set, check if the Set already contains that value.
If this is true, it means that a previous key already contains the same value.
Map<Integer, String> map = new HashMap<>();
Set<String> values = new HashSet<>();
Set<Integer> keysWithSameValue = new HashSet<>();
for(Integer key : map.keySet()) {
if(values.contains(map.get(key))) {
keysWithSameValue.add(key);
}
values.add(map.get(key));
}
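Note that a loop of this shape records only the later keys of each duplicate group, not the first key that carried the value. If all keys of a group are wanted, one possible variation is to group the keys by value and then drop the groups of size one. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds every group of keys that share a value.
final class SharedValueSketch {

    // Returns value -> keys for every value that is mapped to by two or more keys.
    static <K, V> Map<V, List<K>> keysSharingAValue(Map<K, V> map) {
        Map<V, List<K>> byValue = new HashMap<>();
        for (Map.Entry<K, V> entry : map.entrySet()) {
            byValue.computeIfAbsent(entry.getValue(), v -> new ArrayList<>())
                   .add(entry.getKey());
        }
        // Values reached through a single key are not duplicates.
        byValue.values().removeIf(keys -> keys.size() < 2);
        return byValue;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "a");
        System.out.println(keysSharingAValue(map)); // only the group for "a" remains
    }
}
```

This is a single O(n) pass, and the result is empty exactly when no two keys map to the same value.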

How do I search a map in Java for a key that matches a predicate?

Do I understand this correctly? Is this how Java developers do it? Or is there a better way?
So if I want to find a key in a map that matches a predicate, I must first get the key set from the map through the conveniently supplied method. THEN, I have to convert the set into a stream through the conveniently supplied method. THEN, I have to filter the stream with my predicate through the conveniently supplied method. THEN, I have to convert the stream into a container of a type of my choosing, possibly supplying a collector to do so, through the conveniently supplied method. THEN, I can at least check the container for empty to know if anything matched. Only then can I use the key(s) to extract the values of interest, or I could have used the entry set from the beginning and spare myself the extra step.
Is this the way, really? Because as far as I can tell, there are no other methods either built into the map or provided as a generic search algorithm over iterators or some other container abstraction.

I prefer entrySet myself as well. You should find this efficient:
Map<String, Integer> map; //Some example Map
//The map is filled here
List<Integer> valuesOfInterest = map.entrySet()
.stream() //Or parallelStream for big maps
.filter(e -> e.getKey().startsWith("word")) //Or some predicate
.map(Map.Entry::getValue) //Get the values
.collect(Collectors.toList()); //Put them in a list
The list is empty if nothing matched. This is useful if multiple keys match the predicate.

In a nutshell, it is as simple as:
Predicate<T> predicate = (t -> <your predicate here>);
return myMap.keySet()
.stream()
.filter(predicate)
.findAny()
.map(myMap::get);
This returns an empty Optional if no key matches.
(Note: findAny is better than findFirst because it does not prevent parallelization where that is relevant, and findFirst is useless anyway since the Set of keys is not sorted in any meaningful way, unless your Map is a SortedMap.)
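A compilable variant of the same idea, written against the entry set so that no second lookup through get is needed, might look like this. The class name and the method name findValue are my own; treat it as a sketch rather than a standard utility.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper: value of any entry whose key matches a predicate.
final class KeyPredicateSketch {

    static <K, V> Optional<V> findValue(Map<K, V> map, Predicate<? super K> keyMatches) {
        return map.entrySet().stream()
                .filter(entry -> keyMatches.test(entry.getKey()))
                .findAny()
                .map(Map.Entry::getValue); // empty if nothing matched or the value is null
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("word1", 1);
        map.put("other", 2);
        System.out.println(findValue(map, key -> key.startsWith("word")));
        System.out.println(findValue(map, key -> key.startsWith("missing")));
    }
}
```

One thing to keep in mind: Optional cannot hold null, so a matching key that maps to a null value is indistinguishable from no match here.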

It’s not clear why you are shouting “THEN” so often. It’s the standard way of solving problems, to combine tools designed for broad variety of use cases to get your specific result. There is a built-in capability for traversing a sequence of elements and search for matches, the Stream API. Further, the Map interface provides you with the Collection views, keySet(), entrySet(), and values(), to be able to use arbitrary tools operating on Collections, the bridge to the Stream API being one of them.
So if you have a Map<Key,Value> and are interested in the values, whose keys match a predicate, you may use
List<Value> valuesOfInterest = map.entrySet().stream()
.filter(e -> e.getKey().fulfillsCondition())
.map(Map.Entry::getValue)
.collect(Collectors.toList());
which consists of three main steps: filter to select matches, map to specify whether you are interested in the key, value, entry or a converted value of each match, and collect(Collectors.toList()) to specify that you want to collect the results into a List.
Each of these steps could be replaced by a different choice and the entire stream pipeline could be augmented by additional processing steps. Since you want this specific combination of operations, there is nothing wrong with having to specify exactly these three steps instead of getting a convenience method for your specific use case.
The initial step of entrySet().stream() is required as you have to select the entry set as the starting point and switch to the Stream API, which is the dedicated API for element processing that doesn’t modify the source. The Collection API, on the other hand, provides you with methods which might mutate the source. If you are willing to use that, the alternative to the code above is
map.keySet().removeIf(key -> !key.fulfillsCondition());
Collection<Value> valuesOfInterest=map.values();
which differs in that the nonmatching entries are indeed removed from the source map. Surely, you don’t want to confuse these two, so it should be understandable, why there is a clear separation between the Collection API and the Stream API.
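If the removeIf style is appealing but the source map must stay intact, one option is to apply it to a copy. The sketch below assumes that copying the map is affordable; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper: Collection-API filtering on a copy of the map.
final class RetainMatchingSketch {

    // Returns a new map holding only the entries whose key satisfies keep.
    static <K, V> Map<K, V> retainMatching(Map<K, V> source, Predicate<? super K> keep) {
        Map<K, V> copy = new HashMap<>(source);
        // keySet() is a view, so removing keys removes the mappings from copy.
        copy.keySet().removeIf(key -> !keep.test(key));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("word1", 1);
        map.put("other", 2);
        System.out.println(retainMatching(map, key -> key.startsWith("word")));
        System.out.println(map.size()); // the source still has both entries
    }
}
```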

Trouble understanding Java map Entry sets

I'm looking at a java hangman game here: https://github.com/leleah/EvilHangman/blob/master/EvilHangman.java
The code in particular is this:
Iterator<Entry<List<Integer>, Set<String>>> k = partitions.entrySet().iterator();
while (k.hasNext())
{
Entry<?, ?> pair = (Entry<?, ?>)k.next();
int sizeOfSet = ((Set<String>)pair.getValue()).size();
if (sizeOfSet > biggestPartitionSize)
{
biggestPartitionSize = sizeOfSet;
}
}
Now my question. My Google-fu is weak I guess; I cannot find much on entry sets other than the Javadoc itself. Is it just a temporary copy of the map? And I cannot find any info at all on the syntax:
Entry<?, ?>
Can anyone explain or point me toward an explanation of what is going on with those question marks? Thanks in advance!

An entrySet is the set of all Entries in a Map - i.e. the set of all key,value pairs in the Map. Because a Map consists of key,value pairs, if you want to iterate over it you need to specify whether you want to iterate over the keys, the values, or both (Entries).
The <?,?> indicates that the pair variable holds an Entry where the key and the value could be of any type. This would normally indicate that we don't care what types of values it holds. In your code this is not the case, because you need to cast the value to Set<String> so you can check its size.
You could also rewrite the code as follows, avoiding the cast to Set<String>
Iterator<Entry<List<Integer>, Set<String>>> k = partitions.entrySet().iterator();
while (k.hasNext())
{
Entry<?, Set<String>> pair = (Entry<?, Set<String>>)k.next();
int sizeOfSet = pair.getValue().size();
if (sizeOfSet > biggestPartitionSize)
{
biggestPartitionSize = sizeOfSet;
}
}
When we need to be more specific about the types that the Entry holds, we can use the full type: Entry<List<Integer>, Set<String>>. This avoids the need to cast the key or value to a particular type (and the risk of casting to the wrong type).
You can also specify just the type of the key, or the value, as shown in my example above.

You can find information about Entry in the Javadoc:
http://docs.oracle.com/javase/7/docs/api/java/util/Map.Entry.html
The <?, ?> part is because Entry is a generic interface.
More info on ? here: http://docs.oracle.com/javase/tutorial/java/generics/wildcards.html
That being said, the usage in this example is not very nice. A cleaner way of getting sizeOfSet:
int sizeOfSet = k.next().getValue().size();

You aren't supposed to know much about what the entrySet() function returns. All you are allowed to depend on is that it is a Set<Map.Entry<x,y>>. It might be a rather special (*) copy, but it's more likely to be an object that provides a view of the innards of the original Map.
In modern Java, this sort of thing is commonly written as:
for (Map.Entry<X, Y> me : map.entrySet()) {
// code that looks at me.xxx() here
}
In your example, X,Y is List<Integer>, Set<String>.
(*) The documentation says, 'The set is backed by the map, so changes to the map are reflected in the set, and vice-versa.'
Thanks to @Miserable Variable.
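Applied to the loop from the question, the for-each form could look like the following sketch. The map type is taken from the question's code; the class and method names are made up, and starting the maximum at 0 is my own assumption about the initial value of biggestPartitionSize.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the loop from the question without casts.
final class PartitionSizeSketch {

    // Returns the size of the largest Set among the map's values, or 0 for an empty map.
    static int biggestPartitionSize(Map<List<Integer>, Set<String>> partitions) {
        int biggest = 0;
        for (Map.Entry<List<Integer>, Set<String>> entry : partitions.entrySet()) {
            // getValue() is already typed as Set<String>, so no cast is needed.
            biggest = Math.max(biggest, entry.getValue().size());
        }
        return biggest;
    }

    public static void main(String[] args) {
        Map<List<Integer>, Set<String>> partitions = Map.of(
                List.of(0), Set.of("ant", "arm"),
                List.of(1), Set.of("bat"));
        System.out.println(biggestPartitionSize(partitions));
    }
}
```

Since only the values are used here, iterating over partitions.values() would work just as well.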

Entry<?, ?>
Can anyone explain or point me toward an explanation of what is going on with those question marks?
I could be wrong but I think the programmer is trying to be lazy. It should have been
Entry<List<Integer>, Set<String>>
then none of the casts would have been required
ALSO:
My Google-fu is weak I guess; I cannot find much on entry sets other than the Javadoc itself. Is it just a temporary copy of the map?
Javadoc has everything you need (emphasis mine):
Returns a Set view of the mappings contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa.
If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation, or through the setValue operation on a map entry returned by the iterator) the results of the iteration are undefined.
The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll and clear operations.
It does not support the add or addAll operations.
What more information are you looking for?
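The quoted Javadoc can be checked with a few lines of code. The sketch below uses a HashMap and invented names; it shows that the entry set is a live view rather than a copy, that removal and setValue write through to the map, and that add is rejected.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo of the entry set being a view backed by the map.
final class EntrySetViewSketch {

    // Changes a map only through its entry set and returns the map afterwards.
    static Map<String, Integer> mutateThroughView() {
        Map<String, Integer> map = new HashMap<>();
        Set<Map.Entry<String, Integer>> entries = map.entrySet(); // taken while the map is empty

        map.put("a", 1);
        map.put("b", 2); // later puts are visible through the same view object

        entries.removeIf(entry -> entry.getKey().equals("a")); // removes the mapping from the map
        for (Map.Entry<String, Integer> entry : entries) {
            entry.setValue(entry.getValue() * 10);             // writes through to the map
        }
        return map;
    }

    // Returns true if adding to the entry set is rejected, as the Javadoc says.
    static boolean rejectsAdd(Map<String, Integer> map) {
        try {
            map.entrySet().add(new AbstractMap.SimpleEntry<>("x", 0));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughView());
        System.out.println(rejectsAdd(new HashMap<>()));
    }
}
```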

How to transform List<String> to Map<String,String> with Google collections?

I have a list of strings and I have a function to generate a value for each key in the list.
I want to create a map using this function. Can I do this with Google collections?

Use Maps.uniqueIndex(Iterable, Function) :
Returns an immutable map for which the Map.values() are the given elements in the given order, and each key is the product of invoking a supplied function on its corresponding value. (from the Javadoc)
Example:
Map<String,String> mappedRoles = Maps.uniqueIndex(yourList, new Function<String,String>() {
public String apply(String from) {
// do stuff here
return result;
}});

As of 7/26/2012, Guava master contains two new ways to do this. They should be in release 14.0.
Maps.asMap(Set<K>, Function<? super K, V>) (and two overloads for SortedSet and NavigableSet) allows you to view a Set plus a Function as a Map where the value for each key in the set is the result of applying the function to that key. The result is a view, so it doesn't copy the input set and the Map result will change as the set does and vice versa.
Maps.toMap(Iterable<K>, Function<? super K, V>) takes an Iterable and eagerly converts it to an ImmutableMap where the distinct elements of the iterable are the keys and the values are the results of applying the function to each key.
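For readers without Guava on the classpath, the behaviour described for Maps.toMap can be approximated with the JDK alone. To be clear, the sketch below is not Guava's implementation and returns an unmodifiable wrapper rather than an ImmutableMap; the class name is made up, and it only mirrors the idea of "elements become keys, the function supplies the values".

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// JDK-only approximation of the idea behind Guava's Maps.toMap (not Guava code).
final class KeyedMapSketch {

    // Uses each distinct element as a key and valueFunction's result as its value,
    // keeping the order in which keys were first seen.
    static <K, V> Map<K, V> toMap(Iterable<? extends K> keys,
                                  Function<? super K, ? extends V> valueFunction) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            if (!result.containsKey(key)) { // evaluate the function once per distinct key
                result.put(key, valueFunction.apply(key));
            }
        }
        return Collections.unmodifiableMap(result);
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "bb", "a");
        System.out.println(toMap(names, String::length));
    }
}
```

Unlike Maps.asMap, this is an eager copy: later changes to the input list are not reflected in the returned map.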

EDIT: It's entirely possible that Sean's right and I misunderstood the question.
If the original list is meant to be keys, then it sounds like you might be able to just use a computing map, via MapMaker.makeComputingMap, and ignore the input list to start with. EDIT: As noted in comments, this is now deprecated and deleted in Guava 15.0. Have a look at CacheBuilder instead.
On the other hand, that also doesn't give you a map which will return null if you ask it for a value corresponding to a key which wasn't in the list to start with. It also won't give you In other words, this may well not be appropriate, but it's worth consideration, depending on what you're trying to do with it. :)
I'll leave this answer here unless you comment that neither approach here is useful to you, in which case I'll delete it.
Original answer
Using Guava you can do this pretty easily with Maps.uniqueIndex:
Map<String, String> map = Maps.uniqueIndex(list, keyProjection);
(I mentioned Guava specifically as opposed to Google collections, as I haven't checked whether the older Google collections repository includes Maps.uniqueIndex.)

Either I have misunderstood you or the other posters have. I understand that you want your list to be the map keys, while Maps.uniqueIndex() creates keys to map to your values (which is quite the opposite).
Anyway, there is an open Guava issue that requests the exact functionality you are requesting, and I have also implemented such a solution for a previous question.

Using Guava + Lambda
Map<String, YourCustomClass> map = Maps.uniqueIndex(YourList, YourCustomClass -> YourCustomClass.getKey());
