I've run into a scenario where I want to lowercase all the keys of a HashMap (don't ask why, I just have to do this). The HashMap has some millions of entries.
At first, I thought I'd just create a new Map, iterate over the entries of the map that is to be lowercased, and add the respective values. This task should run only once per day or something like that, so I thought I could bear this.
    Map<String, Long> lowerCaseMap = new HashMap<>(myMap.size());
    for (Map.Entry<String, Long> entry : myMap.entrySet()) {
        lowerCaseMap.put(entry.getKey().toLowerCase(), entry.getValue());
    }
This, however, caused some OutOfMemoryErrors when my server was overloaded at the very moment I was copying the Map.
Now my question is, how can I accomplish this task with the smallest memory footprint?
Would removing each key from the original map right after adding its lowercased version to the new Map help?
Could I utilize Java 8 streams to make this faster? (e.g. something like this)
Map<String, Long> lowerCaseMap = myMap.entrySet().parallelStream().collect(Collectors.toMap(entry -> entry.getKey().toLowerCase(), Map.Entry::getValue));
Update
It seems that it's a Collections.unmodifiableMap, so I don't have the option of
removing each key after adding its lowercased version to the new Map
Instead of using HashMap, you could try using a TreeMap with case-insensitive ordering. This would avoid the need to create a lower-case version of each key:
Map<String, Long> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
map.putAll(myMap);
Once you've constructed this map, put() and get() will behave case-insensitively, so you can save and fetch values using all-lowercase keys. Iterating over keys will return them in their original, possibly upper-case forms.
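As a minimal sketch of how that behaves (the class and method names are made up for illustration, and a small sample map stands in for myMap). Note that the TreeMap still allocates one node per entry and lookups become O(log n); what it avoids is creating a new key string for every entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class CaseInsensitiveLookup {

    // Copies the entries into a TreeMap that compares keys ignoring case.
    // Keys that differ only by case collapse into a single entry.
    static Map<String, Long> caseInsensitiveCopy(Map<String, Long> source) {
        Map<String, Long> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        map.putAll(source);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Long> myMap = new HashMap<>();
        myMap.put("Alice", 1L);
        myMap.put("BOB", 2L);

        Map<String, Long> map = caseInsensitiveCopy(myMap);
        System.out.println(map.get("alice")); // looked up with a lower-case key
        System.out.println(map.get("bob"));
        System.out.println(map.keySet());     // keys keep their original case
    }
}
```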
Here are some similar questions:
Case insensitive string as HashMap key
Is there a good way to have a Map<String, ?> get and put ignoring case?
You cannot remove entries through the map itself while iterating over it; you will typically get a ConcurrentModificationException if you try. (Removal through the iterator's own remove() is allowed, but not on an unmodifiable map.)
As the issue is an OutOfMemoryError, not a performance problem, using a parallel stream will not help either.
Even though some of the work in the Stream API is done lazily, you will still end up with two maps in memory at some point, so you will still have the issue.
To work around it, I see only two ways:
Give more memory to your process (by increasing -Xmx on the Java command line). Memory is cheap these days ;)
Split the map and work in chunks: for example, divide the size of the map by ten, process one chunk at a time, and delete the processed entries before processing the next chunk. This way, instead of having two times the map in memory, you will have only about 1.1 times the map.
For the split algorithm, you can try something like this using the Stream API:
    Map<String, String> toMap = new HashMap<>();
    // Round up so that ten chunks cover every entry, even when size() is not a multiple of ten
    int chunk = (fromMap.size() + 9) / 10;
    for (int i = 1; i <= 10; i++) {
        // Process one chunk: copy its entries into a list so fromMap can be modified safely
        List<Entry<String, String>> subEntries = fromMap.entrySet().stream().limit(chunk)
                .collect(Collectors.toList());
        for (Entry<String, String> entry : subEntries) {
            toMap.put(entry.getKey().toLowerCase(), entry.getValue());
            fromMap.remove(entry.getKey());
        }
    }
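A variation on the chunked approach above, offered only as a sketch: move entries one at a time and remove each through the iterator, so no intermediate chunk list is needed. This assumes the source map is modifiable (it would not work on the unmodifiable map from the question's update). The class and method names are hypothetical, and Locale.ROOT is my choice rather than something the original code specifies. HashMap does not shrink its internal table when entries are removed, so the saving comes from the released entry objects and old key strings, not from the table itself.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;

final class DrainingLowerCaser {

    // Moves every entry from 'from' into a new map under a lower-cased key.
    // Assumes 'from' is modifiable; an unmodifiable map would throw
    // UnsupportedOperationException on it.remove().
    static <V> Map<String, V> drainToLowerCase(Map<String, V> from) {
        Map<String, V> to = new HashMap<>();
        Iterator<Map.Entry<String, V>> it = from.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, V> entry = it.next();
            to.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
            it.remove(); // safe: removal goes through the iterator itself
        }
        return to;
    }
}
```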
The concerns in the above answers are valid, and you might need to reconsider the data structure you are using.
In my case, I had a simple map whose keys I needed to change to lower case.
Take a look at my snippet; it's a trivial solution with poor performance.
    private void convertAllFilterKeysToLowerCase() {
        HashSet keysToRemove = new HashSet();
        getFilters().keySet().forEach(o -> {
            if (!o.equals(((String) o).toLowerCase()))
                keysToRemove.add(o);
        });
        keysToRemove.forEach(o -> getFilters().put(((String) o).toLowerCase(), getFilters().remove(o)));
    }
Not sure about the memory footprint. If using Kotlin, you can try the following.
val lowerCaseMap = myMap.mapKeys { it.key.toLowerCase() }
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/map-keys.html
Related
I am trying to get a HashMap<String,String> from a CSV String value using the Java 8 Streams API. I am able to get the values, but how do I add the index of the List as the key in my HashMap?
(HashMap<String, String>) Arrays.asList(sContent.split(",")).stream()
.collect(Collectors.toMap(??????,i->i );
So my map will contain key/value pairs as below.
0->Value1
1->Value2
2->Value3
...
Using plain Java I can do it easily, but I wanted to use the Java 8 Stream API.
That’s a strange requirement. When you call Arrays.asList(sContent.split(",")), you already have a data structure which maps int numbers to their Strings. The result is a List<String>, something on which you can invoke .get(intNumber) to get the desired value as you can with a Map<Integer,String>…
However, if it really has to be a Map and you want to use the stream API, you may use
Map<Integer,String> map=new HashMap<>();
Pattern.compile(",").splitAsStream(sContent).forEachOrdered(s->map.put(map.size(), s));
To explain it, Pattern.compile(separator).splitAsStream(string) does the same as Arrays.stream(string.split(separator)) but doesn’t create an intermediate array, so it’s preferable. And you don’t need a separate counter as the map intrinsically maintains such a counter, its size.
The code above is the simplest code for creating such a map ad hoc, whereas a clean solution would avoid mutable state outside of the stream operation itself and return a new map on completion. But the clean solution is not always the most concise:
    Map<Integer,String> map=Pattern.compile(",").splitAsStream(sContent)
        .collect(HashMap::new, (m,s)->m.put(m.size(), s),
            (m1,m2)->{ int off=m1.size(); m2.forEach((k,v)->m1.put(k+off, v)); }
        );
While the first two arguments to collect define an operation similar to the previous solution, the biggest obstacle is the third argument, a function used only when parallel processing is requested, even though a single CSV line is unlikely to ever benefit from parallel processing. But omitting it is not supported. If used, it will merge two maps which are the result of two parallel operations. Since both used their own counter, the indices of the second map have to be adapted by adding the size of the first map.
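To see the third argument at work, here is a small sketch that wraps the same collect call in a method and optionally switches the stream to parallel. The class name, method name, and the boolean flag are made up for illustration. Because the stream is ordered, the partial maps are merged in encounter order, so both modes should yield the same indices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class IndexedCsv {

    static Map<Integer, String> toIndexedMap(String csv, boolean parallel) {
        Stream<String> parts = Pattern.compile(",").splitAsStream(csv);
        if (parallel) {
            parts = parts.parallel();
        }
        return parts.collect(
                HashMap::new,
                // accumulator: the map's current size is the next index
                (m, s) -> m.put(m.size(), s),
                // combiner: shift the right-hand map's indices past the left-hand map
                (m1, m2) -> {
                    int off = m1.size();
                    m2.forEach((k, v) -> m1.put(k + off, v));
                });
    }
}
```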
You can use the approach below to get the required output:
    private Map<Integer, String> getMapFromCSVString(String csvString) {
        AtomicInteger integer = new AtomicInteger();
        return Arrays.stream(csvString.split(","))
                .collect(Collectors.toMap(splittedStr -> integer.getAndAdd(1), splittedStr -> splittedStr));
    }
I have written the test below to verify the output.
    @Test
    public void getCsvValuesIntoMap(){
        String csvString ="shirish,vilas,Nikhil";
        Map<Integer,String> expected = new HashMap<Integer,String>(){{
            put(0,"shirish");
            put(1,"vilas");
            put(2,"Nikhil");
        }};
        Map<Integer,String> result = getMapFromCSVString(csvString);
        System.out.println(result);
        assertEquals(expected,result);
    }
You can do it by creating a range of indices like this:
    String[] values = sContent.split(",");
    Map<Integer, String> result = IntStream.range(0, values.length)
            .boxed()
            .collect(toMap(Function.identity(), i -> values[i]));
My app will receive a HashMap<String,Object> from another application.
Is there any way to trim the string keys without iterating over the HashMap? Iterating would degrade performance, because the HashMap may contain a lot of entries.
Thanks
You have two options:
trim before adding
iterate through the map entries and update all keys
In your example you have no choice, and you need to iterate over those keys.
In Java 8 you could use parallel streams to speed up such an operation, although I would not recommend it in a multithreaded environment.
One thing to note beforehand:
The map might contain 2 keys where the first is the trimmed version of the second. By doing what you want, it would overwrite/remove one of them from the map! E.g. the map might contain the keys "a " and "a", and by trimming the keys one of them will disappear!
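One way to guard against that, sketched under the assumption that a collision should be treated as an error rather than silently resolved (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

final class TrimmedKeys {

    // Returns a copy with trimmed keys, refusing to silently drop an entry
    // when two different keys trim to the same string.
    static <V> Map<String, V> trimKeys(Map<String, V> source) {
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> entry : source.entrySet()) {
            String trimmed = entry.getKey().trim();
            if (result.containsKey(trimmed)) {
                throw new IllegalStateException(
                        "Keys collide after trimming: \"" + trimmed + "\"");
            }
            result.put(trimmed, entry.getValue());
        }
        return result;
    }
}
```

Whether to throw, keep the first value, or merge the two values depends on the data; throwing simply makes the problem visible.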
HashMap does not provide any way to manipulate keys without iterating over them.
You can either "copy" the entries to a new map with keys trimmed (as with @RuchiraGayanRanaweera's solution), or you can do it in the same map like this:
Solution #1: Duplicate entry set and replace the different keys
So what you may do is iterate over the entries, and trim the keys. This also means that if the trimmed key is not equal to the original, you have to remove the entry with the old key and put it again with the new one. You only need to replace the entry if the trimmed version is different:
    Map<String, Object> map = new HashMap<>();
    for (Entry<String, Object> entry : new HashSet<>(map.entrySet())) {
        String trimmed = entry.getKey().trim();
        if (!trimmed.equals(entry.getKey())) {
            map.remove(entry.getKey());
            map.put(trimmed, entry.getValue());
        }
    }
Note that it is necessary to create a new Set of the entry set because quoting from the javadoc of HashMap.entrySet():
If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation, or through the setValue operation on a map entry returned by the iterator) the results of the iteration are undefined.
Solution #2: Collect first then replace the different keys
Another option is to collect the keys where the trimmed key is different, and change only those after the first iteration. This solution has the advantage of not having to "duplicate" the entry set to iterate over it. If there are relatively few keys whose trimmed variant is different, probably this is the fastest solution:
    Map<String, Object> map = new HashMap<>();
    // Set to store the modified keys,
    // Also store the trimmed String for performance reasons
    Set<String[]> modifiedSet = new HashSet<>();
    for (Entry<String, Object> entry : map.entrySet()) {
        String trimmed = entry.getKey().trim();
        if (!trimmed.equals(entry.getKey()))
            modifiedSet.add(new String[]{entry.getKey(), trimmed});
    }
    // Changing a key can be done in one step:
    // Removing the old entry (which returns the old value) and put the new
    for (String[] modified : modifiedSet)
        map.put(modified[1], map.remove(modified[0]));
If there is no way to trim() keys before adding them to the HashMap, then you have to do something like the following:
    HashMap<String,Object> map=new HashMap<>();
    HashMap<String,Object> newMap=new HashMap<>();
    for(Map.Entry<String,Object> entry:map.entrySet()){
        newMap.put(entry.getKey().trim(),entry.getValue());
    }
In my application, we already have
Map<String, List<String>>
Now we have another use case where we need to find the key that is mapped to a list containing a particular string.
I am thinking of writing the following:
    String getKey(Map<String, List<String>> m, String str) {
        for (Entry<String, List<String>> entry : m.entrySet()) {
            if (entry.getValue().contains(str)) {
                return entry.getKey();
            }
        }
        return null;
    }
The Map can have at most 2000 entries, and each List can have at most 500 Strings.
Any suggestions that might be a better fit? I can also change the initial data structure (Map) if there is a better way to do it.
I would suggest that you add another map that gives the reverse mapping, from a string to the list of keys. This will require a bit more work to keep in sync with the first map but will give the best performance for your task.
Still, if you think such queries will be relatively infrequent, your solution may be the better choice: although lookups are slower, the effort you save by not keeping two maps in sync will make up for that.
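The reverse mapping suggested above could be built along these lines. This is only a sketch: the class and method names are made up, and it rebuilds the reverse map from scratch rather than showing how to keep it in sync on every update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReverseIndex {

    // Builds a map from each string to the keys whose list contains it.
    // A string that appears twice in one list adds that key twice.
    static Map<String, List<String>> invert(Map<String, List<String>> forward) {
        Map<String, List<String>> reverse = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : forward.entrySet()) {
            for (String value : entry.getValue()) {
                reverse.computeIfAbsent(value, v -> new ArrayList<>())
                       .add(entry.getKey());
            }
        }
        return reverse;
    }
}
```

A lookup then becomes a single reverse.get(str) instead of a scan over up to 2000 lists.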
Are you familiar with Guava API? If you have the liberty to add/change dependencies, it's definitely worth checking out, particularly implementations of the Multimap interface. It's exactly built for those cases where the same key can be mapped to multiple values.
In case I misunderstood the question and the same key will not be mapped to multiple values then you might need to reformulate/rethink your question as your current idea Map<String, List<String>> is essentially just that.
I'm interested in how I can very quickly change the Boolean values in this HashMap:
HashMap<String, Boolean> selectedIds = new HashMap<>();
I want to replace all the Boolean values with true very quickly. How can I do this?
The fastest way is this:
    for (Map.Entry<String, Boolean> entry : selectedIds.entrySet()) {
        entry.setValue(true);
    }
This code avoids any lookups whatsoever, because it iterates through the map's entries and sets their values directly.
Note that whenever HashMap.put() is called, a key lookup occurs in the internal hash table. While the code is highly optimized, it nevertheless requires work to calculate and compare hash codes, then employ an algorithm to ultimately find the entry (if it exists). This is all "work", and consumes CPU cycles.
Java 8 update:
Java 8 introduced a new method replaceAll() for just such a purpose, making the code required even simpler:
selectedIds.replaceAll((k, v) -> true);
This will iterate through your map and replace all the old values with true for each key, using the HashMap put method:
    for (String s : selectedIds.keySet()) {
        selectedIds.put(s, true);
    }
I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, to find the highest 5 values in a Map, you can do that in O(n) time, whereas any sort is slower than that.
The easiest way is to simply do a for loop through the entry set of the Map.
    for (Entry<String, Integer> entry: map.entrySet()) {
        if (entry.getValue() > smallestMaxSoFar)
            updateListOfMaximums();
    }
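One possible way to fill in smallestMaxSoFar and updateListOfMaximums() from the loop above is a bounded min-heap, whose head plays the role of the smallest maximum so far. This is a sketch, not the answerer's own code: the class and method names are hypothetical, and it assumes the map holds no null values. For a fixed small n such as 5, each heap operation costs a constant amount of work, so the pass stays linear in the map size.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopValues {

    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> map, int n) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the head is the smallest of the current top-n candidates.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (heap.size() < n) {
                // Snapshot the entry so later map updates do not change the result.
                heap.add(new AbstractMap.SimpleImmutableEntry<String, Integer>(entry));
            } else if (entry.getValue() > heap.peek().getValue()) {
                heap.poll(); // drop the smallest candidate
                heap.add(new AbstractMap.SimpleImmutableEntry<String, Integer>(entry));
            }
        }
        result.addAll(heap);
        // Highest value first.
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return result;
    }
}
```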
You could use two Maps:
    // Map name to value
    Map<String, Integer> byName;
    // Maps value to names
    NavigableMap<Integer, Collection<String>> byValue;
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
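A rough sketch of such a wrapper class follows. The class name and its methods are hypothetical, a Set is used for the names under each value, and plain synchronized methods are just one assumed way to make it usable from several threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class ValueIndexedMap {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Set<String>> byValue = new TreeMap<>();

    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            // Remove the name from its previous value bucket.
            Set<String> names = byValue.get(old);
            if (names != null) {
                names.remove(name);
                if (names.isEmpty()) {
                    byValue.remove(old);
                }
            }
        }
        byValue.computeIfAbsent(value, v -> new HashSet<>()).add(name);
    }

    synchronized Integer get(String name) {
        return byName.get(name);
    }

    // Returns up to n distinct values, highest first.
    synchronized List<Integer> highestValues(int n) {
        List<Integer> result = new ArrayList<>();
        Iterator<Integer> it = byValue.navigableKeySet().descendingIterator();
        while (it.hasNext() && result.size() < n) {
            result.add(it.next());
        }
        return result;
    }
}
```

Each update costs O(log n) for the TreeMap, in exchange for reading the highest values without sorting.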
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
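    public List<Integer> getMaxFive() {
        Map<String, Integer> copy;
        synchronized (lockObject) {
            copy = new HashMap<String, Integer>(originalMap);
        }
        // Sort the copied values in descending order, outside the lock
        List<Integer> list = new ArrayList<>(copy.values());
        list.sort(Comparator.reverseOrder());
        return list.subList(0, Math.min(5, list.size()));
    }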
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
    private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
        int[] max = new int[qty];
        for (int a=0; a<qty; a++) {
            max[a] = Collections.max(map.values());
            map.values().removeAll(Collections.singleton(max[a]));
            if (map.size() == 0)
                break;
        }
        return max;
    }
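This takes advantage of Collections.max() and Collections.singleton(). Note that it removes entries from the map passed in, so pass a copy if the original must be preserved, and that it throws NoSuchElementException if the map is empty to begin with.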
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo method (no matter where it is) should be defined like this:
    compareTo(parameters){
        return Integer.compare(value2, value1); // descending, without the overflow risk of value2 - value1
    }
The rest is easy. Using a List and invoking Collections.sort(parameters) will do the sorting part.
I don't know what sorting algorithm Collections.sort(parameters) uses. But if you expect more data to arrive over time, you will need an insertion sort, since it works well on nearly sorted data and it's an online algorithm.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.