Convert CSV string value to Hashmap using Stream lambda - java

I am trying to get a HashMap<String,String> from a CSV String value using the Java 8 Streams API. I am able to get the values, but how do I add the index of each List element as the key in my HashMap?
(HashMap<String, String>) Arrays.asList(sContent.split(",")).stream()
.collect(Collectors.toMap(??????,i->i );
So my map will contain key/value pairs like the ones below:
0->Value1
1->Value2
2->Value3
...
Using plain Java I can do it easily, but I wanted to use the Java 8 Stream API.

That’s a strange requirement. When you call Arrays.asList(sContent.split(",")), you already have a data structure which maps int numbers to their Strings. The result is a List<String>, something on which you can invoke .get(intNumber) to get the desired value as you can with a Map<Integer,String>…
However, if it really has to be a Map and you want to use the stream API, you may use
Map<Integer,String> map=new HashMap<>();
Pattern.compile(",").splitAsStream(sContent).forEachOrdered(s->map.put(map.size(), s));
To explain it, Pattern.compile(separator).splitAsStream(string) does the same as Arrays.stream(string.split(separator)) but doesn’t create an intermediate array, so it’s preferable. And you don’t need a separate counter as the map intrinsically maintains such a counter, its size.
The code above is the simplest way to create such a map ad hoc, whereas a clean solution would avoid mutable state outside of the stream operation itself and return a new map on completion. But the clean solution is not always the most concise:
Map<Integer,String> map=Pattern.compile(",").splitAsStream(sContent)
.collect(HashMap::new, (m,s)->m.put(m.size(), s),
(m1,m2)->{ int off=m1.size(); m2.forEach((k,v)->m1.put(k+off, v)); }
);
While the first two arguments to collect define an operation similar to the previous solution, the biggest obstacle is the third argument, a function that is only used when parallel processing is requested, even though a single CSV line is unlikely to ever benefit from parallel processing. But omitting it is not supported. If used, it will merge two maps which are the results of two parallel operations. Since both used their own counter, the indices of the second map have to be adapted by adding the size of the first map.
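To make the role of the third argument concrete, here is a minimal, self-contained sketch of the collect-based solution above. The class name CsvIndexMap, the method name toIndexedMap, and the parallel flag are made up for this illustration; only the three lambdas come from the answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class CsvIndexMap {

    private static final Pattern COMMA = Pattern.compile(",");

    // Builds an index -> value map; the parallel flag exists only to exercise the combiner.
    public static Map<Integer, String> toIndexedMap(String csv, boolean parallel) {
        Stream<String> stream = COMMA.splitAsStream(csv);
        if (parallel) {
            stream = stream.parallel();
        }
        return stream.collect(
                HashMap::new,
                // accumulator: the current size of the map is the next free index
                (m, s) -> m.put(m.size(), s),
                // combiner: shift the indices of the right-hand map by the size of the left-hand map
                (m1, m2) -> {
                    int off = m1.size();
                    m2.forEach((k, v) -> m1.put(k + off, v));
                });
    }

    public static void main(String[] args) {
        System.out.println(toIndexedMap("Value1,Value2,Value3", false));
        System.out.println(toIndexedMap("Value1,Value2,Value3", true));
    }
}
```

Both calls should produce the same mapping, because the stream is ordered and the combiner always receives the left partial result as its first argument.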

You can use the approach below to get the required output:
private Map<Integer, String> getMapFromCSVString(String csvString) {
AtomicInteger integer = new AtomicInteger();
return Arrays.stream(csvString.split(","))
.collect(Collectors.toMap(splittedStr -> integer.getAndAdd(1), splittedStr -> splittedStr));
}
I have written the test below to verify the output.
@Test
public void getCsvValuesIntoMap(){
String csvString ="shirish,vilas,Nikhil";
Map<Integer,String> expected = new HashMap<Integer,String>(){{
put(0,"shirish");
put(1,"vilas");
put(2,"Nikhil");
}};
Map<Integer,String> result = getMapFromCSVString(csvString);
System.out.println(result);
assertEquals(expected,result);
}

You can do it by creating a range of indices like this:
String[] values = sContent.split(",");
Map<Integer, String> result = IntStream.range(0, values.length)
.boxed()
.collect(toMap(Function.identity(), i -> values[i]));
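For completeness, the same idea as a runnable sketch with the imports spelled out. The class name IndexedCsv and the method name toIndexedMap are placeholders chosen for this example.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IndexedCsv {

    // Maps each position in the split array to the value found at that position.
    public static Map<Integer, String> toIndexedMap(String csv) {
        String[] values = csv.split(",");
        return IntStream.range(0, values.length)
                .boxed()
                .collect(Collectors.toMap(Function.identity(), i -> values[i]));
    }

    public static void main(String[] args) {
        System.out.println(toIndexedMap("Value1,Value2,Value3"));
    }
}
```

One thing to keep in mind: String.split(",") drops trailing empty strings, so "a,b," yields two entries. If trailing empty fields matter, split(",", -1) keeps them.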

Related

How do I simultaneously iterate through a list and a set while pairing them in another map (using stream)

What I've tried is creating an iterator for the list and using a stream on the set, like this:
Set //some object which has a getId method
Iterator<String> iterator = list.iterator();
someSet.stream()
.map(Collectors.toMap(e -> e.getId(), e -> iterator.next() );
The Stream API is designed to iterate through one collection at a time.
If you want to achieve such an iteration, called "zipping" as mentioned in another answer, you have to iterate over the indices. Since a Set is not ordered, you have to copy it into a List first and accept that the order is not predictable.
However, the usage of Stream API should be fine here:
Set<MyObject> set = ... // MyObject with getId method
List<MyObject> listFromSet = new ArrayList<>(set);
List<MyObject> list = ... // existing list
IntStream.range(0, Math.min(list.size(), listFromSet.size()))
.mapToObj(index -> new AbstractMap.SimpleEntry<>(
listFromSet.get(index).getId(), // key
list.get(index)) // value
)
.collect(Collectors.toMap(Entry::getKey, Entry::getValue)); // to Map
A few notes:
To know the highest index you can iterate to, you need to find the smaller of the two sizes: Math.min(list.size(), listFromSet.size()).
map(Collectors.toMap(...)) is not a valid construct and doesn't convert a Stream to a Map; moreover, the method map is not a terminal operation. Use .collect(Collectors.toMap(...)) instead, which is.
Not all the keys from the set might be used, not all the values from the list might be used, there is no guaranteed order of the keys, and the key-value matching will be random.
If I were to implement this, I'd definitely go for a simple for-loop rather than Streams.
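The plain-loop alternative could look roughly like the sketch below. The Item class is a hypothetical stand-in for the question's element type (only getId() is taken from the question), and the list is assumed to hold Strings, as the question's Iterator<String> suggests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PairSetWithList {

    // Hypothetical element type; only getId() is known from the question.
    public static class Item {
        private final String id;

        public Item(String id) {
            this.id = id;
        }

        public String getId() {
            return id;
        }
    }

    // Pairs elements in iteration order and stops as soon as either side runs out.
    public static Map<String, String> pair(Set<Item> set, List<String> list) {
        Map<String, String> result = new HashMap<>();
        Iterator<Item> keys = set.iterator();
        Iterator<String> values = list.iterator();
        while (keys.hasNext() && values.hasNext()) {
            result.put(keys.next().getId(), values.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashSet is used here only so that the iteration order is predictable.
        Set<Item> set = new LinkedHashSet<>(
                Arrays.asList(new Item("a"), new Item("b"), new Item("c")));
        List<String> list = Arrays.asList("x", "y");
        System.out.println(pair(set, list));
    }
}
```

With a HashSet instead of a LinkedHashSet, which id ends up paired with which list element is not predictable, which is the caveat from the notes above.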
I think what you want to achieve is called "zip" in functional programming. In Java, this would mean making a new stream from two existing streams by combining each pair of corresponding elements of the given streams.
Look at this question to see how to do it:
Zipping streams using JDK8 with lambda (java.util.stream.Streams.zip)

GroupBy on ArrayList of HashMap in java

I want to do a "group-by" on an ArrayList of HashMaps. As my data is not fixed, I don't have any fixed classes.
Data is shown as below.
[{"name":"laxman","state":"Karnataka","Mobile":9034782882},
{"name":"rahul","state":"Kerala","Mobile":9034782882},
{"name":"laxman","state":"karnataka","Mobile":9034782882},
{"name":"ram","state":"delhi","Mobile":9034782882}]
The above keys are not fixed, So, I can't have classes for it.
Data and formulas will be dynamic. But for now, I am taking this example to understand Stream.Collector on this data.
Now, I want to get the count on basis of name and state,
So basically I want to group-by on name and state and want to get count.
I tried to use Stream.Collector but am not able to achieve what I want.
You can accomplish this with Collectors.groupingBy, using a List as the key of the returned Map:
Map<List<String>, Long> result = yourListOfMaps.stream()
.collect(Collectors.groupingBy(
m -> Arrays.asList(String.valueOf(m.get("name")), String.valueOf(m.get("state"))),
Collectors.counting()));
This works well because all implementations of List in Java implement hashCode and equals consistently, which is a must for every class that is to be used as the key of any Map implementation.
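A runnable sketch of this grouping on the sample data follows. Note that List keys compare their elements with equals, so "Karnataka" and "karnataka" would be counted as different states. The sketch lowercases both fields first; that normalization is an assumption about what the asker wants, not part of the answer above. The class and helper names are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByNameAndState {

    // Counts rows per (name, state) pair, using a two-element List as the composite key.
    public static Map<List<String>, Long> countByNameAndState(List<Map<String, Object>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                m -> Arrays.asList(normalize(m.get("name")), normalize(m.get("state"))),
                Collectors.counting()));
    }

    // Assumption: grouping should ignore case, so values are lowercased before comparison.
    private static String normalize(Object value) {
        return String.valueOf(value).toLowerCase(Locale.ROOT);
    }

    // Helper that builds one row shaped like the question's sample data.
    public static Map<String, Object> row(String name, String state, long mobile) {
        Map<String, Object> m = new HashMap<>();
        m.put("name", name);
        m.put("state", state);
        m.put("Mobile", mobile);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row("laxman", "Karnataka", 9034782882L),
                row("rahul", "Kerala", 9034782882L),
                row("laxman", "karnataka", 9034782882L),
                row("ram", "delhi", 9034782882L));
        System.out.println(countByNameAndState(rows));
    }
}
```

Without the normalize step, the two laxman rows would land in separate groups with a count of 1 each.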
You have to do groupingBy twice: once on the key and once again on the value.
Map<String, Map<Object, Long>> map = listOfMap.stream().flatMap(a -> a.entrySet().stream())
.collect(Collectors.groupingBy(Map.Entry<String, String>::getKey,
Collectors.groupingBy(Map.Entry::getValue, Collectors.counting())));
Output
{mobile={9034782882=4}, name={rahul=1, laxman=2, ram=1}, state={Karnataka=2, delhi=1, Kerala=1}}

Filtering a map

I have a list of entries, where each entry has studentId and subjectId attributes.
List<Candidate> candidates
class Candidate {
...
String studentId;
String subjectId;
}
The objective is to derive a map of subjectId to a list of studentIds, for those subjects which have been subscribed to by MORE than one student.
I can obviously create a temporary map by iterating over the candidates (a big count) and remove the single entries later, which seems a costly route.
Any other suggestions?
We are using Java 1.7
Following this answer (corrected and completed) :
Map<String, Integer> map = candidates.stream()
.collect(Collectors.groupingBy(Candidate::getSubjectId))
.entrySet().stream().filter(x -> x.getValue().size()>1)
.collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue().size()));
You could also use candidates.parallelStream().
When you say 'costly', it's not exactly clear what you mean, but I'll make an attempt.
With Java 8, you can use the groupingBy construct to create a
Map<String, List<Candidate>> groupedResults = candidates.stream().collect(Collectors.groupingBy(Candidate::getSubjectId));
and then simply filter out the entries where size <= 1. Rather simple, and since it uses streams, memory efficiency should not be an issue.
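Putting the two steps together, a sketch that returns the subjectId to studentIds map the question asks for might look like this. It needs Java 8, like the answers above. The Candidate class here is a minimal stand-in with assumed getters, and the names SharedSubjects and subjectsWithSeveralStudents are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SharedSubjects {

    // Minimal stand-in for the question's Candidate; the getters are assumed.
    public static class Candidate {
        private final String studentId;
        private final String subjectId;

        public Candidate(String studentId, String subjectId) {
            this.studentId = studentId;
            this.subjectId = subjectId;
        }

        public String getStudentId() {
            return studentId;
        }

        public String getSubjectId() {
            return subjectId;
        }
    }

    public static Map<String, List<String>> subjectsWithSeveralStudents(List<Candidate> candidates) {
        // Group student ids by subject; HashMap::new guarantees a mutable result map.
        Map<String, List<String>> bySubject = candidates.stream()
                .collect(Collectors.groupingBy(
                        Candidate::getSubjectId,
                        HashMap::new,
                        Collectors.mapping(Candidate::getStudentId, Collectors.toList())));
        // Drop subjects with a single student in place instead of building a second map.
        bySubject.values().removeIf(students -> students.size() <= 1);
        return bySubject;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("s1", "math"),
                new Candidate("s2", "math"),
                new Candidate("s3", "art"));
        System.out.println(subjectsWithSeveralStudents(candidates));
    }
}
```

The grouped map still holds an entry for every subject before the filtering step, so this is the same temporary-map approach the question describes, only written more compactly.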

Zip two lists into an immutable multimap in Java 8 with Guava?

The for loop looks like
ImmutableListMultiMap.<Key, Value>Builder builder
= ImmutableListMultiMap.<Key, Value>newBuilder();
for (int i = 0; i < Math.min(keys.length(), values.length()); i++) {
builder.put(keys.at(i), values.at(i));
}
A possible first step in Guava / Java 8 is
Streams.zip(keys, values, zippingFunction)
I think zippingFunction needs to return a map entry, but there isn't a publicly constructable list entry. So the "most" functional way I can write this is with a zipping function that returns a Pair, which I'm not sure exists in Guava, or returns a two-element list, which is a mutable type that does not properly connote there are exactly 2 elements.
This would be desired if I could create a map entry:
Streams.zip(keys, values, zippingFunction)
.collect(toImmutableListMultimap(e -> e.getKey(), e.getValue())
This seems like the best way, except it's not possible, and the zip into entries and unzip from entries still seems roundabout. Is there a way to make this possible or a way it can be improved?
If your lists are random access, you can do it without zipping:
ImmutableListMultimap<Key, Value> map = IntStream.range(0, Math.min(keys.size(), values.size()))
.boxed()
.collect(toImmutableListMultimap(i -> keys.get(i), i -> values.get(i)));
I think your procedural code is the most optimal solution already (both in terms of memory and speed, assuming random access lists). With small corrections, so that your code compiles, it would be:
ImmutableListMultimap.Builder<Key, Value> builder = ImmutableListMultimap.builder();
for (int i = 0; i < Math.min(keys.size(), values.size()); i++) {
builder.put(keys.get(i), values.get(i));
}
return builder.build();
If you really want to use streams in order to "be functional", zipping two streams is the way to go, but you'd still have to create intermediate "pair" objects before collecting to a multimap. You claim that "there isn't a publicly constructable list entry", but that's not true: there are the JDK's SimpleImmutableEntry and Guava's Maps.immutableEntry, which you can use here (and they fit better than a more generic Pair, which, in fact, cannot be found in either the JDK or Guava).
Using Streams#zip requires passing streams, so the final code would look like this:
Streams.zip(keys.stream(), values.stream(), SimpleImmutableEntry::new)
.collect(toImmutableListMultimap(Map.Entry::getKey, Map.Entry::getValue));
If you're open to using other "functional" Java libraries which allow more stream-related operations, you could use jOOL and its Seq.zip, which accepts iterable parameters:
Seq.zip(keys, values, SimpleImmutableEntry::new)
.collect(toImmutableListMultimap(Map.Entry::getKey, Map.Entry::getValue));
Another library would be StreamEx, which exposes EntryStream - an abstraction for key-value pair streams.
So, you almost did it right! Below is your updated code (assuming your lists are of Integer type):
Streams.zip(keys.stream(), values.stream(), AbstractMap.SimpleImmutableEntry::new)
.collect(toImmutableListMultimap(Map.Entry<Integer, Integer>::getKey,
Map.Entry<Integer, Integer>::getValue)
);
But I always love to do things at the native level as it gives you more power. See code below by using collect:
Streams.zip(keys.stream(), values.stream(), (k, v) -> new AbstractMap.SimpleImmutableEntry(k, v))
.collect(ImmutableListMultimap::builder,
ImmutableListMultimap.Builder::put,
(builder2, builder3) -> builder2.putAll(builder3.build())
).build();
Note: the BiConsumer builder2.putAll(builder3.build()) is only invoked when you use a parallel stream. That's how collect behaves (it is one of my favorite stream operations).

Lowercase all HashMap keys

I've run into a scenario where I want to lowercase all the keys of a HashMap (don't ask why, I just have to do this). The HashMap has some millions of entries.
At first, I thought I'd just create a new Map, iterate over the entries of the map that is to be lowercased, and add the respective values. This task should run only once per day or something like that, so I thought I could bear this.
Map<String, Long> lowerCaseMap = new HashMap<>(myMap.size());
for (Map.Entry<String, Long> entry : myMap.entrySet()) {
lowerCaseMap.put(entry.getKey().toLowerCase(), entry.getValue());
}
This, however, caused some OutOfMemory errors when my server was overloaded at the very moment I was about to copy the Map.
Now my question is, how can I accomplish this task with the smallest memory footprint?
Would removing each key after lowercased - added to the new Map help?
Could I utilize java8 streams to make this faster? (e.g something like this)
Map<String, Long> lowerCaseMap = myMap.entrySet().parallelStream().collect(Collectors.toMap(entry -> entry.getKey().toLowerCase(), Map.Entry::getValue));
Update
It seems that it's a Collections.unmodifiableMap so I don't have the option of
removing each key after lowercased - added to the new Map
Instead of using HashMap, you could try using a TreeMap with case-insensitive ordering. This would avoid the need to create a lower-case version of each key:
Map<String, Long> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
map.putAll(myMap);
Once you've constructed this map, put() and get() will behave case-insensitively, so you can save and fetch values using all-lowercase keys. Iterating over keys will return them in their original, possibly upper-case forms.
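A small sketch of this behavior is shown below; the class and method names are placeholders for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveKeys {

    // Copies the entries into a TreeMap whose comparator ignores case.
    public static Map<String, Long> caseInsensitiveCopy(Map<String, Long> source) {
        Map<String, Long> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        map.putAll(source);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Long> myMap = new HashMap<>();
        myMap.put("Hello", 1L);
        myMap.put("WORLD", 2L);

        Map<String, Long> map = caseInsensitiveCopy(myMap);
        System.out.println(map.get("hello"));         // lookup with a lowercase key
        System.out.println(map.containsKey("World")); // any casing matches
        System.out.println(map.keySet());             // keys keep their original casing
    }
}
```

Two points to weigh: lookups in a TreeMap are O(log n) rather than the expected O(1) of a HashMap, and keys that differ only in case collapse into a single entry. The copy still allocates one tree node per entry, but it reuses the existing key strings instead of creating lowercased copies.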
Here are some similar questions:
Case insensitive string as HashMap key
Is there a good way to have a Map<String, ?> get and put ignoring case?
You cannot call remove on the map while iterating over it; you will get a ConcurrentModificationException if you try (only removal through the iterator itself is safe).
As the issue is an OutOfMemoryError, not a performance problem, using a parallel stream will not help either.
Although some of the work in the Stream API is done lazily, this will still lead to having two maps in memory at some point, so you will still have the issue.
To work around it, I only see two ways:
Give more memory to your process (by increasing -Xmx on the Java command line). Memory is cheap these days ;)
Split the map and work in chunks: for example, you divide the size of the map by ten, process one chunk at a time, and delete the processed entries before processing the next chunk. This way, instead of having two times the map in memory, you will have only about 1.1 times the map.
For the split algorithm, you can try something like this using the Stream API:
Map<String, String> toMap = new HashMap<>();
// round up so that the remainder of the division is processed as well
int chunk = (fromMap.size() + 9) / 10;
for(int i = 1; i<= 10; i++){
//process the chunk
List<Entry<String, String>> subEntries = fromMap.entrySet().stream().limit(chunk)
.collect(Collectors.toList());
for(Entry<String, String> entry : subEntries){
toMap.put(entry.getKey().toLowerCase(), entry.getValue());
fromMap.remove(entry.getKey());
}
}
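A variant of the same idea, sketched under the assumption that the source map is mutable (which, per the question's update, is not the case for a Collections.unmodifiableMap), moves the entries one at a time through the entry-set iterator. Removal through the iterator does not trigger a ConcurrentModificationException. The class and method names are invented, and the value type Long follows the question's Map<String, Long>.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;

public class LowercaseKeysMove {

    // Moves every entry into a new map under its lowercased key, emptying the source as it goes.
    public static Map<String, Long> moveWithLowercaseKeys(Map<String, Long> fromMap) {
        Map<String, Long> toMap = new HashMap<>();
        Iterator<Map.Entry<String, Long>> it = fromMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            toMap.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
            // Removing via the iterator is the supported way to delete during iteration.
            it.remove();
        }
        return toMap;
    }

    public static void main(String[] args) {
        Map<String, Long> fromMap = new HashMap<>();
        fromMap.put("Alpha", 1L);
        fromMap.put("BETA", 2L);
        System.out.println(moveWithLowercaseKeys(fromMap));
        System.out.println(fromMap.isEmpty());
    }
}
```

How much memory this actually saves has not been measured here. Each removed entry becomes eligible for garbage collection, but the source map's internal table is not shrunk by the removals.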
The concerns in the above answers are correct, and you might need to reconsider the data structure you are using.
For me, I had a simple map whose keys I needed to change to lower case.
Take a look at my snippet; it's a trivial solution and performs badly:
private void convertAllFilterKeysToLowerCase() {
HashSet keysToRemove = new HashSet();
getFilters().keySet().forEach(o -> {
if(!o.equals(((String) o).toLowerCase()))
keysToRemove.add(o);
});
keysToRemove.forEach(o -> getFilters().put(((String) o).toLowerCase(), getFilters().remove(o)));
}
Not sure about the memory footprint. If using Kotlin, you can try the following.
val lowerCaseMap = myMap.mapKeys { it.key.toLowerCase() }
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/map-keys.html
