Fill Map<String,Map<String,Integer>> with Stream - java

I have a LinkedList with data (author, date, LinkedList<Changes(lines, path)>). Now I want to use a stream to build from this a Map<Filepath, Map<Author, changes>>:
public Map<String, Map<String, Integer>> authorFragmentation(List<Commit> commits) {
    return commits.stream()
            .map(Commit::getChangesList)
            .flatMap(changes -> changes.stream())
            .collect(Collectors.toMap(
                    Changes::getPath,
                    Collectors.toMap(
                            Commit::getAuthorName,
                            (changes) -> 1,
                            (oldValue, newValue) -> oldValue + 1)));
}
I tried it this way, but it doesn't work. How can I create this map of maps with a stream and count the changes at the same time?

Jeremy Grand is completely correct in his comment: inside your collector, the stream has long since forgotten that it started out from Commit objects, so you cannot use Commit::getAuthorName there. The challenge is to carry the author name along to the place where you also have the path. One solution is to put both into a newly created string array (since both are strings).
public Map<String, Map<String, Long>> authorFragmentation(List<Commit> commits) {
    return commits.stream()
            .flatMap(c -> c.getChangesList()
                    .stream()
                    .map((Changes ch) -> new String[] { c.getAuthorName(), ch.getPath() }))
            .collect(Collectors.groupingBy(sa -> sa[1],
                    Collectors.groupingBy(sa -> sa[0], Collectors.counting())));
}
Collectors.counting() insists on counting into a Long, not an Integer, so I have modified your return type. A conversion to Integer would be possible if necessary, but I would first consider whether I could live with Long.
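If the Integer return type really is required, the Long count can be converted downstream; a minimal sketch building on the code above (same assumptions, untested):
// Same pipeline, but collectingAndThen converts the Long count to Integer.
public Map<String, Map<String, Integer>> authorFragmentation(List<Commit> commits) {
    return commits.stream()
            .flatMap(c -> c.getChangesList()
                    .stream()
                    .map(ch -> new String[] { c.getAuthorName(), ch.getPath() }))
            .collect(Collectors.groupingBy(sa -> sa[1],
                    Collectors.groupingBy(sa -> sa[0],
                            Collectors.collectingAndThen(Collectors.counting(), Long::intValue))));
}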
It’s not the most beautiful stream code, and I will wait to see if other suggestions come up.
The code compiles, but since I have neither your classes nor your data, I have not tried running it. If there are any issues, please report back.

Your mistake is that the map/flatMap calls "throw away" the Commit: you no longer know which Commit a Change belongs to when you try to collect. To keep that information around, I'd recommend creating a small helper class (you could also use a simple Pair):
public class OneChange {
    private Commit commit;
    private Change change;

    public OneChange(Commit commit, Change change) {
        this.commit = commit;
        this.change = change;
    }

    public String getAuthorName() { return commit.getAuthorName(); }
    public String getPath()       { return change.getPath(); }
    public Integer getLines()     { return change.getLines(); }
}
You can then flatMap to that, group it by path and author, and then sum up the lines changed:
commits.stream()
       .flatMap(commit -> commit.getChanges().stream()
               .map(change -> new OneChange(commit, change)))
       .collect(Collectors.groupingBy(OneChange::getPath,
               Collectors.groupingBy(OneChange::getAuthorName,
                       Collectors.summingInt(OneChange::getLines))));
In case you do not want to sum up the lines but just count the Changes, replace Collectors.summingInt(OneChange::getLines) with Collectors.counting().
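That counting variant would look like this (note that the value type then becomes Long):
// Counting the Changes per path and author instead of summing lines.
Map<String, Map<String, Long>> counts = commits.stream()
        .flatMap(commit -> commit.getChanges().stream()
                .map(change -> new OneChange(commit, change)))
        .collect(Collectors.groupingBy(OneChange::getPath,
                Collectors.groupingBy(OneChange::getAuthorName,
                        Collectors.counting())));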

Related

How to collect data from a stream in different lists based on a condition?

I have a stream of data as shown below and I wish to collect the data based on a condition.
Stream of data:
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L120;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L121;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L126;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L122;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
I wish to collect the data based on index 2 (L100, L121, ...) and store it in different lists for L120, L121, L122, etc. using Java 8 streams. Any suggestions?
Note: splittedLine array below is my stream of data.
For instance: I have tried the following but I think there's a shorter way:
List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");

List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> Arrays.stream(splittedLine)
                .filter(line -> {
                    String[] values = line.split(String.valueOf(DELIMITER));
                    if (values.length > 0) {
                        return entity.equals(values[2]);
                    } else {
                        return false;
                    }
                }).collect(Collectors.toList()))
        .collect(Collectors.toList());
I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.
Assuming splittedLine is the array of lines, I'd probably do something like this:
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);
Map<String, List<String>> result =
        Arrays.stream(splittedLine)
              .map(line -> {
                  String[] values = line.split(delimiter);
                  if (values.length < 3) {
                      return null;
                  }
                  return new AbstractMap.SimpleEntry<>(values[2], line);
              })
              .filter(Objects::nonNull)
              .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
              .collect(Collectors.groupingBy(Map.Entry::getKey,
                      Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Note that this isn't necessarily shorter but has a couple of other advantages:
It's not O(n*m) but rather O(n), since the entity-name lookup is a hash-based set lookup, so it should be faster for non-trivial stream sizes
You get an entity name for each list rather than having to rely on the indices in both lists
It's easier to understand because you use distinct steps:
split and map the line
filter null values, i.e. lines that aren't valid in the first place
filter lines that don't have any of the L100 entity names
collect the filtered lines by entity name so you can easily access the sub lists
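For example, the sub list for one entity can then be read directly off the result (a hypothetical usage of the map built above):
// All lines whose third field is "L100", or an empty list if none matched.
List<String> l100Lines = result.getOrDefault("L100", Collections.emptyList());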
I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.
First, I would create a model modelling our data:
public record LBasedEntity(long id, int zero, String lcode, …) { }
Then, create a method to parse the line. This could just as well be an external parsing library, for this looks like CSV with a semicolon as delimiter.
private static LBasedEntity parse(String line) {
    String[] parts = line.split(";");
    if (parts.length < 3) {
        return null;
    }
    long id = Long.parseLong(parts[0]);
    int zero = Integer.parseInt(parts[1]);
    String lcode = parts[2];
    …
    return new LBasedEntity(id, zero, lcode, …);
}
Then the mapping is trivial:
Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
        .map(line -> parse(line))
        .filter(Objects::nonNull)
        .filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
        .collect(Collectors.groupingBy(LBasedEntity::lcode));
map(line -> parse(line)) parses the line into an LBasedEntity object (or whatever you call it);
filter(Objects::nonNull) filters out all null values produced by the parse method;
The next filter selects all entities of which the lcode property is contained in the L100_ENTITY_NAMES list (I would turn this into a Set, to speed things up);
Then a Map is created with key-value pairs of L100_ENTITY_NAME → List<LBasedEntity>.
You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:
splitLines.groupBy(_(2)) // Map[String, List[String]]
Of course, you want this in Java, and in my opinion, not using streams here makes sense due to Java's lack of a fold or groupBy function.
HashMap<String, ArrayList<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue; // need at least three fields to read line[2]
    ArrayList<String> xs = map.getOrDefault(line[2], new ArrayList<>());
    xs.addAll(Arrays.asList(line));
    map.put(line[2], xs);
}
As you can see, it's very easy to understand, and actually shorter than the stream based solution.
I'm leveraging two key methods on a HashMap.
The first is getOrDefault: if the value associated with our key doesn't exist, we can provide a default; in our case, an empty ArrayList.
The second is put, which actually acts like a putOrReplace because it lets us override the previous value associated with the key.
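As a side note, Java 8's computeIfAbsent would fold those two calls into one; a minimal sketch of the same loop body:
// computeIfAbsent creates and stores the empty list on first access,
// so the explicit put is no longer needed.
map.computeIfAbsent(line[2], key -> new ArrayList<>())
   .addAll(Arrays.asList(line));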
I hope that was helpful. :)
You're asking for a shorter way to achieve the same thing, but actually your code is good. I guess the only part that makes it look lengthy is the if/else check in the stream.
if (values.length > 0) {
    return entity.equals(values[2]);
} else {
    return false;
}
I would suggest introducing two tiny private methods to improve the readability, like this:
List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> getLinesByEntity(splittedLine, entity))
        .collect(Collectors.toList());

private List<String> getLinesByEntity(String[] splittedLine, String entity) {
    return Arrays.stream(splittedLine)
            .filter(line -> isLineMatched(entity, line))
            .collect(Collectors.toList());
}

private boolean isLineMatched(String entity, String line) {
    String[] values = line.split(String.valueOf(DELIMITER));
    return values.length > 2 && entity.equals(values[2]);
}

Extract variable from spring webflux reactive pipeline

I am working on reactive streams using Spring WebFlux. I want to extract a variable (name) from the middle of the reactive pipeline and use it somewhere else, as follows:
public class Example {
    public Mono<String> test() {
        String name;
        return Mono.just("some random string")
                .map(s -> {
                    name = s.toUpperCase();
                    return name;
                }).map(...)
                .map(...)
                .flatMap(...)
                .map(...)
                .map(result -> result + name)
                .doOnSuccess(res -> asyncPublish(name));
    }

    public void asyncPublish(String name) {
        // logic to write to a messaging queue asynchronously
    }
}
The above code is not working. This is a contrived example but shows what I want to achieve.
Note: I don't want to use multiple zip operators just to carry the name all the way to the last map where I want to use it. Is there a way I can store it in a variable, as shown above, and then use it wherever I need it?
You might, for example, use a Tuple2 to pass the value of name along with the modified data through the chain:
return Mono.just("some random string")
        .map(s -> s.toUpperCase())
        .map(s -> Tuples.of(s, x(s))) // given that x(s) is the transformation of this map statement
        .map(...)     // keeping the Tuple with the value of `name` in the first slot...
        .flatMap(...) // keeping the Tuple with the value of `name` in the first slot...
        .map(resultTuple -> Tuples.of(resultTuple.getT1(), resultTuple.getT2() + resultTuple.getT1()))
        .doOnSuccess(resultTuple -> asyncPublish(resultTuple.getT1()))
        .map(resultTuple -> resultTuple.getT2()); // in case the returned Mono should contain the modified value...
Tuples is from the package reactor.util.function and part of reactor-core.
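For reference, the relevant imports from reactor-core would be:
import reactor.util.function.Tuple2;
import reactor.util.function.Tuples;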
Another way (without passing the value through the chain using Tuples) could be to use AtomicReference (but I still think that the Tuple way is cleaner). The AtomicReference way might look like this:
public Mono<String> test() {
    final AtomicReference<String> nameRef = new AtomicReference<>();
    return Mono.just("some random string")
            .map(s -> {
                final String name = s.toUpperCase();
                nameRef.set(name);
                return name;
            }).map(...)
            .map(...)
            .flatMap(...)
            .map(...)
            .map(result -> result + nameRef.get())
            .doOnSuccess(res -> asyncPublish(nameRef.get()));
}

public void asyncPublish(String name) {
    // logic to write to a messaging queue asynchronously
}

Java 8 - Update two properties in same Stream code

I'm wondering if there is a way to update an object twice within the same stream lambda code. I need to update two properties of a class: value and recordsCount.
Object:
public class HistoricDataModelParsed {
    private Date startDate;
    private Date endDate;
    private Double value;
    private int recordsCount;
}
I tried doing something like this:
val existingRecord = response.stream()
        .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
        .findAny()
        .orElse(null);

response.stream()
        .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
        .findAny()
        .orElse(existingRecord)
        .setValue(valueAdded)
        .setRecordsCount(amount);
But I got this error: "Cannot invoke setRecordsCount(int) on the primitive type void"
So I ended up doing the stream two times to update each of the two fields I needed
response.stream()
        .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
        .findAny()
        .orElse(existingRecord)
        .setValue(valueAdded);

response.stream()
        .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
        .findAny()
        .orElse(existingRecord)
        .setRecordsCount(amount);
Is there a way I can achieve what I need without the need to stream two times the list?
The return type of setValue is void, not HistoricDataModelParsed, so you cannot chain a call to setRecordsCount (a method of HistoricDataModelParsed) onto it.
You could have added a method in HistoricDataModelParsed which takes two parameters for value and recordsCount:
public void setValueAndCount(Double value, int count) {
    this.value = value;
    this.recordsCount = count;
}
Then call this method after orElse:
response.stream()
        .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
        .findAny()
        .orElse(existingRecord)
        .setValueAndCount(valueAdded, amount);
The state of an object should not change within a stream, as it can lead to inconsistent results. But you can create new instances of the objects and pass the new values via the constructor. Here is a simple record that demonstrates the approach. Records are basically immutable classes that have no setters; the getters are named after the components. A regular class would also work in this example.
record Temp(int getA, int getB) {
    @Override
    public String toString() {
        return "[" + getA + ", " + getB + "]";
    }
}
Some data
List<Temp> list = List.of(new Temp(10, 20), new Temp(50, 200),
        new Temp(100, 200));
And the transformation: a new instance of Temp with new values is created (along with the old ones, to completely populate the constructor) when there is a match; otherwise, the existing object is passed along.
List<Temp> result = list.stream()
        .map(t -> t.getA() == 50 ? new Temp(2000, t.getB()) : t)
        .toList();
System.out.println(result);
Prints
[[10, 20], [2000, 200], [100, 200]]
To address the void error you got: a stream pipeline expects a value to flow through each stage, so if a method returns void, you have to return the value yourself. Here is an example:
stream.map(t -> { voidReturnMethod(t); return t; }).toList();
The return ensures the pipeline continues.
Simply store the result of orElse and then call your methods on it.
HistoricDataModelParsed record =
        response.stream()
                .filter(dateTime -> fromDate.equals(dateTime.getStartDate()))
                .findAny()
                .orElse(existingRecord);

record.setValue(valueAdded);
record.setRecordsCount(amount);

How to parallelize a loop in Java

In the following code, a local method is called on every element of a HashSet. If it returns a special value, we halt the loop; otherwise, we add each return value to a new HashSet.
HashSet<Object> myHashSet = …;
HashSet<Object> mySecondHashSet = …;
for (Object s : myHashSet) {
    Object value = my_method(s);
    if (value == specialValue)
        return value;
    else
        mySecondHashSet.add(value);
}
I’d like to parallelize this process. None of the objects in the HashSet have any objects in common (it’s a tree-like structure), so I know they can run without any synchronization issues. How do I modify the code so that each call of my_method(s) starts a new thread, and so that if one of the threads evaluates to the special value, all the threads halt and the special value is returned?
With Java 8 in mind, this could be relatively simple, though it won't preserve your initial code's semantics.
In case all you need is to return the special value once you hit it:
if (myHashSet.parallelStream()
             .map(x -> method(x))
             .anyMatch(x -> x == specialValue)) {
    return specialValue;
}
If you need to keep the transformed values until you meet the special value, you already got an answer from @Elliot in the comments, though it needs to be mentioned that the semantics are not the same as in your original code, since no ordering will be preserved.
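One sketch of that idea (not necessarily what was suggested there; note it gives up the early exit, since everything is transformed before the check):
// Transform every element first, then test for the special value.
Set<Object> transformed = myHashSet.parallelStream()
        .map(x -> method(x))
        .collect(Collectors.toSet());
if (transformed.contains(specialValue)) {
    return specialValue;
}
mySecondHashSet.addAll(transformed);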
While it remains to be verified, I would expect the following to be optimized so that it stops once it hits the wanted special value:
if (myHashSet.parallelStream()
             .anyMatch(x -> method(x) == specialValue)) {
    return specialValue;
}
I would do that in two passes:
find if any of the transformed set elements matches the special value;
transform them to a Set.
Starting a new thread for each transformation is way too heavy and will bring your machine to its knees (unless you have very few elements, in which case parallelizing is probably not worth the effort).
To avoid transforming the values twice with my_method, you can do the transformation lazily and memoize the result:
private class Memoized {
    private Object value;
    private Object transformed;
    private Function<Object, Object> transform;

    public Memoized(Object value, Function<Object, Object> transform) {
        this.value = value;
        this.transform = transform; // without this assignment, getTransformed() would throw a NullPointerException
    }

    public Object getTransformed() {
        if (transformed == null) {
            transformed = transform.apply(value);
        }
        return transformed;
    }
}
And then you can use the following code:
Set<Memoized> memoized =
        myHashSet.stream() // no need to go parallel here
                 .map(o -> new Memoized(o, this::my_method))
                 .collect(Collectors.toSet());

Optional<Memoized> matching = memoized.parallelStream()
        .filter(m -> m.getTransformed().equals(specialValue))
        .findAny();

if (matching.isPresent()) {
    return matching.get().getTransformed();
}

Set<Object> allTransformed = memoized.parallelStream()
        .map(m -> m.getTransformed())
        .collect(Collectors.toSet());
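Finally, to mirror the rest of the original loop, the transformed values can be copied into the target set:
mySecondHashSet.addAll(allTransformed);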

Group and Reduce list of objects

I have a list of objects with many duplicates and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 streams (I know how to do this via old-skool means, but this is an experiment).
This is what I have right now. I don't really like it because the map-building seems extraneous, and the values() collection is a view of the backing map, so you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
    Collection<Foo> foos = Stream.of("foo", "bar", "baz")
            .flatMap(this::getfoos)
            .collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
                l.ids.addAll(r.ids);
                return l;
            })).values();

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);

Map<String, Optional<Foo>> collect = input.collect(
        Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));

Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
    this.name = n;
    this.ids.addAll(ids);
}

public static Foo merge(Foo src, Foo dest) {
    List<Integer> merged = new ArrayList<>();
    merged.addAll(src.ids);
    merged.addAll(dest.ids);
    return new Foo(src.name, merged);
}
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question that shows you should implement a hashCode method or distinct may not behave correctly.
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
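For illustration, if Foo were to be used with distinct, equality would have to be defined on the name field; a hypothetical sketch (not part of the question's Foo):
// Hypothetical equals/hashCode for Foo, keyed on name only.
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Foo)) return false;
    return name.equals(((Foo) o).name);
}

@Override
public int hashCode() {
    return name.hashCode();
}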
It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.
public void reduce() {
    ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
            .collect(Collectors.collectingAndThen(
                    Collectors.groupingBy(f -> f.name,
                            Collectors.reducing(Foo.identity(), Foo::merge)),
                    map -> map.values().stream()
                            .collect(Collectors.toCollection(ArrayList::new))));

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    private static final Foo BASE_FOO = new Foo("", 0);

    public static Foo identity() {
        return BASE_FOO;
    }

    // use only if side effects to the argument objects are okay
    public static Foo merge(Foo fooOne, Foo fooTwo) {
        if (fooOne == BASE_FOO) {
            return fooTwo;
        } else if (fooTwo == BASE_FOO) {
            return fooOne;
        }
        fooOne.ids.addAll(fooTwo.ids);
        return fooOne;
    }

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
If the input elements are supplied in random order, then having an intermediate map is probably the best solution. However, if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately there's no Stream API method which would allow you to do such a thing easily and effectively. One possible solution is to write a custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
    return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
            (list, t) -> {
                Foo f;
                if (list.isEmpty() || !(f = list.get(list.size() - 1)).name.equals(t.name))
                    list.add(t);
                else
                    f.ids.addAll(t.ids);
            },
            (l1, l2) -> {
                if (l1.isEmpty())
                    return l2;
                if (l2.isEmpty())
                    return l1;
                if (l1.get(l1.size() - 1).name.equals(l2.get(0).name)) {
                    l1.get(l1.size() - 1).ids.addAll(l2.get(0).ids);
                    l1.addAll(l2.subList(1, l2.size()));
                } else {
                    l1.addAll(l2);
                }
                return l1;
            }));
}
My tests show that this collector is always faster than collecting to map (up to 2x depending on average number of duplicate names), both in sequential and parallel mode.
Another approach is to use my StreamEx library which provides a bunch of "partial reduction" methods including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
    return StreamEx.of(stream)
            .collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
                l.ids.addAll(r.ids);
                return l;
            }).toList();
}
This method accepts two arguments: a BiPredicate which is applied to adjacent pairs of elements and should return true if the elements should be merged, and a BinaryOperator which performs the merging. This solution is a little bit slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than the toMap solution, and it's simpler and somewhat more flexible, as collapse is an intermediate operation, so you can collect in another way.
Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce the performance, making it slower than the toMap solution.
As already pointed out by others, an intermediate Map is unavoidable, as that’s how the objects to merge are found. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
List<Foo> foos = Stream.of("foo", "bar", "baz")
        .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)))
        .collect(collectingAndThen(groupingBy(f -> f.name),
                m -> m.entrySet().stream()
                        .map(e -> new Foo(e.getKey(),
                                e.getValue().stream()
                                        .flatMap(f -> f.ids.stream())
                                        .collect(toList())))
                        .collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
    name = n;
    ids = l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves both as a single item and as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would have been sufficient.
Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
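For completeness: Java 9 later added exactly such a collector (Collectors.flatMapping), with which the flatmapping can move into the grouping step, while the second step still creates the result items. A sketch assuming Java 9+:
// Java 9+: flatMapping merges the id lists during the grouping itself.
List<Foo> foos = Stream.of("foo", "bar", "baz")
        .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)))
        .collect(collectingAndThen(
                groupingBy(f -> f.name, flatMapping(f -> f.ids.stream(), toList())),
                m -> m.entrySet().stream()
                        .map(e -> new Foo(e.getKey(), e.getValue()))
                        .collect(toList())));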
