How to parallelize a loop in Java - java

In the following code, a local method is called on every element of a HashSet. If it returns a special value we halt the loop. Otherwise we add every return value to a new HashSet.
HashSet<Object> myHashSet=…;
HashSet<Object> mySecondHashSet=…;
for (Object s : myHashSet) {
Object value = my_method(s);
if(value==specialValue)
return value;
else
mySecondHashSet.add(value);
}
I’d like to parallelize this process. None of the objects in the HashSet have any objects in common (it’s a tree-like structure), so I know they can run without any synchronization issues. How do I modify the code so that each call of my_method(s) starts a new thread, and so that if one of the threads evaluates to the special value, all the threads halt without returning and the special value is returned?

With Java 8 this can be relatively simple, although it won't preserve the semantics of your original code.
If all you need is to return the special value as soon as you hit it:
if (myHashSet.parallelStream()
.map(x -> method(x))
.anyMatch(x -> x == specialValue)) {
return specialValue;
}
If you need to keep the transformed values until you meet the special value, you already got an answer from @Elliot in the comments. Note, though, that the semantics are not the same as in your original code, since no order is preserved.
I have not verified it, but I would expect the following to short-circuit and stop once it hits the special value:
if (myHashSet.parallelStream()
.anyMatch(x -> method(x) == specialValue)) {
return specialValue;
}
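Since the answer above leaves the short-circuiting unverified, here is a minimal sketch for observing it. The names `ShortCircuitDemo`, `transform`, and `SPECIAL` are hypothetical stand-ins (with `transform` playing the role of my_method), and the input is a small list of integers rather than the asker's objects. It counts how often the transformation runs before anyMatch returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {
    // Counts how often the (hypothetical) transformation is invoked.
    static final AtomicInteger calls = new AtomicInteger();
    static final int SPECIAL = -1;

    // Hypothetical stand-in for my_method: element 3 maps to the special value.
    static int transform(int x) {
        calls.incrementAndGet();
        return x == 3 ? SPECIAL : x * 10;
    }

    // Returns true if any transformed element equals SPECIAL.
    static boolean containsSpecial(List<Integer> input, boolean parallel) {
        calls.set(0);
        return (parallel ? input.parallelStream() : input.stream())
                .anyMatch(x -> transform(x) == SPECIAL);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(containsSpecial(input, false) + " after " + calls.get() + " calls");
        System.out.println(containsSpecial(input, true) + " after " + calls.get() + " calls");
    }
}
```

On the sequential stream the transformation runs exactly three times, because evaluation stops at the first match. On the parallel stream the count varies from run to run, since other worker threads may already have started on their elements, but the result is the same.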

I would do that in two passes:
find if any of the transformed set elements matches the special value;
transform them to a Set.
Starting a new thread for each transformation is way too heavy, and will bring your machine to its knees (unless you have very few elements, in which case parallelizing is probably not worth the effort).
To avoid transforming the values twice with my_method, you can do the transformation lazily and memoize the result:
private class Memoized {
private Object value;
private Object transformed;
private Function<Object, Object> transform;
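public Memoized(Object value, Function<Object, Object> transform) {
this.value = value;
this.transform = transform; // without this assignment getTransformed() throws a NullPointerException
}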
public Object getTransformed() {
if (transformed == null) {
transformed = transform.apply(value);
}
return transformed;
}
}
And then you can use the following code:
Set<Memoized> memoized =
myHashSet.stream() // no need to go parallel here
.map(o -> new Memoized(o, this::my_method))
.collect(Collectors.toSet());
Optional<Memoized> matching = memoized.parallelStream()
.filter(m -> m.getTransformed().equals(specialValue))
.findAny();
if (matching.isPresent()) {
return matching.get().getTransformed();
}
Set<Object> allTransformed =
memoized.parallelStream()
.map(m -> m.getTransformed())
.collect(Collectors.toSet());

Related

Can this code be reduced using Java 8 Streams?

I want to use Java 8 lambdas and streams to reduce the amount of code in the following method that produces an Optional. Is it possible to achieve?
My code:
protected Optional<String> getMediaName(Participant participant) {
for (ParticipantDevice device : participant.getDevices()) {
if (device.getMedia() != null && StringUtils.isNotEmpty(device.getMedia().getMediaType())) {
String mediaType = device.getMedia().getMediaType().toUpperCase();
Map<String, String> mediaToNameMap = config.getMediaMap();
if (mediaToNameMap.containsKey(mediaType)) {
return Optional.of(mediaToNameMap.get(mediaType));
}
}
}
return Optional.empty();
}
Yes. Assuming the following class hierarchy (I used records here).
record Media(String getMediaType) {
}
record ParticipantDevice(Media getMedia) {
}
record Participant(List<ParticipantDevice> getDevices) {
}
It is pretty self explanatory. Unless you have an empty string as a key you don't need, imo, to check for it in your search. The main difference here is that once the map entry is found, Optional.map is used to return the value instead of the key.
I also checked this out against your loop version and it works the same.
public static Optional<String> getMediaName(Participant participant) {
Map<String, String> mediaToNameMap = config.getMediaMap();
return participant.getDevices().stream()
.map(ParticipantDevice::getMedia).filter(Objects::nonNull)
.map(media -> media.getMediaType().toUpperCase())
.filter(mediaType -> mediaToNameMap.containsKey(mediaType))
.findFirst()
.map(mediaToNameMap::get);
}
Firstly, since the Map of media types returned by config.getMediaMap() doesn't depend on a particular device, it makes sense to obtain it before processing the collection of devices. That is, regardless of the approach (imperative or declarative), do it outside the loop, or before creating the stream, to avoid generating the same Map multiple times.
To implement this method with streams, you need the filter() operation, which expects a Predicate, to apply the conditional logic, and map() to perform a transformation of the stream elements.
To get the first element that matches the conditions apply findFirst(), which produces an optional result, as a terminal operation.
protected Optional<String> getMediaName(Participant participant) {
Map<String, String> mediaToNameMap = config.getMediaMap();
return participant.getDevices().stream()
.filter(device -> device.getMedia() != null
&& StringUtils.isNotEmpty(device.getMedia().getMediaType())
)
.map(device -> device.getMedia().getMediaType().toUpperCase())
.filter(mediaToNameMap::containsKey)
.map(mediaToNameMap::get)
.findFirst();
}
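The filter/map/findFirst chain above can be tried in isolation with a reduced sketch. It assumes a simplification that is not in the question: the devices are replaced by a plain list of possibly null media type strings, and `MediaLookupDemo` and `firstKnownName` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

class MediaLookupDemo {
    // Returns the mapped name of the first non-empty media type that is a key of the map.
    static Optional<String> firstKnownName(List<String> mediaTypes, Map<String, String> names) {
        return mediaTypes.stream()
                .filter(Objects::nonNull)          // skip devices without a media type
                .filter(type -> !type.isEmpty())   // skip empty media types
                .map(String::toUpperCase)          // keys are assumed to be upper case
                .filter(names::containsKey)
                .map(names::get)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("VIDEO", "Video call");
        names.put("AUDIO", "Audio call");
        List<String> mediaTypes = Arrays.asList(null, "", "fax", "video", "audio");
        System.out.println(firstKnownName(mediaTypes, names));
    }
}
```

Because the stream is lazy, elements after the first match ("audio" here) are never looked up. The sketch assumes the map holds no null values, since findFirst() throws a NullPointerException if the selected element is null.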

Arbitrary created with flatMap does not consider the filter

I am trying jqwik (version 1.5.1) and I read from the documentation that I can create an Arbitrary whose generated value depends on the one supplied by another Arbitrary, specifically using the flatMap function.
My actual goal is different, but based on this idea: I need 2 Arbitrarys that always generate different values for a single test. This is what I tried:
@Provide
private Arbitrary<Tuple.Tuple2<Integer, Integer>> getValues() {
var firstArbitrary = Arbitraries.integers().between(1, Integer.MAX_VALUE);
var secondArbitrary = firstArbitrary.flatMap(first ->
Arbitraries.integers().between(1, Integer.MAX_VALUE).filter(i -> !i.equals(first)));
return Combinators.combine(firstArbitrary, secondArbitrary).as(Tuple::of);
}
@Property
public void test(@ForAll("getValues") Tuple.Tuple2<Integer, Integer> values) {
assertThat(values.get1()).isNotEqualTo(values.get2());
}
And it immediately fails with this sample:
Shrunk Sample (1 steps)
-----------------------
arg0: (1, 1)
Throwing an AssertionError of course:
java.lang.AssertionError:
Expecting:
1
not to be equal to:
1
I expected the filter function would have been enough to exclude the generated value produced by the firstArbitrary but it seems like it is not even considered, or more likely it does something else. What am I missing? Is there an easier way to make sure that, given a certain number of integer generators, they always produce different values?
The general idea of one generated value influencing the next generation step through flatMap is right. The thing you are missing is that you lose this coupling by combining firstArbitrary and secondArbitrary outside of the flat mapping scope. The fix is minor:
@Provide
private Arbitrary<Tuple.Tuple2<Integer, Integer>> getValues() {
var firstArbitrary = Arbitraries.integers().between(1, Integer.MAX_VALUE);
return firstArbitrary.flatMap(
first -> Arbitraries.integers().between(1, Integer.MAX_VALUE)
.filter(i -> !i.equals(first))
.map(second -> Tuple.of(first, second))
);
}
That said there are more - I'd argue simpler - ways to achieve your goal:
@Provide
private Arbitrary<Tuple.Tuple2<Integer, Integer>> getValues() {
var firstArbitrary = Arbitraries.integers().between(1, Integer.MAX_VALUE);
return firstArbitrary.tuple2().filter(t -> !t.get1().equals(t.get2()));
}
This gets rid of flat mapping, which means less effort while shrinking for jqwik.
Another possible solution:
@Provide
private Arbitrary<Tuple.Tuple2<Integer, Integer>> getValues() {
var firstArbitrary = Arbitraries.integers().between(1, Integer.MAX_VALUE);
return firstArbitrary.list().ofSize(2).uniqueElements().map(l -> Tuple.of(l.get(0), l.get(1)));
}
This one might seem a bit involved, but it has the advantage that no flat mapping and no filtering is being used. Filtering often reduces performance of generation, edge cases, exhaustive generation and shrinking. That's why I steer clear of filtering whenever I can without too much hassle.

Get unique Object from a Stream if present

Starting with a bean class MyBean with a single relevant property:
@Data
class MyBean {
private String myProperty;
}
Now I have got a set of these beans Set<MyBean> mySet usually with 0, 1, or 2 elements.
The question is: How do I retrieve myProperty from this set if it is equal for all elements, or else null. Preferably in a single line with effort O(n).
I found several examples to determine the boolean if all properties are equal. But I want to know the corresponding property.
Is there something smarter than this?
String uniqueProperty = mySet.stream().map(MyBean::getMyProperty).distinct().count() == 1
? mySet.stream().map(MyBean::getMyProperty).findAny().orElse(null)
: null;
Your version is already O(n).
It's possible to do this with a one-liner (although yours is too depending on how you write it).
String uniqueProperty = mySet.stream()
.map(MyBean::getMyProperty)
.map(Optional::ofNullable)
.reduce((a, b) -> a.equals(b) ? a : Optional.empty()) // Note: equals compares 2 Optionals here
.get() // unwraps first Optional layer
.orElse(null); // unwraps second layer
One case this cannot handle is when all property values are null: you cannot distinguish the set (null, null) from (null, "A"), for example, since both return null. Also note that .get() throws a NoSuchElementException when the set is empty.
A single iteration without streams looks much better for such a use case (this version assumes a non-empty set and non-null property values):
Iterator<MyBean> iterator = mySet.iterator();
String uniqueProperty = iterator.next().getMyProperty();
while (iterator.hasNext()) {
if (!iterator.next().getMyProperty().equals(uniqueProperty)) {
uniqueProperty = null; // some default value possibly
break;
}
}
You can call findAny() first and then check mySet again with allMatch() inside a filter(), requiring all items to match the first one:
String uniqueProperty = mySet.stream().findAny().map(MyBean::getMyProperty)
.filter(s -> mySet.stream().map(MyBean::getMyProperty).allMatch(s::equals))
.orElse(null);
The advantage of this is that allMatch() only evaluates as many elements as necessary: it stops at the first mismatch.
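Another way to get a single pass that also copes with an empty set is to look at no more than two distinct values. This is a sketch of a different technique than the answers above, not something taken from them. `UniqueValueDemo` and `uniqueOrNull` are hypothetical names, and the method works on the property values directly, so with MyBean you would put map(MyBean::getMyProperty) in front.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class UniqueValueDemo {
    // Returns the single distinct value, or null if there are zero or several distinct values.
    static String uniqueOrNull(Collection<String> values) {
        List<String> firstTwoDistinct = values.stream()
                .distinct()
                .limit(2) // a second distinct value is enough to know the answer
                .collect(Collectors.toList());
        return firstTwoDistinct.size() == 1 ? firstTwoDistinct.get(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(uniqueOrNull(Arrays.asList("A", "A")));
        System.out.println(uniqueOrNull(Arrays.asList("A", "B")));
        System.out.println(uniqueOrNull(Collections.<String>emptyList()));
    }
}
```

The three calls print A, null and null. The limit(2) lets the stream stop as soon as a second distinct value shows up.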

Fill Map<String,Map<String,Integer>> with Stream

I have a LinkedList of commit data (author, date, LinkedList<Changes(lines, path)>).
Now I want to use a stream to create a Map<Filepath, Map<Author, changes>> from it.
public Map<String, Map<String, Integer>> authorFragmentation(List<Commit> commits) {
return commits.stream()
.map(Commit::getChangesList)
.flatMap(changes -> changes.stream())
.collect(Collectors.toMap(
Changes::getPath,
Collectors.toMap(
Commit::getAuthorName,
(changes) -> 1,
(oldValue, newValue) -> oldValue + 1)));
}
I tried it like this, but it doesn't work.
How can I create this nested Map with a stream and count the changes at the same time?
Jeremy Grand is completely correct in his comment: in your collector it has long been forgotten that you started out from a stream of Commit objects, so you cannot use Commit::getAuthorName there. The challenge is how to keep the author name around to a place where you also got the path. One solution is to put both into a newly created string array (since both are strings).
public Map<String, Map<String, Long>> authorFragmentation(List<Commit> commits) {
return commits.stream()
.flatMap(c -> c.getChangesList()
.stream()
.map((Changes ch) -> new String[] { c.getAuthorName(), ch.getPath() }))
.collect(Collectors.groupingBy(sa -> sa[1],
Collectors.groupingBy(sa -> sa[0], Collectors.counting())));
}
Collectors.counting() insists on counting into a Long, not Integer, so I have modified your return type. I’m sure a conversion to Integer would be possible if necessary, but I would first consider whether I could live with Long.
It’s not the most beautiful stream code, and I will wait to see if other suggestions come up.
The code compiles, but since I have neither your classes nor your data, I have not tried running it. If there are any issues, please report back.
Your mistake is that the map/flatMap calls "throw away" the Commit. You do not know which Commit a Changes object belongs to when trying to collect. In order to keep that information I'd recommend creating a small helper class (you could use a simple Pair, though):
public class OneChange
{
private Commit commit;
private Changes change;
public OneChange(Commit commit, Changes change)
{
this.commit = commit;
this.change = change;
}
public String getAuthorName() { return commit.getAuthorName(); }
public String getPath() { return change.getPath(); }
public Integer getLines() { return change.getLines(); }
}
You can then flatMap to that, group it by path and author, and then sum up the lines changed:
commits.stream()
.flatMap(commit -> commit.getChangesList().stream().map(change -> new OneChange(commit, change)))
.collect(Collectors.groupingBy(OneChange::getPath,
Collectors.groupingBy(OneChange::getAuthorName,
Collectors.summingInt(OneChange::getLines))));
In case you do not want to sum up the lines, but just count the Changes, replace Collectors.summingInt(OneChange::getLines) by Collectors.counting().
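For a runnable version of the nested grouping, here is a self-contained sketch. The `Commit` and `Changes` classes in it are minimal hypothetical stand-ins for the asker's classes, `sample()` is made-up data, and a `SimpleEntry` of (path, author) takes the place of the string array or helper class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AuthorFragmentationDemo {
    // Hypothetical minimal model of a change: only the path matters here.
    static class Changes {
        private final String path;
        Changes(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Hypothetical minimal model of a commit.
    static class Commit {
        private final String authorName;
        private final List<Changes> changesList;
        Commit(String authorName, Changes... changes) {
            this.authorName = authorName;
            this.changesList = Arrays.asList(changes);
        }
        String getAuthorName() { return authorName; }
        List<Changes> getChangesList() { return changesList; }
    }

    // path -> (author -> number of changes the author made to that path)
    static Map<String, Map<String, Long>> authorFragmentation(List<Commit> commits) {
        return commits.stream()
                // pair each path with its author before the Commit goes out of scope
                .flatMap(c -> c.getChangesList().stream()
                        .map(ch -> new SimpleEntry<>(ch.getPath(), c.getAuthorName())))
                .collect(Collectors.groupingBy(e -> e.getKey(),
                        Collectors.groupingBy(e -> e.getValue(), Collectors.counting())));
    }

    // Made-up sample data for illustration.
    static List<Commit> sample() {
        return Arrays.asList(
                new Commit("alice", new Changes("a.txt"), new Changes("b.txt")),
                new Commit("bob", new Changes("a.txt")),
                new Commit("alice", new Changes("a.txt")));
    }

    public static void main(String[] args) {
        System.out.println(authorFragmentation(sample()));
    }
}
```

With the sample data, a.txt maps to alice=2 and bob=1, and b.txt maps to alice=1.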

Group and Reduce list of objects

I have a list of objects with many duplicates and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means but this is an experiment.)
This is what I have right now. I don't really like this because the map-building seems extraneous and the values() collection is a view of the backing map, and you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
Collection<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(this::getfoos)
.collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
l.ids.addAll(r.ids);
return l;
})).values();
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0,10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);
Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
this.name = n;
this.ids.addAll(ids);
}
public static Foo merge(Foo src, Foo dest) {
List<Integer> merged = new ArrayList<>();
merged.addAll(src.ids);
merged.addAll(dest.ids);
return new Foo(src.name, merged);
}
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question that shows you should implement a hashCode method or distinct may not behave correctly.
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.
public void reduce() {
ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
.collect(Collectors.collectingAndThen(Collectors.groupingBy(f -> f.name,
Collectors.reducing(Foo.identity(), Foo::merge)),
map -> map.values().stream().
collect(Collectors.toCollection(ArrayList::new))));
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
private static final Foo BASE_FOO = new Foo("", 0);
public static Foo identity() {
return BASE_FOO;
}
// use only if side effects to the argument objects are okay
public static Foo merge(Foo fooOne, Foo fooTwo) {
if (fooOne == BASE_FOO) {
return fooTwo;
} else if (fooTwo == BASE_FOO) {
return fooOne;
}
fooOne.ids.addAll(fooTwo.ids);
return fooOne;
}
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If the input elements are supplied in the random order, then having intermediate map is probably the best solution. However if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately there's no Stream API method which would allow you to do such a thing easily and effectively. One possible solution is to write a custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
(list, t) -> {
Foo f;
if(list.isEmpty() || !(f = list.get(list.size()-1)).name.equals(t.name))
list.add(t);
else
f.ids.addAll(t.ids);
},
(l1, l2) -> {
if(l1.isEmpty())
return l2;
if(l2.isEmpty())
return l1;
if(l1.get(l1.size()-1).name.equals(l2.get(0).name)) {
l1.get(l1.size()-1).ids.addAll(l2.get(0).ids);
l1.addAll(l2.subList(1, l2.size()));
} else {
l1.addAll(l2);
}
return l1;
}));
}
My tests show that this collector is always faster than collecting to map (up to 2x depending on average number of duplicate names), both in sequential and parallel mode.
Another approach is to use my StreamEx library which provides a bunch of "partial reduction" methods including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
return StreamEx.of(stream)
.collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
l.ids.addAll(r.ids);
return l;
}).toList();
}
This method accepts two arguments: a BiPredicate which is applied for two adjacent elements and should return true if elements should be merged and the BinaryOperator which performs merging. This solution is a little bit slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than toMap solution and it's simpler and somewhat more flexible as collapse is an intermediate operation, so you can collect in another way.
Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce performance, making it slower than the toMap solution.
As already pointed out by others, an intermediate Map is unavoidable, as that’s the way of finding the objects to merge. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
List<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(n->IntStream.range(0,10).mapToObj(i -> new Foo(n, i)))
.collect(collectingAndThen(groupingBy(f -> f.name),
m->m.entrySet().stream().map(e->new Foo(e.getKey(),
e.getValue().stream().flatMap(f->f.ids.stream()).collect(toList())))
.collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
name = n;
ids=l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would be sufficient.
Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
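On Java 9 and later, Collectors.flatMapping exists, so the flat-mapping step can move into the grouping itself. The following is a sketch under that assumption, not part of the original answer. `FlatMappingDemo`, `mergeByName` and `sample` are hypothetical names, and `Foo` is reduced to the constructor taking a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FlatMappingDemo {
    // Reduced version of Foo: a name plus a list of ids.
    static class Foo {
        final String name;
        final List<Integer> ids;
        Foo(String name, List<Integer> ids) {
            this.name = name;
            this.ids = ids;
        }
    }

    // Groups by name and concatenates the id lists without modifying the source objects.
    static List<Foo> mergeByName(Stream<Foo> foos) {
        Map<String, List<Integer>> idsByName = foos.collect(
                Collectors.groupingBy(f -> f.name,
                        // requires Java 9+: flattens each Foo's ids into the group's list
                        Collectors.flatMapping(f -> f.ids.stream(), Collectors.toList())));
        return idsByName.entrySet().stream()
                .map(e -> new Foo(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    // Same shape as the test data in the question: 3 names with 10 single-id items each.
    static Stream<Foo> sample() {
        return Stream.of("foo", "bar", "baz")
                .flatMap(n -> IntStream.range(0, 10)
                        .mapToObj(i -> new Foo(n, Arrays.asList(i))));
    }

    public static void main(String[] args) {
        mergeByName(sample()).forEach(f -> System.out.println(f.name + " " + f.ids));
    }
}
```

The second step that turns the map entries back into Foo instances is still needed, as noted above. Only the nested stream inside it goes away.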
