Java 8 Iterating Stream Operations

I want to perform a stream where the output from the stream is then used as the source for the same stream, in the same operation.
I currently perform this sort of operation using a queue; I remove an item, process it, and add any results that need further processing back to the queue. Here are two examples of this sort of thing:
Queue<WorkItem> workQueue = new ArrayDeque<>(workToDo);
while (!workQueue.isEmpty()) {
    WorkItem item = workQueue.remove();
    item.doOneWorkUnit();
    if (!item.isDone()) workQueue.add(item);
}
Queue<Node> nodes = new ArrayDeque<>(rootNodes);
while (!nodes.isEmpty()) {
    Node node = nodes.remove();
    process(node);
    nodes.addAll(node.children());
}
I would imagine that the first could be performed concurrently like this:
try {
    LinkedBlockingQueue<WorkItem> workQueue = new LinkedBlockingQueue<>();
    Stream<WorkItem> reprocess = Stream.generate(() -> workQueue.remove()).parallel();
    Stream.concat(workToDo.parallelStream(), reprocess)
          .filter(item -> { item.doOneWorkUnit(); return !item.isDone(); })
          .collect(Collectors.toCollection(() -> workQueue));
} catch (NoSuchElementException e) {}
And the second as:
try {
    LinkedBlockingQueue<Node> reprocessQueue = new LinkedBlockingQueue<>();
    Stream<Node> reprocess = Stream.generate(() -> reprocessQueue.remove()).parallel();
    Stream.concat(rootNodes.parallelStream(), reprocess)
          .filter(node -> { process(node); return true; })
          .flatMap(node -> node.children().parallelStream())
          .collect(Collectors.toCollection(() -> reprocessQueue));
} catch (NoSuchElementException e) {}
However, these feel like kludgy workarounds, and I dislike having to resort to using exceptions. Does anyone have a better way to do this sort of thing?

To make the work parallel, I would use a standard java.util.concurrent.Executor. To return a task to the work queue, have each task call executor.execute(this) at the end of its code.
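For illustration, here is a minimal sketch of that idea, assuming the WorkItem type from the question with its doOneWorkUnit() and isDone() methods; the Phaser is just one way to wait for all outstanding tasks to finish, not part of the original answer.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

class WorkTask implements Runnable {
    private final WorkItem item;            // WorkItem as in the question
    private final ExecutorService executor;
    private final Phaser phaser;            // tracks live tasks so the caller can wait

    WorkTask(WorkItem item, ExecutorService executor, Phaser phaser) {
        this.item = item;
        this.executor = executor;
        this.phaser = phaser;
        phaser.register();                  // one party per submitted task
    }

    @Override
    public void run() {
        item.doOneWorkUnit();
        if (!item.isDone()) {
            executor.execute(this);         // return the task to the work queue
        } else {
            phaser.arriveAndDeregister();   // this task is finished
        }
    }
}

// Usage: submit everything, then wait for all tasks to complete.
ExecutorService pool = Executors.newFixedThreadPool(4);
Phaser done = new Phaser(1);                // the waiting thread counts as a party
for (WorkItem w : workToDo) {
    pool.execute(new WorkTask(w, pool, done));
}
done.arriveAndAwaitAdvance();               // blocks until every task has deregistered
pool.shutdown();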

Related

Validate beginning of stream in Reactor Flux

Using Reactor, I'm trying to validate the beginning of a cold Flux stream and then become a pass-through.
For example, say I need to validate the first N elements. If (and only if) it passes, these and further elements are forwarded. If it fails, only an error is emitted.
This is what I have so far. It works, but is there a better or more correct way to do this? I was tempted to implement my own operator, but I'm told it's complicated and not recommended.
flux
    .bufferUntil(new Predicate<>() {
        private int count = 0;
        @Override
        public boolean test(T next) {
            return ++count >= N;
        }
    })
    // Zip with index to know the first element
    .zipWith(Flux.<Integer, Integer>generate(() -> 0, (cur, s) -> {
        s.next(cur);
        return cur + 1;
    }))
    .map(t -> {
        if (t.getT2() == 0 && !validate(t.getT1()))
            throw new RuntimeException("Invalid");
        return t.getT1();
    })
    // Flatten buffered elements
    .flatMapIterable(identity())
I could have used doOnNext instead of the second map since it doesn't map anything, but I'm not sure it's an acceptable use of the peek methods.
I could also have used a stateful mapper in the second map to run the validation only once instead of zipping with an index; I guess that's acceptable since I'm already using a stateful predicate...
Your requirement sounds interesting! We have switchOnFirst, which could be useful for validating the first element. But if you have N elements to validate, we can try something like this.
Here I assume that I have to validate the first 5 elements, which should all be <= 5; then it is a valid stream. Otherwise we simply emit an error saying validation failed.
Flux<Integer> integerFlux = Flux.range(1, 10).delayElements(Duration.ofSeconds(1));
integerFlux
    .buffer(5)
    .switchOnFirst((signal, flux) -> {
        // first 5 elements are <= 5, then it is a valid stream
        return signal.get().stream().allMatch(i -> i <= 5)
                ? flux
                : Flux.error(new RuntimeException("validation failed"));
    })
    .flatMapIterable(Function.identity())
    .subscribe(System.out::println,
               System.out::println);
However, this approach is not ideal: it keeps collecting 5 elements at a time even after the first validation is done, which we might not want.
To avoid buffering N elements after the validation, we can use bufferUntil. Once the first N elements have been collected and validated, it passes each element on to the downstream as soon as it is received.
AtomicInteger atomicInteger = new AtomicInteger(1);
integerFlux
    .bufferUntil(i -> {
        if (atomicInteger.get() < 5) {
            atomicInteger.incrementAndGet();
            return false;
        }
        return true;
    })
    .switchOnFirst((signal, flux) -> {
        return signal.get().stream().allMatch(i -> i <= 5)
                ? flux
                : Flux.error(new RuntimeException("validation failed"));
    })
    .flatMapIterable(Function.identity())
    .subscribe(System.out::println,
               System.out::println);

Access object reference from first stream to next stream api in java 8

I have the below existing code, which converts one object to another:
for (Department dept : company.getDepartments()) {
    if (!isEmpty(dept.getEmployees())) {
        for (Employee emp : dept.getEmployees()) {
            try {
                employeeV2List.add(new EmployeeV2(emp.getId(), emp.getFirstName(), ..., dept.getId()));
            } catch (ParseException e) {
                // error logger
            }
        }
    }
}
I want to use the Java 8 Stream API here instead of the two for loops, but as you can see, the try block uses dept.getId(), which I cannot access in the stream. I tried the below:
List<Employee> employees = company.getDepartments().stream().map(x -> x.getEmployees())
        .flatMap(x -> x.stream()).collect(Collectors.toList());
List<EmployeeV2> employeeV2List = employees.stream().map(x -> getEmployeeV2(x)).collect(Collectors.toList());
Here, getEmployeeV2() creates the EmployeeV2 object. But I am not sure how I can pass the Department there so I can access the department id.
You may do it like so:
List<EmployeeV2> result = company.getDepartments().stream()
        .flatMap(d -> d.getEmployees().stream()
                .map(e -> new EmployeeV2(e.getId(), e.getFirstName(), d.getId())))
        .collect(Collectors.toList());
Since the constructor of your EmployeeV2 class throws an exception, you have different options to solve this, depending on the business logic you need.
The first is to catch the exception in your lambda:
List<EmployeeV2> result = company.getDepartments().stream()
        .flatMap(d -> d.getEmployees().stream()
                .map(e -> {
                    try {
                        return new EmployeeV2(e.getId(), e.getFirstName(), d.getId());
                    } catch (ParseException exception) {
                        return null;
                    }
                }))
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
This has the advantage that you get a list of all employees which could be created, but you won't notice a failure.
A second alternative is to update the constructor of EmployeeV2 to throw some kind of RuntimeException, which you don't need to catch in the lambda:
try {
    List<EmployeeV2> result = company.getDepartments().stream()
            .flatMap(d -> d.getEmployees().stream()
                    .map(e -> new EmployeeV2(e.getId(), e.getFirstName(), d.getId())))
            .collect(Collectors.toList());
} catch (UncheckedParseException exception) {
    // handle the exception
}
This one has the advantage that you will notice errors, but you don't get a list of the successfully created employees.
I hope these two examples help you decide what the correct approach is for your application. You can also move the exception handling into a separate method, like you already did in your question.
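As a rough sketch of that last idea: the throwing constructor call can be wrapped in a small helper. The helper name toEmployeeV2 and the Optional-based handling are my own illustrative choices here, not part of the question's code.
// Hypothetical helper wrapping the throwing constructor (name is illustrative).
private static Optional<EmployeeV2> toEmployeeV2(Department d, Employee e) {
    try {
        return Optional.of(new EmployeeV2(e.getId(), e.getFirstName(), d.getId()));
    } catch (ParseException ex) {
        // error logger, as in the original loop
        return Optional.empty();
    }
}

List<EmployeeV2> result = company.getDepartments().stream()
        .flatMap(d -> d.getEmployees().stream()
                .map(e -> toEmployeeV2(d, e)))
        .filter(Optional::isPresent)   // drop the failed conversions
        .map(Optional::get)
        .collect(Collectors.toList());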

Replace nested for loops with parallel stream - Java

I'm working on improving the speed of a program where performance is critical. Currently it fails to process large data sets. There are many nested for loops and so I thought it would be worth trying parallel streams. I have access to a high performance cluster so potentially have many cores available.
I have the method below:
public MinSpecSetFamily getMinDomSpecSets() {
    MinSpecSetFamily result = new MinSpecSetFamily();
    ResourceType minRT = this.getFirstEssentialResourceType();
    if (minRT == null || minRT.noSpecies()) {
        System.out.println("Problem in getMinDomSpecSets()");
    }
    for (Species spec : minRT.specList) {
        SpecTree minTree = this.getMinimalConstSpecTreeRootedAt(spec);
        ArrayList<SpecTreeNode> leafList = minTree.getLeaves();
        for (SpecTreeNode leaf : leafList) {
            ArrayList<Species> sp = leaf.getAncestors();
            SpecSet tmpSet = new SpecSet(sp);
            result.addSpecSet(tmpSet);
        }
    }
    return result;
}
I understand that I can turn a nested for loop into a parallel stream with something like:
minRT.specList.parallelStream().flatMap(leaf -> leaflist.parallelStream())
However, I cannot find examples showing how to deal with the actions inside each for loop and I'm not at all confident about how this is supposed to work. I'd really appreciate some assistance and explanation of how to convert this method so that I can translate the solution to other methods in the program too.
Thanks.
Here's one way of doing it (hopefully I have no typos):
MinSpecSetFamily result =
    minRT.specList
         .parallelStream()
         .flatMap(spec -> getMinimalConstSpecTreeRootedAt(spec).getLeaves().stream())
         .map(leaf -> new SpecSet(leaf.getAncestors()))
         .reduce(new MinSpecSetFamily(),
                 (fam, set) -> {
                     fam.addSpecSet(set);
                     return fam;
                 },
                 (f1, f2) -> new MinSpecSetFamily(f1, f2));
EDIT: Following Holger's comment, you should use collect instead of reduce: reduce expects the accumulator not to mutate its arguments, so mutating fam is unsafe in a parallel stream, while collect is designed for exactly this kind of mutable accumulation:
MinSpecSetFamily result =
    minRT.specList
         .parallelStream()
         .flatMap(spec -> getMinimalConstSpecTreeRootedAt(spec).getLeaves().stream())
         .map(leaf -> new SpecSet(leaf.getAncestors()))
         .collect(MinSpecSetFamily::new, MinSpecSetFamily::addSpecSet, MinSpecSetFamily::add);

Java Lazy Stream of Strings including List<String>

I'm creating a Stream of String lazily, for the first two simple items. However, part of my stream is a List of String.
Stream<String> streamA = Stream.concat(
    Stream.generate(item::getStringA),
    Stream.generate(item::getStringB));
return Stream.concat(streamA, item.getStringList(param).stream());
The above works, but .getStringList needs to be called lazily as well. It's not clear to me how to fetch it and "merge" it in with the rest of the stream.
I think, what you actually want to do, is
return Stream.<Supplier<Stream<String>>>of(
        () -> Stream.of(item.getStringA()),
        () -> Stream.of(item.getStringB()),
        () -> item.getStringList(param).stream())
    .flatMap(Supplier::get);
This produces a fully lazy Stream<String> where, e.g., .limit(0).count() will not call any method on item, and .findFirst() will only invoke getStringA(), etc.
The stream’s content will be equivalent to
Stream.concat(
    Stream.of(item.getStringA(), item.getStringB()),
    item.getStringList(param).stream())
I don't think any of this does what you think it does. Stream.generate always generates an infinite stream. But the closest thing to what you want is going to be
StreamSupport.stream(() -> item.getStringList(param).spliterator(), 0, false)
...which will lazily call item.getStringList(param). (What you want isn't really an intended use case of Stream, so it's not very well supported.)
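To illustrate the deferred call, here is a small usage sketch reusing the question's item and param names; only the terminal operation triggers the underlying method.
// Building the stream does not call getStringList(param) yet...
Stream<String> lazyList = StreamSupport.stream(
        () -> item.getStringList(param).spliterator(), 0, false);

// ...the call happens only when a terminal operation runs:
Optional<String> first = lazyList.findFirst(); // getStringList(param) is invoked here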
What you could do is return a Stream from your item.getStringList()
public static void main(String[] args) throws InterruptedException {
    Item item = new Item();
    Stream<String> a = Stream.concat(Stream.of("A"), Stream.of("B"));
    Stream<String> anotherStream = Stream.concat(a, item.getStringList());
    anotherStream.forEach(System.out::println);
}

private static class Item {
    public Stream<String> getStringList() {
        List<String> l = new ArrayList<>();
        l.add("C");
        l.add("D");
        l.add("E");
        l.add("F");
        final AtomicInteger i = new AtomicInteger(0);
        return Stream.iterate(l.get(i.getAndIncrement()), (f) -> {
            // Proof of laziness
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return l.get(i.getAndIncrement());
        })
        // iterate is by default unbounded
        .limit(l.size());
    }
}
I'm not sure how helpful that approach is, though, since the list is still in memory.

How to parallelize steps for creating a complex object?

class MyItem {
    private Object param1, param2, param3;
}

MyItem item = new MyItem();
computeParam1(item);
computeParam2(item);
computeParam3(item);
waitForAllParamsToBeSet();
Each of the steps is independent of the others, and each step writes its parameter into the object as its final result.
The methods are completely different in their logic; there is no recursion.
How could I parallelize those steps, if possible at all?
Start Futures and then wait for results before assigning.
Future<Type1> item1 = computeParam1();
Future<Type2> item2 = computeParam2();
Future<Type3> item3 = computeParam3();

MyItem item = new MyItem();
assignParam1(item1.get());
assignParam2(item2.get());
assignParam3(item3.get());
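A concrete way to express the same idea on Java 8 is CompletableFuture; this sketch assumes the question's computeParamX(MyItem) methods, which write their result into the item directly.
import java.util.concurrent.CompletableFuture;

MyItem item = new MyItem();
CompletableFuture<Void> f1 = CompletableFuture.runAsync(() -> computeParam1(item));
CompletableFuture<Void> f2 = CompletableFuture.runAsync(() -> computeParam2(item));
CompletableFuture<Void> f3 = CompletableFuture.runAsync(() -> computeParam3(item));

// Equivalent of waitForAllParamsToBeSet(): block until all three complete.
CompletableFuture.allOf(f1, f2, f3).join();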
As all the computeParamX() methods accept one MyItem argument and return void, they have the signature of a Consumer<MyItem>. So you can parallelize their execution by calling them in the .forEach() of a parallel stream, as follows:
final MyItem item = new MyItem();
Stream.<Consumer<MyItem>>of(this::computeParam1, this::computeParam2, this::computeParam3)
      .parallel()
      .forEach(c -> c.accept(item));
As .forEach() is a terminal operation, it will block until all operations complete, so you can safely use the item object after it returns.
In Java 8 you could simply create your collection of tasks as follows:
Collection<Runnable> tasks = Arrays.asList(
    () -> System.out.println("Compute param1"),
    () -> System.out.println("Compute param2"),
    () -> System.out.println("Compute param3")
);
Then launch the tasks in parallel
tasks.parallelStream().forEach(Runnable::run);
