I have a question about lambda expressions. I have a class Pair which should hold a String and an int.
Pair gets the String out of a file, and the int represents the line number.
So far I have this:
Stream<String> lineNumbers = Files.lines(Paths.get(fileName));
List<Integer> posStream = Stream.iterate(0, x -> x + 1).limit(lineNumbers.count()).collect(Collectors.toList());
lineNumbers.close();
Stream<String> line = Files.lines(Paths.get(fileName));
List<Pair> pairs = line.map(f -> new Pair<>(f, 1))
        .collect(Collectors.toList());
pairs.forEach(f -> System.out.println(f.toString()));
line.close();
How can I now get the line numbers into the pairs?
Is there a lambda expression which can perform this? Or do I need something else?
There are a few ways to do this. The counter technique suggested by Saloparenator's answer could be implemented as follows, using an AtomicInteger as the mutable counter object and assuming the obvious Pair class:
List<Pair> getPairs1() throws IOException {
    AtomicInteger counter = new AtomicInteger(0);
    try (Stream<String> lines = Files.lines(Paths.get(FILENAME))) {
        return lines.parallel()
                    .map(line -> new Pair(line, counter.incrementAndGet()))
                    .collect(toList());
    }
}
The problem is that if the stream is run in parallel, the counter won't be incremented in the same order as the lines are read! This will occur if your file has several thousand lines. The Files.lines stream source will batch up bunches of lines and dispatch them to several threads, which will then number their batches in parallel, interleaving their calls to incrementAndGet(). Thus, the lines won't be numbered sequentially. It will work if you can guarantee that your stream will never run in parallel, but it's often a bad idea to write streams that are likely to return different results sequentially vs. in parallel.
Here's another approach. Since you're reading all the lines into memory no matter what, just read them all into a list. Then use a stream to number them:
static List<Pair> getPairs2() throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(FILENAME));
    return IntStream.range(0, lines.size())
                    .parallel()
                    .mapToObj(i -> new Pair(lines.get(i), i + 1))
                    .collect(toList());
}
Another functional way would be to zip your stream with an integer generator.
(I can see Java 8 doesn't have a zip method yet, but it is essentially merging every cell of two lists into a list of pairs, so it is easy to implement; see the sketch below.)
You can see examples of generators in Java 8 here.
// A local int can't be mutated from a lambda (it must be effectively final),
// so a mutable holder such as AtomicInteger is needed:
AtomicInteger cnt = new AtomicInteger(1);
List<Pair> pairs = line.map(f -> new Pair<>(f, cnt.getAndIncrement()))
        .collect(Collectors.toList());
I have not tried it yet, but it may work.
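For reference, a zip helper is only a few lines to write. Here's a minimal sketch (the zip method below is my own illustration, not a JDK API):

import java.util.Iterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b, BiFunction<A, B, C> fn) {
    Iterator<A> ia = a.iterator();
    Iterator<B> ib = b.iterator();
    Iterator<C> ic = new Iterator<C>() {
        @Override public boolean hasNext() { return ia.hasNext() && ib.hasNext(); }
        @Override public C next() { return fn.apply(ia.next(), ib.next()); }
    };
    // sequential stream: zipping is inherently order-dependent
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(ic, 0), false);
}

Numbering the lines then becomes zip(Files.lines(Paths.get(fileName)), Stream.iterate(1, i -> i + 1), (s, n) -> new Pair<>(s, n)).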
So I have 2000 records of company names. I would like to take the first 50 names, concatenate them into a single String, and append that to a new List.
Does anyone have any idea how we can achieve this using Java 8?
Can this be done using the parallel stream API?
Currently I'm iterating over the 2k records and appending the data to a StringBuilder. After every 50th record I add the StringBuilder's content to a list and create a new StringBuilder. Finally I get a list with all the data.
Example: a1, a2, ... up to a2000
Final output: List of Strings with
1st entry -> concatenation of a1 to a50,
2nd entry -> concatenation of a51 to a100
Code:
List<String> bulkEmails; // fetched from DB
int count = 0;
List<String> splitEmails = new ArrayList<>(); // final output
StringBuilder builder = new StringBuilder(); // temp builder

for (String mail : bulkEmails) {
    builder.append(mail).append(",");
    count++;
    // after the 50th mail, add the concatenated chunk to the output and reset the counter
    if (count == 50) {
        splitEmails.add(builder.toString());
        builder = new StringBuilder();
        count = 0;
    }
}
if (builder.length() > 0) { // flush the last partial chunk (when the total isn't a multiple of 50)
    splitEmails.add(builder.toString());
}
Suggestions are appreciated.
As I've said in the comments, 2K records isn't really massive data.
And there's an issue with executing this task in parallel.
Let's imagine this task of splitting the data into groups of 50 elements is running in parallel somehow, with the first worker thread being assigned a chunk of 523 elements, the second 518 elements, and so on, none of the chunk sizes being a multiple of 50. As a consequence, each thread would produce a partial result ending with a group whose size differs from 50.
Depending on how you want to deal with such cases, there are different approaches on how to implement this functionality:
Join partial results as is. This implies that the final result may contain an arbitrary number of groups with sizes in the range [1, 49]. It is the simplest option to implement and the cheapest to execute. But note that there could even be a case where every resulting group is smaller than 50, since you're not in control of the spliterator implementation (i.e. you can't dictate how large a chunk of data a particular thread gets assigned to work with). However strict or lenient your requirements are, that doesn't sound very nice.
The second option requires reordering the elements. While joining the partial results produced by each thread in parallel, we can merge the two last groups of the results being combined, to ensure that at most one group in the final result differs in size from 50.
If you're not OK with either joining partial results as is or reordering the elements, then this task isn't suitable for parallel execution: whenever the first thread produces a partial result ending with a group smaller than 50, all the groups created by the second thread need to be rearranged, which results in worse performance in parallel because the same job gets done twice.
The second thing we need to consider is that the operation of creating groups requires maintaining state. Therefore, the right place for this transformation is inside a collector, where the stream is being consumed and the collector's mutable container gets updated.
Let's start with implementing a Collector which ignores the issue described above and joins partial results as is.
For that we can use static method Collector.of().
public static <T> Collector<T, ?, Deque<List<T>>> getGroupCollector(int groupSize) {
    return Collector.of(
        ArrayDeque::new,                     // container: a deque of groups
        (Deque<List<T>> deque, T next) -> {  // accumulator: open a new group when the last one is full
            if (deque.isEmpty() || deque.getLast().size() == groupSize) deque.add(new ArrayList<>());
            deque.getLast().add(next);
        },
        (left, right) -> {                   // combiner: join partial results as is
            left.addAll(right);
            return left;
        }
    );
}
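To illustrate the "as is" behaviour, here's a quick run of this collector on a parallel stream (the exact group sizes depend on how the spliterator chunks the data):

Deque<List<Integer>> grouped = IntStream.rangeClosed(1, 26).boxed()
        .parallel()
        .collect(getGroupCollector(5));

// sequentially the sizes are 5 5 5 5 5 1; in parallel, extra groups
// smaller than 5 appear wherever two chunks were joined as is
grouped.forEach(g -> System.out.print(g.size() + " "));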
Now let's implement a Collector which merges the two last groups produced in different threads (option 2 in the list above):
public static <T> Collector<T, ?, Deque<List<T>>> getGroupCollector(int groupSize) {
    return Collector.of(
        ArrayDeque::new,
        (Deque<List<T>> deque, T next) -> {
            if (deque.isEmpty() || deque.getLast().size() == groupSize) deque.add(new ArrayList<>());
            deque.getLast().add(next);
        },
        (left, right) -> {
            // if the left part ends with an incomplete group, merge it into the right part
            if (left.peekLast().size() < groupSize) {
                List<T> leftLast = left.pollLast();
                List<T> rightLast = right.peekLast();
                int llSize = leftLast.size();
                int rlSize = rightLast.size();
                if (rlSize + llSize <= groupSize) {
                    rightLast.addAll(leftLast); // the two remainders fit into a single group
                } else {
                    // top up the right remainder to groupSize, start a new group with what's left
                    rightLast.addAll(leftLast.subList(0, groupSize - rlSize));
                    right.add(new ArrayList<>(leftLast.subList(groupSize - rlSize, llSize)));
                }
            }
            left.addAll(right);
            return left;
        }
    );
}
If you wish to implement a Collector whose combiner function rearranges the partial results when needed (the very last option in the list above), I'm leaving that to the reader as an exercise.
Now let's use the collector defined above (the second one). Consider a stream of single-letter strings representing the characters of the English alphabet, with a group size of 5.
public static void main(String[] args) {
    List<String> groups = IntStream.rangeClosed('A', 'Z')
            .mapToObj(ch -> String.valueOf((char) ch))
            .collect(getGroupCollector(5))
            .stream()
            .map(group -> String.join(",", group))
            .collect(Collectors.toList());

    groups.forEach(System.out::println);
}
Output:
A,B,C,D,E
F,G,H,I,J
K,L,M,N,O
P,Q,R,S,T
U,V,W,X,Y
Z
That's the sequential result. If you switch the stream from sequential to parallel, the contents of the groups would probably change, but the sizes would not be affected.
Also note that since there are effectively two independent streams chained together, you need to apply parallel() twice to make the whole thing work in parallel.
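A sketch of what that looks like, reusing the example above:

List<String> groups = IntStream.rangeClosed('A', 'Z')
        .parallel()                                   // parallelizes the grouping stream
        .mapToObj(ch -> String.valueOf((char) ch))
        .collect(getGroupCollector(5))
        .stream()
        .parallel()                                   // parallelizes the joining stream
        .map(group -> String.join(",", group))
        .collect(Collectors.toList());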
Consider the following code:
List<Integer> odd = new ArrayList<Integer>();
List<Integer> even = null;
List<Integer> myList = Arrays.asList(1,2,3,4,5,6,7,8,9,10);

even = myList.stream()
        .filter(item -> {
            if (item % 2 == 0) { return true; }
            else {
                odd.add(item);
                return false;
            }
        })
        .collect(Collectors.toList());
What I am trying to do here is get the even and odd values from a list into separate lists.
The stream filter() method returns true for even items and the stream collector will collect them.
For the odd case, the filter will return false and the item will never reach the collector.
So I am adding such odd numbers in another list I created before under the else block.
I know this is not an elegant way of working with streams. For example, if I use a parallel stream, there will be thread-safety issues with the odd list. I can't run the stream multiple times with different filters for performance reasons (it should stay O(n)).
This is just an example for one use-case, the list could contain any object and the lambda inside the filter needs to separate them based on some logic into separate lists.
In simple terms: from a list create multiple lists containing items separated based on some criteria.
Without streams I would just run a for loop with a simple if-else and collect the items based on the conditions.
Here is an example of how you could separate elements (numbers) of this list in even and odd numbers:
List<Integer> myList = Arrays.asList(1,2,3,4,5,6,7,8,9,10);

// import static java.util.stream.Collectors.partitioningBy;
Map<Boolean, List<Integer>> evenAndOdds = myList.stream()
        .collect(partitioningBy(i -> i % 2 == 0));
You would get lists of even/odd numbers like this (either list may be empty):
List<Integer> even = evenAndOdds.get(true);
List<Integer> odd = evenAndOdds.get(false);
You could pass any lambda with required logic in partitioningBy.
I have to calculate the average of an infinite sequence using the Stream API.
Input:
Stream<Double> s = a,b,c,d ...
int interval = 3
Expected Result:
Stream<Double> result = avg(a,b,c), avg(d,e,f), ....
The result can also be an Iterator, or any other type, as long as it maintains the structure of an infinite list.
Of course what I wrote is pseudocode and doesn't run.
There is a @Beta API termed mapWithIndex within Guava that could help here, with a certain assumption:
static Stream<Double> stepAverage(Stream<Double> stream, int step) {
    return Streams.mapWithIndex(stream, (from, index) -> Map.entry(index, from))
            .collect(Collectors.groupingBy(e -> (e.getKey() / step), TreeMap::new,
                    Collectors.averagingDouble(Map.Entry::getValue)))
            .values().stream();
}
The assumption it brings in is clearly detailed in the documentation (emphasis mine):
The resulting stream is efficiently splittable if and only if stream
was efficiently splittable and its underlying spliterator reported
Spliterator.SUBSIZED. This is generally the case if the underlying
stream comes from a data structure supporting efficient indexed random
access, typically an array or list.
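A quick sanity check of stepAverage (a finite sample is used here, since the grouping collector consumes the whole stream before emitting anything):

Stream<Double> s = Stream.iterate(1.0, d -> d + 1).limit(9); // 1.0 .. 9.0
stepAverage(s, 3).forEach(System.out::println);              // prints 2.0, 5.0, 8.0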
This should work fine using vanilla Java (mapMulti requires Java 16 or later).
I'm using Stream#mapMulti and a List external to the Stream to aggregate the doubles.
As you see, I also used DoubleSummaryStatistics to compute the average.
I could have used traditional looping, summing, and dividing, but I found this way more explicit.
Update:
I changed the Collection used from a Set to a List, as a Set could cause unexpected behaviour (duplicate values within a window would collapse and the window might never fill).
int step = 3;
List<Double> list = new ArrayList<>(); // buffer for the current window (shared state: keep the stream sequential)

Stream<Double> averagesStream =
        infiniteStream.mapMulti((Double aDouble, Consumer<Double> doubleConsumer) -> {
            list.add(aDouble);
            if (list.size() == step) {
                // window full: emit its average and start a new window
                DoubleSummaryStatistics doubleSummaryStatistics = new DoubleSummaryStatistics();
                list.forEach(doubleSummaryStatistics::accept);
                list.clear();
                doubleConsumer.accept(doubleSummaryStatistics.getAverage());
            }
        });
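Since mapMulti is lazy, this genuinely works on an infinite source. A quick check (the declaration of infiniteStream is my own example):

Stream<Double> infiniteStream = Stream.iterate(1d, d -> d + 1); // 1.0, 2.0, 3.0, ...
// ...build averagesStream with the mapMulti pipeline above, then:
averagesStream.limit(3).forEach(System.out::println); // prints 2.0, 5.0, 8.0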
I have a data set represented by a Java 8 stream:
Stream<T> stream = ...;
I can see how to filter it to get a random subset - for example
Random r = new Random();
PrimitiveIterator.OfInt coin = r.ints(0, 2).iterator();
Stream<T> heads = stream.filter((x) -> (coin.nextInt() == 0));
I can also see how I could reduce this stream to get, for example, two lists representing two random halves of the data set, and then turn those back into streams.
But, is there a direct way to generate two streams from the initial one? Something like
(heads, tails) = stream.[some kind of split based on filter]
Thanks for any insight.
A collector can be used for this.
For two categories, use the Collectors.partitioningBy() factory.
This will create a Map<Boolean, List>, and put items in one or the other list based on a Predicate.
Note: Since the stream needs to be consumed whole, this can't work on infinite streams. And because the stream is consumed anyway, this method simply puts them in Lists instead of making a new stream-with-memory. You can always stream those lists if you require streams as output.
Also, no need for the iterator, not even in the heads-only example you provided.
Binary splitting looks like this:
Random r = new Random();
Map<Boolean, List<String>> groups = stream
        .collect(Collectors.partitioningBy(x -> r.nextBoolean()));
System.out.println(groups.get(false).size());
System.out.println(groups.get(true).size());
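And if you do need streams again downstream, just re-stream the collected lists (as mentioned above):

Stream<String> heads = groups.get(true).stream();
Stream<String> tails = groups.get(false).stream();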
For more categories, use the Collectors.groupingBy() factory.
Map<Object, List<String>> groups = stream
        .collect(Collectors.groupingBy(x -> r.nextInt(3)));
System.out.println(groups.get(0).size());
System.out.println(groups.get(1).size());
System.out.println(groups.get(2).size());
In case the stream is not a Stream, but one of the primitive streams like IntStream, this .collect(Collector) method is not available. You'll have to do it the manual way without a collector factory. Its implementation looks like this:
[Example 2.0 since 2020-04-16]
IntStream intStream = IntStream.iterate(0, i -> i + 1).limit(100000).parallel();
IntPredicate predicate = ignored -> r.nextBoolean();

Map<Boolean, List<Integer>> groups = intStream.collect(
        () -> Map.of(false, new ArrayList<>(100000),
                     true , new ArrayList<>(100000)),
        (map, value) -> map.get(predicate.test(value)).add(value),
        (map1, map2) -> {
            map1.get(false).addAll(map2.get(false));
            map1.get(true ).addAll(map2.get(true ));
        });
In this example I initialize the ArrayLists with the full size of the initial collection (if this is known at all). This prevents resize events even in the worst-case scenario, but can potentially gobble up 2NT space (N = initial number of elements, T = number of threads). To trade off space for speed, you can leave it out or use your best educated guess, like the expected highest number of elements in one partition (typically just over N/2 for a balanced split).
I hope I don't offend anyone by using a Java 9 method. For the Java 8 version, look at the edit history.
I stumbled across this question myself, and I feel that a forked stream has some use cases that could prove valid. I wrote the code below as a consumer, so that it does not do anything itself, but you could apply it to functions and anything else you might come across.
class PredicateSplitterConsumer<T> implements Consumer<T>
{
  private Predicate<T> predicate;
  private Consumer<T>  positiveConsumer;
  private Consumer<T>  negativeConsumer;

  public PredicateSplitterConsumer(Predicate<T> predicate, Consumer<T> positive, Consumer<T> negative)
  {
    this.predicate = predicate;
    this.positiveConsumer = positive;
    this.negativeConsumer = negative;
  }

  @Override
  public void accept(T t)
  {
    if (predicate.test(t))
    {
      positiveConsumer.accept(t);
    }
    else
    {
      negativeConsumer.accept(t);
    }
  }
}
Now your code implementation could be something like this:
personsArray.forEach(
        new PredicateSplitterConsumer<>(
                person -> person.getDateOfBirth().isPresent(),
                person -> System.out.println(person.getName()),
                person -> System.out.println(person.getName() + " does not have Date of birth")));
Unfortunately, what you ask for is directly frowned upon in the JavaDoc of Stream:
A stream should be operated on (invoking an intermediate or terminal
stream operation) only once. This rules out, for example, "forked"
streams, where the same source feeds two or more pipelines, or
multiple traversals of the same stream.
You can work around this using peek or other methods should you truly desire that type of behaviour. In this case, instead of trying to back two streams from the same original Stream source with a forking filter, you would duplicate your stream and filter each duplicate appropriately, as sketched below.
However, you may wish to reconsider if a Stream is the appropriate structure for your use case.
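A minimal sketch of that duplicate-and-filter approach, assuming the source can be re-created (the Supplier indirection here is my own illustration):

// each get() builds a fresh stream over the same source data
Supplier<Stream<Integer>> source = () -> Stream.of(1, 2, 3, 4, 5, 6);

Stream<Integer> evens = source.get().filter(n -> n % 2 == 0);
Stream<Integer> odds  = source.get().filter(n -> n % 2 != 0);

Note that this traverses the source twice.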
You can get two Streams out of one since Java 12 with teeing. Counting heads and tails in 100 coin flips:
Random r = new Random();
PrimitiveIterator.OfInt coin = r.ints(0, 2).iterator();

List<Long> list = Stream.iterate(0, i -> coin.nextInt())
        .limit(100)
        .collect(teeing(
                filtering(i -> i == 1, counting()),
                filtering(i -> i == 0, counting()),
                (heads, tails) -> List.of(heads, tails)));
System.err.println("heads:" + list.get(0) + " tails:" + list.get(1));
prints e.g.: heads:51 tails:49
Not exactly. You can't get two Streams out of one; this doesn't make sense -- how would you iterate over one without needing to generate the other at the same time? A stream can only be operated over once.
However, if you want to dump them into a list or something, you could do:
// heads and tails are pre-created List<Integer> instances
stream.forEach((x) -> ((x == 0) ? heads : tails).add(x));
This is against the general mechanism of Stream. Say you could split Stream S0 into Sa and Sb like you wanted. Performing any terminal operation, say count(), on Sa will necessarily "consume" all elements in S0, so Sb would lose its data source.
Previously, Stream had a tee() method, I think, which duplicated a stream into two. It's removed now.
Stream has a peek() method though, you might be able to use it to achieve your requirements.
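For example, a peek-based sketch (side effects in peek are discouraged by its JavaDoc, and this is not safe for parallel streams):

List<Integer> tails = new ArrayList<>();
List<Integer> heads = stream
        .peek(x -> { if (x != 0) tails.add(x); }) // side effect: collect the non-matching elements
        .filter(x -> x == 0)
        .collect(Collectors.toList());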
Not exactly, but you may be able to accomplish what you need by invoking Collectors.groupingBy(). You create a new Collection, and can then instantiate streams on that new collection.
This was the least bad answer I could come up with.
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
public class Test {

    public static <T, L, R> Pair<L, R> splitStream(Stream<T> inputStream, Predicate<T> predicate,
            Function<Stream<T>, L> trueStreamProcessor, Function<Stream<T>, R> falseStreamProcessor) {

        Map<Boolean, List<T>> partitioned = inputStream.collect(Collectors.partitioningBy(predicate));
        L trueResult = trueStreamProcessor.apply(partitioned.get(Boolean.TRUE).stream());
        R falseResult = falseStreamProcessor.apply(partitioned.get(Boolean.FALSE).stream());
        return new ImmutablePair<L, R>(trueResult, falseResult);
    }

    public static void main(String[] args) {
        Stream<Integer> stream = Stream.iterate(0, n -> n + 1).limit(10);
        Pair<List<Integer>, String> results = splitStream(stream,
                n -> n > 5,
                s -> s.filter(n -> n % 2 == 0).collect(Collectors.toList()),
                s -> s.map(n -> n.toString()).collect(Collectors.joining("|")));
        System.out.println(results);
    }
}
This takes a stream of integers and splits them at 5. For those greater than 5 it filters only even numbers and puts them in a list. For the rest it joins them with |.
outputs:
([6, 8],0|1|2|3|4|5)
It's not ideal, as it collects everything into intermediate collections, breaking the stream (and it has too many arguments!).
I stumbled across this question while looking for a way to filter certain elements out of a stream and log them as errors. So I did not really need to split the stream so much as attach a premature terminating action to a predicate with unobtrusive syntax. This is what I came up with:
public class MyProcess {

    /* Returns a Predicate that performs a bail-out action on non-matching items. */
    private static <T> Predicate<T> withAltAction(Predicate<T> pred, Consumer<T> altAction) {
        return x -> {
            if (pred.test(x)) {
                return true;
            }
            altAction.accept(x);
            return false;
        };
    }

    /* Example usage in a non-trivial pipeline */
    public void processItems(Stream<Item> stream) {
        stream.filter(Objects::nonNull)
              .peek(this::logItem)
              .map(Item::getSubItems)
              .filter(withAltAction(SubItem::isValid,
                                    i -> logError(i, "Invalid")))
              .peek(this::logSubItem)
              .filter(withAltAction(i -> i.size() > 10,
                                    i -> logError(i, "Too large")))
              .map(SubItem::toDisplayItem)
              .forEach(this::display);
    }
}
Shorter version that uses Lombok:
import java.util.function.Consumer;
import java.util.function.Predicate;

import lombok.AccessLevel;
import lombok.RequiredArgsConstructor;
import lombok.experimental.FieldDefaults;

/**
 * Forks a Stream using a Predicate into positive and negative outcomes.
 */
@RequiredArgsConstructor
@FieldDefaults(makeFinal = true, level = AccessLevel.PROTECTED)
public class StreamForkerUtil<T> implements Consumer<T> {

    Predicate<T> predicate;
    Consumer<T> positiveConsumer;
    Consumer<T> negativeConsumer;

    @Override
    public void accept(T t) {
        (predicate.test(t) ? positiveConsumer : negativeConsumer).accept(t);
    }
}
How about:
Supplier<Stream<Integer>> randomIntsStreamSupplier =
        () -> (new Random()).ints(0, 2).boxed();

Stream<Integer> tails =
        randomIntsStreamSupplier.get().filter(x -> x.equals(0));
Stream<Integer> heads =
        randomIntsStreamSupplier.get().filter(x -> x.equals(1));