Lets say I have huge webserver log file that does not fit in memory. I need to stream this file to a mapreduce method and save to database. I do this using Java 8 stream api. For example, I get a list after the mapreduce process such as, consumption by client, consumption by ip, consumption by content. But, my needs are not that like that given in my example. Since I cannot share code, I just want to give basic example.
By Java 8 Stream Api, I want to read file exactly once, get 3 lists at the same time, while I am streaming file, parallel or sequential. But parallel would be good. Is there any way to do that?
Generally collecting to anything other than standard API's gives you is pretty easy via a custom Collector. In your case collecting to 3 lists at a time (just a small example that compiles, since you can't share your code either):
private static <T> Collector<T, ?, List<List<T>>> to3Lists() {
class Acc {
List<T> left = new ArrayList<>();
List<T> middle = new ArrayList<>();
List<T> right = new ArrayList<>();
List<List<T>> list = Arrays.asList(left, middle, right);
void add(T elem) {
// obviously do whatever you want here
left.add(elem);
middle.add(elem);
right.add(elem);
}
Acc merge(Acc other) {
left.addAll(other.left);
middle.addAll(other.middle);
right.addAll(other.right);
return this;
}
public List<List<T>> finisher() {
return list;
}
}
return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
And using it via:
Stream.of(1, 2, 3)
.collect(to3Lists());
Obviously this custom collector does not do anything useful, but just an example of how you could work with it.
I have adapted the answer to this question to your case. The custom Spliterator will "split" the stream into multiple streams that collect by different properties:
#SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}
public static class ForkingSpliterator<T>
extends AbstractSpliterator<T>
{
private Spliterator<T> sourceSpliterator;
private List<BlockingQueue<T>> queues = new ArrayList<>();
private boolean sourceDone;
#SafeVarargs
private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
{
super(Long.MAX_VALUE, 0);
sourceSpliterator = source.spliterator();
for (Consumer<Stream<T>> fork : consumers)
{
LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
queues.add(queue);
new Thread(() -> fork.accept(StreamSupport.stream(new ForkedConsumer(queue), false))).start();
}
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
sourceDone = !sourceSpliterator.tryAdvance(t -> queues.forEach(queue -> queue.offer(t)));
return !sourceDone;
}
private class ForkedConsumer
extends AbstractSpliterator<T>
{
private BlockingQueue<T> queue;
private ForkedConsumer(BlockingQueue<T> queue)
{
super(Long.MAX_VALUE, 0);
this.queue = queue;
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
while (queue.peek() == null)
{
if (sourceDone)
{
// element is null, and there won't be no more, so "terminate" this sub stream
return false;
}
}
// push to consumer pipeline
action.accept(queue.poll());
return true;
}
}
}
You can use it as follows:
streamForked(Stream.of(new Row("content1", "client1", "location1", 1),
new Row("content2", "client1", "location1", 2),
new Row("content1", "client1", "location2", 3),
new Row("content2", "client2", "location2", 4),
new Row("content1", "client2", "location2", 5)),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
Collectors.groupingBy(Row::getContent,
Collectors.summingInt(Row::getConsumption))))),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
Collectors.groupingBy(Row::getLocation,
Collectors.summingInt(Row::getConsumption))))),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getContent,
Collectors.groupingBy(Row::getLocation,
Collectors.summingInt(Row::getConsumption))))));
// Output
// {client2={location2=9}, client1={location1=3, location2=3}}
// {client2={content2=4, content1=5}, client1={content2=2, content1=4}}
// {content2={location1=2, location2=4}, content1={location1=1, location2=8}}
Note that you can do pretty much anything you want with your the copies of the stream. As per your example, I used a stacked groupingBy collector to group the rows by two properties and then summed up the int property. So the result will be a Map<String, Map<String, Integer>>. But you could also use it for other scenarios:
rows -> System.out.println(rows.count())
rows -> rows.forEach(row -> System.out.println(row))
rows -> System.out.println(rows.anyMatch(row -> row.getConsumption() > 3))
Related
My question is: is there a better way to implement this task?
I have a list of orderable elements (in this example by age, the youngest first).
And I want to delete all elements that fulfill a condition (in this example red elements) but keep the first 2 of them.
Stream<ElementsVO> stream = allElements.stream();
Stream<ElementsVO> redStream = stream.filter(elem->elem.getColor()==RED).sorted((c1, c2) -> { return c1.getAge() - c2.getAge();
}).limit(2);
Stream<ElementsVO> nonRedStream=stream.filter(elem->elem.getColor()!=RED);
List<ElementsVO> resultList = Stream.concat(redStream,nonRedStream).sorted((c1, c2) -> { return c1.getAge() - c2.getAge();
}).collect(Collectors.toList());
Any idea to improve this? Any way to implement an accumulator function or something like that with streams?
You can technically do this with a stateful predicate:
Predicate<ElementsV0> statefulPredicate = new Predicate<ElementsV0>() {
private int reds = 0;
#Override public boolean test(ElementsV0 e) {
if (elem.getColor() == RED) {
reds++;
return reds < 2;
}
return true;
}
};
Then:
List<ElementsVO> resultList =
allElements.stream()
.sorted(comparingInt(ElementsV0::getAge))
.filter(statefulPredicate)
.collect(toList());
This might work, but it is a violation of the Stream API: the documentation for Stream.filter says that the predicate should be stateless, which in general allows the stream implementation to apply the filter in any order. For small input lists, streamed sequentially, this will almost certainly be the appearance order in the list, but it's not guaranteed.
Caveat emptor. Your current way works, although you could do the partitioning of the list more efficiently using Collectors.partitioningBy to avoid iterating it twice.
You can implement a custom collector that will maintain two separate collections of RED and non-RED element.
And since you need only two red elements having the greatest age to improve performance, you can introduce a partial sorting. I.e. collection of non-red element needs to maintain an order and always must be of size 2 at most, with that overhead of sorting will be far less significant in comparison to sorting of elements having the property of RED in order to pick only two of them.
In order to create a custom collector, you might make use of the static method Collector.of() which expects the following arguments:
Supplier Supplier<A> is meant to provide a mutable container which store elements of the stream. Because we need to separate elements by color into two groups as a container, we can use a map that will contain only 2 keys (true and false), denoting whether elements mapped to this key are red. In order to store red-elements and perform a partial sorting, we need a collection that is capable of maintaining the order. PriorityQueue is a good choice for that purpose. To store all other elements, I've used ArrayDeque, which doesn't maintain the order and as fast as ArrayList.
Accumulator BiConsumer<A,T> defines how to add elements into the mutable container provided by the supplier. For this task, the accumulator needs to guarantee that the queue, containing red-elements will not exceed the given size by rejecting values that are smaller than the lowest value previously added to the queue and by removing the lowest value if the size has reached the limit and a new value needs to be added. This functionality extracted into a separate method tryAdd()
Combiner BinaryOperator<A> combiner() establishes a rule on how to merge two containers obtained while executing stream in parallel. Here, combiner rely on the same logic that was described for accumulator.
Finisher Function<A,R> is meant to produce the final result by transforming the mutable container. In the code below, finisher dumps the contents of both queues into a stream, sorts them and collects into an immutable list.
Characteristics allow fine-tuning the collector by providing additional information on how it should function. Here a characteristic Collector.Characteristics.UNORDERED is being applied. Which indicates that the order in which partial results of the reduction produced in parallel is not significant, that can improve performance of this collector with parallel streams.
The code might look like this:
public static void main(String[] args) {
List<ElementsVO> allElements =
List.of(new ElementsVO(Color.RED, 25), new ElementsVO(Color.RED, 23), new ElementsVO(Color.RED, 27),
new ElementsVO(Color.BLACK, 19), new ElementsVO(Color.GREEN, 23), new ElementsVO(Color.GREEN, 29));
Comparator<ElementsVO> byAge = Comparator.comparing(ElementsVO::getAge);
List<ElementsVO> resultList = allElements.stream()
.collect(getNFiltered(byAge, element -> element.getColor() != Color.RED, 2));
resultList.forEach(System.out::println);
}
The method below is responsible for creating of a collector that partition the elements based on the given predicate and will sort them in accordance with the provided comparator.
public static <T> Collector<T, ?, List<T>> getNFiltered(Comparator<T> comparator,
Predicate<T> condition,
int limit) {
return Collector.of(
() -> Map.of(true, new PriorityQueue<>(comparator),
false, new ArrayDeque<>()),
(Map<Boolean, Queue<T>> isRed, T next) -> {
if (condition.test(next)) isRed.get(false).add(next);
else tryAdd(isRed.get(true), next, comparator, limit);
},
(Map<Boolean, Queue<T>> left, Map<Boolean, Queue<T>> right) -> {
left.get(false).addAll(right.get(false));
left.get(true).forEach(next -> tryAdd(left.get(true), next, comparator, limit));
return left;
},
(Map<Boolean, Queue<T>> isRed) -> isRed.values().stream()
.flatMap(Queue::stream).sorted(comparator).toList(),
Collector.Characteristics.UNORDERED
);
}
This method is responsible for adding the next red-element into the priority queue. It expects a comparator in order to be able to determine whether the next element should be added or discarded, and a value of the maximum size of the queue (2), to check if it was exceeded.
public static <T> void tryAdd(Queue<T> queue, T next, Comparator<T> comparator, int size) {
if (queue.size() == size && comparator.compare(queue.element(), next) < 0)
queue.remove(); // if the next element is greater than the smallest element in the queue and max size has been exceeded, the smallest element needs to be removed from the queue
if (queue.size() < size) queue.add(next);
}
Output
lementsVO{color=BLACK, age=19}
ElementsVO{color=GREEN, age=23}
ElementsVO{color=RED, age=25}
ElementsVO{color=RED, age=27}
ElementsVO{color=GREEN, age=29}
I wrote a generic Collector with a predicate and a limit of elements to add which match the predicate:
public class LimitedMatchCollector<T> implements Collector<T, List<T>, List<T>> {
private Predicate<T> filter;
private int limit;
public LimitedMatchCollector(Predicate<T> filter, int limit)
{
super();
this.filter = filter;
this.limit = limit;
}
private int count = 0;
#Override
public Supplier<List<T>> supplier() {
return () -> new ArrayList<T>();
}
#Override
public BiConsumer<List<T>, T> accumulator() {
return this::accumulator;
}
#Override
public BinaryOperator<List<T>> combiner() {
return this::combiner;
}
#Override
public Set<Characteristics> characteristics() {
return Stream.of(Characteristics.IDENTITY_FINISH)
.collect(Collectors.toCollection(HashSet::new));
}
public List<T> accumulator(List<T> list , T e) {
if (filter.test(e)) {
if (count >= limit) {
return list;
}
count++;
}
list.add(e);
return list;
}
public List<T> combiner(List<T> left , List<T> right) {
right.forEach( e -> {
if (filter.test(e)) {
if (count < limit) {
left.add(e);
count++;
}
}
});
return left;
}
#Override
public Function<List<T>, List<T>> finisher()
{
return Function.identity();
}
}
Usage:
List<ElementsVO> list = Arrays.asList(new ElementsVO("BLUE", 1)
,new ElementsVO("BLUE", 2) // made color a String
,new ElementsVO("RED", 3)
,new ElementsVO("RED", 4)
,new ElementsVO("GREEN", 5)
,new ElementsVO("RED", 6)
,new ElementsVO("YELLOW", 7)
);
System.out.println(list.stream().collect(new LimitedMatchCollector<ElementsVO>( (e) -> "RED".equals(e.getColor()),2)));
I'm trying to collapse several streams backed by huge amounts of data into one, then buffer them. I'm able to collapse these streams into one stream of items with no problem. When I attempt to buffer/chunk the streams, though, it attempts to fully buffer the first stream, which instantly fills up my memory.
It took me a while to narrow down the issue to a minimum test case, but there's some code below.
I can refactor things such that I don't run into this issue, but without understanding why exactly this blows up, I feel like using streams is just a ticking time bomb.
I took inspiration from Buffer Operator on Java 8 Streams for the buffering.
import java.util.*;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class BreakStreams
{
//#see https://stackoverflow.com/questions/47842871/buffer-operator-on-java-8-streams
/**
* Batch a stream into chunks
*/
public static <T> Stream<List<T>> buffer(Stream<T> stream, final long count)
{
final Iterator<T> streamIterator = stream.iterator();
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<List<T>>()
{
#Override public boolean hasNext()
{
return streamIterator.hasNext();
}
#Override public List<T> next()
{
List<T> intermediate = new ArrayList<>();
for (long v = 0; v < count && hasNext(); v++)
{
intermediate.add(streamIterator.next());
}
return intermediate;
}
}, 0), false);
}
public static void main(String[] args)
{
//create streams from huge datasets
Stream<Long> streams = Stream.of(LongStream.range(0, Integer.MAX_VALUE).boxed(),
LongStream.range(0, Integer.MAX_VALUE).boxed())
//collapse into one stream
.flatMap(x -> x);
//iterating over the stream one item at a time is OK..
// streams.forEach(x -> {
//buffering the stream is NOT ok, you will go OOM
buffer(streams, 25).forEach(x -> {
try
{
Thread.sleep(2500);
}
catch (InterruptedException ignore)
{
}
System.out.println(x);
});
}
}
This seems to be connected to the older issue “Why filter() after flatMap() is "not completely" lazy in Java streams?”. While that issue has been fixed for the Stream’s builtin operations, it seems to still exist when we try to iterate over a flatmapped stream externally.
We can simplify the code to reproduce the problem to
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.iterator().hasNext();
Note that using Spliterator is affected as well
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.spliterator()
.tryAdvance((long l) -> System.out.println("first item: "+l));
Both try to buffer elements until ultimately bailing out with an OutOfMemoryError.
Since spliterator().forEachRemaining(…) seems not to be affected, you could implement a solution which works for your use case of forEach, but it would be fragile, as it would still exhibit the problem for short-circuiting stream operations.
public static <T> Stream<List<T>> buffer(Stream<T> stream, final int count) {
boolean parallel = stream.isParallel();
Spliterator<T> source = stream.spliterator();
return StreamSupport.stream(
new Spliterators.AbstractSpliterator<List<T>>(
(source.estimateSize()+count-1)/count, source.characteristics()
&(Spliterator.SIZED|Spliterator.DISTINCT|Spliterator.ORDERED)
| Spliterator.NONNULL) {
List<T> list;
Consumer<T> c = t -> list.add(t);
#Override
public boolean tryAdvance(Consumer<? super List<T>> action) {
if(list == null) list = new ArrayList<>(count);
if(!source.tryAdvance(c)) return false;
do {} while(list.size() < count && source.tryAdvance(c));
action.accept(list);
list = null;
return true;
}
#Override
public void forEachRemaining(Consumer<? super List<T>> action) {
source.forEachRemaining(t -> {
if(list == null) list = new ArrayList<>(count);
list.add(t);
if(list.size() == count) {
action.accept(list);
list = null;
}
});
if(list != null) {
action.accept(list);
list = null;
}
}
}, parallel);
}
But note that Spliterator based solutions are preferable in general, as they support carrying additional information enabling optimizations and have lower iteration costs in a lot of use cases. So this is the way to go once this issue has been fixed in the JDK code.
As a workaround, you can use Stream.concat(…) to combine streams, but it has an explicit warning about not to combine too many streams at once in its documentation:
Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowException [sic].
The throwable’s name has been corrected to StackOverflowError in Java 9’s documentation
This question already has answers here:
Java 8 Distinct by property
(34 answers)
Closed 3 years ago.
I frequently ran into a problem with Java lambda expressions where when I wanted to distinct() a stream on an arbitrary property or method of an object, but wanted to keep the object rather than map it to that property or method. I started to create containers as discussed here but I started to do it enough to where it became annoying and made a lot of boilerplate classes.
I threw together this Pairing class, which holds two objects of two types and allows you to specify keying off the left, right, or both objects. My question is... is there really no built-in lambda stream function to distinct() on a key supplier of some sorts? That would really surprise me. If not, will this class fulfill that function reliably?
Here is how it would be called
BigDecimal totalShare = orders.stream().map(c -> Pairing.keyLeft(c.getCompany().getId(), c.getShare())).distinct().map(Pairing::getRightItem).reduce(BigDecimal.ZERO, (x,y) -> x.add(y));
Here is the Pairing class
public final class Pairing<X,Y> {
private final X item1;
private final Y item2;
private final KeySetup keySetup;
private static enum KeySetup {LEFT,RIGHT,BOTH};
private Pairing(X item1, Y item2, KeySetup keySetup) {
this.item1 = item1;
this.item2 = item2;
this.keySetup = keySetup;
}
public X getLeftItem() {
return item1;
}
public Y getRightItem() {
return item2;
}
public static <X,Y> Pairing<X,Y> keyLeft(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.LEFT);
}
public static <X,Y> Pairing<X,Y> keyRight(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.RIGHT);
}
public static <X,Y> Pairing<X,Y> keyBoth(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.BOTH);
}
public static <X,Y> Pairing<X,Y> forItems(X item1, Y item2) {
return keyBoth(item1, item2);
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
result = prime * result + ((item1 == null) ? 0 : item1.hashCode());
}
if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
result = prime * result + ((item2 == null) ? 0 : item2.hashCode());
}
return result;
}
#Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Pairing<?,?> other = (Pairing<?,?>) obj;
if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
if (item1 == null) {
if (other.item1 != null)
return false;
} else if (!item1.equals(other.item1))
return false;
}
if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
if (item2 == null) {
if (other.item2 != null)
return false;
} else if (!item2.equals(other.item2))
return false;
}
return true;
}
}
UPDATE:
Tested Stuart's function below and it seems to work great. The operation below distincts on the first letter of each string. The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream
public class DistinctByKey {
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
public static void main(String[] args) {
final ImmutableList<String> arpts = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI");
arpts.stream().filter(distinctByKey(f -> f.substring(0,1))).forEach(s -> System.out.println(s));
}
Output is...
ABQ
CHI
PHX
BWI
The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:
/**
* Stateful filter. T is type of stream element, K is type of extracted key.
*/
static class DistinctByKey<T,K> {
Map<K,Boolean> seen = new ConcurrentHashMap<>();
Function<T,K> keyExtractor;
public DistinctByKey(Function<T,K> ke) {
this.keyExtractor = ke;
}
public boolean filter(T t) {
return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
}
I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
Unfortunately the type inference couldn't get far enough inside the expression, so I had to specify explicitly the type arguments for the DistinctByKey class.
This involves more setup than the collectors approach described by Louis Wasserman, but this has the advantage that distinct items pass through immediately instead of being buffered up until the collection completes. Space should be the same, as (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.
UPDATE
It's possible to get rid of the K type parameter since it's not actually used for anything other than being stored in a map. So Object is sufficient.
/**
* Stateful filter. T is type of stream element.
*/
static class DistinctByKey<T> {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
Function<T,Object> keyExtractor;
public DistinctByKey(Function<T,Object> ke) {
this.keyExtractor = ke;
}
public boolean filter(T t) {
return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
}
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters -- for a constructor or a static method call -- when either is in the instance expression of a method reference. Oh well.
(Another variation on this that would probably simplify it is to make DistinctByKey<T> implements Predicate<T> and rename the method to eval. This would remove the need to use a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
UPDATE 2
Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus, things are simplified so type inference works!
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
BigDecimal totalShare = orders.stream()
.filter(distinctByKey(o -> o.getCompany().getId()))
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
You more or less have to do something like
elements.stream()
.collect(Collectors.toMap(
obj -> extractKey(obj),
obj -> obj,
(first, second) -> first
// pick the first if multiple values have the same key
)).values().stream();
Another way of finding distinct elements
List<String> uniqueObjects = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI")
.stream()
.collect(Collectors.groupingBy((p)->p.substring(0,1))) //expression
.values()
.stream()
.flatMap(e->e.stream().limit(1))
.collect(Collectors.toList());
A variation on Stuart Marks second update. Using a Set.
public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
Set<Object> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
return t -> seen.add(keyExtractor.apply(t));
}
We can also use RxJava (very powerful reactive extension library)
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
To answer your question in your second update:
The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream:
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
In your code sample, distinctByKey is only invoked one time, so the ConcurrentHashMap created just once. Here's an explanation:
The distinctByKey function is just a plain-old function that returns an object, and that object happens to be a Predicate. Keep in mind that a predicate is basically a piece of code that can be evaluated later. To manually evaluate a predicate, you must call a method in the Predicate interface such as test. So, the predicate
t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null
is merely a declaration that is not actually evaluated inside distinctByKey.
The predicate is passed around just like any other object. It is returned and passed into the filter operation, which basically evaluates the predicate repeatedly against each element of the stream by calling test.
I'm sure filter is more complicated than I made it out to be, but the point is, the predicate is evaluated many times outside of distinctByKey. There's nothing special* about distinctByKey; it's just a function that you've called one time, so the ConcurrentHashMap is only created one time.
*Apart from being well made, #stuart-marks :)
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
ListIterate.distinct(list, HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
If you can refactor list to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
list.distinct(HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
Set.add(element) returns true if the set did not already contain element, otherwise false.
So you can do like this.
Set<String> set = new HashSet<>();
BigDecimal totalShare = orders.stream()
.filter(c -> set.add(c.getCompany().getId()))
.map(c -> c.getShare())
.reduce(BigDecimal.ZERO, BigDecimal::add);
If you want to do this parallel, you must use concurrent map.
It can be done something like
Set<String> distinctCompany = orders.stream()
.map(Order::getCompany)
.collect(Collectors.toSet());
Need some help thinking in lambdas from my fellow StackOverflow luminaries.
Standard case of picking through a list of a list of a list to collect some children deep in a graph. What awesome ways could Lambdas help with this boilerplate?
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
for (final Service service : server.findServices()) {
if (service.getContainer() instanceof Engine) {
final Engine engine = (Engine) service.getContainer();
for (final Container possibleHost : engine.findChildren()) {
if (possibleHost instanceof Host) {
final Host host = (Host) possibleHost;
for (final Container possibleContext : host.findChildren()) {
if (possibleContext instanceof Context) {
final Context context = (Context) possibleContext;
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
}
}
}
}
}
}
return list;
}
Note the list itself is going to the client as JSON, so don't focus on what is returned. Must be a few neat ways I can cut down the loops.
Interested to see what my fellow experts create. Multiple approaches encouraged.
EDIT
The findServices and the two findChildren methods return arrays
EDIT - BONUS CHALLENGE
The "not important part" did turn out to be important. I actually need to copy a value only available in the host instance. This seems to ruin all the beautiful examples. How would one carry state forward?
final ContextInfo info = new ContextInfo(context.getPath());
info.setHostname(host.getName()); // The Bonus Challenge
It's fairly deeply nested but it doesn't seem exceptionally difficult.
The first observation is that if a for-loop translates into a stream, nested for-loops can be "flattened" into a single stream using flatMap. This operation takes a single element and returns an arbitrary number elements in a stream. I looked up and found that StandardServer.findServices() returns an array of Service so we turn this into a stream using Arrays.stream(). (I make similar assumptions for Engine.findChildren() and Host.findChildren().
Next, the logic within each loop does an instanceof check and a cast. This can be modeled using streams as a filter operation to do the instanceof followed by a map operation that simply casts and returns the same reference. This is actually a no-op but it lets the static typing system convert a Stream<Container> to a Stream<Host> for example.
Applying these transformations to the nested loops, we get the following:
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
Arrays.stream(server.findServices())
.filter(service -> service.getContainer() instanceof Engine)
.map(service -> (Engine)service.getContainer())
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(possibleHost -> possibleHost instanceof Host)
.map(possibleHost -> (Host)possibleHost)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(possibleContext -> possibleContext instanceof Context)
.map(possibleContext -> (Context)possibleContext)
.forEach(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
});
return list;
}
But wait, there's more.
The final forEach operation is a slightly more complicated map operation that converts a Context into a ContextInfo. Furthermore, these are just collected into a List so we can use collectors to do this instead of creating and empty list up front and then populating it. Applying these refactorings results in the following:
public List<ContextInfo> list() {
final StandardServer server = getServer();
return Arrays.stream(server.findServices())
.filter(service -> service.getContainer() instanceof Engine)
.map(service -> (Engine)service.getContainer())
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(possibleHost -> possibleHost instanceof Host)
.map(possibleHost -> (Host)possibleHost)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(possibleContext -> possibleContext instanceof Context)
.map(possibleContext -> (Context)possibleContext)
.map(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
})
.collect(Collectors.toList());
}
I usually try to avoid multi-line lambdas (such as in the final map operation) so I'd refactor it into a little helper method that takes a Context and returns a ContextInfo. This doesn't shorten the code at all, but I think it does make it clearer.
UPDATE
But wait, there's still more.
Let's extract the call to service.getContainer() into its own pipeline element:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.filter(container -> container instanceof Engine)
.map(container -> (Engine)container)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
// ...
This exposes the repetition of filtering on instanceof followed by a mapping with a cast. This is done three times in total. It seems likely that other code is going to need to do similar things, so it would be nice to extract this bit of logic into a helper method. The problem is that filter can change the number of elements in the stream (dropping ones that don't match) but it can't change their types. And map can change the types of elements, but it can't change their number. Can something change both the number and types? Yes, it's our old friend flatMap again! So our helper method needs to take an element and return a stream of elements of a different type. That return stream will contain a single casted element (if it matches) or it will be empty (if it doesn't match). The helper function would look like this:
<T,U> Stream<U> toType(T t, Class<U> clazz) {
if (clazz.isInstance(t)) {
return Stream.of(clazz.cast(t));
} else {
return Stream.empty();
}
}
(This is loosely based on C#'s OfType construct mentioned in some of the comments.)
While we're at it, let's extract a method to create a ContextInfo:
ContextInfo makeContextInfo(Context context) {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
}
After these extractions, the pipeline looks like this:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.flatMap(container -> toType(container, Engine.class))
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.flatMap(possibleHost -> toType(possibleHost, Host.class))
.flatMap(host -> Arrays.stream(host.findChildren()))
.flatMap(possibleContext -> toType(possibleContext, Context.class))
.map(this::makeContextInfo)
.collect(Collectors.toList());
Nicer, I think, and we've removed the dreaded multi-line statement lambda.
UPDATE: BONUS CHALLENGE
Once again, flatMap is your friend. Take the tail of the stream and migrate it into the last flatMap before the tail. That way the host variable is still in scope, and you can pass it to a makeContextInfo helper method that's been modified to take host as well.
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.flatMap(container -> toType(container, Engine.class))
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.flatMap(possibleHost -> toType(possibleHost, Host.class))
.flatMap(host -> Arrays.stream(host.findChildren())
.flatMap(possibleContext -> toType(possibleContext, Context.class))
.map(ctx -> makeContextInfo(ctx, host)))
.collect(Collectors.toList());
This would be my version of your code using JDK 8 streams, method references, and lambda expressions:
server.findServices()
.stream()
.map(Service::getContainer)
.filter(Engine.class::isInstance)
.map(Engine.class::cast)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(Host.class::isInstance)
.map(Host.class::cast)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(Context.class::isInstance)
.map(Context.class::cast)
.map(context -> {
ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
})
.collect(Collectors.toList());
In this approach, I replace your if-statements for filter predicates. Take into account that an instanceof check can be replaced with a Predicate<T>
Predicate<Object> isEngine = someObject -> someObject instanceof Engine;
which can also be expressed as
Predicate<Object> isEngine = Engine.class::isInstance
Similarly, your casts can be replaced by Function<T,R>.
Function<Object,Engine> castToEngine = someObject -> (Engine) someObject;
Which is pretty much the same as
Function<Object,Engine> castToEngine = Engine.class::cast;
And adding items manually to a list in the for loop can be replaced with a collector. In production code, the lambda that transforms a Context into a ContextInfo can (and should) be extracted into a separate method, and used as a method reference.
Solution to bonus challenge
Inspired by #EdwinDalorzo answer.
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<>();
final StandardServer server = getServer();
return server.findServices()
.stream()
.map(Service::getContainer)
.filter(Engine.class::isInstance)
.map(Engine.class::cast)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(Host.class::isInstance)
.map(Host.class::cast)
.flatMap(host -> mapContainers(
Arrays.stream(host.findChildren()), host.getName())
)
.collect(Collectors.toList());
}
private static Stream<ContextInfo> mapContainers(Stream<Container> containers,
String hostname) {
return containers
.filter(Context.class::isInstance)
.map(Context.class::cast)
.map(context -> {
ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
info.setHostname(hostname); // The Bonus Challenge
return info;
});
}
First attempt beyond ugly. It will be years before I find this readable. Has to be a better way.
Note the findChildren methods return arrays which of course work with for (N n: array) syntax, but not with the new Iterable.forEach method. Had to wrap them with Arrays.asList
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
asList(server.findServices()).forEach(service -> {
if (!(service.getContainer() instanceof Engine)) return;
final Engine engine = (Engine) service.getContainer();
instanceOf(Host.class, asList(engine.findChildren())).forEach(host -> {
instanceOf(Context.class, asList(host.findChildren())).forEach(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
});
});
});
return list;
}
The utility methods
public static <T> Iterable<T> instanceOf(final Class<T> type, final Collection collection) {
final Iterator iterator = collection.iterator();
return () -> new SlambdaIterator<>(() -> {
while (iterator.hasNext()) {
final Object object = iterator.next();
if (object != null && type.isAssignableFrom(object.getClass())) {
return (T) object;
}
}
throw new NoSuchElementException();
});
}
And finally a Lambda-powerable implementation of Iterable
public static class SlambdaIterator<T> implements Iterator<T> {
// Ya put your Lambdas in there
public static interface Advancer<T> {
T advance() throws NoSuchElementException;
}
private final Advancer<T> advancer;
private T next;
protected SlambdaIterator(final Advancer<T> advancer) {
this.advancer = advancer;
}
#Override
public boolean hasNext() {
if (next != null) return true;
try {
next = advancer.advance();
return next != null;
} catch (final NoSuchElementException e) {
return false;
}
}
#Override
public T next() {
if (!hasNext()) throw new NoSuchElementException();
final T v = next;
next = null;
return v;
}
#Override
public void remove() {
throw new UnsupportedOperationException();
}
}
Lots of plumbing and no doubt 5x the byte code. Must be a better way.
Since Java doesn't allow passing methods as parameters, what trick do you use to implement Python like list comprehension in Java ?
I have a list (ArrayList) of Strings. I need to transform each element by using a function so that I get another list. I have several functions which take a String as input and return another String as output. How do I make a generic method which can be given the list and the function as parameters so that I can get a list back with each element processed. It is not possible in the literal sense, but what trick should I use ?
The other option is to write a new function for each smaller String-processing function which simply loops over the entire list, which is kinda not so cool.
In Java 8 you can use method references:
List<String> list = ...;
list.replaceAll(String::toUpperCase);
Or, if you want to create a new list instance:
List<String> upper = list.stream().map(String::toUpperCase).collect(Collectors.toList());
Basically, you create a Function interface:
public interface Func<In, Out> {
public Out apply(In in);
}
and then pass in an anonymous subclass to your method.
Your method could either apply the function to each element in-place:
public static <T> void applyToListInPlace(List<T> list, Func<T, T> f) {
ListIterator<T> itr = list.listIterator();
while (itr.hasNext()) {
T output = f.apply(itr.next());
itr.set(output);
}
}
// ...
List<String> myList = ...;
applyToListInPlace(myList, new Func<String, String>() {
public String apply(String in) {
return in.toLowerCase();
}
});
or create a new List (basically creating a mapping from the input list to the output list):
public static <In, Out> List<Out> map(List<In> in, Func<In, Out> f) {
List<Out> out = new ArrayList<Out>(in.size());
for (In inObj : in) {
out.add(f.apply(inObj));
}
return out;
}
// ...
List<String> myList = ...;
List<String> lowerCased = map(myList, new Func<String, String>() {
public String apply(String in) {
return in.toLowerCase();
}
});
Which one is preferable depends on your use case. If your list is extremely large, the in-place solution may be the only viable one; if you wish to apply many different functions to the same original list to make many derivative lists, you will want the map version.
The Google Collections library has lots of classes for working with collections and iterators at a much higher level than plain Java supports, and in a functional manner (filter, map, fold, etc.). It defines Function and Predicate interfaces and methods that use them to process collections so that you don't have to. It also has convenience functions that make dealing with Java generics less arduous.
I also use Hamcrest** for filtering collections.
The two libraries are easy to combine with adapter classes.
** Declaration of interest: I co-wrote Hamcrest
Apache Commons CollectionsUtil.transform(Collection, Transformer) is another option.
I'm building this project to write list comprehension in Java, now is a proof of concept in https://github.com/farolfo/list-comprehension-in-java
Examples
// { x | x E {1,2,3,4} ^ x is even }
// gives {2,4}
Predicate<Integer> even = x -> x % 2 == 0;
List<Integer> evens = new ListComprehension<Integer>()
.suchThat(x -> {
x.belongsTo(Arrays.asList(1, 2, 3, 4));
x.is(even);
});
// evens = {2,4};
And if we want to transform the output expression in some way like
// { x * 2 | x E {1,2,3,4} ^ x is even }
// gives {4,8}
List<Integer> duplicated = new ListComprehension<Integer>()
.giveMeAll((Integer x) -> x * 2)
.suchThat(x -> {
x.belongsTo(Arrays.asList(1, 2, 3, 4));
x.is(even);
});
// duplicated = {4,8}
You can use lambdas for the function, like so:
class Comprehension<T> {
/**
*in: List int
*func: Function to do to each entry
*/
public List<T> comp(List<T> in, Function<T, T> func) {
List<T> out = new ArrayList<T>();
for(T o: in) {
out.add(func.apply(o));
}
return out;
}
}
the usage:
List<String> stuff = new ArrayList<String>();
stuff.add("a");
stuff.add("b");
stuff.add("c");
stuff.add("d");
stuff.add("cheese");
List<String> newStuff = new Comprehension<String>().comp(stuff, (a) -> { //The <String> tells the comprehension to return an ArrayList<String>
a.equals("a")? "1":
(a.equals("b")? "2":
(a.equals("c")? "3":
(a.equals("d")? "4": a
)))
});
will return:
["1", "2", "3", "4", "cheese"]
import java.util.Arrays;
class Soft{
public static void main(String[] args){
int[] nums=range(9, 12);
System.out.println(Arrays.toString(nums));
}
static int[] range(int low, int high){
int[] a=new int[high-low];
for(int i=0,j=low;i<high-low;i++,j++){
a[i]=j;
}
return a;
}
}
Hope, that I help you :)