Mapping 2 Objects in a Stream to a single 3rd one? - java

If I have a list of timestamps and a file path of an object that I want to convert, can I make a collection of converters that expect the method signature Converter(filePath, start, end)?
More Detail (Pseudo-Code):
Some list that has timestamps (imagine they're in seconds): path = somewhere, list = {0, 15, 15, 30}.
How can I do something like this:
list.stream.magic.map(start, end -> new Converter (path, start, end))?
Result: new Converter (path, 0, 15), new Converter(path, 15, 30)
Note: I'm aware of BiFunction, but to my knowledge, streams do not implement it.

There are many approaches to get the required result using streams.
But first of all, you're not obliged to use the Stream API; for lists of tens or hundreds of elements I would suggest plain old list iteration.
Just for instance, try the code sample below.
We can easily see two surface problems arising from the nature of streams and their incompatibility with the very idea of pairing elements:
it's necessary to apply a stateful function, which is really tricky to use in map() and should be considered dirty coding, and the mapping produces nulls in every other position that have to be filtered out afterwards;
there are problems when the stream contains an odd number of elements, and you can never predict whether it will.
If you decide to use streams, then doing it cleanly requires a custom implementation of Iterator, Spliterator or Collector, depending on your demands.
Either way there are a couple of non-obvious corner cases you won't be happy to implement yourself, so you can try one of the many third-party stream libraries.
Two of the most popular are StreamEx and RxJava.
They definitely have tools for pairing stream elements... but don't forget to check the performance for your case!
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Stream;

public class Sample
{
    public static void main(String... arg)
    {
        String path = "somewhere";
        Stream<Converter> stream = Stream.of(0, 15, 25, 30).map(
            new Function<Integer, Converter>()
            {
                int previous;
                boolean even = true;

                @Override
                public Converter apply(Integer current)
                {
                    Converter converter = even ? null : new Converter(path, previous, current);
                    even = !even;
                    previous = current;
                    return converter;
                }
            }).filter(Objects::nonNull);
        stream.forEach(System.out::println);
    }

    static class Converter
    {
        private final String path;
        private final int start;
        private final int end;

        Converter(String path, int start, int end)
        {
            this.path = path;
            this.start = start;
            this.end = end;
        }

        public String toString()
        {
            return String.format("Converter[%s,%s,%s]", path, start, end);
        }
    }
}
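For comparison, here is a hedged sketch of the same pairing without the stateful mapper: a plain loop (as suggested above for small lists) and an index-based stream that pairs adjacent elements by position. It assumes the Converter class from the sample above is accessible and uses the path and timestamp list from the question.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<Integer> times = Arrays.asList(0, 15, 15, 30);
String path = "somewhere";

// Plain iteration: walk the list two timestamps at a time.
List<Converter> byLoop = new ArrayList<>();
for (int i = 0; i + 1 < times.size(); i += 2) {
    byLoop.add(new Converter(path, times.get(i), times.get(i + 1)));
}

// Stream variant: pair adjacent elements by index instead of by mutable state.
List<Converter> byStream = IntStream.range(0, times.size() / 2)
        .mapToObj(i -> new Converter(path, times.get(2 * i), times.get(2 * i + 1)))
        .collect(Collectors.toList());

Both variants produce Converter[somewhere,0,15] and Converter[somewhere,15,30], matching the expected result from the question, and neither breaks on an odd-sized list (the trailing element is simply ignored).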

Related

Java 8 Stream Collectors - Collector to create a Map with objects in multiple buckets

The following code works and is readable but it seems to me I have intermediate operations that feel like they shouldn't be necessary. I've written this simplified version as the actual code is part of a much larger process.
I've got a Collection of Widget, each with a name and multiple types (indicated by constants of the WidgetType enum). These multiple types are gettable as a Stream<WidgetType> though, if necessary, I could return those as some other type. (For various reasons, it is strongly desirable that these be returned as a Stream<WidgetType> because of how these widgets are used later in the actual code.)
These widgets are added to an EnumMap<WidgetType, List<Widget>> which is, later, translated into an EnumMap<WidgetType, Widget[]>.
If each Widget only had a single WidgetType, this would be a trivial solve but, since any Widget could have 1 or more types, I am tripping all over myself with the syntax of the Collectors.groupingBy() method (and its overloads).
Here's the code example, again, fully functional and gives me the exact result I need.
class StackOverFlowExample {
    private final Map<WidgetType, Widget[]> widgetMap = new EnumMap<>(WidgetType.class);

    public static void main(String[] args) { new StackOverFlowExample(); }

    StackOverFlowExample() {
        Collection<Widget> widgetList = getWidgetsFromWhereverWidgetsComeFrom();
        {
            final Map<WidgetType, List<Widget>> intermediateMap = new EnumMap<>(WidgetType.class);
            widgetList.forEach(w ->
                w.getWidgetTypes().forEach(wt -> {
                    intermediateMap.putIfAbsent(wt, new ArrayList<>());
                    intermediateMap.get(wt).add(w);
                })
            );
            intermediateMap.entrySet().forEach(e -> widgetMap.put(e.getKey(), e.getValue().toArray(new Widget[0])));
        }
        Arrays.stream(WidgetType.values()).forEach(wt -> System.out.println(wt + ": " + Arrays.toString(widgetMap.get(wt))));
    }

    private Collection<Widget> getWidgetsFromWhereverWidgetsComeFrom() {
        return Arrays.asList(
            new Widget("1st", WidgetType.TYPE_A, WidgetType.TYPE_B),
            new Widget("2nd", WidgetType.TYPE_A, WidgetType.TYPE_C),
            new Widget("3rd", WidgetType.TYPE_A, WidgetType.TYPE_D),
            new Widget("4th", WidgetType.TYPE_C, WidgetType.TYPE_D)
        );
    }
}
This outputs:
TYPE_A: [1st, 2nd, 3rd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]
For completeness sake, here's the Widget class and the WidgetType enum:
class Widget {
    private final String name;
    private final WidgetType[] widgetTypes;
    Widget(String n, WidgetType ... wt) { name = n; widgetTypes = wt; }
    public String getName() { return name; }
    public Stream<WidgetType> getWidgetTypes() { return Arrays.stream(widgetTypes).distinct(); }
    @Override public String toString() { return name; }
}
enum WidgetType { TYPE_A, TYPE_B, TYPE_C, TYPE_D }
Any ideas on a better way to execute this logic are welcome. Thanks!
IMHO, the key is to convert a Widget instance to a Stream<Pair<WidgetType, Widget>> instance. Once we have that, we can flatMap a stream of widgets and collect on the resulting stream. Of course we don't have Pair in Java, so have to use AbstractMap.SimpleEntry instead.
widgets.stream()
    // Convert a stream of widgets to a stream of (type, widget)
    .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
    // Group by the key, and do additional mapping to get the widget
    .collect(groupingBy(e -> e.getKey(),
        mapping(e -> e.getValue(),
            collectingAndThen(toList(), l -> l.toArray(new Widget[0])))));
P.S. This is an occasion where IntelliJ's suggestion doesn't shorten a lambda with a method reference.
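If the grouped result should land directly in an EnumMap keyed by WidgetType, as in the question's widgetMap field, here is a hedged sketch of the same pipeline using the map-factory overload of groupingBy (static imports of the Collectors methods are assumed, as above):

Map<WidgetType, Widget[]> widgetMap = widgetList.stream()
    .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
    .collect(groupingBy(
        Map.Entry::getKey,
        () -> new EnumMap<WidgetType, Widget[]>(WidgetType.class), // map factory: build the EnumMap directly
        mapping(Map.Entry::getValue,
            collectingAndThen(toList(), l -> l.toArray(new Widget[0])))));

The result is the same; only the concrete Map implementation changes, which removes the need for the intermediate EnumMap copying step in the original code.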
This is a bit convoluted, but it produces the same output, not necessarily in the same order. It uses a static import of java.util.stream.Collectors.*.
widgetMap = widgetList.stream()
    .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
    .collect(groupingBy(Map.Entry::getKey,
        collectingAndThen(mapping(Map.Entry::getValue, toSet()),
            s -> s.stream().toArray(Widget[]::new))));
Output on my machine:
TYPE_A: [1st, 3rd, 2nd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]

Complex custom Collector with Java 8

I have a stream of objects which I would like to collect the following way.
Let's say we are handling forum posts:
class Post {
    private Date time;
    private Data data;
}
I want to create a list which groups posts by a period. If there were no posts for X minutes, create a new group.
class PostsGroup{
List<Post> posts = new ArrayList<> ();
}
I want to get a List<PostsGroup> containing the posts grouped by the interval.
Example: interval of 10 minutes.
Posts:
[{time: x, data: {}}, {time: x + 3, data: {}}, {time: x + 12, data: {}}, {time: x + 45, data: {}}]
I want to get a list of post groups:
[
  {posts: [{time: x, data: {}}, {time: x + 3, data: {}}, {time: x + 12, data: {}}]},
  {posts: [{time: x + 45, data: {}}]}
]
Notice that the first group lasted till x + 22 (ten minutes after its last post at x + 12). Then a new post was received at x + 45.
Is this possible?
This problem could be easily solved using the groupRuns method of my StreamEx library:
long MAX_INTERVAL = TimeUnit.MINUTES.toMillis(10);
StreamEx.of(posts)
.groupRuns((p1, p2) -> p2.time.getTime() - p1.time.getTime() <= MAX_INTERVAL)
.map(PostsGroup::new)
.toList();
I assume that you have a constructor
class PostsGroup {
private List<Post> posts;
public PostsGroup(List<Post> posts) {
this.posts = posts;
}
}
The StreamEx.groupRuns method takes a BiPredicate which is applied to two adjacent input elements and returns true if they must be grouped together. This method creates a stream of lists where each list represents one group. It is lazy and works fine with parallel streams.
You need to retain state between stream entries and write yourself a grouping classifier. Something like this would be a good start.
class Post {
    private final long time;
    private final String data;

    public Post(long time, String data) {
        this.time = time;
        this.data = data;
    }

    @Override
    public String toString() {
        return "Post{" + "time=" + time + ", data=" + data + '}';
    }
}
public void test() {
    System.out.println("Hello");
    long t = 0;
    List<Post> posts = Arrays.asList(
            new Post(t, "One"),
            new Post(t + 1000, "Two"),
            new Post(t + 10000, "Three")
    );
    // Group posts that arrive within 5 seconds of the previous one.
    Map<Long, List<Post>> grouped = posts
            .stream()
            .collect(Collectors.groupingBy(new ClassifyByTimeBetween(5000)));
    grouped.entrySet().stream().forEach((e) -> {
        System.out.println(e.getKey() + " -> " + e.getValue());
    });
}
class ClassifyByTimeBetween implements Function<Post, Long> {
    final long delay;
    long currentGroupBy = -1;
    long lastDateSeen = -1;

    public ClassifyByTimeBetween(long delay) {
        this.delay = delay;
    }

    @Override
    public Long apply(Post p) {
        if (lastDateSeen >= 0) {
            if (p.time > lastDateSeen + delay) {
                // Gap exceeded the delay - start a new group keyed by this post's time.
                currentGroupBy = p.time;
            }
        } else {
            // First post - start the first group there.
            currentGroupBy = p.time;
        }
        lastDateSeen = p.time;
        return currentGroupBy;
    }
}
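One caveat with the approach above: groupingBy collects into a plain HashMap, so the printed groups may not come out in chronological order. If ordered output matters, a hedged variant is to pass a LinkedHashMap factory to the three-argument groupingBy overload, keeping the same stateful classifier:

Map<Long, List<Post>> grouped = posts.stream()
        .collect(Collectors.groupingBy(
                new ClassifyByTimeBetween(5000),  // stateful classifier defined above
                LinkedHashMap::new,               // preserve encounter order of the groups
                Collectors.toList()));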
Since no one has provided a solution with a custom collector, as was required in the original problem statement, here is a collector implementation that groups Post objects based on the provided time interval.
The Date class mentioned in the question has been legacy since Java 8 and is not recommended for use in new projects. Hence, LocalDateTime will be used instead.
Post & PostsGroup
For testing purposes, I've used Post implemented as a Java 16 record (if you substitute it with a class, the overall solution will be fully compliant with Java 8):
public record Post(LocalDateTime dateTime) {}
Also, I've enhanced the PostsGroup object. My idea is that it should be capable of deciding whether an offered Post should be added to its list of posts or rejected, as the Information expert principle suggests (in short: all manipulations of the data should happen only inside the class to which that data belongs).
To facilitate this functionality, two extra fields were added: interval of type Duration from the java.time package, representing the maximum interval between the earliest post and the latest post in a group, and intervalBound of type LocalDateTime, which gets initialized when the first post is added and is later used internally by the method isWithinInterval() to check whether an offered post fits into the interval.
public class PostsGroup {
    private Duration interval;
    private LocalDateTime intervalBound;
    private List<Post> posts = new ArrayList<>();

    public PostsGroup(Duration interval) {
        this.interval = interval;
    }

    public boolean tryAdd(Post post) {
        if (posts.isEmpty()) {
            intervalBound = post.dateTime().plus(interval);
            return posts.add(post);
        } else if (isWithinInterval(post)) {
            return posts.add(post);
        }
        return false;
    }

    public boolean isWithinInterval(Post post) {
        return post.dateTime().isBefore(intervalBound);
    }

    @Override
    public String toString() {
        return "PostsGroup{" + posts + '}';
    }
}
I'm making two assumptions:
All posts in the source are sorted by time (if that is not the case, you should introduce a sorted() operation in the pipeline before collecting the results; see the sketch after this list);
Posts need to be collected into the minimum number of groups; as a consequence, it's not possible to split this task and execute the stream in parallel.
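For unsorted input, a minimal sketch of that extra sorting step, reusing the Post record defined above and the groupPostsByInterval() collector built in the next section:

List<PostsGroup> postsGroups = posts.stream()
        .sorted(Comparator.comparing(Post::dateTime)) // ensure chronological order before grouping
        .collect(groupPostsByInterval(Duration.ofMinutes(10)));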
Building a Custom Collector
We can create a custom collector either inline by using one of the versions of the static method Collector.of() or by defining a class that implements the Collector interface.
These parameters have to be provided while creating a custom collector:
Supplier Supplier<A> is meant to provide a mutable container which stores elements of the stream. In this case, ArrayDeque (as an implementation of the Deque interface) is handy as a container because it gives convenient access to the most recently added element, i.e. the latest PostsGroup.
Accumulator BiConsumer<A,T> defines how to add elements into the container provided by the supplier. For this task, we need to provide the logic that determines whether the next element from the stream (i.e. the next Post) should go into the last PostsGroup in the Deque, or whether a new PostsGroup needs to be allocated for it.
Combiner BinaryOperator<A> combiner() establishes a rule on how to merge two containers obtained while executing stream in parallel. Since this operation is treated as not parallelizable, the combiner is implemented to throw an AssertionError in case of parallel execution.
Finisher Function<A,R> is meant to produce the final result by transforming the mutable container. The finisher function in the code below turns the container, a deque containing the result, into an immutable list.
Note: Java 16 method toList() is used inside the finisher function, for Java 8 it can be replaced with collect(Collectors.toUnmodifiableList()) or collect(Collectors.toList()).
Characteristics allow providing additional information; for instance, Collector.Characteristics.UNORDERED denotes that the order in which partial results of the reduction are produced while executing in parallel is not significant. In this case, the collector doesn't require any characteristics.
The method below is responsible for generating the collector based on the provided interval.
public static Collector<Post, ?, List<PostsGroup>> groupPostsByInterval(Duration interval) {
    return Collector.of(
        ArrayDeque::new,
        (Deque<PostsGroup> deque, Post post) -> {
            if (deque.isEmpty() || !deque.getLast().tryAdd(post)) { // no groups created yet, or the post doesn't fit into the most recent group
                PostsGroup postsGroup = new PostsGroup(interval);
                postsGroup.tryAdd(post);
                deque.addLast(postsGroup);
            }
        },
        (Deque<PostsGroup> left, Deque<PostsGroup> right) -> { throw new AssertionError("should not be used in parallel"); },
        (Deque<PostsGroup> deque) -> deque.stream().collect(Collectors.toUnmodifiableList()));
}
main() - demo
public static void main(String[] args) {
List<Post> posts =
List.of(new Post(LocalDateTime.of(2022,4,28,15,0)),
new Post(LocalDateTime.of(2022,4,28,15,3)),
new Post(LocalDateTime.of(2022,4,28,15,5)),
new Post(LocalDateTime.of(2022,4,28,15,8)),
new Post(LocalDateTime.of(2022,4,28,15,12)),
new Post(LocalDateTime.of(2022,4,28,15,15)),
new Post(LocalDateTime.of(2022,4,28,15,18)),
new Post(LocalDateTime.of(2022,4,28,15,27)),
new Post(LocalDateTime.of(2022,4,28,15,48)),
new Post(LocalDateTime.of(2022,4,28,15,54)));
Duration interval = Duration.ofMinutes(10);
List<PostsGroup> postsGroups = posts.stream()
.collect(groupPostsByInterval(interval));
postsGroups.forEach(System.out::println);
}
Output:
PostsGroup{[Post[dateTime=2022-04-28T15:00], Post[dateTime=2022-04-28T15:03], Post[dateTime=2022-04-28T15:05], Post[dateTime=2022-04-28T15:08]]}
PostsGroup{[Post[dateTime=2022-04-28T15:12], Post[dateTime=2022-04-28T15:15], Post[dateTime=2022-04-28T15:18]]}
PostsGroup{[Post[dateTime=2022-04-28T15:27]]}
PostsGroup{[Post[dateTime=2022-04-28T15:48], Post[dateTime=2022-04-28T15:54]]}

How to process chunks of a file with java.util.stream

To get familiar with the stream API, I tried to code a quite simple pattern.
Problem: I have a text file containing non-nested blocks of text. All blocks are identified by start/end patterns (e.g. <start> and <stop>). The content of a block isn't syntactically distinguishable from the noise between the blocks. Therefore it is impossible to work with simple (stateless) lambdas.
I was just able to implement something ugly like:
Files.lines(path).collect(new MySequentialParseAndProsessEachLineCollector<>());
To be honest, this is not what I want.
I'm looking for a mapper, something like:
Files.lines(path).map(MyMapAllLinesOfBlockToBuckets()).parallelStream().collect(new MyProcessOneBucketCollector<>());
The question "is there a good way to extract chunks of data from a java 8 stream" seems to contain a skeleton of a solution. Unfortunately, I'm too stupid to translate that to my problem. ;-)
Any hints?
Here is a solution which can be used for converting a Stream<String>, each element representing a line, to a Stream<List<String>>, each element representing a chunk found using a specified delimiter:
public class ChunkSpliterator implements Spliterator<List<String>> {
private final Spliterator<String> source;
private final Predicate<String> start, end;
private final Consumer<String> getChunk;
private List<String> current;
ChunkSpliterator(Spliterator<String> lineSpliterator,
Predicate<String> chunkStart, Predicate<String> chunkEnd) {
source=lineSpliterator;
start=chunkStart;
end=chunkEnd;
getChunk=s -> {
if(current!=null) current.add(s);
else if(start.test(s)) current=new ArrayList<>();
};
}
public boolean tryAdvance(Consumer<? super List<String>> action) {
while(current==null || current.isEmpty()
|| !end.test(current.get(current.size()-1)))
if(!source.tryAdvance(getChunk)) return false;
current.remove(current.size()-1);
action.accept(current);
current=null;
return true;
}
public Spliterator<List<String>> trySplit() {
return null;
}
public long estimateSize() {
return Long.MAX_VALUE;
}
public int characteristics() {
return ORDERED|NONNULL;
}
public static Stream<List<String>> toChunks(Stream<String> lines,
Predicate<String> chunkStart, Predicate<String> chunkEnd,
boolean parallel) {
return StreamSupport.stream(
new ChunkSpliterator(lines.spliterator(), chunkStart, chunkEnd),
parallel);
}
}
The lines matching the predicates are not included in the chunk; it would be easy to change this behavior, if desired.
It can be used like this:
ChunkSpliterator.toChunks( Files.lines(Paths.get(myFile)),
Pattern.compile("^<start>$").asPredicate(),
Pattern.compile("^<stop>$").asPredicate(),
true )
.collect(new MyProcessOneBucketCollector<>())
The patterns are specified as ^word$ to require the entire line to consist of the word only; without these anchors, any line containing the pattern can start or end a chunk. The nature of the source stream does not allow parallelism when creating the chunks, so when chaining with an immediate collection operation the parallelism for the entire operation is rather limited. It depends on the MyProcessOneBucketCollector whether there can be any parallelism at all.
If your final result does not depend on the order of occurrences of the buckets in the source file, it is strongly recommended that either your collector reports itself to be UNORDERED or you insert an unordered() in the stream’s method chains before the collect.
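Regarding the note above that the delimiter lines are excluded from each chunk: a minimal sketch of the change that would keep them, touching only the getChunk lambda shown in the constructor above:

getChunk = s -> {
    if (current != null) current.add(s);
    else if (start.test(s)) {
        current = new ArrayList<>();
        current.add(s);               // keep the <start> line in the chunk
    }
};

In addition, drop the line current.remove(current.size() - 1); from tryAdvance(), so that the <stop> line stays in the emitted list as well.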

Google Guava - Filter a Collection by the value of one of its elements' properties relative to another element's

Really poorly worded title for my first question here, but hopefully I'll still get an answer to it!
What I would like to have is to be able to chain what the filterIt-method in the following code-snippet does into my existing FluentIterable.
I'm VERY new to Guava (at least the functional programming part of it), so please bear with me.
import com.google.common.collect.Sets;
import org.joda.time.DateTime;
import org.joda.time.Days;
import java.util.Set;
public class Blah {
private DateTime date;
private Blah(final DateTime date) {
this.date = date;
}
public static void main(String[] args) {
Set<Blah> blahs = Sets.newHashSet(
new Blah(DateTime.now()),
new Blah(DateTime.now().minusDays(10)),
new Blah(DateTime.now().minusDays(21)),
new Blah(DateTime.now().minusDays(15))
);
Set<Blah> filteredBlahs = filterIt(blahs);
final int filtered = blahs.size() - filteredBlahs.size();
System.out.println(filtered + " results were filtered out");
}
private static Set<Blah> filterIt(final Set<Blah> blahs) {
final Set<Blah> filteredBlahs = Sets.newHashSet();
for (Blah currentBlah : blahs) {
final DateTime currentDate = currentBlah.date;
for (Blah blah : blahs) {
if (blah != currentBlah && !filteredBlahs.contains(blah)) {
final Days days = Days.daysBetween(currentDate, blah.date);
if (Math.abs(days.getDays()) < 5) {
filteredBlahs.add(currentBlah);
filteredBlahs.add(blah);
}
}
}
}
return filteredBlahs;
}
}
This code is written quickly as an example for what I want implemented. My problem is that I want this type of filtering to happen in the middle of some other transformations, and being able to chain it instead of splitting it up into different Iterables would make the flow more understandable at a glance.
Any feedback on how I can better the question, or clarify it, would be very much welcome!
NO.
To elaborate: you could write Functions to extract the date and Predicates to test them, but even if you found a way to stick them together, it would be a mess because of all those caveats. You'd have to wait for JDK 8 in order to make something even remotely readable out of it.
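For contrast, this is roughly the shape of filtering that FluentIterable does support; it is a hedged sketch that assumes a getDate() getter is added to Blah (the original class keeps the field private). The point is that a Predicate only ever sees one element at a time, which is exactly why the pairwise comparison above doesn't fit the model:

Set<Blah> recent = FluentIterable.from(blahs)
        .filter(new Predicate<Blah>() {
            @Override
            public boolean apply(Blah blah) {
                // only this one Blah is visible here - there is no second element to compare against
                return blah.getDate().isAfter(DateTime.now().minusDays(14));
            }
        })
        .toSet();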
The other thing is that you're testing pairs of Blahs (+1 for the name), which really goes too far for a general-purpose library. Imagine the myriad of methods like this.
The last thing is that what you're doing is not really functional: Your condition depends on the filteredBlahs list, which changes during the iteration. That's fine, if you need it, but converting this into something functionally-looking would be an obfuscation.
Predicates used for filtering really shouldn't change in the process; otherwise you can run into undefined or confusing behavior, like in this issue.

Is there anything in Java close to the parallel collections in Scala?

What is the simplest way to implement a parallel computation (e.g. on a multi-core processor) using Java?
I.e. the Java equivalent to this Scala code:
val list = aLargeList
list.par.map(_*2)
There is this library, but it seems overwhelming.
http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/
Don't give up so fast, snappy! ))
From the javadocs (with changes to map to your f) the essential matter is really just this:
ParallelLongArray a = ... // you provide
a.replaceWithMapping(new LongOp() { public long op(long a) { return a * 2L; } });
is pretty much this, right?
val list = aLargeList
list.par.map(_*2)
And if you are willing to live with a bit less terseness, the above can be a reasonably clean and clear 3-liner (and of course, if you reuse functions, then it's the exact same thing as in Scala - inline functions):
ParallelLongArray a = ... // you provide
LongOp f = new LongOp() { public long op(long a){return a*2L;}};
a.replaceWithMapping (f);
[edited above to show concise complete form ala OP's Scala variant]
and here it is in maximal verbose form where we start from scratch for demo:
import java.util.Random;
import jsr166y.ForkJoinPool;
import extra166y.Ops.LongGenerator;
import extra166y.Ops.LongOp;
import extra166y.ParallelLongArray;
public class ListParUnaryFunc {
public static void main(String[] args) {
int n = Integer.parseInt(args[0]);
// create a parallel long array
// with random long values
ParallelLongArray a = ParallelLongArray.create(n-1, new ForkJoinPool());
a.replaceWithGeneratedValue(generator);
// use it: apply unaryLongFuncOp in parallel
// to all values in array
a.replaceWithMapping(unaryLongFuncOp);
// examine it
for(Long v : a.asList()){
System.out.format("%d\n", v);
}
}
static final Random rand = new Random(System.nanoTime());
static LongGenerator generator = new LongGenerator() {
@Override final
public long op() { return rand.nextLong(); }
};
static LongOp unaryLongFuncOp = new LongOp() {
@Override final public long op(long a) { return a * 2L; }
};
}
Final edit and notes:
Also note that a simple class such as the following (which you can reuse across your projects):
/**
* The very basic form w/ TODOs on checks, concurrency issues, init, etc.
*/
final public static class ParArray {
private ParallelLongArray parr;
private final long[] arr;
public ParArray (long[] arr){
this.arr = arr;
}
public final ParArray par() {
if(parr == null)
parr = ParallelLongArray.createFromCopy(arr, new ForkJoinPool()) ;
return this;
}
public final ParallelLongArray map(LongOp op) {
return parr.replaceWithMapping(op);
}
public final long[] values() { return parr.getArray(); }
}
and something like that will allow you to write more fluid Java code (if terseness matters to you):
long[] arr = ... // you provide
LongOp f = ... // you provide
ParArray list = new ParArray(arr);
list.par().map(f);
And the above approach can certainly be pushed to make it even cleaner.
Doing that on one machine is pretty easy, but not as easy as Scala makes it. The library you posted is already a part of Java 5 and beyond. Probably the simplest thing to use is an ExecutorService. That represents a pool of threads that can run on any processor. You send it tasks and they return results.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
http://www.fromdev.com/2009/06/how-can-i-leverage-javautilconcurrent.html
I'd suggest using ExecutorService.invokeAll(), which will return a list of Futures. Then you can check them to see if they're done.
If you're using Java 7 then you could use the fork/join framework, which might save you some work. With all of these you can build something very similar to Scala parallel arrays, so using it is fairly concise.
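Here is a minimal sketch of the ExecutorService.invokeAll() approach described above, doubling every element of a list in parallel; the class and variable names are just for illustration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMapExample {
    public static void main(String[] args) throws Exception {
        List<Long> list = Arrays.asList(1L, 2L, 3L, 4L);
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // one Callable per element - the body plays the role of Scala's _ * 2
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            for (final Long v : list) {
                tasks.add(new Callable<Long>() {
                    public Long call() { return v * 2L; }
                });
            }
            List<Long> doubled = new ArrayList<Long>();
            for (Future<Long> f : pool.invokeAll(tasks)) { // invokeAll blocks until all tasks have finished
                doubled.add(f.get());
            }
            System.out.println(doubled); // prints [2, 4, 6, 8]
        } finally {
            pool.shutdown();
        }
    }
}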
Apart from using threads directly, Java doesn't have this sort of thing built in.
There will be an equivalent in Java 8: http://www.infoq.com/articles/java-8-vs-scala
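For reference, once Java 8 is available, the parallel-stream form of the Scala snippet is a one-liner; a minimal sketch (with java.util.Arrays, java.util.List and java.util.stream.Collectors imported):

List<Long> list = Arrays.asList(1L, 2L, 3L, 4L);
List<Long> doubled = list.parallelStream()   // rough analogue of Scala's list.par
        .map(x -> x * 2)
        .collect(Collectors.toList());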
