Appending to a list within a stream to a map - java

I'm attempting to consolidate multiple unnecessary web requests into a map, with the key connected to a location's ID, and the value being a list of products at that location.
The idea is to reduce the amount of requests to my flask server by creating a single request for each location, with a list of required products mapped to it.
I have tried to find others who have faced a similar problem using Java 8's streaming functionality, but I cannot find anyone who is trying to append to a list within a map.
Example:
public class Product {
public Integer productNumber();
public Integer locationNumber();
}
List<Product> products = ... (imagine many products in this list)
Map<Integer, List<Integer>> results = products.stream()
.collect(Collectors.toMap(p -> p.locationNumber, p -> Arrays.asList(p.productNumber));
Also, the second p parameter cannot access the current product in stream.
Because of this, I have been unable to test whether I can append to a List when the location number matches a pre-existing list. I don't believe I can use Arrays.asList(), as I believe it's immutable.
At the end, the map should have many product numbers in a list per location. Is it possible to append Integers to a pre-existing list within a map?

You may do it like so,
Map<Integer, List<Integer>> res = products.stream()
.collect(Collectors.groupingBy(Product::locationNumber,
Collectors.mapping(Product::productNumber, Collectors.toList())));
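To see that collector run end to end, here is a minimal, self-contained sketch. The Product class below, its sample values, and the GroupByLocationDemo and byLocation names are assumptions made up for illustration; only the groupingBy/mapping combination comes from the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByLocationDemo {

    // Hypothetical stand-in for the Product class from the question.
    static class Product {
        private final Integer productNumber;
        private final Integer locationNumber;

        Product(Integer productNumber, Integer locationNumber) {
            this.productNumber = productNumber;
            this.locationNumber = locationNumber;
        }

        Integer productNumber() { return productNumber; }
        Integer locationNumber() { return locationNumber; }
    }

    // Groups product numbers by location number in a single pass.
    // groupingBy creates the list for a new key and appends to it afterwards.
    static Map<Integer, List<Integer>> byLocation(List<Product> products) {
        return products.stream()
                .collect(Collectors.groupingBy(Product::locationNumber,
                        Collectors.mapping(Product::productNumber, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product(1, 10), new Product(2, 10), new Product(7, 20));
        System.out.println(byLocation(products));
    }
}
```

With this input, location 10 maps to the list [1, 2] and location 20 maps to [7], so there is no need to check manually whether a list already exists for a key.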

The Java Collectors API is pretty powerful and has lots of nice utility methods to solve this.
public class Learn {
static class Product {
final Integer productNumber;
final Integer locationNumber;
Product(Integer productNumber, Integer locationNumber) {
this.productNumber = productNumber;
this.locationNumber = locationNumber;
}
Integer getProductNumber() {
return productNumber;
}
Integer getLocationNumber() {
return locationNumber;
}
}
public static Product of(int i, int j){
return new Product(i,j);
}
public static void main(String[] args) {
List<Product> productList = Arrays.asList(of(1,1),of(2,1),of(3,1),
of(7,2),of(8,2),of(9,2));
Map<Integer, List<Integer>> results = productList.stream().collect(Collectors.groupingBy(Product::getLocationNumber,
Collectors.collectingAndThen(Collectors.toList(), pl->pl.stream().map(Product::getProductNumber).collect(Collectors.toList()))));
System.out.println(results);
}
}
So, what we are doing here is streaming the product list and grouping the stream by the location attribute, with the twist that we want to transform the collected list of products into a list of product numbers.
Collectors.collectingAndThen is precisely the method for this: it lets you specify a main collector, toList(), and a transformer function, which is just another stream that maps products to product numbers. In the Java API docs, the main collector and the transformer are called the downstream collector and the finisher.
Please go through the Collectors source code to have a complete understanding as to how all these different collectors are defined.

Related

How to obtain a Set of Strings from a Map of objects with a string property

I need to get a Set<String> with the accountNumbers from the given Map<String, Account> and filter the map contents by retaining only active accounts (active == true) so that there will be no inactive accounts.
Account class has the following attributes:
private String number;
private String owner;
private double balance;
private boolean active = true;
My solution so far looks like this:
public Set<String> getAccountNumbers() {
return new HashSet<String>(accounts.values().stream().filter(Account::isActive));
}
I tried casting, but that didn't seem to work. Can somebody tell me, how I access the attribute number from here on?
You have to apply map in order to transform the stream of accounts Stream<Account> into a stream of strings Stream<String>. And then apply the terminal operation collect to obtain a Set as a result of the execution of stream pipeline.
Your attempt to pass the stream to the constructor of the HashSet is incorrect (it expects a Collection, not a Stream) and unnecessary.
Note: a stream without a terminal operation (like collect, count, forEach, etc.) will never be executed. map and filter are called intermediate operations.
public Set<String> getAccountNumbers() {
return accounts.values().stream() // Stream<Account>
.filter(Account::isActive)
.map(Account::getNumber) // Stream<String>
.collect(Collectors.toSet());
}
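The note about terminal operations can be checked with a small experiment. This is only a sketch: the LazyStreamDemo class and its two method names are made up for illustration, and peek is used purely to count how many elements the pipeline actually visits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Builds a pipeline out of intermediate operations only. Nothing pulls
    // elements through it, so the counter is never incremented.
    static int visitedWithoutTerminalOp(List<String> input) {
        AtomicInteger visited = new AtomicInteger();
        Stream<String> pipeline = input.stream()
                .peek(s -> visited.incrementAndGet())
                .filter(s -> !s.isEmpty());
        return visited.get();
    }

    // Same pipeline, finished with collect, which triggers the traversal.
    static int visitedWithTerminalOp(List<String> input) {
        AtomicInteger visited = new AtomicInteger();
        input.stream()
                .peek(s -> visited.incrementAndGet())
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        return visited.get();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "", "b");
        System.out.println(visitedWithoutTerminalOp(input)); // 0
        System.out.println(visitedWithTerminalOp(input));    // 3
    }
}
```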
For more information on streams take a look at this tutorial

Group and Reduce list of objects

I have a list of objects with many duplicated and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means but this is an experiment.)
This is what I have right now. I don't really like this because the map-building seems extraneous and the values() collection is a view of the backing map, and you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
Collection<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(this::getfoos)
.collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
l.ids.addAll(r.ids);
return l;
})).values();
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0,10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);
Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
this.name = n;
this.ids.addAll(ids);
}
public static Foo merge(Foo src, Foo dest) {
List<Integer> merged = new ArrayList<>();
merged.addAll(src.ids);
merged.addAll(dest.ids);
return new Foo(src.name, merged);
}
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question that shows you should implement a hashCode method or distinct may not behave correctly.
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.
public void reduce() {
ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
.collect(Collectors.collectingAndThen(Collectors.groupingBy(f -> f.name,
Collectors.reducing(Foo.identity(), Foo::merge)),
map -> map.values().stream().
collect(Collectors.toCollection(ArrayList::new))));
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
private static final Foo BASE_FOO = new Foo("", 0);
public static Foo identity() {
return BASE_FOO;
}
// use only if side effects to the argument objects are okay
public static Foo merge(Foo fooOne, Foo fooTwo) {
if (fooOne == BASE_FOO) {
return fooTwo;
} else if (fooTwo == BASE_FOO) {
return fooOne;
}
fooOne.ids.addAll(fooTwo.ids);
return fooOne;
}
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If the input elements are supplied in random order, then having an intermediate map is probably the best solution. However, if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately, there's no Stream API method that would allow you to do such a thing easily and efficiently. One possible solution is to write a custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
(list, t) -> {
Foo f;
if(list.isEmpty() || !(f = list.get(list.size()-1)).name.equals(t.name))
list.add(t);
else
f.ids.addAll(t.ids);
},
(l1, l2) -> {
if(l1.isEmpty())
return l2;
if(l2.isEmpty())
return l1;
if(l1.get(l1.size()-1).name.equals(l2.get(0).name)) {
l1.get(l1.size()-1).ids.addAll(l2.get(0).ids);
l1.addAll(l2.subList(1, l2.size()));
} else {
l1.addAll(l2);
}
return l1;
}));
}
My tests show that this collector is always faster than collecting to map (up to 2x depending on average number of duplicate names), both in sequential and parallel mode.
Another approach is to use my StreamEx library which provides a bunch of "partial reduction" methods including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
return StreamEx.of(stream)
.collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
l.ids.addAll(r.ids);
return l;
}).toList();
}
This method accepts two arguments: a BiPredicate which is applied for two adjacent elements and should return true if elements should be merged and the BinaryOperator which performs merging. This solution is a little bit slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than toMap solution and it's simpler and somewhat more flexible as collapse is an intermediate operation, so you can collect in another way.
Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce performance, making it slower than the toMap solution.
As already pointed out by others, an intermediate Map is unavoidable, as that’s the way of finding the objects to merge. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
List<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(n->IntStream.range(0,10).mapToObj(i -> new Foo(n, i)))
.collect(collectingAndThen(groupingBy(f -> f.name),
m->m.entrySet().stream().map(e->new Foo(e.getKey(),
e.getValue().stream().flatMap(f->f.ids.stream()).collect(toList())))
.collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
name = n;
ids=l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would be sufficient.
Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
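For readers on a newer JDK: Collectors.flatMapping was added in Java 9, so the flattening can move back into the first step. The following is a sketch under that assumption, not a Java 8 solution. The FlatMappingDemo class, the simplified Foo, and the merge and sample helpers are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FlatMappingDemo {

    // Simplified stand-in for the Foo class from the question.
    static class Foo {
        final String name;
        final List<Integer> ids;

        Foo(String name, List<Integer> ids) {
            this.name = name;
            this.ids = ids;
        }
    }

    // Step 1: group by name and flatten every Foo's ids into one list per name.
    // Step 2: build one new Foo per map entry. No source Foo is modified.
    static List<Foo> merge(Stream<Foo> foos) {
        Map<String, List<Integer>> idsByName = foos.collect(
                Collectors.groupingBy((Foo f) -> f.name,
                        Collectors.flatMapping((Foo f) -> f.ids.stream(), Collectors.toList())));
        return idsByName.entrySet().stream()
                .map(e -> new Foo(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    // Same shape of test data as in the question: 3 names, 10 ids each.
    static Stream<Foo> sample() {
        return Stream.of("foo", "bar", "baz")
                .flatMap(n -> IntStream.range(0, 10)
                        .mapToObj(i -> new Foo(n, Collections.singletonList(i))));
    }

    public static void main(String[] args) {
        merge(sample()).forEach(f -> System.out.println(f.name + " -> " + f.ids));
    }
}
```

The second step is still there, as the answer says it must be, but it shrinks to a plain entry-to-Foo mapping.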

TreeMap filtered view performance

I have a class that has (among other things):
public class TimeSeries {
private final NavigableMap<LocalDate, Double> prices;
public TimeSeries() { prices = new TreeMap<>(); }
private TimeSeries(NavigableMap<LocalDate, Double> prices) {
this.prices = prices;
}
public void add(LocalDate date, double price) { prices.put(date, price); }
public Set<LocalDate> dates() { return prices.keySet(); }
//the 2 methods below are examples of why I need a TreeMap
public double lastPriceAt(LocalDate date) {
Map.Entry<LocalDate, Double> price = prices.floorEntry(date);
return price.getValue(); //after some null checks
}
public TimeSeries between(LocalDate from, LocalDate to) {
return new TimeSeries(this.prices.subMap(from, true, to, true));
}
}
Now I need to have a "filtered" view on the map where only some of the dates are available. To that effect I have added the following method:
public TimeSeries onDates(Set<LocalDate> retainDates) {
TimeSeries filtered = new TimeSeries(new TreeMap<> (this.prices));
filtered.dates().retainAll(retainDates);
return filtered;
}
The onDates method is a huge performance bottleneck, representing 85% of the processing time of the program. And since the program is running millions of simulations, that means hours spent in that method.
How could I improve the performance of that method?
I'd give ImmutableSortedMap a try, assuming you can use it. It's based on a sorted array rather than a balanced tree, so I guess its overhead is much smaller(*). For building it, you need to employ biziclop's idea as the builder supports no removals.
(*) There's a call to Collections.sort there, but it should be harmless as the collection is already sorted and TimSort is optimized for such a case.
In case your original map doesn't change after creating onDates, maybe a view could help. In case it does, you'd need some "persistent" map, which sounds rather complicated.
Maybe some hacky solution based on sorted arrays and binary search could be fastest, maybe you could even convert LocalDate first to int and then to double and put everything into a single interleaved double[] in order to save memory (and hopefully also time). You'd need your own binary search, but this is rather trivial.
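A rough sketch of the sorted-array idea follows. It deliberately differs from the suggestion above in two ways: it keeps two parallel arrays instead of a single interleaved double[], and it relies on the JDK's Arrays.binarySearch rather than a hand-written search. The ArrayTimeSeries class name and its constructor are assumptions made up for illustration.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

class ArrayTimeSeries {
    private final long[] epochDays; // sorted ascending
    private final double[] prices;  // prices[i] belongs to epochDays[i]

    // Copies an already sorted map into two parallel arrays.
    ArrayTimeSeries(NavigableMap<LocalDate, Double> source) {
        epochDays = new long[source.size()];
        prices = new double[source.size()];
        int i = 0;
        for (Map.Entry<LocalDate, Double> e : source.entrySet()) {
            epochDays[i] = e.getKey().toEpochDay();
            prices[i] = e.getValue();
            i++;
        }
    }

    // Counterpart of floorEntry: the price on the latest date <= the given date.
    double lastPriceAt(LocalDate date) {
        int pos = Arrays.binarySearch(epochDays, date.toEpochDay());
        if (pos < 0) {
            // Not found: binarySearch returned -(insertionPoint) - 1,
            // and the floor element sits just before the insertion point.
            pos = -pos - 2;
        }
        if (pos < 0) {
            throw new NoSuchElementException("no price on or before " + date);
        }
        return prices[pos];
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, Double> source = new TreeMap<>();
        source.put(LocalDate.of(2020, 1, 1), 1.0);
        source.put(LocalDate.of(2020, 1, 5), 2.0);
        ArrayTimeSeries series = new ArrayTimeSeries(source);
        System.out.println(series.lastPriceAt(LocalDate.of(2020, 1, 3)));
    }
}
```

Whether this beats a TreeMap in the simulation is something only profiling can tell; the sketch just shows that the lookup itself stays short.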
The view idea is rather simple, assuming that
you don't need all NavigableMap methods, but just a couple of methods
the original map doesn't change
only a few elements are missing in retainDates
An example method:
public double lastPriceAt(LocalDate date) {
Map.Entry<LocalDate, Double> price = prices.floorEntry(date);
while (!retainDates.contains(price.getKey())) {
price = prices.lowerEntry(price.getKey()); // after some null checks
}
return price.getValue(); // after some null checks
}
The simplest optimisation:
public TimeSeries onDates(Set<LocalDate> retainDates) {
TreeMap<LocalDate, Double> filteredPrices = new TreeMap<>();
for (Entry<LocalDate, Double> entry : prices.entrySet() ) {
if (retainDates.contains( entry.getKey() ) ) {
filteredPrices.put( entry.getKey(), entry.getValue() );
}
}
TimeSeries filtered = new TimeSeries( filteredPrices );
return filtered;
}
Saves you the cost of creating a full copy of your map first, then iterating across the copy again to filter.

Is there an aggregateBy method in the stream Java 8 api?

I ran across this very interesting but one-year-old presentation by Brian Goetz. In the slide linked, he presents an aggregateBy() method supposedly in the Stream API, which is supposed to aggregate the elements of a list (?) to a map, given a default initial value and a method manipulating the value (for duplicate keys also); see the next slide in the presentation.
Apparently there is no such method in the Stream API. Is there another method that does something analogous in Java 8?
The aggregate operation can be done using the Collectors class. So in the video, the example would be equivalent to :
Map<String, Integer> map =
documents.stream().collect(Collectors.groupingBy(Document::getAuthor, Collectors.summingInt(Document::getPageCount)));
On its own, the groupingBy method will give you a Map<String, List<Document>>. You then have to use a downstream collector to sum the page counts of the documents in the List associated with each key.
This is done by providing a downstream collector to groupingBy, which is summingInt, resulting in a Map<String, Integer>.
They give basically the same example in the documentation where they compute the sum of the employees' salary by department.
I think that they removed this operation and created the Collectors class instead to have a useful class that contains a lot of reductions that you will use commonly.
Let's say we have a list of employees with their department and salary and we want the total salary paid by each department.
There are several ways to do it and you could for example use a toMap collector to aggregate the data per department:
the first argument is the key mapper (your aggregation axis = the department),
the second is the value mapper (the data you want to aggregate = salaries), and
the third is the merging function (how you want to aggregate data = sum the values).
Example:
import static java.util.stream.Collectors.*;
public static void main(String[] args) {
List<Person> persons = Arrays.asList(new Person("John", "Sales", 10000),
new Person("Helena", "Sales", 10000),
new Person("Somebody", "Marketing", 15000));
Map<String, Double> salaryByDepartment = persons.stream()
.collect(toMap(Person::department, Person::salary, (s1, s2) -> s1 + s2));
System.out.println("salary by department = " + salaryByDepartment);
}
As often with streams, there are several ways to get the desired result, for example:
import static java.util.stream.Collectors.*;
Map<String, Double> salaryByDepartment = persons.stream()
.collect(groupingBy(Person::department, summingDouble(Person::salary)));
For reference, the Person class:
static class Person {
private final String name, department;
private final double salary;
public Person(String name, String department, double salary) {
this.name = name;
this.department = department;
this.salary = salary;
}
public String name() { return name; }
public String department() { return department; }
public double salary() { return salary; }
}
This particular Javadoc entry is about the closest thing I could find on this piece of aggregation in Java 8. Even though it's a third party API, the signatures seem to line up pretty well - you provide some function to get values from, some terminal function for values (zero, in this case), and some function to combine the function and the values together.
It feels a lot like a Collector, which would offer us the ability to do this.
Map<String, Integer> strIntMap =
intList.stream()
.collect(Collectors
.groupingBy(Document::getAuthor,
Collectors.summingInt(Document::getPageCount)));
The idea then is that we group on the author's name for each entry in our list, and add up the total page numbers that the author has into a Map<String, Integer>.

Create dynamic ArrayLists

I have a problem related to "dynamic ArrayLists". I have a List that contains usernames and their data. For every distinct username, I want to create a single list that contains all the data of that user. For example, I have an ArrayList of (username, tweet) pairs that contains: lefteris, "Plays ball", Kostas, "Plays basketball", lefteris, "Nice weather". After that, I want to create two lists: one with Kostas and his tweets, and another with lefteris and his tweets (2 tweets). The parent ArrayList may have 20 distinct usernames or more. How can I do that?
I recommend using a HashMap or HashSet instead, because if you need to store things in pairs, hashing is a perfect solution.
I'd go with the following data structure:
HashMap<String, ArrayList<String>>
Then you could manipulate a "dynamic" list of properties keyed to each name, if the properties are single items:
Lefteris->("Plays ball", "Nice weather",...)
Kostas->("Plays basketball",...)
If the properties are key-value pairs, do:
HashMap<String, HashMap<String, Object>>
Data looking like:
Lefteris->(Sport->"Plays ball", Weather->"Nice",...)
Kostas->(Sport->"basketball",...)
Since you parse the items from a file, you can do the following.
Create a map that contains the tweets associated to a particular username
Map<String,List<String>> userTweets = new HashMap<String,List<String>>();
Then, have a method to associate a tweet to certain user, verifying that it is already added in the map and adding it if it isn't.
public void addTweetToUser(String user, String tweet) {
if(userTweets.containsKey(user))
userTweets.get(user).add(tweet);
else {
List<String> newUserTweets = new LinkedList<String>();
newUserTweets.add(tweet);
userTweets.put(user, newUserTweets);
}
}
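On Java 8 or later, the containsKey/put branching above can be collapsed with Map.computeIfAbsent. This is a minimal sketch assuming Java 8 is available; the TweetIndex class and the tweetsOf accessor are made up for illustration, and an ArrayList is used here instead of the LinkedList above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TweetIndex {
    private final Map<String, List<String>> userTweets = new HashMap<>();

    // Creates the user's list on first use, then appends the tweet to it.
    void addTweetToUser(String user, String tweet) {
        userTweets.computeIfAbsent(user, u -> new ArrayList<>()).add(tweet);
    }

    // Returns the user's tweets, or an empty list for an unknown user.
    List<String> tweetsOf(String user) {
        return userTweets.getOrDefault(user, Collections.emptyList());
    }

    public static void main(String[] args) {
        TweetIndex index = new TweetIndex();
        index.addTweetToUser("lefteris", "Plays ball");
        index.addTweetToUser("Kostas", "Plays basketball");
        index.addTweetToUser("lefteris", "Nice weather");
        System.out.println(index.tweetsOf("lefteris")); // [Plays ball, Nice weather]
    }
}
```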
As a plus, you can improve this by creating an object UserTweet that contains:
public class UserTweet {
private String user;
private String tweet;
//Constructor, Setters & Getters or all of them
}
Then your addTweetToUser method can have an UserTweet parameter instead.
When you want to know the tweets for a certain user, you just obtain the corresponding list from the userTweets map. I'd also add methods to remove tweets and/or remove users, just in case.
Several libraries add excellent collection-processing functionality to Java along the lines of what functional languages provide. One such library is Google Guava. Guava provides a Multimap suitable for grouping things the way you want. There are also many utility methods, like Multimaps.index(), which collects items from a list into a map by applying some function to the elements of the list to calculate a key. With such support, it only takes a few lines of code and one Function implementation (a closure in any other language) to solve your problem:
import com.google.common.base.Function;
import com.google.common.collect.Lists;
import com.google.common.collect.Multimap;
import com.google.common.collect.Multimaps;
import java.util.Arrays;
import java.util.List;
public class Tweets {
public static final int NAME = 0;
public static final int TWEET = 1;
public static void main(String[] args) {
List<String> namesAndTweets = Arrays.asList(
"lefteris", "Plays ball",
"Kostas", "Plays basketball",
"lefteris", "Nice weather");
List<List<String>> nameTweetPairs =
Lists.partition(namesAndTweets, 2);
Multimap<String, List<String>> namesAndTweetsByName =
Multimaps.index(nameTweetPairs, get(NAME));
Multimap<String, String> tweetsByName =
Multimaps.transformValues(namesAndTweetsByName, get(TWEET));
System.out.println(tweetsByName);
}
private static Function<List<String>, String> get(final int n) {
return new Function<List<String>, String>() {
@Override
public String apply(List<String> nameAndTweet) {
return nameAndTweet.get(n);
}
};
}
}
Outputs:
{lefteris=[Plays ball, Nice weather], Kostas=[Plays basketball]}
Update: To explain the code a bit more, there are three basic steps:
Take the list that has names and tweets all mixed together and use Lists.partition() to break it into pairs of (name, tweet).
Use Multimaps.index() to build a Multimap from the pairs, taking the name as the map key. This gives you a map where map keys are names and map values are the (name, tweet) pairs.
Use Multimaps.transformValues() to reduce the map values from (name, tweet) pairs to just the tweets.
P.S. does anyone know if there's a built-in Function that does what my get() does? It seems like a useful Function that should be provided, but I can't find it anywhere.
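For comparison, the same three steps can be sketched with plain Java 8 streams and no Guava. This is not the Guava approach above but a JDK-only alternative; the TweetsWithStreams class and tweetsByName method are made up for illustration, and the input is assumed to be a flat list alternating name, tweet, name, tweet.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TweetsWithStreams {

    // Walks the flat list pair by pair: element 2*i is the name, 2*i+1 the tweet.
    // A trailing unpaired element, if any, is ignored.
    // LinkedHashMap keeps names in order of first appearance.
    static Map<String, List<String>> tweetsByName(List<String> namesAndTweets) {
        return IntStream.range(0, namesAndTweets.size() / 2)
                .boxed()
                .collect(Collectors.groupingBy(
                        i -> namesAndTweets.get(2 * i),
                        LinkedHashMap::new,
                        Collectors.mapping(i -> namesAndTweets.get(2 * i + 1),
                                Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> namesAndTweets = Arrays.asList(
                "lefteris", "Plays ball",
                "Kostas", "Plays basketball",
                "lefteris", "Nice weather");
        System.out.println(tweetsByName(namesAndTweets));
    }
}
```

The grouping and the value extraction happen in one collector here, so no intermediate map of (name, tweet) pairs is built.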
