Accumulating value of objects when carrying the same timestamp - java

I am currently stuck on this:
I have datapoints that carry a value and a timestamp as a Long (epoch seconds):
public class MyDataPoint {
private Float value;
private Long timestamp;
//constructor, getters and setters here
}
I have lists that are bound to different sources where these datapoints are coming from.
public class MySource {
private Integer sourceId;
private List<MyDataPoint> dataPointList;
//constructor, getters and setters here
}
Now I want to accumulate these datapoints in a new list:
all datapoints with the same timestamp should be accumulated into one new datapoint whose value is the sum of their values.
For instance, if I have 3 datapoints with the same timestamp, I want to create one datapoint with that timestamp and the sum of the three values.
However, the sources did not start or stop recording at the same time, so for some timestamps only one datapoint may exist.
For now I have stuffed all of the datapoints into one list, thinking I could use streams to achieve my goal, but I can't figure it out. Maybe this is the wrong way anyway because I can't see how to use filters or maps to do this.
I have thought about using Optionals since for one timestamp maybe only one exists, but there is no obvious answer for me.
Anyone able to help me out?

I am guessing that you are trying to group the values in the list and then convert them to a new list using a stream. What I suggest is using Collectors.groupingBy and Collectors.summingDouble to convert your List to a Map<Long, Double> first, which holds your timestamp as the key and, as the value, the sum of all values that share that timestamp. After this you can convert this map back to a new list.
Not tested yet, but converting your List to a Map<Long, Double> should look something like:
Map<Long, Double> sums = dataPointList.stream().collect(Collectors.groupingBy(MyDataPoint::getTimestamp, Collectors.summingDouble(MyDataPoint::getValue))); // getters via method references, since the fields are private
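To make the two steps concrete, here is a minimal runnable sketch. It assumes stand-in versions of MyDataPoint with a (value, timestamp) constructor and getters, which the question does not show; the helper names sumByTimestamp and toDataPoints are hypothetical. Passing TreeMap::new to groupingBy is an optional extra that keeps the timestamps sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Stand-in for the question's MyDataPoint; the (value, timestamp) constructor order is an assumption.
class MyDataPoint {
    private final Float value;
    private final Long timestamp;

    MyDataPoint(Float value, Long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    Float getValue() { return value; }
    Long getTimestamp() { return timestamp; }
}

class TimestampSums {
    // Step 1: timestamp -> sum of all values carrying that timestamp.
    // TreeMap::new keeps the keys in ascending timestamp order.
    static Map<Long, Double> sumByTimestamp(List<MyDataPoint> points) {
        return points.stream()
                .collect(Collectors.groupingBy(
                        MyDataPoint::getTimestamp,
                        TreeMap::new,
                        Collectors.summingDouble(MyDataPoint::getValue)));
    }

    // Step 2: turn each map entry back into one MyDataPoint.
    static List<MyDataPoint> toDataPoints(Map<Long, Double> sums) {
        List<MyDataPoint> result = new ArrayList<>();
        sums.forEach((timestamp, sum) -> result.add(new MyDataPoint(sum.floatValue(), timestamp)));
        return result;
    }
}
```

A timestamp that occurs in only one source simply forms a group of one, so no Optional handling is needed for it.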

The following assumes your data point is immutable (you cannot use the same instance to accumulate into), so it uses an intermediate Map.
// assumes static imports of Collectors.groupingBy, summingDouble and toList,
// and a MyDataPoint(Long timestamp, Float value) constructor
List<MyDataPoint> summary = sources.stream()
.flatMap(source -> source.getDataPointList().stream()) // smush sources into a single stream of points
.collect(groupingBy(MyDataPoint::getTimestamp, summingDouble(MyDataPoint::getValue))) // collect points into Map<Long, Double>
.entrySet().stream() // new stream over the entries of the Map
.map(e -> new MyDataPoint(e.getKey(), e.getValue().floatValue())) // Double sum back to Float
.collect(toList());
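Another solution skips the Map<Long, Double> and the second stream by merging the points for each timestamp directly into a single MyDataPoint (a Map<Long, MyDataPoint> is still built along the way).
// inside MyDataPoint; toMap only ever merges two points with the same timestamp
public static MyDataPoint combine(MyDataPoint left, MyDataPoint right) {
return new MyDataPoint(left.getTimestamp(), left.getValue() + right.getValue()); // return new if immutable or increase left if not
}
// toMap with a merge function needs no identity element; with reducing(ZERO, combine),
// combine(ZERO, p) would copy ZERO's timestamp into every result
Collection<MyDataPoint> summary = sources.stream()
.flatMap(source -> source.getDataPointList().stream()) // smush into a single stream of points
.collect(toMap(MyDataPoint::getTimestamp, Function.identity(), MyDataPoint::combine)) // Map<Long, MyDataPoint>
.values();
This can be upgraded to parallelStream() if MyDataPoint is thread-safe (for example, immutable).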
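A self-contained sketch of the merge-function variant across several sources, under the assumption of immutable stand-ins for MyDataPoint and MySource with primitive fields; SourceMerger and sumAcrossSources are hypothetical names. The TreeMap::new argument is optional and only orders the output by timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Immutable stand-in for MyDataPoint (primitive fields are an assumption).
final class MyDataPoint {
    private final float value;
    private final long timestamp;

    MyDataPoint(float value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    float getValue() { return value; }
    long getTimestamp() { return timestamp; }

    // Merge function for toMap: only called for two points with the same key,
    // so keeping left's timestamp is correct.
    static MyDataPoint combine(MyDataPoint left, MyDataPoint right) {
        return new MyDataPoint(left.value + right.value, left.timestamp);
    }
}

// Stand-in for MySource.
final class MySource {
    private final int sourceId;
    private final List<MyDataPoint> dataPointList;

    MySource(int sourceId, List<MyDataPoint> dataPointList) {
        this.sourceId = sourceId;
        this.dataPointList = dataPointList;
    }

    int getSourceId() { return sourceId; }
    List<MyDataPoint> getDataPointList() { return dataPointList; }
}

final class SourceMerger {
    // Flattens all sources, merges points sharing a timestamp, returns them in timestamp order.
    static List<MyDataPoint> sumAcrossSources(List<MySource> sources) {
        Map<Long, MyDataPoint> byTimestamp = sources.stream()
                .flatMap(source -> source.getDataPointList().stream())
                .collect(Collectors.toMap(
                        MyDataPoint::getTimestamp,
                        Function.identity(),
                        MyDataPoint::combine,
                        TreeMap::new));
        return new ArrayList<>(byTimestamp.values());
    }
}
```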

I think the "big picture" solution is quite easy, even if I can foresee some multithreading issues complicating everything.
In plain Java, you simply need a Map:
Map<Long,List<MyDataPoint>> dataPoints = new HashMap<>();
Just use the timestamp as the KEY.
For the sake of OOP, let's create a class like DataPointCollector:
public class DataPointCollector {
private Map<Long,List<MyDataPoint>> dataPoints = new HashMap<>();
}
To add an element, create a method in DataPointCollector like:
public void addDataPoint(MyDataPoint dp) {
// create the list the first time a timestamp is seen, then append to it
dataPoints.computeIfAbsent(dp.getTimestamp(), ts -> new ArrayList<>()).add(dp);
}
This solves most of your theoretical problems.
To get the sum, just iterate over the List for a timestamp and add up the values.
If you need a real-time sum, wrap the List in another object that has totalValue and List<MyDataPoint> as fields, and update totalValue on each invocation of addDataPoint(...).
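A sketch of the real-time sum idea: the bucket class below (TimestampBucket, a hypothetical name) holds the points for one timestamp together with a running total, so reading a total never re-iterates the list. The stand-in MyDataPoint and the totalAt helper are assumptions, and like the original sketch this is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for MyDataPoint.
class MyDataPoint {
    private final float value;
    private final long timestamp;

    MyDataPoint(float value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    float getValue() { return value; }
    long getTimestamp() { return timestamp; }
}

// Hypothetical wrapper: the points for one timestamp plus their running total.
class TimestampBucket {
    private final List<MyDataPoint> points = new ArrayList<>();
    private double totalValue;

    void add(MyDataPoint dp) {
        points.add(dp);
        totalValue += dp.getValue(); // kept up to date on every add
    }

    double getTotalValue() { return totalValue; }

    List<MyDataPoint> getPoints() { return Collections.unmodifiableList(points); }
}

// Not thread-safe: concurrent callers would need synchronization or concurrent collections.
class DataPointCollector {
    private final Map<Long, TimestampBucket> buckets = new HashMap<>();

    void addDataPoint(MyDataPoint dp) {
        buckets.computeIfAbsent(dp.getTimestamp(), ts -> new TimestampBucket()).add(dp);
    }

    // Running total for a timestamp, or 0.0 if no point was added for it.
    double totalAt(long timestamp) {
        TimestampBucket bucket = buckets.get(timestamp);
        return bucket == null ? 0.0 : bucket.getTotalValue();
    }
}
```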
About streams: whether they fit depends on the use case. If at some point you have all the DataPoints you need, you can of course use streams to do things... however, for simple cases streams add some overhead and complexity compared to a plain loop, and I think it's better to focus on an easy solution first and then make it cool with streams only if needed.

Related

Java 8 : functional way to write sort, filter and count at same time

I am pretty new to Java, and I am trying to write the logic below in a functional way.
I have a list of objects which have many fields: List<someObject>.
The fields of interest for now are long timestamp and String bookType.
The problem statement is: I want to count the objects in the given list that have the same bookType as the one with the lowest timestamp.
For example, if we sort the given list of objects by timestamp in ascending order and the first object in the sorted list has the bookType SOMETYPE, then I want to find out how many objects in the list have the bookType SOMETYPE.
I have written this logic the plain old non-functional way, by maintaining 2 temp variables and iterating over the list once to find the lowest timestamp with its corresponding bookType and a count of each bookType.
But this is not acceptable to be run in a lambda, as it requires variables to be final
I could only write the part where I sort the given list based on timestamp:
n.stream().sorted(Comparator.comparingLong(someObject::timestamp)).collect(Collectors.toList());
I am stuck on how to proceed with finding the count for the bookType with the lowest timestamp.
But this is not acceptable to be run in a lambda, as it requires variables to be final
First of all, this is not a problem, since you can make your variable effectively final by creating e.g. a single-entry array and letting the lambda update its single (first) element.
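For completeness, a minimal illustration of the single-element-array workaround (the class and method names are hypothetical). It works in a sequential forEach, but it relies on a side effect inside a lambda and is not safe with parallel streams; the rest of this answer argues for cleaner alternatives.

```java
import java.util.List;

class EffectivelyFinalDemo {
    // The array reference never changes, so it is effectively final and the lambda may capture it;
    // only its single element is updated. Returns Long.MAX_VALUE for an empty list.
    static long minTimestamp(List<Long> timestamps) {
        long[] min = { Long.MAX_VALUE };
        timestamps.forEach(ts -> {
            if (ts < min[0]) {
                min[0] = ts;
            }
        });
        return min[0];
    }
}
```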
Second, there's basically no sense in putting everything in one lambda. Think about it: how is finding the min value logically connected with counting objects grouped by some attribute? It is not, and putting this (somehow) into one stream will just obfuscate your code.
What you should do is write a method that finds the min value and returns its bookType, then stream the collection, group it by bookType and return the size of the group for that key.
It could look like this sketch:
public class Item {
private final long timestamp;
private final String bookType;
// Constructors, getters etc
}
// ...
public int getSizeOfBookTypeByMinTimestamp() {
return items.stream()
.collect(Collectors.groupingBy(Item::getBookType))
.get(getMin(items))
.size();
}
private String getMin(List<Item> items) {
return items
.stream()
.min(Comparator.comparingLong(Item::getTimestamp))
.orElseThrow(NoSuchElementException::new) // handle the empty list; orElse(...) with a fallback Item also works
.getBookType();
}
The best way is to first find the item with the lowest timestamp and then filter the list for items with a matching bookType. So in two steps:
Book first = n.stream().min(Comparator.comparingLong((Book b) -> b.timestamp)).orElseThrow(NoSuchElementException::new);
List<Book> result = n.stream().filter(b -> b.bookType.equals(first.bookType)).collect(Collectors.toList()); // result.size() is the count
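A runnable sketch of the two-pass approach, assuming a stand-in Item with getTimestamp() and getBookType() (the question does not show its class); BookTypeCounter and its methods are hypothetical names. The second method shows the groupingBy variant with Collectors.counting() instead of collecting whole lists.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

// Stand-in for the question's object, with only the two relevant fields.
class Item {
    private final long timestamp;
    private final String bookType;

    Item(long timestamp, String bookType) {
        this.timestamp = timestamp;
        this.bookType = bookType;
    }

    long getTimestamp() { return timestamp; }
    String getBookType() { return bookType; }
}

class BookTypeCounter {
    // Pass 1: bookType of the item with the lowest timestamp (throws on an empty list).
    // Pass 2: count the items sharing that bookType.
    static long countOfEarliestBookType(List<Item> items) {
        String earliestType = items.stream()
                .min(Comparator.comparingLong(Item::getTimestamp))
                .orElseThrow(NoSuchElementException::new)
                .getBookType();
        return items.stream()
                .filter(item -> item.getBookType().equals(earliestType))
                .count();
    }

    // Variant closer to the groupingBy answer: count every bookType, then look up one key.
    static Map<String, Long> countsByBookType(List<Item> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Item::getBookType, Collectors.counting()));
    }
}
```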

Optimising multiple streams to single loop

I am trying to find the best way to optimise the converters below. The flow is: I call 'convertAndGroupForUpdate' first, which triggers the conversions and relevant mappings.
Any help to optimise this code would be massively appreciated.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(List<SimpleRatifiableAction> actions) {
List<GroupedOrderActionUpdateEntity> groupedActions = new ArrayList<>();
Map<String, List<SimpleRatifiableAction>> groupSimple = actions.stream()
.collect(Collectors.groupingBy(x -> x.getOrderNumber() + x.getActionType()));
groupSimple.entrySet().stream()
.map(x -> convertToUpdateGroup(x.getValue()))
.forEachOrdered(groupedActions::add);
return groupedActions;
}
public GroupedOrderActionUpdateEntity convertToUpdateGroup(List<SimpleRatifiableAction> actions) {
List<OrderActionUpdateEntity> actionList = actions.stream().map(x -> convertToUpdateEntity(x)).collect(Collectors.toList());
return new GroupedOrderActionUpdateEntity(
actions.get(0).getOrderNumber(),
OrderActionType.valueOf(actions.get(0).getActionType()),
actions.get(0).getSource(),
12345,
actions.stream().map(SimpleRatifiableAction::getNote)
.collect(Collectors.joining(", ", "Group Order Note: ", ".")),
actionList);
}
public OrderActionUpdateEntity convertToUpdateEntity(SimpleRatifiableAction action) {
return new OrderActionUpdateEntity(action.getId(), OrderActionState.valueOf(action.getState()));
}
You can’t elide a grouping operation, but you don’t need to store the intermediate result in a local variable.
Further, you should not add to a list manually, when you can collect to a List. Just do it like you did in the other method.
Also, creating a grouping key via string concatenation is tempting, but very dangerous: depending on the contents of the properties, the resulting strings may clash. And string concatenation is rather expensive. Just create a list of the property values; as long as you don't modify it, it provides the right equality semantics and hash code implementation.
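To illustrate the clash, a minimal sketch with hypothetical values: two different (orderNumber, actionType) pairs concatenate to the same string, while the list keys stay distinct. GroupingKeys and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

class GroupingKeys {
    // Concatenation loses the boundary between the two properties.
    static String concatKey(String orderNumber, String actionType) {
        return orderNumber + actionType;
    }

    // A list keeps them apart; Arrays.asList implements element-wise equals() and hashCode().
    static List<String> listKey(String orderNumber, String actionType) {
        return Arrays.asList(orderNumber, actionType);
    }
}
```

With these, concatKey("AB", "C") and concatKey("A", "BC") are both "ABC" and would land in the same group, whereas listKey("AB", "C") and listKey("A", "BC") are not equal.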
If you want to process the values of a map only, don't call entrySet() and then map each entry via getValue(); just use values() in the first place.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(
List<SimpleRatifiableAction> actions) {
return actions.stream()
.collect(Collectors.groupingBy( // use List.of(…, …) in Java 9 or newer
x -> Arrays.asList(x.getOrderNumber(), x.getActionType())))
.values().stream()
.map(x -> convertToUpdateGroup(x))
.collect(Collectors.toList());
}
Since convertToUpdateGroup processes the list of actions of each group multiple times, there is not much that can be simplified, and I wouldn't inline it either. If there was only one operation, e.g. joining them to a string, you could do that right in the groupingBy operation, but there is no simple way to collect to multiple results.
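On Java 12 or newer, Collectors.teeing combines two downstream collectors, so two per-group results can be produced inside a single groupingBy. The sketch below is an illustration with hypothetical simplified types (Action, GroupSummary) rather than the real entities, and it is not a drop-in replacement for convertToUpdateGroup.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for SimpleRatifiableAction.
class Action {
    final String orderNumber;
    final String actionType;
    final long id;
    final String note;

    Action(String orderNumber, String actionType, long id, String note) {
        this.orderNumber = orderNumber;
        this.actionType = actionType;
        this.id = id;
        this.note = note;
    }
}

// Hypothetical holder for the two per-group results.
class GroupSummary {
    final String notes;
    final List<Long> ids;

    GroupSummary(String notes, List<Long> ids) {
        this.notes = notes;
        this.ids = ids;
    }
}

class TeeingDemo {
    // Requires Java 12+ (Collectors.teeing) and Java 9+ (List.of).
    static Map<List<String>, GroupSummary> summarize(List<Action> actions) {
        return actions.stream()
                .collect(Collectors.groupingBy(
                        (Action a) -> List.of(a.orderNumber, a.actionType),
                        Collectors.teeing(
                                // first downstream: the joined notes of the group
                                Collectors.mapping((Action a) -> a.note,
                                        Collectors.joining(", ", "Group Order Note: ", ".")),
                                // second downstream: the ids of the group in encounter order
                                Collectors.mapping((Action a) -> a.id, Collectors.toList()),
                                GroupSummary::new)));
    }
}
```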

Efficient ways to traverse and group similar objects from a huge collection

I am currently working on an implementation that basically involves going through an ArrayList of objects, say 1000, finding commonalities in their properties and grouping them.
For example
List<CustomJaxbObj> itemList = ...; // {obj1, obj2, ..., objN}, n can reach 1000
Object attributes - year of registration, location, amount
Grouping criteria - for objects with same year of reg and location...add the amount
If there are 10 objects, out of which 8 have the same location and year of registration, add the amount for all 8, and likewise for the other 2 whose year of registration and location match. So at the end of the operation I am left with 2 objects: 1 holding the total of the 8 matched objects and 1 holding the total of the 2 matched objects.
Currently I am using dual traditional loops. Enhanced for loops would be nicer, but they don't offer much control over indices, which I need to perform the grouping: the indices let me keep track of which individual entries were combined to form each new grouped entry.
for (int i = 0; i < objList.size(); i++) {
for (int j = i + 1; j < objList.size(); j++) {
// perform the check with if/else conditions, traversing the whole list
}
}
Although this does the job, it looks very inefficient and process-heavy, since it compares every pair of objects. Is there a better way to do this? I have seen other answers suggesting Java 8 streams, but the operations are complex, so the grouping has to be done first. I have given an example of doing something when there is a match, but there is more to it than just adding.
Is there a better approach to this? A better data structure to hold data of this kind which makes searching and grouping easier?
Adding more perspective, apologies for not furnishing this info before.
The ArrayList is a collection of JAXB objects from an incoming XML payload.
XML hierarchy:
<Item>
<Item1>
<Item-Loc/>
<ItemID/>
<Item-YearofReg/>
<Item-Details>
<ItemID/>
<Item-RefurbishMentDate/>
<ItemRefurbLoc/>
</Item-Details>
</Item1>
<Item2></Item2>
<Item3></Item3>
....
</Item>
So the JAXB object of Item has a list of 900-1000 items. Each item might have an Item-Details subsection which has a refurbishment date. The problem I face is that dual loops work fine when there is no Item-Details section, and every item can be traversed and checked. The requirement says that if the item has been refurbished, we ignore its year of registration and instead use its year of refurbishment to match the criteria.
Another point is that the item details need not belong to the same item in the section; that is, Item1's details can come up in Item2's Item-Details section. The item ID is the field we use to map each item to its details.
This means I cannot start making changes until I have read through the complete list. A normal for loop could do it, but it would increase the cyclomatic complexity, which has already increased because of the dual loops.
Hence the question: I need a data structure to first store and analyse the list of objects before performing the grouping.
Apologies for not mentioning this before; this is my first question on Stack Overflow, hence the inexperience.
Not 100% sure what your end goal is, but here is something to get you started. To group by the two properties, you can do something like:
Map<String, Map<Integer, List<MyObjectType>>> map = itemList.stream()
.collect(Collectors.groupingBy(MyObjectType::getLoc,
Collectors.groupingBy(MyObjectType::getYear)));
The solution above assumes getLoc returns a String and getYear returns an Integer; you can then perform further stream operations to get the sum you want.
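Carrying that one step further, the sum can be computed in the same collect call by giving the inner groupingBy a summingInt downstream. This sketch assumes a stand-in MyObjectType with an int getAmount(), which is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the item type; getAmount() is an assumption.
class MyObjectType {
    private final String loc;
    private final Integer year;
    private final int amount;

    MyObjectType(String loc, Integer year, int amount) {
        this.loc = loc;
        this.year = year;
        this.amount = amount;
    }

    String getLoc() { return loc; }
    Integer getYear() { return year; }
    int getAmount() { return amount; }
}

class NestedSums {
    // location -> (year -> summed amount), computed in a single pass over the list
    static Map<String, Map<Integer, Integer>> sumByLocAndYear(List<MyObjectType> items) {
        return items.stream()
                .collect(Collectors.groupingBy(MyObjectType::getLoc,
                        Collectors.groupingBy(MyObjectType::getYear,
                                Collectors.summingInt(MyObjectType::getAmount))));
    }
}
```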
You can use a hash map to add up the amounts of elements that have the same year of registration and location.
You can use Collectors.groupingBy(classifier, downstream) with Collectors.summingInt as the downstream collector. You didn't post the class of the objects, so I took the liberty of defining my own, but the idea is the same. I also used AbstractMap.SimpleEntry as the key of the final map.
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class GroupByYearAndLoc {
static class Node {
private Integer year;
private String loc;
private int value;
Node(final Integer year, final String loc, final int value) {
this.year = year;
this.loc = loc;
this.value = value;
}
}
public static void main(String[] args) {
List<Node> nodes = new ArrayList<>();
nodes.add(new Node(2017, "A", 10));
nodes.add(new Node(2017, "A", 12));
nodes.add(new Node(2017, "B", 13));
nodes.add(new Node(2016, "A", 10));
Map<AbstractMap.SimpleEntry<Integer, String>, Integer> sums = nodes.stream()
// group by year and location, then sum the value.
.collect(Collectors.groupingBy(n-> new AbstractMap.SimpleEntry<>(n.year, n.loc), Collectors.summingInt(x->x.value)));
sums.forEach((k, v)->{
System.out.printf("(%d, %s) = %d%n", k.getKey(), k.getValue(), v);
});
}
}
And the output (a HashMap does not guarantee iteration order, so the lines may appear in a different order):
(2017, A) = 22
(2016, A) = 10
(2017, B) = 13
I would make the concatenated "Year+Location" the key in a HashMap, and then let that map hold whatever is associated with each unique key. Then you only need one for loop (no nested looping). That's the simplest approach.
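A sketch of the single-loop, map-keyed approach that also covers the refurbishment rule from the question: first index every refurbishment year by item id (so details found under a different item are still picked up), then make one pass over the items. Item and ItemDetails are hypothetical simplifications of the JAXB types; the sketch assumes the details from all sections were already gathered into one list and that the refurbishment year was extracted from the date. It uses a list key instead of a concatenated string so that different location/year pairs cannot collide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the JAXB classes.
class Item {
    final String itemId;
    final String loc;
    final int yearOfReg;
    final int amount;

    Item(String itemId, String loc, int yearOfReg, int amount) {
        this.itemId = itemId;
        this.loc = loc;
        this.yearOfReg = yearOfReg;
        this.amount = amount;
    }
}

class ItemDetails {
    final String itemId;  // the item these details belong to, which may be a different Item
    final int refurbYear; // assumed to be already extracted from the refurbishment date

    ItemDetails(String itemId, int refurbYear) {
        this.itemId = itemId;
        this.refurbYear = refurbYear;
    }
}

class RefurbAwareGrouping {
    // Key: [location, effective year]; value: summed amount.
    static Map<List<Object>, Integer> sumByLocAndEffectiveYear(List<Item> items, List<ItemDetails> allDetails) {
        // Pass 1: index every refurbishment year by item id, wherever its details appeared.
        Map<String, Integer> refurbYearById = new HashMap<>();
        for (ItemDetails details : allDetails) {
            refurbYearById.put(details.itemId, details.refurbYear); // last one wins if an id repeats
        }
        // Pass 2: a single loop over the items; a refurbishment year replaces the year of registration.
        Map<List<Object>, Integer> sums = new HashMap<>();
        for (Item item : items) {
            int year = refurbYearById.getOrDefault(item.itemId, item.yearOfReg);
            sums.merge(Arrays.<Object>asList(item.loc, year), item.amount, Integer::sum);
        }
        return sums;
    }
}
```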

What is the best way to aggregate Streams into one DISTINCT with Java 8

Suppose I have multiple Java 8 streams, each of which can potentially be converted into a Set<AppStory>. Now I want to aggregate all streams, with the best performance, into one stream that is DISTINCT by ID and sorted by property ("lastUpdate").
There are several ways to do that, but I want the fastest one, for example:
Set<AppStory> appStr1 = StreamSupport.stream(splititerato1, true)
.map(storyId1 -> vertexToStory1(storyId1)).collect(toSet());
Set<AppStory> appStr2 = StreamSupport.stream(splititerato2, true)
.map(storyId2 -> vertexToStory2(storyId2)).collect(toSet());
Set<AppStory> appStr3 = StreamSupport.stream(splititerato3, true)
.map(storyId3 -> vertexToStory3(storyId3)).collect(toSet());
Set<AppStory> set = new HashSet<>();
set.addAll(appStr1);
set.addAll(appStr2);
set.addAll(appStr3);
// ... and then sort by "lastUpdate"
//POJO Object:
public class AppStory implements Comparable<AppStory> {
private String storyId;
// ... many other attributes ...
public String getStoryId() {
return storyId;
}
@Override
public int compareTo(AppStory o) {
return this.getStoryId().compareTo(o.getStoryId());
}
// note: HashSet and distinct() use equals()/hashCode(), not compareTo(), so those must be overridden on storyId as well
}
... but that is the old way.
How can I create ONE stream, DISTINCT by ID and sorted, with the BEST PERFORMANCE?
Something like:
List<AppStory> finalList = distinctStream.sorted(Comparator.comparing(AppStory::getLastUpdate)).collect(toList()); // a List keeps the sort order, a HashSet would not
Any ideas?
BR
Vitaly
I think the parallel overhead is much greater than the actual work, as you stated in the comments, so let your streams do the job sequentially.
FYI: you should prefer Stream::concat over Stream::flatMap here, because before Java 10, flatMap did not propagate short-circuiting into its inner streams, so an operation like Stream::limit could not stop them early.
Stream::sorted collects every element of the stream into a buffer, sorts it and then pushes the elements in the desired order down the pipeline; then the elements are collected again. This can be avoided by collecting the elements into a List and sorting it afterwards. Using a List is a far better choice than a Set because the order matters (I know there is LinkedHashSet, but you can't sort it).
This is, in my opinion, the cleanest and maybe the fastest solution, though we cannot prove that without measuring.
Stream<AppStory> appStr1 =StreamSupport.stream(splititerato1, false)
.map(this::vertexToStory1);
Stream<AppStory> appStr2 =StreamSupport.stream(splititerato2, false)
.map(this::vertexToStory2);
Stream<AppStory> appStr3 =StreamSupport.stream(splititerato3, false)
.map(this::vertexToStory3);
List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
.distinct().collect(Collectors.toList());
// assuming AppStory::getLastUpdateTime is of type `long`
stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));
I can't guarantee that this would be faster than what you have (I guess so, but you'll have to measure to be sure), but you can simply do this, assuming you have 3 streams:
List<AppStory> distinctSortedAppStories =
Stream.of(stream1, stream2, stream3)
.flatMap(Function.identity())
.map(this::vertexToStory)
.distinct()
.sorted(Comparator.comparing(AppStory::getLastUpdate))
.collect(Collectors.toList());
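Both versions rely on distinct(), which uses equals() and hashCode(), not compareTo(). The AppStory in the question only overrides compareTo(), so distinct() would fall back to object identity. If adding equals()/hashCode() on storyId is not an option, a sketch like the following deduplicates by key with Collectors.toMap and then sorts; the minimal AppStory and its long lastUpdate field are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical minimal AppStory without equals()/hashCode(); a long lastUpdate is an assumption.
class AppStory {
    private final String storyId;
    private final long lastUpdate;

    AppStory(String storyId, long lastUpdate) {
        this.storyId = storyId;
        this.lastUpdate = lastUpdate;
    }

    String getStoryId() { return storyId; }
    long getLastUpdate() { return lastUpdate; }
}

class DistinctById {
    // Deduplicates on storyId (the first occurrence wins), then sorts by lastUpdate ascending.
    static List<AppStory> distinctSorted(Stream<AppStory> s1, Stream<AppStory> s2, Stream<AppStory> s3) {
        Map<String, AppStory> byId = Stream.concat(Stream.concat(s1, s2), s3)
                .collect(Collectors.toMap(AppStory::getStoryId, Function.identity(),
                        (first, duplicate) -> first));
        List<AppStory> result = new ArrayList<>(byId.values());
        result.sort(Comparator.comparingLong(AppStory::getLastUpdate));
        return result;
    }
}
```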

Storing data for a linked list in a pair in Java

I am trying to create a linked list that will take a large amount of data, either integers or strings, and get the frequency with which each value occurs. I know how to create a basic linked list that would achieve this, but since the amount of data is so large, I want a quicker way to sort through the data instead of going through the entire linked list every time I call a certain method. To do this I need to make a Pair of <Object, Integer>, where the Object is the data and the Integer is the frequency it occurs.
So far I have tried creating arrays and lists that would help me sort out the data but cannot figure out how to get it into a Pair that represents the data and frequency. If you have any ideas that can help me at least get started that would be much appreciated.
First of all, you must define your own data type, let's say:
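public class FrequencyCount<T> implements Comparable<FrequencyCount<T>>
{
public final T data;
public int frequency;
public FrequencyCount(T data, int frequency) {
this.data = data;
this.frequency = frequency;
}
public int compareTo(FrequencyCount<T> other) {
// choose your natural ordering here; this example orders by frequency
return Integer.compare(frequency, other.frequency);
}
}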
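With such an object everything becomes trivial:
List<FrequencyCount<Some>> data = new ArrayList<FrequencyCount<Some>>(); // fill with one entry per distinct value
Collections.sort(data);
// caution: a TreeSet keeps only one of any elements whose compareTo returns 0
Set<FrequencyCount<Some>> sortedData = new TreeSet<FrequencyCount<Some>>(data);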
You could place all values into a List, create a Set from it and then iterate over the Set to find each value's frequency in the List using Collections.frequency (note that this scans the whole List once per distinct value): http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#frequency(java.util.Collection,%20java.lang.Object)
List<Integer> allValues = ...;
Set<Integer> uniqueValues = new HashSet<Integer>(allValues);
for(Integer val : uniqueValues) {
int frequency = Collections.frequency(allValues, val);
// use val and frequency as key and value as you wish
}
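As a different technique from the linked list in the question, a HashMap from each value to its count does the counting in one pass over the data, and each Map.Entry is the (data, frequency) pair asked for. The class and method names below are hypothetical. Collectors.groupingBy(Function.identity(), Collectors.counting()) is a stream equivalent that produces Long counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyCounter {
    // One pass over the data: merge() stores 1 for a new key or adds 1 to the existing count.
    // Each resulting Map.Entry<T, Integer> is a (value, frequency) pair.
    static <T> Map<T, Integer> countFrequencies(List<T> values) {
        Map<T, Integer> counts = new HashMap<>();
        for (T value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }
}
```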
