I have a list of objects. I need to update a single object from the list that matches my filter. I can do something like below:
List<MyObject> updated = list.stream().map(d -> {
    if (d.id == 1) {
        d.name = "Yahoo";
    }
    return d;
}).collect(Collectors.toList());
But my worry is that I am iterating through the whole list, which may contain up to 20k records. I could use a for loop and then break, but I think that would also be slow.
Is there a more efficient way to do this?
Use findFirst, so that after finding the first matching element in the list the remaining elements will not be processed:
Optional<MyObject> result = list.stream()
        .filter(obj -> obj.getId() == 1)
        .peek(o -> o.setName("Yahoo"))
        .findFirst();
Or
//will not return anything but will update the first matching object name
list.stream()
        .filter(obj -> obj.getId() == 1)
        .findFirst()
        .ifPresent(o -> o.setName("Yahoo"));
You can use a Map instead of a list and save the id as a key.
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Then you can retrieve it in O(1).
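A minimal sketch of that approach (MyObject, getId, and setName are assumed names based on the question's code):

```java
import java.util.HashMap;
import java.util.Map;

class MyObject {
    private final int id;
    private String name;
    MyObject(int id, String name) { this.id = id; this.name = name; }
    int getId() { return id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

class MapLookupDemo {
    // rename the object with the given id, if present; a single O(1) map lookup
    static void rename(Map<Integer, MyObject> byId, int id, String newName) {
        MyObject match = byId.get(id);
        if (match != null) {
            match.setName(newName);
        }
    }

    public static void main(String[] args) {
        Map<Integer, MyObject> byId = new HashMap<>();
        byId.put(1, new MyObject(1, "Google"));
        byId.put(2, new MyObject(2, "Bing"));
        rename(byId, 1, "Yahoo");
        System.out.println(byId.get(1).getName()); // Yahoo
    }
}
```

Note that this only pays off if the list is built once and queried many times; keeping both a List and a Map in sync adds its own bookkeeping.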
It depends on how often you need to perform this logic on your input data.
If it happens to be several times, consider using a Map as suggested by Raz.
You can also transform your List into a Map using a Stream:
Map<Integer, MyObject> map = list.stream()
        .collect(Collectors.toMap(
                MyObject::getId,
                Function.identity()
        ));
The first argument of toMap maps a stream item to the corresponding key in the map (here the ID of MyObject), and the second argument maps an item to the map value (here the MyObject item itself).
Constructing the map will cost you some time and memory, but once you have it, searching an item by ID is extremely fast.
The more often you search an item, the more constructing the map first pays off.
However, if you only ever need to update a single item and then forget about the whole list, just search for the right element, update it and you're done.
If your data is already sorted by ID, you can use binary search to find your item faster.
Otherwise, you need to iterate through your list until you find your item. In terms of performance, nothing will beat a simple loop here. But using Stream and Optional as shown in Deadpool's answer is fine as well, and might result in clearer code, which is more important in most cases.
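If the list really is sorted by ID, the binary search can be done with Collections.binarySearch and a comparator on the ID field. A sketch, with MyObject as an assumed stand-in for the question's type:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BinarySearchDemo {
    static final class MyObject {
        final int id;
        String name;
        MyObject(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
    }

    // requires the list to be sorted by id; returns the index, or a negative
    // value if no element with that id exists
    static int findById(List<MyObject> sortedById, int id) {
        MyObject probe = new MyObject(id, null);
        return Collections.binarySearch(sortedById, probe,
                Comparator.comparingInt(MyObject::getId));
    }

    public static void main(String[] args) {
        List<MyObject> list = new ArrayList<>();
        for (int i = 1; i <= 5; i++) list.add(new MyObject(i, "name" + i));
        int idx = findById(list, 3);
        if (idx >= 0) {
            list.get(idx).name = "Yahoo";
        }
        System.out.println(list.get(2).name); // Yahoo
    }
}
```

This is O(log n) per lookup instead of O(n), but only valid while the sort order holds.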
You can do anything you want with the peek() method, but remember that peek is an intermediate operation: it only runs once a terminal operation (such as collect or forEach) is invoked:
list.stream()
        .peek(t -> t.setTag(t.getTag().replace("/", "")))
        .collect(Collectors.toList());
I'm interested in sorting a list of objects based on a date attribute in that object. I can either use the list sort method:
list.sort( (a, b) -> a.getDate().compareTo(b.getDate()) );
Or I can use the stream sorted method:
List<E> l = list.stream()
.sorted( (a, b) -> a.getDate().compareTo(b.getDate()))
.collect(Collectors.toList());
Out of the two options above, which should we use and why?
I know the former will update my original list and the latter will not update the original, but instead give me a fresh new list object.
I don't care whether my original list gets updated or not. So which one is the better option and why?
If you only need to sort your List, and don't need any other stream operations (such as filtering, mapping, etc...), there's no point in adding the overhead of creating a Stream and then creating a new List. It would be more efficient to just sort the original List.
If you wish to know which is best, your best option is to benchmark it: you may reuse the JMH test from my answer.
It should be noted that:
List::sort uses Arrays::sort: it creates an array before sorting. This does not exist for other Collection types.
Stream::sorted is implemented as a stateful intermediate operation. This means the Stream needs to remember its state.
Without benchmarking, I'd say that:
You should use collection.sort(). It is easier to read: collection.stream().sorted().collect(toList()) is way too long to read, and unless you format your code well, you might get a headache (I exaggerate) before understanding that this line is simply sorting.
sorted() on a Stream should be called:
if you filter many elements, making the Stream effectively smaller in size than the collection (sorting N items and then filtering N items is not the same as filtering N items and then sorting K items, with K <= N);
if you have a map transformation after the sort and you lose the ability to sort using the original key.
If you use your stream with other intermediate operation, then sort might be required / useful:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.sorted() // Stream<U> #2
.map(...) // Stream<V> #3
.collect(toList()) // List<V> sorted by U.
;
In that example, the filter applies before the sort: stream #1 is smaller than #0, so the cost of sorting with the stream might be less than Collections.sort().
If all that you do is simply filtering, you may also use a TreeSet or a collectingAndThen operation:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.collect(toCollection(TreeSet::new))
;
Or:
collection.stream() // Stream<U>
.filter(...) // Stream<U>
.collect(collectingAndThen(toList(), list -> {
    list.sort(Comparator.naturalOrder());
    return list;
})); // List<V>
Streams have some overhead because they create many new objects, such as a concrete Stream, a Collector, and a new List. So if you just want to sort a list and don't care whether the original gets changed, use List.sort.
There is also Collections.sort, which is an older API. The difference between it and List.sort can be found here.
Stream.sorted is useful when you are doing other stream operations alongside sorting.
Your code can also be rewritten with Comparator:
list.sort(Comparator.comparing(YourClass::getDate));
The first one would be better in terms of performance. In the first one, the sort method just compares the elements of the list and orders them. The second one will create a stream from your list, sort it, and create a new list from that stream.
In your case, since you can update the first list, the first approach is better, both in terms of performance and memory consumption. The second one is convenient if you need to end up with a stream, or if you already have a stream and want to end up with a sorted list.
Use the first method:
list.sort((a, b) -> a.getDate().compareTo(b.getDate()));
It's much faster than the second one, and it doesn't create a new intermediate object. Use the second method when you want to perform additional stream operations (e.g. filtering, mapping).
I have a method named find_duplicates(List<DP> dp_list) which takes an ArrayList of my custom data type DP. Each DP has a String named 'ID' which should be unique for each DP.
My method goes through the whole list and adds any DP which does not have a unique ID to another ArrayList, which is returned when the method finishes. It also changes a boolean field isUnique of the DP from true to false.
I want to make this method multi-threaded, since each check of an element is independent of other elements' checks. But for each check the thread would need to read the dp_list. Is it possible to give the read access of the same List to different threads at the same time? Can you suggest a method to make it multithreaded?
Right now my code looks like this-
List<DP> find_duplicates(List<DP> dp_list){
List<DP> dup_list = new ArrayList<>();
for(DP d: dp_list){
    // adds d to dup_list and sets d.isUnique = false if d.ID is not unique
}
return dup_list;
}
List<DP> unique = dp_list.stream().parallel().distinct().collect(Collectors.toList());
Then just find the difference between the original list and the list of unique elements and you have your duplicates.
Obviously you will need a filter if your items are only unique by one of their fields - a quick SO search for "stream distinct by key" can provide a myriad of ways to do that.
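One common shape for such a "distinct by key" filter is a stateful predicate backed by a concurrent map. A sketch, where DP and getId are assumed stand-ins for the question's type:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {
    // stateful predicate: true only for the first element seen with a given key;
    // ConcurrentHashMap makes it safe to use with parallel streams
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    static final class DP {
        final String id;
        DP(String id) { this.id = id; }
        String getId() { return id; }
    }

    public static void main(String[] args) {
        List<DP> dpList = List.of(new DP("a"), new DP("b"), new DP("a"));
        List<DP> unique = dpList.stream()
                .filter(distinctByKey(DP::getId))
                .collect(Collectors.toList());
        System.out.println(unique.size()); // 2
    }
}
```

Note that in a parallel stream there is no guarantee about *which* of several equal-keyed elements survives, only that exactly one does.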
It seems like you want to leverage parallelism where possible. First and foremost, I'd suggest measuring your code, whether with an imperative approach or a sequential stream, and then, if you think going parallel can really improve performance, use a parallel stream. See here for help deciding when to use a parallel stream.
As for accomplishing the task at hand, it can be done as follows:
List<DP> find_duplicates(List<DP> dp_list){
List<DP> dup_list = dp_list.stream() //dp_list.parallelStream()
.collect(Collectors.groupingBy(DP::getId))
.values()
.stream()
.filter(e -> e.size() > 1)
.flatMap(Collection::stream)
.collect(Collectors.toList());
dup_list.forEach(s -> s.setUnique(false));
return dup_list;
}
This will create a stream from the source, then group the elements by their ids, retain all the elements that have a duplicate id, and finally set the isUnique field of each duplicate to false.
There are better ways to do this.
All you need to do is acquire the lock on the list and check whether the item exists, followed by any further processing:
void find_duplicates(List<DP> dp_list, DP item){
synchronized(dp_list){
if(dp_list.contains(item)){
//Set your flags
}
}
}
We have a collection - Map<String,HashSet<String>> requestIdToproductMap
It is meant to map a requestId to the productIds corresponding to that requestId.
Given a set of requestIds and a productId, we iterate through each key (requestId) to find whether the productId is present.
If yes, we remove it and check whether the value set becomes empty.
for (String reqId: requests) {
Set<String> productIdset = requestIdToproductMap.get(reqId);
productIdset.remove(productId);
if (productIdset.isEmpty()) {
//add requestId to discarded list
}
}
Is there a more efficient way to do this - given that this is a high volume operation?
As long as you want to retain your Map<String,HashSet<String>> requestIdToproductMap, and need to remove productId from all of the sets, I don't think there is a much better way to do it. You have to check all the sets of given requests and remove the productId from all of them - this is what you do.
Your code as it stands should actually be quite performant. HashMap.get (if you're using a HashMap) is amortized O(1), and so is HashSet.remove. So the overall performance should be pretty good.
You could maybe consider a different data structure, where instead of mapping a requestId onto a set of productIds you simply store pairs of requestId/productId. Implement a class like RequestIdProductIdPair (don't forget equals(...) and hashCode()). Then store the RequestIdProductIdPairs in a HashSet<RequestIdProductIdPair> requestIdProductIds. You could then simply construct and remove all the given requestId/productId pairs.
Set<String> discardedRequestIds = requests
.stream()
.map(requestId -> new RequestIdProductIdPair(requestId, productId))
.map(requestIdProductIdPair -> {
if (requestIdProductIds.remove(requestIdProductIdPair)) {
return requestIdProductIdPair;
}
else {
return null;
}
})
.filter(Objects::nonNull)
.map(RequestIdProductIdPair::getRequestId)
.collect(Collectors.toSet());
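The pair class itself could look like the following sketch (RequestIdProductIdPair is the hypothetical class named above; equals and hashCode must cover both fields for HashSet removal to work):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical pair class; equals/hashCode must cover both fields
class RequestIdProductIdPair {
    private final String requestId;
    private final String productId;

    RequestIdProductIdPair(String requestId, String productId) {
        this.requestId = requestId;
        this.productId = productId;
    }

    String getRequestId() { return requestId; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RequestIdProductIdPair)) return false;
        RequestIdProductIdPair that = (RequestIdProductIdPair) o;
        return requestId.equals(that.requestId) && productId.equals(that.productId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(requestId, productId);
    }
}

class PairSetDemo {
    public static void main(String[] args) {
        Set<RequestIdProductIdPair> pairs = new HashSet<>();
        pairs.add(new RequestIdProductIdPair("req1", "prodA"));
        pairs.add(new RequestIdProductIdPair("req2", "prodA"));

        // removal is a single O(1) HashSet operation per pair
        boolean removed = pairs.remove(new RequestIdProductIdPair("req1", "prodA"));
        System.out.println(removed + " " + pairs.size()); // true 1
    }
}
```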
Update: I've thought about it a bit more, and on second thought the second option with HashSet<RequestIdProductIdPair> will probably not be better than your code. Removal will probably be somewhat more performant, but this code creates one new RequestIdProductIdPair object per requestId/productId pair. So it will probably end up being worse than your code.
I get an iterable list which I iterate using the following code:
for (IssueField issueObj : issue.getFields())
{
System.out.println(issueObj.getId());
}
the list is of following structure
[IssueField{id=customfield_13061, name=Dev Team Updates, type=null, value=null},
IssueField{id=customfield_13060, name=Development, type=null, value={}},
IssueField{id=customfield_11160, name=Rank, type=null, value=1|i0065r:},
IssueField{id=customfield_13100, name=TM Product, type=null, value=IntelliGlance},
IssueField{id=customfield_11560, name=Release Notes, type=null, value=null},
IssueField{id=customfield_13500, name=Request Type, type=null, value=null},
IssueField{id=customfield_13900, name=Category, type=null, value=null},
IssueField{id=environment, name=Environment, type=null, value=null}]
There are more than 100 such objects in the list. Is there a way I can directly get the desired object's value without iterating over all the values? Currently I'm using something like this, which I think is not efficient:
for (IssueField issueObj : issue.getFields())
{
if (issueObj.getId().equalsIgnoreCase(someId)) {
//Object Found
}
}
If you want to do frequent searches over a large dataset like you said, then you should use a HashMap whose key is the string returned by your getId(). The big-O time complexity for a HashMap lookup is O(1), whereas for a list it is O(n). This gets you the desired efficiency.
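A sketch of that approach: build the map once from the fields list, then look entries up by id. The IssueField class below is a minimal stand-in that only models the id and name fields from the question:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class IssueFieldLookupDemo {
    // minimal stand-in for the IssueField type from the question
    static final class IssueField {
        final String id;
        final String name;
        IssueField(String id, String name) { this.id = id; this.name = name; }
        String getId() { return id; }
        String getName() { return name; }
    }

    // build once; afterwards every lookup by id is O(1)
    static Map<String, IssueField> indexById(List<IssueField> fields) {
        return fields.stream()
                .collect(Collectors.toMap(IssueField::getId, Function.identity()));
    }

    public static void main(String[] args) {
        List<IssueField> fields = List.of(
                new IssueField("customfield_13061", "Dev Team Updates"),
                new IssueField("environment", "Environment"));
        Map<String, IssueField> byId = indexById(fields);
        System.out.println(byId.get("environment").getName()); // Environment
    }
}
```

Beware that Collectors.toMap throws IllegalStateException on duplicate ids; pass a merge function if duplicates are possible.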
If you are using Java 8, you can try this:
<your-object> result1 = <your-list>.stream()
.filter(x -> "jack".equals(x.getId()))
.findAny()
.orElse(null);
First it converts the list to a stream, then filters for the id "jack"; if findAny finds a match it returns that object, otherwise orElse returns null.
If you don't want to iterate the whole Collection to get the desired object, you need to store the objects in the most suitable collection. If you want to get an element by its id, I think the most efficient way is to use a HashMap; then you use the get function to retrieve the desired element.
HashMap implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Source
Is there a short and elegant way of reversing mapping function?
The problem goes like this: I have a bunch of objects of type A, which I need to map to type B, perform filtering on the B objects, and then return back to corresponding A objects.
To provide some specifics and an example:
I have a bulk filtering function
Collection<CarVO> myFilter(Collection<CarVO> originalCollection);
and a Collection<Car> cars, and finally a mapping method Car#toVO().
I came up with this code:
Collection<Car> filtered = cars.stream()
.filter(car -> !myFilter(Collections.singleton(car)).isEmpty())
.collect(toList());
but I suspect this is not efficient because I filter objects one by one.
Another way to do the same thing:
Map<Car, CarVO> map = new HashMap<>();
for (Car car : cars) map.put(car, car.toVO());
Collection<CarVO> filteredVOs = myFilter(map.values());
cars = cars.stream().filter(car -> filteredVOs.contains(map.get(car))).collect(toList());
But I don't fancy building map explicitly.
UPD: Thank you all.
I didn't know that Map.values() returned a live view. I'll stick to using retainAll() then.
Since your method myFilter(Collection<CarVO>) must make a decision about whether each element should be removed or not, and you assume this to work even for a single-element collection, it must be possible to extract the logic into a single-item Predicate. Using that extracted predicate would be the most efficient solution.
But if you can’t do that and are for whatever reason bound to using this existing method, you may use it as follows:
Map<Car, CarVO> map = new HashMap<>();
for (Car car : cars) map.put(car, car.toVO());
map.values().retainAll(myFilter(map.values()));
Set<Car> filtered = map.keySet();
If you really need a list, you may replace the last line with
List<Car> filtered = new ArrayList<>(map.keySet());
Note that this assumes that myFilter returns a new collection. If it applies the filter directly to its collection parameter by removing elements from it, you may replace map.values().retainAll(myFilter(map.values())); with myFilter(map.values());
Use a Predicate for a single Car?
In the Predicate you could transform a Car into a CarVO (if necessary) and then apply the logic for the single element.
And for the sake of the DRY principle, you should use this predicate inside the original myFilter method.
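A sketch of that refactoring, with hypothetical Car/CarVO classes and an arbitrary example rule standing in for the real filtering logic:

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateRefactorDemo {
    static final class CarVO {
        final String model;
        CarVO(String model) { this.model = model; }
    }

    static final class Car {
        final String model;
        Car(String model) { this.model = model; }
        CarVO toVO() { return new CarVO(model); }
    }

    // the single-item rule, extracted (hypothetical: keep models starting with "T")
    static final Predicate<CarVO> KEEP = vo -> vo.model.startsWith("T");

    // the bulk filter now reuses the same predicate (DRY)
    static Collection<CarVO> myFilter(Collection<CarVO> vos) {
        return vos.stream().filter(KEEP).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // filtering Cars directly: map each to a VO and test the shared predicate
        List<Car> cars = List.of(new Car("Tesla"), new Car("Audi"));
        List<Car> filtered = cars.stream()
                .filter(car -> KEEP.test(car.toVO()))
                .collect(Collectors.toList());
        System.out.println(filtered.size()); // 1
    }
}
```

With the predicate extracted, there is no need to build the Car-to-CarVO map at all: each Car is tested directly.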