Make an iterative method multi-threaded - Java

I have a method named find_duplicates(List<DP> dp_list) which takes a List of my custom data type DP. Each DP has a String field named ID which should be unique for each DP.
My method goes through the whole list and adds any DP which does not have a unique ID to another list, which is returned when the method finishes. It also changes the boolean field isUnique of that DP from true to false.
I want to make this method multi-threaded, since each element's check is independent of the other elements' checks. But for each check, a thread would need to read dp_list. Is it possible to give multiple threads read access to the same List at the same time? Can you suggest a way to make this multithreaded?
Right now my code looks like this-
List<DP> find_duplicates(List<DP> dp_list) {
    List<DP> dup_list = new ArrayList<>();
    for (DP d : dp_list) {
        // Adds d to dup_list and sets d.isUnique = false if d.ID is not unique
    }
    return dup_list;
}

List<DP> unique = dp_list.stream().parallel().distinct().collect(Collectors.toList());
Then just find the difference between the original list and the list of unique elements, and you have your duplicates.
Obviously you will need a filter if your items are only unique by one of their fields; a quick SO search for "stream distinct by key" provides a myriad of ways to do that.
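For reference, one widely circulated way to do "distinct by key" is a stateful predicate backed by a concurrent map. A minimal sketch (the helper name and the use of the string itself as the key are just for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKey {
    // Returns a predicate that is true only the first time a given key is seen.
    // The ConcurrentHashMap makes it safe to use with parallel streams.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "a", "c", "b");
        // Keep only the first occurrence of each key (here, the string itself).
        List<String> unique = ids.stream()
                .filter(distinctByKey(Function.identity()))
                .collect(Collectors.toList());
        System.out.println(unique); // [a, b, c]
    }
}
```

With a DP list you would pass something like `distinctByKey(DP::getId)` instead of `Function.identity()`.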

It seems like you want to leverage parallelism where possible. First and foremost, I'd suggest measuring your code, whether with an imperative approach or a sequential stream, and only if you think going parallel can really improve performance should you use a parallel stream. See here for help deciding when to use a parallel stream.
As for accomplishing the task at hand, it can be done as follows:
List<DP> find_duplicates(List<DP> dp_list) {
    List<DP> dup_list = dp_list.stream() // or dp_list.parallelStream()
            .collect(Collectors.groupingBy(DP::getId))
            .values()
            .stream()
            .filter(e -> e.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
    dup_list.forEach(s -> s.setUnique(false));
    return dup_list;
}
This creates a stream from the source, groups the elements by their IDs, retains only the elements that share an ID with another element, and finally sets the isUnique field of each duplicate to false.
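If measurement does favor going parallel, a concurrent grouping collector avoids merging per-thread maps. A self-contained sketch; the DP class below is a simplified stand-in for the one in the question:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class FindDuplicatesParallel {
    // Simplified stand-in for the question's DP type.
    static class DP {
        private final String id;
        private boolean unique = true;
        DP(String id) { this.id = id; }
        String getId() { return id; }
        boolean isUnique() { return unique; }
        void setUnique(boolean u) { unique = u; }
    }

    static List<DP> findDuplicates(List<DP> dpList) {
        List<DP> dupList = dpList.parallelStream()
                // groupingByConcurrent accumulates into a single ConcurrentMap
                // instead of merging one map per worker thread.
                .collect(Collectors.groupingByConcurrent(DP::getId))
                .values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
        dupList.forEach(d -> d.setUnique(false));
        return dupList;
    }

    public static void main(String[] args) {
        List<DP> list = Arrays.asList(new DP("x"), new DP("y"), new DP("x"));
        System.out.println(findDuplicates(list).size()); // 2
        System.out.println(list.get(1).isUnique());      // true (its id is unique)
    }
}
```

Note that the source list is only read by the stream; the writes happen afterwards in `forEach`, so no locking of dp_list is needed.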

Another way you can do this:
Acquire the lock on the list, check whether the item exists, and then do any further processing.
void find_duplicates(List<DP> dp_list, DP item) {
    synchronized (dp_list) {
        if (dp_list.contains(item)) {
            // Set your flags
        }
    }
}

Update a single object from a list using stream

I have a list of objects. I need to update a single object from the list that matches my filter. I can do something like below:
List<MyObject> updated = list.stream().map(d -> {
    if (d.id == 1) {
        d.name = "Yahoo";
    }
    return d;
}).collect(Collectors.toList());
But my worry is that I am iterating through the whole list, which may contain up to 20k records. I could do a for loop and then break, but I think that would also be slow.
Is there any efficient way to do this?
Use findFirst so that after the first matching element is found, the remaining elements are not processed:
Optional<MyObject> result = list.stream()
        .filter(obj -> obj.getId() == 1)
        .peek(o -> o.setName("Yahoo"))
        .findFirst();
Or
// will not return anything, but will update the first matching object's name
list.stream()
        .filter(obj -> obj.getId() == 1)
        .findFirst()
        .ifPresent(o -> o.setName("Yahoo"));
You can use a Map instead of a list and use the id as the key.
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Then you can retrieve an item in O(1).
It depends on how often you need to perform this logic on your input data.
If it happens to be several times, consider using a Map as suggested by Raz.
You can also transform your List into a Map using a Stream:
Map<Integer, MyObject> map = list.stream()
        .collect(Collectors.toMap(
                MyObject::getId,
                Function.identity()
        ));
The first argument of toMap maps a stream item to the corresponding key in the map (here the ID of MyObject), and the second argument maps an item to the map value (here the MyObject item itself).
Constructing the map will cost you some time and memory, but once you have it, searching an item by ID is extremely fast.
The more often you search an item, the more constructing the map first pays off.
However, if you only ever need to update a single item and then forget about the whole list, just search for the right element, update it and you're done.
If your data is already sorted by ID, you can use binary search to find your item faster.
Otherwise, you need to iterate through your list until you find your item. In terms of performance, nothing will beat a simple loop here. But using Stream and Optional as shown in Deadpool's answer is fine as well, and might result in clearer code, which is more important in most cases.
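For the sorted-by-ID case, Collections.binarySearch with a comparator gives the O(log n) lookup. A minimal sketch; the MyObject fields mirror the question, and the dummy key object carrying only the id is an illustration choice:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class UpdateById {
    static class MyObject {
        int id;
        String name;
        MyObject(int id, String name) { this.id = id; this.name = name; }
    }

    public static void main(String[] args) {
        // The list must already be sorted by id for binarySearch to be valid.
        List<MyObject> list = new ArrayList<>(Arrays.asList(
                new MyObject(1, "a"), new MyObject(2, "b"), new MyObject(3, "c")));
        // Search using a throwaway key object that carries only the id.
        int idx = Collections.binarySearch(list, new MyObject(2, null),
                Comparator.comparingInt((MyObject o) -> o.id));
        if (idx >= 0) {
            list.get(idx).name = "Yahoo";
        }
        System.out.println(list.get(1).name); // Yahoo
    }
}
```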
list.stream()
        .peek(t -> t.setTag(t.getTag().replace("/", "")))
        .collect(Collectors.toList()); // peek only runs once a terminal operation is invoked
Do anything you want with the peek() method.

Iterating values of a Set

We have a collection - Map<String, HashSet<String>> requestIdToproductMap
It maps a requestId to the productIds corresponding to that requestId.
Given a set of requestIds and a productId, we iterate through each key (requestId) to find whether the productId is present.
If yes, we remove it and check whether the value set becomes empty.
for (String reqId : requests) {
    Set<String> productIdset = requestIdToproductMap.get(reqId);
    productIdset.remove(productId);
    if (productIdset.isEmpty()) {
        // add requestId to discarded list
    }
}
Is there a more efficient way to do this - given that this is a high volume operation?
As long as you want to retain your Map<String, HashSet<String>> requestIdToproductMap and need to remove productId from all of the sets, I don't think there is a much better way to do it. You have to check all the sets of the given requests and remove the productId from each of them, which is what you do.
Your code as it stands should actually be quite performant. HashMap.get (if you're using a HashMap) is amortized O(1), and so is HashSet.remove. So the overall performance should be pretty good.
You could maybe consider a different data structure, where instead of mapping each requestId onto a set of productIds you simply store pairs of requestId/productId. Implement a class like RequestIdProductIdPair (don't forget equals(...) and hashCode()), then store the pairs in a HashSet<RequestIdProductIdPair> requestIdProductIds. You could then simply construct and remove all the given requestId/productId pairs.
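A minimal sketch of such a pair class (the class name and getters are as described; the value-equality contract is the important part, since removal from a HashSet relies on it):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PairDemo {
    static final class RequestIdProductIdPair {
        private final String requestId;
        private final String productId;

        RequestIdProductIdPair(String requestId, String productId) {
            this.requestId = requestId;
            this.productId = productId;
        }

        String getRequestId() { return requestId; }
        String getProductId() { return productId; }

        // Value equality so HashSet.remove works on a freshly constructed pair.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RequestIdProductIdPair)) return false;
            RequestIdProductIdPair that = (RequestIdProductIdPair) o;
            return requestId.equals(that.requestId) && productId.equals(that.productId);
        }

        @Override
        public int hashCode() { return Objects.hash(requestId, productId); }
    }

    public static void main(String[] args) {
        Set<RequestIdProductIdPair> pairs = new HashSet<>();
        pairs.add(new RequestIdProductIdPair("r1", "p1"));
        // A newly constructed, equal pair removes the stored one.
        System.out.println(pairs.remove(new RequestIdProductIdPair("r1", "p1"))); // true
    }
}
```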
Set<String> discardedRequestIds = requests
        .stream()
        .map(requestId -> new RequestIdProductIdPair(requestId, productId))
        .map(requestIdProductIdPair -> {
            if (requestIdProductIds.remove(requestIdProductIdPair)) {
                return requestIdProductIdPair;
            }
            else {
                return null;
            }
        })
        .filter(Objects::nonNull)
        .map(RequestIdProductIdPair::getRequestId)
        .collect(Collectors.toSet());
Update: I've thought about it a bit more, and on second thought the second option with HashSet<RequestIdProductIdPair> will probably not be better than your code. Removal will probably be somewhat more performant, but this code creates one new RequestIdProductIdPair object per requestId/productId pair, so it will likely end up being worse than your code.

Using Java 8 Stream Reduce to return List after performing operation on each element using previous elements values

I'm new to Streams and reduce, so I'm trying it out and have hit a problem:
I have a list of counters which have a start counter and an end counter. The start counter of an item is always the end counter of the previous one. I have a list of these counters, listItems, which I want to loop through efficiently, filter out inactive records, and then reduce into a new List where all the start counters are set. I have the following code:
List<CounterChain> active = listItems.stream()
        .filter(e -> e.getCounterStatus() == CounterStatus.ACTIVE)
        .reduce(new ArrayList<CounterChain>(), (a,b) -> { b.setStartCounter(a.getEndCounter()); return b; });
But it doesn't really work, and I'm kind of stuck. Can anyone give me a few suggestions to help me get this working? Or is there an equally efficient, better way to do this? Thanks!
A reduction reduces all elements to a single value. Using a reduction function of the (a,b) -> b form will reduce all elements to the last one, so it's not appropriate when you want to get a List containing all (matching) elements.
Besides that, you are performing a modification of the input value, which violates the contract of that operation. Note further that the function is required to be associative, i.e. it shouldn't matter whether the stream performs f(f(e₁,e₂),e₃) or f(e₁,f(e₂,e₃)) when processing three subsequent stream elements with your reduction function.
Or, to put it in one line, you are not using the right tool for the job.
The cleanest solution is not to mix these unrelated operations:
List<CounterChain> active = listItems.stream()
        .filter(e -> e.getCounterStatus() == CounterStatus.ACTIVE)
        .collect(Collectors.toList());
for (int ix = 1, num = active.size(); ix < num; ix++)
    active.get(ix).setStartCounter(active.get(ix - 1).getEndCounter());
The second loop could also be implemented using forEach, but it would require an inner class due to its stateful nature:
active.forEach(new Consumer<CounterChain>() {
    CounterChain last;
    public void accept(CounterChain next) {
        if (last != null) next.setStartCounter(last.getEndCounter());
        last = next;
    }
});
Or, using an index based stream:
IntStream.range(1, active.size())
        .forEach(ix -> active.get(ix).setStartCounter(active.get(ix - 1).getEndCounter()));
But neither has much advantage over a plain for loop.
Although the solution with a plain for loop provided by @Holger is good enough, I would like to recommend trying a third-party library for this kind of common problem, for example StreamEx or jOOL. Here is a solution using StreamEx:
StreamEx.of(listItems)
        .filter(e -> e.getCounterStatus() == CounterStatus.ACTIVE)
        .scanLeft((a,b) -> { b.setStartCounter(a.getEndCounter()); return b; });

Java sets vs lists

Can someone suggest a data type/structure in Java that satisfies:
1) no fixed size
2) does not automatically sort data - data should be stored in the order in which it arrives
3) stores only unique entries
4) its elements are accessible, or at least the first element should be!
Linked lists are not able to maintain unique entries.
I tried working with Sets, but a Set changes the order of my data automatically, which I don't want to happen.
So I am now trying to work my way with LinkedHashSet, but I am not able to find a way to access its first element for comparison.
Any suggestions please. Thanks!
You can use LinkedHashSet if you don't want to write your own structure. Getting elements may be kind of tricky; try this:
Integer lastInteger = set.stream().skip(set.size() - 1).findFirst().get();
This gets the last element; if you want a different element you need to skip a different count. This is only one of the ways - you can also get an iterator and iterate yourself, etc. Remember to override hashCode and equals when working with sets.
LinkedHashSet is the right data structure for your requirements.
You can access the first element like so:
Set<String> set = new LinkedHashSet<>();
set.add("a");
set.add("b"); // and so on

// Retrieve first element
// Will throw NoSuchElementException if set is empty
String firstElement = set.iterator().next();

// Retrieve and remove first element
Iterator<String> i = set.iterator();
String otherFirstElement = i.next();
i.remove();
For accessing other elements, see the answer from @Whatzs.
If I understand your question properly, you are looking for a data structure that combines the properties of a Set and an ArrayList, a kind of "ArraySet".
I haven't found anything in core Java for that, but it looks like the Android SDK has such a data structure:
https://developer.android.com/reference/android/util/ArraySet.html
https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/util/ArraySet.java
One solution might be to build your own based on the Android implementation.

Filtering List without using iterator

I need to filter a List of size 1000 or more and get a sublist out of it.
I don't want to use an iterator.
1) At present I am iterating the List and comparing in Java. This is a time-consuming task, and I need to increase the performance of my code.
2) I also tried to use Google Collections (Guava), but I think it will also iterate in the background.
Predicate<String> validList = new Predicate<String>() {
    public boolean apply(String aid) {
        return aid.contains("1_15_12");
    }
};
Collection<String> finalList = com.google.common.collect.Collections2.filter(collection, validList);
Can anyone suggest how I can get the sublist faster without iterating, or, if an iterator is used, how I can get the result comparatively faster?
Consider what happens if you call size() on your sublist. That has to check every element, as every element may change the result.
If you have a very specialized way of using your list which means you don't touch every element in it, don't use random access, etc, perhaps you don't want the List interface at all. If you could tell us more about what you're doing, that would really help.
A List is an ordered collection of objects, so you must iterate it in order to filter.
To expand on my comment:
I think an iterator is inevitable during filtering, as each element has to be checked.
Regarding Collections2.filter, it's different from a simple filter: the returned Collection is still "predicated". That means an IllegalArgumentException will be thrown if an element that does not satisfy the predicate is added to it.
If performance is really your concern, most probably the predicate is pretty slow. What you can do is Lists.partition your list, filter in parallel (you have to write this yourself), and then concatenate the results.
There might be better ways to solve your problem, but we would need more information about the predicate and the data in the List.
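On Java 8+, the partition-filter-concatenate idea described above is essentially what a parallel stream does internally. A minimal sketch using the question's predicate (the sample data is invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelFilter {
    public static void main(String[] args) {
        List<String> aids = Arrays.asList("1_15_12_a", "2_10_b", "1_15_12_c", "3_7_d");
        // The stream framework splits the list into chunks, filters them on the
        // common fork/join pool, and concatenates results in encounter order.
        List<String> finalList = aids.parallelStream()
                .filter(aid -> aid.contains("1_15_12"))
                .collect(Collectors.toList());
        System.out.println(finalList); // [1_15_12_a, 1_15_12_c]
    }
}
```

Note that the predicate still runs on every element; parallelism only spreads that work across cores, it does not avoid the iteration.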
