We have a collection - Map<String,HashSet<String>> requestIdToproductMap
It is meant to map a requestId to the productIds corresponding to the requestId.
Given a set of requestIds and a productId, we iterate through each key (requestId) to find whether the productId is present.
If yes, we remove it and check whether the value set becomes empty.
for (String reqId : requests) {
    Set<String> productIdset = requestIdToproductMap.get(reqId);
    productIdset.remove(productId);
    if (productIdset.isEmpty()) {
        // add requestId to discarded list
    }
}
Is there a more efficient way to do this - given that this is a high volume operation?
As long as you want to retain your Map<String,HashSet<String>> requestIdToproductMap, and need to remove productId from all of the sets, I don't think there is a much better way to do it. You have to check all the sets of given requests and remove the productId from all of them - this is what you do.
Your code as it stands should actually be quite performant. HashMap.get (if you're using a HashMap) is amortized O(1), and so is HashSet.remove. So the overall performance should be pretty good.
You could maybe consider a different data structure, where instead of mapping a requestId onto a set of productIds you simply store pairs of requestId/productId. Implement a class like RequestIdProductIdPair (don't forget equals(...) and hashCode()). Then store the RequestIdProductIdPairs in a HashSet<RequestIdProductIdPair> requestIdProductIds. You could then simply construct and remove all the given requestId/productId pairs.
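A minimal sketch of such a pair class (field names and accessors are illustrative, chosen to match the stream below):

import java.util.Objects;

public final class RequestIdProductIdPair {
    private final String requestId;
    private final String productId;

    public RequestIdProductIdPair(String requestId, String productId) {
        this.requestId = requestId;
        this.productId = productId;
    }

    public String getRequestId() {
        return requestId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RequestIdProductIdPair)) return false;
        RequestIdProductIdPair other = (RequestIdProductIdPair) o;
        return requestId.equals(other.requestId) && productId.equals(other.productId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(requestId, productId);
    }
}

With that in place, the removal could look like this: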
Set<String> discardedRequestIds = requests
        .stream()
        .map(requestId -> new RequestIdProductIdPair(requestId, productId))
        .filter(requestIdProductIds::remove) // true only if the pair was present and removed
        .map(RequestIdProductIdPair::getRequestId)
        .collect(Collectors.toSet());
Update: I've thought about it a bit more, and on second thought I think the second option with HashSet<RequestIdProductIdPair> will probably not be better than your code. Removal will probably be somewhat more performant, but this code creates one new RequestIdProductIdPair object per requestId/productId pair, so it will probably end up being worse than your code.
Related
I have a list of employees. Each employee has a unique identifier, id. I have an employeeId and have to check whether the employee represented by that employeeId is inside the list. There are two ways I can think of doing it; which of them is better? Is there a performance difference?
1)
if (employees.stream().map(Employee::getId).collect(Collectors.toList()).contains(employeeId)) {
    // do something
}
2)
boolean employeeIsInsideTheList = false;
for (Employee employee : employees) {
    if (employee.getId() == employeeId) {
        employeeIsInsideTheList = true;
    }
}
if (employeeIsInsideTheList) {
    // do something
}
Your Stream version defeats the purpose of Streams, since it doesn't take advantage of lazy evaluation and short circuiting. You are doing two full iterations - the first to transform the List of Employees to a List of IDs, and the second to search for a specific ID in the List of IDs (via the contains() method).
A better solution would be to search for a matching ID without building a List of IDs:
if (employees.stream().anyMatch(e -> e.getId().equals(employeeId))) {
    // do something
}
Your for loop solution can be similarly improved by breaking out of the loop once a matching identifier is found.
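For example (comparing with equals(), as in the stream version above):

boolean employeeIsInsideTheList = false;
for (Employee employee : employees) {
    if (employee.getId().equals(employeeId)) {
        employeeIsInsideTheList = true;
        break; // stop scanning as soon as a match is found
    }
}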
After the improvements, whether or not there is a performance difference is meaningless if the List is relatively small. I'd prefer the Streams version, which is shorter and more readable.
If the List is large, and performance is an issue, I suggest you measure the performance of both solutions to find out which one is faster.
I have a list of objects. I need to update a single object from the list that match my filter. I can do something like below:
List<MyObject> updated = list.stream().map(d -> {
    if (d.id == 1) {
        d.name = "Yahoo";
    }
    return d;
}).collect(Collectors.toList());
But my worry is that I am iterating through the whole list, which may be up to 20k records. I could do a for loop and then break, but I think that would also be slow.
Is there any efficient way to do this?
Use findFirst so that, after the first matching element is found, the remaining elements will not be processed:
Optional<MyObject> result = list.stream()
        .filter(obj -> obj.getId() == 1)
        .peek(o -> o.setName("Yahoo"))
        .findFirst();
Or
//will not return anything but will update the first matching object name
list.stream()
        .filter(obj -> obj.getId() == 1)
        .findFirst()
        .ifPresent(o -> o.setName("Yahoo"));
You can use a Map instead of a list and save the id as a key.
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Then you can look the object up in O(1).
It depends on how often you need to perform this logic on your input data.
If it happens to be several times, consider using a Map as suggested by Raz.
You can also transform your List into a Map using a Stream:
Map<Integer, MyObject> map = list.stream()
        .collect(Collectors.toMap(
                MyObject::getId,
                Function.identity()
        ));
The first argument of toMap maps a stream item to the corresponding key in the map (here the ID of MyObject), and the second argument maps an item to the map value (here the MyObject item itself).
Constructing the map will cost you some time and memory, but once you have it, searching an item by ID is extremely fast.
The more often you search an item, the more constructing the map first pays off.
However, if you only ever need to update a single item and then forget about the whole list, just search for the right element, update it and you're done.
If your data is already sorted by ID, you can use binary search to find your item faster.
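A minimal sketch of that, assuming the list is sorted by ascending id and that MyObject has a constructor taking an id (both assumptions, for illustration only):

// probe object carrying only the id we are searching for
MyObject probe = new MyObject(1);
int index = Collections.binarySearch(list, probe, Comparator.comparingInt(MyObject::getId));
if (index >= 0) {
    list.get(index).setName("Yahoo");
}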
Otherwise, you need to iterate through your list until you find your item. In terms of performance, nothing will beat a simple loop here. But using Stream and Optional as shown in Deadpool's answer is fine as well, and might result in clearer code, which is more important in most cases.
.stream().peek(t -> t.setTag(t.getTag().replace("/", "")));
You can do anything you want with the peek() method.
I have a method named find_duplicates(List<DP> dp_list) which takes an ArrayList of my custom data type DP. Each DP has a String named 'ID' which should be unique for each DP.
My method goes through the whole list and adds any DP which does not have a unique ID to another ArrayList, which is returned when the method finishes. It also changes a boolean field isUnique of the DP from true to false.
I want to make this method multithreaded, since each check of an element is independent of the other elements' checks. But for each check, the thread would need to read dp_list. Is it possible to give multiple threads read access to the same List at the same time? Can you suggest a way to make it multithreaded?
Right now my code looks like this:
List<DP> find_duplicates(List<DP> dp_list) {
    List<DP> dup_list = new ArrayList<>();
    for (DP d : dp_list) {
        // adds d to dup_list and sets d.isUnique = false if d.ID is not unique
    }
    return dup_list;
}
List<DP> unique = dp_list.stream().parallel().distinct().collect(Collectors.toList());
Then just find the difference between the original list and the list of unique elements and you have your duplicates.
Obviously you will need a filter if your items are only unique by one of their fields - a quick SO search for "stream distinct by key" can provide a myriad of ways to do that.
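One common shape such a filter takes (this helper is an assumption, not part of the JDK; it keeps the first element seen for each key and is safe for parallel streams):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    // putIfAbsent returns null only for the first occurrence of a key
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

You would then use it as dp_list.parallelStream().filter(distinctByKey(DP::getId)).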
It seems like you want to leverage parallelism where possible. First and foremost, I'd suggest measuring your code, whether it uses an imperative approach or a sequential stream, and then, if you think going parallel can really improve performance, use a parallel stream. See here for help deciding when to use a parallel stream.
As for accomplishing the task at hand, it can be done as follows:
List<DP> find_duplicates(List<DP> dp_list){
List<DP> dup_list = dp_list.stream() //dp_list.parallelStream()
.collect(Collectors.groupingBy(DP::getId))
.values()
.stream()
.filter(e -> e.size() > 1)
.flatMap(Collection::stream)
.collect(Collectors.toList());
dup_list.forEach(s -> s.setUnique(false));
return dup_list;
}
This creates a stream from the source, groups the elements by their ids, retains all the elements that have a duplicate id, and finally sets the isUnique field to false.
There are better ways in which you can do this.
All you need to do is acquire the lock on the list, check whether the item exists, and then do any further processing.
void find_duplicates(List<DP> dp_list, DP item) {
    synchronized (dp_list) {
        if (dp_list.contains(item)) {
            // set your flags
        }
    }
}
I am trying to achieve the best performance for my app. At some point in the code, I want to retrieve all the values from a map except the one that corresponds to a specific key.
Now, if I wanted to retrieve all the values I would use this:
map.values();
and assuming that the TreeMap class is implemented efficiently, the values() method just returns a reference, so it is O(1).
In my case though I want to exclude the value of a specific key. This code:
Set<String> set = new ...
for (String key : map.keySet()) {
    if (!key.equals("badKey")) {
        set.add(map.get(key));
    }
}
has a complexity of O(N log N), which is much slower than the initial O(1), and this is caused by the need to exclude only one value.
Is there a better way to do this?
You can use entrySet instead of keySet. This way it would take O(1) to find out if a given value belongs to the key you wish to exclude.
You can call entrySet any time you need to iterate over the values, and exclude the bad key while iterating over them. This would give you the same complexity as iterating over the values() Collection would.
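A minimal sketch of that iteration, assuming a Map<String, String> as in the question:

Set<String> set = new HashSet<>();
for (Map.Entry<String, String> entry : map.entrySet()) {
    if (!entry.getKey().equals("badKey")) {
        set.add(entry.getValue());
    }
}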
How about this?
map.entrySet().stream()
.filter(e -> !e.getKey().equals(keyToFilter))
.map(Map.Entry::getValue);
Finish with either forEach or collect(Collectors.toSet()), or simply return the stream.
Sorry if the code doesn't compile exactly, it's from memory and I haven't touched the java 8 APIs in a few months, but you should get the drift. ;)
You can create a set from map.values() and then remove the "badKey" value from this set.
Set<String> set = new HashSet<String>(map.values());
String badValue = map.get("badKey");
set.remove(badValue);
I have a large map of String -> Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an ArrayList of pair (key, value) objects and then sorting it using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, to find the highest 5 values in a Map, you can do that in O(n) time, whereas any sort is slower than that.
The easiest way is to simply do a for loop through the entry set of the Map.
// smallestMaxSoFar and updateListOfMaximums() are placeholders for your
// own bookkeeping of the current top five
for (Entry<String, Integer> entry : map.entrySet()) {
    if (entry.getValue() > smallestMaxSoFar) {
        updateListOfMaximums();
    }
}
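One concrete way to realize that bookkeeping (a sketch using a min-heap of the k largest entries seen so far; the method name is illustrative):

import java.util.*;

static List<Map.Entry<String, Integer>> topK(Map<String, Integer> map, int k) {
    // min-heap ordered by value: the root is the smallest of the current candidates
    PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue());
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        heap.offer(entry);
        if (heap.size() > k) {
            heap.poll(); // evict the smallest, keeping only the k largest
        }
    }
    List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
    result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
    return result;
}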
You could use two Maps:
// Map name to value
Map<String, Integer> byName
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
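A sketch of such a wrapper class (the class and method names below are illustrative assumptions, not prescribed by the answer):

import java.util.*;

class RankedMap {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Collection<String>> byValue = new TreeMap<>();

    public void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            // remove the stale reverse mapping before adding the new one
            Collection<String> names = byValue.get(old);
            names.remove(name);
            if (names.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new HashSet<>()).add(name);
    }

    // returns up to n names, highest values first
    public List<String> top(int n) {
        List<String> result = new ArrayList<>(n);
        for (Collection<String> names : byValue.descendingMap().values()) {
            for (String name : names) {
                if (result.size() == n) {
                    return result;
                }
                result.add(name);
            }
        }
        return result;
    }
}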
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
public List<Integer> getMaxFive() {
    Map<String, Integer> copy;
    synchronized (lockObject) {
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copy as usual and build the result list from it
    return list;
}
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
// note: this destructively removes the maxima from the map passed in,
// so hand it a copy if you still need the original
private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    int[] max = new int[qty];
    for (int a = 0; a < qty; a++) {
        max[a] = Collections.max(map.values());
        map.values().removeAll(Collections.singleton(max[a]));
        if (map.size() == 0)
            break;
    }
    return max;
}
Taking advantage of Collections.max() and Collections.singleton()
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of the n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed, small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo method (or compare, in the comparator) should be defined like this:
@Override
public int compareTo(MyClass other) {
    return Integer.compare(other.value, this.value); // descending
}
The rest is easy. Using a List and invoking the Collections.sort(...) method will do the sorting part.
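For instance, given a List<MyClass> entries already built from the map (a sketch relying on the descending compareTo above):

Collections.sort(entries); // descending by value, per compareTo
List<MyClass> topFive = entries.subList(0, Math.min(5, entries.size()));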
I don't know what sorting algorithm Collections.sort(...) uses, but if you expect more data to arrive over time, you may want an insertion sort, since it is good for nearly sorted data and works online.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.