I want to periodically iterate over a ConcurrentHashMap while removing entries, like this:
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
// do something
iter.remove();
}
The problem is that another thread may be updating or modifying values while I'm iterating. If that happens, those updates can be lost forever, because my thread only sees stale values while iterating, but the remove() will delete the live entry.
After some consideration, I came up with this workaround:
map.forEach((key, value) -> {
// delete if value is up to date, otherwise leave for next round
if (map.remove(key, value)) {
// do something
}
});
One problem with this is that it won't catch modifications to mutable values that don't implement equals() (such as AtomicInteger). Is there a better way to safely remove with concurrent modifications?
Your workaround works, but there is one potential problem: if certain entries are updated constantly, map.remove(key, value) may never return true until the updates stop.
If you use JDK 8, here is my solution:
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
map.compute(entry.getKey(), (k, v) -> f(v));
//do something for prevValue
}
....
private Integer prevValue;
private Integer f(Integer v){
prevValue = v;
return null;
}
compute() applies f(v) to the current value; in our case f stores the value in the prevValue field and returns null, which removes the entry.
According to the Javadoc, the whole invocation is atomic:
Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.
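As a rough sketch of the same compute() idea without a shared field: the remapping function runs on the calling thread, so it can collect the removed value into a local list instead. The names ComputeDrain and drain are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, for illustration only.
class ComputeDrain {
    // Removes every entry the key-set iterator sees and returns the values
    // that were actually removed, i.e. the live value at removal time.
    static <K, V> List<V> drain(ConcurrentHashMap<K, V> map) {
        List<V> removed = new ArrayList<>();
        for (K key : map.keySet()) {
            // compute() is atomic with respect to other updates of this key.
            map.compute(key, (k, v) -> {
                if (v != null) {
                    removed.add(v); // v is the current value, not a stale one
                }
                return null;        // returning null removes the mapping
            });
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        map.put(1, 10);
        map.put(2, 20);
        System.out.println(drain(map).size()); // 2
        System.out.println(map.isEmpty());     // true
    }
}
```

Entries added by other threads after the iterator has passed their position are simply left for the next round.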
Your workaround is actually pretty good. There are other facilities on top of which you can build a somewhat similar solution (e.g. using computeIfPresent() and tombstone values), but they have their own caveats and I have used them in slightly different use-cases.
As for using a type that doesn't implement equals() for the map values, you can use your own wrapper on top of the corresponding type. That's the most straightforward way to inject custom semantics for object equality into the atomic replace/remove operations provided by ConcurrentMap.
Update
Here's a sketch that shows how you can build on top of the ConcurrentMap.remove(Object key, Object value) API:
Define a wrapper type on top of the mutable type you use for the values, also defining your custom equals() method building on top of the current mutable value.
In your BiConsumer (the lambda you're passing to forEach), create a deep copy of the value (which is of type your new wrapper type) and perform your logic determining whether the value needs to be removed on the copy.
If the value needs to be removed, call remove(myKey, myValueCopy).
If there have been some concurrent changes while you were calculating whether the value needs to be removed, remove(myKey, myValueCopy) will return false (barring ABA problems, which are a separate topic).
Here's some code illustrating this:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
public class Playground {
private static class AtomicIntegerWrapper {
private final AtomicInteger value;
AtomicIntegerWrapper(int value) {
this.value = new AtomicInteger(value);
}
public void set(int value) {
this.value.set(value);
}
public int get() {
return this.value.get();
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(obj instanceof AtomicIntegerWrapper)) {
return false;
}
AtomicIntegerWrapper other = (AtomicIntegerWrapper) obj;
if (other.value.get() == this.value.get()) {
return true;
}
return false;
}
public static AtomicIntegerWrapper deepCopy(AtomicIntegerWrapper wrapper) {
int wrapped = wrapper.get();
return new AtomicIntegerWrapper(wrapped);
}
}
private static final ConcurrentMap<Integer, AtomicIntegerWrapper> MAP
= new ConcurrentHashMap<>();
private static final int NUM_THREADS = 3;
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 10; ++i) {
MAP.put(i, new AtomicIntegerWrapper(1));
}
Thread.sleep(1);
for (int i = 0; i < NUM_THREADS; ++i) {
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem = MAP.get(key);
if (elem == null) {
System.out.println("Oops...");
} else if (elem.get() == 1986) {
elem.set(1);
} else if ((rnd.nextInt() & 128) == 0) {
elem.set(1986);
}
});
}
}).start();
}
Thread.sleep(1);
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem =
AtomicIntegerWrapper.deepCopy(MAP.get(key));
if (elem.get() == 1986) {
try {
Thread.sleep(10);
} catch (Exception e) {}
boolean replaced = MAP.remove(key, elem);
if (!replaced) {
System.out.println("Bailed out!");
} else {
System.out.println("Replaced!");
}
}
});
}
}).start();
}
}
You'll see printouts of "Bailed out!", intermixed with "Replaced!" (removal was successful, as there were no concurrent updates that you care about) and the calculation will stop at some point.
If you remove the custom equals() method and continue to use a copy, you'll see an endless stream of "Bailed out!", because the copy is never considered equal to the value in the map.
If you don't use a copy, you won't see "Bailed out!" printed out, and you'll hit the problem you're explaining - values are removed regardless of concurrent changes.
Let us consider what options you have.
Create your own Container-class with isUpdated() operation and use your own workaround.
If your map contains only a few elements and you iterate over it far more often than you put or delete entries, CopyOnWriteArrayList could be a good choice:
CopyOnWriteArrayList<Entry<Integer, Integer>> lookupArray = ...;
The other option is to implement your own CopyOnWriteMap
public class CopyOnWriteMap<K, V> implements Map<K, V>{
private volatile Map<K, V> currentMap = new HashMap<K, V>(); // must be non-null before the first copy
public V put(K key, V value) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.put(key, value);
this.currentMap = newOne; // atomic operation
return val;
}
}
public V remove(Object key) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.remove(key);
this.currentMap = newOne; // atomic operation
return val;
}
}
[...]
}
There is a negative side effect. With copy-on-write collections your updates are never lost, but you may see a previously deleted entry again.
Worst case: a deleted entry is restored every time the map gets copied.
I'm trying to find a data structure in Java (or Groovy) where something like this works:
MemberAdressableSetsSet mass = new MemberAdressableSetsSet();
mass.addSet(["a","b"]);
mass.addSet(["c","d","e"]);
mass.get("d").add("f");
String output = Arrays.toString(mass.get("e").toArray());
System.out.println(output); // [ "c", "d", "e", "f" ] (ordering irrelevant)
Does anything like that exist? And if not, is there a way to implement something like this with normal Java code that doesn't give the CPU or the memory nightmares for weeks?
Edit: more rigorously
MemberAdressableSetsSet mass = new MemberAdressableSetsSet();
Set<String> s1 = new HashSet<String>();
s1.add("a");
Set<String> s2 = new HashSet<String>();
s2.add("c");s2.add("d");s2.add("e");
mass.addSet(s1);
mass.addSet(s2);
Set<String> s3 = new HashSet<String>();
s3.add("a");s3.add("z");
mass.addSet(s3);
/* s3 contains "a", which is already in a subset of mass, so:
* Either
* - does nothing and returns false or throws Exception
* - deletes "a" from its previous subset before adding s3
* => possibly returns the old subset
* => deletes the old subset if that leaves it empty
* => maybe requires an optional parameter to be set
* - removes "a" from the new subset before adding it
* => possibly returns the new subset that was actually added
* => does not add the new subset if purging it of overlap leaves it empty
* => maybe requires an optional parameter to be set
* - merges all sets that would end up overlapping
* - adds it with no overlap checks, but get("a") returns an array of all sets containing it
*/
mass.get("d").add("f");
String output = Arrays.toString(mass.get("e").toArray());
System.out.println(output); // [ "c", "d", "e", "f" ] (ordering irrelevant)
mass.get("d") would return the Set<T> in mass that contains "d". Analogous to how get() works in, say, HashMap:
HashMap<String,LinkedList<Integer>> map = new HashMap<>();
LinkedList<Integer> list = new LinkedList<>();
list.add(9);
map.put("d",list);
map.get("d").add(4);
map.get("d"); // returns a LinkedList with contents [9,4]
The best I could come up with so far looks like this:
import java.util.HashMap;
import java.util.Set;
public class MemberAdressableSetsSet {
private int next_id = 1;
private HashMap<Object,Integer> members = new HashMap<>();
private HashMap<Integer,Set> sets = new HashMap<>();
public boolean addSet(Set s) {
if (s.size()==0) return false;
for (Object member : s) {
if (members.get(member)!=null) return false;
}
sets.put(next_id,s);
for (Object member : s) {
members.put(member,next_id);
}
next_id++;
return true;
}
public boolean deleteSet(Object member) {
Integer id = members.get(member);
if (id==null) return false;
Set set = sets.get(id);
for (Object m : set) {
members.remove(m);
}
sets.remove(id);
return true;
}
public boolean addToSet(Object member, Object addition) {
Integer id = members.get(member);
if (id==null) throw new IndexOutOfBoundsException();
if (members.get(addition)!=null) return false;
sets.get(id).add(addition);
members.put(addition,id);
return true;
}
public boolean removeFromSet(Object member) {
Integer id = members.get(member);
if (id==null) return false;
Set s = sets.get(id);
if (s.size()==1) sets.remove(id);
else s.remove(member);
members.remove(member);
return true;
}
public Set getSetClone(Object member) {
Integer id = members.get(member);
if (id==null) throw new IndexOutOfBoundsException();
Set copy = new java.util.HashSet(sets.get(id));
return copy;
}
}
Which has some drawbacks:
Sets are not directly accessible, which makes all Set methods and properties not exposed by explicitly defined translation methods inaccessible, unless the clones are an acceptable option
Type information is lost.
Say a Set<Date> is added.
It would not complain about trying to add, for example, a File object to that set.
At least the lost type information for the Sets doesn't extend to their members: the Set.contains() still works exactly as expected, despite both sides having been typecast to Object before being compared by contains(). So a set containing (Object)3 won't return true when asked whether it contains (Object)3L and vice versa, for example.
A set containing (Object)(new java.util.Date(10L)) will return true when asked whether it contains (Object)(new java.sql.Date(10L)) (and the other way round), but that's true even without the (Object) in front, so I guess that's "works as intended" ¯\_(ツ)_/¯
How often do you need to access by one element? Might be worth using a map and storing the same Set reference under multiple keys.
I would prevent external mutation of the map and subsets, and provide helper methods to do all of the updates:
public class MemberAdressableSets<T> {
Map<T, Set<T>> data = new HashMap<>();
public void addSet(Set<T> dataSet) {
if (dataSet.stream().anyMatch(data::containsKey)) {
throw new IllegalArgumentException("Key already in member addressable data");
}
Set<T> protectedSet = new HashSet<>(dataSet);
dataSet.forEach(d -> data.put(d, protectedSet));
}
public void updateSet(T key, T... newData) {
Set<T> dataSet = data.get(key);
Arrays.stream(newData).forEach(dataSet::add);
Arrays.stream(newData).forEach(d -> data.put(d, dataSet));
}
public Set<T> get(T key) {
return Collections.unmodifiableSet(data.get(key));
}
}
Alternatively you could update the addSet and updateSet to create new Set instances if the key doesn't exist and make updateSet never throw. You'll also need to extend this class to handle the cases of merging sets. i.e. handle the use-case:
mass.addSet(["a","b"]);
mass.addSet(["a","c"]);
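One way the merging case could be handled is sketched below, assuming that overlapping sets should be unioned into a single subset. The class name MergingMemberSets is hypothetical; the idea of storing the same Set reference under every member key is taken from the answer above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical variant of the class above that merges overlapping sets.
class MergingMemberSets<T> {
    // Every member maps to the one shared Set instance that contains it.
    private final Map<T, Set<T>> data = new HashMap<>();

    // Adds the given members, merging every existing subset they overlap with.
    void addSet(Set<T> incoming) {
        Set<T> merged = new HashSet<>(incoming);
        for (T member : incoming) {
            Set<T> existing = data.get(member);
            if (existing != null) {
                merged.addAll(existing);
            }
        }
        // Re-point every member, old and new, at the merged set.
        for (T member : merged) {
            data.put(member, merged);
        }
    }

    // Read-only view of the subset containing the key, or null if unknown.
    Set<T> get(T key) {
        Set<T> set = data.get(key);
        return set == null ? null : Collections.unmodifiableSet(set);
    }

    public static void main(String[] args) {
        MergingMemberSets<String> mass = new MergingMemberSets<>();
        mass.addSet(new HashSet<>(Arrays.asList("a", "b")));
        mass.addSet(new HashSet<>(Arrays.asList("a", "c")));
        System.out.println(mass.get("b").contains("c")); // true
    }
}
```

Because the stored subsets are always disjoint, one pass over the incoming members is enough to find everything that has to be merged.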
This solution allows for things like mass.get("d").add("f"); to affect the subset stored in mass, but with major drawbacks.
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
public class MemberAdressableSetsSetDirect {
private LinkedHashSet<Set> sets = new LinkedHashSet<>();
public void addSet(Set newSet) {
sets.add(newSet);
}
public Set removeSet(Object member) {
Iterator<Set> it = sets.iterator();
while (it.hasNext()) {
Set s = it.next();
if (s.contains(member)) {
it.remove();
return s;
}
}
return null;
}
public int removeSets(Object member) {
int removed = 0;
Iterator<Set> it = sets.iterator();
while (it.hasNext()) {
Set s = it.next();
if (s.contains(member)) {
it.remove();
removed++;
}
}
return removed;
}
public void deleteEmptySets() {
sets.removeIf(Set::isEmpty);
}
public Set get(Object member) {
for (Set s : sets) {
if (s.contains(member)) return s;
}
return null;
}
public Set[] getAll(Object member) {
LinkedHashSet<Set> results = new LinkedHashSet<>();
for (Set s : sets) {
if (s.contains(member)) results.add(s);
}
return results.toArray(new Set[0]); // casting the Object[] from toArray() would throw ClassCastException
}
}
There's no built-in protection against overlap and thus we have unreliable access, as well as introducing the possibility of countless empty sets that need to be periodically purged with a manual call to deleteEmptySets(), as this solution can't detect if a subset was modified by direct access.
MemberAdressableSetsSetDirect massd = new MemberAdressableSetsSetDirect();
Set s1 = new HashSet();Set s2 = new HashSet();Set s3 = new HashSet();
s1.add("a");s1.add("b");
s2.add("c");s2.add("d");
s3.add("e");
massd.addSet(s1);massd.addSet(s2);massd.addSet(s3);
massd.get("c").add("a");
// massd.get("a") will now either return the Set ["a","b"] or the Set ["a","c","d"]
// (could be that my usage of a LinkedHashSet as the basis of massd
// at least makes it consistently return the set added first)
massd.get("e").remove("e");
// the third set is now empty, can't be accessed anymore,
// and massd has no clue about that until it's told to look for empty sets
massd.get("c").remove("d");
massd.get("c").remove("c");
// if LinkedHashSet makes this solution act as I suspected above,
// this makes the second subset (now just ["a"]) inaccessible except via massd.getAll("a")[1]
Additionally, this solution can't preserve type information either.
This will not even give warnings:
MemberAdressableSetsSetDirect massd = new MemberAdressableSetsSetDirect();
Set<Long> s = new HashSet<Long>();
s.add(3L);
massd.addSet(s);
massd.get(3L).add("someString");
// massd.get(3L) will now return a Set with contents [3L, "someString"]
Problem
I have the requirement to sort a list by a certain property of each object in that list. This is a standard action supported in most languages.
However, there is an additional requirement that certain items may depend on others and, as such, must not appear in the sorted list until the items they depend on have appeared first, even if this requires going against the normal sort order. Any such 'blocked' item should appear in the list the moment the items 'blocking' it have been added to the output list.
An Example
If I have items:
[{'a',6},{'b',1},{'c',5},{'d',15},{'e',12},{'f',20},{'g',14},{'h',7}]
Sorting these normally by the numeric value will get:
[{'b',1},{'c',5},{'a',6},{'h',7},{'e',12},{'g',14},{'d',15},{'f',20}]
However, if the following constraints are enforced:
a depends on e
g depends on d
c depends on b
Then this result is invalid. Instead, the result should be:
[{'b',1},{'c',5},{'h',7},{'e',12},{'a',6},{'d',15},{'g',14},{'f',20}]
Where b, c, d, e, f and h have been sorted in correct order b, c, h, e, d and f; both a and g got delayed until e and d respectively had been output; and c did not need delaying, as the value it depended on, b, had already been output.
What I have already tried
Initially I investigated if this was possible using basic Java comparators, where the comparator implementation was something like:
private Map<MyObj,Set<MyObj>> dependencies; // parent to set of children
public int compare(MyObj x, MyObj y) {
if (dependencies.get(x).contains(y)) {
return 1;
} else if (dependencies.get(y).contains(x)) {
return -1;
} else if (x.getValue() < y.getValue()) {
return -1;
} else if (x.getValue() > y.getValue()) {
return 1;
} else {
return 0;
}
}
However, this breaks the requirement that Java comparators be transitive. Taken from the Java documentation:
((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
However, in the above example
a(6) < h(7) : true
h(7) < e(12) : true
a(6) < e(12) : false
Instead, I have come up with the code below, which, while it works, seems massively oversized and overcomplex for what seems like a simple problem. (Note: This is a slightly cut down version of the class. It can also be viewed and run at https://ideone.com/XrhSeA)
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Objects;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public final class ListManager<ValueType extends Comparable<ValueType>> {
private static final class ParentChildrenWrapper<ValueType> {
private final ValueType parent;
private final Set<ValueType> childrenByReference;
public ParentChildrenWrapper(ValueType parent, Set<ValueType> childrenByReference) {
this.parent = parent;
this.childrenByReference = childrenByReference;
}
public ValueType getParent() {
return this.parent;
}
public Set<ValueType> getChildrenByReference() {
return this.childrenByReference;
}
}
private static final class QueuedItem<ValueType> implements Comparable<QueuedItem<ValueType>> {
private final ValueType item;
private final int index;
public QueuedItem(ValueType item, int index) {
this.item = item;
this.index = index;
}
public ValueType getItem() {
return this.item;
}
public int getIndex() {
return this.index;
}
@Override
public int compareTo(QueuedItem<ValueType> other) {
if (this.index < other.index) {
return -1;
} else if (this.index > other.index) {
return 1;
} else {
return 0;
}
}
}
private final Set<ValueType> unsortedItems;
private final Map<ValueType, Set<ValueType>> dependentsOfParents;
public ListManager() {
this.unsortedItems = new HashSet<>();
this.dependentsOfParents = new HashMap<>();
}
public void addItem(ValueType value) {
this.unsortedItems.add(value);
}
public final void registerDependency(ValueType parent, ValueType child) {
if (!this.unsortedItems.contains(parent)) {
throw new IllegalArgumentException("Unrecognized parent");
} else if (!this.unsortedItems.contains(child)) {
throw new IllegalArgumentException("Unrecognized child");
} else if (Objects.equals(parent,child)) {
throw new IllegalArgumentException("Parent and child are the same");
} else {
this.dependentsOfParents.computeIfAbsent(parent, __ -> new HashSet<>()).add(child);
}
}
public List<ValueType> createSortedList() {
// Create a copy of dependentsOfParents where the sets of children can be modified without impacting the original.
// These sets represent the children of each parent that are yet to be dealt with; they shrink as more items are processed.
Map<ValueType, Set<ValueType>> blockingDependentsOfParents = new HashMap<>(this.dependentsOfParents.size());
for (Map.Entry<ValueType, Set<ValueType>> parentEntry : this.dependentsOfParents.entrySet()) {
Set<ValueType> childrenOfParent = parentEntry.getValue();
if (childrenOfParent != null && !childrenOfParent.isEmpty()) {
blockingDependentsOfParents.put(parentEntry.getKey(), new HashSet<>(childrenOfParent));
}
}
// Compute a list of which children impact which parents, alongside the set of children belonging to each parent.
// This will allow a child to remove itself from all of its parents' lists of blocking children.
Map<ValueType,List<ParentChildrenWrapper<ValueType>>> childImpacts = new HashMap<>();
for (Map.Entry<ValueType, Set<ValueType>> entry : blockingDependentsOfParents.entrySet()) {
ValueType parent = entry.getKey();
Set<ValueType> childrenForParent = entry.getValue();
ParentChildrenWrapper<ValueType> childrenForParentWrapped = new ParentChildrenWrapper<>(parent,childrenForParent);
for (ValueType child : childrenForParent) {
childImpacts.computeIfAbsent(child, __ -> new LinkedList<>()).add(childrenForParentWrapped);
}
}
// If there are no relationships, the remaining code can be massively optimised.
boolean hasNoRelationships = blockingDependentsOfParents.isEmpty();
// Create a pre-sorted stream of items.
Stream<ValueType> rankedItemStream = this.unsortedItems.stream().sorted();
List<ValueType> outputList;
if (hasNoRelationships) {
// There are no relationships, and as such, the stream is already in a perfectly fine order.
outputList = rankedItemStream.collect(Collectors.toList());
} else {
Iterator<ValueType> rankedIterator = rankedItemStream.iterator();
int queueIndex = 0;
outputList = new ArrayList<>(this.unsortedItems.size());
// A collection of items that have been visited but are blocked by children, stored in map form for easy deletion.
Map<ValueType,QueuedItem<ValueType>> lockedItems = new HashMap<>();
// A list of items that have been freed from their blocking children, but have yet to be processed, ordered by order originally encountered.
PriorityQueue<QueuedItem<ValueType>> freedItems = new PriorityQueue<>();
while (true) {
// Grab the earliest-seen item which was once locked but has now been freed. Otherwise, grab the next unseen item.
ValueType item;
boolean mustBeUnblocked;
QueuedItem<ValueType> queuedItem = freedItems.poll();
if (queuedItem == null) {
if (rankedIterator.hasNext()) {
item = rankedIterator.next();
mustBeUnblocked = false;
} else {
break;
}
} else {
item = queuedItem.getItem();
mustBeUnblocked = true;
}
// See if this item has any children that are blocking it from being added to the output list.
Set<ValueType> childrenWaitingUpon = blockingDependentsOfParents.get(item);
if (childrenWaitingUpon == null || childrenWaitingUpon.isEmpty()) {
// There are no children blocking this item, so start removing it from all blocking lists.
// Get a list of all parents that this item was blocking, if there are any.
List<ParentChildrenWrapper<ValueType>> childImpact = childImpacts.get(item);
if (childImpact != null) {
// Iterate over all those parents
ListIterator<ParentChildrenWrapper<ValueType>> childImpactIterator = childImpact.listIterator();
while (childImpactIterator.hasNext()) {
// Remove this item from that parent's blocking children.
ParentChildrenWrapper<ValueType> wrappedParentImpactedByChild = childImpactIterator.next();
Set<ValueType> childrenOfParentImpactedByChild = wrappedParentImpactedByChild.getChildrenByReference();
childrenOfParentImpactedByChild.remove(item);
// Does this parent no longer have any children blocking it?
if (childrenOfParentImpactedByChild.isEmpty()) {
// Remove it from the children impacts map, to prevent unnecessary processing of a now empty set in future iterations.
childImpactIterator.remove();
// If this parent was locked, mark it as now freed.
QueuedItem<ValueType> freedQueuedItem = lockedItems.remove(wrappedParentImpactedByChild.getParent());
if (freedQueuedItem != null) {
freedItems.add(freedQueuedItem);
}
}
}
// If there are no longer any parents at all being blocked by this child, remove it from the map.
if (childImpact.isEmpty()) {
childImpacts.remove(item);
}
}
outputList.add(item);
} else if (mustBeUnblocked) {
throw new IllegalStateException("Freed item is still blocked. This should not happen.");
} else {
// Mark the item as locked.
lockedItems.put(item,new QueuedItem<>(item,queueIndex++));
}
}
// Check that all items were processed successfully. Given there is only one path that will add an item to the output list without an exception, we can just compare sizes.
if (outputList.size() != this.unsortedItems.size()) {
throw new IllegalStateException("Could not complete ordering. Are there recursive chains of items?");
}
}
return outputList;
}
}
My question
Is there an already existing algorithm, or an algorithm significantly shorter than the above, that will allow this to be done?
While the language I am developing in is Java, and the code above is in Java, language-independent answers that I could implement in Java are also fine.
This is called topological sorting. You can model "blocking" as edges of a directed graph. This should work if there are no circular "blockings".
I've done this in <100 lines of C# code (with comments). Your implementation seems a little complicated.
Here is the outline of the algorithm
Create a priority queue that is keyed by value that you want to sort by
Insert all the items that do not have any "blocking" connections incoming
While there are elements in the queue:
Take an element of the queue. Put it in your resulting list.
If there are any elements that were being directly blocked by this element and were not visited previously, put them into the queue (an element can have more than one blocking element, so you check for that)
A list of unprocessed elements should be empty at the end, or you had a cycle in your dependencies.
This is essentially a topological sort with built-in priority for nodes. Keep in mind that the result can be quite surprising depending on the number of connections in your graph (e.g. it's possible to get elements that are in reverse order).
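The outline above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the C# implementation mentioned: the class name PriorityTopoSort, the method sort, and the choice of String items with Integer sort values are all made up for the example, and every key in dependsOn is assumed to also appear in values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: topological sort with a priority queue of ready items.
class PriorityTopoSort {
    // values: item -> sort value; dependsOn: item -> items that must be output first.
    static List<String> sort(Map<String, Integer> values, Map<String, List<String>> dependsOn) {
        Map<String, Integer> blockers = new HashMap<>();        // unmet dependencies per item
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (String item : values.keySet()) {
            blockers.put(item, 0);
        }
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            for (String prerequisite : e.getValue()) {
                blockers.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(prerequisite, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Comparator<String> byValue = (x, y) -> Integer.compare(values.get(x), values.get(y));
        PriorityQueue<String> ready = new PriorityQueue<>(byValue);
        for (Map.Entry<String, Integer> e : blockers.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String item = ready.poll(); // smallest value among unblocked items
            result.add(item);
            for (String dependent : dependents.getOrDefault(item, Collections.emptyList())) {
                // An item becomes ready only once all of its blockers were output.
                if (blockers.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (result.size() != values.size()) {
            throw new IllegalStateException("Cyclic dependencies");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> values = new HashMap<>();
        values.put("a", 6);  values.put("b", 1);  values.put("c", 5);  values.put("d", 15);
        values.put("e", 12); values.put("f", 20); values.put("g", 14); values.put("h", 7);
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("a", Collections.singletonList("e"));
        dependsOn.put("g", Collections.singletonList("d"));
        dependsOn.put("c", Collections.singletonList("b"));
        System.out.println(sort(values, dependsOn)); // [b, c, h, e, a, d, g, f]
    }
}
```

With the data from the question this reproduces the requested order. Items with equal values come out in an unspecified relative order, since the comparator in this sketch has no tie-breaker.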
As Pratik Deoghare stated in their answer, you can use topological sorting. You can view your "dependencies" as arcs of a Directed Acyclic Graph (DAG). The restriction that the dependencies on the objects are acyclic is important as topological sorting is only possible "if and only if the graph has no directed cycles." The dependencies also of course don't make sense otherwise (i.e. a depends on b and b depends on a doesn't make sense because this is a cyclic dependency).
Once you do topological sorting, the graph can be interpreted as having "layers". To finish the solution, you need to sort within these layers. If there are no dependencies in the objects, this leads to there being just one layer where all the nodes in the DAG are on the same layer and then they are sorted based on their value.
The overall running time is still O(n log n) as long as the number of dependencies does not dominate: topological sorting is linear in the number of items plus dependencies, and sorting within the layers is O(n log n). See the topological sorting wiki for a full running time analysis.
Since you said any language that could be converted to Java, I've done a combination of [what I think is] your algorithm and ghord's in C.
A lot of the code is boilerplate to handle arrays, searches, and array/list insertions that I believe can be reduced by using standard Java primitives. Thus, the amount of actual algorithm code is fairly small.
The algorithm I came up with is:
Given: A raw list of all elements and a dependency list
Copy elements that depend on another element to a "hold" list. Otherwise, copy them to a "sort" list.
Note: an alternative is to only use the sort list and just remove the nodes that depend on another to the hold list.
Sort the "sort" list.
For all elements in the dependency list, find the corresponding nodes in the sort list and the hold list. Insert the hold element into the sort list after the corresponding sort element.
Here's the code:
#include <stdio.h>
#include <stdlib.h>
// sort node definition
typedef struct {
int key;
int val;
} Node;
// dependency definition
typedef struct {
int keybef; // key of node that keyaft depends on
int keyaft; // key of node to insert
} Dep;
// raw list of all nodes
Node rawlist[] = {
{'a',6}, // depends on e
{'b',1},
{'c',5}, // depends on b
{'d',15},
{'e',12},
{'f',20},
{'g',14}, // depends on d
{'h',7}
};
// dependency list
Dep deplist[] = {
{'e','a'},
{'b','c'},
{'d','g'},
{0,0}
};
#define MAXLIST (sizeof(rawlist) / sizeof(rawlist[0]))
// hold list -- all nodes that depend on another
int holdcnt;
Node holdlist[MAXLIST];
// sort list -- all nodes that do _not_ depend on another
int sortcnt;
Node sortlist[MAXLIST];
// prtlist -- print all nodes in a list
void
prtlist(Node *node,int nodecnt,const char *tag)
{
printf("%s:\n",tag);
for (; nodecnt > 0; --nodecnt, ++node)
printf(" %c:%d\n",node->key,node->val);
}
// placenode -- put node into hold list or sort list
void
placenode(Node *node)
{
Dep *dep;
int holdflg;
holdflg = 0;
// decide if node depends on another
for (dep = deplist; dep->keybef != 0; ++dep) {
holdflg = (node->key == dep->keyaft);
if (holdflg)
break;
}
if (holdflg)
holdlist[holdcnt++] = *node;
else
sortlist[sortcnt++] = *node;
}
// sortcmp -- qsort compare function
int
sortcmp(const void *vlhs,const void *vrhs)
{
const Node *lhs = vlhs;
const Node *rhs = vrhs;
int cmpflg;
cmpflg = lhs->val - rhs->val;
return cmpflg;
}
// findnode -- find node in list that matches the given key
Node *
findnode(Node *node,int nodecnt,int key)
{
for (; nodecnt > 0; --nodecnt, ++node) {
if (node->key == key)
break;
}
return node;
}
// insert -- insert hold node into sorted list at correct spot
void
insert(Node *sort,Node *hold)
{
Node prev;
Node next;
int sortidx;
prev = *sort;
*sort = *hold;
++sortcnt;
for (; sort < &sortlist[sortcnt]; ++sort) {
next = *sort;
*sort = prev;
prev = next;
}
}
int
main(void)
{
Node *node;
Node *sort;
Node *hold;
Dep *dep;
prtlist(rawlist,MAXLIST,"RAW");
printf("DEP:\n");
for (dep = deplist; dep->keybef != 0; ++dep)
printf(" %c depends on %c\n",dep->keyaft,dep->keybef);
// place nodes into hold list or sort list
for (node = rawlist; node < &rawlist[MAXLIST]; ++node)
placenode(node);
prtlist(sortlist,sortcnt,"SORT");
prtlist(holdlist,holdcnt,"HOLD");
// sort the "sort" list
qsort(sortlist,sortcnt,sizeof(Node),sortcmp);
prtlist(sortlist,sortcnt,"SORT");
// add nodes from hold list to sort list
for (dep = deplist; dep->keybef != 0; ++dep) {
printf("inserting %c after %c\n",dep->keyaft,dep->keybef);
sort = findnode(sortlist,sortcnt,dep->keybef);
hold = findnode(holdlist,holdcnt,dep->keyaft);
insert(sort,hold);
prtlist(sortlist,sortcnt,"POST");
}
return 0;
}
Here's the program output:
RAW:
a:6
b:1
c:5
d:15
e:12
f:20
g:14
h:7
DEP:
a depends on e
c depends on b
g depends on d
SORT:
b:1
d:15
e:12
f:20
h:7
HOLD:
a:6
c:5
g:14
SORT:
b:1
h:7
e:12
d:15
f:20
inserting a after e
POST:
b:1
h:7
e:12
a:6
d:15
f:20
inserting c after b
POST:
b:1
c:5
h:7
e:12
a:6
d:15
f:20
inserting g after d
POST:
b:1
c:5
h:7
e:12
a:6
d:15
g:14
f:20
I think you are generally on the right track, and the core concept behind your solution is similar to the one I will post below. The general algorithm is as follows:
Create a map that associates each item to the items that depend upon it.
Insert elements with no dependencies into a heap.
Remove the top element from the heap.
Subtract 1 from dependency count of each dependent of the element.
Add any elements with a dependency count of zero to the heap.
Repeat from step 3 until the heap is empty.
For simplicity I have replaced your ValueType with a String, but the same concepts apply.
The BlockedItem class:
import java.util.ArrayList;
import java.util.List;
public class BlockedItem implements Comparable<BlockedItem> {
private String value;
private int index;
private List<BlockedItem> dependentUpon;
private int dependencies;
public BlockedItem(String value, int index){
this.value = value;
this.index = index;
this.dependentUpon = new ArrayList<>();
this.dependencies = 0;
}
public String getValue() {
return value;
}
public List<BlockedItem> getDependentUpon() {
return dependentUpon;
}
public void addDependency(BlockedItem dependentUpon) {
this.dependentUpon.add(dependentUpon);
this.dependencies++;
}
@Override
public int compareTo(BlockedItem other){
// Integer.compare avoids the overflow that plain subtraction can cause
return Integer.compare(this.index, other.index);
}
public int countDependencies() {
return dependencies;
}
public int subtractDependent(){
return --this.dependencies;
}
@Override
public String toString(){
return "{'" + this.value + "', " + this.index + "}";
}
}
The BlockedItemHeapSort class:
import java.util.*;
public class BlockedItemHeapSort {
//maps all blockedItems to the blockItems which depend on them
private static Map<String, Set<BlockedItem>> generateBlockedMap(List<BlockedItem> unsortedList){
Map<String, Set<BlockedItem>> blockedMap = new HashMap<>();
//initialize a set for each element
unsortedList.stream().forEach(item -> {
Set<BlockedItem> dependents = new HashSet<>();
blockedMap.put(item.getValue(), dependents);
});
//place each element in the sets corresponding to its dependencies
unsortedList.stream().forEach(item -> {
if(item.countDependencies() > 0){
item.getDependentUpon().stream().forEach(dependency -> blockedMap.get(dependency.getValue()).add(item));
}
});
return blockedMap;
}
public static List<BlockedItem> sortBlockedItems(List<BlockedItem> unsortedList){
List<BlockedItem> sorted = new ArrayList<>();
Map<String, Set<BlockedItem>> blockedMap = generateBlockedMap(unsortedList);
PriorityQueue<BlockedItem> itemHeap = new PriorityQueue<>();
//put elements with no dependencies in the heap
unsortedList.stream().forEach(item -> {
if(item.countDependencies() == 0) itemHeap.add(item);
});
while(itemHeap.size() > 0){
//get the top element
BlockedItem item = itemHeap.poll();
sorted.add(item);
//for each element that depends upon item, decrease its dependency count
//if it has a zero dependency count after subtraction, add it to the heap
if(!blockedMap.get(item.getValue()).isEmpty()){
blockedMap.get(item.getValue()).stream().forEach(dependent -> {
if(dependent.subtractDependent() == 0) itemHeap.add(dependent);
});
}
}
return sorted;
}
}
You can modify this to more closely fit your use-case.
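The six steps above can also be condensed into a single self-contained method. The following is only a minimal sketch, not the code from this answer: the class name PriorityTopoSketch, the plain maps index and dependsOn (used here in place of BlockedItem), and the demo() helper that loads the question's sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class PriorityTopoSketch {
    // index: item -> sort key; dependsOn: item -> items that must come before it (hypothetical inputs)
    static List<String> sort(Map<String, Integer> index, Map<String, List<String>> dependsOn) {
        Map<String, List<String>> dependents = new HashMap<>();
        Map<String, Integer> remaining = new HashMap<>();
        for (String item : index.keySet()) {
            dependents.put(item, new ArrayList<>());
            remaining.put(item, 0);
        }
        // Step 1: map each item to the items that depend on it, counting dependencies as we go
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            for (String before : e.getValue()) {
                dependents.get(before).add(e.getKey());
                remaining.merge(e.getKey(), 1, Integer::sum);
            }
        }
        // Step 2: items without dependencies go into a min-heap ordered by index
        Comparator<String> byIndex = (x, y) -> Integer.compare(index.get(x), index.get(y));
        PriorityQueue<String> heap = new PriorityQueue<>(byIndex);
        for (String item : index.keySet()) {
            if (remaining.get(item) == 0) heap.add(item);
        }
        // Steps 3-6: poll the smallest item, release its dependents, repeat until the heap is empty
        List<String> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            String item = heap.poll();
            sorted.add(item);
            for (String dependent : dependents.get(item)) {
                if (remaining.merge(dependent, -1, Integer::sum) == 0) heap.add(dependent);
            }
        }
        return sorted;
    }

    // Sample data from the question: a depends on e, c depends on b, g depends on d
    static List<String> demo() {
        Map<String, Integer> index = new HashMap<>();
        String[] names = {"a", "b", "c", "d", "e", "f", "g", "h"};
        int[] keys = {6, 1, 5, 15, 12, 20, 14, 7};
        for (int i = 0; i < names.length; i++) index.put(names[i], keys[i]);
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("a", Arrays.asList("e"));
        dependsOn.put("c", Arrays.asList("b"));
        dependsOn.put("g", Arrays.asList("d"));
        return sort(index, dependsOn);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b, c, h, e, a, d, g, f]
    }
}
```

If the dependencies contain a cycle, the items on the cycle never reach a count of zero, so the returned list is simply shorter than the input; comparing the two sizes is one way to detect that case.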
Java Code for topological sort:
static List<ValueType> topoSort(List<ValueType> vertices) {
List<ValueType> result = new ArrayList<>();
List<ValueType> todo = new LinkedList<>();
Collections.sort(vertices);
for (ValueType v : vertices){
todo.add(v);
}
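outer:
while (!todo.isEmpty()) {
for (ValueType r : todo) {
if (!hasDependency(r, todo)) {
todo.remove(r);
result.add(r);
// safe: we leave the for-each right away, so the iterator is never used after the removal
continue outer;
}
}
// nothing could be removed in this pass, so the remaining vertices form a cycle
throw new IllegalStateException("Dependency cycle among: " + todo);
}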
return result;
}
static boolean hasDependency(ValueType r, List<ValueType> todo) {
for (ValueType c : todo) {
if (r.getDependencies().contains(c))
return true;
}
return false;
}
ValueType is defined as follows:
class ValueType implements Comparable<ValueType> {
private Integer index;
private String value;
private List<ValueType> dependencies;
public ValueType(int index, String value, ValueType...dependencies){
this.index = index;
this.value = value;
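// an explicit null becomes an empty list so that hasDependency() cannot hit a NullPointerException
this.dependencies = dependencies == null ? Collections.emptyList() : Arrays.asList(dependencies);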
}
public List<ValueType> getDependencies() {
return dependencies;
}
public void setDependencies(List<ValueType> dependencies) {
this.dependencies = dependencies;
}
@Override
public int compareTo(@NotNull ValueType o) {
return this.index.compareTo(o.index);
}
@Override
public String toString() {
return value +"(" + index +")";
}
}
And tested with these values:
public static void main(String[] args) {
//[{'a',6},{'b',1},{'c',5},{'d',15},{'e',12},{'f',20},{'g',14},{'h',7}]
//a depends on e
//g depends on d
//c depends on b
ValueType b = new ValueType(1,"b");
ValueType c = new ValueType(5,"c", b);
ValueType d = new ValueType(15,"d");
ValueType e = new ValueType(12,"e");
ValueType a = new ValueType(6,"a", e);
ValueType f = new ValueType(20,"f");
ValueType g = new ValueType(14,"g", d);
ValueType h = new ValueType(7,"h");
List<ValueType> valueTypes = Arrays.asList(a,b,c,d,e,f,g,h);
List<ValueType> r = topoSort(valueTypes);
for(ValueType v: r){
System.out.println(v);
}
}
So I'm trying to create a smart data structure based on an AVL tree and a hash table.
I first need to decide which implementation the data type will use, depending on the size of the list given to it.
For example, if I have a list n of size 1000, it'll be implemented using a hash table. For anything more than 1000, it'll use an AVL tree.
Code for this:
public class SmartULS<K,V> {
protected TreeMap<K,V> tree = new TreeMap<>();
protected AbstractHashMap<K,V> hashMap = new AbstractHashMap<K,V>();
public void setSmartThresholdULS(size){
int threshold = 1000;
if (size >= threshold) {
map = new AbtractMap<K,V>();
}
else
map = new TreeMap<K,V>();
}
}
Now after this, I should be writing the standard methods such as
get(SmartULS, Key), add(SmartULS, Key, Value), remove(SmartULS,Key), nextKey(Key), previousKey(Key), etc.
I'm really lost as to how to start. I've thought about writing these methods like this (in pseudocode):
Algorithm add(SmartULS, Key, Value):
i<- 0
If SmartULS instanceof AbstractHashMap then
For i to SmartULS.size do
If Key equals to SmartULS[i] then
SmartULS.get(Key).setValue(Value)
Else
SmartULS.add(Key, Value)
Else if SmartULS instanceof TreeMap then
Entry newAdd equals new MapEntry(Key, Value)
Position<Entry> p = treeSearch(root( ), Key)
You're on the right track. This is how I understood your question and implemented it:
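import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class SmartULS<K, V> {
private final Map<K, V> map;
public static final int THRESHOLD = 1000;
public SmartULS(int size) {
// up to THRESHOLD elements: hash table; above that: balanced tree
// (java.util.TreeMap is a red-black tree, not an AVL tree)
if (size <= THRESHOLD) {
map = new HashMap<>();
} else {
map = new TreeMap<>();
}
}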
public V get(K key) {
return map.get(key);
}
public V put(K key, V value) {
return map.put(key, value);
}
public V remove(K key) {
return map.remove(key);
}
}
Based on the initial size given, the constructor decides whether to initialize a hash table or a tree. I also added the get, put and remove functions, which simply delegate to the Map interface.
I didn't understand what the nextKey and previousKey functions are supposed to do or return, so I couldn't help there.
A way of using the class would be as follows:
public static void main(String[] args) {
SmartULS<String, String> smartULS = new SmartULS<>(952);
smartULS.put("firstKey", "firstValue");
smartULS.put("secondKey", "secondsValue");
String value = smartULS.get("firstKey");
smartULS.remove("secondKey");
}
Hope this helps:)
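On nextKey and previousKey: one possible reading, and it is only an assumption since the question does not define them, is "the nearest larger key" and "the nearest smaller key" in sorted order. Under that assumption, a tree-backed map can answer directly through NavigableMap.higherKey and lowerKey, while a hash-backed map has no ordering and needs a sorted copy of its keys first. The class name KeyNeighbours below is hypothetical, and the hash-table branch assumes the keys are Comparable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeyNeighbours {
    // Assumed meaning: smallest key strictly greater than the given key, or null if there is none
    static <K> K nextKey(Map<K, ?> map, K key) {
        if (map instanceof NavigableMap) {
            // tree-backed map: O(log n) lookup
            return ((NavigableMap<K, ?>) map).higherKey(key);
        }
        // hash-backed map: no ordering, so sort a copy of the keys first (O(n log n) per call)
        return new TreeSet<K>(map.keySet()).higher(key);
    }

    // Assumed meaning: largest key strictly smaller than the given key, or null if there is none
    static <K> K previousKey(Map<K, ?> map, K key) {
        if (map instanceof NavigableMap) {
            return ((NavigableMap<K, ?>) map).lowerKey(key);
        }
        return new TreeSet<K>(map.keySet()).lower(key);
    }

    public static void main(String[] args) {
        Map<Integer, String> tree = new TreeMap<>();
        Map<Integer, String> hash = new HashMap<>();
        for (int k : new int[] {1, 3, 5}) {
            tree.put(k, "v" + k);
            hash.put(k, "v" + k);
        }
        System.out.println(nextKey(tree, 3));     // prints 5
        System.out.println(previousKey(hash, 3)); // prints 1
        System.out.println(nextKey(hash, 5));     // prints null
    }
}
```

Sorting the keys on every call is expensive for the hash table, which is one reason a structure that needs frequent nextKey or previousKey calls may prefer the tree regardless of size.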
I am creating a function that loops through a string, separates it by comma and then takes the key from the second item in the array and the value from the 1st after splitting the string.
I then want to place these values in a map. This works, but if I have two strings with the same key, the values are not added up; the new value just replaces the old one.
For example if my string was
123,totti 100,roma, 100,totti
I would want
totti 223
roma 100
Here is my code
private void processCallLogs(String[] splitCalls) {
for (String individualCall : splitCalls) {
int duration = 0;
String[] singleCall = individualCall.split(",");
duration += DurationParser.returnDuration(singleCall[0]);
this.cost += CalculateCost.calculateCostPerCall(singleDuration);
if (totalCallDurations.containsKey(singleCall[1])) {
totalCallDurations.put(singleCall[1], singleDuration);
} else {
totalCallDurations.put(singleCall[1], duration);
}
}
}
You can replace the if with something like this:
if (totalCallDurations.containsKey(singleCall[1])) {
duration += totalCallDurations.get(singleCall[1]);
}
totalCallDurations.put(singleCall[1], duration);
Create a map and add to the existing value if the key is already present:
private static Map<String, Integer> myMap;
public static void main(String[] args) {
myMap = new HashMap<>();
// 123,totti 100,roma, 100,totti
addToMap("totti", 123);
addToMap("roma", 100);
addToMap("totti", 100);
System.out.println(myMap);
}
private static void addToMap(String string, int i) {
int t = i;
if (myMap.get(string) != null) {
t += myMap.get(string);
}
myMap.put(string, t);
}
If you're using Java 8, you can do this easily with the Map.merge() method:
totalCallDurations.merge(singleCall[1], duration, Integer::sum);
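As a minimal, self-contained sketch of how that one-liner behaves on the example input: the class name MergeDemo and the totals helper are made up for illustration, and Integer.parseInt stands in for the question's DurationParser, whose behavior is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeDemo {
    // Sums the durations per name; each entry has the form "duration,name"
    static Map<String, Integer> totals(String[] calls) {
        // LinkedHashMap only so that the printed order matches insertion order
        Map<String, Integer> totalCallDurations = new LinkedHashMap<>();
        for (String call : calls) {
            String[] singleCall = call.split(",");
            int duration = Integer.parseInt(singleCall[0].trim()); // stand-in for DurationParser
            // inserts the duration if the key is absent, otherwise stores old + duration
            totalCallDurations.merge(singleCall[1].trim(), duration, Integer::sum);
        }
        return totalCallDurations;
    }

    public static void main(String[] args) {
        String[] calls = {"123,totti", "100,roma", "100,totti"};
        System.out.println(totals(calls)); // prints {totti=223, roma=100}
    }
}
```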
If you want a map that adds values together instead of replacing them, you could make your own map type. Map itself is only an interface, so I would extend HashMap. (I suggest this both for code style and because it makes your code more extensible.)
import java.util.HashMap;
public class AdderMap extends HashMap<String, Integer> {
@Override
public Integer get(Object key) { // Map::get takes Object, so get(String) would only overload it
Integer value = super.get(key);
return value != null ? value : 0; // a missing key reads as 0
}
@Override
public Integer put(String key, Integer value) { // this overrides Map::put
Integer oldValue = get(key); // previous total, or 0 if the key is new
super.put(key, oldValue + value); // store the sum instead of replacing the old value
return oldValue; // unlike plain Map::put, this is 0 rather than null for a new key
}
}
Now, when you initialize your map, make it an AdderMap. Then you can just call put(String, Integer), and it will add the new value to the existing one.
The advantage of this solution is that it helps with keeping your code clean and it allows you to use this type of map again in the future without needing separate code in your main code. The disadvantage is that it requires another class, and having too many classes can become cluttered.
In Java, I have several SortedSet instances. I would like to iterate over the elements from all these sets. One simple option is to create a new SortedSet, such as TreeSet x, deep-copy the contents of all the individual sets y_1, ..., y_n into it using x.addAll(y_i), and then iterate over x.
But is there a way to avoid deep copy? Couldn't I just create a view of type SortedSet which would somehow encapsulate the iterators of all the inner sets, but behave as a single set?
I'd prefer an existing, tested solution, rather than writing my own.
I'm not aware of any existing solution to accomplish this task, so I took the time to write one for you. I'm sure there's room for improvement on it, so take it as a guideline and nothing else.
As Sandor points out in his answer, there are some limitations that must be imposed or assumed. One such limitation is that every SortedSet must be sorted relative to the same order, otherwise there's no point in comparing their elements without creating a new set (representing the union of every individual set).
Here follows my code example which, as you'll notice, is relatively more complex than just creating a new set and adding all elements to it.
import java.util.*;
final class MultiSortedSetView<E> implements Iterable<E> {
private final List<SortedSet<E>> sets = new ArrayList<>();
private final Comparator<? super E> comparator;
MultiSortedSetView() {
comparator = null;
}
MultiSortedSetView(final Comparator<? super E> comp) {
comparator = comp;
}
@Override
public Iterator<E> iterator() {
return new MultiSortedSetIterator<E>(sets, comparator);
}
MultiSortedSetView<E> add(final SortedSet<E> set) {
// You may remove this `if` if you already know
// every set uses the same comparator.
if (!Objects.equals(comparator, set.comparator())) {
throw new IllegalArgumentException("Different Comparator!");
}
sets.add(set);
return this;
}
@Override
public boolean equals(final Object o) {
if (this == o) { return true; }
if (!(o instanceof MultiSortedSetView)) { return false; }
final MultiSortedSetView<?> n = (MultiSortedSetView<?>) o;
return sets.equals(n.sets) &&
(comparator == n.comparator ||
(comparator != null ? comparator.equals(n.comparator) :
n.comparator.equals(comparator)));
}
@Override
public int hashCode() {
int hash = comparator == null ? 0 : comparator.hashCode();
return 37 * hash + sets.hashCode();
}
@Override
public String toString() {
return sets.toString();
}
private final static class MultiSortedSetIterator<E>
implements Iterator<E> {
private final List<Iterator<E>> iterators;
private final PriorityQueue<Element<E>> queue;
private MultiSortedSetIterator(final List<SortedSet<E>> sets,
final Comparator<? super E> comparator) {
final int n = sets.size();
queue = new PriorityQueue<Element<E>>(n,
new ElementComparator<E>(comparator));
iterators = new ArrayList<Iterator<E>>(n);
for (final SortedSet<E> s: sets) {
iterators.add(s.iterator());
}
prepareQueue();
}
@Override
public E next() {
final Element<E> e = queue.poll();
if (e == null) {
throw new NoSuchElementException();
}
if (!insertFromIterator(e.iterator)) {
iterators.remove(e.iterator);
}
return e.element;
}
@Override
public boolean hasNext() {
return !queue.isEmpty();
}
private void prepareQueue() {
final Iterator<Iterator<E>> iterator = iterators.iterator();
while (iterator.hasNext()) {
if (!insertFromIterator(iterator.next())) {
iterator.remove();
}
}
}
private boolean insertFromIterator(final Iterator<E> i) {
while (i.hasNext()) {
final Element<E> e = new Element<>(i.next(), i);
if (!queue.contains(e)) {
queue.add(e);
return true;
}
}
return false;
}
private static final class Element<E> {
final E element;
final Iterator<E> iterator;
Element(final E e, final Iterator<E> i) {
element = e;
iterator = i;
}
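@Override
public boolean equals(final Object o) {
if (o == this) { return true; }
if (!(o instanceof Element)) { return false; }
final Element<?> e = (Element<?>) o;
return element.equals(e.element);
}
// keep hashCode consistent with equals
@Override
public int hashCode() {
return element.hashCode();
}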
}
private static final class ElementComparator<E>
implements Comparator<Element<E>> {
final Comparator<? super E> comparator;
ElementComparator(final Comparator<? super E> comp) {
comparator = comp;
}
@Override
@SuppressWarnings("unchecked")
public int compare(final Element<E> e1, final Element<E> e2) {
if (comparator != null) {
return comparator.compare(e1.element, e2.element);
}
return ((Comparable<? super E>) e1.element)
.compareTo(e2.element);
}
}
}
}
The inner workings of this class are simple to grasp. The view keeps a list of sorted sets, the ones you want to iterate over. It also needs the comparator that will be used to compare elements (null to use their natural ordering). You can only add (distinct) sets to the view.
The rest of the magic happens in the Iterator of this view. This iterator keeps a PriorityQueue of the elements that will be returned from next() and a list of iterators from the individual sets.
This queue will have, at all times, at most one element per set, and it discards repeating elements. The iterator also discards empty and used up iterators. In short, it guarantees that you will traverse every element exactly once (as in a set).
Here's an example on how to use this class.
SortedSet<Integer> s1 = new TreeSet<>();
SortedSet<Integer> s2 = new TreeSet<>();
SortedSet<Integer> s3 = new TreeSet<>();
SortedSet<Integer> s4 = new TreeSet<>();
// ...
MultiSortedSetView<Integer> v =
new MultiSortedSetView<Integer>()
.add(s1)
.add(s2)
.add(s3)
.add(s4);
for (final Integer i: v) {
System.out.println(i);
}
I do not think that is possible unless it is some special case, which would require custom implementation.
For example take the following two comparators:
public class Comparator1 implements Comparator<Long> {
@Override
public int compare(Long o1, Long o2) {
return o1.compareTo(o2);
}
}
public class Comparator2 implements Comparator<Long> {
@Override
public int compare(Long o1, Long o2) {
return -o1.compareTo(o2);
}
}
and the following code:
TreeSet<Long> set1 = new TreeSet<Long>(new Comparator1());
TreeSet<Long> set2 = new TreeSet<Long>(new Comparator2());
set1.addAll(Arrays.asList(new Long[] {1L, 3L, 5L}));
set2.addAll(Arrays.asList(new Long[] {2L, 4L, 6L}));
System.out.println(Joiner.on(",").join(set1.descendingIterator()));
System.out.println(Joiner.on(",").join(set2.descendingIterator()));
This will result in:
5,3,1
2,4,6
Output like this is useless to any merge logic that compares the head elements of the given iterators, because the two sets are ordered in opposite directions.
This makes it impossible to create a general solution. It is only possible if all sets are sorted using the same Comparator; however, that cannot be guaranteed by any implementation that accepts arbitrary SortedSet objects (e.g. anything that accepts SortedSet<Long> instances would accept both TreeSet objects above).
A slightly more formal approach:
Given that y_1, ..., y_n are all sorted sets, if:
the intersection of any two of these sets is empty,
and there is an ordering of the sets such that for every consecutive pair y_i, y_(i+1) it holds that y_i[x] <= y_(i+1)[1], where y_i[x] is the last element of y_i, y_(i+1)[1] is the first element of y_(i+1), and <= is the comparison function,
then the sets y_1, ..., y_n can be read one after another as a SortedSet.
Now if either of these conditions is not met:
if the first condition is not met, then the definition of a Set is not fulfilled, so the result cannot be a Set until a deep-copy merge is completed and the duplicate elements are removed (see the Set javadoc, first paragraph):
sets contain no pair of elements e1 and e2 such that e1.equals(e2)
the second condition can only be ensured if all sets use exactly the same comparison function <=
The first condition is the more important one, because being a SortedSet implies being a Set, and if the definition of a Set cannot be fulfilled, then the stronger conditions of a SortedSet definitely cannot be fulfilled either.
It is possible that an implementation could exist which mimics the behavior of a SortedSet, but it would definitely not be a SortedSet.
com.google.common.collect.Sets#union from Guava will do the trick. It returns an unmodifiable view of the union of two sets, which you can iterate over. The returned set will not be sorted. You can then create a new sorted set from it (new TreeSet() or com.google.common.collect.ImmutableSortedSet). I see no API for viewing a given set as a sorted set.
If your concern is a deep-copy on the objects passed to the TreeSet#addAll method, you shouldn't be. The javadoc does not indicate it's a deep-copy (and it certainly would say so if it was)...and the OpenJDK implementation doesn't show this either. No copies - simply additional references to the existing object.
Since the deep-copy isn't an issue, I think worrying about this, unless you've identified this as a specific performance problem, falls into the premature optimization category.
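The claim that addAll only copies references can be checked with a few lines. This is a minimal sketch; the class name AddAllIsShallow and the Box element type are made up for the demonstration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class AddAllIsShallow {
    // Hypothetical element type; any Comparable class would do
    static final class Box implements Comparable<Box> {
        final int id;
        Box(int id) { this.id = id; }
        @Override
        public int compareTo(Box other) { return Integer.compare(id, other.id); }
    }

    // Returns true if the merged set holds the very same object as the source set
    static boolean sharesElements() {
        SortedSet<Box> y1 = new TreeSet<>();
        y1.add(new Box(1));
        SortedSet<Box> x = new TreeSet<>();
        x.addAll(y1);
        // == compares identity: both sets point at one Box instance, nothing was cloned
        return x.first() == y1.first();
    }

    public static void main(String[] args) {
        System.out.println(sharesElements()); // prints true
    }
}
```

What addAll does cost is one tree node per element in the new set, so the extra memory grows with the number of elements, not with the size of the objects themselves.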