For anyone familiar with Objective-C: it has a collection called NSOrderedSet that acts as a Set whose items can also be accessed like an Array's.
Is there anything like this in Java?
I've heard there is a collection called LinkedHashMap, but I haven't found anything like it for a set.
Take a look at the LinkedHashSet class.
From Java doc:
Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.).
Every Set has an iterator(). A normal HashSet's iterator returns elements in an effectively arbitrary order, a TreeSet's iterator follows the sort order, and a LinkedHashSet's iterator follows insertion order.
You can't replace an element in a LinkedHashSet, however. You can remove one and add another, but the new element will not be in the place of the original. In a LinkedHashMap, you can replace a value for an existing key, and then the values will still be in the original order.
Also, you can't insert at a certain position.
Maybe you'd better use an ArrayList with an explicit check to avoid inserting duplicates.
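To illustrate the iteration orders described above, a quick sketch (the HashSet output shown is just one possible order; it is unspecified):
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Set<String> linked = new LinkedHashSet<>();
        Set<String> hashed = new HashSet<>();
        for (String s : new String[] {"banana", "apple", "cherry"}) {
            linked.add(s);
            hashed.add(s);
        }
        System.out.println(linked); // [banana, apple, cherry] - insertion order preserved
        System.out.println(hashed); // some unspecified order, e.g. [banana, cherry, apple]
    }
}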
Take a look at the Java standard API doc. Right next to LinkedHashMap, there is a LinkedHashSet. But note that the order in those is the insertion order, not the natural order of the elements. And you can only iterate in that order, not do random access (except by counting iteration steps).
There is also an interface SortedSet, implemented by TreeSet and ConcurrentSkipListSet. Both allow iteration in the natural order of their elements or in an order given by a Comparator, but offer neither random access nor insertion order.
For a data structure that has both efficient access by index and can efficiently enforce the set criterion, you'd want a skip list, but there is no implementation with that functionality in the Java standard API, though I am certain it's easy to find one on the internet.
TreeSet is ordered.
http://docs.oracle.com/javase/6/docs/api/java/util/TreeSet.html
Try using java.util.TreeSet that implements SortedSet.
To quote the doc:
"The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used"
Note that add, remove and contains have a time cost of O(log n).
If you want to access the content of the set as an array, you can convert it by doing:
YourType[] array = someSet.toArray(new YourType[yourSet.size()]);
This array will be sorted with the same criteria as the TreeSet (natural ordering or the supplied comparator), and in many cases this is an advantage over calling Arrays.sort() afterwards.
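For example, a small sketch (element type and values are just illustrative):
import java.util.Arrays;
import java.util.TreeSet;

public class TreeSetToArrayDemo {
    public static void main(String[] args) {
        // Natural ordering here; pass a Comparator to the constructor for a custom order.
        TreeSet<String> words = new TreeSet<>();
        words.add("pear");
        words.add("fig");
        words.add("banana");
        String[] array = words.toArray(new String[words.size()]);
        // Already sorted by the set's ordering, no Arrays.sort() needed.
        System.out.println(Arrays.toString(array)); // [banana, fig, pear]
    }
}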
TreeSet is an ordered set, but you can't access elements by index; you can only iterate through it or jump to the beginning/end.
If we are talking about an inexpensive implementation of a skip list, I wonder what, in terms of big O, the cost of this operation is:
YourType[] array = someSet.toArray(new YourType[yourSet.size()]);
I mean, it always ends up creating a whole new array, so it is O(n):
java.util.Arrays#copyOf
You might also get some utility out of a bidirectional map like the BiMap from Google Guava.
With a BiMap, you can efficiently map an Integer (for random index access) to any other object type. BiMaps are one-to-one, so any given integer has at most one element associated with it, and any element has one associated integer. It is backed by two hash tables, so it uses almost double the memory, but it is much faster than a plain List for processing, because contains() (which gets called when an item is added, to check whether it already exists) is a constant-time operation like HashSet's, while a List's implementation is a linear scan and therefore a LOT slower.
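A small sketch of the idea with Guava's HashBiMap (the indices and values here are just for illustration):
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;

public class BiMapIndexDemo {
    public static void main(String[] args) {
        BiMap<Integer, String> byIndex = HashBiMap.create();
        byIndex.put(0, "alpha");
        byIndex.put(1, "beta");
        // "Random access" by index:
        System.out.println(byIndex.get(1)); // beta
        // Constant-time membership test on the values, via the inverse view:
        System.out.println(byIndex.inverse().containsKey("alpha")); // true
        // Reverse lookup: which index holds "beta"?
        System.out.println(byIndex.inverse().get("beta")); // 1
    }
}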
I had a similar problem. I didn't quite need an ordered set but rather a list with a fast indexOf/contains. As I didn't find anything out there, I implemented one myself. Here's the code; it implements both Set and List, though not all bulk list operations are as fast as the ArrayList versions.
disclaimer: not tested
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Set;
import java.util.Collection;
import java.util.Comparator;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import static java.util.Objects.requireNonNull;
/**
* An ArrayList that keeps an index of its content so that contains()/indexOf() are fast. Duplicate entries are
* ignored, as they are by other Java Set implementations.
*/
public class IndexedArraySet<E> extends ArrayList<E> implements Set<E> {
public IndexedArraySet() { super(); }
public IndexedArraySet(Iterable<E> c) {
super();
addAll(c);
}
private HashMap<E, Integer> indexMap = new HashMap<>();
private void reindex() {
indexMap.clear();
int idx = 0;
for (E item: this) {
addToIndex(item, idx++);
}
}
private E addToIndex(E e, int idx) {
indexMap.putIfAbsent(requireNonNull(e), idx);
return e;
}
@Override
public boolean add(E e) {
if(indexMap.putIfAbsent(requireNonNull(e), size()) != null) return false;
super.add(e);
return true;
}
@Override
public boolean addAll(Collection<? extends E> c) {
return addAll((Iterable<? extends E>) c);
}
public boolean addAll(Iterable<? extends E> c) {
boolean rv = false;
for (E item: c) {
rv |= add(item);
}
return rv;
}
@Override
public boolean contains(Object e) {
return indexMap.containsKey(e);
}
@Override
public int indexOf(Object e) {
if (e == null) return -1;
Integer i = indexMap.get(e);
return (i == null) ? -1 : i;
}
@Override
public int lastIndexOf(Object e) {
return indexOf(e);
}
@Override @SuppressWarnings("unchecked")
public Object clone() {
IndexedArraySet clone = (IndexedArraySet) super.clone();
clone.indexMap = (HashMap) indexMap.clone();
return clone;
}
@Override
public void add(int idx, E e) {
if(indexMap.putIfAbsent(requireNonNull(e), -1) != null) return;
super.add(idx, e);
reindex();
}
@Override
public boolean remove(Object e) {
boolean rv;
try { rv = super.remove(e); }
finally { reindex(); }
return rv;
}
@Override
public void clear() {
super.clear();
indexMap.clear();
}
@Override
public boolean addAll(int idx, Collection<? extends E> c) {
boolean rv;
try {
for(E item : c) {
// check uniqueness
addToIndex(item, -1);
}
rv = super.addAll(idx, c);
} finally {
reindex();
}
return rv;
}
@Override
public boolean removeAll(Collection<?> c) {
boolean rv;
try { rv = super.removeAll(c); }
finally { reindex(); }
return rv;
}
@Override
public boolean retainAll(Collection<?> c) {
boolean rv;
try { rv = super.retainAll(c); }
finally { reindex(); }
return rv;
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
boolean rv;
try { rv = super.removeIf(filter); }
finally { reindex(); }
return rv;
}
@Override
public void replaceAll(final UnaryOperator<E> operator) {
indexMap.clear();
try {
int duplicates = 0;
for (int i = 0; i < size(); i++) {
E newval = requireNonNull(operator.apply(this.get(i)));
if(indexMap.putIfAbsent(newval, i-duplicates) == null) {
super.set(i-duplicates, newval);
} else {
duplicates++;
}
}
removeRange(size()-duplicates, size());
} catch (Exception ex) {
// If there's an exception the indexMap will be inconsistent
reindex();
throw ex;
}
}
@Override
public void sort(Comparator<? super E> c) {
try { super.sort(c); }
finally { reindex(); }
}
}
IndexedTreeSet from the indexed-tree-map project provides this functionality (ordered/sorted set with list-like access by index).
My own MapTreeAVL (whose implementations live in the mtAvl package next to that interface) provides both a method to obtain a SortedSet and index-based access (some of the so-called Optimizations, such as MinMaxIndexIteration, may be preferable in that case).
(Note: since my repo is open you can download the desired file directly, but I advise you to check whether other classes in the same superpackage dataStructure are required/used/imported.)
Related
I want to periodically iterate over a ConcurrentHashMap while removing entries, like this:
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
// do something
iter.remove();
}
The problem is that another thread may be updating or modifying values while I'm iterating. If that happens, those updates can be lost forever, because my thread only sees stale values while iterating, but the remove() will delete the live entry.
After some consideration, I came up with this workaround:
map.forEach((key, value) -> {
// delete if value is up to date, otherwise leave for next round
if (map.remove(key, value)) {
// do something
}
});
One problem with this is that it won't catch modifications to mutable values that don't implement equals() (such as AtomicInteger). Is there a better way to safely remove with concurrent modifications?
Your workaround works, but there is one potential issue: if certain entries are constantly being updated, map.remove(key, value) may never return true until the updates stop.
If you use JDK8 here is my solution
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
map.compute(entry.getKey(), (k, v) -> f(v));
//do something for prevValue
}
....
private Integer prevValue;
private Integer f(Integer v){
prevValue = v;
return null;
}
compute() will apply f(v) to the value; in our case f saves the previous value in the field and, by returning null, removes the entry.
According to Javadoc it is atomic.
Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.
Your workaround is actually pretty good. There are other facilities on top of which you can build a somewhat similar solution (e.g. using computeIfPresent() and tombstone values), but they have their own caveats and I have used them in slightly different use-cases.
As for using a type that doesn't implement equals() for the map values, you can use your own wrapper on top of the corresponding type. That's the most straightforward way to inject custom semantics for object equality into the atomic replace/remove operations provided by ConcurrentMap.
Update
Here's a sketch that shows how you can build on top of the ConcurrentMap.remove(Object key, Object value) API:
Define a wrapper type on top of the mutable type you use for the values, also defining your custom equals() method building on top of the current mutable value.
In your BiConsumer (the lambda you're passing to forEach), create a deep copy of the value (which is of type your new wrapper type) and perform your logic determining whether the value needs to be removed on the copy.
If the value needs to be removed, call remove(myKey, myValueCopy).
If there have been some concurrent changes while you were calculating whether the value needs to be removed, remove(myKey, myValueCopy) will return false (barring ABA problems, which are a separate topic).
Here's some code illustrating this:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
public class Playground {
private static class AtomicIntegerWrapper {
private final AtomicInteger value;
AtomicIntegerWrapper(int value) {
this.value = new AtomicInteger(value);
}
public void set(int value) {
this.value.set(value);
}
public int get() {
return this.value.get();
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(obj instanceof AtomicIntegerWrapper)) {
return false;
}
AtomicIntegerWrapper other = (AtomicIntegerWrapper) obj;
if (other.value.get() == this.value.get()) {
return true;
}
return false;
}
public static AtomicIntegerWrapper deepCopy(AtomicIntegerWrapper wrapper) {
int wrapped = wrapper.get();
return new AtomicIntegerWrapper(wrapped);
}
}
private static final ConcurrentMap<Integer, AtomicIntegerWrapper> MAP
= new ConcurrentHashMap<>();
private static final int NUM_THREADS = 3;
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 10; ++i) {
MAP.put(i, new AtomicIntegerWrapper(1));
}
Thread.sleep(1);
for (int i = 0; i < NUM_THREADS; ++i) {
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem = MAP.get(key);
if (elem == null) {
System.out.println("Oops...");
} else if (elem.get() == 1986) {
elem.set(1);
} else if ((rnd.nextInt() & 128) == 0) {
elem.set(1986);
}
});
}
}).start();
}
Thread.sleep(1);
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem =
AtomicIntegerWrapper.deepCopy(MAP.get(key));
if (elem.get() == 1986) {
try {
Thread.sleep(10);
} catch (Exception e) {}
boolean replaced = MAP.remove(key, elem);
if (!replaced) {
System.out.println("Bailed out!");
} else {
System.out.println("Replaced!");
}
}
});
}
}).start();
}
}
You'll see printouts of "Bailed out!", intermixed with "Replaced!" (removal was successful, as there were no concurrent updates that you care about) and the calculation will stop at some point.
If you remove the custom equals() method and continue to use a copy, you'll see an endless stream of "Bailed out!", because the copy is never considered equal to the value in the map.
If you don't use a copy, you won't see "Bailed out!" printed out, and you'll hit the problem you're explaining - values are removed regardless of concurrent changes.
Let us consider what options you have.
Create your own Container-class with isUpdated() operation and use your own workaround.
If your map contains just a few elements and you iterate over the map very frequently compared to put/delete operations, it could be a good choice to use a CopyOnWriteArrayList:
CopyOnWriteArrayList<Entry<Integer, Integer>> lookupArray = ...;
The other option is to implement your own CopyOnWriteMap
public class CopyOnWriteMap<K, V> implements Map<K, V>{
private volatile Map<K, V> currentMap;
public V put(K key, V value) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.put(key, value);
this.currentMap = newOne; // atomic operation
return val;
}
}
public V remove(Object key) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.remove(key);
this.currentMap = newOne; // atomic operation
return val;
}
}
[...]
}
There is a negative side effect. If you use copy-on-write collections your updates will never be lost, but you can see a formerly deleted entry again.
Worst case: a deleted entry is restored every time the map gets copied.
This has been annoying me in a project recently and my Google-fu is failing me at finding a suitable answer.
Is there a collection that gives access to a ListIterator but also only allows unique values inside the collection?
The reasoning for this: I have a collection of items in which there should only ever be one of each element. I also want to be able to traverse this collection in both directions while it stays sorted, or at least be able to sort it using Collections.sort().
I've not found anything suitable and had to write my own class using the following code:
import java.util.*;

public class UniqueArrayList<E> extends ArrayList<E> {
@Override
public boolean add(E element){
if (this.contains(element))
return false;
else
return super.add(element);
}
@Override
public void add(int index, E element){
if (this.contains(element))
return;
else
super.add(index, element);
}
@Override
public boolean addAll(Collection<? extends E> c){
if (new HashSet<E>(c).size() < c.size())
return false;
for(E element : c){
if (this.contains(element))
return false;
}
return super.addAll(c);
}
@Override
public boolean addAll(int index, Collection<? extends E> c) {
if (new HashSet<E>(c).size() < c.size())
return false;
for(E element : c){
if (this.contains(element))
return false;
}
return super.addAll(index, c);
}
@Override
public ListIterator<E> listIterator(int index) {
if (index < 0 || index > this.size())
throw new IndexOutOfBoundsException("Index: "+index);
return new ListItr(index);
}
@Override
public ListIterator<E> listIterator() {
return new ListItr(0);
}
@Override
public Iterator<E> iterator() {
return new Itr();
}
private class Itr implements Iterator<E> {
int cursor; // index of next element to return
int lastRet = -1; // index of last element returned; -1 if no such
int expectedModCount = modCount;
public boolean hasNext() {
return cursor != size();
}
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size())
throw new NoSuchElementException();
Object[] elementData = UniqueArrayList.this.toArray();
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
public void remove() {
if (lastRet < 0)
throw new IllegalStateException();
checkForComodification();
try {
UniqueArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;
} catch (IndexOutOfBoundsException ex) {
throw new ConcurrentModificationException();
}
}
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
}
private class ListItr extends Itr implements ListIterator<E> {
ListItr(int index) {
super();
cursor = index;
}
public boolean hasPrevious() {
return cursor != 0;
}
public int nextIndex() {
return cursor;
}
public int previousIndex() {
return cursor - 1;
}
@SuppressWarnings("unchecked")
public E previous() {
checkForComodification();
int i = cursor - 1;
if (i < 0)
throw new NoSuchElementException();
Object[] elementData = UniqueArrayList.this.toArray();
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i;
return (E) elementData[lastRet = i];
}
public void set(E e) {
if (lastRet < 0)
throw new IllegalStateException();
checkForComodification();
try {
//Need to allow this for the collections sort to work!
//if (!UniqueArrayList.this.contains(e))
UniqueArrayList.this.set(lastRet, e);
} catch (IndexOutOfBoundsException ex) {
throw new ConcurrentModificationException();
}
}
public void add(E e) {
checkForComodification();
try {
int i = cursor;
UniqueArrayList.this.add(i, e);
cursor = i + 1;
lastRet = -1;
expectedModCount = modCount;
} catch (IndexOutOfBoundsException ex) {
throw new ConcurrentModificationException();
}
}
}
}
However, this is far from perfect. For one, I can't restrict ListIterator.set(), because Collections.sort() uses it to move items in the list; if I try to prevent non-unique items from being added there, the sort never happens.
So, does anyone have a better method or know of another collection that abides by the rules that I would like? Or do I just need to live with this rather irritating issue?
[Edit]
This is the Collections.sort(); method:
public static <T extends Comparable<? super T>> void sort(List<T> list) {
Object[] a = list.toArray();
Arrays.sort(a);
ListIterator<T> i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set((T)a[j]);
}
}
The reasoning they give for doing this is:
This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.
When you need unique values you should try to switch to a Set.
You can use a TreeSet together with a Comparator instance to sort the entries. The descendingSet() method of TreeSet will give you the reverse order.
If you really need a ListIterator at some point you could create a temporary list from the set.
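A minimal sketch of that approach (element type and values are just for illustration):
import java.util.ArrayList;
import java.util.ListIterator;
import java.util.TreeSet;

public class TreeSetBothWaysDemo {
    public static void main(String[] args) {
        // Pass a Comparator to the constructor if you need a custom order.
        TreeSet<String> set = new TreeSet<>();
        set.add("b");
        set.add("a");
        set.add("c");
        set.add("a"); // duplicate, silently ignored
        System.out.println(set);                 // [a, b, c]
        System.out.println(set.descendingSet()); // [c, b, a]
        // If a ListIterator is really needed, build a temporary list from the set:
        ListIterator<String> it = new ArrayList<>(set).listIterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}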
As Thor Stan mentioned, a TreeSet gets you most of what you want. It ensures elements are unique, it keeps them sorted, and you can iterate it in either direction using iterator() or descendingIterator().
It's not entirely clear why you're asking for ListIterator though. Most things about a ListIterator are very positional: the notion of an index, or adding something at the current position, or setting the current element. These don't make sense for a sorted set.
One aspect of a ListIterator that you might be looking for, though, is the ability to reverse directions in the midst of iteration. You can't do this directly with a TreeSet iterator, since it offers access only via an ordinary Iterator instead of a ListIterator.
However, a TreeSet implements the NavigableSet interface, which lets you step through the elements in order, in either direction. The NavigableSet interface is a subinterface of SortedSet, which provides the first() and last() methods to get you started at one of the "ends" of the set. Once you have an element in the set, you can step in either direction using the lower(E) and higher(E) methods. Or, if you want to start somewhere in the middle, you can't start at a position by index, but you can start with a value (which needn't be a member of the set) and then call lower(E) or higher(E).
For example:
TreeSet<String> set = new TreeSet<>(
Arrays.asList("a", "z", "b", "y"));
String cur;
cur = set.first(); // a
cur = set.higher(cur); // b
cur = set.higher(cur); // y
cur = set.higher(cur); // z
cur = set.lower(cur); // y
cur = set.lower(cur); // b
cur = set.lower(cur); // a
cur = set.lower(cur); // null
Java considers List to allow non-unique items and Set to not. Sets obviously don't support ListIterators and therefore code that uses ListIterator can assume that the underlying collection is not a Set.
Java 8 doesn't use ListIterator for sorting anymore, but if you're stuck with Java 7 that doesn't really help you. Depending a lot on your context and usage, it might be best to use a Set, as Thor Stan said in his answer, creating a List on demand when needed.
Another option is to just provide your own ListIterator that accesses the list through a method that doesn't check for duplicates. This would have the advantage of not creating extraneous objects, but the Set option would most likely result in shorter code and is highly unlikely to be significant performance-wise.
There are probably other options too, but I can't think of any elegant ones.
This is a problem with no easy answers.
The problem is that you have broken the contract for add(int, E). This method must either add the element or throw an exception - it is not allowed to return without adding the element.
If you override set(int, E) so that sometimes it doesn't set the element, it would break the contract for that method too, and it would prevent Collections.sort() from working as you identified.
I would not recommend breaking these contracts - it may cause other methods that act on lists to behave in unpredictable ways.
Others have experienced these difficulties - see the java docs for SetUniqueList for example.
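For reference, a rough sketch of how that class is typically used (assuming Apache Commons Collections 4 is on the classpath); note that its add() also returns false for duplicates, which is exactly the contract deviation its documentation warns about:
import java.util.ArrayList;
import org.apache.commons.collections4.list.SetUniqueList;

public class SetUniqueListDemo {
    public static void main(String[] args) {
        SetUniqueList<String> list = SetUniqueList.setUniqueList(new ArrayList<String>());
        System.out.println(list.add("a")); // true
        System.out.println(list.add("a")); // false - the duplicate is rejected
        System.out.println(list);          // [a]
    }
}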
Another problem with your implementation is that it would be extremely slow because ArrayList.contains() is a linear search.
One solution would be to write a class that uses both an ArrayList and a HashSet, and write your own versions of add and set, rather than breaking the contracts for the usual versions. This is not tested, but you get the idea.
import java.util.*;

public final class MyList<E extends Comparable<? super E>> extends AbstractList<E> {
private final List<E> list = new ArrayList<>();
private final Set<E> set = new HashSet<>();
public E get(int i) {
return list.get(i);
}
public int size() {
return list.size();
}
public boolean tryAdd(E e) {
return set.add(e) && list.add(e);
}
public boolean tryAdd(int i, E e) {
if (set.add(e)) {
list.add(i, e);
return true;
}
return false;
}
public boolean trySet(int i, E e) {
return set.add(e) && set.remove(list.set(i, e));
}
public boolean remove(Object o) {
return set.remove(o) && list.remove(o);
}
public void sort() {
Collections.sort(list);
}
// One bonus of this approach is that contains() is now O(1)
public boolean contains(Object o) {
return set.contains(o);
}
// rest omitted.
}
Something like this would not break the contract for List. Note that with this version, add and set throw an UnsupportedOperationException, because this is the behaviour inherited from AbstractList. You would be able to sort by calling myList.sort(). Also, the listIterator method would work, but you would not be able to use it to set or add (although remove would work).
If you needed to add or set elements while iterating over this List, you would need to use an explicit index, rather than a ListIterator. Personally, I do not consider this a major problem. ListIterator is necessary for lists like LinkedList that do not have a constant time get method, but for ArrayList it's nice but not essential.
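For example, a short usage sketch of the class above (values are just illustrative):
MyList<String> myList = new MyList<>();
myList.tryAdd("pear");
myList.tryAdd("apple");
myList.tryAdd("apple");   // duplicate, returns false and is not added
myList.sort();            // list is now [apple, pear]
for (int i = 0; i < myList.size(); i++) {
    System.out.println(myList.get(i)); // explicit-index iteration instead of a ListIterator
}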
I am trying to write an implementation of Set which has an additional method randomElement() which returns an element at random from the Set. I have based the implementation on HashSet for a fast contains() method. I have also used an ArrayList so that the randomElement() method is O(1) too - with an ArrayList all you have to do is choose a random index.
Here is my code.
import java.util.*;

public final class RandomChoiceSet<E> extends AbstractSet<E> {
private final List<E> list = new ArrayList<>();
private final Set<E> set = new HashSet<>();
private final Random random = new Random();
public RandomChoiceSet() {}
public RandomChoiceSet(Collection<? extends E> collection) {
addAll(collection);
}
public E randomElement() {
return list.get(random.nextInt(list.size()));
}
@Override
public int size() {
return list.size();
}
@Override
public boolean contains(Object o) {
return set.contains(o);
}
@Override
public void clear() {
list.clear();
set.clear();
}
@Override
public boolean add(E e) {
boolean result = set.add(e);
if (result)
list.add(e);
return result;
}
@Override
public boolean remove(Object o) {
boolean result = set.remove(o);
if (result)
list.remove(o);
return result;
}
@Override
public Iterator<E> iterator() {
return new Iterator<E>() {
private final Iterator<E> iterator = list.iterator();
private E e;
@Override
public boolean hasNext() {
return iterator.hasNext();
}
@Override
public E next() {
return e = iterator.next();
}
@Override
public void remove() {
iterator.remove();
set.remove(e);
}
};
}
}
The drawback to maintaining a List as well as a Set is that it makes the remove() method O(n) because the element has to be found in the ArrayList first, and then all other elements have to be moved along one place. So I was wondering whether it is possible to write such a Set where all five operations size(), contains(), add(), remove() and randomElement() are O(1)?
The only thing I can think of is to replace your set with a HashMap that maps from your element to its position in the ArrayList. size, contains, add and randomElement will stay the same. For remove you would do as follows:
Find the element from the HashMap
Retrieve its position in the ArrayList
Remove the element from the HashMap
Swap the deleted element with the last element in the array
Modify the swapped element position in the HashMap
Delete the last element from the array //Now this is O(1)
What makes this work is that you don't need any particular order in your array; you just need random-access storage, so changing the order of the data will not cause any problems as long as you keep it in sync with your HashMap.
Here is a functioning implementation, following Amr's solution. Even the Iterator's remove() method works, because elements are always swapped in from a later position.
import java.util.*;

public final class RandomChoiceSet<E> extends AbstractSet<E> {
private final List<E> list = new ArrayList<>();
private final Map<E, Integer> map = new HashMap<>();
private final Random random = new Random();
public RandomChoiceSet() {}
public RandomChoiceSet(Collection<? extends E> collection) {
addAll(collection);
}
public E randomElement() {
return list.get(random.nextInt(list.size()));
}
@Override
public int size() {
return list.size();
}
@Override
public boolean contains(Object o) {
return map.containsKey(o);
}
@Override
public void clear() {
list.clear();
map.clear();
}
@Override
public boolean add(E e) {
if (map.containsKey(e))
return false;
map.put(e, list.size());
list.add(e);
return true;
}
@Override
public boolean remove(Object o) {
Integer currentIndex = map.get(o);
if (currentIndex == null)
return false;
int size = list.size();
E lastE = list.get(size - 1);
list.set(currentIndex, lastE);
list.remove(size - 1);
map.put(lastE, currentIndex);
map.remove(o);
return true;
}
@Override
public Iterator<E> iterator() {
return new Iterator<E>() {
private int index = 0;
@Override
public boolean hasNext() {
return index < list.size();
}
@Override
public E next() {
return list.get(index++);
}
@Override
public void remove() {
RandomChoiceSet.this.remove(list.get(--index));
}
};
}
}
EDIT
The more I think about this, the more useful this Set implementation seems. Although I have not done so in the code above, you can include a get() method to get elements by index. Not only is this nicer than the other ways for getting any element from a Set (e.g. set.iterator().next() or set.stream().findAny().get()), but it enables you to iterate over the set with an explicit index.
int n = set.size();
for (int i = 0; i < n; i++) {
Object o = set.get(i);
// do something with o.
}
I haven't done proper benchmarking, but iterating over a Set like this seems to be several times faster than iterating over a HashSet using a for-each loop. Obviously there are no checks for concurrent modification, and this Set implementation is more memory-hungry than a HashSet (although not as memory-hungry as a LinkedHashSet), but overall I think it's pretty interesting.
I think you need a custom implementation of HashMap, because this requires fine-grained control. Randomly picking a bucket is easy enough, and you could use ArrayLists in your buckets to get random access.
To make it clear, you would implement a classic HashMap, but instead of using a LinkedList in each bucket you would have an ArrayList. Picking a random element would then be as easy as:
randomly picking an index rd0 between 0 and nbBuckets
randomly picking an index rd1 between 0 and buckets[rd0].size()
This is a perfect example of why the Collection interface should have a getRandom() method and why it is a fundamental missing feature in the API. HashSet's implementation of getRandom() would just call its private HashMap's getRandom() method, and so forth, until the data representation is either indexable or iterable, at which point the getRandom() logic would be implemented.
Basically, this getRandom() method would vary in complexity according to the underlying implementation. But since, under the hood, all data eventually has to be stored either as arrays or linked lists, there is a lot of optimization being cast aside by not having a Collection-aware getRandom().
Think about what a Collection is: can you retrieve a random element from a collection in the real world? Yes, and so you should be able to in proper OO code.
However, the API doesn't offer this, so if you can't afford to build your own class you are stuck with iterating through the items in your HashSet and returning the element at index random.nextInt(size()).
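A small sketch of that fallback (assuming the set is non-empty; names are illustrative):
import java.util.Iterator;
import java.util.Random;
import java.util.Set;

public final class RandomFromSet {
    private static final Random RANDOM = new Random();

    static <E> E randomElement(Set<E> set) {
        int target = RANDOM.nextInt(set.size()); // index of the element to return
        Iterator<E> it = set.iterator();
        for (int i = 0; i < target; i++) {
            it.next(); // skip ahead
        }
        return it.next();
    }
}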
If you can afford to build your own implementation, which seems to be your case, then your suggestion is a fair approach. However, I don't see why you implement your own anonymous Iterator; this should do just fine:
@Override
public Iterator<E> iterator() {
return set.iterator();
}
I'm looking for a way to tell whether two sets of different element types are identical, given that I can state a one-to-one relation between those element types. Is there a standard way of doing this in Java, or maybe in Guava or Apache Commons?
Here is my own implementation of this task. For example, I have two element classes which I know how to compare. For simplicity, I compare them by their id field:
class ValueObject {
public int id;
public ValueObject(int id) { this.id=id; }
public static ValueObject of(int id) { return new ValueObject(id); }
}
class DTO {
public int id;
public DTO(int id) { this.id=id; }
public static DTO of(int id) { return new DTO(id); }
}
Then I define an interface which does the comparison
interface TwoTypesComparator<L,R> {
boolean areIdentical(L left, R right);
}
And the actual method for comparing sets looks like this
public static <L,R> boolean areIdentical(Set<L> left, Set<R> right, TwoTypesComparator<L,R> comparator) {
if (left.size() != right.size()) return false;
boolean found;
for (L l : left) {
found = false;
for (R r : right) {
if (comparator.areIdentical(l, r)) {
found = true; break;
}
}
if (!found) return false;
}
return true;
}
Example of a client code
HashSet<ValueObject> valueObjects = new HashSet<ValueObject>();
valueObjects.add(ValueObject.of(1));
valueObjects.add(ValueObject.of(2));
valueObjects.add(ValueObject.of(3));
HashSet<DTO> dtos = new HashSet<DTO>();
dtos.add(DTO.of(1));
dtos.add(DTO.of(2));
dtos.add(DTO.of(34));
System.out.println(areIdentical(valueObjects, dtos, new TwoTypesComparator<ValueObject, DTO>() {
@Override
public boolean areIdentical(ValueObject left, DTO right) {
return left.id == right.id;
}
}));
I'm looking for the standard solution to this task. Any suggestions on how to improve this code are also welcome.
This is what I would do in your case. You have sets. Sets are hard to compare, but on top of that, you want to compare on their id.
I see only one proper solution: normalize the wanted values (extract their ids), sort those ids, and then compare them in order, because if you don't sort before comparing you can accidentally skip over duplicates and/or values.
Keep in mind that Java 8 lets you be lazy with streams, so don't assume that extracting, then sorting, then copying has to be slow; laziness keeps it rather fast compared to iterative solutions.
HashSet<ValueObject> valueObjects = new HashSet<>();
valueObjects.add(ValueObject.of(1));
valueObjects.add(ValueObject.of(2));
valueObjects.add(ValueObject.of(3));
HashSet<DTO> dtos = new HashSet<>();
dtos.add(DTO.of(1));
dtos.add(DTO.of(2));
dtos.add(DTO.of(34));
boolean areIdentical = Arrays.equals(
valueObjects.stream()
.mapToInt((v) -> v.id)
.sorted()
.toArray(),
dtos.stream()
.mapToInt((d) -> d.id)
.sorted()
.toArray()
);
You want to generalize the solution? No problem.
public static <T extends Comparable<?>> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor) {
return Arrays.equals(
vos.stream()
.map(voKeyExtractor)
.sorted()
.toArray(),
dtos.stream()
.map(dtoKeyExtractor)
.sorted()
.toArray()
);
}
And for a T that is not comparable:
public static <T> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor, Comparator<T> comparator) {
return Arrays.equals(
vos.stream()
.map(voKeyExtractor)
.sorted(comparator)
.toArray(),
dtos.stream()
.map(dtoKeyExtractor)
.sorted(comparator)
.toArray()
);
}
You mention Guava and if you don't have Java 8, you can do the following, using the same algorithm:
List<Integer> voIds = FluentIterable.from(valueObjects)
.transform(valueObjectIdGetter())
.toSortedList(intComparator());
List<Integer> dtoIds = FluentIterable.from(dtos)
.transform(dtoIdGetter())
.toSortedList(intComparator());
return voIds.equals(dtoIds);
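The snippet above assumes helper methods such as valueObjectIdGetter(), dtoIdGetter() and intComparator(); a rough sketch of what they might look like (hypothetical, using com.google.common.base.Function and com.google.common.collect.Ordering):
private static Function<ValueObject, Integer> valueObjectIdGetter() {
    return new Function<ValueObject, Integer>() {
        @Override
        public Integer apply(ValueObject vo) { return vo.id; }
    };
}

private static Function<DTO, Integer> dtoIdGetter() {
    return new Function<DTO, Integer>() {
        @Override
        public Integer apply(DTO dto) { return dto.id; }
    };
}

private static Comparator<Integer> intComparator() {
    return Ordering.<Integer>natural(); // Guava's Ordering implements Comparator
}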
Another solution would be to use List instead of Set (if you are allowed to do so). List has a method called get(int index) that retrieves the element at the specified index and you can compare them one by one when both your lists have the same size. More on lists: http://docs.oracle.com/javase/7/docs/api/java/util/List.html
Also, avoid using public variables in your classes. A good practice is to make your variables private and use getter and setter methods.
Instantiate lists and add values
List<ValueObject> list = new ArrayList<>();
List<DTO> list2 = new ArrayList<>();
list.add(ValueObject.of(1));
list.add(ValueObject.of(2));
list.add(ValueObject.of(3));
list2.add(DTO.of(1));
list2.add(DTO.of(2));
list2.add(DTO.of(34));
Method that compares lists
public boolean compareLists(List<ValueObject> list, List<DTO> list2) {
if(list.size() != list2.size()) {
return false;
}
for(int i = 0; i < list.size(); i++) {
if(list.get(i).id == list2.get(i).id) {
continue;
} else {
return false;
}
}
return true;
}
Your current method is incorrect or at least inconsistent for general sets.
Imagine the following:
L contains the Pairs (1,1), (1,2), (2,1).
R contains the Pairs (1,1), (2,1), (2,2).
Now, if your id is the first value, your compare would return true, but are those sets really equal? The problem is that you have no guarantee that there is at most one element with a given id in each set, because you don't know how L and R implement equals, so my advice would be not to compare sets of different types.
If you really need to compare two Sets the way you described, I would go for copying all elements from L into a List, then going through R and, every time you find a matching element in L, removing it from the list. Just make sure you use a LinkedList instead of an ArrayList.
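A sketch of that approach (reusing the TwoTypesComparator interface from the question and plain java.util imports):
public static <L, R> boolean areIdentical(Set<L> left, Set<R> right,
        TwoTypesComparator<L, R> comparator) {
    if (left.size() != right.size()) return false;
    // LinkedList makes removal through the iterator cheap.
    List<L> remaining = new LinkedList<>(left);
    for (R r : right) {
        boolean found = false;
        Iterator<L> it = remaining.iterator();
        while (it.hasNext()) {
            if (comparator.areIdentical(it.next(), r)) {
                it.remove(); // each element of L can be matched at most once
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return remaining.isEmpty();
}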
You could override equals and hashCode on the DTO/value object and then do: leftSet.size() == rightSet.size() && leftSet.containsAll(rightSet)
If you can't alter the element classes, make a decorator and have the sets be of the decorator type.
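A sketch of the decorator idea (the IdKey wrapper below is hypothetical):
final class IdKey {
    private final int id;
    IdKey(int id) { this.id = id; }
    @Override
    public boolean equals(Object o) {
        return o instanceof IdKey && ((IdKey) o).id == this.id;
    }
    @Override
    public int hashCode() { return id; }
}

static boolean sameIds(Set<ValueObject> valueObjects, Set<DTO> dtos) {
    Set<IdKey> leftIds = new HashSet<>();
    for (ValueObject vo : valueObjects) leftIds.add(new IdKey(vo.id));
    Set<IdKey> rightIds = new HashSet<>();
    for (DTO dto : dtos) rightIds.add(new IdKey(dto.id));
    // Set.equals() already checks both size and containment.
    return leftIds.equals(rightIds);
}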
In Java, I have several SortedSet instances. I would like to iterate over the elements from all these sets. One simple option is to create a new SortedSet, such as TreeSet x, deep-copy the contents of all the individual sets y_1, ..., y_n into it using x.addAll(y_i), and then iterate over x.
But is there a way to avoid deep copy? Couldn't I just create a view of type SortedSet which would somehow encapsulate the iterators of all the inner sets, but behave as a single set?
I'd prefer an existing, tested solution, rather than writing my own.
I'm not aware of any existing solution to accomplish this task, so I took the time to write one for you. I'm sure there's room for improvement on it, so take it as a guideline and nothing else.
As Sandor points out in his answer, there are some limitations that must be imposed or assumed. One such limitation is that every SortedSet must be sorted relative to the same order, otherwise there's no point in comparing their elements without creating a new set (representing the union of every individual set).
Here follows my code example which, as you'll notice, is relatively more complex than just creating a new set and adding all elements to it.
import java.util.*;
final class MultiSortedSetView<E> implements Iterable<E> {
private final List<SortedSet<E>> sets = new ArrayList<>();
private final Comparator<? super E> comparator;
MultiSortedSetView() {
comparator = null;
}
MultiSortedSetView(final Comparator<? super E> comp) {
comparator = comp;
}
@Override
public Iterator<E> iterator() {
return new MultiSortedSetIterator<E>(sets, comparator);
}
MultiSortedSetView<E> add(final SortedSet<E> set) {
// You may remove this `if` if you already know
// every set uses the same comparator.
if (comparator != set.comparator()) {
throw new IllegalArgumentException("Different Comparator!");
}
sets.add(set);
return this;
}
@Override
public boolean equals(final Object o) {
if (this == o) { return true; }
if (!(o instanceof MultiSortedSetView)) { return false; }
final MultiSortedSetView<?> n = (MultiSortedSetView<?>) o;
return sets.equals(n.sets) &&
(comparator == n.comparator ||
(comparator != null ? comparator.equals(n.comparator) :
n.comparator.equals(comparator)));
}
@Override
public int hashCode() {
int hash = comparator == null ? 0 : comparator.hashCode();
return 37 * hash + sets.hashCode();
}
@Override
public String toString() {
return sets.toString();
}
private final static class MultiSortedSetIterator<E>
implements Iterator<E> {
private final List<Iterator<E>> iterators;
private final PriorityQueue<Element<E>> queue;
private MultiSortedSetIterator(final List<SortedSet<E>> sets,
final Comparator<? super E> comparator) {
final int n = sets.size();
queue = new PriorityQueue<Element<E>>(n,
new ElementComparator<E>(comparator));
iterators = new ArrayList<Iterator<E>>(n);
for (final SortedSet<E> s: sets) {
iterators.add(s.iterator());
}
prepareQueue();
}
@Override
public E next() {
final Element<E> e = queue.poll();
if (e == null) {
throw new NoSuchElementException();
}
if (!insertFromIterator(e.iterator)) {
iterators.remove(e.iterator);
}
return e.element;
}
@Override
public boolean hasNext() {
return !queue.isEmpty();
}
private void prepareQueue() {
final Iterator<Iterator<E>> iterator = iterators.iterator();
while (iterator.hasNext()) {
if (!insertFromIterator(iterator.next())) {
iterator.remove();
}
}
}
private boolean insertFromIterator(final Iterator<E> i) {
while (i.hasNext()) {
final Element<E> e = new Element<>(i.next(), i);
if (!queue.contains(e)) {
queue.add(e);
return true;
}
}
return false;
}
private static final class Element<E> {
final E element;
final Iterator<E> iterator;
Element(final E e, final Iterator<E> i) {
element = e;
iterator = i;
}
@Override
public boolean equals(final Object o) {
if (o == this) { return true; }
if (!(o instanceof Element)) { return false; }
final Element<?> e = (Element<?>) o;
return element.equals(e.element);
}
}
private static final class ElementComparator<E>
implements Comparator<Element<E>> {
final Comparator<? super E> comparator;
ElementComparator(final Comparator<? super E> comp) {
comparator = comp;
}
@Override
@SuppressWarnings("unchecked")
public int compare(final Element<E> e1, final Element<E> e2) {
if (comparator != null) {
return comparator.compare(e1.element, e2.element);
}
return ((Comparable<? super E>) e1.element)
.compareTo(e2.element);
}
}
}
}
The inner workings of this class are simple to grasp. The view keeps a list of sorted sets, the ones you want to iterate over. It also needs the comparator that will be used to compare elements (null to use their natural ordering). You can only add (distinct) sets to the view.
The rest of the magic happens in the Iterator of this view. This iterator keeps a PriorityQueue of the elements that will be returned from next() and a list of iterators from the individual sets.
This queue will have, at all times, at most one element per set, and it discards repeating elements. The iterator also discards empty and used up iterators. In short, it guarantees that you will traverse every element exactly once (as in a set).
Here's an example on how to use this class.
SortedSet<Integer> s1 = new TreeSet<>();
SortedSet<Integer> s2 = new TreeSet<>();
SortedSet<Integer> s3 = new TreeSet<>();
SortedSet<Integer> s4 = new TreeSet<>();
// ...
MultiSortedSetView<Integer> v =
new MultiSortedSetView<Integer>()
.add(s1)
.add(s2)
.add(s3)
.add(s4);
for (final Integer i: v) {
System.out.println(i);
}
I do not think that is possible unless it is some special case, which would require a custom implementation.
For example take the following two comparators:
public class Comparator1 implements Comparator<Long> {
@Override
public int compare(Long o1, Long o2) {
return o1.compareTo(o2);
}
}
public class Comparator2 implements Comparator<Long> {
@Override
public int compare(Long o1, Long o2) {
return -o1.compareTo(o2);
}
}
and the following code:
TreeSet<Long> set1 = new TreeSet<Long>(new Comparator1());
TreeSet<Long> set2 = new TreeSet<Long>(new Comparator2());
set1.addAll(Arrays.asList(new Long[] {1L, 3L, 5L}));
set2.addAll(Arrays.asList(new Long[] {2L, 4L, 6L}));
System.out.println(Joiner.on(",").join(set1.descendingIterator()));
System.out.println(Joiner.on(",").join(set2.descendingIterator()));
This will result in:
5,3,1
2,4,6
and is useless for any Comparator operating on the head elements of the given iterators.
This makes it impossible to create such a general solution. It is only possible if all sets are sorted using the same Comparator; however, that cannot be guaranteed or enforced by any implementation that accepts multiple SortedSet instances (e.g. anything that accepts SortedSet<Long> instances would accept both TreeSet objects above).
A slightly more formal approach:
Given that y_1,..,y_n are all sorted sets, if:
the intersection of these sets is empty, and
there is an ordering of the sets such that for every pair y_i, y_(i+1) it holds that y_i[x] <= y_(i+1)[1], where x is the index of the last element of the sorted set y_i and <= denotes the comparison function,
then the sets y_1,..,y_n can be read one after another as a single SortedSet.
Now consider what happens if these conditions are not met:
if the first condition is not met, then the definition of a Set is not fulfilled, so the result cannot be a Set until a merge is completed and the duplicated elements are removed (see the Set javadoc, first paragraph:
sets contain no pair of elements e1 and e2 such that e1.equals(e2))
the second condition can only be ensured by using exactly the same comparator (the <= function)
The first condition is the more important, because being a SortedSet implies being a Set, and if the definition of being a Set cannot be fulfilled, then the stronger conditions of a SortedSet definitely cannot be fulfilled.
There is a possibility that an implementation can exists which mimics the working of a SortedSet, but it will definitely not be a SortedSet.
com.google.common.collect.Sets#union from Guava will do the trick. It returns an unmodifiable view of the union of two sets, and you can iterate over it. The returned set will not be sorted, but you can then create a new sorted set from it (new TreeSet<>(...) or com.google.common.collect.ImmutableSortedSet.copyOf(...)). I see no API to create a view of a given set as a sorted set.
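A quick sketch of that with Guava (element type and values are just for illustration):
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import com.google.common.collect.ImmutableSortedSet;
import com.google.common.collect.Sets;

public class UnionDemo {
    public static void main(String[] args) {
        Set<Integer> s1 = new TreeSet<>(Arrays.asList(1, 3, 5));
        Set<Integer> s2 = new TreeSet<>(Arrays.asList(2, 3, 4));
        // Unmodifiable view, no copying, but not globally sorted.
        Set<Integer> unionView = Sets.union(s1, s2);
        // Copies the elements, but the result is sorted.
        Set<Integer> sorted = ImmutableSortedSet.copyOf(unionView);
        System.out.println(sorted); // [1, 2, 3, 4, 5]
    }
}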
If your concern is a deep copy of the objects passed to the TreeSet#addAll method, it shouldn't be. The javadoc does not indicate that a deep copy is made (and it certainly would say so if it were), and the OpenJDK implementation doesn't make one either. No copies are made; the new set simply holds additional references to the existing objects.
Since the deep-copy isn't an issue, I think worrying about this, unless you've identified this as a specific performance problem, falls into the premature optimization category.