What Java Data Structure/Solution would best fit these requirements?

I need a java data structure/solution that meets these requirements. What best fits these?
1) Objects' insertion order must be kept
2) Objects must be unique (these are database objects that are uniquely identified by a UUID).
3) If a newer object with the same ID is added, the older version of the object should be overwritten/removed
4) The Solution should be accessible by many threads.
5) When the first object added to the Structure is read/used, it should be removed from the data structure

There are a couple of possibilities here. The simplest might be to start with a LinkedHashSet. That will provide you with the uniqueness and predictable ordering that you require. Then, you could wrap the resulting set to make it thread-safe:
Set<T> s = Collections.synchronizedSet(new LinkedHashSet<T>(...));
Note: Since a Set doesn't really define a method for retrieving items from it, your code would have to manually invoke Set.remove(Object).
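For example, a minimal sketch of the delete-on-read part of that approach (DbObject is a hypothetical element type; the java.util imports are assumed):
Set<DbObject> set = Collections.synchronizedSet(new LinkedHashSet<DbObject>());

DbObject takeOldest() {
    synchronized (set) {                 // iteration over a synchronized set must be guarded manually
        Iterator<DbObject> it = set.iterator();
        if (!it.hasNext()) {
            return null;                 // nothing queued
        }
        DbObject oldest = it.next();     // first element = oldest insertion
        it.remove();                     // requirement 5: remove on read
        return oldest;
    }
}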
Alternatively, you could wrap a LinkedHashMap, which does provide a hook for the delete-on-read semantics you require:
class DeleteOnReadMap<K, V> implements Map<K, V> {
    private final Map<K, V> m = new LinkedHashMap<K, V>();

    // implement Map "read" methods with delete-on-read semantics
    public V get(Object key) {
        return m.remove(key); // reading an entry also removes it
    }
    // (other read methods here)

    // implement remaining Map methods by forwarding to the inner Map
    public V put(K key, V value) {
        return m.put(key, value);
    }
    // (remaining Map methods here)
}
Finally, wrap an instance of your custom Map to make it thread-safe:
Map<K, V> m = Collections.synchronizedMap(new DeleteOnReadMap<K, V>(...));

My thought is something like the following:
Collections.synchronizedMap(new LinkedHashMap<K, V>());
I think that takes care of everything except requirement 5, but you can do that by using the remove() method instead of get().
This won't be quite as efficient as a ConcurrentMap would be: synchronization locks the entire map on every access, whereas I think ConcurrentMap implementations can use read-write locks and selective locking on only part of the map to allow multiple non-conflicting accesses to proceed simultaneously. If you wanted, you could probably get better performance by writing your own subclass of some existing Map implementation.
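A minimal sketch of that idea, assuming the objects are keyed by their UUID (DbObject is a hypothetical value type):
Map<UUID, DbObject> map =
        Collections.synchronizedMap(new LinkedHashMap<UUID, DbObject>());

UUID id = UUID.randomUUID();
map.put(id, new DbObject(id));   // putting the same UUID again overwrites the older version
DbObject obj = map.remove(id);   // requirement 5: remove() reads and deletes in one call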

1) Objects' insertion order must be kept
This is true of any "normal" data structure - an array, ArrayList, or tree. So avoid self-balancing or self-sorting data structures: heaps, hash tables, or move-to-front trees (splay trees, for example). Then again, you could use one of those structures, but then you would have to keep track of the insertion order in each node.
2) Objects must be unique (these are database objects that are uniquely identified by a UUID).
Keep a unique identifier associated with each object. In a C program, the pointer to that node is unique (I guess this applies to object references in Java as well). If the node's pointer is not sufficient to maintain uniqueness, then you need to add a field to each node which you guarantee to have a unique value.
3) If a newer object with the same ID is added, the older version of the object should be overwritten/removed
Where do you want to place the node? Do you want to replace the existing node, or do you want to delete the old node and then add the new one to the end? This is important because it relates to your requirement #1, where the order of insertion must be preserved.
4) The Solution should be accessible by many threads.
The only way I can think of to do this is to implement some sort of locking. Java lets you wrap structures and code within a synchronized block.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
Kind of like a "dequeue" operation.
Seems like an ArrayList is a pretty good option for this: simply because of #5. The only problem is that searches are linear. But if you have a relatively small amount of data, then it isn't really that much of a problem.
Otherwise, like others have said: a HashMap or even a Tree of some sort would work - but that will depend on the frequency of accesses. (For example, if the "most recent" element is most likely to be accessed, I'd use a linear structure. But if accesses will be of "random" elements, I'd go with a HashMap or Tree.)

The solutions talking about LinkedHashSet would be a good starting point.
However, you would have to override the equals and hashCode methods on the objects that you are going to be putting in the set in order to satisfy your requirement number 3.
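For instance, a sketch of such an override, assuming a hypothetical DbObject class whose identity is its UUID:
import java.util.UUID;

class DbObject {
    private final UUID id;   // database identity
    // other fields ...

    DbObject(UUID id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DbObject)) return false;
        return id.equals(((DbObject) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
Note that a plain Set keeps the element that is already present when you add an equal one, so to actually replace the older version you would remove the old element first (or use a Map keyed by the UUID instead).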

Sounds like you have to create your own data structure, but it sounds like a pretty easy class assignment.
Basically you start with something like an array or stack, but then you have to extend it for the rest of the functionality.
You can look at the contains method, as you will need that.

Related

Is it possible/required to speed up HashMap operations on same entry?

Suppose I wish to check HashMap entry and then replace it:
if (check(hashMap.get(key))) {
    hashMap.put(key, newValue);
}
This causes the search procedure inside the HashMap to run twice: once for the get and again for the put. That looks inefficient. Is it possible to modify the value of an already-found entry of a Map?
UPDATE
I know I can make a wrapper, and I know there are problems with mutating an entry. But the question is WHY? Maybe HashMap remembers the last search to speed up a repeated one? Why are there no methods for such an operation?
EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;
// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
    public Map.Entry<K, V> getEntryPublic(K key) {
        return getEntry(key);
    }
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
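For example, a small sketch of that wrapper idea (the key/value choices here are illustrative only):
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class MutableValueDemo {
    public static void main(String[] args) {
        Map<Integer, AtomicReference<String>> map = new HashMap<>();
        map.put(42, new AtomicReference<>("old"));

        AtomicReference<String> ref = map.get(42);   // a single search in the map
        if (ref != null && "old".equals(ref.get())) {
            ref.set("new");                          // mutate in place; no second put/search
        }
        System.out.println(map.get(42).get());       // prints "new"
    }
}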
Too much information for a comment on your question. Check the documentation for HashMap.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Constant time means that the get and put operations always take the same amount of time, O(1). The total time required will then be linear in the number of times you need to loop, O(n).
You can change the entry if it is mutable. One example of where you might do this is
private final Map<String, List<String>> map = new LinkedHashMap<>();

public void put(String key, String value) {
    List<String> list = map.get(key);
    if (list == null)
        map.put(key, list = new ArrayList<>());
    list.add(value);
}
This allows you to update a value, but you can't find and replace a value in one operation.
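A quick usage sketch of the snippet above (the keys and values are made up):
put("fruits", "apple");
put("fruits", "banana");
// The List stored in the map is mutated in place; there is no second
// lookup-and-replace of the map entry itself.
System.out.println(map.get("fruits"));   // [apple, banana]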
Take a look at trove ( http://trove4j.sourceforge.net/ ), their maps do have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how this is implemented internally, but I would guess that since Trove is made to be highly performant, there will be only one lookup.

Maintaining multiple indexes with guava cache (in-memory table)

I'm trying to implement a simplified in-memory cached "table" where there are 2 types of indexes: primary and secondary.
Primary index maps a single key (primary key) to a unique value (Map interface)
Secondary index maps a single key to a Collection of values (Multimap fits the bill)
Very similar to a table in the RDBMS world, where one has several lookup columns. Sometimes you want to search by PK; sometimes you want to return a list of rows based on a common property. Right now, there is no need for operations other than equality (=) (i.e. no range queries or pattern matching).
Add cache semantics to the above data structure (eviction, data population/cache loader, refresh etc.) and that's pretty much what is needed.
I would like to ask your advice on how to best approach the given problem. Should it be a Cache per index, or a Cache (for the PK) plus a (synchronized) Multimap for the secondary indexes?
Any help is much appreciated.
Regards.
You can replace a Map with a Guava com.google.common.cache.Cache. It doesn't support Multimap-type semantics, so you'd have to use
Cache<K, ? extends List<V>>
in that case.
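For example, a sketch using Guava's CacheBuilder (Row is a hypothetical value type; imports from com.google.common.cache and java.util.concurrent are assumed):
Cache<String, List<Row>> byProperty = CacheBuilder.newBuilder()
        .maximumSize(10_000)                        // eviction by size
        .expireAfterWrite(10, TimeUnit.MINUTES)     // eviction by age
        .build();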
For the sake of simplicity I would make the 'primary index' a subset of the secondary index - i.e. you have a single index that returns a list of values for a given key and primary keys just return a list with a single value.
The challenge here is to maintain the integrity of the two indexes regardless of whether you use two caches, or one cache for the PK plus a multimap.
Maybe you should create a new cache class (say TableCache) which extends com.google.common.cache.Cache; internally this class can maintain a multimap instance variable for the secondary index (which can be a ConcurrentHashMap).
Then you can override Cache methods (put, get, invalidate etc) to keep the secondary index in sync.
Of course, you have to provide a get function to retrieve values based on secondary index.
This approach gives you the ability to maintain the integrity of primary and secondary indexes.
// Sketch: Cache is an interface, so this assumes Guava's ForwardingCache as the base class.
public class TableCache<K, V> extends ForwardingCache<K, V> {
    private final Cache<K, V> delegate = CacheBuilder.newBuilder().build();
    private final Map<K, List<V>> secondaryIndex = new ConcurrentHashMap<K, List<V>>();

    @Override protected Cache<K, V> delegate() { return delegate; }

    @Override public void put(K key, V value) {
        super.put(key, value);
        // Update secondaryIndex here to keep it in sync
    }
}
I have had this problem many times myself.
What would fix this problem is better STM (software transactional memory) support in Java. It is very difficult to make non-blocking atomic data structures. The best I have seen is Multiverse.
Thus @vladimir's answer is probably the best, but I would say that the stored collections should be immutable, and you will have to retrieve the whole collection on refresh/cache miss etc. Also, if you change one of the members of the multiset, you're going to have a tough time knowing how to update its parent and invalidate the cache.
Otherwise, I would consider something like Redis for larger data sets, which does support atomic operations on map and list combinations.

java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in Java, which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for fast retrieval by key, but I need both in the same list. At one point I wrote my own, but it does not implement the collections interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using 2 lists, a priority queue and a hashmap, both contain the same objects. I use the priority queue to traverse the first part of the list in sorted order, the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need to have the list sorted by a different comparator than what is used for retrieval by key; the list is sorted by a long value, while key retrieval uses a String.
Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.
It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists which use "list pipelines" to efficiently propagate changes -- their SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage, one that traverses all items, the next that traverses every K items on average (for a given constant K), the next that traverses every K^2 items on average, etc.
TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
I haven't tested this so I might be wrong, so consider this just an attempt.
Use a TreeMap, and wrap the key of this map as an object which has two attributes (the String which you use as the key in the HashMap, and the long which you use to maintain the sort order in the PriorityQueue). For this object, override the equals and hashCode methods using the String, and implement the Comparable interface using the long.
Why don't you encapsulate your solution to a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the faster/better-suited collection. Just make sure that calls to write methods (add/remove/put) are forwarded to both collections. Remember indirect accesses, like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate them (Collections.unmodifiableXXX will help here in most cases).
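As a rough sketch of that encapsulation idea (class and field names are made up; the sort key is a long and the lookup key a String, matching the question):
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SortedKeyedStore<V> {
    private final TreeMap<Long, V> bySortKey = new TreeMap<>();   // sorted traversal
    private final Map<String, V> byId = new HashMap<>();          // fast lookup by key

    synchronized void put(long sortKey, String id, V value) {     // writes hit both collections
        bySortKey.put(sortKey, value);
        byId.put(id, value);
    }

    synchronized V getById(String id) {
        return byId.get(id);
    }

    synchronized V firstInOrder() {                               // smallest sort key first
        Map.Entry<Long, V> e = bySortKey.firstEntry();
        return e == null ? null : e.getValue();
    }
}
Removal would need both the sort key and the String id, so that the two maps stay in sync.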

In Java, what precautions should be taken when using a Set as a key in a Map?

I'm not sure what the prevailing opinions are about using dynamic objects such as Sets as keys in Maps.
I know that typical Map implementations (for example, HashMap) use a hash code to decide which bucket to put the entry in, and that if that hash code should change somehow (perhaps because the contents of the Set change at all), that could mess up the HashMap by causing the bucket to be computed differently from when the Set was initially inserted into the HashMap.
However, if I ensure that the Set contents do not change at all, does that make this a viable option? Even so, is this approach generally considered error-prone because of the inherently volatile nature of Sets (even if precautions are taken to ensure that they are not modified)?
It looks like Java allows one to designate function arguments as final; this is perhaps one minor precaution that could be taken?
Do people even do stuff like this in commercial/open-source practice? (put List, Set, Map, or the like as keys in Maps?)
I guess I should describe sort of what I'm trying to accomplish with this, so that the motivation will become more clear and perhaps alternate implementations could be suggested.
What I am trying to accomplish is to have something of this sort:
class TaggedMap<T, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
...essentially, to be able to "tag" certain data (V) with certain keys (T) and write other auxiliary functions to access/modify the data and do other fancy stuff with it (ie. return a list of all entries satisfying some criteria of keys). The function of the _keys is to serve as a sort of index, to facilitate looking up the values without having to cycle through all of _map's entries.
In my case, I intend to specifically use T = String, V = Integer. Someone I talked to about this had suggested substituting a String for the Set, viz, something like:
class TaggedMap<V> {
    Map<String, V> _map;
    Map<T, Set<String>> _keys;
}
where the key in _map is of the sort "key1;key2;key3" with keys separated by delimiter. But I was wondering if I could accomplish a more generalised version of this rather than having to enforce a String with delimiters between the keys.
Another thing I was wondering was whether there was some way to make this as a Map extension. I was envisioning something like:
class TaggedMap<Set<T>, V> implements Map<Set<T>, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
However, I was not able to get this to compile, probably due to my inferior understanding of generics. With this as a goal, can anyone fix the above declaration so that it works according to the spirit of what I had described, or suggest some slight structural modifications? In particular, I am wondering about the "implements Map<Set<T>, V>" clause, and whether it is possible to declare such a complex interface implementation.
You are correct that if you ensure that
(1) the Set contents are not modified, and
(2) the Sets themselves are not modified,
then it is perfectly safe to use them as keys in a Map.
It's difficult to ensure that (1) is not violated accidentally. One option might be to specifically design the class being stored inside the Set so that all instances of that class are immutable. This would prevent anyone from accidentally changing one of the Set keys, so (1) would not be possible. For example, if you use a Set<String> as a key, you don't need to worry about the Strings inside the Set changing due to external modification.
You can enforce (2) quite easily by using the Collections.unmodifiableSet method, which returns a wrapped view of a Set that cannot be modified. This can be done to any Set, which means that it's probably a very good idea to use something like this for your keys.
Hope this helps! And if your user name means what I think it does, good luck learning every language! :-)
As you mention, sets can change, and even if you prevent the set from changing (i.e., the elements it contains), the elements themselves may change. Those factor into the hashcode.
Can you describe what you are trying to do in higher-level terms?
#templatetypedef's answer is basically correct. You can only safely use a Set as a key in some data structure if the set's state cannot change while it is a key. If the set's state changes, the invariants of the data structure are violated and operations on it will give incorrect results.
The wrappers created using Collections.unmodifiableSet can help, but there is a hidden gotcha. If the original set is still directly reachable, the application could modify it; e.g.
public void addToMap(Set key, Object value) {
    someMap.put(Collections.unmodifiableSet(key), value);
}

// but ...
Set someKey = ...
addToMap(someKey, "Hi mum");
...
someKey.add("something"); // Ooops ...
To guarantee that this can't happen, you need to make a deep copy of the set before you wrap it. That could be expensive.
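A small sketch of that copy-then-wrap precaution, reusing the someMap/key/value names from the snippet above (the String element type is just for illustration):
Set<String> safeKey = Collections.unmodifiableSet(new HashSet<>(key));
someMap.put(safeKey, value);

key.add("something");   // the caller's set changes...
// ...but safeKey, and therefore the key stored in the map, is unaffected.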
Another problem with using a Set as a key is that it can be expensive. There are two general approaches to implementing key/value mappings: using a hashCode method, or using a compareTo method that implements an ordering. Both of these are expensive for sets.

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gave me fast lookup and also fast indexing (by order inserted), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: if I want to index an entry, I can index the List to get the key and then use the returned key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
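For instance, a minimal sketch of the ReadWriteLock idea (the class and method names just mirror the question). With an exclusive write lock, a plain ArrayList/HashMap pair is sufficient, so the concurrent/synchronized wrappers are no longer needed:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IndexedStore<K, T> {
    private final List<K> keys = new ArrayList<>();
    private final Map<K, T> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    T getEntryAt(int index) {
        lock.readLock().lock();               // many readers may hold this at once
        try {
            return entries.get(keys.get(index));
        } finally {
            lock.readLock().unlock();
        }
    }

    void addOrReplaceEntry(K key, T value) {
        lock.writeLock().lock();              // writers are exclusive
        try {
            if (entries.put(key, value) == null) {
                keys.add(key);                // record insertion order only for new keys
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}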
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list, and in the map store the key -> index mapping?
Then for getEntry you only need one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization would work the same way.
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.
