Java: Iterate over a set while contents of set are being modified

I wish to iterate over a set, but the contents of the set will be modified during its iteration. I wish to iterate over the original set as it was at the time the iterator was created, and not visit any of the new elements added to the set. How is this possible? Is this the default behavior of Set, or how else can I accomplish this?
One way I can think of is to get a new set from the original set which won't be modified but this seems inelegant and there must be a better solution.

Taking a snapshot of the set sounds like exactly the right solution to me, if you want to make sure you don't see any new elements. There are some sets such as ConcurrentSkipListSet which will allow you to keep iterating, but I can't see any guarantees around behaviour of an iterator in terms of seeing new elements.
EDIT: CopyOnWriteArraySet has the requirements you need, but writes are expensive, which sounds like it's not appropriate for you.
Those are the only sets I can see in java.util.concurrent, which is the natural package for such collections. Taking a copy is still likely to be simpler :)
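To make the snapshot semantics concrete, here is a small illustrative sketch of CopyOnWriteArraySet: its iterator is guaranteed to traverse the elements as they were when the iterator was created, which is exactly the behavior asked for (at the cost of copying on every write).

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArraySet;

public class Main {
    public static void main(String[] args) {
        CopyOnWriteArraySet<String> set = new CopyOnWriteArraySet<>();
        set.add("a");
        set.add("b");

        int visited = 0;
        for (Iterator<String> it = set.iterator(); it.hasNext(); ) {
            it.next();
            // Safe: the iterator works over the snapshot taken when it
            // was created, so this addition is never visited.
            set.add("c");
            visited++;
        }
        System.out.println(visited);    // 2
        System.out.println(set.size()); // 3 ("c" was added, just not visited)
    }
}
```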

EDIT: This answer was designed for a single-threaded case, since I had interpreted the OP's question as avoiding comodification rather than avoiding issues from multithreading. I'm leaving this answer here in case it ends up being useful to anyone in the future who is using a single-threaded approach.
There is no direct way to accomplish this. However, one option that is quite nice is to have two sets - the main set, which you iterate over, and a secondary set into which you insert all the new elements that need to be added. You can then iterate over the primary set, and then once that's finished go and use addAll to add all the new elements to the primary set.
For example:
Set<T> masterSet = /* ... */
Set<T> newElems = /* ... */

for (T obj : masterSet) {
    /* ... do something to each object ... */
}

masterSet.addAll(newElems);
Hope this helps!

Making a copy of the Set is the elegant solution.
Set<Obj> copyOfObjs = new HashSet<Obj>(originalSet);
for (Obj original : originalSet) {
    // add some more stuff to copyOfObjs
}

You can use a ConcurrentHashMap with dummy values (the set elements become the map's keys).
Or a ConcurrentSkipListSet
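A sketch of the dummy-value approach, using the ConcurrentHashMap.newKeySet() convenience factory added in Java 8 (the demo itself is illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) {
        // Java 8+ convenience factory: a Set view over a
        // ConcurrentHashMap with dummy (Boolean.TRUE) values.
        Set<Integer> set = ConcurrentHashMap.newKeySet();
        set.add(1);
        set.add(2);

        for (int i : set) {
            // No ConcurrentModificationException; whether the new
            // element is visited by this same loop is unspecified
            // (the iterator is only weakly consistent).
            set.add(i + 10);
        }
        System.out.println(set.contains(11)); // true
    }
}
```

Note this does not give the OP's "snapshot" guarantee: newly added elements may or may not be seen during iteration.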

As others have suggested here, there is no optimal solution to what you're searching for. It all depends on the use case of your application and on how the set is used.
Since Set is an interface, you could define your own DoubleSet class which implements Set and, let's say, uses two HashSet fields.
When an iterator is retrieved, you mark one of these sets as being in "iteration-only mode", so that the add method adds only to the other set.
I am still new to Stack Overflow, so I need to figure out how to embed code in my answers :( but in general you should have a class called MySet (generic of type T) implementing Set of generic type T.
You need to implement all the methods, and have two fields: one called iterationSet and the other called insertionSet.
You will also have a boolean field indicating whether to insert into the iteration set or not. When the iterator() method is called, this boolean should be set to false, meaning you should insert only into the insertionSet.
You should have a method that synchronizes the contents of the two sets once you're done with the iterator.
I hope I was clear.
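A minimal sketch of the two-set idea described above; the names (DoubleSet, endIteration) are illustrative, not from any library, and only the relevant methods are shown rather than the full Set interface:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Sketch of the two-set idea; not a complete Set implementation.
class DoubleSet<T> implements Iterable<T> {
    private final Set<T> iterationSet = new HashSet<>();
    private final Set<T> insertionSet = new HashSet<>();
    private boolean iterating = false;

    public boolean add(T elem) {
        // While an iteration is in progress, divert new elements so the
        // live iterator never sees them (and never throws a
        // ConcurrentModificationException).
        return iterating ? insertionSet.add(elem) : iterationSet.add(elem);
    }

    public boolean contains(T elem) {
        return iterationSet.contains(elem) || insertionSet.contains(elem);
    }

    @Override
    public Iterator<T> iterator() {
        iterating = true;
        return iterationSet.iterator();
    }

    // Merge the diverted elements back once the iteration is done.
    public void endIteration() {
        iterating = false;
        iterationSet.addAll(insertionSet);
        insertionSet.clear();
    }
}

public class Main {
    public static void main(String[] args) {
        DoubleSet<String> set = new DoubleSet<>();
        set.add("a");
        set.add("b");
        for (String s : set) {
            set.add(s + "!"); // diverted; iteration only sees "a", "b"
        }
        set.endIteration();
        System.out.println(set.contains("a!")); // true
    }
}
```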

Now that OP has clarified the requirements, the solutions are
Copy the set before iterating
Use CopyOnWriteArraySet
Write your own custom code and try to be smarter than a lot of smart people.
The drawback of #1 is that you always copy the set, even when it may not be needed (e.g. if no insertions actually occur while you are iterating). I'd suggest option #2, unless you can prove that frequent inserts are causing a real performance issue.

Related

Does HashMap keep its order (not insertion order)

Sorry if an answer to this is out there; I just could not find it.
I don't care about insertion order; I just want to ensure that a HashMap keeps its order as long as no puts happen in between.
If I have the following code:
StringBuilder keyChecker = new StringBuilder("");
for (String field : hashmap().keySet()) {
    keyChecker.append(field).append(",");
}

int x = 0;
for (String field : hashmap().keySet()) {
    setAny(checker, x++, hashmap().get(field));
}
Will the (1st, 2nd, 3rd, etc.) field always match the same one the next time I call the HashMap's keySet()?
From my tests it seems like it always does, but I am not sure about any edge cases that I may come across.
Yes. It will keep its order if no new items are added. An idle map does not just decide to rearrange itself. But that order is non-deterministic and can change once items are added.
WJS is correct. That said, it is very bad style to depend on this. If you actually depend on the order of the entries, I would suggest using a TreeMap or one of the Apache Commons implementations of OrderedMap.
You might be able to get by with your assumption that the order will be stable right now ... but if another developer works on the code, that assumption might not be known, and the code will break in unexpected ways that will be a big headache for somebody to solve.
If you depend on entry order, use a data structure that guarantees that order.
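To make the contrast concrete, a small illustrative sketch: TreeMap guarantees key order, LinkedHashMap guarantees insertion order, while HashMap guarantees neither.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class Main {
    public static void main(String[] args) {
        // TreeMap iterates in sorted key order, regardless of
        // insertion order.
        Map<String, Integer> sorted = new TreeMap<>();
        sorted.put("b", 2);
        sorted.put("a", 1);
        sorted.put("c", 3);
        System.out.println(sorted.keySet()); // [a, b, c]

        // LinkedHashMap iterates in insertion order.
        Map<String, Integer> insertion = new LinkedHashMap<>();
        insertion.put("b", 2);
        insertion.put("a", 1);
        insertion.put("c", 3);
        System.out.println(insertion.keySet()); // [b, a, c]
    }
}
```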

Is there any data structure that has no duplicates but can have elements added to it while being iterated over?

I know a set has no duplicates but the issue is that I can't add elements to it while iterating over it using an iterator or for each loop. Is there any other way? Thank you.
The ConcurrentHashMap class can be used for this. For example:
Set<T> set = Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
(You can replace <T, Boolean> with <> and let the compiler infer the types. I wrote it as above for illustrative purposes.)
The Collections::newSetFromMap javadoc says:
Returns a set backed by the specified map. The resulting set displays the same ordering, concurrency, and performance characteristics as the backing map. In essence, this factory method provides a Set implementation corresponding to any Map implementation.
Since ConcurrentHashMap allows simultaneous iteration and updates, so does the Set produced as above. The catch is that an iteration may not see the effect of additions or removals made while iterating.
The concurrency properties of iteration can be inferred from the javadoc for ConcurrentHashMap.
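A quick illustrative demo of the newSetFromMap approach above: the weakly consistent iterator tolerates concurrent additions, though whether a given addition is visible to the ongoing iteration is unspecified.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) {
        Set<String> set =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        set.add("a");
        set.add("b");

        for (String s : set) {
            // No ConcurrentModificationException is ever thrown; the
            // iteration may or may not visit this new element.
            set.add("added-during-iteration");
        }
        System.out.println(set.contains("added-during-iteration")); // true
        System.out.println(set.size() >= 3);                        // true
    }
}
```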
Is there any other way?
It depends on your requirements, but there are potentially ways to avoid the problem. For example, you could:
copy the set before iterating it, OR
add the new elements to another set, and add them to the existing set after ... or while ... iterating.
However, these are unlikely to work without a concurrency bottleneck (e.g. 1.) or a difference in behavior (e.g. 2.)
Not sure whether the approach below fixes your problem, but you can try it:
HashSet<Integer> original = new HashSet<>();
HashSet<Integer> elementsToAdd = new HashSet<>();

elementsToAdd.add(element); // while iterating original, collect new elements here
original.addAll(elementsToAdd); // once done iterating, just add them all

Returning a private collection using a getter method in Java

I have a number of Java classes that use private sets or lists internally. I want to be able to return these sets/lists using a get...List() method.
The alternatives I am considering:
return a reference to the internal object
construct a new set/list and fill it up (this seems bad practice?)
use Collections.unmodifiableList(partitions);
Which of these is the most common / best way to solve this issue?
There are many aspects to consider here. As others already have pointed out, the final decision depends on what your intention is, but some general statements regarding the three options:
1. return a reference to the internal object
This may impose problems. You can hardly ever guarantee a consistent state when you are doing this. The caller might obtain the list, and then do nasty things
List<Element> list = object.getList();
list.clear();
list.add(null);
...
Maybe not with a malicious intention but accidentally, because he assumed that it was safe/allowed to do this.
2. construct a new set/list and fill it up (this seems bad practice?)
This is not a "bad practice" in general. In any case, it's by far the safest solution in terms of API design. The only caveat here may be that there might be a performance penalty, depending on several factors. E.g. how many elements are contained in the list, and how the returned list is used. Some (questionable?) patterns like this one
for (int i = 0; i < object.getList().size(); i++)
{
    Element element = object.getList().get(i);
    ...
}
might become prohibitively expensive (although one could argue whether in this particular case, it was the fault of the user who implemented it like that, the general issue remains valid)
3. use Collections.unmodifiableList(partitions);
This is what I personally use rather often. It's safe in the sense of API design, and involves only a negligible overhead compared to copying the list. However, it's important for the caller to know whether this list may change after he obtained a reference to it.
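A small illustrative sketch of the view semantics: the caller cannot modify the list, but changes to the backing list remain visible through it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<String> internal = new ArrayList<>();
        internal.add("a");
        List<String> view = Collections.unmodifiableList(internal);

        // The caller cannot modify the list through the view ...
        try {
            view.add("b");
        } catch (UnsupportedOperationException e) {
            System.out.println("modification rejected");
        }

        // ... but changes to the backing list remain visible:
        // it is a view, not a copy.
        internal.add("c");
        System.out.println(view.size()); // 2
    }
}
```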
This leads to...
The most important recommendation:
Document what the method is doing! Don't write a comment like this
/**
 * Returns the list of elements.
 *
 * @return The list of elements.
 */
public List<Element> getList() { ... }
Instead, specify what you can make sure about the list. For example
/**
* Returns a copy of the list of elements...
*/
or
/**
* Returns an unmodifiable view on the list of elements...
*/
Personally, I'm always torn between the two options that one has for this sort of documentation:
Make clear what the method is doing and how it may be used
Don't expose or overspecify implementation details
So for example, I'm frequently writing documentations like this one:
/**
* Returns an unmodifiable view on the list of elements.
* Changes in this object will be visible in the returned list.
*/
The second sentence is a clear and binding statement about the behavior. It's important for the caller to know that. For a concurrent application (and most applications are concurrent in one way or the other), this means that the caller has to assume that the list may change concurrently after he obtained the reference, which may lead to a ConcurrentModificationException when the change happens while he is iterating over the list.
However, such detailed specifications limit the possibilities for changing the implementation afterwards. If you later decide to return a copy of the internal list, then the behavior will change in an incompatible way.
So sometimes I also explicitly specify that the behavior is not specified:
/**
* Returns an unmodifiable list of elements. It is unspecified whether
* changes in this object will be visible in the returned list. If you
* want to be informed about changes, you may attach a listener to this
* object using this-and-that method...
*/
These questions are mainly important when you intend to create a public API. Once you have implemented it one way or another, people will rely on the behavior in one way or the other.
So coming back to the first point: It always depends on what you want to achieve.
Your decision should be based on one thing (primarily)
Allow other methods to modify the original collection ?
Yes : return a reference of the internal object.
No :
construct a new set/list and fill it up (this seems bad practice? -- No. Not at all. This is called Defensive programming and is widely used).
use Collections.unmodifiableList(partitions);
return a reference to the internal object
In this case the receiving end is able to modify the object's set or list, which might not be what you want. If you do want users to modify the state of the object, then this is the simplest approach.
construct a new set/list and fill it up (this seems bad practice?)
This is an example of a shallow copy: the returned collection object is a new one, but the contained objects are the same. So any change to an element's state will affect the objects in the actual collection.
use Collections.unmodifiableList(partitions);
In this case it returns an unmodifiable view of the specified list. This method allows modules to provide users with "read-only" access to internal lists. This is the best practice in situations where you want to keep the object's state safe.
I believe the best solution is to return an unmodifiable list. If compared to the construction of a new list, returning an unmodifiable "proxy" of the original list may save the client from implicitly generating a lot of unnecessary lists. On the other hand, if the client really needs to have a modifiable list, let it create a new list by itself.
The problem you still have to consider is that the objects contained into the list may be modified. There is no cheap and easy const-correctness in Java.
The second option is definitely the right way to go.
The other two options depend on your requirements.
If you are not going to modify the list values outside the class, return an unmodifiable list.
otherwise, just return the reference.

In Java, what precautions should be taken when using a Set as a key in a Map?

I'm not sure what are prevailing opinions about using dynamic objects such as Sets as keys in Maps.
I know that typical Map implementations (for example, HashMap) use a hash code to decide which bucket to put the entry in, and that if that hash code should change somehow (perhaps because the contents of the Set change), the bucket would be incorrectly computed (compared to how the Set was initially inserted into the HashMap), which could mess up the HashMap.
However, if I ensure that the Set contents do not change at all, does that make this a viable option? Even so, is this approach generally considered error-prone because of the inherently volatile nature of Sets (even if precautions are taken to ensure that they are not modified)?
It looks like Java allows one to designate function arguments as final; this is perhaps one minor precaution that could be taken?
Do people even do stuff like this in commercial/open-source practice? (put List, Set, Map, or the like as keys in Maps?)
I guess I should describe sort of what I'm trying to accomplish with this, so that the motivation will become more clear and perhaps alternate implementations could be suggested.
What I am trying to accomplish is to have something of this sort:
class TaggedMap<T, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
...essentially, to be able to "tag" certain data (V) with certain keys (T) and write other auxiliary functions to access/modify the data and do other fancy stuff with it (i.e. return a list of all entries satisfying some criteria of keys). The function of the _keys is to serve as a sort of index, to facilitate looking up the values without having to cycle through all of _map's entries.
In my case, I intend to specifically use T = String, V = Integer. Someone I talked to about this had suggested substituting a String for the Set, viz, something like:
class TaggedMap<V> {
    Map<String, V> _map;
    Map<String, Set<String>> _keys;
}
where the key in _map is of the sort "key1;key2;key3" with keys separated by delimiter. But I was wondering if I could accomplish a more generalised version of this rather than having to enforce a String with delimiters between the keys.
Another thing I was wondering was whether there was some way to make this as a Map extension. I was envisioning something like:
class TaggedMap<Set<T>, V> implements Map<Set<T>, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
However, I was not able to get this to compile, probably due to my inferior understanding of generics. With this as a goal, can anyone fix the above declaration so that it works according to the spirit of what I had described, or suggest some slight structural modifications? In particular, I am wondering about the "implements Map<Set<T>, V>" clause, whether it is possible to declare such a complex interface implementation.
You are correct that if you ensure that
The Set contents are not modified, and
The Sets themselves are not modified
then it is perfectly safe to use them as keys in a Map.
It's difficult to ensure that (1) is not violated accidentally. One option might be to specifically design the class being stored inside the Set so that all instances of that class are immutable. This would prevent anyone from accidentally changing one of the Set keys, so (1) would not be possible. For example, if you use a Set<String> as a key, you don't need to worry about the Strings inside the Set changing due to external modification.
You can make (2) possible quite easily by using the Collections.unmodifiableSet method, which returns a wrapped view of a Set that cannot be modified. This can be done to any Set, which means that it's probably a very good idea to use something like this for your keys.
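Putting the two points together, a sketch of using an unmodifiable Set as a map key (the demo values are illustrative); note that lookups work with any equal set, since Set equality and hash codes are content-based:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        Set<String> key = new HashSet<>();
        key.add("x");
        key.add("y");

        Map<Set<String>, Integer> map = new HashMap<>();
        // Wrap the key so it cannot be modified through this reference.
        map.put(Collections.unmodifiableSet(key), 42);

        // Lookups work with any equal set, wrapped or not, because
        // Set.equals/hashCode depend only on the contents.
        Set<String> probe = new HashSet<>();
        probe.add("y");
        probe.add("x");
        System.out.println(map.get(probe)); // 42
    }
}
```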
Hope this helps! And if your user name means what I think it does, good luck learning every language! :-)
As you mention, sets can change, and even if you prevent the set from changing (i.e., the elements it contains), the elements themselves may change. Those factor into the hashcode.
Can you describe what you are trying to do in higher-level terms?
#templatetypedef's answer is basically correct. You can only safely use a Set as a key in some data structure if the set's state cannot change while it is a key. If the set's state changes, the invariants of the data structure are violated and operations on it will give incorrect results.
The wrappers created using Collections.unmodifiableSet can help, but there is a hidden gotcha. If the original set is still directly reachable, the application could modify it; e.g.
public void addToMap(Set key, Object value) {
    someMap.put(Collections.unmodifiableSet(key), value);
}

// but ...
Set someKey = ...
addToMap(someKey, "Hi mum");
...
someKey.add("something"); // Ooops ...
To guarantee that this can't happen, you need to make a deep copy of the set before you wrap it. That could be expensive.
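A sketch of the copy-then-wrap fix for the snippet above (for a set of immutable elements such as Strings, copying the set itself is a sufficient "deep" copy):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        Set<String> someKey = new HashSet<>();
        someKey.add("a");

        Map<Set<String>, String> someMap = new HashMap<>();
        // Copy first, then wrap: later changes to someKey cannot
        // reach the stored key.
        someMap.put(Collections.unmodifiableSet(new HashSet<>(someKey)), "Hi mum");

        someKey.add("something"); // no longer corrupts the map

        Set<String> lookup = new HashSet<>();
        lookup.add("a");
        System.out.println(someMap.get(lookup)); // Hi mum
    }
}
```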
Another problem with using a Set as a key is that it can be expensive. There are two general approaches to implementing key/value mappings: using a hashCode method, or using a compareTo method that implements an ordering. Both of these are expensive for sets.

Best way to remove repeats in a collection in Java?

This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr The way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:

Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.

Collection<Type> noDups = new HashSet<Type>(c);

It works by creating a Set (which, by definition, cannot contain a duplicate), initially containing all the elements in c. It uses the standard conversion constructor described in The Collection Interface section.

Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate elements.

Collection<Type> noDups = new LinkedHashSet<Type>(c);

The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.

public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
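If you're on Java 8 or later, streams offer a slightly more direct route that skips the List wrapper entirely; a small sketch:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String[] arr = {"a", "b", "a"};

        // Collect the array directly into a Set; duplicates are
        // dropped by the Set semantics.
        Set<String> set = Arrays.stream(arr).collect(Collectors.toSet());
        System.out.println(set.size()); // 2
    }
}
```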
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
1. Duplicates
Concurring with the other answers: using a Set should be the most efficient way to remove duplicates. HashSet should run in O(n) time on average, while looping and removing repeats would run in the order of O(n^2). So using a Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2.
Arrays.asList() is a cheap operation that doesn't copy the array, with minimal memory overhead. You can also add the elements manually by iterating through the array:
public static <T> Set<T> arrayToSet(T[] array) {
    Set<T> set = new HashSet<>(array.length / 2);
    for (T item : array)
        set.add(item);
    return set;
}
Barring any specific performance bottlenecks that you know of (say, a collection of tens of thousands of items), converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem; only look for something fancier if there is a specific problem to solve.
