All,
Can anyone please let me know exactly what the performance differences are between the two? The CodeRanch site provides a brief overview of the internal calls that are needed when using keySet() and get(), but it would be great if anyone could provide exact details about the flow when the keySet() and get() methods are used. This would help me understand the performance issues better.
The most common case where using entrySet is preferable to keySet is when you are iterating through all of the key/value pairs in a Map.
This is more efficient:
for (Map.Entry entry : map.entrySet()) {
Object key = entry.getKey();
Object value = entry.getValue();
}
than:
for (Object key : map.keySet()) {
Object value = map.get(key);
}
Because in the second case, for every key in the keySet the map.get() method is called, which - in the case of a HashMap - requires that the hashCode() and equals() methods of the key object be evaluated in order to find the associated value*. In the first case that extra work is eliminated.
Edit: This is even worse if you consider a TreeMap, where a call to get is O(log(n)), i.e. the comparator may need to run log2(n) times (n = size of the Map) before finding the associated value.
*Some Map implementations have internal optimisations that check the objects' identity before the hashCode() and equals() are called.
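To make the difference concrete, here is a rough micro-benchmark sketch (the class name is made up, and it uses plain System.nanoTime() timing, so treat the numbers as indicative only; a serious comparison would use a harness such as JMH):

import java.util.HashMap;
import java.util.Map;

public class IterationBenchmark {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put("key" + i, i);
        }

        // entrySet(): one pass over the table, no extra lookups
        long start = System.nanoTime();
        long sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        System.out.println("entrySet:   "
                + (System.nanoTime() - start) / 1_000_000 + " ms (sum=" + sum + ")");

        // keySet() + get(): every get() re-hashes the key and calls
        // equals() to locate the value all over again
        start = System.nanoTime();
        sum = 0;
        for (String key : map.keySet()) {
            sum += map.get(key);
        }
        System.out.println("keySet+get: "
                + (System.nanoTime() - start) / 1_000_000 + " ms (sum=" + sum + ")");
    }
}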
First of all, this depends entirely on which type of Map you're using. But since the JavaRanch thread talks about HashMap, I'll assume that that's the implementation you're referring to. And let's assume also that you're talking about the standard API implementation from Sun/Oracle.
Secondly, if you're concerned about performance when iterating through your hash map, I suggest you have a look at LinkedHashMap. From the docs:
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
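As a minimal sketch of what that means in practice (the class name is mine, and the 1 << 20 capacity is deliberately exaggerated): an over-allocated HashMap pays for all of its empty buckets during iteration, while a LinkedHashMap only walks its doubly-linked list of live entries.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapDemo {
    public static void main(String[] args) {
        // Huge capacity, two entries: iteration still scans ~1M buckets.
        Map<Integer, String> plain = new HashMap<>(1 << 20);
        plain.put(1, "a");
        plain.put(2, "b");

        // Same capacity, but iteration follows the entry list: 2 steps.
        Map<Integer, String> linked = new LinkedHashMap<>(1 << 20);
        linked.put(1, "a");
        linked.put(2, "b");

        for (Map.Entry<Integer, String> e : linked.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}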
HashMap.entrySet()
The source-code for this implementation is available. The implementation basically just returns a new HashMap.EntrySet. A class which looks like this:
private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
public Iterator<Map.Entry<K,V>> iterator() {
return newEntryIterator(); // returns a HashIterator...
}
// ...
}
and a HashIterator looks like
private abstract class HashIterator<E> implements Iterator<E> {
Entry<K,V> next; // next entry to return
int expectedModCount; // For fast-fail
int index; // current slot
Entry<K,V> current; // current entry
HashIterator() {
expectedModCount = modCount;
if (size > 0) { // advance to first entry
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null);
}
}
final Entry<K,V> nextEntry() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
Entry<K,V> e = next;
if (e == null)
throw new NoSuchElementException();
if ((next = e.next) == null) {
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null);
}
current = e;
return e;
}
// ...
}
So there you have it... That's the code dictating what will happen when you iterate through an entrySet. It walks through the entire array, which is as long as the map's capacity.
HashMap.keySet() and .get()
Here you first need to get hold of the set of keys; iterating over it takes time proportional to the capacity of the map (as opposed to the size, as for the LinkedHashMap). After this is done, you call get() once for each key. Sure, in the average case, with a good hashCode() implementation this takes constant time. However, it will inevitably require lots of hashCode() and equals() calls, which will obviously take more time than just doing an entry.getValue() call.
Here is the link to an article comparing the performance of entrySet(), keySet() and values(), and advice regarding when to use each approach.
Apparently the use of keySet() is faster (besides being more convenient) than entrySet() as long as you don't need to Map.get() the values.
Related
I have a class, Employee, let's say, and my hashCode function for this class is really bad (let's say it always returns a constant). My code looks like the following.
public class Employee {
private String name;
public Employee(String name) {
this.name = name;
}
@Override
public int hashCode() { return 1; }
@Override
public boolean equals(Object object) {
if(null == object || !(object instanceof Employee)) {
return false;
}
Employee other = (Employee)object;
return this.name.equals(other.name);
}
}
Let's say I want to use Employee as the key in a Map, and so I can do something like the following.
public static void main(String[] args) {
Map<Employee, Long> map = new HashMap<>();
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+i, 1L));
}
System.out.println(map.size());
}
How come when I run this code, I always get 1,000 as the size?
Using Employee as a key seems to be "good" in the following sense.
It is immutable
Two employees that are equals always generate the same hash code
What I expected was that since the output of hashCode is always 1, then map.size() should always be 1. But it is not. Why? If I have a Map<Integer,Integer>, and I do map.put(1, 1) followed by map.put(1, 2), I would only expect the size to be 1.
The equals method must somehow be coming into play here, but I'm not sure how.
Any pointers are appreciated.
Your loop
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+System.currentTimeMillis(), 1L));
}
executes within a couple of milliseconds, so System.currentTimeMillis() will be returning the same value for the vast majority of the iterations of your loop. So, several hundred of your johns will have the exact same name + number.
Then, we have Java's unfortunate Map interface, which does not have an add() method (which one would reasonably expect to throw an exception if the item already exists); instead, it only has a put() method, which will either add or replace items, without failing. So, most of your johns get overwritten by subsequent johns, without any increase in the map size, and without any exception being thrown to give you a hint about what you are doing wrong.
Furthermore, you seem to be a bit confused as to exactly what the effect of a bad hashCode() function is on a map. A bad hashCode() simply results in collisions. Collisions in a hashmap do not cause items to be lost; they only cause the internal structure of the map to not be very efficient. Essentially, a constant hashCode() will result in a degenerate map which internally looks like a linked list. It will be inefficient both for insertions and for deletions, but no items will be lost due to that.
Items will be lost due to a bad equals() method, or due to overwriting them with newer items. (Which is the case in your code.)
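To make both effects visible, here is a minimal sketch (the class name is mine) that reuses the Employee class from the question, with its constant hashCode() and name-based equals():

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        Map<Employee, Long> map = new HashMap<>();

        // Different names: same hashCode() but not equals(), so both
        // survive, chained inside the same bucket.
        map.put(new Employee("john"), 1L);
        map.put(new Employee("jane"), 2L);
        System.out.println(map.size()); // 2

        // Same name: equals() is true, so put() replaces the value and
        // the size does not grow.
        map.put(new Employee("john"), 3L);
        System.out.println(map.size()); // still 2
    }
}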
Mike's answer is right about what is causing this. But the real reason that it's happening is this:
In the put method of HashMap, it first checks the hash code for each entry. If the hash code is equal to the hash code of your new key, then it checks equals(). If equals() returns true, it just replaces the existing entry's value with the new one; otherwise it adds a new key-value pair. That's where it gets busted: because sometimes your equals() will return true (the currentTimeMillis-based names collide) and sometimes it won't, hence different sizes every time.
Just pay attention to the equals in the code below (java HashMap).
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
If your hashcode is the same for every entry then your time complexity will be O(n) because the hashcode creates buckets to store your elements. If you only create a single bucket then you have to traverse the entire bucket to get your element.
If however, your hashcode is unique for every element then you will have a unique bucket and will only have to traverse a single element.
Bucket lookups (Hash) are O(1) so the better the hashcode the better the time complexity.
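A rough sketch of the slowdown, using a hypothetical BadKey class and crude wall-clock timing (so the numbers are indicative only):

import java.util.HashMap;
import java.util.Map;

public class BucketDemo {
    // Hypothetical key: every instance collides into the same bucket.
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        // Every put scans the entries already chained in the one bucket
        // (a linked list on older JDKs; Java 8+ treeifies large bins,
        // which softens but does not remove the penalty).
        Map<BadKey, Integer> bad = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < 50_000; i++) {
            bad.put(new BadKey(i), i);
        }
        System.out.println("constant hashCode:  "
                + (System.nanoTime() - start) / 1_000_000 + " ms");

        // Distinct hash codes: each put goes straight to its own bucket.
        Map<Integer, Integer> good = new HashMap<>();
        start = System.nanoTime();
        for (int i = 0; i < 50_000; i++) {
            good.put(i, i);
        }
        System.out.println("distinct hashCodes: "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}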
I think you have a misconception about what hash buckets in a HashMap are for.
When you put two objects which are not equal but have the same hashCode into a HashMap, both elements will be present in the HashMap, in the same hash bucket. An element is only overwritten when an element which has the same hashCode and is equals() to an existing element is put into the HashMap.
The hash buckets make the HashMap fast at lookup, because when searching for an element, only the elements in the bucket corresponding to the hashCode need to be considered. This is why it is generally a bad idea to write a hash function which is constant.
Your hashCode has to comply with certain requirements, e.g. equal objects must return equal hash codes.
But if your implementation is not solid, it will cause performance problems: if many of your objects share the same hash code, lookups simply become O(n) instead of O(1). In your case the map is effectively putting all the items into a single bucket's list, but nothing is lost, so the size is 1000.
I am trying to implement my own LRU cache. Yes, I know that Java provides a LinkedHashMap for this purpose, but I am trying to implement it using basic data structures.
From reading about this topic, I understand that I need a HashMap for O(1) lookup of a key and a linked list for management of the "least recently used" eviction policy. I found these references that all use a standard library hashmap but implement their own linked list:
"What data structures are commonly used for LRU caches and quickly
locating objects?" (stackoverflow.com)
"What is the best way to Implement a LRU Cache?" (quora.com)
"Implement a LRU Cache in C++" (uml.edu)
"LRU Cache (Java)" (programcreek.com)
The hash table is supposed to directly store a linked list Node as I show below. My cache should store Integer keys and String values.
However, in Java the LinkedList collection does not expose its internal nodes, so I can't store them inside the HashMap. I could instead have the HashMap store indices into the LinkedList, but then getting to an item would require O(N) time. So I tried to store a ListIterator instead.
import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.LinkedList;
import java.util.ListIterator;
public class LRUCache {
private static final int DEFAULT_MAX_CAPACITY = 10;
protected Map<Integer, ListIterator> _map = new HashMap<Integer, ListIterator>();
protected LinkedList<String> _list = new LinkedList<String>();
protected int _size = 0;
protected int _maxCapacity = 0;
public LRUCache(int maxCapacity) {
_maxCapacity = maxCapacity;
}
// Put the key, value pair into the LRU cache.
// The value is placed at the head of the linked list.
public void put(int key, String value) {
// Check to see if the key is already in the cache.
ListIterator iter = _map.get(key);
if (iter != null) {
// Key already exists, so remove it from the list.
iter.remove(); // Problem 1: ConcurrentModificationException!
}
// Add the new value to the front of the list.
_list.addFirst(value);
_map.put(key, _list.listIterator(0));
_size++;
// Check if we have exceeded the capacity.
if (_size > _maxCapacity) {
// Remove the least recently used item from the tail of the list.
_list.removeLast();
}
}
// Get the value associated with the key.
// Move value to the head of the linked list.
public String get(int key) {
String result = null;
ListIterator iter = _map.get(key);
if (iter != null) {
//result = iter
// Problem 2: HOW DO I GET THE STRING FROM THE ITERATOR?
}
return result;
}
public static void main(String argv[]) throws Exception {
LRUCache lruCache = new LRUCache(10);
lruCache.put(10, "This");
lruCache.put(20, "is");
lruCache.put(30, "a");
lruCache.put(40, "test");
lruCache.put(30, "some"); // Causes ConcurrentModificationException
}
}
So this leads to three problems:
Problem 1: I am getting a ConcurrentModificationException when I update the LinkedList using the iterator that I store in the HashMap.
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.remove(LinkedList.java:919)
at LRUCache.put(LRUCache.java:31)
at LRUCache.main(LRUCache.java:71)
Problem 2. How do I retrieve the value pointed to by the ListIterator? It seems I can only retrieve the next() value.
Problem 3. Is there any way to implement this LRU cache using the Java collections LinkedList, or do I really have to implement my own linked list?
1) This isn't really what Iterators are for.
By contract, if you modify the list without using the iterator -- as you do here
_list.addFirst(value);
then ALL OPEN ITERATORS on that list should throw ConcurrentModificationException. They were open to a version of the list that no longer exists.
2) A LinkedList is not, exactly, a linked list of nodes. It's a java.util.List, whose backing implementation is a doubly linked list of nodes. That List contract is why it does not expose references to the backing implementation -- so operations like "remove this node, as a node, and move it to the head" are no good. This encapsulation is for your own protection (same as the concurrent mod exception) -- it allows your code to rely on the List semantics of a LinkedList (iterability, for example) without worry that some joker two cubes away was hacking away at its innards and broke the contract.
3) What you really need here is NOT a LinkedList. What you need is a Stack that allows you to move any arbitrary entry to the head and dump the tail. You are implying that you want a fast seek time to an arbitrary entry and also a fast remove and a fast add, AND you want to be able to find the tail at any moment in case you need to remove it.
Fast seek time == HashSomething
Fast add/remove of arbitrary elements == LinkedSomething
Fast addressing of the final element == SomekindaList
4) You're going to need to build your own linking structure...or use a LinkedHashMap.
PS LinkedHashSet is cheating, it's implemented using a LinkedHashMap.
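For completeness, here is roughly what the LinkedHashMap route looks like: the three-argument constructor with accessOrder=true plus an overridden removeEldestEntry (the class name is mine; the initial capacity and load factor below are just the usual defaults):

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxCapacity;

    public LruCache(int maxCapacity) {
        // accessOrder=true: get() and put() move the entry to the back,
        // so the eldest entry is always the least recently used one.
        super(16, 0.75f, true);
        this.maxCapacity = maxCapacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true evicts the eldest entry.
        return size() > maxCapacity;
    }
}

With that, new LruCache<Integer, String>(10) behaves like the cache the question is trying to build.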
I'll deal with problem 3 first:
As you point out in your question, LinkedList (like all well designed generic collections) hides the details of the implementation such as the nodes containing the links. In your case you need your hash map to reference these links directly as the values of the map. To do otherwise (e.g. having indirection through a third class) would defeat the purpose of an LRU cache to allow very low overhead on value access. But this is not possible with standard Java Collections - they don't (and shouldn't) provide direct access to internal structures.
So the logical conclusion of this is that, yes, you need to implement your own way of storing the order in which items in the cache have been used. That doesn't have to be a double-linked list. Those have traditionally been used for LRU caches because the most common operation is to move a node to the top of the list when it is accessed. That is an incredibly cheap operation in a double-linked list requiring just four nodes to be relinked with no memory allocation or free.
Problem 1 & 2:
Essentially the root cause here is that you are trying to use iterators as a cursor. They are designed to be created, stepped through to perform some operation, and then disposed of. Even if you get past the problems you are having, I expect there will be further problems behind them. You're putting a square peg in a round hole.
So my conclusion is that you need to implement your own way to hold values in a class that keeps track of order of access. However it can be incredibly simple: only three operations are required: create, get value and remove from tail. Both create and get value must move the node to the head of the list. No inserting or deleting from the middle of the list. No deleting the head. No searching. Honestly dead simple.
Hopefully this will get you started :-)
public class <K,V> LRU_Map implements Map<K,V> {
private class Node {
private final V value;
private Node previous = null;
private Node next = null;
public Node(V value) {
this.value = value;
touch();
if (tail == null)
tail = this;
}
public V getValue() {
touch();
return value;
}
private void touch() {
if (head != this) {
unlink();
moveToHead();
}
}
private void unlink() {
if (tail == this)
tail = prev;
if (prev != null)
prev.next = next;
if (next != null)
next.prev = prev;
}
private void moveToHead() {
prev = null;
next = head;
head = this;
}
public void remove() {
assert this == tail;
assert this != head;
assert next == null;
if (prev != null)
prev.next = null;
tail = prev;
}
}
private final Map<K,Node> map = new HashMap<>();
private Node head = null;
private Node tail = null;
public void put(K key, V value) {
if (map.size() >= MAX_SIZE) {
assert tail != null;
tail.remove();
}
map.put(key, new Node(value));
}
public V get(K key) {
if (map.containsKey(key))
return map.get(key).getValue();
else
return null;
}
// and so on for other Map methods
}
Another way to skin this cat would be to implement a very simple class that extends the LinkedList, but runs any modifications to the list (e.g. add, remove, etc) inside of a "synchronized" block. You'll need to run your HashMap pseudo-pointer through the get() every time, but it should work just fine. e.g.
...
private final Object lock = new Object(); // lock object guarding list mutations
//override LinkedList's implementations...
@Override
public E remove(int index) { synchronized(lock) { return super.remove(index); } }
...
If you have Eclipse or IntelliJ IDEA, then you should be able to auto-generate the method stubs you need almost instantly, and you can evaluate which ones need to be locked.
I understand that HashSet is based on the HashMap implementation but is used when you need a unique set of elements. So why, in the following code, when we put the same objects into the map and the set, is the size of both collections equal to 1? Shouldn't the map size be 2? Because if the size of both collections is equal, I don't see any difference between using these two collections.
Set testSet = new HashSet<SimpleObject>();
Map testMap = new HashMap<Integer, SimpleObject>();
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());
The output is 1 and 1.
SimpleObject code
public class SimpleObject {
private String dataField1;
private int dataField2;
public SimpleObject(){}
public SimpleObject(String data1, int data2){
this.dataField1 = data1;
this.dataField2 = data2;
}
public String getDataField1() {
return dataField1;
}
public int getDataField2() {
return dataField2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((dataField1 == null) ? 0 : dataField1.hashCode());
result = prime * result + dataField2;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SimpleObject other = (SimpleObject) obj;
if (dataField1 == null) {
if (other.dataField1 != null)
return false;
} else if (!dataField1.equals(other.dataField1))
return false;
if (dataField2 != other.dataField2)
return false;
return true;
}
}
The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.
The difference between the two should be obvious:
in a Map you store key-value pairs
in a Set you store only the keys
In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, the put method is invoked on the underlying map map.put(obj, DUMMY) - where the dummy object is a private static final Object DUMMY = new Object(). So the map is populated with your object as key, and a value that is of no interest.
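The relevant part of HashSet looks essentially like this (simplified; in the JDK source the dummy value is named PRESENT rather than DUMMY):

public class HashSet<E> /* ... */ {
    private transient HashMap<E, Object> map = new HashMap<>();

    // Dummy value shared by every key in the backing map.
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        // put() returns the previous value, or null if the key was new,
        // so add() returns true only when the element was not present.
        return map.put(e, PRESENT) == null;
    }
}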
A key in a Map can only map to a single value. So the second time you put in to the map with the same key, it overwrites the first entry.
In case of the HashSet, adding the same object will be more or less a no-op. In case of a HashMap, putting a new key,value pair with an existing key will overwrite the existing value to set a new value for that key. Below I've added equals() checks to your code:
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
//If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " , (simpleObject1.equals(simpleObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
//This is a no-brainer as you've the exact same key, but lets keep it consistent
//If this returns true, the 2nd put will overwrite the 1st key-value pair.
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println("Are the keys equal? ", (key.equals(key));
System.out.println(testSet.size());
System.out.println(testMap.size());
I just wanted to add to these great answers, the answer to your last dilemma. You wanted to know what is the difference between these two collections, if they are returning the same size after your insertion. Well, you can't really see the difference here, because you are inserting two values in the map with the same key, and hence changing the first value with the second. You would see the real difference (among the others) should you have inserted the same value in the map, but with the different key. Then, you would see that you can have duplicate values in the map, but you can't have duplicate keys, and in the set you can't have duplicate values. This is the main difference here.
The answer is simple: it is the nature of HashSet.
HashSet internally uses a HashMap with a dummy object named PRESENT as the value, and the KEY of that HashMap will be your object.
hash(simpleObject1) and hash(simplObject2) will return the same int. So?
When you add simpleObject1 to the HashSet it will be put into the internal HashMap with simpleObject1 as the key. Then when you call add(simplObject2), you will get false because an equal key is already present in the internal HashMap.
As a little extra info, HashSet makes effective use of the hash function to provide O(1) performance by relying on the object's equals() and hashCode() contract. (Note that, contrary to a common misconception, HashSet does permit a single null element, since the backing HashMap supports a null key; the docs quoted below confirm this.)
I think the major difference is:
HashSet is stable in the sense that it doesn't replace a duplicate value (if a duplicate arrives after the first unique element was inserted, it is simply discarded), while HashMap will make the effort to replace the old value with the new duplicate's value. So there is an extra overwrite cost in HashMap when inserting a duplicate item.
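A small sketch of that difference (class name mine), using String elements/keys so the effect is easy to see:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReplaceDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("igor");
        // The second add is a no-op: the first element is kept.
        System.out.println(set.add("igor")); // false

        Map<String, Integer> map = new HashMap<>();
        map.put("igor", 1);
        // The second put keeps the key but replaces the value,
        // returning the old one.
        System.out.println(map.put("igor", 2)); // 1
        System.out.println(map.get("igor"));    // 2
    }
}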
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set.
More Details
Performance wise, is there really a big difference between using:
ArrayList.contains(o) vs foreach|iterator
LinkedList.contains(o) vs foreach|iterator
Of course, for the foreach|iterator loops, I'll have to explicitly compare the methods and return true or false accordingly.
The object I'm comparing is an object where equals() and hashcode() are both properly overridden.
EDIT: Don't need to know about containsValue after all, sorry about that. And yes, I'm stupid... I realized how stupid my question was about containsKey vs foreach, nevermind about that, I don't know what I was thinking. I basically want to know about the ones above (edited out the others).
EDITED:
With the new form of the question no longer including HashMap and TreeMap, my answer is entirely different. I now say no.
I'm sure that other people have answered this, but in both LinkedList and ArrayList, contains() just calls indexOf(), which iterates over the collection.
It's possible that there are tiny performance differences, both between LinkedList and ArrayList and between contains and foreach, but there aren't any big differences.
This makes no difference, since contains(o) calls indexOf(o), which simply loops like this:
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
(Checked in ArrayList)
Without benchmarking, contains should be faster or the same in all cases.
For 1 and 2, it doesn't need to call the iterator methods. It can loop internally. Both ArrayList and LinkedList implement contains in terms of indexOf
ArrayList - indexOf is a C-style for loop on the backing array.
LinkedList - indexOf walks the linked list in a C-style for loop.
For 3 and 4, you have to distinguish between containsKey and containsValue.
3. HashMap, containsKey is O(1). It works by hashing the key, getting the associated bucket, then walking the linked list. containsValue is O(n) and works by simply checking every value in every bucket in a nested for loop.
4. TreeMap, containsKey is O(log n). It checks whether it's in range, then searches the red-black tree. containsValue, which is O(n), uses an in-order walk of the tree.
ArrayList.contains does
return indexOf(o) >= 0;
where
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
It's similar for LinkedList, only it uses .next() to iterate through the elements, so not much difference there.
public int indexOf(Object o) {
int index = 0;
if (o==null) {
for (Entry e = header.next; e != header; e = e.next) {
if (e.element==null)
return index;
index++;
}
} else {
for (Entry e = header.next; e != header; e = e.next) {
if (o.equals(e.element))
return index;
index++;
}
}
return -1;
}
HashMap.containsKey uses the hash of the key to fetch all keys with that hash (which is fast) and then uses equals only on those keys, so there's an improvement there; but containsValue() goes through the values with a for loop.
TreeMap.containsKey seems to do an informed search using a comparator to find the key faster, so still better; but containsValue still seems to go through the entire tree until it finds a value.
Overall I think you should use the methods, since they're easier to write than doing a loop every time :).
I think using contains is better, because generally the library implementation is more efficient than a manual implementation of the same thing. Check whether you can, during object construction or afterwards, pass in a comparator you have written that takes care of your custom equals and hashCode implementation.
Thanks,
Krishna
Traversing the container with foreach/iterator is always O(n) time.
ArrayList/LinkedList search is O(n) as well.
HashMap.containsKey() is O(1) amortized time.
TreeMap.containsKey() is O(log n) time.
For both HashMap and TreeMap, containsValue() is O(n), but there may be implementations optimized so that containsValue() is as fast as containsKey().
I know that it's typically a big no-no to remove from a list using java's "foreach" and that one should use iterator.remove(). But is it safe to remove() if I'm looping over a HashMap's keySet()? Like this:
for(String key : map.keySet()) {
Node n = map.get(key).optimize();
if(n == null) {
map.remove(key);
} else {
map.put(key, n);
}
}
EDIT:
I hadn't noticed that you weren't really adding to the map - you were just changing the value within the entry. In this case, pstanton's (pre-edit1) solution is nearly right, but you should call setValue on the entry returned by the iterator, rather than calling map.put. (It's possible that map.put will work, but I don't believe it's guaranteed - whereas the docs state that entry.setValue will work.)
for (Iterator<Map.Entry<String, Node>> it = map.entrySet().iterator();
it.hasNext();)
{
Map.Entry<String, Node> entry = it.next();
Node n = entry.getValue().optimize();
if(n == null)
{
it.remove();
}
else
{
entry.setValue(n);
}
}
(It's a shame that entry doesn't have a remove method, otherwise you could still use the enhanced for loop syntax, making it somewhat less clunky.)
Old answer
(I've left this here for the more general case where you just want to make arbitrary modifications.)
No - you should neither add to the map nor remove from it directly. The set returned by HashMap.keySet() is a view onto the keys, not a snapshot.
You can remove via the iterator, although that requires that you use the iterator explicitly instead of via an enhanced for loop.
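For example, a sketch against the map from the question, handling only the removal case (the entry-set version above is the one to use when you also need to update values; java.util.Iterator is assumed to be imported):

for (Iterator<String> it = map.keySet().iterator(); it.hasNext();) {
    String key = it.next();
    if (map.get(key).optimize() == null) {
        it.remove(); // removes the corresponding mapping from the map
    }
}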
One simple option is to create a new set from the original:
for (String key : new HashSet<String>(map.keySet())) {
...
}
At this point you're fine, because you're not making any changes to the set.
EDIT: Yes, you can definitely remove elements via the key set iterator. From the docs for HashMap.keySet():
The set supports element removal,
which removes the corresponding
mapping from the map, via the
Iterator.remove, Set.remove,
removeAll, retainAll, and clear
operations. It does not support the
add or addAll operations.
This is even specified within the Map interface itself.
1 I decided to edit my answer rather than just commenting on pstanton's, as I figured the extra information I'd got for similar-but-distinct situations was sufficiently useful to merit this answer staying.
you should use the entry set:
for(Iterator<Map.Entry<String, Node>> it = map.entrySet().iterator(); it.hasNext();)
{
Map.Entry<String, Node> entry = it.next();
Node n = entry.getValue().optimize();
if(n == null)
it.remove();
else
entry.setValue(n);
}
EDIT fixed code