Sorry if an answer to this is out there; I just could not find it.
I don't care about insertion order; I just want to be sure that a HashMap keeps its iteration order stable as long as no puts happen in between.
If I have the following code:
StringBuilder keyChecker = new StringBuilder();
for (String field : hashmap().keySet()) {
    keyChecker.append(field).append(",");
}
for (String field : hashmap().keySet()) {
    setAny(checker, x++, hashmap().get(field));
}
Will the (1st, 2nd, 3rd, etc.) field always match the same one the next time I call keySet()?
From my tests it seems like it always does, but I am not sure about any edge cases that I may come across.
Yes. It will keep its order if no new items are added; an idle map does not just decide to rearrange itself. But that order is non-deterministic and can change once items are added.
WJS is correct. That said, it is very bad style to depend on this. If you actually depend on the order of the entries, I would suggest using a TreeMap or one of the Apache Commons implementations of OrderedMap.
You might be able to get by with your assumption that the order will be stable right now ... but if another developer works on the code, that assumption might not be known, and the code will break in unexpected ways that will be a big headache for somebody to solve.
If you depend on entry order, use a data structure that guarantees that order.
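For example, the JDK already ships two maps that do guarantee an order (a small sketch):

```java
import java.util.*;

// Two standard maps with a guaranteed iteration order:
Map<String, Integer> byInsertion = new LinkedHashMap<>();
byInsertion.put("banana", 2);
byInsertion.put("apple", 1);
System.out.println(byInsertion.keySet());  // [banana, apple] -- insertion order

Map<String, Integer> byKey = new TreeMap<>(byInsertion);
System.out.println(byKey.keySet());        // [apple, banana] -- natural key order
```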
Related
The problem (no binary tree with duplicates in the Java collections).
I need a binary tree with duplicates: O(log n) search and insertion while keeping the elements ordered (so I can't use hash tables). Java doesn't have a collection that implements a binary tree and allows duplicates while keeping all binary tree operations.
Can we use TreeSet to do this?
I'm trying to tweak TreeSet to allow duplicates by passing a comparator that never returns 0. I know this won't be a set anymore, but that's OK; I need duplicates.
Example
TreeSet<Integer> binaryTreeWithDuplicates = new TreeSet<>((x, y) -> x > y ? 1 : -1);
Will there be undesirable side effects of such an implementation and usage? We are obviously violating the rules of the Comparator API, such as the sign rule (sgn(compare(x, y)) == -sgn(compare(y, x))).
contains would never return true.
You could have arbitrary duplicates in the set, with no way to identify or remove them (remove would never work either).
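A quick sketch of what actually happens with such a comparator:

```java
import java.util.*;

// A comparator that never returns 0: duplicates get stored, but lookups break.
TreeSet<Integer> set = new TreeSet<>((x, y) -> x > y ? 1 : -1);
set.add(2);
set.add(2);

System.out.println(set.size());      // 2 -- both copies were stored
System.out.println(set.contains(2)); // false -- no element ever compares "equal"
System.out.println(set.remove(2));   // false -- removal fails for the same reason
```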
I suggest trying a Guava TreeMultiset instead.
Opinion:
Your Set ceases to be a Set, for one thing. Since that's outside the specification, the implementation is not obliged to do anything consistently. Anything could happen.
Note that since you can't say 2 == 2, you're obliged to have 2 > 2 at the same time that 2 < 2. That doesn't seem like it will end happily.
The only way to discover what happens with this sort of abuse is to try it. Maybe it'll all work and allow duplicate entries. But that doesn't guarantee that it will continue to happen with the next bugfix.
If I needed something like this (though I don't understand your use case), I'd either build a special-purpose class, or else try and use something like a set of lists of numbers.
Java offers us Collections, where every option is best used in a certain scenario.
But what would be a good solution for the combination of following tasks:
Quickly iterate through every element in the list (order does not matter)
Check if the list contains (a) certain element(s)
Some options that were considered which may or may not be good practice:
First use a LinkedList (for example, when the number of elements is unknown in advance), then convert it to a HashSet (if duplicates will not be present)
Pick a solution for one of the two tasks and use the same implementation for the other (if switching to another implementation is not worth it)
Perhaps some implementation exists that does both (I failed to find one)
Is there a 'best' solution to this, and if so, what is it?
EDIT: For potential future visitors, this page contains many implementations with big O runtimes.
A HashSet can be iterated through quickly and provides efficient lookups.
Set<String> set = new HashSet<>();
set.add("Hello");
for (String s : set) {
    System.out.println(s);
}
if (set.contains("Hello")) {
    System.out.println("Found");
}
Quickly iterate through every element in the list (order does not matter)
If the order does not matter, any Collection implementation will do for iteration: each of them implements Iterable, and since you must visit every element at least once, iteration is O(n) no matter what (there is nothing better than O(n)). In practice, of course, one implementation can still be better suited than another, since you usually have several considerations to weigh at once.
Check if the list contains (a) certain element(s)
This is typically the use case for a Set: you get much better time complexity for contains operations. One thing to note is that a Set does not have a predefined iteration order; it can differ between implementations, and it is risky to make assumptions about it.
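To make the point concrete, iteration order really does depend on the implementation:

```java
import java.util.*;

List<String> data = List.of("b", "c", "a");

Set<String> linked = new LinkedHashSet<>(data);  // keeps insertion order
Set<String> sorted = new TreeSet<>(data);        // keeps natural order
Set<String> hashed = new HashSet<>(data);        // order unspecified -- don't rely on it

System.out.println(linked);  // [b, c, a]
System.out.println(sorted);  // [a, b, c]
```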
Now to your question:
From my perspective, if you are free to choose the data structure of a class yourself, go with the most natural one for the use case. If you expect to call contains a lot, a Set is probably the right fit. You could also keep a List and build a Set from all its elements each time you need to run several contains checks, but if you call that method often, creating the Set on every invocation gets expensive; use a Set in the first place.
Your comment stated that you have a world of players and want to check whether a player is part of a certain world object. Since the world owns the players, it should contain a Collection of some kind to store them. In this case I would recommend a Map with a common identifier of the player as key and the player itself as value.
public class World {
    private final Map<String, Player> players = new HashMap<>();

    public Collection<Player> getPlayers() {
        return players.values();
    }

    public Optional<Player> getPlayer(String nickname) {
        return Optional.ofNullable(players.get(nickname));
    }
    // ...
}
I had originally written an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to search if a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap, storing the usernames as keys and empty Strings as values.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
Since you have a set of unique values, a Set is the appropriate data structure; you can put your values in a HashSet, an implementation of the Set interface.
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case: you have just keys, not key-value pairs. With a Set, you don't need to ask it; the usage is perfectly natural.
This is O(1) right?
Yes, searching in a HashMap or a HashSet is O(1) on average (expected constant time), while searching in a List or an array is O(n) in the worst case.
Some comments point out that a HashSet is implemented in terms of HashMap. That's fine, at that level of abstraction. At the level of abstraction of the task at hand, storing a collection of unique usernames, a set is the natural choice, more natural than a map.
This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example :
HashSet's implementation of add is
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
where map is the backing HashMap and PRESENT is a dummy value.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
As I mentioned, the developers of the JDK are using this same approach.
I have this Map definition:
TreeMap<String, Set<Integer>>
It may contain millions of entries, and I also need a "natural order" (that's why I've chosen a TreeMap, though I could write a Comparator if needed).
So, what I have to do to add an element to the map is:
Check if the key already exists.
If not, create a new Set and add the value to it.
If it exists, add the value to the existing Set.
I have this implementation, which works fine:
private void addToMap(String key, Integer value) {
    Set<Integer> vs = dataMap.get(key);
    if (vs == null) {
        vs = new TreeSet<Integer>();
        dataMap.put(key, vs);
    }
    vs.add(value);
}
But I would like to avoid searching for the key and then putting the element if it doesn't exist (the put performs a second search over the huge map).
I think I could use the ConcurrentHashMap.putIfAbsent method, but then:
I will not have the natural ordering of the keys (and will need to sort the millions of keys afterwards)
I may have (I don't know) additional overhead from the synchronization in ConcurrentHashMap; my process is single-threaded, so that could hurt performance.
Reading this post : Java map.get(key) - automatically do put(key) and return if key doesn't exist?
there's an answer that mentions Guava's MapMaker.makeComputingMap, but it looks like the method is not there anymore.
Performance is critical in this situation (as always :D), so please let me know your recommendations.
Thanks in advance.
NOTE :
Thanks a lot for so many helping answers in just some minutes.
(I don't know which one to select as the best).
I will do some performance tests on the suggestions (TreeMultimap, ConcurrentSkipListMap, TreeSet + HashMap) and update with the results. I will then select the one with the best performance, as I'd like to accept all three but cannot.
NOTE2
So, I did some performance testing with 1.5 million entries, and these are the results :
ConcurrentSkipListMap doesn't work as I expected: it replaces the existing value with the new empty set I provided. I thought it would set the value only if the key didn't exist, so I cannot use this one (my mistake).
TreeSet + HashMap works fine but doesn't give the best performance; it is about 1.5 times slower than TreeMap alone or TreeMultimap.
TreeMultimap gives the best performance, but it is almost the same as TreeMap alone. I will mark this one as the answer.
Again, thanks a lot for your contributions and help.
A concurrent map does no magic; it checks for existence and then inserts if the key is not there.
Guava has Multimaps; for example, TreeMultimap may be what you need.
If performance is critical I wouldn't use a TreeSet of Integer; I would find a more lightweight structure like TIntArrayList or something else that wraps int values. I would also use a HashMap, as its lookup is O(1) instead of O(log N). If you also need to keep the keys sorted, I would use a second collection for that.
I agree that putIfAbsent on ConcurrentHashMap is overkill and get/put on a HashMap is likely to be the fastest option.
ConcurrentSkipListMap might be a good option for putIfAbsent, but I would make sure it's not slower.
BTW Even worse than doing a get/put is creating a HashSet you don't need.
putIfAbsent has the benefit of concurrency: if many threads call it at the same time, they don't have to wait (it doesn't use synchronized internally). However, this comes at a minor cost in execution speed, so if you work only single-threaded, it will slow things down.
If you need this sorted, try the ConcurrentSkipListMap.
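For completeness: on Java 8 and later (which this question may predate), the whole check-create-put sequence collapses into a single computeIfAbsent call. On a TreeMap this does not necessarily save the second tree search internally, but the mapping function only runs when the key is absent, so no TreeSet is allocated needlessly:

```java
import java.util.*;

TreeMap<String, Set<Integer>> dataMap = new TreeMap<>();

// The lambda is only invoked when "key" has no mapping yet.
dataMap.computeIfAbsent("key", k -> new TreeSet<>()).add(42);
dataMap.computeIfAbsent("key", k -> new TreeSet<>()).add(7);  // reuses the existing set

System.out.println(dataMap);  // {key=[7, 42]}
```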
I wish to iterate over a set but the contents of the set will modify during its iteration. I wish to iterate over the original set at the time the iterator was created and not iterate over any of the new elements added to the set. How is this possible? Is this is the default behavior of set or how can I accomplish this?
One way I can think of is to get a new set from the original set which won't be modified but this seems inelegant and there must be a better solution.
Taking a snapshot of the set sounds like exactly the right solution to me if you want to make sure you don't see any new elements. Some sets, such as ConcurrentSkipListSet, will let you keep iterating, but I can't see any guarantee about whether their iterators see newly added elements.
EDIT: CopyOnWriteArraySet has the requirements you need, but writes are expensive, which sounds like it's not appropriate for you.
Those are the only sets I can see in java.util.concurrent, which is the natural package for such collections. Taking a copy is still likely to be simpler :)
EDIT: This answer was designed for a single-threaded case, since I had interpreted the OP's question as avoiding comodification rather than avoiding issues from multithreading. I'm leaving this answer here in case it ends up being useful to anyone in the future who is using a single-threaded approach.
There is no direct way to accomplish this. However, one option that is quite nice is to have two sets - the main set, which you iterate over, and a secondary set into which you insert all the new elements that need to be added. You can then iterate over the primary set, and then once that's finished go and use addAll to add all the new elements to the primary set.
For example:
Set<T> masterSet = /* ... */;
Set<T> newElems = /* ... */;

for (T obj : masterSet) {
    /* ... do something to each object ... */
}
masterSet.addAll(newElems);
Hope this helps!
Making a copy of the Set is the elegant solution.
Set<Obj> copyOfObjs = new HashSet<>(originalSet);
for (Obj original : originalSet) {
    // add some more stuff to copyOfObjs
}
You can use a ConcurrentHashMap with dummy values (the elements become the keys), or the set view returned by ConcurrentHashMap.newKeySet().
Or a ConcurrentSkipListSet
As others have suggested here, there is no single optimal solution to what you are searching for. It all depends on the use case of your application and how the set is used.
Since Set is an interface, you might define your own DoubleSet class that implements Set and uses, say, two HashSet fields.
When you retrieve an iterator, you would mark one of these sets as being in "iteration-only mode", so the add method adds only to the other set.
I am still new to Stack Overflow, so I need to figure out how to embed code in my answers :( but in general you should have a class called MySet, generic in T, implementing Set<T>.
You need to implement all the methods and have two fields: one called iterationSet and the other called insertionSet.
You will also have a boolean field indicating whether an iteration is in progress. When the iterator() method is called, this boolean should be set, meaning you insert only into the insertionSet.
You should also have a method that merges the contents of the two sets once you're done with the iterator.
I hope I was clear
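Since the answer above could not embed code, here is a rough sketch of the two-set idea. All names (DoubleSet, sync, and so on) are illustrative rather than an existing API, and a real version would implement the full Set interface:

```java
import java.util.*;

class DoubleSet<T> {
    private final Set<T> iterationSet = new HashSet<>();
    private final Set<T> insertionSet = new HashSet<>();
    private boolean iterating = false;

    public void add(T elem) {
        // While an iteration is in progress, defer new elements to the side set.
        (iterating ? insertionSet : iterationSet).add(elem);
    }

    public Iterator<T> iterator() {
        iterating = true;  // from now on, adds no longer touch the set being iterated
        return iterationSet.iterator();
    }

    public void sync() {
        // Call once the iteration is finished: merge the deferred elements back.
        iterationSet.addAll(insertionSet);
        insertionSet.clear();
        iterating = false;
    }

    public boolean contains(T elem) {
        return iterationSet.contains(elem) || insertionSet.contains(elem);
    }

    public int size() {
        return iterationSet.size() + insertionSet.size();
    }
}
```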
Now that OP has clarified the requirements, the solutions are
Copy the set before iterating
Use CopyOnWriteArraySet
Write your own custom code and try to be smarter than a lot of smart people.
The drawback of #1 is that you always copy the set, even if it turns out not to be needed (e.g. if no insertions actually occur while you are iterating). I'd suggest option #2, unless you prove that frequent inserts are causing a real performance issue.
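To illustrate option #2: CopyOnWriteArraySet hands out snapshot iterators, so elements added mid-iteration are stored in the set but never seen by the running loop:

```java
import java.util.*;
import java.util.concurrent.CopyOnWriteArraySet;

Set<Integer> set = new CopyOnWriteArraySet<>(List.of(1, 2, 3));
List<Integer> seen = new ArrayList<>();

for (int v : set) {
    seen.add(v);
    set.add(99);  // legal mid-iteration, but this snapshot iterator won't see it
}

System.out.println(seen);             // [1, 2, 3]
System.out.println(set.contains(99)); // true
```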