How to avoid unintended Hash-Key manipulation - java

I noticed that adding an element to a list changes its hash code, so once the list is in a HashSet it can no longer be found: .contains(modifiedObject) returns false. To be honest, I did not expect that behavior, and it makes me wonder how HashSet does its hashing. So how can I make sure I don't corrupt my HashSet when I modify, e.g., a list of strings as shown below? Is there a safe way to do this, or is it just something I have to watch out for as a programmer?
private HashSet<List<String>> bagOfWordsMap = new HashSet<List<String>>();
private void createBagOfWordsList(UnifiedTag[] invalidTags) {
for(List<String> sentences : getSentenceList()) {
List<String> sentenceStemWords = new ArrayList<String>();
// Don't add the list here: sentenceStemWords is modified right after,
// which changes its hash code, so bagOfWordsMap.contains(sentenceStemWords)
// would no longer find it:
// bagOfWordsMap.add(sentenceStemWords);
for(String word : sentences) {
String stem = Stemmer.getStem(word);
sentenceStemWords.add(stem);
}
bagOfWordsMap.add(sentenceStemWords);
}
}
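To see the effect in isolation, here is a small self-contained sketch. The class name and the sample words are made up for illustration; it only shows that mutating a list after it has been added to a HashSet changes its hash code, so contains() stops finding it, while adding the list after it is fully built (as the code above does) is fine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutatedKeyDemo {
    public static void main(String[] args) {
        Set<List<String>> set = new HashSet<>();
        List<String> words = new ArrayList<>();
        words.add("run");
        set.add(words);                           // stored under the hash code of [run]

        int before = words.hashCode();
        words.add("fast");                        // mutating the element...
        int after = words.hashCode();             // ...changes its hash code

        System.out.println(before != after);      // true
        System.out.println(set.contains(words));  // false: the lookup uses the new hash code
        System.out.println(set.size());           // 1: the list is still stored in the set

        // Adding the list only after it is fully built is safe.
        Set<List<String>> safe = new HashSet<>();
        List<String> built = new ArrayList<>(Arrays.asList("run", "fast"));
        safe.add(built);
        System.out.println(safe.contains(built)); // true
    }
}
```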

Never use a mutable object as a key in a map or as an element of a set.
If you want to prevent accidental modification, implement a frozen type that cannot be modified once it has been created.
Fine print: it's technically okay for an object to have mutable attributes as long as they don't affect equals() and hashCode(), but you won't be able to retrieve them easily by key from a Java Set, because there is no HashSet.get to fetch the current member, only contains. It's also bad style and fragile. It's much better to split such objects into a key and a value.
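A minimal sketch of such a frozen key type, assuming a hypothetical class name FrozenSentence (not from the original answer). The class is final, the constructor takes an unmodifiable defensive copy with List.copyOf (Java 10+), and equals/hashCode delegate to that copy, so nothing can change the key after construction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical immutable key type: its word list cannot change after construction. */
final class FrozenSentence {
    private final List<String> words;

    FrozenSentence(List<String> words) {
        // Defensive copy: later changes to the caller's list do not affect this key.
        // List.copyOf returns an unmodifiable list and rejects null elements.
        this.words = List.copyOf(words);
    }

    List<String> words() {
        return words; // already unmodifiable, so it is safe to expose
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FrozenSentence && words.equals(((FrozenSentence) o).words);
    }

    @Override
    public int hashCode() {
        return words.hashCode();
    }

    @Override
    public String toString() {
        return words.toString();
    }
}

public class FrozenKeyDemo {
    public static void main(String[] args) {
        List<String> stems = new ArrayList<>();
        stems.add("run");
        Set<FrozenSentence> bag = new HashSet<>();
        FrozenSentence key = new FrozenSentence(stems);
        bag.add(key);

        stems.add("fast"); // changes only the caller's list, not the stored key
        System.out.println(bag.contains(key));                                // true
        System.out.println(bag.contains(new FrozenSentence(List.of("run")))); // true
    }
}
```

Because the key owns its own copy, the stemming loop from the question could keep reusing or modifying its working list without corrupting the set.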

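One way is to store unmodifiable lists in your HashSet instead of plain List<String> objects. There is no public UnmodifiableList type: Collections.unmodifiableList(list) returns a read-only view, and on Java 10+ List.copyOf(list) returns an immutable copy. A view still reflects later changes made through the original list, so only a copy fully protects the key.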
Another option is to use a HashMap<String,List<String>> instead of your HashSet<List<String>>, provided that you can associate some unique String key with each of your Lists.
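A small sketch contrasting the two options as HashSet elements: Collections.unmodifiableList (a read-only view) and List.copyOf (Java 10+, an immutable snapshot). The helper name freezeKey is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnmodifiableKeyDemo {
    /** Hypothetical helper: store an immutable snapshot rather than the caller's list. */
    static List<String> freezeKey(List<String> words) {
        return List.copyOf(words); // independent of later changes to `words`
    }

    public static void main(String[] args) {
        List<String> stems = new ArrayList<>(List.of("run"));

        List<String> view = Collections.unmodifiableList(stems); // read-only view of stems
        List<String> copy = freezeKey(stems);                    // immutable snapshot

        Set<List<String>> viaView = new HashSet<>();
        viaView.add(view);
        Set<List<String>> viaCopy = new HashSet<>();
        viaCopy.add(copy);

        stems.add("fast"); // change the backing list after insertion

        System.out.println(viaView.contains(view)); // false: the view's hash code followed the change
        System.out.println(viaCopy.contains(copy)); // true: the snapshot did not change
    }
}
```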

Related

Look up key by key in Java map or set

In most languages, including Java, there is an API something like java.util.Map that makes it easy to look up a value, given the key that maps to it. But there is not always a convenient way to look up the stored key, given an equal key. (I'm pretty sure Python makes it hard and C++ makes it easy: just ask for an iterator. This question is about Java, which I suspect is as bad as Python.) At first this may sound dumb: why would you need to look up a key you already have? But consider something like this (the example below uses a Set instead of a Map, but the idea is the same):
TreeSet<String> dictionary = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
dictionary.add("Monday"); // populate dictionary
String word = "MONDAY"; // user input, or something
if(dictionary.contains(word)) System.out.println(word + " already in dictionary");
The above code snippet prints MONDAY already in dictionary. That message is wrong, because "MONDAY" is not in the dictionary; "Monday" is. How can we make the message more accurate? In this case, we can write a helper function that takes advantage of the fact that a TreeSet is a NavigableSet (a similar trick works for SortedSet, though it's a bit less convenient):
String lookup(NavigableSet<String> set, String key) {
assert set.contains(key) : key + " not in set";
return set.floor(key);
}
Now we can fix the last line of the previous code snippet:
if(dictionary.contains(word)) System.out.println(lookup(dictionary, word) + " already in dictionary");
This prints Monday already in dictionary, which is correct. But now let's try an example with a hash set:
import java.util.HashSet;
/** Maintains a set of strings; useful as a replacement for String.intern() */
class StringInterner {
private final HashSet<String> set = new HashSet<>();
/** use this instead of String.intern() */
String intern(String s) {
if(!set.contains(s)) {
set.add(s);
return s;
}
for(String str : set) // linear scan!!
if(str.equals(s)) return str;
throw new AssertionError("something went very wrong");
}
}
The code above resorts to a linear scan to find something it already knows is there. HashSet could easily give us what we're looking for, because it has to do exactly this to implement contains(). But there's no API for it, so we can't even ask the question. (HashMap, which backs HashSet, has an internal method called getNode that is pretty much what we want, but it is not public.) An easy workaround in this case is to use a map instead of a set: instead of set.add(s) we could use map.put(s, s). But what if we're already using a map, because we already have data we want to associate with our key? Then we can either use two maps and carefully keep them in sync, or store a two-element tuple as the "value" in our map, where the first item of the tuple is simply the map key. Both of these solutions seem needlessly awkward.
Is there a better way?
No, there isn't.
For HashMaps it doesn't matter, because your "two equivalent keys" argument doesn't apply to HashMap. The two keys always have to be equal according to equals(), which, as far as Java is concerned, means they should be substitutable in every way.
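For completeness, here is a sketch of the map-as-set workaround mentioned in the question (map.put(s, s)), written with Map.putIfAbsent (Java 8+) so that the check and the insertion happen in a single call. The class name MapInterner is hypothetical, and this version is not thread-safe; ConcurrentHashMap offers the same putIfAbsent method if that matters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed interner: each key maps to itself, so the stored instance is retrievable. */
class MapInterner {
    private final Map<String, String> map = new HashMap<>();

    /** Returns the canonical instance equal to s, storing s if no equal string exists yet. */
    String intern(String s) {
        // putIfAbsent returns the value already mapped to an equal key, or null if s was just stored
        String existing = map.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}

public class MapInternerDemo {
    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = new String("hello");
        String b = new String("hello");               // equal to a, but a different object
        System.out.println(interner.intern(a) == a);  // true: a is stored first
        System.out.println(interner.intern(b) == a);  // true: the existing instance is returned
    }
}
```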

Can anyone please explain this HashMap behaviour?

My program is:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Demo {
public static void main(String[] args) {
List<String> arr = new ArrayList<>();
arr.add("a");
arr.add("b");
Map<List<String>, String> map = new HashMap<>();
map.put(arr, "Ravinda");
System.out.println(map.get(arr));
arr.add("c");
System.out.println(map.get(arr));
}
}
The output is:
Ravinda
null
I don't understand why the second System.out.println prints null. Can anyone please explain?
When you call map.put(arr, "Ravinda"); you store the "Ravinda" value under a key that is a List containing two strings. HashMap files the entry under the hash code the list has at that moment.
By calling arr.add("c"); you modify the very List that is being used as the key.
Since arr has changed, its hash code no longer matches the one the entry was filed under when you called map.put(arr, "Ravinda");.
This is why the map returns null when you try to look the value up a second time: it searches using the new hash code.
The map still contains the "Ravinda" value, but it is stored under the hash code the list had when it contained only two elements.
As this answer explains, you need to be careful when using an object whose hash code is mutable. ArrayList's hashCode method depends on the elements the list contains, so adding "c" to the list changes its hash code.
It's important to note that even if the hash code did not change, the lists would still have to be equal. A hash code is not a unique identifier, so after locating the bucket, HashMap compares the keys with equals().
If your program is in this position, you need to take a step back and find another solution to the problem at hand. There is no reliable way to use a mutable list as a Map key that doesn't boil down to reference equality (which makes things pretty pointless anyway).
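A small sketch that makes the bucket behavior visible, reusing the list contents and value from the question (the class name is made up). After the mutation, get() misses but the entry is still in the map, and undoing the mutation restores the original hash code, so the lookup succeeds again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutatedMapKeyDemo {
    public static void main(String[] args) {
        List<String> arr = new ArrayList<>(Arrays.asList("a", "b"));
        Map<List<String>, String> map = new HashMap<>();
        map.put(arr, "Ravinda");                           // filed under hashCode() of [a, b]

        arr.add("c");                                      // hashCode() changes; the entry is not moved
        System.out.println(map.get(arr));                  // null: searched with the new hash code
        System.out.println(map.containsValue("Ravinda"));  // true: the entry is still there
        System.out.println(map.size());                    // 1

        arr.remove("c");                                   // back to [a, b], the original hash code
        System.out.println(map.get(arr));                  // Ravinda
    }
}
```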

Modifying elements while iterating Java Set

I'm pretty new to Java; I'm coming from Python. There are similar questions on SO about removing or adding elements while iterating over a Java Set. What I would like to know is how to modify the elements contained in the Set, for instance turning ["apple", "orange"] into ["iapple", "iorange"]. In addition, I would like to do it in place, i.e., without creating another set and putting the modified elements into it while iterating.
Apparently a simple for loop like the following doesn't work:
import java.util.Arrays;
import java.util.Set;
import java.util.HashSet;
class Test {
public static void main (String[] args) {
Set<String> strs = new HashSet<String>(Arrays.asList("apple", "orange"));
for (String str : strs) {
str = "i" + str;
}
}
}
The issue with what you've written is that you don't do anything with the computed value. The for-each loop sets the value of str to each element. Inside the loop you change the value of str, but don't do anything else with it.
This would be easy to do with a LinkedList or any other data structure that supports indexing, but with a set it can be tricky. Removing the old element and adding the new one during iteration will likely break the iteration (a HashSet iterator typically throws a ConcurrentModificationException).
A simple way to do this is to convert to a list and back:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
class Test {
public static void main (String[] args) {
Set<String> strs = new HashSet<String>(Arrays.asList("apple", "orange"));
// Convert to a list (ArrayList gives O(1) get/set by index, unlike LinkedList)
List<String> strsList = new ArrayList<String>(strs);
// Do the modification by index
for (int i = 0; i < strsList.size(); i++) {
String str = strsList.get(i);
strsList.set(i, "i" + str);
}
// Convert back, reusing the same Set object
strs.clear();
strs.addAll(strsList);
}
}
This is clearly a bit more work than you would expect, but if mass replacement is behavior you anticipate, then you probably shouldn't use a set.
I'm interested to see what other answers pop up as well.
You cannot modify a String in Java; strings are immutable.
While it is theoretically possible to have mutable elements in a Set and mutate them in place, it is a terrible idea if the mutation affects hashCode (and equals).
So the answer to your specific question is no, you cannot mutate a String value in a Set without removing then adding entries to that Set.
The problem is that in the for loop, str is merely a reference to a String. Reassigning a reference doesn't change the object it refers to. Additionally, strings are immutable anyway, so any method that appears to change a string returns a new String instead of modifying the original. What you want to do is store all the new Strings somewhere, take out the old ones, and then add the new ones.
EDIT: My original code would have thrown a ConcurrentModificationException.
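The remove-then-add approach can be sketched with an explicit Iterator, whose remove() method is the supported way to delete elements during iteration. The method name prefixAll is hypothetical. It still needs a temporary list, but the caller's Set object is the one that ends up modified, which is about as close to "in place" as immutable strings allow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class PrefixSetElements {
    /** Replaces every element s of the set with prefix + s, keeping the same Set object. */
    static void prefixAll(Set<String> set, String prefix) {
        List<String> replacements = new ArrayList<>();
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            replacements.add(prefix + it.next());
            it.remove(); // removing through the iterator avoids ConcurrentModificationException
        }
        set.addAll(replacements); // add the new strings only after iteration has finished
    }

    public static void main(String[] args) {
        Set<String> strs = new HashSet<>(Arrays.asList("apple", "orange"));
        prefixAll(strs, "i");
        System.out.println(strs.contains("iapple") && strs.contains("iorange")); // true
        System.out.println(strs.size());                                         // 2
    }
}
```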
There are three problems with your approach.
1st - You can't modify the contents of a Set while you are iterating over it (except through the iterator's own remove method). You would need to create a new Set with the new values.
"But I'm not modifying the set, I am modifying the objects within it", which leads to problem two
2nd - In your code you are modifying a reference to a string, not the string itself.
To modify the string you would need to call a method on it, like (hypothetically) string.changeTo("this"), or modify a field, like string.value = "new value".
Which leads to problem three.
3rd - Strings are immutable. Once you construct a string, say new String("hello"), you can't modify its inner value.
The solutions:
First option is the simpler one, create a new set.
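The second option is to use StringBuilder objects instead of strings; StringBuilder is a mutable string container. Note that StringBuilder does not override equals() or hashCode(), so a HashSet<StringBuilder> compares its elements by identity.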
http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html
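A sketch of the StringBuilder option (class and method names are made up). Because StringBuilder keeps Object's identity-based equals() and hashCode(), mutating the elements in place does not corrupt the set; the price is that lookups by content do not work, and two builders with the same text count as different elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StringBuilderSetDemo {
    /** Collects the current text of each builder, sorted, for easy comparison. */
    static List<String> contents(Set<StringBuilder> set) {
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : set) {
            out.add(sb.toString());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Set<StringBuilder> words = new HashSet<>();
        words.add(new StringBuilder("apple"));
        words.add(new StringBuilder("orange"));

        // In-place mutation: hash codes are identity-based, so the set stays consistent.
        for (StringBuilder sb : words) {
            sb.insert(0, "i");
        }
        System.out.println(contents(words)); // [iapple, iorange]

        // The flip side: a different builder with the same text is not "contained".
        System.out.println(words.contains(new StringBuilder("iapple"))); // false
    }
}
```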

How to check whether a HashMap has all the elements of an ArrayList?

I have a HashMap hashM and an ArrayList arrayL. How can I use an if statement to check whether hashM contains all the elements of arrayL?
I'm currently using something like
if (hashM.values().containsAll(arrayL.getPreReqs()))
However it doesn't work properly.
Update: thanks for the answers, everyone! containsAll does work; the wrong results came from the way I had structured my code. It's fixed now.
Given (with K and V standing for your key and value types)
Map<K, V> map = new HashMap<K, V>();
List<V> list = new ArrayList<V>();
The approach you tried (well, nearly, as pointed out by Marko Topolnik) is indeed correct:
if (map.values().containsAll(list)) { ... }
(Or map.keySet().containsAll(list) if you were interested in the map keys instead of values.)
For this to work as expected with custom types, you must of course implement equals() and hashCode() correctly for them. (See e.g. this question, or better yet, read Item 9 in Effective Java.)
By the way, when working with Java Collections, it is good practice to define fields and variables using the interfaces (such as List, Set, Map), not implementation types (e.g. ArrayList, HashSet, HashMap). For example:
List<String> list = new ArrayList<String>();
Map<Integer, String> map = new HashMap<Integer, String>();
Similarly, a more "correct" or fluent title for your question would have been "How to check whether a Map has all the elements of a List?". Check out the Java Collections tutorial for more info.
Your code is correct, except that you should pass the list itself:
if (hashM.values().containsAll(arrayL)) {....}
Alternatively, you can call HashMap.containsValue(Object value) for each element:
public static <K, V> boolean containsList(Map<K, V> map, List<V> list) {
for(V value : list) {
if(!map.containsValue(value)) {
return false;
}
}
return true;
}
Your code should work, but it will not be particularly efficient: it has to compare every element in the list with every element in the map.
If (and only if) you can easily extract the key of the map from the elements then you would be better off looping through your List and for each element do map.containsKey(getKey(elem)), this will be much faster.
If you are doing this sort of comparison a lot and you cannot map from element to key then it may be worth keeping a HashSet of the values for this purpose.
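A minimal sketch of the "HashSet of the values" idea, assuming a one-off check (if you do this often you would keep the set up to date alongside the map instead of rebuilding it). The method name containsAllValues and the sample data are hypothetical; for custom value types, equals() and hashCode() must be implemented consistently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ContainsAllValues {
    /** Returns true if every element of list appears among the map's values. */
    static <V> boolean containsAllValues(Map<?, V> map, List<V> list) {
        // Copying the values into a HashSet costs O(n) once; each lookup is then O(1)
        // on average, instead of scanning all values for every list element.
        Set<V> values = new HashSet<>(map.values());
        return values.containsAll(list);
    }

    public static void main(String[] args) {
        Map<Integer, String> hashM = new HashMap<>();
        hashM.put(1, "math");
        hashM.put(2, "physics");
        hashM.put(3, "chemistry");

        List<String> arrayL = Arrays.asList("math", "chemistry");
        System.out.println(containsAllValues(hashM, arrayL));                           // true
        System.out.println(containsAllValues(hashM, Arrays.asList("math", "biology"))); // false
    }
}
```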
I agree with JoniK. This can be done in a single line like this.
if (hashM.values().containsAll(arrayL)) { /* put your code here */ }

What is the best way to convert all elements of a Set in Java?

I have a Set<String>.
I need to split each element of the Set on ; and create a new Set that contains only the second part of each element. Should I do it directly, one by one, or is there a better way?
Thanks.
If you can relax your constraint of an output being a Set<String> to being a Collection<String> you could use Guava and defer the transformation of elements until enumeration of elements through the Collections2#transform() method. You would just have to write a custom function to perform the split on an individual element.
But if you cannot or should not relax this constraint, you are best off doing the individual iteration already proposed (it is much more legible).
Code would look something like:
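import java.util.Collection;
import java.util.Set;
import com.google.common.base.Function;
import com.google.common.collect.Collections2;
Set<String> input; //Given
Collection<String> output = Collections2.transform(input, new Function<String,String>() {
@Override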
public String apply (String element) {
// As JohnnyO says, add appropriate edge case checking...
return element.split(";")[1];
}
});
Set<String> suffixSet = new HashSet<String>();
for (String s : inputSet) {
suffixSet.add(s.split(";")[1]);
}
I'd also add appropriate error checking and handling for the case when s does not have a ; present.
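On Java 8+, the same loop can be written with streams; this is an alternative not proposed in the original answers. The sketch skips elements that have no second field, which is an assumption about the desired error handling (throwing an exception would be an equally valid choice). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class SecondPartSet {
    /** Returns the set of second ';'-separated fields; elements without one are skipped. */
    static Set<String> secondParts(Set<String> input) {
        return input.stream()
                .map(s -> s.split(";"))
                .filter(parts -> parts.length > 1) // assumption: silently skip malformed entries
                .map(parts -> parts[1])
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> input = new HashSet<>(Arrays.asList("a;1", "b;2", "c;2", "no-separator"));
        System.out.println(secondParts(input)); // contains 1 and 2 (iteration order not guaranteed)
    }
}
```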
As you have not shown any code, we can only guess what you're trying to do. You need to iterate through the Set and split each String; you can use the split method for that.
It is hard to say with so little information. You could iterate over the set doing split and adding to another Set.
Or you could replace the Set with, e.g., a HashMap: when you create the map, use the first part of each string as the key and the second part as the value, so that you can quickly retrieve the second part when you need it.
Or, if you create the strings yourself, place the parts in different sets directly.
Or... (you don't give enough information for me to suggest more options).
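A sketch of the HashMap option, with hypothetical names. Two assumptions are worth stating: entries without a ';' are ignored, and a later entry with the same first part overwrites an earlier one. Unlike split(";")[1], substring(sep + 1) keeps everything after the first ';', so the results differ for strings with more than one separator.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SplitToMap {
    /** Maps the part before the first ';' to the part after it. */
    static Map<String, String> byFirstPart(Iterable<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            if (sep < 0) {
                continue; // assumption: ignore entries without a separator
            }
            // later duplicates of the same first part overwrite earlier ones
            map.put(line.substring(0, sep), line.substring(sep + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = byFirstPart(Arrays.asList("monday;1", "tuesday;2"));
        System.out.println(map.get("tuesday")); // 2
    }
}
```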

