In the code below, I am creating a hashmap to store objects called Datums, which contain a String (location) and a count. Unfortunately, the code is giving very strange behavior.
FileSystem fs = FileSystem.get(new Configuration());
Random r = new Random();
FSDataOutputStream fsdos = fs.create(new Path("error/" + r.nextInt(1000000)));
HashMap<String, Datum> datums = new HashMap<String, Datum>();
while (itrtr.hasNext()) {
Datum next = itrtr.next();
synchronized (datums) {
if (!datums.containsKey(next.location)) {
fsdos.writeUTF("INSERTING: " + next + "\n");
datums.put(next.location, next);
} else {
} // skip those that are already indexed
}
}
for (Datum d : datums.values()) {
fsdos.writeUTF("PRINT DATUM VALUES: " + d.toString() + "\n");
}
The hashmap has Strings as keys.
Here is the output I get in the error files (example):
INSERTING: (test.txt,3)
INSERTING: (test2.txt,1)
PRINT DATUM VALUES: (test.txt,3)
PRINT DATUM VALUES: (test.txt,3)
The correct output for the print should be:
INSERTING: (test.txt,3)
INSERTING: (test2.txt,1)
PRINT DATUM VALUES: (test.txt,3)
PRINT DATUM VALUES: (test2.txt,1)
What is happening to the Datum with test2.txt as its location? Why is it getting replaced with test.txt??
Basically, I should never see the same location twice. (that is what the !datums.containsKey is checking for). Unfortunately, I'm getting very strange behavior.
This is on Hadoop, by the way, in a reducer.
I tried putting the synchronized here in case it was running in multiple threads, which, to my knowledge, it isn't. Still, the same thing happens.
According to this answer, Hadoop's iterator always returns the same object instead of creating a new object each time around the loop.
So, holding onto references to the object returned by the iterator is not valid and will produce surprising results. You'll need to copy the data to a new object:
while (itrtr.hasNext()) {
Datum next = itrtr.next();
// copy any values from the Datum to a fresh instance
Datum insert = new Datum(next.location, next.value);
if (!datums.containsKey(insert.location)) {
datums.put(insert.location, insert);
}
}
Here is a reference to the Hadoop Reducer documentation which confirms this:
The framework will reuse the key and value objects that are passed
into the reduce, therefore the application should clone the objects
they want to keep a copy of.
The problem is not the map but the code.
datums.put(next.location, next); inserts a reference as the value, and the object behind that reference is changed later :)
That is why, at the end, all values in the map are the same: they all refer to the one object that was modified on every iteration.
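The aliasing described in the answers above can be reproduced without Hadoop. The following is only a sketch: Datum here is a hypothetical stand-in for the class in the question (its count field is an assumption), and ReusingIterator is a made-up iterator that imitates the reuse by handing out one shared instance. It is not Hadoop's actual implementation.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the Datum class from the question.
class Datum {
    String location;
    int count;

    Datum(String location, int count) {
        this.location = location;
        this.count = count;
    }

    @Override
    public String toString() {
        return "(" + location + "," + count + ")";
    }
}

// Made-up iterator that imitates object reuse: every call to next()
// overwrites the fields of one shared instance and returns that instance.
class ReusingIterator implements Iterator<Datum> {
    private final String[] locations;
    private final int[] counts;
    private final Datum shared = new Datum(null, 0);
    private int index = 0;

    ReusingIterator(String[] locations, int[] counts) {
        this.locations = locations;
        this.counts = counts;
    }

    @Override
    public boolean hasNext() {
        return index < locations.length;
    }

    @Override
    public Datum next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        shared.location = locations[index];
        shared.count = counts[index];
        index++;
        return shared;
    }
}

class ReuseDemo {
    // Collects datums by location; when copy is true, a fresh Datum is stored
    // instead of the reference returned by the iterator.
    static Map<String, Datum> collect(Iterator<Datum> it, boolean copy) {
        Map<String, Datum> datums = new HashMap<>();
        while (it.hasNext()) {
            Datum next = it.next();
            Datum toStore = copy ? new Datum(next.location, next.count) : next;
            if (!datums.containsKey(toStore.location)) {
                datums.put(toStore.location, toStore);
            }
        }
        return datums;
    }

    public static void main(String[] args) {
        String[] locations = {"test.txt", "test2.txt"};
        int[] counts = {3, 1};

        Map<String, Datum> aliased = collect(new ReusingIterator(locations, counts), false);
        System.out.println(aliased.get("test.txt"));  // (test2.txt,1): both keys share one object

        Map<String, Datum> copied = collect(new ReusingIterator(locations, counts), true);
        System.out.println(copied.get("test.txt"));   // (test.txt,3)
    }
}
```

Without the copy, both keys map to the same instance, so whatever the iterator wrote last is what every entry shows.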
Related
Java doc says HashMap.clone() returns a shallow copy.
So I expect that if I change the value of some key in original HashMap, the cloned one will also see the change. Thus, I have this:
public class ShallowCopy {
public static void main(String[] args) {
HashMap<Integer, String> map = new HashMap<Integer, String>();
map.put(2,"microsoft");
map.put(3,"yahoo");
map.put(1,"amazon");
// Type safety: Unchecked cast from Object to
// HashMap<Integer, String> Java(16777761)
Map<Integer, String> mClone =
(HashMap<Integer, String>)map.clone();
String previous = map.replace(3, "google");
System.out.println(previous); // yahoo
System.out.println(map.get(3)); // google
System.out.println(mClone.get(3)); // yahoo, but why?
}
}
In my code, I called HashMap.replace() and I see the value in map is changed from "yahoo" to "google".
Strangely, when I look up the same key in mClone, the last line prints the previous value:
it prints "yahoo", not "google" as I expected.
Where am I going wrong? Please correct my understanding.
Plus: I also get a compiler warning, shown in the comment in my code (Java(16777761)). How do I fix it?
TL;DR
After the cloning operation, the values in both maps are simply references to the same objects. So if you were to modify an object through one map, the change would also be visible through the other. But you didn't modify the object; you replaced it. At that point the two maps became distinct from a reference perspective.
Example
The clone operation is working just as you presumed. But you are interpreting the results incorrectly. Consider the following class.
class FooClass {
int a;
public FooClass(int a) {
this.a = a;
}
public void setA(int a) {
this.a = a;
}
@Override
public String toString() {
return a + "";
}
}
And now create a map and its clone.
HashMap<Integer, FooClass> map = new HashMap<>();
map.put(10, new FooClass(25));
HashMap<Integer,FooClass> mClone = (HashMap<Integer,FooClass>)map.clone();
The values of each key are the same object reference, as shown by the following:
System.out.println(System.identityHashCode(map.get(10)));
System.out.println(System.identityHashCode(mClone.get(10)));
prints
1523554304
1523554304
So if I modify one, it will modify the other.
The same was true for the String values of your maps. But when you replaced "yahoo" with "google", you didn't modify the String; you replaced it with a different object.
If I were to do the same for FooClass, here is the result.
System.out.println("Modifying same object");
mClone.get(10).setA(99);
System.out.println(map.get(10));
System.out.println(mClone.get(10));
prints
Modifying same object
99
99
But if I replace the object with a new one:
System.out.println("Replacing the instance");
FooClass previous = mClone.replace(10, new FooClass(1000));
System.out.println("Previous = " + previous);
System.out.println("map: " + map.get(10));
System.out.println("mClone: " + mClone.get(10));
prints
Replacing the instance
Previous = 99
map: 99
mClone: 1000
And this latter operation is what you did.
The clone() method creates a new map that is populated with references to the keys and values contained in the source map. It is not a view of the initial map but an independent collection.
When you call map.replace(3, "google"), the value mapped to the key 3 is replaced with a new string. The cloned map remains unaffected: it still holds a reference to the same string "yahoo".
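On the compiler warning from the question: one common way to avoid the unchecked cast is to skip clone() and use HashMap's copy constructor, which also produces a shallow copy. The sketch below assumes that a plain shallow copy is all that is needed; CopyConstructorDemo and copyOf are illustrative names, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

class CopyConstructorDemo {
    // Shallow copy via the copy constructor: no cast, so no unchecked warning.
    static Map<Integer, String> copyOf(Map<Integer, String> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(3, "yahoo");

        Map<Integer, String> copy = copyOf(map);

        // Replacing the mapping in the original does not touch the copy.
        map.replace(3, "google");

        System.out.println(map.get(3));  // google
        System.out.println(copy.get(3)); // yahoo
    }
}
```

The behavior matches what clone() showed in the question: the copy is an independent map whose entries initially point at the same value objects.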
When I wrote this piece of code, the output I was getting was null values for the keys, because of the pnValue.clear(); call. I then read somewhere that adding the values of one map to another merely stores a reference to the original, and that one has to use the clone() method to ensure the two maps are separate. The issue I am facing now, after cloning my map, is that if I have multiple values for a particular key, they are overwritten. E.g., the output I am expecting from processing a goldSentence is:
{PERSON = [James Fisher],ORGANIZATION=[American League, Chicago Bulls]}
but what I get is:
{PERSON = [James Fisher],ORGANIZATION=[Chicago Bulls]}
I wonder where I am going wrong considering I am declaring my values as a Vector<String>
for(WSDSentence goldSentence : goldSentences)
{
for (WSDElement word : goldSentence.getWsdElements()){
if (word.getPN()!=null){
if (word.getPN().equals("group")){
String newPNTag = word.getPN().replace("group", "organization");
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (Vector<String>) pnValue.clone();
annotationMap.put(newPNTag.toUpperCase(),newPNValue);
}
else{
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (Vector<String>) pnValue.clone();
annotationMap.put(word.getPN().toUpperCase(),newPNValue);
}
}
sentenceAnnotationMap = (LinkedHashMap<String, Vector<String>>) annotationMap.clone();
pnValue.clear();
}
EDITED CODE
I replaced Vector with List and removed the cloning. However, this still doesn't solve my problem. It takes me back to square one, where my output is: {PERSON=[], ORGANIZATION=[]}
for(WSDSentence goldSentence : goldSentences)
{
for (WSDElement word : goldSentence.getWsdElements()){
if (word.getPN()!=null){
if (word.getPN().equals("group")){
String newPNTag = word.getPN().replace("group", "organization");
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = (List<String>) pnValue;
annotationMap.put(newPNTag.toUpperCase(),newPNValue);
}
else{
pnValue.add(word.getToken().replaceAll("_", " "));
newPNValue = pnValue;
annotationMap.put(word.getPN().toUpperCase(),newPNValue);
}
}
sentenceAnnotationMap = annotationMap;
}
pnValue.clear();
You're trying a bunch of stuff without really thinking through the logic behind it. There's no need to clear or clone anything, you just need to manage separate lists for separate keys. Here's the basic process for each new value:
If the map contains our key, get the list and add our value
Otherwise, create a new list, add our value, and add the list to the map
You've left out most of your variable declarations, so I won't try to show you the exact solution, but here's the general formula:
List<String> list = map.get(key); // try to get the list
if (list == null) { // list doesn't exist?
list = new ArrayList<>(); // create an empty list
map.put(key, list); // insert it into the map
}
list.add(value); // update the list
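On Java 8 or later, the same get-or-create pattern can be written with Map.computeIfAbsent. This is a sketch under assumptions: the map type and the name addValue are illustrative, since the question does not show its declarations, and a LinkedHashMap is used only so the printed order matches insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiValueDemo {
    // Adds value to the list stored under key, creating the list on first use.
    static void addValue(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> annotations = new LinkedHashMap<>();
        addValue(annotations, "PERSON", "James Fisher");
        addValue(annotations, "ORGANIZATION", "American League");
        addValue(annotations, "ORGANIZATION", "Chicago Bulls");

        System.out.println(annotations);
        // {PERSON=[James Fisher], ORGANIZATION=[American League, Chicago Bulls]}
    }
}
```

Each key owns its own list, so nothing has to be cleared or cloned between iterations.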
I am trying to unit test a function that takes a HashMap and concatenates the keys into a comma-separated string. The problem is that when I iterate through the HashMap using entrySet() (or keySet() or values()), the entries are not in the order I .put() them in. For example:
testData = new HashMap<String, String>(0);
testData.put("colA", "valA");
testData.put("colB", "valB");
testData.put("colC", "valC");
for (Map.Entry<String, String> entry : testData.entrySet()) {
System.out.println("TestMapping " + entry.getKey());
}
Gives me the following output:
TestMapping colB
TestMapping colC
TestMapping colA
The string created by the SUT is ColB,ColC,ColA
How can I unit test this, since keySet(), values(), etc. return their elements in a somewhat arbitrary order?
This is the function I am trying to test:
public String getColumns() {
String str = "";
for (String key : data.keySet()) {
str += ", " + key;
}
return str.substring(1);
}
There is no point in iterating over the HashMap in this case. The only reason to iterate over it would be to construct the expected String, in other words, perform the same operation as the method under test, so if you made an error implementing the method, you are likely to repeat the error when implementing the same for the unit test, failing to spot the error.
You should focus on the validity of the output. One way to test it, is to split it into the keys and check whether they match the keys of the source map:
testData = new HashMap<>();
testData.put("colA", "valA");
testData.put("colB", "valB");
testData.put("colC", "valC");
String result = getColumns();
assertEquals(testData.keySet(), new HashSet<>(Arrays.asList(result.split(", "))));
You are in control of the test data, so you can ensure that no ", " appears within the key strings.
Note that in its current form, your question's method would fail this test, because the result String has an additional leading space. You have to decide whether that is intentional (in which case you have to change the test to assertEquals(testData.keySet(), new HashSet<>(Arrays.asList(result.substring(1).split(", "))));) or a bug the test has spotted (in which case you have to change the method's last line to return str.substring(2);).
Don’t forget to make a testcase for an empty map…
HashMap does not maintain insertion order. If you want insertion order to be preserved, use a LinkedHashMap.
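A minimal sketch of the LinkedHashMap suggestion follows. The name joinKeys and the use of String.join are illustrative choices, not the asker's method; the point is only that iteration follows insertion order, which makes the joined string predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InsertionOrderDemo {
    // Joins the keys in the map's iteration order.
    static String joinKeys(Map<String, String> data) {
        return String.join(", ", data.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in the order the keys were first inserted.
        Map<String, String> testData = new LinkedHashMap<>();
        testData.put("colA", "valA");
        testData.put("colB", "valB");
        testData.put("colC", "valC");

        System.out.println(joinKeys(testData)); // colA, colB, colC
    }
}
```

String.join also returns an empty string for an empty map, so there is no substring call that could fail in that case.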
I am trying to store data in a HashMap however I can only seem to store the very last item of the data source I am reading into the HashMap and I am unsure why.
Below is my code:
//Loops through the counties and stores the details in a Hashmap
void getCountyDetails(List<Marker>m){
HashMap t = new HashMap();
for(Marker county: countyMarkers){
println("county:" + county.getProperties());
t = county.getProperties();
}
println(t);
}
This line -> println("county:" + county.getProperties());
Outputs this:
county:{name=Carlow, pop=54,612}
county:{name=Cavan, pop=73,183}
county:{name=Clare, pop=117,196}
county:{name=Cork, pop=519,032}
county:{name=Donegal, pop=161,137}
county:{name=Dublin, pop=1,273,069}
county:{name=Galway, pop=250,541}
county:{name=Kerry, pop=145,502}
county:{name=Kildare, pop=210,312}
county:{name=Kilkenny, pop=95,419}
county:{name=Laois, pop=80,559}
county:{name=Letrim, pop=31,796}
county:{name=Limerick, pop=191,809}
county:{name=Longford, pop=39,000}
county:{name=Louth, pop=122,897}
county:{name=Mayo, pop=130,638}
county:{name=Meath, pop=184,135}
county:{name=Monaghan, pop=60,483}
county:{name=Offaly, pop=76,687}
county:{name=Roscommon, pop=64,065}
county:{name=Sligo, pop=65,393}
county:{name=Tipperary, pop=158,754}
county:{name=Waterford, pop=113,795}
county:{name=Westmeath, pop=86,164}
county:{name=Wexford, pop=145,320}
county:{name=Wicklow, pop=136,640}
I would like to store them in a HashMap.
This line -> println(t); outputs:
{name=Wicklow, pop=136,640}
Would appreciate any help on the matter guys. Basically it's just getting the list of data into the hashmap and currently only the last item in that list is being placed in.
If you want to print the properties of each Marker, move the println(t) line into the for loop. At the moment, t ends up pointing to the properties of the last element, because you just reassign its value on each iteration of the loop. To put an element in the map, use the put(key, value) or putAll() methods instead.
In Java, you should use hashMap.put(key, value) to add a new item to a hash map.
In your code, you wrote HashMap t = new HashMap(); t = county.getProperties(); so your map variable is actually being reassigned to the current county's properties each time.
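One way to keep every county is to store each properties map under a key of its own with put. The sketch below makes several assumptions: plain Map<String, Object> values stand in for the result of county.getProperties(), the "name" property (taken from the printed output) is used as the key, and indexByName and props are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountyDemo {
    // Stores every properties map under its "name" value instead of
    // overwriting a single variable on each iteration.
    static Map<String, Map<String, Object>> indexByName(List<Map<String, Object>> propertiesList) {
        Map<String, Map<String, Object>> byName = new HashMap<>();
        for (Map<String, Object> properties : propertiesList) {
            byName.put((String) properties.get("name"), properties);
        }
        return byName;
    }

    // Hypothetical helper that builds one county's properties.
    static Map<String, Object> props(String name, String pop) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("name", name);
        properties.put("pop", pop);
        return properties;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> counties = new ArrayList<>();
        counties.add(props("Carlow", "54,612"));
        counties.add(props("Cavan", "73,183"));
        counties.add(props("Wicklow", "136,640"));

        Map<String, Map<String, Object>> byName = indexByName(counties);
        System.out.println(byName.size());                   // 3
        System.out.println(byName.get("Cavan").get("pop"));  // 73,183
    }
}
```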
I have a problem with the HashMap. It changes the references stored as values when new Key-Value-Pairs are inserted.
I use the HashMap for quicker access to objects that are otherwise stored in a very hierarchical structure. When the first pair is inserted, its address and the original address are identical. After adding another pair, the address stored in the HashMap is changed. Therefore I can't access the original objects through the HashMap.
Why is this happening?
Here is the code I use to construct the HashMap. The behavior described above happens in the first for loop of the second method.
private Map<String, Parameter> createRefMap(Settings settings) {
Map<String, Parameter> result = new HashMap<String, Parameter>();
for (ParameterList parameterList : settings.getParameterList()) {
result.putAll(createRefMap(parameterList, "SETTINGS"));
}
return result;
}
private Map<String, Parameter> createRefMap(ParameterList parameterList, String preLevel) {
Map<String, Parameter> result = new HashMap<String, Parameter>();
String level = preLevel + "/" + parameterList.getName();
for (Parameter parameter : parameterList.getParameter()) {
result.put(level + "/" + parameter.getName(), parameter);
}
for (ParameterList innerParameterList : parameterList.getParameterList()) {
result.putAll(createRefMap(innerParameterList, level));
}
return result;
}
This is how I call it
this.actRefMap = createRefMap(this.actAppSettings);
If I understand you correctly, if you do something like this:
System.out.println(thing1.toString());
myMap.put(key1, thing1);
myMap.put(key2, thing2);
System.out.println(thing1.toString());
you are saying that the second println will somehow print out results from a different object? Is it any particular object, or just one at random? What you state as your problem is not possible; it would break an unthinkable number of Java programs.
Part of your assertion is that the "address" changes; I'm not sure what you mean by that. The object id, visible in many debuggers? The physical memory address? Again, if either of these things happened, Map would be broken.
If your actual problem is that some other reference to thing1 no longer has the contents of the reference in the map, then you are changing that external reference to thing1 somewhere.
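The claim that a map does not swap out stored references can be checked with an identity comparison. In this sketch, Parameter is a hypothetical stand-in for the asker's class and the keys are made up; many extra entries are inserted so that the map has to resize and rehash several times.

```java
import java.util.HashMap;
import java.util.Map;

class ReferenceDemo {
    // Hypothetical stand-in for the Parameter class from the question.
    static class Parameter {
        final String name;

        Parameter(String name) {
            this.name = name;
        }
    }

    // Returns true if the map still hands back the very same instance
    // after many more entries have been inserted.
    static boolean sameInstanceAfterMorePuts() {
        Map<String, Parameter> map = new HashMap<>();
        Parameter first = new Parameter("a");
        map.put("SETTINGS/list/a", first);

        // Enough insertions to trigger several internal resizes.
        for (int i = 0; i < 1000; i++) {
            map.put("SETTINGS/list/p" + i, new Parameter("p" + i));
        }

        // == compares references, not contents.
        return map.get("SETTINGS/list/a") == first;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAfterMorePuts()); // true
    }
}
```

If a check like this passes in the real code right after the put but a later lookup finds a different object, the likelier explanation is that the hierarchical structure itself replaced its Parameter instances after the map was built.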