How to see the distribution of keys in a HashMap? - java

When using a hash map, it's important to evenly distribute the keys over the buckets.
If all keys end up in the same bucket, you essentially end up with a list.
Is there a way to "audit" a HashMap in Java in order to see how well the keys are distributed?
I tried subtyping it and iterating Entry<K,V>[] table, but it's not visible.

I tried subtyping it and iterating Entry[] table, but it's not visible
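Use the Reflection API! (On Java 16 and later you have to start the JVM with --add-opens java.base/java.util=ALL-UNNAMED, otherwise setAccessible(true) throws an InaccessibleObjectException.)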
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
public class Main {
//This is to simulate instances which are not equal but go to the same bucket.
static class A {
@Override
public boolean equals(Object obj) { return false; }
@Override
public int hashCode() { return 42; }
}
public static void main(String[] args) throws Exception {
//Test data
HashMap<A, String> map = new HashMap<A, String>(4);
map.put(new A(), "abc");
map.put(new A(), "def");
//Access to the internal table
Class clazz = map.getClass();
Field table = clazz.getDeclaredField("table");
table.setAccessible(true);
Map.Entry<Integer, String>[] realTable = (Map.Entry<Integer, String>[]) table.get(map);
//Iterate and do pretty printing
for (int i = 0; i < realTable.length; i++) {
System.out.println(String.format("Bucket : %d, Entry: %s", i, bucketToString(realTable[i])));
}
}
private static String bucketToString(Map.Entry<Integer, String> entry) throws Exception {
if (entry == null) return null;
StringBuilder sb = new StringBuilder();
//Access to the "next" field of HashMap$Node
Class clazz = entry.getClass();
Field next = clazz.getDeclaredField("next");
next.setAccessible(true);
//going through the bucket
while (entry != null) {
sb.append(entry);
entry = (Map.Entry<Integer, String>) next.get(entry);
if (null != entry) sb.append(" -> ");
}
return sb.toString();
}
}
In the end you'll see something like this in STDOUT:
Bucket : 0, Entry: null
Bucket : 1, Entry: null
Bucket : 2, Entry: Main$A@2a=abc -> Main$A@2a=def
Bucket : 3, Entry: null

HashMap uses the hash values produced by the hashCode() method of your key objects, so I guess you are really asking how evenly distributed those hash code values are. You can get hold of the key objects using Map.keySet().
Now, the OpenJDK and Oracle implementations of HashMap do not use the key hash codes directly; they apply another hashing function to the provided hashes before distributing them over the buckets. That is an implementation detail you should not rely on, so ignore it and just ensure that the hashCode() methods of your key classes produce well-distributed values.
Examining the actual hash codes of some sample key objects is unlikely to tell you anything useful unless your hash code method is very poor. You would be better off doing a basic theoretical analysis of your hash code method. This is not as scary as it might sound. You may (indeed, you have no choice but to) assume that the hash code methods of the supplied Java classes are well distributed. Then you just need to check that the way you combine the hash codes of your data members behaves well for the expected values of those members. Only if your data members have values that are highly correlated in a peculiar way is this likely to be a problem.
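If you still want a quick empirical check for the "very poor hashCode()" case, here is a minimal sketch that only uses the public API. The class and method names (HashCodeAudit, keysPerHashCode, worstCollision) are made up for illustration; you would pass in map.keySet(). It counts how many keys share each hashCode() value, which is independent of how any particular HashMap implementation maps hash codes to buckets.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the JDK.
final class HashCodeAudit {
    /** Counts how many keys share each hashCode() value (a null key counts as hash 0). */
    static Map<Integer, Integer> keysPerHashCode(Iterable<?> keys) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Object key : keys) {
            counts.merge(Objects.hashCode(key), 1, Integer::sum);
        }
        return counts;
    }

    /** Largest number of keys that share a single hash code (0 if there are no keys). */
    static int worstCollision(Iterable<?> keys) {
        int worst = 0;
        for (int n : keysPerHashCode(keys).values()) {
            worst = Math.max(worst, n);
        }
        return worst;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known pair of Strings with the same hash code.
        List<String> sample = Arrays.asList("Aa", "BB", "C");
        System.out.println("distinct hash codes: " + keysPerHashCode(sample).size());
        System.out.println("worst collision: " + worstCollision(sample));
    }
}
```

A worst collision close to the number of keys means the hashCode() method is effectively constant; a value of 1 or 2 on realistic data is usually nothing to worry about.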

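You can use reflection to access the hidden fields. Note that table and next are implementation details of the OpenJDK HashMap, and that recent Java versions (16 and later) only allow this when the JVM is started with --add-opens java.base/java.util=ALL-UNNAMED: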
HashMap map = ...;
// get the HashMap#table field
Field tableField = HashMap.class.getDeclaredField("table");
tableField.setAccessible(true);
Object[] table = (Object[]) tableField.get(map);
int[] counts = new int[table.length];
// get the HashMap.Node#next field
Class<?> entryClass = table.getClass().getComponentType();
Field nextField = entryClass.getDeclaredField("next");
nextField.setAccessible(true);
for (int i = 0; i < table.length; i++) {
Object e = table[i];
int count = 0;
if (e != null) {
do {
count++;
} while ((e = nextField.get(e)) != null);
}
counts[i] = count;
}
Now you have an array of the entry counts for each bucket.
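To turn that array into something readable, a small summary helps. This is only a sketch; BucketStats and its method names are made up, and it works on any int[] of per-bucket counts, however you obtained it.

```java
import java.util.Arrays;

// Hypothetical helper that summarizes per-bucket entry counts.
final class BucketStats {
    /** Number of buckets that hold no entry at all. */
    static int emptyBuckets(int[] counts) {
        int empty = 0;
        for (int c : counts) {
            if (c == 0) empty++;
        }
        return empty;
    }

    /** Length of the longest chain, i.e. the fullest bucket. */
    static int longestChain(int[] counts) {
        int longest = 0;
        for (int c : counts) {
            longest = Math.max(longest, c);
        }
        return longest;
    }

    /** One-line summary of the distribution. */
    static String summary(int[] counts) {
        int entries = Arrays.stream(counts).sum();
        return String.format("buckets=%d, entries=%d, empty=%d, longest=%d",
                counts.length, entries, emptyBuckets(counts), longestChain(counts));
    }

    public static void main(String[] args) {
        // Counts for a 4-bucket table where both entries collided in bucket 2.
        int[] counts = {0, 0, 2, 0};
        System.out.println(summary(counts));
    }
}
```

A longest chain that is large compared to entries / buckets is the sign of a badly distributed hashCode().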

Client.java
import java.util.HashMap;
import java.util.Map;
public class Client{
public static void main(String[] args) {
Map<Example, Number> m = new HashMap<>();
Example e1 = new Example(100); //point 1
Example e2 = new Example(200); //point2
Example e3 = new Example(300); //point3
m.put(e1, 10);
m.put(e2, 20);
m.put(e3, 30);
System.out.println(m);//point4
}
}
Example.java
public class Example {
int s;
Example(int s) {
this.s =s;
}
@Override
public int hashCode() {
// TODO Auto-generated method stub
return 5;
}
}
At points 1, 2 and 3 in Client.java we create three keys of type Example, which are then inserted into the HashMap m. Because hashCode() is overridden in Example.java to return a constant, all three keys e1, e2 and e3 have the same hash code and therefore land in the same bucket of the HashMap.
Now the problem is how to see the distribution of keys.
Approach:
Set a breakpoint at point 4 in Client.java.
Debug the Java application.
Inspect m.
Inside m you will find a table array of type HashMap$Node with length 16.
This is literally the hash table. Each non-null slot holds the head of a linked list of the entries that were inserted into that bucket. Each node has a hash field holding the value returned by HashMap's hash() method. That value is then reduced to an index into the table array (older JDKs do this in an indexFor() method), which is where the entry is inserted. (See @Rahul's link in the comments on the question to understand the concept of hash and indexFor.)
In the case above, if you inspect table you will find that all slots but one are null.
We inserted three keys but can see only one slot in use, i.e. all three keys have been inserted into the same bucket, at the same index of table.
Inspect that table element (in this case it is at index 5): its key corresponds to e1 and its value to 10 (point 1).
Its next field points to the next node of the linked list, i.e. the next entry, which is (e2, 20) in our case.
So in this way you can inspect the HashMap.
I would also recommend going through the internal implementation of HashMap to understand it thoroughly.
Hope it helped.

Related

Compare value according to key in LinkedHashMap to another LinkedHashMap using java

I have two LinkedHashMaps (key: String, value: String[]) of the same size and with the same keys. I want to compare the values key by key, verifying that the values in one map are equal to the values under the same key in the second map, or at least that the second map contains those values.
I populate the two LinkedHashMaps with keys and values separately.
Example for hashmap:
Key - alert - Value (array of strings)
0 - Device_UID,Instance_UID,Configuration_Set_ID,Alert_UID
1 - a4daeccb-0115-430c-b516-ab7edf314d35,0a7938aa-9a01-437f-88ac-4b2927ed7665,96,61b68069-9de7-4b85-83cb-8d9f558e8ecb
2 - a4daeccb-0115-430c-b516-ab7edf314d35,0a7938aa-9a01-437f-88ac-4b2927ed7665,12,92757faa-bf6b-4aa3-ba6d-2e57b44f333c
3 - a4daeccb-0115-430c-b516-ab7edf314d35,0a7938aa-9a01-437f-88ac-4b2927ed7665,369,779b3294-2ca3-4613-a413-bf8d4aa05d16
and the same values should at least be present in the second LinkedHashMap:
String rdsColumns="";
for(String key : mapServer.keySet()){
String[] value = mapServer.get(key);
String[] item = value[0].split(",");
rdsColumns="";
for(String val:item){
rdsColumns = rdsColumns.concat(val + ",");
}
rdsColumns = rdsColumns.concat(" ");
rdsColumns = rdsColumns.replace(", ", "");
info(("Query is: "+ returnSuitableQueryString(rdsColumns, key, alertId, deviceId)));
String query=returnSuitableQueryString(rdsColumns, key, alertId, deviceId);
mapRDS.put(key, insightSQL.returnResultsAsArray(query ,rdsColumns.split(","),rdsColumns));
}
where rdsColumns holds the fields I am querying in the RDS database.
Expected: iterate over both maps and verify that, for each key, all values in the first map are equal to or contained in the values of the second map.
This is the code you are looking for:
for (String keys : firstMap.keySet()) {
String[] val1 = firstMap.get(keys);
String[] val2 = secondMap.get(keys);
if (Arrays.equals(val1, val2)) {
//return true;
}
ArrayList<Boolean> contains = new ArrayList<>();
for (int i = 0; i < val1.length; i++) {
for (String[] secondMapVal : secondMap.values()) {
List<String> list = Arrays.asList(secondMapVal);
if (list.contains(val1[i])) {
contains.add(true);
break;
} else contains.add(false);
}
}
if (contains.contains(true)) {
//return true; Even a single value matches up
} else {
//return false; Not even a single value matches up
}
}
Basically what we have here are two maps of type Map<String, String[]>. We take the key set of the first map and iterate over it. For each key we fetch the value arrays from both maps. If the two arrays are equal, the values match exactly; otherwise we check whether any element of the first array appears in one of the arrays of the second map. You can adapt this to other types of maps, even ones with custom value classes. If I didn't understand your problem, tell me and I will edit the answer.
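If what you need is the stricter check from the question (for each key, every value of the first map must also appear under the same key in the second map), a compact sketch could look like this. The class and method names are made up, and the reading of "contains" as "per key, ignoring order and duplicates" is my assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; assumes "contains" means per-key containment.
final class MapValueComparer {
    /** Returns the keys whose values in 'first' are not all present under the same key in 'second'. */
    static List<String> keysNotCovered(Map<String, String[]> first, Map<String, String[]> second) {
        List<String> failing = new ArrayList<>();
        for (Map.Entry<String, String[]> e : first.entrySet()) {
            String[] other = second.get(e.getKey());
            if (other == null) {            // key missing in the second map
                failing.add(e.getKey());
                continue;
            }
            Set<String> otherValues = new HashSet<>(Arrays.asList(other));
            if (!otherValues.containsAll(Arrays.asList(e.getValue()))) {
                failing.add(e.getKey());
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        Map<String, String[]> first = new LinkedHashMap<>();
        Map<String, String[]> second = new LinkedHashMap<>();
        first.put("alert", new String[] {"a", "b"});
        first.put("device", new String[] {"x"});
        second.put("alert", new String[] {"b", "a", "c"});
        second.put("device", new String[] {"y"});
        System.out.println(keysNotCovered(first, second)); // keys that failed the check
    }
}
```

An empty result means every value of the first map was found under the same key in the second map.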

Java - how to get a key object (or entry) stored in HashMap by key?

I'd like to get the "canonical" key object for each key usable to query a map. See here:
Map<UUID, String> map = new HashMap();
UUID a = new UUID("ABC...");
map.put(a, "Tu nejde o zamykání.");
UUID b = new UUID("ABC...");
String string = map.get(b); // This gives that string.
// This is what I am looking for:
UUID againA = map.getEntry(b).key();
boolean thisIsTrue = a == againA;
A HashMap uses equals(), which is the same for multiple unique objects. So I want to get the actual key from the map, which will always be the same, no matter what object was used to query the map.
Is there a way to get the actual key object from the map? I don't see anything in the interface, but perhaps some clever trick I overlooked?
(Iterating all entries or keys doesn't count.)
Is there a way to get the actual key object from the map?
OK, so I am going to make some assumptions about what you mean. After all, you said that your question doesn't need clarification, so the obvious meaning that I can see must be the correct one. Right? :-)
The answer is No. There isn't a way.
Example scenario (not compileable!)
UUID uuid = UUID.fromString("xxxx-yyy-zzz");
UUID uuid2 = UUID.fromString("xxxx-yyy-zzz"); // same string
println(uuid == uuid2); // prints false
println(uuid.equals(uuid2)); // prints true
Map<UUID, String> map = new ...
map.put(uuid, "fred");
println(map.get(uuid)); // prints fred
println(map.get(uuid2)); // prints fred (because uuid.equals(uuid2) is true)
... but, the Map API does not provide a way to find the actual key (in the example above it is uuid) in the map apart from iterating the key or entry sets. And I'm not aware of any existing Map class (standard or 3rd-party) that does provide this [1].
However, you could implement your own Map class with an additional method for returning the actual key object. There is no technical reason why you couldn't, though you would have more code to write, test, maintain, etcetera.
But I would add that I agree with Jim Garrison. If you have a scenario where you have UUID objects (with equality-by-value semantics) and you also want to implement equality by identity semantics, then there is probably something wrong with your application's design. The correct approach would be to change the UUID.fromString(...) implementation to always return the same UUID object for the same input string.
1 - This is not to say that such a map implementation doesn't exist. But if it does, you should be able to find it if you look hard enough. Note that questions asking us to find or recommend a library are off-topic!
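To illustrate the "implement your own Map class" idea, here is a minimal sketch rather than a full java.util.Map implementation. KeyRecallingMap and getKey are made-up names. It wraps a HashMap whose values are entries that remember the key object that was stored first.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical wrapper, not a complete Map implementation.
final class KeyRecallingMap<K, V> {
    private final Map<K, Map.Entry<K, V>> entries = new HashMap<>();

    /** Stores the value; the first key instance seen for an equal key stays the canonical one. */
    V put(K key, V value) {
        Map.Entry<K, V> old = entries.get(key);
        K canonical = (old != null) ? old.getKey() : key;
        entries.put(canonical, new AbstractMap.SimpleImmutableEntry<>(canonical, value));
        return (old == null) ? null : old.getValue();
    }

    V get(Object key) {
        Map.Entry<K, V> e = entries.get(key);
        return (e == null) ? null : e.getValue();
    }

    /** Returns the key object actually stored in the map, or null if there is none. */
    K getKey(Object key) {
        Map.Entry<K, V> e = entries.get(key);
        return (e == null) ? null : e.getKey();
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        UUID b = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        KeyRecallingMap<UUID, String> map = new KeyRecallingMap<>();
        map.put(a, "fred");
        System.out.println(a == b);             // two distinct but equal objects
        System.out.println(map.getKey(b) == a); // the lookup hands back the stored instance
    }
}
```

The cost is one extra entry object per mapping, which is the same trade-off as storing the key inside the value.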
There is a (relatively) simple way of doing this. I’ve done so in my applications from time to time, when needed ... not for the purpose of == testing, but to reduce the number of identical objects being stored when tens of thousands of objects exist, and are cross-referenced with each other. This significantly reduced my memory usage, and improved performance ... while still using equals() for equality tests.
Just maintain a parallel map for interning the keys.
Map<UUID, UUID> interned_keys = ...
UUID key = ...
if (interned_keys.containsKey(key))
key = interned_keys.get(key);
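A runnable sketch of the same interning idea, with the lookup and the registration folded into one call. KeyInterner and intern are made-up names. Map.putIfAbsent returns the previously stored value, or null if the key was not present yet.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that hands out one canonical instance per equal key.
final class KeyInterner<K> {
    private final Map<K, K> canonical = new HashMap<>();

    /** Returns the first instance ever passed in that equals(key); registers key if it is new. */
    K intern(K key) {
        K existing = canonical.putIfAbsent(key, key);
        return (existing != null) ? existing : key;
    }

    public static void main(String[] args) {
        KeyInterner<String> interner = new KeyInterner<>();
        String first = new String("abc");
        String second = new String("abc");               // equal, but a different object
        System.out.println(interner.intern(first) == first);   // first becomes canonical
        System.out.println(interner.intern(second) == first);  // second is mapped back to first
    }
}
```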
Of course, it is far better when the object being stored knows what its own identity is. Then you get the interning basically for free.
class Item {
UUID key;
// ...
}
Map<UUID, Item> map = ...
map.put(item.key, item);
UUID key = ...
key = map.get(key).key; // get interned key
I think there are valid reasons for wanting the actual key, for example to save memory. Also keep in mind that the actual key may store other objects. For instance, suppose you have a vertex of a graph. The vertex can store the actual data (say a String), as well as the incident vertices. A vertex's hash value can depend only on the data. So to look up a vertex with some data D, look up a vertex with data D and with no incident vertices. Now if you can return the actual vertex in the map, you will be able to get the vertices that are actually incident to it.
It seems to me that many map implementations could easily provide a getEntry method. For example, the HashMap implementation for get is:
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
One could use the getNode method to return an Entry:
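public Map.Entry<K,V> getEntry(Object key){
Node<K,V> e = getNode(hash(key),key);
if(e == null) return null;
// Map.Entry is an interface, so wrap the key and value in a concrete entry class
return new AbstractMap.SimpleImmutableEntry<>(e.key,e.value);
}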
The easiest way is to duplicate the reference to the key in the value using a generic Pair type, like this:
HashMap<UUID,Pair<UUID,String>> myMap = new HashMap<>();
When you put them in the map, you provide the reference to the key to the pair. The cost is one reference per entry.
void add(UUID uuid, String str)
{
myMap.put(uuid,Pair.of(uuid,str));
}
Pair<UUID,String> get(UUID uuid)
{
return myMap.get(uuid);
}
Then getFirst() of the Pair is your key. getSecond() is the value.
Whatever you do, it's going to cost you in either time or space.
Your Pair class will be something like:
public class Pair<A,B>
{
private final A a;
private final B b;
public Pair(A a, B b)
{
this.a = a;
this.b = b;
}
/**
* @return the first argument of the Pair
*/
public A getFirst()
{
return this.a;
}
/**
* @return the second argument of the Pair
*/
public B getSecond()
{
return this.b;
}
/**
* Create a Pair.
*
* @param a The first argument (of type A)
* @param b The second argument (of type B)
*
* @return A Pair of A and B
*/
public static <A,B> Pair<A,B> of(A a, B b)
{
return new Pair<>(a,b);
}
// Don't forget to get your IDE to produce a hashcode()
// and equals() method for you, depending
// on if you allow nulls or not, or DIY.
}
This could help: you can use a for-each loop like the one below.
Map<String,Object> map = new HashMap<>();
map.put("hello1", new String("Hello"));
map.put("hello2", new String("World"));
map.put("hello3", new String("How"));
map.put("hello4", new String("Are u"));
for(Map.Entry<String,Object> e: map.entrySet()){
System.out.println(e.getKey());
}

Contains operation in hashmap key

My HashMap contains an entry with key its-site-of-origin-from-another-site##NOUN and value its##ADJ site-of-origin-from-another-site##NOUN.
I want to get the value for this key using only the part of the key before "##", i.e. "its-site-of-origin-from-another-site".
If the HashMap contains a key like 'its-site-of-origin-from-another-site', then it should first pick 'its' and then 'site-of-origin-from-another-site' only, not the part after '##'.
No. It would be a String so it will pick up whatever after "##" as well. If you need value based on substring then you would have to iterate over the map like:
String value = map.get("its...");
if (value != null) {
//exact match for value
//use it
} else {//or use map or map which will reduce your search time but increase complexity
for (Map.Entry<String, String> entry : map.entrySet()) {
if (entry.getKey().startsWith("its...")) {
//that's the value i needed.
}
}
}
You can consider using a Patricia trie. It's a data structure like a TreeMap where the key is a String and the value can be of any type. It is storage-efficient because common prefixes are shared between keys, but the property that is interesting for your use case is that you can search for a specific prefix and get a sorted view of the matching map entries.
The following is an example with the Apache Commons Collections implementation.
import java.util.Random;
import java.util.SortedMap;
import org.apache.commons.collections4.trie.PatriciaTrie;
public class TrieStuff {
public static void main(String[] args) {
// Build a Trie with String values (keys are always strings...)
PatriciaTrie<String> pat = new PatriciaTrie<>();
// put some key/value stuff with common prefixes
Random rnd = new Random();
String[] prefix = {"foo", "bar", "foobar", "fiz", "buz", "fizbuz"};
for (int i = 0; i < 100; i++) {
int r = rnd.nextInt(6);
String key = String.format("%s-%03d##whatever", prefix[r], i);
String value = String.format("%s##ADJ %03d##whatever", prefix[r], i);
pat.put(key, value);
}
// Search for all entries whose keys start with "fiz"
SortedMap<String, String> fiz = pat.prefixMap("fiz");
fiz.entrySet().stream().forEach(e -> System.out.println(e));
}
}
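This prints all entries whose keys start with "fiz", sorted by key. Only the keys are shown here, and the numbers vary from run to run because the prefixes are picked at random: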
fiz-000##whatever
fiz-002##whatever
fiz-012##whatever
fiz-024##whatever
fiz-027##whatever
fiz-033##whatever
fiz-036##whatever
fiz-037##whatever
fiz-041##whatever
fiz-045##whatever
fiz-046##whatever
fiz-047##whatever
fizbuz-008##whatever
fizbuz-011##whatever
fizbuz-016##whatever
fizbuz-021##whatever
fizbuz-034##whatever
fizbuz-038##whatever
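If you would rather not add a dependency, a similar prefix lookup can be sketched with the standard TreeMap. This is an alternative technique, not what PatriciaTrie does internally, and PrefixLookup / withPrefix are made-up names. It relies on String ordering: every key that starts with the prefix sorts between prefix and prefix followed by Character.MAX_VALUE (the rare key that continues with that very character right after the prefix would be missed).

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: prefix search on a sorted map from the standard library.
final class PrefixLookup {
    /** Returns a sorted view of all entries whose key starts with the given prefix. */
    static <V> SortedMap<String, V> withPrefix(TreeMap<String, V> map, String prefix) {
        // Lower bound: the prefix itself (inclusive).
        // Upper bound: the prefix followed by the largest char value (exclusive).
        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> map = new TreeMap<>();
        map.put("its-site-of-origin-from-another-site##NOUN",
                "its##ADJ site-of-origin-from-another-site##NOUN");
        map.put("other-key##NOUN", "something else");
        // Look up by the part of the key before "##".
        System.out.println(withPrefix(map, "its-site-of-origin-from-another-site").values());
    }
}
```

The lookup costs O(log n) to find the start of the range, compared to the full scan needed with a plain HashMap.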

Java, hashmap inside a hashmap

This is a follow-up to my question here: How To Access hash maps key when the key is an object
I wanted to try something like this: webSearchHash.put(xfile.getPageTitle(i),outlinks.put(keyphrase.get(i), xfile.getOutLinks(i)));
I wonder why the values in my map are null.
Here is my code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Set;
import readFile.*;
public class WebSearch {
readFile.ReadFile xfile = new readFile.ReadFile("inputgraph.txt");
HashMap webSearchHash = new HashMap();
ArrayList belongsTo = new ArrayList();
ArrayList keyphrase = new ArrayList();
public WebSearch() {
}
public void createGraph()
{
HashMap <Object, ArrayList<Integer> > outlinks = new HashMap <Object, ArrayList<Integer>>();
for (int i = 0; i < xfile.getNumberOfWebpages(); i++ )
{
keyphrase.add(i,xfile.getKeyPhrases(i));
webSearchHash.put(xfile.getPageTitle(i),outlinks.put(keyphrase.get(i), xfile.getOutLinks(i)));
}
}
}
when I do System.out.print(webSearchHash); the output is {Star-Ledger=null, Apple=null, Microsoft=null, Intel=null, Rutgers=null, Targum=null, Wikipedia=null, New York Times=null}
However System.out.print(outlinks); gives me : {[education, news, internet]=[0, 3], [power, news]=[1, 4], [computer, internet, device, ipod]=[2]} Basically I want a hashmap to be a value of my key
You really shouldn't use a HashMap (or any mutable object) as your key, since it will destabilize your Map. Depending on what you're intending to accomplish, there may be a number of useful approaches and libraries, but using an unstable object as a Map key is asking for trouble.
So figured I just do this which gives exactly what I want:
for (int i = 0; i < xfile.getNumberOfWebpages(); i++ )
{
HashMap <Object, ArrayList<Integer> > outlinks = new HashMap <Object, ArrayList<Integer>>();
keyphrase.add(i,xfile.getKeyPhrases(i));
outlinks.put(keyphrase.get(i), xfile.getOutLinks(i));
webSearchHash.put(xfile.getPageTitle(i), outlinks);
}
Your problem is you are putting in null with this statement
webSearchHash.put(xfile.getPageTitle(i),outlinks.put(keyphrase.get(i), xfile.getOutLinks(i)));
Let's break it down. A put is of the form
map.put(key,value)
So for your key you have getPageTitle(i), which is fine.
For your value, you have the return value of
outlinks.put(keyphrase.get(i), xfile.getOutLinks(i))
According to the Javadoc, HashMap's put returns the previous value that was associated with this key (in this case keyphrase.get(i)), or null if no value was previously associated with it.
Since nothing was previously associated with your key, it returns null.
So your statement is effectively saying
webSearchHash.put(xfile.getPageTitle(i),null);
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html#put(K, V)
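A tiny self-contained demonstration of that return value (PutReturnDemo and returnedValues are made-up names):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shows what Map.put hands back on the first and on a repeated insertion.
final class PutReturnDemo {
    static List<Integer> returnedValues() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> returned = new ArrayList<>();
        returned.add(map.put("Apple", 1)); // no previous mapping: put returns null
        returned.add(map.put("Apple", 2)); // previous value 1 is returned and replaced
        returned.add(map.get("Apple"));    // the map now holds 2
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(returnedValues());
    }
}
```

So nesting one put inside another stores the old value (usually null), not the inner map itself.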

Java : ArrayList<HashMap<Integer,Integer>>

Is there a better approach to do the below in Java, without using external libraries.
I need to model group/child (tree like) structure of int (primitive). In Json
[{1,1}, {1,2}, {2,1},{3,1}]
I need to support addition/removal of elements (element is a pair {group, child} ) without duplication.
I am thinking of, keeping a data structure like.
ArrayList<HashMap<Integer,Integer>>
To add.
Iterate through ArrayList, check HashMap key and value against the value to insert, and insert if not exist.
To delete:
Iterate through ArrayList, check HashMap key and value against the value to delete, and delete if exist.
Is there a better data structure or approach in the standard library?
Following one of the answers below, I made a class like this.
Please let me know if there is anything to watch out for. I expect (and am going to try out) that ArrayList will handle add/remove correctly by using the equals method of the KeyValue class. Thanks.
static class KeyValue {
int groupPos;
int childPos;
KeyValue(int groupPos, int childPos) {
this.groupPos = groupPos;
this.childPos = childPos;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
KeyValue keyValue = (KeyValue) o;
if (childPos != keyValue.childPos) return false;
if (groupPos != keyValue.groupPos) return false;
return true;
}
@Override
public int hashCode() {
int result = groupPos;
result = 31 * result + childPos;
return result;
}
}
If I understand what you're trying to do, this may be simpler:
TreeMap<Integer,TreeSet<Integer>>
or
HashMap<Integer,HashSet<Integer>>
So, rather than
[{1,1}, {1,2}, {2,1}, {3,1}]
you'd have
[{1, {1, 2}},
{2, {1}},
{3, {1}}]
Note that all four of the classes above eliminate duplicates automatically.
To add:
TreeMap<Integer, TreeSet<Integer>> map;
TreeSet<Integer> set = map.get(group);
if (set == null) // create it if it doesn't exist
{
set = new TreeSet<Integer>();
map.put(group, set);
}
set.add(child);
To remove:
TreeMap<Integer, TreeSet<Integer>> map;
TreeSet<Integer> set = map.get(group);
if (set != null) // nothing to remove if the group doesn't exist
{
set.remove(child);
if (set.isEmpty()) // remove it if it is now empty
map.remove(group);
}
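On Java 8 and later the add and remove steps can be wrapped up a bit more compactly. The following is only a sketch; GroupChildren is a made-up class name, and it uses Map.computeIfAbsent to create the child set on first use.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical wrapper around a group -> children map without duplicates.
final class GroupChildren {
    private final Map<Integer, Set<Integer>> groups = new TreeMap<>();

    /** Adds the pair; returns false if it was already present. */
    boolean add(int group, int child) {
        return groups.computeIfAbsent(group, g -> new TreeSet<>()).add(child);
    }

    /** Removes the pair; returns false if it was not present. Empty groups are dropped. */
    boolean remove(int group, int child) {
        Set<Integer> children = groups.get(group);
        if (children == null || !children.remove(child)) {
            return false;
        }
        if (children.isEmpty()) {
            groups.remove(group);
        }
        return true;
    }

    @Override
    public String toString() {
        return groups.toString();
    }

    public static void main(String[] args) {
        GroupChildren gc = new GroupChildren();
        gc.add(1, 1);
        gc.add(1, 2);
        gc.add(2, 1);
        gc.add(3, 1);
        gc.add(1, 1);      // duplicate, ignored
        gc.remove(2, 1);   // group 2 becomes empty and disappears
        System.out.println(gc);
    }
}
```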
You may write a class named KeyValue with two properties to hold the group and the child, and add KeyValue objects to an ArrayList. For the CRUD operations, implement equals (and hashCode) in your KeyValue class so that contains and remove can find matching elements.
Instead of HashMap, use a class called Pair with two fields {group,child} which will implement Comparable interface. Then implement/override its equals(), hashCode() and compareTo() methods. Then use either a List<Pair> or Set<Pair> depending on your needs to hold them. Having compareTo() implemented gives you the flexibility to sort Pairs easily too.
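A minimal sketch of such a class follows. I called it GroupChild here to avoid confusion with other Pair classes; the name and the "{group,child}" text format are just my choices. Put into a TreeSet, the pairs are kept sorted and duplicates are rejected.

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical immutable {group, child} pair, ordered by group and then by child.
final class GroupChild implements Comparable<GroupChild> {
    final int group;
    final int child;

    GroupChild(int group, int child) {
        this.group = group;
        this.child = child;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GroupChild)) return false;
        GroupChild other = (GroupChild) o;
        return group == other.group && child == other.child;
    }

    @Override
    public int hashCode() {
        return Objects.hash(group, child);
    }

    @Override
    public int compareTo(GroupChild other) {
        int byGroup = Integer.compare(group, other.group);
        return (byGroup != 0) ? byGroup : Integer.compare(child, other.child);
    }

    @Override
    public String toString() {
        return "{" + group + "," + child + "}";
    }

    public static void main(String[] args) {
        Set<GroupChild> pairs = new TreeSet<>();
        pairs.add(new GroupChild(2, 1));
        pairs.add(new GroupChild(1, 2));
        pairs.add(new GroupChild(1, 1));
        pairs.add(new GroupChild(1, 1)); // duplicate, not added again
        System.out.println(pairs);
    }
}
```

Keeping compareTo consistent with equals, as done here, matters because TreeSet decides about duplicates with compareTo, not with equals.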
I am new to the data structure world, but I think we can use this, assuming that no two objects in the Set are equal:
Set validSet=new HashSet(); // Use Generics here
HashSet provides constant-time add/delete/contains on average.
class SomeObject {
Integer parent;
Integer child;
// define equals and hashCode based on your requirement
}
Going by your question, I think you want to show this line
[{1,1}, {1,2}, {2,1},{3,1}]
as
Group 1 -> 1, 2 (from the first two pairs)
Group 2 -> 1 (from the third pair)
Group 3 -> 1 (from the fourth pair)
The data structure that best suits storing this hierarchy is:
Map<Integer,Set<Integer>> map = new HashMap<Integer,Set<Integer>>();
The key of the map stores the group number, and the value is a TreeSet that stores the children of that group.
An example:
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
class TreeLike
{
public static void main(String[] args)
{
Map<Integer,Set<Integer>> map = new HashMap<Integer,Set<Integer>>();
int groups[] = {1,2,3,4,5,6,7};
//To add new group in map
for (int i = 0 ; i < groups.length; i++)
{
Set<Integer> child = new TreeSet<Integer>();
child.add(1);child.add(2);child.add(3);child.add(4);child.add(5);
map.put(groups[i],child);
}
//To add new child(8) to a group (say group 1)
Set<Integer> child = map.get(1);
if (child != null)
{
child.add(8);
map.put(1,child);
}
//To remove a child (say child 4) from group 3
child = map.get(3);
if (child != null)
{
child.remove(4);
map.put(3,child);
}
//To Iterate through all trees
Set<Map.Entry<Integer,Set<Integer>>> entrySet = map.entrySet();
Iterator<Map.Entry<Integer,Set<Integer>>> iterator = entrySet.iterator();
while (iterator.hasNext())
{
Map.Entry<Integer,Set<Integer>> entry = iterator.next();
int group = entry.getKey();
Set<Integer> children = entry.getValue();
System.out.println("Group "+group+" children-->"+children);
}
}
}
