I am new to Java and I am trying to learn about hash tables. I want to insert objects into my hash table and then print all of them at the end. I am not sure I am doing this right, because I have read that I need to override the get() method or the hashCode() method, but I am not sure why.
I am passing in String objects of student names. When I run the debugger after my inserts, it shows the key as "null" and the indexes of my inserts are at random places in the hash table. Ex. 1, 6, 10
This is how I have been adding. Can anyone tell me if this is correct and do I actually need to override things?
Thanks in advance!
CODE
Hashtable<String, String> hashTable = new Hashtable<String, String>();
hashTable.put("Donald", "Trump");
hashTable.put("Mike", "Myers");
hashTable.put("Jimmer", "Markus");
You are doing things correctly. Remember, a Hashtable is not a direct-access structure. You can't "get the third item from a Hashtable", for example. There is no real meaning to the term "index" when you're talking about a Hashtable: numerical indexes of items mean nothing.
A Hashtable guarantees that it will hold key-value pairs for you in a way that makes it very fast to look up a value based on its key (for example: given Donald, you will get Trump very quickly). Of course, certain conditions have to be fulfilled for this to work correctly, but your simple String-to-String example meets them.
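To make the key-based lookup concrete, here is a minimal sketch using the names from the question. The class name LookupDemo and the helper build() are made up for illustration only.

```java
import java.util.Hashtable;

class LookupDemo {
    // Builds the same table as in the question.
    static Hashtable<String, String> build() {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("Donald", "Trump");
        table.put("Mike", "Myers");
        table.put("Jimmer", "Markus");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = build();
        // Values are found by key, not by position.
        System.out.println(table.get("Donald"));       // Trump
        System.out.println(table.get("Nobody"));       // null, because there is no such key
        System.out.println(table.containsKey("Mike")); // true
    }
}
```

Where each entry physically sits inside the table (the "indexes" you saw in the debugger) is an internal detail and not something your code should rely on.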
You should read more about hash tables in general, to see how they really work behind the scenes.
EDIT (as per OP's request): you are asking about storing Student instances in your Hashtable. As I mentioned above, certain conditions have to be addressed for a Hashtable to work correctly. Those conditions are concerning the key part, not the value part.
If your Student instance is the value, and a simple String is the key, then there's nothing special for you to do, because the String class already satisfies all of the conditions required for a proper Hashtable key.
If your Student instance is the key, then the following conditions must be met:
Inside Student, you must override the hashCode method in such a way that subsequent invocations of hashCode return exactly the same value. In other words, the expression x.hashCode() == x.hashCode() must always be true. In addition, two Student instances that are equal according to equals must return the same hash code.
Inside Student, you must override the equals method in such a way that it only returns true for two Student instances that represent the same student (for example, the same name and address), and returns false otherwise.
These conditions are enough for Student to function as a proper Hashtable key. You can further optimize things by writing a better hashCode implementation (read about it... it's quite long to type in here), but as long as you answer the aforementioned two, you're good to go.
Example:
class Student {
    private String name;
    private String address;

    @Override
    public int hashCode() {
        // Assuming 'name' and 'address' are not null, for simplification here.
        return name.hashCode() + address.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Student)) {
            return false;
        }
        if (other == this) {
            return true;
        }
        Student otherStudent = (Student) other;
        return name.equals(otherStudent.name) && address.equals(otherStudent.address);
    }
}
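As a rough illustration of why both methods matter, the sketch below uses a self-contained variant of the class as a Hashtable key. The names StudentKey and StudentKeyDemo, the constructor, and the sample address are assumptions added for the example. It uses java.util.Objects so that null fields are tolerated.

```java
import java.util.Hashtable;
import java.util.Objects;

class StudentKey {
    private final String name;
    private final String address;

    StudentKey(String name, String address) {
        this.name = name;
        this.address = address;
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals compares.
        return Objects.hash(name, address);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentKey)) return false;
        StudentKey that = (StudentKey) other;
        return Objects.equals(name, that.name) && Objects.equals(address, that.address);
    }
}

class StudentKeyDemo {
    static String lookupGrade() {
        Hashtable<StudentKey, String> grades = new Hashtable<>();
        grades.put(new StudentKey("Donald", "1 Main St"), "A");
        // A different instance with the same field values finds the entry.
        return grades.get(new StudentKey("Donald", "1 Main St"));
    }

    public static void main(String[] args) {
        System.out.println(lookupGrade()); // A
    }
}
```

Without the two overrides, the second StudentKey would be treated as a different key and the lookup would return null.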
Try this code:
// Requires: import java.util.Enumeration; import java.util.Hashtable;
Hashtable<String, String> hashTable = new Hashtable<String, String>();
hashTable.put("Donald", "16 years old");
hashTable.put("Mike", "20 years old");
hashTable.put("Jimmer", "18 years old");

// Show all students in the hash table.
// 'txt' is assumed to be something with an append(String) method,
// such as a StringBuilder or a JTextArea.
Enumeration<String> studentsNames = hashTable.keys();
while (studentsNames.hasMoreElements()) {
    String str = studentsNames.nextElement();
    txt.append("\n" + str + ": " + hashTable.get(str));
}
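If you prefer to avoid Enumeration, the same listing can be sketched with entrySet() and a for-each loop. PrintAllDemo and describe() are hypothetical names, and a StringBuilder is used for the output only to keep the example self-contained.

```java
import java.util.Hashtable;
import java.util.Map;

class PrintAllDemo {
    // Returns one "key: value" line per entry; the order is not guaranteed.
    static String describe(Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : table.entrySet()) {
            out.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Hashtable<String, String> hashTable = new Hashtable<>();
        hashTable.put("Donald", "16 years old");
        hashTable.put("Mike", "20 years old");
        hashTable.put("Jimmer", "18 years old");
        System.out.print(describe(hashTable));
    }
}
```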
I want to create a class Customer who can be uniquely identified by Customer No.
I wrote the code below
import java.util.HashSet;
import java.util.Set;

public class Customer {
    private Integer customerNo;
    private String customerName;

    public Customer(Integer customerNo, String customerName) {
        this.customerNo = customerNo;
        this.customerName = customerName;
    }

    @Override
    public int hashCode() {
        return this.customerNo;
    }

    public Integer getCustomerNo() {
        return this.customerNo;
    }

    public String getCustomerName() {
        return this.customerName;
    }

    @Override
    public boolean equals(Object o) {
        Customer cus = (Customer) o;
        return (this.customerNo == cus.getCustomerNo() && this.customerName != null && this.customerName.equals(cus.getCustomerName()));
    }

    @Override
    public String toString() {
        StringBuffer strb = new StringBuffer();
        strb.append("Customer No ")
            .append(this.customerNo)
            .append(", Customer Name ")
            .append(this.customerName)
            .append("\n");
        return strb.toString();
    }

    public static void main(String[] args) {
        Set<Customer> set = null;
        try {
            set = new HashSet<Customer>();
            set.add(new Customer(1, "Jack"));
            set.add(new Customer(3, "Will"));
            set.add(new Customer(1, "Tom"));
            set.add(new Customer(3, "Fill"));
            System.out.println("Size " + set.size());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
From the above code you can see that I am returning my hashcode as customer No.
And my equality is also based on customer No. and Customer Name
If I run the above code the output will be
D:\Java_Projects>java Customer
Size 4
D:\Java_Projects>
The output shows that 4 objects were added to the set, even though some of them share a customer No.
The reason is that although the customer numbers are the same, the names are different,
and my implementation of 'equals' is based on both customerNo and customerName.
There are 4 different customerNo/customerName combinations, hence 4 objects in the set.
My question is,
Is my above hashCode implementation a bad practice?
What failures can I come across?
What will happen if I create 500,000 Customer objects with the same customer No?
Will all 500,000 customer objects be placed in the same bucket?
Is my above hashCode implementation a bad practice?
Assuming different customers have different customerNo most of the time, this is a good implementation. In a real world application, customerNo would most likely be a unique identifier, with uniqueness guaranteed by a database constraint.
What failures can I come across?
You haven't handled the case where customerNo is null. Here's one way to do that:
public int hashCode() {
    // Requires: import java.util.Objects;
    return Objects.hashCode(customerNo);
}
This will return 0 when customerNo is null.
You have another bug in the equals method: Integer objects should not be compared with ==, because that compares object references rather than values and will give you unexpected results. Also, two customers with customerName set to null are never equal. The Objects.equals method solves both problems.
return Objects.equals(this.customerNo, cus.customerNo)
&& Objects.equals(this.customerName, cus.customerName);
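To illustrate the difference, here is a small sketch comparing boxed Integer values. The class IntegerCompareDemo and the helper sameNumber() are made-up names. The result of == on boxed values depends on whether the JVM happens to reuse the same object, so no output is claimed for that line.

```java
import java.util.Objects;

class IntegerCompareDemo {
    // Null-safe comparison by value.
    static boolean sameNumber(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer a = 100000;
        Integer b = 100000;
        // Compares references, not values; the result is not reliable for boxed numbers.
        System.out.println(a == b);
        System.out.println(sameNumber(a, b));       // true
        System.out.println(sameNumber(null, null)); // true
        System.out.println(sameNumber(a, null));    // false
    }
}
```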
What will happen if I create 500,000 Customer objects with the same customer No?
Will all 500,000 customer objects be placed in the same bucket?
In this scenario, all objects will indeed be placed in the same bucket. Your HashSet is reduced to a linked list data structure, and it will perform poorly: to locate a customer object, the data structure has to compare the given object with every object in the worst case.
If Customer implemented Comparable, the hash table bucket could use a binary search tree instead of a linked list, and the performance would not be impacted as badly.
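The following sketch shows that such a set still behaves correctly and only loses speed. SameBucketKey is a hypothetical class whose hashCode deliberately returns a constant, so every instance collides.

```java
import java.util.HashSet;
import java.util.Set;

class SameBucketKey {
    private final int id;

    SameBucketKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 1; // deliberately poor: every key collides
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SameBucketKey && ((SameBucketKey) o).id == id;
    }
}

class SameBucketDemo {
    static Set<SameBucketKey> fill(int n) {
        Set<SameBucketKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new SameBucketKey(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<SameBucketKey> set = fill(2000);
        System.out.println(set.size());                            // 2000
        System.out.println(set.contains(new SameBucketKey(1999))); // true
        System.out.println(set.contains(new SameBucketKey(2000))); // false
    }
}
```

The answers stay correct because equals has the final say; the constant hash code only forces many more equals calls per lookup.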
There is an implicit contract between equals(...) and hashCode():
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Your implementation satisfies all three constraints. However, best practice is that all attributes that are compared in equals(...) should also influence hashCode(), and vice versa. Otherwise, data structures that use hashCode() (e.g. HashMap and HashSet) may perform sub-optimally. One reason is, as you mentioned, that all objects with the same hashCode are placed in the same bucket, and thus access may no longer have constant time complexity.
This will not, however, result in exceptions being thrown.
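As a sketch of that best practice, the class below derives the hash code from exactly the fields that equals compares. CustomerSketch and CustomerSketchDemo are illustrative names and not the code from the question.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class CustomerSketch {
    private final Integer customerNo;
    private final String customerName;

    CustomerSketch(Integer customerNo, String customerName) {
        this.customerNo = customerNo;
        this.customerName = customerName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerSketch)) return false;
        CustomerSketch other = (CustomerSketch) o;
        return Objects.equals(customerNo, other.customerNo)
                && Objects.equals(customerName, other.customerName);
    }

    @Override
    public int hashCode() {
        // Same fields as equals, and tolerant of null values.
        return Objects.hash(customerNo, customerName);
    }
}

class CustomerSketchDemo {
    static int distinctCount() {
        Set<CustomerSketch> set = new HashSet<>();
        set.add(new CustomerSketch(1, "Jack"));
        set.add(new CustomerSketch(1, "Jack")); // duplicate, ignored
        set.add(new CustomerSketch(1, "Tom"));
        set.add(new CustomerSketch(null, null));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 3
    }
}
```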
I've created a loop which goes through my HashMap. Then I check whether the name of the current key (A) is equal to the key that might be added (B). The hash codes of key A and key B aren't necessarily equal when their names are. Therefore I check whether they are equal by transforming them into a string (with an override of .equals()). The code is working, but there must be a cleaner and easier way to do this.
This is my current code:
for (HashMap.Entry<Identifier, SetInterface<BigInteger>> entry : idAndSet.entrySet()) {
    if (entry.getKey().isEqual(identifier)) {
        factor = entry.getValue();
        return factor;
    }
}
The hash codes of key A and key B aren't necessarily equal when their names are.
That is not a good idea. Any class that acts as a key should override equals and hashCode. And it would be a good idea to make the class immutable as well (otherwise you could end up with some difficult debugging to do).
Once you do that, you can just do
Map<Identifier, Object> map = ...;
Object value = map.get(id);
// or, as of Java 8+
Object valueOrDefault = map.getOrDefault(id, someDefaultValue);
You could misuse the Map.computeIfPresent method.
factor = map.computeIfPresent(identifier, (k,v) -> v);
return factor;
The method returns the value associated with the specified key, or null if there is none.
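To show what the key class might look like, here is a hedged sketch. The question does not show Identifier, so NamedIdentifier below is a hypothetical, immutable stand-in that is identified by its name only, and BigInteger values stand in for the question's SetInterface<BigInteger>.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class NamedIdentifier {
    private final String name; // immutable, so the hash code never changes

    NamedIdentifier(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedIdentifier && ((NamedIdentifier) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class IdentifierLookupDemo {
    static BigInteger lookup(String name) {
        Map<NamedIdentifier, BigInteger> values = new HashMap<>();
        values.put(new NamedIdentifier("x"), BigInteger.TEN);
        // No loop needed: an equal key finds the entry directly.
        return values.get(new NamedIdentifier(name));
    }

    public static void main(String[] args) {
        System.out.println(lookup("x")); // 10
        System.out.println(lookup("y")); // null
    }
}
```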
I know (contract) we need to override hashcode when equals is overridden.
Why should I consider same fields used for equals comparison to compute hashcode?
Is it to improve performance, by avoiding too many objects mapping to same bucket, as in below case?
i.e. all objects created on same "date" would map to same bucket and linear comparison will take time in checking object exists using equals() method?
If my above statement is true, what other potential issues will come with below code other than performance issue. Is that the only reason we should use same fields / members used in equals to compute hashcode? Please share. Thanks.
class MyClass {
    int date;
    int pay;
    int id;

    public boolean equals(Object o) {
        //null and same class instance check
        MyClass obj = (MyClass) o;
        return (date == obj.date && pay == obj.pay && id == obj.id);
    }

    public int hashCode() {
        int hash = 7;
        return (31 * hash + date);
    }
}
//please pardon syntax errors, I typed without using ide.
***my intention is to use all fields in equals, and know why same number of elements should be used in hashcode, and what happens if only few elements are used
Clarification:
Using only "date" to compute the hash code, the lookup still goes to the right bucket (do you agree?). It then gets the list of items in that bucket, and the collection iterates over them, using equals to check whether the particular object exists. My definition of equals is "all fields must be the same". Given this, I believe my code works correctly and the only issue is performance. Please point out where I am wrong. Thank you.
For your example, I suggest you use just id for equality and annotate the methods with @Override. I also like to override toString().
@Override
public boolean equals(Object o) {
    if (o instanceof MyClass) {
        return (id == ((MyClass) o).id);
    }
    return false;
}

@Override
public int hashCode() {
    return id;
}

@Override
public String toString() {
    return String.format("MyClass (id=%d, date=%d, pay=%d)", id, date, pay);
}
That way you can update the date and/or the pay without having to recreate the hash structure. Also, that's what appears to be unique about instances.
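To see why a changing field is a problem when it feeds the hash code, consider this sketch. Shift is a hypothetical class whose equals and hashCode both use a mutable date field.

```java
import java.util.HashSet;
import java.util.Set;

class Shift {
    final int id;
    int date; // mutable, and used by equals and hashCode below

    Shift(int id, int date) {
        this.id = id;
        this.date = date;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Shift && ((Shift) o).id == id && ((Shift) o).date == date;
    }

    @Override
    public int hashCode() {
        return 31 * id + date;
    }
}

class MutationDemo {
    static boolean foundAfterMutation() {
        Set<Shift> set = new HashSet<>();
        Shift shift = new Shift(1, 20240101);
        set.add(shift);
        shift.date = 20240102; // the hash code changes while the object is in the set
        return set.contains(shift);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The object is still stored in the set, but it was filed under its old hash code, so a lookup with the new hash code misses it. Hashing only on a field that never changes, such as id, avoids this.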
I found the answer in Effective Java by Joshua Bloch, 2nd edition, page 49: "Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance." The resulting poor-quality hash function may degrade hash tables' performance.
So my guess was right: multiple objects will map to the same bucket.
Additional information:
http://www.javaranch.com/journal/2002/10/equalhash.html
Since the class members/variables num and data do participate in the
equals method comparison, they should also be involved in the
calculation of the hash code. Though, this is not mandatory. You can
use subset of the variables that participate in the equals method
comparison to improve performance of the hashCode method. Performance
of the hashCode method indeed is very important.
Below is my class. The insertSymbol method is supposed to add an object to the linked list, which is then added to a hash table. But when I print the contents of the hash table, it has duplicate entries. I tried to correct this by using "if(temp.contains(value)){return;}" but it isn't working. I read that I need to use @Override in a couple of places. Could anyone help me understand how and where to use the overrides? Thank you!
import java.util.*;

public class Semantic {
    String currentScope;
    Stack theStack = new Stack();
    HashMap<String, LinkedList> SymbolTable = new HashMap<String, LinkedList>();

    public void insertSymbol(String key, SymbolTableItem value) {
        LinkedList<SymbolTableItem> temp = new LinkedList<SymbolTableItem>();
        if (SymbolTable.get(key) == null) {
            temp.addLast(value);
            SymbolTable.put(key, temp);
        } else {
            temp = SymbolTable.get(key);
            if (temp.contains(value)) {
                return;
            } else {
                temp.addLast(value);
                SymbolTable.put(key, temp);
            }
        }
    }

    public String printValues() {
        return SymbolTable.toString();
    }

    public boolean isBoolean() {
        return true;
    }

    public boolean isTypeMatching() {
        return true;
    }

    public void stackPush(String theString) {
        theStack.add(theString);
    }
}
You have multiple options here. At a minimum, you'll need to add an equals method (and therefore also a hashCode method) to your class.
However, if you want your collection to only contain unique items, why not use a Set instead?
If you still want to use a List, your current approach works; it's just that a Set guarantees that all of its items are unique, so a Set might make sense here.
Adding an equals method is quite easy. Apache Commons Lang's EqualsBuilder is a good way to do it.
You don't need the 2nd line when you add a new value with the same key:
temp.addLast(value);
SymbolTable.put(key, temp); // <-- Not needed. It's already in there.
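As one possible simplification, the whole method can be sketched with Map.computeIfAbsent (Java 8+), which creates the list only when the key is new. SymbolTableSketch is a hypothetical class, and plain Strings stand in for SymbolTableItem here because String already has a proper equals method.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

class SymbolTableSketch {
    private final Map<String, LinkedList<String>> symbolTable = new HashMap<>();

    void insertSymbol(String key, String value) {
        // Fetch the list for this key, creating and storing it on first use.
        LinkedList<String> items = symbolTable.computeIfAbsent(key, k -> new LinkedList<>());
        // contains() relies on equals() of the stored items.
        if (!items.contains(value)) {
            items.addLast(value);
        }
    }

    int count(String key) {
        LinkedList<String> items = symbolTable.get(key);
        return items == null ? 0 : items.size();
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.insertSymbol("x", "int");
        table.insertSymbol("x", "int");   // duplicate, ignored
        table.insertSymbol("x", "float");
        System.out.println(table.count("x")); // 2
    }
}
```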
Let me explain something that @ErikPragt alludes to regarding this code:
if(temp.contains(value)){
What do you suppose that means?
If you look in the javadocs for LinkedList you will find that if a value in the list is non-null, it uses the equals() method on the value object to see if the list element is the same.
What that means, in your case, is that your class SymbolTableItem needs an equals() method that will compare two of these objects to see if they are the same, whatever that means in your case.
Let's assume the instances will be considered the same if the names are the same. You will need a method like this in the `SymbolTableItem` class:
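@Override
public boolean equals(Object that) {
    if (this == that) {
        return true;
    }
    if (!(that instanceof SymbolTableItem)) {
        return false; // also covers that == null
    }
    SymbolTableItem other = (SymbolTableItem) that;
    if (this.getName() == null) {
        return other.getName() == null;
    }
    return this.getName().equals(other.getName());
}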
If it depends on more fields, equals will be correspondingly more complex.
NOTE: One more thing. If you add an equals method to a class, it is good programming practice to add a hashCode() method too. The rule is that if two instances are equal, they must have the same hash code; if they are not equal, they don't have to have different hash codes, but it would be very nice if they did.
If you use your existing code, where only equals is used, you don't strictly need a hashCode method. But if you don't add one, it could be a problem someday. Maybe today.
In the case where the name is all that matters, your hashCode method could just return this.getName().hashCode().
Again, if there are more things to compare to tell whether they are equal, the hashCode method will be more complex.
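Putting the two methods together, a minimal sketch could look like the following. The question does not show SymbolTableItem, so SymbolItemSketch is a hypothetical stand-in with just a name field. java.util.Objects takes care of the null checks.

```java
import java.util.LinkedList;
import java.util.Objects;

class SymbolItemSketch {
    private final String name;

    SymbolItemSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) return true;
        if (!(that instanceof SymbolItemSketch)) return false;
        return Objects.equals(name, ((SymbolItemSketch) that).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name); // 0 when name is null
    }
}

class SymbolItemDemo {
    static boolean containsEqualItem() {
        LinkedList<SymbolItemSketch> items = new LinkedList<>();
        items.addLast(new SymbolItemSketch("counter"));
        // A different instance with the same name is now recognized.
        return items.contains(new SymbolItemSketch("counter"));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualItem()); // true
    }
}
```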
I have a program for my Java class where I want to use HashSets to compare a directory of text documents. Essentially, my plan is to create a HashSet of strings for each paper, then add two of the papers' HashSets together into one HashSet and find the number of identical 6-word sequences.
My question is, do I have to manually check for, and handle, collisions, or does Java do that for me?
Java's hash maps and sets handle hash collisions automatically. This is why it is important to override both the equals and the hashCode methods: sets use both of them to tell duplicate entries from unique ones.
It is also important to note that hash collisions have a performance impact, since multiple objects then share the same hash.
public class MyObject {
    private String name;

    // getters and setters

    @Override
    public int hashCode() {
        // Do some object-specific work to generate the hash code; as one
        // possible choice, derive it from name, the field equals compares.
        return name == null ? 0 : name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj instanceof MyObject) {
            MyObject other = (MyObject) obj;
            return name == null ? other.name == null : name.equals(other.name);
        }
        return false;
    }
}
Note: Standard Java Objects such as String have already implemented hashCode and equals so you only have to do that for your own kind of Data Objects.
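As a small demonstration that collisions are handled for you, the Strings "Aa" and "BB" happen to have the same hash code, yet a HashSet keeps both. CollisionDemo is a made-up class name.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB"); // same hash code as "Aa", but not equal, so it is kept
        set.add("Aa"); // equal to an existing element, so it is ignored
        return set;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(build().size());                     // 2
    }
}
```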
I think you are not really asking about hash collisions, right? The question is what happens when HashSet a and HashSet b are combined into a single set, e.g. by a.addAll(b).
The answer is that a will then contain all elements, with no duplicates. For Strings this means you can count the number of Strings the two sets have in common as (a.size() before the add) + b.size() - (a.size() after the add).
It does not even matter if some of the Strings have the same hash code but are not equal.
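Here is a sketch of that counting approach, next to an alternative based on retainAll. The class and method names are made up, and both methods work on copies so that the original sets are left unchanged.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CommonCountDemo {
    // Size arithmetic: |a| + |b| - |a union b| is the number of shared elements.
    static int countCommon(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return a.size() + b.size() - union.size();
    }

    // Alternative: keep only the elements of a that also occur in b.
    static int countCommonWithRetainAll(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("one two", "two three", "three four"));
        Set<String> b = new HashSet<>(Arrays.asList("two three", "three four", "four five"));
        System.out.println(countCommon(a, b));              // 2
        System.out.println(countCommonWithRetainAll(a, b)); // 2
    }
}
```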