i need to compare two objects in java, it should be tested if their attributes have the same values. Instead of simply compare all the attributes i was thinking about using hash functions. Therefore i have written the following code
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Vector;
public class Test {
private static Vector<String> vecA, vecB;
public static void main(String args[]) {
vecA = new Vector<String>();
vecB = new Vector<String>();
vecA.add("hallo");
vecA.add("blödes Beispiel");
vecA.add("Einer geht noch");
vecB.add("hallo");
vecB.add("blödes Beispiel");
vecB.add("Einer geht noch");
System.out.println("HashCode() VecA: " + vecA.hashCode());
System.out.println("HashCode() VecB: " + vecB.hashCode());
System.out.println("md5 VecA: " + md5(vecA));
System.out.println("md5 VecB: " + md5(vecB));
vecA.add("ungleich");
System.out.println("HashCode() VecA: " + vecA.hashCode());
System.out.println("HashCode() VecB: " + vecB.hashCode());
System.out.println("md5 VecA: " + md5(vecA));
System.out.println("md5 VecB: " + md5(vecB));
}
private static String md5(Vector<String> v){
try {
MessageDigest algorithm = MessageDigest.getInstance("MD5");
algorithm.reset();
algorithm.update(vecA.toString().getBytes());
byte messageDigest[] = algorithm.digest();
StringBuffer hexString = new StringBuffer();
for (int i = 0; i < messageDigest.length; i++) {
String hex = Integer.toHexString(0xFF & messageDigest[i]);
if (hex.length() == 1)
hexString.append('0');
hexString.append(hex);
}
return hexString.toString();
} catch (NoSuchAlgorithmException nsae) {}
return null;
}
}
The md5 function is simply copied by some website. This leads to the following output
HashCode() VecA: -356464767
HashCode() VecB: -356464767
md5 VecA: 6805716958249f5b7f177fc95408713e
md5 VecB: 6805716958249f5b7f177fc95408713e
HashCode() VecA: 1477685990
HashCode() VecB: -356464767
md5 VecA: c76297ce297d5308359ca06f26fb97ca
md5 VecB: c76297ce297d5308359ca06f26fb97ca
I am confused that adding an element in vecA seems to change the md5 Code of vecB and so their hashes are still the same. Whats the reason and is their any advantage in using java.security.MessageDigest or simply hashCode() in that case? How about the performance of hash functions vs. comparisons of all attributes?
You're building the md5 hash for vecA in both cases:
algorithm.update(vecA.toString().getBytes());
should probably be
algorithm.update(v.toString().getBytes());
Whats the reason and is their any advantage in using java.security.MessageDigest or simply hashCode() in that case?
One advantage of not using hashCode would be that if that method is not overridden, two instances of the same class and with the same attribute values would still return different hashes, due to the default implementation of hashCode.
How about the performance of hash functions vs. comparisons of all attributes?
If you just compare the attributes once, there might not be any noticable performance difference (that, however, depends on how you compare the attributes) but in case of repeatedly comparing a number of attributes vs. calculating a hash once and repeatedly comparing the hashes, the latter might be faster.
Edit:
Here's an example to clarify the answer to the first quote.
Consider the following simple piece of code:
static class A {
int x;
public A( int i) {
x = i;
}
}
static class B {
int x;
public B( int i) {
x = i;
}
public int hashCode() {
final int prime = 31;
return prime * x;
}
public boolean equals( Object obj ) {
//by contract you should always override equals and hashCode together
//also note that some checks are omitted for simplicity's sake (obj might be null etc.)
return getClass().equals( obj.getClass()) && x == ((B)obj).x;
}
}
As you can see, A doesn't override hashCode while B does. Thus you'll get the following result:
System.out.println(new A( 500 ).hashCode() == new A(500).hashCode()); //false
System.out.println(new B( 500 ).hashCode() == new B(500).hashCode()); //true
Note that x is the same in both cases but for A#hashCode() the object identity is used, not the value of x as does B#hashCode()
There is a problem in your md5 method
algorithm.update(vecA.toString().getBytes());
Probably should be
algorithm.update(v.toString().getBytes());
Related
I am loading data on network traffic from a file. The information I'm loading is attacker IP address, victim IP address, and date. I've combined these data into a Traffic object, for which I've defined the hashCode and equals functions. Despite this, the HashMap I'm loading them into treats identical Traffic objects as different keys. The entire Traffic object complete with some simple test code in the main method follows:
import java.util.HashMap;
public class Traffic {
public String attacker;
public String victim;
public int date;
//constructors, getters and setters
#Override
public int hashCode() {
long attackerHash = 1;
for (char c:attacker.toCharArray()) {
attackerHash = attackerHash * Character.getNumericValue(c) + 17;
}
long victimHash = 1;
for (char c:victim.toCharArray()) {
victimHash = victimHash * Character.getNumericValue(c) + 17;
}
int IPHash = (int)(attackerHash*victimHash % Integer.MAX_VALUE);
return (IPHash + 7)*(date + 37) + 17;
}
public boolean equals(Traffic t) {
return this.attacker.equals(t.getAttacker()) && this.victim.equals(t.getVictim()) && this.date == t.getDate();
}
public static void main(String[] args) {
Traffic a = new Traffic("209.167.099.071", "172.016.112.100", 7);
Traffic b = new Traffic("209.167.099.071", "172.016.112.100", 7);
System.out.println(a.hashCode());
System.out.println(b.hashCode());
HashMap<Traffic, Integer> h = new HashMap<Traffic, Integer>();
h.put(a, new Integer(1));
h.put(b, new Integer(2));
System.out.println(h);
}
}
I can't speak to the strength of my hash method, but the outputs of the first two prints are identical, meaning it at least holds for this case.
Since a and b are identical in data (and therefore equals returns true), and the hashes are identical, the HashMap should recognize them as the same and update the value from 1 to 2 instead of creating a second entry with value 2. Unfortunately, it does not recognize them as the same and the output of the final print is the following:
{packagename.Traffic#1c051=1, packagename.Traffic#1c051=2}
My best guess at this is that HashMap's internal workings are ignoring my custom hashCode and equals methods, but if that's the case then why? And if that guess is wrong then what is happening here?
The problem here is your equals method, which does not override Object#equals. To prove this, the following will not compile with the #Override annotation:
#Override
public boolean equals(Traffic t) {
return this.attacker.equals(t.getAttacker()) &&
this.victim.equals(t.getVictim()) &&
this.date == t.getDate();
}
The implementation of HashMap uses Object#equals and not your custom implementation. Your equals method should accept an Object as a parameter instead:
#Override
public boolean equals(Object o) {
if (!(o instanceof Traffic)) {
return false;
}
Traffic t = (Traffic) o;
return Objects.equals(attacker, t.attacker) &&
Objects.equals(victim, t.victim) &&
date == t.date;
}
Context
Hi, I'm working on an assignment for school that asks us to implement a hash table in Java. There are no requirements that collisions be kept to a minimum, but low collision rate and speed seem to be the two most sought-after qualities in all the reading (some more) that I've done.
Problem
I'd like some guidance on how to map the output of a hash function to a smaller range, without having >20% of my keys collide (yikes).
In all of the algorithms that I've explored, keys are mapped to the entire range of an unsigned 32 bit integer (or in many cases, 64, even 128 bit). I'm not finding much about this on here, Wikipedia, or in any of the hash-related articles / discussions I've come across.
In terms of the specifics of my implementation, I'm working in Java (mandate of my school), which is problematic since there are no unsigned types to work with. To get around this, I've been using the 64-bit long integer type, then using a bit mask to map back down to 32 bits. Instead of simply truncating, I XOR the top 32 bits with the bottom 32, then perform a bitwise AND to mask out any upper bits that might result in a negative value when I cast it down to a 32 bit integer. After all that, a separate function compresses the resulting hash value down to fit into the bounds of the hash table's inner array.
It ends up looking like:
int hash( String key ) {
long h;
for( int i = 0; i < key.length(); i++ )
//do some stuff with each character in the key
h = h ^ ( h << 32 );
return h & 2147483647;
}
Where the inner-loop depends on the hash function (I've implemented a few: polynomial hashing, FNV1, SuperFastHash, and a custom one tailored to the input data).
They basically all perform horribly. I have yet to see <20% keys collide. Even before I compress the hash values down to array indices, none of my hash functions will get me less thank 10k collisions. My inputs are two text files, each ~220,000 lines. One is English words, the other is random strings of varying length.
My lecture notes recommend the following, for compressing the hashed keys:
(hashed key) % P
Where P is the largest prime < the size of the inner array.
Is this an accepted method of compressing hash values? I have a feeling it isn't, but since performance is so poor even before compression, I have a feeling it's not the primary culprit, either.
I don´t know if I understand well your concrete problem, but I will try to help in hash performance and collisions.
The hash based objects will determine in which bucket they will store the key-value pair based on hash value. Inside each bucket there is a structure (In HashMap case a LinkedList) in where the pair is stored.
If the hash value is usually the same, the bucket will be usually the same so the performance will degrade a lot, let´s see an example:
Consider this class
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
#Override
public int hashCode() {
return 0;
}
#Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that hashCode in class MyKey returns always '0' as hash. It is ok with the hashcode definition (http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()). If we run that program, this is the result
100000
tiempo: 62866 mls
Is a very poor performance, now we are going to change the MyKey hashcode code:
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
#Override
public int hashCode() {
return str.hashCode() * 31;
}
#Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that only hashcode in MyKey has changed, now when we run the code te result is
100000
tiempo: 47 mls
There is an incredible better performance now with a minor change. Is a very common practice return the hashcode multiplied by a prime number (in this case 31), using the same hashcode members that you use inside equals method in order to determine if two objects are the same (in this case only str).
I hope that this little example can you point out a solution for your problem.
I'm looking for a hash function to hash Strings. For my purposes (identifying changed objects during an import) it should have the following properties:
fast
can be used incremental, i.e. I can use it like this:
Hasher h = new Hasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
Long hash = h.create();
without compromising the other properties or keeping the strings in memory during the complete process.
Secure against collisions. If I compare two hash values from different strings 1 million times per day for the rest of my life, the risk that I get a collision should be neglectable.
It does not have to be secure against malicious attempts to create collisions.
What algorithm can I use? An algorithm with an existent free implementation in Java is preferred.
Clarification
The hash doesn't have to be a long. A String for example would be just fine.
The data to be hashed will come from a file or a db, with many 10MB or up to a few GB of data, that will get distributed into different Hashes. So keeping the complete Strings in memory is not really an option.
Hashs are a sensible topic and it is hard to recommend any such hash based upon your question. You might want to ask this question on https://security.stackexchange.com/ to get expert opinions on the usability of hashs in certain usecases.
What I understood so far is that most hashs are implemented incrementally in the very core; the execution-timing on the other hand is not that easy to predict.
I present you two Hasher implementations which rely on "an existent free implementation in Java". Both implementations are constructed in a way that you can arbitrarily split your Strings before calling add() and get the same result as long as you do not change the order of the characters in them:
import java.math.BigInteger;
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
/**
* Created for https://stackoverflow.com/q/26928529/1266906.
*/
public class Hashs {
public static class JavaHasher {
private int hashCode;
public JavaHasher() {
hashCode = 0;
}
public void add(String value) {
hashCode = 31 * hashCode + value.hashCode();
}
public int create() {
return hashCode;
}
}
public static class ShaHasher {
public static final Charset UTF_8 = Charset.forName("UTF-8");
private final MessageDigest messageDigest;
public ShaHasher() throws NoSuchAlgorithmException {
messageDigest = MessageDigest.getInstance("SHA-256");
}
public void add(String value) {
messageDigest.update(value.getBytes(UTF_8));
}
public byte[] create() {
return messageDigest.digest();
}
}
public static void main(String[] args) {
javaHash();
try {
shaHash();
} catch (NoSuchAlgorithmException e) {
e.printStackTrace(); // TODO: implement catch
}
}
private static void javaHash() {
JavaHasher h = new JavaHasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
int hash = h.create();
System.out.println(hash);
}
private static void shaHash() throws NoSuchAlgorithmException {
ShaHasher h = new ShaHasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
byte[] hash = h.create();
System.out.println(Arrays.toString(hash));
System.out.println(new BigInteger(1, hash));
}
}
Here obviously "SHA-256" could be replaced with other common hash-algorithms; Java ships quite a few of them.
Now you called out for a Long as return-value which would imply you are looking for a 64bit-Hash. If this really was on purpose have a look at the answers to What is a good 64bit hash function in Java for textual strings?. The accepted answer is a slight variant of the JavaHasher as String.hashCode() does basically the same calculation, but with lower overflow-boundary:
public static class Java64Hasher {
private long hashCode;
public Java64Hasher() {
hashCode = 1125899906842597L;
}
public void add(CharSequence value) {
final int len = value.length();
for(int i = 0; i < len; i++) {
hashCode = 31*hashCode + value.charAt(i);
}
}
public long create() {
return hashCode;
}
}
Unto your points:
fast
With SHA-256 being slower than the other two I still would call all three presented approaches fast.
can be used incremental without compromising the other properties or keeping the strings in memory during the complete process.
I can not guarantee that property for the ShaHasher as I understand it is block-based and I lack the source code.Still I would suggest that at most one block, the hash and some internal states are kept. The other two obviously only store the partial hash between calls to add()
Secure against collisions. If I compare two hash values from different strings 1 million times per day for the rest of my life, the risk that I get a collision should be neglectable.
For every hash there are collisions. Given a good distribution the bit-size of the hash is the main factor on how often a collision happens. The JavaHasher is used in e.g. HashMaps and seems to be "collision-free" enough to distribute similar keys far apart each other. As for any deeper analysis: do your own tests or ask your local security engineer - sorry.
I hope this gives a good starting point, details are probably mainly opinion-based.
Not intended as an answer, just to demonstrate that hash collisions are much more likely than human intuition tends to assume.
The following tiny program generates 2^31 distinct strings and checks if any of their hashes collide. It does this by keeping a tracking bit per possible hash value (so you need >512MB heap to run it), to mark each hash value as "used" as they are encountered. It takes several minutes to complete.
public class TestStringHashCollisions {
public static void main(String[] argv) {
long collisions = 0;
long testcount = 0;
StringBuilder b = new StringBuilder(64);
for (int i=0; i>=0; ++i) {
// construct distinct string
b.setLength(0);
b.append("www.");
b.append(Integer.toBinaryString(i));
b.append(".com");
// check for hash collision
String s = b.toString();
++testcount;
if (isColliding(s.hashCode()))
++collisions;
// progress printing
if ((i & 0xFFFFFF) == 0) {
System.out.println("Tested: " + testcount + ", Collisions: " + collisions);
}
}
System.out.println("Tested: " + testcount + ", Collisions: " + collisions);
System.out.println("Collision ratio: " + (collisions / (double) testcount));
}
// storage for 2^32 bits in 2^27 ints
static int[] bitSet = new int[1 << 27];
// test if hash code has appeared before, mark hash as "used"
static boolean isColliding(int hash) {
int index = hash >>> 5;
int bitMask = 1 << (hash & 31);
if ((bitSet[index] & bitMask) != 0)
return true;
bitSet[index] |= bitMask;
return false;
}
}
You can adjust the string generation part easily to test different patterns.
I was asked this in interview. using Google Guava or MultiMap is not an option.
I have a class
public class Alpha
{
String company;
int local;
String title;
}
I have many instances of this class (in order of millions). I need to process them and at the end find the unique ones and their duplicates.
e.g.
instance --> instance1, instance5, instance7 (instance1 has instance5 and instance7 as duplicates)
instance2 --> instance2 (no duplicates for instance 2)
My code works fine
declare datastructure
HashMap<Alpha,ArrayList<Alpha>> hashmap = new HashMap<Alpha,ArrayList<Alpha>>();
Add instances
for (Alpha x : arr)
{
ArrayList<Alpha> list = hashmap.get(x); ///<<<<---- doubt about this. comment#1
if (list == null)
{
list = new ArrayList<Alpha>();
hashmap.put(x, list);
}
list.add(x);
}
Print instances and their duplicates.
for (Alpha x : hashmap.keySet())
{
ArrayList<Alpha> list = hashmap.get(x); //<<< doubt about this. comment#2
System.out.println(x + "<---->");
for(Alpha y : list)
{
System.out.print(y);
}
System.out.println();
}
Question: My code works, but why? when I do hashmap.get(x); (comment#1 in code). it is possible that two different instances might have same hashcode. In that case, I will add 2 different objects to the same List.
When I retrieve, I should get a List which has 2 different instances. (comment#2) and when I iterate over the list, I should see at least one instance which is not duplicate of the key but still exists in the list. I don't. Why?. I tried returning constant value from my hashCode function, it works fine.
If you want to see my implementation of equals and hashCode,let me know.
Bonus question: Any way to optimize it?
Edit:
#Override
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal()==this.getLocal()
&& guest.getCompany() == this.getCompany()
&& guest.getTitle() == this.getTitle();
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (title==null?0:title.hashCode());
result = prime * result + local;
result = prime * result + (company==null?0:company.hashCode());
return result;
}
it is possible that two different instances might have same hashcode
Yes, but hashCode method is used to identify the index to store the element. Two or more keys could have the same hashCode but that's why they are also evaluated using equals.
From Map#containsKey javadoc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)). (There can be at most one such mapping.)
Some enhancements to your current code:
Code oriented to interfaces. Use Map and instantiate it by HashMap. Similar to List and ArrayList.
Compare Strings and Objects in general using equals method. == compares references, equals compares the data stored in the Object depending the implementation of this method. So, change the code in Alpha#equals:
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal().equals(this.getLocal())
&& guest.getCompany().equals(this.getCompany())
&& guest.getTitle().equals(this.getTitle());
}
When navigating through all the elements of a map in pairs, use Map#entrySet instead, you can save the time used by Map#get (since it is supposed to be O(1) you won't save that much but it is better):
for (Map.Entry<Alpha, List<Alpha>> entry : hashmap.keySet()) {
List<Alpha> list = entry.getValuee();
System.out.println(entry.getKey() + "<---->");
for(Alpha y : list) {
System.out.print(y);
}
System.out.println();
}
Use equals along with hashCode to solve the collision state.
Steps:
First compare on the basis of title in hashCode()
If the title is same then look into equals() based on company name to resolve the collision state.
Sample code
class Alpha {
String company;
int local;
String title;
public Alpha(String company, int local, String title) {
this.company = company;
this.local = local;
this.title = title;
}
#Override
public int hashCode() {
return title.hashCode();
}
#Override
public boolean equals(Object obj) {
if (obj instanceof Alpha) {
return this.company.equals(((Alpha) obj).company);
}
return false;
}
}
...
Map<Alpha, ArrayList<Alpha>> hashmap = new HashMap<Alpha, ArrayList<Alpha>>();
hashmap.put(new Alpha("a", 1, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("b", 2, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("a", 3, "t1"), new ArrayList<Alpha>());
System.out.println("Size : "+hashmap.size());
Output
Size : 2
We are storing a String key in a HashMap that is a concatenation of three String fields and a boolean field. Problem is duplicate keys can be created if the delimiter appears in the field value.
So to get around this, based on advice in another post, I'm planning on creating a key class which will be used as the HashMap key:
class TheKey {
public final String k1;
public final String k2;
public final String k3;
public final boolean k4;
public TheKey(String k1, String k2, String k3, boolean k4) {
this.k1 = k1; this.k2 = k2; this.k3 = k3; this.k4 = k4;
}
public boolean equals(Object o) {
TheKey other = (TheKey) o;
//return true if all four fields are equal
}
public int hashCode() {
return ???;
}
}
My questions are:
What value should be returned from hashCode(). The map will hold a total of about 30 values. Of those 30, there are about 10 distinct values of k1 (some entries share the same k1 value).
To store this key class as the HashMap key, does one only need to override the equals() and hashCode() methods? Is anything else required?
Just hashCode and equals should be fine. The hashCode could look something like this:
public int hashCode() {
int hash = 17;
hash = hash * 31 + k1.hashCode();
hash = hash * 31 + k2.hashCode();
hash = hash * 31 + k3.hashCode();
hash = hash * 31 + k4 ? 0 : 1;
return hash;
}
That's assuming none of the keys can be null, of course. Typically you could use 0 as the "logical" hash code for a null reference in the above equation. Two useful methods for compound equality/hash code which needs to deal with nulls:
public static boolean equals(Object o1, Object o2) {
if (o1 == o2) {
return true;
}
if (o1 == null || o2 == null) {
return false;
}
return o1.equals(o2);
}
public static boolean hashCode(Object o) {
return o == null ? 0 : o.hashCode();
}
Using the latter method in the hash algorithm at the start of this answer, you'd end up with something like:
public int hashCode() {
int hash = 17;
hash = hash * 31 + ObjectUtil.hashCode(k1);
hash = hash * 31 + ObjectUtil.hashCode(k2);
hash = hash * 31 + ObjectUtil.hashCode(k3);
hash = hash * 31 + k4 ? 0 : 1;
return hash;
}
In Eclipse you can generate hashCode and equals by Alt-Shift-S h.
Ask Eclipse 3.5 to create the hashcode and equals methods for you :)
this is how a well-formed equals class with equals ans hashCode should look like: (generated with intellij idea, with null checks enabled)
class TheKey {
public final String k1;
public final String k2;
public final String k3;
public final boolean k4;
public TheKey(String k1, String k2, String k3, boolean k4) {
this.k1 = k1;
this.k2 = k2;
this.k3 = k3;
this.k4 = k4;
}
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
TheKey theKey = (TheKey) o;
if (k4 != theKey.k4) return false;
if (k1 != null ? !k1.equals(theKey.k1) : theKey.k1 != null) return false;
if (k2 != null ? !k2.equals(theKey.k2) : theKey.k2 != null) return false;
if (k3 != null ? !k3.equals(theKey.k3) : theKey.k3 != null) return false;
return true;
}
#Override
public int hashCode() {
int result = k1 != null ? k1.hashCode() : 0;
result = 31 * result + (k2 != null ? k2.hashCode() : 0);
result = 31 * result + (k3 != null ? k3.hashCode() : 0);
result = 31 * result + (k4 ? 1 : 0);
return result;
}
}
The implementation of your hashCode() doesn't matter much unless you make it super stupid. You could very well just return the sum of all the strings hash codes (truncated to an int) but you should make sure you fix this:
If your hash code implementation is slow, consider caching it in the instance. Depending on how long your key objects stick around and how they are used with the hash table when you get things out of it you may not want to spend longer than necessary calculating the same value over and over again. If you stick with Jon's implementation of hashCode() there is probably no need for it as String already cache its hashCode() for you.
This is however more of a general advice, since the mid 90's I've seen quite a few developers get stung on slow (and even worse, changing) hashCode() implementations.
Don't be sloppy when you create the equals() implementation. Your equals() above will be both ineffective and flawed. First of all you don't need to compare the values if the objects have different hash codes. You should also return false (and not a null pointer exception) if you get a null as the argument.
The rules are simple, this page will walk you through them.
Edit:
I have to ask one more thing... You say "Problem is duplicate keys can be created if the delimiter appears in the field value". Why is that?
If the format is key+delimiter+key+delimiter+key it really doesn't matter if there are one or more delimiters in the keys unless you get really unlucky with a combination of two keys and in that case you probably should have selected another delimiter (there are quite a few to choose from in unicode).
Anyway, Jon is right in his comment below... Don't do caching "until you've proven it's a good thing". It is a good practice always.
Have you taken a look at the specifications of hashCode()? Perhaps this will give you a better idea of what the function should return.
I do not know if this is an option for you but apache commons library provides an implementation for MultiKeyMap
For the hashCode, you could instead use something like
k1.hashCode() ^ k2.hashCode() ^ k3.hashCode() ^ k4.hashCode()
XOR is entropy-preserving, and this incorporates k4's hashCode in a much better way than the previous suggestions. Just having one bit of information from k4 means that if all your composite keys have identical k1, k2, k3 and only differing k4s, your hash codes will all be identical and you'll get a degenerate HashMap.
I thought your main concern was speed (based on your original post)? Why don't you just make sure you use a separator which does not occur in your (handfull of) field values? Then you can just create String key using concatenation and do away with all this 'key-class' hocus pocus. Smells like serious over-engineering to me.