Why is my HashMap implementation 10 times slower than the JDK's?

Why is my HashMap implementation 10 times slower than the JDK's? - java

I would like to know what makes the difference, what should i aware of when im writing code.
Used the same parameters and methods put(), get() when testing
without printing
Used System.NanoTime() to test runtime
I tried it with 1-10 int keys with 10 values, so every single hash returns unique index, which is the most optimal scenario
My HashSet implementation which is based on this is almost as fast as the JDK's
Here's my simple implementation:
public MyHashMap(int s) {
this.TABLE_SIZE=s;
table = new HashEntry[s];
}
class HashEntry {
int key;
String value;
public HashEntry(int k, String v) {
this.key=k;
this.value=v;
}
public int getKey() {
return key;
}
}
int TABLE_SIZE;
HashEntry[] table;
public void put(int key, String value) {
int hash = key % TABLE_SIZE;
while(table[hash] != null && table[hash].getKey() != key)
hash = (hash +1) % TABLE_SIZE;
table[hash] = new HashEntry(key, value);
}
public String get(int key) {
int hash = key % TABLE_SIZE;
while(table[hash] != null && table[hash].key != key)
hash = (hash+1) % TABLE_SIZE;
if(table[hash] == null)
return null;
else
return table[hash].value;
}
Here's the benchmark:
public static void main(String[] args) {
long start = System.nanoTime();
MyHashMap map = new MyHashMap(11);
map.put(1,"A");
map.put(2,"B");
map.put(3,"C");
map.put(4,"D");
map.put(5,"E");
map.put(6,"F");
map.put(7,"G");
map.put(8,"H");
map.put(9,"I");
map.put(10,"J");
map.get(1);
map.get(2);
map.get(3);
map.get(4);
map.get(5);
map.get(6);
map.get(7);
map.get(8);
map.get(9);
map.get(10);
long end = System.nanoTime();
System.out.println(end-start+" ns");
}

If you read the documentation of the HashMap class, you see that it implements a hash table implementation based on the hashCode of the keys. This is dramatically more efficient than a brute-force search if the map contains a non-trivial number of entries, assuming reasonable key distribution amongst the "buckets" that it sorts the entries into.
That said, benchmarking the JVM is non-trivial and easy to get wrong, if you're seeing big differences with small numbers of entries, it could easily be a benchmarking error rather than the code.

When it is up to performance, never assume something.
Your assumption was "My HashSet implementation which is based on this is almost as fast as the JDK's". No, obviously it is not.
That is the tricky part when doing performance work: doubt everything unless you have measured with great accuracy. Worse, you even measured, and the measurement told you that your implementation is slower; and instead of checking your source, and the source of the thing you are measuring against; you decided that the measuring process must be wrong ...

Related

Get least accessed element in a List (Java 8)

I am trying to create an API wrapper.
This API requires an API key, like most do. My goal is to spread out the usage as evenly as possible between a list of API keys. This is needed to reduce the possibility of rate limiting.
Needs:
Immutable List
A solution I could think of is to somehow get the least accessed element maybe with an object that keeps track of only the uses and the actual data? And then sort it and get the first element?
class Key {
private int uses;
private UUID key;
public Key(UUID key) {
this.key = key;
this.uses = 0;
}
public UUID get() {
this.uses++;
return this.key;
}
public int getUses() {
return this.uses;
}
}
I am up for using maven libraries such as Google Guava (which I am already using) if needed or for a more elegant solution. Here is an example of what it might look like.
List<UUID> keys = new ArrayList<>();
public Data getDataFromApi(String name) {
return getData(ENDPOINT_URL_STR + "key=" + keys.getLeastAccessed().toString() + "&name=" + name);
}

Given that the set of keys would be immutable, I suggest implementing round-robin, i.e. you use 1st key, then 2nd, 3rd and so on until you reach nth and then you start over again from 1st key.
This way difference between usages of any 2 keys would be <= 1

I tend to use Apache's LRUMap which removes the least recently used entry if an entry is added when full.
It sounds like what you are looking for. Documentation is here

If there is a list of mentioned Key instances, Stream API would be sufficient to get the least used elements using Stream::filter:
public static List<Key> getLeastUsedKeys(List<Key> keys) {
if (null == keys || keys.size() < 2) {
return keys;
}
int minUsage = keys.stream().mapToInt(Key::getUses).min().getAsInt();
return keys.stream().filter(k -> minUsage == k.getUses()).collect(Collectors.toList());
}
If a single key is needed, Collectors.minBy may be used:
public static Key getLeastUsedKey(List<Key> keys) {
if (null == keys || keys.isEmpty()) {
return null;
}
return keys.stream()
.collect(Collectors.minBy(Comparator.comparingInt(Key::getUses)))
.orElse(null);
}

Direct Recursion vs While Loop for time complexity performance

I was wondering how time complexity compares between these two methods. I have written the first findEmpty function and a friend wrote the 2nd. Both more or less achieve the same thing, however, I'm unsure which exactly computes faster (if at all) and why?
these examples come from an implementation of a hashtable class we've been working on. This function finds the next empty location in the array after the given parameters and returns it. Data is stored in the array "arr" as a Pair object containing a key and a value.
I believe this would run at O(1):
private int findEmpty(int startPos, int stepNum, String key) {
if (arr[startPos] == null || ((Pair) arr[startPos]).key.equals(key)) {
return startPos;
} else {
return findEmpty(getNextLocation(startPos), stepNum++, key);
}
}
I believe this would run at O(n):
private int findEmpty(int startPos, int stepNum, String key) {
while (arr[startPos] != null) {
if (((Pair) arr[startPos]).key.equals(key)) {
return startPos;
}
startPos = getNextLocation(startPos);
}
return startPos;
}
here is the code for the Pair object and getNextLocation:
private class Pair {
private String key;
private V value;
public Pair(String key, V value) {
this.key = key;
this.value = value;
}
}
private int getNextLocation(int startPos) {
int step = startPos;
step++;
return step % arr.length;
}
I expect my understanding is off and probably haven't approached this question as concisely as possible, but I appreciate and welcome any corrections.

Your solution has the same time complexity as your friend's. Both are linear to the length of your array. recursion did not reduce your time complexity to O(1), as it keeps calling getNextLocation until it finds the key.
And also in your function, getNextLocation
private int getNextLocation(int startPos, int stepNum) {
int step = startPos;
step++;
return step % arr.length;
}
the second parameter stepNum is never used in this function, and it should be cleared from all your functions to make it easier to read and understand. please write concise and clean code from the beginning.

Mapping hash values to a range, with minimal collisions

Context
Hi, I'm working on an assignment for school that asks us to implement a hash table in Java. There are no requirements that collisions be kept to a minimum, but low collision rate and speed seem to be the two most sought-after qualities in all the reading (some more) that I've done.
Problem
I'd like some guidance on how to map the output of a hash function to a smaller range, without having >20% of my keys collide (yikes).
In all of the algorithms that I've explored, keys are mapped to the entire range of an unsigned 32 bit integer (or in many cases, 64, even 128 bit). I'm not finding much about this on here, Wikipedia, or in any of the hash-related articles / discussions I've come across.
In terms of the specifics of my implementation, I'm working in Java (mandate of my school), which is problematic since there are no unsigned types to work with. To get around this, I've been using the 64-bit long integer type, then using a bit mask to map back down to 32 bits. Instead of simply truncating, I XOR the top 32 bits with the bottom 32, then perform a bitwise AND to mask out any upper bits that might result in a negative value when I cast it down to a 32 bit integer. After all that, a separate function compresses the resulting hash value down to fit into the bounds of the hash table's inner array.
It ends up looking like:
int hash( String key ) {
long h;
for( int i = 0; i < key.length(); i++ )
//do some stuff with each character in the key
h = h ^ ( h << 32 );
return h & 2147483647;
}
Where the inner-loop depends on the hash function (I've implemented a few: polynomial hashing, FNV1, SuperFastHash, and a custom one tailored to the input data).
They basically all perform horribly. I have yet to see <20% keys collide. Even before I compress the hash values down to array indices, none of my hash functions will get me less thank 10k collisions. My inputs are two text files, each ~220,000 lines. One is English words, the other is random strings of varying length.
My lecture notes recommend the following, for compressing the hashed keys:
(hashed key) % P
Where P is the largest prime < the size of the inner array.
Is this an accepted method of compressing hash values? I have a feeling it isn't, but since performance is so poor even before compression, I have a feeling it's not the primary culprit, either.

I don´t know if I understand well your concrete problem, but I will try to help in hash performance and collisions.
The hash based objects will determine in which bucket they will store the key-value pair based on hash value. Inside each bucket there is a structure (In HashMap case a LinkedList) in where the pair is stored.
If the hash value is usually the same, the bucket will be usually the same so the performance will degrade a lot, let´s see an example:
Consider this class
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
#Override
public int hashCode() {
return 0;
}
#Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that hashCode in class MyKey returns always '0' as hash. It is ok with the hashcode definition (http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()). If we run that program, this is the result
100000
tiempo: 62866 mls
Is a very poor performance, now we are going to change the MyKey hashcode code:
package hashTest;
import java.util.Hashtable;
public class HashTest {
public static void main (String[] args) {
Hashtable<MyKey, String> hm = new Hashtable<>();
long ini = System.currentTimeMillis();
for (int i=0; i<100000; i++) {
MyKey a = new HashTest().new MyKey(String.valueOf(i));
hm.put(a, String.valueOf(i));
}
System.out.println(hm.size());
long fin = System.currentTimeMillis();
System.out.println("tiempo: " + (fin-ini) + " mls");
}
private class MyKey {
private String str;
public MyKey(String i) {
str = i;
}
public String getStr() {
return str;
}
#Override
public int hashCode() {
return str.hashCode() * 31;
}
#Override
public boolean equals(Object o) {
if (o instanceof MyKey) {
MyKey aux = (MyKey) o;
if (this.str.equals(aux.getStr())) {
return true;
}
}
return false;
}
}
}
Note that only hashcode in MyKey has changed, now when we run the code te result is
100000
tiempo: 47 mls
There is an incredible better performance now with a minor change. Is a very common practice return the hashcode multiplied by a prime number (in this case 31), using the same hashcode members that you use inside equals method in order to determine if two objects are the same (in this case only str).
I hope that this little example can you point out a solution for your problem.

Fast Incremental Hash in Java

I'm looking for a hash function to hash Strings. For my purposes (identifying changed objects during an import) it should have the following properties:
fast
can be used incremental, i.e. I can use it like this:
Hasher h = new Hasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
Long hash = h.create();
without compromising the other properties or keeping the strings in memory during the complete process.
Secure against collisions. If I compare two hash values from different strings 1 million times per day for the rest of my life, the risk that I get a collision should be neglectable.
It does not have to be secure against malicious attempts to create collisions.
What algorithm can I use? An algorithm with an existent free implementation in Java is preferred.
Clarification
The hash doesn't have to be a long. A String for example would be just fine.
The data to be hashed will come from a file or a db, with many 10MB or up to a few GB of data, that will get distributed into different Hashes. So keeping the complete Strings in memory is not really an option.

Hashs are a sensible topic and it is hard to recommend any such hash based upon your question. You might want to ask this question on https://security.stackexchange.com/ to get expert opinions on the usability of hashs in certain usecases.
What I understood so far is that most hashs are implemented incrementally in the very core; the execution-timing on the other hand is not that easy to predict.
I present you two Hasher implementations which rely on "an existent free implementation in Java". Both implementations are constructed in a way that you can arbitrarily split your Strings before calling add() and get the same result as long as you do not change the order of the characters in them:
import java.math.BigInteger;
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
/**
* Created for https://stackoverflow.com/q/26928529/1266906.
*/
public class Hashs {
public static class JavaHasher {
private int hashCode;
public JavaHasher() {
hashCode = 0;
}
public void add(String value) {
hashCode = 31 * hashCode + value.hashCode();
}
public int create() {
return hashCode;
}
}
public static class ShaHasher {
public static final Charset UTF_8 = Charset.forName("UTF-8");
private final MessageDigest messageDigest;
public ShaHasher() throws NoSuchAlgorithmException {
messageDigest = MessageDigest.getInstance("SHA-256");
}
public void add(String value) {
messageDigest.update(value.getBytes(UTF_8));
}
public byte[] create() {
return messageDigest.digest();
}
}
public static void main(String[] args) {
javaHash();
try {
shaHash();
} catch (NoSuchAlgorithmException e) {
e.printStackTrace(); // TODO: implement catch
}
}
private static void javaHash() {
JavaHasher h = new JavaHasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
int hash = h.create();
System.out.println(hash);
}
private static void shaHash() throws NoSuchAlgorithmException {
ShaHasher h = new ShaHasher();
h.add("somestring");
h.add("another part");
h.add("eveno more");
byte[] hash = h.create();
System.out.println(Arrays.toString(hash));
System.out.println(new BigInteger(1, hash));
}
}
Here obviously "SHA-256" could be replaced with other common hash-algorithms; Java ships quite a few of them.
Now you called out for a Long as return-value which would imply you are looking for a 64bit-Hash. If this really was on purpose have a look at the answers to What is a good 64bit hash function in Java for textual strings?. The accepted answer is a slight variant of the JavaHasher as String.hashCode() does basically the same calculation, but with lower overflow-boundary:
public static class Java64Hasher {
private long hashCode;
public Java64Hasher() {
hashCode = 1125899906842597L;
}
public void add(CharSequence value) {
final int len = value.length();
for(int i = 0; i < len; i++) {
hashCode = 31*hashCode + value.charAt(i);
}
}
public long create() {
return hashCode;
}
}
Unto your points:
fast
With SHA-256 being slower than the other two I still would call all three presented approaches fast.
can be used incremental without compromising the other properties or keeping the strings in memory during the complete process.
I can not guarantee that property for the ShaHasher as I understand it is block-based and I lack the source code.Still I would suggest that at most one block, the hash and some internal states are kept. The other two obviously only store the partial hash between calls to add()
Secure against collisions. If I compare two hash values from different strings 1 million times per day for the rest of my life, the risk that I get a collision should be neglectable.
For every hash there are collisions. Given a good distribution the bit-size of the hash is the main factor on how often a collision happens. The JavaHasher is used in e.g. HashMaps and seems to be "collision-free" enough to distribute similar keys far apart each other. As for any deeper analysis: do your own tests or ask your local security engineer - sorry.
I hope this gives a good starting point, details are probably mainly opinion-based.

Not intended as an answer, just to demonstrate that hash collisions are much more likely than human intuition tends to assume.
The following tiny program generates 2^31 distinct strings and checks if any of their hashes collide. It does this by keeping a tracking bit per possible hash value (so you need >512MB heap to run it), to mark each hash value as "used" as they are encountered. It takes several minutes to complete.
public class TestStringHashCollisions {
public static void main(String[] argv) {
long collisions = 0;
long testcount = 0;
StringBuilder b = new StringBuilder(64);
for (int i=0; i>=0; ++i) {
// construct distinct string
b.setLength(0);
b.append("www.");
b.append(Integer.toBinaryString(i));
b.append(".com");
// check for hash collision
String s = b.toString();
++testcount;
if (isColliding(s.hashCode()))
++collisions;
// progress printing
if ((i & 0xFFFFFF) == 0) {
System.out.println("Tested: " + testcount + ", Collisions: " + collisions);
}
}
System.out.println("Tested: " + testcount + ", Collisions: " + collisions);
System.out.println("Collision ratio: " + (collisions / (double) testcount));
}
// storage for 2^32 bits in 2^27 ints
static int[] bitSet = new int[1 << 27];
// test if hash code has appeared before, mark hash as "used"
static boolean isColliding(int hash) {
int index = hash >>> 5;
int bitMask = 1 << (hash & 31);
if ((bitSet[index] & bitMask) != 0)
return true;
bitSet[index] |= bitMask;
return false;
}
}
You can adjust the string generation part easily to test different patterns.

Null-free "maps": Is a callback solution slower than tryGet()?

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same.
But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches.
(Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.)
First, the callback example:
class CallbackMap<K, V> {
private final Map<K, V> backingMap;
public CallbackMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
void lookup(K key, Callback<K, V> handler) {
V val = backingMap.get(key);
if (val == null) {
handler.handleMissing(key);
} else {
handler.handleFound(key, val);
}
}
}
interface Callback<K, V> {
void handleFound(K key, V value);
void handleMissing(K key);
}
class CallbackExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private Callback<String, String> handler;
public CallbackExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
handler = new Callback<String, String>() {
public void handleFound(String key, String value) {
found.add(key + ": " + value);
}
public void handleMissing(String key) {
missing.add(key);
}
};
}
void test() {
CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
cbMap.lookup(key, handler);
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong):
class TryGetMap<K, V> {
private final Map<K, V> backingMap;
public TryGetMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
boolean tryGet(K key, OutParameter<V> valueParam) {
V val = backingMap.get(key);
if (val == null) {
return false;
}
valueParam.value = val;
return true;
}
}
class OutParameter<V> {
V value;
}
class TryGetExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private final OutParameter<String> out = new OutParameter<String>();
public TryGetExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
}
void test() {
TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
if (tgMap.tryGet(key, out)) {
found.add(key + ": " + out.value);
} else {
missing.add(key);
}
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
And finally, the performance test code:
public static void main(String[] args) {
int size = 200000;
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < size; i++) {
String val = (i % 5 == 0) ? null : "value" + i;
map.put("key" + i, val);
}
long totalCallback = 0;
long totalTryGet = 0;
int iterations = 20;
for (int i = 0; i < iterations; i++) {
{
TryGetExample tryGet = new TryGetExample(map);
long tryGetStart = System.currentTimeMillis();
tryGet.test();
totalTryGet += (System.currentTimeMillis() - tryGetStart);
}
System.gc();
{
CallbackExample callback = new CallbackExample(map);
long callbackStart = System.currentTimeMillis();
callback.test();
totalCallback += (System.currentTimeMillis() - callbackStart);
}
System.gc();
}
System.out.println("Avg. callback: " + (totalCallback / iterations));
System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}
On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished.
This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsicously penalized tryGet() implementation. Thoughts?
Updated: Per comment from Michael Aaron Safyan, fixed TryGetExample to reuse OutParameter.

I would say that neither design makes sense in practice, regardless of the performance. I would argue that both mechanisms are overly complicated and, more importantly, don't take into account actual usage.
Actual Usage
If a user looks up a value in a map and it isn't there, most likely the user wants one of the following:
To insert some value with that key into the map
To get back some default value
To be informed that the value isn't there
Thus I would argue that a better, null-free API would be:
has(key) which indicates if the key is present (if one only wishes to check for the key's existence).
get(key) which reports the value if the key is present; otherwise, throws NoSuchElementException.
get(key,defaultval) which reports the value for the key, or defaultval if the key isn't present.
setdefault(key,defaultval) which inserts (key,defaultval) if key isn't present, and returns the value associated with key (which is defaultval if there is no previous mapping, otherwise prev mapping).
The only way to get back null is if you explicity ask for it as in get(key,null). This API is incredibly simple, and yet is able to handle the most common map-related tasks (in most use cases that I have encountered).
I should also add that in Java, has() would be called containsKey() while setdefault() would be called putIfAbsent(). Because get() signals an object's absence via a NoSuchElementException, it is then possible to associate a key with null and treat it as a legitimate association.... if get() returns null, it means the key has been associated with the value null, not that the key is absent (although you can define your API to disallow a value of null if you so choose, in which case you would throw an IllegalArgumentException from the functions that are used to add associations if the value given is null). Another advantage to this API, is that setdefault() only needs to perform the lookup procedure once instead of twice, which would be the case if you used if( ! dict.has(key) ){ dict.set(key,val); }. Another advantage is that you do not surprise developers who write something like dict.get(key).doSomething() who assume that get() will always return a non-null object (because they have never inserted a null value into the dictionary)... instead, they get a NoSuchElementException if there is no value for that key, which is more consistent with the rest of the error checking in Java and which is also a much easier to understand and debug than NullPointerException.
Answer To Question
To answer original question, yes, you are unfairly penalizing the tryGet version.... in your callback based mechanism you construct the callback object only once and use it in all subsequent calls; whereas in your tryGet example, you construct your out parameter object in every single iteration. Try taking the line:
OutParameter out = new OutParameter();
Take the line above out of the for-loop and see if that improves the performance of the tryGet example. In other words, place the line above the for-loop, and re-use the out parameter in each iteration.

David, thanks for taking the time to write this up. I'm a C# programmer, so my Java skills are a bit vague these days. Because of this, I decided to port your code over and test it myself. I found some interesting differences and similarities, which are pretty much worth the price of admission as far as I'm concerned. Among the major differences are:
I didn't have to implement TryGet because it's built into Dictionary.
In order to use the native TryGet, instead of inserting nulls to simulate misses, I simply omitted those values. This still means that v = map[k] would have set v to null, so I think it's a proper porting. In hindsight, I could have inserted the nulls and changed (_map.TryGetValue(key, out value)) to (_map.TryGetValue(key, out value) && value != null)), but I'm glad I didn't.
I want to be exceedingly fair. So, to keep the code as compact and maintainable as possible, I used lambda calculus notation, which let me define the callbacks painlessly. This hides much of the complexity of setting up anonymous delegates, and allows me to use closures seamlessly. Ironically, the implementation of Lookup uses TryGet internally.
Instead of declaring a new type of Dictionary, I used an extension method to graft Lookup onto the standard dictionary, much simplifying the code.
With apologies for the less-than-professional quality of the code, here it is:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication1
{
static class CallbackDictionary
{
public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
{
V v;
if (map.TryGetValue(key, out v))
found(key, v);
else
missed(key);
}
}
class TryGetExample
{
private Dictionary<string, string> _map;
private List<string> _found;
private List<string> _missing;
public TryGetExample(Dictionary<string, string> map)
{
_map = map;
_found = new List<string>(_map.Count);
_missing = new List<string>(_map.Count);
}
public void TestTryGet()
{
for (int i = 0; i < _map.Count; i++)
{
string key = "key" + i;
string value;
if (_map.TryGetValue(key, out value))
_found.Add(key + ": " + value);
else
_missing.Add(key);
}
Console.WriteLine(_found.Count() + " found");
Console.WriteLine(_missing.Count() + " missing");
}
public void TestCallback()
{
for (int i = 0; i < _map.Count; i++)
_map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));
Console.WriteLine(_found.Count() + " found");
Console.WriteLine(_missing.Count() + " missing");
}
}
class Program
{
static void Main(string[] args)
{
int size = 2000000;
var map = new Dictionary<string, string>(size);
for (int i = 0; i < size; i++)
if (i % 5 != 0)
map.Add("key" + i, "value" + i);
long totalCallback = 0;
long totalTryGet = 0;
int iterations = 20;
TryGetExample tryGet;
for (int i = 0; i < iterations; i++)
{
tryGet = new TryGetExample(map);
long tryGetStart = DateTime.UtcNow.Ticks;
tryGet.TestTryGet();
totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
GC.Collect();
tryGet = new TryGetExample(map);
long callbackStart = DateTime.UtcNow.Ticks;
tryGet.TestCallback();
totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
GC.Collect();
}
Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
}
}
}
My performance expectations, as I said in the article that inspired this one, would be that neither one is much faster or slower than the other. After all, most of the work is in the searching and adding, not in the simple logic that structures it. In fact, it varied a bit among runs, but I was unable to detect any consistent advantage.
Part of the problem is that I used a low-precision timer and the test was short, so I increased the count by 10x to 2000000 and that helped. Now callbacks are about 3% slower, which I do not consider significant. On my fairly slow machine, callbacks took 17773437 while tryget took 17234375.
Now, as for code complexity, it's a bit unfair because TryGet is native, so let's just ignore the fact that I had to add a callback interface. At the calling spot, lambda notation did a great job of hiding the complexity. If anything, it's actually shorter than the if/then/else used in the TryGet version, although I suppose I could have used a ternary operator to make it equally compact.
On the whole, I found the C# to be more elegant, and only some of that is due to my bias as a C# programmer. Mainly, I didn't have to define and implement interfaces, which cut down on the plumbing overhead. I also used pretty standard .NET conventions, which seem to be a bit more streamlined than the sort of style favored in Java.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Why is my HashMap implementation 10 times slower than the JDK's? - java

Related

Get least accessed element in a List (Java 8)

Direct Recursion vs While Loop for time complexity performance

Mapping hash values to a range, with minimal collisions

Fast Incremental Hash in Java

Null-free "maps": Is a callback solution slower than tryGet()?

Categories

Resources