Chronicle Map producing garbage when using the Streams API - java

Today I was experimenting with Chronicle Map. Here is a code sample:
package experimental;
import net.openhft.chronicle.core.values.IntValue;
import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.values.Values;
public class Tmp {
public static void main(String[] args) {
try (ChronicleMap<IntValue, User> users = ChronicleMap
.of(IntValue.class, User.class)
.name("users")
.entries(100_000_000)
.create();) {
User user = Values.newHeapInstance(User.class);
IntValue id = Values.newHeapInstance(IntValue.class);
for (int i = 1; i < 100_000_000; i++) {
user.setId(i);
user.setBalance(Math.random() * 1_000_000);
id.setValue(i);
users.put(id, user);
if (i % 100 == 0) {
System.out.println(i + ". " +
users.values()
.stream()
.max(User::compareTo)
.map(User::getBalance)
.get());
}
}
}
}
public interface User extends Comparable<User> {
int getId();
void setId(int id);
double getBalance();
void setBalance(double balance);
@Override
default int compareTo(User other) {
return Double.compare(getBalance(), other.getBalance());
}
}
}
As you can see in the code above, I am just creating a User object and putting it in the Chronicle Map, and after every 100th record I print the balance of the User with the maximum balance. Unfortunately, this produces some garbage, which I observed when monitoring it with VisualVM.
It seems that using streams on a Chronicle Map produces garbage regardless.
So my questions are:
* Does this mean that I should not use the Streams API with Chronicle Map?
* Are there any other solutions/ways of doing this?
* How do I filter/search a Chronicle Map in the proper way? I have use cases other than just putting/getting data in it.

ChronicleMap's entrySet().iterator() (as well as the iterators of keySet() and values()) is implemented so that it dumps all objects in one Chronicle Map segment into memory before iterating over them.
You can inspect how many segments you have by calling map.segments(). You can also configure the number of segments when constructing the ChronicleMap; see the ChronicleMapBuilder javadoc.
So, during iteration, you should expect approximately numEntries / numSegments entries to be regularly dumped into memory at once, where numEntries is the size of your Chronicle Map.
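To put the estimate in numbers, the sketch below just performs the numEntries / numSegments division (rounded up) for the question's 100_000_000 entries. The segment count of 2048 is a made-up value for illustration only; the real number for a given map comes from users.segments().

```java
public class SegmentBatchEstimate {
    // Ceiling of numEntries / numSegments: roughly how many entries one segment
    // holds, i.e. how many are copied into memory at once during iteration.
    static long entriesPerSegment(long numEntries, int numSegments) {
        return (numEntries + numSegments - 1) / numSegments;
    }

    public static void main(String[] args) {
        long numEntries = 100_000_000L; // map size from the question
        int numSegments = 2048;         // hypothetical; use users.segments() for the real value
        System.out.println(entriesPerSegment(numEntries, numSegments)); // prints 48829
    }
}
```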
You can implement stream-like processing of a Chronicle Map without creating a lot of garbage by reusing objects via the Segment Context API:
// Assumes imports of net.openhft.chronicle.map.MapEntry and net.openhft.chronicle.map.MapSegmentContext.
User[] maxUser = new User[1];
for (int i = 0; i < users.segments(); i++) {
try (MapSegmentContext<IntValue, User, ?> c = users.segmentContext(i)) {
c.forEachSegmentEntry((MapEntry<IntValue, User> e) -> {
User user = e.value().get();
if (maxUser[0] == null || user.compareTo(maxUser[0]) > 0) {
// Note that you cannot just assign `maxUser[0] = user`:
// this object will be reused by the SegmentContext later
// in the iteration, and its contents will be rewritten.
// Check out the doc for Data.get().
if (maxUser[0] == null) {
maxUser[0] = Values.newHeapInstance(User.class);
}
User newMaxUser = e.value().getUsing(maxUser[0]);
// assert the object is indeed reused
assert newMaxUser == maxUser[0];
}
});
}
}
The code of the above example is adapted from here.

Related

How to use the same hashmap in multiple threads

I have a HashMap that is created for each "mailer" class, and each "agent" class creates a mailer.
My problem is that each of my "agents" creates a "mailer" that in turn creates a new HashMap.
What I'm trying to do is create one HashMap that will be used by all the agents (every agent is a thread).
This is the Agent class:
public class Agent implements Runnable {
private int id;
private int n;
private Mailer mailer;
private static int counter;
private List<Integer> received = new ArrayList<Integer>();
@Override
public void run() {
System.out.println("Thread has started");
n = 10;
if (counter < n - 1) {
this.id = ThreadLocalRandom.current().nextInt(0, n + 1);
counter++;
}
Message m = new Message(this.id, this.id);
this.mailer.getMap().put(this.id, new ArrayList<Message>());
System.out.println(this.mailer.getMap());
for (int i = 0; i < n; i++) {
if (i == this.id) {
continue;
}
this.mailer.send(i, m);
}
for (int i = 0; i < n; i++) {
if (i == this.id) {
continue;
}
if (this.mailer.getMap().get(i) == null) {
continue;
} else {
this.received.add(this.mailer.readOne(this.id).getContent());
}
}
System.out.println(this.id + "" + this.received);
}
}
This is the Mailer class:
public class Mailer {
private HashMap<Integer, List<Message>> map = new HashMap<>();
public void send(int receiver, Message m) {
synchronized (map) {
while (this.map.get(receiver) == null) {
this.map.get(receiver);
}
if (this.map.get(receiver) == null) {
} else {
map.get(receiver).add(m);
}
}
}
public Message readOne(int receiver) {
synchronized (map) {
if (this.map.get(receiver) == null) {
return null;
} else if (this.map.get(receiver).size() == 0) {
return null;
} else {
Message m = this.map.get(receiver).get(0);
this.map.get(receiver).remove(0);
return m;
}
}
}
public HashMap<Integer, List<Message>> getMap() {
synchronized (map) {
return map;
}
}
}
So far I have tried creating the mailer object inside the run method in Agent.
Going by the idea (based on your own answer to this question) that you made the map static, you've made 2 mistakes.
do not use static
static means there is one map for the entire JVM you run this on. This is not actually a good thing: Now you can't create separate mailers on one JVM in the future, and you've made it hard to test stuff.
You want something else: A way to group a bunch of mailer threads together (these are all mailers for the agent), but a bit more discerning than a simple: "ALL mailers in the ENTIRE system are all the one mailer for the one agent that will ever run".
A trivial way to do this is to pass the map in as an argument. Alternatively, make the map part of the agent, pass the agent to the mailer constructor, and have the mailer ask the agent for the map every time.
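A minimal sketch of the first option, passing one shared map into every Mailer. The class shapes are simplified assumptions rather than the asker's exact code: Message is a hypothetical two-field type, each inbox is a ConcurrentLinkedQueue instead of an ArrayList, and register() is a hypothetical helper that replaces the getMap().put(...) call in Agent.run().

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SharedMailerDemo {
    // Hypothetical minimal message type.
    static final class Message {
        final int sender;
        final int content;
        Message(int sender, int content) { this.sender = sender; this.content = content; }
    }

    // The mailer no longer creates its own map; it is handed one.
    static final class Mailer {
        private final Map<Integer, Queue<Message>> inboxes;

        Mailer(Map<Integer, Queue<Message>> inboxes) { this.inboxes = inboxes; }

        // Creates an empty inbox for this receiver if none exists yet.
        void register(int id) {
            inboxes.putIfAbsent(id, new ConcurrentLinkedQueue<>());
        }

        void send(int receiver, Message m) {
            Queue<Message> inbox = inboxes.get(receiver);
            if (inbox != null) {
                inbox.add(m); // messages to unknown receivers are ignored
            }
        }

        // Returns the oldest pending message, or null if there is none.
        Message readOne(int receiver) {
            Queue<Message> inbox = inboxes.get(receiver);
            return inbox == null ? null : inbox.poll();
        }
    }

    public static void main(String[] args) {
        Map<Integer, Queue<Message>> shared = new ConcurrentHashMap<>();
        // Each agent may still create its own Mailer, as long as all of them get the same map.
        Mailer mailerOfAgent1 = new Mailer(shared);
        Mailer mailerOfAgent2 = new Mailer(shared);
        mailerOfAgent2.register(2);
        mailerOfAgent1.send(2, new Message(1, 42));
        System.out.println(mailerOfAgent2.readOne(2).content); // prints 42
    }
}
```

Because the map is created outside the Mailer, nothing here is static: a test, or a second group of agents, can simply create another map.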
this is not thread safe
Thread safety is a crucial concept to get right, because the failure mode if you get it wrong is extremely annoying: It may or may not work, and the JVM is free to base whether it'll work right this moment or won't work on the phase of the moon or the flip of a coin: The JVM is given room to do whatever it feels like it needs to, in order to have a JVM that can make full use of the CPU's powers regardless of which CPU and operating system your app is running on.
Your code is not thread safe.
At any given moment, if 2 threads are both referring to the same field, you've got a problem: you need to ensure that this is done 'safely'. Neither the compiler nor the runtime will throw errors if you fail to do this, but you will get bizarre behaviour, because the JVM is free to give you caches, refuse to synchronize things, make ghosts of data appear, and more.
In this case the fix is near-trivial: Use java.util.concurrent.ConcurrentHashMap instead, that's all you'd have to do to make this safe.
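One caveat worth illustrating: ConcurrentHashMap makes the map operations thread-safe, but the lists stored as values in the question's code are plain ArrayLists, which are not. The sketch below (an illustration under that assumption, not the answerer's code; ConcurrentInboxes is a hypothetical name) uses computeIfAbsent to create each inbox atomically and a ConcurrentLinkedQueue as the value type, so neither the busy-wait in send() nor a synchronized block is needed.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentInboxes {
    // receiver id -> pending messages (Strings here for brevity)
    private final Map<Integer, Queue<String>> inboxes = new ConcurrentHashMap<>();

    // computeIfAbsent creates the inbox atomically if it is missing, so no thread
    // has to wait for another thread to create it.
    void send(int receiver, String message) {
        inboxes.computeIfAbsent(receiver, id -> new ConcurrentLinkedQueue<>()).add(message);
    }

    int pending(int receiver) {
        Queue<String> inbox = inboxes.get(receiver);
        return inbox == null ? 0 : inbox.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentInboxes mail = new ConcurrentInboxes();
        int agents = 10;
        ExecutorService pool = Executors.newFixedThreadPool(agents);
        for (int sender = 0; sender < agents; sender++) {
            final int from = sender;
            // Every agent sends one message to every other agent, concurrently.
            pool.execute(() -> {
                for (int to = 0; to < agents; to++) {
                    if (to != from) {
                        mail.send(to, "from " + from);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(mail.pending(0)); // expected: 9, one message from each other agent
    }
}
```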
Whenever you're interacting with a field that doesn't have a convenient thread-safe type, or you're messing with the field itself (one thread assigns a new value to the field and another reads it; you don't do that here, there is just the one field that always points at the same map, but you're messing with the map), you need to use synchronized and/or volatile and/or locks from the java.util.concurrent package, and in general it gets very complicated. Concurrent programming is hard.
I was able to solve this by making the mailer static in the Agent class.

Compose variable number of ListenableFuture

I'm quite new to Futures and am stuck on chaining calls and creating a list of objects. I'm using Android, with a minimum API level of 19.
I want to code the method getAllFoo() below:
ListenableFuture<List<Foo>> getAllFoo() {
// ...
}
I have these 2 methods available:
ListenableFuture<Foo> getFoo(int index) {
// gets a Foo by its index
}
ListenableFuture<Integer> getNbFoo() {
// gets the total number of Foo objects
}
The method Futures.allAsList() would work nicely here, but my main constraint is that each call to getFoo(int index) cannot occur until the previous one has completed.
As far as I understand it (and have tested it), Futures.allAsList() "fans out" the calls (all the calls start at the same time), so I can't use something like this:
ListenableFuture<List<Foo>> getAllFoo() {
// ...
List<ListenableFuture<Foo>> allFutureFoos = new ArrayList<>();
for (int i = 0; i < size; i++) {
allFutureFoos.add(getFoo(i));
}
ListenableFuture<List<Foo>> allFoos = Futures.allAsList(allFutureFoos);
return allFoos;
}
I have this kind of (ugly) solution (that works):
// ...
final SettableFuture<List<Foo>> future = SettableFuture.create();
List<Foo> listFoos = new ArrayList<>();
addFooToList(future, 0, nbFoo, listFoos);
// ...
// Parameters are final so the anonymous callback below can capture them.
private ListenableFuture<List<Foo>> addFooToList(final SettableFuture<List<Foo>> future, final int idx, final int size, final List<Foo> allFoos) {
Futures.addCallback(getFoo(idx), new FutureCallback<Foo>() {
@Override
public void onSuccess(Foo foo) {
allFoos.add(foo);
if ((idx + 1) < size) {
addFooToList(future, idx + 1, size, allFoos);
} else {
future.set(allFoos);
}
}
@Override
public void onFailure(Throwable throwable) {
future.setException(throwable);
}
});
return future;
}
How can I implement that elegantly using ListenableFuture?
I found multiple related topics (like this or that), but these use hard-coded transforms and are not based on a variable number of transformations.
How can I compose ListenableFutures and get the same return value as Futures.allAsList(), but by chaining calls (fan-in)?
Thanks!
As a general rule, it's better to chain derived futures together with transform/catching/whenAllSucceed/whenAllComplete than with manual addListener/addCallback calls. The transformation methods can do more for you:
present fewer opportunities to forget to set an output, thus hanging the program
propagate cancellation
avoid retaining memory longer than needed
do tricks to reduce the chance of stack overflows
Anyway, I'm not sure there's a particularly elegant way to do this, but I suggest something along these lines (untested!):
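// Assumes: import com.google.common.util.concurrent.Futures;
//          import static com.google.common.util.concurrent.MoreExecutors.directExecutor;
ListenableFuture<Integer> countFuture = getNbFoo();
return Futures.transformAsync(
countFuture,
count -> {
List<ListenableFuture<Foo>> results = new ArrayList<>();
ListenableFuture<?> previous = countFuture;
for (int i = 0; i < count; i++) {
final int index = i;
// getFoo(index) is only started once the previous future has completed.
ListenableFuture<Foo> current = Futures.transformAsync(
previous,
unused -> getFoo(index),
directExecutor());
results.add(current);
previous = current;
}
return Futures.allAsList(results);
},
directExecutor());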
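For comparison, here is the same "start the next call only after the previous one finished" idea expressed with the JDK's CompletableFuture instead of Guava's ListenableFuture. This is a swapped-in technique, not what the answer above uses, and CompletableFuture is only available on Android from API level 24, so it does not meet the question's API 19 constraint. getFoo and getNbFoo below are hypothetical stand-ins that return Strings and a fixed count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SequentialFutures {
    // Hypothetical stand-in for getFoo(int): completes asynchronously on the common pool.
    static CompletableFuture<String> getFoo(int index) {
        return CompletableFuture.supplyAsync(() -> "foo" + index);
    }

    // Hypothetical stand-in for getNbFoo().
    static CompletableFuture<Integer> getNbFoo() {
        return CompletableFuture.completedFuture(3);
    }

    static CompletableFuture<List<String>> getAllFoo() {
        return getNbFoo().thenCompose(count -> {
            CompletableFuture<List<String>> acc =
                CompletableFuture.completedFuture(new ArrayList<>());
            for (int i = 0; i < count; i++) {
                final int index = i;
                // getFoo(index) is invoked only after every earlier step has completed,
                // and its result is appended to the accumulated list.
                acc = acc.thenCompose(list ->
                    getFoo(index).thenApply(foo -> {
                        list.add(foo);
                        return list;
                    }));
            }
            return acc;
        });
    }

    public static void main(String[] args) {
        System.out.println(getAllFoo().join()); // prints [foo0, foo1, foo2]
    }
}
```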

Arrays.sort on 2-dimensional array without triggering Garbage Collection?

This code works perfectly, but unfortunately it triggers garbage collection because of the Arrays.sort() Comparator.
Is there a way to do this that won't trigger Garbage Collection?
(NOTE: This code has been modified to be more "generic". The actual code is for an Android game, which is why Garbage Collection-induced slowdown is an issue.)
static final byte INCOME = 0;
static final byte INDEX = 1;
public void vSortEmployees() {
nPaidEmployees = 0;
for (nIter = 0; nIter < MAX_EMPLOYEES; nIter++) {
if (employees[nIter].current && !employees[nIter].volunteer) {
// We have another current and paid employee; add that employee's "amount earned to date" to the list.
paidemployees[nPaidEmployees][INCOME] = employees[nIter].fGetTotalIncomeToDate();
paidemployees[nPaidEmployees][INDEX] = nIter;
nPaidEmployees++;
}
}
// Sort only the rows filled in above (stale rows past nPaidEmployees are left alone), highest income first.
Arrays.sort(paidemployees, 0, nPaidEmployees, new Comparator<float[]>() {
@Override
public int compare(float[] f1, float[] f2) {
if (f2[INCOME] < f1[INCOME])
return -1;
else if (f2[INCOME] > f1[INCOME])
return 1;
else
return 0;
}
});
// Now we have a list of current, paid employees in order of income received.
// Highest income paid out
float highestIncome = paidemployees[0][INCOME];
// Second highest income paid out
float secondHighestIncome = paidemployees[1][INCOME];
// If we need to reference the original employee object, we can
// (the index is stored as a float, so cast it back to int):
employees[(int) paidemployees[0][INDEX]].getName();
}
There is no way to consistently trigger, or avoid triggering, GC. GC lives its own life. The fact that it runs while you are sorting your array does not by itself mean anything.
However, you can probably do something. Just do not use an anonymous inner class for the comparator; you do not really need it. Use a regular class, create a single instance of it, and reuse that instance. That way your code no longer creates a new comparator object every time it sorts, so it produces less garbage and GC is less likely to run.
class FloatArrayComparator implements Comparator<float[]> {
// Same column index as INCOME in the question.
private static final int INCOME = 0;
@Override
public int compare(float[] f1, float[] f2) {
if (f2[INCOME] < f1[INCOME])
return -1;
else if (f2[INCOME] > f1[INCOME])
return 1;
else
return 0;
}
}
class SomeClass {
// One shared instance, created once and reused for every sort.
private static final Comparator<float[]> floatArrayComparator = new FloatArrayComparator();
void myMethod() {
Arrays.sort(myArray, floatArrayComparator);
}
}
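Reusing a single comparator removes one allocation per call, but Arrays.sort on an object array may still allocate internal working storage, depending on the array size and the runtime's implementation. If allocation during the sort itself has to be avoided, one option (a swapped-in technique, not part of the answer above) is a hand-written in-place insertion sort over the live rows only. It is O(n²), which is usually acceptable for a small employee list; InPlaceIncomeSort and sortByIncomeDescending are hypothetical names.

```java
public final class InPlaceIncomeSort {
    static final int INCOME = 0; // same column layout as the question
    static final int INDEX = 1;

    // Sorts rows[0..count) by rows[i][INCOME], highest first, by moving row
    // references in place. Allocates nothing and keeps equal incomes in order.
    static void sortByIncomeDescending(float[][] rows, int count) {
        for (int i = 1; i < count; i++) {
            float[] current = rows[i];
            int j = i - 1;
            // Shift rows with a lower income one slot to the right.
            while (j >= 0 && rows[j][INCOME] < current[INCOME]) {
                rows[j + 1] = rows[j];
                j--;
            }
            rows[j + 1] = current;
        }
    }

    public static void main(String[] args) {
        float[][] rows = { {10f, 0f}, {30f, 1f}, {20f, 2f}, {0f, 3f} };
        sortByIncomeDescending(rows, 3); // only the first 3 rows are "live"
        for (float[] r : rows) {
            System.out.println(r[INCOME] + " -> employee " + (int) r[INDEX]);
        }
    }
}
```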

synchronize and merge messaging/data flow

This is a very common sensor-data processing problem.
I would like to synchronize and merge sensor data from different sources in Java, without overly complicated third-party libraries or frameworks.
Say I define an object (O) that consists of, for example, 4 attributes (A1..A4). The 4 attributes come from different data channels, e.g. socket channels.
The 4 attributes generally arrive at a rate of 1.0 ~ 2.0 Hz, independently of each other.
Once all 4 attributes (A1..A4) arrive at the same time (within a small time window, e.g. 100 ms), I construct a new object (O) from them.
A descriptive scenario is as follows.
The arrival time points of A1 ~ A4 are marked with *.
Objects O1 ~ O3 are constructed at time points t1, t2 and t3 respectively.
Some attributes arrive between t2 and t3, but they are not complete enough to construct an object, so they are dropped and ignored.
[Timing diagram: arrivals of A1, A2, A3 and A4, each marked with *, along a common time axis; objects O1, O2 and O3 are constructed at time points t1, t2 and t3.]
Some requirements:
Identify as soon as possible the time point at which an object can be constructed from the last 4 incoming attributes.
FIFO: O1 must be constructed before O2, and so on.
As little locking in Java as possible.
Eventually drop data that is not complete enough to construct an object.
Some quick ideas on implementation:
Store all incoming attributes in a FIFO queue of time-discrete buckets (each bucket holds the 4 different attributes).
Run an endless thread concurrently that checks the FIFO queue (from its head) for a bucket that is already filled with all 4 attributes. If there is one, construct an object and remove the bucket from the queue. If a bucket is not completely filled within a specific time window, it is dropped.
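A simplified, single-threaded sketch of the bucket idea above, under two assumptions: the timestamps passed in never decrease, and buckets are fixed 100 ms slots (timestamp / windowMs). With non-decreasing timestamps, an older bucket can never be completed once a newer slot has started, so only the newest bucket needs to be kept, and objects come out in FIFO order. BucketMerger and onArrival are hypothetical names. Note one limitation of fixed slots: four attributes that arrive within 100 ms of each other but straddle a slot boundary are not merged.

```java
public class BucketMerger {
    static final int NUM_ATTRIBUTES = 4; // A1..A4

    private final long windowMs;
    private long currentSlot = Long.MIN_VALUE; // time slot of the open bucket
    private Object[] bucket = new Object[NUM_ATTRIBUTES];
    private int filled = 0;

    public BucketMerger(long windowMs) {
        this.windowMs = windowMs;
    }

    // attribute: 0..3 for A1..A4; value must be non-null; timestamps must not decrease.
    // Returns the four values when the bucket completes, otherwise null.
    public Object[] onArrival(int attribute, Object value, long timestampMs) {
        long slot = timestampMs / windowMs;
        if (slot != currentSlot) {
            // A newer slot started: the incomplete bucket can never complete, so drop it.
            currentSlot = slot;
            bucket = new Object[NUM_ATTRIBUTES];
            filled = 0;
        }
        if (bucket[attribute] == null) {
            filled++;
        }
        bucket[attribute] = value; // a repeated reading of the same attribute overwrites
        if (filled == NUM_ATTRIBUTES) {
            Object[] complete = bucket;
            bucket = new Object[NUM_ATTRIBUTES];
            filled = 0;
            return complete;
        }
        return null;
    }
}
```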
Any suggestions and corrections are welcome!
This is unlikely to solve your problem, but it might point you in the right direction.
I would use Google Guava's MapMaker for a first attempt:
ConcurrentMap<Key, Bucket> graphs = new MapMaker()
.expireAfterAccess(100, TimeUnit.MILLISECONDS)
.makeComputingMap(new Function<Key, Bucket>() {
public Bucket apply(Key key) {
return new Bucket(key);
}
});
This would create a map whose entries would disappear if they had not been accessed for 100 ms, and creates a new bucket when it is asked for.
What I can't work out is exactly what the Key would be :S What you're really after is the same kind of functionality in the form of a queue.
Here's another crazy idea:
use one single LinkedBlockingQueue to write values to from all sensors A1-A4
assign this queue to an AtomicReference variable
create a timer task which will switch this queue with a new one at specified intervals (100ms)
fetch all data from the old queue and see if you have all data A1-A4
if yes, then create the object, otherwise drop everything
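The swap-the-queue idea could be sketched roughly as follows. The names are hypothetical, and swapAndCheck() is meant to be called every 100 ms, for example from a ScheduledExecutorService. AtomicReference.getAndSet swaps in a fresh queue atomically, so sensor threads never block on the swap. As with the idea itself, a reading that races with the swap can end up in the old queue after it has been drained and be lost, which the "otherwise drop everything" rule tolerates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class SwappingQueueMerger {
    // One reading from one sensor; source is e.g. "A1".."A4".
    static final class Reading {
        final String source;
        final Object value;
        Reading(String source, Object value) { this.source = source; this.value = value; }
    }

    private final int numSources;
    private final AtomicReference<LinkedBlockingQueue<Reading>> current =
        new AtomicReference<>(new LinkedBlockingQueue<>());

    SwappingQueueMerger(int numSources) {
        this.numSources = numSources;
    }

    // Called by the sensor threads.
    void add(String source, Object value) {
        current.get().add(new Reading(source, value));
    }

    // Called once per window: swaps in an empty queue and inspects the old one.
    // Returns the latest value per source if every source reported, otherwise null
    // (in which case everything collected in this window is dropped).
    Map<String, Object> swapAndCheck() {
        LinkedBlockingQueue<Reading> old = current.getAndSet(new LinkedBlockingQueue<>());
        List<Reading> drained = new ArrayList<>();
        old.drainTo(drained);
        Map<String, Object> latest = new HashMap<>();
        for (Reading r : drained) {
            latest.put(r.source, r.value); // later readings overwrite earlier ones
        }
        return latest.size() == numSources ? latest : null;
    }
}
```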
This is another way of doing it - it's just pseudocode though, you'll need to write it yourself :)
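class SlidingWindow {
private final AtomicReference<Object> a1 = new AtomicReference<>();
private final AtomicReference<Object> a2 = new AtomicReference<>();
private final AtomicReference<Object> a3 = new AtomicReference<>();
private final AtomicReference<Object> a4 = new AtomicReference<>();
// Arrival times of the most recent (at most 4) readings, oldest first.
private final Deque<Long> arrivalTimes = new ArrayDeque<>(4);
public Bucket setA1(Object data) {
a1.set(data);
long now = System.currentTimeMillis();
arrivalTimes.addLast(now);
if (arrivalTimes.size() > 4) {
arrivalTimes.removeFirst();
}
long oldestArrivalTime = arrivalTimes.peekFirst();
if (arrivalTimes.size() == 4 && now - oldestArrivalTime < 100) {
return buildBucket();
}
return null;
}
public Bucket setA2(Object data) { ...
...
private Bucket buildBucket() {
// Copy the current values out before clearing them.
Bucket b = new Bucket(a1.get(), a2.get(), a3.get(), a4.get());
a1.set(null);
a2.set(null);
a3.set(null);
a4.set(null);
arrivalTimes.clear();
return b;
}
}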
You could do something like this: the get operation blocks until data has arrived, and the add operation does not block. The get operation could be optimized a bit by keeping candidates in a parallel structure, so that you don't need to iterate over all candidates when filtering out old items. Iterating over 4 items should, however, be fast enough.
import java.util.HashMap;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
public class Filter<V> {
private static final long MAX_AGE_IN_MS = 100;
private final int numberOfSources;
private final LinkedBlockingQueue<Item> values = new LinkedBlockingQueue<Item>();
public Filter(int numberOfSources) {
this.numberOfSources = numberOfSources;
}
public void add(String source, V data) {
values.add(new Item(source, data));
}
public void get() throws InterruptedException {
HashMap<String, Item> result = new HashMap<String, Item>();
while (true) {
while (result.size() < numberOfSources) {
Item i = values.take();
result.put(i.source, i);
if (result.size() == numberOfSources) {
break;
}
}
//We got candidates from each source now, check if some are too old.
long now = System.currentTimeMillis();
Iterator<Item> it = result.values().iterator();
while (it.hasNext()) {
Item item = it.next();
if (now - item.creationTime > MAX_AGE_IN_MS) {
it.remove();
}
}
if (result.size() == numberOfSources) {
System.out.println("Got result, create a result object and return the items " + result.values());
break;
}
}
}
private class Item {
final String source;
final V value;
final long creationTime;
public Item(String source, V value) {
this.source = source;
this.value = value;
this.creationTime = System.currentTimeMillis();
}
public String toString() {
return String.valueOf(value);
}
}
public static void main(String[] args) throws Exception {
final Filter<String> filter = new Filter<String>(4);
new Thread(new Runnable() {
public void run() {
try {
filter.get();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}).start();
filter.add("a0", "va0.1");
filter.add("a0", "va0.2");
Thread.sleep(2000);
filter.add("a0", "va0.3");
Thread.sleep(100);
filter.add("a1", "va1.1");
filter.add("a2", "va2.1");
filter.add("a0", "va0.4");
Thread.sleep(100);
filter.add("a3", "va3.1");
Thread.sleep(10);
filter.add("a1", "va1.2");
filter.add("a2", "va2.2");
filter.add("a0", "va0.5");
}
}

Null-free "maps": Is a callback solution slower than tryGet()?

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same.
But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches.
(Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.)
First, the callback example:
class CallbackMap<K, V> {
private final Map<K, V> backingMap;
public CallbackMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
void lookup(K key, Callback<K, V> handler) {
V val = backingMap.get(key);
if (val == null) {
handler.handleMissing(key);
} else {
handler.handleFound(key, val);
}
}
}
interface Callback<K, V> {
void handleFound(K key, V value);
void handleMissing(K key);
}
class CallbackExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private Callback<String, String> handler;
public CallbackExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
handler = new Callback<String, String>() {
public void handleFound(String key, String value) {
found.add(key + ": " + value);
}
public void handleMissing(String key) {
missing.add(key);
}
};
}
void test() {
CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
cbMap.lookup(key, handler);
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong):
class TryGetMap<K, V> {
private final Map<K, V> backingMap;
public TryGetMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
boolean tryGet(K key, OutParameter<V> valueParam) {
V val = backingMap.get(key);
if (val == null) {
return false;
}
valueParam.value = val;
return true;
}
}
class OutParameter<V> {
V value;
}
class TryGetExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private final OutParameter<String> out = new OutParameter<String>();
public TryGetExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
}
void test() {
TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
if (tgMap.tryGet(key, out)) {
found.add(key + ": " + out.value);
} else {
missing.add(key);
}
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
And finally, the performance test code:
public static void main(String[] args) {
int size = 200000;
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < size; i++) {
String val = (i % 5 == 0) ? null : "value" + i;
map.put("key" + i, val);
}
long totalCallback = 0;
long totalTryGet = 0;
int iterations = 20;
for (int i = 0; i < iterations; i++) {
{
TryGetExample tryGet = new TryGetExample(map);
long tryGetStart = System.currentTimeMillis();
tryGet.test();
totalTryGet += (System.currentTimeMillis() - tryGetStart);
}
System.gc();
{
CallbackExample callback = new CallbackExample(map);
long callbackStart = System.currentTimeMillis();
callback.test();
totalCallback += (System.currentTimeMillis() - callbackStart);
}
System.gc();
}
System.out.println("Avg. callback: " + (totalCallback / iterations));
System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}
On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added explicit System.gc() calls between the runs, and the performance penalty vanished.
This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsciously penalized tryGet() implementation. Thoughts?
Updated: Per comment from Michael Aaron Safyan, fixed TryGetExample to reuse OutParameter.
I would say that neither design makes sense in practice, regardless of the performance. I would argue that both mechanisms are overly complicated and, more importantly, don't take into account actual usage.
Actual Usage
If a user looks up a value in a map and it isn't there, most likely the user wants one of the following:
To insert some value with that key into the map
To get back some default value
To be informed that the value isn't there
Thus I would argue that a better, null-free API would be:
has(key) which indicates if the key is present (if one only wishes to check for the key's existence).
get(key) which reports the value if the key is present; otherwise, throws NoSuchElementException.
get(key,defaultval) which reports the value for the key, or defaultval if the key isn't present.
setdefault(key,defaultval) which inserts (key,defaultval) if key isn't present, and returns the value associated with key (which is defaultval if there is no previous mapping, otherwise the previous mapping).
The only way to get back null is if you explicitly ask for it, as in get(key,null). This API is incredibly simple, and yet is able to handle the most common map-related tasks (in most use cases that I have encountered).
I should also add that in Java, has() would be called containsKey(), while setdefault() corresponds roughly to putIfAbsent() (although Java's putIfAbsent() returns the previous value, or null if there was none, rather than the value now associated with the key). Because get() signals an object's absence via a NoSuchElementException, it is then possible to associate a key with null and treat it as a legitimate association: if get() returns null, it means the key has been associated with the value null, not that the key is absent (although you can define your API to disallow null values if you so choose, in which case the functions used to add associations would throw an IllegalArgumentException when given a null value). Another advantage of this API is that setdefault() only needs to perform the lookup once instead of twice, which would be the case if you used if( ! dict.has(key) ){ dict.set(key,val); }. A further advantage is that you do not surprise developers who write something like dict.get(key).doSomething() and assume that get() will always return a non-null object (because they have never inserted a null value into the dictionary); instead, they get a NoSuchElementException if there is no value for that key, which is more consistent with the rest of the error checking in Java and much easier to understand and debug than a NullPointerException.
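A minimal Java sketch of the has/get/get-with-default/setdefault API described above, in the variant that disallows null values. NullFreeMap is a hypothetical wrapper around HashMap, not an existing library class. setdefault() uses Map.computeIfAbsent (Java 8+), which for a HashMap does a single lookup and returns the value now associated with the key, so it matches the described setdefault() semantics more closely than putIfAbsent().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public final class NullFreeMap<K, V> {
    private final Map<K, V> backing = new HashMap<>();

    public boolean has(K key) {
        return backing.containsKey(key);
    }

    // Null values are never stored, so a null from the backing map means "absent".
    public V get(K key) {
        V value = backing.get(key);
        if (value == null) {
            throw new NoSuchElementException("No value for key: " + key);
        }
        return value;
    }

    // Returns defaultValue when absent; null comes back only if the caller passes null.
    public V get(K key, V defaultValue) {
        return backing.getOrDefault(key, defaultValue);
    }

    // Inserts defaultValue if key is absent, then returns the value now associated with key.
    public V setdefault(K key, V defaultValue) {
        requireNonNull(defaultValue);
        return backing.computeIfAbsent(key, k -> defaultValue);
    }

    public void put(K key, V value) {
        requireNonNull(value);
        backing.put(key, value);
    }

    private static void requireNonNull(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("null values are not allowed");
        }
    }
}
```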
Answer To Question
To answer the original question: yes, you are unfairly penalizing the tryGet version. In your callback-based mechanism you construct the callback object only once and use it in all subsequent calls, whereas in your tryGet example you construct your out parameter object in every single iteration. Try taking the line:
OutParameter out = new OutParameter();
Take the line above out of the for-loop and see if that improves the performance of the tryGet example. In other words, place the line above the for-loop, and re-use the out parameter in each iteration.
David, thanks for taking the time to write this up. I'm a C# programmer, so my Java skills are a bit vague these days. Because of this, I decided to port your code over and test it myself. I found some interesting differences and similarities, which are pretty much worth the price of admission as far as I'm concerned. Among the major differences are:
I didn't have to implement TryGet because it's built into Dictionary.
In order to use the native TryGet, instead of inserting nulls to simulate misses, I simply omitted those values. This still means that v = map[k] would have set v to null, so I think it's a proper port. In hindsight, I could have inserted the nulls and changed (_map.TryGetValue(key, out value)) to (_map.TryGetValue(key, out value) && value != null), but I'm glad I didn't.
I want to be exceedingly fair. So, to keep the code as compact and maintainable as possible, I used lambda expression syntax, which let me define the callbacks painlessly. This hides much of the complexity of setting up anonymous delegates, and allows me to use closures seamlessly. Ironically, the implementation of Lookup uses TryGet internally.
Instead of declaring a new type of Dictionary, I used an extension method to graft Lookup onto the standard dictionary, much simplifying the code.
With apologies for the less-than-professional quality of the code, here it is:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication1
{
static class CallbackDictionary
{
public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
{
V v;
if (map.TryGetValue(key, out v))
found(key, v);
else
missed(key);
}
}
class TryGetExample
{
private Dictionary<string, string> _map;
private List<string> _found;
private List<string> _missing;
public TryGetExample(Dictionary<string, string> map)
{
_map = map;
_found = new List<string>(_map.Count);
_missing = new List<string>(_map.Count);
}
public void TestTryGet()
{
for (int i = 0; i < _map.Count; i++)
{
string key = "key" + i;
string value;
if (_map.TryGetValue(key, out value))
_found.Add(key + ": " + value);
else
_missing.Add(key);
}
Console.WriteLine(_found.Count() + " found");
Console.WriteLine(_missing.Count() + " missing");
}
public void TestCallback()
{
for (int i = 0; i < _map.Count; i++)
_map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));
Console.WriteLine(_found.Count() + " found");
Console.WriteLine(_missing.Count() + " missing");
}
}
class Program
{
static void Main(string[] args)
{
int size = 2000000;
var map = new Dictionary<string, string>(size);
for (int i = 0; i < size; i++)
if (i % 5 != 0)
map.Add("key" + i, "value" + i);
long totalCallback = 0;
long totalTryGet = 0;
int iterations = 20;
TryGetExample tryGet;
for (int i = 0; i < iterations; i++)
{
tryGet = new TryGetExample(map);
long tryGetStart = DateTime.UtcNow.Ticks;
tryGet.TestTryGet();
totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
GC.Collect();
tryGet = new TryGetExample(map);
long callbackStart = DateTime.UtcNow.Ticks;
tryGet.TestCallback();
totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
GC.Collect();
}
Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
}
}
}
My performance expectations, as I said in the article that inspired this one, would be that neither one is much faster or slower than the other. After all, most of the work is in the searching and adding, not in the simple logic that structures it. In fact, it varied a bit among runs, but I was unable to detect any consistent advantage.
Part of the problem is that I used a low-precision timer and the test was short, so I increased the count by 10x to 2000000, and that helped. Now callbacks are about 3% slower, which I do not consider significant. On my fairly slow machine, callbacks took 17773437 ticks on average while TryGet took 17234375 ticks (one DateTime tick is 100 ns, so roughly 1.78 s vs. 1.72 s per iteration).
Now, as for code complexity, it's a bit unfair because TryGet is native, so let's just ignore the fact that I had to add a callback interface. At the calling spot, lambda notation did a great job of hiding the complexity. If anything, it's actually shorter than the if/then/else used in the TryGet version, although I suppose I could have used a ternary operator to make it equally compact.
On the whole, I found the C# to be more elegant, and only some of that is due to my bias as a C# programmer. Mainly, I didn't have to define and implement interfaces, which cut down on the plumbing overhead. I also used pretty standard .NET conventions, which seem to be a bit more streamlined than the sort of style favored in Java.
