ConcurrentHashMap and Atomic Values - Java

My REST API works fine. However, I'm concerned about concurrency issues, even though I've tested with scripts and have yet to see any. In my studies, I came across material about using atomic values with ConcurrentHashMap to avoid what amounts to dirty reads. My question is twofold. First, should I be concerned, given my implementation? Second, if so, what would be the most prudent way to implement atomic values? I've contemplated dropping the wrapper class for the RestTemplate and simply passing a String back to the Angular 4 component to gain speed, but since I may use the value objects elsewhere, I'm hesitant. See the implementation below.
@Service
@EnableScheduling
public class TickerService implements IQuoteService {
@Autowired
private ApplicationConstants Constants;
private ConcurrentHashMap<String,Quote> quotes = new ConcurrentHashMap<String, Quote>();
private ConcurrentHashMap<String,LocalDateTime> quoteExpirationQueue = new ConcurrentHashMap<String, LocalDateTime>();
private final RestTemplate restTemplate;
public TickerService(RestTemplateBuilder restTemplateBuilder) {
this.restTemplate = restTemplateBuilder.build();
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
public Quote getQuote(String symbol) {
if (this.quotes.containsKey(symbol)){
Quote q = (Quote)this.quotes.get(symbol);
//Update Expiration
LocalDateTime ldt = LocalDateTime.now();
this.quoteExpirationQueue.put(symbol, ldt.plus(Constants.getQuoteExpirationMins(),ChronoUnit.MINUTES));
return q;
} else {
QuoteResponseWrapper qRes = this.restTemplate.getForObject( Constants.getRestURL(symbol), QuoteResponseWrapper.class, symbol);
ArrayList<Quote> res = new ArrayList<Quote>();
res = qRes.getQuoteResponse().getResult();
//Add to Cache
quotes.put(symbol, res.get(0));
//Set Expiration
LocalDateTime ldt = LocalDateTime.now();
this.quoteExpirationQueue.put(symbol, ldt.plus(Constants.getQuoteExpirationMins(),ChronoUnit.MINUTES));
return res.get(0);
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
public ConcurrentHashMap<String,Quote> getQuotes(){
return this.quotes;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
@Scheduled(fixedDelayString = "${application.quoteRefreshFrequency}")
public void refreshQuotes(){
if (quoteExpirationQueue.isEmpty()) {
return;
}
LocalDateTime ldt = LocalDateTime.now();
//Purge Expired Quotes
String expiredQuotes = quoteExpirationQueue.entrySet().stream().filter(x -> x.getValue().isBefore(ldt)).map(p -> p.getKey()).collect(Collectors.joining(","));
if (!expiredQuotes.equals("")) {
this.purgeQuotes(expiredQuotes.split(","));
}
String allQuotes = quoteExpirationQueue.entrySet().stream().filter(x -> x.getValue().isAfter(ldt)).map(p -> p.getKey()).collect(Collectors.joining(","));
List<String> qList = Arrays.asList(allQuotes.split(","));
Stack<String> stack = new Stack<String>();
stack.addAll(qList);
// Break Requests Into Manageable Chunks using property file settings
while (stack.size() > Constants.getMaxQuoteRequest()) {
String qSegment = "";
int i = 0;
while (i < Constants.getMaxQuoteRequest() && !stack.isEmpty()) {
qSegment = qSegment.concat(stack.pop() + ",");
i++;
}
logger.debug(qSegment.substring(0, qSegment.lastIndexOf(",")));
this.updateQuotes(qSegment);
}
// Handle Remaining Request Delta
if (stack.size() < Constants.getMaxQuoteRequest() && !stack.isEmpty()) {
String rSegment = "";
while (!stack.isEmpty()){
rSegment = rSegment.concat(stack.pop() + ",");
}
logger.debug(rSegment);
this.updateQuotes(rSegment.substring(0, rSegment.lastIndexOf(",")));
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
private void updateQuotes(String symbols) {
if (symbols.equals("")) {
return;
}
System.out.println("refreshing -> " + symbols);
QuoteResponseWrapper qRes = this.restTemplate.getForObject( Constants.getRestURL(symbols), QuoteResponseWrapper.class, symbols);
for (Quote q : qRes.getQuoteResponse().getResult()) {
this.quotes.put(q.getSymbol(), q);
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
private void purgeQuotes(String[] symbols) {
for (String q : symbols) {
System.out.println("purging -> " + q);
this.quotes.remove(q);
this.quoteExpirationQueue.remove(q);
}
}
}

I changed TickerService, the implementation of IQuoteService, to use ConcurrentHashMap with AtomicReferences:
@Autowired
private ApplicationConstants Constants;
private ConcurrentHashMap<AtomicReference<String>,AtomicReference<Quote>>
quotes = new ConcurrentHashMap<AtomicReference<String>,AtomicReference<Quote>> ();
private ConcurrentHashMap<AtomicReference<String>,AtomicReference<LocalDateTime>> quoteExpirationQueue = new ConcurrentHashMap<AtomicReference<String>,AtomicReference<LocalDateTime>>();
private final RestTemplate restTemplate;
The code works precisely as it did before. The intent of the new implementation is that it "should" ensure that updates to values are not partially read before being completely written, and that the values obtained are consistent. Given that I could find no sound examples and got no answers on this topic, I will test this and post any issues I find.

The main concurrency risk with this code arises if refreshQuotes() were to be called concurrently. If that is a risk, then refreshQuotes() just needs to be marked as synchronized.
Working on the premise that refreshQuotes() is only ever called by one thread at a time, and that Quote and LocalDateTime are both immutable, the question becomes: does updating immutable values within a ConcurrentHashMap risk dirty reads or writes? The answer is no: the values are immutable, and ConcurrentHashMap handles the concurrency of updating the references.
For more information, I strongly recommend reading JSR133 (The Java Memory Model). It covers in some detail when data will and will not become visible between threads. Doug Lea's JSR133 Cookbook will almost certainly give you far more information than you ever wanted to know.
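As a minimal sketch of the point above, assuming Quote is immutable (the Quote class and QuoteCacheSketch below are hypothetical stand-ins, not the asker's classes, and the 0.0 price replaces the real REST call): with plain String keys and immutable values, ConcurrentHashMap already publishes a replaced value safely. Wrapping the keys in AtomicReference is likely to defeat the cache, because AtomicReference does not override equals() or hashCode(), so two references holding the same symbol are different keys.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable stand-in for the asker's Quote class.
final class Quote {
    private final String symbol;
    private final double price;

    Quote(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }

    String getSymbol() { return symbol; }
    double getPrice() { return price; }
}

class QuoteCacheSketch {
    // Plain String keys and immutable values: the map handles safe publication.
    static final ConcurrentHashMap<String, Quote> quotes = new ConcurrentHashMap<>();

    // Atomically creates a quote only when the symbol is not cached yet.
    // The 0.0 price is a placeholder for whatever a real fetch would return.
    static Quote getQuote(String symbol) {
        return quotes.computeIfAbsent(symbol, s -> new Quote(s, 0.0));
    }

    // Demonstrates the AtomicReference-as-key problem: equality is identity-based,
    // so a lookup with a fresh AtomicReference never finds the stored entry.
    static boolean atomicKeyLookupWorks() {
        ConcurrentHashMap<AtomicReference<String>, Quote> bad = new ConcurrentHashMap<>();
        bad.put(new AtomicReference<>("AAPL"), new Quote("AAPL", 1.0));
        return bad.containsKey(new AtomicReference<>("AAPL"));
    }

    public static void main(String[] args) {
        System.out.println(getQuote("AAPL").getSymbol());
        System.out.println(atomicKeyLookupWorks());
    }
}
```

One caveat on the design: computeIfAbsent runs the mapping function while holding an internal lock on part of the map, so a slow remote call inside it can delay other updates that land in the same bin. Whether that matters depends on the load.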

Related

Merging few collections of different objects into one

I have several collections of objects that I'm receiving from an external API. For this example, let's say they look like this. In the real scenario I can't modify those classes.
@Data
public class ExternalResourceA {
private LocalDate date;
private String type;
}
@Data
public class ExternalResourceB {
private LocalDate date;
private String id;
}
I also have my own class, which combines those two based on a few business rules that are not important here. As above, this is a generated class that I can't edit. All I can do is write a wrapper class and translate it to the original one later.
@Data
public class MyResource {
private LocalDate date;
private String type;
private String id;
}
For this example, let's say this is the data I'm getting from API
private List<ExternalResourceA> externalCollectionA() {
final List<ExternalResourceA> collection = new ArrayList<>();
final var today = LocalDate.now();
for(int i = 0 ; i < 36; i++) {
collection.add(new ExternalResourceA(today.minusMonths(i), "type" + i));
}
Collections.shuffle(collection);
return collection;
}
private List<ExternalResourceB> externalCollectionB() {
final List<ExternalResourceB> collection = new ArrayList<>();
final var today = LocalDate.now();
for(int i = 0 ; i < 36; i++) {
collection.add(new ExternalResourceB(today.minusMonths(i), "id" + i));
}
Collections.shuffle(collection);
return collection;
}
Now, I need to combine data from ExternalResourceA and ExternalResourceB from the year 2019 and save it into MyResource. Some of the data might be missing; for example, I have A for March 2019 but no B for March 2019.
I managed to do that like this:
void filterResources() {
final var resourceAin2019 = getAFrom2019(externalCollectionA());
final var resourceBin2019 = getBFrom2019(externalCollectionB());
final List<MyResource> myCollection = new ArrayList<>();
for(int i = 0; i <= 12; i++) {
final var resource = new MyResource();
findA(resourceAin2019, i).ifPresent(res -> {
resource.setDate(res.getDate());
resource.setType(res.getType());
});
findB(resourceBin2019, i).ifPresent(res -> {
resource.setId(res.getId());
});
myCollection.add(resource);
}
myCollection.forEach(System.out::println);
}
private Optional<ExternalResourceA> findA(List<ExternalResourceA> list, int index) {
return list.size() > index ? Optional.ofNullable(list.get(index)) : Optional.empty();
}
private Optional<ExternalResourceB> findB(List<ExternalResourceB> list, int index) {
return list.size() > index ? Optional.ofNullable(list.get(index)) : Optional.empty();
}
private List<ExternalResourceA> getAFrom2019(List<ExternalResourceA> resource) {
return resource.stream()
.filter(res -> res.getDate().isAfter(LocalDate.parse("2019-01-01")) || res.getDate().isEqual(LocalDate.parse("2019-01-01")))
.sorted(Comparator.comparing(ExternalResourceA::getDate))
.collect(Collectors.toList());
}
private List<ExternalResourceB> getBFrom2019(List<ExternalResourceB> resource) {
return resource.stream()
.filter(res -> res.getDate().isAfter(LocalDate.parse("2019-01-01")) || res.getDate().isEqual(LocalDate.parse("2019-01-01")))
.sorted(Comparator.comparing(ExternalResourceB::getDate))
.collect(Collectors.toList());
}
It kind of works, but even in this simple example there are a lot of almost identical functions that just operate on different classes. In the real scenario this will grow even more, as I have many more structures to deal with. I'm wondering whether it's possible to make this much cleaner and simpler?
Edit
After further checking, my solution isn't working as I expected. It's simpler to show than to explain: here is the result when I'm missing data from ExternalResourceB for March 2019, and an empty value appears on the last element instead.
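One possible direction, offered only as a sketch: instead of pairing elements by list index (which shifts everything when one month is missing), key the merged objects by month and reuse one generic helper for every source type. The classes A, B and Merged below are hypothetical stand-ins for ExternalResourceA, ExternalResourceB and MyResource, and "from year 2019" is assumed to mean dates whose year is exactly 2019.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

class MergeByMonthSketch {
    // Hypothetical stand-ins for the external and generated classes.
    static final class A { final LocalDate date; final String type; A(LocalDate d, String t) { date = d; type = t; } }
    static final class B { final LocalDate date; final String id; B(LocalDate d, String i) { date = d; id = i; } }
    static final class Merged { LocalDate date; String type; String id; }

    // Generic helper: copies every item dated in the given year into the
    // merged entry for its month, creating that entry on first use.
    static <T> void mergeInto(Map<YearMonth, Merged> target, List<T> items, int year,
                              Function<T, LocalDate> dateOf, BiConsumer<Merged, T> apply) {
        for (T item : items) {
            LocalDate date = dateOf.apply(item);
            if (date.getYear() != year) {
                continue;
            }
            Merged merged = target.computeIfAbsent(YearMonth.from(date), k -> new Merged());
            apply.accept(merged, item);
        }
    }

    // Merges both sources; the TreeMap keeps the result sorted by month.
    static List<Merged> merge(List<A> as, List<B> bs, int year) {
        Map<YearMonth, Merged> byMonth = new TreeMap<>();
        mergeInto(byMonth, as, year, a -> a.date, (m, a) -> { m.date = a.date; m.type = a.type; });
        mergeInto(byMonth, bs, year, b -> b.date, (m, b) -> { if (m.date == null) m.date = b.date; m.id = b.id; });
        return new ArrayList<>(byMonth.values());
    }

    public static void main(String[] args) {
        List<A> as = Arrays.asList(new A(LocalDate.of(2019, 3, 1), "type3"), new A(LocalDate.of(2019, 1, 1), "type1"));
        List<B> bs = Arrays.asList(new B(LocalDate.of(2019, 1, 1), "id1"));
        for (Merged m : merge(as, bs, 2019)) {
            System.out.println(m.date + " " + m.type + " " + m.id);
        }
    }
}
```

With this shape, a missing B for March leaves the id empty on the March entry rather than on the last element, and supporting another source type only needs one more mergeInto call.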

How to make this piece of code thread safe?

This code is part of a method. It goes through two lists using two nested for loops. I want to see whether it is possible to use multiple threads to speed up these two loops. My concern is how to make it thread-safe.
EDITED: more complete code
static class Similarity {
double similarity;
String seedWord;
String candidateWord;
public Similarity(double similarity, String seedWord, String candidateWord) {
this.similarity = similarity;
this.seedWord = seedWord;
this.candidateWord = candidateWord;
}
public double getSimilarity() {
return similarity;
}
public String getSeedWord() {
return seedWord;
}
public String getCandidateWord() {
return candidateWord;
}
}
static class SimilarityTask implements Callable<Similarity> {
Word2Vec vectors;
String seedWord;
String candidateWord;
Collection<String> label1;
Collection<String> label2;
public SimilarityTask(Word2Vec vectors, String seedWord, String candidateWord, Collection<String> label1, Collection<String> label2) {
this.vectors = vectors;
this.seedWord = seedWord;
this.candidateWord = candidateWord;
this.label1 = label1;
this.label2 = label2;
}
@Override
public Similarity call() {
double similarity = cosineSimForSentence(vectors, label1, label2);
return new Similarity(similarity, seedWord, candidateWord);
}
}
Now, is this 'compute' thread safe? There are 3 variables involved:
1) vectors;
2) tokenizerFactory;
3) similarities;
public static void compute() throws Exception {
File modelFile = new File("sim.bin");
Word2Vec vectors = WordVectorSerializer.readWord2VecModel(modelFile);
TokenizerFactory tokenizerFactory = new TokenizerFactory();
List<String> seedList = loadSeeds();
List<String> candidateList = loadCandidates();
log.info("Computing similarity: ");
ExecutorService POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Future<Similarity>> tasks = new ArrayList<>();
int totalCount=0;
for (String seed : seedList) {
Collection<String> label1 = getTokens(seed.trim(), tokenizerFactory);
if (label1.isEmpty()) {
continue;
}
for (String candidate : candidateList) {
Collection<String> label2 = getTokens(candidate.trim(), tokenizerFactory);
if (label2.isEmpty()) {
continue;
}
Callable<Similarity> callable = new SimilarityTask(vectors, seed, candidate, label1, label2);
tasks.add(POOL.submit(callable));
log.info("TotalCount:" + (++totalCount));
}
}
Map<String, Set<String>> similarities = new HashMap<>();
int validCount = 0;
for (Future<Similarity> task : tasks) {
Similarity simi = task.get();
Double similarity = simi.getSimilarity();
String seedWord = simi.getSeedWord();
String candidateWord = simi.getCandidateWord();
Set<String> similarityWords = similarities.get(seedWord);
if (similarity >= 0.85) {
if (similarityWords == null) {
similarityWords = new HashSet<>();
}
similarityWords.add(candidateWord);
log.info(seedWord + " " + similarity + " " + candidateWord);
log.info("ValidCount: " + (++validCount));
}
if (similarityWords != null) {
similarities.put(seedWord, similarityWords);
}
}
}
Added one more relevant method, which is used by the call() method:
public static double cosineSimForSentence(Word2Vec vectors, Collection<String> label1, Collection<String> label2) {
try {
return Transforms.cosineSim(vectors.getWordVectorsMean(label1), vectors.getWordVectorsMean(label2));
} catch (Exception e) {
log.warn("OOV: " + label1.toString() + " " + label2.toString());
//e.getMessage();
//e.printStackTrace();
return 0.0;
}
}
(Answer updated for changed question.)
In general you should profile the code before attempting to optimise it, particularly if it is quite complex.
For threading, you need to identify which mutable state is shared between threads. Ideally, eliminate as much of that as possible before resorting to locks and concurrent data structures. Mutable state that is confined to one thread isn't a problem as such. Immutables are great.
I assume nothing passed to your task gets modified. It's tricky to tell. final on fields is a good idea. Collections can be placed in unmodifiable wrappers, though that doesn't stop them being modified via other references and does not show itself in static types.
Assuming you don't break up the inner loop, the only shared mutable state appears to be similarities and the values it contains.
You may or may not find you still end up doing too much serially and need to change similarities to become concurrent:
ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
The get and put of similarities will need to be thread-safe. I suggest always creating the Set.
Set<String> similarityWords = similarities.getOrDefault(seed, new HashSet<>());
or
Set<String> similarityWords = similarities.computeIfAbsent(seed, key -> new HashSet<>());
You could use a thread-safe Set (for instance with Collections.synchronizedSet), but I suggest holding a relevant lock for the entire inner loop.
synchronized (similarityWords) {
...
}
If you wanted to create similarityWords lazily then it would be "more fun".
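To make the suggestion above concrete, here is a small sketch under the assumption that worker threads write results directly into the shared map. SimilarityCollectorSketch and its method names are hypothetical. It uses computeIfAbsent, which stores the newly created set in the map (getOrDefault only returns the default without inserting it), and ConcurrentHashMap.newKeySet() as the thread-safe Set, which is a substitute for the explicit synchronized block.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SimilarityCollectorSketch {
    private final Map<String, Set<String>> similarities = new ConcurrentHashMap<>();

    // Safe to call from any worker thread: the set is created atomically
    // on first use, and the set itself is concurrent.
    void record(String seed, String candidate) {
        similarities.computeIfAbsent(seed, key -> ConcurrentHashMap.newKeySet()).add(candidate);
    }

    Map<String, Set<String>> view() {
        return similarities;
    }

    // Submits `tasks` concurrent record() calls for one seed and returns
    // how many distinct candidates ended up stored for it.
    static int demo(int tasks) {
        SimilarityCollectorSketch collector = new SimilarityCollectorSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            final String candidate = "candidate" + i;
            pool.submit(() -> collector.record("seed", candidate));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return collector.view().get("seed").size();
    }

    public static void main(String[] args) {
        System.out.println(demo(1000));
    }
}
```

In the question's code the results are collected on the submitting thread via Future.get(), so a plain HashMap is already sufficient there. The concurrent variant only becomes necessary if the tasks themselves write to similarities.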

Related jobs in JSprit, "one before another" case: IllegalArgumentException

This question is related to this topic : Related jobs in JSprit
I'm trying to use the "one before another" constraint, but I'm getting a java.lang.IllegalArgumentException: arg must not be null. It looks like Capacity cap2 is null when calculating Capacity max. I don't really understand why.
:(
Do you have an idea about this?
For the record, I'm on the 1.6.2 version. TY for your help.
String before = "2";
String after = "11";
final StateManager stateManager = new StateManager(problem);
stateManager.addStateUpdater(new JobsInRouteMemorizer(stateManager));
ConstraintManager constraintManager = new ConstraintManager(problem, stateManager);
constraintManager.addConstraint(new OneJobBeforeAnother(stateManager, before, after));
final RewardAndPenaltiesThroughSoftConstraints contrib = new RewardAndPenaltiesThroughSoftConstraints(problem, before, after);
SolutionCostCalculator costCalculator = new SolutionCostCalculator() {
@Override
public double getCosts(VehicleRoutingProblemSolution solution) {
double costs = 0.;
List<VehicleRoute> routes = (List<VehicleRoute>) solution.getRoutes();
for(VehicleRoute route : routes){
costs+=route.getVehicle().getType().getVehicleCostParams().fix;
costs+=stateManager.getRouteState(route, InternalStates.COSTS, Double.class);
costs+=contrib.getCosts(route);
}
return costs;
}
};
VehicleRoutingAlgorithmBuilder vraBuilder = new VehicleRoutingAlgorithmBuilder(problem,
"algorithmConfig.xml");
vraBuilder.addCoreConstraints();
vraBuilder.setStateAndConstraintManager(stateManager, constraintManager);
vraBuilder.addDefaultCostCalculators();
vraBuilder.setObjectiveFunction(costCalculator);
algorithm = vraBuilder.build();
public class JobsInRouteMemorizer implements StateUpdater, ActivityVisitor {
private StateManager stateManager;
private VehicleRoute route;
public JobsInRouteMemorizer(StateManager stateManager) {
super();
this.stateManager = stateManager;
}
@Override
public void begin(VehicleRoute route) {
this.route=route;
}
@Override
public void visit(TourActivity activity) {
if(activity instanceof JobActivity){
String jobId = ((JobActivity) activity).getJob().getId();
StateId stateId = stateManager.createStateId(jobId);
System.out.println(stateId.getIndex());
System.out.println(stateId.toString());
stateManager.putProblemState(stateId, VehicleRoute.class, this.route);
}
}
@Override
public void finish() {}
}
Short answer: You cannot create StateId instances on the fly. All StateId instances have to be generated before the algorithm is run. See longer answer for why doing this is still not a good idea and you should consider a redesign.
Analysis: I ran into the same problem and traced it back to the way StateId instances are created in StateManager:
public StateId createStateId(String name) {
if (createdStateIds.containsKey(name)) return createdStateIds.get(name);
if (stateIndexCounter >= activityStates[0].length) {
activityStates = new Object[vrp.getNuActivities() + 1][stateIndexCounter + 1];
vehicleDependentActivityStates = new Object[nuActivities][nuVehicleTypeKeys][stateIndexCounter + 1];
routeStatesArr = new Object[vrp.getNuActivities()+1][stateIndexCounter+1];
vehicleDependentRouteStatesArr = new Object[nuActivities][nuVehicleTypeKeys][stateIndexCounter+1];
problemStates = new Object[stateIndexCounter+1];
}
StateId id = StateFactory.createId(name, stateIndexCounter);
incStateIndexCounter();
createdStateIds.put(name, id);
return id;
}
Each time you create a new StateId and there is no more space available for states, the old state arrays are replaced with longer ones to make room for your new state (at the start there is space for 30 StateIds, a few of which are already used by JSprit itself). As you can see, the old elements aren't copied over, so what happens here is a race condition between UpdateLoads, which sets the state used as cap2; your code, which generates a new StateId and overwrites the current state; and UpdateMaxCapacityUtilisationAtActivitiesByLookingForwardInRoute, which reads the state (that doesn't exist anymore).
Given that this code only extends the arrays by one it is very inefficient to have many StateIds, as for each new StateId all arrays have to be recreated. To mitigate this I used only one StateId in my code and stored a Map<String, VehicleRoute> in it:
Map<String, VehicleRoute> routeMapping = Optional.ofNullable(stateManager.getProblemState(stateId, Map.class)).orElse(new ConcurrentHashMap<>());
This way you don't run out of StateId instances and can still store relations between an unlimited number of jobs.
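The effect described in the analysis can be reproduced with a toy model. ToyStateStore below is hypothetical and is not the JSprit StateManager; it only imitates the "replace the array without copying" behaviour with an initial capacity of 2 instead of 30, next to a copying variant for comparison.

```java
import java.util.Arrays;

// Hypothetical toy model of an index-based state store; NOT the JSprit API.
class ToyStateStore {
    private Object[] states = new Object[2];
    private int nextIndex = 0;

    // Imitates the behaviour described above: when the array is full it is
    // replaced by a longer one and the previous contents are dropped.
    int createIdLossy() {
        if (nextIndex >= states.length) {
            states = new Object[nextIndex + 1];
        }
        return nextIndex++;
    }

    // A copying variant, shown only for contrast: existing states survive growth.
    int createIdCopying() {
        if (nextIndex >= states.length) {
            states = Arrays.copyOf(states, nextIndex + 1);
        }
        return nextIndex++;
    }

    void put(int id, Object value) {
        states[id] = value;
    }

    Object get(int id) {
        return states[id];
    }

    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();
        int load = store.createIdLossy();
        store.createIdLossy();
        store.put(load, "cap2");
        store.createIdLossy(); // third id triggers the lossy regrow
        System.out.println(store.get(load)); // the stored value is gone
    }
}
```

In this model, any state written before the regrow reads back as null afterwards, which matches the symptom of cap2 being null once ids are created while the algorithm is running.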

Modifying local variable from inside lambda

Modifying a local variable in forEach gives a compile error:
Normal
int ordinal = 0;
for (Example s : list) {
s.setOrdinal(ordinal);
ordinal++;
}
With Lambda
int ordinal = 0;
list.forEach(s -> {
s.setOrdinal(ordinal);
ordinal++;
});
Any idea how to resolve this?
Use a wrapper
Any kind of wrapper is good.
With Java 10+, use this construct as it's very easy to set up:
var wrapper = new Object(){ int ordinal = 0; };
list.forEach(s -> {
s.setOrdinal(wrapper.ordinal++);
});
With Java 8+, use either an AtomicInteger:
AtomicInteger ordinal = new AtomicInteger(0);
list.forEach(s -> {
s.setOrdinal(ordinal.getAndIncrement());
});
... or an array:
int[] ordinal = { 0 };
list.forEach(s -> {
s.setOrdinal(ordinal[0]++);
});
Note: be very careful if you use a parallel stream. You might not end up with the expected result. Other solutions like Stuart's might be more adapted for those cases.
For types other than int
Of course, this is still valid for types other than int.
For instance, with Java 10+:
var wrapper = new Object(){ String value = ""; };
list.forEach(s->{
wrapper.value += "blah";
});
Or if you're stuck with Java 8 or 9, use the same kind of construct as we did above, but with an AtomicReference...
AtomicReference<String> value = new AtomicReference<>("");
list.forEach(s -> {
value.set(value.get() + s);
});
... or an array:
String[] value = { "" };
list.forEach(s-> {
value[0] += s;
});
This is fairly close to an XY problem. That is, the question being asked is essentially how to mutate a captured local variable from a lambda. But the actual task at hand is how to number the elements of a list.
In my experience, upward of 80% of the time there is a question of how to mutate a captured local from within a lambda, there's a better way to proceed. Usually this involves reduction, but in this case the technique of running a stream over the list indexes applies well:
IntStream.range(0, list.size())
.forEach(i -> list.get(i).setOrdinal(i));
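As a rough illustration of both ideas in this answer, the sketch below numbers elements by streaming over indexes and replaces two typical "mutate a captured local" patterns with reductions. NoCaptureSketch and its nested Example class are hypothetical; Example only mirrors the setOrdinal method used in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NoCaptureSketch {
    // Hypothetical element type with a settable ordinal, as in the question.
    static final class Example {
        private int ordinal = -1;
        void setOrdinal(int ordinal) { this.ordinal = ordinal; }
        int getOrdinal() { return ordinal; }
    }

    // Numbers the elements by streaming over indexes; no captured counter is needed.
    static void number(List<Example> list) {
        IntStream.range(0, list.size()).forEach(i -> list.get(i).setOrdinal(i));
    }

    // Reduction instead of appending to a captured String holder.
    static String concat(List<String> parts) {
        return parts.stream().collect(Collectors.joining());
    }

    // Reduction instead of incrementing a captured int holder.
    static int totalLength(List<String> parts) {
        return parts.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<Example> list = Arrays.asList(new Example(), new Example(), new Example());
        number(list);
        System.out.println(list.get(2).getOrdinal());
        System.out.println(concat(Arrays.asList("a", "b", "c")));
        System.out.println(totalLength(Arrays.asList("ab", "cde")));
    }
}
```

Note that the index-based version calls list.get(i), so it suits random-access lists such as ArrayList better than linked lists.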
If you only need to pass the value from the outside into the lambda, and not get it out, you can do it with a regular anonymous class instead of a lambda:
list.forEach(new Consumer<Example>() {
int ordinal = 0;
public void accept(Example s) {
s.setOrdinal(ordinal);
ordinal++;
}
});
As the variables used from outside the lambda have to be (implicitly) final, you have to use something like AtomicInteger or write your own data structure.
See
https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html#accessing-local-variables.
An alternative to AtomicInteger is to use an array (or any other object able to store a value):
final int ordinal[] = new int[] { 0 };
list.forEach ( s -> s.setOrdinal ( ordinal[ 0 ]++ ) );
But see Stuart's answer: there might be a better way to deal with your case.
Yes, you can modify local variables from inside lambdas (in the way shown by the other answers), but you should not do it. Lambdas have been made for functional style of programming and this means: No side effects. What you want to do is considered bad style. It is also dangerous in case of parallel streams.
You should either find a solution without side effects or use a traditional for loop.
If you are on Java 10, you can use var for that:
var ordinal = new Object() { int value; };
list.forEach(s -> {
s.setOrdinal(ordinal.value);
ordinal.value++;
});
You can wrap it up to workaround the compiler but please remember that side effects in lambdas are discouraged.
To quote the javadoc
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement
A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care
I had a slightly different problem. Instead of incrementing a local variable in the forEach, I needed to assign an object to the local variable.
I solved this by defining a private inner domain class that wraps both the list I want to iterate over (countryList) and the output I hope to get from that list (foundCountry). Then using Java 8 "forEach", I iterate over the list field, and when the object I want is found, I assign that object to the output field. So this assigns a value to a field of the local variable, not changing the local variable itself. I believe that since the local variable itself is not changed, the compiler doesn't complain. I can then use the value that I captured in the output field, outside of the list.
Domain Object:
public class Country {
private int id;
private String countryName;
public Country(int id, String countryName){
this.id = id;
this.countryName = countryName;
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getCountryName() {
return countryName;
}
public void setCountryName(String countryName) {
this.countryName = countryName;
}
}
Wrapper object:
private class CountryFound{
private final List<Country> countryList;
private Country foundCountry;
public CountryFound(List<Country> countryList, Country foundCountry){
this.countryList = countryList;
this.foundCountry = foundCountry;
}
public List<Country> getCountryList() {
return countryList;
}
public void setCountryList(List<Country> countryList) {
this.countryList = countryList;
}
public Country getFoundCountry() {
return foundCountry;
}
public void setFoundCountry(Country foundCountry) {
this.foundCountry = foundCountry;
}
}
Iterate operation:
int id = 5;
CountryFound countryFound = new CountryFound(countryList, null);
countryFound.getCountryList().forEach(c -> {
if(c.getId() == id){
countryFound.setFoundCountry(c);
}
});
System.out.println("Country found: " + countryFound.getFoundCountry().getCountryName());
You could remove the wrapper class method "setCountryList()" and make the field "countryList" final, but I did not get compilation errors leaving these details as-is.
To have a more general solution, you can write a generic Wrapper class:
public static class Wrapper<T> {
public T obj;
public Wrapper(T obj) { this.obj = obj; }
}
...
Wrapper<Integer> w = new Wrapper<>(0);
this.forEach(s -> {
s.setOrdinal(w.obj);
w.obj++;
});
(this is a variant of the solution given by Almir Campos).
In this specific case it is not a good solution, as Integer is worse than int for your purpose, but I think this solution is more general.

Null-free "maps": Is a callback solution slower than tryGet()?

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same.
But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches.
(Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.)
First, the callback example:
class CallbackMap<K, V> {
private final Map<K, V> backingMap;
public CallbackMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
void lookup(K key, Callback<K, V> handler) {
V val = backingMap.get(key);
if (val == null) {
handler.handleMissing(key);
} else {
handler.handleFound(key, val);
}
}
}
interface Callback<K, V> {
void handleFound(K key, V value);
void handleMissing(K key);
}
class CallbackExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private Callback<String, String> handler;
public CallbackExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
handler = new Callback<String, String>() {
public void handleFound(String key, String value) {
found.add(key + ": " + value);
}
public void handleMissing(String key) {
missing.add(key);
}
};
}
void test() {
CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
cbMap.lookup(key, handler);
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong):
class TryGetMap<K, V> {
private final Map<K, V> backingMap;
public TryGetMap(Map<K, V> backingMap) {
this.backingMap = backingMap;
}
boolean tryGet(K key, OutParameter<V> valueParam) {
V val = backingMap.get(key);
if (val == null) {
return false;
}
valueParam.value = val;
return true;
}
}
class OutParameter<V> {
V value;
}
class TryGetExample {
private final Map<String, String> map;
private final List<String> found;
private final List<String> missing;
private final OutParameter<String> out = new OutParameter<String>();
public TryGetExample(Map<String, String> map) {
this.map = map;
found = new ArrayList<String>(map.size());
missing = new ArrayList<String>(map.size());
}
void test() {
TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);
for (int i = 0, count = map.size(); i < count; i++) {
String key = "key" + i;
if (tgMap.tryGet(key, out)) {
found.add(key + ": " + out.value);
} else {
missing.add(key);
}
}
System.out.println(found.size() + " found");
System.out.println(missing.size() + " missing");
}
}
And finally, the performance test code:
public static void main(String[] args) {
int size = 200000;
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < size; i++) {
String val = (i % 5 == 0) ? null : "value" + i;
map.put("key" + i, val);
}
long totalCallback = 0;
long totalTryGet = 0;
int iterations = 20;
for (int i = 0; i < iterations; i++) {
{
TryGetExample tryGet = new TryGetExample(map);
long tryGetStart = System.currentTimeMillis();
tryGet.test();
totalTryGet += (System.currentTimeMillis() - tryGetStart);
}
System.gc();
{
CallbackExample callback = new CallbackExample(map);
long callbackStart = System.currentTimeMillis();
callback.test();
totalCallback += (System.currentTimeMillis() - callbackStart);
}
System.gc();
}
System.out.println("Avg. callback: " + (totalCallback / iterations));
System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}
On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished.
This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsciously penalized tryGet() implementation. Thoughts?
Updated: Per comment from Michael Aaron Safyan, fixed TryGetExample to reuse OutParameter.
I would say that neither design makes sense in practice, regardless of the performance. I would argue that both mechanisms are overly complicated and, more importantly, don't take into account actual usage.
Actual Usage
If a user looks up a value in a map and it isn't there, most likely the user wants one of the following:
To insert some value with that key into the map
To get back some default value
To be informed that the value isn't there
Thus I would argue that a better, null-free API would be:
has(key) which indicates if the key is present (if one only wishes to check for the key's existence).
get(key) which reports the value if the key is present; otherwise, throws NoSuchElementException.
get(key,defaultval) which reports the value for the key, or defaultval if the key isn't present.
setdefault(key,defaultval) which inserts (key,defaultval) if key isn't present, and returns the value associated with key (which is defaultval if there is no previous mapping, otherwise prev mapping).
The only way to get back null is if you explicitly ask for it, as in get(key,null). This API is incredibly simple, and yet is able to handle the most common map-related tasks (in most use cases that I have encountered).
I should also add that in Java, has() would be called containsKey(), while the closest equivalent of setdefault() is putIfAbsent() (which, unlike setdefault(), returns the previous value, or null if there was none). Because get() signals a key's absence via a NoSuchElementException, it becomes possible to associate a key with null and treat that as a legitimate association: if get() returns null, it means the key has been associated with the value null, not that the key is absent. (You can still define your API to disallow null values if you so choose, in which case the functions that add associations would throw an IllegalArgumentException when given null.) Another advantage of this API is that setdefault() only needs to perform the lookup once, instead of twice as in if( ! dict.has(key) ){ dict.set(key,val); }. A further advantage is that you do not surprise developers who write something like dict.get(key).doSomething() on the assumption that get() always returns a non-null object (because they have never inserted a null value into the dictionary). Instead, they get a NoSuchElementException if there is no value for that key, which is more consistent with the rest of the error checking in Java and is also much easier to understand and debug than a NullPointerException.
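As a rough illustration of the API described above, here is a minimal sketch in Java. The class name NullFreeMap, the helper set(), and the camel-cased setDefault() are hypothetical names chosen for this example; they are not part of java.util. For readability the sketch performs two lookups in setDefault(), whereas a real implementation could do it in one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical wrapper illustrating has/get/get-with-default/setdefault.
final class NullFreeMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();

    /** True if the key has a mapping (containsKey in java.util.Map). */
    boolean has(K key) {
        return delegate.containsKey(key);
    }

    /** Returns the mapped value, or throws if the key is absent. */
    V get(K key) {
        if (!delegate.containsKey(key)) {
            throw new NoSuchElementException("No mapping for key: " + key);
        }
        return delegate.get(key); // may be null only if null was stored explicitly
    }

    /** Returns the mapped value, or defaultVal if the key is absent. */
    V get(K key, V defaultVal) {
        return delegate.containsKey(key) ? delegate.get(key) : defaultVal;
    }

    /** Inserts (key, defaultVal) if the key is absent; returns the value now mapped. */
    V setDefault(K key, V defaultVal) {
        if (!delegate.containsKey(key)) {
            delegate.put(key, defaultVal);
            return defaultVal;
        }
        return delegate.get(key);
    }

    /** Unconditionally associates key with value (hypothetical helper). */
    void set(K key, V value) {
        delegate.put(key, value);
    }
}

class NullFreeMapDemo {
    public static void main(String[] args) {
        NullFreeMap<String, String> dict = new NullFreeMap<>();
        dict.set("a", "alpha");
        dict.set("n", null); // null stored as a legitimate value

        System.out.println(dict.get("a"));                 // present key
        System.out.println(dict.get("missing", "dflt"));   // falls back to the default
        System.out.println(dict.setDefault("b", "beta"));  // inserts, then returns "beta"
        try {
            dict.get("missing");
        } catch (NoSuchElementException e) {
            System.out.println("absent: " + e.getMessage());
        }
    }
}
```

Note how a key mapped to null and an absent key are now distinguishable: the first returns null from get(), the second throws.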
Answer To Question
To answer the original question: yes, you are unfairly penalizing the tryGet version. In your callback-based mechanism you construct the callback object only once and use it in all subsequent calls, whereas in your tryGet example you construct your out parameter object in every single iteration. Try taking the line:
OutParameter out = new OutParameter();
Take the line above out of the for-loop and see if that improves the performance of the tryGet example. In other words, place the line above the for-loop, and re-use the out parameter in each iteration.
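A minimal sketch of that change follows. The OutParameter and TryGetMap classes here are simplified, hypothetical stand-ins for the ones in the question (their real definitions may differ); the point is only where the allocation happens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the question's out-parameter holder.
final class OutParameter<V> {
    V value;
}

// Hypothetical stand-in for a map exposing tryGet().
final class TryGetMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();

    void put(K key, V value) {
        delegate.put(key, value);
    }

    /** Writes the looked-up value into out; returns true if it is non-null. */
    boolean tryGet(K key, OutParameter<V> out) {
        V v = delegate.get(key);
        out.value = v;
        return v != null;
    }
}

class ReuseOutParameterDemo {
    static int countFound(TryGetMap<String, String> map, int size) {
        // Allocated once, above the loop, instead of once per iteration.
        OutParameter<String> out = new OutParameter<>();
        int found = 0;
        for (int i = 0; i < size; i++) {
            if (map.tryGet("key" + i, out)) { // out.value is overwritten on each call
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        TryGetMap<String, String> map = new TryGetMap<>();
        int size = 10;
        for (int i = 0; i < size; i++) {
            map.put("key" + i, (i % 5 == 0) ? null : "value" + i);
        }
        System.out.println(countFound(map, size) + " found");
    }
}
```

Reusing the holder is safe here because each iteration reads out.value before the next call overwrites it.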
David, thanks for taking the time to write this up. I'm a C# programmer, so my Java skills are a bit vague these days. Because of this, I decided to port your code over and test it myself. I found some interesting differences and similarities, which are pretty much worth the price of admission as far as I'm concerned. Among the major differences are:
I didn't have to implement TryGet because it's built into Dictionary.
In order to use the native TryGet, instead of inserting nulls to simulate misses, I simply omitted those values. This still means that v = map[k] would have set v to null, so I think it's a proper port. In hindsight, I could have inserted the nulls and changed (_map.TryGetValue(key, out value)) to (_map.TryGetValue(key, out value) && value != null), but I'm glad I didn't.
I want to be exceedingly fair. So, to keep the code as compact and maintainable as possible, I used lambda expressions, which let me define the callbacks painlessly. This hides much of the complexity of setting up anonymous delegates, and allows me to use closures seamlessly. Ironically, the implementation of Lookup uses TryGet internally.
Instead of declaring a new type of Dictionary, I used an extension method to graft Lookup onto the standard dictionary, much simplifying the code.
With apologies for the less-than-professional quality of the code, here it is:
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    static class CallbackDictionary
    {
        public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
        {
            V v;
            if (map.TryGetValue(key, out v))
                found(key, v);
            else
                missed(key);
        }
    }

    class TryGetExample
    {
        private Dictionary<string, string> _map;
        private List<string> _found;
        private List<string> _missing;

        public TryGetExample(Dictionary<string, string> map)
        {
            _map = map;
            _found = new List<string>(_map.Count);
            _missing = new List<string>(_map.Count);
        }

        public void TestTryGet()
        {
            for (int i = 0; i < _map.Count; i++)
            {
                string key = "key" + i;
                string value;
                if (_map.TryGetValue(key, out value))
                    _found.Add(key + ": " + value);
                else
                    _missing.Add(key);
            }
            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }

        public void TestCallback()
        {
            for (int i = 0; i < _map.Count; i++)
                _map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));
            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int size = 2000000;
            var map = new Dictionary<string, string>(size);
            for (int i = 0; i < size; i++)
                if (i % 5 != 0)
                    map.Add("key" + i, "value" + i);

            long totalCallback = 0;
            long totalTryGet = 0;
            int iterations = 20;
            TryGetExample tryGet;
            for (int i = 0; i < iterations; i++)
            {
                tryGet = new TryGetExample(map);
                long tryGetStart = DateTime.UtcNow.Ticks;
                tryGet.TestTryGet();
                totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
                GC.Collect();

                tryGet = new TryGetExample(map);
                long callbackStart = DateTime.UtcNow.Ticks;
                tryGet.TestCallback();
                totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
                GC.Collect();
            }
            Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
            Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
        }
    }
}
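For readers following along in Java, the Lookup helper above could be sketched with Java 8 lambdas roughly as follows. Java has no extension methods, so this version is a static utility; the names CallbackLookup and lookup are hypothetical, and a null value is treated as a miss to match the original Java benchmark's convention.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical static helper mirroring the C# Lookup extension method.
final class CallbackLookup {
    private CallbackLookup() {}

    /** Calls found(key, value) on a hit, missed(key) when no non-null value exists. */
    static <K, V> void lookup(Map<K, V> map, K key,
                              BiConsumer<K, V> found, Consumer<K> missed) {
        V v = map.get(key);
        if (v != null) {
            found.accept(key, v);
        } else {
            missed.accept(key);
        }
    }
}

class CallbackLookupDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            if (i % 5 != 0) {
                map.put("key" + i, "value" + i); // every fifth key is omitted
            }
        }

        List<String> found = new ArrayList<>();
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            CallbackLookup.lookup(map, "key" + i,
                    (k, v) -> found.add(k + ": " + v),
                    k -> missing.add(k));
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}
```

As in the C# version, the lambdas close over the two result lists, so no callback interface has to be declared at the call site.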
My performance expectations, as I said in the article that inspired this one, would be that neither one is much faster or slower than the other. After all, most of the work is in the searching and adding, not in the simple logic that structures it. In fact, it varied a bit among runs, but I was unable to detect any consistent advantage.
Part of the problem is that I used a low-precision timer and the test was short, so I increased the count by 10x to 2000000 and that helped. Now callbacks are about 3% slower, which I do not consider significant. On my fairly slow machine, callbacks averaged 17773437 ticks (about 1.78 seconds, at 100 ns per tick) while TryGet averaged 17234375 ticks (about 1.72 seconds).
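The same timer concern applies to the Java benchmark, which uses System.currentTimeMillis(). A possible sketch of a timing helper built on System.nanoTime() is shown below; NanoTimer and averageNanos are hypothetical names, and this is not a substitute for a proper benchmarking harness (it does no warm-up and does not guard against JIT effects).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: averages the elapsed time of a task over several runs.
final class NanoTimer {
    private NanoTimer() {}

    /** Runs task `iterations` times and returns the mean elapsed nanoseconds. */
    static long averageNanos(Runnable task, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime(); // monotonic, intended for elapsed-time measurement
            task.run();
            total += System.nanoTime() - start;
        }
        return total / iterations;
    }
}

class NanoTimerDemo {
    public static void main(String[] args) {
        int size = 200_000;
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put("key" + i, (i % 5 == 0) ? null : "value" + i);
        }

        long avg = NanoTimer.averageNanos(() -> {
            int found = 0;
            for (int i = 0; i < size; i++) {
                if (map.get("key" + i) != null) {
                    found++;
                }
            }
            if (found < 0) {
                throw new IllegalStateException(); // keeps `found` observably used
            }
        }, 20);
        System.out.println("Avg. lookup pass: " + avg + " ns");
    }
}
```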
Now, as for code complexity, it's a bit unfair because TryGet is native, so let's just ignore the fact that I had to add a callback interface. At the calling spot, lambda notation did a great job of hiding the complexity. If anything, it's actually shorter than the if/then/else used in the TryGet version, although I suppose I could have used a ternary operator to make it equally compact.
On the whole, I found the C# to be more elegant, and only some of that is due to my bias as a C# programmer. Mainly, I didn't have to define and implement interfaces, which cut down on the plumbing overhead. I also used pretty standard .NET conventions, which seem to be a bit more streamlined than the sort of style favored in Java.
