HashMap vs Switch statement performance - java

A HashMap essentially has O(1) performance, while a switch statement can be either O(1) or O(log(n)) depending on whether the compiler emits a tableswitch or a lookupswitch.
Understandably, if a switch statement is written as such,
switch (value) {
    case 1:
    case 2:
    case 3:
    case 4:
    default:
}
then it would use a tableswitch and clearly have a performance advantage over a standard HashMap. But what if the switch statement is sparse? These would be two examples that I would be comparing:
HashMap<Integer, String> example = new HashMap<Integer, String>() {{
    put(1, "a");
    put(10, "b");
    put(100, "c");
    put(1000, "d");
}};
versus:
switch (value) {
    case 1:
        return "a";
    case 10:
        return "b";
    case 100:
        return "c";
    case 1000:
        return "d";
    default:
        return null;
}
What would provide more throughput, a lookupswitch or HashMap?
Does the overhead of the HashMap give the lookupswitch an early advantage that eventually tapers off as the number of cases/entries increases?
Edit: I tried some benchmarks using JMH; here are my results and the code used: https://gist.github.com/mooman219/bebbdc047889c7cfe612
As you mentioned, the lookupswitch outperformed the HashMap. I'm still wondering why, though.
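For reference, a minimal JMH harness for this kind of comparison could look like the sketch below (illustrative only; the class, method, and field names are made up and this is not the code from the gist):

import java.util.HashMap;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class LookupBenchmark {

    private final Map<Integer, String> map = new HashMap<>();
    private int key;

    @Setup
    public void setUp() {
        // Same sparse keys as in the question.
        map.put(1, "a");
        map.put(10, "b");
        map.put(100, "c");
        map.put(1000, "d");
        key = 100;
    }

    @Benchmark
    public String hashMapLookup() {
        return map.get(key);
    }

    @Benchmark
    public String lookupSwitch() {
        switch (key) {
            case 1:    return "a";
            case 10:   return "b";
            case 100:  return "c";
            case 1000: return "d";
            default:   return null;
        }
    }
}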

The accepted answer is wrong here.
http://java-performance.info/string-switch-implementation/
Switches will always be as fast as, if not faster than, hash maps. Switch statements are transformed into direct lookup tables. For integer values (ints, enums, shorts, longs) it is a direct lookup/jump to the statement; no additional hashing needs to happen.
For a String, the compiler precomputes the hash codes of the case labels and uses the input String's hashCode() to determine where to jump; on a collision it falls back to an if/else chain. Now you might think "This is the same as HashMap, right?" But that isn't true: the hash code for the lookup is computed at compile time, and it isn't reduced modulo a table size that depends on the number of elements, so there is a lower chance of collision.
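For the String case, the desugaring is roughly equivalent to the following hand-written sketch (illustrative only, not literal javac output):

static String describe(String s) {
    int index = -1;
    // First switch: jump on the compile-time-constant hash codes, then confirm with equals()
    switch (s.hashCode()) {
        case 97:  // "a".hashCode()
            if (s.equals("a")) index = 0;
            break;
        case 98:  // "b".hashCode()
            if (s.equals("b")) index = 1;
            break;
    }
    // Second switch: the original case bodies, keyed by the resolved index
    switch (index) {
        case 0:  return "matched a";
        case 1:  return "matched b";
        default: return "no match";
    }
}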
Switches have O(1) lookup, not O(n). (Ok, in truth for a small number of items, switches are turned into if/else statements. This provides better code locality and avoids additional memory lookups. However, for many items, switches are changed into the lookup table I mentioned above).
You can read more about it here
How does Java's switch work under the hood?

It depends:
If there are only a few, fixed items, use a switch if you can (worst case O(n)).
If there are many items, or you want to add future items without modifying much code, use a HashMap (access time is effectively constant).
You should not try to squeeze performance out of this case, because the difference in execution time is nanoseconds. Just focus on the readability/maintainability of your code. Is it worth optimizing a simple case to save a few nanoseconds?

TL;DR
Base it on code readability and maintainability. Both cost O(1) and the difference is negligible (though switches will generally be slightly faster).
In this particular case a map would be faster, as a switch returns an address and then must jump to that address to find the return value (a rare counter-example). If your switch is just calling functions anyway, a map would also be faster.
To make things faster, I would use numeric cases and avoid strings, via constants or enumerators (TypeScript).
(edited) I confirmed my expectation with How does Java's switch work under the hood?
More detailed answer
In the weeds:
A switch statement will usually be higher performance. It creates a lookup table and a goto reference and jumps straight to that point. However, there are exceptions, for example a simple mapping such as return map.get(x) vs. switch (1 => 'a', 2 => 'b', ...): the map can directly return the desired value, whereas the switch jumps to the matching address and then continues until a break or the end is reached.
In any event, they should be extremely similar in execution cost.
Think about maintainability and readability of the code
Using a map decouples the data, which gives you the benefit of creating "switch" cases dynamically. More details below.
If there are several complex functions/processes you need to handle, it may be easier to read/write if you utilize a map instead. Especially if the switch statement starts to exceed 20 or 30 options.
A personal use case for maps:
I have been using the following pattern for flux (Redux/useReducer) in React applications for some time.
I create a central map where the trigger is the key and the value is a functional reference. I can then load cases where and when it makes sense.
Initially I used this to break the use cases down, reduce file size, and group cases of similar function together in a more organized fashion. I later evolved it to be loaded by domain, configuring the events and data in a domain hook like useUser, useFeedback, useProfile, etc.
Doing so allowed me to organize the default state, initialization functions, events, and so forth into a logical file structure, and it also allowed me to keep the footprint low until needed.
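The same trigger-to-handler map works in plain Java too; here is a hedged sketch (the State record, action names, and class name are all made up for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class ReducerExample {
    record State(int count) {}

    // The action type is the key, the value is a functional reference;
    // handlers can be registered wherever it makes sense.
    private static final Map<String, UnaryOperator<State>> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("increment", s -> new State(s.count() + 1));
        HANDLERS.put("reset",     s -> new State(0));
    }

    static State reduce(State current, String action) {
        // Unknown actions fall back to the identity handler (no accidental fall-through).
        return HANDLERS.getOrDefault(action, UnaryOperator.identity()).apply(current);
    }
}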
One note to keep in mind
Using a map does not allow for fall-through, though most people consider that a code smell anyway. At the same time it protects you from accidental fall-through.

In your case, since you are using an Integer key for your HashMap and a plain 'int' for your switch statement, the best performing implementation will be the switch statement unless the number of passes through this section of code is very high (tens or hundreds of thousands).

If I have that kind of case I use Guava's ImmutableMap (you can use the Java 9 builders as well).
private static final Map<String, String> EXAMPLE = ImmutableMap.<String, String>builder()
        .put("a", "100")
        .put("b", "200")
        .build();
That way they are immutable and initialized only once.
Sometimes I use the strategy pattern that way:
private static final Map<String, Command> EXAMPLE = ImmutableMap.<String, Command>builder()
        .put("a", new SomethingCool())
        .put("b", new BCool())
        .build();

private static final Command DEFAULT = new DefCommand();
Use:
EXAMPLE.getOrDefault("a", DEFAULT).execute(); //java 8
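The Command implementations are not shown in the answer; a minimal sketch, keeping the answer's placeholder names, could be:

interface Command {
    void execute();
}

class SomethingCool implements Command {
    @Override public void execute() { System.out.println("something cool"); }
}

class BCool implements Command {
    @Override public void execute() { System.out.println("b cool"); }
}

class DefCommand implements Command {
    // Fallback used via getOrDefault when the key is unknown.
    @Override public void execute() { System.out.println("default behaviour"); }
}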
As for performance, just pick readability. You will thank me later (a year later) :D.


AtomicInteger & lambda expressions in single-threaded app

I need to modify a local variable inside a lambda expression in a JButton's ActionListener and since I'm not able to modify it directly, I came across the AtomicInteger type.
I implemented it and it works just fine but I'm not sure if this is a good practice or if it is the correct way to solve this situation.
My code is the following:
newAnchorageButton.addActionListener(e -> {
    AtomicInteger anchored = new AtomicInteger();
    anchored.set(0);
    cbSets.forEach(cbSet ->
        cbSet.forEach(cb -> {
            if (cb.isSelected())
                anchored.incrementAndGet();
        })
    );
    // more code where I use the 'anchored' variable...
});
I'm not sure if this is the right way to solve this since I've read that AtomicInteger is used mostly for concurrency-related applications and this program is single-threaded, but at the same time I can't find another way to solve this.
I could simply use two nested for-loops to go over those arrays, but I'm trying to reduce the method's cognitive complexity as much as I can according to the SonarLint VS Code extension, and leaving those for-loops in theoretically increases the method's complexity and hurts its readability and maintainability.
Replacing the for-loops with lambda expressions reduces the cognitive complexity but maybe I shouldn't pay that much attention to it.
While it is safe enough in single-threaded code, it would be better to count them in a functional way, like this:
long anchored = cbSets.stream() // get a stream of the sets
.flatMap(List::stream) // flatten to list of cb's
.filter(JCheckBox::isSelected) // only selected ones
.count(); // count them
Instead of mutating an accumulator, we limit the flattened stream to only the ones we're interested in and ask for the count.
More generally, though, it is always possible to sum things up or generally aggregate the values without a mutable variable. Consider:
record Country(int population) { }
countries.stream()
        .mapToInt(Country::population)
        .reduce(0, Math::addExact);
Note: we never mutate any values; instead, we combine each successive value with the preceding one, producing a new value. One could use sum() but I prefer reduce(0, Math::addExact) to avoid the possibility of overflow.
and leaving those for-loops theoretically increases the method complexity and therefore its readability and maintainability.
This is obvious horsepuckey. x.forEach(foo -> bar) is not 'cognitively simpler' than for (var foo : x) bar; - you can map each AST node straight over from one to the other.
If a definition is being used to define complexity which concludes that one is significantly more complex than the other, then the only correct conclusion is that the definition is silly and should be fixed or abandoned.
To make it practical: Yes, introducing AtomicInteger, whilst performance wise it won't make one iota of difference, does make the code way more complicated. AtomicInteger's simple existence in the code suggests that concurrency is relevant here. It isn't, so you'd have to add a comment to explain why you're using it. Comments are evil. (They imply the code does not speak for itself, and they cannot be tested in any way). They are often the least evil, but evil they are nonetheless.
The general 'trick' for keeping lambda-based code cognitively easily followed is to embrace the pipeline:
You write some code that 'forms' a stream. This can be as simple as list.stream(), but sometimes you do some stream joining or flatmapping a collection of collections.
You have a pipeline of operations that operate on single elements in the stream and do not refer to the whole or to any neighbour.
At the end, you reduce (using collect, reduce, max - some terminator) such that the reducing method returns what you need.
The above model (and the other answer follows it precisely) tends to result in code that is as readable/complex as the 'old style' code, and rarely (but sometimes!) more readable, and significantly less complicated. Deviate from it and the result is virtually always considerably more complicated - a clear loser.
Not all for loops in Java fit the above model. If it doesn't fit, then trying to force that particular square peg into the round hole will take a lot of effort and almost always results in code that is significantly worse: either an order of magnitude slower or considerably more cognitively complicated.
It also means that it is virtually never 'worth' rewriting perfectly fine readable non-stream based code into stream based code; at best it becomes a percentage point more readable according to some personal tastes, with no significant universally agreed upon improvement.
Turn off that silly linter rule. The fact that it considers the above 'less' complex, and that it evidently determines that for (var foo : x) bar; is 'more complicated' than x.forEach(foo -> bar) is proof enough that it's hurting way more than it is helping.
I have the following to add to the two other answers:
Two general good practices in your code are in question:
Lambdas shouldn't be longer than 3-4 lines
Except in some precise cases, lambdas of stream operations should be stateless.
For #1, consider extracting the code of the lambda to a private method for example, when it's getting too long.
You will probably gain in readability, and you will also probably gain in better separating UI from business logic.
For #2, you are probably not concerned since you are working in a single thread at the moment, but streams can be parallelized, and they may not always execute exactly as you think they do.
For that reason, it's always better to keep the code stateless in stream pipeline operations. Otherwise you might be surprised.
More generally, streams are very good, very concise, but sometimes it's just better to do the same with good old loops.
Don't hesitate to come back to classic loops.
When Sonar tells you that the complexity is too high, in fact, you should try to factorize your code: split into smaller methods, improve the model of your objects, etc.

Reducing complexity of large switch statements [duplicate]

This question already has answers here:
Eliminating `switch` statements [closed]
(23 answers)
Closed 5 years ago.
In the codebase I'm currently working on, it's common to have to take a string passed in from further up the chain and use it as a key to find a different String. The current standard idiom is to use switch statements, however for larger switch statements (think ~20-30 cases) sonarqube says it's a code smell and should be reduced for cyclomatic complexity. My current solution is to use a static HashMap, like so
private static final HashMap<String, String> sortMap;

static {
    sortMap = new HashMap<>();
    sortMap.put("foo1", "bar1");
    sortMap.put("foo2", "bar2");
    sortMap.put("foo3", "bar3");
    // etc...
}

protected String mapSortKey(String key) {
    return sortMap.get(key);
}
However this doesn't seem to actually be any cleaner, and if anything seems more confusing for maintainers. Is there a better way to solve this? Or should sonarqube be ignored in this situation? I am aware of using polymorphism, i.e. Ways to eliminate switch in code, however that seems like it is overkill for this problem, as the switch statements are being used as makeshift data structures rather than as rudimentary polymorphism. Other similar questions I've found about reducing switch case cyclomatic complexity aren't really applicable in this instance.
If, by your example, this is just the case of choosing a mapped value from a key, a table or properties file would be a more appropriate way to handle this.
If you're talking about logic within the different switch statements, you might find that a rules engine would suit better.
You hit upon the major requirement: maintainability. If we hard-code too much logic or too much data, we make brittle code. Choose a design pattern suited to the type of switched information and export the functionality into a maintainable place for whoever must make changes later... because with a long list like this, chances are high that changes will occur with some frequency.
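For the first case (a plain key-to-value mapping), a properties-file version of the asker's example could look like the following sketch (the file name sort-keys.properties is made up; it would contain lines such as foo1=bar1):

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

class SortKeyMapper {
    private static final Properties SORT_MAP = new Properties();

    static {
        // Loaded once from the classpath; editing the file requires no code change.
        try (InputStream in = SortKeyMapper.class.getResourceAsStream("/sort-keys.properties")) {
            if (in == null) {
                throw new IllegalStateException("sort-keys.properties not found on the classpath");
            }
            SORT_MAP.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String mapSortKey(String key) {
        return SORT_MAP.getProperty(key); // null if the key is unknown, like HashMap.get
    }
}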

How do I make multiple ORs in an if statement more concise in Java

I have an if statement along the lines of:
if (characterStrings[occupation] == "bar-owner" || characterStrings[occupation] == "barrista" || characterStrings[occupation] == "shop owner")
How can I make this and similar or statements more concise in java?
Thanks very much; I haven't been able to find documentation of this anywhere.
You could use the following code:
if (Arrays.asList("bar-owner", "barrista", "shop owner").contains(characterStrings[occupation]))
This will check whether characterStrings[occupation] is any of "bar-owner", "barrista" or "shop owner".
Use switch-case instead of multiple ORs (||).
From Java 7 onwards, switch supports Strings as well.
A small example:
switch (characterStrings[occupation]) {
    case "bar-owner":
        // some code for bar-owner
        break;
    case "barrista":
        // code for barrista
        break;
}
First, you should NOT compare strings using ==. It is nearly always a bug. For example:
if ("hello" == new String("hello")) {
System.out.println("Something impossible just happened!!");
}
(The only cases where it is not a bug involve comparison of String literals and/or manually "interned" String objects. And even then, it is a rather dubious optimization because its correctness depends on you never using "normal" strings.)
In Java 6 and earlier there is no way to do a sequence of String equals comparisons that is BOTH more concise than AND as efficient as the original version.
Using Arrays.asList, as in
if (Arrays.asList("bar-owner", "barrista", "shop owner")
        .contains(characterStrings[occupation])) {
    // statements
}
is more concise, but it is also significantly less efficient:
The contains call must internally iterate over the elements of the list object, testing each one with equals.
The asList call involves allocating and initializing the String[] for the varargs argument, and allocating and initializing the List object returned by the call. (You can potentially "hoist" this to improve performance, but that detracts from the conciseness ...)
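Hoisting, as mentioned above, could look like the following sketch (the class and constant names are made up for illustration):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OccupationCheck {
    // Built once, so the per-call cost is just a single hash lookup.
    private static final Set<String> SHOP_OCCUPATIONS =
            new HashSet<>(Arrays.asList("bar-owner", "barrista", "shop owner"));

    static boolean isShopOccupation(String occupation) {
        return SHOP_OCCUPATIONS.contains(occupation);
    }
}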
In Java 7:
switch (characterStrings[occupation]) {
    case "bar-owner": case "barrista": case "shop owner":
        // statements
        break;
}
is more concise, and could also be more efficient. It is plausible that the Java compiler(s) could turn that into a lookup in a hidden static HashSet<String> or the equivalent. There is going to be a "break even" point where cost of a sequence of N equals tests is greater than the cost of a hash table lookup.
If you have to check all those conditions, then you have to check all those conditions. However, a switch block is cleaner with many cases to check for.
Also note that you are comparing strings with ==. Don't. Use the equals() method.
If you are comparing strings, use the equals() method instead of ==.
You can also use a switch.

Call Method based on user preferences, which is faster/better

We have X methods, and we would like to call the one corresponding to the user's settings. Which of the following runs faster?
Case 1:
int userSetting = 1;
Method method = this.getClass().getDeclaredMethod("Method" + userSetting);
method.invoke(this);
Case 2:
int userSetting = 1;
switch (userSetting) {
    case 0:
        Method0();
        break;
    case 1:
        Method1();
        break;
    ...
}
Case 3:
int userSetting = 1;
if (userSetting == 0) {
    Method0();
} else if (userSetting == 1) {
    Method1();
} else ...
Also:
Do you think one, even if slower, is better practice than the others? If yes, why?
If there is another way which is better/faster, please tell us.
Thanks
Option 1 uses reflection, and thus will probably be slower, as the javadocs indicate:
Performance Overhead
Because reflection involves types that are dynamically resolved, certain Java virtual machine optimizations can not be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts, and should be avoided in sections of code which are called frequently in performance-sensitive applications.
However, it is easier to maintain this option than options 2 and 3.
I would suggest a completely different option: use the Strategy design pattern. It is likely to be faster and much more readable than the alternatives.
As amit points out, this is a case for the Strategy design pattern. Additionally, I want to give a short example:
Pseudo-Code:
public interface Calculator {
    public int calc(...);
}

public class FastCalc implements Calculator {
    public int calc(...) {
        // Do the fast stuff here
    }
}

public class SlowCalc implements Calculator {
    public int calc(...) {
        // Do the slow stuff here
    }
}
Your main program then decides which strategy to use based on the user preferences:
Calculator calc = userPreference.getBoolean("fast") ? new FastCalc() : new SlowCalc();
int result = calc.calc(...);
This is useful because, later, you can use the Factory pattern to create multiple strategies for various operations:
Factory factory = new SlowFactory();
Calculator calc = factory.createCalculator();
Operation op = factory.createSomeOtherOperation();

// or, with the fast variant:
Factory factory = new FastFactory();
Calculator calc = factory.createCalculator();
Operation op = factory.createSomeOtherOperation();
As you can see, the code is the same for the slow case and the fast case, except for the factory class, and you can choose that class based on the user preference. Especially if you have more such operations, like Calculator and my Operation example, you will want your code to depend on the user preference in only a single place.
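The factory types themselves are not shown above; a hedged sketch, continuing the answer's pseudo-code and placeholder names, might be:

interface Operation {
    void run();
}

interface Factory {
    Calculator createCalculator();
    Operation createSomeOtherOperation();
}

class FastFactory implements Factory {
    @Override public Calculator createCalculator() { return new FastCalc(); }
    @Override public Operation createSomeOtherOperation() { return () -> { /* fast operation */ }; }
}

class SlowFactory implements Factory {
    @Override public Calculator createCalculator() { return new SlowCalc(); }
    @Override public Operation createSomeOtherOperation() { return () -> { /* slow operation */ }; }
}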
I think the obviously slowest version is number one: reflection is complex and is resolved at runtime. For number 2 and number 3 you could have a look at Java: case-statement or if-statement efficiency perspective.
Another thought: can the user's configuration change during execution? If not, make the decision only once at start-up.
Case 1 uses reflection and suffers a performance hit beyond approaches 2 and 3.
Between approaches 2 and 3 the performance difference would be marginal at most. You must ask yourself whether any possible performance gain is really justified over code readability. Unless you are on a truly limited microchip or similar, I would always answer no.
Apart from the performance view, as @HoeverCraft Full Of Eels already pointed out, you're probably better off redesigning your program to completely avoid the series of conditional clauses.
As all others have said #1 will most likely be the slowest.
The differences between 2 and 3 are negligible, but generally #2 shouldn't be slower than #3, because the compiler can change a switch to a cascaded if, if it thinks it would be faster. Also since the switch is clearly better readable than the if/else cascade I'd go with the second anyhow.
Although I'm extremely sure that this isn't the bottleneck anyhow, even when using reflection.

Performance of extra string comparisons vs HashMap lookups

Assume I am running either of the code snippets below for a list of 1000 Event entries (in allEventsToAggregate). Would I see a performance improvement in the first implementation if the events in allEventsToAggregate are sorted by customerId, with each customer having roughly 3 events? This is essentially a question of string comparison vs. HashMap lookup performance.
Option 1:
Map<String, List<Event>> eventsByCust = new HashMap<String, List<Event>>();
List<Event> thisCustEntries = null;
String lastCust = null;
for (Event thisEvent : allEventsToAggregate) {
    if (!thisEvent.getCustomerId().equals(lastCust)) {
        thisCustEntries = eventsByCust.get(thisEvent.getCustomerId());
        if (thisCustEntries == null) {
            thisCustEntries = new ArrayList<Event>();
        }
    }
    thisCustEntries.add(thisEvent);
    eventsByCust.put(thisEvent.getCustomerId(), thisCustEntries);
    lastCust = thisEvent.getCustomerId();
}
Option 2:
Map<String, List<Event>> eventsByCust = new HashMap<String, List<Event>>();
for (Event thisEvent : allEventsToAggregate) {
    List<Event> thisCustEntries = eventsByCust.get(thisEvent.getCustomerId());
    if (thisCustEntries == null) {
        thisCustEntries = new ArrayList<Event>();
        eventsByCust.put(thisEvent.getCustomerId(), thisCustEntries);
    }
    thisCustEntries.add(thisEvent);
}
Would I see a performance improvement
Almost certainly not. Unless this block represents a critical inner loop of your application, any marginal performance gains will almost certainly be unnoticeable.
Consequently, I would go with the second version of the code, as it's a clearer expression of your intent and so will be easier to maintain (as well as being slightly less prone to subtle bugs in the first place). Maintainability almost certainly trumps making the application 0.001% faster.
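If you do go with the second version, a Java 8 computeIfAbsent variant is even more compact (a sketch reusing the question's Event type and variable names):

Map<String, List<Event>> eventsByCust = new HashMap<>();
for (Event thisEvent : allEventsToAggregate) {
    // Creates the list on the first event for a customer, then appends.
    eventsByCust.computeIfAbsent(thisEvent.getCustomerId(), k -> new ArrayList<>())
                .add(thisEvent);
}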
1) Remember that a successful retrieval of an item from a HashMap requires a string compare to confirm that you really have found the correct item.
2) We seem to be talking about very small differences in execution time, not real algorithmic improvements. Is it really worth losing readability for this?
3) For small differences, the only way to really know is to actually time the thing in practice; in fact, not just to run a comparison, but to organise it as a fully fledged scientific experiment. There is just too much to worry about these days: what your compiler and runtime system have chosen to optimise, what CPU caching or VM page faulting means, and what Java garbage collection thinks of your algorithm. Then, of course, you may well find that you get different answers for different versions of Java, or on hardware with different CPUs, motherboards, or memory sizes, or even depending on how long the system has been running and thus how much time it has had to migrate its disk contents into memory cache, JIT-compile relevant bits of Java, or whatever.
