I have a method that checks all of the combinations of 5 different conditions with 32 if-else statements (think of the truth table). The 5 different letters represent methods that each run their own regular expressions on a string, and return a boolean indicating whether or not the string matches the regex. For example:
if (A && B && C && D && E) {
} else if (A && B && C && D && !E) {
} else if (A && B && C && !D && !E) {
} ...etc,etc.
However, it is really affecting the performance of my application (sorry, I can't go into too many details). Can anyone recommend a better way to handle such logic?
Each method using a regular expression looks like this:
String re1 = "regex here";
Pattern p = Pattern.compile(re1, Pattern.DOTALL);
Matcher m = p.matcher(value);
return m.find();
Thanks!
You can try
boolean a,b,c,d,e;
int combination = (a?16:0) + (b?8:0) + (c?4:0) + (d?2:0) + (e?1:0);
switch(combination) {
case 0:
break;
// through to
case 31:
break;
}
Represent each condition as a bit flag, test each condition once, and set the relevant flag in a single int. Then switch on the int value.
int result = 0;
if(A) {
result |= 1;
}
if(B) {
result |= 2;
}
// ...
switch(result) {
case 0: // (!A,!B,!C,!D,!E)
case 1: // (A,!B,!C,!D,!E)
// ...
}
All the above answers are wrong, because the correct answer to an optimisation question is: Measure! Use a profiler to measure where your code is spending its time.
Having said that, I'd be prepared to bet that the biggest win is avoiding compiling the regexes more than once each. And after that, as others suggested, only evaluate each condition once and store the results in boolean variables. So thait84 has the best answer.
I'm also prepared to bet that jtahlborn's, Peter Lawrey's and Salvatore Previti's suggestions (essentially the same), clever though they are, will get you negligible additional benefit, unless you're running on a 6502...
(This answer reads like I'm full of it, so in the interests of full disclosure I should mention that I'm actually hopeless at optimisation. But measuring still is the right answer.)
Without knowing more details, it might help to arrange the if statements so that the conditions doing the "heavy" lifting are evaluated last. This assumes that the other conditionals will be true, so the "heavy" ones are avoided altogether. In short, take advantage of short-circuit evaluation if possible.
Run each regex once per string, store the results in booleans, and do the if/else on the booleans instead of running the regexes multiple times. Also, if you can, compile each regex once and reuse the compiled Pattern.
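A minimal sketch of that advice combined with the bit-flag idea from the other answers: each regex is compiled once, run once per string, and the five results are packed into one int. The class name ConditionMask and the five one-letter patterns are placeholders (assumptions), since the question does not show the real regexes.

```java
import java.util.regex.Pattern;

final class ConditionMask {
    // Hypothetical patterns standing in for the five real regexes; each is compiled once.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("a", Pattern.DOTALL),
        Pattern.compile("b", Pattern.DOTALL),
        Pattern.compile("c", Pattern.DOTALL),
        Pattern.compile("d", Pattern.DOTALL),
        Pattern.compile("e", Pattern.DOTALL)
    };

    // Runs each regex exactly once; bit i is set when PATTERNS[i] finds a match.
    static int maskOf(String value) {
        int mask = 0;
        for (int i = 0; i < PATTERNS.length; i++) {
            if (PATTERNS[i].matcher(value).find()) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    // Branches on the combined result instead of re-testing the conditions.
    static String describe(String value) {
        switch (maskOf(value)) {
            case 0:  return "no condition matched";
            case 31: return "all five matched";
            default: return "some other combination";
        }
    }
}
```

Whether this is actually faster than the original still has to be measured with a profiler, as the answer above says.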
One possible solution: use a switch creating a binary value.
int value = (a ? 1 : 0) | (b ? 2 : 0) | (c ? 4 : 0) | (d ? 8 : 0) | (e ? 16 : 0);
switch (value)
{
case 0:
case 1:
case 2:
case 3:
case 4:
...
case 31:
}
If you can avoid the switch and use an array it would be faster.
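One way the array variant might look, as a sketch only: the combined value indexes directly into a 32-slot table. The class name MaskTable and the outcome strings are made up for illustration. Whether this beats a switch is not certain and should be measured, since a dense switch is usually compiled to a jump table anyway.

```java
import java.util.Arrays;

final class MaskTable {
    // One slot per combination of the five flags (2^5 = 32). Contents are illustrative.
    private static final String[] OUTCOME = new String[32];
    static {
        Arrays.fill(OUTCOME, "default handling");
        OUTCOME[0] = "nothing matched";
        OUTCOME[31] = "everything matched";
    }

    // Same encoding as in the answer: a = bit 0, ..., e = bit 4.
    static int maskOf(boolean a, boolean b, boolean c, boolean d, boolean e) {
        return (a ? 1 : 0) | (b ? 2 : 0) | (c ? 4 : 0) | (d ? 8 : 0) | (e ? 16 : 0);
    }

    // A single array read replaces the 32-way branch.
    static String outcome(boolean a, boolean b, boolean c, boolean d, boolean e) {
        return OUTCOME[maskOf(a, b, c, d, e)];
    }
}
```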
Maybe partition it into layers, like so:
if(A) {
if(B) {
//... the rest
} else {
//... the rest
}
} else {
if(B) {
//... the rest
} else {
//... the rest
}
}
Still, feels like there must be a better way to do this.
I have a solution with EnumSet. However it's too verbose and I guess I prefer @Peter Lawrey's solution.
In Effective Java by Bloch it's recommended to use EnumSet over bit fields, but I would make an exception here. Nonetheless I posted my solution because it could be useful for someone with a slightly different problem.
import java.util.EnumSet;
public enum MatchingRegex {
Tall, Blue, Hairy;
public static EnumSet<MatchingRegex> findValidConditions(String stringToMatch) {
EnumSet<MatchingRegex> validConditions = EnumSet.noneOf(MatchingRegex.class);
if (... check regex stringToMatch for Tall)
validConditions.add(Tall);
if (... check regex stringToMatch for Blue)
validConditions.add(Blue);
if (... check regex stringToMatch for Hairy)
validConditions.add(Hairy);
return validConditions;
}
}
and you use it like this:
Set<MatchingRegex> validConditions = MatchingRegex.findValidConditions(stringToMatch);
if (validConditions.equals(EnumSet.of(MatchingRegex.Tall, MatchingRegex.Blue, MatchingRegex.Hairy)))
...
else if (validConditions.equals(EnumSet.of(MatchingRegex.Tall, MatchingRegex.Blue)))
...
else if ... all 8 conditions like this
But it would be more efficient like this:
if (validConditions.contains(MatchingRegex.Tall)) {
if (validConditions.contains(MatchingRegex.Blue)) {
if (validConditions.contains(MatchingRegex.Hairy))
... // tall blue hairy
else
... // tall blue (not hairy)
} else {
if (validConditions.contains(MatchingRegex.Hairy))
... // tall (not blue) hairy
else
... // tall (not blue) (not hairy)
}
} else {
... remaining 4 conditions
}
You could also adapt your if/else to a switch/case (which I understand is faster)
Pre-computing A, B, C, D and E as booleans rather than evaluating them inside the if conditions would improve both readability and performance. If you're also concerned about the performance of the different cases, you may organise them as a tree or combine them into a single integer (X = (A?1:0)|(B?2:0)|...|(E?16:0)) that you'd use in a switch.
I have an if statement with many conditions (have to check for 10 or 15 constants to see if any of them are present.)
Instead of writing something like:
if (x == 12 || x == 16 || x == 19 || ...)
is there any way to format it like
if x is [12, 16, 19]?
Just wondering if there is an easier way to code this, any help appreciated.
The answers have been very helpful, but I was asked to add more detail by a few people, so I will do that to satiate their curiosity. I was making a date validation class that needed to make sure days were not > 30 in the months that have only 30 days (of which there are 4, I think) and I was writing an if statement to check things like this:
if (day > 30 && (month == 4 || month == 6 || month == 9 || month == 11))
I was just wondering if there was a faster way to code things like that - many of the answers below have helped.
I use this kind of pattern often. It's very compact:
// Define a constant in your class. Use a HashSet for performance
private static final Set<Integer> values = new HashSet<Integer>(Arrays.asList(12, 16, 19));
// In your method:
if (values.contains(x)) {
...
}
A HashSet is used here to give good look-up performance - even very large hash sets are able to execute contains() extremely quickly.
If performance is not important, you can code the gist of it into one line:
if (Arrays.asList(12, 16, 19).contains(x))
but know that it will create a new ArrayList every time it executes.
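Applied to the date check from the question, the same pattern could look like the sketch below. The names DayCheck, THIRTY_DAY_MONTHS and exceedsThirtyDayMonth are invented for illustration; the month numbers are the ones from the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class DayCheck {
    // April, June, September and November have 30 days.
    private static final Set<Integer> THIRTY_DAY_MONTHS =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(4, 6, 9, 11)));

    // True when the day exceeds 30 in a 30-day month, i.e. the condition from the question.
    static boolean exceedsThirtyDayMonth(int day, int month) {
        return day > 30 && THIRTY_DAY_MONTHS.contains(month);
    }
}
```

The set is built once as a constant, so each call only pays for boxing the month and one hash lookup.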
Do you want to switch to this??
switch(x) {
case 12:
case 16:
case 19:
//Do something
break;
default:
//Do nothing or something else..
break;
}
If the set of possibilities is "compact" (i.e. largest-value - smallest-value is, say, less than 200) you might consider a lookup table. This would be especially useful if you had a structure like
if (x == 12 || x == 16 || x == 19 || ...)
else if (x==34 || x == 55 || ...)
else if (...)
Set up an array with values identifying the branch to be taken (1, 2, 3 in the example above) and then your tests become
switch(dispatchTable[x])
{
case 1:
...
break;
case 2:
...
break;
case 3:
...
break;
}
Whether or not this is appropriate depends on the semantics of the problem.
If an array isn't appropriate, you could use a Map<Integer,Integer>, or if you just want to test membership for a single statement, a Set<Integer> would do. That's a lot of firepower for a simple if statement, however, so without more context it's kind of hard to guide you in the right direction.
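A rough sketch of how such a dispatch table might be set up, using the values from the example (12, 16, 19 for branch 1; 34, 55 for branch 2). The class name BranchTable, the table size and the returned labels are assumptions; 0 is used here to mean "no branch".

```java
final class BranchTable {
    // Covers 0..55; entries left at 0 mean "no branch applies".
    private static final int[] DISPATCH = new int[56];
    static {
        for (int v : new int[] {12, 16, 19}) DISPATCH[v] = 1;
        for (int v : new int[] {34, 55}) DISPATCH[v] = 2;
    }

    // Guards against values outside the table before indexing.
    static int branchFor(int x) {
        return (x >= 0 && x < DISPATCH.length) ? DISPATCH[x] : 0;
    }

    static String handle(int x) {
        switch (branchFor(x)) {
            case 1:  return "first group";
            case 2:  return "second group";
            default: return "no group";
        }
    }
}
```

The bounds check matters: without it, any x outside the table would throw an ArrayIndexOutOfBoundsException instead of falling through to the default case.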
Use a collection of some sort - this will make the code more readable and hide away all those constants. A simple way would be with a list:
// Declared with constants
private static List<Integer> myConstants = new ArrayList<Integer>(){{
add(12);
add(16);
add(19);
}};
// Wherever you are checking for presence of the constant
if(myConstants.contains(x)){
// ETC
}
As Bohemian points out the list of constants can be static so it's accessible in more than one place.
For anyone interested, the list in my example is using double brace initialization. Since I ran into it recently I've found it nice for writing quick & dirty list initializations.
You could look for the presence of a map key or see if it's in a set.
Depending on what you're actually doing, though, you might be trying to solve the problem wrong :)
No, you cannot do that in Java. You can, however, write a method as follows:
boolean isContains(int i, int ... numbers) {
// code to check if i is one of the numbers
for (int n : numbers) {
if (i == n) return true;
}
return false;
}
With Java 8, you could use a primitive stream:
if (IntStream.of(12, 16, 19).anyMatch(i -> i == x))
but this may have a slight overhead (or not), depending on the number of comparisons.
Here is another answer based on a comment above, but simpler:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
if(numbers.contains(x)){
//
}
I use this kind of pattern often. It's very compact:
Define a constant in your class:
private static final Set<Integer> VALUES = Set.of(12, 16, 19);
// Pre Java 9 use: VALUES = new HashSet<Integer>(Arrays.asList(12, 16, 19));
In your method:
if (VALUES.contains(x)) {
...
}
Set.of() returns an immutable Set (not a HashSet), whose contains() performs very well even for very large sets.
If performance is not important, you can code the gist of it into one line for less code footprint:
if (Set.of(12, 16, 19).contains(x))
but know that it will create a new Set every time it executes.
I am wondering about the complexity of the following if statements:
if (isTrue()) //case 1
VS
if(isTrue()==true) //case 2
where isTrue is defined as
boolean isTrue(){
//lots of calculation and return true false based on that.
return output;
}
I was thinking that the complexity of if (isTrue()) is lower than that of if(isTrue()==true), because case 2 requires an additional equality comparison.
What about space complexity?
Any other thoughts?
Both are the same in speed and space, but the second form looks odd to C/C++ programmers.
The difference is that the second way is just less readable.
They are equivalent, and when global optimizations are applied the extra comparison is removed altogether.
The second case (checking for ==true) can get problematic if you or someone else redefines the value of true.
Let's say that we have the following C code:
#define true 2
bool isEqual(int a, int b)
{
return (a == b);
}
if (isEqual(5, 5)) {
printf("isEqual #1\n");
}
if (isEqual(5, 5) == true) {
printf("isEqual #2\n");
}
The output from this code will be
isEqual #1
So the shorter form where you leave out ==true is preferable not only because it leads to less verbose code but also because you avoid potential problems like these.
Is it possible to avoid code duplication in such cases? (Java code)
void f()
{
int r;
boolean condition = true;
while(condition)
{
// some code here (1)
r = check();
if(r == 0)
break ;
else if(r == 1)
return ;
else if(r == 2)
continue ;
else if(r == 3)
condition = false;
// some code here (2)
r = check();
if(r == 0)
break ;
else if(r == 1)
return ;
else if(r == 2)
continue ;
else if(r == 3)
condition = false;
// some code here (3)
}
// some code here (4)
}
int check()
{
// check a condition and return something
}
A possible solution may be using Exceptions, but that doesn't seem to be a good practice.
Is there any so-called good pattern of program flow control in such cases? For example, a way to call break ; from inside the check() function.
(Possibly in other programming languages)
Some good answers (especially @Garrett's just now) to a tough question but I'll add my $0.02 for posterity.
There is no easy answer here about how to refactor this block without seeing the actual code but my reaction to it is that it needs to be redesigned.
For example, a way to call break ; from inside the check() function. (Possibly in other programming languages)
That you are asking for a kind of break that Java does not support (without a hack), together with the duplicated check() and the various loop exit/repeat paths, indicates to me that this is a large and complicated method. Here are some ideas for you to think about:
Each of the some code here blocks is doing something. If you pull those out to their own methods, how does that change the loop?
Maybe break the loop down into a series of comments. Don't get deep into the code but think about it conceptually to see if a different configuration drops out.
Have you had another developer in your organization who is not involved with this code take a look at it? If you explain in detail how the code works, they may see some patterns that you do not, since you are in the weeds.
I also think that @aix's idea of a finite state machine is a good one but I've needed to use this sort of mechanism very few times in my programming journeys -- mostly during pattern recognition. I suspect that a redesign of the code with smaller code blocks pulled into methods will be enough to improve the code.
If you do want to implement the state machine here are some more details. You could have a loop that was only running a single switch statement that called methods. Each method would return the next value for the switch. This doesn't match your code completely but something like:
int state = 0;
WHILE: while(true) {
switch (state) {
case 0:
// 1st some code here
state = 1;
break;
case 1:
state = check();
break;
case 2:
return;
case 3:
break WHILE;
case 4:
// 2nd some code
state = 1;
break;
...
}
}
Hope some of this helps and best of luck.
The best way to avoid this duplication is not to let it happen in the first place by keeping your methods small and focused.
If the // some code here blocks are not independent, then you need to post all the code before someone can help you refactor it. If they are independent then there are ways to refactor it.
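For the independent case, one possible refactoring is sketched below: blocks (1) and (2) become entries in an array of steps, and the check() handling is written once inside a loop over that array, with a labeled break/continue reaching the outer while. Everything here is illustrative. The class name LoopRefactorSketch, the scripted check() results and the log list are assumptions added only so the control flow can be observed; the real blocks would replace the log.add calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class LoopRefactorSketch {
    final List<String> log = new ArrayList<>();   // records which blocks ran
    private final Iterator<Integer> results;      // hypothetical scripted check() results

    LoopRefactorSketch(Integer... scripted) {
        results = Arrays.asList(scripted).iterator();
    }

    // Stand-in for the real check(); returns 0 once the script is exhausted.
    int check() {
        return results.hasNext() ? results.next() : 0;
    }

    void f() {
        // Blocks (1) and (2): each is followed by the same check() handling.
        Runnable[] checkedSteps = { () -> log.add("1"), () -> log.add("2") };
        boolean condition = true;
        outer:
        while (condition) {
            for (Runnable step : checkedSteps) {
                step.run();
                int r = check();
                if (r == 0) break outer;       // leave the while loop
                if (r == 1) return;            // leave f() entirely
                if (r == 2) continue outer;    // restart the while loop
                if (r == 3) condition = false; // finish this pass, then stop
            }
            log.add("3");                      // block (3)
        }
        log.add("4");                          // block (4)
    }
}
```

This keeps the original semantics (including r == 3 letting the current pass finish) while the if chain appears only once.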
Code smell
First of all, I second aix's answer: rewrite your code! For this, the state design pattern might help. I would also say that using break, continue and return in such a way is just as much a code smell as the code duplication itself.
Having said that, here is a solution, just for fun
private int r;
void f()
{
distinction({void => codeBlock1()}, {void => codeBlock4()}, {void => f()},
{void => distinction( {void => codeBlock2()},{void => codeBlock4()},
{void => f()}, {void => codeBlock3()} )
});
}
void distinction( {void=>void} startingBlock, {void=>void} r0Block, {void=>void} r2Block, {void=>void} r3Block){
startingBlock.invoke();
r = check();
if(r == 0)
r0Block.invoke();
else if(r == 1)
{}
else if(r == 2)
r2Block.invoke();
else if(r == 3)
// if condition might be changed in some codeBlock, you still
// would need the variable condition and set it to false here.
r3Block.invoke();
}
This uses closures. Of course the parameters r0Block and r2Block could be omitted and codeBlock4() and f() hard-coded within distinction() instead, but then distinction() would only be usable by f(). With Java <= 7, you would need to use an interface with the method invoke() instead, with the four implementations codeBlock1 to codeBlock4. Of course this approach is not at all readable, but it is so general that it would work for any business logic within the code blocks and even any break/return/continue orgy.
Not really.
The second continue is redundant (your code would continue anyway).
Try using the Switch statement. It will make your code more readable.
One nicer way to do it would be to use switch statements, something like this:
void f()
{
    int r;
    boolean condition = true;
    // The label must be on the loop itself for "break outerloop" to compile.
    outerloop:
    while(condition)
    {
        r = check();
        switch(r){
            case 0: break outerloop;
            case 1: return;
            case 2: continue;
            case 3: condition = false;
        }
    }
}
You might want to think about re-formulating your logic as a state machine. It might simplify things, and will probably make the logic easier to follow.
What modifications would you make to this piece of code? In the last lines, should I use more if-else structures instead of "if-if-if"?
if (action.equals("opt1"))
{
//something
}
else
{
if (action.equals("opt2"))
{
//something
}
else
{
if ((action.equals("opt3")) || (action.equals("opt4")))
{
//something
}
if (action.equals("opt5"))
{
//something
}
if (action.equals("opt6"))
{
//something
}
}
}
Later Edit: This is Java. I don't think that switch-case structure will work with Strings.
Later Edit 2:
A switch works with the byte, short, char, and int primitive data types. It also works with enumerated types (discussed in Classes and Inheritance) and a few special classes that "wrap" certain primitive types: Character, Byte, Short, and Integer (discussed in Simple Data Objects).
Even if you don't use a switch statement, yes, use else if to avoid useless comparisons: if the first if is taken, you don't want all the other ifs to be evaluated, since they'll always be false. Also, you don't need to indent each if so deeply that the last block can't be seen without scrolling; the following code is perfectly readable:
if (action.equals("opt1")) {
}
else if (action.equals("opt2")) {
}
else if (action.equals("opt3")) {
}
else {
}
Use a dictionary with strings as the key type and delegates* as the value type.
- Retrieving the method for a given string takes O(1 + load) time.
Fill the dictionary within the class's constructor.
* Java does not support delegates, so as a workaround you may need to define a few inner classes, one for each case, and pass instances of those inner classes as values instead of methods.
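A minimal sketch of the dictionary approach, assuming Java 8 or later, where a lambda stored as a Runnable can stand in for the inner classes mentioned above. The class name ActionDispatcher and the log buffer are hypothetical and exist only to make the behaviour visible.

```java
import java.util.HashMap;
import java.util.Map;

final class ActionDispatcher {
    private final Map<String, Runnable> handlers = new HashMap<>();
    final StringBuilder log = new StringBuilder(); // illustrative: records what ran

    // The dictionary is filled once, in the constructor.
    ActionDispatcher() {
        handlers.put("opt1", () -> log.append("one;"));
        handlers.put("opt2", () -> log.append("two;"));
        Runnable threeOrFour = () -> log.append("three-or-four;");
        handlers.put("opt3", threeOrFour); // two keys can share one handler
        handlers.put("opt4", threeOrFour);
    }

    // Returns false when no handler is registered for the action.
    boolean dispatch(String action) {
        Runnable handler = handlers.get(action);
        if (handler == null) {
            return false;
        }
        handler.run();
        return true;
    }
}
```

Adding a new option then means adding one map entry rather than another branch.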
Use a switch statement assuming your language supports switching on a string.
switch(action)
{
case "opt6":
//
break;
case "opt7":
//
...
...
...
}
There are a number of ways to do this in Java, but here's a neat one.
enum Option {
opt1, opt2, opt3, opt4, opt5, opt6
}
...
switch (Option.valueOf(s)) {
case opt1:
// do opt1
break;
case opt2:
// do opt2
break;
case opt3: case opt4:
// do opt3 or opt4
break;
...
}
Note that valueOf(String) will throw an IllegalArgumentException if the argument is not the name of one of the members of the enumeration. Under the hood, the implementation of valueOf uses a static hashmap to map its String argument to an enumeration value.
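If unknown strings are expected, the exception can be wrapped so callers get an empty result instead. The sketch below assumes Java 8 for Optional; the class name OptionParser and the method parse are invented for illustration. It also guards against null, because valueOf throws a NullPointerException for a null name.

```java
import java.util.Optional;

final class OptionParser {
    enum Option { opt1, opt2, opt3, opt4, opt5, opt6 }

    // Returns the matching constant, or empty for null or unknown names.
    static Optional<Option> parse(String s) {
        if (s == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Option.valueOf(s));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```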
You can use a switch.
switch (action)
{
case "opt3":
case "opt4":
doSomething;
break;
case "opt5":
doSomething;
break;
default:
doSomeWork;
break;
}
It could help if you specified the language... As it looks like C++, you could use switch.
switch (action) {
case "opt1":
// something
break;
case "opt2":
// something
break;
...
}
And in case you want to use if statements, I think you could improve readability and performance a bit if you used "else if" without the curly braces, as in:
if (action.equals("opt1")) {
//something
} else if (action.equals("opt2")) {
//something
} else if ((action.equals("opt3")) || (action.equals("opt4"))) {
//something
} else if (action.equals("opt5")) {
//something
} else if (action.equals("opt6")) {
//something
}
I think some compilers can optimize else if better than an else { if. Anyway, I hope this helps!
I would just clean it up as a series of if/else statements:
if(action.equals("opt1"))
{
// something
}
else if (action.equals("opt2"))
{
// something
}
else if (action.equals("opt3"))
{
// something
}
etc...
It depends on your language, but it looks C-like, so you could try a switch statement:
switch(action)
{
case "opt1":
// something
break;
case "opt2":
// something
break;
case "opt3":
case "opt4":
// something
break;
case "opt5":
// something
break;
case "opt6":
// something
break;
}
However, sometimes switch statements don't provide enough clarity or flexibility (and as Victor noted below, will not work for strings in some languages). Most programming languages will have a way of saying "else if", so rather than writing
if (condition1)
{
...
}
else
{
if (condition2)
{
...
}
else
{
if (condition3)
{
...
}
else
{
// This can get very indented very fast
}
}
}
...which has a heap of indents, you can write something like this:
if (condition1)
{
...
}
else if (condition2)
{
...
}
else if (condition3)
{
...
}
else
{
...
}
In C/C++ and I believe C#, it's else if. In Python, it's elif.
The answers advising the use of a switch statement are the way to go. A switch statement is much easier to read than the mess of if and if...else statements you have now.
Simple comparisons are fast, and the //something code won't be executed for all but one case, so you can skip "optimizing" and go for "maintainability."
Of course, that's assuming that the action.equals() method does something trivial and inexpensive like a ==. If action.equals() is expensive, you've got other problems.
Procedural switching like this is very often better handled by polymorphism: rather than representing an action by a string, represent it by an object that has a 'something' method you can specialise. If you find you do need to map a string to the option, use a Map<String,Option>.
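As a rough illustration of the polymorphic version, each option below is an object with its own something() method, and a Map<String, Option> is used only at the boundary where a string arrives. The names Options, Option, Opt1 and lookup, and the returned strings, are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class Options {
    // Each option specialises something() instead of being matched by name.
    interface Option {
        String something();
    }

    // A named implementation; a lambda works equally well for small cases.
    static final class Opt1 implements Option {
        @Override
        public String something() {
            return "handled opt1";
        }
    }

    private static final Map<String, Option> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put("opt1", new Opt1());
        BY_NAME.put("opt2", () -> "handled opt2");
    }

    // Returns null when the name is unknown.
    static Option lookup(String name) {
        return BY_NAME.get(name);
    }
}
```

Code that already holds an Option just calls something() and never needs to inspect a string.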
If you want to stick to procedural code, and the options in your real code really are all "optX":
if ( action.startsWith("opt") && action.length() == 4 ) {
switch ( action.charAt(3) ) {
case '1': something; break;
case '2': something; break;
case '3': something; break;
...
}
}
which would be OK in something like a parser (where breaking strings up is part of the problem domain), and should be fast, but isn't cohesive (the connection between the object action and the behaviour is based on parts of its representation, rather than anything intrinsic to the object).
In fact this depends on branch analysis. If 99% of your decisions are "opt1", this code is already pretty good. If 99% of your decisions are "opt6", this code is pretty bad.
If you often get "opt6" and seldom "opt1", put "opt6" in the first comparison and order the following comparisons by how frequently each string occurs in your execution data stream.
If you have a lot of options and all have equal frequency you can sort the options and split them into a form of a binary tree like this:
if (action < "opt20")
{
if( action < "opt10" )
{
if( action == "opt4" ) {...}
else if( action == "opt2" ) {...}
else if( action == "opt1" ) {...}
else if( action == "opt8" ) {...}
}
}
else
{
if( action < "opt30" )
{
}
else
{
if( action == "opt38" ) {...}
else if( action == "opt32" ) {...}
}
}
In this sample, the range splits reduce the comparisons needed for "opt38" and "opt4" to 3. Applied consistently, this gives about log2(n) + 1 comparisons in every branch, which is best when the options are equally frequent.
Don't carry the binary split all the way down; at the end, use 4-10 "normal" else if constructs ordered by the frequency of the options. The last two or three levels of a binary tree don't gain much.
Summary
There are at least two optimizations for this kind of comparison:
Binary Decision Trees
Ordering due to the frequency of the options
The compiler uses a binary decision tree for large switch-case constructs. But the compiler doesn't know anything about the frequencies of the options, so ordering by frequency can give a performance benefit over a plain switch-case if one or two options are much more frequent than the others. In that case, this is a workaround:
if (action == "opt5") // Processing a frequent (99%) option first
{
}
else // Processing less frequent options (>1%) second
{
switch( action )
{
case "opt1": ...
case "opt2": ...
}
}
Warning
Don't optimize your code until you have done profiling and it is really necessary. It is best to use switch-case or else-if in a straightforward way so your code stays clean and readable. If you have optimized your code, place some good comments in it so everybody can understand this ugly piece of code. One year later you won't remember the profiling data, and some comments will be really helpful.
If you find the native Java switch construct too limiting, take a look at the lambdaj Switcher, which lets you switch declaratively on any object by matching it against Hamcrest matchers.
Note that using strings in the cases of a switch statement was, when this was written, a feature planned for the next version of Java; it shipped in Java 7.
See Project Coin: Proposal for Strings in switch