I have this method that calculates some statistics:
public void calculateAverage(int hour) {
    if (hour != 20) {
        int data = 0;
        int times = 0;
        for (CallQueue cq : queues) {
            data += cq.getCallsByTime().get(hour);
            times++;
        }
        averageData.add((double) data / times);
        calculateAverage(hour + 1);
    }
}
Now I am very proud that I have created a recursive method, but I know that this could have been solved with a loop.
My question is: is it better to solve these kinds of problems recursively or with a loop?
Recursion in general
In general, recursion is more expensive, because a new stack frame holding copies of the arguments and local variables has to be pushed each time the function recurses.
A set of addresses and states needs to be saved, so that the recursive procedure can return to the right state after that particular run.
Iteration is better when it is possible. Use recursion when iteration just won't cut it, or when iteration would result in much more complicated code.
Code Maintenance
From a maintenance perspective, debugging iterative code is a lot easier than recursive procedures as it is relatively easier to understand what the state is at any particular iteration, as compared to thinking about a particular recursion.
Your code
The procedure calls itself, but each run has nothing to do with the results of the previous run. Runs being independent of each other is usually the biggest giveaway that recursion might not be necessary.
In my opinion, calculateAverage(hour + 1); should be moved outside the function, as that would also make it clearer to someone reading your code that each call is independent.
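A minimal sketch of that suggestion, written as a plain loop. It assumes a hypothetical stand-in for the question's queues (one list of hourly call counts per queue) and hypothetical names HourlyAverages, averagesFrom and averageForHour; the real CallQueue class is not shown in the question.

```java
import java.util.ArrayList;
import java.util.List;

class HourlyAverages {
    // Computes one average per hour in [startHour, endHour).
    // The loop replaces the recursive call calculateAverage(hour + 1).
    static List<Double> averagesFrom(List<List<Integer>> callsPerQueue, int startHour, int endHour) {
        List<Double> averageData = new ArrayList<>();
        for (int hour = startHour; hour < endHour; hour++) {
            averageData.add(averageForHour(callsPerQueue, hour));
        }
        return averageData;
    }

    // Each hour is computed independently of every other hour.
    static double averageForHour(List<List<Integer>> callsPerQueue, int hour) {
        int data = 0;
        for (List<Integer> calls : callsPerQueue) {
            data += calls.get(hour);
        }
        return (double) data / callsPerQueue.size();
    }
}
```

With the per-hour work in its own method, the caller decides which hours to cover, and the stack depth stays constant no matter how many hours there are.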
In Java, C, and Python, recursion is fairly expensive compared to iteration (in general) because it requires the allocation of a new stack frame. In some C compilers, one can use a compiler flag to eliminate this overhead, which transforms certain types of recursion (actually, certain types of tail calls) into jumps instead of function calls.
For this particular problem there isn't much of a runtime difference. I personally would rather use iteration; I think it would be simpler and easier to understand, but to each their own, I suppose.
Some recursive functions (the naive recursive Fibonacci, for example) should be written iteratively instead, simply because the number of recursive calls can grow exponentially.
Generally, I don't use recursion unless it actually makes my problem easier to understand.
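To illustrate the Fibonacci point, here is a small sketch of both forms. The class and method names (Fib, fibRecursive, fibIterative) are made up for this example.

```java
class Fib {
    // Naive recursion: every call above the base case spawns two more calls,
    // so the total number of calls grows exponentially with n.
    static long fibRecursive(int n) {
        return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Iteration: a single pass with two running values and constant stack depth.
    static long fibIterative(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }
}
```

Both methods return the same values, but fibIterative(50) finishes immediately, while fibRecursive(50) has to make billions of calls.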
You should investigate the boundary conditions. For deep recursion the stack might overflow; that's +1 for loops.
I'm not sure which one runs faster, but that is relatively easy to measure, taking the JIT and other factors into consideration.
Code maintenance aspect: it is much easier for most of us to understand and fix loops than recursion. Developer time is usually more important than minor performance differences.
It depends on the context. For example, if you have a tree of Composite objects (in SWT) and you wish to traverse it, the easiest way is to use recursion, like this:
private boolean checkControlParent(Composite comp) {
    boolean ret = false;
    if (comp != null) {
        if (this.equals(comp)) {
            ret = true;
        } else {
            ret = checkControlParent(comp.getParent());
        }
    }
    return ret;
}
Otherwise, if performance is important, be advised that recursive calls are slower than simple loops in most cases because of the function/method call overhead.
So the main thing is: if you need to iterate through objects where recursion is a natural solution and you don't risk a StackOverflowError, go ahead and use recursion. Otherwise you'll probably be better off with a loop.
One more thing: recursive methods sometimes tend to be harder to read, understand and debug.
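For comparison, a parent-chain walk like checkControlParent can also be written as a loop. This is only a sketch: it assumes a hypothetical Widget class with a parent link in place of SWT's Composite, and the method names are invented.

```java
class Widget {
    final Widget parent; // null for the root

    Widget(Widget parent) {
        this.parent = parent;
    }

    // Recursive form, mirroring checkControlParent:
    // true if w is this widget or has this widget as an ancestor.
    boolean containsRecursive(Widget w) {
        if (w == null) {
            return false;
        }
        return this == w || containsRecursive(w.parent);
    }

    // Loop form: follows the parent links until it finds this widget or runs out.
    boolean containsIterative(Widget w) {
        for (Widget current = w; current != null; current = current.parent) {
            if (this == current) {
                return true;
            }
        }
        return false;
    }
}
```

A linear chain like this converts to a loop trivially. Recursion tends to pay off when each node has several children to visit.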
Related
I'm working with an external library that decided to handle collections on its own. Not working with it or updating is outside my control. To work with elements of this third party "collection" it only returns iterators.
A question came up during a code review about having multiple returns in the code to gain performance. We all agree (within the team) the code is more readable with a single return, but some are worried about optimizations.
I'm aware premature optimization is bad. That is a topic for another day.
I believe the JIT compiler can handle this and skip the unneeded iterations, but could not find any info to back this up. Is JIT capable of such a thing?
A code sample of the issue at hand:
public boolean contains(MyThings things, String valueToFind) {
    Iterator<Thing> thingIterator = things.iterator();
    boolean valueFound = false;
    while (thingIterator.hasNext()) {
        Thing thing = thingIterator.next();
        if (valueToFind.equals(thing.getValue())) {
            valueFound = true;
        }
    }
    return valueFound;
}
VS
public boolean contains(MyThings things, String valueToFind) {
    Iterator<Thing> thingIterator = things.iterator();
    while (thingIterator.hasNext()) {
        Thing thing = thingIterator.next();
        if (valueToFind.equals(thing.getValue())) {
            return true;
        }
    }
    return false;
}
We all agree the code is more readable with a single return.
Not really. This is just old school structured programming when functions were typically not kept small and the paradigms of keeping values immutable weren't popular yet.
Although subject to debate, there is nothing wrong with having very small methods (a handful of lines of code), which return at different points. For example, in recursive methods, you typically have at least one base case which returns immediately, and another one which returns the value returned by the recursive call.
Often you will find that creating an extra result variable, just to hold the return value, and then making sure no other part of the function overwrites the result, when you already know you can just return, just creates noise which makes it less readable not more. The reader has to deal with cognitive overload to see the result is not modified further down. During debugging this increases the pain even more.
I don't think your example is premature optimisation. It is a logical and critical part of your search algorithm. That is why you can break from loops, or in your case, just return the value. I don't think the JIT could easily realise that it should break out of the loop. It doesn't know whether you want to change the variable back to false if you find something else in the collection. (I don't think it is smart enough to realise that valueFound never changes back to false.)
In my opinion, your second example is not only more readable (the valueFound variable is just extra noise) but also faster, because it just returns when it does its job. The first example would be as fast if you put a break after setting valueFound = true. If you don't do this, and you have a million items to check, and the item you need is the first, you will be comparing all the others just for nothing.
The Java compiler cannot do an optimization like that, because doing so in the general case would change the logic of the program.
Specifically, adding an early return would change the number of invocations of thingIterator.hasNext(), because your first code block continues iterating the collection to the end.
Java could potentially replace a break with an early return, but that would hardly have any effect on the timing of the program.
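The difference in hasNext() invocations can be made visible with a counting wrapper. This is a sketch under assumptions: CountingIterator and ContainsDemo are hypothetical names, and plain strings stand in for the Thing objects from the question.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Wraps an iterator and counts how often hasNext() is called.
class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    int hasNextCalls = 0;

    CountingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        hasNextCalls++;
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }
}

class ContainsDemo {
    // Single return: keeps iterating to the end even after a match.
    static boolean containsSingleReturn(Iterator<String> it, String valueToFind) {
        boolean valueFound = false;
        while (it.hasNext()) {
            if (valueToFind.equals(it.next())) {
                valueFound = true;
            }
        }
        return valueFound;
    }

    // Early return: stops at the first match.
    static boolean containsEarlyReturn(Iterator<String> it, String valueToFind) {
        while (it.hasNext()) {
            if (valueToFind.equals(it.next())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c", "d");
        CountingIterator<String> full = new CountingIterator<>(values.iterator());
        CountingIterator<String> early = new CountingIterator<>(values.iterator());
        containsSingleReturn(full, "a");
        containsEarlyReturn(early, "a");
        System.out.println(full.hasNextCalls + " vs " + early.hasNextCalls); // prints "5 vs 1"
    }
}
```

Because the call count is observable from outside, a compiler cannot silently turn the first version into the second for an arbitrary iterator.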
Simple question, asked mostly out of curiosity about what Java compilers are smart enough to do. I know not all compilers are built equally, but I'm wondering if others feel it's reasonable to expect an optimization on most compilers I'm likely to run against, not if it works on a specific version or on all versions.
So let's say that I have some tree structure and I want to collect all the descendants of a node. There are two easy ways to do this recursively.
The more natural method, for me, to do this would be something like this:
public Set<Node> getDescendants() {
    Set<Node> descendants = new HashSet<Node>();
    descendants.addAll(getChildren());
    for (Node child : getChildren()) {
        descendants.addAll(child.getDescendants());
    }
    return descendants;
}
However, assuming no compiler optimizations and a decent sized tree this could get rather expensive. On each recursive call I create and fully populate a set, only to return that set up the stack so the calling method can add the contents of my returned set to its version of the descendants set, discarding the version that was just built and populated in the recursive call.
So now I'm creating many sets just to have them be discarded as soon as I return their contents. Not only do I pay a minor initialization cost for building the sets, but I also pay the more substantial cost of moving all the contents of one set into the larger set. In large trees most of my time is spent moving Nodes around in memory from set A to B. I think this even makes my algorithm O(n^2) instead of O(n) due to the time spent copying Nodes, though it may work out to O(n log n) if I sat down to do the math.
I could instead have a simple getDescendants method that calls a helper method that looks like this:
public Set<Node> getDescendants() {
    Set<Node> descendants = new HashSet<Node>();
    getDescendantsHelper(descendants);
    return descendants;
}
public Set<Node> getDescendantsHelper(Set<Node> descendants) {
    descendants.addAll(getChildren());
    for (Node child : getChildren()) {
        child.getDescendantsHelper(descendants);
    }
    return descendants;
}
This ensures that I only ever create one set and I don't have to waste time copying from one set to another. However, it requires writing two methods instead of one and generally feels a little more cumbersome.
The question is, do I need to do option two if I'm worried about optimizing this sort of method? or can I reasonably expect the java compiler, or JIT, to recognize that I am only creating temporary sets for convenience of returning to the calling method and avoid the wasteful copying between sets?
edit: cleaned up a bad copy-paste job which led to my sample method adding everything twice. You know something is bad when your 'optimized' code is slower than your regular code.
The question is, do I need to do option two if I'm worried about optimizing this sort of method?
Definitely yes. If performance is a concern (and most of the time it is not!), then you need it.
The compiler optimizes a lot, but on a very different scale. Basically, it works with one method only and it optimizes the most commonly used path in it. Due to heavy inlining it can sort of optimize across method calls, but nothing like the above.
It can also optimize away needless allocations, but only in very simple cases. Maybe something like
int sum(int... a) {
    int result = 0;
    for (int x : a) result += x;
    return result;
}
Calling sum(1, 2, 3) means allocating int[3] for the varargs arguments and this can be eliminated (whether the compiler really does it is a different question). It can even find out that the result is a constant (which I doubt it really does). If the result doesn't get used, it can perform dead code elimination (this happens rather often).
Your example involves allocating a whole HashSet (which is backed by a HashMap) and all its entries, and is several orders of magnitude more complicated. The compiler has no idea how a HashSet works and it can't find out, e.g., that after m.addAll(m1) the set m contains all members of m1. No way.
This is an algorithmic optimization rather than a low-level one. That's what humans are still needed for.
For things the compiler could do (but currently fails to), see e.g. these questions of mine concerning associativity and bounds checks.
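A self-contained sketch of both options from the question, for comparison. The TreeNode class, its addChild method and the method names are assumptions made for this example; the question's Node class is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TreeNode {
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode addChild(TreeNode child) {
        children.add(child);
        return this;
    }

    // Option one: every call builds its own set, which the caller copies and discards.
    Set<TreeNode> getDescendantsCopying() {
        Set<TreeNode> descendants = new HashSet<>(children);
        for (TreeNode child : children) {
            descendants.addAll(child.getDescendantsCopying());
        }
        return descendants;
    }

    // Option two: one set is created here and passed down as an accumulator.
    Set<TreeNode> getDescendants() {
        Set<TreeNode> descendants = new HashSet<>();
        collectDescendants(descendants);
        return descendants;
    }

    // Adds each descendant to the shared set exactly once; nothing is copied between sets.
    private void collectDescendants(Set<TreeNode> accumulator) {
        for (TreeNode child : children) {
            accumulator.add(child);
            child.collectDescendants(accumulator);
        }
    }
}
```

Making the helper private and void keeps the public API down to a single method, which addresses part of the "two methods feels cumbersome" concern.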
I would like to know which one is better. I am writing a for loop, and in the condition part I am using str.length(). I wonder whether this is a good idea. I could also assign the value to an integer variable and use that in the loop.
Which one is the suitable/better way?
If you use str.length() more than once or twice in the code, it's logical to extract it to a local var simply for brevity's sake. As for performance, it will most probably be exactly the same because the JIT compiler will inline that call, so the native code will be as if you have used a local variable.
There is no distinct downside to calling a function in the loop condition expression in the sense that "you really should never do it". You want to watch out when calling functions that have side effects, but even that can be acceptable in some circumstances.
There are three major reasons for moving function calls out of the loop (including the loop condition expressions):
Performance. The function may (depending on the JIT compiler) get called for every iteration of the loop, which costs you execution time. Particularly if the function's code has a higher order of complexity than O(1) after the first execution, this will increase the execution time. By how much depends entirely on exactly what the function in question does and how it is implemented.
Side effects. If the function has any side effects, those may (will) be executed repeatedly. This might be exactly what you want, but you need to be aware of it. A side effect is basically something that is observable outside of the function that is being called; for example, disk or network I/O are often considered to be side effects. A function that simply performs calculations on already available data is generally a pure function.
Code clarity. Admittedly str.length() isn't very long, but if you have a complex calculation based around a function call in the loop conditional, code clarity can very easily suffer. For this reason it may be advantageous to move the loop termination condition calculation out of the loop condition expression itself. Beware of awakening the sleeping beast, however; make very sure that the refactored code actually is more readable.
For str.length() it doesn't really matter unless you are really after the last bit of performance you can get, particularly as, as has been pointed out by other answerers, String#length() is an O(1) operation. In the general case, if you need the additional performance, consider introducing a variable to hold the result of the function call and comparing against that rather than making the function call repeatedly.
Personally, I'd consider code clarity before worrying about micro-optimizations like exactly where to place a specific function call. But if you have everything else down and still need to squeeze a little bit more performance out of the code, moving the function call out of the condition expression and using a local variable (preferably of a primitive type) is something worth considering. Chances are, though, that if you are worried about that, you'll see bigger gains by considering a different algorithm. (Do you really need to iterate over the string the way you are doing? Is there no other way to do what you are after?)
It usually doesn't matter. Use whichever makes your code clearer.
If a value is going to be used more than once, then there are two advantages to assigning it to a local variable:
You can give the variable a good name, which makes your code easier to read and understand
You can sometimes avoid a small amount of overhead by calling the method only once. This helps performance (although the difference is often too small to be noticeable - if in doubt you should benchmark)
Note: This advice only applies to pure functions. You need to be much more careful if the function has side effects, or might return a different value each time (like Math.random()) - in these cases you need to think much more carefully about the effect of multiple function calls.
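A small sketch of why side effects matter here: the loop condition is re-evaluated before every iteration, so a function called there runs once more than the loop body does. The class LoopConditionDemo and its limit() method are hypothetical; limit() counts its own invocations so the effect is observable.

```java
class LoopConditionDemo {
    static int calls = 0;

    // Hypothetical bound with a visible side effect: it counts how often it is called.
    static int limit() {
        calls++;
        return 3;
    }

    // The bound is called in the condition: once per check, including the final failing one.
    static int callsWhenInCondition() {
        calls = 0;
        for (int i = 0; i < limit(); i++) {
            // loop body intentionally empty
        }
        return calls;
    }

    // The bound is hoisted into a local variable: called exactly once.
    static int callsWhenHoisted() {
        calls = 0;
        int n = limit();
        for (int i = 0; i < n; i++) {
            // loop body intentionally empty
        }
        return calls;
    }
}
```

Here callsWhenInCondition() returns 4 (three passing checks plus the failing one) and callsWhenHoisted() returns 1. For a pure, cheap method like String#length() that difference is harmless; for a method that does I/O or returns a changing value it changes the program's behavior.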
Calling length costs O(1) since the length is stored as a member - It's a constant operation, don't waste your time thinking about complexity and performance of this thing.
There is no difference at all between the two.
But suppose str changes: with a hard-coded bound you would need to change the value in the for loop manually.
For example:
String str = "hi";
In the for loop you could write
for (int i = 0; i < str.length(); i++)
{
}
or
for (int i = 0; i < 2; i++)
{
}
Now suppose you change str to String str = "hi1";
Then the for loop becomes
for (int i = 0; i < 3; i++)
{
}
So I would suggest you go for str.length().
If you use str.length() in the loop condition, it is evaluated on every iteration. You can instead assign the value to a variable and use that in the for loop.
for (int i = 0; i < str.length(); i++) { // str.length() is evaluated on every iteration
}
int k = str.length(); // evaluated only once
for (int i = 0; i < k; i++) {
}
If you are concerned about performance, you may use the second approach.
If you are using str.length() in the code more than once, assign it to a variable and use that. Otherwise you can use str.length() itself.
Reason
When we call a method, the current position is saved on the call stack, control jumps to the called method, and the method does its work.
When it returns, the saved position is retrieved from the stack and normal execution continues.
That is what actually happens. So when we make the call many times in a program, this overhead is incurred each time.
Therefore we create a local variable, assign the value to it, and use it wherever it is needed in the program.
The difference is probably minuscule or doesn't exist, but which one is more efficient and why?
int nItems = param.getItemList().size();
for (...) {
    if (nItems == 1) doSomething();
}
or
for (...) {
    if (param.getItemList().size() == 1) doSomething();
}
In theory the first one will be faster (have you profiled it? you should!), because it pulls a frequently used method call outside of the loop. However, given enough time the JIT compiler might optimize/inline the method call making both solutions indistinguishable in terms of performance.
Such micro-micro-optimizations are not worth the effort; better to aim for the solution that is clearer and simpler to understand. Which IMHO happens to be the first one.
It is more efficient to make the method call outside the loop. Sometimes it may even matter. The one circumstance under which the second approach might be more efficient is if the conditions of the for loop resulted in the loop being skipped entirely.
The first is more efficient because the method need not be invoked several times. However, as both getItemList() and size() seem to be accessors, the difference will be minuscule.
The first one, because you don't need to make the two calls for every iteration of the loop
Today I had a coworker suggest I refactor my code to use a label statement to control flow through 2 nested for loops I had created. I've never used them before because personally I think they decrease the readability of a program. I am willing to change my mind about using them if the argument is solid enough however. What are people's opinions on label statements?
Many algorithms are expressed more easily if you can jump across two loops (or a loop containing a switch statement). Don't feel bad about it. On the other hand, it may indicate an overly complex solution. So stand back and look at the problem.
Some people prefer a "single entry, single exit" approach to all loops. That is to say avoiding break (and continue) and early return for loops altogether. This may result in some duplicate code.
What I would strongly avoid doing is introducing auxiliary variables. Hiding control flow within state adds to confusion.
Splitting labeled loops into two methods may well be difficult. Exceptions are probably too heavyweight. Try a single entry, single exit approach.
Labels are like gotos: use them sparingly, and only when they make your code faster and, more importantly, more understandable.
E.g., if you are in big loops six levels deep and you encounter a condition that makes the rest of the loop pointless to complete, there's no sense in having six extra trap doors in your condition statements to exit the loop early.
Labels (and gotos) aren't evil; it's just that sometimes people use them in bad ways. Most of the time we are actually trying to write our code so it is understandable for you and the next programmer who comes along. Making it uber-fast is a secondary concern (be wary of premature optimization).
When labels (and gotos) are misused they make the code less readable, which causes grief for you and the next developer. The compiler doesn't care.
There are few occasions when you need labels and they can be confusing because they are rarely used. However if you need to use one then use one.
BTW: this compiles and runs, because http: is parsed as a label and the rest of that line as a // comment.
class MyFirstJavaProg {
    public static void main(String args[]) {
        http://www.javacoffeebreak.com/java101/java101.html
        System.out.println("Hello World!");
    }
}
I'm curious to hear what your alternative to labels is. I think this is pretty much going to boil down to the argument of "return as early as possible" vs. "use a variable to hold the return value, and only return at the end."
Labels are pretty standard when you have nested loops. The only way they really decrease readability is when another developer has never seen them before and doesn't understand what they mean.
I have used a Java labeled loop in an implementation of a sieve method to find prime numbers (done for one of the Project Euler math problems), which made it 10x faster compared to plain nested loops. E.g., if (certain condition), go back to the outer loop.
private static void testByFactoring() {
    primes: for (int ctr = 0; ctr < m_toFactor.length; ctr++) {
        int toTest = m_toFactor[ctr];
        for (int ctr2 = 0; ctr2 < m_divisors.length; ctr2++) {
            // max (int) Math.sqrt(m_numberToTest) + 1 iterations
            if (toTest != m_divisors[ctr2]
                    && toTest % m_divisors[ctr2] == 0) {
                continue primes;
            }
        } // end of the divisor loop
    } // end of primes loop
} // method
I asked a C++ programmer how bad labeled loops are; he said he would use them sparingly, but that they can occasionally come in handy. For example, if you have 3 nested loops and for certain conditions you want to go back to the outermost loop.
So they have their uses; it depends on the problem you are trying to solve.
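A self-contained variant of the labeled continue above, so the control flow can be run and inspected. It is a sketch, not the original sieve: the class LabeledContinueDemo, the method withoutProperDivisors and the input arrays are assumptions standing in for m_toFactor and m_divisors.

```java
import java.util.ArrayList;
import java.util.List;

class LabeledContinueDemo {
    // Keeps each candidate that is not divisible by any divisor other than itself.
    static List<Integer> withoutProperDivisors(int[] candidates, int[] divisors) {
        List<Integer> survivors = new ArrayList<>();
        candidates:
        for (int candidate : candidates) {
            for (int divisor : divisors) {
                if (candidate != divisor && candidate % divisor == 0) {
                    // Abandon this candidate: skips the remaining divisors and the add below.
                    continue candidates;
                }
            }
            survivors.add(candidate);
        }
        return survivors;
    }
}
```

Without the label, the inner loop would need a flag variable that the outer loop checks before adding the candidate.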
I've never seen labels used "in the wild" in Java code. If you really want to break across nested loops, see if you can refactor your method so that an early return statement does what you want.
Technically, I guess there's not much difference between an early return and a label. Practically, though, almost every Java developer has seen an early return and knows what it does. I'd guess many developers would at least be surprised by a label, and probably be confused.
I was taught the single entry / single exit orthodoxy in school, but I've since come to appreciate early return statements and breaking out of loops as a way to simplify code and make it clearer.
I'd argue in favour of them in some locations, I found them particularly useful in this example:
nextItem: for (CartItem item : user.getCart()) {
    nextCondition: for (PurchaseCondition cond : item.getConditions()) {
        if (!cond.check())
            continue nextItem;
        else
            continue nextCondition;
    }
    purchasedItems.add(item);
}
I think with the new for-each loop, the label can be really clear.
For example:
sentence: for (Sentence sentence : paragraph) {
    for (String word : sentence) {
        // do something
        if (isDone()) {
            continue sentence;
        }
    }
}
I think that looks really clear by having your label the same as your variable in the new for-each. In fact, maybe Java should be evil and add implicit labels for-each variables heh
I never use labels in my code. I prefer to create a guard and initialize it to null or other unusual value. This guard is often a result object. I haven't seen any of my coworkers using labels, nor found any in our repository. It really depends on your style of coding. In my opinion using labels would decrease the readability as it's not a common construct and usually it's not used in Java.
Yes, you should avoid using label unless there's a specific reason to use them (the example of it simplifying implementation of an algorithm is pertinent). In such a case I would advise adding sufficient comments or other documentation to explain the reasoning behind it so that someone doesn't come along later and mangle it out of some notion of "improving the code" or "getting rid of code smell" or some other potentially BS excuse.
I would equate this sort of question with deciding when one should or shouldn't use the ternary if. The chief rationale being that it can impede readability and unless the programmer is very careful to name things in a reasonable way then use of conventions such as labels might make things a lot worse. Suppose the example using 'nextCondition' and 'nextItem' had used 'loop1' and 'loop2' for his label names.
Personally labels are one of those features that don't make a lot of sense to me, outside of Assembly or BASIC and other similarly limited languages. Java has plenty of more conventional/regular loop and control constructs.
I found labels to be sometimes useful in tests, to separate the usual setup, exercise and verify phases and group related statements. For example, using the BDD terminology:
@Test
public void should_Clear_Cached_Element() throws Exception {
    given: {
        elementStream = defaultStream();
        elementStream.readElement();
        Assume.assumeNotNull(elementStream.lastRead());
    }
    when:
        elementStream.clearLast();
    then:
        assertThat(elementStream.lastRead()).isEmpty();
}
Your formatting choices may vary but the core idea is that labels, in this case, provide a noticeable distinction between the logical sections comprising your test, better than comments can. I think the Spock library just builds on this very feature to declare its test phases.
Personally, whenever I need nested loops where the innermost one has to break out of all the parent loops, I just write everything in a method with a return statement when my condition is met. It's far more readable and logical.
Example Using method:
private static boolean exists(int[][] array, int searchFor) {
    for (int[] nums : array) {
        for (int num : nums) {
            if (num == searchFor) {
                return true;
            }
        }
    }
    return false;
}
Example Using label (less readable imo):
boolean exists = false;
existenceLoop:
for (int[] nums : array) {
    for (int num : nums) {
        if (num == searchFor) {
            exists = true;
            break existenceLoop;
        }
    }
}
return exists;