Precise JUnit benchmarking and complexity estimation - Java

I am using Java with junit-benchmarks 0.7.2 for JUnit performance tests. It works fine for warm-ups, multiple runs of test functions, and plotting results, but I want to ask about two features that I can't find in junit-benchmarks:
1. It is not precise about execution time in milliseconds (especially in the plots), so I only get plots for functions that take more than 0.1 s to execute.
2. Is there a plugin that can give a rough or exact estimate of the complexity of my code? Even one that just displays my code's measured performance against expected curves such as O(N^2) or O(N), however it calculates it, would do. It doesn't matter whether the plugin is free or paid; I just want one that does the task.

I guess this is the answer so far:
Won't fix; will stick with millis as the default granularity. If somebody really needs nanosecond-grade timing, run your benchmarks with caliper.
This is what I'd actually recommend too, as junit-benchmarks doesn't seem to be as advanced to me. But I may be wrong, as I haven't followed it closely.
You can write a JUnit test which is also a caliper benchmark like I did, if it helps.
Concerning the complexity estimator, there were such plans for caliper, but I strongly doubt that anyone did it. You could do it yourself in a few hours, I guess. I'm afraid it won't be really useful, though: it can only extrapolate what it sees, and there may be problems that manifest themselves only outside of the measured range. So you had better only interpolate, and then the tool loses its point, since within the measured range you can spot problems without it.
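For reference, here is a rough sketch of what such a Caliper benchmark looked like with the old 0.5-era API (SimpleBenchmark and timeXxx(int reps) methods). The class and method names here are from memory, so verify them against the Caliper version you actually use:

```java
import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

// Sketch of a Caliper benchmark (old 0.5-era API; check the class
// names against your Caliper version). Caliper repeatedly invokes
// timeXxx(int reps) methods and reports nanosecond-scale timings.
public class StringConcatBenchmark extends SimpleBenchmark {

    public String timeStringBuilder(int reps) {
        String last = "";
        for (int i = 0; i < reps; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append("hello ").append(i);
            last = sb.toString(); // return a value so the JIT cannot
        }                         // eliminate the loop as dead code
        return last;
    }

    public static void main(String[] args) {
        Runner.main(StringConcatBenchmark.class, args);
    }
}
```

One way to make this double as a JUnit test is to add a @Test method that simply invokes the Runner on the class, so it runs from your existing test infrastructure.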

Related

Shortest path using simulated annealing in an Android app

I am implementing an Android application that uses various geographic coordinates, and I need to solve a problem similar to the travelling salesman problem.
I found an implementation of the algorithm at http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6.
I adapted the code to what I need, and it produces theoretically optimal results. I noticed, however, that each execution produces a different result.
I went back to the original code and found that even the original disagrees with itself between runs.
I don't understand. Shouldn't the result be unique? After all, we are looking for the shortest path. Perhaps some small variation would be expected, but each execution differs by several units from the previous one.
How could I adjust the algorithm to produce the same result in every run? Has anyone worked with this?
That's the price you pay for an algorithm like this one: the results obtained might very well be different every time. The algorithm does not "find the shortest path," which is a computationally intractable problem ("travelling salesman"). Instead, it seeks to quickly find a solution that is "short enough." Whether or not it actually does so depends very much on the data ... and, to a non-trivial degree, on random chance.
And, since the algorithm is comparatively fast, sometimes you do run it several times in a row in order to gauge the variability of the solutions obtained. If (say) three runs each produce results that are "close enough" to one another, there's a good chance that the result is reliable. But if the standard deviation is very large, the algorithm might not be giving you a good answer. (Bear in mind that sometimes the solution will be wrong.)
So to speak: "you get what you pay for, but you don't pay much for it, and of course that is the point."
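To make that concrete, here is a minimal sketch of the "run it several times and gauge the variability" advice. solveTour is a hypothetical stand-in for one run of the annealer; if you also want an individual run to be exactly repeatable, drive all of the algorithm's random choices from a single seeded Random (if the tutorial code calls Math.random(), those calls would need to be routed through such an instance):

```java
import java.util.Random;

// Run the annealer several times and report mean and standard
// deviation of the tour lengths, as the answer above suggests.
public class AnnealingStats {
    // Placeholder for one simulated-annealing run; returns the tour
    // length found. A fake Gaussian result stands in for the demo.
    static double solveTour(Random rnd) {
        return 1000 + rnd.nextGaussian() * 25;
    }

    public static void main(String[] args) {
        int runs = 10;
        double sum = 0, sumSq = 0;
        for (int seed = 0; seed < runs; seed++) {
            double len = solveTour(new Random(seed)); // fixed seed => repeatable run
            sum += len;
            sumSq += len * len;
        }
        double mean = sum / runs;
        double stdDev = Math.sqrt(sumSq / runs - mean * mean);
        System.out.printf("mean=%.1f  stddev=%.1f%n", mean, stdDev);
        // small stddev: results are probably reliable; large: be suspicious
    }
}
```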

How to measure C++ or Java file complexity?

I want to start measuring what Michael Feathers has referred to as the turbulence of code, namely churn vs. complexity.
To do this, I need to measure the complexity of a C++ or Java file. So I found a couple tools that measure cyclomatic complexity (CC). They each measure CC well at the function or method level. However, I need a metric at the file level, and they don't do so well there. One tool just returns the average of all method complexities in the file, and the other tool treats the whole file like it is one giant method, i.e., it counts all the decision points in the whole file.
So I did some research and found that McCabe defines CC only in terms of modules--and they define a module as a function--not as a file (see slides 20 and 30 of this presentation). And I think that makes sense.
So now I'm left with trying to figure out how to represent file complexity. My thought is that I should just use the maximum method CC for that file.
Any thoughts about that approach or any other suggestions?
Thanks!
Ken
A few years ago I had the same question. I answered it in the following way, and it has worked for me perfectly ever since:
The purpose of minimizing complexity is to improve maintainability. Cyclomatic complexity is an indicator of logical complexity, and you are right: it applies to the smallest 'unit', i.e. a function. It is possible to derive 'summary' metrics, like total/max/min/etc., but they rarely show anything useful where cyclomatic complexity is concerned. I tried to use 'summary' metrics to compare two code bases, but concluded that only distribution graphs of cyclomatic complexity are really useful there.
So, what could indicate something about the maintainability of bigger units/levels of abstraction, like files/components/subsystems? I found that the first such metric is the size of a unit in lines of code. If you limit the size of a file, say to 1000 lines, and limit the cyclomatic complexity of each function in the file, you get a relatively "simple" file, because it is "small" and contains only "simple" functions. You may include or exclude comment/blank lines, or count only statements or only executable lines...
However, I concluded that it does not really matter in this particular application. Just limit some 'size' metric and it will serve the purpose in most cases. Later you may think about limiting the total number of lines of code per component/subsystem. It will have the same effect: a component is "simple" because it contains a "small" number of "simple" files.
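A minimal sketch of that policy, with hypothetical limits and per-method complexity values (a real tool such as Metrix++ would compute these from source):

```java
import java.util.Arrays;
import java.util.List;

// A file is "simple" when it is short AND every function in it is
// simple. The thresholds and CC values here are hypothetical inputs.
public class FileComplexityGate {
    static final int MAX_FILE_LINES = 1000;
    static final int MAX_METHOD_CC  = 10;

    static boolean isSimple(int fileLines, List<Integer> methodComplexities) {
        if (fileLines > MAX_FILE_LINES) return false;
        for (int cc : methodComplexities)
            if (cc > MAX_METHOD_CC) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSimple(420, Arrays.asList(3, 7, 9))); // true
        System.out.println(isSimple(420, Arrays.asList(3, 14)));   // false: one complex method
    }
}
```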
The post you referred to is very good. It can be extended to a broader metric, usually called a 'maintainability index'. The index is high if functions are complex, files are big and change frequently, test coverage is low, and so on (add here whatever you think defines maintainability). It is the best way I know to find hot-spots for refactoring...
Disclaimer: I maintain the Metrix++ tool, which implements the use case I explained above.

Getting Time Complexity and Space Complexity From a Program

I am creating a website (my academic project) where a user can upload program files (.cs, .php, .java); the site then compiles the program and reports its time and space complexity automatically. Is this possible? How can we calculate the complexity of a program? Is there any Java code for finding the complexity of a program, or can we get it from the compiler itself?
Determining the time and space complexity of a program is a hard problem. As others have pointed out, it is in general not even possible to determine whether a program will terminate (this is known as the halting problem).
To make a start with your project, I would advise you to look into cyclomatic complexity, which is calculated for example by the GMetrics project.
This will get you started in your exploration of the subject matter.
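To make the metric concrete: the cyclomatic complexity of a single function is essentially the number of decision points plus one. A deliberately crude toy illustration (real tools parse the code properly; this regex version ignores comments, string literals, and similar details):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy approximation of cyclomatic complexity for one method body:
// CC ~= 1 + number of decision points (branches, loops, &&, ||, ?:).
public class ToyCyclomatic {
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    public static int complexity(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) decisions++;
        return decisions + 1;
    }

    public static void main(String[] args) {
        String body = "if (a && b) { for (int i = 0; i < n; i++) { } } else { }";
        System.out.println(complexity(body)); // 1 + if + && + for = 4
    }
}
```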
Assuming you know what kind of input to give the program, you can estimate the complexity through successive iterations of actually running the program. However, general case static analysis is impossible, due to the Halting Problem.
If you can run the application multiple times on input sets of various sizes, you can develop an approximation.
In the classic case of sorting numbers, you can have the application sort a list of 2 numbers, then 4, 8, 16, 32, etc... and essentially graph the memory and time requirements for each run. Basic curve fitting will show you the growth in complexity.
Note that this is not rigorously accurate, as certain algorithms may have points at which their performance changes radically. Such a system may also get fooled by the differences between growths that "look" similar, but have vastly different properties, such as asymptotic and logarithmic curves, or exponential and polynomial curves.
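A minimal sketch of that doubling experiment, using Arrays.sort as a stand-in for the program under test (the timings are crude: JIT warm-up makes the first rounds noisy, so look at the trend of the ratios rather than individual values):

```java
import java.util.Arrays;
import java.util.Random;

// Time the target operation on inputs of doubling size and print the
// growth ratio between successive runs. A ratio near 2 suggests ~O(n),
// near 4 suggests ~O(n^2); for n log n the ratio sits slightly above 2.
public class DoublingHarness {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        long previous = -1;
        for (int n = 1 << 10; n <= 1 << 20; n <<= 1) {
            int[] data = rnd.ints(n).toArray();
            long start = System.nanoTime();
            Arrays.sort(data);
            long elapsed = System.nanoTime() - start;
            System.out.printf("n=%8d  time=%10d ns  ratio=%s%n", n, elapsed,
                    previous < 0 ? "-" : String.format("%.2f", (double) elapsed / previous));
            previous = elapsed;
        }
    }
}
```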

Tool for Java code performance analysis

There are many tools for code quality. But sometimes I need to gain performance even if the code then does not conform to code-quality rules. Is there some open source tool for this?
Thanks.
There's no tool for exactly that, but you can try out jvisualvm:
http://download.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html
It usually comes with your JDK, e.g. under C:\Program Files\Java\jdk1.6.0_21\bin.
No tool is going to tell you performance and quality. Both are hard to measure.
You can certainly use something like FindBugs or IntelliJ's Inspector to examine your code, but they'll just look for rule violations. I'm not aware of a tool that will point out when I've written code that performs badly. How will a Java code inspector know that your database has no indexes?
I can't answer you regarding code quality; others can. But when you need to gain performance, I would rather tell you how to do it than which tools to use.
There are tools, but more important than tools is understanding what you're doing.
The most important is to understand that measuring doesn't tell you what to fix to get higher performance; it only tells you how much improvement you got.
The way to improve performance is to find activities, whatever they are, that account for a significant fraction of time and can be improved.
Measuring is not finding.
Example:
I can manually sample the state of a program, several times, and see that much of the time it is doing container-class manipulations, like fetching elements, testing for end conditions, etc.
(That's the finding part.)
This can be happening in many different places in the code, so no particular routine appears to be causing a large fraction of time to be spent.
There is no particular hotspot or obvious bottleneck.
There is no "bad algorithm" or "slow routine", the kinds of thing people say they look for.
Nevertheless, I can see in those few samples that it is doing container class operations, and I can see exactly where.
If I can replace those container class operations with something else that accomplishes the same purpose, I can save time.
How much time? Up to roughly the fraction of time I saw those operations happening, and that can be quite large.
The real payoff for doing this is there can be multiple issues.
Suppose issue A costs 40% of the time, B costs 20%, and C costs 10%,
and the total time is, say, 10 seconds.
You go after A, the most obvious one.
Fixing it reduces time to about 6 seconds. (Speedup 10/6 = 1.67).
Then problem B takes a larger percent of time (2/6 = .33) so it is easier to find with samples.
Fixing it reduces time to 4 seconds (Speedup 6/4 = 1.5)
Then C is (1/4 = 25%) and is much easier to find than before.
Removing it reduces time to 3 seconds (Speedup 4/3 = 1.33).
The total speedup factor is 10/3 = 3.33.
You can look at it as the compounded product of each speedup: 10/6 * 6/4 * 4/3 = 10/3.
Now I'm dealing in numbers here, but none of these had to be measurements of time spent in localized pieces of code.
They were just rough estimates gotten from describing what was happening in a small number of detailed samples of what the program was doing.
The samples aren't really concerned with measuring.
They are concerned with exposing the problems.
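A hypothetical sketch of such manual sampling, using the JDK's Thread.getAllStackTraces() to dump what every thread is doing at intervals; a handful of these dumps, read by eye, are the "samples" described above:

```java
import java.util.Map;

// Daemon thread that periodically dumps every thread's stack.
// Reading a few of these dumps shows what the program spends its
// time doing (e.g. container-class manipulation) and exactly where.
public class StackSampler {
    public static void start(final long intervalMillis, final int samples) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < samples; i++) {
                    try { Thread.sleep(intervalMillis); }
                    catch (InterruptedException e) { return; }
                    System.out.println("=== sample " + i + " ===");
                    for (Map.Entry<Thread, StackTraceElement[]> e
                            : Thread.getAllStackTraces().entrySet()) {
                        System.out.println(e.getKey().getName());
                        for (StackTraceElement frame : e.getValue())
                            System.out.println("    at " + frame);
                    }
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }
}
```

Call StackSampler.start(500, 10) early in the program and look through the output for recurring patterns rather than exact numbers.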

Where to code this heuristic?

I want to ask a complex question.
I have to code a heuristic for my thesis. I need to do the following:
Evaluate some integral functions
Minimize functions over an interval
Do this thousands and thousands of times.
So I need a fast programming language for these jobs. Which language do you suggest? I started with Java, but computing the integrals became a problem, and I'm not sure about its speed.
Connecting Java to other software like MATLAB may be a good idea. Since I'm not sure, I'd like to hear your opinions.
Thanks!
C, Java, ... are all Turing-complete languages; they can compute the same functions with the same precision.
If you want to achieve performance goals, use C, which is a compiled, high-performance language. It can decrease your computation time by avoiding the method-call overhead and high-level features of a managed language like Java.
Anyway, remember that your implementation may affect performance more than the language you choose, because as the input grows it is the computational complexity that matters ( http://en.wikipedia.org/wiki/Computational_complexity_theory ).
It's not the programming language, it's probably your algorithm. Determine the big-O complexity of your algorithm. If you use loops in loops where you could instead use a hash lookup in a Map, your algorithm can be made n times faster.
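For example, a sketch of that loop-in-loop versus hash-lookup difference, with hypothetical example data; the hashed version does the same job in O(n+m) instead of O(n*m):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Count elements common to two lists, first naively, then with a hash set.
public class IntersectionDemo {
    // O(n*m): for each element of a, scan all of b
    static long countCommonNaive(List<Integer> a, List<Integer> b) {
        long count = 0;
        for (int x : a)
            for (int y : b)
                if (x == y) { count++; break; }
        return count;
    }

    // O(n+m): build a hash set once, then each membership test is O(1)
    static long countCommonHashed(List<Integer> a, List<Integer> b) {
        Set<Integer> setB = new HashSet<>(b);
        long count = 0;
        for (int x : a)
            if (setB.contains(x)) count++;
        return count;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> b = Arrays.asList(4, 5, 6, 7);
        System.out.println(countCommonNaive(a, b));  // 2
        System.out.println(countCommonHashed(a, b)); // 2
    }
}
```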
Note: modern JVMs (JDK 1.5 or 1.6) compile just-in-time to native code (they are not interpreted) for the specific OS, OS version, and hardware architecture. You could try the -server flag to JIT even more aggressively (at the cost of an even longer initialization time).
Do this thousands and thousands of times.
Are you sure it's not more, something like 10^1000 times instead? Try accurately calculating how many times you need to run that loop; it might surprise you. The types of problems heuristics are used on tend to have a really big search space.
Before you start switching languages, I'd first try to do the following things:
Find the best available algorithms.
Find available implementations of those algorithms usable from your language.
There are e.g. scientific libraries for Java. Try to use these libraries.
If they are not fast enough, investigate whether there is anything to be done about it. Is your problem more specific than what the library assumes? Are you able to improve the algorithm based on that knowledge?
What is it that takes so much time/memory? Is this really related to your language? Try to avoid measuring JVM start-up time instead of the time it actually spent calculating for you.
Then I'd consider switching languages. But don't expect it to be easy to beat optimized third-party Java libraries in C.
Order of the algorithm
Typically, switching languages only reduces the time required by a constant factor. Let's say you can double the speed using C; but if your algorithm is O(n^2), it will take four times as long when you double the data, no matter the language.
And the JVM can optimize a lot of things, with good results.
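A tiny illustration of that point: counting the units of work a naive O(n^2) pairwise loop performs shows the ~4x growth per doubling of n, which no choice of language can remove:

```java
// Counts pair comparisons in a naive O(n^2) duplicate check to show
// that doubling n roughly quadruples the work, whatever the language.
public class GrowthDemo {
    static long countPairOps(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                ops++;              // one "unit" of work per pair
        return ops;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.printf("n=%5d  ops=%,d%n", n, countPairOps(n));
            // each doubling of n multiplies ops by ~4: the constant
            // factor a faster language buys cannot change that
        }
    }
}
```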
Some possible optimizations in Java
If you have functions that are called many times, make them final, and do the same for entire classes. The compiler will know that it can inline the method code, avoiding the creation of method-call stack frames for those calls.
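A minimal sketch of that pattern (note that modern JITs can often inline non-final methods as well, once they see no override is loaded, so treat final as a hint rather than a guarantee):

```java
// Hot code marked final so the JIT can inline it without guards.
public final class Vec2 {                 // final class: no subclassing
    private final double x, y;

    public Vec2(double x, double y) { this.x = x; this.y = y; }

    public final double dot(Vec2 o) {     // final method: inlining candidate
        return x * o.x + y * o.y;
    }
}
```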
