What standard optimization refactoring can I do to my Java application?

I have a fairly large Java application. It's written quite poorly, and I suspect there are a lot of simple things I can do that will clean it up a bit and improve performance.
For example, I recently found String.matches(regex) used quite a lot in loops with the same regex, so I've replaced those calls with precompiled Patterns. I use FindBugs (which is great, BTW), but it didn't spot this problem, and I'm limited in the tools I can use here at work.
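For reference, the refactoring looked something like this (a minimal sketch; the class name, regex, and method are just illustrative):

    import java.util.List;
    import java.util.regex.Pattern;

    class DateFilter {
        // Compiled once; Pattern instances are immutable and thread-safe.
        private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

        // Before: line.matches("\\d{4}-\\d{2}-\\d{2}") recompiled the regex
        // on every call. After: reuse the precompiled Pattern.
        static long countDates(List<String> lines) {
            long count = 0;
            for (String line : lines) {
                if (DATE.matcher(line).matches()) {
                    count++;
                }
            }
            return count;
        }
    }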
Is there anything else simple like this that I should look at?

Have a look at these different tools:
FindBugs (which you already use)
Checkstyle
PMD
...
I also suggest having a look at Sonar, which is a great wrapper for all of these tools.
All these tools are free!
Generally speaking, they will help you improve not the performance but the quality of the code.

First of all, make it a well-written application. In my experience, most of the performance benefit comes from not doing stupid things rather than from doing clever optimisations.
Once you have a well-written application, that is the time to run a profiler and optimise only what actually matters.

Is performance an issue? If so, I wouldn't redo anything until I'd profiled the code under a wide variety of conditions and had some hard data in hand to tell me where time was being spent. You might be changing things that have no benefit at all.
I'd be concerned about thread safety. How much attention was paid to that?
If you're going to refactor, start by writing JUnit tests first. They'll help you become familiar with the code and provide you with a safety net: the tests must pass before and after your changes.
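A minimal sketch of such a safety-net (characterization) test, using JUnit 4; InvoiceCalculator and the expected value are hypothetical, the point is to pin down what the code does today so refactorings that change behaviour fail the build:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class InvoiceCalculatorTest {

        // InvoiceCalculator and the expected value are hypothetical;
        // record whatever the current code actually returns.
        @Test
        public void totalMatchesCurrentBehaviour() {
            InvoiceCalculator calc = new InvoiceCalculator();
            assertEquals(119.00, calc.totalWithTax(100.00, 0.19), 0.001);
        }
    }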
Most importantly, don't undertake a big refactoring just because you consider it a mess. Your team and customer should be on board with what you're doing before you start. Have you communicated your (admittedly good) ideas to others? Software is a team sport; communication is key.

One very important thing when refactoring to improve an application is to first refactor the code so that the source looks at least decent. Once that's done, never guess where to optimize; always measure where the bottlenecks are and concentrate on solving those problems. Programmers generally like to do things such as replacing recursive methods with loops or choosing the exact right sorting algorithm, which very often makes very little difference at all. So make sure to focus on the correct area (are you using too much CPU? memory? too many threads? holding locks too long?).
EDIT: As one of the other posters wrote, also make sure that others on your team and your boss consider this work worth doing; if it's not an issue for them, they probably couldn't care less.

Run the code through a profiler (check both speed and memory). Once you find where it is slow (usually not where you think it is), figure out what you can do to speed it up.
Another useful thing (if you are a little brave) is to use NetBeans 7.0 M2 (don't panic, their non-release versions are generally very stable); there is a plugin called "Jackpot" which searches your code for possible refactorings. Some of them have to do with performance, but I don't think any of them will make a radical difference in speed.
Generally speaking, keep the code clean and easy to read and it'll be fast. When it isn't fast, you will have an easier time speeding it up than if it were a mess.
What I did once, when writing something I knew had to be fast (it was code to parse class files), was to run the profiler after each change. At one step I thought I would reduce memory by calling String.intern() to make sure that all of the Strings were pooled together. When I added the intern() call, the memory did go down a bit, but the time went up by some huge amount (String.intern() is needlessly slow, or at least it was a few years ago). At that point I knew that what I had just done was unacceptably slow, and I undid the change.
I don't recommend doing that in general development, but running the code through a profiler once a day or once a week just to see how things are isn't a productivity killer.

If you are into books, I would highly recommend reading Effective Java by Josh Bloch.

Related

Detecting and pinpointing performance regressions

Are there any known techniques (and resources related to them, like research papers or blog entries) that describe how to dynamically and programmatically detect the part of the code that caused a performance regression, ideally on the JVM or some other virtual machine environment (where techniques such as instrumentation can be applied relatively easily)?
In particular, with a large codebase and a large number of committers to a project (as in, for example, an OS, a language, or some framework), it is sometimes hard to find the change that caused a performance regression. A paper such as this one goes a long way toward describing how to detect performance regressions (e.g. in a certain snippet of code), but not how to dynamically find the piece of code in the project that was changed by some commit and caused the regression.
I was thinking that this might be done by instrumenting pieces of the program to detect the exact method which causes the regression, or at least narrowing the range of possible causes of the performance regression.
Does anyone know about anything written about this, or any project using such performance regression detection techniques?
EDIT:
I was referring to something along these lines, but doing further analysis into the codebase itself.
Perhaps not entirely what you are asking, but on a project I've worked on with extreme performance requirements, we wrote performance tests using our unit testing framework, and glued them into our continuous integration environment.
This meant that every check-in, our CI server would run tests that validated we hadn't slowed down the functionality beyond our acceptable boundaries.
It wasn't perfect - but it did allow us to keep an eye on our key performance statistics over time, and it caught check-ins that affected the performance.
Defining "acceptable boundaries" for performance is more an art than a science - in our CI-driven tests, we took a fairly simple approach, based on the hardware specification; we would fail the build if the performance tests exceeded a response time of more than 1 second with 100 concurrent users. This caught a bunch of lowhanging fruit performance issues, and gave us a decent level of confidence on "production" hardware.
We explicitly didn't run these tests before check-in, as that would slow down the development cycle - forcing a developer to run through fairly long-running tests before checking in encourages them not to check in too often. We also weren't confident we'd get meaningful results without deploying to known hardware.
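A crude sketch of what such a CI gate can look like, using JUnit 4's timeout attribute (SearchService and the 1-second budget are illustrative; real thresholds depend on your hardware):

    import org.junit.Test;

    public class SearchPerformanceTest {

        // Fails the test (and hence the CI build) if the call exceeds
        // its budget. SearchService is a hypothetical placeholder.
        @Test(timeout = 1000) // milliseconds
        public void searchStaysWithinBudget() {
            new SearchService().search("some representative query");
        }
    }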
With tools like YourKit you can take a snapshot of the performance breakdown of a test or application. If you run the application again, you can compare performance breakdowns to find differences.
Performance profiling is more of an art than a science. I don't believe you will find a tool which tells you exactly what the problem is, you have to use your judgement.
For example, say you have a method which is taking much longer than it used to. Is it because the method has changed, because it is being called in a different way, or because it is called much more often? You have to use some judgement of your own.
JProfiler lets you see a list of instrumented methods, which you can sort by average execution time, inherent time, number of invocations, etc. I think that if this information is saved across releases, one can get some insight into regressions. Of course, the profiling data will not be comparable if the tests are not exactly the same.
Some people are aware of a technique for finding (as opposed to measuring) the cause of excess time being taken.
It's simple, but it's very effective.
Essentially it is this:
If the code is slow it's because it's spending some fraction F (like 20%, 50%, or 90%) of its time doing something X unnecessary, in the sense that if you knew what it was, you'd blow it away, and save that fraction of time.
While it is being slow, at any random nanosecond the probability that it is doing X is F.
So just drop in on it, a few times, and ask it what it's doing.
And ask it why it's doing it.
Typical apps are spending nearly all their time either waiting for some I/O to complete, or some library function to return.
If there is something in your program taking too much time (and there is), it is almost certainly one or a few function calls, that you will find on the call stack, being done for lousy reasons.
Here's more on that subject.
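On the JVM, one way to "drop in" is to take a few thread dumps (jstack <pid> from outside the process, or something like this sketch from inside it) and look at what's on the call stacks:

    import java.util.Map;

    // A poor man's sampler: print every thread's stack so you can see,
    // right now, what the program is actually doing. Run it a few times
    // while the program is slow and look for the recurring calls.
    public class StackSnapshot {
        public static void main(String[] args) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                System.out.println(e.getKey().getName());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }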

Does attaching a profiler cause some things to run slower than others?

Is it possible that attaching a profiler to a JVM (let's say VisualVM) could make some methods run slower while not affecting others, thus causing a skew in the results that makes it look like a certain piece of code is a hotspot when in fact it's not? I'll ask specifically about reflection calls as an example. I'm running some code that shows a lot of time spent in Spring AOP calls (specifically invokeJoinpointUsingReflection), which the author says runs fine in testing (using an in-code microbenchmark); but when they profiled it, this method appeared to take longer than other, non-reflection methods. (Sorry if that's a little unclear.) So it got me wondering whether the profiler could really have this effect and lead a developer down a false trail. Feel free to answer with any examples; the reflection part is just my example.
Profilers regularly give misleading information, but in general they are usually right. Where they tend to skew the results is in very simple methods which might have been further optimised if profiling weren't enabled.
If in doubt, I suggest you use another profiler, such as YourKit (the evaluation version should be fine). It has more lightweight recording, but can have the same issues.
Heisenberg famously observed that collecting information from a system always disturbs it, so you can't get an undisturbed observation. (Thus the software term, "Heisenbug"). Yes, collecting profiling information can cause the actual performance to be changed in ways that will misdirect you.
Whether that is true in a significant way for your particular JVM or profiler, and how much disturbance occurs, is a matter of engineering.
Most profilers are sample based, and thus the more data you collect, the more accurate the results are. As far as I know, there is no bias for or against methods written purely in Java.
Certain profilers require a calibration step, e.g. NetBeans and VisualVM. You might verify the vintage and settings for your chosen profiler.

Java Parallel Programming

I need to parallelize a CPU-intensive Java application on my multicore desktop, but I am not so comfortable with threads programming. I looked at Scala, but that would imply learning a new language, which is really time consuming. I also looked at the Ateji PX Java parallel extensions, which seem very easy to use, but I have not yet had a chance to evaluate them. Would anyone recommend them? Other suggestions welcome.
Thanks in advance for your help
Bill
I would suggest you try the built-in ExecutorService for distributing multiple tasks across multiple threads/cores. Do you have any requirements which this might not do for you?
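A minimal sketch of the pattern (the task body is just a placeholder for your CPU-intensive work):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelWork {
        public static void main(String[] args) throws Exception {
            // One thread per core; independent tasks need no synchronization.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (int i = 0; i < 100; i++) {
                final int chunk = i;
                results.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        return expensiveComputation(chunk); // placeholder
                    }
                }));
            }

            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that task is done
            }
            pool.shutdown();
            System.out.println(total);
        }

        // Stand-in for the real CPU-intensive work.
        static long expensiveComputation(int chunk) {
            long sum = 0;
            for (int i = 0; i < 1000000; i++) sum += (long) chunk * i;
            return sum;
        }
    }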
The Java concurrency utilities:
http://download.oracle.com/javase/1.5.0/docs/guide/concurrency/overview.html
make parallel programming in Java even easier than it already was. I would suggest starting there: if you are uncomfortable with that level of working with threads, I would think twice about proceeding further. Parallelizing anything requires some level of technical comfort with how concurrent computation is done and coordinated. In my opinion, it can't get much easier than that framework, which is part of the reason why you see so few alternatives.
Second, the main thing you should think about is what the unit of work is for parallelization. If your units of work are independent (i.e., each parallel task does not affect the others), this is generally far easier, because you don't need to worry about much (or any) synchronization at all. Put effort into modelling the problem so that the computation is as independent as possible. If you model it well, you will almost certainly reduce the lines of code (which reduces errors, etc.).
Admittedly, frameworks that automatically parallelize for you are less error prone, but can be suboptimal if your model unit of work doesn't play to their parallelization scheme.
I am the lead developer of Ateji PX. As you mention, guaranteeing thread safety is an important topic. It is also a very difficult one, and there's not much help out there besides hand-written and hand-checked @ThreadSafe annotations. See e.g. "The Problem with Threads".
We are currently working on a parallel verifier for Ateji PX. This has become possible because parallelism in Ateji PX is compositional (unlike threads) and based on a sound mathematical foundation, namely pi-calculus. Even without a tool, experience shows that expressing parallelism in an intuitive and compositional way makes it much easier to "think parallel" and catch errors earlier.
I browsed quickly through the Ateji PX web site. It seems to be a nice product, but I'm afraid you will be disappointed at some point, since Ateji PX only provides an intuitive, simple way of performing high-level parallel operations such as distributing the workload across several workers, creating rendezvous points between parallel tasks, etc. However, as you can read in the FAQ section "How do you detect and prevent data dependencies?", Ateji PX does not ensure that the underlying code is thread safe. So at any rate you'll still need skills in Java thread programming.
Edit:
Also consider that when maintenance time comes and you are not available to do it, it will be easier to find a contractor, employee or trainee with skills in standard Java multithreaded programming than in Ateji PX.
Last word: there's a free 30-day evaluation; try it.
Don't worry, Java 7 is coming with the Fork/Join framework by Doug Lea for parallel processing.
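A small sketch of what that looks like (summing an array with a RecursiveTask; the threshold is arbitrary):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Recursively split the range until the chunks are small, then sum
    // the chunks in parallel across the available cores.
    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10000;
        private final long[] data;
        private final int from, to;

        SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) / 2;
            SumTask left = new SumTask(data, from, mid);
            left.fork();                         // run the left half asynchronously
            long rightSum = new SumTask(data, mid, to).compute();
            return rightSum + left.join();
        }

        public static void main(String[] args) {
            long[] data = new long[1000000];
            for (int i = 0; i < data.length; i++) data[i] = i;
            long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(sum);
        }
    }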

I'm asked to tune a slow-starting app and commit to a specific improvement

I'm asked to shorten the start-up period of a slow-starting app, and I also have to commit to my managers to the amount of time I will cut from the start-up - something like 10-20 seconds.
As I'm new at my company, I said I could commit to a timeframe of months (it's a big server, I'm new, and I plan to do lazy loading plus performance tuning).
That answer was not accepted. Instead, I was asked to build some kind of cache that holds the important data on another server, so that when my server starts up it fetches all its data from that cache. I find that something of a workaround, and I don't really like it.
Do you like it? What do you think I should do?
PS: when I profiled the app I saw many small issues that make the start-up long (around 2 minutes); fixing them all and implementing lazy loading would not be a short process.
Any kind of suggestion would help.
The language is Java.
Thanks
Rule one of performance optimisation: measure it. Get hard figures. At each stage of optimisation measure the performance gain/loss/lack of change. You (and your managers) are not in a position to say that a particular optimisation will or will not work before you try it and measure it. You can always ask to test & measure a solution before implementing it.
Rule two of performance optimisation (or anything really): choose your battles. Please bear in mind that your managers may be very experienced with the system in question, and may know the correct solution already; there may be other things (politics) involved as well, so don't put your position at risk by butting heads at this point.
I agree with MatthieuF. The most important thing to do is to measure it. Then you need to analyze the measurements to see which parts are most costly and also which resource (memory, CPU, network, etc) is the bottleneck.
If you know these answers you can propose solutions. You might be able to create small tests (proof of concepts) of your solution so you can report back early to your managers.
There can be all kinds of solutions. For example, simply buying more hardware might be the best way to go. It's also possible that buying more hardware will have no effect and you need to make modifications: optimizing the software, the database, or other software; choosing better algorithms; introducing caching (at the expense of more memory usage); or introducing multithreading to take advantage of multiple CPU cores. You can also make modifications to the "surroundings" of your application, such as the configuration or version of your operating system, Java virtual machine, application server, database server and others. All of these components have settings which can affect performance.
Again, it's very important to measure, identify the problem, think of a solution, build the solution (maybe as a proof of concept), and measure whether the solution is working. Don't fall into the trap of choosing a solution before you know the problem.
It sounds to me as if you've come in at a relatively junior position, and your managers don't (yet) trust your abilities and judgment.
I don't understand why they would want you to commit to a particular speed-up without knowing if it was achievable.
Maybe they really understand the code and its problems, and know that a certain level of speed-up is achievable. In this case, they should have a good idea how to do it ... so try and get them to tell you. Even if their ideas are not great, you will get credit for at least giving them a try.
Maybe they are just trying to apply pressure (or pass on pressure applied to them) in order to get you to work harder. In this case, I'd probably give them a worthwhile but conservative estimate, then spend some time investigating the problems more thoroughly. And if after a few days of research you find that your "off the cuff" estimates are significantly off the mark, go back to the managers with a more accurate estimate.
On the technical side, a two-minute start-up time sounds rather excessive to me. What is the application doing in all that time? Loading data structures from files or a database? Recalculating things? Profiling may help answer some of these questions, but you also need to understand the system's architecture to make sense of the profiling stats.
Without knowing what the real issues are here, I'd suggest trying to get the service to become available early while doing some of the less critical initialization in the background, or lazily. (And your managers' idea of caching some important data may turn out to be a good one, if viewed in this light.) Alternatively, I'd see if it was feasible to implement a "hot standby" for the system, or replicate it in such a way that allowed you to reduce startup times.
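A minimal sketch of the background-loading idea (ReferenceData and loadFromDatabase() are hypothetical stand-ins for whatever dominates your start-up):

    import java.util.concurrent.Callable;
    import java.util.concurrent.FutureTask;

    // Start serving immediately; load the expensive data in the
    // background. Callers block only if they need it before it's ready.
    public class LazyStartup {

        static class ReferenceData {
            static ReferenceData loadFromDatabase() {
                return new ReferenceData(); // stand-in for the slow part
            }
        }

        private static final FutureTask<ReferenceData> REF_DATA =
                new FutureTask<ReferenceData>(new Callable<ReferenceData>() {
                    public ReferenceData call() {
                        return ReferenceData.loadFromDatabase();
                    }
                });

        public static void main(String[] args) throws Exception {
            new Thread(REF_DATA, "startup-loader").start(); // kick off early
            // ... the server can start accepting requests here ...
            ReferenceData data = REF_DATA.get(); // blocks only if still loading
            System.out.println("reference data ready: " + data);
        }
    }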

Is there any benchmark which compares PHP and JSP?

I don't want to start a holy war. I just want to know whether such a benchmark exists. Or maybe you can say something about this thread from your experience?
I stumbled over the language shootout again yesterday; there you can compare some performance characteristics of the two languages running different programs. I didn't find a benchmark for web performance, though.
Fact is that interpreted languages like PHP are always slower than compiled languages. JSP files get compiled too, so once the server is up and running and nothing gets changed anymore, the performance will be better than that of a PHP script which gets interpreted every time a request comes in.
On the other hand, your first performance bottleneck will probably be database speed anyway. And there are still lots of other ways to improve performance, like precompiling your PHP scripts, externalizing heavy calculations into C, etc. Compared to the monster that is Java web development, PHP is easy to learn and get along with. In the end, if you have the choice, you should go with the language you are most comfortable with. If you are starting a new project, you may not even know whether all these performance considerations will ever matter, because you don't have the users yet and just want to get your application out there quickly.
While Daff's explanation of PHP vs JSP is technically wrong, the essential gist of his post is correct: choose the language that is best for you. Only very rarely will you find yourself in a position where performance really matters badly. At that point, you are much more likely to be able to make significant architectural optimizations in your language of choice - and these optimizations are likely to have significantly more effect than the difference between PHP and JSP.
One of the core rules of programming has always been to avoid premature optimization - if for no other reason than because until you're actually under pressure you don't know what you actually need to optimize, nor do you have a means of determining whether it worked.
In the event that you believe there's a possibility you may face performance issues, no website can help you. The most vital thing is to create your own load testing benchmarks that represent the specifics of how your site works, simulating how your users do things. Only once you have done that are you able to move on to tweaking your code, implementing things like caching, load balancing, data and request partitioning etc with any confidence that the changes you are making are having a positive impact on your site performance.
There are books specifically about the process of optimization in general, but the key sequence is this:
1. Benchmark.
2. Make a change.
3. Benchmark again to see whether the change indicates a performance improvement.
4. Go live.
5. Evaluate the live response to see if the benchmark's prediction was correct.
(People forget #5 a lot and cause themselves grief.)
If you're going to spend time worrying about performance, spend time setting up that sequence; don't spend time worrying about your language choice.
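For step 1, a minimal sketch of a home-grown load test in the spirit described above (hitSite() is a placeholder for a representative user request; dedicated tools like JMeter do this better):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Simulate N concurrent users and report the average response time.
    public class CrudeLoadTest {
        public static void main(String[] args) throws Exception {
            final int users = 100;
            final AtomicLong totalNanos = new AtomicLong();
            ExecutorService pool = Executors.newFixedThreadPool(users);

            for (int i = 0; i < users; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        long start = System.nanoTime();
                        hitSite(); // placeholder: issue one representative request
                        totalNanos.addAndGet(System.nanoTime() - start);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
            System.out.printf("average response: %.1f ms%n",
                    totalNanos.get() / (double) users / 1e6);
        }

        static void hitSite() {
            // Stand-in for an HTTP request against the site under test.
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        }
    }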
