Java reduce CPU usage - java

Greets-
We've got a few nutters at work who enjoy using
while(true) { //Code }
in their code. As you can imagine this maxes out the CPU. Does anyone know of ways to reduce the CPU utilization so that other people can use the server as well?
The code itself is just constantly polling the internet for updates on sites. Therefore I'd imagine a little sleep would greatly reduce the CPU usage.
Also, all manipulation is being done with String objects (Java); does anyone know how much StringBuilder would reduce the overhead by?
Thanks for any pointers

A lot of the "folk wisdom" about StringBuilder is incorrect. For example, changing this:
String s = s1 + ":" + s2 + ":" + s3;
to this:
StringBuilder sb = new StringBuilder(s1);
sb.append(":");
sb.append(s2);
sb.append(":");
sb.append(s3);
String s = sb.toString();
probably won't make it go any faster. This is because the Java compiler actually translates the concatenation sequence into an equivalent sequence of appends to a temporary StringBuilder. Unless you are concatenating Strings in a loop, you are better off just using the + operator. Your code will be easier to read.
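Where an explicit StringBuilder does pay off is concatenation inside a loop, where the + form would build a fresh intermediate String (and a throwaway StringBuilder) on every iteration. A minimal sketch, assuming parts is some collection of Strings:
StringBuilder sb = new StringBuilder();
for (String part : parts) {
    sb.append(part).append(':');   // reuses one builder instead of allocating per iteration
}
String joined = sb.toString();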
The other point that should be made is that you should use a profiler to identify the places in your code that would benefit from work to improve performance. Most developers' intuition about what is worth optimizing is not that reliable.

I'll start off with your second question. I would like to agree with the rest that StringBuilder vs. String is very much dependent on the particular string manipulations. I had "benchmarked" this once, and generally speaking, as the amount of new string allocations went up (usually in the form of concatenations), the overall execution time went up. I won't go into the details and will just say that StringBuilder turned out to be the most efficient over time when compared to String, StringBuffer, String.format(), MessageFormat...
My rule of thumb is that whenever I wish to concatenate more than 3 strings together I always use StringBuilder.
As for your first question: we had a requirement to bring CPU usage down to 5%. Not an easy task. We used Spring's AOP mechanism to add a Thread.sleep() before any execution of a CPU-intensive method. The Thread.sleep() would get invoked only if some limit had been exceeded. I am sorry to say that the computation of this limit is not that simple. And even sorrier to say that I still have not obtained permission to post it on the net. So this is just in order to put you on an interesting but complicated track that has proven to work over time.
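For the general shape of such an aspect (not the original code: the pointcut, package name, sleep duration and limitExceeded() check are all placeholder assumptions), something along these lines could be wired up with Spring AOP:
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class CpuThrottleAspect {

    // Hypothetical pointcut: wrap every method in the CPU-intensive package.
    @Around("execution(* com.example.poller..*(..))")
    public Object throttle(ProceedingJoinPoint pjp) throws Throwable {
        if (limitExceeded()) {      // placeholder for the (unpublished) CPU-budget check
            Thread.sleep(50);       // back off before letting the heavy method run
        }
        return pjp.proceed();
    }

    private boolean limitExceeded() {
        return false;               // stand-in; the real computation is not shown here
    }
}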

How often do those sites update? You're probably really annoying the hosts. Just stick a Thread.sleep(60 * 1000); at the end of the loop and you can avoid this. That'll poll them once a minute—surely that's enough?

Make it wait some time before firing again, like this:
while (true) {
    // Code
    try {
        Thread.sleep(1000); // wait 1 second
    } catch (InterruptedException e) {
        break; // stop if interrupted
    }
}
As for the second question, it would reduce memory and possibly CPU usage as well, but the gains really depend on what's happening with those strings.

A sleep would reduce the CPU usage. As for the StringBuilders they could lower the memory usage and improve performance.

You could also try
Thread.yield()

If their code is in a separate JVM and is running on Linux or some other Unix, require them to run their program at niceness 19 (the lowest scheduling priority).
Or you could have them run inside a virtual machine like VirtualBox and use it to limit the processor utilization.
Or you could fire them if they continue to burn cycles like that instead of using some event-driven model or Thread.sleep.

Never assume something you see in the code is bad (or good) for performance. Performance issues are notoriously deceptive.
Here's an easy way to see for sure what is taking the time.

What sites is the code polling? Do they support an RSS push? If they do, there is no need to poll the sites at all; instead, plug in a module that waits for updates. The current polling method then blocks using wait(), and the new module notifies the waiting object using notifyAll().
Admittedly this is some extra work, but it saves a whole lot of bandwidth and computation.
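A minimal sketch of that hand-off (UpdateMonitor and the method names are illustrative, not from the original post):
class UpdateMonitor {
    private boolean updated = false;

    public synchronized void awaitUpdate() throws InterruptedException {
        while (!updated) {
            wait();            // the polling thread sleeps here instead of spinning
        }
        updated = false;       // consume the update
    }

    public synchronized void signalUpdate() {
        updated = true;
        notifyAll();           // wake any thread blocked in awaitUpdate()
    }
}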

Related

String.split() temporary objects and Garbage Collect

In my project, we have a requirement to read a very large file, where each line has identifiers separated by a special character ("|"). Unfortunately I can't use parallelism, since it is necessary to validate the last character of a line against the first character of the next line to decide whether or not it will be extracted. Anyway, the requirement is very simple: break the line into tokens, analyze them and store only some of them in memory. The code is very simple, something like below:
final LineIterator iterator = FileUtils.lineIterator(file);
while (iterator.hasNext()) {
    final String[] tokens = iterator.nextLine().split("\\|");
    // process
}
But this little piece of code is very, very inefficient. The split() method generates too many temporary objects that are not being collected (as best explained here: http://chrononsystems.com/blog/hidden-evils-of-javas-stringsplit-and-stringr).
For comparison purposes: a 5 MB file was using around 35 MB of memory by the end of processing.
I tested some alternatives like:
Using a pre-compiled pattern (Performance of StringTokenizer class vs. split method in Java)
Using Guava's Splitter (Java split String performances)
Optimizing String storage (http://java-performance.info/string-packing-converting-characters-to-bytes/)
Using optimized collections (http://blog.takipi.com/5-coding-hacks-to-reduce-gc-overhead)
But none of them appears to be efficient enough. Using JProfiler, I could see that the amount of memory used by temporary objects is too high (35 MB used, but only 15 MB is actually being used by valid objects).
Then I decided to run a simple test: after every 50,000 lines read, call System.gc() explicitly. At the end of the process, memory usage had decreased from 35 MB to 16 MB. I tested this many, many times, and always got the same result.
I know that invoking System.gc() is bad practice (as indicated in Why is it bad practice to call System.gc()?). But is there any other alternative in a scenario where the split() method could be invoked millions of times?
[UPDATE]
I use a 5 MB file only for test purposes, but the system should process much larger files (500 MB ~ 1 GB).
The first and most important thing to say here is: don't worry about it. The JVM is consuming 35 MB of RAM because its configuration says that's a low enough amount. When its highly efficient GC algorithm decides it's time, it will sweep all those objects away, no problem.
If you really want to, you can invoke Java with memory management options (e.g. java -Xmx64m) -- I suggest it's not worth doing unless you're running on very limited hardware.
However, if you really want to avoid allocating an array of String each time you process a line, there are many ways to do so.
One way is to use a StringTokenizer:
StringTokenizer st = new StringTokenizer(line, "|");
while (st.hasMoreTokens()) {
    process(st.nextToken());   // returns each token as a String
}
You could also avoid consuming a line at a time. Get your file as a stream, use a StreamTokenizer, and consume one token at a time in this way.
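A rough sketch of that stream-based idea: consume one token at a time instead of building a String[] per line. The character-class setup and the hypothetical process(...) call are assumptions, not a drop-in solution (in particular, it ignores the line-boundary validation you mentioned):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

void tokenize(File file) throws IOException {
    try (Reader reader = new BufferedReader(new FileReader(file))) {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.resetSyntax();
        st.wordChars(33, 255);        // printable characters form tokens...
        st.whitespaceChars(0, 32);    // ...whitespace/control characters separate them
        st.ordinaryChar('|');         // '|' acts as a delimiter, not part of a token
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                process(st.sval);     // one token at a time, no per-line array
            }
        }
    }
}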
Read the API docs for Scanner, BufferedInputStream, Reader -- there are lots of choices in this area, because you're doing something fundamental.
However, none of these will cause Java to GC sooner or more aggressively. If the JRE doesn't consider itself short of memory, it won't collect any garbage.
Try writing something like this:
import java.util.Random;

public class TemporaryObjectDemo {
    public static void main(String[] args) {
        Random r = new Random();
        Integer x;
        while (true) {
            x = Integer.valueOf(r.nextInt()); // creates a new "temporary object" each time
        }
    }
}
Run it and watch your JVM's heap size as it runs (put a sleep in if the usage shoots up too quickly to see). Each time around the loop, Java creates what you call a 'temporary object' of type Integer. All of these stay in the heap until the GC decides it needs to clear them away. You'll see that it won't do this until it reaches a certain level. But when it reaches that level, it will do a good job of ensuring that its limits are never exceeded.
You should adjust your way of analyzing situations. While the article about the regex compilation under the hood is correct in general, it doesn’t apply here. When you look at the source code of String.split(String), you’ll see that it just delegates to String.split(String,int) which has a special code path for patterns consisting of just one literal character, including escaped ones like your \|.
The only temporary object created within that code path is an ArrayList. The regex package is not involved at all; this fact might help you understand why precompiling a regex pattern did not improve the performance here.
When you use a profiler to come to the conclusion that there are too many objects, you should also use it to find out what kinds of objects there are and where they originate, instead of guessing wildly.
But it's not clear why you are complaining at all. You can configure the JVM to use a certain maximum amount of memory. As long as that maximum has not been reached, the JVM just does what you told it to, using that memory rather than wasting CPU cycles just to avoid using the available memory. Where's the sense in not using the available memory?

Simple Multi-Threading in Java

Currently, I'm running on a thread-less model that isn't working simply because I'm running out of memory before I can process the data I'm being handed. I've made all the changes that I can to optimize the code, and it's still just not quite quick enough.
Clearly I should move on to a threaded model. I'm wondering what the simplest, easiest way to do the following is:
The main thread passes some info to the worker
That worker performs some work that I'll refactor out of the main method
The workers will disappear and new ones will be instantiated when needed
I've never worked with Java threading, and from what I've read it seems pretty complicated, even if what I'm looking for seems pretty simple.
If you have multiple independent units of work of equal priority, the best solution is generally some sort of work queue, where a limited number of threads (the number chosen to optimize performance) sit in a while(true) loop dequeuing work units from the queue and executing them.
Generally the optimum number of threads is going to be the number of processors +/- 1, though in some cases a larger number will be optimal if the threads tend to get stalled by disk I/O requests or some such.
But keep in mind that tuning the entire system may be required. E.g., you may need more disk arms, and you may certainly need more RAM.
I'd start by having a read through Java Concurrency as a refresher ;)
In particular, I would spend some time getting to know the Executors API, as it will do most of what you've described without a lot of the overhead of dealing with too many locks ;)
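As a minimal sketch of that work-queue shape using the Executors API (WorkUnit and handle(...) are hypothetical placeholders for your own work items):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

void processAll(List<WorkUnit> workUnits) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()); // roughly "number of processors"

    for (WorkUnit unit : workUnits) {
        pool.submit(() -> handle(unit));   // the pool queues and runs each unit
    }

    pool.shutdown();                              // stop accepting new work
    pool.awaitTermination(1, TimeUnit.HOURS);     // wait for queued work to finish
}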
Distributing the memory consumption across multiple threads will not change overall memory consumption. From what I read out of your question, I would like to step forward and tell you: increase the heap of the JVM; this will help. It looks like you have to optimize the Java startup parameters, not your code. If I am wrong, then you will have to buffer the data. To disk! Not to a thread in the same memory model.

Is a concurrency approach a good idea for speeding a long iteration?

I have an app that does an iteration to create points on a graph over time.
While I'm gathering data for each point across the x-axis I also must execute a recursive lookup, which effectively means I have a loop inside another loop. This is not scaling too well. I don't see a lot of examples of using a "divide and conquer" solution on iterations. I was thinking of using Java's Executor concurrency framework to run each loop in its own thread, await the answers, gather the results and return them. The initial test results I'm getting don't seem that much faster. I know I should show some code, but what I want to know first is whether this approach has merit compared to better methods that I may not be familiar with.
Thanks in advance!
Adding some groovyish/javaish pseudo code to assist thinking about this:
class Car {
    id
    model
    make
    weight
}
for (number in listOfImportantCarIDs) {
    Car car = carsMap.get(number)      // find the car we care about
    String maker = car.make            // get its 'parent'
    // count all related cars
    Iterator<Car> allcars = carsMap.values().iterator();
    while (allcars.hasNext()) {
        Car aCar = allcars.next();
        if (maker.equals(aCar.make)) {
            totalCarCount++;                                                // increment total related cars
            totalWeightofAllCars = totalWeightofAllCars.add(aCar.getWeight()); // add weight to running total
            countedManufacturers.add(maker);                                // a ghetto cache to prevent double counting
        }
    }
}
Using threads will speed up your application by some small constant factor, at the price of significant added complexity for inter-thread communication. If a better algorithm exists, that can save you orders of magnitude. Therefore, I strongly recommend that you first verify that there indeed isn't a sub-quadratic algorithm to solve your problem.
Perhaps if you detailed the problem you are trying to solve and your current solution, we could assist here.
Edit: Goodness, finding a much better algorithm isn't hard at all:
Map<String, Stats> stats = new HashMap<>();
for (Car car : cars) {
    Stats s = stats.get(car.make);
    if (s == null) {
        s = new Stats();
        stats.put(car.make, s);
    }
    s.count++;
    s.totalWeight = s.totalWeight.add(car.weight);
}
for (Car car : importantCars) {
    Stats s = stats.get(car.make);
    // use s.count and s.totalWeight for this car's maker
}
There really is no need to iterate, for each important car, over all cars just to find those with identical maker ...
If the task of computing the y value for each x value is completely independent of the others, then I would think this is a good fit for an executor service. Most performance questions simply require measurement, even after great lengths of reasoning about the solution. The optimal number of threads for CPU-bound problems is p or p+1 (where p is the number of processors); keep that in mind.
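A sketch of that shape (xValues and computeY(x) are hypothetical stand-ins for your per-point work; each x value is assumed independent of the others):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

List<Double> computeAllPoints(double[] xValues) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());    // "p" threads

    List<Callable<Double>> tasks = new ArrayList<>();
    for (double x : xValues) {
        tasks.add(() -> computeY(x));                        // one task per x value
    }

    List<Future<Double>> futures = pool.invokeAll(tasks);    // runs everything, waits for all
    List<Double> yValues = new ArrayList<>();
    for (Future<Double> f : futures) {
        yValues.add(f.get());                                // results come back in x order
    }
    pool.shutdown();
    return yValues;
}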
Have you looked at a dynamic programming approach? Is it applicable to your problem? Essentially, recursion means you are solving the same problem over and over again but for slightly smaller input values. When running multiple iterations of the same recursive algorithm, a program often tends to re-solve the exact same problem. Instead, a dynamic programming approach might store the solution in a cache and check the cache before calculating the solution a second time.
Without knowing your exact problem, it is difficult to give an exact answer.
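A bare-bones illustration of such a cache (MemoizedSolver and the recursion inside solve(n) are placeholders, not your actual lookup):
import java.util.HashMap;
import java.util.Map;

class MemoizedSolver {
    private final Map<Integer, Long> cache = new HashMap<>();

    long solve(int n) {
        Long cached = cache.get(n);
        if (cached != null) {
            return cached;                       // reuse a previously computed result
        }
        long result = (n <= 1) ? n : solve(n - 1) + solve(n - 2); // placeholder recursion
        cache.put(n, result);
        return result;
    }
}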
It depends on what's making your loops slow. Are they executing database queries? Accessing a hard disk? Otherwise doing something that causes them to wait around for an external I/O operation? In that case, your best bet is to start adding threads until you stop seeing returns on them; you basically have to tune it.
If they're slow simply because of raw processing taking place in memory in Java, then you could try adding a thread per CPU core on your machine, but you probably won't see much benefit beyond that.
If your "source" data isn't modified by your algorithm, or each iteration only operates on its own data (doesn't change nearby data), it could give you a (max) 2x speed boost on a dual-core.
That's not a great solution, though: as the running time of your current solution grows, the threaded version only cuts it in half, which is still quite expensive if you are in a double-nested loop. O(X^2/2) still pretty much equals O(X^2).
If you can find a way to optimize your algorithm instead, that has a much higher potential for real success and much less potential for an amazing amount of time sunk into bug fixing.
Can each point along the X-axis be computed in parallel? If so, that's your opportunity for performance improvements via multi-threading.
The Fork-Join framework will make this kind of program simple. You can get an early access version, or imitate it yourself to some extent.
"a loop inside another loop" that should ring bells ;). The outer loop is always waiting for the inner one. So this is a serial execution and using threads wont change a bit. Unless each loop writes the result to a central service which can be consulted by a following event (loop) on your x-axis. So the service would be stateful...
I don't think concurrency can make your application faster. It can help you run multiple tasks in your application, but it won't make them faster.
My experience:
Try to get rid of recursive calls; that's the best way to make the app faster.

Java: enough free heap to create an object?

I recently came across this in some code - basically someone trying to create a large object, coping when there's not enough heap to create it:
try {
    // try to perform an operation using a huge in-memory array
    byte[] massiveArray = new byte[BIG_NUMBER];
} catch (OutOfMemoryError oome) {
    // perform the operation in some slower but less
    // memory-intensive way...
}
This doesn't seem right, since Sun themselves recommend that you shouldn't try to catch Error or its subclasses. We discussed it, and another idea that came up was explicitly checking for free heap:
if (Runtime.getRuntime().freeMemory() > SOME_MEMORY) {
    // quick memory-intensive approach
} else {
    // slower, less demanding approach
}
Again, this seems unsatisfactory - particularly in that picking a value for SOME_MEMORY is difficult to easily relate to the job in question: for some arbitrary large object, how can I estimate how much memory its instantiation might need?
Is there a better way of doing this? Is it even possible in Java, or is any idea of managing memory below the abstraction level of the language itself?
Edit 1: in the first example, it might actually be feasible to estimate the amount of memory a byte[] of a given length might occupy, but is there a more generic way that extends to arbitrary large objects?
Edit 2: as #erickson points out, there are ways to estimate the size of an object once it's created, but (ignoring a statistical approach based on previous object sizes) is there a way of doing so for yet-uncreated objects?
There also seems to be some debate as to whether it's reasonable to catch OutOfMemoryError - anyone know anything conclusive?
freeMemory isn't quite right. You'd also have to add maxMemory()-totalMemory(). e.g. assuming you start up the VM with max-memory=100M, the JVM may at the time of your method call only be using (from the OS) 50M. Of that, let's say 30M is actually in use by the JVM. That means you'll show 20M free (roughly, because we're only talking about the heap here), but if you try to make your larger object, it'll attempt to grab the other 50M its contract allows it to take from the OS before giving up and erroring. So you'd actually (theoretically) have 70M available.
To make this more complicated, the 30M it reports as in use in the above example includes stuff that may be eligible for garbage collection. So you may actually have more memory available, if it hits the ceiling it'll try to run a GC to free more memory.
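Expressed as code, that estimate is roughly the following (a sketch only; garbage that is merely eligible for collection is still not accounted for):
// Unused heap already claimed from the OS, plus heap the JVM is still allowed to claim.
Runtime rt = Runtime.getRuntime();
long potentiallyAvailable = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());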
You can try to get around this by manually triggering a System.gc(), except that that's not such a terribly good thing to do, because:
- it's not guaranteed to run immediately
- it will stop everything in its tracks while it runs
Your best bet (assuming you can't easily rewrite your algorithm to deal with smaller memory chunks, or write to a memory-mapped file, or something less memory intensive) might be to do a safe rough estimate of the memory needed and ensure that it's available before you run your function.
There are some kludges that you can use to estimate the size of an existing object; you could adapt some of these to predict the size of a yet-to-be created object.
However, in this case, I think it might be best to catch the Error. First of all, asking for the free memory doesn't account for what's available after garbage collection, which will be performed before raising an OOME. And, requesting a garbage collection with System.gc() isn't reliable. It's often explicitly disabled because it can wreck performance, and if it's not disabled… well, it can wreck performance when used unnecessarily.
It is impossible to recover from most errors. However, recoverability is up to the caller, not the callee. In this case, if you have a strategy to recover from an OutOfMemoryError, it is valid to catch it and fall back.
I guess that, in practice, it really comes down to the difference between the "slow" and "fast" way. If the "slow" method is fast enough, I'd stick with that, as it's safer and simpler. And, it seems to me, allowing it to be used as a fall back means that it is "fast enough." Don't let small optimizations derail the reliability of your application.
The "try to allocate and handle the error" approach is very dangerous.
What if you barely get your memory? A later OOM exception might occur because you brought things too close to the limits. Almost any library call will allocate memory at least briefly.
During your allocation a different thread may receive an OOM exception while trying to allocate a relatively small object. Even if your allocation is destined to fail.
The only viable approach is your second one, with the corrections noted in other answers. But you have to be sure and leave extra "slop space" in the heap when you decide to use your memory intensive approach.
I don't believe that there's a reasonable, generic approach to this that could safely be assumed to be 100% reliable. Even the Runtime.freeMemory approach is vulnerable to the fact that you may actually have enough memory after a garbage collection, but you wouldn't know that unless you force a gc. But then there's no foolproof way to force a GC either. :)
Having said that, I suspect that if you really did know approximately how much you needed, and did run a System.gc() beforehand, and you're running in a simple single-threaded app, you'd have a reasonably decent shot at getting it right with the .freeMemory call.
If any of those constraints fail, though, and you get the OOM error, you're back at square one, and therefore are probably no better off than just catching the Error subclass. While there are some risks associated with this (Sun's VM does not make a lot of guarantees about what happens after an OOM... there's some risk of internal state corruption), there are many apps for which just catching it and moving on with life will leave you with no serious harm.
A more interesting question in my mind, however, is why are there cases where you do have enough memory to do this and others where you don't? Perhaps some more analysis of the performance tradeoffs involved is the real answer?
Definitely, catching the Error is the worst approach. An Error happens when there is NOTHING you can do about it. Not even create a log entry; puff, it's like "... Houston, we lost the VM".
I didn't quite get the second reason. It was bad because it is hard to relate SOME_MEMORY to the operations? Could you rephrase it for me?
The only alternative I see is to use the hard disk as memory (RAM/ROM as in the old days). I guess that is what you're pointing at in your "slower, less demanding approach".
Every platform has its limits; Java supports as much RAM as your hardware is willing to give (well, actually as much as you allow by configuring the VM). In Sun's JVM implementation that can be done with the -Xmx option, for instance:
java -Xmx8g some.name.YourMemConsumingApp
Of course you may end up trying to perform an operation that takes 10 GB of RAM.
If that's your case, then you should definitely swap to disk.
Additionally, using the strategy pattern could make for nicer code. Although here it looks like overkill:
if (isEnoughMemory(SOME_MEMORY)) {
    strategy = new InMemoryStrategy();
} else {
    strategy = new DiskStrategy();
}
strategy.performTheAction();
But it may help if the "else" involves a lot of code and looks bad. Furthermore, if somehow you can use a third approach (like using a cloud for processing) you can add a third Strategy:
...
strategy = new ImaginaryCloudComputingStrategy();
...
:P
EDIT
After understanding the problem with the second approach: if there are times when you don't know how much RAM is going to be consumed but you do know how much you have left, you could use a mixed approach (RAM when you have enough, ROM [disk] when you don't).
Suppose this theoretical problem.
Suppose you receive a file from a stream and don't know how big it is.
Then you perform some operation on that stream (encrypt it, for instance).
If you use RAM only it would be very fast, but if the file is large enough to consume all your app's memory, then you have to perform part of the operation in memory, then swap to a file and save temporary data there.
The VM will GC when running out of memory; you get more memory back and then you process the next chunk. This repeats until the big stream has been processed.
while (!isDone()) {
    if (isMemoryLow()) {
        // e.g. Runtime.getRuntime().freeMemory() < SOME_MEMORY, plus some other validations
        swapToDisk(); // and make sure resources are GC'able
    }
    byte[] array = new byte[PREDEFINED_BUFFER_SIZE];
    process(array);
}
cleanUp();

Java Performance Testing [duplicate]

I want to do some timing tests on a Java application. This is what I am currently doing:
long startTime = System.currentTimeMillis();
doSomething();
long finishTime = System.currentTimeMillis();
System.out.println("That took: " + (finishTime - startTime) + " ms");
Is there anything "wrong" with performance testing like this? What is a better way?
Duplicate: Is stopwatch benchmarking acceptable?
The one flaw in that approach is that the "real" time doSomething() takes to execute can vary wildly depending on what other programs are running on the system and what its load is. This makes the performance measurement somewhat imprecise.
A more accurate way of tracking the time it takes to execute code, assuming the code is single-threaded, is to look at the CPU time consumed by the thread during the call. You can do this with the JMX classes; in particular, with ThreadMXBean. You can retrieve an instance of ThreadMXBean from java.lang.management.ManagementFactory, and, if your platform supports it (most do), use the getCurrentThreadCpuTime method in place of System.currentTimeMillis to do a similar test. Bear in mind that getCurrentThreadCpuTime reports time in nanoseconds, not milliseconds.
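In plain Java, the simplest form of that measurement looks roughly like this (doSomething() stands in for the code under test):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

ThreadMXBean threadMx = ManagementFactory.getThreadMXBean();
if (threadMx.isCurrentThreadCpuTimeSupported()) {
    long start = threadMx.getCurrentThreadCpuTime();          // nanoseconds of CPU time
    doSomething();
    long cpuNanos = threadMx.getCurrentThreadCpuTime() - start;
    System.out.println("That took: " + (cpuNanos / 1_000_000) + " ms of CPU time");
}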
Here's a sample (Scala) method that could be used to perform a measurement:
def measureCpuTime(f: => Unit): java.time.Duration = {
  import java.lang.management.ManagementFactory.getThreadMXBean
  if (!getThreadMXBean.isThreadCpuTimeSupported)
    throw new UnsupportedOperationException(
      "JVM does not support measuring thread CPU-time")
  var finalCpuTime: Option[Long] = None
  val thread = new Thread {
    override def run(): Unit = {
      f
      finalCpuTime = Some(getThreadMXBean.getThreadCpuTime(
        Thread.currentThread.getId))
    }
  }
  thread.start()
  while (finalCpuTime.isEmpty && thread.isAlive) {
    Thread.sleep(100)
  }
  java.time.Duration.ofNanos(finalCpuTime.getOrElse {
    throw new Exception("Operation never returned, and the thread is dead " +
      "(perhaps an unhandled exception occurred)")
  })
}
(Feel free to translate the above to Java!)
This strategy isn't perfect, but it's less subject to variations in system load.
The code shown in the question is not good performance-measuring code:
The compiler might choose to optimize your code by reordering statements. Yes, it can do that. That means your entire test might fail. It can even choose to inline the method under test and reorder the measuring statements into the now-inlined code.
HotSpot might choose to reorder your statements, inline code, cache results, delay execution...
Even assuming the compiler/HotSpot didn't trick you, what you measure is "wall time". What you should be measuring is CPU time (unless you use OS resources and want to include these as well, or you measure lock contention in a multi-threaded environment).
The solution? Use a real profiler. There are plenty around, both free profilers and demos / time-locked trials of commercial-strength ones.
Using a Java profiler is the best option, and it will give you all the insight that you need into the code, viz. response times, thread call traces, memory utilisation, etc.
I suggest JENSOR, an open-source Java profiler, for its ease of use and low CPU overhead. You can download it, instrument the code, and get all the info you need about your code.
You can download it from: http://jensor.sourceforge.net/
Keep in mind that the resolution of System.currentTimeMillis() varies between different operating systems. I believe Windows is around 15 msec. So if your doSomething() runs faster than the time resolution, you'll get a delta of 0. You could run doSomething() in a loop multiple times, but then the JVM may optimize it.
Have you looked at the profiling tools in NetBeans and Eclipse? These tools give you a better handle on what is REALLY taking up all the time in your code. Using these tools, I have found problems that I would not otherwise have noticed.
Well that is just one part of performance testing. Depending on the thing you are testing you may have to look at heap size, thread count, network traffic or a whole host of other things. Otherwise I use that technique for simple things that I just want to see how long they take to run.
That's good when you are comparing one implementation to another or trying to find a slow part in your code (although it can be tedious). It's a really good technique to know and you'll probably use it more than any other, but be familiar with a profiling tool as well.
I'd imagine you'd want to call doSomething() before you start timing too, so that the code is JITted and "warmed up".
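A rough sketch of that, combined with the loop-and-average idea mentioned above (doSomething() stands in for the method under test, the iteration counts are arbitrary, and beware the earlier caveat that the JVM may optimize away work whose result is never used):
// Warm up so the JIT has compiled doSomething(), then time many runs and average.
for (int i = 0; i < 10_000; i++) {
    doSomething();                                  // warm-up, results discarded
}

int runs = 1_000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    doSomething();
}
long elapsed = System.nanoTime() - start;
System.out.println("Average: " + (elapsed / (double) runs / 1_000_000) + " ms per call");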
Japex may be useful to you, either as a way to quickly create benchmarks, or as a way to study benchmarking issues in Java through the source code.
