Resettable timer performance in Java


When profiling my app, I found a source of frequent allocations... I am utilizing multiple timers to check for timeouts (~2ms, varies with app state).
According to this SO answer, ScheduledThreadPoolExecutor is preferable to Timer. So I made a class very similar to the one in this other SO answer that schedules tasks as recommended. I use it like so:
private final ResettableTimer timer = new ResettableTimer(new Runnable() {
public void run() {
//timeout, do something
}
});
timer.reset(2, TimeUnit.MILLISECONDS);
I found that every time a task is scheduled, Java allocates the following:
java.util.concurrent.LinkedBlockingQueue$Node - 16 bytes
java.util.concurrent.Executors$RunnableAdapter - 16 bytes
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask - 64 bytes
java.util.concurrent.locks.AbstractQueuedSynchronizer$Node - 28 bytes
That is 16 + 16 + 64 + 28 = 124 bytes per scheduled task. For example: 2 timers at 500 tasks/s each = 1,000 tasks/s x 124 bytes = 124 kB/s
This is small, but it causes very frequent garbage collection. It doesn't seem to be causing me problems right now, although it may be the cause of some stuttering I'm seeing on older devices. Should I be worried?
More importantly, these allocations seem pretty unnecessary since these tasks are doing the same thing over & over again. Is there any way to avoid this re-allocation?
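The ResettableTimer class itself is not shown in the question. As a rough illustration of where the allocations listed above come from, here is a minimal sketch of how such a class is commonly built on ScheduledThreadPoolExecutor. The class body, field names and the cancel-then-schedule strategy are assumptions, not the asker's actual code.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical reconstruction: the class from the question is not shown.
class ResettableTimer {
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final Runnable task;
    private ScheduledFuture<?> pending; // guarded by "this"

    ResettableTimer(Runnable task) {
        this.task = task;
    }

    // Cancels the pending timeout (if any) and schedules a fresh one.
    // Each schedule() call creates a new future task plus wrapper objects,
    // which is the per-reset garbage described in the question.
    synchronized void reset(long delay, TimeUnit unit) {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = executor.schedule(task, delay, unit);
    }

    // Stops the worker thread; pending timeouts are dropped.
    void shutdown() {
        executor.shutdownNow();
    }
}
```

With this shape, the same Runnable is reused across resets, but the executor still wraps it in new objects on every schedule() call, so reusing the Runnable alone does not remove the allocations.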

Related

When does concurrency/multithreading help improve performance?

I have been planning to use concurrency in a project after learning that it has indeed increased throughput for many.
I have not worked much with multithreading or concurrency, so I decided to learn and build a simple proof of concept before using it in the actual project.
Below are the two examples I have tried:
1. With use of concurrency
public static void main(String[] args)
{
System.out.println("start main ");
ExecutorService es = Executors.newFixedThreadPool(3);
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
try {
List<Future< Boolean >> list = es.invokeAll(collection);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("\n end main "+(new Date().getTime() - startTime));
}
2. Without use of concurrency
public static void main(String[] args) {
System.out.println("start main ");
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
for(SomeComputation sc:collection)
{
sc.compute();
}
System.out.println("\n end main "+(new Date().getTime() - startTime));
}
Both share a common class
class SomeComputation implements Callable<Boolean>
{
String name;
SomeComputation(String name){this.name=name;}
public Boolean compute()
{
someDumbStuff();
return true;
}
public Boolean call()
{
someDumbStuff();
return true;
}
private void someDumbStuff()
{
for (int i = 0;i<50000;i++)
{
Integer.compare(i,i+1);
}
System.out.print("\n done with "+this.name);
}
}
Now the analysis after 20-odd runs of each approach:
The first one, with concurrency, takes 451 ms on average.
The second one, without concurrency, takes 290 ms on average.
I have learned that this depends on configuration, OS, Java version (Java 7 here) and processor,
but all of these were the same for both approaches.
I also learned that the cost of concurrency is affordable only when the computation is heavy, but this point wasn't clear to me.
I hope someone can help me understand this better.
PS: I tried finding similar questions but couldn't find any of this kind. Please comment with the link if you do.

Concurrency has at least two different purposes: 1) performance, and 2) simplicity of code (like 1000 listeners for web requests).
If your purpose is performance, you can't get any more speedup than the number of hardware cores you put to work.
(And that's only if the threads are CPU-bound.)
What's more, each thread has a significant startup overhead.
So if you launch 1000 threads on a 4-core machine, you can't possibly do any better than a 4x speedup, but against that, you have 1000 thread startup costs.
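To make the point about cores concrete, here is a minimal sketch that sizes the pool to the number of available cores and submits one chunk of work per core instead of thousands of tiny tasks. The class name ChunkedSum and the work() function are made up for illustration; they stand in for the asker's SomeComputation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Sums work(i) for i in [0, n), using at most one task per core.
    static long parallelSum(int n) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            int chunk = (n + cores - 1) / cores; // ceiling division
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                tasks.add(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += work(i);
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder CPU-bound work (an assumption, not from the question).
    static long work(int i) {
        return ((long) i * i) % 7;
    }
}
```

Whether this beats the sequential loop still depends on how heavy work() is; for very cheap work the thread hand-off can dominate, as the question's measurements suggest.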

As mentioned in one of the answers, one use of concurrency is simplicity of code: certain problems are logically concurrent, so there is no natural way to model them non-concurrently, for example producer-consumer problems, listeners for web requests, etc.
Other than that, a concurrent program improves performance only if it saves CPU cycles for you. The goal is to keep the CPU or CPUs busy all the time and not waste cycles, which means letting the CPU do something useful while your program is busy with non-CPU work such as waiting for disk I/O, waiting for locks, sleeping, or waiting for user input in a GUI app. Those waits simply add to your total run time.
So the question is: what is your CPU doing when your program is not using it? Can I complete a portion of my program during that time and move the waiting part into another thread? Nowadays most systems are multiprocessor and multi-core, which leads to further waste if programs are not concurrent.
The example that you wrote does all its processing in memory without entering any wait state, so you don't see a gain, only the cost of setting up threads and context switching.
Try measuring performance by hitting a DB: fetch 1 million records, process them, and save them back to the DB. Do it once sequentially in one go and once concurrently in small batches, and you will notice the difference, because DB operations are disk intensive: while you are reading from or writing to the DB you are actually doing disk I/O, and the CPU is free, wasting its cycles during that time.
In my opinion, good candidates for concurrency are long-running tasks that involve one of the wait operations mentioned above; otherwise you don't see much gain. Programs that need background tasks are also good candidates for concurrency.
Concurrency should not be confused with CPU multitasking, i.e. running different programs on the same CPU at the same time.
Hope it helps !!

Concurrency control is needed when threads share the same data sources:
while one thread is working with a source, the others must wait until it
finishes the job, and only then do they get access.
So you need to learn about synchronized methods and blocks or something like that.
Sorry for my English. Read this tutorial, it's helpful:
https://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html

Constantly nulling out and re-creating Timer, TimerTask in Java

I know using Timer and TimerTask is no longer the current accepted practice (some have suggested using threads, others suggest using ScheduledExecutorService or its variants), so this question is not so much about good programming practice, but about the possibility of actual errors or exception.
Essentially, what I have is a servlet that keeps a running counter (which is a static Calendar object) that gets incremented every second. When a specified deadline is met (when we reach 10 minutes by default), I make a call from my application to a NIST time server to get the current time, which I then use to re-set my counter.
The same TimerTask-defined method that increments the counter (every second) is also the one that must be paused and re-scheduled every time I make a call to a NIST server (every ten minutes). I have been unsuccessful in pausing/cancelling the existing Timer/TimerTask objects before the NIST server call and re-scheduling the TimerTask after the call.
The exceptions that occur from this are described here:
How do I use a timer in Java when required to set and cancel multiple times?
Suffice it to say, neither TimerTask nor Timer can be scheduled more than once, even by using purge() or cancel(), which appear to be only good for setting those objects as eligible for Java garbage collection.
Using wait() and notify() resulted in synchronization exceptions that I, unfortunately, did not have the time to figure out, so my initial experiment with threading was a failure.
What I ended up doing is this:
secondTickerTask.cancel();
secondTicker.purge();
secondTicker.cancel();
secondTickerTask = null;
secondTicker = null;
Date newCurrentTime = getNistTimeFromFirstWorkingServer();
// Save new date to current time, reset second counter.
setCurrentTimeAndDeadline(newCurrentTime);
startSecondIncrementTimer(newCurrentTime);
secondTicker = new Timer();
secondTickerTask = new TimerTask(){
public void run(){
incrementCurrentTimeAndTestDeadline();
}
};
I ran this code over-night a few times, at 10-minute and 1-minute intervals between NIST server calls, and it worked smoothly.
So, after that long lead-up (thank you for your patience), this is what my question is: Being forced, for the moment, to use the code that I have, is there any damage that could result in the long run? If I keep making new TimerTask and Timer objects while nulling out the old ones over, let's say, a period of a month, or six months, will I force the Server to run out of memory? Is Java's garbage collection robust enough to handle this sort of use? Can any other scary thing happen?
Thank you very much for your time,
- Eli

Java will handle the creation and abandonment of the timer tasks just fine. You need to ensure that you drop all references to the timers when you are done with them, which it appears you are doing, and then when the GC runs it will clean up any garbage the Timers introduced.
You are safe.
You should note that, over long periods of time, some Java processes tend to keep allocating memory until they hit their -Xmx limit. This does not mean that there is a problem (because that space will be reused by the GC), but it does mean that, if you want a long-running Java process to have a relatively small footprint, you should not specify an -Xmx much larger than what you actually need.
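For comparison with the cancel-and-recreate approach in the question, here is a minimal sketch of the ScheduledExecutorService alternative the question mentions. One scheduler is kept for the lifetime of the application, and only the periodic task's ScheduledFuture is cancelled and re-created around the NIST call. The class SecondTicker and its methods are hypothetical names, and the counter is a plain tick count rather than the servlet's Calendar.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the servlet's one-second counter.
class SecondTicker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicLong ticks = new AtomicLong();
    private final long period;
    private final TimeUnit unit;
    private ScheduledFuture<?> handle; // guarded by "this"

    SecondTicker(long period, TimeUnit unit) {
        this.period = period;
        this.unit = unit;
    }

    // Starts (or restarts) the periodic increment; a no-op if already running.
    synchronized void start() {
        if (handle == null) {
            handle = scheduler.scheduleAtFixedRate(ticks::incrementAndGet, period, period, unit);
        }
    }

    // Stops future increments; the scheduler itself stays usable.
    synchronized void pause() {
        if (handle != null) {
            handle.cancel(false);
            handle = null;
        }
    }

    long ticks() {
        return ticks.get();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Usage would be pause(), fetch the server time, reset the counter, then start() again. Unlike a cancelled Timer, the scheduler accepts new tasks after a task has been cancelled.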

Does it make sense to reuse Runnables in a thread pool?

I'm implementing a thread pool for processing a high volume market data feed and have a question about the strategy of reusing my worker instances that implement runnable which are submitted to the thread pool for execution. In my case I only have one type of worker that takes a String and parses it to create a Quote object which is then set on the correct Security. Given the amount of data coming off the feed it is possible to have upwards of 1,000 quotes to process per second and I see two ways to create the workers that get submitted to the thread pool.
The first option is simply creating a new instance of a Worker every time a line is retrieved from the underlying socket and adding it to the thread pool; the instance will eventually be garbage collected after its run method has executed. But this got me thinking about performance: does it really make sense to instantiate 1,000 new instances of the Worker class every second? In the same spirit as a thread pool, do people know if it is a common pattern to have a runnable pool or queue as well, so I can recycle my workers to avoid object creation and garbage collection? The way I see this being implemented is that, before returning from the run() method, the Worker adds itself back to a queue of available workers, which is then drawn from when processing new feed lines instead of creating new instances of Worker.
From a performance perspective, do I gain anything by going with the second approach or does the first make more sense? Has anyone implemented this type of pattern before?
Thanks - Duncan

I use a library I wrote called Java Chronicle for this. It is designed to persist and queue one million quotes per second without producing any significant garbage.
I have a demo here where it sends quote like objects with nano second timing information at a rate of one million messages per second and it can send tens of millions in a JVM with a 32 MB heap without triggering even a minor collection. The round trip latency is less than 0.6 micro-seconds 90% of the time on my ultra book. ;)
from a performance perspective, do I gain anything by going with the second approach or does the first make more sense?
I strongly recommend not filling your CPU caches with garbage. In fact I avoid any constructs which create any significant garbage. You can build a system which creates less than one object per event end to end. I have an Eden size which is larger than the amount of garbage I produce in a day, so there are no GCs, minor or full, to worry about.
Has anyone implemented this type of pattern before?
I wrote a profitable low latency trading system in Java five years ago. At the time it was fast enough at 60 micro-seconds tick to trade in Java, but you can do better than that these days.
If you want low latency market data processing system, this is the way I do it. You might find this presentation I gave at JavaOne interesting as well.
http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java
EDIT I have added this parsing example
ByteBuffer wrap = ByteBuffer.allocate(1024);
ByteBufferBytes bufferBytes = new ByteBufferBytes(wrap);
byte[] bytes = "BAC,12.32,12.54,12.56,232443".getBytes();
int runs = 10000000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
bufferBytes.reset();
// read the next message.
bufferBytes.write(bytes);
bufferBytes.position(0);
// decode message
String word = bufferBytes.parseUTF(StopCharTesters.COMMA_STOP);
double low = bufferBytes.parseDouble();
double curr = bufferBytes.parseDouble();
double high = bufferBytes.parseDouble();
long sequence = bufferBytes.parseLong();
if (i == 0) {
assertEquals("BAC", word);
assertEquals(12.32, low, 0.0);
assertEquals(12.54, curr, 0.0);
assertEquals(12.56, high, 0.0);
assertEquals(232443, sequence);
}
}
long time = System.nanoTime() - start;
System.out.println("Average time was " + time / runs + " nano-seconds");
When run with -verbose:gc -Xmx32m it prints
Average time was 226 nano-seconds
Note: there are no GCs triggered.

I'd use the Executor from the concurrency package. I believe it handles all this for you.

does it really make sense to instantiate 1,000 new instances of the Worker class every second.
Not necessarily. However, you would have to put the Runnables into some sort of BlockingQueue to be able to reuse them, and the cost of the queue concurrency may outweigh the GC overhead. Using a profiler or watching the GC numbers via JConsole will tell you whether it is spending a lot of time in GC and whether this needs to be addressed.
If this does turn out to be a problem, a different approach would be to just put your String into your own BlockingQueue and submit the Worker objects to the thread-pool only once. Each of the Worker instances would dequeue from the queue of Strings and would never quit. Something like:
public void run() {
while (!shutdown) {
String value = myQueue.take();
...
}
}
So you would not need to create your 1000s of Workers per second.
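A fuller sketch of that idea follows, assuming a fixed number of long-lived workers that drain a shared queue of feed lines. The names QuoteWorkers, offer and shutdownAndCount, the poison-pill shutdown, and the parse() placeholder are all illustrative assumptions; the real parsing into a Quote is not shown in the question.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class QuoteWorkers {
    // Sentinel that tells a worker to stop; compared by identity, not equals().
    private static final String POISON = new String("POISON");

    private final BlockingQueue<String> lines = new LinkedBlockingQueue<>();
    private final ExecutorService pool;
    private final int workers;
    private final AtomicInteger processed = new AtomicInteger();

    QuoteWorkers(int workers) {
        this.workers = workers;
        this.pool = Executors.newFixedThreadPool(workers);
        // Each worker is submitted exactly once and then loops until shutdown.
        for (int i = 0; i < workers; i++) {
            pool.execute(this::drain);
        }
    }

    private void drain() {
        try {
            while (true) {
                String line = lines.take(); // blocks until a line is available
                if (line == POISON) {
                    return;
                }
                parse(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    // Placeholder for the real parsing of a feed line into a Quote.
    private void parse(String line) {
        processed.incrementAndGet();
    }

    // Called by the feed reader for every line read from the socket.
    void offer(String line) {
        lines.add(line);
    }

    // Lets the workers finish the queued lines, then returns how many were processed.
    int shutdownAndCount() throws InterruptedException {
        for (int i = 0; i < workers; i++) {
            lines.add(POISON);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }
}
```

Note that LinkedBlockingQueue still allocates a small node per element, so this removes the per-line Worker allocation rather than all allocation; a bounded ArrayBlockingQueue would avoid the nodes at the cost of blocking or rejecting when full.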

Yes, of course, something like this. The OS and JVM don't care what is running on a thread, so it is generally good practice to reuse a recyclable object.

I see two questions in your problem. One is about thread pooling, and the other is about object pooling. For your thread pooling issue, Java provides ExecutorService. Below is an example of using an ExecutorService.
Runnable r = new Runnable() {
public void run() {
//Do some work
}
};
// Thread pool of size 2
ExecutorService executor = Executors.newFixedThreadPool(2);
// Add the runnables to the executor service
executor.execute(r);
The ExecutorService provides many different types of thread pools with different behaviors.
As far as object pooling is concerned (does it make sense to create 1,000 of your objects per second and then leave them for garbage collection?), this all depends on the statefulness and expense of your object. If you're worried about the state of your worker threads being compromised, you can look at using the flyweight pattern to encapsulate your state outside of the worker. Additionally, if you were to follow the flyweight pattern, you can also look at how useful Future and Callable objects would be in your application architecture.

java square wave

I am trying to create a square wave on the parallel port with java. So far I have this implementation.
public class Wave extends Thread {
public Wave() {
super();
setPriority(MAX_PRIORITY);
}
@Override
public void run() {
Wave.high();
LockSupport.parkNanos(20000000);
Wave.low();
LockSupport.parkNanos(20000000);
}
public static native void high();
public static native void low();
}
In which high() and low() are implemented using JNI (a shared C library controls the parallel port). It works pretty well; it generates a square wave with a period of about 40ms. Using an oscilloscope it looks like the standard deviation is about 10 microseconds when the computer is idle. When the computer is not idle the standard deviation becomes much larger. I think this is because more context switches happen and Threads stay too long in the waiting state and the specified 20 ms is not achieved accurately.
Is there a way to make my implementation more accurate? I know I could use hardware for this but I want to know if I can do this with software too.
Would an option be to "listen" to a clock and perform an action timed to the millisecond?

Just "listening" to the clock won't solve the problem of context switches causing jitter.
If you can dedicate a core to this:
bind the thread to the core;
move IRQ handling to other cores;
have a tight loop constantly checking the time (using System.nanoTime() or RDTSC/RDTSCP), and calling high()/low() as appropriate.
This way you should be able to achieve very low jitter.
Of course, if the task is to simply produce a square wave, this is a pretty inefficient use of computing resources.
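The tight-loop step above could be sketched roughly as follows. This is only an illustration: the class SpinWave is hypothetical, the native high()/low() calls from the question are replaced by Runnable parameters so the sketch is self-contained, and binding the thread to a core and moving IRQs are done with OS facilities outside the JVM, which are not shown.

```java
class SpinWave {
    // Alternately calls high and low, one call every halfPeriodNanos,
    // until the requested number of edges has been emitted.
    // Returns the number of edges emitted.
    static int run(long halfPeriodNanos, int edges, Runnable high, Runnable low) {
        long next = System.nanoTime(); // absolute deadline of the next edge
        int emitted = 0;
        boolean level = true;
        while (emitted < edges) {
            // Busy-wait: burns a core on purpose and never yields to the scheduler.
            while (System.nanoTime() - next < 0) {
                // spin
            }
            if (level) {
                high.run();
            } else {
                low.run();
            }
            level = !level;
            emitted++;
            // Advance from the previous deadline, not from "now",
            // so a late edge does not shift all later edges.
            next += halfPeriodNanos;
        }
        return emitted;
    }
}
```

In the question's setting the call would look like SpinWave.run(20000000L, n, Wave::high, Wave::low) for a 40 ms period. GC pauses and preemption can still delay an individual edge.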

i think there are going to be two sources of jitter.
first, garbage collection (and possibly other background processes, like the JIT) in java. for the code you gave, there should not be any gc. but if this is part of a larger system then you will likely find that garbage collection is required, and that it may alter the timings when it runs. you can try to ameliorate this by playing with the jvm settings (java -X).
second, the system scheduler. in addition to the suggestions by aix, you can bump the priority of the process and do some linux-specific tweaks. this article explains some of the problems with linux. ubuntu has a low-latency kernel, which you can install, but i can't find info on what it actually contains so you can do the same on other systems (update: i think it may contain this patch). if you want to look for more info, "low latency" is the key thing to search for, and people doing audio processing on linux tend to be the ones who care most about this.

If your context switching does not cause too much delay, you may try to park your thread until a given time, rather than for a given interval:
public class Wave extends Thread {
private final Object BLOCKER = new Object();
public Wave() {
super();
setPriority(MAX_PRIORITY);
}
@Override
public void run() {
// I suspect this should be running in an endless loop?
for (;;) {
Wave.high();
long t1 = System.currentTimeMillis();
// Round interval up to the next 20ms "deadline"
LockSupport.parkUntil(BLOCKER, t1 + 20 - (t1 % 20));
Wave.low();
// Round interval up to the next 20ms "deadline"
long t2 = System.currentTimeMillis();
LockSupport.parkUntil(BLOCKER, t2 + 20 - (t2 % 20));
}
}
public static native void high();
public static native void low();
}
As this relies on the wall-clock time in ms, rather than a more precise nano-seconds time, this will not work well for much higher frequencies. But this may not work either, as GC (and other processes) may interrupt this thread for an "unfortunate" amount of time, resulting in the same jitter.
When I tested this on my Windows 7 quad-core with JDK 6, I had some non-negligible jitter about every second, so aix's solution is probably better

Accurate Sleep for Java on Windows

Does anyone know a library that provides a Thread.sleep() for Java with an error no higher than 1-2 milliseconds?
I tried a mixture of sleep, error measurement and busy-wait, but I can't get this to work reliably on different Windows machines.
It can be a native implementation if the implementation is available for Linux and MacOS too.
EDIT
The link Nick provided ( http://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks ) is a really good resource to understand the issues all kinds of timers/sleeps/clocks java has.

To improve granularity of sleep you can try the following from this Thread.sleep page.
Bugs with Thread.sleep() under Windows
If timing is crucial to your application, then an inelegant but practical way to get round these bugs is to leave a daemon thread running throughout the duration of your application that simply sleeps for a large prime number of milliseconds (Long.MAX_VALUE will do). This way, the interrupt period will be set once per invocation of your application, minimising the effect on the system clock, and setting the sleep granularity to 1ms even where the default interrupt period isn't 15ms.
The page also mentions that it causes a system-wide change to Windows which may cause the user's clock to run fast due to this bug.
EDIT
More information about this is available
here and an associated bug report from Sun.
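A minimal sketch of the daemon-thread workaround quoted above might look like this. The class and thread names are made up, and whether it actually changes sleep granularity depends on the Windows and JVM versions involved, so treat it as something to measure rather than rely on.

```java
class SleepGranularityHack {
    // Starts a daemon thread that sleeps "forever", as the quoted page suggests.
    // Being a daemon, it does not keep the JVM alive at exit.
    static Thread start() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // Interrupted: simply let the thread end.
            }
        }, "sleep-granularity-hack");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Calling SleepGranularityHack.start() once at application startup is enough; interrupting the returned thread undoes it.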

This is ~5 months late but might be useful for people reading this question. I found that java.util.concurrent.locks.LockSupport.parkNanos() does the same as Thread.sleep() but with nanosecond precision (in theory), and much better precision than Thread.sleep() in practice. This depends of course on the Java Runtime you're using, so YMMV.
Have a look: LockSupport.parkNanos
(I verified this on Sun's 1.6.0_16-b01 VM for Linux)

Unfortunately, as of Java 6 all Java sleep-related methods on Windows [including LockSupport.parkNanos()] are based on milliseconds, as mentioned by several people above.
One way of timing a precise interval is a "spin-yield". The method System.nanoTime() gives you a fairly precise relative time counter. The cost of this call depends on your hardware and lies somewhere between 50 and 2000 nanoseconds.
Here is suggested alternative to Thread.sleep():
public static void sleepNanos (long nanoDuration) throws InterruptedException {
final long end = System.nanoTime() + nanoDuration;
long timeLeft = nanoDuration;
do {
if (timeLeft > SLEEP_PRECISION)
Thread.sleep (1);
else
if (timeLeft > SPIN_YIELD_PRECISION)
Thread.yield();
timeLeft = end - System.nanoTime();
} while (timeLeft > 0);
}
This approach has one drawback: during the last 2-3 milliseconds of the wait it keeps a CPU core busy. Note that sleep()/yield() will share the core with other threads/processes. If you are willing to sacrifice a little CPU, this gives you a very accurate sleep.
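The snippet above does not define SLEEP_PRECISION or SPIN_YIELD_PRECISION. Here is a self-contained version with assumed values for both constants (2 ms and 10 microseconds are guesses to be tuned per machine, not figures from the answer) and an added interrupt check so the spin phase can also be interrupted.

```java
import java.util.concurrent.TimeUnit;

class PreciseSleep {
    // Assumed thresholds; the answer does not give values for these.
    private static final long SLEEP_PRECISION = TimeUnit.MILLISECONDS.toNanos(2);
    private static final long SPIN_YIELD_PRECISION = TimeUnit.MICROSECONDS.toNanos(10);

    // Waits at least nanoDuration nanoseconds: coarse sleep while plenty of time
    // is left, then yield, then a pure spin for the final stretch.
    static void sleepNanos(long nanoDuration) throws InterruptedException {
        final long end = System.nanoTime() + nanoDuration;
        long timeLeft = nanoDuration;
        do {
            if (timeLeft > SLEEP_PRECISION) {
                Thread.sleep(1);
            } else if (timeLeft > SPIN_YIELD_PRECISION) {
                Thread.yield();
            }
            // Addition to the original: honour interrupts during the spin phase too.
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
            timeLeft = end - System.nanoTime();
        } while (timeLeft > 0);
    }
}
```

The loop only exits once System.nanoTime() has passed the deadline, so the call never returns early; how late it returns depends on scheduling.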

There are no good reasons to use Thread.sleep() in normal code; it is (almost) always an indication of bad design. Most importantly, there is no guarantee that the thread will continue execution after the specified time, because the semantics of Thread.sleep() are just to stop execution for a given time, not to continue immediately after that period has elapsed.
So, while I do not know what you try to achieve, I am quite sure you should use a timer instead.

JDK offers the Timer class.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Timer.html
Reading the docs clearly indicates that beyond the plumbing to make this a generalized framework, it uses nothing more sophisticated than a call to Object.wait(timeout):
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#wait(long)
So, you can probably cut to the chase and just use Object#wait yourself.
Beyond those considerations, the fact remains that JVM can not guarantee time accuracy across platforms. (Read the docs on http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#currentTimeMillis())
I think you'll need to experiment with a compromise solution combining Timer and busy polling if you want the highest timing precision possible on your platform. Effectively Object#wait(1) -> System#nanoTime -> calculate delta -> [loop if necessary].
If you are willing to roll your own, JNI pretty much leaves it wide open for platform-specific solutions. I am blissfully unaware of Windows internals, but obviously if the host OS does provide sufficiently accurate realtime timer services, the barebones structure of setting up a timerRequest(timedelta, callback) native library shouldn't be beyond reach.

The Long.MAX_VALUE hack is the working solution.
I tried Object.wait(long) to replace Thread.sleep, but found that Object.wait is only as accurate as Thread.sleep (10ms under Windows). Without the hack, neither method is suitable for any animation.

Use one of the Thread::join overloads on the current thread. You specify the number of milliseconds (and nanoseconds) to wait.

You could try using the new concurrency libraries. Something like:
private static final BlockingQueue SLEEPER = new ArrayBlockingQueue(1);
public static void main(String... args) throws InterruptedException {
for(int i=0;i<100;i++) {
long start = System.nanoTime();
SLEEPER.poll(2, TimeUnit.MILLISECONDS);
long time = System.nanoTime() - start;
System.out.printf("Sleep %5.1f%n", time/1e6);
}
}
This sleeps between 2.6 and 2.8 milliseconds.

Sounds like you need an implementation of real-time Java.
