I have a really odd bug in some Java code I'm writing. If I run some parallel code I've written once, it runs perfectly; however, if I run the same code multiple times within the same launch, the runtime gets slower and slower each time.
If I increase the number of threads from 4 to 8, the slowdown per iteration is even more dramatic.
Each run is completely independent; I even set the runtime variable to null in between to clear the old run, so I have no idea what could be slowing it down. I've been using VisualVM, and it says that .run() is spending all of its time in "Self Time" or "Thread.init()", which is not helpful.
Some snippets of code:
for (int r = 0; r < replicateRuns; ++r) {
startTime = System.nanoTime();
result = newRuntime.execute();
result = null;
System.out.println((System.nanoTime() - startTime) / 1000000);
total += System.nanoTime() - startTime;
}
parentThread = this;
thread = new Thread[numberOfThreads];
barrier = new CyclicBarrier(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
thread[i] = new Thread(new Runtime(parentThread, variation, delta, eta, i,
numberOfThreads), i + "");
thread[i].start();
}
for (int i = 0; i < numberOfThreads; i++) {
try {
thread[i].join();
thread[i] = null;
} catch (Exception ex) {
ex.printStackTrace();
}
}
So, any clues as to why launching the Java app many times gives decent results, but running it many times within the same launch slows everything down, even though as far as I can see I'm nulling everything so the GC comes and cleans it up?
I'm using thread local variables, but from what I've read they're all cleaned when the thread itself is set to null.
Cheers for any help!
EDIT 1:
Thanks for all the help. The plot thickens: on my Windows desktop (as opposed to my MacBook) there are no issues at all; each thread runs fine with no slowdown in between, even when I increase the number of runs! After staring at this for a day, I'm going to try again with Eclipse MAT first thing in the morning.
With regards to the source, I'm extending the MOEA framework with a parallel version of MOEAD, hence the many dependencies and classes. You can find the source of my class here. Essentially, iterate() is called repeatedly until numberOfEvaluations reaches a set figure.
I believe the problem, as others are saying here, is that you are not 'stopping' your threads in the right way, so to speak.
The best way in my experience is to store the thread's state in a boolean variable, e.g. isRunning. Then inside your loop you test the state of the isRunning flag, i.e.
// inside the run method
while (isRunning) {
    // your code goes here
}
This way, on each iteration of the loop you check the current state of the flag, so when you set it to false in, for example, your custom stop() method, the next iteration of the loop will cause the thread to exit its run() method, ending the life of your thread. Well, technically it now becomes ready to be garbage collected: its memory will be deallocated at some point in the near future, provided there is no reference to this thread hanging around somewhere in your code.
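A self-contained sketch of this pattern (the class name StoppableWorker and the stop() method are illustrative, not from the asker's code); note that the flag should be declared volatile so the worker thread reliably sees the update made by the stopping thread:

```java
public class StoppableWorker implements Runnable {
    // volatile so the write from stop() is visible to the worker thread
    private volatile boolean isRunning = true;

    @Override
    public void run() {
        while (isRunning) {
            // your per-iteration work goes here
        }
        // falling out of run() lets the thread terminate and become collectable
    }

    public void stop() {
        isRunning = false;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker, "worker");
        t.start();
        worker.stop();   // request shutdown
        t.join(5000);    // wait for the thread to exit run()
        if (t.isAlive()) throw new AssertionError("worker did not stop");
    }
}
```

Without volatile, the JIT is free to hoist the flag read out of the loop, and the worker may spin forever even after stop() is called.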
There are more sources showing this approach; for example, check out this discussion on LinkedIn.
As a side note, it would actually be useful to see what exactly the newRuntime and result variables are, their classes and inheritance, etc. Otherwise we can only guess at what is actually going on in your code.
You are always generating new threads and never disposing of them.
If the number of threads is larger than the number of processor cores, the OS has to switch between threads, and that context switching can degrade performance dramatically.
If you are using the NetBeans IDE, the profiler lets you see the threads and their status.
So we're working on a signal processing application; there's a specific piece of hardware in the PC and a C driver communicating with it.
The application frontend/gui is written in JavaFX. We're having some issues with the JavaFX LineChart, we're measuring electrical signal frequency and trying to plot it on the aforementioned LineChart.
The measurements are running in a loop until 1000 samples are gathered, we've been testing with 100Hz signal, which means that it takes 10s to get these 1000 samples.
There's a separate 'LineChart' thread running that checks (every 10 ms) whether new samples are available; if so, they are added to the LineChart. When the measurement thread finishes, the LineChart thread resets the LineChart (clears the series data) and the process starts over.
Everything runs fine for the first ~20 min, after which the LineChart seems to 'slow down': the drawing is not as fast/dynamic as in the beginning.
We've checked pretty much everything we could in the application and found nothing, so we've created a separate project which only has the LineChart and a thread that adds samples to the chart every 10ms (up to 1000 samples). We've observed the same behavior, here's how it's done:
Thread t = new Thread(new Runnable() {
@Override
public void run() {
int iteration = 0;
long start = 0;
long stop = 0;
while (run) {
CountDownLatch latch = new CountDownLatch(1);
start = System.currentTimeMillis();
for (int i = 0; i < 1001; i++) {
double ran = random(50, 105);
final int c = i;
Platform.runLater(() -> {
series.getData().add(new XYChart.Data<>(c, ran));
if (c == 1000) {
System.out.print("Points: " + series.getData().size());
series.getData().clear();
latch.countDown();
}
});
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
iteration++;
stop = System.currentTimeMillis();
try {
latch.await();
} catch (InterruptedException e) {}
System.out.println(", Iteration : " + iteration + ", elapsed: " + (stop - start) + " [ms]");
}
}
});
What are we missing here? Why is the performance dropping after ~30-45 min in the above example? Any ideas?
The above piece of code was run for 8 h; each time, all points were added to the chart, and the 'drawing time' was comparable (between 10100 ms and 10350 ms).
I don't see anything wrong with the code, but you keep adding to the series. I don't think this is an issue with the code; it's the machine trying to keep up and manage ALL the points you have. You said 1000 points in 10 s; that means after 20 mins you have 120,000 points in storage, being managed and plotted. Assuming you record to the tenths place, that's a ton of data, and more likely than not you're seeing the processing slow down under all that info. Simply put, the machine can't handle it.
This is an older question, but in case anyone stumbles across it looking for performance problems, there is a huge hit here in the way data is added to the series.
Points should be added to a series all at once when possible, rather than individually.
In the case of the above example, if the code collected all encountered data points into a list and then added the entire list to the series with one addAll call, performance would increase. The frequency of the addAll calls can be tuned by trial and error for aesthetic performance, but the update rate a user can perceive is much lower than the rate at which Platform.runLater is being asked to update.
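To keep a sketch of this runnable without a JavaFX toolkit, here is the same batching idea with plain collections; in real chart code the bulk step would be series.getData().addAll(batch) inside a single Platform.runLater call. All class and variable names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Double> buffer = new LinkedBlockingQueue<>();

        // Producer: 1000 samples, as in the measurement loop above.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                buffer.add(Math.random() * 55 + 50);
            }
        });
        producer.start();
        producer.join();

        // Consumer: instead of 1000 individual adds (one UI event each),
        // drain everything available and add it in a single bulk operation.
        List<Double> batch = new ArrayList<>();
        buffer.drainTo(batch);
        List<Double> seriesData = new ArrayList<>();
        seriesData.addAll(batch); // in JavaFX: series.getData().addAll(batch)

        if (seriesData.size() != 1000) throw new AssertionError("expected 1000 points");
    }
}
```

The point is that one bulk insert triggers one layout/redraw pass on the chart, instead of one per point.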
I found the reason: it was lack of hardware acceleration on Linux platforms where AMD graphics cards were installed. Oracle did not provide hardware support, so JavaFX was falling back to a poor rendering pipeline, resulting in performance decay. The piece of code from the original post works fine on Windows machines or Linux machines with Nvidia cards, but not on Linux with AMD cards. On Linux with AMD cards you have to manually enforce software acceleration (as opposed to the default one).
I have been planning to use concurrency in a project after learning that it has indeed increased throughput for many.
Now, I have not worked much on multithreading or concurrency, so I decided to learn and build a simple proof of concept before using it in the actual project.
Below are the two examples I have tried:
1. With use of concurrency
public static void main(String[] args)
{
System.out.println("start main ");
ExecutorService es = Executors.newFixedThreadPool(3);
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
try {
List<Future<Boolean>> list = es.invokeAll(collection);
} catch (InterruptedException e) {
e.printStackTrace();
}
es.shutdown(); // without this, the pool's non-daemon threads keep the JVM alive
System.out.println("\n end main "+(new Date().getTime() - startTime));
}
2. Without use of concurrency
public static void main(String[] args) {
System.out.println("start main ");
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
for(SomeComputation sc:collection)
{
sc.compute();
}
System.out.println("\n end main "+(new Date().getTime() - startTime));
}
Both share a common class
class SomeComputation implements Callable<Boolean>
{
String name;
SomeComputation(String name){this.name=name;}
public Boolean compute()
{
someDumbStuff();
return true;
}
public Boolean call()
{
someDumbStuff();
return true;
}
private void someDumbStuff()
{
for (int i = 0;i<50000;i++)
{
Integer.compare(i,i+1);
}
System.out.print("\n done with "+this.name);
}
}
Now, the analysis after 20-odd runs of each approach:
The 1st one, with concurrency, takes on average 451 ms.
The 2nd one, without concurrency, takes on average 290 ms.
Now, I learned this depends on configuration, OS, version (Java 7) and processor.
But all of those were the same for both approaches.
I also learned that the cost of concurrency is only affordable when the computation is heavy, but this point wasn't clear to me.
Hope someone can help me understand this more.
PS: I tried finding similar questions but couldn't find this kind. Please comment with a link if you do.
Concurrency has at least two different purposes: 1) performance, and 2) simplicity of code (like 1000 listeners for web requests).
If your purpose is performance, you can't get any more speedup than the number of hardware cores you put to work.
(And that's only if the threads are CPU-bound.)
What's more, each thread has a significant startup overhead.
So if you launch 1000 threads on a 4-core machine, you can't possibly do any better than a 4x speedup, but against that, you have 1000 thread startup costs.
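As a rough sketch of sizing work to the hardware rather than spawning a thread per task (names here are illustrative): a fixed pool of availableProcessors() threads can run many CPU-bound tasks while paying the thread startup cost only once per core, not once per task:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreSizedPool {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // 1000 CPU-bound tasks, but only `cores` threads are ever created,
        // so the thread startup cost is paid `cores` times, not 1000 times.
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int t = 0; t < 1000; t++) {
            tasks.add(() -> {
                long sum = 0;
                for (int i = 0; i < 10_000; i++) sum += i;
                return sum;
            });
        }
        List<Future<Long>> results = pool.invokeAll(tasks);
        pool.shutdown();

        // Sum of 0..9999 is 49,995,000 for every task.
        for (Future<Long> f : results) {
            if (f.get() != 49_995_000L) throw new AssertionError("bad result");
        }
    }
}
```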
As mentioned in one of the answers, one use of concurrency is simplicity of code: certain problems are logically concurrent, so there is no way to model them in a non-concurrent way, e.g. producer-consumer problems, listeners for web requests, etc.
Other than that, a concurrent program improves performance only if it is able to save CPU cycles for you. The goal is to keep the CPU(s) busy all the time rather than wasting cycles, which means letting the CPU do something useful while your program would otherwise be stuck in non-CPU work: waiting for disk I/O, waiting for locks, sleeping, waiting for GUI input, etc. Those waits simply add time to your total program run time.
So the question is: what is your CPU doing when your program is not using it? Can you complete a portion of your program during that time and segregate the waiting part into another thread? Nowadays most systems are multiprocessor and multi-core, which further amplifies the waste if programs are not concurrent.
The example that you wrote does all of its processing in memory without entering any wait states, so you don't see a gain, only the loss from setting up threads and context switching.
Try measuring performance against a DB: fetch 1 million records, process them, and save them back to the DB. Do it sequentially in one go, and then concurrently in small batches, and you will notice the performance difference, because DB operations are disk-intensive: while you are reading from or writing to the DB you are actually doing disk I/O, and the CPU is free, wasting its cycles during that time.
In my opinion, good candidates for concurrency are long-running tasks involving one of the wait operations mentioned above; otherwise you won't see much gain. Programs that need background tasks are also good candidates for concurrency.
Concurrency should not be confused with the CPU's multitasking, i.e. running different programs on the same CPU at the same time.
Hope it helps!
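The disk I/O point above can be sketched with Thread.sleep standing in for a blocking DB or disk call (a toy model, not a real benchmark; all names are made up): the sequential version pays each wait in full, while the pooled version overlaps them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoBoundDemo {
    // Stand-in for a blocking I/O call (DB read, disk, network).
    static Callable<Boolean> ioTask() {
        return () -> {
            Thread.sleep(100);
            return true;
        };
    }

    public static void main(String[] args) throws Exception {
        int n = 8;
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) tasks.add(ioTask());

        // Sequential: the waits add up to roughly n * 100 ms.
        long t0 = System.nanoTime();
        for (Callable<Boolean> task : tasks) task.call();
        long sequentialMs = (System.nanoTime() - t0) / 1_000_000;

        // Concurrent: the waits overlap, so the total is roughly 100 ms.
        ExecutorService pool = Executors.newFixedThreadPool(n);
        t0 = System.nanoTime();
        pool.invokeAll(tasks);
        long concurrentMs = (System.nanoTime() - t0) / 1_000_000;
        pool.shutdown();

        if (concurrentMs >= sequentialMs) throw new AssertionError("expected overlap to win");
    }
}
```

With pure in-memory computation, as in the question's example, there is nothing to overlap, so the pool only adds overhead.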
Concurrency control is needed when threads share the same data source: while one thread is working with the resource, the others must wait until it finishes the job before they get access.
So you need to learn about synchronized methods and synchronized blocks, or something similar.
Read this tutorial, it's helpful:
https://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html
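A minimal illustration of the synchronized methods that tutorial covers (the SyncCounter class is made up for this sketch): two threads increment a shared counter, and the synchronized keyword ensures the increments don't race and lose updates:

```java
public class SyncCounter {
    private long count = 0;

    // synchronized: only one thread at a time may run this on a given instance
    public synchronized void increment() {
        count++;
    }

    public synchronized long value() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter counter = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // Without synchronized, count++ (read-modify-write) can drop updates.
        if (counter.value() != 200_000L) throw new AssertionError("lost updates");
    }
}
```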
I have created a runnable jar which runs on a single thread. The thread executes a for loop with 100 iterations. However, CPU usage goes up to 60% on an i3 processor on a Win7 64-bit machine.
I tried to analyze the CPU usage in Process Explorer.
The native threads are consuming CPU; they are all at msvcr100.dll!endthreadex+0x60.
I am using JDK 1.7.
Can somebody please suggest what might be going wrong here?
Here is the code:
The app accepts a socket connection and processes the data sent by the client.
while (true)
{
try
{
Socket sock = ssock.accept();
// This is the function which has the for loop
obj.MyFunction();
}
catch(Exception ex)
{
ssock.close();
}
Thread.sleep(1000);
}
void MyFunction()
{
for(int i=0; i < 1000;i++)
{
// Processing done here
}
}
Profile your code, see http://docs.oracle.com/javase/7/docs/technotes/tools/index.html#jconsole for more info.
I suspect that if you try this:
ex.printStackTrace();
you might get some more information. Right now, when you get an exception, you have no idea what happened or why. You are silent about it, which is rarely the right way to handle exceptions.
What is the processing? If you're doing a lot of work, going to 60% CPU might not be unexpected. What's the big-O runtime of your algorithm? What's the data set size?
I'm trying to record the time taken to enqueue and dequeue a specific number of Strings to a linked list queue.
If I set the number of strings manually, every run of the program comes back with more or less the same elapsed time.
However, if I ask the user for input (as below) and enter the same number, the program takes about twice as long to run most times. I don't understand how this is happening, since I don't start the timer until just before the queueing and dequeueing function is called.
public static void main(String[] args) {
long start, elapsed;
int num = Integer.parseInt(javax.swing.JOptionPane.showInputDialog("State the number of elements to queue:"));
System.out.println("Processing " + num + " strings...");
Queue lq = new LinkedQueue();
// timing section
start = System.nanoTime();
testQueue(num, lq);
elapsed = System.nanoTime() - start;
}
Does anyone know why this is happening?
You are expecting deterministic behaviour, and I guess you are running that program on a PC with a normal OS, so it's not possible to expect exact timing, mainly because:
you are running a VM for the Java code to execute in;
the VM runs in an OS.
The VM does things you don't control, and so does the OS. So you can only make a rough guess at how much time your program will take to execute, unless you run it in an adequate environment.
The garbage collector could be interrupting your program in the middle of its execution, or the scheduler could schedule you out in favour of another, more important process, etc.
Without more information it is difficult to say. It could well be that waiting for user input is somehow discouraging the JIT compiler from compiling the function and it ends up being interpreted instead and taking a longer time.
I have a bizarre problem - I'm hoping someone can explain to me what is happening and a possible workaround. I am implementing a Z80 core in Java, and attempting to slow it down, by using a java.util.Timer object in a separate thread.
The basic setup is that I have one thread running an execute loop, 50 times per second. Within this execute loop, however many cycles are executed, and then wait() is invoked. The external Timer thread will invoke notifyAll() on the Z80 object every 20ms, simulating a PAL Sega Master System clock frequency of 3.54 MHz (ish).
The method I have described above works perfectly on Windows 7 (tried two machines) but I have also tried two Windows XP machines and on both of them, the Timer object seems to be oversleeping by around 50% or so. This means that one second of emulation time is actually taking around 1.5 seconds or so on a Windows XP machine.
I have tried using Thread.sleep() instead of a Timer object, but this has exactly the same effect. I realise granularity of time in most OSes isn't better than 1ms, but I can put up with 999ms or 1001ms instead of 1000ms. What I can't put up with is 1562ms - I just don't understand why my method works OK on newer version of Windows, but not the older one - I've investigated interrupt periods and so on, but don't seem to have developed a workaround.
Could anyone please tell me the cause of this problem and a suggested workaround? Many thanks.
Update: Here is the full code for a smaller app I built to show the same issue:
import java.util.Timer;
import java.util.TimerTask;
public class WorkThread extends Thread
{
private Timer timerThread;
private WakeUpTask timerTask;
public WorkThread()
{
timerThread = new Timer();
timerTask = new WakeUpTask(this);
}
public void run()
{
timerThread.schedule(timerTask, 0, 20);
while (true)
{
long startTime = System.nanoTime();
for (int i = 0; i < 50; i++)
{
int a = 1 + 1;
goToSleep();
}
long timeTaken = (System.nanoTime() - startTime) / 1000000;
System.out.println("Time taken this loop: " + timeTaken + " milliseconds");
}
}
synchronized public void goToSleep()
{
try
{
wait();
}
catch (InterruptedException e)
{
System.exit(0);
}
}
synchronized public void wakeUp()
{
notifyAll();
}
private class WakeUpTask extends TimerTask
{
private WorkThread w;
public WakeUpTask(WorkThread t)
{
w = t;
}
public void run()
{
w.wakeUp();
}
}
}
All the main class does is create and start one of these worker threads. On Windows 7, this code produces a time of around 999ms - 1000ms, which is totally fine. Running the same jar on Windows XP however produces a time of around 1562ms - 1566ms, and this is on two separate XP machines that I have tested this. They are all running Java 6 update 27.
I find this problem happens because the Timer is sleeping for only 20 ms (quite a small value). If I bundle all the execute loops for a single second into one wait() - notifyAll() cycle, it produces the correct result. I'm sure people who see what I'm trying to do (emulate a Sega Master System at 50 fps) will see why this is not a solution, though: it won't give an interactive response time, skipping 49 of every 50 updates. As I say, Win7 copes fine with this. Sorry if my code is too large :-(
Could anyone please tell me the cause of this problem and a suggested workaround?
The problem you are seeing probably has to do with clock resolution. Some Operating Systems (Windows XP and earlier) are notorious for oversleeping and being slow with wait/notify/sleep (interrupts in general). Meanwhile other Operating Systems (every Linux I've seen) are excellent at returning control at quite nearly the moment specified.
The workaround? For short durations, use a live wait (busy loop). For long durations, sleep for less time than you really want and then live wait the remainder.
I'd forgo the TimerTask and just use a busy loop:
long sleepUntil = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(20);
while (System.nanoTime() < sleepUntil) {
    try {
        Thread.sleep(2); // give the host OS a chance to run other work
    } catch (InterruptedException e) { break; }
}
The two millisecond delay gives the host OS plenty of time to work on other stuff (and you're likely to be on a multicore anyway). The remaining program code is a lot simpler.
If the hard-coded two milliseconds are too much of a blunt instrument, you can calculate the required sleep time and use the Thread.sleep(long, int) overload.
You can set the timer resolution on Windows XP.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624%28v=vs.85%29.aspx
Since this is a system-wide setting, you can use a tool to set the resolution so you can verify whether this is your problem.
Try this out and see if it helps: http://www.lucashale.com/timer-resolution/
You might see better timings on newer versions of Windows because, by default, newer version might have tighter timings. Also, if you are running an application such as Windows Media Player, it improves the timer resolution. So if you happen to be listening to some music while running your emulator, you might get great timings.