When does concurrency/multithreading help improve performance?

I have been planning to use concurrency in a project after learning that it has increased throughput for many people.
I have not worked much with multithreading or concurrency, so I decided to learn it and build a simple proof of concept before using it in the actual project.
Below are the two examples I have tried:
1. With use of concurrency
public static void main(String[] args)
{
System.out.println("start main ");
ExecutorService es = Executors.newFixedThreadPool(3);
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
try {
List<Future<Boolean>> list = es.invokeAll(collection);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("\n end main "+(new Date().getTime() - startTime));
es.shutdown(); // without this the pool's non-daemon threads keep the JVM alive
}
2. Without use of concurrency
public static void main(String[] args) {
System.out.println("start main ");
long startTime = new Date().getTime();
Collection<SomeComputation> collection = new ArrayList<SomeComputation>();
for(int i=0; i< 10000; i++){
collection.add(new SomeComputation("SomeComputation"+i));
}
for(SomeComputation sc:collection)
{
sc.compute();
}
System.out.println("\n end main "+(new Date().getTime() - startTime));
}
Both share a common class
class SomeComputation implements Callable<Boolean>
{
String name;
SomeComputation(String name){this.name=name;}
public Boolean compute()
{
someDumbStuff();
return true;
}
public Boolean call()
{
someDumbStuff();
return true;
}
private void someDumbStuff()
{
for (int i = 0;i<50000;i++)
{
Integer.compare(i,i+1);
}
System.out.print("\n done with "+this.name);
}
}
Now the analysis after 20-odd runs of each approach:
The first one, with concurrency, takes 451 ms on average.
The second one, without concurrency, takes 290 ms on average.
I have learned that this depends on the configuration, OS, Java version (Java 7 here), and processor.
But all of these were the same for both approaches.
I also learned that the cost of concurrency is affordable only when the computation is heavy, but this point wasn't clear to me.
I hope someone can help me understand this better.
PS: I tried to find similar questions but couldn't find one of this kind. Please comment with the link if you do.

Concurrency has at least two different purposes: 1) performance, and 2) simplicity of code (like 1000 listeners for web requests).
If your purpose is performance, you can't get any more speedup than the number of hardware cores you put to work.
(And that's only if the threads are CPU-bound.)
What's more, each thread has a significant startup overhead.
So if you launch 1000 threads on a 4-core machine, you can't possibly do any better than a 4x speedup, but against that, you have 1000 thread startup costs.
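A minimal sketch of that point, assuming the work is CPU-bound: size the pool from the number of cores the JVM reports, not from the number of tasks, so 1000 tasks pay for only a handful of thread startups. The class and method names (PoolSizing, sumOfSquares) are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: many small CPU-bound tasks, but only as many threads as cores.
class PoolSizing {

    // Runs `tasks` jobs on a pool sized to the available cores and sums their results.
    static long sumOfSquares(int tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final long n = i;
                jobs.add(() -> n * n); // placeholder for real CPU-bound work
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(jobs)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // threads are reused across tasks, then released here
        }
    }

    public static void main(String[] args) {
        System.out.println("cores = " + Runtime.getRuntime().availableProcessors());
        System.out.println("sum = " + sumOfSquares(1000));
    }
}
```

The tasks are queued and picked up by the same few threads, so the startup cost is paid once per thread instead of once per task.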

As mentioned in one of the answers, one use of concurrency is simplicity of code: certain problems are logically concurrent and cannot be modeled naturally in a non-concurrent way, such as producer-consumer problems, listeners for web requests, etc.
Other than that, a concurrent program improves performance only if it is able to save CPU cycles for you. The goal is to keep the CPU or CPUs busy all the time and not waste cycles, which means letting the CPU do something useful while your program is busy with non-CPU work such as waiting for disk I/O, waiting for locks, sleeping, or waiting for user input in a GUI app. These waits simply add to your total program run time.
So the question is: what is your CPU doing when your program is not using it? Can I complete a portion of my program during that time and move the waiting part into another thread? Nowadays most systems are multiprocessor and multi-core, which leads to further waste if programs are not concurrent.
The example that you wrote does all of its processing in memory without entering any wait state, so you don't see a gain, only the loss from setting up threads and context switching.
Try measuring performance by hitting a DB: get 1 million records, process those records, and then save them to the DB again. Do it once sequentially in one go and once concurrently in small batches, and you should notice a performance difference, because DB operations are disk intensive: while you are reading from or writing to the DB you are actually doing disk I/O, and the CPU is free and wasting its cycles during that time.
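A rough sketch of that experiment without a real database, assuming Thread.sleep is an acceptable stand-in for an I/O wait. The names WaitingTasks and fakeIo are hypothetical, and the timings will vary by machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for DB or disk calls: each task waits without using the CPU.
class WaitingTasks {

    static Callable<Integer> fakeIo(int id, long waitMillis) {
        return () -> {
            Thread.sleep(waitMillis); // simulated I/O wait, not real work
            return id;
        };
    }

    // Runs the tasks one after another; returns elapsed milliseconds.
    static long runSequentially(int tasks, long waitMillis) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < tasks; i++) {
                fakeIo(i, waitMillis).call();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs the same tasks on a fixed pool; returns elapsed milliseconds.
    static long runConcurrently(int tasks, long waitMillis, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                jobs.add(fakeIo(i, waitMillis));
            }
            pool.invokeAll(jobs); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + runSequentially(8, 50));
        System.out.println("concurrent ms: " + runConcurrently(8, 50, 8));
    }
}
```

Because the waits overlap, the concurrent run should finish in roughly the time of one wait, while the sequential run pays for all of them in turn.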
In my opinion, good candidates for concurrency are long-running tasks involving one of the wait operations mentioned above; otherwise you don't see much gain. Programs which need some background tasks are also good candidates for concurrency.
Concurrency is not to be confused with CPU multitasking, i.e. running different programs on the same CPU at the same time.
Hope it helps!

Concurrency control is needed when threads share the same data source:
while one thread is working with that source, the others must wait until it
finishes the job, and only then do they get access.
So you need to learn about synchronized methods, synchronized blocks, and the like.
Sorry for my English. Read this tutorial, it's helpful:
https://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html
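A small sketch of what synchronized methods buy you, assuming a shared counter as the "data source". SharedCounter and countWithThreads are hypothetical names for this illustration.

```java
// Hypothetical shared resource guarded by synchronized methods.
class SharedCounter {
    private int value = 0;

    // Only one thread at a time may run a synchronized method on this instance.
    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }

    // Starts `threads` threads that each increment the same counter.
    static int countWithThreads(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker before reading the total
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```

Without the synchronized keyword, value++ is a read-modify-write that threads can interleave, and some increments can be lost.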

Related

vert.x - How many Timer possible

With vert.x, how many timers (mainly one-shot) can I create without having to worry about reduced performance, memory issues, etc.? Are there limits?
The scenario is not artificial. I don't want to loop and create a few thousand timers at once; rather, in a real-world scenario timers might be created continuously over time, a few at a time, or a few tens in an hour, or 100 or so over a day. But it might happen that a few thousand have been created overall and are waiting to run.
And what should I expect if, for example, a few hundred of them fire at the same time or at almost the same time? How does vert.x handle such load?
Cheers

Short answer: nothing much will happen, even if you have a million periodic timers set.
You can try it out for yourself:
Vertx vertx = Vertx.vertx();
AtomicLong counter = new AtomicLong(0);
for (int i = 0; i < 1_000_000; i++) {
vertx.setPeriodic(1, (l) -> {
if (counter.incrementAndGet() % 1_000_000 == 0) {
System.out.print(".");
}
});
}
System.out.println("Done deploying");
Here you can see that I create 1M timers, each of them incrementing a counter every millisecond. This will peg your CPU and consume a lot of memory, but it will work.
And we're talking about actively running timers. The only resource your sleeping timers will consume is memory (since they are a couple of objects sitting in a data structure).
As long as you don't run out of memory (and you'll need hundreds of thousands of them for that to happen), you should be fine.
And if you're really curious how timers in Vert.x work, you can start looking from here: https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/impl/VertxImpl.java#L372

Java multithreading for the purpose of simulating data

So I am currently creating a data analytics and predictive program, and for testing purposes I am simulating large amounts of data (in the range of 10,000 to 1,000,000 "trials"). The data is a simulated Match for a theoretical game. Each Match has rounds. The basic pseudocode for the program is this:
main(){
data = create(100000);
saveToFile(data);
}
Data create(){
Data returnData = new Data(playTestMatch());
}
Match playTestMatch(){
List<Round> rounds = new List<Round>();
while(!GameFinished){
rounds.add(playTestRound());
}
Match returnMatch = new Match(rounds);
}
Round playTestRound(){
//Do round stuff
}
Right now, I am wondering whether I can handle the simulation of these rounds over multiple threads to speed up the process. I am NOT familiar with the theory behind multithreading, so would someone please either help me accomplish this, OR explain to me why this won't work (won't speed up the process). Thanks!

If you are new to Java multithreading, this explanation might seem a little difficult at first, but I'll try to make it as simple as possible.
In general, I think that when you have large datasets, running operations concurrently on multiple threads does significantly speed up the process as opposed to a single-threaded approach, but there are exceptions of course.
You need to think about three things:
Creating threads
Managing Threads
Communicating/sharing results computed by each thread with main thread
Creating Threads:
Threads can be created manually by extending the Thread class, or you can use the Executors class.
I would prefer the Executors class, as it lets you create a thread pool and does the thread management for you. That is, it will re-use existing threads that are idle in the thread pool, thus reducing the memory footprint of the application.
You also have to look at the ExecutorService interface, as you will be using it to execute your tasks.
Managing threads:
Executors/ExecutorService does a great job of managing threads automatically, so if you use it you don't have to worry much about thread management.
Communication: This is the key part of the entire process. Here you have to consider the thread safety of your app in great detail.
I would recommend using two queues to do the job: a read queue to read data from and a write queue to write data to.
But if you are using a simple ArrayList, make sure that you synchronize your code for thread safety by enclosing access to the ArrayList in a synchronized block:
synchronized(arrayList){
// do stuff
}

If your code is thread-safe and you can split the task into discrete chunks that do not rely on each other then it is relatively easy. Make the class that does the work Callable and add the chunks of work to a List, and then use ExecutorService, like this:
ArrayList<Simulation> SL=new ArrayList<Simulation>();
for(int i=0; i<chunks; i++)
SL.add(new Simulation(i));
ExecutorService executor=Executors.newFixedThreadPool(nthreads);//how many threads
try {
List<Future<Result>> results = executor.invokeAll(SL);
for(Future<Result> result:results)
result.get().print(); // a Future wraps the Result; get() unwraps it
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
executor.shutdown();
}
So, Simulation is a Callable that returns a Result, and results is a List which gets filled when executor.invokeAll is called with the ArrayList of simulations. Once you've got your results you can print them or whatever. It is probably best to set nthreads equal to the number of cores you have available.
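For completeness, here is a self-contained sketch of how the pieces could fit together. The bodies of Simulation and Result are placeholders invented for this illustration (the real ones would play a match), and SimulationRunner is a hypothetical wrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Placeholder result type: remembers which chunk produced which value.
class Result {
    final int chunk;
    final long value;

    Result(int chunk, long value) {
        this.chunk = chunk;
        this.value = value;
    }

    void print() {
        System.out.println("chunk " + chunk + " -> " + value);
    }
}

// Placeholder unit of work: independent of every other chunk, so it is safe to run in parallel.
class Simulation implements Callable<Result> {
    private final int chunk;

    Simulation(int chunk) {
        this.chunk = chunk;
    }

    @Override
    public Result call() {
        long sum = 0;
        for (int i = 0; i <= chunk; i++) {
            sum += i; // stand-in for playing a match
        }
        return new Result(chunk, sum);
    }
}

class SimulationRunner {
    // Runs `chunks` simulations on one thread per core and returns results in submission order.
    static List<Result> runAll(int chunks) {
        int nthreads = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(nthreads);
        try {
            List<Simulation> simulations = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                simulations.add(new Simulation(i));
            }
            List<Result> results = new ArrayList<>();
            for (Future<Result> future : executor.invokeAll(simulations)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        for (Result result : runAll(5)) {
            result.print();
        }
    }
}
```

invokeAll returns its futures in the same order as the submitted tasks, so the results line up with the chunk indices even though the chunks may finish in any order.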

Does it make sense to reuse Runnables in a thread pool?

I'm implementing a thread pool for processing a high volume market data feed and have a question about the strategy of reusing my worker instances that implement runnable which are submitted to the thread pool for execution. In my case I only have one type of worker that takes a String and parses it to create a Quote object which is then set on the correct Security. Given the amount of data coming off the feed it is possible to have upwards of 1,000 quotes to process per second and I see two ways to create the workers that get submitted to the thread pool.
The first option is simply to create a new instance of Worker every time a line is retrieved from the underlying socket and add it to the thread pool; it will eventually be garbage collected after its run method has executed. But this got me thinking about performance: does it really make sense to instantiate 1,000 new instances of the Worker class every second? In the same spirit as a thread pool, do people know if it is a common pattern to have a runnable pool or queue as well, so I can recycle my workers to avoid object creation and garbage collection? The way I see this being implemented is that, before returning from the run() method, the Worker adds itself back to a queue of available workers, which is then drawn from when processing new feed lines instead of creating new instances of Worker.
From a performance perspective, do I gain anything by going with the second approach or does the first make more sense? Has anyone implemented this type of pattern before?
Thanks - Duncan

I use a library I wrote called Java Chronicle for this. It is designed to persist and queue one million quotes per second without producing any significant garbage.
I have a demo here where it sends quote like objects with nano second timing information at a rate of one million messages per second and it can send tens of millions in a JVM with a 32 MB heap without triggering even a minor collection. The round trip latency is less than 0.6 micro-seconds 90% of the time on my ultra book. ;)
from a performance perspective, do I gain anything by going with the second approach or does the first make more sense?
I strongly recommend not filling your CPU caches with garbage. In fact I avoid any constructs which create any significant garbage. You can build a system which creates less than one object per event end to end. I have an Eden size which is larger than the amount of garbage I produce in a day, so there are no GCs, minor or full, to worry about.
Has anyone implemented this type of pattern before?
I wrote a profitable low latency trading system in Java five years ago. At the time it was fast enough at 60 micro-seconds tick to trade in Java, but you can do better than that these days.
If you want low latency market data processing system, this is the way I do it. You might find this presentation I gave at JavaOne interesting as well.
http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java
EDIT I have added this parsing example
ByteBuffer wrap = ByteBuffer.allocate(1024);
ByteBufferBytes bufferBytes = new ByteBufferBytes(wrap);
byte[] bytes = "BAC,12.32,12.54,12.56,232443".getBytes();
int runs = 10000000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
bufferBytes.reset();
// read the next message.
bufferBytes.write(bytes);
bufferBytes.position(0);
// decode message
String word = bufferBytes.parseUTF(StopCharTesters.COMMA_STOP);
double low = bufferBytes.parseDouble();
double curr = bufferBytes.parseDouble();
double high = bufferBytes.parseDouble();
long sequence = bufferBytes.parseLong();
if (i == 0) {
assertEquals("BAC", word);
assertEquals(12.32, low, 0.0);
assertEquals(12.54, curr, 0.0);
assertEquals(12.56, high, 0.0);
assertEquals(232443, sequence);
}
}
long time = System.nanoTime() - start;
System.out.println("Average time was " + time / runs + " nano-seconds");
When run with -verbose:gc -Xmx32m it prints
Average time was 226 nano-seconds
Note: there are no GCs triggered.

I'd use the Executor from the concurrency package. I believe it handles all this for you.

does it really make sense to instantiate 1,000 new instances of the Worker class every second.
Not necessarily. However, you would have to put the Runnables into some sort of BlockingQueue so that they can be reused, and the cost of the queue concurrency may outweigh the GC overhead. Using a profiler or watching the GC numbers via JConsole will tell you whether the application is spending a lot of time in GC and whether this needs to be addressed.
If this does turn out to be a problem, a different approach would be to put your Strings into your own BlockingQueue and submit the Worker objects to the thread pool only once. Each of the Worker instances would dequeue from the queue of Strings and would never quit. Something like:
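public void run() {
while (!shutdown) {
try {
String value = myQueue.take(); // blocks until a line is available
...
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the flag and stop the worker
return;
}
}
}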
So you would not need to create your 1000s of Workers per second.
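A runnable sketch of this long-lived worker idea, under stated assumptions: the parsing is a placeholder (a plain split), shutdown is done by interrupting the workers via shutdownNow(), and the names QuoteWorkers and processAll are hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: a fixed set of workers submitted once, fed through a queue of Strings.
class QuoteWorkers {

    // Feeds `input` to `workers` long-lived workers and returns how many lines were processed.
    static int processAll(List<String> input, int workers) {
        BlockingQueue<String> lines = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(input.size());
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String line = lines.take(); // blocks without burning CPU
                        line.split(",");            // placeholder for real quote parsing
                        processed.incrementAndGet();
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdownNow() ends the loop here
                }
            });
        }

        lines.addAll(input); // the only per-line allocation is the String itself
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> feed = java.util.Collections.nCopies(1000, "BAC,12.32,12.54");
        System.out.println("processed " + processAll(feed, 2));
    }
}
```

Whether this beats creating a short-lived Worker per line depends on the queue contention versus the GC cost, so it is worth measuring both.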

Yes, of course, something like this. The OS and JVM don't care what is going on in a thread, so in general it is good practice to reuse a recyclable object.

I see two questions in your problem. One is about thread pooling, and another is about object pooling. For your thread pooling issue, Java has provided an ExecutorService . Below is an example of using an ExecutorService.
Runnable r = new Runnable() {
public void run() {
//Do some work
}
};
// Thread pool of size 2
ExecutorService executor = Executors.newFixedThreadPool(2);
// Add the runnables to the executor service
executor.execute(r);
The ExecutorService provides many different types of thread pools with different behaviors.
As far as object pooling is concerned (does it make sense to create 1,000 of your objects per second and then leave them for garbage collection?), this all depends on the statefulness and expense of your object. If you're worried about the state of your worker threads being compromised, you can look at using the flyweight pattern to encapsulate your state outside of the worker. Additionally, if you were to follow the flyweight pattern, you can also look at how useful Future and Callable objects would be in your application architecture.

java square wave

I am trying to create a square wave on the parallel port with java. So far I have this implementation.
public class Wave extends Thread {
public Wave() {
super();
setPriority(MAX_PRIORITY);
}
@Override
public void run() {
Wave.high();
LockSupport.parkNanos(20000000);
Wave.low();
LockSupport.parkNanos(20000000);
}
public static native void high();
public static native void low();
}
In which high() and low() are implemented using JNI (a shared C library controls the parallel port). It works pretty well; it generates a square wave with a period of about 40ms. Using an oscilloscope it looks like the standard deviation is about 10 microseconds when the computer is idle. When the computer is not idle the standard deviation becomes much larger. I think this is because more context switches happen and Threads stay too long in the waiting state and the specified 20 ms is not achieved accurately.
Is there a way to make my implementation more accurate? I know I could use hardware for this but I want to know if I can do this with software too.
Would an option be to "listen" to a clock and perform an action timed to the millisecond?

Just "listening" to the clock won't solve the problem of context switches causing jitter.
If you can dedicate a core to this:
bind the thread to the core;
move IRQ handling to other cores;
have a tight loop constantly checking the time (using System.nanoTime() or RDTSC/RDTSCP), and calling high()/low() as appropriate.
This way you should be able to achieve very low jitter.
Of course, if the task is to simply produce a square wave, this is a pretty inefficient use of computing resources.
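A minimal sketch of the tight-loop part only, assuming System.nanoTime() as the clock. Binding the thread to a core and moving IRQ handling are operating-system settings and are not shown. SpinWave is a hypothetical name, and the two Runnables stand in for the native high() and low() calls.

```java
// Hypothetical busy-wait toggler: spins on System.nanoTime() against absolute deadlines.
class SpinWave {

    // Toggles `toggles` times, starting with high, one toggle per half period.
    // Returns the nanoTime reading taken right after each toggle.
    static long[] run(int toggles, long halfPeriodNanos, Runnable high, Runnable low) {
        long[] stamps = new long[toggles];
        long deadline = System.nanoTime();
        for (int i = 0; i < toggles; i++) {
            while (System.nanoTime() - deadline < 0) {
                // busy-wait: the thread never parks, so it is not waiting to be rescheduled
            }
            if (i % 2 == 0) {
                high.run();
            } else {
                low.run();
            }
            stamps[i] = System.nanoTime();
            deadline += halfPeriodNanos; // absolute schedule, so lateness does not accumulate
        }
        return stamps;
    }

    public static void main(String[] args) {
        // 20 ms half period, as in the question; the lambdas stand in for the JNI calls.
        long[] stamps = run(10, 20_000_000L, () -> { }, () -> { });
        for (int i = 1; i < stamps.length; i++) {
            System.out.println("gap ns: " + (stamps[i] - stamps[i - 1]));
        }
    }
}
```

The loop occupies one core completely while it runs, which is the cost this approach pays for low jitter.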

i think there are going to be two sources of jitter.
first, garbage collection (and possibly other background processes, like the JIT) in java. for the code you gave, there should not be any gc. but if this is part of a larger system then you will likely find that garbage collection is required, and that it may alter the timings when it runs. you can try to ameliorate this by playing with the jvm settings (java -X).
second, the system scheduler. in addition to the suggestions by aix, you can bump the priority of the process and do some linux-specific tweaks. this article explains some of the problems with linux. ubuntu has a low-latency kernel, which you can install, but i can't find info on what it actually contains so you can do the same on other systems (update: i think it may contain this patch). if you want to look for more info, "low latency" is the key thing to search for, and people doing audio processing on linux tend to be the ones who care most about this.

If your context switching does not cause too much delay, you may try to park your thread until a given time, rather than for a given interval:
public class Wave extends Thread {
private final Object BLOCKER = new Object();
public Wave() {
super();
setPriority(MAX_PRIORITY);
}
@Override
public void run() {
// I suspect this should be running in an endless loop?
for (;;) {
Wave.high();
long t1 = System.currentTimeMillis();
// Round interval up to the next 20ms "deadline"
LockSupport.parkUntil(BLOCKER, t1 + 20 - (t1 % 20));
Wave.low();
// Round interval up to the next 20ms "deadline"
long t2 = System.currentTimeMillis();
LockSupport.parkUntil(BLOCKER, t2 + 20 - (t2 % 20));
}
}
public static native void high();
public static native void low();
}
As this relies on the wall-clock time in ms, rather than a more precise nano-seconds time, this will not work well for much higher frequencies. But this may not work either, as GC (and other processes) may interrupt this thread for an "unfortunate" amount of time, resulting in the same jitter.
When I tested this on my Windows 7 quad-core with JDK 6, I had some non-negligible jitter about every second, so aix's solution is probably better.
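The rounding step in the code above can be checked in isolation. This is only a worked example of the arithmetic, with the hypothetical helper nextDeadline standing in for the inline expression:

```java
// Hypothetical helper that isolates the deadline arithmetic passed to parkUntil.
class Deadlines {

    // Returns the next multiple of periodMillis that is strictly after nowMillis.
    static long nextDeadline(long nowMillis, long periodMillis) {
        return nowMillis + periodMillis - (nowMillis % periodMillis);
    }

    public static void main(String[] args) {
        System.out.println(nextDeadline(1005, 20)); // 1020
        System.out.println(nextDeadline(1019, 20)); // 1020
        System.out.println(nextDeadline(1000, 20)); // 1020, not 1000
    }
}
```

Because every deadline is a multiple of the period, a late wake-up shortens the next wait instead of shifting all later edges.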

Why does my Android project raise CPU usage to 60% ~ 100%?

Hello, I'm making a chat application in Android.
Overall, I have a service which contains lots of classes and threads.
In my service I have a socket input reader class, a socket output writer class, and a pinger, which in total have 6 threads.
I'm very new to this problem, and I have no idea what makes a program use a high percentage of CPU. Is it caused by too many static variables, too many running threads, or too many local variables?
I don't know exactly what is going on.
So please share your experience and knowledge with me.
UPDATE
public void run() {
while(isRunning) {
try {
Thread.sleep(500);
if(!startCheck) {
//Log.v(TAG, "SocketQueue: "+socketTaskQueue.size()
if(socketTaskQueue.size() > 0) {
processSocketTask();// TODO
}
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
So basically, I made my threads like the example above.
I have a Vector called socketTaskQueue, and this thread's job is to check whether there is a socket task or not. If there is, it executes the processSocketTask function, which gets the top element of the queue and then removes it.
UPDATE
T.T This post is embarrassing! I forgot to put Thread.sleep() in some of my threads!
SORRY FOR BOTHERING YOU GUYS! :p

It is caused, usually, by threads that use CPU even when they cannot accomplish useful work. For example, when a thread is waiting for something to happen, does it wait in a way that uses no CPU? Or does it keep waking up needlessly even before it can do work?
It can also be caused by threads that do work in extremely inefficient ways.
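One way to wait without using CPU, sketched under the assumption that the task queue can be a BlockingQueue instead of a Vector: the thread sleeps inside poll() until a task arrives or the timeout expires, so no separate Thread.sleep() is needed. SocketTaskLoop and runDemo are hypothetical names; the field names mirror the question.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for a sleep-and-check loop.
class SocketTaskLoop implements Runnable {
    private final BlockingQueue<Runnable> socketTaskQueue = new LinkedBlockingQueue<>();
    private volatile boolean isRunning = true;

    void submit(Runnable task) {
        socketTaskQueue.add(task);
    }

    void stop() {
        isRunning = false;
    }

    @Override
    public void run() {
        while (isRunning) {
            try {
                // Blocks until a task arrives or 500 ms pass, then rechecks isRunning.
                Runnable task = socketTaskQueue.poll(500, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Runs `tasks` trivial tasks through the loop and returns how many executed.
    static int runDemo(int tasks) {
        SocketTaskLoop loop = new SocketTaskLoop();
        Thread worker = new Thread(loop);
        worker.start();
        AtomicInteger ran = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            loop.submit(() -> {
                ran.incrementAndGet();
                done.countDown();
            });
        }
        try {
            done.await();
            loop.stop();
            worker.join(); // returns once the last poll() times out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks run: " + runDemo(50));
    }
}
```

Unlike a fixed sleep, a waiting task is picked up as soon as it is queued, and an idle thread costs nothing while it is blocked.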
