Multiple threads waiting for nothing? - java

TL;DR: during a massive multithreaded database insertion, multiple threads are waiting for no evident reason.
We need to create multiple rows in a database. To speed up insertion, we use multithreading so that multiple objects can be generated and inserted in parallel. We are using Hibernate, Spring Batch and Spring scheduling (ThreadPoolTaskExecutor, Partitioner, ItemProcessor). We started from this example.
We looked at thread states with JVisualVM and noticed that there are never more than 8 active threads at a time, whatever hardware runs the program. We tried "standard desktop" computers (dual core), but also two AIX machines: one with 8 active CPUs, one with 60 active CPUs.
Any idea why we can't have more than 8 working threads at a time?
A list of things we already checked:
All threads have work to do (the Partitioner and ThreadPoolTaskExecutor are configured so that each thread has the same amount of data to insert in the DB; see the sketch after this list).
We tried various commit intervals: 1, P where P is the size of a partition, and N where N is the sum of all P (it should not be the cause of the problem, but committing data seems to be the long part of the job, while data generation is fast).
8 is not a default value of any object's parameter we use.
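For reference, a minimal sketch of the executor configuration we mean (the sizes are placeholders, not our real values):

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Core and max pool sizes are pinned to the same value so the pool
// itself cannot silently cap concurrency below the partition count.
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(16); // placeholder, matched to the partition count
taskExecutor.setMaxPoolSize(16);
taskExecutor.afterPropertiesSet(); // initializes the underlying ThreadPoolExecutor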

Related

Changing the number of running steps by increasing the step pool size in Spring Batch

This is a compound question regarding how changing thread pool sizes at run-time affects the Spring Batch run-time system.
To start, a terminology clarification: concurrency = # of running steps, and parallelism = # of threads per step.
For a clear understanding of how I am using Spring Batch to do my processing: I currently have a large number of files (200+) being generated, and I am using Spring Batch to transfer the files, where each step maps to one file.
Everything about the job is dynamic: the number of steps, and each step's reader and writer, are distinct to that step, so no steps share readers or writers. There is a thread pool dedicated to running the steps concurrently, and each step also has its own thread pool so we can do parallelism per step. Combined with the commit interval, this gives great throughput and control.
So my questions are:
How can I change the number of running steps after the Job has started?
How can I change the commit interval after a step has started processing?
So let's consider an example of why I would like to do this and what exactly I mean by changing the "running steps" and the "commit interval".
Consider the case where you have a total of 300 steps to process with a step thread pool size of 5. I begin processing and realize that I have more resources to utilize, so I would like to change the thread count to, say, 8.
When I actually do this at run-time what I experience is that the thread pool does increase but the number of running steps does not change. Why is that?
Following similar logic, say I have more memory to utilize; I would then like to increase my commit interval at run-time. Surprisingly, I have not found anything in the StepExecution class that would let me change the commit interval. Why not?
What is interesting is that for parallelism I am able to change the number of running threads by simply increasing that thread pool's size. From simply changing the number of parallel threads I noticed a massive increase in throughput.
If you would like more information I can provide code, and link to the repository.
Thank you very much.
While it is possible to make the commit interval and thread pool size configurable and change them at startup time, it is not possible to change them at runtime (i.e. "in-flight") once the job execution has started.
Making the commit interval and thread pool size configurable (via application/system properties or passing them as job parameters) will allow you to empirically adapt the values to best utilize your resources without having to recompile/repackage your application.
The runtime dynamism you are looking for is not available by default, but you can always implement the Step interface and use it as part of a Spring Batch job next to other step types provided out-of-the-box by the framework.
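To illustrate that configurability, here is a minimal Spring Batch 4-style sketch (the property names and default values are assumptions) of a step whose commit interval and per-step pool size come from externalized properties:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class TunableStepConfig {

    // Pool size is read from a property (it could equally be a job
    // parameter), so it can be tuned without recompiling.
    @Bean
    public ThreadPoolTaskExecutor stepTaskExecutor(
            @Value("${app.step.pool-size:4}") int poolSize) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(poolSize);
        executor.setMaxPoolSize(poolSize);
        return executor;
    }

    @Bean
    public Step fileTransferStep(StepBuilderFactory steps,
                                 ItemReader<String> reader,
                                 ItemWriter<String> writer,
                                 ThreadPoolTaskExecutor stepTaskExecutor,
                                 @Value("${app.step.commit-interval:100}") int commitInterval) {
        return steps.get("fileTransferStep")
                .<String, String>chunk(commitInterval) // commit interval = chunk size
                .reader(reader)
                .writer(writer)
                .taskExecutor(stepTaskExecutor)        // parallelism within the step
                .build();
    }
}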

Schedulers.elastic() not using early generated threads

I've written a web server on spring-boot-2.0 which uses Netty under the hood.
This application uses Schedulers.elastic() threads.
When it started, about 100 elastic threads were created. These threads were rarely used and we had little load. But after a working day, the number of threads in the elastic pool had increased to 1300, and now execution happens on the elastic-1XXX and elastic-12XX threads (the numbers in the names are above 100 and even 900).
Elastic, as I understand it, uses a cachedThreadPool under the hood.
Why have new elastic threads been created, and why have tasks switched to the new threads?
What are the criteria for adding new threads?
And why haven't the old threads (elastic-XX, elastic-1xx) been shut down?
Without more information about the type of workload, the maximum concurrency, burst and average task duration, it’s really hard to tell if there’s a problem here.
By definition, the elastic scheduler creates an unbounded number of threads as long as new tasks are added to the queue.
If the workload is bursty, with high concurrency at regular times, then it's not unexpected to find a large number of threads. You could leverage the newElastic variants to reduce the TTL (the default is 60s).
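For example (the scheduler name and TTL here are arbitrary), idle threads in this pool are evicted after 10 seconds instead of 60:

import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Same semantics as Schedulers.elastic(), but idle threads are
// reclaimed after 10s, so a burst doesn't leave threads lingering.
Scheduler shortTtl = Schedulers.newElastic("short-ttl-elastic", 10);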
Again, without more information it’s hard to tell but your workload might not fit this scheduler. If your workload is CPU bound, the parallel scheduler is a better fit. The elastic one is tailored for IO/latency bound tasks.
The problem was that I was using Schedulers.elastic() for non-blocking operations, i.e. there were no blocking operations that actually needed it. When I removed elastic(), my service started working correctly (without elastic threads).

Why are my long-running Java threads (5k+ threads) not utilizing all of the machine's cores (12 cores)?

I've written a simple multithreaded Java application. The main method just creates 5k threads; each thread loops over a list of 5M records to process.
My Machine specs:
CPU cores: 12 cores
Memory: 13Gb RAM
OS: Debian 64-bit
My jar is now running, and I use htop to monitor my application; this is what I can see while it's running.
And this is how I construct a thread:
ExecutorService executor = Executors.newCachedThreadPool();
Future<MatchResult> future = executor.submit(() -> {
    Match match = new Match();
    return match.find(this);
});
Match.class:
MatchResult find(Main main) {
    // loops over a list of 5M records
    // processes the values and does some calculations
    // sends the result back to the caller
    // this method has no problem, it just takes a long time to run (~160 min)
}
And now I have some questions:
1- Based on my understanding, if I have a multithreaded process, it'll fully utilize all my cores until the task is completed, so why is the workload only around 0.5 (only half a core is used)?
2- Why is my Java app's state "S" (sleeping) while it's actually running and filling up the log file?
3- Why can I only see 2037 threads out of 5k running (this number was actually lower and is increasing over time)?
My target: to utilize all cores and get all these 5k+ tasks done as fast as possible :)
Based on my understanding, if I have a multithreaded process, it'll fully utilize all my cores until the task is completed.
Your understanding is not correct. There are lots of reasons why cores may not (all) be used in a poorly designed multi-threaded application.
so why is the workload only around 0.5 (only half a core is used)?
A number of possible reasons:
The threads may be deadlocked.
The threads may all be contending for a single lock (or a small number of locks), resulting in most of them waiting.
The threads could all be waiting for I/O; e.g. reading the records from some database.
And those are just some of the more obvious possible reasons.
Given that your threads are making some progress, I think explanation #2 is a good fit for your "symptoms".
For what it is worth, creating 5k threads is almost certainly a really bad idea. At most 12 of them could possibly be running at any time. The rest will be waiting to run (assuming you resolve the problem that is leading to thread starvation) and tying up memory. The latter has various secondary performance effects.
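To illustrate (this is a sketch, not the poster's actual code), bound the pool to the core count instead of creating 5k threads; the 5k tasks simply queue up and are drained by at most 12 workers:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Size the pool from the hardware, not from the task count.
int cores = Runtime.getRuntime().availableProcessors(); // 12 on this machine
ExecutorService executor = Executors.newFixedThreadPool(cores);
// submit() the 5k Match tasks as before; each waits in the queue
// until one of the 12 worker threads is free.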
My target: to utilize all cores and get all these 5k+ tasks done as fast as possible :)
Those two goals are probably mutually exclusive :-)
All threads are logging to the same file via java.util.Logger.
That is possibly leading to them all contending for the same lock on something in the logging framework, or bottlenecking on file I/O for the log file.
Generally speaking, logging is expensive. If you want performance, minimize your logging, and for the cases where logging is essential, use a logging framework that doesn't introduce a concurrency bottleneck.
The best way to solve this problem is to profile the code and figure out where it is spending most of its time.
Guesswork is inefficient.
Thank you guys, I've fixed the problem and now I have all 12 cores running at maximum, as you can see in the picture. :)
I actually ran jstack <Pid> to see the status of all running threads in the process, and I found that 95% of my threads were actually BLOCKED at the logging line. I did some googling and found that I can use AsyncAppender in Log4j so logging will not block the worker thread.
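For anyone curious, here is a toy sketch of the idea behind an async appender (use the real Log4j AsyncAppender in production, not this): worker threads enqueue messages cheaply, while a single background thread does the actual I/O, so nobody blocks on the file lock.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncLogSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    AsyncLogSketch() {
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take(); // blocks only this drainer thread
                    System.out.println(msg);   // stand-in for the real file I/O
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    void log(String msg) {
        queue.offer(msg); // workers never contend on the file lock
    }
}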

Scalability guidance for spawning 50 thousand threads

I have a Java app which reads JSON files containing SQL queries and fires them at a database using JDBC.
Now I have 50 thousand such files, and I need to spawn 50 thousand independent threads to read each file and upload its contents into the database. I need to spawn these threads at a specific time, after a specific number of seconds. For example, I have the following map of sorted login details telling me when to spawn the threads. Login times are in seconds: many threads are to be spawned at 0 seconds, 10 seconds, 50 seconds, etc.
Map<String,Integer> loginMap = new HashMap<>(50000);
I am using a ScheduledExecutorService to schedule these threads. I have something like the following:
ScheduledExecutorService ses = Executors.newScheduledThreadPool(50000);
for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
    Integer loginTime = entry.getValue();
    ses.schedule(new MyWorker(entry.getKey()), loginTime, TimeUnit.SECONDS);
}
The above code works for a few thousand small files, but it does not scale to 50 thousand, and since my workers use JDBC connections, the database is running out of connections.
This happens even though I acquire the connection in the thread's run method. Do these threads start executing run even when they are not supposed to run yet? I am new to multi-threading.
You don't want 50,000 threads! Each thread consumes some resources, particularly an area of RAM for stack space; this could be about 1MB. Do you have 50GB of RAM?
There is also no benefit to running many more threads than you have cores.
This doesn't mean you can't queue 50,000 tasks; just use a sensible number of worker threads related to the hardware.
ScheduledExecutorService ses = Executors.newScheduledThreadPool(8); // sensible, though it could be derived from actual hardware capabilities
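As a sketch of deriving that size from the hardware (MyWorker is your class from the question; the map contents here are made up), the 50,000 scheduled tasks then share a handful of threads:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LoginScheduler {
    public static void main(String[] args) {
        // Size the pool from the hardware rather than the task count.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(poolSize);

        Map<String, Integer> loginMap = new HashMap<>(); // file -> delay in seconds
        loginMap.put("queries-0001.json", 0);  // hypothetical entries
        loginMap.put("queries-0002.json", 10);

        for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
            // schedule() only enqueues the task; run() is not invoked until
            // the delay expires and a worker thread is free.
            ses.schedule(new MyWorker(entry.getKey()), entry.getValue(), TimeUnit.SECONDS);
        }
    }
}

This also answers the follow-up question: schedule() does not start your run method early; the task just sits in the executor's queue until its delay expires.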

Distribute database records evenly across multiple processes

I have a database table with 3 million records. A Java thread reads 10,000 records from the table and processes them. After processing, it jumps to the next 10,000, and so on. To speed this up, I have 25 threads doing the same task (reading + processing), and then I have 4 physical servers running the same Java program. So effectively I have 100 threads doing the same work (reading + processing).
The strategy I have used is an SQL procedure which does the work of grabbing the next 10,000 records and marking them as being processed by a particular thread. However, I have noticed that the threads seem to wait for some time when invoking the procedure and getting a response back. What other strategy can I use to speed up this data selection?
My database server is MySQL and the programming language is Java.
The idiomatic way of handling such a scenario is the producer-consumer design pattern, and the idiomatic way of implementing it in Java land is by using JMS.
Essentially you need one master server reading records and pushing them to a JMS queue. Then you'll have an arbitrary number of consumers reading from that queue and competing with each other. It is up to you how to implement this in detail: do you want to send a message with the whole record or only its ID? All 10,000 records in one message or one record per message?
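As a rough sketch of the producer side with one record ID per message (the queue name is an assumption, and obtaining the broker's ConnectionFactory, e.g. from ActiveMQ, is elided):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

public class RecordProducer {
    // One master process publishes record IDs; consumers on the other
    // servers compete for the messages instead of locking table ranges.
    public static void publish(ConnectionFactory factory, Iterable<Long> recordIds)
            throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("records.to.process");
            MessageProducer producer = session.createProducer(queue);
            for (Long id : recordIds) {
                // One ID per message keeps consumers trivial; batching
                // 10,000 IDs per message is the other option discussed above.
                producer.send(session.createTextMessage(String.valueOf(id)));
            }
        } finally {
            connection.close();
        }
    }
}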
Another approach is map-reduce, check out hadoop. But the learning curve is a bit steeper.
Sounds like a job for Hadoop to me.
I would suspect that you are heavily database-I/O bound with this scheme. If you are trying to increase the performance of your system, I would suggest partitioning your data across multiple database servers if you can. MySQL has some partitioning modes that I have no experience with. If you partition yourself, it can add a lot of complexity to the database schema, and you'd have to add some sort of routing layer using a hash mechanism to divide your records across the multiple partitions. But I suspect you'd get a significant speed increase, and your threads would not be waiting nearly as much.
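For illustration, a minimal sketch of such a hash routing layer (the class and field names are invented, and real sharding would also need to handle rebalancing):

import java.util.List;

public class ShardRouter {
    private final List<String> shardJdbcUrls; // one JDBC URL per database shard

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    // Hash the primary key to pick a shard; floorMod keeps the result
    // non-negative even for negative keys.
    public String urlFor(long recordId) {
        int shard = (int) Math.floorMod(recordId, (long) shardJdbcUrls.size());
        return shardJdbcUrls.get(shard);
    }
}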
If you cannot partition your data, then moving your database to an SSD would be a huge win, I suspect; anything to increase the I/O rates on those partitions. Stay away from RAID 5 because of its inherent performance issues. If you need a reliable file system, then mirroring or RAID 10 would have much better performance, with RAID 50 also being an option for a large partition.
Lastly, you might find that your application performs better with fewer threads if you are thrashing your database I/O bus. This depends on a number of factors, including concurrent queries, database layout, etc. You might try dialing down the per-client thread count to see if that makes a difference. The effect may be minimal, however.
