Scalability guidance for spawning 50 thousand threads

I have a Java app that reads a JSON file containing SQL queries and fires them at a database using JDBC.
Now I have 50 thousand such files, and I need to spawn 50 thousand independent threads to read each file and upload it into the database. I need to spawn these threads at specific times, after a specific number of seconds. For example, I have the following Map of sorted login details saying when I should spawn these threads. The login times are in seconds; many threads are to be spawned at 0 seconds, 10 seconds, 50 seconds, etc.
Map<String,Integer> loginMap = new HashMap<>(50000);
I am using a ScheduledExecutorService to schedule these threads, with something like the following:
ScheduledExecutorService ses = Executors.newScheduledThreadPool(50000);
for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
    int loginTime = entry.getValue();
    ses.schedule(new MyWorker(entry.getKey()), loginTime, TimeUnit.SECONDS);
}
The above code works for a few thousand small files, but it does not scale to 50 thousand, and since my worker uses JDBC connections, the database is running out of connections.
This happens even though I acquire the connection in the thread's run method. Do these threads start executing run() even before they are supposed to run? I am new to multi-threading.

You don't want 50,000 threads! Each thread consumes resources, particularly an area of RAM for its stack, which could be about 1 MB. Do you have 50 GB of RAM?
There is also no benefit in running many more threads than you have cores.
This doesn't mean you can't queue 50,000 tasks on a sensible number of worker threads related to the hardware.
ScheduledExecutorService ses = Executors.newScheduledThreadPool(8); // sensible, though it could be derived from the actual hardware capabilities
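To make the answer to "do these threads start early?" concrete: a task passed to schedule() is not executed before its delay elapses, and a scheduled pool of N threads runs at most N task bodies at once, so at most N JDBC connections are open if each body opens and closes its own. The sketch below is an illustration under assumptions, not the asker's code: the class name, the pool size parameter, and the empty worker body (where reading the file and running its queries would go) are all hypothetical, and it counts how many bodies ever ran concurrently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ScheduledUploads {
    /**
     * Schedules one task per entry (key = file name, value = delay) on a pool of poolSize threads.
     * Returns {tasksCompleted, maxTasksRunningAtOnce}.
     */
    public static int[] scheduleAll(Map<String, Integer> loginMap, int poolSize, TimeUnit unit)
            throws InterruptedException {
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
            ses.schedule(() -> {
                // Track how many task bodies are executing right now.
                maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    // Hypothetical worker body: read the file named by entry.getKey(),
                    // open a JDBC connection, run the queries, close the connection.
                } finally {
                    running.decrementAndGet();
                    completed.incrementAndGet();
                }
            }, entry.getValue(), unit);
        }
        ses.shutdown(); // by default, already-scheduled delayed tasks still run after shutdown()
        ses.awaitTermination(1, TimeUnit.HOURS);
        return new int[] {completed.get(), maxRunning.get()};
    }
}
```

With 50,000 entries and a pool of 8, all 50,000 tasks are queued but never more than 8 run at the same time.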

Related

Configuring akka dispatcher for large amount of concurrent graphs

My current system has around 100 thousand running graphs, each built like this:
Amqp Source ~> Processing Stage ~> Sink
Each AMQP source receives messages at a rate of 1 per second. Only around 10 thousand graphs receive messages at once, so I've figured there is no need for more than 10 thousand threads running concurrently.
These are the settings I'm currently using:
my-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 16
    parallelism-factor = 2.0
    parallelism-max = 32
  }
  throughput = 20
}
Obviously these settings don't provide enough resources for the performance I want, so I wonder:
Am I correct to assume that 10 thousand threads are enough?
Is it possible to configure the dispatcher (by editing application.conf) for that number of threads? What would the configuration look like? Should I pick "fork-join-executor" or "thread-pool-executor" as the executor?
Thanks.

Akka and Akka Streams are based on async processing: an actor or stream uses a thread only for a chunk of processing and then hands the thread back to the thread pool. This is nice because you can size the thread pool according to the number of cores you have to actually execute the threads, rather than the number of things you want to execute. Having many threads has an overhead, both in scheduling/switching and in the stack of roughly 0.5-1 MB that the JVM allocates per thread.
So 10 thousand actors or running streams can still execute fine on a small thread pool. Increasing the number of threads may slow processing down rather than speed it up, because more time is spent switching between threads. Even the default settings may be fine, and you should always benchmark when tuning to check that the changes had the effect you expected.
Generally the fork-join pool gives good performance for actors and streams. The thread-pool-based one is good for use cases where you cannot avoid blocking (see this section of the docs: https://doc.akka.io/docs/akka/current/dispatchers.html#blocking-needs-careful-management).
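The core point, that many logical units of work can share a small pool as long as each one does a short non-blocking chunk and returns, can be shown outside Akka. This sketch uses plain java.util.concurrent rather than an Akka dispatcher, so treat it only as an illustration of the principle; the class name and the task and parallelism counts are arbitrary choices. It runs 10,000 short tasks on a ForkJoinPool and reports how many distinct threads actually did the work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;

public class ManyTasksFewThreads {
    /** Runs taskCount short tasks on a ForkJoinPool; returns {tasksCompleted, distinctThreadsUsed}. */
    public static int[] run(int taskCount, int parallelism) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                // Stand-in for handling one message: do a small non-blocking chunk of work,
                // then return, which hands the thread back to the pool.
                threadNames.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return new int[] {taskCount - (int) done.getCount(), threadNames.size()};
    }
}
```

All 10,000 tasks complete, yet no more than `parallelism` threads are involved. If a task blocked (on JDBC, say) instead of returning, it would hold its thread, which is why the linked docs recommend a separate thread-pool-based dispatcher for blocking work.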

Executor Service and Huge IO

I have a service which calls a database and performs a callback on each result.
ExecutorService service = Executors.newFixedThreadPool(10);
service.execute(runnable(segmentID, callback)); // the database is segmented
The Runnable does the following:
1. Call the database and collect all the rows for the segment, keeping them in memory.
2. Perform callback(segment).
Now the issue is that the database returns a huge number of rows, and my understanding is that the executor service will schedule other threads whenever the running ones are idle in I/O. So I run out of memory.
Is there any way to ensure that only 10 threads run at a time and that no additional executor scheduling happens?
For some reason I have to keep all the rows of a segment in memory.
How can I avoid going OOM while doing this? Is a newFixedThreadPool executor service the solution for this?
Please let me know if I missed anything.
Thanks

You must use a fixed thread pool. A common rule of thumb is to spawn only N threads, where N is in the same order of magnitude as the number of cores in the CPU. There's a debate on the size of N, and you can read more about it here. For a normal CPU we could be talking 4, 8 or 16 threads.
But even if you were running your program in a cluster, which I think you are not, you can't just fetch 20k rows from a DB and expect to spawn 20k threads. If you do, the performance of your app will degrade badly, because most of the CPU cycles will be consumed by context switching.
Note that a fixed pool of 10 threads never runs more than 10 tasks at once; a thread blocked on I/O does not cause the executor to start an extra task. But even with a fixed thread pool, you might run into OOM exceptions anyway if the rows fetched by all 10 tasks are held in memory at the same time. I think the only solution to this is to fetch smaller chunks of data, or to write the data to a file as it is downloaded.
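The "fetch smaller chunks" idea can be sketched as a loop that pulls rows from a source and hands them to the callback in bounded batches, so each worker holds at most `chunkSize` rows instead of a whole segment. This is a hypothetical helper, not part of the question's code: the row source is modelled as a plain `Iterator`, and the callback receives chunks rather than the full segment, which assumes the callback can be changed to accept partial data. In real code the iterator would wrap a JDBC `ResultSet`; whether the driver actually streams rows rather than buffering them all depends on the driver and on hints such as `Statement.setFetchSize`, so check your driver's documentation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedSegmentReader {
    /**
     * Pulls rows from a (hypothetical) row source and passes them to the callback in chunks of at
     * most chunkSize, so a worker never holds more than chunkSize rows at once.
     * Returns the number of chunks delivered.
     */
    public static <T> int processInChunks(Iterator<T> rows, int chunkSize, Consumer<List<T>> callback) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize) {
                callback.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize); // drop our reference so the old chunk can be collected
            }
        }
        if (!chunk.isEmpty()) { // deliver the final partial chunk
            callback.accept(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

With 10 worker threads, peak memory is then roughly 10 × `chunkSize` rows rather than 10 whole segments.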

Multiple threads waiting for nothing?

TL;DR: During a multithreaded bulk database insertion, several threads are waiting for no evident reason.
We need to create many rows in a database. To speed up insertion, we use multithreading so that multiple objects can be generated and inserted in parallel. We are using Hibernate, Spring Batch and Spring scheduling (ThreadPoolTaskExecutor, Partitioner, ItemProcessor). We started from this example.
We looked at thread states with JVisualVM and noticed that there are never more than 8 active threads at a time, whatever hardware runs the program. We tried "standard desktop" computers (dual core), and also two AIX machines: one with 8 active CPUs, one with 60 active CPUs.
Any idea why we can't have more than 8 working threads at a time?
A list of things we have already checked:
All threads have work to do (Partitioner and ThreadPoolTaskExecutor are configured so that each thread has the same amount of data to insert into the DB).
We tried various commit intervals: 1; P, where P is the size of a partition; and N, where N is the sum of all P. (This should not be the cause of the problem, but committing data seems to be the slow part of the job, while data generation is fast.)
8 is not the default value of any parameter of the objects we use.

Clarification on Thread performance processing 1000's of log files

I am extracting lines matching a pattern from log files, so I assign each log file to a Runnable object that writes the matching lines to a result file (the writer methods are properly synchronized).
Important snippet under discussion:
ExecutorService executor = Executors.newFixedThreadPool(NUM_THREAD);
for (File eachLogFile : hundredsOfLogFilesArrayObject) {
executor.execute(new RunnableSlavePatternMatcher(eachLogFile));
}
Important criteria:
The number of log files could be very small, like 20, or for some users it could exceed 1000. I recorded a series of tests in an Excel sheet, and I am really concerned about the results marked in red. 1. I assumed that if the number of threads created equals the number of files to be processed, the processing time would be lower than when there are fewer threads than files, but that didn't happen. (Please advise me if my understanding is wrong.)
Result: (spreadsheet of test timings)
I would like to identify a value for NUM_THREAD that is efficient both for a small number of files and for thousands of files.
Please suggest answers for questions 1 and 2.
Thanks!
Chandru

You have just found that your program is not CPU-bound but (likely) I/O-bound.
This means that beyond 10 threads, the OS can't keep up with the read requests of all the threads, so more threads are waiting for their next block of data at any given time.
Also, because writing the output is synchronized across all threads, that may even be the biggest bottleneck in your program (a producer-consumer design may be the answer here, to minimize the time threads spend waiting to write output).
The optimal number of threads depends on how fast you can read the files: the faster you can read, the more threads are useful.
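The producer-consumer suggestion above can be sketched like this: matcher threads put their matches on a bounded `BlockingQueue`, and a single writer thread is the only one that touches the output, so no matcher waits on a shared write lock. This is a sketch under assumptions: each inner `List<String>` stands in for one log file's lines, a plain substring match (`String.contains`) stands in for the real pattern, and an in-memory list stands in for the result file. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SingleWriterSketch {
    // Unique marker object, compared by identity, that tells the writer to stop.
    private static final List<String> END_OF_OUTPUT = new ArrayList<>();

    /** Each inner list stands in for one log file's lines; returns what the writer "wrote". */
    public static List<String> run(List<List<String>> files, String pattern, int workers)
            throws InterruptedException {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>(1_000); // bounded buffer
        List<String> written = new ArrayList<>(); // only the writer thread touches this
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    List<String> batch = queue.take();
                    if (batch == END_OF_OUTPUT) break;
                    written.addAll(batch); // stand-in for appending to the result file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (List<String> fileLines : files) {
            pool.execute(() -> {
                List<String> matches = new ArrayList<>();
                for (String line : fileLines) {
                    if (line.contains(pattern)) matches.add(line);
                }
                try {
                    if (!matches.isEmpty()) queue.put(matches); // hand off; no shared output lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        queue.put(END_OF_OUTPUT);
        writer.join(); // join() also makes the writer's additions visible to this thread
        return written;
    }
}
```

Lines from the same file stay together because each file's matches go onto the queue as one batch; the order between files is not guaranteed.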

It appears that 2 threads are enough to use all your processing power. Most likely you have two cores with hyper-threading.
Mine is an Intel i5 2.4GHz 4CPU 8GB RAM. Is this detail helpful?
Depending on the model, this has 2 cores and hyper-threading.
I assume that if the number of threads created is equal to the number of files to be processed then the processing time would be less,
This will maximise the overhead, but won't give you more cores than you already have.

When parallelizing, using many more threads than you have available CPU cores will usually increase the overall time. Your system will spend time switching from thread to thread on one CPU core instead of executing the tasks one after another.
If you have 8 CPU cores on your computer, you might see an improvement using 8, 9 or 10 threads instead of only 1, while using 20+ threads will actually be less efficient.

One problem is that I/O doesn't parallelize well, especially on a non-SSD (spinning) disk, since sequential reads (what happens when one thread reads a file) are much faster than random reads (when the read head has to jump between different files read by several threads). I would guess you could speed up the program by reading the files in the thread that submits the jobs to the executor:
for (File file : hundredsOfLogFilesArrayObject) {
    // Read sequentially in the submitting thread; Files.readAllBytes throws IOException.
    // Assumes RunnableSlavePatternMatcher is given a constructor that takes the file's bytes.
    byte[] fileContents = Files.readAllBytes(file.toPath());
    executor.execute(new RunnableSlavePatternMatcher(fileContents));
}
As for the optimal thread count, that depends.
If your app is I/O bound (which is quite possible if you're not doing extremely heavy processing of the contents), a single worker thread which can process the file contents while the original thread reads the next file will probably suffice.
If you're CPU bound, you probably don't want many more threads than you've got cores:
ExecutorService executor = Executors.newFixedThreadPool(
Runtime.getRuntime().availableProcessors());
Although, if your threads get suspended a lot (waiting for synchronization locks, or something similar), you may get better results with more threads. Or, if you've got other CPU-munching activities going on, you may want fewer threads.
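The trade-off in the last paragraph (more threads when they spend time suspended, about one per core when CPU-bound) has a well-known rule of thumb from Goetz et al., "Java Concurrency in Practice": threads ≈ CPUs × target CPU utilization × (1 + wait time / compute time). The helper below just evaluates that estimate; the class name is hypothetical, and the wait/compute ratio is something you would have to measure or guess for your workload, so treat the result as a starting point for benchmarking, not an answer.

```java
public class PoolSizing {
    /**
     * Rule-of-thumb pool size: cpus * targetUtilization * (1 + waitTime / computeTime).
     * waitTime and computeTime are per-task averages in the same (arbitrary) unit.
     */
    public static int poolSize(int cpus, double targetUtilization, double waitTime, double computeTime) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1 || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return Math.max(1, (int) Math.round(cpus * targetUtilization * (1 + waitTime / computeTime)));
    }
}
```

For purely CPU-bound tasks (wait time 0) this reduces to `Runtime.getRuntime().availableProcessors()`; a task that waits three times as long as it computes would suggest four threads per core.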

You can try using a cached thread pool.
public static ExecutorService newCachedThreadPool()
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available.
You can read more here
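One consequence of "creates new threads as needed" worth checking before adopting this: a cached pool has no upper bound, so if many submitted tasks are blocked at the same time (for example, all waiting on disk reads), it creates one thread per blocked task. The sketch below demonstrates this with tasks that block on a latch as a stand-in for I/O. It builds the `ThreadPoolExecutor` directly with the same configuration `Executors.newCachedThreadPool()` uses so it can call `getPoolSize()`; the class name is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolGrowth {
    /** Submits `tasks` jobs that all block until released; returns how many threads the pool created. */
    public static int threadsCreated(int tasks) throws InterruptedException {
        // Same settings as Executors.newCachedThreadPool(): no core threads, no upper bound,
        // 60 s idle timeout, direct hand-off queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await(); // simulates a task blocked on file I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await(); // every task is now running on its own thread
        int threads = pool.getPoolSize();
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return threads;
    }
}
```

With 1000 log files whose reads all stall at once, a cached pool would therefore behave much like "one thread per file", which the tests in the question found to be slower.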

how many threads to run in java?

I had this brilliant idea to speed up the time needed to generate 36 files: use 36 threads! Unfortunately, if I open one connection (one j2ssh connection object) with 36 threads/sessions, everything lags far more than if I execute one thread at a time.
Now if I create 36 new connections (36 j2ssh connection objects) so that each thread has a separate connection to the server, I get an out-of-memory exception (yet somehow the program still runs and successfully finishes its work, more slowly than when I execute one thread after another).
So what should I do? How do I find the optimal number of threads to use?
I ask because Thread.activeCount() is 3 before I start my 36 threads. I'm using a Lenovo laptop with an Intel Core i5.

You could narrow it down to a more reasonable number of threads with an ExecutorService. You probably want to use something near the number of available processor cores, e.g.:
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(threads);
for (int i = 0; i < 36; i++) {
service.execute(new Runnable() {
public void run() {
// do what you need per file here
}
});
}
service.shutdown();

A good practice is to spawn as many threads as there are cores in your processor. I normally use an Executors.newFixedThreadPool(numOfCores) executor service and keep feeding it jobs from my job queue. Simple. :-)

Your Intel i5 has two cores; hyper-threading makes them look like four. So you get at most four hardware threads of parallelism (and two hyper-threads sharing a core are slower than two real cores); the rest of your threads are time-sliced.
Assume 1 MB of RAM per thread just for thread creation, then add the memory that each thread needs to process its file. That will give you an idea of why you're getting out-of-memory errors. How big are the files you're dealing with? If they're very large, holding them all in memory at the same time will be a problem.
I'll assume that the server receiving the files can accept multiple connections, so there's value in trying this.
I'd benchmark with 1 thread and then increase them until I found that the performance curve was flattening out.
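The "start at 1 thread and increase until the curve flattens" approach can be scripted with a small timing loop. This is a crude sketch: the class name is hypothetical, `job` stands in for generating and uploading one file, there is no JVM warm-up and each size is timed only once, so for trustworthy numbers a harness such as JMH is the better tool. It doubles the pool size each round and records the wall-clock time for the whole batch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadCountBenchmark {
    /** Runs `jobs` copies of `job` on pools of 1, 2, 4, ... up to maxThreads threads; returns elapsed ms per size. */
    public static Map<Integer, Long> measure(Runnable job, int jobs, int maxThreads) throws InterruptedException {
        Map<Integer, Long> elapsedMs = new LinkedHashMap<>();
        for (int threads = 1; threads <= maxThreads; threads *= 2) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            for (int i = 0; i < jobs; i++) {
                pool.execute(job);
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for the whole batch to finish
            elapsedMs.put(threads, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
        return elapsedMs;
    }
}
```

Calling `measure(job, 36, 16)` and printing the map shows where adding threads stops paying off; stop increasing once successive timings stop improving.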

Brute force: profile incrementally. Increase the number of threads gradually and check the performance. As the number of connections is just 36, it should be easy.

You need to understand that if you create 36 threads, you still have only one or two processors, and they would spend most of their time switching between threads.
I would suggest increasing the number of threads a little, say to 6, and watching the behavior. Then go from there.

One way to tune the number of threads to the size of the machine is to use:
int processors = Runtime.getRuntime().availableProcessors();
int threads = processors * N; // N could be 1, 2 or more, depending on what you are doing
ExecutorService es = Executors.newFixedThreadPool(threads);

First you have to find out where the bottleneck is.
If it is the SSH connection, it usually does not help to open multiple connections in parallel. Better use multiple channels on one connection, if needed.
If it is the disk I/O, creating multiple threads writing (or reading) only helps if they access different disks (which is seldom the case). But you could have another thread doing CPU-bound work while one thread is waiting on disk I/O.
If it is the CPU and you have enough idle cores, more threads can help, especially if they don't need to access common data. But still, more threads than cores (plus some threads doing I/O) does not help. (Also keep in mind that there are usually other processes running on your server too.)

Using more threads than the number of cores on your machine will only slow down the whole process. It speeds up until you reach that number.

Be sure you don't create more threads than you have processing units, or you are likely to create more overhead from context switching than you gain in concurrency. Also remember that you have only one HDD and one HDD controller; as a result, I doubt multithreading is going to help you at all here.
