In a life without Java Executors, new threads would have to be created for each Runnable tasks. Making new threads requires thread overhead (creation and teardown) that adds complexity and wasted time to a non-Executor program.
Referring to code:
no Java Executor -
new Thread (aRunnableObject).start ();
with Java Executor -
Executor executor = some Executor factory method;
exector.execute (aRunnable);
Bottom line is that Executors abstract the low-level details of how to manage threads.
Is that true?
Thanks.
Bottom line is that Executors abstract the low-level details of how to manage threads. Is that true?
Yes.
They deal with issues such as creating the thread objects, maintaining a pool of threads, controlling the number of threads are running, and graceful / less that graceful shutdown. Doing these things by hand is non-trivial.
EDIT
There may or may not be a performance hit in doing this ... compared with a custom implementation perfectly tuned to the precise needs of your application. But the chances are that:
your custom implementation wouldn't be perfectly tuned, and
the performance difference wouldn't be significant anyway.
Besides, the Executor support classes allow you to simply tune various parameters (e.g. thread pool sizes) if there is an issue that needs to be addressed. I don't see how garbage collection overheads would be significantly be impacted by using Executors, one way or the other.
As a general rule, you should focus on writing your applications simply and robustly (e.g. using the high level concurrency support classes), and only worry about performance if:
your application is running "too slow", and
the profiling tools tell you that you've got a problem in a particular area.
Couple of benefits of executors as against normal threads.
Throttling can be achieved easily by varying the size of ThreadPools. This helps keeping control/check on the number of threads flowing through your application. Particularly helpful when benchmarking your application for load bearing.
Better management of Runnable tasks can be achieved using the RejectionHandlers.
I think all that executors do is that they will do the low level tasks
for you, but you still have to judiciously decide which thread pool do
you want. I mean if your use case needs maximum 5 threads and you go
and use thread pool having 100 threads, then certainly it is going to
have impact on performance. Other than this there is noting extra
being done at low level which is going to halt the system. And last of
all, it is always better to get an idea what is being done at low
level so that it will give us fair idea about the underground things.
Related
I was reading on threading and learned about fork/join API.
I found that you can either run threads with the commonPool being the default pool managing the threads, or I can submit the threads to a newly created ForkJoinPool.
The difference between the two is as follows, to my understanding:
The commonPool is the main pool created statically (where some pool methods don't work as they normally do with other pools like shutting it down), and is used mainly for the application to run.
The number of parallelism in the default/commonPool is the number of cores - 1, where the default number of parallelism of a newly created pool = number of cores (or the number specified by system property parallelism - I'm ignoring the fully qualified system property key name -).
Based on the documentation, the commonPool is fine for most uses.
This all boils down to my question:
When should I use the common pool? And why so? When should I create a new pool? And why so?
Short Story
The answer, like most things in software engineering, is: "It depends".
Pros of using the common pool
If you look at this wonderful article:
According to Oracle’s documentation, using the predefined common pool reduces resource consumption, since this discourages the creation of a separate thread pool per task.
and
Using the fork/join framework can speed up processing of large tasks,
but to achieve this outcome, some guidelines should be followed:
Use as few thread pools as possible – in most cases, the best decision
is to use one thread pool per application or system
Use the default
common thread pool, if no specific tuning is needed
Use a reasonable threshold for splitting ForkJoingTask into subtasks
Avoid any blocking in your ForkJoingTasks
Pros of using dedicated pools
However, there are also some arguments AGAINST following this approach:
Dedicated Pool for Complex Applications
Having a dedicated pool per logical working unit in a complex application is sometimes the preferred approach. Imagine an application that:
Takes in a lot of events and groups them (that can be done in parallel)
Then workers do the work (that can be done in parallel as well)
Finally, some cleanup workers do some cleanup (that can be done in parallel as well).
So your application has 3 logical work groups each of which might have its own demands for parallelism. (Keep in mind that this pool has parallelism set to something fairly low on most machines)
Better to not step on each other's toes, right? Note that this can scale up to a certain level, where it's recommended to have a separate microservice for each of these work units, but if for one reason or another you are not there already, then a dedicated forkJoinPool per logical work unit is not a bad idea.
Other libraries
If your app's code has only one place where you want parallelism, you don't have a guarantee that some developer wouldn't pull some 3-rd party dependency which also relies on the common ForkJoinPool, and you still have two places where this pool is in demand. That might be okay for your use case, and it might not be, especially if your default pool's parallelism is 4 or below.
Imagine the situation when your app critical code (e.g event handling or saving data to a database) is having to compete for the common pool with some library which exports logs in parallel to some log sink.
Dedicated ForkJoinPool Makes Logging Neater
Additionally, the common forkJoinPool has a rather non-descriptive naming so if you are debugging or looking at logs, chances are you will have to sift through a ton of
ForkJoinPool.commonPool-worker-xx
In the situation described above, compare that with:
ForkJoinPool.grouping-worker-xx
ForkJoinPool.payload-handler-worker-xx
ForkJoinPool.cleanup-worker
Therefore you can see there is some benefit in logging cleanness when using a dedicated ForkJoinPool per logical work group.
TL;DR
Using the common ForkJoinPool has lower memory impact, less resources and thread creation and lower garbage collection demands. However, this approach might be insufficient for some use cases, as pointed above.
Using a dedicated ForkJoinPool per logical work unit in your application provides neater logging, is not a bad idea to use when you have low parallelism level (i.e not many cores), and when you want to avoid thread contention between logically different parts of your application. This, however, comes at a price of higher cpu utilization, higher memory overhead, and more thread creation.
I'm designing a class that provides statistical information about groups of Collatz sequences. One of my goals is to be able to process a large number of sequences containing enormous terms (on the scale of hundreds or even thousands of digits) simultaneously, with maximum efficiency.
To this end, I plan on using the best data collection technique for each individual statistic, which means some tasks may be more efficiently dealt with by a ForkJoinPool, others by the standard cached and fixed thread pools provided in Executors. Would the overhead of creating multiple thread pools, or shutting one down and creating another, if I went that route, cost me more than I would save?
Would the overhead of creating multiple thread pools, or shutting one down and creating another, if I went that route, cost me more than I would save?
How could we possibly tell you that?
There is definitely an overhead in shutting down and restarting a thread pool. If any kind. Creating threads is not cheap.
However, we have no way of quantifying how much you save by using different kinds of thread pool. If we can't quantify that it is impossible to advise you on whether your strategy will work ... or not.
(But I think that repeatedly shutting down and recreating thread pools would be a bad idea. The performance impact of an idle pool is minimal.)
This "smells" of premature optimization. (It is like trying to tune the engine of a racing car before you have manufactured the engine block!)
My advice would be to (largely1) forget about performance to start with. For now, focus on getting something that works. Here's what I would do:
Implement the code using the easiest strategy, write test cases, test / debug until it works.
Choose a sample problem or set of problems that is typical of the kind you will be trying to solve
Implement a test harness that allows you to measure the code's performance for the sample problems. (Beware of the standard problems with Java benchmarking ...)
Benchmark your code.
Is it fast enough? Stop NOW.
If not, continue.
Implement one of the alternative strategies, and test / debug.
Benchmark the modified code.
Is it fast enough? Stop NOW.
Is it clear that it doesn't help?. Abandon it, and try another strategy.
Can you tweak it? If so, try that.
Go to 5.
Also, it may be worthwhile implementing the different strategies in such a way that you can tune them or switch between them using command line or config file settings.
As a general rule, it is hard to determine a priori how well any complicated algorithm or strategy is going to perform. Generally speaking, there are too many factors to take into account for a theoretical ... or intuitive ... approach to give a reliable prediction. Benchmarking and tuning is the way to go.
1 - Obviously, if you know that some technique or algorithm will perform badly, and you have a better alternative that is about the same effort to implement ... do the sensible thing.
Since you are only talking about two different types of pools (fork-join and Executor based pools), and you claim that at least some of your tasks are more suited to one type or pool or the other, it is entirely likely that the overhead of using two types of pools is worth it.
After all, you can just keep both types of pools alive and so there is only a one time cost to setting up the pools and creating the threads, while the (apparent) benefit of the two pool types will apply across the entirety of your processing. Since you are doing an "enormous" amount of work even small benefits will eventually add up and overwhelm the one-time costs (which are probably measured in micro-architecture per thread).
Key to this observation is that there is no real ongoing overhead for existing but inactive threads in the pool you aren't using.
Of course, that said, the short answer it "just try both approaches and measure it!".
I need to update 550 000 records in a table with the JBOSS Server is Starting up. I need to make this update as a backgroundt process with multiple threads and parallel processing. Application is Spring, so I can use initializing bean for this.
To perform the parallal processing I am planning to use Java executor framework.
ThreadPoolExecutor executor=(ThreadPoolExecutor)Executors.newFixedThreadPool(50); G
How to decide the thread pool count?
I think this is depends on hardware My hardware. it is 16 GB Ram and Co-i 3 processor.
Is it a good practice to Thread.sleep(20);while processing this big update as background.
I don't know much about Spring processing specifically, but your questions seem general enough that I can still provide a possibly inadequate answer.
Generally there's a lot of factors that go into how many threads you want. You definitely don't want multiple threads on a core, as that'll slow things down as threads start contending for CPU time instead of working, so probably your core count would be your ceiling, or maybe core count - 1 to allow one core for all other tasks to run on (so in your case maybe 3 or 4 cores, tops, if I remember core counts for i3 processors right). However, in this case I'd guess you're more likely to run into I/O and/or memory/cache bottlenecks, since when those are involved, those are more likely to slow down your program than insufficient parallelization. In addition, the tasks that your threads are doing would affect the number of threads you can use; if you have one thread to pull data in and one thread to dump data back out after processing, it might be possible for those threads to share a core.
I'm not sure why this would be a good idea... What use do you see for Thread.sleep() while processing? I'd guess it'd actually slow down your processing, because all you're doing is putting threads to sleep when they could be working.
In any case, I'd be wary of parallelizing what is likely to be an I/O bound task. You'll definitely need to profile to see where your bottlenecks are, even before you start parallelizing, to make sure that multiple cores will actually help you.
If it is the CPU that is adding extra time to complete your task, then you can start parallelizing. Even then, be careful about cache issues; try to make sure each thread works on a totally separate chunk of data (e.g. through ThreadLocal) so cache/memory issues don't limit any performance increases. One way this could work is by having a reader thread dump data into a Queue which the worker threads can then read into a ThreadLocal structure, process, etc.
I hope this helped. I'll keep updating as the mistakes I certainly made are pointed out.
I am using ThreadPoolexecutor by replacing it with legacy Thread.
I have created executor as below:
pool = new ThreadPoolExecutor(coreSize, size, 0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(coreSize),
new CustomThreadFactory(name),
new CustomRejectionExecutionHandler());
pool.prestartAllCoreThreads();
Here core size is maxpoolsize/5. I have pre-started all the core threads on start up of application roughly around 160 threads.
In legacy design we were creating and starting around 670 threads.
But the point is even after using Executor and creating and replacing legacy design we are not getting much better results.
For results memory management we are using top command to see memory usage.
For time we have placed loggers of System.currentTime in millis to check the usage.
Please tell how to optimize this design. Thanks.
But the point is even after using Executor and creating and replacing legacy design we are not getting much better results.
I am assuming that you are looking at the overall throughput from your application and you are not seeing a better performance as opposed to running each task in its own thread -- i.e. not with a pool?
This sounds like you were not being blocked because of context switching. Maybe your application is IO bound or otherwise waiting on some other system resource. 670 threads sounds like a lot and you would have been using a lot of thread stack memory but otherwise it may not have been holding back the performance of your application.
Typically we use the ExecutorService classes not necessarily because they are faster than raw threads but because the code is easier to manage. The concurrent classes take care of a lot of locking, queueing, etc. out of your hands.
Couple code comments:
I'm not sure you want the LinkedBlockingQueue to be limited by core-size. Those are two different numbers. core-size is the minimum number of threads in the pool. The size of the BlockingQueue is how many jobs can be queued up waiting for a free thread.
As an aside, the ThreadPoolExecutor will never allocate a thread past the core thread number, unless the BlockingQueue is full. In your case, if all of the core-threads are busy and the queue is full with the core-size number of queued tasks is when the next thread is forked.
I've never had to use pool.prestartAllCoreThreads();. The core threads will be started once tasks are submitted to the pool so I don't think it buys you much -- at least not with a long running application.
For time we have placed loggers of System.currentTime in millis to check the usage.
Be careful on this. Too many loggers could affect performance of your application more than re-architecting it. But I assume you added the loggers after you didn't see a performance improvement.
The executor merely wraps the creation/usage of Threads, so it's not doing anything magical.
It sounds like you have a bottleneck elsewhere. Are you locking on a single object ? Do you have a single single-threaded resource that every thread hits ? In such a case you wouldn't see any change in behaviour.
Is your process CPU-bound ? If so your threads should (very roughly speaking) match the number of processing cores available. Note that each thread you create consumes memory for its stack, and if you're memory bound, then creating multiple threads won't help here.
I've created an object of arrays with a size of 1000, they are all threaded so that means 1000 threads are added. Each object holds a socket and 9 more global variables. The whole object consists of 1000 lines of code.
I'm looking for ways to make the program efficient because it lags. CPU use is at 100% everytime I start the program.
I understand that I'm going to have to change the way the program works, but I can't find a good way. Can anyone explain how to achieve this?
It depends on what your threads actually do - are the tasks primarily using CPU or other resources? For CPU intensive tasks, the best strategy is to run as many threads as you have cores, or a few more. For threads which are blocking a lot on e.g. reading files, waiting for the net etc. you can have many more threads than CPUs.
It also depends on how many cores the system has. Obviously the answer is very different for a single processor machine than for a 128-way multiprocessor. The above rules of thumb can give you some estimates, but it is best to make experiments yourself based on these, to figure out the ideal number of threads for your specific setup.
Moreover, since Java5, it is always advisable to use e.g. a ThreadPoolExecutor instead of creating your threads manually. This makes your app both more robust and more flexible.
1/ use thread pool
2/ use futures
You should consider refactor you usage of threads.
1000 Threads normally makes no sense on a normal machine/server although your problem seems to be I/O-heavy. You should consider the number of cpu-threads that are available.
A possible solution would be to use a dispatcher that passes the handling (and possible responding) to a request on the socket into a queue of a ThreadPoolExecutor.
From my experience, 1000 threads are just too many (at least on 8core/8GB RAM machines). A common symptom is context switching slashing, where your OS is just busy jumping from thread to thread while doing little useful work (and a lot of memory is wasted etc.).
If you have to maintain 1000 sockets, you probably have to go for NIO. Easier way out would be closing/opening sockets every time (whether you can do this dependents on the characteristics of your work.).
The way you solve this many thread problem is to use a thread pool, as others note. Instead of extending Thread, code a Runnable instead. This is easier said than done though because you have to maintain state if you need conversation. This commonly involves a ConcurrentMap. I personally tend to put a Handler (which implements Runnable) on this map that should run when the counter party returns a response (the response contains a key everytime). In this case you'd be closing the socket every time. If you use NIO, it's more like coding with Threads in the sense you don't need to identify the counterparty like this, but it has its own complexity.