I'm in the middle of a problem where I am unable to decide which solution to take.
The problem is a bit unique. Let's put it this way: I am receiving data from the network continuously (2 to 4 times per second). Each piece of data belongs to a different, let's say, group.
Let's call these groups group1, group2 and so on.
Each group has a dedicated job queue: data from the network is filtered and added to the queue of its corresponding group for processing.
At first I created a dedicated thread per group, which would take data from the job queue, process it and then go into a blocking state (using a LinkedBlockingQueue).
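For reference, the dedicated-thread-per-group design could look roughly like this minimal sketch, assuming String payloads. `GroupWorker`, the `STOP` marker and the placeholder processing step are hypothetical names for illustration; `LinkedBlockingQueue.take()` is the real call that parks the thread while the queue is empty.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one long-lived worker thread per group.
class GroupWorker implements Runnable {
    // Hypothetical shutdown marker ("poison pill"), compared by identity.
    static final String STOP = new String("STOP");

    private final BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
    // Records processed items so their order can be inspected (demo only).
    final List<String> processed = new CopyOnWriteArrayList<>();

    // Called by the network/filtering code to hand data to this group.
    void submit(String data) {
        jobs.add(data);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String data = jobs.take(); // blocks while the queue is empty
                if (data == STOP) {
                    return;
                }
                processed.add(data); // placeholder for the real processing step
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and let the thread end
        }
    }

    // Demo: feed three jobs to one group and wait for its worker to finish.
    static List<String> demo() {
        GroupWorker worker = new GroupWorker();
        Thread thread = new Thread(worker, "group1-worker");
        thread.start();
        worker.submit("job1");
        worker.submit("job2");
        worker.submit("job3");
        worker.submit(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.processed;
    }
}
```

Because exactly one thread drains each group's queue, jobs within a group run strictly in submission order.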
But my senior suggested that I should use thread pools, because this way threads won't get blocked and will be usable by other groups for processing.
But here is the thing: the data I'm getting arrives fast enough, and processing takes long enough, that the thread will possibly never go into blocking mode. A dedicated thread also guarantees that data gets processed sequentially (job 1 gets done before job 2), which with pooling, though the chances are slim, might not happen.
My senior is also bent on the fact that pooling will save us lots of memory because threads are POOLED (I think he really went for the word ;) ). I don't agree, because I personally think that, pooled or not, each thread gets its own stack memory. Unless there is something in thread pools which I am not aware of.
One last thing: I always thought that pooling helps where jobs appear in big numbers for a short time. This makes sense, because thread spawning would be a performance killer: the time taken to initialize a thread is a lot more than the time spent doing the job. So pooling helps a lot here.
But in my case group1, group2, ..., groupN always remain alive; they exist whether or not there is data. So thread spawning is not the issue here.
My senior is not convinced and wants me to go with the pooling solution because of its better memory footprint.
So, which path to take?
Thank you.
Good question.
Pooling indeed saves you initialization time, as you said. But it has another aspect: resource management. So let me ask you this: just how many groups (read: dedicated threads) do you have?
Do they grow dynamically during the lifetime of the application?
For example, consider a situation where the answer to this question is yes: new group types are added dynamically. In this case, you might not want to dedicate a thread to each one. Since there is technically no limit on the number of groups that will be created, you will create a lot of threads and the system will spend its time context switching instead of doing real work.
Thread pooling to the rescue: a thread pool lets you cap the maximum number of threads that can be created, regardless of load. The application may then deny service to certain requests, but the ones that get through are handled properly, without critically depleting system resources.
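A minimal sketch of "cap the threads and deny service when saturated" using the standard `ThreadPoolExecutor` with a bounded `ArrayBlockingQueue` and the built-in `AbortPolicy`. `BoundedPoolDemo` is a hypothetical name, and the latch exists only to keep the workers busy so the pool saturates deterministically.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Returns how many of `tasks` submissions were rejected by a pool capped at
    // `maxThreads` threads with room for `queueCapacity` waiting tasks.
    static int countRejected(int maxThreads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,                 // hard upper bound on threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());  // throw when saturated
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep workers busy so the pool fills up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // service denied instead of creating more threads
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }
}
```

With 4 threads and a queue of 10, submitting 20 tasks accepts 14 (4 running, 10 queued) and rejects the remaining 6.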
Considering the above, it is very possible that in your case it is perfectly OK to have a dedicated thread for each group!
The same goes for your senior's conviction that pooling will save memory. Indeed, a thread takes up memory (mostly for its stack), but is it really that much if the number of threads is a predefined amount, say 5? Even 10 is probably OK. In any case, you should not use pooling unless you are a priori and absolutely convinced that you actually have a problem!
Pooling is a design decision, not an architectural one. You can skip pooling at the beginning and introduce it later as an optimization if, after encountering a performance issue, you find it to be beneficial.
As for the serialization of requests (in-order execution), it does not matter whether you are using a thread pool or a dedicated thread. Sequential execution is a property of the queue coupled with a single handler thread.
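To illustrate that ordering comes from the queue plus a single active handler rather than from owning a thread, here is a sketch adapted from the `SerialExecutor` example in the `java.util.concurrent.Executor` Javadoc: each group gets its own serial wrapper, but all groups share one pool. `SerialDemo`, the pool size of 4 and the 50 jobs are arbitrary choices for illustration.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Runs tasks one at a time, in submission order, on top of a shared pool.
class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor pool;
    private Runnable active; // task currently handed to the pool, guarded by `this`

    SerialExecutor(Executor pool) {
        this.pool = pool;
    }

    @Override
    public synchronized void execute(Runnable r) {
        tasks.add(() -> {
            try {
                r.run();
            } finally {
                scheduleNext(); // the next task is released only after this one ends
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        if ((active = tasks.poll()) != null) {
            pool.execute(active);
        }
    }
}

class SerialDemo {
    // 50 jobs for one group on a 4-thread shared pool; returns them in completion order.
    static List<String> runGroup() {
        ExecutorService shared = Executors.newFixedThreadPool(4);
        SerialExecutor group1 = new SerialExecutor(shared);
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(50);
        for (int i = 0; i < 50; i++) {
            int job = i;
            group1.execute(() -> {
                seen.add("job" + job);
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        shared.shutdown();
        return seen;
    }
}
```

Even though four pool threads are available, group1's jobs complete as job0, job1, ..., job49, while other groups' serial wrappers could use the remaining threads.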
Creating a thread will consume resources, including the default stack per thread (IIRC 512 KB, but configurable). So the advantage of pooling is that you incur a bounded resource hit. Of course, you need to size your pool according to the work that you have to perform.
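If stack memory is the concern, the stack size can be tuned either globally with the JVM's `-Xss` option or per thread through the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor, which a `ThreadFactory` can apply to a whole pool. A sketch; the 256 KB value and the `SmallStackThreadFactory` name are arbitrary, and the Javadoc notes that the stack-size argument is only a hint that some platforms ignore.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// ThreadFactory that asks the JVM for a smaller stack for every pool thread.
class SmallStackThreadFactory implements ThreadFactory {
    private final long stackSizeBytes;
    private final AtomicInteger counter = new AtomicInteger();

    SmallStackThreadFactory(long stackSizeBytes) {
        this.stackSizeBytes = stackSizeBytes;
    }

    @Override
    public Thread newThread(Runnable r) {
        // The last argument is the requested stack size; the JVM may ignore it.
        return new Thread(null, r, "worker-" + counter.incrementAndGet(), stackSizeBytes);
    }

    // Demo: a 2-thread pool whose threads request 256 KB stacks runs one task.
    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2, new SmallStackThreadFactory(256 * 1024));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            Future<String> name = pool.submit(task);
            return name.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```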
For your particular problem, I think the key is to actually measure performance, thread usage etc. in each scenario. Unless you're running into constraints, I perhaps wouldn't worry either way, other than to make sure that you can swap one implementation for another without a major impact on your application. Remember that premature optimisation is the root of all evil. Note that:
"Premature optimization" is a phrase used to describe a situation
where a programmer lets performance considerations affect the design
of a piece of code. This can result in a design that is not as clean
as it could have been or code that is incorrect, because the code is
complicated by the optimization and the programmer is distracted by
optimizing.
Related
Given: we have an application that is heavily polluted with concurrency constructs. Multiple techniques are used (different people worked without a clear architecture in mind), and there are multiple questionable locks that are there "just in case", plus thread-safe queues. CPU usage is around 20%.
Now my goal is to optimize it so that it makes better use of caches, and generally to improve its performance and service time.
I'm considering pinning the parent process to a single core, removing everything that causes memory barriers (membars), replacing all thread-safe data structures, and replacing all locks with some UnsafeReentrantLock, which would simply use a normal reference field but still take care of the exclusive-execution needs...
I expect that we would end up with a much more cache-friendly application, since we wouldn't have rapid cache flushes all the time (no membars). We would have less overhead, since we wouldn't need thread-safe data structures, volatiles or atomics, and all sorts of locks would be replaced. I would assume that service time would also improve, since we would no longer synchronize on multiple thread-safe queues...
Is there something that I'm overlooking here?
Maybe blocking operations would need attention, since they would not show up in that 20% usage?
Subject:
I’m trying to implement basic job scheduling in Java to handle recurrent, persisted scheduled tasks (for a personal learning project). I don’t want to use any ready-to-use libraries like Quartz/Obsidian/Cron4J/etc.
Objective:
Jobs have to be persistent (to handle server shutdown)
Job execution time can be up to ~2-5 min.
Manage a large number of jobs
Multithreaded
Light and fast ;)
All my jobs are in a MySQL database.
JOB_TABLE (id, name, nextExecution,lastExecution, status(IDLE,PENDING,RUNNING))
Step by step:
Retrieve each job from “JOB_TABLE” where “nextExecution > now” AND “status = IDLE”. This step is executed every 10 min by a single thread.
For each job retrieved, I put a new thread in a ThreadPoolExecutor then I update the job status to “PENDING” in my “JOB_TABLE”.
When the job thread is running, I update the job status to “RUNNING”.
When the job is finished, I update the lastExecution with current time, I set a new nextExecution time and I change the job status to “IDLE”.
When server is starting, I put each PENDING/RUNNING job in the ThreadPoolExecutor.
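The five steps above can be sketched in plain Java, with an in-memory map standing in for JOB_TABLE. This is a hypothetical illustration, not a drop-in implementation: `JobScheduler`, `Job` and the pool size of 8 are made-up names and values, a real version would issue SQL UPDATEs instead of setting fields, and it assumes "due" means that nextExecution has already passed. Note that a fixed pool means 20000 jobs do not turn into 20000 threads; extra jobs simply wait in the executor's queue.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for JOB_TABLE; a real version would issue SQL UPDATEs.
class JobScheduler {
    enum Status { IDLE, PENDING, RUNNING }

    static final class Job {
        final long id;
        volatile Status status = Status.IDLE;
        volatile Instant nextExecution;
        volatile Instant lastExecution;

        Job(long id, Instant nextExecution) {
            this.id = id;
            this.nextExecution = nextExecution;
        }
    }

    private final Map<Long, Job> table = new ConcurrentHashMap<>();
    // A bounded set of worker threads; extra due jobs wait in the executor's queue.
    private final ExecutorService workers = Executors.newFixedThreadPool(8);
    private final Duration interval;

    JobScheduler(Duration interval) {
        this.interval = interval;
    }

    void add(Job job) {
        table.put(job.id, job);
    }

    // Steps 1-2 (single polling thread): submit due IDLE jobs and mark them PENDING.
    void poll(Instant now) {
        for (Job job : table.values()) {
            if (job.status == Status.IDLE && !job.nextExecution.isAfter(now)) {
                job.status = Status.PENDING;
                workers.execute(() -> run(job));
            }
        }
    }

    // Step 5: on server start, resubmit jobs left PENDING or RUNNING by a shutdown.
    void recoverOnStartup() {
        for (Job job : table.values()) {
            if (job.status == Status.PENDING || job.status == Status.RUNNING) {
                job.status = Status.PENDING;
                workers.execute(() -> run(job));
            }
        }
    }

    // Steps 3-4: mark RUNNING, do the work, then record the times and return to IDLE.
    private void run(Job job) {
        job.status = Status.RUNNING;
        // ... the actual 2-5 minute work would go here ...
        Instant finished = Instant.now();
        job.lastExecution = finished;
        job.nextExecution = finished.plus(interval);
        job.status = Status.IDLE; // written last, so the poller never sees stale times
    }

    // Demo: one job that is due and one that is not; only the due job should run.
    static boolean demo() {
        JobScheduler scheduler = new JobScheduler(Duration.ofMinutes(10));
        Job due = new Job(1, Instant.EPOCH);
        Job later = new Job(2, Instant.now().plus(Duration.ofDays(1)));
        scheduler.add(due);
        scheduler.add(later);
        scheduler.poll(Instant.now());
        scheduler.workers.shutdown();
        try {
            scheduler.workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return due.lastExecution != null && due.status == Status.IDLE
                && later.lastExecution == null && later.status == Status.IDLE;
    }
}
```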
Question/Observation:
Step 2: Will the ThreadPoolExecutor handle a large number of threads (~20000)?
Should I use a NoSQL solution instead of MySQL?
Is this the best solution for such a use case?
This is a draft; there is no code behind it yet. I’m open to suggestions, comments and criticism!
I have done something similar to your task on a real project, but in .NET. Here is what I can recall regarding your questions:
Step 2: Will the ThreadPoolExecutor handle a large number of threads (~20000)?
We discovered that using .NET's built-in thread pool was the worst approach, as the project was a web application. Reason: the web application relies on the built-in thread pool (which is static and thus shared by all users within the running process) to run each request in a separate thread while maintaining effective recycling of threads. Employing the same thread pool for our internal processing would have exhausted it and left no free threads for user requests, or spoiled their performance, which was unacceptable.
As you seem to be running quite a lot of jobs (20k is a lot for a single machine), you definitely should look for a custom thread pool. No need to write your own, though; I bet there are ready solutions, and writing one is far beyond what your study project would require (see the comments; if I understand correctly, you are doing a school or university project).
Should I use a NoSQL solution instead of MySQL?
Depends. You obviously need to update the job status concurrently, so you will have simultaneous access to a single table from multiple threads. Databases can scale pretty well to that, assuming you do it right. Here is what I mean by doing it right:
Design your code so that each job affects only its own subset of rows in the database (this includes other tables). If you are able to do so, you will not need any explicit locks at the database level (in the form of transaction isolation levels). You can even use a liberal isolation level that allows dirty or phantom reads, which will perform faster. But beware: you must carefully ensure that no two jobs contend for the same rows. This is hard to achieve in real-life projects, so you should probably look into alternative approaches to DB locking.
Use an appropriate transaction isolation level. The isolation level defines the locking behavior at the database level. You can set it to lock the entire table, only the rows you affect, or nothing at all. Use it wisely, as any misuse could affect data consistency, integrity and the stability of the entire application or DB server.
I am not familiar with NoSQL databases, so I can only advise you to research their concurrency capabilities and map them to your scenario. You could end up with a really suitable solution, but you have to check it against your needs. From your description, you will have to support simultaneous data operations over the same type of objects (the analog of a table).
Is it the best solution for such a use case?
Yes and No.
Yes, as you will encounter one of the difficult tasks developers face in the real world. I have worked with colleagues having more than three times my experience, and they were more reluctant than me to do multi-threading tasks; they really hated them. If you find this area interesting, play with it, learn and improve as much as you need to.
No, because if you are working on a real-life project, you need something reliable. If you have this many questions, you will obviously need time to mature and be able to produce a stable solution for such a task. Multi-threading is a difficult topic for many reasons:
It is hard to debug
It introduces many points of failure, you need to be aware of all of them
It can be a pain for other developers to assist with or work on your code, unless you stick to commonly accepted rules.
Error handling can be tricky
Behavior is unpredictable / non-deterministic.
There are existing solutions with a high level of maturity and reliability that are the preferred approach for real projects. The drawback is that you will have to learn them and examine how customizable they are for your needs.
Anyway, if you need to do it your way and then port your work to a real project, or a project of your own, I advise you to do it in a pluggable way. Use abstraction, program to interfaces and follow other practices that decouple your specific implementation from the logic that sets up the scheduled jobs. That way, you can adapt your API to an existing solution if this becomes a problem.
And last but not least, I did not see any error-handling provisions on your side. Think about and research what to do if a job fails. At least add a 'FAILED' status or something similar to persist in such a case. Error handling is tricky when it comes to threads, so be thorough in your research and practices.
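One reason error handling is tricky with pools: an exception thrown by a task passed to `ExecutorService.submit` is captured in the returned `Future` and goes unnoticed unless someone calls `get()`. A minimal sketch of wrapping each job so that a failure becomes a FAILED status; `FailureAwareJob` and its `Status` enum are hypothetical, and a real scheduler would persist the status to JOB_TABLE.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper that turns a failing job into a FAILED status.
class FailureAwareJob implements Runnable {
    enum Status { IDLE, PENDING, RUNNING, FAILED }

    private final Runnable work;
    volatile Status status = Status.PENDING;
    volatile Throwable lastError;

    FailureAwareJob(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        status = Status.RUNNING;
        try {
            work.run();
            status = Status.IDLE;
        } catch (RuntimeException e) {
            lastError = e;          // a real scheduler would also log this
            status = Status.FAILED; // and write FAILED to JOB_TABLE so it survives a restart
        }
    }

    // Demo: a job that throws ends up FAILED instead of vanishing silently.
    static Status demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FailureAwareJob job = new FailureAwareJob(() -> {
            throw new IllegalStateException("boom");
        });
        try {
            Future<?> result = pool.submit(job);
            result.get(); // completes normally because the wrapper caught the exception
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return job.status;
    }
}
```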
Good luck
You can declare the maximum pool size with ThreadPoolExecutor#setMaximumPoolSize(int). As Integer.MAX_VALUE is larger than 20000, technically, yes, it can.
The other question is whether your machine would support running so many threads. You will have to provide enough RAM, since each thread allocates its own stack.
There should not be a problem addressing ~20,000 threads on a modern desktop or laptop, but on a mobile device it could be an issue.
From doc:
Core and maximum pool sizes
A ThreadPoolExecutor will automatically
adjust the pool size (see getPoolSize()) according to the bounds set
by corePoolSize (see getCorePoolSize()) and maximumPoolSize (see
getMaximumPoolSize()). When a new task is submitted in method
execute(java.lang.Runnable), and fewer than corePoolSize threads are
running, a new thread is created to handle the request, even if other
worker threads are idle. If there are more than corePoolSize but less
than maximumPoolSize threads running, a new thread will be created
only if the queue is full. By setting corePoolSize and maximumPoolSize
the same, you create a fixed-size thread pool. By setting
maximumPoolSize to an essentially unbounded value such as
Integer.MAX_VALUE, you allow the pool to accommodate an arbitrary
number of concurrent tasks. Most typically, core and maximum pool
sizes are set only upon construction, but they may also be changed
dynamically using setCorePoolSize(int) and setMaximumPoolSize(int).
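One consequence of the rule quoted above is easy to miss: with an unbounded queue the queue is never "full", so the pool never grows past corePoolSize no matter how large maximumPoolSize is, while a `SynchronousQueue` (direct handoff) makes every busy submission start a new thread. A small experiment; `PoolSizingDemo` is a hypothetical name, and the latch just keeps all workers busy.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingDemo {
    // Submits `tasks` blocking tasks and reports the largest number of threads ever created.
    static int largestPoolSize(BlockingQueue<Runnable> queue, int core, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep every worker busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getLargestPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

With core=2 and max=20000, 100 tasks produce only 2 threads on an unbounded `LinkedBlockingQueue` but 100 threads on a `SynchronousQueue`.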
About the DB: create a solution that does not depend on the DB structure. Then you can set up two environments and measure them. Start with the technology that you know, but stay open to other solutions. At the beginning, a relational DB should keep up with the performance, and if you manage it properly it should not be an issue later. NoSQL databases are used to work with really big data. But the best thing for you is to implement both and run some performance tests.
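"Not depending on the DB" usually means the scheduler talks only to an interface, so a MySQL-backed and a NoSQL-backed implementation can be swapped and benchmarked against each other. A hypothetical sketch: `JobRepository`, its two methods and the in-memory implementation are invented for illustration. The compare-and-set in `markPending` mirrors the kind of conditional UPDATE that keeps two pollers from grabbing the same job.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction; the scheduler depends only on this.
interface JobRepository {
    List<Long> findDueIdleJobs(Instant now);

    boolean markPending(long jobId); // false if the job was no longer IDLE
}

// In-memory implementation, handy for tests and as a baseline in performance runs.
class InMemoryJobRepository implements JobRepository {
    private final Map<Long, Instant> nextExecution = new ConcurrentHashMap<>();
    private final Map<Long, String> status = new ConcurrentHashMap<>();

    void add(long id, Instant next) {
        nextExecution.put(id, next);
        status.put(id, "IDLE");
    }

    @Override
    public List<Long> findDueIdleJobs(Instant now) {
        List<Long> due = new ArrayList<>();
        nextExecution.forEach((id, next) -> {
            if (!next.isAfter(now) && "IDLE".equals(status.get(id))) {
                due.add(id);
            }
        });
        return due;
    }

    @Override
    public boolean markPending(long jobId) {
        // Atomic compare-and-set, analogous to:
        // UPDATE JOB_TABLE SET status = 'PENDING' WHERE id = ? AND status = 'IDLE'
        return status.replace(jobId, "IDLE", "PENDING");
    }
}
```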
Disclaimer: I don't know much about the theoretical background of CSP.
Since I read about it, I tend to structure most of my multi-threading "CSP-like", meaning I have threads waiting for jobs on a BlockingQueue.
This works very well and simplified my thinking about threading a lot.
What are the downsides of this approach?
Can you think of situations where I'm performance-wise better off with a synchronized block?
...or Atomics?
If I have many threads mostly sleeping/waiting, is there some kind of performance impact, except the memory they use? For example during scheduling?
This is one possible way to design the architecture of your code so that threading issues are prevented from even happening; however, it is not the only one and sometimes not the best one.
First of all, you obviously need a series of tasks that can be split and put into such a queue. That is not always the case, for example if you have to calculate the result of a single yet very demanding formula that just cannot be taken apart to utilize multi-threading.
Then there is the issue of a task so tiny that creating it and adding it to the queue is already more expensive than the task itself. Example: you need to set a boolean flag to true on many objects. That is splittable, but the operation itself is not complex enough to justify a new Runnable for each boolean.
You can of course sometimes come up with workarounds; for example, the second case could be made reasonable for your approach by having each thread set 100 flags per execution, but this is only a workaround.
You should see those threading ideas for what they are: tools to help you solve your problem. The concurrency framework and the patterns built on it are, all together, nothing but a big toolbox, and each time you have a task at hand you need to select one tool out of that box. In the end, driving in a screw with a hammer is possible, but probably not the best solution.
My recommendation for getting more familiar with the tools: each time you have a problem that involves threading, go through the tools, select the one you think fits best, then experiment with it until you are satisfied that this specific tool fits the specific task best. Prototyping is, after all, another tool in the box. ;)
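The batching workaround can be sketched as follows: instead of one Runnable per flag, each task owns a contiguous chunk of the array. `BatchedFlags` and the chunk size are illustrative choices; because chunks are disjoint no locking is needed, and `Future.get()` guarantees the task's writes are visible to the caller afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchedFlags {
    // Sets every flag to true, giving each task a chunk of `batch` indices
    // instead of submitting one tiny Runnable per flag.
    static boolean[] setAll(int count, int batch, int threads) {
        boolean[] flags = new boolean[count];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> parts = new ArrayList<>();
        for (int start = 0; start < count; start += batch) {
            int from = start;
            int to = Math.min(start + batch, count);
            parts.add(pool.submit(() -> {
                for (int i = from; i < to; i++) {
                    flags[i] = true; // chunks are disjoint, so no locking is needed
                }
            }));
        }
        try {
            for (Future<?> part : parts) {
                part.get(); // waits, and makes the chunk's writes visible here
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return flags;
    }
}
```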
What are the downsides of this approach?
Not many. A queue may require more overhead than an uncontended lock, since a lock of some sort is required internally by the queue class to protect it from concurrent access. Compared with the advantages of thread pooling and queued communication in general, some extra overhead does not bother me much.
better off with a synchronized block?
Well, if you absolutely MUST share mutable data between threads :(
is there some kind of performance impact,
Not so that anyone would notice. A not-ready thread is, effectively, an extra pointer entry in some container in the kernel (e.g. a queue belonging to a semaphore). Not worth bothering about.
You need synchronized blocks, Atomics, and volatiles whenever two or more threads access mutable data. Keep this to a minimum and it needn't affect your design. There are lots of Java API classes that can handle this for you, such as BlockingQueue.
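For the rare case where a small piece of state really must be shared, here are the two usual options side by side: a lock-free `AtomicInteger` and a counter guarded by `synchronized`. `SharedCounters` is a hypothetical name; both counters end at the same correct total, while the plain `int` without `synchronized` would not be safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounters {
    private final AtomicInteger atomicHits = new AtomicInteger(); // lock-free counter
    private int lockedHits;                                        // guarded by `this`

    void hitAtomic() {
        atomicHits.incrementAndGet();
    }

    synchronized void hitLocked() {
        lockedHits++;
    }

    int atomicHits() {
        return atomicHits.get();
    }

    synchronized int lockedHits() {
        return lockedHits;
    }

    // Hammers both counters from several threads; both should end at threads * perThread.
    static int[] demo(int threads, int perThread) {
        SharedCounters counters = new SharedCounters();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.hitAtomic();
                    counters.hitLocked();
                }
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] { counters.atomicHits(), counters.lockedHits() };
    }
}
```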
However, you could get into trouble if the nature of your problem/solution is perverse enough. If your threads try to read/modify the same data at the same time, you'll find that most of your threads are waiting for locks and most of your cores are doing nothing. To improve response time you'll have to let a lot more threads run, perhaps forgetting about the queue and letting them all go.
It becomes a trade-off. More threads chew up a lot of CPU time, which is okay if you've got it, and speed up response time. Fewer threads use less CPU time for a given amount of work (but what will you do with the savings?) and slow down your response time.
Key point: In this case you need a lot more running threads than you have cores to keep all your cores busy.
This sort of programming (multithreaded as opposed to parallel) is difficult and prone to (irreproducible) bugs, so you want to avoid it if you can, before you even start to think about performance. Plus, it only helps noticeably if you've got more than 2 free cores, and it's only needed for certain sorts of problems. But you did ask for downsides, and it might pay to know this is out there.
I've created an array of 1000 objects, each running in its own thread, so that means 1000 threads are added. Each object holds a socket and 9 more global variables. The whole class consists of 1000 lines of code.
I'm looking for ways to make the program efficient, because it lags. CPU usage is at 100% every time I start the program.
I understand that I'm going to have to change the way the program works, but I can't find a good way. Can anyone explain how to achieve this?
It depends on what your threads actually do - are the tasks primarily using CPU or other resources? For CPU intensive tasks, the best strategy is to run as many threads as you have cores, or a few more. For threads which are blocking a lot on e.g. reading files, waiting for the net etc. you can have many more threads than CPUs.
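A commonly cited way to turn these rules of thumb into a starting number is threads ~ cores * (1 + wait time / compute time): a purely CPU-bound task (no waiting) gives one thread per core, while a task that waits 90% of the time gives ten per core. A sketch; `PoolSizing` is a hypothetical name, and the wait/compute times are inputs you would have to measure for your own workload.

```java
class PoolSizing {
    // Rule of thumb: threads ~ cores * (1 + waitTime / computeTime), at least 1.
    static int suggestedThreads(int cores, double waitMillis, double computeMillis) {
        return (int) Math.max(1, Math.round(cores * (1 + waitMillis / computeMillis)));
    }

    // Same estimate using the number of processors available to this JVM.
    static int suggestedForThisMachine(double waitMillis, double computeMillis) {
        return suggestedThreads(Runtime.getRuntime().availableProcessors(), waitMillis, computeMillis);
    }
}
```

Treat the result as a first guess to start experimenting from, as the answer suggests, not as a final setting.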
It also depends on how many cores the system has. Obviously the answer is very different for a single processor machine than for a 128-way multiprocessor. The above rules of thumb can give you some estimates, but it is best to make experiments yourself based on these, to figure out the ideal number of threads for your specific setup.
Moreover, since Java 5, it is always advisable to use e.g. a ThreadPoolExecutor instead of creating your threads manually. This makes your app both more robust and more flexible.
1/ Use a thread pool
2/ Use futures
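Those two points combined might look like this sketch: a pool sized to the CPU count runs all work items, and each submission returns a `Future` to collect its result. `PooledConnections` and `handle` are hypothetical stand-ins for the per-object work that the 1000 threads did; for socket-heavy work the sizing would differ, as the other answers point out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledConnections {
    // Hypothetical stand-in for the work each of the 1000 threads used to do.
    static int handle(int connectionId) {
        return connectionId * 2;
    }

    // Runs `connections` work items on a pool sized to the CPU count, one Future per item.
    static long runAll(int connections) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int id = 0; id < connections; id++) {
                int connectionId = id;
                Callable<Integer> task = () -> handle(connectionId);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that item has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```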
You should consider refactoring your usage of threads.
1000 threads normally make no sense on a normal machine/server, although your problem seems to be I/O-heavy. You should consider the number of CPU threads that are available.
A possible solution would be a dispatcher that passes the handling of each request on the socket (and possibly the response to it) to the queue of a ThreadPoolExecutor.
From my experience, 1000 threads are just too many (at least on 8-core/8 GB RAM machines). A common symptom is context-switching thrashing, where your OS is just busy jumping from thread to thread while doing little useful work (and a lot of memory is wasted etc.).
If you have to maintain 1000 sockets, you probably have to go for NIO. An easier way out would be closing/opening sockets every time (whether you can do this depends on the characteristics of your work).
The way you solve this too-many-threads problem is to use a thread pool, as others note. Instead of extending Thread, implement Runnable. This is easier said than done, though, because you have to maintain state if you need a conversation. This commonly involves a ConcurrentMap. I personally tend to put a Handler (which implements Runnable) on this map that should run when the counterparty returns a response (the response always contains a key). In this case you'd be closing the socket every time. If you use NIO, it's more like coding with threads in the sense that you don't need to identify the counterparty like this, but it has its own complexity.
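A hypothetical sketch of that map-of-handlers idea: conversation state is stored in a `ConcurrentMap` keyed by the correlation key, so no thread sits blocked waiting for the counterparty. Unlike the answer's Handler, which implements Runnable, this sketch uses a `Consumer<String>` so the handler can receive the response body; `ResponseDispatcher`, `expect` and `onResponse` are invented names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch: what to do next is parked in a map until the reply arrives.
class ResponseDispatcher {
    private final ConcurrentMap<String, Consumer<String>> pending = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    ResponseDispatcher(ExecutorService pool) {
        this.pool = pool;
    }

    // Register what should happen when the reply carrying `key` arrives.
    void expect(String key, Consumer<String> handler) {
        pending.put(key, handler);
    }

    // Called by the reading code for every incoming response.
    boolean onResponse(String key, String body) {
        Consumer<String> handler = pending.remove(key); // remove, so each handler runs once
        if (handler == null) {
            return false; // unknown or already handled key
        }
        pool.execute(() -> handler.accept(body));
        return true;
    }

    // Demo: a reply is dispatched once; a duplicate with the same key is ignored.
    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ResponseDispatcher dispatcher = new ResponseDispatcher(pool);
        CompletableFuture<String> result = new CompletableFuture<>();
        dispatcher.expect("req-42", result::complete);
        boolean first = dispatcher.onResponse("req-42", "OK");
        boolean second = dispatcher.onResponse("req-42", "duplicate");
        try {
            return first + "/" + second + "/" + result.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```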
Our company is running a Java application (on a single CPU Windows server) to read data from a TCP/IP socket and check for specific criteria (using regular expressions) and if a match is found, then store the data in a MySQL database. The data is huge and is read at a rate of 800 records/second and about 70% of the records will be matching records, so there is a lot of database writes involved. The program is using a LinkedBlockingQueue to handle the data. The producer class just reads the record and puts it into the queue, and a consumer class removes from the queue and does the processing.
So the question is: will it help if I use multiple consumer threads instead of a single thread? Is threading really helpful in the above scenario (since I am using single CPU)? I am looking for suggestions on how to speed up (without changing hardware).
Any suggestions would be really appreciated. Thanks
Simple: Try it and see.
This is one of those questions where you can argue several points on either side. But it sounds like you already have most of the infrastructure set up. Just create another consumer thread and see if that helps.
But the first question you need to ask yourself:
What is better?
How do you measure better?
Answer those two questions then try it.
Can the single thread keep up with the incoming data? Can the database keep up with the outgoing data?
In other words, where is the bottleneck? If you need to go multithreaded then look into the Executor concept in the concurrent utilities (There are plenty to choose from in the Executors helper class), as this will handle all the tedious details with threading that you are not particularly interested in doing yourself.
My personal gut feeling is that the bottleneck is the database. Here indexing, and RAM helps a lot, but that is a different question.
It is very likely multi-threading will help, but it is easy to test. Make it a configurable parameter. Find out how many you can do per second with 1 thread, 2 threads, 4 threads, 8 threads, etc.
First of all:
It is wise to build your application using the Java 5 concurrency API.
If your application is built around an ExecutorService, it is fairly easy to change the number of threads used. For example, you could create a thread pool where the number of threads is specified by configuration. So if you ever want to change the number of threads, you only have to change some properties.
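A minimal sketch of a thread count that comes from configuration, here a JVM system property read with `Integer.getInteger`. The property name `consumer.threads` and the default of 1 are assumptions for illustration; the pool is built the same way `Executors.newFixedThreadPool` builds one.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ConsumerPoolFactory {
    // Hypothetical property name, e.g. start the JVM with -Dconsumer.threads=8
    static final String PROPERTY = "consumer.threads";

    // Fixed-size pool whose size is read from configuration (default: one consumer).
    static ThreadPoolExecutor create() {
        int threads = Integer.getInteger(PROPERTY, 1);
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }
}
```

Callers can keep programming against `ExecutorService`; only this factory knows where the number comes from, which makes the 1, 2, 4, 8 thread experiments a matter of restarting with a different `-D` value.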
About your question:
- About reading your socket: as far as I know, it is not useful (if possible at all) to have two threads read data from one socket. Just use one thread that reads the socket, but keep the work done in that thread to a minimum (for example: read socket, put data in queue, read socket, etc.).
- About consuming the queue: it is wise to build this part as described above, so that it is easy to change the number of consuming threads.
- Note: you cannot really predict what is better; there might be another part that is the bottleneck, etc. Only monitoring/profiling gives you a real view of your situation. But if your application is built as above, it is really easy to test with different numbers of threads.
So in short:
- Producer part: one thread that only reads from socket and puts in queue
- Consumer part: built around an ExecutorService, so it is easy to adapt the number of consuming threads
Then use profiling to find the bottlenecks, and use A/B testing to determine the optimal number of consuming threads for your system.
As an update on my earlier question:
We ran some comparison tests between a single consumer thread and multiple threads (5, 10, 15 and so on), monitoring the size of the queue of yet-to-be-processed records. The difference was minimal, and what's more, the queue size got slightly bigger once the number of threads crossed 25 (compared to running 5 threads). This leads me to the conclusion that the overhead of maintaining the threads outweighed the processing benefits. Maybe this is particular to our scenario, but I'm just mentioning my observations.
And of course (as pointed out by others) the bottleneck is the database. That was handled by using multi-row INSERT statements in MySQL instead of single inserts. If we had not done that from the start, we could not have handled this load.
End result: I am still not convinced that multi-threading will give a benefit in processing time. Maybe it has other benefits, but I am looking only at processing time. If any of you have experience to the contrary, do let us hear about it.
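The multi-row insert pairs naturally with the existing LinkedBlockingQueue: the consumer blocks for one record, then uses `drainTo` to grab whatever else is already queued, and writes the whole batch with one statement. A sketch; `BatchingWriter` and the table/column names `matched_records(payload)` are hypothetical, and the values would be bound with `PreparedStatement.setString` rather than concatenated into the SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

class BatchingWriter {
    // Blocks for at least one record, then takes whatever else is queued, up to maxBatch.
    static List<String> nextBatch(BlockingQueue<String> queue, int maxBatch) throws InterruptedException {
        List<String> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());
        queue.drainTo(batch, maxBatch - 1);
        return batch;
    }

    // Builds one multi-row INSERT with a placeholder per record (hypothetical table/column).
    // Each value would then be bound with PreparedStatement.setString(i + 1, batch.get(i)).
    static String multiRowInsert(int rows) {
        StringBuilder sql = new StringBuilder("INSERT INTO matched_records (payload) VALUES ");
        for (int i = 0; i < rows; i++) {
            sql.append(i == 0 ? "(?)" : ", (?)");
        }
        return sql.toString();
    }
}
```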
And again thanks for all your input.
In your scenario, where a) the processing is minimal, b) there is only one CPU, and c) data goes straight into the database, it is not very likely that adding more threads will help. In other words, the front-end and back-end threads are I/O-bound, with minimal processing in the middle. That's why you don't see much improvement.
What you can do is try having three stages: the 1st is a single thread pulling data from the socket, the 2nd is a thread pool that does the processing, and the 3rd is a single thread that handles the DB output. This may produce better CPU utilization if the input rate varies, at the expense of temporary growth of the output queue. If not, the throughput will be limited by how fast you can write to the database, no matter how many threads you have, and then you can get away with just a single read-process-write thread.
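The three-stage layout can be sketched as below, with an in-memory list standing in for the socket and another for the MySQL table; `ThreeStagePipeline` and the `END` marker are hypothetical. Note that with several processing threads, the order of written records is no longer guaranteed.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

// Sketch of: single reader -> processing pool -> single DB writer thread.
class ThreeStagePipeline {
    private static final String END = new String("END"); // poison pill, compared by identity

    static List<String> run(List<String> input, Pattern match, int processors) {
        BlockingQueue<String> toWriter = new LinkedBlockingQueue<>();
        List<String> written = new CopyOnWriteArrayList<>(); // stands in for the table
        ExecutorService processing = Executors.newFixedThreadPool(processors);

        // Stage 3: the only thread that talks to the database.
        Thread writer = new Thread(() -> {
            try {
                for (String item = toWriter.take(); item != END; item = toWriter.take()) {
                    written.add(item); // placeholder for the (batched) INSERT
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Stage 1: a single reader hands each record to stage 2 and keeps reading.
        for (String rec : input) {
            processing.execute(() -> {
                if (match.matcher(rec).find()) { // stage 2: regex filtering
                    toWriter.add(rec);
                }
            });
        }

        try {
            processing.shutdown();
            processing.awaitTermination(1, TimeUnit.MINUTES);
            toWriter.add(END); // every match is queued; tell the writer to finish
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```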