I am trying to log asynchronously in a heavily multithreaded environment, in Java, on the Linux platform. What would be a suitable (lock-free) data structure to keep thread contention low?
I need to log GBs of messages. I need to do it in an async/lock-free manner so I don't kill the performance of the main logic (the code that invokes the logger APIs).
Logback has an AsyncAppender that might meet your needs.
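For illustration, here is a minimal programmatic sketch of wrapping a FileAppender in an AsyncAppender (most projects configure this in logback.xml instead); the file name, pattern, and queue settings below are arbitrary assumptions:

import ch.qos.logback.classic.AsyncAppender;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.FileAppender;
import org.slf4j.LoggerFactory;

public class AsyncLoggingSetup {
    public static void main(String[] args) {
        LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();

        PatternLayoutEncoder encoder = new PatternLayoutEncoder();
        encoder.setContext(ctx);
        encoder.setPattern("%d{ISO8601} [%thread] %-5level %logger - %msg%n");
        encoder.start();

        FileAppender<ILoggingEvent> file = new FileAppender<>();
        file.setContext(ctx);
        file.setFile("app.log");            // hypothetical log file name
        file.setEncoder(encoder);
        file.start();

        // AsyncAppender puts events on an in-memory queue; a worker thread
        // drains the queue and writes to the wrapped FileAppender.
        AsyncAppender async = new AsyncAppender();
        async.setContext(ctx);
        async.setQueueSize(8192);           // size of the event queue
        async.setDiscardingThreshold(0);    // 0 = never discard events
        async.addAppender(file);
        async.start();

        Logger root = ctx.getLogger(Logger.ROOT_LOGGER_NAME);
        root.addAppender(async);

        LoggerFactory.getLogger(AsyncLoggingSetup.class).info("hello, async world");
    }
}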
The simplest way to do it is to write into multiple files - one for each thread.
Make sure you put timestamps at the start of each record, so it is easier to merge them into a single log file.
Example Unix command:
cat *.log | sort | less
But for a better / more useful answer you do need to clarify your question by adding a lot more detail.
I would use Java Chronicle, mostly because I wrote it, but I suggest it here because it lets you do lock-free and garbage-free logging with a minimum of OS calls. This requires one log per thread, but I assume you will have kept the thread count to a minimum already.
I have used this library to write 1 GB/second from two threads. You may find that having more threads will not help as much as you think if logging is a bottleneck for you.
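For illustration only, a rough sketch using the current Chronicle Queue API (which differs from the original Java Chronicle API this answer refers to); the queue directory is an assumption:

import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class ChronicleLogSketch {
    public static void main(String[] args) {
        // One memory-mapped queue per logging thread keeps writes lock-free.
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("logs/thread-1").build()) {
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeText("2024-01-01T00:00:00 INFO something happened");

            // A separate reader (e.g. a merge or shipping process) can tail the queue later.
            ExcerptTailer tailer = queue.createTailer();
            System.out.println(tailer.readText());
        }
    }
}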
BTW: You have given me an idea of how the log can be updated from multiple threads/processes, but it will take a while to implement and test.
To reduce contention, you can first put log messages in a buffer, private to each thread. When the buffer is full, put it in a queue handled by a separate log thread, which then merges messages from different threads and writes them to a file. Note, you need that separate thread in any case, in order not to slow down the working threads when the next buffer is to be written to disk.
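A minimal sketch of that idea, assuming a 64 KB per-thread buffer and a single writer thread (the writer here simply appends buffers in arrival order; a real implementation could merge-sort entries by the timestamps they carry):

import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BufferedAsyncLog {
    private static final int FLUSH_SIZE = 64 * 1024;   // hand off when a thread's buffer reaches ~64 KB
    private static final BlockingQueue<String> FULL_BUFFERS = new LinkedBlockingQueue<>();
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(FLUSH_SIZE));

    // Called by worker threads: append to the thread-private buffer, hand it off when full.
    public static void log(String message) {
        StringBuilder buf = BUFFER.get();
        buf.append(System.currentTimeMillis()).append(' ').append(message).append('\n');
        if (buf.length() >= FLUSH_SIZE) {
            FULL_BUFFERS.add(buf.toString());   // the only contention point: one queue insert per full buffer
            buf.setLength(0);
        }
    }

    // Single writer thread: takes full buffers off the queue and writes them to disk.
    public static Thread startWriter(String fileName) {
        Thread writer = new Thread(() -> {
            try (Writer out = new FileWriter(fileName, true)) {
                while (!Thread.currentThread().isInterrupted()) {
                    out.write(FULL_BUFFERS.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }
}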
It is impossible to avoid queue contention, as your logging thread will most likely produce messages faster than your writer (disk I/O) thread can keep up with, but with some smart wait strategies and thread pinning you can minimize latency and maximize throughput.
Take a look at CoralLog, developed by Coral Blocks (with which I am affiliated), which uses a lock-free queue and can log a 64-byte message in 52 nanoseconds on average. It is capable of writing more than 5 million messages per second.
Related
I've written a simple multithreaded Java application. The main method just creates 5k threads, and each thread will loop over a list of 5M records to process.
My Machine specs:
CPU cores: 12 cores
Memory: 13 GB RAM
OS: Debian 64-bit
My jar is now running, and I use htop to monitor my application; this is what I can see while it's running:
And this is how I construct a thread:
ExecutorService executor = Executors.newCachedThreadPool();
Future<MatchResult> future = executor.submit(() -> {
    Match match = new Match();
    return match.find(this);
});
Match.class
MatchResult find(Main main) {
    // loops over a list of 5M records
    // processes the values and does some calculations
    // sends the result back to the caller
    // this function has no problem; it just takes a long time to run (~160 min)
}
And now I have some questions:
1- Based on my understanding, if I have a multithreaded process, it'll fully utilize all my cores until the task is completed, so why is the workload only around 0.5 (only half a core is used)?
2- Why is my Java app's state "S" (sleeping) while it's actually running and filling up the log file?
3- Why can I only see 2037 threads out of 5k running? (This number was actually lower earlier and is increasing over time.)
My target: to utilize all cores and get all these 5k+ tasks done as fast as possible :)
Based on my understanding, if I have a multithreaded process, it'll fully utilize all my cores until the task is completed.
Your understanding is not correct. There are lots of reasons why cores may not (all) be used in a poorly designed multi-threaded application.
so why is the workload only around 0.5 (only half a core is used)?
A number of possible reasons:
The threads may be deadlocked.
The threads may all be contending for a single lock (or a small number of locks), resulting in most of them waiting.
The threads could all be waiting for I/O; e.g. reading the records from some database.
And those are just some of the more obvious possible reasons.
Given that your threads are making some progress, I think explanation #2 is a good fit for your "symptoms".
For what it is worth, creating 5k threads is almost certainly a really bad idea. At most 12 of those could possibly be running at any time. The rest will be waiting to run (assuming you resolve the problem that is leading to thread starvation) and tying up memory. The latter has various secondary performance effects.
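For example, here is a sketch of the same submission code with a pool bounded to the core count (it reuses the question's Match and MatchResult classes; everything else is as in the snippet above):

int cores = Runtime.getRuntime().availableProcessors();   // 12 on this machine
ExecutorService executor = Executors.newFixedThreadPool(cores);

List<Future<MatchResult>> futures = new ArrayList<>();
for (int i = 0; i < 5000; i++) {                           // 5k tasks are queued, but only ~12 run at once
    futures.add(executor.submit(() -> {
        Match match = new Match();
        return match.find(this);
    }));
}
executor.shutdown();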
My target: to utilize all cores and get all these 5k+ tasks done as fast as possible :)
Those two goals are probably mutually exclusive :-)
All threads are logging to the same file via java.util.logging.Logger.
That is possibly leading to them all contending for the same lock on something in the logging framework, or bottlenecking on file I/O for the log file.
Generally speaking, logging is expensive. If you want performance, minimize your logging, and for cases where logging is essential, use a logging framework that doesn't introduce a concurrency bottleneck.
The best way to solve this problem is to profile the code and figure out where it is spending most of its time.
Guesswork is inefficient.
Thank you guys, I've fixed the problem and now I have all 12 cores running at maximum, as you can see in the picture. :)
I actually ran jstack <Pid> to see the status of all running threads in this process, and I found that 95% of my threads were actually BLOCKED at the logging line. I did some googling and found that I can use AsyncAppender in Log4j so logging will not block the thread.
I need to update 550,000 records in a table while the JBoss server is starting up. I need to run this update as a background process with multiple threads and parallel processing. The application uses Spring, so I can use an initializing bean for this.
To perform the parallel processing I am planning to use the Java executor framework.
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(50);
How to decide the thread pool count?
I think this depends on the hardware. My hardware is 16 GB RAM and a Core i3 processor.
Is it good practice to call Thread.sleep(20); while processing this big update in the background?
I don't know much about Spring processing specifically, but your questions seem general enough that I can still provide a possibly inadequate answer.
Generally there's a lot of factors that go into how many threads you want. You definitely don't want multiple threads on a core, as that'll slow things down as threads start contending for CPU time instead of working, so probably your core count would be your ceiling, or maybe core count - 1 to allow one core for all other tasks to run on (so in your case maybe 3 or 4 cores, tops, if I remember core counts for i3 processors right). However, in this case I'd guess you're more likely to run into I/O and/or memory/cache bottlenecks, since when those are involved, those are more likely to slow down your program than insufficient parallelization. In addition, the tasks that your threads are doing would affect the number of threads you can use; if you have one thread to pull data in and one thread to dump data back out after processing, it might be possible for those threads to share a core.
I'm not sure why this would be a good idea... What use do you see for Thread.sleep() while processing? I'd guess it'd actually slow down your processing, because all you're doing is putting threads to sleep when they could be working.
In any case, I'd be wary of parallelizing what is likely to be an I/O bound task. You'll definitely need to profile to see where your bottlenecks are, even before you start parallelizing, to make sure that multiple cores will actually help you.
If it is the CPU that is adding extra time to complete your task, then you can start parallelizing. Even then, be careful about cache issues; try to make sure each thread works on a totally separate chunk of data (e.g. through ThreadLocal) so cache/memory issues don't limit any performance increases. One way this could work is by having a reader thread dump data into a Queue which the worker threads can then read into a ThreadLocal structure, process, etc.
I hope this helped. I'll keep updating as the mistakes I certainly made are pointed out.
Right now in my application, at certain points we are logging some heavy stuff into the log files.
Basically, only for logging purposes, we are creating JSON from the available data and then logging it into log files. It is a business requirement to log data in JSON format.
Now, creating JSON from the available data and then logging it to a file takes a lot of time and impacts the original request's return time.
Now the idea is to improve the situation.
One of the things that we have discussed is to create a thread pool using
Executors.newSingleThreadExecutor()
in our code and then submitting to it a task that does the conversion of the data into JSON and the subsequent logging.
Is this a good approach? As we are managing the thread pool ourselves, is it going to create any issues?
I would appreciate if someone can share better solutions.
Is there some way to use Log4j for this? I tried to use AsyncAppender but didn't achieve the desired result.
We are using EJB 3, JBoss 5.0, Log4j, and Java 6.
I believe you are on the right track in using a separate thread pool for logging. In a lot of products you will see an asynchronous logging feature: logs are accumulated and pushed to log files using a thread separate from the request thread. Especially in production environments, where there are millions of incoming requests and your response time needs to be less than a few seconds, you cannot afford anything such as logging to slow down the system. So the approach used is to add logs to a memory buffer and push them asynchronously in reasonably sized chunks.
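A rough sketch of that approach, assuming Jackson as the JSON serializer (an anonymous class is used since the question mentions Java 6; the class and method names are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.log4j.Logger;
import com.fasterxml.jackson.databind.ObjectMapper;

public class AsyncJsonLogger {
    private static final Logger LOG = Logger.getLogger(AsyncJsonLogger.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Single background thread: the request only pays for the cheap submit() call;
    // the JSON conversion and the file write happen off the request thread.
    private static final ExecutorService LOG_EXECUTOR = Executors.newSingleThreadExecutor();

    public static void logAsync(final Object payload) {
        LOG_EXECUTOR.submit(new Runnable() {
            public void run() {
                try {
                    LOG.info(MAPPER.writeValueAsString(payload));
                } catch (Exception e) {
                    LOG.warn("Failed to log payload as JSON", e);
                }
            }
        });
    }
}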
A word of caution when using a thread pool for logging:
As multiple threads will be working on the log file(s) and on an in-memory log buffer, you need to be careful. Add logs to a FIFO-style buffer to make sure log entries are written to the log files sorted by timestamp. Also make sure file access is synchronized, so you don't run into a situation where the log file ends up out of order or garbled.
Have a look at Logback's AsyncAppender; it already provides a separate thread pool, queue, etc. and is easily configurable. It does almost the same thing you are doing, but saves you from reinventing the wheel.
Have you considered using MongoDB for logging?
MongoDB inserts can be done asynchronously. One wouldn't want a user's experience to grind to a halt if logging were slow, stalled, or down. MongoDB provides the ability to fire off an insert into a log collection and not wait for a response code. (If one wants a response, one calls getLastError(); we would skip that here.)
Old log data automatically LRU's out. By using capped collections, we preallocate space for logs, and once it is full, the log wraps around and reuses the specified space. There is no risk of filling up a disk with excessive log information, and no need to write log archival/deletion scripts.
It's fast enough for the problem. First, MongoDB is very fast in general, fast enough for problems like this. Second, when using a capped collection, insertion order is automatically preserved: we don't need to create an index on timestamp. This makes things even faster, and is important given that the logging use case has a very high number of writes compared to reads (the opposite of most database problems).
Document-oriented/JSON is a great format for log information. It is very flexible and "schemaless" in the sense that we can throw in an extra field any time we want.
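A hedged sketch with the modern MongoDB Java driver (the getLastError() call above belongs to the legacy API; the collection name, capped size, and connection string below are assumptions):

import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import org.bson.Document;

public class MongoLogSketch {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoDatabase db = client.getDatabase("app");

        // Capped collection: fixed size, oldest entries are overwritten automatically.
        db.createCollection("log",
                new CreateCollectionOptions().capped(true).sizeInBytes(1024L * 1024 * 1024));

        // UNACKNOWLEDGED = fire-and-forget insert; the caller does not wait for a response.
        MongoCollection<Document> log = db.getCollection("log")
                .withWriteConcern(WriteConcern.UNACKNOWLEDGED);

        log.insertOne(new Document("ts", new java.util.Date())
                .append("level", "INFO")
                .append("msg", "something happened"));

        client.close();
    }
}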
There is also Log4j 2: http://logging.apache.org/log4j/2.x/manual/async.html Additionally, read this article about why it is so fast: http://www.grobmeier.de/log4j-2-performance-close-to-insane-20072013.html#.UzwywI9Bow4
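For reference, Log4j 2's all-asynchronous mode only needs the LMAX Disruptor on the classpath plus one system property; the logging calls themselves do not change (a minimal sketch):

// Run with: -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
// (and the LMAX Disruptor jar on the classpath) to make all loggers asynchronous.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Log4j2AsyncExample {
    private static final Logger LOG = LogManager.getLogger(Log4j2AsyncExample.class);

    public static void main(String[] args) {
        LOG.info("This call returns as soon as the event is placed on the ring buffer.");
    }
}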
You can also try CoralLog to asynchronously log data using the disruptor pattern. That way you spend minimal time in the logging thread, and all the hard work is passed to the thread doing the actual file I/O. It also provides memory-mapped files to speed up the consumer thread and reduce queue contention.
Disclaimer: I am one of the developers of CoralLog
I'm writing an application that listens on UDP for incoming messages. My main thread receives message after message from the network and passes each of them to a new thread for handling using an executor.
Each handling thread does the required processing on the message it's responsible for and adds it to a LinkedBlockingQueue that is shared between all the handling threads.
Then, I have a DB worker thread that drains the queue in blocks of 10,000 messages and inserts each block into the DB.
Since the arrival rate of messages may be high (more than 20,000 messages per second), I thought that using LOAD DATA INFILE would be more efficient. So this DB worker thread drains the queue as described above, creates a temporary file containing all the messages in CSV format, and passes the created file to another thread using another executor. This new thread executes the LOAD DATA INFILE statement using JDBC.
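To make that step concrete, here is a rough sketch of the drain-to-CSV-and-load logic (the table name, columns, credentials, and the allowLoadLocalInfile connection setting are assumptions):

import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

public class BulkLoader {
    // Drains up to 10,000 messages, writes them to a temp CSV file and bulk-loads it.
    static void drainAndLoad(BlockingQueue<String> queue) throws Exception {
        List<String> batch = new ArrayList<>(10000);
        queue.drainTo(batch, 10000);
        if (batch.isEmpty()) return;

        File csv = File.createTempFile("messages", ".csv");
        try (Writer out = new FileWriter(csv)) {
            for (String msg : batch) {
                out.write(msg);     // each message is assumed to already be one CSV line
                out.write('\n');
            }
        }

        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "pass");
             Statement st = con.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE '" + csv.getAbsolutePath().replace('\\', '/')
                    + "' INTO TABLE messages FIELDS TERMINATED BY ','");
        } finally {
            csv.delete();
        }
    }
}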
After testing my application, I think the performance is not so good. I'm looking for ways to improve performance both at the multithreading level and at the DB access level.
Note that I use MySQL as the DBMS.
Thanks
You need to determine why your performance is poor.
E.g. it's quite likely you don't need multiple threads if you are writing the data sequentially to a database, which is far more likely to be your bottleneck. The problem with using multiple threads when you don't need to is that they add complexity, which is an overhead in itself, and it can be slower than using a single thread.
I would try to see what the performance is like if you do everything except load the data into the database, i.e. write the file and discard it.
It's hard to tell without any profiler output, but my (un-)educated guess is that the bottleneck is that you are writing your changes to a file on the hard drive and then prompting your database to read and parse this file. Storage access is always much, much slower than memory access, so this is very likely much slower than just feeding the database the queries from memory.
But that's just guessing. Maybe the bottleneck is somewhere else where neither you nor I would ever have expected it. When you really want to know which part of your application eats how much CPU time, you should use a profiler like Profiler4j to analyze your program.
Our company is running a Java application (on a single-CPU Windows server) that reads data from a TCP/IP socket and checks it against specific criteria (using regular expressions); if a match is found, the data is stored in a MySQL database. The data is huge and is read at a rate of 800 records/second, and about 70% of the records will be matching records, so there are a lot of database writes involved. The program uses a LinkedBlockingQueue to handle the data. The producer class just reads a record and puts it into the queue, and a consumer class removes it from the queue and does the processing.
So the question is: will it help if I use multiple consumer threads instead of a single thread? Is threading really helpful in the above scenario (since I am using a single CPU)? I am looking for suggestions on how to speed things up (without changing the hardware).
Any suggestions would be really appreciated. Thanks
Simple: Try it and see.
This is one of those questions where you can argue several points on either side. But it sounds like you already have most of the infrastructure set up. Just create another consumer thread and see if that helps.
But the first question you need to ask yourself:
What is better?
How do you measure better?
Answer those two questions then try it.
Can the single thread keep up with the incoming data? Can the database keep up with the outgoing data?
In other words, where is the bottleneck? If you need to go multithreaded then look into the Executor concept in the concurrent utilities (There are plenty to choose from in the Executors helper class), as this will handle all the tedious details with threading that you are not particularly interested in doing yourself.
My personal gut feeling is that the bottleneck is the database. Here indexing and RAM help a lot, but that is a different question.
It is very likely multi-threading will help, but it is easy to test. Make it a configurable parameter. Find out how many you can do per second with 1 thread, 2 threads, 4 threads, 8 threads, etc.
First of all:
It is wise to build your application using the Java 5 concurrency API.
If your application is created around the ExecutorService, it is fairly easy to change the number of threads used. For example, you could create a thread pool where the number of threads is specified by configuration. So if you ever want to change the number of threads, you only have to change some properties.
About your question:
- About reading your socket: as far as I know, it is not useful (if possible at all) to have two threads read data from one socket. Just use one thread that reads the socket, and keep the work in that thread to a minimum (for example: read socket, put data in queue, read socket, etc.).
- About consuming the queue: it is wise to construct this part as pointed out above; that way it is easy to change the number of consuming threads.
- Note: you cannot really predict what is better; there might be another part that is the bottleneck, etcetera. Only monitoring/profiling gives you a real view of your situation. But if your application is constructed as above, it is really easy to test with different numbers of threads.
So in short:
- Producer part: one thread that only reads from the socket and puts data in the queue
- Consumer part: built around the ExecutorService so it is easy to adapt the number of consuming threads (see the sketch below)
Then use profiling to find the bottlenecks, and use A/B testing to determine the optimal number of consuming threads for your system.
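A minimal sketch of that structure (the host, port, queue capacity, and the consumer.threads property are placeholders; the regex matching and DB write are left as a comment):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class Pipeline {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100000);

        // Consumer part: pool size comes from configuration so it is easy to A/B test.
        int consumers = Integer.getInteger("consumer.threads", 1);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        Runnable consumerTask = () -> {
            try {
                while (true) {
                    String record = queue.take();
                    // match 'record' against the regular expressions and write to the DB here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < consumers; i++) {
            pool.execute(consumerTask);
        }

        // Producer part: one thread (here, the main thread) only reads and enqueues.
        try (Socket socket = new Socket("feed.example.com", 9000);
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line);
            }
        }
    }
}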
As an update on my earlier question:
We did run some comparison tests between a single consumer thread and multiple threads (adding 5, 10, 15 and so on), monitoring the queue size of yet-to-be-processed records. The difference was minimal, and what's more, the queue size was getting slightly bigger once the number of threads crossed 25 (compared to running 5 threads). This leads me to the conclusion that the overhead of maintaining the threads outweighed the processing benefit gained. Maybe this is particular to our scenario, but I'm just mentioning my observations.
And of course (as pointed out by others) the bottleneck is the database. That was handled by using multi-row INSERT statements in MySQL instead of single inserts. If we did not have that to start with, we could not have handled this load.
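For reference, a sketch of the batched-insert idea over JDBC (the table and column names are placeholders; Connector/J's rewriteBatchedStatements=true setting makes the driver rewrite the batch into multi-row INSERTs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsert {
    static void insertBatch(List<String> records) throws Exception {
        String url = "jdbc:mysql://localhost/mydb?rewriteBatchedStatements=true";
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO records (payload) VALUES (?)")) {
            con.setAutoCommit(false);
            for (String r : records) {
                ps.setString(1, r);
                ps.addBatch();          // queued client-side
            }
            ps.executeBatch();          // sent as one (or a few) multi-row INSERTs
            con.commit();
        }
    }
}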
End result: I am still not convinced that multithreading will benefit processing time. Maybe it has other benefits... but I am looking only at the processing-time factor. If any of you have experience to the contrary, do let us hear about it.
And again thanks for all your input.
In your scenario, where a) the processing is minimal, b) there is only one CPU, and c) the data goes straight into the database, it is not very likely that adding more threads will help. In other words, the front-end and back-end threads are I/O bound, with minimal processing in the middle. That's why you don't see much improvement.
What you can do is try three stages: the 1st is a single thread pulling data from the socket, the 2nd is a thread pool that does the processing, and the 3rd is a single thread that serves the DB output. This may produce better CPU utilization if the input rate varies, at the expense of temporary growth of the output queue. If not, the throughput will be limited by how fast you can write to the database, no matter how many threads you have, and then you can get away with just a single read-process-write thread.