How to write a Java thread pool program to read the contents of a file? - java

I want to define a thread pool with 10 threads and read the content of a file, but different threads must not read the same content (i.e. divide the content into 10 pieces and have each piece read by one thread).

Well, what you would do is roughly this (a sketch follows the list below):
Get the length of the file.
Divide it by N.
Create N threads.
Have each one skip to (file_size / N) * thread_no and read (file_size / N) bytes into a buffer.
Wait for all threads to complete.
Stitch the buffers together.
(If you were slightly clever about it, you could avoid the last step ...)
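A minimal sketch of those steps using a fixed thread pool follows; the file name, chunk handling and the "stitching" step are illustrative assumptions, not a tuned implementation:

import java.io.File;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFileReader {
    public static void main(String[] args) throws Exception {
        final String path = "input.dat";                 // illustrative file name
        final int n = 10;                                // one chunk per thread
        final long fileSize = new File(path).length();
        final long chunkSize = (fileSize + n - 1) / n;   // round up so the last chunk covers the tail

        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<byte[]>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final long offset = i * chunkSize;
            final int length = (int) Math.max(0, Math.min(chunkSize, fileSize - offset));
            parts.add(pool.submit(() -> {
                byte[] buf = new byte[length];
                // each task opens its own handle, skips to its offset and reads its slice
                try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                    raf.seek(offset);
                    raf.readFully(buf);
                }
                return buf;
            }));
        }
        for (Future<byte[]> part : parts) {
            byte[] chunk = part.get();                   // wait for all threads to complete
            // ... "stitch" the chunks together here (or have the tasks write into one shared array)
        }
        pool.shutdown();
    }
}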
HOWEVER, it is doubtful that you would get much speed-up by doing this. Indeed, I wouldn't be surprised if you got a slowdown in many cases. With a typical OS, I would expect that you would get as good, if not better, performance by reading the file using one big read(...) call from one thread.
The OS can fetch the data faster from the disc if you read it sequentially. Indeed, a lot of OSes optimize for this use-case, and use read-ahead and in-memory buffering (using OS-level buffers) to give high effective file read rates.
Reading a file with multiple threads means that each thread will typically be reading from a different position in the file. Naively, that would require the OS to seek the disk heads backwards and forwards between the different positions ... which will slow down I/O considerably. In practice, the OS will do various things to mitigate that, but even so, simultaneously reading data from different positions on a disk is still bad for I/O throughput.

Related

What causes this performance drop?

I'm using the Disruptor framework for performing fast Reed-Solomon error correction on some data. This is my setup:
           RS Decoder 1
          /            \
Producer -     ...      - Consumer
          \            /
           RS Decoder 8
The producer reads blocks of 2064 bytes from disk into a byte buffer.
The 8 RS decoder consumers perform Reed-Solomon error correction in parallel.
The consumer writes files to disk.
In the disruptor DSL terms, the setup looks like this:
RsFrameEventHandler[] rsWorkers = new RsFrameEventHandler[numRsWorkers];
for (int i = 0; i < numRsWorkers; i++) {
    rsWorkers[i] = new RsFrameEventHandler(numRsWorkers, i);
}
disruptor.handleEventsWith(rsWorkers)
         .then(writerHandler);
When I don't have a disk output consumer (no .then(writerHandler) part), the measured throughput is 80 M/s. As soon as I add a consumer (even one that writes to /dev/null, or doesn't write at all, as long as it is declared as a dependent consumer), performance drops to 50-65 M/s.
I've profiled it with Oracle Mission Control, and this is what the CPU usage graph shows:
Without an additional consumer: [CPU usage graph]
With an additional consumer: [CPU usage graph, showing a large grey region]
What is this gray part in the graph and where is it coming from? I suppose it has to do with thread synchronisation, but I can't find any other statistic in Mission Control that would indicate any such latency or contention.
Your hypothesis is correct: it is a thread synchronization issue.
From the API Documentation for EventHandlerGroup<T>.then (Emphasis mine)
Set up batch handlers to consume events from the ring buffer. These handlers will only process events after every EventProcessor in this group has processed the event.
This method is generally used as part of a chain. For example if the handler A must process events before handler B:
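The javadoc illustrates that chain with a snippet along these lines (A and B being the documentation's placeholder handlers):

dw.handleEventsWith(A).then(B);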
This necessarily decreases throughput. Think of it like a funnel:
The consumer has to wait for every EventProcessor to be finished, before it can proceed through the bottleneck.
I can see two possibilities here, based on what you've shown. You might be affected by one or both, I'd recommend testing both.
1) IO processing bottleneck.
2) Contention on multiple threads writing to buffer.
IO processing
From the data shown, you have stated that as soon as you enable the IO component, your throughput decreases and kernel time increases. This could quite easily be the IO wait time while your consumer thread is writing. A context switch to perform a write() call is significantly more expensive than doing nothing. Your Decoders are now capped at the maximum speed of the consumer. To test this hypothesis, you could remove the write() call. In other words, open the output file, prepare the string for output, and just not issue the write() call.
Suggestions
Try removing the write() call in the Consumer, see if it reduces kernel time.
Are you writing to a single flat file sequentially? If not, try that.
Are you using smart batching (i.e. buffering until the endOfBatch flag and then writing in a single batch) to ensure that the IO is bundled up as efficiently as possible?
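If the consumer isn't already doing this, a minimal sketch of smart batching in the writer handler might look like the following; WriteEvent and its getBytes() method are hypothetical names for whatever your event carries:

import com.lmax.disruptor.EventHandler;
import java.io.BufferedOutputStream;
import java.io.OutputStream;

public class BatchingWriterHandler implements EventHandler<WriteEvent> {
    private final BufferedOutputStream out;

    public BatchingWriterHandler(OutputStream target) {
        // large buffer so that many events are coalesced into a single write()
        this.out = new BufferedOutputStream(target, 1 << 20);
    }

    @Override
    public void onEvent(WriteEvent event, long sequence, boolean endOfBatch) throws Exception {
        out.write(event.getBytes());   // cheap: only appends to the in-memory buffer
        if (endOfBatch) {
            out.flush();               // one syscall per disruptor batch instead of per event
        }
    }
}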
Contention on multiple writers
Based on your description I suspect your Decoders are reading from the disruptor and then writing back to the very same buffer. This is going to cause issues with multiple writers aka contention on the CPUs writing to memory. One thing I would suggest is to have two disruptor rings:
Producer writes to #1
Decoder reads from #1, performs RS decode and writes the result to #2
Consumer reads from #2, and writes to disk
Assuming your RBs are sufficiently large, this should result in good clean walking through memory.
The key here is not having the Decoder threads (which may be running on a different core) write to the same memory that was just owned by the Producer. With only 2 cores doing this, you will probably see improved throughput unless the disk speed is the bottleneck.
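A rough sketch of that two-ring layout follows. The event types (RawBlock, DecodedBlock), the decode step and the ring sizes are illustrative assumptions, and the Disruptor constructor used is the 3.3+ (factory, size, ThreadFactory) form:

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

public class TwoRingPipeline {
    // hypothetical event types carrying the raw and decoded 2064-byte blocks
    public static class RawBlock     { public byte[] bytes = new byte[2064]; }
    public static class DecodedBlock { public byte[] bytes; }

    public static void main(String[] args) {
        Disruptor<RawBlock> inbound =
                new Disruptor<>(RawBlock::new, 1 << 14, Executors.defaultThreadFactory());
        Disruptor<DecodedBlock> outbound =
                new Disruptor<>(DecodedBlock::new, 1 << 14, Executors.defaultThreadFactory());

        RingBuffer<DecodedBlock> outRing = outbound.getRingBuffer();

        // Decoders read from ring #1 and publish their results into ring #2,
        // so they never write back into memory that the producer still owns.
        EventHandler<RawBlock> decoder = (raw, sequence, endOfBatch) -> {
            byte[] decoded = raw.bytes.clone();               // stand-in for the RS decode step
            outRing.publishEvent((out, seq) -> out.bytes = decoded);
        };
        inbound.handleEventsWith(decoder);

        // The single consumer reads from ring #2 and writes to disk (write omitted here).
        EventHandler<DecodedBlock> writer = (block, sequence, endOfBatch) -> { /* write block.bytes */ };
        outbound.handleEventsWith(writer);

        outbound.start();
        RingBuffer<RawBlock> inRing = inbound.start();
        // the producer fills ring #1, e.g. inRing.publishEvent((e, s) -> readBlockInto(e.bytes));
    }
}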
I have a blog article which describes in more detail how to achieve this, including sample code: http://fasterjava.blogspot.com.au/2013/04/disruptor-example-udp-echo-service-with.html
Other thoughts
It would also be helpful to know what WaitStrategy you are using, how many physical CPUs are in the machine, etc.
You should be able to significantly reduce CPU utilisation by moving to a different WaitStrategy given that your biggest latency will be IO writes.
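For example, assuming the same event type as the sketch above and the 3.3+ constructor form, strategies such as BlockingWaitStrategy or SleepingWaitStrategy burn far less CPU than busy-spin or yielding strategies while consumers are waiting on IO:

Disruptor<RawBlock> disruptor = new Disruptor<>(RawBlock::new, 1 << 14,
        Executors.defaultThreadFactory(), ProducerType.SINGLE, new SleepingWaitStrategy());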
Assuming you are using reasonably new hardware, you should be able to saturate the IO devices with only this setup.
You will also need to make sure the files are on different physical devices to achieve reasonable performance.

How to read only inner segments of a file using NIO

My batch process needs to read lines from huge files (1-3G), each of which can be processed independently of the others. The files can have 10-50M rows. I was thinking of spawning about a dozen threads, each of which would concurrently process a predetermined range of the buffer, e.g. T1 will read range 0-1, T2 range 1-2, etc. That means, of course, that T2 needs to jump straight to buffer position 2, without reading 0-2.
Is this type of segmentation of buffered file reading for the purposes of concurrency possible with Java NIO?
There is no point to this. The CPU may allow multiple threads but the disk is still single-threaded. All this will do is cause disk thrashing. Forget it.

What is the fastest way to write a large amount of data from memory to a file?

I have a program that generates a lot of data and puts it in a queue to write, but the problem is that it generates data faster than I can currently write it (causing it to max out memory and start to slow down). Order does not matter, as I plan to parse the file later.
I looked around a bit and found a few questions that helped me design my current process (but I still find it slow). Here's my code so far:
//...background multi-threaded process keeps building the solutions queue...
FileWriter writer = new FileWriter("foo.txt", true);
BufferedWriter bufferedWriter = new BufferedWriter(writer);
while (!solutions.isEmpty()) {
    String data = solutions.poll().data;
    bufferedWriter.newLine();
    bufferedWriter.write(data);
}
bufferedWriter.close();
I'm pretty new to programming, so I may be assessing this wrong (maybe it's a hardware issue, as I'm using EC2), but is there a way to very quickly dump the queue results into a file, or, if my approach is okay, can I improve it somehow? As order does not matter, does it make more sense to write to multiple files on multiple drives? Will threading make it faster? etc. I'm not exactly sure of the best approach, and any suggestions would be great. My goal is to save the results of the queue (sorry, no outputting to /dev/null :-) and keep memory consumption as low as possible for my app (I'm not 100% sure, but the queue fills up 15 GB, so I'm assuming it'll be a 15 GB+ file).
Fastest way to write huge data in text file Java (realized I should use buffered writer)
Concurrent file write in Java on Windows (made me see that maybe multi-threading writes wasn't a great idea)
Looking at that code, one thing that springs to mind is character encoding. You're writing strings, but ultimately, it's bytes that go to the streams. A writer does character-to-byte encoding under the hood, and it does it on the same thread that is handling the writing. That may mean time is being spent encoding that delays the writes, which could reduce the rate at which data is written.
A simple change would be to use a queue of byte[] instead of String, do the encoding in the threads which push onto the queue, and have the IO code use a BufferedOutputStream rather than a BufferedWriter.
This may also reduce memory consumption, if the encoded text takes up less than two bytes per character on average. For latin text and UTF-8 encoding, this will usually be true.
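A minimal sketch of that change, assuming the producers push already-encoded byte[] chunks onto a BlockingQueue (names and exception handling are illustrative):

BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

// producer side: encode once, on the thread that generated the data
// queue.put((data + "\n").getBytes(StandardCharsets.UTF_8));

// writer side: no per-write encoding, just byte copies into a buffered stream
try (BufferedOutputStream out =
         new BufferedOutputStream(new FileOutputStream("foo.txt", true))) {
    byte[] chunk;
    while ((chunk = queue.poll()) != null) {
        out.write(chunk);
    }
}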
However, I suspect it's likely that you're simply generating data faster than your IO subsystem can handle it. You will need to make your IO subsystem faster - either by using a faster one (if you're on EC2, perhaps renting a faster instance, or writing to a different backend - SQS vs EBS vs local disk, etc), or by ganging several IO subsystems together in parallel somehow.
Yes, writing multiple files on multiple drives should help, and if nothing else is writing to those drives at the same time, performance should scale linearly with the number of drives until I/O is no longer the bottleneck. You could also try a couple other optimizations to boost performance even more.
If you're generating huge files and the disk simply can't keep up, you can use a GZIPOutputStream to shrink the output--which, in turn, will reduce the amount of disk I/O. For non-random text, you can usually expect a compression ratio of at least 2x-10x.
//...background multi-threaded process keeps building the solutions queue...
OutputStream out = new FileOutputStream("foo.txt", true);
OutputStreamWriter writer = new OutputStreamWriter(new GZIPOutputStream(out));
BufferedWriter bufferedWriter = new BufferedWriter(writer);
while (!solutions.isEmpty()) {
    String data = solutions.poll().data;
    bufferedWriter.newLine();
    bufferedWriter.write(data);
}
bufferedWriter.close();
If you're outputting regular (i.e., repetitive) data, you might also want to consider switching to a different output format--for example, a binary encoding of the data. Depending on the structure of your data, it might be more efficient to store it in a database. If you're outputting XML and really want to stick to XML, you should look into a Binary XML format, such as EXI or Fast InfoSet.
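For instance, a compact binary record per entry instead of a text line (the fields here are made-up illustrations of what one queue entry might contain):

try (DataOutputStream out = new DataOutputStream(
         new BufferedOutputStream(new FileOutputStream("foo.bin", true)))) {
    out.writeInt(solutionId);      // hypothetical fields of one queue entry
    out.writeDouble(score);
    out.writeUTF(label);
}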
I guess that as long as you produce your data from calculations and do not load it from another data source, writing will always be slower than generating the data.
You can try writing your data to multiple files (not the same file, to avoid synchronization problems) from multiple threads (but I guess that will not fix your problem).
Is it possible for you to wait for the writing part of your application to catch up before continuing your calculations?
Another thing to check: do you actually empty your queue? Does solutions.poll() reduce your solutions queue?
Writing to different files using multiple threads is a good idea. Also, you should look into setting the BufferedWriter's buffer size, which you can do from the constructor. Try initializing it with a 10 MB buffer and see if that helps.
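For example, something like the following would give roughly a 10 MB buffer (note that the constructor's size argument is in chars, not bytes):

BufferedWriter bufferedWriter =
        new BufferedWriter(new FileWriter("foo.txt", true), 10 * 1024 * 1024);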

Channel for sharing data between threads

I have a requirement where I need to read a text file, transform it, and write it to some other file. I wish to do this in a parallel fashion: one thread for reading, one for transforming and another for writing.
Now, to share data between threads I need some channel. I was thinking of using a BlockingQueue for this, but would like to explore some other (better) alternatives if available.
Guava has an EventBus, but I'm not sure whether it's a good fit for the requirement. What other alternatives are available, and which one is best from a performance point of view?
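For reference, a minimal sketch of what that BlockingQueue wiring could look like; the thread layout, file names, POISON end-of-stream marker and transform step are illustrative, and error handling is omitted:

final String POISON = "\u0000EOF";                    // hypothetical end-of-stream marker
BlockingQueue<String> readQueue  = new ArrayBlockingQueue<>(10_000);
BlockingQueue<String> writeQueue = new ArrayBlockingQueue<>(10_000);
ExecutorService pool = Executors.newFixedThreadPool(3);

pool.submit(() -> {                                   // reader thread
    try (BufferedReader in = new BufferedReader(new FileReader("in.txt"))) {
        String line;
        while ((line = in.readLine()) != null) {
            readQueue.put(line);
        }
    }
    readQueue.put(POISON);
    return null;
});
pool.submit(() -> {                                   // transform thread
    String line;
    while (!(line = readQueue.take()).equals(POISON)) {
        writeQueue.put(transform(line));              // hypothetical transform step
    }
    writeQueue.put(POISON);
    return null;
});
pool.submit(() -> {                                   // writer thread
    try (BufferedWriter out = new BufferedWriter(new FileWriter("out.txt"))) {
        String line;
        while (!(line = writeQueue.take()).equals(POISON)) {
            out.write(line);
            out.newLine();
        }
    }
    return null;
});
pool.shutdown();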
Unless your transform step is really intensive, this is probably a waste of time.
Think of it this way. What are you asking for?
You're asking for something that
Takes an incoming stream of data
Copies it to another thread
Presents it to that thread as an incoming stream of data
What data structure best represents an incoming stream of data for step 3? (Hint: it's the InputStream you started with!)
What value do the first two steps add? The "transform" thread can read from disk just as fast as it could read from disk through another thread. Adding the thread in between does not speed up the disk read.
You would start to consider adding another thread when
Your problem can be usefully divided into independent pieces of work (say, each thread works on a chunk of text)
The cost of splitting the problem into those pieces of work is significantly smaller than the overhead of adding an additional thread and coordinating between them (which is small, but not free!)
The problem requires more resources than a single CPU can provide (a thread gives you access to more CPU resources, but doesn't provide much value in terms of I/O throughput)

Java file i/o throughput decline

I have a program in which each thread reads many lines at a time from a file, processes the lines, and writes the lines out to a different file. Four threads split the list of files to process among them. I'm having strange performance issues across two cases:
Four files with 50,000 lines each
Throughput starts at 700 lines/sec processed, declines to ~100 lines/sec
30,000 files with 12 lines each
Throughput starts around 800 lines/sec and remains steady
This is internal software I'm working on so unfortunately I can't share any source code, but the main steps of the program are:
1. Split the list of files among four worker threads.
2. Start all threads.
3. Each thread reads up to 100 lines at once and stores them in a String[] array.
4. The thread applies the transformation to all lines in the array.
5. The thread writes the lines to a file (not the same as the input file).
Steps 3-5 repeat for each thread until all files are completely processed.
What I don't understand is why 30k files with 12 lines each gives me greater performance than a few files with many lines each. I would have expected the overhead of opening and closing the files to be greater than that of reading a single file. In addition, the decline in performance in the former case is exponential in nature.
I've set the maximum heap size to 1024 MB and it appears to use 100 MB at most, so an overtaxed GC isn't the problem. Do you have any other ideas?
From your numbers, I guess that GC is probably not the issue. I suspect that this is a normal behavior of a disk, being operated on by many concurrent threads. When the files are big, the disk has to switch context between the threads many times (producing significant disk seek time), and the overhead is apparent. With small files, maybe they are read as a single chunk with no extra seek time, so threads do not interfere with each other too much.
When working with a single, standard disk, serial IO is usually better than parallel IO.
I am assuming that the files are located on the same disk, in which case you are probably thrashing the disk (or invalidating the disk/OS cache) with multiple threads attempting to read and write concurrently. A better pattern may be to have a dedicated reader/writer thread to handle IO, and then alter your pattern so that the job of transforming (which sounds expensive) is handled by multiple threads. Your IO thread can fetch and overlap writing with the transform operations as results become available. This should stop disk thrashing, and balance the IO and CPU sides of your pattern.
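One possible shape for that pattern is sketched below: a single thread owns all disk I/O, a pool does the transforms, and finished results are written out as they become available. readBatch, transform and writeBatch are hypothetical stand-ins for the existing steps, and exception handling is omitted:

ExecutorService transformers = Executors.newFixedThreadPool(4);
CompletionService<String[]> done = new ExecutorCompletionService<>(transformers);

int pending = 0;
String[] batch;
while ((batch = readBatch()) != null) {            // sequential reads stay on this one thread
    final String[] lines = batch;
    done.submit(() -> transform(lines));           // CPU-bound work goes to the pool
    pending++;
    Future<String[]> finished;
    while ((finished = done.poll()) != null) {     // overlap: write whatever is already done
        writeBatch(finished.get());
        pending--;
    }
}
while (pending-- > 0) {
    writeBatch(done.take().get());                 // drain the remaining results
}
transformers.shutdown();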
Have you tried running a Java profiler? That will point out what parts of your code are running the slowest. From this discussion, it seems like Netbeans profiler is a good one to check out.
Likely your thread is holding on to the buffered String[]s for too long. Even though your heap is much larger than you need, the throughput could be suffering due to garbage collection. Look at how long you're holding on to those references.
You might also be waiting while the VM allocates more memory: asking for -Xmx1024m doesn't allocate that much immediately; it grabs what it needs as more memory is required. You could also try -Xms1024m -Xmx1024m (i.e. allocate all of the memory at the start) to test whether that's the case.
You might have a stop-and-lock condition going on with your threads (one thread reads 100 lines into memory and holds onto the lock until it's done processing, instead of giving it up when it has finished reading from the file). I'm no expert on Java threading, but it's something to consider.
I would review this process. If you use BufferedReader and BufferedWriter there is no advantage to reading and processing 100 lines at a time. It's just added complication and another source of potential error. Do it one at a time and simplify your life.
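That is, just stream line by line and let the buffered streams do the batching; inputFile, outputFile and transform below are hypothetical stand-ins for the existing pieces:

try (BufferedReader in = new BufferedReader(new FileReader(inputFile));
     BufferedWriter out = new BufferedWriter(new FileWriter(outputFile))) {
    String line;
    while ((line = in.readLine()) != null) {
        out.write(transform(line));   // hypothetical per-line transformation
        out.newLine();
    }
}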
