Does log4j use NIO for writing data to a file?

It seems to be pretty fast already, and I was just wondering if anyone knows whether it uses NIO. I tried searching the whole source code for NIO (a crude way to search, I know), but did not hit anything. Also, if it's not using NIO, do you think it's worthwhile to modify log4j to use NIO to make it even faster? Any pointers, advice, and links to resources would be very helpful.

Also, if it's not using NIO, do you think it's worthwhile to modify log4j to use NIO to make it even faster?
No, unless logging is a significant part of your application's activities, in which case there is usually something wrong.
You seem to be under the impression that NIO 'is faster', but this is not true in general. Just try creating two files, one with standard IO and one with NIO, writing a bunch of data to them and closing them. You'll see the performance hardly differs. NIO only performs better in certain use cases, most commonly when handling many connections.
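A minimal sketch of that experiment (file names and sizes are arbitrary); on most machines the two timings come out very close:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class IoVsNioWrite {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];

        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream("io.dat")) {
            for (int i = 0; i < 1024; i++) {
                out.write(data); // 64 MB through classic blocking java.io
            }
        }
        System.out.printf("java.io:  %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(Paths.get("nio.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            for (int i = 0; i < 1024; i++) {
                buf.rewind();
                ch.write(buf); // the same 64 MB through an NIO channel
            }
        }
        System.out.printf("java.nio: %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}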

Check out the FileAppender source. Pretty much just standard java.io.

I don't see any reason why FileChannel would be any faster than FileOutputStream in this case.
Maybe by using a MappedByteBuffer? But in append mode, the behavior is OS-dependent.
Ultimately, the performance depends on your hard drive; your code matters very little.

Elaborating on Confusion's answer, File NIO blocks as well. That's why it is not faster than traditional File IO in some scenarios. Quoting O'Reilly's Java NIO book:
File channels are always blocking and cannot be placed into nonblocking mode. Modern operating systems have sophisticated caching and prefetch algorithms that usually give local disk I/O very low latency. Network filesystems generally have higher latencies but often benefit from the same optimizations. The nonblocking paradigm of stream-oriented I/O doesn't make as much sense for file-oriented operations because of the fundamentally different nature of file I/O. For file I/O, the true winner is asynchronous I/O, which lets a process request one or more I/O operations from the operating system but does not wait for them to complete. The process is notified at a later time that the requested I/O has completed. Asynchronous I/O is an advanced capability not available on many operating systems. It is under consideration as a future NIO enhancement.
Edit: With that said, you can get better read/write efficiency if you use File NIO with a MappedByteBuffer. Note that using MappedByteBuffer in Log4j 2 is under consideration.
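For illustration, a minimal sketch of appending log lines through a MappedByteBuffer (this is not Log4j's code; the file name, region size, and remapping policy are assumptions):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedWrite {
    public static void main(String[] args) throws IOException {
        byte[] line = "log line\n".getBytes(StandardCharsets.US_ASCII);
        try (RandomAccessFile file = new RandomAccessFile("mapped.log", "rw");
             FileChannel channel = file.getChannel()) {
            // Map a fixed-size region up front; a real appender would remap when it fills.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
            while (buf.remaining() >= line.length) {
                buf.put(line); // writes land in the page cache, not straight on disk
            }
            buf.force(); // ask the OS to flush the dirty pages
        }
    }
}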

Fast and scalable IO passthrough

Original
I have a batch job that passes the content it downloads from various URLs through to S3 storage. I am currently using blocking IO and have reached a point where the job spends most of its time blocked waiting on IO. So, in order to speed up the whole process, I was thinking about using non-blocking IO.
Unfortunately I wasn't able to find utility code for passing through content from one set of channels to another set of channels. Since I read that writing correct non-blocking code is not exactly easy, I would prefer to use an existing utility/framework rather than write that code myself.
The TransferManager seems to be the only option for higher throughput when using the AWS SDK, but it only offers the use of streams and seems to use IO threads in the background. Apparently there is no out-of-the-box option for non-blocking uploads to S3.
What would you recommend? Right now I can only imagine the following solutions.
1. Stay with blocking IO and use my own IO thread pool.
2. Use non-blocking IO to download the files to the local filesystem, then upload with TransferManager.
3. Use non-blocking IO for the pass-through.
Option 1 will obviously not scale, and option 2 will probably work for a while, but I would really like to keep my IO on EBS low, so I'd rather use option 3.
To successfully implement option 3 I guess I would have to implement a lot myself, so my final question is whether you think it's worth it and, if so, which tools I could use to make this work.
Edit 1
Clarifying that by IO bound I actually meant that the job is mostly waiting for IO. Here you can see that my bandwidth is not really saturated, but I would like it to be if that is possible.
If your job is I/O bound, you're done: you're bound by the speed of the network, not of your code. Using NIO won't make it any faster.
Clarifying that by IO bound I actually meant that the job is mostly waiting for IO.
Yes, that's what 'I/O bound' means. Nothing has changed.
Here you can see that my bandwidth is not really saturated, but I would like it to be if that is possible.
You could try using larger buffers, but as I said, it seems to me that you're already at the limit.
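If you want to experiment with that, a sketch of a plain stream-to-stream pass-through where the buffer size is the knob to turn (the class and method names are mine; 256 KB is an arbitrary starting point):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class Streams {
    // Copy with a tunable buffer; try 256 * 1024 and adjust from there.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}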

Read/Write efficiency

I have Scala (+ Java) code which reads and writes at a certain rate. Profiling tells me how much time each method in the code takes to execute. How do I measure whether my program is reaching its maximum efficiency, i.e. reading and writing at the maximum speed possible with the given configuration? I know this is hardware-specific and varies from machine to machine, but is there a shortcut to check whether my program is reading and writing at the fastest rate the hardware allows? (I'm using FileWriter along with BufferedWriter.)
Given that description, your best option might be to experiment. Things you could try measuring:
Changing the buffer size (this didn't help me when I tried it)
Switching to NIO (might help for large files)
Caching the data you read (might help for small files), and caching directory contents if there are many files in one directory; opening a file gets slower as the number of files in a folder grows.
One way to make sure the code itself is not the problem is to profile it, get the CPU-time distribution tree for your methods, and expand the execution paths that take most of the time. If all of these paths lead into the Java standard libraries, you're probably close to your best performance.
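One crude hardware baseline to compare against is a raw sequential write using the same stream classes the application uses; a sketch (file name and sizes are arbitrary):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;

public class WriteBaseline {
    public static void main(String[] args) throws IOException {
        char[] chunk = new char[8192];
        Arrays.fill(chunk, 'x');
        long chars = 0;
        long start = System.nanoTime();
        try (BufferedWriter w = new BufferedWriter(new FileWriter("baseline.txt"), 1 << 16)) {
            for (int i = 0; i < 10_000; i++) { // roughly 80 MB of text
                w.write(chunk);
                chars += chunk.length;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.1f MB/s%n", chars / (1024.0 * 1024.0) / seconds);
    }
}

If your application writes at a rate close to this number, the bottleneck is the hardware, not your code.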
UPDATE
Some other observations and techniques, based on the hprof output you provided.
With a profiler or some other technique (I prefer stopwatches, as they give more stable and realistic results) you need to find your bottleneck.
Most of the IO can be optimized to use a single reusable buffer; this is less painful with Guava or Apache Commons IO.
However, there is not much you can do if Jackson in your serialization chain turns out to be the bottleneck, short of changing the algorithm.
There are slow players (compared to raw filesystem IO), e.g. formatters (String.format is very slow), Jackson, etc.; see the sketch after this list.
There are also typical slow operations around IO, e.g. buffer allocations and string concatenation; too many allocated char[] buffers is a smell that the IO code needs optimizing.
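To illustrate the String.format point, a hypothetical hot path (the names and format are made up) rewritten by hand; both variants produce the same output, but the second skips pattern parsing and argument boxing on every call:

public final class LogLine {
    // Slow: String.format re-parses the pattern and boxes the arguments each time.
    static String slow(int id, double value) {
        return String.format("id=%d value=%s%n", id, value);
    }

    // Faster: plain StringBuilder appends, no pattern parsing, no boxing.
    static String fast(int id, double value) {
        return new StringBuilder(32)
                .append("id=").append(id)
                .append(" value=").append(value)
                .append(System.lineSeparator())
                .toString();
    }
}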

Advantages of Java NIO in blocking mode versus traditional I/O?

I have pretty much already decided not to use asynchronous, non-blocking Java NIO. The complexity versus benefit is very questionable in general, and I think it's not worth it in this project particularly.
But most of what I read about NIO, and comparisons with the older java.io.*, focuses on non-blocking, asynchronous NIO versus thread-per-connection synchronous I/O using java.io.*. However, NIO can also be used in synchronous, blocking, thread-per-connection mode, which, it seems, is rarely discussed.
Here's the question: Is there any performance advantage of synchronous, blocking NIO versus traditional synchronous, blocking I/O (java.io.*)? Both would be thread-per-connection. How does the complexity compare?
Note that this is a general question, but at the moment I am primarily concerned with TCP socket communication.
An advantage of NIO over "traditional" IO is that NIO can use direct buffers that allow the OS to use DMA for some operations (e.g. reading from a network connection directly into a memory-mapped file) and thereby avoid copying data to intermediate buffers.
If you're moving large amounts of data in a scenario where this technique does avoid copy operations that would otherwise be performed, this can have a big impact on performance.
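A sketch of the technique as a generic channel-to-channel copy (the class and method names are mine; for file-to-socket transfers, FileChannel.transferTo can avoid even this buffer):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public final class DirectCopy {
    // A direct buffer lives outside the Java heap, so the JVM can hand its
    // address straight to the underlying OS calls without an intermediate byte[] copy.
    static void copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        while (in.read(buf) != -1) {
            buf.flip();
            while (buf.hasRemaining()) {
                out.write(buf);
            }
            buf.clear();
        }
    }
}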
It basically boils down to the number of concurrent connections and how busy those connections are. Blocking (standard thread-per-connection) is faster, both in latency and throughput (about twice as fast for a simple echo server). So if your system can cope with maintaining a thread for each connection (<1000 connections as a rule of thumb), go for the blocking approach. If you have lots of mostly idle connections (e.g. Comet long-poll requests or IMAP idle connections), then switching to a non-blocking architecture could help your system scale.
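For reference, the blocking thread-per-connection shape that comparison refers to; a minimal echo server sketch (port and buffer size are arbitrary):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingEchoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                Socket socket = server.accept();
                new Thread(() -> echo(socket)).start(); // one thread per connection
            }
        }
    }

    static void echo(Socket socket) {
        try (Socket s = socket;
             InputStream in = s.getInputStream();
             OutputStream out = s.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // blocking read and write on this connection's thread
            }
        } catch (IOException ignored) {
            // connection dropped; nothing more to do for an echo server
        }
    }
}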
I cannot speak to this technology in particular, but it is not unusual for asynchronous libraries to provide synchronous operations to facilitate debugging.
For instance, if you are having problems, you can eliminate the asynchronous portions of the logic without rewriting your entire process. This is especially helpful since synchronous processes are typically much easier to work with.

How do I get Java to use my multi-core processor with GZIPInputStream?

I'm using a GZIPInputStream in my program, and I know that the performance would be helped if I could get Java running my program in parallel.
In general, is there a command-line option for the standard VM to run on many cores? It's running on just one as it is.
Thanks!
Edit
I'm running plain ol' Java SE 6 update 17 on Windows XP.
Would putting the GZIPInputStream on a separate thread explicitly help? No! Do not put the GZIPInputStream on a separate thread! Do NOT multithread I/O!
Edit 2
I suppose I/O is the bottleneck, as I'm reading and writing to the same disk...
In general, though, is there a way to make GZIPInputStream faster? Or a replacement for GZIPInputStream that runs parallel?
Edit 3
Code snippet I used:
GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(INPUT_FILENAME));
DataInputStream in = new DataInputStream(new BufferedInputStream(gzip));
AFAIK the action of reading from this stream is single-threaded, so multiple CPUs won't help you if you're reading one file.
You could, however, have multiple threads, each unzipping a different file.
That being said, unzipping is not particularly computation-intensive these days; you're more likely to be blocked by the cost of IO (e.g., if you are reading two very large files in two different areas of the disk).
More generally (assuming this is a question of someone new to Java), Java doesn't do things in parallel for you. You have to use threads to tell it what are the units of work that you want to do and how to synchronize between them. Java (with the help of the OS) will generally take as many cores as is available to it, and will also swap threads on the same core if there are more threads than cores (which is typically the case).
PIGZ (Parallel Implementation of GZip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data: http://www.zlib.net/pigz/. It's not Java yet; any takers? Of course the world needs it in Java.
Sometimes compression or decompression is a big CPU consumer, though it helps keep the I/O from becoming the bottleneck.
See also DataSeries (C++) from HP Labs. PIGZ only parallelizes the compression, while DataSeries breaks the output into large compressed blocks, which are decompressible in parallel. It also has a number of other features.
Wrap your GZIP streams in Buffered streams, this should give you a significant performance increase.
OutputStream out = new BufferedOutputStream(
        new GZIPOutputStream(
                new FileOutputStream(myFile)));
And likewise for the input stream. Using the buffered input/output streams reduces the number of disk reads.
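Applied to the snippet from the question, the extra buffer goes underneath the GZIPInputStream so the decompressor reads the disk in large chunks (INPUT_FILENAME as in the question):

GZIPInputStream gzip = new GZIPInputStream(
        new BufferedInputStream(new FileInputStream(INPUT_FILENAME))); // buffers the raw disk reads
DataInputStream in = new DataInputStream(new BufferedInputStream(gzip));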
I'm not seeing any answer addressing the other processing of your program.
If you're just unzipping a file, you'd be better off simply using the command line gunzip tool; but likely there's some processing happening with the files you're pulling out of that stream.
If you're extracting something that comes in reasonably sized chunks, then your processing of those chunks should be happening in a separate thread from the unzipping.
You could manually start a Thread on each large String or other block of data, but since Java 1.6 or so you'd be better off with one of the fancy new classes in java.util.concurrent, such as a ThreadPoolExecutor.
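A sketch of that split, assuming (hypothetically) line-oriented data: the unzipping stays on one thread while the processing fans out to a pool.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPInputStream;

public class UnzipAndProcess {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("input.gz"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String chunk = line;
                pool.execute(() -> process(chunk)); // processing runs off the unzip thread
            }
        }
        pool.shutdown();
    }

    static void process(String chunk) {
        // application-specific work goes here
    }
}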
Update
It's not clear to me from the question and other comments whether you really ARE just extracting files using Java. If you really, really think you should try to compete with gunzip, then you can probably gain some performance by using large buffers; i.e. work with a buffer of, say, 10 MB (binary, not decimal, i.e. 10 × 1,048,576 bytes), fill it in a single gulp and write it to disk likewise. That will give your OS a chance to do some medium-scale planning for disk space, and you'll need fewer system-level calls too.
Compression seems like a hard case for parallelization because the bytes emitted by the compressor are a non-trivial function of the previous W bytes of input, where W is the window size. You can, however, break a file into pieces and create independent compression streams for each piece, each running in its own thread, as sketched below. You may need to retain some compression metadata so the decompressor knows how to put the file back together.
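A sketch of that idea. Each piece becomes an independent gzip member, and concatenated gzip members are themselves a valid gzip stream, so little extra metadata is strictly needed (though note that older GZIPInputStream versions stop after the first member):

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

public class ParallelGzip {
    public static void main(String[] args) throws Exception {
        List<byte[]> pieces = new ArrayList<>(); // hypothetical input, pre-split into chunks
        for (int i = 0; i < 8; i++) {
            pieces.add(new byte[1 << 20]);
        }

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<byte[]>> compressed = new ArrayList<>();
        for (byte[] piece : pieces) {
            compressed.add(pool.submit(() -> gzip(piece))); // each piece compressed in parallel
        }

        try (FileOutputStream out = new FileOutputStream("output.gz")) {
            for (Future<byte[]> f : compressed) {
                out.write(f.get()); // write the members back in their original order
            }
        }
        pool.shutdown();
    }

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }
}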
Compression and decompression using gzip is a serialized process. To use multiple threads you would have to write a custom program to break up the input file into many streams, and then a custom program to decompress and join them back together. Either way, IO is going to be a bottleneck way before CPU usage is.
Run multiple VMs. Each VM is a process and you should be able to run at least three processes per core without suffering any drop in performance. Of course, your application would have to be able to leverage multiprocessing in order to benefit. There is no magic bullet which is why you see articles in the press moaning about how we don't yet know how to use multicore machines.
However, there are lots of people out there who have structured their applications into a master which manages a pool of worker processes and parcels out work packages to them. Not all problems are amenable to being solved this way.
I think it is a mistake to assume that multithreading IO is always evil. You probably need to profile your particular case to be sure, because:
Recent operating systems use currently free memory for the cache, so the files you read may actually be served from RAM rather than the hard drive.
Recent drives such as SSDs have much faster access times, so changing the read location is much less of an issue.
The question is too general to assume we are reading from a single hard drive.
You may need to tune your read buffer to make it large enough to reduce the switching costs. In the extreme case, one could read all the files into memory and decompress them there in parallel, which would be faster and lose nothing to IO multithreading. However, something less extreme may also work well.
You also do not need to do anything special to use multiple available cores on JRE. Different threads will normally use different cores as managed by the operating system.
You can't parallelize the standard GZIPInputStream; it is single-threaded. But you can pipeline decompression and processing of the decompressed stream into different threads, i.e. set up the GZIPInputStream as a producer and whatever processes it as a consumer, and connect them with a bounded blocking queue.
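A minimal sketch of that producer/consumer arrangement (file name, chunk size, and queue depth are arbitrary):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.GZIPInputStream;

public class PipelinedUnzip {
    private static final byte[] POISON = new byte[0]; // end-of-stream marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16); // bounded: producer blocks when full

        Thread producer = new Thread(() -> {
            try (GZIPInputStream in = new GZIPInputStream(new FileInputStream("input.gz"))) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    byte[] chunk = new byte[n];
                    System.arraycopy(buf, 0, chunk, 0, n);
                    queue.put(chunk);
                }
                queue.put(POISON);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        // Consumer runs on this thread, overlapping processing with decompression.
        byte[] chunk;
        while ((chunk = queue.take()) != POISON) {
            process(chunk);
        }
    }

    static void process(byte[] chunk) {
        // application-specific work goes here
    }
}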

Performance / stability of a Memory Mapped file - Native or MappedByteBuffer - vs. plain ol' FileOutputStream

I support a legacy Java application that uses flat files (plain text) for persistence. Due to the nature of the application, the size of these files can reach hundreds of MB per day, and often the limiting factor in application performance is file IO. Currently, the application uses a plain ol' java.io.FileOutputStream to write data to disk.
Recently, we've had several developers assert that using memory-mapped files, implemented in native code (C/C++) and accessed via JNI, would provide greater performance. However, FileOutputStream already uses native methods for its core methods (i.e. write(byte[])), so it appears a tenuous assumption without hard data or at least anecdotal evidence.
I have several questions on this:
1. Is this assertion really true? Will memory-mapped files always provide faster IO compared to Java's FileOutputStream?
2. Does the MappedByteBuffer class, accessed from a FileChannel, provide the same functionality as a native memory-mapped file library accessed via JNI? What is MappedByteBuffer lacking that might lead you to use a JNI solution?
3. What are the risks of using memory-mapped files for disk IO in a production application, that is, an application with continuous uptime and minimal reboots (once a month, max)? Real-life anecdotes from production applications (Java or otherwise) preferred.
Question #3 is important - I could answer this question myself partially by writing a "toy" application that perf tests IO using the various options described above, but by posting to SO I'm hoping for real-world anecdotes / data to chew on.
[EDIT] Clarification: each day of operation, the application creates multiple files ranging in size from 100 MB to 1 GB. In total, the application might write out several GB of data per day.
Memory mapped I/O will not make your disks run faster(!). For linear access it seems a bit pointless.
A NIO mapped buffer is the real thing (usual caveat about any reasonable implementation).
As with other NIO direct allocated buffers, these buffers are not normal heap memory and won't get GCed as efficiently. If you create many of them you may find that you run out of memory/address space without running out of Java heap. This is obviously a worry with long-running processes.
You might be able to speed things up a bit by examining how your data is being buffered during writes. This tends to be application specific as you would need an idea of the expected data writing patterns. If data consistency is important, there will be tradeoffs here.
If you are just writing out new data to disk from your application, memory mapped I/O probably won't help much. I don't see any reason you would want to invest time in some custom coded native solution. It just seems like too much complexity for your application, from what you have provided so far.
If you are sure you really need better I/O performance, or just O performance in your case, I would look into a hardware solution such as a tuned disk array. Throwing more hardware at the problem is often more cost-effective from a business point of view than spending time optimizing software. It is also usually quicker to implement and more reliable.
In general, there are a lot of pitfalls in over-optimizing software. You will introduce new types of problems into your application. You might run into memory issues or GC thrashing, which would lead to more maintenance and tuning. The worst part is that many of these issues will be hard to test before going into production.
If it were my app, I would probably stick with the FileOutputStream with some possibly tuned buffering. After that I'd use the time honored solution of throwing more hardware at it.
From my experience, memory-mapped files perform MUCH better than plain file access in both real-time and persistence use cases. I've worked primarily with C++ on Windows, but Linux performance is similar, and you're planning to use JNI anyway, so I think it applies to your problem.
For an example of a persistence engine built on a memory-mapped file, see Metakit. I've used it in an application where objects were simple views over memory-mapped data; the engine took care of all the mapping stuff behind the curtains. This was both fast and memory-efficient (at least compared with traditional approaches like those the previous version used), and we got commit/rollback transactions for free.
In another project I had to write multicast network applications. The data was sent in randomized order to minimize the impact of consecutive packet loss (combined with FEC and blocking schemes). Moreover, the data could well exceed the address space (video files were larger than 2 GB), so memory allocation was out of the question. On the server side, file sections were memory-mapped on demand and the network layer picked the data directly from these views; as a consequence the memory usage was very low. On the receiver side, there was no way to predict the order in which packets would be received, so it had to maintain a limited number of active views on the target file, and data was copied directly into these views. When a packet had to be put in an unmapped area, the oldest view was unmapped (and eventually flushed to the file by the system) and replaced by a new view on the destination area. Performance was outstanding, notably because the system did a great job of committing data as a background task, and real-time constraints were easily met.
Since then I'm convinced that even the best fine-crafted software scheme cannot beat the system's default I/O policy with memory-mapped files, because the system knows more than user-space applications about when and how data must be written. Also, what is important to know is that memory mapping is a must when dealing with large data, because the data is never allocated (hence consuming memory) but dynamically mapped into the address space and managed by the system's virtual memory manager, which is always faster than the heap. So the system always uses the memory optimally, and commits data whenever it needs to, behind the application's back without impacting it.
Hope it helps.
As for point 3: if the machine crashes and there are pages that were not flushed to disk, they are lost. Another thing is the waste of address space: mapping a file to memory consumes address space (and requires a contiguous area), and on 32-bit machines that is a bit limited. But since you said about 100 MB, it should not be a problem. One more thing: expanding the size of the mmapped file requires some work.
By the way, this SO discussion can also give you some insights.
If you write fewer bytes, it will be faster. What if you filtered the data through a GZIPOutputStream, or wrote it into ZipFiles or JarFiles?
As mentioned above, use NIO (a.k.a. New IO). There's also a new, new IO (NIO.2) coming out.
The proper use of a RAID hard drive solution would help you, but that would be a pain.
I really like the idea of compressing the data. Go for the GZIPOutputStream, dude! That would double your throughput if the CPU can keep up. It is likely that you can take advantage of the now-standard dual-core machines, eh?
-Stosh
I did a study comparing write performance to a raw ByteBuffer versus write performance to a MappedByteBuffer. Memory-mapped files are supported by the OS, and their write latencies are very good, as you can see in my benchmark numbers. Performing synchronous writes through a FileChannel is approximately 20 times slower, and that's why people do asynchronous logging all the time. In my study I also give an example of how to implement asynchronous logging through a lock-free and garbage-free queue, for ultimate performance very close to a raw ByteBuffer.
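This is not the study's actual code, but a rough sketch of the shape of such a comparison (file names, message size, and iteration count are arbitrary):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedVsChannel {
    public static void main(String[] args) throws IOException {
        byte[] msg = new byte[256];
        int count = 100_000;

        try (RandomAccessFile f = new RandomAccessFile("mapped.dat", "rw");
             FileChannel ch = f.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) msg.length * count);
            long t0 = System.nanoTime();
            for (int i = 0; i < count; i++) {
                map.put(msg); // effectively a memory copy into the page cache
            }
            System.out.printf("mapped:  %d ns/write%n", (System.nanoTime() - t0) / count);
        }

        try (RandomAccessFile f = new RandomAccessFile("channel.dat", "rw");
             FileChannel ch = f.getChannel()) {
            ByteBuffer buf = ByteBuffer.wrap(msg);
            long t0 = System.nanoTime();
            for (int i = 0; i < count; i++) {
                buf.rewind();
                ch.write(buf); // a system call on every write
            }
            System.out.printf("channel: %d ns/write%n", (System.nanoTime() - t0) / count);
        }
    }
}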
