Context
I am writing a Java program that communicates with a C# program through standard in and standard out. The C# program is started as a child process. It gets "requests" through stdin and sends "responses" through stdout. The requests are very lightweight (a few bytes in size), but the responses are large: in a normal run of the program, the responses amount to about 2 GB of data.
I am looking for ways to improve performance, and my measurements indicate that writing to stdout is a bottleneck. Here are the numbers from a normal run:
Total time: 195 seconds
Data transferred through stdout: 2026MB
Time spent writing to stdout: 85 seconds
stdout throughput: 23.8 MB/s
By the way, I write all the bytes to an in-memory buffer first and copy them to stdout in one go, to make sure I only measure the stdout write time.
Question
What is an efficient and elegant way to share data between the C# child process and the Java parent process? It is clear that stdout is not going to be enough.
I have read here and there about sharing memory through memory mapped files, but the Java and .NET APIs give me the impression that I'm looking in the wrong place.
Before you invest more in memory-mapped files or named pipes, I would first check whether you actually read and write efficiently. java.lang.Process.getInputStream() uses a BufferedInputStream, so the reader side should be OK. But in your C# program you will most likely use Console.Write. The problem here is that AutoFlush is enabled by default, so every single write explicitly flushes the stream. I wrote my last C# code years ago, so I'm not up to date, but maybe it is possible to set the AutoFlush property of Console.Out to false and flush the stream manually after multiple writes.
If disabling AutoFlush is not possible, the only remaining way to improve performance with Console.Out is to write more text per call.
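On the Java side, the corresponding check is to make sure you read from the child in large chunks rather than byte by byte. A minimal sketch (the handleResponseBytes consumer is a hypothetical placeholder):

import java.io.IOException;
import java.io.InputStream;

// Drain the child's stdout in large chunks to keep per-call overhead low.
static void drainStdout(Process child) throws IOException {
    try (InputStream in = child.getInputStream()) {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            handleResponseBytes(buf, n); // hypothetical consumer
        }
    }
}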
Another potential bottleneck may be a shell in between that has to interpret the written data. Ensure that you execute the C# program directly, and not through a script or by calling the command interpreter.
Before you start using memory-mapped files, I would first try simply writing to a regular file. As long as there is enough free memory not used by your programs or others, and no other program is doing frequent disk access, the operating system can hold quite a large amount of written data in the file system cache. As long as your Java program reads from the file fast enough while your C# program is writing to it, chances are high that only some, or even no, data has to be loaded from disk.
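A minimal sketch of the reader side under that scheme (the file name, termination check, and consumer are invented placeholders):

import java.io.FileInputStream;
import java.io.IOException;

// Tail-style reader: consume bytes as the child appends them to the shared file.
static void drainFile() throws IOException, InterruptedException {
    try (FileInputStream in = new FileInputStream("responses.bin")) {
        byte[] chunk = new byte[64 * 1024];
        while (moreDataExpected()) {      // hypothetical termination check
            int n = in.read(chunk);
            if (n > 0) process(chunk, n); // hypothetical consumer
            else Thread.sleep(1);         // the writer hasn't caught up; back off briefly
        }
    }
}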
As Matthew Watson mentioned in the comments, it is indeed possible and incredibly fast to use a memory mapped file. In fact, the throughput for my program went from 24 MB/s to 180 MB/s. Below is the gist of it.
The following Java code creates the memory mapped file used for communication and opens a buffer we can read from:
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

var path = Paths.get("test.mmap");
var channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
var mappedByteBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 200_000 * 8);
The following C# code opens the memory mapped file and creates a stream that you can use to write bytes to it (note that buffer is the name of the array of bytes to be written):
using System.IO;
using System.IO.MemoryMappedFiles;

// This code assumes the file has already been created on the Java side.
// The map name is null because the file is shared by path, not by name.
var file = File.Open("test.mmap", FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite);
var memoryMappedFile = MemoryMappedFile.CreateFromFile(file, null, 0, MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, false);
var stream = memoryMappedFile.CreateViewStream();
stream.Write(buffer, 0, buffer.Length);
stream.Flush();
Of course, you need to somehow synchronize the Java and the C# side. For the sake of simplicity, I didn't include that in the code above. In my code, I am using standard in and standard out to signal when it is safe to read / write.
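As a rough illustration of that signaling (the "length on a line" protocol here is invented, not the only option; process is the child Process object, assumed started elsewhere, and mappedByteBuffer is the buffer from the Java snippet above):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// One request/response round trip over the pipes, payload via the mapped file.
var toChild = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
var fromChild = new BufferedReader(new InputStreamReader(process.getInputStream()));

toChild.write("REQUEST 42\n"); // hypothetical request format
toChild.flush();

int length = Integer.parseInt(fromChild.readLine()); // child reports how many bytes it wrote
mappedByteBuffer.position(0);
byte[] response = new byte[length];
mappedByteBuffer.get(response); // the payload comes from shared memory, not the pipe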
Related
I am creating a process using the Java runtime on a Solaris OS. I then get the input stream from the process and read from it. I expect the process's output stream to be huge (I am not too sure about the process, it is a 3rd-party thing), but it seems to be clipped. Could it be that there is a threshold on the Java side as to how much a process can have in its output stream?
Thanks,
Abdul
There is no limit to the amount of data you can read, if you read repeatedly. You cannot read more than 2 GB at once, and some stream types might only give you a few KB at a time; e.g. a slow Socket will often give you 1.5 KB or less (based on the MTU of the connection).
If you call int read(byte[]), it is only guaranteed to read 1 byte. It is a common mistake to assume you will read the full buffer every time. If you need that, you can use DataInputStream.readFully(byte[])
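For example (a sketch; both methods fill the whole buffer or fail, whereas a single read() may legally return fewer bytes):

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// The convenient way: delegate the looping to DataInputStream.
static void readWholeBuffer(InputStream in, byte[] buf) throws IOException {
    new DataInputStream(in).readFully(buf); // throws EOFException if the stream ends early
}

// The equivalent manual loop, for comparison.
static void readWholeBufferByHand(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n == -1) throw new EOFException();
        off += n;
    }
}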
By "process output stream" do you mean STDOUT? STDERR? Or you have an OutputStream object that you direct to somewhere? (a file?)
If you write to a file - you might see clipped data if you don't close your output stream. As long as you go by the book (outputstream.close() when you are done writing) you are good to go. Notice that there are some underlying limitations like Storage space (obvious) or file system limitations (some limit the file size).
If you write to STDOUT/STDERR - As far as I know you are fine. Notice again that if you write your output to a terminal, or through Eclipse (for example), then they might have a buffer and therefore limit your output (but then, it's most likely that you'll get the first part of data missing and not the last part of it).
You shouldn't run into limitations on InputStream or OutputStream if it is properly implemented. The most likely resource to run into limitations on is memory when allocating objects either from the input or to the output - for example trying to read a 100GB file into memory to then write to an output. If you need to load very large objects into memory to or from a stream, make sure to use a 64bit JVM and allocate as much memory to it as you can, however testing is the only way to determine the ideal values.
Being bored earlier today, I started thinking a bit about the relative performance of buffered and unbuffered byte streams in Java. As a simple test, I downloaded a reasonably large text file and wrote a short program to determine the effect that buffered streams have when copying the file. Four tests were performed:
Copying the file using unbuffered input and output byte streams.
Copying the file using a buffered input stream and an unbuffered output stream.
Copying the file using an unbuffered input stream and a buffered output stream.
Copying the file using buffered input and output streams.
Unsurprisingly, using buffered input and output streams is orders of magnitude faster than using unbuffered streams. However, the really interesting thing (to me at least) was the difference in speed between cases 2 and 3. Some sample results are as follows:
Unbuffered input, unbuffered output
Time: 36.602513585
Buffered input, unbuffered output
Time: 26.449306847
Unbuffered input, buffered output
Time: 6.673194184
Buffered input, buffered output
Time: 0.069888689
For those interested, the code is available here at Github. Can anyone shed any light on why the times for cases 2 and 3 are so asymmetric?
When you read a file, the filesystem and the devices below it do various levels of caching. They almost never read one byte at a time; they read a block. On a subsequent read of the next byte, the block will be in cache, and so it will be much faster.
It stands to reason, then, that if your buffer size is the same as the block size, buffering the input stream doesn't actually gain you all that much (it saves a few system calls, but in terms of actual physical I/O it doesn't save you too much).
When you write a file, the filesystem can't cache for you, because you haven't given it a backlog of things to write. It could potentially buffer the output for you, but it would have to make an educated guess at how often to flush the buffer. By buffering the output yourself, you let the device do much more work at once, because you manually build up that backlog.
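For reference, the two extreme cases from the test look roughly like this (a sketch; the file names are invented):

import java.io.*;

// Case 1: unbuffered copy; every read() and write() is a system call.
static void copyUnbuffered(String src, String dst) throws IOException {
    try (InputStream in = new FileInputStream(src);
         OutputStream out = new FileOutputStream(dst)) {
        int b;
        while ((b = in.read()) != -1) out.write(b);
    }
}

// Case 4: fully buffered copy; system calls happen once per 8 KB buffer.
static void copyBuffered(String src, String dst) throws IOException {
    try (InputStream in = new BufferedInputStream(new FileInputStream(src));
         OutputStream out = new BufferedOutputStream(new FileOutputStream(dst))) {
        int b;
        while ((b = in.read()) != -1) out.write(b);
    }
}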
To your title question: it is more effective to buffer the output. The reason is the way hard disk drives (HDDs) write data to their sectors, especially on fragmented disks. Reading is much faster because the disk already knows where the data is, whereas when writing it has to determine where the data will fit. With a buffer, the disk can find a larger contiguous area of blank space to store the data than it can in the unbuffered case.
Run another test for giggles: create a new partition on your disk and run your tests reading and writing to that clean slate. To compare apples to apples, format the newly created partition between tests. Please post your numbers afterwards if you run the tests.
Generally, writing is more tedious for the computer because it cannot cache it the way reading can. Generally, it is much like real life: reading is faster and easier than writing!
I have a program that generates a lot of data and puts it in a queue to write, but the problem is that it generates data faster than I'm currently writing it (causing it to max out memory and start to slow down). Order does not matter, as I plan to parse the file later.
I looked around a bit and found a few questions that helped me design my current process(but I still find it slow). Here's my code so far:
//...background multi-threaded process keeps building the solutions queue...
FileWriter writer = new FileWriter("foo.txt", true);
BufferedWriter bufferWriter = new BufferedWriter(writer);
while (!solutions.isEmpty()) {
    String data = solutions.poll().data;
    bufferWriter.newLine();
    bufferWriter.write(data);
}
bufferWriter.close();
I'm pretty new to programming, so I may be assessing this wrong (maybe it's a hardware issue, as I'm using EC2), but is there a way to very quickly dump the queue results into a file? And if my approach is okay, can I improve it somehow? As order does not matter, does it make more sense to write to multiple files on multiple drives? Will threading make it faster? Etc. I'm not exactly sure of the best approach, and any suggestions would be great. My goal is to save the results of the queue (sorry, no outputting to /dev/null :-) and to keep memory consumption as low as possible for my app (I'm not 100% sure, but the queue fills up 15 GB, so I'm assuming it'll be a 15 GB+ file).
Fastest way to write huge data in text file Java (realized I should use buffered writer)
Concurrent file write in Java on Windows (made me see that maybe multi-threading writes wasn't a great idea)
Looking at that code, one thing that springs to mind is character encoding. You're writing strings, but ultimately it's bytes that go to the streams. A writer performs character-to-byte encoding under the hood, and it does so in the same thread that handles the writing. That may mean time is spent encoding that delays the writes, which could reduce the rate at which data is written.
A simple change would be to use a queue of byte[] instead of String, do the encoding in the threads that push onto the queue, and have the IO code use a BufferedOutputStream rather than a BufferedWriter.
This may also reduce memory consumption, if the encoded text takes up less than two bytes per character on average. For Latin text and UTF-8 encoding, this will usually be true.
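A sketch of that change (the queue type and file name are assumptions):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Queue;

// Producers encode once, off the IO thread, e.g.:
//   queue.add((data + "\n").getBytes(StandardCharsets.UTF_8));

// The IO thread then moves raw bytes, with no per-write encoding cost.
static void drain(Queue<byte[]> queue) throws IOException {
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream("foo.txt", true))) {
        byte[] chunk;
        while ((chunk = queue.poll()) != null) {
            out.write(chunk);
        }
    }
}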
However, I suspect it's likely that you're simply generating data faster than your IO subsystem can handle. You will need to make the IO subsystem faster, either by using a quicker one (if you're on EC2, perhaps renting a faster instance, or writing to a different backend: SQS vs EBS vs local disk, etc.), or by ganging several IO subsystems together in parallel somehow.
Yes, writing multiple files on multiple drives should help, and if nothing else is writing to those drives at the same time, performance should scale linearly with the number of drives until I/O is no longer the bottleneck. You could also try a couple other optimizations to boost performance even more.
If you're generating huge files and the disk simply can't keep up, you can use a GZIPOutputStream to shrink the output, which in turn reduces the amount of disk I/O. For non-random text, you can usually expect a compression ratio of at least 2x-10x.
//...background multi-threaded process keeps building the solutions queue...
OutputStream out = new FileOutputStream("foo.txt", true);
OutputStreamWriter writer = new OutputStreamWriter(new GZIPOutputStream(out));
BufferedWriter bufferWriter = new BufferedWriter(writer);
while (!solutions.isEmpty()) {
    String data = solutions.poll().data;
    bufferWriter.newLine();
    bufferWriter.write(data);
}
bufferWriter.close();
If you're outputting regular (i.e., repetitive) data, you might also want to consider switching to a different output format, for example a binary encoding of the data. Depending on the structure of your data, it might be more efficient to store it in a database. If you're outputting XML and really want to stick with XML, you should look into a binary XML format, such as EXI or Fast Infoset.
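For instance, if each record were a single number, a DataOutputStream would cut out the text formatting entirely (a sketch; the file name and the double-valued records are assumptions):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Fixed-width binary output: no string formatting now, no parsing later.
try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("foo.bin", true)))) {
    out.writeDouble(3.14159); // always 8 bytes per record
}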
I guess that as long as you produce your data from calculations and do not load it from another data source, writing will always be slower than generating the data.
You can try writing your data to multiple files (not to the same file, due to synchronization problems) from multiple threads (but I guess that will not fix your problem).
Is it possible for you to wait for the writing part of your application to finish its operation before continuing your calculations?
Another thing to check: do you actually empty your queue? Does solutions.poll() reduce your solutions queue?
Writing to different files using multiple threads is a good idea. Also, you should look into setting the BufferedWriter's buffer size, which you can do from the constructor. Try initializing it with a 10 MB buffer and see if that helps.
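For example (the 10 MB figure is the suggestion above, not a measured optimum):

import java.io.BufferedWriter;
import java.io.FileWriter;

// The second constructor argument sets the buffer size (in chars).
BufferedWriter bufferWriter = new BufferedWriter(new FileWriter("foo.txt", true), 10 * 1024 * 1024);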
Possible Duplicate:
In Java, what is the advantage of using BufferedWriter to append to a file?
The site that I am looking at says
"The BufferWriter class is used to write text to a character-output stream, buffering characters so as to provide for the efficient writing of single characters, arrays, and strings."
What make's it more efficient and why?
BufferedWriter is more efficient because it uses a buffer rather than writing character by character, which reduces the number of I/O operations on the disk. Data is collected in the buffer and written to the file when the buffer is full.
This is why sometimes no data ends up in the file if you didn't call the flush method: the data was collected in the buffer, but the program exited before writing it to the file. Calling the flush method causes the data to be written to the file even if the buffer is not completely full.
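A small example of that behaviour (the file name is invented):

import java.io.BufferedWriter;
import java.io.FileWriter;

try (BufferedWriter out = new BufferedWriter(new FileWriter("out.txt"))) {
    out.write("hello"); // still sitting in the in-memory buffer
    out.flush();        // now "hello" is actually written to the file
}                       // close() also flushes whatever remains in the buffer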
The cost of writing becomes expensive when you write to the file character by character. To reduce that cost, buffers are provided. When you write to a buffer, it waits until some limit is reached and then writes the whole chunk to the disk.
A BufferedWriter waits until its buffer (8192 characters by default) is full and writes the whole buffer in one disk operation. Unbuffered, every single write would result in a disk I/O, which is obviously more expensive.
Hard disks have a minimum unit of information storage, so, for example, if you are writing a single byte, the operating system asks the disk to store a whole unit of storage (I think the minimum is 512 bytes). So you ask to write one byte and the operating system writes much more. If you store 512 bytes with 512 calls, you end up doing a lot more I/O (512 disk operations) than if you buffer the 512 bytes and issue a single call (1 disk operation).
As the name suggests, BufferedWriter uses a buffer to reduce the cost of writes. If you are writing to a file, you might know that writing 1 byte and writing 4 kilobytes cost roughly the same: the time required for such a write is dominated by the access time (~8 ms), which is the time the disk needs to rotate and seek the right sector.
Additionally, aggregating small writes into a bigger one allows you to reduce the overhead on the operating system, achieving better performance.
Most operating systems do have an internal buffer to cache writes. However, those caches try to figure out what the application is doing by analyzing its write patterns. If the application itself is able to do that caching, and performs a write only when the data is ready, the result (in terms of performance) is better.
Let's say one program is reading file F.txt, and another program is writing to this file at the same moment.
(When I think about how I would implement this functionality if I were a systems programmer) I realize that there can be ambiguity in:
what will the first program see?
where does the second program write new bytes? (i.e. write "in place" vs write to a new file and then replace the old file with the new one)
how many programs can write to the same file simultaneously?
.. and maybe something not so obvious.
So, my questions are:
what are the main strategies for reading/writing files functionality?
which of them are supported in which OS (Windows, Linux, Mac OS etc)?
can it depend on the programming language? (I suppose that Java may try to provide some unified behavior across all supported OSes)
A single byte read has a long journey from the magnetic platter/flash cell to your local Java variable. This is the path a single byte travels:
Magnetic platter/flash cell
Internal hard disk buffer
SATA/IDE bus
SATA/IDE buffer
PCI/PCI-X bus
Computer's data bus
Computer's RAM via DMA
OS Page-cache
Libc read buffer, aka user space fopen() read buffer
Local Java variable
For performance reasons, most of the file buffering done by the OS is kept in the page cache, storing the contents of recently read and written files in RAM.
That means that every read and write operation from your Java code is done from and to your local buffer:
import java.io.FileInputStream;

FileInputStream fis = new FileInputStream("/home/vz0/F.txt");

// This byte comes from the user space buffer.
int oneByte = fis.read();
A page is usually a single block of 4 KB of memory. Every page has some special flags and attributes, one of them being the "dirty" flag, which means the page contains modified data that has not yet been written to physical media.
Some time later, when the OS decides to flush the dirty data back to the disk, it sends the data in the opposite direction from where it came.
Whenever two distinct processes write data to the same file, the resulting behaviour is:
Impossible, if the file is locked: the second process won't be able to open the file.
Undefined, if writing over the same region of the file.
Expected, if operating over different regions of the file.
A "region" is dependant on the internal buffer sizes that your application uses. For example, on a two megabytes file, two distinct processes may write:
One on the first 1kB of data (0; 1024).
The other on the last 1kB of data (2096128; 2097152).
Buffer overlapping and data corruption would occur only if the local buffers were two megabytes in size. In Java you can use channel I/O (java.nio.channels.FileChannel) to read and write files with fine-grained control of what's going on inside.
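For instance, the two disjoint 1 kB writes above can be expressed as explicit positional writes (a sketch; the zero-filled payloads are placeholders):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel ch = FileChannel.open(Paths.get("F.txt"),
        StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    ch.write(ByteBuffer.wrap(new byte[1024]), 0);         // first 1 kB
    ch.write(ByteBuffer.wrap(new byte[1024]), 2_096_128); // last 1 kB of the 2 MB file
}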
Many transactional databases force some writes from the local RAM buffers back to disk by issuing a sync operation: all the data related to a single file gets flushed back to the magnetic platters or flash cells, effectively ensuring that no data will be lost on power failure.
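In Java, the analogous operation is FileChannel.force (or FileDescriptor.sync); for instance (the file name and payload are invented):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel ch = FileChannel.open(Paths.get("journal.bin"),
        StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    ch.write(ByteBuffer.wrap("commit".getBytes(StandardCharsets.UTF_8)));
    ch.force(true); // block until the data (and metadata) reach the device
}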
Finally, a memory mapped file is a region of memory that enables a user process to read and write directly from and to the page cache, bypassing the user space buffering.
The page cache is vital to the performance of a multitasking protected-mode OS, and every modern operating system (Windows NT upwards, Linux, macOS, *BSD) supports all these features.
http://ezinearticles.com/?How-an-Operating-Systems-File-System-Works&id=980216
There can be as many strategies as there are file systems. Generally, the OS tries to avoid I/O operations by caching file writes before they are synchronized with the disk, and reads from that cache see the previously written data. So between the software and the hardware there is a layer of buffering (the MySQL MyISAM engine, for example, makes heavy use of this layer).
The JVM synchronizes a file descriptor's buffers to disk when the file is closed or when the program invokes a method like FileDescriptor.sync(), but the buffers may also be synchronized by the OS when they exceed defined thresholds. In the JVM this is, of course, unified across all supported OSes.