Java server memory management on slow networks

Java server memory management on slow networks - java

I'm writing a code that generates a large XML document and writes it directly into a client stream using StAX XmlStreamWriter.
I'm afraid that if the network becomes extremely slow, the bytes written into the stream will actually stay in memory buffers for a relatively long time and consume a lot of memory on my server.
My question is: is there any way I can keep writing directly to the client stream, and avoid the potential memory problem I described above?

There wouldn't seem to be if you generate faster then you can stream out it has to be in memory. If this does become a major problem you would need to look at a way to move it out of memory such as generating a file but that still needs to be loaded and streamed by something. The main advantage of a file is if you can reuse the file for many requests.

Related

Using files for for shared memory IPC

In my application, there is one process which writes data to a file, and then, in response to receiving a request, will send (some) of that data via the network to the requesting process. The basis of this question is to see if we can speed up communication when both processes happen to be on the same host. (In my case, the processes are Java, but I think this discussion can apply more broadly.)
There are a few projects out there which use the MappedByteBuffers returned by Java's FileChannel.map() as a way to have shared memory IPC between JVMs on the same host (see Chronicle Queue, Aeron IPC, etc.).
One approach to speeding up same-host communication would be to have my application use one of those technologies to provide the request-response pathway for same-host communication, either in conjunction with the existing mechanism for writing to the data file, or by providing a unified means of both communication and writing to the file.
Another approach would be to allow the requesting process to have direct access to the data file.
I tend to favor the second approach - assuming it would be correct - as it would be easier to implement, and seems more efficient than copying/transmitting a copy of the data for each request (assuming we didn't replace the existing mechanism for writing to the file).
Essentially, I'd like to understanding what exactly occurs when two processes have access to the same file, and use it to communicate, specifically Java (1.8) and Linux (3.10).
From my understanding, it seems like if two processes have the same file open at the same time, the "communication" between them will essentially be via "shared memory".
Note that this question is not concerned with the performance implication of using a MappedByteBuffer or not - it seem highly likely that a using mapped buffers, and the reduction in copying and system calls, will reduce overhead compared to reading and writing the file, but that might require significant changes to the application.
Here is my understanding:
When Linux loads a file from disk, it copies the contents of that file to pages in memory. That region of memory is called the page cache. As far as I can tell, it does this regardless of which Java method (FileInputStream.read(), RandomAccessFile.read(), FileChannel.read(), FileChannel.map()) or native method is used to read the file (obseved with "free" and monitoring the "cache" value).
If another process attempts to load the same file (while it is still resident in the cache) the kernel detects this and doesn't need to reload the file. If the page cache gets full, pages will get evicted - dirty ones being written back out to the disk. (Pages also get written back out if there is an explicit flush to disk, and periodically, with a kernel thread).
Having a (large) file already in the cache is a significant performance boost, much more so than the differences based on which Java methods we use to open/read that file.
If a file is loaded using the mmap system call (C) or via FileChannel.map() (Java), essentially the file's pages (in the cache) are loaded directly into the process' address space. Using other methods to open a file, the file is loaded into pages not in the process' address space, and then the various methods to read/write that file copy some bytes from/to those pages into a buffer in the process' address space. There is an obvious performance benefit avoiding that copy, but my question isn't concerned with performance.
So in summary, if I understand correctly - while mapping offer a performance advantage, it doesn't seem like it offers any "shared memory" functionality that we don't already get just from the nature of the Linux and the page cache.
So, please let me know where my understanding is off.
Thanks.

My question is, on Java (1.8) and Linux (3.10), are MappedByteBuffers really necessary for implementing shared memory IPC, or would any access to a common file provide the same functionality?
It depends on why you want to implement shared memory IPC.
You can clearly implement IPC without shared memory; e.g. over sockets. So, if you are not doing it for performance reasons, it is not necessary to do shared memory IPC at all!
So performance has to be at the root of any discussion.
Access using files via the Java classic io or nio APIs does not provide shared memory functionality or performance.
The main difference between regular file I/O or Socket I/O versus shared memory IPC is that the former requires the applications to explicitly make read and write syscalls to send and receive messages. This entails extra syscalls, and entails the kernel copying data. Furthermore, if there are multiple threads you either need a separate "channel" between each thread pair or something to multiplexing multiple "conversations" over a shared channel. The latter can lead to the shared channel becoming a concurrency bottleneck.
Note that these overheads are orthogonal to the Linux page cache.
By contrast, with IPC implemented using shared memory, there are no read and write syscalls, and no extra copy step. Each "channel" can simply use a separate area of the mapped buffer. A thread in one process writes data into the shared memory and it is almost immediately visible to the second process.
The caveat is that the processes need to 1) synchronize, and 2) implement memory barriers to ensure that the reader doesn't see stale data. But these can both be implemented without syscalls.
In the wash-up, shared memory IPC using memory mapped files >>is<< faster than using conventional files or sockets, and that is why people do it.
You also implicitly asked if shared memory IPC can be implemented without memory mapped files.
A practical way would be to create a memory-mapped file for a file that lives in a memory-only file system; e.g. a "tmpfs" in Linux.
Technically, that is still a memory-mapped file. However, you don't incur overheads of flushing data to disk, and you avoid the potential security concern of private IPC data ending up on disk.
You could in theory implement a shared segment between two processes by doing the following:
In the parent process, use mmap to create a segment with MAP_ANONYMOUS | MAP_SHARED.
Fork child processes. These will end up all sharing the segment with each other and the parent process.
However, implementing that for a Java process would be ... challenging. AFAIK, Java does not support this.
Reference:
What is the purpose of MAP_ANONYMOUS flag in mmap system call?

Essentially, I'm trying to understand what happens when two processes have the same file open at the same time, and if one could use this to safely and performantly offer communication between to processes.
If you are using regular files using read and write operations (i.e. not memory mapping them) then the two processes do not share any memory.
User-space memory in the Java Buffer objects associated with the file is NOT shared across address spaces.
When a write syscall is made, data is copied from pages in one processes address space to pages in kernel space. (These could be pages in the page cache. That is OS specific.)
When a read syscall is made, data is copied from pages in kernel space to pages in the reading processes address space.
It has to be done that way. If the operating system shared pages associated with the reader and writer processes buffers behind their backs, then that would be an security / information leakage hole:
The reader would be able to see data in the writer's address space that had not yet been written via write(...), and maybe never would be.
The writer would be able to see data that the reader (hypothetically) wrote into its read buffer.
It would not be possible to address the problem by clever use of memory protection because the granularity of memory protection is a page versus the granularity of read(...) and write(...) which is as little as a single byte.
Sure: you can safely use reading and writing files to transfer data between two processes. But you would need to define a protocol that allows the reader to know how much data the writer has written. And the reader knowing when the writer has written something could entail polling; e.g. to see if the file has been modified.
If you look at this in terms of just the data copying in the communication "channel"
With memory mapped files you copy (serialize) the data from application heap objects to the mapped buffer, and a second time (deserialize) from the mapped buffer to application heap objects.
With ordinary files there are two additional copies: 1) from the writing processes (non-mapped) buffer to kernel space pages (e.g. in the page cache), 2) from the kernel space pages to the reading processes (non-mapped) buffer.
The article below explains what is going on with conventional read / write and memory mapping. (It is in the context of copying a file and "zero-copy", but you can ignore that.)
Reference:
Zero Copy I: User-Mode Perspective

Worth mentioning three points: performance, and concurrent changes, and memory utilization.
You are correct in the assessment that MMAP-based will usually offer performance advantage over file based IO. In particular, the performance advantage is significant if the code perform lot of small IO at artbitrary point of the file.
consider changing the N-th byte: with mmap buffer[N] = buffer[N] + 1, and with file based access you need (at least) 4 system calls + error checking:
seek() + error check
read() + error check
update value
seek() + error check
write + error check
It is true the the number of actual IO (to the disk) most likely be the same.
The second point worth noting the concurrent access. With file based IO, you have to worry about potential concurrent access. You will need to issue explicit locking (before the read), and unlock (after the write), to prevent two processes for incorrectly accessing the value at the same time. With shared memory, atomic operations can eliminate the need for additional lock.
The third point is actual memory usage. For cases where the size of the shared objects is significant, using shared memory can allow large number of processes to access the data without allocating additional memory. If systems constrained by memory, or system that need to provide real-time performance, this could be the only way to access the data.

Should I use memory mapped files for my simple flow?

I am interested in implemented the following simple flow:
Client sends to a server process a simple message which the server stores. Since the message does not have any hierarchical structure IMO the best approach is to save it in a file instead of an rdb.
But I want to figure out how to optimize this since as I see it there are 2 choices:
Server sends a 200 OK to the client and then stores the message so
the client does not notice any delay
Server saves the message and then sends the 200OK but then the
client notices the overhead of the file I/O.
I prefer the performance of (1) but this could lead to a client thinking all went ok when actually the msg was never saved (for various error cases).
So I was thinking if I could use nio and memory mapped files.
But I was wondering is this a good candidate for using mem-mapped files? Would using a memory mapped file guarantee that e.g. if the process crashed the msg would be saved?
In my mind the flow would be creating/opening and closing many files so is this a good candidate for memory-mapping files?

Server saves the message and then sends the 200OK but then the client notices the overhead of the file I/O.
I suggest you test this. I doubt a human will notice a 10 milli-second delay and I expect you should get better than this for smaller messages.
So I was thinking if I could use nio and memory mapped files.
I use memory mapping as it can reduce the overhead per write by up to 5 micro-second. Is this important to you? If not, I would stick with the simplest approach.
Would using a memory mapped file guarantee that e.g. if the process crashed the msg would be saved?
As long as the OS doesn't crash, yes.
In my mind the flow would be creating/opening and closing many files so is this a good candidate for memory-mapping files?
Opening and closing files is likely to be fast more expensive than writing the data. (By an order of magnitude) I would suggest keeping such operations to a minimum.
You might find this library of mine interesting. https://github.com/peter-lawrey/Java-Chronicle It allows you to persist messages in the order of single digit micro-seconds for text and sub-micro-second for a small binary message.

How do you write zero copy in java? What are the main differences

I was reading about how you can use the java nio library to take advantage of file transfer/buffering at the O/S level which is called 'zero copy'.
What are the differences in how you create/write to files then?
Are there any drawbacks to using zero-copy?

zero copy means that your program will not transfer the data from the kernel space to the user space and so on. this is faster
nice article can be found here:
https://developer.ibm.com/articles/j-zerocopy/

Zero copy is a technique where the application is no longer the 'middleman' in transferring data from a disk to the socket. Applications that use zero copy request that the kernel copy the data directly from the disk file to the socket, without going through the application, which improves performance and reduces context switches.
It all depends on what the application will do with the data it reads from disks. If it is a web application serving a lot of static content by reading files and relaying them over sockets, then zero copy is the way to go in order to get better performance. However, if the application is using the data locally (either crunching it in some way and then writing it back, or displaying it locally without persisting it back), you would not use zero copy.
This IBM DeveloperWorks article about zero copy is a good read.
Other ways of file I/O in java are via the use of Stream classes based on the type of file you would want to read/write. This involves both buffered and unbuffered streams, although usually buffered streams promise better performance since they cause less I/O seek cycles and hence lesser context switches.

Is it faster to port data from a C++ program to a Java program's input via sockets than via a raw json or xml file on the server?

What's the best way to go about things in terms of speed/performance?
Where do things like "Apache Thrift" come in and what are the benefits?
Please add some good resources I can use to learn about any recommendations!
Thanks all

Presuming you mean both processes are already running, then it's going to be via sockets.
Writing a file to the disk from one process then reading it from the other is going to incur the performance hit of the disk write and read (and of course whatever method you employ to keep the reader from accessing the file until it's done being written; either locks or an atomic rename on the disk).
Even ignoring that, your localhost interface is going to have a faster transfer rate than your disk controller, with the possible exception of a 10Gb fiber channel RAID array with 15k RPM drives in it.

Try it out. There's just no other way to find out.
Using sockets or the file system should be comparably fast, since both methods rely on some system calls that are very similar.
Always be aware that this communication involves these steps:
Encoding your data to a stream of bytes (JSON, XML, YAML, X.509 DER, Java Serialization)
Transferring this stream of bytes (TCP socket, UNIX socket, filesystem, ramdisk, pipes)
Decoding the stream of bytes into data (same as step 1)
Step 1 and 2 are completely independent, so take that into account when you benchmark.

Performance / stability of a Memory Mapped file - Native or MappedByteBuffer - vs. plain ol' FileOutputStream

I support a legacy Java application that uses flat files (plain text) for persistence. Due to the nature of the application, the size of these files can reach 100s MB per day, and often the limiting factor in application performance is file IO. Currently, the application uses a plain ol' java.io.FileOutputStream to write data to disk.
Recently, we've had several developers assert that using memory-mapped files, implemented in native code (C/C++) and accessed via JNI, would provide greater performance. However, FileOutputStream already uses native methods for its core methods (i.e. write(byte[])), so it appears a tenuous assumption without hard data or at least anecdotal evidence.
I have several questions on this:
Is this assertion really true?
Will memory mapped files always
provide faster IO compared to Java's
FileOutputStream?
Does the class MappedByteBuffer
accessed from a FileChannel provide
the same functionality as a native
memory mapped file library accessed
via JNI? What is MappedByteBuffer
lacking that might lead you to use a
JNI solution?
What are the risks of using
memory-mapped files for disk IO in a production
application? That is, applications
that have continuous uptime with
minimal reboots (once a month, max).
Real-life anecdotes from production
applications (Java or otherwise)
preferred.
Question #3 is important - I could answer this question myself partially by writing a "toy" application that perf tests IO using the various options described above, but by posting to SO I'm hoping for real-world anecdotes / data to chew on.
[EDIT] Clarification - each day of operation, the application creates multiple files that range in size from 100MB to 1 gig. In total, the application might be writing out multiple gigs of data per day.

Memory mapped I/O will not make your disks run faster(!). For linear access it seems a bit pointless.
A NIO mapped buffer is the real thing (usual caveat about any reasonable implementation).
As with other NIO direct allocated buffers, the buffers are not normal memory and wont get GCed as efficiently. If you create many of them you may find that you run out of memory/address space without running out of Java heap. This is obviously a worry with long running processes.

You might be able to speed things up a bit by examining how your data is being buffered during writes. This tends to be application specific as you would need an idea of the expected data writing patterns. If data consistency is important, there will be tradeoffs here.
If you are just writing out new data to disk from your application, memory mapped I/O probably won't help much. I don't see any reason you would want to invest time in some custom coded native solution. It just seems like too much complexity for your application, from what you have provided so far.
If you are sure you really need better I/O performance - or just O performance in your case, I would look into a hardware solution such as a tuned disk array. Throwing more hardware at the problem is often times more cost effective from a business point of view than spending time optimizing software. It is also usually quicker to implement and more reliable.
In general, there are a lot of pitfalls in over optimization of software. You will introduce new types of problems to your application. You might run into memory issues/ GC thrashing which would lead to more maintenance/tuning. The worst part is that many of these issues will be hard to test before going into production.
If it were my app, I would probably stick with the FileOutputStream with some possibly tuned buffering. After that I'd use the time honored solution of throwing more hardware at it.

From my experience, memory mapped files perform MUCH better than plain file access in both real time and persistence use cases. I've worked primarily with C++ on Windows, but Linux performances are similar, and you're planning to use JNI anyway, so I think it applies to your problem.
For an example of a persistence engine built on memory mapped file, see Metakit. I've used it in an application where objects were simple views over memory-mapped data, the engine took care of all the mapping stuff behind the curtains. This was both fast and memory efficient (at least compared with traditional approaches like those the previous version used), and we got commit/rollback transactions for free.
In another project I had to write multicast network applications. The data was send in randomized order to minimize the impact of consecutive packet loss (combined with FEC and blocking schemes). Moreover the data could well exceed the address space (video files were larger than 2Gb) so memory allocation was out of question. On the server side, file sections were memory-mapped on demand and the network layer directly picked the data from these views; as a consequence the memory usage was very low. On the receiver side, there was no way to predict the order into which packets were received, so it has to maintain a limited number of active views on the target file, and data was copied directly into these views. When a packet had to be put in an unmapped area, the oldest view was unmapped (and eventually flushed into the file by the system) and replaced by a new view on the destination area. Performances were outstanding, notably because the system did a great job on committing data as a background task, and real-time constraints were easily met.
Since then I'm convinced that even the best fine-crafted software scheme cannot beat the system's default I/O policy with memory-mapped file, because the system knows more than user-space applications about when and how data must be written. Also, what is important to know is that memory mapping is a must when dealing with large data, because the data is never allocated (hence consuming memory) but dynamically mapped into the address space, and managed by the system's virtual memory manager, which is always faster than the heap. So the system always use the memory optimally, and commits data whenever it needs to, behind the application's back without impacting it.
Hope it helps.

As for point 3 - if the machine crashes and there are any pages that were not flushed to disk, then they are lost. Another thing is the waste of the address space - mapping a file to memory consumes address space (and requires contiguous area), and well, on 32-bit machines it's a bit limited. But you've said about 100MB - so it should not be a problem. And one more thing - expanding the size of the mmaped file requires some work.
By the way, this SO discussion can also give you some insights.

If you write fewer bytes it will be faster. What if you filtered it through gzipoutputstream, or what if you wrote your data into ZipFiles or JarFiles?

As mentioned above, use NIO (a.k.a. new IO). There's also a new, new IO coming out.
The proper use of a RAID hard drive solution would help you, but that would be a pain.
I really like the idea of compressing the data. Go for the gzipoutputstream dude! That would double your throughput if the CPU can keep up. It is likely that you can take advantage of the now-standard double-core machines, eh?
-Stosh

I did a study where I compare the write performance to a raw ByteBuffer versus the write performance to a MappedByteBuffer. Memory-mapped files are supported by the OS and their write latencies are very good as you can see in my benchmark numbers. Performing synchronous writes through a FileChannel is approximately 20 times slower and that's why people do asynchronous logging all the time. In my study I also give an example of how to implement asynchronous logging through a lock-free and garbage-free queue for ultimate performance very close to a raw ByteBuffer.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.