Fast non-blocking read/writes using MappedByteBuffer? - java

I am processing messages from a vendor as a stream of data and want to store the msgSeqNum in a local file. Reason:
They send a msgSeqNum to uniquely identify each message, and they provide a 'sync-and-stream' functionality to stream messages from a given sequence number on reconnection. Say the msgSeqNum starts from 1, my connection went down at msgSeqNum 50, and I missed the next 100 messages (the vendor server's current msgSeqNum is now 150); then when I reconnect to the vendor, I need to call 'sync-and-stream' with msgSeqNum=50 to get the missed 100 messages.
So I want to understand how I can persist the msgSeqNum locally for fast access. I assume
1) Since the read/writes happen frequently i.e. while processing every message (read to ignore dups, write to update msgSeqNum after processing a msg), I think it's best to use Java NIO's 'MappedByteBuffer'?
2) Could someone confirm if the below code is best for this where I expose the mapped byte buffer object to be reused for reads and writes and leave the FileChannel open for the lifetime of the process? Sample Junit code below:
I know this could be achieved with general Java file operations to read and write a file, but I need something fast, as close to non-IO as possible, since I am using a single-writer pattern and want to process these messages quickly in a non-blocking manner.
private FileChannel fileChannel = null;
private MappedByteBuffer mappedByteBuffer = null;
private Charset utf8Charset = null;
private CharBuffer charBuffer = null;
@Before
public void setup() {
    try {
        charBuffer = CharBuffer.allocate( 24 ); // Long.MIN_VALUE is at most 20 characters anyway
        System.out.println( "charBuffer length: " + charBuffer.length() );
        Path pathToWrite = getFileURIFromResources();
        fileChannel = (FileChannel) Files   // assign the field; don't shadow it with a local, or destroy() will NPE
                .newByteChannel( pathToWrite, EnumSet.of(
                        StandardOpenOption.READ,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING ));
        mappedByteBuffer = fileChannel
                .map( FileChannel.MapMode.READ_WRITE, 0, charBuffer.length() );
        utf8Charset = Charset.forName( "utf-8" );
    } catch ( Exception e ) {
        // handle it
    }
}
@After
public void destroy() {
    try {
        fileChannel.close();
    } catch ( IOException e ) {
        // handle it
    }
}
@Test
public void testWriteAndReadUsingSharedMappedByteBuffer() {
    if ( mappedByteBuffer != null ) {
        mappedByteBuffer.put( utf8Charset.encode( CharBuffer.wrap( "101" ) ) ); // TODO improve this and try reusing the same buffer instead of creating a new one
    } else {
        System.out.println( "mappedByteBuffer null" );
        fail();
    }
    mappedByteBuffer.flip();
    assertEquals( "101", utf8Charset.decode( mappedByteBuffer ).toString() );
}
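For what it's worth, if the only thing being persisted is a single sequence number, a fixed-width binary layout avoids the charset encode/decode and the flip/position bookkeeping on every message. A minimal sketch under that assumption; the SequenceStore name and file handling are mine, not from the post:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SequenceStore implements AutoCloseable {

    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    public SequenceStore( Path path ) throws IOException {
        channel = FileChannel.open( path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE );
        // 8 bytes is enough for one long; the mapping is reused for the lifetime of the process
        buffer = channel.map( FileChannel.MapMode.READ_WRITE, 0, Long.BYTES );
    }

    public void write( long msgSeqNum ) {
        buffer.putLong( 0, msgSeqNum );  // absolute put: no flip/rewind needed
    }

    public long read() {
        return buffer.getLong( 0 );
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}

Absolute putLong/getLong calls never move the buffer's position, so the same mapped buffer can be shared by the reads and writes described above; note that durability across a power loss still depends on the OS writing the dirty page back (or an explicit force()).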

Related

Parsing files over 2.15 GB in Java using Kaitai Struct

I'm parsing large PCAP files in Java using Kaitai-Struct. Whenever the file size exceeds Integer.MAX_VALUE bytes I face an IllegalArgumentException caused by the size limit of the underlying ByteBuffer.
I haven't found references to this issue elsewhere, which leads me to believe that this is not a library limitation but a mistake in the way I'm using it.
Since the problem is caused by trying to map the whole file into the ByteBuffer I'd think that the solution would be mapping only the first region of the file, and as the data is being consumed map again skipping the data already parsed.
As this is done within the Kaitai Struct runtime library, it would mean writing my own class extending from KaitaiStream and overriding the auto-generated fromFile(...) method, which doesn't really seem like the right approach.
The auto-generated method to parse from a file for the PCAP class is:
public static Pcap fromFile(String fileName) throws IOException {
    return new Pcap(new ByteBufferKaitaiStream(fileName));
}
And the ByteBufferKaitaiStream provided by the Kaitai Struct Runtime library is backed by a ByteBuffer.
private final FileChannel fc;
private final ByteBuffer bb;

public ByteBufferKaitaiStream(String fileName) throws IOException {
    fc = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
    bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
}
Which in turn is limited by the ByteBuffer maximum size.
Am I missing some obvious workaround? Is it really a limitation of the Java implementation of Kaitai Struct?
There are two separate issues here:
1) Running Pcap.fromFile() for large files is generally not a very efficient approach, as you'll eventually get the whole file parsed into memory at once. An example of how to avoid that is given in kaitai_struct/issues/255. The basic idea is that you want control over how you read every packet, and then dispose of each packet after you've parsed / accounted for it somehow.
2) The 2 GB limit on Java's mmapped files. To mitigate that, you can use the alternative RandomAccessFile-based KaitaiStream implementation, RandomAccessFileKaitaiStream: it might be slower, but it should avoid that 2 GB problem. A usage sketch follows this list.
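A minimal sketch of that second option, assuming the generated Pcap class from the question exposes the usual constructor that accepts a KaitaiStream (which is what the generated fromFile helper wraps):

import java.io.IOException;
import io.kaitai.struct.RandomAccessFileKaitaiStream;

public static Pcap fromLargeFile(String fileName) throws IOException {
    // Reads through a RandomAccessFile instead of mmapping the whole file,
    // so the Integer.MAX_VALUE mapping limit never applies.
    return new Pcap(new RandomAccessFileKaitaiStream(fileName));
}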
This library provides a ByteBuffer implementation which uses long offsets. I haven't tried this approach, but it looks promising. See the section "Mapping Files Bigger than 2 GB":
http://www.kdgregory.com/index.php?page=java.byteBuffer
public int getInt(long index)
{
    return buffer(index).getInt();
}

private ByteBuffer buffer(long index)
{
    // pick the segment containing the offset, then position within it
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}

public MappedFileBuffer(File file, int segmentSize, boolean readWrite)
throws IOException
{
    if (segmentSize > MAX_SEGMENT_SIZE)
        throw new IllegalArgumentException(
            "segment size too large (max " + MAX_SEGMENT_SIZE + "): " + segmentSize);

    _segmentSize = segmentSize;
    _fileSize = file.length();

    RandomAccessFile mappedFile = null;
    try
    {
        String mode = readWrite ? "rw" : "r";
        MapMode mapMode = readWrite ? MapMode.READ_WRITE : MapMode.READ_ONLY;

        mappedFile = new RandomAccessFile(file, mode);
        FileChannel channel = mappedFile.getChannel();

        _buffers = new MappedByteBuffer[(int)(_fileSize / segmentSize) + 1];
        int bufIdx = 0;
        for (long offset = 0 ; offset < _fileSize ; offset += segmentSize)
        {
            long remainingFileSize = _fileSize - offset;
            // each segment is mapped at up to twice its nominal size, so a multi-byte
            // value that starts near the end of a segment can still be read from one buffer
            long thisSegmentSize = Math.min(2L * segmentSize, remainingFileSize);
            _buffers[bufIdx++] = channel.map(mapMode, offset, thisSegmentSize);
        }
    }
    finally
    {
        // close quietly; the mappings stay valid after the channel is closed
        if (mappedFile != null)
        {
            try
            {
                mappedFile.close();
            }
            catch (IOException ignored) { /* */ }
        }
    }
}
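Illustrative usage of the excerpt above (the file path and the 64 MB segment size are arbitrary choices of mine): reads take a long offset, and the class selects the right mapped segment internally.

File pcapFile = new File("/path/to/huge.pcap");
MappedFileBuffer buf = new MappedFileBuffer(pcapFile, 64 * 1024 * 1024, false);

int first = buf.getInt(0L);              // read at the start of the file
int later = buf.getInt(3_000_000_000L);  // offsets beyond Integer.MAX_VALUE work too, assuming the file is that large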

How are these methods allowing / causing data to be lost on disk?

I have a program that writes its settings and data out to disk every so often (15 seconds or so).
If the program is running and the computer is shut off abruptly -- for example, with the power being cut at the wall -- somehow all of my data files on disk are changed to empty files.
Here is my code, which I thought I designed to protect against this failure, but based on testing the failure still exists:
SaveAllData -- Called every so often, and also when JavaFX.Application.stop() is called.
public void saveAllData () {
    createNecessaryFolders();
    saveAlbumsAndTracks();
    saveSources();
    saveCurrentList();
    saveQueue();
    saveHistory();
    saveLibraryPlaylists();
    saveSettings();
    saveHotkeys();
}
CreateNecessaryFolders
private void createNecessaryFolders () {
    if ( !playlistsDirectory.exists() ) {
        boolean playlistDir = playlistsDirectory.mkdirs();
    }
}
Save Functions -- they all look just like this
public void saveCurrentList () {
    File tempCurrentFile = new File ( currentFile.toString() + ".temp" );
    try ( ObjectOutputStream currentListOut = new ObjectOutputStream( new FileOutputStream( tempCurrentFile ) ) ) {
        currentListOut.writeObject( player.getCurrentList().getState() );
        currentListOut.flush();
        currentListOut.close();
        Files.move( tempCurrentFile.toPath(), currentFile.toPath(), StandardCopyOption.REPLACE_EXISTING );
    } catch ( Exception e ) {
        LOGGER.warning( e.getClass().getCanonicalName() + ": Unable to save current list to disk, continuing." );
    }
}
GitHub repository at the commit where this problem exists; see Persister.java.
As I said, when the power is cut abruptly, all settings files saved by these methods are blanked. This makes no sense to me, since the save functions are called in sequence and I make sure the file is written to disk and flushed before calling move().
Any idea how this could be happening? I thought that by calling flush, close, then move, I would ensure the data is written to disk before overwriting the old data. Somehow this isn't the case, but I am clueless. Any suggestions?
Note: these files are only written to by these functions, and only read from by the corresponding load() functions. There is no other access to the files anywhere else in my program.
Note 2: I am experiencing this on Ubuntu Linux 16.10. I have not tested it on other platforms yet.
Adding StandardCopyOption.ATOMIC_MOVE to the Files.move() call solves the problem:
public void saveCurrentList () {
    File tempCurrentFile = new File ( currentFile.toString() + ".temp" );
    try ( ObjectOutputStream currentListOut = new ObjectOutputStream( new FileOutputStream( tempCurrentFile ) ) ) {
        currentListOut.writeObject( player.getCurrentList().getState() );
        currentListOut.flush();
        currentListOut.close();
        Files.move( tempCurrentFile.toPath(), currentFile.toPath(), StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE );
    } catch ( Exception e ) {
        LOGGER.warning( e.getClass().getCanonicalName() + ": Unable to save current list to disk, continuing." );
    }
}
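ATOMIC_MOVE makes the rename itself all-or-nothing, but on a power cut the temp file's contents can still be sitting in the OS page cache. A stricter variant (my sketch, not part of the original fix) also forces the bytes to the device before renaming, at the cost of one sync per save:

public void saveCurrentList () {
    File tempCurrentFile = new File ( currentFile.toString() + ".temp" );
    try ( FileOutputStream fileOut = new FileOutputStream( tempCurrentFile );
          ObjectOutputStream currentListOut = new ObjectOutputStream( fileOut ) ) {
        currentListOut.writeObject( player.getCurrentList().getState() );
        currentListOut.flush();
        fileOut.getFD().sync(); // push the temp file's bytes to the device before the rename
        Files.move( tempCurrentFile.toPath(), currentFile.toPath(),
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE );
    } catch ( Exception e ) {
        LOGGER.warning( e.getClass().getCanonicalName() + ": Unable to save current list to disk, continuing." );
    }
}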

Serialization Overwriting Data

I have a program I'm making for a small business which implements Serializable on a linked list to save data. This all works fine until two staff members try to add more data to the list and one ends up overwriting the other.
JButton btnSaveClientFile = new JButton("Save Client File");
btnSaveClientFile.addMouseListener(new MouseAdapter() {
    @Override
    public void mouseClicked(MouseEvent arg0) {
        // add new items to list
        jobList.add(data);
        .
        .
        .
        Controller.saveData();
    }
});
btnSaveClientFile.setBounds(10, 229, 148, 23);
frame.getContentPane().add(btnSaveClientFile);
This method results in one overwriting the other, so I tried doing it like this
JButton btnSaveClientFile = new JButton("Save Client File");
btnSaveClientFile.addMouseListener(new MouseAdapter() {
    @Override
    public void mouseClicked(MouseEvent arg0) {
        Controller.retrieveData();
        // add new items to list
        jobList.add(data);
        .
        .
        .
        Controller.saveData();
    }
});
btnSaveClientFile.setBounds(10, 229, 148, 23);
frame.getContentPane().add(btnSaveClientFile);
And when I use this one, I get no data added to the list at all. Here are my Serialization methods. This one is used to save my data.
// methods to serialize data
public static void saveData() {
    System.out.println("Saving...");
    FileOutputStream fos = null;
    ObjectOutputStream oos = null;
    try {
        fos = new FileOutputStream("Data.bin");
        oos = new ObjectOutputStream(fos);
        oos.writeObject(myOLL);
        oos.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
And this one is used to collect my data
public static void retrieveData() {
    // Get data from disk
    System.out.println("Loading...");
    FileInputStream fis = null;
    ObjectInputStream ois = null;
    try {
        fis = new FileInputStream("Data.bin");
        ois = new ObjectInputStream(fis);
        myOLL = (OrderedLinkedList) ois.readObject();
        ois.close();
    } catch (Exception ex) {
        System.err.println("File cannot be found");
        ex.printStackTrace();
    }
}
How do I make it so I can save data to my file from two different computers at a similar time, without one overwriting the other?
This is a demo (not meant to be used in this crude form) of how to acquire a lock on the file /tmp/data.
RandomAccessFile raf = new RandomAccessFile( "/tmp/data", "rw" );
FileChannel chan = raf.getChannel();
FileLock lock = null;
while ( (lock = chan.tryLock()) == null ) { // tryLock returns null while another program holds the lock
    System.out.println( "waiting for file" );
    Thread.sleep( 1000 );
}
System.out.println( "using file" );
Thread.sleep( 3000 ); // simulate some work while holding the lock
System.out.println( "done" );
lock.release();
Clearly, reading a sequential file, mulling over it for some time and then rewriting or not is prohibitive if you require a high level of concurrency. That's why such applications typically use database systems, the client-server paradigm. A free-for-all on the file system isn't tolerable except in rare circumstances. Your organization may be able to assign updates of the data to one person at a time, which would simplify matters.
add more data to the list and one ends up overwriting the other.
This is how files work by default, in fact the ObjectOutputStream doesn't support an "append" mode. Once you have closed the stream, you can't alter it.
How do I make it so I can save data to my file from two different computers at a similar time, without one overwriting the other?
You have two problems here:
1) how to write to a file twice without losing information;
2) how to co-ordinate writes between processes without one impacting the other.
For the first part, you need to read the contents of the list first, add the entries you want to add, and write out the contents again. Or you can change the file format to one which supports appending.
For the second part, you need to use locking of some kind. A simple way to do this is to create a lock file: create a second file atomically, e.g. file.lock, and whichever process succeeds in creating it holds the lock; that process alters the data file and deletes the lock file when finished. Some care needs to be taken to ensure you always remove the lock. A sketch of this follows below.
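A minimal sketch of that lock-file idea, assuming both machines see the same shared directory and reusing the question's Controller methods (the file names and retry delay are arbitrary):

import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

static void saveWithLockFile() throws IOException, InterruptedException {
    Path lockFile = Paths.get("Data.bin.lock");
    // Acquire the lock: Files.createFile is atomic and fails if the file already exists.
    while (true) {
        try {
            Files.createFile(lockFile);
            break; // we created it, so we hold the lock
        } catch (FileAlreadyExistsException alreadyLocked) {
            Thread.sleep(500); // another process holds it; retry shortly
        }
    }
    try {
        Controller.retrieveData(); // read the current list from Data.bin
        // ... add the new entries here ...
        Controller.saveData();     // write the merged list back
    } finally {
        Files.deleteIfExists(lockFile); // always release the lock, even on failure
    }
}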
Another approach is to use file locks. You have to take care not to delete the file in the process however this has the benefit that the OS will clean up the lock if your process dies.

How do you write to disk (with flushing) in Java and maintain performance?

Using the following code as a benchmark, the system can write 10,000 rows to disk in a fraction of a second:
void withSync() {
    int f = open( "/tmp/t8" , O_RDWR | O_CREAT );
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30);
        fsync(f);
    }
    clock_t uend = clock();
    close (f);
    printf(" sync() seconds:%lf writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
In the above code, 10,000 records can be written and flushed out to disk in a fraction of a second, output below:
sync() seconds:0.006268 writes per second:0.000002
In the Java version, it takes over 4 seconds to write 10,000 records. Is this just a limitation of Java, or am I missing something?
public void testFileChannel() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
    FileChannel c = raf.getChannel();
    c.force(true);
    ByteBuffer b = ByteBuffer.allocateDirect(64*1024);
    long s = System.currentTimeMillis();
    for(int i=0;i<10000;i++){
        b.clear();
        b.put("012345678901234567890123456789".getBytes());
        b.flip();
        c.write(b);
        c.force(false);
    }
    long e = System.currentTimeMillis();
    raf.close();
    System.out.println("With flush "+(e-s));
}
Returns this:
With flush 4263
Please help me understand what is the correct/fastest way to write records to disk in Java.
Note: I am using the RandomAccessFile class in combination with a ByteBuffer as ultimately we need random read/write access on this file.
Actually, I am surprised that test is not slower. The behaviour of force is OS-dependent, but broadly it commits the data to disk. If you have an SSD you might achieve 40K writes per second, but with an HDD you won't. The C example clearly isn't committing the data to disk, as even the fastest SSD cannot perform more than about 235K IOPS (that's the rate the manufacturers guarantee it won't exceed :D).
If you need the data committed to disk every time, you can expect it to be slow and entirely dependent on the speed of your hardware. If you only need the data flushed to the OS, so that nothing is lost if the program crashes but the OS does not, you can write without force. A faster option is to use memory-mapped files; this gives you random access without a system call for each record.
I have a library, Java Chronicle, which can read/write 5-20 million records per second with a latency of 80 ns, in text or binary formats, with random access, and it can be shared between processes. It only works this fast because it is not committing the data to disk on every record, but you can test that if the JVM crashes at any point, no data written to the chronicle is lost.
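A minimal sketch of the memory-mapped option mentioned above (the file name and fixed record layout are my choices): each record is written straight into the mapping, so the hot loop makes no write/force system calls, and the data is flushed once at the end.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public static void testMappedFile() throws IOException {
    byte[] record = "012345678901234567890123456789".getBytes();
    int records = 10000;
    try (FileChannel c = FileChannel.open(Paths.get("/tmp/t6"),
            StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        MappedByteBuffer map = c.map(FileChannel.MapMode.READ_WRITE, 0, (long) records * record.length);
        long s = System.nanoTime();
        for (int i = 0; i < records; i++) {
            map.put(record); // copies into the page cache; no system call per record
        }
        map.force();         // one flush at the end instead of one per record
        long e = System.nanoTime();
        System.out.println("mapped " + (e - s) / 1e6 + " ms");
    }
}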
This code is more similar to what you wrote in C. Takes only 5 msec on my machine. If you really need to flush after every write, it takes about 60 msec. Your original code took about 11 seconds on this machine. BTW, closing the output stream also flushes.
public static void testFileOutputStream() throws IOException {
    OutputStream os = new BufferedOutputStream( new FileOutputStream( "/tmp/fos" ) );
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.nanoTime();
    for ( int i = 0; i < 10000; i++ ) {
        os.write( bytes );
    }
    long e = System.nanoTime();
    os.close();
    System.out.println( "outputstream " + ( e - s ) / 1e6 );
}
The Java equivalent of fputs is file.write("012345678901234567890123456789"); you are calling 4 functions where C calls just 1, so the delay seems obvious.
I think this is most similar to your C version. I think the direct buffers in your Java example are causing many more buffer copies than the C version. This takes about 2.2 s on my (old) box.
public static void testFileChannelSimple() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
    FileChannel c = raf.getChannel();
    c.force(true);
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.currentTimeMillis();
    for(int i=0;i<10000;i++){
        raf.write(bytes);
        c.force(true);
    }
    long e = System.currentTimeMillis();
    raf.close();
    System.out.println("With flush "+(e-s));
}

How to clear the screen output of a Java HttpServletResponse

I'm writing to the browser window using servletResponse.getWriter().write(String).
But how do I clear the text which was written previously by some other similar write call?
The short answer is, you cannot -- once the browser receives the response, there is no way to take it back. (Unless there is some way to abnormally stop a HTTP response to cause the client to reload the page, or something to that extent.)
Probably the last place a response can be "cleared", in a sense, is with the ServletResponse.reset method, which, according to the Servlet Specification, will reset the buffer of the servlet's response.
However, this method also has a catch: it will only work if the buffer has not been committed (i.e. sent to the client) by the ServletOutputStream's flush method.
You cannot. The best thing is to write to a buffer (StringWriter / StringBuilder); then you can replace the written data at any time. Only when you know for sure what the response should be do you write the buffer's content to the response.
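A minimal sketch of that buffer-first approach inside a hypothetical HttpServlet (names are illustrative):

@Override
protected void doGet( HttpServletRequest req, HttpServletResponse resp ) throws IOException {
    StringBuilder body = new StringBuilder();
    body.append( "<p>preliminary output</p>" );

    // Revise or discard freely while everything is still in memory.
    body.setLength( 0 );
    body.append( "<p>final output</p>" );

    // Only now touch the real response; nothing has been committed before this point.
    resp.setContentType( "text/html;charset=UTF-8" );
    resp.getWriter().write( body.toString() );
}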
On the same matter, is there a reason to write the response this way rather than using some view technology for your output, such as JSP, Velocity, FreeMarker, etc.?
If you have an immediate problem that you need to solve quickly, you could work around this design problem by increasing the size of the response buffer - you'll have to read your application server's docs to see if this is possible. However, this solution will not scale, as you'll soon run into out-of-memory issues when your site traffic peaks.
No view technology will protect you from this issue. You should design your application to figure out what you're going to show the user before you start writing the response. That means doing all your DB access and business logic ahead of time. This is a common issue I've seen with convoluted system designs that use proxy objects that lazily access the database. E.g. ORM with Entity relationships are bad news if accessed from your view layer! There's not much you can do about an exception that happens 3/4 of the way into a rendered page.
Thinking about it, there might be some way to inject a page redirect via AJAX. Anyone ever heard of a solution like that?
Good luck with re-architecting your design!
I know the post is pretty old, but just thought of sharing my views on this.
I suppose you could actually use a Filter and a ServletResponseWrapper to wrap the response and pass it along the chain.
That is, you can have an output stream in the wrapper class and write to it instead of into the original response's output stream. You can clear the wrapper's output stream whenever you please, and finally write its contents to the original response's output stream when you are done with your processing.
For example,
public class MyResponseWrapper extends HttpServletResponseWrapper {

    protected ByteArrayOutputStream baos = null;
    protected ServletOutputStream stream = null;
    protected PrintWriter writer = null;
    protected HttpServletResponse origResponse = null;

    public MyResponseWrapper( HttpServletResponse response ) {
        super( response );
        origResponse = response;
    }

    public ServletOutputStream getOutputStream() throws IOException {
        if ( writer != null ) {
            throw new IllegalStateException( "getWriter() has already been " +
                    "called for this response" );
        }
        if ( stream == null ) {
            baos = new ByteArrayOutputStream();
            stream = new MyServletStream( baos );
        }
        return stream;
    }

    public PrintWriter getWriter() throws IOException {
        if ( writer != null ) {
            return writer;
        }
        if ( stream != null ) {
            throw new IllegalStateException( "getOutputStream() has already " +
                    "been called for this response" );
        }
        baos = new ByteArrayOutputStream();
        stream = new MyServletStream( baos );
        writer = new PrintWriter( stream );
        return writer;
    }

    public void commitToResponse() throws IOException {
        if ( writer != null ) {
            writer.flush(); // push anything buffered in the PrintWriter into baos first
        }
        origResponse.getOutputStream().write( baos.toByteArray() );
        origResponse.flushBuffer();
    }

    private static class MyServletStream extends ServletOutputStream {
        ByteArrayOutputStream baos;

        MyServletStream( ByteArrayOutputStream baos ) {
            this.baos = baos;
        }

        public void write( int param ) throws IOException {
            baos.write( param );
        }
    }

    // other methods you want to implement
}
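For completeness, a minimal sketch of a Filter that would use the wrapper above (the filter class name and the decision logic are hypothetical; javax.servlet imports omitted as in the wrapper):

public class BufferingFilter implements Filter {

    public void doFilter( ServletRequest request, ServletResponse response, FilterChain chain )
            throws IOException, ServletException {
        MyResponseWrapper wrapper = new MyResponseWrapper( (HttpServletResponse) response );

        // Downstream servlets write into the wrapper's in-memory buffer,
        // so nothing reaches the client yet.
        chain.doFilter( request, wrapper );

        // Decide here whether to keep or rewrite what was written, then
        // push the final bytes to the real response in one go.
        wrapper.commitToResponse();
    }

    public void init( FilterConfig filterConfig ) { }

    public void destroy() { }
}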
