I need a byte buffer class in Java for single-threaded use. The buffer should resize when it's full rather than throw an exception or something. Performance is a very important issue for me.
What would you recommend?
ADDED:
At the moment I use ByteBuffer, but it cannot resize. I need one that can resize.
Any reason not to use the boring normal ByteArrayOutputStream?
As mentioned by miku above, Evan Jones gives a review of different types and shows that it is very application-dependent. So without knowing further details it is hard to speculate.
I would start with ByteArrayOutputStream, and only if profiling shows it is your performance bottleneck move to something else. Often when you believe the buffer code is the bottleneck, it will actually be network or other IO - wait until profiling shows you need an optimisation before wasting time finding a replacement.
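To make the suggestion concrete, here is a minimal sketch of what ByteArrayOutputStream gives you out of the box: it grows its internal array automatically, so it never throws when "full". The sizes here are just illustrative.

```java
import java.io.ByteArrayOutputStream;

public class BaosDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(16); // deliberately small initial capacity
        byte[] chunk = new byte[100];
        for (int i = 0; i < 10; i++) {
            buf.write(chunk, 0, chunk.length); // grows past 16 bytes transparently
        }
        byte[] snapshot = buf.toByteArray(); // copy of everything written so far
        System.out.println(snapshot.length); // 1000
    }
}
```

Note that `toByteArray()` copies; if that copy shows up in profiling, that is the point at which to consider alternatives.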
If you are moving to something else, then other factors you will need to think about:
You have said this is single-threaded use, so BAOS's synchronization is not needed
what is the buffer being filled by and fed into? If either end is already wired to use Java NIO, then using a direct ByteBuffer is very efficient.
Do you need a circular buffer or a plain linear buffer? If circular, the Ostermiller Utils are pretty efficient, and GPL'd
You can use a direct ByteBuffer. Direct memory is backed by virtual memory to start with and is only committed to physical memory as the application actually uses it, i.e. the amount of main memory it uses resizes automagically.
Create a direct ByteBuffer larger than you need and it will only consume what you use.
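A sketch of that over-allocation idea follows. Note that lazy commitment of physical pages is OS-dependent behavior, not something the Java API guarantees; the 64 MB figure is just an example.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Reserve a generous range up front; on most operating systems only the
        // pages actually touched end up consuming physical memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        buf.putInt(42);
        buf.flip();
        System.out.println(buf.getInt()); // 42
    }
}
```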
you can also write manual code that checks the buffer continuously and, if it's full, makes a new buffer of greater size and shifts all the data into that new buffer.
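That manual approach can be sketched as follows (class and method names are illustrative); this is essentially what ArrayList and ByteArrayOutputStream do internally:

```java
import java.util.Arrays;

// A growable byte buffer that doubles its backing array when full.
public class GrowableBuffer {
    private byte[] data = new byte[16];
    private int size;

    public void put(byte b) {
        if (size == data.length) {
            // full: allocate a larger array and copy ("shift") the data over
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = b;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        GrowableBuffer buf = new GrowableBuffer();
        for (int i = 0; i < 100; i++) buf.put((byte) i);
        System.out.println(buf.size()); // 100
    }
}
```

Doubling (rather than growing by a fixed amount) keeps the amortized cost of each `put` constant.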
Related
Let’s say I’ve mapped a memory region [0, 1000] and now I have MappedByteBuffer.
Can I read and write to this buffer from multiple threads at the same time without locking, assuming that each thread accesses a different part of the buffer, e.g. T1 [0, 500), T2 [500, 1000]?
If the above is true, is it possible to determine whether it’s better to create one big buffer for multiple threads, or smaller buffer for each thread?
Detailed Intro:
If you wanna learn how to answer those questions yourself, check the implementation source code:
MappedByteBuffer: https://github.com/himnay/java7-sourcecode/blob/master/java/nio/MappedByteBuffer.java (notice it's still abstract, so you cannot instantiate it directly)
extends ByteBuffer: https://github.com/himnay/java7-sourcecode/blob/master/java/nio/ByteBuffer.java
extends Buffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/Buffer.java (which only does index checks, and does not grant an actual access to any buffer memory)
Now it gets a bit more complicated:
When you wanna allocate a MappedByteBuffer, you will get either a
HeapByteBuffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/HeapByteBuffer.java
or a DirectByteBuffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/DirectByteBuffer.java
Instead of having to browse internet pages, you could also simply download the source code packages for your Java version and attach them in your IDE so you can see the code in development AND debug modes. A lot easier.
Short (incomplete) answer:
Neither of them is safe against multithreaded access.
So if you ever needed to resize the MappedByteBuffer, you might get stale or even bad (ArrayIndexOutOfBoundsException) access
If the size is constant, you can rely on either Implementation to be "thread safe", as far as your requirements are concerned
On a side note, here also lies a design flaw that crept into the Java implementation:
MappedByteBuffer extends ByteBuffer
ByteBuffer has the heap byte[] called "hb"
DirectByteBuffer extends MappedByteBuffer extends ByteBuffer
So DirectByteBuffer still has ByteBuffer's byte[] hb buffer,
but does not use it
and instead creates and manages its own Buffer
This design flaw comes from the step-by-step development of those classes (they were not all planned and implemented at the same time), AND the topic of package visibility, resulting in an inversion of the dependency/hierarchy of the implementation.
Now to the true answer:
If you wanna do proper object-oriented programming, you should NOT share resources unless utterly needed.
This ESPECIALLY means that each Thread should have its very own Buffer.
Advantage of having one global buffer: the only "advantage" is reduced memory consumption from the additional object references. But this impact is SO MINIMAL (not even a 1:10000 change in your app's RAM consumption) that you will NEVER notice it. There are so many other objects allocated for any number of weird (Java) reasons everywhere that this is the least of your concerns. Plus you would have to introduce additional data (index boundaries), which lessens the 'advantage' even more.
The big Advantages of having separate buffers:
You will never have to take care of the pointer/index arithmetic
especially when it comes to needing more threads at any given time
You can freely allocate new threads at any time without having to rearrange any data or do more pointer arithmetic
you can freely reallocate/resize each individual buffer when needed (without worrying about all the other threads' indexing requirements)
Debugging: You can locate problems that result from "writing out of bounds" much more easily, because if a thread tried it, that bad thread would crash, instead of other threads having to deal with corrupted data
Java ALWAYS bounds-checks each array access (on normal heap arrays like byte[]) before performing it, exactly to prevent such side effects
think back: once upon a time there was the big step in operating systems to introduce linear address space so programs would NOT have to care about where in the hardware RAM they're loaded.
Your one-buffer-design would be the exact step backwards.
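The separate-buffers design argued for above can be sketched like this: each worker thread allocates and owns its buffer, so there is no index arithmetic or locking between threads (all names and sizes here are illustrative).

```java
import java.nio.ByteBuffer;

public class PerThreadBuffers {
    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        long[] sums = new long[workers.length]; // one result slot per thread
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Each thread's buffer is private: no shared indices, no locks.
                ByteBuffer own = ByteBuffer.allocate(500);
                for (int i = 0; i < own.capacity(); i++) own.put((byte) 1);
                own.flip();
                while (own.hasRemaining()) sums[id] += own.get();
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long total = 0;
        for (long s : sums) total += s;
        System.out.println(total); // 2000
    }
}
```

Resizing any one thread's buffer here touches nothing the other threads depend on, which is exactly the point.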
Conclusion:
If you wanna have a really bad design choice - which WILL make life a lot harder later on - you go with one global Buffer.
If you wanna do it the proper OO way, separate those buffers. No convoluted dependencies and side effect problems.
I want to ask this as a general question. If we have a program which reads data from outside the program, should we first put the data in a container and then manipulate it, or should we work directly on the stream, if the stream API of the language is powerful enough?
For example: I am writing a program which reads from a text file. Should I first put the data in a string and then manipulate it, instead of working directly on the stream? I am using Java, and let's say it has stream classes powerful enough for my needs.
Stream processing is generally preferable to accumulating data in memory.
Why? One obvious reason is that the file you are reading might not even fit into memory. You might not even know the size of the data before you've read it completely (imagine, that you are reading from a socket or a pipe rather than a file).
It is also more efficient, especially, when the size isn't known ahead of time - allocating large chunks of memory and moving data around between them can be taxing. Things like processing and concatenating large strings aren't free either.
If the I/O is slow (ever tried reading from a tape?) or the data is being produced in real time by a peer process (socket/pipe), your processing of the data read can, at least in part, happen in parallel with reading, which will speed things up.
Stream processing is inherently easier to scale and parallelize if necessary, because your logic is forced to depend only on the current element being processed; you are free from state. If the amount of data becomes too large to process sequentially, you can trivially scale your app by adding more readers and splitting the stream between them.
You might argue that none of this matters in your case, because the file you are reading is only 300 bytes. Indeed, for small amounts of data this is not crucial (you may also bubble sort it while you are at it), but adopting good patterns and practices makes you a better programmer, and will help when it does matter. There is no disadvantage to it. No, it does not make your code more complicated. It might seem so to you at first, but that's simply because you are not used to stream processing. Once you get into the right mindset, and it becomes natural to you, you'll see that, if anything, the code dealing with one small piece of data at a time, and not caring about indexes, pointers and positions, is simpler than the alternative.
All of the above applies to sequential processing though. You read the stream once, processing the data immediately, as it comes in, and discarding it (or, perhaps, writing out to the next stream in pipeline).
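A minimal sketch of that sequential style: handle one element as it arrives, then discard it, keeping only the running result in memory. A StringReader stands in for the file or socket here.

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class StreamSumDemo {
    public static void main(String[] args) throws Exception {
        // In real code this would wrap a FileReader or a socket's InputStream.
        BufferedReader in = new BufferedReader(new StringReader("3\n4\n5\n"));
        long sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            sum += Long.parseLong(line); // process the element, then let it go
        }
        System.out.println(sum); // 12
    }
}
```

Memory use stays constant regardless of how long the input is, which is the property the answer is describing.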
You mentioned RandomAccessFile ... that's a completely different beast. If you need random access, and the data fits in memory, put it in memory. Seeking the file back and forth is the same thing conceptually, only much slower. There is no benefit to it other than saving memory.
You should certainly process it as you receive it. The other way adds latency and doesn't scale.
Is there any Java API available which would help in simulating a fixed amount of memory being used?
I am building a dummy application that contains no implementations in its methods. All I would like to do within these methods is simulate a certain amount of memory being used up - is this at all possible?
The simplest way to consume a fixed amount of memory is to create a byte array of that size and retain it.
byte[] bytes = new byte[1000*1000]; // use 1 MB of memory.
This could get tricky with the way Java handles memory, considering applications run through the runtime environment; I don't know off-hand whether it goes to the heap, etc.
One simple way might be loading text files into memory of the specific sizes you want, then somehow making sure they don't get garbage collected once the method returns.
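Combining the two suggestions above, a simple sketch is to hold the allocated arrays in a field that stays reachable, so the garbage collector cannot reclaim them while the simulation runs (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryHog {
    // Reachable from a static field, so the arrays survive method returns.
    private static final List<byte[]> retained = new ArrayList<>();

    static void consume(int megabytes) {
        for (int i = 0; i < megabytes; i++) {
            retained.add(new byte[1024 * 1024]); // 1 MB per chunk
        }
    }

    public static void main(String[] args) {
        consume(16); // simulate roughly 16 MB of heap usage
        System.out.println(retained.size()); // 16
    }
}
```

Calling `retained.clear()` would make the memory reclaimable again, which lets you simulate usage going back down.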
I'm wondering how I'd code up a ByteBuffer recycling class that can get me a ByteBuffer which is at least as big as the specified length, and which can lock up ByteBuffer objects in use to prevent their use while they are being used by my code. This would prevent re-construction of DirectByteBuffers and such over and over, instead using existing ones. Is there an existing Java library which can do this very effectively? I know Javolution can work with object recycling, but does that extend to the ByteBuffer class in this context with the requirements set out?
It would be more to the point to be more conservative in your usage patterns in the first place. For example there is lots of code out there that shows allocation of a new ByteBuffer on every OP_READ. This is insane. You only need two ByteBuffers at most per connection, one for input and one for output, and depending on what you're doing you can get away with exactly one. In extremely simple cases like an echo server you can get away with one BB for the entire application.
I would look into that rather than paper over the cracks with yet another layer of software.
This is just advice, not an answer. If you do implement some caching for DirectByteBuffer, then be sure to read about the GC implications, because the memory consumed by DirectByteBuffer is not tracked by the garbage collector.
Some references:
A thread - featuring Stack Overflow's tackline
A blog post on the same subject
And the followup
Typically, you would use a combination of a ThreadLocal and a SoftReference wrapper: the former to simplify synchronization (essentially eliminating the need for it), and the latter to make the buffer recyclable if there's not enough memory (keeping in mind the other comments wrt. GC issues with direct buffers). It's actually quite simple: check whether the SoftReference holds a buffer of big enough size; if not, allocate one; if yes, clear the reference. Once you are done with the buffer, reset the reference to point to it.
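A sketch of that pattern, with illustrative names (using a heap buffer here; swap in `allocateDirect` with the GC caveats above in mind):

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

public class BufferRecycler {
    // Per-thread cache, so no synchronization is needed; the SoftReference
    // lets the GC reclaim the buffer under memory pressure.
    private static final ThreadLocal<SoftReference<ByteBuffer>> CACHE =
            ThreadLocal.withInitial(() -> new SoftReference<>(null));

    /** Take a buffer of at least minCapacity, reusing the cached one if big enough. */
    public static ByteBuffer acquire(int minCapacity) {
        ByteBuffer buf = CACHE.get().get();
        if (buf == null || buf.capacity() < minCapacity) {
            buf = ByteBuffer.allocate(Math.max(minCapacity, 4096));
        }
        CACHE.set(new SoftReference<>(null)); // clear, so it isn't handed out twice
        buf.clear();
        return buf;
    }

    /** Hand a buffer back for reuse by this thread. */
    public static void release(ByteBuffer buf) {
        CACHE.set(new SoftReference<>(buf));
    }

    public static void main(String[] args) {
        ByteBuffer a = acquire(100);
        release(a);
        ByteBuffer b = acquire(50);
        System.out.println(a == b); // true: the buffer was recycled
    }
}
```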
Another question is whether a ByteBuffer is needed at all, compared to a regular byte[]. Many developers assume ByteBuffers are better performance-wise, but that assumption is not usually backed by actual data (i.e. testing to see whether there is a performance difference, and in which direction). The reason byte[] may often be faster is that the code accessing it can be simpler and easier for HotSpot to JIT efficiently.
I'm working on an online game and I've hit a little snag while working on the server side of things.
When using nonblocking sockets in Java, what is the best course of action to handle complete packet data sets that cannot be processed until all the data is available? For example, sending a large 2D tiled map over a socket.
I can think of two ways to handle it:
Allocate the ByteBuffer large enough to handle the complete data set needed to process a large 2D tiled map from my example. Continue adding read data to the buffer until it has all been received, and process from there.
If the ByteBuffer is a smaller size (perhaps 1500), subsequent reads can be done and put out to a file until it can be processed completely from the file. This would prevent having to have large ByteBuffers, but degrades performance because of disk I/O.
I'm using a dedicated ByteBuffer for every SocketChannel so that I can keep reading in data until it's complete for processing. The problem is if my 2D Tiled Map amounts to 2MB in size, is it really wise to use 1000 2MB ByteBuffers (assuming 1000 is a client connection limit and they are all in use)? There must be a better way that I'm not thinking of.
I'd prefer to keep things simple, but I'm open to any suggestions and appreciate the help. Thanks!
Probably, the best solution for now is to use the full 2MB ByteBuffer and let the OS take care of paging to disk (virtual memory) if that's necessary. You probably won't have 1000 concurrent users right away, and when you do, you can optimize. You may be surprised what your real performance issues are.
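The per-channel accumulation from the question's first option can be sketched like this. The length-prefixed framing and all names are illustrative assumptions; in real code the chunks would come from `SocketChannel.read` on each OP_READ.

```java
import java.nio.ByteBuffer;

// Accumulates partial nonblocking reads until a complete message has arrived.
class MessageAssembler {
    private final ByteBuffer buf;
    private final int expectedLength;

    MessageAssembler(int expectedLength) {
        this.expectedLength = expectedLength;
        this.buf = ByteBuffer.allocate(expectedLength); // e.g. the full 2 MB map
    }

    /** Feed one chunk; returns the complete message once everything is in, else null. */
    byte[] offer(ByteBuffer chunk) {
        buf.put(chunk);
        if (buf.position() < expectedLength) {
            return null; // not complete yet; wait for the next OP_READ
        }
        buf.flip();
        byte[] message = new byte[expectedLength];
        buf.get(message);
        return message;
    }

    public static void main(String[] args) {
        MessageAssembler asm = new MessageAssembler(6);
        System.out.println(asm.offer(ByteBuffer.wrap(new byte[]{1, 2, 3}))); // null
        byte[] done = asm.offer(ByteBuffer.wrap(new byte[]{4, 5, 6}));
        System.out.println(done.length); // 6
    }
}
```

With one such object per SocketChannel, the OS paging behavior mentioned above handles the mostly-idle 2 MB buffers until you measure a real problem.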
I decided the best course of action was to simply reduce the size of my massive dataset and send tile updates instead of an entire map update. That way I can simply send a list of tiles that have changed on a map instead of the entire map over again. This reduces the need for such a large buffer and I'm back on track. Thanks.