I came across this query: Create a ByteBuf in Netty 4.0 about conversion from byte[] to ByteBuf and ByteBuffer to ByteBuf. I was curious to know about the conversion the other way:
io.netty.buffer.ByteBuf to java.nio.ByteBuffer
and how to do it efficiently, with minimal/no copying? I did some reading and with some trial and error I found this inefficient way of converting it (with two copies):
// io.netty.handler.codec.http.FullHttpRequest fullHttpRequest;
ByteBuf conByteBuf = fullHttpRequest.content();
int numReadBytes = conByteBuf.readableBytes();
byte[] conBytes = new byte[numReadBytes];
conByteBuf.readBytes(conBytes); // First copy
ByteBuffer conByteBuffer = ByteBuffer.allocate(conBytes.length);
conByteBuffer.put(conBytes); // Second copy
My question is: can we avoid one or both of the copies and make the ByteBuffer use the internal buffer of the ByteBuf?
Thanks!
You should be able to use ByteBuf.nioBuffers(), which returns a view of the ByteBuf as an array of ByteBuffer objects.
In most cases this array will only have one element, but in some of the more complicated implementations of ByteBuf there may be multiple underlying ByteBuffer objects and ByteBuf.nioBuffers() can return them as-is instead of merging them as would a call to ByteBuf.nioBuffer().
You can tell ahead of time what the array length will be by using ByteBuf.nioBufferCount().
You can at least use ByteBuffer.wrap() to avoid the second copy.
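A minimal sketch of the ByteBuffer.wrap() suggestion, using only the JDK. The class and method names (WrapExample, wrapWithoutCopy) are made up for illustration, and the byte[] stands in for the array already filled by ByteBuf.readBytes(...). wrap() creates a buffer backed by the given array, so no second copy is made and later changes to the array are visible through the buffer.

```java
import java.nio.ByteBuffer;

final class WrapExample {

    // Hypothetical helper: returns a ByteBuffer backed by the given array, without copying it.
    static ByteBuffer wrapWithoutCopy(byte[] bytes) {
        return ByteBuffer.wrap(bytes);
    }

    public static void main(String[] args) {
        byte[] conBytes = {10, 20, 30};           // stands in for the bytes read from the ByteBuf
        ByteBuffer conByteBuffer = wrapWithoutCopy(conBytes);

        conBytes[0] = 99;                         // modify the array after wrapping
        System.out.println(conByteBuffer.get(0)); // the buffer sees the change
        System.out.println(conByteBuffer.array() == conBytes); // same backing array
    }
}
```

The first copy (ByteBuf to byte[]) still happens in this approach; only the copy into a freshly allocated ByteBuffer is avoided.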
Not particularly efficient, but it does the trick:
public static ByteBuffer toNioBuffer(ByteBuf buffer) {
if (buffer.isDirect()) {
return buffer.nioBuffer();
}
final byte[] bytes = new byte[buffer.readableBytes()];
buffer.getBytes(buffer.readerIndex(), bytes);
return ByteBuffer.wrap(bytes);
}
I have a reporting web application which generates reports. The application gets data from a database and stores data into a StringWriter object. I have to get this data in a byte array format to create a csv file and send it to browser.
Below is the code snippet
return new FileTransfer(fileName, reportType.getMimeType(),
new ByteArrayInputStream(generateCSV(reportType, grid, new DataList(), params).toString().getBytes("UTF-8")));
where generateCSV returns a StringWriter object, then to convert it into byte array I am calling toString and then getBytes() method. Below is what the generateCSV method looks like
StringWriter generateCSV(ReportType reportType, GridConfig grid, DataList dataList, String params) {......}
The problem is that when my report has huge records (more than 1 million), the getBytes() method fails with
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
The whole report data, when converted to a String object, has a huge number of characters (billions of them). The .getBytes("UTF-8") method converts it into a byte array with at least one element per character, and for 1 million records that array exceeds the maximum JVM array size (https://plumbr.io/outofmemoryerror/requested-array-size-exceeds-vm-limit).
Now how can I avoid use of toString().getBytes("UTF-8") to avoid OOM error? Is there any better approach to convert to byte array from StringWriter?
It’s strange to receive the result of generateCSV as a StringWriter; the preferred solution would be to let the method write to a target while generating, so you don’t have the entire contents in memory.
In either case, you should resort to the FileTransfer(String, String mimeType, OutputStreamLoader) constructor, to receive the target OutputStream when it is time to write the actual data.
When you can’t avoid the intermediate StringWriter, you should at least avoid calling toString on it, as constructing a String implies creating a copy of the entire buffer.
So a solution could look like:
return new FileTransfer(fileName, reportType.getMimeType(), new OutputStreamLoader() {
public void close() {}
public void load(OutputStream out) throws IOException {
// the best would be to let generateCSV write to out directly
// otherwise use:
StringBuffer sb = generateCSV(reportType, grid, new DataList(), params).getBuffer();
Writer w = new OutputStreamWriter(out, "UTF-8");
final int bufSize = 8192;
for(int s = 0, e; s < sb.length(); s = e) {
e = Math.min(sb.length(), s + bufSize);
w.write(sb.substring(s, e));
}
w.flush(); // let the caller close the OutputStream
}
});
An alternative to StringWriter would be CharArrayWriter, which has a writeTo(Writer out), which eliminates the need to implement a manual copying loop and might be even more efficient. But, as said, refactoring generateCSV to write directly to a target would be even better.
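A rough sketch of the CharArrayWriter variant, assuming generateCSV could be changed to return a CharArrayWriter (it returns a StringWriter in the question). The class and method names here (CharArrayWriterExample, copyTo) are hypothetical, and a ByteArrayOutputStream stands in for the OutputStream that FileTransfer would supply.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class CharArrayWriterExample {

    // Hypothetical helper: encodes the buffered characters as UTF-8 straight into 'out'.
    static void copyTo(CharArrayWriter csv, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        csv.writeTo(w); // writes the buffered chars to w, no intermediate String
        w.flush();      // let the caller close the OutputStream
    }

    public static void main(String[] args) throws IOException {
        CharArrayWriter csv = new CharArrayWriter();
        csv.append("id,name\n1,Alice\n");                        // stands in for the generated report
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for the response stream
        copyTo(csv, out);
        System.out.println(out.size() + " bytes written");
    }
}
```

This still keeps the whole report in memory as characters; it only avoids the extra String and the single huge byte[].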
The StringWriter holds its content in memory, so it is not a good fit for large files.
You should try to chunk the file directly to the InputStream without the StringWriter in the middle.
What about writing your own InputStream implementation that reads the data and converts it to CSV on the fly?
Can you show us the generateCSV method?
I am attempting to use Gson to take some Java object, serialize it to JSON, and get a byte array that represents that JSON. I need a byte array because I am passing the output on to an external dependency that requires it to be a byte array.
public byte[] serialize(Object object){
return gson.toJson(object).getBytes();
}
I have 2 questions:
If the input is a String gson seems to return the String as is. It doesn't do any validation of the input. Is this expected? I'd like to use Gson in a way that it would validate that the input object is actually Json. How could I do this?
I'm gonna be invoking this serialize function several thousands of times over a short period. Converting to String and then to byte[] could be some unwanted overhead. Is there a more optimal way to get the byte[]?
edit: my answer on point 1 was misinformed.
2) There will be a lot of unnecessary reflection overhead if you just use the vanilla Gson converter. Writing a custom adapter would very much be a performance benefit in your case. Here is one article with more info on that:
https://open.blogs.nytimes.com/2016/02/11/improving-startup-time-in-the-nytimes-android-app/?_r=0
If the input is a String gson seems to return the String as is. It doesn't do any validation of the input. Is this expected?
Yes, this is fine. It just returns a JSON string representation of the given string.
I'd like to use Gson in a way that it would validate that the input object is actually Json. How could I do this?
No need per se. The Gson.toJson() method accepts objects to be serialized, and it always generates valid JSON. If you mean deserialization, then Gson fails fast on invalid JSON documents during reading/parsing/deserialization (actually reading, which is the bottom-most layer of Gson).
I'm gonna be invoking this serialize function several thousands of times over a short period. Converting to String and then to byte[] could be some unwanted overhead. Is there a more optimal way to get the byte[]?
Yes, accumulating a JSON string just to expose a clone of its internal char[] is a waste of memory, of course. Gson is basically a stream-oriented tool, and note that there are Gson.toJson method overloads accepting an Appendable that are basically the Gson core (just take a quick look at how Gson.toJson(Object) works: it just creates a StringWriter instance to accumulate a string because of the Appendable interface). It would be extremely cool if Gson could emit JSON tokens via a Reader rather than writing to an Appendable, but this idea was refused and most likely will never be implemented in Gson, unfortunately. Since Gson does not emit JSON tokens during serialization in a read-semantics manner (from your code's perspective), you have to buffer the whole result:
private static byte[] serializeToBytes(final Object object)
throws IOException {
final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
final OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8); // explicit charset instead of the platform default
gson.toJson(object, writer);
writer.flush();
return outputStream.toByteArray();
}
This one does not use a StringWriter, so it does not accumulate an intermediate string or copy arrays back and forth. I don't know if there are writers/output streams that can utilize/re-use existing byte arrays, but I believe there should be some, because it makes a good rationale for the performance purposes you mentioned in your question.
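On the question of re-using byte arrays, one JDK-only possibility is ByteArrayOutputStream.reset(), which discards the accumulated output but keeps the already allocated buffer. The sketch below is an assumption-laden illustration, not a measured optimization: ReusableJsonBuffer and JsonSource are hypothetical names, JsonSource stands in for a call such as gson.toJson(object, writer), and the class is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class ReusableJsonBuffer {

    // Hypothetical stand-in for a call such as gson.toJson(object, writer).
    interface JsonSource {
        void writeTo(Writer writer) throws IOException;
    }

    // Internal buffer kept between calls; not thread-safe.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(1024);

    byte[] serialize(JsonSource source) throws IOException {
        buffer.reset(); // discards old content but keeps the allocated array
        Writer writer = new OutputStreamWriter(buffer, StandardCharsets.UTF_8);
        source.writeTo(writer);
        writer.flush();
        return buffer.toByteArray(); // one copy, sized exactly to the output
    }

    public static void main(String[] args) throws IOException {
        ReusableJsonBuffer serializer = new ReusableJsonBuffer();
        byte[] first = serializer.serialize(w -> w.write("\"foo\""));
        byte[] second = serializer.serialize(w -> w.write("{\"a\":1}"));
        System.out.println(first.length + " " + second.length);
    }
}
```

toByteArray() still makes one copy per call, and a new OutputStreamWriter is created each time, so whether this helps in practice would need measuring.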
If possible, you can also check your library interface/API for exposing/accepting OutputStreams somehow -- then you could probably easily pass such output streams to the serializeToBytes method or even remove the method. If it can use input streams, not just byte arrays, you could also take a look at converting output streams to input streams so the serializeToBytes method could return an InputStream or a Reader (requires some overhead, but can process infinite data -- need to find the balance):
private static InputStream serializeToByteStream(final Object object)
throws IOException {
final PipedInputStream inputStream = new PipedInputStream();
final OutputStream outputStream = new PipedOutputStream(inputStream);
new Thread(() -> {
try {
final OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8); // explicit charset instead of the platform default
gson.toJson(object, writer);
writer.flush();
} catch ( final IOException ex ) {
throw new RuntimeException(ex);
} finally {
try {
outputStream.close();
} catch ( final IOException ex ) {
throw new RuntimeException(ex);
}
}
}).start();
return inputStream;
}
Example of use:
final String value = "foo";
System.out.println(Arrays.toString(serializeToBytes(value)));
try ( final InputStream inputStream = serializeToByteStream(value) ) {
int b;
while ( (b = inputStream.read()) != -1 ) {
System.out.print(b);
System.out.print(' ');
}
System.out.println();
}
Output:
[34, 102, 111, 111, 34]
34 102 111 111 34
Both represent the ASCII codes of the JSON string literal "foo", including its surrounding quotation marks.
Given a byte[], we often need to operate on just pieces of it. In my particular example I get a byte[] from the wire where the first 4 bytes describe the length of the message, the next 4 bytes the type of the message (an integer that maps to a concrete protobuf class), and the remaining bytes are the actual content of the message, like this:
length|type|content
In order to parse this message I have to pass the content part to a specific class which knows how to parse an instance from it. The problem is that often no methods are provided that let you specify from where to where the parser should read the array.
So what we end up doing is copying the remaining chunks of that array, which is not efficient.
As far as I know, in Java it is not possible to create another byte[] reference that actually refers to some original, bigger byte[] array with just 2 indexes (this was the approach with String that led to memory leaks).
I wonder how we solve situations like this. I suppose giving up on protobuf just because it does not provide some parseFrom(byte[], int, int) does not make sense. Protobuf is just an example; anything could lack that API.
So does this force us to write inefficient code, or is there something that can be done (apart from adding that method)?
Normally you would tackle this kind of thing with streams.
A stream is an abstraction for reading just what you need to process the current block of data. So you can read the correct number of bytes into a byte array and pass it to your parse function.
You ask 'So does this force us to write inefficient code or there is something that can be done?'
Usually you get your data in the form of a stream and then using the technique demonstrated below will be more performant because you skip making one copy. (Two copies instead of three; once by the OS and once by you. You skip making a copy of the total byte array before you start parsing.) If you actually start out with a byte[] but it is constructed by yourself then you may want to change to constructing an object such as { int length, int type, byte[] contentBytes } instead and pass contentBytes to your parse function.
If you really, really have to start out with a byte[], then the technique below is just a more convenient way to parse it; it would not be more performant.
So suppose you got a buffer of bytes from somewhere and you want to read the contents of that buffer. First you convert it to a stream:
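private static List<Content> read(byte[] buffer) {
    try {
        ByteArrayInputStream bytesStream = new ByteArrayInputStream(buffer);
        return read(bytesStream);
    } catch (IOException e) {
        // every path must return or throw, so rethrow instead of only printing the stack trace
        throw new UncheckedIOException(e);
    }
}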
The above function wraps the byte array with a stream and passes it to the function that does the actual reading.
If you can start out from a stream then obviously you can skip the above step and just pass that stream into the below function directly:
private static List<Content> read(InputStream bytesStream) throws IOException {
List<Content> results = new ArrayList<Content>();
try {
// read the content...
Content content1 = readContent(bytesStream);
results.add(content1);
// I don't know if there's more than one content block but assuming
// that there is, you can just continue reading the stream...
//
// If it's a fixed number of content blocks then just read them one
// after the other... Otherwise make this a loop
Content content2 = readContent(bytesStream);
results.add(content2);
} finally {
bytesStream.close();
}
return results;
}
Since your byte-array contains content you will want to read Content blocks from the stream. Since you have a length and a type field, I am assuming that you have different kinds of content blocks. The next function reads the length and type and passes the processing of the content bytes on to the proper class depending on the read type:
private static Content readContent(InputStream stream) throws IOException {
final int CONTENT_TYPE_A = 10;
final int CONTENT_TYPE_B = 11;
// wrap the InputStream in a DataInputStream because the latter has
// convenience functions to convert bytes to integers, etc.
// Note that DataInputStream handles the stream in a BigEndian way,
// so check that your bytes are in the same byte order. If not you'll
// have to find another stream reader that can convert to ints from
// LittleEndian byte order.
DataInputStream data = new DataInputStream(stream);
int length = data.readInt();
int type = data.readInt();
// I'm assuming that above length field was the number of bytes for the
// content. So, read length number of bytes into a buffer and pass that
// to your `parseFrom(byte[])` function
byte[] contentBytes = new byte[length];
// read() may return fewer bytes than requested even when more are coming;
// readFully() keeps reading until the buffer is full and throws
// EOFException if the stream ends early
data.readFully(contentBytes);
switch (type) {
case CONTENT_TYPE_A:
return ContentTypeA.parseFrom(contentBytes);
case CONTENT_TYPE_B:
return ContentTypeB.parseFrom(contentBytes);
default:
throw new UnsupportedOperationException();
}
}
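The comments in readContent note that DataInputStream decodes integers in big-endian order. If the length and type fields were little-endian instead (an assumption, since the question does not say), one JDK-only option is to read the 8 header bytes and decode them through a ByteBuffer. LittleEndianHeader and readHeader are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianHeader {

    // Hypothetical helper: reads two little-endian ints (length, type) from the stream.
    static int[] readHeader(InputStream stream) throws IOException {
        byte[] header = new byte[8];
        new DataInputStream(stream).readFully(header); // reads exactly 8 bytes or throws EOFException
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] { buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) throws IOException {
        // length = 3, type = 10, both little-endian, followed by 3 content bytes
        byte[] wire = {3, 0, 0, 0, 10, 0, 0, 0, 'a', 'b', 'c'};
        int[] header = readHeader(new ByteArrayInputStream(wire));
        System.out.println("length=" + header[0] + " type=" + header[1]);
    }
}
```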
I have made up the below Content classes. I don't know what protobuf is but it can apparently convert from a byte array to an actual object with its parseFrom(byte[]) function, so take this as pseudocode:
class Content {
// common functionality
}
class ContentTypeA extends Content {
public static ContentTypeA parseFrom(byte[] contentBytes) {
return null; // do the actual parsing of a type A content
}
}
class ContentTypeB extends Content {
public static ContentTypeB parseFrom(byte[] contentBytes) {
return null; // do the actual parsing of a type B content
}
}
In Java, an array is not just a section of memory; it is an object that has some additional fields (at least its length). So you cannot refer to just a part of an array. You should either:
Use array-copy functions, or
Implement and use some algorithm that works on only part of the byte array.
The concern seems that there is no way to create a view over an array (e.g., an array equivalent of List#subList()). A workaround might be making your parsing methods take in the reference to the entire array and two indices (or an index and a length) to specify the sub-array the method should work on.
This would not prevent the methods from reading or modifying sections of the array they should not touch. Perhaps a ByteArrayView class could be made to add a little bit of safety if this is a concern:
public class ByteArrayView {
    private final byte[] array;
    private final int start;
    private final int length;

    public ByteArrayView(byte[] array, int start, int length) {
        this.array = array;
        this.start = start;
        this.length = length;
    }

    // returns a single byte of the view; index is relative to start
    public byte get(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return array[start + index];
    }
}
But if, on the other hand, performance is a concern, then a method call to get() for fetching each byte is probably undesirable.
The code is for illustration; it's not tested or anything.
EDIT
On a second reading of my own answer, I realized that I should point this out: having a ByteArrayView will copy each byte you read from the original array -- just byte by byte rather than as a chunk. It would be inadequate for the OP's concerns.
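For comparison, the JDK's java.nio.ByteBuffer can act as a bounds-checked window onto part of a byte[] without copying the underlying bytes. Whether it helps depends on the parser accepting a ByteBuffer, which the question does not establish. ArrayWindow and window are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;

final class ArrayWindow {

    // Hypothetical helper: a view of array[start .. start+length) that shares storage with 'array'.
    static ByteBuffer window(byte[] array, int start, int length) {
        // wrap() positions the buffer at 'start'; slice() rebases it so index 0 is array[start]
        return ByteBuffer.wrap(array, start, length).slice();
    }

    public static void main(String[] args) {
        byte[] message = {0, 0, 0, 3, 0, 0, 0, 10, 7, 8, 9}; // length | type | content
        ByteBuffer content = window(message, 8, 3);
        System.out.println(content.remaining()); // number of bytes visible through the view
        System.out.println(content.get(0));      // first content byte
    }
}
```

Reads outside the window throw IndexOutOfBoundsException, and writes to the original array remain visible through the view because no bytes are copied.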
I need to return the byte array of a ByteArrayOutputStream from a called method. I see two ways to achieve this: return the ByteArrayOutputStream and have the caller use its toByteArray() method, or call baos.toByteArray() inside the method and return the byte array.
Which one should I use?
To illustrate by example:
Method 1
void parentMethod() {
    byte[] result = process();
}
byte[] process() {
    ByteArrayOutputStream baos;
    .....
    .....
    .....
    return baos.toByteArray();
}
Method 2
void parentMethod() {
    ByteArrayOutputStream baos = process();
}
ByteArrayOutputStream process() {
    ByteArrayOutputStream baos;
    .....
    .....
    .....
    return baos;
}
There's another alternative: return an InputStream. The idea is presumably that you're returning the data resulting from the operation. As such, returning an output stream seems very odd to me. To return data, you'd normally either return the raw byte[], or an InputStream wrapping it - the latter is more flexible in that it could be reading from a file or something similar, but does require the caller to close the stream afterwards.
It partly depends on what callers want to do with the data, too - there are some operations which are easier to perform if you've already got a stream; there are others which are easier with a byte array. I'd let that influence the decision quite a lot.
If you do want to return a stream, that's easy:
return new ByteArrayInputStream(baos.toByteArray());
So to summarize:
Don't return ByteArrayOutputStream. The use of that class in coming up with the data is an implementation detail, and it's not really the logical result of the operation.
Consider returning an InputStream if callers are likely to find that easier to use or if you may later want to read the data from a file (or network connection, or whatever); ByteArrayInputStream is suitable in the current implementation.
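Put together, the InputStream variant might look like the sketch below. ReportProducer is a hypothetical class name, and the "hello" payload is only a placeholder for whatever the real method writes into the ByteArrayOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReportProducer {

    // The ByteArrayOutputStream stays an implementation detail; callers only see an InputStream.
    static InputStream process() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8); // placeholder for the real work
        baos.write(payload, 0, payload.length);
        return new ByteArrayInputStream(baos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // the caller is responsible for closing the stream
        try (InputStream in = process()) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.print((char) b);
            }
            System.out.println();
        }
    }
}
```

If the data later comes from a file or a socket, only the body of process() has to change; callers keep reading from an InputStream.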
I am reading an array of bytes passed in to me (not my choice, but I have to use it this way). I need to get the data into a LinkedBlockingQueue, and ultimately step through the bytes to build one or more XML messages (the data may contain partial messages). So my questions are:
What generic type should I use for the LBQ?
What is the most efficient way to get the byte[] into that generic type?
Here is my example code:
parsebytes(byte[] bytes, int length)
{
//assume that i am doing other checks on data
if (length > 0)
{
myThread.putBytes(bytes, length);
}
}
in my thread:
putBytes(byte[] bytes, int length)
{
for (int i = 0; i < length; i++)
{
blockingQueue.put(bytes[i]);
}
}
I also do not want to pull data off the blocking queue byte by byte. I would rather grab everything that is in the queue and process it.
There is no such thing as a ListBlockingQueue. However, any BlockingQueue<Object> will accept byte[] since Java arrays are objects.
In the absence of other design considerations, the simplest option might be to just stick the arrays into the queue as they arrive, and let the consumer stitch them together.
Consider this:
BlockingQueue<byte[]> q = new LinkedBlockingQueue<>();
q.put(new byte[] {1,2,3});
byte[] bytes = q.take();
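On the consumer side, "grab everything that is in the queue" could be sketched with BlockingQueue.drainTo(), which moves all currently queued elements into a collection without blocking. ChunkConsumer and takeAll are hypothetical names for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ChunkConsumer {

    // Hypothetical helper: waits for at least one chunk, then takes everything queued so far
    // and concatenates it into a single array.
    static byte[] takeAll(BlockingQueue<byte[]> queue) throws InterruptedException {
        List<byte[]> chunks = new ArrayList<>();
        chunks.add(queue.take()); // blocks until a chunk is available
        queue.drainTo(chunks);    // grabs whatever else is queued, without blocking
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            joined.write(chunk, 0, chunk.length);
        }
        return joined.toByteArray();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> q = new LinkedBlockingQueue<>();
        q.put(new byte[] {1, 2, 3});
        q.put(new byte[] {4, 5});
        System.out.println(takeAll(q).length);
    }
}
```

If the producer re-uses its incoming buffer or only part of it is valid, it would need to enqueue a copy of the valid range, for example Arrays.copyOf(bytes, length), rather than the shared array itself.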