I need to return the contents of a ByteArrayOutputStream as a byte array from a called method. I see two ways to do this: call baos.toByteArray() inside the method and return the byte array, or return the ByteArrayOutputStream itself and let the caller call toByteArray().
Which one should I use?
To illustrate by example:
Method 1
void parentMethod() {
    byte[] result = process();
}
byte[] process() {
    ByteArrayOutputStream baos;
    .....
    .....
    .....
    return baos.toByteArray();
}
Method 2
void parentMethod() {
    ByteArrayOutputStream baos = process();
}
ByteArrayOutputStream process() {
    ByteArrayOutputStream baos;
    .....
    .....
    .....
    return baos;
}
There's another alternative: return an InputStream. The idea is presumably that you're returning the data resulting from the operation. As such, returning an output stream seems very odd to me. To return data, you'd normally either return the raw byte[], or an InputStream wrapping it - the latter is more flexible in that it could be reading from a file or something similar, but does require the caller to close the stream afterwards.
It partly depends on what callers want to do with the data, too - there are some operations which are easier to perform if you've already got a stream; there are others which are easier with a byte array. I'd let that influence the decision quite a lot.
If you do want to return a stream, that's easy:
return new ByteArrayInputStream(baos.toByteArray());
So to summarize:
Don't return ByteArrayOutputStream. The use of that class in coming up with the data is an implementation detail, and it's not really the logical result of the operation.
Consider returning an InputStream if callers are likely to find that easier to use, or if you may later want to read the data from a file (or network connection, or whatever); ByteArrayInputStream is suitable for the current implementation.
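To make the two recommended return types concrete, here is a minimal sketch. The class name ReportProducer, both method names, and the "hello" payload are made up for illustration; the point is only that the ByteArrayOutputStream never leaves the method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReportProducer {
    // Hypothetical producer: the ByteArrayOutputStream is only used
    // internally and stays an implementation detail.
    static byte[] processToBytes() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] chunk = "hello".getBytes(StandardCharsets.UTF_8);
        baos.write(chunk, 0, chunk.length); // this overload declares no IOException
        return baos.toByteArray();          // returns a copy of the buffered bytes
    }

    // The same data exposed as a stream; callers should close it when done.
    static InputStream processToStream() {
        return new ByteArrayInputStream(processToBytes());
    }
}
```

If the data later comes from a file or socket, only processToStream has to change; callers that consume an InputStream are unaffected.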
I have a reporting web application which generates reports. The application gets data from a database and stores data into a StringWriter object. I have to get this data in a byte array format to create a csv file and send it to browser.
Below is the code snippet
return new FileTransfer(fileName, reportType.getMimeType(),
new ByteArrayInputStream(generateCSV(reportType, grid, new DataList(), params).toString().getBytes("UTF-8")));
Here generateCSV returns a StringWriter object; to convert it into a byte array I call toString() and then getBytes(). This is what the generateCSV method looks like:
StringWriter generateCSV(ReportType reportType, GridConfig grid, DataList dataList, String params) {......}
The problem is that when my report has huge records (more than 1 million), the getBytes() method fails with
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
When the whole report is converted to a String object, it has a huge number of characters (billions). The .getBytes("UTF-8") call then encodes them into a single byte array, and for 1 million records that array exceeds the JVM's maximum array size (https://plumbr.io/outofmemoryerror/requested-array-size-exceeds-vm-limit).
How can I avoid toString().getBytes("UTF-8") and the resulting OutOfMemoryError? Is there a better way to get the bytes out of a StringWriter?
It’s strange to receive the result of generateCSV as a StringWriter; the preferred solution would be to let the method write to a target while generating, so you don’t have the entire contents in memory.
In either case, you should resort to the FileTransfer(String, String mimeType, OutputStreamLoader) constructor, to receive the target OutputStream when it is time to write the actual data.
When you can’t avoid the intermediate StringWriter, you should at least avoid calling toString on it, as constructing a String implies creating a copy of the entire buffer.
So a solution could look like:
return new FileTransfer(fileName, reportType.getMimeType(), new OutputStreamLoader() {
    public void close() {}
    public void load(OutputStream out) throws IOException {
        // the best would be to let generateCSV write to out directly
        // otherwise use:
        StringBuffer sb = generateCSV(reportType, grid, new DataList(), params).getBuffer();
        Writer w = new OutputStreamWriter(out, "UTF-8");
        final int bufSize = 8192;
        // copy in fixed-size chunks so that no full-size String is created
        for (int s = 0, e; s < sb.length(); s = e) {
            e = Math.min(sb.length(), s + bufSize);
            w.write(sb.substring(s, e));
        }
        w.flush(); // let the caller close the OutputStream
    }
});
An alternative to StringWriter would be CharArrayWriter, which has a writeTo(Writer out), which eliminates the need to implement a manual copying loop and might be even more efficient. But, as said, refactoring generateCSV to write directly to a target would be even better.
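A minimal sketch of the CharArrayWriter variant, using only JDK classes. The class and method names (CharArrayWriterCopy, copyTo) are made up, and it assumes generateCSV could be changed to fill a CharArrayWriter instead of a StringWriter.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharArrayWriterCopy {
    // Encodes the buffered characters as UTF-8 straight into `out`,
    // without first building a String of the full content.
    static void copyTo(CharArrayWriter buffered, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        buffered.writeTo(w); // hands the internal char[] to the target writer
        w.flush();           // flush, but leave closing `out` to the caller
    }
}
```

The characters are still held in memory once, so this only removes the extra copies; it does not make the report itself smaller.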
A StringWriter holds its entire content in memory, so it is not a good fit for large files.
You should try to stream the data directly to the output in chunks, without the StringWriter in the middle.
What about writing your own InputStream implementation that reads the data and converts it to CSV on the fly?
Can you show us the generateCSV method?
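The suggestion made in both answers, writing rows to the target as they are produced, could be sketched like this. CsvStreamer and writeCsv are hypothetical stand-ins for generateCSV, rows are modelled as lists of strings, and quoting or escaping of values is deliberately left out.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

class CsvStreamer {
    // Hypothetical stand-in for generateCSV: each row is written as soon as
    // it is available, so memory use does not grow with the report size.
    static void writeCsv(Iterable<List<String>> rows, Writer out) throws IOException {
        for (List<String> row : rows) {
            out.write(String.join(",", row)); // no quoting/escaping in this sketch
            out.write('\n');
        }
        out.flush(); // leave closing the writer to the caller
    }
}
```

In the FileTransfer case the Writer would wrap the OutputStream passed to load, for example new OutputStreamWriter(out, "UTF-8").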
Given a byte[], we often need to operate on only pieces of it. In my particular case I receive a byte[] from the wire where the first 4 bytes give the length of the message, the next 4 bytes give the type of the message (an integer that maps to a concrete protobuf class), and the remaining bytes are the actual content of the message, like this:
length|type|content
In order to parse this message I have to pass the content part to a specific class which knows how to parse an instance from it. The problem is that often no methods are provided that let you specify from where to where the parser should read the array.
So we end up copying the remaining chunks of that array, which is not efficient.
As far as I know, in Java it is not possible to create another byte[] reference that actually refers to part of some original, bigger byte[] using just two indexes (this was the approach String once took, and it led to memory leaks).
I wonder how we solve situations like this. I suppose giving up on protobuf just because it does not provide some parseFrom(byte[], int, int) does not make sense. protobuf is just an example; anything could lack that API.
So does this force us to write inefficient code, or is there something that can be done (apart from adding that method)?
Normally you would tackle this kind of thing with streams.
A stream is an abstraction for reading just what you need to process the current block of data. So you can read the correct number of bytes into a byte array and pass it to your parse function.
You ask 'So does this force us to write inefficient code or there is something that can be done?'
Usually you get your data in the form of a stream and then using the technique demonstrated below will be more performant because you skip making one copy. (Two copies instead of three; once by the OS and once by you. You skip making a copy of the total byte array before you start parsing.) If you actually start out with a byte[] but it is constructed by yourself then you may want to change to constructing an object such as { int length, int type, byte[] contentBytes } instead and pass contentBytes to your parse function.
If you really, really have to start out with a byte[], then the technique below is just a more convenient way to parse it; it would not be more performant.
So suppose you got a buffer of bytes from somewhere and you want to read the contents of that buffer. First you convert it to a stream:
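private static List<Content> read(byte[] buffer) {
    try {
        ByteArrayInputStream bytesStream = new ByteArrayInputStream(buffer);
        return read(bytesStream);
    } catch (IOException e) {
        // Rethrow so that a truncated or malformed buffer is not silently
        // ignored, and so that every path either returns or throws.
        throw new IllegalStateException("Could not parse buffer", e);
    }
}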
The above function wraps the byte array with a stream and passes it to the function that does the actual reading.
If you can start out from a stream then obviously you can skip the above step and just pass that stream into the below function directly:
private static List<Content> read(InputStream bytesStream) throws IOException {
    List<Content> results = new ArrayList<Content>();
    try {
        // read the content...
        Content content1 = readContent(bytesStream);
        results.add(content1);
        // I don't know if there's more than one content block but assuming
        // that there is, you can just continue reading the stream...
        //
        // If it's a fixed number of content blocks then just read them one
        // after the other... Otherwise make this a loop
        Content content2 = readContent(bytesStream);
        results.add(content2);
    } finally {
        bytesStream.close();
    }
    return results;
}
Since your byte-array contains content you will want to read Content blocks from the stream. Since you have a length and a type field, I am assuming that you have different kinds of content blocks. The next function reads the length and type and passes the processing of the content bytes on to the proper class depending on the read type:
private static Content readContent(InputStream stream) throws IOException {
    final int CONTENT_TYPE_A = 10;
    final int CONTENT_TYPE_B = 11;
    // wrap the InputStream in a DataInputStream because the latter has
    // convenience functions to convert bytes to integers, etc.
    // Note that DataInputStream handles the stream in a BigEndian way,
    // so check that your bytes are in the same byte order. If not you'll
    // have to find another stream reader that can convert to ints from
    // LittleEndian byte order.
    DataInputStream data = new DataInputStream(stream);
    int length = data.readInt();
    int type = data.readInt();
    // I'm assuming that above length field was the number of bytes for the
    // content. So, read length number of bytes into a buffer and pass that
    // to your `parseFrom(byte[])` function
    byte[] contentBytes = new byte[length];
    // readFully blocks until the whole array is filled and throws
    // EOFException if the stream ends early; a plain read() may legally
    // return fewer bytes than requested even when more are still coming.
    data.readFully(contentBytes);
    switch (type) {
        case CONTENT_TYPE_A:
            return ContentTypeA.parseFrom(contentBytes);
        case CONTENT_TYPE_B:
            return ContentTypeB.parseFrom(contentBytes);
        default:
            throw new UnsupportedOperationException();
    }
}
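If the header fields turn out to be little-endian, one JDK-only option is java.nio.ByteBuffer with an explicit byte order. This is a sketch under that assumption; LittleEndianHeader and readLengthAndType are made-up names, and the 8 header bytes are assumed to have been read into an array already.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianHeader {
    // Reads the two 4-byte header fields (length, then type) from the start
    // of `header`, interpreting both as little-endian ints.
    static int[] readLengthAndType(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt();
        int type = buf.getInt();
        return new int[] { length, type };
    }
}
```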
I have made up the below Content classes. I don't know what protobuf is but it can apparently convert from a byte array to an actual object with its parseFrom(byte[]) function, so take this as pseudocode:
class Content {
    // common functionality
}
class ContentTypeA extends Content {
    public static ContentTypeA parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type A content
    }
}
class ContentTypeB extends Content {
    public static ContentTypeB parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type B content
    }
}
In Java, an array is not just a section of memory: it is an object that has some additional fields (at least length). So you cannot reference just part of an array. You should either:
Use array-copy functions, or
Implement and use an algorithm that works on only part of the byte array.
The concern seems to be that there is no way to create a view over an array (e.g., an array equivalent of List#subList()). A workaround might be making your parsing methods take a reference to the entire array and two indices (or an index and a length) to specify the sub-array the method should work on.
This would not prevent the methods from reading or modifying sections of the array they should not touch. Perhaps a ByteArrayView class could be made to add a little bit of safety if this is a concern:
public class ByteArrayView {
    private final byte[] array;
    private final int start;
    private final int length;
    public ByteArrayView(byte[] array, int start, int length) {
        this.array = array;
        this.start = start;
        this.length = length;
    }
    // returns the byte at the given position relative to the start of the view
    public byte get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return array[start + index];
    }
}
But if, on the other hand, performance is a concern, then a method call to get() for fetching each byte is probably undesirable.
The code is for illustration; it's not tested or anything.
EDIT
On a second reading of my own answer, I realized that I should point this out: having a ByteArrayView will copy each byte you read from the original array -- just byte by byte rather than as a chunk. It would be inadequate for the OP's concerns.
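For the view idea, the JDK's ByteBuffer may already cover part of it: ByteBuffer.wrap(array, offset, length) creates a buffer over a region of an existing array without copying. A minimal sketch, assuming the length|type|content layout from the question, big-endian header fields, and a length field that counts only the content bytes; FrameSlicer and contentView are made-up names.

```java
import java.nio.ByteBuffer;

class FrameSlicer {
    static final int HEADER_BYTES = 8; // 4-byte length + 4-byte type

    // Returns a read-only view of the content part of `frame`.
    // No bytes are copied; the view shares the original array.
    static ByteBuffer contentView(byte[] frame) {
        int length = ByteBuffer.wrap(frame).getInt(); // big-endian by default
        return ByteBuffer.wrap(frame, HEADER_BYTES, length)
                .slice()              // re-base so index 0 is the first content byte
                .asReadOnlyBuffer();  // callers cannot write through the view
    }
}
```

Whether this helps depends on the parser accepting a ByteBuffer; a parser that only takes byte[] still needs a copy.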
I came across this query: Create a ByteBuf in Netty 4.0 about conversion from byte[] to ByteBuf and ByteBuffer to ByteBuf. I was curious to know about the conversion the other way:
io.netty.buffer.ByteBuf to java.nio.ByteBuffer
and how to do it efficiently, with minimal/no copying? I did some reading and with some trial and error I found this inefficient way of converting it (with two copies):
// io.netty.handler.codec.http.FullHttpRequest fullHttpRequest;
ByteBuf conByteBuf = fullHttpRequest.content();
int numReadBytes = conByteBuf.readableBytes();
byte[] conBytes = new byte[numReadBytes];
conByteBuf.readBytes(conBytes); // First Copy
ByteBuffer conByteBuffer = ByteBuffer.allocate(conBytes.length);
conByteBuffer.put(conBytes); // Second Copy
My question is: can we avoid one or both of the copies and make the ByteBuffer use the internal buffer of the ByteBuf?
Thanks!
You should be able to use ByteBuf.nioBuffers(), which returns a view of the ByteBuf as an array of ByteBuffer objects.
In most cases this array will only have one element, but in some of the more complicated implementations of ByteBuf there may be multiple underlying ByteBuffer objects, and ByteBuf.nioBuffers() can return them as-is instead of merging them as a call to ByteBuf.nioBuffer() would.
You can tell ahead of time what the array length will be by using ByteBuf.nioBufferCount().
You can at least use ByteBuffer.wrap() to avoid the second copying.
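A small JDK-only sketch of why wrap() saves a copy: the returned buffer is backed by the very array that was passed in, whereas allocate() followed by put() copies the bytes into a new one. WrapDemo and wrapNoCopy are made-up names.

```java
import java.nio.ByteBuffer;

class WrapDemo {
    // Wraps `bytes` without copying: the buffer is backed by the same array,
    // so later changes to the array are visible through the buffer.
    static ByteBuffer wrapNoCopy(byte[] bytes) {
        return ByteBuffer.wrap(bytes);
    }
}
```

In the question's snippet this would replace the allocate/put pair with ByteBuffer.wrap(conBytes).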
Not particularly efficient but doing the trick:
public static ByteBuffer toNioBuffer(ByteBuf buffer) {
    if (buffer.isDirect()) {
        return buffer.nioBuffer();
    }
    final byte[] bytes = new byte[buffer.readableBytes()];
    buffer.getBytes(buffer.readerIndex(), bytes);
    return ByteBuffer.wrap(bytes);
}
Consider a generic byte reader implementing the following simple API to read an unspecified number of bytes from a data structure that is otherwise inaccessible:
public interface ByteReader
{
    public byte[] read() throws IOException; // Returns null only at EOF
}
How could the above be efficiently converted to a standard Java InputStream, so that an application using any of the methods defined by the InputStream class works as expected?
A simple solution would be subclassing InputStream to
Call the read() method of the ByteReader as much as needed by the read(...) methods of the InputStream
Buffer the bytes retrieved in a byte[] array
Return part of the byte array as expected, e.g., 1 byte at a time whenever the InputStream read() method is called.
However, this requires more work to be efficient (e.g., for avoiding multiple byte array allocations). Also, for the application to scale to large input sizes, reading everything into memory and then processing is not an option.
Any ideas or open source implementations that could be used?
Create multiple ByteArrayInputStream instances around the returned arrays and use them in a stream that provides for concatenation. You could for instance use SequenceInputStream for this.
The trick is to implement an Enumeration<ByteArrayInputStream> that uses the ByteReader class.
EDIT: I've implemented this answer, but it is probably better to create your own InputStream instance instead. Unfortunately, this solution does not let you handle IOException gracefully.
final Enumeration<ByteArrayInputStream> basEnum = new Enumeration<ByteArrayInputStream>() {
    ByteArrayInputStream baos;
    boolean ended;
    @Override
    public boolean hasMoreElements() {
        if (ended) {
            return false;
        }
        if (baos == null) {
            getNextBA();
            if (ended) {
                return false;
            }
        }
        return true;
    }
    @Override
    public ByteArrayInputStream nextElement() {
        if (ended) {
            throw new NoSuchElementException();
        }
        if (baos.available() != 0) {
            return baos;
        }
        getNextBA();
        return baos;
    }
    private void getNextBA() {
        byte[] next;
        try {
            next = byteReader.read();
        } catch (IOException e) {
            // Enumeration methods cannot throw checked exceptions,
            // so wrap it and keep the original as the cause
            throw new IllegalStateException("Issues reading byte arrays", e);
        }
        if (next == null) {
            ended = true;
            return;
        }
        this.baos = new ByteArrayInputStream(next);
    }
};
SequenceInputStream sis = new SequenceInputStream(basEnum);
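The alternative mentioned in the edit, a custom InputStream, could look roughly like the sketch below. It is not a tested production class. ByteReaderInputStream is a made-up name, and the ByteReader interface is repeated from the question only so the block stands alone. Only one chunk is held at a time, and IOException propagates to the caller unchanged.

```java
import java.io.IOException;
import java.io.InputStream;

interface ByteReader {
    byte[] read() throws IOException; // returns null only at EOF
}

class ByteReaderInputStream extends InputStream {
    private final ByteReader reader;
    private byte[] current = new byte[0]; // chunk currently being consumed
    private int pos;                      // next unread index in `current`
    private boolean eof;

    ByteReaderInputStream(ByteReader reader) {
        this.reader = reader;
    }

    // Makes sure `current` has unread bytes; returns false once the
    // underlying reader has reported EOF and everything was consumed.
    private boolean fill() throws IOException {
        while (!eof && pos >= current.length) {
            byte[] next = reader.read();
            if (next == null) {
                eof = true;
            } else {
                current = next; // empty chunks are skipped by the loop
                pos = 0;
            }
        }
        return pos < current.length;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        // Serve at most the rest of the current chunk; callers loop anyway.
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

Overriding read(byte[], int, int) matters for efficiency, since the inherited version would call the single-byte read() in a loop.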
I assume, by your use of "convert", that a replacement is acceptable.
The easiest way to do this is to just use a ByteArrayInputStream, which already provides all the features you are looking for (but must wrap an existing array), or to use any of the other already provided InputStream for reading data from various sources.
It seems like you may be running the risk of reinventing wheels here. If possible, I would consider scrapping your ByteReader interface entirely, and instead going with one of these options:
Replace with ByteArrayInputStream.
Use the various other InputStream classes (depending on the source of the data).
Extend InputStream with your custom implementation.
I'd stick to the existing InputStream class everywhere. I have no idea how your code is structured but you could, for example, add a getInputStream() method to your current data sources, and have them return an appropriate already-existing InputStream (or a custom subclass if necessary).
By the way, I recommend avoiding the term Reader in your own IO classes, as Reader is already heavily used in the Java SDK to indicate stream readers that operate on encoded character data (as opposed to InputStream which generally operates on raw byte data).
This is a total beginner question, I've spent the past hour searching both stackoverflow and Google, but I haven't found what I'm looking for, hopefully someone here can point me in the right direction.
I'm trying to write a string to an OutputStream, which I will then use to write data to a MySQL database. I've successfully retrieved data from a MySQL database (through a .php page, using JSON and REST), so I have some idea of what I'm doing, I think. I'm creating a method which will take a string and return an output stream, and I'm having trouble writing to an output stream, because when I try to initialize one, it creates an anonymous inner class with the write(int oneByte) method. That's not what I want.
private static OutputStream convertStringtoStream(String string) {
    byte[] stringByte = string.getBytes();
    OutputStream os = new OutputStream() {
        @Override
        public void write(int oneByte) throws IOException {
            /** I'd rather this method be something like
            public void write(byte[] bytes), but it requires int oneByte*/
        }
    };
    //return os here
}
As you can see, I want to write to my OutputStream with the buffer, not a single byte. I'm sure this is a simple question, but I've not been able to find an answer, or even sample code which does what I want. If someone could point me in the right direction I'd really appreciate it. Thanks.
Your method could look like this, but I'm not sure what it would accomplish. How would you use the returned OutputStream?
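private static OutputStream convertStringtoStream(String string) {
    byte[] stringByte = string.getBytes();
    // size the buffer by the byte count, which can exceed string.length()
    ByteArrayOutputStream bos = new ByteArrayOutputStream(stringByte.length);
    // write(byte[], int, int) is used because, unlike the inherited
    // write(byte[]), it does not declare IOException here
    bos.write(stringByte, 0, stringByte.length);
    return bos;
}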
Also, note that using String.getBytes() might get you into trouble in the long run because it uses the system's default encoding. It's better to choose an explicit encoding and use the String.getBytes(Charset) method.
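A minimal sketch of the explicit-encoding variant, assuming UTF-8 is the encoding you want; Utf8Bytes and encode are made-up names.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Encodes with an explicit charset, so the result does not depend on
    // the platform's default encoding.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Unlike getBytes(String charsetName), this overload does not throw the checked UnsupportedEncodingException.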
Instead of subclassing the abstract OutputStream class, you might want to use ByteArrayOutputStream, which lets you write a whole buffer. ObjectOutputStream is another option, since String is serializable and can be written directly, though note that it writes Java's serialization format rather than just the string's bytes. Hope that helps.