When using Jersey, I am encountering an OutOfMemoryError when serving large files. I thought that by using StreamingOutput I would avoid keeping the entire file in memory and therefore avoid an OOM, but that doesn't seem to be the case. This is how we are building the StreamingOutput:
StreamingOutput streamingOutput = new StreamingOutput() {
@Override
public void write(OutputStream outputStream) throws IOException, WebApplicationException {
final InputStream is = tis.getInputStream();
byte[] bbuf = new byte[1024 * 8];
long total = 0;
int length;
while ((is != null) && ((length = is.read(bbuf)) != -1)) {
outputStream.write(bbuf, 0, length);
total += length;
log.trace("Copied {} of {}", total, tis.getFileLength());
}
outputStream.flush();
is.close();
outputStream.close();
}
};
responseBuilder = Response.ok(streamingOutput, tis.getStreamType());
tis is a typedInputStream...
Am I just mistaken in thinking that this should prevent the entire file from being held in memory? I am using Tomcat 7. I have about a gig of free heap, so when I try to download a 1.5 GB file, an OOM is thrown. Is there a mistake in this code? Looking at the heap dump, it seems all of the memory is being used by a byte array; I'm not sure whether I can use the heap dump to figure out exactly where in the code that byte array is being allocated.
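One thing that may be worth ruling out (this is an assumption, not something the post confirms): if some layer between Jersey and the container buffers the entity in order to determine its length before committing the response, supplying the Content-Length up front from the already-known file size would let it stream instead. A minimal sketch, reusing tis.getFileLength() and tis.getStreamType() from the code above:

// Hedged sketch: set Content-Length explicitly so no layer has to buffer the
// body to compute it. HttpHeaders.CONTENT_LENGTH is the javax.ws.rs.core constant.
responseBuilder = Response.ok(streamingOutput, tis.getStreamType())
        .header(HttpHeaders.CONTENT_LENGTH, tis.getFileLength());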
Related
I have a simple method in a controller which streams content from the database. Streaming works as intended; the download starts right after calling the endpoint. The problem is heap usage: streaming a 256 MB file takes 1 GB of heap space. If I replace service.writeContentToStream(param1, param2, out) with a method that reads data from a local file into an input stream and copies it to the passed output stream, the result is the same. The biggest file I can stream is 256 MB. Is there a possible solution to overcome the heap size limit?
@GetMapping("/{param1}/download-stream")
public ResponseEntity<StreamingResponseBody> downloadAsStream(
@PathVariable("param1") String param1,
@RequestParam(value = "param2") String param2
) {
Metadata metadata = service.getMetadata(param1);
StreamingResponseBody stream = out -> service.writeContentToStream(param1, param2, out);
return ResponseEntity.ok()
.header(HttpHeaders.CONTENT_DISPOSITION, "attachment;" + getFileNamePart() + metadata.getFileName())
.header(HttpHeaders.CONTENT_LENGTH, Long.toString(metadata.getFileSize()))
.body(stream);
}
The service.writeContentToStream method:
try (FileInputStream fis = new FileInputStream(fileName)) {
StreamUtils.copy(fis, dataOutputStream);
} catch (IOException e) {
log.error("Error writing file to stream",e);
}
The Metadata class contains only information about the file size and file name; no content is stored there.
EDIT
Implementation of the StreamUtils.copy() method; it comes from the Spring library. The buffer size is set to 4096. Setting the buffer to a smaller size does not allow me to download bigger files.
/**
* Copy the contents of the given InputStream to the given OutputStream.
* Leaves both streams open when done.
* @param in the InputStream to copy from
* @param out the OutputStream to copy to
* @return the number of bytes copied
* @throws IOException in case of I/O errors
*/
public static int copy(InputStream in, OutputStream out) throws IOException {
Assert.notNull(in, "No InputStream specified");
Assert.notNull(out, "No OutputStream specified");
int byteCount = 0;
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
byteCount += bytesRead;
}
out.flush();
return byteCount;
}
I wrote an article back in 2016 regarding StreamingResponseBody when it was first released. You can read that to get more of an idea. But even without that, what you are trying to do with the following code is not scalable at all (imagine 100 users concurrently trying to download):
try (FileInputStream fis = new FileInputStream(fileName)) {
StreamUtils.copy(fis, dataOutputStream);
} catch (IOException e) {
log.error("Error writing file to stream",e);
}
The above code is very memory intensive; only nodes with a lot of memory can work with it, and you will always have an upper bound on the file size (can it download a 1 TB file in 5 years?).
What you should do is the following:
try (FileInputStream fis = new FileInputStream(fileName)) {
byte[] data = new byte[2048];
int read = 0;
while ((read = fis.read(data)) > 0) {
dataOutputStream.write(data, 0, read);
}
dataOutputStream.flush();
} catch (IOException e) {
log.error("Error writing file to stream",e);
}
This way your code can download files of any size, given that the user is able to wait, and it will not require a lot of memory.
Some ideas:
Run the server inside a Java profiler, for example JProfiler (it costs money).
Try ServletResponse.setBufferSize(...) (see the sketch after this list).
Check whether you have any filters configured in the application.
Check the output buffer of the application server. In the case of Tomcat it can be quite tricky, as it has a long list of possible buffers:
https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
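A minimal sketch of the setBufferSize() idea (the mapping name and wiring are assumptions, not taken from the question): with the raw HttpServletResponse injected into the controller method, the container's response buffer can be capped before writing, and the service writes straight to the servlet output stream.

@GetMapping("/{param1}/download-raw")   // hypothetical endpoint, for illustration only
public void downloadRaw(@PathVariable("param1") String param1,
                        @RequestParam("param2") String param2,
                        HttpServletResponse response) throws IOException {
    response.setBufferSize(8 * 1024);   // cap the container's response buffer
    response.setContentType(MediaType.APPLICATION_OCTET_STREAM_VALUE);
    service.writeContentToStream(param1, param2, response.getOutputStream());
    response.flushBuffer();
}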
For me it was a logging dependency, so if you are having problems identifying the cause of the heap usage, take a look at your logging configuration:
<dependency>
<groupId>org.zalando</groupId>
<artifactId>logbook-spring-boot-starter</artifactId>
<version>1.4.1</version>
<scope>compile</scope>
</dependency>
I need to upload a very large file (a few GB) from my machine to a server.
Currently, I tried the approach below, but I keep getting:
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
I can increase the memory, but this is not something I want to do because I am not sure where my code will run. I want to read a few MB/KB, send them to the server, release the memory, and repeat. I tried other approaches like the Files utilities or IOUtils.copyLarge, but I get the same problem.
URL serverUrl = new URL(url);
HttpURLConnection urlConnection = (HttpURLConnection) serverUrl.openConnection();
urlConnection.setConnectTimeout(Configs.TIMEOUT);
urlConnection.setReadTimeout(Configs.TIMEOUT);
File fileToUpload = new File(file);
urlConnection.setDoOutput(true);
urlConnection.setRequestMethod("POST");
urlConnection.addRequestProperty("Content-Type", "application/octet-stream");
urlConnection.connect();
OutputStream output = urlConnection.getOutputStream();
FileInputStream input = new FileInputStream(fileToUpload);
upload(input, output);
//..close streams
private static long upload(InputStream input, OutputStream output) throws IOException {
try (
ReadableByteChannel inputChannel = Channels.newChannel(input);
WritableByteChannel outputChannel = Channels.newChannel(output)
) {
ByteBuffer buffer = ByteBuffer.allocateDirect(10240);
long size = 0;
while (inputChannel.read(buffer) != -1) {
buffer.flip();
size += outputChannel.write(buffer);
buffer.clear();
}
return size;
}
}
I think it has something to do with this but I can't figure out what I am doing wrong.
Another approach I tried, but I get the same issue:
private static long copy(InputStream source, OutputStream sink)
throws IOException {
long nread = 0L;
byte[] buf = new byte[10240];
int n;
int i = 0;
while ((n = source.read(buf)) > 0) {
sink.write(buf, 0, n);
nread += n;
i++;
if (i % 10 == 0) {
log.info("flush");
sink.flush();
}
}
return nread;
}
Use setFixedLengthStreamingMode as per this answer on the duplicate question Denis Tulskiy linked to:
conn.setFixedLengthStreamingMode((int) fileToUpload.length());
From the docs:
This method is used to enable streaming of a HTTP request body without internal buffering, when the content length is known in advance.
At the moment, your code is attempting to buffer the file into Java's heap memory in order to compute the Content-Length header on the HTTP request.
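For context, a sketch of how that fix could slot into the question's upload code (variable names taken from the question; the buffer size and the use of the long overload are my additions). setFixedLengthStreamingMode must be called before the connection is connected, and for files larger than 2 GB the long overload (Java 7+) or setChunkedStreamingMode(int) avoids the int cast:

HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.addRequestProperty("Content-Type", "application/octet-stream");
File fileToUpload = new File(file);
conn.setFixedLengthStreamingMode(fileToUpload.length()); // long overload, Java 7+; call before connect()
conn.connect();
try (InputStream in = new FileInputStream(fileToUpload);
     OutputStream out = conn.getOutputStream()) {
    byte[] buf = new byte[8 * 1024];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);  // only one 8 KB chunk is ever held in memory
    }
    out.flush();
}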
I'm trying to make HTTP Transfer Encoding Chunked work with Netty 4.0.
I had success with it so far; it works well with small payloads.
Then I tried it with large data, and it started to hang.
I suspect there might be a problem with my code, or maybe a leak with ByteBuf.copy().
I stripped down my code to the bare minimum to be sure that I had no other source of leak or side effect, and I ended up writing this test. The complete code is here.
Basically it sends 1GB of 0x0 when you connect with wget to port 8888. I reproduce the problem when I connect with
wget http://127.0.0.1:8888 -O /dev/null
Here's the handler:
protected void channelRead0(ChannelHandlerContext ctx, FullHttpMessage msg) throws Exception {
DefaultHttpResponse response = new DefaultHttpResponse(HTTP_1_1, OK);
HttpHeaders.setTransferEncodingChunked(response);
response.headers().set(CONTENT_TYPE, "application/octet-stream");
ctx.write(response);
ByteBuf buf = Unpooled.buffer();
int GIGABYTE = (4 * 1024 * 1024); // 4M iterations of 256 bytes = 1 GB
for (int i = 0; i < GIGABYTE; i++) {
buf.writeBytes(CONTENT_256BYTES_ZEROED);
ctx.writeAndFlush(new DefaultHttpContent(buf.copy()));
buf.clear();
}
ctx.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT).addListener(ChannelFutureListener.CLOSE);
}
Is there anything wrong with my approach?
EDIT :
With VisualVM I've found that there is a memory leak in the ChannelOutboundBuffer.
The Entry[] buffer keeps growing; addCapacity() is called multiple times. The Entry array seems to contain copies of the buffers that are (or should be) written to the wire.
I see the data coming in with Wireshark...
Here's a Dropbox link to the heapdump
I have found what I was doing wrong.
The for loop that called writeAndFlush() was not working well and is likely the cause of the leak.
I tried various things (see the many revisions in the gist link). See the gist version at the time of writing.
I found out that the best way to achieve what I wanted without memory leaks was to extend InputStream and write to the context (not using writeAndFlush()) the InputStream wrapped in an io.netty.handler.stream.ChunkedStream.
DefaultHttpResponse response = new DefaultHttpResponse(HTTP_1_1, OK);
HttpHeaders.setTransferEncodingChunked(response);
response.headers().set(CONTENT_TYPE, "application/octet-stream");
ctx.write(response);
InputStream is = new InputStream() {
int offset = -1;
byte[] buffer = null;
@Override
public int read() throws IOException {
if (offset == -1 || (buffer != null && offset == buffer.length)) {
fillBuffer();
}
if (buffer == null || offset == -1) {
return -1;
}
while (offset < buffer.length) {
int b = buffer[offset] & 0xFF; // mask so read() stays in the 0-255 range
offset++;
return b;
}
return -1;
}
// this method simulates an application that would write to
// the buffer.
// ONE GB (max size for the test)
int sz = 1024 * 1024 * 1024;
private void fillBuffer() {
offset = 0;
if (sz <= 0) { // LIMIT TO ONE GB
buffer = null;
return;
}
buffer = new byte[1024];
System.arraycopy(CONTENT_1KB_ZEROED, 0,
buffer, 0,
CONTENT_1KB_ZEROED.length);
sz -= 1024;
}
};
ctx.write(new ChunkedStream(new BufferedInputStream(is), 8192));
ctx.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT).addListener(ChannelFutureListener.CLOSE);
The code is writing 1 GB of data to the client in 8 KB chunks. I was able to run 30 simultaneous connections without memory or hanging problems.
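One assumption worth making explicit (the pipeline setup is not shown above and may differ from the gist): the ChunkedStream approach only keeps memory flat because a ChunkedWriteHandler in the pipeline pulls chunks from the stream as the socket becomes writable, instead of queueing the whole payload. A sketch of such a pipeline:

// Inside a ChannelInitializer<SocketChannel>; names beyond the Netty classes are hypothetical.
ch.pipeline().addLast(new HttpServerCodec());
ch.pipeline().addLast(new HttpObjectAggregator(64 * 1024)); // builds the FullHttpMessage the handler receives
ch.pipeline().addLast(new ChunkedWriteHandler());           // drains ChunkedStream as the channel becomes writable
ch.pipeline().addLast(new MyChunkedResponseHandler());      // the handler shown above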
I am reading a BLOB column from an Oracle database, then writing it to a file as follows:
public static int execute(String filename, BLOB blob)
{
int success = 1;
try
{
File blobFile = new File(filename);
FileOutputStream outStream = new FileOutputStream(blobFile);
BufferedInputStream inStream = new BufferedInputStream(blob.getBinaryStream());
int length = -1;
int size = blob.getBufferSize();
byte[] buffer = new byte[size];
while ((length = inStream.read(buffer)) != -1)
{
outStream.write(buffer, 0, length);
outStream.flush();
}
inStream.close();
outStream.close();
}
catch (Exception e)
{
e.printStackTrace();
System.out.println("ERROR(img_exportBlob) Unable to export:"+filename);
success = 0;
}
return success;
}
The file size is around 3 MB and it takes 40-50 s to read the buffer. It's actually 3D image data. So, is there any way I can reduce this time?
Given that the blob already has the concept of a buffer, it's possible that you're actually harming performance by using the BufferedInputStream at all - it may be making smaller read() calls, making more network calls than necessary.
Try getting rid of the BufferedInputStream completely, just reading directly from the blob's binary stream. It's only a thought, but worth a try. Oh, and you don't need to flush the output stream every time you write.
(As an aside, you should be closing streams in finally blocks - otherwise you'll leak handles if anything throws an exception.)
Update in Java 9: https://docs.oracle.com/javase/9/docs/api/java/io/InputStream.html#transferTo-java.io.OutputStream-
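Putting the suggestions above together, a minimal sketch (names reused from the question; not a drop-in replacement for the original method): read the blob's binary stream directly, copy with a single buffer sized by blob.getBufferSize(), skip the per-write flush, and let try-with-resources close both streams even if something throws.

try (InputStream in = blob.getBinaryStream();
     OutputStream out = new FileOutputStream(filename)) {
    byte[] buffer = new byte[blob.getBufferSize()]; // use the blob's preferred chunk size
    int length;
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
    // On Java 9+, the loop above can be replaced with: in.transferTo(out);
}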
I saw some similar, but not-quite-what-I-need, threads.
I have a server, which will basically take input from a client, client A, and forward it, byte for byte, to another client, client B.
I'd like to connect the input stream of client A with the output stream of client B. Is that possible? What are some ways to do that?
Also, these clients are sending each other messages which are somewhat time sensitive, so buffering won't do. I do not want a buffer of, say, 500 bytes where a client sends 499 bytes and then my server holds off on forwarding them because it hasn't received the last byte to fill the buffer.
Right now, I am parsing each message to find its length, then reading length bytes, then forwarding them. I figured (and tested) this would be better than reading a byte and forwarding a byte over and over because that would be very slow. I also did not want to use a buffer or a timer for the reason I stated in my last paragraph — I do not want messages waiting a really long time to get through simply because the buffer isn't full.
What's a good way to do this?
Just because you use a buffer doesn't mean the stream has to fill that buffer. In other words, this should be okay:
public static void copyStream(InputStream input, OutputStream output)
throws IOException
{
byte[] buffer = new byte[1024]; // Adjust if you want
int bytesRead;
while ((bytesRead = input.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
}
That should work fine - basically the read call will block until there's some data available, but it won't wait until it's all available to fill the buffer. (I suppose it could, and I believe FileInputStream usually will fill the buffer, but a stream attached to a socket is more likely to give you the data immediately.)
I think it's worth at least trying this simple solution first.
How about just using
void feedInputToOutput(InputStream in, OutputStream out) throws IOException {
IOUtils.copy(in, out);
}
and be done with it?
It comes from the Apache Commons IO library (formerly Jakarta Commons), which is used by a huge number of projects, so you probably already have the jar on your classpath.
JDK 9 has added InputStream#transferTo(OutputStream out) for this functionality.
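A minimal usage sketch (the two sockets are placeholders standing in for the question's client A and client B): transferTo copies until end of stream, returns the number of bytes moved, and does not close either stream.

// Java 9+; socketA/socketB are assumed to be already-connected java.net.Socket instances.
try (InputStream in = socketA.getInputStream();
     OutputStream out = socketB.getOutputStream()) {
    long copied = in.transferTo(out);
}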
For completeness, guava also has a handy utility for this
ByteStreams.copy(input, output);
You can use a circular buffer :
Code
// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());
Maven dependency
<dependency>
<groupId>org.ostermiller</groupId>
<artifactId>utils</artifactId>
<version>1.07.00</version>
</dependency>
More details
http://ostermiller.org/utils/CircularBuffer.html
An asynchronous way to achieve it:
void inputStreamToOutputStream(final InputStream inputStream, final OutputStream out) {
Thread t = new Thread(new Runnable() {
public void run() {
try {
int d;
while ((d = inputStream.read()) != -1) {
out.write(d);
}
} catch (IOException ex) {
//TODO make a callback on exception.
}
}
});
t.setDaemon(true);
t.start();
}
BUFFER_SIZE is the size of the chunks to read in. It should be > 1 KB and < 10 MB.
private static final int BUFFER_SIZE = 2 * 1024 * 1024;
private void copy(InputStream input, OutputStream output) throws IOException {
try {
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = input.read(buffer);
while (bytesRead != -1) {
output.write(buffer, 0, bytesRead);
bytesRead = input.read(buffer);
}
//If needed, close streams.
} finally {
input.close();
output.close();
}
}
Use org.apache.commons.io.IOUtils
InputStream inStream = new ...
OutputStream outStream = new ...
IOUtils.copy(inStream, outStream);
or IOUtils.copyLarge for sizes over 2 GB
This is a Scala version that is clean and fast (no stack overflow):
import scala.annotation.tailrec
import java.io._
implicit class InputStreamOps(in: InputStream) {
def >(out: OutputStream): Unit = pipeTo(out)
def pipeTo(out: OutputStream, bufferSize: Int = 1<<10): Unit = pipeTo(out, Array.ofDim[Byte](bufferSize))
@tailrec final def pipeTo(out: OutputStream, buffer: Array[Byte]): Unit = in.read(buffer) match {
case n if n > 0 =>
out.write(buffer, 0, n)
pipeTo(out, buffer)
case _ =>
in.close()
out.close()
}
}
This enables use of the > symbol, e.g. inputstream > outputstream, and also lets you pass in custom buffers/sizes.
In case you are into functional programming, this is a function written in Scala showing how you could copy an input stream to an output stream using only vals (and not vars).
def copyInputToOutputFunctional(inputStream: InputStream, outputStream: OutputStream,bufferSize: Int) {
val buffer = new Array[Byte](bufferSize);
def recurse() {
val len = inputStream.read(buffer);
if (len > 0) {
outputStream.write(buffer.take(len));
recurse();
}
}
recurse();
}
Note that this is not recommended for use in a Java application with little memory available, because with a recursive function you could easily get a stack overflow error.