I have a simple method in a controller which streams content from the database. Streaming works as intended; the download starts right after calling the endpoint. The problem is heap usage: streaming a 256 MB file takes 1 GB of heap space. If I replace service.writeContentToStream(param1, param2, out) with a method that reads data from a local file into an input stream and copies it to the passed output stream, the result is the same. The biggest file I can stream is 256 MB. Is there a possible solution to overcome the heap size limit?
@GetMapping("/{param1}/download-stream")
public ResponseEntity<StreamingResponseBody> downloadAsStream(
        @PathVariable("param1") String param1,
        @RequestParam(value = "param2") String param2
) {
    Metadata metadata = service.getMetadata(param1);
    StreamingResponseBody stream = out -> service.writeContentToStream(param1, param2, out);
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment;" + getFileNamePart() + metadata.getFileName())
            .header(HttpHeaders.CONTENT_LENGTH, Long.toString(metadata.getFileSize()))
            .body(stream);
}
The service.writeContentToStream method:
try (FileInputStream fis = new FileInputStream(fileName)) {
    StreamUtils.copy(fis, dataOutputStream);
} catch (IOException e) {
    log.error("Error writing file to stream", e);
}
The Metadata class contains only information about the file size and file name; no content is stored there.
EDIT
Implementation of the StreamUtils.copy() method; it comes from the Spring library. The buffer size is set to 4096. Setting the buffer to a smaller size does not let me download bigger files.
/**
 * Copy the contents of the given InputStream to the given OutputStream.
 * Leaves both streams open when done.
 * @param in the InputStream to copy from
 * @param out the OutputStream to copy to
 * @return the number of bytes copied
 * @throws IOException in case of I/O errors
 */
public static int copy(InputStream in, OutputStream out) throws IOException {
    Assert.notNull(in, "No InputStream specified");
    Assert.notNull(out, "No OutputStream specified");
    int byteCount = 0;
    byte[] buffer = new byte[BUFFER_SIZE];
    int bytesRead = -1;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
        byteCount += bytesRead;
    }
    out.flush();
    return byteCount;
}
I wrote an article back in 2016 regarding StreamingResponseBody, when it was first released; you can read that to get more of an idea. But even without that, what you are trying to do with the following code is not scalable at all (imagine 100 users concurrently trying to download):
try (FileInputStream fis = new FileInputStream(fileName)) {
    StreamUtils.copy(fis, dataOutputStream);
} catch (IOException e) {
    log.error("Error writing file to stream", e);
}
The above approach is very memory intensive; only nodes with a lot of memory can cope with it, and you will always have an upper bound on the file size (will it be able to download a 1 TB file in 5 years?). What you should do instead is the following:
try (FileInputStream fis = new FileInputStream(fileName)) {
    byte[] data = new byte[2048];
    int read = 0;
    while ((read = fis.read(data)) > 0) {
        dataOutputStream.write(data, 0, read);
    }
    dataOutputStream.flush();
} catch (IOException e) {
    log.error("Error writing file to stream", e);
}
This way your code can download files of any size (provided the user is willing to wait) without requiring a lot of memory.
Some ideas:
Run the server inside a Java profiler, for example JProfiler (it costs money).
Try ServletResponse.setBufferSize(...) (see the sketch after this list).
Check whether you have any filters configured in the application.
Check the output buffer of the application server. In the case of Tomcat this can be quite tricky; it has a long list of possible buffers:
https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
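For the setBufferSize idea, a minimal sketch (the mapping name is hypothetical; it reuses service and the parameter names from the question, and the 8 KB value is just an example): writing straight to the response with a small buffer, so the container flushes chunks to the socket instead of accumulating the whole body.
import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;

@GetMapping("/{param1}/download-direct")
public void downloadDirect(@PathVariable("param1") String param1,
                           @RequestParam("param2") String param2,
                           HttpServletResponse response) throws IOException {
    response.setBufferSize(8 * 1024); // flush to the client roughly every 8 KB
    response.setContentType("application/octet-stream");
    try (OutputStream out = response.getOutputStream()) {
        service.writeContentToStream(param1, param2, out);
    }
}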
For me it was a logging dependency. So if you are having problems identifying the cause of the heap usage, take a look at your logging configuration:
<dependency>
    <groupId>org.zalando</groupId>
    <artifactId>logbook-spring-boot-starter</artifactId>
    <version>1.4.1</version>
    <scope>compile</scope>
</dependency>
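If Logbook turns out to be the culprit (it buffers request/response bodies in order to log them), one option is to exclude the streaming endpoint from body logging. A sketch using the condition API from the Logbook README; exact names and signatures vary between versions, so treat this as an assumption to verify:
import org.springframework.context.annotation.Bean;
import org.zalando.logbook.Logbook;
import static org.zalando.logbook.Conditions.*;

@Bean
public Logbook logbook() {
    // Skip logging for download endpoints and binary responses.
    return Logbook.builder()
            .condition(exclude(
                    requestTo("/**/download-stream"),
                    contentType("application/octet-stream")))
            .build();
}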
Related
I have a function which writes the given input stream to a given output stream. Code below.
static void copyStream(InputStream is, OutputStream os) throws IOException {
    byte[] buffer = new byte[4096];
    int len;
    while ((len = is.read(buffer)) != -1) {
        os.write(buffer, 0, len);
    }
}
The above function is called from this function
public static void copyFile(File srcFile, File destFile) throws IOException {
    FileInputStream fis = new FileInputStream(srcFile);
    try {
        FileOutputStream fos = new FileOutputStream(destFile);
        try {
            copyStream(fis, fos);
        } finally {
            if (fos != null)
                fos.close();
        }
    } finally {
        if (fis != null)
            fis.close();
    }
}
In this function, I am writing 4 KB at a time (the buffer is 4096 bytes). I use this function to copy images. Occasionally I see that the destination file is not created, due to which an exception occurs when trying to read that file for further processing. I am guessing the culprit is not closing the resources. Is my hypothesis right? What are the reasons my function might fail? Please help.
I believe the given InputStream and OutputStream are set up correctly.
Add os.flush(); at the end. Of course, both streams should be closed in the caller as well.
As an alternative, you could use Apache Commons IO's org.apache.commons.io.IOUtils.copy(InputStream input, OutputStream output).
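For example, a sketch of the same copy using try-with-resources, so both files are closed even when an exception is thrown, combined with the IOUtils.copy mentioned above:
import java.io.*;
import org.apache.commons.io.IOUtils;

public static void copyFile(File srcFile, File destFile) throws IOException {
    try (InputStream is = new FileInputStream(srcFile);
         OutputStream os = new FileOutputStream(destFile)) {
        IOUtils.copy(is, os); // copies in chunks; both streams close automatically
    }
}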
Yes, you absolutely must close your destination file to ensure that all caches, from the JVM through to the OS, are flushed and the file is ready for a reader to consume.
Copying large files the way you are doing it is concise in code but inefficient in operation. Consider upgrading your code to use the more efficient NIO methods, documented here in a blog post. In case that blog disappears, here's the code:
Utility class:
public final class ChannelTools {
    public static void fastChannelCopy(final ReadableByteChannel src, final WritableByteChannel dest) throws IOException {
        final ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
        while (src.read(buffer) != -1) {
            // prepare the buffer to be drained
            buffer.flip();
            // write to the channel, may block
            dest.write(buffer);
            // If partial transfer, shift remainder down
            // If buffer is empty, same as doing clear()
            buffer.compact();
        }
        // EOF will leave buffer in fill state
        buffer.flip();
        // make sure the buffer is fully drained
        while (buffer.hasRemaining()) {
            dest.write(buffer);
        }
    }
}
Usage example with your InputStream and OutputStream:
// allocate the streams ... only for example
final InputStream input = new FileInputStream(inputFile);
final OutputStream output = new FileOutputStream(outputFile);
// get a channel from each stream
final ReadableByteChannel inputChannel = Channels.newChannel(input);
final WritableByteChannel outputChannel = Channels.newChannel(output);
// copy the channels
ChannelTools.fastChannelCopy(inputChannel, outputChannel);
// close the channels
inputChannel.close();
outputChannel.close();
There is also a more concise method, documented on Wikipedia, that achieves the same thing with less code:
// Getting file channels
FileChannel in = new FileInputStream(source).getChannel();
FileChannel out = new FileOutputStream(target).getChannel();
// JavaVM does its best to do this as native I/O operations.
in.transferTo(0, in.size(), out);
// Closing file channels will close corresponding stream objects as well.
out.close();
in.close();
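One caveat worth noting: transferTo is not guaranteed to move everything in a single call; it returns the number of bytes actually transferred, so a robust version loops until the whole file is copied:
// Loop until the whole file has been transferred.
long position = 0;
long size = in.size();
while (position < size) {
    position += in.transferTo(position, size - position, out);
}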
I need to serve a binary file through a web service implemented in Python/Django. The problem is that when I compare the original file with the transferred file in vbindiff, I see trailing bytes on the transferred file, sadly rendering it useless.
The binary file is accessed and saved by a Java client with:
HttpURLConnection userdataConnection = null;
URL userdataUrl = null;
try {
    userdataUrl = new URL("http://localhost:8000/app/vuforia/10");
    userdataConnection = (HttpURLConnection) userdataUrl.openConnection();
    userdataConnection.setRequestMethod("GET");
    userdataConnection.setRequestProperty("Content-Type", "application/octet-stream");
    userdataConnection.connect();
    InputStream userdataStream = new BufferedInputStream(userdataConnection.getInputStream());
    try (ByteArrayOutputStream fileStream = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[4094];
        while (userdataStream.read(buffer) != -1) {
            fileStream.write(buffer);
        }
        byte[] fileBytes = fileStream.toByteArray();
        try (FileOutputStream fos = new FileOutputStream("./test.dat")) {
            fos.write(fileBytes);
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
I think HttpURLConnection.getInputStream only reads the body of the response, doesn't it?
This code serves the data in the backend
in views.py:
if request.method == "GET":
    all_data = VuforiaDatabase.objects.all()
    data = all_data.get(id=version)
    return FileResponse(data.get_dat_bytes())
in models.py:
def get_dat_bytes(self):
    return self.dat_upload.open()
How do I go about transferring the binary data 1:1?
You’re ignoring the return value of InputStream.read.
From the documentation:
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
Your code is assuming that the buffer is filled with every call to userdataStream.read(buffer), instead of checking how many bytes were actually read into buffer.
You don't need to write the read loop yourself at all. Just use Files.copy:
Path file = Paths.get("./test.dat");
try (InputStream userdataStream = new BufferedInputStream(userdataConnection.getInputStream())) {
    Files.copy(userdataStream, file, StandardCopyOption.REPLACE_EXISTING);
}
You always write a multiple of 4094 bytes, no matter how many bytes you actually read.
Don't do .write(buffer); write only the amount you actually read. That is what userdataStream.read returns; it can return a number smaller than the buffer size, but still positive.
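Concretely, the fixed loop looks like this (same variable names as in the question):
int bytesRead;
while ((bytesRead = userdataStream.read(buffer)) != -1) {
    fileStream.write(buffer, 0, bytesRead); // write only what was actually read
}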
If your project is using Apache Commons already, you can just use copyInputStreamToFile.
Note: 4K = 4096, not 4094, and it's a ridiculously small buffer unless you are operating something like a smart card. On a PC, use at least a few hundred KB.
I was trying to read a file into an array using FileInputStream, and an ~800KB file took about 3 seconds to read into memory. I then tried the same code, except with the FileInputStream wrapped in a BufferedInputStream, and it took about 76 milliseconds. Why is reading a file byte by byte so much faster with a BufferedInputStream, even though I'm still reading it byte by byte? Here's the code (the rest of the code is entirely irrelevant). Note that this is the "fast" code. You can just remove the BufferedInputStream if you want the "slow" code:
InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(file));
    int[] fileArr = new int[(int) file.length()];
    for (int i = 0, temp = 0; (temp = is.read()) != -1; i++) {
        fileArr[i] = temp;
    }
BufferedInputStream is over 30 times faster here; far more than that, in fact. So, why is this, and is it possible to make this code more efficient (without using any external libraries)?
In FileInputStream, the method read() reads a single byte. From the source code:
/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return the next byte of data, or <code>-1</code> if the end of the
 * file is reached.
 * @exception IOException if an I/O error occurs.
 */
public native int read() throws IOException;
This is a native call to the OS, which goes to the disk to read a single byte; that is a heavy operation.
With a BufferedInputStream, the method delegates to an overloaded read() method that reads 8192 bytes at a time and buffers them until they are needed. It still returns only a single byte (but keeps the others in reserve). This way the BufferedInputStream makes far fewer native calls to the OS to read from the file.
For example, say your file is 32768 bytes long. To get all the bytes into memory with a plain FileInputStream, you will require 32768 native calls to the OS. With a BufferedInputStream, you will only require 4, regardless of the number of read() calls you make (still 32768).
As to how to make it faster, you might want to consider Java 7's NIO FileChannel class, but I have no evidence to support this.
Note: if you used FileInputStream's read(byte[], int, int) method directly instead, with a byte[>8192] you wouldn't need a BufferedInputStream wrapping it.
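For illustration, a sketch of that direct chunked read (the 64 KB size is arbitrary; anything comfortably above 8192 works):
try (FileInputStream fis = new FileInputStream(file)) {
    byte[] chunk = new byte[64 * 1024]; // bigger than BufferedInputStream's 8 KB default
    int n;
    while ((n = fis.read(chunk, 0, chunk.length)) != -1) {
        // process the first n bytes of chunk here
    }
}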
A BufferedInputStream wrapped around a FileInputStream requests data from the FileInputStream in big chunks (8192 bytes by default). Thus, if you read 1000 characters one at a time, the FileInputStream only has to go to the disk once. This will be much faster!
It is because of the cost of disk access. Let's assume you have a file whose size is 8 KB; 8 * 1024 disk accesses would be needed to read this file byte by byte without a BufferedInputStream.
At this point, the BufferedInputStream comes onto the scene and acts as a middleman between the FileInputStream and the file to be read.
In one shot, it fetches a chunk of bytes (8 KB by default) into memory, and subsequent read() calls are then served from this middleman.
This decreases the time the operation takes.
private void exercise1WithBufferedStream() {
    long start = System.currentTimeMillis();
    try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
        BufferedInputStream bufferedInputStream = new BufferedInputStream(myFile);
        boolean eof = false;
        while (!eof) {
            int inByteValue = bufferedInputStream.read();
            if (inByteValue == -1) eof = true;
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed with buffered:" + (System.currentTimeMillis() - start));
}

private void exercise1() {
    long start = System.currentTimeMillis();
    try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
        boolean eof = false;
        while (!eof) {
            int inByteValue = myFile.read();
            if (inByteValue == -1) eof = true;
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed without buffered:" + (System.currentTimeMillis() - start));
}
I am reading a BLOB column from an Oracle database, then writing it to a file as follows:
public static int execute(String filename, BLOB blob)
{
    int success = 1;
    try
    {
        File blobFile = new File(filename);
        FileOutputStream outStream = new FileOutputStream(blobFile);
        BufferedInputStream inStream = new BufferedInputStream(blob.getBinaryStream());
        int length = -1;
        int size = blob.getBufferSize();
        byte[] buffer = new byte[size];
        while ((length = inStream.read(buffer)) != -1)
        {
            outStream.write(buffer, 0, length);
            outStream.flush();
        }
        inStream.close();
        outStream.close();
    }
    catch (Exception e)
    {
        e.printStackTrace();
        System.out.println("ERROR(img_exportBlob) Unable to export:" + filename);
        success = 0;
    }
    return success;
}
The file size is around 3 MB and it takes 40-50 seconds to read the buffer. It's actually 3D image data. So, is there any way I can reduce this time?
Given that the blob already has the concept of a buffer, it's possible that you're actually harming performance by using the BufferedInputStream at all: it may be making smaller read() calls, resulting in more network calls than necessary.
Try getting rid of the BufferedInputStream completely and just read directly from the blob's binary stream. It's only a thought, but worth a try. Oh, and you don't need to flush the output stream every time you write.
(As an aside, you should be closing the streams in finally blocks; otherwise you'll leak handles if anything throws an exception.)
Update for Java 9: https://docs.oracle.com/javase/9/docs/api/java/io/InputStream.html#transferTo-java.io.OutputStream-
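With Java 9+, the whole copy (including the missing stream handling) can shrink to a sketch like this, reusing the names from the question:
try (InputStream in = blob.getBinaryStream();
     OutputStream out = new FileOutputStream(filename)) {
    in.transferTo(out); // loops internally until EOF
}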
I saw some similar, but not-quite-what-I-need, threads.
I have a server, which will basically take input from a client, client A, and forward it, byte for byte, to another client, client B.
I'd like to connect my inputstream of client A with my output stream of client B. Is that possible? What are ways to do that?
Also, these clients are sending each other messages, which are somewhat time sensitive, so buffering won't do. I do not want a buffer of, say, 500 where a client sends 499 bytes and my server then holds off on forwarding because it hasn't received the last byte to fill the buffer.
Right now, I am parsing each message to find its length, then reading that many bytes, then forwarding them. I figured (and tested) that this would be better than reading a byte and forwarding a byte over and over, because that would be very slow. I also did not want to use a buffer or a timer, for the reason I stated in my last paragraph: I do not want messages waiting a really long time to get through simply because the buffer isn't full.
What's a good way to do this?
Just because you use a buffer doesn't mean the stream has to fill that buffer. In other words, this should be okay:
public static void copyStream(InputStream input, OutputStream output)
        throws IOException
{
    byte[] buffer = new byte[1024]; // Adjust if you want
    int bytesRead;
    while ((bytesRead = input.read(buffer)) != -1)
    {
        output.write(buffer, 0, bytesRead);
    }
}
That should work fine - basically the read call will block until there's some data available, but it won't wait until it's all available to fill the buffer. (I suppose it could, and I believe FileInputStream usually will fill the buffer, but a stream attached to a socket is more likely to give you the data immediately.)
I think it's worth at least trying this simple solution first.
How about just using
void feedInputToOutput(InputStream in, OutputStream out) throws IOException {
    IOUtils.copy(in, out);
}
and be done with it?
It comes from the Apache Commons IO library (formerly Jakarta Commons), which is used by a huge number of projects, so you probably already have the jar in your classpath.
JDK 9 has added InputStream#transferTo(OutputStream out) for this functionality.
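With it, the method above reduces to:
void feedInputToOutput(InputStream in, OutputStream out) throws IOException {
    in.transferTo(out); // JDK 9+; copies until EOF using an internal buffer
}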
For completeness, guava also has a handy utility for this
ByteStreams.copy(input, output);
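For reference, the import and signature; it returns the number of bytes copied as a long and leaves both streams open:
import com.google.common.io.ByteStreams;

long copied = ByteStreams.copy(input, output);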
You can use a circular buffer:
Code
// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());
Maven dependency
<dependency>
    <groupId>org.ostermiller</groupId>
    <artifactId>utils</artifactId>
    <version>1.07.00</version>
</dependency>
More details:
http://ostermiller.org/utils/CircularBuffer.html
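Note that the snippet above produces and consumes on the same thread, which works because the buffer is unbounded. With a bounded buffer the producer must run on its own thread so the consumer can drain it concurrently; a sketch assuming the Ostermiller API shown above (class1/class2 are the placeholders from the snippet):
CircularByteBuffer cbb = new CircularByteBuffer(64 * 1024); // bounded: 64 KB
new Thread(() -> {
    try (OutputStream out = cbb.getOutputStream()) {
        class1.putDataOnOutputStream(out); // closing the stream signals EOF
    } catch (IOException e) {
        // TODO: surface the failure to the consumer
    }
}).start();
class2.processDataFromInputStream(cbb.getInputStream());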
An asynchronous way to achieve it:
void inputStreamToOutputStream(final InputStream inputStream, final OutputStream out) {
    Thread t = new Thread(new Runnable() {
        public void run() {
            try {
                int d;
                while ((d = inputStream.read()) != -1) {
                    out.write(d);
                }
            } catch (IOException ex) {
                // TODO make a callback on exception.
            }
        }
    });
    t.setDaemon(true);
    t.start();
}
BUFFER_SIZE is the size of the chunks to read in. It should be > 1 KB and < 10 MB.
private static final int BUFFER_SIZE = 2 * 1024 * 1024;

private void copy(InputStream input, OutputStream output) throws IOException {
    try {
        byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead = input.read(buffer);
        while (bytesRead != -1) {
            output.write(buffer, 0, bytesRead);
            bytesRead = input.read(buffer);
        }
        // If needed, close streams.
    } finally {
        input.close();
        output.close();
    }
}
Use org.apache.commons.io.IOUtils
InputStream inStream = new ...
OutputStream outStream = new ...
IOUtils.copy(inStream, outStream);
or copyLarge for sizes > 2 GB (copy returns the byte count as an int, which would overflow)
This is a Scala version that is clean and fast (no stack overflow):
import scala.annotation.tailrec
import java.io._

implicit class InputStreamOps(in: InputStream) {
  def >(out: OutputStream): Unit = pipeTo(out)

  def pipeTo(out: OutputStream, bufferSize: Int = 1 << 10): Unit =
    pipeTo(out, Array.ofDim[Byte](bufferSize))

  @tailrec final def pipeTo(out: OutputStream, buffer: Array[Byte]): Unit = in.read(buffer) match {
    case n if n > 0 =>
      out.write(buffer, 0, n)
      pipeTo(out, buffer)
    case _ =>
      in.close()
      out.close()
  }
}
This enables using the > symbol, e.g. inputstream > outputstream, and also passing in custom buffers/sizes.
In case you are into functional programming, this is a function written in Scala showing how you could copy an input stream to an output stream using only vals (and not vars).
def copyInputToOutputFunctional(inputStream: InputStream, outputStream: OutputStream, bufferSize: Int) {
  val buffer = new Array[Byte](bufferSize);
  def recurse() {
    val len = inputStream.read(buffer);
    if (len > 0) {
      outputStream.write(buffer.take(len));
      recurse();
    }
  }
  recurse();
}
Note that this is not recommended for use in an application with little memory available, because a deeply recursive function like this can easily throw a StackOverflowError.