We have an environment where each user may get different HTML/JS/CSS resources. I'm using the following code to compress and transfer a JavaScript resource:
public static byte[] compress(String str) throws IOException {
if (str == null || str.length() == 0) {
return null;
}
ByteArrayOutputStream obj=new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(obj);
gzip.write(str.getBytes("UTF-8"));
gzip.close();
return obj.toByteArray();
}
...
HttpServletResponse raw = response.raw();
raw.setBufferSize(file.length().intValue());
ServletOutputStream servletOutputStream = raw.getOutputStream();
servletOutputStream.write(compress(FileUtils.readFileToString(file)));
servletOutputStream.flush();
servletOutputStream.close();
...
Inspecting the problem in the Chrome Network tab, the download time is 2 seconds for 300KB of compressed data, which seems unreasonable.
The problem is not the bandwidth or Jetty itself, because static resources transfer quickly.
I don't know if this is the source of your bottleneck, but I wouldn't do:
raw.setBufferSize(file.length().intValue());
If the gzipped output is around 300KB, this sizes the response buffer to the uncompressed file length, which may be an order of magnitude larger than what you actually send. And you don't need a big response buffer at all when you are streaming static content.
From the servlet javadoc:
Sets the preferred buffer size for the body of the response. The
servlet container will use a buffer at least as large as the size
requested. The actual buffer size used can be found using
getBufferSize.
A larger buffer allows more content to be written before anything is
actually sent, thus providing the servlet with more time to set
appropriate status codes and headers. A smaller buffer decreases
server memory load and allows the client to start receiving data more
quickly.
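A minimal sketch of what "streaming" could look like here, assuming the goal is to compress while writing instead of building the whole gzipped byte[] first. The class and method names (GzipStreaming, gzipTo, gunzip) are hypothetical. In the servlet, the sink would be raw.getOutputStream(), and the elided part of the question's code is assumed to set the Content-Encoding: gzip header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipStreaming {

    /** Compresses everything read from 'in' into 'out' through a small fixed buffer. */
    static void gzipTo(InputStream in, OutputStream out) throws IOException {
        // Compressed bytes are handed to 'out' as they are produced, so the
        // full compressed result is never held in one array.
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            gzip.write(buffer, 0, n);
        }
        // finish() writes the gzip trailer without closing 'out'.
        gzip.finish();
        out.flush();
    }

    /** Demo helper: decompresses a gzip byte array back into a UTF-8 string. */
    static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "var x = 1;".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        gzipTo(new ByteArrayInputStream(source), sink);
        // Round trip: decompressing the sink should give back the source text.
        System.out.println(gunzip(sink.toByteArray()));
    }
}
```

With this shape the default response buffer is enough, so the setBufferSize call can simply be dropped.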
We are using Java 8 and the AWS SDK to programmatically upload files to AWS S3. For uploading large files (>100MB), we read that the preferred method is Multipart Upload. We tried that, but it does not seem to speed things up: upload time remains almost the same as without multipart upload. Worse, we even encountered out-of-memory errors saying heap space is not sufficient.
Questions:
Is multipart upload really supposed to speed up the upload? If not, then why use it?
Why does multipart upload eat up memory faster than a regular upload? Does it upload all the parts concurrently?
See below for the code we used:
private static void uploadFileToS3UsingBase64(String bucketName, String region, String accessKey, String secretKey,
String fileBase64String, String s3ObjectKeyName) {
byte[] bI = org.apache.commons.codec.binary.Base64.decodeBase64((fileBase64String.substring(fileBase64String.indexOf(",")+1)).getBytes());
InputStream fis = new ByteArrayInputStream(bI);
long start = System.currentTimeMillis();
AmazonS3 s3Client = null;
TransferManager tm = null;
try {
s3Client = AmazonS3ClientBuilder.standard().withRegion(region)
.withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials(accessKey, secretKey)))
.build();
tm = TransferManagerBuilder.standard()
.withS3Client(s3Client)
.withMultipartUploadThreshold((long) (50* 1024 * 1025))
.build();
ObjectMetadata metadata = new ObjectMetadata();
metadata.setHeader(Headers.STORAGE_CLASS, StorageClass.Standard);
PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, s3ObjectKeyName,
fis, metadata).withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams());
Upload upload = tm.upload(putObjectRequest);
// Optionally, wait for the upload to finish before continuing.
upload.waitForCompletion();
long end = System.currentTimeMillis();
long duration = (end - start)/1000;
// Log status
System.out.println("Successful upload in S3 multipart. Duration = " + duration);
} catch (Exception e) {
e.printStackTrace();
} finally {
if (s3Client != null)
s3Client.shutdown();
if (tm != null)
tm.shutdownNow();
}
}
Using multipart will only speed up the upload if you upload multiple parts at the same time.
In your code you're setting withMultipartUploadThreshold. If your upload size is larger than that threshold, then you should observe concurrent upload of separate parts. If it is not, then only one upload connection should be used. You're saying that you have >100 MB file, and in your code you have 50 * 1024 * 1025 = 52 480 000 bytes as the multipart upload threshold, so concurrent upload of parts of that file should have been happening.
However, if your upload throughput is anyway capped by your network speed, there would not be any increase in throughput. This might be the reason you're not observing any speed increase.
There are other reasons to use multipart too, as it is recommended for fault tolerance reasons as well. Also, it has a larger maximum size than single upload.
For more details see documentation:
Multipart upload allows you to upload a single object as a set of
parts. Each part is a contiguous portion of the object's data. You can
upload these object parts independently and in any order. If
transmission of any part fails, you can retransmit that part without
affecting other parts. After all parts of your object are uploaded,
Amazon S3 assembles these parts and creates the object. In general,
when your object size reaches 100 MB, you should consider using
multipart uploads instead of uploading the object in a single
operation.
Using multipart upload provides the following advantages:
Improved throughput - You can upload parts in parallel to improve throughput.
Quick recovery from any network issues - Smaller part size minimizes the impact of restarting a failed upload due to a network
error.
Pause and resume object uploads - You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you
must explicitly complete or stop the multipart upload.
Begin an upload before you know the final object size - You can upload an object as you are creating it.
We recommend that you use multipart upload in the following ways:
If you're uploading large objects over a stable high-bandwidth network, use multipart upload to maximize the use of your available
bandwidth by uploading object parts in parallel for multi-threaded
performance.
If you're uploading over a spotty network, use multipart upload to increase resiliency to network errors by avoiding upload restarts.
When using multipart upload, you need to retry uploading only parts
that are interrupted during the upload. You don't need to restart
uploading your object from the beginning.
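To make the "set of parts" idea concrete, here is a small sketch of how an object could be divided into contiguous byte ranges. It is plain arithmetic and does not call the AWS SDK; the class name PartPlanner and the 16 MiB part size are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class PartPlanner {

    /** Returns {offset, length} pairs that cover objectSize bytes in partSize chunks. */
    static List<long[]> planParts(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and partSize > 0");
        }
        List<long[]> parts = new ArrayList<>();
        for (long offset = 0; offset < objectSize; offset += partSize) {
            // The last part is whatever remains and may be shorter than partSize.
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new long[] {offset, length});
        }
        return parts;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical sizes: a 120 MiB object cut into 16 MiB parts.
        for (long[] part : planParts(120 * mib, 16 * mib)) {
            System.out.println("offset=" + part[0] + " length=" + part[1]);
        }
    }
}
```

Each range is independent of the others, which is what makes parallel upload and per-part retry possible. Whether parallelism actually helps still depends on the available bandwidth, as noted above.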
The answer from eis is very good, though there are still some things you should change:
Use String.getBytes(StandardCharsets.US_ASCII) (or ISO_8859_1) rather than the no-argument getBytes(), which uses the platform default encoding. Base64 text is pure ASCII, so this avoids a more costly encoding such as UTF-8, and if the platform encoding were UTF-16LE the data would even be corrupted (interleaved 0x00 bytes).
The standard java.util.Base64 class has decoders and encoders that might work, and it can decode a String directly. However, check that line endings are handled correctly.
try-with-resources also closes resources when an exception is thrown or the method returns early.
The ByteArrayInputStream was not closed; closing it would have been better style (easier garbage collection?).
You could set the ExecutorFactory to a thread pool factory that limits the number of threads globally.
So:
byte[] bI = Base64.getDecoder().decode(
fileBase64String.substring(fileBase64String.indexOf(',') + 1));
try (InputStream fis = new ByteArrayInputStream(bI)) {
...
}
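A self-contained variant of the decoding step, as a sketch. It assumes the input is a data URL of the form "data:...;base64,<payload>" and uses the MIME decoder, which ignores line separators; the basic decoder from Base64.getDecoder() would throw IllegalArgumentException on them. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DataUrlDecoder {

    /** Decodes the Base64 payload that follows the first comma of a data URL. */
    static byte[] decodePayload(String dataUrl) {
        // indexOf returns -1 when there is no comma, so the whole string is used then.
        String payload = dataUrl.substring(dataUrl.indexOf(',') + 1);
        // The MIME decoder skips line breaks and other non-alphabet characters.
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] bytes = decodePayload("data:text/plain;base64,aGVs\r\nbG8=");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If the payload is known to contain no line breaks, the stricter Base64.getDecoder() is the safer choice because it rejects malformed input instead of skipping it.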
As part of my web service, I have a picture repository which retrieves an image from Amazon S3 (a datastore) then returns it. This is how the method that does this looks:
File getPicture(String path) throws IOException {
File file = File.createTempFile(path, ".png");
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, path));
IOUtils.copy(object.getObjectContent(), new FileOutputStream(file));
return file;
}
The problem is that it takes way too long to get a response from the service - (a 3MB image took 7.5 seconds to download). I notice that if I comment out the IOUtils.copy() line, the response time is significantly faster so it must be that particular method that's causing this delay.
I've seen this method used in almost all modern examples of converting an S3Object to a file but I seem to be a unique case. Am I missing a trick here?
Appreciate any help!
From the AWS documentation:
public S3Object getObject(GetObjectRequest getObjectRequest)
the returned Amazon S3 object contains a direct stream of data from the HTTP connection. The underlying HTTP connection cannot be reused until the user finishes reading the data and closes the stream.
public S3ObjectInputStream getObjectContent()
Note: The method is a simple getter and does not actually create a stream. If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3.
If you remove the IOUtils.copy line, then the method exits quickly because you never actually read the stream. If the file is large, it will take time to download. You can't do much about that unless you can get a better connection to the AWS services.
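This will not make the download itself faster, but the documentation quoted above asks for the stream to be closed as soon as possible, and the question's code closes neither the S3 stream nor the FileOutputStream. A sketch of a copy that closes both, using only the JDK; the names StreamToTempFile and saveToTempFile are hypothetical, and in the question the InputStream argument would be object.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamToTempFile {

    /** Copies the stream into a new temp file and closes the stream afterwards. */
    static Path saveToTempFile(InputStream content, String suffix) throws IOException {
        Path target = Files.createTempFile("picture-", suffix);
        // try-with-resources closes the source even if the copy fails, which
        // (for an S3 stream) lets the underlying HTTP connection be reused.
        try (InputStream in = content) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path file = saveToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), ".png");
        System.out.println(file + " holds " + Files.size(file) + " bytes");
        Files.delete(file);
    }
}
```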
I am writing a client-server application that sends an .xml file from the client to the server. I have a problem with sending large data. I notice that the server receives at most 1460 bytes. When I send a file larger than 1460 bytes, the server gets only the first 1460 bytes and nothing more, so I end up with an incomplete file. Here is my code:
client send:
public void sendToServer(File file) throws Exception
{
OutputStream output = sk.getOutputStream();
FileInputStream fileInputStream = new FileInputStream(file);
byte[] buffer = new byte[1024*1024];
int bytesRead = 0;
while((bytesRead = fileInputStream.read(buffer))>0)
{
output.write(buffer,0,bytesRead);
}
fileInputStream.close();
}
server get:
public File getFile(String name) throws Exception
{
File file=null;
InputStream input = sk.getInputStream();
file = new File("C://protokolPliki/" + name);
FileOutputStream out = new FileOutputStream(file);
byte[] buffer = new byte[1024*1024];
int bytesReceived = 0;
while((bytesReceived = input.read(buffer))>0) {
out.write(buffer,0,bytesReceived);
System.out.println(bytesReceived);
break;
}
return file;
}
Does anyone know what is wrong with this code? Thanks for any help.
EDIT:
Nothing helped :(. I googled this and I think it may be connected with the TCP MSS, which is equal to 1460 bytes.
Make sure you call flush() on the streams.
A passerby asks: isn't close() enough?
You linked to the docs for Writer, and the info on its close() method states:
Closes the stream, flushing it first. ..
So you are partly right, OTOH, the OP is clearly using an OutputStream and the docs for close() state:
Closes this output stream and releases any system resources associated with this stream. The general contract of close is that it closes the output stream. A closed stream cannot perform output operations and cannot be reopened.
The close method of OutputStream does nothing.
(Emphasis mine.)
So to sum up: no, close() as defined in the base OutputStream class does nothing and does not flush. Whether a concrete subclass flushes on close depends on that subclass, so call flush() explicitly.
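A small sketch of why an explicit flush() can matter, using a BufferedOutputStream as an example of a stream that holds data back. This only illustrates buffering in general; the socket stream in the question is not itself a BufferedOutputStream, and the class name FlushDemo is hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class FlushDemo {

    /** Returns the sink size observed {after write, after flush}. */
    static int[] observe() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);

        buffered.write(new byte[100]);
        // 100 bytes fit in the 8 KB buffer, so nothing has reached the sink yet.
        int afterWrite = sink.size();

        buffered.flush();
        // flush() pushes the buffered bytes down to the sink.
        int afterFlush = sink.size();

        return new int[] {afterWrite, afterFlush};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = observe();
        System.out.println("after write: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```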
Although not related to your question, the API documentation says FileInputStream.read returns -1 at end of file. You should use >= 0 in the while loop condition.
The MTU (Maximum Transmission Unit) for Ethernet is around 1500 bytes. Consider sending the file in chunks (i.e. one line at a time or 1024 bytes at a time).
See if using 1024 instead of 1024 * 1024 for the byte buffer solves your problem.
In the code executed on the server side, there is a break statement in the while loop, so the loop body runs only once and only the bytes returned by the first read() call are written. Remove the break statement and the code should work just fine.
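A sketch of the receive loop without the break, plus a test stream that hands out at most 1460 bytes per read to mimic what the question observed. The names ReadUntilEof, copyAll and trickle are hypothetical. On a real socket the loop only ends once the sender closes its side of the connection, because that is what makes read() return -1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ReadUntilEof {

    /** Copies until read() reports end of stream (-1); returns the bytes copied. */
    static long copyAll(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // A single read() may return fewer bytes than the buffer holds,
        // so keep reading until the stream reports end of data.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Test stream that returns at most 'chunk' bytes per read call. */
    static InputStream trickle(byte[] data, final int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[5000];
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        long copied = copyAll(trickle(file, 1460), received);
        System.out.println("copied " + copied + " bytes");
    }
}
```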
I'm dealing with a Spring MVC based application deployed under JBoss-4.2.3.GA and want to clarify how servlet input/output streaming works with huge request/response bodies. I'm concerned about it because I don't want to keep the whole request/response in memory until the call has completely finished.
How can I find out exactly which input/output stream implementation JBoss passes to the servlet? Or can I look up its behavior in some kind of specification?
Thanks for any useful info about it.
The servlet API does by default not keep the entire request and response body in memory. It's effectively your own processing/parsing code which does that.
As to request bodies, when processing them, you should not hold the entire body in a byte[]. Each byte of a byte[] consumes, yes, one byte of Java's memory. You should try to (re)write your code so that it never holds the entire body in memory. Process it for example line-by-line or buffer-by-buffer and/or stream it immediately to an OutputStream.
E.g. when the body is character based:
BufferedReader reader = new BufferedReader(new InputStreamReader(request.getInputStream(), "UTF-8"));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(someOutputStream, "UTF-8"));
for (String line; (line = reader.readLine()) != null;) {
processIfNecessary(line);
writer.println(line); // PrintWriter has println(), not writeln()
}
or when the body is byte based:
BufferedInputStream input = new BufferedInputStream(request.getInputStream());
BufferedOutputStream output = new BufferedOutputStream(someOutputStream);
byte[] buffer = new byte[1024]; // 1KB buffer.
for (int length; (length = input.read(buffer)) > 0;) {
processIfNecessary(buffer);
output.write(buffer, 0, length);
}
As to response bodies, the content is kept in memory only up to the buffer size. Anything beyond the buffer size is flushed to the client. The default buffer size is usually 2KB. This is configurable at appserver level and via ServletResponse#setBufferSize(). When you set the buffer size too high, it will gobble memory.
I am using IBM WebSphere Application Server v6 and Java 1.4 and am trying to write large CSV files to the ServletOutputStream for a user to download. Files currently range from 50 to 750MB.
The smaller files aren't causing too much of a problem but with the larger files it appears that it is being written into the heap which is then causing an OutOfMemory error and bringing down the entire server.
These files can only be served out to authenticated users over HTTPS which is why I am serving them through a Servlet instead of just sticking them in Apache.
The code I am using is (some fluff removed around this):
resp.setHeader("Content-length", "" + fileLength);
resp.setContentType("application/vnd.ms-excel");
resp.setHeader("Content-Disposition","attachment; filename=\"export.csv\"");
FileInputStream inputStream = null;
try
{
inputStream = new FileInputStream(path);
byte[] buffer = new byte[1024];
int bytesRead = 0;
do
{
bytesRead = inputStream.read(buffer, offset, buffer.length);
resp.getOutputStream().write(buffer, 0, bytesRead);
}
while (bytesRead == buffer.length);
resp.getOutputStream().flush();
}
finally
{
if(inputStream != null)
inputStream.close();
}
The FileInputStream doesn't seem to be causing a problem as if I write to another file or just remove the write completely the memory usage doesn't appear to be a problem.
What I am thinking is that the resp.getOutputStream().write is being stored in memory until the data can be sent through to the client. So the entire file might be read and stored in the resp.getOutputStream() causing my memory issues and crashing!
I have tried Buffering these streams and also tried using Channels from java.nio, none of which seems to make any bit of difference to my memory issues. I have also flushed the OutputStream once per iteration of the loop and after the loop, which didn't help.
The average decent servlet container itself flushes the stream by default every ~2KB. You should really not need to explicitly call flush() on the OutputStream of the HttpServletResponse at intervals when sequentially streaming data from one and the same source. In for example Tomcat (and WebSphere!) this is configurable as the bufferSize attribute of the HTTP connector.
The average decent servlet container also just streams the data in chunks if the content length is unknown beforehand (as per the Servlet API specification!) and if the client supports HTTP 1.1.
The problem symptoms at least indicate that the servlet container is buffering the entire stream in memory before flushing. This can mean that the content length header is not set and/or the servlet container does not support chunked encoding and/or the client side does not support chunked encoding (i.e. it is using HTTP 1.0).
To fix one or the other, just set the content length beforehand:
response.setContentLengthLong(new File(path).length());
Or when you're not on Servlet 3.1 yet:
response.setHeader("Content-Length", String.valueOf(new File(path).length()));
Does flush work on the output stream?
Really I wanted to comment that you should use the three-arg form of write, as the buffer is not necessarily fully read (particularly at the end of the file(!)). Also a try/finally would be in order unless you want your server to die unexpectedly.
I have used a class that wraps the output stream to make it reusable in other contexts. It has worked well for me in getting data to the browser faster, but I haven't looked at the memory implications. (Please pardon my antiquated m_ variable naming.)
import java.io.IOException;
import java.io.OutputStream;
public class AutoFlushOutputStream extends OutputStream {
protected long m_count = 0;
protected long m_limit = 4096;
protected OutputStream m_out;
public AutoFlushOutputStream(OutputStream out) {
m_out = out;
}
public AutoFlushOutputStream(OutputStream out, long limit) {
m_out = out;
m_limit = limit;
}
public void write(int b) throws IOException {
if (m_out != null) {
m_out.write(b);
m_count++;
if (m_limit > 0 && m_count >= m_limit) {
m_out.flush();
m_count = 0;
}
}
}
}
I'm also not sure if flush() on ServletOutputStream works in this case, but ServletResponse.flushBuffer() should send the response to the client (at least per 2.3 servlet spec).
ServletResponse.setBufferSize() sounds promising, too.
So, following your scenario, shouldn't you be flushing inside that while loop (on every iteration), instead of outside of it? I would try that, with a somewhat larger buffer though.
Kevin's class should close the m_out field, if it's not null, in its close() method. We don't want to leak things, do we?
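One way the wrapper could forward flush() and close() to the wrapped stream, as a sketch rather than a drop-in replacement for Kevin's class. The class name ClosingAutoFlushOutputStream is hypothetical, and the bulk write override is an addition so that array writes are not passed on byte by byte.

```java
import java.io.IOException;
import java.io.OutputStream;

class ClosingAutoFlushOutputStream extends OutputStream {
    private final OutputStream out;
    private final long limit;
    private long count;

    ClosingAutoFlushOutputStream(OutputStream out, long limit) {
        if (out == null) {
            throw new IllegalArgumentException("out must not be null");
        }
        this.out = out;
        this.limit = limit;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
        flushIfDue();
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Pass the whole chunk on in one call instead of byte by byte.
        out.write(b, off, len);
        count += len;
        flushIfDue();
    }

    /** Flushes the wrapped stream once at least 'limit' bytes were written. */
    private void flushIfDue() throws IOException {
        if (limit > 0 && count >= limit) {
            out.flush();
            count = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        out.flush();
        count = 0;
    }

    @Override
    public void close() throws IOException {
        // Flush what is pending, then always close the wrapped stream.
        try {
            out.flush();
        } finally {
            out.close();
        }
    }
}
```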
As well as the ServletOutputStream.flush() method, the HttpServletResponse.flushBuffer() method may also flush the buffers. However, it appears to be an implementation-specific detail whether or not these operations have any effect, or whether HTTP content-length support is interfering. Remember, specifying content-length is an option on HTTP 1.0, so things should just stream out if you flush things. But I don't see that
The while condition does not work; you need to check for -1 before using the result. And please use a temporary variable for the output stream; it's nicer to read and it saves calling getOutputStream() repeatedly.
OutputStream outStream = resp.getOutputStream();
while(true) {
int bytesRead = inputStream.read(buffer);
if (bytesRead < 0)
break;
outStream.write(buffer, 0, bytesRead);
}
inputStream.close();
outStream.close();
Unrelated to your memory problems, the while loop should be:
while(bytesRead > 0);
Your code has an infinite loop.
do
{
bytesRead = inputStream.read(buffer, offset, buffer.length);
resp.getOutputStream().write(buffer, 0, bytesRead);
}
while (bytesRead == buffer.length);
offset has the same value throughout the loop, so if initially offset = 0, it will remain so in every iteration, which will cause an infinite loop and in turn lead to the OOM error.
IBM WebSphere Application Server uses asynchronous data transfer for servlets by default. That means that it buffers the response. If you have problems with large data and OutOfMemory exceptions, try changing the settings on WAS to use synchronous mode.
Setting the WebSphere Application Server WebContainer to synchronous mode
You must also take care to load the data in chunks and flush them.
Sample for loading from a large file:
ServletOutputStream os = response.getOutputStream();
FileInputStream fis = new FileInputStream(file);
try {
int buffSize = 1024;
byte[] buffer = new byte[buffSize];
int len;
while ((len = fis.read(buffer)) != -1) {
os.write(buffer, 0, len);
os.flush();
response.flushBuffer();
}
} finally {
fis.close(); // also release the file handle
os.close();
}