I'm trying to insert a large file into Google Drive using google-api-services-drive version v2-rev93-1.16.0-rc.
I've set setChunkSize() to the minimum in order to have my own ProgressListener notified more frequently. The following code is used to insert the file:
File body = new File();
body.setTitle(filetobeuploaded.getName());
body.setMimeType("application/zip");
body.setFileSize(filetobeuploaded.length());
InputStreamContent mediaContent =
        new InputStreamContent("application/zip",
                new BufferedInputStream(new FileInputStream(filetobeuploaded)));
mediaContent.setLength(filetobeuploaded.length());
Insert insert = drive.files().insert(body, mediaContent);
MediaHttpUploader uploader = insert.getMediaHttpUploader();
uploader.setChunkSize(MediaHttpUploader.MINIMUM_CHUNK_SIZE);
uploader.setProgressListener(new CustomProgressListener(filetobeuploaded));
insert.execute();
After 'a while' (sometimes 200 MB, sometimes 300 MB) I get an IOException:
Exception in thread "main" java.io.IOException: insufficient data written
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3213)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:960)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:482)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:390)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:418)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
Any ideas how to get this code working?
You won't be able to get it working from a frontend because of time constraints. The only reliable way (but a pain) is to do it from a backend using resumable upload, since even the backend/task queue may be shut down while processing chunks.
Does 'a while' happen to mean 1 hour? In that case you are probably experiencing the following bug:
http://code.google.com/p/gdata-issues/issues/detail?id=5124
That issue applies only to Drive resumable media upload. Check this reply:
https://stackoverflow.com/a/30796105/4576135
For me this meant "you are doing a POST and specified a content length, but then the stream you uploaded wasn't long enough to match that content length" (in my case a byte-array stream that had already been consumed by a previous use and was effectively exhausted).
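One way to rule that mismatch out is a sketch like the following, which is not from the original answers: it swaps InputStreamContent for the client library's FileContent, which takes its length from the file itself and can re-open the stream, so the declared length cannot drift from what is actually sent. The Drive client, CustomProgressListener and filetobeuploaded are assumed to be the same as in the question.
// FileContent derives the content length from the file and opens a fresh
// stream for each attempt, so it cannot be "already exhausted".
FileContent mediaContent = new FileContent("application/zip", filetobeuploaded);

File body = new File();                       // Drive file metadata, as in the question
body.setTitle(filetobeuploaded.getName());
body.setMimeType("application/zip");

Insert insert = drive.files().insert(body, mediaContent);
MediaHttpUploader uploader = insert.getMediaHttpUploader();
uploader.setChunkSize(MediaHttpUploader.MINIMUM_CHUNK_SIZE);
uploader.setProgressListener(new CustomProgressListener(filetobeuploaded));
insert.execute();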
Related
We are using Java 8 and the AWS SDK to programmatically upload files to AWS S3. For uploading large files (>100 MB), we read that the preferred method is multipart upload. We tried that, but it does not seem to speed anything up; the upload time remains almost the same as without multipart upload. Worse, we even encountered out-of-memory errors saying heap space is insufficient.
Questions:
Is using multipart upload really supposed to speed up the upload? If not, why use it?
How come multipart upload eats up memory faster than not using it? Does it upload all the parts concurrently?
See below for the code we used:
private static void uploadFileToS3UsingBase64(String bucketName, String region, String accessKey, String secretKey,
        String fileBase64String, String s3ObjectKeyName) {
    byte[] bI = org.apache.commons.codec.binary.Base64.decodeBase64((fileBase64String.substring(fileBase64String.indexOf(",") + 1)).getBytes());
    InputStream fis = new ByteArrayInputStream(bI);
    long start = System.currentTimeMillis();
    AmazonS3 s3Client = null;
    TransferManager tm = null;
    try {
        s3Client = AmazonS3ClientBuilder.standard().withRegion(region)
                .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials(accessKey, secretKey)))
                .build();
        tm = TransferManagerBuilder.standard()
                .withS3Client(s3Client)
                .withMultipartUploadThreshold((long) (50 * 1024 * 1025))
                .build();
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setHeader(Headers.STORAGE_CLASS, StorageClass.Standard);
        PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, s3ObjectKeyName,
                fis, metadata).withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams());
        Upload upload = tm.upload(putObjectRequest);
        // Optionally, wait for the upload to finish before continuing.
        upload.waitForCompletion();
        long end = System.currentTimeMillis();
        long duration = (end - start) / 1000;
        // Log status
        System.out.println("Successful upload in S3 multipart. Duration = " + duration);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (s3Client != null)
            s3Client.shutdown();
        if (tm != null)
            tm.shutdownNow();
    }
}
Using multipart will only speed up the upload if you upload multiple parts at the same time.
In your code you're setting withMultipartUploadThreshold. If your upload size is larger than that threshold, you should observe concurrent upload of separate parts; if it is not, only one upload connection is used. You say you have a >100 MB file, and in your code the multipart upload threshold is 50 * 1024 * 1025 = 52 480 000 bytes, so concurrent upload of parts of that file should have been happening.
However, if your upload throughput is capped by your network speed anyway, there will not be any increase in throughput. This might be the reason you're not observing any speed increase.
There are other reasons to use multipart upload too: it is recommended for fault-tolerance reasons, and it supports a larger maximum object size than a single upload (5 TB versus 5 GB for a single PUT).
For more details see documentation:
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Using multipart upload provides the following advantages:
Improved throughput - You can upload parts in parallel to improve throughput.
Quick recovery from any network issues - Smaller part size minimizes the impact of restarting a failed upload due to a network error.
Pause and resume object uploads - You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you must explicitly complete or stop the multipart upload.
Begin an upload before you know the final object size - You can upload an object as you are creating it.
We recommend that you use multipart upload in the following ways:
If you're uploading large objects over a stable high-bandwidth network, use multipart upload to maximize the use of your available bandwidth by uploading object parts in parallel for multi-threaded performance.
If you're uploading over a spotty network, use multipart upload to increase resiliency to network errors by avoiding upload restarts. When using multipart upload, you need to retry uploading only parts that are interrupted during the upload. You don't need to restart uploading your object from the beginning.
The answer from eis is very good, though you should still take some additional steps:
String.getBytes(StandardCharsets.US_ASCII) or ISO_8859_1 avoids a more costly encoding, like UTF-8, being picked up from the platform default. If the platform encoding were UTF-16LE, the data would even be corrupted (0x00 bytes).
The standard java.util.Base64 class has decoders/encoders that might work, and it can operate on a String directly. However, check that line endings are handled correctly.
try-with-resources also closes the stream in the case of exceptions or early returns.
The ByteArrayInputStream was never closed, which would have been better style (and may ease garbage collection).
You could set the ExecutorFactory to a thread-pool factory that limits the number of threads globally (see the sketch after the code below).
So
byte[] bI = Base64.getDecoder().decode(
        fileBase64String.substring(fileBase64String.indexOf(',') + 1));
try (InputStream fis = new ByteArrayInputStream(bI)) {
    ...
}
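Picking up both eis's threshold point and the ExecutorFactory suggestion above, here is a rough sketch of how the TransferManager could be configured. The part size, threshold, pool size and method name are illustrative assumptions, not taken from the question:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

// Builds a TransferManager that uploads ~25 MB parts in parallel on a bounded
// thread pool; remember to shut the pool down when uploads are finished.
static TransferManager buildTransferManager(AmazonS3 s3Client) {
    ExecutorService pool = Executors.newFixedThreadPool(8);        // global cap on upload threads
    return TransferManagerBuilder.standard()
            .withS3Client(s3Client)
            .withMultipartUploadThreshold(50L * 1024 * 1024)       // switch to multipart above ~50 MB
            .withMinimumUploadPartSize(25L * 1024 * 1024)          // parts of this size upload concurrently
            .withExecutorFactory(() -> pool)
            .build();
}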
We are downloading a very large file (~70 GB), but on one occasion the code completed without throwing an exception, yet the downloaded file was incomplete, just under 50 GB.
The code is:
public void download(String url, String filename) throws Exception {
    URL dumpUrl = new URL(url);
    try (InputStream input = dumpUrl.openStream()) {
        Files.copy(input, Paths.get(filename));
    }
}
The URL is a presigned Google Cloud Storage URL.
Is this just the libraries not detecting a connection reset, or something else?
Are there better libraries I could use? Or do I need to do a HEAD call first and then match the downloaded size against the Content-Length?
I don't care that it didn't work; that happens, and we have retry logic. My issue is that the code thought it did work.
UPDATE: It seems it failed at exactly 2 hours after starting the download. This makes me suspect a netops/firewall issue. I'm not sure at which end; I'll hassle my ops team for starters. Does anybody know of time limits at Google's end?
Ignore this update - we have more instances now, with no set time: anywhere between 20 minutes and 2 hours.
Never resolved the core issue, but was able to work around it by comparing the bytes downloaded against the Content-Length header, working in a loop that resumes the incomplete download using the Range header (similar to curl -C -).
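A rough sketch of that workaround follows; the class and method names are mine, and production code would also bound the number of retries and handle a missing Content-Length header:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ResumingDownloader {

    // Downloads url into filename, re-requesting the remaining bytes with a
    // Range header (like curl -C -) until the local size matches Content-Length.
    public static void download(String url, String filename) throws IOException {
        Path target = Paths.get(filename);
        long expected = contentLength(url);
        long have = Files.exists(target) ? Files.size(target) : 0L;

        while (have < expected) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            if (have > 0) {
                conn.setRequestProperty("Range", "bytes=" + have + "-");
            }
            try (InputStream in = conn.getInputStream();
                 OutputStream out = Files.newOutputStream(target,
                         StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                // connection dropped mid-stream: fall through and retry from the current size
                // (a real implementation should cap the number of attempts)
            }
            have = Files.size(target);
        }
    }

    private static long contentLength(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");
        return conn.getContentLengthLong();
    }
}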
As part of my web service, I have a picture repository which retrieves an image from Amazon S3 (a datastore) then returns it. This is how the method that does this looks:
File getPicture(String path) throws IOException {
    File file = File.createTempFile(path, ".png");
    S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, path));
    IOUtils.copy(object.getObjectContent(), new FileOutputStream(file));
    return file;
}
The problem is that it takes way too long to get a response from the service (a 3 MB image took 7.5 seconds to download). I notice that if I comment out the IOUtils.copy() line, the response time is significantly faster, so it must be that particular call that's causing the delay.
I've seen this method used in almost all modern examples of converting an S3Object to a file, but mine seems to be a unique case. Am I missing a trick here?
Appreciate any help!
From the AWS documentation:
public S3Object getObject(GetObjectRequest getObjectRequest)
the returned Amazon S3 object contains a direct stream of data from the HTTP connection. The underlying HTTP connection cannot be reused until the user finishes reading the data and closes the stream.
public S3ObjectInputStream getObjectContent()
Note: The method is a simple getter and does not actually create a stream. If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3.
If you remove the IOUtils.copy line, the method exits quickly because you don't actually consume the stream. If the file is large, it will take time to download, and you can't do much about that unless you can get a better connection to the AWS services.
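For what it's worth, here is a slightly hardened variant of the method from the question (same imports and fields, s3Client and bucketName, assumed from the question) that at least releases the HTTP connection and the file handle promptly, as the javadoc above recommends; it won't make the actual download any faster:
File getPicture(String path) throws IOException {
    File file = File.createTempFile(path, ".png");
    S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, path));
    // try-with-resources closes the S3 stream (freeing the HTTP connection)
    // and the file stream even if the copy fails
    try (InputStream in = object.getObjectContent();
         OutputStream out = new FileOutputStream(file)) {
        IOUtils.copy(in, out);
    }
    return file;
}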
I get an OutOfMemoryError: Java heap space when I try to attach a large attachment. When emails with large files (say 50 MB) are sent, the error is thrown.
The code is like this:
// add attachments
if (vo.getAttaches() != null) {
    InputStream iStream = null;
    ByteArrayDataSource bdSource = null;
    String filename = null;
    for (int i = 0; i < vo.getAttaches().length; i++) {
        iStream = new FileInputStream(vo.getAttaches()[i]);
        bdSource = new ByteArrayDataSource(iStream, null);
        filename = vo.getAttachesFileName()[i];
        email.attach(bdSource, MimeUtility.encodeText(filename), filename);
    }
}
It is bdSource = new ByteArrayDataSource(iStream, null) that throws the exception above. I have read "out of memory using java mail" but I don't understand it. How can I upload large attachments? If I define a buffer like byte[1024], how should I code email.attach()? Should all the data in the buffer use the same filename?
Update: Thanks for your kindness. I now use FileDataSource instead of ByteArrayDataSource, and there is no longer an exception in my send() function, but I still cannot send emails with a large attachment. Apache James reports this error:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
at com.sun.mail.util.ASCIIUtility.getBytes(ASCIIUtility.java:261)
at javax.mail.internet.MimeMessage.parse(MimeMessage.java:338)
at org.apache.james.core.MimeMessageWrapper.parse(MimeMessageWrapper.java:477)
at org.apache.james.core.MimeMessageWrapper.loadMessage(MimeMessageWrapper.java:205)
at org.apache.james.core.MimeMessageWrapper.checkModifyHeaders(MimeMessageWrapper.java:414)
at org.apache.james.core.MimeMessageWrapper.setHeader(MimeMessageWrapper.java:426)
at org.apache.james.core.MimeMessageCopyOnWriteProxy.setHeader(MimeMessageCopyOnWriteProxy.java:652)
at org.apache.james.transport.mailets.UsersRepositoryAliasingForwarding.service(UsersRepositoryAliasingForwarding.java:101)
at org.apache.james.transport.mailets.LocalDelivery.service(LocalDelivery.java:64)
at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:424)
at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:405)
at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:309)
at java.lang.Thread.run(Thread.java:619)
03/07/12 13:08:33 ERROR spoolmanager: An error occurred processing Mail1341292071375-0 through transport
03/07/12 13:08:33 ERROR spoolmanager: Result was error
03/07/12 13:08:33 ERROR spoolmanager: Exception in processor <error>
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at org.apache.james.core.MimeMessageUtil.copyStream(MimeMessageUtil.java:168)
at org.apache.james.core.MimeMessageWrapper.writeTo(MimeMessageWrapper.java:276)
at org.apache.james.core.MimeMessageUtil.writeTo(MimeMessageUtil.java:66)
at org.apache.james.core.MimeMessageUtil.writeTo(MimeMessageUtil.java:50)
at org.apache.james.mailrepository.MessageInputStream.writeStream(MessageInputStream.java:131)
at org.apache.james.mailrepository.MessageInputStream.<init>(MessageInputStream.java:101)
at org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:718)
at org.apache.james.transport.mailets.ToRepository.service(ToRepository.java:98)
at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:424)
at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:405)
at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:309)
at java.lang.Thread.run(Thread.java:619)
03/07/12 13:08:33 ERROR spoolmanager: An error occurred processing Mail1341292071375-0 through error
03/07/12 13:08:33 ERROR spoolmanager: Result was ghost
The problem is that you're trying to hold the whole file in memory while composing the message, which usually isn't necessary. If you're attaching real files, you are far better off using a javax.activation.FileDataSource instead of a javax.mail.util.ByteArrayDataSource (both implement the DataSource interface), as that allows the data to be streamed rather than held in memory.
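A minimal sketch of that change, assuming the same Commons Email MultiPartEmail (email) and value object (vo) as in the question:
// FileDataSource streams the attachment from disk instead of buffering
// the whole file in the heap
if (vo.getAttaches() != null) {
    for (int i = 0; i < vo.getAttaches().length; i++) {
        javax.activation.FileDataSource source =
                new javax.activation.FileDataSource(vo.getAttaches()[i]);
        String filename = vo.getAttachesFileName()[i];
        email.attach(source, MimeUtility.encodeText(filename), filename);
    }
}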
For your update: it seems an old (?) variant of JamesSpoolManager reads the whole file into memory when transforming it to mail format (ASCII); in the same thread, dated 30/Dec/10, the problem seems to have been fixed.
ByteArrayDataSource will read the full input from the provided input stream; see the javadoc:
Create a ByteArrayDataSource with data from the specified InputStream and with the specified MIME type. The InputStream is read completely and the data is stored in a byte array.
So if the file to be read is larger than your heap size (JVM limit), it will throw an OutOfMemoryError.
So, to answer your question, you have (at least) two options:
Use a FileDataSource, as in this SO answer
Give your program more memory (probably not a good solution in this case)
I was facing the same issue, but while sending multiple emails at the same time, which resulted in two errors in the logs:
1. OutOfMemoryError: Java heap space
2. Maximum number of connections exceeded
I changed two parameters in the wrapper.conf file:
#wrapper.java.initmemory=16
wrapper.java.initmemory=32
#wrapper.java.maxmemory=64
wrapper.java.maxmemory=128
After restarting the server, the error was gone and mail sending/receiving works.
What is the reason for encountering this Exception:
org.apache.commons.fileupload.FileUploadException:
Processing of multipart/form-data request failed. Stream ended unexpectedly
The main reason is that the underlying socket was closed or reset. Most commonly the user closed the browser before the file was fully uploaded, or the connection was interrupted during the upload. In any case, the server-side code should be able to handle this exception gracefully.
It's been about a year since I dealt with that library, but if I remember correctly, if someone tries to upload a file and then changes the browser URL (clicks a link, opens a bookmark, etc.), you could get that exception.
You could possibly get this exception if you're using FileUpload to receive an upload from Flash.
At least as of version 8, Flash contains a known bug: the multipart stream it produces is broken, because the final boundary doesn't contain the "--" suffix, which ought to indicate that no more items are following. Consequently, FileUpload waits for the next item (which it doesn't get) and throws an exception.
The suggested workaround is to use the streaming API and catch the exception:
catch (MalformedStreamException e) {
    // Ignore this
}
For more details, please refer to https://commons.apache.org/proper/commons-fileupload/faq.html#missing-boundary-terminator
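For reference, here is a rough sketch of that streaming-API workaround; the surrounding servlet plumbing and the item processing are trimmed, and the method name is mine:
import java.io.InputStream;

import javax.servlet.http.HttpServletRequest;

import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.MultipartStream;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

// Iterates the multipart stream item by item and ignores the missing
// final-boundary error that broken Flash clients produce.
void handleUpload(HttpServletRequest request) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();
    FileItemIterator iter = upload.getItemIterator(request);
    try {
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            try (InputStream stream = item.openStream()) {
                // process the uploaded item/field here
            }
        }
    } catch (MultipartStream.MalformedStreamException e) {
        // Ignore this: the Flash upload simply lacks the final "--" suffix
    }
}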