Max file upload limitation at JVM or AWS S3 end - Java

I am trying to upload a file to an AWS S3 bucket through TransferManager (Java).
My question is: if I try to upload a file of approximately 4 GB, do I need to handle anything special on the Java side?
Below is the code:
File tempFile = AWSUtil.createTempFileFromStream(inputStream);
putObjectRequest = new PutObjectRequest(bucketName, fileName, tempFile);
final TransferManager transferMgr = TransferManagerBuilder.standard().withS3Client(s3client).build();
upload = transferMgr.upload(putObjectRequest);
As far as I know, AWS S3 has an object size limit of 5 TB, so that is not a concern.
Please let me know if I need to handle anything on the Java side.
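For reference, a minimal sketch of how this is often handled, assuming the AWS SDK for Java v1 that the snippet above uses: TransferManager performs multipart uploads for large files automatically, so the main things to handle on the Java side are waiting for the transfer to finish and cleaning up afterwards. The waitForCompletion() call and the helper name are illustrative, not part of the original post:
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

import java.io.File;

public class LargeFileUploadSketch {

    // Hypothetical helper showing a blocking upload of a ~4 GB file.
    public static void uploadLargeFile(AmazonS3 s3client, String bucketName,
                                       String fileName, File tempFile) {
        TransferManager transferMgr = TransferManagerBuilder.standard()
                .withS3Client(s3client)
                .build();
        try {
            // TransferManager switches to multipart upload for large files on its own,
            // so no extra size handling is needed for a ~4 GB object.
            Upload upload = transferMgr.upload(new PutObjectRequest(bucketName, fileName, tempFile));
            // Block until the transfer completes, or throw if it fails.
            upload.waitForCompletion();
        } catch (AmazonClientException | InterruptedException e) {
            throw new RuntimeException("Upload of " + fileName + " failed", e);
        } finally {
            // Release TransferManager's internal thread pool.
            transferMgr.shutdownNow(false);
        }
    }
}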

Related

How to save an Aspose workbook (.xlsx) to AWS S3 using Java?

I want to save an Aspose workbook (.xlsx) to AWS S3 using Java. Any help?
Providing the S3 path directly to workbook.save("s3://...") does not work.
I am creating this file in an AWS EMR cluster. I can save the file on the cluster and then move it to S3, but I would like to know if there is any way of saving it directly to S3. I looked for answers but did not find any.
You can save the file on the EMR cluster, upload it to S3, and then delete the local copy from the cluster. The code snippet is given below:
workbook.save("temp.xlsx");
File file = new File("temp.xlsx");
InputStream dataStream = new FileInputStream(file);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withRegion(clientRegion)
.build();
ObjectMetadata metadata = new ObjectMetadata();
s3Client.putObject(new PutObjectRequest(bucketName, s3Key, dataStream, metadata));
file.delete();
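If the temporary file on the cluster is undesirable, here is a sketch of a possible alternative, assuming Aspose.Cells' Workbook.save(OutputStream, SaveFormat) overload and the same v1 S3 client; the uploadWorkbook helper and its parameters are illustrative, and the in-memory approach only suits workbooks that fit comfortably in the heap:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.aspose.cells.SaveFormat;
import com.aspose.cells.Workbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class WorkbookToS3Sketch {

    // Hypothetical helper: serialize the workbook in memory and upload the bytes directly.
    public static void uploadWorkbook(Workbook workbook, AmazonS3 s3Client,
                                      String bucketName, String s3Key) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        workbook.save(out, SaveFormat.XLSX); // assumes the OutputStream overload of save()
        byte[] bytes = out.toByteArray();

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length); // known length, so the SDK does not buffer again

        s3Client.putObject(new PutObjectRequest(
                bucketName, s3Key, new ByteArrayInputStream(bytes), metadata));
    }
}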

Read and write to a file in an Amazon S3 bucket

I need to read a large (>15 MB) file (say sample.csv) from an Amazon S3 bucket. I then need to process the data in sample.csv and write it to another directory in the same S3 bucket. I intend to use an AWS Lambda function to run my Java code.
As a first step I developed Java code that runs on my local system. The code reads sample.csv from the S3 bucket, and I use putObject to write data back to the bucket. But I find that only the last line is processed and written back.
Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId","mySecretKey");
S3Client s3Client = S3Client.builder().region(clientRegion).credentialsProvider(StaticCredentialsProvider.create(awsCreds)).build();
ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));
String line = null;
while ((line = reader.readLine()) != null) {
s3Client.putObject(PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),RequestBody.fromString(line));
}
Example: sample.csv contains
1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.
My output should be
1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.
But only the last line (3,nhoj,XX,c#,expert.) is written to Testout.csv.
The putObject() method creates an Amazon S3 object.
It is not possible to append to or modify an existing S3 object, so each iteration of the while loop overwrites Test/Testout.csv with a new object containing only the current line, which is why only the last line survives.
Instead, I would recommend:
Download the source file from Amazon S3 to local disk (use GetObject() with a destinationFile to download to disk)
Process the file and output to a local file
Upload the output file to the Amazon S3 bucket (for example with putObject())
This separates the AWS code from your processing code, which should be easier to maintain. A sketch of that flow follows below.
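A minimal sketch of the download/process/upload approach, assuming the AWS SDK for Java v2 used in the question; the temp-file handling, key names, and the transformLine() helper are illustrative placeholders:
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvTransformSketch {

    public static void transform(S3Client s3Client, String bucketName) throws IOException {
        Path input = Files.createTempFile("sample", ".csv");
        Path output = Files.createTempFile("testout", ".csv");
        Files.delete(input); // getObject(request, path) requires that the destination not exist yet

        // 1. Download the source object to local disk.
        s3Client.getObject(
                GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build(),
                input);

        // 2. Process line by line into a local output file.
        try (BufferedReader reader = Files.newBufferedReader(input);
             BufferedWriter writer = Files.newBufferedWriter(output)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(transformLine(line)); // hypothetical per-line transformation
                writer.newLine();
            }
        }

        // 3. Upload the complete output file as a single object.
        s3Client.putObject(
                PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
                RequestBody.fromFile(output));
    }

    // Placeholder for whatever per-line processing is required.
    private static String transformLine(String line) {
        return line;
    }
}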

Upload File To Amazon S3 Using Java Not Working

I am a newbie and recently started working with Amazon S3 services.
I have created a Java Maven project using Java 1.8 and aws-java-sdk version 1.11.6 in my sample program.
Below is the source code, and it executes successfully.
It returns a version id as the output of the program.
System.out.println("Started the program to create the bucket....");
BasicAWSCredentials awsCreds = new BasicAWSCredentials(CloudMigrationConstants.AWS_ACCOUNT_KEY, CloudMigrationConstants.AWS_ACCOUNT_SECRET_KEY);
AmazonS3Client s3Client = new AmazonS3Client(awsCreds);
String uploadFileName="G:\\Ebooks\\chap1.doc";
String bucketName="jinesh1522421795620";
String keyName="test/";
System.out.println("Uploading a new object to S3 from a file\n");
File file = new File(uploadFileName);
PutObjectResult putObjectResult=s3Client.putObject(new PutObjectRequest(
bucketName, keyName, file));
System.out.println("Version id :" + putObjectResult.getVersionId());
System.out.println("Finished the program to create the bucket....");
But when I look with S3 Browser or the Amazon console, I do not see the file listed inside the bucket.
Can you please let me know what is wrong with my Java program?
I think I misunderstood the concept. The key has to include the name of the file to store, not just the folder prefix. In the program above I missed appending the file name to the folder name in the key, which is why I could not see the file.
File file = new File(uploadFileName);
PutObjectResult putObjectResult = s3Client.putObject(new PutObjectRequest(
        bucketName, keyName + "chap1.doc", file)); // keyName already ends with "/", so append only the file name
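To double-check what actually landed in the bucket, a small sketch (assuming the same v1 client) that prints every key under the test/ prefix; the helper name is illustrative:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListKeysSketch {

    // Print every key under the given prefix so you can see exactly what was uploaded.
    public static void listKeys(AmazonS3 s3Client, String bucketName, String prefix) {
        ObjectListing listing = s3Client.listObjects(bucketName, prefix);
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
        }
    }
}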

Create multiple empty directories in Amazon S3 using Java

I am new to S3 and I am trying to create multiple directories in Amazon S3 using Java with only one call to S3.
I could only come up with this:
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(0);
InputStream emptyContent = new ByteArrayInputStream(new byte[0]);
PutObjectRequest putObjectRequest = new PutObjectRequest(bucket,
"test/tryAgain/", emptyContent, metadata);
s3.putObject(putObjectRequest);
The problem is that, to create 10 folders (when the key ends with "/", the console shows the object as a folder), I have to make 10 calls to S3.
I want to create all the folders at once, the way a batch delete works with DeleteObjectsRequest.
Can anyone suggest how to solve this?
Can you be a bit more specific as to what you're trying to do (or avoid doing)?
If you're primarily concerned with the cost per PUT, I don't think there is a way to batch 'upload' a directory with each file being a separate key and avoid that cost. Each PUT (even in a batch process) will cost you the price per PUT.
If you're simply trying to find a way to efficiently and recursively upload a folder, check out the uploadDirectory() method of TransferManager.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html#uploadDirectory-java.lang.String-java.lang.String-java.io.File-boolean-
public MultipleFileUpload uploadDirectory(String bucketName,
                                          String virtualDirectoryKeyPrefix,
                                          File directory,
                                          boolean includeSubdirectories)
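A short usage sketch of that method, assuming the v1 TransferManager; the bucket name, key prefix, and local directory are placeholders:
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.io.File;

public class UploadDirectorySketch {

    public static void uploadFolder() throws InterruptedException {
        TransferManager transferManager = TransferManagerBuilder.standard().build();
        try {
            // Recursively upload everything under the local directory beneath the "test" key prefix.
            MultipleFileUpload upload = transferManager.uploadDirectory(
                    "my-bucket",               // bucket name (placeholder)
                    "test",                    // virtual directory key prefix
                    new File("/path/to/dir"),  // local directory (placeholder)
                    true);                     // include subdirectories
            upload.waitForCompletion();
        } finally {
            // Note: each file still counts as a separate PUT request for billing purposes.
            transferManager.shutdownNow(false);
        }
    }
}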

Google Appengine JAVA - Zipping up blobstore files results in error 202 when saving back to blobstore

I am working on an App Engine application whose content we want to make available to offline users. This means we need to gather all the blobstore files it uses and save them off for the offline user. I do this on the server side so that it is only done once, and not for every end user. I use the task queue to run this process because it can easily time out. Assume all this code is running as a task.
Small collections work fine, but larger collections result in an App Engine error 202 and the task restarts again and again. Here is the sample code, which comes from a combination of Writing Zip Files to GAE Blobstore and the advice for large zip files in Google Appengine JAVA - Zip lots of images saving in Blobstore (reopening the channel as needed). Also referenced AppEngine Error Code 202 - Task Queue for the error.
//Set up the zip file that will be saved to the blobstore
AppEngineFile assetFile = fileService.createNewBlobFile("application/zip", assetsZipName);
FileWriteChannel writeChannel = fileService.openWriteChannel(assetFile, true);
ZipOutputStream assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
HashSet<String> blobsEntries = getAllBlobEntries(); //gets blobs that I need
saveBlobAssetsToZip(blobsEntries);
writeChannel.closeFinally();
.....
private void saveBlobAssetsToZip(HashSet<String> blobsEntries) throws IOException {
for (String blobId : blobsEntries) {
/*gets the blobstore key that will result in the blobstore entry - ignore the bsmd as
that is internal to our wrapper for blobstore.*/
BlobKey blobKey = new BlobKey(bsmd.getBlobId());
//gets the blob file as a byte array
byte[] blobData = blobstoreService.fetchData(blobKey, 0, BlobstoreService.MAX_BLOB_FETCH_SIZE-1);
String extension = ...; // file type from our metadata (e.g. .jpg, .png, .pdf)
assetsZip.putNextEntry(new ZipEntry(blobId + "." + extension));
assetsZip.write(blobData);
assetsZip.closeEntry();
assetsZip.flush();
/*I have found that if I don't close the channel and reopen it, I can get an IOException
because the files in the blobstore are too large; hence write a file, then close and reopen*/
assetsZip.close();
writeChannel.close();
String assetsPath = assetFile.getFullPath();
assetFile = new AppEngineFile(assetsPath);
writeChannel = fileService.openWriteChannel(assetFile, true);
assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
}
}
What is the proper way to get this to run on App Engine? Again, small projects work fine and the zip saves, but larger projects with more blob files result in this error.
I bet the instance is running out of memory. Are you using Appstats? It can consume a large amount of memory. If disabling that doesn't help, you will probably need to increase the instance size.
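One way to keep the task's memory footprint bounded, sketched under the assumption that the legacy Blobstore API from the question is still in use: fetch each blob in MAX_BLOB_FETCH_SIZE chunks instead of as one large byte array (the original fetchData call also only reads the first chunk of each blob). The helper below is illustrative, not the original code:
import com.google.appengine.api.blobstore.BlobInfoFactory;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;

import java.io.IOException;
import java.util.zip.ZipOutputStream;

public class ChunkedBlobZipSketch {

    private static final BlobstoreService blobstoreService =
            BlobstoreServiceFactory.getBlobstoreService();

    // Stream one blob into the zip in bounded chunks so the whole blob is never held in memory.
    static void writeBlobToZip(BlobKey blobKey, ZipOutputStream zip) throws IOException {
        long size = new BlobInfoFactory().loadBlobInfo(blobKey).getSize();
        long start = 0;
        while (start < size) {
            long end = Math.min(start + BlobstoreService.MAX_BLOB_FETCH_SIZE, size) - 1;
            byte[] chunk = blobstoreService.fetchData(blobKey, start, end);
            zip.write(chunk);
            start = end + 1;
        }
    }
}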
