Getting file size from S3 bucket - java

I am trying to get the file size (content-length) of an object using the Amazon S3 Java SDK.
public Long getObjectSize(AmazonS3Client amazonS3Client, String bucket, String key)
        throws IOException {
    Long size = null;
    S3Object object = null;
    try {
        object = amazonS3Client.getObject(bucket, key);
        size = object.getObjectMetadata().getContentLength();
    } finally {
        if (object != null) {
            // object.close();
        }
    }
    return size;
}
1. With object.close() left commented out, this works for about 50 calls (the connection pool size), after which I start getting connection pool errors.
2. If that line is uncommented, the calls take an extremely long time.
I followed this and this, but I'm not sure what I am doing wrong here.
Any help on this?

I'm guessing at what your actual question is asking, but I think you can reduce your code and eliminate the need to create an S3Object at all by doing something like:
public Long getObjectSize(AmazonS3Client amazonS3Client, String bucket, String key)
        throws IOException {
    return amazonS3Client.getObjectMetadata(bucket, key).getContentLength();
}
That should remove the need to call object.close() which you appear to be having issues with.

For v2 of the Amazon S3 Java SDK, try something like this:
HeadObjectRequest headObjectRequest =
        HeadObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .build();
HeadObjectResponse headObjectResponse =
        s3Client.headObject(headObjectRequest);
Long contentLength = headObjectResponse.contentLength();
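In case it helps, here is a self-contained sketch of the same HEAD-based lookup; the region, bucket, and key below are placeholders and credentials come from the default provider chain:
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;

public class GetObjectSize {
    public static void main(String[] args) {
        // Build the client once and reuse it; region and credentials are assumptions here.
        try (S3Client s3Client = S3Client.builder().region(Region.US_EAST_1).build()) {
            HeadObjectRequest request = HeadObjectRequest.builder()
                    .bucket("my-bucket")   // placeholder bucket
                    .key("my-key")         // placeholder key
                    .build();
            // HEAD returns only metadata, so the object body is never downloaded.
            long contentLength = s3Client.headObject(request).contentLength();
            System.out.println("Size in bytes: " + contentLength);
        }
    }
}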

So we have two SDKs.
For v1 of the Amazon S3 Java SDK:
client.getObjectMetadata(bucket, key).getContentLength();
where client is an instance of AmazonS3 imported from com.amazonaws.services.s3.AmazonS3, with the Gradle dependency implementation 'com.amazonaws:aws-java-sdk-s3:1.12.353'.
For v2 of the Amazon S3 Java SDK:
return client.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build()).contentLength();
where client is an instance of S3Client imported from software.amazon.awssdk.services.s3.S3Client, with the Gradle dependency implementation 'software.amazon.awssdk:s3:2.18.35'.

Related

How to upload multipart to Amazon S3 asynchronously using the java SDK

In my Java application I need to write data to S3 whose size I don't know in advance, and the sizes are usually big, so as recommended in the AWS S3 documentation I am using the Java AWS SDK's low-level API to write data to the S3 bucket.
In my application I provide an S3BufferedOutputStream, an implementation of OutputStream that other classes in the app can use to write to the S3 bucket.
I store the data in a buffer, and once the data grows bigger than the buffer size I upload the buffer's contents as a single UploadPartRequest.
Here is the implementation of the write method of S3BufferedOutputStream
@Override
public void write(byte[] b, int off, int len) throws IOException {
    this.assertOpen();
    int o = off, l = len;
    int size;
    while (l > (size = this.buf.length - position)) {
        System.arraycopy(b, o, this.buf, this.position, size);
        this.position += size;
        flushBufferAndRewind();
        o += size;
        l -= size;
    }
    System.arraycopy(b, o, this.buf, this.position, l);
    this.position += l;
}
The whole implementation is similar to this: code repo
My problem here is that each UploadPartRequest is done synchronously, so we have to wait for one part to be uploaded before we can upload the next part. And because I am using the AWS S3 low-level API, I cannot benefit from the parallel uploading provided by the TransferManager.
Is there a way to achieve parallel uploads using the low-level SDK?
Or are there code changes that would let it operate asynchronously without corrupting the uploaded data, while maintaining the order of the data?
Here's some example code from a class that I have. It submits the parts to an ExecutorService and holds onto the returned Future. This is written for the v1 Java SDK; if you're using the v2 SDK you could use an async client rather than the explicit threadpool:
// WARNING: data must not be updated by caller; make a defensive copy if needed
public synchronized void uploadPart(byte[] data, boolean isLastPart)
{
    partNumber++;
    logger.debug("submitting part {} for s3://{}/{}", partNumber, bucket, key);

    final UploadPartRequest request = new UploadPartRequest()
                                      .withBucketName(bucket)
                                      .withKey(key)
                                      .withUploadId(uploadId)
                                      .withPartNumber(partNumber)
                                      .withPartSize(data.length)
                                      .withInputStream(new ByteArrayInputStream(data))
                                      .withLastPart(isLastPart);

    futures.add(
        executor.submit(new Callable<PartETag>()
        {
            @Override
            public PartETag call() throws Exception
            {
                int localPartNumber = request.getPartNumber();
                logger.debug("uploading part {} for s3://{}/{}", localPartNumber, bucket, key);
                UploadPartResult response = client.uploadPart(request);
                String etag = response.getETag();
                logger.debug("uploaded part {} for s3://{}/{}; etag is {}", localPartNumber, bucket, key, etag);
                return new PartETag(localPartNumber, etag);
            }
        }));
}
Note: this method is synchronized to ensure that parts are not submitted out of order.
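For context, the state that method relies on might look something like the sketch below; the field names mirror the snippets in this answer, but the class name, constructor, and pool size are assumptions rather than the author's actual code:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentMultipartUploader    // hypothetical class name
{
    private final Logger logger = LoggerFactory.getLogger(getClass());
    private final AmazonS3 client;
    private final String bucket;
    private final String key;
    private final String uploadId;
    private final ExecutorService executor = Executors.newFixedThreadPool(4);   // pool size is an assumption
    private final List<Future<PartETag>> futures = new ArrayList<>();
    private int partNumber = 0;

    public ConcurrentMultipartUploader(AmazonS3 client, String bucket, String key)
    {
        this.client = client;
        this.bucket = bucket;
        this.key = key;
        // Start the multipart upload up front so every part references the same upload id.
        this.uploadId = client.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key))
                              .getUploadId();
    }

    // uploadPart(), complete(), and abort() from this answer go here.
}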
Once you've submitted all of the parts, you use this method to wait for them to finish and then complete the upload:
public void complete()
{
    logger.debug("waiting for upload tasks of s3://{}/{}", bucket, key);
    List<PartETag> partTags = new ArrayList<>();
    for (Future<PartETag> future : futures)
    {
        try
        {
            partTags.add(future.get());
        }
        catch (Exception e)
        {
            throw new RuntimeException(
                String.format("failed to complete upload task for s3://%s/%s", bucket, key), e);
        }
    }

    logger.debug("completing multi-part upload for s3://{}/{}", bucket, key);
    CompleteMultipartUploadRequest request = new CompleteMultipartUploadRequest()
                                             .withBucketName(bucket)
                                             .withKey(key)
                                             .withUploadId(uploadId)
                                             .withPartETags(partTags);
    client.completeMultipartUpload(request);
    logger.debug("completed multi-part upload for s3://{}/{}", bucket, key);
}
You'll also need an abort() method that cancels outstanding parts and aborts the upload. This, and the rest of the class, are left as an exercise for the reader.
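As a rough starting point for that exercise (a sketch under the same field-name assumptions as above, not the author's implementation), abort() could cancel the futures and then abort the upload on S3:
public void abort()
{
    // Cancel parts that haven't started; parts already in flight are simply abandoned.
    for (Future<PartETag> future : futures)
    {
        future.cancel(true);
    }
    logger.debug("aborting multi-part upload for s3://{}/{}", bucket, key);
    client.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId));
}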
You should look at using the AWS SDK for Java V2. You are referencing V1, not the newest Amazon S3 Java API. If you are not familiar with V2, start here:
Get started with the AWS SDK for Java 2.x
To perform Async operations via the Amazon S3 Java API, you use S3AsyncClient.
Now to learn how to upload an object using this client, see this code example:
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;

/**
 * To run this AWS code example, ensure that you have set up your development environment, including your AWS credentials.
 *
 * For information, see this documentation topic:
 *
 * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
 */
public class S3AsyncOps {

    public static void main(String[] args) {

        final String USAGE = "\n" +
                "Usage:\n" +
                "    S3AsyncOps <bucketName> <key> <path>\n\n" +
                "Where:\n" +
                "    bucketName - the name of the Amazon S3 bucket (for example, bucket1). \n\n" +
                "    key - the name of the object (for example, book.pdf). \n" +
                "    path - the local path to the file (for example, C:/AWS/book.pdf). \n";

        if (args.length != 3) {
            System.out.println(USAGE);
            System.exit(1);
        }

        String bucketName = args[0];
        String key = args[1];
        String path = args[2];

        Region region = Region.US_WEST_2;
        S3AsyncClient client = S3AsyncClient.builder()
                .region(region)
                .build();

        PutObjectRequest objectRequest = PutObjectRequest.builder()
                .bucket(bucketName)
                .key(key)
                .build();

        // Put the object into the bucket
        CompletableFuture<PutObjectResponse> future = client.putObject(objectRequest,
                AsyncRequestBody.fromFile(Paths.get(path)));

        future.whenComplete((resp, err) -> {
            try {
                if (resp != null) {
                    System.out.println("Object uploaded. Details: " + resp);
                } else {
                    // Handle error
                    err.printStackTrace();
                }
            } finally {
                // Only close the client when you are completely done with it
                client.close();
            }
        });

        future.join();
    }
}
That is uploading an object using the S3AsyncClient. To perform a multipart upload, you need to use this method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html#createMultipartUpload-software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest-
To see an example of a multipart upload using the S3 sync client, see:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/S3ObjectOperations.java
That is your solution: use the S3AsyncClient object's createMultipartUpload method.
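As a rough sketch of that flow with the v2 SDK (the bucket, key, and part contents below are placeholders, the model classes come from software.amazon.awssdk.services.s3.model, and real code needs error handling plus parts of at least 5 MB except for the last one):
S3AsyncClient s3 = S3AsyncClient.create();

// 1. Start the multipart upload and get an upload id.
String uploadId = s3.createMultipartUpload(
        CreateMultipartUploadRequest.builder().bucket("my-bucket").key("my-key").build())
    .join()
    .uploadId();

// 2. Upload each part asynchronously; the resulting futures can run in parallel.
byte[] partData = new byte[5 * 1024 * 1024];   // placeholder content
CompletableFuture<CompletedPart> part1 = s3.uploadPart(
        UploadPartRequest.builder()
            .bucket("my-bucket").key("my-key")
            .uploadId(uploadId).partNumber(1)
            .build(),
        AsyncRequestBody.fromBytes(partData))
    .thenApply(resp -> CompletedPart.builder().partNumber(1).eTag(resp.eTag()).build());

// 3. Complete the upload once all part futures have finished.
s3.completeMultipartUpload(
        CompleteMultipartUploadRequest.builder()
            .bucket("my-bucket").key("my-key")
            .uploadId(uploadId)
            .multipartUpload(CompletedMultipartUpload.builder().parts(part1.join()).build())
            .build())
    .join();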

How can I get the size of an s3 bucket using Jets3t

I am trying to calculate the total size of an S3 bucket in my Java Spring MVC web application. I am using jets3t for my S3 access.
I have the following method to calculate the total file size of an S3 bucket.
public Long getBucketSize(String accessKeyId, String secretAccessKey, String bucketname)
        throws IOException, S3ServiceException
{
    AWSCredentials awsCredentials =
            new AWSCredentials(accessKeyId, secretAccessKey);
    S3Service s3Service = new RestS3Service(awsCredentials);
    S3Object[] objects = s3Service.listObjects(bucketname);
    if (objects != null)
    {
        for (S3Object s3Object : objects)
        {
            // how do I get the size of each s3Object here?
        }
    }
    return null;
}
But I cannot find any way to calculate the total file size of a bucket. Is there any way to find that?
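For what it's worth, summing the per-object sizes should get you there; a minimal sketch, assuming jets3t's S3Object exposes getContentLength():
public Long getBucketSize(String accessKeyId, String secretAccessKey, String bucketname)
        throws IOException, S3ServiceException
{
    AWSCredentials awsCredentials = new AWSCredentials(accessKeyId, secretAccessKey);
    S3Service s3Service = new RestS3Service(awsCredentials);
    long totalSize = 0;
    // listObjects returns every object in the bucket; add up their content lengths.
    for (S3Object s3Object : s3Service.listObjects(bucketname))
    {
        totalSize += s3Object.getContentLength();
    }
    return totalSize;
}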

How to copy S3 object from one region to another when vpc endpoint is enabled

Recently I was unable to copy files using s3.copyObject(sourceBucket, sourceKey, destBucket, destKey); for two reasons:
1) The source and destination buckets are in two different regions (us-east-1 and us-east-2 in my case).
2) The server resides in a VPC that has an S3 endpoint enabled. An S3 endpoint is an internal connection to S3, but only within the same region.
Given that we are moving large files, we could not download and then re-upload them, even temporarily. We also wanted to keep the S3 endpoint in place, because the application makes serious use of S3 assets once in region.
The solution is to stream copy the files from one stream to another. I wrote this simple function which will handle it.
ZipException is just a custom exception. Throw whatever you want.
Hopefully this helps somebody.
public static void copyObject(AmazonS3 sourceClient, AmazonS3 destClient, String sourceBucket, String sourceKey, String destBucket, String destKey) throws IOException {
    S3ObjectInputStream inStream = null;
    try {
        GetObjectRequest request = new GetObjectRequest(sourceBucket, sourceKey);
        S3Object object = sourceClient.getObject(request);
        inStream = object.getObjectContent();
        destClient.putObject(destBucket, destKey, inStream, object.getObjectMetadata());
    } catch (SdkClientException e) {
        throw new ZipException("Unable to copy file.", e);
    } finally {
        if (inStream != null) {
            inStream.close();
        }
    }
}
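For completeness, the two region-specific clients could be built along these lines (a sketch using the default credential provider chain; the bucket and key names are placeholders):
AmazonS3 sourceClient = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1)
        .build();
AmazonS3 destClient = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_2)
        .build();

// Stream-copy an object from the us-east-1 bucket to the us-east-2 bucket.
copyObject(sourceClient, destClient, "source-bucket", "some/key", "dest-bucket", "some/key");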

AWS S3 Event notification using Lambda function in Java

I am trying to use a Lambda function for S3 put event notifications. My Lambda function should be called whenever I put/add a new JSON file in my S3 bucket.
The challenge is that there is not enough documentation on implementing such a Lambda function in Java; most of the docs I found are for Node.js.
I want my Lambda function to be called, and inside that function I want to consume the added JSON and send it to the AWS ES service.
But which classes should I use for this? Does anyone have any idea? S3 and ES are both set up and running. The auto-generated code for the Lambda is:
@Override
public Object handleRequest(S3Event input, Context context) {
    context.getLogger().log("Input: " + input);
    // TODO: implement your handler
    return null;
}
What next??
Handling S3 events in Lambda can be done, but you have to keep in mind that the S3Event object only transports a reference to the object and not the object itself. To get the actual object you have to invoke the AWS SDK yourself.
Requesting an S3 object within a Lambda function would look like this:
public Object handleRequest(S3Event input, Context context) {
    AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());

    for (S3EventNotificationRecord record : input.getRecords()) {
        String s3Key = record.getS3().getObject().getKey();
        String s3Bucket = record.getS3().getBucket().getName();
        context.getLogger().log("found id: " + s3Bucket + " " + s3Key);

        // retrieve s3 object
        S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
        InputStream objectData = object.getObjectContent();

        // insert object into elasticsearch
    }
    return null;
}
Now comes the rather difficult part: inserting this object into Elasticsearch. Sadly the AWS SDK does not provide any functions for this. The default approach would be to make a REST call against the AWS ES endpoint. There are various samples out there on how to call an Elasticsearch instance.
Some people seem to go with the following project:
Jest - Elasticsearch Java Rest Client
Finally, here are the steps for S3 --> Lambda --> ES integration using Java.
Have your S3, Lambda and ES created on AWS. Steps are here.
Use the Java code below in your Lambda function to fetch a newly added object from S3 and send it to the ES service.
public Object handleRequest(S3Event input, Context context) {
    AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());

    for (S3EventNotificationRecord record : input.getRecords()) {
        String s3Key = record.getS3().getObject().getKey();
        String s3Bucket = record.getS3().getBucket().getName();
        context.getLogger().log("found id: " + s3Bucket + " " + s3Key);

        // retrieve s3 object
        S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
        InputStream objectData = object.getObjectContent();

        // Start putting your objects in AWS ES Service
        String esInput = "Build your JSON string here using S3 objectData";
        HttpClient httpClient = new DefaultHttpClient();
        HttpPut putRequest = new HttpPut(AWS_ES_ENDPOINT + "/{Index_name}/{product_name}/{unique_id}");

        StringEntity entity = new StringEntity(esInput);
        entity.setContentType("application/json");
        putRequest.setEntity(entity);

        httpClient.execute(putRequest);
        httpClient.getConnectionManager().shutdown();
    }
    return "success";
}
Use either Postman or Sense to create the actual index and corresponding mapping in ES.
Once done, download and run proxy.js on your machine. Make sure you set up the ES security steps suggested in this post.
Test the setup and Kibana by opening the http://localhost:9200/_plugin/kibana/ URL from your machine.
All is set. Go ahead and set up your dashboard in Kibana. Test it by adding new objects to your S3 bucket.

How to set HTTP header in Apache JClouds?

I'm using Apache JClouds to connect to my OpenStack Swift installation. I managed to upload and download objects from Swift. However, I can't see how to upload a dynamic large object to Swift.
To upload a dynamic large object, I need to upload all the segments first, which I can do as usual. Then I need to upload a manifest object to combine them logically. The problem is that to tell Swift this is a manifest object, I need to set a special header, which I don't know how to do using the JClouds API.
Here's a dynamic large object example from the official OpenStack website.
The code I'm using:
public static void main(String[] args) throws IOException {
    BlobStore blobStore = ContextBuilder.newBuilder("swift").endpoint("http://localhost:8080/auth/v1.0")
            .credentials("test:test", "test").buildView(BlobStoreContext.class).getBlobStore();
    blobStore.createContainerInLocation(null, "container");

    ByteSource segment1 = ByteSource.wrap("foo".getBytes(Charsets.UTF_8));
    Blob seg1Blob = blobStore.blobBuilder("/foo/bar/1").payload(segment1).contentLength(segment1.size()).build();
    System.out.println(blobStore.putBlob("container", seg1Blob));

    ByteSource segment2 = ByteSource.wrap("bar".getBytes(Charsets.UTF_8));
    Blob seg2Blob = blobStore.blobBuilder("/foo/bar/2").payload(segment2).contentLength(segment2.size()).build();
    System.out.println(blobStore.putBlob("container", seg2Blob));

    ByteSource manifest = ByteSource.wrap("".getBytes(Charsets.UTF_8));
    // TODO: set manifest header here
    Blob manifestBlob = blobStore.blobBuilder("/foo/bar").payload(manifest).contentLength(manifest.size()).build();
    System.out.println(blobStore.putBlob("container", manifestBlob));

    Blob dloBlob = blobStore.getBlob("container", "/foo/bar");
    InputStream input = dloBlob.getPayload().openStream();
    while (true) {
        int i = input.read();
        if (i < 0) {
            break;
        }
        System.out.print((char) i); // should print "foobar"
    }
}
The "TODO" part is my problem.
Edited:
It's been pointed out that JClouds handles large file uploads automatically, which is not so useful in our case. In fact, at the time we start to upload the first segment we do not know how large the file will be or when the next segment will arrive. Our API is designed to let clients upload their files in chunks of their own chosen size and at their own chosen time, and, when done, call a 'commit' to assemble these chunks into a file. So this makes us want to upload the manifest on our own here.
According to @Everett Toews's answer, I've got my code running correctly:
public static void main(String[] args) throws IOException {
    CommonSwiftClient swift = ContextBuilder.newBuilder("swift").endpoint("http://localhost:8080/auth/v1.0")
            .credentials("test:test", "test").buildApi(CommonSwiftClient.class);

    SwiftObject segment1 = swift.newSwiftObject();
    segment1.getInfo().setName("foo/bar/1");
    segment1.setPayload("foo");
    swift.putObject("container", segment1);

    SwiftObject segment2 = swift.newSwiftObject();
    segment2.getInfo().setName("foo/bar/2");
    segment2.setPayload("bar");
    swift.putObject("container", segment2);

    swift.putObjectManifest("container", "foo/bar2");

    SwiftObject dlo = swift.getObject("container", "foo/bar", GetOptions.NONE);
    InputStream input = dlo.getPayload().openStream();
    while (true) {
        int i = input.read();
        if (i < 0) {
            break;
        }
        System.out.print((char) i);
    }
}
jclouds handles writing the manifest for you. Here are a couple of examples that might help you, UploadLargeObject and largeblob.MainApp.
Try using
Map<String, String> manifestMetadata = ImmutableMap.of(
"X-Object-Manifest", "<container>/<prefix>");
BlobBuilder.userMetadata(manifestMetadata)
If that doesn't work you might have to use the CommonSwiftClient like in CrossOriginResourceSharingContainer.java.
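As a sketch, that suggestion would slot into the original code's TODO roughly like this (whether userMetadata ends up as the raw X-Object-Manifest header or as an X-Object-Meta-* header depends on the provider, which is why the CommonSwiftClient fallback may be needed):
Map<String, String> manifestMetadata = ImmutableMap.of(
        "X-Object-Manifest", "container/foo/bar/");   // <container>/<segment prefix>
Blob manifestBlob = blobStore.blobBuilder("/foo/bar")
        .userMetadata(manifestMetadata)
        .payload(manifest)
        .contentLength(manifest.size())
        .build();
System.out.println(blobStore.putBlob("container", manifestBlob));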
