How to read JSON files from S3 using the S3AsyncClient - java

I can't figure out how to read a JSON file from S3 into memory as a String.
The examples I find call getObjectContent(), but that method is not available on the GetObjectResponse I get from the S3AsyncClient.
The code I am experimenting with is the sample code from AWS:
// Creates a default async client with credentials and AWS Region loaded from the
// environment
S3AsyncClient client = S3AsyncClient.create();

// Start the call to Amazon S3, not blocking to wait for the result
CompletableFuture<GetObjectResponse> responseFuture =
        client.getObject(GetObjectRequest.builder()
                        .bucket("my-bucket")
                        .key("my-object-key")
                        .build(),
                AsyncResponseTransformer.toFile(Paths.get("my-file.out")));

// When future is complete (either successfully or in error), handle the response
CompletableFuture<GetObjectResponse> operationCompleteFuture =
        responseFuture.whenComplete((getObjectResponse, exception) -> {
            if (getObjectResponse != null) {
                // At this point, the file my-file.out has been created with the data
                // from S3; let's just print the object version
                System.out.println(getObjectResponse.versionId());
            } else {
                // Handle the error
                exception.printStackTrace();
            }
        });

// We could do other work while waiting for the AWS call to complete in
// the background, but we'll just wait for "whenComplete" to finish instead
operationCompleteFuture.join();
How should this code be modified so that I can get the actual JSON content from the GetObjectResponse?

After the response is transformed to bytes, it can be converted to a string:
S3AsyncClient client = S3AsyncClient.create();
GetObjectRequest getObjectRequest = GetObjectRequest.builder()
        .bucket("my-bucket")
        .key("my-object-key")
        .build();

client.getObject(getObjectRequest, AsyncResponseTransformer.toBytes())
        .thenApply(ResponseBytes::asUtf8String)
        .whenComplete((stringContent, exception) -> {
            if (stringContent != null)
                System.out.println(stringContent);
            else
                exception.printStackTrace();
        });

You can use AsyncResponseTransformer.toBytes to save the response to a byte array rather than a file (see the javadoc).
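If you then need the JSON as a tree rather than a raw string, the same future can be chained into a parser. A minimal sketch using Jackson, reusing the client and getObjectRequest from the snippet above (having com.fasterxml.jackson.databind on the classpath is an assumption):

ObjectMapper mapper = new ObjectMapper();

client.getObject(getObjectRequest, AsyncResponseTransformer.toBytes())
        .thenApply(ResponseBytes::asUtf8String)
        .thenApply(body -> {
            try {
                return mapper.readTree(body);        // parse the UTF-8 body into a JsonNode tree
            } catch (JsonProcessingException e) {
                throw new CompletionException(e);    // surface parse failures through the future
            }
        })
        .whenComplete((jsonNode, exception) -> {
            if (jsonNode != null)
                System.out.println(jsonNode.toPrettyString());
            else
                exception.printStackTrace();
        });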

Related

How to upload multipart to Amazon S3 asynchronously using the Java SDK

In my Java application I need to write data to S3. I don't know the size in advance and the sizes are usually big, so, as recommended in the AWS S3 documentation, I am using the Java AWS SDK's low-level API to write data to the S3 bucket.
In my application I provide S3BufferedOutputStream, an implementation of OutputStream, so that other classes in the app can use this stream to write to the S3 bucket.
I store the data in a buffer and, in a loop, once the data is bigger than the buffer size, I upload the data in the buffer as a single UploadPartRequest.
Here is the implementation of the write method of S3BufferedOutputStream:
@Override
public void write(byte[] b, int off, int len) throws IOException {
    this.assertOpen();
    int o = off, l = len;
    int size;
    while (l > (size = this.buf.length - position)) {
        System.arraycopy(b, o, this.buf, this.position, size);
        this.position += size;
        flushBufferAndRewind();
        o += size;
        l -= size;
    }
    System.arraycopy(b, o, this.buf, this.position, l);
    this.position += l;
}
The whole implementation is similar to this: code repo
My problem here is that each UploadPartRequest is done synchronously, so we have to wait for one part to be uploaded before we can upload the next part. And because I am using the AWS S3 low-level API, I cannot benefit from the parallel uploading provided by the TransferManager.
Is there a way to achieve parallel upload using the low-level SDK?
Or are there code changes that can be made to operate asynchronously without corrupting the uploaded data, while maintaining the order of the data?
Here's some example code from a class that I have. It submits the parts to an ExecutorService and holds onto the returned Future. This is written for the v1 Java SDK; if you're using the v2 SDK you could use an async client rather than the explicit threadpool:
// WARNING: data must not be updated by caller; make a defensive copy if needed
public synchronized void uploadPart(byte[] data, boolean isLastPart)
{
    partNumber++;
    logger.debug("submitting part {} for s3://{}/{}", partNumber, bucket, key);

    final UploadPartRequest request = new UploadPartRequest()
            .withBucketName(bucket)
            .withKey(key)
            .withUploadId(uploadId)
            .withPartNumber(partNumber)
            .withPartSize(data.length)
            .withInputStream(new ByteArrayInputStream(data))
            .withLastPart(isLastPart);

    futures.add(
        executor.submit(new Callable<PartETag>()
        {
            @Override
            public PartETag call() throws Exception
            {
                int localPartNumber = request.getPartNumber();
                logger.debug("uploading part {} for s3://{}/{}", localPartNumber, bucket, key);
                UploadPartResult response = client.uploadPart(request);
                String etag = response.getETag();
                logger.debug("uploaded part {} for s3://{}/{}; etag is {}", localPartNumber, bucket, key, etag);
                return new PartETag(localPartNumber, etag);
            }
        }));
}
Note: this method is synchronized to ensure that parts are not submitted out of order.
Once you've submitted all of the parts, you use this method to wait for them to finish and then complete the upload:
public void complete()
{
    logger.debug("waiting for upload tasks of s3://{}/{}", bucket, key);

    List<PartETag> partTags = new ArrayList<>();
    for (Future<PartETag> future : futures)
    {
        try
        {
            partTags.add(future.get());
        }
        catch (Exception e)
        {
            throw new RuntimeException(
                String.format("failed to complete upload task for s3://%s/%s", bucket, key), e);
        }
    }

    logger.debug("completing multi-part upload for s3://{}/{}", bucket, key);

    CompleteMultipartUploadRequest request = new CompleteMultipartUploadRequest()
            .withBucketName(bucket)
            .withKey(key)
            .withUploadId(uploadId)
            .withPartETags(partTags);
    client.completeMultipartUpload(request);

    logger.debug("completed multi-part upload for s3://{}/{}", bucket, key);
}
You'll also need an abort() method that cancels outstanding parts and aborts the upload. This, and the rest of the class, are left as an exercise for the reader.
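For reference, a minimal sketch of what such an abort() method might look like with the v1 SDK, assuming the same fields as the methods above (futures, client, bucket, key, uploadId) and com.amazonaws.services.s3.model.AbortMultipartUploadRequest on the classpath; the exact cleanup policy is up to you:

public void abort()
{
    // Cancel parts that have not started yet; parts already in flight will finish or fail on their own
    for (Future<PartETag> future : futures)
    {
        future.cancel(false);
    }
    logger.debug("aborting multi-part upload for s3://{}/{}", bucket, key);
    client.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId));
}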
You should look at using the AWS SDK for Java V2. You are referencing V1, which is not the newest Amazon S3 Java API. If you are not familiar with V2, start here:
Get started with the AWS SDK for Java 2.x
To perform async operations via the Amazon S3 Java API, you use S3AsyncClient.
To learn how to upload an object using this client, see this code example:
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;

/**
 * To run this AWS code example, ensure that you have set up your development environment, including your AWS credentials.
 *
 * For information, see this documentation topic:
 *
 * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
 */
public class S3AsyncOps {

    public static void main(String[] args) {
        final String USAGE = "\n" +
                "Usage:\n" +
                "  S3AsyncOps <bucketName> <key> <path>\n\n" +
                "Where:\n" +
                "  bucketName - the name of the Amazon S3 bucket (for example, bucket1). \n\n" +
                "  key - the name of the object (for example, book.pdf). \n" +
                "  path - the local path to the file (for example, C:/AWS/book.pdf). \n";

        if (args.length != 3) {
            System.out.println(USAGE);
            System.exit(1);
        }

        String bucketName = args[0];
        String key = args[1];
        String path = args[2];

        Region region = Region.US_WEST_2;
        S3AsyncClient client = S3AsyncClient.builder()
                .region(region)
                .build();

        PutObjectRequest objectRequest = PutObjectRequest.builder()
                .bucket(bucketName)
                .key(key)
                .build();

        // Put the object into the bucket
        CompletableFuture<PutObjectResponse> future = client.putObject(objectRequest,
                AsyncRequestBody.fromFile(Paths.get(path)));

        future.whenComplete((resp, err) -> {
            try {
                if (resp != null) {
                    System.out.println("Object uploaded. Details: " + resp);
                } else {
                    // Handle error
                    err.printStackTrace();
                }
            } finally {
                // Only close the client when you are completely done with it
                client.close();
            }
        });

        future.join();
    }
}
That is uploading an object using the S3AsyncClient. To perform a multipart upload, you need to use this method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html#createMultipartUpload-software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest-
To see an example of a multipart upload using the S3 sync client, see:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/S3ObjectOperations.java
That is your solution: use the S3AsyncClient object's createMultipartUpload method.
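For orientation, here is a rough sketch of how the three multipart calls (createMultipartUpload, uploadPart, completeMultipartUpload) fit together with the v2 S3AsyncClient. The bucket, key, and in-memory parts are placeholders; a real implementation would stream parts of at least 5 MB (except the last) rather than hold them in byte arrays:

import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class S3AsyncMultipartSketch {
    public static void main(String[] args) {
        S3AsyncClient client = S3AsyncClient.create();
        String bucket = "my-bucket";       // placeholder
        String key = "my-object-key";      // placeholder
        byte[][] parts = { new byte[5 * 1024 * 1024], new byte[1024] }; // placeholder data

        // 1. Start the multipart upload and keep the upload id
        String uploadId = client.createMultipartUpload(
                        CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build())
                .join().uploadId();

        // 2. Submit each part asynchronously; the SDK returns a future per part, so they upload in parallel
        List<CompletableFuture<CompletedPart>> futures = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            int partNumber = i + 1;
            futures.add(client.uploadPart(
                            UploadPartRequest.builder()
                                    .bucket(bucket).key(key)
                                    .uploadId(uploadId).partNumber(partNumber)
                                    .build(),
                            AsyncRequestBody.fromBytes(parts[i]))
                    .thenApply(resp -> CompletedPart.builder()
                            .partNumber(partNumber).eTag(resp.eTag()).build()));
        }
        List<CompletedPart> completed = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

        // 3. Complete the upload with the collected part numbers and ETags (order is preserved by part number)
        client.completeMultipartUpload(
                        CompleteMultipartUploadRequest.builder()
                                .bucket(bucket).key(key).uploadId(uploadId)
                                .multipartUpload(CompletedMultipartUpload.builder().parts(completed).build())
                                .build())
                .join();

        client.close();
    }
}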

Vertx client is taking time to check for failure

I have a requirement where I connect one microservice to another microservice via a Vert.x client. In the code I check whether the other microservice is down; on failure it should create a JsonObject with solrError as the key and the failure message as the value. If there is a solrError, meaning the other microservice (which calls Solr via load balancing) is down, the method should return an error response. But the Vert.x client takes some time to report the failure, and by the time the condition is checked there is no solrError in the JsonObject yet, so the condition fails and resp is null. What can be done so that the failure is known before the solrError check, so the method returns an Internal Server Error response?
Below is the code:
solrQueryService.executeQuery(query).subscribe().with(jsonObject -> {
    ObjectMapper objMapper = new ObjectMapper();
    SolrOutput solrOutput = new SolrOutput();
    List<Doc> docs = new ArrayList<>();
    try {
        if (null != jsonObject.getMap().get("solrError")) {
            resp = Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                    .entity(new BaseException(
                            exceptionService.processSolrDownError(request.header.referenceId))
                            .getResponse())
                    .build();
        }
        solrOutput = objMapper.readValue(jsonObject.toString(), SolrOutput.class);
        if (null != solrOutput.getResponse()
                && CollectionUtils.isNotEmpty(solrOutput.getResponse().getDocs())) {
            docs.addAll(solrOutput.getResponse().getDocs());
            uniDocList = Uni.createFrom().item(docs);
        }
    } catch (JsonProcessingException e) {
        e.printStackTrace();
    }
});

if (null != resp && resp.getStatus() != 200) {
    return resp;
}
SolrQueryService prepares the query and sends the URL and query to the Vert.x web client as below:
public Uni<JsonObject> search(URL url, SolrQuery query, Integer timeout) {
    int port = url.getPort();
    if (port == -1 && "https".equals(url.getProtocol())) {
        port = 443;
    }
    if (port == -1 && "http".equals(url.getProtocol())) {
        port = 80;
    }
    HttpRequest<Buffer> request = client.post(port, url.getHost(), url.getPath()).timeout(timeout);
    return request.sendJson(query).map(resp -> {
        return resp.bodyAsJsonObject();
    }).onFailure().recoverWithUni(f -> {
        return Uni.createFrom().item(new JsonObject().put("solrError", f.getMessage()));
    });
}
I have not used the Vert.x client, but I assume it's reactive and non-blocking. Assuming this is the case, your code mixes imperative and reactive constructs. The subscribe in the first line is reactive, and the lambda you provide will be called when the server responds to the client request. However, after the subscribe you have imperative code that runs before the lambda even has a chance to be called, so your checks and access to the "resp" object will never reflect what happened in the lambda itself.
You need to move all of that code into the lambda, or at least chain the subsequent code onto the result of the subscription.
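As a rough sketch (assuming a Quarkus/RESTEasy Reactive style resource method that can return Uni<Response>, and reusing the names from the question), the whole pipeline could be returned instead of populating resp as a side effect:

public Uni<Response> search(SolrQuery query) {
    return solrQueryService.executeQuery(query)
            .onItem().transform(jsonObject -> {
                // A failed call was already recovered into a body with a "solrError" key upstream
                if (jsonObject.getMap().get("solrError") != null) {
                    return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                            .entity(new BaseException(
                                    exceptionService.processSolrDownError(request.header.referenceId))
                                    .getResponse())
                            .build();
                }
                try {
                    SolrOutput solrOutput = new ObjectMapper().readValue(jsonObject.toString(), SolrOutput.class);
                    return Response.ok(solrOutput.getResponse().getDocs()).build();
                } catch (JsonProcessingException e) {
                    return Response.serverError().build();
                }
            });
}

This way the error check only runs after the Vert.x call has either produced a body or been recovered into the solrError JSON, so the race described in the question disappears.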

Is there any direct way to copy one s3 directory to another in java or scala?

I want to archive all the files and subdirectories in an S3 directory to some other S3 location using Java. Is there any direct way to copy one S3 directory to another in Java or Scala?
There is no API call to operate on whole directories in Amazon S3.
In fact, directories/folders do not exist in Amazon S3. Rather, each object stores the full path in its filename (Key).
If you wish to copy multiple objects that have the same prefix in their Key, your code will need to loop through the objects, copying one object at a time.
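A minimal sketch of that loop with the v1 SDK (the bucket names and prefixes are placeholders; the fuller answer below adds logging, TransferManager-based parallelism, and batch reporting):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class CopyPrefix {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String sourceBucket = "source-bucket";   // placeholder
        String destBucket = "dest-bucket";       // placeholder
        String sourcePrefix = "archive/2023/";   // placeholder
        String destPrefix = "backup/2023/";      // placeholder

        ListObjectsV2Request listRequest = new ListObjectsV2Request()
                .withBucketName(sourceBucket)
                .withPrefix(sourcePrefix);
        ListObjectsV2Result listing;
        do {
            listing = s3.listObjectsV2(listRequest);
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                String destKey = destPrefix + summary.getKey().substring(sourcePrefix.length());
                // Server-side copy; the object bytes never leave S3
                s3.copyObject(sourceBucket, summary.getKey(), destBucket, destKey);
            }
            listRequest.setContinuationToken(listing.getNextContinuationToken());
        } while (listing.isTruncated());
    }
}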
A bit wordy, but it does the job: reasonable logging, multithreading via TransferManager, and handling of the continuation token for "folders" with more than 1000 keys:
/**
 * Copies all content from s3://sourceBucketName/sourceFolder to s3://destinationBucketName/destinationFolder.
 */
public void copyAll(String sourceBucketName, String sourceFolder, String destinationBucketName, String destinationFolder) {
    log.info("Copying data from s3://{}/{} to s3://{}/{}", sourceBucketName, sourceFolder, destinationBucketName, destinationFolder);

    TransferManager transferManager = TransferManagerBuilder.standard()
            .withS3Client(client)
            .build();
    try {
        ListObjectsV2Request request = new ListObjectsV2Request()
                .withBucketName(sourceBucketName)
                .withPrefix(sourceFolder);
        ListObjectsV2Result objects;
        do {
            objects = client.listObjectsV2(request);
            List<Copy> transfers = new ArrayList<>();
            for (S3ObjectSummary object : objects.getObjectSummaries()) {
                String sourceKey = object.getKey();
                String sourceRelativeKey = sourceKey.substring(sourceFolder.length());
                String destinationKey = destinationFolder + sourceRelativeKey;
                transfers.add(transferManager.copy(sourceBucketName, sourceKey, destinationBucketName, destinationKey));
            }
            for (Copy transfer : transfers) {
                log.debug(transfer.getDescription());
                transfer.waitForCompletion();
            }
            log.info("Copied batch of {} objects. Last object: {}", transfers.size(), transfers.isEmpty() ? "None" : transfers.get(transfers.size() - 1).getDescription());
            request.setContinuationToken(objects.getNextContinuationToken());
        } while (objects.isTruncated());
        log.info("Copy operation completed successfully from s3://{}/{} to s3://{}/{}", sourceBucketName, sourceFolder, destinationBucketName, destinationFolder);
    } catch (InterruptedException e) {
        // Resetting interrupt flag and returning control to the caller.
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
    } finally {
        transferManager.shutdownNow(false);
    }
}

How to copy S3 object from one region to another when vpc endpoint is enabled

Recently I was unable to copy files using s3.copyObject(sourceBucket, sourceKey, destBucket, destKey) for two reasons:
1) The source and destination buckets are in two different regions (us-east-1 and us-east-2 in my case).
2) The server resides in a VPC which has an S3 endpoint enabled. An S3 endpoint is an internal connection to S3, but only within the same region.
Given that we are moving large files, we could not download and then re-upload them, even temporarily. We also wanted to keep the S3 endpoint in place, because the application makes serious use of S3 assets once in region.
The solution is to stream-copy the files, piping the source object's stream directly into the destination upload. I wrote this simple function which handles it.
ZipException is just a custom exception. Throw whatever you want.
Hopefully this helps somebody.
public static void copyObject(AmazonS3 sourceClient, AmazonS3 destClient, String sourceBucket, String sourceKey, String destBucket, String destKey) throws IOException {
    S3ObjectInputStream inStream = null;
    try {
        GetObjectRequest request = new GetObjectRequest(sourceBucket, sourceKey);
        S3Object object = sourceClient.getObject(request);
        inStream = object.getObjectContent();
        destClient.putObject(destBucket, destKey, inStream, object.getObjectMetadata());
    } catch (SdkClientException e) {
        throw new ZipException("Unable to copy file.", e);
    } finally {
        if (inStream != null) {
            inStream.close();
        }
    }
}

Getting error "Can't read input file" on production server but not locally

I make a POST request with a File included in the request body.
In my method I retrieve this File:
if (request.body.file("imageFile").getOrElse(null) != null) {
  request.body.file("imageFile").map { case FilePart(key, name, contentType, content) =>
    try {
      val in: InputStream = new BufferedInputStream(new ByteArrayInputStream(content))
      image = ImageIO.read(in)
    } catch {
      case e => Logger.debug(e.printStackTrace.toString); throw new Exception(e.getMessage)
    }
  }
}
If a File is included in the request body it tries to get it; otherwise it just tries to get a file from S3:
else {
  try {
    val in: InputStream = new BufferedInputStream(new ByteArrayInputStream(S3Storage.retrieveS3File("facebook.jpg").content))
    image = ImageIO.read(in)
  } catch {
    case e: IOException => Logger.debug("Failed to retrieve facebook image"); throw new IOException(e.getMessage)
  }
}
All this works fine when I run it on my computer, but when I check this in and test it on the Amazon server, image = ImageIO.read(in) gives me an error: Can't read input file!
For me this makes no sense, since the file is either in the request body or it is grabbed from an S3 bucket.
I've debugged this code, and in the production environment there is a file available when the read is done.
Why can't the file be read in the production environment?
Regards
One suggestion would be not to swallow the original exception and stack trace.
Use the constructor new Exception(message, caughtException) in your catch blocks.
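For example (shown in Java terms, with image and in being the variables from the question; the Scala version is analogous), instead of throw new Exception(e.getMessage), keep the cause:

try {
    image = ImageIO.read(in);
} catch (IOException e) {
    // Pass the caught exception as the cause so the real stack trace shows up in the production logs
    throw new RuntimeException("Failed to read image from request body or S3", e);
}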
