I am trying to use a Lambda function for S3 Put event notifications. My Lambda function should be invoked whenever I put/add a new JSON file in my S3 bucket.
The challenge is that there is not much documentation on implementing such a Lambda function in Java; most of the docs I found are for Node.js.
I want my Lambda function to be called, and then inside that function I want to consume the added JSON and send it to the AWS ES service.
But which classes should I use for this? Does anyone have any idea? S3 and ES are both set up and running. The auto-generated code for the Lambda is:
@Override
public Object handleRequest(S3Event input, Context context) {
    context.getLogger().log("Input: " + input);
    // TODO: implement your handler
    return null;
}
What next??
Handling S3 events in Lambda can be done, but keep in mind that the S3Event object only carries a reference to the object, not the object itself. To get the actual object you have to call the AWS SDK yourself.
Requesting an S3 object within a Lambda function would look like this:
public Object handleRequest(S3Event input, Context context) {
    AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());

    for (S3EventNotificationRecord record : input.getRecords()) {
        String s3Key = record.getS3().getObject().getKey();
        String s3Bucket = record.getS3().getBucket().getName();
        context.getLogger().log("found id: " + s3Bucket + " " + s3Key);

        // retrieve the S3 object
        S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
        InputStream objectData = object.getObjectContent();
        // insert the object into Elasticsearch
    }
    return null;
}
Now for the rather difficult part: inserting this object into Elasticsearch. Sadly, the AWS SDK does not provide any functions for this. The default approach is to make a REST call against the AWS ES endpoint. There are various samples out there on how to call an Elasticsearch instance.
Some people seem to go with the following project:
Jest - Elasticsearch Java Rest Client
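As a rough sketch of that approach (the endpoint, index and type names are placeholders, and request signing/security is not shown), indexing the S3 object's JSON with Jest could look something like this, reusing objectData and s3Key from the loop above:
// Hypothetical ES endpoint, index and type names -- adjust to your domain.
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
        .Builder("https://my-es-domain.us-east-1.es.amazonaws.com")
        .multiThreaded(true)
        .build());
JestClient jestClient = factory.getObject();

// Read the S3 object's content into a String (Apache Commons IO)
String json = IOUtils.toString(objectData, StandardCharsets.UTF_8);

// Index the document, reusing the S3 key as the document id.
// execute() throws IOException, so handle or log it inside the Lambda handler.
Index index = new Index.Builder(json)
        .index("my-index")
        .type("my-type")
        .id(s3Key)
        .build();
jestClient.execute(index);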
Finally, here are the steps for S3 --> Lambda --> ES integration using Java.
Have your S3, Lambda and ES created on AWS. Steps are here.
Use the Java code below in your Lambda function to fetch a newly added object from S3 and send it to the ES service.
public Object handleRequest(S3Event input, Context context) {
    AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());

    for (S3EventNotificationRecord record : input.getRecords()) {
        String s3Key = record.getS3().getObject().getKey();
        String s3Bucket = record.getS3().getBucket().getName();
        context.getLogger().log("found id: " + s3Bucket + " " + s3Key);

        // retrieve the S3 object
        S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
        InputStream objectData = object.getObjectContent();

        // start putting your objects into the AWS ES service
        String esInput = "Build your JSON string here using S3 objectData";
        try {
            HttpClient httpClient = new DefaultHttpClient();
            HttpPut putRequest = new HttpPut(AWS_ES_ENDPOINT + "/{Index_name}/{product_name}/{unique_id}");
            StringEntity entity = new StringEntity(esInput); // renamed from "input" to avoid clashing with the S3Event parameter
            entity.setContentType("application/json");
            putRequest.setEntity(entity);
            httpClient.execute(putRequest);
            httpClient.getConnectionManager().shutdown();
        } catch (IOException e) {
            context.getLogger().log("failed to index " + s3Key + ": " + e.getMessage());
        }
    }
    return "success";
}
Use either Postman or Sense to create the actual index and the corresponding mapping in ES.
Once done, download and run proxy.js on your machine. Make sure you set up the ES security steps suggested in this post.
Test the setup and Kibana by opening the http://localhost:9200/_plugin/kibana/ URL from your machine.
All is set. Go ahead and set up your dashboard in Kibana. Test it by adding new objects to your S3 bucket.
I am currently evaluating a proof of concept which uses Google bucket, a java microservice and Dataflow.
The communication flow is like so:
User sends CSV file to third party service
Service uploads CSV file to Google bucket with ID and filename
A create event is triggered and sent as an HTTP request to the Java microservice
Java service triggers a Google Dataflow job
I am starting to think that the Java service is not necessary and that I can call Dataflow directly after the CSV is uploaded to the bucket?
This is the service. As you can see, it's just a basic controller that validates the request params from the "Create" trigger and then delegates to the Dataflow service:
@PostMapping(value = "/dataflow", produces = {MediaType.APPLICATION_JSON_VALUE})
public ResponseEntity<Object> triggerDataFlowJob(@RequestBody Map<String, Object> body) {
    Map<String, String> requestParams = getRequestParams(body);
    log.atInfo().log("Body %s", requestParams);
    String bucket = requestParams.get("bucket");
    String fileName = requestParams.get("name");
    if (Objects.isNull(bucket) || Objects.isNull(fileName)) {
        AuditLogger.log(AuditCode.INVALID_CLOUD_STORAGE_REQUEST.getCode(), AuditCode.INVALID_CLOUD_STORAGE_REQUEST.getAuditText());
        return ResponseEntity.accepted().build();
    }
    log.atInfo().log("Triggering a Dataflow job, using Cloud Storage bucket: %s --> and file %s", bucket, fileName);
    try {
        // wrap the Dataflow launch response so the controller's ResponseEntity return type compiles
        return ResponseEntity.ok(DataflowTransport
                .newDataflowClient(options)
                .build()
                .projects()
                .locations()
                .flexTemplates()
                .launch(gcpProjectIdProvider.getProjectId(),
                        dataflowProperties.getRegion(),
                        launchFlexTemplateRequest)
                .execute());
    } catch (Exception ex) {
        if (ex instanceof GoogleJsonResponseException && ((GoogleJsonResponseException) ex).getStatusCode() == 409) {
            log.atInfo().log("Dataflow job already triggered using Cloud Storage bucket: %s --> and file %s", bucket, fileName);
        } else {
            log.atSevere().withCause(ex).log("Error while launching dataflow jobs");
            AuditLogger.log(AuditCode.LAUNCH_DATAFLOW_JOB.getCode(), AuditCode.LAUNCH_DATAFLOW_JOB.getAuditText());
        }
    }
    return ResponseEntity.accepted().build();
}
Is there a way to directly integrate Google bucket triggers with Dataflow?
When a file is uploaded to Cloud Storage, you can trigger a Cloud Function V2 with Eventarc.
Then, in this Cloud Function, you can trigger a Dataflow job.
Deploy the Cloud Function V2 with the event type object finalized:
gcloud functions deploy your_function_name \
  --gen2 \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=YOUR_STORAGE_BUCKET"
In the Cloud Function, you will trigger the Dataflow job with a code sample that looks like this:
def startDataflowProcess(data, context):
    from googleapiclient.discovery import build
    # replace with your projectID
    project = "grounded-pivot-266616"
    job = project + " " + str(data['timeCreated'])
    # path of the dataflow template on google storage bucket
    template = "gs://sample-bucket/sample-template"
    inputFile = "gs://" + str(data['bucket']) + "/" + str(data['name'])
    # user defined parameters to pass to the dataflow pipeline job
    parameters = {
        'inputFile': inputFile,
    }
    # tempLocation is the path on GCS to store temp files generated during the dataflow job
    environment = {'tempLocation': 'gs://sample-bucket/temp-location'}

    service = build('dataflow', 'v1b3', cache_discovery=False)
    # below API is used when we want to pass the location of the dataflow job
    request = service.projects().locations().templates().launch(
        projectId=project,
        gcsPath=template,
        location='europe-west1',
        body={
            'jobName': job,
            'parameters': parameters,
            'environment': environment
        },
    )
    response = request.execute()
    print(str(response))
This Cloud Function shows an example in Python, but you can keep your logic in Java if you prefer.
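If you'd rather keep the trigger logic in Java, a roughly equivalent sketch using the google-api-services-dataflow client might look like the following (project id, bucket, template path and region are placeholders, and the surrounding Cloud Function wiring is omitted):
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.LaunchTemplateParameters;
import com.google.api.services.dataflow.model.LaunchTemplateResponse;
import com.google.api.services.dataflow.model.RuntimeEnvironment;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

import java.util.Collections;

public class DataflowLauncher {

    // Launches a classic Dataflow template for the uploaded object gs://<bucket>/<fileName>
    public LaunchTemplateResponse launch(String bucket, String fileName) throws Exception {
        Dataflow dataflow = new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                GsonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(GoogleCredentials.getApplicationDefault()))
                .setApplicationName("gcs-triggered-launcher")
                .build();

        LaunchTemplateParameters launchParameters = new LaunchTemplateParameters()
                .setJobName("csv-import-" + System.currentTimeMillis())
                .setParameters(Collections.singletonMap("inputFile", "gs://" + bucket + "/" + fileName))
                .setEnvironment(new RuntimeEnvironment()
                        .setTempLocation("gs://sample-bucket/temp-location"));

        return dataflow.projects().locations().templates()
                .launch("your-project-id", "europe-west1", launchParameters)
                .setGcsPath("gs://sample-bucket/sample-template")
                .execute();
    }
}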
I have a use case where I need to read a file from S3 and publish it to a REST service in Java.
For the implementation, I am trying the AWS SDK S3 API to read the file, which returns Flux<ByteBuffer>, and then publish it to the REST service using the Spring WebClient.
From my exploration, the Spring WebClient requires a BodyInserter, which can be prepared using BodyInserters.fromDataBuffers. I am unable to figure out how to properly convert Flux<ByteBuffer> to Flux<DataBuffer> and call the WebClient exchange:
Flux<ByteBuffer> byteBufferFlux = getS3File(key);
Flux<DataBuffer> dataBufferFlux = byteBufferFlux.map(byteBuffer -> {
    // ????? Convert ByteBuffer to DataBuffer ?????
    return dataBuffer;
});
BodyInserter<Flux<DataBuffer>, ReactiveHttpOutputMessage> inserter = BodyInserters.fromDataBuffers(dataBufferFlux);
Any suggestions how to achieve this?
You can convert using DefaultDataBuffer, which you can create via the DefaultDataBufferFactory:
DataBufferFactory dataBufferFactory = new DefaultDataBufferFactory();
Flux<DataBuffer> buffer = getS3File(key).map(dataBufferFactory::wrap);
BodyInserter<Flux<DataBuffer>, ReactiveHttpOutputMessage> inserter =
BodyInserters.fromDataBuffers(buffer);
You don't actually need a BodyInserter at all, though; if you're using WebClient, you can use the following method signature for body():
<T, P extends Publisher<T>> RequestHeadersSpec<?> body(P publisher, Class<T> elementClass);
You can then pass your Flux<ByteBuffer> directly into it, specifying the element class to use:
WebClient.create("http://someUrl")
.post()
.uri("/someUri")
.body(getS3File(key),ByteBuffer.class)
You may not need dataBufferFlux at all and should be able to write the Flux<ByteBuffer> to your REST endpoint directly. Try this:
Flux<ByteBuffer> byteBufferFlux = getS3File(key);
BodyInserter<Flux<ByteBuffer>, ReactiveHttpOutputMessage> inserter =
        BodyInserters.fromPublisher(byteBufferFlux, ByteBuffer.class);
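For completeness, a minimal sketch of wiring that inserter into a WebClient call (the URL and URI are placeholders; in a real reactive pipeline you would return or compose the resulting Mono rather than subscribing here):
WebClient.create("http://someUrl")
        .post()
        .uri("/someUri")
        .body(BodyInserters.fromPublisher(getS3File(key), ByteBuffer.class))
        .retrieve()
        .bodyToMono(Void.class)
        .subscribe();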
I can't figure out how to read a JSON file from S3 into memory as a String.
The examples I find call getObjectContent(); however, this is not available on the GetObjectResponse I get from the S3AsyncClient.
The code I am experimenting with is the sample code from AWS.
// Creates a default async client with credentials and AWS Region loaded from the
// environment
S3AsyncClient client = S3AsyncClient.create();

// Start the call to Amazon S3, not blocking to wait for the result
CompletableFuture<GetObjectResponse> responseFuture =
        client.getObject(GetObjectRequest.builder()
                        .bucket("my-bucket")
                        .key("my-object-key")
                        .build(),
                AsyncResponseTransformer.toFile(Paths.get("my-file.out")));

// When future is complete (either successfully or in error), handle the response
CompletableFuture<GetObjectResponse> operationCompleteFuture =
        responseFuture.whenComplete((getObjectResponse, exception) -> {
            if (getObjectResponse != null) {
                // At this point, the file my-file.out has been created with the data
                // from S3; let's just print the object version
                System.out.println(getObjectResponse.versionId());
            } else {
                // Handle the error
                exception.printStackTrace();
            }
        });

// We could do other work while waiting for the AWS call to complete in
// the background, but we'll just wait for "whenComplete" to finish instead
operationCompleteFuture.join();
How should this code be modified so that I can get the actual JSON content from the GetObjectResponse?
After the response is transformed to bytes, it can be transformed to a string:
S3AsyncClient client = S3AsyncClient.create();
GetObjectRequest getObjectRequest = GetObjectRequest.builder().bucket("my-bucket").key("my-object-key").build();

client.getObject(getObjectRequest, AsyncResponseTransformer.toBytes())
        .thenApply(ResponseBytes::asUtf8String)
        .whenComplete((stringContent, exception) -> {
            if (stringContent != null)
                System.out.println(stringContent);
            else
                exception.printStackTrace();
        });
You can use AsyncResponseTransformer.toBytes in order to save the response to a byte array rather than a file (see the javadoc).
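If you just need the content synchronously (for example in a quick test), you can also join the future instead of using whenComplete; a small sketch with the same placeholder bucket and key:
String json = client
        .getObject(getObjectRequest, AsyncResponseTransformer.toBytes())
        .thenApply(ResponseBytes::asUtf8String)
        .join(); // blocks until the download completes
System.out.println(json);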
I'm looking to leverage RackSpace's CloudFiles platform for large object storage (word docs, images, etc). Following some of their guides, I found a useful code snippet that looks like it should work, but doesn't in my case.
Iterable<Module> modules = ImmutableSet.<Module> of(
        new Log4JLoggingModule());

Properties properties = new Properties();
properties.setProperty(LocationConstants.PROPERTY_ZONE, ZONE);
properties.setProperty(LocationConstants.PROPERTY_REGION, "ORD");

CloudFilesClient cloudFilesClient = ContextBuilder.newBuilder(PROVIDER)
        .credentials(username, apiKey)
        .overrides(properties)
        .modules(modules)
        .buildApi(CloudFilesClient.class);
The problem is that when this code executes, it tries to log me in to the IAD (Virginia) instance of CloudFiles. My organization's goal is to use the ORD (Chicago) instance as primary, to be colocated with our cloud, and to use DFW as a backup environment. The login response returns the IAD instance first, so I'm assuming jclouds is using that. Browsing around, it looks like the ZONE/REGION attributes are ignored for CloudFiles. I was wondering if there is any way to override the code that comes back for authentication, loop through the returned providers, and choose which one to log in to.
Update:
The accepted answer is mostly good, with some more info available in this snippet:
RestContext<CommonSwiftClient, CommonSwiftAsyncClient> swift = cloudFilesClient.unwrap();
CommonSwiftClient client = swift.getApi();
SwiftObject object = client.newSwiftObject();
object.getInfo().setName(FILENAME + SUFFIX);
object.setPayload("This is my payload."); //input stream.
String id = client.putObject(CONTAINER, object);
System.out.println(id);
SwiftObject obj2 = client.getObject(CONTAINER,FILENAME + SUFFIX);
System.out.println(obj2.getPayload());
We are working on the next version of jclouds (1.7.1) that should include multi-region support for Rackspace Cloud Files and OpenStack Swift. In the meantime you might be able to use this code as a workaround.
private void uploadToRackspaceRegion() {
    Iterable<Module> modules = ImmutableSet.<Module> of(new Log4JLoggingModule());

    String provider = "swift-keystone"; // Region selection is limited to the swift-keystone provider
    String identity = "username";
    String credential = "password";
    String endpoint = "https://identity.api.rackspacecloud.com/v2.0/";
    String region = "ORD";

    Properties overrides = new Properties();
    overrides.setProperty(LocationConstants.PROPERTY_REGION, region);
    overrides.setProperty(Constants.PROPERTY_API_VERSION, "2");

    BlobStoreContext context = ContextBuilder.newBuilder(provider)
            .endpoint(endpoint)
            .credentials(identity, credential)
            .modules(modules)
            .overrides(overrides)
            .buildView(BlobStoreContext.class);

    RestContext<CommonSwiftClient, CommonSwiftAsyncClient> swift = context.unwrap();
    CommonSwiftClient client = swift.getApi();

    SwiftObject uploadObject = client.newSwiftObject();
    uploadObject.getInfo().setName("test.txt");
    uploadObject.setPayload("This is my payload."); // input stream.

    String eTag = client.putObject("jclouds", uploadObject);
    System.out.println("eTag = " + eTag);

    SwiftObject downloadObject = client.getObject("jclouds", "test.txt");
    System.out.println("downloadObject = " + downloadObject.getPayload());

    context.close();
}
Use swift as you would Cloud Files. Keep in mind that if you need to use Cloud Files CDN stuff, the above won't work for that. Also, know that this way of doing things will eventually be deprecated.
Is it possible to upload a txt/pdf/png file to Amazon S3 in a single action, and get the uploaded file URL as the response?
If so, is the AWS Java SDK the right library to add to my Java Struts2 web application?
Please suggest a solution for this.
No, you cannot get the URL in a single action, but you can in two :)
First of all, you may have to make the file public before uploading, because it makes no sense to get a URL that no one can access. You can do so by setting the ACL as Michael Astreiko suggested.
You can get the resource URL either by calling getResourceUrl or getUrl.
AmazonS3Client s3Client = (AmazonS3Client) AmazonS3ClientBuilder.defaultClient();
s3Client.putObject(new PutObjectRequest("your-bucket", "some-path/some-key.jpg",
        new File("somePath/someKey.jpg")).withCannedAcl(CannedAccessControlList.PublicRead));
s3Client.getResourceUrl("your-bucket", "some-path/some-key.jpg");
Note1:
The difference between getResourceUrl and getUrl is that getResourceUrl will return null when an exception occurs.
Note2:
The getUrl method is not defined in the AmazonS3 interface. You have to cast the object to AmazonS3Client if you use the standard builder.
You can work it out for yourself given the bucket and the file name you specify in the upload request.
e.g. if your bucket is mybucket and your file is named myfilename:
https://mybucket.s3.amazonaws.com/myfilename
The s3 part of the hostname will differ depending on which region your bucket is in. For example, I use the Southeast Asia region, so my URLs look like:
https://mybucket.s3-ap-southeast-1.amazonaws.com/myfilename
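If you prefer to build it in code, a tiny sketch (bucket, key and region are placeholders; newer regions use the dotted s3.<region> form instead of s3-<region>):
String bucket = "mybucket";
String key = "myfilename";
String region = "ap-southeast-1"; // your bucket's region

// Virtual-hosted-style URL in the older s3-<region> form shown above
String url = String.format("https://%s.s3-%s.amazonaws.com/%s", bucket, region, key);

// Many regions also resolve the dotted form:
// https://mybucket.s3.ap-southeast-1.amazonaws.com/myfilename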
For AWS SDK 2+
String key = "filePath";
String bucketName = "bucketName";

PutObjectResponse response = s3Client
        .putObject(PutObjectRequest.builder().bucket(bucketName).key(key).build(), RequestBody.fromFile(file));

GetUrlRequest request = GetUrlRequest.builder().bucket(bucketName).key(key).build();
String url = s3Client.utilities().getUrl(request).toExternalForm();
@hussachai's and @Jeffrey Kemp's answers are pretty good, but they have one thing in common: the URL returned is virtual-hosted-style, not path-style. For more info on S3 URL styles, refer to AWS S3 URL Styles. In case you want a path-style S3 URL generated, here are the steps. Basically everything is the same as in @hussachai's and @Jeffrey Kemp's answers, with only a one-line setting change, as below:
AmazonS3Client s3Client = (AmazonS3Client) AmazonS3ClientBuilder.standard()
.withRegion("us-west-2")
.withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
.withPathStyleAccessEnabled(true)
.build();
// Upload a file as a new object with ContentType and title specified.
PutObjectRequest request = new PutObjectRequest(bucketName, stringObjKeyName, fileToUpload);
s3Client.putObject(request);
URL s3Url = s3Client.getUrl(bucketName, stringObjKeyName);
logger.info("S3 url is " + s3Url.toExternalForm());
This will generate url like:
https://s3.us-west-2.amazonaws.com/mybucket/myfilename
Similarly, if you want the link through s3Client, you can use the code below.
System.out.println("filelink: " + s3Client.getUrl("your_bucket_name", "your_file_key"));
A bit old, but still, for anyone stumbling upon this in the future: you can do it with one line, assuming you already wrote the CredentialsProvider and the AmazonS3Client.
It will look like this:
String ImageURL = String.valueOf(s3.getUrl(
ConstantsAWS3.BUCKET_NAME, //The S3 Bucket To Upload To
file.getName())); //The key for the uploaded object
And if you didn't write the CredentialsProvider and the AmazonS3Client, then just add them before getting the URL, like this:
CognitoCachingCredentialsProvider credentialsProvider = new CognitoCachingCredentialsProvider(
getApplicationContext(),
"POOL_ID", // Identity pool ID
Regions.US_EAST_1 // Region
);
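The S3 client itself can then be built from that provider; a small sketch assuming the AWS SDK for Android classes used above (ConstantsAWS3.BUCKET_NAME and file come from the earlier snippet):
// Build the S3 client from the Cognito credentials provider
AmazonS3 s3 = new AmazonS3Client(credentialsProvider, Region.getRegion(Regions.US_EAST_1));

// Then resolve the URL exactly as above
String imageUrl = String.valueOf(s3.getUrl(ConstantsAWS3.BUCKET_NAME, file.getName()));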
The method below uploads a file into a particular folder in a bucket and returns the generated URL of the uploaded file.
private String uploadFileToS3Bucket(final String bucketName, final File file) {
    final String uniqueFileName = uploadFolder + "/" + file.getName();
    LOGGER.info("Uploading file with name= " + uniqueFileName);
    final PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, uniqueFileName, file);
    amazonS3.putObject(putObjectRequest);
    return ((AmazonS3Client) amazonS3).getResourceUrl(bucketName, uniqueFileName);
}
If you're using the AWS SDK for JavaScript, the data object returned contains the object URL in data.Location:
const AWS = require('aws-sdk')
const s3 = new AWS.S3(config)
s3.upload(params).promise()
.then((data)=>{
console.log(data.Location)
})
.catch(err=>console.log(err))
System.out.println("Link : " + s3Object.getObjectContent().getHttpRequest().getURI());
With this code you can retrieve the link of a file already uploaded to the S3 bucket.
To make the file public before uploading you can use the #withCannedAcl method of PutObjectRequest:
myAmazonS3Client.putObject(new PutObjectRequest('some-grails-bucket', 'somePath/someKey.jpg', new File('/Users/ben/Desktop/photo.jpg')).withCannedAcl(CannedAccessControlList.PublicRead))
String url = myAmazonS3Client.getUrl('some-grails-bucket', 'somePath/someKey.jpg').toString();