I am testing with Amazon S3 compatible Minio, using "aws-java-sdk-s3" in Java (Servlet).
In Minio I want to set the bucket policy to "Prefix: *, Read Only", because the initial value of the bucket policy is "None".
I added the following code when creating the bucket, but the policy did not change.
BasicAWSCredentials awsCreds = new BasicAWSCredentials(awsId, awsKey);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
.withEndpointConfiguration(new EndpointConfiguration(endpoint, null))
.withPathStyleAccessEnabled(true)
.build();
s3Client.createBucket(new CreateBucketRequest(bucketName));
s3Client.setBucketPolicy(bucketName,
"{"
+ "\"Version\":\"2012-10-17\","
+ "\"Statement\":["
+ "{"
+ "\"Sid\":\"Statement1\","
+ "\"Effect\":\"Allow\","
+ "\"Principal\":\"*\","
+ "\"Action\":[\"s3:GetObject\"],"
+ "\"Resource\":[\"arn:aws:s3:::*\"]"
+ "}"
+ "]"
+ "}"
);
What did I do wrong? Please tell me.
If there is a way to change the default bucket policy for all buckets, for example through a Minio environment setting, that would also solve my problem.
Thank you.
This is how you can get the general public access policy programmatically.
// Gets a public read policy on the bucket.
// (Policy, Statement, Principal and Resource come from com.amazonaws.auth.policy,
// and S3Actions from com.amazonaws.auth.policy.actions in the v1 SDK.)
public static String getPublicReadPolicy(String bucket_name) {
Policy bucket_policy = new Policy().withStatements(
new Statement(Statement.Effect.Allow)
.withPrincipals(Principal.AllUsers)
.withActions(S3Actions.GetObject)
.withResources(new Resource(
"arn:aws:s3:::" + bucket_name + "/*")));
return bucket_policy.toJson();
}
Then you can apply this policy text to the desired S3 bucket:
String policy_text = getPublicReadPolicy(bucket_name);
s3Client.setBucketPolicy(bucket_name, policy_text);
Note, however, that the Minio console will not show this as public access. It shows as a custom policy, but it behaves as public read access.
This logic can also be extended by using the wildcard * instead of a specific bucket name, as sketched below.
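As a minimal sketch of that wildcard idea (my assumption, not part of the original answer): passing "*" as the bucket name to the helper above yields a Resource of arn:aws:s3:::*/*, i.e. public read on objects in any bucket. Whether a backend accepts such a broad Resource in a per-bucket policy may vary, so verify it against your Minio version.
// Hedged sketch: build a wildcard public-read policy with the helper above
String anyBucketPolicyText = getPublicReadPolicy("*");
s3Client.setBucketPolicy(bucket_name, anyBucketPolicyText);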
In my Java application I need to write data to S3. I don't know the size in advance and the sizes are usually big, so, as recommended in the AWS S3 documentation, I am using the Java AWS SDK's multipart upload (low-level API) to write data to the S3 bucket.
In my application I provide S3BufferedOutputStream, an implementation of OutputStream that other classes in the app can use to write to the S3 bucket.
I store the data in a buffer in a loop, and once the data is bigger than the buffer size I upload the buffered data as a single UploadPartRequest.
Here is the implementation of the write method of S3BufferedOutputStream
@Override
public void write(byte[] b, int off, int len) throws IOException {
this.assertOpen();
int o = off, l = len;
int size;
while (l > (size = this.buf.length - position)) {
System.arraycopy(b, o, this.buf, this.position, size);
this.position += size;
flushBufferAndRewind();
o += size;
l -= size;
}
System.arraycopy(b, o, this.buf, this.position, l);
this.position += l;
}
The whole implementation is similar to this: code repo
My problem here is that each UploadPartRequest is done synchronously, so we have to wait for one part to be uploaded before we can upload the next. And because I am using the AWS S3 low-level API, I cannot benefit from the parallel uploading provided by the TransferManager.
Is there a way to achieve parallel uploads using the low-level SDK?
Or are there code changes that would let it operate asynchronously without corrupting the uploaded data, while maintaining the order of the data?
Here's some example code from a class that I have. It submits the parts to an ExecutorService and holds onto the returned Future. This is written for the v1 Java SDK; if you're using the v2 SDK you could use an async client rather than the explicit threadpool:
// WARNING: data must not be updated by caller; make a defensive copy if needed
public synchronized void uploadPart(byte[] data, boolean isLastPart)
{
partNumber++;
logger.debug("submitting part {} for s3://{}/{}", partNumber, bucket, key);
final UploadPartRequest request = new UploadPartRequest()
.withBucketName(bucket)
.withKey(key)
.withUploadId(uploadId)
.withPartNumber(partNumber)
.withPartSize(data.length)
.withInputStream(new ByteArrayInputStream(data))
.withLastPart(isLastPart);
futures.add(
executor.submit(new Callable<PartETag>()
{
@Override
public PartETag call() throws Exception
{
int localPartNumber = request.getPartNumber();
logger.debug("uploading part {} for s3://{}/{}", localPartNumber, bucket, key);
UploadPartResult response = client.uploadPart(request);
String etag = response.getETag();
logger.debug("uploaded part {} for s3://{}/{}; etag is {}", localPartNumber, bucket, key, etag);
return new PartETag(localPartNumber, etag);
}
}));
}
Note: this method is synchronized to ensure that parts are not submitted out of order.
Once you've submitted all of the parts, you use this method to wait for them to finish and then complete the upload:
public void complete()
{
logger.debug("waiting for upload tasks of s3://{}/{}", bucket, key);
List<PartETag> partTags = new ArrayList<>();
for (Future<PartETag> future : futures)
{
try
{
partTags.add(future.get());
}
catch (Exception e)
{
throw new RuntimeException(String.format("failed to complete upload task for s3://%s/%s", bucket, key), e);
}
}
logger.debug("completing multi-part upload for s3://{}/{}", bucket, key);
CompleteMultipartUploadRequest request = new CompleteMultipartUploadRequest()
.withBucketName(bucket)
.withKey(key)
.withUploadId(uploadId)
.withPartETags(partTags);
client.completeMultipartUpload(request);
logger.debug("completed multi-part upload for s3://{}/{}", bucket, key);
}
You'll also need an abort() method that cancels outstanding parts and aborts the upload. This, and the rest of the class, are left as an exercise for the reader.
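For completeness, here is a minimal sketch of what such an abort() could look like under the same assumptions as above (v1 SDK, and the futures, client, bucket, key and uploadId fields from the class); it is not the author's actual implementation:
public synchronized void abort()
{
    // cancel any part uploads that have not completed yet
    for (Future<PartETag> future : futures)
    {
        future.cancel(true);
    }
    logger.debug("aborting multi-part upload for s3://{}/{}", bucket, key);
    client.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId));
}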
You should look at using the AWS SDK for Java V2. You are referencing V1, not the newest Amazon S3 Java API. If you are not familiar with V2, start here:
Get started with the AWS SDK for Java 2.x
To perform Async operations via the Amazon S3 Java API, you use S3AsyncClient.
Now to learn how to upload an object using this client, see this code example:
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;
/**
* To run this AWS code example, ensure that you have set up your development environment, including your AWS credentials.
*
* For information, see this documentation topic:
*
* https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
*/
public class S3AsyncOps {
public static void main(String[] args) {
final String USAGE = "\n" +
"Usage:\n" +
" S3AsyncOps <bucketName> <key> <path>\n\n" +
"Where:\n" +
" bucketName - the name of the Amazon S3 bucket (for example, bucket1). \n\n" +
" key - the name of the object (for example, book.pdf). \n" +
" path - the local path to the file (for example, C:/AWS/book.pdf). \n" ;
if (args.length != 3) {
System.out.println(USAGE);
System.exit(1);
}
String bucketName = args[0];
String key = args[1];
String path = args[2];
Region region = Region.US_WEST_2;
S3AsyncClient client = S3AsyncClient.builder()
.region(region)
.build();
PutObjectRequest objectRequest = PutObjectRequest.builder()
.bucket(bucketName)
.key(key)
.build();
// Put the object into the bucket
CompletableFuture<PutObjectResponse> future = client.putObject(objectRequest,
AsyncRequestBody.fromFile(Paths.get(path))
);
future.whenComplete((resp, err) -> {
try {
if (resp != null) {
System.out.println("Object uploaded. Details: " + resp);
} else {
// Handle error
err.printStackTrace();
}
} finally {
// Only close the client when you are completely done with it
client.close();
}
});
future.join();
}
}
That example uploads an object using the S3AsyncClient. To perform a multipart upload, you need to use this method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html#createMultipartUpload-software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest-
To see an example of a multipart upload using the S3 sync client, see:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/S3ObjectOperations.java
That is your solution: use the S3AsyncClient object's createMultipartUpload method. A rough sketch follows below.
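For illustration only, here is what that could look like with the S3AsyncClient (the bucket, key, region and part content below are placeholders, not values from the question; non-final parts must be at least 5 MB in a real upload):
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.UploadPartRequest;
import software.amazon.awssdk.services.s3.model.UploadPartResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
public class S3AsyncMultipartSketch {
    public static void main(String[] args) {
        String bucket = "my-bucket"; // placeholder
        String key = "my-key";       // placeholder
        S3AsyncClient s3 = S3AsyncClient.builder().region(Region.US_WEST_2).build();
        // 1. Start the multipart upload and keep the upload id
        String uploadId = s3.createMultipartUpload(
                CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build())
                .join()
                .uploadId();
        // 2. Upload parts; each call returns a CompletableFuture, so several parts can be in flight at once
        byte[] partData = "example part content".getBytes(StandardCharsets.UTF_8);
        CompletableFuture<UploadPartResponse> part1Future = s3.uploadPart(
                UploadPartRequest.builder().bucket(bucket).key(key).uploadId(uploadId).partNumber(1).build(),
                AsyncRequestBody.fromBytes(partData));
        CompletedPart part1 = CompletedPart.builder()
                .partNumber(1)
                .eTag(part1Future.join().eTag())
                .build();
        // 3. Complete the upload with the collected part ETags, in part-number order
        s3.completeMultipartUpload(
                CompleteMultipartUploadRequest.builder()
                        .bucket(bucket).key(key).uploadId(uploadId)
                        .multipartUpload(CompletedMultipartUpload.builder().parts(part1).build())
                        .build())
                .join();
        s3.close();
    }
}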
I have an AWS S3 bucket with the following structure or hierarchy:
customer/name/firstname/123.gz
customer/name/firstname/456.gz
customer/name/firstname/789.gz
I need to get the count of all the gz files in customer/name/firstname using the Java SDK.
Could you please share Java code showing how to do it?
There are several ways to get a list of files from S3. This is one of them:
/**
* @param bucketName bucket name (i.e. customer)
* @param path path within the given bucket (i.e. name/firstname)
* @param pattern pattern that matches the required files (i.e. "\\w+\\.gz")
*/
private List<String> getFileList(String bucketName, String path, Pattern pattern) throws AmazonS3Exception {
ListObjectsV2Request request = createRequest(bucketName, path);
return s3.listObjectsV2(request).getObjectSummaries().stream()
.map(file -> FilenameUtils.getName(file.getKey()))
.filter(fileName -> pattern.matcher(fileName).matches())
.sorted()
.collect(Collectors.toList());
}
private static ListObjectsV2Request createRequest(String bucketName, String path) {
ListObjectsV2Request request = new ListObjectsV2Request();
request.setPrefix(path);
request.withBucketName(bucketName);
return request;
}
P.S. I assume that you already have S3 credentials in your home directory and have successfully initialized the AmazonS3 s3 instance; a usage sketch follows below.
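As a hypothetical usage sketch (the bucket name "customer" and prefix "name/firstname" follow the Javadoc above and are assumptions; also note that a single listObjectsV2 call returns at most 1000 keys, so very large prefixes need pagination):
// hypothetical usage, assuming the s3 field from the answer is initialized like this
s3 = AmazonS3ClientBuilder.defaultClient();
int gzCount = getFileList("customer", "name/firstname", Pattern.compile("\\w+\\.gz")).size();
System.out.println("Number of gz files: " + gzCount);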
I am trying to use a Lambda function for S3 Put event notifications. My Lambda function should be called whenever I put/add any new JSON file to my S3 bucket.
The challenge I have is that there is not enough documentation on how to implement such a Lambda function in Java. Most of the docs I found are for Node.js.
I want my Lambda function to be called, and then inside that Lambda function I want to consume the added JSON and send that JSON to the AWS ES service.
But which classes should I use for this? Does anyone have any idea? S3 and ES are both set up and running. The auto-generated code for the Lambda is
@Override
public Object handleRequest(S3Event input, Context context) {
context.getLogger().log("Input: " + input);
// TODO: implement your handler
return null;
}
What next??
Handling S3 events in Lambda can be done, but you have to keep in mind that the S3Event object only transports a reference to the object, not the object itself. To get to the actual object you have to invoke the AWS SDK yourself.
Requesting an S3 object within a Lambda function would look like this:
public Object handleRequest(S3Event input, Context context) {
AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());
for (S3EventNotificationRecord record : input.getRecords()) {
String s3Key = record.getS3().getObject().getKey();
String s3Bucket = record.getS3().getBucket().getName();
context.getLogger().log("found id: " + s3Bucket+" "+s3Key);
// retrieve s3 object
S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
InputStream objectData = object.getObjectContent();
//insert object into elasticsearch
}
return null;
}
Now comes the rather difficult part: inserting this object into Elasticsearch. Sadly, the AWS SDK does not provide any functions for this. The default approach is to make a REST call against the AWS ES endpoint. There are various samples out there on how to call an Elasticsearch instance.
Some people seem to go with the following project:
Jest - Elasticsearch Java Rest Client
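As a minimal sketch of that approach using Jest (the endpoint, index and type names below are placeholders, not values from the question; request signing or other authentication against the AWS ES domain is not shown):
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Index;
import java.io.IOException;
public class EsIndexer {
    // placeholder endpoint: replace with your AWS ES domain endpoint
    private static final String ES_ENDPOINT = "https://my-es-domain.us-east-1.es.amazonaws.com";
    // indexes one JSON document (e.g. the content read from the S3 object) under the given id
    public static void indexDocument(String json, String id) throws IOException {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder(ES_ENDPOINT).build());
        JestClient client = factory.getObject();
        Index request = new Index.Builder(json)
                .index("my-index") // placeholder index name
                .type("doc")       // placeholder type name
                .id(id)
                .build();
        client.execute(request);
    }
}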
Finally, here are the steps for the S3 --> Lambda --> ES integration using Java.
Have your S3, Lambda and ES created on AWS. The steps are here.
Use the Java code below in your Lambda function to fetch a newly added object from S3 and send it to the ES service.
public Object handleRequest(S3Event input, Context context) {
AmazonS3Client s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());
for (S3EventNotificationRecord record : input.getRecords()) {
String s3Key = record.getS3().getObject().getKey();
String s3Bucket = record.getS3().getBucket().getName();
context.getLogger().log("found id: " + s3Bucket+" "+s3Key);
// retrieve s3 object
S3Object object = s3Client.getObject(new GetObjectRequest(s3Bucket, s3Key));
InputStream objectData = object.getObjectContent();
//Start putting your objects in AWS ES Service
String esInput = "Build your JSON string here using S3 objectData";
HttpClient httpClient = new DefaultHttpClient();
HttpPut putRequest = new HttpPut(AWS_ES_ENDPOINT + "/{Index_name}/{product_name}/{unique_id}");
try {
// renamed from "input" so it does not clash with the S3Event parameter
StringEntity entity = new StringEntity(esInput);
entity.setContentType("application/json");
putRequest.setEntity(entity);
httpClient.execute(putRequest);
} catch (IOException e) {
throw new RuntimeException("Failed to index the document in ES", e);
} finally {
httpClient.getConnectionManager().shutdown();
}
}
return "success";
}
Use either Postman or Sense to create the actual index & corresponding mapping in ES.
Once done, download and run proxy.js on your machine. Make sure you set up the ES security steps suggested in this post.
Test the setup and Kibana by opening the http://localhost:9200/_plugin/kibana/ URL on your machine.
All is set. Go ahead and set up your dashboard in Kibana. Test it by adding new objects to your S3 bucket.
I'm looking to leverage RackSpace's CloudFiles platform for large object storage (word docs, images, etc). Following some of their guides, I found a useful code snippet, that looks like it should work, but doesn't in my case.
Iterable<Module> modules = ImmutableSet.<Module> of(
new Log4JLoggingModule());
Properties properties = new Properties();
properties.setProperty(LocationConstants.PROPERTY_ZONE, ZONE);
properties.setProperty(LocationConstants.PROPERTY_REGION, "ORD");
CloudFilesClient cloudFilesClient = ContextBuilder.newBuilder(PROVIDER)
.credentials(username, apiKey)
.overrides(properties)
.modules(modules)
.buildApi(CloudFilesClient.class);
The problem is that when this code executes, it tries to log me into the IAD (Virginia) instance of CloudFiles. My organization's goal is to use the ORD (Chicago) instance as primary, to be colocated with our cloud, and to use DFW as a backup environment. The login response returns the IAD instance first, so I'm assuming jclouds is using that. Browsing around, it looks like the ZONE/REGION attributes are ignored for CloudFiles. I was wondering if there is any way to override the code that handles authentication so it loops through the returned providers and chooses which one to log in to.
Update:
The accepted answer is mostly good, with some more info available in this snippet:
RestContext<CommonSwiftClient, CommonSwiftAsyncClient> swift = cloudFilesClient.unwrap();
CommonSwiftClient client = swift.getApi();
SwiftObject object = client.newSwiftObject();
object.getInfo().setName(FILENAME + SUFFIX);
object.setPayload("This is my payload."); //input stream.
String id = client.putObject(CONTAINER, object);
System.out.println(id);
SwiftObject obj2 = client.getObject(CONTAINER,FILENAME + SUFFIX);
System.out.println(obj2.getPayload());
We are working on the next version of jclouds (1.7.1) that should include multi-region support for Rackspace Cloud Files and OpenStack Swift. In the meantime you might be able to use this code as a workaround.
private void uploadToRackspaceRegion() {
Iterable<Module> modules = ImmutableSet.<Module> of(new Log4JLoggingModule());
String provider = "swift-keystone"; //Region selection is limited to swift-keystone provider
String identity = "username";
String credential = "password";
String endpoint = "https://identity.api.rackspacecloud.com/v2.0/";
String region = "ORD";
Properties overrides = new Properties();
overrides.setProperty(LocationConstants.PROPERTY_REGION, region);
overrides.setProperty(Constants.PROPERTY_API_VERSION, "2");
BlobStoreContext context = ContextBuilder.newBuilder(provider)
.endpoint(endpoint)
.credentials(identity, credential)
.modules(modules)
.overrides(overrides)
.buildView(BlobStoreContext.class);
RestContext<CommonSwiftClient, CommonSwiftAsyncClient> swift = context.unwrap();
CommonSwiftClient client = swift.getApi();
SwiftObject uploadObject = client.newSwiftObject();
uploadObject.getInfo().setName("test.txt");
uploadObject.setPayload("This is my payload."); //input stream.
String eTag = client.putObject("jclouds", uploadObject);
System.out.println("eTag = " + eTag);
SwiftObject downloadObject = client.getObject("jclouds", "test.txt");
System.out.println("downloadObject = " + downloadObject.getPayload());
context.close();
}
Use swift as you would Cloud Files. Keep in mind that if you need to use Cloud Files CDN stuff, the above won't work for that. Also, know that this way of doing things will eventually be deprecated.
Is it possible to upload a txt/pdf/png file to Amazon S3 in a single action, and get the uploaded file URL as the response?
If so, is the AWS Java SDK the right library to add to my Java Struts2 web application?
Please suggest a solution for this.
No, you cannot get the URL in a single action, but you can in two :)
First of all, you may have to make the file public before uploading, because it makes no sense to get a URL that no one can access. You can do so by setting the ACL as Michael Astreiko suggested.
You can get the resource URL by calling either getResourceUrl or getUrl.
AmazonS3Client s3Client = (AmazonS3Client)AmazonS3ClientBuilder.defaultClient();
s3Client.putObject(new PutObjectRequest("your-bucket", "some-path/some-key.jpg", new File("somePath/someKey.jpg")).withCannedAcl(CannedAccessControlList.PublicRead));
s3Client.getResourceUrl("your-bucket", "some-path/some-key.jpg");
Note 1:
The difference between getResourceUrl and getUrl is that getResourceUrl will return null when an exception occurs.
Note 2:
The getUrl method is not defined in the AmazonS3 interface. You have to cast the object to AmazonS3Client if you use the standard builder.
You can work it out for yourself given the bucket and the file name you specify in the upload request.
e.g. if your bucket is mybucket and your file is named myfilename:
https://mybucket.s3.amazonaws.com/myfilename
The s3 part of the host name will differ depending on which region your bucket is in. For example, I use the South-East Asia region, so my URLs look like:
https://mybucket.s3-ap-southeast-1.amazonaws.com/myfilename
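If you'd rather build it in code than by hand, here is a small sketch (the bucket, key and region values are examples; the s3-<region> host format mirrors the URL above and differs for newer regions, which use s3.<region>):
String bucket = "mybucket";
String key = "myfilename";
String region = "ap-southeast-1"; // example region from above
// virtual-hosted-style URL, assembled by hand
String url = String.format("https://%s.s3-%s.amazonaws.com/%s", bucket, region, key);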
For AWS SDK 2+
String key = "filePath";
String bucketName = "bucketName";
PutObjectResponse response = s3Client
.putObject(PutObjectRequest.builder().bucket(bucketName).key(key).build(), RequestBody.fromFile(file));
GetUrlRequest request = GetUrlRequest.builder().bucket(bucketName).key(key).build();
String url = s3Client.utilities().getUrl(request).toExternalForm();
@hussachai's and @Jeffrey Kemp's answers are pretty good, but they have one thing in common: the returned URL is virtual-hosted-style, not path-style. For more info on S3 URL styles, refer to AWS S3 URL Styles. In case some people want a path-style S3 URL generated, here are the steps. Basically everything is the same as in @hussachai's and @Jeffrey Kemp's answers, with only a one-line setting change, as shown below:
AmazonS3Client s3Client = (AmazonS3Client) AmazonS3ClientBuilder.standard()
.withRegion("us-west-2")
.withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
.withPathStyleAccessEnabled(true)
.build();
// Upload the file as a new object.
PutObjectRequest request = new PutObjectRequest(bucketName, stringObjKeyName, fileToUpload);
s3Client.putObject(request);
URL s3Url = s3Client.getUrl(bucketName, stringObjKeyName);
logger.info("S3 url is " + s3Url.toExternalForm());
This will generate url like:
https://s3.us-west-2.amazonaws.com/mybucket/myfilename
Similarly, if you want the link through s3Client you can use the code below.
System.out.println("filelink: " + s3Client.getUrl("your_bucket_name", "your_file_key"));
A bit old, but still, for anyone stumbling upon this in the future:
You can do it with one line, assuming you already wrote the CredentialsProvider and the AmazonS3Client.
It will look like this:
String ImageURL = String.valueOf(s3.getUrl(
ConstantsAWS3.BUCKET_NAME, //The S3 Bucket To Upload To
file.getName())); //The key for the uploaded object
And if you didn't write the CredentialsProvider and the AmazonS3Client yet, just add them before getting the URL, like this:
CognitoCachingCredentialsProvider credentialsProvider = new CognitoCachingCredentialsProvider(
getApplicationContext(),
"POOL_ID", // Identity pool ID
Regions.US_EAST_1 // Region
);
The method below uploads a file to a particular folder in a bucket and returns the generated URL of the uploaded file.
private String uploadFileToS3Bucket(final String bucketName, final File file) {
final String uniqueFileName = uploadFolder + "/" + file.getName();
LOGGER.info("Uploading file with name= " + uniqueFileName);
final PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, uniqueFileName, file);
amazonS3.putObject(putObjectRequest);
return ((AmazonS3Client) amazonS3).getResourceUrl(bucketName, uniqueFileName);
}
If you're using the JavaScript AWS SDK, the data object returned by upload contains the object URL in data.Location:
const AWS = require('aws-sdk')
const s3 = new AWS.S3(config)
s3.upload(params).promise()
.then((data)=>{
console.log(data.Location)
})
.catch(err=>console.log(err))
System.out.println("Link : " + s3Object.getObjectContent().getHttpRequest().getURI());
With this code you can retrieve the link of already uploaded file to S3 bucket.
To make the file public before uploading you can use the #withCannedAcl method of PutObjectRequest:
myAmazonS3Client.putObject(new PutObjectRequest('some-grails-bucket', 'somePath/someKey.jpg', new File('/Users/ben/Desktop/photo.jpg')).withCannedAcl(CannedAccessControlList.PublicRead))
String url = myAmazonS3Client.getUrl('some-grails-bucket', 'somePath/someKey.jpg').toString();