Listing objects in an AWS S3 bucket - Java

I am trying to print all the objects in a bucket, but I am getting an error:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: 758A7CBF1A29FD74, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint., S3
At the moment I only have the following code:
import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class S3Download {

    /**
     * @param args
     */
    public static void main(String[] args) {
        AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
        Region usWest2 = Region.getRegion(Regions.US_WEST_2);
        s3.setRegion(usWest2);

        String bucketName = "apireleasecandidate1";

        ListObjectsRequest listObjectRequest = new ListObjectsRequest().withBucketName(bucketName);
        ObjectListing objectListing;

        do {
            objectListing = s3.listObjects(listObjectRequest);
            for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
                System.out.println(" - " + objectSummary.getKey() + " " +
                        "(size = " + objectSummary.getSize() + ")");
            }
            listObjectRequest.setMarker(objectListing.getNextMarker());
        } while (objectListing.isTruncated());
    }
}
I found this solution on Amazon's website.
Does anyone know what I am missing?

For Scala developers, here is a recursive function to execute a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java:
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // include the accumulated results in the base case, otherwise only the last page is returned
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}
To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list.
For example,
map(s3, bucket, prefix)(s => println(s))
will print all the files.
val tuple = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))
will return the full list of (key, owner, size) tuples in that bucket/prefix.
val totalSize = map(s3, "bucket", "prefix")(s => s.getSize).sum
will return the total size of its content (note the additional sum() folding function applied at the end of the expression).
You can combine map() with many other functions, just as you normally would with monads in functional programming.

It appears that your bucket "apireleasecandidate1" is not in the us-west-2 region. I think it is in the US Standard ("US Classic", us-east-1) region. You should modify your code to remove the setRegion() call, or set it to the region the bucket actually lives in.
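If you are unsure where the bucket actually lives, here is a minimal sketch (AWS SDK for Java v1, reusing the bucket name from the question) that asks S3 for the bucket's location before setting a region:
import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

public class BucketRegionCheck {
    public static void main(String[] args) {
        AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());

        // getBucketLocation returns "US" for US Standard buckets and the region name
        // (e.g. "us-west-2") otherwise; legacy values such as "EU" may need special handling.
        String location = s3.getBucketLocation("apireleasecandidate1");
        System.out.println("Bucket location: " + location);

        if (!"US".equals(location)) {
            s3.setRegion(Region.getRegion(Regions.fromName(location)));
        }
    }
}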


BatchDelete is not deleting all values in the list

I am trying to delete multiple rows from DynamoDB. I query a single table on its partition key, which returns 2 values as output, and these are saved in a list.
Now, after deleting these values using batchDelete, only the first element gets deleted. Sometimes the second value also gets deleted, but that does not happen every time.
DynamoDBQueryExpression<Abc> queryExpression = new DynamoDBQueryExpression<Abc>()
        .withHashKeyValues(abc);
List<Abc> xyz = dynamoDBMapper.query(Abc.class, queryExpression);
// xyz has size 2
dynamoDBMapper.batchDelete(xyz);
Should I use sleep, or is there another way?
If you look at Java V1 here:
https://github.com/awsdocs/aws-doc-sdk-examples
you will see that it is marked as deprecated.
I strongly recommend that you upgrade to the AWS SDK for Java v2 API.
When working with Java V2 and DynamoDB, the Enhanced Client offers a straightforward way to map client-side classes to DynamoDB tables. This is documented in the Java V2 Developer Guide here:
Mapping items in DynamoDB tables
To use the Enhanced Client to delete multiple items, you can use this Java code:
package com.example.dynamodb;

// snippet-start:[dynamodb.java2.mapping.batchdelete.import]
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbEnhancedClient;
import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.Key;
import software.amazon.awssdk.enhanced.dynamodb.TableSchema;
import software.amazon.awssdk.enhanced.dynamodb.model.BatchWriteItemEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.model.DeleteItemEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.model.WriteBatch;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.DynamoDbException;
// snippet-end:[dynamodb.java2.mapping.batchdelete.import]

/*
 * Before running this code example, create an Amazon DynamoDB table named Customer with these columns:
 *   - id - the id of the record that is the key
 *   - custName - the customer name
 *   - email - the email value
 *   - registrationDate - an instant value when the item was added to the table
 *
 * Also, ensure that you have set up your development environment, including your credentials.
 *
 * For information, see this documentation topic:
 *
 * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
 */
public class EnhancedBatchDeleteItems {

    public static void main(String[] args) {
        ProfileCredentialsProvider credentialsProvider = ProfileCredentialsProvider.create();
        Region region = Region.US_EAST_1;
        DynamoDbClient ddb = DynamoDbClient.builder()
                .region(region)
                .credentialsProvider(credentialsProvider)
                .build();

        DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
                .dynamoDbClient(ddb)
                .build();

        deleteBatchRecords(enhancedClient);
        ddb.close();
    }

    // snippet-start:[dynamodb.java2.mapping.batchdelete.main]
    public static void deleteBatchRecords(DynamoDbEnhancedClient enhancedClient) {
        try {
            DynamoDbTable<Customer> mappedTable = enhancedClient.table("Customer", TableSchema.fromBean(Customer.class));
            Key key1 = Key.builder()
                    .partitionValue("id110")
                    .build();

            Key key2 = Key.builder()
                    .partitionValue("id120")
                    .build();

            BatchWriteItemEnhancedRequest request = BatchWriteItemEnhancedRequest.builder()
                    .writeBatches(WriteBatch.builder(Customer.class)
                                    .mappedTableResource(mappedTable)
                                    .addDeleteItem(DeleteItemEnhancedRequest.builder()
                                            .key(key1)
                                            .build())
                                    .build(),
                            WriteBatch.builder(Customer.class)
                                    .mappedTableResource(mappedTable)
                                    .addDeleteItem(DeleteItemEnhancedRequest.builder()
                                            .key(key2)
                                            .build())
                                    .build())
                    .build();

            // Delete these two items from the table.
            enhancedClient.batchWriteItem(request);
            System.out.println("Records deleted");

        } catch (DynamoDbException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
    // snippet-end:[dynamodb.java2.mapping.batchdelete.main]
}
You can find this example and other Java V2 DynamoDB examples in the AWS Code Examples GitHub repository.
I do not see any issue with your code; I have tested something similar and it works for me with no issue. My first suggestion would be to ensure you wrap your code in a try/catch block:
try {
    DynamoDBMapper mapper = new DynamoDBMapper(client);
    Reply key = new Reply();
    key.setPk("1");

    DynamoDBQueryExpression<Reply> queryExpression = new DynamoDBQueryExpression<Reply>()
            .withHashKeyValues(key)
            .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
            .withScanIndexForward(false);

    List<Reply> latestReplies = mapper.query(Reply.class, queryExpression);

    // Log keys here to be sure you are deleting the correct items
    for (Reply c : latestReplies) {
        System.out.println(c.getPk());
    }

    mapper.batchDelete(latestReplies);
} catch (Throwable t) {
    System.err.println("Error running: " + t);
    t.printStackTrace();
}
I would suggest that you implement some logging, just to be sure that the items you read are the ones you expect to be deleted.
It may also be worth enabling CloudTrail data plane logging, which will allow you to see all the data plane events being executed on the table.
You may also enable HTTP wire logging to give you another level of detail; however, this is not advised for production workloads as the logging is quite verbose.
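It is also worth inspecting what batchDelete() itself returns: DynamoDBMapper hands back a list of FailedBatch objects describing write requests that DynamoDB did not process, and unprocessed items are a common reason for an item only sometimes being deleted. A minimal sketch (generic over the mapped class, so the Abc bean from the question would work):
import java.util.List;

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.FailedBatch;

public class BatchDeleteCheck {

    // Delete the items and report any batches DynamoDB did not process.
    public static <T> void deleteAndReport(DynamoDBMapper mapper, List<T> items) {
        List<FailedBatch> failed = mapper.batchDelete(items);
        for (FailedBatch batch : failed) {
            System.err.println("Unprocessed items: " + batch.getUnprocessedItems());
            if (batch.getException() != null) {
                batch.getException().printStackTrace();
            }
        }
    }
}
Checking (and retrying) the failed batches is a more reliable approach than adding a sleep.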

Java: How to print the contents of an S3 bucket

I am using JDK 11 and virtual-host-style access (AWS SDK for Java version 2) to create/access objects in an AWS S3 bucket, following:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/examples-s3-objects.html#list-object
While I was able to create objects in the designated bucket, I am not able to print the list of contents/objects in the bucket, even though, as far as I can tell from the permissions, everyone is granted access to view the objects in the bucket. The error message is:
software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404
This is the way the S3Client is created:
adapterSmsS3Client = S3Client.builder()
        .region(Region.US_WEST_2)
        .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create(ACCESS_KEY, SECRET_KEY)))
        .endpointOverride(URI.create(BASE_URL))
        .build();
And this is the way I am trying to print the list:
public static void listBucketObjects(S3Client s3, String bucketName) {
    ListBucketsResponse res1 = s3.listBuckets();
    ListObjectsRequest listObjects = ListObjectsRequest
            .builder()
            .bucket(BUCKET_NAME)
            .build();

    ListObjectsResponse res = s3.listObjects(listObjects);
    List<S3Object> objects = res.contents();

    for (ListIterator iterVals = objects.listIterator(); iterVals.hasNext(); ) {
        S3Object myValue = (S3Object) iterVals.next();
        System.out.print("\n The name of the key is " + myValue.key());
        System.out.print("\n The object is " + calKb(myValue.size()) + " KBs");
        System.out.print("\n The owner is " + myValue.owner());
    }
}
BUCKET_NAME is the name of the bucket on S3 (not a URL).
I would like to mention, though, that if I use path-style requests (AWS SDK for Java version 1), following:
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-objects.html
I am able to print contents from the same bucket. However, we do not intend to go that way.
Any insight on why I am getting the "key does not exist" error, or a potential resolution?
If you had a problem with permissions you would have gotten a 403 Forbidden, not a 404 NoSuchKey.
What are the names of your objects in the bucket? My guess is that you have some special characters or URL-encoded characters that cause the problem. See https://aws.amazon.com/premiumsupport/knowledge-center/404-error-nosuchkey-s3/?nc1=h_ls for more details.
I also suggest that you use listObjectsV2 instead of the V1 listObjects.
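For reference, a minimal listObjectsV2 sketch with the SDK v2 paginator; the bucket name and region below are placeholders, and the paginator follows continuation tokens for you:
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;
import software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable;

public class ListObjectsV2Example {
    public static void main(String[] args) {
        S3Client s3 = S3Client.builder()
                .region(Region.US_WEST_2)
                .build();

        ListObjectsV2Request request = ListObjectsV2Request.builder()
                .bucket("my-bucket") // replace with your bucket name
                .build();

        // listObjectsV2Paginator transparently pages through the results.
        ListObjectsV2Iterable pages = s3.listObjectsV2Paginator(request);
        for (S3Object object : pages.contents()) {
            System.out.println(object.key() + " (" + object.size() + " bytes)");
        }
        s3.close();
    }
}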

How to get the information for an exact file from S3 using ListVersionsRequest (Java)?

I have an S3 bucket named 'groceries' with multiple files inside the folder fruits.
The file names are like ('APPLE_1', 'APPLE_11', 'APPLE_1112', 'APPLE_3', 'APPLE_6').
I basically want the file name and its version IDs for a given bucket and key.
When I run the code below, it also fetches other similar files like (APPLE_11, APPLE_1112).
What changes should I make in the code below to filter only (APPLE_1)?
ListVersionsRequest request = new ListVersionsRequest();
request.withBucketName("groceries");
request.setPrefix("fruits/APPLE_1");
request.withMaxResults(20);

VersionListing versionListing = s3Client.listVersions(request);
int numVersions = 0, numPages = 0;

while (true) {
    numPages++;
    for (S3VersionSummary objectSummary : versionListing.getVersionSummaries()) {
        System.out.printf("Retrieved object %s, version %s\n",
                objectSummary.getKey(),
                objectSummary.getVersionId());
        numVersions++;
    }
    if (versionListing.isTruncated()) {
        versionListing = s3Client.listNextBatchOfVersions(versionListing);
    } else {
        break;
    }
}
Current output:
Retrieved object fruits/APPLE_1, version LFrP3YxiZu9S0
Retrieved object fruits/APPLE_11, version bHcs6Oh1leiPPvUB6NI07P0GB6
Retrieved object fruits/APPLE_1112, version Q9FD7fmVq_t1L3GPFitf
Expected output:
Retrieved object fruits/APPLE_1, version LFrP3YxiZu9S0
You can place an if condition inside the for loop, e.g.:
for (S3VersionSummary objectSummary : versionListing.getVersionSummaries()) {
    if (objectSummary.getKey().equals("fruits/APPLE_1")) { // or any other condition based on your requirement
        System.out.printf("Retrieved object %s, version %s\n",
                objectSummary.getKey(),
                objectSummary.getVersionId());
    }
    numVersions++;
}
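Since a prefix match always returns every key that starts with the prefix, the exact match has to happen client-side either way. For reference, here is a sketch of the same idea packaged as a helper that also follows pagination (the class and method names are only illustrative):
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListVersionsRequest;
import com.amazonaws.services.s3.model.S3VersionSummary;
import com.amazonaws.services.s3.model.VersionListing;

public class ExactKeyVersions {

    // Return every version summary whose key matches exactly, following pagination.
    public static List<S3VersionSummary> versionsForKey(AmazonS3 s3Client, String bucket, String key) {
        List<S3VersionSummary> matches = new ArrayList<>();
        ListVersionsRequest request = new ListVersionsRequest()
                .withBucketName(bucket)
                .withPrefix(key); // narrows the scan; exact match still checked below
        VersionListing listing = s3Client.listVersions(request);
        while (true) {
            for (S3VersionSummary summary : listing.getVersionSummaries()) {
                if (summary.getKey().equals(key)) {
                    matches.add(summary);
                }
            }
            if (!listing.isTruncated()) {
                break;
            }
            listing = s3Client.listNextBatchOfVersions(listing);
        }
        return matches;
    }
}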

Use "Selector modules" with DataMovement SDK MarkLogic [Java] [MarkLogic] [dmsdk] [data-movement-sdk][ml-java-api]

I'm using the Data Movement SDK from the MarkLogic Java API to transform several documents. Up to now I can transform documents by using a query batcher and a transform, but I'm only able to select URIs through StructuredQuery objects.
My question is: how can I use a selector module from my database instead of defining the selection in my Java application?
Update:
Up to now I have code that looks up document URIs and applies a transform to them. I want to change that query batcher so it uses a module or selector module instead of looking for all documents in a directory.
public TransformExecutionResults applyTransformByModule(String transformName, String filterText, int batchSize,
        int threadCount, String selectorModuleName, Map<String, String> parameters) {

    final ConcurrentHashMap<String, TransformExecutionResults> transformResult = new ConcurrentHashMap<>();
    try {
        // Specify a server-side transformation module (stored procedure) by name
        ServerTransform transform = new ServerTransform(transformName);
        ApplyTransformListener transformListener = new ApplyTransformListener()
                .withTransform(transform)
                .withApplyResult(ApplyResult.REPLACE) // Transform in-place, i.e. rewrite
                .onSuccess(batch -> {
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Success);
                    System.out.println("Transformation " + transformName + " executed successfully.");
                }).onSkipped(batch -> {
                    System.out.println("Transformation " + transformName + " skipped successfully.");
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Skipped);
                }).onFailure((batchListener, throwable) -> {
                    System.err.println("Transformation " + transformName + " executed with errors.");
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Failed); // failed
                });

        // Apply the transformation to only the documents that match a query.
        QueryManager qm = DbClient.newQueryManager();
        StructuredQueryBuilder sqb = qm.newStructuredQueryBuilder();

        // Instead of this StructuredQueryDefinition, I want to use a module to get all URIs
        StructuredQueryDefinition queryBySubdirectory = sqb.directory(true, "/temp/" + filterText + "/");
        final QueryBatcher batcher = DMManager.newQueryBatcher(queryBySubdirectory);
        batcher.withBatchSize(batchSize);
        batcher.withThreadCount(threadCount);
        batcher.withConsistentSnapshot();
        batcher.onUrisReady(transformListener).onQueryFailure(exception -> {
            exception.printStackTrace();
            System.out.println("There was an error on Transform process.");
        });

        final JobTicket ticket = DMManager.startJob(batcher);
        batcher.awaitCompletion();
        DMManager.stopJob(ticket);
    } catch (Exception fault) {
        transformResult.compute(transformName, (k, v) -> TransformExecutionResults.GeneralException); // general exception
    }
    return transformResult.get(transformName);
}
If the job is small enough, you can just implement the document rewriting within your enode code either by making a call to a resource service extension:
http://docs.marklogic.com/guide/java/resourceservices#id_27702
http://docs.marklogic.com/javadoc/client/com/marklogic/client/extensions/ResourceServices.html
or by invoking a main module:
http://docs.marklogic.com/guide/java/resourceservices#id_84134
If the job is too long to fit in a single transaction, you can create a QueryBatcher with a document URI iterator instead of with a query. See:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/DataMovementManager.html#newQueryBatcher-java.util.Iterator-
For some examples illustrating the approach, see the second half of the second example in the class description for QueryBatcher:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/QueryBatcher.html
as well as the second half of this example:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/UrisToWriterListener.html
In your case, you could implement an Iterator that calls a resource service extension or invokes a main module to get and return the URIs (preferably with read ahead), blocking when necessary.
By returning the uris to the client, it's easy to log the uris for later audit.
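For illustration, here is a rough sketch of that approach: invoke a server-side module, collect the URIs it returns, and build the QueryBatcher from the resulting Iterator. The module path "/ext/selectUris.sjs" is hypothetical, and this simple version collects all URIs up front rather than reading ahead:
import java.util.ArrayList;
import java.util.List;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.QueryBatcher;
import com.marklogic.client.eval.EvalResult;
import com.marklogic.client.eval.EvalResultIterator;

public class UriIteratorBatcher {

    // Build a QueryBatcher from URIs returned by a (hypothetical) selector module.
    public static QueryBatcher fromSelectorModule(DatabaseClient client, DataMovementManager dmm) {
        List<String> uris = new ArrayList<>();
        EvalResultIterator results = client.newServerEval()
                .modulePath("/ext/selectUris.sjs")
                .eval();
        try {
            for (EvalResult result : results) {
                uris.add(result.getString());
            }
        } finally {
            results.close();
        }
        // The batcher then feeds these URIs to onUrisReady listeners,
        // e.g. the ApplyTransformListener from the question.
        return dmm.newQueryBatcher(uris.iterator());
    }
}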
Hoping that helps,

createUserDefinedFunction: what if it already exists?

I'm using the azure-documentdb Java SDK in order to create and use "User Defined Functions (UDFs)".
From the official documentation I finally found the way (with a Java client) to create a UDF:
String regexUdfJson = "{"
        + "id:\"REGEX_MATCH\","
        + "body:\"function (input, pattern) { return input.match(pattern) !== null; }\","
        + "}";

UserDefinedFunction udfREGEX = new UserDefinedFunction(regexUdfJson);

getDC().createUserDefinedFunction(
        myCollection.getSelfLink(),
        udfREGEX,
        new RequestOptions());
And here is a sample query:
SELECT * FROM root r WHERE udf.REGEX_MATCH(r.name, "mytest_.*")
I have to create the UDF only once, because I get an exception if I try to recreate an existing UDF:
DocumentClientException: Message: {"Errors":["The input name presented is already taken. Ensure to provide a unique name property for this resource type."]}
How can I tell whether the UDF already exists?
I tried to use the "readUserDefinedFunctions" function without success. Any example / other ideas?
Maybe, for the long term, we should suggest a "createOrReplaceUserDefinedFunction(...)" on Azure feedback.
You can check for existing UDFs by running a query using queryUserDefinedFunctions.
Example:
List<UserDefinedFunction> udfs = client.queryUserDefinedFunctions(
        myCollection.getSelfLink(),
        new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
                new SqlParameterCollection(new SqlParameter("@id", myUdfId))),
        null).getQueryIterable().toList();

if (udfs.size() > 0) {
    // Found UDF.
}
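Building on that, here is a sketch of the "createOrReplaceUserDefinedFunction" idea from the question, done client-side with the azure-documentdb Java SDK (the helper class and method are only illustrative; treat it as a starting point rather than a definitive implementation):
import java.util.List;

import com.microsoft.azure.documentdb.DocumentClient;
import com.microsoft.azure.documentdb.DocumentClientException;
import com.microsoft.azure.documentdb.SqlParameter;
import com.microsoft.azure.documentdb.SqlParameterCollection;
import com.microsoft.azure.documentdb.SqlQuerySpec;
import com.microsoft.azure.documentdb.UserDefinedFunction;

public class UdfHelper {

    // Create the UDF if it is missing, otherwise replace the existing one's body.
    public static void createOrReplaceUdf(DocumentClient client, String collectionLink,
            UserDefinedFunction udf) throws DocumentClientException {
        List<UserDefinedFunction> existing = client.queryUserDefinedFunctions(
                collectionLink,
                new SqlQuerySpec("SELECT * FROM root r WHERE r.id = @id",
                        new SqlParameterCollection(new SqlParameter("@id", udf.getId()))),
                null).getQueryIterable().toList();

        if (existing.isEmpty()) {
            client.createUserDefinedFunction(collectionLink, udf, null);
        } else {
            // Reuse the existing resource (it carries the self link) and swap in the new body.
            UserDefinedFunction current = existing.get(0);
            current.setBody(udf.getBody());
            client.replaceUserDefinedFunction(current, null);
        }
    }
}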
An answer for .NET users.
var collectionAltLink = documentCollections["myCollection"].AltLink; // Target collection's AltLink
var udfLink = $"{collectionAltLink}/udfs/{sampleUdfId}"; // sampleUdfId is your UDF Id
var result = await _client.ReadUserDefinedFunctionAsync(udfLink);
var resource = result.Resource;
if (resource != null)
{
    // The UDF with udfId exists
}
Here _client is Azure's DocumentClient and documentCollections is a dictionary of your documentDb collections.
If there's no such UDF in the mentioned collection, the _client throws a NotFound exception.
