AWS S3 - Bulk deletion via Java SDK v1 [duplicate] - java

Is it possible to delete a folder (in an S3 bucket) and all its content with a single API request using the Java SDK for AWS? In the browser console we can delete a folder and its content with a single click, and I hoped the same behavior would be available through the APIs as well.

There is no such thing as folders in S3. There are simply files (objects) with slashes in the filenames (keys).
The S3 browser console will visualize these slashes as folders, but they're not real.
You can delete all files with the same prefix, but first you need to look them up with listObjects(), then you can batch delete them.
For a code snippet using the Java SDK, please refer to Deleting multiple objects; a minimal sketch is shown below.
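Assuming SDK v1 and an existing AmazonS3 client named s3 (the bucket and prefix names are placeholders), the list-then-batch-delete flow looks roughly like this:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.ArrayList;
import java.util.List;

void deletePrefix(AmazonS3 s3, String bucket, String prefix) {
    // Collect every key under the prefix, paging through the listing
    List<KeyVersion> keys = new ArrayList<>();
    ObjectListing listing = s3.listObjects(bucket, prefix);
    while (true) {
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            keys.add(new KeyVersion(summary.getKey()));
        }
        if (!listing.isTruncated()) {
            break;
        }
        listing = s3.listNextBatchOfObjects(listing);
    }
    if (!keys.isEmpty()) {
        // deleteObjects accepts at most 1000 keys per call; for larger
        // prefixes, split the list into batches (see further down).
        s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
    }
}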

You can specify a key prefix in ListObjectsRequest via withPrefix().
For example, consider a bucket that contains the following keys:
foo/bar/baz
foo/bar/bash
foo/bar/bang
foo/boo
Suppose you want to delete all keys beginning with the prefix foo/bar/baz.
if (s3Client.doesBucketExist(bucketName)) {
    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(bucketName)
            .withPrefix("foo/bar/baz");
    ObjectListing objectListing = s3Client.listObjects(listObjectsRequest);
    while (true) {
        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
            s3Client.deleteObject(bucketName, objectSummary.getKey());
        }
        if (objectListing.isTruncated()) {
            objectListing = s3Client.listNextBatchOfObjects(objectListing);
        } else {
            break;
        }
    }
}
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsRequest.html

There is no option to pass a folder name, or more specifically a prefix, to the Java SDK's delete call. But there is an option to pass an array of keys you want to delete (see the DeleteObjectsRequest documentation for details).
By using this, I have written a small method to delete all files corresponding to a prefix.
private AmazonS3 s3client = <Your s3 client>;
private String bucketName = <your bucket name>;

public void deleteDirectory(String prefix) {
    // Note: listObjects returns at most one page (up to 1000 keys), so this
    // handles only the first page; see the next answer for a paginated version.
    ObjectListing objectList = this.s3client.listObjects(this.bucketName, prefix);
    List<S3ObjectSummary> objectSummaryList = objectList.getObjectSummaries();
    String[] keysList = new String[objectSummaryList.size()];
    int count = 0;
    for (S3ObjectSummary summary : objectSummaryList) {
        keysList[count++] = summary.getKey();
    }
    DeleteObjectsRequest deleteObjectsRequest = new DeleteObjectsRequest(bucketName).withKeys(keysList);
    this.s3client.deleteObjects(deleteObjectsRequest);
}

You can try the methods below. They handle truncated (paginated) listings, and they recursively delete all the contents under the given directory prefix:
public Set<String> listS3DirFiles(String bucket, String dirPrefix) {
    ListObjectsV2Request s3FileReq = new ListObjectsV2Request()
            .withBucketName(bucket)
            .withPrefix(dirPrefix)
            .withDelimiter("/");
    Set<String> filesList = new HashSet<>();
    ListObjectsV2Result objectsListing;
    try {
        do {
            objectsListing = amazonS3.listObjectsV2(s3FileReq);
            objectsListing.getCommonPrefixes().forEach(folderPrefix -> {
                filesList.add(folderPrefix);
                Set<String> tempPrefix = listS3DirFiles(bucket, folderPrefix);
                filesList.addAll(tempPrefix);
            });
            for (S3ObjectSummary summary : objectsListing.getObjectSummaries()) {
                filesList.add(summary.getKey());
            }
            s3FileReq.setContinuationToken(objectsListing.getNextContinuationToken());
        } while (objectsListing.isTruncated());
    } catch (SdkClientException e) {
        System.out.println(e.getMessage());
        throw e;
    }
    return filesList;
}
public boolean deleteDirectoryContents(String bucket, String directoryPrefix) {
    Set<String> keysSet = listS3DirFiles(bucket, directoryPrefix);
    if (keysSet.isEmpty()) {
        System.out.println("Given directory " + directoryPrefix + " doesn't have any file");
        return false;
    }
    DeleteObjectsRequest deleteObjectsRequest = new DeleteObjectsRequest(bucket)
            .withKeys(keysSet.toArray(new String[0]));
    try {
        amazonS3.deleteObjects(deleteObjectsRequest);
    } catch (SdkClientException e) {
        System.out.println(e.getMessage());
        throw e;
    }
    return true;
}
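One caveat not handled above: the Multi-Object Delete API accepts at most 1000 keys per request, so for large directories the key set has to be deleted in batches. A minimal sketch (the helper name and constant are mine, not part of the SDK):

private static final int MAX_KEYS_PER_DELETE = 1000;

// Delete keys in chunks of at most 1000, the documented per-request
// limit of the Multi-Object Delete API.
public void deleteInBatches(AmazonS3 amazonS3, String bucket, List<String> keys) {
    for (int i = 0; i < keys.size(); i += MAX_KEYS_PER_DELETE) {
        List<String> batch = keys.subList(i, Math.min(i + MAX_KEYS_PER_DELETE, keys.size()));
        amazonS3.deleteObjects(new DeleteObjectsRequest(bucket)
                .withKeys(batch.toArray(new String[0])));
    }
}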

First you need to fetch all object keys starting with the given prefix:
public List<String> list(String keyPrefix) {
    var objectListing = client.listObjects("bucket-name", keyPrefix);
    var paths =
            objectListing.getObjectSummaries().stream()
                    .map(S3ObjectSummary::getKey)
                    .collect(Collectors.toList());
    while (objectListing.isTruncated()) {
        objectListing = client.listNextBatchOfObjects(objectListing);
        paths.addAll(
                objectListing.getObjectSummaries().stream()
                        .map(S3ObjectSummary::getKey)
                        .toList());
    }
    return paths.stream().sorted().collect(Collectors.toList());
}
Then call deleteObjects:
client.deleteObjects(new DeleteObjectsRequest("bucket-name")
        .withKeys(list("some-prefix").toArray(new String[0])));
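Note that a single DeleteObjectsRequest accepts at most 1000 keys, so if the prefix can match more objects than that, split the returned list into batches of 1000 (see the batching sketch earlier in this thread).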

You can try this. Note that it deletes objects one at a time and only processes the first page of results (up to 1000 keys):
void deleteS3Folder(String bucketName, String folderPath) {
    for (S3ObjectSummary file : s3.listObjects(bucketName, folderPath).getObjectSummaries()) {
        s3.deleteObject(bucketName, file.getKey());
    }
}

Related

AWS SDK V2 S3 fetch object is not fetching objects more than 1000

I am using AWS SDK version 2.16.78, but the ListObjectsRequest is not fetching more than 1000 objects.
I went through the documentation but wasn't able to find how to set the continuation token.
I am using the below code snippet:
try {
    ListObjectsRequest listObjects = ListObjectsRequest
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsResponse res = s3.listObjects(listObjects);
    List<S3Object> objects = res.contents();
    for (ListIterator iterVals = objects.listIterator(); iterVals.hasNext(); ) {
        S3Object myValue = (S3Object) iterVals.next();
        System.out.print("\n The name of the key is " + myValue.key());
    }
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
The above code is only fetching 1000 s3 objects.
As you indicated, AWS will only return up to 1000 of the objects in a bucket:
Returns some or all (up to 1,000) of the objects in a bucket.
Amazon S3 lists objects in alphabetical order. You can take advantage of this fact and provide a marker for the key after which the next request should start, if appropriate:
try {
    ListObjectsRequest listObjectsRequest = ListObjectsRequest
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsResponse listObjectsResponse = null;
    String lastKey = null;
    do {
        if (listObjectsResponse != null) {
            // Resume the listing after the last key seen so far
            listObjectsRequest = listObjectsRequest.toBuilder()
                    .marker(lastKey)
                    .build();
        }
        listObjectsResponse = s3.listObjects(listObjectsRequest);
        // Iterate over results
        for (S3Object myValue : listObjectsResponse.contents()) {
            String key = myValue.key();
            System.out.print("\n The name of the key is " + key);
            // Update the value of the last key processed
            lastKey = key;
        }
    } while (listObjectsResponse.isTruncated());
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
Something very similar can be achieved with v2 of the list objects API, using the ListObjectsV2Request startAfter method.
With v2, you can use ListObjectsV2Response and a continuation token as well. Something similar to:
try {
    ListObjectsV2Request listObjectsRequest = ListObjectsV2Request
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsV2Response listObjectsResponse = null;
    String nextContinuationToken = null;
    do {
        if (listObjectsResponse != null) {
            // Resume the listing where the previous response left off
            listObjectsRequest = listObjectsRequest.toBuilder()
                    .continuationToken(nextContinuationToken)
                    .build();
        }
        listObjectsResponse = s3.listObjectsV2(listObjectsRequest);
        nextContinuationToken = listObjectsResponse.nextContinuationToken();
        // Iterate over results
        for (S3Object myValue : listObjectsResponse.contents()) {
            System.out.print("\n The name of the key is " + myValue.key());
        }
    } while (listObjectsResponse.isTruncated());
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
Finally, you can use the listObjectsV2Paginator method to iterate over the results, much like listNextBatchOfObjects was used in v1 of the API; a sketch follows below. See also the related SO questions on this topic.
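A minimal sketch of the paginator approach, assuming the same S3Client s3 and bucketName as above; the paginator issues the follow-up requests transparently as you iterate:

ListObjectsV2Request listObjectsRequest = ListObjectsV2Request
        .builder()
        .bucket(bucketName)
        .build();
// The iterable lazily fetches further pages; contents() flattens all
// pages into a single sequence of S3Object entries.
ListObjectsV2Iterable pages = s3.listObjectsV2Paginator(listObjectsRequest);
for (S3Object object : pages.contents()) {
    System.out.println(object.key());
}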
All the mappings between operations from v1 and v2 versions of the API are documented here.

IFile.getFile is case sensitive

Say my workspace has certain files in the root folder like foo.xml, foo1.xml, foo2.xml, foo3.xml.
final List<String> configFiles = new ArrayList<>();
configFiles.add("foo.xml");
configFiles.add("foo1.xml");
configFiles.add("Foo2.xml");
final List<IFile> iFiles = configFiles.stream()
        .map(project::getFile)
        .filter(IFile::exists)
        .collect(Collectors.toList());
When I do a getFile on the project, IFile expects a case-sensitive file name; if there is foo2.xml in my workspace and I try to access Foo2.xml, I don't get the file.
How can I get files regardless of case?
I don't think there is a simple way.
You could call members() on the project:
IResource[] members = project.members();
and then match the member names using equalsIgnoreCase:
private IFile findFile(IResource[] members, String name) {
    for (IResource member : members) {
        if (name.equalsIgnoreCase(member.getName())) {
            if (member instanceof IFile) {
                return (IFile) member;
            }
            return null;
        }
    }
    return null;
}
so the stream would be:
final List<IFile> iFiles = configFiles.stream()
        .map(file -> findFile(members, file))
        .filter(Objects::nonNull)
        .collect(Collectors.toList());

"recursively" grab all files in subfolders in S3

I need help to ‘recursively’ grab files in S3.
For example, I have an S3 structure like this:
My-bucket/2018/06/05/10/file1.json
My-bucket/2018/06/05/11/file2.json
My-bucket/2018/06/05/12/file3.json
My-bucket/2018/06/05/13/file5.json
My-bucket/2018/06/05/14/file4.json
My-bucket/2018/06/05/15/file6.json
I need to get all file paths, including the file names, for a given bucket.
I tried the following method, but it didn't work for me (it returns only the common prefixes, not the whole paths):
public List<String> getObjectsListFromFolder4(String bucketName, String keyPrefix) {
    List<String> paths = new ArrayList<String>();
    String delimiter = "/";
    if (keyPrefix != null && !keyPrefix.isEmpty() && !keyPrefix.endsWith(delimiter)) {
        keyPrefix += delimiter;
    }
    ListObjectsRequest listObjectRequest = new ListObjectsRequest().withBucketName(bucketName)
            .withPrefix(keyPrefix).withDelimiter(delimiter);
    ObjectListing objectListing;
    do {
        objectListing = s3Client.listObjects(listObjectRequest);
        paths.addAll(objectListing.getCommonPrefixes());
        listObjectRequest.setMarker(objectListing.getNextMarker());
    } while (objectListing.isTruncated());
    return paths;
}
There is a utility class, S3Objects, that provides an easy way to iterate over Amazon S3 objects in a "foreach" statement. Use its withPrefix method and then just iterate over them. You can use filters and streams as well.
Here is an example (Kotlin):
val s3 = AmazonS3ClientBuilder
    .standard()
    .withCredentials(EnvironmentVariableCredentialsProvider())
    .build()
S3Objects
    .withPrefix(s3, bucket, folder)
    .filter { s3ObjectSummary ->
        s3ObjectSummary.key.endsWith(".gz")
    }
    .parallelStream()
    .forEach { s3ObjectSummary ->
        CSVParser.parse(
            GZIPInputStream(s3.getObject(s3ObjectSummary.bucketName, s3ObjectSummary.key).objectContent),
            StandardCharsets.UTF_8,
            CSVFormat.DEFAULT
        ).use { csvParser ->
            …
        }
    }
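For reference, the same iteration as a minimal Java sketch (S3Objects lives in com.amazonaws.services.s3.iterable; the bucket and prefix names are placeholders):

// S3Objects.withPrefix returns an Iterable that pages through the
// listing for you, issuing follow-up listObjects calls as needed.
for (S3ObjectSummary summary : S3Objects.withPrefix(s3, "my-bucket", "my-folder/")) {
    System.out.println(summary.getKey());
}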
getCommonPrefixes() only lists the prefixes, not the actual keys. From the documentation:
For example, consider a bucket that contains the following keys:
"foo/bar/baz"
"foo/bar/bash"
"foo/bar/bang"
"foo/boo"
If calling listObjects with prefix="foo/" and delimiter="/" on this bucket, the returned S3ObjectListing will contain one entry in the common prefixes list ("foo/bar/") and none of the keys beginning with that common prefix will be included in the object summaries list.
Instead, use getObjectSummaries() to get the keys. You also need to remove withDelimiter(); setting a delimiter causes S3 to list only the items in the current 'directory.' This method works for me:
public static List<String> getObjectsListFromS3(AmazonS3 s3, String bucket, String prefix) {
    final String delimiter = "/";
    if (!prefix.endsWith(delimiter)) {
        prefix = prefix + delimiter;
    }
    List<String> paths = new LinkedList<>();
    ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucket).withPrefix(prefix);
    ObjectListing result;
    do {
        result = s3.listObjects(request);
        for (S3ObjectSummary summary : result.getObjectSummaries()) {
            // Make sure we are not adding a 'folder' placeholder object
            if (!summary.getKey().endsWith(delimiter)) {
                paths.add(summary.getKey());
            }
        }
        // Use getNextMarker() (not getMarker()) to advance to the next page
        request.setMarker(result.getNextMarker());
    } while (result.isTruncated());
    return paths;
}
Consider an S3 bucket that contains the following keys:
particle.fs
test/
test/blur.fs
test/blur.vs
test/subtest/particle.fs
With this driver code:
public static void main(String[] args) {
    String bucket = "playground-us-east-1-1234567890";
    AmazonS3 s3 = AmazonS3ClientBuilder.standard().withRegion("us-east-1").build();
    String prefix = "test";
    for (String key : getObjectsListFromS3(s3, bucket, prefix)) {
        System.out.println(key);
    }
}
produces:
test/blur.fs
test/blur.vs
test/subtest/particle.fs
Here is an example of how to get all files in a directory on the local filesystem; hope it can help you:
public static List<String> getAllFile(String directoryPath, boolean isAddDirectory) {
    List<String> list = new ArrayList<String>();
    File baseFile = new File(directoryPath);
    if (baseFile.isFile() || !baseFile.exists()) {
        return list;
    }
    File[] files = baseFile.listFiles();
    for (File file : files) {
        if (file.isDirectory()) {
            if (isAddDirectory) {
                list.add(file.getAbsolutePath());
            }
            list.addAll(getAllFile(file.getAbsolutePath(), isAddDirectory));
        } else {
            list.add(file.getAbsolutePath());
        }
    }
    return list;
}
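A shorter alternative for the same local-filesystem traversal, using java.nio (a sketch; the method name is mine):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static List<String> getAllFilesNio(String directoryPath) throws IOException {
    // Files.walk traverses the directory tree recursively;
    // keep regular files only, mirroring isAddDirectory == false above.
    try (Stream<Path> walk = Files.walk(Paths.get(directoryPath))) {
        return walk.filter(Files::isRegularFile)
                .map(Path::toString)
                .collect(Collectors.toList());
    }
}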


Get the full path of a file in a change set

I am searching for a .txt file that is located in a change set.
Then I need to recreate the file's full directory path locally on my PC.
For example, if there is a file called "test.txt" located at:
Project1-->Folder1-->Folder2-->test.txt
So far I have managed to search for this file.
Now I need to fetch the full directory path and create a similar one on my PC.
Result on my PC:
Folder1-->Folder2-->test.txt
That's what I did to search for the file within a changeset and retrieve it:
public IFileItem getTextFileFile(IChangeSet changeSet, ITeamRepository repository) throws TeamRepositoryException {
    IVersionableManager vm = SCMPlatform.getWorkspaceManager(repository).versionableManager();
    List changes = changeSet.changes();
    IFileItem toReturn = null;
    for (int i = 0; i < changes.size(); i++) {
        Change change = (Change) changes.get(i);
        IVersionableHandle after = change.afterState();
        if (after != null && after instanceof IFileItemHandle) {
            IFileItem fileItem = (IFileItem) vm.fetchCompleteState(after, null);
            if (fileItem.getName().contains(".txt")) {
                toReturn = fileItem;
                break;
            } else {
                continue;
            }
        }
    }
    if (toReturn == null) {
        throw new TeamRepositoryException("Could not find the file");
    }
    return toReturn;
}
I use RTC 4 on Windows XP.
Thanks in advance.
I have the following IConfiguration, which I fetched like this:
IWorkspaceManager workspaceManager = SCMPlatform.getWorkspaceManager(repository);
IWorkspaceSearchCriteria wsSearchCriteria = WorkspaceSearchCriteria.FACTORY.newInstance();
wsSearchCriteria.setKind(IWorkspaceSearchCriteria.STREAMS);
wsSearchCriteria.setPartialOwnerNameIgnoreCase(projectAreaName);
List<IWorkspaceHandle> workspaceHandles = workspaceManager.findWorkspaces(wsSearchCriteria, Integer.MAX_VALUE, Application.getMonitor());
IWorkspaceConnection workspaceConnection = workspaceManager.getWorkspaceConnection(workspaceHandles.get(0), Application.getMonitor());
IComponentHandle component = changeSet.getComponent();
IConfiguration configuration = workspaceConnection.configuration(component);
List lst = new ArrayList<String>();
lst = configuration.locateAncestors(lst, Application.getMonitor());
=========================================
Now, to get the full path of the file item, I made the following method, based on:
https://jazz.net/forum/questions/94927/how-do-i-find-moved-from-location-for-a-movedreparented-item-using-rtc-4-java-api
=========================================
private String getFullPath(List ancestor, ITeamRepository repository)
        throws TeamRepositoryException {
    String directoryPath = "";
    for (Object ancestorObj : ancestor) {
        IAncestorReport ancestorImpl = (IAncestorReport) ancestorObj;
        for (Object nameItemPairObj : ancestorImpl.getNameItemPairs()) {
            NameItemPairImpl nameItemPair = (NameItemPairImpl) nameItemPairObj;
            Object item = SCMPlatform.getWorkspaceManager(repository)
                    .versionableManager()
                    .fetchCompleteState(nameItemPair.getItem(), null);
            String pathName = "";
            if (item instanceof IFolder) {
                pathName = ((IFolder) item).getName();
            } else if (item instanceof IFileItem) {
                pathName = ((IFileItem) item).getName();
            }
            if (!pathName.equals("")) {
                directoryPath = directoryPath + "\\" + pathName;
            }
        }
    }
    return directoryPath;
}
=========================================
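To actually recreate the directory structure locally, the pieces above can be glued together. This is only a sketch under assumptions: passing the fetched file item to locateAncestors and the local root C:\work are my guesses, not verified against the RTC 4 API:

// Hypothetical glue code: compute the repository-relative path of the
// found file, then recreate its parent directories locally via java.nio.
IFileItem file = getTextFileFile(changeSet, repository);
List ancestors = configuration.locateAncestors(
        java.util.Collections.singletonList(file), Application.getMonitor());
String relativePath = getFullPath(ancestors, repository); // e.g. "\Folder1\Folder2\test.txt"
java.nio.file.Path local = java.nio.file.Paths.get("C:\\work" + relativePath);
java.nio.file.Files.createDirectories(local.getParent());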
