Create multiple empty directories in Amazon S3 using Java

I am new to S3 and I am trying to create multiple directories in Amazon S3 using Java with only one call to S3.
This is all I could come up with:
// Upload a zero-byte object whose key ends with "/" so that the console shows it as a folder.
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(0);
InputStream emptyContent = new ByteArrayInputStream(new byte[0]);
PutObjectRequest putObjectRequest =
        new PutObjectRequest(bucket, "test/tryAgain/", emptyContent, metadata);
s3.putObject(putObjectRequest);
The problem is that to upload 10 folders this way (when a key ends with "/", the console shows the object as a folder) I have to make 10 separate calls to S3.
I want to create all the folders at once, the way a batch delete works with DeleteObjectsRequest.
Can anyone suggest how to solve this?

Can you be a bit more specific as to what you're trying to do (or avoid doing)?
If you're primarily concerned with the cost per PUT, I don't think there is a way to batch 'upload' a directory with each file being a separate key and avoid that cost. Each PUT (even in a batch process) will cost you the price per PUT.
If you're simply trying to find a way to efficiently and recursively upload a folder, check out the uploadDirectory() method of TransferManager.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html#uploadDirectory-java.lang.String-java.lang.String-java.io.File-boolean-
public MultipleFileUpload uploadDirectory(String bucketName,
                                          String virtualDirectoryKeyPrefix,
                                          File directory,
                                          boolean includeSubdirectories)
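If it helps, here is a minimal sketch of that call, reusing the low-level s3 client from the question (the bucket name and local path are placeholders, and it assumes the v1 TransferManager constructor that wraps an existing AmazonS3 client):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;

public class UploadDirectoryExample {
    public static void uploadFolder(AmazonS3 s3) throws InterruptedException {
        // Wrap the existing low-level client in a TransferManager.
        TransferManager tm = new TransferManager(s3);

        // Recursively upload everything under the local directory
        // beneath the "test/tryAgain" key prefix in the bucket.
        MultipleFileUpload upload = tm.uploadDirectory(
                "my-bucket",             // bucket name (placeholder)
                "test/tryAgain",         // virtualDirectoryKeyPrefix
                new File("/tmp/assets"), // local directory (placeholder)
                true);                   // includeSubdirectories

        upload.waitForCompletion();      // block until every file has been transferred
        tm.shutdownNow(false);           // shut down the TransferManager, keep the caller's s3 client open
    }
}

Note that this still issues one PUT per file under the hood; TransferManager saves you the bookkeeping, not the per-request cost.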

Related

Should I serialize data and save it on the host/server computer to resume/pause an upload to an AWS S3 bucket?

According to AWS Docs, we must serialize and de-serialize the pause result so that, if I resume the upload, I can pick up where I left off:
On a successful upload pause, PauseResult.getInfoToResume() returns an instance of PersistableUpload that can be used to resume the upload operation at a later time.
// Retrieve the persistable upload from the pause result.
PersistableUpload persistableUpload = pauseResult.getInfoToResume();
// Create a new file to store the information.
File f = new File("resume-upload");
if (!f.exists()) f.createNewFile();
FileOutputStream fos = new FileOutputStream(f);
// Serialize the persistable upload to the file.
persistableUpload.serialize(fos);
fos.close();
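For the resume side, the docs pair the snippet above with deserializing the file and handing it back to the TransferManager. A minimal sketch, assuming the v1 pause/resume API (PersistableTransfer.deserializeFrom and TransferManager.resumeUpload) and the same "resume-upload" file name:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import com.amazonaws.services.s3.transfer.PersistableTransfer;
import com.amazonaws.services.s3.transfer.PersistableUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;

public class ResumeUploadExample {
    public static Upload resume(TransferManager tm) throws IOException {
        // Read the serialized pause information back from the file written earlier.
        try (FileInputStream fis = new FileInputStream(new File("resume-upload"))) {
            PersistableUpload persistableUpload = PersistableTransfer.deserializeFrom(fis);
            // Hand the saved state back to the TransferManager so it continues where it stopped.
            return tm.resumeUpload(persistableUpload);
        }
    }
}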
Taking into account that my project is organized as follows (Standard Spring Boot Project):
Project Name
|__src
   |__main
      |__java
      |  |__com.blabla.projectname
      |
      |__resources
Would I have temporary files within the deployed project if I deployed my web application using the method described in the AWS Docs?
Project Name
|__src
|  |__main
|     |__java
|     |  |__com.blabla.projectname
|     |
|     |__resources
|__temporal_file.tmp
When uploading a file to an AWS S3 bucket, I ran into the same issue with PutObjectRequest(String bucketName, String key, File file): this constructor requires a File, and that file has to exist on disk.
I later discovered that I'm not bound to that constructor, since there is an overloaded one:
PutObjectRequest(String bucketName, String key, InputStream input, ObjectMetadata metadata)
Now, rather than referencing a file explicitly, I use MultipartFile.getInputStream(). This way, I can upload without the file ever being on disk.
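For reference, a sketch of that stream-based upload, assuming a Spring MultipartFile and the v1 SDK (the bucket and key names are whatever your application uses):

import java.io.IOException;
import java.io.InputStream;

import org.springframework.web.multipart.MultipartFile;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class StreamUploadExample {
    public static void upload(AmazonS3 s3, String bucket, String key, MultipartFile multipartFile)
            throws IOException {
        // Set the length up front so the SDK does not have to buffer the whole stream to learn it.
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(multipartFile.getSize());
        metadata.setContentType(multipartFile.getContentType());

        // Upload straight from the request's input stream; nothing is written to local disk.
        try (InputStream in = multipartFile.getInputStream()) {
            s3.putObject(new PutObjectRequest(bucket, key, in, metadata));
        }
    }
}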
Returning to my question: is there a way to pause and resume without serializing? I'm new to this and really couldn't find anything solid online, so I had to ask here.
I sincerely appreciate your assistance in advance.

Google Drive API - Check if file exists by file ID

I could not find a good way to check whether a file exists before downloading it.
It seems the API doesn't provide a way to check for existence when getting the file by ID.
Right now I check whether the resulting output stream's size is > 0 before processing the file, but I don't like that solution.
Drive driveService = new Drive.Builder(buildHttpTransport(), JSON_FACTORY, googleCredential).build();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try {
    driveService.files().get(this.fileId).executeMediaAndDownloadTo(outputStream);
    if (outputStream.size() > 0) {   // only process when something was actually downloaded
        processFile();
    }
} catch (IOException e) {
    // download failed (e.g. the file does not exist)
}
Ideas are welcome!
Well, if you just want to check whether the file exists or not, you can use Files: list to list the files in your Google Drive; you can get the ID of each file from the results.
If you already know the file ID, you can verify it by using Files: get.
GET https://www.googleapis.com/drive/v3/files/fileId
You will get a 200 response here if the file ID exists.
For the Java code that you want, check this related SO question if it can help you.
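A sketch of what that metadata-only check might look like with the Drive v3 Java client, assuming that a 404 from files().get(fileId) means the file does not exist:

import java.io.IOException;

import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.drive.Drive;

public class DriveFileExists {
    public static boolean fileExists(Drive driveService, String fileId) throws IOException {
        try {
            // files().get(...).execute() fetches only metadata, not the file content.
            driveService.files().get(fileId).execute();
            return true;                      // 200 OK: the file ID exists
        } catch (GoogleJsonResponseException e) {
            if (e.getStatusCode() == 404) {
                return false;                 // 404: no file with this ID
            }
            throw e;                          // other errors (permissions, quota, ...)
        }
    }
}

With this you can skip the download entirely when the file is missing, instead of inspecting the output stream's size.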

How to download files from Amazon S3?

I have a folder named output inside a bucket named BucketA. I have a list of files in the output folder. How do I download them to my local machine using the AWS Java SDK?
Below is my code:
AmazonS3Client s3Client = new AmazonS3Client(credentials);
File localFile = new File("/home/abc/Desktop/AmazonS3/");
s3Client.getObject(new GetObjectRequest("bucketA", "/bucketA/output/"), localFile);
And I got the error:
AmazonS3Exception: The specified key does not exist.
Keep in mind that S3 is not a filesystem; it is an object store. There is a huge difference between the two, one being that directory-style operations simply won't work.
Suppose you have an S3 bucket with two objects in it:
/path/to/file1.txt
/path/to/file2.txt
When working with these objects you can't simply refer to /path/to/ like you can when working with files in a filesystem directory. That's because /path/to/ is not a directory but just part of a key in a very large hash table. This is why the error message indicates an issue with a key. These are not filename paths but keys to objects within the object store.
In order to copy all the files in a location like /path/to/ you need to perform it in multiple steps. First, you need to get a listing of all the objects whose keys begin with /path/to, then you need to loop through each individual object and copy them one by one.
Here is a similar question with an answer that shows how to download multiple files from S3 using Java.
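A rough sketch of that two-step approach with the v1 SDK (the bucket, prefix, and local directory are placeholders, and the prefix is assumed to end with "/"):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class DownloadPrefixExample {
    public static void downloadAll(AmazonS3 s3Client, String bucket, String prefix, File localDir) {
        // Step 1: list every key that starts with the prefix (results are paginated).
        ObjectListing listing = s3Client.listObjects(
                new ListObjectsRequest().withBucketName(bucket).withPrefix(prefix));
        while (true) {
            // Step 2: download each object one by one.
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                String key = summary.getKey();
                if (key.endsWith("/")) {
                    continue; // skip zero-byte "folder" placeholder objects
                }
                File target = new File(localDir, key.substring(prefix.length()));
                target.getParentFile().mkdirs();
                s3Client.getObject(new GetObjectRequest(bucket, key), target);
            }
            if (!listing.isTruncated()) {
                break;
            }
            listing = s3Client.listNextBatchOfObjects(listing);
        }
    }
}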
I know this question was asked a long time ago, but this answer might still help someone.
To download objects from S3 you might want to use something like this:
new ListObjectsV2Request().withBucketName("bucketName").withDelimiter("delimiter").withPrefix("path/to/image/");
As mentioned in the S3 docs, the delimiter should be "/" and the prefix should be your "folder-like" structure.
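To complete the picture, a short sketch of how that request might be executed (assumes the same s3Client from the question; the bucket name and prefix are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListWithPrefixExample {
    public static void listFolder(AmazonS3 s3Client) {
        ListObjectsV2Request req = new ListObjectsV2Request()
                .withBucketName("bucketName")     // placeholder
                .withDelimiter("/")
                .withPrefix("path/to/image/");    // your "folder-like" structure
        ListObjectsV2Result result = s3Client.listObjectsV2(req);
        for (S3ObjectSummary summary : result.getObjectSummaries()) {
            System.out.println(summary.getKey()); // each object directly under the prefix
        }
    }
}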
You can use the predefined TransferManager classes to download or upload a whole directory.
For download:
MultipleFileDownload xfer = xfer_mgr.downloadDirectory(
        bucketName, key, new File("C:\\Users\\miracle\\Desktop\\Downloads"));
For upload:
MultipleFileUpload xfer = xfer_mgr.uploadDirectory(bucketName, key, dir, true);
The error message means that the bucket (in this case "bucketA") does not contain a file with the name you specified (in this case "/bucketA/output/").
When you specify the key, do not include the bucket name in the key. S3 supports "folders" in the key, which are delimited with "/", so you probably do not want to try to use keys that end with "/".
If your bucket "bucketA" contains a file called "output", you probably want to say
new GetObjectRequest("bucketA", "output")
If this doesn't work, other things to check:
Do the credentials you are using have permission to read from the bucket?
Did you spell all the names correctly?
You might want to use listObjects("bucketA") to verify what the bucket actually contains (as seen with the credentials you are using).

Google Appengine JAVA - Zipping up blobstore files results in error 202 when saving back to blobstore

I am working on an application in App Engine where we want to make the content available to offline users. This means we need to collect all the blobstore files that are used and save them off for the offline user. I am doing this on the server side so that it is only done once, not once per end user. I am running the process in the task queue because it can easily time out. Assume all of this code runs as a task.
Small collections work fine, but larger collections result in an App Engine error 202 and the task restarts again and again. Here is the sample code, which combines Writing Zip Files to GAE Blobstore with the advice for large zip files in Google Appengine JAVA - Zip lots of images saving in Blobstore (reopening the channel as needed). I also referenced AppEngine Error Code 202 - Task Queue for the error.
//Set up the zip file that will be saved to the blobstore
AppEngineFile assetFile = fileService.createNewBlobFile("application/zip", assetsZipName);
FileWriteChannel writeChannel = fileService.openWriteChannel(assetFile, true);
ZipOutputStream assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
HashSet<String> blobsEntries = getAllBlobEntries(); //gets blobs that I need
saveBlobAssetsToZip(blobsEntries);
writeChannel.closeFinally();
.....
private void saveBlobAssetsToZip(HashSet<String> blobsEntries) throws IOException {
for (String blobId : blobsEntries) {
/* gets the blobstore key that will result in the blobstore entry - ignore the bsmd as
   that is internal to our wrapper for blobstore. */
BlobKey blobKey = new BlobKey(bsmd.getBlobId());
//gets the blob file as a byte array
byte[] blobData = blobstoreService.fetchData(blobKey, 0, BlobstoreService.MAX_BLOB_FETCH_SIZE-1);
String extension = getExtensionFromMetadata(blobId); // file type from our metadata, e.g. ".jpg", ".png", ".pdf" (hypothetical helper)
assetsZip.putNextEntry(new ZipEntry(blobId + "." + extension));
assetsZip.write(blobData);
assetsZip.closeEntry();
assetsZip.flush();
/* I have found that if I don't close the channel and reopen it, I can get an IOException
   because the files in the blobstore are too large, so I write a file and then close and reopen. */
assetsZip.close();
writeChannel.close();
String assetsPath = assetFile.getFullPath();
assetFile = new AppEngineFile(assetsPath);
writeChannel = fileService.openWriteChannel(assetFile, true);
assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
}
}
What is the proper way to get this to run on App Engine? Again, small projects work fine and the zip saves, but larger projects with more blob files result in this error.
I bet that the instance is running out of memory. Are you using Appstats? It can consume a large amount of memory. If that doesn't fix it, you will probably need to increase the instance size.

Uploading files to S3 using AmazonS3Client.java api

I am using AmazonS3Client.java to upload files to S3 from my application, via the putObject method:
val putObjectRequest = new PutObjectRequest(bucketName, key, inputStream, metadata)
val acl = CannedAccessControlList.Private
putObjectRequest.setCannedAcl(acl)
s3.putObject(putObjectRequest)
This works for buckets at the top level of my S3 account. Now, suppose I want to upload the file to a sub-bucket, for example bucketB, which is inside bucketA. How should I specify the bucket name for bucketB?
Thank you
It is admittedly somewhat surprising, but there is no such thing as a "sub-bucket" in S3. All buckets are top-level. The structures inside buckets that you see in the S3 admin console or other UIs are called "folders", but even they don't really exist! You can't directly create or destroy folders, for instance, or set any attributes on them. Folders are purely a presentation-level convention for viewing the underlying flat set of objects in your bucket. That said, it's pretty easy to split your objects into (purely non-existent) folders. Just give them hierarchical names, with each level separated by a "/".
val putObjectRequest = new PutObjectRequest(bucketName, topFolderName + "/" + subFolderName + "/" + key, inputStream, metadata)
Try using putObjectRequest.setKey("folder")
