How to upload a file to S3 only if not uploaded already? - java

I'm using the AWS Transfer Manager to back up a lot of files to S3. Sometimes the backup fails in the middle, and I don't want to re-upload all the files, only the ones that haven't been uploaded yet.
Is there something baked into the Transfer Manager or the S3 PUT request that would let me do that automatically, or is my only option to check the MD5 of the file with a HEAD request first and see if it's different before starting the upload?
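For reference, the kind of check I'd like to avoid hand-rolling would look roughly like this (just a sketch with the v1 Java SDK; the bucket name, key and file path are placeholders, and BinaryUtils/Md5Utils are from com.amazonaws.util):
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
File file = new File("/backups/photo.jpg");      // local file to back up
String key = "backups/photo.jpg";                // destination key in the bucket

boolean needsUpload;
try {
    // HEAD request: fetches only metadata, not the object body
    String remoteETag = s3.getObjectMetadata("my-bucket", key).getETag();
    // Note: the ETag equals the hex MD5 only for single-part, non-KMS-encrypted uploads
    String localMd5 = BinaryUtils.toHex(Md5Utils.computeMD5Hash(file));
    needsUpload = !localMd5.equalsIgnoreCase(remoteETag);
} catch (AmazonS3Exception e) {
    if (e.getStatusCode() != 404) {
        throw e;
    }
    needsUpload = true;   // object doesn't exist yet
} catch (IOException e) {
    throw new UncheckedIOException(e);
}

if (needsUpload) {
    transferManager.upload("my-bucket", key, file);
}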
Thanks!

Rather than coding your own solution, you could use the AWS Command-Line Interface (CLI) to copy or sync the files to Amazon S3.
For example:
aws s3 sync <directory> s3://my-bucket/<directory>
The sync command only copies files that are new or changed compared to the destination, so you can just re-run it after a failed backup and it will pick up where it left off.

You can do that using continueWithBlock. For every upload you can define a retry strategy for the failed-upload case. For example:
[[transferManager upload:uploadRequest] continueWithBlock:^id(AWSTask *task) {
    if (task.error) {
        // Handle the failed upload here
    }
    if (task.result) {
        // File uploaded successfully
    }
    return nil;
}];
You could also create a list of tasks and then use taskForCompletionOfAllTasks: to wait for all of them:
NSMutableArray *tasks = [NSMutableArray new];
AWSTask *taskForUpload = [transferManager upload:uploadRequest];
[tasks addObject:taskForUpload];
// add more tasks as required
[[AWSTask taskForCompletionOfAllTasks:tasks] continueWithBlock:^id(AWSTask *task) {
    if (task.error != nil) {
        // Handle errors / failed uploads here
    } else {
        // Handle successful uploads here
    }
    return nil;
}];
This will perform all the tasks in the list and then give you a list of errors, so you can retry just the failed uploads.
Thanks,
Rohan

Related

Get progress of a file being downloaded from Google Drive using Google Drive API v3

I'm using this code to download a file from Google Drive:
Drive.Files.Get get = SERVICE.files().get(file.getId());
get.getMediaHttpDownloader().setProgressListener(new ProgressListener());
get.getMediaHttpDownloader().setDirectDownloadEnabled(false);
get.getMediaHttpDownloader().setChunkSize(1000000);
I want to download a file of about 10 MB and track the progress.
When I run my code it always shows "0.0", and after the download has finished it shows "1.0" in the console.
My listener:
public class ProgressListener implements MediaHttpDownloaderProgressListener {
    public void progressChanged(MediaHttpDownloader downloader) {
        switch (downloader.getDownloadState()) {
            case MEDIA_IN_PROGRESS:
                System.out.println(downloader.getProgress());
                break;
            case MEDIA_COMPLETE:
                System.out.println("Download is complete!");
        }
    }
}
I only get updates about every 10 seconds, and by then the download is already finished.
Please help, thanks!
When chunked download is used (direct download set to false), the API transfers the file in chunks of a size defined by the developer.
Each time a chunk is sent or received, the progress listener updates and, in your case, runs the block that matches its state.
This means that a chunk size of 1,000,000 makes every chunk about 1 MB, which is the probable cause of the slow refresh rate.
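For example, reducing the chunk size should make the listener fire much more often. A rough sketch (the output path is a placeholder, and SERVICE, file and ProgressListener match your existing code):
Drive.Files.Get get = SERVICE.files().get(file.getId());
MediaHttpDownloader downloader = get.getMediaHttpDownloader();
downloader.setDirectDownloadEnabled(false);      // chunked download, needed for per-chunk progress
downloader.setChunkSize(256 * 1024);             // 256 KB chunks -> roughly 40 callbacks for a 10 MB file
downloader.setProgressListener(new ProgressListener());

try (OutputStream out = new FileOutputStream("download.tmp")) {
    get.executeMediaAndDownloadTo(out);          // progressChanged() fires after each chunk
}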

Limiting the S3 PUT file size using pre-signed URLs

I am generating S3 pre-signed URLs so that the client (a mobile app) can PUT an image directly to S3 instead of going through a service. For my use case, the expiry time of the pre-signed URL needs to be configured for a longer window (10-20 mins). Therefore, I want to limit the size of the file upload to S3 so that a malicious attacker cannot upload large files to the S3 bucket. The client gets the URL from a service which has access to the S3 bucket. I am using the AWS Java SDK.
I found that this can be done using POST forms for browser uploads, but how can I do it with just a pre-signed S3 PUT URL?
I was using S3 signed URLs for the first time and was also concerned about this.
The whole signed-URL approach is a bit of a pain because you can't put a maximum object/upload size limit on them, which I think is something very important for file uploads in general and is simply missing.
Without that option you are forced to handle the problem with the expiry time etc., which gets really messy.
However, you can also upload to S3 buckets with plain POST requests, which support a content-length-range condition in their policy.
So I'll probably exchange my signed URLs for POST routes in the future; for proper, larger applications that seems to be the way to go.
What might help with your issue: in the JavaScript SDK there is a method, s3.headObject(), that fetches only the metadata of an S3 object (including its file size) without downloading the whole file.
Note that after the upload is done, it can take a little while for AWS to process the newly uploaded file before it is visible in your bucket.
What I did was set a timer after each upload to check the file size, and if it is bigger than 1 MB, delete the file. For production you probably want to log that somewhere in a DB.
My file names also include the user ID of whoever uploaded the file, so you can block an account after a too-large upload if you want to.
This worked for me in JavaScript:
function checkS3(key) {
  // headParams was not shown in the original; "my-bucket" is a placeholder bucket name.
  const headParams = { Bucket: "my-bucket", Key: key };
  return new Promise((resolve, reject) => {
    s3.headObject(headParams, (err, metadata) => {
      if (err && ["NotFound", "Forbidden"].indexOf(err.code) > -1) {
        // Object is not (yet) visible in the bucket
        return reject(err);
      } else if (err) {
        // Errors.SOMETHING_WRONG is an application-specific error constant
        const e = Object.assign({}, Errors.SOMETHING_WRONG, { err });
        return reject(e);
      }
      // metadata.ContentLength holds the object size in bytes
      return resolve(metadata);
    });
  });
}
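Since the question uses the AWS Java SDK, roughly the same after-the-fact check could look like this (a sketch with SDK v1; bucket name, key and size limit are placeholders):
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
long maxBytes = 1024 * 1024;   // allow at most 1 MB

// HEAD request: returns only metadata, the object body is not downloaded
ObjectMetadata meta = s3.getObjectMetadata("my-bucket", key);
if (meta.getContentLength() > maxBytes) {
    s3.deleteObject("my-bucket", key);   // drop the oversized object
    // log the offending upload / user here
}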

How to resume upload with AWS S3 Android

I am using an AWS S3 bucket for uploading a list of files with MultipleFileUpload; here is my request. While uploading, if the internet disconnects and then comes back, the upload does not continue. How can I make it resume automatically from the last position when the connection returns?
final ObjectMetadataProvider metadataProvider = new ObjectMetadataProvider() {
    public void provideObjectMetadata(File file, ObjectMetadata metadata) {
    }
};

final MultipleFileUpload multipleFileUpload = transferManager.uploadFileList(
        HttpUrls.IMAGE_BUCKET_NAME, "photos/mint_original/", myDir_temp, upload_file, metadataProvider);
The TransferManager component in the AWS Android SDK has been deprecated in favor of the TransferUtility component. The TransferUtility component allows you to pause and resume transfers. It also has support for network monitoring and will automatically pause and resume transfers when the network goes down and comes back up. Here is the link to the TransferUtility documentation - https://aws-amplify.github.io/docs/android/storage
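A minimal sketch of what that looks like with TransferUtility (bucket name, key and file are placeholders; depending on the SDK version you may also need to register TransferService in the manifest or initialise TransferNetworkLossHandler so transfers resume automatically on reconnect):
TransferUtility transferUtility = TransferUtility.builder()
        .context(getApplicationContext())
        .s3Client(s3Client)
        .build();

TransferObserver observer = transferUtility.upload(
        "my-bucket", "photos/mint_original/" + file.getName(), file);

observer.setTransferListener(new TransferListener() {
    @Override
    public void onStateChanged(int id, TransferState state) {
        // WAITING_FOR_NETWORK -> IN_PROGRESS transitions happen automatically
    }

    @Override
    public void onProgressChanged(int id, long bytesCurrent, long bytesTotal) {
        // update progress UI here
    }

    @Override
    public void onError(int id, Exception ex) {
        // handle permanent failures here
    }
});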

Resume S3 multipart upload: PartETag

I'm trying to implement multipart upload in Java, following this sample: https://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html
But my actual task is a bit more complicated: I need to support resuming in case the application was shut down during uploading. Also, I can't use TransferManager - I need to use the low-level API for a particular reason.
The code there is pretty straightforward, but the problem comes with the List<PartETag> partETags part. When finalizing a resumed upload, I need this collection, previously filled during the upload process. Obviously, if I'm trying to finalize the upload after an application restart, I don't have that collection anymore.
So the question is: how do I finalize a resumed upload? Is it possible to obtain the List<PartETag> partETags from the server using some API? What I have is only a MultipartUpload object.
Get the list of multipart uploads in progress; this gives you the uploadId and key name for each unfinished upload:
MultipartUploadListing multipartUploadListing =
    s3Client.listMultipartUploads(new ListMultipartUploadsRequest(bucketName));
Then get the list of parts for each uploadId and key:
PartsListing partsListing =
    s3Client.listParts(new ListPartsRequest(bucketName, key, uploadId));
Get the list of part summaries:
List<PartSummary> parts = partsListing.getParts();
Each PartSummary gives you getETag() and getPartNumber():
for (PartSummary part : parts) {
    part.getETag();
    part.getPartNumber();
}
These calls are all on the AmazonS3 client in the Amazon S3 SDK package.
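Putting it together, a sketch of rebuilding the part list and completing the resumed upload could look like this (bucketName, key and uploadId come from the listings above):
List<PartETag> partETags = new ArrayList<>();
for (PartSummary part : partsListing.getParts()) {
    partETags.add(new PartETag(part.getPartNumber(), part.getETag()));
}

// Upload any parts that are still missing with UploadPartRequest, then finalize:
s3Client.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucketName, key, uploadId, partETags));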

Is there a simple method to check if there are changes in a SFTP server?

My objective is to poll the SFTP server for changes. My first thought is to check if the number of files in the dir changed. Then maybe some additional checks for changes in the dir.
Currently I'm using the following:
try {
    FileSystemOptions opts = new FileSystemOptions();
    SftpFileSystemConfigBuilder.getInstance().setStrictHostKeyChecking(opts, "no");
    SftpFileSystemConfigBuilder.getInstance().setUserDirIsRoot(opts, true);
    SftpFileSystemConfigBuilder.getInstance().setTimeout(opts, 60000);

    FileSystemManager manager = VFS.getManager();
    FileObject remoteFile = manager.resolveFile(SFTP_URL, opts);
    FileObject[] fileObjects = remoteFile.getChildren();
    System.out.println(DateTime.now() + " --> total number of files: " + fileObjects.length);
    for (FileObject fileObject : fileObjects) {
        if (fileObject.getName().getBaseName().startsWith("zzzz")) {
            System.out.println("found one: " + fileObject.getName().getBaseName());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is using Apache Commons VFS2 2.2.0. It works "fine", but when the server has too many files it takes minutes just to get the count (currently over 2 minutes for a server with ~10k files). Is there any way to get the count, or detect other changes on the server, faster?
Unfortunately there's no simple way in the SFTP protocol to get the changes. If you can run a daemon on the server, or if the source of the new files can create or update a helper file, then a marker file carrying the last modification time in its name or contents can be an option - see the sketch below.
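A rough sketch of that idea with commons-vfs2, assuming the uploading side maintains a marker file named last-change.txt (the file name and SFTP_URL/opts from your code are assumptions):
// Poll a single marker file instead of listing ~10k children.
FileObject marker = manager.resolveFile(SFTP_URL + "/last-change.txt", opts);
if (marker.exists()) {
    long remoteStamp = marker.getContent().getLastModifiedTime();
    if (remoteStamp > lastSeenStamp) {          // lastSeenStamp kept from the previous poll
        lastSeenStamp = remoteStamp;
        // something changed since the last poll -> do the full listing / processing now
    }
}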
I know the SFTP protocol fairly well, having developed commercial SFTP clients and an SFTP server (CompleteFTP), and as far as I know there's no way within the protocol to get a count of files in a directory without listing it. Some servers, such as ours, provide ways of adding custom commands that you can invoke from the client, so it would be possible to add a custom command that returns the number of files in a directory. CompleteFTP also allows you to write custom file systems, so you could potentially write one that only shows files that have changed after a given timestamp when you do a listing, which might be another approach. Our server only runs on Windows though, so that might be a show-stopper for you.
