I wrote a piece of (java) software that downloads objects (archives) from an S3 bucket, extracts the data locally and does operations on it.
A few days back, I set the lifecycle policy of all the objects in the "folder" within S3 to be moved to glacier automatically 2 days after creation, so that I have the time to DL and extract the data before it's archived. However, when accessing the data programmatically, Amazon Web Services throws an error
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class
I suppose this is due to the fact that the objects' storage classes have been updated to Glacier.
So far I have used the following code to access my S3 data:
public static void downloadObjectFromBucket(String bucketName, String pathToObject, String objectName) throws IOException{
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, pathToObject));
InputStream reader = new BufferedInputStream(object.getObjectContent());
File file = new File(objectName);
OutputStream writer = new BufferedOutputStream(new FileOutputStream(file));
int read = -1;
while ( ( read = reader.read() ) != -1 ) {
writer.write(read);
}
writer.flush();
writer.close();
reader.close();
}
Do I have to update my code or change some settings in the AWS Console? It is unclear to me, since the objects are still in S3 and accessing every S3 object has been working wonderfully up until a few days ago where I adapted the lifecycle policies....
An Amazon S3 lifecycle policy can be used to archive objects from S3 into Amazon Glacier.
When archived (as indicated by a Storage Class of Glacier), the object still "appears" to be in S3 (it appears in listings, you can see its size and metadata) but the contents of the object is kept in Glacier. Therefore, the contents cannot be accessed.
To retrieve the contents of an object in S3 with a Storage Class of Glacier, you will need to RestoreObject to retrieve the contents into S3. This takes 3-5 hours. You also nominate a duration for how long the contents should remain in S3 (where is will be stored with a Storage Class of Reduced Redundancy). Once the object is restored, you can retrieve the contents of the object.
Related
I need to read a large (>15mb) file (say sample.csv) from an Amazon S3 bucket. I then need to process the data present in sample.csv and keep writing it to another directory in the S3 bucket. I intend to use an AWS Lambda function to run my java code.
As a first step I developed java code that runs on my local system. The java code reads the sample.csv file from the S3 bucket and I used the put method to write data back to the S3 bucket. But I find only the last line was processed and put back.
Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId","mySecretKey");
S3Client s3Client = S3Client.builder().region(clientRegion).credentialsProvider(StaticCredentialsProvider.create(awsCreds)).build();
ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));
String line = null;
while ((line = reader.readLine()) != null) {
s3Client.putObject(PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),RequestBody.fromString(line));
}
Example: sample.csv contains
1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.
My output should be
1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.
But only 3,nhoj,XX,c#,expert is written in the Testout.csv.
The putObject() method creates an Amazon S3 object.
It is not possible to append or modify an S3 object, so each time the while loop executes, it is creating a new Amazon S3 object.
Instead, I would recommend:
Download the source file from Amazon S3 to local disk (use GetObject() with a destinationFile to download to disk)
Process the file and output to a local file
Upload the output file to the Amazon S3 bucket (method)
This separates the AWS code from your processing code, which should be easier to maintain.
I am new to S3 and I am trying to create multiple directories in Amazon S3 using java by only making one call to S3.
I could only come up with this :-
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(0);
InputStream emptyContent = new ByteArrayInputStream(new byte[0]);
PutObjectRequest putObjectRequest = new PutObjectRequest(bucket,
"test/tryAgain/", emptyContent, metadata);
s3.putObject(putObjectRequest);
But the problem with this while uploading 10 folders (when the key ends with "/" in the console we can see the object as a folder ) is that I have to make 10 calls to S3.
But I want to do a create all the folders at once like we do a batch delete using DeleteObjectsRequest.
Can anyone please suggest me or help me how to solve my problem ?
Can you be a bit more specific as to what you're trying to do (or avoid doing)?
If you're primarily concerned with the cost per PUT, I don't think there is a way to batch 'upload' a directory with each file being a separate key and avoid that cost. Each PUT (even in a batch process) will cost you the price per PUT.
If you're simply trying to find a way to efficiently and recursively upload a folder, check out the uploadDirectory() method of TransferManager.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html#uploadDirectory-java.lang.String-java.lang.String-java.io.File-boolean-
public MultipleFileUpload uploadDirectory(String bucketName,
String virtualDirectoryKeyPrefix,
File directory,
boolean includeSubdirectories)
I have a Java Spring MVC web application. From client, through AngularJS, I am uploading a file and posting it to Controller as webservice.
In my Controller, I am gettinfg it as MultipartFile and I can copy it to local machine.
But I want to upload the file to Amazone S3 bucket. So I have to convert it to java.io.File. Right now what I am doing is, I am copying it to local machine and then uploading to S3 using jets3t.
Here is my way of converting in controller
MultipartHttpServletRequest mRequest=(MultipartHttpServletRequest)request;
Iterator<String> itr=mRequest.getFileNames();
while(itr.hasNext()){
MultipartFile mFile=mRequest.getFile(itr.next());
String fileName=mFile.getOriginalFilename();
fileLoc="/home/mydocs/my-uploads/"+date+"_"+fileName; //date is String form of current date.
Then I am using FIleCopyUtils of SpringFramework
File newFile = new File(fileLoc);
// if the directory does not exist, create it
if (!newFile.getParentFile().exists()) {
newFile.getParentFile().mkdirs();
}
FileCopyUtils.copy(mFile.getBytes(), newFile);
So it will create a new file in the local machine. That file I am uplaoding in S3
S3Object fileObject = new S3Object(newFile);
s3Service.putObject("myBucket", fileObject);
It creates file in my local system. I don't want to create.
Without creating a file in local system, how to convert a MultipartFIle to java.io.File?
MultipartFile, by default, is already saved on your server as a file when user uploaded it.
From that point - you can do anything you want with this file.
There is a method that moves that temp file to any destination you want.
http://docs.spring.io/spring/docs/3.0.x/api/org/springframework/web/multipart/MultipartFile.html#transferTo(java.io.File)
But MultipartFile is just API, you can implement any other MultipartResolver
http://docs.spring.io/spring/docs/3.0.x/api/org/springframework/web/multipart/MultipartResolver.html
This API accepts input stream and you can do anything you want with it. Default implementation (usually commons-multipart) saves it to temp dir as a file.
But other problem stays here - if S3 API accepts a file as a parameter - you cannot do anything with this - you need a real file. If you want to avoid creating files at all - create you own S3 API.
The question is already more than one year old, so I'm not sure if the jets35 link provided by the OP had the following snippet at that time.
If your data isn't a File or String you can use any input stream as a data source, but you must manually set the Content-Length.
// Create an object containing a greeting string as input stream data.
String greeting = "Hello World!";
S3Object helloWorldObject = new S3Object("HelloWorld2.txt");
ByteArrayInputStream greetingIS = new ByteArrayInputStream(greeting.getBytes());
helloWorldObject.setDataInputStream(greetingIS);
helloWorldObject.setContentLength(
greeting.getBytes(Constants.DEFAULT_ENCODING).length);
helloWorldObject.setContentType("text/plain");
s3Service.putObject(testBucket, helloWorldObject);
It turns out you don't have to create a local file first. As #Boris suggests you can feed the S3Object with the Data Input Stream, Content Type and Content Length you'll get from MultipartFile.getInputStream(), MultipartFile.getContentType() and MultipartFile.getSize() respectively.
Instead of copying it to your local machine, you can just do this and replace the file name with this:
File newFile = new File(multipartFile.getOriginalName());
This way, you don't have to have a local destination create your file
if you are try to use in httpentity check my answer here
https://stackoverflow.com/a/68022695/7532946
I am working on a application in appengine that we want to be able to make the content available for offline users. This means we need to get all the used blobstore files and save them off for the offline user. I am using the server side to do this so that it is only done once, and not for every end user. I am using the task queue to run this process as it can easily time out. Assume all this code is running as a task.
Small collections work fine, but larger collections result in a appengine error 202 and it restarts the task again and again. Here is the sample code that comes from combination of Writing Zip Files to GAE Blobstore and following the advice for large zip files at Google Appengine JAVA - Zip lots of images saving in Blobstore by reopening the channel as needed. Also referenced AppEngine Error Code 202 - Task Queue as the error.
//Set up the zip file that will be saved to the blobstore
AppEngineFile assetFile = fileService.createNewBlobFile("application/zip", assetsZipName);
FileWriteChannel writeChannel = fileService.openWriteChannel(assetFile, true);
ZipOutputStream assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
HashSet<String> blobsEntries = getAllBlobEntries(); //gets blobs that I need
saveBlobAssetsToZip(blobsEntries);
writeChannel.closeFinally();
.....
private void saveBlobAssetsToZip(blobsEntries) throws IOException {
for (String blobId : blobsEntries) {
/*gets the blobstote key that will result in the blobstore entry - ignore the bsmd as
that is internal to our wrapper for blobstore.*/
BlobKey blobKey = new BlobKey(bsmd.getBlobId());
//gets the blob file as a byte array
byte[] blobData = blobstoreService.fetchData(blobKey, 0, BlobstoreService.MAX_BLOB_FETCH_SIZE-1);
String extension = type of file saved from our metadata (ie .jpg, .png, .pfd)
assetsZip.putNextEntry(new ZipEntry(blobId + "." + extension));
assetsZip.write(blobData);
assetsZip.closeEntry();
assetsZip.flush();
/*I have found that if I don't close the channel and reopen it, I can get a IO exception
because the files in the blobstore are too large, thus the write a file and then close and reopen*/
assetsZip.close();
writeChannel.close();
String assetsPath = assetFile.getFullPath();
assetFile = new AppEngineFile(assetsPath);
writeChannel = fileService.openWriteChannel(assetFile, true);
assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
}
}
What is the proper way to get this to run on appengine? Again small projects work fine and zip saves, but larger projects with more blob files results in this error.
I bet that the instance is running out of memory. Are you using appstats? It can consume a large amount of memory. If that doesn't work you will probably need to increase the instance size.
I am using AmazonS3Client.java to upload files to S3 from my application. I am using the putObject method to upload the file
val putObjectRequest = new PutObjectRequest(bucketName, key, inputStream, metadata)
val acl = CannedAccessControlList.Private
putObjectRequest.setCannedAcl(acl)
s3.putObject(putObjectRequest)
This works for buckets at the topmost level in my S3 account. Now, suppose i want to upload the file to a sub-bucket for example bucketB which is inside bucketA . How should i specify the bucket name for bucketB ?
Thank You
It is admittedly somewhat surprising, but there is no such thing as a "sub-bucket" in S3. All buckets are top-level. The structures inside buckets that you see in the S3 admin console or other UIs are called "folders", but even they don't really exist! You can't directly create or destroy folders, for instance, or set any attributes on them. Folders are purely a presentation-level convention for viewing the underlying flat set of objects in your bucket. That said, it's pretty easy to split your objects into (purely non-existent) folders. Just give them heirarchical names, with each level separated by a "/".
val putObjectRequest = new PutObjectRequest(bucketName, topFolderName +"/" + subFolderName+ "/" +key, inputStream, metadata)
Trying using putObjectRequest.setKey("folder")