Zip entire directory on Amazon S3 using Java

If I have a directory with small files on S3, is there a way to easily zip up the entire directory and leave the resulting zip file on S3, using Java?

Amazon S3 does not offer a built-in operation that zips up objects in a bucket. However, you can do this yourself with the AWS SDK for Java v2. The high-level steps are:
Get all objects in the S3 bucket by calling s3.listObjects().
For each object, get its byte[] by calling s3.getObjectAsBytes().
Place each file name and its byte[] into a map:
Map<String, byte[]> mapReport = new HashMap<>();
Use Java logic to create a ZIP from the map.
Put the ZIP into the S3 bucket by calling s3.putObject(). (An end-to-end sketch follows the helper below.)
To create the ZIP, use Java logic such as:
// Pass a map and get back a byte[] that represents a ZIP of all the entries.
public byte[] listBytesToZip(Map<String, byte[]> mapReporte) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(baos);
    for (Map.Entry<String, byte[]> reporte : mapReporte.entrySet()) {
        ZipEntry entry = new ZipEntry(reporte.getKey());
        entry.setSize(reporte.getValue().length);
        zos.putNextEntry(entry);
        zos.write(reporte.getValue());
        zos.closeEntry(); // close each entry inside the loop
    }
    zos.close();
    return baos.toByteArray();
}
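For context, here is a minimal, hedged sketch of the surrounding S3 calls with the AWS SDK for Java v2. The bucket name, prefix, and output key are placeholders, an S3Client is assumed to be passed in, and listBytesToZip is assumed to live in the same class as the helper above; real code would also need pagination and error handling.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.ListObjectsRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsResponse;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class ZipBucketPrefix {

    // Zips every object under the given prefix and writes the archive back to the bucket.
    public void zipPrefix(S3Client s3, String bucket, String prefix) throws IOException {
        // 1. List the objects under the prefix (the "directory").
        ListObjectsResponse listing = s3.listObjects(
                ListObjectsRequest.builder().bucket(bucket).prefix(prefix).build());

        // 2. Download each object and collect name -> bytes in a map.
        Map<String, byte[]> files = new HashMap<>();
        for (S3Object obj : listing.contents()) {
            ResponseBytes<GetObjectResponse> bytes = s3.getObjectAsBytes(
                    GetObjectRequest.builder().bucket(bucket).key(obj.key()).build());
            files.put(obj.key(), bytes.asByteArray());
        }

        // 3. Turn the map into a single ZIP using the listBytesToZip helper above.
        byte[] zip = listBytesToZip(files);

        // 4. Upload the ZIP back to the bucket ("archive.zip" is a placeholder key).
        s3.putObject(
                PutObjectRequest.builder().bucket(bucket).key(prefix + "archive.zip").build(),
                RequestBody.fromBytes(zip));
    }

    // listBytesToZip(Map<String, byte[]>) is the helper defined above.
}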
I tested these steps from a sample web app that downloaded the resulting ZIP file using this logic, and the results were correct.

Related

How can I transform an uncompressed file into zipped bytes?

In Java, for a JUnit test, I am trying to mock a function that downloads a zip file from another external API's endpoint. To simulate the download, I need to zip a test file and turn it into bytes to use as the mock's return value. I do not need to write the zipped file back to the file system; I just need the raw bytes.
when(zipReturner.getZipBytes()).thenReturn(testFileAsZippedBytes("testFile.txt"))
private Optional<byte[]> testFileAsZippedBytes(String testFile) {
???
}
Sharing my answer, because all the other examples I found are much heavier: they need many more lines of code to loop over the bytes, or they pull in external libraries to do the same thing.
To avoid that, combine ByteArrayOutputStream (which provides toByteArray()), ZipOutputStream (to write the zipped bytes into the ByteArrayOutputStream), and FileInputStream (to read the test file from the file system):
private Optional<byte[]> testFileAsZippedBytes(String filePath, String fileName) throws IOException {
    try (
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ZipOutputStream zipOutputStream = new ZipOutputStream(byteArrayOutputStream);
        FileInputStream fileInputStream = new FileInputStream(filePath + fileName)
    ) {
        ZipEntry zipEntry = new ZipEntry(fileName);
        zipOutputStream.putNextEntry(zipEntry);
        zipOutputStream.write(fileInputStream.readAllBytes());
        zipOutputStream.finish();
        return Optional.of(byteArrayOutputStream.toByteArray());
    }
}
Use ZipEntry to add the file as an entry to the ZipOutputStream and write the bytes to the zip. Call zipOutputStream.finish() to make sure all contents have been written to the stream before they are consumed from the ByteArrayOutputStream; in my experience, without it byteArrayOutputStream.toByteArray() returns only partial data.
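For completeness, a hedged sketch of how the helper might be wired into the mock with Mockito; ZipReturner and getZipBytes() are taken from the question, and the resource path is a placeholder. The test method would need to declare throws IOException since the helper can throw it.
// Assumes a ZipReturner type with a getZipBytes() method, as in the question.
ZipReturner zipReturner = Mockito.mock(ZipReturner.class);
Mockito.when(zipReturner.getZipBytes())
       .thenReturn(testFileAsZippedBytes("src/test/resources/", "testFile.txt")); // placeholder path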

Read and write to a file in an Amazon S3 bucket

I need to read a large (> 15 MB) file (say sample.csv) from an Amazon S3 bucket. I then need to process the data in sample.csv and keep writing it to another directory in the S3 bucket. I intend to use an AWS Lambda function to run my Java code.
As a first step I developed Java code that runs on my local system. The Java code reads the sample.csv file from the S3 bucket, and I used the put method to write the data back to the S3 bucket. But I find that only the last line is processed and written back.
Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId", "mySecretKey");
S3Client s3Client = S3Client.builder()
        .region(clientRegion)
        .credentialsProvider(StaticCredentialsProvider.create(awsCreds))
        .build();
ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(
        GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));
String line = null;
while ((line = reader.readLine()) != null) {
    s3Client.putObject(
            PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
            RequestBody.fromString(line));
}
Example: sample.csv contains
1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.
My output should be
1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.
But only 3,nhoj,XX,c#,expert is written in the Testout.csv.
The putObject() method creates an Amazon S3 object.
It is not possible to append or modify an S3 object, so each time the while loop executes, it is creating a new Amazon S3 object.
Instead, I would recommend:
Download the source file from Amazon S3 to local disk (getObject() accepts a destination path, so it can download straight to disk)
Process the file and write the output to a local file
Upload the output file to the Amazon S3 bucket with putObject()
This separates the AWS code from your processing code, which should be easier to maintain.
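A minimal sketch of that flow with the AWS SDK for Java v2 might look like the following; the bucket, keys, local paths, and the transform() step are placeholders for illustration, not the poster's actual processing.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class CsvTransform {
    public static void main(String[] args) throws Exception {
        String bucket = "my-bucket";               // placeholder
        Path input = Paths.get("/tmp/sample.csv");
        Path output = Paths.get("/tmp/Testout.csv");

        try (S3Client s3 = S3Client.create()) {
            // 1. Download the source object to local disk
            //    (getObject with a destination Path fails if the file already exists).
            Files.deleteIfExists(input);
            s3.getObject(GetObjectRequest.builder()
                    .bucket(bucket).key("Input/sample.csv").build(), input);

            // 2. Process every line locally and write a single local output file.
            List<String> transformed = Files.readAllLines(input).stream()
                    .map(CsvTransform::transform)
                    .collect(Collectors.toList());
            Files.write(output, transformed);

            // 3. Upload the finished file with one putObject call.
            s3.putObject(PutObjectRequest.builder()
                    .bucket(bucket).key("Test/Testout.csv").build(), output);
        }
    }

    // Placeholder for the per-line processing (reverse the name, mask the age, etc.).
    private static String transform(String line) {
        return line;
    }
}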

Creating ZIP file in memory

I need to create a ZIP file which consists of files that are created on-the-fly and have no persistence on the file system.
For example: I want to create an SQLite database in memory and, after populating it with data, add it to a (not yet existing) ZIP file, and then I want to actually write this ZIP file to the file system.
I found several approaches where the files, which are going to be the content of the archive, have to be read from the file system.
Is there actually a way to achieve what I want to do? I hoped that compress-commons would help me, but apparently it doesn't.
Am I missing something?
If the in-memory object you are trying to zip is serializable, then this is quite easy.
You can take any serializable instance and turn it in to a byte[]. I have a utility method to do this:
public static byte[] convertToBytes(Object object) throws IOException {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
         ObjectOutput out = new ObjectOutputStream(bos)) {
        out.writeObject(object);
        out.flush();
        return bos.toByteArray();
    }
}
Once you have that object represented in bytes, you can use a GZIPOutputStream to compress it:
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     GZIPOutputStream out = new GZIPOutputStream(bos)) {
    out.write(bytes);
    out.finish();
    byte[] compressed = bos.toByteArray(); // this is my compressed data
}
(I use gzip here for simplicity, but you can also create a zip archive with multiple entries; a sketch follows.)
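If a real .zip container (rather than gzip) is what you need, here is a hedged sketch that combines the two snippets above: serialize the in-memory object, add it as a named entry, and write the archive to the file system. The class, entry, and file names are placeholders, and convertToBytes is assumed to be the helper above in the same class.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class InMemoryZipWriter {

    // Serialize the object (convertToBytes from above) and write it as one entry of a new ZIP file.
    public static void writeObjectAsZip(Object object, String zipPath) throws IOException {
        byte[] data = convertToBytes(object); // the serialization helper shown above
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipPath))) {
            zos.putNextEntry(new ZipEntry("data.bin")); // placeholder entry name
            zos.write(data);
            zos.closeEntry();
            // additional putNextEntry/write calls would add further in-memory "files"
        }
    }
}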

How to download multiple files from URL as one zip file

I want to download multiple zip files as one zip file for a request.
I have zip file paths like C, https://test12.zip, etc. How can I download these files as one zip file? I have been searching this for a while; all I found are examples that download multiple (local) files and zip them. This is what I tried for downloading one file; for multiple files it won't work.
URL url = new URL("https://test12.zip");
URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
BufferedOutputStream outs = new BufferedOutputStream(response.getOutputStream());
int len;
byte[] buf = new byte[1024];
while ((len = stream.read(buf)) > 0) {
    outs.write(buf, 0, len);
}
outs.close();
Any help would be much appreciated.
A ZIP file consists of two parts: first the compressed file entries (filename, attributes and data), and at the end of the file a central directory containing a list of all entries, again with filenames and attributes.
Hence, you cannot directly combine or concatenate zip files. In Java you can only decompress the downloaded zip files on the fly (without storing them in the file system) and at the same time use the decompressed content to create a new combined ZIP file:
First create a ZipOutputStream for the zip file you want to create.
Then wrap the InputStream of each download in a ZipInputStream.
Iterate through all the entries of every ZipInputStream, and for each entry create a new identical entry in the ZipOutputStream and copy the content from the ZipInputStream to the ZipOutputStream.
How to use ZipInputStream see for example: https://stackoverflow.com/a/36648504/150978
Note that this process requires decompressing and then re-compressing the file content. Depending on the archive size, this can fully load one CPU core.
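A minimal sketch of that re-zip loop might look like this (Java 9+ for InputStream.transferTo); the URL list is a placeholder, and in a servlet the target stream would typically be response.getOutputStream(). Duplicate entry names across the source archives would need renaming to avoid a ZipException.
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCombiner {

    // Downloads each source ZIP, decompresses it on the fly and re-compresses
    // every entry into one combined ZIP written to the given output stream.
    public static void combineRemoteZips(List<String> urls, OutputStream target) throws IOException {
        try (ZipOutputStream zipOut = new ZipOutputStream(target)) {
            for (String url : urls) {
                try (ZipInputStream zipIn = new ZipInputStream(new URL(url).openStream())) {
                    ZipEntry entry;
                    while ((entry = zipIn.getNextEntry()) != null) {
                        zipOut.putNextEntry(new ZipEntry(entry.getName())); // recreate the entry
                        zipIn.transferTo(zipOut);                           // copy decompressed bytes
                        zipOut.closeEntry();
                    }
                }
            }
        }
    }
}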

Google Appengine JAVA - Zipping up blobstore files results in error 202 when saving back to blobstore

I am working on an App Engine application whose content we want to make available to offline users. This means we need to get all the blobstore files it uses and save them off for the offline user. I am doing this on the server side so that it is only done once, and not for every end user. I am using the task queue to run this process, as it can easily time out. Assume all this code is running as a task.
Small collections work fine, but larger collections result in an App Engine error 202, and the task restarts again and again. Here is the sample code, which comes from a combination of Writing Zip Files to GAE Blobstore and the advice for large zip files in Google Appengine JAVA - Zip lots of images saving in Blobstore (reopening the channel as needed). I also referenced AppEngine Error Code 202 - Task Queue for the error.
// Set up the zip file that will be saved to the blobstore
AppEngineFile assetFile = fileService.createNewBlobFile("application/zip", assetsZipName);
FileWriteChannel writeChannel = fileService.openWriteChannel(assetFile, true);
ZipOutputStream assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
HashSet<String> blobsEntries = getAllBlobEntries(); // gets blobs that I need
saveBlobAssetsToZip(blobsEntries);
writeChannel.closeFinally();
.....
private void saveBlobAssetsToZip(HashSet<String> blobsEntries) throws IOException {
    for (String blobId : blobsEntries) {
        /* gets the blobstore key that will result in the blobstore entry - ignore the bsmd as
           that is internal to our wrapper for blobstore. */
        BlobKey blobKey = new BlobKey(bsmd.getBlobId());
        // gets the blob file as a byte array
        byte[] blobData = blobstoreService.fetchData(blobKey, 0, BlobstoreService.MAX_BLOB_FETCH_SIZE - 1);
        String extension = ...; // type of file saved from our metadata (ie .jpg, .png, .pdf)
        assetsZip.putNextEntry(new ZipEntry(blobId + "." + extension));
        assetsZip.write(blobData);
        assetsZip.closeEntry();
        assetsZip.flush();
        /* I have found that if I don't close the channel and reopen it, I can get an IO exception
           because the files in the blobstore are too large, thus write a file and then close and reopen */
        assetsZip.close();
        writeChannel.close();
        String assetsPath = assetFile.getFullPath();
        assetFile = new AppEngineFile(assetsPath);
        writeChannel = fileService.openWriteChannel(assetFile, true);
        assetsZip = new ZipOutputStream(new BufferedOutputStream(Channels.newOutputStream(writeChannel)));
    }
}
What is the proper way to get this to run on App Engine? Again, small projects work fine and the zip saves, but larger projects with more blob files result in this error.
I bet the instance is running out of memory. Are you using Appstats? It can consume a large amount of memory. If that doesn't help, you will probably need to increase the instance size.
