I have a set of .gz compressed files in an S3 bucket. I want to download the CSV file inside each .gz file. I tried extracting the .gz file and putting it into an S3Object. Now I need to extract the S3 object and download the CSV file inside it using Java. Please advise. This is the code I used. Currently I am able to download the gz file, but I need to download the CSV file inside the gz.
S3Object object = s3Client.getObject("bucket", "Location/file.gz");
final String encoding = null;
return ResponseEntity.ok(IOUtils.toString(object.getObjectContent(), encoding));
I need help unzipping the gz file in the S3Object and returning the decompressed contents in the response.
The code below will convert your gzip file into plain data. I'm not 100% sure about your actual issue, that is, whether you want to display the content in the browser itself or send it with a Save As option, so I made a minimal change to your code, assuming your only problem is converting the gzip format to CSV data. I hope you can modify/enhance it to suit your needs.
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// your method begins from here
final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
S3Object object = s3.getObject("your-bucket", "file-path");
// GZIPInputStream decompresses the gzip stream on the fly
return ResponseEntity.ok(IOUtils.toString(new GZIPInputStream(object.getObjectContent()), StandardCharsets.UTF_8));
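If instead you want the browser to save the decompressed content as a download rather than display it, here is a minimal sketch along the same lines (the file name data.csv is a hypothetical placeholder, and IOUtils.toByteArray here is Commons IO):

S3Object object = s3.getObject("your-bucket", "file-path");
// decompress the whole object into memory; fine for reasonably sized files
byte[] csvBytes = IOUtils.toByteArray(new GZIPInputStream(object.getObjectContent()));
return ResponseEntity.ok()
        .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"data.csv\"")
        .contentType(MediaType.parseMediaType("text/csv"))
        .body(csvBytes);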
I am working on a Spring Boot application that has to return a zip file to the frontend when the user downloads a report. I want to create the zip file without writing it or the original files to disk.
The directory I want to zip contains other directories that contain the actual files. For example, dir1 has subDir1 and subDir2 inside; subDir1 will have two files, subDir1File1.pdf and subDir1File2.pdf. subDir2 will also have files inside.
I can do this easily by creating the physical files on disk. However, I feel it would be more elegant to return these files without writing to disk.
You would use ByteArrayOutputStream if the goal is to write to memory. In essence, the zip file would be entirely contained in memory, so make sure you don't risk having too many requests at once and that the file size is reasonable! Otherwise this approach can seriously backfire!
You can use following snippet :
public static byte[] zip(final String str) throws IOException {
    if (StringUtils.isEmpty(str)) {
        throw new IllegalArgumentException("Cannot zip null or empty string");
    }
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
        gos.write(str.getBytes(StandardCharsets.UTF_8));
    }
    return bos.toByteArray();
}
But as stated in another answer, make sure you are not putting your program at risk by loading everything into Java heap memory.
Please note that you should stream whenever possible. In your case, you could write your data directly to a ZipOutputStream wrapped around the response's output stream (https://docs.oracle.com/javase/8/docs/api/index.html?java/util/zip/ZipOutputStream.html).
The only downside of this approach is that the client won't be able to show a download progress bar, because the server cannot send the Content-Length header. The size of a ZIP file can only be known after it has been generated, but the server needs to send the headers first. So: no temporary zip file, no file size beforehand.
You are also talking about subdirectories. This is just a naming issue when dealing with a ZIP stream: each zip entry needs to be named like "directory/directory2/file.txt", and this will produce subdirectories when unzipping.
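Putting both points together, here is a minimal in-memory sketch, assuming the report files are already available as byte arrays keyed by their zip path (the zip method signature and Map layout are illustrative):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// keys are entry names such as "subDir1/subDir1File1.pdf", values are the file bytes
public static byte[] zip(Map<String, byte[]> entries) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bos)) {
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            zos.putNextEntry(new ZipEntry(e.getKey()));
            zos.write(e.getValue());
            zos.closeEntry();
        }
    }
    return bos.toByteArray();
}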
I have a bunch of tar.gz files which I would like to process with Spark without decompressing them.
A single archive is about ~700 MB and contains 10 different files, but I'm interested in only one of them (which is ~7 GB after decompression).
I know that context.textFile supports tar.gz, but I'm not sure it is the right tool when an archive contains more than one file. What happens is that Spark returns the content of all files (line by line) in the archive, including file names with some binary data.
Is there any way to select which file from the tar.gz I would like to map?
AFAIK, I'd suggest the sc.binaryFiles method; please see the doc below. Both the file name and the file content are present, so you can map over the pairs, pick the file you want, and process it.
public RDD<scala.Tuple2<String, PortableDataStream>> binaryFiles(String path, int minPartitions)
Get an RDD for a Hadoop-readable dataset as PortableDataStream for each file (useful for binary data)
For example, if you have the following files:
hdfs://a-hdfs-path/part-00000
hdfs://a-hdfs-path/part-00001
...
hdfs://a-hdfs-path/part-nnnnn
Do val rdd = sparkContext.binaryFiles("hdfs://a-hdfs-path"),
then rdd contains
(a-hdfs-path/part-00000, its content)
(a-hdfs-path/part-00001, its content)
...
(a-hdfs-path/part-nnnnn, its content)
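For a tar.gz specifically, the archive still has to be untarred inside the map function. Here is a minimal sketch of that approach, assuming Apache Commons Compress and Commons IO are on the classpath (the method name pickEntry and parameter wantedName are illustrative):

import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.input.PortableDataStream;

// returns the raw bytes of the first entry whose name ends with wantedName, one result per archive
static JavaRDD<byte[]> pickEntry(JavaSparkContext sc, String path, String wantedName) {
    JavaPairRDD<String, PortableDataStream> archives = sc.binaryFiles(path);
    return archives.map(pair -> {
        try (TarArchiveInputStream tar =
                 new TarArchiveInputStream(new GZIPInputStream(pair._2().open()))) {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null) {
                if (entry.getName().endsWith(wantedName)) {
                    // note: this materializes the whole entry in memory,
                    // which may be too much for very large files
                    return IOUtils.toByteArray(tar);
                }
            }
        }
        return new byte[0]; // entry not found in this archive
    });
}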
I have a Java Spring MVC web application. From the client, through AngularJS, I am uploading a file and posting it to the Controller as a web service.
In my Controller, I am getting it as a MultipartFile and I can copy it to the local machine.
But I want to upload the file to an Amazon S3 bucket, so I have to convert it to java.io.File. Right now what I am doing is copying it to the local machine and then uploading it to S3 using JetS3t.
Here is my way of converting it in the controller:
MultipartHttpServletRequest mRequest = (MultipartHttpServletRequest) request;
Iterator<String> itr = mRequest.getFileNames();
while (itr.hasNext()) {
    MultipartFile mFile = mRequest.getFile(itr.next());
    String fileName = mFile.getOriginalFilename();
    fileLoc = "/home/mydocs/my-uploads/" + date + "_" + fileName; // date is the String form of the current date
Then I am using FileCopyUtils of the Spring Framework:
File newFile = new File(fileLoc);
// if the directory does not exist, create it
if (!newFile.getParentFile().exists()) {
    newFile.getParentFile().mkdirs();
}
FileCopyUtils.copy(mFile.getBytes(), newFile);
So it will create a new file on the local machine. That file I am uploading to S3:
S3Object fileObject = new S3Object(newFile);
s3Service.putObject("myBucket", fileObject);
It creates a file on my local system, which I don't want.
Without creating a file on the local system, how can I convert a MultipartFile to java.io.File?
MultipartFile, by default, is already saved on your server as a file when the user uploads it.
From that point on, you can do anything you want with this file.
There is a method that moves that temp file to any destination you want:
http://docs.spring.io/spring/docs/3.0.x/api/org/springframework/web/multipart/MultipartFile.html#transferTo(java.io.File)
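For example (a hypothetical snippet using the variables from the question):

// moves Spring's temporary upload file to the target location, no manual copy needed
File target = new File("/home/mydocs/my-uploads/" + date + "_" + mFile.getOriginalFilename());
mFile.transferTo(target);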
But MultipartFile is just an API; you can implement any other MultipartResolver:
http://docs.spring.io/spring/docs/3.0.x/api/org/springframework/web/multipart/MultipartResolver.html
This API accepts an input stream, and you can do anything you want with it. The default implementation (usually commons-multipart) saves it to the temp dir as a file.
But another problem remains: if the S3 API accepts a file as a parameter, you cannot do anything about it; you need a real file. If you want to avoid creating files at all, create your own S3 API.
The question is already more than one year old, so I'm not sure if the JetS3t link provided by the OP had the following snippet at that time.
If your data isn't a File or String you can use any input stream as a data source, but you must manually set the Content-Length.
// Create an object containing a greeting string as input stream data.
String greeting = "Hello World!";
S3Object helloWorldObject = new S3Object("HelloWorld2.txt");
ByteArrayInputStream greetingIS = new ByteArrayInputStream(greeting.getBytes());
helloWorldObject.setDataInputStream(greetingIS);
helloWorldObject.setContentLength(greeting.getBytes(Constants.DEFAULT_ENCODING).length);
helloWorldObject.setContentType("text/plain");
s3Service.putObject(testBucket, helloWorldObject);
It turns out you don't have to create a local file first. As @Boris suggests, you can feed the S3Object the data input stream, content type, and content length you get from MultipartFile.getInputStream(), MultipartFile.getContentType(), and MultipartFile.getSize() respectively.
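A minimal sketch of that, assuming a configured JetS3t S3Service and using a placeholder bucket name:

S3Object fileObject = new S3Object(multipartFile.getOriginalFilename());
fileObject.setDataInputStream(multipartFile.getInputStream());
fileObject.setContentLength(multipartFile.getSize());
fileObject.setContentType(multipartFile.getContentType());
s3Service.putObject("myBucket", fileObject);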
Instead of copying it to your local machine, you can just replace the file name like this:
File newFile = new File(multipartFile.getOriginalFilename());
This way, you don't have to create your file at a local destination.
If you are trying to use it in an HttpEntity, check my answer here:
https://stackoverflow.com/a/68022695/7532946
How can I create new File (from java.io) in memory, not on the hard disk?
I am using the Java language. I don't want to save the file on the hard drive.
I'm faced with a bad API (java.util.jar.JarFile). It's expecting a File file or String filename. I have no file (only byte[] content) and can create a temporary file, but that's not a beautiful solution. I need to validate the digest of a signed jar.
byte[] content = getContent();
File tempFile = File.createTempFile("tmp", ".tmp");
try (FileOutputStream fos = new FileOutputStream(tempFile)) {
    fos.write(content);
}
JarFile jarFile = new JarFile(tempFile);
Manifest manifest = jarFile.getManifest();
Any examples of how to achieve getting manifest without creating a temporary file would be appreciated.
How can I create new File (from java.io) in memory, not on the hard disk?
Maybe you are confusing File and Stream:
A File is an abstract representation of file and directory pathnames. Using a File object, you can access the file metadata in a file system, and perform some operations on files on this filesystem, like delete or create the file. But the File class does not provide methods to read and write the file contents.
To read and write from a file, you are using a Stream object, like FileInputStream or FileOutputStream. These streams can be created from a File object and then be used to read from and write to the file.
You can create a stream based on a byte buffer which resides in memory, by using a ByteArrayInputStream and a ByteArrayOutputStream to read from and write to a byte buffer in a similar way you read and write from a file. The byte array contains the "File's" content. You do not need a File object then.
The File... and ByteArray... input streams both extend java.io.InputStream, and the output streams both extend java.io.OutputStream, so you can use the common superclass to hide whether you are reading from a file or from a byte array.
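For illustration, a minimal sketch (the readAll helper and the file path are hypothetical):

// reads any InputStream to a String; the caller doesn't care whether the
// bytes come from a file on disk or from a byte array in memory
static String readAll(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
        buffer.write(b);
    }
    return buffer.toString("UTF-8");
}

// usage: the same method works for both sources
String fromMemory = readAll(new ByteArrayInputStream("in memory".getBytes("UTF-8")));
String fromDisk = readAll(new FileInputStream("/tmp/on-disk.txt")); // hypothetical path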
It is not possible to create a java.io.File that holds its content in (Java heap) memory *.
Instead, normally you would use a stream. To write to a stream, in memory, use:
OutputStream out = new ByteArrayOutputStream();
out.write(...);
But unfortunately, a stream can't be used as input for java.util.jar.JarFile, which as you mention can only use a File or a String containing the path to a valid JAR file. I believe using a temporary file like you currently do is the only option, unless you want to use a different API.
If you are okay using a different API, there is conveniently a class in the same package, named JarInputStream you can use. Simply wrap your archiveContent array in a ByteArrayInputStream, to read the contents of the JAR and extract the manifest:
try (JarInputStream stream = new JarInputStream(new ByteArrayInputStream(archiveContent))) {
    Manifest manifest = stream.getManifest();
}
*) It's obviously possible to create a full file-system that resides in memory, like a RAM-disk, but that would still be "on disk" (and not in Java heap memory) as far as the Java process is concerned.
You could use an in-memory filesystem, such as Jimfs
Here's a usage example from their readme:
FileSystem fs = Jimfs.newFileSystem(Configuration.unix());
Path foo = fs.getPath("/foo");
Files.createDirectory(foo);
Path hello = foo.resolve("hello.txt"); // /foo/hello.txt
Files.write(hello, ImmutableList.of("hello world"), StandardCharsets.UTF_8);
I think a temporary file can be another solution for that:
File tempFile = File.createTempFile(prefix, suffix, null);
try (FileOutputStream fos = new FileOutputStream(tempFile)) {
    fos.write(byteArray);
}
There is an answer about that here.
I have a question about password protecting an Excel file.
The situation is that I have a zip file with an Excel file in it. I need to write a Java program to password-protect the Excel file. The zip file itself need not be password protected, so the user should be able to unzip it.
And when he tries to open the Excel file (which is inside the unzipped folder), it must ask for a password. The question is similar to Protect excel file with java, with the added complexity that the Excel file is zipped.
I have code that password-protects only the zip file, but this is not what I want.
import java.io.File;
import java.util.ArrayList;
import net.lingala.zip4j.core.ZipFile;
import net.lingala.zip4j.exception.ZipException;
import net.lingala.zip4j.model.ZipParameters;
import net.lingala.zip4j.util.Zip4jConstants;
/**
 * Demonstrates adding files to a zip file with standard Zip Encryption
 */
public class AddFilesWithStandardZipEncryption
{
    public AddFilesWithStandardZipEncryption()
    {
        try {
            // Initiate ZipFile object with the path/name of the zip file.
            ZipFile zipFile = new ZipFile("C:\\homepage\\workspace\\PasswordProtectedFiles\\new.zip");

            // Build the list of files to be added in the array list
            // Objects of type File have to be added to the ArrayList
            ArrayList<File> filesToAdd = new ArrayList<>();
            filesToAdd.add(new File("C:\\homepage\\workspace\\PasswordProtectedFiles\\new.xlsx"));

            // Initiate Zip Parameters which define various properties such
            // as compression method, etc.
            ZipParameters parameters = new ZipParameters();

            // Set the compression method to deflate compression
            parameters.setCompressionMethod(Zip4jConstants.COMP_DEFLATE);

            // Set the compression level
            parameters.setCompressionLevel(Zip4jConstants.DEFLATE_LEVEL_NORMAL);

            // Set the encryption flag to true
            // If this is set to false, then the rest of the encryption properties are ignored
            parameters.setEncryptFiles(true);

            // Set the encryption method to Standard Zip Encryption
            parameters.setEncryptionMethod(Zip4jConstants.ENC_METHOD_STANDARD);

            // Set password
            parameters.setPassword("test123!");

            // Now add files to the zip file
            // Note: To add a single file, the method addFile can be used
            // Note: If the zip file already exists and if this zip file is a split file
            // then this method throws an exception as the Zip Format Specification does not
            // allow updating split zip files
            zipFile.addFiles(filesToAdd, parameters);
        }
        catch (ZipException e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        new AddFilesWithStandardZipEncryption();
    }
}
Without uncompressing it, it's impossible to password-protect an Excel file which is inside a zip file.
Here is what you can do:
1. Unzip the content using tips in What is the best way to extract a zip file using java and Compressing and Decompressing Data Using Java APIs
2. Password-protect the extracted Excel file using tips in Password Protected Excel File
3. Zip the password-protected Excel file using tips in Java Compress Large File
Use java.util.zip or zip4j to decompress the file to some temp directory, or to memory if you know it's small.
Then use HSSFWorkbook.writeProtectWorkbook from the Apache POI library.
Compress the Excel workbook again.
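A minimal sketch of the POI step, assuming the workbook is an .xls extracted to a temp directory (paths and password are placeholders); note that writeProtectWorkbook applies write protection rather than an open password:

// open the extracted workbook, apply the protection, and write it back out
HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream("/tmp/extracted/new.xls"));
workbook.writeProtectWorkbook("test123!", "owner");
try (FileOutputStream out = new FileOutputStream("/tmp/protected/new.xls")) {
    workbook.write(out);
}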
I think you should check out TrueZIP (TrueZIP website). It provides read/write access to ZIP, JAR, EAR, WAR, etc., and supports appending to existing ZIP files.
I suggest you create your zip file without the Excel file in it, create your password-protected Excel file as directed in the link you provided, and then use TrueZIP to write the Excel file to the archive. Hope this helps.