create zip file without writing to disk


I am working on a Springboot application that has to return a zip file to a frontend when the user downloads some report. I want to create a zip file without writing the zip file or the original files to disk.
The directory I want to zip contains other directories, which contain the actual files. For example, dir1 has subDir1 and subDir2 inside; subDir1 has two files, subDir1File1.pdf and subDir1File2.pdf, and subDir2 also has files inside.
I can do this easily by creating the physical files on the disk. However, I feel it will be more elegant to return these files without writing to disk.

If the goal is to write to memory, you would use a ByteArrayOutputStream. In essence, the ZIP file would be held entirely in memory, so make sure you won't get too many concurrent requests and that the file size stays reasonable! Otherwise this approach can seriously backfire!

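You can use the following snippet. Note that it gzips a single string: GZIPOutputStream produces a .gz stream, not a .zip archive with entries, so for a real ZIP you would use ZipOutputStream with the same ByteArrayOutputStream technique.
public static byte[] zip(final String str) throws IOException {
if (str == null || str.isEmpty()) {
throw new IllegalArgumentException("Cannot zip null or empty string");
}
ByteArrayOutputStream bos = new ByteArrayOutputStream();
// closing the GZIPOutputStream flushes the compressed data and the GZIP trailer into bos
try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
gos.write(str.getBytes(StandardCharsets.UTF_8));
}
return bos.toByteArray();
}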
But, as stated in another answer, make sure that loading everything into your Java memory does not put your application at risk.
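For the question as asked (a real .zip with subdirectories), the same ByteArrayOutputStream idea works with ZipOutputStream. A minimal sketch, assuming the report contents are already available as byte arrays keyed by their path inside the archive; the `InMemoryZip` class and its map-based signature are illustrative, not from the original answer:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class InMemoryZip {
    /**
     * Builds a ZIP archive entirely in memory.
     * Keys are entry names such as "dir1/subDir1/subDir1File1.pdf" (the slashes create
     * the subdirectories on extraction); values are the file contents.
     */
    public static byte[] zip(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory into bos
        return bos.toByteArray();
    }
}
```

Because the size of the resulting byte[] is known, it can be sent with a Content-Length header; the trade-off is that the whole archive sits in memory, as warned above.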

Please note that you should stream whenever possible. In your case, you could write your data directly to a ZipOutputStream (https://docs.oracle.com/javase/8/docs/api/index.html?java/util/zip/ZipOutputStream.html).
The only downside of this approach is that the client won't be able to show a download progress bar, because the server cannot send the "Content-Length" header. The size of a ZIP file is only known after it has been generated, but the server has to send the headers first. So: no temporary ZIP file, no file size beforehand.
You also mention subdirectories. With a ZIP stream this is just a naming issue: each ZIP entry is named with its full path, like "directory/directory2/file.txt", which produces the subdirectories when unzipping.
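A rough sketch of the streaming approach, writing straight to whatever OutputStream the framework provides so the archive is never buffered as a whole. The `ZipStreamer` name and the `Supplier<InputStream>` map are assumptions for illustration; the entry names carry the directory structure as described above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.function.Supplier;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipStreamer {
    /**
     * Streams a ZIP archive to out, one entry at a time.
     * Keys are entry names such as "dir1/subDir2/report.pdf"; each value opens that entry's content.
     * The caller owns out (e.g. an HTTP response stream) and is responsible for closing it.
     */
    public static void streamZip(Map<String, Supplier<InputStream>> entries, OutputStream out)
            throws IOException {
        ZipOutputStream zos = new ZipOutputStream(out);
        byte[] buf = new byte[8192];
        for (Map.Entry<String, Supplier<InputStream>> e : entries.entrySet()) {
            zos.putNextEntry(new ZipEntry(e.getKey()));
            try (InputStream in = e.getValue().get()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    zos.write(buf, 0, n);
                }
            }
            zos.closeEntry();
        }
        zos.finish(); // writes the central directory but leaves out open
    }
}
```

In Spring MVC, one option is to return a StreamingResponseBody and call streamZip from its writeTo(OutputStream) method; finish() is used instead of close() so the framework's stream is left for the framework to close.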

Related

Can I store a file in an ArrayList in Java using getResource?

New to Java. I am building a Java HTTP server (no special libraries allowed). There are certain files I need to serve (templates is what I call them) and I was serving them up using this piece of code:
this.getClass().getResourceAsStream("/http/templates/404.html")
And including them in my .jar. This was working. (I realize I was reading them as an input stream.)
Now I want to store all of my files (as File type) for templates, regular files, and redirects in a hashmap that looks like this: url -> file. Then I have a Response class that serves up the files.
This works for everything except my templates. If I try to insert the getResource code in the hashmap, I get an error in my Response class.
This is my code that I am using to build my hashmap:
new File(this.getClass().getResource("/http/templates/404.html").getFile())
This is the error I'm getting:
Exception in thread "main" java.io.FileNotFoundException: file:/Users/Kelly/Desktop/Java_HTTP_Server/build/jar/server.jar!/http/templates/404.html (No such file or directory)
I ran this command and can see the templates in my jar:
jar tf server.jar
Where is my thinking going wrong? I think I'm missing a piece to the puzzle.
UPDATE: Here's a slice of what I get when I run the last command above...so I think I have the path to the file correctly?
http/server/serverSocket/SystemServerSocket.class
http/server/serverSocket/WebServerSocket.class
http/server/ServerTest.class
http/templates/
http/templates/404.html
http/templates/file_directory.html
http/templates/form.html

The FileNotFoundException error you are getting is not from this line:
new File(this.getClass().getResource("/http/templates/404.html").getFile())
It appears that after you are storing these File objects in hash map, you are trying to read the file (or serve the file by reading using FileInputStream or related APIs). It would have been more useful if you had given the stack trace and the code which is actually throwing this exception.
But the point is that files inside JAR files are not the same as files on disk. In particular, a File object represents an abstract path name on disk, and all standard libraries that use File objects assume the path is accessible on the file system. So /a/path/like/this is a valid abstract path name, but file:/Users/Kelly/Desktop/Java_HTTP_Server/build/jar/server.jar!/http/templates/404.html is not. This is exactly what you get when you call getResource("/http/templates/404.html").getFile(). It just returns a string representing something that doesn't exist as a file on disk.
There are two ways you can serve resources from class path directly:
Directly return the stream as a response to the request. this.getClass().getResourceAsStream() returns an InputStream, which you can then copy to the response. This requires storing something other than a File in your hash map; since an InputStream can only be read once, store the resource path (or the URL returned by getResource()) and open a fresh stream per request. You can have two hash maps, one for files from the class path and one for files on disk.
Extract all the templates (possibly on first access) to a temporary location say /tmp and then store the File object representing the newly extracted file.
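A minimal sketch of the first option, assuming the hash map stores the URL returned by getResource() instead of a File. URL.openStream() works for both plain file: URLs and jar:file:...!/ URLs, and a fresh stream is opened per request. The `ResourceRoutes` class and its method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class ResourceRoutes {
    // Maps a request path (e.g. "/404") to the URL of the content to serve.
    private final Map<String, URL> routes = new HashMap<>();

    /** Registers a classpath resource; works whether it lives in a directory or inside a JAR. */
    public void addClasspathRoute(String requestPath, String resourceName) {
        URL url = ResourceRoutes.class.getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("No such resource: " + resourceName);
        }
        routes.put(requestPath, url);
    }

    /** Registers any URL, e.g. a file on disk via file.toURI().toURL(). */
    public void addRoute(String requestPath, URL url) {
        routes.put(requestPath, url);
    }

    /** Copies the content for requestPath to out; returns false if nothing is registered. */
    public boolean serve(String requestPath, OutputStream out) throws IOException {
        URL url = routes.get(requestPath);
        if (url == null) {
            return false;
        }
        // Unlike new File(url.getFile()), openStream() can read entries inside a JAR.
        try (InputStream in = url.openStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return true;
    }
}
```

For example, addClasspathRoute("/404", "/http/templates/404.html") registers the template from the question, and a single map can then hold both classpath resources and files on disk.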

How to move/rename uploaded file?

I followed this tutorial for uploading a file in my JSF2 application.
The application works fine but I am unhappy with one aspect.
While rebuilding the request, the File sent via request is saved somewhere on the disk.
Even though the file is saved I need to rename the file with a name which is available after entering the Managed Bean containing the action method.
Therefore I decided to create a new file with the desired name, copy the already saved file into it, and then delete the unneeded one.
private File uploadFile;
//...
try {
BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
BufferedReader br = new BufferedReader(new FileReader(uploadFile));
String line = "";
while ((line = br.readLine()) != null){
bw.write(line);
}
} catch (Exception e){}
The new file appears in the desired location but this error is thrown when I'm trying to open the file: "Invalid or unsupported PNG file"
These are my questions:
Is there a better way to solve this problem?
Is this solution the best way to upload a picture? Is there a reason to save the file before the business logic runs, when there may be a need to resize the picture or the desired name is not yet available?
EDIT:
I know about this tutorial as well, but I'm trying to do this with Mojarra only.

java.io.File already has a built-in rename method; I'd be surprised if it didn't work for your situation.
public boolean renameTo(File dest)
Renames the file denoted by this abstract pathname.
Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists. The return value should always be checked to make sure that the rename operation was successful.
You can also check if a file exists before saving it, and you can use the ImageIO class to do validations on the uploaded file before performing the initial save.
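Since renameTo reports failure only through its boolean result, here is a minimal sketch that checks it and, as an alternative the answer does not mention, falls back to java.nio.file.Files.move, which throws an IOException describing the problem and can move across file systems. The `UploadRenamer` class is hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class UploadRenamer {
    /** Renames an uploaded file, falling back to Files.move if File.renameTo fails. */
    public static File rename(File uploaded, File target) throws IOException {
        if (uploaded.renameTo(target)) {
            return target; // renameTo signals failure via its return value, never an exception
        }
        // e.g. target on another file system, or target already exists on some platforms
        Files.move(uploaded.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Either way the bytes are moved untouched, so no Reader/Writer conversion can corrupt the image.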

Don't use Reader and Writer when you deal with binary files like images. Use streams: FileInputStream and FileOutputStream. Better still, use @Perception's solution with the renameTo method.
Readers read a file as if it consists of characters (e.g. txt, properties, or YAML files). Image files are not characters; they are binary, and you must use streams for them.
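A minimal sketch of the stream-based copy this answer recommends, replacing the BufferedReader/BufferedWriter loop from the question. It copies raw bytes (so no line endings are dropped and no charset decoding happens) and closes both files via try-with-resources. The `BinaryCopy` class name is hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BinaryCopy {
    /** Copies a binary file byte-for-byte; no character decoding, so PNG data survives intact. */
    public static void copy(File source, File target) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(source));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // both streams are closed here, which also flushes the output buffer
    }
}
```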

How to get the name of a TAR in a GZIP as well as the number of TARs

I was wondering if there is any Java API to get the name of the TAR file in a GZIP file as well as the number of TAR files in it. (Not sure if multiple TARs are allowed in a GZIP)
This is how I access the files/directories in a TAR file
FileInputStream fis = new FileInputStream(new File(sourceFile));
GZIPInputStream gin = new GZIPInputStream(fis);
TarInputStream tin = new TarInputStream (gin);
TarEntry tarEntry = tin.getNextEntry();
I need to check if I'm untarring the appropriate TAR file, so that's why I need the info about the name. I also need to make sure there is only one TAR file, hence I need the number of TARs.

Although a GZIP file can contain some metadata, including the original file name, that will not help you in practice. The name is often missing, because gzip(1) did not know it when creating the file: it got the data not from the file system but through a pipe.
Therefore the usual convention is that the name of the GZIP file is the original file name with ".gz" appended, or with the ".tar" suffix replaced by ".tgz".
On the bright side: a GZIP file can contain only one data stream (i.e. one file in this case), hence only one TAR file. This of course excludes malicious cases where someone concatenates several files, runs gzip on the result, and names it ".tar.gz" or ".tgz".
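GZIPInputStream does not expose the header's file name, but the header layout is fixed by RFC 1952, so the optional FNAME field can be read by hand. A minimal sketch, assuming the stream is positioned at the start of a single-member gzip file; the `GzipName` class is hypothetical, and as noted above the field is frequently absent:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GzipName {
    private static final int FEXTRA = 0x04;
    private static final int FNAME = 0x08;

    /** Returns the FNAME field of a gzip header (RFC 1952), or null if it is not present. */
    public static String readOriginalName(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        if (din.readUnsignedByte() != 0x1f || din.readUnsignedByte() != 0x8b) {
            throw new IOException("Not a gzip stream");
        }
        din.readUnsignedByte();              // CM: compression method (8 = deflate)
        int flg = din.readUnsignedByte();    // FLG: which optional fields follow
        din.readFully(new byte[6]);          // MTIME (4 bytes), XFL (1), OS (1)
        if ((flg & FEXTRA) != 0) {
            int xlen = din.readUnsignedByte() | (din.readUnsignedByte() << 8); // little-endian
            din.readFully(new byte[xlen]);
        }
        if ((flg & FNAME) == 0) {
            return null;                     // e.g. data that was gzipped from a pipe
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = din.readUnsignedByte()) != 0) {  // zero-terminated
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1); // RFC 1952 charset
    }
}
```

Java's own GZIPOutputStream never writes FNAME, so archives produced from Java will also return null here.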

The answer to the second part is that a GZIP file only contains one file. If (hypothetically) it did contain more than one file (TAR or otherwise), there would be no easy way to separate them.

How to estimate zip file size in java before creating it

I have a requirement to create a ZIP file from a list of available files. The files are of different types, like txt, pdf, xml, etc. I am using the java.util.zip classes to do it.
The requirement is to keep the ZIP file at a maximum size of 5 MB. I should select files from the list based on their timestamp and add them to the ZIP until its size reaches 5 MB, skipping the remaining files.
Please let me know if there is a way in Java to estimate the ZIP file size in advance without creating the actual file.
Or is there any other approach to handle this?

Wrap your ZipOutputStream (zos1) in a custom OutputStream, named here YourOutputStream.
The constructor of YourOutputStream creates another ZipOutputStream (zos2), which wraps a new ByteArrayOutputStream (baos):
public YourOutputStream(ZipOutputStream zos, int maxSizeInBytes)
When you want to write a file with YourOutputStream, it first writes it to zos2:
public void writeFile(File file) throws ZipFileFullException
public void writeFile(String path) throws ZipFileFullException
etc...
If baos.size() is under maxSizeInBytes:
write the file to zos1
else:
close zos1, baos, and zos2 and throw an exception. I can't think of an existing exception for this; if there is one, use it, otherwise create your own IOException subclass, ZipFileFullException.
You need two ZipOutputStreams: one written to your drive, and one to check whether your content exceeds 5 MB.
EDIT: I checked, and you can't easily remove a ZipEntry once it has been written.
http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()
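A minimal sketch of this two-stream idea, assuming each file's content is already available as a byte[]. Every entry is first compressed into the in-memory "shadow" archive; only if the shadow stays within the limit is the entry also written to the real archive. The class and method names are hypothetical, and baos.size() does not include the central directory written at close time, so the final file can be slightly larger than maxSizeInBytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SizeLimitedZip implements Closeable {
    /** Hypothetical exception signalling that the next entry would exceed the limit. */
    public static class ZipFileFullException extends IOException {
        public ZipFileFullException(String message) { super(message); }
    }

    private final ZipOutputStream target;                                          // zos1: the real archive
    private final ByteArrayOutputStream shadowBytes = new ByteArrayOutputStream(); // baos
    private final ZipOutputStream shadow = new ZipOutputStream(shadowBytes);       // zos2: size check
    private final long maxSizeInBytes;

    public SizeLimitedZip(OutputStream out, long maxSizeInBytes) {
        this.target = new ZipOutputStream(out);
        this.maxSizeInBytes = maxSizeInBytes;
    }

    public void writeEntry(String name, byte[] data) throws IOException {
        shadow.putNextEntry(new ZipEntry(name));
        shadow.write(data);
        shadow.closeEntry(); // finishes deflating, so the entry's bytes are now in shadowBytes
        if (shadowBytes.size() > maxSizeInBytes) {
            // The shadow cannot drop the entry again, so every later call fails too.
            throw new ZipFileFullException("Adding " + name + " would exceed " + maxSizeInBytes + " bytes");
        }
        target.putNextEntry(new ZipEntry(name));
        target.write(data);
        target.closeEntry();
    }

    @Override
    public void close() throws IOException {
        try {
            shadow.close();
        } finally {
            target.close();
        }
    }
}
```

With files sorted by timestamp, you would call writeEntry in order and stop at the first ZipFileFullException, which matches the "skip the remaining files" requirement.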

+1 for Colin Herbert: add files one by one, and either back up the previous step or remove the last file if the archive is too big. I just want to add some details:
Prediction is way too unreliable. E.g. a PDF can contain uncompressed text and compress down to 30% of the original, or it can contain already-compressed text and images and compress only to 80%. You would need to inspect the entire PDF for compressibility, which basically means compressing it.
You could try a statistical prediction; that could reduce the number of failed attempts, but you would still have to implement the above recommendation. Go with the simpler implementation first, and see if it's enough.
Alternatively, compress the files individually, then pick the files that won't exceed 5 MB when bundled together. If unpacking is automated too, you could bundle the ZIP files into a single uncompressed ZIP file.

There is a better option. Create a dummy LengthOutputStream that just counts the written bytes:
public class LengthOutputStream extends OutputStream {
private long length = 0L;
@Override
public void write(int b) throws IOException {
length++;
}
@Override
public void write(byte[] b, int off, int len) throws IOException {
length += len; // count whole chunks instead of one call per byte
}
public long getLength() {
return length;
}
}
You can simply connect the LengthOutputStream to a ZipOutputStream:
public static long sizeOfZippedDirectory(File dir) throws FileNotFoundException, IOException {
try (LengthOutputStream sos = new LengthOutputStream();
ZipOutputStream zos = new ZipOutputStream(sos)) {
... // Add ZIP entries to the stream
zos.finish(); // write the central directory so it is included in the count
return sos.getLength();
}
}
The LengthOutputStream object counts the bytes of the zipped stream but stores nothing, so there is no file size limit. This method gives an accurate size, but it is almost as slow as creating the ZIP file.

I don't think there is any way to estimate the size of the ZIP that will be created, because ZIPs are processed as streams. It is also not technically possible to predict the size of the compressed output without actually compressing it.

I did this once on a project with known input types. We knew that, generally speaking, our data compressed at around 5:1 (it was all text). So I'd check the file size and divide by 5...
In this case, the purpose for doing so was to check that files would likely be below a certain size. We only needed a rough estimate.
All that said, I have noticed that archivers like 7-Zip can create a ZIP file of a certain size (like a CD) and then split the rest off into a new file once it reaches the limit. You could look at that source code. I have actually used the command-line version of that app in code before. They have a library you can use as well, though I'm not sure how well it will integrate with Java.
For what it is worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port to it.

Maybe you could add one file at a time until you reach the 5 MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate the size without actually compressing the files.
Of course, compression will not increase the size (or only slightly, because of the ZIP headers), so the uncompressed total at least gives you a "worst case" estimate.

Just wanted to share how we implemented it manually:
int maxSizeForAllFiles = 70000; // Read from property
int sizePerFile = 22000; // Read from property
long totalFileSize = 0;
boolean toBeZipped = false;
// Iterate over all attachments to check whether a ZIP is required
for (String attachFile : inputAttachmentList) {
File file = new File(attachFile);
totalFileSize += file.length();
// A ZIP is required as soon as a single file reaches the per-file limit
if (file.length() >= sizePerFile) {
toBeZipped = true;
logger.info("File: "
+ attachFile
+ " Size: "
+ file.length()
+ " File required to be zipped, MAX allowed per file: "
+ sizePerFile);
break;
}
}
// Check whether all attachments together reach MAX_SIZE_FOR_ALL_FILES
if (totalFileSize >= maxSizeForAllFiles) {
toBeZipped = true;
}
if (toBeZipped) {
// Zip here, iterating over all attachments
}

How to extract a single file from a remote archive file?

Given
URL of an archive (e.g. a zip file)
Full name (including path) of a file inside that archive
I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.
From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?
EDIT: being able to also do that with tarballs and zipped tarballs is also important for me.

Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through all the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).
The Java code would look something like this:
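URL url = new URL("http://example.com/path/to/archive");
byte[] bytes = null;
try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
ZipEntry ze;
while ((ze = zin.getNextEntry()) != null) { // getNextEntry() also closes the previous entry
if (ze.getName().equals(pathToFile)) {
// getSize() can be -1 when reading from a stream, so read until the entry ends
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int n;
while ((n = zin.read(buf)) != -1) {
out.write(buf, 0, n);
}
bytes = out.toByteArray();
break;
}
}
}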
This is, of course, untested.

Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the central directory and the entry itself. The server would need to support the Range HTTP header for this to work.
The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.
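To illustrate the Range idea with plain JDK classes (no TrueZip): a ZIP file ends with an End Of Central Directory record that gives the offset and size of the central directory, so a client can fetch the tail first, then the directory, then only the wanted entry. The sketch below covers the first two steps; the class name is hypothetical, ZIP64 archives are not handled, and it assumes the server answers Range requests with 206 Partial Content:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RemoteZipDirectory {
    private static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6", little-endian
    private static final int EOCD_MIN_SIZE = 22;

    /** Fetches part of a remote resource, e.g. rangeSpec "-65557" (tail) or "1000-1999". */
    public static byte[] fetchRange(URL url, String rangeSpec) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", "bytes=" + rangeSpec);
        if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
            throw new IOException("Range not honoured, HTTP " + conn.getResponseCode());
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns {centralDirectoryOffset, centralDirectorySize} found in the tail of a ZIP file. */
    public static long[] findCentralDirectory(byte[] tail) throws IOException {
        ByteBuffer bb = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        // The record may be followed by a comment, so scan backwards for its signature.
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (bb.getInt(i) == EOCD_SIGNATURE) {
                long size = bb.getInt(i + 12) & 0xFFFFFFFFL;   // size of central directory
                long offset = bb.getInt(i + 16) & 0xFFFFFFFFL; // offset from start of archive
                return new long[] {offset, size};
            }
        }
        throw new IOException("End of central directory record not found");
    }
}
```

For example, fetchRange(url, "-65557") returns at most the last 65,557 bytes (the 22-byte record plus the maximum 65,535-byte comment); findCentralDirectory on that tail gives {offset, size}, and fetchRange(url, offset + "-" + (offset + size - 1)) then returns the central directory, which lists where each entry's data starts.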
Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.
This won't provide any advantage for compressed TAR archives since the entire archive is compressed together (beyond just using an InputStream and killing it when you have your entry).

Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. This is an implementation of an NIO.2 FileSystemProvider for Java SE 7. Using this API, you can access an HTTP URI as follows:
Path path = new TPath(new URI("http://acme.com/download/everything.tar.gz/README.TXT"));
try (InputStream in = Files.newInputStream(path)) {
// Read archive entry contents here.
...
}

I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:
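public class GetFileFromZIPServlet extends HttpServlet {
// Hypothetical location of the hosted archive; adjust to your deployment
private static final String ZIP_PATH = "/path/to/archive.zip";
@Override
public void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
String pathToFile = request.getParameter("pathToFile");
try (ZipFile zip = new ZipFile(ZIP_PATH)) {
ZipEntry entry = (pathToFile == null) ? null : zip.getEntry(pathToFile);
if (entry == null) {
response.sendError(HttpServletResponse.SC_NOT_FOUND);
return;
}
// set the appropriate content type, maybe based on the file extension
response.setContentType("...");
// copy the entry's bytes from the ZIP to the response
try (InputStream in = zip.getInputStream(entry)) {
OutputStream out = response.getOutputStream();
byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
out.write(buf, 0, n);
}
}
}
}
}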
