I just read about zip bombs, i.e. zip files that contain a very large amount of highly compressible data (00000000000000000...).
When opened they fill the server's disk.
How can I detect a zip file is a zip bomb before unzipping it?
UPDATE: Can you tell me how this is done in Python or Java?
Try this in Python:
import zipfile

# Sum the uncompressed sizes declared in the archive's central directory.
# Note: these sizes come from the zip's own headers, so a hostile archive
# can lie about them; treat this as a first check only.
with zipfile.ZipFile('a_file.zip') as z:
    print(f'total files size={sum(e.file_size for e in z.infolist())}')
Zip is, erm, an "interesting" format. A robust solution is to stream the data out, and stop when you have had enough. In Java, use ZipInputStream rather than ZipFile. The latter also requires you to store the data in a temporary file first, which is not the greatest of ideas.
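As a rough sketch of that approach (the 100 MB cap is an assumption; pick whatever limit your application can tolerate):

import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class BombGuard {
    static final long MAX_BYTES = 100L * 1024 * 1024; // assumed 100 MB cap

    // Decompress entry by entry, counting actual output bytes rather
    // than trusting the sizes declared in the archive's headers.
    static void check(InputStream in) throws Exception {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                int n;
                while ((n = zin.read(buf)) != -1) {
                    total += n;
                    if (total > MAX_BYTES) {
                        throw new IllegalStateException("archive expands past limit");
                    }
                }
            }
        }
    }
}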
Reading over the description on Wikipedia -
Deny any compressed files that contain compressed files.
Use ZipFile.entries() to retrieve a list of files, then ZipEntry.getName() to find the file extension.
Deny any compressed files that contain files over a set size, or whose size cannot be determined up front.
While iterating over the files, use ZipEntry.getSize() to retrieve the file size (see the sketch after this list).
Don't allow the upload process to write enough data to fill up the disk, i.e. solve the problem itself, not just one possible cause of it.
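Here is a rough sketch of the first two checks, assuming the upload is already on disk. Note that ZipEntry.getSize() reports the size declared in the entry's header, which a hostile archive can understate, so this is a first filter rather than a guarantee:

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class UploadFilter {
    static final long MAX_ENTRY_BYTES = 10L * 1024 * 1024; // assumed per-file cap

    static boolean looksSafe(ZipFile zip) {
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry e = entries.nextElement();
            String name = e.getName().toLowerCase();
            // Deny nested archives, judged by extension.
            if (name.endsWith(".zip") || name.endsWith(".gz") || name.endsWith(".rar")) {
                return false;
            }
            // Deny oversized entries and entries whose size is unknown (-1).
            long size = e.getSize();
            if (size < 0 || size > MAX_ENTRY_BYTES) {
                return false;
            }
        }
        return true;
    }
}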
Check a zip header first :)
If the ZIP decompressor you use can report the original and compressed sizes, you can use that data. Otherwise, start unzipping and monitor the output size; if it grows too much, cut it off.
Make sure you are not using your system drive for temp storage. I am not sure whether a virus scanner will catch a zip bomb if it encounters one.
You can also look at the information inside the zip file and retrieve a list of its contents. How to do this depends on the utility used to extract the file, so you would need to provide more information here.
Related
I'm wondering whether I can store a file in a cache (or somewhere else), not on the hard drive, using Java. I get a file as input, copy it to another path on my hard drive, modify the copy, and return it. But what if a user does not have the right to write to the hard disk, so the application that uses the file doesn't either? That's why I'm trying to store the copy somewhere else, for example in a cache. Is that possible? If yes, is it also possible without any library?
Thanks in advance!
If this is a text file, and small enough to fit into memory, you can just use:
IOUtils.toString(inputStream, encoding);
For a binary file, you need to reserve a byte array big enough to fit the file contents into...
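For example, a minimal sketch of keeping the working copy entirely in memory, assuming Apache Commons IO is on the classpath (a plain read loop into a ByteArrayOutputStream works too if you want to avoid the library):

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;

public class InMemoryCopy {
    // Copy the input into memory, modify it there, and hand it back
    // as a stream, never touching the disk.
    static InputStream copyAndModify(InputStream original) throws Exception {
        byte[] data = IOUtils.toByteArray(original); // whole file in memory
        // ... modify data here ...
        return new ByteArrayInputStream(data);
    }
}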
Would it be possible to unzip a tar.gz file partially, e.g. decompress only a few megabytes from the middle of a large tar.gz file?
I got this idea because we have a lot of gzipped log files, and it's very time consuming to unzip a 100 MB log file into a ~1 GB file and then search in it. It would be great to have a 'partial unzip' option.
Unless the .gz file was specially prepared for this purpose, then no, you need to decompress all of the data up to the middle in order to decompress what's in the middle.
It is possible to use Z_FULL_FLUSH in deflate() periodically to put break points in the compressed data that allow decompression to start at those points. You would have to have a different file and your own software to keep track of where those break points were, and how far into the uncompressed data they are.
Since it is a .tar.gz file, it would make sense to only have those breakpoints at file boundaries. The tar format itself can be read starting at any file header with no problem.
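As a hedged sketch of preparing such a file in Java (Deflater.FULL_FLUSH, available since Java 7, maps to zlib's Z_FULL_FLUSH; the index format is entirely up to you):

import java.util.zip.Deflater;

public class FlushPoints {
    // Compress chunks with a full flush between them. Each flush point
    // is a byte offset where a decompressor can restart cold; record
    // (compressedOffset, uncompressedOffset) pairs in your own index.
    // Sketch only: assumes `out` is large enough for the whole result.
    static byte[] compressWithBreaks(byte[][] chunks) {
        Deflater def = new Deflater();
        byte[] out = new byte[1 << 20];
        int outLen = 0;
        for (byte[] chunk : chunks) {
            def.setInput(chunk);
            // FULL_FLUSH empties the window so decompression can begin
            // cleanly at the next byte of compressed output.
            outLen += def.deflate(out, outLen, out.length - outLen, Deflater.FULL_FLUSH);
        }
        def.finish();
        while (!def.finished()) {
            outLen += def.deflate(out, outLen, out.length - outLen);
        }
        def.end();
        return java.util.Arrays.copyOf(out, outLen);
    }
}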
I've got many files that I want to store in a single archive file. My first approach was to store the files in a gzipped tarball. The problem is that I have to rewrite the whole archive if a single file is added.
I could get rid of the gzip compression, but adding a file would still be expensive.
What other archive format would you suggest that allows fast append operations?
The ZIP file format was designed to allow appends without a total re-write and is ubiquitous, even on Unix.
The ZIP and TAR formats (and the old AR format) allow file append without a full rewrite. However:
The Java archive classes DO NOT support this mode of operation.
File append is likely to result in multiple copies of a file in the archive if you append an existing file.
The ZIP and AR formats have a directory that needs to be rewritten following a file append operation. The standard utilities take precautions when rewriting the directory, but it is possible in theory that you may end up with an archive with a missing or corrupted directory if the append fails.
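If you just need the convenience rather than true append performance, one hedged sketch: the zip FileSystem provider (Java 7+) can add an entry to an existing archive through the Files API, though the provider may still rewrite the archive when the file system is closed. The paths here are assumptions:

import java.net.URI;
import java.nio.file.*;
import java.util.Collections;

public class AddToZip {
    public static void main(String[] args) throws Exception {
        URI uri = URI.create("jar:file:/tmp/archive.zip");   // assumed archive path
        try (FileSystem fs = FileSystems.newFileSystem(uri,
                Collections.singletonMap("create", "false"))) {
            Path src = Paths.get("/tmp/new-file.txt");       // assumed new file
            // Copies the file into the archive as /new-file.txt.
            Files.copy(src, fs.getPath("/new-file.txt"),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}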
There are 2 servers that are geographically very far from each other.
One server does file processing, then saves the processed file in a directory:
c:\processed\
Files can be anywhere from 100 MB to 1 GB in size.
The second server downloads these files.
What techniques can I use to check if the file correctly downloaded?
Is a checksum all I need? Will it hash according to the contents of the file or just the file attributes? (Or what is best practice?)
If the file is 1GB, will creating the checksum take a long time?
Checksum is fine to make sure that the downloaded data matches the source data. For a discussion of making it fast, see What is the fastest way to create a checksum for large files in C#.
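Since the linked question is C#, here is a minimal Java sketch of the same idea: stream the file through a digest so a 1 GB file never has to fit in memory. A checksum is computed from the file's contents, not its attributes. SHA-256 is an assumption; MD5 or CRC32 is enough if you only care about transfer corruption, not tampering:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class FileChecksum {
    // Hash the file contents in 64 KB chunks; memory use stays flat
    // no matter how large the file is.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest()); // HexFormat is Java 17+
    }
}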
I'm trying to zip a large number of pdf files (stored as BLOBs in the DB) and then return the zip as an attachment to the user.
What's the best way to do this without running into memory issues?
Another note: I actually need to merge some PDFs prior to adding them to the ZipOutputStream, so a couple of PDFs will need to be in memory at a time.
I assume it would be best to then store them as temporary files on the server before zipping them all?
You can create zip files in memory in Java using ZipOutputStream.
See http://www.exampledepot.com/egs/java.util.zip/CreateZip.html
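The key point is that ZipOutputStream can wrap any OutputStream, including the servlet response, so you can stream each blob through without holding the whole zip in memory. A sketch, where NamedBlob is a hypothetical wrapper for one database row:

import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PdfZipper {
    // Stream each blob into the zip one at a time; only one buffer's
    // worth of data is in flight at any moment.
    static void zipTo(OutputStream dest, Iterable<NamedBlob> blobs) throws Exception {
        try (ZipOutputStream zos = new ZipOutputStream(dest)) {
            byte[] buf = new byte[8192];
            for (NamedBlob blob : blobs) {
                zos.putNextEntry(new ZipEntry(blob.name()));
                try (InputStream in = blob.open()) {   // stream from the DB
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        zos.write(buf, 0, n);
                    }
                }
                zos.closeEntry();
            }
        }
    }

    // Hypothetical wrapper for a BLOB row: a display name plus its stream.
    interface NamedBlob {
        String name();
        InputStream open() throws Exception;
    }
}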