File size limitations of ZipOutputStream? - java

I am using the ZipOutputStream to create ZIP files. It works fine, but the Javadoc is quite sparse, so I'm left with questions about the characteristics of ZipOutputStream:
Is there a limit on the maximum supported file sizes, both for files contained in the ZIP and for the resulting ZIP file itself? The size argument is a long, but who knows. (Let us assume that the filesystem imposes no limits.)
What is the minimum input file size that justifies use of the DEFLATED method?
I will always read the resulting ZIP file using ZipInputStream.

The most important point is that, in a current Java 7 JDK, ZipOutputStream creates ZIP files according to the 2012 PKZIP specification, which includes support for ZIP64. Note that the ZIP64 support had bugs at first, but any recent version of the Java 7 JDK will be fine.
The maximum file size is thus 2^64−1 bytes, much larger than the 4 GiB of standard ZIP. I tried it with a 10 GB test file: I could add it to the ZIP file with no problems, even when the resulting ZIP file itself grew beyond 4 GB.
The minimum input file size that justifies the DEFLATED method is 22 bytes. This has nothing to do with the minimum ZIP file size, which is incidentally also 22 bytes (for an empty ZIP file). I determined this number empirically by adding strings of 'a's of increasing length. Such a sequence of identical characters compresses very well, so in the real world the break-even point will be higher.
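That measurement can be reproduced with a short program. The sketch below (a re-creation, not the original test code; the class name is mine) zips the same run of 'a' characters once STORED and once DEFLATED, in memory, and reports the first length at which the DEFLATED archive comes out smaller:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredVsDeflated {
    // Size of a single-entry zip built in memory with the given method
    static int zipSize(byte[] data, int method) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            ZipEntry entry = new ZipEntry("e");
            entry.setMethod(method);
            if (method == ZipEntry.STORED) {
                // STORED entries must declare size and CRC before putNextEntry
                CRC32 crc = new CRC32();
                crc.update(data);
                entry.setSize(data.length);
                entry.setCrc(crc.getValue());
            }
            zos.putNextEntry(entry);
            zos.write(data);
            zos.closeEntry();
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        for (int len = 1; len <= 64; len++) {
            byte[] data = new byte[len];
            Arrays.fill(data, (byte) 'a');
            if (zipSize(data, ZipEntry.DEFLATED) < zipSize(data, ZipEntry.STORED)) {
                System.out.println("DEFLATED wins from " + len + " bytes");
                break;
            }
        }
    }
}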

Following are the limits of the ZIP file format:
The minimum size of a .ZIP file is 22 bytes. The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (2^32−1 bytes, or 4 GiB minus 1 byte) for standard .ZIP, and 18,446,744,073,709,551,615 bytes (2^64−1 bytes, or 16 EiB minus 1 byte) for ZIP64.
Reference: Zip (file format)

Related

How to unzip file zipped by PKZIP in mainframe by Java?

I am trying to write a program in Java to unzip files that were zipped by the PKZIP tool on a mainframe. I have tried the three approaches below; none of them solves my problem.
By external tools.
I have tried to open the file with WinRAR, 7-Zip, and the Linux unzip command.
All fail with the error message:
The archive is either in unknown format or damaged
By JDK API - java.util.ZipFile
I have also tried to unzip it with the JDK API, as described on this website. However, it fails with the error message:
IO Error: java.util.zip.ZipException: error in opening zip file
By Zip4J
I have also tried Zip4J. It failed too, with the error message:
Caused by: java.io.IOException: Negative seek offset
    at java.io.RandomAccessFile.seek(Native Method)
    at net.lingala.zip4j.core.HeaderReader.readEndOfCentralDirectoryRecord(HeaderReader.java:117)
    ... 5 more
Is there any Java library or Linux command that can extract a ZIP file created by PKZIP on a mainframe? Thanks a lot!
I have successfully read files that were compressed with PKZIP on z/OS and transferred to Linux. I was able to read them with the java.util.zip.* classes:
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

ZipFile ifile = new ZipFile(inFileName);
// faster to loop through entries than to open the zip file as a stream
Enumeration<? extends ZipEntry> entries = ifile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    if (!entry.isDirectory()) { // skip directories
        String entryName = entry.getName();
        // code that decides whether to process this entry omitted
        InputStream zis = ifile.getInputStream(entry);
        // process the stream
    }
}
The jar file format is just a zip file, so the "jar" command can also read such files.
Like the others, I suspect the file was not transferred in binary mode and so was corrupted. On Linux you can use the xxd utility (piped through head) to dump the first few bytes and see whether it looks like a zip file:
# xxd myfile.zip | head
0000000: 504b 0304 2d00 0000 0800 2c66 a348 eb5e PK..-.....,f.H.^
The first 4 bytes should be as shown; see also the Wikipedia entry for zip files.
Even if the first 4 bytes are correct, the file may have been truncated during transmission, which would also cause the corrupt-file message.
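The same check can be done from Java. A minimal sketch (the class name is mine) that reads the first four bytes and compares them with the local-file-header signature:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ZipMagicCheck {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            // readInt() is big-endian, so "PK\003\004" reads as 0x504B0304
            int magic = in.readInt();
            System.out.println(magic == 0x504B0304
                    ? "Looks like a zip local file header"
                    : "Not a zip local file header");
        }
    }
}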

Check compressed archive for corruption

I am creating compressed archives with tar and bzip2 using jarchivelib, which utilizes org.apache.commons.compress.
try {
    Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.BZIP2);
    File archive = archiver.create(archiveName, destination, sourceFilesArr);
} catch (IOException e) {
    e.printStackTrace();
}
Sometimes the created file is corrupted, so I want to check for that and recreate the archive if necessary. No error is thrown; I detected the corruption when trying to extract the archive manually with tar -xf file.tar.bz2. (Note: extracting with tar -xjf file.tar.bz2 works flawlessly.)
tar: Archive contains `\2640\003\203\325#\0\0\0\003\336\274' where numeric off_t value expected
tar: Archive contains `\0l`\t\0\021\0' where numeric mode_t value expected
tar: Archive contains `\003\301\345\0\0\0\0\006\361\0p\340' where numeric time_t value expected
tar: Archive contains `\0\210\001\b\0\233\0' where numeric uid_t value expected
tar: Archive contains `l\001\210\0\210\001\263' where numeric gid_t value expected
tar: BZh91AY&SY"'ݛ\003\314>\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\343\262\037\017\205\360X\001\210: Unknown file type `', extracted as normal file
tar: BZh91AY&SY"'ݛ�>��������������������������������������X�: implausibly old time stamp 1970-01-01 00:59:59
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Is there a way using org.apache.commons.compress to check a compressed archive if it is corrupted? Since the files can be at the size of several GB an approach without decompressing would be great.
Since bzip2 compression produces a stream, there is no way to check for corruption without decompressing that stream and passing the result to tar.
In your case, though, the archive is not actually corrupted: you are extracting directly with tar without first passing the data through bzip2. That is the root cause. You need to always use the -j flag with tar, as the archive is compressed with bzip2; that is why the second command works correctly.
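In Java, that full-decompression check can be automated with the same commons-compress classes that jarchivelib uses. A minimal sketch (class names are from commons-compress 1.x; the verifier class itself is mine) that streams through every entry without writing anything to disk:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

public class TarBz2Verifier {
    public static boolean isReadable(String path) {
        try (InputStream fis = new BufferedInputStream(Files.newInputStream(Paths.get(path)));
             TarArchiveInputStream tar = new TarArchiveInputStream(new BZip2CompressorInputStream(fis))) {
            byte[] buf = new byte[8192];
            while (tar.getNextTarEntry() != null) {
                // Drain each entry; a CRC or format error surfaces as an IOException
                while (tar.read(buf) != -1) { /* discard */ }
            }
            return true;
        } catch (IOException e) {
            return false; // truncated or corrupted archive
        }
    }
}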

java.nio.file.FileSystemException: Not enough storage is available to process this command

I have to copy images (4-6 MB each) periodically (every 8 seconds) into one folder. When the file count in the target folder reaches roughly 320, an error is thrown. If I restart the application, another roughly 320 files are copied before the error is thrown again (so it does not depend on the number of files in the target directory):
java.nio.file.FileSystemException: X:\src\from.jpg -> X:\target\to.jpg: Not enough storage is available to process this command
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsFileCopy.copy(WindowsFileCopy.java:205)
    at sun.nio.fs.WindowsFileSystemProvider.copy(WindowsFileSystemProvider.java:278)
    at java.nio.file.Files.copy(Files.java:1274)
I'm using Files.copy(f.toPath(), t.toPath()) to copy the file.
I think the problem is related to System Error Code 8: Not enough storage is available to process this command.
Is it possible to somehow clean used resources to avoid this Exception?

ZIP Directory AES encryption

Is it possible to encrypt a ZIP file with a directory in it, but not the files? Or, if this is not possible, how about encrypting the directory itself?
I have a directory with subdirectories, and I need to encrypt it. I read a suggestion to zip it first and then encrypt the zip file.
I also don't want a zip of encrypted files, just in case someone would suggest it. I really need the directory, or the zip file of it, to be encrypted. Or is this the only way possible?
And how do I know my files have been encrypted?
My program is written in Java and runs on Android.
Thank you very much for your input.
As far as I'm aware, the ZIP specification allows the content of each file to be encrypted, in theory with a separate password per file.
If you want to encrypt a directory and all the files below it, you need to archive the directory and encrypt the archive. A directory by itself is not an encryptable item, as it's only a placeholder: all the information related to the directory is part of the zip entry attributes, and only the contents of files are encrypted (this is per the zip spec).
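Since the program is written in Java, one way to do this in code is with the Zip4j library. A minimal sketch (Zip4j 2.x API; the archive name, folder path, and password are placeholders) that zips a whole directory with AES encryption:

import java.io.File;
import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.model.ZipParameters;
import net.lingala.zip4j.model.enums.EncryptionMethod;

public class EncryptFolder {
    public static void main(String[] args) throws Exception {
        ZipParameters params = new ZipParameters();
        params.setEncryptFiles(true);                     // encrypt file contents
        params.setEncryptionMethod(EncryptionMethod.AES); // AES instead of legacy ZipCrypto

        // The password protects every entry added with these parameters
        ZipFile zip = new ZipFile("secret.zip", "password".toCharArray());
        zip.addFolder(new File("/path/to/directory"), params);
    }
}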
To determine whether a file has been encrypted, you can try extracting it with a zip tool without specifying a password; the program should prompt for a password for the encrypted file. Note that, as of this writing, none of the Info-ZIP tools support AES encryption, so you will probably not be able to test-extract the files this way.
With zipinfo, look at the 'status' column; if it's all in capitals, the entry is encrypted (a good rule of thumb):
host:~/bin% zip hello.zip radio
updating: radio (deflated 30%)
host:~/bin% zipinfo hello.zip
Archive: hello.zip 307 bytes 1 file
-rwxr-xr-x 3.0 unx 211 tx defN 9-Jun-09 11:33 radio
1 file, 211 bytes uncompressed, 147 bytes compressed: 30.3%
Note the lower-case tx for the non-password-protected entry.
host:~/bin% zip -P fred hello.zip radio
adding: radio (deflated 30%)
host:~/bin% zipinfo hello.zip
Archive: hello.zip 335 bytes 1 file
-rwxr-xr-x 3.0 unx 211 TX defN 9-Jun-09 11:33 radio
1 file, 211 bytes uncompressed, 147 bytes compressed: 30.3%
Note the upper-case TX for the password-protected entry.
The jar tool produces the awesome message (ymmv):
host:~/bin% zip -P fred hello.zip radio
updating: radio (deflated 30%)
host:~/bin% jar tvf hello.zip
java.util.zip.ZipException: invalid CEN header (encrypted entry)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:214)
at java.util.zip.ZipFile.<init>(ZipFile.java:144)
at java.util.zip.ZipFile.<init>(ZipFile.java:115)
at sun.tools.jar.Main.list(Main.java:1004)
at sun.tools.jar.Main.run(Main.java:245)
at sun.tools.jar.Main.main(Main.java:1177)

Split HDFS files into multiple local files using Java

I have to copy HDFS files into the local file system using Java code and, before writing to disk, split them into multiple parts. The files are compressed with Snappy/LZO. I have used BufferedReader and FileWriter to read and write the file, but this operation is very slow: 20 minutes for a 30 GB file. I can dump the file using hadoop fs -text in 2 minutes (but cannot split it). Is there anything else I can do to speed up the operation?
Since I had to do two passes, first to get the line count and then to split, and hadoop fs -text was CPU-intensive, I took the approach below:
1) Use a line-count Java program, run as a MapReduce job, to get the line count of the file (a sketch of such a job follows this list). Dividing that by the total number of output files I need gave the number of lines to write to each file.
2) Use the code mentioned at this link together with hadoop fs -text:
https://superuser.com/a/485602/220236
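For step 1, here is a minimal sketch of such a line-count job (a re-creation using the org.apache.hadoop.mapreduce API, not the original program; input and output paths come from the command line):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LineCount {
    public static class LineMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(NullWritable.get(), ONE); // one count per input line
        }
    }

    public static class SumReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) total += c.get();
            ctx.write(NullWritable.get(), new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "line count");
        job.setJarByClass(LineCount.class);
        job.setMapperClass(LineMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(1); // single reducer so the total lands in one file
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}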
Hope it helps someone else.
