People these days create their ZIP archives with WinZIP, which allows for internationalized (i.e. non-latin: cyrillic, greek, chinese, you name it) file names.
Sadly, trying to unpack such a file causes trouble:
UNIX unzip creates garbage-named files and dirs like "®£¤ ©¤¥èì".
Java and its jar command fail miserably on such archives.
Is there a passable way to unpack such files programmatically? UNIX or Java.
DotNetZip supports unicode and arbitrary encodings for filenames within zipfiles, either for reading or writing zips.
It's a .NET library. For Unix usage, you would need Mono as a pre-requisite.
If the zipfile is correctly constructed by WinZip, in other words if it's compliant with the zip spec from PKWare, then there's no special work you need to do to specify the encoding at the time you unpack it. According to the zip spec, there are two supported encodings used for filenames in zipfiles: UTF-8 and IBM437. The use of one or the other of these encodings is specified in the zip metadata and any zip library can detect and use it. DotNetZip automatically detects it when reading a compliant zip, like this:
using (var zip = ZipFile.Read("thearchive.zip"))
{
    foreach (var e in zip)
    {
        // e.FileName refers to the name on the entry
        e.Extract("extract-directory");
    }
}
There are archive programs that produce zips that are "non compliant" w.r.t. encoding. WinRar is one - it will create a zip that has filenames encoded in the default encoding in use on the computer. In Taipei it will use cp950, while in Iceland, something else, and in Lisbon, something else. The advantage to "non compliance" here is that Windows Explorer will open and correctly display i18n-ized filenames in such zips. In other words, "non compliance" is often what people want, because Windows doesn't (yet?) support UTF-8 zip files.
(This all has to do with the encoding used in the zipfile, not the encoding used in the files contained in the zip file)
The zip spec doesn't allow for the specification of an arbitrary text encoding in the zip metadata. In other words, if you use cp950 when creating the zip, then your extract logic needs to "know" to use cp950 when extracting - nothing in the zip file carries that information. In addition, of course, the zip library you use to programmatically extract must support arbitrary encodings. As far as I know, Java's zip library does not (at least before Java 7, which added ZipFile constructors that take a Charset). DotNetZip does. Like so:
using (ZipFile zip = ZipFile.Read(zipToExtract,
                                  System.Text.Encoding.GetEncoding(950)))
{
    foreach (ZipEntry e in zip)
    {
        e.Extract(extractDirectory);
    }
}
DotNetZip can also create zip files with arbitrary encodings - "non compliant" zips.
DotNetZip is free, and open source.
The solution I've found:
Apache commons-compress can unzip such archives just fine, if supplied with correct fallback charset.
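For comparison, here is a minimal sketch of the same idea using only the JDK, assuming Java 7 or later, where java.util.zip.ZipFile accepts a Charset for decoding entry names. It is not the commons-compress code from the answer above; the class name CharsetUnzip and its two methods are made up for illustration, and you still have to know (or guess) which charset the archive was written with.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any library.
final class CharsetUnzip {

    /** Lists entry names, decoding them with the given fallback charset. */
    static List<String> entryNames(Path zipPath, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        // Entries flagged as UTF-8 in the archive are still decoded as UTF-8;
        // nameCharset only applies to entries without that flag.
        try (ZipFile zip = new ZipFile(zipPath.toFile(), nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Extracts every entry into targetDir, decoding names with the given charset. */
    static void unzip(Path zipPath, Path targetDir, Charset nameCharset) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipPath.toFile(), nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse names such as "../../x" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
}
```

A call might look like CharsetUnzip.unzip(Paths.get("thearchive.zip"), Paths.get("out"), Charset.forName("IBM866")) for an archive created on a Russian DOS/Windows code page; whether a given charset name is available depends on the JDK installation.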
I have a directory containing both Zip & Rar archives.
I already have a way to get a zip file's comment -
if (f.getName().substring(f.getName().length() - 3).equals("zip")) {
    ZipFile zip = new ZipFile(f);
    zip.getComment();
}
Is there a way to do the same thing on a Rar file?
note that:
There are too many rar files for me to manually convert them to zip on some site (If there is some script to convert them, it could work).
Renaming a rar file's extension to .zip (file.rar -> file.zip) would still produce an exception when trying to create a new ZipFile object with it.
Thanks in advance!
I think in the end, you are looking for some sort of library to get that done for you, such as raroscope or java-unrar.
Alternatively, you could decide to re-invent that wheel yourself (not recommended).
Or you could simply run the command line rar tool using ProcessBuilder (as a system command), as explained here.
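The ProcessBuilder route could be sketched roughly like this. The class and method names are made up for illustration, and the sketch deliberately takes the full command from the caller: which rar/unrar binary is installed and which arguments make it print the comment are assumptions you would have to check against your own tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs an external command and returns everything it printed.
final class ExternalTool {

    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both and the child cannot block.
        pb.redirectErrorStream(true);
        Process process = pb.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }
}
```

You would then parse the returned text for the comment, which is the fragile part of this approach since the output format belongs to the external tool.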
I have a compressed file (an EAR), which contains several compressed files (EARs, WARs and JARs), which may in turn contain compressed files (JARs). Is there a way to find a specific string in this structure using UNIX commands without manually decompressing them one by one?
Thank You.
With the Java jar executable (jar tf yourEar.ear), you could list all files contained in the EAR on standard output.
But it doesn't list nested jars recursively.
So you could:
chain this result to a grep that looks for the searched string in the file names.
for each file name ending with .jar listed in the output, reuse the same logic.
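If dropping out of the shell is acceptable, the same "list, filter, recurse into nested archives" logic can be sketched in Java without unpacking anything to disk. This is an illustration rather than a UNIX one-liner: the class name, the "!/" path notation and the list of extensions treated as archives are all assumptions, and it matches the string against entry names only, not file contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: finds entries whose name contains a string, descending into nested archives.
final class NestedArchiveSearch {

    static List<String> find(Path archive, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        try (InputStream in = Files.newInputStream(archive)) {
            scan(new ZipInputStream(in), archive.getFileName().toString(), needle, hits);
        }
        return hits;
    }

    private static void scan(ZipInputStream zis, String prefix, String needle, List<String> hits)
            throws IOException {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            String name = entry.getName();
            String fullPath = prefix + "!/" + name;
            if (name.contains(needle)) {
                hits.add(fullPath);
            }
            if (looksLikeArchive(name)) {
                // Read the nested archive straight from the current entry's data.
                // The inner stream is intentionally not closed: closing it would
                // also close the outer stream we are still iterating over.
                scan(new ZipInputStream(zis), fullPath, needle, hits);
            }
        }
    }

    private static boolean looksLikeArchive(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jar") || lower.endsWith(".war")
                || lower.endsWith(".ear") || lower.endsWith(".zip");
    }
}
```

Each hit is reported as a chain such as yourEar.ear!/app.war!/WEB-INF/lib/inner.jar!/com/Foo.class, so you can see in which nested archive the match lives.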
I'm making a file import system, and I can't move files into the compiled .jar file the application is in.
Here's what I'm trying to do:
Path FROM = Paths.get(filePath.getText());
Path TO = Paths.get("C:\\Users\\" + System.getProperty("user.name") +
"\\AppData\\Roaming\\.minecraft\\mods\\music_crafter-1.0\\src\\main\\resources\\assets\\music_crafter\\sounds\\block\\music_player");
//jar file
Files.move(FROM, TO.resolve(FROM.getFileName()), StandardCopyOption.REPLACE_EXISTING);
You need to handle the jar file internally. A Jar is not a directory, it is a compressed container file (pretty much a ZIP file with a different extension).
To do this, given that you are on Java 6, you have 2 options:
1. Unzip the contents to a temporary working directory (there are built-in APIs for this, or use a library such as Apache Commons Compress), do your work (copying, deleting, etc.) and then re-zip.
2. Make external command line calls to the Jar utilities that come with Java.
Of those, only (1) makes any real sense.
A third option would be available if you could up your Java to 7+ which would be:
3. Use a Zip File System Provider to treat it as a file system in code
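A minimal sketch of what option 3 might look like, assuming Java 7+ and a jar that is not locked by a running process. The class and method names are hypothetical; only the java.nio.file calls come from the JDK.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: copies a file from disk into an existing jar/zip.
final class JarInsert {

    static void copyIntoJar(Path source, Path jar, String entryPath) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "false"); // the jar must already exist

        // "jar:file:/..." is the URI form understood by the zip file system provider.
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path target = zipFs.getPath(entryPath);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the updated archive back to disk
    }
}
```

Note that the path inside the jar is passed as an entry path such as "assets/music_crafter/sounds/block/music_player/song.ogg", not appended to the jar's location on disk, which is where the code in the question goes wrong.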
All that said, however:
As per the comments on your question, you really might want to consider whether this is something you need to do at all. Why do you need to insert into existing jars? If this is 'external' data, it would be much better in a separate resource location/container, not the application jar.
ZIP entries store the full path name of the entry because (I'm sure of the next part) the ZIP archive is not organized as directories. The metadata contains the info about how files are supposed to be stored (inside directories).
If I create a ZIP file in Windows, when I unzip the data in another OS, e.g. Mac OS X, the file structure remains as it used to be in Windows. Is this because the unzipper is designed to handle this, or is it because the file separators inside the ZIP are standard?
I'm asking this because I'm trying to find an entry inside a ZIP file using the name of the zipped file. But which file separator should I use to make it work in systems other than Windows?
I'm using Java, and the method: .getName() of the ZipEntry gives me the path using the Windows file separator \. Would it be enough if I use the java File.separator separator to make it work on another OS? Or will I have to try to find my file with each possible separator?
Honorary Correct Answer Mention
The answer given by @Eren Yilmaz correctly describes the behavior of many tools (or even one you code yourself). But given that the .zip standard clearly documents how it must be done, the correct answer had to be updated.
The .zip file specification states:
4.4.17.1 The name of the file, with optional relative path.
The path stored MUST not contain a drive or
device letter, or a leading slash. All slashes
MUST be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard
input, there is no file name field.
The file separator depends on the application that creates the zip file. Some applications use the system file separator, whereas some use the "civilized" forward slash "/". So, if you are creating the zip file and then consuming it, you can simply use a forward slash as the file separator. If the zip file is created somewhere else, then you should find out which separator was used. I don't know a simple way, but you can use a brute-force approach and check both separator types as you go.
Some applications, especially custom zip creation codes, can mix the separators on different zip entries, so don't forget to check out each entry.
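One way to avoid guessing the separator is to normalize both the name you are looking for and each entry name to forward slashes before comparing. A small sketch of that idea, with a made-up class name; it assumes a backslash in an entry name is always meant as a separator, which is not strictly true for archives created on UNIX, where a backslash is a legal file name character.

```java
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: looks up an entry regardless of which separator the zip tool wrote.
final class ZipEntryLookup {

    /** Converts backslashes to the forward slashes the zip spec asks for. */
    static String normalize(String name) {
        return name.replace('\\', '/');
    }

    /** Returns the first entry whose normalized name matches, or null if there is none. */
    static ZipEntry find(ZipFile zip, String wantedPath) {
        String wanted = normalize(wantedPath);
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (normalize(entry.getName()).equals(wanted)) {
                return entry;
            }
        }
        return null;
    }
}
```

Because every entry is normalized individually, this also copes with archives that mix separators between entries. File.separator is not involved at all: it describes the local file system, not the contents of the archive.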
I'm writing a Java class which extends the Ant Zip task to do a particular job for me. I want to create a zip file and, once that file is created, freeze the access time in the inode so it can't be modified, or find a way to keep it from changing even if the file is modified. The reason is that I made an md5 hash which depends on the access time. That's giving me a lot of trouble, and making the access time constant would solve my problem.
Does anyone know how I would accomplish that?
Thanks!
I've had to solve a similar problem previously - perhaps this is an option for you. In my case, the problem was:
We made a jar file and then ran a secure hash algorithm on the jar file. Because the jar file is really a zip file, and a zip file internally contains file metadata including timestamps (the last-modified time, and in some archives access and creation times too), if we create a new jar file from the exact same source material, then the hash on the new jar file doesn't match the original hash (because while the zip contents are the same, the metadata stored in the zip file has different timestamps).
Basically, we needed to be able to compute a secure hash for compliance purposes to be able to easily show that the contents of a jar was unchanged. Recompiling an equivalent jar was ok - it's just that the contents had to be identical.
We wrote a simple set of tools that performed secure hashes (and verifications) specifically for zip/jar files. It computed two hashes:
a regular secure hash of the file (which would identify the exact same jar - this would be the same as the output of your standard md5sum)
a "content only" hash which was computed by iterating over the bytes of the unpacked contents of the zip/jar (and thus could be used to identify that a recompiled jar matched the original jar)
To implement the content only hash, we used a ZipInputStream to iterate over the zip entries.
MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
byte[] digest;
for (each zip file entry)
{
    if (entry represents a directory)
    {
        sha1.update( directory name bytes as UTF-8 );
    }
    else
    {
        read the entry bytes using ZipInputStream.read()
        sha1.update( bytes );
    }
}
digest = sha1.digest();
See also: ZipInputStream.read()
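For anyone who wants something compilable, the pseudocode above might be fleshed out along these lines. This is a sketch of the idea, not our original tool: the class and method names are invented, and like the pseudocode it hashes directory names and file bytes only, so renaming a file without changing its bytes would not change the result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: hashes what is inside a zip/jar while ignoring entry timestamps.
final class ZipContentHash {

    static byte[] contentHash(InputStream zipData) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    // Directories have no data, so their name stands in for them.
                    sha1.update(entry.getName().getBytes(StandardCharsets.UTF_8));
                } else {
                    // read() returns -1 at the end of the current entry, not of the whole zip.
                    int n;
                    while ((n = zis.read(buffer)) != -1) {
                        sha1.update(buffer, 0, n);
                    }
                }
            }
        }
        return sha1.digest();
    }
}
```

The result still depends on the order of the entries in the archive, so both jars need to be built by a tool that adds files in the same order.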
Note, however, that some files such as the manifest can contain information such as the version of ant used to create the jar, and the version of the compiler used to compile the classes. Thus, you have to compile from an equivalent environment for the hash to match.
Finally, this doesn't cope with the fact that a zip file might itself contain other zip files. While it would be straightforward enough to make the inspection cater for this and descend into nested zip/jar/war files, our implementation does not.