Encoding errors when compressing files with Apache Commons Compress on Linux - java

I am compressing files using the Apache Commons Compress API. On Windows 7 it works fine, but on Linux (Ubuntu 10.10, UTF-8), characters in file and folder names, such as "º", are replaced by "?".
Is there any parameter I should pass to the API when compressing, or when uncompressing the tar?
I'm using the tar.gz format, following the API examples.
The files I'm trying to compress were created on Windows... could that be a problem?
The code:
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.compress.utils.IOUtils;

public class TarGzTest
{
    public static void createTarGzOfDirectory(String directoryPath, String tarGzPath) throws IOException
    {
        System.out.println("Creating tar.gz of folder " + directoryPath + " in " + tarGzPath);
        // try-with-resources closes the whole stream chain even if an exception is thrown
        try (TarArchiveOutputStream tOut = new TarArchiveOutputStream(
                new GzipCompressorOutputStream(
                    new BufferedOutputStream(new FileOutputStream(tarGzPath)))))
        {
            // GNU long-file mode allows entry names longer than 100 characters
            tOut.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
            addFileToTarGz(tOut, directoryPath, "");
            tOut.finish();
        }
        System.out.println("Process completed.");
    }

    private static void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base) throws IOException
    {
        System.out.println("addFileToTarGz()::" + path);
        File f = new File(path);
        String entryName = base + f.getName();
        TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
        if (f.isFile())
        {
            tOut.putArchiveEntry(tarEntry);
            // close each source file after copying it into the archive
            try (InputStream in = new FileInputStream(f))
            {
                IOUtils.copy(in, tOut);
            }
            tOut.closeArchiveEntry();
        }
        else
        {
            File[] children = f.listFiles();
            if (children != null)
            {
                for (File child : children)
                {
                    addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
                }
            }
        }
    }
}
(I omitted the main method.)
EDIT (monkeyjluffy): The changes I made ensure that the same archive is always produced on different platforms, so the hash calculated on it is the same.

I found a workaround for my problem.
For some reason, Java doesn't respect my environment's encoding and uses cp1252 instead.
After uncompressing the file, I just go into its folder and run this command:
convmv --notest -f cp1252 -t utf8 * -r
It converts all names recursively to UTF-8.
Problem solved, guys.
More info about encoding problems in Linux here.
Thanks everyone for the help.
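For reference, here is a minimal JDK-only sketch of what the convmv step does. A name written as cp1252 bytes is not valid UTF-8 (for example, "º" becomes the lone byte 0xBA), so a UTF-8 system shows that character as "?" or the replacement character. The class and method names are hypothetical, and the fallback rule (try strict UTF-8, otherwise assume cp1252) is a heuristic: some cp1252 byte sequences happen to be valid UTF-8. If you would rather avoid the problem when writing the archive, Commons Compress 1.4 and later also accept an encoding name in the TarArchiveOutputStream constructor (e.g. new TarArchiveOutputStream(gzOut, "UTF-8")); check the Javadoc of the version you use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class NameEncodingDemo {
    // cp1252 is the Windows code page the names were apparently written with
    static final Charset CP1252 = Charset.forName("windows-1252");

    /**
     * Decodes raw file-name bytes as strict UTF-8; if they are not valid UTF-8,
     * assumes they were cp1252 (a heuristic, similar to "convmv -f cp1252 -t utf8").
     */
    static String decodeName(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, CP1252);
        }
    }

    public static void main(String[] args) {
        byte[] cp1252Name = "N\u00BA 1".getBytes(CP1252); // the ordinal sign becomes the single byte 0xBA
        // Lenient UTF-8 decoding turns the stray byte into U+FFFD, which many tools show as '?'
        System.out.println(new String(cp1252Name, StandardCharsets.UTF_8));
        // The fallback recovers the intended name
        System.out.println(decodeName(cp1252Name));
    }
}
```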

Related

Not able to get the list of file in a .tar file in Java

I'm trying to return a list of file names from inside a tar file. I'm using the code below, but when it reaches the while loop, it immediately jumps to the catch block with "java.io.IOException: Error detected parsing the header".
Below is the code I'm using. Can you help me figure out why this doesn't work?
public List<String> getFilesInTar(String filename) {
List<String> foundFiles = Lists.newArrayList();
String filePath = System.getProperty("user.home") + File.separator + "Downloads" + File.separator + filename;
try {
TarArchiveInputStream tarInput = new TarArchiveInputStream(new FileInputStream(filePath));
TarArchiveEntry entry;
while ((entry = tarInput.getNextTarEntry()) != null) {
if (!entry.isDirectory()) {
foundFiles.add(entry.getName());
}
}
tarInput.close();
} catch (IOException ex) {
log.error(ex.getMessage());
}
return foundFiles;
}
Your file is not a tar file. It’s a compressed archive of a tar file.
You cannot open it as a tar file, as is, for the same reason you can’t read a text file while it’s in a zip archive: the bytes representing compressed data are not themselves readable.
The .gz extension of your filename indicates that it was compressed using gzip, which is common when compressing tar files. You can use the GZIPInputStream class to uncompress it:
TarArchiveInputStream tarInput = new TarArchiveInputStream(
new GZIPInputStream(
new BufferedInputStream(
new FileInputStream(filePath))));
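If the same code has to accept both plain .tar and .tar.gz files, one option is to peek at the first two bytes, because gzip data always starts with 0x1f 0x8b. A small JDK-only sketch; the class and method names are hypothetical, and the returned stream can be handed to TarArchiveInputStream just like above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class MaybeGzip {
    /** Returns a stream that is transparently gunzipped if the data starts with the gzip magic bytes. */
    static InputStream openMaybeGzipped(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);            // remember the start so we can rewind after peeking
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // the caller still sees the stream from byte 0
        if (b1 == 0x1f && b2 == 0x8b) {
            return new GZIPInputStream(in);
        }
        return in;
    }
}
```

Usage: new TarArchiveInputStream(MaybeGzip.openMaybeGzipped(new FileInputStream(filePath))).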

Getting strange structure file when zipping a directory using Java

I wanted to zip a directory with files and subdirectories in it. I did this and it works, but I am getting an unusual and curious file structure (at least it seems that way to me).
When I open the created .zip file, I see an "empty" directory, but when I unzip it I see this file structure (not all the names are exactly as shown below):
|mantenimiento
|Carpeta_A
|File1.txt
|File2.txt
|Carpeta_B
|Sub_carpetaB
|SubfileB.txt
|Subfile1B.txt
|Subfile2B.txt
|File12.txt
My problem is that the folder "mantenimiento" is the one I am zipping from (the directory I want to zip), and I don't want it to appear in the archive. When I unzip the new .zip file, I want to get only the files and directories that are inside the "mantenimiento" directory. The other thing is that when I open the .zip file, I want to see those files and directories directly, like the structure shown above.
I don't know what's wrong with my code; I have searched but haven't found anything that explains my problem.
Here's my code:
private void zipFiles(List<File> files, String directory) throws IOException
{
    String zipFileName = getZipFileName();
    byte[] buf = new byte[1024];
    int len;
    // try-with-resources closes the zip stream even if something fails
    try (ZipOutputStream zos = new ZipOutputStream(
            new FileOutputStream(File.separatorChar + zipFileName + EXTENSION)))
    {
        for (File file : files)
        {
            ZipEntry zipEntry = new ZipEntry(file.toString());
            zos.putNextEntry(zipEntry);
            // close each input file once it has been copied
            try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(file)))
            {
                while ((len = in.read(buf)) != -1)
                {
                    zos.write(buf, 0, len);
                }
            }
            zos.closeEntry();
        }
    }
    catch (Exception e)
    {
        System.err.println("Could not zip the files");
        e.printStackTrace();
    }
}
Hope you guys can give me a hint about what I am doing wrong or what I am missing.
Thanks a lot.
By the way, the directory I pass to the method is never used. The other parameter is a list containing all the files and directories in the C:\mantenimiento directory.
I once had a problem with Windows and zip files where the created zip did not contain entries for the folders (e.g. /, /Carpeta_A, etc.), only the file entries. Try adding ZipEntries for the folders, without writing any content to them.
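A minimal sketch of that idea with plain java.util.zip, assuming Java 8+: entry names are made relative to the folder being zipped (so "mantenimiento" itself does not appear), always use '/' as the separator, and each folder gets its own entry ending in '/'. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipRelative {
    /** Zips the contents of baseDir (not baseDir itself) into out, then closes out. */
    static void zipContents(Path baseDir, OutputStream out) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(baseDir)) {
            // skip the root folder itself; sort for a deterministic entry order
            paths = walk.filter(p -> !p.equals(baseDir)).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path p : paths) {
                // entry names are relative to baseDir and use '/' on every platform
                String name = baseDir.relativize(p).toString().replace('\\', '/');
                if (Files.isDirectory(p)) {
                    zos.putNextEntry(new ZipEntry(name + "/")); // folder entry, no content
                } else {
                    zos.putNextEntry(new ZipEntry(name));
                    Files.copy(p, zos);
                }
                zos.closeEntry();
            }
        }
    }
}
```

For the question's case this would be called as ZipRelative.zipContents(Paths.get("C:\\mantenimiento"), new FileOutputStream("out.zip")), where the output name is just an example.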
As an alternative to Java's somewhat bulky Zip API, you could use the zip file system provider (available since Java 7) instead. The following example is for Java 8 (lambdas):
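import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.*;
import java.util.Collections;
import java.util.Map;
import java.util.stream.Stream;

//Path pathToZip = Paths.get("path/to/your/folder");
//Path zipFile = Paths.get("file.zip");
public Path zipPath(Path pathToZip, Path zipFile) throws IOException {
    Map<String, String> env = Collections.singletonMap("create", "true");
    try (FileSystem zipFs = FileSystems.newFileSystem(URI.create("jar:" + zipFile.toUri()), env);
         Stream<Path> paths = Files.walk(pathToZip)) {
        Path root = zipFs.getPath("/");
        // skip the folder itself so that only its contents end up in the archive
        paths.filter(path -> !path.equals(pathToZip))
             .forEach(path -> zip(root, pathToZip, path));
    }
    return zipFile;
}

private static void zip(final Path zipRoot, final Path baseDir, final Path currentPath) {
    // entry path relative to the zipped folder, always with '/' separators
    String relative = baseDir.relativize(currentPath).toString().replace('\\', '/');
    Path entryPath = zipRoot.resolve(relative);
    try {
        if (Files.isDirectory(currentPath)) {
            Files.createDirectories(entryPath);
        } else {
            Files.createDirectories(entryPath.getParent());
            Files.copy(currentPath, entryPath);
        }
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}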

Uncompress files from a tar using Apache Commons - problem is duplicate entries

Scenario: Uncompress a tar file using Apache commons.
Problem: The tar I am using is a build tar that gets deployed to a web server. It contains entries whose names differ only in case, like these:
appender_class.xml
APPENDER_CLASS.xml
When extracting with the code below, only appender_class.xml is extracted, but I want both files. How can I do that? Renaming on the fly is fine, but how can I accomplish it?
public static void untar(File[] files) throws Exception {
    String path = files[0].toString();
    File tarPath = new File(path);
    TarEntry entry;
    TarInputStream inputStream = null;
    FileOutputStream outputStream = null;
    try {
        inputStream = new TarInputStream(new FileInputStream(tarPath));
        while (null != (entry = inputStream.getNextEntry())) {
            int bytesRead;
            System.out.println("tarpath:" + tarPath.getName());
            System.out.println("Entry:" + entry.getName());
            String pathWithoutName = path.substring(0, path.indexOf(tarPath.getName()));
            System.out.println("pathname:" + pathWithoutName);
            if (entry.isDirectory()) {
                File directory = new File(pathWithoutName + entry.getName());
                directory.mkdir();
                continue;
            }
            byte[] buffer = new byte[1024];
            outputStream = new FileOutputStream(pathWithoutName + entry.getName());
            while ((bytesRead = inputStream.read(buffer, 0, 1024)) > -1) {
                outputStream.write(buffer, 0, bytesRead);
            }
            outputStream.close(); // close each extracted file before the next entry
            System.out.println("Extracted " + entry.getName());
        }
    } finally {
        if (outputStream != null) {
            outputStream.close();
        }
        if (inputStream != null) {
            inputStream.close();
        }
    }
}
Try opening your FileOutputStream like this instead:
File outputFile = new File(pathWithoutName + entry.getName());
for(int i = 2; outputFile.exists(); i++) {
outputFile = new File(pathWithoutName + entry.getName() + i);
}
outputStream = new FileOutputStream(outputFile);
It should generate a file called APPENDER_CLASS.xml2 if it encounters a previously created file called APPENDER_CLASS.xml. If APPENDER_CLASS.xml2 also exists, it will create APPENDER_CLASS.xml3, and so on.
File.exists() follows the case rules of the underlying file system: Windows file names are case-insensitive (as are macOS file names on the default file system), whereas Linux and most Unix file systems are case-sensitive. Thus, with the above code, the file is renamed on case-insensitive file systems and is not renamed on case-sensitive ones, where both names can coexist anyway.
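One possible refinement, sketched below with java.nio.file: put the counter before the extension (appender_class_2.xml rather than APPENDER_CLASS.xml2) so the renamed file still ends in .xml. The class, method name and _N naming scheme are illustrative choices, not something the tar library provides.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueNames {
    /**
     * Returns dir/name if nothing exists there yet, otherwise dir/base_2.ext, dir/base_3.ext, ...
     * (hypothetical helper; the _N scheme is just one possible choice).
     */
    static Path uniqueTarget(Path dir, String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        // Only treat the dot as an extension separator if it is in the last path segment
        // and is not that segment's first character (".profile" has no extension here).
        boolean hasExt = dot > slash + 1;
        String base = hasExt ? name.substring(0, dot) : name;
        String ext = hasExt ? name.substring(dot) : "";
        Path candidate = dir.resolve(name);
        // Files.exists follows the file system's case rules, like File.exists()
        for (int i = 2; Files.exists(candidate); i++) {
            candidate = dir.resolve(base + "_" + i + ext);
        }
        return candidate;
    }
}
```

In the untar loop it could be used as outputStream = new FileOutputStream(UniqueNames.uniqueTarget(Paths.get(pathWithoutName), entry.getName()).toFile());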

Using java to extract .rar files

I am looking for a way to extract .rar files using Java, and wherever I search I keep ending up with the same tool, JavaUnRar. I have been looking into extracting .rar files with it, but all the ways I find to do this are very long and awkward, like in this example.
I am currently able to extract .tar, .tar.gz, .zip and .jar files in 20 lines of code or less, so there must be a simpler way to extract .rar files. Does anybody know one?
In case it helps anybody, this is the code I use to extract .zip and .jar files; it works for both:
public void getZipFiles(String zipFile, String destFolder) throws IOException {
    final int BUFFER = 2048; // read buffer size (BUFFER was not defined in the original snippet)
    try (ZipInputStream zis = new ZipInputStream(
            new BufferedInputStream(
                new FileInputStream(zipFile)))) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry.getName());
            int count;
            byte[] data = new byte[BUFFER];
            if (entry.isDirectory()) {
                new File(destFolder + "/" + entry.getName()).mkdirs();
                continue;
            }
            // make sure the parent folders of the file exist
            int di = entry.getName().lastIndexOf('/');
            if (di != -1) {
                new File(destFolder + "/" + entry.getName().substring(0, di)).mkdirs();
            }
            // try-with-resources closes the output file even if writing fails
            try (BufferedOutputStream dest = new BufferedOutputStream(
                    new FileOutputStream(destFolder + "/" + entry.getName()))) {
                while ((count = zis.read(data)) != -1) {
                    dest.write(data, 0, count);
                }
            }
        }
    }
}
You are able to extract .gz, .zip and .jar files because they use compression algorithms built into the Java SDK (java.util.zip).
RAR is a different case: it is a proprietary archive format, and its license does not allow it to be included in software development tools like the Java SDK.
The best way to extract RAR files is to use a third-party library such as junrar.
You can find references to other Java RAR libraries in the SO question "RAR archives with java". The SO question "How to compress text file to rar format using java program" also explains different workarounds (e.g. running an external program via Runtime).
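Before picking a library, it can help to know which RAR version a file uses, since some of the answers below point out RAR5 support specifically. A minimal JDK-only sketch based on the file signatures: RAR 1.5 to 4.x archives start with the bytes 52 61 72 21 1A 07 00 ("Rar!" followed by 1A 07 00), and RAR 5.0 archives start with 52 61 72 21 1A 07 01 00. The class, enum and method names are hypothetical, and self-extracting archives (which have an executable stub in front of the signature) are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class RarVersion {
    enum Kind { RAR4, RAR5, NOT_RAR }

    private static final byte[] RAR4_SIG = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};
    private static final byte[] RAR5_SIG = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00};

    /** Reads up to 8 leading bytes and compares them with the RAR4 and RAR5 signatures. */
    static Kind detect(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) {
                break; // file shorter than 8 bytes
            }
            n += r;
        }
        if (n >= RAR5_SIG.length && Arrays.equals(Arrays.copyOf(head, RAR5_SIG.length), RAR5_SIG)) {
            return Kind.RAR5;
        }
        if (n >= RAR4_SIG.length && Arrays.equals(Arrays.copyOf(head, RAR4_SIG.length), RAR4_SIG)) {
            return Kind.RAR4;
        }
        return Kind.NOT_RAR;
    }
}
```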
You can use the library junrar
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>0.7</version>
</dependency>
Code example:
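File f = new File(filename);
Archive archive = new Archive(f);
archive.getMainHeader().print();
FileHeader fh = archive.nextFileHeader();
while (fh != null) {
    File fileEntry = new File(fh.getFileNameString().trim());
    System.out.println(fileEntry.getAbsolutePath());
    if (fh.isDirectory()) {
        fileEntry.mkdirs();
    } else {
        if (fileEntry.getParentFile() != null) {
            fileEntry.getParentFile().mkdirs();
        }
        // try-with-resources closes the file even if extraction fails
        try (FileOutputStream os = new FileOutputStream(fileEntry)) {
            archive.extractFile(fh, os);
        }
    }
    fh = archive.nextFileHeader();
}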
You can use http://sevenzipjbind.sourceforge.net/index.html
In addition to supporting a large number of archive formats, version 16.02-2.01 fully supports RAR5 extraction, including:
password-protected archives
archives with encrypted headers
archives split into volumes
gradle
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding:16.02-2.01'
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding-all-platforms:16.02-2.01'
or maven
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding</artifactId>
<version>16.02-2.01</version>
</dependency>
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding-all-platforms</artifactId>
<version>16.02-2.01</version>
</dependency>
And a code example:
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.simple.ISimpleInArchiveItem;
import java.io.*;
import java.util.HashMap;
import java.util.Map;
/**
 * Responsible for unpacking archives with the RAR extension.
 * Supports RAR4, RAR4 with password, RAR5, RAR5 with password.
 * Determines the type of archive itself.
 */
public class RarExtractor {
    /**
     * Extracts files from an archive. The archive can be encrypted with a password.
     *
     * @param filePath path to .rar file
     * @param password string password for archive
     * @return map of extracted file to file name
     * @throws IOException
     */
    public Map<InputStream, String> extract(String filePath, String password) throws IOException {
        Map<InputStream, String> extractedMap = new HashMap<>();
        // close the archive and the underlying file when done
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r");
             IInArchive inArchive = SevenZip.openInArchive(null, new RandomAccessFileInStream(randomAccessFile))) {
            for (ISimpleInArchiveItem item : inArchive.getSimpleInterface().getArchiveItems()) {
                if (!item.isFolder()) {
                    // the callback may be invoked several times per item, so collect all chunks
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    ExtractOperationResult result = item.extractSlow(data -> {
                        content.write(data, 0, data.length);
                        return data.length;
                    }, password);
                    if (result != ExtractOperationResult.OK) {
                        throw new RuntimeException(
                                String.format("Error extracting archive. Extracting error: %s", result));
                    }
                    extractedMap.put(new ByteArrayInputStream(content.toByteArray()), item.getPath());
                }
            }
        }
        return extractedMap;
    }
}
P.S.
@BorisBrodski (https://github.com/borisbrodski): Happy 40th birthday to you! Hope you had a great celebration. Thanks for your work!
You can simply add this Maven dependency to your project:
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>0.7</version>
</dependency>
and then use this code to extract the RAR file:
File rar = new File("path_to_rar_file.rar");
File tmpDir = File.createTempFile("bip.",".unrar");
if(!(tmpDir.delete())){
throw new IOException("Could not delete temp file: " + tmpDir.getAbsolutePath());
}
if(!(tmpDir.mkdir())){
throw new IOException("Could not create temp directory: " + tmpDir.getAbsolutePath());
}
System.out.println("tmpDir="+tmpDir.getAbsolutePath());
ExtractArchive extractArchive = new ExtractArchive();
extractArchive.extractArchive(rar, tmpDir);
System.out.println("finished.");
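On Java 7 and later, the createTempFile, delete() and mkdir() sequence above can be replaced by a single Files.createTempDirectory call, which avoids the short window between deleting the temp file and creating the directory. A small sketch; the class and method names are hypothetical, and the result can be passed to extractArchive as before.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempDirs {
    /** Creates a fresh, uniquely named directory under the default temp folder. */
    static File newUnrarDir() throws IOException {
        // One call instead of createTempFile("bip.", ".unrar") + delete() + mkdir()
        return Files.createTempDirectory("bip.").toFile();
    }
}
```

Usage: extractArchive.extractArchive(rar, TempDirs.newUnrarDir());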
This method extracts the files of a RAR (RAR5) archive into streams when all you have is an InputStream. In my case I was processing a MimeBodyPart from an email.
The example from @Alexey Bril didn't work for me.
The dependencies are the same:
Gradle
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding:16.02-2.01'
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding-all-platforms:16.02-2.01'
Maven
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding</artifactId>
<version>16.02-2.01</version>
</dependency>
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding-all-platforms</artifactId>
<version>16.02-2.01</version>
</dependency>
Code
private List<InputStream> getInputStreamsFromRar5InputStream(InputStream is) throws IOException {
    List<InputStream> inputStreams = new ArrayList<>();
    // sevenzipjbinding needs random access, so copy the stream to a temporary file first
    File tempFile = File.createTempFile("tempRarArchive-", ".rar", null);
    try (FileOutputStream fos = new FileOutputStream(tempFile)) {
        fos.write(is.readAllBytes());
        fos.flush();
        try (RandomAccessFile raf = new RandomAccessFile(tempFile, "r")) { // open for reading
            try (IInArchive inArchive = SevenZip.openInArchive(null, // autodetect archive type
                    new RandomAccessFileInStream(raf))) {
                // Getting simple interface of the archive inArchive
                ISimpleInArchive simpleInArchive = inArchive.getSimpleInterface();
                for (ISimpleInArchiveItem item : simpleInArchive.getArchiveItems()) {
                    if (!item.isFolder()) {
                        // write() can be called several times per item, so collect every chunk
                        ByteArrayOutputStream content = new ByteArrayOutputStream();
                        ExtractOperationResult result = item.extractSlow(new ISequentialOutStream() {
                            /**
                             * @param bytes chunk of extracted data
                             * @return number of bytes consumed
                             */
                            @Override
                            public int write(byte[] bytes) {
                                content.write(bytes, 0, bytes.length);
                                return bytes.length;
                            }
                        });
                        if (result == ExtractOperationResult.OK) {
                            inputStreams.add(new ByteArrayInputStream(content.toByteArray()));
                        } else {
                            log.error("Error extracting item: " + result);
                        }
                    }
                }
            }
        }
    } finally {
        tempFile.delete();
    }
    return inputStreams;
}

java.util.zip.ZipException: invalid compression method

I am having some issues in dealing with zip files on Mac OS X 10.7.3.
I am receiving a zip file from a third party, which I have to process. My code is using ZipInputStream to do this. This code has been used several times before, without any issue but it fails for this particular zip file. The error which I get is as follows:
java.util.zip.ZipException: invalid compression method
at java.util.zip.ZipInputStream.read(ZipInputStream.java:185)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
I googled it and can see that there are some known issues with ZipInputStream, e.g. this one.
I have also found some related questions on Stack Overflow, e.g. this one, but none of them has a proper, accepted or acceptable answer.
I have a couple of questions:
1. Has anyone found a concrete solution for this, such as a later update or an altogether different JAR with the same functionality but without these issues?
2. On this link, the user phobuz1 mentions that this problem occurs "if non-standard compression method (method 6)" is used. Is there a way to find out which compression method is used, so that I can be sure of the reason for the failure?
Note that, as other users have reported, if I unzip the file on my local machine and re-zip it, it works perfectly fine.
EDIT 1:
The file which I am getting is in .zip format, I don't know which OS/utility program they are using to compress it. On my local machine I am using the built-in zip utility which comes with Mac OS X.
API Javadoc:
ZipEntry is where the compression method of each entry is recorded (updated Feb 2021: the Oracle-provided link expired, but Duke University (no affiliation) still hosts the old 1.4 Javadoc): https://www2.cs.duke.edu/csed/java/jdk1.4.2/docs/api/java/util/zip/ZipEntry.html
Per that class, you can get and set the compression method with getMethod() and setMethod(int), respectively.
Good luck!
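To answer the second question concretely: ZipInputStream fills in each entry's compression method from its local header in getNextEntry(), and the "invalid compression method" error is only thrown once you read that entry's data. A minimal sketch (class and method names hypothetical) that records the method of each entry and stops at the first one java.util.zip cannot decompress, because skipping past it would also require reading its data. The numbers follow PKWARE's APPNOTE: 0 = stored, 8 = deflated, 6 = imploded, 9 = Deflate64, 12 = BZIP2, 14 = LZMA; java.util.zip only supports stored and deflated.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipMethods {
    /** Readable name for some common method numbers from PKWARE's APPNOTE. */
    static String describe(int method) {
        switch (method) {
            case 0: return "stored";
            case 6: return "imploded";
            case 8: return "deflated";
            case 9: return "Deflate64";
            case 12: return "BZIP2";
            case 14: return "LZMA";
            default: return "method " + method;
        }
    }

    /**
     * Maps entry names to their compression method, stopping after the first entry that
     * java.util.zip cannot decompress, because moving past it would mean reading its data.
     */
    static Map<String, Integer> methods(InputStream in) throws IOException {
        Map<String, Integer> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                int method = e.getMethod(); // from the local header, no entry data read yet
                result.put(e.getName(), method);
                if (method != ZipEntry.STORED && method != ZipEntry.DEFLATED) {
                    break;
                }
            }
        }
        return result;
    }
}
```

Usage (the file name is just an example): ZipMethods.methods(new FileInputStream("thirdparty.zip")).forEach((name, m) -> System.out.println(name + ": " + ZipMethods.describe(m)));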
I'm using the following code in Java on my Windows XP OS to zip folders. It may at least be useful to you as a side note.
//add folder to the zip file
private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception
{
    File folder = new File(srcFolder);
    String[] fileNames = folder.list(); // null if srcFolder cannot be listed
    //check the empty folder
    if (fileNames == null || fileNames.length == 0)
    {
        System.out.println(folder.getName());
        addFileToZip(path, srcFolder, zip, true);
    }
    else
    {
        //list the files in the folder
        for (String fileName : fileNames)
        {
            if (path.equals(""))
            {
                addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip, false);
            }
            else
            {
                addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip, false);
            }
        }
    }
}
//recursively add files to the zip files
private void addFileToZip(String path, String srcFile, ZipOutputStream zip, boolean flag) throws Exception
{
    //create the file object for inputs
    File folder = new File(srcFile);
    //if the folder is empty add empty folder to the Zip file
    if (flag)
    {
        zip.putNextEntry(new ZipEntry(path + "/" + folder.getName() + "/"));
        zip.closeEntry();
    }
    else
    {
        //if the current name is directory, recursively traverse it to get the files
        if (folder.isDirectory())
        {
            addFolderToZip(path, srcFile, zip); //if folder is not empty
        }
        else
        {
            //write the file to the output
            byte[] buf = new byte[1024];
            int len;
            zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
            // try-with-resources closes the source file after copying
            try (FileInputStream in = new FileInputStream(srcFile))
            {
                while ((len = in.read(buf)) > 0)
                {
                    zip.write(buf, 0, len); //Write the Result
                }
            }
            zip.closeEntry();
        }
    }
}
//zip the folders
private void zipFolder(String srcFolder, String destZipFile) throws Exception
{
    //create the output stream to zip file result; closed automatically
    try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(destZipFile)))
    {
        //add the folder to the zip
        addFolderToZip("", srcFolder, zip);
    }
}
private boolean zipFiles(String srcFolder, String destZipFile) throws Exception
{
    System.out.println("Program Start zipping the given files");
    //send to the zip procedure
    zipFolder(srcFolder, destZipFile);
    System.out.println("Given files are successfully zipped");
    return true;
}
To use this code, call the zipFiles(String srcFolder, String destZipFile) method above with two parameters: srcFolder is the folder to be zipped, and destZipFile is the destination zip file.
The following code is to unzip a zipped folder.
private void unzipFolder(String file) throws FileNotFoundException, IOException
{
    File zipFile = new File(file); // the zip to extract (the original ignored this parameter)
    File extractDir = new File("YourDestinationFolder");
    extractDir.mkdirs();
    // try-with-resources closes the zip stream even if extraction fails
    try (ZipInputStream inputStream = new ZipInputStream(new FileInputStream(zipFile)))
    {
        ZipEntry entry;
        while ((entry = inputStream.getNextEntry()) != null)
        {
            StringBuilder sb = new StringBuilder();
            sb.append("Extracting ");
            sb.append(entry.isDirectory() ? "directory " : "file ");
            sb.append(entry.getName());
            sb.append(" ...");
            System.out.println(sb.toString());
            File unzippedFile = new File(extractDir, entry.getName());
            if (!entry.isDirectory())
            {
                if (unzippedFile.getParentFile() != null)
                {
                    unzippedFile.getParentFile().mkdirs();
                }
                try (FileOutputStream outputStream = new FileOutputStream(unzippedFile))
                {
                    byte[] buffer = new byte[1024];
                    int len;
                    while ((len = inputStream.read(buffer)) != -1)
                    {
                        outputStream.write(buffer, 0, len);
                    }
                }
            }
            else
            {
                unzippedFile.mkdirs();
            }
        }
    }
}
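One caveat with this kind of extraction code: new File(extractDir, entry.getName()) trusts the entry name, so an entry such as ../../evil.txt would be written outside the destination folder (the so-called "zip slip" problem). A minimal sketch of a check using java.nio.file (Java 7+); the class and method names are hypothetical. It could replace that line, e.g. File unzippedFile = SafeEntries.resolveInside(extractDir.toPath(), entry.getName()).toFile();

```java
import java.io.IOException;
import java.nio.file.Path;

public class SafeEntries {
    /** Resolves an archive entry name against destDir, refusing names that escape it. */
    static Path resolveInside(Path destDir, String entryName) throws IOException {
        Path base = destDir.toAbsolutePath().normalize();
        Path target = base.resolve(entryName).normalize();
        // after normalization a legitimate entry must still live under the destination folder
        if (!target.startsWith(base)) {
            throw new IOException("Entry is outside the target dir: " + entryName);
        }
        return target;
    }
}
```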
