I am looking for ways to extract .rar files using Java, and wherever I search I keep ending up with the same tool: JavaUnRar. I have been looking into extracting .rar files with it, but every approach I find is very long and awkward, like in this example.
I am currently able to extract .tar, .tar.gz, .zip and .jar files in 20 lines of code or less, so there must be a simpler way to extract .rar files. Does anybody know of one?
In case it helps anybody, this is the code I am using to extract both .zip and .jar files; it works for both.
private static final int BUFFER = 8192; // read buffer size in bytes (not defined in the original snippet)

public void getZipFiles(String zipFile, String destFolder) throws IOException {
    // try-with-resources closes the archive stream even if extraction fails
    try (ZipInputStream zis = new ZipInputStream(
            new BufferedInputStream(new FileInputStream(zipFile)))) {
        ZipEntry entry;
        byte[] data = new byte[BUFFER];
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry.getName());
            File target = new File(destFolder, entry.getName());
            if (entry.isDirectory()) {
                target.mkdirs();
                continue;
            }
            // create the parent directories in case the archive does not list them
            File parent = target.getParentFile();
            if (parent != null) {
                parent.mkdirs();
            }
            try (BufferedOutputStream dest = new BufferedOutputStream(new FileOutputStream(target))) {
                int count;
                while ((count = zis.read(data)) != -1) {
                    dest.write(data, 0, count);
                }
            }
        }
    }
}
You can extract .gz, .zip and .jar files because they use compression algorithms that are built into the Java SDK.
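To illustrate that built-in support, here is a minimal sketch that round-trips some bytes through gzip using only `java.util.zip`. The class name `GzipRoundTrip` and its helper methods are made up for this example; only the JDK stream classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: round-trips bytes through the JDK's gzip streams.
class GzipRoundTrip {

    // Compresses the given bytes into gzip format, entirely in memory.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data back into the original bytes.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int count;
            while ((count = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, count);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, archive".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Nothing comparable ships with the JDK for RAR, which is why a separate library is needed there.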
The case with the RAR format is a bit different. RAR is a proprietary archive file format, and the RAR license does not allow it to be included in software development tools such as the Java SDK.
The best way to unrar your files is to use a third-party library such as junrar.
You can find some references to other Java RAR libraries in the SO question "RAR archives with java". The SO question "How to compress text file to rar format using java program" also explains more about different workarounds (e.g. using Runtime).
You can use the junrar library:
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>0.7</version>
</dependency>
Code example:
File f = new File(filename);
Archive archive = new Archive(f);
try {
    archive.getMainHeader().print();
    FileHeader fh = archive.nextFileHeader();
    while (fh != null) {
        File fileEntry = new File(fh.getFileNameString().trim());
        System.out.println(fileEntry.getAbsolutePath());
        // close each output stream even if extraction fails
        try (FileOutputStream os = new FileOutputStream(fileEntry)) {
            archive.extractFile(fh, os);
        }
        fh = archive.nextFileHeader();
    }
} finally {
    archive.close();
}
You can use http://sevenzipjbind.sourceforge.net/index.html
In addition to supporting a large number of archive formats, version 16.02-2.01 has full support for RAR5 extraction with:
password protected archives
archives with encrypted headers
archives split into volumes
gradle
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding:16.02-2.01'
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding-all-platforms:16.02-2.01'
or maven
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding</artifactId>
<version>16.02-2.01</version>
</dependency>
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding-all-platforms</artifactId>
<version>16.02-2.01</version>
</dependency>
And code example
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.simple.ISimpleInArchiveItem;
import java.io.*;
import java.util.HashMap;
import java.util.Map;
/**
* Responsible for unpacking archives with the RAR extension.
* Support Rar4, Rar4 with password, Rar5, Rar5 with password.
* Determines the type of archive itself.
*/
public class RarExtractor {
    /**
     * Extracts files from an archive. The archive can be encrypted with a password.
     *
     * @param filePath path to the .rar file
     * @param password password for the archive
     * @return map of extracted file contents to file names
     * @throws IOException
     */
    public Map<InputStream, String> extract(String filePath, String password) throws IOException {
        Map<InputStream, String> extractedMap = new HashMap<>();
        // try-with-resources closes the archive first, then the underlying file
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r");
             IInArchive inArchive = SevenZip.openInArchive(null, new RandomAccessFileInStream(randomAccessFile))) {
            for (ISimpleInArchiveItem item : inArchive.getSimpleInterface().getArchiveItems()) {
                if (!item.isFolder()) {
                    // extractSlow may deliver one item in several chunks, so collect them before wrapping
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    ExtractOperationResult result = item.extractSlow(data -> {
                        content.write(data, 0, data.length);
                        return data.length;
                    }, password);
                    if (result != ExtractOperationResult.OK) {
                        throw new RuntimeException(
                                String.format("Error extracting archive. Extracting error: %s", result));
                    }
                    extractedMap.put(new ByteArrayInputStream(content.toByteArray()), item.getPath());
                }
            }
        }
        return extractedMap;
    }
}
P.S.
@BorisBrodski https://github.com/borisbrodski Happy 40th birthday to you! Hope you had a great celebration. Thanks for your work!
You could simply add this Maven dependency to your project:
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>0.7</version>
</dependency>
and then use this code to extract the RAR file:
File rar = new File("path_to_rar_file.rar");
File tmpDir = File.createTempFile("bip.",".unrar");
if(!(tmpDir.delete())){
throw new IOException("Could not delete temp file: " + tmpDir.getAbsolutePath());
}
if(!(tmpDir.mkdir())){
throw new IOException("Could not create temp directory: " + tmpDir.getAbsolutePath());
}
System.out.println("tmpDir="+tmpDir.getAbsolutePath());
ExtractArchive extractArchive = new ExtractArchive();
extractArchive.extractArchive(rar, tmpDir);
System.out.println("finished.");
This method extracts the files of a RAR (RAR5) archive into streams when all you have is an input stream. In my case I was processing a MimeBodyPart from an email.
The example from @Alexey Bril didn't work for me.
The dependencies are the same.
Gradle
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding:16.02-2.01'
implementation 'net.sf.sevenzipjbinding:sevenzipjbinding-all-platforms:16.02-2.01'
Maven
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding</artifactId>
<version>16.02-2.01</version>
</dependency>
<dependency>
<groupId>net.sf.sevenzipjbinding</groupId>
<artifactId>sevenzipjbinding-all-platforms</artifactId>
<version>16.02-2.01</version>
</dependency>
Code
private List<InputStream> getInputStreamsFromRar5InputStream(InputStream is) throws IOException {
    List<InputStream> inputStreams = new ArrayList<>();
    // the archive is opened through a RandomAccessFile, so spool the stream to a temporary file first
    File tempFile = File.createTempFile("tempRarArchive-", ".rar", null);
    try {
        try (FileOutputStream fos = new FileOutputStream(tempFile)) {
            fos.write(is.readAllBytes());
        }
        try (RandomAccessFile raf = new RandomAccessFile(tempFile, "r"); // open for reading
             IInArchive inArchive = SevenZip.openInArchive(null, // autodetect archive type
                     new RandomAccessFileInStream(raf))) {
            // Getting the simple interface of the archive
            ISimpleInArchive simpleInArchive = inArchive.getSimpleInterface();
            for (ISimpleInArchiveItem item : simpleInArchive.getArchiveItems()) {
                if (!item.isFolder()) {
                    // one item may arrive in several chunks, so collect them all
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    ExtractOperationResult result = item.extractSlow(new ISequentialOutStream() {
                        /**
                         * @param bytes chunk of extracted data
                         * @return number of bytes consumed
                         */
                        @Override
                        public int write(byte[] bytes) {
                            content.write(bytes, 0, bytes.length);
                            return bytes.length;
                        }
                    });
                    if (result == ExtractOperationResult.OK) {
                        inputStreams.add(new ByteArrayInputStream(content.toByteArray()));
                    } else {
                        // log is the enclosing class's logger (not shown in the original)
                        log.error("Error extracting item: " + result);
                    }
                }
            }
        }
    } finally {
        tempFile.delete();
    }
    return inputStreams;
}
I am currently extracting the contents of a war file and then adding some new files to the directory structure and then creating a new war file.
This is all done programmatically from Java, but I am wondering whether it would be more efficient to copy the war file and then just append the files. Then I wouldn't have to wait so long while the war expands and then has to be compressed again.
I can't seem to find a way to do this in the documentation or in any online examples, though.
Can anyone give some tips or pointers?
UPDATE:
TrueZip, as mentioned in one of the answers, seems to be a very good Java library for appending to a zip file (despite other answers that say it is not possible to do this).
Does anyone have experience with or feedback on TrueZip, or can anyone recommend other similar libraries?
Java 7 introduced the Zip File System provider, which allows adding and changing files in a zip (jar, war) without manual repackaging.
We can write directly to files inside zip files, as in the following example.
Map<String, String> env = new HashMap<>();
env.put("create", "true");
Path path = Paths.get("test.zip");
URI uri = URI.create("jar:" + path.toUri());
try (FileSystem fs = FileSystems.newFileSystem(uri, env))
{
    Path nf = fs.getPath("new.txt");
    try (Writer writer = Files.newBufferedWriter(nf, StandardCharsets.UTF_8, StandardOpenOption.CREATE)) {
        writer.write("hello");
    }
}
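For the original question (adding files to an archive that already exists), the same provider can be pointed at an existing file. Below is a minimal sketch under a few assumptions: the class `ZipFsAppend` and its helpers are invented for illustration, and a temporary file stands in for the real war. As far as I know the provider still rewrites the archive when the file system is closed, but it does so for you.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: adds an entry to an existing archive via the zip file system.
class ZipFsAppend {

    // Writes a one-entry archive that plays the role of the existing war.
    static void createSample(Path zip) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("old.txt"));
            out.write("old".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Adds (or replaces) one entry in an archive that already exists.
    static void addEntry(Path zip, String entryName, byte[] content) throws IOException {
        URI uri = URI.create("jar:" + zip.toUri());
        // no "create" option: the archive is expected to be there already
        try (FileSystem fs = FileSystems.newFileSystem(uri, new HashMap<String, String>())) {
            Path target = fs.getPath(entryName);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.write(target, content);
        }
    }

    // Lists the entry names of the archive in sorted order.
    static List<String> entryNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("zipfs-append-", ".zip");
        createSample(zip);
        addEntry(zip, "new.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(entryNames(zip));
        Files.delete(zip);
    }
}
```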
As others mentioned, it's not possible to append content to an existing zip (or war). However, it's possible to create a new zip on the fly without temporarily writing the extracted content to disk. It's hard to guess how much faster this will be, but it's the fastest you can get (at least as far as I know) with standard Java. As mentioned by Carlos Tasada, SevenZipJBindings might squeeze out some extra seconds for you, but porting this approach to SevenZipJBindings will still be faster than using temporary files with the same library.
Here's some code that writes the contents of an existing zip (war.zip) and appends an extra file (answer.txt) to a new zip (append.zip). All it takes is Java 5 or later, no extra libraries needed.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
public class Main {
// 4MB buffer
private static final byte[] BUFFER = new byte[4096 * 1024];
/**
* copy input to output stream - available in several StreamUtils or Streams classes
*/
public static void copy(InputStream input, OutputStream output) throws IOException {
int bytesRead;
while ((bytesRead = input.read(BUFFER))!= -1) {
output.write(BUFFER, 0, bytesRead);
}
}
public static void main(String[] args) throws Exception {
// read war.zip and write to append.zip
ZipFile war = new ZipFile("war.zip");
ZipOutputStream append = new ZipOutputStream(new FileOutputStream("append.zip"));
// first, copy contents from existing war
Enumeration<? extends ZipEntry> entries = war.entries();
while (entries.hasMoreElements()) {
ZipEntry e = entries.nextElement();
System.out.println("copy: " + e.getName());
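// use a fresh entry: reusing e can fail with "invalid entry compressed size" if the recompressed size differs
append.putNextEntry(new ZipEntry(e.getName()));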
if (!e.isDirectory()) {
copy(war.getInputStream(e), append);
}
append.closeEntry();
}
// now append some extra content
ZipEntry e = new ZipEntry("answer.txt");
System.out.println("append: " + e.getName());
append.putNextEntry(e);
append.write("42\n".getBytes());
append.closeEntry();
// close
war.close();
append.close();
}
}
I had a similar requirement some time back, but it was for reading and writing zip archives (the .war format should be similar). I tried doing it with the existing Java Zip streams but found the writing part cumbersome, especially when directories were involved.
I recommend trying the TrueZIP library (open source, Apache-style license), which exposes any archive as a virtual file system that you can read from and write to like a normal file system. It worked like a charm for me and greatly simplified my development.
You could use this bit of code I wrote
public static void addFilesToZip(File source, File[] files)
{
try
{
File tmpZip = File.createTempFile(source.getName(), null);
tmpZip.delete();
if(!source.renameTo(tmpZip))
{
throw new Exception("Could not make temp file (" + source.getName() + ")");
}
byte[] buffer = new byte[1024];
ZipInputStream zin = new ZipInputStream(new FileInputStream(tmpZip));
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(source));
for(int i = 0; i < files.length; i++)
{
InputStream in = new FileInputStream(files[i]);
out.putNextEntry(new ZipEntry(files[i].getName()));
for(int read = in.read(buffer); read > -1; read = in.read(buffer))
{
out.write(buffer, 0, read);
}
out.closeEntry();
in.close();
}
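for(ZipEntry ze = zin.getNextEntry(); ze != null; ze = zin.getNextEntry())
{
// a fresh entry lets ZipOutputStream compute the sizes itself instead of trusting the old header
out.putNextEntry(new ZipEntry(ze.getName()));
for(int read = zin.read(buffer); read > -1; read = zin.read(buffer))
{
out.write(buffer, 0, read);
}
out.closeEntry();
}
zin.close();
out.close();
tmpZip.delete();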
}
catch(Exception e)
{
e.printStackTrace();
}
}
I don't know of a Java library that does what you describe. But what you described is practical. You can do it in .NET, using DotNetZip.
Michael Krauklis is correct that you cannot simply "append" data to a war file or zip file, but it is not because there is an "end of file" indication, strictly speaking, in a war file. It is because the war (zip) format includes a directory, which is normally present at the end of the file, that contains metadata for the various entries in the war file. Naively appending to a war file results in no update to the directory, and so you just have a war file with junk appended to it.
What's necessary is an intelligent class that understands the format, and can read+update a war file or zip file, including the directory as appropriate. DotNetZip does this, without uncompressing/recompressing the unchanged entries, just as you described or desired.
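To make the point about the directory concrete, here is a small sketch that builds a zip in memory and reads the "end of central directory" record at its tail. The class `ZipDirectoryPeek` and its methods are made up for this illustration; the field offsets follow the ZIP format's published layout, and the sketch assumes a small, non-Zip64 archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: inspects the directory bookkeeping at the end of a zip.
class ZipDirectoryPeek {

    // Builds a small zip in memory with one entry per given name.
    static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    // Scans backwards for the record signature "PK\5\6"; returns -1 if absent.
    static int findEndRecord(byte[] zip) {
        for (int i = zip.length - 22; i >= 0; i--) { // the record is at least 22 bytes long
            if (zip[i] == 0x50 && zip[i + 1] == 0x4b && zip[i + 2] == 0x05 && zip[i + 3] == 0x06) {
                return i;
            }
        }
        return -1;
    }

    // Total number of entries the directory knows about (2 bytes at offset 10).
    static int entryCount(byte[] zip) {
        ByteBuffer view = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        return view.getShort(findEndRecord(zip) + 10) & 0xffff;
    }

    // File offset at which the central directory starts (4 bytes at offset 16).
    static long directoryOffset(byte[] zip) {
        ByteBuffer view = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        return view.getInt(findEndRecord(zip) + 16) & 0xffffffffL;
    }

    // Simulates a naive append: extra bytes after the end of the archive.
    static byte[] withJunkAppended(byte[] zip) {
        byte[] result = Arrays.copyOf(zip, zip.length + 100);
        Arrays.fill(result, zip.length, result.length, (byte) 'x');
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip("a.txt", "b.txt");
        System.out.println("entries in directory: " + entryCount(zip));
        System.out.println("directory starts at byte " + directoryOffset(zip) + " of " + zip.length);
        System.out.println("entries after naive append: " + entryCount(withJunkAppended(zip)));
    }
}
```

The directory of the "appended" archive still lists only the original entries, which is why a tool has to rewrite the directory (and move it behind the new data) to add anything.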
As Cheeso says, there's no way of doing it. AFAIK the zip front-ends are doing exactly the same as you internally.
Anyway if you're worried about the speed of extracting/compressing everything, you may want to try the SevenZipJBindings library.
I covered this library in my blog some months ago (sorry for the self-promotion). Just as an example, extracting a 104MB zip file using java.util.zip took me 12 seconds, while using this library took 4 seconds.
In both links you can find examples about how to use it.
Hope it helps.
See this bug report.
Using append mode on any kind of structured data like zip files or tar files is not something you can really expect to work. These file formats have an intrinsic "end of file" indication built into the data format.
If you really want to skip the intermediate step of un-waring/re-waring, you could read the war file, get all the zip entries, then write them to a new war file, "appending" the new entries you want to add. Not perfect, but at least a more automated solution.
Yet another solution: you may find the code below useful in other situations as well. I have used Ant this way to compile Java directories, generate jar files, update zip files, and so on.
public static void antUpdateZip(String zipFilePath, String libsToAddDir) {
Project p = new Project();
p.init();
Target target = new Target();
target.setName("zip");
Zip task = new Zip();
task.init();
task.setDestFile(new File(zipFilePath));
ZipFileSet zipFileSet = new ZipFileSet();
zipFileSet.setPrefix("WEB-INF/lib");
zipFileSet.setDir(new File(libsToAddDir));
task.addFileset(zipFileSet);
task.setUpdate(true);
task.setProject(p);
task.init();
target.addTask(task);
target.setProject(p);
p.addTarget(target);
DefaultLogger consoleLogger = new DefaultLogger();
consoleLogger.setErrorPrintStream(System.err);
consoleLogger.setOutputPrintStream(System.out);
consoleLogger.setMessageOutputLevel(Project.MSG_DEBUG);
p.addBuildListener(consoleLogger);
try {
// p.fireBuildStarted();
// ProjectHelper helper = ProjectHelper.getProjectHelper();
// p.addReference("ant.projectHelper", helper);
// helper.parse(p, buildFile);
p.executeTarget(target.getName());
// p.fireBuildFinished(null);
} catch (BuildException e) {
p.fireBuildFinished(e);
throw new AssertionError(e);
}
}
This is simple code that builds a zip in a servlet and sends it back in the response:
myZipPath = bla bla...
byte[] buf = new byte[8192];
String zipName = "myZip.zip";
String zipPath = myZipPath + File.separator + "pdf" + File.separator + zipName;
File pdfFile = new File("myPdf.pdf");
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipPath));
ZipEntry zipEntry = new ZipEntry(pdfFile.getName());
out.putNextEntry(zipEntry);
InputStream in = new FileInputStream(pdfFile);
int len;
while ((len = in.read(buf)) > 0) {
out.write(buf, 0, len);
}
out.closeEntry();
in.close();
out.close();
FileInputStream fis = new FileInputStream(zipPath);
response.setContentType("application/zip");
response.addHeader("content-disposition", "attachment;filename=" + zipName);
OutputStream os = response.getOutputStream();
// stream the finished zip to the client, reusing the buffer declared above
int length = fis.read(buf);
while (length != -1)
{
os.write(buf, 0, length);
length = fis.read(buf);
}
fis.close();
Here are examples of how easily files can be appended to an existing zip using TrueVFS:
// append a file to archive under different name
TFile.cp(new File("existingFile.txt"), new TFile("archive.zip", "entry.txt"));
// recursively append a dir to the root of archive
TFile src = new TFile("dirPath", "dirName");
src.cp_r(new TFile("archive.zip", src.getName()));
TrueVFS, the successor of TrueZIP, uses Java 7 NIO 2 features under the hood when appropriate, but offers many more features, such as thread-safe async parallel compression.
Beware also that the Java 7 ZipFileSystem is by default vulnerable to OutOfMemoryError on huge inputs.
Here is a Java 1.7 version of Liam's answer, which uses try-with-resources and Apache Commons IO.
The output is written to a new zip file, but it can easily be modified to write to the original file.
/**
* Modifies, adds or deletes file(s) from an existing zip file.
*
* @param zipFile the original zip file
* @param newZipFile the destination zip file
* @param filesToAddOrOverwrite the names of the files to add or modify from the original file
* @param filesToAddOrOverwriteInputStreams the input streams containing the content of the files
* to add or modify from the original file
* @param filesToDelete the names of the files to delete from the original file
* @throws IOException if the new file could not be written
*/
public static void modifyZipFile(File zipFile,
File newZipFile,
String[] filesToAddOrOverwrite,
InputStream[] filesToAddOrOverwriteInputStreams,
String[] filesToDelete) throws IOException {
try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(newZipFile))) {
// add existing ZIP entry to output stream
try (ZipInputStream zin = new ZipInputStream(new FileInputStream(zipFile))) {
ZipEntry entry = null;
while ((entry = zin.getNextEntry()) != null) {
String name = entry.getName();
// check if the file should be deleted
if (filesToDelete != null) {
boolean ignoreFile = false;
for (String fileToDelete : filesToDelete) {
if (name.equalsIgnoreCase(fileToDelete)) {
ignoreFile = true;
break;
}
}
if (ignoreFile) {
continue;
}
}
// check if the file should be kept as it is
boolean keepFileUnchanged = true;
if (filesToAddOrOverwrite != null) {
for (String fileToAddOrOverwrite : filesToAddOrOverwrite) {
if (name.equalsIgnoreCase(fileToAddOrOverwrite)) {
keepFileUnchanged = false;
}
}
}
if (keepFileUnchanged) {
// copy the file as it is
out.putNextEntry(new ZipEntry(name));
IOUtils.copy(zin, out);
}
}
}
// add the modified or added files to the zip file
if (filesToAddOrOverwrite != null) {
for (int i = 0; i < filesToAddOrOverwrite.length; i++) {
String fileToAddOrOverwrite = filesToAddOrOverwrite[i];
try (InputStream in = filesToAddOrOverwriteInputStreams[i]) {
out.putNextEntry(new ZipEntry(fileToAddOrOverwrite));
IOUtils.copy(in, out);
out.closeEntry();
}
}
}
}
}
This works if you don't want to use extra libraries.
1) First, the class that appends files to the zip:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
public class AddZip {
public void AddZip() {
}
public void addToZipFile(ZipOutputStream zos, String nombreFileAnadir, String nombreDentroZip) {
FileInputStream fis = null;
try {
if (!new File(nombreFileAnadir).exists()) {// does not exist
System.out.println(" The file does not exist: " + nombreFileAnadir);return;
}
File file = new File(nombreFileAnadir);
System.out.println(" Adding file '" + nombreFileAnadir + "' to the ZIP ");
fis = new FileInputStream(file);
ZipEntry zipEntry = new ZipEntry(nombreDentroZip);
zos.putNextEntry(zipEntry);
byte[] bytes = new byte[1024];
int length;
while ((length = fis.read(bytes)) >= 0) {zos.write(bytes, 0, length);}
zos.closeEntry();
fis.close();
} catch (FileNotFoundException ex ) {
Logger.getLogger(AddZip.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(AddZip.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
2) Then you can call it from your controller:
//in the top
try {
fos = new FileOutputStream(rutaZip);
zos = new ZipOutputStream(fos);
} catch (FileNotFoundException ex) {
Logger.getLogger(UtilZip.class.getName()).log(Level.SEVERE, null, ex);
}
...
//inside your method
addZip.addToZipFile(zos, pathFolderFileSystemHD() + itemFoto.getNombre(), "foto/" + itemFoto.getNombre());
Based on the answer given by @sfussenegger above, the following code is used to append to a jar file and download it:
public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
Resource resourceFile = resourceLoader.getResource("WEB-INF/lib/custom.jar");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (ZipOutputStream zos = new ZipOutputStream(baos, StandardCharsets.ISO_8859_1);) {
try (ZipFile zin = new ZipFile(resourceFile.getFile(), StandardCharsets.ISO_8859_1);) {
zin.stream().forEach((entry) -> {
try {
zos.putNextEntry(entry);
if (!entry.isDirectory()) {
zin.getInputStream(entry).transferTo(zos);
}
zos.closeEntry();
} catch (Exception ex) {
ex.printStackTrace();
}
});
}
/* build file records to be appended */
....
for (FileContents record : records) {
zos.putNextEntry(new ZipEntry(record.getFileName()));
zos.write(record.getBytes());
zos.closeEntry();
}
zos.flush();
}
response.setContentType("application/java-archive");
response.setContentLength(baos.size());
response.setHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"custom.jar\"");
try (BufferedOutputStream out = new BufferedOutputStream(response.getOutputStream())) {
baos.writeTo(out);
}
}
I have a JSP application that allows the user to upload a ZIP file; the application then reads all the files in the ZIP and stores them in a MySQL database.
Upon advice I decided to use "Zip File System Provider" to handle the ZIP file:
Path zipPath = Paths.get(zipFile.getSubmittedFileName());//returns the path to the ZIP file
FileSystem fs = FileSystems.newFileSystem(zipPath, null);//creates the file system
I tried to traverse it using:
for (FileStore store: fs.getFileStores()) {
System.err.println("Store: " + store.name());
}
However, it loops only once and returns tmp.zip, which is the entire ZIP. How do I extract the physical image files one by one so I can store them in MySQL?
Here's code that traverses a given ZIP file and prints the first 16 bytes of each file inside.
Path filePath = Paths.get("somefile.zip");
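// the cast keeps the call unambiguous on Java 13+, which added a (Path, Map) overload
FileSystem fileSystem = FileSystems.newFileSystem(filePath, (ClassLoader) null);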
byte[] buffer = new byte[16];
Base64.Encoder encoder = Base64.getEncoder();
for (Path rootDirectory : fileSystem.getRootDirectories()) {
Files.walk(rootDirectory).forEach(path -> {
System.out.print(path);
if (Files.isRegularFile(path)) {
System.out.print(" ");
try (InputStream stream = Files.newInputStream(path)) {
int length = stream.read(buffer);
for (int i = 0; i < length; i++) {
byte b = buffer[i];
if (32 <= b && b < 127) {
System.out.print((char) b);
} else {
System.out.printf("\\%02x", b);
}
}
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
System.out.println();
});
}
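If the goal is to get each file's full content as bytes (for example to bind it to a BLOB column), a stream-based variant may be simpler, since an upload usually arrives as an `InputStream` anyway. This is a sketch that swaps the Zip File System for `java.util.zip.ZipInputStream`; the class `ZipEntriesToBytes` and the method `readAll` are names invented here, and the database part is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: reads every file in a zip stream into memory.
class ZipEntriesToBytes {

    // Returns entry name -> entry content for all non-directory entries, in archive order.
    static Map<String, byte[]> readAll(InputStream zipStream) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            byte[] chunk = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                int count;
                while ((count = zin.read(chunk)) != -1) {
                    buffer.write(chunk, 0, count);
                }
                contents.put(entry.getName(), buffer.toByteArray());
            }
        }
        return contents;
    }

    // Builds a tiny archive in memory so the example can run without any files.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("images/"));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("images/a.png"));
            out.write("first".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.png"));
            out.write("second".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = readAll(new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            System.out.println(file.getKey() + ": " + file.getValue().length + " bytes");
        }
    }
}
```

In a servlet, the stream passed to `readAll` would presumably come from the uploaded part's `getInputStream()`. Holding every file in memory only suits uploads of modest size.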
The Apache Commons Compress module can probably help you iterate through the files.
Below is a sample that iterates over the entries of a ZIP and writes the contents of each one to disk. Note that the sample itself uses only java.util.zip from the JDK.
Sample
package test;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
public class ZipTest {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        String fileName = "C:\\temp\\ECDS-File-Upload-Processed.zip";
        String destinationDir = "C:\\temp\\mango";
        byte[] buffer = new byte[1024];
        // try-with-resources closes the streams even if extraction fails
        try (ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream(fileName))) {
            ZipEntry zipEntry;
            while ((zipEntry = zipInputStream.getNextEntry()) != null) {
                File extractedFile = new File(destinationDir, zipEntry.getName());
                if (zipEntry.isDirectory()) {
                    extractedFile.mkdirs(); // directory entries carry no data
                    continue;
                }
                extractedFile.getParentFile().mkdirs();
                try (FileOutputStream fos = new FileOutputStream(extractedFile)) {
                    int len;
                    while ((len = zipInputStream.read(buffer)) > 0) {
                        fos.write(buffer, 0, len);
                    }
                }
            }
        }
    }
}
I am working on some file manipulation: simple reads and writes using the Java File API.
Here is what I do. First the application receives a zip file, then it extracts the contents into a TEMPFOLDER. Then another class, totally independent, works on the TEMPFOLDER. It checks the number of files and should count them all. Then it performs some database functions to write the contents of the zip file to the DB.
Here is the problem: when I use the application for the first time, it goes smoothly. When I do it a second time, it fails because the second class, which is supposed to check the TEMPFOLDER, returns 0 as the number of files in the folder, but when I check manually, the folder has contents.
When I test it continuously there is a pattern: it works, then it does not work, then it works, then it does not work again. It fails because the application cannot correctly determine the number of files in the folder. The folder has files, but the File object reports that it has no items in it. After the error, if you run it again, it works as it is supposed to.
If you could give some suggestions based on my explanation, which is a thousand-foot view, I would appreciate it and try them in my debugging. If you need some code, I will post it later.
By the way, I am using a web browser and a servlet to accept the zip file.
Here is the method that I use to write to the file system.
public void extractZipAndWriteContentsToFileSystem(ZipFile zipFile) throws IOException, Exception {
Enumeration en = zipFile.entries();
while (en.hasMoreElements()) {
ZipEntry zipEntry = (ZipEntry) en.nextElement();
String name = zipEntry.getName();
long size = zipEntry.getSize();
long compressedSize = zipEntry.getCompressedSize();
File file = new File(zipExtractsTempRepo+name);
if (name.endsWith("/")) {
file.mkdirs();
continue;
}
File parent = file.getParentFile();
if (parent != null) {
parent.mkdirs();
}
InputStream is = zipFile.getInputStream(zipEntry);
FileOutputStream fos = new FileOutputStream(file);
byte[] bytes = new byte[1024];
int length;
while ((length = is.read(bytes)) >= 0) {
fos.write(bytes, 0, length);
}
file = null;
parent = null;
is.close();
fos.close();
}
zipFile.close();
traverse(new File(zipExtractsTempRepo));
}
/**
* this method traverses through a folder and its subfolders
* and its subfolders and ...
*
* this method retrieves objects that are files. if it is not
* a file, (then a directory) it looks inside it to look
* for other files
*
* @param file
* @throws IOException
*/
public void traverse(File file) throws IOException {
if (file.isDirectory()) {
File[] allFiles = file.listFiles();
for (File aFile : allFiles) {
traverse(aFile);
}
} else {
FileInputStream in = new FileInputStream(file);
FileOutputStream out = new FileOutputStream(
new File(this.tempRepo + file.getName()));
byte[] bytes = new byte[1024];
int length;
while ((length = in.read(bytes)) >= 0) {
out.write(bytes, 0, length);
}
out.write(4024);
in.close();
out.close();
}
}
Now, this is the method in the second class that verifies the number of files written to the folder:
private File[] files;
private File[] zipExtracts;
// location of the tempfolder
tempFolder = new File(RawZipFileHandler.tempRepo);
files = tempFolder.listFiles();
public void checkNumberOfFiles() throws Exception {
Util.out("will check number of files");
// check the number of documents on the tempfile folder
if (files.length != 8) {
throw new Exception ("Number of files expected not met. Expected is 8 got " + files.length);
}
}
Comments on my not-so-pretty code are also welcome.
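One thing worth checking, offered as an assumption because the surrounding class is not shown: `File.listFiles()` returns a snapshot of the folder at the moment it is called. If `files` is filled when the second class is constructed, and that happens before the extraction finishes, the check would see the old snapshot. A minimal sketch of counting at check time instead, with a made-up class `FreshFileCount`:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical demo class: counts files when asked instead of caching a listing.
class FreshFileCount {

    // Lists the directory at call time and counts its regular files.
    static int countRegularFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return (int) entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fresh-count-");
        File[] snapshot = dir.toFile().listFiles(); // taken too early, like a field initializer
        Path a = Files.write(dir.resolve("a.txt"), new byte[] {1});
        Path b = Files.write(dir.resolve("b.txt"), new byte[] {2});
        System.out.println("stale snapshot: " + snapshot.length);
        System.out.println("fresh count:    " + countRegularFiles(dir));
        // clean up the temporary files
        Files.delete(a);
        Files.delete(b);
        Files.delete(dir);
    }
}
```

In the posted code this would mean moving `files = tempFolder.listFiles();` into `checkNumberOfFiles()`, so that the listing reflects the folder after the zip has been written.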
I am compressing files using the Apache Commons Compress API. On Windows 7 it works fine, but on Linux (Ubuntu 10.10, UTF-8), characters in file names and folder names, such as "º", are replaced by "?".
Is there a parameter I should pass to the API when compressing, or when extracting the tar?
I am using the tar.gz format, following the API examples.
The files I am trying to compress were created on Windows. Could that be the problem?
The code:
public class TarGzTest
{
    public static void createTarGzOfDirectory(String directoryPath, String tarGzPath) throws IOException
    {
        System.out.println("Creating tar.gz of folder " + directoryPath + " at " + tarGzPath);
        // try-with-resources closes the streams in reverse order, even if an exception is thrown
        try (FileOutputStream fOut = new FileOutputStream(new File(tarGzPath));
             BufferedOutputStream bOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(bOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut))
        {
            addFileToTarGz(tOut, directoryPath, "");
            tOut.finish();
        }
        System.out.println("Process finished.");
    }
private static void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base) throws IOException
{
System.out.println("addFileToTarGz()::"+path);
File f = new File(path);
String entryName = base + f.getName();
TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
tOut.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
if(f.isFile())
{
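tOut.putArchiveEntry(tarEntry);
// close the input stream once the file has been copied into the archive
try (FileInputStream in = new FileInputStream(f))
{
IOUtils.copy(in, tOut);
}
tOut.closeArchiveEntry();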
}
else
{
File[] children = f.listFiles();
if(children != null)
{
for(File child : children)
{
addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
}
}
}
}
}
(I have omitted the main method.)
EDIT (monkeyjluffy): The changes I made are meant to produce the same archive on every platform, so that the hash calculated on it is the same.
I found a workaround for my problem.
For some reason, Java doesn't respect my environment's encoding and changes it to cp1252.
After extracting the archive, I entered its folder and ran this command:
convmv --notest -f cp1252 -t utf8 * -r
It converts everything recursively to UTF-8.
Problem solved, guys.
More info about encoding problems in Linux here.
Thanks, everyone, for the help.
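For anyone wondering where the "?" comes from, here is a small sketch of the mechanism using only the JDK. The class `EncodingLoss` and the file name are made up for illustration, and the two system properties are printed only as a diagnostic (`sun.jnu.encoding` is an internal property and may be absent on some JVMs). When a name is encoded with a charset that has no mapping for a character, Java substitutes the charset's replacement byte, which is "?" for US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch mangles a file name.
class EncodingLoss {

    // Encodes the name with one charset and decodes the bytes with another.
    static String transcode(String name, Charset writeAs, Charset readAs) {
        return new String(name.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String name = "n\u00ba1.txt"; // contains the ordinal indicator from the question

        // diagnostics: the encodings this JVM picked up from its environment
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        // an unmappable character is replaced while encoding
        System.out.println(transcode(name, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII));

        // a single-byte encoding read back as UTF-8 yields the replacement character U+FFFD
        String misread = transcode(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(misread.indexOf('\uFFFD') >= 0);

        // matching charsets preserve the name
        System.out.println(name.equals(transcode(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
    }
}
```

If the first two lines printed on the Linux machine do not show UTF-8, the JVM was probably started without a UTF-8 locale, which would be consistent with the behaviour described above.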
How do I extract a tar (or tar.gz, or tar.bz2) file in Java?
You can do this with the Apache Commons Compress library. You can download the 1.2 version from http://mvnrepository.com/artifact/org.apache.commons/commons-compress/1.2.
Here are two methods: one that unzips a file and another one that untars it. So, for a file <fileName>tar.gz, you need to first unzip it and after that untar it. Please note that the tar archive may contain folders as well, in which case they need to be created on the local filesystem.
Enjoy.
/** Untar an input file into an output file.
* The output file is created in the output folder, having the same name
* as the input file, minus the '.tar' extension.
*
* @param inputFile the input .tar file
* @param outputDir the output directory file.
* @throws IOException
* @throws FileNotFoundException
*
* @return The {@link List} of {@link File}s with the untared content.
* @throws ArchiveException
*/
private static List<File> unTar(final File inputFile, final File outputDir) throws FileNotFoundException, IOException, ArchiveException {
LOG.info(String.format("Untaring %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));
final List<File> untaredFiles = new LinkedList<File>();
final InputStream is = new FileInputStream(inputFile);
final TarArchiveInputStream debInputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is);
TarArchiveEntry entry = null;
while ((entry = (TarArchiveEntry)debInputStream.getNextEntry()) != null) {
final File outputFile = new File(outputDir, entry.getName());
if (entry.isDirectory()) {
LOG.info(String.format("Attempting to write output directory %s.", outputFile.getAbsolutePath()));
if (!outputFile.exists()) {
LOG.info(String.format("Attempting to create output directory %s.", outputFile.getAbsolutePath()));
if (!outputFile.mkdirs()) {
throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
}
}
} else {
LOG.info(String.format("Creating output file %s.", outputFile.getAbsolutePath()));
// try-with-resources closes the output file even if the copy fails
try (OutputStream outputFileStream = new FileOutputStream(outputFile)) {
IOUtils.copy(debInputStream, outputFileStream);
}
}
untaredFiles.add(outputFile);
}
debInputStream.close();
return untaredFiles;
}
/**
* Ungzip an input file into an output file.
* <p>
* The output file is created in the output folder, having the same name
* as the input file, minus the '.gz' extension.
*
* @param inputFile the input .gz file
* @param outputDir the output directory file.
* @throws IOException
* @throws FileNotFoundException
*
* @return The {@link File} with the ungzipped content.
*/
private static File unGzip(final File inputFile, final File outputDir) throws FileNotFoundException, IOException {
LOG.info(String.format("Ungzipping %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));
final File outputFile = new File(outputDir, inputFile.getName().substring(0, inputFile.getName().length() - 3));
// try-with-resources closes both streams even if the copy fails
try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(inputFile));
FileOutputStream out = new FileOutputStream(outputFile)) {
IOUtils.copy(in, out);
}
return outputFile;
}
Note: This functionality was later published through a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.
I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.
Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).
GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it in the same way you'd use any InputStream:
InputStream is = new GZIPInputStream(new FileInputStream(file));
(Note that the GZIPInputStream has its own, internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
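To make the decorator idea concrete, here is a small round trip that uses only java.util.zip. It is a sketch rather than a recommended utility: the class name GzipDecoratorDemo and its two helper methods are invented for this example, and it works on in-memory byte arrays instead of files so that it runs without any setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: wraps ordinary streams in the GZIP decorators.
final class GzipDecoratorDemo {
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The decorated stream is read like any other InputStream
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, gzip".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Swapping the ByteArrayInputStream for a FileInputStream gives the file-based form shown in the line above.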
Archiver archiver = ArchiverFactory.createArchiver("tar", "gz");
archiver.extract(archiveFile, destDir);
Dependency:
<dependency>
<groupId>org.rauschig</groupId>
<artifactId>jarchivelib</artifactId>
<version>0.5.0</version>
</dependency>
Apache Commons VFS supports tar as a virtual file system, which supports URLs like this one tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt
TrueZip or its successor TrueVFS does the same ... it's also available from Maven Central.
I just tried a bunch of the suggested libs (TrueZip, Apache Compress), but no luck.
Here is an example with Apache Commons VFS:
FileSystemManager fsManager = VFS.getManager();
FileObject archive = fsManager.resolveFile("tgz:file://" + fileName);
// List the children of the archive file
FileObject[] children = archive.getChildren();
System.out.println("Children of " + archive.getName().getURI()+" are ");
for (int i = 0; i < children.length; i++) {
FileObject fo = children[i];
System.out.println(fo.getName().getBaseName());
if (fo.isReadable() && fo.getType() == FileType.FILE
&& fo.getName().getExtension().equals("nxml")) {
FileContent fc = fo.getContent();
InputStream is = fc.getInputStream();
}
}
And the maven dependency:
<dependency>
<groupId>commons-vfs</groupId>
<artifactId>commons-vfs</artifactId>
<version>1.0</version>
</dependency>
In addition to gzip and bzip2, the Apache Commons Compress API also has tar support, originally based on the ICE Engineering Java Tar Package, which is both an API and a standalone tool.
What about using this API for tar files, this other one included inside Ant for BZIP2 and the standard one for GZIP?
Here's a version based on this earlier answer by Dan Borza that uses Apache Commons Compress and Java NIO (i.e. Path instead of File). It also does the uncompression and untarring in one stream so there's no intermediate file creation.
public static void unTarGz( Path pathInput, Path pathOutput ) throws IOException {
// try-with-resources closes the whole stream chain, even if extraction fails
try( TarArchiveInputStream tararchiveinputstream =
new TarArchiveInputStream(
new GzipCompressorInputStream(
new BufferedInputStream( Files.newInputStream( pathInput ) ) ) ) ) {
ArchiveEntry archiveentry = null;
while( (archiveentry = tararchiveinputstream.getNextEntry()) != null ) {
Path pathEntryOutput = pathOutput.resolve( archiveentry.getName() );
if( archiveentry.isDirectory() ) {
// createDirectories also creates missing parents and does not fail if the directory already exists
Files.createDirectories( pathEntryOutput );
}
else
Files.copy( tararchiveinputstream, pathEntryOutput );
}
}
}
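One caveat with the extraction methods in this thread: they write to whatever path the entry name resolves to, so an archive containing names such as ../../x could end up writing outside the output directory. A possible guard is sketched below using java.nio only. SafeEntryResolver and resolveInside are hypothetical names chosen for this example; they are not part of Commons Compress.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves an archive entry name and rejects paths that escape the output directory.
final class SafeEntryResolver {
    static Path resolveInside(Path outputDir, String entryName) throws IOException {
        Path base = outputDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments so the prefix check below is meaningful
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("out");
        System.out.println(resolveInside(out, "docs/readme.txt"));
        try {
            resolveInside(out, "../evil.txt");
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In unTarGz, the call pathOutput.resolve( archiveentry.getName() ) could be replaced by a call to a helper like this one.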