Speed up the decompression of a tar.gz


I'm trying to unpack a tar.gz archive containing many files and folders. I'm using Apache Commons Compress v1.21.
The code is this:
public void decompress(File archive, File destination) {
File tar = new File(destination, archive.getName().substring(0, archive.getName().length() - 3));
try (FileInputStream isArchive = new FileInputStream(archive);
GZIPInputStream gzip = new GZIPInputStream(new BufferedInputStream(isArchive));
OutputStream out = new FileOutputStream(tar);
InputStream in = new FileInputStream(tar);
TarArchiveInputStream is = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", in)
) {
IOUtils.copy(gzip, out);
TarArchiveEntry entry;
while ((entry = is.getNextTarEntry()) != null) {
File file = new File(destination, entry.getName());
file.getParentFile().mkdirs();
Files.write(file.toPath(), is.readAllBytes());
}
} catch (IOException | ArchiveException e) {
Logger.log(LogLevel.ERROR, e, "An error occurred during a decompression", "Archive: " + archive.getName());
return;
}
tar.delete();
}
Is there any way to speed up the process? Can multiple files be decompressed in parallel?

Related

Why does my program skip files when unzipping by java.util.zip?

I have read quite a few articles, but I have not found a similar problem or its solution.
I'm trying to read all the files in an archive, but some of them are skipped by zis.getNextEntry():
public static void main(String[] args) throws Exception {
String fileZip = "src/main/resources/unzipTest/fias_xml.zip";
ZipInputStream zis = new ZipInputStream(new FileInputStream(fileZip));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null) {
System.out.println(entry.getName());
}
}
}
But if you unzip the archive with WinRAR, for example, everything is extracted correctly.
Archive files (screenshot)
After running the program (screenshot)
How can I see why some files are not read?
Can the archive be broken?
After I unzipped and re-zipped the files using WinRAR, the program worked correctly. Why was WinRAR able to do this when the Java code was not?
zipArchive (link to the archive)
JDK version: jdk1.8.0_161

In the test I ran, I was able to print every directory and file name correctly.
Two scenarios come to mind:
i) The file name length or the full path length exceeds what the platform can handle. But that should equally affect unzipping with WinRAR.
ii) There is a permission issue, but that would not selectively affect only some files.
Which JDK version are you using?
Could you send me the zip file so that I can try to reproduce the problem?

public void unzip(String zipFile, String destDir)
{
    final int BUFFER = 8 * 1024;
    File file = new File(zipFile);
    new File(destDir).mkdir();
    // try-with-resources closes the ZipFile even if extraction fails
    try (ZipFile zip = new ZipFile(file))
    {
        Enumeration<? extends ZipEntry> zipFileEntries = zip.entries();
        while (zipFileEntries.hasMoreElements())
        {
            ZipEntry entry = zipFileEntries.nextElement();
            File destFile = new File(destDir, entry.getName());
            // create the parent directory structure if needed
            destFile.getParentFile().mkdirs();
            if (!entry.isDirectory())
            {
                // both streams are closed automatically, even on error
                try (BufferedInputStream is = new BufferedInputStream(zip.getInputStream(entry));
                     BufferedOutputStream dest = new BufferedOutputStream(new FileOutputStream(destFile), BUFFER))
                {
                    byte[] data = new byte[BUFFER];
                    int currentByte;
                    while ((currentByte = is.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, currentByte);
                    }
                }
            }
        }
    }
    catch (Exception e)
    {
        System.out.println(e.getMessage());
    }
}
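To narrow down which entries the streaming reader misses, it may help to compare the names that ZipInputStream yields with the names listed in the archive's central directory, which is what ZipFile reads. The sketch below is only a diagnostic aid, not a fix. The class and method names (ZipEntryDiff, namesFromStream, namesFromCentralDirectory, missingFromStream) are made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper; not part of the JDK or of the original post.
class ZipEntryDiff {

    // Entry names found by walking the archive sequentially, as the question's code does.
    static List<String> namesFromStream(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zip);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Entry names listed in the central directory at the end of the archive.
    static List<String> namesFromCentralDirectory(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Names that the central directory lists but the sequential walk never produced.
    static List<String> missingFromStream(Path zip) throws IOException {
        List<String> missing = new ArrayList<>(namesFromCentralDirectory(zip));
        missing.removeAll(namesFromStream(zip));
        return missing;
    }
}
```

If missingFromStream returns a non-empty list for the problematic archive, the entries are present in the central directory but the sequential reader stops or skips before reaching them, which would point at the archive's layout rather than at the printing loop.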

How to decompress BZIP (not BZIP2) with Apache Commons

I have been working on a task to decompress several archive formats, such as zip, tar, tbz and tgz. I can handle all of them except tbz, because the Apache Commons Compress library only provides BZIP2 compressors. But I need to decompress the old BZIP format, not BZIP2. Is there any way to do this in Java? Below is the code I have written so far for extracting the different tar archives with the Apache Commons library.
public List<ArchiveFile> processTarFiles(String compressedFilePath, String fileType) throws IOException {
List<ArchiveFile> extractedFileList = null;
TarArchiveInputStream is = null;
FileOutputStream fos = null;
BufferedOutputStream dest = null;
try {
if(fileType.equalsIgnoreCase("tar"))
{
is = new TarArchiveInputStream(new FileInputStream(new File(compressedFilePath)));
}
else if(fileType.equalsIgnoreCase("tbz")||fileType.equalsIgnoreCase("bz"))
{
is = new TarArchiveInputStream(new BZip2CompressorInputStream(new FileInputStream(new File(compressedFilePath))));
}
else if(fileType.equalsIgnoreCase("tgz")||fileType.equalsIgnoreCase("gz"))
{
is = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream(new File(compressedFilePath))));
}
TarArchiveEntry entry = is.getNextTarEntry();
extractedFileList = new ArrayList<>();
while (entry != null) {
// grab a zip file entry
String currentEntry = entry.getName();
if (!entry.isDirectory()) {
File destFile = new File(Constants.DEFAULT_ZIPOUTPUTPATH, currentEntry);
File destinationParent = destFile.getParentFile();
// create the parent directory structure if needed
destinationParent.mkdirs();
ArchiveFile archiveFile = new ArchiveFile();
int currentByte;
// establish buffer for writing file
byte data[] = new byte[(int) entry.getSize()];
// write the current file to disk
fos = new FileOutputStream(destFile);
dest = new BufferedOutputStream(fos, (int) entry.getSize());
// read and write until last byte is encountered
while ((currentByte = is.read(data, 0, (int) entry.getSize())) != -1) {
dest.write(data, 0, currentByte);
}
dest.flush();
dest.close();
archiveFile.setExtractedFilePath(destFile.getAbsolutePath());
archiveFile.setFormat(destFile.getName().split("\\.")[1]);
extractedFileList.add(archiveFile);
entry = is.getNextTarEntry();
} else {
new File(Constants.DEFAULT_ZIPOUTPUTPATH, currentEntry).mkdirs();
entry = is.getNextTarEntry();
}
}
} catch (IOException e) {
System.out.println(("ERROR: " + e.getMessage()));
} catch (Exception e) {
System.out.println(("ERROR: " + e.getMessage()));
} finally {
is.close();
dest.flush();
dest.close();
}
return extractedFileList;
}

The original Bzip was supposedly using a patented algorithm so Bzip2 was born using algorithms and techniques that were not patented.
That might be the reason why it's no longer in widespread use and open source libraries ignore it.
There's some C code for decompressing Bzip files shown here (gist.github.com mirror).
You might want to read and rewrite that in Java.
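Before porting anything, it may be worth checking whether the .tbz files really are in the old format. The sketch below assumes the commonly documented header layout, in which both formats start with the bytes 'B' 'Z' followed by a version byte: 'h' for bzip2 and '0' for the original bzip. That layout is an assumption here, not something stated in the post, and the class and method names (BzipSignature, detect, detectFile) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; the header layout it checks is an assumption (see lead-in).
class BzipSignature {

    // Returns "bzip2", "bzip1" or "unknown" based on the first three header bytes.
    static String detect(byte[] header) {
        if (header == null || header.length < 3 || header[0] != 'B' || header[1] != 'Z') {
            return "unknown";
        }
        if (header[2] == 'h') {
            return "bzip2"; // assumed version byte for bzip2
        }
        if (header[2] == '0') {
            return "bzip1"; // assumed version byte for the original bzip
        }
        return "unknown";
    }

    // Reads up to three bytes from the start of the file and classifies them.
    static String detectFile(Path file) throws IOException {
        byte[] header = new byte[3];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than three bytes
                }
                read += n;
            }
        }
        return detect(Arrays.copyOf(header, read));
    }
}
```

If detectFile reports "bzip2" for the files in question, the existing BZip2CompressorInputStream branch should already be the right one and the failure lies elsewhere.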

Zip file not deleted even if I am getting its correct name and path

I am trying to delete a zip file after unzipping it, but the file is not deleted:
if (file.getName().contains(".zip")) {
System.out.println(file.getAbsolutePath()); // I am getting the correct path
file.delete();
System.out.println(file.getName()); // I am getting the correct name Script-1.zip
}
This is the full code
public class Zip4 {
public static void main(String[] args) {
File[] files = new File(args[0]).listFiles();
for(File file : files)
// System.out.println(file.getName());
//if(file.getName().contains("1400") && file.getName().contains(".zip"))
extractFolder(args[0] + file.getName(), args[1]);
DeleteFiles();
// for(File file : files)
// System.out.println("File:C:/1/"+ file.getName());
// extractFolder(args[0]+file.getName(),args[1]);
}
private static void DeleteFiles()
{
File f = null;
File[] paths;
f = new File("D:/Copyof");
paths = f.listFiles();
for(File path:paths)
{
// prints file and directory paths
if(path.getName().contains("J14_0_0RC") || path.getName().contains(".zip") || path.getName().contains(".log"))
{
//System.out.println(path);
path.delete();
}
}
}
private static void extractFolder(String zipFile,String extractFolder)
{
try
{
int BUFFER = 2048;
File file = new File(zipFile);
ZipFile zip = new ZipFile(file);
String newPath = extractFolder;
new File(newPath).mkdir();
Enumeration zipFileEntries = zip.entries();
// Process each entry
while (zipFileEntries.hasMoreElements())
{
// grab a zip file entry
ZipEntry entry = (ZipEntry) zipFileEntries.nextElement();
String currentEntry = entry.getName();
File destFile = new File(newPath, currentEntry);
//destFile = new File(newPath, destFile.getName());
File destinationParent = destFile.getParentFile();
// create the parent directory structure if needed
destinationParent.mkdirs();
if (!entry.isDirectory())
{
BufferedInputStream is = new BufferedInputStream(zip
.getInputStream(entry));
int currentByte;
// establish buffer for writing file
byte data[] = new byte[BUFFER];
// write the current file to disk
FileOutputStream fos = new FileOutputStream(destFile);
BufferedOutputStream dest = new BufferedOutputStream(fos,
BUFFER);
// read and write until last byte is encountered
while ((currentByte = is.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, currentByte);
}
dest.flush();
dest.close();
fos.flush();
fos.close();
is.close();
}
}
if(file.getName().contains(".zip"))
{
System.out.println(file.getAbsolutePath());
file.delete();
System.out.println(file.getName());
}
}
catch (Exception e)
{
System.out.println("Error: " + e.getMessage());
}
}
}

ZipFile is a closeable resource. So either close() it in a finally block once you're done, or create it with try-with-resources (available since Java 7):
try(ZipFile zip = new ZipFile(file)){
//unzip here
}
file.delete();
Apart from this, you should revisit this block
dest.flush();
dest.close();
fos.flush();
fos.close();
is.close();
which is quite prone to resource leaks. If one of the earlier calls throws, none of the subsequent calls run, so the remaining resources stay open.
So it would be best to use try-with-resources here, too.
try(BufferedInputStream is = new BufferedInputStream(zip.getInputStream(entry));
FileOutputStream fos = new FileOutputStream(destFile);
BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER)) {
//write the data
} //all streams are closed implicitly here
Or use an existing tool for that, for example Apache Commons IO's IOUtils.closeQuietly(resource), or wrap every single call in:
if(resource != null) {
try{
resource.close();
} catch(IOException e){
//omit
}
}
You could also omit the call to flush() which is done implicitly when closing the resource.
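Putting the pieces of this answer together, a minimal sketch of "extract, then delete" could look like the following. It uses only the JDK; the class and method names (UnzipThenDelete, extractAndDelete) are invented for this example, and the check that entries stay inside the destination folder is an addition that the original code does not have.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper illustrating the try-with-resources advice above.
class UnzipThenDelete {

    // Extracts zipFile into destDir and then deletes the archive.
    // Returns true if the archive was deleted.
    static boolean extractAndDelete(File zipFile, File destDir) throws IOException {
        Path root = destDir.toPath().toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Extra safety check (not in the original): refuse entries like "../x"
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // The entry stream is closed at the end of this block
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } // the ZipFile is closed here, before delete() is attempted
        return zipFile.delete();
    }
}
```

The important part is the ordering: delete() is called only after the try block has closed the ZipFile, so no open handle on the archive remains in this process.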

Reading gzip files inside gzip file using Java

Using Java, I have to read text files that are inside .gz files, which are in turn inside a .tar.gz archive.
The archive is named gz_ltm_logs.tar.gz. It contains the files ltm.1.gz and ltm.2.gz, and each of those contains a text file.
I wanted to do this using java.util.zip.* only, but if that is impossible I can look at other libraries.
I thought java.util.zip would be enough, but it doesn't seem straightforward.

Here's some code to give you an idea. This method will try to extract a given tar.gz file to outputFolder.
public static void extract(File input, File outputFolder) throws IOException {
byte[] buffer = new byte[1024];
GZIPInputStream gzipFile = new GZIPInputStream(new FileInputStream(input));
ByteOutputStream tarStream = new ByteOutputStream();
int gzipLengthRead;
while ((gzipLengthRead = gzipFile.read(buffer)) > 0){
tarStream.write(buffer, 0, gzipLengthRead);
}
gzipFile.close();
org.apache.tools.tar.TarInputStream tarFile = null;
// files inside the tar
OutputStream out = null;
try {
tarFile = new org.apache.tools.tar.TarInputStream(tarStream.newInputStream());
tarStream.close();
TarEntry entry = null;
while ((entry = tarFile.getNextEntry()) != null) {
String outFilename = entry.getName();
if (entry.isDirectory()) {
File directory = new File(outputFolder, outFilename);
directory.mkdirs();
} else {
File outputFile = new File(outputFolder, outFilename);
File outputDirectory = outputFile.getParentFile();
if (!outputDirectory.exists()) {
outputDirectory.mkdirs();
}
out = new FileOutputStream(outputFile);
// Transfer bytes from the tarFile to the output file
int innerLen;
while ((innerLen = tarFile.read(buffer)) > 0) {
out.write(buffer, 0, innerLen);
}
out.close();
}
}
} finally {
if (tarFile != null) {
tarFile.close();
}
if (out != null) {
out.close();
}
}
}
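The method above writes the inner .gz files to disk, so the text inside them still has to be unpacked. One option is to wrap the entry stream in a second GZIPInputStream and read the lines directly. The sketch below uses only the JDK; the class and method names (NestedGzipReader, readGzipLines) are made up, and UTF-8 is an assumed encoding for the log files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for the inner .gz files.
class NestedGzipReader {

    // Reads all text lines from a gzip-compressed stream.
    // The stream is deliberately NOT closed here: if it is the tar stream
    // positioned at an entry, closing it would close the whole archive.
    static List<String> readGzipLines(InputStream gzippedEntry) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzippedEntry), StandardCharsets.UTF_8)); // UTF-8 is an assumption
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

Inside the entry loop above, this could be called as readGzipLines(tarFile) for entries whose name ends in ".gz", instead of copying the bytes to a file. That relies on the tar stream reporting end-of-stream at the end of each entry, which the copy loop in the method above already depends on.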

Extracting zip file into a folder throws "Invalid entry size (expected 46284 but got 46285 bytes)" for one of the entry

When I try to extract a zip file into a folder with the code below, one of the entries (a text file) fails with the error "Invalid entry size (expected 46284 but got 46285 bytes)" and the extraction stops abruptly. The zip file contains around 12 text files and 20 TIF files. The error occurs on the text file, and extraction cannot proceed because execution jumps to the catch block.
I see this problem only on the production server, which runs on Unix; there is no problem on the other servers (Dev, Test, UAT).
The zip file is delivered to the server path by an external team that handles the file transfer, and then my code extracts it.
...
int BUFFER = 2048;
java.io.BufferedOutputStream dest = null;
String ZipExtractDir = "/y34/ToBeProcessed/";
java.io.File MyDirectory = new java.io.File(ZipExtractDir);
MyDirectory.mkdir();
ZipFilePath = "/y34/work_ZipResults/Test.zip";
// Creating fileinputstream for zip file
java.io.FileInputStream fis = new java.io.FileInputStream(ZipFilePath);
// Creating zipinputstream for using fileinputstream
java.util.zip.ZipInputStream zis = new java.util.zip.ZipInputStream(new java.io.BufferedInputStream(fis));
java.util.zip.ZipEntry entry;
while ((entry = zis.getNextEntry()) != null)
{
int count;
byte data[] = new byte[BUFFER];
java.io.File f = new java.io.File(ZipExtractDir + "/" + entry.getName());
// write the files to the directory created above
java.io.FileOutputStream fos = new java.io.FileOutputStream(ZipExtractDir + "/" + entry.getName());
dest = new java.io.BufferedOutputStream(fos, BUFFER);
while ((count = zis.read(data, 0, BUFFER)) != -1)
{
dest.write(data, 0, count);
}
dest.flush();
dest.close();
}
zis.close();
zis.closeEntry();
}
catch (Exception Ex)
{
System.out.println("Exception in \"ExtractZIPFiles\"---- " + Ex.getMessage());
}

I don't know what is causing your problem, but here is the method I use to unzip an archive:
public static void unzip(File zip, File extractTo) throws IOException {
    // try-with-resources closes the archive even if extraction fails
    try (ZipFile archive = new ZipFile(zip)) {
        Enumeration<? extends ZipEntry> e = archive.entries();
        while (e.hasMoreElements()) {
            ZipEntry entry = e.nextElement();
            File file = new File(extractTo, entry.getName());
            if (entry.isDirectory()) {
                file.mkdirs();
            } else {
                if (!file.getParentFile().exists()) {
                    file.getParentFile().mkdirs();
                }
                // both streams are closed automatically, even on error
                try (InputStream in = archive.getInputStream(entry);
                     BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
                    IOUtils.copy(in, out); // Apache Commons IO
                }
            }
        }
    }
}
Calling:
File zip = new File("/path/to/my/file.zip");
File extractTo = new File("/path/to/my/destination/folder");
unzip(zip, extractTo);
I have never had any issue with the code above, so I hope it helps.

Off the top of my head, I could think of these reasons:
There could be a problem with the encoding of the text file.
The file needs to be read/transferred in "binary" mode.
There could be an issue with the line endings (\n versus \r\n).
The file could simply be corrupt. Try opening the file with a zip utility.
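To test the corruption theory on the server itself, each entry can be read through ZipFile and its actual size and CRC-32 compared with the values recorded in the archive. The sketch below uses only the JDK; the class and method names (ZipIntegrityCheck, findBadEntries) are hypothetical. A size that is one byte larger than expected, as in the reported error, could be consistent with the archive having been altered in transit, but that is a guess and not something the sketch can prove.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of the original post.
class ZipIntegrityCheck {

    // Returns one message per entry whose data cannot be read or does not
    // match the size/CRC recorded in the archive. Empty list means no problem found.
    static List<String> findBadEntries(File zipFile) throws IOException {
        List<String> bad = new ArrayList<>();
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                CRC32 crc = new CRC32();
                long size = 0;
                try (InputStream in = zip.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        crc.update(buffer, 0, n);
                        size += n;
                    }
                    if (entry.getSize() != -1 && size != entry.getSize()) {
                        bad.add(entry.getName() + ": read " + size
                                + " bytes, expected " + entry.getSize());
                    } else if (entry.getCrc() != -1 && crc.getValue() != entry.getCrc()) {
                        bad.add(entry.getName() + ": CRC mismatch");
                    }
                } catch (IOException e) {
                    // Damaged compressed data typically surfaces as an exception here
                    bad.add(entry.getName() + ": " + e.getMessage());
                }
            }
        }
        return bad;
    }
}
```

Running this against the same Test.zip on a server where extraction works and on the production server would show whether the two copies of the archive actually differ.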
