How to decompress BZIP (not BZIP2) with Apache Commons - java

I am working on a task that decompresses several archive formats: zip, tar, tbz, and tgz. Everything works except tbz, because the Apache Commons Compress library only provides a BZIP2 compressor, and I need to decompress the older BZIP format, not BZIP2. Is there any way to do this in Java? Below is the code I have so far for extracting the different tar archives with the Apache Commons library.
public List<ArchiveFile> processTarFiles(String compressedFilePath, String fileType) throws IOException {
List<ArchiveFile> extractedFileList = null;
TarArchiveInputStream is = null;
FileOutputStream fos = null;
BufferedOutputStream dest = null;
try {
if(fileType.equalsIgnoreCase("tar"))
{
is = new TarArchiveInputStream(new FileInputStream(new File(compressedFilePath)));
}
else if(fileType.equalsIgnoreCase("tbz")||fileType.equalsIgnoreCase("bz"))
{
is = new TarArchiveInputStream(new BZip2CompressorInputStream(new FileInputStream(new File(compressedFilePath))));
}
else if(fileType.equalsIgnoreCase("tgz")||fileType.equalsIgnoreCase("gz"))
{
is = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream(new File(compressedFilePath))));
}
TarArchiveEntry entry = is.getNextTarEntry();
extractedFileList = new ArrayList<>();
while (entry != null) {
// name of the current tar entry
String currentEntry = entry.getName();
if (!entry.isDirectory()) {
File destFile = new File(Constants.DEFAULT_ZIPOUTPUTPATH, currentEntry);
File destinationParent = destFile.getParentFile();
// create the parent directory structure if needed
destinationParent.mkdirs();
ArchiveFile archiveFile = new ArchiveFile();
int currentByte;
// fixed-size copy buffer: sizing buffers from entry.getSize() fails for empty entries
// (BufferedOutputStream rejects a size of 0) and can exhaust memory for large ones
byte[] data = new byte[8192];
// write the current file to disk
fos = new FileOutputStream(destFile);
dest = new BufferedOutputStream(fos);
// read and write until the end of the current entry
while ((currentByte = is.read(data, 0, data.length)) != -1) {
dest.write(data, 0, currentByte);
}
dest.flush();
dest.close();
archiveFile.setExtractedFilePath(destFile.getAbsolutePath());
archiveFile.setFormat(destFile.getName().split("\\.")[1]);
extractedFileList.add(archiveFile);
entry = is.getNextTarEntry();
} else {
new File(Constants.DEFAULT_ZIPOUTPUTPATH, currentEntry).mkdirs();
entry = is.getNextTarEntry();
}
}
} catch (IOException e) {
System.out.println(("ERROR: " + e.getMessage()));
} catch (Exception e) {
System.out.println(("ERROR: " + e.getMessage()));
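} finally {
// either stream may still be null if the archive could not be opened or held no files
if (dest != null) {
dest.close();
}
if (is != null) {
is.close();
}
}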
return extractedFileList;
}

The original bzip reportedly used arithmetic coding, which was encumbered by software patents, so bzip2 was created using Huffman coding and other techniques that were not patented.
That is probably why the original format is no longer in widespread use and why open source libraries ignore it.
There's some C code for decompressing bzip files shown here (gist.github.com mirror).
You might want to read that code and port it to Java.
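Before porting anything, it may be worth confirming that the files really are in the old format. The sketch below checks the first three bytes of a stream. It assumes, based on commonly cited descriptions of the format rather than on anything in this thread, that bzip2 streams start with "BZh" and original bzip streams start with "BZ0". The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BzipMagic {
    enum Kind { BZIP1, BZIP2, UNKNOWN }

    /** Classifies a header by its first three bytes. */
    static Kind classify(byte[] header) {
        if (header == null || header.length < 3 || header[0] != 'B' || header[1] != 'Z') {
            return Kind.UNKNOWN;
        }
        if (header[2] == 'h') {
            return Kind.BZIP2; // assumption: 'h' marks the Huffman-coded bzip2 format
        }
        if (header[2] == '0') {
            return Kind.BZIP1; // assumption: '0' marks the original bzip format
        }
        return Kind.UNKNOWN;
    }

    /** Peeks at the header of a mark-capable stream without consuming it. */
    static Kind peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(3);
        byte[] header = new byte[3];
        int total = 0;
        while (total < header.length) {
            int read = in.read(header, total, header.length - total);
            if (read < 0) {
                break; // stream ended before three bytes were available
            }
            total += read;
        }
        in.reset();
        return total == header.length ? classify(header) : Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "BZh91AY&SY".getBytes(StandardCharsets.US_ASCII);
        System.out.println(peek(new ByteArrayInputStream(sample)));
    }
}
```

For a file on disk, wrapping the FileInputStream in a BufferedInputStream provides the mark/reset support that peek needs, and the same stream can then be handed to BZip2CompressorInputStream if the result is BZIP2.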

Related

Speed up the decompression of a tar.gz

I'm trying to unpack a tar.gz archive with lots of files and folders inside. I'm using Apache Commons Compress v1.21.
This is the code:
public void decompress(File archive, File destination) {
File tar = new File(destination, archive.getName().substring(0, archive.getName().length() - 3));
try (FileInputStream isArchive = new FileInputStream(archive);
GZIPInputStream gzip = new GZIPInputStream(new BufferedInputStream(isArchive));
OutputStream out = new FileOutputStream(tar);
InputStream in = new FileInputStream(tar);
TarArchiveInputStream is = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", in)
) {
IOUtils.copy(gzip, out);
TarArchiveEntry entry;
while ((entry = is.getNextTarEntry()) != null) {
File file = new File(destination, entry.getName());
file.getParentFile().mkdirs();
Files.write(file.toPath(), is.readAllBytes());
}
} catch (IOException | ArchiveException e) {
Logger.log(LogLevel.ERROR, e, "An error occurred during a decompression", "Archive: " + archive.getName());
return;
}
tar.delete();
}
Is there any way to speed up the process? Can multiple files be decompressed in parallel?

Using LZ4 to Add to an existing .lz4 (zip) in Java

I am compressing files in Java with the LZ4 library, using the method below. If I call this method again with the same output file name, it overwrites the file with the new contents instead of appending. Is there a way to append using LZ4? I just want to add another file to the existing archive at a later time.
public void zipFile(File[] fileToZip, String outputFileName, boolean activeZip)
{
try (FileOutputStream fos = new FileOutputStream(new File(outputFileName));
LZ4FrameOutputStream lz4fos = new LZ4FrameOutputStream(fos);)
{
for (File a : fileToZip)
{
try (FileInputStream fis = new FileInputStream(a))
{
byte[] buf = new byte[bufferSizeZip];
int length;
while ((length = fis.read(buf)) > 0)
{
lz4fos.write(buf, 0, length);
}
}
}
}
catch (Exception e)
{
LOG.error("Zipping file failed ", e);
}
}
The only way I could figure out how to do this is to send
new FileOutputStream(new File(outputFileName),false)
in the try-with-resources

Zip file not deleted even if I am getting its correct name and path

I am trying to delete a zip file after unzipping it, but I am not able to delete it:
if (file.getName().contains(".zip")) {
System.out.println(file.getAbsolutePath()); // I am getting the correct path
file.delete();
System.out.println(file.getName()); // I am getting the correct name Script-1.zip
}
This is the full code
public class Zip4 {
public static void main(String[] args) {
File[] files = new File(args[0]).listFiles();
for(File file : files)
// System.out.println(file.getName());
//if(file.getName().contains("1400") && file.getName().contains(".zip"))
extractFolder(args[0] + file.getName(), args[1]);
DeleteFiles();
// for(File file : files)
// System.out.println("File:C:/1/"+ file.getName());
// extractFolder(args[0]+file.getName(),args[1]);
}
private static void DeleteFiles()
{
File f = null;
File[] paths;
f = new File("D:/Copyof");
paths = f.listFiles();
for(File path:paths)
{
// prints file and directory paths
if(path.getName().contains("J14_0_0RC") || path.getName().contains(".zip") || path.getName().contains(".log"))
{
//System.out.println(path);
path.delete();
}
}
}
private static void extractFolder(String zipFile,String extractFolder)
{
try
{
int BUFFER = 2048;
File file = new File(zipFile);
ZipFile zip = new ZipFile(file);
String newPath = extractFolder;
new File(newPath).mkdir();
Enumeration zipFileEntries = zip.entries();
// Process each entry
while (zipFileEntries.hasMoreElements())
{
// grab a zip file entry
ZipEntry entry = (ZipEntry) zipFileEntries.nextElement();
String currentEntry = entry.getName();
File destFile = new File(newPath, currentEntry);
//destFile = new File(newPath, destFile.getName());
File destinationParent = destFile.getParentFile();
// create the parent directory structure if needed
destinationParent.mkdirs();
if (!entry.isDirectory())
{
BufferedInputStream is = new BufferedInputStream(zip
.getInputStream(entry));
int currentByte;
// establish buffer for writing file
byte data[] = new byte[BUFFER];
// write the current file to disk
FileOutputStream fos = new FileOutputStream(destFile);
BufferedOutputStream dest = new BufferedOutputStream(fos,
BUFFER);
// read and write until last byte is encountered
while ((currentByte = is.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, currentByte);
}
dest.flush();
dest.close();
fos.flush();
fos.close();
is.close();
}
}
if(file.getName().contains(".zip"))
{
System.out.println(file.getAbsolutePath());
file.delete();
System.out.println(file.getName());
}
}
catch (Exception e)
{
System.out.println("Error: " + e.getMessage());
}
}
}
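ZipFile is a closeable resource. While it is open it keeps a handle on the archive, which on Windows typically prevents file.delete() from succeeding. So either close() it in a finally block once you're done, or create it with try-with-resources (since Java 7):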
try (ZipFile zip = new ZipFile(file)) {
    // unzip here
}
file.delete();
Apart from this, you should revisit this block
dest.flush();
dest.close();
fos.flush();
fos.close();
is.close();
which is prone to resource leaks. If one of the earlier calls throws, none of the subsequent calls run, and the remaining resources are never closed.
So it would be best to use try-with-resources here, too.
try (BufferedInputStream is = new BufferedInputStream(zip.getInputStream(entry));
     FileOutputStream fos = new FileOutputStream(destFile);
     BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER)) {
    // write the data
} // all streams are closed implicitly here
Or use an existing tool for that, for example Apache Commons IO's IOUtils.closeQuietly(resource), or wrap every single call in
if (resource != null) {
    try {
        resource.close();
    } catch (IOException e) {
        // ignore
    }
}
You could also omit the calls to flush(), since closing a stream flushes it implicitly.
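Putting the advice above together, a minimal sketch of extract-then-delete using only java.util.zip and java.nio.file might look like this. The class and method names are hypothetical, the check that an entry stays inside the target directory is an extra precaution that the original code does not have, and roundTrip exists only to exercise the method on a throwaway archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class UnzipThenDelete {
    /** Extracts zipPath into targetDir, then deletes the archive. Returns true if it was deleted. */
    static boolean extractAndDelete(Path zipPath, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path dest = root.resolve(entry.getName()).normalize();
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                Files.createDirectories(dest.getParent());
                // the entry stream is closed even if the copy fails
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } // the ZipFile is closed here, before the delete is attempted
        return Files.deleteIfExists(zipPath);
    }

    /** Demo helper: zips one small file, extracts it, and returns the extracted text. */
    static String roundTrip(String content) {
        try {
            Path dir = Files.createTempDirectory("unzip-demo");
            Path zip = dir.resolve("sample.zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("dir/hello.txt"));
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            Path target = dir.resolve("out");
            if (!extractAndDelete(zip, target) || Files.exists(zip)) {
                throw new IllegalStateException("archive was not deleted");
            }
            return new String(Files.readAllBytes(target.resolve("dir/hello.txt")), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike File.delete(), which only returns false on failure, Files.deleteIfExists throws an IOException that says why the delete failed, which makes a problem like the one in the question easier to diagnose.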

Reading gzip files inside gzip file using Java

Using Java, I have to read text files that are inside .gz files, which are themselves inside a .tar.gz archive.
The archive is named gz_ltm_logs.tar.gz. It contains the files ltm.1.gz and ltm.2.gz, and each of those contains a text file.
I wanted to do this using java.util.zip.* only, but if that is impossible I can look at other libraries.
I thought I would be able to do it with java.util.zip, but it doesn't seem straightforward.
Here's some code to give you an idea. This method will try to extract a given tar.gz file to outputFolder.
public static void extract(File input, File outputFolder) throws IOException {
byte[] buffer = new byte[1024];
org.apache.tools.tar.TarInputStream tarFile = null;
// files inside the tar
OutputStream out = null;
try {
// decompress and untar in a single pass instead of first buffering the whole archive in memory
tarFile = new org.apache.tools.tar.TarInputStream(new GZIPInputStream(new FileInputStream(input)));
TarEntry entry = null;
while ((entry = tarFile.getNextEntry()) != null) {
String outFilename = entry.getName();
if (entry.isDirectory()) {
File directory = new File(outputFolder, outFilename);
directory.mkdirs();
} else {
File outputFile = new File(outputFolder, outFilename);
File outputDirectory = outputFile.getParentFile();
if (!outputDirectory.exists()) {
outputDirectory.mkdirs();
}
out = new FileOutputStream(outputFile);
// Transfer bytes from the tarFile to the output file
int innerLen;
while ((innerLen = tarFile.read(buffer)) > 0) {
out.write(buffer, 0, innerLen);
}
out.close();
}
}
} finally {
if (tarFile != null) {
tarFile.close();
}
if (out != null) {
out.close();
}
}
}
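The method above only unpacks the outer tar.gz, so the inner ltm.N.gz files still have to be decompressed. That part needs nothing beyond java.util.zip. The following is a minimal sketch with hypothetical class and method names. It assumes the inner files contain UTF-8 text; gzip and roundTrip exist only to build test data in memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class InnerGzipReader {
    /** Decompresses a gzip stream and returns its text content line by line. Closes the stream. */
    static List<String> readLines(InputStream gzipped) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Demo helper: gzips a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Demo helper: compresses and reads back, wrapping checked exceptions. */
    static List<String> roundTrip(String text) {
        try {
            return readLines(new ByteArrayInputStream(gzip(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After extraction, each inner file could be read with something like readLines(new FileInputStream(new File(outputFolder, "ltm.1.gz"))). Because readLines closes the stream it is given, it is meant for the extracted files, not for a tar stream that is still positioned on an entry.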

How does one go about finding a specific directory out of a jar/zip file in java?

I have been working on this for quite a few hours and can't seem to find a solution. Essentially, what I have is this:
I have a jar, let's call it "a.jar".
I need to get the directory "z" and its contents from "a.jar", but "z" isn't in the root directory of "a.jar".
"z" is in "/x/y/" and "/x/y/" is in "a.jar", so it looks like this:
"a.jar/x/y/z/"
I hope that's a decent explanation. By the way, "a.jar" is what everything is running out of, so it's on the class path, obviously.
Basically, for each ZipEntry you have to check whether it isDirectory() and process that as well.
Check out this link:
http://www.javaworld.com/javaworld/javatips/jw-javatip49.html
Later edit:
Here is a complete example that extracts the files from the jar; if you specify a path, it extracts only that folder:
public void doUnzip(String inputZip, String destinationDirectory, String specificPath)
throws IOException {
int BUFFER = 2048;
File sourceZipFile = new File(inputZip);
File unzipDestinationDirectory = new File(destinationDirectory);
unzipDestinationDirectory.mkdir();
ZipFile zipFile;
// Open Zip file for reading
zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
// Create an enumeration of the entries in the zip file
Enumeration<?> zipFileEntries = zipFile.entries();
// Process each entry
while (zipFileEntries.hasMoreElements()) {
// grab a zip file entry
ZipEntry entry = (ZipEntry) zipFileEntries.nextElement();
if(specificPath != null){
if(entry.getName().startsWith(specificPath) == false)
continue;
}
File destFile = new File(unzipDestinationDirectory, entry.getName());
// create the parent directory structure if needed
destFile.getParentFile().mkdirs();
try {
// extract file if not a directory
if (!entry.isDirectory()) {
BufferedInputStream is = new BufferedInputStream(
zipFile.getInputStream(entry));
// establish buffer for writing file
byte data[] = new byte[BUFFER];
// write the current file to disk
FileOutputStream fos = new FileOutputStream(destFile);
BufferedOutputStream dest = new BufferedOutputStream(fos,
BUFFER);
// read and write until last byte is encountered
for (int bytesRead; (bytesRead = is.read(data, 0, BUFFER)) != -1;) {
dest.write(data, 0, bytesRead);
}
dest.flush();
dest.close();
is.close();
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
zipFile.close();
}
public static void main(String[] args) {
Unzip unzip = new Unzip();
try {
unzip.doUnzip("test.jar", "output", "x/y/z");
} catch (IOException e) {
e.printStackTrace();
}
}
..(ZipEntry), but they don't work very well with sub-directories.
They work just fine. Iterate the entries and simply check the path equates to that sub-directory. If it does, add it to a list (or process it, whatever).
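As a rough illustration of that approach, the sketch below collects the names of all file entries under a given directory inside a jar or zip. The class and method names are hypothetical. It appends a trailing slash to the directory before comparing, so that "x/y/z" does not also match a sibling such as "x/y/zebra.txt", and it sorts the result because the iteration order of entries should not be relied on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class JarDirectoryLister {
    /** Returns the sorted names of all file entries below the given directory in the archive. */
    static List<String> entriesUnder(Path archive, String directory) throws IOException {
        String prefix = directory.endsWith("/") ? directory : directory + "/";
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith(prefix)) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Demo helper: builds a small archive in a temp directory and lists "x/y/z". */
    static List<String> demo() {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            String[] entryNames = {"x/y/z/a.txt", "x/y/zebra.txt", "x/y/z/sub/b.txt", "top.txt"};
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
                for (String name : entryNames) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            List<String> result = entriesUnder(jar, "x/y/z");
            Files.delete(jar);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Entry names inside a jar use forward slashes and have no leading slash, so the directory from the question would be passed as "x/y/z", not "/x/y/z/".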
