How to read a .gz or .bzip2 file in java

I have .gz and .bzip2 files that I need to extract and display. A couple of places I looked at mention the zip4j utility. Can I use it to extract and display these files? Please let me know.
I referred to the post "Uncompress BZIP2 archive" and tried the implementation below, but I am not sure what to pass to FileOutputStream, and it gives me an error. Also, is it possible to create a File object once the data is uncompressed? Please assist. Thank you for all the help again.
if (getExtension(filename).equals("bz2")) {
    try {
        FileInputStream fin = new FileInputStream("C:\\temp\\test.bz2");
        BufferedInputStream in = new BufferedInputStream(fin);
        // Destination file for the decompressed bytes
        FileOutputStream out = new FileOutputStream("archive.tar");
        BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
        int buffersize = 1024;
        final byte[] buffer = new byte[buffersize];
        int n = 0;
        // Copy until the decompressor reports end of stream
        while (-1 != (n = bzIn.read(buffer))) {
            out.write(buffer, 0, n);
        }
        out.close();
        bzIn.close();
    }
    catch (Exception e) {
        throw new Error(e.getMessage());
    }
}
Thanks in advance.
~Akshitha

The .gz file extension indicates GZIP compression.
You can open GZIP data with GZIPInputStream from the Java 7 SDK: http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPInputStream.html
BZIP2 compression and decompression are not supported by the Java SDK libraries, as far as I know, so you will have to use a third-party library such as Apache Commons Compress.
The library you mentioned seems (from a very brief look) to support only the ZIP format, not GZIP or BZIP2, which are different formats.
Please note that unlike ZIP, GZIP compresses everything into a single stream, so you cannot extract a single file without decompressing the whole stream (or at least decompressing until you have reached all the bytes you wanted).
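As a rough illustration of the GZIPInputStream route, here is a minimal sketch that uses only the JDK. The class and method names (GzipReadSketch, writeGzip, readGzipLines) are made up for this example, and it assumes the compressed content is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReadSketch {

    // Writes text into a gzip-compressed file (only used to create sample input).
    static void writeGzip(Path target, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Decompresses a .gz file and returns its content as lines of text.
    // Assumption: the uncompressed content is UTF-8 text.
    static List<String> readGzipLines(Path source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(source)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".gz");
        writeGzip(sample, "first line\nsecond line\n");
        for (String line : readGzipLines(sample)) {
            System.out.println(line);
        }
        Files.delete(sample);
    }
}
```

For a .bz2 file the same reading loop should apply once GZIPInputStream is swapped for the BZip2CompressorInputStream from Apache Commons Compress that the question already uses.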

Related

Unzipping into a ByteArrayOutputStream -- why am I getting an EOFException?

I have been trying to create a Java program that will read zip files from an online API, unzip them into memory (not into the file system), and load them into a database. Since the unzipped files need to be loaded into the database in a specific order, I will have to unzip all of the files before I load any of them.
I basically used another question on StackOverflow as a model for how to do this. Using ZipInputStream from java.util.zip, I was able to do this with a smaller ZIP (0.7MB zipped, ~4MB unzipped), but when I encountered a larger file (25MB zipped, 135MB unzipped), the two largest files were not read into memory. I was not even able to retrieve a ZipEntry for these larger files (8MB and 120MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access the unzipped files that had failed to be written, and threw a NullPointerException.
I am using Jsoup to download the zip file.
Has anyone had any experience with this and can give guidance on why I am unable to retrieve the complete contents of the zip file?
Below is the code that I am using. I collect the unzipped files as InputStreams in a HashMap, and the program should stop looking for ZipEntrys when there are no more left.
private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {
Map<String, InputStream> result = new HashMap<>();
while (true) {
ZipEntry entry;
byte[] b = new byte[1024];
ByteArrayOutputStream out = new ByteArrayOutputStream();
int l;
entry = verZip.getNextEntry();//Might throw IOException
if (entry == null) {
break;
}
try {
while ((l = verZip.read(b)) > 0) {
out.write(b, 0, l);
}
out.flush();
}catch(EOFException e){
e.printStackTrace();
}
catch (IOException i) {
System.out.println("there was an ioexception");
i.printStackTrace();
fail();
}
result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
}
return result;
}
Might I be better off if my program took advantage of the filesystem to unzip files?

It turns out that Jsoup is the root of the issue. When obtaining binary data over a Jsoup connection, there is a limit on how many bytes will be read from the connection. By default, this limit is 1048576 bytes, or 1 megabyte. As a result, when I feed the binary data from Jsoup into a ZipInputStream, the data is cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
InputStream oneMb = c.execute().bodyStream();
ZipInputStream oneMbZip = new ZipInputStream(oneMb);
Trying to unzip the truncated oneMbZip is what led to the EOFException.
With the code below, I was able to change Connection's byte limit to 1 GB (1073741824), and then was able to retrieve the zip file without running into an EOFException.
Connection c = Jsoup.connect("example.com/download").ignoreContentType(true);
//^^returns a Connection that will only retrieve 1MB of data
Connection.Request theRequest = c.request();
theRequest.maxBodySize(1073741824);
c.request(theRequest);//Now this connection will retrieve as much as 1GB of data
InputStream oneGb = c.execute().bodyStream();
ZipInputStream oneGbZip = new ZipInputStream(oneGb);
Note that maxBodySizeBytes is an int and its upper limit is 2,147,483,647, or just under 2GB.
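The effect of a capped download can be reproduced without Jsoup. The following is a minimal sketch using only java.util.zip: it builds a ZIP archive in memory, cuts it in half to imitate a truncated response body, and reads both versions with ZipInputStream. The class and method names are invented for the illustration, and the random payload is just a stand-in for real data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TruncatedZipSketch {

    // Builds a ZIP archive in memory with one entry of random (incompressible) bytes.
    static byte[] makeZip(int payloadSize) throws IOException {
        byte[] payload = new byte[payloadSize];
        new Random(42).nextBytes(payload);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("data.bin"));
            zip.write(payload);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads every entry to the end and returns the total number of uncompressed bytes.
    static long countBytes(byte[] zipBytes) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zip.getNextEntry() != null) {
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = makeZip(200_000);
        System.out.println("complete archive: " + countBytes(full) + " bytes read");

        // Imitate a body that was cut off partway through the download.
        byte[] truncated = Arrays.copyOf(full, full.length / 2);
        try {
            countBytes(truncated);
        } catch (IOException e) {
            System.out.println("truncated archive failed: " + e);
        }
    }
}
```

When the cut falls inside an entry's compressed data, the read is expected to fail with an IOException such as the EOFException described above.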

The compressed (zipped) folder is invalid Java

I'm trying to zip files from the server into an archive using ZipOutputStream.
After the archive is downloaded, it can't be opened by double-clicking; the error "The compressed (zipped) folder is invalid" occurs. But if I open it from the context menu -> 7zip -> open file, it works normally. What could be the reason for the problem?
String sourceFileName = "./file.txt";
File sourceFile = new File(sourceFileName);
try {
    // set the content type and the filename
    response.setContentType("application/zip");
    response.addHeader("Content-Disposition", "attachment; filename=" + sourceFileName + ".zip");
    response.setContentLength((int) sourceFile.length());
    // get a ZipOutputStream, so we can zip our files together
    ZipOutputStream outZip = new ZipOutputStream(response.getOutputStream());
    // Add ZIP entry to output stream.
    outZip.putNextEntry(new ZipEntry(sourceFile.getName()));
    int length = 0;
    byte[] bbuf = new byte[(int) sourceFile.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(sourceFile));
    while ((in != null) && ((length = in.read(bbuf)) != -1)) {
        outZip.write(bbuf, 0, length);
    }
    outZip.closeEntry();
    in.close();
    outZip.flush();
    outZip.close();
} catch (IOException e) {
    // error handling was not shown in the original post
}

7Zip can open a wide variety of zip formats, and is relatively tolerant of oddities. Windows double-click requires a relatively specific format and is far less tolerant.
You need to look up the zip format and then look at your file (and "good" ones) with a hex editor (such as Hex Editor Neo), to see what may be wrong.
(One possibility is that you're using the wrong compression algorithm. And there are several other variations to consider as well, particularly whether or not you generate a "directory".)

It could be that a close is missing. It could be that Windows cannot handle the path encoding in the zip. It might be that Windows has difficulty with the directory structure, or that a path name contains a (back)slash. So it is detective work, trying different files. If you stream the zip directly to the HTTP response, then finish has to be called instead of close.
After the code was posted:
The problem is that setContentLength is given the original file size. If a content length is set at all, it should be the compressed size.
DataInputStream is not needed; if it is used, readFully would be the call for filling the whole buffer.
response.setContentType("application/zip");
response.addHeader("Content-Disposition", "attachment; filename=file.zip");
//Path sourcePath = sourceFile.toPath();
Path sourcePath = Paths.get(sourceFileName);
ZipOutputStream outZip = new ZipOutputStream(response.getOutputStream(),
StandardCharsets.UTF_8);
outZip.putNextEntry(new ZipEntry(sourcePath.getFileName().toString()));
Files.copy(sourcePath, outZip);
outZip.closeEntry();
Either finish or close the zip at the end.
outZip.finish();
//outZip.close();
I am not sure (in terms of best code style) whether one should close the response output stream oneself.
But when it is not closed, finish() must be called; flush() will not suffice, because data is still written to the zip at the end.
For file names with, for instance, Cyrillic letters, it would be best to specify a Unicode charset like UTF-8. In fact, let UTF-8 be the Esperanto standard world-wide.
A last note: if there is only one file, one could use GZIPOutputStream for file.txt.gz, or query the browser's capabilities (the Accept-Encoding request header) and deliver it compressed as file.txt.
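The finish-instead-of-close idea can be tried outside a servlet container. The following is a minimal sketch in which a ByteArrayOutputStream stands in for the response stream; the class and method names are hypothetical, and no Content-Length is set because the compressed size is not known in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipToStreamSketch {

    // Writes one file as a ZIP archive to 'target' and leaves 'target' open.
    static void zipSingleFile(Path source, OutputStream target) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(target, StandardCharsets.UTF_8);
        zip.putNextEntry(new ZipEntry(source.getFileName().toString()));
        Files.copy(source, zip);
        zip.closeEntry();
        // finish() writes the central directory without closing 'target'.
        zip.finish();
    }

    // Reads the first entry back, to check that the archive is complete.
    static String readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            if (in.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                content.write(buffer, 0, n);
            }
            return content.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("file", ".txt");
        Files.write(source, "hello zip".getBytes(StandardCharsets.UTF_8));
        // Stand-in for response.getOutputStream()
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        zipSingleFile(source, response);
        System.out.println(readFirstEntry(response.toByteArray()));
        Files.delete(source);
    }
}
```

With a real response stream, whether to flush or close it afterwards is left to the surrounding code, as discussed above.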

Android FileOutputStream creates corrupted file

I have an app that creates multiple files using a byte array it gets from a Socket InputStream. The file saves perfectly when I save just one file, but if I save one file, then re-instantiate the file stream and save a different file, the first file gets corrupted and the second file is saved perfectly. I opened the two files in a text editor, and it seems that roughly the first 1/5th of the first file is blank spaces while the second file is full, and both have the same size (9,128,731 bytes). The following example is not the original scenario but reproduces the same corruption:
FileOutputStream outStream;
outStream = new FileOutputStream("/mnt/sdcard/testmp3.mp3");
File file = new File("/mnt/sdcard/test.mp3");
FileInputStream inStream = new FileInputStream(file);
byte[] buffer = new byte[9128731];
inStream.read(buffer);
outStream.write(buffer, 0, buffer.length);
inStream.close();
outStream.flush();
outStream.close();
outStream = null;
outStream = new FileOutputStream("/mnt/sdcard/testmp32.mp3");
outStream.write(buffer, 0, buffer.length);
inStream.close();
outStream.flush();
outStream.close();
outStream = null;
I tried this EXACT code in a regular Java application and both files were saved without a problem. Does anyone know why Android is doing this?
Any help would be GREATLY appreciated

As jtahlborn mentioned, you cannot assume that InputStream.read(byte[]) will always read as many bytes as you want. You should also avoid allocating such a large byte array and writing it out in one go; a small buffer in a loop does the same job with far less memory. You can handle both concerns by copying the file like this:
File inFile = new File("/mnt/sdcard/test.mp3");
File outFile = new File("/mnt/sdcard/testmp3.mp3");
FileInputStream inStream = new FileInputStream(inFile);
FileOutputStream outStream = new FileOutputStream(outFile);
byte[] buffer = new byte[65536];
int len;
while ((len = inStream.read(buffer)) != -1) {
outStream.write(buffer, 0, len);
}
inStream.close();
outStream.close();

I see some potential issues that can get you started debugging:
You are writing to the first output stream before you close the input stream. This is a bit weird.
You can't accurately gauge the similarity or difference between two binary files using a text editor. You need to look at the files in a hex editor (or better, Audacity).
I would use BufferedOutputStream as suggested by the Android docs:
out = new BufferedOutputStream(new FileOutputStream(file));
http://developer.android.com/reference/java/io/FileOutputStream.html
As a debugging technique, print the contents of buffer after the first write. Also, inStream.read(buffer) returns an int, the number of bytes actually read. I would compare this to buffer.length and make sure they are the same. Regardless, I would just call write(buffer) instead of write(buffer, 0, buffer.length) unless you have a really good reason.
-tjw

You are assuming that the read() call will read as many bytes as you want. That is incorrect. The method is free to read anywhere from 1 to buffer.length bytes. That is why you should always use the return value to determine how many bytes were actually read. There are plenty of stream tutorials out there that show how to read correctly from a Java stream (i.e. how to completely fill your buffer).
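To make the point about the return value concrete, here is a minimal sketch of a loop that keeps reading until the buffer is full. The names ReadFullySketch, readFully and trickle are made up for the example; trickle is a test stream that deliberately hands out at most 3 bytes per call so that the short reads become visible.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {

    // Keeps calling read() until the buffer is full; fails if the stream ends first.
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int filled = 0;
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n == -1) {
                throw new EOFException("stream ended after " + filled + " bytes");
            }
            filled += n;
        }
    }

    // Test stream (hypothetical): returns at most 3 bytes per read() call.
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes("US-ASCII");

        // A single call may fill only part of the buffer.
        byte[] once = new byte[data.length];
        int n = trickle(data).read(once);
        System.out.println("single read() returned " + n);

        // The loop fills the whole buffer regardless of how the stream chunks its data.
        byte[] all = new byte[data.length];
        readFully(trickle(data), all);
        System.out.println("readFully got " + new String(all, "US-ASCII"));
    }
}
```

The JDK offers the same behaviour through DataInputStream.readFully, so the hand-written loop is mainly useful for seeing what happens underneath.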

If anyone is having the same problem and wondering how to fix it: I found out the problem was being caused by my SD card. I had bought a 32GB Kingston SD card, and just yesterday I decided to try running the same code again, except using the internal storage instead, and everything worked perfectly. I also tried the stock 2GB SD card the device came with, and it also worked perfectly. I'm glad to know my code works, but a little frustrated that I spent 50 bucks on a defective memory card. Thanks for everyone's input.

How to print the content of a tar.gz file with Java?

I have to implement an application that prints the content of all files within a tar.gz file.
For example:
If I have three files like this in a folder called testx:
A.txt contains the words "God Save The queen"
B.txt contains the words "Ubi maior, minor cessat"
C.txt.gz is a file compressed with gzip that contains the file c.txt with the words "Hello America!!"
I then compress testx and obtain the compressed tar file testx.tar.gz.
With my Java application I would like to print to the console:
"God Save The queen"
"Ubi maior, minor cessat"
"Hello America!!"
I have implemented the ZIP version and it works well, but using the tar library from Apache Ant http://commons.apache.org/compress/, I noticed that it is not as easy as the Java ZIP utilities.
Could someone help me?
I have started searching the net to understand how to accomplish this, and so far I have the following code:
GZIPInputStream gzipInputStream=null;
gzipInputStream = new GZIPInputStream( new FileInputStream(fileName));
TarInputStream is = new TarInputStream(gzipInputStream);
TarEntry entryx = null;
while((entryx = is.getNextEntry()) != null) {
if (entryx.isDirectory()) continue;
else {
System.out.println(entryx.getName());
if ( entryx.getName().endsWith("txt.gz")){
is.copyEntryContents(out);
// out is a OutputStream!!
}
}
}
With the line is.copyEntryContents(out), it is possible to save the stream to a file by passing an OutputStream, but I don't want that! In the ZIP version, after getting the first entry (a ZipEntry), we can extract the stream from the compressed root folder, testx.tar.gz, and then create a new ZipInputStream and work with it to obtain the content.
Is it possible to do this with the tar.gz file?
Thanks.

Surfing the net, I came across an interesting idea at http://hype-free.blogspot.com/2009/10/using-tarinputstream-from-java.html.
After converting our TarEntry to a stream, we can adopt the same idea used with ZIP files:
InputStream tmpIn = new StreamingTarEntry(is, entryx.getSize());
// use BufferedReader to get one line at a time
BufferedReader gzipReader = new BufferedReader(
new InputStreamReader(
new GZIPInputStream(
tmpIn)));
String line;
// readLine() returns null at end of stream; ready() is not a reliable end-of-data test
while ((line = gzipReader.readLine()) != null) { System.out.println(line); }
gzipReader.close();
So with this code you can print the content of the file testx.tar.gz ^_^

To avoid writing to a File, you can use a ByteArrayOutputStream and call its public String toString(String charsetName) method with the correct encoding.
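A minimal sketch of the ByteArrayOutputStream idea, using only the JDK. Since the tar classes come from a third-party library, the entry is simulated here by a gzip-compressed byte array playing the role of C.txt.gz; the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class InMemoryEntrySketch {

    // Drains a stream into memory and decodes it with the given charset.
    static String toText(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toString(charsetName);
    }

    // For an entry that is itself gzip-compressed (like C.txt.gz), unwrap it first.
    static String gzippedToText(byte[] entryBytes, String charsetName) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(entryBytes))) {
            return toText(in, charsetName);
        }
    }

    // Only used to fabricate sample data for the demo.
    static byte[] gzip(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(charsetName));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] entry = gzip("Hello America!!", "UTF-8");
        System.out.println(gzippedToText(entry, "UTF-8"));
    }
}
```

In the tar case, the ByteArrayOutputStream would presumably be the stream passed to copyEntryContents(out), with its bytes then handed to a helper like gzippedToText for the .gz entries.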

Java, copying file to jre

I'm trying to create a small application that will copy some .jar files into the latest JRE.
Is there any way of finding out what this path is?
I've been looking at the File class and I've found several methods that create an empty file, but I didn't find anything that would help me copy a file to a given path.
Am I missing an important class?
Thanks

To copy files you can use the java.nio.channels.FileChannel class from the NIO package.
For example:
// Create channel for the source
FileChannel srcChannel = new FileInputStream("srcFileLocation").getChannel();
// Create channel for the destination
FileChannel dstChannel = new FileOutputStream("dstFileLocation").getChannel();
// Copy file contents from source to destination
dstChannel.transferFrom(srcChannel, 0, srcChannel.size());
// Close the channels
srcChannel.close();
dstChannel.close();

Firstly, before Java 7 there is no helper method for copying a file (Java 7 added java.nio.file.Files.copy). Secondly, it is inadvisable to copy into the JRE directory because you may not have sufficient permissions. To find the location of the JRE, use System.getProperty("java.home").
To copy:
byte[] buffer = new byte[16384];
InputStream in = new FileInputStream(src);
OutputStream out = new FileOutputStream(dst);
while (true) {
int n = in.read(buffer);
if (n == -1)
break;
out.write(buffer, 0, n);
}
in.close();
out.close();
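On Java 7 or later, the manual loop can be replaced by Files.copy. The following is a minimal sketch with hypothetical names (CopyFileSketch, copyInto); it copies into a temporary directory instead of the JRE directory and only prints java.home, given the permission concerns mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class CopyFileSketch {

    // Copies 'source' into 'targetDir', keeping the file name, and returns the new path.
    static Path copyInto(Path source, Path targetDir) throws IOException {
        Path target = targetDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Where the running JRE lives; printed only, nothing is written there.
        Path javaHome = Paths.get(System.getProperty("java.home"));
        System.out.println("java.home = " + javaHome);

        // Stand-in for a real .jar file
        Path source = Files.createTempFile("example", ".jar");
        Files.write(source, "dummy".getBytes(StandardCharsets.UTF_8));
        Path targetDir = Files.createTempDirectory("copy-target");
        System.out.println("copied to " + copyInto(source, targetDir));
    }
}
```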
