How to print the content of a tar.gz file with Java?


I have to implement an application that prints the content of all files inside a tar.gz file.
For example, if I have three files like this in a folder called testx:
A.txt contains the words "God Save The queen"
B.txt contains the words "Ubi maior, minor cessat"
C.txt.gz is a file compressed with gzip that contains the file c.txt with the words "Hello America!!"
I compress testx and obtain the compressed tar file testx.tar.gz.
With my Java application I would like to print to the console:
"God Save The queen"
"Ubi maior, minor cessat"
"Hello America!!"
I have implemented the ZIP version and it works well, but using the tar library from Apache Ant (http://commons.apache.org/compress/), I noticed that it is not as easy as the java.util.zip utilities.
Could someone help me?
I started looking around on the net to understand how to accomplish this, and so far I have the following code:
import java.io.FileInputStream;
import java.util.zip.GZIPInputStream;
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;

GZIPInputStream gzipInputStream = new GZIPInputStream(new FileInputStream(fileName));
TarInputStream is = new TarInputStream(gzipInputStream);
TarEntry entryx;
while ((entryx = is.getNextEntry()) != null) {
    if (entryx.isDirectory()) {
        continue; // skip directory entries such as testx/
    }
    System.out.println(entryx.getName());
    if (entryx.getName().endsWith("txt.gz")) {
        is.copyEntryContents(out); // out is an OutputStream!!
    }
}
With is.copyEntryContents(out) I can save the entry to a file by passing an OutputStream, but that is not what I want! In the ZIP version, after getting the first entry, a ZipEntry, I can extract its stream from the compressed root folder, testx.tar.gz, then create a new ZipInputStream and work with it to obtain the content.
Is it possible to do this with the tar.gz file?
Thanks.

Surfing the net, I found an interesting idea at: http://hype-free.blogspot.com/2009/10/using-tarinputstream-from-java.html.
After converting our TarEntry to a stream, we can use the same idea as with ZIP files:
InputStream tmpIn = new StreamingTarEntry(is, entryx.getSize()); // StreamingTarEntry comes from the blog post above
// use a BufferedReader to get one line at a time
BufferedReader gzipReader = new BufferedReader(
        new InputStreamReader(
                new GZIPInputStream(tmpIn)));
String line;
// readLine() returns null at the end of the data; ready() is not a reliable end-of-stream test
while ((line = gzipReader.readLine()) != null) {
    System.out.println(line);
}
gzipReader.close();
So with this code you can print the content of the files inside testx.tar.gz ^_^
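If you would rather not depend on Ant or on the blog's StreamingTarEntry helper, the tar layout is simple enough to read with java.util.zip alone. The sketch below is a hand-rolled minimal reader, not the Ant or Commons Compress API, and TarGzPrinter is a made-up name. It assumes a plain ustar/GNU archive whose entry names fit in the 100-byte name field (no pax or GNU long-name extension headers), entries small enough to hold in memory, and UTF-8 text; any entry whose name ends in ".gz" is gunzipped before printing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class TarGzPrinter {
    private static final int BLOCK = 512; // tar archives are made of 512-byte blocks

    /** Returns the text of every regular file in a .tar.gz stream (closes the stream). */
    public static List<String> readTexts(InputStream tarGz) throws IOException {
        List<String> texts = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                in.readFully(header);
                if (header[0] == 0) {
                    break; // a zero block marks the end of the archive
                }
                String name = field(header, 0, 100);                          // entry name
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // size in octal
                byte type = header[156];                                      // '0' or NUL = regular file
                byte[] data = new byte[(int) size];
                in.readFully(data);
                in.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]); // skip padding
                if (type == '0' || type == 0) {
                    byte[] content = name.endsWith(".gz") ? gunzip(data) : data;
                    texts.add(new String(content, StandardCharsets.UTF_8));
                }
            }
        }
        return texts;
    }

    /** Reads a NUL-terminated ASCII field from a tar header. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    /** Decompresses a gzip-compressed entry such as C.txt.gz. */
    private static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String text : readTexts(new FileInputStream(args[0]))) {
            System.out.println(text);
        }
    }
}
```

Run as java TarGzPrinter testx.tar.gz; for the archive described in the question it should print the text of A.txt, B.txt and the decompressed c.txt, in archive order. Entries are read strictly in sequence because tar has no central directory, so there is no random access as with ZipFile.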

To avoid writing to a file, use a ByteArrayOutputStream and then call its public String toString(String charsetName) method with the correct encoding.
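A minimal sketch of that suggestion using only java.io: copy the bytes of one entry into a ByteArrayOutputStream and decode them with toString(String charsetName). EntryText and entryToString are hypothetical names, and the helper assumes the InputStream you pass yields exactly the bytes of one entry. With Ant's TarInputStream, the copy loop would be replaced by is.copyEntryContents(out), as in the question; for a .txt.gz entry, the bytes still have to go through a GZIPInputStream first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class EntryText {
    /** Copies all remaining bytes of in into memory and decodes them with the named charset. */
    public static String entryToString(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // throws UnsupportedEncodingException (an IOException) for an unknown charset name
        return out.toString(charsetName);
    }
}
```

For example, entryToString(new GZIPInputStream(entryStream), "UTF-8") would return the text of c.txt, where entryStream is whatever stream gives you the raw bytes of C.txt.gz.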

Related

Write ZipEntry with given byte array in memory

I have a very confusing problem and hope that I can get some ideas here.
My problem is very simple, but I haven't found a solution yet.
I want to create a simple ZIP file with ZipEntry objects in it. The entries are created from a given byte array (saved in a Postgres DB with Hibernate).
When I put this byte array into my ZipOutputStream.write(..), the created ZIP file is always corrupt. What am I doing wrong?
The ZIP file is transferred to an FTP server afterwards.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
final ZipOutputStream zipOut = new ZipOutputStream(bos);
String filename = "test.zip";
for (final Attachment attachment : transportDoc.getAttachments()) {
    log.debug("Adding " + attachment.getFileName() + " to ZIP file /tmp/" + filename);
    ZipEntry ze = new ZipEntry(attachment.getFileName());
    zipOut.putNextEntry(ze);
    zipOut.write(attachment.getFileContent());
    zipOut.flush();
    zipOut.closeEntry();
}
zipOut.close();
org.apache.commons.io.FileUtils.writeByteArrayToFile(new File("/tmp/" + filename), bos.toByteArray());
I am confused, because when I replaced
zipOut.write(attachment.getFileContent()); //This is the byte array from db
with
zipOut.write("Bla bla".getBytes());
it worked!
But the byte array from the DB can't be corrupt, because it can be written to a file with
org.apache.commons.io.FileUtils.writeByteArrayToFile(new File("/tmp/test.png"), attachment.getFileContent());
with no problem. It is a correct file.
I hope you have some ideas left.
Thanks in advance.
EDIT:
I tried to repair the ZIP file offline, and then this message appears:
zip warning: no end of stream entry found: cglhnngplpmhipfg.png
(This png file is the file from the byte array.)
A simple unzip command outputs the following:
unzip created.zip
Archive: created.zip
error [created.zip]: missing 2 bytes in zipfile
(attempting to process anyway)
error [created.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
(attempting to re-compensate)
replace cglhnngplpmhipfg.png? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: cglhnngplpmhipfg.png
error: invalid compressed data to inflate
file #2: bad zipfile offset (local header sig): 24709
(attempting to re-compensate)
inflating: created.xml
EDIT 2:
When I write this file to the filesystem and add it to the ZIP via an InputStream, it doesn't work either! But the file on the filesystem is OK; I can open the image with no problem. It's very confusing.
File tmpAttachment = new File("/tmp/" + filename + attachment.getFileName());
FileUtils.writeByteArrayToFile(tmpAttachment, attachment.getFileContent());
FileInputStream inTmp = new FileInputStream(tmpAttachment);
int len;
byte[] buffer = new byte[1024];
while ((len = inTmp.read(buffer)) > 0) {
    zipOut.write(buffer, 0, len);
}
inTmp.close();
EDIT 3:
This problem only appears when I try to add "complex" files like png or pdf. If I put a txt-file in it, it works.

The problem was NOT in the Zip-Library itself.
It was the transmission to an external FTP Server with wrong mode. (Not binary).
Thanks all for your help.
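Since the ZIP code itself was fine, an in-memory round trip is a quick way to rule the library out in a case like this: zip an arbitrary byte array (here all 256 byte values, including the CR and LF bytes that a text-mode transfer rewrites), read it back with ZipInputStream and compare. This also explains why txt files survived and png/pdf files did not. InMemoryZip is a made-up name. For the upload itself, the fix was a binary-mode transfer; if the client is Apache Commons Net, that would be ftpClient.setFileType(FTP.BINARY_FILE_TYPE) before storing the file, which is an assumption about the setup in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class InMemoryZip {
    /** Writes one entry holding content into an in-memory ZIP archive. */
    public static byte[] zip(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(entryName));
            zipOut.write(content);
            zipOut.closeEntry();
        } // close() writes the central directory; bos is complete only after this
        return bos.toByteArray();
    }

    /** Reads back the bytes of the first entry of an in-memory ZIP archive. */
    public static byte[] unzipFirst(byte[] zip) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zin.getNextEntry() == null) {
                throw new IOException("empty archive");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = zin.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

If the round trip succeeds locally but the file on the server is broken, the damage happens in transport, not in ZipOutputStream.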

Try closeEntry() before flush(). Also you can try to explicitly specify the size of the entry using ze.setSize(attachment.getFileContent().length).

Read zip or jar file without unzipping it first

I'm not looking for any answers that involve opening the zip file in a zip input or output stream. My question is: is it possible in Java to simply open a jar file like any other file (using a buffered reader/writer), read its contents, and write them somewhere else? For example:
import java.io.*;

public class zipReader {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader((System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/foo.zip")));
        BufferedWriter bw = new BufferedWriter(new FileWriter((System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/baf.zip")));
        char[] ch = new char[180000];
        while (br.read(ch) > 0) {
            bw.write(ch);
            bw.flush();
        }
        br.close();
        bw.close();
    }
}
This works on some small zip/jar files, but most of the time it just corrupts them, making it impossible to unzip or execute them. I have found that setting the size of the char[] to 1 does not corrupt the file itself, just everything in it: I can open the file in an archive program, but all its entries are corrupted and unusable. Does anyone know how to write the above code so it won't corrupt the file? Also, here is a line from a jar file I tested this on that became corrupted:
nèñà?G¾Þ§V¨ö—‚?‰9³’?ÀM·p›a0„èwåÕüaEÜµp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM?€ÔÙÆC†ÑÝ×ok?ý—¥úûFs.‡
vs the original:
nèñàG¾Þ§V¨ö—‚‰9³’ÀM·p›a0„èwåÕüaEÜµp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM€ÔÙÆC†ÑÝ×oký—¥úûFs.‡
As you can see, either the reader or the writer adds ?'s into the file, and I can't figure out why. Again, I don't want any answers telling me to open it entry by entry; I already know how to do that. If anyone knows the answer to my question, please share it.

Why would you want to convert binary data to chars? I think it is much better to use InputStream/OutputStream with byte arrays. See http://www.javapractices.com/topic/TopicAction.do?Id=245
for examples.

bw.write(ch) writes the entire array, but read() only fills in part of it and returns a number telling you how much. This has nothing to do with zip files, just with how IO works.
You need to change your code to look more like:
int charsRead = br.read(buffer);
if (charsRead >= 0) {
    bw.write(buffer, 0, charsRead);
} else {
    // whatever I do at the end.
}
However, this is only half of your problem. You are also converting bytes to characters and back again, which corrupts the data in other ways: byte sequences that cannot be decoded in the platform's default charset are replaced, which is likely where the ?'s come from. Stick to streams.
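Putting both fixes together, a minimal byte-based copy sketch (ByteCopy is a hypothetical name) that honours the count returned by read() and never goes through a Reader or Writer, so no charset decoding can alter the data:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteCopy {
    /** Copies all bytes from in to out, writing only as many bytes as each read() returned. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead); // never write the stale tail of the buffer
            total += bytesRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source (e.g. foo.zip), args[1] = target (e.g. baf.zip)
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]))) {
            copy(in, out);
        }
    }
}
```

try-with-resources (Java 7+) closes both streams even if the copy fails, and closing the BufferedOutputStream flushes it, so no explicit flush() per iteration is needed.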

See the ZipInputStream and ZipOutputStream classes.
Edit: use plain FileInputStream and FileOutputStream. I suspect there may be some issues when the reader interprets the bytes as characters.
See also: Standard concise way to copy a file in Java? Since you want to copy the whole file, there is nothing special about it being a zip file.
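Since the file is copied as-is, java.nio.file.Files.copy (Java 7+) avoids the manual loop entirely and never touches characters. A sketch; CopyZip is a made-up name, the Desktop paths are the placeholders from the question, and REPLACE_EXISTING is a choice (drop it if you would rather fail when baf.zip already exists).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyZip {
    /** Byte-for-byte copy; works for any file type, zip and jar included. */
    public static void copy(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path desktop = Paths.get(System.getProperty("user.home"), "Desktop");
        copy(desktop.resolve("foo.zip"), desktop.resolve("baf.zip"));
    }
}
```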

How to create a new java.io.File in memory? [duplicate]

How can I create a new File (from java.io) in memory, not on the hard disk?
I am using the Java language. I don't want to save the file on the hard drive.
I'm faced with a bad API (java.util.jar.JarFile). It expects a File file or a String filename. I have no file (only byte[] content) and could create a temporary file, but that's not a beautiful solution. I need to validate the digest of a signed jar.
byte[] archiveContent = getContent();
File tempFile = File.createTempFile("tmp", ".tmp");
try (FileOutputStream fos = new FileOutputStream(tempFile)) {
    fos.write(archiveContent);
}
JarFile jarFile = new JarFile(tempFile);
Manifest manifest = jarFile.getManifest();
Any examples of how to achieve getting manifest without creating a temporary file would be appreciated.

How can I create a new File (from java.io) in memory, not on the hard disk?
Maybe you are confusing File and Stream:
A File is an abstract representation of file and directory pathnames. Using a File object, you can access the file metadata in a file system, and perform some operations on files on this filesystem, like delete or create the file. But the File class does not provide methods to read and write the file contents.
To read and write from a file, you are using a Stream object, like FileInputStream or FileOutputStream. These streams can be created from a File object and then be used to read from and write to the file.
You can create a stream based on a byte buffer which resides in memory, by using a ByteArrayInputStream and a ByteArrayOutputStream to read from and write to a byte buffer in a similar way you read and write from a file. The byte array contains the "File's" content. You do not need a File object then.
The File... and ByteArray... streams share common superclasses: the input streams inherit from java.io.InputStream and the output streams from java.io.OutputStream. So you can use the common superclass to hide whether you are reading from (or writing to) a file or a byte array.
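To make the common-superclass point concrete, a sketch (AnySource and countBytes are hypothetical names) of a method that only knows about InputStream and therefore works unchanged with a FileInputStream or a ByteArrayInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AnySource {
    /** Counts the bytes of any InputStream, whether it reads a file or an in-memory byte array. */
    public static long countBytes(InputStream in) throws IOException {
        long count = 0;
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            count += n;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = {1, 2, 3};
        System.out.println(countBytes(new ByteArrayInputStream(content))); // in-memory "file"
        try (InputStream file = new FileInputStream(args[0])) {
            System.out.println(countBytes(file));                           // real file on disk
        }
    }
}
```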

It is not possible to create a java.io.File that holds its content in (Java heap) memory *.
Instead, normally you would use a stream. To write to a stream, in memory, use:
OutputStream out = new ByteArrayOutputStream();
out.write(...);
But unfortunately, a stream can't be used as input for java.util.jar.JarFile, which as you mention can only use a File or a String containing the path to a valid JAR file. I believe using a temporary file like you currently do is the only option, unless you want to use a different API.
If you are okay using a different API, there is conveniently a class in the same package, named JarInputStream you can use. Simply wrap your archiveContent array in a ByteArrayInputStream, to read the contents of the JAR and extract the manifest:
try (JarInputStream stream = new JarInputStream(new ByteArrayInputStream(archiveContent))) {
    Manifest manifest = stream.getManifest();
}
*) It's obviously possible to create a full file-system that resides in memory, like a RAM-disk, but that would still be "on disk" (and not in Java heap memory) as far as the Java process is concerned.
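A self-contained version of the JarInputStream approach above, wrapped in a helper so it can be tested; InMemoryJar and manifestOf are hypothetical names. One caveat: JarInputStream only picks up the manifest if META-INF/MANIFEST.MF is the first entry (or the second, after a META-INF/ directory entry), which is how the jar tool and JarOutputStream write it; otherwise getManifest() returns null. For the signature check mentioned in the question, the JarInputStream(InputStream, boolean verify) constructor exists, but verifying signed entries is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.jar.JarInputStream;
import java.util.jar.Manifest;

public class InMemoryJar {
    /** Returns the manifest of a JAR held entirely in memory, or null if it has none. */
    public static Manifest manifestOf(byte[] archiveContent) throws IOException {
        try (JarInputStream stream = new JarInputStream(new ByteArrayInputStream(archiveContent))) {
            return stream.getManifest(); // parsed object, still usable after the stream is closed
        }
    }
}
```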

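You could use an in-memory filesystem, such as Jimfs.
Here's a usage example from their README:
import com.google.common.collect.ImmutableList;
import com.google.common.jimfs.Configuration;
import com.google.common.jimfs.Jimfs;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;

FileSystem fs = Jimfs.newFileSystem(Configuration.unix());
Path foo = fs.getPath("/foo");
Files.createDirectory(foo);
Path hello = foo.resolve("hello.txt"); // /foo/hello.txt
Files.write(hello, ImmutableList.of("hello world"), StandardCharsets.UTF_8);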

I think a temporary file can be another solution for that.
File tempFile = File.createTempFile(prefix, suffix, null);
try (FileOutputStream fos = new FileOutputStream(tempFile)) {
    fos.write(byteArray);
}
There is an answer about that here.
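A slightly fuller sketch of the temporary-file route, which closes every stream and deletes the file afterwards; TempJar and manifestViaTempFile are made-up names, and it assumes Java 7+ for java.nio.file. Closing the JarFile before deleting matters on Windows, where an open file cannot be removed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class TempJar {
    /** Writes the bytes to a temporary file, reads the manifest via JarFile, then deletes the file. */
    public static Manifest manifestViaTempFile(byte[] archiveContent) throws IOException {
        Path tempFile = Files.createTempFile("tmp", ".jar");
        try {
            Files.write(tempFile, archiveContent); // opens, writes and closes the file in one call
            try (JarFile jarFile = new JarFile(tempFile.toFile())) {
                return jarFile.getManifest();
            }
        } finally {
            Files.deleteIfExists(tempFile); // the JarFile is already closed at this point
        }
    }
}
```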

The compressed (zipped) folder is invalid Java

I'm trying to zip files from the server into a folder using ZipOutputStream.
After the archive is downloaded, it can't be opened by double-clicking it: the error "The compressed (zipped) folder is invalid" occurs. But if I open it from the context menu -> 7zip -> open file, it works normally. What could be the reason for the problem?
String sourceFileName = "./file.txt";
File sourceFile = new File(sourceFileName);
try {
    // set the content type and the filename
    response.setContentType("application/zip");
    response.addHeader("Content-Disposition", "attachment; filename=" + sourceFileName + ".zip");
    response.setContentLength((int) sourceFile.length());
    // get a ZipOutputStream, so we can zip our files together
    ZipOutputStream outZip = new ZipOutputStream(response.getOutputStream());
    // Add ZIP entry to output stream.
    outZip.putNextEntry(new ZipEntry(sourceFile.getName()));
    int length = 0;
    byte[] bbuf = new byte[(int) sourceFile.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(sourceFile));
    while ((in != null) && ((length = in.read(bbuf)) != -1)) {
        outZip.write(bbuf, 0, length);
    }
    outZip.closeEntry();
    in.close();
    outZip.flush();
    outZip.close();
} catch (IOException e) {
    // error handling not shown
}

7Zip can open a wide variety of zip formats, and is relatively tolerant of oddities. Windows double-click requires a relatively specific format and is far less tolerant.
You need to look up the zip format and then look at your file (and "good" ones) with a hex editor (such as Hex Editor Neo), to see what may be wrong.
(One possibility is that you're using the wrong compression algorithm. And there are several other variations to consider as well, particularly whether or not you generate a "directory".)

It could be that a close is missing. It could be that the path encoding in the zip cannot be handled by Windows. It might be that Windows has difficulty with the directory structure, or that a path name contains a (back)slash. So it is detective work, trying different files. If you stream the zip directly to the HTTP response, finish() has to be called instead of close().
After the code was posted:
The problem is that setContentLength is given the original file size. If the length is set at all, it must be the compressed size.
DataInputStream is not needed, and one should rather do a readFully here.
response.setContentType("application/zip");
response.addHeader("Content-Disposition", "attachment; filename=file.zip");
//Path sourcePath = sourceFile.toPath();
Path sourcePath = Paths.get(sourceFileName);
ZipOutputStream outZip = new ZipOutputStream(response.getOutputStream(),
        StandardCharsets.UTF_8);
outZip.putNextEntry(new ZipEntry(sourcePath.getFileName().toString()));
Files.copy(sourcePath, outZip);
outZip.closeEntry();
Either finish or close the zip at the end:
outZip.finish();
//outZip.close();
I am not sure (about the best code style) whether to close the response output stream oneself.
But when not closing, finish() must be called; flush() will not suffice, because finish() writes the remaining data (the zip's central directory) at the end.
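A servlet-free sketch of the finish() approach: it writes one file into a ZipOutputStream on top of an arbitrary OutputStream, calls finish() so the central directory gets written, and leaves the underlying stream open for its owner. The OutputStream parameter stands in for response.getOutputStream(); ZipToStream and writeZip are made-up names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipToStream {
    /** Writes sourcePath as a single-entry ZIP into out; out stays open for its owner to close. */
    public static void writeZip(Path sourcePath, OutputStream out) throws IOException {
        ZipOutputStream outZip = new ZipOutputStream(out, StandardCharsets.UTF_8);
        outZip.putNextEntry(new ZipEntry(sourcePath.getFileName().toString()));
        Files.copy(sourcePath, outZip); // the file's bytes become the entry's (deflated) data
        outZip.closeEntry();
        outZip.finish();                // writes the central directory without closing out
        out.flush();
    }
}
```

In a servlet this would be called as writeZip(Paths.get(sourceFileName), response.getOutputStream()), with no setContentLength call, since the compressed size is not known up front.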
For file names with, for instance, Cyrillic letters, it is best to use a Unicode charset like UTF-8 (since Java 7, ZipOutputStream(OutputStream) already defaults to UTF-8; passing it explicitly documents the intent). In fact, let UTF-8 be the Esperanto standard world-wide.
A last note: if there is only one file, one could use GZIPOutputStream for file.txt.gz, or query the browser's capabilities (the Accept-Encoding request header) and deliver it compressed as file.txt.
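For the single-file case, a sketch using java.util.zip.GZIPOutputStream; as with the ZIP, finish() completes the compressed data and the gzip trailer without closing the underlying stream. GzipToStream and writeGzip are hypothetical names, and serving the result as file.txt with a Content-Encoding: gzip header is an HTTP-level choice not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipToStream {
    /** Writes the gzip-compressed bytes of one file (e.g. for file.txt.gz) into out, leaving out open. */
    public static void writeGzip(Path sourcePath, OutputStream out) throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(out);
        Files.copy(sourcePath, gz);
        gz.finish(); // writes the remaining compressed data and the gzip trailer
        out.flush();
    }
}
```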

How to create a java.io.File from a ByteArrayOutputStream?

I'm reading a bunch of files from an FTP. Then I need to unzip those files and write them to a fileshare.
I don't want to write the files first and then read them back and unzip them. I want to do it all in one go. Is that possible?
This is my code
FTPClient fileclient = new FTPClient();
..
ByteArrayOutputStream out = new ByteArrayOutputStream();
fileclient.retrieveFile(filename, out);
??????? //How do I get my out-stream into a File-object?
File file = new File(?);
ZipFile zipFile = new ZipFile(file,ZipFile.OPEN_READ);
Any ideas?

You should use a ZipInputStream wrapped around the InputStream returned from FTPClient's retrieveFileStream(String remote).
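A minimal sketch of that approach, written against a plain InputStream so it is not tied to any FTP library: it walks the ZipInputStream entry by entry and writes each file below a target directory (your file share), never creating an intermediate zip file. StreamUnzipper and unzipTo are made-up names. With Commons Net, the stream would come from retrieveFileStream(remote), and the client expects completePendingCommand() to be called once the stream has been consumed; that wiring is left out. The path check rejects entry names such as ../x that would escape the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class StreamUnzipper {
    /** Extracts every entry of the ZIP data in in below targetDir; returns the number of files written. */
    public static int unzipTo(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int files = 0;
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // zin reports end-of-stream at the end of the current entry, so this copies one file
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```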

You don't need to create the File object.
If you want to save the file you should pipe the stream directly into a ZipOutputStream
ByteArrayOutputStream out = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(out);
// do whatever with your zip file
If, instead, you want to open the just-retrieved file, work with the ZipInputStream:
new ZipInputStream(fileclient.retrieveFileStream(remote));
Just read the ZipInputStream and ZipOutputStream docs.

I think you want:
ZipInputStream zis = new ZipInputStream( new ByteArrayInputStream( out.toByteArray() ) );
Then read your data from the ZipInputStream.
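Spelled out, reading the entries back from the bytes that retrieveFile wrote into the ByteArrayOutputStream could look like this sketch (ZipFromMemory and entryNames are hypothetical names). It keeps the whole archive in memory, which is fine for reasonably small downloads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipFromMemory {
    /** Lists the entry names of a ZIP archive that was downloaded into out. */
    public static List<String> entryNames(ByteArrayOutputStream out) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(out.toByteArray()))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName()); // the entry's bytes could be read from zis here
            }
        }
        return names;
    }
}
```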

As others have pointed out, for what you are trying to do, you don't need to write the downloaded ZIP "file" to the file system at all.
Having said that, I'd like to point out a misconception in your question, that is also reflected in some of the answers.
In Java, a File object does not really represent a file at all. Rather, it represents a file name or "path". While this name or path often corresponds to an actual file, this doesn't need to be the case.
This may sound a bit like hair-splitting, but consider this scenario:
File dir = new File("/tmp/foo");
boolean isDirectory = dir.isDirectory();
if (isDirectory) {
// spend a long time computing some result
...
// create an output file in 'dir' containing the result
}
Now, if instances of the File class represented objects in the file system, you'd expect the code that creates the output file to succeed (modulo permissions). But in fact, the create could fail because something deleted "/tmp/foo" or replaced it with a regular file.
It must be said that some of the methods on the File class do seem to assume that the File object does correspond to a real filesystem entity. Examples are the methods for getting a file's size or timestamps, or for listing the names in a directory. However, in each case, the method is specified to throw an exception if the actual file does not exist or has the wrong type for the operation requested.

Well, you could just create a FileOutputStream and then write the data to it:
FileOutputStream fos = new FileOutputStream(filename);
try {
out.writeTo(fos);
} finally {
fos.close();
}
Then just create the File object:
File file = new File(filename);
You need to understand that a File object doesn't represent any real data on disk - it's just a filename, effectively. The file doesn't even have to exist. If you want to actually write data, that's what FileOutputStream is for.
EDIT: I've just spotted that you didn't want to write the data out first - but that's what you've got to do, if you're going to pass the file to something that expects a genuine file with data in.
If you don't want to do that, you'll have to use a different API which doesn't expect a file to exist... as per Qwerky's answer.

Just change the ByteArrayOutputStream to a FileOutputStream.
