Getting specific file from ZipInputStream

I can go through ZipInputStream, but before starting the iteration I want to get a specific file that I need during the iteration. How can I do that?
ZipInputStream zin = new ZipInputStream(myInputStream)
while ((entry = zin.getNextEntry()) != null)
{
println entry.getName()
}

If the myInputStream you're working with comes from a real file on disk then you can simply use java.util.zip.ZipFile instead, which is backed by a RandomAccessFile and provides direct access to the zip entries by name. But if all you have is an InputStream (e.g. if you're processing the stream directly on receipt from a network socket or similar) then you'll have to do your own buffering.
You could copy the stream to a temporary file, then open that file using ZipFile, or if you know the maximum size of the data in advance (e.g. for an HTTP request that declares its Content-Length up front) you could use a BufferedInputStream to buffer it in memory until you've found the required entry.
BufferedInputStream bufIn = new BufferedInputStream(myInputStream);
bufIn.mark(contentLength);
ZipInputStream zipIn = new ZipInputStream(bufIn);
ZipEntry entry;
boolean foundSpecial = false;
while ((entry = zipIn.getNextEntry()) != null) {
if("special.txt".equals(entry.getName())) {
// do whatever you need with the special entry
foundSpecial = true;
break;
}
}
if(foundSpecial) {
// rewind
bufIn.reset();
zipIn = new ZipInputStream(bufIn);
// ....
}
(I haven't tested this code myself. You may find it necessary to put something like the commons-io CloseShieldInputStream between bufIn and the first zipIn, so that the first zip stream can be closed without closing the underlying bufIn before you've rewound it.)
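The temporary-file route mentioned above could look roughly like the following. This is a minimal sketch, not tested against any particular server setup: the class name TempFileZipLookup, the method readEntry and the sampleZip helper are made up for illustration, and readAllBytes() assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: spools a zip stream to disk so ZipFile can look entries up by name.
class TempFileZipLookup {

    /** Returns the bytes of the named entry, or null if the archive has no such entry. */
    static byte[] readEntry(InputStream zipData, String entryName) throws IOException {
        Path tmp = Files.createTempFile("lookup", ".zip");
        try {
            // Copy the whole stream to the temporary file first.
            Files.copy(zipData, tmp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                ZipEntry entry = zip.getEntry(entryName);
                if (entry == null) {
                    return null;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    return in.readAllBytes();
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Builds a small two-entry archive in memory, purely for the demo in main. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("other.txt"));
            out.write("other".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("special.txt"));
            out.write("special content".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readEntry(new ByteArrayInputStream(sampleZip()), "special.txt");
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

In a real application you would keep the ZipFile open and iterate over its entries after the lookup, rather than deleting the temporary file straight away as this sketch does.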

Use the getName() method on ZipEntry to find the file you want.
ZipInputStream zin = new ZipInputStream(myInputStream);
String myFile = "foo.txt";
ZipEntry entry;
while ((entry = zin.getNextEntry()) != null)
{
if (entry.getName().equals(myFile)) {
// process your file
// stop looking for your file - you've already found it
break;
}
}
From Java 7 onwards, you are better off using ZipFile instead of ZipInputStream if you only want one file and you have a file to read from:
ZipFile zfile = new ZipFile(aFile);
String myFile = "foo.txt";
ZipEntry entry = zfile.getEntry(myFile);
if (entry != null) {
// process your file
}

Look at Finding a file in zip entry
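ZipFile file = new ZipFile("file.zip");
InputStream is = searchImage("foo.png", file);
public InputStream searchImage(String name, ZipFile file) throws IOException
{
// entries() returns an Enumeration, so wrap it to use a for-each loop
for (ZipEntry e : Collections.list(file.entries())) {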
if (e.getName().endsWith(name)){
return file.getInputStream(e);
}
}
return null;
}

I'm late to the party, but none of the "answers" above actually answers the question, and the accepted "answer" suggests creating a temp file, which is inefficient.
Let's create a sample zip file:
seq 10000 | sed "s/^.*$/a/"> /tmp/a
seq 10000 20000 | sed "s/^.*$/b/"> /tmp/b
seq 20000 30000 | sed "s/^.*$/c/"> /tmp/c
zip /tmp/out.zip /tmp/a /tmp/b /tmp/c
So now we have the file /tmp/out.zip, which contains 3 files, each of them full of the character a, b or c.
Now let's read it:
public static void main(String[] args) throws IOException {
ZipInputStream zipStream = new ZipInputStream(new FileInputStream("/tmp/out.zip"));
ZipEntry zipEntry;
while ((zipEntry = zipStream.getNextEntry()) != null) {
String name = zipEntry.getName();
System.out.println("Entry: "+name);
if (name.equals("tmp/c")) {
byte[] bytes = zipStream.readAllBytes();
String s = new String(bytes);
System.out.println(s);
}
}
}
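Calling readAllBytes() looks odd while we are in the middle of processing a stream, but it works: ZipInputStream reports end-of-stream at the end of the current entry, so readAllBytes() (available since Java 9) returns only that entry's bytes. I also tested it on some images, where a failure would be more likely to show up. So it is probably just an unintuitive API.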
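To check the entry-boundary behaviour without touching the file system, here is a small self-contained sketch. The class EntryBoundaryDemo and its helpers are invented for this example; it builds an archive in memory and reads one entry from the middle of it. It assumes Java 9+ for readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of any library.
class EntryBoundaryDemo {

    /** Builds an in-memory archive from alternating name/content pairs. */
    static byte[] zipOf(String... namesAndContents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                out.putNextEntry(new ZipEntry(namesAndContents[i]));
                out.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Scans the stream and returns the content of the wanted entry, or null if it is absent. */
    static String readEntry(byte[] zipBytes, String wanted) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(wanted)) {
                    // Stops at the end of this entry, not at the end of the archive.
                    return new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOf("tmp/a", "aaa", "tmp/b", "bbb", "tmp/c", "ccc");
        System.out.println(readEntry(zip, "tmp/b")); // prints bbb
    }
}
```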

Related

XMLEventReader from ZipInputStream closes stream

I want to stream a ZIP file containing several very large (~1GByte) XML files. I could read the data from each zip file into a buffer and create a XMLStream from that - but to save on memory I would prefer to process the data on the fly.
@Test
public void zipStreamTest() throws IOException, XMLStreamException {
FileInputStream fis = new FileInputStream("archive.zip");
ZipInputStream zis = new ZipInputStream(fis);
ZipEntry ei;
while ((ei = zis.getNextEntry()) != null){
XMLEventReader xr = XMLInputFactory.newInstance().createXMLEventReader(zis);
while (xr.hasNext()) {
XMLEvent xe = xr.nextEvent();
// do some xml event processing..
}
zis.closeEntry();
}
zis.close();
}
The problem: I'm getting a java.io.IOException: Stream closed when executing zis.closeEntry();. When I remove that line, the same error is thrown at zis.getNextEntry() which closes previous entries if they're still open automatically.
It seems that my XML stream reader is breaking the stream at the end of the XML file so that the rest of the zip can't be processed.
Do I have an implementation error or is my conception of how streams work incorrect?
Note: To make this a minimal reproducible example, all you need is a zip file "archive.zip" which contains any valid XML file (no subdirectories inside the zip!). You can then run the snippet using JUnit.

You could try opening a separate InputStream for each entry using java.util.zip.ZipFile:
@Test
public void zipStreamTest() throws Exception {
ZipFile zipFile = new ZipFile("archive.zip");
Iterator<? extends ZipEntry> iterator = zipFile.entries().asIterator();
while (iterator.hasNext()) {
ZipEntry ze = iterator.next();
try (InputStream zis = zipFile.getInputStream(ze)) {
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(zis);
while (reader.hasNext()) {
XMLEvent xe = reader.nextEvent();
// do some xml event processing
}
reader.close();
}
}
}

I would recommend using ZipFile instead of ZipInputStream, as suggested in the answer by Alexandra Dudkina.
However, if you're processing a data stream while, for example, downloading it, and therefore want to keep using ZipInputStream, you should wrap it in a CloseShieldInputStream from Apache Commons IO (1) inside the getNextEntry() loop:
while ((ei = zis.getNextEntry()) != null) {
XMLEventReader xr = XMLInputFactory.newInstance().createXMLEventReader(new CloseShieldInputStream(zis));
// Process XML here
zis.closeEntry();
}
1) Or another similar helper class from a third-party library of your choice.
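If you would rather not add a dependency, the same idea can be sketched with a few lines of standard-library code. The names NonClosingXmlZipReader, NonClosingInputStream and countStartElements below are made up for illustration; the wrapper simply swallows close(), so whatever the XML reader does when it finishes a document, the underlying ZipInputStream stays open for the next entry.

```java
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class; not an API of the JDK or of Commons IO.
class NonClosingXmlZipReader {

    /** Passes all reads through but ignores close(), protecting the wrapped stream. */
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately a no-op: the owner of the wrapped stream closes it.
        }
    }

    /** Parses every entry as XML and returns the number of start elements per entry name. */
    static Map<String, Integer> countStartElements(InputStream zipData)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                XMLEventReader xr = factory.createXMLEventReader(new NonClosingInputStream(zis));
                int startElements = 0;
                while (xr.hasNext()) {
                    if (xr.nextEvent().isStartElement()) {
                        startElements++;
                    }
                }
                xr.close();
                counts.put(entry.getName(), startElements);
                zis.closeEntry();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Expects the path of a zip file that contains only XML files.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countStartElements(in));
        }
    }
}
```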

How to access every entry of CompressedSource in google cloud dataflow? And get Byte[] of each subfile

I have a gzip-compressed file on Google Storage that is made up of multiple text files. I need to access each subfile and run some operations on it, such as regular expression matching.
I can do the same thing on my local computer like this.
public static void untarFile(String filepath) throws IOException {
FileInputStream fin = new FileInputStream(filepath);
BufferedInputStream in = new BufferedInputStream(fin);
GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
// closing the outermost stream also closes the streams it wraps
try (TarArchiveInputStream tarInput = new TarArchiveInputStream(gzIn)) {
TarArchiveEntry entry = null;
while ((entry = tarInput.getNextTarEntry()) != null) {
byte[] fileContent = new byte[(int) entry.getSize()];
// note: a single read() call may fill only part of the buffer
tarInput.read(fileContent, 0, fileContent.length);
}
}
}
That way I can do further operations on fileContent, which is a byte[]. So I used CompressedSource on Google Cloud Dataflow and referred to its test code. It seems that I can only get the file byte by byte instead of the whole byte[] of each subfile, so I am wondering whether there is a way to do this on Google Cloud Dataflow.

TextIO does not support this directly, but you can create a new subclass of FileBasedSource to do this. You'll want to override isSplittable() to always return false, and then have readNextRecord() just read the entire file.

How do I parse some files in a tar.bz2 archive with Java

I have written a parser for an individual file, but can I read each file within the archive without actually extracting the archive to disk?

Following the examples in http://commons.apache.org/proper/commons-compress/examples.html, you have to wrap one InputStream in another:
// 1st InputStream from your compressed file
FileInputStream in = new FileInputStream(tarbz2File);
// wrap in a 2nd InputStream that deals with compression
BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
// wrap in a 3rd InputStream that deals with tar
TarArchiveInputStream tarIn = new TarArchiveInputStream(bzIn);
ArchiveEntry entry = null;
while (null != (entry = tarIn.getNextEntry())){
if (entry.getSize() < 1){
continue;
}
// use your parser here, the tar inputStream deals with the size of the current entry
parser.parse(tarIn);
}
tarIn.close();

getNextEntry() doesn't display folder as an entry?

Hi, I'm new to Android programming.
I'm trying to create a program to unzip a zipped file on my SD card, and I noticed something while debugging.
public void testZipOrder() throws Exception {
File file = new File(_zipFile);
zis = new ZipInputStream(new FileInputStream(file));
ZipEntry entry = null;
while ( (entry = zis.getNextEntry()) != null ) {
System.out.println( entry.getName());
}
}
}
This gives me the following output:
06-27 00:42:06.360: I/System.out(15402): weee.txt
06-27 00:42:06.360: I/System.out(15402): hi/bye.txt
06-27 00:42:06.360: I/System.out(15402): hi/hiwayne.txt
Isn't it supposed to give
weee.txt
hi/
hi/bye.txt
hi/hiwayne.txt
or something that displays its folder instead?

I tried this on my own environment using a test zip file created with 7zip and the following method:
public void testZipOrder() throws Exception {
File file = new File("zip.zip");
ZipInputStream zis = new ZipInputStream(new FileInputStream(file));
ZipEntry entry = null;
while ( (entry = zis.getNextEntry()) != null ) {
System.out.println( entry.getName());
}
zis.close();
}
Note this method is effectively identical to yours.
The resulting output was:
file1.txt
folder1/
folder1/file2.txt
folder1/folder2/
folder1/folder2/file3.txt
Which is, I believe, what you are looking for. As such I expect the problem is with the zip file itself, not your code. It is likely that your zip file does not contain an entry for the directory "hi/".
See here for a basic description of how zip files are structured.

The ZIP spec does not require any particular ordering of a file and its parent directories within the zip file, and in fact the parent directory entries can be absent altogether.
See https://bugs.openjdk.java.net/browse/JDK-8054027
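Since directory entries may be missing, code that needs the folder structure can derive it from the entry names instead of relying on getNextEntry() to report folders. A minimal sketch follows; the class name ImpliedDirectories and its method of are invented for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class for illustration only.
class ImpliedDirectories {

    /** Returns every directory implied by the entry names, sorted, each with a trailing slash. */
    static SortedSet<String> of(InputStream zipData) throws IOException {
        SortedSet<String> dirs = new TreeSet<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                String name = entry.getName();
                // Every prefix that ends in '/' is a directory, whether or not it has its own entry.
                int slash = name.indexOf('/');
                while (slash >= 0) {
                    dirs.add(name.substring(0, slash + 1));
                    slash = name.indexOf('/', slash + 1);
                }
            }
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a zip file as the only argument.
        try (InputStream in = new FileInputStream(args[0])) {
            of(in).forEach(System.out::println);
        }
    }
}
```

For the archive in the question, which lists only weee.txt, hi/bye.txt and hi/hiwayne.txt, this would report hi/ even though the zip file contains no entry for it.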

What is the idiomatic way to copy a ZipEntry into a new ZipFile?

I'm writing a tool to do some minor text replacement in a DOCX file, which is a zipped format. My method is to copy ZipEntry contents from entries in the original file into the modified file using a ZipOutputStream. For most DOCX files this works well, but occasionally I will encounter ZipExceptions regarding discrepancies between the contents I've written and the meta-information contained in the ZipEntry (usually a difference in compressed size).
Here's the code I'm using to copy over contents. I've stripped out error handling and document processing for brevity; I haven't had issues with the document entry so far.
ZipFile original = new ZipFile(INPUT_FILENAME);
ZipOutputStream outputStream = new ZipOutputStream(new FileOutputStream(OUTPUT_FILE));
Enumeration entries = original.entries();
byte[] buffer = new byte[512];
while (entries.hasMoreElements()) {
ZipEntry entry = (ZipEntry)entries.nextElement();
if ("word/document.xml".equalsIgnoreCase(entry.getName())) {
//perform special processing
}
else{
outputStream.putNextEntry(entry);
InputStream in = original.getInputStream(entry);
while (0 < in.available()){
int read = in.read(buffer);
outputStream.write(buffer,0,read);
}
in.close();
}
outputStream.closeEntry();
}
outputStream.close();
What is the proper or idiomatic way to directly copy ZipEntry objects from one ZipFile to another?

I have found a workaround that avoids the error. By creating a new ZipEntry with only the name field set I'm able to copy over contents without issue.
ZipFile original = new ZipFile(INPUT_FILENAME);
ZipOutputStream outputStream = new ZipOutputStream(new FileOutputStream(OUTPUT_FILE));
Enumeration entries = original.entries();
byte[] buffer = new byte[512];
while (entries.hasMoreElements()) {
ZipEntry entry = (ZipEntry)entries.nextElement();
if ("word/document.xml".equalsIgnoreCase(entry.getName())) {
//perform special processing
}
else{
// create a new empty ZipEntry
ZipEntry newEntry = new ZipEntry(entry.getName());
// outputStream.putNextEntry(entry);
outputStream.putNextEntry(newEntry);
InputStream in = original.getInputStream(entry);
// read until end of stream; available() is not a reliable end-of-data check
int read;
while ((read = in.read(buffer)) != -1) {
outputStream.write(buffer,0,read);
}
in.close();
}
outputStream.closeEntry();
}
outputStream.close();
However, this method loses any meta-information stored in the fields of the original ZipEntry (e.g. comment, extra). The API docs aren't clear on whether this is important.

To keep your meta-data for the zip entry, create it using ZipEntry's "copy constructor":
ZipEntry newEntry = new ZipEntry(entry);
You can then modify just the name or the comments etc. and everything else will be copied from the given entry.
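One caveat, as far as I can tell: the copy constructor also carries over the recorded compressed size, which is the field the exception in the question usually complained about. A common way around that is to reset the compressed size on the copy so that ZipOutputStream recomputes it. Below is a minimal sketch of a full copy loop under that assumption; the class name ZipCopy and the method copy are made up, and transferTo() needs Java 9+.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical example class for illustration only.
class ZipCopy {

    /** Copies every entry of source into target, keeping per-entry metadata such as comments. */
    static void copy(File source, File target) throws IOException {
        try (ZipFile original = new ZipFile(source);
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(target))) {
            Enumeration<? extends ZipEntry> entries = original.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // The copy constructor keeps name, time, comment, extra field and method.
                ZipEntry copy = new ZipEntry(entry);
                // Forget the old compressed size; the output stream works out its own.
                copy.setCompressedSize(-1);
                out.putNextEntry(copy);
                try (InputStream in = original.getInputStream(entry)) {
                    in.transferTo(out);
                }
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the source and target zip paths as arguments.
        copy(new File(args[0]), new File(args[1]));
    }
}
```

The special handling of word/document.xml from the question would go inside the loop, in place of the plain transferTo() call for that one entry.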
You could also look at Docmosis which can populate DocX files from Java.
