Find specific file in 7zip archive - java

I'm using Apache Commons Compress to parse entries in a 7zip archive. I need to be able to find a specific file (e.g. "thisfile.xml"). I was wondering whether there is a better way of doing it than looping through every entry in the archive.
The sort of thing I'm currently doing is this:
SevenZFile archive = new SevenZFile(new File("chosen 7zip file"));
for (SevenZArchiveEntry entry : archive.getEntries())
{
    if (entry.getName().equals("Sites.xml"))
    {
        //Do stuff
        break;
    }
}
I don't particularly want to iterate over all entries in the archive, as there could be a lot of them.
Any ideas would be much appreciated

Related

Looping over .tar folder to go multilevel extraction

I have a .tar archive that contains a .tar.gz archive, and inside that there may or may not be further .gz or .tar files.
I want to keep checking and extracting up to 10 levels deep.
That means: extract the .tar, find the .tar.gz inside it, extract that, and carry on until there is nothing left to extract.
I can handle one level using Apache Commons: when I extract the .tar, I can see the .tar.gz file in the destination folder.
My question is how to loop this logic so that it checks and extracts up to ten levels deep.
So your question is how to allow up to ten levels of recursion. How about this?
void unarchive(File archive, File targetDirectory) {
    unarchive(archive, targetDirectory, 10);
}
void unarchive(File archive, File targetDirectory, int depth) {
    // perform the unarchive step that you already have
    if (depth > 0) {
        // loop over the extracted files; for each one that is itself an archive:
        unarchive(foundarchive, new File(targetDirectory, foundarchive.getName()), depth - 1);
    }
}
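The recursion outline above can be fleshed out roughly as follows. This is only a sketch under stated assumptions: it works on nested .zip archives with the JDK's java.util.zip rather than on .tar/.tar.gz, because the JDK ships no tar reader; with Apache Commons Compress the same structure should carry over to its tar and gzip streams. The class name NestedUnarchiver and the ".extracted" directory suffix are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedUnarchiver {

    /** Extracts the archive and any archives nested inside it, at most ten levels deep. */
    public static void unarchive(Path archive, Path targetDirectory) throws IOException {
        unarchive(archive, targetDirectory, 10);
    }

    /** depth is the number of further nesting levels that may still be opened. */
    public static void unarchive(Path archive, Path targetDirectory, int depth) throws IOException {
        Path root = targetDirectory.toAbsolutePath().normalize();
        Files.createDirectories(root);
        List<Path> nested = new ArrayList<>();

        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../evil" that would land outside the target.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                // Assumption: nested archives are recognised by file extension only.
                if (entry.getName().endsWith(".zip")) {
                    nested.add(out);
                }
            }
        }

        if (depth > 0) {
            for (Path inner : nested) {
                // Unpack next to the inner archive, into a directory named after it.
                Path innerTarget = inner.resolveSibling(inner.getFileName() + ".extracted");
                unarchive(inner, innerTarget, depth - 1);
            }
        }
    }
}
```

Calling NestedUnarchiver.unarchive(outer, target) extracts the outer archive and follows nested archives for up to ten further levels; passing 0 as the depth extracts only the outer archive and leaves inner archives as plain files.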

Programmatically Extract Single Specific File From 7zip Archive - Java - Linux

I would really appreciate your input on the below scenario please.
The requirements:
- I have a 7zip archive file with several thousands of files in it
- I have a java application running on linux that is required to retrieve individual files from the 7 zip file
I would like to retrieve a file from the archive by its path (e.g. my7zFile.7z/file1.pdf) without having to iterate through all the files in the archive and comparing file names.
I would like to avoid having to extract all files from the archive before running the search (the uncompressed archive is several TB).
I had a look at the 7zip Java binding, specifically the IInArchive interface; its only extract method seems to work via file index, not via file name:
http://sevenzipjbind.sourceforge.net/javadoc/net/sf/sevenzipjbinding/IInArchive.html
Do you know of any other libraries that could help me with this use case or am I overlooking a way of doing this with 7zip jbinding?
Thank you
Kind regards,
Tobi
Sadly it appears the API doesn't provide enough to fulfill all your requirements. In order to extract a single file it appears you need to walk the archive index. The simplified interface to the archive makes this much easier:
The ISimpleInArchive interface provides:
ISimpleInArchiveItem[] getArchiveItems()
This allows you to retrieve a list of the items in the archive.
The ISimpleInArchiveItem interface provides the method:
java.lang.String getPath()
Hence you can walk the archiveItems comparing on path. Granted this is against your requirements.
However, note that this walks the index table and does not extract any files until requested. Once you have the item you're after, you can use:
ExtractOperationResult extractSlow(ISequentialOutStream SequentialOutStream)
on the item you have found to actually extract it.
Looking at the 7z file format (note this is not the official site of 7zip), the header information is all at the end of the file with the Signature header at the start of the file giving an offset to the start of the header info. So provided the SevenZip bindings are written nicely, your search will at most read the start of the file (SignatureHeader) to find the offset to the HeaderInfo section, then walk the HeaderInfo section in order to build up the file list required in getArchiveItems(). Only once you have the item you need will it shift back to the index of the actual stream for the file you want extracted (most likely when you call extractSlow).
So whilst not all your requirements are met, the overhead of the search/compare required is limited to only searching the header info of the archive.
I once wrote some code to read all the files and folders in a zip file. I had a deep hierarchy of (text) files and folders inside the zip file. I am not sure whether it will help you, but here is the skeleton of the code.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

ZipFile zipFile = new ZipFile(filepath); // filepath of the zip file
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    if (entry.isDirectory()) { // found a directory inside the zip file
        // write your code here
    } else {
        InputStream stream = zipFile.getInputStream(entry);
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        // write your code to read the content of the file
    }
}
You can modify the code to reach your desired file in the zip, but with this approach I don't think you can access the file directly; you have to walk through the entries of the zip archive until you find it. Note that ZipFile enumerates entries in the order in which they are stored in the archive's central directory, which for an archive built by walking a folder tree typically resembles a depth-first (DFS) traversal. You will find detailed, relevant examples on the web.
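As a side note for plain zip archives (this does not apply to 7z): java.util.zip.ZipFile also provides getEntry(String), which looks an entry up by its full path inside the archive, so the explicit loop is not strictly needed there. A minimal sketch, assuming Java 9 or later for InputStream.readAllBytes; the class name ZipEntryLookup and the method readEntry are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipEntryLookup {

    /** Returns the bytes of the named entry, or null if the archive has no such entry. */
    public static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            // Lookup by the entry's full path inside the archive, e.g. "docs/file1.txt".
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                return in.readAllBytes(); // requires Java 9+
            }
        }
    }
}
```

Entry names use forward slashes regardless of platform, so a lookup would look like ZipEntryLookup.readEntry(zipPath, "docs/file1.txt").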

Read tgz w/out unpacking it onto computer or Unpack as temp & delete when program closes?

Hey guys, I'm currently using jarchivelib. I'm stuck on figuring out a way to read the archive without having to use the extract method, because it writes an unpacked copy to disk. Example:
File archive = new File("/home/jack/archive.zip");
File destination = new File("/home/jack/archive");
Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.ZIP);
archiver.extract(archive, destination);
I want to make it so I don't have to unpack the archive to read the files. If there is no way to do that, I'm guessing I'll have to replace JFrame.setDefaultCloseOperation with a custom close handler that deletes the files? Or is there a better way of handling temp files?
If all you want to do is extract the file, why not use Java's built-in zip support, or Zip4j if the archive is password protected? These libraries support streams, so you can read the contents of a file without writing it to disk through a file stream.
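The stream-based reading mentioned above might look like this with the JDK's built-in zip support. It is a sketch only: it handles .zip data, not .tgz (the JDK has GZIPInputStream but no tar reader, so a .tgz would still need a library for the tar layer), and it assumes the entries are small enough to hold in memory. The class name InMemoryZipReader and the method readAll are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class InMemoryZipReader {

    /** Reads every file entry of the zip stream into memory; nothing is written to disk. */
    public static Map<String, byte[]> readAll(InputStream zipData) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((n = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        }
        return contents;
    }
}
```

Because no files are created, there is nothing to clean up when the window closes.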
As of version 0.4.0, the jarchivelib Archiver API supports streaming an archive rather than extracting it directly onto the filesystem.
ArchiveStream stream = archiver.stream(archive);
ArchiveEntry entry;
while((entry = stream.getNextEntry()) != null) {
// access each archive entry individually using the stream
// or extract it using entry.extract(destination)
// or fetch meta-data using entry.getName(), entry.isDirectory(), ...
}
stream.close();
When the stream is pointing to an entry after a call to getNextEntry, you can use the stream.read methods just as you would when reading an individual entry.

append a file to zip using TrueZip

I want to use the TrueZip library to append a file to an existing archive (not by
unpacking, adding a file and repacking - the new versions are supposed to have this
feature), but I find it a bit difficult to understand the API.
Could someone more knowledgeable than me please suggest how to do this in a few lines?
Google is your friend:
Appending entries to ZIP files with TrueZIP 7.3
class MyApplication extends TApplication {
    @Override
    protected void setup() {
        // This should obtain the global configuration.
        TConfig config = TConfig.get();
        // Set FsOutputOption.GROW for appending-to rather than reassembling an
        // archive file.
        config.setOutputPreferences(
                config.getOutputPreferences().set(FsOutputOption.GROW));
    }
    ...
}

How to walk through Java class resources?

I know we can do something like this:
Class.class.getResourceAsStream("/com/youcompany/yourapp/module/someresource.conf")
to read the files that are packaged within our jar file.
I have googled it a lot and I am surely not using the proper terms; what I want to do is to list the available resources, something like this:
Class.class.listResources("/com/yourcompany/yourapp")
That should return a list of resources that are inside the package com.yourcompany.yourapp.*
Is that possible? Any ideas on how to do it in case it can't be done as easily as I showed?
Note: I know it is possible to know where your jar is and then open it and inspect its contents to achieve it. But, I can't do it in the environment I am working in now.
For resources in a JAR file, something like this works:
URL url = MyClass.class.getResource("MyClass.class");
String scheme = url.getProtocol();
if (!"jar".equals(scheme))
throw new IllegalArgumentException("Unsupported scheme: " + scheme);
JarURLConnection con = (JarURLConnection) url.openConnection();
JarFile archive = con.getJarFile();
/* Search for the entries you care about. */
Enumeration<JarEntry> entries = archive.entries();
while (entries.hasMoreElements()) {
JarEntry entry = entries.nextElement();
if (entry.getName().startsWith("com/y/app/")) {
...
}
}
You can do the same thing with resources "exploded" on the file system, or in many other repositories, but it's not quite as easy. You need specific code for each URL scheme you want to support.
In general you can't get a list of resources like this. Some classloaders may not even be able to support it: imagine a classloader that can fetch individual files from a web server, where the web server doesn't have to support listing the contents of a directory.
For a jar file you can load the contents of the jar file explicitly, of course.
(This question is similar, btw.)
The most robust mechanism for listing all resources in the classpath is currently to use this pattern with ClassGraph, because it handles the widest possible array of classpath specification mechanisms, including the new JPMS module system. (I am the author of ClassGraph.)
List<String> resourceNames;
try (ScanResult scanResult = new ClassGraph()
.whitelistPaths("com/yourcompany/yourapp")
.scan()) {
resourceNames = scanResult.getAllResources().getNames();
}
I've been looking for a way to list the contents of a jar file using the classloaders, but unfortunately this seems to be impossible. What you can do instead is open the jar as a zip file and get the contents that way. You can use the standard ways to read the entry names from the jar file and then use the classloader to read the resources themselves.
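Opening the jar as a zip file and collecting the entry names could be sketched like this, using java.util.jar.JarFile from the JDK. It assumes you already know the path of the jar on disk; the class name JarResourceLister and the method listResources are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class JarResourceLister {

    /** Lists the names of all non-directory entries whose path starts with the given prefix. */
    public static List<String> listResources(Path jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream()
                    .filter(entry -> !entry.isDirectory())
                    .map(JarEntry::getName)
                    // Entry names use '/' separators and have no leading slash.
                    .filter(name -> name.startsWith(prefix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The returned names, prefixed with "/", can then be passed to getResourceAsStream to read each resource through the classloader. Note that the prefix here is written without a leading slash, e.g. "com/yourcompany/yourapp/".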
I usually use
getClass().getClassLoader().getResourceAsStream(...)
but I doubt you can list the entries from the classpath, without knowing them a priori.
