Given:
- URL of an archive (e.g. a zip file)
- Full name (including path) of a file inside that archive
I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.
From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?
EDIT: being able to also do that with tarballs and zipped tarballs is also important for me.
Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through all the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).
The Java code would look something like this:
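URL url = new URL("http://example.com/path/to/archive");
try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
    ZipEntry ze = zin.getNextEntry();
    // getNextEntry() closes the previous entry itself, so no closeEntry() call is needed
    while (ze != null && !ze.getName().equals(pathToFile)) {
        ze = zin.getNextEntry();
    }
    if (ze == null) {
        throw new FileNotFoundException(pathToFile + " not found in archive");
    }
    // ze.getSize() can be -1 when reading from a stream, so read until the entry ends
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    for (int n; (n = zin.read(buffer)) != -1; ) {
        out.write(buffer, 0, n);
    }
    byte[] bytes = out.toByteArray();
}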
This is, of course, untested.
Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the directory and the entry itself. The server would need to support the Range HTTP header for this to work.
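To make the "directory first" idea a bit more concrete, here is a minimal sketch of the first step: locating the central directory from the last bytes of the archive. It assumes the client has already fetched the tail of the file, for example with a suffix range request such as `Range: bytes=-65557` (22 bytes of fixed record plus the maximum comment length), and that the server honours it. The class and method names are made up for illustration, and Zip64 archives are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ZipTail {
    private static final int EOCD_MIN_SIZE = 22;          // fixed part of the record
    private static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6" read as a little-endian int

    /** Returns {offset, size} of the central directory described by the given tail bytes. */
    static long[] findCentralDirectory(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        // Scan backwards because an optional archive comment may follow the record.
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                long size = buf.getInt(i + 12) & 0xFFFFFFFFL;   // size of the central directory
                long offset = buf.getInt(i + 16) & 0xFFFFFFFFL; // its offset from the start of the archive
                return new long[] { offset, size };
            }
        }
        throw new IllegalArgumentException("No end-of-central-directory record found");
    }
}
```

With the offset and size in hand, a second range request can fetch the directory itself, and a third one the compressed bytes of the single entry you are after.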
The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.
Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.
This won't provide any advantage for compressed TAR archives since the entire archive is compressed together (beyond just using an InputStream and killing it when you have your entry).
Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. This is an implementation of an NIO.2 FileSystemProvider for JSE 7. Using this API, you can access HTTP URI as follows:
Path path = new TPath(new URI("http://acme.com/download/everything.tar.gz/README.TXT"));
try (InputStream in = Files.newInputStream(path)) {
// Read archive entry contents here.
...
}
I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:
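public class GetFileFromZIPServlet extends HttpServlet {
    // Assumption: the hosted archive lives at this (hypothetical) path
    private static final String ZIP_PATH = "/path/to/archive.zip";
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String pathToFile = request.getParameter("pathToFile");
        try (ZipFile zip = new ZipFile(ZIP_PATH)) {
            //get the requested entry from the ZIP
            ZipEntry entry = (pathToFile == null) ? null : zip.getEntry(pathToFile);
            if (entry == null) {
                response.sendError(HttpServletResponse.SC_NOT_FOUND);
                return;
            }
            //set the appropriate content type, maybe based on the file extension
            response.setContentType("...");
            //write file to the response
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] buffer = new byte[8192];
                for (int n; (n = in.read(buffer)) != -1; ) {
                    response.getOutputStream().write(buffer, 0, n);
                }
            }
        }
    }
}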
Related
I am working on a Spring Boot application that has to return a zip file to a frontend when the user downloads some report. I want to create a zip file without writing the zip file or the original files to disk.
The directory I want to zip contains other directories, which contain the actual files. For example, dir1 has subDir1 and subDir2 inside; subDir1 has two files, subDir1File1.pdf and subDir1File2.pdf. subDir2 also has files inside.
I can do this easily by creating the physical files on the disk. However, I feel it will be more elegant to return these files without writing to disk.
You would use ByteArrayOutputStream if the goal is to write to memory. In essence, the zip file would be held entirely in memory, so make sure you don't get too many requests at once and that the files are reasonably small. Otherwise this approach can seriously backfire!
You can use the following snippet. Note that it GZIP-compresses a single string; it does not build a ZIP archive with named entries:
public static byte[] zip(final String str) throws IOException {
if (StringUtils.isEmpty(str)) {
throw new IllegalArgumentException("Cannot zip null or empty string");
}
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
gos.write(str.getBytes(StandardCharsets.UTF_8));
}
return bos.toByteArray();
}
But as stated in another answer, make sure you are not risking your program too much by loading everything into your java memory.
Please note that you should stream whenever possible. In your case, you could write your data to a ZipOutputStream (https://docs.oracle.com/javase/8/docs/api/index.html?java/util/zip/ZipOutputStream.html).
The only downside of this approach is that the client won't be able to show a download progress bar, because the server will not be able to send the "Content-Length" header. That's because the size of a ZIP file can only be known after it has been generated, but the server needs to send the headers first. So: no temporary zip file, no file size beforehand.
You are also talking about subdirectories. This is just a naming issue when dealing with a ZIP stream. Each zip item needs to be named like this: "directory/directory2/file.txt". This will produce subdirectories when unzipping.
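As a minimal sketch of the streaming and naming points above: the helper below writes a set of in-memory files straight to any OutputStream, using the full "dir/subdir/file" path as the entry name. The class name, the method name and the Map-based input are assumptions for illustration; in a web application the target would typically be the response's output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamingZip {
    /** Writes every map entry as one ZIP item; the keys are the full entry names. */
    static void writeZip(Map<String, byte[]> files, OutputStream target) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(target);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            // A name such as "dir1/subDir1/subDir1File1.pdf" yields nested folders on unzip
            zos.putNextEntry(new ZipEntry(file.getKey()));
            zos.write(file.getValue());
            zos.closeEntry();
        }
        // finish() writes the central directory but leaves the target stream open
        zos.finish();
    }
}
```

Nothing is written to disk here; the only buffering is whatever the target stream does itself.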
I would really appreciate your input on the below scenario please.
The requirements:
- I have a 7zip archive file with several thousands of files in it
- I have a Java application running on Linux that is required to retrieve individual files from the 7zip file
I would like to retrieve a file from the archive by its path (e.g. my7zFile.7z/file1.pdf) without having to iterate through all the files in the archive and comparing file names.
I would like to avoid having to extract all files from the archive before running the search (the uncompressed archive is several TB).
I had a look into 7zip Java Binding - specifically the IInArchive class, the only extract method seems to work via file index, not via file name:
http://sevenzipjbind.sourceforge.net/javadoc/net/sf/sevenzipjbinding/IInArchive.html
Do you know of any other libraries that could help me with this use case or am I overlooking a way of doing this with 7zip jbinding?
Thank you
Kind regards,
Tobi
Sadly it appears the API doesn't provide enough to fulfill all your requirements. In order to extract a single file it appears you need to walk the archive index. The simplified interface to the archive makes this much easier:
The ISimpleInArchive interface provides:
ISimpleInArchiveItem[] getArchiveItems()
Allowing you to retrieve a list of items in the archive.
The ISimpleInArchiveItem interface provides the method:
java.lang.String getPath()
Hence you can walk the archiveItems comparing on path. Granted this is against your requirements.
However, note this walks the index table and does not extract the files until requested. Once you have the item you're after, you can use:
ExtractOperationResult extractSlow(ISequentialOutStream SequentialOutStream)
on the item you have found to actually extract it.
Looking at the 7z file format (note this is not the official site of 7zip), the header information is all at the end of the file with the Signature header at the start of the file giving an offset to the start of the header info. So provided the SevenZip bindings are written nicely, your search will at most read the start of the file (SignatureHeader) to find the offset to the HeaderInfo section, then walk the HeaderInfo section in order to build up the file list required in getArchiveItems(). Only once you have the item you need will it shift back to the index of the actual stream for the file you want extracted (most likely when you call extractSlow).
So whilst not all your requirements are met, the overhead of the search/compare required is limited to only searching the header info of the archive.
I once wrote some code to read all the files and folders from a zip file. I had a long (text) file/folder hierarchy inside the zip file. I am not sure whether that will help you or not, but here is the skeleton of the code.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
try (ZipFile zipFile = new ZipFile(filepath)) { // filepath of the zip file
    Enumeration<? extends ZipEntry> entries = zipFile.entries();
    while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        if (entry.isDirectory()) { // found directory inside the zipFile
            // write your code here
        } else {
            InputStream stream = zipFile.getInputStream(entry);
            BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
            // write your code to read the content of the file
        }
    }
}
You can modify the code to reach your desired file in the zip. If you already know the full path of the entry, ZipFile.getEntry(name) looks it up directly; otherwise you have to walk through the entries of the zip archive. Note that ZipFile enumerates entries in the order in which they are listed in the archive's central directory. That often looks like a DFS (Depth First Search) traversal, but it is not guaranteed. You will find detailed examples on the web.
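For plain ZIP archives (not 7z), java.util.zip.ZipFile also has a getEntry(String) method, so when the full entry path is known the loop can be skipped. A minimal sketch, with an illustrative class and method name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipLookup {
    /** Returns the bytes of the named entry, or null if the archive has no such entry. */
    static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath)) {
            // Looks the name up in the archive's directory; no manual iteration
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zipFile.getInputStream(entry);
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

Only the requested entry is decompressed; the other entries in the archive are never read.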
I would like to be able to upload files in my JSF2.2 web application, so I started using the new <h:inputFile> component.
My only question is, how can I specify the location, where the files will be saved in the server? I would like to get hold of them as java.io.File instances. This has to be implemented in the backing bean, but I don't clearly understand how.
JSF won't save the file in any predefined location. It will basically just offer you the uploaded file in flavor of a javax.servlet.http.Part instance which is behind the scenes temporarily stored in server's memory and/or temporary disk storage location which you shouldn't worry about.
Important is that you need to read the Part as soon as possible when the bean action (listener) method is invoked. The temporary storage may be cleared out when the HTTP response associated with the HTTP request is completed. In other words, the uploaded file won't necessarily be available in a subsequent request.
So, given a
<h:form enctype="multipart/form-data">
<h:inputFile value="#{bean.uploadedFile}">
<f:ajax listener="#{bean.upload}" />
</h:inputFile>
</h:form>
You have basically 2 options to save it:
1. Read all raw file contents into a byte[]
You can use InputStream#readAllBytes() for this.
private Part uploadedFile; // +getter+setter
private String fileName;
private byte[] fileContents;
public void upload() {
fileName = Paths.get(uploadedFile.getSubmittedFileName()).getFileName().toString(); // MSIE fix.
try (InputStream input = uploadedFile.getInputStream()) {
fileContents = input.readAllBytes();
}
catch (IOException e) {
// Show faces message?
}
}
Note the Path#getFileName(). This is a MSIE fix as to obtaining the submitted file name. This browser incorrectly sends the full file path along the name instead of only the file name.
In case you're not on Java 9 yet and therefore can't use InputStream#readAllBytes(), then head to Convert InputStream to byte array in Java for all other ways to convert InputStream to byte[].
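One common pre-Java 9 replacement is a plain buffer loop into a ByteArrayOutputStream. A minimal sketch (the class and method names are just for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamBytes {
    /** Reads the stream to its end and returns everything as one array. */
    static byte[] toByteArray(InputStream input) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int length;
        // read() returns -1 once the end of the stream is reached
        while ((length = input.read(buffer)) != -1) {
            output.write(buffer, 0, length);
        }
        return output.toByteArray();
    }
}
```

In the bean above, `fileContents = StreamBytes.toByteArray(input);` would then take the place of the `readAllBytes()` call.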
Keep in mind that each byte of an uploaded file costs one byte of server memory. Be careful that your server doesn't run out of memory when users upload too often or deliberately abuse your system in this way. If you want to avoid this, better use (temporary) files on the local disk file system instead.
2. Or, write it to local disk file system
In order to save it to the desired location, you need to get the content by Part#getInputStream() and then copy it to the Path representing the location.
private Part uploadedFile; // +getter+setter
private File savedFile;
public void upload() {
String fileName = Paths.get(uploadedFile.getSubmittedFileName()).getFileName().toString(); // MSIE fix.
savedFile = new File(uploads, fileName); // uploads is the target folder, e.g. new File("/path/to/uploads")
try (InputStream input = uploadedFile.getInputStream()) {
Files.copy(input, savedFile.toPath());
}
catch (IOException e) {
// Show faces message?
}
}
The uploads folder and the filename are fully under your control. E.g. "/path/to/uploads" and Part#getSubmittedFileName() respectively. Keep in mind that Files#copy() as called above fails with a FileAlreadyExistsException when the target file already exists (unless you pass StandardCopyOption.REPLACE_EXISTING, which overwrites it), so you might want to use File#createTempFile() to autogenerate a filename. You can find an elaborate example in this answer.
Do not use Part#write() as some people may suggest. It will basically rename the file in the temporary storage location as identified by @MultipartConfig(location). Also do not use ExternalContext#getRealPath() in order to save the uploaded file in the deploy folder. The file will get lost when the WAR is redeployed, for the simple reason that the file is not contained in the original WAR. Always save it on an absolute path outside the deploy folder.
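A minimal sketch of the autogenerated-filename idea, here using the NIO counterpart Files.createTempFile() so that the original base name and extension are kept. The class name, method name and parameters are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class UploadStore {
    /** Copies the upload into uploadDir under a generated, unique name that keeps the extension. */
    static Path saveUnique(InputStream input, Path uploadDir, String submittedName) throws IOException {
        String fileName = Paths.get(submittedName).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String prefix = (dot > 0) ? fileName.substring(0, dot) : fileName;
        String suffix = (dot > 0) ? fileName.substring(dot) : "";
        // createTempFile() creates a new empty file with a random part between prefix and suffix
        Path target = Files.createTempFile(uploadDir, prefix + "-", suffix);
        // The empty file already exists, hence REPLACE_EXISTING
        Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Two uploads with the same submitted name then end up in two different files instead of colliding.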
For a live demo of upload-and-preview feature, check the demo section of the <o:inputFile> page on OmniFaces showcase.
See also:
Write file into disk using JSF 2.2 inputFile
How to save uploaded file in JSF
Recommended way to save uploaded files in a servlet application
Hey guys, I'm currently using jarchivelib. I'm stuck on figuring out a way to read the files without having to use the extract method, because it writes an unpacked copy to disk. Example:
File archive = new File("/home/jack/archive.zip");
File destination = new File("/home/jack/archive");
Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.ZIP);
archiver.extract(archive, destination);
I want to make it so I don't have to unpack it to read the files. If there is no way to do that, I'm guessing I'll have to write a custom handler for JFrame.setDefaultCloseOperation so that it deletes the files? Or is there a better way of handling temp files?
If all you want to do is extract the file, why not use Java's built-in zip support, or Zip4j if the archive is password protected? These libraries support streams, so you can extract the contents of a file without writing it to disk.
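With the built-in java.util.zip classes, that could look roughly like the sketch below, which reads every file entry into memory and never touches the file system. The class and method names are made up for illustration, and holding all entries in a Map is only reasonable for small archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class InMemoryUnzip {
    /** Maps each file entry name to its uncompressed bytes; directory entries are skipped. */
    static Map<String, byte[]> readAll(InputStream archive) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((n = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        }
        return contents;
    }
}
```

For the archive from the question, the input could be `new FileInputStream("/home/jack/archive.zip")`.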
As of version 0.4.0, the jarchivelib Archiver API supports streaming an archive rather than extracting it directly onto the filesystem.
ArchiveStream stream = archiver.stream(archive);
ArchiveEntry entry;
while((entry = stream.getNextEntry()) != null) {
// access each archive entry individually using the stream
// or extract it using entry.extract(destination)
// or fetch meta-data using entry.getName(), entry.isDirectory(), ...
}
stream.close();
When the stream is pointing to an entry after calling getNextEntry, you can use the stream.read methods just as you would when reading an individual entry.
I followed this tutorial for uploading a file in my JSF2 application.
The application works fine but I am unhappy with one aspect.
While rebuilding the request, the File sent via request is saved somewhere on the disk.
Even though the file is saved I need to rename the file with a name which is available after entering the Managed Bean containing the action method.
Therefore I decided to create a new file with the desired name, copy the already saved file, and then delete the unneeded one.
private File uploadFile;
//...
try {
BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
BufferedReader br = new BufferedReader(new FileReader(uploadFile));
String line = "";
while ((line = br.readLine()) != null){
bw.write(line);
}
} catch (Exception e){}
The new file appears in the desired location but this error is thrown when I'm trying to open the file: "Invalid or unsupported PNG file"
These are my questions:
Is there a better way to solve this problem?
Is this solution the best way to upload a picture? Is there a reason to save the file before the business logic runs, when the picture may still need to be resized or the desired name is not available yet?
LE:
I know about this tutorial as well, but I'm trying to do this with Mojarra only.
There is a rename method built into java.io.File object already, I'd be surprised if it didn't work for your situation.
public boolean renameTo(File dest)
Renames the file denoted by this abstract pathname.
Many aspects of the behavior of this method are inherently platform-dependent:
The rename operation might not be able to move a file from one filesystem to
another, it might not be atomic, and it might not succeed if a file with the
destination abstract pathname already exists. The return value should always
be checked to make sure that the rename operation was successful.
You can also check if a file exists before saving it, and you can use the ImageIO class to do validations on the uploaded file before performing the initial save.
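A minimal sketch of such an ImageIO check, assuming the upload is already available as a byte array (the class and method names are hypothetical). ImageIO.read() returns null when none of the registered readers recognises the data:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageCheck {
    /** True if ImageIO can decode the data as an image in one of its supported formats. */
    static boolean isReadableImage(byte[] data) {
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            return image != null; // null means no registered reader understood the data
        } catch (IOException e) {
            return false; // recognised format, but the content could not be decoded
        }
    }
}
```

This only tells you that the bytes decode as an image; it is not a full security check of the upload.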
Don't use Reader and Writer when you deal with binary files like images. Use streams: FileInputStream and FileOutputStream. The best option, though, is @Perception's solution with the renameTo method.
Readers read file as if it consists of characters (e.g. txt, properties, yaml files). Image files are not characters, they are binary and you must use streams for that.
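For the case where a real copy is still needed, a byte-for-byte version of the loop from the question could look like the following sketch (the class and method names are illustrative). Unlike the readLine() variant, it does not decode the data as text and does not drop line terminators, so the image stays intact.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BinaryCopy {
    /** Copies source to target byte for byte, without any character decoding. */
    static void copy(File source, File target) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int length;
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
        } // both streams are closed (and flushed) here, even if an exception occurs
    }
}
```

The try-with-resources block also closes both streams, which the original snippet never did.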