Checking if a stream is a zip file


We have a requirement to determine whether an incoming InputStream refers to a zip file or zip data. We do not have a reference to the underlying source of the stream. We aim to copy the contents of this stream into an OutputStream directed at an alternate location.
I tried reading the stream using ZipInputStream and extracting a ZipEntry. The ZipEntry is null if the stream is a regular file, as expected. However, in checking for a ZipEntry I lose the first few bytes of the stream, so by the time I know that the stream is a regular stream, I have already lost its initial data.
Any thoughts around how to check if the InputStream is an archive without data loss would be helpful.
Thanks.

Assuming your original inputstream is not buffered, I would try wrapping the original stream in a BufferedInputStream, before wrapping that in a ZipInputStream to check. You can use "mark" and "reset" in the BufferedInputStream to return to the initial position in the stream, after your check.
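A minimal sketch of that mark/reset approach, assuming the probe only needs to read one local file header. The class and method names (ZipProbe, markable, isZip) are made up for illustration, and the mark limit is an assumption based on the maximum size of a single ZIP local header.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

class ZipProbe {
    // Assumed worst case for one local file header: 30 fixed bytes plus a
    // file name and an extra field of up to 65535 bytes each.
    private static final int MARK_LIMIT = 30 + 2 * 65535;

    /** Returns a stream that supports mark/reset, wrapping only if needed. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Probes for a first ZIP entry, then rewinds so that no bytes are lost. */
    static boolean isZip(InputStream markable) throws IOException {
        markable.mark(MARK_LIMIT);
        try {
            // Deliberately not closed: closing the probe would also close
            // the caller's stream.
            ZipInputStream probe = new ZipInputStream(markable);
            return probe.getNextEntry() != null;
        } finally {
            // Rewind to the marked position whether or not an entry was found.
            markable.reset();
        }
    }
}
```

The caller keeps using the stream returned by markable(...) after the check, since that is the object holding the buffered bytes.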

This is how I did it.
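Using mark/reset to restore the stream if GZIPInputStream detects an incorrect format (it throws a ZipException). Note that GZIPInputStream recognizes GZIP data, not ZIP archives, so this variant fits gzip-compressed streams.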
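/**
 * Wraps the input stream with GZIPInputStream if needed.
 * @param inputStream the stream to inspect
 * @return a GZIPInputStream if the data is in GZIP format, otherwise the rewound stream
 * @throws IOException if reading or resetting the stream fails
 */
private InputStream wrapIfZip(InputStream inputStream) throws IOException {
    // mark/reset needs a stream that supports it; buffer the stream if it does not
    if (!inputStream.markSupported()) {
        inputStream = new BufferedInputStream(inputStream);
    }
    inputStream.mark(1000);
    try {
        // the constructor reads the GZIP header and throws if it is missing
        return new GZIPInputStream(inputStream);
    } catch (ZipException e) {
        // not GZIP data: rewind so the caller sees the stream from the start
        inputStream.reset();
        return inputStream;
    }
}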

You can check the first bytes of the stream for the ZIP local file header signature (PK 0x03 0x04); that is enough for most cases. If you need more precision, you should take the last ~100 bytes and check for the end of central directory record.
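A small sketch of the signature check, assuming the stream supports mark/reset (for example a BufferedInputStream). The class and method names are hypothetical. It only looks at the first four bytes, so an empty archive, which starts directly with the end of central directory record, would not be recognized.

```java
import java.io.IOException;
import java.io.InputStream;

class ZipSignature {
    /** Peeks at the first four bytes and rewinds; the stream must support mark/reset. */
    static boolean looksLikeZip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        byte[] sig = new byte[4];
        int n = 0;
        in.mark(sig.length);
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (n < sig.length) {
                int r = in.read(sig, n, sig.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } finally {
            in.reset();
        }
        // Local file header signature: 'P' 'K' 0x03 0x04
        return n == 4 && sig[0] == 'P' && sig[1] == 'K' && sig[2] == 3 && sig[3] == 4;
    }
}
```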

You have described a java.io.PushbackInputStream: in addition to read(), it has an unread(byte[]) method that lets you push bytes back to the front of the stream so that they can be read again.
It's in java.io since JDK1.0 (though I admit I haven't seen a use for it until today).
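A rough sketch of how PushbackInputStream could be used for this check. The class and method names are hypothetical; the pushback buffer size of 4 is chosen to match the four signature bytes that are read and then unread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class PushbackProbe {
    /** Wraps the source with a pushback buffer large enough for the ZIP signature. */
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, 4);
    }

    /** Reads up to four bytes, pushes them back, and reports whether they are PK 0x03 0x04. */
    static boolean startsWithZipSignature(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n > 0) {
            // Put the consumed bytes back so the next read() sees them again.
            in.unread(head, 0, n);
        }
        return n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }
}
```

After the check, the same PushbackInputStream can be handed either to a ZipInputStream or to the plain copy routine.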

It sounds a bit like a hack, but you could implement a proxy java.io.InputStream to sit between ZipInputStream and the stream you originally passed to ZipInputStream's constructor. Your proxy would stream to a buffer until you know whether it's a ZIP file or not. If not, then the buffer saves your day.

Related

Transfer data from ReadableByteChannel to file

Goal: Decrypt data from one source and write the decrypted data to a file.
try (FileInputStream fis = new FileInputStream(targetPath.toFile());
ReadableByteChannel channel = newDecryptedByteChannel(path, associatedData))
{
FileChannel fc = fis.getChannel();
long position = 0;
while (position < ???)
{
position += fc.transferFrom(channel, position, CHUNK_SIZE);
}
}
The implementation of newDecryptedByteChannel(Path,byte[]) should not be of interest, it just returns a ReadableByteChannel.
Problem: What is the condition to end the while loop? When is the "end of the byte channel" reached? Is transferFrom the right choice here?
This question might be related (the answer is to just set the count to Long.MAX_VALUE). Unfortunately this doesn't help me, because the docs say that up to count bytes may be transferred, depending upon the natures and states of the channels.
Another thought was to just check whether the amount of bytes actually transferred is 0 (returned from transferFrom), but this condition may be true if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.

It is one of the bizarre features of FileChannel.transferFrom() that it never tells you about end of stream. You have to know the input length independently.
I would just use streams for this: specifically, a CipherInputStream around a BufferedInputStream around a FileInputStream, and a FileOutputStream.
But the code you posted doesn't make any sense anyway. It can't work. You are transferring into the input file, and via a channel that was derived from a FileInputStream, so it is read-only, so transferFrom() will throw an exception.
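A sketch of the stream-based alternative mentioned above, assuming the caller already has a Cipher initialized for decryption; how the key, mode, and any associated data are set up is outside this sketch. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;

class DecryptCopy {
    /**
     * Decrypts everything from 'encrypted' into 'out' and returns the number
     * of plaintext bytes written. The cipher must already be initialized in
     * DECRYPT_MODE by the caller.
     */
    static long decryptTo(InputStream encrypted, Cipher cipher, OutputStream out) throws IOException {
        try (InputStream in = new CipherInputStream(new BufferedInputStream(encrypted), cipher)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            // read() returns -1 at end of stream, which gives the loop a clear end condition
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
            return total;
        }
    }
}
```

For the case in the question, 'encrypted' would be a FileInputStream on the source and 'out' a FileOutputStream on the target path.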

As @user207421 commented, since you are reading from a ReadableByteChannel, the target channel needs to be derived from a FileOutputStream rather than a FileInputStream. The loop's end condition would have to be the size of the data behind the ReadableByteChannel, which you cannot get from it unless you can obtain a FileChannel and call its size() method.
The way I could find for transferring is through ByteBuffer as below.
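ByteBuffer buf = ByteBuffer.allocate(1024 * 8);
// read() returns -1 once the source channel reaches end of stream
while (readableByteChannel.read(buf) != -1)
{
    buf.flip();
    fc.write(buf); // fc is a FileChannel derived from a FileOutputStream
    buf.compact(); // keep any bytes that were not written yet
}
// drain whatever is still left in the buffer
buf.flip();
while (buf.hasRemaining())
{
    fc.write(buf);
}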

When do I need to specify the encoding while writing the file to the disk?

I have a sample method which copies one file to another using InputStream and OutputStream. In this case, the source file is encoded in 'UTF-8'. Even if I don't specify the encoding while writing to the disk, the destination file has the correct encoding. But, if I have to write a java.lang.String to a file, I need to specify the encoding. Why is that ?
public static void copyFile() {
String sourceFilePath = "C://my_encoded.txt";
InputStream inStream = null;
OutputStream outStream = null;
try{
String targetFilePath = "C://my_target.txt";
File sourcefile =new File(sourceFilePath);
outStream = new FileOutputStream(targetFilePath);
inStream = new FileInputStream(sourcefile);
byte[] buffer = new byte[1024];
int length;
//copy the file content in bytes
while ((length = inStream.read(buffer)) > 0){
outStream.write(buffer, 0, length);
}
inStream.close();
outStream.close();
System.out.println("File "+targetFilePath+" is copied successful!");
}catch(IOException e){
e.printStackTrace();
}
}
My guess is that since the source file has the correct encoding and since we read and write one byte at a time, it works fine. And java.lang.String is 'UTF-16' by default, so if we write it to the file, it reads one byte at a time instead of 2 bytes, hence the garbage values. Is that correct, or am I completely wrong in my understanding?

You are copying the file byte per byte, so you don't need to care about character encoding.
As a rule of thumb:
Use the various InputStream and OutputStream implementations for byte-wise processing (like file copy).
There are some convenience methods to handle text directly like PrintStream.println(). Be careful because most of them use the default platform specific encoding.
Use the various Reader and Writer implementations for reading and writing text.
If you need to convert between byte-wise and text processing use InputStreamReader and OutputStreamWriter with explicit file encoding.
Do not rely on the default encoding. The default character encoding is platform specific (e.g. Windows-ANSI aka Cp1252 for Windows, usually UTF-8 on Linux).
Example: If you need to read a UTF-8 text file:
BufferedReader reader =
new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "UTF-8"));
Avoid using a FileReader because a FileReader always uses the default encoding.
A special case: If you need random access to a file you should use RandomAccessFile. With it you can read and write data blocks at arbitrary positions. You can read and write raw byte blocks or you can use convenience methods to read and write text. But you should read the documentation carefully. E.g. the methods readUTF() and writeUTF() use a modified UTF-8 encoding.
InputStream, OutputStream, Reader, Writer and RandomAccessFile form the basic IO functionality, enough for most use cases. For advanced IO (e.g. memory mapped files, ...) have a look at package java.nio.
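As an illustration of the rule about explicit encodings, here is a minimal sketch that writes and reads text as UTF-8 through OutputStreamWriter and InputStreamReader. The class and method names are hypothetical, and UTF-8 is just the encoding assumed for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

class ExplicitEncoding {
    /** Writes text as UTF-8, independent of the platform default encoding. */
    static void writeText(Path file, String text) throws IOException {
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8))) {
            w.write(text);
        }
    }

    /** Reads the whole file back, decoding the bytes as UTF-8. */
    static String readText(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```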

Just read your code! (For the copy part at least ;-) )
When you copy the file, you copy it byte by byte, so there is no conversion to String.
When you write a String to a file, you need to convert it (sometimes indirectly) into a byte array (byte[]). That is where you need to specify your encoding.
When you read a file to get a String, you need to know its encoding in order to do it properly. Java doesn't 'skip' any bytes, but you need to make a conversion once again: from a byte[] to a String.
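To make the String/byte[] conversion concrete, here is a small sketch showing that the same String yields different bytes under different encodings. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringBytes {
    /** Number of bytes the string occupies once encoded with the given charset. */
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    /** Decodes bytes with the given charset; the wrong charset yields garbage, not an error. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

For the four-character string "caf\u00e9", encodedLength gives 5 for UTF-8, 4 for ISO-8859-1 and 8 for UTF-16BE. Decoding the UTF-8 bytes as ISO-8859-1 turns the last character into two wrong ones, which is the kind of garbage the question describes.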

How do I use a ChannelBufferOutputStream to check compression size

In a java program I am compressing an InputStream like this:
ChannelBufferOutputStream outputStream = new ChannelBufferOutputStream(ChannelBuffers.dynamicBuffer(BUFFER_SIZE));
GZIPOutputStream compressedOutputStream = new GZIPOutputStream(outputStream);
try {
IOUtils.copy(inputStream, compressedOutputStream);
} finally {
// this should print the byte size after compression
System.out.println(outputStream.writtenBytes());
}
I am testing this code with a JSON file that is ~31,000 bytes uncompressed and ~7,000 bytes compressed on disk. When I send an InputStream wrapping the uncompressed JSON file to the code above, outputStream.writtenBytes() returns 10, which would indicate that it compressed down to only 10 bytes. That seems wrong, so I wonder where the problem is. The ChannelBufferOutputStream javadoc says: Returns the number of written bytes by this stream so far. So it should be working.

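Try calling GZIPOutputStream.finish() (or close()) before counting bytes. The 10 bytes you see are most likely just the GZIP header, which is written as soon as the stream is constructed; the compressed data is still buffered inside the deflater.
If that does not work, you can create a proxy stream whose job is to count the number of bytes that pass through it.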
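A minimal sketch of the finish-before-counting idea. It uses a ByteArrayOutputStream as a stand-in for the ChannelBufferOutputStream from the question, so the counting call here is size() rather than writtenBytes(); the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

class GzipSize {
    /** Compresses the data in memory and returns the compressed size in bytes. */
    static int compressedSize(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(sink);
        gzip.write(data);
        // finish() pushes the remaining compressed data and the GZIP trailer
        // to the sink without closing it.
        gzip.finish();
        int size = sink.size();
        gzip.close();
        return size;
    }
}
```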

Spring OutputStream as File Download?

I would like to send an OutputStream object, which has PDF data, as a file to the web browser.
The code is as follows.
@RequestMapping(value="/issue", method=RequestMethod.POST)
public void issue(HttpServletResponse response, TimeStampIssueParam param) throws JsonGenerationException, JsonMappingException, IOException {
    OutputStream pdfOuput = issue(input);
    response.setContentType("application/pdf");
    ServletOutputStream respOutput = response.getOutputStream();
    ....
}
The problem is I already have the outputstream, and I do not want to convert it to byte array.
Any comment would be appreciated.

You can't: you can only copy an InputStream to an OutputStream. For that, you can use: org.springframework.util.FileCopyUtils.copy(InputStream, OutputStream)

First, I would say it is wrong to say that the OutputStream has any data. A stream just lets the data through to some destination. Sometimes (SocketOutputStream) this destination may be on a completely different computer, and sometimes (ByteArrayOutputStream) it will be closely related to the stream and even obtainable through it. But this is a detail of a specific stream, not something you can count on from an arbitrary one.
So, not knowing exactly where the result of the issue method comes from it is hard to provide a solution, but a generic OutputStream is not what it should return.
Guessing that the method generates some PDF data and writes it somewhere via an OutputStream, then returns the stream:
If it creates a File and the stream happens to be a FileOutputStream, it should return the file, file path or a FileInputStream for the same file instead.
If it creates eg. a ByteArrayOutputStream, then you already have a byte array, and additionally this stream type has a writeTo method that can be used directly to write the data to the ServletOutputStream; issue just has to return the stream as the proper type not hiding it behind the general interface.
For other OutputStream types, well, it depends on what exactly they are.
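A sketch of the ByteArrayOutputStream case, assuming the generating method can be changed to return the concrete type. The names PdfSender, generate, and send are hypothetical, and generate() only writes placeholder bytes instead of a real PDF. In the controller from the question, the target would be response.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class PdfSender {
    /** Stand-in for the PDF generator: returns the concrete stream type, not OutputStream. */
    static ByteArrayOutputStream generate() throws IOException {
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        pdf.write("%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII));
        return pdf;
    }

    /** Copies the buffered bytes to the target without creating an extra byte[] copy. */
    static void send(ByteArrayOutputStream pdf, OutputStream target) throws IOException {
        pdf.writeTo(target);
        target.flush();
    }
}
```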

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64 encoded file and decodes it to the "normal" file. Because it's made with byte arrays, it holds twice the file size in memory: one base64 encoded and the other one decoded. My question is not about base64. It's about saving memory.
Can you point me in the right direction?
Thanks!

From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a bit of an overhead when considering memory. Especially given the fact that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
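The steps above can be sketched with the JDK's own java.util.Base64 (available since Java 8), whose decoder can wrap an InputStream and decode while reading. This is one possible implementation under that assumption, not the asker's applet code; the class and method names are hypothetical, and the MIME decoder is chosen on the assumption that the encoded file may contain line breaks.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class Base64FileDecoder {
    /** Decodes a Base64 file into 'decoded' using a fixed-size buffer; returns bytes written. */
    static long decodeFile(Path encoded, Path decoded) throws IOException {
        try (InputStream in = Base64.getMimeDecoder().wrap(
                     new BufferedInputStream(Files.newInputStream(encoded)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(decoded))) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            // Only one buffer's worth of decoded data is held in memory at a time.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```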

Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?

Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html

Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
// Base64.InputStream comes from the linked third-party Base64 class, not from the JDK
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream; available() is not a reliable end-of-stream test
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}

You can use org.apache.commons.io.FileUtils. This utility class provides other options besides what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on.
