How to unzip file from InputStream - java

I'm trying to get a zip file from the server.
I'm using HttpURLConnection to get an InputStream, and this is what I have:
myInputStream.toString().getBytes().toString() is equal to [B@4.....
byte[] bytes = Base64.decode(myInputStream.toString(), Base64.DEFAULT);
String string = new String(bytes, "UTF-8");
string == �&ܢ��z�m����y....
I really tried to unzip this file, but nothing works, and there are so many open questions: should I use GZIPInputStream or ZipInputStream? Do I have to save this stream as a file, or can I work on the InputStream directly?
Please help, my boss is getting impatient :O
I have no idea what is in this file; I have to find out :)

GZIPInputStream and ZipInputStream handle two different formats: gzip compresses a single stream, while zip is an archive that can hold several entries. https://en.wikipedia.org/wiki/Gzip
It is not a good idea to retrieve a string directly from the stream. From an InputStream, you can create a File and write the data into it using a FileOutputStream.
Base64 decoding is a separate matter. If the stream has already been decoded upstream, that's fine; otherwise you have to wrap your stream in another input stream that decodes the Base64 format.
The best practice is to read through a fixed-size buffer, so that the whole payload never has to fit in memory at once.
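If it is unclear which of the two formats the server sends, one option is to look at the first bytes of the stream before choosing a decorator. The following is only a sketch: the class and method names are made up for illustration, it assumes Java 9+ (for readNBytes), and it only recognizes the usual signatures (0x1f 0x8b for gzip, "PK\003\004" for a non-empty zip archive).

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper, not part of any library.
class CompressionSniffer {
    /** Returns "gzip", "zip" or "unknown" and leaves the stream position unchanged. */
    static String sniff(BufferedInputStream in) throws IOException {
        in.mark(4);                              // remember the current position
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, 4);       // Java 9+: reads up to 4 bytes
        in.reset();                              // rewind so the caller sees all bytes
        if (n >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        if (n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return "zip";
        }
        return "unknown";
    }
}
```

The caller would wrap the connection's stream in a BufferedInputStream once, call sniff(), and then pass that same BufferedInputStream to GZIPInputStream or ZipInputStream depending on the result.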
Here is some Kotlin code that decompresses the zipped InputStream into a file (simpler than Java, because managing byte[] there is tedious):
val fileBinaryDecompress = File(..path..)
val outputStream = FileOutputStream(fileBinaryDecompress)
val zipInputStream = ZipInputStream(myInputStream)
// Move to the first entry; until this is done, read() returns -1 immediately
zipInputStream.nextEntry
readFromStream(zipInputStream, BUFFER_SIZE_BYTES,
    object : ReadBytes {
        override fun read(buffer: ByteArray) {
            outputStream.write(buffer)
        }
    })
outputStream.close()
interface ReadBytes {
    /**
     * Called after each buffer fill
     * @param buffer filled
     */
    @Throws(IOException::class)
    fun read(buffer: ByteArray)
}
@Throws(IOException::class)
fun readFromStream(inputStream: InputStream, bufferSize: Int, readBytes: ReadBytes) {
    val buffer = ByteArray(bufferSize)
    var read = 0
    while (read != -1) {
        read = inputStream.read(buffer, 0, buffer.size)
        if (read != -1) {
            // Hand over only the bytes that were actually read
            val optimizedBuffer: ByteArray = if (buffer.size == read) {
                buffer
            } else {
                buffer.copyOf(read)
            }
            readBytes.read(optimizedBuffer)
        }
    }
}
If you want to get the file from the server without decompressing it, remove the ZipInputStream() decorator.
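For the Base64 case mentioned in this answer, the JDK can do the wrapping in Java as well. This is a minimal sketch assuming java.util.Base64 is available (Java 8+, or Android API 26+; the question appears to use android.util.Base64 instead) and that the payload is plain, unwrapped Base64. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class.
class Base64StreamDemo {
    /** Wraps a stream of Base64 text so that reads return the decoded bytes. */
    static InputStream decodeBase64(InputStream encoded) {
        return Base64.getDecoder().wrap(encoded);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the server response: the Base64 form of "hello"
        byte[] encoded = Base64.getEncoder().encode("hello".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = decodeBase64(new ByteArrayInputStream(encoded))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The decoded stream can then be passed on to ZipInputStream or GZIPInputStream. If the server breaks the Base64 text into lines, Base64.getMimeDecoder() is probably the better fit.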

GZIPInputStream and ZipInputStream are not interchangeable: the former reads a single gzip-compressed stream, the latter reads a zip archive with one or more entries, so only the one that matches what the server sends will work.
Next, you need to identify whether the zipped stream was Base64 encoded, or whether some Base64-encoded content was put into a zipped stream; from what you put in your question, it seems to be the latter option.
So you should try
ZipInputStream zis = new ZipInputStream( myInputStream );
ZipEntry ze = zis.getNextEntry();
// zis itself now delivers the uncompressed bytes of entry ze
InputStream is = zis;
and proceed from there ...
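As a rough illustration of "proceeding from there", the sketch below reads every entry of a zip stream into memory using only the JDK. The class name and the in-memory test archive are made up for the example; with a real connection, the argument would be the stream from HttpURLConnection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class.
class UnzipFromStream {
    /** Reads every file entry of a zip stream into memory, keyed by entry name. */
    static Map<String, byte[]> unzip(InputStream source) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(source)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the archive
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                files.put(entry.getName(), out.toByteArray());
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory so the sketch runs without a server
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Map<String, byte[]> files = unzip(new ByteArrayInputStream(zipped.toByteArray()));
        files.forEach((name, data) -> System.out.println(name + ": " + data.length + " bytes"));
    }
}
```

Keeping the result as byte[] avoids the mangling seen in the question; a String should only be built once the entry is known to be text in a known charset.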

Basically, wrapping the inputStream in a GZIPInputStream should let you read the actual content, provided the server really sends gzip data.
Also, for simplicity, using IOUtils from Apache Commons IO makes your life easy.
This works for me:
InputStream is; // initialize your InputStream
is = new GZIPInputStream(is);
byte[] bytes = IOUtils.toByteArray(is);
String s = new String(bytes);
System.out.println(s);

Related

Why does ZipInputStream.getNextEntry() return null?

I have a zip file that is AES encrypted. After decrypting I'm left with a byte[] containing the zip content. But when I try to unzip it, using a ByteArrayInputStream, ZipInputStream.getNextEntry() returns null right away. Debugging, I see that my byte[] doesn't have a required local file header signature which is
static long LOCSIG = 0x04034b50L; // "PK\003\004"
so ZipInputStream.getNextEntry() returns null.
If, however, I write those decrypted bytes out to a file and then use a FileInputStream() that I pass to ZipInputStream(), everything works as expected. Below is my current code. Can anyone suggest a way to unzip without first writing out to a temporary file?
byte[] data = AESUtil.decryptInputStream(...);
ByteArrayInputStream bis = new ByteArrayInputStream(data);
ZipInputStream stream = new ZipInputStream(bis);
ZipEntry entry;
while ((entry = stream.getNextEntry()) != null) {
...
}
I've come to the conclusion that ZipInputStream just isn't flexible enough. When I substituted the above code with Apache Commons' compress classes, everything works. Below is a working implementation using that library and the same byte array:
byte[] data = AESUtil.decryptInputStream(...);
SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(data);
ZipFile zipFile = new ZipFile(inMemoryByteChannel);
Iterator<ZipArchiveEntry> iterator = zipFile.getEntriesInPhysicalOrder().asIterator();
while (iterator.hasNext()) {
...
}

Extract tar.gz file in memory in Java

I'm using the Apache Compress library to read a .tar.gz file, something like this:
final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
try {
TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
while (tarEntry != null) {
byte[] btoRead = new byte[1024];
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
int len = 0;
while ((len = tarIn.read(btoRead)) != -1) {
bout.write(btoRead, 0, len);
}
bout.close();
tarEntry = tarIn.getNextTarEntry();
}
tarIn.close();
}
catch (IOException e) {
e.printStackTrace();
}
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
You could replace the file stream with a ByteArrayOutputStream.
i.e. replace this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
with this:
ByteArrayOutputStream bout = new ByteArrayOutputStream();
and then after closing bout, use bout.toByteArray() to get the bytes.
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
Yes, sure.
Just replace the code in the inner loop that is opening files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding you are liable to mangle it ... irreversibly.)
Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
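A minimal sketch of that idea, using only the JDK: the helper below drains whatever stream it is given into a byte array without closing it, and decoding to a String is a separate, explicit step. The class and method names are hypothetical. In the question's loop, the inner while could presumably be replaced by a call such as drain(tarIn), assuming (as the question's own loop does) that read() returns -1 at the end of the current tar entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper class.
class EntryReader {
    /** Copies everything up to the next -1 from read() into memory; does not close the stream. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);   // write only the bytes actually read
        }
        return out.toByteArray();
    }

    /** Decodes bytes as text; only meaningful if the data really is text in this charset. */
    static String asText(byte[] data, Charset charset) {
        return new String(data, charset);
    }
}
```

Decoding once, after all bytes of an entry have been collected, also avoids splitting a multi-byte character across two buffer reads.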
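Copy the bytes that were read into btoRead to a String, for example
String s = new String(btoRead, 0, len, StandardCharsets.UTF_8);
and keep appending to the string until the end of the file is reached. This only makes sense if the entry is UTF-8 text, and a multi-byte character split across two reads will be mangled, so collecting the bytes first is safer.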

Downloading a file with the Java Drive API

I use the "get" method from the Java Drive API, and I can get the InputStream, but I can't open the file when I use the InputStream to create it. It looks like the file is broken.
private static String fileurl = "C:\\googletest\\drive\\";
public static void newFile(String filetitle, InputStream stream) throws IOException {
String filepath = fileurl + filetitle;
BufferedInputStream bufferedInputStream=new BufferedInputStream(stream);
byte[] buffer = new byte[bufferedInputStream.available()];
File file = new File(filepath);
if (!file.exists()) {
file.getParentFile().mkdirs();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(new FileOutputStream(filepath));
while( bufferedInputStream.read(buffer) != -1) {
bufferedOutputStream.write(buffer);
}
bufferedOutputStream.flush();
bufferedOutputStream.close();
}
}
Firstly, C:\googletest\drive\ is not a URL. It is a file system pathname.
Next, the following probably does not do what you think it does:
byte[] buffer = new byte[bufferedInputStream.available()];
The problem is that the available() call can return zero ... for a non-empty stream. The value returned by available() is an estimate of how many bytes are currently available to read ... right now. That is not necessarily the stream length ... or anything related to it. And indeed the device drivers for some devices consistently return zero, even when there is data to be read.
Finally, this is wrong:
while( bufferedInputStream.read(buffer) != -1) {
bufferedOutputStream.write(buffer);
You are assuming that read returning anything other than -1 means that it filled the buffer. That is not so. Any one of the read calls could return with a partly full buffer. But then you write the entire buffer contents to the output stream ... including "junk" from previous reads.
Either or both of the 2nd and 3rd problems could lead to file corruption. In fact, the third one is likely to.
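A sketch of a copy loop that avoids both problems: the buffer has a fixed size that does not depend on available(), and only the bytes actually returned by each read are written. The class and method names are hypothetical, and saveTo is just one way the question's newFile could be restructured.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class.
class StreamCopy {
    /** Copies in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];      // fixed size, independent of available()
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);         // only the n bytes read in this round
            total += n;
        }
        return total;
    }

    /** Writes the stream to target, creating parent directories if needed. */
    static void saveTo(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }
}
```

On Java 7+ the JDK's Files.copy(InputStream, Path) does the same job in one call.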

Strange byte[] behavior reading from a URL

In the end, my ultimate goals are:
Read from a URL (what this question is about)
Save the retrieved [PDF] content to a BLOB field in a DB (already have that nailed down)
Read from the BLOB field and attach that content to an email
All without going to a filesystem
The goal with the following method is to get a byte[] that can be used downstream as an email attachment (to avoid writing to disk):
public byte[] retrievePDF() {
HttpClient httpClient = new HttpClient();
GetMethod httpGet = new GetMethod("http://website/document.pdf");
httpClient.executeMethod(httpGet);
InputStream is = httpGet.getResponseBodyAsStream();
byte[] byteArray = new byte[(int) httpGet.getResponseContentLength()];
is.read(byteArray, 0, byteArray.length);
return byteArray;
}
For a particular PDF, the getResponseContentLength() method returns 101,689 as the length. The strange part is that if I set a break-point and interrogate the byteArray variable, it has 101,689 byte elements, however, after byte #3744 the remaining bytes of the array are all zeroes (0). The resulting PDF is then not readable by a PDF-reader client, like Adobe Reader.
Why would that happen?
Retrieving this same PDF via browser and saving to disk, or using a method like the following (which I patterned after an answer to this StackOverflow post), results in a readable PDF:
public void retrievePDF() {
FileOutputStream fos = null;
URL url;
ReadableByteChannel rbc = null;
url = new URL("http://website/document.pdf");
DataSource urlDataSource = new URLDataSource(url);
/* Open a connection, then set appropriate time-out values */
URLConnection conn = url.openConnection();
conn.setConnectTimeout(120000);
conn.setReadTimeout(120000);
rbc = Channels.newChannel(conn.getInputStream());
String filePath = "C:\\temp\\";
String fileName = "testing1234.pdf";
String tempFileName = filePath + fileName;
fos = new FileOutputStream(tempFileName);
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
fos.flush();
/* Clean-up everything */
fos.close();
rbc.close();
}
For both approaches, the size of the resulting PDF is 101,689-bytes when doing a Right-click > Properties... in Windows.
Why would the byte array essentially "stop" part-way through?
InputStream.read reads up to byteArray.length bytes but might not read exactly that much. It returns how many bytes it read. You should call it repeatedly to fully read the data, like this:
int bytesRead = 0;
while (bytesRead < byteArray.length) {
    // Ask only for the space that is still free in the array
    int n = is.read(byteArray, bytesRead, byteArray.length - bytesRead);
    if (n == -1) break;
    bytesRead += n;
}
Check the return value of InputStream.read. It's not going to read all at one go. You have to write a loop. Or, better yet, use Apache Commons IO to copy the stream.
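If pulling in Commons IO is not an option, newer JDKs have the loop built in. The sketch below assumes Java 11+ (for readNBytes(int)) and assumes that a negative content length means "unknown"; the class and method names are made up for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class.
class BodyReader {
    /** Reads the whole body; contentLength < 0 is treated as "length unknown". */
    static byte[] readBody(InputStream in, long contentLength) throws IOException {
        if (contentLength < 0) {
            return in.readAllBytes();                     // Java 9+: reads until end of stream
        }
        if (contentLength > Integer.MAX_VALUE) {
            throw new IOException("body too large for a byte[]: " + contentLength);
        }
        byte[] data = in.readNBytes((int) contentLength); // Java 11+: keeps reading until full or EOF
        if (data.length != contentLength) {
            throw new EOFException("expected " + contentLength + " bytes, got " + data.length);
        }
        return data;
    }
}
```

In the question's method, the two lines that allocate byteArray and call is.read(...) once could then become a single call along the lines of readBody(is, httpGet.getResponseContentLength()).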
101689 = 2^16 + 36153, so it looks as if there is a 16-bit limitation on the buffer size.
The difference between 36153 and 3744 may stem from the header part having been read into a separate small buffer (1K or so) that already contained some bytes.

Reading from a ZipInputStream into a ByteArrayOutputStream

I am trying to read a single file from a java.util.zip.ZipInputStream, and copy it into a java.io.ByteArrayOutputStream (so that I can then create a java.io.ByteArrayInputStream and hand that to a 3rd party library that will end up closing the stream, and I don't want my ZipInputStream getting closed).
I'm probably missing something basic here, but I never enter the while loop here:
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
try {
while ((bytesRead = zipStream.read(tempBuffer)) != -1) {
streamBuilder.write(tempBuffer, 0, bytesRead);
}
} catch (IOException e) {
// ...
}
What am I missing that will allow me to copy the stream?
Edit:
I should have mentioned earlier that this ZipInputStream is not coming from a file, so I don't think I can use a ZipFile. It is coming from a file uploaded through a servlet.
Also, I have already called getNextEntry() on the ZipInputStream before getting to this snippet of code. If I don't try copying the file into another InputStream (via the OutputStream mentioned above), and just pass the ZipInputStream to my 3rd party library, the library closes the stream, and I can't do anything more, like dealing with the remaining files in the stream.
Your loop looks valid - what does the following code (just on its own) return?
zipStream.read(tempBuffer)
If it's returning -1, then the zipStream is closed before you get it, and all bets are off. It's time to use your debugger and make sure what's being passed to you is actually valid.
When you call getNextEntry(), does it return a value, and is the data in the entry meaningful (i.e. does getCompressedSize() return a valid value)? If you are just reading a Zip file that doesn't have read-ahead zip entries embedded, then ZipInputStream isn't going to work for you.
Some useful tidbits about the Zip format:
Each file embedded in a zip file has a header. This header can contain useful information (such as the compressed length of the stream, its offset in the file, CRC) - or it can contain some magic values that basically say 'The information isn't in the stream header, you have to check the Zip post-amble'.
Each zip file then has a table that is attached to the end of the file that contains all of the zip entries, along with the real data. The table at the end is mandatory, and the values in it must be correct. In contrast, the values embedded in the stream do not have to be provided.
If you use ZipFile, it reads the table at the end of the zip. If you use ZipInputStream, I suspect that getNextEntry() attempts to use the entries embedded in the stream. If those values aren't specified, then ZipInputStream has no idea how long the stream might be. The inflate algorithm is self-terminating (you actually don't need to know the uncompressed length of the output stream in order to fully recover the output), but it's possible that the Java version of this reader doesn't handle this situation very well.
I will say that it's fairly unusual to have a servlet returning a ZipInputStream (it's much more common to receive an InflaterInputStream if you are going to be receiving compressed content).
You probably tried reading from a FileInputStream like this:
ZipInputStream in = new ZipInputStream(new FileInputStream(...));
This won’t work since a zip archive can contain multiple files and you need to specify which file to read.
You could use java.util.zip.ZipFile and a library such as IOUtils from Apache Commons IO or ByteStreams from Guava that assist you in copying the stream.
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
try (ZipFile zipFile = new ZipFile("foo.zip")) {
ZipEntry zipEntry = zipFile.getEntry("fileInTheZip.txt");
try (InputStream in = zipFile.getInputStream(zipEntry)) {
IOUtils.copy(in, out);
}
}
I'd use IOUtils from the commons io project.
IOUtils.copy(zipStream, byteArrayOutputStream);
You're missing a call to
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
to position the stream at the first decompressed byte of the first entry.
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
try {
while ( (bytesRead = zipStream.read(tempBuffer)) != -1 ){
streamBuilder.write(tempBuffer, 0, bytesRead);
}
} catch (IOException e) {
...
}
You could implement your own wrapper around the ZipInputStream that ignores close() and hand that off to the third-party library.
thirdPartyLib.handleZipData(new CloseIgnoringInputStream(zipStream));
class CloseIgnoringInputStream extends InputStream
{
private ZipInputStream stream;
public CloseIgnoringInputStream(ZipInputStream inStream)
{
stream = inStream;
}
public int read() throws IOException {
return stream.read();
}
public void close()
{
//ignore
}
public void reallyClose() throws IOException
{
stream.close();
}
}
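A shorter variant of the same idea, sketched here as an assumption rather than as the answer's own code, extends java.io.FilterInputStream so that the bulk read, skip and available calls are delegated to the wrapped stream as well. The class name is hypothetical.

```java
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical wrapper: forwards everything except close() to the wrapped stream.
class CloseShieldStream extends FilterInputStream {
    CloseShieldStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally empty: the owner of the wrapped stream closes it later.
    }
}
```

Usage would look like thirdPartyLib.handleZipData(new CloseShieldStream(zipStream)), after which zipStream.getNextEntry() can still be called. Apache Commons IO ships a ready-made CloseShieldInputStream for the same purpose.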
I would call getNextEntry() on the ZipInputStream until it is at the entry you want (use ZipEntry.getName() etc.). Calling getNextEntry() will advance the "cursor" to the beginning of the entry that it returns. Then, use ZipEntry.getSize() to determine how many bytes you should read using zipInputStream.read(). Note that getSize() returns -1 when the size is not known, in which case you simply read until read() returns -1.
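A sketch of that approach follows. It reads until read() returns -1 instead of relying on getSize(), since the size may not be known for streamed archives. The class and method names are illustrative, and the archive in main is built in memory only so that the example is runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class.
class ZipEntryFinder {
    /** Returns the bytes of the named entry, or null if it is not found. Leaves zis open. */
    static byte[] readEntry(ZipInputStream zis, String name) throws IOException {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (!entry.getName().equals(name)) {
                continue;                       // getNextEntry() skips the rest of this entry
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            for (String name : new String[]{"first.txt", "second.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped.toByteArray()))) {
            byte[] data = readEntry(zis, "second.txt");
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
    }
}
```

The returned array can be wrapped in a ByteArrayInputStream and handed to the third-party library, which is then free to close it without affecting the ZipInputStream.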
It is unclear how you got the zipStream. It should work when you get it like this:
zipStream = zipFile.getInputStream(zipEntry)
If you are obtaining the ZipInputStream from a ZipFile, you can get one stream for the third-party library, let it use it, and then obtain another input stream using the code above.
Remember, an input stream is a cursor. If you have the entire data (like a ZipFile) you can ask for N cursors over it.
A different case is if you only have a "GZip" input stream, that is, only a zipped byte stream. In that case your ByteArrayOutputStream buffer makes perfect sense.
Please try the code below
private static byte[] getZipArchiveContent(File zipName) throws WorkflowServiceBusinessException {
BufferedInputStream buffer = null;
FileInputStream fileStream = null;
ByteArrayOutputStream byteOut = null;
byte data[] = new byte[BUFFER];
try {
try {
fileStream = new FileInputStream(zipName);
buffer = new BufferedInputStream(fileStream);
byteOut = new ByteArrayOutputStream();
int count;
while((count = buffer.read(data, 0, BUFFER)) != -1) {
byteOut.write(data, 0, count);
}
} catch(Exception e) {
throw new WorkflowServiceBusinessException(e.getMessage(), e);
} finally {
if(null != fileStream) {
fileStream.close();
}
if(null != buffer) {
buffer.close();
}
if(null != byteOut) {
byteOut.close();
}
}
} catch(Exception e) {
throw new WorkflowServiceBusinessException(e.getMessage(), e);
}
return byteOut.toByteArray();
}
Check whether the input stream is positioned at the beginning.
Otherwise, as for the implementation: I do not think that you need to write to the result stream while you are reading, unless you process this exact stream in another thread.
Just create a byte array, read the input stream, then create the output stream.
