I get a java.lang.ArrayIndexOutOfBoundsException when using ByteArrayInputStream.
First, I use a ZipInputStream to read through a zip file,
and while looping through the zipEntries,
I use a ByteArrayInputStream to capture the data of each zipEntry
using the
ZipInputStream.read(byte[] b) and ByteArrayInputStream(byte[] b) methods.
At the end, I have a total of 6 different ByteArrayInputStream objects containing data from 6 different zipEntries.
I then use OpenCSV to read through each of the ByteArrayInputStream.
I have no problem reading 4 of the 6 ByteArrayInputStream objects, of which have byte sizes of less than 2000.
The other 2 ByteArrayInputStream objects have byte sizes of 2155 and 4010 respectively and the CSVreader was only able to read part of these 2 objects, then give an java.lang.ArrayIndexOutOfBoundsException.
This is the code I used to loop through the ZipInputStream
InputStream fileStream = attachment.getInputStream();
try {
ZipInputStream zippy = new ZipInputStream(fileStream);
ZipEntry entry = zippy.getNextEntry();
ByteArrayInputStream courseData = null;
while (entry!= null) {
String name = entry.getName();
long size = entry.getSize();
if (name.equals("course.csv")) {
courseData = copyInputStream(zippy, (int)size);
}
//similar IF statements for 5 other ByteArrayInputStream objects
entry = zippy.getNextEntry();
}
CourseDataManager.load(courseData);
}catch(Exception e){
e.printStackTrace();
}
The following is the code with which I use to copy the data from the ZipInputStream to the ByteArrayInputStream.
public ByteArrayInputStream copyInputStream(InputStream in, int size)
throws IOException {
byte[] buffer = new byte[size];
in.read(buffer);
ByteArrayInputStream b = new ByteArrayInputStream(buffer);
return b;
}
The 2 sets of openCSV codes are able to read a few lines of data, before throwing that exception, which leads me to believe that it is the byteArray that is causing the problem. Is there anything I can do or work around this problem?
I am trying to make an application that accepts a zip file, while not storing any temporary files in the web app, as I am deploying to both google app engine and tomcat server.
Fixed!!! Thanks to stephen C, i realized that read(byte[]) does not read everything so I adjusted the code to make the copyInputStream fully functional.
Since this looks like homework, here's a hint:
The read(byte[]) method returns the number bytes read.
On what line do you get the error? And have you checked the value of size? I suspect it's 0
Related
Requirement: compress a byte[] to get another byte[] using java.util.zip.ZipOutputStream BUT without using any files on disk or in-memory(like here https://stackoverflow.com/a/18406927/9132186). Is this even possible?
All the examples I found online read from a file(.txt) and write to a file(.zip). ZipOutputStream needs a ZipEntry to work with and that ZipEntry needs a file.
However, my use case is as follows: I need to compress a chunk (say 10MB) of a file at a time using a zip format and append all these compressed chunks to make a .zip file. But, when I unzip the .zip file then it is corrupted.
I am using in-memory files as suggested in https://stackoverflow.com/a/18406927/9132186 to avoid files on disk but need a solution without these files also.
public void testZipBytes() {
String infile = "test.txt";
FileInputStream in = new FileInputStream(infile);
String outfile = "test.txt.zip";
FileOutputStream out = new FileOutputStream(outfile);
byte[] buf = new byte[10];
int len;
while ((len = in.read(buf)) > 0) {
out.write(zipBytes(buf));
}
in.close();
out.close();
}
// ACTUAL function that compresses byte[]
public static class MemoryFile {
public String fileName;
public byte[] contents;
}
public byte[] zipBytesMemoryFileWORKS(byte[] input) {
MemoryFile memoryFile = new MemoryFile();
memoryFile.fileName = "try.txt";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(baos);
ZipEntry entry = new ZipEntry(memoryFile.fileName);
entry.setSize(input.length);
zos.putNextEntry(entry);
zos.write(input);
zos.finish();
zos.closeEntry();
zos.close();
return baos.toByteArray();
}
Scenario 1:
if test.txt has small amount of data (less than 10 bytes) like "this" then unzip test.txt.zip yeilds try.txt with "this" in it.
Scenario 2:
if test.txt has larger amount of data (more than 10 bytes) like "this is a test for zip output stream and it is not working" then unzip test.txt.zip yields try.txt with broken pieces of data and is incomplete.
this 10 bytes is the buffer size in testZipBytes and is the amount of data that is compressed at a time by zipBytes
Expected (or rather desired):
1. unzip test.txt.zip does not use the "try.txt" filename i gave in the MemoryFile but rather unzips to filename test.txt itself.
2. unzipped data is not broken and yields the input data as is.
3. I have done the same with GzipOutputStream and it works perfectly fine.
Requirement: compress a byte[] to get another byte[] using java.util.zip.ZipOutputStream BUT without using any files on disk or in-memory(like here https://stackoverflow.com/a/18406927/9132186). Is this even possible?
Yes, you've already done it. You don't actually need MemoryFile in your example; just delete it from your implementation and write ZipEntry entry = new ZipEntry("try.txt") instead.
But you can't concatenate the zips of 10MB chunks of file and get a valid zip file for the combined file. Zipping doesn't work like that. You could have a solution which minimizes how much is in memory at once, perhaps. But breaking the original file up into chunks seems unworkable.
I'm using the Apache Compress library to read a .tar.gz file, something like this:
final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
try {
TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
while (tarEntry != null) {
byte[] btoRead = new byte[1024];
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
int len = 0;
while ((len = tarIn.read(btoRead)) != -1) {
bout.write(btoRead, 0, len);
}
bout.close();
tarEntry = tarIn.getNextTarEntry();
}
tarIn.close();
}
catch (IOException e) {
e.printStackTrace();
}
Is it possible not to extract this into a seperate file, and read it in memory somehow? Maybe into a giant String or something?
You could replace the file stream with a ByteArrayOutputStream.
i.e. replace this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
with this:
ByteArrayOutputStream bout = new ByteArrayOutputStream();
and then after closing bout, use bout.toByteArray() to get the bytes.
Is it possible not to extract this into a seperate file, and read it in memory somehow? Maybe into a giant String or something?
Yea sure.
Just replace the code in the inner loop that is openning files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding you are liable to mangle it ... irreversibly.)
Obviously, you are going to need to think through some of these issues yourself, but basic idea should work ... provided you have enough heap space.
copy the value of btoread to a String like
String s = String.valueof(byteVar);
and goon appending the byte value to the string untill end of the file reaches..
I use the "get" method from java drive api, and I can get the inputstream. but I cannt open the file when I use the inputstream to creat it. It likes the file is broken.
private static String fileurl = "C:\\googletest\\drive\\";
public static void newFile(String filetitle, InputStream stream) throws IOException {
String filepath = fileurl + filetitle;
BufferedInputStream bufferedInputStream=new BufferedInputStream(stream);
byte[] buffer = new byte[bufferedInputStream.available()];
File file = new File(filepath);
if (!file.exists()) {
file.getParentFile().mkdirs();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(new FileOutputStream(filepath));
while( bufferedInputStream.read(buffer) != -1) {
bufferedOutputStream.write(buffer);
}
bufferedOutputStream.flush();
bufferedOutputStream.close();
}
}
Firstly, C:\googletest\drive\ is not a URL. It is a file system pathname.
Next, the following probably does not do what you think it does:
byte[] buffer = new byte[bufferedInputStream.available()];
The problem is that the available() call can return zero ... for a non-empty stream. The value returned by available() is an estimate of how many bytes that are currently available to read ... right now. That is not necessarily the stream length ... or anything related to it. And indeed the device drivers for some devices consistently return zero, even when there is data to be read.
Finally, this is wrong:
while( bufferedInputStream.read(buffer) != -1) {
bufferedOutputStream.write(buffer);
You are assuming that read returning -1 means that it filled the buffer. That is not so. Any one of the read calls could return with a partly full buffer. But then you write the entire buffer contents to the output stream ... including "junk" from previous reads.
Either or both of the 2nd and 3rd problems could lead to file corruption. In fact, the third one is likely to.
Someone explain to me what InputStream and OutputStream are?
I am confused about the use cases for both InputStream and OutputStream.
If you could also include a snippet of code to go along with your explanation, that would be great. Thanks!
The goal of InputStream and OutputStream is to abstract different ways to input and output: whether the stream is a file, a web page, or the screen shouldn't matter. All that matters is that you receive information from the stream (or send information into that stream.)
InputStream is used for many things that you read from.
OutputStream is used for many things that you write to.
Here's some sample code. It assumes the InputStream instr and OutputStream osstr have already been created:
int i;
while ((i = instr.read()) != -1) {
osstr.write(i);
}
instr.close();
osstr.close();
InputStream is used for reading, OutputStream for writing. They are connected as decorators to one another such that you can read/write all different types of data from all different types of sources.
For example, you can write primitive data to a file:
File file = new File("C:/text.bin");
file.createNewFile();
DataOutputStream stream = new DataOutputStream(new FileOutputStream(file));
stream.writeBoolean(true);
stream.writeInt(1234);
stream.close();
To read the written contents:
File file = new File("C:/text.bin");
DataInputStream stream = new DataInputStream(new FileInputStream(file));
boolean isTrue = stream.readBoolean();
int value = stream.readInt();
stream.close();
System.out.printlin(isTrue + " " + value);
You can use other types of streams to enhance the reading/writing. For example, you can introduce a buffer for efficiency:
DataInputStream stream = new DataInputStream(
new BufferedInputStream(new FileInputStream(file)));
You can write other data such as objects:
MyClass myObject = new MyClass(); // MyClass have to implement Serializable
ObjectOutputStream stream = new ObjectOutputStream(
new FileOutputStream("C:/text.obj"));
stream.writeObject(myObject);
stream.close();
You can read from other different input sources:
byte[] test = new byte[] {0, 0, 1, 0, 0, 0, 1, 1, 8, 9};
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(test));
int value0 = stream.readInt();
int value1 = stream.readInt();
byte value2 = stream.readByte();
byte value3 = stream.readByte();
stream.close();
System.out.println(value0 + " " + value1 + " " + value2 + " " + value3);
For most input streams there is an output stream, also. You can define your own streams to reading/writing special things and there are complex streams for reading complex things (for example there are Streams for reading/writing ZIP format).
From the Java Tutorial:
A stream is a sequence of data.
A program uses an input stream to read data from a source, one item at a time:
A program uses an output stream to write data to a destination, one item at time:
The data source and data destination pictured above can be anything that holds, generates, or consumes data. Obviously this includes disk files, but a source or destination can also be another program, a peripheral device, a network socket, or an array.
Sample code from oracle tutorial:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("xanadu.txt");
out = new FileOutputStream("outagain.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
This program uses byte streams to copy xanadu.txt file to outagain.txt , by writing one byte at a time
Have a look at this SE question to know more details about advanced Character streams, which are wrappers on top of Byte Streams :
byte stream and character stream
you read from an InputStream and write to an OutputStream.
for example, say you want to copy a file. You would create a FileInputStream to read from the source file and a FileOutputStream to write to the new file.
If your data is a character stream, you could use a FileReader instead of an InputStream and a FileWriter instead of an OutputStream if you prefer.
InputStream input = ... // many different types
OutputStream output = ... // many different types
byte[] buffer = new byte[1024];
int n = 0;
while ((n = input.read(buffer)) != -1)
output.write(buffer, 0, n);
input.close();
output.close();
OutputStream is an abstract class that represents writing output. There are many different OutputStream classes, and they write out to certain things (like the screen, or Files, or byte arrays, or network connections, or etc). InputStream classes access the same things, but they read data in from them.
Here is a good basic example of using FileOutputStream and FileInputStream to write data to a file, then read it back in.
A stream is a continuous flow of liquid, air, or gas.
Java stream is a flow of data from a source into a destination. The source or destination can be a disk, memory, socket, or other programs. The data can be bytes, characters, or objects. The same applies for C# or C++ streams. A good metaphor for Java streams is water flowing from a tap into a bathtub and later into a drainage.
The data represents the static part of the stream; the read and write methods the dynamic part of the stream.
InputStream represents a flow of data from the source, the OutputStream represents a flow of data into the destination.
Finally, InputStream and OutputStream are abstractions over low-level access to data, such as C file pointers.
Stream: In laymen terms stream is data , most generic stream is binary representation of data.
Input Stream : If you are reading data from a file or any other source , stream used is input stream. In a simpler terms input stream acts as a channel to read data.
Output Stream : If you want to read and process data from a source (file etc) you first need to save the data , the mean to store data is output stream .
An output stream is generally related to some data destination like a file or a network etc.In java output stream is a destination where data is eventually written and it ends
import java.io.printstream;
class PPrint {
static PPrintStream oout = new PPrintStream();
}
class PPrintStream {
void print(String str) {
System.out.println(str)
}
}
class outputstreamDemo {
public static void main(String args[]) {
System.out.println("hello world");
System.out.prinln("this is output stream demo");
}
}
For one kind of InputStream, you can think of it as a "representation" of a data source, like a file.
For example:
FileInputStream fileInputStream = new FileInputStream("/path/to/file/abc.txt");
fileInputStream represents the data in this path, which you can use read method to read bytes from the file.
For the other kind of InputStream, they take in another inputStream and do further processing, like decompression.
For example:
GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
gzipInputStream will treat the fileInputStream as a compressed data source. When you use the read(buffer, 0, buffer.length) method, it will decompress part of the gzip file into the buffer you provide.
The reason why we use InputStream because as the data in the source becomes larger and larger, say we have 500GB data in the source file, we don't want to hold everything in the memory (expensive machine; not friendly for GC allocation), and we want to get some result faster (reading the whole file may take a long time).
The same thing for OutputStream. We can start moving some result to the destination without waiting for the whole thing to finish, plus less memory consumption.
If you want more explanations and examples, you have check these summaries: InputStream, OutputStream, How To Use InputStream, How To Use OutputStream
In continue to the great other answers, in my simple words:
Stream - like mentioned #Sher Mohammad is data.
Input stream - for example is to get input – data – from the file. The case is when I have a file (the user upload a file – input) – and I want to read what we have there.
Output Stream – is the vice versa. For example – you are generating an excel file, and output it to some place.
The “how to write” to the file, is defined at the sender (the excel workbook class) not at the file output stream.
See here example in this context.
try (OutputStream fileOut = new FileOutputStream("xssf-align.xlsx")) {
wb.write(fileOut);
}
wb.close();
I am trying to read a single file from a java.util.zip.ZipInputStream, and copy it into a java.io.ByteArrayOutputStream (so that I can then create a java.io.ByteArrayInputStream and hand that to a 3rd party library that will end up closing the stream, and I don't want my ZipInputStream getting closed).
I'm probably missing something basic here, but I never enter the while loop here:
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
try {
while ((bytesRead = zipStream.read(tempBuffer)) != -1) {
streamBuilder.write(tempBuffer, 0, bytesRead);
}
} catch (IOException e) {
// ...
}
What am I missing that will allow me to copy the stream?
Edit:
I should have mentioned earlier that this ZipInputStream is not coming from a file, so I don't think I can use a ZipFile. It is coming from a file uploaded through a servlet.
Also, I have already called getNextEntry() on the ZipInputStream before getting to this snippet of code. If I don't try copying the file into another InputStream (via the OutputStream mentioned above), and just pass the ZipInputStream to my 3rd party library, the library closes the stream, and I can't do anything more, like dealing with the remaining files in the stream.
Your loop looks valid - what does the following code (just on it's own) return?
zipStream.read(tempBuffer)
if it's returning -1, then the zipStream is closed before you get it, and all bets are off. It's time to use your debugger and make sure what's being passed to you is actually valid.
When you call getNextEntry(), does it return a value, and is the data in the entry meaningful (i.e. does getCompressedSize() return a valid value)? IF you are just reading a Zip file that doesn't have read-ahead zip entries embedded, then ZipInputStream isn't going to work for you.
Some useful tidbits about the Zip format:
Each file embedded in a zip file has a header. This header can contain useful information (such as the compressed length of the stream, it's offset in the file, CRC) - or it can contain some magic values that basically say 'The information isn't in the stream header, you have to check the Zip post-amble'.
Each zip file then has a table that is attached to the end of the file that contains all of the zip entries, along with the real data. The table at the end is mandatory, and the values in it must be correct. In contrast, the values embedded in the stream do not have to be provided.
If you use ZipFile, it reads the table at the end of the zip. If you use ZipInputStream, I suspect that getNextEntry() attempts to use the entries embedded in the stream. If those values aren't specified, then ZipInputStream has no idea how long the stream might be. The inflate algorithm is self terminating (you actually don't need to know the uncompressed length of the output stream in order to fully recover the output), but it's possible that the Java version of this reader doesn't handle this situation very well.
I will say that it's fairly unusual to have a servlet returning a ZipInputStream (it's much more common to receive an inflatorInputStream if you are going to be receiving compressed content.
You probably tried reading from a FileInputStream like this:
ZipInputStream in = new ZipInputStream(new FileInputStream(...));
This won’t work since a zip archive can contain multiple files and you need to specify which file to read.
You could use java.util.zip.ZipFile and a library such as IOUtils from Apache Commons IO or ByteStreams from Guava that assist you in copying the stream.
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
try (ZipFile zipFile = new ZipFile("foo.zip")) {
ZipEntry zipEntry = zipFile.getEntry("fileInTheZip.txt");
try (InputStream in = zipFile.getInputStream(zipEntry)) {
IOUtils.copy(in, out);
}
}
I'd use IOUtils from the commons io project.
IOUtils.copy(zipStream, byteArrayOutputStream);
You're missing call
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
to position the first byte decompressed of the first entry.
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
try {
while ( (bytesRead = zipStream.read(tempBuffer)) != -1 ){
streamBuilder.write(tempBuffer, 0, bytesRead);
}
} catch (IOException e) {
...
}
You could implement your own wrapper around the ZipInputStream that ignores close() and hand that off to the third-party library.
thirdPartyLib.handleZipData(new CloseIgnoringInputStream(zipStream));
class CloseIgnoringInputStream extends InputStream
{
private ZipInputStream stream;
public CloseIgnoringInputStream(ZipInputStream inStream)
{
stream = inStream;
}
public int read() throws IOException {
return stream.read();
}
public void close()
{
//ignore
}
public void reallyClose() throws IOException
{
stream.close();
}
}
I would call getNextEntry() on the ZipInputStream until it is at the entry you want (use ZipEntry.getName() etc.). Calling getNextEntry() will advance the "cursor" to the beginning of the entry that it returns. Then, use ZipEntry.getSize() to determine how many bytes you should read using zipInputStream.read().
It is unclear how you got the zipStream. It should work when you get it like this:
zipStream = zipFile.getInputStream(zipEntry)
t is unclear how you got the zipStream. It should work when you get it like this:
zipStream = zipFile.getInputStream(zipEntry)
If you are obtaining the ZipInputStream from a ZipFile you can get one stream for the 3d party library, let it use it, and you obtain another input stream using the code before.
Remember, an inputstream is a cursor. If you have the entire data (like a ZipFile) you can ask for N cursors over it.
A diferent case is if you only have an "GZip" inputstream, only an zipped byte stream. In that case you ByteArrayOutputStream buffer makes all sense.
Please try code bellow
private static byte[] getZipArchiveContent(File zipName) throws WorkflowServiceBusinessException {
BufferedInputStream buffer = null;
FileInputStream fileStream = null;
ByteArrayOutputStream byteOut = null;
byte data[] = new byte[BUFFER];
try {
try {
fileStream = new FileInputStream(zipName);
buffer = new BufferedInputStream(fileStream);
byteOut = new ByteArrayOutputStream();
int count;
while((count = buffer.read(data, 0, BUFFER)) != -1) {
byteOut.write(data, 0, count);
}
} catch(Exception e) {
throw new WorkflowServiceBusinessException(e.getMessage(), e);
} finally {
if(null != fileStream) {
fileStream.close();
}
if(null != buffer) {
buffer.close();
}
if(null != byteOut) {
byteOut.close();
}
}
} catch(Exception e) {
throw new WorkflowServiceBusinessException(e.getMessage(), e);
}
return byteOut.toByteArray();
}
Check if the input stream is positioned in the begging.
Otherwise, as implementation: I do not think that you need to write to the result stream while you are reading, unless you process this exact stream in another thread.
Just create a byte array, read the input stream, then create the output stream.