I'm dealing with some Java code in which there's an InputStream that I read once and then need to read again in the same method.
The problem is that I need to reset its position to the start in order to read it twice.
I've found a hack-ish solution to the problem:
is.mark(Integer.MAX_VALUE);
// read the InputStream 'is' fully
// { ... }
try
{
    is.reset();
}
catch (IOException e)
{
    e.printStackTrace();
}
Does this solution lead to some unexpected behaviours? Or will it work in its dumbness?
As written, you have no guarantees, because mark() is not required to report whether it was successful. To get a guarantee, you must first call markSupported(), and it must return true.
Also as written, the specified read limit is very dangerous. If you happen to be using a stream that buffers in-memory, it will potentially allocate a 2GB buffer. On the other hand, if you happen to be using a FileInputStream, you're fine.
A better approach is to use a BufferedInputStream with an explicit buffer.
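For example (a minimal sketch, assuming rawStream is the stream in question and 64 KB is a safe upper bound on how much you will need to re-read):
InputStream is = new BufferedInputStream(rawStream, 64 * 1024);
// BufferedInputStream guarantees markSupported() == true
is.mark(64 * 1024); // read limit sized to the expected data, not Integer.MAX_VALUE
// ... first pass over the stream ...
is.reset();         // rewind for the second pass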
It depends on the InputStream implementation. You could also consider reading the stream into a byte[] once instead. The easiest way is to use Apache commons-io:
byte[] bytes = IOUtils.toByteArray(inputStream);
You can't do this reliably; some InputStreams (such as ones connected to terminals or sockets) don't support mark and reset (see markSupported). If you really have to traverse the data twice, you need to read it into your own buffer.
Instead of trying to reset the InputStream, load it into a buffer such as a StringBuilder or, if it's a binary data stream, a ByteArrayOutputStream. You can then process the buffer within the method as many times as you want.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int read = 0;
byte[] buff = new byte[1024];
while ((read = inStream.read(buff)) != -1) {
    bos.write(buff, 0, read);
}
byte[] streamData = bos.toByteArray();
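Each pass then just wraps the same bytes in a fresh stream, for example:
// every pass gets its own ByteArrayInputStream over the same byte array
InputStream firstPass = new ByteArrayInputStream(streamData);
// ... process firstPass ...
InputStream secondPass = new ByteArrayInputStream(streamData);
// ... process secondPass ...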
For me, the easiest solution was to pass the object from which the InputStream could be obtained, and just obtain it again. In my case, it was from a ContentResolver.
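Roughly like this (an Android sketch, assuming resolver is your ContentResolver and uri identifies the content; both names are placeholders):
try (InputStream first = resolver.openInputStream(uri)) {
    // ... first pass ...
}
try (InputStream second = resolver.openInputStream(uri)) {
    // ... second pass, over a freshly opened stream ...
}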
Goal: Decrypt data from one source and write the decrypted data to a file.
try (FileInputStream fis = new FileInputStream(targetPath.toFile());
     ReadableByteChannel channel = newDecryptedByteChannel(path, associatedData))
{
    FileChannel fc = fis.getChannel();
    long position = 0;
    while (position < ???)
    {
        position += fc.transferFrom(channel, position, CHUNK_SIZE);
    }
}
The implementation of newDecryptedByteChannel(Path, byte[]) should not be of interest; it just returns a ReadableByteChannel.
Problem: What is the condition to end the while loop? When is the "end of the byte channel" reached? Is transferFrom the right choice here?
This question might be related (the answer there is to just set the count to Long.MAX_VALUE). Unfortunately this doesn't help me, because the docs say that up to count bytes may be transferred, depending upon the natures and states of the channels.
Another thought was to just check whether the number of bytes actually transferred is 0 (returned from transferFrom), but that condition may also hold if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.
It is one of the bizarre features of FileChannel.transferFrom() that it never tells you about end of stream. You have to know the input length independently.
I would just use streams for this: specifically, a CipherInputStream around a BufferedInputStream around a FileInputStream, and a FileOutputStream.
But the code you posted doesn't make any sense anyway. It can't work. You are transferring into the input file, and via a channel that was derived from a FileInputStream, so it is read-only, so transferFrom() will throw an exception.
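As a rough sketch of that stream-based approach (assuming cipher is a javax.crypto.Cipher already initialized for decryption, and sourceFile/targetFile are placeholders for your paths):
try (InputStream in = new CipherInputStream(
             new BufferedInputStream(new FileInputStream(sourceFile)), cipher);
     OutputStream out = new FileOutputStream(targetFile)) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) { // read() returning -1 is the end-of-stream signal
        out.write(buf, 0, n);
    }
}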
As commented by @user207421, since you are reading from a ReadableByteChannel, the target channel needs to be derived from a FileOutputStream rather than a FileInputStream. And the condition for ending the loop in your code would have to be the size of the data underlying the ReadableByteChannel, which cannot be obtained from it unless you can get at its FileChannel and call its size method.
The way I found to do the transfer is through a ByteBuffer, as below.
ByteBuffer buf = ByteBuffer.allocate(1024 * 8);
while (readableByteChannel.read(buf) != -1)
{
    buf.flip();
    fc.write(buf); // fc is the FileChannel derived from a FileOutputStream
    buf.compact();
}
buf.flip();
while (buf.hasRemaining())
{
    fc.write(buf);
}
Below is the code that I have written. I want to do a simple thing: store binary file data into a ByteBuffer.
File file = new File(fileName);
try {
    ReadableByteChannel channel = new FileInputStream(fileName).getChannel();
    ByteBuffer buf = ByteBuffer.allocateDirect((int) file.length()); // allocateDirect takes an int, so cast the long length
    // How can I use read to get all the contents into buf?
} catch (Exception e) {
    e.printStackTrace();
}
I was wondering:
1. How can I use read to get all the data from the channel and store it in the ByteBuffer?
2. Is there a more elegant way to allocate the ByteBuffer, other than using a File object to get the length of the file?
I prefer to use memory mapping.
FileChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
If the file is greater than 2 GB, you have to use more than one mapping. On the plus side, this takes around 10 ms and doesn't use much heap or direct memory, regardless of the size of the file.
From the ReadableByteChannel Javadocs
read(ByteBuffer dst)
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
So ... channel.read(buf);
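Note that a single read() call is not guaranteed to fill the buffer, so a loop along these lines is safer:
// loop until the buffer is full or the channel reports end-of-stream (-1)
while (buf.hasRemaining() && channel.read(buf) != -1) {
    // keep reading
}
buf.flip(); // make the bytes just read available for consumption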
As for your second question, if you want to read the entire contents of the file into memory at once that seems like a reasonable approach.
We are streaming data in batches between a server (written in .NET running on Windows) and a client (written in Java running on Ubuntu). The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user driven. The response from the client is also compressed using GZip; this never fails and seems to be rock solid. The response from the client is controlled by the system.
Is there a chance that some arrangement of characters or some special characters are creating false EOF markers? Could it be white-space related? Is GZip suitable for compressing XML?
I am assuming that the code to read and write from the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question), such as the '#' sign.
Any ideas?
UPDATE:
The actual code, as requested. I thought it wasn't the code, because I had been to a couple of sites to get help on this issue and they all had more or less the same code. Some sites mentioned appended GZip streams; something to do with GZip creating multiple segments?
public String receive() throws IOException {
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
    do {
        int nrBytes = in.read(buffer);
        if (nrBytes > 0) {
            baos.write(buffer, 0, nrBytes);
        }
    } while (in.available() > 0);
    return compressor.decompress(baos.toByteArray());
}
public String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    try {
        GZIPInputStream inflater = new GZIPInputStream(in);
        byte[] byteBuffer = new byte[8192];
        int r;
        while ((r = inflater.read(byteBuffer)) > 0) {
            buffer.write(byteBuffer, 0, r);
        }
    } catch (IOException e) {
        log.error("Could not decompress stream", e);
        throw e;
    }
    return new String(buffer.toByteArray());
}
At first I thought there must be something wrong with the way I am reading in the stream, and I thought perhaps I was not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom and so far has not been reproducible led me to believe that it was the content rather than the scenario. But at this point I am totally baffled and for all I know it is the code.
Thanks again everyone.
Update 2:
As requested the .Net code:
Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)
That gets the raw data into bytes, and then it gets compressed:
Private Function Compress(ByVal Data As Byte()) As Byte()
    Try
        Using MS = New MemoryStream()
            Using Compression = New GZipStream(MS, CompressionMode.Compress)
                Compression.Write(Data, 0, Data.Length)
                Compression.Flush()
                Compression.Close()
                Return MS.ToArray()
            End Using
        End Using
    Catch ex As Exception
        Log.Error("Error trying to compress data", ex)
        Throw
    End Try
End Function
Update 3: Also added more Java code. The in variable is the InputStream returned from socket.getInputStream().
It certainly shouldn't be due to the data involved - the streams deal with binary data, so that shouldn't make any odds at all.
However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.
If you could provide some code, that would help a lot...
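In the meantime, for illustration, here's a sketch of a receive loop that relies only on read()'s return value (this assumes the sender closes the stream after each message; if the connection stays open, you need explicit framing, e.g. a length prefix):
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[8192];
int n;
while ((n = in.read(buffer)) != -1) { // -1 means the peer has closed the stream
    baos.write(buffer, 0, n);
}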
I would suspect that for some reason the data is altered along the way by being treated as text rather than binary, so it may be either \n conversions or a codepage alteration.
How is the gzipped stream transferred between the two systems?
It is not possible. EOF in TCP is delivered as an out-of-band FIN segment, not via the data.
Is there an easy (therefore quick) way to accomplish this? Basically, just take some input stream, which could be something like socket.getInputStream(), and have the stream's buffer automatically redirect to standard out?
There are no easy ways to do it, because InputStream has a pull-style interface, while OutputStream has a push-style one. You need some kind of pumping loop to pull data from the InputStream and push it into the OutputStream. Something like this (run it in a separate thread if necessary):
int size;
byte[] buffer = new byte[1024];
while ((size = in.read(buffer)) != -1) {
    out.write(buffer, 0, size);
}
It's already implemented in Apache Commons IO as IOUtils.copy()
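Assuming commons-io is on the classpath, that makes it a one-liner:
IOUtils.copy(socket.getInputStream(), System.out);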
You need a simple thread which reads from the input stream and writes to standard output. Make sure it yields to other threads.
Since Java 9, you can use InputStream.transferTo
Example
try (InputStream stream = Application.class.getResourceAsStream("/test.txt")) {
    stream.transferTo(System.out);
}
I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.
I'm not at all familiar with Java, but a bit of googling led me to this code:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) > -1) {
    out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();
That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.
I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.
Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?
Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:
ByteArrayOutputStream out = new ByteArrayOutputStream();
With:
FileOutputStream out = new FileOutputStream("filename");
If you do so, there is no need for the call out.toByteArray() at the end. Just make sure you close the FileOutputStream object when done, like this:
out.close();
See the documentation of FileOutputStream for more details.
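Putting the pieces together, a sketch with try-with-resources (Java 7+) so both streams are closed even on error:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
try (InputStream in = uc.getInputStream();
     FileOutputStream out = new FileOutputStream("filename")) {
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) > -1) {
        out.write(buffer, 0, bytesRead);
    }
}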
I don't know what you mean by "large" data, but try using the JVM parameter
java -Xmx256m ...
which sets the maximum heap size to 256 MB (or any value you like).
If you need the Content-Length and your web server is somewhat standards-conforming, then it should provide you a "Content-Length" header.
URLConnection#getContentLength() should give you that information upfront so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of an evil entity, that header may not match the number of bytes received. In that case, why don't you stream to a temp file first and copy that file later?)
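A sketch of pre-sizing the in-memory buffer from that header, if you do keep the ByteArrayOutputStream approach:
int length = uc.getContentLength(); // -1 when the server doesn't send the header
ByteArrayOutputStream out = (length > 0)
        ? new ByteArrayOutputStream(length) // pre-sized: no doubling, no waste
        : new ByteArrayOutputStream();      // fall back to the default growth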
In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It always doubles the buffer size, so if you read a 32 MB + 1 byte file, you end up with a 64 MB buffer. It might be better to implement your own, smarter byte-array stream, like this one:
http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
Subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.
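A minimal sketch of that (buf and count are protected fields of java.io.ByteArrayOutputStream):
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    byte[] rawBuffer() {
        return buf; // the internal array; only the first size() bytes are valid
    }
}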
But of course, if all you want to do is store the data in a file, you are better off using a FileOutputStream.