InputStreamReader buffering issue - java

I am reading data from a file that has, unfortunately, two types of character encoding.
There is a header and a body. The header is always in ASCII and defines the character set that the body is encoded in.
The header is not fixed length and must be run through a parser to determine its content/length.
The file may also be quite large, so I need to avoid bringing the entire content into memory.
So I started off with a single InputStream. I wrap it initially with an InputStreamReader with ASCII and decode the header and extract the character set for the body. All good.
Then I create a new InputStreamReader with the correct character set, drop it over the same InputStream and start trying to read the body.
Unfortunately it appears, and the javadoc confirms this, that InputStreamReader may choose to read ahead for efficiency purposes. So the reading of the header chews some/all of the body.
Does anyone have any suggestions for working round this issue? Would creating a CharsetDecoder manually and feeding in one byte at a time be a good idea (possibly wrapped in a custom Reader implementation)?
Thanks in advance.
EDIT: My final solution was to write an InputStreamReader that has no buffering, to ensure I can parse the header without chewing part of the body. Although this is not terribly efficient, I wrap the raw InputStream with a BufferedInputStream so it won't be an issue.
// An InputStreamReader that only consumes as many bytes as is necessary
// It does not do any read-ahead.
public class InputStreamReaderUnbuffered extends Reader
{
private final CharsetDecoder charsetDecoder;
private final InputStream inputStream;
private final ByteBuffer byteBuffer = ByteBuffer.allocate( 1 );
public InputStreamReaderUnbuffered( InputStream inputStream, Charset charset )
{
this.inputStream = inputStream;
charsetDecoder = charset.newDecoder();
}
@Override
public int read() throws IOException
{
boolean middleOfReading = false;
while ( true )
{
int b = inputStream.read();
if ( b == -1 )
{
if ( middleOfReading )
throw new IOException( "Unexpected end of stream, byte truncated" );
return -1;
}
byteBuffer.clear();
byteBuffer.put( (byte)b );
byteBuffer.flip();
CharBuffer charBuffer = charsetDecoder.decode( byteBuffer );
// although this is theoretically possible this would violate the unbuffered nature
// of this class so we throw an exception
if ( charBuffer.length() > 1 )
throw new IOException( "Decoded multiple characters from one byte!" );
if ( charBuffer.length() == 1 )
return charBuffer.get();
middleOfReading = true;
}
}
public int read( char[] cbuf, int off, int len ) throws IOException
{
for ( int i = 0; i < len; i++ )
{
int ch = read();
if ( ch == -1 )
return i == 0 ? -1 : i;
cbuf[ off + i ] = (char)ch; // honour the caller's offset into cbuf
}
return len;
}
public void close() throws IOException
{
inputStream.close();
}
}
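One caveat with the class above: the one-argument CharsetDecoder.decode(ByteBuffer) is a complete decoding operation. It resets the decoder and reports malformed input when the buffer ends in the middle of a character, so feeding it a single byte fails for multi-byte characters. Below is a minimal sketch of the same idea using the three-argument decode, which keeps incomplete bytes pending between calls. The class name, the 16-byte pending buffer, and the two-char output buffer are assumptions made for this sketch, not part of the original solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical name; reads one byte at a time and never reads ahead.
class IncrementalUnbufferedReader extends Reader {
    private final InputStream in;
    private final CharsetDecoder decoder;
    // Bytes of a character that is not complete yet (assumed to fit in 16 bytes).
    private final ByteBuffer pendingBytes = ByteBuffer.allocate(16);
    // Room for a surrogate pair, the most one character can decode to.
    private final CharBuffer decodedChars = CharBuffer.allocate(2);

    IncrementalUnbufferedReader(InputStream in, Charset charset) {
        this.in = in;
        this.decoder = charset.newDecoder();
        decodedChars.flip(); // start with nothing decoded
    }

    @Override
    public int read() throws IOException {
        while (!decodedChars.hasRemaining()) {
            int b = in.read();
            if (b == -1) {
                if (pendingBytes.position() > 0) {
                    throw new IOException("Unexpected end of stream inside a character");
                }
                return -1;
            }
            pendingBytes.put((byte) b);
            pendingBytes.flip();
            decodedChars.clear();
            // endOfInput=false: an incomplete sequence is left in pendingBytes, not reported.
            CoderResult result = decoder.decode(pendingBytes, decodedChars, false);
            if (result.isError()) {
                result.throwException();
            }
            pendingBytes.compact();
            decodedChars.flip();
        }
        return decodedChars.get();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            int ch = read();
            if (ch == -1) {
                return i == 0 ? -1 : i;
            }
            cbuf[off + i] = (char) ch;
        }
        return len;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Because the reader pulls exactly the bytes it needs, the underlying stream is left positioned at the first byte after the last character returned.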

Why don't you use 2 InputStreams? One for reading the header and another for the body.
The second InputStream should skip the header bytes.
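A minimal sketch of the second stream, assuming the input is a file that can be opened again and that the header parser reports the header length in bytes. The names BodyOpener, openBody and headerLengthInBytes are made up for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class BodyOpener {
    /** Opens a second stream on the file and positions it just past the header. */
    static Reader openBody(Path file, long headerLengthInBytes, Charset bodyCharset)
            throws IOException {
        InputStream in = Files.newInputStream(file);
        try {
            long remaining = headerLengthInBytes;
            while (remaining > 0) {
                long skipped = in.skip(remaining); // may skip fewer bytes than asked
                if (skipped > 0) {
                    remaining -= skipped;
                } else if (in.read() != -1) {
                    remaining--; // skip() made no progress, so consume one byte directly
                } else {
                    throw new EOFException("File is shorter than the header length");
                }
            }
            return new InputStreamReader(in, bodyCharset);
        } catch (IOException | RuntimeException e) {
            in.close();
            throw e;
        }
    }
}
```

The loop is there because InputStream.skip is allowed to skip fewer bytes than requested.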

Here is the pseudo code.
Use InputStream, but do not wrap a Reader around it.
Read bytes containing header and store them into ByteArrayOutputStream.
Create ByteArrayInputStream from ByteArrayOutputStream and decode header, this time wrap ByteArrayInputStream into Reader with ASCII charset.
Compute the length of non-ascii input, and read that number of bytes into another ByteArrayOutputStream.
Create another ByteArrayInputStream from the second ByteArrayOutputStream and wrap it with Reader with charset from the header.

I suggest rereading the stream from the start with a new InputStreamReader. Perhaps assume that InputStream.mark is supported.
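A rough sketch of that idea. It wraps the stream in a BufferedInputStream so that mark/reset is available, and it assumes a made-up header format (a single ASCII line naming the charset) purely for illustration. The readAheadLimit parameter is also an assumption: it has to cover the header plus whatever the header reader reads ahead.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MarkResetHeader {
    // Hypothetical header for this sketch: one ASCII line such as "UTF-8\n".
    static Reader openBody(InputStream raw, int readAheadLimit) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(readAheadLimit);

        // This reader may read ahead into the body; reset() below undoes that.
        Reader headerReader = new InputStreamReader(in, StandardCharsets.US_ASCII);
        StringBuilder header = new StringBuilder();
        int headerBytes = 0; // with ASCII, one char corresponds to one byte
        int c;
        while ((c = headerReader.read()) != -1) {
            headerBytes++;
            if (c == '\n') {
                break;
            }
            header.append((char) c);
        }

        in.reset(); // back to the start of the stream
        for (int i = 0; i < headerBytes; i++) {
            in.read(); // discard the header bytes again
        }
        return new InputStreamReader(in, Charset.forName(header.toString().trim()));
    }
}
```

If the header reader consumes more than readAheadLimit bytes, reset() may fail with an IOException, so the limit should be chosen generously.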

My first thought is to close the stream and reopen it, using InputStream#skip to skip past the header before giving the stream to the new InputStreamReader.
If you really, really don't want to reopen the file, you could use file descriptors to get more than one stream to the file, although you may have to use channels to have multiple positions within the file (since you can't assume you can reset the position with reset, it may not be supported).

It's even easier:
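As you said, your header is always in ASCII. So read the header directly from the InputStream, and when you're done with it, create the Reader with the correct encoding and read from it:
private Reader reader;
private InputStream stream;
private boolean headerFullyRead; // set by the header parsing below
private Charset encoding; // set by the header parsing below
public void read() throws IOException {
int c;
while ((c = stream.read()) != -1) {
// Parse the ASCII header byte by byte; once it is complete, set encoding and headerFullyRead
if ( headerFullyRead ) {
reader = new InputStreamReader( stream, encoding );
break;
}
}
if ( reader == null ) {
return; // the stream ended before the header was complete
}
while ((c = reader.read()) != -1) {
// Handle rest of file
}
}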

If you wrap the InputStream and limit all reads to just 1 byte at a time, it seems to disable the buffering inside of InputStreamReader.
This way we don't have to rewrite the InputStreamReader logic.
public class OneByteReadInputStream extends InputStream
{
private final InputStream inputStream;
public OneByteReadInputStream(InputStream inputStream)
{
this.inputStream = inputStream;
}
@Override
public int read() throws IOException
{
return inputStream.read();
}
@Override
public int read(byte[] b, int off, int len) throws IOException
{
// Never hand out more than one byte per call (and nothing at all when len is 0)
return super.read(b, off, Math.min(len, 1));
}
}
To construct:
new InputStreamReader(new OneByteReadInputStream(inputStream));

Related

How to properly handle InputStream as a method argument

I've been googling for almost two days and I still can't figure it out. I have an exercise that passes an InputStream as an argument and expects me to store whatever is passed and return the count, but I can't figure out how to handle the InputStream properly. I always get an argument error.
Code:
class Subtitles {
int redenBroj;
int vrPocetok;
int vrKraj;
String text;
public Subtitles() {
redenBroj = 0;
vrPocetok = 0;
vrKraj = 0;
text = null;
}
int loadSubtitles(InputStream is) {
}
}
InputStream is an abstract class. Therefore, the implementation of the method int loadSubtitles should not care about how the given InputStream is implemented: it can be anything, as long as it is a type of InputStream.
You can choose from different subclasses of InputStream so that you can test your method with your own data format:
FileInputStream -- You can use this type of input stream if you want to stream a file:
File sourceFile = new File("source.txt");
InputStream inputStream = new FileInputStream(sourceFile);
ByteArrayInputStream -- This is used to stream an array of bytes.
byte[] input = "this is an example array".getBytes();
InputStream inputStream = new ByteArrayInputStream(input);
Now that you have built an input stream, you can use it regardless of how it was built:
// Java 9+
byte[] content = inputStream.readAllBytes();
// do something with `content`
// before Java 9
int data = inputStream.read();
while (data != -1) {
// doSomething with `data`
data = inputStream.read(); // read next data
}
inputStream.close(); // or use the try-with-resources syntax
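For the exercise itself, here is a minimal sketch using try-with-resources. It assumes that "the count" means the number of text lines and that the input is UTF-8; the class and method names are invented for illustration, and parsing the actual subtitle fields is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineLoader {
    private final List<String> lines = new ArrayList<>();

    /** Stores every line read from the stream and returns how many lines are stored. */
    int load(InputStream is) throws IOException {
        // try-with-resources closes the reader (and the stream under it) automatically
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines.size();
    }
}
```

The method works the same whether it is handed a FileInputStream, a ByteArrayInputStream, or any other InputStream.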

Java decompress GZIP stream sequentially

My Java program implements a server that should get a very large file, compressed using gzip, from a client over websockets and should check for some byte pattern in the file content.
The client sends the file chunks embedded inside a proprietary protocol, so I receive message after message from the client, parse each message and extract the gzipped file content.
I can't hold the whole file in the program's memory, so I'm trying to decompress each chunk, process the data and continue to the next chunk.
I'm using the following code:
public static String gzipDecompress(byte[] compressed) throws IOException {
String uncompressed;
try (
ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
GZIPInputStream gis = new GZIPInputStream(bis);
Reader reader = new InputStreamReader(gis);
Writer writer = new StringWriter()
) {
char[] buffer = new char[10240];
for (int length = 0; (length = reader.read(buffer)) > 0; ) {
writer.write(buffer, 0, length);
}
uncompressed = writer.toString();
}
return uncompressed;
}
But I'm getting the following exception when calling the function with the first compressed chunk:
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:140)
It's important to mention that I'm not skipping any chunk and trying to decompress the chunks sequentially.
What am I missing?
The problem is that you play with those chunks manually.
The correct way would be to obtain some InputStream, wrap it with GZIPInputStream and then read the data.
InputStream is = // obtain the original gzip stream
GZIPInputStream gis = new GZIPInputStream(is);
Reader reader = new InputStreamReader(gis);
//... proceed reading and so on
GZIPInputStream works in a streaming fashion, so if you only ask for 10 KB at a time from your reader, the overall memory footprint will be low regardless of the size of the initial GZIP file.
Update after the question was updated
A possible solution for your situation is to write an InputStream implementation that streams bytes that are being put to it in chunks by your client protocol handler.
Here is a prototype:
public class ProtocolDataInputStream extends InputStream {
private BlockingQueue<byte[]> nextChunks = new ArrayBlockingQueue<byte[]>(100);
private byte[] currentChunk = null;
private int currentChunkOffset = 0;
private boolean noMoreChunks = false;
@Override
public synchronized int read() throws IOException {
boolean takeNextChunk = currentChunk == null || currentChunkOffset >= currentChunk.length;
if (takeNextChunk) {
if (noMoreChunks) {
// stream is exhausted
return -1;
} else {
currentChunk = nextChunks.take();
currentChunkOffset = 0;
}
}
return currentChunk[currentChunkOffset++] & 0xFF; // read() must return 0..255, not a signed byte
}
@Override
public synchronized int available() throws IOException {
if (currentChunk == null) {
return 0;
} else {
return currentChunk.length - currentChunkOffset;
}
}
public synchronized void addChunk(byte[] chunk, boolean chunkIsLast) {
nextChunks.add(chunk);
if (chunkIsLast) {
noMoreChunks = true;
}
}
}
Your client protocol handler adds byte chunks using addChunk(), while your decompressing code pulls the data out of this stream (via Reader).
Please note that this code has some issues:
The queue has a limited size. If addChunk() is called faster than the stream is read, the queue fills up; with add() as used here that throws an IllegalStateException, whereas put() would block instead. Either may or may not be desirable.
Only the read() method is implemented, for illustration purposes. For performance, it is better to implement read(byte[]) in the same manner.
Conservative synchronization is used under the assumption that the reader (decompressor) and the writer (protocol handler calling addChunk()) are different threads.
InterruptedException is not handled on take() to avoid too many details.
If your decompressor and addChunk() execute in the same thread (in the same loop), then you could try to use the InputStream.available() method when pulling using InputStream or Reader.ready() when pulling with a Reader.
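As a rough sketch of how the notes above could be addressed, the variant below lets the queue do the cross-thread hand-off, so read() holds no lock while it waits and addChunk() can always proceed. End of stream is signalled by a marker object in the queue, so chunks queued before the last one are still delivered. The class name and the marker approach are assumptions of this sketch, not the prototype's design.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical name; intended for one reading thread and one writing thread.
class ChunkQueueInputStream extends InputStream {
    // Marker object (compared by identity) that signals the end of the stream.
    private static final byte[] END = new byte[0];

    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(100);
    // The three fields below are touched only by the reading thread.
    private byte[] current = new byte[0];
    private int offset = 0;
    private boolean exhausted = false;

    @Override
    public int read() throws IOException {
        while (!exhausted && offset >= current.length) {
            try {
                current = queue.take(); // blocks until the writer supplies a chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("Interrupted while waiting for a chunk");
            }
            offset = 0;
            exhausted = (current == END);
        }
        // Mask to 0..255 so that byte 0xFF is not confused with end of stream (-1).
        return exhausted ? -1 : (current[offset++] & 0xFF);
    }

    /** Blocks when the queue is full, which applies back-pressure to the writer. */
    public void addChunk(byte[] chunk, boolean chunkIsLast) throws InterruptedException {
        queue.put(chunk);
        if (chunkIsLast) {
            queue.put(END);
        }
    }
}
```

Wrapping this stream in a GZIPInputStream on the reading thread then gives the decompressor a continuous view of the chunks.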
An arbitrary sequence of bytes from a gzipped stream is not valid standalone gzip data. One way or another, you must concatenate all the byte chunks.
The easiest way is to accumulate them all with a simple pipe:
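import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.zip.GZIPInputStream;
public class ChunkInflater {
private final PipedOutputStream pipe;
private final PipedInputStream pipedInput;
private InputStream stream;
public ChunkInflater()
throws IOException {
pipe = new PipedOutputStream();
pipedInput = new PipedInputStream(pipe);
}
// GZIPInputStream reads the gzip header in its constructor, so it is created
// lazily here, on the reading thread, instead of blocking in the constructor
// above before any chunk has been written.
public synchronized InputStream getInputStream()
throws IOException {
if (stream == null) {
stream = new GZIPInputStream(pipedInput);
}
return stream;
}
public void addChunk(byte[] compressedChunk)
throws IOException {
pipe.write(compressedChunk);
}
}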
Now you have an InputStream you can read in whatever increments you desire. For instance:
ChunkInflater inflater = new ChunkInflater();
Callable<Void> chunkReader = new Callable<Void>() {
@Override
public Void call()
throws IOException {
byte[] chunk;
while ((chunk = readChunkFromSource()) != null) {
inflater.addChunk(chunk);
}
return null;
}
};
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.submit(chunkReader);
executor.shutdown();
Reader reader = new InputStreamReader(inflater.getInputStream());
// read text here
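When all chunks can be pulled on demand from a single thread, a java.io.SequenceInputStream is another way to present them as one continuous stream to GZIPInputStream. This is only a sketch of that alternative, not part of the pipe-based answer: the class name, the Iterator used as the chunk source, and the UTF-8 charset are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.zip.GZIPInputStream;

class ChunkedGzip {
    /** Presents the chunks, fetched lazily one at a time, as one continuous stream. */
    static InputStream concatenate(Iterator<byte[]> chunks) {
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return chunks.hasNext();
            }

            @Override
            public InputStream nextElement() {
                return new ByteArrayInputStream(chunks.next());
            }
        };
        return new SequenceInputStream(streams);
    }

    /** Decompresses the concatenated chunks; a real caller would process each block instead. */
    static String decompress(Iterator<byte[]> chunks) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(concatenate(chunks)), StandardCharsets.UTF_8)) {
            char[] buffer = new char[10240];
            int length;
            while ((length = reader.read(buffer)) != -1) {
                text.append(buffer, 0, length);
            }
        }
        return text.toString();
    }
}
```

Only the chunk currently being read is held in memory, because the enumeration asks the iterator for the next chunk only when the previous one is exhausted.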

can't work with BufferedInputStream and BufferedReader together

I'm trying to read the first line from a socket stream with a BufferedReader wrapped around a BufferedInputStream. The first line (1) is the size of some content (2), and that content contains the size of another piece of content (3).
1. Reads correctly (with BufferedReader, _bin.readLine())
2. Reads correctly too (with _in.read(byte[] b))
3. Won't read; there seems to be more content than the size I read in (2)
I think the problem is that I'm reading with the BufferedReader and then with the BufferedInputStream. Can anyone help me?
public HashMap<String, byte[]> readHead() throws IOException {
JSONObject json;
try {
HashMap<String, byte[]> map = new HashMap<>();
System.out.println("reading header");
int headersize = Integer.parseInt(_bin.readLine());
byte[] parsable = new byte[headersize];
_in.read(parsable);
json = new JSONObject(new String(parsable));
map.put("id", lTob(json.getLong(SagConstants.KEY_ID)));
map.put("length", iTob(json.getInt(SagConstants.KEY_SIZE)));
map.put("type", new byte[]{(byte)json.getInt(SagConstants.KEY_TYPE)});
return map;
} catch(SocketException | JSONException e) {
_exception = e.getMessage();
_error_code = SagConstants.ERROR_OCCOURED_EXCEPTION;
return null;
}
}
Sorry for my English and for the poor explanation; I hope the problem is understandable.
The file format is:
size1
{json, length is given size1, there is size2 given}
{second json, length is size2}
_in is a BufferedInputStream.
_bin is a BufferedReader on top of _in.
With _bin, I read the first line (size1) and convert it to an integer.
With _in, I read the next block, whose length is size1 and which contains size2.
Then I try to read the last block, whose size is size2, with something like this:
byte[] b = new byte[secondSize];
_in.read(b);
Nothing happens here; the program just hangs.
can't work with BufferedInputStream and BufferedReader together
That's correct. If you use any buffered stream or reader on a socket [or indeed any data source], you can't use any other stream or reader with it whatsoever. Data will get 'lost', that is to say read-ahead, in the buffer of the buffered stream or reader, and will not be available to the other stream/reader.
You need to rethink your design.
You create a BufferedReader _bin on top of the BufferedInputStream _in and read the same data through both. Each object keeps its own buffer and read position, and the BufferedReader reads ahead, so the later read through _in does not continue where readLine() stopped. You should read size1 with _in too.
int headersize = Integer.parseInt(readLine(_in, ""));
byte[] parsable = new byte[headersize];
_in.read(parsable);
Use the readLine below to read all the data through the BufferedInputStream.
private final static byte NL = 10;// new line
private final static byte EOF = -1;// end of file
private final static byte EOL = 0;// end of line
private static String readLine(BufferedInputStream reader,
String accumulator) throws IOException {
byte[] container = new byte[1];
reader.read(container);
byte byteRead = container[0];
if (byteRead == NL || byteRead == EOL || byteRead == EOF) {
return accumulator;
}
String input = "";
input = new String(container, 0, 1);
accumulator = accumulator + input;
return readLine(reader, accumulator);
}
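A minimal sketch of reading both the size line and the binary blocks through one and the same stream, with an iterative line reader instead of recursion and DataInputStream.readFully to make sure a block is read completely. The class and method names are hypothetical, and the size line is assumed to be ASCII terminated by a line feed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SingleStreamFraming {
    /** Reads bytes up to the next '\n' without consuming anything after it. */
    static String readAsciiLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            throw new EOFException("Stream ended before a line could be read");
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Reads exactly length bytes, looping until the block is complete. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        // DataInputStream does no buffering of its own, so wrapping here is safe.
        new DataInputStream(in).readFully(data);
        return data;
    }
}
```

With these helpers, size1 is read with readAsciiLine(_in) and each JSON block with readExactly(_in, size), so no second reader is competing for the bytes.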

Safely reading http request headers in java

I'm building my own HTTP web server in Java and would like to implement some security measures while reading the HTTP request header from a socket input stream.
I'm trying to prevent scenarios where someone sending an extremely long single-line header, or an absurd number of header lines, causes memory exhaustion or other unwanted effects.
I'm currently trying to do this by reading 8 KB of data into a byte array and parsing all the headers within that buffer.
But as far as I know, this means the input stream's current offset is always already 8 KB from its starting point, even if there were only 100 bytes of header.
The code I have so far:
InputStream stream = socket.getInputStream();
HashMap<String, String> headers = new HashMap<String, String>();
byte [] buffer = new byte[8*1024];
stream.read( buffer , 0 , 8*1024);
ByteArrayInputStream bytestream = new ByteArrayInputStream( buffer );
InputStreamReader streamReader = new InputStreamReader( bytestream );
BufferedReader reader = new BufferedReader( streamReader );
String requestline = reader.readLine();
for ( ;; )
{
String line = reader.readLine();
if ( line.equals( "" ) )
break;
String[] header = line.split( ":" , 2 );
headers.put( header[0] , header[1] ); //TODO: check for bad header
}
//if contentlength > 0
// read body
So my question is: how can I be sure that I'm reading the body data (if any) starting from the correct position in the input stream?
I don't use streams a lot, so I don't really have a feel for them, and Google hasn't been helpful so far.
I figured out an answer myself (it was easier than I thought it would be).
My guess is that it's not buffered (I have no idea when something is buffered anyway), but it works.
public class SafeHttpHeaderReader
{
public static final int MAX_READ = 8*1024;
private InputStream stream;
private int bytesRead;
public SafeHttpHeaderReader(InputStream stream)
{
this.stream = stream;
bytesRead = 0;
}
public boolean hasReachedMax()
{
return bytesRead >= MAX_READ;
}
public String readLine() throws IOException, Http400Exception
{
String s = "";
while(bytesRead < MAX_READ)
{
String n = read();
if(n.equals( "" ))
break;
if(n.equals( "\r" ))
{
if(read().equals( "\n" ))
break;
throw new Http400Exception();
}
s += n;
}
return s;
}
private String read() throws IOException
{
int b = readByte();
if(b == -1)
return "";
return new String( new byte[]{(byte) b} , "ASCII");
}
private int readByte() throws IOException
{
// keep the int result so that byte 0xFF is not mistaken for end of stream (-1)
int b = stream.read();
if(b != -1)
bytesRead ++;
return b;
}
}

Read Numeric Values From text File that have been written in Android

I need to be able to read the bytes from a file in android.
Everywhere I look, it seems that FileInputStream should be used to read bytes from a file but that is not what I want to do.
I want to be able to read a text file that contains (edit) a textual representation of byte-wide numeric values (/edit) that I want to save to an array.
An example of the text file I want to have converted to a byte array follows:
0x04 0xF2 0x33 0x21 0xAA
The final file will be much longer. Using FileInputStream takes the values of each character where I want to save an array of length five to have the values listed above.
I want the array to be processed like:
ExampleArray[0] = (byte) 0x04;
ExampleArray[1] = (byte) 0xF2;
ExampleArray[2] = (byte) 0x33;
ExampleArray[3] = (byte) 0x21;
ExampleArray[4] = (byte) 0xAA;
Using FileInputStream on a text file returns the ASCII values of the characters and not the values I need written to the array.
The simplest solution is to use FileInputStream.read(byte[] a) method which will transfer the bytes from file into byte array.
Edit: It seems I've misread the requirements. So the file contains the text representation of bytes.
Scanner scanner = new Scanner(new FileInputStream(FILENAME));
String input;
while (scanner.hasNext()) {
input = scanner.next();
long number = Long.decode(input);
// do something with the value
}
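To end up with the byte array the question asks for, the decoded values can be collected as they are read. A minimal sketch, assuming whitespace-separated tokens such as 0x04 that each fit in one byte; the class and method names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Scanner;

class HexTextParser {
    /** Parses whitespace-separated tokens like "0x04 0xF2" into a byte array. */
    static byte[] parse(Readable source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                int value = Integer.decode(token); // understands the 0x prefix
                if (value < 0 || value > 0xFF) {
                    throw new NumberFormatException("Not a byte-wide value: " + token);
                }
                out.write(value); // writes the low eight bits
            }
        }
        return out.toByteArray();
    }
}
```

For a file, the source could be a java.io.FileReader or an InputStreamReader around a FileInputStream.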
Old answer (obviously wrong for this case, but I'll leave it for posterity):
Use a FileInputStream's read(byte[]) method.
FileInputStream in = new FileInputStream(filename);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = in.read(buffer, 0, buffer.length);
You just don't store bytes as text. Never!
Because 0x00 can be written as one byte in a file, or as a string, in this case (hex) taking up 4 times more space.
If you're required to do this, discuss how awful this decision would be!
I will edit my answer if you can provide a sensible reason though.
You would only save stuff as actual text, if:
It is easier (not the case)
It adds value (if increasing the file size by a factor of more than 4 (spaces count) adds value, then yes)
If users should be able to edit the file (then you would omit the "0x"...)
You can write bytes like this:
public static void writeBytes(byte[] in, File file, boolean append) throws IOException {
FileOutputStream fos = null;
try {
fos = new FileOutputStream(file, append);
fos.write(in);
} finally {
if (fos != null)
fos.close();
}
}
and read like this:
public static byte[] readBytes(File file) throws IOException {
return readBytes(file, (int) file.length());
}
public static byte[] readBytes(File file, int length) throws IOException {
byte[] content = new byte[length];
FileInputStream fis = null;
try {
fis = new FileInputStream(file);
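int offset = 0;
while (offset < length) {
// read() may return fewer bytes than requested, and -1 at end of file
int n = fis.read(content, offset, length - offset);
if (n == -1)
throw new java.io.EOFException("File ended after " + offset + " of " + length + " bytes");
offset += n;
}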
} finally {
if (fis != null)
fis.close();
}
return content;
}
and therefore have:
public static void writeString(String in, File file, String charset, boolean append)
throws IOException {
writeBytes(in.getBytes(charset), file, append);
}
public static String readString(File file, String charset) throws IOException {
return new String(readBytes(file), charset);
}
to write and read strings.
Note that I don't use the try-with-resource construct because Android's current Java source level is too low for that. :(
