Safely reading HTTP request headers in Java

I'm building my own HTTP web server in Java and would like to implement some security measures while reading the HTTP request header from a socket InputStream.
I'm trying to prevent scenarios where someone sending extremely long single header lines or absurd numbers of header lines could cause memory overflows or other things you wouldn't want.
I'm currently trying to do this by reading 8 KB of data into a byte array and parsing all the headers within the buffer I just created.
But as far as I know, this means the InputStream's offset is already 8 KB past its starting point, even if the headers only took 100 bytes.
The code I have so far:
InputStream stream = socket.getInputStream();
HashMap<String, String> headers = new HashMap<String, String>();
byte[] buffer = new byte[8 * 1024];
stream.read(buffer, 0, 8 * 1024);

ByteArrayInputStream bytestream = new ByteArrayInputStream(buffer);
InputStreamReader streamReader = new InputStreamReader(bytestream);
BufferedReader reader = new BufferedReader(streamReader);

String requestline = reader.readLine();
for (;;)
{
    String line = reader.readLine();
    if (line.equals(""))
        break;
    String[] header = line.split(":", 2);
    headers.put(header[0], header[1]); // TODO: check for bad header
}
// if contentlength > 0
//     read body
So my question is: how can I be sure that I'm reading the body data (if any) starting from the correct position in the InputStream?
I don't use streams very often, so I don't really have a feel for them, and Google hasn't been helpful so far.

I figured out an answer myself (it was easier than I thought it would be).
If I had to guess, I'd say it isn't buffered (I've no idea when something counts as buffered anyway), but it works.
public class SafeHttpHeaderReader
{
    public static final int MAX_READ = 8 * 1024;

    private InputStream stream;
    private int bytesRead;

    public SafeHttpHeaderReader(InputStream stream)
    {
        this.stream = stream;
        bytesRead = 0;
    }

    public boolean hasReachedMax()
    {
        return bytesRead >= MAX_READ;
    }

    public String readLine() throws IOException, Http400Exception
    {
        StringBuilder s = new StringBuilder();
        while (bytesRead < MAX_READ)
        {
            String n = read();
            if (n.equals("")) // end of stream
                break;
            if (n.equals("\r"))
            {
                if (read().equals("\n"))
                    break;
                // a bare CR is not a valid line ending in HTTP
                throw new Http400Exception();
            }
            s.append(n);
        }
        return s.toString();
    }

    private String read() throws IOException
    {
        int b = readByte();
        if (b == -1)
            return "";
        return new String(new byte[]{ (byte) b }, "US-ASCII");
    }

    // Return an int rather than a byte so that -1 (end of stream) can be
    // distinguished from the byte value 0xFF.
    private int readByte() throws IOException
    {
        int b = stream.read();
        if (b != -1)
            bytesRead++;
        return b;
    }
}
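For reference, here is a sketch of how the class might replace the buffered parsing from the question. The header-size check and the Content-Length handling are illustrative only; Http400Exception is the same exception used above.
InputStream stream = socket.getInputStream();
SafeHttpHeaderReader headerReader = new SafeHttpHeaderReader(stream);

String requestline = headerReader.readLine();
HashMap<String, String> headers = new HashMap<String, String>();
for (;;)
{
    String line = headerReader.readLine();
    if (line.equals(""))
        break;
    String[] header = line.split(":", 2);
    if (header.length == 2)
        headers.put(header[0].trim(), header[1].trim());
}
if (headerReader.hasReachedMax())
    throw new Http400Exception(); // header section exceeded MAX_READ

// The socket stream was only ever advanced one byte at a time, so it now
// points at the first byte of the body (if any); read Content-Length
// bytes from `stream` here.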

Related

Android HttpUrlConnection InputStream reading error (JSONException: Unterminated array)

I am using an HttpURLConnection to GET a very large JSON array from the web. I am reading the data 500 bytes at a time, like so:
public String getJSON(String myurl) throws IOException {
    URL url = new URL(myurl);
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    try {
        InputStream in = new BufferedInputStream(urlConnection.getInputStream());
        String result = readIt(in, 500);
        //Log.d(TAG, result);
        return result;
    } finally {
        urlConnection.disconnect();
    }
}

public String readIt(InputStream stream, int len) throws IOException, UnsupportedEncodingException {
    StringBuilder result = new StringBuilder();
    InputStreamReader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len];
    while (reader.read(buffer) != -1) {
        System.out.println("!##: " + new String(buffer));
        result.append(new String(buffer));
        buffer = new char[len];
    }
    System.out.println(result.length());
    return result.toString();
}
This works fine on some phones, but not on newer ones. On newer phones I noticed that the resulting JSON string started to contain garbage characters once it reached character 2048.
Some of my garbage return data:
ST AUGUSTINE, FL","2012050��������������������������������������
And the full error is:
Error: org.json.JSONException: Unterminated array at character 40549 of {"COLUMNS":["IMAGELI
You are probably appending the whole buffer to your string, stale characters included. You should count the number of chars you actually get when reading and append exactly that many, but no more:
int count;
String str = new String(); // or use a StringBuilder if you prefer
char[] buffer = new char[len];
while ((count = reader.read(buffer, 0, len)) > 0) {
    str += new String(buffer, 0, count);
}
Avoid recreating your buffer each time! You allocate a new one on every loop iteration; reuse it instead, since its contents have already been flushed into str.
Be careful when debugging: logcat cuts off strings that are too long. Your str itself should be fine and should no longer contain any garbage data.
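Putting both points together, a corrected version of the asker's readIt might look like this (same signature as in the question; the only changes are the read-count handling and the buffer reuse):
public String readIt(InputStream stream, int len) throws IOException {
    StringBuilder result = new StringBuilder();
    InputStreamReader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len]; // allocated once and reused
    int count;
    while ((count = reader.read(buffer)) != -1) {
        // append only the characters actually read, never the stale tail
        result.append(buffer, 0, count);
    }
    return result.toString();
}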

URLConnection doesn't read whole page

In my app I need to download a web page, which I do like this:
URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000000);    // read timeout, in milliseconds
conn.setConnectTimeout(5000000); // connect timeout, in milliseconds
conn.setRequestMethod("GET");
conn.setDoInput(true);
conn.connect();
int response = conn.getResponseCode();
is = conn.getInputStream();
String s = readIt(is);
System.out.println("got: " + s);
My readIt function is:
public String readIt(InputStream stream) throws IOException {
    int len = 10000;
    Reader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len];
    reader.read(buffer);
    return new String(buffer);
}
The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", the output is cut off partway through the page.
How can I download the whole page?
Update
The second answer here, Read/convert an InputStream to a String, solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:
static String convertStreamToString(java.io.InputStream is) {
    java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
    return s.hasNext() ? s.next() : "";
}
(The \\A delimiter matches the beginning of the input, so the Scanner yields the entire stream as a single token.)
There are a number of mistakes in your code:
You are reading into a character buffer with a fixed size.
You are ignoring the result of the read(char[]) method. It returns the number of characters actually read ... and you need to use that.
You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character ... or zero to indicate that you have reached the end of the stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.
When you create the String from the char[] you are assuming that every position in the character array contains a character from your stream.
There are multiple ways to do it correctly, and this is one way:
public String readIt(InputStream stream) throws IOException {
    Reader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[4096];
    StringBuilder builder = new StringBuilder();
    int len;
    while ((len = reader.read(buffer)) > 0) {
        builder.append(buffer, 0, len);
    }
    return builder.toString();
}
Another way to do it is to look for an existing third-party library with a readFully(Reader) method.
You need to read in a loop till there are no more bytes left in the InputStream.
while (-1 != (len = in.read(buffer))) { //do stuff here}
You are reading at most 10000 characters from the input stream, and only with a single read call.
Use a BufferedReader to make your life easier.
public String readIt(InputStream stream) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    StringBuilder out = new StringBuilder();
    String newLine = System.getProperty("line.separator");
    String line;
    while ((line = reader.readLine()) != null) {
        out.append(line);
        out.append(newLine);
    }
    return out.toString();
}

can't work with BufferedInputStream and BufferedReader together

I'm trying to read the first line from a socket stream with a BufferedReader wrapped around a BufferedInputStream. The first line (1) reads correctly (with the BufferedReader, _bin.readLine()); it holds the size of some content (2), and that content in turn holds the size of another content (3).
(2) reads correctly too (with _in.read(byte[] b)).
(3) won't read; it seems there's more content left than the size read in (2).
I think the problem is that I'm trying to read using the BufferedReader and then the BufferedInputStream... can anyone help me?
public HashMap<String, byte[]> readHead() throws IOException {
    JSONObject json;
    try {
        HashMap<String, byte[]> map = new HashMap<>();
        System.out.println("reading header");
        int headersize = Integer.parseInt(_bin.readLine());
        byte[] parsable = new byte[headersize];
        _in.read(parsable);
        json = new JSONObject(new String(parsable));
        map.put("id", lTob(json.getLong(SagConstants.KEY_ID)));
        map.put("length", iTob(json.getInt(SagConstants.KEY_SIZE)));
        map.put("type", new byte[]{(byte) json.getInt(SagConstants.KEY_TYPE)});
        return map;
    } catch (SocketException | JSONException e) {
        _exception = e.getMessage();
        _error_code = SagConstants.ERROR_OCCOURED_EXCEPTION;
        return null;
    }
}
Sorry for the bad English and the rough explanation; I tried to describe my problem, and I hope you understand it.
The file format is:
size1
{json, length is the given size1, contains size2}
{second json, length is size2}
_in is a BufferedInputStream;
_bin is a BufferedReader(_in).
With _bin, I read the first line (size1) and convert it to an integer.
With _in, I read the next chunk of data, which contains size2 and whose length is size1.
Then I try to read the last chunk, whose size is size2, something like this:
byte[] b = new byte[secondSize];
_in.read(b);
And nothing happens here; the program just blocks...
can't work with BufferedInputStream and BufferedReader together
That's correct. If you use any buffered stream or reader on a socket [or indeed any data source], you can't use any other stream or reader with it whatsoever. Data will get 'lost', that is to say read-ahead, in the buffer of the buffered stream or reader, and will not be available to the other stream/reader.
You need to rethink your design.
You create one BufferedReader _bin and one BufferedInputStream _in and read the file with both of them, but their positions differ: the BufferedReader reads ahead into its own buffer, so the second read does not start where you expect, because you are using two objects to read the same stream. You should read size1 with _in too.
int headersize = Integer.parseInt(readLine(_in, ""));
byte[] parsable = new byte[headersize];
_in.read(parsable);
Use the readLine below to read everything through the BufferedInputStream:
private final static byte NL = 10; // new line
private final static byte EOL = 0; // end of line

private static String readLine(BufferedInputStream reader,
        String accumulator) throws IOException {
    byte[] container = new byte[1];
    if (reader.read(container) == -1) { // end of stream
        return accumulator;
    }
    byte byteRead = container[0];
    if (byteRead == NL || byteRead == EOL) {
        return accumulator;
    }
    accumulator = accumulator + new String(container, 0, 1);
    return readLine(reader, accumulator);
}

Changing encoding in java

I am writing a function that should detect the charset in use and then convert the text to UTF-8. I am using juniversalchardet, which is a Java port of Mozilla's universalchardet.
This is my code:
private List<List<String>> setProperEncoding(List<List<String>> input) {
    try {
        // Detect the charset in use
        UniversalDetector detector = new UniversalDetector(null);
        int position = 0;
        while ((position < input.size()) && (!detector.isDone())) {
            String row = "";
            for (String cell : input.get(position)) {
                row += cell;
            }
            byte[] bytes = row.getBytes();
            detector.handleData(bytes, 0, bytes.length);
            position++;
        }
        detector.dataEnd();
        Charset charset = Charset.forName(detector.getDetectedCharset());
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println("Detected charset: " + charset);

        // Rewrite the input using the proper charset
        List<List<String>> newLines = new ArrayList<List<String>>();
        for (List<String> row : input) {
            List<String> newRow = new ArrayList<String>();
            for (String cell : row) {
                //newRow.add(new String(cell.getBytes(charset)));
                ByteBuffer bb = ByteBuffer.wrap(cell.getBytes(charset));
                CharBuffer cb = charset.decode(bb);
                bb = utf8.encode(cb);
                newRow.add(new String(bb.array()));
            }
            newLines.add(newRow);
        }
        return newLines;
    } catch (Exception e) {
        e.printStackTrace();
        return input;
    }
}
My problem is that when I read a file with characters from, for example, the Polish alphabet, letters like ł, ą, ć and similar are replaced by ? and other strange things. What am I doing wrong?
EDIT:
For compilation I am using Eclipse.
The method parameter is the result of reading a MultipartFile: just a FileInputStream to get every line, then splitting every line by some separator (it is prepared for xls, xlsx and csv files). Nothing special there.
First of all, you have your data somewhere in a binary format. For the sake of simplicity, I suppose it comes from an InputStream.
You want to write the output as a UTF-8 String; I suppose it can go to an OutputStream.
I would recommend creating an AutoDetectInputStream:
public class AutoDetectInputStream extends InputStream {
    private InputStream is;
    private byte[] sampleData = new byte[4096];
    private int sampleLen;
    private int sampleIndex = 0;

    public AutoDetectInputStream(InputStream is) throws IOException {
        this.is = is;
        // pre-read the sample data
        sampleLen = is.read(sampleData);
    }

    public Charset getCharset() {
        // detect the charset from the sample
        UniversalDetector detector = new UniversalDetector(null);
        detector.handleData(sampleData, 0, sampleLen);
        detector.dataEnd();
        return Charset.forName(detector.getDetectedCharset());
    }

    @Override
    public int read() throws IOException {
        // replay the sample first, then continue with the live stream
        if (sampleIndex < sampleLen) {
            return sampleData[sampleIndex++] & 0xFF;
        }
        return is.read();
    }
}
The second task is quite simple, because Java strings are already decoded characters internally, so to produce UTF-8 output you just use an OutputStreamWriter with the UTF-8 charset. So, here's your code:
// open the input with the detector stream;
// we use a BufferedReader so we can read lines
InputStream is = new FileInputStream("in.txt");
AutoDetectInputStream detector = new AutoDetectInputStream(is);
Charset charset = detector.getCharset();

// here we can use the charset to decode the bytes into characters
BufferedReader rdr = new BufferedReader(new InputStreamReader(detector, charset));

// open the output to write to
OutputStream os = new FileOutputStream("out.txt");
Writer utf8Writer = new OutputStreamWriter(os, Charset.forName("UTF-8"));

// copy the whole file
String line;
while ((line = rdr.readLine()) != null) {
    utf8Writer.append(line);
}

// close the streams
rdr.close();
utf8Writer.flush();
utf8Writer.close();
So, finally, you have your whole txt file transcoded to UTF-8.
Note that the sample buffer size should be big enough to give the UniversalDetector something to work with.

InputStreamReader buffering issue

I am reading data from a file that has, unfortunately, two types of character encoding.
There is a header and a body. The header is always in ASCII and defines the character set that the body is encoded in.
The header is not fixed length and must be run through a parser to determine its content/length.
The file may also be quite large, so I need to avoid bringing the entire content into memory.
So I started off with a single InputStream. I wrap it initially with an InputStreamReader with ASCII and decode the header and extract the character set for the body. All good.
Then I create a new InputStreamReader with the correct character set, drop it over the same InputStream and start trying to read the body.
Unfortunately it appears (and the Javadoc confirms this) that InputStreamReader may choose to read ahead for efficiency purposes, so reading the header chews up some or all of the body.
Does anyone have any suggestions for working around this issue? Would creating a CharsetDecoder manually and feeding in one byte at a time be a good idea (possibly wrapped in a custom Reader implementation)?
Thanks in advance.
EDIT: My final solution was to write an InputStreamReader that does no buffering, to ensure I can parse the header without chewing up part of the body. Although this is not terribly efficient, I wrap the raw InputStream in a BufferedInputStream, so it isn't an issue.
// An InputStreamReader that only consumes as many bytes as necessary.
// It does not do any read-ahead.
public class InputStreamReaderUnbuffered extends Reader
{
    private final CharsetDecoder charsetDecoder;
    private final InputStream inputStream;
    private final ByteBuffer byteBuffer = ByteBuffer.allocate( 1 );

    public InputStreamReaderUnbuffered( InputStream inputStream, Charset charset )
    {
        this.inputStream = inputStream;
        charsetDecoder = charset.newDecoder();
    }

    @Override
    public int read() throws IOException
    {
        boolean middleOfReading = false;

        while ( true )
        {
            int b = inputStream.read();
            if ( b == -1 )
            {
                if ( middleOfReading )
                    throw new IOException( "Unexpected end of stream, byte truncated" );
                return -1;
            }

            byteBuffer.clear();
            byteBuffer.put( (byte)b );
            byteBuffer.flip();

            CharBuffer charBuffer = charsetDecoder.decode( byteBuffer );

            // although this is theoretically possible, it would violate the
            // unbuffered nature of this class, so we throw an exception
            if ( charBuffer.length() > 1 )
                throw new IOException( "Decoded multiple characters from one byte!" );

            if ( charBuffer.length() == 1 )
                return charBuffer.get();

            middleOfReading = true;
        }
    }

    @Override
    public int read( char[] cbuf, int off, int len ) throws IOException
    {
        for ( int i = 0; i < len; i++ )
        {
            int ch = read();
            if ( ch == -1 )
                return i == 0 ? -1 : i;
            cbuf[ off + i ] = (char)ch;
        }
        return len;
    }

    @Override
    public void close() throws IOException
    {
        inputStream.close();
    }
}
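As a usage sketch (parseHeader is a hypothetical stand-in for the actual header parser; the BufferedInputStream wrapping is the one mentioned in the edit above):
InputStream in = new BufferedInputStream(new FileInputStream("input.dat"));

// parse the ASCII header without any read-ahead stealing body bytes
Reader headerReader = new InputStreamReaderUnbuffered(in, Charset.forName("US-ASCII"));
Charset bodyCharset = parseHeader(headerReader); // hypothetical parser

// `in` is now positioned exactly at the start of the body, so an ordinary
// (buffering) InputStreamReader is safe from here on
Reader bodyReader = new InputStreamReader(in, bodyCharset);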
Why don't you use 2 InputStreams? One for reading the header and another for the body.
The second InputStream should skip the header bytes.
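A sketch of that idea, assuming the source is a file that can be opened twice and that the header parser reports the header's byte length (headerLength and bodyCharset below stand in for its results):
long headerLength = 0;                          // set by the header parser (elided)
Charset bodyCharset = Charset.defaultCharset(); // replaced by the charset from the header

try (InputStream headerIn = new FileInputStream("in.dat")) {
    // ... parse the ASCII header, filling headerLength and bodyCharset ...
}

try (InputStream bodyIn = new FileInputStream("in.dat")) {
    long remaining = headerLength;
    while (remaining > 0) {
        long skipped = bodyIn.skip(remaining);
        if (skipped <= 0)
            break; // skip() may skip fewer bytes than requested
        remaining -= skipped;
    }
    Reader body = new InputStreamReader(bodyIn, bodyCharset);
    // ... read the body ...
}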
Here is the pseudo code:
1. Use an InputStream, but do not wrap a Reader around it.
2. Read the bytes containing the header and store them into a ByteArrayOutputStream.
3. Create a ByteArrayInputStream from the ByteArrayOutputStream and decode the header, this time wrapping the ByteArrayInputStream in a Reader with the ASCII charset.
4. Compute the length of the non-ASCII input, and read that number of bytes into another ByteArrayOutputStream.
5. Create another ByteArrayInputStream from the second ByteArrayOutputStream and wrap it with a Reader using the charset from the header.
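A rough Java rendering of those steps; isEndOfHeader, charsetFromHeader and bodyLengthFromHeader are hypothetical stand-ins for the real header parser:
InputStream in = new FileInputStream("in.dat");

// steps 1-2: read the raw header bytes, no Reader involved yet
ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
int b;
while ((b = in.read()) != -1) {
    headerBytes.write(b);
    if (isEndOfHeader(headerBytes.toByteArray())) // hypothetical
        break;
}

// step 3: decode the buffered header as ASCII
BufferedReader headerReader = new BufferedReader(new InputStreamReader(
        new ByteArrayInputStream(headerBytes.toByteArray()), "US-ASCII"));
Charset bodyCharset = charsetFromHeader(headerReader);  // hypothetical
int bodyLength = bodyLengthFromHeader(headerReader);    // hypothetical

// step 4: read exactly the body length computed from the header
byte[] body = new byte[bodyLength];
new DataInputStream(in).readFully(body);

// step 5: wrap the buffered body with a Reader using the header's charset
BufferedReader bodyReader = new BufferedReader(new InputStreamReader(
        new ByteArrayInputStream(body), bodyCharset));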
I suggest rereading the stream from the start with a new InputStreamReader. Perhaps assume that InputStream.mark is supported.
My first thought is to close the stream and reopen it, using InputStream#skip to skip past the header before giving the stream to the new InputStreamReader.
If you really, really don't want to reopen the file, you could use file descriptors to get more than one stream to the file, although you may have to use channels to have multiple positions within the file (since you can't assume you can reset the position with reset, it may not be supported).
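A minimal sketch of the channel route, again assuming a file source and a header parser that reports headerLength (the body charset comes from the header as before):
long headerLength = 0;                          // set while parsing the header
Charset bodyCharset = Charset.defaultCharset(); // replaced by the parsed charset

try (FileChannel ch = FileChannel.open(Paths.get("in.dat"), StandardOpenOption.READ)) {
    InputStream headerIn = Channels.newInputStream(ch);
    // ... parse the ASCII header from headerIn, noting headerLength ...

    ch.position(headerLength); // reposition to the first body byte
    Reader body = new InputStreamReader(Channels.newInputStream(ch), bodyCharset);
    // ... read the body ...
}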
It's even easier:
As you said, your header is always in ASCII. So read the header directly from the InputStream, and when you're done with it, create the Reader with the correct encoding and read on from there:
private Reader reader;
private InputStream stream;

public void read() throws IOException {
    int c = 0;
    // headerFullyRead and encoding are placeholders for whatever your
    // header parsing produces
    while ((c = stream.read()) != -1) {
        // ... parse the header byte by byte, extracting the encoding ...
        if (headerFullyRead) {
            reader = new InputStreamReader(stream, encoding);
            break;
        }
    }
    while ((c = reader.read()) != -1) {
        // handle the rest of the file
    }
}
If you wrap the InputStream and limit all reads to just 1 byte at a time, it seems to disable the buffering inside of InputStreamReader.
This way we don't have to rewrite the InputStreamReader logic.
public class OneByteReadInputStream extends InputStream
{
    private final InputStream inputStream;

    public OneByteReadInputStream(InputStream inputStream)
    {
        this.inputStream = inputStream;
    }

    @Override
    public int read() throws IOException
    {
        return inputStream.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException
    {
        // cap every bulk read at a single byte to defeat read-ahead
        return super.read(b, off, 1);
    }
}
To construct:
new InputStreamReader(new OneByteReadInputStream(inputStream));
