I am using an HttpURLConnection to GET a very large JSON array from the web. I am reading the data 500 characters at a time, like so:
public String getJSON(String myurl) throws IOException {
URL url = new URL(myurl);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
try {
InputStream in = new BufferedInputStream(urlConnection.getInputStream());
String result = readIt(in, 500) ;
return result ;
//Log.d(TAG, result);
}
finally {
urlConnection.disconnect();
}
}
public String readIt(InputStream stream, int len) throws IOException, UnsupportedEncodingException {
StringBuilder result = new StringBuilder();
InputStreamReader reader = null;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
while(reader.read(buffer) != -1)
{
System.out.println("!##: " + new String(buffer)) ;
result.append(new String(buffer)) ;
buffer = new char[len];
}
System.out.println(result.length()) ;
return result.toString();
}
This works fine on some phones, but not on newer phones. On newer phones I realized that the result JSON string was starting to contain garbage characters once it got to character 2048.
Some of my garbage return data:
ST AUGUSTINE, FL","2012050��������������������������������������
And the full error is:
Error: org.json.JSONException: Unterminated array at character 40549 of {"COLUMNS":["IMAGELI
You are probably appending the wrong part of the buffer to your string. read() returns the number of chars it actually stored in the buffer; append exactly that many and no more:
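StringBuilder str = new StringBuilder();
char[] buffer = new char[len]; // allocate once and reuse
int count;
// read() returns the number of chars stored in buffer, or -1 at end of stream
while ((count = reader.read(buffer, 0, len)) > 0)
{ str.append(buffer, 0, count); } // append only the chars that were read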
Avoid recreating your buffer on each iteration! You are allocating a new one on every pass through the loop. Reuse the same buffer, since its contents have already been copied into str.
Be careful when debugging: logcat cannot print a very long string (it is cut off past a certain length). Your str itself should be fine and should no longer contain any garbage data.
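To make the point about the count concrete, here is a minimal, self-contained sketch. The names `ReadCountDemo` and `readAll` are made up for illustration, the sample JSON is invented, and a `ByteArrayInputStream` stands in for the HTTP response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReadCountDemo {

    // Reads the whole stream as UTF-8, appending only the chars each read() delivered.
    static String readAll(InputStream stream, int bufferSize) throws IOException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[bufferSize]; // allocated once, reused on every pass
        int count;
        while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
            // Positions past 'count' hold stale or zero chars, so they are never appended.
            result.append(buffer, 0, count);
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Invented sample data; any UTF-8 payload behaves the same way.
        byte[] json = "[\"a\",\"b\"]".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(json), 500);
        System.out.println(text);
    }
}
```

With the count in place, the buffer size (500 here) only changes how many iterations the loop needs, not what ends up in the string.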
Related
In my app I need to download a web page. I do it like this:
URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000); // 5 seconds to download (the value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect
conn.setRequestMethod("GET");
conn.setDoInput(true);
conn.connect();
int response = conn.getResponseCode();
is = conn.getInputStream();
String s = readIt(is);
System.out.println("got: " + s);
My readIt function is:
public String readIt(InputStream stream) throws IOException {
int len = 10000;
Reader reader;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", then the output is cut off partway through the page.
How can I download the whole page?
Update
The second answer to the question "Read/convert an InputStream to a String" solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
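The trick works because `\\A` matches only at the beginning of the input, so the scanner's first token is the whole stream. Below is a hedged variant that also names the charset, since `new Scanner(is)` falls back to the platform default. The class and method names (`ScannerRead`, `toText`) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class ScannerRead {

    // Returns the entire stream as one string, decoded as UTF-8.
    static String toText(InputStream is) {
        // try-with-resources closes the Scanner, which also closes the underlying stream.
        try (Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(toText(new ByteArrayInputStream(page)));
    }
}
```

One caveat: `Scanner` swallows I/O errors instead of throwing them (they are available through `ioException()`), so a broken connection can look like a short page.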
There are a number of mistakes in your code:
You are reading into a character buffer with a fixed size.
You are ignoring the result of the read(char[]) method. It returns the number of characters actually read, and you need to use that.
You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character, or -1 to indicate that you have reached the end of the stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.
When you create the String from the char[], you are assuming that every position in the character array contains a character from your stream.
There are multiple ways to do it correctly, and this is one way:
public String readIt(InputStream stream) throws IOException {
Reader reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[4096];
StringBuilder builder = new StringBuilder();
int len;
while ((len = reader.read(buffer)) > 0) {
builder.append(buffer, 0, len);
}
return builder.toString();
}
Another way to do it is to find an existing third-party library that provides a readFully(Reader)-style method.
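If adding a dependency is not an option, newer JDKs have a built-in shortcut. The sketch below uses `InputStream.readAllBytes()`, which exists since Java 9; as far as I know it is only available on recent Android API levels, so check before relying on it there. This is a different technique from a library `readFully` method, and `ReadAllBytesDemo` is just an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadAllBytesDemo {

    // Drains the stream with the JDK's own loop, then decodes the bytes once as UTF-8.
    static String readIt(InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<p>hello</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readIt(new ByteArrayInputStream(page)));
    }
}
```

Like any read-everything approach, this holds the whole response in memory, so it suits pages and API responses rather than arbitrarily large downloads.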
You need to read in a loop until there are no more bytes left in the InputStream.
while (-1 != (len = in.read(buffer))) {
// do stuff here
}
Your code makes a single read call, so it gets at most 10000 characters from the input stream.
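A possible way to fill in the "do stuff here" part, assuming the goal is a UTF-8 string: collect the raw bytes first and decode once at the end, so a multi-byte character that happens to be split across two reads is not damaged. `ByteLoopRead` and `readFully` are hypothetical names for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ByteLoopRead {

    // Loops until read() reports end of stream (-1), keeping only the bytes each call returned.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        while (-1 != (len = in.read(buffer))) {
            collected.write(buffer, 0, len);
        }
        // Decode once, after all bytes have arrived.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "more than one read's worth of data, in principle"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readFully(new ByteArrayInputStream(page)));
    }
}
```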
Use a BufferedReader to make your life easier.
public String readIt(InputStream stream) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
StringBuilder out = new StringBuilder();
String newLine = System.getProperty("line.separator");
String line;
while ((line = reader.readLine()) != null) {
out.append(line);
out.append(newLine);
}
return out.toString();
}
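Two things to keep in mind with the readLine() version: it decodes with the platform default charset, and it replaces every line terminator with the local one (adding one at the end). A hedged variant for Java 8 or later that names the charset and joins lines with `\n` could look like this; `LineRead` is an illustrative name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class LineRead {

    // Reads all lines as UTF-8 and joins them with "\n" (no trailing separator).
    static String readIt(InputStream stream) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            // lines() reports read failures as UncheckedIOException rather than IOException.
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "first\r\nsecond\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readIt(new ByteArrayInputStream(page)));
    }
}
```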
Aim: to read a URL that returns information as JSON.
Question: I found the code below for reading a URL. I understand what the code does, but I have no idea why the char array has size 1024 rather than 2048 or some other value. How do I decide what array size is good when reading from a URL?
private static String readUrl(String urlString) throws Exception {
BufferedReader reader = null;
try {
URL url = new URL(urlString);
reader = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuffer buffer = new StringBuffer();
int read;
char[] chars = new char[1024]; // why 1024?
while ((read = reader.read(chars)) != -1)
buffer.append(chars, 0, read);
return buffer.toString();
} finally {
if (reader != null)
reader.close();
}
}
As the BufferedReader already has its own internal buffer (8192 characters by default in OpenJDK, though the size is implementation-dependent), and as the socket already has a considerably larger receive buffer, it really doesn't make much difference what value you choose. The returns on buffering diminish geometrically with size.
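A quick way to convince yourself that the array size changes only the number of loop iterations, not the result, is to run the same loop with several sizes. This is a sketch with made-up names (`BufferSizeDemo`, `readWith`) and a `StringReader` standing in for the URL stream; it checks correctness only and does not measure speed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class BufferSizeDemo {

    // Same loop as in the question, with the array size as a parameter.
    static String readWith(Reader source, int arraySize) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            StringBuilder out = new StringBuilder();
            char[] chars = new char[arraySize];
            int read;
            while ((read = reader.read(chars)) != -1) {
                out.append(chars, 0, read);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a few tens of kilobytes of invented JSON-like text.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append("{\"n\":").append(i).append('}');
        }
        String json = sb.toString();
        for (int size : new int[] {1, 1024, 2048, 8192}) {
            boolean same = readWith(new StringReader(json), size).equals(json);
            System.out.println("array size " + size + ": identical result = " + same);
        }
    }
}
```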
I'm building my own HTTP web server in Java and would like to implement some security measures while reading the HTTP request headers from a socket InputStream.
I'm trying to prevent scenarios where someone sending an extremely long single-line header or an absurd number of header lines could exhaust memory or cause other unwanted effects.
I'm currently trying to do this by reading 8 KB of data into a byte array and parsing all the headers within the buffer I just created.
But as far as I know, this means the InputStream's current offset is always already 8 KB from its starting point, even if there were only 100 bytes of headers.
The code I have so far:
InputStream stream = socket.getInputStream();
HashMap<String, String> headers = new HashMap<String, String>();
byte [] buffer = new byte[8*1024];
stream.read( buffer , 0 , 8*1024);
ByteArrayInputStream bytestream = new ByteArrayInputStream( buffer );
InputStreamReader streamReader = new InputStreamReader( bytestream );
BufferedReader reader = new BufferedReader( streamReader );
String requestline = reader.readLine();
for ( ;; )
{
String line = reader.readLine();
if ( line.equals( "" ) )
break;
String[] header = line.split( ":" , 2 );
headers.put( header[0] , header[1] ); //TODO: check for bad header
}
//if contentlength > 0
// read body
So my question is: how can I be sure that I'm reading the body data (if any) starting from the correct position in the InputStream?
I don't use streams a lot, so I don't really have a feel for them, and Google hasn't been helpful so far.
I figured out an answer myself (it was easier than I thought it would be).
My guess is that it is not buffered (I have no idea when something is buffered anyway), but it works.
public class SafeHttpHeaderReader
{
public static final int MAX_READ = 8*1024;
private InputStream stream;
private int bytesRead;
public SafeHttpHeaderReader(InputStream stream)
{
this.stream = stream;
bytesRead = 0;
}
public boolean hasReachedMax()
{
return bytesRead >= MAX_READ;
}
public String readLine() throws IOException, Http400Exception
{
String s = "";
while(bytesRead < MAX_READ)
{
String n = read();
if(n.equals( "" ))
break;
if(n.equals( "\r" ))
{
if(read().equals( "\n" ))
break;
throw new Http400Exception();
}
s += n;
}
return s;
}
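private String read() throws IOException
{
int b = readByte();
if(b == -1) // end of stream
return "";
return new String( new byte[]{(byte) b} , "ASCII");
}
private int readByte() throws IOException
{
// Keep the int result: casting to byte first would make the byte 0xFF look like end of stream (-1).
int b = stream.read();
if(b != -1)
bytesRead ++;
return b;
}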
}
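Because the headers are consumed one byte at a time, the stream is positioned exactly at the first body byte once the blank line has been read. The following self-contained sketch shows the whole flow under some assumptions: `BoundedRequestReader` and its methods are hypothetical names, plain `IOException`s replace `Http400Exception`, and line parsing is simplified (it drops every CR and ends a line at LF rather than rejecting a bare CR).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class BoundedRequestReader {
    static final int MAX_HEADER_BYTES = 8 * 1024;

    private final InputStream stream;
    private int headerBytesRead = 0;

    BoundedRequestReader(InputStream stream) {
        this.stream = stream;
    }

    // Reads one line byte by byte, so nothing beyond the line terminator is consumed.
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (true) {
            if (headerBytesRead >= MAX_HEADER_BYTES) {
                throw new IOException("header section exceeds " + MAX_HEADER_BYTES + " bytes");
            }
            int b = stream.read(); // int, so the byte 0xFF is not confused with -1
            if (b == -1) {
                throw new EOFException("stream ended inside the header section");
            }
            headerBytesRead++;
            if (b == '\n') {
                break;
            }
            if (b != '\r') { // simplification: CR is dropped instead of validated
                line.write(b);
            }
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Reads header lines up to the blank line; names are lower-cased for lookup.
    Map<String, String> readHeaders() throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine()).isEmpty()) {
            String[] parts = line.split(":", 2);
            if (parts.length != 2) {
                throw new IOException("malformed header line");
            }
            headers.put(parts[0].trim().toLowerCase(Locale.ROOT), parts[1].trim());
        }
        return headers;
    }

    // Reads exactly contentLength body bytes, looping because read() may return fewer.
    byte[] readBody(int contentLength) throws IOException {
        byte[] body = new byte[contentLength];
        int offset = 0;
        while (offset < contentLength) {
            int n = stream.read(body, offset, contentLength - offset);
            if (n == -1) {
                throw new EOFException("body shorter than Content-Length");
            }
            offset += n;
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "POST /echo HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.US_ASCII);
        BoundedRequestReader in = new BoundedRequestReader(new ByteArrayInputStream(raw));
        String requestLine = in.readLine();
        Map<String, String> headers = in.readHeaders();
        int length = Integer.parseInt(headers.getOrDefault("content-length", "0"));
        byte[] body = in.readBody(length);
        System.out.println(requestLine + " -> " + new String(body, StandardCharsets.US_ASCII));
    }
}
```

A real server would probably also cap the accepted Content-Length before allocating the body array, for the same reason the header section is capped.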
I am working on learning Java and am going through the examples on the Android website. I am getting remote contents of an XML file. I am able to get the contents of the file, but then I need to convert the InputStream into a String.
public String readIt(InputStream stream, int len) throws IOException, UnsupportedEncodingException {
InputStreamReader reader = null;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
The issue I am having is that I don't want the string to be limited by the len variable, but I don't know Java well enough to know how to change this.
How can I create the char array without a fixed length?
Generally speaking it's bad practice to not have a max length on input strings like that due to the possibility of running out of available memory to store it.
That said, you could ignore the len variable and just loop on reader.read(...) and append the buffer to your string until you've read the entire InputStream like so:
public String readIt(InputStream stream, int len) throws IOException, UnsupportedEncodingException {
StringBuilder result = new StringBuilder();
InputStreamReader reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
int count;
// read() returns how many chars it stored in buffer, or -1 at end of stream
while((count = reader.read(buffer)) != -1)
{
// append only the chars actually read; the same buffer is reused on each pass
result.append(buffer, 0, count);
}
return result.toString();
}
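If you do want to keep an upper bound, as the first paragraph of this answer suggests, one option is to read in a loop but give up once a limit is exceeded. This is a sketch under assumptions: `CappedRead`, the `maxChars` parameter, and the choice to throw an `IOException` are all illustrative, not part of the Android sample.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class CappedRead {

    // Reads the whole stream as UTF-8, but fails once more than maxChars would be stored.
    static String readIt(InputStream stream, int maxChars) throws IOException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            if (result.length() + count > maxChars) {
                throw new IOException("response is longer than " + maxChars + " chars");
            }
            result.append(buffer, 0, count);
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<root><item/></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readIt(new ByteArrayInputStream(xml), 1_000_000));
    }
}
```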
I am reading a request body from an input stream. Even though I explicitly specify the charset when decoding, I still get many \u0000 (null) characters at the end of the string.
InputStream is = exchange.getRequestBody();
byte[] header = new byte[100];
is.read(header);
String s = new String(header, "UTF-8");
How can I avoid this with Standard Java Library? I cannot use third party libraries.
is.read(header) returns the number of bytes that were actually read. Change your code to:
byte[] header = new byte[100];
int n = is.read(header);
String s = new String(header, 0, n, "UTF-8");
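Two edge cases are worth handling on top of this: `read` returns -1 on an empty stream (which would make the `String` constructor throw), and a single call may deliver fewer bytes than are available in total. A minimal sketch using only the standard library, with hypothetical names `HeaderRead` and `readUpTo`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class HeaderRead {

    // Reads up to max bytes, looping because one read() call may return fewer than requested.
    static String readUpTo(InputStream is, int max) throws IOException {
        byte[] header = new byte[max];
        int total = 0;
        while (total < max) {
            int n = is.read(header, total, max - total);
            if (n == -1) {
                break; // end of stream before the array was full
            }
            total += n;
        }
        // Decode only the bytes that were filled in, so no \u0000 padding appears.
        return new String(header, 0, total, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "name=value".getBytes(StandardCharsets.UTF_8);
        String s = readUpTo(new ByteArrayInputStream(body), 100);
        System.out.println(s.length() + " chars: " + s);
    }
}
```

Note that cutting at a fixed byte count can still split a multi-byte UTF-8 character at the very end of the array.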
Try using a BufferedReader with an InputStreamReader, for example:
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
BufferedReader br = new BufferedReader(isr);
String line = br.readLine();
Use a buffered reader to read it line by line:
InputStream is = exchange.getRequestBody();
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line;
while((line = reader.readLine()) != null)
{
// do work
}
If you look at the value of line.toCharArray(), you will see that the null (aka '\u0000') chars show up. A round-about solution to getting rid of those is to add only the chars that you want to another array by using an if statement to exclude the '\u0000' chars. So:
public String removeNullChars(String line) {
String result = "";
// Convert to char array.
char[] chars = line.toCharArray();
// Add only chars that are not equal to '\u0000' to result
for (int i = 0; i < chars.length; i++) {
if (chars[i] != '\u0000') {
result += chars[i];
}
}
return result;
}
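If you do end up stripping the characters after the fact, the standard library can do it in one call and avoids the repeated string concatenation in the loop. A short sketch using `String.replace`; the class name `NullCharStrip` is just for illustration:

```java
final class NullCharStrip {

    // Removes every '\u0000' char; all other chars are kept in their original order.
    static String removeNullChars(String line) {
        return line.replace("\u0000", "");
    }

    public static void main(String[] args) {
        String padded = "header" + new String(new char[94]); // 94 trailing null chars
        String clean = removeNullChars(padded);
        System.out.println(clean + " (" + clean.length() + " chars)");
    }
}
```

This treats the symptom only; decoding just the bytes that were actually read avoids creating the padding in the first place.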