My app is parsing a large HTTP response: over 6 MB of JSON, but not in a standard schema.
final char[] buffer = new char[0x10000];
StringBuilder out = new StringBuilder();
Reader in = new InputStreamReader(is, "UTF-8");
int read;
System.gc();
do {
    read = in.read(buffer, 0, buffer.length);
    if (read > 0) {
        out.append(buffer, 0, read);   // fails here with OutOfMemoryError...
    }
} while (read >= 0);
in.close();
is.close();
in = null;
is = null;
System.gc();
return out.toString();                 // ...or here, when another copy is made
It doesn't matter whether the source is a BufferedReader over a file or an InputStream: the StringBuilder simply cannot hold the entire object, and it fails at out.append(buffer, 0, read), or at out.toString(), where another copy may be made.
IOUtils.copy from the Apache Commons IO library does the same thing under the hood, so it also fails.
How can I read this large object in for further manipulation? Right now this method fails on Android 2.2 and 2.3 devices, and uses more memory than I want on newer devices.
Similar questions all have answers that involve appending to a StringBuilder, reading in lines, or incomplete solutions that are only hints, and those don't work.
You need to do one of two things:
Get multiple smaller JSON responses from the server and parse those. This might be preferable on a mobile device, as large chunks of data might not be transmitted reliably, which will cause the device to request the entire thing repeatedly.
Use a streaming JSON parser, such as Jackson, to process the data as it comes in.
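For illustration, a minimal sketch of the second option using Jackson 2.x's streaming API (the jackson-core artifact is assumed to be on the classpath, and the field name "name" is a hypothetical example, not something from the original question):

import java.io.IOException;
import java.io.InputStream;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

/** Streams tokens off the wire; never materializes the whole 6 MB document. */
public static void parseStreaming(InputStream is) throws IOException {
    JsonParser parser = new JsonFactory().createParser(is);
    try {
        JsonToken token;
        while ((token = parser.nextToken()) != null) {
            // Only the current token is held in memory. Pull out just the
            // values you need; "name" here is a hypothetical field.
            if (token == JsonToken.FIELD_NAME && "name".equals(parser.getCurrentName())) {
                parser.nextToken();               // advance to the field's value
                String value = parser.getText();  // consume it
                // ... use value ...
            }
        }
    } finally {
        parser.close();
    }
}

Because the parser pulls tokens on demand, peak memory stays proportional to the largest single value, not to the whole response.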
Related
I know this has been asked before, but since I haven't been able to find an answer with a definitive conclusion, or at least one that shows the pros and cons of the possible approaches, I have to ask:
When it comes to reading data from the Internet, a web service for instance, what is the correct or most efficient way to read it?
From all the books I have glanced over, I've found at least 4 ways to read data:
1) Reading a specific number of characters at a time.
In this case the data is read in chunks of 4096 characters:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(in, encoding));
char[] buffer = new char[4096];
StringBuilder sb = new StringBuilder();
int len1 = 0;
while ((len1 = reader.read(buffer)) > 0) {
    // NOTE: this appends the whole buffer, not just the len1 chars actually
    // read, which is what produces the trailing garbage described below;
    // it should be sb.append(buffer, 0, len1);
    sb.append(buffer);
}
return sb.toString();
2) Reading the data knowing the content length:
int length = ((HttpURLConnection) urlConnection).getContentLength();
InputStream inputStream = urlConnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
StringBuilder stringBuilder = new StringBuilder(length);
char[] buffer = new char[length];   // NOTE: the content length is in bytes, not chars
int charsRead;
while ((charsRead = bufferedReader.read(buffer)) != -1) {
    stringBuilder.append(buffer, 0, charsRead);
}
return stringBuilder.toString();
3) Reading the data line by line:
BufferedReader reader = new BufferedReader(new InputStreamReader(c.getInputStream()));
StringBuilder buf = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    buf.append(line);   // NOTE: readLine() strips the line terminators
}
return buf.toString();
4) Reading the data character by character:
InputStream in = mConnection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(
        in, encoding));
int ch;
StringBuilder sb = new StringBuilder();
while ((ch = reader.read()) != -1) {   // read() returns -1 at end of stream
    sb.append((char) ch);
}
return sb.toString().trim();
I have tried three of these four techniques; the only one I skipped was number 3 (reading the data line by line), and of the three I tried, only the fourth has given me good results.
The first method didn't work for me: when I read large amounts of data, it often cut the data off, giving me invalid JSON strings or strings with white space at the end.
As for the second approach, I wasn't able to use it because getContentLength() is not always reliable, and if the value is not set there's nothing we can do about it; that's my case.
I didn't try the third method because I wasn't sure about reading data "line" by "line". Does this apply to data that contains an array of JSON objects, or only to files that indeed contain lines?
The last technique being the only choice I was left with, I tried it and it worked, BUT I don't think that reading a large amount of data character by character is efficient at all.
So now I would really appreciate your opinions and ideas. What approach do you use when reading data from web services, and more importantly, why?
Thanks.
P.S. I know I could've easily used DefaultHttpClient, but the docs clearly discourage it:
For Android 2.3 (Gingerbread) and later, HttpURLConnection is the best
choice. Its simple API and small size makes it great fit for Android.
Transparent compression and response caching reduce network use,
improve speed and save battery.
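To make the quoted recommendation concrete, here is a minimal HttpURLConnection sketch (the urlString parameter and the UTF-8 charset are assumptions for illustration, not part of the original question):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public static String fetch(String urlString) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
    try {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int len;
        while ((len = reader.read(buf)) != -1) {
            sb.append(buf, 0, len);   // append only the chars actually read
        }
        return sb.toString();
    } finally {
        conn.disconnect();
    }
}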
I've tried all the methods that you have mentioned. One problem I faced was the reply not being read completely. After some research, the most efficient/fastest way I found was to go about it like this:
DefaultHttpClient client = new DefaultHttpClient();
HttpGet httpGet = new HttpGet(url);
// JSON headers, because I'm using JSON
httpGet.setHeader("Accept", "application/json");
httpGet.setHeader("Content-type", "application/json");
try {
    HttpResponse execute = client.execute(httpGet);
    String responseStr = EntityUtils.toString(execute.getEntity());
} catch (IOException e) {
    // handle the failure
}
responseStr will contain the web service reply, read in one go. Hope this helps.
If the data volume is not too big, it doesn't really matter what approach you use. If it is, then it makes sense to use buffering and read the data in chunks.
The 2nd approach is not too good, as you cannot always get the Content-Length.
If your data is text/HTML/JSON, you can use the 3rd approach, as you don't have to bother with the chunk size. Also, you can print the incoming data line by line to aid debugging.
If your data is a binary/Base64 stream such as an image, you should use the 1st approach and read the data in 4 KB blocks (a commonly used size).
UPDATE:
BTW, instead of the dreaded DefaultHttpClient I'm using the AndroidHttpClient as a singleton, and it works smoothly :)
It matters. Best for performance is to read from the InputStream into a buffer of a reasonable size. This way you transfer a decent amount of data at one time, rather than repeating the same operation a thousand times. Do not always rely on the Content-Length header value; for gzipped content it might show an incorrect size.
I'm dealing with some Java code in which there's an InputStream that I read once and then need to read again in the same method.
The problem is that I need to reset its position to the start in order to read it twice.
I've found a hack-ish solution to the problem:
is.mark(Integer.MAX_VALUE);
// ... read the InputStream fully ...
try {
    is.reset();
} catch (IOException e) {
    e.printStackTrace();
}
Does this solution lead to some unexpected behaviour? Or will it work in its dumbness?
As written, you have no guarantees, because mark() is not required to report whether it was successful. To get a guarantee, you must first call markSupported(), and it must return true.
Also as written, the specified read limit is very dangerous. If you happen to be using a stream that buffers in memory, it will potentially allocate a 2 GB buffer. On the other hand, if you happen to be using a FileInputStream, you're fine.
A better approach is to use a BufferedInputStream with an explicit buffer.
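A minimal sketch of that approach, under the assumption that 64 KB is an upper bound on how much you need to re-read (is is the stream from the question):

import java.io.BufferedInputStream;
import java.io.InputStream;

int readLimit = 64 * 1024;                        // assumed max bytes to re-read
InputStream buffered = new BufferedInputStream(is, readLimit);
buffered.mark(readLimit);                         // BufferedInputStream always supports mark
// ... first pass: read up to readLimit bytes ...
buffered.reset();                                 // rewind for the second pass

Unlike the raw stream, BufferedInputStream guarantees markSupported() is true, and the explicit limit caps how much it will ever buffer.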
It depends on the InputStream implementation. You can also consider whether it would be better to use a byte[]. The easiest way is Apache commons-io:
byte[] bytes = IOUtils.toByteArray(inputStream);
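Each pass over the data can then wrap the array in a fresh stream, for example (still assuming commons-io for the first line):

import java.io.ByteArrayInputStream;
import java.io.InputStream;

byte[] bytes = IOUtils.toByteArray(inputStream);        // drain the source once
InputStream firstPass  = new ByteArrayInputStream(bytes); // read it...
InputStream secondPass = new ByteArrayInputStream(bytes); // ...and read it again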
You can't do this reliably; some InputStreams (such as ones connected to terminals or sockets) don't support mark and reset (see markSupported). If you really have to traverse the data twice, you need to read it into your own buffer.
Instead of trying to reset the InputStream, load it into a buffer: a StringBuilder, or, if it's a binary data stream, a ByteArrayOutputStream. You can then process the buffer within the method as many times as you want.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int read = 0;
byte[] buff = new byte[1024];
while ((read = inStream.read(buff)) != -1) {
    bos.write(buff, 0, read);
}
byte[] streamData = bos.toByteArray();
For me, the easiest solution was to pass the object from which the InputStream could be obtained, and just obtain it again. In my case, it was from a ContentResolver.
I've been Googling up some answers, and I can't seem to find the best one.
Here's what I have so far for reading internal files on Android:
fis = openFileInput("MY_FILE");
StringBuilder fileContent = new StringBuilder("");
byte[] buffer = new byte[fis.available()];
while (fis.read(buffer) != -1) {
    fileContent.append(new String(buffer));
}
MYVARIABLE = fileContent.toString();
fis.close();
It used to leave a lot of white space, but I just used the available() method so it returns only what I need.
Is there a faster or shorter way to write this? I can't seem to find any good ones in the API guide.
1) The API documentation for available() says it should not be used for the purpose you need:
Note that this method provides such a weak guarantee that it is not very useful in practice.
Meaning it may not give you the file size.
2) When you read something into RAM, take into account that the file can be large, so try to avoid spending extra RAM. For this, a relatively small (1~8 KB) buffer is used to read from the source and then append to the result. On the other hand, using too small a buffer (say, several bytes) slows down reading significantly.
3) Reading bytes differs from reading characters, because a single character may be represented by more than one byte (depending on the encoding). To read chars, specific classes are used which are aware of the encoding and know how to convert bytes to chars properly. InputStreamReader is one such class.
4) The encoding used for reading should be the encoding that was used for persisting the data.
Taking all of the above into account, I would use something like this:
public static String getStringFromStream(InputStream in, String encoding)
throws IOException {
InputStreamReader reader;
if (encoding == null) {
// This constructor sets the character converter to the encoding
// specified in the "file.encoding" property and falls back
// to ISO 8859_1 (ISO-Latin-1) if the property doesn't exist.
reader = new InputStreamReader(in);
} else {
reader = new InputStreamReader(in, encoding);
}
StringBuilder sb = new StringBuilder();
final char[] buf = new char[1024];
int len;
while ((len = reader.read(buf)) > 0) {
sb.append(buf, 0, len);
}
return sb.toString();
}
5). Make sure to always close an InputStream when done working with it.
Sure, there is more than one way to read text from a file in Java/Android. This is mostly because the Java API contains several generations of I/O APIs. For instance, the classes in the java.nio package were created to be more efficient, but usually there is no strong reason to use them (don't fall into the sin of premature optimization).
I receive gzipped JSON from a web service and then unzip it (the size of the unzipped JSON is 3.2 MB).
I need to transform the received InputStream into a String so I can then create a JSONObject and parse it. I do it with this code:
public static String InputStreamToString(InputStream in)
        throws IOException {
    BufferedInputStream bis = new BufferedInputStream(in);
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int result = bis.read();
    while (result != -1) {
        byte b = (byte) result;
        buf.write(b);
        result = bis.read();
    }
    return buf.toString();
}
I get a java.lang.OutOfMemoryError on the last line, "return buf.toString();", on the emulator and on a device with 288 MB of RAM.
What shall I do?
Reading in a byte at a time is so 1990s. Either use HttpClient and a BasicResponseHandler, or at least read the data in respectable chunks and append them using a StringBuilder.
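For the first suggestion, a minimal HttpClient 4.x sketch (url is a placeholder):

import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

HttpClient client = new DefaultHttpClient();
// BasicResponseHandler consumes the entity and returns the body as a String,
// throwing HttpResponseException for non-2xx status codes.
String body = client.execute(new HttpGet(url), new BasicResponseHandler());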
Assuming you are still having the problem, the issue is that there is no single block of memory that is big enough for your string, based upon other things your app has been doing. The Android garbage collector is not a compacting collector, so it is possible to have lots of free heap space yet not enough for a specific allocation request.
In that case, you may need to switch to some sort of streaming JSON parser. If you happen to be targeting only Honeycomb and higher, you can use JsonReader. Otherwise, Jackson reportedly works on Android and apparently has a streaming mode.
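For illustration, a minimal android.util.JsonReader sketch (API level 11+; the "items" field name is hypothetical, and the top-level value is assumed to be an object):

import android.util.JsonReader;
import java.io.InputStreamReader;

JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
try {
    reader.beginObject();                 // assumes the document is a JSON object
    while (reader.hasNext()) {
        String name = reader.nextName();
        if ("items".equals(name)) {       // hypothetical field of interest
            reader.beginArray();
            while (reader.hasNext()) {
                reader.skipValue();       // or parse each element here
            }
            reader.endArray();
        } else {
            reader.skipValue();
        }
    }
    reader.endObject();
} finally {
    reader.close();
}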
You can try to create a new JSONObject using
new JSONObject(new JSONTokener(in))
instead of converting in to a String directly. However, this will probably only delay the problem. If you don't have enough memory to load a 3.2 MB string into memory, you probably won't have enough to load it as a JSON object, which will take more memory than the plain string.
We are streaming data between a server (written in .NET, running on Windows) and a client (written in Java, running on Ubuntu) in batches. The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user driven. The response from the client is also compressed using GZip; this never fails and seems to be rock solid. The response from the client is controlled by the system.
Is there a chance that some arrangement of characters or some special characters are creating false EOF markers? Could it be white-space related? Is GZip suitable for compressing XML?
I am assuming that the code to read and write from the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question), such as the '#' sign.
Any ideas?
UPDATE:
The actual code, as requested. I thought it wasn't this, because I had been to a couple of sites for help on this issue and they all had more or less the same code. Some sites mentioned appended GZip; something to do with GZip creating multiple segments?
public String receive() throws IOException {
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
    do {
        int nrBytes = in.read(buffer);
        if (nrBytes > 0) {
            baos.write(buffer, 0, nrBytes);
        }
    } while (in.available() > 0);   // NOTE: available() == 0 does not mean end
                                    // of stream, so this loop can exit early
    return compressor.decompress(baos.toByteArray());
}
public String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    try {
        GZIPInputStream inflater = new GZIPInputStream(in);
        byte[] byteBuffer = new byte[8192];
        int r;
        while ((r = inflater.read(byteBuffer)) > 0) {
            buffer.write(byteBuffer, 0, r);
        }
    } catch (IOException e) {
        log.error("Could not decompress stream", e);
        throw e;
    }
    // NOTE: this uses the platform default charset; since the server encodes
    // UTF-8, new String(buffer.toByteArray(), "UTF-8") would be safer
    return new String(buffer.toByteArray());
}
At first I thought there must be something wrong with the way I am reading in the stream, and that perhaps I was not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom and so far has not been reproducible led me to believe that it was the content rather than the scenario. But at this point I am totally baffled, and for all I know it is the code.
Thanks again everyone.
Update 2:
As requested the .Net code:
Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)
This gets the raw data into bytes; it is then compressed:
Private Function Compress(ByVal Data As Byte()) As Byte()
Try
Using MS = New MemoryStream()
Using Compression = New GZipStream(MS, CompressionMode.Compress)
Compression.Write(Data, 0, Data.Length)
Compression.Flush()
Compression.Close()
Return MS.ToArray()
End Using
End Using
Catch ex As Exception
Log.Error("Error trying to compress data", ex)
Throw
End Try
End Function
Update 3: Also added more Java code. The in variable is the InputStream returned from socket.getInputStream().
It certainly shouldn't be due to the data involved; the streams deal with binary data, so that shouldn't make any odds at all.
However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.
If you could provide some code, that would help a lot...
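Worth noting for the receive() method above: available() returning 0 only means no bytes are currently buffered, not that the stream has ended, so the loop can stop mid-message whenever the reader outpaces the network. One possible remedy, sketched here under the assumption that the protocol can be changed, is to length-prefix each message and read exactly that many bytes:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Assumes the sender writes a 4-byte big-endian length before each gzipped
// payload (a protocol change, not something in the original code).
public static byte[] readMessage(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    int length = din.readInt();     // blocks until the length arrives
    byte[] payload = new byte[length];
    din.readFully(payload);         // blocks until ALL bytes arrive, or throws EOFException
    return payload;
}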
I would suspect that for some reason the data is being altered along the way by being treated as text rather than binary, so it may be either \n conversions or a code page alteration.
How is the gzipped stream transferred between the two systems?
It is not possible. EOF in TCP is delivered as a FIN segment, not via any byte value in the data.