How can I make my android app receive/send data faster? - java

I have an app that needs to transfer data back and forth between a server, but the speed is not satisfactory right now. The main part is that I'm receiving and parsing JSON data (about 200 characters long) over 3G from a server, and the fastest it will ever do the task is about 5 seconds, but sometimes it will take long enough to time out (upwards of 30 seconds). My server is a Rackspace cloud server.
I thought I was following best practices, but it can't be so with these kinds of speeds. I am using AsyncTask and the same global HttpClient variable for everything.
Can you help me find a better way?
I've thought about these options:
using TCP instead of HTTP
encoding the data to try to reduce the size (not sure how this would work)
I don't know a lot about TCP, but it seems like it would have less overhead. What would be the pros and cons of using TCP instead of HTTP? Is it practical for a cell phone to do?
Thanks
fyi - once I solve the problem I'll accept an answer that's the most helpful. So far I've received some really great answers
EDIT: I made it so that I can see the progress as it downloads and I've noticed that it is staying at 0% for a long time then it is quickly going to 100% -- does anyone have any ideas in light of this new info? It may be relevant that I'm using a Samsung Epic with Froyo.

Try using GZIP to compress the data being sent. This is not a complete code example, but it should get you on the right path.
Rejinderi is right; GSON rocks.
HttpGet getRequest = new HttpGet(url);
getRequest.addHeader("Accept-Encoding", "gzip");
HttpResponse response = httpClient.execute(getRequest);
InputStream instream = response.getEntity().getContent();
// Only wrap the stream if the server actually compressed the body
Header contentEncoding = response.getFirstHeader("Content-Encoding");
if (contentEncoding != null && contentEncoding.getValue().equalsIgnoreCase("gzip")) {
    instream = new GZIPInputStream(instream);
}

HTTP runs on top of TCP, so raw TCP sits at a lower level, and if you really need performance then TCP is the one you should use. HTTP is easier to develop with, as there is more support, and easier to implement as a developer: it wraps a lot of things up so you don't have to implement them yourself. The overhead in your case shouldn't be that much.
As for the JSON data: check whether parsing is taking a long time; the standard JSON library for Java is quite slow. Take a look here:
http://www.cowtowncoder.com/blog/archives/2009/09/entry_326.html
Debug and see if that is the case. If it's the JSON parse speed, I suggest you use the Gson library. It's cleaner, easy to implement, and much, much faster.

Sounds like you need to profile the application to find out where your bottleneck is. You said you are sending data of about 200 chars. That is minuscule, and I don't see how compression or anything strictly data-related is going to make much of an impact on such a small data set.
I think it is more likely that you have some communication issues, perhaps attempting to establish a new connection for every transfer or something along those lines that is giving you all the overhead.
Profiling is the key to resolving your issues, anything else is a shot in the dark.

Related

How to handle large http JSON response body [duplicate]

This question already has answers here:
Best way to process a huge HTTP JSON response
(3 answers)
Closed 1 year ago.
There is a REST API that returns large JSON data.
Example result:
{
  "arrayA": [
    {
      "data1": "data",
      "data2": "data"
    }, ..
  ],
  "arrayB": [
    {
      "data1": "data"
    }, ..
  ]
}
"arrayA" possible record just 0 to 100 records but "arrayB" can be possible 1 million to 10 million record
it make my java application out of memory.
My question is how to handle this case.
There are different concerns here and IMO the question is too broad because the best solution may depend on the actual use case.
You say, you have a REST API and you would like to “protect” the server from Out Of Memory Error, I get that.
However, assuming you'll find a way to fix the OOM error on the server, what kind of client will want to view tens of millions of objects at once? If it's a browser, is it really required? How long will the JSON processing take on the client side? Won't the client side of the application become too slow, and won't the clients start to complain? I'm sure you've got the point.
So the first way is to "re-think" why you need such a big response. In this case, probably the best solution is refactoring and changing the logic of the client-server communication.
Now, another possible case is that you have an “integration” - some kind of server to server communication.
In this case there is no point in returning the whole JSON response at once, or even streaming it. If you're running in the cloud, for example, you might want to write this huge JSON string to a file and upload it to S3, then provide a link to it (because S3 can deal with files like this). Of course there are other alternatives in non-AWS environments.
As a "stripped-down" idea, you might get the request, create a temp file on the file system, write the data to it in chunks, and then return a "FileResource" to the client. Working chunk-by-chunk will ensure that memory consumption stays low on your Java application's side. Basically it would be equivalent to downloading a file that gets generated dynamically. When you close the stream you might want to remove the file.
This would work best if you have some kind of “get heap dump” or any data dump in general functionality.
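The chunk-by-chunk temp-file idea above can be sketched with plain java.io; the class and method names here are made up for illustration, and the framework-specific "FileResource" part is left out:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedResponseWriter {

    // Copies the upstream data to a temp file in fixed-size chunks,
    // so only one chunk is ever held in memory at a time.
    static Path writeToTempFile(InputStream upstream) throws IOException {
        Path tmp = Files.createTempFile("big-response", ".json");
        byte[] chunk = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(tmp)) {
            int read;
            while ((read = upstream.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        }
        return tmp; // hand this to the framework as a file/stream resource
    }
}
```

The returned path would then be served as a downloadable resource and deleted once the client closes the stream.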

Ways to buffer REST response

There's a REST endpoint, which serves large (tens of gigabytes) chunks of data to my application.
The application processes the data at its own pace, and as incoming data volumes grow, I'm starting to hit the REST endpoint timeout.
That is, the processing speed is lower than the network throughput.
Unfortunately, there's no way to raise the processing speed enough, as there's no "enough": incoming data volumes may grow indefinitely.
I'm thinking of a way to store incoming data locally before processing, in order to release REST endpoint connection before timeout occurs.
What I've come up with so far is downloading incoming data to a temporary file and reading (processing) said file simultaneously, using an OutputStream/InputStream pair.
Sort of buffering, using a file.
This brings its own problems:
what if the processing speed becomes faster than the downloading speed for some time and I hit EOF?
the file parser operates on an ObjectInputStream, and it behaves weirdly in cases of an empty file/EOF
and so on
Are there conventional ways to do such a thing?
Are there alternative solutions?
Please provide some guidance.
Upd:
I'd like to point out: http server is out of my control.
Consider it to be a vendor data provider. They have many consumers and refuse to alter anything for just one.
Looks like we're the only ones to use all of their data, as our client app's processing speed is far greater than their sample client's performance metrics. Still, we cannot match our app's performance with the network throughput.
Server does not support http range requests or pagination.
There's no way to divide data in chunks to load, as there's no filtering attribute to guarantee that every chunk will be small enough.
In short: we can download all the data in the given time before the timeout occurs, but cannot process it.
Having an adapter between the InputStream and OutputStream, to perform as a blocking queue, would help a ton.
You're using something like new ObjectInputStream(new FileInputStream(...)) and the solution for EOF could be wrapping the FileInputStream first in a WriterAwareStream, which would block when hitting EOF as long as the writer is writing.
Anyway, in case latency doesn't matter much, I would not bother starting processing before the download has finished. Oftentimes, there isn't much you can do with an incomplete list of objects.
Maybe some memory-mapped-file-based queue like Chronicle-Queue may help you. It's faster than dealing with files directly and may be even simpler to use.
You could also implement a HugeBufferingInputStream internally using a queue, which reads from its input stream, and, in case it has a lot of data, it spits them out to disk. This may be a nice abstraction, completely hiding the buffering.
There's also FileBackedOutputStream in Guava, which automatically switches from using memory to using a file when the data gets big, but I'm afraid it's optimized for small sizes (with tens of gigabytes expected, there's no point in trying to use memory).
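A minimal sketch of the queue-backed stream idea from the answer above; the class name is made up, and the spill-to-disk part is omitted for brevity. A producer thread offers byte[] chunks while the consumer reads; an agreed-upon sentinel array acts as EOF:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;

public class QueueBackedInputStream extends InputStream {
    // Sentinel object the producer enqueues to signal end-of-stream.
    static final byte[] EOF = new byte[0];

    private final BlockingQueue<byte[]> queue;
    private byte[] current = null;
    private int pos = 0;
    private boolean done = false;

    public QueueBackedInputStream(BlockingQueue<byte[]> queue) {
        this.queue = queue;
    }

    @Override
    public int read() throws IOException {
        if (done) return -1;
        // Block until the producer delivers the next chunk,
        // which is exactly the behaviour that avoids the premature-EOF problem.
        while (current == null || pos >= current.length) {
            try {
                current = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
            pos = 0;
            if (current == EOF) {
                done = true;
                return -1;
            }
        }
        return current[pos++] & 0xff;
    }
}
```

An ObjectInputStream wrapped around this would simply block instead of hitting EOF while the download is still in flight. A bounded queue (e.g. ArrayBlockingQueue) also throttles the producer for free.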
Are there alternative solutions?
If your consumer (the http client) is having trouble keeping up with the stream of data, you might want to look at a design where the client manages its own work in progress, pulling data from the server on demand.
RFC 7233 describes the Range Requests
devices with limited local storage might benefit from being able to request only a subset of a larger representation, such as a single page of a very large document, or the dimensions of an embedded image
HTTP Range requests on the MDN Web Docs site might be a more approachable introduction.
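If the server did support range requests, asking for one slice at a time can be sketched with plain HttpURLConnection; the URL and byte range below are placeholders, and the server must answer 206 Partial Content for this to actually work:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeRequestDemo {

    // Prepares a GET for just the bytes [from, to] of the resource.
    // Nothing is sent until the caller connects and reads the stream.
    static HttpURLConnection prepareRangeRequest(String url, long from, long to)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Range", "bytes=" + from + "-" + to);
        return conn;
    }
}
```

The client would loop, advancing the range by the chunk size each iteration, until it receives a short or empty body.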
This is the sort of thing that queueing servers are made for. RabbitMQ, Kafka, Kinesis, any of those. Perhaps KStream would work. With everything you get from the HTTP server (given your constraint that it cannot be broken up into units of work), you could partition it into chunks of bytes of some reasonable size, maybe 1024kB. Your application would push/publish those records/messages to the topic/queue. They would all share some common series ID so you know which chunks match up, and each would need to carry an ordinal so they can be put back together in the right order; with a single Kafka partition you could probably rely upon offsets. You might publish a final record for that series with a "done" flag that would act as an EOF for whatever is consuming it. Of course, you'd send an HTTP response as soon as all the data is queued, though it may not necessarily be processed yet.
not sure if this would help in your case because you haven't mentioned what structure & format the data are coming to you in, however, i'll assume a beautifully normalised, deeply nested hierarchical xml (ie. pretty much the worst case for streaming, right? ... pega bix?)
i propose a partial solution that could allow you to sidestep the limitation of your not being able to control how your client interacts with the http data server -
deploy your own webserver, in whatever contemporary tech you please (which you do control) - your local server will sit in front of your locally cached copy of the data
periodically download the output of the webservice using a built-in http querying library, a command-line util such as aria2c, curl, wget et al., an etl (or whatever you please) directly onto a local device-backed .xml file - this happens as often as it needs to
point your rest client to your own-hosted 127.0.0.1/modern_gigabyte_large/get... 'smart' server, instead of the old api.vendor.com/last_tested_on_megabytes/get... server
some thoughts:
you might need to refactor your data model to indicate that the xml webservice data that you and your clients are consuming was dated at the last successful run^ (ie. update this date when the next ingest process completes)
it would be theoretically possible for you to transform the underlying xml on the way through to better yield records in a streaming fashion to your webservice client (if you're not already doing this) but this would take effort - i could discuss this more if a sample of the data structure was provided
all of this work can run in parallel to your existing application, which continues on your last version of the successfully processed 'old data' until the next version 'new data' are available
^
in trade you will now need to manage a 'sliding window' of data files, where each 'result' is a specific instance of your app downloading the webservice data and storing it on disc, then successfully ingesting it into your model:
last (two?) good result(s) compressed (in my experience, gigabytes of xml packs down a helluva lot)
next pending/ provisional result while you're streaming to disc/ doing an integrity check/ ingesting data - (this becomes the current 'good' result, and the last 'good' result becomes the 'previous good' result)
if we assume that you're ingesting into a relational db, the current (and maybe previous) tables with the webservice data loaded into your app, and the next pending table
switching these around becomes a metadata operation, but now your database must store at least webservice data x2 (or x3 - whatever fits in your limitations)
... yes you don't need to do this, but you'll wish you did after something goes wrong :)
Looks like we're the only ones to use all of their data
this implies that there is some way for you to partition or limit the webservice feed - how are the other clients discriminating so as not to receive the full monty?
You can use in-memory caching techniques OR you can use Java 8 streams. Please see the following link for more info:
https://www.conductor.com/nightlight/using-java-8-streams-to-process-large-amounts-of-data/
Camel could maybe help you regulate the network load between the REST producer and the consumer.
You might for instance introduce a Camel endpoint acting as a proxy in front of the real REST endpoint, apply some throttling policy, before forwarding to the real endpoint:
from("http://localhost:8081/mywebserviceproxy")
.throttle(...)
.to("http://myserver.com:8080/myrealwebservice);
http://camel.apache.org/throttler.html
http://camel.apache.org/route-throttling-example.html
My 2 cents,
Bernard.
If you have enough memory, maybe you can use an in-memory data store like Redis.
When you get data from your Rest endpoint you can save your data into Redis list (or any other data structure which is appropriate for you).
Your consumer will consume data from the list.

limit size of requests with com.sun.net.httpserver.HttpExchange

Abusive users may attempt to send really large requests to my HttpServer, so a "maxRequestSize" config option would have been useful, but I have yet to find a way of dealing with this. I also thought there might be a timeout option somewhere, but couldn't find anything like that either.
httpExchange.getRequestBody() returns an InputStream, but based on my research there's no way to determine the length of an InputStream without first processing it.
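One way to approximate a "maxRequestSize" yourself is to wrap the request body stream and fail as soon as more bytes than allowed have been read; the class name here is made up, and you would pass httpExchange.getRequestBody() as the wrapped stream:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count = 0;

    public LimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && ++count > maxBytes) {
            throw new IOException("request body exceeds " + maxBytes + " bytes");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
            if (count > maxBytes) {
                throw new IOException("request body exceeds " + maxBytes + " bytes");
            }
        }
        return n;
    }
}
```

The handler catches the IOException, discards the request, and replies with 413 Payload Too Large. This bounds what you buffer, though the client may still push bytes onto the wire until the connection is closed.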

parsing giant json response

First the background info:
I'm using Commons HttpClient to make a GET request to the server.
The server responds with a JSON string.
I parse the string using org.json.
The problem:
Actually everything works, that is, for small responses (smaller than 2^31 bytes = the max value of an integer, which limits getResponseBody and the StringBuilder). I, on the other hand, have a giant response (over several GB) and I'm getting stuck. I tried using HttpClient's getResponseBodyAsStream, but the response is so big that my system freezes. I tried using a String, a StringBuilder, even saving it to a file.
The question:
First, is this the right approach, if so, what is the best way to handle such a response? If not, how should I proceed?
If you ever have a response that can be on the order of GB, you should parse the JSON as a stream, (almost) character by character, and avoid creating String objects... (this is very important, because Java's stop-the-world garbage collection will freeze your system for seconds if you constantly create a lot of garbage).
You can use SAXophone to create parsing logic.
You'll have to implement all the methods like onObjectStart, onObjectClose, onObjectKey, etc. It's hard at first, but once you take a look at the implementation of PrettyPrinter in the test packages, you'll get the idea...
Once properly implemented you can handle an infinite stream of data ;)
P.S. This is designed for HFT, so it's all about performance and no garbage...
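This is not SAXophone itself, just a toy sketch of the character-by-character idea: walk the stream once, track nesting depth, and count the objects in a top-level array without ever materialising the document as a String (string literals are skipped so braces inside values don't confuse it):

```java
import java.io.IOException;
import java.io.Reader;

public class JsonObjectCounter {

    static long countTopLevelObjects(Reader json) throws IOException {
        long objects = 0;
        int depth = 0;
        boolean inString = false, escaped = false;
        int c;
        while ((c = json.read()) != -1) {
            if (inString) {
                // Inside a string literal: only care about escapes and the closing quote
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 1) objects++; // an object directly inside the root array
                depth++;
            } else if (c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
        return objects;
    }
}
```

A real streaming parser fires callbacks for keys and values instead of just counting, but the shape is the same: constant memory regardless of document size.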

download with java code is really slow

I wrote a bit of code that reads download links from a text file and downloads the videos using the copyURLToFile method from Apache's commons-io library, and the download is really slow when I'm on my WLAN.
When I put in an internet stick it is about 6 times faster, although the stick has 4 Mbit and my WLAN has 8 Mbit.
I also tried to do it without the commons-io library, but the problem is the same.
Normally I'm downloading at 600-700 kB/s on my WLAN, but with Java it only downloads at about 50 kB/s. With the internet stick it's about 300 kB/s.
Do you know what the problem could be?
Thanks in advance.
Edit: Here is the code, but I don't think it has anything to do with this. And what do you mean by network policies?
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(linksFile)));
String link;
String name;
while ((link = br.readLine()) != null) {
    name = br.readLine();
    FileUtils.copyURLToFile(new URL(link), new File("videos/" + name + ".flv"));
    System.out.println(link);
}
This isn't likely to be a Java problem.
The code you've posted actually doesn't do any IO over the network - it just determines a URL and passes it to (presumably Apache Commons') FileUtils.copyURLToFile. As usual with popular third-party libraries, if this method had a bug in it that caused slow throughput in all but the most unusual situations, it would already have been identified (and hopefully fixed).
Thus the issue is going to lie elsewhere. Do you get the expected speeds when accessing the resource through normal HTTP methods (e.g. in a browser)? If not, then there's a universal problem at the OS level. Otherwise, I'd have a look at the policies on your network.
Two possible causes spring to mind:
The obvious one is some sort of traffic shaping - your network deprioritises the packets that come from your Java app (for a potentially arbitrary reason). You'd need to see how this is configured and look at its logs to find out if this is the case.
The problem resides with DNS. If Java's using a primary server that's either blocked or incredibly slow, then it could take up to a few seconds to convert that URL to an IP address and begin the actual transfer. I had a similar problem once when a firewall was silently dropping packets to one server and it took three seconds (per lookup!) for the Java process to switch to the secondary server.
In any case, it's almost certainly not the Java code that's at fault.
FileUtils.copyURLToFile internally uses a buffer to read.
Increasing the buffer size could speed up the download, but that does not seem to be possible with this API.
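If the buffer size really is the issue, a hand-rolled copy loop with plain java.net/java.nio lets you pick any buffer size you like; the 256 KiB used below is just a guess worth experimenting with, not a recommended value:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedDownloader {

    // Same job as FileUtils.copyURLToFile, but with a caller-chosen buffer size.
    // Returns the number of bytes written.
    static long download(URL url, Path target, int bufferSize) throws IOException {
        long total = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = url.openStream();
             OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

Usage would be download(new URL(link), Paths.get("videos/" + name + ".flv"), 256 * 1024) inside the existing loop. That said, as the answer above notes, the bottleneck here is more likely the network than the buffer.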
