Using GWT-RPC vs RequestFactory for passing large arrays - java

I am building an application which retrieves data and parses it into a two-dimensional array object before sending it back to the client. The application then uses the data to create an image on HTML5 canvas. The array contains several thousand entries, and when I built the application using GWT-RPC, it worked properly, but it took too long to transfer the array to the client (several minutes).
I found this issue when searching for solutions: http://code.google.com/p/google-web-toolkit/issues/detail?id=860
The last response was from a couple months ago, but there doesn't seem to be a conclusive answer to the best way to go about passing large arrays from server to client. Since deRPC is being deprecated (I have yet to actually try using it), is using requestfactory the only choice? It seems like requestFactory is supposed to be for accessing a database, and not for performing calculations and returning large results, and I haven't found an example where a request is made for a calculation and a result is passed back. Should I create a JSON object instead of an array in my current implementation and keep RPC or am I missing something when it comes to requestFactory?

The issue you linked to is about slow speed of de-serialization on the client and not the data transfer speed. You should first measure the transfer speed using Firebug or similar tool and then subtract this time from total time of RPC call to find out how much time is spent during de-serialization. Roughly speaking, the breakup goes like this:
Total RPC time = time-spent-on-server + network-time-in-out +
deserialization-time
You should first find out which part is the real bottleneck, if it turns out to be the data transfer speed, you'll probably need to re-think your design. See my answer to a related question.
EDIT:
IMO, until you have calculated the above time breakup, you should put aside the question that whether JSON or another approach is right for you

Related

Easiest way to collect SOAP response data

Good day,
There is a SOAP interface which returns the data I need. The request passes initial parameters to the server, and returns a data set. I need to collect hundreds of data sets with different parameters, store them in an array and then iterate through it. This process may need to be repeated every few weeks.
I found that SoapUI can retrieve the datasets one by one, but this means a lot of manual work. In my experience SoapUI also freezes every 30 minutes or so, especially if large datasets are involved. I spent a few days retrieving the data via SoapUI, copypasting the responses into array and then iterating through it with Javascript - but this approach seems very inefficient.
I never worked with SOAP before this - so most of the online tutorials fly over my head. SoapUI is quick and easy, but doesn't seem suitable for automatically retrieving large quantities of data. I don't have a huge preference for a programming language - whatever is simplest to implement is fine with me. I'd be perfectly happy if the response comes in an XML that's simply in a string format - it can then be parsed with regex, I know how to do this. I have a bit of experience with java and python though, if that helps. Thank you!
You might want to take a look at Zeep, a Python SOAP client.
The response object you’ll get back can be accessed just like a dictionary, you can do things like response['key']. You can also serialize zeep objects to native python data structures.

Compare 2 large string arrays between client/server

I have a big string array which has between 24-32 random characters (which include 0123456789abcdefghijklmnopqrstuvwxyz!##$%^&*()_+=-[]';/.,<>?}{). Some times the array is empty, but other times the array has more than 1000 elements inside it.
I send them to my client, which is a browser, via AJAX every time he requests them and I want to reload a part of my application only if that array is different. That means if there was a modification, adding/removing in said array. So I want to send the entire array, and some kind of hash of all the elements inside it. I can't use md5 or anything like that because the elements inside the array might move around.
What do you suggest I do? The server uses Java to serve pages.
Are you sure transmitting 1000 characters is actually a problem in your use case? For instance, this stackoverflow page is currently 17000 bytes large, and stackoverflow makes no effort to only transmit it if it has changed. Put differently, transmitting 1000 characters will take about 1000 bytes, or 1 ms on a 1 MBit connection (which is slow by modern standards ;-).
That said, transmitting data only if it has changed is such a basic optimization strategy that it has been incorporated into the HTTP standard itself. The HTTP standard describes both time based and etag based invalidation, and is implemented by virtually any software or hardware interacting using HTTP, including browsers and CDNs. To learn more, read an tutorial by Google or the normative specification.
You could be using time based invalidation, either by specifying a fixed lifetime or interpreting the If-Modified-Since header. You could also use an ETag that is not sensitive to ordering, by putting your elements into a particular order (e.g. through sorting) before hashing.
I would suggest a system that allows you to skip sending the strings altogether if the client has the latest version. The client keeps the version number (or hash code) of the latest version it received. If it hasn't received any strings yet, it can default to 0.
So, when the client needs to get the strings, it can say, "Give me the strings if the current version isn't X," where X is the version that the client currently has.
The server maintains a version number or hash code which it updates whenever the strings change. If it receives a request, and the client's version is the same as the current version, then the server returns a result that says, "You already have the current version."
The point here is twofold: prevent transmitting information that you don't need to transmit, and prevent the client from having to compute a hash code.
If the server needs to compute a hash at every request rather than just keeping a current hash code value, have the server sort the array of strings first, and then do an MD5 or CRC or whatever.

Searching strategy to efficiently load data from service or server?

This question is not very a language-specific question, it's some kind of pattern-related question, but I would like to tag it with some popular languages that I can understand here.
I've not been very experienced with the requirement of efficiently loading data in combination with searching data (especially for mobile environment).
My strategy used before is load everything into local memory and search from there (such as using LINQ in C#).
One more strategy is reload the data every time a new search is executed. Doing something like this is of course not efficient, also we may need to do some more complicated things to sync the newly loaded data with the existing data (already loaded into local memory).
The last strategy I can think of is the hardest one to implement, that is lazily load the data together with the searching execution. That is when the search is executed, the return result should be cached locally. The search should look in the local memory first before fetching new result from the service/server. So the result of each search is a combination of the local search and the server search. The purpose here is to reduce the amount of data being reloaded from server every time a search is run.
Here is what I can think of to implement this kind of strategy:
When a search is run, look in the local memory first. Finishing this step gives out the local result.
Now before sending request to search on the server side, we need to somehow pass what are already put in the result (locally) to exclude them from the result when searching on the server side. So the searching method may include a list of arguments containing all the item IDs found by the fisrt step.
With that searching request, we can exclude the found result and return only new items to the client.
The last step is merge the 2 results: from local and server to have the final search result before showing on the UI to the user.
I'm not sure if this is the right approach but what I feel not really good here is at the step 2. Because we need to send a list of item IDs found on the step 1 to the server, so what if we have hundreds or thousands of such IDs, sending them in that case to the server may not be very efficient. Also the query to exclude such a large amount of items may not be also efficient (even using direct SQL or LINQ). I'm still confused at this point.
Finally if you have any better idea and importantly implemented in some production project, please share with me. I don't need any concrete example code, I just need some idea or steps to implement.
Too long for a comment....
Concerning step 2, you know you can run into many problems:
Amount of data
Over time, you may accumulate a huge amount of data so that even the set their id's gets bigger than the normal server answer. In the end, you could need to cache not only previous server's answers on the client, but also client's state on the server. What you're doing is sort of synchronization, so look at rsync for inspiration; it's an old but smart Unix tool. Also git push might be inspiring.
Basically, by organizing your IDs into a tree, you can easily synchronize the information (about what the client already knows) between the server and the client. The price may be increasing latency as multiple steps may be needed.
Using the knowledge
It's quite possible that excluding the already known objects from the SQL result could be more expensive than not, especially when you can't easily determine if a to-be-excluded object would be a part of the full answer. Still, you can save bandwidth by post-filtering the data.
Being up to date
If your data change or get deleted, your may find your client keeping obsolete data. The client subscribing for relevant changes is one possibility; associating a (logical) timestamp to your IDs is another one.
Summary
It can get pretty complicated and you should measure before you even try. You may find out that the problem itself is hard enough and that achieving these savings is even harder and the gain limited. You know the root of all evil, right?
I would approach the problem by thinking local and remote are two different data sources,
When a search is triggered, the search is initiated against both data sources (local - in memory and server)
Most likely local search will result in results first, so display them to the user.
When results returned from the server, you can append non duplicate results.
Optional - in case server data has changed and some results remove/ or changed, update/remove local results and update the view.

Quick C++ data to Java transfer

I'm trying to transfer a stream of strings from my C++ program to my Java program in an efficient manner but I'm not sure how to do this. Can anyone post up links/explain the basic idea about how do implement this?
I was thinking of writing my data into a text file and then reading the text file from my Java program but I'm not sure that this will be fast enough. I need it so that a single string can be transferred in 16ms so that we can get around 60 data strings to the C++ program in a second.
Text files can easily be written to and read from upwards with 60 strings worth of content in merely a few milliseconds.
Some alternatives, if you find that you are running into timing troubles anyway:
Use socket programming. http://beej.us/guide/bgnet/output/html/multipage/index.html.
Sockets should easily be fast enough.
There are other alternatives, such as the tibco messaging service, which will be an order of magnitude faster than what you need: http://www.tibco.com/
Another alternative would be to use a mysql table to pass the data, and potentially just set an environment variable in order to indicate the table should be queried for the most recent entries.
Or I suppose you could just use an environment variable itself to convey all of the info -- 60 strings isn't very much.
The first two options are the more respectable solutions though.
Serialization:
protobuf or s11n
Pretty much any way you do this will be this fast. A file is likely to be the slowest and it could be around 10ms total!. A Socket will be similar if you have to create a new connection as well (its the connect, not the data which will take most time) Using a socket has the advantage of the sender and receiver knowing how much data has been produced. If you use a file instead, you need another way to say, the file is complete now, you should read it. e.g. a socket ;)
If the C++ and Java are in the same process, you can use a ByteBuffer to wrap a C array and import into Java in around 1 micro-second.

How to implement Java/Scala in-memory statistics database?

My need is to aggregate real time statistics of a web application server.
For example:
How many requests of content type X have been done
How long it takes to process request of type Y
And so on.
This data has to be completely in memory, not in a file, for best performance. It doesn't log each and every request but instead only stores counters of various aspects.
The most easy way I know is to store the values in a SQL-like table and do SQL-like queries. The benefit is that the indexing is coming off-the-shelf without development effort. I guess some embedded Java databases like Apache Derby would do the work.
The other way to go is to implement collection (say a list) and hash table for each "index column". This way it's all done with Java/Scala collections API, but I actually have to implement indexing mechanism myself, test it, maintain it, etc.
So my question is what way do you think is preferred, and if there are other ways to easily and quickly implement this feature?
Thanks.
I would choose H2 database, I have very positive experiences with it, performance is great as well.
Are you sure that SQL database is well suited for your needs, and have you looked at javamelody, to see if it suits your needs, or if it does not suit you take a look at JRobin for a rolling database implementation.
I would imagine you only need one collection per type of information you need to collection. To improve performance, simplify code I would use TObjectIntHashMap. e.g.
How many requests of content type X have been done
TObjectIntHashMap<ContentType> contentTypeCount
= new TObjectIntHashMap<ContentType>();
contentTypeCount.increment(contentType);
How long it takes to process request of type Y
TObjectLongHashMap<ProcessType> contentTypeTime
= new TObjectLongHashMap<ProcessType>();
contentTypeTime.adjustValue(processType, processTime);
I don't see how you can make it any shorter/simpler/faster by using the other approaches you mentioned.
The average time to perform increment(key) on my machines takes 15 ns (billionths of a second)
I also been noticed about Twitter Ostrich that is statistics library for Scala.
It contains counters, gauges and timing meters.
Data is accessible from HTTP REST API.

Categories

Resources