I work with a large Java web application that calls a service to return information. My key goal is to retrieve 100+ individual results as quickly as possible. We've noticed that sending 100+ separate requests doesn't give us the best response performance. Our solution was to break the 100+ requests into small batches (~15-25) and assemble the results once all have been received.
I'm looking for a suggestion, in Java, for making a configurable number of requests (1, 50, or 200) from the application to the service, collecting the responses, and issuing another batch if more requests remain. When no requests are left, assemble everything into one list and return that full list.
Suggestions in any form are welcome, thanks.
I use Spring Integration for this kind of thing. You can set up a configurable message splitter that chops up your request and sends off many tiny ones, and a message aggregator that knows when it has received all responses and can then give you back a single result.
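A sketch of that splitter/aggregator arrangement, not a drop-in config: the channel name, pool size, and callService() are all invented for illustration, and it's written against the Spring Integration 5.x Java DSL (newer versions renamed IntegrationFlows to IntegrationFlow):

```java
import java.util.concurrent.Executors;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
@EnableIntegration
public class SplitAggregateConfig {

    @Bean
    public IntegrationFlow splitAggregateFlow() {
        return IntegrationFlows.from("requestChannel")
                // one message per individual request id
                .split()
                // fan the small requests out over a thread pool
                .channel(c -> c.executor(Executors.newFixedThreadPool(15)))
                // call the remote service for each id (hypothetical call)
                .<Long, String>transform(this::callService)
                // regroup by the correlation headers split() added
                .aggregate()
                .get();
    }

    private String callService(Long id) {
        return "result-" + id; // placeholder for the real remote call
    }
}
```

Sending a List of ids to requestChannel (e.g. via a @MessagingGateway) would then yield one aggregated reply containing all the results.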
Spring also has a product called Spring Batch that might be a useful alternative, but that is aimed more at heavy offline batch processing, which it doesn't sound like you are doing.
If possible / feasible, extend the service to handle multiple logical requests in a single protocol request; e.g. a single HTTP request.
In theory, both the client and server sides should be able to do the work with less overhead if the server gets one combined request; there should be savings on both ends.
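For illustration, a hypothetical batched endpoint in Spring MVC: one HTTP request carries many logical requests, so the connection and request overhead is paid once. The path and lookupOne() are made up:

```java
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class BatchLookupController {

    // Many logical requests (ids) arrive in one HTTP call,
    // and all results go back in one response.
    @PostMapping("/items/batch")
    public List<String> lookup(@RequestBody List<Long> ids) {
        return ids.stream()
                  .map(this::lookupOne)   // hypothetical per-id lookup
                  .collect(Collectors.toList());
    }

    private String lookupOne(Long id) {
        return "item-" + id; // placeholder
    }
}
```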
Related
Which is the best architecture for handling multiple HTTP requests, each sending a list of ids (more than 1000 ids per request), where each request triggers a heavy, time-consuming process that communicates with other systems (via SOAP or REST) and saves data into an RDBMS?
Should I send the ids in an HTTP POST request?
Should I use Spring's Async/Futures (without a return value) in a REST Resource/Controller to handle the time-consuming process and return HTTP 202 Accepted (perhaps with an ID for querying the status of the process or resource)?
Or is it better if my REST Resources/Controllers only put the messages on a JMS queue (perhaps a persistent queue per REST method) and Message-Driven Beans (one per method) consume the messages and process the requests?
Is it a good idea for my REST Resources/Controllers, or a class acting as a proxy, to save the requests to my system and the other systems' responses in a database, for logging purposes and for troubleshooting? Or should I rely on, for example, the retry/redelivery feature in JMS?
I agree with dunni; however, your general requirements sound like JMS would be a good place to start. That would allow clients to drop off requests and allow back end components to work on the requests as they are able. Of course, you need enough back end power to get all the requests completed in a reasonable time frame.
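A rough sketch of that drop-off pattern with Spring JMS; the queue name, payload type, and jobId correlation property are all assumptions, and the default message converter requires a Serializable payload (an ArrayList qualifies):

```java
import java.util.List;
import java.util.UUID;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
class IdsController {

    private final JmsTemplate jmsTemplate;

    IdsController(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    @PostMapping("/process")
    public ResponseEntity<String> submit(@RequestBody List<Long> ids) {
        String jobId = UUID.randomUUID().toString();
        // drop the request on a queue and return immediately;
        // the jobId property lets a status resource track it later
        jmsTemplate.convertAndSend("process.requests", ids, m -> {
            m.setStringProperty("jobId", jobId);
            return m;
        });
        return ResponseEntity.status(HttpStatus.ACCEPTED).body(jobId);
    }
}

@Component
class ProcessWorker {

    @JmsListener(destination = "process.requests")
    public void handle(List<Long> ids) {
        // the heavy, time-consuming work happens here, off the HTTP thread
    }
}
```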
Is making a REST-based web service (POST) asynchronous the best way to handle thousands of requests at one time (keeping in mind that I have only a single server instance serving the requests)?
For example: I have a REST-based web service which is supposed to be consumed by 100 thousand clients within a very short span of time (~60 seconds). I understand that if I were allowed to deploy multiple instances of the server, I could use a load balancer to handle all incoming requests and delegate them accordingly. But I am restricted to a single instance. What design could I adopt within this restriction?
I could think of making the requests asynchronous (not responding to the client immediately) so that the server is freed from this load and can handle the requests at its own pace.
For now we can ignore memory limitations.
Please let me know if this clarifies the question.
The term asynchronous can have different meanings in different places. For web application code, it could refer to a non-blocking I/O server such as Node or Netty/Akka, which lets HTTP requests time-multiplex on the same worker threads. If you're writing callbacks or using async or future constructs, it is probably non-blocking I/O, which people sometimes refer to as asynchronous.
However, I could have a REST API running on Node that uses non-blocking I/O while the API, or the overall architecture, is still fully synchronous. For example, say I have an endpoint POST /photos which takes in a photo, creates image thumbnails, stores the URLs of the photo in a SQL DB, and then stores the images in S3. The REST API could still block from the initial POST until the image is processed and stored.
A second way is for the server to accept the photo-processing job and return immediately, storing the photo in an in-memory or network-based queue to be processed later by some other worker thread. In fact, I could implement this async architecture even with a blocking server, like good old Java 7 and Jetty.
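A bare-bones sketch of that second approach in plain Java, with invented names: accept() is called from the request thread and returns immediately, while pooled workers drain the queue at their own pace.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class PhotoJobServer {

    private final BlockingQueue<byte[]> jobs = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Called from the request thread: enqueue and return immediately,
    // so the HTTP response (e.g. 202 Accepted) is not held up.
    public String accept(byte[] photo) {
        jobs.offer(photo);
        return "accepted";
    }

    // Started once at boot: workers process jobs at their own pace.
    public void start() {
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        byte[] photo = jobs.take();
                        process(photo); // thumbnails, DB row, S3 upload...
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    private void process(byte[] photo) {
        // placeholder for the slow work
    }
}
```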
I have two CSV files, one containing 500k+ customer records. I am attempting to convert each row to a customer object and POST it to an API which I am also responsible for.
This approach has the obvious problem of firing off 500k+ HTTP calls and hitting the maximum number of HTTP connections.
I have had two suggestions thrown at me: opening a WebSocket, or using Spring Batch. Is this a good use case for opening a WebSocket and sending messages rather than opening multiple HTTP connections? Or is it better to go the more traditional route of using Spring Batch?
Since it appears to be your own server, you should just add a server route that allows you to send it multiple records at a time; then you can batch things into far fewer API calls.
If it's really 500k records you need to send, you will probably still want to batch them into multiple requests, but you could at least send them 10k at a time and manage your connections so that you don't have more than 5-10 requests in flight at any given time (it's unlikely your server could process more than that at once anyway, and this should keep your client from running out of network resources).
Or, if you want to do it more like a file upload, you could send 500k records worth of data, have your server handle it like a file upload and then once it succeeds, have the server process it.
In fact, you may want to just upload the CSV and let the server process it directly.
While a WebSocket connection would let you use the same connection for multiple requests (which is a good thing), you still don't want to be sending 500k individual records. The overhead of sending that many separate requests alone is inefficient, whether over a WebSocket or HTTP. Instead, you really want to batch the records and send large chunks of data per request.
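A rough sketch of that batching strategy using the Java 11 HttpClient: records are sent 10k at a time, with a semaphore capping the in-flight requests at 5. The endpoint URL is an assumption.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.Semaphore;

public class BatchUploader {

    private static final int CHUNK = 10_000;              // records per request
    private final Semaphore inFlight = new Semaphore(5);  // max concurrent requests
    private final HttpClient client = HttpClient.newHttpClient();

    public void upload(List<String> jsonRecords) throws InterruptedException {
        for (int i = 0; i < jsonRecords.size(); i += CHUNK) {
            List<String> chunk =
                    jsonRecords.subList(i, Math.min(i + CHUNK, jsonRecords.size()));
            String body = "[" + String.join(",", chunk) + "]";

            inFlight.acquire(); // block if 5 requests are already in flight
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/api/customers/batch")) // hypothetical
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                  .whenComplete((resp, err) -> inFlight.release());
        }
    }
}
```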
This question might sound a bit abstract, already answered (though my search didn't stumble on a convenient answer), or not specific at all, but I will try to provide as much information as I can.
I am building a mobile application which will gather sensory data and send it to a remote server. The remote server will collect all these data in a MySQL database, and another process/program (not the MySQL database itself) will run computations on them. What I want to know is:
After some updates in the database, is it doable to send a response from a RESTful server to a certain client (probably the one who made the last update), using something like "a background thread"? Or should this be done via a socket connection between server and client?
Some remarks:
I am using Java EE, Spring MVC with Hibernate, and Tomcat (because I am familiar with the environment, though in a more asynchronous manner).
I thought this would be a convenient way to go because the SQL schema is not very complicated, and security and authentication are not needed (it's a prototype).
Also there is a front-end webpage that will have to visualize these data, so such a back-end system would look like a good option for getting the job done fast.
Lastly, I saw this solution:
Is there a way to 'listen' for a database event and update a page in real time?
My issue is that, besides the page, I want to update the client's side with messages from the RESTful server.
If all of the above is unnecessary and a simpler client-server application would prove better and less complex, please feel free to tell me.
Thank you in advance.
Generally you should upload your data to a resource on the server (e.g. POST /widgets), and the server should immediately return a 201 Created or, if creation is too slow and needs to happen later, a 202 Accepted status. There are several approaches after that happens, each with its merits:
Polling - The server's response includes a Location header which the client can then proceed to poll until a change happens (e.g. check for an update every second); a sketch of this follows the list. This is the easiest approach and quite efficient if you use HTTP caching effectively and the average number of checks is relatively low.
Push notification - The server sends a push notification when the change happens, the report is generated, etc. Obviously this requires you to store the client's details and their notification requirements. This is probably the cleanest approach and is also easy to scale. In the case of Android (also iOS) you have free push notifications available via Google Cloud Messaging.
Persistent connection - Set up a persistent connection between client and server, e.g. using a WebSocket or a low-level TCP connection. This should yield the fastest response times, but will probably drain the phone's battery, be harder to scale on the server, and be more complex to code.
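As a sketch of the polling option above (all paths and names hypothetical): the POST returns 202 Accepted with a Location header, and the client polls that location for status.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class WidgetController {

    private final Map<String, String> status = new ConcurrentHashMap<>();

    @PostMapping("/widgets")
    public ResponseEntity<Void> create(@RequestBody String widget) {
        String id = UUID.randomUUID().toString();
        status.put(id, "PENDING");        // real creation happens later
        return ResponseEntity.accepted()  // 202 Accepted
                .header("Location", "/widgets/" + id + "/status")
                .build();
    }

    @GetMapping("/widgets/{id}/status")
    public ResponseEntity<String> poll(@PathVariable String id) {
        String s = status.get(id);
        return s == null ? ResponseEntity.notFound().build() : ResponseEntity.ok(s);
    }
}
```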
I'm looking for opinions from you all. I have a web application that needs to record data into another web application's database. I'd prefer not to use HTTP GET requests against the 2nd application because of latency. I'm looking for a fast way to save records to the 2nd application, and I came across the idea of "fire and forget". Will JMS suit this scenario? From my understanding JMS guarantees message delivery, but guaranteed 100% delivery is not important here, as long as the system can serve as many requests as possible. Say I need to make at least 1000 requests per second to the 2nd application: should I use JMS, HTTP requests, or XMPP instead?
I think you're misunderstanding networking in general. There's positively no reason that an HTTP GET has to be slower than anything else, and if HTTP takes advantage of keep-alives it's faster than most options.
JMS isn't a protocol; it's a specification that wraps many other protocols, possibly including HTTP or XMPP.
In the end, at the levels where Java operates, there's either UDP or TCP. TCP has more overhead but guarantees delivery (via retransmission) and ordering. UDP offers neither guaranteed delivery nor in-order delivery. If you can live with UDP's limitations you'll find it "faster", and if you can't, then any lightweight TCP wrapper (of which HTTP is one) is just about the same.
Your requirements seem to be:
one client and one server (inferred from your first sentence),
HTTP is mandatory (inferred from your talking about a web application database),
1000 or more record updates per second, and
individual updates do not need to be acknowledged synchronously (you are willing to use a "fire and forget" approach).
The way I would approach this is to have the client threads queue the updates internally, with a dedicated client thread that periodically assembles the queued updates into one HTTP request and sends it to the server. If necessary, the server can send a response indicating the status of the individual updates.
Batching eliminates the impact of latency on the client, and potentially allows the server to process the updates more efficiently.
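A minimal sketch of that client-side arrangement, with all names and the flush interval invented: caller threads enqueue updates, and one scheduled thread periodically drains the queue into a single HTTP POST.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UpdateBatcher {

    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final HttpClient client = HttpClient.newHttpClient();

    // Client threads call this: fire and forget from their point of view.
    public void enqueue(String updateJson) {
        pending.add(updateJson);
    }

    // Every 100 ms, drain whatever has accumulated into one request.
    public void start() {
        scheduler.scheduleAtFixedRate(this::flush, 100, 100, TimeUnit.MILLISECONDS);
    }

    private void flush() {
        List<String> batch = new ArrayList<>();
        String update;
        while ((update = pending.poll()) != null) {
            batch.add(update);
        }
        if (batch.isEmpty()) {
            return;
        }
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("https://second-app.example.com/updates")) // hypothetical
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("[" + String.join(",", batch) + "]"))
                .build();
        client.sendAsync(req, HttpResponse.BodyHandlers.discarding());
    }
}
```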
The big difference between HTTP and JMS or XMPP is that JMS and XMPP allow asynchronous fire-and-forget messaging, where the client does not really know when (or if) a message will reach its destination and does not expect a response or acknowledgment from the receiver. This would allow the first app to respond quickly regardless of the second application's processing time.
Asynchronous messaging is usually preferred for high-volume distributed messaging where the message consumers are slower than the producers. I can't say if this is exactly your case here.
If you have full control and the two web applications run in the same web container and hence in the same JVM, I would suggest using JNDI to allow both web applications to get access to a common data structure (a list?) which allows concurrent modification, namely to allow application A to add new entries and application B to consume the oldest entries simultaneously.
This is most likely the fastest way possible.
Note that you should keep the objects you put in the list to classes found in the JRE, or you will most likely run into ClassCastExceptions (each web application loads classes through its own classloader, so the "same" class loaded by two applications is not interchangeable). These can be circumvented, but the easiest approach is most likely to just transfer strings in the common data structure.
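A sketch of that shared-structure idea, assuming the container allows writable global JNDI bindings (not all do; Tomcat's java:comp/env, for instance, is read-only for applications, so this may need container-specific configuration). Per the note above, only a JRE type (a queue of strings) crosses the boundary; the JNDI name is made up.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class SharedQueueAccess {

    private static final String NAME = "sharedRecordQueue"; // hypothetical JNDI name

    // Application A: create and publish the shared queue once at startup.
    public static void publish() throws NamingException {
        Context ctx = new InitialContext();
        ctx.rebind(NAME, new ConcurrentLinkedQueue<String>());
    }

    // Application A: add new entries.
    @SuppressWarnings("unchecked")
    public static void produce(String record) throws NamingException {
        Queue<String> q = (Queue<String>) new InitialContext().lookup(NAME);
        q.add(record);
    }

    // Application B: consume the oldest entries concurrently.
    @SuppressWarnings("unchecked")
    public static String consume() throws NamingException {
        Queue<String> q = (Queue<String>) new InitialContext().lookup(NAME);
        return q.poll(); // null if empty
    }
}
```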