I am sending REST-based requests to a server. I would like to get the response as quickly as possible and want to know the various optimizations that can be made.
One way is of course to send these requests in parallel in threads. What other options are available to optimize this?
On the server part, what configurations can be added?
Optimizations for REST calls (or just HTTP calls):
Like Brian Kelly said, cache the calls aggressively.
You can minimize the payloads that are returned when doing a GET. If it's returning JSON, you can trim the names of the fields to make the total return object smaller.
You can make sure you have compression turned on.
You can batch calls. So if a user wants to do three GETs in a row, you might batch those server side (assuming a web application) and then make one HTTP call with the three requests.
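For instance, a hypothetical batch endpoint in Spring MVC could look like the sketch below (the /batch path, the request shape, and the lookup helper are all invented for illustration, assuming Spring 4.3+ with Jackson on the classpath):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Sketch: serve several logical GETs in a single round trip.
@RestController
public class BatchController {

    @PostMapping("/batch")
    public Map<String, Object> batch(@RequestBody List<String> ids) {
        Map<String, Object> results = new LinkedHashMap<>();
        for (String id : ids) {
            // lookup(id) stands in for whatever each individual GET would have done
            results.put(id, lookup(id));
        }
        return results; // one response body instead of three separate round trips
    }

    private Object lookup(String id) {
        return "value-for-" + id; // placeholder for the real data access
    }
}
```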
Again if it's a web application and you want to minimize load times for pages, you can load only essential data on page load and push the rest of the calls to AJAX calls.
You can optimize your database queries that serve the REST calls.
The biggest bang for your buck will definitely be caching, though.
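As a sketch of what that caching can look like at the HTTP level in a Spring MVC controller (assuming Spring 4.2+; the endpoint and the loadItem helper are invented for the example):

```java
import java.util.concurrent.TimeUnit;

import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.WebRequest;

@RestController
public class ItemController {

    @GetMapping("/items/{id}")
    public ResponseEntity<String> item(@PathVariable String id, WebRequest request) {
        String body = loadItem(id);                  // hypothetical data access
        String etag = "\"" + body.hashCode() + "\""; // cheap ETag, good enough for a sketch

        // Answers 304 Not Modified (empty body) when the client's
        // If-None-Match header already carries this ETag.
        if (request.checkNotModified(etag)) {
            return null;
        }
        return ResponseEntity.ok()
                .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES))
                .eTag(etag)
                .body(body);
    }

    private String loadItem(String id) {
        return "item-" + id; // placeholder
    }
}
```

The Cache-Control header lets browsers and intermediate proxies reuse the response, and the ETag avoids resending unchanged payloads even when a revalidation does reach the server.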
I just read that it is recommended to use asynchronous method calls on the server via promises when executing long-running requests. The documentation says this is because the Play server will otherwise block on the request and not be able to handle concurrent requests.
Does this mean all of my web requests should be asynchronous?
I'm just thinking that if I want to improve my web pages' rendering times, I would make a series of AJAX calls to fetch the needed page regions concurrently. Since I would potentially make multiple AJAX calls, my Play controller methods need to be asynchronous.
Am I understanding this correctly? The syntax is quite verbose so I want to make certain I don't take this concept overboard. It would seem strange to me that I have to do this given that other web servers such as GlassFish or IIS automatically handle pooling.
Here are some detailed docs on Play's thread pools, various different configurations, how to tune them, best practices etc:
http://www.playframework.com/documentation/2.2.x/ThreadPools
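For reference, the asynchronous controller pattern from those Play 2.2 Java docs looks roughly like this (intensiveComputation is a placeholder for your long-running work):

```java
import play.libs.F.Function;
import play.libs.F.Function0;
import play.libs.F.Promise;
import play.mvc.Controller;
import play.mvc.Result;

public class Application extends Controller {

    // Returning Promise<Result> frees the request thread; the work runs on
    // a thread pool and the response is written when the promise completes.
    public static Promise<Result> index() {
        Promise<Integer> promiseOfInt = Promise.promise(new Function0<Integer>() {
            public Integer apply() {
                return intensiveComputation(); // stand-in for the long-running call
            }
        });
        return promiseOfInt.map(new Function<Integer, Result>() {
            public Result apply(Integer i) {
                return ok("Got result: " + i);
            }
        });
    }

    private static int intensiveComputation() {
        return 42; // stand-in
    }
}
```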
I'm working on a site containing real estate listings in Spring MVC. I would like to prevent scripts from stealing the content by scraping the site. Does anyone have experience with techniques that can easily be plugged into a Spring MVC environment?
User-agent is too simple to circumvent.
One idea I had was to keep track of two counters on the serverside.
ipaddress --> (counter xhr request, counter page request)
the counter page request is increased with a filter
the counter xhr request is increased on document ready
If a filter notices the two counters are totally out of sync, the IP is blocked.
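A minimal sketch of that filter idea (the /tracker endpoint hit from document ready, the thresholds, and the in-memory maps are all assumptions):

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: compare page requests against XHR "heartbeats" per IP.
public class ScraperFilter implements Filter {

    private final ConcurrentHashMap<String, AtomicInteger> pageCounts = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, AtomicInteger> xhrCounts = new ConcurrentHashMap<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        String ip = request.getRemoteAddr();

        if ("/tracker".equals(request.getRequestURI())) {
            // hit by an XHR fired on document ready from every real page view
            count(xhrCounts, ip);
        } else {
            count(pageCounts, ip);
        }

        int pages = value(pageCounts, ip);
        int xhrs = value(xhrCounts, ip);
        if (pages > 20 && pages > xhrs * 2) { // arbitrary "out of sync" rule
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        chain.doFilter(req, res);
    }

    private void count(ConcurrentHashMap<String, AtomicInteger> map, String ip) {
        map.computeIfAbsent(ip, k -> new AtomicInteger()).incrementAndGet();
    }

    private int value(ConcurrentHashMap<String, AtomicInteger> map, String ip) {
        AtomicInteger c = map.get(ip);
        return c == null ? 0 : c.get();
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```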
Could this work or are there easier techniques?
Cheers
Edit:
I am aware that if scrapers are persistent they will find a way to get the content. However, I'd like to make it as hard as possible.
Off the top of my head:
Look for patterns in how your pages are requested. Regular intervals is a flag. Regular frequency might be a flag (four times a day, but at different times during the day).
Require login. Nothing gets shown until the user logs in, so at least the scraper has to have an account.
Mix up the tag names around the content every once in a while. It might break their script. Do this enough times and they'll search for greener pastures.
You can't stop it entirely, but you can make it as hard as possible.
One way to make it harder is to change your content URLs frequently based on time, appending an encrypted flag to the URL.
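A sketch of what such a time-based flag could look like, assuming an HMAC over the resource id plus the current hour (all the names here are invented):

```java
import java.nio.charset.StandardCharsets;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.xml.bind.DatatypeConverter;

// Sketch: URLs like /listing/42?t=<token> stop working once the hour rolls over,
// so harvested URL lists go stale quickly.
public class UrlToken {

    private static final String SECRET = "change-me"; // server-side secret

    public static String tokenFor(String resourceId) throws Exception {
        long hour = System.currentTimeMillis() / (60 * 60 * 1000); // rotates hourly
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] sig = mac.doFinal((resourceId + ":" + hour).getBytes(StandardCharsets.UTF_8));
        return DatatypeConverter.printHexBinary(sig);
    }

    public static boolean isValid(String resourceId, String token) throws Exception {
        return tokenFor(resourceId).equalsIgnoreCase(token);
    }
}
```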
Some suggestions are in the links below.
http://blog.screen-scraper.com/2009/08/17/further-thoughts-on-hindering-screen-scraping/
http://www.hyperarts.com/blog/the-definitive-guide-to-blog-content-scraping-how-to-stop-it/
Load the content via AJAX.
Make the AJAX request dynamic so they can't just go and scrape the AJAX request.
Only sophisticated scrapers support execution of JavaScript.
Most scrapers don't run the pages through a real browser, so you can try to use that to your advantage.
I'm developing an MVC spring web app, and I would like to store the actions of my users (what they click on, etc.) in a database for offline analysis. Let's say an action is a tuple (long userId, long actionId, Date timestamp). I'm not specifically interested in the actions of my users, but I take this as an example.
I expect a lot of actions by a lot of (different) users per minute (even per second). Hence processing time is crucial.
In my current implementation, I've defined a datasource with a connection pool to store the actions in a database. I call a service from the request method of a controller, and this service calls a DAO which saves the action into the database.
This implementation is not efficient because it waits until the call from the controller all the way down to the database completes before returning the response to the user. Therefore I was thinking of wrapping this "action saving" into a thread, so that the response to the user is faster. The thread does not need to be finished for the user to get the response.
I have no experience with such massive, concurrent and time-critical applications, so any feedback/comments would be very helpful.
Now my questions are:
How would you design such system?
Would you implement a service and then wrap it into a thread called at every action?
What should I use?
I checked Spring Batch and its JobLauncher, but I'm not sure if it is the right thing for me.
What happens when there are concurrent accesses at the controller, the service, the DAO and the datasource level?
In more general terms, what are the best practices for designing such applications?
Thank you for your help!
Use a singleton object at the application level and update it with every user action.
This singleton should hold a HashMap, which should be flushed periodically, say after it reaches a threshold of 10,000 entries, and saved to the DB as a Spring Batch job.
Also, periodically clean it up to the last record processed after each flush. You could also re-initialize the singleton instance weekly or monthly. Remember that this can lead to consistency issues when your app is deployed across multiple JVMs, since each JVM gets its own instance. You also need to prevent the singleton from being cloned (e.g., by throwing CloneNotSupportedException).
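A rough sketch of such a buffering singleton (the Action type, the threshold, and the batch-save hook are assumptions):

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: buffer actions in memory and flush them to the DB in one batch.
// Caveat from above: with multiple JVMs, each instance keeps its own buffer.
public class ActionBuffer {

    private static final ActionBuffer INSTANCE = new ActionBuffer();
    private static final int THRESHOLD = 10_000;

    private final ConcurrentLinkedQueue<Action> buffer = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    private ActionBuffer() {}

    public static ActionBuffer getInstance() {
        return INSTANCE;
    }

    public void record(Action action) {
        buffer.add(action);
        if (size.incrementAndGet() >= THRESHOLD) {
            flush();
        }
    }

    private synchronized void flush() {
        List<Action> batch = new ArrayList<>();
        Action a;
        while ((a = buffer.poll()) != null) {
            batch.add(a);
            size.decrementAndGet();
        }
        saveBatch(batch); // e.g., JdbcTemplate.batchUpdate or a Spring Batch job
    }

    private void saveBatch(List<Action> batch) {
        // placeholder for the actual persistence code
    }

    public static class Action {
        final long userId;
        final long actionId;
        final Date timestamp;

        public Action(long userId, long actionId, Date timestamp) {
            this.userId = userId;
            this.actionId = actionId;
            this.timestamp = timestamp;
        }
    }
}
```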
Here's what I did for that:
Used AspectJ to mark all the actions of the user I wanted to collect.
Then I sent this to log4j with an asynchronous DB appender...
This lets you turn it on or off with the log4j logging level.
Works perfectly.
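The aspect side of that can be as small as this sketch (the pointcut and logger name are assumptions; the asynchronous DB appender itself is plain log4j 1.x configuration, e.g. an AsyncAppender wrapping a JDBCAppender):

```java
import org.apache.log4j.Logger;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.AfterReturning;
import org.aspectj.lang.annotation.Aspect;

// Sketch: intercept controller methods and hand the event to log4j.
// The AsyncAppender does the DB write off the request thread;
// setting the "user.actions" logger level to OFF disables it all.
@Aspect
public class UserActionAspect {

    private static final Logger ACTIONS = Logger.getLogger("user.actions");

    @AfterReturning("execution(* com.example.web..*Controller.*(..))") // hypothetical package
    public void logAction(JoinPoint jp) {
        ACTIONS.info(jp.getSignature().toShortString());
    }
}
```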
If you are interested in the actions your users take, you should be able to figure that out from the HTTP requests they send, so you might be better off logging the incoming requests in an Apache webserver that forwards to your application server. Putting a cluster of web servers in front of application servers is a typical practice (they're good for serving static content) and they are usually logging requests anyway. That way the logging will be fast, your application will not have to deal with it, and the biggest work will be writing a script to slurp the logs into a database where you can do analysis.
Typically it is considered bad form to spawn your own threads in a Java EE application.
A better approach would be to write to a local queue via JMS and then have a separate component, e.g., a message driven bean (pretty easy with EJB or Spring) which persists it to the database.
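A minimal Spring-flavored sketch of that approach (the queue name and payload type are assumptions):

```java
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;

import org.springframework.jms.core.JmsTemplate;

// Producer side: the controller/service just drops the action on a queue
// and returns immediately, so the user never waits for the DB.
public class ActionPublisher {

    private final JmsTemplate jmsTemplate;

    public ActionPublisher(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    public void publish(java.io.Serializable action) {
        jmsTemplate.convertAndSend("user.actions", action); // hypothetical queue name
    }
}

// Consumer side: wired to the queue, e.g. via a DefaultMessageListenerContainer;
// this is where the actual DAO/database work happens.
class ActionPersister implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            Object action = ((ObjectMessage) message).getObject();
            // save the action to the database here
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```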
Another approach would be to just write to a log file and then have a process read the log file and write to the database once a day or whenever.
The things to consider are:
How up-to-date do you need the information to be?
How critical is the information, can you lose some?
How reliable does the order need to be?
All of these will factor into how many threads you have processing your queue/log file, whether you need a persistent JMS queue and whether you should have the processing occur on a remote system to your main container.
Hope this answers your questions.
Consider the situation.
I am writing a statistical analysis app. The app has multiple tiers.
Frontend UI written for multiple device types: desktop, browser, mobile.
Mid-tier servlet that offers a so-called REST service to these frontends.
Backend that performs the extreme computation of the statistical processing, which in turn communicates with a backend database.
Because statistical analysis requires a huge amount of processing power, you would never dream of delegating such processing to the front-end.
The statistical analyses consist of procedures, or a series of workflow steps.
Some steps may require so much processing power that you would not want to repeat them.
If you have a workflow of 20 steps, you cannot execute step 20 without first executing step 19, which cannot be executed without first executing step 18, and so on and so forth.
There are observation points, such that, for example, the statistician must inspect the results of steps 3, 7, 9, 14 and 19 before telling the client-side to proceed to the next step.
Each of these steps is a so-called request to the REST service, telling the backend supercomputer to progressively set up the statistical model in memory.
There are many workflows. Some workflows may incidentally share step results, e.g., Flow[dry]:Step[7] may share Flow[wet]:Step[10]. Due to the amount of processing involved, we absolutely have to prevent repeating a step that might incidentally already have been accomplished by another flow.
Therefore, you can see that in the so-called REST service being designed, it is not possible that each request be independent of any previous request.
Therefore, how true can the following statement be?
All REST interactions are stateless. That is, each request contains all of the information necessary for a connector to understand the request, independent of any requests that may have preceded it.
Obviously, the application I described requires that each request depend on previous requests. There are three possibilities that I can see concerning this app.
My app does not comply with REST, since it cannot comply with stateless requests. It may use the JAX-RS framework, but using JAX-RS and all the trappings of REST does not make it REST, simply because it fails the statelessness criterion.
My app is badly designed - I should disregard trying to avoid the temporal and financial cost of stacking up a statistical model again, even if it took 5 - 15 minutes for a workflow. Just make sure there is no dependence on previous requests. Repeat costly steps when necessary.
The statelessness criterion is outdated. My understanding of REST is outdated/defective, in that the REST community has been constantly ignoring this criterion.
Is my app considered RESTful?
New Question: ISO 9000
Finally, in case my app is not considered completely RESTful, would all references to "REST" need to be omitted to pass ISO 9000 certification?
New edit:
REST-in-piece
OK, my colleague and I have discussed this and decided to call such an architecture/pattern REST-in-piece = REST in piecemeal stages.
ISTM you're reading too much into statelessness. A REST API supports traditional CRUD operations. The API for CouchDB is a good example of how DB state is updated by a series of stateless transactions.
Your task is to identify what the resources are and the "state transfers" between them. Each step in your workflow is a different state transfer, marked by a different URI. Each update/change to a resource has an accompanying POST/PATCH or an idempotent PUT or DELETE operation.
If you want to gain a better understanding of what it means to be RESTful and the reasons behind each design choice, I recommend spending an hour reading Chapter 5 of Roy Fielding's dissertation.
When making design choices, just think about what the principles of RESTful design are trying to accomplish. Set up your design so that queries are safe (don't change state) and are done in ways that are bookmarkable, cacheable, distributable, etc. Let each step in the workflow jump to a new state with a distinct URI so that a user can back up, branch out in different ways, etc. The whole idea is to create a scalable, flexible design.
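In JAX-RS terms that might look roughly like this sketch (the paths and helpers are invented): each step result is its own bookmarkable resource, GETs are safe and cacheable, and advancing the workflow answers with the URI of the next state.

```java
import java.net.URI;

import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

// Sketch: every workflow step gets a distinct, bookmarkable URI.
@Path("/flows/{flow}/steps")
public class WorkflowResource {

    // Safe, cacheable read of an already-computed step result.
    @GET
    @Path("/{step}")
    public Response result(@PathParam("flow") String flow, @PathParam("step") int step) {
        return Response.ok(loadResult(flow, step)).build();
    }

    // Advancing the workflow creates (or reuses) the next step's result.
    @POST
    @Path("/{step}")
    public Response advance(@PathParam("flow") String flow, @PathParam("step") int step) {
        // If another flow already computed an equivalent step, point the
        // client at the existing result instead of recomputing it.
        computeIfAbsent(flow, step + 1); // hypothetical memoized computation
        URI next = URI.create("/flows/" + flow + "/steps/" + (step + 1));
        return Response.seeOther(next).build();
    }

    private Object loadResult(String flow, int step) { return "result"; } // placeholder
    private void computeIfAbsent(String flow, int step) {}                // placeholder
}
```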
You are updating an in-memory model via a REST API. This means that you are maintaining state on the server between requests.
The REST-ful way of addressing this would be to make the client maintain the state: the server simply processes the request and returns all the information needed for constructing the next request in the response. The server then reconstructs the in-memory model from the information in the request and does its thing. That way, if you operate in, e.g., a clustered environment, any of the available servers would be able to handle the request.
Whether or not this is the most efficient way to do things depends on your application. There are loads of enterprise applications that use a server side session and elaborate load balancing to ensure that clients always use the same nodes in a cluster. So having server side state is an entirely valid design choice and there are plenty of ways of implementing this robustly. However, server side state generally complicates scaling out and REST in the purest sense is all about avoiding server side state and avoiding complexity.
A workaround/compromise is persisting the state in some kind of database or store. That way your nodes can fetch the state from disk before processing a request.
It all depends on what you need and what is acceptable for you. As the previous commenter mentioned, don't get too hung up on this whole statefulness thing. Clearly somebody will have to maintain state, and the question is merely where best to put that state and how to access it. There are a couple of tradeoffs that have to do with various what-if scenarios. For example, if the server crashes, do you want your client to re-run the entire set of requests to reconstruct the calculation, or do you prefer to simply resend the last request? I can imagine that you don't really need high availability here and don't mind the low risk that something occasionally goes wrong for your clients. In that case, having the state on the server side in memory is an acceptable solution.
Assuming your server holds the computation state in some hash map, a REST-ful way of passing the state around then could be simply sending back the key for the model in the response. That's a perfectly REST-ful API and you can change the implementation to persist the state or do something else without changing the API when needed. And this is the main point of being REST-ful: decouple the implementation details from the API. Your client doesn't need to know where you put the state or how you store it. All it needs is a resource representation of that state that can be manipulated.
Of course the key should be represented as a URI. I recommend you read Jim Webber's "REST in practice". It's a great introduction to designing REST-ful APIs.
I have to develop a web service that looks like this: I make a GET call, including a string in the URL, and I need to receive another string based on the initial string from the query.
I might have to make this call thousands of times a minute. Do you think that the server will be able to handle so much HTTP communication? Is an RPC approach better?
Any suggestion is welcomed, I am just starting to work on web services and I have no clue about the performance.
Thanks.
Thousands of calls per minute means only tens of calls per second, and I believe that modern computers can do far more, so I do not think that you will have serious performance limitations. But before you start, check how long it takes to deal with a single request. If it takes a while, I'd recommend you decouple the HTTP web front end from the business logic, i.e., process the request asynchronously. You can easily achieve this using JMS.
SOAP or REST? I personally prefer REST. It is simpler and faster. And it seems that you have only two String parameters, so SOAP does not give you any advantages.
IMHO, the main difference between SOAP and REST is that the former adds overhead (both processing and data) since its data has to follow a somewhat strict structure. REST is simpler and leaner because it doesn't require you to explicitly define a message format, leaving this task to the software that handles the message instead of the transport infrastructure.
So:
Do you want to enforce a message structure at the cost of additional overhead? Use SOAP.
Do you want a more lightweight option, at the cost of having senders and receivers piece the messages together into meaningful data? Use REST.
One of the key advantages of a REST web service is that its responses can be cached. In this way, the intermediate HTTP cache chain between your service and its clients bears a huge part of the total workload, so your web service can scale up. REST can be far more scalable than SOAP or RPC.
You may also want to check out Jersey at http://jersey.java.net/ as an alternative to Restlet.
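For example, making responses cacheable with JAX-RS (as implemented by Jersey) only takes a Cache-Control header; the path and max-age in this sketch are arbitrary:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.Response;

@Path("/translate")
public class TranslateResource {

    // Intermediate HTTP caches may serve repeats of the same lookup
    // for up to an hour, so the origin server never sees them.
    @GET
    @Path("/{input}")
    public Response translate(@PathParam("input") String input) {
        CacheControl cc = new CacheControl();
        cc.setMaxAge(3600); // seconds
        return Response.ok(lookup(input)).cacheControl(cc).build();
    }

    private String lookup(String input) {
        return "result-for-" + input; // placeholder for the real mapping
    }
}
```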