Why is URL rewriting in GAE discouraged? - java

I'm using Google App Engine with the Java runtime. I'm not too clear about the routing policy Google enforces, and web searches have only yielded misleading and contradictory results in my case.
According to Google documentation:
App Engine runs multiple instances of your application, each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. The number of instances can be adjusted automatically as traffic changes.
That's pretty clear, and it suggests that any URL rewriting I define in my development server will be applied after this higher-level routing, i.e. once a request actually reaches one of the many potentially available app server instances.
However, I then stumbled upon a thread arguing that URL rewriting with Tuckey's UrlRewriteFilter is troublesome because it cuts Google's content delivery network out of serving so-called static files.
So my questions are:
- Given that a request only gets to the server after being routed, what's the problem with rewriting the URL there? Server-level rewriting shouldn't have any impact on Google's top-level routing.
- Google says that static files are stored on separate servers that have nothing to do with the actual application, so you basically don't know where they are. Does this mean that when a request for one of these static files comes in, the resource is fetched from one of those dedicated servers, and that only other requests (including requests for missing resources and invalid URLs coming from pushState) actually reach my app's server instances?
If all this is true, I don't see any real performance risk in rewriting URLs at the server instance level.

There's nothing wrong with doing server-side rewrites.
Google serves static files (as configured in appengine-web.xml) through the Google Front End, which is a low-latency, edge-cached CDN, so you should always prefer to have it serve static content for you rather than generating it or serving it from your WAR. It's faster, cheaper, and doesn't have the drawbacks of instance scaling (i.e. users waiting for spin-up).
In your case (an index.html for Angular), you don't need a rewrite; you can map the same servlet/resource to all paths, e.g. /*.
Tuckey rewrite comes into play when web.xml path rules are not flexible enough to map your routes to servlets. But there's no problem using it if needed.
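For illustration, here is a minimal sketch of that catch-all mapping (assuming the Servlet 3.0+ @WebServlet annotation and an index.html at the WAR root; with web.xml you would declare the same /* url-pattern instead). Note that on App Engine the file must also be deployed as a resource file, not only as a static file, for getResourceAsStream to see it:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Catch-all servlet: any path that is not matched by a static-file rule
// is answered with index.html, so the client-side (Angular) router can
// interpret the URL itself.
@WebServlet("/*")
public class IndexServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html");
        // Stream index.html from the WAR; forwarding to /index.html would
        // re-enter this servlet because of the /* mapping.
        try (InputStream in = getServletContext().getResourceAsStream("/index.html");
             OutputStream out = resp.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}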

Related

Limit number of calls to RESTful service

We have a RESTful service deployed on multiple nodes, and we want to limit the number of calls coming to our service from each client, with a different quota for each client per minute.
Our stack: JBoss application server, Java/Spring RESTful service.
What could be a possible technique to implement this?
Some time ago I read a good article that highlighted the same theme.
The idea is to move this logic into a load-balancing proxy, and here are some good reasons to do it:
Eliminates technical debt - If you've got rate-limiting logic coupled in with app logic, you've got technical debt you don't need. You can lift and shift that debt to the proxy.
Efficiency gains - You're offloading logic upstream, which means all your compute resources are dedicated to compute, and capacity becomes easier to predict.
Security - It's well understood that application-layer (request-response) attacks are on the rise, including denial of service. By leveraging an upstream proxy with greater capacity for connections you can stop those attacks in their tracks, because they never get anywhere near the actual server.
If the only way to access your API is through a UI client that you manage, then you can add a check in the client code (JavaScript in the case of a web app) to make a call only when the user hasn't crossed the limit. Otherwise there is no client-side option, since a user can always access your API directly, and the only thing you can do at the server level is decide whether to send an error or a valid result as part of the API response.
To limit the rate, you need to keep state, at least keyed on some client identifier. This may require maintaining a central counter, e.g. in a db (Cassandra), that lets you look up the current request count per minute; then, within a Java servlet filter, you can reject requests as necessary.
Or, if you can track the client's session, you can use sticky sessions, forcing each client onto a specific node for the duration of the session; then you can simply count requests per client within a Java filter and send a 503 (or something more relevant) when the quota is exceeded. A sketch of that filter approach follows.
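Here is a minimal, hypothetical sketch of such a filter, using an in-memory fixed window per client (the X-Client-Id header is an assumption; with multiple nodes and no sticky sessions the counters would have to live in a shared store such as Cassandra instead):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Fixed-window rate limiter: at most LIMIT requests per client per minute.
public class RateLimitFilter implements Filter {
    private static final int LIMIT = 60; // could be looked up per client instead
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private volatile long windowStart = System.currentTimeMillis();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Reset all counters when the current one-minute window expires.
        long now = System.currentTimeMillis();
        if (now - windowStart >= 60_000) {
            synchronized (this) {
                if (now - windowStart >= 60_000) {
                    counts.clear();
                    windowStart = now;
                }
            }
        }

        // Identify the caller; the header name is an assumption for this sketch.
        String client = request.getHeader("X-Client-Id");
        if (client == null) {
            client = request.getRemoteAddr();
        }

        int n = counts.computeIfAbsent(client, k -> new AtomicInteger()).incrementAndGet();
        if (n > LIMIT) {
            response.sendError(429, "Rate limit exceeded"); // 429 Too Many Requests
            return;
        }
        chain.doFilter(req, res);
    }

    @Override
    public void init(FilterConfig config) {
    }

    @Override
    public void destroy() {
    }
}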

Server Side Caching

I have a standalone WildFly 9.0.2 and I want to cache on the server side the responses for certain requests.
Some of the requests are available for all users (visitors), others should be available only to authenticated users.
I do not understand from the documentation how to do this.
Can you point me to a tutorial or manual that implements this functionality?
I started WildFly using the default configuration for Infinispan that is found in standalone\configuration\standalone.xml.
Then I modified the response object to carry caching information in its headers, hoping it would work like JAX-RS, where the headers are checked and responses are cached automatically.
// Inside a servlet method with access to the HttpServletResponse:
long current = System.currentTimeMillis();
long expires = current + 86400000L; // now + 24 hours, in milliseconds
response.setHeader("Cache-Control", "no-transform, max-age=" + 86400 + ", public");
response.addDateHeader("Expires", expires);
response.addDateHeader("Last-Modified", current);
That unfortunately did not work on the server side (though it did work for my web client, which reads the cache header information properly and re-uses its local cache).
When I tried to view the Infinispan settings from the administration panel at http://127.0.0.1:9990, I get an exception and cannot proceed.
Thank you in advance for your help.
There is no standalone Java servlet server that does response caching the way you anticipated. The headers you set in the response will be interpreted by the browser (which does cache) or by intermediate proxies, which might also cache. Specialized caching proxies include Varnish and NGINX; these are also called edge proxies.
Crafting a library that enables a standalone server to cache the way you want seems possible; the normal request flow could be intercepted by a ServletFilter. But I don't know of any public library that does something like that.
If you want to cache inside the application, the normal thing to do is to use a caching library, such as EHCache, cache2k, Google Guava Cache, and others.
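As a quick illustration of the library approach, here is a sketch using Guava's cache (the key scheme and the renderPage method are stand-ins for whatever expensive work produces your response bodies):

import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class PageCache {
    private final LoadingCache<String, String> pages = CacheBuilder.newBuilder()
            .maximumSize(1_000)                     // bound the heap footprint
            .expireAfterWrite(10, TimeUnit.MINUTES) // drop stale entries
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return renderPage(key);
                }
            });

    public String get(String path) {
        // Computes the value on a miss, returns the cached copy on a hit.
        return pages.getUnchecked(path);
    }

    private String renderPage(String path) {
        // Stand-in for whatever expensive work produces the response body.
        return "<html>rendered for " + path + "</html>";
    }
}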
In your specific case I would recommend that you become familiar with a caching proxy server like NGINX and put it in front of your application. That is, let's say, the "industry standard". It is generally undesirable to do HTTP response caching inside the Java server, for a couple of reasons:
In case of a cache hit, the response from the proxy is faster and the Java server is not hit
You can scale, by putting more caching proxies in front of your application
The Java heap is not a good fit for caching a large amount of data. Where should it go? There are caches that overflow to disk, but this needs a complex setup, just as a caching proxy in front of your application does.
For debugging and transparency it is better that the server generates a fresh answer when a request is sent to it
I always recommend doing caching inside the application, too; however, we do it at the Java object level. The cache size is limited, so the heap stays small. Many cached objects inside the application are used for many different responses, so object caching operates at a finer level than HTTP response caching.
Only in some special cases do we do something similar to HTTP response caching inside the application, too; it is used to compress or recompress some images and CSS resources that are requested very often. There is some potential for this to become generally useful; maybe we will open source it.
Hope that helps.

Using nginx server as proxy along glassfish server for static content loading

I have been working on a Java EE RESTful application whose front end is based on React.js. I was looking for a good way to have static content, and some file uploads, handled by an nginx server. I have heard nginx is good at serving static content, but I am new to the nginx server environment, so: what are the use cases and best practices for using an nginx server alongside a GlassFish server?
You are heading in the right direction: you can use a web server, e.g. nginx, for serving static content, like files or the static parts of your web pages. And you should use one in a production environment anyway, for several reasons.
First, if you have reasonable traffic, this shifts part of the load to another machine (as long as you have several machines at hand). This is good not only for big static content, e.g. serving files, but also for the many small parts. Consider, for example, a CSS class that styles a button with an image pointing to a resource within your deployed application: your GlassFish will have to serve it along with your other dynamic web content. If, on the other hand, it comes from a static URL, it can be handled by your web server and, thanks to that static URL, cached and served directly from there without being generated over and over again.
And then, apart from performance, your web server allows you to handle security issues before a request ever reaches your application server. You can, for example, decide based on the URL of your REST services which node should handle the request and which security rules to apply, for example whether an SSL certificate must be provided.
But all in all, it depends very much on your application and environment. It might not be necessary at all to build all this if it's OK for your purposes to let GlassFish handle everything. If you do go the nginx route, a minimal sketch follows.
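This is a hypothetical nginx sketch (the paths, ports, and the /static/ prefix are assumptions; adapt them to your deployment) that serves static files itself and proxies everything else to GlassFish:

# Serve static assets straight from disk; proxy everything else to GlassFish.
server {
    listen 80;
    server_name example.com;

    location /static/ {
        alias /var/www/myapp/static/;  # assumed path to the built front-end assets
        expires 30d;                   # let browsers and proxies cache them
    }

    location / {
        proxy_pass http://127.0.0.1:8080;  # GlassFish default HTTP listener
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}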

How to route subdomains to one or more appropriate nodes within a cluster?

I am trying to solve a distributed computing architecture problem. Here is the scenario.
Users come to my website and register. As part of the registration process they get a subdomain. For example, foo.xyz.com.
Now each user's website is located/replicated on one or more cluster nodes using some arbitrary scheme.
When a user request comes in (an HTTP request via the browser), the appropriate subdomain must be directed to the matching cluster node. Essentially, I want my own dynamic domain name resolution, and I need to implement it in a fast and efficient way.
I have a Java-based web application which runs inside a Jetty 7 container.
thanks,
NG
This should definitely be implemented outside of your application. Your web application should be, as much as possible, agnostic about the way requests get balanced in the cluster. The best performance would come from a hardware load balancer (this one, for example).
If you want to go for software-based balancing, I would configure Apache to serve as the entry point and balance the traffic for your cluster with something like mod_proxy. See this tutorial that refers to Jetty.
Have you taken a look at nginx? It may be more than you need, but it does an effective job of routing subdomains to particular nodes.
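As a hypothetical sketch of the nginx approach (the upstream addresses and the subdomain-to-node mapping below are made up), a map block can pick the backend per subdomain:

# In the http block: one upstream per cluster node.
upstream node1 { server 10.0.0.11:8080; }
upstream node2 { server 10.0.0.12:8080; }

# Pick a backend based on the requested subdomain.
map $host $tenant_backend {
    default      node1;
    foo.xyz.com  node1;
    bar.xyz.com  node2;
}

server {
    listen 80;
    server_name *.xyz.com;

    location / {
        proxy_pass http://$tenant_backend;
        proxy_set_header Host $host;
    }
}

The mapping here is static; with many registered users you would typically generate it from the registration database, or resolve the node in a small lookup service instead.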

what does it mean when they say http is stateless

I am studying Java for the web, and it mentions that HTTP is stateless.
What does that mean, and how does it affect programming?
I was also studying the Spring framework, and it mentions that some beans have to be declared as inner beans because their state changes. What does that mean?
HTTP -- that is the actual transport protocol between the server and the client -- is "stateless" because it remembers nothing between invocations. EVERY resource that is accessed via HTTP is a single request with no threaded connection between them. If you load a web page with an HTML file that within it contains three <img> tags hitting the same server, there will be four TCP connections negotiated and opened, four data transfers, four connections closed. There is simply no state kept at the server at the protocol level that will have the server know anything about you as you come in.
(Well, that's true for HTTP up to 1.0 at any rate. HTTP 1.1 adds persistent connection mechanisms of various sorts because of the inevitable performance problems that a truly stateless protocol engenders. We'll overlook this for the moment because they don't really make HTTP stateful, they just make it dirty-stateless instead of pure-stateless.)
To help you understand the difference, imagine that a protocol like Telnet or SSH were stateless. If you wanted to get a directory listing of a remote file, you would have to, as one atomic operation, connect, sign in, change to the directory and issue the ls command. When the ls command finished displaying the directory contents, the connection would close. If you then wanted to display the contents of a specific file you would have to again connect, sign in, change to the directory and now issue the cat command. When the command displaying the file finished, the connection would again close.
When you look at it that way, through the lens of Telnet/SSH, that sounds pretty stupid, doesn't it? Well, in some ways it is and in some ways it isn't. When a protocol is stateless, the server can do some pretty good optimizations and the data can be spread around easily. Servers using stateless protocols can scale very effectively, so while the individual data transfers can be very slow (opening and closing TCP connections is NOT cheap!), the overall system can be very, very efficient and can scale to any number of users.
But...
Almost anything you want to do other than viewing static web pages will involve sessions and states. When HTTP is used for its original purpose (sharing static information like scientific papers) the stateless protocol makes a lot of sense. When you start using it for things like web applications, online stores, etc. then statelessness starts to be a bother because these are inherently stateful activities. As a result people very rapidly came up with ways to slather state on top of the stateless protocol. These mechanisms have included things like cookies, like encoding state in the URLs and having the server dynamically fire up data based on those, like hidden state requests, like ... well, like a whole bunch of things up to and including the more modern things like Web Sockets.
Here are a few links you can follow to get a deeper understanding of the concepts:
http://en.wikipedia.org/wiki/Stateless_server
http://en.wikipedia.org/wiki/HTTP
http://en.wikipedia.org/wiki/HTTP_persistent_connection
HTTP is stateless - this means that when using HTTP the end point does not "remember" things (such as who you are). It has no state. This is in contrast to a desktop application - if you have a form and you go to a different form, then go back, the state has been retained (so long as you haven't shut down the application).
Normally, in order to maintain state in a web application, one uses cookies.
A stateless protocol does not require the server to retain information or status about each user for the duration of multiple requests. For example, when a web server is required to customize the content of a web page for a user, the web application may have to track the user's progress from page to page.
A common solution is the use of HTTP cookies. Other methods include server side sessions, hidden variables (when the current page is a form), and URL-rewriting using URI-encoded parameters, e.g., /index.php?session_id=some_unique_session_code.
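As a small Java illustration of the server-side-session approach (the attribute name is made up), servlet containers combine a cookie with server-side state in the HttpSession API:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class CounterServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // The container sets a JSESSIONID cookie; on later requests it uses
        // that cookie to find this user's server-side session state again.
        HttpSession session = req.getSession(true);
        Integer visits = (Integer) session.getAttribute("visits");
        visits = (visits == null) ? 1 : visits + 1;
        session.setAttribute("visits", visits);
        resp.setContentType("text/plain");
        resp.getWriter().println("You have loaded this page " + visits + " times.");
    }
}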
HTTP is called a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it.
This shortcoming of HTTP is being addressed in a number of new technologies, including cookies.
When it's said that something is stateless it usually means that you can't assume that the server tracks any state between interactions.
By default the HTTP protocol assumes a truly stateless server. Every request is treated as an independent request.
In practice this is fixed by some servers (most of them) using a tracking cookie in the request to match some state on the server with a specific client. This works because of the way cookies work (they are sent to the server on each subsequent request once they have been set on the client).
Basically, a server that isn't stateless is an impediment to scaling. You need to either make sure that you route all the requests from a specific browser to the same instance or do backend replication of the state. This is usually a limiting factor when trying to scale an application.
There are some other solutions for keeping track of state (see Rails's encrypted state cookie), but basically, if you want to grow, you need to figure out a way to avoid tracking state on the server :).
