Server Side Caching - Java

I have a standalone WildFly 9.0.2 and I want to cache on the server side the responses for certain requests.
Some of the requests are available for all users (visitors), others should be available only to authenticated users.
I do not understand from the documentation how to do this.
Can you point me to a tutorial or manual that implements this functionality?
I started WildFly using the default configuration for Infinispan found in standalone\configuration\standalone.xml.
Then I modified the response object to include caching information in the headers, hoping it would work like JAX-RS, where the headers are checked and responses are cached automatically.
// Inside the servlet's doGet(), with the HttpServletResponse in scope:
long current = System.currentTimeMillis();
long expires = current + 86400000L; // now + 24 hours, in milliseconds
response.setHeader("Cache-Control", "no-transform, max-age=" + 86400 + ", public");
response.addDateHeader("Expires", expires);
response.addDateHeader("Last-Modified", current);
That unfortunately did not work on the server side (though it did work for my web application, which correctly reads the cache headers and re-uses its local cache).
When I tried to view the Infinispan settings from the administration panel at http://127.0.0.1:9990, I get an exception and cannot proceed.
Thank you in advance for your help.

There is no standalone Java servlet server that does response caching the way you anticipated. The headers you set in the response will be interpreted by the browser (which does cache) or by intermediate proxies, which might cache as well. Specialized caching proxies include Varnish and NGINX; these are also called edge proxies.
Crafting a library that lets a standalone server cache the way you want seems possible; the normal request flow could be intercepted by a ServletFilter. I don't know of any public library that does something like that.
If you want to cache inside the application, the normal thing to do is to use a caching library, like EHCache, cache2k, Google Guava Cache and others.
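To make the idea concrete, here is a minimal in-heap cache sketch using only the JDK; it shows the core mechanism (a bounded map with least-recently-used eviction) that those libraries build on, not a replacement for them:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap's access-order mode.
// Real libraries (EHCache, cache2k, Guava Cache) add TTLs, statistics
// and thread safety on top of this basic bounded-map idea.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }
}
```

For concurrent access you would wrap it with Collections.synchronizedMap, or reach for one of the real caching libraries above.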
In your specific case I would recommend that you become familiar with a caching proxy server like NGINX and put it in front of your application. That is, let's say, the "industry standard". Doing HTTP response caching inside the Java server is undesirable for a couple of reasons:
- In case of a cache hit, the response from the proxy is faster and the Java server is not hit.
- You can scale by putting more caching proxies in front of your application.
- The Java heap is not a good fit for caching a large amount of data. Where should it go? There are caches that can overflow to disk, but that needs a complex setup, just as a caching proxy in front of your application does.
- For debugging and transparency it is better that the server generates a fresh answer for each request it receives.
I always recommend doing caching inside the application, too. However, we do it at the Java object level. The cache size is limited, so the heap stays small. Many cached objects inside the application are used for many different responses, so object caching operates at a finer level than HTTP response caching.
Only in some special cases do we do something similar to HTTP response caching inside the application, too. We use it to compress or recompress some images and CSS resources that are requested very often. There is some potential here for a generally useful tool; maybe we will open-source it.
Hope that helps.

Related

Why is URL rewriting in GAE discouraged?

I'm using Google App Engine with Java runtime. I'm not too clear about the routing policy Google is enforcing. And web searches have only yielded misleading and discordant results in my case.
According to Google documentation:
App Engine runs multiple instances of your application, each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. The number of instances can be adjusted automatically as traffic changes.
That's pretty clear, and it lets me think that the URL rewriting I define in my development server will be applied after this higher-level routing, i.e. when a request actually reaches one of the many potentially available app server instances.
However, I then stumbled upon a thread in which it's argued that URL rewriting with Tuckey's UrlRewrite filter is troublesome in that it cuts out Google's content delivery network for so-called static files.
So my questions are:
- Given that the request gets to the server after being routed, what's the problem with rewriting the URL? Server-level rewriting shouldn't have any impact on Google's top-level routing.
- Google says that static files are stored on different servers that have nothing to do with the real application, so you basically don't know where they are. Does this mean that if a request for one of these static files comes in, the actual resource is fetched from one of those reserved servers, and only other requests (including missing resources and invalid URLs coming from pushState) actually reach my app's server instances?
If all this were true, I wouldn't see any real performance risk in rewriting URLs at the server instance level.
There's nothing wrong with doing server side rewrites.
Google serves static files (as configured in appengine-web.xml) using the Google Front End, which is a low-latency, edge-cached CDN, so you should always prefer to have it serve static content for you rather than generating it or serving it from your WAR. It's faster, cheaper and doesn't have the drawbacks of instance scaling (i.e. users waiting for spin-up).
In your case (an index.html for angular), you don't need a rewrite, you can map the same servlet/resource to all paths, e.g. /*
Tuckey's rewrite filter comes into play when the possible web.xml path rules are not flexible enough to map your routes to servlets. But there's no problem using it if needed.

What is the best approach to build a system with high amount of data communication?

Hello
I have a cache server (written with Java and the Lucene framework) which keeps a large amount of data and serves it according to the request query.
It basically works like this:
On startup, it connects to the DB and stores all tables in RAM.
It listens for requests and returns the proper data as array lists (about 1000 to 20000 rows).
When a user visits the web page, it connects to the cache server, sends a request, and shows the server's response.
I planned to run the web and cache applications in different instances because of memory issues. The cache server runs as a service and the web app on Tomcat.
What is your suggestion about how the communication should be built between web side and cache server ?
I need to pass a large amount of data as array lists from one instance to another. Should I consider web services (XML communication), NIO socket communication (maybe Apache MINA), or solutions like CORBA?
Thanks.
It really depends very much on considerations you have not specified.
What are the clients? For example, if your clients are JavaScript making AJAX calls, obviously something over HTTP is more useful than a proprietary UDP solution.
What network is it running on? Local networks behave differently from the internet, and mobile internet is quite different from both.
How elaborate a use can you make of caching? If you use HTTP you can have rather good control (through HTTP headers) of both the client cache and network caches, and there is a plethora of existing software that can make use of both.
There are many other considerations to be taken into account, and there are many existing implementations of systems matching the more common needs. From the (not very detailed) description you gave, I would recommend having a look at Redis.
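Whichever transport you choose (XML web services, MINA sockets, CORBA), a large part of the cost is serializing those 1000 to 20000-row lists. Here is a rough, self-contained sketch of that cost using stock Java serialization; the class name and row format are made up for illustration:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: measure what actually crosses the wire when an ArrayList is
// shipped between JVMs with stock Java serialization. Any transport
// (HTTP/XML, MINA sockets, CORBA) pays some variant of this cost.
public class PayloadSizer {

    public static byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            rows.add("row-" + i + ",some,column,data");
        }
        byte[] wire = serialize(rows);
        List<String> back = deserialize(wire);
        System.out.println(wire.length + " bytes for " + back.size() + " rows");
    }
}
```

Running this for your real row objects gives you a concrete payload size to weigh the transport options against.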

what does it mean when they say http is stateless

I am studying Java for the web and it mentions that HTTP is stateless.
What does that mean and how does it affect programming?
I was also studying the Spring framework, and it mentions that some beans have to be declared as inner beans because their state changes. What does that mean?
HTTP -- that is the actual transport protocol between the server and the client -- is "stateless" because it remembers nothing between invocations. EVERY resource that is accessed via HTTP is a single request with no threaded connection between them. If you load a web page with an HTML file that within it contains three <img> tags hitting the same server, there will be four TCP connections negotiated and opened, four data transfers, four connections closed. There is simply no state kept at the server at the protocol level that will have the server know anything about you as you come in.
(Well, that's true for HTTP up to 1.0 at any rate. HTTP 1.1 adds persistent connection mechanisms of various sorts because of the inevitable performance problems that a truly stateless protocol engenders. We'll overlook this for the moment because they don't really make HTTP stateful, they just make it dirty-stateless instead of pure-stateless.)
To help you understand the difference, imagine that a protocol like Telnet or SSH were stateless. If you wanted to get a directory listing of a remote file, you would have to, as one atomic operation, connect, sign in, change to the directory and issue the ls command. When the ls command finished displaying the directory contents, the connection would close. If you then wanted to display the contents of a specific file you would have to again connect, sign in, change to the directory and now issue the cat command. When the command displaying the file finished, the connection would again close.
When you look at it that way, through the lens of Telnet/SSH, that sounds pretty stupid, doesn't it? Well, in some ways it is and in some ways it isn't. When a protocol is stateless, the server can do some pretty good optimizations and the data can be spread around easily. Servers using stateless protocols can scale very effectively, so while the individual data transfers can be very slow (opening and closing TCP connections is NOT cheap!), an overall system can be very, very efficient and can scale to any number of users.
But...
Almost anything you want to do other than viewing static web pages will involve sessions and states. When HTTP is used for its original purpose (sharing static information like scientific papers) the stateless protocol makes a lot of sense. When you start using it for things like web applications, online stores, etc. then statelessness starts to be a bother because these are inherently stateful activities. As a result people very rapidly came up with ways to slather state on top of the stateless protocol. These mechanisms have included things like cookies, like encoding state in the URLs and having the server dynamically fire up data based on those, like hidden state requests, like ... well, like a whole bunch of things up to and including the more modern things like Web Sockets.
Here are a few links you can follow to get a deeper understanding of the concepts:
http://en.wikipedia.org/wiki/Stateless_server
http://en.wikipedia.org/wiki/HTTP
http://en.wikipedia.org/wiki/HTTP_persistent_connection
HTTP is stateless - this means that when using HTTP the end point does not "remember" things (such as who you are). It has no state. This is in contrast to a desktop application - if you have a form and you go to a different form, then go back, the state has been retained (so long as you haven't shut down the application).
Normally, in order to maintain state in web application, one uses cookies.
A stateless protocol does not require the server to retain information or status about each user for the duration of multiple requests. For example, when a web server is required to customize the content of a web page for a user, the web application may have to track the user's progress from page to page.
A common solution is the use of HTTP cookies. Other methods include server side sessions, hidden variables (when the current page is a form), and URL-rewriting using URI-encoded parameters, e.g., /index.php?session_id=some_unique_session_code.
HTTP is called a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it.
This shortcoming of HTTP is being addressed in a number of new technologies, including cookies.
When it's said that something is stateless it usually means that you can't assume that the server tracks any state between interactions.
By default the HTTP protocol assumes a truly stateless server. Every request is treated as an independent request.
In practice this is fixed by most servers using a tracking cookie in the request to match some state on the server with a specific client. This works because of the way cookies work: they are sent to the server on each subsequent request once they have been set on the client.
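That set-then-replay cycle can be seen in miniature with the JDK's own java.net.CookieManager, the standard library's client-side implementation of exactly this bookkeeping (the URI and session id below are made-up examples):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Sketch of the cookie round trip that layers state on top of stateless
// HTTP: the server sets a session id via Set-Cookie, and the client
// replays it in a Cookie header on every later request to that site.
public class CookieRoundTrip {

    // Stores a Set-Cookie response header and returns the Cookie request
    // header the manager would attach to the next request to that URL.
    static String replay(String setCookie, String url) {
        try {
            CookieManager jar = new CookieManager();
            URI site = URI.create(url);
            jar.put(site, Map.of("Set-Cookie", List.of(setCookie)));
            return jar.get(site, Map.of()).get("Cookie").get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The server sets the id once; the client replays it afterwards.
        System.out.println(replay("JSESSIONID=abc123", "http://example.com/shop"));
    }
}
```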
Basically, a server that isn't stateless is an impediment to scaling. You need either to make sure that you route all requests from a specific browser to the same instance, or to do backend replication of the state. This is usually a limiting factor when trying to scale an application.
There are some other solutions for keeping track of state (see Rails's encrypted state cookie), but basically if you want to grow you need to figure out a way to avoid tracking state on the server :).
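The idea behind such state cookies can be sketched with nothing but the JDK: sign the state with a server-held secret so the client can carry it but cannot forge it. This is a minimal illustration, not Rails's actual scheme; the secret and state format are placeholders:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Sketch of a tamper-evident client-side session: the server keeps no
// state, but signs what the client carries so it cannot be forged.
public class SignedState {
    // Placeholder secret: a real deployment would load this from secure config.
    private static final byte[] SECRET =
            "change-me-server-side-secret".getBytes(StandardCharsets.UTF_8);

    private static String hmac(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
            return Base64.getUrlEncoder().encodeToString(
                    mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encode state into a token the client may hold, e.g. in a cookie. */
    public static String seal(String state) {
        return Base64.getUrlEncoder().encodeToString(
                state.getBytes(StandardCharsets.UTF_8)) + "." + hmac(state);
    }

    /** Returns the state if the signature checks out, null if tampered. */
    public static String unseal(String token) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return null;
        String state = new String(
                Base64.getUrlDecoder().decode(token.substring(0, dot)),
                StandardCharsets.UTF_8);
        return hmac(state).equals(token.substring(dot + 1)) ? state : null;
    }
}
```

Note this is signing, not encryption: the client can still read the state, it just cannot alter it. Real code would also use a constant-time comparison (MessageDigest.isEqual) for the signature check.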

Java Tomcat enable caching

What is the best way to set up caching on tomcat?
Also, how does caching work? Is it URL-based, directory-based, or something else?
I need URL-specific caching so the database doesn't repeat the same calculations for the same URLs.
The simplest way is to use a dedicated web cache provider. Tomcat does not have one OOTB, but you could employ Apache with mod_cache (and obviously mod_jk). In this configuration, Apache acts as a caching proxy for the dynamic content served by Tomcat; you will have to ensure that Tomcat sends the right headers so that Apache will cache the responses.
There are other commercial web-cache solutions, but they're typically reserved for high-end uses.
You could also employ Squid instead of Apache, to act as a reverse proxy that is also capable of serving cached content; in this case, Squid performs caching of the dynamic content.
If you do not wish to invest in an additional server, like the above solutions suggest, you might consider using EHCache to perform web page caching on Tomcat itself.
Related
Java Web Application: How to implement caching techniques?
Tomcat doesn't support what you want out of the box, so you'll need some extra stuff. I'm not fully aware of all of mod_cache's capabilities, but if you're not using Apache, OSCache can do what you're asking for.
Can you use the Expires Filter?
http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Expires_Filter
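Based on the Tomcat documentation linked above, a web.xml snippet for the ExpiresFilter looks roughly like this (the concrete durations are just examples, not recommendations):

```xml
<filter>
  <filter-name>ExpiresFilter</filter-name>
  <filter-class>org.apache.catalina.filters.ExpiresFilter</filter-class>
  <init-param>
    <param-name>ExpiresByType image</param-name>
    <param-value>access plus 10 minutes</param-value>
  </init-param>
  <init-param>
    <param-name>ExpiresDefault</param-name>
    <param-value>access plus 0 seconds</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>ExpiresFilter</filter-name>
  <url-pattern>/*</url-pattern>
  <dispatcher>REQUEST</dispatcher>
</filter-mapping>
```

Note this only sets caching headers for downstream caches (browsers and proxies); it does not itself cache responses on the server.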
See also this question:
Enable caching in Tomcat 6?

Scalable http session management (java, linux)

Is there a best-practice for scalable http session management?
Problem space:
A shopping-cart kind of use case: the user shops around the site, eventually checking out; the session must be preserved.
Multiple data centers
Multiple web servers in each data center
Java, linux
I know there are tons of ways of doing this, and I can always come up with my own specific solution, but I was wondering whether Stack Overflow's wisdom of the crowd can help me focus on best practices.
In general there seem to be a few approaches:
- Don't keep sessions; always run stateless, religiously [doesn't work for me...]
- Use Java EE, EJB and the rest of that gang
- Use a database to store sessions. I suppose there are tools to make that easier so I don't have to craft it all myself
- Use memcached for storing sessions (or another kind of intermediate, semi-persistent storage)
- Use a key-value DB: "more persistent" than memcached
- Use "client-side sessions", meaning all session info lives in hidden form fields and is passed back and forth between client and server. Nothing is stored on the server.
Any suggestions?
Thanks
I would go with some standard distributed cache solution.
It could be provided by your application server, could be memcached, could be Terracotta.
Probably doesn't matter too much which one you choose, as long as you are using something sufficiently popular (so you know most of the bugs are already hunted down).
As for your other ideas:
- Don't keep sessions - as you said, not possible
- Client-side sessions - too insecure - suppose someone hacks the cookie to put discount prices in the shopping cart
- Use a database - databases are usually the hardest bottleneck to solve; don't put any more there than you absolutely have to.
Those are my 2 cents :)
Regarding multiple data centers - you will want to have some affinity of the session to the data center it started on. I don't think there are any solutions for distributed cache that can work between different data centers.
You seem to have missed out vanilla replicated http sessions from your list. Any servlet container worth its salt supports replication of sessions across the cluster. As long as the items you put into the session aren't huge, and are serializable, then it's very easy to make it work.
http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html
edit: It seems, however, that Tomcat session replication doesn't scale well to large clusters. For that, I would suggest using JBoss+Tomcat, which offers "buddy replication":
http://www.jboss.org/community/wiki/BuddyReplicationandSessionData
I personally haven't managed such clusters, but when I took a J2EE course at university the lecturer said to store sessions in a database and not to try to cache them. (You can't meaningfully cache dynamic pages anyway.) HTTP sessions are client-side by definition, as the session id is a cookie. If the client refuses to store cookies (e.g. because he's paranoid about tracking), then he can't have a session.
You can get this id by calling HttpSession.getId().
Of course database is a bottleneck, so you'll end up with two clusters: an application server cluster and a database cluster.
As far as I know, both stateful session beans and regular servlet HTTP sessions exist only in memory, without load balancing built in.
Btw., I wouldn't store an e-mail address or usernames in a hidden field, but maybe the contents of the cart aren't that sensitive.
I would rather move away from storing user application state in an HTTP session, but that would require a different way of thinking about how the application works, and a RESTful stateless architecture. This normally involves dropping support for earlier browser versions that cannot support MVVM-style architectures on the client side.
The shopping cart isn't user application state; it is application state, which means it would be stored in a database and managed as such. There can be an association table linking the user to one or many shopping carts, assuming sharing carts is possible.
Your biggest hurdle would likely be how to authenticate the user for every request if it is stateless. BASIC auth is the simplest approach that does not involve sessions, FORM-auth will require sessions regardless. A JASPIC implementation (like HTTP Headers or OAuth) will be able to mitigate your authentication concerns elsewhere, in which case a cookie can be used to manage your authentication token (like FORM-auth) or HTTP header like SiteMinder or Client Side Certificates with Apache.
The more expensive databases like DB2 have High Availability and Disaster Recovery features that work across multiple data centers. Note that it is not meant for load balancing the database, since there'd be a large impact due to network traffic.
