what does it mean when they say http is stateless

I am studying Java for the web, and the material mentions that HTTP is stateless.
What does that mean, and how does it affect programming?
I was also studying the Spring framework, and it mentions that some beans have to be declared as inner beans because their state changes. What does that mean?

HTTP -- that is the actual transport protocol between the server and the client -- is "stateless" because it remembers nothing between invocations. EVERY resource that is accessed via HTTP is a single request with no threaded connection between them. If you load a web page with an HTML file that within it contains three <img> tags hitting the same server, there will be four TCP connections negotiated and opened, four data transfers, four connections closed. There is simply no state kept at the server at the protocol level that will have the server know anything about you as you come in.
(Well, that's true for HTTP up to 1.0 at any rate. HTTP 1.1 adds persistent connection mechanisms of various sorts because of the inevitable performance problems that a truly stateless protocol engenders. We'll overlook this for the moment because they don't really make HTTP stateful, they just make it dirty-stateless instead of pure-stateless.)
To help you understand the difference, imagine that a protocol like Telnet or SSH were stateless. If you wanted to get a listing of a remote directory, you would have to, as one atomic operation, connect, sign in, change to the directory and issue the ls command. When the ls command finished displaying the directory contents, the connection would close. If you then wanted to display the contents of a specific file you would have to again connect, sign in, change to the directory and now issue the cat command. When the command displaying the file finished, the connection would again close.
When you look at it that way, through the lens of Telnet/SSH, that sounds pretty stupid, doesn't it? Well, in some ways it is and in some ways it isn't. When a protocol is stateless, the server can do some pretty good optimizations and the data can be spread around easily. Servers using stateless protocols can scale very effectively, so while the actual individual data transfers can be very slow (opening and closing TCP connections is NOT cheap!) an overall system can be very, very efficient and can scale to a very large number of users.
But...
Almost anything you want to do other than viewing static web pages will involve sessions and states. When HTTP is used for its original purpose (sharing static information like scientific papers) the stateless protocol makes a lot of sense. When you start using it for things like web applications, online stores, etc. then statelessness starts to be a bother because these are inherently stateful activities. As a result people very rapidly came up with ways to slather state on top of the stateless protocol. These mechanisms have included things like cookies, like encoding state in the URLs and having the server dynamically fire up data based on those, like hidden state requests, like ... well, like a whole bunch of things up to and including the more modern things like Web Sockets.
Here are a few links you can follow to get a deeper understanding of the concepts:
http://en.wikipedia.org/wiki/Stateless_server
http://en.wikipedia.org/wiki/HTTP
http://en.wikipedia.org/wiki/HTTP_persistent_connection

HTTP is stateless - this means that when using HTTP the end point does not "remember" things (such as who you are). It has no state. This is in contrast to a desktop application - if you have a form and you go to a different form, then go back, the state has been retained (so long as you haven't shut down the application).
Normally, in order to maintain state in a web application, one uses cookies.

A stateless protocol does not require the server to retain information or status about each user for the duration of multiple requests. For example, when a web server is required to customize the content of a web page for a user, the web application may have to track the user's progress from page to page.
A common solution is the use of HTTP cookies. Other methods include server side sessions, hidden variables (when the current page is a form), and URL-rewriting using URI-encoded parameters, e.g., /index.php?session_id=some_unique_session_code.
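As a rough illustration of the cookie plus server-side session approach described above, here is a minimal sketch in plain Java. It is not a servlet-container API: the class `SessionStore` and its methods `createSession` and `lookup` are hypothetical names invented for this example, and the cookie handling itself is only described in comments.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory session store (illustration only).
class SessionStore {
    // session ID -> attributes kept on the server for that client
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // First request from a client: create server-side state and return the ID
    // that would be handed to the browser, e.g. in a Set-Cookie header.
    String createSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Later requests: the browser sends the ID back in its Cookie header and
    // the server uses it to find the matching state. Returns null if unknown.
    Map<String, Object> lookup(String sessionId) {
        return sessions.get(sessionId);
    }
}

class SessionDemo {
    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        String id = store.createSession();                      // request 1
        store.lookup(id).put("cartItems", 3);                   // request 2
        System.out.println(store.lookup(id).get("cartItems"));  // request 3
        System.out.println(store.lookup("unknown-id"));         // no such session
    }
}
```

The protocol itself stays stateless here: every request has to carry the ID, and all of the "memory" lives in the server's map.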

HTTP is called a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it.
This shortcoming of HTTP is being addressed in a number of new technologies, including cookies.

When it's said that something is stateless it usually means that you can't assume that the server tracks any state between interactions.
By default the HTTP protocol assumes a truly stateless server. Every request is treated as an independent request.
In practice most servers work around this by using a tracking cookie in the request to match some state on the server with a specific client. This works because of the way cookies work (once they have been set on the client, they are sent to the server on each subsequent request).
Basically a server that isn't stateless is an impediment to scale. You need to either make sure that you route all the requests from a specific browser to the same instance or to do backend replication of the states. This usually is a limiting factor when trying to scale an application.
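The "route all the requests from a specific browser to the same instance" option can be sketched roughly as follows. This is only an illustration under simple assumptions: `StickyRouter` and `instanceFor` are hypothetical names, and real load balancers typically offer their own session-affinity mechanisms.

```java
// Hypothetical sticky-routing helper: the same session ID always maps to the
// same instance index, as long as the number of instances does not change.
class StickyRouter {
    private final int instanceCount;

    StickyRouter(int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("need at least one instance");
        }
        this.instanceCount = instanceCount;
    }

    int instanceFor(String sessionId) {
        // floorMod keeps the result in [0, instanceCount) even when
        // hashCode() is negative.
        return Math.floorMod(sessionId.hashCode(), instanceCount);
    }
}

class StickyRouterDemo {
    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(4);
        String sessionId = "example-session-id"; // made-up value
        // Both calls print the same index, because the mapping is deterministic.
        System.out.println(router.instanceFor(sessionId));
        System.out.println(router.instanceFor(sessionId));
    }
}
```

The limitation is visible in the sketch: adding or removing an instance changes the mapping for many clients, and their server-side state is then on the wrong machine. That is one reason stateful servers are harder to scale.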
There are some other solutions for keeping track of state (see Rails's encrypted state cookie), but basically, if you want to grow, you need to figure out a way to avoid tracking state on the server :).

Related

Why is URL rewriting in GAE discouraged?

I'm using Google App Engine with Java runtime. I'm not too clear about the routing policy Google is enforcing. And web searches have only yielded misleading and discordant results in my case.
According to Google documentation:
App Engine runs multiple instances of your application, each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. The number of instances can be adjusted automatically as traffic changes.
That's pretty clear. And it lets me think that URL rewriting I define in my development server will be applied after this higher level routing, meaning when a request actually reaches one of the many potentially available app server instances.
However, I then stumbled upon a thread in which it's argued that URL rewrite using the Tuckey's URL Rewrite plugin is troublesome in that it cuts out Google's content delivery network for so-called static files.
So my questions are:
- Given that the request gets to the server after being routed, what is the problem with rewriting the URL? Server-level rewriting shouldn't have any impact on Google's top-level routing.
- Google says that static files are stored on different servers that have nothing to do with the real application, so you basically don't know where they are. Does this mean that if a request for one of these static files comes in, the actual resource is requested from one of these reserved servers, and only other requests (including missing resources and invalid URLs coming from pushState) actually reach my app's server instances?
If all this were true, I wouldn't see any real performance risk in rewriting URLs at the server instance level.

There's nothing wrong with doing server side rewrites.
Google serves static files (as configured in appengine-web.xml) using the Google Front End, which is a low-latency, edge-cached CDN, so you should always prefer to have it serve static content for you rather than have that content generated or served from your WAR. It's faster, cheaper, and doesn't have the drawbacks of instance scaling (i.e., users waiting for spin-up).
In your case (an index.html for Angular), you don't need a rewrite; you can map the same servlet/resource to all paths, e.g. /*
Tuckey rewrite comes into play when the possible web.xml path rules are not flexible enough to map your routes to servlets. But there's no problem using it if needed.

Server Side Caching

I have a standalone WildFly 9.0.2 and I want to cache on the server side the responses for certain requests.
Some of the requests are available for all users (visitors), others should be available only to authenticated users.
I do not understand from the documentation how to do this.
Can you point me to a tutorial or manual that implements this functionality?
I started WildFly using the default configuration for Infinispan that is found in standalone\configuration\standalone.xml
Then, I modified the response object to contain in the header information for caching, hoping it would work like JAX-RS where it would check the headers and automatically cache.
// response is the HttpServletResponse for the current request
long current = System.currentTimeMillis();
long expires = current + 86400000L; // 24 hours in milliseconds
response.setHeader("Cache-Control", "no-transform, max-age=86400, public"); // 86400 s = 24 hours
response.addDateHeader("Expires", expires);
response.addDateHeader("Last-Modified", current);
That unfortunately did not work on the server side (though it did work for my web application, which reads the cache headers properly and reuses its local cache).
When I tried to view the Infinispan settings from the administration panel at http://127.0.0.1:9990, I get an exception and cannot proceed.
Thank you in advance for your help.

There is no standalone Java servlet server that does response caching the way you anticipated. The headers you set in the response will be interpreted by the browser (which does cache) or by intermediate proxies, which might also cache. Specialized caching proxies include Varnish and NGINX. These proxies are also called edge proxies.
Crafting a library that enables a standalone server to cache the way you want seems possible; the normal request flow could be intercepted by a servlet Filter. I don't know of any public library that does something like that.
If you want to cache inside the application, the normal thing to do is to use a caching library, like EHCache, cache2k, Google Guava Cache and others.
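To give an idea of what object-level caching inside the application looks like, here is a minimal sketch that uses only the JDK instead of one of the libraries named above. The class name `LruCache` is made up for this example; it relies on `LinkedHashMap` in access order together with its `removeEldestEntry` hook. It is not thread-safe and has no expiry, which is part of what the dedicated libraries add.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently used entry once
// more than maxEntries entries are stored. Not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true
        // drops the least recently accessed entry.
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "first");
        cache.put("b", "second");
        cache.get("a");           // "a" is now the most recently used entry
        cache.put("c", "third");  // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Bounding the number of entries is the point of the sketch: it keeps the heap small, which is the concern raised about caching large amounts of data inside the Java process.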
In your specific case I would recommend that you become familiar with a caching proxy server like NGINX and put it in front of your application. That is, let's say, the "industry standard". Doing HTTP response caching inside the Java server is not desirable, for a couple of reasons:
- In case of a cache hit, the response from the proxy is faster and the Java server is not hit.
- You can scale by putting more caching proxies in front of your application.
- The Java heap is not a good fit for caching a large amount of data. Where should it go? There are caches that overflow to disk. This needs a complex setup, as does a caching proxy in front of your application.
- For debugging and transparency it is better that the server generates a fresh answer when a request is sent to it.
I always recommend caching inside the application, too. However, we do it at the Java object level. The cache size is limited, so the heap stays small. Many cached objects inside the application are used for many different responses, so object caching works at a finer level than HTTP response caching.
Only in some special cases do we do something similar to HTTP response caching inside the application. This is used to compress or recompress some images and CSS resources that are used very often. There is some potential here for something generally useful. Maybe we will open source this.
Hope that helps.

Clustered event driven Java application - Should I use Websockets or polling?

I'm creating a monitor application that monitors the activities of a user. There are four elements in my system:
EventCatcher: The EventCatcher is responsible for catching all the events that happen in a subsystem and pushing the data to the EventHandler. Based on observation, an average of 10 events per second are pushed to the EventHandler. Some events are UserLogin, UserLogout.
EventHandler: The EventHandler is a singleton class that handles all the incoming events from the EventCatcher. It also keeps track of all the logged-in users in the system. So, whenever the EventHandler receives a UserLogin event, the User object is extracted from the event and is stored in a HashMap. When a UserLogout event is received, that User object is removed from the HashMap. This class also maintains a Set of all active WebSocket sessions, because every time an event occurs I want to inform all the open sessions that a particular event happened.
WebSocket Endpoint: This is just a simple Java class annotated with @ServerEndpoint.
Clients: The system I will be building is for internal (company) use only. At production, at most, there will only be around 5 - 10 clients. All the clients will be receiving the same information every time an event has occurred.
So right now I am trying to convince my supervisor that Websockets is the way to go, however, my supervisor finds it really unnecessary because a simple polling solution would do the trick.
His points are:
We don't really need up-to-date information by the millisecond. We can poll every second.
If I was to maintain a list of open WebSocket sessions, how would that work in a clustered environment (we use a load balancer)
If I plan to send information to the client every time an event (UserLogin, UserLogout) has occurred, I should be able to just send small updates to all WebSocket sessions - meaning, I can't be sending a whole JSON dump of everything. So that means, for every WebSocket instance, I would have to maintain another Set of Users and properly maintain it to mirror the Set contained in the EventHandler.
What my supervisor suggests is that I lose the WebSocket and just convert it to a simple Servlet and let the clients poll every second to receive the entire JSON dump.
In this scenario, should I stick with WebSockets? Or should I just poll?
The main advantage, as far as I've read, of Websockets vs. polling is that by using Websockets, you will have a persistent connection from client to server. HTTP is not really meant for real-time data.
Also, polling requires sending an HTTP request every time, and every request comes with HTTP headers. If the HTTP request headers take up 800 bytes and a client polls once per second, then that's 48 KB sent per minute per client. With a WebSocket, this isn't a problem.
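The header-overhead estimate above is simple arithmetic, sketched here so the assumptions are explicit. The 800-byte header size and the once-per-second polling interval are the figures from the question; the 10-client count is the upper end of the expected number of clients, and `PollingOverhead` is a made-up class name.

```java
// Back-of-the-envelope estimate of request-header traffic caused by polling.
class PollingOverhead {
    // Header bytes sent by one client in one minute.
    static long headerBytesPerMinute(int headerBytes, int pollsPerSecond) {
        return (long) headerBytes * pollsPerSecond * 60;
    }

    public static void main(String[] args) {
        long perClient = headerBytesPerMinute(800, 1);
        System.out.println(perClient);        // 48000 bytes, i.e. about 48 KB
        System.out.println(perClient * 10);   // 480000 bytes for 10 clients
    }
}
```

This counts request headers only; response headers and the JSON payload itself come on top of it.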
But then again, we won't really have a lot of active clients. We're not concerned about third parties sniffing our requests because this system is for company use only - internal use! And I believe my supervisor wants something simple and reliable.
I am fine with either way. I just want to be sure whether I'm using the right tool for the job.
Additional question: If WebSockets is the way to go, is there any reason why I should consider polling?

The entire purpose of WebSocket is to efficiently support continuing connections between client and server.
I’m not clear on how you are implementing your app. If this is a web app running in a Servlet environment leveraging WebSocket support in the web server, be aware that you need to use recent versions of the Servlet container. For example, with Tomcat you must use either version 8 or the latest updates to version 7.
And of course the web browser must have support for WebSocket.
Be aware that WebSocket is still a new technology that has been changing and evolving in both the specs and the implementations.
Atmosphere
You may want to consider using the Atmosphere framework. Atmosphere supports multiple techniques of Push including WebSocket & Comet.
The Vaadin web-app framework leverages Atmosphere to provide automatic support for Push in your app. By default, WebSocket is automatically attempted first. If WebSocket is not available, Vaadin+Atmosphere falls back automatically to the other techniques including polling.

What is the best approach to build a system with high amount of data communication?

Hello
I have a cache server (written with Java and the Lucene framework) which keeps a large amount of data and serves it according to the request query.
It basically works like this:
On startup, it connects to the DB and loads all tables into RAM.
It listens for requests and returns the matching data as array lists (about 1000 - 20000 rows).
When a user visits the web page, the web application connects to the cache server, sends a request, and shows the server's response.
I plan to run the web and cache applications in different instances because of memory issues. The cache server runs as a service and the web application runs on Tomcat.
What is your suggestion for how the communication between the web side and the cache server should be built?
I need to pass large amounts of data as array lists from one instance to another. Should I consider web services (XML communication), NIO socket communication (maybe Apache MINA), or solutions like CORBA?
Thanks.

It really depends very much on considerations you have not specified.
What are the clients? for example, if your clients are javascript running AJAX, obviously something over HTTP is more useful than a proprietary UDP solution.
What network is it working on? Local networks behave differently than internet, and mobile internet is quite different than both.
How elaborate use can you make of caching? If you use HTTP you can have a rather good control (through HTTP headers) of both client cache and network caches, and a plethora of existing software that can make use of both.
There are many other considerations to be taken into account, and there are many existing implementations of systems matching the more common needs. From your (not very detailed) description, I would recommend having a look at Redis.

Scalable http session management (java, linux)

Is there a best-practice for scalable http session management?
Problem space:
Shopping cart kind of use case. User shops around the site, eventually checking out; session must be preserved.
Multiple data centers
Multiple web servers in each data center
Java, linux
I know there are tons of ways doing that, and I can always come up with my own specific solution, but I was wondering whether stackoverflow's wisdom of crowd can help me focus on best-practices
In general there seem to be a few approaches:
Don't keep sessions; Always run stateless, religiously [doesn't work for me...]
Use j2ee, ejb and the rest of that gang
use a database to store sessions. I suppose there are tools to make that easier so I don't have to craft all by myself
Use memcached for storing sessions (or other kind of intermediate, semi persistent storage)
Use key-value DB. "more persistent" than memcached
Use "client side sessions", meaning all session info lives in hidden form fields, and passed forward and backward from client to server. Nothing is stored on the server.
Any suggestions?
Thanks

I would go with some standard distributed cache solution.
It could be the one provided by your application server, it could be memcached, or it could be Terracotta.
Probably doesn't matter too much which one you choose, as long as you are using something sufficiently popular (so you know most of the bugs are already hunted down).
As for your other ideas:
Don't keep session - as you said not possible
Client Side Session - too insecure - suppose someone hacks the cookie to put discount prices in the shopping cart
Use database - databases are usually the hardest bottleneck to solve, don't put any more there than you absolutely have to.
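For completeness, the tampering risk with client-side sessions is commonly reduced by signing the state before it is sent to the browser, so that a modified value is rejected when it comes back. The following is only a sketch of that idea using the JDK's HMAC-SHA256 support; `SignedState`, `sign` and `verify` are hypothetical names, the key and payload in the demo are made-up values, and key management is not covered. Signing detects modification; it does not hide the contents from the user.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper that attaches an HMAC tag to client-side state.
class SignedState {
    private final byte[] key;

    SignedState(byte[] key) {
        this.key = key.clone();
    }

    private String tagFor(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] tag = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            // The URL-safe Base64 alphabet contains no '.', so '.' can serve
            // as the separator between payload and tag.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Value to store in the cookie or hidden field: payload + "." + tag.
    String sign(String payload) {
        return payload + "." + tagFor(payload);
    }

    // Returns the payload if the tag matches, or null if the token was altered.
    String verify(String token) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String payload = token.substring(0, dot);
        byte[] expected = tagFor(payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = token.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual compares the two tags in constant time.
        return MessageDigest.isEqual(expected, actual) ? payload : null;
    }
}

class SignedStateDemo {
    public static void main(String[] args) {
        SignedState signer = new SignedState("demo-key-not-for-production".getBytes(StandardCharsets.UTF_8));
        String token = signer.sign("item=42;price=9.99");
        System.out.println(signer.verify(token));                         // the original payload
        System.out.println(signer.verify(token.replace("9.99", "0.01"))); // null: tampered
    }
}
```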
Those are my 2 cents :)
Regarding multiple data centers - you will want to have some affinity of the session to the data center it started on. I don't think there are any solutions for distributed cache that can work between different data centers.

You seem to have missed out vanilla replicated http sessions from your list. Any servlet container worth its salt supports replication of sessions across the cluster. As long as the items you put into the session aren't huge, and are serializable, then it's very easy to make it work.
http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html
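The "serializable" requirement can be checked without a cluster. The sketch below, with the made-up classes `Cart` and `ReplicationCheck`, pushes a session attribute through standard Java serialization and back. This is only an approximation of what a container has to do in order to copy the attribute to another node; it does not use any Tomcat API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical session attribute: small, and every field is serializable.
class Cart implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> itemIds = new ArrayList<>();

    void add(String itemId) {
        itemIds.add(itemId);
    }

    List<String> items() {
        return itemIds;
    }
}

class ReplicationCheck {
    // Serializes the value to bytes and reads it back, roughly what has to
    // happen when a session attribute is shipped to another node.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException ends up here if a field cannot be serialized.
            throw new IllegalStateException("attribute cannot be replicated", e);
        }
    }

    public static void main(String[] args) {
        Cart cart = new Cart();
        cart.add("sku-1");
        cart.add("sku-2");
        Cart copy = roundTrip(cart);
        System.out.println(copy.items()); // same items, but a distinct object
    }
}
```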
edit: It seems, however, that tomcat session replication doesn't scale well to large clusters. For that, I would suggest using JBoss+Tomcat, which gives the idea of "buddy replication":
http://www.jboss.org/community/wiki/BuddyReplicationandSessionData

I personally haven't managed such clusters, but when I took a J2EE course at the university the lecturer said to store sessions in a database and don't try to cache it. (You can't meaningfully cache dynamic pages anyway.) Http sessions are client-side by the definition, as the session-id is a cookie. If the client refuses to store cookies (e.g. he's paranoid about tracking), then he can't have a session.
You can get this id by calling HttpSession.getId().
Of course database is a bottleneck, so you'll end up with two clusters: an application server cluster and a database cluster.
As far as I know, both stateful message beans and regular servlet http sessions exist only in memory without load balancing built in.
Btw. I wouldn't store e-mail address or usernames in a hidden field, but maybe the content of the cart isn't that sensitive data.

I would rather move away from storing user application state in an HTTP session, but that would require a different way of thinking how the application works and use a RESTful stateless architecture. This normally involves dropping support for earlier versions of browsers that do not support MVWW architectures on the client side.
The shopping cart isn't user application state; it is application state, which means it would be stored in a database and managed as such. There can be an association table that links the user to one or many shopping carts, assuming the sharing of carts is possible.
Your biggest hurdle would likely be how to authenticate the user for every request if it is stateless. BASIC auth is the simplest approach that does not involve sessions, FORM-auth will require sessions regardless. A JASPIC implementation (like HTTP Headers or OAuth) will be able to mitigate your authentication concerns elsewhere, in which case a cookie can be used to manage your authentication token (like FORM-auth) or HTTP header like SiteMinder or Client Side Certificates with Apache.
The more expensive databases like DB2 have High Availability and Disaster Recovery features that work across multiple data centers. Note that it is not meant for load balancing the database, since there'd be a large impact due to network traffic.
