GAE/GWT server side data inconsistent / not persisting between instances - java

I'm writing a game app on GAE with GWT/Java and am having issues with server-side persistent data.
Players poll via RPC for active games and game states, all of which are stored on the server. Sometimes client polling fails to find game instances that I know should exist. This only happens when I deploy to Google appspot; locally everything is fine.
I understand this could be because appspot is a cloud service that can spawn and use a new instance of my servlet at any point, and the existing data does not persist between instances.
Single games only last a minute or two and the data changes rapidly (multiple times a second), so what is the best way to ensure that RPC calls to different instances use the same server-side data?
I have had a look at the Datastore API and it seems to be database-like storage, which I'm guessing will be way too slow for what I need. Also, Memcache can be flushed at any point, so that's not useful either.
What am I missing here?

You have two issues here: persisting data between requests and polling data from clients.
In a distributed servlet environment (such as GAE) you cannot make a request to one instance, save data in memory and expect that data to be available on other instances. This is true for GAE and any other servlet environment where you have multiple servers.
So you need to save data to some shared storage: the Datastore is costly, persistent, reliable and slow; Memcache is fast and free, but unreliable. Usually we use a combination of both. Some libraries even combine both transparently: NDB, Objectify.
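For example, a rough read-through/write-through sketch with the low-level GAE APIs might look like this (the "Game" kind and the repository shape are my own illustration, not from the question; Objectify or NDB would hide most of this boilerplate):

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class GameStateRepository {
        private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

        // Read-through: try memcache first, fall back to the datastore on a miss.
        public Entity loadGame(String gameId) throws EntityNotFoundException {
            Key key = KeyFactory.createKey("Game", gameId);
            Entity cached = (Entity) memcache.get(key);
            if (cached != null) {
                return cached;
            }
            Entity fromStore = datastore.get(key); // slower but authoritative
            memcache.put(key, fromStore);
            return fromStore;
        }

        // Write-through: persist first, then refresh the cache.
        public void saveGame(Entity game) {
            datastore.put(game);
            memcache.put(game.getKey(), game);
        }
    }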
On GAE there is also a third option for semi-persistent shared data: backends. These are always-on instances whose startup and shutdown you control.
Data polling: if you have multiple clients waiting for updates, it's best not to use polling. Polling makes a lot of unnecessary requests (the data did not change on the server) and there will still be a minimum delay (since you poll at some interval). Instead of polling, use push via the Channel API. There are even GWT libs for it: gwt-gae-channel, gwt-channel-api.
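A hedged sketch of the server side of such a push (the client id and JSON payload are made up; the GWT client would open the channel with the returned token, e.g. through gwt-gae-channel):

    import com.google.appengine.api.channel.ChannelMessage;
    import com.google.appengine.api.channel.ChannelService;
    import com.google.appengine.api.channel.ChannelServiceFactory;

    public class GamePushService {
        private final ChannelService channelService = ChannelServiceFactory.getChannelService();

        // Called once per client: the returned token is handed to the GWT client,
        // which uses it to open a channel.
        public String openChannelFor(String clientId) {
            return channelService.createChannel(clientId);
        }

        // Called whenever the game state changes: pushes the update to one client.
        public void pushGameState(String clientId, String gameStateJson) {
            channelService.sendMessage(new ChannelMessage(clientId, gameStateJson));
        }
    }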

Short answer: You did not design your game to run on App Engine.
You sound like you've already answered your own question. You understand that data is not persisted across instances. The two mechanisms for persisting data on the server side are memcache and the datastore, but you also understand the limitations of these. You need to architect your game around this.
If you're not using memcache or the datastore, how are you persisting your data? (My best guess is that you aren't actually persisting it.) From the vague details, you have not architected your game to run across multiple instances, which is essential for any app running on App Engine. It's a basic design principle that you don't know which instance any HTTP request will hit. You have to rearchitect to use the datastore + memcache.
If you want to use a single server, you can use backends, which behave like single servers that stick around (if you limit them to one instance). Frankly though, because of the cost, you're better off with Amazon or Rackspace if you go this route. You will also have to deal with scaling on your own - i.e. if a game is running on a particular server instance, you need a way to ensure that play for that game consistently hits that instance.

Remember you can deploy GWT applications without GAE, see this explanation:
https://developers.google.com/web-toolkit/doc/latest/DevGuideServerCommunication#DevGuideRPCDeployment
You may want to ask yourself: Will your application ever NEED multiple server instances or GAE-specific features?
If so, then I agree with Peter Knego's reply regarding memcache etc.
If not, then you might be able to work around your problem by choosing a different hosting option (other than GAE), particularly one that lets you work with just a single instance. You could then indeed simply manage all your game data in server memory, as I understand you have been doing so far.
If this solution suits your purpose, then all you need to do is find a suitable hosting provider. This may well be a cloud-based PaaS offering, provided it lets you put a hard limit (unlike GAE) on the number of server instances, and that the limit goes as low as one. For example, Heroku (currently) lets you do that, as far as I understand, and apparently it's suitable for GWT applications, according to this thread:
https://stackoverflow.com/a/8583493/2237986
Note that the above solution involves a bit of fiddling and I don't know your needs well enough to make a strong recommendation. There may be easier and better solutions for what you're trying to do. In particular, have a look at non-cloud-based hosting options and server architectures that are optimized for highly time-critical, real-time multiplayer gaming.
Hope this helps! Keep us posted on your progress.

Related

Cluster system architecture?

I would like to develop an application for ~500 active users (sessions at one time). The system would not perform any massive calculations; it will be a simple read/write-to-database solution. However, about 50 MB of data per user would be uploaded to the application daily (it would be analysed and cleaned by another application every day, when no users are active). I'm currently working on the design of this application and I've got a few questions about it.
Should I consider developing the application to run in a cluster with load balancing, or will one server handle this amount of usage?
If yes, are there any guidelines for developing an application to work in a cluster? Is it any different from developing a single-server application?
Should I be worried about the database for this application? What problems should I expect when two servers read/write data to a single database at the same time? Should the database perhaps also run in a cluster?
I would appreciate any help and/or articles about designing mid-size applications like this.
This depends on your NFRs (non-functional requirements). Besides load balancing, a cluster provides higher availability.
You'll have to make your back end stateless so that requests from the same user can end up on another node without the user noticing. This makes scalable software more expensive to build, so consider your options carefully.
Accessing a database from multiple servers is no different from accessing it from multiple threads.
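To illustrate that point: the same transactional discipline you would apply for concurrent threads also protects you against concurrent servers. A minimal JDBC sketch, with a made-up accounts/balance table used purely for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ConcurrentUpdateExample {
        public static void addToBalance(String jdbcUrl, long accountId, long delta) throws SQLException {
            try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
                conn.setAutoCommit(false);
                // Row lock: any other server (or thread) updating the same row waits here.
                try (PreparedStatement select = conn.prepareStatement(
                        "SELECT balance FROM accounts WHERE id = ? FOR UPDATE")) {
                    select.setLong(1, accountId);
                    try (ResultSet rs = select.executeQuery()) {
                        if (!rs.next()) {
                            conn.rollback();
                            return;
                        }
                        long balance = rs.getLong(1);
                        try (PreparedStatement update = conn.prepareStatement(
                                "UPDATE accounts SET balance = ? WHERE id = ?")) {
                            update.setLong(1, balance + delta);
                            update.setLong(2, accountId);
                            update.executeUpdate();
                        }
                    }
                }
                conn.commit();
            }
        }
    }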
To answer your first question, I think using an infrastructure provider that lets you easily scale your application (up or down) is always a big plus and can help you save money. My main experience with this kind of provider is with Amazon Web Services (AWS).
I don't know precisely what technology you are planning to use, but a general setup on AWS that would make sense to me is:
A set of EC2 instances (= virtual servers) running behind an ELB (a load balancer)
An auto scaling group containing the EC2 instances. You can look it up, but an auto scaling group basically lets you automatically add and remove instances depending on various factors (server load, disk I/O, etc.)
The use of RDS for your database. It supports multiple DBMS such as MySQL and Oracle. It also provides you with nice features such as replication, automated backups and monitoring.
The use of CodeDeploy to deploy your application on the servers
(I'm deliberately using the AWS names so that you can look up the documentation if you are interested.)
This would basically let you scale to a lot more than 500 concurrent users if needed, and could save you some money when you are handling fewer users. Note that auto scaling groups can also be scheduled. For instance: "I want at least 5 instances during the day (max 50), but you can go down to 2 (and still up to 50) between 1am and 4am".
The services I mentioned are quite widely documented, so you can look them up if you'd like more specific details.
I won't discuss your two other questions in detail because I'm not an expert on the subject, but the database can indeed be a bottleneck, since it may involve a lot of I/O.
Hope this helps :)

Single Java Cache for multiple Application Servers

We have multiple application servers behind a reverse proxy. We want a single cache on another host which all application servers can easily use, so the cache has to have some kind of network support. Furthermore, the setup should be easy, probably supporting Docker, but this is not a must. The cache duration is about one day. The API should be as simple and standardized as possible (JCache?).
In a later stage we want to prepopulate the cache.
Which options do I have?
Background: In a first step we want to reduce the load on the backend systems, which mainly provide SOAP services. So we want to cache the SOAP responses (JAX-WS). The cache hit rate will probably be about 25% in a first stage.
Later we want to use the same cache for JPA as well (we already have in-memory caching enabled for each application server and use a cache coordination strategy).
To use even more caching we will need some sort of cache categories.
In general: the question is too broad, and you are actually asking for a product recommendation. Please take a look at the Stack Overflow question guidelines.
About your question:
There is no "single cache" for every purpose. Furthermore, even with a single cache product there can be many variants in software and system architecture. The best solution depends not on the application but on the type of data access you want to cache. Some questions that come to mind:
Do you have a mostly read or a read/write usage pattern?
What is the type of access: point, range, or a full scan? What types of operation do you perform on the data? What are the object count and typical object size? Are there hot spots? How many application servers do you have? Is there a memory limit in the application servers? How costly is it to generate the data in the backend (latency and resource costs)?
One general recommendation: If you only have a few application servers, I would start with local caching in the application servers and ignore the fact that there may be redundant requests on the backend from different application servers. This way you can keep the existing system architecture. Putting in a separate cache server or servers requires a lot of planning and a lot of consideration for staging, deploying and operating your application.
A second general recommendation: "The cache hit rate will probably be about 25% in a first stage" - a cache with this hit rate will be pretty useless. It may happen that you don't get any performance gain from the cache at all. There may be reasons to do it anyway, e.g. to improve the application's behaviour for flash crowds, but this needs more detailed elaboration. Double-check your numbers!
I am looking forward to more detailed questions :)
What about using the cache server from Ehcache?
It provides a RESTful interface and can run on a dedicated server.
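Whichever product ends up being used, coding against the JCache (JSR-107) API mentioned in the question keeps the calling code portable between providers. A rough sketch, where the cache name, the String key/value types and the one-day expiry are assumptions taken from the question, and the supplier stands in for the actual JAX-WS call:

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;
    import javax.cache.configuration.MutableConfiguration;
    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;

    public class SoapResponseCache {
        private final Cache<String, String> cache;

        public SoapResponseCache() {
            CacheManager manager = Caching.getCachingProvider().getCacheManager();
            MutableConfiguration<String, String> config =
                    new MutableConfiguration<String, String>()
                            .setTypes(String.class, String.class)
                            .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_DAY));
            this.cache = manager.createCache("soapResponses", config);
        }

        // Cache the raw SOAP response keyed by the request it answers.
        public String getOrCompute(String requestKey, java.util.function.Supplier<String> backendCall) {
            String cached = cache.get(requestKey);
            if (cached != null) {
                return cached;
            }
            String response = backendCall.get();
            cache.put(requestKey, response);
            return response;
        }
    }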

Transform local software into online software

We've developed management software specifically for very small businesses. But, unexpectedly, several bigger businesses liked it and started to use it. The problem is that our software works only locally (one store) and the bigger businesses want us to make the software work online (two or more stores at different places but with the same database).
So, we would like to make our local software work online, reusing our existing Java Swing user interface.
We've thought about some solutions but, as it will be a big change, we would like to know the best way to proceed.
Important information:
The user interface is Java Swing and the database is PostgreSQL.
We have thousands of customers using our software and they would like to use it online too.
Below are the solutions that we've thought about. Please, let us know if there is a better way.
Solution #1
A single database on the internet, with all clients connected to it.
Drawbacks:
Every single query would have to go over the internet.
The client source code would need to contain the database password, so security would be totally compromised.
Solution #2
All clients would have their own database on the internet, with different passwords.
Drawbacks:
Every single query would have to go over the internet.
Would need thousands of databases.
Difficult to maintain.
Solution #3
A single database on the internet, but clients connect through a web service that validates the customer's login data and returns the query results.
Drawbacks:
Every single query would have to go over the internet.
Building the web service would be a little complex and it would have to return the results in some format we haven't decided on yet (maybe simple CSV text or XML).
Solution #4
There would be a single database on the internet for all our customers.
Also, all clients would have their own local database, so they could do fast SELECT queries.
Every update query would first be sent to a web service that would execute the query against the online database and report whether it succeeded.
Besides that, we would have a mechanism to synchronize the local databases with the online database from time to time.
Drawbacks:
Very complex and difficult to implement.
The synchronization mechanism would require a lot of processing.
Is there a better way? How?
I would go with Solution #3. Build a database-backed service/API, and have the desktop client authenticate itself and use it. I would avoid having a local database as in Solution #4. You cannot rely on your users not to accidentally mess with it somehow and cause synchronization to be lost or corrupted. In addition, having a local database will slow you down when you want to create a different client, for example a mobile app.
If you decide to go with Solution #3, the current de facto standard is a JSON-based REST API. Also, note that there are many caching techniques that can reduce the number of queries actually run.
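As a minimal illustration of Solution #3, a JAX-RS resource along these lines could back the Swing client (the resource path, parameter and hard-coded result are purely illustrative, not a prescription):

    import java.util.Arrays;
    import java.util.List;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // The Swing client calls this over HTTPS instead of opening a PostgreSQL
    // connection itself, so no database password ever ships with the client.
    @Path("/stores/{storeId}/products")
    public class ProductResource {

        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public List<String> listProducts(@PathParam("storeId") long storeId) {
            // In a real implementation this would query PostgreSQL on the server
            // and map the rows to DTOs; hard-coded here to keep the sketch short.
            return Arrays.asList("example product for store " + storeId);
        }
    }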

How can I communicate between PHP and a Java program?

I'm working on a web application that frequently requires a calculation-intensive query to be run, the results of which are stored in a separate table. Using MySQL, this query takes about 500ms (as optimized as possible, believe me). To eliminate this bottleneck, I've created a Java program that loads the relevant DB data into memory and performs the query itself; it takes about 8ms (something I'm a little bit proud of). I'd like to use this Java program to get the results, and if it fails or is unavailable, fail over to having PHP run the MySQL query.
Since loading the data into the Java application takes some time, it's going to load it once and remain running as a background process. Now, the question is: how do I communicate with this Java application from PHP?
Keep in mind:
Multiple instances of PHP may need to communicate with this Java process simultaneously.
If the Java instance cannot be found (e.g. it crashed for some reason), PHP should proceed using the older and slower MySQL method.
An intermediary process, such as Memcache, is acceptable.
Ideally, the solution would withstand race conditions.
I would preferably not like to use MySQL as the intermediary.
I was going to use Memcache, where PHP would write to a known key and poll until that key changed to "completed", while Java would poll that key and, once it found something, perform the job and set it to "completed". However, this wouldn't work for two reasons. First, both PHP and Java read/write to Memcache using serialized objects, there's no way to change that, and I don't want Java to unserialize PHP objects and vice versa -- it's too messy. Second, this is not ACID-compliant -- if a queue built up, there would be race conditions.
For now, I'm stuck with polling MySQL "selects" to see if a job is off the queue or not, which is far from an optimal solution because the poll interval needs to be long enough that MySQL doesn't get pinged too frequently. I need a better solution!
Thanks.
Edit: Duh. It looks like I will be using some sort of ServerSocket in Java, which I'm unfamiliar with. An example might help :)
I'm using a socket server on the Java end, and PHP sockets on the PHP end. Works great.
There's no need to overcomplicate things with a PHP/Java bridge, and no need for the overhead of running a web server.
Sockets work great, and I'm actually a bit ashamed I even asked the question to begin with.
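For anyone landing here looking for the example the question asked for: a bare-bones Java side can be as small as this (the port number and the line-based protocol are arbitrary choices, not a prescription):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class QueryServer {
        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(9090)) {
                while (true) {
                    Socket client = server.accept();          // one PHP request per connection
                    new Thread(() -> handle(client)).start(); // serve concurrent PHP callers
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket s = client;
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String request = in.readLine();         // e.g. a job id sent by PHP
                out.println(runInMemoryQuery(request)); // reply on the same connection
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private static String runInMemoryQuery(String request) {
            // Placeholder for the ~8 ms in-memory computation described in the question.
            return "result-for-" + request;
        }
    }

On the PHP side, fsockopen/fwrite/fgets is enough; if fsockopen fails, PHP can fall back to the slower MySQL path.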
My suggestion is to use web services... Write and run the web service in Java, and then call it from PHP using, e.g., NuSOAP. This solution has one more advantage - your web service can easily be used in other applications, e.g. .NET ones...
Another option, which might be easier if you have a small number of methods, is to build a servlet in Java which takes the parameters in a GET request.
Both of those solutions are strictly web-based, and both handle requests on separate threads, so they give you good performance.
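A hedged sketch of the servlet variant (the URL pattern and parameter name are made up); PHP could call it with file_get_contents or cURL and fall back to MySQL whenever the HTTP request fails:

    import java.io.IOException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet("/fast-query")
    public class FastQueryServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            String jobId = req.getParameter("job"); // parameter sent by PHP in the GET request
            resp.setContentType("text/plain");
            // Placeholder for the in-memory computation; PHP falls back to MySQL on HTTP errors.
            resp.getWriter().println("result-for-" + jobId);
        }
    }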

Second-level cache for a Java web app and its alternatives

Between transitions of the web app I use a Session object to store my objects.
I've heard there's a program called memcached, but there's no compiled version of it on the site; besides, some people think it has real disadvantages.
So I want to ask:
What are the alternatives, and what are the pros and cons of the different approaches?
Is memcached painful for sysadmins to install? Is it difficult to fit into existing infrastructure, from a sysadmin's perspective?
What about using a database to hold temporary data between web app transitions?
Is it a normal practice?
"What about using a database to hold temporary data between web app transitions? Is it a normal practice?"
Databases do indeed have a cache already. A well-designed application should try to leverage it to reduce disk IO.
The database cache works at the data level. That's why other caching mechanisms can be used to address different levels. At the Java level, you can use Hibernate's 2nd-level cache, which can cache entities and query results. This can notably reduce the network IO between the app server and the database.
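Enabling it for an entity is mostly declarative; a rough sketch (the entity and concurrency strategy are only for illustration, and the cache provider itself still has to be configured in the Hibernate configuration):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // entity instances are cached across sessions
    public class Product {
        @Id
        private Long id;
        private String name;
        // getters/setters omitted for brevity
    }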
Then you may want to address horizontal scalability, that is, adding servers to manage the load. In this case, the 2nd-level cache needs to be distributed across the nodes. This exists (see JBoss Cache), but can get slightly complicated to manage.
Distributed caches tend to work better if they have a simpler key/value scheme. That's what memcached is, but there are also other similar solutions. The biggest problem with distributed caches is invalidation of outdated entries -- which can itself turn into a performance bottleneck.
Don't think that you can use a distributed cache as-is to make your performance problems vanish. Designing a scalable distributed architecture requires experience and is always a matter of trade-offs between what to optimize and what not to.
To come back to your question: for a regular application, there is IMHO no need for a distributed cache. Decent disk IO and network IO usually lead to decent performance.
EDIT
For non-persistent objects, you have several options:
The HttpSession. Objects need to implement Serializable. The exact way the session is managed depends on the container. In a cluster, the session is usually replicated to a second node, so that if one node crashes you still have a copy. Session affinity then routes requests to the server that has the session in memory.
Distributed cache. A system like memcached may indeed make sense, but I don't know the details.
Database. You could of course dump any Serializable object into the database as a BLOB. This can be an option if the web servers are not as reliable as the database server.
Again, for a regular application, I would try to go as far as possible with the HttpSession.
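To show how little code the HttpSession option needs, a small sketch (the attribute name and the Cart class are illustrative):

    import java.io.Serializable;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpSession;

    public class SessionHelper {

        // Anything placed in the session must be Serializable so the container
        // can replicate it to another node in a cluster.
        public static class Cart implements Serializable {
            private static final long serialVersionUID = 1L;
            public int itemCount;
        }

        public static Cart currentCart(HttpServletRequest request) {
            HttpSession session = request.getSession(true); // create the session if needed
            Cart cart = (Cart) session.getAttribute("cart");
            if (cart == null) {
                cart = new Cart();
                session.setAttribute("cart", cart);         // replicated by the container
            }
            return cart;
        }
    }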
How about Ehcache? It's an easy-to-use, pure-Java solution ready to plug into Hibernate. As far as I remember it's also supported by containers.
It's quite painless in my experience.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
This page should have everything that you need (hopefully!)
