How to route to the nearest RMI Server? - java

In continuation of my question How to improve the performance of client server architecture application,
I have decided to maintain a centralized master database and several slave server-database configurations. I plan to use SymmetricDS to replicate between the slave and master databases. Each server-database configuration would be installed close to its clients. Ideally, I want a client's request to be routed to the nearest slave server-database, for obvious reasons. Since I'm using RMI to connect to the server, I want to know whether there is any product/API currently available that would solve this.
Any solution other than the above one is also highly appreciated :)
Note: Refactoring the client code is definitely one alternative, but since the application is very large, it's a huge risk (it can break existing code), time-consuming, and expensive.

Take a look at distributed and consistent hashing:
http://en.wikipedia.org/wiki/Distributed_hash_table#Keyspace_partitioning
http://en.wikipedia.org/wiki/Consistent_hashing
At its barest bones, you would set up a variant of consistent hashing that takes the identifier of the client (in lieu of the 'key') and locates the nearest server. A bonus benefit here is that if one of the slaves goes down, your infrastructure will transparently route to the next nearest server.
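For illustration, here is a minimal sketch of such a ring in plain Java (ServerRing, addServer and serverFor are made-up names for this example; the MD5 hash and the use of RMI URLs as server identifiers are assumptions). Client identifiers and server URLs are hashed onto the same ring, and each client is routed to the first server at or after its position, so removing a server transparently shifts its clients to the next one.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.SortedMap;
    import java.util.TreeMap;

    public class ServerRing {
        // Ring position -> RMI URL of the slave server owning that position.
        private final SortedMap<Long, String> ring = new TreeMap<>();
        private final int virtualNodes;

        public ServerRing(int virtualNodes) {
            this.virtualNodes = virtualNodes;
        }

        public void addServer(String rmiUrl) {
            // Several virtual nodes per server smooth out the distribution.
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(rmiUrl + "#" + i), rmiUrl);
            }
        }

        public void removeServer(String rmiUrl) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.remove(hash(rmiUrl + "#" + i));
            }
        }

        // Route a client to the first server at or after its position on the ring.
        public String serverFor(String clientId) {
            if (ring.isEmpty()) {
                throw new IllegalStateException("no servers registered");
            }
            SortedMap<Long, String> tail = ring.tailMap(hash(clientId));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }

        private long hash(String key) {
            try {
                byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
                long h = 0;
                for (int i = 0; i < 8; i++) {
                    h = (h << 8) | (d[i] & 0xFF);
                }
                return h;
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e);
            }
        }
    }

The lookup gives each client a stable server assignment and transparent failover; whether that assignment is also geographically nearest depends on how you choose the client identifiers and ring positions.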

Related

Simple node discovery method

I'm starting work on a system that will need to discover nodes in a cluster and send those nodes jobs to work on. I know that a myriad of systems exist that solve this, but I'm unclear about the complexities of each and which would best suit my specific needs.
Our requirements are that an application should be able to send out job requests. Each request will specify multiple segments of data to work on. Nodes in the cluster should get these job requests and figure out whether the data segments being requested are "convenient". The application will need to keep track of which segments are being worked on by some node and then possibly send out further requests if there are data segments that it needs to force some nodes to work on (all the nodes have access to all the data, but they should prefer to work on data segments that they have already cached).
This is a very typical map/reduce problem, but we don't want to use the standard Hadoop solutions because we are trying to avoid the overhead of writing preliminary results to files. This is more of a streaming problem where we want nodes to perform filtering on data that they read and then send it over a network socket to the application that will combine the results from all the nodes.
I've taken a quick look at Akka, Apache Spark (Streaming), Storm, and just plain simple UPnP, and I'm not quite sure which one would suit my needs best. One thing that works against at least Spark is that it seems to require ZooKeeper to be set up on the network, which is a complication that we'd like to be able to avoid.
Is there any simple library that does something similar to this "auto discover nodes via network multicast" and then allows you to simply send messages back and forth to negotiate which node will handle which data segment? Will Akka be able to help me here? How are nodes added/discovered in a cluster there? Again, we'd like to keep the configuration overhead to a minimum, which is why UPnP/SSDP look sort of nice.
Any suggestions for how to use the solutions mentioned above or even other libraries or solutions to look into are very much appreciated.
You could use Akka Clustering: http://doc.akka.io/docs/akka/current/java/cluster-usage.html. However, it doesn't use multicast; it uses a gossip protocol to handle node up/down messages. You could use a Cluster-Aware Router (see the Akka Clustering doc and http://doc.akka.io/docs/akka/current/java/routing.html) to route your messages to the cluster. There are several different types of routers depending on your needs and what you mean by "convenient". If "convenient" just means whichever actor is currently free, you can use a Smallest Mailbox router. If it has something to do with the content of the message, you could use a Consistent Hashing router.
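As a rough sketch of the discovery side (assuming a recent Akka 2.x classic Java API with seed nodes configured in application.conf; ClusterListener is just an illustrative name), each node can subscribe to membership events instead of relying on multicast:

    import akka.actor.AbstractActor;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.cluster.Cluster;
    import akka.cluster.ClusterEvent;

    public class ClusterListener extends AbstractActor {
        private final Cluster cluster = Cluster.get(getContext().getSystem());

        @Override
        public void preStart() {
            // Membership events arrive via the gossip protocol, no multicast involved.
            cluster.subscribe(getSelf(), ClusterEvent.MemberUp.class, ClusterEvent.MemberRemoved.class);
        }

        @Override
        public void postStop() {
            cluster.unsubscribe(getSelf());
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(ClusterEvent.MemberUp.class, up -> System.out.println("Node joined: " + up.member()))
                .match(ClusterEvent.MemberRemoved.class, rm -> System.out.println("Node left: " + rm.member()))
                .build();
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("worker-cluster");
            system.actorOf(Props.create(ClusterListener.class), "listener");
        }
    }

From there a Cluster-Aware Router (or the work-pulling pattern mentioned below) would take care of sending the actual data-segment messages to the nodes.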
See Balancing Workload Across Nodes with Akka 2.
This post describes a work-distribution algorithm using Akka. The algorithm doesn't use multicast to discover workers; there is a well-known master address and the workers register with the master. Other than that, though, it fits your requirements well.
Another variation on it is described in Akka Work Pulling Pattern.
I've used this pattern in a number of projects - it works great.
Storm is fairly resilient when it comes to worker nodes going offline and coming back online. However, just like Spark, it does require ZooKeeper.
The good news is that Storm comes with a sister project to make deployment a breeze: https://github.com/nathanmarz/storm-deploy/wiki
If you're running vanilla Storm on EC2, the storm-deploy project could be what you're looking for.

RMI Server Clustering with many clients

I have a question about my system's design. I searched the existing questions but couldn't find the same situation. Currently I have a system with one server and multiple (300+ for now) clients that connect via RMI. Because of integrity issues, I need to make this system fail-safe, so I need another server. I don't yet know how to configure my application for that, but while doing so I'm wondering whether I could make the server side clustered, possibly with load balancing. These two servers are going to be in different places with different IP addresses, of course, and they are both capable machines.
For example, when a client makes a request, it should make the request to the more available server.
I searched for external solutions, but I'm very new to this stuff. Can you make suggestions about them as well?
I appreciate the responses. If anything is not clear, ask and I will clarify it as much as I can.
RMI/JRMP doesn't support that in any way shape or form, but RMI/IIOP with a suitable failover ORB might.

GAE/GWT server side data inconsistent / not persisting between instances

I'm writing a game app on GAE with GWT/Java and am having issues with server-side persistent data.
Players poll using RPC for active games and game states, all of which are stored on the server. Sometimes client polling fails to find game instances that I know should exist. This only happens when I deploy to Google appspot; locally everything is fine.
I understand this could have to do with appspot being a cloud service that can spawn and use a new instance of my servlet at any point, with the existing data not persisting between instances.
Single games only last a minute or two and the data changes rapidly (multiple times a second), so what is the best way to ensure that RPC calls to different instances will use the same server-side data?
I have had a look at the Datastore API and it seems to be database-like storage, which I'm guessing will be way too slow for what I need. Also, Memcache can be flushed at any point, so that's not useful.
What am I missing here?
You have two issues here: persisting data between requests and polling data from clients.
When you have a distributed servlet environment (such as GAE), you cannot make a request to one instance, save data in memory, and expect that data to be available on other instances. This is true for GAE and any other servlet environment where you have multiple servers.
So you need to save the data to some shared storage: the Datastore is costly, persistent, reliable, and slow; Memcache is fast and free, but not reliable. Usually we use a combination of both. Some libraries even combine the two transparently: NDB, Objectify.
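A minimal read-through sketch of that combination, using the low-level Datastore and Memcache APIs (GameStateRepository, the "GameState" kind and gameId are illustrative names, not something your app already has):

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class GameStateRepository {
        private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

        public Entity loadGameState(String gameId) throws EntityNotFoundException {
            // Fast path: Memcache is shared by all instances but may be evicted at any time.
            Entity cached = (Entity) memcache.get(gameId);
            if (cached != null) {
                return cached;
            }
            // Slow path: the Datastore is durable; repopulate the cache for the next poll.
            Key key = KeyFactory.createKey("GameState", gameId);
            Entity state = datastore.get(key);
            memcache.put(gameId, state);
            return state;
        }

        public void saveGameState(String gameId, Entity state) {
            datastore.put(state);        // durable write first
            memcache.put(gameId, state); // then refresh the cache
        }
    }

Memcache absorbs the frequent reads from polling clients, while the Datastore guarantees the state survives instance churn and cache eviction.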
On GAE there is also a third option for semi-persistent shared data: backends. Those are always-on instances where you control startup/shutdown.
Data polling: if you have multiple clients waiting for updates, it's best not to use polling. Polling makes a lot of unnecessary requests (when data did not change on the server) and there is still a minimum delay (since you poll at some interval). Instead of polling, use push via the Channel API. There are even GWT libs for it: gwt-gae-channel, gwt-channel-api.
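For the push side, a rough sketch of the server half assuming the Channel API (GamePush, pushState and the JSON payload are made-up names; the client would have obtained its channel token earlier via createChannel):

    import com.google.appengine.api.channel.ChannelMessage;
    import com.google.appengine.api.channel.ChannelService;
    import com.google.appengine.api.channel.ChannelServiceFactory;

    public class GamePush {
        private final ChannelService channelService = ChannelServiceFactory.getChannelService();

        // Call this whenever the authoritative game state changes on the server;
        // the connected client receives the message without polling.
        public void pushState(String clientId, String gameStateJson) {
            channelService.sendMessage(new ChannelMessage(clientId, gameStateJson));
        }
    }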
Short answer: You did not design your game to run on App Engine.
You sound like you've already answered your own question. You understand that data is not persisted across instances. The two mechanisms for persisting data on the server side are memcache and the datastore, but you also understand the limitations of these. You need to architect your game around this.
If you're not using Memcache or the Datastore, how are you persisting your data? (My best guess is that you aren't actually persisting it.) From the vague details, you have not architected your game to run across multiple instances, which is essential for any app running on App Engine. It's a basic design principle that you don't know which instance any HTTP request will hit. You have to rearchitect to use the Datastore + Memcache.
If you want to use a single server, you can use backends, which behave like single servers that stick around (if you limit them to one instance). Frankly, though, because of the cost, you're better off with Amazon or Rackspace if you go this route. You will also have to deal with scaling on your own - i.e., if a game is running on a particular server instance, you need to build a way for that game's traffic to consistently hit that instance.
Remember you can deploy GWT applications without GAE; see this explanation:
https://developers.google.com/web-toolkit/doc/latest/DevGuideServerCommunication#DevGuideRPCDeployment
You may want to ask yourself: Will your application ever NEED multiple server instances or GAE-specific features?
If so, then I agree with Peter Knego's reply regarding memcache etc.
If not, then you might be able to work around your problem by choosing a different hosting option (other than GAE), particularly one that lets you work with just a single instance. You could then indeed simply manage all your game data in server memory, as I understand you have been doing so far.
If this solution suits your purpose, then all you need to do is find a suitable hosting provider. This may well be a cloud-based PaaS offering, provided that it lets you put a hard limit (unlike GAE) on the number of server instances, and that the limit goes as low as one. For example, Heroku (currently) lets you do that, as far as I understand, and apparently it's suitable for GWT applications, according to this thread:
https://stackoverflow.com/a/8583493/2237986
Note that the above solution involves a bit of fiddling and I don't know your needs well enough to make a strong recommendation. There may be easier and better solutions for what you're trying to do. In particular, have a look at non-cloud-based hosting options and server architectures that are optimized for highly time-critical, real-time multiplayer gaming.
Hope this helps! Keep us posted on your progress.

Simple *Authoritative DNS Server* in Java

Is there an already-written Java DNS server that only implements authoritative responses? I would like to take the source code and move it into a DNS server we will be developing, which will use custom rule sets to decide what TTL to use and what IP address to publish.
The server will not be a caching server. It will only return authoritative results and only be published on the WHOIS record for the domains. It will never be called directly.
The server will have to publish MX records, A records, and SPF/TXT records. The plan is to use DNS to assist in load balancing among gateway servers in multiple locations (we are aware that DNS has a short reach in this area). It will also cease to publish the IP addresses of gateway servers when they go down (on purpose or by accident) (granted, DNS will only be able to help during extended outages).
We will write the logic for all this ourselves, but I would very much like to start with a DNS server that has been through a little testing instead of starting from scratch.
However, that is only feasible if what we copy from is simple enough. Otherwise, it could turn out to be a waste of time.
George,
I guess what you need is a Java library which implements the DNS protocol.
Take a look at dnsjava.
It is very good in terms of complete spec coverage of all record types and classes.
But one issue you might face with a Java-based library is performance.
DNS servers are expected to have high throughput. But yes, you can address that by throwing more hardware at it.
If performance is a concern for you, I would suggest looking into Unbound.
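If you do go with dnsjava, here is a minimal sketch of what building answer records with your own TTL/IP rules looks like (the domain names and addresses below are placeholders):

    import java.net.InetAddress;
    import org.xbill.DNS.ARecord;
    import org.xbill.DNS.DClass;
    import org.xbill.DNS.MXRecord;
    import org.xbill.DNS.Name;
    import org.xbill.DNS.TXTRecord;

    public class RecordBuilder {
        public static void main(String[] args) throws Exception {
            Name zone = Name.fromString("example.com.");
            Name gateway = Name.fromString("gw1.example.com.");

            // A short TTL lets a gateway that goes down drop out of resolution quickly.
            long ttl = 60;
            ARecord a = new ARecord(gateway, DClass.IN, ttl, InetAddress.getByName("203.0.113.10"));
            MXRecord mx = new MXRecord(zone, DClass.IN, ttl, 10, gateway);
            TXTRecord spf = new TXTRecord(zone, DClass.IN, ttl, "v=spf1 mx -all");

            System.out.println(a);
            System.out.println(mx);
            System.out.println(spf);
        }
    }

Your own rule engine would decide the TTL and which gateway addresses to include before the records are written into a response.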
http://www.xbill.org/dnsjava/
Unfortunately, the documentation states "jnamed should not be used for production, and should probably not be used for testing. If the above documentation is not enough, please do not ask for more, because it really should not be used."
I'm not aware of any better alternatives, however.
You could take a look at Eagle DNS:
http://www.unlogic.se/projects/eagledns
It's been around for a few years and it's quite well tested by now.

Grid Computing and Java

I couldn't seem to find a similar question to this.
I am currently looking for the best solution to a grid computing problem.
The setup:
I have a server/client situation where the clients [typically dumb, with little of the logic] receive instructions from the server
Clients have an authorization request
Clients report back information on their speed in completing tasks (a task's difficulty is judged by its task type)
Clients receive the task that best fits their previous performance (the best clients receive the hardest problems)
Eventually the requirements would be:
The client's footprint must be small and standalone - I can't have a client that requires a lot to install and set up
The client should be able to grab new jobs and job runtimes from the server (it would be nice to have the grid scale to new problems as they are introduced, with the new problems distributed by the server)
I need to have an authentication layer (it doesn't have to be complex or conform to an existing LDAP) [easier requirement: clients can sign up for a new "membership" and get access] (I'm not sure that RMI's strengths lie here)
The clients should be able to run over the Internet rather than in a local networked environment
Which means encryption of the requested results
I'm currently using web services to communicate between the clients and the server. All of the information and results go back to the hosting server (J2EE).
My question is: is there a grid system setup that matches all/most of these requirements and is open source?
I'm not interested in doing a cloud because most of these tasks are small but frequent (they run about once a day; a task may be easy but performs maintenance).
All of the code for this system is in Java.
You may want to investigate space-based architectures, and in particular Jini and JavaSpaces. What's Jini? It is essentially RMI with a configurable discovery mechanism: you request an implementor of a Java interface, and the Jini subsystem finds current services implementing that interface and dynamically informs your service of them.
Briefly, you'd write the work items into a space. The grid nodes would be set up to read data transactionally from the space. Each grid node would take a work item, process it, and write a result back into that space (or another space). The distributing node can monitor for results being written back (and/or for your projected result timings, as you've requested).
It's all Java and will scale linearly. Because it's Jini, the grid nodes can dynamically load their classes from an HTTP server, so you can propagate code updates trivially.
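A minimal sketch of the space interaction (lookup of the JavaSpace via Jini discovery is omitted; WorkItem, taskId and payload are made-up names for this example):

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    // Entries expose public fields; null fields act as wildcards when matching templates.
    public class WorkItem implements Entry {
        public String taskId;
        public String payload;

        public WorkItem() {}   // public no-arg constructor required by the Entry contract
        public WorkItem(String taskId, String payload) {
            this.taskId = taskId;
            this.payload = payload;
        }
    }

    class GridNode {
        private final JavaSpace space;

        GridNode(JavaSpace space) {
            this.space = space;
        }

        // The distributing node drops work items into the space.
        void distribute(String taskId, String payload) throws Exception {
            space.write(new WorkItem(taskId, payload), null, Lease.FOREVER);
        }

        // Each grid node blocks until a matching work item is available, then removes it.
        WorkItem takeNext() throws Exception {
            WorkItem template = new WorkItem(); // null fields match any work item
            return (WorkItem) space.take(template, null, Long.MAX_VALUE);
        }
    }

Results can be written back into the same space as a different entry type, which is what the distributing node monitors for.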
Take a look at Grid Beans.
BOINC sounds like it would work for your problem, though you would have to wrap Java for your clients. That said, it may be overkill for you.
