I am sorry if the following question has already been answered or if it is simply not specific enough, but I am entirely new to the topic of clustered applications and I am looking for a starting point.
My goal is to write a piece of software that will perform a variety of different tasks, primarily starting other software on remote servers. It needs to calculate how many instances of these other programs are currently required and will then have to somehow tell the remote servers to launch new instances.
The software that requests the new instances must not become unavailable or crash, though, which is why I am looking to distribute it onto multiple servers and then coordinate these instances so they work simultaneously and take over if any instance crashes, making the system as a whole reasonably resilient.
The application itself should be reachable either via a REST API or via a TCP connection, though I would prefer the former. If you know of any framework that might help me with this, I would be pleased if you could let me know.
Best regards
PS: I have seen that Spring provides some support for distributed systems, but I am unsure whether that would really help me with my task.
Let me divide your question into parts:
You want your applications to do some tasks
You run multiple instances of your application for high availability purposes
You want some sort of a mechanism for your applications to communicate and pass messages
You want your application to be reachable (REST API or TCP), preferably via REST
Well, the answer to your first three points lies in not implementing all of it yourself but in using a distributed cache. A distributed cache, or in-memory data grid (IMDG), is clustered in nature and provides high availability for production environments. Scalability is a plus for you, too.
What you could do is deploy as many instances of your application as you want and use the cache/IMDG to provide high availability and a platform/medium for coordinating actions. Features like writer locks can guarantee that no two applications perform the same task, or perform it more often than required.
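To make that coordination idea concrete, here is a minimal sketch of the distributed-lock pattern, using the open-source Hazelcast 3.x API purely for illustration (TayzGrid and NCache expose comparable locking primitives; the class and lock names here are hypothetical):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ILock;

    public class ScalingCoordinator {
        public static void main(String[] args) {
            // Each copy of the controller joins the same cluster on startup.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // A cluster-wide lock: only the holder makes the scaling decision,
            // so two instances never launch the same remote software twice.
            ILock lock = hz.getLock("instance-scaling-lock");
            if (lock.tryLock()) {
                try {
                    // Hypothetical work: count required instances, instruct remote servers.
                    System.out.println("This node performs the scaling decision");
                } finally {
                    lock.unlock();
                }
            }
        }
    }

If the lock-holding instance crashes, the grid releases its locks and a surviving instance acquires the lock on its next attempt, which gives you the take-over behavior you described.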
Lastly, regarding a reachable REST API: simply deploy your applications as web applications and use the distributed cache underneath.
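As a sketch of that last point, assuming you pick JAX-RS as the REST layer (any REST framework works; the resource and path names are hypothetical):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Deployed in any Java web container, this becomes the REST entry point,
    // with the distributed cache doing the coordination behind it.
    @Path("/instances")
    public class InstanceResource {

        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public String requiredInstances() {
            // Hypothetical: read the currently required instance count from the cache.
            return "{\"required\": 3}";
        }
    }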
Disclosure: I work for Alachisoft, an in-memory distributed cache provider. We have two products, one for Linux and one for Windows; both serve .NET and Java applications, and objects from the two languages are interchangeable. This is a feature of TayzGrid and NCache.
We have a quite large monolithic app (Java/Spring) and we are considering splitting it up into microservices, using spring-cloud to leverage existing solutions for some common problems (discovery, redundancy, etc.). Currently we run one instance (with different modules) per client.
Some of our clients are small, and one VPS handles them; others are larger and might use multiple servers.
The problem is that this "pack" of microservices should be isolated for each environment - they might be slightly different.
As I read through resources about Cloud Foundry - which looks really great - it seems it would be best to run a Cloud Foundry instance per client, and I am afraid that is overkill and quite a lot of work to get one client running (which I would like to automate as much as possible).
Ideal Solution
BEGIN
We provide servers with heterogeneous OSes, possibly containers (VM/docker/jail/...) with restrictions on where they may run, and finally services with restrictions on which containers they may run in.
When creating a new environment, I just provide the list of services to run in it, and the Solution creates the containers, deploys the services into them, and sets up the communication channels (a message broker) between them.
It should also handle upgrades, monitoring, etc.
END
What approach would you recommend? Or could you please share your experience from building a similar thing?
Thanks
You could provide each customer with their own space in a single CF instance, with all of that customer's microservices deployed into it.
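For illustration, per-customer isolation with the cf CLI might look like this (a sketch; the space and application names are hypothetical):

    cf create-space customer-a      # one isolated space per customer
    cf target -s customer-a         # point the CLI at that space
    cf push billing-service         # deploy each microservice into the space
    cf push reporting-service

Since spaces carry their own routes, service bindings, and environment variables, the per-client differences you mention can live in the space configuration rather than in separate Cloud Foundry installations.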
I am looking for a JMX querying tool. I have come across Jolokia and jmxtrans; both support JSON-based querying. jmxtrans has writers for monitoring tools, which I think is missing in Jolokia. I googled, but didn't find much comparing the two.
I have, however, read positive blog posts about both tools. If anyone has used them, please share your experiences.
I'm the author of jmxtrans. I considered Jolokia before I developed jmxtrans, and I chose to build my own tool because I had a different use case in mind.
Jolokia runs as a war in Tomcat. It is similar to jmxtrans in that it allows you to query a JMX server, but that is about where the similarity ends. You would need to implement the rest of jmxtrans on top of Jolokia in order to have feature parity.
On the other hand, jmxtrans is standalone with no requirements on Tomcat. My target audience is the netops/devops role. It is configured with a relatively simple JSON based structure so it doesn't require an engineering degree to configure and use it.
The thought is that you use jmxtrans to continuously monitor services which expose statistics via JMX, using it to 'transform' data from JMX into whatever output format you want. Generally, devops people want to integrate their JVMs with some sort of monitoring solution like Graphite or Ganglia, so I provided output writers for those tools.
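For a feel of what that looks like, here is a representative jmxtrans configuration (the hosts, ports, and queried MBean are placeholder values):

    {
      "servers": [{
        "host": "app-host.example.com",
        "port": 1099,
        "queries": [{
          "obj": "java.lang:type=Memory",
          "attr": ["HeapMemoryUsage"],
          "outputWriters": [{
            "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
            "settings": { "host": "graphite.example.com", "port": 2003 }
          }]
        }]
      }]
    }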
jmxtrans is also very smart about how it queries JMX servers, and there is a bit of optimization in there for that. There has also been a lot of work to let you parallelize requests to many servers, so a single jmxtrans instance can scale to continuously query hundreds of servers.
I hope that clarifies things a bit. If you have any specific questions, I'm happy to answer them.
Indeed, jmxtrans and Jolokia have a different focus. jmxtrans is a full monitoring solution with a scheduler that issues JMX requests periodically and sends the results to a backend like Graphite or rrdtool. It uses standard JSR-160 (RMI-based) communication for querying JMX-enabled Java servers.
Jolokia, on the other hand, is an HTTP/JSON-JMX adaptor, which allows easy access from non-Java clients and adds some unique features that are not available in a pure JSR-160 implementation. For integration into a monitoring platform, yet another piece of software is needed. For Nagios, there is Jmx4Perl, which provides a full-featured Nagios plugin for querying Jolokia agents.
Since I'm the Jolokia author, allow me to stress some highlights of Jolokia:
Many requests can be sent at once as a single bulk request. This allows querying hundreds of attributes with a single HTTP round trip, which really makes a huge difference in large environments (see the sample payload after this list).
Due to its use of HTTP, Jolokia allows easy querying across firewall boundaries (which is a nightmare when using standard JMX connectors)
Fine-grained authorization is easily possible using a plain XML policy file
Agents are available not only for Tomcat and other Java EE containers but also for any Java 6 application (e.g. ActiveMQ or Camel)
A nice suite of command-line tools (e.g. a readline-based JMX shell with context-sensitive command completion) comes with Jmx4Perl.
Access libraries for Perl, JavaScript, Python, ... are available.
.... For more info, please refer to www.jolokia.org
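To illustrate the bulk-request point from the list above, a single POST to a Jolokia agent can carry several read operations at once (a sketch of the JSON payload; the MBeans are just examples):

    [
      { "type": "read", "mbean": "java.lang:type=Memory",    "attribute": "HeapMemoryUsage" },
      { "type": "read", "mbean": "java.lang:type=Threading", "attribute": "ThreadCount" }
    ]

The agent answers with one JSON array containing a result per request, so N attributes cost one HTTP round trip instead of N JSR-160 calls.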
To summarize, I think you should use jmxtrans when you need a complete monitoring solution based on JSR-160 remoting (though you could use Nagios and check_jmx4perl, too) and Jolokia when you need to overcome JSR-160 limitations or can benefit from one of its unique features. One could even imagine integrating Jolokia into jmxtrans for the communication with the servers to be monitored, which would then go over JSON/HTTP instead of JSR-160 RMI (maybe this also clarifies the different focus and supported use cases).
Let me put one more project on the table - https://github.com/dimovelev/metrics-sampler
It queries JMX data using regular expressions and variable substitution, and it also supports JDBC queries as metric sources (mostly to monitor our Oracle DB stats) and mod_qos for the Apache mod_qos stuff. We only need Graphite as output, and that is the only output it currently supports.
By the way, IMHO the JMX ports are problematic with firewalls because HotSpot picks a random ephemeral port for the RMI server upon startup. With JRockit it can be specified (set to the same port as the registry) with a standard JVM option. To do this on HotSpot you need to code it yourself (or reuse code, e.g. from the Tomcat JMX connector). The nice part is that it is okay to set both ports to the same value, thus needing just one firewall rule.
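For reference, a minimal sketch of that "code it yourself" approach on HotSpot, pinning both the RMI registry and the RMI server objects to one well-known port using only JDK classes (the port number is hypothetical):

    import java.lang.management.ManagementFactory;
    import java.rmi.registry.LocateRegistry;
    import java.util.HashMap;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    public class FixedPortJmx {
        public static void main(String[] args) throws Exception {
            int port = 9010; // open exactly this one port in the firewall
            LocateRegistry.createRegistry(port);
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            // The first port is for the RMI server objects, the second for the registry.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi://localhost:" + port
                + "/jndi/rmi://localhost:" + port + "/jmxrmi");
            JMXConnectorServer server = JMXConnectorServerFactory
                .newJMXConnectorServer(url, new HashMap<String, Object>(), mbs);
            server.start();
        }
    }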
Cheers
Dimo
I'm relatively new to Java EE and have already begun to hear about the many different types of systems that can be clustered:
Virtual Machines (i.e. "that appliance is a cluster of VMs...")
Application servers, such as Tomcat, JBoss or GlassFish (i.e. "We're running clustered JBoss...")
Clustering APIs like Terracotta
Databases, like Oracle ("clustered database")
Cloud applications ("A cloud is basically a cluster...")
Wikipedia defines "clustering" as:
A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system.
I'm wondering how clustering works for each of these "cluster types/methods" (mentioned above) and how they relate to one another.
For instance, if one could benefit from having a clustered application, he/she would probably put it on a clustered app server and then throw a cluster manager into the mix (again, like Terracotta).
But because the phrase "clustering" seems to be used in vague/ambiguous ways, I'm not seeing how each of these ties into the other ones, or whether they even do. Thanks in advance to any brave StackOverflowers out there who can help me make sense of this interwoven terminology!
To me, clustering implies a number of qualities in a system, but it boils down to fault tolerance - of servers, networking, and data persistence. There are both loosely and tightly coupled systems, and all flavors in between. Tightly coupled systems perform the clustering at a level close to the hardware. Many of the old clustering systems were more tightly coupled, with the applications often not recognizing that they were clustered.
Loosely coupled systems are the norm these days, with a large degree of the fault tolerance accomplished entirely at the software level. Systems in the cluster share only network connectivity in order to accomplish fault tolerance. Usually there are specialized load balancers which route requests to the various cluster servers, using specialized hardware (sometimes just software) to accomplish this.
All of the examples you mentioned have some sort of "clustering". It would take a very long answer to describe the details of how each of the architectures accomplishes this. For me, the differences are what comes "for free" when you use the architecture, and how much work you will have to do to get it to work optimally.
How you mix and match the solutions you've mentioned depends on what your architecture looks like and your requirements. You can have a Terracotta store for local high speed persistence and the cloud for the rest. You can use Glassfish as your application server and utilize Terracotta as your persistence layer.
Here are my thoughts about the technologies you listed:
Cloud applications ("A cloud is basically a cluster...")
Cloud applications are the easiest to work with obviously. Your only job from an architecture standpoint is to pick a good cluster provider. Certainly Amazon and Google will do it "right" in terms of fault tolerance and data integrity. There are many other players that probably do it "good enough" and are cheaper. You program to their APIs which come with their own set of limitations and expenses. One problem with cloud applications is that it most likely will be very hard to switch to a new one. Again, you might have some [large] portion of your application running on cloud servers and have some local systems for your higher latency requirements. The trend is to put most production functions in the cloud or at least start that way until you get too big or need some services they can't provide.
Clustering APIs like Terracotta
Databases, like Oracle ("clustered database")
JBoss
These three systems provide their own clustering capabilities. They may require a lot of machine- and service-level configuration to get them running well in a production environment. I hear good things about Terracotta, which is a distributed persistence layer. I've used JGroups a lot, which sits underneath JBoss, and it can be tricky to get running right, though JBoss may have good default configurations and documentation. Oracle is most likely the hardest to cluster right; DBAs make a lot of money tweaking Oracle configurations.
Virtual Machines (i.e. "that appliance is a cluster of VMs...")
Application servers, such as Tomcat, GlassFish
These are the most amorphous to define in terms of clustering. Some VMs are considered "clustered" in that they share networking hardware and power backplanes, but they are really not clusters when compared to cloud computing. As mentioned, there are some clustered hardware solutions that are very custom and require a lot of domain-specific knowledge to get running well.
I have very little experience with application servers such as Tomcat and GlassFish. We have our own clustering software on top of JGroups and run Jetty exclusively. Application servers are not, in themselves, "clustered", but packages such as JBoss and Terracotta run on top of them to provide clustering, and they have internal projects with clustering software written for them.
Hope some of this helps.
Here's a quick whack at it. How you cluster depends on what your goals are. Here are some thoughts that also tie in to GlassFish.
A cluster enables multiple instances to be managed as one since they share a common configuration. If you make a change to a configuration, such as defining a new resource, then all instances that belong to a cluster inherit that change. Deploying an application to a cluster deploys it to all instances of that cluster.
A cluster provides service availability. If one instance fails, deployed applications are still available on other instances.
A cluster can offer session availability. If an instance dies while a user has items in their shopping cart, then another instance can take ownership of handling that user's session such that the shopping cart contents are still there. The user never knows a backend server has failed.
With GlassFish, HTTP session state can be managed by GlassFish (built-in), delegated to a Coherence grid, or the application can manage state itself (using Terracotta, a database, etc.). The benefit of the built-in capability is that it works out of the box and has gone through stress testing, QA, etc. The benefit of externalizing is that you can potentially get better scalability, since you decouple session management from application logic. Externalizing lets the JVM focus on executing business logic, and it uses less heap space since backup sessions live elsewhere. Oracle has tested/QA'd externalizing to the Coherence grid, and it is a formal feature of the commercial Oracle GlassFish Server. If you roll your own via a database, then you need to manage and QA it yourself.
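One detail worth noting whichever option you choose: session replication typically only applies to applications that declare themselves distributable. A minimal sketch of the standard web.xml marker:

    <!-- web.xml: mark the application distributable so the container
         is allowed to replicate HTTP sessions across the cluster -->
    <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
        <distributable/>
    </web-app>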
I have the same server side application running on multiple machines.
I would like to provide a way to push changes to all the applications. I'm more interested in state/property changes to the objects themselves, and not so much in replicating files, etc.
So I'm envisioning an admin console, where I would change some property and have the change affect each application's state.
I am currently looking into JGroups, which is a toolkit for reliable multicast communication. In this case, each application would listen in on the same multicast group, and the admin console would send changes to the group.
Are there any other solutions/techniques available?
There exist a lot of techniques: CORBA, RMI, etc. However, if you want a fully distributed system with no central server, I would personally recommend JGroups.
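A minimal sketch of that approach with the JGroups 3.x API (the cluster name and the property payload are hypothetical):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class AdminChannel {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel(); // default UDP multicast stack
            channel.setReceiver(new ReceiverAdapter() {
                @Override
                public void receive(Message msg) {
                    // Every application applies the change it receives.
                    String change = (String) msg.getObject();
                    System.out.println("Applying change: " + change);
                }
            });
            channel.connect("admin-cluster"); // all apps join the same group

            // From the admin console: a null destination broadcasts to the group.
            channel.send(new Message(null, "cache.size=2048"));
        }
    }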
If you have a central server, you can either:
Let the server push the changes to all clients. The server must be aware of all clients, either directly or by having the clients register themselves.
Let clients poll the server (see the sketch after this list).
Other simple solutions might include polling a central database or a central file.
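A minimal sketch of the polling option, using only JDK classes (the interval and the check itself are placeholders):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ConfigPoller {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                // Hypothetical check: fetch the current property version from the
                // central server (or database) and apply it if it changed.
                System.out.println("Polling for property changes...");
            }, 0, 30, TimeUnit.SECONDS);
        }
    }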
A quick Google search turns up http://www.hazelcast.com/product.jsp, which looks promising, but I have no experience with it.
For the more complex scenarios I can't recommend Terracotta enough.
Essentially, Terracotta distributes parts of your heap to the network, meaning that your applications share parts of the heap. Changes made by one app in the shared heap will be visible to the other applications sharing it. The main drawback is that Terracotta needs a dedicated server, since it's a hub-and-spoke design.
Apache ZooKeeper, from the Hadoop project, may also be interesting. I have no experience with it, but from the webpage I conclude that it offers a hierarchical data model. Each application interested in changes can watch a specific node and act upon the changes.
http://zookeeper.apache.org/doc/trunk/index.html
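A minimal sketch of that watch pattern with the ZooKeeper client API (the znode path is hypothetical):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWatcher implements Watcher {
        private final ZooKeeper zk;

        public ConfigWatcher(String connectString) throws Exception {
            zk = new ZooKeeper(connectString, 3000, this);
        }

        public byte[] readConfig() throws Exception {
            // Passing 'this' re-registers the one-shot watch on every read.
            return zk.getData("/app/config", this, null);
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try {
                    System.out.println("Config changed: " + new String(readConfig()));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

An admin console would simply write new data to /app/config, and every watching application gets notified.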
I am going to be creating an application that will be highly distributed. There will be several "agents"; each agent is a source of events, from external devices, FTP, or the filesystem. These agents will be deployed on separate machines, close to the hardware source that creates the events, and they will report events back to the central system for processing. One of the requirements is the ability to deploy new agents on the fly. These choices will be made by the user, because they will know which machines are close to the hardware device or will contain certain events. I will be writing this application in Java and have been looking at the GlassFish platform and what it can provide for me.
I'm looking at the clustering functionality of GlassFish, the Node Agents, and the heartbeat and startup functionality of the Node Agents.
My question is: can the clustering functionality support my requirements? I believe the original intent of clustering is to load-balance requests. My requirement is not quite the same, but it seems that GlassFish comes really close to solving it.
Does GlassFish offer the ability to specify which agents will run which applications, and allow me to configure the application running on a specific agent separately from the application running on a different agent?
If anyone knows of any other platform that would allow me to deploy agents and manage them individually, along with supporting a heartbeat and other management/high-availability tasks, I would love to hear of it.
Thanks
Joshua
Might a JMS based system be more appropriate for your application?
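For instance, each agent could publish its events to a queue on the central system; a minimal sketch using the JMS 1.1 API with ActiveMQ as an example provider (the broker URL and queue name are hypothetical):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class AgentEventPublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://central-host:61616");
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                session.createProducer(session.createQueue("agent.events"));
            producer.send(session.createTextMessage("file-arrived:/data/incoming/x.dat"));
            connection.close();
        }
    }

Agents stay decoupled from the central system, the broker buffers events if the center is briefly down, and new agents can be deployed on the fly since they only need the broker address.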