I am going to be creating an application that will be highly distributed. There will be several "agents"; each agent is a source of events from external devices, FTP, or the filesystem. These agents will be deployed on separate machines, close to the hardware source that will create the events, and will report those events back to a central system for processing. One of the requirements is the ability to deploy new agents on the fly. These choices will be made by the user, because they will know which machines are close to the hardware devices or will contain certain events. I will be writing this application in Java, and have been looking at the GlassFish platform and what it can provide for me.
I'm looking at the clustering functionality of GlassFish, the Node Agents, and the heartbeat and startup functionality of the Node Agents.
My question is: can the clustering functionality support my requirements? I believe the original intent of a cluster is to load-balance requests. My requirement is not quite the same, but it seems that GlassFish comes really close to meeting it.
Does GlassFish offer the ability to expose which agents will run which applications, and allow me to configure the application running on a specific agent separately from the application running on a different agent?
If anyone knows of any other platform that would allow me to deploy agents and manage them individually, along with supporting a heartbeat and other management/high-availability tasks, I would love to hear about it.
Thanks
Joshua
Might a JMS-based system be more appropriate for your application?
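As a rough sketch of what that could look like, assuming a queue has been set up on the central system: each agent publishes its events to that queue over the classic JMS 1.1 API, and the central system consumes them. The JNDI names, message format, and agentId property below are placeholders, not anything a particular server prescribes.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class EventAgent {
    public static void main(String[] args) throws Exception {
        // Look up the connection factory and queue configured on the server;
        // these JNDI names are hypothetical and depend on your setup.
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Destination eventQueue = (Destination) ctx.lookup("jms/EventQueue");

        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(eventQueue);

        // Tag each event with its source so the central system can tell
        // which agent (and therefore which machine) produced it.
        TextMessage event = session.createTextMessage("FILE_ARRIVED:/data/incoming/report.csv");
        event.setStringProperty("agentId", "agent-01"); // hypothetical property
        producer.send(event);

        connection.close();
    }
}
```

A new agent deployed on the fly would only need the broker's address to start reporting, which fits the requirement of adding agents without reconfiguring the central system.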
I am sorry if the following question has already been answered or if it is simply not specific enough; I am entirely new to the topic of clustered applications and am looking for a starting point right now.
My goal is to write software which will perform a variety of different tasks, primarily starting other software on remote servers. This software needs to calculate how many instances of these other programs are currently required and will then have to somehow tell the remote servers to launch new instances.
The software requesting the new instances must not become unavailable or crash, though, which is why I am looking to distribute it onto multiple servers and then coordinate those instances to work simultaneously and take over if any instance crashes, so that the system as a whole is somewhat resilient.
The application itself should be reachable either via a REST API or via a TCP connection, and I would prefer the former. If you know of any framework which might help me with this, I would be pleased if you could let me know.
Best regards
PS: I have seen that Spring provides some support for distributed systems, but I am unsure if that would really help me with my task.
Let me divide your question into parts:
You want your applications to do some tasks
Your applications are multiple for high availability purposes
You want some sort of a mechanism for your applications to communicate and pass messages
You want your application to be reachable (REST API or TCP), preferably REST
Well, the answer to your first three questions lies not in implementing all of it yourself but in using a distributed cache. A distributed cache, or in-memory data grid (IMDG), is clustered in nature and provides high availability for production environments. Scalability is a plus for you, too.
What you could do is deploy as many instances of your application as you want and use the cache/IMDG as a highly available platform/medium for coordinating actions. Features like writer locks can guarantee that no two applications perform the same task, or perform it more times than required.
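To make the locking idea concrete, here is a minimal sketch using Hazelcast 3.x, one open-source IMDG (mentioned elsewhere on this page); the commercial products named in the disclosure below expose similar lock APIs. The lock name and the guarded task are made up for illustration.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;

public class TaskCoordinator {
    public static void main(String[] args) {
        // Join the cluster; every application instance runs this same code.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A cluster-wide lock: only one instance can hold it at a time,
        // so the guarded task is never run twice concurrently.
        ILock lock = hz.getLock("instance-launcher"); // hypothetical lock name
        if (lock.tryLock()) {
            try {
                launchRemoteInstances(); // the work only one node should do
            } finally {
                lock.unlock();
            }
        }
    }

    private static void launchRemoteInstances() {
        // placeholder for the actual task
    }
}
```

If the instance holding the lock crashes, the grid releases the lock and another instance can take over, which covers the failover requirement in the question.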
Lastly, regarding a reachable REST API: simply deploy your applications as web-based applications and use the distributed cache underneath.
Disclosure: I work for Alachisoft, an in-memory distributed cache provider. We have two products, one for Linux and one for Windows; both serve .NET and Java applications, and objects from the two languages are interchangeable. This is a feature of TayzGrid and NCache.
I'm looking for something that can monitor the traffic of a Java web application in order to estimate cloud computing prices.
It would be great if it can categorize the traffic in different categories, e.g. database, static resources, pages, etc.
Ideally, it should be something working in the same way New Relic does. Unfortunately, New Relic only monitors times and not traffic...
Does something like this exist?
Thanks.
You don't need any Java-specific software for that: you can use any network-monitoring tool and just run your application locally. For monitoring different kinds of traffic, I would use different tools too. There are a lot of DB-monitoring tools out there... sorry for not being more specific.
There are quite a lot of solutions for monitoring Java web applications. I have tried a few of them but finally settled on two: Zabbix and JavaMelody. Both are suitable for monitoring and categorizing an app's traffic, although they are completely different in how they work. Zabbix allows watching an app over the long term via JMX. JavaMelody can be built into an app and provides complete insight into its business processes.
Your final decision about a Java app monitoring platform depends on which app features you most want to monitor. I recommend reading this review, which looks at both solutions in detail: http://cases.azoft.com/enterprise-system-monitoring-solutions-business-apps/
I'm relatively new to Java EE and have already begun to hear about the many different types of systems that can be clustered:
Virtual Machines (i.e. "that appliance is a cluster of VMs...")
Application servers, such as Tomcat, JBoss or GlassFish (i.e. "We're running clustered JBoss...")
Clustering APIs like Terracotta
Databases, like Oracle ("clustered database")
Cloud applications ("A cloud is basically a cluster...")
Wikipedia defines "clustering" as:
A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system.
I'm wondering how clustering works for each of these "cluster types/methods" (mentioned above) and how they relate to one another.
For instance, if one could benefit from having a clustered application, one would probably put it on a clustered app server and then throw a cluster manager into the mix (again, like Terracotta).
But because the phrase "clustering" seems to be used in vague/ambiguous ways, I'm not seeing how each of these ties into the other ones, or if they even do. Thanks in advance to any brave StackOverflowers out there who can help me make sense of this interwoven terminology!
To me, clustering implies a number of qualities in a system, but it boils down to fault tolerance across servers, networking, and data persistence. There are both loosely and tightly coupled systems and all flavors in between. Tightly coupled systems have the clustering performed at a level close to the hardware. Many of the old clustering systems were tightly coupled, with the applications often not recognizing that they were clustered.
Loosely coupled systems are the norm these days, with a large degree of the fault tolerance accomplished entirely at the software level. Systems in the cluster share only network connectivity, which is enough to accomplish fault tolerance. Usually there are specialized load balancers, sometimes dedicated hardware and sometimes just software, which route requests to the various cluster servers.
All of the examples you mentioned have some sort of "clustering". It is going to take a very long answer to describe the details about how each of the architectures accomplish this. For me, the differences are what comes "for free" when you use the architecture, and how much work you will have to do to get it to work optimally.
How you mix and match the solutions you've mentioned depends on what your architecture looks like and your requirements. You can have a Terracotta store for local high speed persistence and the cloud for the rest. You can use Glassfish as your application server and utilize Terracotta as your persistence layer.
Here are my thoughts about the technologies you listed:
Cloud applications ("A cloud is basically a cluster...")
Cloud applications are obviously the easiest to work with. Your only job from an architecture standpoint is to pick a good cloud provider. Certainly Amazon and Google will do it "right" in terms of fault tolerance and data integrity. There are many other players that probably do it "good enough" and are cheaper. You program to their APIs, which come with their own set of limitations and expenses. One problem with cloud applications is that it will most likely be very hard to switch to a new provider. Again, you might have some [large] portion of your application running on cloud servers and have some local systems for your higher latency requirements. The trend is to put most production functions in the cloud, or at least start that way until you get too big or need some services they can't provide.
Clustering APIs like Terracotta
Databases, like Oracle ("clustered database")
JBoss
These three systems provide their own clustering capabilities. They may require you to do a lot of machine- and service-layer configuration to get the system running well in a production environment. I hear good things about Terracotta, which is a distributed persistence layer. I've used JGroups a lot, which underlies JBoss clustering; it can be tricky to get running right, but JBoss may also have some good default configurations/documentation. Oracle is most likely going to be the hardest to cluster right; DBAs make a lot of money tweaking Oracle configurations.
Virtual Machines (i.e. "that appliance is a cluster of VMs...")
Application servers, such as Tomcat, GlassFish
These are the most amorphous to define in terms of clustering. Some VMs are considered "clustered" in that they share networking hardware and power backplanes, but they are really not clusters when compared to cloud computing. As mentioned, there are some clustered hardware solutions that are very custom and will require a lot of domain-specific knowledge to get running well.
I have very little experience with application servers such as Tomcat and GlassFish. We have our own clustering software on top of JGroups and run Jetty exclusively. Application servers are not, in themselves, "clustered", but packages such as JBoss and Terracotta run on top of them to provide clustering, and there are projects that have written clustering software for them.
Hope some of this helps.
Here's a quick whack at it. How you cluster depends on what your goals are. Here are some thoughts that also tie in to GlassFish.
A cluster enables multiple instances to be managed as one since they share a common configuration. If you make a change to a configuration, such as defining a new resource, then all instances that belong to a cluster inherit that change. Deploying an application to a cluster deploys it to all instances of that cluster.
A cluster provides service availability. If one instance fails, deployed applications are still available on other instances.
A cluster can offer session availability. If an instance dies while a user has items in their shopping cart, then another instance can take ownership of handling that user's session such that the shopping cart contents are still there. The user never knows a backend server has failed.
With GlassFish, HTTP session state can be managed by GlassFish itself (built-in), delegated to a Coherence grid, or the application can manage state itself (using Terracotta, a database, etc.). The benefit of using the built-in capability is that it works out of the box and has gone through stress testing, QA, etc. The benefit of externalizing is that you can potentially get better scalability, since you decouple session management from application logic. Externalizing lets the JVM focus on executing business logic, and uses less heap space, since backup sessions live elsewhere. Oracle has tested/QA'd externalizing to the Coherence grid, and it is a formal feature of the commercial Oracle GlassFish Server. If you roll your own via a database, then you need to manage and QA it yourself.
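Whichever option you pick, session failover only works if the session contents can be copied to another instance. Here is a minimal sketch of what that implies for application code; the class and attribute names are just examples:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.http.HttpServletRequest;

// Everything placed in the HTTP session must be serializable so the
// container (GlassFish's built-in replication, a Coherence grid, etc.)
// can copy it to another instance when a server fails.
public class ShoppingCart implements Serializable {
    private final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }
}

class CartHelper {
    static void storeCart(HttpServletRequest request, ShoppingCart cart) {
        // The web app must also be marked <distributable/> in web.xml
        // before the container will replicate sessions at all.
        request.getSession().setAttribute("cart", cart);
    }
}
```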
I am about to develop my master's project using Flex as the front end, with BlazeDS, Java web services, and MongoDB in the backend. I am looking to deploy and manage it on a cloud. (The application analyzes financial data from various sources; I will need to query multiple endpoints for news articles and the DB for processing.)
This is my first experiment with deploying to a cloud rather than running locally for demo and presentation purposes.
I looked at Heroku (http://www.heroku.com/), but I am not sure if it supports Flash.
Please suggest a cloud application platform which allows Flex, BlazeDS, Java web services, and MongoDB.
Amazon Web Services is a good place to start; you can have an instance ready within 15-30 minutes of signing up. If you are just experimenting, try getting the Amazon Linux Image (AMI) up and running. Scour the net for HOWTOs on setting up Tomcat; for your requirements, full Java EE might be too much, but you know your needs better.
But a word of advice: get your application working on a local machine first. Then drop the programmer hat and put on the deployment hat 100%, because configuring the deployment environment is a b!tch: Tomcat configuration, BlazeDS, Mongo's failover servers, load balancers, and all kinds of non-programming tasks. You will want to keep your development stack close to home so you can diagnose problems quickly.
The cloud business is great only when you want to 1) stop using your home PC and bandwidth as a server, 2) have global mirror points for your application so that users' latency in one part of the world is no worse than in another, or 3) distribute the computing load of one application across many instances of that application.
Clouds are relatively cheap to deploy to, but if you have an application hoarding GBs of bandwidth and storage, be prepared to fork over $1000s+ in costs. You can save money by choosing an OS with no licensing costs to get a better rate.
I have the same server side application running on multiple machines.
I would like to provide a way to push changes to all the applications. I'm more interested in state/property changes to the objects themselves than in replicating files, etc.
So I'm envisioning an admin console where I would change some property and have the change affect each application's state.
I am currently looking into JGroups, which is a toolkit for reliable multicast communication. In this case, each application would listen in on the same multicast group, and the admin console would send changes to the group.
Are there any other solutions/techniques available?
There exist a lot of techniques: CORBA, RMI, etc. However, if you want a fully distributed system with no central server, I would personally recommend JGroups.
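To sketch the idea (against the JGroups 3.x/4.x API; the group name and payload format below are made up): every application joins the same group, and the admin console multicasts a change by sending a message with a null destination.

```java
import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

public class PropertyChannel extends ReceiverAdapter {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel();      // default UDP multicast stack
        channel.setReceiver(new PropertyChannel());
        channel.connect("config-cluster");      // hypothetical group name

        // Admin-console side: a null destination multicasts to all members.
        channel.send(new Message(null, "log.level=DEBUG"));
    }

    @Override
    public void receive(Message msg) {
        // Each application applies the pushed property change locally.
        String change = (String) msg.getObject();
        System.out.println("Applying " + change);
    }
}
```

In a real deployment the console and the applications would be separate processes; they are collapsed into one class here only to keep the sketch short.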
If you have a central server, you can either:
Let the server push the changes to all clients. The server must be aware of all clients, either directly or by having the clients register themselves.
Let clients poll the server.
Other simple solutions might include polling a central database or a central file.
A quick Google search turns up http://www.hazelcast.com/product.jsp, which looks promising, but I have no experience with it.
For the more complex scenarios I can't recommend Terracotta enough.
Essentially, Terracotta distributes parts of your heap across the network, meaning that your applications share parts of the heap. Changes made by one app in the shared heap will be visible to the other applications sharing it. The main drawback is that Terracotta needs a dedicated server, since it's a hub-and-spoke design.
Apache ZooKeeper, from the Hadoop project, may also be interesting. I have no experience with it, but from the webpage I conclude it offers a hierarchical data model. Each application interested in changes can watch a specific node and act upon the changes.
http://zookeeper.apache.org/doc/trunk/index.html
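For reference, watching a node for changes looks roughly like this; the connection string and node path are placeholders:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConfigWatcher implements Watcher {
    private ZooKeeper zk;

    public void start() throws Exception {
        // Connect to the ZooKeeper ensemble (address is hypothetical).
        zk = new ZooKeeper("localhost:2181", 3000, this);
        // Passing true registers this instance as the watcher, so we are
        // notified the next time the node's data changes.
        byte[] data = zk.getData("/config/log.level", true, null);
        System.out.println("Current value: " + new String(data));
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            // Re-read the node and apply the new value here; watches are
            // one-shot, so the read must set a new watch.
            System.out.println("Changed: " + event.getPath());
        }
    }
}
```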