mod_cluster not balancing load after node discovery - java

I have a cluster set up and running with JBoss 7.1.1.Final and mod_cluster 1.2.6.Final.
mod_cluster load balancing works between the two nodes, nodeA and nodeB.
But when I stop one node and start it again, mod_cluster still sends all the load to the other node. It does not redistribute load after the node comes back.
What configuration changes are required for this? I can see both nodes enabled in mod_cluster_manager, but it directs load only to one node even after the other node comes back after failover.
Thanks

If you are seeing existing requests being forwarded to the active node, then it's because sticky sessions are enabled. This is the default behavior.
If you are seeing new requests that are not being forwarded to the restarted node (even when it's not busy), then it is a different issue. You may want to look at the load balancing factor/algorithm you are currently using in your mod-cluster subsystem.
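For reference, both knobs live in the modcluster subsystem of standalone.xml on JBoss AS 7.1. This is only a sketch; the schema version, attribute names, and metric weights may need adjusting for your exact release. sticky-session controls the first behavior, and the dynamic-load-provider metrics determine how new requests are spread:

<subsystem xmlns="urn:jboss:domain:modcluster:1.0">
    <mod-cluster-config advertise-socket="modcluster" sticky-session="true">
        <dynamic-load-provider>
            <!-- "busyness" weighs nodes by the threads busy serving requests -->
            <load-metric type="busyness" weight="1"/>
        </dynamic-load-provider>
    </mod-cluster-config>
</subsystem>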

It occurred to me that you might actually be seeing the correct behaviour, within a short time span. Take a look at my small FAQ: "I started mod_cluster and it looks like it's using only one of the workers." TL;DR: if you send only a relatively small number of requests, it might look like load balancing doesn't work, whereas it is actually correct not to flood a fresh newcomer with a barrage of requests at once.

Related

Load balancing approaches with the Gremlin driver

A few days ago, Google published this article:
https://cloud.google.com/blog/big-data/2018/07/developing-a-janusgraph-backed-service-on-google-cloud-platform
We can read there that it is common to deploy JanusGraph as a separate instance behind an internal load balancer.
In my project we have pretty much the same architecture: Bigtable, GKE with JanusGraph, and an app which calls JanusGraph through a load balancer. The only difference (I don't know whether it matters) is that we don't have an internal load balancer, we have an external one.
So, the question is: what is the state of load balancing when using the Gremlin driver in a Java application? Our research shows that it does not work. Since connections are stateful, the throughput is not spread across the JanusGraph replicas; once a connection sticks to one replica, it stays with that particular replica until the end.
Worse, when the replica is killed, the connection somehow hangs, without any exception, warning, or log entry; there is no information about the state of the connection at all. This is bad, because if you assume an automatic load balancer which spins up additional replicas when needed, it simply will not work.
We are using JanusGraph 0.2.1 with the corresponding TinkerPop driver 3.2.9 (we have tried many different combinations) and the pattern stays the same: load balancing does not work for us, and neither does failover when some pod gets killed. To make this even worse, it is not really deterministic: we had some tests where it worked, but when we returned to that test after a while, it didn't.
Do you, Stack Overflowers, have any idea what the state of this problem is?
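One thing worth checking (a hedged suggestion, not a confirmed fix): the TinkerPop driver can spread requests over several hosts itself if you register each replica as a contact point, rather than pointing it at a single load-balanced address. A minimal Java sketch, assuming Gremlin Server listens on the default port 8182 and the pod host names janus-0/janus-1 (made up here) are resolvable:

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

public class GremlinBalancingSketch {
    public static void main(String[] args) throws Exception {
        // Register every replica as a contact point so the driver can
        // round-robin over them and route around hosts it marks unavailable.
        Cluster cluster = Cluster.build()
                .addContactPoint("janus-0")   // hypothetical pod/host names
                .addContactPoint("janus-1")
                .port(8182)
                .minConnectionPoolSize(2)
                .maxConnectionPoolSize(8)
                .create();
        Client client = cluster.connect();
        ResultSet rs = client.submit("g.V().count()");
        System.out.println(rs.one().getLong());
        cluster.close();
    }
}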

Load balancing a Java application running in a cluster

I have multiple VMs running all of the modules in the project. Each request created by the user has to be processed by all modules, but each step needs to be done only once. So if VM1 picks up a request, module1 can process the request partially; next, VM1 or VM2 or any other VM in the cluster can pick it up and process it for module2, and so on.
Since each VM has limited capacity, I would like to use a load balancer to allocate work among the individual VMs.
Are there load balancers (open source, for Java) available which can solve this, or do I need to implement it myself using several load balancing algorithms (round robin, weighted, etc.) to meet my requirement?
Edit 1:
Each module is a Java class which is independent in itself but needs the previous modules to be done before it is started. Each VM is listening to a message bus. As and when a message appears on the bus, any of the VMs can pick it up and start working on it.
You can try HAProxy (a TCP/HTTP load balancer), which is open source, feature-rich and quite widely used. Apart from good documentation, you can find lots of information about it.
Depending on the exact semantics of the problem you're trying to parallelize, you might get good results by chunking your problem into "work packets" of some size and keeping them in a central queue. Then, just have each VM poll a packet from said queue as soon as it finishes the previous packet. This is called self-scheduling.
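A minimal sketch of that self-scheduling idea in plain Java, using a shared BlockingQueue as the central work queue. In the clustered setup described above, the message bus would play this role; the packets here are just ints for illustration:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SelfSchedulingSketch {
    public static void main(String[] args) throws InterruptedException {
        // Central queue of work packets.
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 20; i++) queue.add(i);

        // Each "VM" is simulated by a worker that polls the next packet
        // as soon as it finishes the previous one.
        Runnable worker = () -> {
            Integer packet;
            while ((packet = queue.poll()) != null) {
                System.out.println(Thread.currentThread().getName()
                        + " processed packet " + packet);
            }
        };
        Thread w1 = new Thread(worker, "worker-1");
        Thread w2 = new Thread(worker, "worker-2");
        w1.start(); w2.start();
        w1.join(); w2.join();
    }
}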

How to ensure java clients continue "working" in case whole hazelcast cluster is down

We are currently preparing Hazelcast for going live in the next few weeks. There is still one bigger issue left that troubles our OPs department and could be a possible show stopper in case we cannot fix it.
Since we are maintaining a high-availability payment application, we have to survive in case the cluster is not available. Reasons could be:
Someone messed up the Hazelcast configuration and a map on the cluster grows until we hit an OOM (we had this on the test system).
There is some issue with the network cards/hardware that temporarily breaks the connection to the cluster.
The OPs guys reconfigured the firewall and accidentally blocked some ports that are necessary, whatsoever.
Whatever else.
I spent some time looking for a good existing solution, but the only suggestion so far was to increase the number of backup servers, which of course does not solve this case.
During my current tests the application completely stopped working, because after a certain number of retries the clients disconnect from the cluster and the Hibernate 2nd-level cache no longer works. Since we use Hazelcast throughout the whole ecosystem, this would kill 40 Java clients almost instantly.
Thus I wonder how we could keep the applications working, if of course more slowly, when the cluster is down. Our current approach is to switch over to a local Ehcache cache, but I think there should be a Hazelcast solution for that problem as well?
If I were you, I would use a LocalSessionFactoryBean and set its cacheRegionFactory to a Spring bean that can delegate each call to either Hazelcast or a NoCachingRegionFactory, if the Hazelcast server is down.
This is desirable, since Hibernate assumes the cache implementation is always available, so you need to provide your own CacheRegion proxy that can decide the cache region routing at runtime.
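The full RegionFactory SPI is too large (and too version-dependent) to reproduce here, but the core of the delegation idea is small. A sketch, assuming the Hazelcast Java client API (HazelcastClient, LifecycleService.isRunning()) and a hypothetical CacheFacade interface standing in for whatever your region proxy exposes:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import java.util.Map;

public class FallbackCacheSketch {
    /** Hypothetical minimal cache contract used by the application. */
    interface CacheFacade {
        Object get(String key);
        void put(String key, Object value);
    }

    /** No-op fallback: behaves like an always-empty cache. */
    static final CacheFacade NO_CACHE = new CacheFacade() {
        public Object get(String key) { return null; }
        public void put(String key, Object value) { /* drop silently */ }
    };

    static CacheFacade hazelcastBacked(HazelcastInstance client, String mapName) {
        Map<String, Object> map = client.getMap(mapName);
        return new CacheFacade() {
            public Object get(String key) { return map.get(key); }
            public void put(String key, Object value) { map.put(key, value); }
        };
    }

    /** Route each call to Hazelcast while it is up, otherwise to the no-op cache. */
    static CacheFacade delegating(HazelcastInstance client, String mapName) {
        CacheFacade real = hazelcastBacked(client, mapName);
        return new CacheFacade() {
            private CacheFacade target() {
                return client.getLifecycleService().isRunning() ? real : NO_CACHE;
            }
            public Object get(String key) { return target().get(key); }
            public void put(String key, Object value) { target().put(key, value); }
        };
    }

    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        CacheFacade cache = delegating(client, "second-level-cache");
        cache.put("k", "v");
        System.out.println(cache.get("k"));
        client.shutdown();
    }
}

The same routing trick is what the Spring-configured region factory would do internally: check the client's lifecycle state per call and silently degrade to "no cache" instead of throwing.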

How do JCache compliant distributed caches work when your app is deployed to a cloud?

Please note: if the cache systems mentioned in this question work so completely differently from one another that an answer to this question is nearly impossible, then I would simplify the question down to anything that is JCache (JSR 107) compliant.
The major players in the distributed cache game, for Java at least, are EhCache, Hazelcast and Infinispan.
First of all, my understanding of a distributed cache is that it is a cache that lives inside a running JVM process, but that is constantly synchronizing its in-memory contents across other multiple JVM processes running elsewhere. Hence Process 1 (P1) is running on Machine 1 (M1), P2 is running on M2 and P3 is running on M3. An instance of the same distributed cache is running on all 3 processes, but they somehow all know about each other and are able to keep their caches synchronized with one another.
I believe EhCache accomplishes this inter-process synchrony via JGroups. Not sure what the others are using.
Furthermore, my understanding is that these configurations are limiting because, for each node/instance/process, you have to configure it and tell it about the other nodes/instances/processes in the system, so they can all sync their caches with one another. Something like this:
<cacheConfig>
    <peers>
        <instance uri="myapp01:12345" />
        <instance uri="myapp02:12345" />
        <instance uri="myapp03:12345" />
    </peers>
</cacheConfig>
So to begin with, if anything I have stated is incorrect or misguided, please begin by correcting me!
Assuming I'm more or less on track, then I'm confused how distributed caches could possibly work in an elastic/cloud environment where nodes are regulated by auto-scalers. One minute, load is peaking and there are 50 VMs serving your app. Hence, you would need 50 "peer instances" defined in your config. Then the next minute, load dwindles to a crawl and you only need 2 or 3 load balanced nodes. Since the number of "peer instances" is always changing, there's no way to configure your system properly in a static config file.
So I ask: how do distributed caches work in the cloud if there is never a static number of processes/instances running?
One way to handle that problem is to have an external (almost static) caching cluster which holds the data, while your application (or the frontend servers) use clients to connect to the cluster. You can still scale the caching cluster up and down to your needs, but most of the time you'll need fewer nodes in the caching cluster than you'll need frontend servers.
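As an illustration of the client side, here is a minimal JCache (JSR 107) sketch. It assumes a JCache provider such as the Hazelcast client is on the classpath, configured (via its own config file) with the addresses of the external caching cluster; the cache name and types are made up:

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.spi.CachingProvider;

public class JCacheClientSketch {
    public static void main(String[] args) {
        // Resolves to whatever JCache provider is on the classpath,
        // e.g. a Hazelcast client pointed at the external cluster.
        CachingProvider provider = Caching.getCachingProvider();
        CacheManager manager = provider.getCacheManager();

        MutableConfiguration<String, String> config =
                new MutableConfiguration<String, String>()
                        .setTypes(String.class, String.class);
        Cache<String, String> cache = manager.createCache("sessions", config);

        cache.put("user-42", "some state");
        System.out.println(cache.get("user-42"));

        manager.close();
    }
}

Because the application only speaks the JCache API to a client, the caching cluster behind it can grow and shrink without the frontend nodes ever listing each other as peers.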

Cluster-aware Tomcat web application

I believe this task is not that exotic, but due to my lack of clustering experience I find it hard to find the answer.
Our web app performs some background operations on a schedule (data querying and transfer).
Now the Tomcat server it is running on is being clustered. We need only one instance in the cluster to perform these background operations, not all of them.
I see the following options:
The ideal solution would be a master/slave model for the cluster, where the slave instances of Tomcat have our application in an inactive state (undeployed). If a slave becomes the master, the application gets deployed and starts to work. Is that possible?
If not, then we need some notifications/events that we can implement listeners for, in order to know when a node starts up / shuts down. We will then programmatically make the application in the first raised node the master, and block the unwanted process in the other (slave) nodes. Further, we will listen to startup/shutdown events from nodes to always keep a single active master. I was looking for such an events API in Tomcat but without luck so far.
Does anyone have experience with such a task? How did you solve it?
Thank you.
I don't know if there is a master/slave behavior setting in a Tomcat cluster, because I think all nodes need to be equal. But what about using Quartz Clustering with the JDBC-JobStore? You can define the tasks in a shared database, and if a task is triggered, the first available node will execute it. So all nodes in your cluster will have the same behavior, while only a single node executes a given task at a time:
"Only one node will fire the job for each firing. ... It won't necessarily be the same node each time - it will more or less be random which node runs it. The load balancing mechanism is near-random for busy schedulers (lots of triggers) but favors the same node for non-busy (e.g. few triggers) schedulers."
If a node fails while executing a task the next available node will retry:
"Fail-over occurs when one of the nodes fails while in the midst of executing one or more jobs. When a node fails, the other nodes detect the condition and identify the jobs in the database that were in progress within the failed node."
