I have a problem understanding the Distributed Executor Service. I am trying to run the example mentioned here:
[a link] https://github.com/hazelcast/hazelcast-code-samples/tree/master/distributed-executor/scale-out
What I am assuming about scale-out is that when I run the master and the slave on different machines, the execution should happen on both machines, i.e. the load should be balanced across them. But I am not able to see anything happening on the slave console; the master console is executing all 1000 EchoTasks. Is my understanding of the Distributed Executor Service wrong? Can someone help me understand this?
Your understanding is correct. When you start up the nodes, are they able to connect to each other? Do you actually see a cluster being formed, or do both of them work independently?
The latter case can happen if your network does not support multicast and the nodes are not able to discover each other automatically.
Can you share some logging so we can see whether the nodes form the cluster as expected?
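If multicast turns out to be blocked on your network, one workaround is to switch the join mechanism to TCP/IP discovery and list the member addresses explicitly. A minimal sketch of the relevant hazelcast.xml fragment (the IP addresses are placeholders for your two machines, and the exact schema may vary by Hazelcast version):

```xml
<hazelcast>
  <network>
    <join>
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member>192.168.1.10</member>
        <member>192.168.1.11</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```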
Related
We are running our calculations in a standalone Spark cluster, ver 1.0.2 - the previous major release. We do not have any HA or recovery logic configured.
A piece of functionality on the driver side consumes incoming JMS messages and submits respective jobs to spark.
When we bring the single & only Spark master down (for tests), it seems the driver program is unable to properly figure out that the cluster is no longer usable. This results in 2 major problems:
The driver tries to reconnect endlessly to the master, or at least we couldn't wait until it gives up.
Because of the previous point, submission of new jobs blocks (in org.apache.spark.scheduler.JobWaiter#awaitResult). I presume this is because the cluster is never reported unreachable/down, and the submission logic simply waits until the cluster comes back. For us this means that we run out of JMS listener threads very fast, since they all get blocked.
There are a couple of akka failure detection-related properties that you can configure on Spark, but:
The official documentation strongly recommends against enabling Akka's built-in failure detection.
I would really want to understand how this is supposed to work by default.
So, can anyone please explain what's the designed behavior if a single spark master in a standalone deployment mode fails/stops/shuts down. I wasn't able to find any proper doc on the internet about this.
By default, Spark can handle worker failures but not a Master failure. If the Master crashes, no new applications can be submitted. Therefore, two high-availability schemes are provided here: https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability
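For reference, the ZooKeeper-based scheme from that page is enabled by setting recovery properties via SPARK_DAEMON_JAVA_OPTS on each Master; a sketch of the spark-env.sh fragment, with placeholder ZooKeeper hostnames:

```
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```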
Hope this helps,
Le Quoc Do
I have a cluster set up and running: JBoss 7.1.1.Final and mod_cluster 1.2.6.Final.
mod_cluster load balancing is happening between two nodes, nodeA and nodeB.
But when I stop one node and then start it again, mod_cluster still sends all the load to the other node. It does not redistribute the load after the node comes back.
What configuration changes are required for this? I can see both nodes enabled in mod_cluster_manager, but it directs load to only one node even after the other node comes back after failover.
Thanks
If you are seeing existing requests being forwarded to the active node, that's because sticky sessions are enabled. This is the default behavior.
If new requests are not being forwarded to the restarted node (even when it's not busy), then that is a different issue. You may want to look at the load-balancing factor/algorithm that you are currently using in your mod_cluster subsystem.
It came to my mind, that you might actually be seeing the correct behaviour -- within a short time span. Take a look at my small FAQ: I started mod_cluster and it looks like it's using only one of the workers. TL;DR: If you send only a relatively small amount of requests, it might look like the load balancing doesn't work whereas it's actually correct not to flood fresh newcomers with a barrage of requests at once.
I've got a Spring Web application that's running on two different instances.
The two instances aren't aware of each other, they run on distinct servers.
That application has a scheduled Quartz job, but my problem is that the job shouldn't execute simultaneously on both instances: as it's a mail-sending job, it could cause duplicate emails to be sent.
I'm using RAMJobStore, and JDBCJobStore is not an option for me due to the large number of tables it requires. (I can't afford to create many tables due to an internal restriction.)
The solutions I thought about:
- creating a single control table that has to be checked every time a job starts (with repeatable-read isolation level to avoid concurrency issues). The problem is that if the server is killed, the table might be left in an invalid state.
- using properties to designate a single server as the job-running server. The problem is that if that server goes down, jobs will stop running.
Has anyone ever experienced this problem and do you have any thoughts to share?
Start with the second solution (deactivate Quartz on all nodes except one). It is very simple to do and it is safe. Count how frequently your server goes down. If that is unacceptable, then try the first solution. The problem with the first solution is that you need good multithreaded-programming skills to implement it without bugs; it is not so simple if multithreading is not your everyday task, and the cost of a bug in your implementation may be bigger than the actual profit.
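A minimal sketch of the second solution: gate the scheduler startup on a property that you set on exactly one node. The property name `scheduler.active` is made up for this example, and the Quartz call is left as a comment since it is beside the point:

```java
public class SchedulerGuard {
    // Hypothetical convention: exactly one node is started with -Dscheduler.active=true.
    static boolean isActiveNode(String propertyValue) {
        return "true".equalsIgnoreCase(propertyValue);
    }

    public static void main(String[] args) {
        if (isActiveNode(System.getProperty("scheduler.active"))) {
            // scheduler.start();  // only this node starts the Quartz scheduler
            System.out.println("starting scheduler on this node");
        } else {
            System.out.println("scheduler disabled on this node");
        }
    }
}
```

If the designated node goes down, you promote another node by restarting it with the flag set, which is exactly the manual-failover weakness mentioned in the question.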
OK, I think the question is pretty simple. This is only for testing at university. I would like to brute-force a password (nothing complicated, just finding a given string by brute force). It's basically one very long for loop whose iterations could be performed in a new thread. Is there any way to distribute that "thread" over the computers on the same network? I'd like to use multiple computers (for example a computer lab with 100 machines) to contribute their processing power to that same for loop, with all results displayed on the computer where the program started (some kind of master or server computer).
Is something like that possible in Java? The operating system is Windows 7.
In case you want to build the application yourself instead of using a framework, I recommend you work through the RMI trail: http://docs.oracle.com/javase/tutorial/rmi/
I did a simple distributed computing engine myself some time ago, and it turned out to be pretty easy using RMI.
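To give a taste of what the tutorial covers, here is a minimal self-contained sketch: a worker exports a remote interface, and the "master" looks it up and calls it. Everything runs in one JVM here; in a real deployment the registry and WorkerImpl would live on the worker machine. The interface name, method, port, and the stand-in task (counting even numbers) are all illustrative:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {
    // Remote interface every worker exposes; countEvens stands in for real work.
    public interface Worker extends Remote {
        long countEvens(long from, long to) throws RemoteException;
    }

    // The actual counting logic, kept static so it is easy to test in isolation.
    static long evensIn(long from, long to) {
        long n = 0;
        for (long i = from; i < to; i++) if (i % 2 == 0) n++;
        return n;
    }

    public static class WorkerImpl extends UnicastRemoteObject implements Worker {
        protected WorkerImpl() throws RemoteException { }
        public long countEvens(long from, long to) { return evensIn(from, to); }
    }

    public static void main(String[] args) throws Exception {
        // In a real setup the registry runs on the worker's host and the master
        // looks it up remotely; here everything lives in one JVM for brevity.
        Registry reg = LocateRegistry.createRegistry(1099);
        WorkerImpl impl = new WorkerImpl();
        reg.rebind("worker", impl);

        Worker stub = (Worker) reg.lookup("worker");
        System.out.println("evens in [0,10): " + stub.countEvens(0, 10)); // 5

        UnicastRemoteObject.unexportObject(impl, true); // let the JVM exit cleanly
        UnicastRemoteObject.unexportObject(reg, true);
    }
}
```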
It's very simple. You need a master/controller computer and several worker computers, or slaves. The master breaks the problem into chunks and allocates chunks to slaves. The slaves perform the chunks of work and report back to the master when they have finished.
The most complicated part is getting the master/slaves to talk to each other. A relatively simple solution would be to use sockets and invent some trivial comms protocol.
The master algorithm would look something like:
break problem into chunks
while problem not solved {
    wait for socket comms from slave
    if slave is asking for a chunk of work
        allocate chunk to slave
    else if slave is reporting a chunk didn't contain the solution
        mark chunk as completed
        allocate chunk to slave
    else if slave is reporting a chunk did contain the solution
        problem is solved
}
The slave algorithm would look something like:
while problem not solved {
    ask master for a chunk of work
    process this chunk
    if chunk contains solution
        problem is solved
    report results back to master
}
You need to make your chunk size big enough so that the slaves spend most of their time solving the problem and not talking to the master. The frequency at which slaves talk to the master will probably determine the number of slaves a single master is able to handle. At a first guess I'd say size the chunks so that a slave takes 2 or 3 minutes to process it.
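To make the shape of this concrete, here is a single-JVM sketch of the same idea, with threads standing in for networked slaves and the socket protocol omitted. The chunk size, search range, and the plain equality test standing in for a real password check are all placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ChunkedSearch {
    // Split the search space [from, to) into fixed-size chunks.
    static List<long[]> chunks(long from, long to, long size) {
        List<long[]> out = new ArrayList<>();
        for (long s = from; s < to; s += size)
            out.add(new long[]{s, Math.min(s + size, to)});
        return out;
    }

    // Scan one chunk; equality with `target` stands in for the real password check.
    static long scan(long[] chunk, long target) {
        for (long i = chunk[0]; i < chunk[1]; i++)
            if (i == target) return i;
        return -1;
    }

    public static void main(String[] args) throws Exception {
        long target = 7_341;
        List<long[]> work = chunks(0, 10_000, 500);
        ExecutorService slaves = Executors.newFixedThreadPool(4); // stand-ins for networked slaves
        CompletionService<Long> results = new ExecutorCompletionService<>(slaves);
        for (long[] c : work) results.submit(() -> scan(c, target));

        long found = -1;
        for (int i = 0; i < work.size() && found < 0; i++)
            found = results.take().get(); // master collects chunk results as they finish
        slaves.shutdownNow();
        System.out.println("found: " + found); // found: 7341
    }
}
```

In the real distributed version, each `scan` call would be a round-trip to a slave over the socket protocol sketched above.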
Thanks for the answers. I can use anything already made; I don't need to code that part. The problem is, there aren't multiple little problems that I could distribute to other machines and then wait for the results. There is ONLY ONE LONG FOR LOOP that I would like to SPEED UP using the processing power of multiple computers on the same local network.
I'm thinking there should be some tool that connects Java Virtual Machines, rather than software I write alone. Somehow, that's the only way I see a solution for this.
I have a situation here where I need to distribute work over multiple Java processes running in different JVMs, probably on different machines.
Let's say I have a table with records 1 to 1000. I am looking for work to be collected and distributed in sets of 10: records 1-10 go to workerOne, then records 11-20 to workerTwo, and so on and so forth. Needless to say, workerOne never does the work of workerTwo unless and until workerTwo couldn't do it.
This example was purely based on database but could be extended to any system, I believe be it File processing, email processing and so forth.
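The allocation described above (records 1-1000 in sets of 10, spread over named workers) can be sketched as a simple round-robin plan; the worker names are the ones from the question and the round-robin policy is just one possible choice:

```java
import java.util.*;

public class RecordPartitioner {
    // Assign each block of `size` record ids to the workers round-robin.
    static Map<String, List<int[]>> assign(int first, int last, int size, List<String> workers) {
        Map<String, List<int[]>> plan = new LinkedHashMap<>();
        for (String w : workers) plan.put(w, new ArrayList<>());
        int i = 0;
        for (int s = first; s <= last; s += size, i++)
            plan.get(workers.get(i % workers.size()))
                .add(new int[]{s, Math.min(s + size - 1, last)});
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<int[]>> plan =
            assign(1, 1000, 10, List.of("workerOne", "workerTwo", "workerThree"));
        int[] c = plan.get("workerOne").get(0);
        System.out.println("workerOne starts with records " + c[0] + "-" + c[1]); // 1-10
    }
}
```

The failover requirement (workerOne taking over only when workerTwo can't) is then a matter of reassigning a failed worker's entries in this plan.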
I have a small feeling that the immediate response would be to go for a Master/Worker approach. However here we are talking about different JVMs. Even if one JVM were to come down the other JVM should just keep doing its work.
Now the million-dollar question is: are there any good (production-ready) frameworks that would give me the facility to do this? Even better if there are concrete implementations for specific needs like database records, file processing, email processing and the like.
I have seen the Java Parallel Execution Framework, but I am not sure if it can be used across different JVMs, and whether, if one were to come down, the others would keep going. I believe the workers could be on multiple JVMs, but what about the master?
More info 1: Hadoop would be a problem because of the JDK 1.6 requirement. That's a bit too much.
Thanks,
Franklin
Might want to look into MapReduce and Hadoop
You could also use message queues. Have one process that generates the list of work and packages it in nice little chunks, then plops those chunks on a queue. Each of the workers just keeps waiting on the queue for something to show up; when it does, the worker pulls a chunk off the queue and processes it. If one process goes down, some other process picks up the slack. It's simple, and people have been doing it that way for a long time, so there's a lot of information about it on the net.
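In a single JVM the same pattern can be sketched with a java.util.concurrent queue; with a real broker (e.g. JMS or RabbitMQ) the queue would live outside the processes, but the worker loop looks the same. The item and worker counts here are arbitrary:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueWorkers {
    // Drain `items` work chunks from a shared queue using `workers` threads;
    // returns how many chunks were processed in total.
    static int process(int items, int workers) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= items; i++) queue.put(i); // producer plops chunks on the queue

        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++)
            pool.submit(() -> {
                Integer chunk;
                while ((chunk = queue.poll()) != null)   // each worker pulls until the queue is empty
                    processed.incrementAndGet();         // stand-in for real chunk processing
            });
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(100, 3) + " chunks processed"); // 100 chunks processed
    }
}
```

The fault-tolerance property from the answer comes from the broker: an unacknowledged chunk goes back on the queue for another worker, which this in-process sketch does not model.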
Check out Hadoop
I believe Terracotta can do this. If you are dealing with web pages, JBoss can be clustered.
If you want to do this yourself, you will need a work manager that keeps track of jobs to do, jobs in progress, and jobs that were never finished and need to be rescheduled. The workers then ask for something to do, do it, and send the result back, asking for more.
You may want to elaborate on what kind of work you want to do.
The problem you've described is definitely best solved using the master/worker pattern.
You should have a look into JavaSpaces (part of the Jini framework); it's really well suited to this kind of thing. Basically you just want to encapsulate each task to be carried out inside a Command object, subclassing as necessary. Dump these into the JavaSpace, let your workers grab and process one at a time, then reassemble when done.
Of course your performance gains will totally depend on how long it takes you to process each set of records, but JavaSpaces won't cause any problems if distributed across several machines.
If you work on records in a single database, consider performing the work within the database itself using stored procedures. The gain from processing the records on different machines might be negated by the cost of retrieving and transmitting the work between the database and the computing nodes.
For file processing it could be a similar case: working on files in a (shared) filesystem might introduce large I/O pressure on the OS.
And the cost of maintaining multiple JVMs on multiple machines might be overkill too.
As for the question: I used JADE (Java Agent DEvelopment Framework) for some distributed simulation once. Its multi-machine support and message-passing nature might help you.
I would consider using JGroups for that. You can cluster your JVMs, and one of your nodes can be selected as master and then distribute the work to the other nodes by sending messages over the network. Or you can partition your work items up front and have the master node manage the distribution of the partitions, e.g. partition-1 goes to JVM-4, partition-2 goes to JVM-3, partition-3 goes to JVM-2, and so on. And if JVM-4 goes down, the master node will notice and tell one of the other nodes to pick up partition-1 as well.
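The partition bookkeeping the master would do can be sketched independently of JGroups itself; `onNodeLeft` is the kind of callback you would drive from a JGroups view change, and the node names are just the ones from the answer:

```java
import java.util.*;

public class PartitionManager {
    private final List<String> nodes = new ArrayList<>();
    private final Map<Integer, String> owner = new LinkedHashMap<>(); // partition -> node

    PartitionManager(List<String> initial, int partitions) {
        nodes.addAll(initial);
        for (int p = 0; p < partitions; p++)
            owner.put(p, nodes.get(p % nodes.size())); // initial round-robin assignment
    }

    // Called when the master learns a node has left the cluster:
    // the dead node's partitions are handed to the survivors.
    void onNodeLeft(String dead) {
        nodes.remove(dead);
        for (Map.Entry<Integer, String> e : owner.entrySet())
            if (e.getValue().equals(dead))
                e.setValue(nodes.get(e.getKey() % nodes.size()));
    }

    String ownerOf(int partition) { return owner.get(partition); }

    public static void main(String[] args) {
        PartitionManager pm = new PartitionManager(List.of("JVM-1", "JVM-2", "JVM-3"), 6);
        pm.onNodeLeft("JVM-2"); // JVM-2 crashes; its partitions move to the survivors
        System.out.println("partition-1 now on " + pm.ownerOf(1));
    }
}
```

In the JGroups version, the elected master would hold this map and broadcast reassignments as messages after each view change.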
One other alternative, which is easier to use, is Redis pub/sub support: http://redis.io/topics/pubsub. But then you will have to maintain Redis servers, which I don't like.