Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am new to ZooKeeper and Apache Curator and need your help to design a program:
I need to create a Java program that will run a script every hour (based on a cron expression provided by the end user).
Consider that I have 3 servers. I need to make sure the script runs every hour without failure, even if a server is down (in that case the script must run on another server). Each hour the script should run on only one server.
I have to create an interface to provide input to this Java program. The input will be (i) the script to be run and (ii) the cron expression to schedule the script.
1) Please suggest how I can design my program to achieve this, and how ZooKeeper and Apache Curator can be used for it.
2) Is there any way to cache the script that the end user provides on these 3 servers?
Can Apache Curator's NodeCache be used to cache the script on these 3 servers?
Your response will be highly appreciated.
With three servers, where one is to run no matter what, you need a distributed approach. The problem is that in the event of failures, you might not be able to solve the puzzle of whether to run the script or not.
For a start, you can just have one computer connect to the others and tell them not to run. This is called a "hold down" approach, but it has a lot of issues when you can't connect to the other computers. The problem is that most beginning programmers don't fully appreciate how a networked environment changes the way programs need to be designed. Please take a little time to read over the typical fallacies of distributed computing.
Cron sidesteps this by not caring what happens on other computers, so cron has the wrong design goals.
With three computers, you will also have three different clocks, with their own speeds and times. A good distributed solution will have some concept of time that doesn't directly rely on each machine's clock.
Distributed solutions (if they are to tolerate faults or failures) must be able to run without reliable communication to the other machines. Sometimes the group gets split in half, where one group of machines cannot communicate with the other group. In many cases, both groups will perform the "critical" action in fear that the other group didn't. In other cases, both groups might not perform the "critical" action, assuming that the other group did. A good solution will ensure that the "critical action" is performed once, even when the computers cannot communicate. Often this is done by "majority", where your group (quorum) cannot perform a critical action if you don't have access to at least a majority of the involved machines.
Look at the Paxos algorithm to get an idea of the issues. Once you are more aware of the problems, look back at your chosen technologies to determine which parts of the problems they are attempting to solve, considering the "fallacies of distributed computing". Also realize that a perfect, 100% correct solution might not be possible, because the pre-selected machine(s) to run the script might suffer a network failure and then a power failure, in sequence, in such a manner that the machines still up just assume there's only a network outage.
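As a minimal sketch of the "majority" rule described above (not taken from ZooKeeper, Curator, or any Paxos implementation), a node can refuse to act unless it and the peers it can reach form a strict majority of the configured cluster. The class name QuorumCheck and the method hasQuorum are hypothetical, and how a node counts its reachable peers is assumed to be handled elsewhere.

```java
/** Majority-quorum check: an illustrative sketch, not a library API. */
final class QuorumCheck {
    private QuorumCheck() {}

    /**
     * @param reachablePeers number of OTHER cluster members this node can currently reach
     * @param clusterSize    total number of configured members, including this node
     * @return true if this node plus its reachable peers form a strict majority
     */
    static boolean hasQuorum(int reachablePeers, int clusterSize) {
        int visible = reachablePeers + 1;      // count this node as well
        return visible > clusterSize / 2;      // strict majority, so two halves can never both win
    }
}
```

With three servers, a node that can reach one peer sees 2 of 3 members and may proceed, while an isolated node sees 1 of 3 and must not run the critical action. With four servers, a 2/2 split leaves neither side with a majority.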
This is an interview question, right? If yes, be aware that this answer only gets you partway.
The simplest solution is to have all three servers running, and attempt to acquire a lock to perform the processing. See http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks
To ensure that only one server runs the job, you will need to record the last execution time. This is simply "store a value with known key," and you'll find it in one of the intro tutorials.
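The "store a value with known key" idea can be sketched roughly as follows. This is only an illustration: the ConcurrentHashMap is a local stand-in for the shared store, and in a real deployment the putIfAbsent call would have to be an atomic create-if-absent against the coordination service. The names HourlySlotClaim, slotKey, and tryClaim are hypothetical. Note that the key is derived from the caller's clock, so the clock-skew concern raised in this thread still applies.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Records which server claimed each hourly run (illustrative sketch). */
final class HourlySlotClaim {
    // Assumption: stand-in for a shared store; a local map is NOT shared between servers.
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** Known key for the run that 'now' belongs to, truncated to the hour. */
    static String slotKey(Instant now) {
        return "runs/" + now.truncatedTo(ChronoUnit.HOURS);
    }

    /** Returns true only for the first server to claim the slot for this hour. */
    boolean tryClaim(String serverId, Instant now) {
        return store.putIfAbsent(slotKey(now), serverId) == null;
    }

    /** Returns the server that claimed the slot, or null if nobody has yet. */
    String owner(Instant now) {
        return store.get(slotKey(now));
    }
}
```

Each server would call tryClaim when its schedule fires and run the script only if the call returns true.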
Of course, if this is an interview question, the interviewer will ask follow-on questions such as "what happens if the script fails halfway through?" or "what if the computers don't have the same time?" You won't (easily) solve either of those problems with ZooKeeper.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Background:
I'm working on a web-based application built with Spring MVC and Angular. We have a help desk module that agents use for customer care. The application is deployed on a single server. We have a ticket locking mechanism, i.e. when an agent opens a ticket to start working on it, the ticket is locked to that agent so that other agents cannot work on the same ticket at the same time. As soon as the agent closes the ticket, it is available for other agents to open and update if needed. For locking ticket to avoid too much DB calls we have implemented ConcurrentHashMap so that everyone gets updated about the locked ticket through the same map. This is working absolutely fine.
Issue:
Now the application is deployed on two different servers and the ConcurrentHashMap approach no longer works, because each server maintains its own map. If one user locks a ticket through node 1 and a second user's request goes to node 2, the second node knows nothing about the lock. We are planning to change the flow to avoid this. At the same time, we don't want to save the locking details directly to the DB, to avoid DB I/O, as this is a very frequently used area of the application.
Options
After doing some R&D I got the following options that we can implement, keeping the persistence in mind.
We can implement the In-Memory table concept using MSSQL or Redis
RabbitMQ
We can implement an API that will be deployed on a single node and both of our servers will use that to maintain locking tickets but we still have two problems with this calling API would be time taking and 2nd it's not persisting the data, if the server will get restarted we will lose the data.
Can anyone advise me on which approach would be good for the above case and how to implement it? I just need a starting point.
thanks in advance.
I think your real problem is this:
For locking ticket to avoid too much DB calls [ you decided not to use the database ].
IMO, that was a mistake. A database call to acquire a "lock" on a ticket is unlikely to result in too many database calls.
In analyzing this, you need to consider how often someone will want to start working on a ticket, and how often it is likely to fail because someone is already working on the ticket. I don't know your use-case details, but I would be very surprised if the latter event happens more often than once per second.
If your database cannot sustain one "small" database operation per second (worst case!) for locking, then it won't be able to sustain the larger transactions involved in creating tickets, agents updating them, user reading them, and so on.
So suggestions are:
Work out what the actual database load for ticket locking will be ... relative to all of the other things that the database needs to do.
If it is small, just go back to the database for ticket locking. Keep it simple!
If it is large; either:
Scale up or scale out the existing database; e.g. use sharding. It seems likely that you will need to do this anyway. That should give you the "headroom" to use the existing database for locking as well.
Create a separate database server for the locking. It is unlikely that it will need to be big, and I can't envisage that it needs to be very fast. (See below!!)
Use one of your proposed solutions.
But my main advice is to AVOID the trap of premature optimization. You seem to be designing for bottlenecks that you think will exist without any clear evidence for this. For example:
"We can implement an API that will be deployed on a single node and both of our servers will use that to maintain locking tickets but we still have [the problem] with this calling API would be time taking ..."
Unless the time taken is multiple seconds, this is unlikely to be a real problem. The best strategy is to implement the system first the simple way and then measure performance to see 1) whether optimization effort is warranted and 2) where the real bottlenecks are in the complete system.
In your case, I doubt that the users will care if it takes (say) 1 second versus 2 seconds to be told that someone else is already working on a ticket.
Finally, wouldn't it be simpler to use an existing off-the-shelf ticketing system? There are many of them out there: commercial products, open source, hosted, etcetera. (OK, it is probably too late for this, because it sounds like you are committed to implementing your own ticketing system from scratch. But it may not be too late to reconsider your strategy.)
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
In many examples that discuss synchronization, it is mentioned something to the effect, "This will work in a single-threaded application, but if multi-threaded..."
I am puzzled because it seems to me possible, although perhaps incredibly unlikely, that even in a single-threaded operation, there can be problems similar to those mentioned in multi-threaded applications.
Say you have an object that has a status field that reflects whether it has been placed into a queue. Pseudo-code would be: object.setStatus(INQUEUE); placeInQueue(object);
Now, if it was somehow possible for the status to be successfully set but the next line of code "silently fail" and the program continue, would we not have a problem wherein we think the object is in the queue but is not? Maybe the idea that we could have a silent failure is false.
But if it is somehow possible for the above problem to occur, how would we make it so that the two lines of code either both execute or both fail?
Logic bugs, that is a flaw in the algorithm, can and regularly do still happen in single threaded applications. If such a problem exists in a single threaded implementation of the algorithm, then it will only get worse when one tries to make it multi-threaded.
The quote "This will work in a single-threaded application, but if multi-threaded..." was talking about a class of problem that gets introduced by the nature of being concurrent. For example, if I was in the kitchen baking a cake by myself, I would not have to worry about bumping into another chef. I would, however, still have to worry about burning my hands on the oven and bumping my hip on the counter.
The scenario that you describe, using a queue that is backed by disk, is another example of parallelism. Even though our application logic is single-threaded, other processes can be writing to disk while our process is working with the queue, so it is possible for the disk to run out of space through no fault of our program. Handling such problems can become quite involved. The two basic approaches are to either lock out a resource for a period of time, or optimistically assume that one will succeed and then handle an error when it fails later. The example that you gave was an example of the latter, only without the error handling. A silent failure in that scenario can happen in real systems that ignore the problem, and they are broken.
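To make the "handle the error" half of the optimistic approach concrete, here is a small sketch under the assumption that the queue reports failure through a return value, as java.util.Queue.offer does. The names QueueStatusExample, Item, and enqueue are hypothetical and only mirror the pseudo-code in the question. The status is written only after the enqueue is known to have succeeded, so the two steps cannot disagree.

```java
import java.util.Queue;

/** Keeps an item's status consistent with whether it is really queued (sketch). */
final class QueueStatusExample {
    enum Status { NEW, IN_QUEUE }

    static final class Item {
        Status status = Status.NEW;
    }

    /**
     * Enqueues first and records the status afterwards, so a failed enqueue
     * can never leave the item claiming to be in the queue.
     */
    static boolean enqueue(Queue<Item> queue, Item item) {
        if (!queue.offer(item)) {      // offer returns false instead of failing silently
            return false;              // status stays NEW; the caller decides what to do
        }
        item.status = Status.IN_QUEUE;
        return true;
    }
}
```

With a bounded queue such as an ArrayBlockingQueue of capacity 1, the first call returns true and marks the item IN_QUEUE, while a second call on a different item returns false and leaves that item NEW.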
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Wikipedia defines an execution unit as:
"In computer engineering, an execution unit (also called a functional unit) is a part of a CPU that performs the operations and calculations called for by the computer program."
Now, is it a logical or conceptual thing performing the operations of the program? Or is it a physical (hardware) structure in CPU which performs the tasks called for by the program (e.g. shutting down the computer, changing the colors etc. ) ?
And I have read that "In concurrent programming, there are two units of execution i.e. processes and threads."
Now, the concept I have made in my mind is that a unit of execution is, let's say a package of related classes as well as the system resources being used by them e.g. system's memory and other resources.
Please tell me to what extent am I right?
NOTE: Please keep your language (i.e. jargon and terminology you might use) simple enough for a beginner to understand.
Thank you in advance.
It appears to me that execution unit refers to hardware, specifically a portion of the computer's brain that can work at the same time as other parts on a different task. It seems to allow simple multi-tasking, as implied by Wikipedia's article on the execution unit. The article explains that Superscalar Architecture involves multiple execution units fetching commands at the same time.
An execution unit is like a worker. He has a job and does it until he is finished. Then he asks his boss what to do next and works on that. When you have multiple workers, you get more work done faster. An execution unit does low level tasks like 1+1.
Moving on to unit of execution, it appears this is more about how the software runs, as evidenced by this Microsoft article. A unit of execution, such as a thread, manages high level tasks involving many small steps like conquerTheWorld().
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a small question. I am trying to make my application as secure as possible, so is it possible to make a setup file that will run only once and after that will not run on any PC? It sounds STUPID, but is it possible? I don't have any code to show; I just want to know whether we can do this in Java.
The setup wizard doesn't actually control when, or if, it runs. Nor does it control how many times one wants it to run. So the direct answer to your question is "no"; however, it is quite possible (and even desirable) to have the setup wizard check for "artifacts" of having been run previously.
If you have the setup wizard detect a file or setting which only the wizard would be likely to create, and then shut down if it is detected, then you can effectively guard against the critical section of the wizard being run twice.
You could have the application connect to a website to check if it has been installed; however, it would require an internet connection at the time of installation.
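One way to sketch the artifact check is with a marker file that the wizard creates atomically. The class name SetupGuard, the method claimFirstRun, and the choice of marker path are assumptions made for illustration. As other answers in this thread point out, this only guards against accidental re-runs: anyone who deletes the marker or copies the installer to another machine bypasses it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Detects a previous run of the setup wizard through a marker file (sketch). */
final class SetupGuard {
    private SetupGuard() {}

    /**
     * Tries to create the marker file. Files.createFile checks for existence and
     * creates the file in one atomic step, so two concurrent runs cannot both win.
     *
     * @return true if this call created the marker (first run), false if it already existed
     */
    static boolean claimFirstRun(Path marker) {
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;                          // the wizard has left its artifact before
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // e.g. missing directory or no permission
        }
    }
}
```

The wizard would call claimFirstRun at the start of its critical section and shut down when the result is false.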
Then during the installation you would send notice of its installation.
I use a system like this. At time of download it generates a serial number and inserts it into a file that is later read by the installation system and used to "register" the product during installation.
No. You can put some reasonable steps in place to make it more difficult, but anyone truly interested in breaking your security mechanism will likely be able to.
Anything you create can be copied and executed any number of times, even if the running copy deletes itself afterwards.
This leads to the requirement of external authentication against some server each time the setup program is run. This however is also not guaranteed to work, just look at how easy/quick video game DRM is to crack as an example.
You could read from a file that contains either true or false. Wrap the start-up of the wizard in an if statement to only execute if this variable is false, then at the end of the wizard change the file to say true. If you want to make it more secure you could encrypt the file and then decrypt to see what it says.
"Is it possible" is always a tricky question, for many reasons.
I think it is unlikely that you will be able to create this to work the way you want, simply because you're talking about a security question, and if anyone is seriously interested in violating your security measures, then there will be someone better at breaking than you are at locking.
Whether your software will inspire that sort of interest, I have no way of knowing. The important question in security is "can I make this secure enough for my purposes?", and we don't know enough about your requirements, expected threat models, and so forth.
All in all, the best answer I can give you is: if you want security done right, don't do it yourself. Go to a professional and have them secure it. You want to know enough about security to evaluate the professional, so you have some hope of getting what you pay for, but you don't want to try to write that code. You might be good at writing spreadsheets or mail clients or whatever you're writing, but you're clearly not good at writing security, and it's not something you learn in a day.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I want to make a connection between two parts of my program, which can be located in different places.
I have some choices for making this connection:
1) using RPC/RMI: for every request, a method call is sent to the second part
2) using a normal function call
3) using a queue (in memory): every request is placed in a queue, and the second part takes it from there and answers the request
4) using a queue (in DB): like number 3, but in a DB
5) using a socket for sending data (TCP/IP or UDP or ...)
6) using a web service
can anyone compare these available ways?
Here are my thoughts:
RPC/RMI - Requires the RMI/IIOP protocol over the wire, which limits you to using Java for both the client and the server.
Normal function call means both objects are resident in the same JVM and cannot be distributed. This will be the fastest option for a single method call. Reusing that object means having to package it in a JAR and redistribute it to all the other apps that need it. Now you've got to know where all those JARs are if the code changes. Distribution is an issue.
Asynchronous processing, but you'll have to write all the queue and handling code. This would take strong multi-threading skills. Could be the fastest of all, because it's all in memory and would allow parallel processing if you had multiple cores. It's also the most dangerous, because you have to be thread-safe.
Don't understand why you'd have the queue in a database. I'd prefer a Java EE app server for doing this. Not all RDBMS have queues running inside them. If you agree and go with JMS, this will be asynchronous and distributed and robust. It'll allow topics or queues, which can be flexible. But it'll be slower than the others.
Using a socket is just like RMI, except you have to write the entire protocol. Lots of work.
A web service will be similar to RMI in performance. But it'll use HTTP as the protocol, which means that any client that can formulate an HTTP request can call it. REST or SOAP will give you flexibility about message choices (e.g., XML, JSON, etc.)
Synchronous calls mean the caller and callee are directly coupled. The interface has to be kept constant.
Asynchronous calls mean looser coupling between the caller and callee. Like the synchronous case, the messages have to be relatively stable.
UPDATE: The picture you added makes the problem murkier. The interaction with 3rd party merchant and card handlers makes your error situation dicier. What happens if one of those fails? What if either one is unavailable? If the bank fails, how do you communicate that back to the 3rd parties? Very complicated, indeed. You'll have bigger problems than just choosing between RMI and web services.
There's quite a lot to answer there. Briefly:
If your program is split up into a client/server, you can't use a normal method call.
Queuing would possibly work if your method calls are one-way (have no return value). However, if you want to return values synchronously, what do you do? Have two queues, one for outgoing requests and one for incoming results? And what do you do about exceptions? It's not a natural fit.
Web services will work. However they're often used for bridging between client/servers on different platforms and written in different languages, so it may be a lot of unnecessary work here. The same applies to CORBA, btw.
A TCP socket solution would work, but requires quite a lot of extra work to set up (if you want to invoke separate methods etc.). Note (also) that TCP and UDP are fundamentally different and for reliability purposes I wouldn't use UDP normally for this sort of stuff.
RMI is pretty straightforward to set up in Java, and that would probably be a good first step. Check out the RMI tutorial here.
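To illustrate the extra plumbing that the queuing point above refers to, here is a rough in-memory sketch in which each request carries its own reply slot, so results and exceptions can travel back to the caller. All names (QueueRpcSketch, Request, submit, serveOne) are hypothetical, and the upper-casing is a placeholder for real work. It only works inside a single JVM: a CompletableFuture cannot cross a process boundary, so a truly distributed version would need something like a reply queue with correlation IDs instead.

```java
import java.util.Locale;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Request/response on top of a one-way in-memory queue (illustrative sketch). */
final class QueueRpcSketch {
    /** A request that carries its own reply slot. */
    static final class Request {
        final String payload;
        final CompletableFuture<String> reply = new CompletableFuture<>();

        Request(String payload) {
            this.payload = payload;
        }
    }

    private final Queue<Request> requests = new ConcurrentLinkedQueue<>();

    /** Caller side: enqueue the request and get a future for the eventual result. */
    CompletableFuture<String> submit(String payload) {
        Request r = new Request(payload);
        requests.add(r);
        return r.reply;
    }

    /** Worker side: handle one pending request; returns false if the queue was empty. */
    boolean serveOne() {
        Request r = requests.poll();
        if (r == null) {
            return false;
        }
        try {
            r.reply.complete(r.payload.toUpperCase(Locale.ROOT)); // placeholder for real work
        } catch (RuntimeException e) {
            r.reply.completeExceptionally(e);                     // failure goes back to the caller
        }
        return true;
    }
}
```

A caller would typically run serveOne on a worker thread and wait on the future returned by submit, which is considerably more machinery than a plain method call or an RMI stub.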
Option (2) will only work if both parts of your program are running in the same JVM.
Options (3) and (4) only work if you don't mind the calls being asynchronous, i.e. not returning a result directly to the caller.
Options (5) and (6) take a lot of work to set up; you'd be better off with (1).