I am thinking about how to build a real-time system in Java without the Sun Real-Time System API.
Say a boss generates an event at 11:00 am; he has to get feedback by 11:10 am. If there is no feedback, he will resend his event.
The staff member gets his boss's event at 11:01 am and has to leave 1 minute for sending his result back to his boss. So he actually has 8 minutes to do his job. At 11:09 am, he has to send feedback whether he has finished or not.
This is a real-time system, isn't it?
In this case, how should we design the system in Java? This is a producer-consumer pattern. On the consumer side, which object (BlockingQueue, DelayQueue, ...) should we use to meet this requirement?
Any web link or open-source project would be welcome.
Thanks.
You cannot do real-time programming in the real computer-engineering sense in Java. You are at the mercy of a thread scheduler and an operating system with totally unknown underlying properties. If the OS felt like waiting until 11:20 before it got back around to giving the JVM some CPU time, that's its business.
If you mean "realtime" in the Microsofty way, as in "things respond really really fast and we're careful never to block the main UI thread", that doesn't have a well-defined technical meaning. It just means "architect user-facing code to give the appearance that users don't have to wait on the computer."
--edit in response to comment.
If by 11:08 you mean 'between 11:07:59 and 11:08:01', then regular Java can generally do that for you on a modern platform with careful programming. What it can't deliver is a situation where the event happening at 11:08:01 is considered a platform defect; it simply doesn't make that guarantee.
When we say 'real time', and what the RTS API is for, we mean a situation more like: "The bonding head must be at these coordinates at exactly this millisecond; if it's more than half a millisecond late, the part will be defective, and if it's more than 2 milliseconds early, a $300,000 servo table is going to crash into its bearings and cause a $10,000,000 assembly-line outage."
The system you described can be solved with JMS.
Use a pub-sub JMS topic to assign the work. The "boss" is the publisher. Each member of the staff is a "subscriber".
The "boss" will need to store each message it publishes in a "check back" area (perhaps a list) and set a timer for 10 minutes. When it gets a response to a message, it will clear the timer and remove the message from the "check back" area.
Related
I have a scenario where I have some timing data that I get from a MIDI file. When a MIDI note needs to be played, I need to send a command over UDP. Basically, I have instructions that say "play note A, wait 125ms, play note B, wait 300ms, play note C..." and each time I "play note X" I need to send data over UDP. I have tried using both a TimerTask and a simple thread with a loop that checks the system time, calculates how much time has elapsed, and decides whether or not to play a note based on that, but both methods seem to have timing issues. The TimerTask doesn't run exactly on the specified interval (which is stated in the documentation), so I get erratic messages. The thread works better, but it still hiccups sometimes, which I assume is because other threads are getting priority over it.
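For reference, the thread-with-loop version I tried looks roughly like this (a simplified sketch; NoteEvent and the UDP details are stand-ins for my actual code):

import java.net.*;
import java.util.*;

class NoteEvent { long offsetMillis; byte[] payload; }

class NotePlayer {
    void play(List<NoteEvent> events, InetAddress host, int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            long start = System.currentTimeMillis();
            for (NoteEvent e : events) {               // sorted by offsetMillis
                long target = start + e.offsetMillis;  // absolute target, to avoid cumulative drift
                while (System.currentTimeMillis() < target) {
                    Thread.sleep(1);                   // sleep granularity + scheduling cause the hiccups
                }
                socket.send(new DatagramPacket(e.payload, e.payload.length, host, port));
            }
        }
    }
}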
Is there a better way to send this data with more accurate timing? Is there something I can use like the Clip interface in Java that is used for playing audio?
Any assistance is very much appreciated.
This is an approach just about doomed to failure. Let me count the issues here:
1) Android is not a real-time OS. Neither is Linux (which it's built on). Expecting ms-level timings to happen exactly on schedule is never going to work. Even if the clock is accurate enough to interrupt at a 1 ms rate, there's no assurance that Linux will schedule your thread for wakeup.
2) TimerTasks aren't promised to be accurate even to the degree allowed by 1).
3) You're then sending it somewhere via UDP? That's a protocol with no assurance as to delivery or timing, to a receiver that will then do something with it, and that receiver may have additional timing issues of its own.
My advice would be to throw out this entire approach and start over. Every single step of this says bad idea.
For the last few years we have used our own RM Application to process events related to our applications. It works by polling a database table every few minutes, looking for any rows with a due date before now that have not yet been processed.
We are currently making the transition to SNS, with SQS worker tiers processing the messages. The problem with this approach is that we can't future-date our messages. Our applications sometimes have events that we don't want to process until a week later.
Are there any design approaches, alternative services, or clever tricks we could employ that would allow us to achieve this?
One solution would be to keep our existing application running at a simplified level, so that all it does is send the SNS notifications when they are due, but the aim of this project is to do away with our existing app.
The database approach would be the wisest, being careful that each row is only processed once.
Amazon Simple Notification Service (SNS) is designed to send notifications immediately. There is no functionality for a delayed send (although some notification types are retried if they fail).
Amazon Simple Queue Service (SQS) does have a delay feature, but only up to 15 minutes -- this is useful if you need to do some work before the message is processed, such as copying related data to Amazon S3.
Given that your requirement is to wait until some future arbitrary time (effectively like a scheduling system), you could either start a process and tell it to sleep for a certain amount of time (a bad idea in case systems are restarted), or continue your approach of polling from a database.
If all jobs are scheduled for the distant future (e.g. at least one hour away), you theoretically only need to poll the database once an hour to retrieve the earliest scheduled time.
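A sketch of that polling step with plain JDBC (the table and column names are made up for illustration); the key point is claiming each row atomically so it is only processed once, even with several pollers running:

import java.sql.*;

public class NotificationPoller {
    static void pollOnce(Connection db) throws SQLException {
        try (PreparedStatement select = db.prepareStatement(
                "SELECT id FROM notifications WHERE status = 'PENDING' AND due_at <= now()");
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                // The status check in the WHERE clause makes the claim atomic:
                // only one poller's UPDATE will report 1 row changed.
                try (PreparedStatement claim = db.prepareStatement(
                        "UPDATE notifications SET status = 'SENT' WHERE id = ? AND status = 'PENDING'")) {
                    claim.setLong(1, id);
                    if (claim.executeUpdate() == 1) {
                        publishToSns(id); // hand the due event to SNS here
                    }
                }
            }
        }
    }

    static void publishToSns(long id) { /* sns.publish(...) */ }
}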
A week might be too long, as SQS message retention itself is capped at 14 days. If you are okay with a maximum retention of 14 days, one idea is to keep changing the visibility of a message every time you receive it, until it is ready for processing. The maximum allowed visibility timeout is 12 hours. More on visibility timeouts and the APIs for changing them:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ChangeMessageVisibility.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
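Here is a sketch of that visibility-postponing loop with the AWS SDK for Java (v1); the queue name and the "processAt" message attribute are assumptions for illustration:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.*;

public class DelayedConsumer {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = sqs.getQueueUrl("my-delayed-queue").getQueueUrl();

        ReceiveMessageResult result = sqs.receiveMessage(
                new ReceiveMessageRequest(queueUrl).withMessageAttributeNames("processAt"));
        for (Message m : result.getMessages()) {
            long processAt = Long.parseLong(
                    m.getMessageAttributes().get("processAt").getStringValue());
            long remainingSecs = (processAt - System.currentTimeMillis()) / 1000;
            if (remainingSecs <= 0) {
                process(m);
                sqs.deleteMessage(queueUrl, m.getReceiptHandle());
            } else {
                // Hide the message again, up to the 12-hour visibility cap,
                // and repeat on the next receive until the due time is reached.
                int timeout = (int) Math.min(remainingSecs, 12 * 3600);
                sqs.changeMessageVisibility(queueUrl, m.getReceiptHandle(), timeout);
            }
        }
    }

    private static void process(Message m) { /* application logic */ }
}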
I found this approach: https://github.com/alestic/aws-sns-delayed. Basically, you can use a Step Functions state machine with a Wait state in it.
So, I've recently been injected with the Node virus that is spreading through the programming world very fast.
I am fascinated by its "Non-Blocking IO" approach and have indeed tried out a couple of programs myself.
However, I fail to understand certain concepts at the moment.
I need answers in layman's terms (I'm someone coming from a Java background).
1. Multithreading & Non-Blocking IO.
Let's consider a practical scenario. Say, we have a website where users can register. Below would be the code.
..
..
// Read HTTP Parameters
// Do some Database work
// Do some file work
// Return a confirmation message
..
..
In a traditional programming language, the above happens sequentially. And if there are multiple requests for registration, the web server creates a new thread for each, and the rest is history. Of course, programmers can create threads of their own to work on lines 2 and 3 (the database and file work) simultaneously.
In Node, as I understand it, lines 2 & 3 will run in parallel while the rest of the program executes, and the interpreter polls lines 2 & 3 every 'x' ms.
Now, my question is: if Node is a single-threaded language, what does the work of lines 2 & 3 while the rest of the program is being executed?
2. Scalability
I recently read that LinkedIn has adopted Node as a back-end for their mobile apps and has seen massive improvements.
Can anyone explain how it has made such a difference?
3. Adoption in other programming languages
If people are claiming that Node makes such a difference when it comes to performance, why haven't other programming languages adopted this non-blocking IO paradigm?
I'm sure I'm missing something. It would be helpful if you could explain this to me and guide me to some links.
Thanks.
A similar question was asked and probably contains all the info you're looking for: How the single threaded non blocking IO model works in Node.js
But I'll briefly cover your 3 parts:
1.
Lines 2 and 3 in a very simple form could look like:
db.query(..., function(query_data) { ... });
fs.readFile('/path/to/file', function(file_data) { ... });
Now the function(query_data) and function(file_data) are callbacks. The functions db.query and fs.readFile will send the actual I/O requests but the callbacks allow the processing of the data from the database or the file to be delayed until the responses are received. It doesn't really "poll lines 2 and 3". The callbacks are added to an event loop and associated with some file descriptors for their respective I/O events. It then polls the file descriptors to see if they are ready to perform I/O. If they are, it executes the callback functions with the I/O data.
I think the phrase "Everything runs in parallel except your code" sums it up well. For example, something like "Read HTTP parameters" would execute sequentially, but I/O functions like in lines 2 and 3 are associated with callbacks that are added to the event loop and execute later. So basically the whole point is it doesn't have to wait for I/O.
2.
Because of the things explained in 1., Node scales well for I/O intensive requests and allows many users to be connected simultaneously. It is single threaded, so it doesn't necessarily scale well for CPU intensive tasks.
3.
This paradigm has been used with JavaScript because JavaScript has support for callbacks, event loops and closures that make this easy. This isn't necessarily true in other languages.
I might be a little off, but this is the gist of what's happening.
Q1. " what does the job of lines 2 & 3 while the rest of the program is being executed?"
Answer: "Nothing". Lines 2 and 3 each themselves start their respective jobs, but those jobs cannot be done immediately because (for example) the disk sectors required are not loaded in yet - so the operating system issues a call to the disk to go get those sectors, then "Nothing happens" (node goes on with it's next task) until the disk subsystem (later) issues an interrupt to report they're ready, at which point node returns control to lines #2 and #3.
Q2. single-thread non-blocking dedicates almost no resources to each incoming connection (just some housekeeping data about the connected socket). It's very memory efficient. Traditional web servers "fork" a whole new process to handle each new connection - that means making a humongous copy of every bit of code and data variables needed, and time-slicing the CPU to deal with it all. That's massively wasteful of resources. Thus - if your load is a lot of idle connections waiting for stuff, as was theirs, node makes loads more sense.
Q3. almost every programming language does already have non-blocking I/O if you want to use it. Node is not a programming language, it's a web server that runs javascript and uses non-blocking I/O (eg: I personally wrote my own identical thing 10 years ago in perl, as did google (in C) when they started, and I'm sure loads of other people have similar web servers too). The non-blocking I/O is not the hard part - getting the programmer to understand how to use it is the tricky bit. Javascript happens to work well for that, because those programmers are already familiar with event programming.
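To make that concrete for a Java reader, here is a small sketch of the same callback style using the JDK's own asynchronous file API (the path is illustrative):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AsyncReadDemo {
    public static void main(String[] args) throws Exception {
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                Paths.get("/path/to/file"), StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(4096);

        // read() returns immediately; the handler is the callback,
        // invoked later when the I/O completes -- just like fs.readFile.
        channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            public void completed(Integer bytesRead, ByteBuffer buf) {
                System.out.println("read " + bytesRead + " bytes");
            }
            public void failed(Throwable exc, ByteBuffer buf) {
                exc.printStackTrace();
            }
        });

        System.out.println("read submitted; the main thread keeps going");
        Thread.sleep(1000); // crude wait so the demo doesn't exit before the callback runs
    }
}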
Even though node.js has been around for a few years, its performance model is still a bit mysterious.
I recently started a blog and decided that the node.js model would be a good first topic since I wanted to understand it better myself and it would be helpful to others to share what I learned. Here are a couple of articles I wrote that explain the high level concepts and some tradeoffs:
Blocking vs. Non-Blocking I/O – What’s going on?
Understanding node.js Performance
What is both faster and "better practice": using a polling system or an event-based timer?
I'm currently having a discussion with a more senior coworker regarding how to implement some mission critical logic. Here is the situation:
A message giving an execution time is received.
When that execution time is reached, some logic must be executed.
Now multiple messages can be received giving different execution times, and the logic must be executed each time.
I think that the best way to implement the logic would be to create a timer that would trigger the logic at the time given in the message, but my coworker believes that I would be better off polling a list of the messages to see if the execution time has been reached.
His argument is that the polling system is safer, as it is less complicated and thus less likely to be screwed up by the programmer. My argument is that by implementing it my way, we reduce the computational load and thus are more likely to execute the logic when we actually want it to execute. How should I implement it and why?
Requested Information
The only time my logic would ever be utilized would almost certainly be at a time of the highest load.
The requirements do not specify how reliable the connection will be, but everyone I've talked to has stated that they have never heard of a message being dropped.
The scheduling is based on an absolute system. So the message will have an execution time specifying when an algorithm should be executed. Since there is time synchronization, I have been instructed to assume that the time will be uniform among all machines.
The algorithm that gets executed uses some inputs which initially are volatile but soon stabilize. By postponing the processing, I hope to use the most stable information available.
The java.util.Timer effectively does what your colleague suggests (truth be told, in the end, there really aren't that many ways to do this).
It maintains a collection of TimerTasks and waits for new activity on it, or until the time has come to execute the next task. It doesn't poll the collection; it "knows" that the next task will fire in N seconds and waits until that happens or something else does (such as a TimerTask being added or deleted). This is better overall than polling, since it spends most of its time sleeping.
So, in the end, you're both right -- you should use a Timer for this, because it basically does what your coworker wants to do.
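A minimal sketch of that, assuming the message carries an absolute execution time (the surrounding class is illustrative):

import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

public class MessageScheduler {
    private final Timer timer = new Timer(true); // daemon thread

    // Called once per incoming message; executionTime is the absolute time from the message.
    public void onMessage(final String payload, Date executionTime) {
        timer.schedule(new TimerTask() {
            @Override public void run() {
                execute(payload); // the mission-critical logic
            }
        }, executionTime);
    }

    private void execute(String payload) { /* ... */ }
}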
Given the following facts, is there an existing open-source Java API (possibly as part of some greater product) that implements an algorithm enabling the reproducible ordering of events in a cluster environment?
1) There are N sources of events, each with a unique ID.
2) Each event produced has an ID/timestamp which, together with its source ID, makes it uniquely identifiable.
3) The IDs can be used to sort the events.
4) There are M application servers receiving those events. M is normally 3.
5) The events can arrive at any one or more of the application servers, in no specific order.
6) The events are processed in batches.
7) The servers have to agree, for each batch, on the list of events to process.
8) The events each have an earliest and a latest batch ID in which they must be processed.
9) They must not be processed earlier, and are "failed" if they cannot be processed before the deadline.
10) The batches are based on real clock time. For example, one batch per second.
11) The events of a batch are processed when 2 of the 3 servers agree on the list of events to process for that batch (quorum).
12) The "third" server then has to wait until it possesses all the required events before it can process that batch too.
13) Once an event is processed or failed, the source has to be informed.
14) [EDIT] Events from one source must be processed (or failed) in the order of their ID/timestamp, but there is no causality between different sources.
Less formally, I have those servers that receive events. They start with the same initial state and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP but "in a bit", so I have some time to get the servers to agree before the deadline. But I'm not sure if that actually makes any real difference to the algorithms. And if all servers agree on all batches, then they will always be in sync, therefore presenting a consistent view when queried.
While I would be most happy with a Java API, I would accept something else if I can call it from Java. And if there is no open-source API, but a clear algorithm, I would also take that as an answer and try to implement it myself.
Looking at the question and your follow-up, there probably wasn't an API to satisfy your requirements at the time. Today you could take a look at Kafka (from LinkedIn):
Apache Kafka
And the general concept of "a log" entity, in what folks like to call 'big data':
The Log: What every software engineer should know about real-time data's unifying abstraction
Actually, for your question I'd begin with the blog post about "the log". In my terms, the way it works (and Kafka isn't the only package out there doing log handling) is as follows:
- Instead of queue-based message-passing / publish-subscribe, Kafka uses a "log" of messages.
- Subscribers (or end-points) can consume the log.
- The log guarantees to be "in order"; it handles giga-data and is fast.
- Double-check on that guarantee; there's usually a trade-off for reliability.
- You just read the log; reads are non-destructive by default (entries expire via a retention policy).
- If there's a subscriber group, everyone can "read" before the log entry dies.
The basic handling (compute) process for the log is a map-filter-reduce model: you read everything really fast, keep only the stuff you want, and process it (reduce) to produce outcome(s).
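To give a feel for "just reading the log", a minimal Kafka consumer loop in Java looks roughly like this (broker address and topic name are illustrative; double-check against the client version you use):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventLogReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "event-servers"); // the subscriber group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // Each poll returns the next in-order slice of the log for our partitions.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
                }
            }
        }
    }
}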
The downside seems to be that you need clusters and stuff to make it really shine. Since different servers or sites were mentioned, I think we are still on track. I found it finicky to get up and running with the Apache downloads, because they tend to assume non-Windows environments (ho hum).
The other 'fast' option would be
Apache Apollo
which would need you to do the plumbing for connecting the different servers yourself. Since the requirements include ...
servers that receive events. They start with the same initial state, and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP, but "in a bit", so that I have some time to get the servers to agree before the deadline
I suggest looking at a "Getting Started" example or tutorial with Kafka, and then at similar ZooKeeper-organised message/log software. Good luck, and enjoy!
So far I haven't got a clear answer, but I think it would be useful for anyone interested to see what I found.
Here are some theoretical discussions related to the subject:
Dynamic Vector Clocks for Consistent Ordering of Events
Conflict-free Replicated Data Types
One way of making multiple concurrent processes wait for each other, which I could use to synchronize the "batches", is a distributed barrier. One Java implementation seems to be available on top of Hazelcast, and another uses ZooKeeper.
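For example, with Apache Curator (a ZooKeeper client library), a double barrier that makes all three servers enter and leave a batch together might look like this sketch (connection string and paths are illustrative):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.barriers.DistributedDoubleBarrier;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class BatchBarrier {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        DistributedDoubleBarrier barrier =
                new DistributedDoubleBarrier(client, "/barriers/batch-42", 3);
        barrier.enter(); // blocks until all 3 servers have arrived
        // ... process the agreed batch ...
        barrier.leave(); // blocks until all 3 servers are done
    }
}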
One simpler alternative I found is to use a DB. Every process inserts all the events it receives into the DB. Depending on the DB design, this can be fully concurrent and lock-free, as in VoltDB, for example. Then, at a regular interval of one second, a "cron job" runs that selects and marks the events to be processed in the next batch. The job can run on every server. The first one to run the job for a given batch fixes the set of events, so the others just use the list that the first one defined. That way we have a guarantee that all batches contain the same set of events on all servers. And if we use a total order over the whole batch, which the cron job could specify itself, then the state of the servers will be kept in sync.
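A sketch of that "first one fixes the batch" step in JDBC (table and column names are made up for illustration); the atomic UPDATE is what guarantees that each event lands in exactly one batch, no matter which server's job runs first:

import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchClaimer {
    // Atomically stamp all due, unclaimed events with this batch's ID.
    // Only rows still PENDING are touched, so if two servers race,
    // each row still ends up in exactly one batch.
    static int claimBatch(Connection db, long batchId) throws Exception {
        String sql = "UPDATE events SET batch_id = ?, status = 'CLAIMED' " +
                     "WHERE status = 'PENDING' AND due_at <= now()";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setLong(1, batchId);
            return ps.executeUpdate(); // number of events claimed into this batch
        }
    }
}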