My knowledge of threads is very limited. I happen to be the guy who can write multi-threaded programs but just by copy-pasting and finding answers to my questions on the internet. But I've finally decided to learn a bit about concurrency and bought the book "Java Concurrency in Practice". After reading a couple of pages, I'm confident that I'll learn a great deal from this book.
Maybe I'm being a little impatient but I cannot resist the temptation of asking this question. It made me create an account on Stack Overflow. I'm not sure I'll be able to correctly phrase the question so I'll try to explain my question using an example.
If I had to write a (extremely unprofessionally coded) peer-to-peer chat client in, say, Java, I'll initiate a socket connection between the clients and keep it alive because messages can arrive at any time. The solution I can imagine would open a socket connection in a new thread and run a while loop continuously to keep the thread alive, as the thread dies as soon as run returns. For some reason, I cannot imagine a similar chat client in a single threaded program. How can you keep "waiting" until a message arrives if all you have is a single thread. Won't that block the execution of entire program?
To solve such a problem, what's the alternative to a continuous while loop?
How can you keep "waiting" until a message arrives if all you have is a single thread.
One possibility is to have the "parallelism" to happen "outside" of your application. Imagine a waiter in a restaurant. Just one guy. He walks from one customer to the next, and writes up the orders. From time to time, he walks over to the counter, puts in the orders, and picks up whatever stuff the chef left for him. Just one guy, walking around, doing "single task" work. But in the end, the overall system still has multiple actors (the guests, the waiter, the chef, the guy beyond the bar preparing the beverages). So, the waiter could be seen as "single threaded", but in the end, the overall system "restaurant" isn't.
Some IT architectures "mimic" that, for example around the idea of "non blocking" IO. That is how node.js works. It is single threaded by nature, but does async IO (see here for details). And you can do similar things with Java, too.
On the other hand, when you learn about concurrency, you still want to learn about the "real" multi threading, what it means, and how you would write code to "use" that concept.
Related
I am new to this concept and want to have a great understanding of this topic.
To make my point clear I want to take an analogy.
Let's take a scenario of Node JS which is single-threaded and provide fast IO operation using an event loop. Now that makes sense since It is single-threaded and is not blocked for any task.
While studying reactive programming in Java using reactor. I came to a situation where the main thread is blocked when an object subscribes and some delay event took place.
Then I came to know the concept of subscribeOn.boundedElastic and many more pipelines like this.
I got it that they are trying to make it asynchronous by moving those subscribers to other threads.
But if it occurs like this then why is the asynchronous. Is it not thread-based programming?
If we are trying to achieve the async behaviour of Node JS then according to my view it should be in a single thread.
Summary of my question is:
So I don't get the fact of using or calling reactive programming as asynchronous or functional programming because of two reason
Main thread is blocked
We can manage the thread and can run it in another pool. Runnable service/ callable we can also define.
First of all you can't compare asynchronous with functional programming. Its like comparing a rock with a banana. Its two separate things.
Functional programming is compared to other types of programming, like object oriented programming or procedural programming etc. etc.
Reactor is a java library, and java is an object oriented programming language with functional features.
Asynchronous i will explain with what wikipedia says
Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events.
So basically how to handle stuff "around" your application, that is not a part of the main flow of your program.
In comparison to Blocking, wikipedia again:
A process that is blocked is one that is waiting for some event, such as a resource becoming available or the completion of an I/O operation.
A traditional servlet application works by assigning one thread per request.
So every time a request comes in, a thread is spawned, and this thread follows along the request until the request returns. If there is something blocking during this request, for instance reading a file from the operating system, or making a request to another service. The assigned thread will block and wait until the reading of the file is completed, or the request has returned etc.
Reactive works with subscribers and producers and makes heavy use of the observer pattern. Which means that as soon as some thing blocks, reactor can take that thread and use it for something else. And then it is un-blocked any thread can pick up where it left off. This makes sure that every thread is always in use, and utilized at 100%.
All things processed in reactor is done by the event loop the event loop is a single threaded loop that just processes events as quick as possible. Schedulers schedule things to be processed on the event loop, and after they are processed a scheduler picks up the result and carries on.
If you just run reactor you get a default scheduler that will schedule things for you completely automatically.
But lets say you have something blocking. Well then you will stop the event loop. And everything needs to wait for that thing to finish.
When you run a fully reactive application you usually get one event loop per core during startup. Which means lets say you have 4 cores, you get 4 event loops and you block one, then during that period of blockages your application runs 25% slower.
25% slower is a lot!
Well sometimes you have something that is blocking that you can't avoid. For instance an old database that doesn't have a non-blocking driver. Or you need to read files from the operating system in a blocking manor. How do you do then?
Well the reactor team built in a fallback, so that if you use onSubscribe in combination with its own elastic thread pool, then you will get the old servlet behaviour back for that single subscriber to a specific say endpoint etc.
This makes sure that you can run fully reactive stuff side by side with old legacy blocking things. So that maybe some reaquests usese the old servlet behaviour, while other requests are fully non-blocking.
You question is not very clear so i am giving you a very unclear answer. I suggest you read the reactor documentation and try out all their examples, as most of this information comes from there.
Even after reading http://krondo.com/?p=1209 or Does an asynchronous call always create/call a new thread? I am still confused about how to provide asynchronous calls on an inherently single-threaded system. I will explain my understanding so far and point out my doubts.
One of the examples I read was describing a TCP server providing asynch processing of requests - a user would call a method e.g. get(Callback c) and the callback would be invoked some time later. Now, my first issue here - we have already two systems, one server and one client. This is not what I mean, cause in fact we have two threads at least - one in the server and one on the client side.
The other example I read was JavaScript, as this is the most prominent example of single-threaded asynch system with Node.js. What I cannot get through my head, maybe thinking in Java terms, is this:If I execute the code below (apologies for incorrect, probably atrocious syntax):
function foo(){
read_file(FIle location, Callback c) //asynchronous call, does not block
//do many things more here, potentially for hours
}
the call to read file executes (sth) and returns, allowing the rest of my function to execute. Since there is only one thread i.e. the one that is executing my function, how on earth the same thread (the one and only one which is executing my stuff) will ever get to read in the bytes from disk?
Basically, it seems to me I am missing some underlying mechanism that is acting like round-robin scheduler of some sort, which is inherently single-threaded and might split the tasks to smaller ones or call into a multiothraded components that would spawn a thread and read the file in.
Thanks in advance for all comments and pointing out my mistakes on the way.
Update: Thanks for all responses. Further good sources that helped me out with this are here:
http://www.html5rocks.com/en/tutorials/async/deferred/
http://lostechies.com/johnteague/2012/11/30/node-js-must-know-concepts-asynchrounous/
http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless (.NET)
http://ejohn.org/blog/how-javascript-timers-work/ (intrinsics of timers)
http://www.mobl-lang.org/283/reducing-the-pain-synchronous-asynchronous-programming/
The real answer is that it depends on what you mean by "single thread".
There are two approaches to multitasking: cooperative and interrupt-driven. Cooperative, which is what the other StackOverflow item you cited describes, requires that routines explicitly relinquish ownership of the processor so it can do other things. Event-driven systems are often designed this way. The advantage is that it's a lot easier to administer and avoids most of the risks of conflicting access to data since only one chunk of your code is ever executing at any one time. The disadvantage is that, because only one thing is being done at a time, everything has to either be designed to execute fairly quickly or be broken up into chunks that to so (via explicit pauses like a yield() call), or the system will appear to freeze until that event has been fully processed.
The other approach -- threads or processes -- actively takes the processor away from running chunks of code, pausing them while something else is done. This is much more complicated to implement, and requires more care in coding since you now have the risk of simultaneous access to shared data structures, but is much more powerful and -- done right -- much more robust and responsive.
Yes, there is indeed a scheduler involved in either case. In the former version the scheduler is just spinning until an event arrives (delivered from the operating system and/or runtime environment, which is implicitly another thread or process) and dispatches that event before handling the next to arrive.
The way I think of it in JavaScript is that there is a Queue which holds events. In the old Java producer/consumer parlance, there is a single consumer thread pulling stuff off this queue and executing every function registered to receive the current event. Events such as asynchronous calls (AJAX requests completing), timeouts or mouse events get pushed on to the Queue as soon as they happen. The single "consumer" thread pulls them off the queue and locates any interested functions and then executes them, it cannot get to the next Event until it has finished invoking all the functions registered on the current one. Thus if you have a handler that never completes, the Queue just fills up - it is said to be "blocked".
The system has more than one thread (it has at least one producer and a consumer) since something generates the events to go on the queue, but as the author of the event handlers you need to be aware that events are processed in a single thread, if you go into a tight loop, you will lock up the only consumer thread and make the system unresponsive.
So in your example :
function foo(){
read_file(location, function(fileContents) {
// called with the fileContents when file is read
}
//do many things more here, potentially for hours
}
If you do as your comments says and execute potentially for hours - the callback which handles fileContents will not fire for hours even though the file has been read. As soon as you hit the last } of foo() the consumer thread is done with this event and can process the next one where it will execute the registered callback with the file contents.
HTH
So, I've recently been injected with the Node virus which is spreading in the Programming world very fast.
I am fascinated by it's "Non-Blocking IO" approach and have indeed tried out a couple of programs myself.
However, I fail to understand certain concepts at the moment.
I need answers in layman terms (someone coming from a Java background)
1. Multithreading & Non-Blocking IO.
Let's consider a practical scenario. Say, we have a website where users can register. Below would be the code.
..
..
// Read HTTP Parameters
// Do some Database work
// Do some file work
// Return a confirmation message
..
..
In a traditional programming language, the above happens in a sequential way. And, if there are multiple requests for registration, the web server creates a new thread and the rest is history. Of course, programmers can create threads of their own to work on Line 2 and Line 3 simultaneously.
In Node, as I understand, Lines 2 & 3 will be run in parallel while the rest of the program gets executed and the Interpreter polls the lines 2 & 3 every 'x' ms.
Now, my question is, if Node is a single threaded language, what does the job of lines 2 & 3 while the rest of the program is being executed?
2. Scalability
I recently read that LinkedIn have adapted Node as a back-end for their Mobile Apps and have seen massive improvements.
Can anyone explain how it has made such a difference?
3. Adapting in other programming languages
If people are claiming that Node to be making a lot of difference when it comes to performance, why haven't other programming languages adapted this Non-Blocking IO paradigm?
I'm sure I'm missing something. Only if you can explain me and guide me with some links, would be helpful.
Thanks.
A similar question was asked and probably contains all the info you're looking for: How the single threaded non blocking IO model works in Node.js
But I'll briefly cover your 3 parts:
1.
Lines 2 and 3 in a very simple form could look like:
db.query(..., function(query_data) { ... });
fs.readFile('/path/to/file', function(file_data) { ... });
Now the function(query_data) and function(file_data) are callbacks. The functions db.query and fs.readFile will send the actual I/O requests but the callbacks allow the processing of the data from the database or the file to be delayed until the responses are received. It doesn't really "poll lines 2 and 3". The callbacks are added to an event loop and associated with some file descriptors for their respective I/O events. It then polls the file descriptors to see if they are ready to perform I/O. If they are, it executes the callback functions with the I/O data.
I think the phrase "Everything runs in parallel except your code" sums it up well. For example, something like "Read HTTP parameters" would execute sequentially, but I/O functions like in lines 2 and 3 are associated with callbacks that are added to the event loop and execute later. So basically the whole point is it doesn't have to wait for I/O.
2.
Because of the things explained in 1., Node scales well for I/O intensive requests and allows many users to be connected simultaneously. It is single threaded, so it doesn't necessarily scale well for CPU intensive tasks.
3.
This paradigm has been used with JavaScript because JavaScript has support for callbacks, event loops and closures that make this easy. This isn't necessarily true in other languages.
I might be a little off, but this is the gist of what's happening.
Q1. " what does the job of lines 2 & 3 while the rest of the program is being executed?"
Answer: "Nothing". Lines 2 and 3 each themselves start their respective jobs, but those jobs cannot be done immediately because (for example) the disk sectors required are not loaded in yet - so the operating system issues a call to the disk to go get those sectors, then "Nothing happens" (node goes on with it's next task) until the disk subsystem (later) issues an interrupt to report they're ready, at which point node returns control to lines #2 and #3.
Q2. single-thread non-blocking dedicates almost no resources to each incoming connection (just some housekeeping data about the connected socket). It's very memory efficient. Traditional web servers "fork" a whole new process to handle each new connection - that means making a humongous copy of every bit of code and data variables needed, and time-slicing the CPU to deal with it all. That's massively wasteful of resources. Thus - if your load is a lot of idle connections waiting for stuff, as was theirs, node makes loads more sense.
Q3. almost every programming language does already have non-blocking I/O if you want to use it. Node is not a programming language, it's a web server that runs javascript and uses non-blocking I/O (eg: I personally wrote my own identical thing 10 years ago in perl, as did google (in C) when they started, and I'm sure loads of other people have similar web servers too). The non-blocking I/O is not the hard part - getting the programmer to understand how to use it is the tricky bit. Javascript happens to work well for that, because those programmers are already familiar with event programming.
Even though node.js has been around for a few years, it's performance model is still a bit mysterious.
I recently started a blog and decided that the node.js model would be a good first topic since I wanted to understand it better myself and it would be helpful to others to share what I learned. Here are a couple of articles I wrote that explain the high level concepts and some tradeoffs:
Blocking vs. Non-Blocking I/O – What’s going on?
Understanding node.js Performance
How many threads in minimum will be required for developing a traffic signal application?
I think it is only one, because when any one of the lights is in green the other 3 directions will be red. And for this application there is no need for multiple operations to be run parallel y.
This question cannot be answered meaningfully without knowing what the scope of entire application is. (In the simple case, a traffic light simulation could certainly be implemented in a single thread ... if the conditions were right.)
However, real world control systems for things like traffic lights are typically written in languages like C that are better at interfacing with controller hardware. That makes your question moot ... kind of.
As a driver, I sure hope it's only one thread!
You would only need one thread, but imagine the implications of non-threadsafe code or threading bugs...
someone could literally die!
Edited
Actually, "it depends" is the correct answer, if there is one.
Simple traffic lights, for example pedestrian crossings, could simply block waiting for a button press then complete the cycle and return to a blocking wait again.
Complex event-driven lights that can receive many inputs, may need multiple threads if the hardware doesn't support interrupts or other single threaded mechanisms for dealing with real time input signals.
As others mentioned, this is an useless exercise without knowing the exact conditions and constraints you have.
But here's my shot: I would probably have 4 threads:
Main thread -- manages the lifecycle
Receiver thread -- receives events from other sources, probably from different hardware parts, like the fire truck "remote control", or communication from the nearby traffic lights to determine if they the nearby traffic lights are in sync.
Broadcaster thread -- dispatches signals from this specific traffic light to other consumers (other traffic lights, command and control center, ...)
Processing thread -- gets delegated the processing of the events sent and received from the other two threads (Receiver, Broadcaster).
In a very trivial case yes. But usually traffic signal depends on so many factors like :
Time for which it will be red/green based on traffic
On any intersection points of 2-3 roads signals of each road should be in sync.
an so on...
So there is no fixed answer to this.
I'm writing a little genetic algorithm in Java, as a school assignment. Up until now I've pretty much stuck to doing console applications. However I think a UI would be really helpful for this program, so I'd like to make one. I'm having trouble figuring out how to reconcile a GUI which is event-driven, and a console application which has a beginning and end.
Ideally I'd like to have a bunch of text boxes for settings, and then a Start button. Once you hit Start, the algorithm would start running and the GUI would update on a set interval with the latest program state. How the heck do I accomplish this without the algorithm freezing the GUI or vice-versa? I don't want either one waiting on the other.
How do I get my main loop to not freeze the GUI while the algorithm is running? I assume they need to be in separate threads, but I've never messed with threads before. That seems too complex for this task, which must be commonplace.
You're on to something with threads. GUI programming mandates threads, in most cases -- luckily, Java's threading API isn't too terrible (Python's is modeled on it, so it's doing something right).
Don't be intimidated by threading, though -- it's intermediate, I'd say, but is something that every programmer should understand.
There's a lot of information out there that would predispose you against threads. GUI applications, however, are one area where they are phenomenally useful. Opponents of threading would lead you to believe that an event-programming model would help you in this case, when really, it will not. The solutions that most who say "threading sucks" propose are often worse than threading itself.
You could try to kludge your solution into a single thread, but it would require your CPU-intensive code to yield to the GUI at a predictable interval. That solution sucks. EDIT: Since others are suggesting that approach, let me elaborate on why it sucks: Unbeknownst to you, something is always being updated in a GUI. When you move a window on top and then back off, the entire region under that window is invalidated and code must execute -- in your process -- to redraw that section. Even if you are updating the GUI very quickly, this provides a negative user experience as simple GUI operations block entirely. Buttons highlight when you mouse over, sometimes. A user right clicks. All of these things require CPU time to implement, and if your solitary thread is chewing away on your GA, they will not happen. GUI code being executed is not only your code.
Here's what appears to be a very useful article on the topic.
Two lessons in the trail on the topic are:
Concurrency
Concurrency in Swing
Sorry - it would seem like background tasks would be an easy and obvious sort of thing. Unfortunately, the Java Swing GUI threading model is a bit complicated. There have been some improvements in this area, but one still has to have some knowledge of threading first.
If you have time, I'd suggest reading the chapter on threading in Filthy Rich Clients - Painless Threading through SwingWorker.
If your impatient, just read the JavaDoc on SwingWorker. If you're really impatient, just copy the meaning of life example from the JavaDoc sample usage.
When I was writing a ray-tracer for one of my computer graphics classes in college, I had a long-running task, and I wanted to update the display periodically as the tracer was drawing. I used two separate threads - one thread sleeps, and updates (say every 500 ms); the other thread does the actual raytracing. The key is to synchronize on a common object - in my case, accessing my image buffer was the point of synchronization (one thread can't make changes to the image buffer without first waiting until the other thread is done reading).
For your GA processing, you might have something like this (pseudocode):
Supposing you have some object, generationHistoryObject, which stores the state that you want to display in your GUI, then:
(in Thread #1:)
Generation newGeneration = doMutationAndTestThisGeneration(lastGeneration);
synchronized (generationHistoryObject) {
generationHistoryObject.updateWithNextGeneration(newGeneration);
}
(in Thread #2:)
while (!programIsDone()) {
synchronized (generationHistoryObject) {
drawGuiForCurrentState(generationHistoryObject);
}
Thread.sleep(500);
}
The idea is that you do the time-consuming work for each generation in isolation, then update the part that the GUI has to access in the synchronized block (making the GUI wait to draw until the update is done).
Your problem with Swing is that it is single threaded (which is a good thing) so you want to get your work out of the Swing thread so your application stays responsive.
The thing you need to do is to convert your core algorithm to a Runnable as it can be handled easily by a SwingWorker and with the newer Executors (see Executors for a lot of preconfigured ones). You can also create investigate how to make a PrintStream to a JTextPanel so you can just use standard println statements to output your current status information.
If you want to add a stop button you need to understand the thread model, so you know how to control it. The Java Tutorial has good material on this as well as Swing programming in general. Strongly recommended.
http://java.sun.com/docs/books/tutorial/uiswing/concurrency/index.html
Since your application has to do with genetic algorithms, you could update the GUI every generation or so. You could accomplish this by implementing a next() method in your algorithm code, and calling it from the GUI. This should be simple enough.
However, if you really don't want the GUI to freeze while waiting for the algorithm, you should go for threads.