I have a basic idea about concurrency, but I'm confused about the following architecture. I think it is concurrent, but my colleague thinks it is not. The architecture is as follows:
I have multiple robots, each of which publishes its data to its own gateway, and there is another Java service which listens on the gateways. The service creates a new thread to listen to each gateway.
My understanding is that the service is performing concurrent execution, but my colleague says it is not concurrent, as concurrency involves sharing of hardware.
I'd appreciate it if someone could clarify or elaborate on this topic.
My understanding is that the service is performing concurrent execution, but my colleague says it is not concurrent, as concurrency involves sharing of hardware.
TL;DR: Words are squishy. That's why we have code.
"Concurrent" simply means two or more things happening at the same time. As it applies to computation, true concurrency means two or more threads of execution running at the same time, which requires separate hardware. That certainly can be separate cores of the same CPU or separate CPUs in the same chassis, so that there is some degree of shared hardware. It can also be separate cores in different chassis, however, such as in a computational cluster, though perhaps this is where your colleague is drawing his line. Such a line would be pretty arbitrary, though.
In contrast, long before it was common for even servers to feature multiple CPU (core)s, many computer systems implemented one flavor or another of multitasking, whereby multiple tasks can all be in progress at the same time by virtue of the operating system allotting slices of CPU time to each and switching them in and out. All modern general-purpose operating systems still do this. On a single core, however, this provides only simulated concurrency, because at any given instant in time, only one computation is actually making progress.
Your colleague does have a point, however, that multiple, spatially distributed robots all operating at the same time without coordination is a bit beyond what people usually mean when they talk about concurrent computation. Certainly such robots are operating concurrently, in the general-use sense of "at the same time", but it's a bit of a stretch to characterize them as participating in a concurrent computation.
The server that allocates a separate thread to handle communication with each robot may thereby be performing a concurrent computation. But as long as we're splitting hairs, do recognize that communication over a single network interface is serialized, so unless your server has multiple network interfaces, the actual communication cannot be truly concurrent. If the server is primarily just recording the data as it arrives, as opposed to incorporating it into an ongoing concurrent computation, then it would be potentially misleading to describe it as performing a concurrent operation.
Even by your colleague's definition, this is a concurrent system since there are multiple threads executing on the hardware on which the service resides.
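The thread-per-gateway setup from the question can be sketched in plain Java. The gateway names and the use of an ExecutorService are illustrative assumptions, not part of the original system; in the real service each task would block on its own socket or message connection:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class GatewayService {
    // Hypothetical gateway endpoints; in the real system each would be a
    // socket or messaging connection to one robot's gateway.
    static final List<String> GATEWAYS = List.of("gateway-1", "gateway-2", "gateway-3");

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(GATEWAYS.size());
        CountDownLatch done = new CountDownLatch(GATEWAYS.size());
        for (String gw : GATEWAYS) {
            // One dedicated thread per gateway: each can block waiting on its
            // own connection without stalling the listeners for the others.
            pool.submit(() -> {
                System.out.println("listening on " + gw);
                done.countDown();
            });
        }
        done.await(); // the listener tasks ran concurrently (or at least interleaved)
        pool.shutdown();
    }
}
```

On a multi-core machine these listener threads can run truly in parallel; on a single core they are time-sliced, which is exactly the "simulated concurrency" distinction drawn above.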
In the book Core Java, Volume I: Fundamentals, in the chapter on multithreading, the author writes:
"All modern desktop and server operating systems use preemptive scheduling. However, smaller devices such as cell phones may use cooperative scheduling...."
I am aware of the definitions and workings of both types of scheduling, but I want to understand the reasons why cooperative scheduling is preferred over preemptive scheduling on smaller devices. Can anyone explain?
Preemptive scheduling has to solve a hard problem -- getting all kinds of software from all kinds of places to efficiently share a CPU.
Cooperative scheduling solves a much simpler problem -- allowing CPU sharing among programs that are designed to work together.
So cooperative scheduling is cheaper and easier when you can get away with it. The key thing about small devices that allows cooperative scheduling to work is that all the software comes from one vendor and all the programs can be designed to work together.
The big benefit of cooperative scheduling over preemptive is that cooperative scheduling minimizes context switching. A context switch involves storing and restoring the state of an application (or thread), which is costly; under cooperative scheduling, switches happen only at well-defined yield points.
The reason smaller devices are able to get away with cooperative scheduling, for now, is that there is only one user on a small device. The problem with cooperative scheduling is that one application can hog the CPU. In preemptive scheduling, every application is eventually given an opportunity to use the CPU for a few cycles. For bigger systems, where multiple daemons or users are involved, cooperative scheduling may cause issues.
Reducing context switching is kind of a big thing in modern programming. You see it in Node.js, Nginx, epoll, ReactiveX and many other places.
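The "programs designed to work together" idea can be made concrete with a toy round-robin cooperative scheduler in plain Java. This is an illustrative sketch, not any real scheduler API; note that a task which never returns from step() would hog the CPU, exactly the failure mode described above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CooperativeDemo {
    // A cooperative task does one slice of work per call and returns true
    // if it wants to run again; returning from step() is its voluntary yield.
    interface Task { boolean step(); }

    static void run(Deque<Task> ready) {
        while (!ready.isEmpty()) {
            Task t = ready.pollFirst();
            // The scheduler cannot interrupt step(); a task that loops
            // forever inside it would starve every other task.
            if (t.step()) ready.addLast(t); // yielded: go to the back of the queue
        }
    }

    public static void main(String[] args) {
        Deque<Task> queue = new ArrayDeque<>();
        for (String name : new String[] {"A", "B"}) {
            int[] remaining = {3}; // mutable slice counter captured by the lambda
            queue.add(() -> {
                remaining[0]--;
                System.out.println(name + " did a slice, " + remaining[0] + " left");
                return remaining[0] > 0;
            });
        }
        run(queue);
    }
}
```

Because every switch is a plain method return at a known point, almost no state needs saving and restoring, which is where the cheapness of cooperative scheduling comes from.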
First, consider the meaning of the word preemption:
Preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such changes of the executed task are known as context switches. (https://en.wikipedia.org/wiki/Preemption_(computing))
Therefore, the difference is
In a preemptive model, the operating system's thread scheduler is allowed to step in and hand control from one thread to another at any time (tasks can be forcibly suspended).
In a cooperative model, once a thread is given control it continues to run until it explicitly yields control (hands the CPU over to the next task) or until it blocks.
Both models have their advantages and disadvantages. Preemptive scheduling works better when the CPU has to run all kinds of software that are not related to each other, while cooperative scheduling works better when the running programs are designed to work together.
Examples of cooperatively scheduled threads:
Windows fibers (https://learn.microsoft.com/en-gb/windows/win32/procthread/fibers?redirectedfrom=MSDN)
Sony’s PlayStation 4 SDK (http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Gyrling_Christian_Parallelizing_The_Naughty.pdf)
If you want to learn about the underlying implementations of these cooperative scheduling fibers, refer to this book (https://www.gameenginebook.com/).
Your book says "smaller devices such as cell phones"; maybe the author is referring to cell phones from several years back. They had only a few programs to run, all provided by the phone manufacturer, so we can assume those programs were designed to work together.
Cooperative scheduling has fewer synchronization problems.
Cooperative scheduling can have better performance in some, mostly contrived, scenarios.
Cooperative scheduling introduces constraints upon design and implementation of threads.
Cooperative scheduling is basically useless for most real purposes because of dire I/O performance, which is why almost nobody uses it.
Even small devices will prefer to use preemptive scheduling if they can possibly get away with it. Smartphones, streaming (especially video), and other apps that require good I/O are essentially not possible on cooperative systems.
What you are left with are trivial embedded toaster-controllers and the like.
Hard real-time control applications often demand that at least one thread/task not be preemptively interrupted while other threads are more forgiving. Additionally, the highest priority task may require that it be executed on a rigid schedule rather than being left to the mercy of a scheduler that will eventually provide a time-slot. For these applications, cooperative multitasking seems much closer to what is needed than preemptive multitasking but it still isn't an exact fit since some tasks may need immediate on-demand interrupt response while other tasks are less sensitive to the multi-tasking scheme.
Cooperative Scheduling
A task gives up the CPU at a well-defined yield point (often a synchronization point). With POSIX this is done with something like:
sched_yield();
Preemptive Scheduling
The main difference here is that in preemptive scheduling, a task may be forced to relinquish the CPU by the scheduler. For instance, given two tasks with the same priority, the running one is preempted when its time slice ends.
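Since this thread is about Java, the same yield idea can be shown with Thread.yield(), which is a cooperative-style hint layered on top of a preemptive OS scheduler (the yields here are voluntary, but the OS may preempt the threads at any point regardless):

```java
class YieldDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            for (int i = 0; i < 3; i++) {
                System.out.println(Thread.currentThread().getName() + " step " + i);
                // Cooperative-style hint: offer the CPU to another runnable
                // thread. On a preemptive OS this is only a hint; the
                // scheduler can also interrupt us anywhere else.
                Thread.yield();
            }
        };
        Thread a = new Thread(worker, "task-1");
        Thread b = new Thread(worker, "task-2");
        a.start(); b.start();
        a.join(); b.join();
    }
}
```

The interleaving of the two tasks is not deterministic, which is precisely the difference from the purely cooperative model, where the schedule is fixed by the yield points alone.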
What I understood from the Vert.x documentation (and a little bit of coding in it) is that Vert.x is single-threaded and executes events from an event pool. It doesn't wait for I/O or network operations; rather, it gives time to another event (which no previous Java multi-threaded framework did).
But I couldn't understand the following:
How is a single thread better than multiple threads? What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
Verticles depend on CPU cores: as many CPU cores as you have, you can have that many verticles running in parallel. How can a language that runs on a virtual machine make use of the CPU as needed? As far as I know, the JVM is an application running as just another OS process (my understanding of the OS and the JVM is limited here, so my question might be naive).
If a single-threaded, non-blocking concept is so effective, why can't we have the same non-blocking concept in a multi-threaded environment? Won't it be faster? Or, again, is it because the CPU can execute only one thread at a time?
What I understood from the Vert.x documentation (and a little bit of coding in it) is that Vert.x is single-threaded and executes events from an event pool.
It is event-driven, callback-based. It isn't single-threaded:
Instead of a single event loop, each Vertx instance maintains several event loops. By default we choose the number based on the number of available cores on the machine, but this can be overridden.
It doesn't wait for I/O or any network operation(s)
It uses non-blocking or asynchronous I/O, it isn't clear which. Use of the Reactor pattern suggests non-blocking, but it may not be.
rather than giving time to another event (which was not before in any Java multi-threaded framework).
This is meaningless.
How is a single thread better than multiple threads?
It isn't.
What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
Yes.
Verticles depend on CPU cores: as many CPU cores as you have, you can have that many verticles running in parallel. How can a language that runs on a virtual machine make use of the CPU as needed? As far as I know, the JVM is an application running as just another OS process (my understanding of the OS and the JVM is limited here, so my question might be naive).
It uses a thread per core, as per the quotation above, or whatever you choose by overriding that.
If a single-threaded, non-blocking concept is so effective, why can't we have the same non-blocking concept in a multi-threaded environment?
You can.
Won't it be faster?
Yes.
Or again, is it because CPU can execute one thread at a time?
A multi-core CPU can execute more than one thread at a time. I don't know what 'it' in 'is it because' refers to.
First of all, Vert.x isn't single-threaded by any means. It just doesn't spawn more threads than it needs.
Second, and this is not related to Vert.x at all, the JVM maps threads to native OS threads.
Third, we can have non-blocking behavior in a multithreaded environment. It's not one thread per CPU, but one thread per core.
But then the question is: "what are those threads doing?" Because usually, to be useful, they need other resources: network, DB, filesystem, memory. And here it becomes tricky. When you're single-threaded, you don't have race conditions; the only one accessing the memory at any point in time is you. But if you're multi-threaded, you need to concern yourself with mutexes, or some other way to keep your data consistent.
Q:
How is a single thread better than multiple threads? What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
A:
Vert.x isn't a single-threaded framework, but it does ensure that a "verticle" (something you deploy within your application and register with Vert.x) is mostly single-threaded.
The reason for this is that concurrency with multiple threads brings complications: locks, synchronisation, and other concepts that have to be taken care of in multi-threaded communication.
While verticles are single-threaded, they use something called an event loop, which is the true power behind this paradigm, called the reactor pattern (or the multi-reactor pattern in Vert.x's case). Multiple verticles can be registered within one application; communication between these verticles runs through an event bus, which lets verticles use an event-based transfer protocol internally, though this can also be distributed across a cluster using some other technology.
Event loops handle events coming in on one thread, but everything is async, so computation gets handled by the loop, and when it's done a signal notifies that a result can be used.
So all computation is either callback-based or uses something like ReactiveX, fibers, coroutines, or channels.
Due to the simpler communication model for concurrency and other nice features, Vert.x can actually be faster than many of the blocking, purely multi-threaded models out there.
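The event-loop idea behind the reactor pattern can be sketched in plain Java. This toy loop is an illustration only, not the Vert.x API; the key property is that one thread drains a queue of handlers, so no handler may block:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class MiniEventLoop {
    private final Queue<Runnable> events = new ArrayDeque<>();

    void post(Runnable event) { events.add(event); }

    void run() {
        // A single thread drains the queue; if a handler blocks,
        // every event queued behind it stalls.
        Runnable e;
        while ((e = events.poll()) != null) e.run();
    }

    public static void main(String[] args) {
        MiniEventLoop loop = new MiniEventLoop();
        loop.post(() -> System.out.println("handled request 1"));
        loop.post(() -> {
            System.out.println("handled request 2");
            // Instead of blocking on follow-up work, a handler
            // enqueues it as another event (callback style).
            loop.post(() -> System.out.println("callback for request 2"));
        });
        loop.run();
    }
}
```

Vert.x's multi-reactor pattern amounts to running several such loops, one per core, each owning its verticles, so no locks are needed within a verticle.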
Q:
If a single-threaded, non-blocking concept is so effective, why can't we have the same non-blocking concept in a multi-threaded environment? Won't it be faster? Or, again, is it because the CPU can execute only one thread at a time?
A:
Like I said for the first question, it's not really single-threaded. In fact, when you know something is blocking, you have to register that computation with a method called executeBlocking, which will run it multithreaded on an ExecutorService managed by Vert.x.
The reason Vert.x's model is mostly faster is, again, that event loops make better use of CPU computation features and constraints. This is mostly powered by the Netty project.
The overhead of multithreading, with its locks and synchronization, imposes too much strain to outdo Vert.x with its multi-reactor pattern.
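The executeBlocking idea (keep the event-loop thread free by handing blocking work to a separate worker pool) can be sketched with plain java.util.concurrent; the Vert.x API itself is not used here:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingOffload {
    public static void main(String[] args) throws Exception {
        // The event-loop thread must never block, so blocking work is
        // submitted to a worker pool; this mirrors the idea behind
        // Vert.x's executeBlocking without using its API.
        ExecutorService workerPool = Executors.newFixedThreadPool(2);
        Future<String> result = workerPool.submit(() -> {
            Thread.sleep(50); // stand-in for a blocking call (JDBC, file I/O, ...)
            return "query result";
        });
        // Meanwhile the "event loop" stays responsive...
        System.out.println("event loop keeps handling other events");
        // ...and picks up the result later (in Vert.x this would be a callback).
        System.out.println("blocking result: " + result.get());
        workerPool.shutdown();
    }
}
```

In real Vert.x code the result would be delivered back to the originating event loop as a callback rather than via a blocking get().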
I am new to multithreading in Java. After looking at Java virtual machine - maximum number of threads, it would appear there isn't a limit to how many threads a Java/Android app can run. However, is there an advisable limit? What I mean is: is there a number of threads beyond which it is unwise to go, because you are unable to determine what thread does what at what time? I hope my question makes sense.
There are some advisable limits, but they don't really have anything to do with keeping track of the threads.
Most multithreading comes with locking. If you are using central data storage or global mutable state then the more threads you have, the more lock contention you will get. This is app-specific and depends on how much of said state you have and how often threads read and write it.
There are no limits in desktop JVMs by default, but there are OS limits. It should be in the tens of thousands for modern Windows machines, but don't rely on the ability to create many more than that.
Running multiple tasks in parallel is great, but the hardware can only cope with so much. If you are using small threads that get fired up sometimes and spend most of their time idle, that's no biggie (Java servers were written like this for years). However, if your threads are very intensive, making more of them than the number of cores you have is not likely to give you any benefit. (I believe the standard practice is twice the number of cores if you anticipate threads going idle sometimes.)
Threads have a cost to them. Whenever you switch Threads you switch context, and while it isn't that expensive, doing it constantly will hurt performance. It's not a good idea to create a Thread to sum up two integers and write back a result.
If Threads need visibility of each others state, then they are greatly slowed down, since a lot of their writes have to be written back to main memory. Threads are best used for standalone tasks that require little interaction with each other.
TL;DR
It depends on OS and hardware: on servers, creating thousands of threads is fine; on desktop machines you should limit yourself to 50-200 and choose carefully what you do with them.
Note: Android's default and suggested "UI multithread helper", the AsyncTask, is not actually a thread. It's a task invoked from a thread pool, and as such there is no limit or penalty to using it. It has an upper limit on the number of threads it spawns and reuses them rather than creating new ones. Most Android apps should use it instead of spawning their own threads. In general, thread pools are fairly widespread and are a great choice unless you are forced into blocking operations.
I'm learning reactive programming techniques, with async I/O etc, and I just can't find decent authoritative comparative data about the benefits of not switching threads.
Apparently switching threads is "expensive" compared to computations. But what scale are we talking on?
The essential question is "How many processor cycles/instructions does it take to switch a java thread?" (I'm expecting a range)
Is it affected by OS?
I presume it's affected by the number of threads, which is why async I/O is so much better than blocking: the more threads, the further away the context has to be stored (presumably even out of the cache into main memory).
I've seen Approximate timings for various operations which although it's (way) out of date, is probably still useful for relating processor cycles (network would likely take more "instructions", SSD disk probably less).
I understand that reactive applications enable web apps to go from 1000's to 10,000's requests per second (per server), but that's hard to tell too - comments welcome
NOTE - I know this is a bit of a vague, useless, fluffy question at the moment because I have little idea on the inputs that would affect the speed of a context switch. Perhaps statistical answers would help - as an example I'd guess >=60% of threads would take between 100-10000 processor cycles to switch.
Thread switching is done by the OS, so Java has little to do with it. Also, on Linux at least, and presumably on many other operating systems, the scheduling cost does not depend on the number of threads; Linux has used an O(1) scheduler since version 2.6.
The thread switch overhead on Linux is some 1.2 µs (article from 2018). Unfortunately the article doesn't list the clock speed at which that was measured, but the overhead should be some 1000-2000 clock cycles or thereabout. On a given machine and OS the thread switching overhead should be more or less constant, not a wide range.
Apart from this direct switching cost there's also the cost of changing workload: the new thread is most likely using a different set of instructions and data, which need to be loaded into the cache, but this cost doesn't differ between a thread switch or an asynchronous programming 'context switch'. And for completeness, switching to an entirely different process has the additional overhead of changing the memory address space, which is also significant.
By comparison, the switching overhead between goroutines in the Go programming language (which uses userspace threads, very similar to asynchronous programming techniques) was around 170 ns, roughly one seventh of a Linux thread switch.
Whether that is significant for you depends on your use case of course. But for most tasks, the time you spend doing computation will be far more than the context switching overhead. Unless you have many threads that do an absolutely tiny amount of work before switching.
Threading overhead has improved a lot since the early 2000s, and according to the linked article, running 10,000 threads in production shouldn't be a problem on a recent server with a lot of memory. General claims of thread switching being slow are often based on yesteryear's computers, so take those with a grain of salt.
One remaining fundamental advantage of asynchronous programming is that the userspace scheduler has more knowledge about the tasks, and so can in principle make smarter scheduling decisions. It also doesn't have to deal with processes from different users doing wildly different things that still need to be scheduled fairly. But even that can be worked around, and with the right kernel extensions these Google engineers were able to reduce the thread switching overhead to the same range as goroutine switches (200 ns).
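The switching cost is easy to ballpark yourself. This ping-pong microbenchmark forces two Java threads to alternate through SynchronousQueue handoffs, so the measured time per transfer is an upper bound on one switch (the queue's own overhead is included, so treat the number as a ceiling, not a precise figure):

```java
import java.util.concurrent.SynchronousQueue;

class SwitchCost {
    public static void main(String[] args) throws Exception {
        final int N = 20_000;
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread echo = new Thread(() -> {
            try {
                // Each take/put pair forces a handoff to the other thread.
                for (int i = 0; i < N; i++) pong.put(ping.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            ping.put(i);
            pong.take();
        }
        // 2 handoffs per round trip, so divide by 2 * N.
        long perHandoff = (System.nanoTime() - t0) / (2L * N);
        echo.join();
        System.out.println("~" + perHandoff + " ns per handoff (upper bound on a switch)");
    }
}
```

On a typical Linux desktop this lands in the low single-digit microseconds per handoff, consistent with the ~1.2 µs direct-switch figure quoted above plus queue overhead.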
Rugal has a point. In modern architectures, theoretical turnaround times are usually far off from actual measurements because both the hardware and the software have become so much more complex. It also inherently depends on your application: many web applications, for example, are I/O-bound, where the context switch time matters a lot less.
Also note that context switching (what you refer to as thread switching) is an OS thing and not a Java thing. There is no guarantee as to how "heavy" a context switch in your OS is. It used to take tens if not hundreds of thousands of CPU cycles to do a kernel-level switch, but there are also user-level switches, as well as experimental systems, where even kernel-level switches can take only a few hundred cycles.
I am implementing a worker pool in Java.
This is essentially a whole load of objects which will pick up chunks of data, process the data, and then store the result. Because of I/O latency there will be significantly more workers than processor cores.
The server is dedicated to this task and I want to wring the maximum performance out of the hardware (but no I don't want to implement it in C++).
The simplest implementation would be to have a single Java process which creates and monitors a number of worker threads. An alternative would be to run a Java process for each worker.
Assuming, for argument's sake, a quad-core Linux server, which of these solutions would you anticipate being more performant, and why?
You can assume the workers never need to communicate with one another.
One process, multiple threads - for a few reasons.
When context-switching between jobs, it's cheaper on some processors to switch between threads than between processes. This is especially important in this kind of I/O-bound case with more workers than cores. The more work you do between getting I/O blocked, the less important this is. Good buffering will pay for threads or processes, though.
When switching between threads in the same JVM, at least some Linux implementations (x86, in particular) don't need to flush the cache. See Tsuna's blog. Cache pollution between threads will be minimized, since they can share the program cache, are performing the same task, and are sharing the same copy of the code. We're talking savings on the order of hundreds of nanoseconds to several microseconds per switch. If that's small potatoes for you, then read on...
Depending on the design, the I/O data path may be shorter for one process.
The startup and warmup time for a thread is generally much shorter. The OS doesn't have to start a process, Java doesn't have to start another JVM, classloading is only done once, JIT-compilation is only done once, and HotSpot optimizations are done once, and sooner.
Usually, when discussing multiprocessing (with one thread per process) versus multithreading in the same process: while the theoretical overhead is bigger in the first case than in the latter (and thus multiprocessing is theoretically slower than multithreading), in reality on most modern OSes this is not such a big issue. In the Java context, however, starting a new process is a lot more costly than starting a new thread, because it means starting up a new instance of the JVM, which is very expensive, especially in terms of memory. I recommend that you start multiple threads in the same JVM.
Moreover, since you say inter-thread communication is not an issue, you can use Java's ExecutorService to get a fixed thread pool of size 2x (number of available CPUs). The number of available CPUs can be autodetected at runtime via Java's Runtime class. This way you get quick, simple multithreading going without any boilerplate code.
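That suggestion, as a minimal sketch (the submitted task is a trivial stand-in for a real worker):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SizedPoolDemo {
    public static void main(String[] args) throws Exception {
        // Rule of thumb from the answer: a fixed pool of 2x the available
        // CPUs, detected at runtime, with no extra boilerplate.
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(2 * cpus);
        // A stand-in worker task; real workers would read, process, and store data.
        Future<Integer> sum = pool.submit(() -> 21 + 21);
        System.out.println("cpus=" + cpus + ", result=" + sum.get());
        pool.shutdown();
    }
}
```

The 2x factor is only a starting point for I/O-heavy workers; profile and adjust the pool size for your actual blocking ratio.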
Actually, if you do this with large-scale tasks, using multiple JVM processes is way faster than one JVM with multiple threads. At least, we never got one JVM running as fast as multiple JVMs.
We do some calculations where each task uses around 2-3 GB of RAM and does some heavy number crunching. If we spawn 30 JVMs and run 30 tasks, they perform around 15-20% better than spawning 30 threads in one JVM. We tried tuning the GC and the various memory sections and never caught up to the first variant.
We did this on various machines: 14 tasks on a 16-core server, 34 tasks on a 36-core server, and so on. Multithreading in Java always performed worse than multiple JVM processes.
It may not make any difference on simple tasks, but on heavy calculations the JVM seems to perform badly with threads.