Would it be possible to write a Java compiler or virtual machine that would let you compile legacy Java applications that use threads and blocking system calls, the same way Go programs are compiled?
Thus `new Thread(task).start()` would create a lightweight thread, and every blocking system call would instead become an asynchronous operating-system call that makes the lightweight thread yield.
If not, what is the main reason this would be impossible?
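The pattern the question has in mind can be sketched in plain Java (a sketch only; the sleep stands in for an arbitrary blocking call, and on a standard JVM this parks a native OS thread rather than a lightweight one):

```java
// A sketch of the scenario the question describes: a plain Java thread
// performing a blocking system call. On a standard JVM this parks an OS
// thread; the question asks whether a JVM could instead suspend a
// lightweight (user-space) thread and issue the call asynchronously.
public class BlockingThreadSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(100); // stands in for any blocking system call
                System.out.println("blocking call finished");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start(); // note: start(), not run() -- run() executes on the caller's thread
        t.join();
    }
}
```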
Earlier versions of Sun's Java runtime on Solaris (and other UNIX systems) made use of a user space threading system known as "green threads". As described in the Java 1.1 for Solaris documentation:
Implementations of the many-to-one model (many user threads to one kernel thread) allow the application to create any number of threads that can execute concurrently. In a many-to-one (user-level threads) implementation, all threads activity is restricted to user space. Additionally, only one thread at a time can access the kernel, so only one schedulable entity is known to the operating system. As a result, this multithreading model provides limited concurrency and does not exploit multiprocessors. The initial implementation of Java threads on the Solaris system was many-to-one, as shown in the following figure.
This was replaced fairly early on by the use of the operating system's threading support. In the case of Solaris prior to Solaris 9, this was an M:N "many to many" system similar to Go, where the threading library schedules a number of program threads over a smaller number of kernel-level threads. On systems like Linux and newer versions of Solaris that use a 1:1 system where user threads correspond directly with kernel-level threads, this is not the case.
I don't think there have been any serious plans to move the Sun/Oracle JVM away from the native threading libraries since that time. As history shows, it would certainly be possible for a JVM to use such a model, but it doesn't seem to have been considered a direction worth pursuing.
James Henstridge has already provided good background on Java green threads, and on the efficiency problems that come from exposing native OS threads to the programmer: their use is expensive.
There have been several university attempts to recover from this situation. Two such projects are JCSP from Kent and CTJ from Twente (the latter probably defunct). Both offer easy design of concurrency in the Go style (based on Hoare's CSP), but both suffer from poor performance when coding this way on the JVM, because JVM threads are expensive.
If performance is not critical, CSP is a superior way to achieve a concurrent design because it avoids the complexities of asynchronous programming. You can use JCSP in production code - I do.
There were reports that the JCSP team also had an experimental JNI-add-on to the JVM to modify the thread semantics to be much more efficient, but I've never seen that in action.
Fortunately for Go you can "have your cake and eat it". You get CSP-based happen-before simplicity, plus top performance. Yay!
Aside: an interesting Oxford University paper reported on a continuation-passing style modification for concurrent Scala programs that allows CSP to be used on the JVM. I'm hoping for further news on this at the CPA2014 conference in Oxford this August (forgive the plug!).
Related
So I am pretty confused. I read in an article that from version 1.7 onwards Java has been 'core-aware'.
Now the question is: if I use the Thread class, will the threads be parallel or concurrent, assuming it is a multi-core system, the tasks are fully disjoint, and only this process is running on the system?
What was the situation before version 1.7? Does that mean Java was only concurrent back then?
Also, please address the same questions for ForkJoinPool and thread pools (the Executor framework).
Concurrent: not at the same instant; on the same core, sequentially, i.e. at the mercy of the thread scheduler.
Parallel: at the same instant on different cores, e.g. 8 threads on 4 hyperthreaded cores.
Thanks a lot in advance
Parallel is concurrent. "Concurrent" means that the effective order in which events from two or more different threads happen is undefined (not counting events like unlocking a mutex that are specifically intended to coordinate the threads.) "Parallel" means that the threads are using more CPU resources than a single CPU core is able to provide. Threads can't run in parallel without also running concurrently.
What was the situation before version 1.7
I don't remember what changed with 1.7, but I was using Java from its earliest days, and the language always promised that threads would run concurrently. Whether or not they also were able to run in parallel was outside of the scope of the language spec. It depended on what hardware you were running on, what operating system and version, what JVM and version, etc.
I think that the actual change that the "article" was referring to happened in Java 1.3 when the "green thread" implementation1 was replaced with "native" threads. (Source: https://en.wikipedia.org/wiki/Green_thread)
However, your distinction between Concurrent vs Parallel does not match Oracle / Sun's definitions; see Sun's Multithreaded Programming Guide: Defining Multithreading Terms.
"Parallelism: A condition that arises when at least two threads are executing simultaneously."
"Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism."
This also aligns with what the Wikipedia page on Concurrency says.
"In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems."
If you could give us a citation for the source(s) of your definitions of "Concurrent" and "Parallel", it would help us to understand whether there is a genuine dispute about the terminology ... or ... you are simply misinformed.
1 - Interesting fact: they were called "green threads" because the Sun team that developed the first Java release was called the "Green Team". Source: "Java Technology: The early years" by Jon Byous, April 2003.
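To make those definitions concrete, here is a small demo (the thread count and workloads are made up): the Java spec guarantees the four threads run concurrently, while whether they also run in parallel depends on the cores the platform exposes:

```java
// Hypothetical demo: four threads computing independent sums run
// concurrently per the language spec; whether they also run in parallel
// depends on how many cores the platform gives the JVM.
public class ConcurrentVsParallel {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores visible to the JVM: " + cores);

        Thread[] threads = new Thread[4];
        long[] results = new long[threads.length];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                long sum = 0;
                for (int j = 0; j < 1_000_000; j++) sum += j;
                results[id] = sum;
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // All four ran concurrently; on a multicore machine they may also
        // have run in parallel, but the spec does not promise that.
        System.out.println("result of thread 0: " + results[0]);
    }
}
```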
So I am pretty confused. I read in an article that from version 1.7 onwards Java has been 'core-aware'.
I think the context matters. Maybe you are talking about this Quora post? To quote:
Ever since Java 7, the JVM has been core-aware and able to access cores within a CPU. If the host has two CPUs with two cores each, then it can create four threads and dispatch them to each of the four cores.
This is not talking about the differences between concurrency theory or parallelism but rather about how the JVM interfaces with the OS and the hardware to provide thread services to the application.
What was the situation before version 1.7? Does that mean Java was only concurrent back then?
Java threads had been available for quite some time before 1.7. Most of the concurrency support was greatly improved in 1.5. Again, the post seems to be specifically about CPUs versus cores. Applications before 1.7 could already use multiple cores to run in parallel.
Now the question is: if I use the Thread class, will the threads be parallel or concurrent, assuming it is a multi-core system, the tasks are fully disjoint, and only this process is running on the system?
So this part of the question seems to be addressing the academic terms "parallel" and "concurrent". #SolomonSlow sounds like they have more academic instruction around this. I've been programming threads for 30+ years starting when they were non-preemptive – back when reentrance was more about recursion than threads.
To me, "concurrent" means in parallel: running concurrently on a single piece of hardware, spread across different cores (physical or virtual). I understand that the Wikipedia page on Concurrency (computer science) disagrees with my definition.
I also understand that a threaded program may run serially depending on many factors including the application itself, the OS, the load on the server running it, etc. and there is a lot of theory behind all this.
Concurrent: not at the same instant; on the same core, sequentially, i.e. at the mercy of the thread scheduler.
I disagree with this definition. The Wikipedia page notes that two concurrent units can run in parallel or out of order (which could mean sequentially), but that is not part of the definition.
What I know is that since JDK 1.2, all Java threads have been created using the 'native thread model', which associates each Java thread with an OS thread with the help of JNI and the OS thread library.
So from the following text I believe that all Java threads created nowadays can make use of multi-core processors:
Multiple native threads can coexist. Therefore it is also called many-to-many model. Such characteristic of this model allows it to take complete advantage of multi-core processors and execute threads on separate individual cores concurrently.
But when I read about the Fork/Join Framework, introduced in JDK 7, in Java: The Complete Reference:
Although the original concurrent API was impressive in its own right, it was significantly expanded by JDK 7. The most important addition was the Fork/Join Framework. The Fork/Join Framework facilitates the creation of programs that make use of multiple processors (such as those found in multicore systems). Thus, it streamlines the development of programs in which two or more pieces execute with true simultaneity (that is, true parallel execution), not just time-slicing.
It makes me question why the framework was introduced when the 'native thread model' had already existed since JDK 1.3.
The fork/join framework does not replace the original low-level thread API; it makes that API easier to use for certain classes of problems.
The original, low-level thread API works: you can use all the CPUs and all the cores on the CPUs installed on the system. If you ever try to actually write multithreaded applications, you'll quickly realize that it is hard.
The low-level thread API works well for problems where threads are largely independent and don't have to share information with each other; in other words, embarrassingly parallel problems. Many problems, however, are not like this. With the low-level API, it is very difficult to implement complex algorithms in a way that is safe (produces correct results without unwanted effects like deadlock) and efficient (does not waste system resources).
The Java fork/join framework, an implementation of the fork/join model, was created as a high-level mechanism to make it easier to apply parallel computing to divide-and-conquer algorithms.
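A minimal sketch of the divide-and-conquer style the framework targets (the task, data, and threshold are made up for illustration):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal divide-and-conquer sum with the JDK 7 Fork/Join framework.
// fork() schedules the left half on the pool; compute() runs the right
// half on the current worker; join() waits for the left and combines.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int lo, hi;

    public SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {      // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;       // otherwise split in half
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                     // run left half asynchronously
        long rightSum = right.compute(); // right half on this worker
        return left.join() + rightSum;   // wait for left, then combine
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total);
    }
}
```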
So I watched this video from a number of years ago (2008), where Joe Armstrong explains the background of Erlang.
video link
He makes quite a case, and the piece I am asking about is when he says this at 13:07:
[Erlang is a] concurrent language; by that I mean that processes in
the language are part of the programming language. They do not belong
to the operating system. That is really what is wrong with languages
like Java and C++ is that the threads are not in the programming
language; the threads are something that is in the operating system
and they inherit all of the problems that they have in the OS. One of
the problems is the granularity of the memory management system...
And he goes on about the issues with thread management and how they relate to this disconnect between the language and the OS. And THEN he goes on to say that Erlang is uniquely positioned to leverage multi-core technology for that reason, namely that it can manage cores 'directly' without using threads at all? Or did I understand him wrong, or have one or more new languages arisen in the last 8 years to challenge Erlang in this arena?
Thanks very much for any references or comments that might shed light on this matter.
The Erlang VM spawns OS threads (by default, one per CPU core), and they run the process schedulers. (The VM can also spawn more threads for I/O operations, drivers, and NIFs, but those don't run Erlang code.) The schedulers schedule the execution of code in Erlang processes. Each Erlang process is (or can and should be) very lightweight compared to OS processes and OS threads, and processes are completely separated from each other. This allows a unique style of application design that utilizes multicore hardware easily, safely, robustly, and elegantly. For more information about the mapping of Erlang processes to cores, see my answer to another question. For a detailed explanation of how scheduling works, see the blog post Erlang Scheduler Details and Why It Matters.
It is obvious that the OS's scheduling and threading algorithms have an impact on Java threads, but
can we safely say that threads are OS/machine dependent?
If this is the case, doesn't it make Java platform dependent?
Yes, the details of thread scheduling in Java depend on the JVM implementation and (usually) on the OS implementation as well.
But the specifics of that scheduling are also not specified in the Java SE specification; only a selected few ground rules are specified.
This means that as long as the OS specific scheduling is conforming to those ground rules, it is also conforming to the JVM spec.
If your code depends on scheduling specifics that are not specified in the JVM spec, then it depends on implementation details and can not be expected to work everywhere.
That's pretty much the same situation as file I/O: if you hard-code paths and use a fixed directory separator, then you're working outside the spec and can not expect your code to work cross-platform.
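For instance, one of those ground rules is the happens-before edge established by Thread.join(). A sketch (the field and value are made up) that relies only on that guarantee, and not on any scheduling order, works on every conforming JVM:

```java
// Relying only on what the Java Language Specification guarantees:
// Thread.join() establishes a happens-before edge, so reading `result`
// after join() is safe on every conforming JVM. The *order* in which
// threads are scheduled, by contrast, is unspecified and varies by
// OS and JVM.
public class SpecSafe {
    static int result; // safely published via join()

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> result = 42);
        worker.start();
        worker.join();              // guaranteed visibility after this
        System.out.println(result); // always prints 42
    }
}
```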
Edit: The JVM implementation itself (i.e. the JRE) is platform dependent, of course. It provides the layer that allows pure Java programs not to care about platform specifics. To achieve this, the JRE has to be platform specific.
... Java will usually use native threads, but on some operating
systems it uses so called "green threads", which the JVM handles
itself and is executed by a single native thread.
You shouldn't have to worry about this. It is all handled by the JVM,
and is invisible to the programmer. The only real difference I can
think of is that on an implementation that uses green threads, there
will be no performance gain from multi-threaded divide-and-conquer
algorithms. However, the same lack of performance gain is true for
implementations that use native threads, but run on a machine with a
single core.
Excerpt from JVM & Java Threads Scheduling
Even on the same platform, if you write unsafe multi-thread code, behavior can depend on the full configuration details, the rest of the machine load, and a lot of luck, as well as hardware and OS. An unsafe program can work apparently correctly one day, and fail the next on the same hardware with more-or-less the same workload.
If you write safe multi-thread code, code that depends only on what is promised in the Java Language Specification and the library APIs, the choice of platform can, of course, affect performance, but not whether it works functionally.
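The safe/unsafe difference can be made concrete with the classic lost-update example (the counts and thread number are arbitrary): the plain counter depends on timing and luck, while the AtomicInteger relies only on documented API guarantees:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Classic unsafe/safe pair: `plain++` is a non-atomic read-modify-write,
// so the unsafe counter may lose updates on some runs and platforms.
// AtomicInteger depends only on its documented guarantees and works
// functionally on every platform.
public class SafeVsUnsafe {
    static int plain = 0;                                    // unsafe
    static final AtomicInteger atomic = new AtomicInteger(); // safe

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                 // data race: updates may be lost
                atomic.incrementAndGet();
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("plain:  " + plain);        // often below 200000
        System.out.println("atomic: " + atomic.get()); // always 200000
    }
}
```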
I'm looking for a development job and see that many listings specify that the developers must be versed in multithreading. This appears both for Java job listings, and for C++ listings that involve "system programming" on UNIX.
In the past few years I have been working with Java and using its various synchronization mechanisms.
In the late 90s I did a lot of C++ work, though very little threads. In college, however, we used threads on Solaris.
My question is whether there are significant differences in the issues that developers in C/C++ face compared to developers in Java, and whether any of the techniques to address them are fundamentally different. Java obviously includes some nicer mechanisms and synchronized versions of collections, etc.
If I want to refresh or relearn threading on UNIX, what's the best approach? Which library should I look at? Is there a good current tutorial on threads in C++?
The fundamental challenges of threading (e.g. synchronization, race conditions, inter-thread communication, resource cleanup) are the same, but Java makes threads much more manageable with garbage collection, exceptions, advanced synchronization objects, and advanced debugging support via reflection.
With C++, you are much more likely to get memory corruption and "impossible" race conditions. And you will need to write a lot more low-level thread primitives yourself, or rely on libraries (like Boost) that are not part of the standardized language.
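One of the "advanced synchronization objects" mentioned above, sketched in Java (the worker count is arbitrary): a CountDownLatch lets the main thread wait for several workers without the hand-rolled condition-variable code where C++ versions often go wrong:

```java
import java.util.concurrent.CountDownLatch;

// A higher-level synchronization object from java.util.concurrent:
// the main thread blocks in await() until every worker has called
// countDown(), with no explicit locks or condition variables.
public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch done = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                // ... do some work here ...
                done.countDown(); // signal this worker's completion
            }).start();
        }
        done.await(); // blocks until the count reaches zero
        System.out.println("all workers finished");
    }
}
```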
C++ is actually easier to write complex threaded code in than Java, because it has a feature Java lacks: RAII, or "resource acquisition is initialisation". This idiom is used for all resource control in well-written C++ code, but it is particularly appropriate in multi-threaded code, where automatic management of synchronisation is a must.
Look at pthreads and Boost (the pthreads one was a random link, but it looks OK as a starting point).
At a high level, the issues for Java/C/C++ are the same. The specifics of how you solve the problem (functions to call, classes to create, etc.) vary from language to language.
Garbage collection makes programming threads that do not leak memory easier, and there are fancy things you can do to address the timing of the collections.
Deterministic destructors make programming threads that do not spawn zombies easier, see ACM paper here
It depends on what level you choose to work at. Intel TBB and OpenMP handle a lot of common cases from a pretty high level. Posix threads, Windows APIs, and portable libraries like Boost threads bring you closer to the same level as primitives in Java.
C++0x threading (especially with acquire and release memory barriers) allows you to go to an even lower level, for more control and complexity than Java offers: marking a variable volatile in Java gives it both an acquire and a release memory barrier, but in Java you can't ask for just the acquire or just the release barrier, whereas in C++0x you can.
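The Java side of that comparison in a sketch (field names made up): a volatile write acts as a release and a volatile read as an acquire, always as a pair:

```java
// Java's volatile always pairs release semantics on write with acquire
// on read: the volatile write to `ready` publishes the preceding plain
// write to `payload`, so a reader that observes ready == true is
// guaranteed to see payload == 42. C++0x/C++11 lets you request only
// one side (memory_order_acquire / memory_order_release); Java doesn't.
public class VolatilePublish {
    static int payload;            // plain field
    static volatile boolean ready; // release on write, acquire on read

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42; // ordinary write...
            ready = true; // ...published by this volatile (release) write
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin on the volatile (acquire) read */ }
            System.out.println(payload); // guaranteed to print 42
        });
        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```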
Please note that C++0x's threading model is intentionally low level with the hope that people will build things like TBB on top of it and the next time the standards committee meets they'll be able to figure out which of those higher level libraries and toolkits work well enough to learn from.
Regardless of the programming language being used, the idiosyncrasies of threads are much the same. For instance, even across operating systems, POSIX threads and Win32 threads have the same set of logical idiosyncrasies, though the API calls and the native implementation with respect to the underlying hardware/kernel may differ. For systems programmers, the hardest part is the logical thinking about threads and how to make them work as expected. The same holds across programming languages: if you really understand the concepts of threading and thread synchronization, you are good to go and can use them in any programming language you like, since these languages provide syntactic sugar on top of the native thread and synchronization implementation.