There is at least three well-known approaches for creating concurrent applications:
Multithreading and memory synchronization through locking(.NET, Java). Software Transactional Memory (link text) is another approach to synchronization.
Asynchronous message passing (Erlang).
I would like to learn if there are other approaches and discuss various pros and cons of these approaches applied to large distributed applications. My main focus is on simplifying life of the programmer.
For example, in my opinion, using multiple threads is easy when there is no dependencies between them, which is pretty rare. In all other cases thread synchronization code becomes quite cumbersome and hard to debug and reason about.
I'd strongly recommend looking at this presentation by Rich Hickey. It describes an approach to building high performance, concurrent applications which I would argue is distinct from lock-based or message-passing designs.
Basically it emphasises:
Lock free, multi-threaded concurrent applications
Immutable persistent data structures
Changes in state handled by Software Transactional Memory
And talks about how these principles influenced the design of the Clojure language.
Read Herb Sutter's Effective Concurrency column, and you too will be enlightened.
With the Java 5 concurrency API, doing concurrent programming in Java doesn't have to be cumbersome and difficult as long as you take advantage of the high-level utilities and use them correctly. I found the book, Java Concurrency in Practice by Brian Goetz, to be an excellent read about this subject. At my last job, I used the techniques from this book to make some image processing algorithms scale to multiple CPUs and to pipeline CPU and disk bound tasks. I found it to be a great experience and we got excellent results.
Or if you are using C++ you could try OpenMP, which uses #pragma directives to make loops parallel, although I've never used it myself.
In Erlang and OTP in Action, the authors present four process communication paradigms:
Shared memory with locks
A construct (lock) is used to
restrict access to shared resources.
Hardware support is often required
from the memory system, in terms of
special instructions. Among the possible drawbacks of this approach: overhead, points of contention in the memory system, debug difficulty, especially with huge number of processes.
Software Transactional Memory
Memory is treated as a database,
where transactions decide what to
write and when. The main problem here
is represented by the possible
contentions and by the number of
failed transaction attempts.
Futures, promises and similar
The basic idea is that a future is a
result of a computation that has been
outsourced to a different process
(potentially on a different CPU or
machine) and that can be passed around
like any other object. In case of
network failures problem can arise.
Message passing
Synchronous or asynchronous, in
Erlang style.
Related
I would like to know yours thoughts about my design from your experience.
I am designing a system having a very critical part:
I have component A,B,C(on the same JVM) which need to "speak" with each other.
I could have two ways doing so:
Method call way(each one holds each other instances (injection, object instance,etc..)
messaging way(topic/queue)
I am aware to cons of having a middle ware messing system(option-2).
BUT:
I am talking about latency considerations.
I need to have those messages reached to the targets in low latency (talking about ms latency).
I would like to choose option-2(the messaging way).
By your experience how much it will affect my latency? again latency is a very huge factor in this decision.
(Programming with Java, not sure which app container yet (Spring, Jboss..)
thanks,
ray.
Given that the messaging way is in-memory, within the same JVM. Then typically most latency comes from a combination of contention (use of synchronized etc), scheduling (how threads are woken up to do their jobs and so forth) and GC. Those sources of latency tend to dwarf everything else.
It is possible to write fairly light weight messaging systems that do not add much overhead. A good example of this would be Akka, which is increasingly finding its way into low latency financial systems. It is better known in the Scala realm, but it does have a Java API.
In conclusion, a messaging system can be implemented to sub millisecond demands. However make sure that it fits your needs first. Just because you can, does not mean that you should. If you are working on a small system then dependency injection/inversion of control may be all that you need to have a good design. However if you are looking at messaging as a way to bring multiple cpu cores into the mix, or some such then I recommend taking a look at Akka. Even if only as a case study.
In the shared-address-space model there is a common address space shared between
processes and represented as data structures in memory (like ConcurrentHashMap). This gives the advantage of very fast data sharing as shared objects are located on a single computer (let us suppose so for simplicity). As processes may collide, various lock mechanisms (mutex) are helpful to ensure mutual exclusion in accessing shared memory.
This scheme lacks of scalability as an increase in processor number can raise geometrically the traffic on shared memory and a single computer can not have more than say 8 processors.
In the message-passing model there is no sense of global address space. Each
process has its one private local memory. Processes can communicate with each
other via passing messages. Unlike shared-address-space, the message-
passing model offers scalability between processors and memory, although requires the common data to be replicated. An increase in processors will proportionally increase the memory (for that data) size as well, though no lock mechanisms are required in this case.
Reading "Thinking in Java" for inspiration I find only a talk about the shared-address-space model with synchronization principles. As my problem grows in complexity I'm going to try the message-passing paradigm, which as far as I'm not blind, is not presented in the book.
Could you please recommend Java native classes or any proved external library to work with the message-passing model, something like MPI in C++? Any link to that source would be highly appreciated!
Akka is a commonly-used actor framework for the JVM - available for both Java and Scala.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
http://hadoop.apache.org/
I need to parallelize a CPU intensive Java application on my multicore desktop but I am not so comfortable with threads programming. I looked at Scala but this would imply learning a new language which is really time consuming. I also looked at Ateji PX Java parallel extensions which seem very easy to use but did not have a chance yet to evaluate it. Would anyone recommend it? Other suggestions welcome.
Thanks in advance for your help
Bill
I would suggest you try the built-in ExecutorService for distributing multiple tasks across multiple threads/cores. Do you have any requirements which this might not do for you?
The Java concurrency utilites:
http://download.oracle.com/javase/1.5.0/docs/guide/concurrency/overview.html
make parallel programming on Java even easier than it already was. I would suggest starting there - if you are uncomfortable with that level of working with threads, I would think twice about proceeding further. Parallelizing anything requires some level of technical comfort with how concurrent computation is done and coordinated. In my opinion, it can't get much easier than that framework - which is part of the reason why you see so few alternatives.
Second, the main thing you should think about is what the unit of work is for parallelization. If your unit of work is independent (i.e., each parallel task does not impact the others), this is generally far easier because you don't need to worry about much (or any) synchronization at all. Put effort into thinking how to model the problem so that computation is as independent as possible. If you model it well, you will almost certainly reduce the lines of code (which reduces the error, etc).
Admittedly, frameworks that automatically parallelize for you are less error prone, but can be suboptimal if your model unit of work doesn't play to their parallelization scheme.
I am the lead developer of Ateji PX. As you mention, guaranteeing thread safety is an important topic. It is also a very difficult one, and there's not much help out there beside hand-written and hand-checked #ThreadSafe annotations. See e.g. "The problem with threads".
We are currently working on a parallel verifier for Ateji PX. This has become possible because parallelism in Ateji PX is compositional (unlike threads) and based on a sound mathematical foundation, namely pi-calculus. Even without a tool, experience shows that expressing parallelism in an intuitive and compositional way makes it much easier to "think parallel" and catch errors earlier.
I browsed quickly through the Ateji PX web site. Seems to be a nice product but I'm afraid you will be disappointed at some point since Ateji PX only provides you an intuitive simple way of performing high level parallel operations such as distributing the work load on several workers, creating rendez-vous points between parallel tasks, etc. However as you can read in the FAQ in the section How do you detect and prevent data dependencies? Ateji PX does not ensure that the underlying code is thread safe. So at any rate you'll still be needing skills in Java thread programming.
Edit:
Also consider that when maintenance time will come and you won't be available to perform it, it'll be easier to find a contractor, employee or trainee with skills in standard Java multithread programming than in Ateji PX.
Last word, there's a free 30 days evaluation, try it.
Dont worry java 7 is coming up with Fork Join by Doug lea for Distributed Processing.
I've work in embedded systems and systems programming for hardware interfaces
to date. For fun and personal knowledge, recently I've been trying to learn more about server programming after getting my hands wet with Erlang. I've been going back and thinking about servers from a C++/Java prospective, and now I wonder how scalable systems can be built with technology like C++ or Java.
I've read that due to context-switching and limited memory, a per-client thread handler isn't realistic. Usually a thread-pool is created and a mix of worker-threads and asynchronous I/O is used to handle requests. I wonder, first of all, how does one determine the thread pool size? Does one simply have to measure and find the optimal balance? Eventually as the system scales then perhaps more than one server is needed to handle requests. How are requests managed across mulitple servers handling a large client base?
I am just looking for some direction into where I might be able to read more and find answers to my questions. What area of computer science would I look into for more information in this area? Are there any design patterns for this area of computing?
Your question is too general to have a nice answer. The answer depends greatly on the context, on how much processing any one Thread does, on how rapidly requests arrive, on the CPU family being used, on the web container being used, and on many other factors.
for C++ I've used boost::asio, it's very modern C++, and quite plesant to work with. Also the C++0x network libraries will be based on ASIO's implementation, so it's valuable knowledge.
As for designs 1thread per client, doesn't work, as you've already learned. And for high performance multithreading the best number of threads seems to be CoresX2, but for servers, there is lots of IO per request, which means lots of idle waiting. And from experience, looking at Apache, MySQL, and Oracle the amount of threads is about CoresX10 for database servers, and CoresX40 for web servers, not saying these are the ideals, but they seem to be patterns of succesful systems, so if your system can be balanced to work optimally with similar numbers atleast you'll know your design isn't completely lousy.
C++ Network Programming: Mastering Complexity Using ACE and Patterns and
C++ Network Programming: Systematic Reuse with ACE and Frameworks are very good books that describe many design patterns and their use with the highly portable ACE library.
Like Lothar, we use the ACE library which contains reactor and proactor patterns for handling asynchronous events and asynchronous I/O with C++ code. We use sizable worker thread pools that grow as needed (to a configurable maximum) and shrink over time.
One of the tricks with C++ is how you are going to propagate exceptions and error situations across network boundaries (which isn't handled by the language). I know that there are ways with .NET to throw exceptions across these network boundaries.
One thing you may consider is looking into SOA (Service Oriented Architecture) for dealing with higher level distributed system issues. ACE if really for running at the bare metal of the machine.
I'm looking for a development job and see that many listings specify that the developers must be versed in multithreading. This appears both for Java job listings, and for C++ listings that involve "system programming" on UNIX.
In the past few years I have been working with Java and using its various synchronization mechanisms.
In the late 90s I did a lot of C++ work, though very little threads. In college, however, we used threads on Solaris.
My question is whether there are significant differences in the issues that developers in C/C++ face compared to developers in Java, and whether any of the techniques to address them are fundamentally different. Java obviously includes some nicer mechanisms and synchronized versions of collections, etc.
If I want to refresh or relearn threading on UNIX, what's the best approach? Which library should I look at? etc. Is there some great current tutorial on threads in c++?
The fundamental challenges of threading (e.g. synchronization, race conditions, inter-thread communication, resource cleanup), but Java makes thread much more manageable with garbage collection, exceptions, advanced synchronization objects, advanced debugging support with reflection.
With C++, you are much more likely to have memory corruption and "impossible" race conditions. And you will need to write a lot more low-level thread primitives or rely on libraries (like boost) that are not part of the standardized language.
C++ is actually aeasier to write complex threaded code in than Java because it has a feature Java lacks - RAII or "resource acquisition is initialisation". This idiom is used for all all resource control in well written C++ code, but is particularly appropriate in multi-threaded code where automatic management of synchronisation is a must.
Look at pthreads and boost (the pthreads one was a random lijnk, but it looks ok as a starting point).
At a high level the issues for Java/C/C++/ are the same. The specifics about how you solve the problem (functions to call, classes to create, etc...) vary language to language.
Garbage collection makes programming threads that do not leak memory easier, and there are fancy things you can do to address the timing of the collections.
Deterministic destructors make programming threads that do not spawn zombies easier, see ACM paper here
It depends on what level you choose to work at. Intel TBB and OpenMP handle a lot of common cases from a pretty high level. Posix threads, Windows APIs, and portable libraries like Boost threads bring you closer to the same level as primitives in Java.
C++0x threading (especially with acquire and release memory barriers) allow you to go to an even lower level for more control and complexity than Java offers (marking a variable volatile in Java gives it both an acquire and a release memory barrier, but in Java you can't ask for just the acquire or just the release barrier; where in C++0x you can).
Please note that C++0x's threading model is intentionally low level with the hope that people will build things like TBB on top of it and the next time the standards committee meets they'll be able to figure out which of those higher level libraries and toolkits work well enough to learn from.
Regardless of the programming language being uses, the idiosyncrasy of thread are common. For instance even across OS the POSIX threads & WIN32 threads have same set of logical idiosyncrasies, though the API calls & native implementation WRT underlying hardware/kernel might change, but to system programmers the logical thinking about threads & how to make them work as expected & in achieving this is the most hardest part. This is even true when coming to programming languages. If you really understand the concept of threading & thread synchronization you are good to go & use them in any programming languages you like. Since these programming languages provide syntactic sugars on top of the native thread/thread synchronization implementation.