I would like to know yours thoughts about my design from your experience.
I am designing a system having a very critical part:
I have component A,B,C(on the same JVM) which need to "speak" with each other.
I could have two ways doing so:
Method call way(each one holds each other instances (injection, object instance,etc..)
messaging way(topic/queue)
I am aware to cons of having a middle ware messing system(option-2).
BUT:
I am talking about latency considerations.
I need to have those messages reached to the targets in low latency (talking about ms latency).
I would like to choose option-2(the messaging way).
By your experience how much it will affect my latency? again latency is a very huge factor in this decision.
(Programming with Java, not sure which app container yet (Spring, Jboss..)
thanks,
ray.
Given that the messaging way is in-memory, within the same JVM. Then typically most latency comes from a combination of contention (use of synchronized etc), scheduling (how threads are woken up to do their jobs and so forth) and GC. Those sources of latency tend to dwarf everything else.
It is possible to write fairly light weight messaging systems that do not add much overhead. A good example of this would be Akka, which is increasingly finding its way into low latency financial systems. It is better known in the Scala realm, but it does have a Java API.
In conclusion, a messaging system can be implemented to sub millisecond demands. However make sure that it fits your needs first. Just because you can, does not mean that you should. If you are working on a small system then dependency injection/inversion of control may be all that you need to have a good design. However if you are looking at messaging as a way to bring multiple cpu cores into the mix, or some such then I recommend taking a look at Akka. Even if only as a case study.
Related
For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating which each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?
I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread safe, so you cannot parallelize withough resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do. Depending on your design harder than multiprocessing. The later, however, is arguably also not too easy.
You can have a model where you have a watchdog process that can actively monitor (and eventually restart) crashed worker processes. This may also include suspend/resume of processes, which is not safe with threads (thanks to #Jayan for pointing out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. for 32bit apps on Windows 2GB), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: Using differnt processes you can use different Java versions for your application parts (if required by 3rd party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing scalability and/or robustness of the application (see #Qwe's answer for more Details / the original idea).
If you decide to go with threads you will restrict your app to be run on a single machine. This solution doesn't scale (or scales to some extent) - there are always hardware limits.
And different processes communicating via sockets can be distributed between machines, so that you could add virtually unlimited number or them. This scales better at the cost of slow communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And once you make the decision there's no guarantee that it will look stupid to your successors in a couple of years when requirements change or new hardware becomes available.
I have heard a lot of times Java being more resource efficient than being a very fast language. What does efficiency in terms of resources consumption actually means and how is that beneficial for web applications context ?
For web applications, raw performance of the application code is mostly irrelevant (of course if you make it excessively slow, you're still in trouble) as the main performance bottlenecks are outside your application code.
They're network, databases, application servers, browser rendering time, etc. etc..
So even if Java were slow (it isn't) it wouldn't be so much of a problem.
An application however that consumes inordinate amounts of memory or CPU cycles can bring a server to its knees.
But again, that's often less a problem of the language you use than of the way you use it.
For example, creating a massive memory structure and not properly disposing of it can cause memory problems always. Java however makes it harder to not dispose of your memory (at the cost of a tiny bit of performance and maybe hanging on to that memory longer than strictly necessary). But similar garbage collection systems can be (and have been) built and used with other languages as well.
Resources are cheap these days, esp memory, CPU, disk. So unless you are programming for a phone, or a very complex problem efficiency is unlikely the best reason to use/not use Java.
Java is more efficient than some languages, less efficient than others.
For web applications, there are many other considerations, like user experience to consider first.
A programming language can't be "fast" or "slow" or "resource efficient".
For architecture overview, MSDN Architecture Guide seems nice.
For most accurate/technical description of difference between interface and abstract class, refer to the spec for your language. For Java it's Java Language Spec.
I need to parallelize a CPU intensive Java application on my multicore desktop but I am not so comfortable with threads programming. I looked at Scala but this would imply learning a new language which is really time consuming. I also looked at Ateji PX Java parallel extensions which seem very easy to use but did not have a chance yet to evaluate it. Would anyone recommend it? Other suggestions welcome.
Thanks in advance for your help
Bill
I would suggest you try the built-in ExecutorService for distributing multiple tasks across multiple threads/cores. Do you have any requirements which this might not do for you?
The Java concurrency utilites:
http://download.oracle.com/javase/1.5.0/docs/guide/concurrency/overview.html
make parallel programming on Java even easier than it already was. I would suggest starting there - if you are uncomfortable with that level of working with threads, I would think twice about proceeding further. Parallelizing anything requires some level of technical comfort with how concurrent computation is done and coordinated. In my opinion, it can't get much easier than that framework - which is part of the reason why you see so few alternatives.
Second, the main thing you should think about is what the unit of work is for parallelization. If your unit of work is independent (i.e., each parallel task does not impact the others), this is generally far easier because you don't need to worry about much (or any) synchronization at all. Put effort into thinking how to model the problem so that computation is as independent as possible. If you model it well, you will almost certainly reduce the lines of code (which reduces the error, etc).
Admittedly, frameworks that automatically parallelize for you are less error prone, but can be suboptimal if your model unit of work doesn't play to their parallelization scheme.
I am the lead developer of Ateji PX. As you mention, guaranteeing thread safety is an important topic. It is also a very difficult one, and there's not much help out there beside hand-written and hand-checked #ThreadSafe annotations. See e.g. "The problem with threads".
We are currently working on a parallel verifier for Ateji PX. This has become possible because parallelism in Ateji PX is compositional (unlike threads) and based on a sound mathematical foundation, namely pi-calculus. Even without a tool, experience shows that expressing parallelism in an intuitive and compositional way makes it much easier to "think parallel" and catch errors earlier.
I browsed quickly through the Ateji PX web site. Seems to be a nice product but I'm afraid you will be disappointed at some point since Ateji PX only provides you an intuitive simple way of performing high level parallel operations such as distributing the work load on several workers, creating rendez-vous points between parallel tasks, etc. However as you can read in the FAQ in the section How do you detect and prevent data dependencies? Ateji PX does not ensure that the underlying code is thread safe. So at any rate you'll still be needing skills in Java thread programming.
Edit:
Also consider that when maintenance time will come and you won't be available to perform it, it'll be easier to find a contractor, employee or trainee with skills in standard Java multithread programming than in Ateji PX.
Last word, there's a free 30 days evaluation, try it.
Dont worry java 7 is coming up with Fork Join by Doug lea for Distributed Processing.
I need to build simple server that
reads (potentially large) xml files
processes them in memory(eg transform them to a different xml structure)
writes them back to disk.
Some important aspects of the program:
speed
ability to distribute the server. That means placing (what does that mean) several such servers and each server will handle different volume of xml files.
cross platform
built in a very tight dead line
Basically my question is :
In what programming language should I do it ?
Java ?
speed of development
cross platform
IO operation are high with the right configuration (add a web link here).
C++ ?
execution speed
cross platform (with the right libraries).
however development is slower.
Rather than coding this in a low-level language, you might want to look into an ETL or XSLT engine. They are optimized for performance beyond what you would generally be able to produce on your own, and are generalized enough to accommodate user changes (not sure if your XML transformation is a one-time thing, or if it may change over time).
I'm still a little foggy on your requirements BUT.
You are asking the wrong question. If language really isn't an issue, you should be looking for 3rd party libraries that can handle large amounts of disk io, an libraries that perform XSLT. See which libraries exist for both languages then pick.
Further, if the performance is a key requirement, you'll need to determine whether the process will be IO bound or CPU bound. That will dictate with libraries need to be used as well as general architecture. Are the xml transformations cpu intensive? or can the easily be done with a one or two pass parse?
Tight deadline? The need for parallel operation a given? Then speed is not a problem. Just throw more servers at it, until your throughput matches the demand.
If you're faster in Java, sure, go ahead with that. You might need twice the number of servers, but those can be built in days not weeks.
Portability is never a requirement under tight deadlines. Just ask whoever set those deadlines whether he has made any non-reversible choices. If so, stick with those; if not, pick something and stick with that. You don't have the time to test it on different platforms, so any portability you would have would be theoretical anyway.
I've work in embedded systems and systems programming for hardware interfaces
to date. For fun and personal knowledge, recently I've been trying to learn more about server programming after getting my hands wet with Erlang. I've been going back and thinking about servers from a C++/Java prospective, and now I wonder how scalable systems can be built with technology like C++ or Java.
I've read that due to context-switching and limited memory, a per-client thread handler isn't realistic. Usually a thread-pool is created and a mix of worker-threads and asynchronous I/O is used to handle requests. I wonder, first of all, how does one determine the thread pool size? Does one simply have to measure and find the optimal balance? Eventually as the system scales then perhaps more than one server is needed to handle requests. How are requests managed across mulitple servers handling a large client base?
I am just looking for some direction into where I might be able to read more and find answers to my questions. What area of computer science would I look into for more information in this area? Are there any design patterns for this area of computing?
Your question is too general to have a nice answer. The answer depends greatly on the context, on how much processing any one Thread does, on how rapidly requests arrive, on the CPU family being used, on the web container being used, and on many other factors.
for C++ I've used boost::asio, it's very modern C++, and quite plesant to work with. Also the C++0x network libraries will be based on ASIO's implementation, so it's valuable knowledge.
As for designs 1thread per client, doesn't work, as you've already learned. And for high performance multithreading the best number of threads seems to be CoresX2, but for servers, there is lots of IO per request, which means lots of idle waiting. And from experience, looking at Apache, MySQL, and Oracle the amount of threads is about CoresX10 for database servers, and CoresX40 for web servers, not saying these are the ideals, but they seem to be patterns of succesful systems, so if your system can be balanced to work optimally with similar numbers atleast you'll know your design isn't completely lousy.
C++ Network Programming: Mastering Complexity Using ACE and Patterns and
C++ Network Programming: Systematic Reuse with ACE and Frameworks are very good books that describe many design patterns and their use with the highly portable ACE library.
Like Lothar, we use the ACE library which contains reactor and proactor patterns for handling asynchronous events and asynchronous I/O with C++ code. We use sizable worker thread pools that grow as needed (to a configurable maximum) and shrink over time.
One of the tricks with C++ is how you are going to propagate exceptions and error situations across network boundaries (which isn't handled by the language). I know that there are ways with .NET to throw exceptions across these network boundaries.
One thing you may consider is looking into SOA (Service Oriented Architecture) for dealing with higher level distributed system issues. ACE if really for running at the bare metal of the machine.