Server Design and Implementation - java

I've worked in embedded systems and systems programming for hardware interfaces to date. For fun and personal knowledge, I've recently been trying to learn more about server programming after getting my feet wet with Erlang. I've been going back and thinking about servers from a C++/Java perspective, and now I wonder how scalable systems can be built with technology like C++ or Java.
I've read that due to context switching and limited memory, a thread-per-client handler isn't realistic. Usually a thread pool is created and a mix of worker threads and asynchronous I/O is used to handle requests. First of all, how does one determine the thread pool size? Does one simply have to measure and find the optimal balance? Eventually, as the system scales, more than one server may be needed to handle requests. How are requests managed across multiple servers handling a large client base?
I am just looking for some direction into where I might be able to read more and find answers to my questions. What area of computer science would I look into for more information in this area? Are there any design patterns for this area of computing?

Your question is too general to have a nice answer. The answer depends greatly on the context, on how much processing any one Thread does, on how rapidly requests arrive, on the CPU family being used, on the web container being used, and on many other factors.

For C++ I've used boost::asio. It's very modern C++ and quite pleasant to work with. ASIO has also been proposed as the basis for standard C++ networking, so it's valuable knowledge.
As for designs, one thread per client doesn't work, as you've already learned. For high-performance multithreading the best number of threads seems to be about 2 x cores, but servers do a lot of I/O per request, which means a lot of idle waiting. From experience, looking at Apache, MySQL, and Oracle, the number of threads is about 10 x cores for database servers and 40 x cores for web servers. I'm not saying these are ideal, but they seem to be patterns of successful systems, so if your system can be balanced to work optimally with similar numbers, at least you'll know your design isn't completely lousy.
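As a rough illustration of the multipliers above, here is a minimal Java sketch that sizes a fixed pool from the core count. The class name `PoolSizing` and its constants are made up for this example, and the multipliers are only the rules of thumb quoted above, so treat the result as a starting point to measure against, not a tuned value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: turns the "threads per core" rules of thumb into a pool size.
final class PoolSizing {
    // Multipliers taken from the answer above; assumptions, not measured optima.
    static final int CPU_BOUND = 2;
    static final int DATABASE = 10;
    static final int WEB = 40;

    static int poolSize(int cores, int multiplier) {
        if (cores < 1 || multiplier < 1) {
            throw new IllegalArgumentException("cores and multiplier must be positive");
        }
        return cores * multiplier;
    }

    // Builds a fixed-size pool for the machine this code is running on.
    static ExecutorService newPool(int multiplier) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(cores, multiplier));
    }
}
```

For example, on a 4-core machine `PoolSizing.poolSize(4, PoolSizing.WEB)` gives 160 threads.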

C++ Network Programming: Mastering Complexity Using ACE and Patterns and C++ Network Programming: Systematic Reuse with ACE and Frameworks are very good books that describe many design patterns and their use with the highly portable ACE library.

Like Lothar, we use the ACE library which contains reactor and proactor patterns for handling asynchronous events and asynchronous I/O with C++ code. We use sizable worker thread pools that grow as needed (to a configurable maximum) and shrink over time.
One of the tricks with C++ is how you are going to propagate exceptions and error situations across network boundaries (which isn't handled by the language). I know that there are ways with .NET to throw exceptions across these network boundaries.
One thing you may consider is looking into SOA (Service Oriented Architecture) for dealing with higher-level distributed system issues. ACE is really for running at the bare metal of the machine.
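The answer above is about ACE in C++, but a worker pool that grows on demand up to a configurable maximum and shrinks when idle can be approximated in Java with `java.util.concurrent.ThreadPoolExecutor`. This is only a sketch of the idea, not ACE's API. The `ElasticPool` name and its parameters are invented for the example.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for a pool that grows under load and shrinks when idle.
final class ElasticPool {
    static ThreadPoolExecutor create(int minThreads, int maxThreads, long idleSeconds) {
        return new ThreadPoolExecutor(
                minThreads,               // threads kept alive even when idle
                maxThreads,               // configurable upper bound
                idleSeconds, TimeUnit.SECONDS, // extra threads exit after this idle time
                // No buffering: a task is handed to an idle worker, or a new
                // worker is started until maxThreads is reached.
                new SynchronousQueue<>(),
                // At the maximum, the submitting thread runs the task itself,
                // which slows producers down instead of dropping work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

The choice of `CallerRunsPolicy` is one option among several. Rejecting the task outright is another, depending on whether back-pressure or fast failure is preferred.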

Related

Are there some common techniques for Java Web Servers to detect when approaching overload?

At a time when microservices are becoming more and more predominant, I was wondering if there are some common techniques by which Java web services detect overload before they start deteriorating.
Unfortunately, it doesn't look like there is one golden rule (e.g. CPU must never go above 50%), and it seems to be a topic where the best heuristic wins.
There are several resources online that point to thresholds for CPU usage, GC pauses, network I/O, etc., but I was wondering if there are more scientific studies or documented use cases for catching these signals.

Dynamically messaging vs Latency design issue

I would like to know your thoughts about my design, based on your experience.
I am designing a system with a very critical part:
I have components A, B and C (on the same JVM) which need to "speak" with each other.
I see two ways of doing so:
1. Method calls (each component holds instances of the others, via injection, direct object references, etc.)
2. Messaging (topic/queue)
I am aware of the cons of having a middleware messaging system (option 2).
BUT:
I am talking about latency considerations.
I need those messages to reach their targets with low latency (on the order of milliseconds).
I would like to choose option 2 (messaging).
In your experience, how much will it affect my latency? Again, latency is a huge factor in this decision.
(Programming in Java, not sure which app container yet: Spring, JBoss, ...)
thanks,
ray.
Given that the messaging is in-memory, within the same JVM, most latency typically comes from a combination of contention (use of synchronized, etc.), scheduling (how threads are woken up to do their jobs, and so forth) and GC. Those sources of latency tend to dwarf everything else.
It is possible to write fairly lightweight messaging systems that do not add much overhead. A good example of this would be Akka, which is increasingly finding its way into low-latency financial systems. It is better known in the Scala realm, but it does have a Java API.
In conclusion, a messaging system can be implemented to meet sub-millisecond demands. However, make sure that it fits your needs first. Just because you can does not mean that you should. If you are working on a small system, then dependency injection/inversion of control may be all that you need for a good design. However, if you are looking at messaging as a way to bring multiple CPU cores into the mix, then I recommend taking a look at Akka, even if only as a case study.
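To make the in-JVM messaging option concrete, here is a minimal sketch using only `java.util.concurrent`. It is not Akka and does not use Akka's API. The `Mailbox` and `MessagingSketch` names are hypothetical. Each component gets a queue and one thread that drains it, so the latency sources mentioned above (queue contention, thread wake-up, GC) are all visible in a few lines.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical in-memory mailbox: one queue plus one thread that handles messages in order.
final class Mailbox<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    Mailbox(String name, Consumer<T> handler) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks until a message arrives
                }
            } catch (InterruptedException e) {
                // interrupted by close(): let the thread end
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
    }

    void send(T message) {
        queue.add(message); // unbounded queue, so this never blocks the sender
    }

    @Override
    public void close() {
        worker.interrupt();
    }
}

final class MessagingSketch {
    // Component A sends `count` pings to B; B answers each with a pong.
    // Returns how many pongs A received within the timeout.
    static int pingPong(int count) {
        CountDownLatch done = new CountDownLatch(count);
        AtomicInteger pongs = new AtomicInteger();
        try (Mailbox<String> a = new Mailbox<String>("A", msg -> {
                 pongs.incrementAndGet();
                 done.countDown();
             });
             Mailbox<String> b = new Mailbox<String>("B", msg -> a.send("pong"))) {
            for (int i = 0; i < count; i++) {
                b.send("ping");
            }
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pongs.get();
    }
}
```

Timing a call such as `MessagingSketch.pingPong(100_000)` with `System.nanoTime()` would give a first rough idea of the per-message cost on a given machine, although a serious measurement needs warm-up and a proper benchmark harness.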

Hard time choosing ... IO vs NIO

I would like to ask what would be more appropriate to choose when developing a server similar to SmartFoxServer. I intend to develop a similar yet different server. In the benchmarks made by the ones that developed the above server they had something like 10000 concurrent clients.
I did a bit of research regarding the cost of using too many threads (>500) but cannot decide which way to go. I once wrote a server in Java, but that was for a small application and had nothing to do with heavy loads.
Thanks
Take a look at Apache Mina. They've done a lot of the heavy lifting required to use NIO effectively in a networking application. Whether or not NIO increases your ability to process concurrent connections really depends on your implementation, but the performance boosts in Tomcat, JBoss and Jetty are already plenty of evidence in its favor.
I'm not familiar with SmartFoxServer, so I can only speak generically (which is not always good :P but here I go).
I think those are two different questions. One is the I/O performance when using native Java sockets vs. native sockets written in C (like Tomcat).
The other is how to scale up to that kind of concurrency level. Apart from that, I'd always choose native sockets (i.e. C).
Now, how to scale: it's not a good idea to have a lot of threads running at the same time (OS constraints, etc.), so I'd choose to scale horizontally, meaning adding a load balancer that sends the requests to different servers, which can be linked by messages (using JMS, a broker like RabbitMQ or ActiveMQ, or even a protocol like STOMP or AMQP).
Another solution is a cloud environment that allows you to grow your installation as you need.
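As a small illustration of the load balancer idea above, here is a sketch of the simplest possible dispatch policy, round robin. The answer does not say which policy to use, so round robin is an assumption, and the `RoundRobinBalancer` class and the backend names are hypothetical. A real balancer would also track backend health and load.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin picker over a fixed list of backend servers.
final class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.backends = Collections.unmodifiableList(new ArrayList<>(backends));
    }

    // Thread-safe: concurrent callers each get the next backend in turn.
    String pick() {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}
```

With backends `app1`, `app2` and `app3`, successive calls to `pick()` return `app1`, `app2`, `app3`, and then `app1` again.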
In most benchmarks which test 10K or 100K connections, the server is doing no work, and unless your server does next to nothing, these tests are unrealistic.
You need to have a clear idea of how many concurrent connections you want to support.
If you have fewer than 1K connections, using a thread per connection will work OK. This is the simplest approach to take. Using a dispatcher model with NIO will work better if your requests are very simple. Otherwise it won't matter much.
If you have more than 1K connections, you will likely want more than one server: each connection gets less than 1% of a core, and a basic server is relatively cheap these days.
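For reference, the thread-per-connection approach mentioned above can be sketched in a few dozen lines of blocking `java.net` code. This is a minimal echo server under stated assumptions: the `ThreadPerConnectionEcho` class is hypothetical, it binds to the loopback interface on an OS-chosen port, and it has no limit on the number of threads, which a real server would need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ThreadPerConnectionEcho implements AutoCloseable {
    private final ServerSocket server;

    ThreadPerConnectionEcho() throws IOException {
        // Port 0 asks the OS for any free port; loopback only, for the demo.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                // One dedicated thread per connection: simple, blocking code.
                Thread handler = new Thread(() -> echo(client), "conn-" + client.getPort());
                handler.setDaemon(true);
                handler.start();
            } catch (IOException e) {
                return; // accept() fails once the server socket is closed
            }
        }
    }

    private static void echo(Socket client) {
        try (Socket socket = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line); // echo each line back to the client
            }
        } catch (IOException ignored) {
            // the connection dropped; its thread simply ends
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    // Starts a server, sends one line as a client, and returns the reply.
    static String roundTrip(String message) {
        try (ThreadPerConnectionEcho echoServer = new ThreadPerConnectionEcho();
             Socket socket = new Socket(InetAddress.getLoopbackAddress(), echoServer.port());
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

All per-connection state lives in the local variables of `echo`, which is what makes this model easy to write and reason about.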

Concurrent programming techniques, pros, cons

There are at least three well-known approaches for creating concurrent applications:
Multithreading and memory synchronization through locking (.NET, Java).
Software Transactional Memory, which is another approach to synchronization.
Asynchronous message passing (Erlang).
I would like to learn if there are other approaches and discuss the various pros and cons of these approaches applied to large distributed applications. My main focus is on simplifying the life of the programmer.
For example, in my opinion, using multiple threads is easy when there are no dependencies between them, which is pretty rare. In all other cases thread synchronization code becomes quite cumbersome and hard to debug and reason about.
I'd strongly recommend looking at this presentation by Rich Hickey. It describes an approach to building high performance, concurrent applications which I would argue is distinct from lock-based or message-passing designs.
Basically it emphasises:
Lock free, multi-threaded concurrent applications
Immutable persistent data structures
Changes in state handled by Software Transactional Memory
And talks about how these principles influenced the design of the Clojure language.
Read Herb Sutter's Effective Concurrency column, and you too will be enlightened.
With the Java 5 concurrency API, doing concurrent programming in Java doesn't have to be cumbersome and difficult as long as you take advantage of the high-level utilities and use them correctly. I found the book, Java Concurrency in Practice by Brian Goetz, to be an excellent read about this subject. At my last job, I used the techniques from this book to make some image processing algorithms scale to multiple CPUs and to pipeline CPU and disk bound tasks. I found it to be a great experience and we got excellent results.
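As a hedged illustration of the high-level utilities mentioned above, here is a sketch that splits an array of grayscale pixels into stripes and processes them with an `ExecutorService`. It is not the code from that job, only an example of the pattern. The `ParallelStripes` class and the choice of inverting pixels are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: invert 8-bit grayscale pixels using several worker threads.
final class ParallelStripes {
    static int[] invert(int[] pixels, int workers) {
        int[] out = new int[pixels.length];
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Each task owns a disjoint stripe, so no locking is needed.
            int stripe = (pixels.length + workers - 1) / workers;
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < pixels.length; start += stripe) {
                final int from = start;
                final int to = Math.min(start + stripe, pixels.length);
                tasks.add(() -> {
                    for (int i = from; i < to; i++) {
                        out[i] = 255 - pixels[i];
                    }
                    return null;
                });
            }
            // invokeAll waits for every task; get() rethrows any task failure.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a stripe failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return out;
    }
}
```

Creating a pool per call keeps the example self-contained. Long-running code would normally create the pool once and reuse it.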
Or if you are using C++ you could try OpenMP, which uses #pragma directives to make loops parallel, although I've never used it myself.
In Erlang and OTP in Action, the authors present four process communication paradigms:
Shared memory with locks
A construct (lock) is used to restrict access to shared resources. Hardware support is often required from the memory system, in terms of special instructions. Among the possible drawbacks of this approach: overhead, points of contention in the memory system, and debugging difficulty, especially with a huge number of processes.
Software Transactional Memory
Memory is treated as a database, where transactions decide what to write and when. The main problems here are possible contention and the number of failed transaction attempts.
Futures, promises and similar
The basic idea is that a future is a result of a computation that has been outsourced to a different process (potentially on a different CPU or machine) and that can be passed around like any other object. Problems can arise in case of network failures.
Message passing
Synchronous or asynchronous, in Erlang style.
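Of the four paradigms listed above, futures are the easiest to show in a few lines of Java. The sketch below uses `java.util.concurrent.CompletableFuture` (Java 8 and later). The `FutureSketch` class and its methods are invented for illustration, and the "remote failure" is simulated with an exception, not a real network call.

```java
import java.util.concurrent.CompletableFuture;

final class FutureSketch {
    // The computation runs on another thread; the future can be passed around meanwhile.
    static CompletableFuture<Integer> lengthOf(String text) {
        return CompletableFuture.supplyAsync(() -> text.length());
    }

    // Two independent futures combined into one result.
    static int combinedLength(String a, String b) {
        CompletableFuture<Integer> first = lengthOf(a);
        CompletableFuture<Integer> second = lengthOf(b);
        return first.thenCombine(second, Integer::sum).join();
    }

    // A failed computation surfaces when the result is needed;
    // exceptionally() substitutes a fallback value instead.
    static int withFallback(boolean fail) {
        return CompletableFuture.<Integer>supplyAsync(() -> {
            if (fail) {
                throw new IllegalStateException("simulated remote failure");
            }
            return 42;
        }).exceptionally(error -> -1).join();
    }
}
```

Here `FutureSketch.combinedLength("abc", "de")` returns 5, and `FutureSketch.withFallback(true)` returns the fallback value -1.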

Socket Programming, Java, Tomcat 6, Scaling

I'm pretty new to web programming and I'm currently developing a web back end for a mobile application. Currently I have the users log in using servlet interactions, and once they have full access to the application I need to open a socket connection so that I can provide server pushes. Now the problem I'm running into is how people handle thousands of concurrent socket connections. I've run into people talking about thread pools, which seem pretty easy to implement, and NIO. Is there some framework that I can work with to ensure my servers are handling at least 20-30k concurrent connections? I could also forget TCP connections and go for long polling, but from my understanding TCP is the best option resource-wise.
@Steve - I'm looking at the former: one ServerSocket with thousands of connections.
I would look into clustering the web end immediately and use that as your primary scaling mechanism. 30k connections is quite a lot and you don't have much room for growth before you hit a server limit of some kind. If the I/O itself isn't onerous I would just use lots of threads and servers with lots of horsepower and memory. Get it working that way so you can ship, and have a fallback plan to switch to multiplexed NIO if performance or scaling becomes a problem, but be warned that it's a radical overhaul and about ten times as complex to program as java.net.
After several years' consideration I am more and more wondering whether NIO to economize on threads is really worth it: it adds several new problems of its own such as a need for push parsing; synchronization issues with the selector if there are worker threads that need to change the registration state of channels; lots of ways to get the code wrong; and the fact that the scheduling overhead moves out of the OS into your application, where you only have linear set-iterator data structures to deal with it unless you engage in yet another level of complexity.
It's worth remembering that select() was invented for Unix to allow economizing on processes, which are expensive. Threads are pretty cheap really, and provide a very simple programming model with built-in context for handling a single connection. NIO barely manages this at all except via disciplined use of selection key attachments, much less naturally.
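To show what push parsing and selection key attachments look like in practice, here is a minimal sketch of a single-threaded selector loop. To keep it runnable without a network, it multiplexes two in-process `java.nio.channels.Pipe` objects in place of socket channels. That substitution is an assumption of the example, and the `AttachmentSketch` and `LineParser` names are hypothetical. The point is that the per-connection state a blocking thread would keep on its stack has to live in an object attached to the key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class AttachmentSketch {
    // Per-connection context, stored as the selection key's attachment.
    static final class LineParser {
        final String name;
        final StringBuilder partial = new StringBuilder();

        LineParser(String name) { this.name = name; }

        // Push parsing: take whatever bytes arrived, emit only complete lines.
        void feed(ByteBuffer data, List<String> out) {
            while (data.hasRemaining()) {
                char c = (char) data.get(); // ASCII only, for brevity
                if (c == '\n') {
                    out.add(name + ":" + partial);
                    partial.setLength(0);
                } else {
                    partial.append(c);
                }
            }
        }
    }

    static List<String> demo() {
        List<String> lines = new ArrayList<>();
        try {
            Pipe a = Pipe.open();
            Pipe b = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel aIn = a.source(); Pipe.SinkChannel aOut = a.sink();
                 Pipe.SourceChannel bIn = b.source(); Pipe.SinkChannel bOut = b.sink()) {
                register(selector, aIn, "a");
                register(selector, bIn, "b");
                // Data arrives split at arbitrary points, as it would on a socket.
                write(aOut, "hel");
                write(bOut, "wor");
                poll(selector, lines);
                write(aOut, "lo\n");
                write(bOut, "ld\n");
                for (int i = 0; i < 10 && lines.size() < 2; i++) {
                    poll(selector, lines);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(lines); // selector order is unspecified; sort for a stable result
        return lines;
    }

    private static void register(Selector selector, Pipe.SourceChannel in, String name)
            throws IOException {
        in.configureBlocking(false);
        in.register(selector, SelectionKey.OP_READ, new LineParser(name));
    }

    private static void write(Pipe.SinkChannel out, String text) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.US_ASCII));
        while (buf.hasRemaining()) {
            out.write(buf);
        }
    }

    // One pass of the event loop: read from every ready channel and feed its parser.
    private static void poll(Selector selector, List<String> out) throws IOException {
        selector.select(1000);
        ByteBuffer buf = ByteBuffer.allocate(64);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            buf.clear();
            if (((ReadableByteChannel) key.channel()).read(buf) < 0) {
                key.cancel(); // end of stream
                continue;
            }
            buf.flip();
            ((LineParser) key.attachment()).feed(buf, out);
        }
    }
}
```

Calling `AttachmentSketch.demo()` yields the two reassembled lines `a:hello` and `b:world`. Compare the bookkeeping in `LineParser` with a blocking `readLine()` loop in a dedicated thread, which needs none of it.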
