I am looking to optimize a microservice architecture that currently uses HTTP/REST for internal node-to-node communication.
One option is implementing backpressure capability into the services, (eg) by integrating something like Quasar into the stack. This would no doubt improve things. But I see a couple challenges. One is, the async client threads are transient (in memory) and on client failure (crash), these retry threads will be lost. The second, in theory, if a target server is down for some time, the client could eventually reach OOM attempting retry because threads are ultimately limited, even Quasar Fibers.
I know it's a little paranoid, but I'm wondering if a queue-based alternative would be more advantageous at very large scale.
It would still work asynchronously like Quasar/fibers, except a) the queue is centrally managed and off the client JVM, and b) the queue can be durable, so that in the event client and or target servers go down, no in flight messages are lost.
The downside to queue of course is that there are more hops and it slows down the system. But I'm thinking there is probably a sweet spot where Quasar ROI peaks and a centralized and durable queue becomes more critical to scale and HA.
My question is:
Has this tradeoff been discussed? Are there any papers on using a
centralized external queue / router approach for intraservice
communication.
TL;DR; I just realized I could probably phrase this question as:
"When is it appropriate to use Message Bus based intraservice
communication as opposed to direct HTTP within a microservice
architecture."
I've seen three general protocol design patterns with microservices architectures, when running at scale:
Message bus architecture, using a central broker such as ActiveMQ or Apache Qpid.
"Resilient" HTTP, where some additional logic is built on HTTP to make it more resilient. Typical approaches here are Hystrix (Java), or SmartStack/Baker St (smart proxy).
Point-to-point asynchronous messaging using something like NSQ, ZMQ, or Qpid Proton.
By far the most common design pattern is #2, with a little bit of #1 mixed in when a queue is desirable.
In theory, #3 offers the best of both worlds (resiliency AND scale AND performance) but the technologies are all somewhat immature. It turns out that with #2 you can get really very far (e.g., Netflix uses Hystrix everywhere).
To answer your question directly, I'd say that #1 is very rarely used as an exclusive design pattern because it creates a single bottleneck for your entire system. #1 is common for a subset of the system. For most people, I'd recommend #2 today.
Related
In a project of mine, I decided to use Vertx for the HTTP APIs, given its proven performance record. Then, because the application does use event queues internally I started wondering if I should use Vertx event bus and verticles, instead of using my usual ArrayBlockingQueue. I am still quite new to Vertx so I don't know how suitable it could be. I've experience with Akka and Actors and those would fit the bill very well, but I'm not sure if Vertx event bus is designed to scale to 100k events per second?
I work with Vert.x since version 3 and have done some projects with it (It's my main stack, since a couple of years). I did never run into a situation where the event bus was the limiting factor. The event bus is designed to handle such an amount of event and even much more. As #injecteer mentioned, the limiting factor is basically the hardware and how many events will be processed depends on what do you with them and how you scale your code.
Vert.x follows consequently a non-blocking programming model and you should follow it as well ... never be blocking. Vert.x has the concept of loose coupling, that's solved with portioning of the code with "verticles" https://vertx.io/docs/vertx-core/java/#_verticles. You can deploy/start multiple instances of those verticles (your pieces of code). A further base concept is event loop threads (default count of cores * 2).
Each deployed verticle instance will run on a specific event loop thread, and ALL registered handlers (event bus, http server, etc.) are get called on this specific event loop thread at any time. This way you are able to scale your code in a "per thread" fashion, according to your needs. Events over the event bus are distributed with round robin between the verticle instance (and the handlers within the verticles) ... btw handlers of http requests are also distributed with round robin.
Clustered mode is a bit different. How do you (de)serialize dtos (Json, Protobuf, etc.) can become a significant difference in terms of performance. A clustered event bus has TCP sockets between all nodes, means events are sent point-to-point. The cluster manager (Hazelcast is the default) on the other hand defines to which node an event should get send to (round robin on cluster level), but events are NOT sent over the cluster manager. E.g. the cluster manager knows which node has consumers registered on the event bus (on which address).
Since Vert.x 4 milestone 5 the cluster manager SPI provides an entry point where you can implement your own alternative to round robin, e.g. load specific distribution etc.
There are some basic concepts like event loop threads, non-blocking programming, and verticles (which is not mandatory but recommended). If / when those concepts are clear you get a very flexible base for near any kind of application. I personally love it and also did never see any other framework/technology that reaches near a comparable performance (with a proper scaling that fit the load).
I benchmarked Vert.x event bus (using pure Vert.x for pub and sub) and found it to max out at around 100K msg/s / CPU (using high-end Xeon CPU). Interestingly the performance was comparable to Vert.x's WebSockets implementation so I agree it's not the bottleneck if you do:
WS -> Event Bus
But if you do 10 hops on the Event Bus then it could be the bottleneck.
I observed the performance of the LMAX Disrupter to be much higher but once you introduce I/O then the I/O become the bottleneck with Disrupter. The problem with disrupter is that you can't use it with Netty.
From my understanding all libraries running in a single JVM would have comparable performance levels and are limited by your hardware and settings.
So, local event-bus would perform as good as any other local tech.
The things start getting interesting, if you scale up your system across different JVMs and/or different machines. This is where Vert.x EB shines, as you don't have to change the code of your verticles!
You replace the local EB with clustered one, which is a matter of adding dependencies and configuring the cluster, but still no original code for event-bus operations has to be changed. The other way around also works just fine, if you want to squeese several verticles into the same JVM.
Clustering of the EB of course has it's price, but it's performance has to do rather with underlying clustering technologies like Hazelcast (default) or Infinispan than with Vert.x itself.
I just work with an new project backed by RabbitMQ, and there are multiple consumer instances created listening to the same queue when the application starts. Howerver they shares the same connections with different channels.
The messages from the queue are massive(millions messages for one single producing behavior ) so I guess the very first code author is trying to do something to make consuming faster.
I am trying to find some posts discussing on this but I can't find a very certain answer.
What I get so far is:
Each channel will have a separate dispatch thread
The operation commands on the same channel is serialized even though they are called in multiple thread
So
creating multiple consumers thus multiple channels will have multiple dispatch threads, but I don't think it provided a better performance to message dispatching since the dispatch should far from enough with one single thread.
The operation of ack will can be paralized in different channels, I am not quite sure this will give any better performances.
Since more channels consume more system resources I wonder is this practice good?
There seem to be a few things going on here, so let's try to look at this scenario from a holistic perspective.
For starters, it sounds like the original designer of this code understood some basics about RabbitMQ (or learned a few things by trial and error), but may have had trouble putting all the pieces together- hopefully I can help.
RabbitMQ connections are, in reality, AMQP-over-TCP connections (and thus are somewhere around the session layer of the OSI model). TCP connections are supposed to be opened up and used until some sort of network interruption or application shutdown closes them (and for this reason, AMQP has trouble with firewalls and other smart network devices). Using a single TCP connection for message processing activities for a single logical process is a good idea, as creating and destroying TCP connections is usually an expensive process for the computer, which leads to
RabbitMQ channels are used to multiplex communication streams in the AMQP-Over-TCP connection (and are defined in the AMQP Protocol Spec). All they do is specify an integer value (I can't remember the number of bytes, but it doesn't matter anyway) used to preface the subsequent command or response on a TCP connection. Most AMQP operations are channel-specific. For the purposes of higher-level operations, channels are treated similar to connections, as they are application-level constructs.
Now, where I think the question starts to go off the rails a bit is here:
The messages from the queue are massive(millions messages for one
single producing behavior ) so I guess the very first code author is
trying to do something to make consuming faster.
A fundamental assumption about a system which uses queues is that messages are consumed at approximately the same rate that they are produced. Queues exist to buffer uneven producing activities. The mathematics and statistics of how queues work are quite interesting, and assuming the production of messages is done in response to some real-world stimulus, your system is virtually guaranteed to behave in a predictable manner. Therefore, your design goal is to ensure that there are enough consumers to process the messages that are produced, and to respond to changing conditions as needed. Your goal should not be to "speed up" the consumers (unless they have some specific issue), but rather to have enough consumers to process the total load.
Further, the average number of items in the queue at any time should approach zero. It is usually a good idea to have overcapacity so that you don't wind up with an unstable situation where messages start accumulating in the queue (and the queue ends up looking like the Stack Overflow Close Vote Queue).
And that brings us to an attempt to answer your fundamental question, which seems to deal with threading and possibly detailed implementation of the Java client, which I will readily admit I have not used (I'm a .NET guy).
Here are some design guidelines for your software:
Ensure that a single thread uses no more than one channel.
Use one TCP connection per logical consuming process.
Balance the number of logical processes on a single physical machine such that resource contention is not a problem (you don't want to starve your consumers of computer resources).
Try to use BASIC.GET as opposed to a push-based consumer. Use of consumers is difficult in practice, and there is no performance benefit at the protocol level over a BASIC.GET. Note I do not know if the Java library has implemented these differently such that it does cause a performance difference- stranger things have been known to happen.
If you do use consumers, make sure pre-fetch is set to 0 (disabled) and that AutoAck is set to false if reliable processing is important (most applications require reliable processing). Along with this, make sure you are acknowledging messages upon completion of processing!
Periodically reboot your consuming threads, channels, and processors - or do a BASIC.Recover. There are degrees of randomness that will result in unacknowledged messages accumulating over time, and this will deal with it.
Again, if you prefer to use consumers, generally speaking to share consumers across channels is a bad idea. Each consumer should get its own channel.
I need to create a relatively simple Java tcp/ip server and I'm having a little trouble determining if I should use something like Netty or just stick with simple ServerSocket and InputStream/OutputStream.
We really just need to listen for a request, then pass the new client Socket off to some processing code in a new thread. That thread will terminate once the processing is complete and the response is sent.
I like the idea of pipelines, decoders, etc. in Netty, but for such a simple scenario it doesn't seem worth the added up front development time. It seems like a bit overkill for our initial requirements, but I'm a little nervous that there are lots of things I'm not considering. What, if any, are the benefits of Netty for such simple requirements? What am I failing to consider?
The main advantage of Netty over simply reading from and writing to sockets using streams is that Netty supports non-blocking, asynchronous I/O (using Java's NIO API); when you use streams to read and write from sockets (and you start a new thread for each connected accepted from a ServerSocket) you are using blocking, synchronous I/O.
The Netty approach scales much better, which is important if your system needs to be able to handle many (thousands) of connections at the same time. If your system does not need to scale to many simultaneous connections, it might not be worth the trouble to use a framework like Netty.
Some more background information: Threads are relatively expensive resources in an operating system. Each thread needs memory for the stack (which can be for example 2 MB in size). When you create thousands of threads, this is going to cost a lot of memory; also, operating systems have limits on the number of threads that can be created. So you don't want to start a new thread for each accepted connection. The idea of asynchronous I/O is to decouple the threads from the connections (no one-to-one relation). There can be many more connections than threads, and whenever some event happens on one of the connections (for example, data is received), a thread from a thread pool is temporarily used to handle the event.
I think that the benefits of using netty are not immediate but actually come later when requirements change and maintenance becomes more complex for your project. Netty brings built in understanding of the HTTP protocol so that you can provide simple RESTful web services. Also you have the option of utilizing asynchronous request processing that netty provides as a framework so that you can potentially get better performance and service several orders of magnitude more concurrent requests.
First, write the logic of your service so that it's independent of your communication layer.
As Victor Sorokin said, there's a learning advantage to doing it yourself. So it ought to be worthwhile to write it with sockets. It will involve less effort to get started, and if it works well enough then you're off to the races.
If you find that you need more scalability/robustness later, you can switch to netty. Just write a new netty layer that communicates for your service logic layer and swap them out.
I'm new to enterprise Java development, although I'm sure this question equally applies to any language or platform, such as .NET.
For the first time ever now I'm dealing with message queues, and I'm very intrigued by them. (specifically, we're using ActiveMQ). My tech lead wants ActiveMQ queues to be the front-runners to all of our databases and internal web services; thus instead of a database query being fired off from the client and going directly to the database, it gets queued up first.
My question is this: are queues the way to go with every major processing component? Do best practices dictate putting them in front of system components that usually get hit with large amounts of requests? Are there situations where queues should not be used?
Thanks for any insight here!
Here are some examples where a message queue might be useful.
Limited resources
Lets say you have a large number of users making requests to a service. If the service can only handle a small number of requests concurrently then you might use a queue as a buffer.
Service decoupling
A key enterprise integration concept is decoupling of systems in for eg a workflow. Instead of having systems talk directly to each other, they asyncronously post messages to queues. The integration component then routes and delivers the message to the appropriate system.
Message replay
In the above example queues can also provide reliable delivery and processing of requests. If one component of the workflow breaks, others are unaffected and can still operate and post messages to the broken component. When the broken component recovers it can process all the queued up messages.
They key concepts here are load throttling, loose coupling, reliability and async operation.
As to whether they are the way to go for every major component, I would say no, this is not an automatic choice, you must consider each component individually.
Queues are indeed a very powerful and useful tool, but like every tool you should only use it for the job it is intended.
IMO they are not the away to go for every major processing component.
As a general rule I would use a queue where the requesting resource does not require an immediate, synchronous response. I would not use a queue where the timeliness and order of processing is vital.
Where asynchronous processing is allowable and you wish to regulate the amount of traffic to a service then a queue may be the way to go.
See #Qwerky's answer too, he (or she) makes some good points.
Please check out this:
http://code.google.com/p/disruptor/
Not only queues are there in the wild to solve those kind of problems.
Answering your question. Queues in this case will introduce asynchronous behavior in access to your databases. In this case it is more a question of can you afford such a great impact on your legacy systems. It just might be too much of change to push everything to the queues. Please describe what is the general purpose of your systems. Then it will be easer to answer your question fully.
Message queues are fundamentally an asynchronous communication system. In this case, it means that aside from the queue that links the sender and receiver, both sender and receiver operate independently; a receiver of a message does not (and should not) require interaction with the sender. Similarly, a sender of a message does not (and should not) require interaction with receiver.
If the sender needs to wait for the result of processing a message, then a message queue may not be a good solution, as this would force an asynchronous system to be synchronous, against the core design. It might be possible to construct a synchronous communication system on top of a message queue, but the fundamental asynchronous nature of a message queue would make this conversion awkward.
I am designing a application where many clients connect to a central server. This server keeps these connections, sending keep-alives every half-hour. The server has a embedded HTTP server, which provides a interface to the client connections (ex. http://server/isClientConnected?id=id). I was wondering what is the best way to go about this. My current implementation is in Java, and I just have a Map with ID's as the key, but a thread is started for each connection, and I don't know if this is really the best way to do this. Any pointers would be appreciated.
Thanks,
Isaac Waller
Use the java.nio package, as described on this page: Building Highly Scalable Servers with Java NIO. Also read this page very carefully: Architecture of a Highly Scalable NIO-Based Server.
Personally I'd not bother with the NIO internals and use a framework like Apache MINA or xSocket. NIO is complicated and easy to get wrong in very obscure ways. If you want it to "just work", then use a framework.
With a single thread per connection you can usually scale up to about 10,000 connections on a single machine. For a Windows 32 machine, you probably will hit a limit around 1,000 connections.
To avoid this, you can either change the design of your program, or you can scale out (horizontal). You have to weight the cost of development with the cost of hardware.
The single thread per user, with a single continuous connection is usually the easiest programming model. I would stick with this model until you reach the limits of your current hardware. At that point, I would decide to either change the code, or add more hardware.
If the clients will be connected for long periods of time, allocating a thread per client can be problematic. Each thread on the server requires a certain amount of resources (memory for the stack, for example).
You could use Jetty Continuations to handle the client request with fewer threads by using asynchronous servlets.
Read more about the the Reactor pattern. There is an implementation for that in Java (it uses channels instead of thread for client).
It is easy to implement and very efficient.