Limit on the number of queues and verticles in Vert.x


We are now in the process of refactoring our messaging application written in Vert.x. The application processes incoming messages from users. Initially, it was implemented so that there is a single verticle instance that listens to a single queue in the event bus and processes all the incoming messages.
What we are thinking of doing is refactoring it so that it works somewhat like the actor model: we deploy a verticle instance for each active user and make it listen to a user-specific queue. This way the verticle instance can maintain user-specific state, and parallelizing the message processing becomes much easier.
The issue, however, is that this would lead to a huge number of deployed verticles (30k-50k in parallel) and a huge number of queues in the event bus. We would also need to manage the verticles manually (undeploying unused verticles and deploying new ones when a message arrives from a new user).
The question is: is this actor-style architecture a good fit for Vert.x, and can it handle a large number of deployed verticles and event bus queues at the same time?

There's one major correction to be made here: the EventBus is a single queue. So you won't have a "huge number of queues"; there will be only one. You'll have a huge number of addresses on a single queue.
But is this number really so huge? Can a HashMap of 50K elements be considered huge? Probably not, at least in terms of keys. Note that this applies only to Vert.x in non-clustered mode. Clustered Vert.x is different (it should still work, though).
Having those verticles is another matter. Each verticle is a separate object, and if you plan to store some data in it, it will be even larger. But if you can afford machines with decent RAM (16GB+), it should work just fine.
What does concern me about this solution, though, is that you plan to deploy verticles on demand and then undeploy them. That does incur delays, so your users will experience degraded performance for the first message they send.
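To make the HashMap comparison concrete, here is a plain-Java sketch that registers one handler per user address, 50,000 in total, and dispatches with a single hash lookup. It is only a model of the answer's analogy, not Vert.x's actual event bus code; `AddressRegistry` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of "many addresses, one bus": a map from address to handler.
// This is NOT Vert.x's implementation; it only illustrates the cost of 50K keys.
final class AddressRegistry {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    void register(String address, Consumer<String> handler) {
        handlers.put(address, handler);
    }

    void unregister(String address) {
        handlers.remove(address);
    }

    // Delivers the message to the handler for this address; returns false if none is registered.
    boolean send(String address, String message) {
        Consumer<String> handler = handlers.get(address);
        if (handler == null) {
            return false;
        }
        handler.accept(message);
        return true;
    }

    int size() {
        return handlers.size();
    }
}
```

Registering `"user.0"` through `"user.49999"` and then calling `send("user.42", ...)` costs one hash lookup regardless of how many addresses exist; the memory cost is what each handler (or verticle) carries, which is the answer's next point.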

What you call "actor-style" does not mean that you have to spin up a new verticle instance per user. If you do so, you are going to get a system with 98% redundancy.
It's absolutely enough to register an event-bus address for each user and use some sort of persistent storage to keep track of them. Such storage can be any DB for long-term persistence, a cluster-wide SharedMap for short-term storage, or a combination of both.
Perhaps you don't even need an address-per-user scheme. Such a scheme is nice when users are constantly connected to your system via some sort of EventBusBridge. If this is not the case, you can register a single event-bus address for all users and process messages based on their payload.
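A plain-Java sketch of the single-address alternative, assuming each message carries a `userId` in its payload. The class, the `UserState` fields, and the idle-eviction policy are hypothetical. Per-user state is a map entry created on the user's first message and dropped after an idle timeout, which is far cheaper than deploying and undeploying a verticle per user.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-address handler: one consumer for all users,
// routing on the userId carried in each message payload.
// Assumes it is called from one thread, as a verticle's handlers are.
final class SingleAddressProcessor {
    // Minimal per-user state; a real application would keep more here.
    static final class UserState {
        long messagesSeen;
        long lastSeenMillis;
    }

    private final Map<String, UserState> states = new HashMap<>();

    // Called for every message arriving on the shared address.
    void handle(String userId, String body, long nowMillis) {
        UserState state = states.computeIfAbsent(userId, id -> new UserState());
        state.messagesSeen++;
        state.lastSeenMillis = nowMillis;
        // ... user-specific processing of `body` would go here ...
    }

    // Drops state for users idle longer than ttlMillis; returns how many were evicted.
    int evictIdle(long nowMillis, long ttlMillis) {
        int before = states.size();
        states.values().removeIf(s -> nowMillis - s.lastSeenMillis > ttlMillis);
        return before - states.size();
    }

    long messagesSeen(String userId) {
        UserState s = states.get(userId);
        return s == null ? 0 : s.messagesSeen;
    }

    int activeUsers() {
        return states.size();
    }
}
```

`evictIdle` could be driven by a periodic timer; if the state must survive restarts, it would be backed by the persistent storage the answer mentions.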

Related

Is the performance of Vertx event bus as good or better than ConcurrentQueues in Java?

In a project of mine, I decided to use Vert.x for the HTTP APIs, given its proven performance record. Then, because the application uses event queues internally, I started wondering whether I should use the Vert.x event bus and verticles instead of my usual ArrayBlockingQueue. I am still quite new to Vert.x, so I don't know how suitable it would be. I have experience with Akka and actors, and those would fit the bill very well, but I'm not sure whether the Vert.x event bus is designed to scale to 100k events per second.

I have worked with Vert.x since version 3 and have done several projects with it (it has been my main stack for a couple of years). I have never run into a situation where the event bus was the limiting factor. The event bus is designed to handle that many events and much more. As @injecteer mentioned, the limiting factor is basically the hardware, and how many events can be processed depends on what you do with them and how you scale your code.
Vert.x consistently follows a non-blocking programming model, and you should follow it as well: never block. Vert.x embraces loose coupling, which is achieved by partitioning the code into "verticles" https://vertx.io/docs/vertx-core/java/#_verticles. You can deploy/start multiple instances of those verticles (your pieces of code). Another basic concept is the event loop thread (by default, the number of event loop threads is 2 * the number of cores).
Each deployed verticle instance runs on a specific event loop thread, and ALL of its registered handlers (event bus, HTTP server, etc.) are always called on that same event loop thread. This way you can scale your code in a "per thread" fashion, according to your needs. Events on the event bus are distributed round robin between the verticle instances (and the handlers within the verticles); by the way, HTTP request handlers are also distributed round robin.
Clustered mode is a bit different. How you (de)serialize DTOs (JSON, Protobuf, etc.) can make a significant difference in performance. A clustered event bus has TCP connections between the nodes, meaning events are sent point-to-point. The cluster manager (Hazelcast is the default), on the other hand, determines which node an event should be sent to (round robin at the cluster level), but events are NOT sent through the cluster manager. For example, the cluster manager knows which nodes have consumers registered on the event bus (and on which addresses).
Since Vert.x 4 milestone 5, the cluster manager SPI provides an entry point where you can implement your own alternative to round robin, e.g. load-based distribution.
There are a few basic concepts, like event loop threads, non-blocking programming, and verticles (which are not mandatory but recommended). Once those concepts are clear, you get a very flexible base for nearly any kind of application. I personally love it, and I have never seen any other framework or technology that comes close to comparable performance (with proper scaling that fits the load).
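The round-robin distribution described above can be modeled in plain Java. This is a hypothetical sketch, not Vert.x source: point-to-point sends rotate through the handlers registered on one address, while a publish (Vert.x's `publish`, as opposed to `send`) reaches all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of delivery to several handlers registered on one address.
final class RoundRobinAddress {
    private final List<Consumer<String>> handlers = new ArrayList<>();
    private int next; // index of the handler that receives the next point-to-point send

    void addHandler(Consumer<String> handler) {
        handlers.add(handler);
    }

    // Point-to-point: exactly one handler gets the message, in rotation.
    void send(String message) {
        if (handlers.isEmpty()) {
            throw new IllegalStateException("no handlers registered");
        }
        handlers.get(next).accept(message);
        next = (next + 1) % handlers.size();
    }

    // Publish/subscribe: every handler gets the message.
    void publish(String message) {
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
    }
}
```

With two handlers A and B, three sends go to A, B, A; this is why deploying more verticle instances spreads point-to-point load across event loop threads.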

I benchmarked the Vert.x event bus (using pure Vert.x for pub and sub) and found that it maxed out at around 100K msg/s per CPU (on a high-end Xeon CPU). Interestingly, the performance was comparable to Vert.x's WebSockets implementation, so I agree it's not the bottleneck if you do:
WS -> Event Bus
But if you do 10 hops on the Event Bus then it could be the bottleneck.
I observed the performance of the LMAX Disruptor to be much higher, but once you introduce I/O, the I/O becomes the bottleneck with the Disruptor. The problem with the Disruptor is that you can't use it with Netty.

From my understanding, all libraries running in a single JVM would have comparable performance and are limited by your hardware and settings.
So the local event bus would perform as well as any other local technology.
Things start getting interesting if you scale your system across different JVMs and/or different machines. This is where the Vert.x EB shines, as you don't have to change the code of your verticles!
You replace the local EB with a clustered one, which is a matter of adding dependencies and configuring the cluster, but none of the original event-bus code has to change. The other way around also works just fine, if you want to squeeze several verticles into the same JVM.
Clustering the EB of course has its price, but its performance depends more on the underlying clustering technology, like Hazelcast (default) or Infinispan, than on Vert.x itself.

Is it OK to use multiple sessions and connections with JMS (ActiveMQ)?

I must handle about 100 JMS queues in a point-to-point messaging architecture. Every queue has a consumer, so I will have 100 consumer threads to handle them. Is that OK?

1) ActiveMQ supports what you are asking for (I suggest using a connection pool).
2) You should confirm that your server configuration can cope when QPS is high.

Instead of 100 queues, you could use a single queue and provide JMS message properties, having each consumer filter just the messages it wants.
What this does is give you more options in architecture and deployment. You could have a single process consume multiple types of messages. Depending on your scaling issues, you could have multiple instances of a single consumer spread out among processes/servers/whatever.
You could also have one consumer for all 100 logical queues, reading the property and figuring out where to hand off the message internally, again, depending on whatever design issues you're running into.
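A plain-Java sketch of the one-consumer variant: a single consumer reads a message property and hands the message off internally. The `msgType` property name, `PropertyMessage`, and `PropertyRouter` are hypothetical, and this code does not use the JMS API. In real JMS, the per-consumer filtering variant is done with message selectors, e.g. `session.createConsumer(queue, "msgType = 'orders'")`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical message: a body plus string properties, mimicking JMS message properties.
final class PropertyMessage {
    final Map<String, String> properties = new HashMap<>();
    final String body;

    PropertyMessage(String msgType, String body) {
        properties.put("msgType", msgType);
        this.body = body;
    }
}

// One consumer for all logical "queues": routes each message on its msgType property.
final class PropertyRouter {
    private final Map<String, Consumer<String>> routes = new HashMap<>();
    private int unrouted;

    void route(String msgType, Consumer<String> handler) {
        routes.put(msgType, handler);
    }

    void onMessage(PropertyMessage message) {
        Consumer<String> handler = routes.get(message.properties.get("msgType"));
        if (handler == null) {
            unrouted++; // no handler for this logical queue; log or dead-letter it instead
            return;
        }
        handler.accept(message.body);
    }

    int unroutedCount() {
        return unrouted;
    }
}
```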
Overall, messaging is so lightweight that it takes a significant volume of messages, or significantly large individual messages, to really hurt things. I've got an ActiveMQ app that, upon restart, might have to process 10K-20K messages, and it completes in seconds. They are fairly small messages, but it's still very possible (and my experience with other MQs is similar: as long as your processing is not overwhelmingly heavy, you should be able to keep up).

How to properly throttle web requests to external systems?

My Java web application pulls some data from external systems (JSON over HTTP), both live, whenever the users of my application request it, and in batch (nightly updates for cases where no user has requested it). The data changes, so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know, and which likely change depending on system load (e.g., at peak times 10 requests per second from one IP address, at off-peak times 100 requests per second from one IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I attempt each request 5 times with a 2000ms delay between attempts, giving up if every attempt returns an error. This is not optimal, as at peak times nearly all requests sometimes fail; I could avoid making some of these requests and perhaps get at least some to succeed instead.
My goals are to have a somewhat simple, reliable design, and enough flexibility so that I could both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and to auto-adjust the interval with which I call them (individually per system) so that it is optimal both on off-peak and peak hours.
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
1) Since I already use RabbitMQ for batch processing, I could just introduce a queue to which the web processes would send the requests they have for external systems; worker processes would then read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concern is that it isn't a very simple solution, and I'm unsure how to handle low peak-hour throughput leaving the web processes waiting for a long while. This also turns my RabbitMQ into a critical single point of failure; if it dies, the whole system stops (as opposed to just the nightly batch processes not running, which is less critical). I suppose RPC is the correct RabbitMQ usage pattern, but I'm not sure. Edit: I've posted a related question, How to properly implement RabbitMQ RPC from Java servlet web container?, on how to implement this.
2) Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy (link), or other proxy software into the mix (as reverse proxies?), and have them take care of the throttling through some configuration magic. The pro is that I don't have to change any code. The con is that it adds more technology, one I've never used before, so the chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling based on external server load, to prioritize live requests over batch requests, or to get statistics on how the throttling is doing. Also, most documentation and examples will likely be about throttling incoming requests, not outgoing ones.
3) Write a pure-Java solution (e.g., a leaky bucket implementation). It would be simple in the sense that it is "just code", but the devil is in the details; debugging deadlocks, starvation, and race conditions isn't always fun.
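For option 3, a minimal sketch of a per-system pacer, assuming one instance per external system. It spaces calls at least one interval apart (leaky-bucket style) and adapts that interval: it doubles on an HTTP 503 or timeout and shrinks by a fixed step on success, within a floor and a ceiling. The class, its parameters, and the backoff constants are hypothetical; `intervalMillis()` is the kind of metric you could export to see how each external system is responding.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical adaptive pacer for ONE external system (create one per system).
// Calls are spaced at least intervalNanos apart; the interval doubles when the
// system signals overload (503/timeout) and shrinks by a fixed step on success.
final class AdaptiveThrottle {
    private final long minIntervalNanos;
    private final long maxIntervalNanos;
    private final long stepNanos;
    private long intervalNanos;
    private long nextSlotNanos;
    private boolean started;

    AdaptiveThrottle(long initialMillis, long minMillis, long maxMillis, long stepMillis) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(initialMillis);
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minMillis);
        this.maxIntervalNanos = TimeUnit.MILLISECONDS.toNanos(maxMillis);
        this.stepNanos = TimeUnit.MILLISECONDS.toNanos(stepMillis);
    }

    // Reserves the next call slot and returns how long the caller must wait (ns).
    // Taking "now" as a parameter keeps the scheduling logic deterministic and testable.
    synchronized long reserve(long nowNanos) {
        // Compare via subtraction so System.nanoTime() overflow cannot flip the result.
        long start = (started && nextSlotNanos - nowNanos > 0) ? nextSlotNanos : nowNanos;
        started = true;
        nextSlotNanos = start + intervalNanos;
        return start - nowNanos;
    }

    // Blocks the calling thread until its reserved slot; call before each request.
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }

    // After a successful response: speed up gradually.
    synchronized void onSuccess() {
        intervalNanos = Math.max(minIntervalNanos, intervalNanos - stepNanos);
    }

    // After HTTP 503 or a timeout: back off quickly.
    synchronized void onThrottled() {
        intervalNanos = Math.min(maxIntervalNanos, intervalNanos * 2);
    }

    synchronized long intervalMillis() {
        return TimeUnit.NANOSECONDS.toMillis(intervalNanos);
    }
}
```

Only `reserve` holds the lock; the sleep happens outside it, so concurrent callers each get a distinct slot instead of serializing on the monitor, which sidesteps the deadlock worry for this simple case.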
What am I missing here?
Which is the best solution in this case?
P.S. A somewhat related question: what's the proper approach to logging all the external system invocations, so that statistics are collected on how often I invoke them and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1 with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to a poison queue, though; it would fill up too quickly, and in most cases there is no need to re-process these failed requests, as the user has already sadly moved on.
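If you do end up rolling your own, the in-memory aggregate is small. A minimal thread-safe sketch of the `logExternalSystemInvocation` idea from the question, using `ConcurrentHashMap` and `LongAdder`; the class name, signature, and tracked fields are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory aggregate of external-system calls, safe to call from many threads.
final class InvocationStats {
    static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder successes = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final Map<String, Counters> bySystem = new ConcurrentHashMap<>();

    void logExternalSystemInvocation(String externalSystemName, boolean wasSuccessful, long elapsedTimeMillis) {
        Counters c = bySystem.computeIfAbsent(externalSystemName, name -> new Counters());
        c.calls.increment();
        if (wasSuccessful) {
            c.successes.increment();
        }
        c.totalMillis.add(elapsedTimeMillis);
    }

    // Fraction of successful calls, or NaN if nothing was recorded yet.
    // Under concurrent writes the sums are not one atomic snapshot, which is fine for monitoring.
    double successRate(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null) return Double.NaN;
        long calls = c.calls.sum();
        return calls == 0 ? Double.NaN : (double) c.successes.sum() / calls;
    }

    double averageMillis(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null) return Double.NaN;
        long calls = c.calls.sum();
        return calls == 0 ? Double.NaN : (double) c.totalMillis.sum() / calls;
    }
}
```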

Perhaps this open source system can help you a little: http://code.google.com/p/valogato/

Potential pitfalls in using a JMS queue?

I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as XML in an HTTP POST. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy-duty processing of this data will need to occur before it's inserted into a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be redirected to other external URLs.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as its own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.

Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS implementations use a transactional datastore, like a relational database, as their back end. That means that rather than writing directly to a datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and the stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging are going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or of a set of paid developers whose sole job is to respond to customers and make sure the JMS implementation is solid). On the third point, the same ends up being true of your backend datastore. Use JMS; you'll save yourself trouble in the long run.

If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe, including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign a huge queue depth (hundreds of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.

You will want to implement a poison message queue: this is where messages that cannot be processed after some number of retries end up.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
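A minimal in-memory sketch of the retry-then-poison-queue flow, including the halt-on-poison behavior for strictly ordered processing. The class, `maxAttempts`, the `haltOnPoison` switch, and the use of `ArrayDeque` in place of broker queues are hypothetical; with a real broker, redelivery limits and a dead-letter queue are usually configuration (ActiveMQ, for example, provides both).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical consumer: retries each message up to maxAttempts, then parks it in a poison queue.
final class RetryingConsumer {
    private final Deque<String> inbound = new ArrayDeque<>();
    private final Deque<String> poison = new ArrayDeque<>();
    private final Predicate<String> processor; // returns true if processing succeeded
    private final int maxAttempts;
    private final boolean haltOnPoison; // true when strict ordering matters more than throughput

    RetryingConsumer(Predicate<String> processor, int maxAttempts, boolean haltOnPoison) {
        this.processor = processor;
        this.maxAttempts = maxAttempts;
        this.haltOnPoison = haltOnPoison;
    }

    void enqueue(String message) {
        inbound.addLast(message);
    }

    // Processes messages in arrival order; returns how many succeeded.
    int drain() {
        int processed = 0;
        while (!inbound.isEmpty()) {
            String message = inbound.pollFirst();
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = processor.test(message);
            }
            if (ok) {
                processed++;
            } else {
                poison.addLast(message);
                if (haltOnPoison) {
                    break; // later messages stay queued until the poison message is fixed
                }
            }
        }
        return processed;
    }

    // After fixing the cause, puts parked messages back at the FRONT, preserving their order.
    void resendPoison() {
        while (!poison.isEmpty()) {
            inbound.addFirst(poison.pollLast());
        }
    }

    int pendingSize() { return inbound.size(); }

    int poisonSize() { return poison.size(); }
}
```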
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self-regulating.
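The watchdog can be sketched with heartbeats: each consumer records when it last responded, and a periodic check restarts any consumer that has been silent too long. `Watchdog` and its `restart` callback are hypothetical; how you actually ping consumers and start new instances depends on your broker and deployment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical heartbeat-based watchdog for message consumers.
final class Watchdog {
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final Consumer<String> restart; // hypothetical hook that starts a replacement instance

    Watchdog(long timeoutMillis, Consumer<String> restart) {
        this.timeoutMillis = timeoutMillis;
        this.restart = restart;
    }

    // Called when a consumer answers a ping (or reports in on its own).
    void heartbeat(String consumerId, long nowMillis) {
        lastHeartbeat.put(consumerId, nowMillis);
    }

    // Run periodically (e.g. from a ScheduledExecutorService); returns the consumers restarted.
    List<String> check(long nowMillis) {
        List<String> silent = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                silent.add(e.getKey());
            }
        }
        for (String id : silent) {
            restart.accept(id);
            lastHeartbeat.put(id, nowMillis); // give the replacement a full timeout to report in
        }
        return silent;
    }
}
```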

Critically efficient server

I am developing a client-server application for financial alerts, where the client can set a value as the alert for a chosen financial instrument, and when this value is reached, the monitoring server will somehow alert the client (email, SMS ... not important). The server will monitor updates that come from a data generator program. Now, the server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates arriving every 1-2 seconds). I've written servers before, but never with such demanding performance requirements, and I'm simply afraid that a basic approach (like before) will just not do. So how should I design the server? What kind of data structures are best suited? What about multithreading? In general, what should I do (and what should I not do) to squeeze every drop of performance out of it?
Thanks.

I've worked on servers like this before. They were all written in C (or fairly simple C++), and they had even higher performance requirements: handling 20K updates per second (all updates from most major stock exchanges).
We focused on not copying memory around. We were very careful about which STL classes we used. As for updates, each financial instrument was an object, and any clients that wanted to hear about that instrument would subscribe to it (i.e., get added to a list).
The server was multi-threaded, but not heavily so: maybe one thread handling incoming updates, one handling outgoing client updates, and one handling client subscribe/release notifications (I don't remember that part well; I just remember it had fewer threads than I would have expected, but not just one).
EDIT: Oh, and before I forget, the number of financial transactions is growing at an exponential rate. That 20K/sec server was just barely keeping up, and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
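The subscribe-per-instrument layout translates directly to Java (the servers described were C/C++; `InstrumentBook` and its methods are hypothetical). Each instrument keeps its own subscriber list, so an update touches only the clients interested in that instrument instead of scanning every client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical per-instrument subscription table.
final class InstrumentBook {
    // instrument symbol -> clients subscribed to it
    private final Map<String, List<String>> subscribers = new HashMap<>();

    void subscribe(String symbol, String clientId) {
        subscribers.computeIfAbsent(symbol, s -> new ArrayList<>()).add(clientId);
    }

    void unsubscribe(String symbol, String clientId) {
        List<String> clients = subscribers.get(symbol);
        if (clients != null) {
            clients.remove(clientId);
            if (clients.isEmpty()) {
                subscribers.remove(symbol);
            }
        }
    }

    // Fans one update out to that instrument's subscribers only; returns how many were notified.
    int onUpdate(String symbol, double price, BiConsumer<String, Double> notify) {
        List<String> clients = subscribers.getOrDefault(symbol, List.of());
        for (String client : clients) {
            notify.accept(client, price);
        }
        return clients.size();
    }
}
```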

You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.

I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.

Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
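One simple way for each server to decide whether it needs to handle a particular message is to partition by instrument: every server sees every event but keeps only those whose key hashes to its own partition. A hypothetical sketch; the hash-mod-N scheme is an assumption, and adding or removing servers reshuffles most partitions, which is what consistent hashing is designed to reduce.

```java
// Hypothetical partition filter: server `index` of `count` handles only its share of instruments.
final class PartitionFilter {
    private final int index;
    private final int count;

    PartitionFilter(int index, int count) {
        if (count <= 0 || index < 0 || index >= count) {
            throw new IllegalArgumentException("need 0 <= index < count");
        }
        this.index = index;
        this.count = count;
    }

    // Same instrument -> same server, so per-instrument ordering and state stay on one node.
    // floorMod keeps the result non-negative even for negative hash codes.
    boolean handles(String instrument) {
        return Math.floorMod(instrument.hashCode(), count) == index;
    }
}
```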
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.

Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. The London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .NET servers.
You will have a lot of trouble getting this type of performance from a single Java or .NET server; really, you need to consider C or C++. A clustered architecture is much more error-prone to build and deploy, and it is harder to guarantee uptime from.
For really high volumes you need to think in terms of using asynchronous I/O for networking (i.e. poll(), select() and asynchronous writes or their Windows equivalents), possibly with a pool of worker threads. Read up about the C10K problem for some more insight into this.
There is a very mature C++ framework called ACE (ADAPTIVE Communication Environment), which was designed for high-volume server applications in telecommunications. It may be a good foundation for your product: it supports quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.

One Thread for the receiving of instrument updates which will process the update and put it in a BlockingQueue.
One Thread to take the update from the BlockingQueue and hand it off to the process that handles that instrument, or set of instruments. This process will need to serialize the events to an instrument so the customer will not receive notices out-of-order.
This process (Thread) will need to iterate through the list of customers registered to receive notifications and create a list of customers who should be notified based on their criteria. The process should then hand off the list to another process that will notify the customers of the change.
The notification process should iterate through the list and send each notification event to another process that handles how the customer wants to be notified (email, etc.).
One of the problems will be synchronizing access to the list of 100,000 customers and the criteria they want monitored.
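A compact sketch of the first stages described above: the receiving thread puts updates on a `BlockingQueue`, and a single dispatcher thread takes them in order and checks that instrument's alerts, so updates for an instrument are never evaluated out of order. The `Update`/`Alert` classes, the "price at or above threshold" rule, and the poison-pill shutdown are assumptions; notifications are collected in a second queue instead of being emailed, and alerts fire on every qualifying update rather than being disarmed. Concurrent collections address the synchronization problem for registrations that happen while the dispatcher reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical receive -> dispatch -> notify pipeline built on BlockingQueues.
final class AlertPipeline {
    static final class Update {
        final String symbol;
        final double price;
        Update(String symbol, double price) { this.symbol = symbol; this.price = price; }
    }

    static final class Alert {
        final String customer;
        final double threshold;
        Alert(String customer, double threshold) { this.customer = customer; this.threshold = threshold; }
    }

    private static final Update POISON = new Update("", Double.NaN); // tells the dispatcher to stop

    private final BlockingQueue<Update> updates = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> notifications = new LinkedBlockingQueue<>();
    // Concurrent collections: customers may register while the dispatcher is reading.
    private final Map<String, List<Alert>> alertsBySymbol = new ConcurrentHashMap<>();
    private final Thread dispatcher = new Thread(this::dispatchLoop, "alert-dispatcher");

    void register(String symbol, String customer, double threshold) {
        alertsBySymbol.computeIfAbsent(symbol, s -> new CopyOnWriteArrayList<>())
                .add(new Alert(customer, threshold));
    }

    void start() { dispatcher.start(); }

    // Called by the thread receiving updates from the data generator.
    void onUpdate(String symbol, double price) throws InterruptedException {
        updates.put(new Update(symbol, price));
    }

    // Enqueues the poison pill after all pending updates and waits for the dispatcher to finish.
    void stop() throws InterruptedException {
        updates.put(POISON);
        dispatcher.join();
    }

    List<String> drainNotifications() {
        List<String> out = new ArrayList<>();
        notifications.drainTo(out);
        return out;
    }

    private void dispatchLoop() {
        try {
            while (true) {
                Update u = updates.take(); // single consumer => per-instrument order is preserved
                if (u == POISON) return;
                for (Alert a : alertsBySymbol.getOrDefault(u.symbol, List.of())) {
                    if (u.price >= a.threshold) {
                        notifications.put(a.customer + ":" + u.symbol + "@" + u.price);
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```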

You should try to find a way to organize the alerts as a tree and be able to quickly decide what alerts can be triggered by an update.
For example, let's assume that the alert is the level of a certain indicator. Said indicator can have a range of [0, n]. I would group the clients who want to be notified of the level of said indicator in a sort of binary tree. That way you can scale it properly (you can actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
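A sketch of the sorted-index idea using `java.util.TreeMap` (a red-black tree) in place of a hand-built binary tree, assuming alerts of the form "notify me when the indicator reaches at least X". `ThresholdIndex` and its methods are hypothetical. Finding the triggered alerts costs a logarithmic search plus the size of the result, rather than a scan over all clients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical index of "alert when level >= threshold" subscriptions.
final class ThresholdIndex {
    // threshold -> clients waiting for the level to reach that threshold
    private final NavigableMap<Double, List<String>> byThreshold = new TreeMap<>();

    void add(double threshold, String clientId) {
        byThreshold.computeIfAbsent(threshold, t -> new ArrayList<>()).add(clientId);
    }

    // Clients whose threshold is <= level, in ascending threshold order.
    List<String> triggeredBy(double level) {
        List<String> hit = new ArrayList<>();
        for (List<String> clients : byThreshold.headMap(level, true).values()) {
            hit.addAll(clients);
        }
        return hit;
    }

    // Same as triggeredBy, but also removes the alerts so each fires only once.
    List<String> fire(double level) {
        NavigableMap<Double, List<String>> head = byThreshold.headMap(level, true);
        List<String> hit = new ArrayList<>();
        for (List<String> clients : head.values()) {
            hit.addAll(clients);
        }
        head.clear(); // headMap is a view, so this removes the entries from byThreshold
        return hit;
    }
}
```

A mirror-image index (using `tailMap`) would serve "notify me when the level drops to X" alerts.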

The Apache MINA network application framework, along with Apache Camel for message routing, is probably a good starting point. The Kilim message-passing framework also looks very promising.
