How to properly throttle web requests to external systems?

My Java web application pulls some data from external systems (JSON over HTTP) both live whenever the users of my application request it and batch (nightly updates for cases where no user has requested it). The data changes so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know, and which likely change depending on system load (e.g., at peak times 10 requests per second from one IP address, at off-peak times 100 requests per second from one IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I am attempting the request 5 times with 2000ms delay between each, giving up if an error is received each time. This is not optimal as sometimes at peak-times nearly all requests fail; I could avoid making these requests and perhaps get at least some to succeed instead.
My goals are to have a somewhat simple, reliable design, and enough flexibility so that I could both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and to auto-adjust the interval with which I call them (individually per system) so that it is optimal both on off-peak and peak hours.
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
1. Since I already have RabbitMQ used for batch processing, I could just introduce a queue to which the web processes would send the requests they have for external systems, then worker processes would read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concern is that it isn't a very simple solution, and how to manage peak-hour throughput being low and thus the web processes waiting for a long while. Also this converts my RabbitMQ into a critical single point of failure; if it dies the whole system stops (as opposed to the nightly batch processes just not running any more, which is less critical). I suppose RPC is the correct pattern of RabbitMQ usage, but I'm not sure. Edit - I've posted a related question, How to properly implement RabbitMQ RPC from Java servlet web container?, on how to implement this.
2. Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy or other proxy software to the mix (as reverse proxies?), and have them take care of the throttling through some configuration magic. The pro is that I don't have to make code changes. The con is that it is more technology used, and one I've not used before, so chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling depending on external server load, or prioritizing live requests over batch requests, or get statistics of how the throttling is doing. Also, most documentation and examples will likely be on throttling incoming requests, not outgoing.
3. Do a pure-Java solution (e.g., leaky bucket implementation). Would be simple in the sense that it is "just code", but the devil is in the details; debugging all the deadlocks, starvations and race conditions isn't always fun.
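To make the pure-Java option more concrete, here is a minimal sketch of a per-system throttle that adjusts itself. It is not a leaky bucket: it simply spaces calls by an interval that doubles after a failure (timeout or HTTP 503) and shrinks by roughly 10% after a success. The class name AdaptiveThrottle, its methods and the 10%/doubling factors are all assumptions made for illustration, not part of any library.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical per-system throttle: spaces calls by an interval that adapts to failures. */
final class AdaptiveThrottle {
    private final long minIntervalMs;
    private final long maxIntervalMs;
    private long intervalMs;          // current spacing between two requests
    private long nextAllowedAtMs = 0; // earliest time the next request may start

    AdaptiveThrottle(long minIntervalMs, long maxIntervalMs) {
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.intervalMs = minIntervalMs;
    }

    /** Reserves the next free time slot and blocks until it is reached. */
    void acquire() throws InterruptedException {
        long waitMs;
        synchronized (this) {
            long now = System.currentTimeMillis();
            long slot = Math.max(now, nextAllowedAtMs);
            nextAllowedAtMs = slot + intervalMs;
            waitMs = slot - now;
        }
        if (waitMs > 0) {
            TimeUnit.MILLISECONDS.sleep(waitMs); // sleep outside the lock
        }
    }

    /** Call after a successful response: shorten the interval by about 10% (at least 1 ms). */
    synchronized void onSuccess() {
        intervalMs = Math.max(minIntervalMs, intervalMs - Math.max(1, intervalMs / 10));
    }

    /** Call after a timeout or HTTP 503: double the interval, up to the maximum. */
    synchronized void onFailure() {
        intervalMs = Math.min(maxIntervalMs, intervalMs * 2);
    }

    /** Exposed so the current spacing can be reported as a metric. */
    synchronized long currentIntervalMs() {
        return intervalMs;
    }
}
```

One instance per external system would be shared by all callers: call acquire() before each request, then onSuccess() or onFailure() depending on the outcome. Prioritizing live requests over batch requests would need an extra layer on top of this.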
What am I missing here?
Which is the best solution in this case?
P.S. Somewhat related question - what's the proper approach to log all the external system invocations, so that statistics are collected as to how often I invoke them, and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1. with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to a poison queue, though: it would fill up too quickly, and in most cases there is no need to re-process these failed requests, as the user has already sadly moved on.
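For the logging question, a hand-rolled collector can be quite small. The sketch below keeps per-system counters in memory and derives the success rate and average latency from them. The class name InvocationStats and the two query methods are hypothetical; only the logExternalSystemInvocation call shape comes from the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory statistics for calls to external systems. */
final class InvocationStats {
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder successes = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final Map<String, Counters> bySystem = new ConcurrentHashMap<>();

    /** Record one invocation; safe to call from many threads. */
    void logExternalSystemInvocation(String externalSystemName, boolean wasSuccessful, long elapsedTimeMillis) {
        Counters c = bySystem.computeIfAbsent(externalSystemName, k -> new Counters());
        c.calls.increment();
        if (wasSuccessful) {
            c.successes.increment();
        }
        c.totalMillis.add(elapsedTimeMillis);
    }

    /** Fraction of successful calls, or 0.0 if the system was never called. */
    double successRate(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) return 0.0;
        return (double) c.successes.sum() / c.calls.sum();
    }

    /** Mean elapsed time per call in milliseconds, or 0.0 if never called. */
    double averageMillis(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) return 0.0;
        return (double) c.totalMillis.sum() / c.calls.sum();
    }
}
```

These counters only ever grow; to see how a system behaves at peak versus off-peak hours, they would have to be reset or bucketed per time window.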

Perhaps this open source system can help you a little: http://code.google.com/p/valogato/

Related

direct logging on elasticsearch vs using logstash and filebeat

I'm using a Spring Boot back-end to provide some restful API and need to log all of my request-response logs into ElasticSearch.
Which of the following two methods has better performance?
Using Spring Boot ResponseBodyAdvice to log every request and response that is sent to the client directly to ElasticSearch.
Log every request and response into a log file and using filebeat and/or logstash to send them to ElasticSearch.

First off, I assume that you have a distributed application; otherwise just write your stuff to a log file and that's it.
I also assume that you have quite a lot of logs to manage; if you're planning to log only a couple of messages an hour, it doesn't really matter which way you go - both will do the job.
Technically both ways can be implemented, although for the first path I would suggest a different approach; at least, I did something similar ~5 years ago in one of my projects:
Create a custom log appender that throws everything into some queue (for async processing), and have something like Apache Flume take the messages from that queue and write them to the DB of your choice in a transactional manner, with batch support, "all-or-nothing" semantics, etc.
This approach solves issues that might appear in the "first" option that you've presented, while some other issues will be left unsolved.
If I compare the first and the second option that you've presented,
I think you're better off with filebeat / logstash, or even both, to write to ES. Here is why:
When you log in the advice, you will "eat" the resources of your JVM: memory and CPU to maintain an ES connection pool, and a thread pool for doing the actual logging (otherwise the business flow might slow down because of logging the requests to ES).
In addition, you won't be able to write "in batch" into Elasticsearch without custom code, and will instead have to create an "insert" per log message, which might be wasteful.
One more "technicality" - what happens if the application gets restarted for some reason, will you be able to write all the logs prior to the restart if everything gets logged in the advice?
Yet another issue - what happens if you want to "rotate" the indexes in ES, namely create an index with a TTL and produce a new index every day?
filebeat/logstash potentially can solve all these issues, however they might require a more complicated setup.
Besides, obviously you'll have more services to deploy and maintain:
logstash is way heavier than filebeat from the resource consumption standpoint, and usually you should parse the log message (usually with grok filter) in logstash.
filebeat is much more "humble" when it comes to resource consumption. If you have many instances to log (really distributed logging, which I've assumed you have anyway), consider putting a filebeat service (a DaemonSet if you have k8s) on each node from which you'll gather the logs, so that a single filebeat process can handle different instances, and then deploy a cluster of logstash instances on separate machines so that they do the heavy log-crunching all the time and stream the data to ES.
How does logstash/filebeat help?
Off the top of my head:
It will run at its own pace, so even if the process goes down, the messages produced by this process will still be written to ES
It can even survive short outages of ES itself, I think (should check that)
It can handle different processes written in different technologies; what if tomorrow you want to gather logs from the database server, for example, which doesn't use Spring and isn't written in Java at all
It can handle indices rotation, batch writing internally so you'll end up with effective ES management that otherwise you had to write by yourself.
What are the drawbacks of the logstash/filebeat approach?
Again, off the top of my head, not a full list:
Well, much more data will go through the network all in all
If you use "LogEvent" you don't need to parse the string, so this conversion is redundant.
As for performance implications - it basically depends on what you measure, what exactly your application looks like and what hardware you have, so I'm afraid I won't be able to give you a clear answer on that. You should measure in your concrete case and come up with the way that works better for you.

Not sure if you can expect a clear answer to that. It really depends on your infrastructure and used hardware.
And do you mean by performance the performance of your spring boot backend application or performance in terms of how long it takes for your logs to arrive at ElasticSearch?
I just assume the first one.
When sending the logs directly to ElasticSearch your bottleneck will be the used network and while logging request and responses into a log file first, your bottleneck will probably be the used harddisk and possible max I/O operations.
Normally I would say that sending the logs directly to ElasticSearch via the network should be the faster option when you are operating inside your company network, because writing to a disk is always quite slow in comparison. But if you are using fast SSDs the effect should be negligible. And if you need to send your network packets to a different location/country, this can also change quickly.
So in summary:
If you have a fast network connection to your ElasticSearch and HDDs/slower SSDs the performance might be better using the network.
If your ElasticSearch is not at your location and you can use fast SSD, writing the logs into a file first might be the faster option.
But in the end you may have to try out both approaches, implement some timers and check for yourself.

We are using both solutions. The first approach has less complexity.
We choose the second approach when we don't want to touch the code and have too many instances of the app.
About performance: writing directly to Elasticsearch gives you better performance because you are not occupying disk I/O. But consider what happens when the connection between your app and the Elasticsearch server is dropped: after some retry attempts you will have lost logs.
Using rsyslog and logstash is more reliable for big clusters.

Decision to go for distributed application?

I have a legacy product in the financial domain, running on Tomcat 6. We get millions of requests, about 10k requests per hour. I am wondering, at a high level, whether I should go for a distributed application where my MVC component is on one system and the service/DAO layer is on another box (I could use Spring Remoting or EJB).
The reason I am planning to go in this direction is so that the load is distributed and performance improves. It would also make the application scalable.
I only see the positive side of it and cannot figure out what the negative aspects could be.
Could an expert help with the criteria I should consider when deciding on a distributed model, and its pros and cons? I also tried googling for statistics on how much load a given web server (Tomcat in my case) can handle efficiently on given hardware (16 GB RAM, Windows 7, processor).
Yes, I am going to do a POC where I will measure performance with and without the distributed model, but high-level input would be highly appreciated.

It is impossible to answer this question without more details - how long does it take to reply to one request on the current server? How many resources are allocated for one request?
Having 10k requests per hour means ~3 requests per second. If performing the necessary operations and replying to a request takes ~300ms on 1 CPU, one simple machine is totally fine. This is simple math, and it doesn't always work. I guess you still have peaks within those 10k requests per hour and they aren't evenly distributed.
If we assume one reply can take up to 1 second, then you can handle as many replies per second as your system has CPUs (given that the CPU is the bottleneck). If the CPU isn't the bottleneck for your application server, there's probably something wrong. You should set up the database(s) on a different machine and only perform computation tasks on the application server machine.
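The back-of-the-envelope estimate above can be written out as a small calculation. This is only a sketch of the arithmetic; the class and method names are made up for illustration, and it uses averages, so it says nothing about peaks.

```java
/** Hypothetical helper for the rough capacity estimate in this answer. */
final class CapacityEstimate {
    /** Average arrival rate in requests per second. */
    static double requestsPerSecond(long requestsPerHour) {
        return requestsPerHour / 3600.0;
    }

    /** Average number of CPUs kept busy: arrival rate times CPU seconds per request. */
    static double busyCpus(long requestsPerHour, double cpuSecondsPerRequest) {
        return requestsPerSecond(requestsPerHour) * cpuSecondsPerRequest;
    }
}
```

With the numbers from the question, requestsPerSecond(10_000) is about 2.8, and busyCpus(10_000, 0.3) is about 0.83, i.e. on average less than one CPU is busy when each request costs 300 ms of CPU time.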
Especially in the financial sector with a legacy software, I wouldn't try splitting a running product. How old is the current server? I believe that a new Server should be cheaper than rewriting an application. Unless you expect 50-100k requests per hour very soon, I don't think, splitting up such small parts makes sense.
Instead - run it on an up to date server hardware, split application server and data storage and you should be fine.

I am wondering at high level if I should go for a distributed application where my mvc component is on one system and service/dao on another box (can use spring remote/EJB).
I'm not sure what you mean by "system" in this context, but if it means that you are planning to run your application on two servers, one dedicated to presentation and the other dedicated to the business layer, keep in mind that a simpler approach (and probably a more suitable one for your app) is to build a co-located architecture.
Basically, the idea is to replicate your app on several servers (at least two) and put a load balancer in front of them that routes the incoming requests among the available servers.
All servers share the same database instance. This will give you horizontal scalability and will also improve the availability of your system.
I only see the positive side of it but somehow not able to figure out what can be the negative aspect of it?
Distributing your business logic will probably involve refactoring your application code; if the system is working well now, you will add some bugs for sure.
The necessary remote calls will add latency, and the fact that you execute your business logic on several servers doesn't resolve performance problems in the presentation tier.
In Expert One-on-One J2EE Development Without EJB (p. 65), you can find a good discussion of why not to distribute your business logic.

Prevent client from overloading server?

I have a Java servlet that's getting overloaded by client requests during peak hours. Some clients spawn concurrent requests. Sometimes the number of requests per second is just too great.
Should I implement application logic to restrict the number of requests a client can send per second? Does this need to be done at the application level?

The two most common ways of handling this are to turn away requests when the server is too busy, or handle each request slower.
Turning away requests is easy; just run a fixed number of instances. The OS may or may not queue up a few connection requests, but in general the users will simply fail to connect. A more graceful way of doing it is to have the service return an error code indicating the client should try again later.
Handling requests slower is a bit more work, because it requires separating the servlet handling the requests from the class doing the work in a different thread. You can have a larger number of servlets than worker bees. When a request comes in it accepts it, waits for a worker bee, grabs it and uses it, frees it, then returns the results.
The two can communicate through one of the classes in java.util.concurrent, like LinkedBlockingQueue or ThreadPoolExecutor. If you want to get really fancy, you can use something like a PriorityBlockingQueue to serve some customers before others.
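As a rough illustration of combining both ideas, the sketch below uses a ThreadPoolExecutor with a fixed number of workers and a bounded queue: requests wait for a free worker while there is room in the queue, and are turned away once it is full. The class name BoundedWorkers and the trySubmit method are assumptions for this example; the servlet wiring is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper: a fixed pool of "worker bees" behind a bounded waiting queue. */
final class BoundedWorkers {
    private final ThreadPoolExecutor pool;

    BoundedWorkers(int workers, int queueCapacity) {
        pool = new ThreadPoolExecutor(
                workers, workers,                      // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // requests waiting for a worker
                new ThreadPoolExecutor.AbortPolicy());   // reject when the queue is full
    }

    /**
     * Hands the work to the pool. Returns null when all workers are busy and the
     * queue is full, so the caller can tell the client to try again later.
     */
    <T> Future<T> trySubmit(Callable<T> work) {
        try {
            return pool.submit(work);
        } catch (RejectedExecutionException e) {
            return null;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A servlet would call trySubmit, block on the returned Future to get the result, and answer with an error such as HTTP 503 when it gets null. Swapping the queue for a PriorityBlockingQueue would allow some customers to be served first, but that queue is unbounded, so the rejection behaviour shown here would no longer apply.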
Me, I would throw more hardware at it like Anon said ;)

Some solid answers here. I think more hardware is the way to go. Having too many clients or traffic is usually a good problem to have.
However, if you absolutely must throttle clients, there are some options.
The most scalable solutions that I've seen revolve around a distributed caching system, like Memcached, and using integers to keep counts.
Figure out a rate at which your system can handle traffic. Either overall, or per client. Then put a count into memcached that represents that rate. Each time you get a request, decrement the value. Periodically increment the counter to allow more traffic through.
For example, if you can handle 10 requests/second, put a count of 50 in every 5 seconds, up to a maximum of 50. That way you aren't refilling it all the time, but you can also handle a bit of bursting limited to a window. You will need to experiment to find a good refresh rate. The key for this counter can either be a global key, or based on user id if you need to restrict that way.
The nice thing about this system is that it works across an entire cluster AND the mechanism that refills the counters need not be in one of your current servers. You can dedicate a separate process for it. The loaded servers only need to check it and decrement it.
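The counting scheme can be sketched in a single process as follows. Here an AtomicInteger stands in for the shared Memcached counter, which is an assumption made to keep the example self-contained; in a cluster the same decrement and refill operations would go to the cache instead. The class name RefillCounter is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical request budget: decremented per request, topped up periodically. */
final class RefillCounter {
    private final int max;
    private final AtomicInteger remaining;

    RefillCounter(int max) {
        this.max = max;
        this.remaining = new AtomicInteger(max);
    }

    /** Called on every request: takes one unit, or returns false if the budget is used up. */
    boolean tryAcquire() {
        while (true) {
            int current = remaining.get();
            if (current <= 0) {
                return false;
            }
            if (remaining.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    /** Called by the separate refill process: adds units, never exceeding the maximum. */
    void refill(int amount) {
        remaining.updateAndGet(current -> Math.min(max, current + amount));
    }
}
```

For the 10 requests/second example, this would be new RefillCounter(50) with refill(50) invoked every 5 seconds, for instance from a ScheduledExecutorService; one counter per user id gives per-client limits.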
All that being said, I'd investigate other options first. Throttling your customers is usually a good way to annoy them. Most probably NOT the best idea. :)

I'm assuming you're not in a position to increase capacity (either via hardware or software), and you really just need to limit the externally-imposed load on your server.
Dealing with this from within your application should be avoided unless you have very special needs that are not met by the existing solutions out there, which operate at HTTP server level. A lot of thought has gone into this problem, so it's worth looking at existing solutions rather than implementing one yourself.
If you're using Tomcat, you can configure the maximum number of simultaneous requests allowed via the maxThreads and acceptCount settings. Read the introduction at http://tomcat.apache.org/tomcat-6.0-doc/config/http.html for more info on these.
For more advanced controls (like per-user restrictions), if you're proxying through Apache, you can use a variety of modules to help deal with the situation. A few modules to google for are limitipconn, mod_bw, and mod_cband. These are quite a bit harder to set up and understand than the basic controls that are probably offered by your appserver, so you may just want to stick with those.

Potential pitfalls in using a JMS queue?

I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as xml in an http post. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy duty processing of this data will need to occur before it's inserted to a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be re-directed to other external urls.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as its own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.

Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS implementations use a transactional datastore like a relational database as their back end. That means that rather than writing directly to whatever datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and the stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging is going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers whose sole job it is to respond to customers and make sure the JMS implementation is solid). On the third point, well, the same ends up being true of your backend datastore. Use JMS; you'll save yourself trouble in the long run.

If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign huge queue depth (100's of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.

You will be wanting to implement a poison message queue in your implementation - this is the place that messages unable to be processed after some number of retries will arrive.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self regulating.
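The poison-queue idea from this answer can be illustrated without any broker. The sketch below retries a handler a fixed number of times and parks the message in a separate queue when every attempt fails. It is a single-threaded, in-memory stand-in with hypothetical names (RetryingConsumer, poisonQueue); a real JMS setup would rely on the broker's own redelivery and dead-letter handling.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical consumer that moves repeatedly failing messages to a poison queue. */
final class RetryingConsumer<M> {
    private final int maxAttempts;
    private final Queue<M> poison = new ArrayDeque<>();

    RetryingConsumer(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Tries the handler up to maxAttempts times; on total failure the message is parked. */
    void process(M message, Consumer<M> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // fall through and retry; a real system would log the failure here
            }
        }
        poison.add(message);
    }

    /** Messages that could not be processed, for inspection and manual re-sending. */
    Queue<M> poisonQueue() {
        return poison;
    }
}
```

If messages must be processed strictly in order, the caller would stop consuming as soon as poisonQueue() becomes non-empty, which is the halting behaviour described above.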

Critically efficient server

I am developing a client-server application for financial alerts, where the client can set a value as the alert for a chosen financial instrument, and when this value is reached the monitoring server will alert the client somehow (email, SMS; the channel is not important). The server will monitor updates that come from a data generator program. The server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates coming at 1,2 seconds). I've written servers before, but never with such performance requirements, and I'm afraid that a basic approach (like before) will just not do. So how should I design the server? What kind of data structures are best suited? What about multithreading? In general, what should I do (and not do) to squeeze every drop of performance out of it?
Thanks.

I've worked on servers like this before. They were all written in C (or fairly simple C++). But they were even higher performance -- handling 20K updates per second (all updates from most major stock exchanges).
We would focus on not copying memory around. We were very careful in what STL classes we used. As far as updates, each financial instrument would be an object, and any clients that wanted to hear about that instrument would subscribe to it (ie get added to a list).
The server was multi-threaded, but not heavily so -- maybe a thread handling incoming updates, one handling outgoing client updates, one handling client subscribe/release notifications (don't remember that part -- just remember it had fewer threads than I would have expected, but not just one).
EDIT: Oh, and before I forget, the number of financial transactions happening is growing at an exponential rate. That 20K/sec server was just barely keeping up and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
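The servers described in this answer were written in C and C++, but the subscription structure itself (one entry per instrument, each holding the list of interested clients) carries over to other languages. A minimal Java sketch, with the hypothetical names InstrumentFeed, subscribe and publish, could look like this:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleConsumer;

/** Hypothetical registry: each instrument keeps its own list of subscribed clients. */
final class InstrumentFeed {
    private final Map<String, List<DoubleConsumer>> subscribers = new ConcurrentHashMap<>();

    /** Adds a client callback to the list of the given instrument. */
    void subscribe(String instrument, DoubleConsumer client) {
        subscribers.computeIfAbsent(instrument, k -> new CopyOnWriteArrayList<>()).add(client);
    }

    /** Delivers an update only to the clients subscribed to this instrument. */
    void publish(String instrument, double price) {
        List<DoubleConsumer> clients = subscribers.get(instrument);
        if (clients == null) {
            return; // nobody is interested in this instrument
        }
        for (DoubleConsumer client : clients) {
            client.accept(price);
        }
    }
}
```

CopyOnWriteArrayList favours frequent updates over frequent subscription changes, which matches a feed where prices change far more often than subscriptions do; that trade-off is an assumption about the workload.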

You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.

I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.

Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.

Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .Net servers.
You will have a lot of trouble getting this type of performance from a single Java or .Net server - really you need to consider C or C++. A clustered architecture is much more error prone to build and deploy and harder to guarantee uptime from.
For really high volumes you need to think in terms of using asynchronous I/O for networking (i.e. poll(), select() and asynchronous writes or their Windows equivalents), possibly with a pool of worker threads. Read up about the C10K problem for some more insight into this.
There is a very mature C++ framework called ACE (Adaptive Communication Environment) which was designed for high volume server applications in telecommunications. It may be a good foundation for your product - it has support for quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.

One Thread for the receiving of instrument updates which will process the update and put it in a BlockingQueue.
One Thread to take the update from the BlockingQueue and hand it off to the process that handles that instrument, or set of instruments. This process will need to serialize the events to an instrument so the customer will not receive notices out-of-order.
This process (Thread) will need to iterate through the list of customers registered to receive notification and create a list of customers who should be notified based on their criteria. The process should then hand off the list to another process that will notify the customer of the change.
The notification process should iterate through the list and send each notification event to another process that handles how the customer wants to be notified (email, etc.).
One of the problems, with 100,000 customers, will be synchronizing access to the list of customers and their criteria to be monitored.

You should try to find a way to organize the alerts as a tree and be able to quickly decide what alerts can be triggered by an update.
For example, let's assume that the alert is the level of a certain indicator. Said indicator can have a range of 0 to n. I would group the clients who want to be notified of the level of said indicator in a sort of binary tree. That way you can scale it properly (you can actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
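One way to sketch this in Java is with a TreeMap (a red-black tree) keyed by alert level, so that the alerts crossed by an update are found with a range query instead of a scan over all clients. The class name AlertIndex, the crossedBy method and the choice of which range ends are inclusive are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical index of alerts for one indicator, ordered by alert level. */
final class AlertIndex {
    private final NavigableMap<Double, List<String>> byLevel = new TreeMap<>();

    /** Registers a client that wants to be alerted when the indicator reaches the level. */
    void addAlert(String clientId, double level) {
        byLevel.computeIfAbsent(level, k -> new ArrayList<>()).add(clientId);
    }

    /**
     * Clients whose level was crossed by a move from oldValue to newValue:
     * levels in (oldValue, newValue] for a rise, [newValue, oldValue) for a fall.
     */
    List<String> crossedBy(double oldValue, double newValue) {
        NavigableMap<Double, List<String>> hit = oldValue <= newValue
                ? byLevel.subMap(oldValue, false, newValue, true)
                : byLevel.subMap(newValue, true, oldValue, false);
        List<String> result = new ArrayList<>();
        for (List<String> clients : hit.values()) {
            result.addAll(clients); // collected in ascending order of level
        }
        return result;
    }
}
```

Locating the range costs a logarithmic number of comparisons; the remaining work is proportional to the number of alerts actually triggered. This class is not thread-safe, so concurrent updates and registrations would need external synchronization or one index per worker.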

The Apache Mina network application framework, together with Apache Camel for message routing, is probably a good starting point. The Kilim message-passing framework also looks very promising.
