How do I set a limit on the number of concurrent requests in a servlet? - java

I have a servlet which returns a PDF file to the client's web browser.
We do not want to risk the server being overwhelmed when too many requests come in at once.
We would like an application-level (programmatic) way to set a limit on the number of concurrent requests, and to return an error message to the browser when the limit is reached. It needs to be done at the application level because we use different servlet containers in development (Tomcat) and production (WebSphere).
I must emphasize that I want to limit the number of requests, not the number of sessions; a user can send multiple requests to the server within the same session.
Any ideas?
I've thought about using a static counter to keep track of the number of requests, but it would raise the problem of race conditions.

I'd suggest writing a simple servlet Filter. Configure it in your web.xml to apply to the path whose concurrent requests you want to limit. The code would look something like this:
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

public class LimitFilter implements Filter {

    private int limit = 5;
    private int count;
    private final Object lock = new Object();

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        try {
            boolean ok;
            synchronized (lock) {
                ok = count++ < limit;
            }
            if (ok) {
                // let the request through and process as usual
                chain.doFilter(request, response);
            } else {
                // handle limit case, e.g. return status code 429 (Too Many Requests)
                // see https://www.rfc-editor.org/rfc/rfc6585#page-3
                ((HttpServletResponse) response).sendError(429, "Too Many Requests");
            }
        } finally {
            synchronized (lock) {
                count--;
            }
        }
    }
}
Alternatively, you could put this logic directly into your HttpServlet; it's just a bit cleaner and more reusable as a Filter. You might want to make the limit configurable through web.xml rather than hard-coding it.
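For example (a sketch; the init-param name "limit" is just an illustration), the Filter's init method, which the Filter interface requires anyway on servlet API versions before 4.0, could read the value from web.xml:

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // reads an <init-param> named "limit" from the filter declaration in web.xml
        String configured = filterConfig.getInitParameter("limit");
        if (configured != null) {
            limit = Integer.parseInt(configured);
        }
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }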
Ref.:
See the definition of HTTP status code 429.

You can use RateLimiter. See this article for an explanation.
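If this refers to Guava's RateLimiter (an assumption; the original link is not shown here), a minimal sketch would look like the following. Note that it caps the request rate (permits per second) rather than the number of requests in flight:

import com.google.common.util.concurrent.RateLimiter;

// Inside a Filter like the one above: allow roughly 10 requests per second.
private final RateLimiter rateLimiter = RateLimiter.create(10.0);

public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain) throws IOException, ServletException {
    if (rateLimiter.tryAcquire()) {
        // permit available: process as usual
        chain.doFilter(request, response);
    } else {
        ((HttpServletResponse) response).sendError(429, "Too Many Requests");
    }
}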

You might want to have a look at Semaphore.
Semaphores are often used to restrict the number of threads that can access some (physical or logical) resource.
Or, even better, try to solve it with the server settings; that would of course be server-dependent.
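As a rough sketch (not from the original answer), a Semaphore with tryAcquire() inside a Filter would enforce the limit without manual counting:

import java.util.concurrent.Semaphore;

// Inside a Filter: at most 5 requests are allowed through at any one time.
private final Semaphore semaphore = new Semaphore(5);

public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain) throws IOException, ServletException {
    if (semaphore.tryAcquire()) {            // non-blocking: fail fast when the limit is hit
        try {
            chain.doFilter(request, response);
        } finally {
            semaphore.release();             // always give the permit back
        }
    } else {
        ((HttpServletResponse) response).sendError(429, "Too Many Requests");
    }
}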

I've thought about using a static counter to keep track of the number of requests, but it would raise the problem of race conditions.
If you use an AtomicInteger for the counter, you will not have the problem of race conditions.
Another way would be to use the Java Executor framework (introduced in Java 1.5). There you can limit the number of running threads and block new tasks until a thread is free.
But I think the counter would work and be the easiest solution.
Attention: release the counter in a finally block!
// pseudo code
final AtomicInteger counter = new AtomicInteger();
...
while (true) {
    int v = counter.get();
    if (v >= max) return FAILURE;               // limit reached, reject the request
    if (counter.compareAndSet(v, v + 1)) break; // retry on contention
}
try {
    doStuff();
} finally {
    counter.decrementAndGet();
}
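For the Executor idea, a minimal self-contained sketch (the pool size of 5 and queue size of 10 are arbitrary assumptions, and the sleep stands in for the real PDF work) could look like this:

import java.util.concurrent.*;

public class BoundedExecutorSketch {
    // Pool of 5 worker threads with a small waiting queue; submissions beyond
    // that are rejected immediately instead of piling up.
    private static final ExecutorService POOL = new ThreadPoolExecutor(
            5, 5, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(10),
            new ThreadPoolExecutor.AbortPolicy());

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {
            try {
                POOL.submit(() -> {
                    Thread.sleep(1000);   // stand-in for the real request processing
                    return null;
                });
            } catch (RejectedExecutionException e) {
                // in the servlet you would translate this into an error response, e.g. status 429
                System.out.println("Rejected, limit reached");
            }
        }
        POOL.shutdown();
    }
}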

If you are serving static files, it's unlikely that the server will crash. The bottleneck will be network throughput, and it degrades gracefully: when more requests come in, each still gets served, just a little more slowly.
If you set a hard limit on total requests, remember to also set a limit on requests per IP. Otherwise it's easy for one bad actor to issue N requests, deliberately read the responses very slowly, and completely clog your service. This works even if he's on dial-up and your server network has vast throughput.

Related

Tomcat: set maximum connection for particular servlet [duplicate]


Camel: File consumer component "bites off more than it can chew", pipeline dies from out-of-memory error

I have a route defined in Camel that goes something like this: GET request comes in, a file gets created in the file system. File consumer picks it up, fetches data from external web services, and sends the resulting message by POST to other web services.
Simplified code below:
// Update request goes on queue:
from("restlet:http://localhost:9191/update?restletMethod=post")
    .routeId("Update via POST")
    [...some magic that defines a directory and file name based on request headers...]
    .to("file://cameldest/queue?allowNullBody=true&fileExist=Ignore");

// Update gets processed
from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2&sortBy=file:parent;file:modified&preMove=inprogress&delete=true")
    .routeId("Update main route")
    .streamCaching() //otherwise stuff can't be sent to multiple endpoints
    [...enrich message from some web service using http4 component...]
    .multicast()
        .stopOnException()
        .to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML")
    .end();
The three endpoints in the multicast are simply POSTing the resulting message to other web services.
This all works rather well when the queue (i.e. the file directory cameldest) is fairly empty. Files are being created in cameldest/<subdir>, picked up by the file consumer and moved into cameldest/<subdir>/inprogress, and stuff is being sent to the three outgoing POSTs no problem.
However, once the incoming requests pile up to about 300,000 files, progress slows down and eventually the pipeline fails with out-of-memory errors (GC overhead limit exceeded).
By increasing logging I can see that the file consumer's polling basically never runs, because it appears to take responsibility for all files it sees each time, waits for them to finish processing, and only then starts another poll round. Besides (I assume) causing the resource bottleneck, this also interferes with my sorting requirement: once the queue is jammed with thousands of messages waiting to be processed, new messages that would naively be sorted higher up are, if they even still get picked up, stuck waiting behind those that are already "started".
Now, I've tried the maxMessagesPerPoll and eagerMaxMessagesPerPoll options. They seem to alleviate the problem at first, but after a number of poll rounds I still end up with thousands of files in "started" limbo.
The only thing that sort of worked was making the bottleneck of delay and maxMessages... so narrow that the processing on average would finish faster than the file polling cycle.
Clearly, that is not what I want. I would like my pipeline to process files as fast as possible, but not faster. I was expecting the file consumer to wait when the route is busy.
Am I making an obvious mistake?
(I'm running a somewhat older Camel 2.14.0 on a Redhat 7 machine with XFS, if that is part of the problem.)
Try setting maxMessagesPerPoll to a low value on the file endpoint so that each poll picks up at most X files, which also limits the total number of in-flight messages in your Camel application.
You can find more information about that option in the Camel documentation for the file component.
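For example, based on the route from the question (and combined with eagerMaxMessagesPerPoll=false so the limit is applied after sorting rather than during the directory scan):

from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2"
        + "&maxMessagesPerPoll=100&eagerMaxMessagesPerPoll=false"
        + "&sortBy=file:parent;file:modified&preMove=inprogress&delete=true")
    .routeId("Update main route")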
The short answer is that there is no answer: the sortBy option of Camel's file component is simply too memory-inefficient to accommodate my use case:
Uniqueness: I don't want to put a file on queue if it's already there.
Priority: Files flagged as high priority should be processed first.
Performance: Having a few hundred thousands of files, or maybe even a few million, should be no problem.
FIFO: (Bonus) Oldest files (by priority) should be picked up first.
The problem appears to be, if I read the source code and the documentation correctly, that all file details are in memory to perform the sorting, no matter whether the built-in language or a custom pluggable sorter is used. The file component always creates a list of objects containing all details, and that apparently causes an insane amount of garbage collection overhead when polling many files often.
I got my use case to work, mostly, without having to resort to using a database or writing a custom component, using the following steps:
Move from one file consumer on the parent directory cameldest/queue that sorts recursively the files in the subdirectories (cameldest/queue/high/ before cameldest/queue/low/) to two consumers, one for each directory, with no sorting at all.
Set up only the consumer from /cameldest/queue/high/ to process files through my actual business logic.
Set up the consumer from /cameldest/queue/low to simply promote files from "low" to "high" (copying them over, i.e. .to("file://cameldest/queue/high");)
Crucially, in order to only promote from "low" to "high" when high is not busy, attach a route policy to "high" that throttles the other route, i.e. "low" if there are any messages in-flight in "high"
Additionally, I added a ThrottlingInflightRoutePolicy to "high" to prevent it from inflighting too many exchanges at once.
Imagine this like at check-in at the airport, where tourist travellers are invited over into the business class lane if that is empty.
This worked like a charm under load, and even while hundreds of thousands of files were on queue in "low", new messages (files) dropped directly into "high" got processed within seconds.
The only requirement that this solution doesn't cover is ordering: there is no guarantee that older files are picked up first; rather, they are picked up randomly. One could imagine a situation where a steady stream of incoming files results in one particular file X always being unlucky and never being picked up. The chance of that happening, though, is very low.
Possible improvement: currently the threshold for allowing/suspending the promotion of files from "low" to "high" is set to 0 messages in flight in "high". On the one hand, this guarantees that files dropped into "high" will be processed before another promotion from "low" is performed; on the other hand, it leads to a bit of a stop-start pattern, especially in a multi-threaded scenario. Not a real problem though; the performance as-is was impressive.
Source:
My route definitions:
ThrottlingInflightRoutePolicy trp = new ThrottlingInflightRoutePolicy();
trp.setMaxInflightExchanges(50);

SuspendOtherRoutePolicy sorp = new SuspendOtherRoutePolicy("lowPriority");

from("file://cameldest/queue/low?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("lowPriority")
    .log("Copying over to high priority: ${in.headers."+Exchange.FILE_PATH+"}")
    .to("file://cameldest/queue/high");

from("file://cameldest/queue/high?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("highPriority")
    .routePolicy(trp)
    .routePolicy(sorp)
    .threads(20)
    .log("Before: ${in.headers."+Exchange.FILE_PATH+"}")
    .delay(2000) // This is where business logic would happen
    .log("After: ${in.headers."+Exchange.FILE_PATH+"}")
    .stop();
My SuspendOtherRoutePolicy, loosely modelled on ThrottlingInflightRoutePolicy:
public class SuspendOtherRoutePolicy extends RoutePolicySupport implements CamelContextAware {

    private CamelContext camelContext;
    private final Lock lock = new ReentrantLock();
    private String otherRouteId;

    public SuspendOtherRoutePolicy(String otherRouteId) {
        super();
        this.otherRouteId = otherRouteId;
    }

    @Override
    public CamelContext getCamelContext() {
        return camelContext;
    }

    @Override
    public void onStart(Route route) {
        super.onStart(route);
        if (camelContext.getRoute(otherRouteId) == null) {
            throw new IllegalArgumentException("There is no route with the id '" + otherRouteId + "'");
        }
    }

    @Override
    public void setCamelContext(CamelContext context) {
        camelContext = context;
    }

    @Override
    public void onExchangeDone(Route route, Exchange exchange) {
        //log.info("Exchange done on route " + route);
        Route otherRoute = camelContext.getRoute(otherRouteId);
        //log.info("Other route: " + otherRoute);
        throttle(route, otherRoute, exchange);
    }

    protected void throttle(Route route, Route otherRoute, Exchange exchange) {
        // this works the best when this logic is executed when the exchange is done
        Consumer consumer = otherRoute.getConsumer();
        int size = getSize(route, exchange);
        boolean stop = size > 0;
        if (stop) {
            try {
                lock.lock();
                stopConsumer(size, consumer);
            } catch (Exception e) {
                handleException(e);
            } finally {
                lock.unlock();
            }
        }
        // reload size in case of a race condition with too many being invoked at once,
        // so we need to ensure that we read the most current size and start the consumer if we are already too low
        size = getSize(route, exchange);
        boolean start = size == 0;
        if (start) {
            try {
                lock.lock();
                startConsumer(size, consumer);
            } catch (Exception e) {
                handleException(e);
            } finally {
                lock.unlock();
            }
        }
    }

    private int getSize(Route route, Exchange exchange) {
        return exchange.getContext().getInflightRepository().size(route.getId());
    }

    private void startConsumer(int size, Consumer consumer) throws Exception {
        boolean started = super.startConsumer(consumer);
        if (started) {
            log.info("Resuming the other consumer " + consumer);
        }
    }

    private void stopConsumer(int size, Consumer consumer) throws Exception {
        boolean stopped = super.stopConsumer(consumer);
        if (stopped) {
            log.info("Suspending the other consumer " + consumer);
        }
    }
}
I would propose an alternative solution, unless you really need to save the data as files.
From your Restlet consumer, send each request to a message-queueing system such as ActiveMQ or RabbitMQ. You will quickly end up with lots of messages on that queue, but that is OK.
Then replace your file consumer with a queue consumer. It will take some time, but each message will be processed separately and sent wherever you want. I have tested RabbitMQ with about 500,000 messages and that worked fine. This should reduce the load on the consumer as well.
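A rough sketch of that idea (not from the original answer; it assumes the camel-activemq component is on the classpath and uses a hypothetical queue name "updates"):

// Producer side: put each incoming request on a JMS queue instead of writing a file
from("restlet:http://localhost:9191/update?restletMethod=post")
    .to("activemq:queue:updates");

// Consumer side: the broker buffers the backlog; concurrentConsumers caps how many
// messages are processed in parallel
from("activemq:queue:updates?concurrentConsumers=10")
    .to("direct:sendUpdate");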

Vert.x Event loop - How is this asynchronous?

I'm playing around with Vert.x and I'm quite new to servers based on an event loop as opposed to the thread-per-connection model.
public void start(Future<Void> fut) {
    vertx
        .createHttpServer()
        .requestHandler(r -> {
            LocalDateTime start = LocalDateTime.now();
            System.out.println("Request received - " + start.format(DateTimeFormatter.ISO_DATE_TIME));
            final MyModel model = new MyModel();
            try {
                for (int i = 0; i < 10000000; i++) {
                    //some simple operation
                }
                model.data = start.format(DateTimeFormatter.ISO_DATE_TIME) + " - " + LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME);
            } catch (Exception e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
            r.response().end(
                new Gson().toJson(model)
            );
        })
        .listen(4568, result -> {
            if (result.succeeded()) {
                fut.complete();
            } else {
                fut.fail(result.cause());
            }
        });
    System.out.println("Server started ..");
}
I'm just trying to simulate a long-running request handler to understand how this model works.
What I've observed is that the so-called event loop is blocked until my first request completes. However little time it takes, a subsequent request is not acted upon until the previous one completes.
Obviously I'm missing a piece here, and that's my question.
Edited based on the answers so far:
Isn't accepting all requests considered to be asynchronous? If a new connection can only be accepted when the previous one is cleared off, how is it async?
Assume a typical request takes anywhere between 100 ms and 1 sec (based on the kind and nature of the request). So it means the event loop can't accept a new connection until the previous request finishes (even if it wraps up within a second). And if I as a programmer have to think through all these and push such request handlers to a worker thread, then how does it differ from a thread/connection model?
I'm just trying to understand how this model is better than traditional thread/connection server models. Assume there is no I/O op or all the I/O ops are handled asynchronously. How does it even solve the c10k problem, when it can't start all concurrent requests in parallel and has to wait till the previous one terminates?
Even if I decide to push all these operations to a worker thread (pooled), then I'm back to the same problem, isn't it? Context switching between threads?
Edits and topping this question for a bounty
I do not completely understand how this model is claimed to be asynchronous.
Vert.x has an async JDBC client (asynchronous being the keyword) which I tried to adapt with RxJava.
Here is a code sample (relevant portions):
server.requestStream().toObservable().subscribe(req -> {
    LocalDateTime start = LocalDateTime.now();
    System.out.println("Request for " + req.absoluteURI() + " received - " + start.format(DateTimeFormatter.ISO_DATE_TIME));
    jdbc.getConnectionObservable().subscribe(
        conn -> {
            // Now chain some statements using flatmap composition
            Observable<ResultSet> resa = conn.queryObservable("SELECT * FROM CALL_OPTION WHERE UNDERLYING='NIFTY'");
            // Subscribe to the final result
            resa.subscribe(resultSet -> {
                req.response().end(resultSet.getRows().toString());
                System.out.println("Request for " + req.absoluteURI() + " Ended - " + LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME));
            }, err -> {
                System.out.println("Database problem");
                err.printStackTrace();
            });
        },
        // Could not connect
        err -> {
            err.printStackTrace();
        }
    );
});
server.listen(4568);
The select query there takes approximately 3 seconds to return the complete table dump.
When I fire concurrent requests (tried with just 2), I see that the second request completely waits for the first one to complete.
If the JDBC select is asynchronous, isn't it fair to expect the framework to handle the second connection while it waits for the select query to return anything?
The Vert.x event loop is, in fact, a classical event loop that exists on many platforms. Of course, most explanations and docs are written for Node.js, as it's the most popular framework based on this architecture pattern. Take a look at this more or less good explanation of the mechanics of the Node.js event loop. The Vert.x tutorial also has a fine explanation between "Don’t call us, we’ll call you" and "Verticles".
Edit for your updates:
First of all, when you are working with an event loop, the main thread should work very quickly for all requests. You shouldn't do any long job in this loop, and of course you shouldn't wait for the response to your call to the database. Instead:
- Schedule the call asynchronously
- Assign a callback (handler) to the result
- The callback will be executed in a worker thread, not the event loop thread. This callback, for example, will return a response to the socket.
So, your operations in the event loop should just schedule all asynchronous operations with callbacks and move on to the next request without awaiting any results.
Assume a typical request takes anywhere between 100 ms and 1 sec (based on the kind and nature of the request).
In that case, your request has some computationally expensive parts or access to IO; your code in the event loop shouldn't wait for the results of these operations.
I'm just trying to understand how this model is better than traditional thread/connection server models. Assume there is no I/O op or all the I/O ops are handled asynchronously.
When you have too many concurrent requests and a traditional programming model, you make a thread per request. What do these threads do? They mostly wait for IO operations (for example, a result from the database). It's a waste of resources. In our event loop model, you have one main thread that schedules operations and a pre-allocated pool of worker threads for long tasks. None of these workers actually waits for a response; they can execute other code while waiting for an IO result (this can be implemented as callbacks or by periodically checking the status of IO jobs currently in progress). I would recommend you go through Java NIO and Java NIO 2 to understand how this async IO can actually be implemented inside the framework. Green threads are a closely related concept that is also good to understand. Green threads and coroutines are a kind of hidden event loop trying to achieve the same thing: fewer threads, because a system thread can be reused while a green thread is waiting for something.
How does it even solve the c10k problem, when it can't start all concurrent requests in parallel and has to wait till the previous one terminates?
We certainly don't wait in the main thread to send the response for the previous request: get a request, schedule long/IO task execution, move to the next request.
Even if I decide to push all these operations to a worker thread (pooled), then I'm back to the same problem, isn't it? Context switching between threads?
If you do everything right, no. You will even get good data locality and execution-flow prediction. One CPU core will execute your short event loop and schedule async work without context switching, and nothing more. The other cores make calls to the database and return responses, and only that. Switching between callbacks or checking different channels for IO status doesn't actually require any system-thread context switch; it all happens in one worker thread. So we have one worker thread per core, and that one system thread awaits/checks result availability from, for example, multiple connections to the database. Revisit the Java NIO concept to understand how it can work this way. (A classical example for NIO is a proxy server that can accept many parallel connections (thousands), proxy requests to some other remote servers, listen for responses and send responses back to clients, all using only one or two threads.)
About your code, I made a sample project for you to demonstrate that everything works as expected:
public class MyFirstVerticle extends AbstractVerticle {

    @Override
    public void start(Future<Void> fut) {
        JDBCClient client = JDBCClient.createShared(vertx, new JsonObject()
                .put("url", "jdbc:hsqldb:mem:test?shutdown=true")
                .put("driver_class", "org.hsqldb.jdbcDriver")
                .put("max_pool_size", 30));

        client.getConnection(conn -> {
            if (conn.failed()) { throw new RuntimeException(conn.cause()); }
            final SQLConnection connection = conn.result();
            // create a table
            connection.execute("create table test(id int primary key, name varchar(255))", create -> {
                if (create.failed()) { throw new RuntimeException(create.cause()); }
            });
        });

        vertx
            .createHttpServer()
            .requestHandler(r -> {
                int requestId = new Random().nextInt();
                System.out.println("Request " + requestId + " received");
                client.getConnection(conn -> {
                    if (conn.failed()) { throw new RuntimeException(conn.cause()); }
                    final SQLConnection connection = conn.result();
                    connection.execute("insert into test values ('" + requestId + "', 'World')", insert -> {
                        // query some data with arguments
                        connection
                            .queryWithParams("select * from test where id = ?", new JsonArray().add(requestId), rs -> {
                                connection.close(done -> { if (done.failed()) { throw new RuntimeException(done.cause()); } });
                                System.out.println("Result " + requestId + " returned");
                                r.response().end("Hello");
                            });
                    });
                });
            })
            .listen(8080, result -> {
                if (result.succeeded()) {
                    fut.complete();
                } else {
                    fut.fail(result.cause());
                }
            });
    }
}
@RunWith(VertxUnitRunner.class)
public class MyFirstVerticleTest {

    private Vertx vertx;

    @Before
    public void setUp(TestContext context) {
        vertx = Vertx.vertx();
        vertx.deployVerticle(MyFirstVerticle.class.getName(),
                context.asyncAssertSuccess());
    }

    @After
    public void tearDown(TestContext context) {
        vertx.close(context.asyncAssertSuccess());
    }

    @Test
    public void testMyApplication(TestContext context) {
        for (int i = 0; i < 10; i++) {
            final Async async = context.async();
            vertx.createHttpClient().getNow(8080, "localhost", "/",
                response -> response.handler(body -> {
                    context.assertTrue(body.toString().contains("Hello"));
                    async.complete();
                })
            );
        }
    }
}
Output:
Request 1412761034 received
Request -1781489277 received
Request 1008255692 received
Request -853002509 received
Request -919489429 received
Request 1902219940 received
Request -2141153291 received
Request 1144684415 received
Request -1409053630 received
Request -546435082 received
Result 1412761034 returned
Result -1781489277 returned
Result 1008255692 returned
Result -853002509 returned
Result -919489429 returned
Result 1902219940 returned
Result -2141153291 returned
Result 1144684415 returned
Result -1409053630 returned
Result -546435082 returned
So, we accept a request - schedule a request to the database, go to the next request, we consume all of them and send a response for each request only when everything is done with the database.
About your code sample, I see two possible issues. First, it looks like you don't close() the connection, which is important for returning it to the pool. Second, how is your pool configured? If there is only one free connection, these requests will serialize while waiting for it.
I recommend adding timestamp printing for both requests to find the place where you serialize. You have something that makes the calls in the event loop blocking. Or... check that you send the requests in parallel in your test, not one after the other, each only after receiving the previous response.
How is this asynchronous? The answer is in your question itself:
What I've observed is that the so-called event loop is blocked until my first request completes. However little time it takes, a subsequent request is not acted upon until the previous one completes.
The idea is that instead of spawning a new thread to serve each HTTP request, the same thread is used, and you have blocked it with your long-running task.
The goal of the event loop is to save the time spent on context switching from one thread to another and to utilize the idle CPU time while a task is doing IO/network activity. If, while handling your request, it has to do other IO/network operations, e.g. fetching data from a remote MongoDB instance, your thread will not be blocked during that time; instead another request will be served by the same thread, which is the ideal use case for the event loop model (considering that you have concurrent requests coming to your server).
If you have long-running tasks that do not involve network/IO operations, you should consider using a thread pool instead; if you block the main event loop thread itself, other requests will be delayed. I.e. for long-running tasks you are okay paying the price of context switching so that the server stays responsive.
EDIT:
The way a server can handle requests can vary:
1) Spawn a new thread for each incoming request (in this model, context switching is high and there is the additional cost of spawning a new thread every time).
2) Use a thread pool to serve the requests (the same set of threads is used to serve requests, and extra requests get queued up).
3) Use an event loop (a single thread for all the requests; context switching is negligible, although there are still a few threads running, e.g. to queue up the incoming requests).
First of all, context switching is not bad; it is required to keep the application server responsive, but too much context switching can be a problem if the number of concurrent requests gets too high (roughly more than 10k). If you want to understand this in more detail, I recommend you read the C10K article.
Assume a typical request takes anywhere between 100 ms and 1 sec (based on the kind and nature of the request). So it means the event loop can't accept a new connection until the previous request finishes (even if it wraps up within a second).
If you need to respond to a large number of concurrent requests (more than 10k), I would consider anything over 500 ms a long-running operation. Secondly, like I said, there are some threads/context switches involved, e.g. to queue up incoming requests, but the context switching amongst threads is greatly reduced since there are only a few threads at a time. Thirdly, if there is a network/IO operation involved in resolving the first request, the second request gets a chance to be resolved before the first one finishes; this is where this model plays well.
And if I as a programmer have to think through all these and push such request handlers to a worker thread, then how does it differ from a thread/connection model?
Vert.x is trying to give you the best of threads and the event loop, so as a programmer you can decide how to make your application efficient in both scenarios, i.e. long-running operations with and without network/IO operations.
I'm just trying to understand how this model is better than traditional thread/connection server models. Assume there is no I/O op or all the I/O ops are handled asynchronously. How does it even solve the c10k problem, when it can't start all concurrent requests in parallel and has to wait till the previous one terminates?
The above explanation should answer this.
Even if I decide to push all these operations to a worker thread (pooled), then I'm back to the same problem, isn't it? Context switching between threads?
Like I said, both have pros and cons, and Vert.x gives you both models; depending on your use case, you have to choose what is ideal for your scenario.
In this sort of processing engine, you are supposed to turn long-running tasks into asynchronously executed operations, and there is a methodology for doing this, so that the critical thread can complete as quickly as possible and return to perform another task. I.e. any IO operations are passed to the framework, which calls you back when the IO is done.
The framework is asynchronous in the sense that it supports you in producing and running these asynchronous tasks, but it doesn't change your code from being synchronous to asynchronous.
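For illustration (a minimal sketch, not from the answers above, using the Vert.x 3 executeBlocking API), the CPU-heavy loop from the question could be handed off to a worker thread so the event loop stays free to accept further connections:

vertx.createHttpServer()
    .requestHandler(r -> {
        // runs on the event loop: only schedule the work, don't do it here
        vertx.<String>executeBlocking(future -> {
            // runs on a worker thread: the long-running computation goes here
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) { sum += i; }
            future.complete("done: " + sum);
        }, res -> {
            // back on the event loop once the worker has finished
            r.response().end(res.result());
        });
    })
    .listen(4568);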

Why does async-http-client does not throttle my requests?

I have an Akka actor that owns an AsyncHttpClient. This actor must handle a lot of asynchronous requests. Because my system cannot handle thousands of requests simultaneously, I need to limit the number of concurrent requests.
Right now, I'm doing this:
AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
        .setAllowPoolingConnection(true)
        .addRequestFilter(new ThrottleRequestFilter(32))
        .setMaximumConnectionsPerHost(16)
        .setMaxRequestRetry(5)
        .build();

final AsyncHttpClient httpClient = new AsyncHttpClient(new NettyAsyncHttpProvider(config));
When my actor receives a message, I use the client like this:
Future<Integer> f = httpClient.prepareGet(url).execute(
    new AsyncCompletionHandler<Integer>() {
        @Override
        public Integer onCompleted(Response response) throws Exception {
            // handle successful request
            return response.getStatusCode();
        }

        @Override
        public void onThrowable(Throwable t) {
            // handle failed request
        }
    }
);
The problem is that requests are never put in the client queue and are all processed as if the configuration didn't matter. Why doesn't this work as it should?
From the maintainer:
setMaxConnectionsPerHost only caps the number of connections that can be open to a given host. There's no built-in queuing mechanism for requests that might need a connection while there's none available.
So basically, it's a hard limit. Also, in versions of the library prior to, I believe, 1.9.10, the maximumConnectionsPerHost field was not being properly utilized by the code to limit the number of concurrent connections per host. Instead, there was a bug where the client only looked at the maximumConnectionsTotal field.
Link to issue referenced on GitHub
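If you want to enforce the cap yourself, one rough sketch (not from the answer above, and note that it blocks the calling actor when the limit is reached) is to guard execute() with a Semaphore and release the permit in the completion handler:

final Semaphore permits = new Semaphore(32);       // at most 32 requests in flight

permits.acquireUninterruptibly();                  // blocks the caller until a permit is free
httpClient.prepareGet(url).execute(
    new AsyncCompletionHandler<Integer>() {
        @Override
        public Integer onCompleted(Response response) throws Exception {
            permits.release();                     // free the slot as soon as the request finishes
            return response.getStatusCode();
        }

        @Override
        public void onThrowable(Throwable t) {
            permits.release();                     // also free it on failure
        }
    }
);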

Can I use Thread.sleep in a servlet to add random delays for the local server to answer my api call?

My deployed server sometimes has long response times, whereas when working and developing on localhost all calls are really fast.
This has made my deployed application behave unexpectedly a few times, due to resource loading taking too long.
I'd like to simulate the bad connection with my real server in my local tests, so I want to add a random delay to every request-response. My first thought was to use Thread.sleep in the servlet:
protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
    // add delay before processing the request
    if (DELAY > 0) {
        int delay;
        if (RANDOMIZE) {
            delay = new Random().nextInt(DELAY); // nextInt is an instance method, so use a Random instance
        } else {
            delay = DELAY;
        }
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e1) {
            logger.error(e1);
        }
    }
...
However, I have read that one should not use Thread.sleep() inside a servlet, but the context of that advice and its solutions are drastically different from my case. Can I use Thread.sleep() in this context?
EDIT: This is of course only for local use, so the client gets strained a bit in local tests... I just want to simulate the bad network I've encountered in reality!
I think this whole approach is flawed. I wouldn't introduce a random delay (how are you going to repeat test cases?). You can introduce a Thread.sleep(), but I wouldn't. Would this be in your production code? Is it configurable? What happens if it's accidentally turned on in production?
I would rather set up a test server with the exact characteristics of your production environment. That way you can not only debug effectively, but build a regression test suite that will allow you to develop effectively, knowing how the application will perform in production.
Perhaps the one concession to the above is to introduce network delays (as appropriate) between client and server if your users are geographically diverse. That's often done using a hardware device on the network and wouldn't affect your code or configuration.
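If you do keep a delay for local testing, one way to make sure it cannot fire in production (a sketch, with hypothetical property names) is to drive it from system properties that only your local launch configuration sets:

// -Dtest.simulated.delay.ms=500 in the local run configuration; defaults to 0 (disabled)
private static final int DELAY = Integer.getInteger("test.simulated.delay.ms", 0);
private static final boolean RANDOMIZE = Boolean.getBoolean("test.simulated.delay.randomize");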
I did this to get a delay:
response.setContentType("text/html;charset=UTF-8");
try (PrintWriter out = response.getWriter()) {
    out.println("<meta http-equiv=\"Refresh\" content=\"3;url=home.jsp/\">");
}
Remember that in content=\"3;url=home.jsp/\", 3 is the delay in seconds and home.jsp is the page you want to go to after that many seconds.
