Embedded Jetty timeout under load - java

I have an Akka (Java) application with a camel-jetty consumer. Under a fairly small load (about 10 TPS), our client starts seeing HTTP 503 errors. I tried to reproduce the problem in our lab, and it seems Jetty can't handle overlapping HTTP requests. Below is the output from Apache Bench (ab):
ab sends 10 requests using a single thread (i.e. one request at a time):
ab -n 10 -c 1 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 1
Time taken for tests: 0.061265 seconds
Complete requests: 10
Failed requests: 0
Requests per second: 163.23 [#/sec] (mean)
Time per request: 6.126 [ms] (mean)
Time per request: 6.126 [ms] (mean, across all concurrent requests)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.0 1 2
Processing: 3 4 1.8 5 7
Waiting: 2 4 1.8 5 7
Total: 3 5 1.9 6 8
Percentage of the requests served within a certain time (ms)
50% 6
66% 6
75% 6
80% 8
90% 8
95% 8
98% 8
99% 8
100% 8 (longest request)
ab sends 10 requests using two threads (up to 2 requests at the same time):
ab -n 10 -c 2 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 2
Time taken for tests: 30.24549 seconds
Complete requests: 10
Failed requests: 1
(Connect: 0, Length: 1, Exceptions: 0)
// omitted for clarity
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 1 2
Processing: 3 3005 9492.9 4 30023
Waiting: 2 3005 9492.7 3 30022
Total: 3 3006 9493.0 5 30024
Percentage of the requests served within a certain time (ms)
50% 5
66% 5
75% 7
80% 7
90% 30024
95% 30024
98% 30024
99% 30024
100% 30024 (longest request)
I don't believe Jetty is this bad. Hopefully it's just a configuration issue. This is the setting for my Camel consumer URI:
"jetty:http://0.0.0.0:8899/pim?replyTimeout=70000&autoAck=false"
I am using akka 2.3.12 and camel-jetty 2.15.2

Jetty is certainly not that bad and should be able to handle tens of thousands of connections with many thousands of TPS.
It's hard to diagnose from what you have said, other than that Jetty does not send 503s when it is under load... unless perhaps the Denial of Service protection filter is deployed? (ab would look like a DoS attack... which it basically is, and it is not a great load generator for benchmarking.)
So you need to track down who/what is sending that 503 and why.

It was my bad code: the sender (client) info was overwritten by overlapping requests, and the 503 error was sent when the Jetty continuation timed out.
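For the record, that failure mode looks roughly like the sketch below. This is only an illustration under assumptions (the class name, payload, and structure are made up, not the original code): caching the sender in an actor field means a second overlapping request overwrites it, so the first exchange never receives its reply and Jetty's continuation times out.
import akka.actor.ActorRef;
import akka.camel.CamelMessage;
import akka.camel.javaapi.UntypedConsumerActor;

public class PimConsumer extends UntypedConsumerActor {

    private ActorRef client; // shared field: the bug - overlapping requests overwrite it

    @Override
    public String getEndpointUri() {
        return "jetty:http://0.0.0.0:8899/pim?replyTimeout=70000&autoAck=false";
    }

    @Override
    public void onReceive(Object message) {
        if (message instanceof CamelMessage) {
            client = getSender();                 // request B overwrites request A's sender here
            final ActorRef replyTo = getSender(); // fix: capture the sender per message instead
            // ... do the work, then reply to the captured reference:
            replyTo.tell("ok", getSelf());
        } else {
            unhandled(message);
        }
    }
}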

Related

Restricting the input queue size in Micronaut (Netty)

When the application has started but is not yet warmed up (the JIT needs time), it cannot handle the expected RPS.
The problem is the incoming queue. While the IO thread keeps processing requests, many requests pile up in the queue and cannot be cleaned up by the GC. Once the survivor generation overflows, the GC starts performing major pauses, which slows down request processing even more, and after some time the application dies with an OOM.
My application has a self-warming readinessProbe (3k random requests).
I tried to configure the thread count and queue size:
application.yml
micronaut:
  server:
    port: 8080
    netty:
      parent:
        threads: 2
      worker:
        threads: 2
  executors:
    io:
      n-threads: 1
      parallelism: 1
      type: FIXED
    scheduled:
      n-threads: 1
      parallelism: 1
      corePoolSize: 1
And set some system properties:
System.setProperty("io.netty.eventLoop.maxPendingTasks", "16")
System.setProperty("io.netty.eventexecutor.maxPendingTasks", "16")
System.setProperty("io.netty.eventLoopThreads", "1")
But the queue keeps filling up.
I want to find some way to restrict the input queue size in Micronaut, so that the application does not fail under high load.

Concurrency in Spring Boot

I have a Spring Boot application with embedded Jetty, and its configuration is:
Jetty's minThreads: 50
Jetty's maxThreads: 500
Jetty's maxQueueSize: 25000 (I changed the default queue to a LinkedBlockingQueue)
I didn't change acceptors and selectors (since I don't believe in hard-coding those values).
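For reference, a hedged sketch of how such a thread pool and bounded queue could be wired up programmatically. It assumes Spring Boot 2.x's JettyServletWebServerFactory (on 1.5.x the factory class is JettyEmbeddedServletContainerFactory instead); the bean name and idle timeout are illustrative, not taken from the question:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JettyConfig {

    @Bean
    public JettyServletWebServerFactory jettyFactory() {
        JettyServletWebServerFactory factory = new JettyServletWebServerFactory();
        // Bounded queue: excess requests wait here instead of growing without limit
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(25000);
        // maxThreads = 500, minThreads = 50, idleTimeout = 60 s, backed by the bounded queue
        QueuedThreadPool threadPool = new QueuedThreadPool(500, 50, 60_000, queue);
        factory.setThreadPool(threadPool);
        return factory;
    }
}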
With the above configuration, I am getting the JMeter test results below:
Concurrent Users: 60
summary = 183571 in 00:01:54 = 1611.9/s Avg: 36 Min: 3 Max: 1062 Err: 0 (0.00%)
Concurrent Users: 75
summary = 496619 in 00:05:00 = 1654.6/s Avg: 45 Min: 3 Max: 1169 Err: 0 (0.00%)
If I increase the number of concurrent users, I don't see any improvement. I want to increase concurrency. How can I achieve this?
===========================================================================
Update on 29-March-2019
I spent more effort on improving the business logic, still without much improvement. Then I decided to develop a hello-world spring-boot project,
i.e.,
spring-boot (1.5.9)
jetty 9.4.15
a REST controller with a GET endpoint
code below:
@GetMapping
public String index() {
    return "Greetings from Spring Boot!";
}
Then I tried to benchmark using apachebench
75 concurrent users:
ab -t 120 -n 1000000 -c 75 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 75
Time taken for tests: 37.184 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 26893.28 [#/sec] (mean)
Time per request: 2.789 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3755.61 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 23.5 0 3006
Processing: 0 2 7.8 1 404
Waiting: 0 2 7.8 1 404
Total: 0 3 24.9 2 3007
100 concurrent users:
ab -t 120 -n 1000000 -c 100 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 100
Time taken for tests: 36.708 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27241.77 [#/sec] (mean)
Time per request: 3.671 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3804.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 35.7 1 3007
Processing: 0 2 9.4 1 405
Waiting: 0 2 9.4 1 405
Total: 0 4 37.0 2 3009
500 concurrent users:
ab -t 120 -n 1000000 -c 500 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 500
Time taken for tests: 36.222 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27607.83 [#/sec] (mean)
Time per request: 18.111 [ms] (mean)
Time per request: 0.036 [ms] (mean, across all concurrent requests)
Transfer rate: 3855.39 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 126.2 1 7015
Processing: 0 4 22.3 1 811
Waiting: 0 3 22.3 1 810
Total: 0 18 129.2 2 7018
1000 concurrent users:
ab -t 120 -n 1000000 -c 1000 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 1000
Time taken for tests: 36.534 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27372.09 [#/sec] (mean)
Time per request: 36.534 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3822.47 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 30 190.8 1 7015
Processing: 0 6 31.4 2 1613
Waiting: 0 5 31.4 1 1613
Total: 0 36 195.5 2 7018
From the above test runs, I achieved ~27K requests per second with just 75 users, but it looks like increasing the number of users also increases the latency. Also, we can clearly note that the connect time is increasing.
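(Side note, as a sanity check on these numbers: by Little's law, mean latency ≈ concurrency / throughput, so at 1000 concurrent users and ~27,372 req/s the expected mean time per request is 1000 / 27372 ≈ 36.5 ms, which matches the 36.534 ms reported above. Once throughput is saturated at ~27K/s, adding users can only add latency.)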
My application is required to support 40k concurrent users (assume each is using their own separate browser), and requests should finish within 250 milliseconds.
Please help me with this.
You can try increasing or decreasing the number of Jetty threads, but the application performance will depend on the application logic. If your current bottleneck is a database query, you will see hardly any improvement by tuning the HTTP layer, especially when testing over a local network.
Find the bottleneck in your application, attempt to improve it, and then measure again to confirm it's better. Repeat these three steps until you achieve the desired performance. Do not tune performance blindly; it's a waste of time.

Cassandra Read/Write performance - High CPU

I started using Cassandra a few days ago, and here is what I am trying to do.
I have about 2 million+ objects which maintain user profiles. I convert these objects to JSON, compress them, and store them in a blob column. The average compressed JSON size is about 10 KB. This is how my table looks in Cassandra:
Table:
dev.userprofile (uid varchar primary key, profile blob);
Select Query:
select profile from dev.userprofile where uid='';
Update Query:
update dev.userprofile set profile='<bytebuffer>' where uid = '<uid>'
Every hour, I get events from a queue which I apply to my userprofile objects. Each event corresponds to one userprofile object. I get about 1 million such events, so I have to update around 1M userprofile objects within a short time, i.e. update the object in my application, compress the JSON, and update the Cassandra blob. I have to finish updating all 1 million user profile objects, preferably within a few minutes, but I notice it's taking longer now.
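For context, the compress-and-store step described above looks roughly like the sketch below. The class name and the choice of GZIP are assumptions for illustration; the question doesn't say which codec is used:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public final class ProfileCodec {

    // Compress the profile JSON into the ByteBuffer written to the blob column (~10 KB on average).
    public static ByteBuffer compress(String json) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return ByteBuffer.wrap(out.toByteArray());
    }
}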
While running my application, I notice that I can update around 400 profiles/second on average. I already see a lot of CPU iowait - 70%+ on the Cassandra instance. Also, the load is initially pretty high, around 16 (on an 8 vCPU instance), and then drops off to around 4.
What am I doing wrong? When I was updating smaller objects of size 2 KB, I noticed that Cassandra operations/sec were much faster - I was able to get about 3000 ops/sec. Any thoughts on how I should improve the performance?
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.1.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-extras</artifactId>
    <version>3.1.0</version>
</dependency>
I just have a single-node Cassandra setup on an m4.2xlarge AWS instance for testing:
Single node Cassandra instance
m4.2xlarge aws ec2
500 GB General Purpose (SSD)
IOPS - 1500 / 10000
nodetool cfstats output
Keyspace: dev
Read Count: 688795
Read Latency: 27.280683695439137 ms.
Write Count: 688780
Write Latency: 0.010008401811899301 ms.
Pending Flushes: 0
Table: userprofile
SSTable count: 9
Space used (live): 32.16 GB
Space used (total): 32.16 GB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 13.56 MB
SSTable Compression Ratio: 0.9984539538554672
Number of keys (estimate): 2215817
Memtable cell count: 38686
Memtable data size: 105.72 MB
Memtable off heap memory used: 0 bytes
Memtable switch count: 6
Local read count: 688807
Local read latency: 29.879 ms
Local write count: 688790
Local write latency: 0.012 ms
Pending flushes: 0
Bloom filter false positives: 47
Bloom filter false ratio: 0.00003
Bloom filter space used: 7.5 MB
Bloom filter off heap memory used: 7.5 MB
Index summary off heap memory used: 2.07 MB
Compression metadata off heap memory used: 3.99 MB
Compacted partition minimum bytes: 216 bytes
Compacted partition maximum bytes: 370.14 KB
Compacted partition mean bytes: 5.82 KB
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
nodetool cfhistograms output
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 3.00 9.89 2816.16 4768 2
75% 3.00 11.86 43388.63 8239 2
95% 4.00 14.24 129557.75 14237 2
98% 4.00 20.50 155469.30 17084 2
99% 4.00 29.52 186563.16 20501 2
Min 0.00 1.92 61.22 216 2
Max 5.00 74975.55 4139110.98 379022 2
Dstat output
---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
1m 5m 15m | read writ|run blk new| used buff cach free| in out | read writ| int csw |usr sys idl wai hiq siq| recv send
12.8 13.9 10.6|1460 31.1 |1.0 14 0.2|9.98G 892k 21.2G 234M| 0 0 | 119M 3291k| 63k 68k| 1 1 26 72 0 0|3366k 3338k
13.2 14.0 10.7|1458 28.4 |1.1 13 1.5|9.97G 884k 21.2G 226M| 0 0 | 119M 3278k| 61k 68k| 2 1 28 69 0 0|3396k 3349k
12.7 13.8 10.7|1477 27.6 |0.9 11 1.1|9.97G 884k 21.2G 237M| 0 0 | 119M 3321k| 69k 72k| 2 1 31 65 0 0|3653k 3605k
12.0 13.7 10.7|1474 27.4 |1.1 8.7 0.3|9.96G 888k 21.2G 236M| 0 0 | 119M 3287k| 71k 75k| 2 1 36 61 0 0|3807k 3768k
11.8 13.6 10.7|1492 53.7 |1.6 12 1.2|9.95G 884k 21.2G 228M| 0 0 | 119M 6574k| 73k 75k| 2 2 32 65 0 0|3888k 3829k
Edit
Switched to LeveledCompactionStrategy and disabled compression on sstables; I don't see a big improvement.
There was a bit of improvement in profiles/sec updated; it's now 550-600 profiles/sec. But the CPU spikes remain, i.e. the iowait.
gcstats
Interval (ms)   Max GC Elapsed (ms)   Total GC Elapsed (ms)   Stdev GC Elapsed (ms)   GC Reclaimed (MB)   Collections   Direct Memory Bytes
755960 83 3449 8 73179796264 107 -1
dstats
---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
1m 5m 15m | read writ|run blk new| used buff cach free| in out | read writ| int csw |usr sys idl wai hiq siq| recv send
7.02 8.34 7.33| 220 16.6 |0.0 0 1.1|10.0G 756k 21.2G 246M| 0 0 | 13M 1862k| 11k 13k| 1 0 94 5 0 0| 0 0
6.18 8.12 7.27|2674 29.7 |1.2 1.5 1.9|10.0G 760k 21.2G 210M| 0 0 | 119M 3275k| 69k 70k| 3 2 83 12 0 0|3906k 3894k
5.89 8.00 7.24|2455 314 |0.6 5.7 0|10.0G 760k 21.2G 225M| 0 0 | 111M 39M| 68k 69k| 3 2 51 44 0 0|3555k 3528k
5.21 7.78 7.18|2864 27.2 |2.6 3.2 1.4|10.0G 756k 21.2G 266M| 0 0 | 127M 3284k| 80k 76k| 3 2 57 38 0 0|4247k 4224k
4.80 7.61 7.13|2485 288 |0.1 12 1.4|10.0G 756k 21.2G 235M| 0 0 | 113M 36M| 73k 73k| 2 2 36 59 0 0|3664k 3646k
5.00 7.55 7.12|2576 30.5 |1.0 4.6 0|10.0G 760k 21.2G 239M| 0 0 | 125M 3297k| 71k 70k| 2 1 53 43 0 0|3884k 3849k
5.64 7.64 7.15|1873 174 |0.9 13 1.6|10.0G 752k 21.2G 237M| 0 0 | 119M 21M| 62k 66k| 3 1 27 69 0 0|3107k 3081k
You can notice the CPU spikes.
My main concern is the iowait before I increase the load further. Is there anything specific I should be looking for that's causing this? 600 profiles/sec (i.e. 600 reads + writes) seems low to me.
Can you try LeveledCompactionStrategy? With 1:1 reads/writes on large objects like this, the IO saved on reads will probably counter the IO spent on the more expensive compactions.
If you're already compressing the data before sending it, you should turn off compression on the table. It breaks the data into 64 KB chunks which will be largely dominated by only 6 values and won't get much compression (as shown by the terrible compression ratio: SSTable Compression Ratio: 0.9984539538554672).
ALTER TABLE dev.userprofile
WITH compaction = { 'class' : 'LeveledCompactionStrategy' }
AND compression = { 'sstable_compression' : '' };
400 profiles/second is very, very slow though, and there may be some work to do on your client, which could potentially be a bottleneck as well. If you have a load of 4 on an 8-core system, it may not be Cassandra slowing things down. Make sure you're parallelizing your requests and sending them asynchronously; sending requests sequentially is a common issue.
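As a hedged illustration of that with the 3.x driver (the class, the 128 in-flight cap, and the error handling are assumptions, not a drop-in fix), the updates can be issued with executeAsync while a semaphore bounds how many are outstanding:
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.Semaphore;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;

public class ProfileWriter {

    private static final int MAX_IN_FLIGHT = 128; // tune to what the node can absorb

    public static void updateProfiles(Session session, Map<String, ByteBuffer> profiles)
            throws InterruptedException {
        PreparedStatement ps =
                session.prepare("UPDATE dev.userprofile SET profile = ? WHERE uid = ?");
        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

        for (Map.Entry<String, ByteBuffer> entry : profiles.entrySet()) {
            inFlight.acquire(); // back-pressure: don't flood the node with unbounded requests
            ResultSetFuture future = session.executeAsync(ps.bind(entry.getValue(), entry.getKey()));
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                @Override public void onSuccess(ResultSet rs) { inFlight.release(); }
                @Override public void onFailure(Throwable t) { inFlight.release(); } // log/retry here
            }, MoreExecutors.directExecutor());
        }
        inFlight.acquire(MAX_IN_FLIGHT); // wait for the remaining in-flight requests
        inFlight.release(MAX_IN_FLIGHT);
    }
}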
With larger blobs there is going to be an impact on GCs, so monitoring them and adding that information can be helpful. I would be surprised if 10 KB objects affected it that much, but it's something to look out for and may require more JVM tuning.
If that helps, from there I would recommend tuning the heap and upgrading to at least 3.7 or the latest in the 3.0 line.

Tomcat response time is increasing as concurrency is increased in apache bench

When I increase the concurrency (-c) in ab (Apache Bench), I get different response times:
-c    99th percentile
 3    2 ms
 5    4 ms
 8    6 ms
If I increase -c further, the response time increases as well. My code is extremely simple: no thread starvation, no blocking, etc. So why does the response time increase?
I think that ab -c 8 means 8 simultaneous requests are made to http://localhost:8070/benchmark. So is it possible that requests are getting queued at Tomcat? If so, how can I make Tomcat handle more concurrent users with a lower response time (4 ms at the 99th percentile)?
@RestController
public class PerformanceController {

    @RequestMapping(value = "/benchmark", method = RequestMethod.GET)
    public String getRetargetingData() {
        return "Just plain call";
    }
}
ritesh@ritesh:~$ ab -n 10000 -c 3 'http://localhost:8070/benchmark'
Server Software: Apache-Coyote/1.1
Server Hostname: localhost
Server Port: 8070
Document Path: /benchmark
Document Length: 15 bytes
Concurrency Level: 3
Time taken for tests: 2.602 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 4050000 bytes
HTML transferred: 150000 bytes
Requests per second: 3843.57 [#/sec] (mean)
Time per request: 0.781 [ms] (mean)
Time per request: 0.260 [ms] (mean, across all concurrent requests)
Transfer rate: 1520.16 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 0 1 0.5 1 23
Waiting: 0 1 0.4 1 23
Total: 0 1 0.5 1 24
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 2
99% 2
100% 24 (longest request)
ritesh@ritesh:~$ ab -n 10000 -c 6 'http://localhost:8070/benchmark'
Server Software: Apache-Coyote/1.1
Server Hostname: localhost
Server Port: 8070
Document Path: /benchmark
Document Length: 15 bytes
Concurrency Level: 6
Time taken for tests: 1.814 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 4050000 bytes
HTML transferred: 150000 bytes
Requests per second: 5514.16 [#/sec] (mean)
Time per request: 1.088 [ms] (mean)
Time per request: 0.181 [ms] (mean, across all concurrent requests)
Transfer rate: 2180.89 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 0 1 0.9 1 22
Waiting: 0 1 0.8 1 22
Total: 0 1 0.9 1 22
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 2
98% 3
99% 4
100% 22 (longest request)
ritesh@ritesh:~$ ab -n 10000 -c 8 'http://localhost:8070/benchmark'
Server Software: Apache-Coyote/1.1
Server Hostname: localhost
Server Port: 8070
Document Path: /benchmark
Document Length: 15 bytes
Concurrency Level: 8
Time taken for tests: 1.889 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 4050000 bytes
HTML transferred: 150000 bytes
Requests per second: 5293.55 [#/sec] (mean)
Time per request: 1.511 [ms] (mean)
Time per request: 0.189 [ms] (mean, across all concurrent requests)
Transfer rate: 2093.64 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 2
Processing: 0 1 1.8 1 50
Waiting: 0 1 1.8 1 50
Total: 0 1 1.8 1 50
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 2
80% 2
90% 2
95% 3
98% 5
99% 6
100% 50 (longest request)
ritesh@ritesh:~$ ab -n 10000 -c 500 'http://localhost:8070/benchmark'
Server Software: Apache-Coyote/1.1
Server Hostname: localhost
Server Port: 8070
Document Path: /benchmark
Document Length: 15 bytes
Concurrency Level: 500
Time taken for tests: 1.830 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 4050000 bytes
HTML transferred: 150000 bytes
Requests per second: 5464.11 [#/sec] (mean)
Time per request: 91.506 [ms] (mean)
Time per request: 0.183 [ms] (mean, across all concurrent requests)
Transfer rate: 2161.10 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 25 154.8 0 1001
Processing: 1 23 45.7 12 818
Waiting: 1 23 45.6 12 818
Total: 2 48 188.5 12 1818
Percentage of the requests served within a certain time (ms)
50% 12
66% 17
75% 22
80% 26
90% 45
95% 66
98% 1021
99% 1220
100% 1818 (longest request)

How to improve JBoss response time

The environment is JBoss 5.1.0, with a RESTEasy web service deployed and load-tested at about 10 transactions per second.
I need to improve the response time of the RESTEasy web service, so I changed the worker thread count to 100 in
jboss-5.1.0.GA/server/default/deploy/jbossweb.sar/server.xml:
<!-- connectionTimeout and keepAliveTimeout should be greater than the
Heartbeat period between SIPServer and XS.
(Here 60s for an Heartbeat less than 60s) -->
<Connector protocol="${http.connector.class}"
port="${xs.http.port}" address="${jboss.bind.address}"
allowTrace="false" enableLookups="false"
SSLEnabled="false"
connectionTimeout="60000" keepAliveTimeout="60000"
maxKeepAliveRequests="-1" maxThreads="100" compression="off" />
But the admin-console shows that the result is not good; the max processing time is still greater than 1000 ms:
Max threads: 100
Current thread count: 4
Current thread busy: 4
Max processing time: 1061 ms
Processing time: 55.7 s
Request count: 657
Error count: 1
Bytes received: 0.08 MB
Bytes sent: 0.86 MB
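(For scale: assuming the "Processing time" shown is cumulative across requests, 55.7 s over 657 requests is roughly 85 ms per request on average; the 1061 ms figure is the single slowest request. With only 4 of 100 threads busy, the thread pool is nowhere near saturated.)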
The question is: how do I improve the response time of the JBoss web service?
