Qpid benchmark: questions related to a Qpid benchmark - Java

I am trying to benchmark Qpid with the following use case:
Default Qpid configs are used (e.g. the max memory is set to 2GB); the broker and the client are on the same machine.
I have 1 connection and 256 sessions per connection, and each session has a producer and a consumer, so there are 256 producers and 256 consumers.
All producers/consumers are created before they start producing/consuming messages. Each producer/consumer is a thread and they run in parallel.
Consumers start consuming first (they block on .receive()). All consumers are durable subscribers.
Producers then start producing; each producer produces only 1 message, so 256 messages are produced in total.
A fanout exchange is used (topic.fanout=fanout://amq.fanout//fanOutTopic), and since there are 256 consumers, each consumer receives 256 messages, so 256*256 messages are received in total. (A sketch of one producer/consumer session pair is shown below.)
Following are the response times (RTs) for the messages. Response time is defined as the difference between the time when the message is sent to the broker and the time at which the message is received at the consumer:
min: 144.0 ms
max: 350454.0 ms
average: 151933.02 ms
stddev: 113347.89 ms
95th percentile: 330559.0 ms
Is there anything that I am doing wrong fundamentally? I am worried about the average response time of ~152 seconds. Is this expected from Qpid? I also see a pattern: as the test runs, the RTs increase roughly linearly over time.
Thank you,
Siva.

Related

Restricting the input queue size in Micronaut (Netty)

When the application is started but not yet warmed up (the JIT needs time), it cannot handle the expected RPS.
The problem is the incoming queue. While the IO thread is still processing requests, many requests pile up in the queue and cannot be cleaned up by the GC. Once the survivor generation overflows, the GC starts performing major pauses, which slows down request processing even more, and after some time the application dies with an OOM.
My application has a self-warming readinessProbe (3k random requests).
I tried to configure the thread counts and queue size:
application.yml
micronaut:
  server:
    port: 8080
    netty:
      parent:
        threads: 2
      worker:
        threads: 2
  executors:
    io:
      n-threads: 1
      parallelism: 1
      type: FIXED
    scheduled:
      n-threads: 1
      parallelism: 1
      corePoolSize: 1
And some system properties:
System.setProperty("io.netty.eventLoop.maxPendingTasks", "16")
System.setProperty("io.netty.eventexecutor.maxPendingTasks", "16")
System.setProperty("io.netty.eventLoopThreads", "1")
But the queue keeps filling up.
I want to find some way to restrict the input queue size in Micronaut, so that the application does not fail under high load.
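The kind of limit I am after is something like the sketch below: a fixed pool with a bounded queue that rejects excess work instead of letting it pile up (how to wire such an executor in as Micronaut's IO executor is an assumption on my part, not something I found in the docs):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedIoExecutor {

    // At most 16 queued tasks; anything beyond that is rejected immediately
    // (RejectedExecutionException) instead of accumulating on the heap.
    public static ExecutorService create() {
        return new ThreadPoolExecutor(
                1, 1,                                   // fixed single thread, matching n-threads: 1
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(16),           // hard cap on pending work
                new ThreadPoolExecutor.AbortPolicy());  // reject when the queue is full
    }
}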

When is the offset committed in KafkaListener.java in Spring Boot?

There are two parts to my question.
In the Spring Boot KafkaListener implementation below, when does the offset get committed with auto-offset-reset: latest and enable-auto-commit: true? Is it immediately after the message is received by the consumer, or after the whole method implementing the KafkaListener has completed?
KafkaConsumer.java
@KafkaListener(topics = "${spring.kafka.consumer.topic}")
public ResponseEntity<String> consume(String message) {
    log.info("Message received from Kafka topic {}", message); // offset committed HERE?
    KafkaResponse kafkaResponse = new Gson().fromJson(message, KafkaResponse.class);
    myBusinessService.processKafkaResponse(kafkaResponse);
    return new ResponseEntity<>("Successfully Received", HttpStatus.OK); // OR offset committed HERE?
}
For the properties below in application.yml, which of the following statements are true or false (or what is the correct answer)?
max-poll-records: 100
max-poll-interval-ms: 200000
enable-auto-commit: true
auto-commit-interval: 3000
auto-offset-reset: latest
isolation-level: READ_UNCOMMITTED
fetch-max-bytes: 52428800
With auto-offset-reset: latest and auto-commit-interval: 3000, if the Kafka consumer breaks down before 3 seconds have passed, then no offset would be committed for any record processed during those 3 seconds?
With max-poll-interval-ms: 200000 and auto-commit-interval: 3000, would the Kafka consumer definitely poll the broker again after an interval of 200000 ms even if it hasn't finished committing the current batch of offsets within 3000 ms?
Between the combination of max-poll-records: 100 and fetch-max-bytes: 52428800, which takes priority if the 99th record has already exceeded 52428800 bytes?
Thanks in advance!
It's best not to use enable.auto.commit=true. Spring commits the offsets in a more deterministic fashion: either after all the records from a poll have been processed (the default, AckMode.BATCH) or after each record is processed (AckMode.RECORD).
enable.auto.commit won't commit until the next poll(), and then only if the auto.commit.interval has passed.
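For example, a minimal sketch of switching to per-record commits (this assumes a recent Spring Kafka version where AckMode lives on ContainerProperties, and that enable-auto-commit is set to false):

import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // Commit the offset after each record is processed, instead of after the whole poll (AckMode.BATCH)
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
    return factory;
}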

Hazelcast Operation Heartbeat Timeouts appearing sporadically

We have a Hazelcast client (3.7.4):
// Initializes the Hazelcast client config
ClientConfig aHazelcastClientConfig = new ClientConfig();
String aHazelcastUrl = this.getHost() + ":" + this.getPort().toString();
ClientNetworkConfig aHazelcastNetworkConfig = aHazelcastClientConfig.getNetworkConfig();
aHazelcastNetworkConfig.addAddress(aHazelcastUrl);
GroupConfig group = new GroupConfig(getGroupName(), getGroupPassword());
aHazelcastClientConfig.setGroupConfig(group);
HazelcastInstance aHazelcastClient =
    HazelcastClient.newHazelcastClient(aHazelcastClientConfig);
...
IMap aMonitoredMap = aHazelcastClient.getMap(getMonitoredMap());
that periodically checks one HZ server (3.7.4), and we have observed that the following exceptions sometimes appear on the client side:
InitializeDistributedObjectOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2017-02-07 18:07:30.329. Total elapsed time: 120189 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2017-02-07 18:05:37.489. Invocation{op=com.hazelcast.spi.impl.proxyservice.impl.operations.InitializeDistributedObjectOperation{serviceName='hz:impl:mapService', identityHash=9759664, partitionId=-1, replicaIndex=0, callId=0, invocationTime=1486487130140 (2017-02-07 18:05:30.140), waitTimeout=-1, callTimeout=60000}, tryCount=1, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1486487130140, firstInvocationTime='2017-02-07 18:05:30.140', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=[10.118.152.82]:5720, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=7, /172.22.191.200:5720->/10.118.152.82:42563, endpoint=[10.118.152.82]:5720, alive=true, type=MEMBER]}
It seems the maximum call waiting timeout (60000 ms by default) is being reached. In the above example, the total elapsed time is more than 2 minutes (120189 ms).
This problem appears sporadically, without any regular pattern.
The network seems to be working correctly when it happens, so we can rule out a network connectivity issue.
Any hints or recommendations about what could be causing it?
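For completeness: the client-side invocation timeout that seems to be expiring could presumably be raised via a client property, as in the sketch below (the property name is taken from the Hazelcast 3.x client properties and I have not verified it against 3.7.4), but that would only hide the timeout rather than explain it:

ClientConfig aHazelcastClientConfig = new ClientConfig();
// The default is 120 seconds, which appears to match the ~120189 ms elapsed time in the log above
aHazelcastClientConfig.setProperty("hazelcast.client.invocation.timeout.seconds", "240");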
Thanks a lot.
Best Regards,
Jorge

Embedded Jetty timeout under load

I have an akka (Java) application with a camel-jetty consumer. Under fairly modest load (about 10 TPS), our client starts seeing HTTP 503 errors. I tried to reproduce the problem in our lab, and it seems Jetty can't handle overlapping HTTP requests. Below is the output from Apache Bench (ab).
ab sends 10 requests using a single thread (i.e. one request at a time):
ab -n 10 -c 1 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 1
Time taken for tests: 0.61265 seconds
Complete requests: 10
Failed requests: 0
Requests per second: 163.23 [#/sec] (mean)
Time per request: 6.126 [ms] (mean)
Time per request: 6.126 [ms] (mean, across all concurrent requests)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.0 1 2
Processing: 3 4 1.8 5 7
Waiting: 2 4 1.8 5 7
Total: 3 5 1.9 6 8
Percentage of the requests served within a certain time (ms)
50% 6
66% 6
75% 6
80% 8
90% 8
95% 8
98% 8
99% 8
100% 8 (longest request)
ab sends 10 requests using two threads (up to 2 requests at the same time):
ab -n 10 -c 2 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 2
Time taken for tests: 30.24549 seconds
Complete requests: 10
Failed requests: 1
(Connect: 0, Length: 1, Exceptions: 0)
// omitted for clarity
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 1 2
Processing: 3 3005 9492.9 4 30023
Waiting: 2 3005 9492.7 3 30022
Total: 3 3006 9493.0 5 30024
Percentage of the requests served within a certain time (ms)
50% 5
66% 5
75% 7
80% 7
90% 30024
95% 30024
98% 30024
99% 30024
100% 30024 (longest request)
I don't believe Jetty is this bad. Hopefully, it's just a configuration issue. This is the setting for my Camel consumer URI:
"jetty:http://0.0.0.0:8899/pim?replyTimeout=70000&autoAck=false"
I am using akka 2.3.12 and camel-jetty 2.15.2.
Jetty is certainly not that bad; it should be able to handle tens of thousands of connections with many thousands of TPS.
This is hard to diagnose from what you have said, other than that Jetty does not send 503s when it is under load... unless perhaps the Denial of Service protection filter is deployed? (ab would look like a DoS attack, which it basically is, and it is not a great load generator for benchmarking.)
So you need to track down who/what is sending that 503 and why.
It was my bad code: the sender (client) info was overwritten by overlapping requests. The 503 errors were sent because of a Jetty continuation timeout. A simplified illustration of the fix is below.
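In simplified form, the fix was to stop keeping the reply target in state shared by overlapping requests (the code below is an illustration of the pattern, not the actual application code):

import java.util.function.BiConsumer;
import java.util.function.Consumer;

class RequestHandler {

    void onRequest(Object sender, String body, BiConsumer<Object, String> reply) {
        // 'sender' stays an effectively-final local reference, so a second, overlapping
        // request cannot overwrite it before the asynchronous reply is sent
        processAsync(body, result -> reply.accept(sender, result));
    }

    private void processAsync(String body, Consumer<String> onDone) {
        // stand-in for the real asynchronous processing
        onDone.accept("processed: " + body);
    }
}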

Load testing IronMQ

I'm doing some load testing of IronMQ, sending 500 messages and afterwards consuming them.
So far I'm able to send about 16 messages per second and consume (read/delete) about 5 messages per second, using ironAWSEUWest from my local machine.
I use the v0.0.18 Java client SDK.
output :
[l-1) thread #0 - dataset://foo] dataset://foo?produceDelay=5 INFO Sent: 100 messages so far. Last group took: 6066 millis which is: 16,485 messages per second. average: 16,485
[l-1) thread #0 - dataset://foo] dataset://foo?produceDelay=5 INFO Sent: 200 messages so far. Last group took: 6504 millis which is: 15,375 messages per second. average: 15,911
[l-1) thread #0 - dataset://foo] dataset://foo?produceDelay=5 INFO Sent: 300 messages so far. Last group took: 6560 millis which is: 15,244 messages per second. average: 15,682
[thread #1 - ironmq://testqueue] dataset://foo?produceDelay=5 INFO Received: 100 messages so far. Last group took: 17128 millis which is: 5,838 messages per second. average: 5,838
[l-1) thread #0 - dataset://foo] dataset://foo?produceDelay=5 INFO Sent: 400 messages so far. Last group took: 6415 millis which is: 15,588 messages per second. average: 15,659
[l-1) thread #0 - dataset://foo] dataset://foo?produceDelay=5 INFO Sent: 500 messages so far. Last group took: 7089 millis which is: 14,106 messages per second. average: 15,321
[thread #1 - ironmq://testqueue] dataset://foo?produceDelay=5 INFO Received: 200 messages so far. Last group took: 17957 millis which is: 5,569 messages per second. average: 5,7
[thread #1 - ironmq://testqueue] dataset://foo?produceDelay=5 INFO Received: 300 messages so far. Last group took: 18281 millis which is: 5,47 messages per second. average: 5,622
[thread #1 - ironmq://testqueue] dataset://foo?produceDelay=5 INFO Received: 400 messages so far. Last group took: 18206 millis which is: 5,493 messages per second. average: 5,589
[thread #1 - ironmq://testqueue] dataset://foo?produceDelay=5 INFO Received: 500 messages so far. Last group took: 18136 millis which is: 5,514 messages per second. average: 5,574
Is this the expected throughput?
When I turn the load up to 1000 messages, I get sporadic errors when reading a batch of 100 messages at a time and then deleting them one at a time.
[thread #1 - ironmq://testqueue] IronMQConsumer WARN Error occurred during delete of object with messageid : 6033017857819101120. This exception is ignored.. Exchange[Message: <hello>229]. Caused by: [io.iron.ironmq.HTTPException - Message not found]
io.iron.ironmq.HTTPException: Message not found
at io.iron.ironmq.Client.singleRequest(Client.java:194)[ironmq-0.0.18.jar:]
at io.iron.ironmq.Client.request(Client.java:132)[ironmq-0.0.18.jar:]
at io.iron.ironmq.Client.delete(Client.java:105)[ironmq-0.0.18.jar:]
at io.iron.ironmq.Queue.deleteMessage(Queue.java:141)[ironmq-0.0.18.jar:]
It seems that the delete method can fail under load.
The test is part of a Camel component for IronMQ that can be found here: https://github.com/pax95/camel-ironmq
The load test is here: https://github.com/pax95/camel-ironmq/blob/master/src/test/java/org/apache/camel/component/ironmq/integrationtest/LoadTest.java
Network latency will have a lot to do with the message rates you can achieve. From outside an AWS DC, you'll generally see an additional 50-75 ms per operation. If you use concurrent threads, you'll get greater throughput. Also, our public clusters sometimes slow down under load, which is why our "Production" plan customers move to Pro clusters that are much faster.
That said, we've got a very big update coming to all our clusters that will increase performance and throughput significantly. You can actually download an installable version here: http://www.iron.io/mq-enterprise.
Chad
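For illustration, reading with several concurrent threads looks roughly like the sketch below (the client calls follow the 0.0.x Java client seen in the stack trace above; the project id, token, and thread count are placeholders):

import io.iron.ironmq.Client;
import io.iron.ironmq.Cloud;
import io.iron.ironmq.Message;
import io.iron.ironmq.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentIronMqReader {
    public static void main(String[] args) {
        Client client = new Client("PROJECT_ID", "TOKEN", Cloud.ironAWSEUWest);
        ExecutorService pool = Executors.newFixedThreadPool(4);  // 4 concurrent readers
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                Queue queue = client.queue("testqueue");
                try {
                    while (true) {
                        Message msg = queue.get();          // reserve one message
                        // ... process msg.getBody() ...
                        queue.deleteMessage(msg.getId());   // acknowledge by deleting it
                    }
                } catch (Exception e) {
                    // an empty queue or a transient HTTP error ends this worker in the sketch
                }
            });
        }
        pool.shutdown();
    }
}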
