Kafka poll duration with slow network - java

I have a very silly question, but I cannot find a clear answer in the Kafka documentation.
Imagine this situation:
my network is very slow, around 1 Mb/s, and I have a Kafka topic with 10 messages, each about 500 kb in size.
If I call consumer.poll() with a timeout of 500 milliseconds, does that mean that, given the network speed, I would get only about 2 records in that iteration?
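For context, this is roughly the poll call I have in mind (the broker address, group id, topic name and deserializers are just placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SlowNetworkPollExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "my-group");                  // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic

            // poll() returns whatever has been fetched within the 500 ms timeout;
            // on a slow link that may be far fewer records than max.poll.records allows
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d, value size=%d bytes%n",
                        record.offset(), record.serializedValueSize());
            }
        }
    }
}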
Regards.

Related

Performance Issue: Latency Spike happens sometimes in Kafka Streams

I am doing performance testing on Kafka Streams. I created a simple Streams API application with a Transformer.
// Stream data from input topic
builder.stream(Serdes.String(), Serdes.String(), inTopic)
// convert csv data to avro
.transformValues(new TransformSupplier())
// post converted data to output topic
.to(Serdes.String(), Serdes.ByteArray(), outTopic);
I am using inTopic with 10 partitions and outTopic with 1 partition. The latency is generally good, around ~4-6 ms, but sometimes there is a sudden spike and it reaches up to ~60-1000 ms. After a few seconds the latency gradually drops back down to ~4-6 ms. This brings the average latency for my whole experiment to ~67 ms.
What could be the reason for the sudden spike? Please suggest some performance tuning parameters, if any.
Note: I have provided only the default StreamsConfig.
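In case it clarifies the setup, the topology is wired up roughly as below with an explicit (but default-valued) StreamsConfig; the application id and bootstrap servers are placeholders, and TransformSupplier is my custom ValueTransformerSupplier:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class CsvToAvroStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        // only the mandatory settings; everything else is left at its default
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "csv-to-avro");       // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream(Serdes.String(), Serdes.String(), "inTopic")
               // TransformSupplier converts the csv value to Avro bytes
               .transformValues(new TransformSupplier())
               .to(Serdes.String(), Serdes.ByteArray(), "outTopic");

        KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(props));
        streams.start();
    }
}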
After a certain number of messages have been produced, the data is flushed to disk; that flush may be causing the spikes you observed.
Please refer to the "log.flush.interval.messages" setting in the Kafka broker configuration: Link
In production I do not recommend changing this property. Instead, change your system configuration:
/proc/sys/vm/dirty_background_ratio
/proc/sys/vm/dirty_ratio
to make the flushing of your messages more efficient.

What is "Consumer utilisation" in RabbitMQ, and how does it relate to prefetch count?

What is Consumer Utilisation (shown in the RabbitMQ management console)? Is it the percentage of RabbitMQ's capacity that the consumers use by consuming messages, or have I misunderstood it? If so, how do I make the consumers utilise RabbitMQ at 100%? I couldn't see any increase in the percentage when I added consumers, only for a fraction of a second at the moment a new consumer started, and I could not make much of the short explanation provided in the tool-tip.
Besides, the tool-tip says the prefetch count somehow influences consumer utilisation, so is there some kind of formula to pick the numbers, e.g.
this many consumers = this prefetch count
(or)
time taken by a consumer to process a message = this prefetch count
The definition of consumer utilisation is the proportion of time that a queue's consumers could take new messages.
Increasing the prefetch limit should increase consumer utilisation.
See here for more info.
The page linked above also contains the author's observations on the correlation between prefetch limit and consumer utilisation.
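For illustration, with the RabbitMQ Java client the prefetch limit is set per channel with basicQos; a minimal sketch (the host, queue name and prefetch value are placeholders):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class PrefetchExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // placeholder host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // allow up to 100 unacknowledged messages in flight to this consumer;
        // a larger value generally keeps the consumer busier (higher utilisation)
        channel.basicQos(100);

        DeliverCallback callback = (consumerTag, delivery) -> {
            // ... process the message ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("my-queue", false, callback, consumerTag -> { });
    }
}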

Benchmarking Kafka - mediocre performance

I'm benchmarking Kafka 0.8.1.1 by streaming 1 KB messages on EC2 servers.
I installed ZooKeeper on two m3.xlarge servers with the following configuration:
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.server1=zoo1:2888:3888
server.server2=zoo2:2888:3888
Next I installed a single Kafka server on an i2.2xlarge machine with 32 GB RAM and 6 additional SSD drives, each mounted as /mnt/a, /mnt/b, etc. On that server I have one broker and a single topic on port 9092, with 8 partitions and replication factor 1:
broker.id=1
port=9092
num.network.threads=4
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/mnt/a/dfs-data/kafka-logs,/mnt/b/dfs-data/kafka-logs,/mnt/c/dfs-data/kafka-logs,/mnt/d/dfs-data/kafka-logs,/mnt/e/dfs-data/kafka-logs,/mnt/f/dfs-data/kafka-logs
num.partitions=8
log.retention.hours=168
log.segment.bytes=536870912
log.cleanup.interval.mins=1
zookeeper.connect=172.31.26.252:2181,172.31.26.253:2181
zookeeper.connection.timeout.ms=1000000
kafka.metrics.polling.interval.secs=5
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.dir=/tmp/kafka_metrics
kafka.csv.metrics.reporter.enabled=false
replica.lag.max.messages=10000000
All my tests are done from another instance and latency between instances is less than 1 ms.
I wrote a producer/consumer Java client with one producer thread and 8 consumer threads, where the partition key is a random number from 0 to 7.
I serialize each message to JSON using a custom encoder.
My producer/consumer properties are the following:
metadata.broker.list = 172.31.47.136:9092
topic = mytopic
group.id = mytestgroup
zookeeper.connect = 172.31.26.252:2181,172.31.26.253:2181
serializer.class = com.vanilla.kafka.JsonEncoder
key.serializer.class = kafka.serializer.StringEncoder
producer.type=async
queue.enqueue.timeout.ms = -1
batch.num.messages=200
compression.codec=0
zookeeper.session.timeout.ms=400
zookeeper.sync.time.ms=200
auto.commit.interval.ms=1000
number.messages = 100000
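For completeness, the producer side is wired up roughly like the sketch below using the old 0.8 producer API (here a plain string payload and StringEncoder stand in for my ~1 KB JSON objects and the custom com.vanilla.kafka.JsonEncoder):

import java.util.Properties;
import java.util.Random;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class BenchmarkProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "172.31.47.136:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // JsonEncoder in the real test
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");
        props.put("batch.num.messages", "200");
        props.put("compression.codec", "0");

        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        Random random = new Random();
        String payload = new String(new char[1024]).replace('\0', 'x'); // ~1 KB dummy payload

        // one producer thread, partition key is a random number from 0 to 7
        for (int i = 0; i < 100_000; i++) {
            String key = Integer.toString(random.nextInt(8));
            producer.send(new KeyedMessage<>("mytopic", key, payload));
        }
        producer.close();
    }
}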
Now when I send 100k messages, I get a throughput of about 10k messages per second and about 1 ms latency.
That means I'm moving 10 megabytes per second, which equals 80 Mb/s. This is not bad, but I would expect better performance from instances located in the same zone.
Am I missing something in the configuration?
I suggest you break the problem down. How fast is it without JSON encoding? How fast is one node without replication vs. with replication? Build a picture of how fast each component should be.
I also suggest you test bare-metal machines to see how they compare, as they can be significantly faster (unless you are CPU bound, in which case they can be much the same).
According to this benchmark you should be able to get 50 MB/s from one node: http://kafka.apache.org/07/performance.html
I would expect you should be able to get close to saturating your 1 Gb links (I assume that's what you have).
Disclaimer: I work on Chronicle Queue, which is quite a bit faster: http://java.dzone.com/articles/kafra-benchmark-chronicle
If it makes sense for your application, you could get better performance by streaming byte arrays instead of JSON objects, and convert the byte arrays to JSON objects on the last step of your pipeline.
You might also get better performance if each consumer thread consistently reads from the same topic partition. I think Kafka only allows one consumer to read from a partition at a time, so depending on how you're randomly selecting partitions, it's possible that a consumer would be briefly blocked if it's trying to read from the same partition as another consumer thread.
It's also possible you might be able to get better performance using fewer consumer threads or different Kafka batch sizes. I use parameterized JUnit tests to help find the best settings for things like the number of threads per consumer and batch sizes. Here are some examples I wrote which illustrate that concept:
http://www.bigendiandata.com/2016-10-02-Junit-Examples-for-Kafka/
https://github.com/iandow/kafka_junit_tests
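As a rough illustration of the idea (not taken from those links; the parameter values and benchmark harness are made up), a parameterized JUnit 4 test might look like this:

import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class ThroughputTest {

    @Parameters(name = "threads={0}, batch={1}")
    public static Collection<Object[]> settings() {
        // candidate combinations of consumer threads and batch.num.messages
        return Arrays.asList(new Object[][] {
                {1, 100}, {4, 200}, {8, 200}, {8, 1000}
        });
    }

    private final int consumerThreads;
    private final int batchSize;

    public ThroughputTest(int consumerThreads, int batchSize) {
        this.consumerThreads = consumerThreads;
        this.batchSize = batchSize;
    }

    @Test
    public void measureThroughput() {
        // runBenchmark(...) stands in for your own producer/consumer harness and
        // should return the measured messages per second for this combination
        double messagesPerSecond = runBenchmark(consumerThreads, batchSize);
        System.out.printf("threads=%d batch=%d -> %.0f msg/s%n", consumerThreads, batchSize, messagesPerSecond);
        assertTrue(messagesPerSecond > 0);
    }

    private double runBenchmark(int threads, int batch) {
        return 10_000; // placeholder so the test compiles; replace with the real benchmark
    }
}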
I hope that helps.

Scalable and High performance message channel

I am developing agents to collect data from different sources; the data should be posted to a channel at high frequency (say, every 15 seconds). REST is definitely not a solution. The requirement is clearly fire-and-forget, as no status reply is needed.
Throughput is more important; message drops of up to 5% are acceptable.
Possible solutions I have come across are:
Message Bus
Multicast
UDP
Please suggest any alternatives.
IMHO high frequency means too fast to see, and 15 seconds you can easily see. It takes about 0.5 seconds to send a message round the world and back again. You can just about see 15 milliseconds. And if you are talking about 15 microseconds, that is definitely high frequency. I have a persisted messaging solution with a latency of around 0.1 microseconds, which is 0.0000001 seconds, but I don't suggest you need that.
If all you need is a message every 15 seconds, I would use the simplest solution that comes to mind. I would try ActiveMQ, which I found to be one of the simplest to get working. You should be able to achieve message rates of up to 20,000 per second with decent latencies of about 0.01 seconds, and you shouldn't lose any messages.
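For illustration, a minimal fire-and-forget ActiveMQ producer using the standard JMS API might look like the sketch below; the broker URL, queue name and payload are placeholders:

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FireAndForgetProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker URL
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("agent.data");                // placeholder queue name
        MessageProducer producer = session.createProducer(queue);

        // NON_PERSISTENT and no reply expected = fire and forget;
        // occasional message loss is acceptable per the requirements
        producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

        TextMessage message = session.createTextMessage("{\"sensor\":42,\"value\":3.14}");
        producer.send(message);

        connection.close();
    }
}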

Rabbit MQ: Improve queue flushing speed

I have a durable queue which holds persistent messages. The messages arrive into the queue at a rate of about 10 messages per second.
The client is unable to fetch those messages at that rate. As a result the queue on the server keeps growing.
Each message is less than 1 KB and I have a healthy 2 Mbps line between the server and my machine. Using a network monitoring utility, I found that it is hardly using any of that bandwidth.
The client is doing nothing with the messages as of now, just printing them to console so processing time on client is almost 0.
Some other details:
I am using a java client.
I have set the client to prefetch 10000 messages. (also tried with default values)
The round trip time is about 350 ms.
Messages are individually acknowledged.
The available resources are being underutilised, and 10 messages per second is hardly any load in my opinion. How do I speed things up so that the messages held up in the queue are transferred to the client faster, possibly using some sort of batching?
If you are individually acknowledging messages, with a 350 ms round trip I would expect the consumer to achieve about 1/0.35, or roughly 2.9 messages per second. However, the protocol might not be that efficient: it may need two round trips to the server to acknowledge a message and get the next one, i.e. 1.4 messages per second may be more realistic.
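If batching the acknowledgements (as the question suggests) is an option, the Java client can acknowledge everything up to a given delivery tag in one call, which cuts the number of round trips; a rough sketch (the host, queue name and batch size are placeholders, and a real consumer would also acknowledge any remainder periodically):

import java.util.concurrent.atomic.AtomicInteger;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class BatchAckConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // placeholder host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.basicQos(10000);                            // keep plenty of messages in flight

        AtomicInteger sinceLastAck = new AtomicInteger();

        DeliverCallback callback = (consumerTag, delivery) -> {
            System.out.println(new String(delivery.getBody(), "UTF-8"));

            // acknowledge every 100th message; multiple=true acknowledges everything
            // up to and including this delivery tag in a single round trip
            if (sinceLastAck.incrementAndGet() >= 100) {
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), true);
                sinceLastAck.set(0);
            }
        };
        channel.basicConsume("my-queue", false, callback, consumerTag -> { });
    }
}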
A round trip of 350 ms is very high; you can go around the world and back again in that time (e.g. London -> New York -> Tokyo -> London), so a simple solution may not work best for you.
I would try having a broker local to your client instead. This way the round trip is between your client and your local broker.
