Kafka PointToPoint - java

The Problem
We have a multi-datacenter ActiveMQ setup (version 5.7), with NFS for each HA pair. ActiveMQ doesn't seem to scale well, and it doesn't cope well with NFS issues.
The Possible Solution
Move to Kafka
Requirements
We need PointToPoint & pub/sub functionality
Message Priorities (I know Kafka doesn't provide that out of the box, but we have a workaround for it on our side)
Question
Is this possible with Kafka (not necessarily out-of-the-box, but with some client tweaking)? If not, then what other technology would you suggest? It doesn't have to be JMS, but it needs to be scalable and reliable (and it needs to play well with NFS)

We need PointToPoint & pub/sub functionality
Kafka supports both; I shared my findings here.
Message Priorities
I'm not sure exactly what you mean, but if by priorities you mean consuming from a specific offset, then the low-level (Simple) consumer API provides that. It also supports re-submission of messages.

For point-to-point delivery of messages (single producer, single consumer), you can configure the number of partitions as 1. The producer publishes messages to this topic and partition, and a single consumer reads from that topic and partition. A topic equates to your queue in ActiveMQ terms.
If you want to add message priorities, I would use the lower-level Kafka client. You could then increase the number of partitions (one partition per priority level) and have the consumer fetch from the highest-priority partition first; if no message exists there, it would fetch from the next lower-priority partition.
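The "highest priority first" polling described above can be sketched without any Kafka dependency. This is only an illustration of the fetch order: the class `PriorityPoller` and its methods are hypothetical names, and each in-memory queue stands in for one Kafka partition (or topic) per priority level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

public class PriorityPoller {
    // Index 0 is the highest priority. Each queue is an in-memory stand-in
    // for one Kafka partition (or topic); no Kafka client is used here.
    private final List<Queue<String>> levels = new ArrayList<>();

    public PriorityPoller(int priorityLevels) {
        for (int i = 0; i < priorityLevels; i++) {
            levels.add(new ArrayDeque<>());
        }
    }

    // Append a message to the queue for the given priority level.
    public void publish(int priority, String message) {
        levels.get(priority).add(message);
    }

    // Scan from the highest priority down and return the first message found.
    public Optional<String> pollNext() {
        for (Queue<String> level : levels) {
            String message = level.poll();
            if (message != null) {
                return Optional.of(message);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PriorityPoller poller = new PriorityPoller(3);
        poller.publish(2, "low-1");
        poller.publish(0, "high-1");
        poller.publish(1, "mid-1");
        poller.publish(0, "high-2");

        Optional<String> next;
        while ((next = poller.pollNext()).isPresent()) {
            System.out.println(next.get());
        }
    }
}
```

With a real consumer the same loop would check the high-priority source on every iteration, so a steady stream of high-priority messages can starve the lower levels. Whether that is acceptable depends on your requirements.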

In your case I would use Kafka with 1 partition per topic with separate topics for each message priority level, solving prioritization of delivery on the subscriber side.

Kafka provides both the point-to-point and the publish-subscribe usage patterns through its consumer group concept.
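As a rough sketch of how the group setting selects the pattern: consumers that share a `group.id` split the messages between them (queue-like), while a consumer in a different group receives its own copy of every message (pub/sub). The configuration keys below are standard Kafka consumer settings; the broker address and the group names are placeholders chosen for illustration.

```java
import java.util.Properties;

public class ConsumerGroupConfig {
    // Builds consumer settings for the given group.
    // "localhost:9092" is a placeholder broker address.
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Point-to-point: both workers share one group, so each message
        // is handled by only one of them.
        Properties workerA = consumerProps("order-workers");
        Properties workerB = consumerProps("order-workers");

        // Pub/sub: a consumer in a different group gets its own copy.
        Properties audit = consumerProps("audit-service");

        System.out.println("workerA group: " + workerA.getProperty("group.id"));
        System.out.println("workerB group: " + workerB.getProperty("group.id"));
        System.out.println("audit group:   " + audit.getProperty("group.id"));
    }
}
```

These properties would be passed to a `KafkaConsumer` constructor. Within one group each partition is assigned to a single consumer, so the number of partitions caps how many group members can work in parallel.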
What is not directly supported is selectors and priorities. But you can distribute messages across partitions, so you could, for example, assign messages to partitions based on their priority.
You also get message persistence for free (it is one of the core principles), limited by the retention policy. Each message in Kafka is essentially a key-value pair. The key has specific semantics for partitioning and log compaction. Unlike traditional messaging systems, there is no custom header that you can use for routing and the like. The following article tries to summarize that.

Related

How many operating system resources is needed for one Java Kafka Consumer?

I want to use hundreds of thousands of KafkaConsumer instances. For example, I need 100_000 consumers for some architectural pattern. Is that OK? Or should I refactor my system and use a few consumers for the whole system (for example, 10 consumers instead of 100_000)?
So, my questions are:
Is there a connection pool in KafkaConsumer, or does each consumer create its own connection to the Kafka brokers?
Is there a thread pool in KafkaConsumer, or does each consumer create its own thread (I hope it does not)?
What is average memory consumption per KafkaConsumer?
What do you think about such architectural pattern?
1, 2) Consumers request metadata from a broker and then fetch from the brokers that lead their assigned partitions. Each consumer is able to handle all I/O from a single thread, because the Java clients are designed around an event loop driven by poll(). You can also build multi-threaded consumers, but then you need to take care of offset management. Refer to Confluent's documentation for more details on the implementation of the Java clients.
3) According to Apache Kafka and Confluent Enterprise Reference architecture,
Consumers use at least 2MB per consumer and up to 64MB in cases of
large responses from brokers (typical for bursty traffic)
4) The number of consumers you've mentioned is huge so you'd need a very good reason to go for 100,000 consumers. It depends on the scenario though, but even Netflix should be using a lot less than that.
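To put the quoted per-consumer figures next to the number in the question, here is a back-of-the-envelope calculation. It simply multiplies the 2MB and 64MB bounds by 100,000 consumers; the class and method names are hypothetical, and real usage depends on configuration and traffic.

```java
public class ConsumerMemoryEstimate {
    // Total memory in megabytes for a given number of consumers.
    static long totalMegabytes(long consumers, long mbPerConsumer) {
        return consumers * mbPerConsumer;
    }

    public static void main(String[] args) {
        long consumers = 100_000;
        long low = totalMegabytes(consumers, 2);   // quoted minimum per consumer
        long high = totalMegabytes(consumers, 64); // quoted upper bound per consumer

        // Divide by 1024 to express the totals in gigabytes.
        System.out.printf("minimum: %d MB (about %.0f GB)%n", low, low / 1024.0);
        System.out.printf("bursty:  %d MB (about %.0f GB)%n", high, high / 1024.0);
    }
}
```

Even at the lower bound this is roughly 200,000 MB in total, which is one more reason to question the design.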

In Kafka, Is it possible to have replication for selective partitions from one topic?

As I understand it, we can configure replication at the topic level, where all partitions of a topic are replicated across the cluster. But can we replicate only selected partitions of a topic?
No. From the doc
Kafka replicates the log for each topic's partitions across a
configurable number of servers (you can set this replication factor on
a topic-by-topic basis).
If you think about the design of Kafka, what are Topics and Partitions, it makes sense:
Topics are streams or categories of messages, so they are a way to organise your messages based on their type and content.
Partitions are basically a concept for distributing load smoothly. They are based on a key, but the original idea is to split your messages evenly across all partitions of a topic. They are not really designed to organise messages for business use, but to satisfy architectural constraints.
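A minimal sketch of key-based partition selection, to show why partitions spread load rather than carry business meaning. Note the assumption: `String.hashCode()` is used here as a stand-in, whereas Kafka's default partitioner applies a different hash (murmur2) to the serialized key bytes. The class name and sample keys are hypothetical.

```java
public class KeyPartitioning {
    // Maps a key to a partition index in [0, numPartitions).
    // String.hashCode() is a stand-in for Kafka's actual hash function.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 6;
        String[] keys = {"customer-1", "customer-2", "customer-3", "customer-1"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

The same key always lands on the same partition, which preserves per-key ordering, while different keys scatter across all partitions. Replication settings apply to the topic as a whole, whichever partition a key maps to.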

Is it ok to use multiple session and connection on JMS (ActiveMQ)?

I must handle about 100 JMS Queue in a point-to-point messaging architecture. Every queue has a consumer. So I will have 100 consumer threads to handle them. Is it ok?
1) ActiveMQ supports this (I suggest using a connection pool).
2) You should confirm that your server configuration can cope when QPS is high.
Instead of 100 queues, you could use a single queue and provide JMS message properties, having each consumer filter just the messages it wants.
What this does is give you some more options in architecture and deployment. You could have a single process consume multiple type of messages. Depending on your scaling issues, you could have multiple instances of a single consumer spread out among processes/servers/whatever.
You could also have one consumer for all 100 logical queues, reading the property and figuring out where to hand off the message internally, again, depending on whatever design issues you're running into.
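The single-consumer variant can be sketched as a small dispatcher: one receive loop reads a property from each message and hands the body to the handler registered for that logical queue. This is plain Java without a JMS dependency; `PropertyDispatcher`, the property values, and the handlers are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class PropertyDispatcher {
    // One handler per logical queue, keyed by the message property value.
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    public void register(String logicalQueue, Consumer<String> handler) {
        handlers.put(logicalQueue, handler);
    }

    // Routes the body to the matching handler.
    // Returns false when no handler is registered for the property value.
    public boolean dispatch(String logicalQueue, String body) {
        Consumer<String> handler = handlers.get(logicalQueue);
        if (handler == null) {
            return false;
        }
        handler.accept(body);
        return true;
    }

    public static void main(String[] args) {
        PropertyDispatcher dispatcher = new PropertyDispatcher();
        dispatcher.register("billing", body -> System.out.println("billing got: " + body));
        dispatcher.register("shipping", body -> System.out.println("shipping got: " + body));

        // In a real consumer these values would come from a message property
        // and the message body.
        dispatcher.dispatch("billing", "invoice 17");
        dispatcher.dispatch("shipping", "parcel 42");
        boolean handled = dispatcher.dispatch("unknown", "orphan");
        System.out.println("unknown handled: " + handled);
    }
}
```

Handlers could just as well hand the message off to a thread pool if one logical queue is much slower than the others.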
Overall, messaging is so light-weight that it takes a significant volume of messages or a significant size of individual messages to really hurt things. I've got an ActiveMQ app that upon restart might have to process 10K/20K messages and it's complete in seconds. Fairly small messages, but still very possible (and my experience with other MQs is similar performance, as long as your processing is not too overwhelmingly difficult, you should be able to keep up).

Messaging Brokers and Buffers

Sorry for the newbie-ish question, but are messaging brokers such as RabbitMQ a replacement for writing our own message buffers? Meaning, if we have fast producers and slow consumers, does using a messaging broker take care of the queueing, or do I still have to implement my own queue buffer?
Your scenario does indeed suggest using a message broker such as RabbitMQ. This is a primary reason these queueing solutions exist. If you're expecting a huge bottleneck, with more producers than you can afford consumers, you may have to configure your queues in a more sophisticated manner. But you shouldn't have to implement your own buffer.

Module clustering and JMS

I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer in one queue and a consumer in a different queue.
I now need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently, when my app starts, it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach an actual code sample if relevant and necessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created and each message in the queue will arrive to N subscribers.
What I'd like to do is have only one of them (let's currently say that which one is not important) process that message and so in actuality enable me to process N messages at a time.
How can/should this be done? Am I way off the track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
A classic case of message handling in clustered environment. This is what I would do.
Use a broadcast message (channel-based) in place of a queue. A queue, being meant for point-to-point communication, is not very effective here. Set the validity of the message so that it lasts only until one of the consumers has consumed it. This way, the other consumers won't even see the message, and only one consumer will process it.
Take a look at JGroups. You may consider implementing your module/subscribers to use jgroups for the kind of synchronization you need. JGroups provide Reliable Multicast Communication.
