I use the Spring framework with a Kafka cluster of 3 brokers. I found out that the consumer did not consume some messages (say 0.01 percent of all sent messages), so in the producer code I log the message offset returned by the API:
ListenableFuture<SendResult<String, Message>> future = messageTemplate.sendDefault(id, message);
SendResult<String, Message> sendResult = future.get();
long offset = sendResult.getRecordMetadata().offset();
I use the returned offset to query the Kafka topic across all partitions, but the message is not found (I tested other offsets belonging to messages that consumers did process, and those are in Kafka). What is the problem, and how can I ensure that a message was actually sent to Kafka?
I also call messageTemplate.flush(); in the producer.
I found out that when the broker that leads a partition goes down, Kafka elects another broker as the leader of that partition, and if the acks config is not set to all there is a chance of losing some data during this failover. So change the producer config to
acks=all
There is also a chance of losing data if the number of in-sync replicas drops below 2, so set min.insync.replicas to at least 2:
min.insync.replicas = 2
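For example, with Spring Kafka the producer side could be configured roughly like this (a minimal sketch; the broker list is a placeholder, Message is the question's own value type, and min.insync.replicas itself is a topic/broker-level setting rather than a producer property):

// acks=all makes the leader wait for all in-sync replicas before acknowledging the send,
// so together with min.insync.replicas=2 an acknowledged message survives a single broker failure.
Map<String, Object> producerProps = new HashMap<>();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092"); // placeholder
producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
producerProps.put(ProducerConfig.RETRIES_CONFIG, 10); // retry transient errors such as a leader change
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class); // assuming JSON serialization for Message

KafkaTemplate<String, Message> messageTemplate =
        new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(producerProps));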
Related
I have a single-node Kafka broker and a simple streams application. I created 2 topics (topic1 and topic2).
Produced on topic1 - processed message - write to topic2
Note: For each message produced only one message is written to destination topic
I produced a single message. After it was written to topic2, I stopped the Kafka broker. After some time I restarted the broker and produced another message on topic1. Now the streams app processed that message 3 times. Then, without stopping the broker, I produced messages to topic1 and waited for the streams app to write to topic2 before producing again.
The streams app is behaving strangely. Sometimes one produced message results in 2 messages written to the destination topic, and sometimes 3. I don't understand why this is happening; even the messages produced after the broker restart are being duplicated.
Update 1:
I am using Kafka version 1.0.0 and Kafka-Streams version 1.1.0
Below is the code.
Main.java
String credentials = env.get("CREDENTIALS");
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "activity-collection");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.RECONNECT_BACKOFF_MS_CONFIG, 100000);
props.put(StreamsConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, 200000);
props.put(StreamsConfig.REQUEST_TIMEOUT_MS_CONFIG, 60000);
props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 60000);
props.put(StreamsConfig.producerPrefix(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG), true);
props.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all");
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> activityStream = builder.stream("activity_contenturl");
KStream<String, String> activityResultStream = AppUtil.hitContentUrls(credentials , activityStream);
activityResultStream.to("o365_user_activity");
AppUtil.java
public static KStream<String, String> hitContentUrls(String credentials, KStream<String, String> activityStream) {
    KStream<String, String> activityResultStream = activityStream
            .flatMapValues(new ValueMapper<String, Iterable<String>>() {
                @Override
                public Iterable<String> apply(String value) {
                    ArrayList<String> log = new ArrayList<String>();
                    JSONObject received = new JSONObject(value);
                    String url = received.get("url").toString();
                    String accessToken = ServiceUtil.getAccessToken(credentials);
                    JSONObject activityLog = ServiceUtil.getActivityLogs(url, accessToken);
                    log.add(activityLog.toString());
                    return log;
                }
            });
    return activityResultStream;
}
Update 2:
In a single-broker, single-partition environment with the above config, I started the Kafka broker and the streams app. I produced 6 messages on the source topic, and when I started a consumer on the destination topic there were 36 messages and counting; they keep on coming.
So I ran this to see consumer-groups:
kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
Output:
streams-collection-app-0
Next I ran this:
kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group streams-collection-app-0
Output:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 1 0 streams-collection-app-0-244b6f55-b6be-40c4-9160-00ea45bba645-StreamThread-1-consumer-3a2940c2-47ab-49a0-ba72-4e49d341daee /127.0.0.1 streams-collection-app-0-244b6f55-b6be-40c4-9160-00ea45bba645-StreamThread-1-consumer
After a while the output showed this:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 6 5 - - -
And then:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 7 6 - - -
It seems you are running into a known limitation. A Kafka topic by default stores messages for at least 7 days, but committed offsets are stored for only 1 day (default config value offsets.retention.minutes = 1440). So if no messages were produced to your source topic for more than 1 day, after an app restart all messages from the topic will be reprocessed again (actually multiple times, depending on the number of restarts: at most once per day for such a topic with rare incoming messages).
You can find a description of committed offset expiration in How does an offset expire for a consumer group.
In Kafka version 2.0 the retention for committed offsets was increased: KIP-186: Increase offsets retention default to 7 days.
To prevent reprocessing, you could add the consumer property auto.offset.reset: latest (in Kafka Streams the default value is earliest).
There is a small risk with latest: if nothing is produced to a source topic for longer than a day and you then restart the app, you could lose some messages (only messages that arrive exactly during the restart).
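If you go that route in Kafka Streams, the reset policy can be passed to the embedded consumer via consumerPrefix, something like this (a sketch against the props object from Main.java above):

// route auto.offset.reset to the Streams-internal consumer
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest");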
I would like to fine-tune the KafkaTemplate options for the producer to handle the various failover and recovery scenarios as optimally as possible.
We have our KafkaProducerMessageHandler running in sync mode (i.e. waiting for the send operation results - see acks below). Note: this is necessary in the current version of Kafka to enable ErrorChannel reporting.
Here are the options I have chosen:
acks = 1 (basic acknowledgement from the Kafka broker leader)
retries = 10
max.in.flight.requests.per.connection = 1 (this keeps the messages in order if an error state is reached)
linger.ms = 1 (not sure about this one or whether it is relevant?)
request.timeout.ms = 5000 (five seconds per attempt; combined with the retries this gives roughly 50 seconds before the message is deemed to have failed and then appears on the error channel)
enable.idempotence = false (again, not sure about this option?)
retry.backoff.ms = 100 (this is the default - again, is it worth playing with?)
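For reference, here is roughly how I am wiring these options into the Spring Kafka producer factory (a sketch only; the bootstrap servers and serializers are placeholders):

Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");            // placeholder
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);   // placeholder
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class); // placeholder
config.put(ProducerConfig.ACKS_CONFIG, "1");
config.put(ProducerConfig.RETRIES_CONFIG, 10);
config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
config.put(ProducerConfig.LINGER_MS_CONFIG, 1);
config.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 5000);
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, false);
config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);

KafkaTemplate<String, String> kafkaTemplate =
        new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(config));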
How do these values sound?
Is there anything I am missing?
This is an old post about Kafka producer tuning: http://ingest.tips/2015/07/19/tips-for-improving-performance-of-kafka-producer/
TLDR version:
Pay attention to the 'batch.size' and 'linger.ms' parameters.
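Both of those are plain producer properties, e.g. (illustrative values only, not a recommendation):

// larger batches plus a small linger give the producer a chance to group records per partition
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024); // bytes per batch, per partition
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);          // wait up to 5 ms for a batch to fill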
I have a RabbitMQ queue and two Spring Cloud Stream consumers.
I want each consumer to process messages in order.
I thought that once consumer1 sends its ack, consumer2 would receive the second message,
so I expected message1 and message2 to be processed in order, one consumer at a time:
-------------------- time pass ------------------------>
consumer1: message1            message3
consumer2:           message2            message4
But that is not what happened: consumer1 and consumer2 received message1 and message2 and processed them simultaneously.
-------------------- time pass ------------------------>
consumer1: message1 message3
consumer2: message2 message4
Is there a way for Spring Cloud Stream to consume messages exclusively?
RabbitMQ (AMQP) doesn't support that; each competing consumer is given up to prefetch messages at a time.
It does support exclusive consumers, but it means consumer1 would get all the messages and consumer2 would only get messages if consumer1 dies.
However, Spring Cloud Stream doesn't currently provide a property to set that option.
You would have to model your queues in a different way, e.g. by having an "incoming" queue with exactly one consumer-coordinator. This coordinator would relay messages to the "work" queue where consumer1 and consumer2 are both waiting and pick up work in a round-robin way.
They would then signal completion to the coordinator on a third queue, which would cause it to resume relaying a single message to the work queue.
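A rough sketch of such a coordinator, using plain Spring AMQP rather than a Spring Cloud Stream binding (the queue names, the single-permit semaphore, and the completion payload are all illustrative assumptions):

import java.util.concurrent.Semaphore;

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

@Component
public class WorkCoordinator {

    private final RabbitTemplate rabbitTemplate;

    // one permit = at most one message in the work queue at a time
    private final Semaphore inFlight = new Semaphore(1);

    public WorkCoordinator(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    @RabbitListener(queues = "incoming") // single listener relays messages in arrival order
    public void relay(String payload) throws InterruptedException {
        inFlight.acquire();                             // wait for the previous message to complete
        rabbitTemplate.convertAndSend("work", payload); // consumer1/consumer2 compete for this queue
    }

    @RabbitListener(queues = "completed") // workers post here when they finish a message
    public void completed(String messageId) {
        inFlight.release();                             // let the next message through
    }
}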
I have a simple Kafka setup. A producer produces messages to a single partition of a single topic at a high rate. A single consumer consumes messages from this partition. During this process, the consumer may pause processing several times; a pause can last a couple of minutes. After the producer stops producing, all queued-up messages are processed by the consumer. It appears that messages produced by the producer are not seen immediately by the consumer. I am using Kafka 0.10.1.0. What can be happening here? Here is the section of code that consumes the messages:
while (true)
{
    try
    {
        ConsumerRecords<String, byte[]> records = consumer.poll(100);
        for (final ConsumerRecord<String, byte[]> record : records)
        {
            serviceThread.submit(() ->
            {
                externalConsumer.accept(record);
            });
        }
        consumer.commitAsync();
    } catch (org.apache.kafka.common.errors.WakeupException e)
    {
    }
}
where consumer is a KafkaConsumer with auto-commit disabled, max.poll.records of 100, and a session timeout of 30000 ms. serviceThread is an ExecutorService.
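For completeness, the consumer is created roughly like this (a sketch; the group id, bootstrap servers and topic name are placeholders, the rest matches the settings just described):

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));            // placeholder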
The producer just involves the KafkaProducer.send call to send a ProducerRecord.
All configurations on the broker are left as kafka defaults.
I am also using kafka-consumer-groups.sh to check what is happening when the consumer is not consuming messages. But when this happens, kafka-consumer-groups.sh also hangs and cannot get any information back. Sometimes it triggers a consumer rebalance, but not always.
For those who may find this helpful: I've encountered this problem (when Kafka supposedly silently stops consuming) often enough, and every single time it wasn't actually a problem with Kafka.
Usually it is some long-running or hung silent process that keeps the consumer from committing the offset, for example a DB client trying to connect to the database. If you wait long enough (e.g. 15 minutes for SQLAlchemy and Postgres), you will see an exception printed to STDOUT saying something like "connection timed out".
I have a requirement that a single JMS message sent by a client must be delivered reliably (exactly-once) to two systems. These 2 systems are not HA-enabled, so the best suggestion that I came up with is to:
create single queue where client posts to
set up two "intermediate" queues
use a custom "DuplicatorMDB" that will read messages from the client queue and post them to two queues within the same transaction.
client->JMSDQ->DuplicatorMDB->Q1->MDB->System1
                            \->Q2->MDB->System2
Is there any existing functionality like that? What would be the proper way to balance the system to keep it stable if one or both of the backend systems are down?
The application server is WebLogic 10.
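Something like this is what I have in mind for the DuplicatorMDB (a rough sketch; it assumes a container-managed XA transaction so the receive from JMSDQ and both sends commit or roll back together, and the connection factory and queue resource names are illustrative):

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
        // the JMSDQ queue itself is mapped to the MDB in the deployment descriptor
})
public class DuplicatorMDB implements MessageListener {

    @Resource(mappedName = "jms/xaConnectionFactory") // illustrative; must be XA-enabled
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/Q1") // illustrative
    private Queue q1;

    @Resource(mappedName = "jms/Q2") // illustrative
    private Queue q2;

    @Override
    public void onMessage(Message message) {
        try {
            Connection connection = connectionFactory.createConnection();
            try {
                // transacted flag/ack mode are ignored inside the container-managed transaction
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(null); // anonymous producer
                producer.send(q1, message); // both sends enlist in the same global transaction
                producer.send(q2, message); // as the receive from JMSDQ
            } finally {
                connection.close();
            }
        } catch (Exception e) {
            // rethrowing rolls back the transaction, so the incoming message is redelivered
            throw new RuntimeException(e);
        }
    }
}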
I can't use topics for this because in a cluster topics will cause too much message duplication. If we have 2 instances, then with topics it'll go like this:
client->Topic-->MDB1#server1->System1
          |  \->MDB2#server1->System2
          \---->MDB1#server2->System1
              \->MDB2#server2->System2
Thus every message will be delivered twice to System1 and twice to System2 and if there'll be 8 servers in a cluster, each message will be delivered 8 times. This is what I'd really like to avoid...
Finally I got some time to test it and here is what I observed:
2 nodes in a cluster. 2 JMS servers: jms1 on node1, jms2 on node2.
Distributed topic dt. MDB with durable subscription and jms-client-id=durableSubscriber. Started the system: 0 messages, mdb#node1 is up, mdb#node2 trying to connect periodically, but it can't because "Client id, durableSubscriber, is in use". As expected.
Sent in 100 messages:
jms1#dt messages current = 0, messages total = 100, consumers current = 1
I can see that node1 processed 100 messages.
jms2#dt messages current = 100, messages total = 100 , consumers current = 1
i.e. "duplicate" messages are pending in the topic.
Sent in another 100 messages, 100 were processed on the node1, 200 pending on node2.
Rebooted node1, mdb#node2 reconnected to dt and started processing "pending" messages. 200 messages were processed on node2.
After node1 is up, mdb#node1 can't connect to the dt, while mdb#node2 is connected.
jms1#dt messages current=0, messages total = 0, consumers current = 0
jms2#dt messages current=0, messages total = 200, consumers current = 1
Sent in 100 more messages; all 100 were processed on node2 and discarded on node1.
jms1#dt messages current=0, messages total = 100, consumers current = 0
jms2#dt messages current=0, messages total = 300, consumers current = 1
Now I reboot node2, mdb#node1 reconnects to dt. After reboot mdb#node2 reconnects to dt and mdb#node1 gets disconnected from dt.
jms1#dt messages current=0, messages total = 100, consumers current = 1
jms2#dt messages current=0, messages total = 0, consumers current = 1
I send in 100 messages, all are processed on node2 and stored in the topic on node1:
jms1#dt messages current=100, messages total = 200, consumers current = 1
jms2#dt messages current=0, messages total = 0, consumers current = 1
Then I shut down node2 and I see 100 "pending messages" being processed on node1 after mdb#node1 reconnects to the topic.
So the result is:
I sent 400 messages, 700 were processed by MDB out of which 300 were duplicates.
It looks like the MDB reconnection works as expected, but messages may be duplicated if the node hosting the "active" MDB goes down.
This might be a bug or a feature of weblogic JMS implementation.
I haven't used WebLogic, but most JMS solutions have the concept of Queues and Topics. You want a JMS Topic. Subscribers register and the topic ensures that the message is delivered to each subscriber once.
Configuration details.
Update: If you are running into issues in a clustered environment, I would make sure that everything is configured properly (here is a guide for JMS Topic Clustering). It definitely sounds strange that WebLogic would fail so miserably when clustering. If that does not work, you could look into a 3rd-party messaging queue, such as RabbitMQ, which supports JMS and will definitely not have this issue.
This is the kind of behaviour that an ESB implementation should enable. In terms of processing overhead there would be no great difference, but it can be useful to have separation of concerns between "plumbing" and application code.
As it happens, the WebSphere JMS implementation has support for installing mediations that address such requirements. I don't know whether WebLogic has something similar, or whether their associated ESB products are an option for you, but I would recommend investigating those capabilities. You currently have a simple requirement, and your code is surely sufficient; however, it's quite easy to imagine a few minor additional requirements creeping in (could we just convert this field from dollars to pounds before we transmit to that destination? could we avoid sending messages with this content to that destination? ...) and lo! you find yourself writing your own ESB.
[...] Thus every message will be delivered twice to System1 and twice to System2 and if there'll be 8 servers in a cluster, each message will be delivered 8 times. This is what I'd really like to avoid...
This is right for non-durable subscriptions, but not for durable. For durable, all MDBs share the same connection-id and subscription-id (based on the MDB name by default), so only one MDB will be able to attach and receive messages at a time. The first MDB to try will succeed in connecting, the others will detect a conflict and fail, but keep retrying. So using a durable topic subscription should do the trick.
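A minimal sketch of such a durable-subscription MDB (EJB 3 annotations; on WebLogic the topic and the client id are typically mapped in weblogic-ejb-jar.xml via destination-jndi-name and jms-client-id, as in your test above):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Topic"),
        @ActivationConfigProperty(propertyName = "subscriptionDurability", propertyValue = "Durable")
})
public class System1Mdb implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // hand the message off to System1 here
    }
}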