How to get all uncommitted messages in Kafka when manually committing offsets - Java

In my application, I am consuming JSON messages from a Kafka topic, and multiple instances of my application are running. I have set the Kafka property: props.put("enable.auto.commit", "false")
When I consume a message, I push it to my DB and then commit it as:
private static void commitMessage(KafkaConsumer<String, String> kafkaConsumer,
                                  ConsumerRecord<String, String> message, String kafkaTopic) {
    long nextOffset = message.offset() + 1;
    TopicPartition topicPartition = new TopicPartition(kafkaTopic, message.partition());
    OffsetAndMetadata offsetAndMetadata = new OffsetAndMetadata(nextOffset);
    Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = new HashMap<>();
    offsetAndMetadataMap.put(topicPartition, offsetAndMetadata);

    log.info("Committing processed Kafka message, topic [" + kafkaTopic + "], partition ["
            + message.partition() + "], next offset [" + nextOffset + "]");
    kafkaConsumer.commitSync(offsetAndMetadataMap);
}
Now it may happen that my application restarts after consuming a message (but before pushing it to the DB). I want to consume the uncommitted messages again from Kafka after the restart. I am able to do this using seek:
private static void seekAllPartitions(KafkaConsumer<String, String> kafkaConsumer, String kafkaTopic) {
    List<PartitionInfo> partitionInfos = kafkaConsumer.partitionsFor(kafkaTopic);
    System.out.println("Size of partition list: " + partitionInfos.size());
    for (PartitionInfo partitionInfo : partitionInfos) {
        TopicPartition topicPartition = new TopicPartition(kafkaTopic, partitionInfo.partition());
        OffsetAndMetadata committedForPartition = kafkaConsumer.committed(topicPartition);
        try {
            if (committedForPartition != null) {
                System.out.println("Seeking offset... " + committedForPartition.offset());
                kafkaConsumer.seek(topicPartition, committedForPartition.offset());
            }
        } catch (Exception ex) {
            // ignore and move on to the next partition
        }
    }
}
The problem is that seek(topicPartition, committedForPartition.offset()) gives me only the last uncommitted message, not the intermediate uncommitted ones. As I mentioned, multiple instances are running, so I may end up with intermediate uncommitted messages. For example: instance A did not commit the 2nd message and instance B did not commit the 5th, but seeking gives me only the 5th message, not the 2nd.

Related

Akka stream stops processing data

When I run the stream below, it does not receive any subsequent data once it has started.
final long HOUR = 3600000;
final long PAST_HOUR = System.currentTimeMillis() - HOUR;

private final static ActorSystem actorSystem = ActorSystem.create(Behaviors.empty(), "as");

protected static ElasticsearchParams constructElasticsearchParams(
        String indexName, String typeName, ApiVersion apiVersion) {
    if (apiVersion == ApiVersion.V5) {
        return ElasticsearchParams.V5(indexName, typeName);
    } else if (apiVersion == ApiVersion.V7) {
        return ElasticsearchParams.V7(indexName);
    } else {
        throw new IllegalArgumentException("API version " + apiVersion + " is not supported");
    }
}

String queryStr = "{ \"bool\": { \"must\" : [{\"range\" : {"
        + "\"timestamp\" : { "
        + "\"gte\" : " + PAST_HOUR
        + " }} }]}} ";

ElasticsearchConnectionSettings connectionSettings =
    ElasticsearchConnectionSettings.create("****")
        .withCredentials("****", "****");

ElasticsearchSourceSettings sourceSettings =
    ElasticsearchSourceSettings.create(connectionSettings)
        .withApiVersion(ApiVersion.V7);

Source<ReadResult<Stats>, NotUsed> dataSource =
    ElasticsearchSource.typed(
        constructElasticsearchParams("data", "_doc", ApiVersion.V7),
        queryStr,
        sourceSettings,
        Stats.class);

dataSource.buffer(10000, OverflowStrategy.backpressure());
dataSource.backpressureTimeout(Duration.ofSeconds(1));

dataSource
    .log("error")
    .runWith(Sink.foreach(a -> System.out.println(a)), actorSystem);
This produces the output:
ReadResult(id=1656107389556,source=Stats(size=0.09471),version=)
Data is continually being written to the data index, but the stream does not process it once it has started. Shouldn't the stream continually process data from the upstream source? In this case, the upstream source is an Elasticsearch index named data.
I've tried amending the query to match all documents:
String queryStr = "{\"match_all\": {}}";
but I get the same result.
The Elasticsearch source does not run continuously. It initiates a search, manages pagination (using the scroll API), and streams results; when Elasticsearch reports no more results, it completes.
You could do something like
Source.repeat(Done).flatMapConcat(done -> ElasticsearchSource.typed(...))
which will run a new search immediately after the previous one finishes. Note that it then becomes the responsibility of the downstream to filter out duplicates.
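A minimal sketch of that pattern, reusing queryStr, sourceSettings and constructElasticsearchParams from the question; the id-based duplicate filter is my own assumption (not part of the Alpakka API), and its seen-set grows without bound:
// Sketch: restart the finite Elasticsearch search each time it completes.
Source<ReadResult<Stats>, NotUsed> repeating =
    Source.repeat(Done.getInstance())
        .flatMapConcat(done ->
            ElasticsearchSource.typed(
                constructElasticsearchParams("data", "_doc", ApiVersion.V7),
                queryStr,
                sourceSettings,
                Stats.class))
        // naive duplicate filter: remember every document id already emitted (unbounded!)
        .statefulMapConcat(() -> {
            Set<String> seen = new HashSet<>();
            return result -> seen.add(result.id())
                ? Collections.singletonList(result)
                : Collections.emptyList();
        });

repeating.runWith(Sink.foreach(System.out::println), actorSystem);
In a real deployment you would more likely track the highest timestamp seen and fold it into the next range query, rather than keep an ever-growing set of ids.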

@RabbitListener: how to set the frequency of receiving messages in the annotation

I am using the @RabbitListener annotation to receive messages from a RabbitMQ queue.
How can I make the threads receive messages no more often than once per second?
@RabbitListener(queues = "message", priority = "3", concurrency = "2")
public void receiveCheck(RequestMessage message) {
}
Your task is a bit strange. Don't you think the problem should be solved differently (maybe you can send messages at a certain rate, e.g. 1 message/sec)?
But if you're sure that's what you need, you could use a primitive solution:
@RabbitListener(queues = "message", priority = "3", concurrency = "2")
public void receiveMessage(String message) throws InterruptedException {
    System.out.println("Received <" + message + ">" + " Message time: " + LocalDateTime.now());
    Thread.sleep(1000);
}
Or with the calculation of the operation time:
@RabbitListener(queues = "message", priority = "3", concurrency = "2")
public void receiveMessageWithTimer(String message) throws InterruptedException {
    long start = System.currentTimeMillis();
    System.out.println("Received <" + message + ">" + " Message time: " + LocalDateTime.now());
    // ... actual message processing goes here ...
    long finish = System.currentTimeMillis();
    long operationTime = finish - start;
    Thread.sleep(Math.max(0, 1000 - operationTime)); // avoid a negative sleep argument
}
But in this case you should remember that the concurrency level is 2, so you will receive 2 messages/sec. To receive only one message per second, set the concurrency level to 1.
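For example, the same listener limited to a single consumer thread:
@RabbitListener(queues = "message", priority = "3", concurrency = "1")
public void receiveMessage(String message) throws InterruptedException {
    // process the message, then pause as in the examples above
    Thread.sleep(1000);
}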

ZeroMQ (cppzmq) subscriber with filters which start with the same string

I'm using two topics in my sample publisher. Both start with the same string. When I filter the messages in a subscriber using only one of the two topics, the subscriber receives both topics.
If I use two topics that do not start with the same string, it works.
My sample publisher (Java):
try (ZContext context = new ZContext()) {
    final ZMQ.Socket socket = context.createSocket(SocketType.PUB);
    socket.bind("tcp://*:5555");
    int i = 0;
    while (!Thread.currentThread().isInterrupted() && !stopped) {
        logger.debug("sending C1 message");
        final String env = "topic";
        final String msg = "Hello, world #" + i++;
        socket.sendMore(env);
        socket.send(msg);

        logger.debug("sending C2 message");
        final String env2 = "topic2";
        final String msg2 = "Hello, world #" + i++;
        socket.sendMore(env2);
        socket.send(msg2);

        try {
            sleep(5000);
        } catch (final InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
And my sample subscriber (C++):
zmq::context_t ctx;
zmq::socket_t sock(ctx, zmq::socket_type::sub);
sock.connect("tcp://127.0.0.1:5555");
std::string filter = "topic";
sock.setsockopt(ZMQ_SUBSCRIBE, filter.c_str(), filter.length());
while (true) {
    zmq::message_t env;
    sock.recv(&env);
    std::string env_str = std::string(static_cast<char*>(env.data()), env.size());
    std::cout << "Received Envelope '" << env_str << "'" << std::endl;
    zmq::message_t msg;
    sock.recv(&msg);
    std::string msg_str = std::string(static_cast<char*>(msg.data()), msg.size());
    std::cout << "Received '" << msg_str << "'" << std::endl;
}
My subscriber should display only the message associated with the topic "topic" and not both.
Statement : "My subscriber should display only the message associated to the topic "topic" and not both."
Not correct, exactly the opposite is true.
Documentation is clear in this. ZeroMQ API explicitly states:
A non-empty option_value shall subscribe to all messages beginning with the specified prefix. Multiple filters may be attached to a single ZMQ_SUB socket, in which case a message shall be accepted if it matches at least one filter.
+-------------------+-------------+
| Option value type | binary data |
+-------------------+-------------+
Example:
a message, dispatched on the PUB-side as:
PUB.send( "123456------------" );
will get .recv()-ed by any of the SUB-side subscriptions below:
SUB.setsockopt( zmq.SUBSCRIBE, "" );    // this one .recv()-es EVERY message
SUB.setsockopt( zmq.SUBSCRIBE, "1" );   // this one .recv()-es "1{0+[*]}"
SUB.setsockopt( zmq.SUBSCRIBE, "12" );  // this one .recv()-es "12{0+[*]}"
SUB.setsockopt( zmq.SUBSCRIBE, "123" ); // this one .recv()-es "123{0+[*]}"
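If you need an exact topic match rather than a prefix match, a common workaround is to check the envelope after receiving. A minimal Java-side sketch (JeroMQ, mirroring the publisher above); the envelope check is application logic, not something ZeroMQ does for you:
try (ZContext context = new ZContext()) {
    ZMQ.Socket sock = context.createSocket(SocketType.SUB);
    sock.connect("tcp://127.0.0.1:5555");
    sock.subscribe("topic".getBytes(ZMQ.CHARSET)); // still a prefix subscription
    while (!Thread.currentThread().isInterrupted()) {
        String env = sock.recvStr();
        String msg = sock.recvStr();
        if (!"topic".equals(env)) {
            continue; // prefix-matched (e.g. "topic2") but not an exact match; skip
        }
        System.out.println("Received '" + msg + "' on '" + env + "'");
    }
}
Alternatively, publish envelopes with a delimiter (e.g. "topic|" and "topic2|") and subscribe to "topic|", so the prefixes can no longer collide.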

Kafka Consumer giving only first produced message

I am new to Kafka and am using the Kafka 0.9.0.0 client for Java. While consuming data from a particular topic, I get the same message every time (the one that was posted first) when I start the producer-consumer Java project.
My requirement is to produce a message, consume it, and check whether the two messages are the same.
Below is the code I am using for the Kafka consumer:
KafkaConsumer<String, String> newConsumer = new KafkaConsumer<String, String>(properties);
newConsumer.subscribe(Collections.singletonList(props.getProperty("monitoring.topic")));
String consumerRecord = "";
ConsumerRecords<String, String> consumerRecords = newConsumer.poll(120000);
for (ConsumerRecord<String, String> record : consumerRecords) {
    logger.info("Found message for {} {} {}", adapter, record.key(), record.value());
    System.out.println("consumerMessage : " + record.value());
    JSONObject jsonConsumerMessage = (JSONObject) (parser.parse(record.value()));
    Long offset = record.offset();
    System.out.println("Offset of this record is " + offset);
    String UUIDProducer = message.get("UUID").toString();
    String UUIDConsumer = jsonConsumerMessage.get("UUID").toString();
    System.out.println("UUIDProducer : " + UUIDProducer);
    System.out.println("UUIDConsumer : " + UUIDConsumer);
    if (UUIDProducer.equals(UUIDConsumer)) {
        return true;
    } else {
        return false;
    }
}
Note: I am able to consume the latest messages through the command line.
Can anyone please guide me on this?
It was my silly mistake: I was returning the true/false value inside the for loop, which causes the method to exit as soon as the first message comes from the topic.
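A minimal sketch of the corrected shape, scanning the whole batch before returning (reusing newConsumer, parser and message from the question; exception handling omitted):
boolean matchFound = false;
ConsumerRecords<String, String> records = newConsumer.poll(120000);
for (ConsumerRecord<String, String> record : records) {
    JSONObject jsonConsumerMessage = (JSONObject) parser.parse(record.value());
    if (message.get("UUID").toString().equals(jsonConsumerMessage.get("UUID").toString())) {
        matchFound = true; // found the produced message; safe to stop scanning
        break;
    }
}
return matchFound;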

AWS SQS Java. Not all messages are retrieved from the SQS queue

I have been trying several approaches to retrieve all messages from an SQS queue using the AWS SDK for Java, to no avail. I have read about the distributed nature of AWS SQS and that messages are stored on different servers. But what I do not understand is why this architecture is not hidden from the end user. What tricks do I have to apply in Java code to retrieve all messages and be 100% sure that none were missed?
I tried this with "long polling":
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (Message message : messages) {
    System.out.println("  Message");
    System.out.println("    MessageId:     " + message.getMessageId());
    System.out.println("    ReceiptHandle: " + message.getReceiptHandle());
    System.out.println("    MD5OfBody:     " + message.getMD5OfBody());
    System.out.println("    Body:          " + message.getBody());
    for (Entry<String, String> entry : message.getAttributes().entrySet()) {
        System.out.println("  Attribute");
        System.out.println("    Name:  " + entry.getKey());
        System.out.println("    Value: " + entry.getValue());
    }
}
System.out.println();
And this with Request Batching / Client-Side Buffering:
// Create the basic Amazon SQS async client
AmazonSQSAsync sqsAsync = new AmazonSQSAsyncClient();

// Create the buffered client
AmazonSQSAsync bufferedSqs = new AmazonSQSBufferedAsyncClient(sqsAsync);

CreateQueueRequest createRequest = new CreateQueueRequest().withQueueName("MyTestQueue");
CreateQueueResult res = bufferedSqs.createQueue(createRequest);

SendMessageRequest request = new SendMessageRequest();
String body = "test message_" + System.currentTimeMillis();
request.setMessageBody(body);
request.setQueueUrl(res.getQueueUrl());

SendMessageResult sendResult = bufferedSqs.sendMessage(request);

ReceiveMessageRequest receiveRq = new ReceiveMessageRequest()
        .withMaxNumberOfMessages(10)
        .withQueueUrl(res.getQueueUrl());
ReceiveMessageResult rx = bufferedSqs.receiveMessage(receiveRq);

List<Message> messages = rx.getMessages();
for (Message message : messages) {
    System.out.println("  Message");
    System.out.println("    MessageId:     " + message.getMessageId());
    System.out.println("    ReceiptHandle: " + message.getReceiptHandle());
    System.out.println("    MD5OfBody:     " + message.getMD5OfBody());
    System.out.println("    Body:          " + message.getBody());
    for (Entry<String, String> entry : message.getAttributes().entrySet()) {
        System.out.println("  Attribute");
        System.out.println("    Name:  " + entry.getKey());
        System.out.println("    Value: " + entry.getValue());
    }
}
But I am still unable to retrieve all messages.
Any ideas? The AWS forum has been silent on my post.
When receiving messages from an SQS queue, you need to repeatedly call sqs:ReceiveMessage.
On each call to sqs:ReceiveMessage, you will get 0 or more messages from the queue, which you'll need to iterate through. For each message you successfully process, you'll also need to call sqs:DeleteMessage to remove it from the queue.
Add a loop around your "Long Polling" sample above to receive all messages.
for (;;) {
    ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
    List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
    for (Message message : messages) {
        System.out.println("  Message");
        System.out.println("    MessageId:     " + message.getMessageId());
        System.out.println("    ReceiptHandle: " + message.getReceiptHandle());
        System.out.println("    MD5OfBody:     " + message.getMD5OfBody());
        System.out.println("    Body:          " + message.getBody());
        for (Entry<String, String> entry : message.getAttributes().entrySet()) {
            System.out.println("  Attribute");
            System.out.println("    Name:  " + entry.getKey());
            System.out.println("    Value: " + entry.getValue());
        }
    }
    System.out.println();
}
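As a sketch of the delete step mentioned above (AWS SDK for Java v1, with sqs and myQueueUrl as in the question), after successfully processing each message you would add, inside the inner loop:
sqs.deleteMessage(new DeleteMessageRequest()
        .withQueueUrl(myQueueUrl)
        .withReceiptHandle(message.getReceiptHandle()));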
Also note that you may receive the same message more than once. So allow your work to "reprocess" the same message, or detect a repeated message.
I too was facing the same issue: only one message was being returned. Then I tried receiveMessageRequest.setMaxNumberOfMessages(10), which retrieves up to 10 messages per call. Since my queue has more than 500 records, what I did was:
List<String> messagelist = new ArrayList<>();
try {
    AmazonSQS sqs = new AmazonSQSClient(credentials);
    Region usWest2 = Region.getRegion(Regions.US_WEST_2);
    sqs.setRegion(usWest2);
    boolean flag = true;
    while (flag) {
        ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queuename)
                .withMaxNumberOfMessages(number_of_message_)
                .withWaitTimeSeconds(wait_time_second_);
        List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
        for (Message message : messages) {
            messagelist.add(message.getBody());
            String messageReceiptHandle = message.getReceiptHandle();
            sqs.deleteMessage(new DeleteMessageRequest()
                    .withQueueUrl(queuename)
                    .withReceiptHandle(messageReceiptHandle));
        }
        if (messages.size() == 0) {
            flag = false;
        }
    }
} catch (AmazonServiceException ase) {
    ase.printStackTrace();
} catch (AmazonClientException ace) {
    ace.printStackTrace();
} finally {
    return messagelist; // note: returning from finally swallows any pending exception
}
I am reading records from SQS, saving each body into a String list, and then deleting the record from the queue, so in the end I have all the data from the queue in the list.
An SQS queue is not a database. You can't read all the messages into a list like you are trying to do. There is no beginning and no end to the queue: you poll the queue and ask for some messages, and it returns some messages if they exist.
If you want a method that can return the entire dataset, then SQS is not the right tool; a traditional database might be better in that case.
Long polling will wait if there are no messages in the queue. This means that if you call ReceiveMessage with long polling in a loop, you are guaranteed to get all messages: when a response contains 0 messages, you've already received them all.
You mentioned that you also used the web console. The web console works the same way as calling the API with the SDK. This means that when you receive and see messages in the console, those messages are invisible to other clients until the visibility timeout expires. That is probably the reason why you don't see messages.
See more information about visibility timeout:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
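For illustration, both long polling and the per-receive visibility timeout can be set on the request (AWS SDK for Java v1; the values are illustrative):
ReceiveMessageRequest rq = new ReceiveMessageRequest(myQueueUrl)
        .withWaitTimeSeconds(20)     // long polling: wait up to 20s for messages
        .withVisibilityTimeout(30);  // received messages stay hidden for 30s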
