Kafka - Producer Number of Batches - java

Is there a way to determine the number of batches that were created by a Kafka producer for a specific set of messages? For instance, if I am sending 10K messages in a loop, is there a way to check how many batches were sent? I set "batch.size" to a high value and my expectation was that the messages would be buffered and there would be a delay before they showed up in my consumer. However, the messages seem to appear in my consumer program almost immediately.
The default value of batch.size is 16384. Is this a number of bytes?
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
public class KafkaProducerApp {
    public static void main(String[] args){
        Properties properties = new Properties();
        properties.put("bootstrap.servers","localhost:9092,localhost:9093,localhost:9094");
        properties.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
        properties.put("acks","0");
        properties.put("batch.size",33554432);
        KafkaProducer<String,String> kafkaProducer = new KafkaProducer<String, String>(properties);
        Map<Integer,Integer> partitionCount = new HashMap<Integer,Integer>();
        partitionCount.put(0,0);
        partitionCount.put(1,0);
        partitionCount.put(2,0);
        try{
            Date from = new Date();
            for(int i=0;i<10000;i++) {
                RecordMetadata ack = kafkaProducer.send(new ProducerRecord<String, String>("test_topic", Integer.toString(i), "MyMessage" + Integer.toString(i))).get();
                //RecordMetadata ack = kafkaProducer.send(new ProducerRecord<String,String>("test_topic",0,Integer.toString(i), "MyMessage" + Integer.toString(i))).get();
                System.out.println(" Offset = " + ack.offset());
                System.out.println(" Partition = " + ack.partition());
                partitionCount.put(ack.partition(),partitionCount.get(ack.partition())+1);
            }
            Date to = new Date();
            System.out.println(" partition 0 =" + partitionCount.get(0));
            System.out.println(" partition 1 =" + partitionCount.get(1));
            System.out.println(" partition 2 =" + partitionCount.get(2));
            System.out.println(" Elapsed Time = " + (to.getTime()-from.getTime())/1000);
        } catch (Exception ex){
            ex.printStackTrace();
        } finally {
            kafkaProducer.close();
        }
    }
}

What you are asking for is the total number of produce requests.
You can see the average number of produce requests per second using the JMX MBean kafka.producer:type=producer-metrics,client-id=([-.\w]+)
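If you would rather check from code than through JMX, the producer exposes the same metrics via its metrics() method. Below is a minimal sketch, assuming metric names from the producer-metrics group such as request-rate and batch-size-avg (on clients older than 1.0, Metric exposes value() instead of metricValue()):
import java.util.Map;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ProducerMetricsDump {
    // Print a couple of producer-level metrics, e.g. after the send loop has finished.
    static void dump(Producer<String, String> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("producer-metrics".equals(name.group())
                    && ("request-rate".equals(name.name()) || "batch-size-avg".equals(name.name()))) {
                System.out.println(name.name() + " = " + entry.getValue().metricValue());
            }
        }
    }
}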

Related

how to get all uncommitted messages in kafka when manually committing offset

In my application, I am consuming JSON messages from a Kafka topic, and multiple instances of my application are running. I have set the Kafka property: props.put("enable.auto.commit", "false")
So when I consume a message, I push it to my DB and then commit it as:
private static void commitMessage(KafkaConsumer<String, String> kafkaConsumer, ConsumerRecord<String, String> message, String kafkaTopic) {
    long nextOffset = message.offset() + 1;
    TopicPartition topicPartition = new TopicPartition(kafkaTopic, message.partition());
    OffsetAndMetadata offsetAndMetadata = new OffsetAndMetadata(nextOffset);
    Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = new HashMap<>();
    offsetAndMetadataMap.put(topicPartition, offsetAndMetadata);

    log.info("Committing processed kafka message, topic [" + kafkaTopic + "], partition [" + message.partition() + "], next offset [" + nextOffset + "]");
    kafkaConsumer.commitSync(offsetAndMetadataMap);
}
Now it may happen that after consuming a message (but before pushing it to the DB) my application restarts for some reason. After the restart I want to consume the uncommitted messages from Kafka again. I am able to do this using seek:
private static void seekAllPartitions(KafkaConsumer<String, String> kafkaConsumer, String kafkaTopic) {
    List<PartitionInfo> partitionInfos = kafkaConsumer.partitionsFor(kafkaTopic);
    System.out.println("Size of partition list : " + partitionInfos.size());
    for (PartitionInfo partitionInfo : partitionInfos) {
        TopicPartition topicPartition = new TopicPartition(kafkaTopic, partitionInfo.partition());
        OffsetAndMetadata committedForPartition = kafkaConsumer.committed(topicPartition);
        try {
            if (committedForPartition != null) {
                System.out.println("Seeking offset..." + committedForPartition.offset());
                kafkaConsumer.seek(topicPartition, committedForPartition.offset());
            }
        } catch (Exception ex) {
            // ignored
        }
    }
}
Now the problem is that seek(topicPartition, committedForPartition.offset()) gives me the last uncommitted message and not the intermediate uncommitted messages. As I mentioned, multiple instances are running, so I may end up with intermediate uncommitted messages. For example: on instance A the 2nd message was not committed and on instance B the 5th message was not committed, but seek gives me only the 5th message and not the 2nd.

AWS Lambda : Java : Kinesis Events

I'm having trouble finding examples of what I'm trying to do...
I'd like to create a Lambda function in Java. I thought I'd always use Javascript for Lambda functions, but in this case I'll end up re-using application logic already written in Java, so it makes sense.
In the past I've written Javascript Lambda functions that are triggered by Kinesis events. Super simple: the function receives the events as a parameter, does something with them, voilà. I'd like to do the same thing with Java. Really simple:
Kinesis Event(s) -> Trigger Function -> (Java) Receive Kinesis Events, do something with them
Anyone have experience with this kind of use case?
Here is some sample code I wrote to demonstrate the same concept internally. This code forwards events from one stream to another.
Note this code does not handle retries if there are errors in forwarding, nor is it meant to be performant in a production environment, but it does demonstrate how to handle the records from the publishing stream.
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.AmazonKinesisClient;
import com.amazonaws.services.kinesis.model.PutRecordsRequest;
import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
import com.amazonaws.services.kinesis.model.PutRecordsResult;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class KinesisToKinesis {
private LambdaLogger logger;
final private AmazonKinesisClient kinesisClient = new AmazonKinesisClient();
public PutRecordsResult eventHandler(KinesisEvent event, Context context) {
logger = context.getLogger();
if (event == null || event.getRecords() == null) {
logger.log("Event contains no data" + System.lineSeparator());
return null;
} else {
logger.log("Received " + event.getRecords().size() +
" records from " + event.getRecords().get(0).getEventSourceARN() + System.lineSeparator());
}
final Long startTime = System.currentTimeMillis();
// set up the client
Region region;
final Map<String, String> environmentVariables = System.getenv();
if (environmentVariables.containsKey("AWS_REGION")) {
region = Region.getRegion(Regions.fromName(environmentVariables.get("AWS_REGION")));
} else {
region = Region.getRegion(Regions.US_WEST_2);
logger.log("Using default region: " + region.toString() + System.lineSeparator());
}
kinesisClient.setRegion(region);
Long elapsed = System.currentTimeMillis() - startTime;
logger.log("Finished setup in " + elapsed + " ms" + System.lineSeparator());
PutRecordsRequest putRecordsRequest = new PutRecordsRequest().withStreamName("usagecounters-global");
List<PutRecordsRequestEntry> putRecordsRequestEntryList = event.getRecords().parallelStream()
.map(r -> new PutRecordsRequestEntry()
.withData(ByteBuffer.wrap(r.getKinesis().getData().array()))
.withPartitionKey(r.getKinesis().getPartitionKey()))
.collect(Collectors.toList());
putRecordsRequest.setRecords(putRecordsRequestEntryList);
elapsed = System.currentTimeMillis() - startTime;
logger.log("Processed " + putRecordsRequest.getRecords().size() +
" records in " + elapsed + " ms" + System.lineSeparator());
PutRecordsResult putRecordsResult = kinesisClient.putRecords(putRecordsRequest);
elapsed = System.currentTimeMillis() - startTime;
logger.log("Forwarded " + putRecordsRequest.getRecords().size() +
" records to Kinesis " + putRecordsRequest.getStreamName() +
" in " + elapsed + " ms" + System.lineSeparator());
return putRecordsResult;
}
}

How to get a simple DAG to work in Hazelcast Jet?

While working on my DAG in Hazelcast Jet, I stumbled into a weird problem. To track down the error I simplified my approach completely, and it seems that the edges are not working according to the tutorial.
The code below is almost as simple as it gets. Two vertices (one source, one sink), one edge.
The source is reading from a map, the sink should put into a map.
The data.addEntryListener correctly tells me that the map is filled with 100 lists (each with 25 objects of 400 bytes) by another application ... and then nothing. The map fills up, but the DAG doesn't interact with it at all.
Any idea where to look for the problem?
package be.andersch.clusterbench;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.hazelcast.config.Config;
import com.hazelcast.config.SerializerConfig;
import com.hazelcast.core.EntryEvent;
import com.hazelcast.jet.*;
import com.hazelcast.jet.config.JetConfig;
import com.hazelcast.jet.stream.IStreamMap;
import com.hazelcast.map.listener.EntryAddedListener;
import be.andersch.anotherpackage.myObject;
import java.util.List;
import java.util.concurrent.ExecutionException;
import static com.hazelcast.jet.Edge.between;
import static com.hazelcast.jet.Processors.*;
/**
* Created by abernard on 24.03.2017.
*/
public class Analyzer {
private static final ObjectMapper mapper = new ObjectMapper();
private static JetInstance jet;
private static final IStreamMap<Long, List<String>> data;
private static final IStreamMap<Long, List<String>> testmap;
static {
JetConfig config = new JetConfig();
Config hazelConfig = config.getHazelcastConfig();
hazelConfig.getGroupConfig().setName( "name" ).setPassword( "password" );
hazelConfig.getNetworkConfig().getInterfaces().setEnabled( true ).addInterface( "my_IP_range_here" );
hazelConfig.getSerializationConfig().getSerializerConfigs().add(
new SerializerConfig().
setTypeClass(myObject.class).
setImplementation(new OsamKryoSerializer()));
jet = Jet.newJetInstance(config);
data = jet.getMap("data");
testmap = jet.getMap("testmap");
}
public static void main(String[] args) throws ExecutionException, InterruptedException {
DAG dag = new DAG();
Vertex source = dag.newVertex("source", readMap("data"));
Vertex test = dag.newVertex("test", writeMap("testmap"));
dag.edge(between(source, test));
jet.newJob(dag).execute().get();
data.addEntryListener((EntryAddedListener<Long, List<String>>) (EntryEvent<Long, List<String>> entryEvent) -> {
System.out.println("Got data: " + entryEvent.getKey() + " at " + System.currentTimeMillis() + ", Size: " + jet.getHazelcastInstance().getMap("data").size());
}, true);
testmap.addEntryListener((EntryAddedListener<Long, List<String>>) (EntryEvent<Long, List<String>> entryEvent) -> {
System.out.println("Got test: " + entryEvent.getKey() + " at " + System.currentTimeMillis());
}, true);
Runtime.getRuntime().addShutdownHook(new Thread(() -> Jet.shutdownAll()));
}
}
The Jet job is already finished at the line jet.newJob(dag).execute().get(), before you even created the entry listeners. This means that the job runs on an empty map. Maybe your confusion is about the nature of this job: it's a batch job, not an infinite stream processing one. Jet version 0.3 does not yet support infinite stream processing.
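A minimal sketch of the ordering fix, reusing the vertices from the question and assuming the jet instance and maps are set up as above: populate (or wait for) the "data" map first, then submit the batch job, because readMap("data") only sees whatever is in the map at the moment the job runs.
DAG dag = new DAG();
Vertex source = dag.newVertex("source", readMap("data"));
Vertex sink = dag.newVertex("sink", writeMap("testmap"));
dag.edge(between(source, sink));

// ... make sure the other application has finished filling the "data" map here ...

// The batch job now copies the entries that exist in "data" into "testmap".
jet.newJob(dag).execute().get();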

AWS SQS Java. Not all messages are retrieved from the SQS queue

I have been trying several approaches to retrieve all messages from an SQS queue by using the AWS SDK for Java, to no avail. I have read about the distributed nature of AWS SQS and that messages are stored on different servers. But what I do not understand is why this architecture is not hidden from the end user. What tricks do I have to apply in Java code to retrieve all messages and be 100% sure that none were missed?
I tried this with the "Long Polling":
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (Message message : messages) {
System.out.println(" Message");
System.out.println(" MessageId: " + message.getMessageId());
System.out.println(" ReceiptHandle: " + message.getReceiptHandle());
System.out.println(" MD5OfBody: " + message.getMD5OfBody());
System.out.println(" Body: " + message.getBody());
for (Entry<String, String> entry : message.getAttributes().entrySet()) {
System.out.println(" Attribute");
System.out.println(" Name: " + entry.getKey());
System.out.println(" Value: " + entry.getValue());
}
}
System.out.println();
And this with Request Batching / Client-Side Buffering:
// Create the basic Amazon SQS async client
AmazonSQSAsync sqsAsync = new AmazonSQSAsyncClient();
// Create the buffered client
AmazonSQSAsync bufferedSqs = new AmazonSQSBufferedAsyncClient(sqsAsync);
CreateQueueRequest createRequest = new CreateQueueRequest().withQueueName("MyTestQueue");
CreateQueueResult res = bufferedSqs.createQueue(createRequest);
SendMessageRequest request = new SendMessageRequest();
String body = "test message_" + System.currentTimeMillis();
request.setMessageBody( body );
request.setQueueUrl(res.getQueueUrl());
SendMessageResult sendResult = bufferedSqs.sendMessage(request);
ReceiveMessageRequest receiveRq = new ReceiveMessageRequest()
.withMaxNumberOfMessages(10)
.withQueueUrl(queueUrl);
ReceiveMessageResult rx = bufferedSqs.receiveMessage(receiveRq);
List<Message> messages = rx.getMessages();
for (Message message : messages) {
System.out.println(" Message");
System.out.println(" MessageId: " + message.getMessageId());
System.out.println(" ReceiptHandle: " + message.getReceiptHandle());
System.out.println(" MD5OfBody: " + message.getMD5OfBody());
System.out.println(" Body: " + message.getBody());
for (Entry<String, String> entry : message.getAttributes().entrySet()) {
System.out.println(" Attribute");
System.out.println(" Name: " + entry.getKey());
System.out.println(" Value: " + entry.getValue());
}
}
But I am still unable to retrieve all messages.
Any ideas?
The AWS forum has been silent on my post.
When receiving messages from an SQS queue, you need to repeatedly call sqs:ReceiveMessage.
On each call to sqs:ReceiveMessage, you will get 0 or more messages from the queue which you'll need to iterate through. For each message, you'll also need to call sqs:DeleteMessage to remove the message from the queue when you're done processing each message.
Add a loop around your "Long Polling" sample above to receive all messages.
for (;;) {
    ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
    List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
    for (Message message : messages) {
        System.out.println(" Message");
        System.out.println(" MessageId: " + message.getMessageId());
        System.out.println(" ReceiptHandle: " + message.getReceiptHandle());
        System.out.println(" MD5OfBody: " + message.getMD5OfBody());
        System.out.println(" Body: " + message.getBody());
        for (Entry<String, String> entry : message.getAttributes().entrySet()) {
            System.out.println(" Attribute");
            System.out.println(" Name: " + entry.getKey());
            System.out.println(" Value: " + entry.getValue());
        }
    }
    System.out.println();
}
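The delete step mentioned above is not shown in the loop; as a hedged fragment, it would go at the end of the inner for loop, once a message has been fully processed:
// Remove the processed message so it is not delivered again after its
// visibility timeout expires.
sqs.deleteMessage(new DeleteMessageRequest()
        .withQueueUrl(myQueueUrl)
        .withReceiptHandle(message.getReceiptHandle()));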
Also note that you may receive the same message more than once. So allow your work to "reprocess" the same message, or detect a repeated message.
I too was facing the same issue - only one message was getting returned. Then I tried
receiveMessageRequest.setMaxNumberOfMessages(10), which helped me retrieve 10 messages per call in a loop.
Since my queue has >500 records, what I did was:
List<String> messagelist = new ArrayList<>();
try {
    AmazonSQS sqs = new AmazonSQSClient(credentials);
    Region usWest2 = Region.getRegion(Regions.US_WEST_2);
    sqs.setRegion(usWest2);
    boolean flag = true;
    while (flag) {
        ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queuename);
        receiveMessageRequest.withMaxNumberOfMessages(number_of_message_).withWaitTimeSeconds(wait_time_second_);
        List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
        for (Message message : messages) {
            // System.out.println(" Body: " + message.getBody());
            messagelist.add(message.getBody());
            String messageReceiptHandle = message.getReceiptHandle();
            sqs.deleteMessage(new DeleteMessageRequest().withQueueUrl(queuename).withReceiptHandle(messageReceiptHandle));
        }
        if (messages.size() == 0) {
            flag = false;
        }
    }
} catch (AmazonServiceException ase) {
    ase.printStackTrace();
} catch (AmazonClientException ace) {
    ace.printStackTrace();
} finally {
    return messagelist;
}
I am reading records from SQS, saving them into a String list, and then deleting each record from the queue.
So in the end I will have all the data from the queue in a list.
An SQS queue is not a database. You can't read all the messages into a list like you are trying to do. There is no beginning and no end to the queue. You poll the queue and ask for some messages, and it returns some messages if they exist.
If you want a method that can return the entire dataset, then SQS is not the right tool - a traditional database might be better in that case.
Long polling will wait if there is no message in the queue. This means that if you call ReceiveMessage with long polling in a loop, you are guaranteed to get all messages: when a response contains 0 messages, you've already received them all.
You mentioned that you also used the web console. The web console works the same way as calling the API with the SDK. This means that when you receive and see messages in the console, those messages are invisible to other clients until the visibility timeout expires. That's probably the reason why you don't see messages.
See more information about visibility timeout:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
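As a hedged illustration of both points (the queue URL is a placeholder), a receive request that long-polls for up to 20 seconds and keeps each returned message invisible to other consumers for 30 seconds might look like:
// Long polling: block up to 20 s waiting for messages instead of returning immediately.
// Visibility timeout: hide each returned message from other consumers for 30 s,
// which must be long enough to process and delete it.
ReceiveMessageRequest request = new ReceiveMessageRequest(myQueueUrl)
        .withMaxNumberOfMessages(10)
        .withWaitTimeSeconds(20)
        .withVisibilityTimeout(30);
List<Message> messages = sqs.receiveMessage(request).getMessages();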

Retrieving nth qualifier in hbase using java

This question is a bit out of the box, but I need it.
In a List (collection), we can retrieve the nth element with list.get(i);
Similarly, is there any method in HBase, using the Java API, where I can get the nth qualifier given the row id and the column family name?
NOTE: I have a million qualifiers in a single row in a single column family.
Sorry for being unresponsive; I was busy with something important. Try this for now:
package org.myorg.hbasedemo;
import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
public class GetNthColunm {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "TEST");
        Get g = new Get(Bytes.toBytes("4"));
        Result r = table.get(g);
        System.out.println("Enter column index :");
        Scanner reader = new Scanner(System.in);
        int index = reader.nextInt();
        System.out.println("index : " + index);
        int count = 0;
        for (KeyValue kv : r.raw()) {
            if (++count != index)
                continue;
            System.out.println("Qualifier : "
                    + Bytes.toString(kv.getQualifier()));
            System.out.println("Value : " + Bytes.toString(kv.getValue()));
            break; // found the nth qualifier, no need to scan the rest of the row
        }
        table.close();
        System.out.println("Done.");
    }
}
I will let you know if I find a better way to do this.
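One possible refinement, sketched here on the assumption that the stock HBase ColumnPaginationFilter fits your setup: with a million qualifiers in the row, asking the server for exactly one column at the requested offset avoids materializing the whole row on the client.
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;

// Fetch only the nth column server-side instead of pulling the entire row.
// ColumnPaginationFilter(limit, offset) returns "limit" columns starting at
// column number "offset" (0-based), so the nth (1-based) qualifier sits at offset n - 1.
Get g = new Get(Bytes.toBytes("4"));
g.setFilter(new ColumnPaginationFilter(1, index - 1));
Result r = table.get(g);
for (KeyValue kv : r.raw()) {
    System.out.println("Qualifier : " + Bytes.toString(kv.getQualifier()));
    System.out.println("Value : " + Bytes.toString(kv.getValue()));
}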
