Setting up multiple partitions in Apache Kafka - java

Im trying to set the no of partitions to 2 from the code,and i have single node setup, (1 zookeeper, 1kafka). when i consume the message i see that kafka is using only one partition to store the data, Do i need to make any modifications to the setup to have multiple partitions ?
private void setupZookeeper(String[] topicList){
ZkClient zkClient = null;
ZkUtils zkUtils = null;
try {
String[] zookeeperHosts = {"localhost:2181"}; // If multiple zookeeper then -> String zookeeperHosts = "192.168.20.1:2181,192.168.20.2:2181";
int sessionTimeOutInMs = 15 * 1000; // 15 secs
int connectionTimeOutInMs = 10 * 1000; // 10 secs
//String topicName = "testTopic";
int noOfPartitions = 2;
int noOfReplication = 1;
for(String zookeeper:zookeeperHosts){
zkClient = new ZkClient(zookeeper, sessionTimeOutInMs, connectionTimeOutInMs, ZKStringSerializer$.MODULE$);
zkUtils = new ZkUtils(zkClient, new ZkConnection(zookeeper), false);
for(String topicName: topicList){
System.out.println("Setting no of partitions ="+noOfPartitions + "for topic" + topicName);
AdminUtils.createTopic(zkUtils, topicName, noOfPartitions, noOfReplication,
producerConfig(),RackAwareMode.Disabled$.MODULE$);
}
}
} catch (Exception ex) {
ex.printStackTrace();
} finally {
if (zkClient != null) {
zkClient.close();
}
}
My producerConfig, looks like the following:
private Properties producerConfig() {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
//props.put("retries", 0);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
return props;
}

when i consume the message i see that kafka is using only one
partition to store the data
The default message partitioning strategy as below, "only one partition used" may be caused by constant message key, same hash value calculated and route to only one partition.
If a partition is specified in the record, use it;
If no partition is specified but a key is present choose a partition based on a hash of the key;
If no partition or key is present choose a partition in a round-robin fashion.
you

Related

Kafka producer sending null key instead of string

I am using a Producer to send messages to a Kafka topic.
When JUnit testing, I have found that the producer in my application code (but not in my JUnit test class) is sending a null key, despite me providing a String key for it to use.
Code as follows:
Main application class
final Producer<String, HashSet<String>> actualApplicationProducer;
ApplicationInstance(String bootstrapServers) // constructor
{
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "ActualClient");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes,1000000));
actualApplicationProducer = new KafkaProducer<>(props);
}
public void doStuff()
{
HashSet<String> values = new HashSet<String>();
String key = "applicationKey";
// THIS LINE IS SENDING A NULL KEY
actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values));
}
But, in my junit classes:
#EmbeddedKafka
#ExtendWith(SpringExtension.class)
#SuppressWarnings("static-method")
#TestInstance(TestInstance.Lifecycle.PER_CLASS)
public class CIFFileProcessorTests
{
/** An Embedded Kafka Broker that can be used for unit testing purposes. */
#Autowired
private EmbeddedKafkaBroker embeddedKafkaBroker;
#BeforeAll
public void setUpBeforeClass(#TempDir File globalTablesDir, #TempDir File rootDir) throws Exception
{
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "JUnitClient");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes,1000000));
try(Producer<String, HashSet<String>> junitProducer = new Producer<>(props))
{
HashSet<String> values = new HashSet<>();
// Here, I'm sending a record, just like in my main application code, but it's sending the key correctly and not null
junitProducer.send(new ProducerRecord<>(topicName,"junitKey",values));
}
#Test
public void test()
{
ApplicationInstance sut = new ApplicationInstance(embeddedKafkaBroker.getBrokersAsString());
sut.doStuff();
// "records" is a LinkedBlockingQueue, populated by a KafkaMessageListenerContainer which is monitoring the topic for records using a MessageListener
ConsumerRecord<String, HashSet<String>> record = records.poll(1,TimeUnit.SECONDS);
assertEquals("junitKey", record.key()); // TEST FAILS - expected "junitKey" but returned null
}
Custom serializer:
try (final ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos))
{
oos.writeObject(object);
return baos.toByteArray();
}
Does anyone know why the KafkaProducer would send a null key when I explicitly specify a String?
--- Update ---
I have tried inspecting the metadata, and the Producer is indeed sending the key, and not null:
RecordMetadata info = actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values)).get();
System.out.println("INFO - partition: " + info.partition() + ", topic: " + info.topic() + ", offset: " + info.offset() + ", timestamp: "+ info.timestamp() + ", keysize: " + info.serializedKeySize() + ", valuesize: " + info.serializedValueSize());
output:
INFO - partition: 0, topic: topicName, offset: 2, timestamp: 1656060840304, keysize: 14, valuesize: 6258
The keysize being > 0 shows that null is not passed to the topic.
So, the issue must be with the reading of the topic, perhaps?
Turns out, I was using a different Deserializer class for my KafkaMessageListenerContainer, which didn't know what to do with the String as provided
Not sure why you want to use ByteArrayOutputStream or ObjectOutputStream for serializing KAFKA producer records, that may be your requirement. In such case, you may refer the producer section from https://dzone.com/articles/kafka-producer-and-consumer-example
But injecting key in the producer record can be easily done. For example, if you want generate a Producer Record from an AVRO schema and use assert to inject record key and value, you can do something like this.
Generate a AVRO or Specific records
You can refer https://technology.amis.nl/soa/kafka/generate-random-json-data-from-an-avro-schema-using-java/
You can convert it to SpecifiRecords using JSONAVROConverter:
public static ProducerRecord<String, CustomEvent> generateRecord(){
String schemaFile = "AVROSchema.avsc";
Schema schema = getSchema(JSONFile);
String json = getJson(dataFile);
byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
CustomEventMessage producerRecord = null;
JsonAvroConverter converter = new JsonAvroConverter();
try {
record = converter.convertToSpecificRecord(jsonBytes, CustomEvent.class, schema);
} catch (Exception e) {
}
String recordKey = "YourKey";
return new ProducerRecord<String, CustomEvent>( topic, recordKey, record);
}
You can inject the ProducerRecord into your Assert functions later.

Getting Heap Space Issue while deserializing Avro packet in Kafka Consumer

Getting heap space out of memory exception while deserializing avro message in Kafka consumer.
Running consumer code in Java with local kafka producer and consumer and i tried to increase heap memory till 10GB in IntelliJ but still getting this issue.
Simple Consumer Class Code
Properties props = new Properties();
props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
"localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test1");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());
props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
AvroDeserializer.class.getName());
KafkaConsumer<String, BookingContext> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("fastlog"));
while (true) {
ConsumerRecords<String, MyClass> records = consumer.poll(100);
for (ConsumerRecord<String, MyClass> record : records)
{
System.out.printf("----------------------" +
"+\noffset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
}
Here is my class for deserializer where i wrote to convert packet to normal class after the process.
Avro Deserializer Code :
public T deserialize(String topic, byte[] data) {
try {
T result = null;
if (data != null) {
LOGGER.debug("data='{}'", DatatypeConverter.printHexBinary(data));
DatumReader<GenericRecord> datumReader =
new SpecificDatumReader<>(MyClass.getClassSchema());
Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
result = (T) datumReader.read(null, decoder);
LOGGER.debug("deserialized data='{}'", result);
}
return result;
} catch (Exception ex) {
throw new SerializationException(
"Can't deserialize data '" + Arrays.toString(data) + "' from topic '" + topic + "'", ex);
}
}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.avro.generic.GenericData$Array.<init>(GenericData.java:245)
at org.apache.avro.generic.GenericDatumReader.newArray(GenericDatumReader.java:391)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:257)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:177)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at kafka.serializer.AvroDeserializer.deserialize(AvroDeserializer.java:59)
at kafka.serializer.AvroDeserializer.deserialize(AvroDeserializer.java:21)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:918)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1095)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:944)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:567)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:528)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1110)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at SimpleConsumer.main(SimpleConsumer.java:43)
The code you posted doesn't show anything that would run out of memory, but you're obviously storing these result returned values somewhere else, and not cleaning up after them. I suggest you check whatever is calling your deserialize method, and check if you are maybe storing all those results in a list or another data structure, and not cleaning them up.
The other thing you can do is run a JVM profiler like JVisualVM, and you do a heap dump that will show you what type/quantity of objects are clogging up your JVM heap.

How to reset Kafka consumer offset when calling consumer many times

I'm trying to reset consumer offset whenever calling consumer so that when I call consumer many times it can still read record sent by producer. I'm setting props.put("auto.offset.reset","earliest"); and calling consumer.seekToBeginning(consumer.assignment()); but when I call the consumer the second time it will receive no records. How can I fix this?
public ConsumerRecords<String, byte[]> consumer(){
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
//props.put("group.id", String.valueOf(System.currentTimeMillis()));
props.put("auto.offset.reset","earliest");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topiccc"));
ConsumerRecords<String, byte[]> records = consumer.poll(100);
consumer.seekToBeginning(consumer.assignment());
/* List<byte[]> videoContents = new ArrayList<byte[]>();
for (ConsumerRecord<String, byte[]> record : records) {
System.out.printf("offset = %d, key = %s, value = %s\n", record.offset(), record.key(), record.value());
videoContents.add(record.value());
}*/
return records;
}
public String producer(#RequestParam("message") String message) {
Map<String, Object> props = new HashMap<>();
// list of host:port pairs used for establishing the initial connections to the Kakfa cluster
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer<>(props);
Path path = Paths.get("C:/Programming Files/video-2012-07-05-02-29-27.mp4");
ProducerRecord<String, byte[]> record = null;
try {
record = new ProducerRecord<>("topiccc", "keyyyyy"
, Files.readAllBytes(path));
} catch (IOException e) {
e.printStackTrace();
}
producer.send(record);
producer.close();
//kafkaSender.send(record);
return "Message sent to the Kafka Topic java_in_use_topic Successfully";
}
From the Kafka Java Code, the documentation on AUTO_OFFSET_RESET_CONFIG says the following:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted): earliest: automatically reset the offset to the earliest offsetlatest: automatically reset the offset to the latest offsetnone: throw exception to the consumer if no previous offset is found for the consumer's groupanything else: throw exception to the consumer.
This can be found here in GitHub:
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java
We can see from their comment that the setting is only used when the offset is not on the server. In the question, the offset is retrieved from the server and that's why the offset is not reset to the beginning but rather stays at the last offset, making it appear that there are no more records.
You would need to explicitly reset the offset on the server side to fix this as requested in the question.
Here is another answer that describes how that could be done.
https://stackoverflow.com/a/54492802/231860
This is a snippet of code that allowed me to reset the offset. NOTE: You can't call seekToBeginning if you call the subscribe method. I could only get it to work if I assign the partitions myself using the assign method. Pity.
// Create the consumer:
final Consumer<String, DataRecord> consumer = new KafkaConsumer<>(props);
// Get the partitions that exist for this topic:
List<PartitionInfo> partitions = consumer.partitionsFor(topic);
// Get the topic partition info for these partitions:
List<TopicPartition> topicPartitions = partitions.stream().map(info -> new TopicPartition(info.topic(), info.partition())).collect(Collectors.toList());
// Assign all the partitions to the topic so that we can seek to the beginning:
// NOTE: We can't use subscribe if we use assign, but we can't seek to the beginning if we use subscribe.
consumer.assign(topicPartitions);
// Make sure we seek to the beginning of the partitions:
consumer.seekToBeginning(topicPartitions);
Yes, it seems extremely complicated to achieve a seemingly rudimentary use case. This might indicate that the whole kafka world just seems to want to read streams once.
I am usually creating a new consumer with different group.id to read again records.
So do it like that:
props.put("group.id", Instant.now().getEpochSecond());
There is a workaround for this (not a production solution, though) which is to change the group.id configuration value each time you consume. Setting auto.offset.reset to earliest is not enough in many cases.
When you want one message to be consumed by consumers multiple time the ideal way is to create consumers with different consumer group so same message can be consumed.
But if you want the same consumer to consume the same message multiple time then you can play with commit and offset
You set the auto.commit very high or disable it and do commit as per your logic
You can refer to this for more details https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
This javadoc provides detail on how to manually manage offset

Can't get KafkaProducer/KafkaConsumer to work in Scala

I'm trying to create a simple KafkaProducer and KafkaConsumer so I can send data to a topic on a broker, and then verify that the data was received. I have below the two methods I used to define my consumer and producer, and how I'm sending the message. The send method takes at lest 20 seconds to complete, and as far as I can tell the consumer.poll method never actually finishes, but the longest I've left it was 10 minutes.
Does anyone have a suggestion as to what I'm doing wrong? Is there some property for the producer/consumer that I'm not setting up correctly? Those properties are copied directly from the docs, so I don't understand why they won't work.
KafkaProducer docs
KafkaConsumer docs
"verify we can send to producer" in {
val consumer = createKafkaConsumer("address:9002")
val producer = createKafkaProducer("address:9002")
val message = "I am a message"
val record = new ProducerRecord[String, String]("myTopic", message)
producer.send(record)
TimeUnit.SECONDS.sleep(5)
val records = consumer.poll(5000)
println("records: "+records)
consumer1.close()
}
def createKafkaProducer(kafka: String): KafkaProducer[String,String] = {
val props = new Properties()
props.put("bootstrap.servers", kafka)
props.put("acks", "all")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
new KafkaProducer[String,String](props)
}
def createKafkaConsumer(kafka: String): KafkaConsumer[String, String] = {
val props = new Properties()
props.put("bootstrap.servers", kafka)
props.put("group.id", "test")
props.put("enable.auto.commit", "true")
props.put("auto.commit.interval.ms", "1000")
props.put("session.timeout.ms", "30000")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("myTopic"))
consumer
}
Edit: I've updated my code so that I now get the response from the send method, and it seems that that times out with org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Turns out I had a DNS issue that meant that I wasn't actually connecting to the broker. Fixing this allowed the messages to go through, there was nothing wrong with the config.

Apache Kafka LEADER_NOT_AVAILABLE

I'm running into an issue with apache Kafka that I don't understand . I subscribe to a topic in my broker called "topic-received" . This is the code :
protected String readResponse(final String idMessage) {
if (props != null) {
kafkaClient = new KafkaConsumer<>(props);
logger.debug("Subscribed to topic-received");
kafkaClient.subscribe(Arrays.asList("topic-received"));
logger.debug("Waiting for reading : topic-received");
ConsumerRecords<String, String> records =
kafkaClient.poll(kafkaConfig.getRead_timeout());
if (records != null) {
for (ConsumerRecord<String, String> record : records) {
logger.debug("Resultado devuelto : "+record.value());
return record.value();
}
}
}
return null;
}
As this is happening, I send a message to "topic-received" from another point . The code is the following one :
private void sendMessageToKafkaBroker(String idTopic, String value) {
Producer<String, String> producer = null;
try {
producer = new KafkaProducer<String, String>(mapProperties());
ProducerRecord<String, String> producerRecord = new
ProducerRecord<String, String>("topic-received", value);
producer.send(producerRecord);
logger.info("Sended value "+value+" to topic-received");
} catch (ExceptionInInitializerError eix) {
eix.printStackTrace();
} catch (KafkaException ke) {
ke.printStackTrace();
} finally {
if (producer != null) {
producer.close();
}
}
}
First time I try , with topic "topic-received", I get a warning like this
"WARN 13164 --- [nio-8085-exec-3] org.apache.kafka.clients.NetworkClient :
Error while fetching metadata with correlation id 1 : {topic-
received=LEADER_NOT_AVAILABLE}"
But if I try again, to this topic "topic-received", works ok, and no warning is presented . Anyway, that's not useful for me, because I have to listen from a topic and send to a topic new each time ( referenced by an String identifier ex: .. 12Erw45-2345Saf-234DASDFasd )
Looking for LEADER_NOT_AVAILABLE in google , some guys talk about adding to server.properties the next lines :
host.name=127.0.0.1
advertised.port=9092
advertised.host.name=127.0.0.1
But it's not working for me ( Don't know why ) .
I have tried to create the topic before all this process with the following code:
private void createTopic(String idTopic) {
String zookeeperConnect = "localhost:2181";
ZkClient zkClient = new ZkClient(zookeeperConnect,10000,10000,
ZKStringSerializer$.MODULE$);
ZkUtils zkUtils = new ZkUtils(zkClient, new
ZkConnection(zookeeperConnect),false);
if(!AdminUtils.topicExists(zkUtils,idTopic)) {
AdminUtils.createTopic(zkUtils, idTopic, 2, 1, new Properties(),
null);
logger.debug("Created topic "+idTopic+" by super user");
}
else{
logger.debug("topic "+idTopic+" already exists");
}
}
No error, but still, it stays listening till the timeout.
I have reviewed the properties of the broker to check if there's any help, but I haven't found anything clear enough . The props that I have used for reading are :
props = new Properties();
props.put("bootstrap.servers", kafkaConfig.getBootstrap_servers());
props.put("key.deserializer", kafkaConfig.getKey_deserializer());
props.put("value.deserializer", kafkaConfig.getValue_deserializer());
props.put("key.serializer", kafkaConfig.getKey_serializer());
props.put("value.serializer", kafkaConfig.getValue_serializer());
props.put("group.id",kafkaConfig.getGroupId());
and , for sending ...
Properties props = new Properties();
props.put("bootstrap.servers", kafkaConfig.getHost() + ":" +
kafkaConfig.getPort());
props.put("group.id", kafkaConfig.getGroup_id());
props.put("enable.auto.commit", kafkaConfig.getEnable_auto_commit());
props.put("auto.commit.interval.ms",
kafkaConfig.getAuto_commit_interval_ms());
props.put("session.timeout.ms", kafkaConfig.getSession_timeout_ms());
props.put("key.deserializer", kafkaConfig.getKey_deserializer());
props.put("value.deserializer", kafkaConfig.getValue_deserializer());
props.put("key.serializer", kafkaConfig.getKey_serializer());
props.put("value.serializer", kafkaConfig.getValue_serializer());
Any clue ? Why , the only way that I have to consume messages from the broker and from the topic, is repeating the request after an error ?
Thanks in advance
This happens when trying to produce messages to a topic that doesn't exist
PLEASE NOTE: In some Kafka installations, the framework can automatically create the topic when it doesn't exist, that explains why you see the issue only once at the very beginning.
This error appears when your Topic name doesn't exist.
To list all topics execute following command:
kafka-topics --list --zookeeper localhost:2181

Categories

Resources