I am using a Producer to send messages to a Kafka topic.
When JUnit testing, I have found that the producer in my application code (but not in my JUnit test class) is sending a null key, despite me providing a String key for it to use.
Code as follows:
Main application class
final Producer<String, HashSet<String>> actualApplicationProducer;
ApplicationInstance(String bootstrapServers) // constructor
{
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "ActualClient");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes,1000000));
actualApplicationProducer = new KafkaProducer<>(props);
}
public void doStuff()
{
HashSet<String> values = new HashSet<String>();
String key = "applicationKey";
// THIS LINE IS SENDING A NULL KEY
actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values));
}
But, in my JUnit test class:
@EmbeddedKafka
@ExtendWith(SpringExtension.class)
@SuppressWarnings("static-method")
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
public class CIFFileProcessorTests
{
/** An Embedded Kafka Broker that can be used for unit testing purposes. */
@Autowired
private EmbeddedKafkaBroker embeddedKafkaBroker;
@BeforeAll
public void setUpBeforeClass(@TempDir File globalTablesDir, @TempDir File rootDir) throws Exception
{
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "JUnitClient");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes,1000000));
try (Producer<String, HashSet<String>> junitProducer = new KafkaProducer<>(props))
{
HashSet<String> values = new HashSet<>();
// Here, I'm sending a record, just like in my main application code, but it's sending the key correctly and not null
junitProducer.send(new ProducerRecord<>(topicName,"junitKey",values));
    }
}
@Test
public void test()
{
ApplicationInstance sut = new ApplicationInstance(embeddedKafkaBroker.getBrokersAsString());
sut.doStuff();
// "records" is a LinkedBlockingQueue, populated by a KafkaMessageListenerContainer which is monitoring the topic for records using a MessageListener
ConsumerRecord<String, HashSet<String>> record = records.poll(1,TimeUnit.SECONDS);
assertEquals("junitKey", record.key()); // TEST FAILS - expected "junitKey" but returned null
    }
}
Custom serializer:
try (final ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos))
{
oos.writeObject(object);
return baos.toByteArray();
}
Does anyone know why the KafkaProducer would send a null key when I explicitly specify a String?
--- Update ---
I have tried inspecting the metadata, and the Producer is indeed sending the key, and not null:
RecordMetadata info = actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values)).get();
System.out.println("INFO - partition: " + info.partition() + ", topic: " + info.topic() + ", offset: " + info.offset() + ", timestamp: "+ info.timestamp() + ", keysize: " + info.serializedKeySize() + ", valuesize: " + info.serializedValueSize());
output:
INFO - partition: 0, topic: topicName, offset: 2, timestamp: 1656060840304, keysize: 14, valuesize: 6258
The keysize being > 0 shows that null is not passed to the topic.
So, the issue must be with the reading of the topic, perhaps?
Turns out, I was using a different Deserializer class for my KafkaMessageListenerContainer, which didn't know what to do with the String key as provided.
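For anyone hitting the same thing, here is a minimal sketch of a listener container setup that matches the producer configuration above (spring-kafka 2.x style; the CustomDeserializer name is an assumption, standing in for the counterpart of CustomSerializer). The important part is that the key deserializer is a StringDeserializer, so record.key() comes back as the String that was sent:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;

Map<String, Object> consumerProps = new HashMap<>();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, embeddedKafkaBroker.getBrokersAsString());
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "junitGroup");
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// Must mirror the producer's StringSerializer, otherwise the key cannot be read back as a String
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// CustomDeserializer is assumed: the counterpart of the CustomSerializer used by the producer
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CustomDeserializer.class.getName());

DefaultKafkaConsumerFactory<String, HashSet<String>> consumerFactory =
        new DefaultKafkaConsumerFactory<>(consumerProps);

ContainerProperties containerProps = new ContainerProperties(topicName);
// "records" is the LinkedBlockingQueue the test polls from
containerProps.setMessageListener(
        (MessageListener<String, HashSet<String>>) record -> records.add(record));

KafkaMessageListenerContainer<String, HashSet<String>> container =
        new KafkaMessageListenerContainer<>(consumerFactory, containerProps);
container.start();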
I'm not sure why you want to use ByteArrayOutputStream or ObjectOutputStream for serializing Kafka producer records, but that may be your requirement. In that case, you may refer to the producer section of https://dzone.com/articles/kafka-producer-and-consumer-example
But injecting a key into the producer record can be done easily. For example, if you want to generate a ProducerRecord from an Avro schema and use asserts to check the record key and value, you can do something like this.
Generate Avro or specific records
You can refer to https://technology.amis.nl/soa/kafka/generate-random-json-data-from-an-avro-schema-using-java/
You can convert it to SpecificRecords using JsonAvroConverter:
public static ProducerRecord<String, CustomEvent> generateRecord() {
    String schemaFile = "AVROSchema.avsc";
    Schema schema = getSchema(schemaFile);
    String json = getJson(dataFile); // dataFile and topic are assumed to be defined elsewhere
    byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
    CustomEvent record = null;
    JsonAvroConverter converter = new JsonAvroConverter();
    try {
        record = converter.convertToSpecificRecord(jsonBytes, CustomEvent.class, schema);
    } catch (Exception e) {
        // handle or log the conversion failure
    }
    String recordKey = "YourKey";
    return new ProducerRecord<String, CustomEvent>(topic, recordKey, record);
}
You can then use the ProducerRecord in your assert functions later.
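For instance, a minimal sketch of such an assertion (JUnit 5 shown), checking the key that generateRecord() injected:
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

ProducerRecord<String, CustomEvent> producerRecord = generateRecord();
// The key set inside generateRecord() should be present on the record
assertEquals("YourKey", producerRecord.key());
assertNotNull(producerRecord.value());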
Related
I am doing a POC for writing data to S3 using Flink. The program does not give an error, but I do not see any files being written to S3 either.
Below is the code:
public class StreamingJob {
public static void main(String[] args) throws Exception {
// set up the streaming execution environment
final String outputPath = "s3a://testbucket-s3-flink/data/";
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//Enable checkpointing
env.enableCheckpointing();
//S3 Sink
final StreamingFileSink<String> sink = StreamingFileSink
.forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
.build();
//Source is a local kafka
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "kafka:9094");
properties.setProperty("group.id", "test");
DataStream<String> input = env.addSource(new FlinkKafkaConsumer<String>("queueing.transactions", new SimpleStringSchema(), properties));
input.flatMap(new Tokenizer()) // Tokenizer for generating words
.keyBy(0) // Logically partition the stream for each word
.timeWindow(Time.minutes(1)) // Tumbling window definition
.sum(1) // Sum the number of words per partition
.map(value -> value.f0 + " count: " + value.f1.toString() + "\n")
.addSink(sink);
// execute program
env.execute("Flink Streaming Java API Skeleton");
}
public static final class Tokenizer
implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
String[] tokens = value.toLowerCase().split("\\W+");
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
}
}
Note that I have set the s3.access-key and s3.secret-key values in the configuration and tested by changing them to incorrect values (I got an error on incorrect values).
Any pointers what may be going wrong?
Could it be that you are running into this issue?
Given that Flink sinks and UDFs in general do not differentiate between normal job termination (e.g. finite input stream) and termination due to failure, upon normal termination of a job, the last in-progress files will not be transitioned to the “finished” state.
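If that is what you are hitting, one thing to double-check is that checkpointing runs at an explicit interval while the job is alive, since the StreamingFileSink only moves in-progress part files to the finished state when a checkpoint completes. A minimal sketch (the 60-second interval is just an illustrative value):
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Take a checkpoint every 60 seconds; the StreamingFileSink finalizes
// in-progress part files only when a checkpoint completes.
env.enableCheckpointing(60_000);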
I'm getting a heap space OutOfMemoryError while deserializing Avro messages in a Kafka consumer.
I'm running the consumer code in Java with a local Kafka producer and consumer, and I tried increasing the heap memory up to 10 GB in IntelliJ, but I'm still getting this issue.
Simple Consumer Class Code
Properties props = new Properties();
props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
"localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test1");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());
props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
AvroDeserializer.class.getName());
KafkaConsumer<String, MyClass> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("fastlog"));
while (true) {
ConsumerRecords<String, MyClass> records = consumer.poll(100);
for (ConsumerRecord<String, MyClass> record : records)
{
System.out.printf("----------------------" +
"+\noffset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
}
Here is my deserializer class, which converts the packet back into a normal class after processing.
Avro Deserializer Code:
public T deserialize(String topic, byte[] data) {
try {
T result = null;
if (data != null) {
LOGGER.debug("data='{}'", DatatypeConverter.printHexBinary(data));
DatumReader<GenericRecord> datumReader =
new SpecificDatumReader<>(MyClass.getClassSchema());
Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
result = (T) datumReader.read(null, decoder);
LOGGER.debug("deserialized data='{}'", result);
}
return result;
} catch (Exception ex) {
throw new SerializationException(
"Can't deserialize data '" + Arrays.toString(data) + "' from topic '" + topic + "'", ex);
}
}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.avro.generic.GenericData$Array.<init>(GenericData.java:245)
at org.apache.avro.generic.GenericDatumReader.newArray(GenericDatumReader.java:391)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:257)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:177)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at kafka.serializer.AvroDeserializer.deserialize(AvroDeserializer.java:59)
at kafka.serializer.AvroDeserializer.deserialize(AvroDeserializer.java:21)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:918)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1095)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:944)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:567)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:528)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1110)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at SimpleConsumer.main(SimpleConsumer.java:43)
The code you posted doesn't show anything that would run out of memory, but you're obviously storing these returned results somewhere else and not cleaning them up. I suggest you check whatever is calling your deserialize method and see whether you are storing all those results in a list or another data structure without ever cleaning them up.
The other thing you can do is run a JVM profiler like JVisualVM and take a heap dump, which will show you what type and quantity of objects are clogging up your JVM heap.
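For example, you could start the consumer JVM with a flag that captures a heap dump automatically when the OutOfMemoryError is thrown, and then open the dump in JVisualVM (the jar name and dump path below are just placeholders):
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/consumer-heap.hprof -jar your-consumer.jar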
I'm running into an issue with Apache Kafka that I don't understand. I subscribe to a topic in my broker called "topic-received". This is the code:
protected String readResponse(final String idMessage) {
if (props != null) {
kafkaClient = new KafkaConsumer<>(props);
logger.debug("Subscribed to topic-received");
kafkaClient.subscribe(Arrays.asList("topic-received"));
logger.debug("Waiting for reading : topic-received");
ConsumerRecords<String, String> records =
kafkaClient.poll(kafkaConfig.getRead_timeout());
if (records != null) {
for (ConsumerRecord<String, String> record : records) {
logger.debug("Resultado devuelto : "+record.value());
return record.value();
}
}
}
return null;
}
As this is happening, I send a message to "topic-received" from another point. The code is the following:
private void sendMessageToKafkaBroker(String idTopic, String value) {
Producer<String, String> producer = null;
try {
producer = new KafkaProducer<String, String>(mapProperties());
ProducerRecord<String, String> producerRecord = new
ProducerRecord<String, String>("topic-received", value);
producer.send(producerRecord);
logger.info("Sended value "+value+" to topic-received");
} catch (ExceptionInInitializerError eix) {
eix.printStackTrace();
} catch (KafkaException ke) {
ke.printStackTrace();
} finally {
if (producer != null) {
producer.close();
}
}
}
The first time I try, with topic "topic-received", I get a warning like this:
"WARN 13164 --- [nio-8085-exec-3] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {topic-received=LEADER_NOT_AVAILABLE}"
But if I try again with the topic "topic-received", it works OK and no warning is presented. Anyway, that's not useful for me, because I have to listen to and send to a new topic each time (referenced by a String identifier, e.g. 12Erw45-2345Saf-234DASDFasd).
Searching for LEADER_NOT_AVAILABLE on Google, some people suggest adding the following lines to server.properties:
host.name=127.0.0.1
advertised.port=9092
advertised.host.name=127.0.0.1
But it's not working for me (I don't know why).
I have tried to create the topic before all this process with the following code:
private void createTopic(String idTopic) {
String zookeeperConnect = "localhost:2181";
ZkClient zkClient = new ZkClient(zookeeperConnect,10000,10000,
ZKStringSerializer$.MODULE$);
ZkUtils zkUtils = new ZkUtils(zkClient, new
ZkConnection(zookeeperConnect),false);
if(!AdminUtils.topicExists(zkUtils,idTopic)) {
AdminUtils.createTopic(zkUtils, idTopic, 2, 1, new Properties(),
null);
logger.debug("Created topic "+idTopic+" by super user");
}
else{
logger.debug("topic "+idTopic+" already exists");
}
}
No error, but still, it stays listening till the timeout.
I have reviewed the properties of the broker to check if there's anything helpful, but I haven't found anything clear enough. The props that I have used for reading are:
props = new Properties();
props.put("bootstrap.servers", kafkaConfig.getBootstrap_servers());
props.put("key.deserializer", kafkaConfig.getKey_deserializer());
props.put("value.deserializer", kafkaConfig.getValue_deserializer());
props.put("key.serializer", kafkaConfig.getKey_serializer());
props.put("value.serializer", kafkaConfig.getValue_serializer());
props.put("group.id",kafkaConfig.getGroupId());
and, for sending:
Properties props = new Properties();
props.put("bootstrap.servers", kafkaConfig.getHost() + ":" +
kafkaConfig.getPort());
props.put("group.id", kafkaConfig.getGroup_id());
props.put("enable.auto.commit", kafkaConfig.getEnable_auto_commit());
props.put("auto.commit.interval.ms",
kafkaConfig.getAuto_commit_interval_ms());
props.put("session.timeout.ms", kafkaConfig.getSession_timeout_ms());
props.put("key.deserializer", kafkaConfig.getKey_deserializer());
props.put("value.deserializer", kafkaConfig.getValue_deserializer());
props.put("key.serializer", kafkaConfig.getKey_serializer());
props.put("value.serializer", kafkaConfig.getValue_serializer());
Any clue? Why is the only way I can consume messages from the broker and the topic to repeat the request after an error?
Thanks in advance
This happens when trying to produce messages to a topic that doesn't exist.
PLEASE NOTE: In some Kafka installations, the framework can automatically create the topic when it doesn't exist, which explains why you see the issue only once at the very beginning.
This error appears when your topic doesn't exist.
To list all topics, execute the following command:
kafka-topics --list --zookeeper localhost:2181
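If the topic is missing, you can either create it up front with the same tooling or enable broker-side auto creation in server.properties (a sketch; adjust the partition and replication counts to your setup):
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic topic-received

# in server.properties, to let the broker create topics on first use
auto.create.topics.enable=true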
I'm trying to set the number of partitions to 2 from the code, and I have a single-node setup (1 ZooKeeper, 1 Kafka). When I consume the message I see that Kafka is using only one partition to store the data. Do I need to make any modifications to the setup to have multiple partitions?
private void setupZookeeper(String[] topicList){
ZkClient zkClient = null;
ZkUtils zkUtils = null;
try {
String[] zookeeperHosts = {"localhost:2181"}; // If multiple zookeeper then -> String zookeeperHosts = "192.168.20.1:2181,192.168.20.2:2181";
int sessionTimeOutInMs = 15 * 1000; // 15 secs
int connectionTimeOutInMs = 10 * 1000; // 10 secs
//String topicName = "testTopic";
int noOfPartitions = 2;
int noOfReplication = 1;
for(String zookeeper:zookeeperHosts){
zkClient = new ZkClient(zookeeper, sessionTimeOutInMs, connectionTimeOutInMs, ZKStringSerializer$.MODULE$);
zkUtils = new ZkUtils(zkClient, new ZkConnection(zookeeper), false);
for(String topicName: topicList){
System.out.println("Setting no of partitions ="+noOfPartitions + "for topic" + topicName);
AdminUtils.createTopic(zkUtils, topicName, noOfPartitions, noOfReplication,
producerConfig(),RackAwareMode.Disabled$.MODULE$);
}
}
} catch (Exception ex) {
ex.printStackTrace();
} finally {
if (zkClient != null) {
zkClient.close();
}
    }
}
My producerConfig looks like the following:
private Properties producerConfig() {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
//props.put("retries", 0);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
return props;
}
when i consume the message i see that kafka is using only one partition to store the data
The default message partitioning strategy is listed below; "only one partition used" is usually caused by a constant message key, so the same hash value is calculated every time and every record is routed to the same partition (see the sketch after this list).
If a partition is specified in the record, use it;
If no partition is specified but a key is present choose a partition based on a hash of the key;
If no partition or key is present choose a partition in a round-robin fashion.
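A small sketch of the difference, assuming a Producer<String, String> named producer and the testTopic from the question:
// Constant key: every record hashes to the same partition.
producer.send(new ProducerRecord<>("testTopic", "constantKey", "value-1"));

// No key and no explicit partition: records are spread across partitions
// (round-robin, or sticky batching on newer clients).
producer.send(new ProducerRecord<>("testTopic", "value-2"));

// Explicit partition: write directly to partition 1, ignoring any key hash.
producer.send(new ProducerRecord<>("testTopic", 1, "someKey", "value-3"));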
I have configured Redis as the MessageBus for my spring-xd setup. When my stream fails, the data is pushed to error queues. I'm trying to read them back and push them back to the destination queues, but I don't see my sink modules receiving the data. Can someone help me understand where I'm going wrong?
Code Snippet.
public RedisTemplate<String, byte[]> redisTemplate(RedisConnectionFactory redisConnectionFactory) {
final RedisTemplate<String, byte[]> template = new RedisTemplate<String, byte[]>();
template.setConnectionFactory(redisConnectionFactory);
template.setKeySerializer(new StringRedisSerializer());
template.setEnableDefaultSerializer(false);
return template;
}
List<String> listOfKeys = new ArrayList<>();
Set<byte[]> keys = redisTemplate.getConnectionFactory().getConnection().keys("ERRORS*".getBytes());
for (byte[] data : keys) {
listOfKeys.add(new String(data, 0, data.length));
}
for (String errorQueue : listOfKeys) {
String destinationQueue = errorQueue.replace("ERRORS:", EMPTY_STRING);
Long size = redisTemplate.opsForList().size(errorQueue);
for (int i = 0; i < size; i++) {
byte[] errorEvt = redisTemplate.opsForList().rightPop(errorQueue);
redisTemplate.opsForList().leftPush(destinationQueue, errorEvt);
}
}
A quick glance at your code says it should work ok; I suggest you run redis-cli and then enter monitor to watch the redis traffic.
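For example (monitor echoes every command the bus issues, so you can see which list keys are being pushed to and popped from):
redis-cli
127.0.0.1:6379> monitor
OK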
EDIT
To answer your comment below - it depends on your content type. If it's simple text, it's relatively easy:
"\xff\x01\x0bcontentType\x00\x00\x00\x0c\"text/plain\"2016-05-20 10:11:07"
The first part is the headers; you can decode them with XD's EmbeddedHeadersMessageConverter.
If your payload is a Java object, it is serialized by the bus using Kryo.