How to skip an Avro serialization exception in KafkaStreams API?


I have a Kafka application written with the Kafka Streams Java API. It reads data from a MySQL binlog and does some processing that is irrelevant to my question. The problem is that one particular row produces an error during Avro deserialization. I can dig into the Avro schema file and find the problem, but what I really need is a forgiving exception handler that, on encountering such an error, does not bring the whole application to a halt.
This is the main part of my stream app:
StreamsBuilder streamsBuilder = watchForCourierUpdate(builder);
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), properties);
kafkaStreams.start();
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
}
private static StreamsBuilder watchForCourierUpdate(StreamsBuilder builder){
CourierUpdateListener courierUpdateListener = new CourierUpdateListener(builder);
courierUpdateListener.start();
return builder;
}
private static Properties configProperties(){
Properties streamProperties = new Properties();
streamProperties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, Configs.getConfig("schemaRegistryUrl"));
streamProperties.put(StreamsConfig.APPLICATION_ID_CONFIG, "courier_app");
streamProperties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, Configs.getConfig("bootstrapServerUrl"));
streamProperties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
streamProperties.put(StreamsConfig.STATE_DIR_CONFIG, "/tmp/state_dir");
streamProperties.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "3");
streamProperties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamProperties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamProperties.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
streamProperties.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
CourierSerializationException.class);
return streamProperties;
}
This is my CourierSerializationException class:
public class CourierSerializationException implements ProductionExceptionHandler {
@Override
public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> producerRecord, Exception e) {
Logger.logError("Failed to de/serialize entity from " + producerRecord.topic() + " topic.\n" + e);
return ProductionExceptionHandlerResponse.CONTINUE;
}
@Override
public void configure(Map<String, ?> map) {
}
}
Still, whenever an Avro deserialization exception happens, the stream shuts down and the application does not continue. Am I missing something?

Have you tried the default.deserialization.exception.handler provided by Kafka? You can use LogAndContinueExceptionHandler, which logs the error and continues.
I may be wrong, but I think a custom handler that implements ProductionExceptionHandler only applies to errors raised while producing records to Kafka (network-related errors on the Kafka side), not to deserialization errors.
Add this to the properties and see what happens:
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);

Related

Karate 0.9.6 1.1.0 - org.graalvm.polyglot.PolyglotException: not found error when using classpath to specify the file location [duplicate]

I have been using the Karate framework to test my REST service and it works great. However, I have a service that consumes a message from a Kafka topic, persists it to Mongo, and finally notifies Kafka.
I wrote a Java producer in my Karate project; it is called from JS so that it can be used in a feature.
I also have a consumer to check the message.
Feature:
* def kafkaProducer = read('../js/KafkaProducer.js')
JS:
function(kafkaConfiguration){
var Producer = Java.type('x.y.core.producer.Producer');
var producer = new Producer(kafkaConfiguration);
return producer;
}
Java:
public class Producer {
private static final Logger LOGGER = LoggerFactory.getLogger(Producer.class);
private static final String KEY = "C636E8E238FD7AF97E2E500F8C6F0F4C";
private KafkaConfiguration kafkaConfiguration;
private ObjectMapper mapper;
private AESEncrypter aesEncrypter;
public Producer(KafkaConfiguration kafkaConfiguration) {
kafkaConfiguration.getProperties().put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
kafkaConfiguration.getProperties().put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
this.kafkaConfiguration = kafkaConfiguration;
this.mapper = new ObjectMapper();
this.aesEncrypter = new AESEncrypter(KEY);
}
public String produceMessage(String payload) {
// Just notify kafka with payload and return id of payload
}
Other class
public class KafkaConfiguration {
private static final Logger LOGGER = LoggerFactory.getLogger(KafkaConfiguration.class);
private Properties properties;
public KafkaConfiguration(String host) {
try {
properties = new Properties();
properties.put(BOOTSTRAP_SERVERS_CONFIG, host);
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "karate-integration-test");
properties.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset123");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
} catch (Exception e) {
LOGGER.error("Fail creating the consumer...", e);
throw e;
}
}
public Properties getProperties() {
return properties;
}
public void setProperties(Properties properties) {
this.properties = properties;
}
}
I would like to use the producer code with an annotation, as Cucumber does, like:
@Then("^Notify kafka with payload (-?\\d+)$")
public void validateResult(String payload) throws Throwable {
new Producer(kafkaConfiguration).produceMessage(payload);
}
and in the feature file use:
Then Notify kafka with payload "{example:value}"
I want to do this because I want to reuse that code in a base project that can be included in other projects.
If annotations don't work, maybe you can suggest another way to do it.

The answer is simple, use normal Java / Maven concepts. Move the common Java code to the "main" packages (src/main/java). Now all you need to do is build a JAR and add it as a dependency to any Karate project.
The last piece of the puzzle is this: use the classpath: prefix to refer to any features or JS files in the JAR. Karate will be able to pick them up.
EDIT: Sorry Karate does not support Cucumber or step-definitions. It has a much simpler approach. Please read this for details: https://github.com/intuit/karate/issues/398

Retry max 3 times when consuming batches in Spring Cloud Stream Kafka Binder

I am consuming batches from Kafka. Retry is not supported in the Spring Cloud Stream Kafka binder in batch mode; the documentation suggests instead: "You can configure a SeekToCurrentBatchErrorHandler (using a ListenerContainerCustomizer) to achieve similar functionality to retry in the binder."
I tried this with SeekToCurrentBatchErrorHandler, but it retries more than the limit I set, which is 3 times.
How can I limit the retries to 3?
I would like to retry the whole batch.
How can I send the whole batch to a DLQ topic? With a record listener, I used to check in the listener whether deliveryAttempt (the retry count) had reached 3 and then send the record to the DLQ topic.
I have checked this link, which describes exactly my issue, but an example would be a great help. Can I achieve this with the spring-cloud-stream-kafka-binder library? Please explain with an example; I am new to this.
Currently I have below code.
@Configuration
public class ConsumerConfig {
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer() {
return (container, dest, group) -> {
container.getContainerProperties().setAckOnError(false);
SeekToCurrentBatchErrorHandler seekToCurrentBatchErrorHandler
= new SeekToCurrentBatchErrorHandler();
seekToCurrentBatchErrorHandler.setBackOff(new FixedBackOff(0L, 2L));
container.setBatchErrorHandler(seekToCurrentBatchErrorHandler);
//container.setBatchErrorHandler(new BatchLoggingErrorHandler());
};
}
}
Listener:
@StreamListener(ActivityChannel.INPUT_CHANNEL)
public void handleActivity(List<Message<Event>> messages,
@Header(name = KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment acknowledgment,
@Header(name = "deliveryAttempt", defaultValue = "1") int deliveryAttempt) {
try {
log.info("Received activity message with message length {}", messages.size());
nodeConfigActivityBatchProcessor.processNodeConfigActivity(messages);
acknowledgment.acknowledge();
log.debug("Processed activity message {} successfully!!", messages.size());
} catch (MessagePublishException e) {
if (deliveryAttempt == 3) {
log.error(
String.format("Exception occurred, sending the message=%s to DLQ due to: ",
"message"),
e);
publisher.publishToDlq(EventType.UPDATE_FAILED, "message", e.getMessage());
} else {
throw e;
}
}
}
After seeing @Gary's response, I added the ListenerContainerCustomizer @Bean with RetryingBatchErrorHandler, but I am not able to import the class.

Use a RetryingBatchErrorHandler to send the whole batch to the DLT
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh
Use a RecoveringBatchErrorHandler where you can throw a BatchListenerFailedException to tell it which record in the batch failed.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#recovering-batch-eh
In both cases provide a DeadLetterPublishingRecoverer to the error handler; disable DLTs in the binder.
EDIT
Here's an example; it uses the newer functional style rather than the deprecated @StreamListener, but the same concepts apply (and you should consider moving to the functional style).
@SpringBootApplication
public class So69175145Application {
public static void main(String[] args) {
SpringApplication.run(So69175145Application.class, args);
}
@Bean
ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer(
KafkaTemplate<byte[], byte[]> template) {
return (container, dest, group) -> {
container.setBatchErrorHandler(new RetryingBatchErrorHandler(new FixedBackOff(5000L, 2L),
new DeadLetterPublishingRecoverer(template,
(rec, ex) -> new TopicPartition("errors." + dest + "." + group, rec.partition()))));
};
}
/*
* DLT topic won't be auto-provisioned since enableDlq is false
*/
@Bean
public NewTopic topic() {
return TopicBuilder.name("errors.so69175145.grp").partitions(1).replicas(1).build();
}
/*
* Functional equivalent of @StreamListener
*/
@Bean
public Consumer<List<String>> input() {
return list -> {
System.out.println(list);
throw new RuntimeException("test");
};
}
/*
* Not needed here - just to show we sent them to the DLT
*/
@KafkaListener(id = "so69175145", topics = "errors.so69175145.grp")
public void listen(String in) {
System.out.println("From DLT: " + in);
}
}
spring.cloud.stream.bindings.input-in-0.destination=so69175145
spring.cloud.stream.bindings.input-in-0.group=grp
spring.cloud.stream.bindings.input-in-0.content-type=text/plain
spring.cloud.stream.bindings.input-in-0.consumer.batch-mode=true
# for DLT listener
spring.kafka.consumer.auto-offset-reset=earliest
[foo]
2021-09-14 09:55:32.838ERROR...
...
[foo]
2021-09-14 09:55:37.873ERROR...
...
[foo]
2021-09-14 09:55:42.886ERROR...
...
From DLT: foo

How to log offset in KStreams Bean using spring-kafka and kafka-streams

I have gone through almost all the questions about logging the offset in KStreams via the Processor API's transform() or process() method, such as this one:
How can I get the offset value in KStream
But I'm not able to get a solution from those answers, so I'm asking this question.
I want to log the partition, consumer group id and offset each time a message is consumed by the stream. I don't understand how to integrate the process() or transform() method with the ProcessorContext API. And if I implement the Processor interface in my CustomParser class, I would have to implement all of its methods, and I'm not sure that will work, as mentioned in the Confluent docs for record metadata - https://docs.confluent.io/current/streams/developer-guide/processor-api.html#streams-developer-guide-processor-api
I've set up KStreams in a Spring Boot application as below (variable names have been changed for reference):
@Bean
public Set<KafkaStreams> myKStreamJson(StreamsBuilder profileBuilder) {
Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final KStream<String, JsonNode> pStream = profileBuilder.stream(inputTopic, Consumed.with(Serdes.String(), jsonSerde));
Properties props = streamsConfig.kStreamsConfigs().asProperties();
pStream
.map((key, value) -> {
try {
return CustomParser.parse(key, value);
} catch (Exception e) {
LOGGER.error("Error occurred - " + e.getMessage());
}
return new KeyValue<>(null, null);
}
)
.filter((key, value) -> {
try {
return MessageFilter.filterNonNull(key, value);
} catch (Exception e) {
LOGGER.error("Error occurred - " + e.getMessage());
}
return false;
})
.through(
outputTopic,
Produced.with(Serdes.String(), new JsonPOJOSerde<>(TransformedMessage.class)));
return Sets.newHashSet(
new KafkaStreams(profileBuilder.build(), props)
);
}

Implement Transformer; save off the ProcessorContext in init(); you can then access the record metadata in transform() and simply return the original key/value.
Here is an example of a Transformer. It is provided by Spring for Apache Kafka to invoke a Spring Integration flow to transform the key/value.

How to implement FlinkKafkaProducer serializer for Kafka 2.2

I've been working on updating a Flink processor (Flink version 1.9) that reads from Kafka and then writes to Kafka. We wrote this processor to run against a Kafka 0.10.2 cluster, and we have now deployed a new Kafka cluster running version 2.2. I therefore set out to update the processor to use the latest FlinkKafkaConsumer and FlinkKafkaProducer (as suggested by the Flink docs). However, I've run into some problems with the Kafka producer. I'm unable to get it to serialize data using the deprecated constructors (not surprising), and I've been unable to find any implementations or examples online showing how to implement a serializer (all the examples use older Kafka connectors).
The current implementation (for Kafka 0.10.2) is as follows
FlinkKafkaProducer010<String> eventBatchFlinkKafkaProducer = new FlinkKafkaProducer010<String>(
"playerSessions",
new SimpleStringSchema(),
producerProps,
(FlinkKafkaPartitioner) null
);
When trying to implement the following FlinkKafkaProducer
FlinkKafkaProducer<String> eventBatchFlinkKafkaProducer = new FlinkKafkaProducer<String>(
"playerSessions",
new SimpleStringSchema(),
producerProps,
null
);
I get the following error:
Exception in thread "main" java.lang.NullPointerException
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.<init>(FlinkKafkaProducer.java:525)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.<init>(FlinkKafkaProducer.java:483)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.<init>(FlinkKafkaProducer.java:357)
at com.ebs.flink.sessionprocessor.SessionProcessor.main(SessionProcessor.java:122)
and I haven't been able to figure out why.
The constructor for FlinkKafkaProducer is also deprecated and when I try implementing the non-deprecated constructor I can't figure out how to serialize the data.
The following is how it would look:
FlinkKafkaProducer<String> eventBatchFlinkKafkaProducer = new FlinkKafkaProducer<String>(
"playerSessions",
new KafkaSerializationSchema<String>() {
@Override
public ProducerRecord<byte[], byte[]> serialize(String s, @Nullable Long aLong) {
return null;
}
},
producerProps,
FlinkKafkaProducer.Semantic.EXACTLY_ONCE
);
But I don't understand how to implement the KafkaSerializationSchema and I find no examples of this online or in the Flink docs.
Does anyone have any experience implementing this, or any tips on why the FlinkKafkaProducer throws a NullPointerException in this step?

If you are just sending String to Kafka:
public class ProducerStringSerializationSchema implements KafkaSerializationSchema<String>{
private String topic;
public ProducerStringSerializationSchema(String topic) {
super();
this.topic = topic;
}
@Override
public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
return new ProducerRecord<byte[], byte[]>(topic, element.getBytes(StandardCharsets.UTF_8));
}
}
For sending a Java Object:
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;
public class ObjSerializationSchema implements KafkaSerializationSchema<MyPojo>{
private String topic;
private ObjectMapper mapper;
public ObjSerializationSchema(String topic) {
super();
this.topic = topic;
}
@Override
public ProducerRecord<byte[], byte[]> serialize(MyPojo obj, Long timestamp) {
byte[] b = null;
if (mapper == null) {
mapper = new ObjectMapper();
}
try {
b= mapper.writeValueAsBytes(obj);
} catch (JsonProcessingException e) {
// TODO
}
return new ProducerRecord<byte[], byte[]>(topic, b);
}
}
In your code:
.addSink(new FlinkKafkaProducer<>(producerTopic, new ObjSerializationSchema(producerTopic),
params.getProperties(), FlinkKafkaProducer.Semantic.EXACTLY_ONCE));

To deal with the timeout in the case of FlinkKafkaProducer.Semantic.EXACTLY_ONCE, you should read https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-011-and-newer, particularly this part:
Semantic.EXACTLY_ONCE mode relies on the ability to commit transactions that were started before taking a checkpoint, after recovering from the said checkpoint. If the time between Flink application crash and completed restart is larger than Kafka’s transaction timeout there will be data loss (Kafka will automatically abort transactions that exceeded timeout time). Having this in mind, please configure your transaction timeout appropriately to your expected down times.
Kafka brokers by default have transaction.max.timeout.ms set to 15 minutes. This property will not allow to set transaction timeouts for the producers larger than it’s value. FlinkKafkaProducer011 by default sets the transaction.timeout.ms property in producer config to 1 hour, thus transaction.max.timeout.ms should be increased before using the Semantic.EXACTLY_ONCE mode.
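As a rough illustration of the quoted advice, the sketch below sets the producer's transaction.timeout.ms explicitly and rejects values above the broker limit. The class and method names are hypothetical, and the 15-minute ceiling is an assumption taken from the broker default quoted above; if transaction.max.timeout.ms has been raised on your brokers, adjust the constant accordingly.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Flink or Kafka.
final class ExactlyOnceProducerProps {

    // Assumed broker-side limit: the default transaction.max.timeout.ms (15 minutes).
    static final long BROKER_MAX_TXN_TIMEOUT_MS = 15 * 60 * 1000L;

    private ExactlyOnceProducerProps() {
    }

    // Returns a copy of the given producer properties with transaction.timeout.ms set.
    // The input properties are left untouched.
    static Properties withTransactionTimeout(Properties base, long timeoutMs) {
        if (timeoutMs <= 0 || timeoutMs > BROKER_MAX_TXN_TIMEOUT_MS) {
            throw new IllegalArgumentException(
                    "transaction.timeout.ms must be in (0, " + BROKER_MAX_TXN_TIMEOUT_MS
                            + "] but was " + timeoutMs);
        }
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("transaction.timeout.ms", Long.toString(timeoutMs));
        return props;
    }
}
```

The returned Properties would then be used as the producer configuration, for example in place of producerProps in the FlinkKafkaProducer constructor from the question.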

KafkaProducer is always picking localhost:8081 for schema registry in Java API

KafkaProducer is not picking up the schema.registry.url defined in its properties.
As the following screenshot shows, the schema registry URL is a dummy URL:
// variable which is being debugged
private KafkaProducer<K, V> kafkaFullAckProducer;
But in my logs, publishing messages with KafkaProducer still fails with the host http://0:8081:
{"@timestamp":"2018-10-31T18:57:37.906+05:30","message":"Failed to send HTTP request to endpoint: http://0:8081/subjects/
Both of the above were captured in a single run of the program. The schema registry URL shown while debugging in Eclipse is 123.1.1.1, but it is http://0 in my failure logs.
Because of this, I am not able to use a different schema.registry.url in my other environment; it always uses http://0.
The code is hosted on one machine, and the schema registry and broker are on another.
The registry was started in the development environment with ./confluent start
My Producer Code :
private KafkaProducer<K, V> kafkaFullAckProducer;
MonitoringConfig config;
public void produceWithFullAck(String brokerTopic, V genericRecord) throws Exception {
// key is null
ProducerRecord<K, V> record = new ProducerRecord<K, V>(brokerTopic, genericRecord);
try {
Future<RecordMetadata> futureHandle = this.kafkaFullAckProducer.send(record, (metadata, exception) -> {
if (metadata != null) {
log.info("Monitoring - Sent record(key=" + record.key() + " value=" + record.value()
+ " meta(partition=" + metadata.partition() + " offset=" + metadata.offset() + ")");
}
});
RecordMetadata recordMetadata = futureHandle.get();
} catch (Exception e) {
if (e.getCause() != null)
log.error("Monitoring - " + e.getCause().toString());
throw new RuntimeException(e.getMessage(), e.getCause());
} finally {
// initializer.getKafkaProducer().close();
}
}
@PostConstruct
private void initialize() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, config.getBrokerList());
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
// kafkaProps.put("value.serializer",
// "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomAvroSerializer.class.getName());
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, Constants.COMPRESSION_TYPE_CONFIG);
props.put(ProducerConfig.RETRIES_CONFIG, config.getProducerRetryCount());
props.put("schema.registry.url", config.getSchemaRegistryUrl()); // the url is coming right here
props.put("acks", "all");
kafkaFullAckProducer = new KafkaProducer<K, V>(props);
}
MonitoringConfig :
public class MonitoringConfig {
@Value("${kafka.zookeeper.connect}")
private String zookeeperConnect;
@Value("${kafka.broker.list}")
private String brokerList;
@Value("${consumer.timeout.ms:1000}")
private String consumerTimeout;
@Value("${producer.retry.count:1}")
private String producerRetryCount;
@Value("${schema.registry.url}")
private String schemaRegistryUrl;
@Value("${consumer.enable:false}")
private boolean isConsumerEnabled;
@Value("${consumer.count.thread}")
private int totalConsumerThread;
}
application.properties :
kafka.zookeeper.connect=http://localhost:2181
kafka.broker.list=http://localhost:9092
consumer.timeout.ms=1000
producer.retry.count=1
schema.registry.url=http://123.1.1.1:8082
The custom Avro serializer is something I need to deprecate in favor of the approach discussed here, but I am sure it is not the cause of this problem.
Here are the details of the hosts:
HOST 1: Has this Java service and Kafka Connect; the error logs appear here.
HOST 2: Has Kafka, Schema Registry and ZooKeeper.

You're using a custom serializer, and as part of the Serializer implementation you must define a configure method that accepts a Map.
Within that method, I'm guessing you defined a CachedSchemaRegistryClient field but did not extract the URL property, which is added at the Producer level, from the config map, so it defaults to some other localhost address.
The Confluent code requires stepping through four classes, but you'll see the Serializer implementation here; then look at the separate Config class, as well as the abstract serializer and SerDe parent classes. In your previous post, I pointed out that I didn't think you actually needed a custom Avro serializer, because you seemed to be redoing what the AbstractKafkaAvroSerializer class does.
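A minimal sketch of the lookup described above, written without the Kafka client dependency so that it can run on its own. The class name RegistryUrlConfig is hypothetical; in a real serializer this logic would sit inside configure(Map<String, ?> configs, boolean isKey), and the returned URL would be handed to the schema registry client rather than letting it fall back to a default.

```java
import java.util.Map;

// Hypothetical stand-in for the lookup a custom serializer's configure() should perform.
final class RegistryUrlConfig {

    // Same key the question passes to the producer properties.
    static final String SCHEMA_REGISTRY_URL = "schema.registry.url";

    private RegistryUrlConfig() {
    }

    // Reads the registry URL from the producer-level config map and fails fast
    // when it is missing, instead of silently falling back to a default address.
    static String schemaRegistryUrl(Map<String, ?> configs) {
        Object url = configs.get(SCHEMA_REGISTRY_URL);
        if (url == null || url.toString().trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "Missing required config: " + SCHEMA_REGISTRY_URL);
        }
        return url.toString().trim();
    }
}
```

Failing fast here is a design choice for the sketch: a missing URL surfaces when the producer is constructed, not later as an HTTP error against an unexpected host.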
