I am writing code first time in java to consume AVRO data from kafka topic. I am using kafka-avro-console-producer to produce record.I am using leneseio/fast-data-dev image on Docker to UP kafka stack.
Producing record :
root#fast-data-dev / $ kafka-avro-console-producer --broker-list localhost:9092 --topic payengine --property schema.registry.url=http://localhost:8081 --property value.schema='{"type":"record", "name":"payengine", "fields":[{"name":"tin", "type":"string"},{"name":"ach","type":"string"}] }'
{"tin":"61582","ach":"I"}
{"tin":"97820","ach":"I"}
Now, to read this record, I have written below code. Also, it seems like I don't have to refer the schema while consuming records (as mentioned in the below reference link). I had also gone through one example, where SpecificAvroRecord was being used in place of GenericRecord, but that requires building class based on schema. I am not sure how GenericRecord points to correct schema, but don't see any schema reference in the example.
package com.github.psingh.Kafka;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class SimpleConsumer_AvroSchema {
public static void main(String[] args) {
// System.out.println("Hello Kafka ");
// setting properties
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
//name topic
String topic = "payengine";
// create the consumer
KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
//subscribe to topic
consumer.subscribe(Collections.singleton(topic));
System.out.println("Waiting for the data...");
while (true) {
ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofMillis(5000));
for(ConsumerRecord<String,GenericRecord> record: records) {
System.out.print(record.value());
}
// consumer.commitSync();
}
}
}
The code built was successful.I was hoping to get console produced record here, but I am not getting anything :
Please suggest.
I have taken reference from here :
https://docs.confluent.io/current/schema-registry/serdes-develop/serdes-avro.html
Related
Hi I'm getting trouble in starting kafka producer using java. please help me if u know the proper solution. below is the code i've used. I've went throgh various solutions on statck overflow. and tried some of them but they didn't solved the issue.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
public class KafkaProducerClass {
public static void main(String[] args) {
Properties properties = new Properties();
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "9092");
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
//Thread.currentThread().setContextClassLoader(null);
Producer<String, String> producer = new KafkaProducer<>(properties);
for(int i=0;i<20;i++) {
ProducerRecord<String, String> producerRecord = new ProducerRecord<>("TestTopic", "Message from java");
producer.send(producerRecord);
}
producer.close();
}
}
The Exception i got is:
Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:434)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:298)
at com.innominds.producer.KafkaProducerClass.main(KafkaProducerClass.java:21)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid url in bootstrap.servers: 9092
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:59)
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:408)
... 2 more
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "9092");
9092 is not a valid DNS name in your network.
You must provide a valid IP or host along with the port you want to connect to
I would suggest using a higher level library such as Vertx or Quarkus or Spring for simpler configuration options
I have a cloudera Quickstart VM. i have installed Kafka parcels using Cloudera Manager and its working fine inside the VM using console based consumer and producer.
But when i try to use java based consumer it does not produce or consume messages. I can list the topics.
But i cannot consume messages.
following is my code.
package kafka_consumer;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
public class mclass {
public static void main(String[] args) {
Properties props = new Properties();
props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.0.75.1:9092");
// Just a user-defined string to identify the consumer group
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
// Enable auto offset commit
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
// List of topics to subscribe to
consumer.subscribe(Arrays.asList("second_topic"));
for (String k_topic : consumer.listTopics().keySet()) {
System.out.println(k_topic);
}
while (true) {
try {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.printf("Offset = %d\n", record.offset());
System.out.printf("Key = %s\n", record.key());
System.out.printf("Value = %s\n", record.value());
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
}
And following is the output of the code. while the console producer is producing messages but consumer is not able to receive it.
PS: i can telnet the port and ip of the Kafka broker. I can even list the topics. Consumer is constantly running without crashing but no messages is being consumed.
While trying timestamp in ProducerRecord; I found something weird. After sending few messages from the producer, I ran kafka-console-consumer.sh and verified that those messages are in the topic. I stopped the producer and waited for a minute. When I reran kafka-console-consumer.sh then it did not show the messages that I generated previously. I also added producer.flush() and producer.close() but the outcome was still the same.
Now, when I stopped using timestamp field then everything worked fine which makes me believe that there is something finicky about messages with timestamp.
I am using Kafka_2.11-2.0.0 (released on July 30, 2018)
Following is the sample code.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internal.RecordHeaders;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
import static java.lang.Thread.sleep;
public class KafkaProducerSample{
public static void main(String[] args){
String kafkaHost="sample:port";
String notificationTopic="test";
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, this.kafkaHost);
props.put(ProducerConfig.ACKS_CONFIG, 1);
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
Producer<String, String> producer = new KafkaProducer(props, new StringSerialize(), new StringSerializer);
RecordHeaders recordHeaders = new RecordHeader();
ProducerRecord<String, String> record = new ProducerRecord(notificationTopic, null, 1574443515L, sampleKey, SampleValue);
producer.send(record);
sleep(1000);
}
}
I run console consumer as following
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap.server KAFKA_HOST:PORT --topic test --from-beginning
#output after running producer
test
#output 5mins after shutting down producer
You are asynchronously sending only one record, but not ack-ing or flushing the buffer.
You will need to send more records,
or
producer.send(record).get();
or
producer.send(record);
producer.flush();
or (preferred), do Runtime.addShutdownHook() in your main method to flush and close the producer
I'm developing a simple java with spark streaming.
I configured a kafka jdbc connector (postgres to topic) and I wanna read it with a spark streaming consumer.
I'm able to read to topic correctly with:
./kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --property print.key=true --from-beginning --topic postgres-ip_audit
getting this results:
null
{"id":1557,"ip":{"string":"90.228.176.138"},"create_ts":{"long":1554819937582}}
when I use my java application with this config:
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "groupStreamId");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);
I get results like that:
�179.20.119.53�����Z
Can someone point me how to fix my issue?
I try also to use a ByteArrayDeserializer and convert the bytes[] in to a string but I get always bad character results.
You can deserialize avro messages using io.confluent.kafka.serializers.KafkaAvroDeserializer and having schema registry in to manage the records schema.
Here is a sample code snippet
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import io.confluent.kafka.serializers.KafkaAvroDecoder;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;
public class SparkStreaming {
public static void main(String... args) {
SparkConf conf = new SparkConf();
conf.setMaster("local[2]");
conf.setAppName("Spark Streaming Test Java");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(10));
processStream(ssc, sc);
ssc.start();
ssc.awaitTermination();
}
private static void processStream(JavaStreamingContext ssc, JavaSparkContext sc) {
System.out.println("--> Processing stream");
Map<String, String> props = new HashMap<>();
props.put("bootstrap.servers", "localhost:9092");
props.put("schema.registry.url", "http://localhost:8081");
props.put("group.id", "spark");
props.put("specific.avro.reader", "true");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
Set<String> topicsSet = new HashSet<>(Collections.singletonList("test"));
JavaPairInputDStream<String, Object> stream = KafkaUtils.createDirectStream(ssc, String.class, Object.class,
StringDecoder.class, KafkaAvroDecoder.class, props, topicsSet);
stream.foreachRDD(rdd -> {
rdd.foreachPartition(iterator -> {
while (iterator.hasNext()) {
Tuple2<String, Object> next = iterator.next();
Model model = (Model) next._2();
System.out.println(next._1() + " --> " + model);
}
}
);
});
}
}
Complete sample application is available in this github repo
You provided a StringDeserializer however you are sending values serialized with avro so you need to deserialized them accordingly. Using spark 2.4.0 (and the following deps compile org.apache.spark:spark-avro_2.12:2.4.1 you can achieve it by using from_avro function:
import org.apache.spark.sql.avro._
// `from_avro` requires Avro schema in JSON string format.
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("path/to/your/schema.avsc")))
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("subscribe", "topic1")
.load()
Dataset<Row> output = df
.select(from_avro(col("value"), jsonFormatSchema).as("user"))
.where("user.favorite_color == \"red\"")
.show()
If you need to use a schema registry (like you did with kafka-avro-console-consumer) it's not possible out of the box and need a to write a lot of code. I'll recommend using this lib https://github.com/AbsaOSS/ABRiS. However it's only compatible with spark 2.3.0
Is there a way to start a consumer from a specific offset using the initial properties that we pass
I know there is props.put("auto.offset.reset", "earliest") but that gets me to the beginning.
However I want to go back and my scenarios are as follows
Specify an offset where I want to start at
Specify the time where I want to start
And I want to do that using the initial properties as a preferred option.
If that is not possible then using some other mechanism
Attaching my Simple Consumer code for reference
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class SimpleConsumer {
public static void main(String[] args) throws Exception {
String topicName = "test3";
Properties props = new Properties();
String groupId = "single";
// Kafka consumer configuration settings
props.put("bootstrap.servers", "mymachine:9092");
props.put("group.id", groupId);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topicName));
System.out.println("Starting the _NON-BATCH_ consumer ::: Topic=" + topicName+" GroupId="+groupId);
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.printf("%s (offset:%d, key:%s, partition = %s, topic = %s)", record.value(), record.offset(), record.key(), record.partition(), record.topic());
System.out.println();
}
}
}
}
For scenario 1, you can use KafkaConsumer.seek(TopicPartition, offset) to specify the offset from which you read.
For scenario 2, Kafka 0.10.1.0 offers KafkaConsumer.offsetsForTimes method, allowing you to lookup the offsets for the given partitions by timestamp, then invoking seek() method to retrieve the desired messages you want.