I am writing a Kafka Streams application that reads data from the topic "topic_one" (the data originally comes from MySQL). I then want to take a part of this data (the "after" section, see below) with the KStream interface and apply further operations to it. But I get a serialization error when I use mapValues(). I am new to Kafka Streams and have no idea how to create and use a proper Serde. Can anybody help me?
Source data from topic_one:
[KSTREAM-SOURCE-0000000000]: null, {"before": null, "after": {"id": 1, "category": 1, "item": "abc"}, "source": {"version": "0.8.3.Final", "name": "example", "server_id": 1, "ts_sec": 1581491071, "gtid": null, "file": "mysql-bin.000013", "pos": 217827349, "row": 0, "snapshot": false, "thread": 95709, "db": "example", "table": "item", "query": null}, "op": "c", "ts_ms": 1581491071727}
I want to get:
{"id": 1, "category": 1, "item": "abc"}
My code:
Properties properties = getProperties();

try {
    StreamsBuilder builder = new StreamsBuilder();

    KStream<String, String> resourceStream = builder.stream("topic_one");
    resourceStream.print(Printed.toSysOut());

    KStream<String, String> resultStream = resourceStream.mapValues(value ->
            new Gson().fromJson(value, JsonObject.class).get("after").getAsJsonObject().toString());
    resultStream.print(Printed.toSysOut());

    Topology topology = builder.build();
    KafkaStreams streams = new KafkaStreams(topology, properties);
    streams.cleanUp();
    streams.start();
} catch (Exception e) {
    System.out.println(e.getMessage());
}
}
private static Properties getProperties() {
    Properties properties = new Properties(); // TODO: move these settings to a separate file?
    properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "app_id");
    properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    properties.put("schema.registry.url", "http://localhost:8081");
    return properties;
}
Error:
Exception in thread "streams_id-db618fbf-c3e4-468b-a5a2-18e6b0b9c6be-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=matomo.matomo.matomo_scenarios_directory, partition=0, offset=30, stacktrace=org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: unknown because key is null, and value: org.apache.avro.generic.GenericData$Record.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:122)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:133)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:429)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:474)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:536)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:792)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.String
at org.apache.kafka.streams.kstream.internals.AbstractStream.lambda$withKey$1(AbstractStream.java:103)
at org.apache.kafka.streams.kstream.internals.KStreamMapValues$KStreamMapProcessor.process(KStreamMapValues.java:40)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:118)
... 10 more
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:446)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:474)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:536)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:792)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: unknown because key is null, and value: org.apache.avro.generic.GenericData$Record.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:122)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:133)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:429)
... 5 more
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.String
at org.apache.kafka.streams.kstream.internals.AbstractStream.lambda$withKey$1(AbstractStream.java:103)
at org.apache.kafka.streams.kstream.internals.KStreamMapValues$KStreamMapProcessor.process(KStreamMapValues.java:40)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:118)
... 10 more
In your getProperties() method, you defined your value serde as GenericAvroSerde.class, but when you create the streams, you use String as the value type. That's why you get the exception at runtime.
KStream<String, String> resourceStream = ...
KStream<String, String> resultStream = ...
If you really use Avro as the message format, then you have to use the correct types when defining your KStream. But as it seems, you just have JSON strings as values, so you can probably set the correct value serde by replacing
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
with
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
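If you do want to keep Avro end to end instead, a minimal sketch of that variant could look like this (assuming the values really deserialize to org.apache.avro.generic.GenericRecord, as the exception suggests, and that the Debezium envelope exposes an "after" field):
// Keep the GenericAvroSerde default and type the stream as GenericRecord
KStream<String, GenericRecord> resourceStream = builder.stream("topic_one");

// Extract the "after" record and emit it as a JSON-like string (GenericRecord#toString)
KStream<String, String> resultStream = resourceStream.mapValues(value ->
        value.get("after") == null ? null : value.get("after").toString());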
Hope it helps.
Related
I ran a Spark application where I joined two datasets to form one dataset, and using an Encoder I converted Dataset<Row> into Dataset<T> format.
Encoder looks as follows:
Encoder<RuleParamsBean> encoder = Encoders.bean(RuleParamsBean.class);
Dataset<RuleParamsBean> ds = new Dataset<RuleParamsBean>(sparkSession, finalJoined.logicalPlan(), encoder);
Dataset<RuleParamsBean> validateDataset = ds.map(rulesParamBean -> validateTransaction(rulesParamBean),encoder);
validateDataset.show();
After the map operation over the dataset, I am getting the following error:
Error Log
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'named_struct()' due to data type mismatch: input to function named_struct requires at least one argument;;
Relation[TXN_DETAIL_ID#0,TXN_HEADER_ID#1,TXN_SOURCE_CD#2,TXN_REC_TYPE_CD#3,TXN_DTTM#4,EXT_TXN_NBR#5,CUST_REF_NBR#6,CIS_DIVISION#7,ACCT_ID#8,TXN_VOL#9,TXN_AMT#10,CURRENCY_CD#11,MANUAL_SW#12,USER_ID#13,HOW_TO_USE_TXN_FLG#14,MESSAGE_CAT_NBR#15,MESSAGE_NBR#16,UDF_CHAR_1#17,UDF_CHAR_2#18,UDF_CHAR_3#19,UDF_CHAR_4#20,UDF_CHAR_5#21,UDF_CHAR_6#22,UDF_CHAR_7#23,... 102 more fields] JDBCRelation(CI_TXN_DETAIL_STG_DUMMY) [numPartitions=1]
Relation[ACCT_ID#377,ACCT_NBR_TYPE_CD#378,ACCT_NBR#379,VERSION#380,PRIM_SW#381] JDBCRelation(CI_ACCT_NBR_DUMMY) [numPartitions=1]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:116)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:120)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:120)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
at org.apache.spark.sql.Dataset.map(Dataset.scala:2569)
at com.sample.Transformation.main(Transformation.java:100)
For me the issue was caused by an unsupported type. I was using LocalDate, which is not supported in Spark 2.x as far as I know (I think support for it was added in version 3).
I simply changed it from LocalDate to Timestamp and it worked. Check whether this is the case for you as well: is there any type in your POJO that is not supported?
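For illustration, a hypothetical bean field before and after the change (the field name is made up, not from the original code):
public class RuleParamsBean implements java.io.Serializable {
    // private java.time.LocalDate txnDate;   // not handled by Encoders.bean in Spark 2.x
    private java.sql.Timestamp txnDate;        // java.sql.Timestamp is a supported SQL type
    public java.sql.Timestamp getTxnDate() { return txnDate; }
    public void setTxnDate(java.sql.Timestamp txnDate) { this.txnDate = txnDate; }
}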
I am trying to unit test my Kafka consumer. I am trying to use the MockConsumer class which comes with the kafka-clients Java API.
Below is my configuration code
@Bean
public MockConsumer<String, String> consumer() {
    MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.LATEST);
    consumer.assign(Arrays.asList(new TopicPartition("test-topic", 0)));

    HashMap<TopicPartition, Long> beginningOffsets = new HashMap<>();
    beginningOffsets.put(new TopicPartition("test-topic", 0), 0L);
    consumer.updateBeginningOffsets(beginningOffsets);

    consumer.addRecord(new ConsumerRecord<String, String>("test-topic", 0, 0L, "mykey", "myvalue0"));
    consumer.addRecord(new ConsumerRecord<String, String>("test-topic", 0, 1L, "mykey", "myvalue1"));
    consumer.addRecord(new ConsumerRecord<String, String>("test-topic", 0, 2L, "mykey", "myvalue2"));
    consumer.addRecord(new ConsumerRecord<String, String>("test-topic", 0, 3L, "mykey", "myvalue3"));
    consumer.addRecord(new ConsumerRecord<String, String>("test-topic", 0, 4L, "mykey", "myvalue4"));

    HashMap<TopicPartition, Long> endOffsets = new HashMap<>();
    endOffsets.put(new TopicPartition("test-topic", 0), 4L);
    consumer.updateEndOffsets(endOffsets);

    return consumer;
}
Now, when I use this MockConsumer bean in my test case like below,
@Autowired
MockConsumer<String, String> kafkaConsumer;

@Autowired
@InjectMocks
MyConsumer myConsumer; // this is the class containing the consumer code; it is the class under test

@Test
public void testConsumeWithAutoAssignment() throws Exception {
    myConsumer.consumeTopic("test-topic");
}
I am getting an exception from
kafkaConsumer.subscribe(topic)
java.lang.IllegalStateException: Subscription to topics, partitions and pattern are mutually exclusive
Please let me know if anyone has found the issue or fixed this.
This is because in the bean you are using consumer.assign(Arrays.asList(new TopicPartition("test-topic", 0)));, which means the consumer wants to consume from a specific partition (0) of "test-topic". Then somewhere (I don't see where in the code you provided) there is a call to subscribe(topic). With subscribe, the consumer becomes part of a consumer group and the Kafka broker assigns partitions automatically (for rebalancing). You can't use both: assigning specific partitions (user-defined) and subscribing with automatic assignment.
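A minimal sketch of one way to test the subscribe path, assuming the class under test calls subscribe("test-topic") itself: drop the assign() call from the bean and simulate the broker-side assignment with rebalance() inside a scheduled poll task.
MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST);
TopicPartition tp = new TopicPartition("test-topic", 0);

HashMap<TopicPartition, Long> beginningOffsets = new HashMap<>();
beginningOffsets.put(tp, 0L);
consumer.updateBeginningOffsets(beginningOffsets);

// No consumer.assign(...) here, because the code under test calls subscribe().
// On the first poll, simulate the group rebalance and then hand the consumer a record.
consumer.schedulePollTask(() -> {
    consumer.rebalance(Collections.singletonList(tp));
    consumer.addRecord(new ConsumerRecord<>("test-topic", 0, 0L, "mykey", "myvalue0"));
});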
I faced the same issue. The workaround is to assign new TopicPartition(topic, 0) to a variable and use that. The following code works for me:
TopicPartition topicPartition = new TopicPartition(topic, 0);
consumer.assign(Arrays.asList(topicPartition));
HashMap<TopicPartition, Long> beginningOffsets = new HashMap<>();
beginningOffsets.put(topicPartition, 0L);
consumer.updateBeginningOffsets(beginningOffsets);
consumer.addRecord(new ConsumerRecord<>(topic, 0, 0L, "some-key", "some-value"));
I am using Confluent 3.2.1 for Kafka Streams. I am trying to aggregate my KGroupedStream<String, MyClass1> into KTable<Windowed<String>, MsgAggr>. In the aggregation I am also using TimeWindows.of(TimeUnit.SECONDS.toMillis(5)), and I pass a user-defined Serde as an argument to the aggregation. The code for the user-defined Serde is:
Map<String, Object> serdeProps = new HashMap<>();
final Serializer<MsgAggr> pageViewSerializer = new JsonPOJOSerializer<>();
serdeProps.put("JsonPOJOClass", MsgAggr.class);
pageViewSerializer.configure(serdeProps, false);
final Deserializer<MsgAggr> pageViewDeserializer = new JsonPOJODeserializer<>();
serdeProps.put("JsonPOJOClass", MsgAggr.class);
pageViewDeserializer.configure(serdeProps, false);
final Serde<MsgAggr> pageViewSerde = Serdes.serdeFrom(pageViewSerializer, pageViewDeserializer);
The streaming code is:
KGroupedStream<String, MyClass1> msg_grp = message
        .groupByKey();

KTable<Windowed<String>, MsgAggr> msg_win = msg_grp
        //.reduce(new Reduced(), arg1, arg2);
        .aggregate(new Init(),
                new Aggr(),
                TimeWindows.of(TimeUnit.SECONDS.toMillis(5)),
                pageViewSerde,
                "MySample_out");
When I run the code I get the following errors:
[2017-05-23 18:16:45,648] ERROR stream-thread [StreamThread-1] Streams application error during processing: (org.apache.kafka.streams.processor.internals.StreamThread:249)
java.lang.ClassCastException: my.kafka.strm.MyClass1 cannot be cast to java.lang.String
at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:24)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:64)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:44)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.kstream.internals.KStreamMap$KStreamMapProcessor.process(KStreamMap.java:43)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:66)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:180)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:436)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
Exception in thread "StreamThread-1" java.lang.ClassCastException: my.kafka.strm.MyClass1 cannot be cast to java.lang.String
at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:24)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:64)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:44)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.kstream.internals.KStreamMap$KStreamMapProcessor.process(KStreamMap.java:43)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:82)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:202)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:66)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:180)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:436)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
The problem is with message.groupByKey();. It's using the String serde for your custom class MyClass1. Please implement a custom serializer and deserializer for MyClass1 and use them in the overloaded version of groupByKey - https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/kstream/KStream.html#groupByKey(org.apache.kafka.common.serialization.Serde,%20org.apache.kafka.common.serialization.Serde)
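A minimal sketch, assuming you build a MyClass1 serde the same way the question builds pageViewSerde (the myClass1Serializer and myClass1Deserializer names are placeholders):
// Build a Serde for MyClass1 with the same JsonPOJOSerializer/JsonPOJODeserializer approach
final Serde<MyClass1> myClass1Serde = Serdes.serdeFrom(myClass1Serializer, myClass1Deserializer);

// Pass explicit serdes so the repartition topic is not written with the default String serde
KGroupedStream<String, MyClass1> msg_grp = message
        .groupByKey(Serdes.String(), myClass1Serde);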
I use spring-cloud-stream-schema to read Avro messages from Kafka. I configured the input channel in MessagesChannels:
#Input("topicName1")
SubscribableChannel fromInput1();
I have a configuration file like this:
@Configuration
@EnableBinding(MessagesChannels.class)
@EnableSchemaRegistryClient
public class MessageConfiguration {
    @Bean
    public MessageConverter topic1MessageConverter() throws IOException {
        return new AvroSchemaMessageConverter(MimeType.valueOf("avro/bytes"));
    }
}
And my consumer is called with
fromInput1().subscribe(this::onMessage);
void onMessage(Message message) {
}
When I actually sent a message I got this error:
nested exception is java.lang.ClassCastException:
org.apache.avro.generic.GenericData$Record cannot be cast to [B
Actually the raw bytes are parsed correctly into org.apache.avro.generic.GenericData$Record. But Spring requires a Message class. How can I cast GenericData$Record to Message, or cast GenericData$Record directly to the class generated by avro-tools?
More details:
2017-03-06 11:23:10.695 ERROR 19690 --- [afka-listener-1] o.s.kafka.listener.LoggingErrorHandler : Error while processing: ConsumerRecord(topic = topic1, partition = 0, offset = 7979, CreateTime = 1488784987569, checksum = 623709057, serialized key size = -1, serialized value size = 36, key = null, value = {"foor": "bar"})
org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$ReceivingHandler#4bf9d802]; nested exception is java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to [B
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:139)
at org.springframework.integration.channel.FixedSubscriberChannel.send(FixedSubscriberChannel.java:70)
at org.springframework.integration.channel.FixedSubscriberChannel.send(FixedSubscriberChannel.java:64)
I think you need to set the contentType for the incoming message channel to application/*+avro, as specified here.
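For example (a sketch, assuming you configure it in application.properties and that the binding name matches the channel name topicName1):
spring.cloud.stream.bindings.topicName1.contentType=application/*+avro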
I am trying to write a Parquet read/write class for a certain class type using DataFrames/Datasets.
class schema:
class A {
    long count;
    List<B> listOfValues;
}

class B {
    String id;
    long count;
}
Code:
String path = "some path";
List<A> entries = somerandomAentries();
JavaRDD<A> rdd = sc.parallelize(entries, 1);
DataFrame df = sqlContext.createDataFrame(rdd, A.class);
df.write().parquet(path);
DataFrame newDataDF = sqlContext.read().parquet(path);
newDataDF.show();
When I try to run this, it throws an error. What am I missing here? Do I need to provide a schema for the whole class while creating the data frame?
Error:
Caused by: scala.MatchError: B(Id=abc, count=0) (of class B)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:169)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:153)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1358)
at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1356)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:263)
... 8 more
You are getting the error because nested JavaBeans are not supported in Spark 1.6. Please see https://spark.apache.org/docs/1.6.0/sql-programming-guide.html#inferring-the-schema-using-reflection
Currently, Spark SQL does not support JavaBeans that contain nested or contain complex types such as Lists or Arrays.
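One possible workaround on Spark 1.6 is to skip bean reflection and build the DataFrame from Rows with an explicit schema. A sketch, assuming A and B expose getters for the fields shown (getCount(), getListOfValues(), getId()):
// Uses org.apache.spark.sql.types.DataTypes, org.apache.spark.sql.RowFactory and java.util collections
StructType bSchema = DataTypes.createStructType(new StructField[]{
        DataTypes.createStructField("id", DataTypes.StringType, true),
        DataTypes.createStructField("count", DataTypes.LongType, false)
});
StructType aSchema = DataTypes.createStructType(new StructField[]{
        DataTypes.createStructField("count", DataTypes.LongType, false),
        DataTypes.createStructField("listOfValues", DataTypes.createArrayType(bSchema), true)
});

// Convert each A into a Row, with the nested B values as a List<Row> for the array column
JavaRDD<Row> rows = rdd.map(a -> {
    List<Row> nested = new ArrayList<>();
    for (B b : a.getListOfValues()) {
        nested.add(RowFactory.create(b.getId(), b.getCount()));
    }
    return RowFactory.create(a.getCount(), nested);
});

DataFrame df = sqlContext.createDataFrame(rows, aSchema);
df.write().parquet(path);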