Spark serialization error when inserting Spark Streaming data into HBase (Java)

I'm confused about how Spark interacts with HBase in terms of data format. For instance, when I omit the line marked 'ERROR' in the following code snippet, it runs fine, but when I add the line, I get a 'Task not serializable' error related to serialization.
How do I change the code?
Why does the error happen?
My code is as follows:
// HBase
Configuration hconfig = HBaseConfiguration.create();
hconfig.set("hbase.zookeeper.property.clientPort", "2222");
hconfig.set("hbase.zookeeper.quorum", "127.0.0.1");
hconn = HConnectionManager.createConnection(hconfig);
HTable htable = new HTable(hconf, Bytes.toBytes(tableName));
// KAFKA configuration
Set<String> topics = Collections.singleton(topic);
Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "localhost:9092");
kafkaParams.put("zookeeper.connect", "localhost:2222");
kafkaParams.put("group.id", "tag_topic_id");
//Spark Stream
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
ssc, String.class, String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topics );
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
JavaDStream<String> records = lines.flatMap(new FlatMapFunction<String, String>() {
@Override
public Iterator<String> call(String x) throws IOException {
////////////// Put into HBase : ERROR /////////////////////
String[] data = x.split(",");
if (null != data && data.length > 2 ){
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
String ts = sdf.format(new Date());
Put put = new Put(Bytes.toBytes(ts));
put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("LINEID"), Bytes.toBytes(data[0]));
put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("TAGID"), Bytes.toBytes(data[1]));
put.addImmutable(Bytes.toBytes(familyName), Bytes.toBytes("VAL"), Bytes.toBytes(data[2]));
htable.put(put); // ***** ERROR ********
htable.close();
}
return Arrays.asList(COLDELIM.split(x)).iterator();
}
});
records.print();
ssc.start();
ssc.awaitTermination();
When I start my application, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
at org.apache.spark.streaming.dstream.DStream$$anonfun$flatMap$1.apply(DStream.scala:554)
at org.apache.spark.streaming.dstream.DStream$$anonfun$flatMap$1.apply(DStream.scala:554)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:682)
at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:264)
at org.apache.spark.streaming.dstream.DStream.flatMap(DStream.scala:553)
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.flatMap(JavaDStreamLike.scala:172)
at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.flatMap(JavaDStreamLike.scala:42)
Caused by: java.io.NotSerializableException: org.apache.hadoop.hbase.client.HTable
Serialization stack:
- object not serializable (class: org.apache.hadoop.hbase.client.HTable, value: MCSENSOR;hconnection-0x6839203b)

The serialization debugger gives you a hint here:
Caused by: java.io.NotSerializableException: org.apache.hadoop.hbase.client.HTable
Serialization stack:
- object not serializable (class: org.apache.hadoop.hbase.client.HTable, value: MCSENSOR;hconnection-0x6839203b)
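The HTable is created on the driver and captured by the closure, so Spark tries to serialize it. Move the part below inside the FlatMapFunction, into the code that runs on the executors (for example at the start of the call method where you use it). That should solve the issue: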
Configuration hconfig = HBaseConfiguration.create();
hconfig.set("hbase.zookeeper.property.clientPort", "2222");
hconfig.set("hbase.zookeeper.quorum", "127.0.0.1");
hconn = HConnectionManager.createConnection(hconfig);
HTable htable = new HTable(hconf, Bytes.toBytes(tableName));
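To see why moving the construction helps, here is a minimal sketch that needs neither Spark nor HBase. FakeTable and RecordFunction are hypothetical stand-ins (for HTable and for Spark's Serializable function interfaces), and plain Java serialization is used in place of Spark's closure check. A function that captures the non-serializable object cannot be serialized, while one that creates the object inside its body can.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a non-serializable client such as HTable.
class FakeTable {
    void put(String row) { /* pretend to write a row */ }
}

// Hypothetical interface mirroring Spark's function interfaces, which extend Serializable.
interface RecordFunction extends Serializable {
    void call(String record);
}

class ClosureSerializationDemo {
    // Returns true if Java serialization accepts the object,
    // false if it reaches something that is not serializable.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The table is built outside the function and captured by it,
    // like the HTable in the question.
    static RecordFunction capturing() {
        FakeTable table = new FakeTable();
        return record -> table.put(record);
    }

    // The table is built inside the function body, so nothing
    // non-serializable travels with the function itself.
    static RecordFunction creatingInside() {
        return record -> {
            FakeTable table = new FakeTable();
            table.put(record);
        };
    }

    public static void main(String[] args) {
        System.out.println("capturing:       " + isSerializable(capturing()));
        System.out.println("creating inside: " + isSerializable(creatingInside()));
    }
}
```

Creating a connection for every record is expensive, though. In a real job you would probably open one connection per partition instead (for example inside foreachPartition) and close it once the partition is done, rather than calling close() after each put.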

Related

Hortonworks Schema Registry + Nifi + Java: Deserialize Nifi Record

I am trying to deserialize some Kafka messages that were serialized by NiFi, using Hortonworks Schema Registry.
Processor used on the NiFi side as RecordWriter: AvroRecordSetWriter
Schema write strategy: HWX Content-Encoded Schema Reference
I am able to deserialize these messages in another NiFi Kafka consumer. However, I am trying to deserialize them from my Flink application using Kafka code.
I have the following inside the Kafka deserializer handler of my Flink application:
final String SCHEMA_REGISTRY_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_URL_KEY = SchemaRegistryClient.Configuration.SCHEMA_REGISTRY_URL.name();
Properties schemaRegistryProperties = new Properties();
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_SIZE_KEY, 10L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY, 5000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY, 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY, 60 * 60 * 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_URL_KEY, "http://schema_registry_server:7788/api/v1");
return (Map<String, Object>) HWXSchemaRegistry.getInstance(schemaRegistryProperties).deserialize(message);
And here is the HWXSchemaRegistry code to deserialize the message:
import com.hortonworks.registries.schemaregistry.avro.AvroSchemaProvider;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;
import com.hortonworks.registries.schemaregistry.errors.SchemaNotFoundException;
import com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer;
public class HWXSchemaRegistry {
private SchemaRegistryClient client;
private Map<String,Object> config;
private AvroSnapshotDeserializer deserializer;
private static HWXSchemaRegistry hwxSRInstance = null;
public static HWXSchemaRegistry getInstance(Properties schemaRegistryConfig) {
if(hwxSRInstance == null)
hwxSRInstance = new HWXSchemaRegistry(schemaRegistryConfig);
return hwxSRInstance;
}
public Object deserialize(byte[] message) throws IOException {
Object o = hwxSRInstance.deserializer.deserialize(new ByteArrayInputStream(message), null);
return o;
}
private static Map<String,Object> properties2Map(Properties config) {
Enumeration<Object> keys = config.keys();
Map<String, Object> configMap = new HashMap<String,Object>();
while (keys.hasMoreElements()) {
Object key = (Object) keys.nextElement();
configMap.put(key.toString(), config.get(key));
}
return configMap;
}
private HWXSchemaRegistry(Properties schemaRegistryConfig) {
_log.debug("Init SchemaRegistry Client");
this.config = HWXSchemaRegistry.properties2Map(schemaRegistryConfig);
this.client = new SchemaRegistryClient(this.config);
this.deserializer = this.client.getDefaultDeserializer(AvroSchemaProvider.TYPE);
this.deserializer.init(this.config);
}
}
But I am getting an HTTP 404 error (schema not found). I think this is due to incompatible "protocols" between the NiFi configuration and the HWX Schema Registry client implementation, so the schema identifier bytes that the client is looking for do not exist on the server, or something like that.
Can someone help with this?
Thank you.
Caused by: javax.ws.rs.NotFoundException: HTTP 404 Not Found
at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1069)
at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:866)
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:750)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:748)
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:404)
at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:300)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1054)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1051)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getEntities(SchemaRegistryClient.java:1051)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:872)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:676)
at HWXSchemaRegistry.<init>(HWXSchemaRegistry.java:56)
at HWXSchemaRegistry.getInstance(HWXSchemaRegistry.java:26)
at SchemaService.deserialize(SchemaService.java:70)
at SchemaService.deserialize(SchemaService.java:26)
at org.apache.flink.streaming.connectors.kafka.internals.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:45)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:712)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
I found a workaround, since I wasn't able to get this working. I take the first bytes of the byte array, make several calls to the schema registry to get the Avro schema, and then deserialize the rest of the byte array.
The first byte (index 0) is the protocol version (I figured out this is a NiFi-specific byte, since I didn't need it).
The next 8 bytes are the schema ID.
The next 4 bytes are the schema version.
The rest of the bytes are the message itself:
import com.hortonworks.registries.schemaregistry.SchemaMetadataInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionKey;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
try (SchemaRegistryClient client = new SchemaRegistryClient(this.schemaRegistryConfig)) {
    try {
        // Byte 0 is the protocol version; bytes 1-8 hold the schema id, bytes 9-12 the schema version
        Long schemaId = ByteBuffer.wrap(Arrays.copyOfRange(message, 1, 9)).getLong();
        Integer schemaVersion = ByteBuffer.wrap(Arrays.copyOfRange(message, 9, 13)).getInt();
        // Resolve the schema name from its id, then fetch the text of the requested version
        SchemaMetadataInfo schemaInfo = client.getSchemaMetadataInfo(schemaId);
        String schemaName = schemaInfo.getSchemaMetadata().getName();
        SchemaVersionInfo schemaVersionInfo = client.getSchemaVersionInfo(
                new SchemaVersionKey(schemaName, schemaVersion));
        String avroSchema = schemaVersionInfo.getSchemaText();
        // Everything after the 13-byte header is the Avro payload
        // (renamed from 'message', which cannot be redeclared in this scope)
        byte[] payload = Arrays.copyOfRange(message, 13, message.length);
        // Deserialize [...]
    } catch (Exception e) {
        // Keep the original exception as the cause
        throw new IOException(e.getMessage(), e);
    }
}
I also thought that I might have to remove the first byte before calling hwxSRInstance.deserializer.deserialize in the code from my question, since this byte seems to be a NiFi-specific byte used to communicate between NiFi processors, but that didn't work.
The next step is to build a cache of the schema texts, to avoid calling the schema registry API multiple times.
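A minimal sketch of such a cache, assuming the schema text for a given (schema ID, version) pair never changes once registered. SchemaTextCache and its loader function are hypothetical names, not part of the Schema Registry client; the loader is where the two registry calls from the snippet above would go.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical helper: remembers the schema text per (schemaId, version)
// so that the registry is queried only once for each pair.
class SchemaTextCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BiFunction<Long, Integer, String> loader;

    // 'loader' stands for the code that actually calls the schema registry.
    SchemaTextCache(BiFunction<Long, Integer, String> loader) {
        this.loader = loader;
    }

    String get(long schemaId, int version) {
        // computeIfAbsent invokes the loader only when the key is missing
        return cache.computeIfAbsent(schemaId + ":" + version,
                key -> loader.apply(schemaId, version));
    }

    public static void main(String[] args) {
        int[] registryCalls = {0};
        // Fake loader that counts invocations instead of calling a registry
        SchemaTextCache cache = new SchemaTextCache((id, version) -> {
            registryCalls[0]++;
            return "schema-" + id + "-v" + version;
        });
        cache.get(7L, 2);
        cache.get(7L, 2);
        System.out.println("registry calls: " + registryCalls[0]);
    }
}
```

This sketch never evicts entries, which is probably fine for a handful of schemas. If schemas can be deleted or the set grows large, a cache with expiry would be a better fit.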
New info: I am extending my answer to include the Avro deserialization part, since it took some troubleshooting and I had to inspect the NiFi Avro Reader source code to figure it out (I was getting a "not valid Avro data" exception when using the basic Avro deserialization code):
import org.apache.avro.Schema;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
private static GenericRecord deserializeMessage(byte[] message, String schemaText) throws IOException {
InputStream in = new SeekableByteArrayInput(message);
Schema schema = new Schema.Parser().parse(schemaText);
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
GenericRecord genericRecord = null;
genericRecord = datumReader.read(genericRecord, decoder);
in.close();
return genericRecord;
}
If you want to convert the GenericRecord to a Map, note that string values are not String objects; you need to convert the keys and the values of type string:
private static Map<String, Object> avroGenericRecordToMap(GenericRecord record)
{
Map<String, Object> map = new HashMap<>();
record.getSchema().getFields().forEach(field ->
map.put(String.valueOf(field.name()), record.get(field.name())));
// Strings are mapped to the Utf8 class, so they need to be converted (all record keys and those values that are typed as string)
if(map.get("value").getClass() == org.apache.avro.util.Utf8.class)
map.put("value", String.valueOf(map.get("value")));
return map;
}

Flink 1.4.2 SQL Maps?

I am currently using Flink V 1.4.2
If I have a POJO:
class CustomObj{
public Map<String, String> custTable = new HashMap<>();
public Map<String, String> getcustTable(){ return custTable; }
public void setcustTable(Map<String, String> custTable){
this.custTable = custTable;
}
}
I have a DataStream<POJO> ds = //from some kafka source
Now I do tableEnv.registerDataStream("tableName", ds);
And I want to run
tableEnv.sqlQuery("SELECT * FROM tableName WHERE custTable['key'] = 'val'");
When I try running this I get the error:
org.apache.flink.table.api.TableException: Type is not supported: ANY
What can I do about this and how can I fix it?

Will Kafka flatMapValues split the record into multiple records when passing a JSON array object?

I'm using Confluent version 5.0.0.
I have a JSON record like the one below:
{
"name" : "David,Corral,Babu",
"age" : 23
}
Using Kafka Streams, I want to split the above record into multiple records, based on the commas in the value of the "name" key. The output should be something like:
{
"name" : "David",
"age" : 23
},
{
"name" : "Corral",
"age" : 23
},
{
"name" : "Babu",
"age" : 23
}
For this I'm using "flatMapValues", but so far I haven't been able to achieve the expected results.
Is "flatMapValues" the correct function for my requirement?
I've used the following code:
package test;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueMapper;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class RecordSplliter {
public static void main(String[] args) throws Exception {
System.out.println("** STARTING RecordSplliter STREAM APP **");
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "json-e44nric2315her");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, PersonSeder.class);
final Serde<String> stringSerde = Serdes.String();
final StreamsBuilder builder = new StreamsBuilder();
// Consume JSON and enriches it
KStream<String, Person> source = builder.stream("streams-plaintext-input");
KStream<String, String> output = source
.flatMapValues(person -> Arrays.asList(person.getName().split(",")));
output.to("streams-output");
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// Attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
At runtime I got the following exception:
08:31:10,822 ERROR org.apache.kafka.streams.processor.internals.AssignedStreamsTasks - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Failed to process stream task 0_0 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=streams-plaintext-input, partition=0, offset=0
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:304)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:957)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: myapps.PersonSerializer) is not compatible to the actual key or value type (key type: unknown because key is null / value type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.KStreamFlatMapValues$KStreamFlatMapValuesProcessor.process(KStreamFlatMapValues.java:42)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:288)
... 6 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to myapps.Person
at myapps.PersonSerializer.serialize(PersonSerializer.java:1)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
... 18 more
08:31:10,827 INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
08:31:10,827 INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Shutting down
08:31:10,833 INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
08:31:10,843 INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
08:31:10,843 INFO org.apache.kafka.streams.KafkaStreams - stream-client [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387] State transition from RUNNING to ERROR
08:31:10,843 WARN org.apache.kafka.streams.KafkaStreams - stream-client [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387] All stream threads have died. The instance will be in error state and should be closed.
08:31:10,843 INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1] Shutdown complete
Exception in thread "json-enricher-0f8bc964-40c0-41f2-a724-dfa638923387-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=streams-plaintext-input, partition=0, offset=0
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:304)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:957)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: myapps.PersonSerializer) is not compatible to the actual key or value type (key type: unknown because key is null / value type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.KStreamFlatMapValues$KStreamFlatMapValuesProcessor.process(KStreamFlatMapValues.java:42)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:288)
... 6 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to myapps.Person
at myapps.PersonSerializer.serialize(PersonSerializer.java:1)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
... 18 more
The exception occurs because your flatMapValues produces values of type String. Your code doesn't pass a Produced to KStream::to, so it falls back to the default Serde from the properties, which in your case is PersonSeder.class.
Your values are of type String, but PersonSeder.class is used for serialization.
If you want to split the record, you need something like this:
KStream<String, Person> output = source
.flatMapValues(person ->
Arrays.stream(person.getName().split(","))
.map(name -> new Person(name, person.getAge()))
.collect(Collectors.toList()));
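The mapping itself can be tried without Kafka. The sketch below runs the same one-to-many logic as the lambda above in plain Java. The Person class here is a hypothetical stand-in for the POJO from the question (only getName and getAge are known from the source; the two-argument constructor is an assumption).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Person POJO.
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }
}

class SplitPersonDemo {
    // Same logic as the flatMapValues lambda: one Person in,
    // one Person per comma-separated name out, each keeping the age.
    static List<Person> split(Person person) {
        return Arrays.stream(person.getName().split(","))
                .map(name -> new Person(name, person.getAge()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Person p : split(new Person("David,Corral,Babu", 23))) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```

Because every output value is again a Person, the default value Serde still matches and no Produced has to be passed to the to call.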
I used the following code with your serializer and a symmetrical deserializer (also using Gson), and it works:
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, PersonSerdes.class);
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Person> source = builder.stream("input");
KStream<String, Person> output = source
.flatMapValues(person ->
Arrays.stream(person.getName()
.split(","))
.map(name -> new Person(name, person.getAge()))
.collect(Collectors.toList()));
output.to("output");
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
UPDATE 1:
Regarding your question about using JSON instead of a POJO: everything depends on your Serdes. If you use a generic Serde, you can serialize and deserialize to/from JSON (as a Map).
Below is a simple MapSerdes that can be used for that, followed by sample usage code.
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;
import java.lang.reflect.Type;
import java.nio.charset.Charset;
import java.util.Map;
public class MapSerdes implements Serde<Map<String, String>> {
    private static final Charset CHARSET = Charset.forName("UTF-8");
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {}
    @Override
    public void close() {}
    @Override
    public Serializer<Map<String, String>> serializer() {
        return new Serializer<Map<String, String>>() {
            private Gson gson = new Gson();
            @Override
            public void configure(Map<String, ?> configs, boolean isKey) {}
            @Override
            public byte[] serialize(String topic, Map<String, String> data) {
                // Encode the map as JSON and return the UTF-8 bytes of that string
                String line = gson.toJson(data);
                return line.getBytes(CHARSET);
            }
            @Override
            public void close() {}
        };
    }
    @Override
    public Deserializer<Map<String, String>> deserializer() {
        return new Deserializer<Map<String, String>>() {
            private Type type = new TypeToken<Map<String, String>>(){}.getType();
            private Gson gson = new Gson();
            @Override
            public void configure(Map<String, ?> configs, boolean isKey) {}
            @Override
            public Map<String, String> deserialize(String topic, byte[] data) {
                // Decode with the same charset that the serializer used
                Map<String, String> result = gson.fromJson(new String(data, CHARSET), type);
                return result;
            }
            @Override
            public void close() {}
        };
    }
}
Sample usage:
Instead of name, you can use different properties, depending on your map.
public class GenericJsonSplitterApp {
public static void main(String[] args) {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, MapSerdes.class);
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Map<String, String>> source = builder.stream("input");
KStream<String, Map<String, String>> output = source
.flatMapValues(map ->
Arrays.stream(map.get("name")
.split(","))
.map(name -> {
HashMap<String, String> splittedMap = new HashMap<>(map);
splittedMap.put("name", name);
return splittedMap;
})
.collect(Collectors.toList()));
output.to("output");
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}

How to send multiple (key, value) pairs to the XADD command using Java?

// I have tried to send multiple values to the stream using the XADD command.
public class LettuceDemo {
public static void main(String[] args) {
RedisClient redisClient = RedisClient.create("redis://password@localhost:6739/0");
StatefulRedisConnection<String, String> connection =redisClient.connect();
RedisStreamCommands<String, String> streamCommands = connection.sync();
List<String> supplierNames1 = new ArrayList<String>();
supplierNames1.add("PaymentOption");
supplierNames1.add("StartDate");
supplierNames1.add("EndDate");
supplierNames1.add("RateOfInterest");
supplierNames1.add("RequiredLoanAmmount");
List<String> supplierNames2 = new ArrayList<String>();
supplierNames2.add(String.valueOf(123));
supplierNames2.add(String.valueOf(765));
supplierNames2.add(String.valueOf(347746));
supplierNames2.add(String.valueOf(8347674));
supplierNames2.add(String.valueOf(34875645));
Map<List<String>, List<String>> body1 = Collections.singletonMap(supplierNames1, supplierNames2);
String messageId = streamCommands.xadd("demo", body1);
System.out.println("my-stream code reference " + messageId);
connection.close();
redisClient.shutdown();
}
}
// I'm facing this issue while executing the program.
Exception in thread "main" java.lang.IllegalArgumentException: Message body.length must be a multiple of 2 and contain a sequence of field1, value1, field2, value2, fieldN, valueN
at io.lettuce.core.internal.LettuceAssert.isTrue(LettuceAssert.java:131)
at io.lettuce.core.RedisCommandBuilder.xadd(RedisCommandBuilder.java:2110)
at io.lettuce.core.AbstractRedisAsyncCommands.xadd(AbstractRedisAsyncCommands.java:1499)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:57)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy0.xadd(Unknown Source)
at com.excent.experiences.tinnumber.LettuceDemo.main(LettuceDemo.java:48)
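The stack trace suggests what is going on: a Map<List<String>, List<String>> does not fit the xadd(key, Map<K, V>) overload of a RedisStreamCommands<String, String>, so the call appears to bind to the varargs overload xadd(key, Object... keysAndValues) with the map as its single argument, and a length of 1 is not a multiple of 2. A minimal sketch of one way around it, assuming the two lists are parallel (same length, field i belongs to value i), is to zip them into a flat Map<String, String>. The class and method names below are hypothetical, and the sketch itself does not call Lettuce.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class XaddBodyDemo {
    // Pairs fields.get(i) with values.get(i). LinkedHashMap keeps the
    // insertion order, so the fields stay in the order they were listed.
    static Map<String, String> zip(List<String> fields, List<String> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same size");
        }
        Map<String, String> body = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            body.put(fields.get(i), values.get(i));
        }
        return body;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("PaymentOption", "StartDate", "EndDate",
                "RateOfInterest", "RequiredLoanAmmount");
        List<String> values = Arrays.asList("123", "765", "347746", "8347674", "34875645");
        Map<String, String> body = zip(fields, values);
        // With a live connection, this map would be passed as:
        // String messageId = streamCommands.xadd("demo", body);
        System.out.println(body);
    }
}
```

Each map entry then becomes one field/value pair of the stream message.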

How to pass HashMap to forEach tag in xls generated by jett?

I have a Map in a managed bean:
private Map<FaseProducao, Set<FichaTecnicaOperacao>> fichasTecnicasOperacaoResumo;
which relates to the entity FichaTecnica:
public class FichaTecnica{
//...
private Set<FichaTecnicaOperacao> operacoes;
}
I need to pass this Map as a parameter via beans.put() to generate an XLS with JETT:
public void createRelatorioFichaTecnica(FichaTecnica fichaTecnica) throws IOException {
// omitted...
Map<String, Object> beans = new HashMap<String, Object>();
beans.put("operacaoResumo", fichasTecnicasOperacaoResumo);
try (ByteArrayOutputStream saida = new ByteArrayOutputStream();
InputStream template = this.getClass().getResourceAsStream("/templates/jett/fichaTecnica.xls");
Workbook workbook = transformer.transform(template, beans);) {
// omitted...
}
}
When the XLS is generated, this exception is thrown:
WARNING [javax.enterprise.resource.webcontainer.jsf.lifecycle] (default task-28) #{ProdutoManagedBean.createRelatorioFichaTecnica(row)}: net.sf.jett.exception.AttributeExpressionException: Expected a "java.util.Collection" for "items", got a "java.util.HashMap": "${operacaoResumo}".
I don't understand this error, because isn't a Map a valid collection? Why doesn't JETT recognize it in items="${operacaoResumo}"? I created this forEach based on the documentation on the site:
http://jett.sourceforge.net/tags/forEach.html
As @rgettman said, I did:
public void createRelatorioFichaTecnica(FichaTecnica fichaTecnica) throws IOException {
// omitted...
Map<String, Object> beans = new HashMap<String, Object>();
beans.put("operacaoResumo", fichasTecnicasOperacaoResumo.keySet());
}
Thank you all!
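For anyone who hits the same message: in Java, java.util.Map does not extend java.util.Collection, which is why a HashMap is rejected where a Collection is expected, while the views returned by keySet(), values() and entrySet() are Collections. A small check (the class and method names are hypothetical):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class MapVsCollectionDemo {
    // True only for objects that implement java.util.Collection.
    static boolean isCollection(Object o) {
        return o instanceof Collection;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        System.out.println("map:      " + isCollection(map));
        System.out.println("keySet:   " + isCollection(map.keySet()));
        System.out.println("values:   " + isCollection(map.values()));
        System.out.println("entrySet: " + isCollection(map.entrySet()));
    }
}
```

Note that keySet() carries only the keys, so the values have to reach the template some other way if they are needed there.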
