I have configured Redis as the MessageBus for my Spring XD setup. When my stream fails, the data is pushed to error queues. I'm trying to read those messages back and push them to the destination queues, but I don't see my sink modules receiving any data. Can someone help me understand where I'm going wrong?
Code snippet:
public RedisTemplate<String, byte[]> redisTemplate(RedisConnectionFactory redisConnectionFactory) {
    final RedisTemplate<String, byte[]> template = new RedisTemplate<String, byte[]>();
    template.setConnectionFactory(redisConnectionFactory);
    template.setKeySerializer(new StringRedisSerializer());
    template.setEnableDefaultSerializer(false);
    return template;
}
List<String> listOfKeys = new ArrayList<>();
Set<byte[]> keys = redisTemplate.getConnectionFactory().getConnection().keys("ERRORS*".getBytes());
for (byte[] data : keys) {
    listOfKeys.add(new String(data, 0, data.length));
}
for (String errorQueue : listOfKeys) {
    String destinationQueue = errorQueue.replace("ERRORS:", EMPTY_STRING);
    Long size = redisTemplate.opsForList().size(errorQueue);
    for (int i = 0; i < size; i++) {
        byte[] errorEvt = redisTemplate.opsForList().rightPop(errorQueue);
        redisTemplate.opsForList().leftPush(destinationQueue, errorEvt);
    }
}
A quick glance at your code says it should work OK; I suggest you run redis-cli and then enter MONITOR to watch the Redis traffic.
EDIT
To answer your comment below - it depends on your content type. If it's simple text, it's relatively easy:
"\xff\x01\x0bcontentType\x00\x00\x00\x0c\"text/plain\"2016-05-20 10:11:07"
The first part is the embedded headers; you can decode those with XD's EmbeddedHeadersMessageConverter.
If your payload is a Java object, it is serialized by the bus using Kryo.
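To make that concrete, here is a minimal sketch of pulling the headers apart by hand, based only on the byte layout visible in the dump above (0xFF marker, a one-byte header count, then per header a one-byte name length and a four-byte value length, followed by the payload). EmbeddedHeadersMessageConverter is the supported way to do this, so treat this purely as an illustration:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class EmbeddedHeaderSketch {

    /** Splits a bus message into headers and payload; returns the payload bytes. */
    public static byte[] extractPayload(byte[] busMessage, Map<String, String> headers) {
        ByteBuffer buf = ByteBuffer.wrap(busMessage);
        if ((buf.get() & 0xFF) != 0xFF) {
            return busMessage;                          // no embedded headers, raw payload
        }
        int headerCount = buf.get() & 0xFF;             // e.g. 0x01 in the dump above
        for (int i = 0; i < headerCount; i++) {
            byte[] name = new byte[buf.get() & 0xFF];   // 1-byte name length, e.g. 0x0b for "contentType"
            buf.get(name);
            byte[] value = new byte[buf.getInt()];      // 4-byte value length, e.g. 0x0000000c for "text/plain" in quotes
            buf.get(value);
            headers.put(new String(name, StandardCharsets.UTF_8),
                        new String(value, StandardCharsets.UTF_8));
        }
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);                               // "2016-05-20 10:11:07" in the example
        return payload;
    }
}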
I am using a Producer to send messages to a Kafka topic.
When JUnit testing, I have found that the producer in my application code (but not in my JUnit test class) is sending a null key, despite me providing a String key for it to use.
Code as follows:
Main application class
final Producer<String, HashSet<String>> actualApplicationProducer;

ApplicationInstance(String bootstrapServers) // constructor
{
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "ActualClient");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
    props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes, 1000000));
    actualApplicationProducer = new KafkaProducer<>(props);
}

public void doStuff()
{
    HashSet<String> values = new HashSet<String>();
    String key = "applicationKey";
    // THIS LINE IS SENDING A NULL KEY
    actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values));
}
But in my JUnit test class:
@EmbeddedKafka
@ExtendWith(SpringExtension.class)
@SuppressWarnings("static-method")
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
public class CIFFileProcessorTests
{
    /** An Embedded Kafka Broker that can be used for unit testing purposes. */
    @Autowired
    private EmbeddedKafkaBroker embeddedKafkaBroker;

    @BeforeAll
    public void setUpBeforeClass(@TempDir File globalTablesDir, @TempDir File rootDir) throws Exception
    {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "JUnitClient");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CustomSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, lingerBatchMS);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Math.min(maxBatchSizeBytes, 1000000));
        try (Producer<String, HashSet<String>> junitProducer = new KafkaProducer<>(props))
        {
            HashSet<String> values = new HashSet<>();
            // Here, I'm sending a record, just like in my main application code, but it's sending the key correctly and not null
            junitProducer.send(new ProducerRecord<>(topicName, "junitKey", values));
        }
    }

    @Test
    public void test() throws Exception
    {
        ApplicationInstance sut = new ApplicationInstance(embeddedKafkaBroker.getBrokersAsString());
        sut.doStuff();

        // "records" is a LinkedBlockingQueue, populated by a KafkaMessageListenerContainer which is monitoring the topic for records using a MessageListener
        ConsumerRecord<String, HashSet<String>> record = records.poll(1, TimeUnit.SECONDS);
        assertEquals("junitKey", record.key()); // TEST FAILS - expected "junitKey" but returned null
    }
}
Custom serializer:
@Override
public byte[] serialize(String topic, HashSet<String> object) {
    try (final ByteArrayOutputStream baos = new ByteArrayOutputStream();
         ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(object);
        return baos.toByteArray();
    } catch (IOException e) {
        throw new SerializationException("Error serializing value", e);
    }
}
Does anyone know why the KafkaProducer would send a null key when I explicitly specify a String?
--- Update ---
I have tried inspecting the metadata, and the Producer is indeed sending the key, and not null:
RecordMetadata info = actualApplicationProducer.send(new ProducerRecord<>(topicName, key, values)).get();
System.out.println("INFO - partition: " + info.partition() + ", topic: " + info.topic() + ", offset: " + info.offset() + ", timestamp: "+ info.timestamp() + ", keysize: " + info.serializedKeySize() + ", valuesize: " + info.serializedValueSize());
output:
INFO - partition: 0, topic: topicName, offset: 2, timestamp: 1656060840304, keysize: 14, valuesize: 6258
The keysize being > 0 shows that null is not passed to the topic.
So, the issue must be with the reading of the topic, perhaps?
Turns out I was using a different deserializer class for my KafkaMessageListenerContainer, which didn't know what to do with the String key as provided.
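For anyone hitting the same thing, here is a minimal sketch of how the test-side listener container could be wired so the key comes back as a String. The container setup isn't shown in the question, so the names here (CustomDeserializer, topicName, records) are assumptions matching the description above; the classes come from spring-kafka and spring-kafka-test:

Map<String, Object> consumerProps =
        KafkaTestUtils.consumerProps("testGroup", "false", embeddedKafkaBroker);
// The key must be read with a StringDeserializer to match the StringSerializer on the producer side
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
// CustomDeserializer is a hypothetical counterpart of the CustomSerializer above
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CustomDeserializer.class);

DefaultKafkaConsumerFactory<String, HashSet<String>> consumerFactory =
        new DefaultKafkaConsumerFactory<>(consumerProps);
ContainerProperties containerProperties = new ContainerProperties(topicName);
KafkaMessageListenerContainer<String, HashSet<String>> container =
        new KafkaMessageListenerContainer<>(consumerFactory, containerProperties);

// Push every received record into the queue the test polls from
container.setupMessageListener((MessageListener<String, HashSet<String>>) records::add);
container.start();
ContainerTestUtils.waitForAssignment(container, embeddedKafkaBroker.getPartitionsPerTopic());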
Not sure why you want to use ByteArrayOutputStream or ObjectOutputStream for serializing Kafka producer records; that may be your requirement. In that case, you may refer to the producer section from https://dzone.com/articles/kafka-producer-and-consumer-example
But injecting a key into the producer record is easily done. For example, if you want to generate a ProducerRecord from an Avro schema and assert on the record key and value, you can do something like this.
Generate an Avro or specific record:
You can refer https://technology.amis.nl/soa/kafka/generate-random-json-data-from-an-avro-schema-using-java/
You can convert it to a SpecificRecord using JsonAvroConverter:
public static ProducerRecord<String, CustomEvent> generateRecord() {
    String schemaFile = "AVROSchema.avsc";
    Schema schema = getSchema(schemaFile);
    String json = getJson(dataFile);
    byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
    CustomEvent record = null;
    JsonAvroConverter converter = new JsonAvroConverter();
    try {
        record = converter.convertToSpecificRecord(jsonBytes, CustomEvent.class, schema);
    } catch (Exception e) {
        // handle or log the conversion failure as appropriate
    }
    String recordKey = "YourKey";
    return new ProducerRecord<String, CustomEvent>(topic, recordKey, record);
}
You can inject the ProducerRecord into your Assert functions later.
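A small usage sketch of that idea (plain JUnit assertions; generateRecord() is the helper above):

ProducerRecord<String, CustomEvent> producerRecord = generateRecord();
assertEquals("YourKey", producerRecord.key());   // the key injected in generateRecord()
assertNotNull(producerRecord.value());           // the CustomEvent converted from JSON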
I am using Kafka Streams to read and process protobuf messages.
I am using the following properties for the stream:
private Properties buildStreamProperties() {
    Properties properties = new Properties();
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaConfig.getGroupId());
    properties.put(StreamsConfig.CLIENT_ID_CONFIG, kafkaConfig.getClientId());
    properties.put(StreamsConfig.APPLICATION_ID_CONFIG, kafkaConfig.getApplicationId());
    properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaConfig.getBootstrapServers());
    properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
    properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, KafkaProtobufSerde.class);
    properties.put(AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, kafkaConfig.getSchemaRegistryUrl());
    properties.put(KafkaProtobufDeserializerConfig.SPECIFIC_PROTOBUF_VALUE_TYPE, ProtobufData.class);
    return properties;
}
but while running I encounter this error:
Caused by: java.lang.ClassCastException: class com.google.protobuf.DynamicMessage cannot be cast to class model.schema.proto.input.ProtobufDataProto$ProtobufData (com.google.protobuf.DynamicMessage and model.schema.proto.input.ProtobufDataProto$ProtobufData are in unnamed module of loader 'app')
My .proto files look as follows:
import "inner_data.proto";
package myPackage;
option java_package = "model.schema.proto.input";
option java_outer_classname = "ProtobufDataProto";
message OuterData {
string timestamp = 1;
string x = 3;
repeated InnerObject flows = 4;
}
(I have two separate proto files)
syntax = "proto3";

package myPackage;

option java_package = "model.schema.proto.input";
option java_outer_classname = "InnerDataProto";

message InnerData {
    string a = 1;
    string b = 2;
    string c = 3;
}
I would like to know why Kafka uses DynamicMessage even though I gave the specific protobuf value class in the properties, and how to fix this.
I had the same issue while trying to get Kafka Streams working with protobuf.
I solved it by using KafkaProtobufSerde explicitly to configure the StreamsBuilder AND by explicitly specifying the class to deserialize to with this line: serdeConfig.put(SPECIFIC_PROTOBUF_VALUE_TYPE, ProtobufDataProto.class.getName());
/*
 * Define a specific Serde for the event in protobuf
 */
final KafkaProtobufSerde<ProtobufDataProto> protoSerde = new KafkaProtobufSerde<>();
Map<String, String> serdeConfig = new HashMap<>();
serdeConfig.put(SCHEMA_REGISTRY_URL_CONFIG, registryUrl);
/*
 * Technically, the following line is only mandatory in order to de-serialize objects into GeneratedMessageV3
 * and NOT into DynamicMessage: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/DynamicMessage
 */
serdeConfig.put(SPECIFIC_PROTOBUF_VALUE_TYPE, ProtobufDataProto.class.getName());
protoSerde.configure(serdeConfig, false);
Then I can create my input stream and it will be deserialized:
//Define a Serde for the key
final Serde<byte[]> bytesSerde = Serdes.ByteArray();
//Define the stream
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.stream("inputTopic", Consumed.with(bytesSerde, protoSerde));
/*
add your treatments, maps, filter etc
...
*/
streamsBuilder.build();
I am doing a POC for writing data to S3 using Flink. The program does not give an error, however I do not see any files being written to S3 either.
Below is the code:
public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final String outputPath = "s3a://testbucket-s3-flink/data/";
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Enable checkpointing
        env.enableCheckpointing();

        // S3 sink
        final StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
                .build();

        // Source is a local Kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kafka:9094");
        properties.setProperty("group.id", "test");
        DataStream<String> input = env.addSource(new FlinkKafkaConsumer<String>("queueing.transactions", new SimpleStringSchema(), properties));

        input.flatMap(new Tokenizer())        // Tokenizer for generating words
                .keyBy(0)                     // Logically partition the stream for each word
                .timeWindow(Time.minutes(1))  // Tumbling window definition
                .sum(1)                       // Sum the number of words per partition
                .map(value -> value.f0 + " count: " + value.f1.toString() + "\n")
                .addSink(sink);

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }

    public static final class Tokenizer
            implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}
Note that I have set the s3.access-key and s3.secret-key values in the configuration and tested by changing them to incorrect values (I got an error on incorrect values).
Any pointers on what may be going wrong?
Could it be that you are running into this issue?
Given that Flink sinks and UDFs in general do not differentiate between normal job termination (e.g. finite input stream) and termination due to failure, upon normal termination of a job, the last in-progress files will not be transitioned to the “finished” state.
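In other words, with a row-format StreamingFileSink the in-progress part files are only committed when a checkpoint completes. A minimal sketch of what that implies for the job above (the interval value here is an arbitrary choice for illustration, not from the original code):

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Give checkpointing an explicit interval; part files in S3 are only renamed from
// "in-progress" to "finished" when one of these checkpoints completes, so the job
// has to keep running long enough for at least one checkpoint to succeed.
env.enableCheckpointing(60_000L); // every 60 seconds (arbitrary for this sketch)

final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://testbucket-s3-flink/data/"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .build();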
I'm working on creating a framework to allow customers to create their own plugins for my software built on Apache Flink. I've outlined in a snippet below what I'm trying to get working (just as a proof of concept); however, I'm getting an org.apache.flink.client.program.ProgramInvocationException: The main method caused an error when trying to upload it.
I want to be able to branch the input stream into x number of different pipelines, then have those combine together into a single output. What I have below is just the simplified version I'm starting with.
public class ContentBase {

    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kf-service:9092");
        properties.setProperty("group.id", "varnost-content");

        // Set up execution environment and get stream from Kafka
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
                new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
                .map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));

        // Create a new List of Streams, one for each "rule" that is being executed
        // For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
        List<String> codes = Arrays.asList("404", "200", "500");
        List<DataStream<ObjectNode>> outputs = new ArrayList<>();
        for (String code : codes) {
            outputs.add(MyClass.filter(logs, "response", code));
        }

        // It seemed as though I needed a seed DataStream to union all others on
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode seedObject = (ObjectNode) mapper.readTree("{\"start\":\"true\"}");
        DataStream<ObjectNode> alerts = see.fromElements(seedObject);

        // Union the output of each "rule" above with the seed object to then output
        for (DataStream<ObjectNode> output : outputs) {
            alerts.union(output);
        }

        // Convert to string and sink to Kafka
        alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
                .addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));

        see.execute();
    }
}
I can't figure out how to get the actual error out of the Flink web interface to add that information here.
There were a few errors I found.
1) A StreamExecutionEnvironment can only have one input (apparently? I could be wrong), so adding the .fromElements input was not good.
2) I forgot all DataStreams are immutable, so the .union operation creates a new DataStream output.
The final result ended up being much simpler:
public class ContentBase {

    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kf-service:9092");
        properties.setProperty("group.id", "varnost-content");

        // Set up execution environment and get stream from Kafka
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
                new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
                .map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));

        // Create a new List of Streams, one for each "rule" that is being executed
        // For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
        List<String> codes = Arrays.asList("404", "200", "500");
        List<DataStream<ObjectNode>> outputs = new ArrayList<>();
        for (String code : codes) {
            outputs.add(MyClass.filter(logs, "response", code));
        }

        Optional<DataStream<ObjectNode>> alerts = outputs.stream().reduce(DataStream::union);

        // Convert to string and sink to Kafka
        alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
                .addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));

        see.execute();
    }
}
The code you posted cannot compile because of the last part (converting to string): alerts is an Optional there, so you mixed up the Java API map with the Flink one. Change it to
alerts.get().map(ObjectNode::toString);
to fix it.
Good luck.
I want to fetch the Kafka message offset from the high-level consumer in a Java program. Since I am using a custom commit-offset property, I want to test whether my custom offset committing is working fine or not. Can anyone help me figure out how to get the offset?
I have come across a couple of Kafka tools (like GetOffsetShell), but they aren't helpful for my testing.
When getting a message from the ConsumerIterator you can also get the offset by doing the following:
ConsumerConnector consumerConnector = Consumer.createJavaConsumerConnector(getConsumerConfig());
KafkaStream<byte[], byte[]> stream = getKafkaStream(consumerConnector);
ConsumerIterator<byte[], byte[]> iterator = stream.iterator();

while (iterator.hasNext()) {
    MessageAndMetadata<byte[], byte[]> messageAndMetadata = iterator.next();
    String message = new String(messageAndMetadata.message());
    long offset = messageAndMetadata.offset();
}