I hope you can help me. Let's use for example this very simple code
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class ms_example1 {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("streams-plaintext-input").to("streams-pipe-output");
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
#Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
Like you can see this microservice only forwards messages from one kafka-topic to another. I also want to send this data to zipkin to see the duration of the messages or something like that.
Maybe I've seen the solution and don't get it but I realy have looked for a solution but didn't find one. You are my last hope. I have seen the brave api but I don't realy understand how to use it for kafka. I
Related
I am running a Kafka Cluster Docker Compose on an AWS EC2 instance.
I want to receive all the tweets of a specific keyword and push them to Kafka. This works fine.
But I also want to count the most used words of those tweets.
This is the WordCount code:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.StreamsBuilder;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import java.util.concurrent.CountDownLatch;
import static org.apache.kafka.streams.StreamsConfig.APPLICATION_ID_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.BOOTSTRAP_SERVERS_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG;
public class WordCount {
public static void main(String[] args) {
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> textLines = builder
.stream("test-topic");
textLines
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.as("WordCount"))
.toStream()
.to("test-output", Produced.with(Serdes.String(), Serdes.Long()));
final Topology topology = builder.build();
Properties props = new Properties();
props.put(APPLICATION_ID_CONFIG, "streams-word-count");
props.put(BOOTSTRAP_SERVERS_CONFIG, "ec2-ip:9092");
props.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
Runtime.getRuntime().addShutdownHook(
new Thread("streams-shutdown-hook") {
#Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
When I check the output topic in the Control Center, it looks like this:
Key
Value
Looks like it's working as far as splitting the tweets into single words. But the count value isn't in Long format, although it is specified in the code.
When I use the kafka-console-consumer to consume from this topic, it says:
"Size of data received by LongDeserializer is not 8"
Control Center UI and console consumer can only render UTF8 data, by default.
You'll need to explicitly pass LongDeserializer to the console consumer, as the value deserializer only
try a KTable instead:
KStream<String, String> textLines = builder.stream("test-topic", Consumed.with(stringSerde, stringSerde));
KTable<String, Long> wordCounts = textLines
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, value) -> value)
.count()
.toStream()
.to("test-output", Produced.with(Serdes.String(), Serdes.Long()));
I want to concatenate logs by ID within a window of time using Kafka Streams.
For now, I can successfully count the number of logs having a same ID (the commented code).
However, when I replace the .count method with .aggregate I face following error:
"Failed to flush state store time-windowed-aggregation-stream-store"
Caused by: java.lang.ClassCastException: org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
I'm new to this and can't figure out the cause of this error, I think that having .withValueSerde(Serdes.String()) is supposed to prevent this.
Below my code:
package myapps;
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Suppressed.*;
import org.apache.kafka.streams.state.WindowStore;
public class MyCode {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-mycode");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("streams-plaintext-input");
KStream<String, String> changedKeyStream = source.selectKey((k, v)
-> v.substring(v.indexOf("mid="),v.indexOf("mid=")+8));
/* // Working code for count
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3))
.grace(Duration.ofSeconds(2)))
.count(Materialized.with(Serdes.String(), Serdes.Long())) // could be replaced with an aggregator (reducer?) ?
.suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
.toStream()
.print(Printed.toSysOut());
*/
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
.aggregate(
String::new, (String k, String v, String Result) -> { return Result+"\n"+v; },
Materialized.<String, String, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.String())) /* serde for aggregate value */
.suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
.toStream()
.print(Printed.toSysOut());
changedKeyStream.to("streams-mycode-output", Produced.with(Serdes.String(), Serdes.String()));
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
#Override
public void run() {
streams.close();
latch.countDown();
}
});
// launch until control+c
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.out.print("Something went wrong!");
System.exit(1);
}
System.exit(0);
}
}
Thank you in advance for your help.
There are two option to fix it:
Pass org.apache.kafka.streams.kstream.Grouped to KStream::groupByKey.
Set org.apache.kafka.common.serialization.Serde to Materialized - Materialized::withKeySerde(...)
Sample code bellow:
Ad 1.
changedKeyStream
.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
Ad 2.
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
.aggregate(
String::new, (String k, String v, String Result) -> { return Result+"_"+v; },
Materialized.<String, String, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.String())
.withKeySerde(Serdes.String())
)
I am trying to create a KStream application in Eclipse using Java. right now I am referring to the word count program available on the internet for KStreams and modifying it.
What I want is that the data that I am reading from the input topic should be written to a file instead of being written to another output topic.
But when I am trying to print the KStream/KTable to the local file, I am getting the following entry in the output file:
org.apache.kafka.streams.kstream.internals.KStreamImpl#4c203ea1
How do I implement redirecting the output from the KStream to a file?
Below is the code:
package KStreamDemo.kafkatest;
package org.apache.kafka.streams.examples.wordcount;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueMapper;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class TemperatureDemo {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "34.73.184.104:9092");
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
System.out.println("#1###################################################################################################################################################################################");
// setting offset reset to earliest so that we can re-run the demo code with the same pre-loaded data
// Note: To re-run the demo, you need to use the offset reset tool:
// https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
StreamsBuilder builder = new StreamsBuilder();
System.out.println("#2###################################################################################################################################################################################");
KStream<String, String> source = builder.stream("iot-temperature");
System.out.println("#5###################################################################################################################################################################################");
KTable<String, Long> counts = source
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
#Override
public Iterable<String> apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
}
})
.groupBy(new KeyValueMapper<String, String, String>() {
#Override
public String apply(String key, String value) {
return value;
}
})
.count();
System.out.println("#3###################################################################################################################################################################################");
System.out.println("OUTPUT:"+ counts);
System.out.println("#4###################################################################################################################################################################################");
// need to override value serde to Long type
counts.toStream().to("iot-temperature-max", Produced.with(Serdes.String(), Serdes.Long()));
final KafkaStreams streams = new KafkaStreams(builder.build(), props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
#Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
This is not correct
System.out.println("OUTPUT:"+ counts);
You would need to do counts.foreach, then print the messages out to a file.
Print Kafka Stream Input out to console? (just update to write to file instead)
However, probably better to write out the stream to a topic. And the use Kafka Connect to write out to a file. This is a more industry-standard pattern. Kafka Streams is encouraged to only move data between topics within Kafka, not integrate with external systems (or filesystems)
Edit connect-file-sink.properties with the topic information you want, then
bin/connect-standalone config/connect-file-sink.properties
i am running a Java Producer code. However, i get the below error :
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Here, is the snippet of my producer class :
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import pojo.JsonToPojo;
public class KafkaSender {
public void sendtoKafka(List<JsonToPojo> data) {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "0.0.0.0:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, 5);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 80554432);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");
Producer<String, JsonToPojo> producer = new KafkaProducer<String, JsonToPojo>(props);
TestCallback callback = new TestCallback();
for (JsonToPojo toKafka : data) {
ProducerRecord<String, JsonToPojo> record = new ProducerRecord<String, JsonToPojo>("dontknow", toKafka.group_city.toString(), toKafka);
// RecordMetadata metadata = producer.send(record).get();
// System.out.println("Hey" + metadata.topic());
producer.send(record, callback);
}
producer.close();
}
private static class TestCallback implements Callback {
#Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
System.out.println("Error while producing message to topic :" + recordMetadata);
e.printStackTrace();
} else {
String message = String.format("sent message to topic:%s partition:%s offset:%s",
recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
System.out.println(message);
}
}
}
}
I am using kafka 0.9 version on a MapR cluster. Right now, there is just one broker which is running. I dont get any error apart from the one i have posted above. I have played with the server.properties file by changing a few parameters, but nothing seem to work.
I am trying to write a simple java kafka consumer to read data using similar code as in https://github.com/bkimminich/apache-kafka-book-examples/blob/master/src/test/kafka/consumer/SimpleHLConsumer.java.
Looks like my app is able to connect, but its not fetching any data. Please suggest.
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
//import scala.util.parsing.json.JSONObject
import scala.util.parsing.json.JSONObject;
public class SimpleHLConsumer {
private final ConsumerConnector consumer;
private final String topic;
public SimpleHLConsumer(String zookeeper, String groupId, String topic) {
Properties props = new Properties();
props.put("zookeeper.connect", zookeeper);
props.put("group.id", groupId);
// props.put("zookeeper.session.timeout.ms", "5000");
// props.put("zookeeper.sync.time.ms", "250");
// props.put("auto.commit.interval.ms", "1000");
consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
this.topic = topic;
}
public void testConsumer() {
Map<String, Integer> topicCount = new HashMap<>();
topicCount.put(topic, 1);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreams = consumer.createMessageStreams(topicCount);
System.out.println(consumerStreams);
List<KafkaStream<byte[], byte[]>> streams = consumerStreams.get(topic);
System.out.println(streams);
System.out.println(consumer);
for (final KafkaStream stream : streams) {
ConsumerIterator<byte[], byte[]> it = stream.iterator();
System.out.println("for loop");
System.out.println(it);
System.out.println("Message from Single Topic: " + new String(it.next().message()));
//System.out.println("Message from Single Topic: " + new String(it.message()));
while (it.hasNext()) {
System.out.println("in While");
System.out.println("Message from Single Topic: " + new String(it.next().message()));
}
}
// if (consumer != null) {
// consumer.shutdown();
// }
}
public static void main(String[] args) {
String topic = "test";
SimpleHLConsumer simpleHLConsumer = new SimpleHLConsumer("localhost:2181", "testgroup", topic);
simpleHLConsumer.testConsumer();
}
}
Here is the output i see in eclipse. It does seem to connect to my zookeeper , but it just hangs there, it does not display any message at all.
log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
SLF4J: The requested version 1.6 by your slf4j binding is not compatible with [1.5.5, 1.5.6]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
{test=[testgroup kafka stream]}
[testgroup kafka stream]
kafka.javaapi.consumer.ZookeeperConsumerConnector#6200f9cb
for loop
Consumer iterator hasNext is blocking call. It will block indefinitely if no new message is available for consumption.
To verify this, change your code to
// Comment 2 lines below
// System.out.println(it);
// System.out.println("Message from Single Topic: " + new String(it.next().message()));
// Line below is blocking. Your code will hang till next message in topic.
// Add new message in topic using producer, message will appear in console
while (it.hasNext()) {
Better way is to execute code in separate thread. Use consumer.timeout.ms to specify time in ms, after which consumer will throw timeout exception
// keepRunningThread is flag to control when to exit consumer loop
while(keepRunningThread)
{
try
{
if(it.hasNext())
{
System.out.println(new String(it.next().message()));
}
}
catch(ConsumerTimeoutException ex)
{
// Timeout exception waiting for kafka message
// Wait for 5 (or t) seconds before checking for message again
Thread.sleep(5000);
}
}