Kafka Streams Twitter Wordcount - Count Value not Long after Serialization

I am running a Kafka cluster with Docker Compose on an AWS EC2 instance.
I want to receive all the tweets of a specific keyword and push them to Kafka. This works fine.
But I also want to count the most used words of those tweets.
This is the WordCount code:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.StreamsBuilder;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import java.util.concurrent.CountDownLatch;
import static org.apache.kafka.streams.StreamsConfig.APPLICATION_ID_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.BOOTSTRAP_SERVERS_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG;
import static org.apache.kafka.streams.StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG;
public class WordCount {
public static void main(String[] args) {
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> textLines = builder
.stream("test-topic");
textLines
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.as("WordCount"))
.toStream()
.to("test-output", Produced.with(Serdes.String(), Serdes.Long()));
final Topology topology = builder.build();
Properties props = new Properties();
props.put(APPLICATION_ID_CONFIG, "streams-word-count");
props.put(BOOTSTRAP_SERVERS_CONFIG, "ec2-ip:9092");
props.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
Runtime.getRuntime().addShutdownHook(
new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
When I check the output topic in the Control Center, it looks like this:
[screenshot of the output topic records, with Key and Value columns; the values are not rendered as numbers]
It looks like splitting the tweets into single words works, but the count value isn't displayed as a Long, even though that is what the code specifies.
When I use the kafka-console-consumer to consume from this topic, it says:
"Size of data received by LongDeserializer is not 8"

The Control Center UI and the console consumer can only render UTF-8 data by default.
You'll need to explicitly pass LongDeserializer to the console consumer as the value deserializer only (the keys are still plain strings).
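For example, a sketch reusing the topic and broker address from the question:
kafka-console-consumer --bootstrap-server ec2-ip:9092 --topic test-output --from-beginning \
  --property print.key=true \
  --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
  --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer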

Try a KTable instead:
KStream<String, String> textLines = builder.stream("test-topic", Consumed.with(Serdes.String(), Serdes.String()));
KTable<String, Long> wordCounts = textLines
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
        .groupBy((key, value) -> value)
        .count();
wordCounts.toStream()
        .to("test-output", Produced.with(Serdes.String(), Serdes.Long()));

Related

TestOutputTopic.readKeyValuesToMap() removes messages from tested topic. How to do intermediate assertions during the test?

While using TopologyTestDriver I want to test my stream and do assertions on the intermediate state between incoming messages. But after using TestOutputTopic.readKeyValuesToMap() the tested topic is cleared. How can I "peek" and do assertions between messages?
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.test.TestRecord;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import java.util.Properties;
public class AggregationTest {
private static TestInputTopic<String, String> inputTopic;
private static TestOutputTopic<String, String> outputTopic;
private static TopologyTestDriver testDriver;
@BeforeAll
public static void setup() {
StreamsBuilder builder = new StreamsBuilder();
builder
.stream("inputTopic", Consumed.with(Serdes.String(), Serdes.String()))
.toTable(Materialized.with(Serdes.String(), Serdes.String()))
.groupBy(
KeyValue::pair,
Grouped.with("group-by-internal", Serdes.String(), Serdes.String()))
.aggregate(
() -> "",
(key, incomingMessage, existingMessage) -> incomingMessage + " " + existingMessage,
(key, incomingMessage, existingMessage) -> existingMessage
).toStream().to("outputTopic", Produced.with(Serdes.String(), Serdes.String()));
testDriver = new TopologyTestDriver(builder.build(), new Properties() {{
put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, Serdes.String().getClass().getName());
put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
put(StreamsConfig.STATE_DIR_CONFIG, "/tmp/kafka-streams");
}});
inputTopic = testDriver.createInputTopic("inputTopic", Serdes.String().serializer(), Serdes.String().serializer());
outputTopic = testDriver.createOutputTopic("outputTopic", Serdes.String().deserializer(), Serdes.String().deserializer());
}
@Test
public void testAggregation() {
TestRecord<String, String> message1 = new TestRecord<>( "Key1", "Value1");
TestRecord<String, String> message2 = new TestRecord<>( "Key2", "Value2");
TestRecord<String, String> message3 = new TestRecord<>( "Key1", "Value3");
inputTopic.pipeInput(message1);
inputTopic.pipeInput(message2);
var outputMap = outputTopic.readKeyValuesToMap();
System.out.println(outputMap); // {Key2=Value2 , Key1=Value1 }
// Assert that message1 and message2 were not affected
inputTopic.pipeInput(message3);
var outputMap2 = outputTopic.readKeyValuesToMap();
System.out.println(outputMap2); // {Key1=Value3 Value1 } // where did the message with Key2 go?
// How to assert that message3 was merged with message1, but message2 was not affected?
}
@AfterAll
public static void tearDown() {
testDriver.close();
}
}
The JavaDoc indicates it should return the full, latest state of the topic (are you sure a tombstone event wasn't introduced, somehow?), so I am not sure why it would disappear.
If you want to aggregate the state of both maps, you can merge them rather than re-assign the previous reference, but that would fix the test, not necessarily actual runtime behavior...
You may want to revisit your aggregate function. Key2 has no existingMessage when it is originally incoming. Therefore, you've returned null there, and it would not exist in the map output. The only value you'd have is therefore Value3 Value1.
Try this for instances where you only expect one value
(key, incomingMessage, existingMessage) -> existingMessage == null ? incomingMessage : existingMessage
From the docs:
readKeyValuesToMap:
"Read output to map. If the result is considered a stream, you can use readRecordsToList() instead."
The Map depicts the table, containing only the latest value per key. What you want is the stream of data records.
Link: https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/TestOutputTopic.html#readKeyValuesToMap--
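For example, a minimal sketch against the test setup above, reading the output as individual records so intermediate updates stay visible:
var records = outputTopic.readRecordsToList(); // List<TestRecord<String, String>>, one entry per update
records.forEach(System.out::println);
Assertions can then target individual TestRecord entries (or the size of the list) instead of only the latest value per key.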
(PS sorry for the incomplete answer, I can not comment yet.)

Count number of message/event in Kafka stream at a periodic level

I have created a Kafka stream by consuming messages from a Kafka topic. I want to count the number of messages I receive per 1-minute interval.
So let's say I receive messages in the following way:
t1 -> message1
t1 -> message2
t1 -> message3
After 1 minute, I receive messages like this:
t2 -> message4
t2 -> message5
Let's say I have an integer variable count in my Java application. From the start of the application until the end of the first minute, this count value should be 3. At the end of the second minute, it should become 2. This is because I received 3 messages in the first minute and 2 messages in the second minute.
My code so far
import lombok.SneakyThrows;
import org.apache.commons.lang3.StringUtils;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.ForeachAction;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;
public class CountMessage {
private static KafkaStreams kafkaStreams;
public static void main(String[] args) {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my_first_count_2");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "10.0.0.43:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyTimestampExtractor.class);
final StreamsBuilder streamsBuilder = new StreamsBuilder();
// consuming stream
String kafkaTopic = "my_kafka_topic_2";
System.out.println("Starting the application");
KStream<String, String> myStream = streamsBuilder
.stream(kafkaTopic);
myStream.foreach(new ForeachAction<String, String>() {
@SneakyThrows
@Override
public void apply(String key, String value) {
System.out.println("key received = " + key + "---<<<" + value);
}
});
final Topology topology = streamsBuilder.build();
kafkaStreams = new KafkaStreams(topology, props);
kafkaStreams.start();
}
}
Not sure if you're tied to using Kafka Streams, but for what it's worth you can do this with ksqlDB:
SELECT TIMESTAMPTOSTRING(WINDOWSTART,'yyyy-MM-dd HH:mm:ss') AS TS,
COUNT(*) AS MSG_COUNT
FROM SRC_STREAM
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY 'X'
EMIT CHANGES;
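If you do want to stay with Kafka Streams, here is a minimal sketch of the same idea (reusing streamsBuilder and kafkaTopic from the question, and assuming additional imports for Consumed, Grouped, TimeWindows, KeyValue and java.time.Duration): a tumbling-window count over a single constant key.
KStream<String, String> myStream = streamsBuilder
        .stream(kafkaTopic, Consumed.with(Serdes.String(), Serdes.String()));
myStream
        .map((key, value) -> KeyValue.pair("all", value)) // single key, so one global count per window
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        .windowedBy(TimeWindows.of(Duration.ofMinutes(1))) // 1-minute tumbling windows
        .count()
        .toStream()
        .foreach((windowedKey, count) ->
                System.out.println(windowedKey.window().startTime() + " -> " + count));
Each printed line gives the window start time and the number of messages seen in that minute.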

Kafka Streams - Fields in the Custom object changing to null while doing Aggregation

I've written a simple Kafka Streams processor that does the following:
Read messages as a stream from a topic with <K, V> as <String, String>
Convert the message value from String to a custom object, giving <String, Object>, using the mapValues() method
Use a window function to aggregate statistics over the objects for a particular time interval
Sample Message
{"coiRequestGuid":"xxxx","accountId":1122132,"companyName":"xxxx","existingPolicyCoverageLimit":1000000,"isChangeRequested":true,"newlyRequestedPolicyCoverageLimit":200000,"isNewRecipient":false,"newRecipientGuid":null,"existingRecipientId":11111,"recipientName":"xxxx","recipientEmail":"xxxxx"}
Here is my code
import com.da.app.data.model.PolicyChangeRequest;
import com.da.app.data.model.PolicyChangeRequestStats;
import com.da.app.data.serde.JsonDeserializer;
import com.da.app.data.serde.JsonSerializer;
import com.da.app.data.serde.WrapperSerde;
import com.da.app.system.util.ConfigUtil;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.log4j.Logger;
import org.json.JSONObject;
import java.time.Duration;
import java.util.Properties;
public class PolicyChangeReqStreamProcessor {
private static final Logger logger = Logger.getLogger(PolicyChangeReqStreamProcessor.class);
private static final String TOPIC_NAME = "stream-window-play1";
private static ObjectMapper mapper = new ObjectMapper();
private static Properties properties = ConfigUtil.loadProperty();
public static void main(String[] args) {
logger.info("Policy Limit Change Stats Generator");
Properties streamProperties = getStreamProperties();
StreamsBuilder streamsBuilder = new StreamsBuilder();
KStream<String, String> source = streamsBuilder.stream(TOPIC_NAME,
Consumed.with(Serdes.String(), Serdes.String()));
source
.filter((key, value) -> isValidEvent(value))
//Converting the request json to PolicyChangeRequest object
.mapValues(PolicyChangeReqStreamProcessor::convertPolicyChangeReqJsonToObj)
//Mapping all events to a single key in order to group all the events
.map((key, value) -> new KeyValue<>("key", value))
// Grouping by key
.groupByKey(Grouped.with(Serdes.String(), new PolicyChangeRequestSerde()))
//Creating a Tumbling window of 5 secs (for Testing)
.windowedBy(TimeWindows.of(Duration.ofSeconds(5)).advanceBy(Duration.ofSeconds(5)))
// Aggregating the PolicyChangeRequest events to a
// PolicyChangeRequestStats object
.<PolicyChangeRequestStats>aggregate(PolicyChangeRequestStats::new,
(k, v, policyStats) -> policyStats.add(v),
Materialized.<String, PolicyChangeRequestStats, WindowStore<Bytes, byte[]>>as
("policy-change-aggregates")
.withValueSerde(new PolicyChangeRequestStatsSerde()))
//Converting KTable to KStream
.toStream()
.foreach((key, value) -> logger.info(key.window().startTime() + "----" + key.window().endTime() + " :: " + value));
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), streamProperties);
logger.info("Started the stream");
kafkaStreams.start();
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
}
private static PolicyChangeRequest convertPolicyChangeReqJsonToObj(String policyChangeReq) {
JSONObject policyChangeReqJson = new JSONObject(policyChangeReq);
PolicyChangeRequest policyChangeRequest = new PolicyChangeRequest(policyChangeReqJson);
// return mapper.readValue(value, PolicyChangeRequest.class);
return policyChangeRequest;
}
private static boolean isValidEvent(String value) {
//TODO: Message Validation
return true;
}
private static Properties getStreamProperties() {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "policy-change-stats-gen");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, properties.getProperty("kafka.bootstrap.servers"));
props.put(StreamsConfig.CLIENT_ID_CONFIG, "stream-window-play1");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
public static final class PolicyChangeRequestStatsSerde extends WrapperSerde<PolicyChangeRequestStats> {
PolicyChangeRequestStatsSerde() {
super(new JsonSerializer<>(), new JsonDeserializer<>(PolicyChangeRequestStats.class));
}
}
public static final class PolicyChangeRequestSerde extends WrapperSerde<PolicyChangeRequest> {
PolicyChangeRequestSerde() {
super(new JsonSerializer<>(), new JsonDeserializer<>(PolicyChangeRequest.class));
}
}
}
isValidEvent - returns true
.mapValues(PolicyChangeReqStreamProcessor::convertPolicyChangeReqJsonToObj) - This will convert the incoming json string to a PolicyChangeRequest object
Up to the map((key, value) -> new KeyValue<>("key", value)) operation, the custom PolicyChangeRequest object matches the incoming message (I've verified this by printing the stream there).
But after the groupBy and aggregate operations, the custom object comes out as:
PolicyChangeRequest{coiRequestGuid='null', accountId='null', companyName='null', existingPolicyCoverageLimit=null, isChangeRequested=null, newlyRequestedPolicyCoverageLimit=null, isNewRecipient=null, newRecipientGuid='null', existingRecipientId='null', recipientName='null', recipientEmail='null'}
I found the above value by putting a log statement inside the policyStats.add(v) method that I call inside the aggregate method.
The add method is in the PolicyChangeRequestStats class
public PolicyChangeRequestStats add(PolicyChangeRequest policyChangeRequest) {
System.out.println("Incoming req: " + policyChangeRequest);
//Incrementing the Policy limit change request count
this.policyLimitChangeRequests++;
//Adding the Increased policy limit coverage to the existing increasedPolicyLimitCoverage
this.increasedPolicyLimitCoverage +=
(policyChangeRequest.getNewlyRequestedPolicyCoverageLimit() -
policyChangeRequest.getExistingPolicyCoverageLimit());
return this;
}
I'm getting a NullPointerException on the line that computes policyChangeRequest.getNewlyRequestedPolicyCoverageLimit() - policyChangeRequest.getExistingPolicyCoverageLimit(), because those values are null in the PolicyChangeRequest object.
I've provided valid Serde classes for the key and value in the groupBy: .groupByKey(Grouped.with(Serdes.String(), new PolicyChangeRequestSerde())).
For serialization and deserialization I used Gson.
But I can't get the PolicyChangeRequest object to arrive at the aggregation intact, as it was before the grouping operation.
I'm new to Kafka Streams and I'm not sure whether I've missed something or whether my approach is correct.
Can anyone guide me here?

Concatenate logs by ID and time using Kafka Streams - Failed to flush state store

I want to concatenate logs by ID within a window of time using Kafka Streams.
For now, I can successfully count the number of logs having the same ID (the commented code).
However, when I replace the .count method with .aggregate, I face the following error:
"Failed to flush state store time-windowed-aggregation-stream-store"
Caused by: java.lang.ClassCastException: org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
I'm new to this and can't figure out the cause of this error; I thought that having .withValueSerde(Serdes.String()) was supposed to prevent it.
Below my code:
package myapps;
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Suppressed.*;
import org.apache.kafka.streams.state.WindowStore;
public class MyCode {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-mycode");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("streams-plaintext-input");
KStream<String, String> changedKeyStream = source.selectKey((k, v)
-> v.substring(v.indexOf("mid="),v.indexOf("mid=")+8));
/* // Working code for count
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3))
.grace(Duration.ofSeconds(2)))
.count(Materialized.with(Serdes.String(), Serdes.Long())) // could be replaced with an aggregator (reducer?) ?
.suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
.toStream()
.print(Printed.toSysOut());
*/
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
.aggregate(
String::new, (String k, String v, String Result) -> { return Result+"\n"+v; },
Materialized.<String, String, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.String())) /* serde for aggregate value */
.suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
.toStream()
.print(Printed.toSysOut());
changedKeyStream.to("streams-mycode-output", Produced.with(Serdes.String(), Serdes.String()));
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
// launch until control+c
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.out.print("Something went wrong!");
System.exit(1);
}
System.exit(0);
}
}
Thank you in advance for your help.
There are two options to fix it:
Pass org.apache.kafka.streams.kstream.Grouped to KStream::groupByKey.
Set the key org.apache.kafka.common.serialization.Serde on Materialized via Materialized::withKeySerde(...).
Sample code below:
Ad 1.
changedKeyStream
.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
Ad 2.
changedKeyStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(3)))
.aggregate(
String::new, (String k, String v, String Result) -> { return Result+"_"+v; },
Materialized.<String, String, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.String())
.withKeySerde(Serdes.String())
)

How to write data from Kafka topic to file using KStreams?

I am trying to create a KStream application in Eclipse using Java. Right now I am referring to the word count program available on the internet for KStreams and modifying it.
What I want is that the data that I am reading from the input topic should be written to a file instead of being written to another output topic.
But when I try to print the KStream/KTable to a local file, I get the following entry in the output file:
org.apache.kafka.streams.kstream.internals.KStreamImpl#4c203ea1
How do I implement redirecting the output from the KStream to a file?
Below is the code:
package KStreamDemo.kafkatest;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueMapper;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class TemperatureDemo {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "34.73.184.104:9092");
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
System.out.println("#1###################################################################################################################################################################################");
// setting offset reset to earliest so that we can re-run the demo code with the same pre-loaded data
// Note: To re-run the demo, you need to use the offset reset tool:
// https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
StreamsBuilder builder = new StreamsBuilder();
System.out.println("#2###################################################################################################################################################################################");
KStream<String, String> source = builder.stream("iot-temperature");
System.out.println("#5###################################################################################################################################################################################");
KTable<String, Long> counts = source
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
@Override
public Iterable<String> apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
}
})
.groupBy(new KeyValueMapper<String, String, String>() {
@Override
public String apply(String key, String value) {
return value;
}
})
.count();
System.out.println("#3###################################################################################################################################################################################");
System.out.println("OUTPUT:"+ counts);
System.out.println("#4###################################################################################################################################################################################");
// need to override value serde to Long type
counts.toStream().to("iot-temperature-max", Produced.with(Serdes.String(), Serdes.Long()));
final KafkaStreams streams = new KafkaStreams(builder.build(), props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
This is not correct
System.out.println("OUTPUT:"+ counts);
You would need to do counts.toStream().foreach, then write the messages out to a file.
Print Kafka Stream Input out to console? (just update to write to file instead)
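For example, a rough sketch of that approach (the output path is an assumption, and java.io.FileWriter / java.io.IOException imports are needed); note the write happens as a side effect on the Streams threads, so this is really only suitable for demos:
counts.toStream().foreach((word, count) -> {
    try (FileWriter writer = new FileWriter("/tmp/word-counts.txt", true)) { // open in append mode
        writer.write(word + "," + count + System.lineSeparator());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});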
However, it is probably better to write the stream out to a topic and then use Kafka Connect to write it out to a file. This is a more industry-standard pattern; Kafka Streams is encouraged to only move data between topics within Kafka, not to integrate with external systems (or filesystems).
Edit connect-file-sink.properties with the topic information you want, then run:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-sink.properties
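For reference, the stock config/connect-file-sink.properties that ships with Kafka looks roughly like this (the file and topics values below are assumptions chosen to match this example, not the shipped defaults):
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/iot-temperature.sink.txt
topics=iot-temperature-max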
