Cannot access KTable from a different app as StateStore - java

I have two Java applications (App1, App2) to test how to access a KTable from a different app in a single-instance environment in Docker.
The first app (App1) writes to a KTable with the following code.
public static void main(String[] args)
{
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG,"gateway-service");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "172.18.0.11:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, ServiceTransactionSerde.class);
KStreamBuilder builder = new KStreamBuilder();
KStream<String,ServiceTransaction> source = builder.stream("gateway_request_processed");
KStream<String, Long> countByApi = source.groupBy((key,value)-> value.getApiId().toString()).count("Counts").toStream();
countByApi.to(Serdes.String(), Serdes.Long(),"countByApi");
countByApi.print();
final KafkaStreams streams = new KafkaStreams(builder,props);
streams.start();
System.out.println(streams.state());
System.out.println(streams.allMetadata());
System.out.println(streams.allMetadataForStore("countByApi"));
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override
public void run() {
System.out.println(streams.allMetadata());
streams.close();
}
}));
}
When I run my producer, I get the following output for the code in App1:
RUNNING
[]
[]
[KTABLE-TOSTREAM-0000000006]: c00af5ee-3c2d-4d12-9c4b-3b55c1284dd6, 19
This shows me state = RUNNING. The metadata is empty, also for the store. But the request gets processed and stored in the KTable successfully (String, Long).
When I run kafka-topics.sh --list --zookeeper zookeeper:2181
I get the following topics.
bash-4.3# kafka-topics.sh --list --zookeeper zookeeper:2181
__consumer_offsets
countByApi
gateway-Counts-changelog
gateway-Counts-repartition
gateway-service-Counts-changelog
gateway-service-Counts-repartition
gateway_request_processed
This shows me that the KTable is persisted in new (internal) topics.
I then have a second command-line app (App2) with the following code, which tries to access this KTable as a state store (ReadOnlyKeyValueStore).
public static void main( String[] args )
{
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "gateway-service-table-client");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "172.18.0.11:9092");
KStreamBuilder builder = new KStreamBuilder();
KafkaStreams streams = new KafkaStreams(builder,props);
streams.cleanUp();
streams.start();
System.out.println( "Hello World!" );
System.out.println(streams.state());
ReadOnlyKeyValueStore<String,Long> keyValueStore =
streams.store("countByApi", QueryableStoreTypes.keyValueStore());
final KeyValueIterator<String,Long> range = keyValueStore.all();
while(range.hasNext()){
KeyValue<String,Long> next = range.next();
System.out.println(String.format("key: %s | value: %s", next.key,next.value));
}
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override
public void run() {
System.out.println(streams.allMetadata());
streams.close();
}
}));
}
When I run the second app, I get the following error message:
RUNNING
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, countByApi, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:728)
at com.comp.streamtable.App.main(App.java:37)
Unfortunately, I have only one instance, and I verified that the state equals RUNNING.
Note: I had to choose a different application.id for each app, since using the same one threw another exception. Just wanted to point this out since it might be of interest.
What am I missing here to access my KTable from another app?

You are using a different application.id for each application. Thus, the two applications are completely decoupled.
Interactive Queries are designed for different instances of the same application and do not work across applications.
This blog post might help: https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
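For completeness, here is a minimal sketch of how the store could be queried from within the same application, i.e. a second instance started with the same application.id and the same topology as App1. The retry loop and the sleep interval are assumptions to bridge the time until the store is ready; note that the name to query is the store name "Counts" (given to count()), not the output topic "countByApi".

public static void main(String[] args) throws InterruptedException {
    final Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "gateway-service"); // same application.id as App1
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "172.18.0.11:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, ServiceTransactionSerde.class);

    // same topology as App1; "Counts" is the queryable store name created by count()
    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, ServiceTransaction> source = builder.stream("gateway_request_processed");
    source.groupBy((key, value) -> value.getApiId().toString()).count("Counts");

    KafkaStreams streams = new KafkaStreams(builder, props);
    streams.start();

    // wait until the local store is queryable, then read it
    ReadOnlyKeyValueStore<String, Long> store = null;
    while (store == null) {
        try {
            store = streams.store("Counts", QueryableStoreTypes.keyValueStore());
        } catch (InvalidStateStoreException e) {
            Thread.sleep(100); // store not ready yet or rebalance in progress
        }
    }
    store.all().forEachRemaining(kv -> System.out.println(kv.key + " -> " + kv.value));
}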

Related

Flink savepoint not saving the valuestates

I'm writing a Flink program and can't get my stateful variables restored when restarting the job.
I made a simple program with a Kafka connector from which I receive messages, and a RichFlatMap with a ValueState variable. This variable is an integer that increases by 1 with every message.
I take a stop-with-savepoint when the value is around 15, but when I restore from that savepoint the counter goes back to 1.
Streamingjob.java:
KeyedStream<JsonNode, Object> eventsByKey = env
.addSource(consumer).name("Producer Topic Source")
.keyBy(e -> {...});
eventsByKey
.flatMap(new Test())
.uid("test-id")
Test.java:
public class Test extends RichFlatMapFunction<JsonNode, JsonNode> {
private transient ValueState<Integer> persistence;
@Override
public void flatMap(JsonNode node, Collector<JsonNode> collector) throws Exception {
if (persistence.value() == null) persistence.update(1);
String device_id = node.get("data").get("device_id").toString();
System.out.println();
System.out.println(device_id);
System.out.println(persistence.value());
System.out.println();
persistence.update(persistence.value() + 1);
}
@Override
public void open(Configuration config) {
this.persistence = getRuntimeContext().getState(new ValueStateDescriptor<>(
"prueba", // the state name
Integer.class));
}
}
This is the command I use to stop with a savepoint:
../bin/flink stop --savepointPath f74c92af01ed51af94e530ee0e208d7c
And this one to start from the savepoint:
../bin/flink run flink-andy-12.3.0.jar --savepointPath file:/{...}/savepoint-f74c92-6acdb05afd11
Any ideas on what I should do?
To restart from a savepoint you need to specify --fromSavepoint, and not --savepointPath. (docs)
In other words:
$ ./bin/flink run \
--fromSavepoint /{...}/savepoint-f74c92-6acdb05afd11 \
flink-andy-12.3.0.jar
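For reference, a sketch of the full cycle on the command line (the target directory /tmp/savepoints is a placeholder; the concrete savepoint path, e.g. savepoint-f74c92-6acdb05afd11, is reported by the stop command):
$ ./bin/flink stop --savepointPath /tmp/savepoints f74c92af01ed51af94e530ee0e208d7c
$ ./bin/flink run \
    --fromSavepoint /tmp/savepoints/savepoint-f74c92-6acdb05afd11 \
    flink-andy-12.3.0.jar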

LaunchDarkly: Flushing data from client in offline mode

I'm working on a POC using LaunchDarkly's Java + Redis SDK, and one of my requirements is initializing a second LaunchDarkly client in "offline" mode. Due to my existing architecture, one application will connect to LaunchDarkly and hydrate a Redis instance. The second application will connect to the same data store, but its client will initialize as "offline" -- is there currently a way for me to read stored events from the offline client and flush them to the LaunchDarkly servers?
In the code snippet below I am initializing the first client + Redis store, then initializing a second client in a background thread that connects to the same local Redis instance. I can confirm that when I run this snippet I do not see events populate in the LaunchDarkly UI.
NOTE: this is POC to determine whether LaunchDarkly will work for my use case. It is not a Production-grade implementation.
public static void main(String[] args) throws IOException {
LDConfig config = new LDConfig.Builder().dataStore(Components
.persistentDataStore(
Redis.dataStore().uri(URI.create("redis://127.0.0.1:6379")).prefix("my-key-prefix"))
.cacheSeconds(30)).build();
LDClient ldClient = new LDClient("SDK-KEY", config);
Runnable r = new Runnable() {
@Override
public void run() {
LDConfig offlineConfig = new LDConfig.Builder().dataStore(Components
.persistentDataStore(
Redis.dataStore().uri(URI.create("redis://127.0.0.1:6379")).prefix("my-key-prefix"))
.cacheSeconds(30)).offline(true).build();
LDClient offlineClient = new LDClient("SDK-KEY", offlineConfig);
String uniqueId = "abcde";
LDUser user = new LDUser.Builder(uniqueId).custom("customField", "customValue").build();
boolean showFeature = offlineClient.boolVariation("test-feature-flag", user, false);
if (showFeature) {
System.out.println("Showing your feature");
} else {
System.out.println("Not showing your feature");
}
try {
offlineClient.close();
} catch (IOException e) {
e.printStackTrace();
}
}
};
ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(r);
executor.shutdown();
ldClient.close();
}

How to skip an Avro serialization exception in KafkaStreams API?

I have a Kafka application written with the Kafka Streams Java API. It reads data from a MySQL binlog and does some processing that is irrelevant to my question. The problem is that one particular row produces an error when being deserialized from Avro. I could dig into the Avro schema file and find the problem, but as a whole what I need is a forgiving exception handler that, upon encountering such an error, does not bring the whole application to a halt.
This is the main part of my stream app:
StreamsBuilder streamsBuilder = watchForCourierUpdate(builder);
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), properties);
kafkaStreams.start();
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
}
private static StreamsBuilder watchForCourierUpdate(StreamsBuilder builder){
CourierUpdateListener courierUpdateListener = new CourierUpdateListener(builder);
courierUpdateListener.start();
return builder;
}
private static Properties configProperties(){
Properties streamProperties = new Properties();
streamProperties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, Configs.getConfig("schemaRegistryUrl"));
streamProperties.put(StreamsConfig.APPLICATION_ID_CONFIG, "courier_app");
streamProperties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, Configs.getConfig("bootstrapServerUrl"));
streamProperties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
streamProperties.put(StreamsConfig.STATE_DIR_CONFIG, "/tmp/state_dir");
streamProperties.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "3");
streamProperties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamProperties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamProperties.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
streamProperties.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
CourierSerializationException.class);
return streamProperties;
}
This is my CourierSerializationException class:
public class CourierSerializationException implements ProductionExceptionHandler {
@Override
public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> producerRecord, Exception e) {
Logger.logError("Failed to de/serialize entity from " + producerRecord.topic() + " topic.\n" + e);
return ProductionExceptionHandlerResponse.CONTINUE;
}
@Override
public void configure(Map<String, ?> map) {
}
}
Still, whenever an Avro deserialization exception happens, the stream shuts down and the application does not continue. Am I missing something?
Have you tried to do this with the default.deserialization.exception.handler provided by Kafka? You can use LogAndContinueExceptionHandler, which will log and continue.
I may be wrong, but I think a custom handler implementing ProductionExceptionHandler only works for errors on the producer (write) side, such as network-related failures.
Add this to the properties and see what happens:
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
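If logging alone is not enough and you want your own handler on the read path, the deserialization-side counterpart of your CourierSerializationException could look roughly like the sketch below (the class name is a placeholder and the Logger call just mirrors the one in your code):

public class CourierDeserializationException implements DeserializationExceptionHandler {
    @Override
    public DeserializationHandlerResponse handle(ProcessorContext context,
                                                 ConsumerRecord<byte[], byte[]> record,
                                                 Exception exception) {
        // log the poison record and keep the stream running
        Logger.logError("Failed to deserialize record from " + record.topic()
                + ", partition " + record.partition() + ", offset " + record.offset() + "\n" + exception);
        return DeserializationHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

and be registered via:

streamProperties.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, CourierDeserializationException.class);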

Why are all my Kafka messages being replayed in Storm?

I'm trying to figure out why all my Kafka messages are getting replayed every time I restart my Storm topology.
My understanding of how it should work is that once the last bolt has ack'ed the tuple, the spout should commit the message offset on Kafka, and hence I should not see it replayed after a restart.
My code is a simple Kafka spout and a bolt which just prints every message and then acks it.
private static KafkaSpout buildKafkaSpout(String topicName) {
ZkHosts zkHosts = new ZkHosts("localhost:2181");
SpoutConfig spoutConfig = new SpoutConfig(zkHosts,
topicName,
"/" + topicName,
"mykafkaspout"); /*was:UUID.randomUUID().toString()*/
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
return new KafkaSpout(spoutConfig);
}
public static class PrintBolt extends BaseRichBolt {
OutputCollector _collector;
public static Logger LOG = LoggerFactory.getLogger(PrintBolt.class);
@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
@Override
public void execute(Tuple tuple) {
LOG.error("PrintBolt.0: {}",tuple.getString(0));
_collector.ack(tuple);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("nothing"));
}
}
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
builder.setBolt("print1", new PrintBolt(),1).shuffleGrouping("kafka");
}
I have not provided any config settings other than those in the code.
Am I missing a config setting, or what am I doing wrong?
UPDATE:
To clarify, everything works fine until I restart the pipeline. The behavior below is what I get with other (non-Storm) consumers, and what I expected from the KafkaSpout.
My expectations:
However, the actual behavior I'm getting with the default settings is the following: the messages are processed fine up to the point where I stop the pipeline, and when I restart I get a replay of all the messages, including those (A and B) which I believed I had already ack'ed.
What actually happens:
As per the configuration options mentioned by Matthias, I can change the startOffsetTime to Latest; however, that is literally the latest offset, so the pipeline drops the messages (message "C") that were produced while the pipeline was restarting.
I have a consumer written in NodeJS (using npm kafka-node) which is able to ack messages to Kafka, and when I restart the NodeJS consumer it does exactly what I expected (it catches up on message "C", which was produced while the consumer was down, and continues from there) -- so how do I get the same behavior with the KafkaSpout?
The problem was in the submit code -- the template code for submitting the topology creates an instance of LocalCluster if the storm jar is run without a topology name, and the LocalCluster does not persist the spout's state, hence the replay.
So
$ storm jar myjar.jar storm.myorg.MyTopology topologyname
will launch it on my single-node development cluster, whereas
$ storm jar myjar.jar storm.myorg.MyTopology
will launch it on an instance of LocalCluster
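For illustration, a sketch of the kind of submit code this refers to (the topology name used for local mode is an assumption): when a name is passed on the command line the topology goes to the real cluster via StormSubmitter, otherwise it falls back to an in-process LocalCluster whose offsets are lost when the process exits.

public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
    builder.setBolt("print1", new PrintBolt(), 1).shuffleGrouping("kafka");
    Config conf = new Config();
    if (args != null && args.length > 0) {
        // real cluster: the spout's Zookeeper state survives restarts, so acked offsets are not replayed
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
        // local mode: everything lives in this JVM, so a restart replays from the configured start offset
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("print-topology", conf, builder.createTopology());
    }
}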

neo4j rest graphdb hangs when connecting to remote heroku instance

public class Test
{
private static RestAPI rest = new RestAPIFacade("myIp","username","password");
public static void main(String[] args)
{
Map<String, Object> foo = new HashMap<String, Object>();
foo.put("Test key", "testing");
rest.createNode(foo);
}
}
There is no output; it just hangs on the connection indefinitely.
Environment:
Eclipse
JDK 7
neo4j-rest-binding 1.9: https://github.com/neo4j/java-rest-binding
Heroku
Any ideas as to why this just hangs?
The following code works:
public class Test
{
private static RestAPI rest = new RestAPIFacade("myIp","username","password");
public static void main(String[] args)
{
Node node = rest.getNodeById(1);
}
}
So it follows that I can correctly retrieve remote values.
I guess this is caused by not using transactions. By default, neo4j-rest-binding aggregates multiple operations into one request (aka one transaction). There are two ways to deal with this:
Change the transactional behaviour to "1 operation = 1 transaction" by setting -Dorg.neo4j.rest.batch_transaction=false for your JVM. Be aware this could impact performance, since every atomic operation becomes a separate REST request.
Use transactions in your code:
RestGraphDatabase db = new RestGraphDatabase("http://localhost:7474/db/data",username,password);
Transaction tx = db.beginTx();
try {
Node node = db.createNode();
node.setProperty("key", "value");
tx.success();
} finally {
tx.finish();
}
