Flink savepoint not saving the ValueState - java

I'm writing a Flink program and can't get my stateful variables to survive a restart of the job.
I made a simple program with a Kafka connector that receives messages and a RichFlatMap with a ValueState variable. The variable is an integer that increases by 1 with every message.
I stop the job with a savepoint when the value is around 15, but when I restore from that savepoint the counter goes back to 1.
Streamingjob.java:
KeyedStream<JsonNode, Object> eventsByKey = env
        .addSource(consumer).name("Producer Topic Source")
        .keyBy(e -> {...});

eventsByKey
        .flatMap(new Test())
        .uid("test-id");
Test.java:
public class Test extends RichFlatMapFunction<JsonNode, JsonNode> {

    private transient ValueState<Integer> persistence;

    @Override
    public void flatMap(JsonNode node, Collector<JsonNode> collector) throws Exception {
        if (persistence.value() == null) persistence.update(1);

        String device_id = node.get("data").get("device_id").toString();

        System.out.println();
        System.out.println(device_id);
        System.out.println(persistence.value());
        System.out.println();

        persistence.update(persistence.value() + 1);
    }

    @Override
    public void open(Configuration config) {
        this.persistence = getRuntimeContext().getState(new ValueStateDescriptor<>(
                "prueba", // the state name
                Integer.class));
    }
}
This is the command I use to stop with a savepoint:
../bin/flink stop --savepointPath f74c92af01ed51af94e530ee0e208d7c
And this one to start from the savepoint:
../bin/flink run flink-andy-12.3.0.jar --savepointPath file:/{...}/savepoint-f74c92-6acdb05afd11
Any ideas on what I should do?

To restart from a savepoint you need to specify --fromSavepoint, not --savepointPath (docs).
In other words:
$ ./bin/flink run \
--fromSavepoint /{...}/savepoint-f74c92-6acdb05afd11 \
flink-andy-12.3.0.jar
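As an aside (not from the original answer): in the documented stop syntax the job id is a positional argument, and --savepointPath only names the target directory for the savepoint. Stopping with an explicit target directory would look roughly like this (the directory below is a made-up example; the job id is the one from the question):
$ ./bin/flink stop \
--savepointPath /tmp/flink-savepoints \
f74c92af01ed51af94e530ee0e208d7c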

Related

Kafka streams errors after redeploying Tomcat

I am using Kafka Streams in my project. I compile the project as a WAR and run it in Tomcat.
The project works as I want without any errors. If I first stop Tomcat and then start it, it works without error. However, if I redeploy (undeploy and deploy) the service without stopping Tomcat, I start getting errors. From my research, there is information that Tomcat caches the old version of the service, but none of the fixes I tried solved the problem. I would be grateful for any help.
To repeat: my code works normally. If I run the service for the first time in Tomcat, or if I shut Tomcat down completely and start it again, I don't get an error. Only a redeploy (undeploy and deploy) without stopping Tomcat triggers the error.
I am sharing a small code block below.
Properties streamConfiguration = kafkaStreamsConfiguration.createStreamConfiguration(createKTableGroupId(), new AppSerdes.DataWrapperSerde());
StreamsBuilder streamsBuilder = new StreamsBuilder();
KTable<String, DataWrapper> kTableDataWrapper = streamsBuilder.table(topicAction.getTopicName());
KTable<String, DataWrapper> kTableWithStore = kTableDataWrapper.filter((key, dataWrapper) -> key != null && dataWrapper != null, Materialized.as(createStoreName()));

kTableWithStore.toStream()
        .filter((key, dataWrapper) -> /* Filter */ ...)
        .mapValues((ValueMapperWithKey<String, DataWrapper, Object>) (key, dataWrapper) -> {
            // Logics
        })
        .to(createOutputTopicName());

this.kafkaStreams = new KafkaStreams(streamsBuilder.build(), streamConfiguration);
this.kafkaStreams.start();

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    if (kafkaStreams != null) {
        kafkaStreams.close();
    }
}));
public Properties createStreamConfiguration(String appId, Serde serde) {
    Properties properties = new Properties();
    properties.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);
    properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokers);
    properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, serde.getClass());
    properties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, dynamicKafkaSourceTopologyConfiguration.getkTableCommitIntervalMs());
    properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, dynamicKafkaSourceTopologyConfiguration.getkTableMaxByteBufferMB() * 1024 * 1024);
    properties.put(StreamsConfig.STATE_DIR_CONFIG, KafkaStreamsConfigurationConstants.stateStoreLocation);
    return properties;
}
Error :
2022-02-16 14:19:39.663 WARN 9529 --- [ Thread-462] o.a.k.s.p.i.StateDirectory : Using /tmp directory in the state.dir property can cause failures with writing the checkpoint file due to the fact that this directory can be cleared by the OS
2022-02-16 14:19:39.677 ERROR 9529 --- [ Thread-462] o.a.k.s.p.i.StateDirectory : Unable to obtain lock as state directory is already locked by another process
2022-02-16 14:19:39.702 ERROR 9529 --- [ Thread-462] f.t.s.c.- Message : Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory - Localized Message : Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory - Print Stack Trace : org.apache.kafka.streams.errors.StreamsException: Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory
at org.apache.kafka.streams.processor.internals.StateDirectory.initializeProcessId(StateDirectory.java:186)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:681)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:657)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:567)
I think this is because
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    if (kafkaStreams != null) {
        kafkaStreams.close();
    }
}));
is not being called during a re-deploy, as the JVM process continues to run. Try another way to be notified when your application is being redeployed, for example a ServletContextListener.
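For illustration only (not part of the original answer), a minimal sketch of such a listener, assuming the running KafkaStreams instance can be reached from the web layer (the KafkaStreamsHolder accessor below is made up):
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.apache.kafka.streams.KafkaStreams;

@WebListener
public class KafkaStreamsShutdownListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent event) {
        // nothing to do on startup
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        // Tomcat calls this when the WAR is undeployed, so the Streams instance
        // releases its state directory lock before the new deployment starts.
        KafkaStreams kafkaStreams = KafkaStreamsHolder.get(); // hypothetical accessor
        if (kafkaStreams != null) {
            kafkaStreams.close();
        }
    }
}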
My problem was solved thanks to @udalmik.
I solved it by having my beans implement DisposableBean.
However, I also have prototype beans, and this on its own didn't work for them.
I am sharing my solution for both singleton and prototype beans.
// For Singleton Bean
@Service
public class PersonSingletonBean implements DisposableBean {

    @Override
    public void destroy() throws Exception {
        if (kafkaStreams != null) {
            kafkaStreams.close();
        }
    }
}
// For Prototype Bean
@Service
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class PersonPrototypeBean implements DisposableBean {

    @Override
    public void destroy() {
        if (kafkaStreams != null) {
            kafkaStreams.close();
        }
    }
}
@Service
public class PersonPrototypeBeanList implements DisposableBean {

    private final List<PersonPrototypeBean> personPrototypeBeanList = Collections.synchronizedList(new ArrayList<>());

    public void addToPersonPrototypeBeanList(PersonPrototypeBean personPrototypeBean) {
        personPrototypeBeanList.add(personPrototypeBean);
    }

    @Override
    public void destroy() throws Exception {
        synchronized (personPrototypeBeanList) {
            for (PersonPrototypeBean personPrototypeBean : personPrototypeBeanList) {
                if (personPrototypeBean != null) {
                    personPrototypeBean.destroy();
                }
            }
            personPrototypeBeanList.clear();
        }
    }
}
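One thing the answer doesn't show is how the prototype instances end up in that list; Spring never invokes destruction callbacks on prototype-scoped beans, which is presumably why the singleton holder exists. A hypothetical wiring sketch (the factory class, its ObjectProvider field, and the holder field are assumptions, not from the original answer):
import org.springframework.beans.factory.ObjectProvider;
import org.springframework.stereotype.Service;

@Service
public class PersonPrototypeBeanFactory {

    private final ObjectProvider<PersonPrototypeBean> provider;
    private final PersonPrototypeBeanList holder;

    public PersonPrototypeBeanFactory(ObjectProvider<PersonPrototypeBean> provider,
                                      PersonPrototypeBeanList holder) {
        this.provider = provider;
        this.holder = holder;
    }

    public PersonPrototypeBean create() {
        PersonPrototypeBean bean = provider.getObject();
        // Track the instance so the singleton holder's destroy() can clean it up
        // when the web application is undeployed.
        holder.addToPersonPrototypeBeanList(bean);
        return bean;
    }
}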

Seek method behavior in spring kafka consumer 1.2.x

I do not want to commit offsets for messages whose processing fails; I want them to be re-delivered for processing. I am using spring-kafka 1.2.x and implemented ConsumerSeekAware in my listener.
@Component
public class Listener implements ConsumerSeekAware {

    private static Logger logger = LoggerFactory.getLogger(Listener.class);
    private final ThreadLocal<ConsumerSeekCallback> seekCallBack = new ThreadLocal<>();

    @KafkaListener(topics = "my-topic", containerFactory = "kafkaManualAckListenerContainerFactory")
    public void listen1(ConsumerRecord<String, String> consumerRecord, Acknowledgment ack) throws MyCustomException {
        logger.info("received: key - " + consumerRecord.key() + " value - " + consumerRecord.value());
        // Below code is just to show the issue. Not acknowledging so I can get the same msg again.
        boolean should_commit = false;
        if (should_commit) {
            ack.acknowledge();
        }
        else {
            this.seekCallBack.get().seek(consumerRecord.topic(), consumerRecord.partition(), consumerRecord.offset());
        }
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        logger.info("registerSeekCallback called..");
        this.seekCallBack.set(callback);
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        logger.info("onPartitionsAssigned called..");
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        logger.info("onIdleContainer called..");
    }
}
######### Container config (auto.commit is false in consumer)
factory.getContainerProperties().setAckOnError(false);
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
The problem I am facing: if I have 10 messages across different partitions of a topic, I receive all of them one by one, but after that I keep getting only the last message of each partition. I also tried SeekToCurrentErrorHandler, which was introduced in version 2.0.x, and that works perfectly, but I cannot upgrade my Kafka version. If I restart the container I get all the messages again, which is fine, but I don't want to stop the container whenever processing of a message fails.
So my question is: is it even possible to get the same behavior as SeekToCurrentErrorHandler (exactly the same, without needing to stop the container) in spring-kafka 1.2.x?

Different result on running Flink in local mode and Yarn cluster

I run code using the Flink Java API that gets bytes from Kafka and parses them, followed by inserting into a Cassandra database using another library's static method (both parsing and inserting the results is done by the library). Running the code locally in the IDE I get the desired result, but on a YARN cluster the parse method doesn't work as expected.
public class Test {

    static HashMap<Integer, Object> ConfigHashMap = new HashMap<>();

    public static void main(String[] args) throws Exception {

        CassandraConnection.connect();
        Parser.setInsert(true);

        stream.flatMap(new FlatMapFunction<byte[], Void>() {
            @Override
            public void flatMap(byte[] value, Collector<Void> out) throws Exception {
                Parser.parse(ByteBuffer.wrap(value), ConfigHashMap);
                // Parser.parse(ByteBuffer.wrap(value));
            }
        });

        env.execute();
    }
}
There is a static HashMap field in the Parser class that the parsing configuration is based on, and data is inserted into it during execution. The problem when running on YARN was that this data was not available to the TaskManagers, and they just printed that the config is not available.
So I redefined that HashMap as a parameter of the parse method, but the result was no different.
How can I fix the problem?
I changed the static methods and fields to non-static; using a RichFlatMapFunction solved the problem.
stream.flatMap(new RichFlatMapFunction<byte[], Void>() {

    CassandraConnection con = new CassandraConnection();
    int i = 0;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        con.connect();
    }

    @Override
    public void flatMap(byte[] value, Collector<Void> out) throws Exception {
        ByteBuffer tb = ByteBuffer.wrap(value);
        np.parse(tb, ConfigHashMap, con); // np: the (now non-static) parser instance, created elsewhere
    }
});
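For context (not part of the original question or answer): static fields only exist in the JVM that populates them, so a map filled on the client never reaches the TaskManager JVMs on YARN. Below is a minimal sketch of an alternative, assuming the configuration map is serializable and can be handed to the function's constructor so Flink ships it with the operator (ParseFunction is a made-up name; Parser and CassandraConnection are the question's own classes, and the parse signature is assumed from the answer's np.parse call):
import java.nio.ByteBuffer;
import java.util.HashMap;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class ParseFunction extends RichFlatMapFunction<byte[], Void> {

    // Serialized together with the operator, so every TaskManager gets a copy.
    private final HashMap<Integer, Object> config;
    // Connections are not serializable; create one per task in open().
    private transient CassandraConnection con;
    private transient Parser parser;

    public ParseFunction(HashMap<Integer, Object> config) {
        this.config = config;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        con = new CassandraConnection();
        con.connect();
        parser = new Parser(); // non-static parser, as in the answer
    }

    @Override
    public void flatMap(byte[] value, Collector<Void> out) throws Exception {
        parser.parse(ByteBuffer.wrap(value), config, con);
    }
}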

Why are all my Kafka messages being replayed in Storm?

I'm trying to figure out why all my Kafka messages are getting replayed every time I restart my Storm topology.
My understanding of how it should work is that once the last bolt has ack'ed the tuple, the spout should commit the message offset to Kafka, and hence I should not see it replayed after a restart.
My code is a simple Kafka spout and a bolt which just prints every message and then acks it.
private static KafkaSpout buildKafkaSpout(String topicName) {
    ZkHosts zkHosts = new ZkHosts("localhost:2181");
    SpoutConfig spoutConfig = new SpoutConfig(zkHosts,
            topicName,
            "/" + topicName,
            "mykafkaspout"); /* was: UUID.randomUUID().toString() */
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    return new KafkaSpout(spoutConfig);
}
public static class PrintBolt extends BaseRichBolt {

    OutputCollector _collector;
    public static Logger LOG = LoggerFactory.getLogger(PrintBolt.class);

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        LOG.error("PrintBolt.0: {}", tuple.getString(0));
        _collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("nothing"));
    }
}
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
    builder.setBolt("print1", new PrintBolt(), 1).shuffleGrouping("kafka");
}
I have not provided any config settings other than those in the code.
Am I missing a config-setting or what am I doing wrong?
UPDATE:
To clarify, everything works fine until I restart the pipeline. The behavior below is what I get from other (non-Storm) consumers, and what I expected from the KafkaSpout.
My expectations:
However, the actual behavior I'm getting with the default settings is the following. The messages are processed fine up to the point where I stop the pipeline, and when I restart I get a replay of all the messages, including those (A and B) which I believed had already been ack'ed.
What actually happens:
As per the configuration options mentioned by Matthias, I can change the startOffsetTime to Latest, however that is literally the latest offset, so the pipeline drops the messages (message "C") that were produced while the pipeline was restarting.
I have a consumer written in NodeJS (using npm kafka-node) which is able to ack messages to Kafka, and when I restart the NodeJS consumer it does exactly what I expected (it catches up on message "C", which was produced while the consumer was down, and continues from there) -- so how do I get the same behavior with the KafkaSpout?
The problem was in the submit code -- the template code for submitting the topology creates an instance of LocalCluster if the storm jar is run without a topology name, and the local cluster does not persist the offset state, hence the replay.
So
$ storm jar myjar.jar storm.myorg.MyTopology topologyname
will launch it on my single-node development cluster, whereas
$ storm jar myjar.jar storm.myorg.MyTopology
will launch it on an instance of LocalCluster.
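For reference, a sketch of the usual submit template (not the asker's actual code): it chooses between StormSubmitter and LocalCluster depending on whether a topology name was passed, which is why omitting the name silently fell back to the in-process cluster. It reuses buildKafkaSpout and PrintBolt from the question; Config, StormSubmitter, LocalCluster, and TopologyBuilder come from the Storm core packages (org.apache.storm in 1.x, backtype.storm in older releases):
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
    builder.setBolt("print1", new PrintBolt(), 1).shuffleGrouping("kafka");

    Config conf = new Config();
    if (args != null && args.length > 0) {
        // Real cluster: the spout's offsets are kept in ZooKeeper under the
        // configured consumer id, so acked messages are not replayed on restart.
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
        // LocalCluster runs everything in-process and loses its state on shutdown,
        // which is why every restart replayed the topic from the beginning.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("local-test", conf, builder.createTopology());
    }
}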

neo4j rest graphdb hangs when connecting to remote heroku instance

public class Test
{
    private static RestAPI rest = new RestAPIFacade("myIp", "username", "password");

    public static void main(String[] args)
    {
        Map<String, Object> foo = new HashMap<String, Object>();
        foo.put("Test key", "testing");
        rest.createNode(foo);
    }
}
There is no output; it just hangs on the connection indefinitely.
Environment:
Eclipse
JDK 7
neo4j-rest-binding 1.9: https://github.com/neo4j/java-rest-binding
Heroku
Any ideas as to why this just hangs?
The following code works:
public class Test
{
    private static RestAPI rest = new RestAPIFacade("myIp", "username", "password");

    public static void main(String[] args)
    {
        Node node = rest.getNodeById(1);
    }
}
So it stands to reason that I can correctly retrieve remote values.
I guess this is caused by the lack of transactions. By default neo4j-rest-binding aggregates multiple operations into one request (aka one transaction). There are two ways to deal with this:
1. change the transactional behaviour to "1 operation = 1 transaction" by setting -Dorg.neo4j.rest.batch_transaction=false for your JVM. Be aware this could impact performance, since every atomic operation becomes a separate REST request.
2. use transactions in your code:
RestGraphDatabase db = new RestGraphDatabase("http://localhost:7474/db/data", username, password);
Transaction tx = db.beginTx();
try {
    Node node = db.createNode();
    node.setProperty("key", "value");
    tx.success();
} finally {
    tx.finish();
}
