Using ZooKeeper znodes to dynamically change Storm bolt processes - java

I am creating a Storm based project where messages will be filtered by Storm. My aim is to allow a user to adapt the filtering performed at runtime by sending configuration information to a ZooKeeper znode.
I believe this is possible by setting up a ZooKeeper watcher within Storm, but I am struggling to achieve this. I would be grateful for some guidance or a simple example of how to perform this.
I have looked at the Java docs and I'm afraid the way to perform this does not seem obvious.
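A minimal sketch of the watcher approach described above, assuming Storm 1.x and the plain ZooKeeper client; the znode path, connect string, and filter logic are all placeholders, not a definitive implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConfigurableFilterBolt extends BaseRichBolt {
    // Hypothetical znode holding the user's filter configuration.
    private static final String CONFIG_PATH = "/filter-config";

    private OutputCollector collector;
    private transient ZooKeeper zk;
    private transient AtomicReference<String> filterConfig;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.filterConfig = new AtomicReference<>("");
        try {
            // Connect string is a placeholder; in practice read it from conf.
            zk = new ZooKeeper("localhost:2181", 15000, event -> { });
            readConfigAndResetWatch();
        } catch (Exception e) {
            throw new RuntimeException("Could not read initial filter config", e);
        }
    }

    // ZooKeeper watches are one-shot, so the watch has to be re-registered
    // after every notification by reading the znode again.
    private void readConfigAndResetWatch() throws Exception {
        byte[] data = zk.getData(CONFIG_PATH, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    readConfigAndResetWatch();
                } catch (Exception e) {
                    // log and keep the last known config
                }
            }
        }, null);
        filterConfig.set(new String(data, StandardCharsets.UTF_8));
    }

    @Override
    public void execute(Tuple input) {
        String filter = filterConfig.get(); // always the latest znode content
        // ... apply the filtering rule described by 'filter' to the tuple ...
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}
```

The AtomicReference decouples the watcher callback thread from the executor thread, so execute() always sees the most recently pushed configuration without locking.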

Related

create a continuous query using kafka and flink

I'm new to the streaming community.
I'm trying to create a continuous query using Kafka topics and Flink, but I haven't found any examples to give me an idea of how to get started.
Can you help me with some examples?
Thank you.
For your use case, I'm guessing you want to use Kafka as the source of continuous data. In this case you can use the Kafka source connector (linked below), and if you want to slice the stream by time you can use Flink's window processing functions. These will group the Kafka messages streamed within a particular timeframe, like a list/map.
Flink Kafka source connector
Flink Window Processing Function
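Putting the two together, a minimal sketch assuming a recent Flink 1.x with the flink-connector-kafka dependency; the topic name "events" and the broker address are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ContinuousQueryJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "continuous-query");

        env
            // Continuously consume the (hypothetical) "events" topic.
            .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
            // Slice the unbounded stream into 10-second processing-time windows.
            .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
            // Collapse each window's messages into one result; any window
            // function (e.g. a ProcessAllWindowFunction) could go here instead.
            .reduce((a, b) -> a + "\n" + b)
            .print();

        env.execute("continuous-query");
    }
}
```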

Dynamic Streams Topology in Kafka

While creating Kafka Streams applications using the Kafka Streams DSL
https://kafka.apache.org/0110/documentation/streams/developer-guide
we have encountered a scenario where we need to update the Kafka Streams application with a new topology definition.
For example:
When we started, we had a topology defined to read from one topic (source) and write to a destination topic (sink).
However, on a configuration change we now need to read from 2 different topics (2 sources, if you will) and write to a single destination topic.
From what we have built right now, the topology definition is hard coded, something like the processor topology defined in the documentation.
Questions:
Is it possible to define topology in a declarative way (say in a Json or something else), which doesn't require a codification of the topology?
Is it possible to reload an existing Kafka Stream to use a new definition of the Kafka Streams Topology?
For #2 mentioned above, does Kafka Streams DSL provide a way to "reload" new topology definitions by way of an external trigger or system call?
We are using JDK 1.8 and Kafka DSL 2.2.0
Thanks,
Ayusman
Is it possible to define topology in a declarative way (say in a Json or something else), which doesn't require a codification of the topology?
The KStreams DSL is declarative, but I assume you mean something other than the DSL?
If so, the answer is No. You may want to look at KSQL, however.
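For reference, the two-sources/one-sink shape from the question is what the DSL's declarative style looks like in code; a minimal sketch assuming Kafka Streams 2.2, with placeholder topic names:

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TwoSourceTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "two-source-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from both source topics and write everything to one sink topic.
        KStream<String, String> merged = builder.stream(Arrays.asList("source-a", "source-b"));
        merged.to("destination");

        // The topology is fixed once start() is called; changing it means
        // building a new Topology and deploying a new application version.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```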
Is it possible to reload an existing Kafka Stream to use a new definition of the Kafka Streams Topology?
You mean if an existing Kafka Streams application can reload a new definition of a processing topology? If so, the answer is No. In such cases, you'd deploy a new version of your application.
Depending on how the old/new topologies are defined, a simple rolling upgrade of your application may suffice (roughly: if the topology change was minimal), but probably you will need to deploy the new application separately and then, once the new one is vetted, decommission your old application.
Note: KStreams is a Java library and, by design, does not include functionality to operate/manage the Java applications that use the KStreams library.
For #2 mentioned above, does Kafka Streams DSL provide a way to "reload" new topology definitions by way of an external trigger or system call?
No.

Delete Messages from a Topic in Apache Kafka

So I am new to working with Apache Kafka and I am trying to create a simple app so I can try to understand the API better. I know this question has been asked a lot here, but how can I clear out the messages/records that are stored on a topic?
Most of the answers I have seen say to change the message retention time or to delete & recreate the topic. Neither of these is an option for me, as I do not have access to the server.properties file. I am not running Kafka locally; it is hosted on a server. Is there a way to do it in Java code, maybe, or something similar?
If you are searching for a way to delete messages selectively, the new AdminClient API (usable from Java code) provides a deleteRecords method:
https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/admin/AdminClient.html
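A minimal sketch of that method, assuming Kafka 1.1+ brokers; the topic name, partition, and offset are placeholders. Note that deleteRecords truncates a partition up to the given offset rather than removing individual messages:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class PurgeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Delete all records in partition 0 of "my-topic" below offset 42.
            Map<TopicPartition, RecordsToDelete> request = Collections.singletonMap(
                new TopicPartition("my-topic", 0),
                RecordsToDelete.beforeOffset(42L));

            DeleteRecordsResult result = admin.deleteRecords(request);
            result.all().get(); // block until the broker confirms the deletion
        }
    }
}
```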

Best way to start zookeeper server from java program

I have two questions for which I couldn't find any popular/widely accepted solutions:
What is the easiest way to start a ZooKeeper server using a Java program?
And is it possible to add servers to a ZooKeeper cluster without having to manually go to each machine and update its config file with the new node's id and ip:port entry?
Can someone please help? Thanks!
If you want to start a new ZooKeeper server process from your Java code, you would do it the same way you would start any other external process from Java, e.g. using a ProcessBuilder. There is nothing special here in the case of ZooKeeper. You can check the official docs on what the actual command should look like. It gets complicated if you want to supervise the process for production use, so in that case it would be better to use something provided by your OS (e.g. upstart, runit, etc.), or take a look at Exhibitor for code examples: https://github.com/Netflix/exhibitor.
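For that first case, a minimal sketch using ProcessBuilder; the installation path /opt/zookeeper is a placeholder for your environment, and the config is assumed to live in the standard conf/zoo.cfg location:

```java
import java.io.File;
import java.io.IOException;

public class ZkLauncher {
    public static Process startZookeeper() throws IOException {
        // start-foreground keeps the server attached to the child process,
        // so destroying the Process actually stops ZooKeeper.
        ProcessBuilder pb = new ProcessBuilder(
            "/opt/zookeeper/bin/zkServer.sh", "start-foreground");
        pb.directory(new File("/opt/zookeeper"));
        pb.inheritIO(); // forward the server's stdout/stderr to this JVM
        return pb.start();
    }
}
```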
If you are asking about starting a ZooKeeper cluster from your Java program, then you complicate things further, since you would basically need to supervise multiple ZooKeeper JVM processes on different hosts. Also take a look at Exhibitor.
If your question is about starting a ZooKeeper server instance inside the same JVM process as your Java code (embedded), then it is also possible. There are a few important details to keep in mind, take a look at this answer:
Is it possible to start a zookeeper server instance in process, say for unit tests?
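For the embedded case, a minimal sketch using the server classes shipped in the zookeeper jar; the data directory and port are placeholders, and this skips the caveats discussed in the linked answer:

```java
import java.io.File;
import java.net.InetSocketAddress;

import org.apache.zookeeper.server.ServerCnxnFactory;
import org.apache.zookeeper.server.ZooKeeperServer;

public class EmbeddedZk {
    public static ServerCnxnFactory start(File dataDir, int port) throws Exception {
        int tickTime = 2000;
        // Use the same directory for snapshots and transaction logs here;
        // for real workloads these should be separate disks.
        ZooKeeperServer server = new ZooKeeperServer(dataDir, dataDir, tickTime);
        ServerCnxnFactory factory = ServerCnxnFactory.createFactory(
            new InetSocketAddress(port), 60 /* max client connections */);
        factory.startup(server); // binds the port and starts serving
        return factory;          // call factory.shutdown() to stop it
    }
}
```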
Regarding your second question, real support for dynamic cluster reconfiguration was added just recently, in 3.5.0: http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html.
Prior to this, you can still "add servers to a zookeeper cluster without having to manually go to each machine and update their config", but you have to use a configuration management tool like Chef, Puppet or similar, and in this case you would also need to restart the cluster to pick up the new config.

Is there a java monitoring/alerts framework for a cluster?

I have a cluster of servers. Common tasks I manually code are:
collect various stats (failures, successes, times) with a metrics library.
aggregate those and combine them across the cluster.
depending on conditions, check the aggregated stats across the cluster and send alerts based on them (instead of having each server send an alert, increase global metrics which are then polled into Graphite).
if a specific node sends an alert, it is first accumulated, and based on alerts from other nodes (again a cross-cluster scenario) I would then decide which alert to send (so if I have 100 servers, not each of them sends a separate alert but a single one is sent).
I looked into a few frameworks (Metrics, JavaMelody, Netflix Servo, Netflix Zuul), but none of them, as far as I can see, achieves this.
None of them supports, for example, my cross-cluster scenario where I want to aggregate stats over time and send an alert only if certain conditions apply (as a method of avoiding duplicate alerts across servers). Do I need to build my own framework for that, or does something already exist?
(And in case my use case sounds so specific that I should just code it: I have many more similar use cases, which makes me wonder why there isn't such a framework. Before I start coding something, I don't want to find out that I have just duplicated some other framework.)
Have you looked at using a combination of either Graphite or OpenTSDB with Riemann? You can aggregate your information in Graphite (with or without statsd) or dump everything into OpenTSDB, and use Riemann for event processing. Riemann's config is in Clojure, but I believe you can use client libraries in multiple languages (unless you want to do the event processing yourself using Esper/Siddhi). Another option could be to look at Rocksteady (which uses Graphite/Esper). Graphite is a Python/Django application (there are multiple forks of statsd, not just the one in NodeJS; besides, you can simply use Metrics in place of that). OpenTSDB is Java on HBase (if you're looking to store time series information). For event processing, you could also choose to look into Storm (and use Esper/Siddhi as a bolt in Storm).
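As a starting point for the per-node reporting half of that setup, a minimal sketch assuming the Dropwizard Metrics library with its metrics-graphite module; the Graphite host, node prefix, and metric names are placeholders. Each node only pushes its own counters, and the cross-cluster aggregation and alert conditions are then evaluated centrally (e.g. in Graphite or Riemann) rather than on every server:

```java
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class NodeMetrics {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Per-node counters; the aggregation layer sums them across the
        // cluster, so only the combined value needs to trigger an alert.
        registry.counter("requests.failures").inc();
        registry.counter("requests.success").inc();

        Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
        GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
            .prefixedWith("cluster.node1") // placeholder node identifier
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.MILLISECONDS)
            .build(graphite);
        reporter.start(10, TimeUnit.SECONDS); // push metrics every 10 seconds
    }
}
```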
