I am new to Kafka and working with new KafkaProducer and KafkaConsumer, version : 0.9.0.1
Is there any way in java to alter/update the number of partitions for a specific topic after it has been created.
I am not using zookeeper to create topic.
My KafkaProducer is automatically creating topics when publish request arrives.
I can also provide more details if these are not enough
Yes, it's possible. You have to access the AdminUtils scala class in kafka_2.11-0.9.0.1.jar to add partitions.
AdminUtils supports the number of partitions in the topic can only be increased. You may need kafka_2.11-0.9.0.1.jar, zk-client-0.8.jar, scala-library-2.11.8.jar and scala-parser-combinators_2.11-1.0.4.jar jars in your classpath.
Portions of the below code is borrowed / inspired from kafka-cloudera examples.
package org.apache.kafka.examples;
import java.io.Closeable;
import org.I0Itec.zkclient.ZkClient;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import kafka.admin.AdminOperationException;
import kafka.admin.AdminUtils;
import kafka.admin.RackAwareMode.Enforced$;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;
public class Test {
static final Logger logger = LogManager.getLogger();
public Test() {
// TODO Auto-generated constructor stub
}
public static void addPartitions(String zkServers, String topic, int partitions) {
try (AutoZkClient zkClient = new AutoZkClient(zkServers)) {
ZkUtils zkUtils = ZkUtils.apply(zkClient, false);
if (AdminUtils.topicExists(zkUtils, topic)) {
logger.info("Altering topic {}", topic);
try {
AdminUtils.addPartitions(zkUtils, topic, partitions, "", true, Enforced$.MODULE$);
logger.info("Topic {} altered with partitions : {}", topic, partitions);
} catch (AdminOperationException aoe) {
logger.info("Error while altering partitions for topic : {}", topic, aoe);
}
} else {
logger.info("Topic {} doesn't exists", topic);
}
}
}
// Just exists for Closeable convenience
private static final class AutoZkClient extends ZkClient implements Closeable {
static int sessionTimeout = 30_000;
static int connectionTimeout = 6_000;
AutoZkClient(String zkServers) {
super(zkServers, sessionTimeout, connectionTimeout, ZKStringSerializer$.MODULE$);
}
}
public static void main(String[] args) {
addPartitions("localhost:2181", "hello", 20);
}
}
Related
I'm new on apache storm and kafka and try to learn these notions via courses provided by OpenClassroom.The principle is simple, messages are sent via a python program to a kafka server, and are retrieved via a kafka spout defined in the main class of a Storm topology. The problem is that I don't understand how the bolt retrieves the messages. From what I understand this is done in the ParsingBolt class with the following line of code: JSONObject obj = (JSONObject)jsonParser.parse(input.getStringByField("value"));. The only thing is that I don't understand how we know that the messages are contained in the value field. Below you can find the main class and the parsing bolt class. (The whole project is available here)
The main class:
package velos;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
public class App {
public static void main(String[] args)
throws AlreadyAliveException, InvalidTopologyException, AuthorizationException {
TopologyBuilder builder = new TopologyBuilder();
KafkaSpoutConfig.Builder<String, String> spoutConfigBuilder = KafkaSpoutConfig.builder("localhost:9092",
"velib-stations");
spoutConfigBuilder.setProp(ConsumerConfig.GROUP_ID_CONFIG, "city-stats");
KafkaSpoutConfig<String, String> spoutConfig = spoutConfigBuilder.build();
builder.setSpout("stations", new KafkaSpout<String, String>(spoutConfig));
builder.setBolt("station-parsing", new StationParsingBolt()).shuffleGrouping("stations");
builder.setBolt("city-stats",
new CityStatsBolt().withTumblingWindow(BaseWindowedBolt.Duration.of(1000 * 60 * 5)))
.fieldsGrouping("station-parsing", new Fields("city"));
builder.setBolt("save-results", new SaveResultsBolt()).fieldsGrouping("city-stats", new Fields("city"));
StormTopology topology = builder.createTopology();
Config config = new Config();
config.setMessageTimeoutSecs(60 * 30);
String topologyName = "Velos";
if (args.length > 0 && args[0].equals("remote")) {
StormSubmitter.submitTopology(topologyName, config, topology);
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(topologyName, config, topology);
}
}
}
The ParsingBolt:
package velos;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.shade.org.json.simple.JSONObject;
import org.apache.storm.shade.org.json.simple.parser.JSONParser;
import org.apache.storm.shade.org.json.simple.parser.ParseException;
public class StationParsingBolt extends BaseRichBolt {
private OutputCollector outputCollector;
#Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
outputCollector = collector;
}
#Override
public void execute(Tuple input) {
try {
process(input);
} catch (ParseException e) {
e.printStackTrace();
outputCollector.fail(input);
}
}
public void process(Tuple input) throws ParseException {
JSONParser jsonParser = new JSONParser();
JSONObject obj = (JSONObject)jsonParser.parse(input.getStringByField("value"));
String contract = (String)obj.get("contract_name");
Long availableStands = (Long)obj.get("available_bike_stands");
Long stationNumber = (Long)obj.get("number");
outputCollector.emit(new Values(contract, stationNumber, availableStands));
outputCollector.ack(input);
}
#Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("city", "station_id", "available_stands"));
}
}
By default the "topic", "partition", "offset", "key", and "value" will be emitted to the "default" stream.
https://storm.apache.org/releases/2.4.0/storm-kafka-client.html
Use a RecordTranslator to change this.
I am trying to create a simple application where the app will consume Kafka message do some cql transform and publish to Kafka and below is the code:
JAVA: 1.8
Flink: 1.13
Scala: 2.11
flink-siddhi: 2.11-0.2.2-SNAPSHOT
I am using library: https://github.com/haoch/flink-siddhi
input json to Kafka:
{
"awsS3":{
"ResourceType":"aws.S3",
"Details":{
"Name":"crossplane-test",
"CreationDate":"2020-08-17T11:28:05+00:00"
},
"AccessBlock":{
"PublicAccessBlockConfiguration":{
"BlockPublicAcls":true,
"IgnorePublicAcls":true,
"BlockPublicPolicy":true,
"RestrictPublicBuckets":true
}
},
"Location":{
"LocationConstraint":"us-west-2"
}
}
}
main class:
public class S3SidhiApp {
public static void main(String[] args) {
internalStreamSiddhiApp.start();
//kafkaStreamApp.start();
}
}
App class:
package flinksidhi.app;
import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;
import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;
public class internalStreamSiddhiApp {
private static final String inputTopic = "EVENT_STREAM_INPUT";
private static final String outputTopic = "EVENT_STREAM_OUTPUT";
private static final String consumerGroup = "EVENT_STREAM1";
private static final String kafkaAddress = "localhost:9092";
private static final String zkAddress = "localhost:2181";
private static final String S3_CQL1 = "from inputStream select * insert into temp";
private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
"from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
"json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
"from temp2 select affected_resource_name,affected_resource_type, " +
"ifThenElse(encryption == ' ','Fail','Pass') as state," +
"ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";
public static void start(){
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//DataStream<String> inputS = env.addSource(new S3EventSource());
//Flink kafka stream consumer
FlinkKafkaConsumer<String> flinkKafkaConsumer =
createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);
//Add Data stream source -- flink consumer
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);
cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
cep.registerStream("inputStream", inputS, "awsS3");
inputS.print();
System.out.println(cep.getDataStreamSchemas());
//json needs extension jars to present during runtime.
DataStream<Map<String,Object>> output = cep
.from("inputStream")
.cql(S3_CQL1)
.returnAsMap("temp");
//Flink kafka stream Producer
FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
createMapProducer(env,outputTopic, kafkaAddress);
//Add Data stream sink -- flink producer
output.addSink(flinkKafkaProducer);
output.print();
try {
env.execute();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Consumer class:
package flinksidhi.app.connector;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;
import java.util.Properties;
public class Consumers {
public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", kafkaAddress);
properties.setProperty("zookeeper.connect", zookeeprAddr);
properties.setProperty("group.id",kafkaGroup);
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
topic,new SimpleStringSchema(),properties);
return consumer;
}
}
Producer class:
package flinksidhi.app.connector;
import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;
import java.util.Map;
public class Producer {
public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
}
public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
#Override
public void open(InitializationContext context) throws Exception {
}
#Override
public byte[] serialize(Map<String, Object> stringObjectMap) {
String json = ConvertJavaMapToJson.convert(stringObjectMap);
return json.getBytes();
}
});
}
}
I have tried many things but the code where the CQL is invoked is never called and doesn't even give any error not sure where is it going wrong.
The same thing if I do creating an internal stream source and use the same input json to return as string it works.
Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:
(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
If this doesn't help, I would suggest to decompose/simplify the job to a bare minimum, for example just a source operator and some naive sink printing/logging elements. And if that works, start adding back operators one by one. You could also start by simplifying your CEP pattern as much as possible.
First of all thanks a lot #Piotr Nowojski , just because of your small pointer which no matter how many times I pondered over about event time , it did not came in my mind. So yes while debugging the two cases:
With internal datasource , where it was processing successfully, while debugging the flow , I identified that it was processing a watermark after it was processing the data, but it did not catch me that it was somehow managing the event time of the data implicitly.
With kafka as a datasource , while I was debugging I could very clearly see that it was not processing any watermark in the flow, but it did not occur to me that , it is happening because of the event time and watermark not handled properly.
Just adding a single line of code in the application code which I understood from below Flink code snippet:
#deprecated In Flink 1.12 the default stream time characteristic has been changed to {#link
* TimeCharacteristic#EventTime}, thus you don't need to call this method for enabling
* event-time support anymore. Explicitly using processing-time windows and timers works in
* event-time mode. If you need to disable watermarks, please use {#link
* ExecutionConfig#setAutoWatermarkInterval(long)}. If you are using {#link
* TimeCharacteristic#IngestionTime}, please manually set an appropriate {#link
* WatermarkStrategy}. If you are using generic "time window" operations (for example {#link
* org.apache.flink.streaming.api.datastream.KeyedStream#timeWindow(org.apache.flink.streaming.api.windowing.time.Time)}
* that change behaviour based on the time characteristic, please use equivalent operations
* that explicitly specify processing time or event time.
*/
I got to know that by default flink considers event time and for that watermark needs to be handled properly which I didn't so I added below link for setting the time characteristics of the flink execution environment:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
and kaboom ... it started working , while this is deprecated and needs some other configuration, but thanks a lot , it was a great pointer and helped me a lot and I solved the issue..
Thanks again #Piotr Nowojski
I have some trouble using flink's SerializationSchema.
Here is my main code :
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DeserializationSchema<Row> sourceDeserializer = new JsonRowDeserializationSchema.Builder( /*Extract TypeInformation<Row> from an avsc schema file*/ ).build();
DataStream<Row> myDataStream = env.addSource( new MyCustomSource(sourceDeserializer) ) ;
final SinkFunction<Row> sink = new MyCustomSink(new JsonRowSerializationSchema.Builder(myDataStream.getType()).build());
myDataStream.addSink(sink).name("MyCustomSink");
env.execute("MyJob");
Here is my custom Sink Function :
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.types.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
#SuppressWarnings("serial")
public class MyCustomSink implements SinkFunction<Row> {
private static final Logger LOGGER = LoggerFactory.getLogger(MyCustomSink.class);
private final boolean print;
private final SerializationSchema<Row> serializationSchema;
public MyCustomSink(final SerializationSchema<Row> serializationSchema) {
this.serializationSchema = serializationSchema;
}
#Override
public void invoke(final Row value, final Context context) throws Exception {
try {
LOGGER.info("MyCustomSink- invoke : [{}]", new String(serializationSchema.serialize(value)));
}catch (Exception e){
LOGGER.error("MyCustomSink- Error while sending data : " + e);
}
}
}
And here is my custom Source Function (not sure it is useful for the problem I have) :
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.shaded.guava18.com.google.common.io.ByteStreams;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class MyCustomSource<T> extends RichSourceFunction<T> implements ResultTypeQueryable<T> {
/** logger */
private static final Logger LOGGER = LoggerFactory.getLogger(MyCustomSource.class);
/** the JSON deserializer */
private final DeserializationSchema<T> deserializationSchema;
public MyCustomSource(final DeserializationSchema<T> deserializer) {
this.deserializationSchema = deserializer;
}
#Override
public void open(final Configuration parameters) {
...
}
#Override
public void run(final SourceContext<T> ctx) throws Exception {
LOGGER.info("run");
InputStream data = ...; // Retrieve the input json data
final T row = deserializationSchema
.deserialize(ByteStreams.toByteArray(data));
ctx.collect(row);
}
#Override
public void cancel() {
...
}
#Override
public TypeInformation<T> getProducedType() {
return deserializationSchema.getProducedType();
}
}
Now I run my code and I send some data sequentially to my pipeline :
==>
{
"id": "sensor1",
"data":{
"rotation": 250
}
}
Here, the data is correctly printed by my sink : MyCustomSink- invoke : [{"id":"sensor1","data":{"rotation":250}}]
==>
{
"id": "sensor1"
}
Here, the data is correctly printed by my sink : MyCustomSink- invoke : [{"id":"sensor1","data":null}]
==>
{
"id": "sensor1",
"data":{
"rotation": 250
}
}
Here, there is an error on serialization. The error log printed is :
MyCustomSink- Error while sending data : java.lang.RuntimeException: Could not serialize row 'sensor1,250'. Make sure that the schema matches the input.
I do not understand at all why I have this behavior. Someone have an idea ?
Notes:
Using Flink 1.9.2
-- EDIT --
I added the CustomSource part
-- EDIT 2 --
After more investigations, it looks like this behavior is caused by the private transient ObjectNode node of the JsonRowSerializationSchema. If I understand correctly, this is used for optimization, but seems to be the cause of my problem.
Is it the normal behavior, and if it is, what would be the correct use of this class in my case ? (Else, is there any way to bypass this problem ?)
This is a JsonRowSerializationSchema bug which has been fixed in most recent Flink versions - I believe, this PR addresses the issue above.
I am trying to create a KStream application in Eclipse using Java. right now I am referring to the word count program available on the internet for KStreams and modifying it.
What I want is that the data that I am reading from the input topic should be written to a file instead of being written to another output topic.
But when I am trying to print the KStream/KTable to the local file, I am getting the following entry in the output file:
org.apache.kafka.streams.kstream.internals.KStreamImpl#4c203ea1
How do I implement redirecting the output from the KStream to a file?
Below is the code:
package KStreamDemo.kafkatest;
package org.apache.kafka.streams.examples.wordcount;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueMapper;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class TemperatureDemo {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "34.73.184.104:9092");
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
System.out.println("#1###################################################################################################################################################################################");
// setting offset reset to earliest so that we can re-run the demo code with the same pre-loaded data
// Note: To re-run the demo, you need to use the offset reset tool:
// https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
StreamsBuilder builder = new StreamsBuilder();
System.out.println("#2###################################################################################################################################################################################");
KStream<String, String> source = builder.stream("iot-temperature");
System.out.println("#5###################################################################################################################################################################################");
KTable<String, Long> counts = source
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
#Override
public Iterable<String> apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
}
})
.groupBy(new KeyValueMapper<String, String, String>() {
#Override
public String apply(String key, String value) {
return value;
}
})
.count();
System.out.println("#3###################################################################################################################################################################################");
System.out.println("OUTPUT:"+ counts);
System.out.println("#4###################################################################################################################################################################################");
// need to override value serde to Long type
counts.toStream().to("iot-temperature-max", Produced.with(Serdes.String(), Serdes.Long()));
final KafkaStreams streams = new KafkaStreams(builder.build(), props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
#Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
This is not correct
System.out.println("OUTPUT:"+ counts);
You would need to do counts.foreach, then print the messages out to a file.
Print Kafka Stream Input out to console? (just update to write to file instead)
However, probably better to write out the stream to a topic. And the use Kafka Connect to write out to a file. This is a more industry-standard pattern. Kafka Streams is encouraged to only move data between topics within Kafka, not integrate with external systems (or filesystems)
Edit connect-file-sink.properties with the topic information you want, then
bin/connect-standalone config/connect-file-sink.properties
I was trying to use Zookeeper in our project. Could run the server..Even test it using zkcli.sh .. All good..
But couldn't find a good tutorial for me to connect to this server using Java ! All I need in Java API is a method
public String getServiceURL ( String serviceName )
I tried https://cwiki.apache.org/confluence/display/ZOOKEEPER/Index --> Not good for me.
http://zookeeper.apache.org/doc/trunk/javaExample.html : Sort of ok; but couldnt understand concepts clearly ! I feel it is not explained well..
Finally, this is the simplest and most basic program I came up with which will help you with ZooKeeper "Getting Started":
package core.framework.zookeeper;
import java.util.Date;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
public class ZkConnect {
private ZooKeeper zk;
private CountDownLatch connSignal = new CountDownLatch(0);
//host should be 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002
public ZooKeeper connect(String host) throws Exception {
zk = new ZooKeeper(host, 3000, new Watcher() {
public void process(WatchedEvent event) {
if (event.getState() == KeeperState.SyncConnected) {
connSignal.countDown();
}
}
});
connSignal.await();
return zk;
}
public void close() throws InterruptedException {
zk.close();
}
public void createNode(String path, byte[] data) throws Exception
{
zk.create(path, data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
public void updateNode(String path, byte[] data) throws Exception
{
zk.setData(path, data, zk.exists(path, true).getVersion());
}
public void deleteNode(String path) throws Exception
{
zk.delete(path, zk.exists(path, true).getVersion());
}
public static void main (String args[]) throws Exception
{
ZkConnect connector = new ZkConnect();
ZooKeeper zk = connector.connect("54.169.132.0,52.74.51.0");
String newNode = "/deepakDate"+new Date();
connector.createNode(newNode, new Date().toString().getBytes());
List<String> zNodes = zk.getChildren("/", true);
for (String zNode: zNodes)
{
System.out.println("ChildrenNode " + zNode);
}
byte[] data = zk.getData(newNode, true, zk.exists(newNode, true));
System.out.println("GetData before setting");
for ( byte dataPoint : data)
{
System.out.print ((char)dataPoint);
}
System.out.println("GetData after setting");
connector.updateNode(newNode, "Modified data".getBytes());
data = zk.getData(newNode, true, zk.exists(newNode, true));
for ( byte dataPoint : data)
{
System.out.print ((char)dataPoint);
}
connector.deleteNode(newNode);
}
}
This post has almost all operations required to interact with Zookeeper.
https://www.tutorialspoint.com/zookeeper/zookeeper_api.htm
Create ZNode with data
Delete ZNode
Get list of ZNodes(Children)
Check an ZNode exists or not
Edit the content of a ZNode...
This blog post, Zookeeper Java API examples, includes some good examples if you are looking for Java examples to start with. Zookeeper also provides a client API library( C and Java) that is very easy to use.
Zookeeper is one of the best open source server and service that helps to reliably coordinates distributed processes. Zookeeper is a CP system (Refer CAP Theorem) that provides Consistency and Partition tolerance. Replication of Zookeeper state across all the nods makes it an eventually consistent distributed service.
This is about as simple as you can get. I am building a tool which will use ZK to lock files that are being processed (hence the class name):
package mypackage;
import java.io.IOException;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.Watcher;
public class ZooKeeperFileLock {
public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
String zkConnString = "<zknode1>:2181,<zknode2>:2181,<zknode3>:2181";
ZooKeeperWatcher zkWatcher = new ZooKeeperWatcher();
ZooKeeper client = new ZooKeeper(zkConnString, 10000, zkWatcher);
List<String> zkNodes = client.getChildren("/", true);
for(String node : zkNodes) {
System.out.println(node);
}
}
public static class ZooKeeperWatcher implements Watcher {
#Override
public void process(WatchedEvent event) {
}
}
If you are on AWS; now We can create internal ELB which supports redirection based on URI .. which can really solve this problem with High Availability already baked in.