I am fetching Neo4j data into a Spark DataFrame using the neo4j-spark connector. The fetch works: I can show() the DataFrame. I then register the DataFrame with the createOrReplaceTempView() method, but when I run Spark SQL against it I get an exception saying
org.apache.spark.sql.AnalysisException: Table or view not found: neo4jtable;
This is what my whole code looks like:
import java.text.ParseException;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.neo4j.spark.Neo4JavaSparkContext;
import org.neo4j.spark.Neo4j;
import scala.collection.immutable.HashMap;
public class Neo4jDF {
private static Neo4JavaSparkContext neo4jjsc;
private static SparkConf sConf;
private static JavaSparkContext jsc;
private static SparkContext sc;
private static SparkSession ss;
private static Dataset<Row> neo4jdf;
static String neo4jip = "ll.mm.nn.oo";
public static void main(String[] args) throws AnalysisException, ParseException
{
setSparkConf();
setJavaSparkContext();
setNeo4jJavaSparkContext();
setSparkContext();
setSparkSession();
neo4jdf = loadNeo4jDataframe();
neo4jdf.createOrReplaceTempView("neo4jtable");
neo4jdf.show(false); //this prints correctly
Dataset<Row> neo4jdfsqled = ss.sql("SELECT * from neo4jtable");
neo4jdfsqled.show(false); //this throws exception
}
private static Dataset<Row> loadNeo4jDataframe()
{
Neo4j neo4j = new Neo4j(jsc.sc());
HashMap<String, Object> a = new HashMap<String, Object>();
Dataset<Row> rdd = neo4j.cypher("cypher query deleted for irrelevance", a).loadDataFrame();
return rdd;
}
private static void setSparkConf()
{
sConf = new SparkConf().setAppName("GetNeo4jToRddDemo");
sConf.set("spark.neo4j.bolt.url", "bolt://" + neo4jip + ":7687");
sConf.set("spark.neo4j.bolt.user", "neo4j");
sConf.set("spark.neo4j.bolt.password", "admin");
sConf.setMaster("local");
sConf.set("spark.testing.memory", "471859200");
sConf.set("spark.sql.warehouse.dir", "file:///D:/Mahesh/workspaces/spark-warehouse");
}
private static void setJavaSparkContext()
{
jsc = new JavaSparkContext(sConf);
}
private static void setSparkContext()
{
sc = JavaSparkContext.toSparkContext(jsc);
}
private static void setSparkSession()
{
ss = new SparkSession(sc);
}
private static void setNeo4jJavaSparkContext()
{
neo4jjsc = Neo4JavaSparkContext.neo4jContext(jsc);
}
}
I suspect the issue might be with how all the Spark environment objects are created.
I first created SparkConf sConf.
From sConf, I created JavaSparkContext jsc.
From jsc, I obtained SparkContext sc.
From sc, I created SparkSession ss.
From jsc, I created Neo4JavaSparkContext neo4jjsc.
So visually:
sConf -> jsc -> sc -> ss
-> neo4jjsc
Also note that:
Inside loadNeo4jDataframe(), I use the SparkContext (via jsc.sc()) to instantiate the Neo4j instance neo4j, which is then used for fetching the Neo4j data.
The data is fetched using that Neo4j instance.
neo4jjsc is never used, but I kept it as a possible hint for the issue.
Given all these points and observations, can you tell me why I get the table not found exception? I must be missing something stupid. :\
Update
Tried setting ss (after the data is fetched using the SparkContext held by neo4j) as follows:
private static void setSparkSession(SparkContext sc)
{
ss = new SparkSession(sc);
}
private static Dataset<Row> loadNeo4jDataframe()
{
Neo4j neo4j = new Neo4j(sc);
HashMap<String, Object> a = new HashMap<String, Object>();
Dataset<Row> rdd = neo4j.cypher("deleted cypher for irrelevance", a).loadDataFrame();
//initializing ss after data is fetched using the SparkContext of neo4j
setSparkSession(neo4j.sc());
return rdd;
}
Update 2
From the comments, I just realised that neo4j creates its own SparkSession using the SparkContext instance provided to it. I don't have access to that SparkSession. So how am I supposed to add / register an arbitrary DataFrame (here, neo4jdf) created in some other SparkSession (here, the SparkSession created by neo4j.cypher) with my SparkSession ss?
Based on the symptoms we can infer that both pieces of code use different SparkSession / SQLContext instances. Assuming there is nothing unusual going on in the Neo4j connector, you should be able to fix this by changing:
private static void setSparkSession()
{
ss = SparkSession.builder().getOrCreate();
}
or by initializing the SparkSession before calling setNeo4jJavaSparkContext.
If neither of these works, you can switch to using createGlobalTempView.
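For example, a minimal sketch of the createGlobalTempView route, assuming the same neo4jdf and ss from the question (global temporary views live in the reserved global_temp database and must be qualified when queried):
// Register the DataFrame as a global temporary view instead of a session-scoped one.
neo4jdf.createGlobalTempView("neo4jtable");
// Global temp views are shared across SparkSessions, but always sit in global_temp.
Dataset<Row> neo4jdfsqled = ss.sql("SELECT * FROM global_temp.neo4jtable");
neo4jdfsqled.show(false);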
Important:
In general I would recommend initializing a single SparkSession using the builder pattern, and deriving the other contexts (SparkContexts) from it when necessary.
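A rough sketch of that recommendation, reusing the sConf from the question (how you wire up the rest is an assumption on my side, not something the connector mandates): build the session first with getOrCreate(), so any session requested later, including internally by libraries, resolves to the same one and therefore the same catalog of temp views.
SparkSession ss = SparkSession.builder()
        .config(sConf)        // the SparkConf built in setSparkConf()
        .getOrCreate();       // reused by any later builder().getOrCreate() call
// Derive the JavaSparkContext from the session instead of creating it first.
JavaSparkContext jsc = new JavaSparkContext(ss.sparkContext());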
Related
I am trying to create a simple application where the app consumes a Kafka message, does some CQL (Siddhi query) transformation, and publishes the result back to Kafka. Below is the code:
JAVA: 1.8
Flink: 1.13
Scala: 2.11
flink-siddhi: 2.11-0.2.2-SNAPSHOT
I am using library: https://github.com/haoch/flink-siddhi
Input JSON to Kafka:
{
"awsS3":{
"ResourceType":"aws.S3",
"Details":{
"Name":"crossplane-test",
"CreationDate":"2020-08-17T11:28:05+00:00"
},
"AccessBlock":{
"PublicAccessBlockConfiguration":{
"BlockPublicAcls":true,
"IgnorePublicAcls":true,
"BlockPublicPolicy":true,
"RestrictPublicBuckets":true
}
},
"Location":{
"LocationConstraint":"us-west-2"
}
}
}
main class:
public class S3SidhiApp {
public static void main(String[] args) {
internalStreamSiddhiApp.start();
//kafkaStreamApp.start();
}
}
App class:
package flinksidhi.app;
import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;
import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;
public class internalStreamSiddhiApp {
private static final String inputTopic = "EVENT_STREAM_INPUT";
private static final String outputTopic = "EVENT_STREAM_OUTPUT";
private static final String consumerGroup = "EVENT_STREAM1";
private static final String kafkaAddress = "localhost:9092";
private static final String zkAddress = "localhost:2181";
private static final String S3_CQL1 = "from inputStream select * insert into temp";
private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
"from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
"json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
"from temp2 select affected_resource_name,affected_resource_type, " +
"ifThenElse(encryption == ' ','Fail','Pass') as state," +
"ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";
public static void start(){
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//DataStream<String> inputS = env.addSource(new S3EventSource());
//Flink kafka stream consumer
FlinkKafkaConsumer<String> flinkKafkaConsumer =
createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);
//Add Data stream source -- flink consumer
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);
cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
cep.registerStream("inputStream", inputS, "awsS3");
inputS.print();
System.out.println(cep.getDataStreamSchemas());
//json functions need the extension jars to be present at runtime.
DataStream<Map<String,Object>> output = cep
.from("inputStream")
.cql(S3_CQL1)
.returnAsMap("temp");
//Flink kafka stream Producer
FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
createMapProducer(env,outputTopic, kafkaAddress);
//Add Data stream sink -- flink producer
output.addSink(flinkKafkaProducer);
output.print();
try {
env.execute();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Consumer class:
package flinksidhi.app.connector;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;
import java.util.Properties;
public class Consumers {
public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", kafkaAddress);
properties.setProperty("zookeeper.connect", zookeeprAddr);
properties.setProperty("group.id",kafkaGroup);
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
topic,new SimpleStringSchema(),properties);
return consumer;
}
}
Producer class:
package flinksidhi.app.connector;
import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;
import java.util.Map;
public class Producer {
public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
}
public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
@Override
public void open(InitializationContext context) throws Exception {
}
@Override
public byte[] serialize(Map<String, Object> stringObjectMap) {
String json = ConvertJavaMapToJson.convert(stringObjectMap);
return json.getBytes();
}
});
}
}
I have tried many things, but the code where the CQL is invoked is never called, and it doesn't even give any error; I am not sure where it is going wrong.
The same thing works if I create an internal stream source and use the same input JSON, returning it as a string.
Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:
(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
If this doesn't help, I would suggest decomposing/simplifying the job to a bare minimum, for example just a source operator and some naive sink printing/logging elements, as sketched below. And if that works, start adding back the operators one by one. You could also start by simplifying your CEP pattern as much as possible.
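As an illustration of that stripping-down step, a minimal sketch assuming the flinkKafkaConsumer from the question, keeping only the source and a print() sink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// If records show up here, the Kafka source side is fine and the problem
// lies in one of the operators you add back afterwards.
env.addSource(flinkKafkaConsumer)
   .print();
env.execute("bare-minimum-pipeline");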
First of all, thanks a lot @Piotr Nowojski. No matter how many times I had pondered over event time, it did not come to my mind until your small pointer. So yes, while debugging the two cases:
With the internal data source, where processing succeeded, I saw while debugging the flow that a watermark was processed after the data was processed, but it did not strike me that the event time of the data was somehow being managed implicitly.
With Kafka as the data source, while debugging I could see very clearly that no watermark was being processed in the flow, but it did not occur to me that this was happening because event time and watermarks were not handled properly.
The fix was adding a single line of code to the application, which I understood from the Flink Javadoc snippet below:
@deprecated In Flink 1.12 the default stream time characteristic has been changed to {@link
* TimeCharacteristic#EventTime}, thus you don't need to call this method for enabling
* event-time support anymore. Explicitly using processing-time windows and timers works in
* event-time mode. If you need to disable watermarks, please use {@link
* ExecutionConfig#setAutoWatermarkInterval(long)}. If you are using {@link
* TimeCharacteristic#IngestionTime}, please manually set an appropriate {@link
* WatermarkStrategy}. If you are using generic "time window" operations (for example {@link
* org.apache.flink.streaming.api.datastream.KeyedStream#timeWindow(org.apache.flink.streaming.api.windowing.time.Time)}
* that change behaviour based on the time characteristic, please use equivalent operations
* that explicitly specify processing time or event time.
*/
I got to know that by default Flink uses event time, and for that, watermarks need to be handled properly, which I hadn't done. So I added the line below to set the time characteristic of the Flink execution environment:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
and kaboom ... it started working. This method is deprecated and the proper replacement needs some other configuration, but thanks a lot, it was a great pointer, it helped me a lot and I solved the issue.
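For reference, a hedged sketch of a non-deprecated alternative on Flink 1.13 (my assumption, not part of the original fix): instead of switching the whole environment to processing time, attach an explicit WatermarkStrategy to the consumed stream so watermarks actually reach the Siddhi operator. Here system time is used as the timestamp, ingestion-time style; a field parsed from the JSON would be the real event-time choice.
// requires java.time.Duration and org.apache.flink.api.common.eventtime.WatermarkStrategy
DataStream<String> inputS = env
        .addSource(flinkKafkaConsumer)
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((record, previousTs) -> System.currentTimeMillis()));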
Thanks again @Piotr Nowojski
I'm trying to filter DataFrame content, using Spark 1.5's dropDuplicates() method.
Using it on fully filled tables (I mean no empty cells) gives the correct result, but when my CSV source contains empty cells (I'll provide you with the source file), Spark throws an ArrayIndexOutOfBoundsException.
What am I doing wrong? I've read the Spark SQL and DataFrames tutorial for version 1.6.2, but it does not describe DataFrame operations in detail. I am also reading the book "Learning Spark: Lightning-Fast Big Data Analysis", but it's written for Spark 1.5 and the operations I need are not described there. I would be glad to get an explanation or a link to a manual.
Thank you.
package data;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import java.util.Arrays;
public class TestDrop {
public static void main(String[] args) {
DropData dropData = new DropData("src/main/resources/distinct-test.csv");
dropData.execute();
}
}
class DropData{
private String csvPath;
private JavaSparkContext sparkContext;
private SQLContext sqlContext;
DropData(String csvPath) {
this.csvPath = csvPath;
}
void execute(){
initContext();
DataFrame dataFrame = loadDataFrame();
dataFrame.show();
dataFrame.dropDuplicates(new String[]{"surname"}).show();
//this one fails too: dataFrame.drop("surname")
}
private void initContext() {
sparkContext = new JavaSparkContext(new SparkConf().setMaster("local[4]").setAppName("Drop test"));
sqlContext = new SQLContext(sparkContext);
}
private DataFrame loadDataFrame() {
JavaRDD<String> strings = sparkContext.textFile(csvPath);
JavaRDD<Row> rows = strings.map(string -> {
String[] cols = string.split(",");
return RowFactory.create(cols);
});
StructType st = DataTypes.createStructType(Arrays.asList(DataTypes.createStructField("name", DataTypes.StringType, false),
DataTypes.createStructField("surname", DataTypes.StringType, true),
DataTypes.createStructField("age", DataTypes.StringType, true),
DataTypes.createStructField("sex", DataTypes.StringType, true),
DataTypes.createStructField("socialId", DataTypes.StringType, true)));
return sqlContext.createDataFrame(rows, st);
}
}
Sending a List instead of an Object[] results in rows containing a single column with a list inside. That's what I was doing wrong.
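For completeness, a sketch of the row mapping that avoids both pitfalls, assuming a plain comma-separated file with no quoted commas: split with a -1 limit so trailing empty cells are kept (otherwise rows end up with fewer values than the five-field schema), and pass the array itself, spread as varargs, to RowFactory.create:
JavaRDD<Row> rows = strings.map(line -> {
    String[] cols = line.split(",", -1);       // -1 keeps trailing empty strings
    return RowFactory.create((Object[]) cols); // spread as varargs: one Row field per column
});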
I want to save a Twitter stream in an HBase database. What I have now is the Spark application to receive and transform the data. But I don't know how to save my TwitterStream into HBase.
The only thing I found that could be useful is the PairRDD.saveAsNewAPIHadoopDataset(conf) method. But how shall I use it, and which Configuration do I have to build to be able to save the RDD data to my HBase table?
The only thing I have found so far is the HBase client library, which can insert data into a table via Put objects. But that isn't a solution for use inside a Spark program, is it (it would be necessary to iterate over all items inside the RDD!)?
Can someone give an example in Java? My main problem seems to be the set-up of the org.apache.hadoop.conf.Configuration instance I have to pass to saveAsNewAPIHadoopDataset.
Here is a code snippet:
JavaReceiverInputDStream<Status> statusDStream = TwitterUtils.createStream(streamingCtx);
JavaPairDStream<Long, String> statusPairDStream = statusDStream.mapToPair(new PairFunction<Status, Long, String>() {
public Tuple2<Long, String> call(Status status) throws Exception {
return new Tuple2<Long, String> (status.getId(), status.getText());
}
});
statusPairDStream.foreachRDD(new Function<JavaPairRDD<Long,String>, Void>() {
public Void call(JavaPairRDD<Long, String> status) throws Exception {
org.apache.hadoop.conf.Configuration conf = new Configuration();
status.saveAsNewAPIHadoopDataset(conf);
// HBase PUT here can't be correct!?
return null;
}
});
First of all, anonymous function classes are discouraged if you are using Java 8; please use lambdas.
The code snippet below should address all your queries.
Sample snippet:
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
....
public static void processYourMessages(final JavaRDD<YourMessage> rdd, final HiveContext sqlContext, final MyMessageUtil messageutil) throws Exception {
final JavaRDD<Row> yourrdd = rdd.filter(msg -> messageutil.filterType(.....)); // create a Java RDD (transformation details elided)
final JavaPairRDD<ImmutableBytesWritable, Put> yourrddPuts = yourrdd.mapToPair(row -> messageutil.getPuts(row));
yourrddPuts.saveAsNewAPIHadoopDataset(conf);
}
where conf is like below
private Configuration conf = HBaseConfiguration.create();
conf.set(ZOOKEEPER_QUORUM, "comma separated list of zookeeper quorum");
conf.set("hbase.mapred.outputtable", "your table name");
conf.set("mapreduce.outputformat.class", "org.apache.hadoop.hbase.mapreduce.TableOutputFormat");
MyMessageUtil has a getPuts method, which looks like below:
public Tuple2<ImmutableBytesWritable, Put> getPuts(Row row) throws Exception {
Put put = ..// prepare your put with all the columns you have.
return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
}
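Applied to the Twitter pairs from the question (tweet id, tweet text), a hypothetical sketch of what such a getPuts could look like; the row key choice, the column family "cf" and the qualifier "text" are assumptions for illustration only:
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import scala.Tuple2;

public Tuple2<ImmutableBytesWritable, Put> getPuts(Tuple2<Long, String> status) throws Exception {
    Put put = new Put(Bytes.toBytes(status._1()));            // tweet id as the row key (assumption)
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("text"), // family/qualifier are placeholders
            Bytes.toBytes(status._2()));
    return new Tuple2<>(new ImmutableBytesWritable(), put);
}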
Hope this helps!
I am doing a simple word count example in Apache Spark in Java, following a reference from the Internet, and I am getting the error
Caused by: java.net.UnknownHostException: my.txt
You can see my code below for reference:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class MyCount {
public static void main(String[] args) {
// TODO Auto-generated method stub
String file = "hdfs://my.txt";
JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
JavaRDD<String> lines = sc.textFile(file);
long nums = lines.count();
System.out.println(nums);
}
}
Can you try
String file = "hdfs://localhost/my.txt";
PS: make sure you have the file my.txt in HDFS.
In case you don't have that file in HDFS, use the command below to put the file into HDFS from a local directory:
hadoop fs -copyFromLocal /home/training/my.txt hadoop/
Old question, but an answer was never accepted. The mistake, as I read it at the time, is mixing Spark's "local" master concept with "localhost".
Using this constructor: JavaSparkContext(java.lang.String master, java.lang.String appName), you would want to use:
JavaSparkContext sc = new JavaSparkContext("localhost", "Simple App");
but the question was using "local". Further, the HDFS filename didn't specify a hostname: "hdfs://SomeNameNode:9000/foo/bar/" or
"hdfs://host:port/absolute-path"
As of 1.6.2, the Javadoc for JavaSparkContext does not show any constructor that lets you specify the cluster type directly:
http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html
The best constructor for JavaSparkContext wants a SparkConf object. To do something more readable, build a SparkConf object and then pass it to JavaSparkContext; here's an example that sets the app name, specifies the Kryo serializer, and sets the master:
SparkConf sparkConf = new SparkConf().setAppName("Threshold")
//.setMaster("local[4]");
.setMaster(getMasterString(masterName))
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.registerKryoClasses(kryoClassArray);
// create the JavaSparkContext now:
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
NOTE: the alternate .setMaster("local[4]"); would use local mode, which the OP may have been trying.
I have a more extended answer here that addresses using hostnames vs. IP addresses and a lot more for setting up your SparkConf
You can try this simple word count program:
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
public class First {
public static void main(String[] args) {
SparkConf sf = new SparkConf().setMaster("local[3]").setAppName("parth");
JavaSparkContext sc = new JavaSparkContext(sf);
JavaRDD<String> textFile = sc.textFile("input file path");
JavaRDD<String> words = textFile.flatMap((new FlatMapFunction<String, String>() {
public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }}));
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
public Integer call(Integer a, Integer b) { return a + b; }
});
counts.saveAsTextFile("outputfile-path");
}
}
I was trying to use ZooKeeper in our project. I could run the server and even test it using zkCli.sh. All good.
But I couldn't find a good tutorial for connecting to this server using Java! All I need in the Java API is a method
public String getServiceURL ( String serviceName )
I tried https://cwiki.apache.org/confluence/display/ZOOKEEPER/Index -- not good for me.
http://zookeeper.apache.org/doc/trunk/javaExample.html: sort of OK, but I couldn't understand the concepts clearly! I feel it is not explained well.
Finally, this is the simplest and most basic program I came up with, which will help you with ZooKeeper "Getting Started":
package core.framework.zookeeper;
import java.util.Date;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
public class ZkConnect {
private ZooKeeper zk;
private CountDownLatch connSignal = new CountDownLatch(1); // must be 1 so await() blocks until the connected event arrives
//host should be 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002
public ZooKeeper connect(String host) throws Exception {
zk = new ZooKeeper(host, 3000, new Watcher() {
public void process(WatchedEvent event) {
if (event.getState() == KeeperState.SyncConnected) {
connSignal.countDown();
}
}
});
connSignal.await();
return zk;
}
public void close() throws InterruptedException {
zk.close();
}
public void createNode(String path, byte[] data) throws Exception
{
zk.create(path, data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
public void updateNode(String path, byte[] data) throws Exception
{
zk.setData(path, data, zk.exists(path, true).getVersion());
}
public void deleteNode(String path) throws Exception
{
zk.delete(path, zk.exists(path, true).getVersion());
}
public static void main (String args[]) throws Exception
{
ZkConnect connector = new ZkConnect();
ZooKeeper zk = connector.connect("54.169.132.0,52.74.51.0");
String newNode = "/deepakDate"+new Date();
connector.createNode(newNode, new Date().toString().getBytes());
List<String> zNodes = zk.getChildren("/", true);
for (String zNode: zNodes)
{
System.out.println("ChildrenNode " + zNode);
}
byte[] data = zk.getData(newNode, true, zk.exists(newNode, true));
System.out.println("GetData before setting");
for ( byte dataPoint : data)
{
System.out.print ((char)dataPoint);
}
System.out.println("GetData after setting");
connector.updateNode(newNode, "Modified data".getBytes());
data = zk.getData(newNode, true, zk.exists(newNode, true));
for ( byte dataPoint : data)
{
System.out.print ((char)dataPoint);
}
connector.deleteNode(newNode);
}
}
This post covers almost all the operations required to interact with ZooKeeper:
https://www.tutorialspoint.com/zookeeper/zookeeper_api.htm
Create a ZNode with data
Delete a ZNode
Get the list of ZNodes (children)
Check whether a ZNode exists or not
Edit the content of a ZNode... (a sketch building on these calls follows below)
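Tying that back to the getServiceURL(String serviceName) method asked for in the question, a hypothetical sketch built on plain getData; the /services/<serviceName> znode layout and the UTF-8 payload are assumptions for illustration, not something ZooKeeper prescribes:
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

public class ServiceRegistry {
    private final ZooKeeper zk;

    public ServiceRegistry(ZooKeeper zk) {
        this.zk = zk;
    }

    // Assumes each service stores its URL as the data of /services/<serviceName>.
    public String getServiceURL(String serviceName) throws Exception {
        byte[] data = zk.getData("/services/" + serviceName, false, null);
        return new String(data, StandardCharsets.UTF_8);
    }
}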
This blog post, Zookeeper Java API examples, includes some good examples if you are looking for Java examples to start with. ZooKeeper also provides a client API library (C and Java) that is very easy to use.
ZooKeeper is one of the best open-source servers and services for reliably coordinating distributed processes. ZooKeeper is a CP system (refer to the CAP theorem) that provides Consistency and Partition tolerance. Replication of ZooKeeper state across all the nodes makes it an eventually consistent distributed service.
This is about as simple as you can get. I am building a tool which will use ZK to lock files that are being processed (hence the class name):
package mypackage;
import java.io.IOException;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.Watcher;
public class ZooKeeperFileLock {
public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
String zkConnString = "<zknode1>:2181,<zknode2>:2181,<zknode3>:2181";
ZooKeeperWatcher zkWatcher = new ZooKeeperWatcher();
ZooKeeper client = new ZooKeeper(zkConnString, 10000, zkWatcher);
List<String> zkNodes = client.getChildren("/", true);
for(String node : zkNodes) {
System.out.println(node);
}
}
public static class ZooKeeperWatcher implements Watcher {
@Override
public void process(WatchedEvent event) {
}
}
If you are on AWS: you can now create an internal ELB which supports redirection based on URI, which can really solve this problem with high availability already baked in.