Java client can't find master node: MasterNotDiscoveredException waited for [1m] - java

I'm using Vagrant and I installed ES on it using the Debian package:
elasticsearch-1.1.1.deb
In my web app, I am using the jar:
org.elasticsearch elasticsearch 1.1.1
I am creating my client like:
val node = nodeBuilder.client(true).node
val client: Client = node.client
When I try to index, using:
val response = client.prepareIndex("articles", "article", article.id.toString).setSource(json).execute.actionGet
The error I get is:
[MasterNotDiscoveredException: waited for [1m]]
I can see my ES instance is working fine by going to:
http://localhost:9200
I ran some test queries from the README file earlier and they worked fine, but now for some reason even those aren't working:
http://localhost:9200/twitter/user/kimchy?pretty=true
I get the error:
{
"error" : "ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];]",
"status" : 503
}
My Vagrantfile has 2 ports open for Elasticsearch:
config.vm.network "forwarded_port", guest: 9200, host: 9200 # ES
config.vm.network "forwarded_port", guest: 9300, host: 9300 # ES
What seems to be the problem?
Note: my web application isn't using an elasticsearch.yml file because, from what I understand, it just connects to the default localhost:9200.

Normally you connect to ES from outside over HTTP (other protocols are also available) and talk REST/JSON. So your webapp should use a Scala/Java ES client (see http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/clients.html) and connect via HTTP to the host running ES on port 9200. Port 9300 is only for inter-node communication (ES is a distributed, clustered system). But there is another way to talk to ES remotely: power up a node that joins the cluster and then query it through the internal client. But:
In your question above you connect to ES through the internal Java client (internal transport), which starts a node and then tries to join the cluster. That fails because the master node could not be found, possibly due to networking issues. Try including an elasticsearch.yml in the classpath, or use REST as described above. There is also a third option: the TransportClient - see http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html#transport-client
See also http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-transport.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-http.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-memcached.html
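For that third option, here is a minimal sketch against the ES 1.1.x Java API (assuming the default cluster name "elasticsearch" and the forwarded port 9300 on localhost):
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Connects over the transport protocol without starting a local node,
// so no master discovery happens inside your webapp.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "elasticsearch") // assumption: default cluster name
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
// use client.prepareIndex(...) as before, then close it on shutdown:
client.close();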

Since you are creating your client node with .client(true), that disables both data storage and master eligibility on your node, if I understand the docs correctly (the source is not very helpful either).
Note that any ES cluster needs at least 1 master node.
First, to clarify the config situation, your main elasticsearch.yml (see reference config) configuration is under /etc/elasticsearch/. You can also configure a second elasticsearch.yml in your src/main/resources folder, which will apply to the nodes you create in your app. I'd recommend doing this as it's way clearer compared to using the mysterious nodeBuilder methods.
Can you show the response you get when you query http://localhost:9200/_nodes right after starting ES up?
Specifically, check whether you have
"attributes": {
"master": "true"
},
set on one of the nodes. If you do, then it looks like a networking problem: your client node is unable to contact the master node. I actually had a similar issue when setting up, and the solution was to set network.host: 127.0.0.1 in the app's elasticsearch.yml (I wish I knew why).
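For reference, a minimal sketch of such a client-side src/main/resources/elasticsearch.yml (the cluster name and loopback address are assumptions; adjust them to your setup):
cluster.name: elasticsearch   # assumption: the default cluster name
network.host: 127.0.0.1       # the setting that fixed the discovery issue in my case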

Uncomment discovery.zen.ping.multicast.enabled: false in /etc/elasticsearch/elasticsearch.yml.
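With multicast disabled, you usually also point discovery at the master explicitly. A hedged sketch of the relevant elasticsearch.yml lines (the localhost address is an assumption for this single-node setup):
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300"]   # assumption: master reachable on localhost:9300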

Related

kafka + zookeeper remote = error

I am trying to install a Kafka & ZooKeeper instance on a remote server. I actually only need 1 node of each, because I only want to provide a remote Kafka for test purposes.
Kafka and ZooKeeper are running from the Apache Kafka tarball (v0.0.9), inside a Docker image.
I am trying to consume/produce using the provided scripts, and trying to produce using my own Java application. Everything works fine if Kafka & ZK are installed on the local server.
Here is the error I get while trying to produce:
BrokerPartitionInfo:83 - Error while fetching metadata [{TopicMetadata for topic RSS ->
No partition metadata for topic RSS due to kafka.common.LeaderNotAvailableException}] for topic [RSS]: class kafka.common.LeaderNotAvailableException
Kafka properties tested
First:
broker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=localhost:<PORT>
Second:
broker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=<external-ip>:<PORT>
Third:
broker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=<external-ip>:<PORT>
advertised.host.name=<external-ip>
advertised.host.port=<external-ip>
Last:
broker.id=0
port=9092
host.name=</etc/host name>
zookeeper.connect=<external-ip>:<PORT>
advertised.host.name=<external-ip>
advertised.host.port=<external-ip>
Here is my "/etc/hosts"
127.0.0.1 kafka kafka
127.0.0.1 localhost
I followed the Getting Started guide, which, if I understood correctly, is a localhost / single-server configuration. I cannot understand what I have to do to get this working with remote calls...
Thanks for your help!
EDIT 1
host.name=localhost
advertised.host.name=politik.cm-cloud.fr
This seems to allow a local consumer and producer (on the server). But if we try the same from a remote server we get:
[2015-12-09 12:44:10,826] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.NoRouteToHostException: No route to host
The error does not look like a connectivity problem with ZooKeeper / Kafka.
Just follow the instructions in the "quickstart" from http://kafka.apache.org/
BrokerPartitionInfo:83 - Error while fetching metadata [{TopicMetadata for topic RSS ->
Additionally, the error indicates there is no partition info, i.e. the topic has not been created yet. Try creating the topic first and then produce/consume. When producing to a non-existent topic, Kafka will create it depending on auto.create.topics.enable in server.properties, but for a remote setup it is better to create topics explicitly rather than relying on auto-create.
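For example, a topic can be created up front with the script shipped in the Kafka tarball (the ZooKeeper address is an assumption; use your own host and port):
bin/kafka-topics.sh --create --zookeeper <external-ip>:2181 --replication-factor 1 --partitions 1 --topic RSS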

Apache Tez configuration with hadoop

Here is what I have done in a nutshell:
STEP1: I have successfully configured hadoop 2.6 on my laptop (single node) and ran a sample mapreduce job.
STEP2: I cloned the Tez repository, successfully built the 0.8.0 version, copied the jar files into HDFS, and exported the required variables. I also changed the value of mapreduce.framework.name to yarn-tez in mapred-site.xml.
But when I try to run a Tez orderedwordcount job, I get this error:
15/07/04 18:45:03 INFO ipc.Client: Retrying connect to server: hostname/hostIP:57339.
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/04 18:45:12 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
I have checked resource manager and it is listening on port 8030.
But it seems the client tries to connect to a random port. Is that correct?
What can I do to get it working correctly?
It seems it was a problem with this version (0.8.0) connecting to the resource manager. I compiled and integrated the previous stable release (0.7.0) and everything is good to go now. I hope they will figure the problem out.
From your logs it seems to be a firewall issue rather than an issue with the Tez version. It is independent of Tez; you can face this even when running Hadoop alone.
Hadoop uses multiple ports for communication with clients and between service components. To enable Hadoop communication, open the specific ports that Hadoop uses.
To open specific ports, you can set the access rules in Windows. For example, the following command will open up port 80 in the active Windows Firewall:
netsh advfirewall firewall add rule name=AllowRPCCommunication dir=in action=allow protocol=TCP localport=80
For more see here http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0-Win/bk_HDP_Install_Win/content/ref-79239257-778e-42a9-9059-d982d0c08885.1.html
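By analogy, the ResourceManager scheduler port mentioned in the question could be opened the same way (port 8030 is taken from the question; the rule name is arbitrary):
netsh advfirewall firewall add rule name=AllowYarnScheduler dir=in action=allow protocol=TCP localport=8030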

Setting the network.publish_host to a client node using the Elasticsearch JAVA API

I am running an Elasticsearch node on my VM. I wrote a simulator on the host that tries to connect to my VM ES node.
The client code connects as follows:
Node node = nodeBuilder().clusterName("AnalyticsCluster")
.client(true).node();
mClient = node.client();
I made sure I configured the right cluster name on the VM node. I do not want to use the other method with a TransportClient to connect to the ES node, because according to the ES documentation this causes two hops on each search.
It fails as follows:
org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [1m]
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.onTimeout(TransportMasterNodeOperationAction.java:180)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:491)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I suppose I understand the root cause (not 100% sure though): the client and the node are publishing on different networks for multicast discovery. I am saying that based on the following:
ES Node Console
[2014-02-26 18:19:13,725][INFO ][transport ] [Baron Samedi] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.79.128:9300]}
Client Node Console
INFO: [Lacuna] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.105:9200]}
In other words, the node publishes on the 192.168.79.* network, which is the VM network, and the client publishes on the 192.168.1.* network, which is my Wi-Fi network.
It seems I could solve this by setting network.publish_host on the client side. The thing is, on the client I don't have an elasticsearch.yml. I also didn't find a way to set it programmatically.
I have 2 questions in order of priority:
Can network.publish_host be set programmatically, and if so, how?
How can I provide an elasticsearch.yml on my client side that the API would use for its settings?
Thx in advance
P.S: the firewall on the VM is stopped.
I solved the problem by doing 2 things.
A) I added an src/main/resources/elasticsearch.yml on the client side that looks as follows:
network.host: 10.231.150.165
That didn't solve the problem completely. The client was now correctly sending the multicast to the server side sitting on the VM, but the VM was not able to connect back to the client.
B) I configured the network between the host and the VM to be bridged instead of NAT (the VMware default).
That completely solved the problem since now my host and my VM are on the same LAN.
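To answer the first question (setting it programmatically): with the ES 1.x API the node settings can be passed through the NodeBuilder, roughly as sketched below. The cluster name and address are taken from the question above; treat the exact keys as assumptions to verify against your version.
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

// A client-only node with an explicit publish host.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "AnalyticsCluster")
        .put("network.publish_host", "192.168.1.105") // assumption: the client machine's LAN IP
        .build();
Node node = nodeBuilder().settings(settings).client(true).node();
Client mClient = node.client();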

HOWTO: Resolve "redistributing to another node" warning messages when using the Spymemcached client library with a memcached server

I am using spymemcached client library v2.8.0 provided by couchbase folks. The memcached server installed is version 1.4.13.
The configuration for memcached is pretty basic: -m 64 -p 11211 -u memcache -l 127.0.0.1.
I am able to make proper get, set, and delete requests using the client library. But going through my logs I notice warning messages from the spymemcached library like so:
WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for ...
I am not sure why it is trying to redistribute to another node in the cluster when one does not exist.
I am connecting to the cache client using the below code -
String address = "127.0.0.1:11211";
new MemcachedClient(new ConnectionFactoryBuilder().setDaemon(true).build(), AddrUtil.getAddresses(address));
Any help appreciated.
By default Spymemcached will redistribute an operation to a different node if the primary node is not available. If you had another node this would happen, but since there is only one, redistributing is the same as retrying the operation on the primary node. In your case this message is a little bit confusing. If you never want to redistribute an operation, you can do the following.
String address = "127.0.0.1:11211";
new MemcachedClient(new ConnectionFactoryBuilder().setDaemon(true).setFailureMode(FailureMode.RETRY).build(), AddrUtil.getAddresses(address));
It sounds like your client might have lost its connection to the server and reconnected at some point.
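For completeness, a self-contained sketch of the single-node setup with FailureMode.RETRY (the address, key, and value are placeholders):
import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

public class SingleNodeCacheExample {
    public static void main(String[] args) throws IOException {
        // RETRY keeps failed operations on the primary node instead of
        // attempting redistribution, which is what logs the warning above.
        MemcachedClient client = new MemcachedClient(
                new ConnectionFactoryBuilder()
                        .setDaemon(true)
                        .setFailureMode(FailureMode.RETRY)
                        .build(),
                AddrUtil.getAddresses("127.0.0.1:11211"));
        client.set("greeting", 3600, "hello"); // expiry in seconds
        System.out.println(client.get("greeting"));
        client.shutdown();
    }
}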

Hbase client can't connect to remote Hbase server

I have written the following HBase client code for a remote server:
System.out.println("Hbase Demo Application ");
// CONFIGURATION
// ENSURE RUNNING
try {
HBaseConfiguration config = new HBaseConfiguration();
config.clear();
config.set("hbase.zookeeper.quorum", "192.168.15.20");
config.set("hbase.zookeeper.property.clientPort","2181");
config.set("hbase.master", "192.168.15.20:60000");
//HBaseConfiguration config = HBaseConfiguration.create();
//config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
HBaseAdmin.checkHBaseAvailable(config);
System.out.println("HBase is running!");
// createTable(config);
//creating a new table
HTable table = new HTable(config, "mytable");
System.out.println("Table mytable obtained ");
addData(table);
} catch (MasterNotRunningException e) {
System.out.println("HBase is not running!");
System.exit(1);
}catch (Exception ce){ ce.printStackTrace();
It is throwing an exception:
Oct 17, 2011 1:43:54 PM org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation getMaster
INFO: getMaster attempt 0 of 1 failed; no more retrying.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:359)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:89)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1215)
at com.ifkaar.hbase.HBaseDemo.main(HBaseDemo.java:31)
HBase is not running!
Can you tell me why it is throwing an exception, what is wrong with the code, and how to solve it?
This problem occurs due to your HBase server's hosts file.
You just need to edit your HBase server's /etc/hosts file.
Remove the localhost entry from that file and put localhost on the line with the HBase server's IP.
For example, your HBase server's /etc/hosts files seems like this:
127.0.0.1 localhost
192.166.66.66 xyz.hbase.com hbase
You have to change it like this by removing localhost:
# 127.0.0.1 localhost # line commented out
192.166.66.66 xyz.hbase.com hbase localhost # note: localhost added here
This is because when the remote machine asks the HBase server machine where HMaster is running, it replies that it is running on localhost.
So if the entry is 127.0.0.1, the HBase server returns this address and the remote machine starts looking for HMaster on its own machine (locally).
When we change that to the HBase server's IP, everything works fine :)
I agree. HBase is very sensitive to /etc/hosts configuration. I had to set the ZooKeeper binding properties in hbase-site.xml correctly in order for the above-mentioned Java code to work. For example, I had to set them as follows:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>www.remoterg12.net</value> <!-- this is the externally accessible domain -->
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value> <!-- everything needs to be externally accessible -->
</property>
<property>
  <name>hbase.master.info.port</name> <!-- http://www.remoterg12.net:60010/ -->
  <value>60010</value>
</property>
<property>
  <name>hbase.master.info.bindAddress</name>
  <value>www.remoterg12.net</value> <!-- Use this to access the GUI console -->
</property>
The remote GUI will give you a clear picture of the binding domains. For example, the [HBase Master] property in the GUI web console should be something like www.remoterg12.net:60010 (it should NOT be localhost:60010). And yes, I did have to get /etc/hosts just right, as I didn't want to mess up the existing Apache configs :-)
The same problem can be solved by editing the conf/regionservers file in the HBase directory to add the (remote) HBase server to it. Then there is no need to change the /etc/hosts file.
After editing, conf/regionservers will look like:
localhost
IP address of the remote HBase server
e.g.:
localhost
10.132.258.366
Exact same problem here with HBase 1.1.3.
Two virtual machines (Ubuntu) on the same network. The logs show that the client can reach ZooKeeper but not the HBase server.
TL;DR: remove the following line from /etc/hosts on the server (server_hostname):
127.0.1.1 server_hostname server_hostname
And add this one, where 192.x.y.z is the IP of your server on the (local) network:
192.x.y.z server_hostname
I tried a lot of combinations on the client and server sides. In standalone mode I don't think there is a better approach.
Not really proud of that. It is a shame to have to mess with the network configuration, and that there isn't even an HBase shell client able to connect remotely to a server (welcome to the Java world of illusions...).
On the server side, leave the file conf/hbase-site.xml empty. You don't need to put a ZooKeeper configuration in there; the defaults are fine.
Same for conf/regionservers: leave it with the default entry (localhost), because I don't think standalone mode really cares (I tried to put server_hostname in it and of course that does not work).
On the client side, it must know the server by hostname if you want it to resolve, so again add an entry for the server in the client's /etc/hosts file.
As a bonus I give you my sbt configuration and some complete working code for the client, since the HBase team seems to have spent the documentation budget at Vegas for the last 4 years (again, welcome to the «Business ready» world of Java/Scala).
build.sbt:
libraryDependencies ++= Seq(
  ...
  "org.apache.hadoop" % "hadoop-core" % "1.2.1",
  "org.apache.hbase" % "hbase" % "1.1.2",
  "org.apache.hbase" % "hbase-client" % "1.1.2"
)
some_client_code.scala:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put, HBaseAdmin}
import org.apache.hadoop.hbase.util.Bytes

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "server_hostname")
HBaseAdmin.checkHBaseAvailable(hbaseConf)

val table = new HTable(hbaseConf, "my_hbase_table")
val put = new Put(Bytes.toBytes("row_key"))
put.add(Bytes.toBytes("cf"), Bytes.toBytes("colId1"), Bytes.toBytes("foo"))
table.put(put) // write the row (without this the Put is never sent)
table.close()
I know it is too late to answer this question but I want to share my way of resolving a similar issue.
I had the same issue, and I tried to set the ZooKeeper quorum from the Java program and also via the CLI, but neither worked.
I am using CDH 5.7.7 with HBase version 1.1.0
Finally I had to export a few configs to the Hadoop classpath to fix the issue. Here is the config that I exported:
export HADOOP_CLASSPATH=/etc/hadoop/conf:/usr/share/cmf/lib/cdh5/hbase-protocol-0.98.1-cdh5.5.0.jar:/etc/hbase/conf:/driven/conf
Hope this helps.
