Inconsistency in HBase table [Region not deployed on any region server] - java

In a small HBase cluster, all the slave nodes were restarted. When I started the HBase services, one of the tables (test) became inconsistent.
Some HBase blocks were missing in HDFS, so HDFS was in safe mode. I ran the dfsadmin -safemode leave command.
Then the HBase table (test) became inconsistent.
I performed the following actions:
I ran "hbase hbck" several times; it reported 2 inconsistencies for the table "test".
ERROR: Region { meta=>test,1m\x00\x03\x1B\x15,1393439284371.4c213a47bba83c47075f21fec7c6d862., hdfs => hdfs://master:9000/hbase/test/4c213a47bba83c47075f21fec7c6d862, deployed => } not deployed on any region server.
hbase hbck -fixMeta -fixAssignments
HBaseFsckRepair: Region still in transition, waiting for it to become assigned:
{NAME => 'test,1m\x00\x03\x1B\x15,1393439284371.4c213a47bba83c47075f21fec7c6d862.', STARTKEY => '1m\x00\x03\x1B\x15', ENDKEY => '', ENCODED => 4c213a47bba83c47075f21fec7c6d862,}
hbase hbck -repair
HBaseFsckRepair: Region still in transition, waiting for it to become assigned:
{NAME => 'test,1m\x00\x03\x1B\x15,1393439284371.4c213a47bba83c47075f21fec7c6d862.', STARTKEY => '1m\x00\x03\x1B\x15', ENDKEY => '', ENCODED => 4c213a47bba83c47075f21fec7c6d862,}
In parallel, I checked the DataNode logs:
org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-1015188871-192.168.1.11-1391187113543:blk_7616957984716737802_27846 received exception java.io.EOFException
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.12, storageID=DS-831971799-192.168.1.12-50010-1391193910800, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-7f99a9de-258c-493c-9db0-46b9e84b4c12;nsid=1286773982;c=0):Got exception while serving BP-1015188871-192.168.1.11-1391187113543:blk_7616957984716737802_27846 to /192.168.1.12:36127
I also checked the NameNode logs:
ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /hbase/test/4c213a47bba83c47075f21fec7c6d862/C
2014-02-28 14:13:15,738 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.242.31:42149: error: java.io.FileNotFoundException: File does not exist: /hbase/test/4c213a47bba83c47075f21fec7c6d862/C
java.io.FileNotFoundException: File does not exist: /hbase/test/4c213a47bba83c47075f21fec7c6d862/C
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1301)
But I am able to browse and download the file from HDFS. How can I recover the data?
How can I make the "test" table consistent?

In HBase 2.0 (and possibly in previous versions), "not deployed on any region server" is typically solved by getting the region assigned.
Authenticate if you're on a secured cluster. You are on a secured cluster, aren't you? ;)
kinit -kt [keytab] [principal]
Run HBase check to see which regions specifically are unassigned
hbase hbck -details
If you see an error like this:
ERROR: Region {
meta => my.tablename,,1500001112222.abcdef123456789abcdef12345678912.,
hdfs => hdfs://cluster/apps/hbase/data/data/default/my.tablename/abcdef123456789abcdef12345678912,
deployed => ,
replicaId => 0
} not deployed on any region server.
(the key being "not deployed on any region server"), then you should assign the region. This, it turns out, is pretty simple. Proceed to step 4.
Open an hbase shell
hbase shell
Assign the region by passing the encoded region name to the assign command. As noted in the help documentation, this should not be done without the due diligence above, because the command does a force reassign. The docs say, and I caution: for experts only.
hbase(main):001:0> assign 'abcdef123456789abcdef12345678912'
Double-check your work by running hbck against the table that had the unassigned regions.
hbase hbck my.tablename
If you did everything correctly and if there's no underlying HDFS issue, you should see this message near the bottom of the hbck output:
0 inconsistencies detected.
Status: OK
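If you prefer to do this programmatically instead of through the shell, the same force assign can be issued with the HBase Java client. The sketch below is only an illustration, assuming the HBase 2.x client API and an hbase-site.xml on the classpath; the encoded region name is the placeholder from the example above.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class AssignOneRegion {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath, just like the hbase shell does.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Equivalent of `assign 'abcdef123456789abcdef12345678912'` in the hbase shell.
            // Like the shell command, this is a force assign, so apply the same caution.
            admin.assign(Bytes.toBytes("abcdef123456789abcdef12345678912"));
        }
    }
}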

In HBase 2.0.2 there is no -repair option in hbck to recover inconsistencies.
Run the hbase hbck command.
If the error messages look like the ones below:
ERROR: Region { meta => EMP_NMAE,\x02\x00\x00\x00\x00,1571419090798.054b393c37a80563ae1aa60f29e3e4df., hdfs => hdfs://node1:8020/apps/hbase/data/data/LEVEL_RESULT/054b393c37a80563ae1aa60f29e3e4df, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => TABLE_3,\x02174\x0011100383\x00496\x001,1571324271429.6959c7157693956825be65676ced605c., hdfs => hdfs://node1:8020/apps/hbase/data/data/TABLE_NAME/6959c7157693956825be65676ced605c, deployed => , replicaId => 0 } not deployed on any region server.
Copy these inconsistency errors to a file and pull out the encoded region names with the command below.
If the inconsistency count is small you can pick the values out manually, but when there are many it is tedious to retrieve them all, so use the command below to narrow the output down to just the encoded names, which can then be pasted into the hbase shell in one go.
cat inconsistant.out|awk -F'.' '{print $2}'
Open the hbase shell and assign these regions manually, like below:
assign '054b393c37a80563ae1aa60f29e3e4df'
assign '6959c7157693956825be65676ced605c'
assign '7058dfe0da0699865a5b63be9d3799ab'
assign 'd25529539bae49eb078c7d0ca6ce84e4'
assign 'e4ad94f58e310a771a0f5a1eade884cc'
Once the assigning is complete, run the hbase hbck command again.
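If there are too many regions to paste by hand, the same loop can be done from a small Java client. This is only a sketch assuming the HBase 2.x client API; regions.txt is a hypothetical file containing one encoded region name per line (for example, the output of the awk command above).

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class AssignRegionsFromFile {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // regions.txt: one encoded region name per line (hypothetical file name)
            for (String encoded : Files.readAllLines(Paths.get("regions.txt"))) {
                if (!encoded.trim().isEmpty()) {
                    admin.assign(Bytes.toBytes(encoded.trim())); // force assign, same as the shell command
                }
            }
        }
    }
}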

I had the same problem. It turned out there were region overlaps. How I fixed it:
Try to assign the undeployed region in the hbase shell: assign 'Abcd...'
Check the HBase Master log for ERROR AssignmentManager [something like: Trying to assign region {ENCODED => Abcd..., NAME => ..., ts=1591351130943, server=server1,6020,1581641930622}]
Turn off region server on server1
Run hbase hbck -repair my_table
Repeat for every undeployed region
Or you can just restart HBase and run 'hbase hbck -repair'.

Related

JanusGraph cannot connect to remote elasticsearch cluster

We previously had JG (v0.5.2) up and running without any issues using HBase as our backend storage. To try to speed up queries we stood up an Elasticsearch cluster; however, now our JG instance is not starting up, saying "could not get type for name org.janusgraph.example.GraphApp".
Our elastic properties are setup like:
index.search.backend="elasticsearch"
index.search.hostname = "myhost.elasticsearch:9200" ## https://myhost.elasticsearch:9200/janusgraph <-- returns index values
index.search.elasticsearch.client-only=true
In order to hit ES using curl, I have to run with the -k flag
curl -k https://myhost.elasticsearch:9200
I assume there has to be an option like that for JG, no?
I've been banging my head on this one for a while. Is there anything that I am missing??

Spring dataflow not responding after deploy

I tried to deploy some applications in Spring Dataflow.
Routinely each deployment takes a few minutes and either succeeds or fails.
But this time the deployment took longer than usual. At one point I pressed "undeploy"
since the system was not responding.
Under Streams, everything flickers in the UNKNOWN state.
It is not possible to redeploy.
When I try to deploy I get the error "Failed to upload the package. Package [test-orders:1.0.0] in Repository [local] already exists." in the UI.
When I request the status of the pods I get 2 pods with CrashLoopBackOff status.
I restarted all the pods: kubectl -n **** rollout restart deploy
I tried to run dataflow:>stream undeploy --name test-orders
I deleted the new Docker image from EKS.
I changed skipper_status from FAILED to DELETED.
The problem still exists.
I'm really at a loss.
OK,
I seem to have solved the problem.
From the CrashLoopBackOff status I realized that the system was unable to pull the image, or that the image was corrupt.
I have overwritten all the images in EKS that are associated with the project.
I changed the problematic skipper_status.status_code to DELETED (update skipper_status set status_code = 'DELETED' where id = ***).
In the skipper_release table I added
backoffLimit: 6
completions: 1
parallelism: 1
so that if the system keeps crashing, the run ends after several attempts.
I did a reset for all the pods.
And then in the UI interface I pressed the undeploy button.
Edit 1
I noticed that there were pods left that did not close.
I closed them like this:
kubectl -n foobar delete deployment foo-bar-v1

HiveServer2 doesn't start: "ascii codec can't encode character"

I made a cluster with a NameNode, a Secondary NameNode, and 3 DataNodes. I installed HDP via Ambari + HUE and now I am configuring XA Secure policies for HDFS, Hive, and HBase. It works fine for every component except Hive. The problem is that when I change hive.security.authorization to true (in Ambari -> Hive configs), HiveServer2 fails to start with this error:
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 115, in action_create
fp.write(content)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 990: ordinal not in range(128)
I tried to edit that Python file, but any change I make only makes things worse. It probably tries to encode a Unicode character with the wrong codec and save it to a file, but I am not much of a programmer and I don't know how to fix it correctly. I also can't figure out which file it is writing, where it is, or what it contains.
When I set security authorization to false, the server starts but crashes in ~3 minutes with an error:
12:02:43,523 ERROR [pool-1-thread-648] JMXPropertyProvider:540 - Caught exception getting JMX metrics : Server returned HTTP response code: 500 for URL: http://localhost.localdomain:8745/api/cluster/summary
12:02:50,604 INFO [qtp677995254-4417] HeartBeatHandler:428 - State of service component HIVE_SERVER of service HIVE of cluster testING has changed from STARTED to INSTALLED at host localhost.localdomain
12:02:53,624 ERROR [pool-1-thread-668] JMXPropertyProvider:540 - Caught exception getting JMX metrics : Read timed out
Any suggestions? Thank you in advance.
#EDIT
Here is the line of Python code that causes the problem:
fp.write(content)
I tried adding .decode("utf-8") at the end, but then I get:
'NoneType' object has no attribute 'decode'
For the first problem, try adding
# -*- coding: UTF-8 -*-
as the first line of the file.

Java client can't find master node: MasterNotDiscoveredException waited for [1m]

I'm using Vagrant and I installed ES on it using the Debian package:
elasticsearch-1.1.1.deb
In my web app, I am using the jar:
org.elasticsearch:elasticsearch:1.1.1
I am creating my client like:
val node = nodeBuilder.client(true).node
val client: Client = node.client
When I try and index I get the error:
val response = client.prepareIndex("articles", "article", article.id.toString).setSource(json).execute.actionGet
The error I get is:
[MasterNotDiscoveredException: waited for [1m]]
I can see my ES instance is working fine by going to:
http://localhost:9200
I ran some test queries from the README file and they worked fine, but now for some reason it isn't working either:
http://localhost:9200/twitter/user/kimchy?pretty=true
I get the error:
{
"error" : "ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];]",
"status" : 503
}
My Vagrantfile has 2 ports open for Elasticsearch:
config.vm.network "forwarded_port", guest: 9200, host: 9200 # ES
config.vm.network "forwarded_port", guest: 9300, host: 9300 # ES
What seems to be the problem?
Note: my web application isn't using an elasticsearch.yml file because it is just connecting to the default localhost:9200, from what I understand.
Normally you connect to ES from outside through HTTP (normally, but other protocols are also available) and then talk REST/JSON. So your webapp should use a Scala/Java ES client (see http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/clients.html) and then connect via HTTP to the host that is running ES on port 9200. Port 9300 is only for inter-node communication (ES is a distributed, clustered system). But there is another way to talk to ES remotely: power up a node which joins the cluster and then query through that node's internal client. But:
In your question above you try to connect to ES through the internal Java client (internal transport), which starts a node and then tries to join the cluster. That fails because the master node could not be found, maybe due to networking issues. Try including elasticsearch.yml in the classpath, or use REST as described above. There is also a third option: TransportClient - see http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html#transport-client
See also http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-transport.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-http.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-memcached.html
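For the third option, a minimal TransportClient setup with the 1.x Java API looks roughly like the sketch below; the cluster name "elasticsearch" and the transport port 9300 are the defaults and are assumptions here, so adjust them to your setup.

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class TransportClientExample {
    public static void main(String[] args) {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "elasticsearch") // must match cluster.name on the server
                .build();
        TransportClient client = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        // use client.prepareIndex(...) as in the question, then close when done
        client.close();
    }
}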
Since you are generating your client node with .client(true), that disables both data-storage and master-eligibility on your node, if I understand the docs correctly. (the source is not very helpful either)
Note that any ES cluster needs at least 1 master node.
First, to clarify the config situation, your main elasticsearch.yml (see reference config) configuration is under /etc/elasticsearch/. You can also configure a second elasticsearch.yml in your src/main/resources folder, which will apply to the nodes you create in your app. I'd recommend doing this as it's way clearer compared to using the mysterious nodeBuilder methods.
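For reference, here is roughly what those nodeBuilder methods amount to in the 1.x Java API (only a sketch; the cluster name is an assumption and must match the server's cluster.name):

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class NodeClientExample {
    public static void main(String[] args) {
        Node node = nodeBuilder()
                .clusterName("elasticsearch") // must match the server's cluster.name
                .client(true)                 // holds no data and is not master-eligible
                .node();                      // starts the node and joins the cluster via discovery
        Client client = node.client();
        // ... index / search with client ...
        node.close();
    }
}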
Can you show the response when you query http://localhost:9200/_nodes right after starting ES up?
Specifically, if you have
"attributes": {
"master": "true"
},
set on one of the nodes. If so, then it looks like a networking problem as your client node is unable to contact the master node. I actually had a similar issue when I was setting up, and the solution was to set network.host: 127.0.0.1 in the app's elasticsearch.yml (wish I knew why)
Uncomment discovery.zen.ping.multicast.enabled: false in /etc/elasticsearch/elasticsearch.yml.

HBase client can't connect to remote HBase server

I have written the following HBase client class for a remote server:
System.out.println("Hbase Demo Application ");
// CONFIGURATION
// ENSURE RUNNING
try {
HBaseConfiguration config = new HBaseConfiguration();
config.clear();
config.set("hbase.zookeeper.quorum", "192.168.15.20");
config.set("hbase.zookeeper.property.clientPort","2181");
config.set("hbase.master", "192.168.15.20:60000");
//HBaseConfiguration config = HBaseConfiguration.create();
//config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
HBaseAdmin.checkHBaseAvailable(config);
System.out.println("HBase is running!");
// createTable(config);
//creating a new table
HTable table = new HTable(config, "mytable");
System.out.println("Table mytable obtained ");
addData(table);
} catch (MasterNotRunningException e) {
System.out.println("HBase is not running!");
System.exit(1);
}catch (Exception ce){ ce.printStackTrace();
It throws the following exception:
Oct 17, 2011 1:43:54 PM org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation getMaster
INFO: getMaster attempt 0 of 1 failed; no more retrying.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:359)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:89)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1215)
at com.ifkaar.hbase.HBaseDemo.main(HBaseDemo.java:31)
HBase is not running!
Can you tell me why it is throwing this exception, what is wrong with the code, and how to solve it?
This problem occurs because of your HBase server's hosts file.
You just need to edit your HBase server's /etc/hosts file:
remove the localhost entry from that file and put localhost on the line with the HBase server's IP.
For example, your HBase server's /etc/hosts file looks like this:
127.0.0.1 localhost
192.166.66.66 xyz.hbase.com hbase
You have to change it like this by removing localhost:
# 127.0.0.1 localhost # line commented out
192.166.66.66 xyz.hbase.com hbase localhost # note: localhost added here
This is because when the remote machine asks the HBase server machine where HMaster is running, it answers that it is running on localhost.
So if the entry is 127.0.0.1, the HBase server returns this address and the remote machine starts looking for HMaster on its own machine (locally).
When we change that to the HBase server's IP, everything works fine :)
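A quick way to see which address the master machine will advertise is to check, on the HBase server itself, what its own hostname resolves to. This is just a small diagnostic sketch; if it prints a loopback address such as 127.0.0.1 or 127.0.1.1, remote clients will be sent there and fail as described above.

import java.net.InetAddress;

public class CheckHostResolution {
    public static void main(String[] args) throws Exception {
        InetAddress self = InetAddress.getLocalHost();
        // If this prints a loopback address, fix /etc/hosts as described above.
        System.out.println(self.getHostName() + " -> " + self.getHostAddress());
    }
}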
I agree. HBase is very sensitive to /etc/hosts configuration. I also had to set the ZooKeeper binding properties in hbase-site.xml correctly in order for the above-mentioned Java code to work. For example, I had to set them as follows:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>www.remoterg12.net</value> <!-- this is the externally accessible domain -->
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value> <!-- everything needs to be externally accessible -->
</property>
<property>
  <name>hbase.master.info.port</name> <!-- http://www.remoterg12.net:60010/ -->
  <value>60010</value>
</property>
<property>
  <name>hbase.master.info.bindAddress</name>
  <value>www.remoterg12.net</value> <!-- Use this to access the GUI console -->
</property>
The remote GUI will give you a clear picture of the binding domains. For example, the [HBase Master] property in the web console should be something like www.remoterg12.net:60010 (it should NOT be localhost:60010)... And yes, I did have to get /etc/hosts just right, as I didn't want to mess up the existing Apache configs :-)
The same problem can be solved by editing the conf/regionservers file in the HBase directory to add the remote HBase server to it. Then there is no need to change the /etc/hosts file.
After editing, conf/regionservers will look like:
localhost
<ip address of the remote hbase server>
e.g.
localhost
10.132.258.366
Exact same problem here with HBase 1.1.3.
2 virtual machines (Ubuntu) on the same network. The logs show that the client can reach ZooKeeper but not the HBase server.
TL;DR: remove the following line from /etc/hosts on the server (server_hostname):
127.0.1.1 server_hostname server_hostname
And add this one, where 192.x.y.z is the IP of your server on the (local) network:
192.x.y.z server_hostname
I tried a lot of combinations on the client and server sides. In standalone mode I don't think there is a better approach.
Not really proud of that. It is a shame to have to mess with the network configuration and to not even provide an HBase shell client able to connect remotely to a server (welcome to the Java world of illusions...).
On the server side, leave the file conf/hbase-site.xml empty. You don't need to put a ZooKeeper configuration in there; the defaults are fine.
Same for conf/regionservers: leave it with the default entry (localhost), because I don't think standalone mode really cares (I tried to put server_hostname in it and of course that does not work).
On the client side, it must be able to resolve the server by hostname, so again add an entry for the server in the client's /etc/hosts file.
As a bonus, here are my sbt configuration and some complete working client code, since the HBase team seems to have spent the documentation budget at Vegas for the last 4 years (again, welcome to the «Business ready» world of Java/Scala).
build.sbt:
libraryDependencies ++= Seq(
...
"org.apache.hadoop" % "hadoop-core" % "1.2.1",
"org.apache.hbase" % "hbase" % "1.1.2",
"org.apache.hbase" % "hbase-client" % "1.1.2",
)
some_client_code.scala:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put, HBaseAdmin}
import org.apache.hadoop.hbase.util.Bytes
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "server_hostname")
HBaseAdmin.checkHBaseAvailable(hbaseConf)
val table = new HTable(hbaseConf, "my_hbase_table")
val put = new Put(Bytes.toBytes("row_key"))
put.add(Bytes.toBytes("cf"), Bytes.toBytes("colId1"), Bytes.toBytes("foo"))
table.put(put)   // actually write the row
table.close()
I know it is too late to answer this question but I want to share my way of resolving a similar issue.
I had the same issue: I tried to set the ZooKeeper quorum from the Java program and also via the CLI, but neither worked.
I am using CDH 5.7.7 with HBase version 1.1.0.
Finally I had to export a few configs to the Hadoop classpath to fix the issue. Here is the config that I exported:
export HADOOP_CLASSPATH=/etc/hadoop/conf:/usr/share/cmf/lib/cdh5/hbase-protocol-0.98.1-cdh5.5.0.jar:/etc/hbase/conf:/driven/conf
Hope this helps.
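If exporting the classpath is awkward, another thing that sometimes helps is loading the client configuration files explicitly in the Java code. This is only a sketch; the paths are assumptions and should point at wherever the client-side configs actually live.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ExplicitConfigExample {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical paths: adjust to the actual client config directories.
        conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
    }
}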
