Recovering from HBase server failure using Async HBase client - java

I'm currently trying to find a way to deal with unexpected HBase failures in my application. More specifically, I'm trying to handle the case where my application is inserting data into HBase and HBase then fails and restarts.
To check how my application reacts to that scenario, I wrote a test application that uses the HBase async client in a loop and saves the results in HBase. When I start the application I can see rows being saved into the table. If I intentionally kill my HBase server during this time and then restart it, the client seems to reconnect, but new insertions are not saved into the table.
The code looks like this:
HConnection connection = HConnectionManager.createConnection();
HBaseClient hbaseClient = new HBaseClient(connection);
IntStream.range(0, 10000)
    .forEach(new IntConsumer() {
        @Override
        public void accept(int value) {
            try {
                System.out.println("in value: " + value);
                Thread.sleep(2000);
                Get get = new Get(Bytes.toBytes("key"));
                hbaseClient.get(TableName.valueOf("testTable"), get, new ResponseHandler<Result>() {
                    @Override
                    public void onSuccess(Result response) {
                        System.out.println("SUCCESS");
                    }
                    @Override
                    public void onFailure(IOException e) {
                        System.out.println("FAILURE");
                    }
                });
                urlsClient.save("valuekey", "w" + value, new FailureHandler<IOException>() {
                    @Override
                    public void onFailure(IOException failure) {
                        System.out.println("FAILURE");
                    }
                });
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    });
This is obviously just a simple test, but what I'm trying to achieve is for the async client to successfully save new rows after I restart my HBase server. If I print the stack trace in the "onFailure" method, the asynchronous HBase client reports:
org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=303, waitTime=60096, rpcTimeout=60000
at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.cleanupCalls(AsyncRpcChannel.java:612)
at org.apache.hadoop.hbase.ipc.AsyncRpcChannel$1.run(AsyncRpcChannel.java:119)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
at java.lang.Thread.run(Thread.java:745)
And so my questions are:
How should one deal with a situation like the one I described using this async client?
If this async client is no longer relevant, could someone suggest a different async client that can perform asynchronous puts? I tried the BufferedMutator, but it does not seem to actually flush anything; it just fails with the following error: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator (but this gets a little off topic, so I won't expand on it).
Thanks

It's been quite a long time since I asked this question, but I ended up relying on HBase high availability instead of finding a way to solve it in code.
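The answer above sidesteps the problem with HBase high availability. For readers who still want an application-level safety net, here is a minimal sketch, assuming the write can be exposed as a `CompletableFuture` (for example, one completed from the client's success and failure callbacks), of retrying it with capped exponential backoff. `RetryingWriter` and its parameters are hypothetical and use only the JDK, not any HBase API; it does not repair a client that never reconnects, it only re-submits the write a bounded number of times.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical helper: retries an asynchronous operation with capped exponential backoff.
 * The Supplier is called again for every attempt, so each retry issues a fresh request.
 */
final class RetryingWriter {
    private final ScheduledExecutorService scheduler; // runs the delayed retries
    private final int maxAttempts;
    private final long baseDelayMs;
    private final long maxDelayMs;

    RetryingWriter(ScheduledExecutorService scheduler, int maxAttempts,
                   long baseDelayMs, long maxDelayMs) {
        this.scheduler = scheduler;
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Completes with the first successful result, or with the last error after maxAttempts. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, 1, result);
        return result;
    }

    private <T> void attempt(Supplier<CompletableFuture<T>> operation, int attemptNo,
                             CompletableFuture<T> result) {
        CompletableFuture<T> call;
        try {
            call = operation.get();
        } catch (RuntimeException e) {
            // Treat a synchronous throw like an asynchronous failure.
            call = new CompletableFuture<>();
            call.completeExceptionally(e);
        }
        call.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptNo >= maxAttempts) {
                result.completeExceptionally(error); // give up and surface the last error
            } else {
                // 1x, 2x, 4x ... the base delay, never more than maxDelayMs
                long delay = Math.min(maxDelayMs, baseDelayMs << Math.min(attemptNo - 1, 20));
                scheduler.schedule(() -> attempt(operation, attemptNo + 1, result),
                        delay, TimeUnit.MILLISECONDS);
            }
        });
    }
}
```

In the question's loop, each `save` would be wrapped in the `Supplier`; whether that client offers a success callback for `save` (needed to complete the future normally) is not shown in the question.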

Related

Vertx http server Thread has been blocked for xxxx ms, time limit is 2000

I have written a large-scale HTTP server using Vert.x, but I'm getting this error when the number of concurrent requests increases:
WARNING: Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 8458 ms, time limit is 1000
io.vertx.core.VertxException: Thread blocked
Here is my full code:
public class MyVertxServer {
    public Vertx vertx = Vertx.vertx(new VertxOptions().setWorkerPoolSize(100));
    private HttpServer server = vertx.createHttpServer();
    private Router router = Router.router(vertx);
    public void bind(int port){
        server.requestHandler(router::accept).listen(port);
    }
    public void createContext(String path,MyHttpHandler handler){
        if(!path.endsWith("/")){
            path += "/";
        }
        path+="*";
        router.route(path).handler(new Handler<RoutingContext>() {
            @Override
            public void handle(RoutingContext ctx) {
                String[] handlerID = ctx.request().uri().split(ctx.currentRoute().getPath());
                String suffix = handlerID.length > 1 ? handlerID[1] : null;
                handler.Handle(ctx, new VertxUtils(), suffix);
            }
        });
    }
}
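In `createContext`, the suffix is computed with `String.split(ctx.currentRoute().getPath())`. `split` treats its argument as a regular expression, so characters such as `.` or `+` in a route path are interpreted as pattern syntax, and any query string stays attached to the suffix. A minimal JDK-only sketch of a literal prefix comparison instead; `RouteSuffix.suffixAfter` is a hypothetical helper, and it assumes `prefix` is the literal mount path (for example `/getImagesManager/`). Unlike the original, it drops the query string, which is a deliberate behavior change.

```java
/** Hypothetical helper: the part of a request URI after a literal route prefix. */
final class RouteSuffix {
    private RouteSuffix() {}

    /** Returns null when there is no suffix, matching the original handler's contract. */
    static String suffixAfter(String uri, String prefix) {
        // Strip the query string so it does not leak into the suffix.
        int q = uri.indexOf('?');
        String path = q >= 0 ? uri.substring(0, q) : uri;
        // Plain string comparison: no regex interpretation of '.', '+', '*', etc.
        if (!path.startsWith(prefix) || path.length() == prefix.length()) {
            return null;
        }
        return path.substring(prefix.length());
    }
}
```

For example, `"/v1x0/a".split("/v1.0/")` still yields a second element because `.` matches any character, whereas `suffixAfter("/v1x0/a", "/v1.0/")` returns `null`.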
And this is how I call it:
ver.createContext("/getRegisterManager",new ProfilesManager.RegisterHandler());
ver.createContext("/getLoginManager", new ProfilesManager.LoginHandler());
ver.createContext("/getMapcomCreator",new ItemsManager.MapcomCreator());
ver.createContext("/getImagesManager", new ItemsManager.ImagesHandler());
ver.bind(PORT);
However, I don't find eventbus() useful for HTTP servers that send and receive files, because you would need to send the RoutingContext in the message, which is not possible.
Could you please point me in the right direction? Thanks.
Here is a little bit of the handler code:
class ProfileGetter implements MyHttpHandler{
    @Override
    public void Handle(RoutingContext ctx, VertxUtils utils, String suffix) {
        String username = utils.Decode(ctx.request().headers().get("username"));
        String lang = utils.Decode(ctx.request().headers().get("lang"));
        display("profile requested : "+username);
        Profile profile = ProfileManager.FindProfile(username,lang);
        if(profile == null){
            ctx.request().response().putHeader("available","false");
            utils.sendResponseAndEnd(ctx.response(),400);
            return;
        }else{
            ctx.request().response().putHeader("available","true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
}
Here, ProfileManager.FindProfile(username,lang) runs a long database job on the same thread.
...
Basically, all of my processing happens on the main thread, because if I use an executor I get strange exceptions and NullPointerExceptions in Vert.x, which makes me feel that the request processors in Vert.x run in parallel.
Given the small amount of code in the question, let's agree that the problem is on this line:
Profile profile = ProfileManager.FindProfile(username,lang);
Assuming that this internally makes a blocking JDBC call, which is an anti-pattern in Vert.x, you can solve this in several ways.
Say you can completely refactor the ProfileManager class, which IMO is the best option; then you can update it to be reactive, so your code would look like this:
ProfileManager.FindProfile(username,lang, res -> {
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if(profile == null){
            ctx.request().response().putHeader("available","false");
            utils.sendResponseAndEnd(ctx.response(),400);
            return;
        }else{
            ctx.request().response().putHeader("available","true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});
What happens behind the scenes now is that your JDBC call no longer blocks the event loop (which is tricky, because JDBC is blocking by nature). To achieve this, if you're lucky enough to use MySQL or Postgres, you can code your data access against the async-client; if you're stuck with other RDBMS servers, you need to use the jdbc-client, which in turn uses a thread pool to offload the work from the event loop thread.
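The callback-style `FindProfile` above is left to the reader. A minimal JDK-only sketch of one possible shape, under the assumption that the existing blocking query can be passed in as a function: the blocking lookup runs on a dedicated worker pool and the result (or error) goes to a callback, which is the same idea the jdbc-client applies with its own pool. `Profile`, `ProfileLookup`, and the `BiFunction` standing in for the JDBC query are hypothetical. Inside Vert.x you would use its `Handler<AsyncResult<T>>` and have the result delivered back on the calling context, whereas this sketch invokes the callback on the worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

/** Hypothetical stand-in for the question's Profile type. */
final class Profile {
    final String username;
    Profile(String username) { this.username = username; }
}

/** Hypothetical non-blocking facade over a blocking profile lookup. */
final class ProfileLookup {
    private final ExecutorService workers;                         // pool that is allowed to block
    private final BiFunction<String, String, Profile> blockingFind; // e.g. the existing JDBC query

    ProfileLookup(ExecutorService workers, BiFunction<String, String, Profile> blockingFind) {
        this.workers = workers;
        this.blockingFind = blockingFind;
    }

    /**
     * Returns immediately. The callback receives (profile, null) on success, where profile
     * may be null if not found, or (null, error) on failure; error may be a
     * CompletionException wrapping the original exception.
     */
    void findProfile(String username, String lang, BiConsumer<Profile, Throwable> callback) {
        CompletableFuture
            .supplyAsync(() -> blockingFind.apply(username, lang), workers)
            .whenComplete(callback);
    }
}
```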
Now say you cannot change the ProfileManager code; you can still offload it to the worker thread pool by wrapping the call in an executeBlocking block:
vertx.<Profile>executeBlocking(future -> {
    // runs on a worker thread, so blocking here is allowed
    Profile profile = ProfileManager.FindProfile(username,lang);
    future.complete(profile);
}, false, res -> { // ordered = false: calls from the same context may run in parallel
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if(profile == null){
            ctx.request().response().putHeader("available","false");
            utils.sendResponseAndEnd(ctx.response(),400);
            return;
        }else{
            ctx.request().response().putHeader("available","true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});

javax.net.ssl.SSLHandshakeException: Connection reset by peer

I use the following code to call an Azure mobile backend API from my Android app:
try {
    mobileClient.invokeApi("CustomTransaction", senderToCheck,
            Boolean.class, new ApiOperationCallback<Boolean>() {
                @Override
                public void onCompleted(Boolean result,
                        Exception error, ServiceFilterResponse response) {
                    if (error == null) {
                        CheckSender(result);
                    } else {
                        dial.dismiss();
                        Crouton.makeText(MyActivity.this,
                                "Error occurred with service",
                                Style.ALERT).show();
                    }
                }
            });
} catch (SecurityException e) {
    Log.d(TAG, "CouldNotConnectToSocket", e);
    e.printStackTrace();
} catch (IllegalArgumentException e) {
    Log.d(TAG, "CouldNotConnectToSocket", e);
    e.printStackTrace();
}
Other information:
CustomTransaction - API controller name;
senderToCheck - JSON-parsable data transfer object;
Boolean.class - return type;
the 4th parameter is the callback.
All objects are JSON-parsable, and this worked until a few days ago.
Now this API call always fails: sometimes with "connection reset by peer", sometimes with an SSLHandshakeException, and most of the time the connection simply times out.
The top-level exception I get is com.microsoft.windowsazure.mobileservices.MobileServiceException: Error while processing request.
I tried re-publishing my ASP.NET web app several times, but the request never hits the controller action where my breakpoint is set when I debug the service call remotely.
I checked whether my service is down and found it is up and running. Then I checked the Azure management portal logs and found trace API messages from some controller action methods, as well as SQL CPU usage and data-out packet sizes, but after two weeks I still haven't found anything that solves this problem.
If I am correct, the solution to this problem lies in http://www.webapper.com/blog/index.php/2007/02/09/troubleshooting-javaxnetsslsslhandshakeexception/ but I'm not sure how to apply it.
Please advise me on how to fix this.
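`MobileServiceException: Error while processing request` is a wrapper; which of the three symptoms actually occurred (handshake failure, reset, or timeout) is normally further down its cause chain. A small JDK-only sketch for inspecting that chain inside `onCompleted`, for example with `Log.d(TAG, "root cause", CauseChain.root(error))`; `CauseChain` is a hypothetical helper, not part of the Azure SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for inspecting an exception's cause chain. */
final class CauseChain {
    private CauseChain() {}

    /** Returns the chain outermost-first, stopping early if a cause cycle is detected. */
    static List<Throwable> of(Throwable error) {
        List<Throwable> chain = new ArrayList<>();
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            chain.add(t);
        }
        return chain;
    }

    /** The innermost cause, or null if error is null. */
    static Throwable root(Throwable error) {
        List<Throwable> chain = of(error);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    /** True if any link in the chain is an instance of the given type. */
    static boolean contains(Throwable error, Class<? extends Throwable> type) {
        for (Throwable t : of(error)) {
            if (type.isInstance(t)) {
                return true;
            }
        }
        return false;
    }
}
```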

What can be the best approach to handle java.net.UnknownHostException for AWS users?

My application sends messages to an Amazon Simple Notification Service (SNS) topic, but sometimes (6 out of 10 times) I get java.net.UnknownHostException: sqs.ap-southeast-1.amazonaws.com. The reason for the exception is described in the Amazon Web Services discussion forums: https://forums.aws.amazon.com/thread.jspa?messageID=499290&#499290.
My problem is similar to the one described in the Amazon forums, but my rate of publishing messages to the topic is very dynamic: it can be 1 message/second, 1 message/minute, or no messages in an hour. I am looking for a cleaner, better, and safe approach that guarantees the message is sent to the SNS topic.
Description of the problem in detail:
Topic_Arn = ARN of the SNS topic where the application wants to publish the message
msg = message to send to the topic
// Just a sample example which publishes a message to an Amazon SNS topic
class SimpleNotificationService {
    static AmazonSNSClient mSnsClient = null;
    static {
        createSnsClient();
    }
    private static void createSnsClient() {
        Region region = Region.getRegion(Regions.AP_SOUTHEAST_1);
        AWSCredentials credentials = new
            BasicAWSCredentials(AwsPropertyLoader.getInstance().getAccessKey(),
                AwsPropertyLoader.getInstance().getSecretKey());
        mSnsClient = new AmazonSNSClient(credentials);
        mSnsClient.setRegion(region);
    }
    public static void publishMessage(String Topic_Arn, String msg) {
        PublishRequest req = new PublishRequest(Topic_Arn, msg);
        mSnsClient.publish(req);
    }
}
Class that calls SimpleNotificationService:
class MessagingManager {
    public void sendMessage(String message) {
        String topic_arn = "arn:of:amazon:sns:topic";
        SimpleNotificationService.publishMessage(topic_arn, message);
    }
}
Please note that this is sample code, not my actual code. There may be class design issues, but please ignore them if they are not related to the problem.
My idea is to put a try-catch block inside sendMessage, so that when we catch UnknownHostException we retry, but I am not sure how to write this in a safer, cleaner, and better way.
So the MessagingManager class would look something like this:
class MessagingManager {
    public void sendMessage(String message) {
        String topic_arn = "arn:of:amazon:sns:topic";
        try {
            SimpleNotificationService.publishMessage(topic_arn, message);
        } catch (UnknownHostException uhe) {
            // I need to catch AmazonClientException, as AWS throws
            // AmazonClientException when it sees UnknownHostException.
            // I am mentioning UnknownHostException so that non-AWS users
            // understand my problem better.
            sendMessage(message); // Isn't this unsafe? It may fall into an infinite loop.
        }
    }
}
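The recursive `sendMessage(message)` in the catch block has no upper bound, so a DNS failure that persists would recurse until a `StackOverflowError`. A minimal JDK-only sketch of a bounded loop with exponential backoff instead; `BoundedRetry`, its parameters, and the 30 s cap are hypothetical choices. The `Runnable` stands in for the `publishMessage` call, and the predicate decides which exceptions are worth retrying (per the comment above, with the AWS SDK that would be an `AmazonClientException` whose cause is an `UnknownHostException`).

```java
import java.util.function.Predicate;

/** Hypothetical helper: runs an action at most maxAttempts times with exponential backoff. */
final class BoundedRetry {
    private BoundedRetry() {}

    static void run(Runnable action, int maxAttempts, long initialDelayMs,
                    Predicate<RuntimeException> retryable) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return; // sent successfully
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // give up: let the caller log, persist or alert
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    throw e;
                }
                delay = Math.min(delay * 2, 30_000L); // double the wait, capped at 30 s
            }
        }
    }
}

// Usage (hypothetical):
// BoundedRetry.run(() -> SimpleNotificationService.publishMessage(topic_arn, message), 5, 500,
//         e -> e.getCause() instanceof java.net.UnknownHostException);
```

Note that the SDK already retries internally, as the answer below explains; a loop like this only adds an application-level bound on top.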
I am open to answers like this one: java.net.UnknownHostException: Invalid hostname for server: local, but I would prefer a solution at the application-code level that depends less on changes to the machine, because my server application is going to run on many boxes (developer, testing, and production boxes). If changing machine host files or similar is the only guaranteed solution, then I would prefer to combine it with code-level changes.
Each AWS SDK implements automatic retry logic. The AWS SDK for Java automatically retries requests, and you can configure the retry settings using the ClientConfiguration class.
Below is a sample that creates the SNS client. It retries up to 25 times if it encounters an UnknownHostException, using the default backoff and retry condition. If you want your own, you need to implement these two interfaces (RetryPolicy.RetryCondition and RetryPolicy.BackoffStrategy): http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/retry/RetryPolicy.html
private static void createSnsClient() {
    Region region = Region.getRegion(Regions.AP_SOUTHEAST_1);
    AWSCredentials credentials = new
        BasicAWSCredentials(AwsPropertyLoader.getInstance().getAccessKey(),
            AwsPropertyLoader.getInstance().getSecretKey());
    ClientConfiguration clientConfiguration = new ClientConfiguration();
    clientConfiguration.setMaxErrorRetry(25);
    // null condition/backoff = SDK defaults; true = honor setMaxErrorRetry above
    clientConfiguration.setRetryPolicy(new RetryPolicy(null, null, 25, true));
    mSnsClient = new AmazonSNSClient(credentials, clientConfiguration);
    mSnsClient.setRegion(region);
}
Have you considered looking into the JVM TTL for the DNS cache?
http://docs.aws.amazon.com/AWSSdkDocsJava/latest//DeveloperGuide/java-dg-jvm-ttl.html
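The linked guide concerns the JVM's DNS cache. The JDK reads the `networkaddress.cache.ttl` security property (how long successful lookups are cached) and `networkaddress.cache.negative.ttl` (how long failed lookups, the ones that surface as `UnknownHostException`, are cached). A minimal sketch setting both programmatically; `DnsCacheConfig` is a hypothetical helper and the values in the usage are illustrative, not recommendations from the guide. Call it early at startup, before the first hostname lookup, since the JDK reads these values once.

```java
import java.security.Security;

/** Hypothetical helper: sets the JVM's DNS cache lifetimes. */
final class DnsCacheConfig {
    private DnsCacheConfig() {}

    static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        // How long successful lookups stay cached.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        // How long failed lookups stay cached before the name is resolved again.
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }
}

// Usage (illustrative values): DnsCacheConfig.apply(60, 10);
```

Because this lives in application code, it fits the question's preference for code-level changes over per-machine configuration.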

data cache for odata4j

I am implementing a RESTlet service via odata4j on Android.
While the application is running, there is a constant flow of data that needs to be sent to an OData server.
The following method gets the new data:
private void freshData(Data data) {
    try {
        dataTransmitter.sendData(data, this.ptId);
    } catch (Exception ex) {
        //
    }
}
The following method sends the data to the server:
ODataJerseyConsumer c = ODataJerseyConsumer.create(serviceUrl);
public void sendData(Data data, int ptId) throws Exception {
    OEntity newData = c.createEntity(entitySet)
        .properties(OProperties.int32("ptID", ptId),
            OProperties.double_("data", data.getDouble()))
        .execute();
}
So far, no problem. But what if the mobile connection is cut off or lags?
1) Is there a way to get the status from execute() (a successful send, or no mobile connection, for instance)?
2) If the send has failed or is still in progress, I need to store the new data somehow and send it once the old data is out. Is there a feature in odata4j (0.8-SNAPSHOT) that I haven't found, or do I need to do this with a queue, for example?
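If odata4j turns out not to offer such a feature, a queue is a workable approach for question 2. A minimal JDK-only sketch: new readings are appended to a bounded queue, and `flush()` sends from the head, removing an item only after its send succeeded, so a dropped connection leaves the remaining items in order for the next attempt. `PendingSender` and its `Sender` interface (standing in for `sendData`) are hypothetical, and dropping the oldest reading when the queue is full is an assumed policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical buffer: keeps readings until they have been sent successfully. */
final class PendingSender<T> {
    /** Stand-in for the call that performs the OData request (e.g. sendData). */
    interface Sender<E> {
        void send(E item) throws Exception;
    }

    private final Deque<T> pending = new ArrayDeque<>();
    private final int capacity;
    private final Sender<T> sender;

    PendingSender(int capacity, Sender<T> sender) {
        this.capacity = capacity;
        this.sender = sender;
    }

    /** Queues a new reading; when full, the oldest reading is dropped to bound memory. */
    synchronized void offer(T item) {
        if (pending.size() == capacity) {
            pending.pollFirst();
        }
        pending.addLast(item);
    }

    /**
     * Sends queued readings oldest-first. An item is removed only after its send
     * succeeded; the first failure stops the flush and keeps the rest for next time.
     * Returns the number of items sent.
     */
    synchronized int flush() {
        int sent = 0;
        while (!pending.isEmpty()) {
            try {
                sender.send(pending.peekFirst());
            } catch (Exception e) {
                break; // e.g. no connection: try again on a later flush()
            }
            pending.pollFirst();
            sent++;
        }
        return sent;
    }

    synchronized int size() {
        return pending.size();
    }
}
```

`flush()` holds the lock during the network call, so it should run on a background thread; a production version would release the lock while sending.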
Thank you for the help!

Does this program introduce a parallel execution?

Here is a simple server application using Bonjour and written in Java. The main part of the code is given here:
public class ServiceAnnouncer implements IServiceAnnouncer, RegisterListener {
    private DNSSDRegistration serviceRecord;
    private boolean registered;
    public boolean isRegistered(){
        return registered;
    }
    public void registerService() {
        try {
            serviceRecord = DNSSD.register(0,0,null,"_killerapp._tcp", null,null,1234,null,this);
        } catch (DNSSDException e) {
            // error handling here
        }
    }
    public void unregisterService(){
        serviceRecord.stop();
        registered = false;
    }
    public void serviceRegistered(DNSSDRegistration registration, int flags,String serviceName, String regType, String domain){
        registered = true;
    }
    public void operationFailed(DNSSDService registration, int error){
        // do error handling here if you want to.
    }
}
I understand it in the following way. We can try to register a service by calling the "registerService" method, which in turn calls the "DNSSD.register" method. "DNSSD.register" tries to register the service and, in general, can end up with one of two results: the service was "successfully registered" or "registration failed". In both cases, "DNSSD.register" calls the corresponding method (either "serviceRegistered" or "operationFailed") of the object that was passed to DNSSD.register as the last argument, and the programmer decides what to put into "serviceRegistered" and "operationFailed". That much is clear.
But should I try to register the service again from "operationFailed"? I am afraid that this way my application will try to register the service too frequently. Should I put some "sleep" or "pause" into "operationFailed"? In any case, it seems to me that while the application is unable to register a service, it will also be unable to do anything else (for example, take care of the GUI). Or maybe DNSSD.register introduces some kind of parallelism? I mean, maybe it starts a new thread, but then, if I try to register the service from "operationFailed", I could generate a huge number of threads. Can that happen? If so, is it a problem? And if it is, how can I resolve it?
Yes, callbacks from the DNSSD APIs can come asynchronously from another thread. This excerpt from the O'Reilly book on ZeroConf networking gives some useful information.
I'm not sure retrying the registration from your operationFailed callback is a good idea, at least not without some understanding of why the registration failed. Is simply retrying it with the same parameters going to make sense?
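If a retry does make sense for a given error, sleeping inside `operationFailed` would block whichever thread delivers the callback, and calling `registerService()` directly from it retries immediately. A minimal JDK-only sketch that instead schedules a bounded number of re-attempts with increasing delay on one shared scheduler thread, so retries never create additional threads. `RetryScheduler` and its parameters are hypothetical; the `Runnable` would be the announcer's `registerService()`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: bounded, delayed re-registration driven by the DNSSD callbacks. */
final class RetryScheduler {
    private final ScheduledExecutorService scheduler; // a single thread reused for every retry
    private final Runnable register;                 // e.g. announcer::registerService
    private final int maxRetries;
    private final long baseDelayMs;
    private final AtomicInteger failures = new AtomicInteger();

    RetryScheduler(ScheduledExecutorService scheduler, Runnable register,
                   int maxRetries, long baseDelayMs) {
        this.scheduler = scheduler;
        this.register = register;
        this.maxRetries = maxRetries;
        this.baseDelayMs = baseDelayMs;
    }

    /** Call from operationFailed: returns immediately and never sleeps on the callback thread. */
    boolean onFailure() {
        int n = failures.incrementAndGet();
        if (n > maxRetries) {
            return false; // give up; report the error instead of retrying forever
        }
        long delay = baseDelayMs << Math.min(n - 1, 16); // 1x, 2x, 4x ... the base delay
        scheduler.schedule(register, delay, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Call from serviceRegistered so a later failure starts from the base delay again. */
    void onSuccess() {
        failures.set(0);
    }
}
```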
