I'm using HBase 0.94.15 (no cluster setup).
I adapted the BulkDeleteEndpoint (see org.apache.hadoop.hbase.coprocessor.example.BulkDeleteEndpoint) and am calling it from the client.
It works fine for a limited amount of data (around 20,000 rows, given our table design), but beyond that I get an error containing responseTooSlow and execCoprocessor.
I've read that this is due to a client disconnect: the client doesn't get a response within 60 seconds, the default hbase.rpc.timeout (https://groups.google.com/forum/#!topic/nosql-databases/FPeMLHrYkco).
My question now is, how do I prevent the client from closing the connection before the endpoint is done?
I do not want to set the default RPC timeout to some high value. (Setting the timeout for this specific call only would be an option, but still only a workaround; roughly what I mean is sketched below.)
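A minimal sketch, assuming a dedicated HTable instance just for the bulk delete is acceptable (the table name and timeout value are illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Scope a larger hbase.rpc.timeout to this one client-side connection
// instead of changing the cluster-wide default.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 600000); // 10 minutes, this client only

HTable bulkDeleteTable = new HTable(conf, "my_table"); // hypothetical table name
try {
    // ... invoke the adapted BulkDeleteEndpoint through this table ...
} finally {
    bulkDeleteTable.close();
}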
On some mailing list I found a comment that it is possible to poll for the status of the endpoint, but (1) I can't find the page anymore, and (2) I can't find any other information about this idea. Maybe it is available in some more recent version?
Any other ideas are appreciated.
What we might try is limiting the scope of the Scan object (perhaps by calling Scan.setMaxResultsPerColumnFamily()) and invoking the coprocessor in a loop, with the startKey updated in every iteration. The open question is how to update the startKey at the end of each invocation; one way is sketched below.
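A rough client-side sketch of that loop. The bulkDelete(...) helper is a hypothetical wrapper around the adapted endpoint (deleting the inclusive range it is given), and the page size is illustrative:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "my_table"); // hypothetical table name
final int PAGE_SIZE = 10000;                 // illustrative page size
byte[] startRow = HConstants.EMPTY_START_ROW;

while (true) {
    // Find the last row of this page with a cheap key-only scan.
    Scan scan = new Scan(startRow);
    scan.setFilter(new FirstKeyOnlyFilter()); // fetch only one KV per row
    scan.setCaching(1000);

    byte[] lastRow = null;
    int seen = 0;
    ResultScanner scanner = table.getScanner(scan);
    try {
        for (Result r : scanner) {
            lastRow = r.getRow();
            if (++seen >= PAGE_SIZE) break;
        }
    } finally {
        scanner.close();
    }
    if (lastRow == null) break; // nothing left to delete

    // Hypothetical wrapper around the adapted BulkDeleteEndpoint, deleting
    // rows in [startRow, lastRow] inclusive -- one short RPC per page.
    bulkDelete(table, startRow, lastRow);

    if (seen < PAGE_SIZE) break;                     // that was the last page
    startRow = Bytes.add(lastRow, new byte[] { 0 }); // first key after lastRow
}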
Related
The setup:
We have an https://Main.externaldomain/xmlservlet site, which authenticates/validates/geo-locates and proxies (slightly modified) requests to, for example, http://London04.internaldomain/xmlservlet.
There's no direct access to internaldomain exposed to end-users at all. The communication between the sites occasionally gets interrupted, and sometimes the internaldomain nodes become unavailable/dead.
The Main site is using org.apache.http.impl.client.DefaultHttpClient (I know it's deprecated; we're gradually upgrading this legacy code) with readTimeout set to 10,000 milliseconds.
The request and response have XML payloads/bodies of variable length; Transfer-Encoding: chunked is used, as is Keep-Alive: timeout=15.
The problem:
Sometimes London04 actually needs more than 10 seconds (let's say 2 minutes) to execute. Sometimes it crashes non-gracefully. Sometimes other (networking) issues happen.
Sometimes during those 2 minutes the portions of response XML data arrive gradually enough that there are no 10-second gaps between the portions, so the readTimeout is never exceeded; sometimes there is a 10+ second gap and HttpClient times out...
We could try to increase the timeout on the Main side, but that would easily bloat/overload the listener pool (just from regular traffic, without even being DDoSed yet).
We need a way to distinguish between internal-site-still-working-on-generating-the-response and the cases where it really crashed/network_lost/etc.
And the best fit feels like some kind of heartbeat (every 5 seconds) during the communication.
We thought Keep-Alive would save us, but it seems to only cover the gaps between requests (not during a request), and it does not seem to do any heartbeating during the gap (it just has/waits for the timeout).
We thought chunked encoding might save us by sending some heartbeat (zero-sized chunks) to keep the other side aware, but there seems to be no such/default implementation supporting a heartbeat this way; moreover, a zero-sized chunk is itself the end-of-data indicator...
Question(s):
If we're correct in assuming that Keep-Alive/chunked encoding won't help us achieve the keep-alive/heartbeat/fast-detection-of-dead-backend, then:
1) at which layer should such a heartbeat be implemented? HTTP? TCP?
2) is there any standard framework/library/setting/etc. implementing it already? (if possible: Java, REST)
UPDATE
I've also looked into heartbeat implementations for WADL/WSDL, though I found none for REST, and checked out WebSockets...
I also looked into TCP keepalives, which seem to be the right feature for the task:
https://en.wikipedia.org/wiki/Keepalive
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
Socket heartbeat vs keepalive
WebSockets ping/pong, why not TCP keepalive?
BUT according to those I'd have to set up something like:
tcp_keepalive_time=5
tcp_keepalive_intvl=1
tcp_keepalive_probes=3
which seems to go against the usual recommendations (2h is the recommended value, 10 min is already presented as an odd choice; is going down to 5s sane/safe?? if it is, this might be my solution upfront...)
Also, where should I configure this? On London04 alone, or on Main too? (If I set it up on Main, won't it flood the client-->Main frontend communication? Or might the NATs/etc. between the sites easily ruin the keepalive intent/support?)
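(For completeness, on the Java side SO_KEEPALIVE appears to be a per-socket opt-in; a minimal sketch, with the actual probe timing still coming from the kernel settings above:)
import java.net.Socket;

// Keepalive probes are only sent on sockets that opted in; the 5s/1s/3
// timing would come from the OS sysctls above, not from Java.
Socket socket = new Socket("London04.internaldomain", 80);
socket.setKeepAlive(true);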
P.S. any link to an RTFM is welcome - I might just be missing something obvious :)
My advice would be: don't use a heartbeat. Have your external-facing API return a 303 See Other with headers that indicate when and where the desired response might be available.
So you might call:
POST https://public.api/my/call
and get back
303 See Other
Location: "https://public.api/my/call/results"
Retry-After: 10
To the extent your server can guess how long a response will take to build, it should factor that into the Retry-After value. If a later GET call is made to the new location and the results are not yet done being built, return a response with an updated Retry-After value. So maybe you try 10, and if that doesn't work, you tell the client to wait another 110, which would be two minutes in total.
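A minimal JAX-RS sketch of that flow (the resource paths and the Jobs store are made up for illustration):
import java.net.URI;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/my/call")
public class MyCallResource {

    @POST
    public Response submit() {
        String jobId = Jobs.start(); // hypothetical async job store
        return Response.status(303)  // 303 See Other
                .location(URI.create("/my/call/results/" + jobId))
                .header("Retry-After", "10")
                .build();
    }

    @GET
    @Path("/results/{id}")
    public Response results(@PathParam("id") String id) {
        if (!Jobs.isDone(id)) {
            // Still building: 202 Accepted with an updated Retry-After
            // is one reasonable choice here.
            return Response.status(202).header("Retry-After", "110").build();
        }
        return Response.ok(Jobs.result(id)).build();
    }
}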
Alternately, use a protocol that's designed to stay open for long periods of time, such as WebSockets.
Take a look at SSE (Server-Sent Events).
example code:
https://github.com/rsvoboda/resteasy-sse
or vertx event-bus:
https://vertx.io/docs/apidocs/io/vertx/core/eventbus/EventBus.html
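For instance, a bare-bones JAX-RS 2.1 SSE resource could look like this (the backend handle and event names are made up; error handling omitted):
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.sse.Sse;
import javax.ws.rs.sse.SseEventSink;

@Path("/my/call")
public class MyCallSseResource {

    @GET
    @Produces(MediaType.SERVER_SENT_EVENTS)
    public void stream(@Context SseEventSink sink, @Context Sse sse) {
        new Thread(() -> {
            try (SseEventSink s = sink) {
                // Heartbeat every 5 s while the backend works, then the result.
                while (!backend.isDone()) { // hypothetical backend handle
                    s.send(sse.newEvent("heartbeat", "working"));
                    Thread.sleep(5000);
                }
                s.send(sse.newEvent("result", backend.result()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}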
I have a simple Spring Boot based web app that downloads data from several APIs. Some of them don't respond in time, since my connectionTimeout is set to roughly 4 seconds.
As soon as I get rid of the connectionTimeout setting, I get exceptions after 20 or so seconds.
So, my question is: for how long am I able to try to connect to an API, and what does it depend on? Where do those 20 seconds come from? What if an API responds after 40 minutes and I can't catch that specific moment, so I just lose data? I don't want that to happen. What are my options?
Here's the code to set the connection. Nothing special.
HttpComponentsClientHttpRequestFactory clientHttpRequestFactory = new HttpComponentsClientHttpRequestFactory(HttpClientBuilder.create().build());
clientHttpRequestFactory.setConnectTimeout(4000);
RestTemplate restTemplate = new RestTemplate(clientHttpRequestFactory);
Then I retrieve the values via:
myObject.setJsonString(restTemplate.getForObject(url, String.class));
Try increasing your timeout. 4 seconds is too little.
It will need to connect, formulate data and return. So if 4 seconds is all it gets, by the time the API attempts to return anything, your application may have already given up.
Set it to 20 seconds to test it. You can set it to much longer to give the API enough time to complete. This does not mean your app will use up the whole timeout every time; it will finish as soon as a result is returned. Also, APIs are not designed to take long: they will perform the task and return the result as fast as possible.
Connection timeout means that your program couldn't connect to the server at all within the time specified.
The timeout can be configured, as, like you say, some systems may take a longer time to connect to, and if this is known in advance, it can be allowed for. Otherwise the timeout serves as a guard to prevent the application from waiting forever, which in most cases doesn't really give a good user experience.
A separate timeout can normally be configured for reading data (socket timeout). They are not inclusive of each other.
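With the question's setup, the read (socket) timeout is configured separately on the same request factory, e.g. (the 30 s value is illustrative):
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

// Connect timeout: establishing the TCP connection.
// Read timeout: the allowed gap between data packets once connected.
HttpComponentsClientHttpRequestFactory factory =
        new HttpComponentsClientHttpRequestFactory(HttpClientBuilder.create().build());
factory.setConnectTimeout(4000); // 4 s to connect
factory.setReadTimeout(30000);   // 30 s socket timeout (illustrative)
RestTemplate restTemplate = new RestTemplate(factory);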
To solve your problem:
Check that the server is running and accepting incoming connections.
You might want to use curl or, depending on what it is, simply your browser to try and connect.
If one tool can connect but the other can't, check your firewall settings and ensure that outgoing connections from your Java program are permitted. The easiest way to test whether this is the problem is to disable anti-virus and firewall tools temporarily. If this allows the connection, you'll either need to leave the FW off or, better, add a corresponding exception.
Leave the timeout on a higher setting (or try setting it to 0, which is interpreted as infinite) while testing. Once you have it working, you can consider tweaking it to reflect your server spec and usability requirements.
Edit:
I realised that this doesn't necessarily help, as you did ultimately connect. I'll leave the above standing as general info.
for how long am I able to try to connect to an API and what does it depend on?
Most likely the server that the API is hosted on. If it is overloaded, response time may lengthen.
Where do those 20 seconds come from?
Again this depends on the API server. It might be random, or it may be processing each request for a fixed period of time before finding itself in an error state. In this case that might take 20 seconds each time.
What if an API responds after 40 minutes and I can't catch that specific moment, so I just lose data? I don't want that to happen. What are my options?
Use a more reliable API, possibly paying for a service guarantee.
Tweak your connection and socket timeouts to allow for the capabilities of the server side, if known in advance.
If the response really takes 40 minutes, it is a really poor service, but moving on with that assumption: if the dataset is that large, explore whether the API offers a streaming callback, whereby you pass an OutputStream into the API's library methods, to which it will (asynchronously) write the response when it is ready.
Keep in mind that connection and socket timeout are separate things. Once you have connected, the connection timeout becomes irrelevant (socket is established). As long as you begin to receive and continue to receive data (packet to packet) within the socket timeout, the socket timeout won't be triggered either.
Use infinite timeouts (set to 0), but this could lead to poor usability within your applications, as well as resource leaks if a server is in fact offline and will never respond. In that case you will be left with dangling connections.
The default and maximum have nothing to do with the server. They depend on the client platform, but the maximum is around a minute. You can decrease it, but not increase it. Four seconds is far too short; it should be measured in tens of seconds in most circumstances.
And absent or longer connection timeouts do not cause server errors of any kind. You are barking up the wrong tree here.
I have written a management application which has a function to put a bunch of events into multiple Google calendars.
On my computer everything works fine. But the main user of this application has a very bad network connection. More precisely, the ping to different servers varies between 23 ms and about 2000 ms, and packets get lost.
My approach was, besides increasing the timeout, to use a separate thread for each API call and to retry in case of a connection error.
And at this point I got stuck. Now every event is created, unfortunately not exactly once but at least once. So some events were uploaded multiple times.
I have already tried to group them as batch requests, but Google doesn't allow events for multiple calendars in a single batch request.
I hope my situation is clear and someone has a solution for me.
I would first try to persuade the "main user" to get a better network connection.
If that is impossible, I would change the code to have the following logic:
// Current version
createEvent(parameters)

// New version
while (queryEvent(parameters) finds no event) {
    createEvent(parameters)
}
with appropriate timeouts and retry counters. The idea is to implement some extra logic to make the creation of an event in the calendar idempotent. (This may entail generating a unique identifier on the client side for each event so that you can query the events reliably.)
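With the Google Calendar API specifically, one way to make the insert itself idempotent (a sketch, assuming the v3 Java client; deriving stableId is up to you, and event IDs must use the base32hex alphabet the API requires) is to supply your own event ID and treat a 409 "duplicate" as success:
import java.io.IOException;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.calendar.Calendar;
import com.google.api.services.calendar.model.Event;

// A stable, client-generated ID makes the insert safe to retry.
void insertIdempotent(Calendar service, String calendarId, Event event, String stableId)
        throws IOException {
    event.setId(stableId);
    try {
        service.events().insert(calendarId, event).execute();
    } catch (GoogleJsonResponseException e) {
        if (e.getStatusCode() == 409) {
            return; // already created by an earlier attempt -- success
        }
        throw e;
    }
}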
Hi, I have made a Java application using asterisk-java, and with it I can receive calls and initiate outbound calls as well. But I am facing one problem: whenever I bulk-originate outbound calls to, say, 50k users, the application can only handle those who answered the calls, not those who didn't answer or didn't respond. Also, since I have set the value of
OriginateAction.setAsync(true)
in my outbound calling application, I am getting a success response for every call, which only means the call was successfully initiated. If this value is not set, I can check the response (an error in case the user didn't answer the call), but then the outbound bulk rate drops from 50k to 3k.
My ideal solution would be: if the call is not answered and eventually hangs up, I can redirect it to some AGI script, which would record its outcome (hangup cause, Answered/Busy/etc.).
Kindly guide.
You have to manage the number of calls yourself.
Asterisk is not designed to know how many calls at once your hardware/trunks can support; that interface just places a SINGLE call.
Check out the VICIdial dialer or other existing projects. A rough sketch of client-side throttling is below.
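A sketch with asterisk-java (the concurrency limit, channel/context names, and the exact fields on the response event are illustrative; exception handling omitted for brevity):
import java.util.concurrent.Semaphore;
import org.asteriskjava.manager.ManagerConnection;
import org.asteriskjava.manager.action.OriginateAction;
import org.asteriskjava.manager.event.OriginateResponseEvent;

// Cap the number of in-flight originates, releasing a slot when Asterisk
// reports the final outcome of each async originate.
final Semaphore slots = new Semaphore(30); // illustrative concurrency limit

// managerConnection is your existing, logged-in ManagerConnection.
managerConnection.addEventListener(event -> {
    if (event instanceof OriginateResponseEvent) {
        OriginateResponseEvent resp = (OriginateResponseEvent) event;
        // Inspect resp (e.g. its response/reason fields) to log
        // unanswered, busy, or failed calls here.
        slots.release();
    }
});

for (String number : numbers) { // 'numbers' is your 50k-user list
    slots.acquire(); // blocks until a call slot frees up
    OriginateAction originate = new OriginateAction();
    originate.setChannel("SIP/trunk/" + number); // hypothetical trunk
    originate.setContext("outbound");            // hypothetical context
    originate.setExten("s");
    originate.setPriority(1);
    originate.setAsync(true);
    managerConnection.sendAction(originate);
}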
I have a Java web service which queries a DB to return data to users. DB queries are expensive, so I have a cron job which runs every 60 seconds to cache the current data in memcached.
Data elements 'close' after a time, meaning they aren't returned by "get current data" requests. So these requests can utilize the cached data.
Clients use a feature called 'since' to get all the data that has changed since a particular timestamp (the last request's timestamp). This would also return any data that closed after that timestamp.
How can I effectively store the diffs/since data? Accessing the DB for every since request is too slow (and won't scale well), but because clients could request any since time, it's difficult to build an all-purpose cache.
I tried having the cron job also build a 'since' cache. It would make 'since' requests to capture everything that changed since the last update, and I attempted to force clients to request the timestamps which matched the cron job's since requests. But inconsistencies in how long the cron took, plus the fact that neither the client nor the cron job runs exactly every 60 seconds, mean the small differences add up. This eventually results in some data closing but the cache or the client missing it.
I'm not even sure what to search for to solve this.
I'd be tempted to stick a time-expiring cache (e.g. Ehcache with timeToLive set) in front of the database and have whatever process updates the database also put the data directly into the cache (resetting or removing an existing matching element). The web service then just hits the cache (which is incredibly fast) for everything except its initial connection, filtering out the few elements that are too old and sending the rest on to the client. Gradually the old data gets dropped from the cache as its time-to-live passes. Then just make sure the cache gets pre-populated when the service starts up.
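A minimal Ehcache 2.x sketch of that arrangement (names, sizes and the 60 s TTL are illustrative; loadFromDb stands in for your DB access):
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

// Entries silently expire once their time-to-live passes, so the cache
// "forgets" closed data on its own.
CacheManager manager = CacheManager.create();
Cache cache = new Cache(
        new CacheConfiguration("currentData", 100000) // name, max heap entries
                .timeToLiveSeconds(60));
manager.addCache(cache);

// Writer path: whatever updates the DB also refreshes the cache.
cache.put(new Element(rowId, rowData));

// Read path: a null Element means expired or never cached -- go to the DB.
Element hit = cache.get(rowId);
Object value = (hit != null) ? hit.getObjectValue() : loadFromDb(rowId);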
Does your data have any timestamping? We were having similar caching issues here at my company, and timestamping resolved them. You can attach a "valid-until" timestamp to your data, so that your cache and clients know how long the data is valid.
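A tiny sketch of the idea (field and class names are illustrative):
// Carry the validity window with the data itself, so both the cache
// layer and the client can check validity locally.
public class TimestampedValue<T> {
    private final T value;
    private final long validUntilMillis; // epoch milliseconds

    public TimestampedValue(T value, long validUntilMillis) {
        this.value = value;
        this.validUntilMillis = validUntilMillis;
    }

    public boolean isValid() {
        return System.currentTimeMillis() < validUntilMillis;
    }

    public T getValue() {
        return value;
    }
}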