proper way of connecting to twitter using twitter-hbc in multithreaded system - java

I have a use case in which I consume the Twitter filter stream in a multithreaded environment using Twitter HBC.
With each request, I receive keywords to track from the user, process them, and save them to the database.
Example
Request 1: track keywords [apple, google]
Request 2: track keywords [yahoo, microsoft]
What I am currently doing is opening a new thread for each request and processing it there.
I create a connection for every endpoint as shown below (I am following the official HBC example):
endpoint.trackTerms(Lists.newArrayList("twitterapi", "#yolo"));
Authentication auth = new OAuth1(consumerKey, consumerSecret, token, secret);
// Create a new BasicClient. By default gzip is enabled.
Client client = new ClientBuilder()
    .hosts(Constants.STREAM_HOST)
    .endpoint(endpoint)
    .authentication(auth)
    .processor(new StringDelimitedProcessor(queue))
    .build();
// Establish a connection
client.connect();
The problem I am facing is that Twitter warns me about having more than one connection.
But given the code above and my use case, I have to create a new instance of the Client object for every request that I receive, because my endpoints (trackTerms) are different for every request.
What is the proper way to connect to Twitter for my use case (a multithreaded system) so that I avoid the too-many-connections and rate-limit warnings?
Current warnings from twitter:
2017-02-01 13:47:33 INFO TwitterStreamImpl:62 - 420:Returned by the Search and Trends API when you are being rate limited (https://dev.twitter.com/docs/rate-limiting).
Returned by the Streaming API:
Too many login attempts in a short period of time.
Running too many copies of the same application authenticating with the same account name.
Exceeded connection limit for user
Thanks.
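One possible direction, which nothing in this thread confirms, is to keep a single streaming connection whose track list is the union of every request's keywords, and to route each incoming message to the requests whose keywords it contains. The sketch below covers only the routing part in plain Java. `KeywordRouter`, the request IDs, and the case-insensitive substring match are all hypothetical; the HBC client itself would presumably still have to be reconnected whenever the union of terms changes.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: not part of twitter-hbc.
final class KeywordRouter {
    // requestId -> lower-cased keywords registered by that request
    private final Map<String, Set<String>> keywordsByRequest = new ConcurrentHashMap<>();

    void register(String requestId, Collection<String> keywords) {
        Set<String> lowered = new TreeSet<>();
        for (String k : keywords) {
            lowered.add(k.toLowerCase(Locale.ROOT));
        }
        keywordsByRequest.put(requestId, lowered);
    }

    void unregister(String requestId) {
        keywordsByRequest.remove(requestId);
    }

    // Union of all keywords: what the one shared connection would track.
    SortedSet<String> allTerms() {
        SortedSet<String> union = new TreeSet<>();
        for (Set<String> terms : keywordsByRequest.values()) {
            union.addAll(terms);
        }
        return union;
    }

    // IDs of the requests that have at least one keyword occurring in the text.
    SortedSet<String> route(String text) {
        String lowered = text.toLowerCase(Locale.ROOT);
        SortedSet<String> matches = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : keywordsByRequest.entrySet()) {
            for (String keyword : e.getValue()) {
                if (lowered.contains(keyword)) {
                    matches.add(e.getKey());
                    break;
                }
            }
        }
        return matches;
    }
}
```

With the two example requests registered, `allTerms()` would feed a single `trackTerms(...)` call, and one consumer thread reading the queue could call `route(message)` to decide which request's results each message belongs to.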

Related

gRPC connection cycling

We are setting up a cluster to handle inferencing (with TensorFlow Serving) over gRPC. We intend to use a layer-7 load balancer (AWS ALB) to distribute the load. For our workload, inferencing will occur many times per minute from each client account. It is my understanding that gRPC holds connection state for each of these channels. As a result, in order for the ALB to do its job, we need to periodically tear down and rebuild the connection on the client instance.
My question: what is the best practice for cycling a connection in Java?
Below is my proposed code, which would be called every couple of minutes on each client channel. I assume that while the first connection is being shut down, we can go ahead and create a new one and immediately issue a request on it. Or do we need to wait until the prior channel has shut down first? In our situation, the channel will (very likely) be empty, since the previous request will have been 10 seconds earlier.
if (mChannel != null)
    mChannel.shutdown();
mChannel = ManagedChannelBuilder.forAddress(mHost, mPort).usePlaintext().build();
mStub = PredictionServiceGrpc.newBlockingStub(mChannel);
The best practice is to use Lookaside Load Balancing.
However, you can make a few tweaks to terminate client connections.
var builder = ManagedChannelBuilder.forAddress(mHost, mPort)
    .keepAliveTime(15, TimeUnit.SECONDS)
    .keepAliveTimeout(5, TimeUnit.SECONDS);
The above configuration is meant to terminate sticky gRPC connections so that the AWS ALB can do its job and load-balance requests uniformly.
There are other options that you can try depending on your use case, e.g. retries. See ManagedChannelBuilder.

While authenticating mongodb using java it is taking more time and throwing mongotimeoutException in case of wrong credential

MongoClient m = new MongoClient(new ServerAddress("182.178.0.29", 27017),
    Arrays.asList(MongoCredential.createCredential("username", "employeedb", "password".toCharArray())));
MongoDatabase md = m.getDatabase("employeedb");
MongoIterable<String> strings = md.listCollectionNames();
MongoCursor<String> iterator = strings.iterator();
After authentication I need to show a message to the end user. But when the user enters wrong credentials, the exception only arrives after 30 seconds, so the user has to wait until the message dialog appears. Could you please check why it takes that long, and whether there is any other way to authenticate?
MongoDB version: 3.2.14; Java driver version: 3.2.1
Exception:
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector
If you know why your authentication is failing (you mentioned wrong credentials), then you can customize the timeout period for failing authentication using the serverSelectionTimeout property, so that you can show your users the failed authentication rather quickly.
More explanation is available on the MongoDB site:
The serverSelectionTimeoutMS variable gives the amount of time in
milliseconds that drivers should allow for server selection before
giving up and raising an error. Users can set this higher or lower
depending on whether they prefer to be patient or to return an error
to users quickly (e.g. a "fail whale" web page). The default is 30
seconds, which is enough time for a typical new-primary election to
occur during failover.

Handling AWS User Pool + Federation Identity token refresh system in Android

Here is a question that might have been asked several times before, but I am struggling to frame a query.
AWS Cognito works like this: you pass the ID token plus the authentication provider to Cognito federated identity, and it provides temporary credentials valid for an hour. After an hour, I get an authentication exception.
I have observed that CognitoCachingCredentialsProvider tries to refresh before performing a given task, say executing a Lambda function or making a DynamoDB query. But what is a good way to handle expiry: intercept the refresh, fetch the token first, set it on the credentials provider, and then let the refresh continue?
Whether it is a User Pool ID token or Google's ID token, all I need to know is how to detect that the credentials have expired, so that I can fetch new ID tokens from the providers and refresh the credentials before processing the request.
I have tried an hourly task (55 minutes, actually), but sometimes it does not work and it has not been very reliable so far.
Thanks
It's a bit tricky to get just right, but there are two common ways to handle it.
One is to do what you suggested: track when the token was vended, and then refresh if it's within some threshold of expiring (e.g. refresh if it's < 5 minutes from expiry).
The other is to blindly try to refresh, then catch the exception that gets thrown when a token is expired and refresh/retry there. If you go this route, be careful to only retry once there so you don't spam the service if the request isn't just right.
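The two approaches above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `RefreshingCredentials`, `ExpiredException`, and the `refresher` callback (which stands in for "fetch a new ID token and refresh the credentials, returning the new expiry time") are hypothetical names, not part of the AWS SDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical wrapper; none of these names come from the AWS SDK.
final class RefreshingCredentials {
    // Stand-in for whatever exception the service throws on expired credentials.
    static final class ExpiredException extends RuntimeException {}

    private final Supplier<Instant> refresher; // refreshes and returns the new expiry
    private final Duration threshold;          // e.g. 5 minutes
    private Instant expiresAt = Instant.EPOCH; // forces a refresh on first use

    RefreshingCredentials(Supplier<Instant> refresher, Duration threshold) {
        this.refresher = refresher;
        this.threshold = threshold;
    }

    // Approach 1: refresh proactively when within the threshold of expiry.
    synchronized void refreshIfNeeded(Instant now) {
        if (!now.plus(threshold).isBefore(expiresAt)) {
            expiresAt = refresher.get();
        }
    }

    // Approach 2: try the request; on expiry, refresh and retry exactly once.
    <T> T callWithRetry(Supplier<T> request) {
        try {
            return request.get();
        } catch (ExpiredException e) {
            synchronized (this) {
                expiresAt = refresher.get();
            }
            return request.get(); // a second failure propagates to the caller
        }
    }
}
```

A caller could combine both: invoke `refreshIfNeeded(Instant.now())` before each request and still wrap the request in `callWithRetry`, so that clock skew or an early revocation is handled by the single retry.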

Design suggestion for handling large mailboxes using java mail api (IMAP)

We use the JavaMail API with IMAP and fetch messages from folders containing millions of messages. There are some rules and limitations:
We do not always have open connections to the mail server, and therefore we cannot add listeners.
The messages will be stored in a local database with all properties: subject, body, receive date, from, etc.
We cannot use multiple threads.
To keep the performance at acceptable levels and prevent out of memory crashes, I am planning:
1. During the initial fetch, where all messages have to be fetched, store only the message headers and bypass the body and attachments. Getting the body and attachments of a message will be done when requested by the client. The initialization can take hours; that is not a problem.
2. When fetching all messages at the start, use an appropriate fetch profile to make it faster, but process in blocks, for example:
Message m1[] = f.getMessages(1, 10000);
f.fetch(m1, fp);
//process m1 array
Message m2[] = f.getMessages(10001, 20000);
f.fetch(m2, fp);
//process m2 array
instead of
Message m_all[] = f.getMessages(1, NUMALLMESSAGES);
f.fetch(m_all, fp);
//process m_all array, may throw out of memory errors
3. After we have all the messages, store the UID of the most recent message in the DB and on the next fetch perform:
f.getMessagesByUID(LASTUIDREADFROMDB, UIDMAX)
Do you have additional suggestions, or see any points we have to take care of (memory, performance)?
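The block-wise fetch in step 2 generalizes to a loop over message-number ranges. The following is a minimal sketch of just the range arithmetic in plain Java; `MessageBlocks` and `ranges` are hypothetical names, and the JavaMail calls themselves are left out so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for splitting a large folder into fetch blocks.
final class MessageBlocks {
    // Splits message numbers 1..total into inclusive {start, end} pairs,
    // each covering at most blockSize messages.
    static List<int[]> ranges(int total, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<int[]> out = new ArrayList<>();
        for (int start = 1; start <= total; start += blockSize) {
            int end = Math.min(start + blockSize - 1, total);
            out.add(new int[] {start, end});
        }
        return out;
    }
}
```

Each pair would then be passed to `f.getMessages(start, end)` followed by `f.fetch(...)`, with the resulting array processed and released before the next block is requested, so that only one block of Message objects is referenced at a time.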

couchdb gen_server call timeout during purge

I'm measuring how long a CouchDB purge takes, using a Java program. The CouchDB connections and calls are handled using Ektorp. For a small number of documents, the purge takes place and I receive a success response.
But when I purge ~ 10000 or more, I get the following error:
org.ektorp.DbAccessException: 500:Internal Server Error
URI: /dbname/_purge
Response Body:
{
"error" : "timeout",
"reason" : "{gen_server,call,
....
On checking the DB status using a curl command, I can see that the purge has actually taken place. But this timeout prevents me from measuring the actual duration of the purge in my Java program, since it throws an exception.
After some research, I believe this is due to the default timeout value of an Erlang gen_server process. Is there any way for me to fix this?
I have tried changing the timeout values of the StdHttpClient, to no avail.
HttpClient authenticatedHttpClient = new StdHttpClient.Builder()
    .url(url)
    .username(Conf.COUCH_USERNAME)
    .password(Conf.COUCH_PASSWORD)
    .connectionTimeout(600*1000)
    .socketTimeout(600*1000)
    .build();
CouchDB Dev here. You are not supposed to use purge with large numbers of documents. This is to remove accidentally added data from the DB, like credit card or social security numbers. This isn’t meant for general operations.
Consequently, you can’t raise that gen_server timeout :)
