How to handle rate limit using twitter4j? - java

I frequently get the following exception using twitter4j:
2015-06-02 10:04:30,802 DEBUG [Twitter Stream consumer-1[Establishing connection]] TwitterBot(116): Got an exception 420:Returned by the Search and Trends API when you are being rate limited (https://dev.twitter.com/docs/rate-limiting).
Returned by the Streaming API:
Too many login attempts in a short period of time.
Running too many copies of the same application authenticating with the same account name.
Exceeded connection limit for user
Since I try to avoid being banned from Twitter, I would like to ask whether I am doing something wrong in my code.
I am using a StatusListener on the Streaming API, filtered by my own ID and some string values.
If a status matches the criteria, an answer to this status is sent via Twitter. This does not happen very often, so it should not be the cause of the rate limitation exception.
The whole thing runs in a TomEE EJB environment, in case that matters.
@Startup
@Singleton
public class TwitterBot implements Service {

    private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TwitterBot.class);

    //this is fixed for this twitter bot
    public static final String TWITTER_BOT_NAME = "botname";
    public static final long TWITTER_BOT_USER_ID = 99999L; //the bot's user id

    private final TwitterStream twitterStream;
    private final Twitter twitter;

    public TwitterBot() {
        this.twitterStream = new TwitterStreamFactory().getInstance();
        this.twitter = TwitterFactory.getSingleton();
    }

    @PostConstruct
    public void listen() {
        StatusListener listener = new StatusListener() {
            @Override
            public void onStatus(Status status) {
                //this is to avoid a circle... ignore tweets coming from ourselves.
                if (status.getUser().getScreenName().equalsIgnoreCase(TWITTER_BOT_NAME)) {
                    return;
                }
                try {
                    //do something and update own status
                    StatusUpdate update = new StatusUpdate("Hello World!");
                    update.setInReplyToStatusId(status.getId());
                    twitter.updateStatus(update);
                } catch (TwitterException e) {
                    logger.error("Could not complete twitter update, {}", e.getLocalizedMessage(), e);
                }
            }
            //other StatusListener methods, which are not used (default implementation)
        };

        //filtering for ourselves here
        long[] userFilter = {TWITTER_BOT_USER_ID};
        String[] termFilter = {TWITTER_BOT_NAME};
        FilterQuery filter = new FilterQuery(0, userFilter, termFilter);
        twitterStream.addListener(listener);
        twitterStream.filter(filter);
    }
}
The answer to this How to handle rate limit using twitter4j to avoid being banned tells me that the Streaming API has no rate limitation.
So what is the issue? Is there an explanation in the API documentation?
Thank you in advance!

Edit:
The problem is related to this line:
FilterQuery filter = new FilterQuery(0, userFilter, termFilter);
Using the query like this produces some kind of polling against the Twitter API and therefore exceeds the connection limit.
Instead use:
FilterQuery filter = new FilterQuery(termFilter);
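For completeness, here is a minimal sketch of the corrected tail of listen() with the working query, restating the fix above in context (same twitter4j types as in the code of the question):
        //track the bot name only; no count/follow parameters, so no polling
        String[] termFilter = {TWITTER_BOT_NAME};
        FilterQuery filter = new FilterQuery(termFilter);
        twitterStream.addListener(listener);
        twitterStream.filter(filter);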

Related

The use of threads and the Java Future interface in AWS Lambda

I want to create an AWS Lambda function in Java that writes to a database in Firestore. The short story is that, while the code does what it should when I execute it on my own computer using NetBeans (in truth it works most of the time, but not always, maybe due to problems with my internet connection), nothing at all happens when I deploy it as a Lambda function and invoke it. I suspect that this has less to do with Firestore itself than with how AWS Lambda handles asynchronous operations.
Now to the details!
As a simple example, the method that writes to the Firestore object db reads
public static void writeFirestore(Firestore db){
    try{
        DateTime now = DateTime.now();
        String time = now.toString();
        Map<String, String> data = new HashMap<>();
        data.put("time", time);
        String collTitle = "Notebook";
        String docTitle = "Document: "+time;
        db.collection(collTitle).document(docTitle).set(data);
        System.out.println("wrote to Firestore");
    }
    catch(Exception e){
        System.out.println("Could not write to db: "+e.toString());
    }
}
Now, as it takes some time to connect to Firestore and initialize db, I want to make sure that db is not passed as an argument to writeFirestore() before it has been properly retrieved. So I define a version of db in the form of a Future object, using ExecutorService, and then retrieve the object db with its get() method. For this, I define the class TaskRunner:
public class TaskRunner {

    ExecutorService executor;

    public TaskRunner(){
        executor = Executors.newSingleThreadExecutor();
    }

    public static interface Callback<T>{
        public void onCallback(T result);
    }

    public <T> void executeAsync(Callable<T> callable, Callback<T> callback) throws Exception{
        try{
            Future<T> future = executor.submit(callable);
            T result = future.get();
            if(result != null){
                System.out.println("result is not null; applying callback...");
                callback.onCallback(result);
            }
            else{
                System.out.println("result is null");
            }
        }
        catch(Exception e){
            System.out.println("Problem running executeAsync: "+e.toString());
        }
    }
}
Writing the example document to my fixed database db now goes as follows:
I define the class FirestoreCreator that implements Callable with the purpose of retrieving the Firestore object db:
public static class FirestoreCreator implements Callable<Firestore>{
    @Override
    public Firestore call() throws Exception {
        String projectId = "myProjectId";
        GoogleCredentials credentials =
                GoogleCredentials.fromStream(new FileInputStream("myCredentialsFile.json"));
        FirestoreOptions firestoreOptions = FirestoreOptions.getDefaultInstance()
                .toBuilder()
                .setProjectId(projectId)
                .setCredentials(credentials)
                .build();
        Firestore db = firestoreOptions.getService();
        return db;
    }
}
I implement the TaskRunner.Callback interface using writeFirestore().
I create a TaskRunner object, taskRunner, and call its executeAsync() method with the above two objects as parameters.
These three steps are collected in the final method testUpdateFirestoreInterface(), which does the job:
public static void testUpdateFirestoreInterface(){
    FirestoreCreator fsCreator = new FirestoreCreator();
    TaskRunner.Callback<Firestore> updateCallback = new TaskRunner.Callback<Firestore>() {
        @Override
        public void onCallback(Firestore result) {
            writeFirestore(result);
        }
    };
    TaskRunner taskRunner = new TaskRunner();
    try {
        taskRunner.executeAsync(fsCreator, updateCallback);
    } catch (Exception ex) {
        System.out.println("Failed to run executeAsync");
    }
}
As I already mentioned in the introduction, the code works (most of the time) when I run it on my computer, but not at all in AWS Lambda. No exception is thrown, and yet no document is written to Firestore.
The discussion about threads in AWS Lambda (https://dzone.com/articles/multi-threaded-programming-with-aws-lambda) made me suspect that the reason is that some thread started when ExecutorService is used is not being handled properly.
Does anyone know what goes wrong and what a solution could look like?
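If that suspicion is right, one simple thing to try (only a sketch, assuming the google-cloud-firestore client, whose set() returns an ApiFuture<WriteResult>; writeFirestoreBlocking is just an illustrative name) is to block on the write before the handler returns, so Lambda cannot freeze the background work:
import com.google.api.core.ApiFuture;
import com.google.cloud.firestore.WriteResult;

public static void writeFirestoreBlocking(Firestore db) {
    try {
        String time = DateTime.now().toString();
        Map<String, String> data = new HashMap<>();
        data.put("time", time);
        // set() only schedules the write; get() waits until it is committed,
        // so the handler does not return (and get frozen) before the write finishes.
        ApiFuture<WriteResult> result = db.collection("Notebook").document("Document: " + time).set(data);
        System.out.println("wrote to Firestore at " + result.get().getUpdateTime());
    } catch (Exception e) {
        System.out.println("Could not write to db: " + e);
    }
}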

Twitter4j: OAuth won't work

I'm fighting to get a working twitter4j instance, but having problems with OAuth. I copy-pasted multiple times, created different apps and keys/tokens, and still can't get this working. What could I be doing wrong here?
My main class:
public class Main {
    public static void main(String[] args) {
        // write your code here
        Tweeter tweet = new Tweeter();
        try{
            tweet.searchTweets();
        } catch (TwitterException e) {
            e.printStackTrace();
        }
    }
}
My Twitter class:
public class Tweeter {

    public static Twitter GetTwitterInstance(){
        String consumerKey = "**************";
        String consumerSecret = "*************";
        String accessToken = "***************"; // your token
        String accessTokenSecret = "************"; // your token secret

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey(consumerKey)
                .setOAuthConsumerSecret(consumerSecret)
                .setOAuthAccessToken(accessToken)
                .setOAuthAccessTokenSecret(accessTokenSecret);

        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getSingleton();
        return twitter;
    }

    public static void searchTweets() throws TwitterException{
        Twitter twitter = GetTwitterInstance();
        Query query = new Query("something");
        QueryResult result = twitter.search(query);
        List<Status> statuses = result.getTweets();
        for(Status tweet: statuses){
            System.out.println("User: "+tweet.getUser().getScreenName() + " Tweet: "+tweet.getText());
        }
    }
}
And the error stacktrace:
400:The request was invalid. An accompanying error message will explain why.
This is the status code will be returned during version 1.0 rate
limiting(https://dev.twitter.com/pages/rate-limiting). In API v1.1, a request
without authentication is considered invalid and you will get this response.
message - Bad Authentication data.
code - 215
Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=506c3b98 or
http://www.google.co.jp/search?q=11ed9ae8
TwitterException{exceptionCode=[506c3b98-11ed9ae8], statusCode=400,
message=Bad Authentication data., code=215, retryAfter=-1,
rateLimitStatus=null, version=4.0.6}
at twitter4j.HttpClientImpl.handleRequest(HttpClientImpl.java:164)
at twitter4j.HttpClientBase.request(HttpClientBase.java:57)
at twitter4j.HttpClientBase.get(HttpClientBase.java:75)
at twitter4j.TwitterImpl.get(TwitterImpl.java:1786)
at twitter4j.TwitterImpl.search(TwitterImpl.java:268)
at Tweeter.searchTweets(Tweeter.java:58)
at Main.main(Main.java:14)
Printing the twitter factory's keys/tokens gives the expected values, without weird characters or spaces. Any help will be really appreciated!
The problem was that tf.getSingleton() apparently does not return a Twitter instance built from the ConfigurationBuilder settings.
It should be just Twitter twitter = tf.getInstance(); and then it works.
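In other words, only the last lines of GetTwitterInstance() need to change; a minimal sketch of the corrected tail (as far as I can tell, getSingleton() ignores the configuration passed to the factory and falls back to the default one, while getInstance() uses it):
        TwitterFactory tf = new TwitterFactory(cb.build());
        // getInstance() builds the client from the ConfigurationBuilder settings above
        Twitter twitter = tf.getInstance();
        return twitter;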
Your code appears to be cut and pasted from known working code. Check again that there are no extra characters, such as a space at the end of a string.

Manipulate with cache as with collection in Spring

I have looked at a lot of material on the internet but haven't found a solution for my needs.
Here is some sample code which doesn't work, but which shows my requirements, for better understanding.
@Service
public class FooCachedService {

    @Autowired
    private MyDataRepository dataRepository;

    private static ConcurrentHashMap<Long, Object> cache = new ConcurrentHashMap<>();

    public void save(Data data) {
        Data savedData = dataRepository.save(data);
        if (savedData.getId() != null) {
            cache.put(data.getRecipient(), null);
        }
    }

    public Data load(Long recipient) {
        Data result = null;
        if (!cache.containsKey(recipient)) {
            result = dataRepository.findDataByRecipient(recipient);
            if (result != null) {
                cache.remove(recipient);
                return result;
            }
        }
        while (true) {
            try {
                if (cache.containsKey(recipient)) {
                    result = dataRepository.findDataByRecipient(recipient);
                    break;
                }
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        return result;
    }
}
and data object:
public class Data {
    private Long id;
    private Long recipient;
    private String payload;
    // getters and setters
}
As you can see in the code above, I need to implement a service which stores new data both in the database and in the cache.
The whole algorithm should look something like this:
Some userA creates a POST request to my controller to store data, and it fires the save method of my service.
Another userB logged into the system sends a GET request to my controller, which fires the load method of my service. In this method, the logged-in user's id sent with the request is compared with the recipients' ids in the map. If the map contains data for this user, they are fetched with the repository; otherwise the algorithm checks every second whether there is new data for that user (this checking has some timeout, for example 30s; after 30s the request returns empty data, the user creates a new GET request, and so on), as sketched below.
Can you tell me whether it is possible to do this in some elegant way, and how? How should I use a cache for that, or what is the best practice? I am new to this area, so I will be grateful for any advice.
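For illustration only, here is a minimal sketch of the polling-with-timeout behaviour described above, written as it could sit inside FooCachedService (the 30-second limit and one-second interval are the values mentioned; loadWithTimeout is just an illustrative name, not a complete Spring solution):
public Data loadWithTimeout(Long recipient) {
    long deadline = System.currentTimeMillis() + 30_000; // give up after ~30s
    while (System.currentTimeMillis() < deadline) {
        if (cache.containsKey(recipient)) {
            cache.remove(recipient);                      // consume the notification
            return dataRepository.findDataByRecipient(recipient);
        }
        try {
            Thread.sleep(1000);                           // check once per second
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
    return null; // no new data within the timeout; the client issues a new GET
}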

Vertx http server Thread has been blocked for xxxx ms, time limit is 2000

I have written a large-scale HTTP server using Vert.x, but I'm getting this error when the number of concurrent requests increases:
WARNING: Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 8458 ms, time limit is 1000
io.vertx.core.VertxException: Thread blocked
Here is my full code:
public class MyVertxServer {

    public Vertx vertx = Vertx.vertx(new VertxOptions().setWorkerPoolSize(100));
    private HttpServer server = vertx.createHttpServer();
    private Router router = Router.router(vertx);

    public void bind(int port){
        server.requestHandler(router::accept).listen(port);
    }

    public void createContext(String path, MyHttpHandler handler){
        if(!path.endsWith("/")){
            path += "/";
        }
        path += "*";
        router.route(path).handler(new Handler<RoutingContext>() {
            @Override
            public void handle(RoutingContext ctx) {
                String[] handlerID = ctx.request().uri().split(ctx.currentRoute().getPath());
                String suffix = handlerID.length > 1 ? handlerID[1] : null;
                handler.Handle(ctx, new VertxUtils(), suffix);
            }
        });
    }
}
And how I call it:
ver.createContext("/getRegisterManager",new ProfilesManager.RegisterHandler());
ver.createContext("/getLoginManager", new ProfilesManager.LoginHandler());
ver.createContext("/getMapcomCreator",new ItemsManager.MapcomCreator());
ver.createContext("/getImagesManager", new ItemsManager.ImagesHandler());
ver.bind(PORT);
However, I don't find eventBus() useful for HTTP servers that send/receive files, because you would need to send the RoutingContext in the message, which is not possible.
Could you please point me in the right direction? Thanks.
I added a little bit of the handler's code:
class ProfileGetter implements MyHttpHandler{

    @Override
    public void Handle(RoutingContext ctx, VertxUtils utils, String suffix) {
        String username = utils.Decode(ctx.request().headers().get("username"));
        String lang = utils.Decode(ctx.request().headers().get("lang"));
        display("profile requested : " + username);
        Profile profile = ProfileManager.FindProfile(username, lang);
        if(profile == null){
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        }else{
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
}
Here ProfileManager.FindProfile(username,lang) does a long-running database job on the same thread.
...
Basically all of my processing happens on the main thread, because if I use an executor I get strange exceptions and NullPointerExceptions in Vert.x, which makes me feel like the request processors in Vert.x are already parallel.
Given the small amount of code in the question, let's agree that the problem is on the line:
Profile profile = ProfileManager.FindProfile(username,lang);
Assuming that this is internally doing some blocking JDBC call, which is an anti-pattern in Vert.x, you can solve this in several ways.
Say that you can totally refactor the ProfileManager class (which IMO is the best option); then you can update it to be reactive, so your code would be like:
ProfileManager.FindProfile(username, lang, res -> {
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if (profile == null) {
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        } else {
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});
Now what would be happening behind the scenes is that your JDBC call would not block (which is tricky, because JDBC is blocking by nature). So to fix this: if you're lucky enough to use MySQL or Postgres, you could code your JDBC against the async-client; if you're stuck with other RDBMS servers, then you need to use the jdbc-client, which in turn will use a thread pool to offload the work from the event loop thread, as in the sketch below.
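For example, with the Vert.x jdbc-client the lookup could look roughly like this (only a sketch; the connection settings, the profiles table and the SQL are made up for illustration, and the snippet is meant to sit inside a handler like the ones above):
JDBCClient client = JDBCClient.createShared(vertx, new JsonObject()
        .put("url", "jdbc:hsqldb:mem:test")              // illustrative settings
        .put("driver_class", "org.hsqldb.jdbcDriver"));

client.getConnection(connRes -> {
    if (connRes.failed()) {
        utils.sendResponseAndEnd(ctx.response(), 500);
        return;
    }
    SQLConnection conn = connRes.result();
    // the query runs on a worker thread, so the event loop is never blocked
    conn.queryWithParams("SELECT * FROM profiles WHERE username = ?",
            new JsonArray().add(username), queryRes -> {
        conn.close();
        if (queryRes.succeeded()) {
            List<JsonObject> rows = queryRes.result().getRows();
            // ... map the rows to a Profile and write the response as above
        } else {
            utils.sendResponseAndEnd(ctx.response(), 500);
        }
    });
});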
Now say that you cannot change the ProfileManager code; then you can still offload it to the thread pool by wrapping the code in an executeBlocking block:
vertx.executeBlocking(future -> {
    Profile profile = ProfileManager.FindProfile(username, lang);
    future.complete(profile);
}, false, res -> {
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if (profile == null) {
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        } else {
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});

Using multiple threads to get data from twitter using twitter4j

I have a set of keywords (over 600) and I want to use the Streaming API to track tweets with them. The Twitter API limits the number of keywords you are allowed to track to 200, so I decided to have several threads doing it, using several OAuth tokens. This is how I do it:
String[] dbKeywords = KeywordImpl.listKeywords();
List<String[]> keywords = ditributeKeywords(dbKeywords);
for (String[] subList : keywords) {
    StreamCrawler streamCrawler = new StreamCrawler();
    streamCrawler.setKeywords(subList);
    Thread crawlerThread = new Thread(streamCrawler);
    crawlerThread.start();
}
This is how words are distributed among threads. Each thread receives no more than 200 words.
This is the implementation of the StreamCrawler:
public class StreamCrawler extends Crawler implements Runnable {

    ...

    private String[] keywords;

    public void setKeywords(String[] keywords) {
        this.keywords = keywords;
    }

    @Override
    public void run() {
        TwitterStream twitterStream = getTwitterInstance();
        StatusListener listener = new StatusListener() {

            ArrayDeque<Tweet> tweetbuffer = new ArrayDeque<Tweet>();
            ArrayDeque<TwitterUser> userbuffer = new ArrayDeque<TwitterUser>();

            @Override
            public void onException(Exception arg0) {
                System.out.println(arg0);
            }

            @Override
            public void onDeletionNotice(StatusDeletionNotice arg0) {
                System.out.println(arg0);
            }

            @Override
            public void onScrubGeo(long arg0, long arg1) {
                System.out.println(arg1);
            }

            @Override
            public void onStatus(Status status) {
                ...Doing something with message
            }

            @Override
            public void onTrackLimitationNotice(int arg0) {
                System.out.println(arg0);
                try {
                    Thread.sleep(5 * 60 * 1000);
                    System.out.println("Will sleep for 5 minutes!");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }

            @Override
            public void onStallWarning(StallWarning arg0) {
                System.out.println(arg0);
            }
        };

        FilterQuery fq = new FilterQuery();
        String keywords[] = getKeywords();

        System.out.println(keywords.length);
        System.out.println("Listening for " + Arrays.toString(keywords));

        fq.track(keywords);
        twitterStream.addListener(listener);
        twitterStream.filter(fq);
    }

    private long getCurrentThreadId() {
        return Thread.currentThread().getId();
    }

    private TwitterStream getTwitterInstance() {
        TwitterConfiguration configuration = null;
        TwitterStream twitterStream = null;
        while (configuration == null) {
            configuration = TokenFactory.getAvailableToken();
            if (configuration != null) {
                System.out.println("Token was obtained " + getCurrentThreadId());
                System.out.println(configuration.getTwitterAccount());
                setToken(configuration);
                ConfigurationBuilder cb = new ConfigurationBuilder();
                cb.setDebugEnabled(true);
                cb.setOAuthConsumerKey(configuration.getConsumerKey());
                cb.setOAuthConsumerSecret(configuration.getConsumerSecret());
                cb.setOAuthAccessToken(configuration.getAccessToken());
                cb.setOAuthAccessTokenSecret(configuration.getAccessSecret());
                twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
            } else {
                // If there is no available configuration, wait for 2 minutes
                // and try again
                try {
                    System.out.println("There were no available tokens, sleeping for 2 minutes.");
                    Thread.sleep(2 * 60 * 1000);
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        }
        return twitterStream;
    }
}
So my problem is that when I start, for example, 2 threads, I get a notification that both of them are opening the stream and getting it. But actually only the first one is really getting the stream and, accordingly, calling the onStatus method. The array used in the second thread is not empty; the TwitterConfiguration is also valid and unique. So I don't understand what might be the reason for such behavior. Why does only the first thread return tweets?
As far as I can see, you're trying to make two simultaneous connections to the public streaming endpoints (a.k.a. general streams or stream.twitter.com) from the same IP.
More specifically, I think you want two active connections to stream.twitter.com/1.1/statuses/filter.json from the same IP.
Although the Twitter streaming-apis documentation doesn't clearly state that only one standing connection to the public endpoints is allowed, Twitter employees clarify this on the dev site https://dev.twitter.com/discussions/7542
For general streams, you should only make one connection from the same IP.
This means that it doesn't matter that you use two different Twitter applications/accounts to connect to the public streams; as long as you're connecting from the same IP address you can have only one standing connection to the public streams. You said that you got both streams connected, and the answer to this behaviour is given by a Twitter employee: https://dev.twitter.com/discussions/14935
You may find that at times stream.twitter.com lets you get away with more open connections here or there, but that behavior shouldn't be counted on.
If you try, for instance, in the 2nd thread to connect to the user stream instead (the twitter4j TwitterStream user() method), then you'll really start getting both the filter and the user streams.
Regarding the 200 track keywords limit, the twitter4j.org javadoc is probably a little bit outdated. Here is what the Twitter API docs say:
The default access level allows up to 400 track keywords, 5,000 follow userids and 25 0.1-360 degree location boxes. If you need elevated access to the Streaming API, you should explore our partner providers of Twitter data ...
So, if you need to go beyond the 400, you'll probably want to ask Twitter for an increased track access level for your application, or work with certified partner providers of Twitter data.
Another thing you don't necessarily need is starting new threads for getting the streams, since the twitter4j filter (or user) "method internally creates a thread which manipulates TwitterStream and calls adequate listener methods continuously" (quoted from example code by Yusuke Yamamoto); see the sketch below.
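To illustrate, a minimal sketch of a single-connection setup with no explicit thread management (cb is a ConfigurationBuilder set up as in the question, listener is a StatusListener like the one above, and keywords holds up to 400 track terms):
// one token, one connection, up to 400 tracked keywords
TwitterStream twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
twitterStream.addListener(listener);
// filter() starts its own internal thread, so there is no need to wrap this in a Runnable
twitterStream.filter(new FilterQuery().track(keywords));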
I hope this helps. (I couldn't post more links because I'm getting "You need at least 10 reputation to post more than 2 links".)
