How to get fixed number of tweets using twitter4j - java

I am using Twitter4j to build a client that fetches tweets for an input search term. I also want to let the user enter the number of tweets they want in the result.
I know that we can set the number of tweets to be returned per page with the Query's setCount() method:
Query q = new Query(searchTerm);
q.setCount(maxTweets);
But if I pass 1 as maxTweets, it returns 2 tweets.
Update: After further research I noticed that it returns 1 extra tweet per search: if I pass 1 as maxTweets it returns 2 tweets, if I pass 2 it returns 3 tweets, and so on.
I am not sure what I am doing wrong; please let me know if there is a way to get a fixed number of tweets using twitter4j.
Any guidance will be helpful.

When you write
Query q = new Query(searchTerm);
think of it as one page of results matching your query. But there may be multiple pages.
When you set
q.setCount(maxTweets);
you get up to maxTweets tweets per page. In your case you got 2 because two pages matched your query and you asked for one tweet per page.
What you can do is handle it with a do-while loop that stops once you have enough tweets:
import java.util.ArrayList;
import java.util.List;
import twitter4j.*;

Query query = new Query(searchTerm);
query.setCount(maxTweets);
QueryResult result;
List<Status> tweets = new ArrayList<>(); // tweets collected across all pages
do {
    result = twitter.search(query); // may throw TwitterException
    tweets.addAll(result.getTweets());
    if (tweets.size() >= maxTweets) {
        break; // you already have the number the user asked for
    }
} while ((query = result.nextQuery()) != null);
// You might have fetched more than required, so keep only the first maxTweets
List<Status> finalTweets = new ArrayList<>(tweets.subList(0, Math.min(maxTweets, tweets.size())));
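The collect-then-trim idea does not depend on twitter4j, so it can be sketched and tested on plain lists. This is a minimal sketch, assuming each page arrives as a List<T> from an iterator; PagedCollector and collectUpTo are hypothetical names, not twitter4j API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class PagedCollector {
    /**
     * Hypothetical helper: pulls pages until at least `limit` items have been seen
     * or the pages run out, then returns at most `limit` items in their original order.
     */
    static <T> List<T> collectUpTo(Iterator<List<T>> pages, int limit) {
        List<T> collected = new ArrayList<>();
        while (collected.size() < limit && pages.hasNext()) {
            collected.addAll(pages.next()); // the last page may overshoot the limit
        }
        // Trim the overshoot; copy so the result does not alias the working list.
        int keep = Math.max(0, Math.min(limit, collected.size()));
        return new ArrayList<>(collected.subList(0, keep));
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(List.of("t1", "t2"), List.of("t3", "t4"), List.of("t5"));
        System.out.println(collectUpTo(pages.iterator(), 3)); // prints [t1, t2, t3]
    }
}
```

Stopping as soon as the limit is reached avoids requesting pages you will throw away, which also saves rate-limited search calls.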

Related

Twitter rate limit null pointer exception

To be clear, for anyone inclined to mark this as a duplicate without reading it: I am not asking what null is or how to handle NullPointerExceptions. I am asking why Twitter's API returns a null object to my method, seemingly at random.
I am creating a Java application that interacts with the Twitter API using the Twitter4J library. I want to download a large number of tweets and then compute statistics on the data offline. The tweets are saved in a NoSQL database (Elasticsearch).
My code worked fine while it only printed the tweets to the console for testing. When the program hit the rate limit, it slept until Twitter's limit reset (more than 1,000,000 tweets were printed with zero errors). The problem appeared after I started saving the tweets to my database: after some iterations I get a java.lang.NullPointerException at this exact statement: if (searchTweetsRateLimit.getRemaining() == 0). Any suggestions?
public static void main(String[] args) throws TwitterException, InterruptedException {
    int totalTweets = 0;
    long maxID = -1;
    twitter4j.Twitter twitter = getTwitter();
    RestClient restclient = RestClient.builder(
            new HttpHost("localhost", 9200, "http"),
            new HttpHost("localhost", 9201, "http")).build();
    Map<String, RateLimitStatus> rateLimitStatus = twitter.getRateLimitStatus("search");
    // This finds the rate limit specifically for doing the search API call we use in this program
    RateLimitStatus searchTweetsRateLimit = rateLimitStatus.get("/search/tweets");
    System.out.printf("You have %d calls remaining out of %d, Limit resets in %d seconds\n",
            searchTweetsRateLimit.getRemaining(),
            searchTweetsRateLimit.getLimit(),
            searchTweetsRateLimit.getSecondsUntilReset());
    int i = 10;
    // This is the loop that retrieves multiple blocks of tweets from Twitter
    for (int queryNumber = 0; queryNumber < MAX_QUERIES; queryNumber++)
    {
        System.out.printf("\n\n!!! Starting loop %d\n\n", queryNumber);
        // Do we need to delay because we've already hit our rate limits?
        if (searchTweetsRateLimit.getRemaining() == 0)
        {
            // Yes we do, unfortunately ...
            System.out.printf("!!! Sleeping for %d seconds due to rate limits\n", searchTweetsRateLimit.getSecondsUntilReset());
            // If you sleep exactly the number of seconds, you can make your query a bit too early
            // and still get an error for exceeding rate limitations
            Thread.sleep((searchTweetsRateLimit.getSecondsUntilReset() + 2) * 1000L);
        }
        Query q = new Query(SEARCH_TERM); // Search for tweets that contain this term
        q.setCount(TWEETS_PER_QUERY); // How many tweets, max, to retrieve
        q.resultType(null); // Get all tweets
        q.setLang("en"); // English language tweets, please
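This is a defensive sketch, not a diagnosis: if the NullPointerException comes from the rate-limit status object itself being null, checking for null before dereferencing it avoids the crash, although it does not explain why the status is missing. RateLimitGuard, millisToSleep and the 15-minute fallback pause are hypothetical choices, not Twitter4J API; the extra 2 seconds mirrors the safety margin already used in the question's loop.

```java
final class RateLimitGuard {
    // Extra seconds on top of the reported reset time; the question notes that sleeping
    // exactly until the reset can still trigger a rate-limit error.
    static final int SAFETY_SECONDS = 2;
    // Hypothetical fallback: pause for one 15-minute window when no status was returned.
    static final long FALLBACK_MILLIS = 15L * 60L * 1000L;

    /**
     * Returns how many milliseconds to sleep before the next search call.
     * hasStatus is false when the rate-limit status object came back null.
     */
    static long millisToSleep(boolean hasStatus, int remaining, int secondsUntilReset) {
        if (!hasStatus) {
            return FALLBACK_MILLIS; // unknown state: back off instead of dereferencing null
        }
        if (remaining > 0) {
            return 0L; // calls are still left in the current window
        }
        return (Math.max(0, secondsUntilReset) + SAFETY_SECONDS) * 1000L;
    }
}
```

In the loop it could be called as millisToSleep(status != null, status != null ? status.getRemaining() : 0, status != null ? status.getSecondsUntilReset() : 0), sleeping only when the result is positive.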

Count number of strings returned from Azure query

I am trying to count the number of rows in my Azure database table from my Android Java code. Unfortunately, there is no count() method built into the Azure library. The closest thing to it is the includeInlineCount() method. I used the following line of code:
final MobileServiceList<Crime> result = mToDoTable.includeInlineCount().execute().get();
This returns the value of the first column for each row. The value of result looks something like this:
[column1_row1_String, column1_row2_String, column1_row3_String]
How can I get the number of strings in result?
According to the source code of class MobileServiceList, you can try the code below, which uses the getTotalCount() method.
final MobileServiceList<Crime> result = mToDoTable.includeInlineCount().execute().get();
int count = result.getTotalCount();
Try this: it selects all the elements of your table and then returns the count as an integer.
int count = mToDoTable.execute().get().getTotalCount();
This should give you the count you need.
There is no built-in count() method that returns the number directly. You can try includeTotalCount() instead: run the query and read the total count from the result. Below is an example in C#; hopefully it helps you do the same in Java:
var table = MobileService.GetTable<T> ();
var query = table.Take(0).IncludeTotalCount();
IList<T> results = await query.ToListAsync ();
long count = ((ITotalCountProvider)results).TotalCount;
Check this thread for details: How to get the row count from an azure database?

Java Mysql big data out of heap space

I have an application that reads about 2 million tweets from a MySQL database. One of the fields holds the tweet text (at most 140 characters). I split every tweet into word n-grams with 1 <= n <= 3. For example, consider the sentence:
I am a boring sentence.
The corresponding nGrams are:
I
I am
I am a
am
am a
am a boring
a
a boring
a boring sentence
boring
boring sentence
sentence
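To make the expansion concrete, here is a minimal plain-Java sketch that produces exactly the list above. It is not the Lucene ShingleFilter pipeline used later in the question; WordNGrams is a hypothetical name, and stripping punctuation other than apostrophes is an assumption about how tokens should be cleaned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordNGrams {
    /** Returns every run of 1..maxN consecutive words, grouped by starting word. */
    static List<String> ngrams(String sentence, int maxN) {
        // Keep letters, digits, apostrophes and whitespace; drop other punctuation such as the final period.
        String cleaned = sentence.replaceAll("[^\\p{L}\\p{N}'\\s]", "").trim();
        List<String> result = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return result;
        }
        String[] words = cleaned.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            // Grow the n-gram from this start word until maxN words or the end of the sentence.
            for (int n = 1; n <= maxN && start + n <= words.length; n++) {
                result.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("I am a boring sentence.", 3));
    }
}
```

A sentence of w words yields at most 3w n-grams this way, so the output grows linearly with the input; the memory problem below comes from how the rows are fetched, not from the n-gram count alone.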
With about 2 million tweets I am generating a lot of data. Even so, I am surprised to get a heap error from Java:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2145)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1922)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3118)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2709)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at twittertest.NGramFrequencyCounter.FreqCount(NGramFrequencyCounter.java:49)
at twittertest.Global.main(Global.java:40)
Here is the offending statement (line 49 in the stack trace above, from NetBeans):
results = stmt.executeQuery("select * from tweets");
So, if I am running out of memory it must be that it is trying to return all the results at once and then storing them in memory. What is the best way to solve this problem? Specifically I have the following questions:
How can I process pieces of results rather than the whole set?
How would I increase the heap size? (If this is possible)
Feel free to include any suggestions, and let me know if you need more information.
EDIT
Instead of select * from tweets, I partitioned the table into equally sized subsets of about 10% of the total size each and ran the program again. It seemed to work at first, but it eventually gave me the same heap error. This is strange to me because I have run the same program successfully in the past with 610,000 tweets. Now I have about 2,000,000 tweets, roughly three times as much data. So splitting the data into thirds should have been enough, and I went further and split it into 10% subsets.
Is some memory not being freed? Here is the rest of the code:
results = stmt.executeQuery("select COUNT(*) from tweets");
int num_tweets = 0;
if(results.next())
{
    num_tweets = results.getInt(1);
}
int num_intervals = 10; //split into equally sized subsets
int interval_size = num_tweets/num_intervals;
for(int i = 0; i < num_intervals-1; i++) //process 10% of the data at a time
{
    results = stmt.executeQuery( String.format("select * from tweets limit %s, %s", i*interval_size, (i+1)*interval_size));
    while(results.next()) //for each row in the tweets database
    {
        tweetID = results.getLong("tweet_id");
        curTweet = results.getString("tweet");
        int colPos = curTweet.indexOf(":");
        curTweet = curTweet.substring(colPos + 1); //trim off the RT and retweeted
        if(curTweet != null)
        {
            curTweet = removeStopWords(curTweet);
        }
        if(curTweet == null)
        {
            continue;
        }
        reader = new StringReader(curTweet);
        tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
        //tokenizer = new StandardFilter(Version.LUCENE_36, tokenizer);
        //Set stopSet = StopFilter.makeStopSet(Version.LUCENE_36, stopWords, true);
        //tokenizer = new StopFilter(Version.LUCENE_36, tokenizer, stopSet);
        tokenizer = new ShingleFilter(tokenizer, 2, 3);
        charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
        while(tokenizer.incrementToken()) //insert each nGram from each tweet into the DB
        {
            insertNGram.setInt(1, nGramID++);
            insertNGram.setString(2, charTermAttribute.toString());
            insertNGram.setLong(3, tweetID);
            insertNGram.executeUpdate();
        }
    }
}
Don't fetch all rows from the table. Select only part of the data at a time by adding a limit to the query. Since you are using MySQL, your query would be select * from tweets limit 0,10. Here 0 is the offset of the first row to return (counting from 0) and 10 is the maximum number of rows to return.
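The same offset/count semantics matter for the chunked loop in the question's EDIT: the second LIMIT argument is a row count, not an end position, so passing (i+1)*interval_size makes every chunk larger than the previous one, and the loop condition i < num_intervals-1 also skips the last chunk and the remainder rows. A minimal sketch of computing the (offset, count) pairs; ChunkPlan and limits are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkPlan {
    /**
     * Splits totalRows into consecutive (offset, count) pairs of at most chunkSize rows,
     * suitable for "SELECT ... LIMIT offset, count". The last pair holds any remainder.
     */
    static List<long[]> limits(long totalRows, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> pairs = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += chunkSize) {
            // The count stays fixed at chunkSize, except for a shorter final chunk.
            pairs.add(new long[] {offset, Math.min(chunkSize, totalRows - offset)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (long[] p : limits(25, 10)) {
            System.out.printf("select * from tweets limit %d, %d%n", p[0], p[1]);
        }
    }
}
```

Without an ORDER BY, MySQL does not guarantee a stable row order between queries, so adding something like order by tweet_id keeps the chunks from overlapping or skipping rows.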
You can always increase the heap size available to your JVM using the -Xmx argument (for example, -Xmx4g for a 4 GB maximum heap). You should read up on all the knobs available to you (e.g. perm gen size). Google for other options or read this SO answer.
You probably can't do this kind of problem with a 32-bit machine. You'll want 64 bits and lots of RAM.
Another option would be to treat it as a map-reduce problem. Solve it on a cluster using Hadoop and Mahout.
Have you considered streaming the result set? Halfway down the page linked below is a section on ResultSet, and I think it addresses your problem. You could write the n-grams for one row to a file, then process the next row. Or am I misunderstanding your problem?
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
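The Connector/J notes linked above describe streaming: a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE makes the driver return rows one at a time instead of buffering the whole result set in memory. The same notes warn that you must read all rows (or close the result set) before issuing any other query on that connection, so inserting n-grams through the same connection while iterating would fail; writing to a file, as suggested above, or using a second connection avoids that. A minimal sketch; StreamingQuery and processTweets are hypothetical names.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class StreamingQuery {
    /** Creates a statement that MySQL Connector/J streams row by row. */
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        // Connector/J only streams forward-only, read-only result sets...
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // ...and only when the fetch size is Integer.MIN_VALUE.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    /** Example loop: handles one row at a time, so memory use does not grow with the table size. */
    static long processTweets(Connection conn) throws SQLException {
        long rows = 0;
        try (Statement stmt = createStreamingStatement(conn);
             ResultSet rs = stmt.executeQuery("select tweet_id, tweet from tweets")) {
            while (rs.next()) {
                long tweetId = rs.getLong("tweet_id");
                String tweet = rs.getString("tweet");
                // Hypothetical: generate this row's n-grams and write them to a file
                // (or insert them through a second Connection) before reading the next row.
                rows++;
            }
        }
        return rows;
    }
}
```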

How many items can be sent in one Amazon request?

I am trying to get the ASINs of the movies in my DB by matching them via EAN. I created a simple Java class to connect to Amazon:
AWSECommerceService service = new AWSECommerceService();
service.setHandlerResolver(new AwsHandlerResolver(AMAZON_SECRET_KEY));
AWSECommerceServicePortType port = service.getAWSECommerceServicePortDE();
Holder<OperationRequest> operationrequest = new Holder<OperationRequest>();
failedItem = 0;
doneItem = 0;
ItemLookup lookup = new ItemLookup();
lookup.setAWSAccessKeyId(AMAZON_ACCESS_ID);
lookup.setAssociateTag(ASSOCIATE_TAG);
ItemLookupRequest request = new ItemLookupRequest();
request.getResponseGroup().add("ItemAttributes");
request.setSearchIndex("Video");
request.setIdType("EAN");
// Iterate over my Amazon movie objects
for (AmazonSIdN i : amazonItems) {
    request.getItemId().add(i.getEan());
    if (request.getItemId().size() % 10 == 0) { //numbers of items in request
        LOG.info("Request size = " + request.getItemId().size());
        break;
    }
}
Holder<List<Items>> items = new Holder<List<Items>>();
lookup.getRequest().add(request);
port.itemLookup(lookup.getMarketplaceDomain(), lookup.getAWSAccessKeyId(),
        lookup.getAssociateTag(), lookup.getXMLEscaping(), lookup.getValidate(),
        lookup.getShared(), lookup.getRequest(), operationrequest, items);
List<Items> result = items.value;
for (int k = 0; k < result.get(0).getItem().size(); ++k) {
    Item i = result.get(0).getItem().get(k);
    //do something with item
}
It works, but one thing seems really strange to me. When, in this line:
if (request.getItemId().size() % 10 == 0) { //numbers of items in request
I use 40,000 instead of 10, the web service returns:
The server sent HTTP status code 413: Request Entity Too Large
That is understandable. But when I use any number greater than 10, the web service doesn't return any errors, yet the result has no items.
Can anybody tell me what I am doing wrong or what I forgot to set? Any advice?
I found the answer, so I am posting it here in case it saves someone time in the future.
The Amazon ItemLookup documentation says:
ItemId - One or more (up to ten) positive integers that uniquely identify an item. The meaning of the number is specified by IdType. That is, if IdType is ASIN, the ItemId value is an ASIN. If ItemId is an ASIN, a search index cannot be specified in the request.
Type: String
Default: None
Constraints: Must be a valid item ID. For more than one ID, use a comma-separated list of up to ten IDs.
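Given that limit, the lookup has to be repeated in batches of at most ten IDs. A minimal sketch of the batching step; ItemIdBatches and split are hypothetical names, and building one ItemLookupRequest per batch (for example with request.getItemId().addAll(batch)) is an assumption about how you structure the calls.

```java
import java.util.ArrayList;
import java.util.List;

final class ItemIdBatches {
    // ItemLookup accepts up to ten ItemId values per request, per the documentation quoted above.
    static final int MAX_IDS_PER_REQUEST = 10;

    /** Splits the full list of EANs into consecutive batches of at most MAX_IDS_PER_REQUEST. */
    static List<List<String>> split(List<String> eans) {
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < eans.size(); from += MAX_IDS_PER_REQUEST) {
            int to = Math.min(from + MAX_IDS_PER_REQUEST, eans.size());
            batches.add(new ArrayList<>(eans.subList(from, to))); // copy, so batches are independent
        }
        return batches;
    }
}
```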

How to perform limit query in HQL when table size is not known? [duplicate]

Possible Duplicate:
How do you do a limit query in HQL
I have a table A of large yet unknown size. I need to retrieve results in batches, i.e. 100 at a time, from the table and perform operations on them. For this, I have a variable m so that I can execute the HQL equivalent of the query "SELECT * from A LIMIT m,100" within a loop, which iterates m as 1, 101, 201, ... and so on. What should the terminating condition of the loop be? What is the result when the above query is executed with m > number of rows in the table?
Use a select count(*) query first; this gives you the total number of records. Then write Java code that sets the limit accordingly:
with m as the batch size, x = count / m (integer division) is the number of full batches, and count - x*m is the number of rows in the last batch.
The issue you have is better known as pagination.
Your problem is how many pages you can retrieve from a table.
You must know the page count to use pagination.
To get the page count, you need to execute a pre-query with the same FROM clause but with only SELECT count(*) FROM .... This always gives you the correct answer.
If you cannot or do not want to use an extra statement, then you should keep asking the database for pages until a page comes back smaller than the page size. So you execute the statement for page 1, page 2, ... page n until the size of the result is less than the page size.
int pageSize = 100;
int page = 1;
List<?> currentPage;
do {
    currentPage = loadPage(page, pageSize); // loadPage is your own paged query method
    page++;
} while (currentPage.size() == pageSize);
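The loop above can be exercised without a database by simulating loadPage over an in-memory list. In this sketch the hypothetical loadPage mimics offset/limit semantics, so a page that starts past the last row comes back empty, which is also what a SQL LIMIT with an offset beyond the table returns. PageLoop and readAll are hypothetical names.

```java
import java.util.List;

final class PageLoop {
    /** Hypothetical stand-in for loadPage(page, pageSize); pages are 1-based, as in the loop above. */
    static <T> List<T> loadPage(List<T> table, int page, int pageSize) {
        int from = Math.min((page - 1) * pageSize, table.size()); // past the end yields an empty page
        int to = Math.min(from + pageSize, table.size());
        return table.subList(from, to);
    }

    /** Reads pages until one comes back shorter than pageSize; returns how many pages were requested. */
    static <T> int readAll(List<T> table, int pageSize, List<T> sink) {
        int page = 1;
        List<T> currentPage;
        do {
            currentPage = loadPage(table, page, pageSize);
            sink.addAll(currentPage); // process the page here
            page++;
        } while (currentPage.size() == pageSize);
        return page - 1;
    }
}
```

When the row count is an exact multiple of the page size, the loop issues one extra query that returns an empty page; the count(*) pre-query avoids that at the cost of one additional statement.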
I suggest using Hibernate's setFirstResult() and setMaxResults() methods to achieve the desired result, e.g.:
Query query = session.createQuery("from A"); // HQL has no "SELECT *"; "from A" selects whole entities
int resultSize = 100;
List<?> pagedResults;
// Use any appropriate number of iterations, or run a count query first to get the expected result size
for (int m = 0; ; m++) {
    // setMaxResults() takes the page size, not an end position
    pagedResults = query.setFirstResult(m * resultSize).setMaxResults(resultSize).list();
    // do your processing
    if (pagedResults.isEmpty() || pagedResults.size() < resultSize) {
        break; // no further iterations required
    }
}
setMaxResults() only caps the result size: if fewer records are available, you receive just those. I added the if condition to avoid unnecessary iterations.
Hope this helps!!
