Web Search API for Java

I need a Java library for doing web searches (any search engine is fine).
I should be able to write code such as:
WebSearch search = new WebSearch("Apple");
System.out.println("Results = " + search.getResults().size());
Does such a library/API exist?
Regards,
Sebastian

There's an API called JFreeWebSearch:
http://www.ke.tu-darmstadt.de/resources/jfreewebsearch

I think this is your best option.
What matters here is Google's search API endpoint:
http://ajax.googleapis.com/ajax/services/search/web?start=0&rsz=large&v=1.0&q=Apple
Note the "Apple" in the query string.
With this in mind, use any Java HTTP client (the URLConnection class, Apache's HttpClient, you name it) to send the query and read the results back.
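For instance, a minimal sketch with plain java.net.URLConnection (note that this AJAX Search endpoint has since been retired by Google, so treat it as an illustration of the pattern rather than a working service):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class GoogleSearchExample {
    public static void main(String[] args) throws Exception {
        String query = URLEncoder.encode("Apple", "UTF-8");
        URL url = new URL("http://ajax.googleapis.com/ajax/services/search/web"
                + "?start=0&rsz=large&v=1.0&q=" + query);
        URLConnection connection = url.openConnection();

        // Read the raw JSON response; parse it with whatever JSON library you prefer.
        StringBuilder json = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                json.append(line);
            }
        }
        System.out.println(json);
    }
}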

Lucene is a Java-based search engine library.

Related

Is it possible to get Direct Messages from Twitter by a specific user using the Twitter4j library?

I'm using the Twitter4j library to develop a project that works with Twitter. One of the things I need is to get the direct messages, and I'm using the following code:
try {
    List<DirectMessage> loStatusList = loTwitter.getDirectMessages();
    for (DirectMessage loStatus : loStatusList) {
        System.out.println(loStatus.getId() + ",#" + loStatus.getSenderScreenName()
                + "," + loStatus.getText() + "|");
    }
} catch (Exception e) {
    e.printStackTrace();
}
It works fine, but the code returns a list of the most recent messages in general. What I want is to get those direct messages using some kind of filter that finds them by a user that I indicate.
For example, I need to see the DMs only from user @TwitterUser.
Is this possible with this library?
All suggestions are welcome; even if I should use another library, I would be grateful if you let me know.
It looks like the Twitter API doesn't support filtering that call by username. (See the Twitter API doc: GET direct_messages.)
Which means you'd have to make multiple calls to the API with pagination enabled, and cache the responses into a list.
Here is an example of pagination with Twitter4J getDirectMessages().
In that example, use the existing:
List<DirectMessage> messages;
But inside the loop, do:
messages.addAll(twitter.getDirectMessages(paging));
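Put together, a rough sketch of that loop could look like the following (the page size, variable names, and the assumption of an already-authenticated Twitter instance are illustrative, not taken from the linked example):

// Assumes twitter4j.properties (or equivalent) already carries your OAuth credentials.
Twitter twitter = TwitterFactory.getSingleton();
List<DirectMessage> messages = new ArrayList<DirectMessage>();

int page = 1;
ResponseList<DirectMessage> batch;
do {
    Paging paging = new Paging(page, 50);      // page number, messages per page
    batch = twitter.getDirectMessages(paging); // may throw TwitterException
    messages.addAll(batch);
    page++;
} while (!batch.isEmpty());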
Note: you would only have to do this once. In fact, you should persist the messages to a durable local cache like Redis or something, because once you have the last message id you can ask the Twitter API to return only "messages since id" with the since_id parameter.
Anyway, on the client side you'd then just do your filtering by the usual means in Java. For example:
// Joe is on Twitter as @joe
private static final String AT_JOE = "joe";

// Java 8 lambda to filter by screen name (requires java.util.stream.Collectors)
List<DirectMessage> messagesFromJoe = messages.stream()
        .filter(message -> message.getSenderScreenName().equalsIgnoreCase(AT_JOE))
        .collect(Collectors.toList());
Above, getSenderScreenName() was discovered by reading the Twitter4J API doc for DirectMessage.

Elasticsearch: send JSON query string via Java client?

I am new to Elasticsearch. I have read Elasticsearch's Java client APIs and am able to build a query and send it to the Elasticsearch server via the transport client.
My query is quite complex, with multi-level filters, and I find it cumbersome to build via the Java client. It feels much simpler to build a JSON query string and then send it over to the Elasticsearch server via a Java client.
Is this something Elasticsearch offers?
I do like what the Elasticsearch Java API can do after receiving results, such as scrolling over them, and I want to keep those features.
Thanks for any input and links!
Regards.
I did further research on the Elasticsearch API and found that Elasticsearch does offer this capability. Here is how:
SearchResponse scrollResp = client.prepareSearch("my-index")
        .setTypes("my-type")
        .setSearchType(SearchType.SCAN)
        .setQuery(query) // <-- query string in JSON format
        .execute().actionGet();
You can no longer pass a string to the setQuery() method; however, you can use a WrapperQueryBuilder like this:
WrapperQueryBuilder builder = QueryBuilders.wrapperQuery(searchQuery);
SearchRequestBuilder sr = client.prepareSearch()
        .setIndices(index)
        .setTypes(mapping)
        .setQuery(builder);
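If you also want to keep the scrolling you mentioned, a sketch along these lines should work with the transport client (the index name, JSON string, and timeout here are placeholders):

String searchQuery = "{ \"match\": { \"title\": \"elasticsearch\" } }";
WrapperQueryBuilder builder = QueryBuilders.wrapperQuery(searchQuery);

SearchResponse scrollResp = client.prepareSearch("my-index")
        .setQuery(builder)
        .setScroll(TimeValue.timeValueMinutes(1)) // keep the scroll context alive
        .setSize(100)                             // hits per shard per scroll batch
        .execute().actionGet();

// Keep pulling batches until no more hits come back.
while (scrollResp.getHits().getHits().length > 0) {
    for (SearchHit hit : scrollResp.getHits()) {
        System.out.println(hit.getSourceAsString());
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(1))
            .execute().actionGet();
}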
I'd recommend using the Java API; it is very good once you get used to it, and in most cases it is less cumbersome. If you look through the Elasticsearch source code, you will see that the Java API builds the JSON under the hood. Here is an example from MatchAllQueryBuilder:
@Override
public void doXContent(XContentBuilder builder, Params params) throws IOException {
    builder.startObject(MatchAllQueryParser.NAME);
    if (boost != -1) {
        builder.field("boost", boost);
    }
    if (normsField != null) {
        builder.field("norms_field", normsField);
    }
    builder.endObject();
}
Elasticsearch has built-in capabilities to do exactly what you need, in an organized manner.
To answer your question, please see this link (the material is gone from Elastic's site, so the original page may no longer work):
https://web.archive.org/web/20150906215934/https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/search.html
All you have to do is build a simple file which contains your search template, i.e. your complex search query.
It can be a simple JSON file or a text file.
Then you simply pass in your parameters through your Java code.
See the example in the link; it makes things amply clear.
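If you would rather not rely on the templating support described in that link, a rough do-it-yourself alternative is plain parameter substitution into a query file, fed to the WrapperQueryBuilder shown earlier (the file name and {{TITLE}} placeholder are made up for illustration; uses java.nio.file):

// query-template.json might contain: { "match": { "title": "{{TITLE}}" } }
String template = new String(
        Files.readAllBytes(Paths.get("query-template.json")), StandardCharsets.UTF_8);
String json = template.replace("{{TITLE}}", "elasticsearch");

SearchResponse response = client.prepareSearch("my-index")
        .setQuery(QueryBuilders.wrapperQuery(json))
        .execute().actionGet();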
Bhargav.

How to use OWL.INVERSEOF using Openrdf in Java

I am trying to create some statements and their inverse in Java using OpenRDF's Sesame. I am following the tutorial in the Sesame 2.7 documentation as well. Let's say I have created the following URIs and statements and added them to the Sesame repository:
RepositoryConnection connection;
ValueFactory factory = ValueFactoryImpl.getInstance();
Resource author_1 = factory.createURI("http://www.author.org/author_1");
Resource article_1 = factory.createURI("http://www.title.org/article_1");
URI wrote = factory.createURI("http://www.paper.org/wrote");
URI writtenby = factory.createURI("http://www.paper.org/writtenby");
Statement statement_1 = factory.createStatement(author_1, wrote, article_1);
connection.add(statement_1);
The code above creates a statement describing that an author wrote an article. In the OpenRDF Workbench, I can see this statement. What I am trying to do is add the inverse using OWL.INVERSEOF, so that article_1 is written by author_1, as follows:
connection.add(writtenby, OWL.INVERSEOF, wrote);
When I run the project and get back to the OpenRDF Workbench, I see the following statements:
<http://www.author.org/author_1>, <http://www.paper.org/wrote>, <http://www.title.org/article_1>
<http://www.paper.org/writtenby>, <http://www.w3.org/2002/owl#inverseOf>, <http://www.paper.org/wrote>
When I click on <http://www.paper.org/writtenby>, I can't find the inverse statement that article_1 is written by author_1, but I can find that author_1 wrote article_1. Is my way of doing this inverse wrong, or do I misunderstand something about the concept? Thank you very much for your help in advance.
It is as Joshua says: OpenRDF/Sesame doesn't have support for this kind of reasoning. I believe it supports only some basic RDF/S reasoning during load, and it (still) doesn't support custom rules.
You can achieve what you are asking by using Sesame with OWLIM. OWLIM-Lite and OWLIM-SE do have support for OWL reasoning (rule-based, forward-chaining, materialization). There are a number of predefined rule sets supported by OWLIM; you would probably want owl-max.
Depending on what you are trying to achieve, you might want to use a real OWL reasoner such as Pellet. However, Sesame doesn't support Pellet...

Google Translator API and many translations for one word

I am using google-api-translate-java-0.92.jar.
Translate.setHttpReferrer("http://translate.google.com/");
try {
    String translation = Translate.execute("arrangement", Language.ENGLISH, Language.UKRANIAN);
    System.out.println(translation);
} catch (Exception e) {
    System.out.println(e.getMessage());
}
But I get only one translation.
When I go to http://translate.google.com/, it gives me multiple translations.
How can I get multiple translations from my code?
I don't believe you can...
Looking at the source, it builds up the following URL:
http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=en|uk&q=arrangement
which, when you look at the JSON response, returns:
{"responseData": {"translatedText":"Композиція"}, "responseDetails": null, "responseStatus": 200}
As you can see, this returns only a single word. The dictionary lookup on the Google Translate page must be an additional call to a different service (not part of the translate service).
EDIT
Using Firebug, you can see the request made by the translate page, which gives you this URL:
http://translate.google.com/translate_a/t?client=t&text=arrangement&hl=en&sl=en&tl=uk&multires=1&otf=2&pc=0&sc=1
Which returns this:
[[["Композиція","arrangement","Kompozytsiya"]],[["noun",["розташування","розміщення","домовленість","аранжування","упорядкування","механізм","оформлення","пристрій","систематизація","монтаж","пристосування","урегулювання","плани","згода","залагода","розв'язання","порозуміння"]]],"en"]
However, this extended URL format is not supported by the translate JAR you are using (at least I can't find it in the source on Google Code), is not part of the googleapis subdomain, and I'm not even sure it's intended for public consumption or that calling it directly doesn't violate Google's T&Cs.
But that's how they generate the dictionary list, anyway...
From the Google Translate API FAQ (checked: 16 November 2013):
Is it possible to get multiple translations of a word?
The answer:
No. This feature is only available via the web interface at
translate.google.com
source: https://developers.google.com/translate/v2/faq#technical
There is an open enhancement request for this functionality. So it doesn't look like you're doing anything wrong; the API just doesn't expose that functionality.

Is there any API in Java to access Wikipedia data

I want to know: is there any API or query interface through which I can access Wikipedia data?
MediaWiki, the wiki platform that Wikipedia uses, does have an HTTP-based API. See MediaWiki API.
For example, to get pages with the title stackoverflow, you call
http://en.wikipedia.org/w/api.php?action=query&titles=Stackoverflow
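For instance, a minimal sketch that makes this call with Java 11's built-in HttpClient (the format=json parameter and the User-Agent header are just sensible additions, not requirements of the API):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WikipediaApiExample {
    public static void main(String[] args) throws Exception {
        // Ask the MediaWiki API for the page titled "Stackoverflow", as JSON.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://en.wikipedia.org/w/api.php"
                        + "?action=query&titles=Stackoverflow&format=json"))
                .header("User-Agent", "example-client/1.0") // Wikimedia asks for a descriptive UA
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}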
There are some (incomplete) Java wrappers around the API; see the Client Code - Java section of the API page for more detail.
For use with Java, try http://code.google.com/p/wiki-java. It is only one class, but a great one!
You can use Jwiki to get Wikipedia data
Example :
Jwiki jwiki = new Jwiki("elon musk");
System.out.println("Title: " + jwiki.getDisplayTitle()); // get title
System.out.println("Text: " + jwiki.getExtractText());   // get summary text
System.out.println("Image: " + jwiki.getImageURL());     // get image URL
I had the same question and the closest I came to an out-of-the-box solution is bliki, hosted at http://code.google.com/p/gwtwiki/.
I also wrote an article at Integrating Stuff to help you get started with it: http://www.integratingstuff.com/2012/04/06/hook-into-wikipedia-using-java-and-the-mediawiki-api/
MediaWiki is free and open-source wiki software. Originally developed by Magnus Manske and improved by Lee Daniel Crocker, it runs on many websites, including Wikipedia, Wiktionary and Wikimedia Commons.
There is a list of Java libraries that can help you connect to a wiki from Java code:
https://www.mediawiki.org/wiki/API:Client_code#Java
However, after using some of them, we ran into their limitations and ended up calling the MediaWiki REST services directly.
