Tree Tagger for Java (tt4j) - java

I am creating a Twitter sentiment analysis tool in Java. I am using the Twitter4J API to search tweets by hashtag and then run sentiment analysis on them. Through research, I have found that the best approach is to use a POS tagger, specifically TreeTagger for Java (TT4J).
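To illustrate the Twitter4J side, here is a minimal hashtag-search sketch (assuming Twitter4J 4.x with OAuth credentials configured in twitter4j.properties; the class name and hashtag are placeholders):
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterFactory;

public class HashtagSearch {
    public static void main(String[] args) throws Exception {
        // Reads OAuth credentials from twitter4j.properties on the classpath
        Twitter twitter = TwitterFactory.getSingleton();
        Query query = new Query("#java"); // hashtag to search for
        QueryResult result = twitter.search(query);
        for (Status status : result.getTweets()) {
            System.out.println("@" + status.getUser().getScreenName() + ": " + status.getText());
        }
    }
}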
At the moment, I am using the examples provided to see how the code works, although I am encountering some problems.
This is the code
import org.annolab.tt4j.*;
import static java.util.Arrays.asList;

public class Example {
    public static void main(String[] args) throws Exception {
        // Point TT4J to the TreeTagger installation directory. The executable is expected
        // in the "bin" subdirectory - in this example at "/opt/treetagger/bin/tree-tagger"
        System.setProperty("treetagger.home", "/opt/treetagger");
        TreeTaggerWrapper tt = new TreeTaggerWrapper<String>();
        try {
            tt.setModel("/opt/treetagger/models/english.par:iso8859-1");
            tt.setHandler(new TokenHandler<String>() {
                public void token(String token, String pos, String lemma) {
                    System.out.println(token + "\t" + pos + "\t" + lemma);
                }
            });
            tt.process(asList(new String[] { "This", "is", "a", "test", "." }));
        }
        finally {
            tt.destroy();
        }
    }
}
At the moment, when this is run, I receive the following errors:
TreeTaggerWrapper cannot be resolved to a type
TokenHandler cannot be resolved to a type
I would be grateful for any help given.
Thank you

Related

jni binding, javac error, unexpected token

We did an OpenVR Java binding using JNA, and it's true what they usually say about JNA: it's quite easy to implement.
On the other hand, it has some performance penalties. Going by some searching and a few papers, JNA is from 10 to almost 80 times slower than JNI (here and here).
This wouldn't be a problem in non-performance-critical scenarios, but we ran into some performance issues and are trying to address all the causes, the binding being one of them.
I searched for some time and there are a lot of different ways to achieve this, but given that the header we'd like to port is relatively simple (famous last words...), we are trying to do it manually.
I started with the two most important calls, VR_Init and VR_Shutdown:
inline IVRSystem *VR_Init( EVRInitError *peError, EVRApplicationType eApplicationType )
{
    IVRSystem *pVRSystem = nullptr;

    EVRInitError eError;
    VRToken() = VR_InitInternal( &eError, eApplicationType );
    COpenVRContext &ctx = OpenVRInternal_ModuleContext();
    ctx.Clear();

    if ( eError == VRInitError_None )
    {
        if ( VR_IsInterfaceVersionValid( IVRSystem_Version ) )
        {
            pVRSystem = VRSystem();
        }
        else
        {
            VR_ShutdownInternal();
            eError = VRInitError_Init_InterfaceNotFound;
        }
    }

    if ( peError )
        *peError = eError;
    return pVRSystem;
}

/** unloads vrclient.dll. Any interface pointers from the interface are
 * invalid after this point */
inline void VR_Shutdown()
{
    VR_ShutdownInternal();
}
The corresponding java class is pretty simple:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HelloVr {

    static {
        System.loadLibrary("openvr_api");
    }

    static final int VRInitError_None = 0, VRApplication_Scene = 1;

    public native IVRSystem VR_Init(ByteBuffer peError, int eApplicationType);

    public native void VR_Shutdown();

    public static void main(String[] args) {
        new HelloVr();
    }

    public HelloVr() {
        ByteBuffer peError = ByteBuffer.allocateDirect(Integer.BYTES).order(ByteOrder.nativeOrder());
        IVRSystem hmd = VR_Init(peError, VRApplication_Scene);
        System.out.println("error: " + peError.getInt(0));
    }

    class IVRSystem {
        private long nativePtr = 0L;
    }
}
Now it is time to compile HelloVr.java into HelloVr.class by typing
javac HelloVr.java
But I get an Unexpected Token error:
PS C:\Users\GBarbieri\Documents\NetBeansProjects\Test\Test\src\test> "C:\Program Files\Java\jdk1.8.0_102\bin\javac.exe"
.\HelloVr.java
Unexpected token ".\HelloVr.java" in expression or statement.
At line:1 char:66
+ "C:\Program Files\Java\jdk1.8.0_102\bin\javac.exe" .\HelloVr.java <<<<
+ CategoryInfo : ParserError: (.\HelloVr.java:String) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : UnexpectedToken
why?
Not exactly the ideal answer I was looking for, but it made it work.
I added javac.exe's directory to the PATH environment variable and simply ran:
javac HelloVr.java
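For reference, the original error looks like a PowerShell quirk rather than a javac problem: a quoted path is parsed as a string literal, so invoking it with the call operator & should presumably also work, e.g.
& "C:\Program Files\Java\jdk1.8.0_102\bin\javac.exe" .\HelloVr.java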

Bukkit Custom Prefix plugin doesn't work with Essentials

I'm making a plugin that will have ranks in the near future, but I decided to get prefixes working first. I have this code:
Essentials ess = (Essentials) Bukkit.getServer().getPluginManager().getPlugin("Essentials");
User user = ess.getUserMap().getUser(p.getName());
//nickname
String nick = user.getDisplayName();
String prisoner = ColourMsg("&5<<&bPrisoner&5>>&r>" + " <");
p.setDisplayName(prisoner + nick);
For some reason, this code doesn't work! It only displays the nickname and not the prefix (I would expect it to display both). Also, the only error message I get is from Essentials chat, which isn't needed for my plugin, and /nick still works.
If anyone can help, please let me know.
Thanks in advance!
You don't need Essentials for that (Essentials is a bad plugin anyway, since 1.8).
You can simply use scoreboard prefixes/suffixes in the PlayerJoinEvent to set the tags.
Scoreboard sb = Bukkit.getScoreboardManager().getNewScoreboard();
Objective ob = sb.registerNewObjective("objName", "dummy");

public void onEnable() {
    // Set display slot
    ob.setDisplaySlot(DisplaySlot.PLAYER_LIST);
}

@EventHandler
public void onJoin(PlayerJoinEvent e) {
    // Delay a task
    Bukkit.getServer().getScheduler().scheduleSyncDelayedTask(this, new Runnable() {
        @Override
        public void run() {
            if (e.getPlayer().hasPermission("tags.example")) {
                sb.registerNewTeam("Example");
                Team team = sb.getTeam("Example");
                team.setPrefix(ChatColor.RED + "[Example]");
                team.addEntry(e.getPlayer().getName());
            } else if (e.getPlayer().hasPermission("tags.otherTag")) {
                sb.registerNewTeam("OtherTag");
                Team team = sb.getTeam("OtherTag");
                team.setPrefix(ChatColor.GREEN + "[OtherTag]");
                team.addEntry(e.getPlayer().getName());
            }
        }
    }, 20 * 1); // The 1 is the number of seconds to delay, 1 is fine
}
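For the handler to actually fire, the class also needs to implement Listener and be registered with the plugin manager; a minimal sketch, assuming the snippet above sits in the main plugin class (the class name here is hypothetical):
import org.bukkit.event.Listener;
import org.bukkit.plugin.java.JavaPlugin;

public class TagsPlugin extends JavaPlugin implements Listener {
    // ... scoreboard fields and onJoin from the snippet above ...

    @Override
    public void onEnable() {
        ob.setDisplaySlot(DisplaySlot.PLAYER_LIST); // as above
        // Without this registration Bukkit never delivers PlayerJoinEvent to onJoin
        getServer().getPluginManager().registerEvents(this, this);
    }
}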

How to translate city names in different languages

I have a mobile app (both iOS and Android) and I need to translate city names into the language of the user. I can do the translation on the mobile device or on my server (running PHP).
So far I have managed to translate country names; here is the Java code that translates all possible countries into all possible languages:
import java.util.Locale;

public class ListCountry {

    public static void main(String[] args) {
        ListCountry obj = new ListCountry();
        obj.getListOfCountries();
    }

    public void getListOfCountries() {
        String[] locales = Locale.getISOCountries();
        for (String countryCode : locales) {
            Locale obj = new Locale("", countryCode);
            String[] lingue = Locale.getISOLanguages();
            for (String languageCode : lingue) {
                System.out.println("Country Code = " + obj.getCountry()
                        + ", Country Name = " + obj.getDisplayCountry(new Locale(languageCode))
                        + ", language = " + (new Locale(languageCode)).getDisplayLanguage());
            }
        }
    }
}
How can I do a similar thing but with city names? I know about CLDR and ICU, but I really can't figure out how to do it (or if it's even possible). If there is a nice object-oriented library out there, that would be better than parsing CLDR XMLs or other sources.
I would prefer to do it locally (on my server or even on the mobile app) instead of calling the Google API, for example:
http://maps.googleapis.com/maps/api/geocode/json?address=turin&language=ES
http://maps.googleapis.com/maps/api/geocode/json?address=turin&language=IT
http://maps.googleapis.com/maps/api/geocode/json?address=turin&language=EN
(The question is: I guess Google's DB of city names is public, so where is it? Is it nicely wrapped in some user-friendly cross-platform library?)
Thanks for your help
I guess you're looking for a file containing all cities and their translations instead of fetching them once per city?
If so, www.geonames.org has geo-data of different types (countries, admin zones, cities) in multiple languages. Besides their API, you can also download their files directly and parse them yourself.
At the following URL you'll find 3 zip files prefixed with "alternateNames":
http://download.geonames.org/export/dump/
They contain - hopefully - the necessary data.
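For a rough idea of how to use them, here is a sketch of reading the alternate-names dump, assuming the tab-separated layout described in the GeoNames readme (alternateNameId, geonameId, isoLanguage, alternateName, ...); the file name and target language are placeholders:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AlternateNames {

    public static void main(String[] args) throws IOException {
        String targetLanguage = "it"; // placeholder: language you want the names in
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get("alternateNames.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumed columns: alternateNameId, geonameId, isoLanguage, alternateName, ...
                String[] cols = line.split("\t", -1);
                if (cols.length >= 4 && targetLanguage.equals(cols[2])) {
                    String geonameId = cols[1];
                    String translatedName = cols[3];
                    System.out.println(geonameId + " -> " + translatedName);
                    // To keep only cities, join geonameId against cities500.txt / allCountries.txt
                    // (feature class "P") from the same dump directory.
                }
            }
        }
    }
}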

How to parse data in Talend with Java (coming from a previously produced .txt file)?

I have a process in Talend which gets the search result of a page, saves the HTML and writes it into files.
Initially I had a two-step process, parsing the data out of the HTML files in Java; it works and writes the results to a MySQL database. Here is the code which basically does exactly that. (I'm a beginner, sorry for the lack of elegance.)
package org.jsoup.examples;

import java.io.*;

import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class parse2 {

    static parse2 parseIt2 = new parse2();

    String companyName = "Platzhalter";
    String jobTitle = "Platzhalter";
    String location = "Platzhalter";
    String timeAdded = "Platzhalter";

    public static void main(String[] args) throws IOException {
        parseIt2.getData();
    }

    public void getData() throws IOException {
        Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
        Elements elements = document.select(".joblisting");
        for (Element element : elements) {
            // Parse Data into Elements
            Elements jobTitleElement = element.select(".job_title span");
            Elements companyNameElement = element.select(".company_name span[itemprop=name]");
            Elements locationElement = element.select(".locality span[itemprop=addressLocality]");
            Elements dateElement = element.select(".job_date_added [datetime]");
            // Strip Data from unnecessary tags
            String companyName = companyNameElement.text();
            String jobTitle = jobTitleElement.text();
            String location = locationElement.text();
            String timeAdded = dateElement.attr("datetime");
            System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t" + location + " \t Erstellt am \t" + timeAdded);
        }
    }
}
Now I want to do the process end-to-end in Talend, and I have been assured this works.
I tried this (which looks quite shady to me):
Basically, I put all the imports in the "advanced settings" section and the code in the "basic settings" section. This import section is supposed to load the jsoup parsing library, as well as the MySQL connector (I might do the connection with Talend tools though).
Obviously this isn't working. I tried to strip the base code of classes and such, and it was even worse. Can you help me get the generated .txt files parsed with Java here?
EDIT: Here is the Link to the talend Job http://www.share-online.biz/dl/8M5MD99NR1
EDIT2: I changed the code to the one I tried in JavaFlex, but it didn't work (the import part went in the "start" part of the code, the rest in "body/main", and nothing in "end").
This is a problem related to Talend: in your code, use fully qualified class names, including their packages. For your document parsing, for example, you can use:
Document document = org.jsoup.Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
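Following the same idea, the rest of the parsing loop would look roughly like this with fully qualified names (a sketch based on the code in the question, not the exact Talend job code):
org.jsoup.nodes.Document document = org.jsoup.Jsoup.parse(
        new java.io.File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
org.jsoup.select.Elements elements = document.select(".joblisting");
for (org.jsoup.nodes.Element element : elements) {
    String companyName = element.select(".company_name span[itemprop=name]").text();
    String jobTitle = element.select(".job_title span").text();
    String location = element.select(".locality span[itemprop=addressLocality]").text();
    String timeAdded = element.select(".job_date_added [datetime]").attr("datetime");
    System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t" + location + "\t Erstellt am \t" + timeAdded);
}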

search google and get results using java swing [duplicate]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Does anyone know if and how it is possible to search Google programmatically - especially if there is a Java API for it?
Some facts:
Google offers a public search webservice API which returns JSON: http://ajax.googleapis.com/ajax/services/search/web. Documentation here
Java offers java.net.URL and java.net.URLConnection to fire and handle HTTP requests.
In Java, JSON can be converted to a full-fledged JavaBean object using an arbitrary Java JSON API. One of the best is Google Gson.
Now do the math:
public static void main(String[] args) throws Exception {
    String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
    String search = "stackoverflow";
    String charset = "UTF-8";

    URL url = new URL(google + URLEncoder.encode(search, charset));
    Reader reader = new InputStreamReader(url.openStream(), charset);
    GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);

    // Show title and URL of 1st result.
    System.out.println(results.getResponseData().getResults().get(0).getTitle());
    System.out.println(results.getResponseData().getResults().get(0).getUrl());
}
With this Javabean class representing the most important JSON data as returned by Google (it actually returns more data, but it's left up to you as an exercise to expand this Javabean code accordingly):
public class GoogleResults {

    private ResponseData responseData;
    public ResponseData getResponseData() { return responseData; }
    public void setResponseData(ResponseData responseData) { this.responseData = responseData; }
    public String toString() { return "ResponseData[" + responseData + "]"; }

    static class ResponseData {
        private List<Result> results;
        public List<Result> getResults() { return results; }
        public void setResults(List<Result> results) { this.results = results; }
        public String toString() { return "Results[" + results + "]"; }
    }

    static class Result {
        private String url;
        private String title;
        public String getUrl() { return url; }
        public String getTitle() { return title; }
        public void setUrl(String url) { this.url = url; }
        public void setTitle(String title) { this.title = title; }
        public String toString() { return "Result[url:" + url + ",title:" + title + "]"; }
    }
}
See also:
How to fire and handle HTTP requests using java.net.URLConnection
How to convert JSON to Java
Update: since November 2010 (2 months after the above answer), the public search web service has been deprecated (and the last day on which the service was offered was September 29, 2014). Your best bet is now to query http://www.google.com/search directly with an honest user agent and then parse the result using an HTML parser. If you omit the user agent, you get a 403 back. If you lie in the user agent and simulate a web browser (e.g. Chrome or Firefox), you get a much larger HTML response back, which is a waste of bandwidth and performance.
Here's a kickoff example using Jsoup as HTML parser:
String google = "http://www.google.com/search?q=";
String search = "stackoverflow";
String charset = "UTF-8";
String userAgent = "ExampleBot 1.0 (+http://example.com/bot)"; // Change this to your company's name and bot homepage!
Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select(".g>.r>a");
for (Element link : links) {
String title = link.text();
String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");
if (!url.startsWith("http")) {
continue; // Ads/news/etc.
}
System.out.println("Title: " + title);
System.out.println("URL: " + url);
}
To search Google using an API you should use Google Custom Search; scraping the web page is not allowed.
In Java you can use the CustomSearch API Client Library for Java.
The Maven dependency is:
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-customsearch</artifactId>
    <version>v1-rev57-1.23.0</version>
</dependency>
Example code for searching using the Google CustomSearch API Client Library:
public static void main(String[] args) throws GeneralSecurityException, IOException {
    String searchQuery = "test"; // The query to search
    String cx = "002845322276752338984:vxqzfa86nqc"; // Your search engine

    // Instantiate Customsearch
    Customsearch cs = new Customsearch.Builder(GoogleNetHttpTransport.newTrustedTransport(), JacksonFactory.getDefaultInstance(), null)
            .setApplicationName("MyApplication")
            .setGoogleClientRequestInitializer(new CustomsearchRequestInitializer("your api key"))
            .build();

    // Set search parameters
    Customsearch.Cse.List list = cs.cse().list(searchQuery).setCx(cx);

    // Execute search
    Search result = list.execute();
    if (result.getItems() != null) {
        for (Result ri : result.getItems()) {
            // Get title, link, body etc. from the search result
            System.out.println(ri.getTitle() + ", " + ri.getLink());
        }
    }
}
As you can see, you will need to request an API key and set up your own search engine ID, cx.
Note that you can search the whole web by selecting "Search entire web" in the basic tab settings during setup of the cx, but the results will not be exactly the same as a normal in-browser Google search.
Currently (at the date of this answer) you get 100 API calls per day for free; beyond that, Google would like to share your profit.
In the Terms of Service of google we can read:
5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.
So I guess the answer is no. Moreover, the SOAP API is no longer available.
Google TOS have been relaxed a bit in April 2014. Now it states:
"Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide."
So the passage about "automated means" and scripts is gone now. It evidently still is not the desired (by Google) way of accessing their services, but I think it is now formally open to interpretation what exactly an "interface" is and whether it makes any difference how exactly the returned HTML is processed (rendered or parsed). Anyhow, I have written a Java convenience library and it is up to you to decide whether to use it or not:
https://github.com/afedulov/google-web-search
Indeed there is an API to search Google programmatically. The API is called Google Custom Search. To use this API, you will need a Google Developer API key and a cx key. A simple procedure for accessing Google search from a Java program is explained in my blog.
The blog is now dead; here is the Wayback Machine link.
As an alternative to BalusC's answer, since it has been deprecated and you have to use proxies, you can use this package. Code sample:
Map<String, String> parameter = new HashMap<>();
parameter.put("q", "Coffee");
parameter.put("location", "Portland");
GoogleSearchResults serp = new GoogleSearchResults(parameter);
JsonObject data = serp.getJson();
JsonArray results = (JsonArray) data.get("organic_results");
JsonObject first_result = results.get(0).getAsJsonObject();
System.out.println("first coffee: " + first_result.get("title").getAsString());
Library on GitHub
In light of those TOS alterations last year we built an API that gives access to Google's search. It was for our own use only but after some requests we decided to open it up. We're planning to add additional search engines in the future!
Should anyone be looking for an easy way to implement / acquire search results you are free to sign up and give the REST API a try: https://searchapi.io
It returns JSON results and should be easy enough to implement with the detailed docs.
It's a shame that Bing and Yahoo are miles ahead of Google in this regard. Their APIs aren't cheap, but at least they're available.
