I am starting to learn the OpenNLP API in Java.
I found some good examples on this website:
http://www.programcreek.com/2012/05/opennlp-tutorial/
I have tried the Name Finder API but I found something strange.
If I replace the input as
String[] sentence = new String[]{
"John",
"is",
"good"
};
The code is still working, but if I change it as
String[] sentence = new String[]{
"John",
"is",
"fine"
};
There is no output.
I cannot understand what causes the problem. Is it from the model I use (en-ner-person.bin)?
And does anyone know how can I build my own model?
Thanks!
Assuming it is not throwing an exception and simply cannot find the name "John": OpenNLP takes a machine-learning approach and finds named entities based on a statistical model. The en-ner-person.bin model apparently does not contain enough training sentences similar to "John is fine" to return a probability high enough to give you a result.
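You can see this for yourself by inspecting the confidence scores the model assigns. The following is a minimal sketch (assuming opennlp-tools is on the classpath and en-ner-person.bin is in the working directory); NameFinderME.probs() returns the probability the model assigned to each span it detected:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NameFinderProbs {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME finder = new NameFinderME(model);

            String[] sentence = {"John", "is", "fine"};
            Span[] spans = finder.find(sentence);

            // probs() reports the model's confidence for each span it found;
            // a sentence the model has little evidence for yields no spans at all.
            double[] probs = finder.probs(spans);
            for (int i = 0; i < spans.length; i++) {
                System.out.println(sentence[spans[i].getStart()] + " -> " + probs[i]);
            }
        }
    }
}
```

As for building your own model: OpenNLP can train one with NameFinderME.train() from sentences annotated in its `<START:person> John <END>` format, but you will need a reasonably large annotated corpus to get useful results.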
I'm trying to add a custom-made dictionary file to Hunspell processing in my Java project. The reason is that I want Hunspell to be able to suggest words that are not in the English language (the default dictionary), such as personal names and street names. I've seen the same thing work in OpenOffice, for example. The Hunspell version I'm using is 1.1.1.
I am sure Hunspell has read my dictionary file as well as the ".aff" file with the language rules, but for some reason it just doesn't give proper results.
My question is: does anyone have a working example of how Hunspell can be used in Java with a custom-made dictionary that includes personal names? By working example, I mean one where Hunspell correctly suggests results when a misspelling occurs. For example, if I input "Pettter", it should suggest "Peter" (which is in the dictionary) instead.
I have already done a lot of research and just can't make it work, so I'm posting here in case anyone has run into a similar or the same problem.
Hunspell initialization example:
StringBuilder dictionaryPathString = new StringBuilder(AppConfigurationProperties.PRIVATE_DIRECTORY.getAbsolutePath());
dictionaryPathString.append("/integration/hunspell")
.append("/")
.append(lang)
.append("/")
.append(lang)
.append(".dic");
Path dictionaryPath = Paths.get(dictionaryPathString.toString());
StringBuilder affixPathString = new StringBuilder(AppConfigurationProperties.PRIVATE_DIRECTORY.getAbsolutePath());
affixPathString.append("/integration/hunspell")
.append("/")
.append(lang)
.append("/")
.append(lang)
.append(".aff");
Path affixPath = Paths.get(affixPathString.toString());
Hunspell hunspell = new Hunspell(dictionaryPath, affixPath);
Hunspell suggest usage example:
public List<String> suggest(String word, String lang) {
    Hunspell hunspell = getInstance(lang);
    return hunspell.suggest(word);
}
The getInstance method just returns the proper Hunspell instance for a given language, since I keep the instances in a map.
Also, this is the .aff file I'm using:
https://github.com/ropensci/hunspell/tree/master/inst/dict
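One thing worth double-checking, since it is a common cause of "no suggestions": a Hunspell .dic file must start with a line giving the (approximate) number of entries, followed by one word per line. A minimal hypothetical names dictionary would look like:

```
3
Peter
Petra
Pete
```

If that count line is missing, the first word can be parsed as the count and the file is effectively misread, so suggestions silently fail.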
Thanks for your help in advance!
First off let me say that I am a complete newbie with NLP. Although, as you read on, that is probably going to become strikingly apparent.
I'm parsing Wikipedia pages to find all mentions of the page title. I do this by going through the CorefChainAnnotations to find "proper" mentions - I then assume that the most common ones are talking about the page title. I do it by running this:
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String content = "Abraham Lincoln was an American politician and lawyer who served as the 16th President of the United States from March 1861 until his assassination in April 1865. Lincoln led the United States through its Civil War—its bloodiest war and perhaps its greatest moral, constitutional, and political crisis.";
Annotation document = new Annotation(content);
pipeline.annotate(document);
for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
    List<CorefChain.CorefMention> corefMentions = cc.getMentionsInTextualOrder();
    for (CorefChain.CorefMention cm : corefMentions) {
        if (cm.mentionType == Dictionaries.MentionType.PROPER) {
            log("Proper ref using " + cm.mentionSpan + ", " + cm.mentionType);
        }
    }
}
This returns:
Proper ref using the United States
Proper ref using the United States
Proper ref using Abraham Lincoln
Proper ref using Lincoln
I know already that "Abraham Lincoln" is definitely what I am looking for, and I can surmise that, because "Lincoln" appears a lot as well, it must be another way of referring to the main subject. (I realise that right now the most common named entity is "the United States", but once I've fed it the whole page it works fine.)
This works great until I have a page like "Gone with the Wind". If I change my code to use that:
String content = "Gone with the Wind has been criticized as historical revisionism glorifying slavery, but nevertheless, it has been credited for triggering changes to the way African-Americans are depicted cinematically.";
then I get no Proper mentions back at all. I suspect this is because none of the words in the title are recognised as named entities.
Is there any way I can get Stanford NLP to recognise "Gone with the Wind" as an already-known named entity? From looking around on the internet it seems to involve training a model, but I want this to be a known named entity just for this single run, and I don't want the model to remember this training later.
I can just imagine NLP experts rolling their eyes at the awfulness of this approach, but it gets better! I came up with the great idea of changing any occurrences of the page title to "Thingamijig" before passing the text to Stanford NLP, which works great for "Gone with the Wind" but then fails for "Abraham Lincoln" because (I think) the NER no longer associates "Lincoln" with "Thingamijig" in the corefMentions.
In my dream world I would do something like:
pipeline.addKnownNamedEntity("Gone with the Wind");
But that doesn't seem to be something I can do and I'm not exactly sure how to go about it.
You can submit a dictionary with any phrases you want and have them recognized as named entities.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.regexner.mapping additional.rules -file example.txt -outputFormat text
additional.rules
Gone With The Wind MOVIE MISC 1
Note that the columns above should be tab-delimited. You can have as many lines as you'd like in the additional.rules file.
One warning: EVERY TIME that token pattern occurs, it will be tagged.
More details here: https://stanfordnlp.github.io/CoreNLP/ner.html
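Since the question drives the pipeline programmatically, the same mapping can be passed via Properties instead of the command line. A sketch, assuming stanford-corenlp is on the classpath and additional.rules sits in the working directory (the sample sentence is hypothetical, capitalized to match the rules file exactly):

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class AdditionalNerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Same option as the -ner.additional.regexner.mapping CLI flag above.
        props.setProperty("ner.additional.regexner.mapping", "additional.rules");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("Gone With The Wind was released in 1939.");
        pipeline.annotate(document);

        // Print each token with its NER label; tokens covered by the
        // mapping should come back tagged MOVIE.
        for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) {
            System.out.println(token.word() + "\t" + token.ner());
        }
    }
}
```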
I want to program an Android app that can automatically generate statements. But before I can build the statements, I need information from Wikidata.
I just need a way to enter a keyword and get all the information about it.
It should look like this:
Input:
"New York"
Output:
"population: 8,405,837, 2016; 8,175,133, 2010;........"
"head of government: Bill de Blasio, 1 January 2014;........."
.
.
.
With this data I could build my statements, like:
"The population of New York has been 8,405,837 since 2016." (For the statement building I don't need help, only for the extraction of the data.)
I need a method like:
public String getDataByKeyWord (String keyWord ){
String data;
.
.
return data;
}
I hope you can help me; this is very important to me.
And sorry if I made mistakes, it's my first time here.
Thanks
You may find everything you need here: https://www.mediawiki.org/wiki/Wikidata_Toolkit
Since Wikidata is structured data (as opposed to Wikipedia's free text), you'll be able to navigate through it in a more structured way.
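For example, a minimal sketch of fetching an item and listing the properties on its statements with Wikidata Toolkit (assuming the wdtk-wikibaseapi artifact is on the classpath; "New York City" is resolved here via its English Wikipedia title):

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.StatementGroup;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class WikidataLookup {
    public static void main(String[] args) throws Exception {
        WikibaseDataFetcher fetcher = WikibaseDataFetcher.getWikidataDataFetcher();

        // Resolve the enwiki article title to its Wikidata item.
        EntityDocument doc = fetcher.getEntityDocumentByTitle("enwiki", "New York City");
        if (doc instanceof ItemDocument) {
            ItemDocument item = (ItemDocument) doc;
            // Each StatementGroup bundles the statements for one property,
            // e.g. P1082 (population). Mapping property IDs to readable
            // labels requires fetching the property documents as well.
            for (StatementGroup group : item.getStatementGroups()) {
                System.out.println(group.getProperty().getId()
                        + ": " + group.getStatements().size() + " statement(s)");
            }
        }
    }
}
```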
I'm currently working on a translator using IBM's Watson and their Java SDK (imported via Maven), and I'm having a hard time figuring out how to output the translated text into a text box. The translation results come back to me as:
[{
"translation": "Hello, World !"
}]
I think I'm not calling something correctly in my code:
LanguageTranslation service = new LanguageTranslation();
service.setUsernameAndPassword("<username>", "<password>");
TranslationResult translationResult = service.translate(txt_input.getText(), "en", "fr");
txt_translation.setText(translationResult.getTranslations().toString());
According to the documentation, the translated text comes back as a list of translations, but I'm not sure how to output only the translation of what I enter. Any suggestions?
EDIT: I'm still working out trying to fix the issue. I've tried:
java.util.List<Translation> translationText = translationResult.getTranslations();
txt_translation.setText(translationText.toString());
With and without the toString() added in. Still comes up as what it did before.
translationResult.getTranslations() returns a List<Translation>, so it is a list of Translation objects that you should iterate over. For instance, to get the first translation, you would use:
translationResult.getTranslations().get(0)
which returns a Translation; then call getTranslation() on it to get the translated text.
So the translated text for the first result is reachable via:
translationResult.getTranslations().get(0).getTranslation()
Notice that for your specific example, just saying that you want to go from "en" to "fr" won't work. You have to select an existing translation model first, like this:
LanguageTranslation service = new LanguageTranslation();
service.setUsernameAndPassword("xxx", "xxx");
//System.out.println(service.getModels()); // <<< get your model from here
//TranslationResult translationResult = service.translate("Hello World!", "en", "fr"); //<< does not work OOTB
TranslationResult translationResult = service.translate("Hello World!", "en-fr-conversational");
System.out.println(translationResult.getTranslations().get(0).getTranslation());
so you'll get:
Bonjour monde !
How can we find the word phrases in a synset? In particular, take this synset for the adjective "booked":
booked, engaged, set-aside -- (reserved in advance)
I use the RiTaWN Java package (WordNet version 2.1) and cannot seem to find the phrases. In the example above, when I run
RiWordnet wordnet = new RiWordnet(null);
String[] syn = wordnet.getSynset("booked", "a", true);
for (int i = 0; i < syn.length; i++)
    System.out.println(syn[i]);
it only outputs
booked engaged
while "set-aside" is not listed.
I have tested a lot, and the phrases are never found. Another example:
commodity, trade good, good -- (articles of commerce)
then "trade good" is not returned from the getSynset() method. So how can we actually get the phrases?
(The RiTaWN package is obtained from http://rednoise.org/rita/wordnet/documentation/index.htm)
RiTaWN seems to ignore "compound-words" by default. You can disable this to get the full list of phrases (line 2 below).
RiWordnet wordnet = new RiWordnet();
wordnet.ignoreCompoundWords(false);
String[] syn = wordnet.getSynset("booked", "a", true);
System.out.println(Arrays.asList(syn));
Result:
[INFO] RiTa.WordNet.version [033]
[booked, engaged, set-aside]
This answer is a bit out of left field, but in any case...
Idilia has an online WordNet-like database that is actually much more complete and richer than WordNet. Depending on where you are in your application, it may make sense, so I'm mentioning it. There are coding examples for Java access on the site.
In this case the query:
[{"fs":"booked/J1", "lemma":[], "definition":null}]
would return
{
  "fs" : "booked/J1",
  "lemma" : [
    "set_aside",
    "set-aside",
    "engaged",
    "booked"
  ],
  "definition" : "reserved in advance."
}