I am a bit new to Java and Eclipse. I usually use Python and NLTK for NLP tasks.
I am trying to follow the tutorial provided here:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);
    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations =
        document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();
  }
}
but I always get the following output containing an error. This happens for all modules relating to kbp, and I did add the jar files as requested by the tutorial:
Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Couldn't read TokensRegexNER from edu/stanford/nlp/models/kbp/regexner_caseless.tab
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:593)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.<init>(TokensRegexNERAnnotator.java:293)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.setUpFineGrainedNER(NERCombinerAnnotator.java:209)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:152)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$45(StanfordCoreNLP.java:546)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$70(StanfordCoreNLP.java:625)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
    at NLP.Start.main(Start.java:13)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/kbp/regexner_caseless.tab" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
    at edu.stanford.nlp.io.IOUtils.readerFromString(IOUtils.java:618)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:590)
    ... 14 more
Do you have any idea how to fix this? Thanks in advance!
You probably forgot to add stanford-corenlp-3.9.1-models.jar to your classpath.
Well, according to the models page, there is a separate model download for the kbp material. Perhaps you have access to stanford-english-corenlp-2018-02-27-models, but not to stanford-english-kbp-corenlp-2018-02-27-models? I would guess this because it appears that the other models were found, judging from what you provided in the question.
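A quick way to check (my own sketch, not from the original answer): ask the classloader for the resource named in the exception; null means the kbp models jar is missing from the classpath.

// Sketch: prints null when edu/stanford/nlp/models/kbp/regexner_caseless.tab
// is not visible on the classpath (i.e. the kbp models jar is missing).
System.out.println(ClassLoader.getSystemClassLoader()
    .getResource("edu/stanford/nlp/models/kbp/regexner_caseless.tab"));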
Related
I am running CoreNLP over some text, and matching the entities found to Wikipedia entities. I want to reconstruct the sentence providing the link and other useful information for the entities found.
The CoreEntityMention has an entity() method, but it just returns a String.
import edu.stanford.nlp.pipeline.*;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitylink");
// set up pipeline
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// annotate() takes a CoreDocument, not a raw String
CoreDocument doc = new CoreDocument("text goes here");
pipeline.annotate(doc);
// Iterate the sentences
for (CoreSentence sentence : doc.sentences()) {
  // Go through all mentions
  for (CoreEntityMention em : sentence.entityMentions()) {
    System.out.println(em.sentence());
    // Here I would like to extract the Wikipedia entity information
    System.out.println(em.entity());
  }
}
You just need to prepend the Wikipedia page URL.
So Neil_Armstrong maps to https://en.wikipedia.org/wiki/Neil_Armstrong.
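For example (a minimal sketch; it assumes em.entity() returns a page title such as "Neil_Armstrong"):

// Sketch: build the full Wikipedia URL from the entitylink result.
String entity = em.entity(); // e.g. "Neil_Armstrong"
System.out.println("https://en.wikipedia.org/wiki/" + entity);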
The code:
package org.javautil.salesdata;

import java.io.File;
import java.io.IOException;
import java.util.Map;
import org.javautil.util.ListOfNameValue;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv
public class Manufacturers {
  private static final String fileName = "src/main/resources/pdssr/manufacturers.csv";

  ListOfNameValue getManufacturers() throws IOException {
    ListOfNameValue lnv = new ListOfNameValue();
    File csvFile = new File(fileName);
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
    MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class)
        .with(schema)
        .readValues(csvFile);
    while (it.hasNext()) {
      Map<String, String> rowAsMap = it.next();
      System.out.println(rowAsMap);
    }
    return lnv;
  }
}
The data:
"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
"0000000055","HCONC", "Hershey Concession"
"0000000060","SNYDERS", "Snyder's of Hanover"
"0000000080","KELLOGG", "Kellogg & Keebler"
"0000000115","KARS", "Kar Nut Product (Kar's)"
"0000000135","MARS", "Mars Chocolate "
"0000000145","POORE", "Inventure Group (Poore Brothers)"
"0000000150","WOW", "WOW Foods"
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
"0000000170","MONOGRAM", "Monogram Food"
"0000000185","JUSTBORN", "Just Born"
"0000000190","HOSTESS", "Hostess, Dolly Madison"
"0000000210","SARALEE", "Sara Lee"
The exception is:
com.fasterxml.jackson.databind.exc.RuntimeJsonMappingException: Too many entries: expected at most 3 (value #3 (4 chars) "LLC"")
I thought I would throw out my own CSV parser and adopt a supported project with more functionality, but most of them are far slower, just plain break, or have examples all over the web that don't work with the current release of the product.
The problem is that your file does not meet the CSV standard: the third field always starts with a space.
"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
From wikipedia:
According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementors should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."
Jackson is being "liberal" in processing most of your records, but when it finds
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
it has no choice but to treat it as 4 fields:
'0000000160'
'CADBURY'
' "Cadbury Adam USA'
' LLC"'
I would suggest fixing the file, as that will allow parsing with most CSV libraries. You could also try another library; there is no shortage of them.
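If you control the file, a simple normalization pass makes it standard CSV before parsing. A sketch (my own, not from the answer); it assumes the sequence of a comma, a space, and a double quote only ever occurs between fields, as in the sample data:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: drop the stray space so every field starts directly with its quote.
String raw = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
Files.write(Paths.get("manufacturers-fixed.csv"),
    raw.replace(", \"", ",\"").getBytes(StandardCharsets.UTF_8));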
univocity-parsers can handle that without any issues. It's built to deal with all sorts of tricky and non-standard CSV files and is also faster than the parser you are using.
Try this code:
import com.univocity.parsers.common.record.Record;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.util.Map;

String fileName = "src/main/resources/pdssr/manufacturers.csv";

CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true); // use the first row as headers

CsvParser parser = new CsvParser(settings);
for (Record record : parser.iterateRecords(new File(fileName))) {
  Map<String, String> rowAsMap = record.toFieldMap();
  System.out.println(rowAsMap);
}
Hope this helps.
Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license)
When I used the SUTime feature of StanfordCoreNLP with the code given in its documentation, which uses an AnnotationPipeline to create the pipeline object, I was able to extract TIME from the string successfully.
The code used is:
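Roughly the following, per the SUTime documentation example (a reconstructed sketch, not the exact original snippet; the sample sentence and document date are mine):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

Properties props = new Properties();
AnnotationPipeline pipeline = new AnnotationPipeline();
pipeline.addAnnotator(new TokenizerAnnotator(false));
pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
pipeline.addAnnotator(new POSTaggerAnnotator(false));
pipeline.addAnnotator(new TimeAnnotator("sutime", props));

Annotation annotation = new Annotation("The meeting is at 5pm tomorrow.");
annotation.set(CoreAnnotations.DocDateAnnotation.class, "2018-06-28");
pipeline.annotate(annotation);
// each CoreMap is one time expression; getTemporal() yields the resolved value
for (CoreMap cm : annotation.get(TimeAnnotations.TimexAnnotations.class)) {
  System.out.println(cm + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal());
}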
But my project requires the StanfordCoreNLP pipeline, so when I used that pipeline to extract the TIME it gave me a NullPointerException.
My code is as follows:
The error I am encountering is as follows:
I also tried the solution suggested by @StanfordNLPHelp in this link:
Dates when using StanfordCoreNLP pipeline
The code is as follows :
But the error still persists:
The standard ner annotator will run SUTime. Please see this link for the Java API info:
https://stanfordnlp.github.io/CoreNLP/api.html
basic example:
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);
    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations =
        document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();
  }
}
You can remove the annotators after ner if you only want DATEs.
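For example, the trimmed annotator list might look like this (a sketch):

props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");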
The same TIMEX3 value that resulted from using obj.get(TimeExpression.Annotation.class).getTemporal() (e.g. 2018-06-29T17:00) got stored in NormalizedNamedEntityTagAnnotation.class when I used the ner tagger along with the StanfordCoreNLP pipeline. Detailed information can be found in the documentation of the Stanford Temporal Tagger.
The following code worked fine in extracting the date:
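Roughly like this (a reconstructed sketch of the approach, not the exact original code; the example sentence is mine):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
CoreDocument document = new CoreDocument("The meeting is at 5pm on June 29, 2018.");
pipeline.annotate(document);
for (CoreLabel token : document.tokens()) {
  String tag = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
  if ("DATE".equals(tag) || "TIME".equals(tag)) {
    // the normalized TIMEX3 value, e.g. 2018-06-29T17:00
    System.out.println(token.word() + " -> "
        + token.get(CoreAnnotations.NormalizedNamedEntityTagAnnotation.class));
  }
}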
I ran the same demo example from the website with the following sentence:
"Hudson was born in Hampstead, which is a suburb of London."
and it gives me the following:
Hudson be bear
and I was expecting the following relations:
(Hudson, was born in, Hampstead)
(Hampstead, is a suburb of, London)
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Collection;
import java.util.Properties;

/** A demo illustrating how to call the OpenIE system programmatically. */
public class OpenIEDemo {

  public static void main(String[] args) throws Exception {
    // Create the Stanford CoreNLP pipeline
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    //tokenize,ssplit,pos,lemma,depparse,natlog,openie
    //tokenize,ssplit,pos,lemma,ner,regexner,parse,mention,entitymentions,coref,kbp
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate an example document.
    Annotation doc = new Annotation(args[0]);
    pipeline.annotate(doc);

    // Loop over sentences in the document
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Get the OpenIE triples for the sentence
      Collection<RelationTriple> triples =
          sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
      // Print the triples
      for (RelationTriple triple : triples) {
        System.out.println(triple.confidence + "\t" +
            triple.subjectLemmaGloss() + "\t" +
            triple.relationLemmaGloss() + "\t" +
            triple.objectLemmaGloss());
      }
    }
  }
}
Thank you for your help!
So, the system is not wrong, though it is certainly undergenerating possible relations. "Hudson be bear" is just asserting that Hudson was born (a true fact). This particular case was caused by the ref edge Hampstead -ref-> which. This should be fixed in subsequent versions of the code.
In general though, like all NLP systems, OpenIE has an accuracy rate that's under 100%, and you should never expect the system to produce completely correct output, especially for a task like Open IE, where even getting agreement on what "correct" means is difficult.
I am looking for logic similar to what is described here: RelationExtraction NLP.
Following the process explained in that answer, I am able to get as far as NER and entity linking, but I am very confused by the "slot filling" logic and am not finding proper resources on the Internet.
Here is my code sample:
// NOTE: the imports and class declaration below are reconstructed; the original
// snippet began at main(). They follow the classic dcoref-era CoreNLP demo.
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RelationExtractionSample {

  public static void main(String[] args) throws IOException, ClassNotFoundException {
    // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    //String text = "Mary has a little lamb. She is very cute."; // Add your text here!
    String text = "Matrix Partners along with existing investors Sequoia Capital and Nexus Venture Partners has invested R100 Cr in Mumbai based food ordering app, TinyOwl. The series B funding will be used by the company to expand its geographical presence to over 50 cities, upgrade technology and enhance user experience.";
    text += "In December last year, it raised $3 Mn from Sequoia Capital India and Nexus Venture Partners to deepen its presence in home market Mumbai. It was seeded by Deap Ubhi (who had earlier founded Burrp) and Sandeep Tandon.";
    text += "Kunal Bahl and Rohit Bansal, were also said to be planning to invest in the company’s second round of fund raise.";
    text += "Founded by Harshvardhan Mandad and Gaurav Choudhary, TinyOwl claims to have tie-up with 5,000 restaurants and processes almost 2000 orders. The app which competes with the likes of FoodPanda aims to process over 50,000 daily orders.";
    text += "The top-line comes from the cut the company takes from each order placed through its app.";
    text += "The startup is also planning to come with reviews which would make it a competitor of Zomato, valued at $660 Mn. Also, Zomato is entering the food ordering business to expand its offerings.";
    text += "Recently another peer, Bengaluru based food delivery startup, SpoonJoy raised an undisclosed amount of funding from Sachin Bansal (Co-Founder Flipkart) and Mekin Maheshwari (CPO Flipkart), Abhishek Goyal (Founder, Tracxn) and Sahil Barua (Co-Founder, Delhivery).";
    text += "-TechCrunch";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        //System.out.println(" word \n"+word);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        //System.out.println(" pos \n"+pos);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
        //System.out.println(" ne \n"+ne);
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);
      System.out.println(" TREE \n" + tree);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
      System.out.println(" dependencies \n" + dependencies);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    System.out.println("graph \n " + graph);
  }
}
This gives output with the same entities combined. Now I have to take this further and find the relationships between these entities. For example, from the text above I should get in the output that "Matrix Partners" and "Sequoia Capital" have the relation "investor", or a similar kind of structure.
Please correct me if I am wrong somewhere, and point me to the correct way.
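One possible next step (my own sketch, not from the thread): newer CoreNLP versions ship the kbp annotator shown earlier in this thread, which emits relation triples per sentence and covers a good part of the slot-filling step.

import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.pipeline.*;
import java.util.Properties;

public class KbpRelationSketch {
  public static void main(String[] args) {
    // kbp needs the upstream annotators listed here (as in the tutorial example above)
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    CoreDocument document = new CoreDocument("Matrix Partners invested in TinyOwl.");
    pipeline.annotate(document);
    for (CoreSentence sentence : document.sentences()) {
      for (RelationTriple triple : sentence.relations()) {
        // prints subject | relation | object, e.g. an org:investment-style triple
        System.out.println(triple.subjectGloss() + " | "
            + triple.relationGloss() + " | " + triple.objectGloss());
      }
    }
  }
}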