Extracting Date using StanfordCoreNLP pipeline instead of AnnotationPipeline - java

When I used the SUTime feature of Stanford CoreNLP with the code given in its documentation, which builds the pipeline object with AnnotationPipeline, I was able to extract TIME from the string successfully.
The code used is:
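(A minimal sketch of the AnnotationPipeline + SUTime usage the question refers to, illustrative only rather than the original snippet; the sample sentence and document date below are assumptions:)
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;
import java.util.List;
import java.util.Properties;

public class SUTimeSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    Annotation annotation = new Annotation("The meeting is at 5 pm on June 29, 2018.");
    // SUTime needs a reference date to resolve relative expressions
    annotation.set(CoreAnnotations.DocDateAnnotation.class, "2018-06-29");
    pipeline.annotate(annotation);

    // each detected time expression carries a TIMEX3 value
    List<CoreMap> timexAnns = annotation.get(TimeAnnotations.TimexAnnotations.class);
    for (CoreMap cm : timexAnns) {
      System.out.println(cm + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal());
    }
  }
}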
But my project requires the StanfordCoreNLP pipeline, and when I used that pipeline to extract the TIME it gave me a NullPointerException.
My code is as follows:
The error I am encountering is as follows:
I also tried the solution suggested by @StanfordNLPHelp in this link:
Dates when using StanfordCoreNLP pipeline
The code is as follows:
But the error still persists:

The standard ner annotator will run SUTime. Please see this link for the Java API info:
https://stanfordnlp.github.io/CoreNLP/api.html
basic example:
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);

    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations = document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();
  }

}
You can remove the annotators after ner if you only want DATEs.
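For instance, a minimal sketch with a trimmed annotator list (a fragment reusing the imports from the example above; the sample sentence and the choice to read dates off entity mentions are my own, not from the original answer):
Properties props = new Properties();
// tokenize, ssplit, pos and lemma are what ner (which runs SUTime) needs
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

CoreDocument document = new CoreDocument("His flight left at 3:00pm on July 10th, 2017.");
pipeline.annotate(document);

// entity mentions tagged DATE or TIME carry the SUTime output
for (CoreEntityMention em : document.entityMentions()) {
  if ("DATE".equals(em.entityType()) || "TIME".equals(em.entityType())) {
    System.out.println(em.text() + " -> " + em.entityType());
  }
}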

The same TIMEX3 value that resulted from using:
obj.get(TimeExpression.Annotation.class).getTemporal() ---> 2018-06-29T17:00
is stored in NormalizedNamedEntityTagAnnotation.class when the ner annotator is run as part of the StanfordCoreNLP pipeline. Detailed information can be found in the documentation of the Stanford Temporal Tagger.
The following code worked fine for extracting the date:
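(A minimal sketch of that approach, illustrative only rather than the original snippet, reusing the imports from the example above; the sample text and annotator list are assumptions:)
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation document = new Annotation("Let's meet at 5 pm on June 29, 2018.");
pipeline.annotate(document);

for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
  for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
    String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
    // SUTime's TIMEX3 value ends up here when ner runs inside the StanfordCoreNLP pipeline
    String normalized = token.get(CoreAnnotations.NormalizedNamedEntityTagAnnotation.class);
    if ("DATE".equals(ner) || "TIME".equals(ner)) {
      System.out.println(token.word() + " -> " + normalized);   // e.g. 2018-06-29T17:00
    }
  }
}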

Related

How to extract Wikipedia entity matched to CoreEntityMention (WikiDictAnnotator)

I am running CoreNLP over some text, and matching the entities found to Wikipedia entities. I want to reconstruct the sentence providing the link and other useful information for the entities found.
The CoreEntityMention has an entity() method, but it just returns a String.
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitylink");
// set up pipeline
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// wrap the text in a CoreDocument and annotate it
CoreDocument doc = new CoreDocument("text goes here");
pipeline.annotate(doc);
// Iterate the sentences
for (CoreSentence sentence : doc.sentences()) {
  // Go through all mentions
  for (CoreEntityMention em : sentence.entityMentions()) {
    System.out.println(em.sentence());
    // Here I would like to extract the Wikipedia entity information
    System.out.println(em.entity());
  }
}
You just need to prepend the Wikipedia page URL.
So Neil_Armstrong maps to https://en.wikipedia.org/wiki/Neil_Armstrong.
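In code, that would be something along these lines (an assumption on my part that em.entity() returns the linked page title such as "Neil_Armstrong", as the answer implies):
String wikiTitle = em.entity();
if (wikiTitle != null) {
  // build the full article URL from the linked title
  String url = "https://en.wikipedia.org/wiki/" + wikiTitle;
  System.out.println(em.text() + " -> " + url);
}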

Could you please help me with the following stanford-nlp OpenIE issue

I ran the demo example from the website on the following sentence:
"Hudson was born in Hampstead, which is a suburb of London."
and it gives me the following:
Hudson be bear
whereas I was expecting the following relations:
(Hudson, was born in, Hampstead)
(Hampstead, is a suburb of, London)
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Collection;
import java.util.Properties;

/** A demo illustrating how to call the OpenIE system programmatically. */
public class OpenIEDemo {

  public static void main(String[] args) throws Exception {
    // Create the Stanford CoreNLP pipeline
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    //tokenize,ssplit,pos,lemma,depparse,natlog,openie
    //tokenize,ssplit,pos,lemma,ner,regexner,parse,mention,entitymentions,coref,kbp
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate an example document.
    Annotation doc = new Annotation(args[0]);
    pipeline.annotate(doc);

    // Loop over sentences in the document
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Get the OpenIE triples for the sentence
      Collection<RelationTriple> triples =
          sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
      // Print the triples
      for (RelationTriple triple : triples) {
        System.out.println(triple.confidence + "\t" +
            triple.subjectLemmaGloss() + "\t" +
            triple.relationLemmaGloss() + "\t" +
            triple.objectLemmaGloss());
      }
    }
  }
}
Thank you for your help
So the system is not wrong, though it is certainly undergenerating possible relations. "Hudson be bear" is just asserting that Hudson was born (a true fact). This particular case was caused by the ref edge from Hampstead -ref-> which; it should be fixed in subsequent versions of the code.
In general though, like all NLP systems, OpenIE has a certain accuracy rate that's under 100%, and you should never expect the system to produce completely correct output. Especially for a task like Open IE, where even getting agreement on what "correct" means is difficult.

StanfordNLP: models from kbp not found (Eclipse)

I am a bit new to Java and Eclipse; I usually use Python and NLTK for NLP tasks.
I am trying to follow the tutorial provided here:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);

    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations = document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();
  }

}
but I always get the following output containing an error. This happens for all modules relating to kbp, and I did add the jar files as requested by the tutorial:
Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Couldn't read TokensRegexNER from edu/stanford/nlp/models/kbp/regexner_caseless.tab
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:593)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.<init>(TokensRegexNERAnnotator.java:293)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.setUpFineGrainedNER(NERCombinerAnnotator.java:209)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:152)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$45(StanfordCoreNLP.java:546)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$70(StanfordCoreNLP.java:625)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
    at NLP.Start.main(Start.java:13)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/kbp/regexner_caseless.tab" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
    at edu.stanford.nlp.io.IOUtils.readerFromString(IOUtils.java:618)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:590)
    ... 14 more
Do you have any idea how to fix this? Thanks in advance!
Probably you forgot to add stanford-corenlp-3.9.1-models.jar to your class path.
Well, according to the models page, there is a separate model download for the kbp material. Perhaps you have access to stanford-english-corenlp-2018-02-27-models, but not to stanford-english-kbp-corenlp-2018-02-27-models? I would guess this because it appears that other models were found, judging from what you provided us in the question.
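As a quick check (a diagnostic sketch of my own, not part of the original answer), you can ask the classpath for the exact resource named in the stack trace; a null result means the kbp models jar is not visible to your Eclipse build/run path:
// prints null when the kbp models jar is missing from the classpath
System.out.println(ClassLoader.getSystemClassLoader()
    .getResource("edu/stanford/nlp/models/kbp/regexner_caseless.tab"));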

Customized Relationship Extraction Between two Entities Stanford NLP

I am looking for logic similar to what is described here: RelationExtraction NLP
Following the process explained in that answer, I am able to get as far as NER and entity linking, but I am very confused by the "slot filling" logic and am not finding proper resources on the Internet.
Here is my code sample:
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class RelationExtractionSample {

  public static void main(String[] args) throws IOException, ClassNotFoundException {
    // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    //String text = "Mary has a little lamb. She is very cute."; // Add your text here!
    String text = "Matrix Partners along with existing investors Sequoia Capital and Nexus Venture Partners has invested R100 Cr in Mumbai based food ordering app, TinyOwl. The series B funding will be used by the company to expand its geographical presence to over 50 cities, upgrade technology and enhance user experience.";
    text += "In December last year, it raised $3 Mn from Sequoia Capital India and Nexus Venture Partners to deepen its presence in home market Mumbai. It was seeded by Deap Ubhi (who had earlier founded Burrp) and Sandeep Tandon.";
    text += "Kunal Bahl and Rohit Bansal, were also said to be planning to invest in the company’s second round of fund raise.";
    text += "Founded by Harshvardhan Mandad and Gaurav Choudhary, TinyOwl claims to have tie-up with 5,000 restaurants and processes almost 2000 orders. The app which competes with the likes of FoodPanda aims to process over 50,000 daily orders.";
    text += "The top-line comes from the cut the company takes from each order placed through its app.";
    text += "The startup is also planning to come with reviews which would make it a competitor of Zomato, valued at $660 Mn. Also, Zomato is entering the food ordering business to expand its offerings.";
    text += "Recently another peer, Bengaluru based food delivery startup, SpoonJoy raised an undisclosed amount of funding from Sachin Bansal (Co-Founder Flipkart) and Mekin Maheshwari (CPO Flipkart), Abhishek Goyal (Founder, Tracxn) and Sahil Barua (Co-Founder, Delhivery).";
    text += "-TechCrunch";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        //System.out.println(" word \n"+word);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // System.out.println(" pos \n"+pos);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
        //System.out.println(" ne \n"+ne);
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);
      System.out.println(" TREE \n" + tree);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
      System.out.println(" dependencies \n" + dependencies);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    System.out.println("graph \n " + graph);
  }
}
This gives output with the same entities combined; now I have to take this further and find relationships between those entities. For example, from the text in the code I should get in the output that "Matrix Partners" and "Sequoia Capital" have the relation "investor", or a similar kind of structure.
Please correct me if I am wrong somewhere and point me towards the correct way.
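One route worth sketching (an illustration of my own using the kbp annotator and the CoreSentence.relations() call shown in the BasicPipelineExample above, not a custom slot-filling system; note the relation labels come from the KBP schema, e.g. org:founded_by, rather than a custom label like "investor"):
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

CoreDocument document = new CoreDocument(text);
pipeline.annotate(document);

// print every KBP relation triple found in the document
for (CoreSentence sent : document.sentences()) {
  for (RelationTriple triple : sent.relations()) {
    System.out.println(triple.subjectGloss() + " | " + triple.relationGloss() + " | " + triple.objectGloss());
  }
}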

ROME API to parse RSS/Atom

I'm trying to parse RSS/Atom feeds with the ROME library. I am new to Java, so I am not in tune with many of its intricacies.
Does ROME automatically use its modules to handle different feeds as it comes across them, or do I have to ask it to use them? If so, any direction on this would be appreciated.
How do I get to the correct 'source'? I was trying to use item.getSource(), but it is giving me fits. I guess I am using the wrong interface. Some direction would be much appreciated.
Here is the meat of what I have for collecting my data.
I noted two areas where I am having problems, both revolving around getting the source information of the feed. And by source, I mean CNN, or FoxNews, or whoever, not the author.
Judging from my reading, .getSource() is the correct method.
List<String> feedList = theFeeds.getFeeds();
List<FeedData> feedOutput = new ArrayList<FeedData>();

for (String sites : feedList) {
  URL feedUrl = new URL(sites);
  SyndFeedInput input = new SyndFeedInput();
  SyndFeed feed = input.build(new XmlReader(feedUrl));
  List<SyndEntry> entries = feed.getEntries();

  for (SyndEntry item : entries) {
    String title = item.getTitle();
    String link = item.getUri();
    Date date = item.getPublishedDate();
    SyndEntry source = item.getSource();   // <-- Problem here
    String description;
    if (item.getDescription() == null) {
      description = "";
    } else {
      description = item.getDescription().getValue();
    }
    String cleanDescription = description.replaceAll("\\<.*?>", "").replaceAll("\\s+", " ");

    FeedData feedData = new FeedData();
    feedData.setTitle(title);
    feedData.setLink(link);
    feedData.setSource(link);              // <-- And here
    feedData.setDate(date);
    feedData.setDescription(cleanDescription);
    String preview = createPreview(cleanDescription);
    feedData.setPreview(preview);
    feedOutput.add(feedData);

    // lets print out my pieces.
    System.out.println("Title: " + title);
    System.out.println("Date: " + date);
    System.out.println("Text: " + cleanDescription);
    System.out.println("Preview: " + preview);
    System.out.println("*****");
  }
}
getSource() is definitely wrong - it returns the SyndFeed to which the entry in question belongs. Perhaps what you want is getContributors()?
As far as modules go, they should be selected automatically. You can even write your own and plug it in as described here.
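If what you actually want is the channel name (CNN, FoxNews, and so on), one option (my own suggestion, not part of the original answer) is to take the feed-level title from the SyndFeed you already built for each URL:
// the feed-level title is usually the publisher/channel name
String sourceName = feed.getTitle();
feedData.setSource(sourceName);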
What about trying to regex the source out of the URL, without using the API?
That was my first thought; anyway, I checked against the standardized RSS format itself to get an idea of whether this option is actually available at this level, and then tried to trace its implementation upwards...
In RSS 2.0 I found the source element; however, it appears that it doesn't exist in previous versions of the spec - not good news for us!
<source> is an optional sub-element of <item>.
Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.
