I am absolutely new to Java development.
Can someone please explain how to obtain grammatical relations using Stanford's Natural Language Processing lexicalized parser (open-source Java code)?
Thanks!
See line 88 of the first file in my code for how to run the Stanford Parser programmatically:
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();
System.out.println("words: "+words);
System.out.println("POStags: "+tags);
System.out.println("stemmedWordsAndTags: "+stems);
System.out.println("typedDependencies: "+tdl);
The collection tdl is a list of these typed dependencies. If you look at the javadoc for TypedDependency you'll see that the .reln() method gets you the grammatical relation.
Lines 311-318 of the third file in my code show how to use that list of typed dependencies. I happen to get the name of the relation, but you could get the relation itself, which would be of the class GrammaticalRelation.
for (Iterator<TypedDependency> iter = tdl.iterator(); iter.hasNext(); ) {
    TypedDependency var = iter.next();
    TreeGraphNode dep = var.dep();
    TreeGraphNode gov = var.gov();
    // All useful information for a node in the tree
    String reln = var.reln().getShortName();
}
Don't feel bad, I spent a miserable day or two trying to figure out how to use the parser. I don't know if the docs have improved, but when I used it they were pretty damn awful.
I recently discovered the Stanford NLP parser and it seems quite amazing. I currently have a working instance of it running in our project, but I am facing the two problems mentioned below.
How can I parse text and then extract only specific part-of-speech labels from the parsed data? For example, how can I extract only NNPS and PRP tags from a sentence?
Our platform works in both English and German, so there is always a possibility that the text is either in English or German. How can I accommodate this scenario? Thank you.
Code :
private final String PCG_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
private final TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "invertible=true");
public void testParser() {
LexicalizedParser lp = LexicalizedParser.loadModel(PCG_MODEL);
String sent="Complete Howto guide to install EC2 Linux server in Amazon Web services cloud.";
Tree parse = lp.parse(sent);
List<TaggedWord> taggedWords = parse.taggedYield();
System.out.println(taggedWords);
}
The above example works, but as you can see I am loading the English data. Thank you.
Try this:
for (Tree subTree : parse) { // traverse the sentence's parse tree
    if (subTree.label().value().equals("NNPS")) { // if the node's label is NNPS
        // do what you want with the subtree here
    }
}
For query 1, I don't think stanford-nlp has a built-in option to extract only specific POS tags.
However, using custom trained models, you can achieve the same thing. I tried a similar requirement with custom models for NER (named entity recognition).
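For query 2, a minimal sketch of how the model could be switched by language (an assumption on my part: the separate Stanford German models jar, which provides germanPCFG.ser.gz, is on the classpath, and the lang value comes from your platform):

// Hypothetical language switch; "lang" would be supplied by your platform
String modelPath = "de".equals(lang)
        ? "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz"
        : "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath);
Tree parse = lp.parse(sent);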
I am new to ANTLR. My requirement is to programmatically parse a PL/SQL block and to be able to format it based on some conditions. For example:
1) I want to find all the commented code inside the SQL block.
2) I should be able to write the parsed/edited object back into some sql file.
Currently I have compiled the plsql-parser available at porcelli/pl-sql.
I also went through this helpful link. So, in a nutshell, I have parsed the SQL block.
PLSQLLexer lex = new PLSQLLexer(new ANTLRNoCaseFileStream(file));
CommonTokenStream tokens = new CommonTokenStream(lex);
PLSQLParser parser = new PLSQLParser(tokens);
/*start_rule_return AST =*/ parser.data_manipulation_language_statements();
System.err.println(file +": " + parser.getNumberOfSyntaxErrors());
//This is the place I want to build my tree
// parser.setBuildParseTree(true);
//ParseTree tree = parser.plsql_block();
I need some help or useful links in this direction. I am new to ANTLR, so any help in any direction will be appreciated.
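For requirement 1, a minimal sketch of one possible approach, assuming the ANTLR 4 build of the parser and a grammar that sends comments to the hidden channel (both assumptions worth verifying against porcelli/plsql-parser):

PLSQLLexer lex = new PLSQLLexer(new ANTLRNoCaseFileStream(file));
CommonTokenStream tokens = new CommonTokenStream(lex);
tokens.fill(); // force the stream to buffer every token, including hidden ones
for (Token t : tokens.getTokens()) {
    if (t.getChannel() == Token.HIDDEN_CHANNEL) {
        // hidden-channel tokens typically include comments (and possibly whitespace)
        System.out.println("line " + t.getLine() + ": " + t.getText());
    }
}

For requirement 2, ANTLR 4's TokenStreamRewriter is the usual tool for making targeted edits to the token stream and writing the result back out to a file.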
I am trying to use the Java-based NLP tool RFTagger in a Processing sketch in order to analyze tweets.
I am using Twitter4j as described here: http://blog.blprnt.com/blog/blprnt/updated-quick-tutorial-processing-twitter
I am using RFTagger to analyze the tweets: http://sifnos.sfs.uni-tuebingen.de/resource/A4/rftj/
After I filtered out all retweets, hashtags and profile names in order to have clean sentences to work with, the words of one sentence are stored in an ArrayList:
ArrayList<String> sentsTweet = new ArrayList<String>();
Now I’d like to have the sentence analyzed by RFTagger. I just implemented the library as described on the RFTagger Website:
List <String> tags = rft.getTags(sentsTweet);
Unfortunately, within Processing the class List is unknown/not available. Error message: Cannot find a class or type named "List"
I know I could transform the data into some other, manageable format, like this:
Object[] tags = (rft.getTags(sentsTweet)).toArray();
But I need to store the data as it is in order to send it a second time to RFTagger to use its tagset converter:
TagsetConverter conv = ConverterFactory.getConverter("stts");
List<String> sttsTags = new LinkedList<String>();
for ( String tag : tags ) {
sttsTags.add(conv.rftag2tag(tag));
}
Now, as List<String> doesn't work in Processing, do you have any idea how I could handle the data and the communication with RFTagger?
Kind regards,
Marv
This has nothing to do with the Processing library.
RFTagger.getTags() returns java.util.List, which is part of the JDK and JRE. You need to add an import for the List class:
import java.util.List;
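With that import in place, the code from the question should compile as-is inside the sketch. A minimal sketch, reusing the RFTagger calls exactly as they appear in the question (whose signatures I am taking on trust from the post):

import java.util.List;
import java.util.LinkedList;

// ... rft and sentsTweet set up as in the question ...
List<String> tags = rft.getTags(sentsTweet); // List now resolves
TagsetConverter conv = ConverterFactory.getConverter("stts");
List<String> sttsTags = new LinkedList<String>();
for (String tag : tags) {
    sttsTags.add(conv.rftag2tag(tag));
}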
I'm currently using the Twitter POS tagger available here to tag tweets with Penn Treebank tags.
Here is that code:
import java.io.IOException;
import java.util.List;
import cmu.arktweetnlp.Tagger;
import cmu.arktweetnlp.Tagger.TaggedToken;
/* Tags the tweet text */
List<TaggedToken> tagTweet(String text) throws IOException {
    Tagger tagger = new Tagger();
    // Loads Penn Treebank POS tags
    tagger.loadModel("res/model.ritter_ptb_alldata_fixed.txt");
    // Tags the tweet text
    List<TaggedToken> taggedTokens = tagger.tokenizeAndTag(text);
    return taggedTokens;
}
Now I need to identify where the direct objects are in these tags. After some searching, I've discovered that the Stanford Parser can do this by way of the Stanford typed dependencies (online example). By using the dobj() call, I should be able to get what I need.
However, I have not found any good documentation about how to feed already-tagged sentences into this tool. From what I understand, before using the Dependency Parser I need to create a tree from the sentence's tokens/tags. How is this done? I have not been able to find any example code.
The Twitter POS tagger contains an instance of the Stanford NLP tools, so I'm not far off; however, I am not familiar enough with the Stanford tools to feed my POS-tagged text into them to get the dependency parser to work properly. The FAQ does mention this functionality, but without any example code to go off of, I'm a bit stuck.
Here is how it is done with completely manual creation of the List discussed in the FAQ:
String[] sent3 = { "It", "can", "can", "it", "." };
// Parser gets second "can" wrong without help (parsing it as modal MD)
String[] tag3 = { "PRP", "MD", "VB", "PRP", "." };
List<TaggedWord> sentence3 = new ArrayList<TaggedWord>();
for (int i = 0; i < sent3.length; i++) {
sentence3.add(new TaggedWord(sent3[i], tag3[i]));
}
Tree parse = lp.parse(sentence3);
parse.pennPrint();
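From there, extracting direct objects should be a matter of pulling the typed dependencies out of the resulting tree and filtering on the dobj relation. A minimal sketch, assuming lp is the usual LexicalizedParser instance and using the same classes as elsewhere in this document:

TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
for (TypedDependency td : gs.typedDependenciesCollapsed()) {
    if (td.reln().getShortName().equals("dobj")) {
        // e.g. dobj(can-3, it-4) for "It can can it."
        System.out.println("direct object: " + td.dep());
    }
}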
I'm migrating my Java application from Lucene 2 to Lucene 4, and I cannot find any good way to convert my code. I also tried http://lucene.apache.org/core/4_0_0-ALPHA/MIGRATE.html, but the example code in it simply does not work (for example, the method reader.termDocsEnum does not exist for IndexReader or DirectoryReader, but only for AtomicReader, which I had never heard of).
Given an IndexReader called indexReader, the old code was:
Term find = new Term("field", "value");
TermDocs termDocs = indexReader.termDocs(find);
while (termDocs.next()) {
    Document d = indexReader.document(termDocs.doc());
    // do stuff
}
How can I convert that code?
Thanks!
The following should be relevant to your case:
The docs/positions enums cannot seek to a term. Instead, TermsEnum is able to seek, and then you request the docs/positions enum from that TermsEnum.
I guess you need this:
TermsEnum termsEnum = atomicReader.terms("fieldName").iterator(null);
BytesRef text = new BytesRef("searchTerm");
if (termsEnum.seekExact(text, true)) {
...
}
The low-level API is now clearly oriented towards atomic (non-composite) readers, because this is the only way to get top performance. You might wrap the composite reader you acquire from the Directory in a SlowCompositeReaderWrapper, but, as the class name already warns, it will be slow.
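Putting it together, a minimal sketch of the full Lucene 4 equivalent of the old loop (my best reconstruction against the 4.0 API; worth double-checking the exact signatures for your version):

import org.apache.lucene.document.Document;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Wrapping is slow; per-segment iteration over indexReader.leaves() is the fast path
AtomicReader atomicReader = SlowCompositeReaderWrapper.wrap(indexReader);
Terms terms = atomicReader.terms("field");
if (terms != null) {
    TermsEnum termsEnum = terms.iterator(null);
    if (termsEnum.seekExact(new BytesRef("value"), true)) {
        DocsEnum docsEnum = termsEnum.docs(atomicReader.getLiveDocs(), null);
        int docId;
        while ((docId = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            Document d = atomicReader.document(docId);
            // do stuff
        }
    }
}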