I ran the demo example from the website on the following sentence:
"Hudson was born in Hampstead, which is a suburb of London."
and it gave me the following:
Hudson be bear
I was expecting the following relations:
(Hudson, was born in, Hampstead)
(Hampstead, is a suburb of, London)
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Collection;
import java.util.Properties;
/**
 * A demo illustrating how to call the OpenIE system programmatically.
 */
public class OpenIEDemo {
    public static void main(String[] args) throws Exception {
        // Create the Stanford CoreNLP pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        //tokenize,ssplit,pos,lemma,depparse,natlog,openie
        //tokenize,ssplit,pos,lemma,ner,regexner,parse,mention,entitymentions,coref,kbp
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Annotate an example document.
        Annotation doc = new Annotation(args[0]);
        pipeline.annotate(doc);
        // Loop over sentences in the document
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Get the OpenIE triples for the sentence
            Collection<RelationTriple> triples =
                    sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
            // Print the triples
            for (RelationTriple triple : triples) {
                System.out.println(triple.confidence + "\t" +
                        triple.subjectLemmaGloss() + "\t" +
                        triple.relationLemmaGloss() + "\t" +
                        triple.objectLemmaGloss());
            }
        }
    }
}
Thank you for your help
So, the system is not wrong, though certainly undergenerating possible relations. Hudson be bear is just asserting that Hudson was born (a true fact). This in particular was caused by the ref edge from Hampstead -ref-> which. This should be fixed in subsequent versions of the code.
In general though, like all NLP systems, OpenIE has an accuracy rate that's under 100%, and you should never expect the system to produce completely correct output, especially for a task like Open IE, where even getting agreement on what "correct" means is difficult.
I'm having trouble working out how to use WEKA filters in Java code. The help I've found seems a little dated, as I'm using WEKA 3.8.5. I'm running 3 tests. Test 1: no filter; Test 2: weka.filters.supervised.instance.SpreadSubsample -M 1.0; and Test 3: weka.filters.supervised.instance.Resample -B 1.0 -Z 130.3.
If my research is correct, I should import the filters like this. What I'm lost on is how to pass "-M 1.0" to SpreadSubsample (my undersampling test) and "-B 1.0 -Z 130.3" to Resample (my oversampling test).
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.SpreadSubsample;
Test 1 (my no-filter test) is coded below:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
public class Fraud {
    public static void main(String[] args) {
        try {
            // Creating J48 classifier for the tree
            J48 j48Classifier = new J48();
            // Setting the path for the dataset
            String FraudDataset = "C:\\Users\\Owner\\Desktop\\CreditCard\\CreditCard.arff";
            BufferedReader bufferedReader
                    = new BufferedReader(
                            new FileReader(FraudDataset));
            // Creating the data set instances
            Instances datasetInstances
                    = new Instances(bufferedReader);
            datasetInstances.setClassIndex(
                    datasetInstances.numAttributes() - 1);
            Evaluation evaluation
                    = new Evaluation(datasetInstances);
            // Cross Validate Model. 10 Folds
            evaluation.crossValidateModel(
                    j48Classifier, datasetInstances, 10,
                    new Random(1));
            System.out.println(evaluation.toSummaryString(
                    "\nResults", false));
        }
        // Catching exceptions
        catch (Exception e) {
            System.out.println("Error Occured!!!! \n"
                    + e.getMessage());
        }
        System.out.print("DT Successfully executed.");
    }
}
The output of my code is:
Results
Correctly Classified Instances 284649 99.9445 %
Incorrectly Classified Instances 158 0.0555 %
Kappa statistic 0.8257
Mean absolute error 0.0008
Root mean squared error 0.0232
Relative absolute error 24.2995 %
Root relative squared error 55.9107 %
Total Number of Instances 284807
DT Successfully executed.
Does anyone have an idea on how I can add the filters and the settings I want for the filters to the code for Test 2 and 3? Any help will be appreciated. I will run the 3 tests multiple times and compare the results. I want to see what works best of the 3.
-M 1.0 and -B 1.0 -Z 130.3 are the options that you supply to the filters from the command-line. These filters implement the weka.core.OptionHandler interface, which offers the setOptions and getOptions methods.
For example, SpreadSubsample can be instantiated like this:
import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
SpreadSubsample spread = new SpreadSubsample();
// Utils.splitOptions generates an array from an option string
spread.setOptions(Utils.splitOptions("-M 1.0"));
// alternatively:
// spread.setOptions(new String[]{"-M", "1.0"});
In order to apply the filters, you should use the FilteredClassifier approach. E.g., for SpreadSubsample you would do something like this:
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
// base classifier
J48 j48 = new J48();
// filter
SpreadSubsample spread = new SpreadSubsample();
spread.setOptions(Utils.splitOptions("-M 1.0"));
// meta-classifier
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(spread);
fc.setClassifier(j48);
And then evaluate the fc classifier object on your dataset.
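For intuition about what "-M 1.0" asks of SpreadSubsample, here is a plain-Java sketch that does not use Weka at all. It assumes, based on my reading of the Weka documentation for -M (0 = no maximum spread, 1 = uniform distribution, 10 = at most a 10:1 ratio between classes), that each class is capped at M times the size of the rarest class. The class name SpreadSketch, the method capPerClass and the class counts are made up for illustration; they are not part of Weka or of your dataset.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpreadSketch {

    /**
     * Caps every class count at maxSpread times the smallest class count.
     * A maxSpread of 0 (or less) is treated as "no cap". This only mimics
     * the assumed meaning of the -M option; it is not Weka's implementation.
     */
    static Map<String, Integer> capPerClass(Map<String, Integer> counts, double maxSpread) {
        int smallest = Integer.MAX_VALUE;
        for (int count : counts.values()) {
            smallest = Math.min(smallest, count);
        }
        Map<String, Integer> capped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int count = entry.getValue();
            if (maxSpread > 0) {
                count = (int) Math.min(count, Math.floor(smallest * maxSpread));
            }
            capped.put(entry.getKey(), count);
        }
        return capped;
    }

    public static void main(String[] args) {
        // Hypothetical class counts, chosen only to make the arithmetic easy to follow
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("legit", 9500);
        counts.put("fraud", 500);
        System.out.println(capPerClass(counts, 1.0));  // prints {legit=500, fraud=500}
        System.out.println(capPerClass(counts, 10.0)); // prints {legit=5000, fraud=500}
    }
}
```

With a spread of 1.0 the majority class is cut down to the size of the minority class, which is why this filter works as the undersampling step in Test 2.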
The code:
package org.javautil.salesdata;
import java.io.File;
import java.io.IOException;
import java.util.Map;
import org.javautil.util.ListOfNameValue;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
// https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv
public class Manufacturers {
    private static final String fileName = "src/main/resources/pdssr/manufacturers.csv";
    ListOfNameValue getManufacturers() throws IOException {
        ListOfNameValue lnv = new ListOfNameValue();
        File csvFile = new File(fileName);
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
        MappingIterator<Map<String,String>> it = mapper.readerFor(Map.class)
                .with(schema)
                .readValues(csvFile);
        while (it.hasNext()) {
            Map<String,String> rowAsMap = it.next();
            System.out.println(rowAsMap);
        }
        return lnv;
    }
}
The data:
"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
"0000000055","HCONC", "Hershey Concession"
"0000000060","SNYDERS", "Snyder's of Hanover"
"0000000080","KELLOGG", "Kellogg & Keebler"
"0000000115","KARS", "Kar Nut Product (Kar's)"
"0000000135","MARS", "Mars Chocolate "
"0000000145","POORE", "Inventure Group (Poore Brothers)"
"0000000150","WOW", "WOW Foods"
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
"0000000170","MONOGRAM", "Monogram Food"
"0000000185","JUSTBORN", "Just Born"
"0000000190","HOSTESS", "Hostess, Dolly Madison"
"0000000210","SARALEE", "Sara Lee"
The exception is
fasterxml.jackson.databind.exc.RuntimeJsonMappingException: Too many entries: expected at most 3 (value #3 (4 chars) "LLC"")
I thought I would throw out my own CSV parser and adopt a supported project with more functionality, but most of them are far slower, just plain break, or have examples all over the web that don't work with the current release of the product.
The problem is that your file does not meet the CSV standard. The third field always starts with a space:
"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
From Wikipedia:
According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementors should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."
Jackson is being "liberal" in processing most of your records; but when it finds
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
it has no choice but to treat it as 4 fields:
'0000000160'
'CADBURY'
' "Cadbury Adam USA'
' LLC"'
I would suggest fixing the file, as that will allow it to be parsed by most CSV libraries. Alternatively, you could try another library; there is no shortage of them.
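If regenerating the file at its source is not possible, the offending spaces can be removed before parsing. The following is a minimal sketch using only the JDK, not anything from Jackson: it drops spaces and tabs that appear between a comma (or the start of the line) and the start of the next field, and leaves everything inside quotes alone. The class and method names are made up for illustration, and it assumes that no quoted field spans more than one line.

```java
class CsvLeadingSpaceFix {

    /**
     * Removes spaces and tabs that sit between a field separator (or the
     * start of the line) and the first character of the next field.
     * Whitespace inside quotes or in the middle of a field is kept.
     */
    static String stripLeadingFieldWhitespace(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!inQuotes && atFieldStart && (c == ' ' || c == '\t')) {
                continue; // whitespace before the field really starts
            }
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct
                inQuotes = !inQuotes;
            }
            // A new field starts only after a comma that is outside quotes
            atFieldStart = !inQuotes && c == ',';
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "\"0000000160\",\"CADBURY\", \"Cadbury Adam USA, LLC\"";
        // prints "0000000160","CADBURY","Cadbury Adam USA, LLC"
        System.out.println(stripLeadingFieldWhitespace(broken));
    }
}
```

Running each line of the file through this method and writing the result to a new file should give a file in which every quoted field starts directly after the comma, so the comma inside "Cadbury Adam USA, LLC" is no longer taken as a separator.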
univocity-parsers can handle that without any issues. It's built to deal with all sorts of tricky and non-standard CSV files and is also faster than the parser you are using.
Try this code:
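import com.univocity.parsers.common.record.Record;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.util.Map;
...
String fileName = "src/main/resources/pdssr/manufacturers.csv";
CsvParserSettings settings = new CsvParserSettings();
// use the first row as the header
settings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(settings);
for (Record record : parser.iterateRecords(new File(fileName))) {
    Map<String, String> rowAsMap = record.toFieldMap();
    System.out.println(rowAsMap);
}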
Hope this helps.
Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license)
I found a bit of code by BalusC which was edited by another user: Pisek, and was wondering how to read data from another website.
I understand how to find the new class name to read different parts of data but I'm not sure how to read the quantity of the product.
Here's my code so far:
package internalAssessment;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class practiceArea {
    public static void main(String[] args) throws Exception {
        String url = "https://www.tesco.com/groceries/product/details/?id=265485175";
        Document document = Jsoup.connect(url).get();
        String price = document.select(".linePrice").text();
        System.out.println("Price: " + price);
        String quantity = document.select("").text();
        System.out.println("Quantity: " + quantity);
    }
}
You get the price by selecting on its class:
String price = document.select(".linePrice").text();
You can also get the quantity with its class (or by its id):
document.select(".quantity").attr("value"); // by class
document.select("#qty-265485175-1").attr("value"); // by id
What differs is how you read the number: here it is held in a value attribute, so you use .attr("value") rather than .text().
As I said in the comments: this opens a new connection to the website, so there is no reason for the value you get to be anything other than 1.
I need help with my Java project using Jsoup (if you think there is a more efficient way to achieve the purpose, please let me know). The purpose of my program is to parse certain useful information from different URLs and put it in a text file. I am not an expert in HTML or JavaScript, therefore, it has been difficult for me to code in Java exactly what I want to parse.
In the website that you see in the code below as one of the examples, the information that interests me to parse with Jsoup is everything you can see in the table under “Routing”(Route, Location, Vessel/Voyage, Container Arrival Date, Container Departure Date; = Origin, Seattle SSA Terminal T18, 26 Jun 15 A, 26 Jun 15 A… and so on).
So far, with Jsoup we are only able to parse the title of the website, yet we have been unsuccessful in getting any of the body.
Here is the code that I used, which I got from an online source:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Jsouptest71115 {
    public static void main(String[] args) throws Exception {
        // No space before the query string, otherwise the URL is malformed
        String url = "http://google.com/gentrack/trackingMain.do"
                + "?trackInput01=999061985";
        Document document = Jsoup.connect(url).get();
        String title = document.title();
        System.out.println("title : " + title);
        String body = document.select("body").text();
        System.out.println("Body: " + body);
    }
}
Working code:
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.util.ArrayList;
public class Sample {
    public static void main(String[] args) {
        String url = "http://homeport8.apl.com/gentrack/blRoutingPopup.do";
        try {
            Connection.Response response = Jsoup.connect(url)
                    .data("blNbr", "999061985") // tracking number
                    .method(Connection.Method.POST)
                    .execute();
            Element tableElement = response.parse().getElementsByTag("table")
                    .get(2).getElementsByTag("table")
                    .get(2);
            Elements trElements = tableElement.getElementsByTag("tr");
            ArrayList<ArrayList<String>> tableArrayList = new ArrayList<>();
            for (Element trElement : trElements) {
                ArrayList<String> columnList = new ArrayList<>();
                for (int i = 0; i < 5; i++) {
                    columnList.add(i, trElement.children().get(i).text());
                }
                tableArrayList.add(columnList);
            }
            System.out.println("Origin/Location: "
                    + tableArrayList.get(1).get(1)); // row and column number
            System.out.println("Discharge Port/Container Arrival Date: "
                    + tableArrayList.get(5).get(3));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
Origin/Location: SEATTLE SSA TERMINAL (T18), WA
Discharge Port/Container Arrival Date: 23 Jul 15 E
You need to use the document.select(...) method, whose input is a CSS selector (document.select("body") in your code is one example). To learn more about CSS selectors, just google them. Using CSS selectors you can easily pick out parts of the web page body.
In your particular case you will hit a different problem, though: the table you are after is inside an iframe. If you look at the HTML of the page you are visiting, the iframe's URL is "http://homeport8.apl.com/gentrack/blRoutingFrame.do". If you visit this URL directly to access its content, you will get an exception, which is perhaps due to some restriction on the server. To get the content properly you need to visit two URLs via jsoup:
1. http://homeport8.apl.com/gentrack/trackingMain.do?trackInput01=999061985
2. http://homeport8.apl.com/gentrack/blRoutingFrame.do?trackInput01=999061985
For the first URL you'll get nothing useful, but for the second URL you'll get the tables you are interested in. Then try document.select("table"), which gives you a list of tables; iterate over this list and find the table you want. Once you have the table, use Element.select("tr") to get its rows, and then for each "tr" use Element.select("td") to get the cell data.
The web page you are visiting doesn't use CSS class and id attributes, which would have made reading it with jsoup a lot easier, so I am afraid iterating over document.select("table") is your best and easiest option.
Good Luck.
I have a process in Talend which gets the search result of a page, saves the html and writes it into files, as seen here:
Initially I had a two-step process, with the data being parsed out of the HTML files in Java. The code below does basically that; it works and writes the data to a MySQL database. (I'm a beginner, sorry for the lack of elegance.)
package org.jsoup.examples;
import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;
import java.io.IOException;
public class parse2 {
    static parse2 parseIt2 = new parse2();
    String companyName = "Platzhalter";
    String jobTitle = "Platzhalter";
    String location = "Platzhalter";
    String timeAdded = "Platzhalter";
    public static void main(String[] args) throws IOException {
        parseIt2.getData();
    }
    //
    public void getData() throws IOException {
        Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
        Elements elements = document.select(".joblisting");
        for (Element element : elements) {
            // Parse Data into Elements
            Elements jobTitleElement = element.select(".job_title span");
            Elements companyNameElement = element.select(".company_name span[itemprop=name]");
            Elements locationElement = element.select(".locality span[itemprop=addressLocality]");
            Elements dateElement = element.select(".job_date_added [datetime]");
            // Strip Data from unnecessary tags
            String companyName = companyNameElement.text();
            String jobTitle = jobTitleElement.text();
            String location = locationElement.text();
            String timeAdded = dateElement.attr("datetime");
            System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t" + location + " \t Erstellt am \t" + timeAdded);
        }
    }
}
Now I want to do the whole process end-to-end in Talend, and I have been assured that this works.
I tried this (which looks quite shady to me):
Basically I put all imports in "advanced settings" and the code in the "basic settings" section. The importLibrary component is meant to load the jsoup parsing library, as well as the MySQL connector (I might do the connection with Talend tools, though).
Obviously this isn't working. I tried stripping the class declarations and the like from the base code, and it was even worse. Can you help me get the generated .txt files parsed with Java here?
EDIT: Here is the Link to the talend Job http://www.share-online.biz/dl/8M5MD99NR1
EDIT2: I changed the code to the one I tried in JavaFlex, but it didn't work (the import part in the start part of the code, the rest in "body/main" and nothing in "end").
This is a Talend-specific problem: in your code, refer to classes by their fully qualified names, including their packages. For your document parsing, for example, you can use:
org.jsoup.nodes.Document document = org.jsoup.Jsoup.parse(new java.io.File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
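The same pattern, sketched with JDK classes only so it can be tried outside Talend: the code below contains no import statements at all, and every class outside java.lang is written with its package. This assumes that the Talend code component places your code inside a generated method, where import statements are not allowed. The class name NoImportsSketch, the method name and the sample text are made up for illustration.

```java
class NoImportsSketch {

    // Every type outside java.lang is spelled out with its package,
    // so this compiles without a single import statement.
    static java.util.List<String> nonBlankLines(String text) {
        java.util.List<String> lines = new java.util.ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [Firma A, Firma B]
        System.out.println(nonBlankLines("Firma A\n\n  Firma B  \n"));
    }
}
```

Only the body of such a method would go into the component's code section; the surrounding class is there just to make the sketch runnable on its own.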