NoSuchMethodError when trying to create a SPARQL query with Jena - Java

I am trying to make some SPARQL queries using vc-db-1.rdf and q1.rq from the ARQ examples. Here is my Java code:
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.query.ARQ;
import com.hp.hpl.jena.iri.*;
import java.io.*;

public class querier extends Object
{
    static final String inputFileName = "vc-db-1.rdf";

    public static void main(String args[])
    {
        // Create an empty in-memory model
        Model model = ModelFactory.createDefaultModel();
        // Use the FileManager to open the RDF graph from the filesystem
        InputStream in = FileManager.get().open(inputFileName);
        if (in == null)
        {
            throw new IllegalArgumentException("File: " + inputFileName + " not found");
        }
        // Read the RDF/XML file
        model.read(in, "");
        // Create a new query
        String queryString = "PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?y ?givenName WHERE { ?y vcard:Family \"Smith\" . ?y vcard:Given ?givenName . }";
        QueryFactory.create(queryString);
    }
}
Compilation passes just fine.
The problem is that the query is never even executed; I get an error while creating it, at the line
QueryFactory.create(queryString);
with the following stack trace:
C:\Wallet\projects\java\ARQ_queries>java querier
Exception in thread "main" java.lang.NoSuchMethodError: com.hp.hpl.jena.iri.IRI.
resolve(Ljava/lang/String;)Lcom/hp/hpl/jena/iri/IRI;
at com.hp.hpl.jena.n3.IRIResolver.resolveGlobal(IRIResolver.java:191)
at com.hp.hpl.jena.sparql.mgt.SystemInfo.createIRI(SystemInfo.java:31)
at com.hp.hpl.jena.sparql.mgt.SystemInfo.<init>(SystemInfo.java:23)
at com.hp.hpl.jena.query.ARQ.init(ARQ.java:373)
at com.hp.hpl.jena.query.ARQ.<clinit>(ARQ.java:385)
at com.hp.hpl.jena.query.Query.<clinit>(Query.java:53)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:68)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:40)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:28)
at querier.main(querier.java:24)
How can I solve this? Thank you.

It looks like you're missing the IRI library on the classpath (the IRI library is shipped separately from the main Jena JAR). Jena has runtime dependencies on several other libraries, which are included in the lib directory of the Jena distribution. All of these need to be on your classpath at runtime (but not necessarily at compile time).
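For example, on Windows you could run with something like the following (a sketch only; C:\path\to\Jena is a placeholder for wherever you unpacked the Jena distribution, and the lib\* wildcard pulls in every JAR in its lib directory, which works on Java 6 and later):
java -cp .;C:\path\to\Jena\lib\* querier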

Related

Read Parquet File with illegal characters (Apache-Avro)

I have some Parquet files written in Python using PyArrow. Now I want to read them using a Java program. I tried the following, using Apache Avro:
import java.io.IOException;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.hadoop.ParquetReader;

public class Main {
    private static Path path = new Path("D:\\pathToFile\\review.parquet");

    public static void main(String[] args) throws IllegalArgumentException {
        try {
            Configuration conf = new Configuration();
            Schema schema = SchemaBuilder.record("lineitem")
                    .fields()
                    .name("reviewID")
                    .aliases("review_id$str")
                    .type().stringType()
                    .noDefault()
                    .endRecord();
            conf.set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, schema.toString());
            ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(path)
                    .withConf(conf)
                    .build();
            GenericRecord r;
            while (null != (r = reader.read())) {
                r.getSchema().getField("reviewID").addAlias("review_id$str");
                Object review_id = r.get("review_id$str");
                String review_id_str = review_id != null ? ("'" + review_id.toString() + "'") : "-";
                System.out.println("review_id: " + review_id_str);
            }
        } catch (IOException e) {
            System.out.println("Error reading parquet file.");
            e.printStackTrace();
        }
    }
}
My Parquet file contains columns whose names contain the symbols [, ], ., \ and $. (In this case, the Parquet file contains a column review_id$str, whose values I want to read.) However, these characters are invalid in Avro names (see: https://avro.apache.org/docs/current/spec.html#names), so I tried to use aliases (see: http://avro.apache.org/docs/current/spec.html#Aliases). Even though I no longer get any "invalid character" errors, I am still unable to get the values: nothing is printed even though the column contains values.
It only prints:
review_id: -
review_id: -
review_id: -
review_id: -
...
And expected would be:
review_id: Q1sbwvVQXV2734tPgoKj4Q
review_id: GJXCdrto3ASJOqKeVWPi6Q
review_id: 2TzJjDVDEuAW6MR5Vuc1ug
review_id: yi0R0Ugj_xUx_Nek0-_Qig
...
Am I using the aliases wrong? Is it even possible to use aliases in this situation? If so, please explain how I can fix it. Thank you.
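One thing worth checking (an assumption on my part, not verified against parquet-avro): when Avro resolves aliases, the record you get back follows the reader schema, so the value would live under the reader-schema field name reviewID, not under the original column name. A minimal sketch of the read loop under that assumption:
GenericRecord r;
while (null != (r = reader.read())) {
    // With the alias declared in the reader schema, look the value up by the
    // reader-schema name (assumption: parquet-avro applies the alias on read)
    Object review_id = r.get("reviewID");
    System.out.println("review_id: " + (review_id != null ? "'" + review_id + "'" : "-"));
}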
Update 2021:
In the end, I decided not to use Java for this task. I stuck to my solution in Python using PyArrow, which works perfectly fine.

Migrating a file or a folder from one repository to another in Documentum

I am working on a JavaFX project connected to Documentum data storage. I am trying to work out how to move a file (let's call it file1) located in one folder (let's call it Folder1) into another folder (let's call it Folder2). It's worth mentioning that both folders are in the same cabinet. I have implemented the following class:
package application;

import com.documentum.com.DfClientX;
import com.documentum.com.IDfClientX;
import com.documentum.fc.client.DfClient;
import com.documentum.fc.client.IDfDocument;
import com.documentum.fc.client.IDfFolder;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.DfId;
import com.documentum.operations.IDfMoveNode;
import com.documentum.operations.IDfMoveOperation;

public class Migrate {
    public Migrate() {}

    public String move(IDfSession mySession, String docId, String destination) {
        String str = "";
        try {
            IDfClientX clientx = new DfClientX();
            IDfMoveOperation mo = clientx.getMoveOperation();
            IDfFolder destinationDirectory = mySession.getFolderByPath(destination);
            // Here is the line that causes the error
            mo.setDestinationFolderId(destinationDirectory.getObjectId());
            IDfDocument doc = (IDfDocument) mySession.getObject(new DfId(docId));
            IDfMoveNode node = (IDfMoveNode) mo.add(doc);
            if (mo.execute()) {
                str = "Move operation successful.";
            } else {
                str = "Move operation failed.";
            }
        } catch (DfException e) {
            System.out.println(e.getLocalizedMessage());
        }
        return str;
    }
}
For the docId parameter I am passing the r_object_id of the file I wish to move, but I get the following error:
com.documentum.fc.client.DfFolder___PROXY cannot be cast to
com.documentum.fc.client.IDfDocument
Does anyone know where my mistake is, or what I am doing wrong?
It's obvious: in the line
IDfDocument doc = (IDfDocument) mySession.getObject(new DfId(docId));
the docId parameter refers to a folder object, not a document object. Do a type check first to be sure, and then use either IDfFolder or IDfDocument. If you're sure that you're moving a folder into another folder, just change IDfDocument to IDfFolder.
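A minimal sketch of such a type check (assuming the standard DFC types, where IDfSysObject is the common supertype of folders and documents and the move operation's add method accepts it):
IDfSysObject obj = (IDfSysObject) mySession.getObject(new DfId(docId));
if (obj instanceof IDfFolder) {
    // docId refers to a folder; add it to the move operation as a folder
    mo.add((IDfFolder) obj);
} else if (obj instanceof IDfDocument) {
    // docId refers to a document
    mo.add((IDfDocument) obj);
}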

Regex for SPARQL

I have downloaded dbpedia_quotationsbook.zip from DBpedia, which contains the dbpedia_quotationsbook.nt triplestore.
In this triplestore:
subject is authorname
predicate is "sameas"
object is authorcode
I have tried querying this triplestore using Jena; simple queries run fine.
Now I want all authorcodes whose authorname partially matches a given string.
So I tried the following query:
select ?code
where
{
    FILTER regex(?name, "^Rob") <http://www.w3.org/2002/07/owl#sameAs> ?code.
}
The above query should return all authorcodes whose authorname starts with "Rob". Instead, I am getting the following exception:
Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "." ". "" at line 5, column 74.
Was expecting one of:
<IRIref> ...
<PNAME_NS> ...
<PNAME_LN> ...
<BLANK_NODE_LABEL> ...
<VAR1> ...
<VAR2> ...
"true" ...
"false" ...
<INTEGER> ...
<DECIMAL> ...
<DOUBLE> ...
<INTEGER_POSITIVE> ...
<DECIMAL_POSITIVE> ...
<DOUBLE_POSITIVE> ...
<INTEGER_NEGATIVE> ...
<DECIMAL_NEGATIVE> ...
<DOUBLE_NEGATIVE> ...
<STRING_LITERAL1> ...
<STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ...
<STRING_LITERAL_LONG2> ...
"(" ...
<NIL> ...
"[" ...
<ANON> ...
at com.hp.hpl.jena.sparql.lang.ParserSPARQL11.perform(ParserSPARQL11.java:102)
at com.hp.hpl.jena.sparql.lang.ParserSPARQL11.parse$(ParserSPARQL11.java:53)
at com.hp.hpl.jena.sparql.lang.SPARQLParser.parse(SPARQLParser.java:34)
at com.hp.hpl.jena.query.QueryFactory.parse(QueryFactory.java:148)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:80)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:53)
at com.hp.hpl.jena.query.QueryFactory.create(QueryFactory.java:41)
at rdfcreate.NewClass.query(NewClass.java:55)
at rdfcreate.NewClass.main(NewClass.java:97)
Jena Code
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.util.FileManager;

/**
 * @author swapnil
 */
public class NewClass {

    String read() {
        final String tdbDirectory = "C:\\TDBLoadGeoCoordinatesAndLabels";
        String dataPath = "F:\\Swapnil Drive\\Project Stuff\\Project data 2015 16\\Freelancer\\SPARQL\\dbpedia_quotationsbook.nt";
        Model tdbModel = TDBFactory.createModel(tdbDirectory);
        /* Incrementally read data into the model, once per run; RAM > 6 GB */
        FileManager.get().readModel(tdbModel, dataPath, "N-TRIPLES");
        tdbModel.close();
        return tdbDirectory;
    }

    void query(String tdbDirectory, String query1) {
        Dataset dataset = TDBFactory.createDataset(tdbDirectory);
        Model tdb = dataset.getDefaultModel();
        Query query = QueryFactory.create(query1);
        QueryExecution qexec = QueryExecutionFactory.create(query, tdb);
        /* Execute the query */
        ResultSet results = qexec.execSelect();
        System.out.println(results.getRowNumber());
        while (results.hasNext()) {
            // Do something important
            QuerySolution qs = results.nextSolution();
            System.out.println("sol " + qs);
        }
        qexec.close();
        tdb.close();
    }

    public static void main(String[] args) {
        NewClass nc = new NewClass();
        String tdbd = nc.read();
        nc.query(tdbd, "select ?code\n" +
                "WHERE\n" +
                "{\n" +
                "<http://dbpedia.org/resource/Robert_H._Schuller> <http://www.w3.org/2002/07/owl#sameAs> ?code.\n" +
                "}");
    }
}
Result
sol ( ?code = http://quotationsbook.com/author/6523 )
The above query gives me the code of the given author.
Please help me with this.
You cannot mix patterns and filters like that. You must first bind (i.e. select) ?name using a triple pattern and then filter the results. Jena complains because your SPARQL syntax is invalid.
Now, you could run the query below, but your data only contains mappings between DBpedia URIs and quotationsbook URIs.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select ?code
where
{
    ?author <name> ?name .
    ?author owl:sameAs ?code .
    FILTER regex(?name, "^Rob")
}
The above means:
Get names of authors
Get codes of authors
Include only authors whose name matches the regex
Select their codes
Again, this would only work for data available locally, and the problem is that you do not have the actual names. Of course, you could change your query to run the regex over the entire DBpedia identifier instead, but that's not perfect:
FILTER regex(str(?author), "Rob")
What you can do, because DBpedia resources are dereferenceable, is wrap the name triple pattern in a GRAPH pattern:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
select ?author ?code
where
{
    GRAPH <file://path/to/dbpedia_quotationsbook.nt>
    {
        ?author owl:sameAs ?code .
    }
    GRAPH ?author
    {
        ?author <http://www.w3.org/2000/01/rdf-schema#label> ?name .
        FILTER regex(?name, "^Rob")
    }
}
Here's what's happening:
Get ?authors and ?codes from the imported file (SPARQL GUI imports into a named graph)
Treat ?author as a graph name, so that it can be downloaded from the web
Get each ?author's ?name
Filter for ?names which start with "Rob"
There are two important bits to make this work, depending on your SPARQL processor (I'm using SPARQL GUI from the dotNetRDF toolkit).
Here's a screenshot of the results I got. Notice the highlighted settings and the Fiddler log of DBpedia requests.
The bottom line is that I've just given you an example of a federated SPARQL query.
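If your processor supports SPARQL 1.1 federation, a similar effect can also be expressed with the SERVICE keyword against the public DBpedia endpoint (a sketch under that assumption, not tested against the current endpoint):
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?author ?code
where
{
    ?author owl:sameAs ?code .
    SERVICE <http://dbpedia.org/sparql>
    {
        ?author rdfs:label ?name .
        FILTER regex(?name, "^Rob")
    }
}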

How to parse data in Talend with Java (coming from a previously produced .txt file)?

I have a process in Talend which gets the search result of a page, saves the HTML, and writes it into files, as seen here:
Initially I had a two-step process, parsing the data out of the HTML files in Java. It works and writes the results to a MySQL database. Here is the code that does exactly that (I'm a beginner, sorry for the lack of elegance):
package org.jsoup.examples;

import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class parse2 {
    static parse2 parseIt2 = new parse2();
    String companyName = "Platzhalter";
    String jobTitle = "Platzhalter";
    String location = "Platzhalter";
    String timeAdded = "Platzhalter";

    public static void main(String[] args) throws IOException {
        parseIt2.getData();
    }

    public void getData() throws IOException {
        Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
        Elements elements = document.select(".joblisting");
        for (Element element : elements) {
            // Parse data into elements
            Elements jobTitleElement = element.select(".job_title span");
            Elements companyNameElement = element.select(".company_name span[itemprop=name]");
            Elements locationElement = element.select(".locality span[itemprop=addressLocality]");
            Elements dateElement = element.select(".job_date_added [datetime]");
            // Strip data from unnecessary tags
            String companyName = companyNameElement.text();
            String jobTitle = jobTitleElement.text();
            String location = locationElement.text();
            String timeAdded = dateElement.attr("datetime");
            System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t" + location + " \t Erstellt am \t" + timeAdded);
        }
    }
}
Now I want to do the process end-to-end in Talend, and I was assured this works.
I tried this (which looks quite shady to me):
Basically, I put all the imports in "advanced settings" and the code in the "basic settings" section. The importLibrary is supposed to load the jsoup parsing library, as well as the MySQL connector (I might do the connection with Talend tools, though).
Obviously this isn't working. I tried to strip the base code of classes and such, and it was even worse. Can you help me get the generated .txt files parsed with Java here?
EDIT: Here is the link to the Talend job: http://www.share-online.biz/dl/8M5MD99NR1
EDIT2: I changed the code to the one I tried in tJavaFlex (the imports in the "start" section of the code, the rest in "body/main", and nothing in "end"), but it didn't work.
This is a problem related to how Talend generates its code: inside Talend components, use fully qualified class names, including their packages. For your document parsing, for example, you can use:
Document document = org.jsoup.Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
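Applied to the rest of the parsing loop, here is a minimal sketch (same selectors and file path as the original program, with every jsoup type spelled out fully so no imports are needed):
org.jsoup.nodes.Document document = org.jsoup.Jsoup.parse(new java.io.File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
org.jsoup.select.Elements elements = document.select(".joblisting");
for (org.jsoup.nodes.Element element : elements) {
    // Same extraction as the standalone program, with fully qualified types
    String companyName = element.select(".company_name span[itemprop=name]").text();
    String jobTitle = element.select(".job_title span").text();
    String location = element.select(".locality span[itemprop=addressLocality]").text();
    String timeAdded = element.select(".job_date_added [datetime]").attr("datetime");
    System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t" + location + "\t Erstellt am \t" + timeAdded);
}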

How to process the rdf version of a DBpedia page with Jena?

On every DBpedia page, e.g.
http://dbpedia.org/page/Ireland
there's a link to an RDF file.
In my application I need to analyse the RDF and run some logic on it.
I could rely on the DBpedia SPARQL endpoint, but I prefer to download the RDF locally and parse it, to have full control over it.
I installed Jena and I'm trying to parse the file and extract, for example, a property called geo:geometry.
I'm trying with:
StringReader sr = new StringReader(node.rdfCode);
Model model = ModelFactory.createDefaultModel();
model.read(sr, null);
How can I query the model to get the info I need?
For example, if I wanted to get the statement:
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<geo:geometry xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" rdf:datatype="http://www.openlinksw.com/schemas/virtrdf#Geometry">POINT(-7 53)</geo:geometry>
</rdf:Description>
Or
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<dbpprop:countryLargestCity xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Dublin</dbpprop:countryLargestCity>
</rdf:Description>
What is the right filter?
Many thanks!
Mulone
Once you have the file parsed into a Jena model, you can iterate and filter with something like:
// Property to filter the model by
Property geoProperty =
    model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#", "geometry");

// Iterator based on a SimpleSelector
StmtIterator iter =
    model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));

// Loop to traverse the statements that match the SimpleSelector
while (iter.hasNext()) {
    Statement stmt = iter.nextStatement();
    System.out.print(stmt.getSubject().toString());
    System.out.print(stmt.getPredicate().toString());
    System.out.println(stmt.getObject().toString());
}
The SimpleSelector allows you to pass any (subject, predicate, object) pattern to match statements in the model. In your case, since you only care about a specific predicate, the first and third constructor parameters are null.
Filtering on two different properties
To allow more complex filtering, you can override the selects method of the SimpleSelector class, like this:
Property geoProperty = /* as before */;
Property countryLargestCityProperty =
    model.createProperty("http://dbpedia.org/property/", "countryLargestCity");

SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
    public boolean selects(Statement s) {
        return s.getPredicate().equals(geoProperty) ||
               s.getPredicate().equals(countryLargestCityProperty);
    }
};

StmtIterator iter = model.listStatements(selector);
while (iter.hasNext()) {
    /* same as in the previous example */
}
Edit: including a full example
Here is a full example that works for me.
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.SimpleSelector;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.rdf.model.Statement;

public class TestJena {
    public static void main(String[] args) {
        FileManager fManager = FileManager.get();
        fManager.addLocatorURL();
        Model model = fManager.loadModel("http://dbpedia.org/data/Ireland.rdf");

        Property geoProperty =
            model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#", "geometry");

        StmtIterator iter =
            model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));

        // Loop to traverse the statements that match the SimpleSelector
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();
            if (stmt.getObject().isLiteral()) {
                Literal obj = (Literal) stmt.getObject();
                System.out.println("The geometry predicate value is " + obj.getString());
            }
        }
    }
}
This full example prints out:
The geometry predicate value is POINT(-7 53)
Notes on Linked Data
http://dbpedia.org/page/Ireland is the HTML document version of the resource http://dbpedia.org/resource/Ireland.
In order to get the RDF, you should resolve either:
http://dbpedia.org/data/Ireland.rdf
or
http://dbpedia.org/resource/Ireland with Accept: application/rdf+xml in the HTTP header.
With curl it'd be something like:
curl -L -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Ireland
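The same content negotiation can be done from Java with the standard library before handing the stream to Jena; a minimal sketch (it assumes DBpedia still serves RDF/XML for this Accept header and that the 303 redirect stays on plain http, which HttpURLConnection will follow):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class DereferenceExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://dbpedia.org/resource/Ireland");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/rdf+xml"); // ask for RDF, not HTML
        conn.setInstanceFollowRedirects(true);                    // follow the 303 redirect
        InputStream in = conn.getInputStream();
        Model model = ModelFactory.createDefaultModel();
        model.read(in, null); // parse the response as RDF/XML
        System.out.println("Loaded " + model.size() + " statements");
    }
}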
