Ontology owl and Excel data - java

I am a non-programmer. I have an ontology in OWL format. I also have an Excel sheet (it contains numeric data whose headers correspond to selected ontology terms). Now I have to connect the Excel headers with the ontology framework and extract from the ontology the links that apply to the Excel data.

Do I understand you correctly that you have an RDF knowledge base whose schema is described by an OWL ontology and you want to import this data from RDF to a spreadsheet?
The most straightforward case to transform RDF to spreadsheets is a SPARQL SELECT query.
Prerequisites
If you don't already have the data in an application or endpoint where you can query it directly (e.g. Protégé may have a widget for SPARQL queries), there are three prerequisites, else skip those:
1. Export/Convert the Data
If your data is in an application where you can't run SPARQL queries, or in a file in a syntax such as OWL/XML, you need to convert it first. Most SPARQL endpoints don't understand that format and instead need an RDF serialization such as N-Triples, RDF Turtle or RDF/XML, so export the data in one of those formats.
2. Set Up a SPARQL Endpoint
Now you can install e.g. a Virtuoso SPARQL endpoint, either locally or on a server or use the endpoint of someone else who gives you access credentials.
It can take a while to install but you can use a Docker image if that is easier.
3. Upload the Data
In Virtuoso SPARQL, you can now upload the ontology and the instance data in the conductor under "Linked Data" -> "Quad Store Upload".
Querying
I don't know of any existing tool that automatically maps ontologies and downloads instances according to a given Excel sheet template, so I recommend writing a SPARQL SELECT query manually.
Example
Let's say your Excel sheet has the column headers "name", "age" and "height" (you said you have numeric data) and the ontology has a person class defined like this in RDF Turtle:
:Person a owl:Class;
    rdfs:label "Person"@en.
:age a owl:DatatypeProperty;
    rdfs:label "age"@en;
    rdfs:domain :Person;
    rdfs:range xsd:nonNegativeInteger.
:height a owl:DatatypeProperty;
    rdfs:label "height"@en;
    rdfs:domain :Person;
    rdfs:range xsd:decimal.
Now you can write the following SPARQL SELECT query:
PREFIX : <http://my.prefix/>
SELECT ?person ?age ?height
WHERE
{
    ?person a :Person;
        :age ?age;
        :height ?height.
}
This will generate a result table, which you can obtain in different formats. Choose the CSV spreadsheet format and then you can import it into MS Excel, which solves your problem as far as I interpret it.
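If the sheet has many columns, the query text itself can be generated from the header list. The following is only a minimal sketch under a strong assumption: each Excel header is identical to the local name of a property in the ontology namespace (in practice you would probably have to match headers against rdfs:label values by hand). The class name HeaderQueryBuilder, the method buildSelect and the variable ?subject are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns spreadsheet column headers into a SPARQL SELECT query.
// Assumes a non-empty header list whose entries are valid property local names
// (no spaces or special characters) in the given namespace.
final class HeaderQueryBuilder {

    static String buildSelect(String namespace, String className, List<String> headers) {
        // One result variable per header, e.g. "?age ?height"
        String vars = headers.stream()
                .map(h -> "?" + h)
                .collect(Collectors.joining(" "));
        // One predicate/object pair per header, e.g. ":age ?age"
        String patterns = headers.stream()
                .map(h -> "    :" + h + " ?" + h)
                .collect(Collectors.joining(";\n"));
        return "PREFIX : <" + namespace + ">\n"
                + "SELECT ?subject " + vars + "\n"
                + "WHERE {\n"
                + "  ?subject a :" + className + ";\n"
                + patterns + ".\n"
                + "}";
    }

    public static void main(String[] args) {
        // Prints a query equivalent to the hand-written example for age and height.
        System.out.println(buildSelect("http://my.prefix/", "Person", List.of("age", "height")));
    }
}
```

The generated text can be pasted into the endpoint's query form, and the result downloaded as CSV as described above.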

Related

Query document against multiple words

I'm using Elasticsearch 6.x with the ingest attachment plugin so that I can query inside documents.
I managed to insert record with attachment document and I'm able to query it against various fields.
When I query the content of the file I'm doing this:
boolQuery.filter(new MatchPhrasePrefixQueryBuilder("attachment.content", "St. Anna Church"))
It works, but I want now to make query with this field: "Church Wall People" where basically it's not a complete phrase, I want back all the documents that contain the words Church, Wall and People.

Read the table data from the PDF file and save in the MySQL database as it is

I have a requirement to read values from a PDF file and save the result in a database.
I have converted the PDF to text.
Now the text data looks like this:
Test Name Results Units Bio. Ref. Interval
LIPID PROFILE, BASIC, SERUM
Cholesterol Total 166.00 mg/dL <200.00
Triglycerides 118.00 mg/dL <150.00
My requirement is to read the table data from the Pdf file and save in the MySQL database as it is.
Use Java I/O to read the text file and JDBC to save the information in MySQL via SQL.
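As a rough illustration of that suggestion, the sketch below parses one line of the converted text with a regular expression and writes it through a JDBC PreparedStatement. It assumes every data row has the shape "name, numeric result, unit, reference interval" as in the sample; the table lab_result and its column names are made up for the example, and test names that themselves contain a standalone number would need a stricter pattern.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabReportParser {

    // Groups: test name, numeric result, unit, reference interval.
    private static final Pattern ROW =
            Pattern.compile("^(.+?)\\s+(\\d+(?:\\.\\d+)?)\\s+(\\S+)\\s+(\\S.*)$");

    static final class LabResult {
        final String testName;
        final BigDecimal result;
        final String unit;
        final String refInterval;

        LabResult(String testName, BigDecimal result, String unit, String refInterval) {
            this.testName = testName;
            this.result = result;
            this.unit = unit;
            this.refInterval = refInterval;
        }
    }

    // Returns empty for lines that are not data rows (table header, section titles).
    static Optional<LabResult> parseLine(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new LabResult(
                m.group(1), new BigDecimal(m.group(2)), m.group(3), m.group(4)));
    }

    // Hypothetical table and column names; adapt them to the real schema.
    static void insert(Connection conn, LabResult r) throws SQLException {
        String sql = "INSERT INTO lab_result (test_name, result, unit, ref_interval) "
                + "VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, r.testName);
            ps.setBigDecimal(2, r.result);
            ps.setString(3, r.unit);
            ps.setString(4, r.refInterval);
            ps.executeUpdate();
        }
    }
}
```

Reading the file line by line (for example with a BufferedReader) and calling parseLine on each line skips the header and the "LIPID PROFILE" title automatically, because neither contains a numeric result column.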

Extract/Parse XML data/element from BLOB column in Oracle

I have two tables, CONFIGURATION_INFO and CONFIGURATION_FILE. I use the query below to find all employee files:
select i.cfg_id, Filecontent
from CONFIGURATION_INFO i,
CONFIGURATION_FILE f
where i.cfg_id=f.cfg_id
but I also need to parse or extract data from the blob column Filecontent and display all cfg_id whose xml tag PCVERSION starts with 8. Is there any way?
XML tag that needs to be extracted is <CSMCLIENT><COMPONENT><PCVERSION>8.1</PCVERSION></COMPONENT></CSMCLIENT>
It need not be any query, even if it is a java or groovy code, it would help me.
Note: Some of the XMLs might be as big as 5MB.
So basically, the column Filecontent is a BLOB?
To query the XML out of the BLOB content in the database, use the XMLType constructor.
It converts the value of your column from BLOB to XMLType, which you can then parse with an XPath expression.
select
xmltype(Filecontent, 871).extract('//CSMCLIENT/COMPONENT/PCVERSION/text()').getstringval()
from CONFIGURATION_INFO ...
do the rest of WHERE logic on your own.
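Since the question also allows Java, the same check can be sketched on the client side with the XML APIs that ship with the JDK. This is only an illustration: the class and method names are hypothetical, and it assumes the BLOB bytes have already been fetched (for example via ResultSet.getBytes) and that the XML has the structure shown in the question. Files of about 5 MB should normally be fine for a DOM parse.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

final class PcVersionExtractor {

    // Parses the XML bytes and returns the text of CSMCLIENT/COMPONENT/PCVERSION
    // (an empty string if the element is missing).
    static String pcVersion(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return XPathFactory.newInstance().newXPath()
                .evaluate("/CSMCLIENT/COMPONENT/PCVERSION/text()", doc);
    }

    // True when the version string starts with "8", as asked in the question.
    static boolean versionStartsWith8(byte[] xmlBytes) throws Exception {
        return pcVersion(xmlBytes).trim().startsWith("8");
    }
}
```

Looping over the result set and keeping each cfg_id for which versionStartsWith8 returns true gives the list the question asks for, at the cost of transferring every file to the client.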
Usually you know what data is in the BLOB column, so you can parse it in the SQL query.
If it is a text column (VARCHAR or something like that) you can use to_char(columnName).
There are a lot of functions that you can use; you can find them in this link
Usually you will use to_char/to_date/hextoraw/rawtohex
convert blob to file link

sparql query for rdf:id

I want to write a sparql query to get rdf data based on their id. I am trying with
SELECT ?ID ?NAME WHERE {?ID = "something" }
but it does not return the expected results. Does anyone know what my mistake is?
Actually rdf:id is the resource URI itself. You can utilise a SPARQL FILTER clause for filtering your result, or you can directly insert the URI in the WHERE clause of your query, e.g.
<myURI> ex:name ?name .
In order to have a precise answer you should share a small fragment of your RDF data (possibly in Turtle format, human friendly).

file (not in memory) based JDBC driver for CSV files

Is there an open source file-based (NOT in-memory) JDBC driver for CSV files? My CSV files are dynamically generated from the UI according to the user's selections, and each user will have a different CSV file. I'm doing this to reduce database hits, since the information is contained in the CSV file. I only need to perform SELECT operations.
HSQLDB allows indexed searches if we specify an index, but I won't be able to provide a unique column that can be used as an index, hence it does SQL operations in memory.
Edit:
I've tried CSVJDBC but that doesn't support simple operations like order by and group by. It is still unclear whether it reads from file or loads into memory.
I've tried xlSQL, but that again relies on HSQLDB and only works with Excel, not CSV. Plus it's no longer developed or supported.
I've tried H2, but that only reads CSV. It doesn't support SQL.
You can solve this problem using the H2 database.
The following groovy script demonstrates:
Loading data into the database
Running a "GROUP BY" and "ORDER BY" sql query
Note: H2 supports in-memory databases, so you have the choice of persisting the data or not.
// The H2 JDBC driver jar must be on the classpath
import groovy.sql.Sql

// Create the database (file-based, stored under db/csv)
def sql = Sql.newInstance("jdbc:h2:db/csv", "user", "pass", "org.h2.Driver")
// Load CSV file
sql.execute("CREATE TABLE data (id INT PRIMARY KEY, message VARCHAR(255), score INT) AS SELECT * FROM CSVREAD('data.csv')")
// Print results
def result = sql.firstRow("SELECT message, score, count(*) FROM data GROUP BY message, score ORDER BY score")
assert result[0] == "hello world"
assert result[1] == 0
assert result[2] == 5
// Cleanup
sql.close()
Sample CSV data:
0,hello world,0
1,hello world,1
2,hello world,0
3,hello world,1
4,hello world,0
5,hello world,1
6,hello world,0
7,hello world,1
8,hello world,0
9,hello world,1
10,hello world,0
If you check the SourceForge project csvjdbc, please report your experiences. The documentation says it is useful for importing CSV files.
Project page
This was discussed on Superuser https://superuser.com/questions/7169/querying-a-csv-file.
You can use the Text Tables feature of hsqldb: http://hsqldb.org/doc/2.0/guide/texttables-chapt.html
csvsql/gcsvsql are also possible solutions (but there is no JDBC driver, you will have to run a command line program for your query).
sqlite is another solution but you have to import the CSV file into a database before you can query it.
Alternatively, there is commercial software such as http://www.csv-jdbc.com/ which will do what you want.
To do anything with a file you have to load it into memory at some point. What you could do is just open the file and read it line by line, discarding the previous line as you read in a new one. Only downside to this approach is its linearity. Have you thought about using something like memcache on a server where you use Key-Value stores in memory you can query instead of dumping to a CSV file?
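To make the line-by-line idea concrete, here is a minimal sketch that streams a CSV source and keeps only a running count per value of one column, roughly what a GROUP BY with count(*) would return. The class and method names are hypothetical, and the naive split on commas assumes fields never contain quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;

final class CsvStreamCounter {

    // Reads the source one line at a time; only the current line and the
    // counts map are held in memory, never the whole file.
    static Map<String, Integer> countByColumn(Reader source, int columnIndex) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",", -1);
                if (columnIndex < fields.length) {
                    counts.merge(fields[columnIndex], 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Passing a java.io.FileReader for the user's CSV file and a column index of 2 would count rows per score for data shaped like the sample above. Memory use grows with the number of distinct values, not with the file size, but every query is still a full scan of the file.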
You can use either a specialized JDBC driver, like CsvJdbc (http://csvjdbc.sourceforge.net), or you may choose to configure a database engine such as MySQL to treat your CSV as a table and then manipulate your CSV through the standard JDBC driver.
The trade-off here is available SQL features vs. performance.
Direct access to CSV via CsvJdbc (or similar) will allow very quick operations on big data volumes, but without the ability to sort or group records using SQL commands;
the MySQL CSV engine can provide a rich set of SQL features, but at the cost of performance.
So if the size of your table is relatively small, go with MySQL. However, if you need to process big files (> 100 MB) without grouping or sorting, go with CsvJdbc.
If you need both, i.e. to handle very big files and to manipulate them using SQL, then the optimal course of action is to load the CSV into a normal database table (e.g. MySQL) first and then handle the data as a usual SQL table.
