Apache Marmotta LDPath libraries - Selecting URI fn:content - java

I am selecting certain RDF properties using Apache Marmotta LDPath. The documentation (http://marmotta.apache.org/ldpath/language.html) says that the fn and lmf prefixes do not need to be defined explicitly.
My code is:
@prefix dc : <http://purl.org/dc/elements/1.1/> ;
id = . :: xsd:string ;
title = dc:title :: xsd:string ;
file = fn:content(.) :: lmf:text_es ;
but I get the following ParseException:
Caused by: org.apache.marmotta.ldpath.parser.ParseException: function with URI http://www.newmedialab.at/lmf/functions/1.0/content does not exist
at org.apache.marmotta.ldpath.parser.LdPathParser.getFunction(LdPathParser.java:213)
at org.apache.marmotta.ldpath.parser.LdPathParser.FunctionSelector(LdPathParser.java:852)
at org.apache.marmotta.ldpath.parser.LdPathParser.AtomicSelector(LdPathParser.java:686)
at org.apache.marmotta.ldpath.parser.LdPathParser.Selector(LdPathParser.java:607)
at org.apache.marmotta.ldpath.parser.LdPathParser.Rule(LdPathParser.java:441)
at org.apache.marmotta.ldpath.parser.LdPathParser.Program(LdPathParser.java:406)
at org.apache.marmotta.ldpath.parser.LdPathParser.parseProgram(LdPathParser.java:112)
at org.apache.marmotta.ldpath.LDPath.programQuery(LDPath.java:235)
... 47 more
Edit
I'm using the LDPath core in Fedora (DuraSpace) 4.5.1. My goal is to index the full text of binary resources in Solr; any way of achieving that is fine with me.

For anyone who needs it:
it seems that this subset of the Apache Marmotta LDPath library does not support complex functions such as fn:, lmf: and others.
To index the full text of binary resources, you need to use something like Apache Tika.

Related

cTAKES parser output

I am trying to understand the output generated by the cTAKES parser, and some points are unclear to me.
The cTAKES parser is invoked via tika-app.
We get the following result:
ctakes:AnatomicalSiteMention: liver:77:82:C1278929,C0023884
ctakes:ProcedureMention: CT scan:24:31:C0040405,C0040405,C0040405,C0040405
ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405
ctakes:ProcedureMention: scan:27:31:C0034606,C0034606,C0034606,C0034606,C0441633,C0034606,C0034606,C0034606,C0034606,C0034606,C0034606
ctakes:RomanNumeralAnnotation: did:47:50:
ctakes:SignSymptomMention: lesions:62:69:C0221198,C0221198
ctakes:schema: coveredText:start:end:ontologyConceptArr
resourceName: sample
and the parsed document contains:
The patient underwent a CT scan in April which did not reveal lesions in his liver
I have the following questions:
Why is the UMLS ID repeated, as in ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405? (The cTAKES configuration properties file has annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR.)
What does RomanNumeralAnnotation indicate?
In a concept unique identifier like C0040405, do the 7 digits have any meaning? How are they generated?
System information:
Apache tika 1.10
Apache ctakes 3.2.2
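
While waiting for an explanation of the repetition, the output lines can be post-processed according to the ctakes:schema line shown above (coveredText:start:end:ontologyConceptArr). The following is a minimal sketch in plain Java, assuming every line has exactly that shape; the class CtakesMentionLine is a hypothetical helper, not part of cTAKES or Tika. It collapses repeated concept IDs but does not explain why cTAKES emits them more than once.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of cTAKES or Tika: parses one metadata line of the form
// "<type>: coveredText:start:end:ontologyConceptArr" and drops repeated concept IDs.
class CtakesMentionLine {
    final String type;
    final String coveredText;
    final int start;
    final int end;
    final Set<String> conceptIds;

    private CtakesMentionLine(String type, String coveredText, int start, int end,
                              Set<String> conceptIds) {
        this.type = type;
        this.coveredText = coveredText;
        this.start = start;
        this.end = end;
        this.conceptIds = conceptIds;
    }

    static CtakesMentionLine parse(String line) {
        // The annotation type ends at the first ": " (colon followed by a space).
        int sep = line.indexOf(": ");
        String type = line.substring(0, sep);
        String value = line.substring(sep + 2);

        // Split from the right so that a colon inside the covered text does no harm.
        int c3 = value.lastIndexOf(':');
        int c2 = value.lastIndexOf(':', c3 - 1);
        int c1 = value.lastIndexOf(':', c2 - 1);

        String coveredText = value.substring(0, c1);
        int start = Integer.parseInt(value.substring(c1 + 1, c2));
        int end = Integer.parseInt(value.substring(c2 + 1, c3));

        // A LinkedHashSet keeps the first-seen order and removes duplicates.
        Set<String> ids = new LinkedHashSet<>();
        for (String id : value.substring(c3 + 1).split(",")) {
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return new CtakesMentionLine(type, coveredText, start, end, ids);
    }

    public static void main(String[] args) {
        CtakesMentionLine m = parse(
            "ctakes:ProcedureMention: CT scan:24:31:C0040405,C0040405,C0040405,C0040405");
        System.out.println(m.coveredText + " [" + m.start + "," + m.end + "] " + m.conceptIds);
    }
}
```

Parsing from the right is a design choice: the covered text is the only field that could itself contain a colon, so the last three colons are taken as the field separators.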

Java CSVeed library, option for quote inside unquoted field?

I'm benchmarking several Java libraries for parsing CSV files. I can't find a solution for the CSVeed library with this line:
af,dekh"iykh'ya,Dekh"iykh'ya,13,,34.60345,69.2405
I get this error:
org.csveed.report.CsvException: Illegal state transition:
Parsing symbol QUOTE_SYMBOL [34] in state INSIDE_FIELD
19970: af,dekh
I understand what is happening, but I have tried different combinations of options without success. Is there a way to parse this line?
In fact, the ideal line of 7 columns would be:
af,dekh\"iykh\'ya,Dekh\"iykh\'ya,13,,34.60345,69.2405
af,dekh"iykh'ya,Dekh"iykh'ya,13,,34.60345,69.2405
To parse this into the following fields, you'll have to turn quoting off in your parser:
af
dekh"iykh'ya
Dekh"iykh'ya
13
<null>
34.60345
69.2405
If quoting cannot be turned off, you could use setQuote(char symbol) and provide an unused char as the parameter.
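
To illustrate what "quoting off" means for this line, here is a minimal sketch in plain Java. It deliberately does not use CSVeed; the class UnquotedCsvSplit is a hypothetical name, and mapping empty fields to null is an assumption made to match the field list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not CSVeed: treats quote characters as ordinary data.
class UnquotedCsvSplit {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String raw : line.split(",", -1)) {
            fields.add(raw.isEmpty() ? null : raw);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "af,dekh\"iykh'ya,Dekh\"iykh'ya,13,,34.60345,69.2405";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

This only works because no field in the file is expected to contain a comma; once fields can contain the separator, some quoting or escaping convention is needed again.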

OSB fn:bea Function using XQuery Engine in Java

After some research I haven't found a solution, but I have found quite a lot of people with this problem:
I am trying to do an XQuery transformation in a Java application using
net.sf.saxon.s9api
However, I get this error when compiling my XQuery with XQueryExecutable exec = compiler.compile(...);
Error on line 13 column 3 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
Error on line 44 column 102 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... div xdt:dayTimeDuration('P1D'#:
Prefix xdt has not been declared
Error on line 199 column 3 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
Error on line 282 column 4 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... {fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
net.sf.saxon.s9api.SaxonApiException: Prefix fn-bea has not been declared
Is there a way to declare this prefix statically, or what am I missing so that my XQuery engine (Saxon) finds the prefix?
The simple answer to your question is that you can declare namespace prefixes either within the query prolog using
declare namespace fn-bea = "http://some-appropriate-uri";
or in the Saxon API, on your XQueryCompiler instance, using
compiler.declareNamespace("fn-bea", "http://some-appropriate-uri")
But this won't get you any further unless (a) you know what URI to bind the prefixes to, and (b) you make the functions with these names available to the query processor.
The reference to xdt:dayTimeDuration suggests to me that this query was written before XQuery 1.0 became a Recommendation. If you look at the November 2005 Candidate Recommendation, for example
https://www.w3.org/TR/2005/CR-xquery-20051103/
you'll see in section 2 that it uses a built-in prefix
xdt = http://www.w3.org/2005/xpath-datatypes
By the time XQuery 1.0 became a recommendation, the dayTimeDuration data type had been moved into the standard XML Schema (xs) namespace, so you can probably simply replace "xdt" by "xs" - though you should be aware that the semantics of the language probably changed in minor details as well.
As for fn-bea:inlinedXML, the choice of prefix suggests to me that this was probably a built-in vendor extension in the BEA query processor, which was taken over by Oracle. The spec here:
https://docs.oracle.com/cd/E13162_01/odsi/docs10gr3/xquery/extensions.html
says:
fn-bea:inlinedXML Parses textual XML and returns an instance of the
XQuery 1.0 Data Model.
Which suggests that the function does something very similar to the XQuery 3.0 function fn:parse-xml(), and I suggest you try making that replacement in your query.
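
If the queries cannot be edited by hand, the two replacements suggested above could be applied to the query text before it is handed to the compiler. This is a minimal sketch under that assumption; LegacyXQueryRewriter is a hypothetical helper, and it is a purely textual substitution, so it would also change matching text inside string literals or comments. Whether fn:parse-xml() behaves exactly like fn-bea:inlinedXML in your queries still has to be checked.

```java
// Hypothetical helper: rewrites legacy prefixes in the query text before compilation.
class LegacyXQueryRewriter {
    static String rewrite(String query) {
        return query
            // dayTimeDuration now lives in the XML Schema (xs) namespace.
            .replace("xdt:dayTimeDuration", "xs:dayTimeDuration")
            // Assumption: fn:parse-xml() is an acceptable stand-in for the BEA extension.
            .replace("fn-bea:inlinedXML(", "fn:parse-xml(");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("$d div xdt:dayTimeDuration('P1D')"));
        System.out.println(rewrite("fn-bea:inlinedXML(fn:concat('<a>', $x, '</a>'))"));
    }
}
```

The rewritten string can then be passed to compiler.compile(...) in place of the original query text.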

Velocity parser crashes when parsing java code template

When I try to use a Java source file as a template for Velocity, it crashes at this line of the template:
/* #see panama.form.Validator#validate(java.lang.Object) */
with this Exception:
Exception in thread "main" org.apache.velocity.exception.ParseErrorException: Lexical error, Encountered: "l" (108), after : "." at *unset*[line 23, column 53]
at org.apache.velocity.runtime.RuntimeInstance.evaluate(RuntimeInstance.java:1301)
at org.apache.velocity.runtime.RuntimeInstance.evaluate(RuntimeInstance.java:1265)
at org.apache.velocity.app.VelocityEngine.evaluate(VelocityEngine.java:199)
Apparently it takes the #validate for a macro and crashes when it tries to parse the arguments for the macro. Is there anything one could do about this?
I'm using Velocity 1.7.
Edit
I know I could escape the # characters in the template files, but there are quite a number of them which also might change now and then, so I would prefer a way that would not require manual changes on the files.
First option
Try escaping, as described in the Velocity user guide under "Escaping VTL Directives":
VTL directives can be escaped with the backslash character ("\") in a manner similar to valid VTL references.
## #include( "a.txt" ) renders as <contents of a.txt>
#include( "a.txt" )
## \#include( "a.txt" ) renders as #include( "a.txt" )
\#include( "a.txt" )
## \\#include ( "a.txt" ) renders as \<contents of a.txt>
\\#include ( "a.txt" )
Second option
There is also the EscapeTool:
Tool for working with escaping in Velocity templates.
It provides methods to escape outputs for Java, JavaScript, HTML, XML and SQL. Also provides methods to render VTL characters that otherwise need escaping.
Third option:
You may also try this workaround; I haven't used it, but it should work:
You can first read your template as a String and then pre-process it. For example, replace every # with \#, or add this to the beginning of the file:
#set( $H = '#' )
$H$H
See the answer to "How to escape a # in velocity". Then create the Template from the pre-processed String as described in "How to use String as Velocity Template?".
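
The pre-processing step of the third option could look roughly like the sketch below. It combines the two ideas from that option: it prepends the #set( $H = '#' ) line and replaces every # in the source with ${H}, the formal notation for the $H reference. VelocityHashEscaper is a hypothetical helper written for illustration. It does nothing about $ characters in the source, and it also disables any directives you actually wanted Velocity to evaluate.

```java
// Hypothetical helper: rewrites a template so that Velocity sees no literal '#'.
class VelocityHashEscaper {
    // Defines $H as a literal '#'; taken from the workaround described above.
    static final String PRELUDE = "#set( $H = '#' )\n";

    static String escapeHashes(String source) {
        // Replace first, then prepend, so the prelude's own '#' stays intact.
        return PRELUDE + source.replace("#", "${H}");
    }

    public static void main(String[] args) {
        String line = "/* #see panama.form.Validator#validate(java.lang.Object) */";
        System.out.println(escapeHashes(line));
    }
}
```

The resulting String can then be evaluated as a template, for example with the VelocityEngine.evaluate call that appears in the stack trace above.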

Human editable JSON-like or YAML-like program configuration in Java

Is there a Java library similar to libconfig for C++, where the config file is stored in a JSON-like format that can be edited by humans, and later read from the program?
I don't want to use Spring or any of the larger frameworks. What I'm looking for is a small, fast, self-contained library. I looked at java.util.Properties, but it doesn't seem to support hierarchical/nested config data.
I think https://github.com/typesafehub/config is exactly what you are looking for. The format is called HOCON, for Human-Optimized Config Object Notation, and it is a superset of JSON.
Examples of HOCON:
HOCON that is also valid JSON:
{
  "foo" : {
    "bar" : 10,
    "baz" : 12
  }
}
HOCON also supports standard properties format, so the following is valid as well:
foo.bar=10
foo.baz=12
One of the features I find very useful is inheritance, which allows you to layer configurations. For instance, a library would have a reference.conf, and the application using the library would have an application.conf. The settings in application.conf override the defaults in reference.conf.
Standard behavior for loading configs:
The convenience method ConfigFactory.load() loads the following (first-listed are higher priority):
system properties
application.conf (all resources on classpath with this name)
application.json (all resources on classpath with this name)
application.properties (all resources on classpath with this name)
reference.conf (all resources on classpath with this name)
I found this HOCON example:
my.organization {
  project {
    name = "DeathStar"
    description = ${my.organization.project.name} "is a tool to take control over whole world. By world I mean couch, computer and fridge ;)"
  }
  team {
    members = [
      "Aneta"
      "Kamil"
      "Lukasz"
      "Marcin"
    ]
  }
}
my.organization.team.avgAge = 26
To read the values (the snippet is Scala; the same methods are available from Java):
val config = ConfigFactory.load()
config.getString("my.organization.project.name") // => DeathStar
config.getString("my.organization.project.description") // => DeathStar is a tool to take control over whole world. By world I mean couch, computer and fridge ;)
config.getInt("my.organization.team.avgAge") // => 26
config.getStringList("my.organization.team.members") // => [Aneta, Kamil, Lukasz, Marcin]
Reference: marcinkubala.wordpress.com
Apache Commons Configuration API and Constretto seem to be somewhat popular and support multiple formats (no JSON mentioned, though). I've personally never tried either, so YMMV.
There's a Java library to handle JSON files if that's what you're looking for:
http://www.json.org/java/index.html
Check out other tools on the main page:
http://json.org/
