I am trying to understand the result generated by the cTAKES parser; there are certain points I cannot make sense of.
The cTAKES parser is invoked via tika-app.
We get the following result:
ctakes:AnatomicalSiteMention: liver:77:82:C1278929,C0023884
ctakes:ProcedureMention: CT scan:24:31:C0040405,C0040405,C0040405,C0040405
ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405
ctakes:ProcedureMention: scan:27:31:C0034606,C0034606,C0034606,C0034606,C0441633,C0034606,C0034606,C0034606,C0034606,C0034606,C0034606
ctakes:RomanNumeralAnnotation: did:47:50:
ctakes:SignSymptomMention: lesions:62:69:C0221198,C0221198
ctakes:schema: coveredText:start:end:ontologyConceptArr
resourceName: sample
and document parsed contains -
The patient underwent a CT scan in April which did not reveal lesions in his liver
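For reference, each metadata value follows the ctakes:schema line above (coveredText:start:end:ontologyConceptArr). A small stdlib-only sketch (not part of Tika or cTAKES; the class name is made up) that splits one such value into its four fields:

```java
import java.util.Arrays;

public class CtakesLine {
    // Splits "coveredText:start:end:cui1,cui2,..." into its four schema fields.
    // The CUI list may be empty (e.g. RomanNumeralAnnotation), so we split
    // from the right: the last three ':' delimit start, end, and the CUI list,
    // which also keeps covered text containing ':' intact.
    public static String[] parse(String value) {
        int lastColon = value.lastIndexOf(':');
        int endColon = value.lastIndexOf(':', lastColon - 1);
        int startColon = value.lastIndexOf(':', endColon - 1);
        String covered = value.substring(0, startColon);
        String start = value.substring(startColon + 1, endColon);
        String end = value.substring(endColon + 1, lastColon);
        String cuis = value.substring(lastColon + 1);
        return new String[] { covered, start, end, cuis };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("liver:77:82:C1278929,C0023884")));
    }
}
```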
I have the following questions:
Why is the UMLS ID repeated, as in ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405? (The cTAKES configuration properties file has annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR.)
What does RomanNumeralAnnotation indicate?
In a Concept Unique Identifier like C0040405, do the seven digits have any meaning? How are they generated?
System information:
Apache Tika 1.10
Apache cTAKES 3.2.2
After some research I haven't found a solution, but quite a lot of people with the same problem:
I am trying to do an XQuery transformation in a Java application using
net.sf.saxon.s9api
However, I get this error when trying to compile my XQuery with XQueryExecutable exec = compiler.compile(...):
Error on line 13 column 3 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
Error on line 44 column 102 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... div xdt:dayTimeDuration('P1D'#:
Prefix xdt has not been declared
Error on line 199 column 3 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
Error on line 282 column 4 of AivPumaRequest.xquery:
XPST0081 XQuery static error near #... {fn-bea:inlinedXML(fn:concat#:
Prefix fn-bea has not been declared
net.sf.saxon.s9api.SaxonApiException: Prefix fn-bea has not been declared
Is there a way to statically declare this prefix, or what am I missing so that my XQuery engine (Saxon) finds the prefix?
The simple answer to your question is that you can declare namespace prefixes either within the query prolog using
declare namespace fn-bea = "http://some-appropriate-uri";
or in the Saxon API using
XQueryCompiler.declareNamespace("fn-bea", "http://some-appropriate-uri")
But this won't get you any further unless (a) you know what URI to bind the prefixes to, and (b) you make the functions with these names available to the query processor.
The reference to xdt:dayTimeDuration suggests to me that this query was written while XQuery was still a draft. If you look at the 2005 candidate recommendation, for example
https://www.w3.org/TR/2005/CR-xquery-20051103/
you'll see in section 2 that it uses a built-in prefix
xdt = http://www.w3.org/2005/xpath-datatypes
By the time XQuery 1.0 became a recommendation, the dayTimeDuration data type had been moved into the standard XML Schema (xs) namespace, so you can probably simply replace "xdt" by "xs" - though you should be aware that the semantics of the language probably changed in minor details as well.
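For example, the draft-era expression from the second error above would be rewritten like this (the surrounding expression is elided in the error message, so the `...` stays as-is):

```xquery
(: pre-recommendation draft syntax :)
... div xdt:dayTimeDuration('P1D')

(: XQuery 1.0 recommendation syntax :)
... div xs:dayTimeDuration('P1D')
```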
As for fn-bea:inlinedXML, the choice of prefix suggests to me that this was probably a built-in vendor extension in the BEA query processor, which was taken over by Oracle. The spec here:
https://docs.oracle.com/cd/E13162_01/odsi/docs10gr3/xquery/extensions.html
says:
fn-bea:inlinedXML Parses textual XML and returns an instance of the
XQuery 1.0 Data Model.
This suggests that the function does something very similar to the XQuery 3.0 function fn:parse-xml(), and I suggest you try making that replacement in your query.
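Assuming a call shaped like the ones in the error messages (the $payload argument here is hypothetical, since the actual arguments are truncated in the errors), the replacement would look like:

```xquery
(: BEA vendor extension :)
fn-bea:inlinedXML(fn:concat('<wrapper>', $payload, '</wrapper>'))

(: standard XQuery 3.0 equivalent :)
fn:parse-xml(fn:concat('<wrapper>', $payload, '</wrapper>'))
```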
I am selecting certain RDF properties using Apache Marmotta LDPath. The documentation (http://marmotta.apache.org/ldpath/language.html) states that the fn and lmf prefixes do not need to be explicitly defined.
My code is:
#prefix dc : <http://purl.org/dc/elements/1.1/> ;
id = . :: xsd:string ;
title = dc:title :: xsd:string ;
file = fn:content(.) :: lmf:text_es ;
but I get the following ParseException:
Caused by: org.apache.marmotta.ldpath.parser.ParseException: function with URI http://www.newmedialab.at/lmf/functions/1.0/content does not exist
at org.apache.marmotta.ldpath.parser.LdPathParser.getFunction(LdPathParser.java:213)
at org.apache.marmotta.ldpath.parser.LdPathParser.FunctionSelector(LdPathParser.java:852)
at org.apache.marmotta.ldpath.parser.LdPathParser.AtomicSelector(LdPathParser.java:686)
at org.apache.marmotta.ldpath.parser.LdPathParser.Selector(LdPathParser.java:607)
at org.apache.marmotta.ldpath.parser.LdPathParser.Rule(LdPathParser.java:441)
at org.apache.marmotta.ldpath.parser.LdPathParser.Program(LdPathParser.java:406)
at org.apache.marmotta.ldpath.parser.LdPathParser.parseProgram(LdPathParser.java:112)
at org.apache.marmotta.ldpath.LDPath.programQuery(LDPath.java:235)
... 47 more
Edit
I'm using the LDPath core in Fedora Duraspace 4.5.1. My goal is to index the full text of binary resources in Solr; any way to achieve that works for me.
To whomever needs it:
It seems the subset of the Apache Marmotta LDPath library does not support complex functions like fn:, lmf:, and others.
To index the full text of binary resources, it is necessary to use, for example, Apache Tika.
I use wsdl2java to generate DTO Java classes. It adds the current timestamp to the comment section of every generated file.
How can I disable those timestamps?
I'd like to minimize the changes between two wsdl2java runs (the generated Java sources are under revision control).
P.S. Java 7; wsdl2java comes from org.apache.cxf:cxf-codegen-plugin:2.6.16, although version 3 is also being considered.
Use the -suppress-generated-date option of the underlying Apache CXF tool in your wsdl2java configuration.
An example fragment of a build.gradle file:
wsdl2java {
...
wsdlsToGenerate = [
[
...
"-suppress-generated-date",
...
]
]
...
}
This option changes these comments in the generated classes
/**
* This class was generated by Apache CXF 3.2.7
* 2018-11-23T10:12:12.986+02:00
* Generated source version: 3.2.7
*
*/
to these:
/**
* This class was generated by Apache CXF 3.2.7
* Generated source version: 3.2.7
*
*/
More details: http://cxf.apache.org/docs/wsdl-to-java.html
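Since the question mentions the Maven cxf-codegen-plugin, the equivalent there would be an extra argument passed through defaultOptions (a sketch to be merged into your existing plugin configuration):

```xml
<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <configuration>
    <defaultOptions>
      <!-- passed to the underlying wsdl2java tool -->
      <extraargs>
        <extraarg>-suppress-generated-date</extraarg>
      </extraargs>
    </defaultOptions>
  </configuration>
</plugin>
```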
However, with CXF 3.5.2, other dates such as
@Generated(value = "org.apache.cxf.tools.wsdlto.WSDLToJava", date = "2022-09-24T16:22:10.990+02:00")
@Generated(value = "com.sun.tools.xjc.Driver", comments = "JAXB RI v2.3.5", date = "2022-09-24T16:22:10+02:00")
still remain in the code.
Yes, the file-heading comments are gone, but the intention was not to have the code cluttered with unwanted changes; the changes are tracked by Git anyway.
Generated dates in code may help with very old code, but generally they are not desirable. It would be better to have one dated comment in the service than twenty identical comments spread around the code.
No one follows twenty dates spread around generated code; if no one reads that information, it has no value and should be avoided.
Changes in the WS contract are commonly tracked in the WSDL file, so there is no need for generated dates in the code.
It might be partially useful if the generated dates tracked real changes, i.e., were updated only where the contents actually changed; it is a bad idea to clutter every location with the very same date.
I have the following HL7 message to parse:
MSH|^~\&|LIS|LAB1|APP2|LAB2|20140706163250||OML^O21|20140706163252282|P|2.4
PID|1||7015||LISTESTPATIENT12^LISTESTPATIENT12||19730901000000|F
PV1|1||||||LISPHYCDE1^LISPHY001^LISCARE TEST
ORC|NW|LISCASEID15|||||||||||||||NJ||||TCL^TCL
OBR|1|LISCASEID15||28259^Her2^STAIN|||20140706162713|||||||20140706162713|Breast|patho^pathl^pathf|||image1^image1^image1|blk1^blk1^blk1|SPEC14^SPEC14^SPEC14
ORC|XO|LISCASEID15|||||||||||||||NJ||||TCL^TCL
OBR|2|LISCASEID15||28260^Her2^STAIN|||20140706162713|||||||20140706162713|Breast|patho^pathl^pathf|||image2^image2^image|blk2^blk2^blk2|SPEC14^SPEC14^SPEC14
I am trying to fetch values from both OBR & ORC segments using HAPI Terser.get() method as follows.
Terser t = new Terser(h7msg);
t.get("/.ORDER_OBSERVATION(0)/ORC-1-1"); // Should return NW
t.get("/.ORDER_OBSERVATION(1)/ORC-1-1"); // Should return XO
t.get("/.ORDER_OBSERVATION(0)/OBR-4-1"); // Should return 28259
t.get("/.ORDER_OBSERVATION(1)/OBR-4-1"); // Should return 28260
But all of the above statements give the following error:
"End of message reached while iterating without loop"
I don't know what I am doing wrong here.
Please help me with the proper input to the Terser.get() method to get the above values.
The issue here is that the OML^O21 message does not contain multiple ORDER_OBSERVATION groups. This means you cannot access the element ORDER_OBSERVATION(1), because it does not exist.
Here is a representation within 7edit:
When you parse your OML message to XML, you can see the real structure of the HL7 message:
<?xml version="1.0" encoding="UTF-8"?><OML_O21 xmlns="urn:hl7-org:v2xml">
<MSH>
<MSH.1>|</MSH.1>
<MSH.2>^~\&</MSH.2>
<MSH.3>
<HD.1>LIS</HD.1>
</MSH.3>
<MSH.4>
<HD.1>LAB1</HD.1>
</MSH.4>
<MSH.5>
<HD.1>APP2</HD.1>
</MSH.5>
<MSH.6>
<HD.1>LAB2</HD.1>
</MSH.6>
<MSH.7>
<TS.1>20140706163250</TS.1>
</MSH.7>
<MSH.9>
<MSG.1>OML</MSG.1>
<MSG.2>O21</MSG.2>
</MSH.9>
<MSH.10>20140706163252282</MSH.10>
<MSH.11>
<PT.1>P</PT.1>
</MSH.11>
<MSH.12>
<VID.1>2.4</VID.1>
</MSH.12>
</MSH>
<OML_O21.PATIENT>
<PID>
<PID.1>1</PID.1>
<PID.3>
<CX.1>7015</CX.1>
</PID.3>
<PID.5>
<XPN.1>
<FN.1>LISTESTPATIENT12</FN.1>
</XPN.1>
<XPN.2>LISTESTPATIENT12</XPN.2>
</PID.5>
<PID.7>
<TS.1>19730901000000</TS.1>
</PID.7>
<PID.8>F</PID.8>
</PID>
<OML_O21.PATIENT_VISIT>
<PV1>
<PV1.1>1</PV1.1>
<PV1.7>
<XCN.1>LISPHYCDE1</XCN.1>
<XCN.2>
<FN.1>LISPHY001</FN.1>
</XCN.2>
<XCN.3>LISCARE TEST</XCN.3>
</PV1.7>
</PV1>
</OML_O21.PATIENT_VISIT>
</OML_O21.PATIENT>
<OML_O21.ORDER_GENERAL>
<OML_O21.ORDER>
<ORC>
<ORC.1>NW</ORC.1>
<ORC.2>
<EI.1>LISCASEID15</EI.1>
</ORC.2>
<ORC.17>
<CE.1>NJ</CE.1>
</ORC.17>
<ORC.21>
<XON.1>TCL</XON.1>
<XON.2>TCL</XON.2>
</ORC.21>
</ORC>
</OML_O21.ORDER>
<OML_O21.ORDER>
<ORC>
<ORC.1>XO</ORC.1>
<ORC.2>
<EI.1>LISCASEID15</EI.1>
</ORC.2>
<ORC.17>
<CE.1>NJ</CE.1>
</ORC.17>
<ORC.21>
<XON.1>TCL</XON.1>
<XON.2>TCL</XON.2>
</ORC.21>
</ORC>
<OML_O21.OBSERVATION_REQUEST>
<OBR>
<OBR.1>1</OBR.1>
<OBR.2>
<EI.1>LISCASEID15</EI.1>
</OBR.2>
<OBR.4>
<CE.1>28259</CE.1>
<CE.2>Her2</CE.2>
<CE.3>STAIN</CE.3>
</OBR.4>
<OBR.7>
<TS.1>20140706162713</TS.1>
</OBR.7>
<OBR.14>
<TS.1>20140706162713</TS.1>
</OBR.14>
<OBR.15>
<SPS.1>
<CE.1>Breast</CE.1>
</SPS.1>
</OBR.15>
<OBR.16>
<XCN.1>patho</XCN.1>
<XCN.2>
<FN.1>pathl</FN.1>
</XCN.2>
<XCN.3>pathf</XCN.3>
</OBR.16>
<OBR.19>image1</OBR.19>
<OBR.20>blk1</OBR.20>
<OBR.21>SPEC14</OBR.21>
</OBR>
<OML_O21.PRIOR_RESULT>
<OML_O21.ORDER_PRIOR>
<OBR>
<OBR.1>2</OBR.1>
<OBR.2>
<EI.1>LISCASEID15</EI.1>
</OBR.2>
<OBR.4>
<CE.1>28260</CE.1>
<CE.2>Her2</CE.2>
<CE.3>STAIN</CE.3>
</OBR.4>
<OBR.7>
<TS.1>20140706162713</TS.1>
</OBR.7>
<OBR.14>
<TS.1>20140706162713</TS.1>
</OBR.14>
<OBR.15>
<SPS.1>
<CE.1>Breast</CE.1>
</SPS.1>
</OBR.15>
<OBR.16>
<XCN.1>patho</XCN.1>
<XCN.2>
<FN.1>pathl</FN.1>
</XCN.2>
<XCN.3>pathf</XCN.3>
</OBR.16>
<OBR.19>image2</OBR.19>
<OBR.20>blk2</OBR.20>
<OBR.21>SPEC14</OBR.21>
</OBR>
</OML_O21.ORDER_PRIOR>
</OML_O21.PRIOR_RESULT>
</OML_O21.OBSERVATION_REQUEST>
</OML_O21.ORDER>
</OML_O21.ORDER_GENERAL>
</OML_O21>
This is unfortunately a problem with many parsers like HAPI: they verify the structure of every message depending on its type (OML_O21) and version. If you change from 2.4 to 2.5, you will get a completely different structure.
If you don't care about that structure, you may use a different HL7 parser, such as HL7X, which transforms the HL7 to XML like a delimited file, independent of HL7 message type or version.
Here you can find a similar problem on Stack Overflow:
How to parse the Multiple OBR Segment in HL7 using HAPI TERSER
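Based on the XML structure above, group-qualified Terser paths should be able to reach the individual segments. This is an untested sketch; the exact group names and nesting depend on HAPI's v2.4 message model, so treat it as a starting point only:

```
/.ORDER_GENERAL/ORDER(0)/ORC-1-1                                   // NW, per the XML above
/.ORDER_GENERAL/ORDER(1)/ORC-1-1                                   // XO
/.ORDER_GENERAL/ORDER(1)/OBSERVATION_REQUEST/OBR-4-1               // 28259
/.ORDER_GENERAL/ORDER(1)/OBSERVATION_REQUEST/PRIOR_RESULT/ORDER_PRIOR/OBR-4-1  // 28260
```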
Given the following XML:
<abc>
<def>
<one>Hello</one>
<two>World</two>
</def>
</abc>
And the XSL file to transform the XML to JSON available here: http://dropbox.ashlock.us/open311/json-xml/xml-tools/xml2json_spark.xsl
When transforming using Interpreted XSLT (PROCESSOR: Apache Software Foundation, javax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl), the JSON output is:
{"abc":[{"one":"Hello","two":"World"}]}
When transforming using Compiled XSLT (PROCESSOR: Apache Software Foundation (Xalan XSLTC), javax.xml.transform.TransformerFactory=org.apache.xalan.xsltc.trax.TransformerFactoryImpl), the JSON output is:
[{"one":"Hello","two":"World"}]
Why do the 2 processors produce different results?
Saxon's output is the same as XSLTC's:
[{"one":"Hello","two":"World"}]
I haven't attempted to debug the stylesheet in detail. It doesn't contain anything obviously implementation-defined, so it looks like a bug in interpreted Xalan to me.
This pattern is questionable, though not illegal:
*[count(../*[name(../*)=name(.)])=count(../*) and count(../*)>1]
It's questionable because name(../*) is supplying a sequence of elements to the name function. That would be an error in XSLT 2.0, but in 1.0 mode it gives the name of the first selected element. I suspect that the author may have intended something like
*[count(../*[name(.)=name(current())])=count(../*) and count(../*)>1]