I have configured Apache Solr 6.6.2 to index documents and search them later, and I am facing a problem. If a document contains a number such as 1234, I want it to be mapped (copied) to the corresponding Urdu numerals, ۱۲۳۴. That would allow the document to be retrieved whether the user enters 1234 or ۱۲۳۴.
Is there a built-in solution in Solr, or how else can I achieve this?
If you are using a Java/SolrJ client for indexing ...
Add the junidecode dependency to your project.
For Gradle:
compile group: 'junidecode', name: 'junidecode', version: '0.1.1'
For Maven:
<dependency>
<groupId>junidecode</groupId>
<artifactId>junidecode</artifactId>
<version>0.1.1</version>
</dependency>
While indexing, index an additional field that holds the converted value:
import net.sf.junidecode.Junidecode;
String converted = Junidecode.unidecode("۱۲۳۴");
// converted is "1234"
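The conversion can also be sketched without any extra dependency. The following is a minimal sketch, not the junidecode implementation: the class and method names are hypothetical, and it assumes the Urdu numerals in question are the Extended Arabic-Indic digits U+06F0 to U+06F9.

```java
// Hypothetical helper, not part of Solr or junidecode.
final class UrduDigits {
    private UrduDigits() {}

    // Assumption: Urdu numerals are the Extended Arabic-Indic digits,
    // which start at U+06F0 (zero) and end at U+06F9 (nine).
    private static final char URDU_ZERO = '\u06F0';

    /** Replaces ASCII digits 0-9 with Urdu digits; other characters are kept. */
    static String toUrdu(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            out.append(c >= '0' && c <= '9' ? (char) (URDU_ZERO + (c - '0')) : c);
        }
        return out.toString();
    }

    /** Replaces any Unicode decimal digit with its ASCII equivalent. */
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (Character.isDigit(c)) {
                out.append(Character.digit(c, 10));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Either form could then be stored in the additional field at index time, so that a query in the other script still matches.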
I am trying to understand the result generated by the cTAKES parser, and there are certain points I do not understand.
The cTAKES parser is invoked via tika-app.
We get the following result:
ctakes:AnatomicalSiteMention: liver:77:82:C1278929,C0023884
ctakes:ProcedureMention: CT scan:24:31:C0040405,C0040405,C0040405,C0040405
ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405
ctakes:ProcedureMention: scan:27:31:C0034606,C0034606,C0034606,C0034606,C0441633,C0034606,C0034606,C0034606,C0034606,C0034606,C0034606
ctakes:RomanNumeralAnnotation: did:47:50:
ctakes:SignSymptomMention: lesions:62:69:C0221198,C0221198
ctakes:schema: coveredText:start:end:ontologyConceptArr
resourceName: sample
and the parsed document contains:
The patient underwent a CT scan in April which did not reveal lesions in his liver
I have the following questions:
Why is the UMLS ID repeated, as in ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405? (The cTAKES configuration properties file has annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR.)
What does RomanNumeralAnnotation indicate?
In a concept unique identifier such as C0040405, do the 7 digits have any meaning? How are they generated?
System information:
Apache Tika 1.10
Apache cTAKES 3.2.2
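To make the ctakes:schema line concrete, here is a minimal sketch that splits one metadata value into coveredText, start, end and the concept list, collapsing repeated identifiers. The class and field names are hypothetical, and the sketch does not explain why cTAKES emits the repeats; it only shows how the schema maps onto the values above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical holder for one value of the form coveredText:start:end:ontologyConceptArr.
final class CtakesMention {
    final String coveredText;
    final int start;
    final int end;
    final Set<String> distinctConcepts;

    private CtakesMention(String coveredText, int start, int end, Set<String> concepts) {
        this.coveredText = coveredText;
        this.start = start;
        this.end = end;
        this.distinctConcepts = concepts;
    }

    /** Parses from the right, so a colon inside the covered text does not break the split. */
    static CtakesMention parse(String value) {
        int c3 = value.lastIndexOf(':');
        int c2 = value.lastIndexOf(':', c3 - 1);
        int c1 = value.lastIndexOf(':', c2 - 1);
        if (c1 < 0) {
            throw new IllegalArgumentException("Expected text:start:end:concepts, got: " + value);
        }
        // LinkedHashSet drops repeated identifiers but keeps first-seen order.
        Set<String> concepts = new LinkedHashSet<>();
        for (String id : value.substring(c3 + 1).split(",")) {
            if (!id.isEmpty()) {
                concepts.add(id);
            }
        }
        return new CtakesMention(
                value.substring(0, c1),
                Integer.parseInt(value.substring(c1 + 1, c2)),
                Integer.parseInt(value.substring(c2 + 1, c3)),
                concepts);
    }
}
```

The start and end values are character offsets into the parsed document, so substring(24, 31) on the sample sentence yields "CT scan".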
I am trying to convert strings that contain a Unicode escape sequence to the actual character, but everything I have found so far either works only if the string consists solely of the escape sequence, or converts the symbol to the code instead.
This is the string I am using as an example right now
Rebroadcast of Shows from the past Week! RPGs, Talk shows, Science, Wisdom, Vampires and more - Good stuff! \\u003c3 - !rbschedule for more info
I am getting this from an API call, so I can't just write it as \ instead of the \\.
The \\ is an escaped backslash. That is called escaping, and it is what currently prevents you from seeing the < character.
Un-escaping is not something you want to do by hand, as there are many caveats.
You might want to use Apache Commons Text StringEscapeUtils#unescapeJava:
final String result = StringEscapeUtils.unescapeJava(yourString);
That will output "...Good stuff! <3 - !rbschedule for more info..."
The Maven dependency
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.6</version>
</dependency>
Or for Gradle
compile group: 'org.apache.commons', name: 'commons-text', version: '1.6'
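If adding a dependency is not an option, the Unicode part of the un-escaping can be sketched with the standard library alone. This is a minimal sketch, not a replacement for StringEscapeUtils: the class name is hypothetical, it only handles backslash-u sequences with four hex digits, and it deliberately treats one or more leading backslashes as part of the sequence (an assumption that fits a doubly escaped API response but is not standard Java un-escaping).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; handles only Unicode escapes, not newline, tab or octal escapes.
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    // One or more backslashes, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\+u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher matcher = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling UnicodeEscapes.decode on the example string turns the escaped sequence before the 3 into a < character and leaves the rest of the text unchanged.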
what if you replace all "//" with "/" dynamically?
I am trying to install a custom archetype in my local repo. When users generate a project based on this archetype, they are required to provide the artifactId, which is usually all lower case. However, the main class name (and therefore the Java filename) is derived from this artifactId with the first letter capitalized. Instead of asking the user to enter another variable, I would like to call a String method to convert the artifactId into the correct format for the class name.
According to "Maven archetype: Modify artifactId", it looks like you can embed a Java method call in archetype-metadata.xml as below:
<requiredProperty key="artifactIdWithUnderscore" >
<defaultValue>${artifactId.replaceAll("-", "_")}</defaultValue>
</requiredProperty>
So I did something similar in my archetype-metadata.xml to capitalize first letter.
<requiredProperty key="artifactId" />
<requiredProperty key="serviceName">
<defaultValue>${artifactId.toLowerCase().substring(0,1).toUpperCase()+artifactId.toLowerCase().substring(1)}</defaultValue>
</requiredProperty>
However I got the following parse error:
SEVERE: Parser Exception: serviceName
org.apache.velocity.runtime.parser.ParseException: Encountered "+artifactId.toLowerCase().substring(1)}" at line 1, column 55.
Was expecting one of:
"[" ...
"}" ...
at org.apache.velocity.runtime.parser.Parser.generateParseException(Parser.java:3679)
What is the correct way to use a Java String method in this archetype XML?
The plus character can't be used as a string concatenation operator here, and it isn't needed either. Just place the two references next to each other, without any operator between them:
<defaultValue>${artifactId.toLowerCase().substring(0,1).toUpperCase()}${artifactId.toLowerCase().substring(1)}</defaultValue>
<defaultValue>
${artifactId.substring(0,1).toUpperCase()}${artifactId.substring(1).toLowerCase()}
</defaultValue>
This works for me, but artifactId resolves to the artifactId of the archetype project and not, as I expected, to the artifactId I pass as an argument...
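For reference, the two adjacent Velocity references compute the same thing as the plain Java expression below. This is only an illustrative sketch of the string logic; the class and method names are hypothetical, and the empty-string guard is an addition that the Velocity expression does not have.

```java
import java.util.Locale;

// Hypothetical helper mirroring the archetype expression in plain Java.
final class ClassNames {
    private ClassNames() {}

    /** Upper-cases the first character and lower-cases the rest. */
    static String fromArtifactId(String artifactId) {
        if (artifactId.isEmpty()) {
            return artifactId; // substring(0, 1) would throw on an empty string
        }
        // Locale.ROOT avoids locale-specific case mappings.
        return artifactId.substring(0, 1).toUpperCase(Locale.ROOT)
                + artifactId.substring(1).toLowerCase(Locale.ROOT);
    }
}
```

Note that characters such as hyphens are left in place, so an artifactId like my-service would still need the replaceAll step shown earlier before it is a valid class name.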
This will capitalize the first letter.
<requiredProperty key="artifactId" />
<requiredProperty key="serviceName">
<defaultValue>${StringUtils.capitalize("artifactId")}</defaultValue>
</requiredProperty>
Make sure you have the Apache Commons Lang dependency included.
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.5</version>
</dependency>
For given groupId, artifactId, version, classifier and type, how can I download the corresponding artifact using REST?
Use the GAVC search to get the URL; from there you can download the artefact:
GAVC Search
Description: Search by Maven coordinates: GroupId, ArtifactId, Version & Classifier. Search must contain at least one argument. Can limit search to specific repositories (local and remote-cache).
Since: 2.2.0
Security: Requires a privileged user (can be anonymous)
Usage: GET /api/search/gavc?[g=groupId][&a=artifactId][&v=version][&c=classifier][&repos=x[,y]]
Headers (Optionally): X-Result-Detail: info (To add all extra information of the found artifact), X-Result-Detail: properties (to get the properties of the found artifact), X-Result-Detail: info, properties (for both).
Produces: application/vnd.org.jfrog.artifactory.search.GavcSearchResult+json
Sample Output:
GET /api/search/gavc?g=org.acme&a=artifact&v=1.0&c=sources&repos=libs-release-local
{
"results": [
{
"uri": "http://localhost:8080/artifactory/api/storage/libs-release-local/org/acme/artifact/1.0/artifact-1.0-sources.jar"
},{
"uri": "http://localhost:8080/artifactory/api/storage/libs-release-local/org/acme/artifactB/1.0/artifactB-1.0-sources.jar"
}
]
}
Taken from the API documentation.
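As a minimal sketch of how such a request URL could be assembled from the coordinates, the following builds only the query string described in the usage line above and skips parameters that are not given. The class name, method name and base URL are assumptions for illustration; sending the request and reading the "uri" values from the result is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the /api/search/gavc path and the parameter
// names g, a, v, c and repos come from the documentation quoted above.
final class GavcSearchUrl {
    private GavcSearchUrl() {}

    /** Any argument may be null or empty, in which case it is left out of the query. */
    static String build(String baseUrl, String groupId, String artifactId,
                        String version, String classifier, String repos) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("g", groupId);
        params.put("a", artifactId);
        params.put("v", version);
        params.put("c", classifier);
        params.put("repos", repos);

        StringBuilder url = new StringBuilder(baseUrl).append("/api/search/gavc");
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            String value = param.getValue();
            if (value == null || value.isEmpty()) {
                continue;
            }
            url.append(separator).append(param.getKey()).append('=').append(encode(value));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With the sample coordinates above and a base URL of http://localhost:8080/artifactory, this produces the same query as the sample GET request.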
In the Artifactory docs about their REST service you have an example here: https://www.jfrog.com/confluence/display/RTF/Artifactory+REST+API#ArtifactoryRESTAPI-RetrieveArtifact
I am selecting certain RDF properties using Apache Marmotta LDPath. The documentation (http://marmotta.apache.org/ldpath/language.html) states that the fn and lmf prefixes do not need to be defined explicitly.
My code is:
@prefix dc : <http://purl.org/dc/elements/1.1/> ;
id = . :: xsd:string ;
title = dc:title :: xsd:string ;
file = fn:content(.) :: lmf:text_es ;
but I get the following ParseException:
Caused by: org.apache.marmotta.ldpath.parser.ParseException: function with URI http://www.newmedialab.at/lmf/functions/1.0/content does not exist
at org.apache.marmotta.ldpath.parser.LdPathParser.getFunction(LdPathParser.java:213)
at org.apache.marmotta.ldpath.parser.LdPathParser.FunctionSelector(LdPathParser.java:852)
at org.apache.marmotta.ldpath.parser.LdPathParser.AtomicSelector(LdPathParser.java:686)
at org.apache.marmotta.ldpath.parser.LdPathParser.Selector(LdPathParser.java:607)
at org.apache.marmotta.ldpath.parser.LdPathParser.Rule(LdPathParser.java:441)
at org.apache.marmotta.ldpath.parser.LdPathParser.Program(LdPathParser.java:406)
at org.apache.marmotta.ldpath.parser.LdPathParser.parseProgram(LdPathParser.java:112)
at org.apache.marmotta.ldpath.LDPath.programQuery(LDPath.java:235)
... 47 more
Edit
I'm using the LDPath core in Fedora (DuraSpace) 4.5.1. My goal is to index the full text of binary resources in Solr; any way of achieving that is fine with me.
For anyone who needs it:
it seems that this subset of the Apache Marmotta LDPath library does not support complex functions such as fn:, lmf: and others.
To index the full text of binary resources, it is necessary to use something like Apache Tika.