How to Check for Multiple Conditions in Xpath - java

I want to retrieve the pName of the row whose id is 2 and whose pAddress is INDIA.
<?xml version="1.0"?>
<mysqldump >
<database name="MyDb">
<table name="DescriptionTable">
<row>
<field name="id">1</field>
<field name="pName">XYZ</field>
<field name="pAddress">INDIA</field>
<field name="pMobile">1234567897</field>
</row>
<row>
<field name="id">2</field>
<field name="pName">PQR</field>
<field name="pAddress">UK</field>
<field name="pMobile">755377</field>
</row>
<row>
<field name="id">3</field>
<field name="pName">ABC</field>
<field name="pAddress">USA</field>
<field name="pMobile">67856697</field>
</row>
</table>
</database>
</mysqldump>
String expression="/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id' and ./text()]";
Edit:
I would like to get the pName of the row whose id is 2 and whose pAddress is INDIA:
String expression="/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id' and .='2']and[@name='pAddress' and .='INDIA']/../field[@name='pName']/text()";

Both of the above answers could be improved by moving aspects of the path expression into the predicates and using nested predicates. IMHO this makes the XPath selection much more human-readable.
First we find the row containing the field whose @name eq "id" and whose text() = "2"; from there we can simply select the field of that row whose @name is "pName".
/mysqldump/database[@name = "MyDb"]/table[@name = "DescriptionTable"]/row[field[@name eq "id"][text() = "2"]]/field[@name = "pName"]
Also note the explicit use of eq and =. eq compares atomic values (here, the attribute values), while = compares sequences (it is conceivable that text() may return more than one item, although it won't for your example XML). Note that eq is an XPath 2.0 operator; the XPath engine bundled with the JDK (javax.xml.xpath) implements XPath 1.0, so there you would use = in both places.

Try:
/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id'][.='2']/following-sibling::field[@name='pName']/text()

/mysqldump/database/table/row/field[@name='id' and .=2]/../field[@name='pName']
Explanation:
/mysqldump/database/table/row/field[@name='id' and .=2]
selects the field whose name attribute is id and whose value equals 2.
../
goes to the parent node (the row).
field[@name='pName']
selects the field whose name attribute equals pName.

You can try it this way:
/mysqldump
/database[@name='MyDb']
/table[@name='DescriptionTable']
/row[
field[@name[.='id'] and .=2]
and
field[@name[.='pAddress'] and .='INDIA']
]
/field[@name='pName']
The XPath above selects the <row> element whose id is 2 and whose pAddress is INDIA, then returns that row's pName. In the sample XML from the question, however, no <row> satisfies both criteria. If you meant to select rows that have either id equal to 2 or pAddress equal to INDIA, use or instead of and in the row predicate:
/mysqldump
/database[@name='MyDb']
/table[@name='DescriptionTable']
/row[
field[@name[.='id'] and .=2]
or
field[@name[.='pAddress'] and .='INDIA']
]
/field[@name='pName']

The simplest way to achieve the above is:
String expression = "/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']"
    + "/row[./field[@name='id']/text()='2' and ./field[@name='pAddress']/text()='INDIA']"
    + "/field[@name='pName']/text()";
You can add more conditions in the second line, separated by and/or, based on your needs.
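The row predicate with two conditions can be tried out with the JDK's built-in XPath 1.0 engine. The following is only a minimal sketch: the class name RowLookup, the method findName, and the variable names $id and $address are made up for illustration, and the XML is a trimmed copy of the question's sample. The values are passed as XPath variables rather than concatenated into the expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class RowLookup {
    // Trimmed copy of the question's sample data (pMobile fields omitted).
    static final String XML =
        "<mysqldump><database name='MyDb'><table name='DescriptionTable'>"
        + "<row><field name='id'>1</field><field name='pName'>XYZ</field>"
        + "<field name='pAddress'>INDIA</field></row>"
        + "<row><field name='id'>2</field><field name='pName'>PQR</field>"
        + "<field name='pAddress'>UK</field></row>"
        + "<row><field name='id'>3</field><field name='pName'>ABC</field>"
        + "<field name='pAddress'>USA</field></row>"
        + "</table></database></mysqldump>";

    // Returns the pName of the first row matching BOTH conditions,
    // or an empty string when no row matches.
    public static String findName(String id, String address) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Hypothetical variable names: $id and $address.
        xpath.setXPathVariableResolver(
            name -> "id".equals(name.getLocalPart()) ? id : address);
        String expression =
            "/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']"
            + "/row[field[@name='id'] = $id and field[@name='pAddress'] = $address]"
            + "/field[@name='pName']";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findName("2", "UK"));    // prints PQR
        System.out.println(findName("2", "INDIA")); // prints an empty line: no such row
    }
}
```

With the sample data, asking for id 2 and INDIA yields nothing because row 2 has pAddress UK, which matches the observation made in one of the answers above.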

As I understand it, you want to get the text of two pNames.
I think this should work for you within the current XML (note that XPath operators such as or are lowercase):
//row/field[text()='2' or text()='INDIA']/../field[@name='pName']/text()
If you want just the nodes:
//row/field[text()='2' or text()='INDIA']/../field[@name='pName']

Related

How to skip field when empty in beanio?

The requirement is to skip a field when it is empty.
For example:
<segment name="seg1" class="com.company.bean.segmentBean" xmlType="none">
<field name="field1" xmlName= "fieldXml1" xmlType="attribute" maxLength="7" />
<field name="field2" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
</segment>
Let's assume that field2="". Since the value of field2 is "", I would like the field to be skipped in the segment. Basically, the resulting XML shouldn't display field2 because it is empty ("").
You need the lazy attribute on field2. As stated in the Reference Guide:
lazy - Set to true to convert empty field text to null before type conversion. For repeating fields bound to a collection, the collection will not be created if all field values are null or the empty String. Defaults to false.
This will make field2 = null and by default, most XML libraries will not output any null elements, BeanIO included.
Try this for field2:
<field name="field2" lazy="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
Most of the time, I also combine lazy with trim. From the docs:
trim - Set to true to trim the field text before validation and type
conversion. Defaults to false.
<field name="field2" lazy="true" trim="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />

How to get specific field values from XML Response in Java?

When I print my API response, I get the XML below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<BugInfo xmlns="ctessng" xmlns:ns2="http://www.w3.org/1999/xlink">
<Bug id="CSCvz53137">
<Field name="Assigned Date">09/01/2021 21:12:25</Field>
<Field name="Archived">N</Field>
<Field name="Assigner">James Vilson</Field>
<Field name="Status">V</Field>
<Field name="Submitter">Spark Mery</Field>
<Field name="Reason">Technically Inaccurate</Field>
<Field name="Regression">Y</Field>
<Field name="Resolved Date">09/02/2021 02:12:37</Field>
<Field name="Version">001.010</Field>
</Bug>
</BugInfo>
I want to fetch only specific values from this XML: Assigned Date, Assigner, Submitter and Resolved Date.
Assigned Date --> 09/01/2021 21:12:25
Assigner --> James Vilson
Submitter --> Spark Mery
Resolved Date --> 09/02/2021 02:12:37
What is the best/simplest way to read in values from this xml?
Regex
The most versatile would be plain text-filtering (match/find, extract) using a regular expression:
<Field name=\"(Assigned Date|Assigner|Submitter|Resolved Date)\">(.*)<
Iterating with find() then group(1) and group(2) can give you the desired strings.
See this regex demo
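A minimal Java sketch of the find()/group() loop might look like this. The class name FieldRegex and the method extract are hypothetical, and the pattern uses ([^<]*) instead of (.*) so that a match cannot run past the closing tag when several fields sit on one line; that substitution is my assumption, not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldRegex {
    // group(1) is the field name, group(2) the field's text.
    private static final Pattern FIELD = Pattern.compile(
        "<Field name=\"(Assigned Date|Assigner|Submitter|Resolved Date)\">([^<]*)<");

    // Collects the wanted fields in document order.
    public static Map<String, String> extract(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(xml);
        while (matcher.find()) {
            values.put(matcher.group(1), matcher.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<Bug id=\"CSCvz53137\">"
            + "<Field name=\"Assigned Date\">09/01/2021 21:12:25</Field>"
            + "<Field name=\"Archived\">N</Field>"
            + "<Field name=\"Assigner\">James Vilson</Field>"
            + "</Bug>";
        extract(xml).forEach((name, value) -> System.out.println(name + " --> " + value));
    }
}
```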
XPath
The pure XML-parsing way is to use an XML parser: a DocumentBuilder (obtained from DocumentBuilderFactory) reads the XML into a DOM document, whereas SAXParser is event-based and builds no document. You can then find the desired XML nodes (Field elements) via an XPath expression:
/BugInfo/Bug/Field[@name="Assigner"]|//Field[@name="Assigned Date"]|//Field[@name="Submitter"]|//Field[@name="Resolved Date"]
Iterating over the found nodes, we can extract each node's text content.
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xPath.compile(xPathExpression).evaluate(xmlDocument, XPathConstants.NODESET);
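Put together, the parse-then-query approach could be sketched as follows. The class name BugFields and the method read are hypothetical, and the four alternatives are folded into a single predicate joined with or. One assumption matters here: the response declares a default namespace (xmlns="ctessng"), and the unprefixed path /BugInfo/Bug/Field only matches because DocumentBuilderFactory is not namespace-aware by default. If you call setNamespaceAware(true), you would need a NamespaceContext and prefixed names instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BugFields {
    private static final String EXPRESSION =
        "/BugInfo/Bug/Field[@name='Assigned Date' or @name='Assigner'"
        + " or @name='Submitter' or @name='Resolved Date']";

    // Maps each wanted field name to its text, in document order.
    public static Map<String, String> read(String xml) {
        try {
            // Default factory settings: NOT namespace-aware (see the note above).
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPRESSION, doc, XPathConstants.NODESET);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                values.put(field.getAttribute("name"), field.getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read bug fields", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BugInfo xmlns=\"ctessng\"><Bug id=\"CSCvz53137\">"
            + "<Field name=\"Assigner\">James Vilson</Field>"
            + "<Field name=\"Status\">V</Field>"
            + "<Field name=\"Submitter\">Spark Mery</Field>"
            + "</Bug></BugInfo>";
        read(xml).forEach((name, value) -> System.out.println(name + " --> " + value));
    }
}
```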
See:
Filtering XML Document using XPATH in java
XPath OR operator for different nodes
XML mapping
The object-oriented way would use an XML mapper like Jackson to deserialize (unmarshall) the XML to an object.
Similar to the OkHTTP Recipe: Parse a JSON Response With Moshi (.kt, .java)
Then you would need a class where you can map the XML nodes to.
class Bug {
    String submitter;
    String assigner;
    Date assignedOn;   // java.util.Date
    Date resolvedOn;
}
The mapping can be a bit tricky here because, from the XML model's point of view, a Bug node contains a collection of child Field elements, whereas the target type is semantically not a list of fields but a Bug object with differently typed properties.
Once such a mapping is in place, this is probably the cleanest option because parsing becomes a one-liner: Bug bug = new XmlMapper().readValue(xmlString, Bug.class).

xpath query for editing a single sub-element with the same name and value that is contained in other elements

I'm trying to update the value of a single element (one at a time), but my XPath updates every element that meets the condition. The XML is not complex, for example:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<ELEMENT1>W</ELEMENT1>
<ELEMENT2>IN</ELEMENT2>
<ELEMENT3>RP</ELEMENT3>
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1>2</ELEMENT1>
<ELEMENT2>ARQ</ELEMENT2>
<ELEMENT3>MR</ELEMENT3>
<ELEMENT4>AC</ELEMENT4>
</row>
<row>
<ELEMENT1>3</ELEMENT1>
<ELEMENT2>I</ELEMENT2>
<ELEMENT3>RP</ELEMENT3>
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1>1</ELEMENT1>
<ELEMENT2>CC</ELEMENT2>
<ELEMENT3>XX</ELEMENT3>
<ELEMENT4>I</ELEMENT4>
</row>
<row>
<ELEMENT1>12</ELEMENT1>
<ELEMENT2>IN</ELEMENT2>
<ELEMENT3>3</ELEMENT3>
<ELEMENT4></ELEMENT4>
</row>
</root>
All row elements have the same name (row) and contain child elements with the same names; the values differ, but a value can be repeated across rows.
I iterate over the elements in Java. If I want to update the value of ELEMENT4 in the third row using the following XPath expression
/root/row/ELEMENT4[text()='KKK']
it changes the value of every matching ELEMENT4 in all rows. If I try something like this:
/root/row/ELEMENT3[text()='RP'][/root/row/[position()='3']]
The result is the same. What XPath expression can I use to edit the value of a sub-element within one element without affecting the same-named sub-elements of other elements? Thanks very much.
If you want the ELEMENT4 of the third row, just use:
/root/row[3]/ELEMENT4
You don't need the text() condition.
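A positional predicate like this selects exactly one node, so the update touches only that row. Below is a minimal sketch using the JDK's DOM and XPath APIs; the class name ThirdRowUpdate and its helper methods parse, setText and text are hypothetical, and the XML in main is a shortened version of the question's document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ThirdRowUpdate {
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Replaces the text of the FIRST node matched by the expression, if any.
    public static void setText(Document doc, String expression, String newValue) {
        try {
            Node target = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
            if (target != null) {
                target.setTextContent(newValue);
            }
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Returns the string value of the first matched node ("" if none).
    public static String text(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>"
            + "<row><ELEMENT3>RP</ELEMENT3><ELEMENT4>KKK</ELEMENT4></row>"
            + "<row><ELEMENT3>MR</ELEMENT3><ELEMENT4>AC</ELEMENT4></row>"
            + "<row><ELEMENT3>RP</ELEMENT3><ELEMENT4>KKK</ELEMENT4></row>"
            + "</root>");
        setText(doc, "/root/row[3]/ELEMENT4", "NEW");
        System.out.println(text(doc, "/root/row[1]/ELEMENT4")); // prints KKK (unchanged)
        System.out.println(text(doc, "/root/row[3]/ELEMENT4")); // prints NEW
    }
}
```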

Lucene/Solr - Indexing publications/texts

I want to be able to search publications with facets. These documents will be annotated so I will upload the annotation to the solr instance. The annotation will have fields which are the terms in the document. Here is an example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<add>
<doc>
<field name="Title">High Glucose Increases the Expression of Inflammatory Cytokine Genes in
Macrophages Through H3K9 Methyltransferase Mechanism.</field>
<field name="Cytokine">INTERFERON </field>
<field name="Cytokine">CYTOKINE </field>
<field name="Cytokine">CYTOKINE</field>
<field name="Cytokine">MEC</field>
<field name="Cytokine">EPA</field>
<field name="Cytokine">DIA</field>
<field name="Cytokine">FIC</field>
<field name="Cytokine">CYTOKINES</field>
<field name="Cytokine">INTERLEUKIN-6 </field>
<field name="Cytokine">INTERLEUKIN</field>
<field name="Cytokine">IL-12P40</field>
<field name="Cytokine">IL-12</field>
<field name="Cytokine">IL-1</field>
<field name="Cytokine">P40</field>
<field name="Cytokine">MACROPHAGE INFLAMMATORY PROTEIN-1</field>
<field name="Cytokine">MACROPHAGE INFLAMMATORY PROTEIN</field>
</doc>
</add>
These terms are all from a Cytokine ontology.
I want to be able to set the facet to Cytokine, then select a term and find all of the documents that contain the selected term.
Here is the catch:
I want to be able to store the locations of each term found in the document (a term can show up in multiple locations), so that I can highlight it later. All of these locations are stored in the annotation.
I want to be able to select one of the terms from the facet and also bring up documents that contain that term's synonyms, without uploading the synonym as a term in the facet (or with it somehow distinguished as a synonym, like a subcategory), e.g. automobile and car.
I want to be able to do a cross search, e.g. find documents that contain both MEC and EPA.
I have a list of terms I want to index and search the documents by. These terms have synonyms, which I have entered into the synonyms.txt file.
Also, when a term shows up multiple times in a document, the annotation has multiple instances of this term with different locations. How should I handle this? Will Solr automatically deal with the duplication and not give me the document twice?
One more thing: what about uploading the entire publication to Solr and indexing it on the predefined list of terms?
As I understand it, you have synonyms, and a search term should be matched both directly and through its synonyms. Let me know if I got that right.
If you have all the synonyms at index time, you can index them in a multi-valued field and search on that field.
Faceting is for grouping the search results by field value.

How to give weight to the specific field?

I am using Apache Solr for indexing and searching. I need to give weight to a specific field so that a search is performed first on the most heavily weighted field and then on the others.
I am using SolrJ, Java, and GWT for development.
To boost at index time you need to supply a boost statement in your update doc.
<add overwrite="true">
<doc boost="2.0">
<field name="id">1234</field>
<field name="type">type1</field>
</doc>
<doc>
<field name="id">2345</field>
<field name="type" boost="0.5">type2</field>
</doc>
</add>
The above example demonstrates how to boost a complete document (elevation) as well as how to boost a specific field.
For more documentation look here and here
Using the dismax (or edismax) query handler, you can set the qf (Query Fields) parameter to assign boosts to different fields. It uses this format:
field1^boost_val field2^boost_val....etc.
There are other good parameters to help you control your result ranking as well.
http://wiki.apache.org/solr/ExtendedDisMax
