XML Searching and Parsing


I have an XML file that I am trying to search using Java. I just need to find an element by its tag name and then read that element's value. For example, given this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="https://company.com/test/xslt/processing_report.xslt"?>
<Certificate xmlns="urn:us:net:exchangenetwork:Company">
<Value1>Veggie</Value1>
<Value2>Fruits</Value2>
<type1>Apple</type1>
<FindME>Red</FindME>
<Value3>Bread</Value3>
</Certificate>
I want to find the value inside the FindME tag. I can't use XPath because different files can have different structures, but they always contain a FindME tag. I am looking for the simplest piece of code; I do not care much about performance. Thank you.

Here is the code:
XPathFactory f = XPathFactory.newInstance();
XPathExpression expr = f.newXPath().compile(
"//*[local-name() = 'FindME']/text()");
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("src/test.xml"); //your XML file
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Explained:
//* - matches any element node, wherever it is in the document
local-name() = 'FindME' - keeps only elements whose local name (the name without any namespace prefix or URI) is 'FindME'
text() - selects the text node(s) inside the matched element

I think you need to read up on XPath because it can very easily solve this problem. So can using getElementsByTagName in the DOM API.
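The getElementsByTagName route mentioned above could look roughly like this. It is a minimal sketch, not the answerer's code: the class name FindMeDom and the helper firstValue are made up for illustration, and it uses the namespace-aware variant getElementsByTagNameNS with "*" as the namespace so that the default namespace declared on Certificate does not get in the way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindMeDom {
    // Hypothetical helper: returns the text of the first element with the
    // given local name, or null if the document contains no such element.
    static String firstValue(String xml, String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // "*" matches every namespace, so only the local name has to agree.
        NodeList hits = doc.getElementsByTagNameNS("*", localName);
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Certificate xmlns=\"urn:us:net:exchangenetwork:Company\">"
                + "<Value1>Veggie</Value1><FindME>Red</FindME></Certificate>";
        System.out.println(firstValue(xml, "FindME"));
    }
}
```

Because the search covers the whole document, the position of FindME in the file does not matter.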

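You can still use XPath. All you need is the expression //FindME; the leading // finds the FindME elements anywhere in the XML, irrespective of their parent or path from the root.
If the document uses namespaces (yours declares a default one on Certificate), make sure the parser is namespace-aware and that the expression accounts for the namespace, because with a namespace-aware parser an unprefixed //FindME only matches elements that are in no namespace.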

String findMeVal = null;
InputStream is = //...
XmlPullParser parser = //...
parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true);
parser.setInput(is, null);
int event;
while (XmlPullParser.END_DOCUMENT != (event = parser.next())) {
if (event == XmlPullParser.START_TAG) {
if ("FindME".equals(parser.getName())) {
findMeVal = parser.nextText();
break;
}
}
}

Related

Efficiently unmarshaling a part of a large xml file with JAXB and XMLStreamReader

I want to unmarshal part of a large XML file. A solution for this already exists, but I want to improve it for my own implementation.
Please have a look at the following code:
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource xml = new StreamSource("input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(xml);
xsr.nextTag();
while(!xsr.getLocalName().equals("VersionList")&&xsr.getElementText().equals("1.81")) {
xsr.nextTag();
}
I want to unmarshal input.xml (given below) for the node with versionNumber="1.81".
With the current code, the XMLStreamReader first checks the node with versionNumber="1.80", then visits all of its sub-nodes, and only then moves on to the node with versionNumber="1.81", where the exit condition of the while loop is satisfied.
Since I only want to check the VersionList nodes, iterating over their sub-nodes is unnecessary, and for a large XML file iterating over all sub-nodes of version 1.80 takes a long time. I want to check only the top-level VersionList nodes, and if the first one (versionNumber="1.80") does not match, the XMLStreamReader should jump directly to the next one (versionNumber="1.81"). That does not seem achievable with xsr.nextTag(). Is there any way to iterate over just the desired nodes?
input.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fileVersionListWrapper FileName="src.h">
<VersionList versionNumber="1.80">
<Reviewed>
<commentId>v1.80(c5)</commentId>
<author>Robin</author>
<lines>47</lines>
<lines>48</lines>
<lines>49</lines>
</Reviewed>
<Reviewed>
<commentId>v1.80(c6)</commentId>
<author>Sujan</author>
<lines>82</lines>
<lines>83</lines>
<lines>84</lines>
<lines>85</lines>
</Reviewed>
</VersionList>
<VersionList versionNumber="1.81">
<Reviewed>
<commentId>v1.81(c4)</commentId>
<author>Robin</author>
<lines>47</lines>
<lines>48</lines>
<lines>49</lines>
</Reviewed>
<Reviewed>
<commentId>v1.81(c5)</commentId>
<author>Sujan</author>
<lines>82</lines>
<lines>83</lines>
<lines>84</lines>
<lines>85</lines>
</Reviewed>
</VersionList>
</fileVersionListWrapper>

You can get the node from the XML using XPath.
XPath, the XML Path Language, is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document.
Your XPath expression will be
/fileVersionListWrapper/VersionList[@versionNumber='1.81']
meaning you only want to return the VersionList elements whose versionNumber attribute is 1.81.
Java code
I have assumed that you have the XML as a String (the variable xml below), so you will need something along these lines:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xml));
Document document = builder.parse(inputSource);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/fileVersionListWrapper/VersionList[@versionNumber='1.81']");
NodeList nl = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
Now simply loop through each node:
for (int i = 0; i < nl.getLength(); i++)
{
System.out.println(nl.item(i).getNodeName());
}
To turn the nodes back into XML, create a new Document and append the nodes to it.
Document newXmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element root = newXmlDocument.createElement("fileVersionListWrapper");
for (int i = 0; i < nl.getLength(); i++)
{
Node node = nl.item(i);
Node copyNode = newXmlDocument.importNode(node, true);
root.appendChild(copyNode);
}
newXmlDocument.appendChild(root);
Once you have the new document, run a serializer over it to get the XML.
DOMImplementationLS domImplementationLS = (DOMImplementationLS) newXmlDocument.getImplementation();
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
// serialize the new document, not the original one
String xmlString = lsSerializer.writeToString(newXmlDocument);
Now that you have the XML as a String, I assume you already have a JAXB class that looks similar to this:
@XmlRootElement(name = "fileVersionListWrapper")
public class FileVersionListWrapper
{
private ArrayList<VersionList> versionListArrayList = new ArrayList<VersionList>();
public ArrayList<VersionList> getVersionListArrayList()
{
return versionListArrayList;
}
@XmlElement(name = "VersionList")
public void setVersionListArrayList(ArrayList<VersionList> versionListArrayList)
{
this.versionListArrayList = versionListArrayList;
}
}
Then simply use the JAXB unmarshaller to create the objects for you:
JAXBContext jaxbContext = JAXBContext.newInstance(FileVersionListWrapper.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
StringReader reader = new StringReader(xmlString);
FileVersionListWrapper fileVersionListWrapper = (FileVersionListWrapper) jaxbUnmarshaller.unmarshal(reader);
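If the file is too large to load into a DOM, the streaming approach from the question can be kept and the unwanted VersionList elements skipped with a small depth counter. The following is only a sketch under assumptions: the class VersionListScan and the methods skipElement and commentIds are hypothetical names, and collecting the commentId values merely stands in for whatever processing is needed. A streaming parser still has to read the bytes of a skipped subtree; the loop just avoids doing any work per element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class VersionListScan {
    // Advances past the element the reader is positioned on, ignoring its content.
    static void skipElement(XMLStreamReader xsr) throws Exception {
        int depth = 1;
        while (depth > 0) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // Collects the commentId values below the VersionList with the given versionNumber.
    static List<String> commentIds(String xml, String version) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        try {
            xsr.nextTag(); // root element (fileVersionListWrapper)
            // Each nextTag() lands on a child start tag or on the root's end tag.
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                boolean wanted = "VersionList".equals(xsr.getLocalName())
                        && version.equals(xsr.getAttributeValue(null, "versionNumber"));
                if (!wanted) {
                    skipElement(xsr);
                    continue;
                }
                int depth = 1; // inside the wanted VersionList
                while (depth > 0) {
                    int event = xsr.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if ("commentId".equals(xsr.getLocalName())) {
                            ids.add(xsr.getElementText()); // also consumes the end tag
                        } else {
                            depth++;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
                break; // stop after the first match
            }
        } finally {
            xsr.close();
        }
        return ids;
    }
}
```

At the point where the wanted element is found, the reader could instead be handed to JAXB with Unmarshaller.unmarshal(xsr, VersionList.class), assuming a JAXB-annotated VersionList class exists.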

Filling template xml in memory using Java and JDOM?

I want to create an XML document from a template at runtime in Java using JDOM.
Below is a sample template:
<PARENT>
<ISSUES>
<ISSUE id="ISSUE-X">
<SUMMARY></SUMMARY>
<CATEGORY></CATEGORY>
..
</ISSUE>
</ISSUES>
</PARENT>
I want to load this template file using Java + JDOM and get the following
<PARENT>
<ISSUES>
<ISSUE id="ISSUE-1">
<SUMMARY>Test 1</SUMMARY>
<CATEGORY>Cat 1</CATEGORY>
..
</ISSUE>
<ISSUE id="ISSUE-2">
<SUMMARY>Test 2</SUMMARY>
<CATEGORY>Cat 2</CATEGORY>
..
</ISSUE>
</ISSUES>
</PARENT>
Ideally I want to create more ISSUE nodes, fill them with data from the DB, and save the result to a file.
The reason I thought of using a template is that there will be additional nodes under <ISSUE> which I need to fill from the DB, and I assumed filling them via a template would be much faster.
Can someone guide me on how to get this done in Java using JDOM?
Note: This template will adhere to an XSD which I haven't given here.
Thanks in advance
EDIT: Code snippet below
String sXMLPath = "D:\\WS\\issue_sample.xml";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
dBuilder = dbFactory.newDocumentBuilder();
org.w3c.dom.Document doc = dBuilder.parse(new File(sXMLPath));
DOMBuilder domBuilder = new DOMBuilder();
Document xConfigurationDocument;
xConfigurationDocument = domBuilder.build(doc);
XPathFactory xpfac = XPathFactory.instance();
XPathExpression<Element> xElements = xpfac.compile("//ns:MY-ISSUE/ns:ISSUES",Filters.element(),null,Namespace.getNamespace("ns", "http://www.myns.net/schemas/issue"));
List<Element> elements = xElements.evaluate(xConfigurationDocument);
for (Element xIssuesParent : elements) {
System.out.println(xIssuesParent.getName());
Element xCloneIssue = null ;
for (Element xIssueChild : xIssuesParent.getChildren())
{
xCloneIssue = xIssueChild.clone();
System.out.println(xIssueChild.getName());
xIssuesParent.removeContent(xIssueChild);
}
for (int i = 1; i < 3; i++) {
xCloneIssue.setAttribute("ID", "ISSUE-" + i);
xIssuesParent.addContent(xCloneIssue);
}
}
XMLOutputter xmlOutput = new XMLOutputter();
// display nice nice
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(xConfigurationDocument, new FileWriter("c:\\temp\\OutputFile.xml"));
I am trying this out in a sample application.
The problem is that in the loop for (int i = 1; i < 3; i++), after the first iteration I always get the error: The Content already has an existing parent "ISSUES"
Obviously what I am missing is a new clone.
My question is: how can I always get a handle on an element and keep adding copies of it to the parent?

If it will adhere to an XSD then take a look at org.jdom.input.DOMBuilder which you can parse a DTD into.
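The error in the question arises because the same clone instance is added to the parent twice; each iteration needs its own copy of the template element. The sketch below shows that idea with the JDK's W3C DOM API rather than JDOM, so that it runs without extra libraries; in JDOM the counterpart of cloneNode(true) is the Element.clone() call already used in the question. The class name IssueTemplate, the method fill, and the "Test n" / "Cat n" values are placeholders for data that would come from the DB. Note also that the question sets an "ID" attribute while the template uses "id".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class IssueTemplate {
    // Replaces the single template <ISSUE> with `count` filled-in copies.
    static Document fill(String templateXml, int count) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(templateXml)));
        Element issues = (Element) doc.getElementsByTagName("ISSUES").item(0);
        Element template = (Element) issues.getElementsByTagName("ISSUE").item(0);
        issues.removeChild(template); // detach the template; it is only cloned from now on

        for (int i = 1; i <= count; i++) {
            // A fresh deep copy per iteration: a node can have only one parent.
            Element copy = (Element) template.cloneNode(true);
            copy.setAttribute("id", "ISSUE-" + i);
            copy.getElementsByTagName("SUMMARY").item(0).setTextContent("Test " + i);
            copy.getElementsByTagName("CATEGORY").item(0).setTextContent("Cat " + i);
            issues.appendChild(copy);
        }
        return doc;
    }
}
```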

Read sitemap with XPath

I want to read a sitemap with XPath, but it doesn't work.
Here is my code:
private void evaluate2(String src){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
try{
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(src.getBytes()));
System.out.println(src);
XPathFactory xp_factory = XPathFactory.newInstance();
XPath xpath = xp_factory.newXPath();
XPathExpression expr = xpath.compile("//url/loc");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
items.add(nodes.item(i).getNodeValue());
System.out.println(nodes.item(i).toString());
}
}catch(Exception e){
System.out.println(e.getMessage());
}
}
Beforehand I retrieve the remote source of the sitemap, which is passed to evaluate2 through the variable src.
System.out.println(nodes.getLength()); displays 0.
My XPath query itself works, because the same query works in PHP.
Do you see any errors in my code?
Thanks

You parse the sitemap with a namespace-aware parser (that's what factory.setNamespaceAware(true) does), but then attempt to access it using an XPath that does not use a namespace resolver (or reference any namespaces).
The simplest solution is to configure the parser as not namespace aware. As long as you're just parsing a self-contained sitemap, that shouldn't be a problem.
One more problem in your code is that you pass the sitemap contents as a String, then convert that String using the platform default encoding. This will work as long as your platform-default encoding matches that of the actual bytes that you retrieved from the server (assuming that you also created the string using the platform-default encoding). If it doesn't, you're likely to get a conversion error.

I think the input has a namespace, so you would have to set a NamespaceContext on the xpath object and use prefixes in your XPath, i.e. //url/loc should become //ns:url/ns:loc,
and then add the namespace prefix binding to the namespace context object.
You can find a NamespaceContext implementation in Apache ws-commons-util: http://ws.apache.org/commons/util/apidocs/index.html
NamespaceContextImpl namespaceContextObj = new NamespaceContextImpl();
namespaceContextObj.startPrefixMapping("ns", "http://sitename/xx");
xpath.setNamespaceContext(namespaceContextObj);
XPathExpression expr = xpath.compile("//ns:url/ns:loc");
If you don't know which namespaces will be coming, you can get them from the document itself, but I doubt that will be of much use. There are a few how-tos here:
http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html
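If pulling in an extra library is not desirable, the NamespaceContext interface is small enough to implement inline. The sketch below assumes the sitemap uses the standard sitemaps.org namespace; the prefix "sm", the class name SitemapLocs, and the method locs are arbitrary choices for illustration. It also parses from a StringReader, which sidesteps the platform-encoding issue, and reads the text with getTextContent(), since getNodeValue() on an element node returns null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SitemapLocs {
    // Assumption: the sitemap declares the standard sitemaps.org namespace.
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    static List<String> locs(String src) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(src)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Minimal context: only prefix-to-URI lookup is needed for evaluation.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "sm".equals(prefix) ? SITEMAP_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) {
                return SITEMAP_NS.equals(namespaceURI) ? "sm" : null;
            }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return SITEMAP_NS.equals(namespaceURI)
                        ? Collections.singletonList("sm").iterator()
                        : Collections.<String>emptyIterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate("//sm:url/sm:loc", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }
}
```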

I can't see any errors in your code, so I guess the problem is the source.
Are you sure that the source file contains this element?
Maybe you could try this code to parse the String into a Document:
builder.parse(new InputSource(new StringReader(xml)));

Getting value of child node from XML in java

My xml file looks like this
<InNetworkCostSharing>
<FamilyAnnualDeductibleAmount>
<Amount>6000</Amount>
</FamilyAnnualDeductibleAmount>
<IndividualAnnualDeductibleAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualDeductibleAmount>
<PCPCopayAmount>
<CoveredAmount>0</CoveredAmount>
</PCPCopayAmount>
<CoinsuranceRate>
<CoveredPercent>0</CoveredPercent>
</CoinsuranceRate>
<FamilyAnnualOOPLimitAmount>
<Amount>6000</Amount>
</FamilyAnnualOOPLimitAmount>
<IndividualAnnualOOPLimitAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualOOPLimitAmount>
</InNetworkCostSharing>
I am trying to get the Amount value from <FamilyAnnualDeductibleAmount> and also from <FamilyAnnualOOPLimitAmount>. How do I get those values in Java?

You may use two XPath queries, /InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount and /InNetworkCostSharing/FamilyAnnualOOPLimitAmount/Amount, or just get the InNetworkCostSharing node and walk down from its two relevant children.
Solution using XPath:
// load the XML as String into a DOM Document object
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream("YOUR XML".getBytes());
Document doc = docBuilder.parse(bis);
// XPath to retrieve the content of the <Amount> child of <FamilyAnnualDeductibleAmount>
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount");
String familyAnnualDeductibleAmount = (String) expr.evaluate(doc, XPathConstants.STRING);

StAX based solution:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader rdr = f.createXMLStreamReader(new FileReader("test.xml"));
while (rdr.hasNext()) {
if (rdr.next() == XMLStreamConstants.START_ELEMENT) {
if (rdr.getLocalName().equals("FamilyAnnualDeductibleAmount")) {
rdr.nextTag();
int familyAnnualDeductibleAmount = Integer.parseInt(rdr.getElementText());
System.out.println("familyAnnualDeductibleAmount = " + familyAnnualDeductibleAmount);
} else if (rdr.getLocalName().equals("FamilyAnnualOOPLimitAmount")) {
rdr.nextTag();
int familyAnnualOOPLimitAmount = Integer.parseInt(rdr.getElementText());
System.out.println("FamilyAnnualOOPLimitAmount = " + familyAnnualOOPLimitAmount);
}
}
}
rdr.close();
Note that StAX suits cases like yours well: it streams through the document without building a tree, so you only act on the elements you need.

Try something like this (use getElementsByTagName to get the parent nodes and then read the value by reaching down to the child node):
File xmlFile = new File("NetworkCost.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile );
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("FamilyAnnualDeductibleAmount");
// getTextContent() on the parent returns the text of its <Amount> child; trim() drops indentation whitespace
String familyDedAmount = nList.item(0).getTextContent().trim();
nList = doc.getElementsByTagName("FamilyAnnualOOPLimitAmount");
String familyAnnualAmount = nList.item(0).getTextContent().trim();

I think I found the solution in this Stack Overflow question:
Getting XML Node text value with Java DOM

How to parse this XML in Android?

I am quite new to XML parsing and I have my own method of parsing XML. However, that method only works for simple XML layouts with just one level of child nodes.
I now have to parse an XML file whose children have children that have children of their own.
This is the parse method I have now:
protected Map<String, Maatschappij> getAutopechTel() {
Map<String, Maatschappij> telVoorAutopech = new HashMap<String, Maatschappij>();
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
Document doc = builder.parse(getAssets().open("autopech.xml"));
NodeList nl = doc.getElementsByTagName("dienst");
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
Maatschappij maat = new Maatschappij();
maat.setNaam(Xml.innerHtml(Xml.getChildByTagName(node, "naam")));
maat.setTel(Xml.innerHtml(Xml.getChildByTagName(node, "tel")));
telVoorAutopech.put(maat.getTel(), maat);
}
} catch (Exception e) {
}
return telVoorAutopech;
}
How must I adjust this in order to parse this type of XML file:
<Message>
<Service>Serviceeee</Service>
<Owner>Bla</Owner>
<LocationFeedTo>Too</LocationFeedTo>
<Location>http://maps.google.com/?q=52.390001,4.890145</Location>
<Child1>
<Child1_1>
<Child1_1_1>ANWB</Child1_1_1>
</Child1_1>
</Child1>
</Message>

You can use SAXParser to parse XML on Android. There are detailed tutorials with examples available, including one by IBM developerWorks.
A DOM parser is slow and consumes a lot of memory when it loads an XML document that contains a lot of data. Consider a SAX parser instead: SAX is faster than DOM and uses less memory.
Alternatively, try this, though I haven't tested the code yet. It recursively traverses all the nodes and adds those that are ELEMENT_NODE to the Vector<Node>.
public void traverseNodes(Node node, Vector<Node> nodeList)
{
if(node.getNodeType() == Node.ELEMENT_NODE)
{
nodeList.add(node);
if(node.getChildNodes().getLength() >= 1)
{
NodeList childNodeList = node.getChildNodes();
// start at index 0 so the first child is not skipped
for(int nodeIndex = 0; nodeIndex < childNodeList.getLength(); nodeIndex++)
{
traverseNodes(childNodeList.item(nodeIndex),nodeList);
}
}
}
}
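The SAX approach suggested above could be sketched as follows. This is an illustration rather than a tested Android implementation: the class MessageSax and the method leafValues are hypothetical names, and the handler simply records the text of every leaf element under a slash-separated path such as Message/Child1/Child1_1/Child1_1_1, which copes with any nesting depth.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MessageSax {
    // Maps the path of each leaf element to its trimmed text content.
    static Map<String, String> leafValues(String xml) throws Exception {
        final Map<String, String> values = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>();
            private final StringBuilder text = new StringBuilder();
            private boolean sawChild; // true once the current element has closed a child

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path.addLast(qName);
                text.setLength(0);
                sawChild = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!sawChild) { // only leaf elements carry a value
                    values.put(String.join("/", path), text.toString().trim());
                }
                path.removeLast();
                sawChild = true; // the parent now has at least one child element
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }
}
```

On Android the same handler could be fed from getAssets().open(...) wrapped in an InputSource instead of a StringReader.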
