Read element inside element from XML in SAX or Dom - java

<rootNode>
  <Movies>
    <Movie id="1">
      <title> title1</title>
      <Actors>
        <Actor>Actor1</Actor>
        <Actor>Actor2</Actor>
      </Actors>
    </Movie>
  </Movies>
  <performers>
    <performer id="100">
      <name>name1</name>
      <movie idref="1"/>
    </performer>
  </performers>
</rootNode>
Question 1: I only want to get the Movie elements under Movies. I tried both DOM and SAX, but each also returns the movie element under performers. How can I avoid this using SAX or DOM?
DOM:
doc.getElementsByTagName("movie");
SAX:
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("movie"))
Question 2: How can I get an element nested inside another element (the Actor elements under a Movie) using DOM or SAX?
Basically, what I want to do is output the data in order.
1,title, Actor1,Actor2
100,name1,1

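doc.getElementsByTagName("Movies").item(0).getChildNodes();
gets you all the child nodes of Movies, i.e. the Movies/Movie nodes (tag names are case-sensitive, so watch for lower-/upper-case!). The returned list may also contain whitespace text nodes. See http://www.w3schools.com/dom/dom_intro.asp for a short tutorial.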
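Here is a rough sketch of that DOM approach using only the JDK parser. It assumes the XML has been made well-formed (quoted attribute values, a closing Actors tag), that it is available as a string, and that a Movies element exists. The class and method names (DirectChildrenDemo, childElements, movieIds) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DirectChildrenDemo {

    /** Returns only the direct element children of parent that have the given tag name. */
    static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // getChildNodes() also returns whitespace text nodes, so filter on node type.
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Returns the id attribute of every Movie that sits directly under Movies. */
    static List<String> movieIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Assumption: the document contains exactly one Movies element.
            Element movies = (Element) doc.getElementsByTagName("Movies").item(0);
            List<String> ids = new ArrayList<>();
            for (Element movie : childElements(movies, "Movie")) {
                ids.add(movie.getAttribute("id"));
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }
}
```

Because the lookup walks only the direct children of Movies, the lower-case movie element under performers is never visited.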

XPath is designed for this type of extraction. For your example file, the query would be something like the following. For simplicity, I assumed your XML was in res/raw, but in practice you will need to create the InputSource from wherever you are getting your XML.
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/rootNode/Movies/Movie";
try {
NodeList nodes = (NodeList) xpath.evaluate(expression, doc,XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
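A fuller sketch of the XPath route, which also covers the second question (the Actor elements inside each Movie). It assumes well-formed XML passed in as a string. The names MovieXPathDemo and movieLines are hypothetical, and the output format simply follows the "1,title, Actor1,Actor2" line from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MovieXPathDemo {

    /** Returns one line per Movie under Movies, in the form "id,title,actor1,actor2,...". */
    static List<String> movieLines(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Absolute path: matches Movie under /rootNode/Movies only,
            // not the lower-case movie element under performers.
            NodeList movies = (NodeList) xpath.evaluate(
                    "/rootNode/Movies/Movie", doc, XPathConstants.NODESET);

            List<String> lines = new ArrayList<>();
            for (int i = 0; i < movies.getLength(); i++) {
                Element movie = (Element) movies.item(i);
                StringBuilder line = new StringBuilder(movie.getAttribute("id"));
                // Relative expressions are evaluated with the current Movie as context node.
                line.append(',').append(xpath.evaluate("normalize-space(title)", movie));
                NodeList actors = (NodeList) xpath.evaluate(
                        "Actors/Actor", movie, XPathConstants.NODESET);
                for (int j = 0; j < actors.getLength(); j++) {
                    line.append(',').append(actors.item(j).getTextContent().trim());
                }
                lines.add(line.toString());
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }
}
```

The performers could be listed the same way with a second query such as /rootNode/performers/performer.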

Related

Access data in xml as string

I am receiving XML in string format. Is there any library to search for elements in the string?
<Version value="0"/>
<IssueDate>2017-12-15</IssueDate>
<Locale>en_US</Locale>
<RecipientAddress>
<Category>Primary</Category>
<SubCategory>0</SubCategory>
<Name>Vitsi</Name>
<Attention>Stowell Group Llc.</Attention>
<AddressLine1>511 6th St</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-2903</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>2903</ZIP4>
</RecipientAddress>
<RecipientAddress>
<Category>Additional</Category>
<SubCategory>1</SubCategory>
<Name>Vitsi</Name>
<AddressLine1>Po Box 957</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-0104</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>0104</ZIP4>
</RecipientAddress>
<SenderName>TMO</SenderName>
<SenderId>IL</SenderId>
<SenderAddress>
<Name>T-mobile</Name>
<AddressLine1>Po Box 790047</AddressLine1>
<City>St. Louis</City>
<PresentationValue>ST. LOUIS MO 63179-0047</PresentationValue>
<State>MO</State>
<ZIPCode>63179</ZIPCode>
.
.
.
.
I want to access the RecipientAddress elements, which form a list. Is there any library to do that? Please note that what I receive is a string. It is an invoice and there will be many to process, so performance is important.
The following options are available:
Convert the XML string to Java objects using JAXB.
Use String.indexOf() to retrieve specific parts of the XML.
Use regular expressions to retrieve specific parts of the XML.
Use a SAX/DOM/StAX parser to parse the XML and extract values.
Use XPath to fetch specific values from the XML.
You could use XPath. Java has built-in support for querying XML without any third-party library.
The code would look like this:
String xmlInputStr = "<YOUR_XML_STRING_INPUT>";
String xpathExpressionStr = "<XPATH_EXPRESSION_STRING>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parse(String) would treat its argument as a URI, so wrap the XML text in an InputSource
// (needs java.io.StringReader and org.xml.sax.InputSource)
Document doc = builder.parse(new InputSource(new StringReader(xmlInputStr)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpressionStr);
You can write your own expression string for the query. A typical example:
"//RecipientAddress/Category"
Evaluate the expression against your document to retrieve a list of nodes.
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
And iterate over nodes,
for (int i = 0; i < nodes.getLength(); i++) {
Node nNode = nodes.item(i);
...
}
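Put together, the steps above might look like the sketch below. The invoice's root element is not shown in the question, so the expression uses // to find RecipientAddress anywhere in the document. The class and method names (RecipientAddressDemo, addressSummaries) and the "Category: AddressLine1" output format are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RecipientAddressDemo {

    /** Returns "Category: AddressLine1" for every RecipientAddress in the XML string. */
    static List<String> addressSummaries(String xml) {
        try {
            // Wrap the string in a reader; parse(String) would expect a URI instead.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList addresses = (NodeList) xpath.evaluate(
                    "//RecipientAddress", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < addresses.getLength(); i++) {
                Node address = addresses.item(i);
                // Relative paths are resolved against the current RecipientAddress node.
                String category = xpath.evaluate("Category", address);
                String line1 = xpath.evaluate("AddressLine1", address);
                result.add(category + ": " + line1);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }
}
```

If throughput turns out to matter, it may be worth reusing the factory objects and compiled expressions across invoices, or trying a streaming parser. That is a suggestion only; nothing here has been measured.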
There are many existing APIs for converting XML to Java objects.
Please take a look at Xerces from Apache.
If you only want to extract a specific value, put the whole document into a string and use indexOf("string").

XPATH won't work

I am trying to extract a 'PartyID' from a request using XPath. This request is in the form of XML.
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<s1:invokerules xmlns:s1="http://rules.kmtool.abc.com"><s1:arg0><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<kbdInitiateRequest>
<kmTestHeader>
<MessageId>USER1_MSG1</MessageId>
<TestDate>08/07/2008 07:34:15</TestDate>
<TestReference>
<ConductorReference>
<InvokeIdentifier>
<RefNum>USER1_Ref1</RefNum>
</InvokeIdentifier>
</ConductorReference>
</TestReference>
<TestParty>
<ConductorParty>
<Party PartyID="123456789" AgencyID="DUNS">
<TestContact>
<DetailedContact>
<ContactName>Michael Jackson</ContactName>
<Telephone>02071059053</Telephone>
<TelephoneExtension>4777</TelephoneExtension>
<Email>Michal.Jackson@Neverland.com</Email>
<Title>Mr</Title>
<FirstName>Michael</FirstName>
<Initials>MJ</Initials>
</DetailedContact>
</TestContact>
</Party>
</ConductorParty>
<PerformerParty>
<Party PartyID="987654321" AgencyID="DUNS">
</Party>
</PerformerParty>
</TestParty>
</kmTestHeader>
<kmToolMessage>
<controlNode>
<userRequest>INITIATE</userRequest>
</controlNode>
<customer>
<circuitID>000111333777</circuitID>
</customer>
</kmToolMessage>
</kbdInitiateRequest>
]]></s1:arg0>
</s1:invokerules>
</soapenv:Body>
</soapenv:Envelope>
I have a method in my java code called getPartyId(). This method should extract the PartyID from the XML. However I cannot get this method to return the PartyID no matter what XPath query I use, this is where I need help.
Here is the getPartyId method:
private String getPartyId(String xml) throws XPathExpressionException
{
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Null prefix");
else if ("SOAP-ENV".equals(prefix)) return "http://schemas.xmlsoap.org/soap/envelope/";
else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
return XMLConstants.NULL_NS_URI;
}
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
public Iterator getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
});
XPathExpression expr = xpath.compile("/SOAP-ENV:Envelope/SOAP-ENV:Body/*/*/*/*/*/*/*/*/*/*/*[local-name()='PartyID']/text()");
InputSource source = new InputSource(new StringReader(xml));
String dunsId = (String) expr.evaluate(source,XPathConstants.STRING);
return dunsId;
}
I believe that the problem lies with the XPathExpression:
XPathExpression expr = xpath.compile("/SOAP-ENV:Envelope/SOAP-ENV:Body/*/*/*/*/*/*/*/*/*/*/*[local-name()='PartyID']/text()");
I have tried a number of alternatives for 'expr' however none of these have worked. Has anyone got any ideas?
Because the XML you need to parse sits inside a CDATA block, you'll need to re-parse the value of s1:arg0 before accessing data within it.
You will need to do this in two steps.
First, access the arg0 node in the http://rules.kmtool.abc.com namespace.
Since you don't have a NamespaceContext mapping for this inner xmlns, you can use:
/SOAP-ENV:Envelope/SOAP-ENV:Body/*[local-name()='invokerules']/*[local-name()='arg0']/text()
Then load this value into another InputSource.
The PartyID attribute can be accessed via the path:
kbdInitiateRequest/kmTestHeader/TestParty/ConductorParty/Party/@PartyID
(no need to use local-name() since there aren't any xmlns declarations in the CDATA)
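A minimal sketch of this two-step approach follows. To keep it short it uses local-name() for the SOAP elements as well, so no NamespaceContext is needed at all, and it reads the string value of arg0 rather than selecting text(). PartyIdDemo is a made-up class name, and the method returns the ConductorParty's PartyID only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PartyIdDemo {

    /** Extracts the ConductorParty PartyID from the XML embedded in the SOAP message's CDATA. */
    static String getPartyId(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed so local-name() yields the name without its prefix.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Step 1: the string value of arg0 is the CDATA content, i.e. the inner XML text.
            Document outer = builder.parse(new InputSource(new StringReader(soapXml)));
            String innerXml = xpath.evaluate(
                    "/*[local-name()='Envelope']/*[local-name()='Body']"
                    + "/*[local-name()='invokerules']/*[local-name()='arg0']", outer);

            // Step 2: parse the inner XML as its own document and query it directly.
            // trim() removes surrounding whitespace that would precede the XML declaration.
            Document inner = builder.parse(new InputSource(new StringReader(innerXml.trim())));
            return xpath.evaluate(
                    "/kbdInitiateRequest/kmTestHeader/TestParty/ConductorParty/Party/@PartyID",
                    inner);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not extract the PartyID", e);
        }
    }
}
```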
Notice that your inner XML is inside a CDATA node.
So basically you are trying to query a path in XML that sits inside CDATA.
As this thread states:
Xpath to the tag inside CDATA
it seems this is not possible :(
I would suggest extracting the CDATA content in code, parsing it into a new XML Document, and querying that.
Thanks,
Amir

parsing xml in java- multiple child elements

I want to parse XML elements using Java. I have succeeded in part, but I am not sure how to do the rest. My XML is:
<MainTag>
<userid>user1</userid>
<country>US</country>
<city>LA</city>
<phone>
<number>1111111111</number>
</phone>
<phone>
<number>222222222</number>
</phone>
</MainTag>
<MainTag>
<userid>user2</userid>
<country>Aus</country>
<city>MB</city>
<phone>
<number>23233</number>
</phone>
<phone>
<number>8787822</number>
</phone>
<phone>
<number>10101</number>
</phone>
</MainTag>
I am able to parse xml elements such as country,city etc as below.
public void endElement()
{
    if (someText.equalsIgnoreCase("country"))
    {
        pojo.setCountry(Val);
    }
    else if (someText.equalsIgnoreCase("city"))
    {
        pojo.setCity(Val);
    }
}
public void startElement()
{
    ............
}
But how can I parse phone, since one user has multiple phone numbers?
I want to find all the phone numbers for a particular user.
For example, in the XML above,
user1 has two phone numbers and
user2 has three phone numbers.
Can anybody help with this? Thanks in advance.
I would recommend using JAXB, since it appears you are attempting to bind your XML to a POJO.
Looking at the code you have written here (and assuming that the example XML you have provided is a snippet of well-formed XML), I am guessing that your POJO should have a member for phone numbers of type List<String>, and a method that lets you add a phone number to the list (perhaps addPhoneNumber(String phoneNumber) {...}).
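If you would rather stay with SAX, the List<String> idea can be sketched as below. This is not JAXB. It is a hand-written handler, and it assumes the MainTag elements are wrapped in a single root element so that the document is well-formed. The User class, the parseUsers method and the wrapper element are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class UserSaxDemo {

    /** Hypothetical POJO: one user with any number of phone numbers. */
    static final class User {
        String userId;
        final List<String> phoneNumbers = new ArrayList<>();
    }

    static List<User> parseUsers(String xml) {
        final List<User> users = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private User current;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
                if ("MainTag".equals(qName)) {
                    current = new User(); // a new user begins
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current == null) {
                    return; // outside any MainTag, e.g. the wrapper root
                }
                String value = text.toString().trim();
                switch (qName) {
                    case "userid":  current.userId = value; break;
                    case "number":  current.phoneNumbers.add(value); break; // one entry per phone
                    case "MainTag": users.add(current); current = null; break;
                    default: break;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return users;
    }
}
```

Each number element adds an entry to the current user's list, so a user can have two, three or any other number of phones without extra bookkeeping.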
First, that is not well-formed XML (it has two root elements), and you can't parse it with any parser API unless it is well-formed. Now, to parse the XML you would normally use one of the APIs meant for it, such as SAX, DOM or StAX, or even better the JAXB binding API.
Since you seem to be new to this, I suggest you start learning JAXP. Use StAX instead of DOM or SAX.
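You can use the JDK's DocumentBuilderFactory if you know the incoming XML format, for example which nodes it has and what they are called. It is fairly simple; look at this code:
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class EmployeeParser {
    public static void main(String[] args) {
        try {
            // Create a DocumentBuilder and parse the file into a DOM tree
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = db.parse("employees.xml");
            // Get the root element, then every <Employee> element below it
            Element root = dom.getDocumentElement();
            NodeList nl = root.getElementsByTagName("Employee");
            for (int i = 0; i < nl.getLength(); i++) {
                printEmployee((Element) nl.item(i));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    // For each <Employee> element, read the values of its child elements
    private static void printEmployee(Element el) {
        String name = getTextValue(el, "Name");
        int id = getIntValue(el, "Id");
        int age = getIntValue(el, "Age");
        // Attributes can be read with el.getAttribute("type")
        System.out.println(id + "," + name + "," + age);
    }

    // Text of the first descendant element with the given tag name, or null if there is none
    private static String getTextValue(Element el, String tagName) {
        NodeList nl = el.getElementsByTagName(tagName);
        return nl.getLength() > 0 ? nl.item(0).getTextContent().trim() : null;
    }

    // Same as getTextValue, parsed as an int (throws NumberFormatException if missing or not numeric)
    private static int getIntValue(Element el, String tagName) {
        return Integer.parseInt(getTextValue(el, tagName));
    }
}
That's all.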

How do I remove all selected nodes from an XPath?

I run an XPath in Java with the following xml and code:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<member name="James">
<friendlist>
<friend>0001</friend>
<friend>0002</friend>
<friend>0003</friend>
</friendlist>
</member>
<member name="Jamie">
<friendlist>
<friend>0003</friend>
<friend>0002</friend>
<friend>0001</friend>
</friendlist>
</member>
<member name="Katie">
<friendlist>
<friend>0001</friend>
<friend>0003</friend>
<friend>0004</friend>
</friendlist>
</member>
</list>
Code:
try {
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression pathExpr = xpath.compile("/list/member/friendlist/friend[.='0003']");
} catch (XPathExpressionException e) {
Of course there is more code after this, but I didn't paste it here because I thought it might confuse things even more.
But the idea is that I wish to select all the friend nodes that have the ID 0003 from all the members' friendlist nodes, and then remove them from the XML file. The XPath works by selecting all the "friend" nodes whose value is 0003. I know I can use the removeChild() method of the XML Document object. But the problem is: how do I remove all of them directly, without going through layers of loops starting from the parent? The removeChild() method needs me to know its parent's parent's parent.
Thanks!
Update:
This is how I used my XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression pathExpr = null;
try {
pathExpr = xpath.compile("/list/member/friendlist/friend[.='0003']");
} catch (XPathExpressionException e) {
e.printStackTrace();
}
NodeList list = null;
try {
list = (NodeList) pathExpr.evaluate(xmlDoc, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
The xmlDoc is an XML Document object holding the parsed XML file. The XML parses fine. The only problem is that the XPath seems to return not references but a whole new node list, which makes it impossible for me to refer back to the original XML document to make amendments.
For each node n in the returned NodeList:
n.getParentNode().removeChild(n);
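As a runnable sketch of that loop: the nodes returned by the JDK's XPath evaluation are the nodes of the original document, so removing each one from its own parent edits the document in place. The class and method names (RemoveNodesDemo, parse, removeMatching) are invented for this example. Copying the hits into a list before removing anything is just a precaution.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RemoveNodesDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    /** Removes every node matched by the expression from doc and returns how many were removed. */
    static int removeMatching(Document doc, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            // Copy first, so the removal loop never depends on how the NodeList is backed.
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                toRemove.add(hits.item(i));
            }
            for (Node n : toRemove) {
                // Each node knows its own parent; no walk down from the root is needed.
                n.getParentNode().removeChild(n);
            }
            return toRemove.size();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }
}
```

It could be called as removeMatching(xmlDoc, "/list/member/friendlist/friend[.='0003']"), after which xmlDoc no longer contains those friend elements.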
I don't understand why the nodes in the returned NodeList are returning null for getParentNode().
But you could try first selecting all the parents of the nodes you want to remove, with this XPath expression:
"/list/member/friendlist[friend[.='0003']]"
or the equivalent,
"/list/member/friendlist[friend = '0003']"
Then iterate through the resulting nodelist, and in the context of each one, query for nodes matching the XPath expression
"friend[.='0003']"
That will give you a parent node and a child node to use with removeChild().
Have a look at XUpdate. It's not pretty, but it works.

Java - parse xml string with variable tagnames?

I'm trying to parse an XML string, and the tagnames are variable; I haven't seen any examples on how to pull the information out without knowing them. For example, I will always know the <response> and <data> tags below, but what falls inside/outside of them could be anything from <employee> to you name it.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<generic>
....
</generic>
<data>
<employee>
<name>Seagull</name>
<id>3674</id>
<age>34</age>
</employee>
<employee>
<name>Robin</name>
<id>3675</id>
<age>25</age>
</employee>
</data>
</response>
You could parse it into a generic DOM object and traverse it. For example, you could use dom4j to do this.
From the dom4j quick start guide:
public void treeWalk(Document document) {
treeWalk( document.getRootElement() );
}
public void treeWalk(Element element) {
for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
Node node = element.node(i);
if ( node instanceof Element ) {
treeWalk( (Element) node );
}
else {
// do something....
}
}
}
public Document parse(URL url) throws DocumentException {
SAXReader reader = new SAXReader();
Document document = reader.read(url);
return document;
}
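The same kind of walk can be done with the DOM parser that ships with the JDK, if you would rather not add dom4j. This is only a sketch: the GenericWalkDemo class and the "path=text" output format are assumptions chosen to show that no tag name has to be known in advance.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class GenericWalkDemo {

    /** Recursively visits element and adds "path=text" to out for every leaf element. */
    static void walk(Element element, String parentPath, List<String> out) {
        String path = parentPath + "/" + element.getTagName();
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                walk((Element) child, path, out); // descend without knowing the tag name
            }
        }
        if (!hasChildElement) {
            out.add(path + "=" + element.getTextContent().trim());
        }
    }

    /** Parses the XML string and returns the "path=text" entries of all leaf elements. */
    static List<String> leaves(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }
}
```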
I have seen similar situations in projects.
If you are going to deal with large XML documents, you can use a StAX or SAX parser to read the XML. At every step (for example on reaching an end element), enter the data into a Map or a data structure of your choice, keeping the tag name as the key and the element's value as the value. Once parsing is done, use this Map to figure out which object to build, since by then you have a proper representation of the information in the XML.
If the XML is small, use DOM and build the entity object directly by reading the specific tag (like <employee>), or use XPath to look where you expect the tag to be present, which gives you a hint of the entity. Build that object directly by reading the specific information from the XML.
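A rough StAX sketch of the Map idea, under some assumptions that are not in the question: every child of data is one record, each record contains only flat text fields, and the record's own tag name is stored under a made-up "_type" key. StaxMapDemo and readRecords are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxMapDemo {

    /** Reads each child of data into a map of tag name to text, whatever the tags are called. */
    static List<Map<String, String>> readRecords(String xml) {
        List<Map<String, String>> records = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int depth = 0;       // nesting depth of the current element
            int dataDepth = -1;  // depth at which data was opened, -1 while outside it
            Map<String, String> current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (dataDepth < 0 && "data".equals(reader.getLocalName())) {
                        dataDepth = depth;
                    } else if (dataDepth > 0 && depth == dataDepth + 1) {
                        current = new LinkedHashMap<>();           // a new record starts
                        current.put("_type", reader.getLocalName());
                    } else if (dataDepth > 0 && depth == dataDepth + 2) {
                        String name = reader.getLocalName();
                        current.put(name, reader.getElementText().trim());
                        depth--; // getElementText() already consumed the matching end tag
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (dataDepth > 0 && depth == dataDepth + 1) {
                        records.add(current);                      // record finished
                        current = null;
                    } else if (depth == dataDepth) {
                        dataDepth = -1;                            // left the data element
                    }
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML", e);
        }
        return records;
    }
}
```

The "_type" entry is what you could later use to decide which entity object to build from each map.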
