Split XML using JDOM in Java - java

I have following XML string.
<Engineers>
<Engineer>
<Name>JOHN</Name>
<Position>STL</Position>
<Team>SS</Team>
</Engineer>
<Engineer>
<Name>UDAY</Name>
<Position>TL</Position>
<Team>SG</Team>
</Engineer>
<Engineer>
<Name>INDRA</Name>
<Position>Director</Position>
<Team>PP</Team>
</Engineer>
</Engineers>
I need to split this XML into smaller XML strings when the XPath is given as Engineers/Engineer.
The smaller XML strings are as follows:
<Engineer>
<Name>INDRA</Name>
<Position>Director</Position>
<Team>PP</Team>
</Engineer>
<Engineer>
<Name>JOHN</Name>
<Position>STL</Position>
<Team>SS</Team>
</Engineer>
I have implemented following using saxon xpath and JDOM.
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.om.DocumentInfo;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.impl.builder.StAXOMBuilder;
import org.junit.Test;
import org.xml.sax.InputSource;
import java.io.File;
import java.io.FileInputStream;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
public void testXML() throws XPathFactoryConfigurationException, XPathExpressionException, Exception {
System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_JDOM,
"net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory xPathFactory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_JDOM);
XPath xPath = xPathFactory.newXPath();
InputSource inputSource = new InputSource(new File(filename).toURI().toString());
SAXSource saxSource = new SAXSource(inputSource);
Configuration config = ((XPathFactoryImpl) xPathFactory).getConfiguration();
DocumentInfo document = config.buildDocument(saxSource);
XPathExpression xPathExpression = xPath.compile("//Engineers/Engineer");
List matches = (List) xPathExpression.evaluate(document, XPathConstants.NODESET);
if (matches != null) {
for (Iterator iter = matches.iterator(); iter.hasNext(); ) {
NodeInfo node = (NodeInfo) iter.next();
System.out.println(node.getDisplayName() + " - " + node.getStringValue());
}
}
}
It gives following result.
Engineer -
JOHN
STL
SS
Engineer -
UDAY
TL
SG
Engineer -
INDRA
Director
PP
How can I change the code so that I get my desired output? Or is there a way to get the names of the child elements (Name, Position, Team) inside Engineer?

If you are using JDOM for this work, you should consider using the native JDOM methods instead of the abstraction run through Saxon.
Consider something like:
import java.io.File;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;
....
// Compile the XPath once; Filters.element() restricts the results to Element nodes
XPathExpression<Element> xpe = XPathFactory.instance()
.compile("//Engineers/Engineer", Filters.element());
Document doc = new SAXBuilder().build(new File(filename));
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
// Each matched Engineer element is written out as its own XML fragment
for (Element e : xpe.evaluate(doc)) {
xout.output(e, System.out);
}
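If you would rather avoid extra dependencies, the same idea can be sketched with the JDK's own DOM and XPath APIs instead of JDOM. This is only an illustration, not part of the JDOM answer above: the class name EngineerSplitter and its two methods are hypothetical. It serializes each matched element to a string and also lists the names of an element's children, which covers the second part of the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, JDK-only (no JDOM or Saxon required).
public class EngineerSplitter {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes every node matched by the XPath as its own XML string. */
    public static List<String> split(String xml, String xpath) throws Exception {
        NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, parse(xml), XPathConstants.NODESET);

        // An identity transformer turns a DOM node back into text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> parts = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(matches.item(i)), new StreamResult(out));
            parts.add(out.toString());
        }
        return parts;
    }

    /** Returns the names of the element children of the first node matched by the XPath. */
    public static List<String> childNames(String xml, String xpath) throws Exception {
        Node parent = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, parse(xml), XPathConstants.NODE);
        List<String> names = new ArrayList<>();
        if (parent == null) {
            return names; // nothing matched
        }
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes and comments; keep elements only.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }
}
```

Calling EngineerSplitter.split(xml, "/Engineers/Engineer") should give one string per Engineer element, and childNames(xml, "/Engineers/Engineer[1]") the child element names of the first one.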

I would do the splitting in XSLT:
<xsl:stylesheet ....>
<xsl:template match="Engineers/Engineer">
<xsl:result-document href="{position()}.xml">
<xsl:copy-of select="."/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
If you want the result as a list of JDOM documents then you can supply Saxon with an OutputURIResolver:
Controller controller = transformer.getUnderlyingController();
final Configuration config = controller.getConfiguration();
final List<Document> jdomDocuments = new ArrayList<Document>();
controller.setOutputURIResolver(new OutputURIResolver() {
// Called for each xsl:result-document; the output is built as a JDOM tree
public Result resolve(String href, String base) {
return new JDOM2Writer(config.makePipelineConfiguration());
}
// Called when a result document is complete; collect the finished JDOM document
public void close(Result result) {
jdomDocuments.add(((JDOM2Writer)result).getDocument());
}
});
and on completion the results will be in jdomDocuments.

Related

Java - Split XML using XPath but with its Parent Tag

I have following XML String :
<Aaaa>
<Bbbb>
<GroupC>
<KeyId>10001</KeyId>
</GroupC>
<DetailC>
<Dddd>
<Eeee>Eeee 001</Eeee>
<Ffff>Ffff 001</Ffff>
</Dddd>
</DetailC>
<DetailC>
<Dddd>
<Eeee>Eeee 002</Eeee>
<Ffff>Ffff 002</Ffff>
</Dddd>
</DetailC>
</Bbbb>
</Aaaa>
I would like to split it on "DetailC" into smaller XML documents:
XML 01:
<Aaaa>
<Bbbb>
<GroupC>
<KeyId>10001</KeyId>
</GroupC>
<DetailC>
<Dddd>
<Eeee>Eeee 001</Eeee>
<Ffff>Ffff 001</Ffff>
</Dddd>
</DetailC>
</Bbbb>
</Aaaa>
XML 02:
<Aaaa>
<Bbbb>
<GroupC>
<KeyId>10001</KeyId>
</GroupC>
<DetailC>
<Dddd>
<Eeee>Eeee 002</Eeee>
<Ffff>Ffff 002</Ffff>
</Dddd>
</DetailC>
</Bbbb>
</Aaaa>
How can I do this in Java?
Currently I am only able to split it into separate XML fragments,
but they come without <Aaaa>, <Bbbb>, <GroupC>.
Java code:
package message;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.xpath.CachedXPathAPI;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;
public class mainClass {
public static void main(String[] args) throws Exception{
// TODO Auto-generated method stub
String path = "D:\\abc.xml";
String xml = readFile(path);
List<String> xmlList2 = splitXML(xml, "/Aaaa/Bbbb/DetailC");
for (String xmlC : xmlList2) {
System.out.println("xmlC: " + xmlC);
}
}
private static List<String> splitXML(String xmlMessage, String xPath) throws Exception {
List<String> xmlList = new ArrayList<>();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
InputSource parameterSource = new InputSource(new StringReader(xmlMessage));
Document doc = dBuilder.parse(parameterSource);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
CachedXPathAPI cachedXPathAPI = new CachedXPathAPI();
NodeIterator nl = cachedXPathAPI.selectNodeIterator(doc, xPath);
Node node;
while ((node = nl.nextNode()) != null) {
StringWriter buf = new StringWriter();
DOMSource dom = new DOMSource(node);
xform.transform(dom, new StreamResult(buf));
xmlList.add(buf.toString());
}
return xmlList;
}
private static String readFile(String path) {
String content = "";
try (Stream<String> lines = Files.lines(Paths.get(path))) {
content = lines.collect(Collectors.joining(System.lineSeparator()));
} catch (IOException e) {
e.printStackTrace();
}
return content;
}
}
If you use Saxon 9 HE (available on Sourceforge and Maven for Java) you can solve that with XSLT 3, see the approach from Split XML file into multiple files using XSLT where you can change the code to
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" version="3.0"
exclude-result-prefixes="xs">
<xsl:template match="DetailC">
<xsl:variable name="pos" as="xs:integer">
<xsl:number/>
</xsl:variable>
<xsl:result-document href="XML{format-number($pos, '000')}.xml">
<xsl:apply-templates select="/" mode="split">
<xsl:with-param name="this-detail" select="." tunnel="yes"/>
</xsl:apply-templates>
</xsl:result-document>
</xsl:template>
<xsl:template match="@* | node()" mode="split">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="DetailC" mode="split">
<xsl:param name="this-detail" tunnel="yes"/>
<xsl:if test=". is $this-detail">
<xsl:next-match/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
To run Saxon 9 in Java you can use either the JAXP transformation API http://saxonica.com/html/documentation/using-xsl/embedding/jaxp-transformation.html or the Saxon 9 specific s9api http://saxonica.com/html/documentation/using-xsl/embedding/s9api-transformation.html.
Keep in mind that Transformer can directly transform a file with StreamSource (e.g. https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/stream/StreamSource.html#StreamSource-java.lang.String- or https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/stream/StreamSource.html#StreamSource-java.io.File-) so there is no need to read in the file contents in a string or to build a DOM by hand, you can load any XML file directly as the input to XSLT.
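If adding Saxon is not an option, a plain-DOM alternative is possible. The following is a minimal sketch using only JDK classes, not the XSLT approach above; the class name AncestorPreservingSplitter is hypothetical. For each matched element it copies the whole document and removes every other match, so the ancestors and siblings such as GroupC are kept.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: one output document per matched element,
// each keeping the surrounding structure of the input.
public class AncestorPreservingSplitter {

    public static List<String> split(String xml, String xpath) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document original = builder.parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        int count = ((NodeList) xp.evaluate(xpath, original, XPathConstants.NODESET)).getLength();

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> parts = new ArrayList<>();
        for (int keep = 0; keep < count; keep++) {
            // Fresh deep copy per round, so removals never affect the original.
            Document copy = builder.newDocument();
            copy.appendChild(copy.importNode(original.getDocumentElement(), true));

            // Remove every match except the one at index 'keep'.
            NodeList matches = (NodeList) xp.evaluate(xpath, copy, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                Node match = matches.item(i);
                if (i != keep) {
                    match.getParentNode().removeChild(match);
                }
            }

            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(copy), new StreamResult(out));
            parts.add(out.toString());
        }
        return parts;
    }
}
```

With the XPath /Aaaa/Bbbb/DetailC this should produce two documents for the sample input. Whitespace text nodes around the removed elements are left in place, so the output may contain blank lines; copying the document once per match is also quadratic, which is fine for small inputs but is a reason to prefer the XSLT approach for large ones.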

XML file containing multiple root elements

I have a file which contains multiple sets of root elements. How can I extract the root element one by one?
This is my XML
<Person>
<empid></empid>
<name></name>
</Person>
<Person>
<empid></empid>
<name></name>
</Person>
<Person>
<empid></empid>
<name></name>
</Person>
How can I extract one set of Person at a time?
Use java.io.SequenceInputStream to trick the XML parser by wrapping the file content in an artificial root element:
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public class MultiRootXML{
public static void main(String[] args) throws Exception{
List<InputStream> streams = Arrays.asList(
new ByteArrayInputStream("<root>".getBytes()),
new FileInputStream("persons.xml"),
new ByteArrayInputStream("</root>".getBytes())
);
InputStream is = new SequenceInputStream(Collections.enumeration(streams));
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
NodeList children = doc.getDocumentElement().getChildNodes();
for(int i=0; i<children.getLength(); i++){
Node child = children.item(i);
if(child.getNodeType()==Node.ELEMENT_NODE){
System.out.println("person: "+child);
}
}
}
}
You cannot parse your file using an XML parser because your file is not XML. XML cannot have more than one root element.
You have to treat it as text, repair it to be well-formed, and then you can parse it with an XML parser.
If your XML is valid, use a SAX or DOM parser. Please consult the XML Developer's Kit Programmer's Guide for more details.

Why XPath.evaluate is returning NULL?

When I use below code to modify an xml file I receive this error :
Exception in thread "main" java.lang.NullPointerException
at ModifyXMLFile.updateFile(ModifyXMLFile.java:44)
at ModifyXMLFile.main(ModifyXMLFile.java:56)
The error occurs at line : node.setTextContent(newValue);
Am I not using xpath correctly ?
Here is the code and the xml file I'm attempting to update
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.apache.commons.io.FileUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class ModifyXMLFile {
public void updateFile(String newValue){
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
try {
File f = new File("C:\\pom.xml");
InputStream in = new FileInputStream(f);
InputSource source = new InputSource(in);
Node node = (Node)xpath.evaluate("/project/parent/version/text()", source, XPathConstants.NODE);
node.setTextContent(newValue);
} catch (XPathExpressionException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String argv[]) {
new ModifyXMLFile().updateFile("TEST");
}
}
xml file :
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>testgroup</groupId>
<artifactId>testartifact</artifactId>
<version>update</version>
</parent>
</project>
xmlns="http://maven.apache.org/POM/4.0.0"
means that the un-prefixed element names in the XML file are in this namespace. In XPath 1.0 unprefixed node names always refer to nodes in no namespace, so /project/parent/version correctly matches nothing.
To match namespaced nodes in XPath you need to bind a prefix to the namespace URI and then use that prefix in the expression. For javax.xml.xpath this means creating a NamespaceContext. Unfortunately there are no default implementations of this interface in the standard Java library, but the Spring framework provides a SimpleNamespaceContext that you can use:
XPath xpath = xPathfactory.newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
xpath.setNamespaceContext(nsCtx);
nsCtx.bindNamespaceUri("pom", "http://maven.apache.org/POM/4.0.0");
// ...
Node node = (Node)xpath.evaluate("/pom:project/pom:parent/pom:version/text()", source, XPathConstants.NODE);
That said, you'll still need to do a bit more work to actually modify the file, as you're currently loading it and modifying the DOM but then not saving the modified DOM anywhere.
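As a rough illustration of that extra work, here is a minimal sketch that uses only JDK classes: it parses the POM into a namespace-aware DOM, updates the version through a prefixed XPath, and serializes the modified document. The class name PomVersionUpdater and the method setParentVersion are hypothetical, and the inline NamespaceContext stands in for Spring's SimpleNamespaceContext.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class: load, modify, and serialize a POM.
public class PomVersionUpdater {

    private static final String POM_NS = "http://maven.apache.org/POM/4.0.0";

    public static String setParentVersion(String pomXml, String newVersion) throws Exception {
        // The DOM must be namespace-aware, otherwise prefixed XPath steps match nothing.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(pomXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "pom".equals(prefix) ? POM_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) {
                return POM_NS.equals(namespaceURI) ? "pom" : null;
            }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });

        // Evaluate against the parsed Document so the matched node belongs to a DOM we keep.
        Node version = (Node) xpath.evaluate(
                "/pom:project/pom:parent/pom:version", doc, XPathConstants.NODE);
        if (version == null) {
            throw new IllegalStateException("No /project/parent/version element found");
        }
        version.setTextContent(newVersion);

        // Serialize the modified DOM; a StreamResult on a File would write it to disk instead.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The key difference from the code in the question is that the XPath is evaluated against a Document held in a variable rather than against an InputSource, so the change is made to a tree that can be written back out.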
An alternative approach might be to use XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:pom="http://maven.apache.org/POM/4.0.0">
<xsl:param name="newVersion" />
<!-- identity template - copy input to output unchanged except when
overridden -->
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
<!-- override for the version value -->
<xsl:template match="pom:version/text()">
<xsl:value-of select="$newVersion" />
</xsl:template>
</xsl:stylesheet>
Then you can use the Transformer API to call this stylesheet with an appropriate parameter
StreamSource xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
transformer.setParameter("newVersion", newValue);
StreamSource input = new StreamSource(new File("C:\\pom.xml"));
StreamResult output = new StreamResult(new File("C:\\updated-pom.xml"));
transformer.transform(input, output);
Since your XML document is namespace qualified (see xmlns attribute in the project element):
<project xmlns="http://maven.apache.org/POM/4.0.0" ...
You will need to set an implementation of javax.xml.namespace.NamespaceContext on your XPath object. This is used to return the namespace information for an individual step in the XPath.
// SET A NAMESPACECONTEXT
xpath.setNamespaceContext(new NamespaceContext() {
@Override
public Iterator getPrefixes(String namespaceURI) {
return null;
}
@Override
public String getPrefix(String namespaceURI) {
return null;
}
@Override
public String getNamespaceURI(String prefix) {
if("pom".equals(prefix)) {
return "http://maven.apache.org/POM/4.0.0";
}
return null;
}
});
You need to change your XPath to include the prefixes used in the NamespaceContext. Now you are just not looking for an element called project, you are looking for a namespace qualified element with local name project, the NamespaceContext will resolve the prefix to match the actual URI you are looking for.
Node node = (Node)xpath.evaluate("/pom:project/pom:parent/pom:version/text()", source, XPathConstants.NODE);
@Adrain Please use the following XPath, and you should be able to fetch or change the value:
/parent/version/text()

how to extract entire xml element from xml document

All the examples I've found about parsing XML elements/nodes show how to extract node attributes, values, etc. from an XML document.
I'm interested in how to extract an entire element, from opening tag to closing tag.
Example:
from xml document
<?xml version="1.0"?>
<Employees>
<Employee emplid="1111" type="admin"/>
</Employees>
I would like to get the complete element
<Employee emplid="1111" type="admin"/>
and save it in a String variable.
Thanks in advance
You can either do the parsing yourself, or use Android's XML parser. This shows how to use the latter.
If you use Android's parser you probably have to parse it completely, and then construct the string <Employee emplid="1111" type="admin"/> by hand.
For now this is the best solution.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
...
//xml document
String xml = "<?xml version=\"1.0\"?><Employees><Employee emplid=\"1111\" type=\"admin\"/></Employees>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document document = parser.parse(new ByteArrayInputStream(xml.getBytes()));
//getting the target element
NodeList list=document.getElementsByTagName("Employee");
Node node=list.item(0);
//writing the element in the string variable
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(node), new StreamResult(buffer));
String str = buffer.toString();
System.out.println(str);
...
Inspired by this thread

XML parsing using SAX by xpath

I am trying to parse XML using SAX with XPath, but when I try to get the data for a node set with multiple nodes, it does not return anything.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.apache.xpath.NodeSet;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class XPathEvaluator{
public void evaluateDocument(File xmlDocument){
try{
XPathFactory factory=XPathFactory.newInstance();
XPath xPath=factory.newXPath();
InputSource inputSource=new InputSource(new FileInputStream(xmlDocument));
XPathExpression
xPathExpression=xPath.compile("/catalog/journal/article[@date='January-2004']/title");
String title=xPathExpression.evaluate(inputSource);
System.out.println("Title: "+ title);
inputSource=new InputSource(new FileInputStream(xmlDocument));
String publisher=xPath.evaluate("/catalog/journal/@publisher", inputSource);
System.out.println("Publisher:"+ publisher);
String expression="/catalog/journal[@title='Java Technology']/article";
NodeSet nodes = (NodeSet) xPath.evaluate(expression, inputSource,XPathConstants.NODESET);
NodeList nodeList=(NodeList)nodes;
System.out.println("node List"+nodeList);
}
catch(IOException e){}
catch(XPathExpressionException e){}
}
public static void main(String[] argv){
XPathEvaluator evaluator=new XPathEvaluator();
File xmlDocument=new File("e://catalog-modified.xml");
evaluator.evaluateDocument(xmlDocument);
}
}
my catalog-modified.xml is as follows
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:journal="http://www.w3.org/2001/XMLSchema-Instance">
<journal:journal title="XML" publisher="IBM developerWorks">
<article journal:level="Advanced" date="February-2003">
<title>Design XML Schemas Using UML</title>
<author>Ayesha Malik</author>
</article>
</journal:journal>
<journal title="Java Technology" publisher="IBM
developerWorks">
<article level="Intermediate" date="January-2004"
section="Java Technology">
<title>Service Oriented Architecture Frameworks
</title>
<author>Naveen Balani
</author>
</article>
<article level="Advanced" date="October-2003" section="Java Technology">
<title>Advance DAO Programming</title>
<author>Sean Sullivan</author>
</article>
<article level="Advanced" date="May-2002" section="Java Technology">
<title>Best Practices in EJB Exception Handling </title>
<author>Srikanth Shenoy
</author>
</article>
</journal>
When I run this, it does not show any node set for the last expression.
The journal element is in a namespace. You can't ignore the namespace. Please read about XPath and namespaces - there are thousands of posts on the subject in this forum.
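Two separate points may be worth checking here. The prefixed journal:journal element is in a namespace, so it needs either a NamespaceContext or a local-name() test. In addition, the code in the question passes the same stream-backed InputSource to a second evaluate call, and such a source can be read only once. The sketch below sidesteps both by parsing into a DOM once and evaluating every expression against that Document. It assumes the file is well-formed (the listing above has no closing </catalog> tag), and the class name CatalogQueries is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: parse once, then run any number of XPath queries.
public class CatalogQueries {

    /** Returns the trimmed text content of every node matched by the expression. */
    public static List<String> texts(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for correct handling of prefixed elements
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The standard API returns org.w3c.dom.NodeList for NODESET results.
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }
}
```

An expression such as /catalog/journal[@title='Java Technology']/article/title selects only the unprefixed journal, while /catalog/*[local-name()='journal']/article/title also reaches the prefixed one without declaring a prefix.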
