I have a large XML file and I want to update particular nodes of the XML (for example, removing duplicate nodes).
As the XML is huge, I considered using the StAX API class XMLStreamReader. I first read the XML using XMLStreamReader, stored the read data in user objects, and manipulated these user objects to remove duplicates.
Now I want to put these updated user objects back into my original XML. My idea was to marshal the user objects to a string and place the string at the right position in my input XML, but I am not able to achieve this using the StAX class XMLStreamWriter.
Can this be achieved using XMLStreamWriter? Please suggest.
If not, then please suggest an alternative approach to my problem.
My main concern is memory, as I cannot load such huge XMLs into our project server's memory, which is shared across multiple processes. Hence I do not want to use DOM, because it would use a lot of memory to load these huge XMLs.
If you need to alter a particular value like text content or a tag name, StAX might help. It can also help in removing a few elements using createFilteredReader.
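For the removal case the filter has to be stateful; below is a minimal sketch (not part of the original answer) of createFilteredReader that drops every <ISBN> element while streaming, where the element name is only an example:
import javax.xml.stream.EventFilter;
import javax.xml.stream.events.XMLEvent;

class SkipIsbnFilter implements EventFilter {
    private int depth = 0; // > 0 while we are inside an <ISBN> element

    @Override
    public boolean accept(XMLEvent event) {
        if (event.isStartElement()
                && event.asStartElement().getName().getLocalPart().equals("ISBN")) {
            depth++;
        }
        boolean keep = (depth == 0);
        if (event.isEndElement()
                && event.asEndElement().getName().getLocalPart().equals("ISBN")) {
            depth--;
        }
        return keep;
    }
}

// usage:
// XMLInputFactory f = XMLInputFactory.newInstance();
// XMLEventReader filtered = f.createFilteredReader(
//         f.createXMLEventReader(new FileInputStream("HelloWorld.xml")),
//         new SkipIsbnFilter());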
The code below renames Name to AuthorName and adds a comment.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StAx {
    public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
        String filename = "HelloWorld.xml";
        try (InputStream in = new FileInputStream(filename);
             OutputStream out = System.out) {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            XMLEventFactory ef = XMLEventFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(filename, in);
            XMLEventWriter writer = xof.createXMLEventWriter(out);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    String data = event.asCharacters().getData();
                    if (data.contains("Hello")) {
                        // replace "Hello" in the text content
                        event = ef.createCharacters(data.replace("Hello", "Oh"));
                    }
                    writer.add(event);
                } else if (event.isStartElement()) {
                    StartElement s = event.asStartElement();
                    String tagName = s.getName().getLocalPart();
                    if (tagName.equals("Name")) {
                        // rename the start tag and insert a comment right after it
                        String newName = "Author" + tagName;
                        writer.add(ef.createStartElement(new QName(newName), null, null));
                        writer.add(ef.createCharacters("\n        "));
                        writer.add(ef.createComment("auto generated comment"));
                    } else {
                        writer.add(event);
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("Name")) {
                    // rename the matching end tag as well so the output stays well-formed
                    writer.add(ef.createEndElement(new QName("AuthorName"), null));
                } else {
                    writer.add(event);
                }
            }
            writer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Input
<?xml version="1.0"?>
<BookCatalogue>
<Book>
<Title>HelloLord</Title>
<Name>
<first>New</first>
<last>Earth</last>
</Name>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>HelloWord</Title>
<Name>
<first>New</first>
<last>Moon</last>
</Name>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
Output
<?xml version="1.0"?><BookCatalogue>
<Book>
<Title>OhLord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Earth</last>
</AuthorName>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>OhWord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Moon</last>
</AuthorName>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
As you can see, things get really complicated when the modification is more involved than this, such as swapping two nodes, or deleting one node based on the state of a few other nodes (e.g. delete all Books with a price higher than the average price).
The best solution in such cases is to produce the resulting XML using an XSLT transformation.
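For reference, such a transformation can be driven from Java through the JAXP transformer API. The sketch below is only an illustration and assumes a hypothetical stylesheet remove-duplicates.xsl that encodes the rules; note that a conventional XSLT 1.0/2.0 processor still builds the source tree in memory, so for truly huge inputs you would need a streaming XSLT 3.0 processor such as Saxon-EE.
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyStylesheet {
    public static void main(String[] args) throws Exception {
        // remove-duplicates.xsl is a placeholder name for the stylesheet
        // holding the removal/renaming rules
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("remove-duplicates.xsl"));
        t.transform(new StreamSource("HelloWorld.xml"),
                new StreamResult("HelloWorld-out.xml"));
    }
}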
Related
I'm trying to set up an Apache Camel route that takes a large XML file as input and then splits the payload into two different files based on a field condition, i.e. if an ID field starts with a 1 it goes to one output file, otherwise to the other. Using Camel is not a must, and I've looked at XSLT and plain Java options as well, but I just feel that this should work.
I've covered splitting the actual payload, but I'm having issues making sure that the parent nodes, including a header, are included in each file as well. As the file can be large, I want to make sure that streams are used for the payload. I feel like I've read hundreds of different questions, blog entries, etc. on this, and pretty much every case covers either loading the entire file into memory, splitting the file equally into parts, or just using the payload nodes individually.
My prototype XML file looks like this:
<root>
<header>
<title>Testing</title>
</header>
<orders>
<order>
<id>11</id>
<stuff>One</stuff>
</order>
<order>
<id>20</id>
<stuff>Two</stuff>
</order>
<order>
<id>12</id>
<stuff>Three</stuff>
</order>
</orders>
</root>
The result should be two files - condition true (id starts with 1):
<root>
<header>
<title>Testing</title>
</header>
<orders>
<order>
<id>11</id>
<stuff>One</stuff>
</order>
<order>
<id>12</id>
<stuff>Three</stuff>
</order>
</orders>
</root>
Condition false:
<root>
<header>
<title>Testing</title>
</header>
<orders>
<order>
<id>20</id>
<stuff>Two</stuff>
</order>
</orders>
</root>
My prototype route:
from("file:" + inputFolder)
.log("Processing file ${headers.CamelFileName}")
.split()
.tokenizeXML("order", "*") // Includes parent in every node
.streaming()
.choice()
.when(body().contains("id>1"))
.to("direct:ones")
.stop()
.otherwise()
.to("direct:others")
.stop()
.end()
.end();
from("direct:ones")
//.aggregate(header("ones"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=ones-${in.header.CamelFileName}&fileExist=Append");
from("direct:others")
//.aggregate(header("others"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=others-${in.header.CamelFileName}&fileExist=Append");
This works as intended, except that the parent tags (header and footer, if you will) are added for every node. Using just the node name in tokenizeXML returns only the node itself, but I can't figure out how to add the header and footer. Preferably I would want to stream the parent tags into a header and footer property and add them before and after the split.
How can I do this? Would I somehow need to tokenize the parent tags first and would this mean streaming the file twice?
As a final note, you might notice the aggregate at the end. I don't want to aggregate every node before writing to the file, as that defeats the purpose of streaming and keeping the entire file out of memory, but I figured I might gain some performance by aggregating a number of nodes before writing, to lessen the performance hit of writing to the drive for every node. I'm not sure if this makes sense to do.
I was unable to make it work with Camel. Or rather, once I was using plain Java to extract the header I already had everything I needed to finish the split, and swapping back to Camel seemed cumbersome. There are most likely ways to improve on this, but this was my solution for splitting the XML payload.
Switching between the two types of output streams is not that pretty, but it eases the use of everything else. Also of note is that I chose equalsIgnoreCase to check the tag names even though XML is normally case sensitive; for me, it reduces the risk of errors. Finally, make sure your regex matches the entire string using wildcards, as with normal string regexes.
/**
* Splits an XML file's payload into two new files based on a regex condition. The payload is a specific XML tag in the
* input file that is repeated a number of times. All tags before and after the payload are added to both files in order
* to keep the same structure.
*
* The content of each payload tag is compared to the regex condition and if true, it is added to the primary output file.
* Otherwise it is added to the secondary output file. The payload can be empty and an empty payload tag will be added to
* the secondary output file. Note that the output will not be an unaltered copy of the input as self-closing XML tags are
* altered to corresponding opening and closing tags.
*
* Data is streamed from the input file to the output files, keeping memory usage small even with large files.
*
* @param inputFilename Path and filename for the input XML file
* @param outputFilenamePrimary Path and filename for the primary output file
* @param outputFilenameSecondary Path and filename for the secondary output file
* @param payloadTag XML tag name of the payload
* @param payloadParentTag XML tag name of the payload's direct parent
* @param splitRegex The regex split condition used on the payload content
* @throws Exception On invalid filenames, missing input, incorrect XML structure, etc.
*/
public static void splitXMLPayload(String inputFilename, String outputFilenamePrimary, String outputFilenameSecondary, String payloadTag, String payloadParentTag, String splitRegex) throws Exception {
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
XMLEventReader xmlEventReader = null;
FileInputStream fileInputStream = null;
FileWriter fileWriterPrimary = null;
FileWriter fileWriterSecondary = null;
XMLEventWriter xmlEventWriterSplitPrimary = null;
XMLEventWriter xmlEventWriterSplitSecondary = null;
try {
fileInputStream = new FileInputStream(inputFilename);
xmlEventReader = xmlInputFactory.createXMLEventReader(fileInputStream);
fileWriterPrimary = new FileWriter(outputFilenamePrimary);
fileWriterSecondary = new FileWriter(outputFilenameSecondary);
xmlEventWriterSplitPrimary = xmlOutputFactory.createXMLEventWriter(fileWriterPrimary);
xmlEventWriterSplitSecondary = xmlOutputFactory.createXMLEventWriter(fileWriterSecondary);
boolean isStart = true;
boolean isEnd = false;
boolean lastSplitIsPrimary = true;
while (xmlEventReader.hasNext()) {
XMLEvent xmlEvent = xmlEventReader.nextEvent();
// Check for start of payload element
if (!isEnd && xmlEvent.isStartElement()) {
StartElement startElement = xmlEvent.asStartElement();
if (startElement.getName().getLocalPart().equalsIgnoreCase(payloadTag)) {
if (isStart) {
isStart = false;
// Flush the event writers as we'll use the file writers for the payload
xmlEventWriterSplitPrimary.flush();
xmlEventWriterSplitSecondary.flush();
}
String order = getTagAsString(xmlEventReader, xmlEvent, payloadTag, xmlOutputFactory);
if (order.matches(splitRegex)) {
lastSplitIsPrimary = true;
fileWriterPrimary.write(order);
} else {
lastSplitIsPrimary = false;
fileWriterSecondary.write(order);
}
}
}
// Check for end of parent tag
else if (!isStart && !isEnd && xmlEvent.isEndElement()) {
EndElement endElement = xmlEvent.asEndElement();
if (endElement.getName().getLocalPart().equalsIgnoreCase(payloadParentTag)) {
isEnd = true;
}
}
// Is neither start or end and we're handling payload (most often white space)
else if (!isStart && !isEnd) {
// Add to last split handled
if (lastSplitIsPrimary) {
xmlEventWriterSplitPrimary.add(xmlEvent);
xmlEventWriterSplitPrimary.flush();
} else {
xmlEventWriterSplitSecondary.add(xmlEvent);
xmlEventWriterSplitSecondary.flush();
}
}
// Start and end is added to both files
if (isStart || isEnd) {
xmlEventWriterSplitPrimary.add(xmlEvent);
xmlEventWriterSplitSecondary.add(xmlEvent);
}
}
} catch (Exception e) {
logger.error("Error in XML split", e);
throw e;
} finally {
// Close the streams
try {
xmlEventReader.close();
} catch (XMLStreamException e) {
// ignore
}
try {
fileInputStream.close();
} catch (IOException e) {
// ignore
}
try {
xmlEventWriterSplitPrimary.close();
} catch (XMLStreamException e) {
// ignore
}
try {
xmlEventWriterSplitSecondary.close();
} catch (XMLStreamException e) {
// ignore
}
try {
fileWriterPrimary.close();
} catch (IOException e) {
// ignore
}
try {
fileWriterSecondary.close();
} catch (IOException e) {
// ignore
}
}
}
/**
* Loops through the events in the {@code XMLEventReader} until the specific XML end tag is found and returns everything
* contained within the XML tag as a String.
*
* Data is streamed from the {@code XMLEventReader}; however, the String can be large depending on the number of children
* in the XML tag.
*
* @param xmlEventReader The already active reader. The starting tag event is assumed to have already been read
* @param startEvent The starting XML tag event already read from the {@code XMLEventReader}
* @param tag The XML tag name used to find the starting XML tag
* @param xmlOutputFactory Convenience include to avoid creating another factory
* @return String containing everything between the starting and ending XML tag, the tags themselves included
* @throws Exception On incorrect XML structure
*/
private static String getTagAsString(XMLEventReader xmlEventReader, XMLEvent startEvent, String tag, XMLOutputFactory xmlOutputFactory) throws Exception {
StringWriter stringWriter = new StringWriter();
XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(stringWriter);
// Add the start tag
xmlEventWriter.add(startEvent);
// Add until end tag
while (xmlEventReader.hasNext()) {
XMLEvent xmlEvent = xmlEventReader.nextEvent();
// End tag found
if (xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().getLocalPart().equalsIgnoreCase(tag)) {
xmlEventWriter.add(xmlEvent);
xmlEventWriter.close();
stringWriter.close();
return stringWriter.toString();
} else {
xmlEventWriter.add(xmlEvent);
}
}
xmlEventWriter.close();
stringWriter.close();
throw new Exception("Invalid XML, no closing tag for <" + tag + "> found!");
}
In order to create big XML files, we decided to make use of the StAX API. The basic structure is built by using the low-level APIs createStartDocument() and createStartElement(). This works as expected.
However, in some cases we like to append existing XML data which resides in a String (retrieved from database). The following snippet illustrates this:
import java.lang.*;
import java.io.*;
import javax.xml.stream.*;
public class Example {
public static void main(String... args) throws XMLStreamException {
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
StringWriter writer = new StringWriter();
XMLEventWriter eventWriter = outputFactory.createXMLEventWriter(writer);
eventWriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
eventWriter.add(eventFactory.createStartElement("ns0", "http://example.org", "root"));
eventWriter.add(eventFactory.createNamespace("ns0", "http://example.org"));
//In here, we want to append a piece of XML which is stored in a string variable.
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
eventWriter.add(inputFactory.createXMLEventReader(new StringReader(xml)));
eventWriter.add(eventFactory.createEndDocument());
System.out.println(writer.toString());
}
}
With the above code, depending on the implementation, we are not getting the expected result:
Woodstox: The following exception is thrown: 'Can not output XML declaration, after output has already been done'. It seems that the XMLEventReader starts off with a StartDocument event, but since a StartDocument event was already written programmatically, it throws the error.
JDK: It appends <?xml version="1.0" ... <fragments><fragment>..., which leads to invalid XML.
I have also tried to append the XML by using:
eventFactory.createCharacters(xml);
The problem here is that even though the XML is appended, the < and > characters are escaped as &lt; and &gt;. Therefore, this results in invalid XML.
Am I missing an API that allows me to simply append a String as XML?
You can first consume any StartDocument if necessary:
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
XMLEventReader xer = inputFactory.createXMLEventReader(new StringReader(xml));
if (xer.peek().isStartDocument())
{
xer.nextEvent();
}
eventWriter.add(xer);
I need to create a web service that will translate some words between two languages, so I have created an interface:
@WebService
public interface Translator {
@WebMethod
String translate(String word, String originalLanguage, String targetLanguage);
}
And class that implements that interface:
@WebService(endpointInterface = "source.Translator")
public class TranslatorImpl implements Translator{
@Override
public String translate(String word, String originalLanguage, String targetLanguage) {
return word + originalLanguage +" butterfly " + targetLanguage + " baboska ";
}
}
But because I'm very new to this, I don't know how to make this web method read from an XML file that is supposed to be a database of words. The way I did it right now, when I test it, it only returns the same word whatever you write. So can anybody explain to me how to read from an XML file, so that if I write butterfly it translates that, or if I write flower it translates that? Do I do the parsing of the XML file in this web method?
I think your question "Do I do parsing of XML file in this webMethod?" has not much to do with web services in particular, but with software design and architecture. Following the single responsibility principle, you should have the XML handling in another class.
Regarding the reading of the XML file, there are a lot of questions with good answers here on SO, for example Java - read xml file.
By the way: have you thought of using a database? It is more flexible than an XML file when it comes to adding new translations, and it is regarded as best practice when handling data that is likely to change (a lot of new entries added in the future).
EDIT
A little quick-and-dirty example to give a better understanding of what I suggested. Mind that the data structure does not cover the usage of various languages! If you need that, you have to alter the example.
First of all you need something like an XmlDataSource class:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class XmlDataSource {
public String getTranslation(String original) throws Exception {
Document d = readData();
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/dictionary/entry/translation[../original/text() = '" + original + "']");
String translated = (String) expr.evaluate(d, XPathConstants.STRING);
return translated;
}
private Document readData() throws Exception {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
File datafile = new File("your/path/to/translations.xml");
return documentBuilder.parse(new FileInputStream(datafile));
}
}
The xpath in the example relies on a structure like this:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
<entry>
<original>butterfly</original>
<translation>Schmetterling</translation>
</entry>
<entry>
<original>flower</original>
<translation>Blume</translation>
</entry>
<entry>
<original>tree</original>
<translation>Baum</translation>
</entry>
</dictionary>
Then you can call the data source in your web service to translate the requested word:
@Override
public String translate(String word, String originalLanguage, String targetLanguage) {
XmlDataSource dataSource = new XmlDataSource();
try {
return dataSource.getTranslation(word);
} catch (Exception e) {
// getTranslation declares throws Exception, so wrap it here
throw new RuntimeException("Translation lookup failed", e);
}
}
I am writing a Java program that reads an XML file, makes some modifications, and writes back the XML.
Using the standard Java XML DOM API, the order of the attributes is not preserved.
That is, if I have an input file such as:
<person first_name="john" last_name="lederrey"/>
I might get an output file as:
<person last_name="lederrey" first_name="john"/>
That's correct, because the XML specification says that attribute order is not significant.
However, my program needs to preserve the order of the attributes, so that a person can easily compare the input and output document with a diff tool.
One solution for that is to process the document with SAX (instead of DOM):
Order of XML attributes after DOM processing
However, this does not work in my case, because the transformation I need to do on one node might depend on an XPath expression over the whole document.
So, the simplest thing would be to have an XML library very similar to the standard Java DOM library, with the exception that it preserves attribute order.
Is there such a library?
PS: Please avoid discussing whether I should preserve the attribute order or not. This is a very interesting discussion, but it is not the point of this question.
Saxon these days offers a serialization option[1] to control the order in which attributes are output. It doesn't retain the input order (because Saxon doesn't know the input order), but it does allow you to control, for example, that the ID attribute always appears first.
And this can be very useful if the XML is going to be hand-edited; XML in which the attributes appear in the "wrong" order can be very disorienting to a human reader or editor.
If you're using this as part of a diff process then you would want to put both files through a process that normalizes the attribute order before comparing them. However, for comparing files my preferred approach is to parse them both and use the XPath deep-equal() function; or to use a specialized tool like DeltaXML.
[1] saxon:attribute-order - see http://www.saxonica.com/documentation/index.html#!extensions/output-extras/serialization-parameters
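For the comparison route, here is a rough sketch using Saxon's s9api (assuming Saxon-HE on the classpath; the file names are placeholders):
import java.io.File;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmNode;

public class DeepEqualCheck {
    public static void main(String[] args) throws Exception {
        Processor proc = new Processor(false);
        DocumentBuilder builder = proc.newDocumentBuilder();
        XdmNode left = builder.build(new File("input.xml"));    // placeholder file names
        XdmNode right = builder.build(new File("output.xml"));

        XPathCompiler compiler = proc.newXPathCompiler();
        compiler.declareVariable(new QName("other"));
        XPathSelector selector = compiler.compile("deep-equal(., $other)").load();
        selector.setContextItem(left);
        selector.setVariable(new QName("other"), right);

        // deep-equal() ignores attribute order, so reordered attributes still compare as equal
        boolean same = ((XdmAtomicValue) selector.evaluateSingle()).getBooleanValue();
        System.out.println("Documents deep-equal: " + same);
    }
}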
You might also want to try DecentXML, as it can preserve the attribute order, comments and even indentation.
It is very nice if you need to programmatically update an XML file that's also supposed to be human-editable. We use it for one of our configuration tools.
-- edit --
It seems it is no longer available on its original location; try these ones:
https://github.com/cartermckinnon/decentxml
https://github.com/haroldo-ok/decentxml (unofficial and unmaintained fork; kept here just in case the other forks disappear, too)
https://directory.fsf.org/wiki/DecentXML
Do it twice:
Read the document in using a DOM parser so you have a reference copy, a repository, if you will.
Then read it again using SAX. At the point where you need to make the transformation, reference the DOM version to determine what you need, then output what you need in the middle of the SAX stream.
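A rough sketch of that two-pass shape (standard JAXP only; the file name and XPath are placeholders, and the actual output writing is left out):
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TwoPassRewrite {
    public static void main(String[] args) throws Exception {
        File input = new File("people.xml");   // placeholder

        // Pass 1: DOM "repository" for whole-document XPath lookups.
        Document repo = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(input);
        String lookedUp = XPathFactory.newInstance().newXPath()
                .evaluate("/person/@last_name", repo);

        // Pass 2: stream the same file with SAX and emit the output yourself,
        // consulting the DOM result (lookedUp) at the point of transformation.
        SAXParserFactory.newInstance().newSAXParser().parse(input, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // write qName and its attributes in their original order here,
                // applying any change that depends on lookedUp
            }
        });
    }
}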
Your best bet would be to use StAX instead of DOM for generating the original document. StAX gives you a lot of fine control over these things and lets you stream output progressively to an output stream instead of holding it all in memory.
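As a minimal illustration (not taken from the answer), XMLStreamWriter serializes attributes in exactly the order the code writes them:
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class AttributeOrderDemo {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeEmptyElement("person");
        w.writeAttribute("first_name", "john");    // written first, serialized first
        w.writeAttribute("last_name", "lederrey");
        w.writeEndDocument();
        w.close();
        System.out.println(out);   // the attributes keep the order used above
    }
}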
We had similar requirements, as per Dave's description. A solution that worked was based on Java reflection.
The idea is to set the propOrder for the attributes at runtime. In our case there's an APP_DATA element containing three attributes: App, Name, and Value. The generated AppData class includes only "content" in propOrder and none of the attributes:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "AppData", propOrder = {
"content"
})
public class AppData {
@XmlValue
protected String content;
@XmlAttribute(name = "Value", required = true)
protected String value;
@XmlAttribute(name = "Name", required = true)
protected String name;
@XmlAttribute(name = "App", required = true)
protected String app;
...
}
So Java reflection was used as follows to set the order at runtime:
final String[] propOrder = { "app", "name", "value" };
ReflectionUtil.changeAnnotationValue(
AppData.class.getAnnotation(XmlType.class),
"propOrder", propOrder);
final JAXBContext jaxbContext = JAXBContext
.newInstance(ADI.class);
final Marshaller adimarshaller = jaxbContext.createMarshaller();
adimarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT,
true);
adimarshaller.marshal(new JAXBElement<ADI>(new QName("ADI"),
ADI.class, adi),
new StreamResult(fileOutputStream));
The changeAnnotationValue() was borrowed from this post:
Modify a class definition's annotation string parameter at runtime
Here's the method for your convenience (credit goes to @assylias and @Balder):
/**
* Changes the annotation value for the given key of the given annotation to newValue and returns
* the previous value.
*/
@SuppressWarnings("unchecked")
public static Object changeAnnotationValue(Annotation annotation, String key, Object newValue) {
Object handler = Proxy.getInvocationHandler(annotation);
Field f;
try {
f = handler.getClass().getDeclaredField("memberValues");
} catch (NoSuchFieldException | SecurityException e) {
throw new IllegalStateException(e);
}
f.setAccessible(true);
Map<String, Object> memberValues;
try {
memberValues = (Map<String, Object>) f.get(handler);
} catch (IllegalArgumentException | IllegalAccessException e) {
throw new IllegalStateException(e);
}
Object oldValue = memberValues.get(key);
if (oldValue == null || oldValue.getClass() != newValue.getClass()) {
throw new IllegalArgumentException();
}
memberValues.put(key, newValue);
return oldValue;
}
You can extend Xerces' AttributeMap with your own AttributeSortedMap and sort the attributes as you need...
The main idea: load the document, recursively copy it into elements whose attribute map is sorted, and serialize using the existing XMLSerializer.
File test.xml
<root>
<person first_name="john1" last_name="lederrey1"/>
<person first_name="john2" last_name="lederrey2"/>
<person first_name="john3" last_name="lederrey3"/>
<person first_name="john4" last_name="lederrey4"/>
</root>
File AttOrderSorter.java
import com.sun.org.apache.xerces.internal.dom.AttrImpl;
import com.sun.org.apache.xerces.internal.dom.AttributeMap;
import com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl;
import com.sun.org.apache.xerces.internal.dom.ElementImpl;
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.Writer;
import java.util.List;
import static java.util.Arrays.asList;
public class AttOrderSorter {
private List<String> sortAtts = asList("last_name", "first_name");
public void format(String inFile, String outFile) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbFactory.newDocumentBuilder();
Document outDocument = builder.newDocument();
try (FileInputStream inputStream = new FileInputStream(inFile)) {
Document document = dbFactory.newDocumentBuilder().parse(inputStream);
Element sourceRoot = document.getDocumentElement();
Element outRoot = outDocument.createElementNS(sourceRoot.getNamespaceURI(), sourceRoot.getTagName());
outDocument.appendChild(outRoot);
copyAtts(sourceRoot.getAttributes(), outRoot);
copyElement(sourceRoot.getChildNodes(), outRoot, outDocument);
}
try (Writer outxml = new FileWriter(new File(outFile))) {
OutputFormat format = new OutputFormat();
format.setLineWidth(0);
format.setIndenting(false);
format.setIndent(2);
XMLSerializer serializer = new XMLSerializer(outxml, format);
serializer.serialize(outDocument);
}
}
private void copyElement(NodeList nodes, Element parent, Document document) {
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = new ElementImpl((CoreDocumentImpl) document, node.getNodeName()) {
@Override
public NamedNodeMap getAttributes() {
return new AttributeSortedMap(this, (AttributeMap) super.getAttributes());
}
};
copyAtts(node.getAttributes(), element);
copyElement(node.getChildNodes(), element, document);
parent.appendChild(element);
}
}
}
private void copyAtts(NamedNodeMap attributes, Element target) {
for (int i = 0; i < attributes.getLength(); i++) {
Node att = attributes.item(i);
target.setAttribute(att.getNodeName(), att.getNodeValue());
}
}
public class AttributeSortedMap extends AttributeMap {
AttributeSortedMap(ElementImpl element, AttributeMap attributes) {
super(element, attributes);
nodes.sort((o1, o2) -> {
AttrImpl att1 = (AttrImpl) o1;
AttrImpl att2 = (AttrImpl) o2;
Integer pos1 = sortAtts.indexOf(att1.getNodeName());
Integer pos2 = sortAtts.indexOf(att2.getNodeName());
if (pos1 > -1 && pos2 > -1) {
return pos1.compareTo(pos2);
} else if (pos1 > -1 || pos2 > -1) {
return pos1 == -1 ? 1 : -1;
}
return att1.getNodeName().compareTo(att2.getNodeName());
});
}
}
public static void main(String[] args) throws Exception {
new AttOrderSorter().format("src/main/resources/test.xml", "src/main/resources/output.xml");
}
}
Result - file output.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<person last_name="lederrey1" first_name="john1"/>
<person last_name="lederrey2" first_name="john2"/>
<person last_name="lederrey3" first_name="john3"/>
<person last_name="lederrey4" first_name="john4"/>
</root>
You can't use the DOM, but you can use SAX, or query children using XPath.
Visit the answer Order of XML attributes after DOM processing.
I have a 200 MB XML file of the following form:
<school name = "some school">
<class standard = "2A">
<student>
.....
</student>
<student>
.....
</student>
<student>
.....
</student>
</class>
</school>
I need to split this XML into several files using StAX such that n students go into each XML file and the structure is preserved as <school>, then <class>, and <student> under them. The attributes of school and class must also be preserved in the resulting XML files.
Here is the code I am using:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
String xmlFile = "input.XML";
XMLEventReader reader = inputFactory.createXMLEventReader(new FileReader(xmlFile));
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
outputFactory.setProperty("javax.xml.stream.isRepairingNamespaces", Boolean.TRUE);
XMLEventWriter writer = null;
int count = 0;
QName name = new QName(null, "student");
try {
while (true) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
StartElement element = event.asStartElement();
if (element.getName().equals(name)) {
String filename = "input"+ count + ".xml";
writer = outputFactory.createXMLEventWriter(new FileWriter(filename));
writeToFile(reader, event, writer);
writer.close();
count++;
}
}
if (event.isEndDocument())
break;
}
} catch (XMLStreamException e) {
throw e;
} catch (IOException e) {
e.printStackTrace();
} finally {
reader.close();
}
private static void writeToFile(XMLEventReader reader, XMLEvent startEvent, XMLEventWriter writer) throws XMLStreamException, IOException {
StartElement element = startEvent.asStartElement();
QName name = element.getName();
int stack = 1;
writer.add(element);
while (true) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement() && event.asStartElement().getName().equals(name))
stack++;
if (event.isEndElement()) {
EndElement end = event.asEndElement();
if (end.getName().equals(name)) {
stack--;
if (stack == 0) {
writer.add(event);
break;
}
}
}
writer.add(event);
}
}
Please check the function call writeToFile(reader, event, writer) in the try block. Here the reader object has only the student tag. I need the reader to have the school, class, and then n students in it, so that the generated file has a similar structure to the original, only with fewer children per file.
Thanks in advance.
I think you can keep track of the list of parent events seen prior to the "student" start element event and pass it to the writeToFile() method. Then in writeToFile() you can use that list to recreate the "school" and "class" events.
You have code for determining when to start a new file which I haven't examined closely, but the process of finishing one file and starting the next is definitely incomplete.
On reaching a point where you want to end a file, you have to generate end events for the enclosing <class> and <school> tags and for the document before closing it. When you start your new file, you need to generate start events for the same after opening it and before starting again to copy student events.
In order to generate the start events properly, you will have to retain the corresponding events from the input.
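A minimal sketch (not the poster's code) of retaining and replaying those events; the helper and its names are hypothetical:
import java.util.Deque;
import java.util.Iterator;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;

class SplitHelper {
    private static final XMLEventFactory EF = XMLEventFactory.newInstance();

    // openAncestors holds the StartElement events of the currently open ancestors
    // (<school>, <class>), pushed onto a Deque as they are read, innermost on top.
    static void writeHeader(XMLEventWriter writer, Deque<StartElement> openAncestors)
            throws XMLStreamException {
        writer.add(EF.createStartDocument());
        Iterator<StartElement> outermostFirst = openAncestors.descendingIterator();
        while (outermostFirst.hasNext()) {
            writer.add(outermostFirst.next());    // replay <school ...>, then <class ...>
        }
    }

    static void writeFooter(XMLEventWriter writer, Deque<StartElement> openAncestors)
            throws XMLStreamException {
        for (StartElement open : openAncestors) { // close innermost first
            writer.add(EF.createEndElement(open.getName(), null));
        }
        writer.add(EF.createEndDocument());
    }
}
Call writeHeader right after creating each new XMLEventWriter and writeFooter just before closing it; the existing student-copying loop stays unchanged.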
Save yourself trouble and time: keep the flat XML file structure you currently have, and create POJOs that represent each object as you've stated: Student, School, and Class. Then, using JAXB, bind the objects to the different parts of the structure. You can then effectively unmarshal the XML and access the various elements as if you were dealing with SQL objects.
Use this link as a starting point: XML parsing with JAXB.
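A rough sketch of such a binding (the element and field names are guesses based on the sample, not a definitive mapping):
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "school")
@XmlAccessorType(XmlAccessType.FIELD)
class School {
    @XmlAttribute
    String name;

    @XmlElement(name = "class")
    SchoolClass schoolClass;
}

@XmlAccessorType(XmlAccessType.FIELD)
class SchoolClass {
    @XmlAttribute
    String standard;

    @XmlElement(name = "student")
    List<Student> students;
}

@XmlAccessorType(XmlAccessType.FIELD)
class Student {
    // the student fields are elided in the question, so they are left out here
}

// usage:
// School school = (School) JAXBContext.newInstance(School.class)
//         .createUnmarshaller().unmarshal(new File("input.XML"));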
One issue with doing it this way is memory consumption. For design flexibility and memory management, I would suggest using SQL to handle this.