Code for using StAX in Java

I have a 200 MB XML file of the following form:
<school name = "some school">
  <class standard = "2A">
    <student>
    .....
    </student>
    <student>
    .....
    </student>
    <student>
    .....
    </student>
  </class>
</school>
I need to split this XML into several files using StAX, such that n students go into each file and the structure is preserved: <school>, then <class>, with the <student> elements under them. The attributes of school and class must also be preserved in the resulting XMLs.
Here is the code I am using:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
String xmlFile = "input.XML";
XMLEventReader reader = inputFactory.createXMLEventReader(new FileReader(xmlFile));
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
outputFactory.setProperty("javax.xml.stream.isRepairingNamespaces", Boolean.TRUE);
XMLEventWriter writer = null;
int count = 0;
QName name = new QName(null, "student");
try {
    while (true) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()) {
            StartElement element = event.asStartElement();
            if (element.getName().equals(name)) {
                String filename = "input" + count + ".xml";
                writer = outputFactory.createXMLEventWriter(new FileWriter(filename));
                writeToFile(reader, event, writer);
                writer.close();
                count++;
            }
        }
        if (event.isEndDocument())
            break;
    }
} catch (XMLStreamException e) {
    throw e;
} catch (IOException e) {
    e.printStackTrace();
} finally {
    reader.close();
}
private static void writeToFile(XMLEventReader reader, XMLEvent startEvent, XMLEventWriter writer) throws XMLStreamException, IOException {
    StartElement element = startEvent.asStartElement();
    QName name = element.getName();
    int stack = 1;
    writer.add(element);
    while (true) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement() && event.asStartElement().getName().equals(name))
            stack++;
        if (event.isEndElement()) {
            EndElement end = event.asEndElement();
            if (end.getName().equals(name)) {
                stack--;
                if (stack == 0) {
                    writer.add(event);
                    break;
                }
            }
        }
        writer.add(event);
    }
}
Please check the function call writeToFile(reader, event, writer) in the try block. At that point the reader is positioned at the student tag only. I need the output to include the school and class elements and then n students, so that each generated file has the same structure as the original, only with fewer children per file.
Thanks in advance.

I think you can keep track of the list of parent events seen before the "student" start element and pass it to the writeToFile() method. Then, in writeToFile(), you can use that list to recreate the "school" and "class" events.
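One way to retain those parent events is a stack maintained while streaming. A self-contained sketch (class and method names here are illustrative, not from the question's code):

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ParentTracker {

    // Walk the stream, keeping the StartElement events of all currently open
    // ancestors; when the target element is reached, those events are exactly
    // what writeToFile() would need to replay at the top of each split file.
    public static List<String> ancestorNamesAt(String xml, String target) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<StartElement> openAncestors = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement se = event.asStartElement();
                if (se.getName().getLocalPart().equals(target)) {
                    List<String> names = new ArrayList<>();
                    for (StartElement a : openAncestors) {
                        names.add(a.getName().getLocalPart());
                    }
                    return names; // document order: outermost first
                }
                openAncestors.add(se);                          // descend
            } else if (event.isEndElement()) {
                openAncestors.remove(openAncestors.size() - 1); // ascend
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<school name='some school'><class standard='2A'><student/></class></school>";
        System.out.println(ancestorNamesAt(xml, "student")); // [school, class]
    }
}
```

In the real splitter you would keep the StartElement events themselves (not just their names), so that attributes can be replayed into each output file.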

You have code for determining when to start a new file, which I haven't examined closely, but the process of finishing one file and starting the next is definitely incomplete.
On reaching a point where you want to end a file, you have to generate end events for the enclosing <class> and <school> tags, and for the document, before closing it. When you start the new file, you need to generate the corresponding start events after opening it, and before starting again to copy student events.
In order to generate the start events properly, you will have to retain the corresponding events from the input.
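A sketch of that rotation, assuming the <school> and <class> StartElement events were retained from the input (class and method names are invented for the demo):

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SplitRotation {

    private static final XMLEventFactory EF = XMLEventFactory.newInstance();

    // End a split file: close the retained ancestors innermost-first, then the document.
    static void finishFile(XMLEventWriter w, List<StartElement> ancestors) throws XMLStreamException {
        for (int i = ancestors.size() - 1; i >= 0; i--) {
            w.add(EF.createEndElement(ancestors.get(i).getName(), Collections.emptyIterator()));
        }
        w.add(EF.createEndDocument());
        w.close();
    }

    // Begin the next split file: start the document, then replay the retained
    // ancestor events (their attributes come along for free).
    static void startFile(XMLEventWriter w, List<StartElement> ancestors) throws XMLStreamException {
        w.add(EF.createStartDocument());
        for (StartElement a : ancestors) {
            w.add(a);
        }
    }

    public static String demo() throws XMLStreamException {
        // In the real splitter these events come from the input stream.
        StartElement school = EF.createStartElement(new QName("school"),
                Collections.singletonList(EF.createAttribute("name", "some school")).iterator(),
                Collections.emptyIterator());
        StartElement cls = EF.createStartElement(new QName("class"),
                Collections.singletonList(EF.createAttribute("standard", "2A")).iterator(),
                Collections.emptyIterator());
        List<StartElement> ancestors = Arrays.asList(school, cls);
        StringWriter out = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        startFile(w, ancestors);
        // ... copy n <student> events here ...
        finishFile(w, ancestors);
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(demo());
    }
}
```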

Save yourself trouble and time: keep the flat XML file structure you currently have, create POJOs representing each element you've stated (Student, School, and Class), and use JAXB to bind the objects to the different parts of the structure. You can then unmarshal the XML and access the various elements much as you would rows in SQL.
Use this link as a starting point: XML parsing with JAXB
One issue with this approach is memory consumption. For design flexibility and memory management, I would suggest using SQL to handle this.
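A minimal sketch of such JAXB bindings; the class and field names are assumptions based on the XML above, and note that javax.xml.bind ships with Java 8 but needs a separate dependency from Java 11 on:

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.*;
import java.io.StringReader;
import java.util.List;

// Hypothetical bindings for <school><class><student> with their attributes.
@XmlRootElement(name = "school")
@XmlAccessorType(XmlAccessType.FIELD)
public class School {

    @XmlAttribute
    public String name;

    @XmlElement(name = "class")
    public List<SchoolClass> classes;

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class SchoolClass {
        @XmlAttribute
        public String standard;
        @XmlElement(name = "student")
        public List<Student> students;
    }

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Student {
        @XmlElement
        public String name;
    }

    // Unmarshal a whole document into the object graph (memory heavy for
    // 200 MB, which is the caveat mentioned above).
    public static School parse(String xml) throws JAXBException {
        return (School) JAXBContext.newInstance(School.class)
                .createUnmarshaller()
                .unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws JAXBException {
        School s = parse("<school name='some school'><class standard='2A'>"
                + "<student><name>Ann</name></student></class></school>");
        System.out.println(s.name + " / " + s.classes.get(0).students.get(0).name);
    }
}
```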

Related

ContentHandler is taking a lot of time to go through a 3 MB XML parsed from an xlsx file

I'm using the SAX parser and XSSFReader of apache.poi to parse an .xlsx file. My sheet contains up to 650 columns and 2000 rows (file size about 2.5 MB). My code looks like this:
public class MyClass {
    public static void main(String path) {
        try {
            OPCPackage pkg = OPCPackage.open(new FileInputStream(path));
            XSSFReader reader = new XSSFReader(pkg);
            InputStream sheetData = reader.getSheet("rId3"); // the needed sheet
            MyHandler handler = new MyHandler();
            XMLReader parser = SAXHelper.newXMLReader();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(sheetData));
        } catch (Exception e) {
            // or other catches with required exceptions
        }
    }
}
class MyHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String name, Attributes attributes) {
        if ("row".equals(name)) {
            System.out.println("row: " + attributes.getValue("r"));
        }
    }
}
But unfortunately I saw that it takes 2 or 3 seconds to go over one row, which means going over the whole sheet takes over 30 minutes(!!).
Well, I am sure this is not how it is supposed to be; otherwise nobody would be suggesting the apache.poi event API for large files, would they?
I also want to get to the <mergeCell> values at the end of the XML (after the closing </sheetData>). Is there a better way to do that? (I was thinking of handling the XML as a string and simply searching for the required values with some regular expression. Is that possible?)
So I have two questions:
1. What's wrong with my code / why does it take so long? (When I think about it, 650 cells per row sounds like a normal situation; why shouldn't they be processed in a few seconds?)
2. Is there a way to treat the XML as a text file and simply search in it using a regex?
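On question 2: yes, although regex over XML is brittle (it breaks on unusual attribute order, quoting, or namespace prefixes). A hedged sketch, assuming the usual SpreadsheetML shape of the mergeCell tags:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeCellGrep {

    // Hypothetical helper: pull the ref attribute out of every <mergeCell .../>
    // tag found in the raw sheet XML text.
    public static List<String> mergeRefs(String sheetXml) {
        Matcher m = Pattern.compile("<mergeCell\\s+ref=\"([^\"]+)\"").matcher(sheetXml);
        List<String> refs = new ArrayList<>();
        while (m.find()) {
            refs.add(m.group(1));
        }
        return refs;
    }

    public static void main(String[] args) {
        String tail = "</sheetData><mergeCells count=\"2\">"
                + "<mergeCell ref=\"A1:B2\"/><mergeCell ref=\"C3:D4\"/></mergeCells>";
        System.out.println(mergeRefs(tail)); // [A1:B2, C3:D4]
    }
}
```

Since the mergeCells block sits after the sheet data, you could also keep streaming with the SAX handler and only react to mergeCell start elements, which avoids loading the XML as one string.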

Camel, split large XML file with header, using field condition

I'm trying to set up an Apache Camel route that inputs a large XML file and then splits the payload into two different files using a field condition, i.e. if an ID field starts with a 1, it goes to one output file, otherwise to the other. Using Camel is not a must, and I've looked at XSLT and plain Java options as well, but I just feel that this should work.
I've covered splitting the actual payload, but I'm having issues with making sure that the parent nodes, including a header, are included in each file as well. As the file can be large, I want to make sure that streams are used for the payload. I feel like I've read hundreds of different questions, blog entries, etc. on this, and pretty much every case covers either loading the entire file into memory, splitting the file equally into parts, or just using the payload nodes individually.
My prototype XML file looks like this:
<root>
  <header>
    <title>Testing</title>
  </header>
  <orders>
    <order>
      <id>11</id>
      <stuff>One</stuff>
    </order>
    <order>
      <id>20</id>
      <stuff>Two</stuff>
    </order>
    <order>
      <id>12</id>
      <stuff>Three</stuff>
    </order>
  </orders>
</root>
The result should be two files - condition true (id starts with 1):
<root>
  <header>
    <title>Testing</title>
  </header>
  <orders>
    <order>
      <id>11</id>
      <stuff>One</stuff>
    </order>
    <order>
      <id>12</id>
      <stuff>Three</stuff>
    </order>
  </orders>
</root>
Condition false:
<root>
  <header>
    <title>Testing</title>
  </header>
  <orders>
    <order>
      <id>20</id>
      <stuff>Two</stuff>
    </order>
  </orders>
</root>
My prototype route:
from("file:" + inputFolder)
.log("Processing file ${headers.CamelFileName}")
.split()
.tokenizeXML("order", "*") // Includes parent in every node
.streaming()
.choice()
.when(body().contains("id>1"))
.to("direct:ones")
.stop()
.otherwise()
.to("direct:others")
.stop()
.end()
.end();
from("direct:ones")
//.aggregate(header("ones"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=ones-${in.header.CamelFileName}&fileExist=Append");
from("direct:others")
//.aggregate(header("others"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=others-${in.header.CamelFileName}&fileExist=Append");
This works as intended, except that the parent tags (header and footer, if you will) are added for every node. Using just the node name in tokenizeXML returns only the node itself, but I can't figure out how to add the header and footer. Preferably I would want to stream the parent tags into a header and footer property and add them before and after the split.
How can I do this? Would I somehow need to tokenize the parent tags first, and would this mean streaming the file twice?
As a final note, you might notice the aggregate at the end. I don't want to aggregate every node before writing to the file, as that defeats the purpose of streaming and keeping the entire file out of memory, but I figured I might gain some performance by aggregating a number of nodes before writing, to lessen the performance hit of writing to the drive for every node. I'm not sure if this makes sense to do.
I was unable to make it work with Camel. Or rather, when using plain Java for extracting the header, I already had everything I needed to continue and make the split, and swapping back to Camel seemed cumbersome. There are most likely ways to improve on this, but this was my solution for splitting the XML payload.
Switching between the two types of output streams is not that pretty, but it eases the use of everything else. Also of note is that I chose equalsIgnoreCase to check the tag names, even though XML is normally case sensitive; for me, it reduces the risk of errors. Finally, make sure your regex matches the entire string using wildcards, as with normal string regex.
/**
 * Splits an XML file's payload into two new files based on a regex condition. The payload is a specific XML tag in the
 * input file that is repeated a number of times. All tags before and after the payload are added to both files in order
 * to keep the same structure.
 *
 * The content of each payload tag is compared to the regex condition and if true, it is added to the primary output file.
 * Otherwise it is added to the secondary output file. The payload can be empty and an empty payload tag will be added to
 * the secondary output file. Note that the output will not be an unaltered copy of the input as self-closing XML tags are
 * altered to corresponding opening and closing tags.
 *
 * Data is streamed from the input file to the output files, keeping memory usage small even with large files.
 *
 * @param inputFilename Path and filename for the input XML file
 * @param outputFilenamePrimary Path and filename for the primary output file
 * @param outputFilenameSecondary Path and filename for the secondary output file
 * @param payloadTag XML tag name of the payload
 * @param payloadParentTag XML tag name of the payload's direct parent
 * @param splitRegex The regex split condition used on the payload content
 * @throws Exception On invalid filenames, missing input, incorrect XML structure, etc.
 */
public static void splitXMLPayload(String inputFilename, String outputFilenamePrimary, String outputFilenameSecondary, String payloadTag, String payloadParentTag, String splitRegex) throws Exception {
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
XMLEventReader xmlEventReader = null;
FileInputStream fileInputStream = null;
FileWriter fileWriterPrimary = null;
FileWriter fileWriterSecondary = null;
XMLEventWriter xmlEventWriterSplitPrimary = null;
XMLEventWriter xmlEventWriterSplitSecondary = null;
try {
fileInputStream = new FileInputStream(inputFilename);
xmlEventReader = xmlInputFactory.createXMLEventReader(fileInputStream);
fileWriterPrimary = new FileWriter(outputFilenamePrimary);
fileWriterSecondary = new FileWriter(outputFilenameSecondary);
xmlEventWriterSplitPrimary = xmlOutputFactory.createXMLEventWriter(fileWriterPrimary);
xmlEventWriterSplitSecondary = xmlOutputFactory.createXMLEventWriter(fileWriterSecondary);
boolean isStart = true;
boolean isEnd = false;
boolean lastSplitIsPrimary = true;
while (xmlEventReader.hasNext()) {
XMLEvent xmlEvent = xmlEventReader.nextEvent();
// Check for start of payload element
if (!isEnd && xmlEvent.isStartElement()) {
StartElement startElement = xmlEvent.asStartElement();
if (startElement.getName().getLocalPart().equalsIgnoreCase(payloadTag)) {
if (isStart) {
isStart = false;
// Flush the event writers as we'll use the file writers for the payload
xmlEventWriterSplitPrimary.flush();
xmlEventWriterSplitSecondary.flush();
}
String order = getTagAsString(xmlEventReader, xmlEvent, payloadTag, xmlOutputFactory);
if (order.matches(splitRegex)) {
lastSplitIsPrimary = true;
fileWriterPrimary.write(order);
} else {
lastSplitIsPrimary = false;
fileWriterSecondary.write(order);
}
}
}
// Check for end of parent tag
else if (!isStart && !isEnd && xmlEvent.isEndElement()) {
EndElement endElement = xmlEvent.asEndElement();
if (endElement.getName().getLocalPart().equalsIgnoreCase(payloadParentTag)) {
isEnd = true;
}
}
// Is neither start or end and we're handling payload (most often white space)
else if (!isStart && !isEnd) {
// Add to last split handled
if (lastSplitIsPrimary) {
xmlEventWriterSplitPrimary.add(xmlEvent);
xmlEventWriterSplitPrimary.flush();
} else {
xmlEventWriterSplitSecondary.add(xmlEvent);
xmlEventWriterSplitSecondary.flush();
}
}
// Start and end is added to both files
if (isStart || isEnd) {
xmlEventWriterSplitPrimary.add(xmlEvent);
xmlEventWriterSplitSecondary.add(xmlEvent);
}
}
} catch (Exception e) {
logger.error("Error in XML split", e);
throw e;
} finally {
// Close the streams
try {
xmlEventReader.close();
} catch (XMLStreamException e) {
// ignore
}
try {
fileInputStream.close();
} catch (IOException e) {
// ignore
}
try {
xmlEventWriterSplitPrimary.close();
} catch (XMLStreamException e) {
// ignore
}
try {
xmlEventWriterSplitSecondary.close();
} catch (XMLStreamException e) {
// ignore
}
try {
fileWriterPrimary.close();
} catch (IOException e) {
// ignore
}
try {
fileWriterSecondary.close();
} catch (IOException e) {
// ignore
}
}
}
/**
 * Loops through the events in the {@code XMLEventReader} until the specific XML end tag is found and returns everything
 * contained within the XML tag as a String.
 *
 * Data is streamed from the {@code XMLEventReader}, however the String can be large depending on the number of children
 * in the XML tag.
 *
 * @param xmlEventReader The already active reader. The starting tag event is assumed to have already been read
 * @param startEvent The starting XML tag event already read from the {@code XMLEventReader}
 * @param tag The XML tag name used to find the starting XML tag
 * @param xmlOutputFactory Convenience include to avoid creating another factory
 * @return String containing everything between the starting and ending XML tag, the tags themselves included
 * @throws Exception On incorrect XML structure
 */
private static String getTagAsString(XMLEventReader xmlEventReader, XMLEvent startEvent, String tag, XMLOutputFactory xmlOutputFactory) throws Exception {
StringWriter stringWriter = new StringWriter();
XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(stringWriter);
// Add the start tag
xmlEventWriter.add(startEvent);
// Add until end tag
while (xmlEventReader.hasNext()) {
XMLEvent xmlEvent = xmlEventReader.nextEvent();
// End tag found
if (xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().getLocalPart().equalsIgnoreCase(tag)) {
xmlEventWriter.add(xmlEvent);
xmlEventWriter.close();
stringWriter.close();
return stringWriter.toString();
} else {
xmlEventWriter.add(xmlEvent);
}
}
xmlEventWriter.close();
stringWriter.close();
throw new Exception("Invalid XML, no closing tag for <" + tag + "> found!");
}

JAXB marshaller overwriting file contents

I am trying to use JAXB to marshal objects I create into an XML file. What I want is to create a list, then print it to the file, then create a new list and print it to the same file, but every time I do, it overwrites the first. I want the final XML file to look as if I had one big list of objects. I would just build that one list, but there are so many objects that I quickly max out my heap size.
So, my main creates a bunch of threads, each of which iterates through a list of objects it receives and calls create_Log on each object. Once it is finished, it calls printToFile, which is where it marshals the list to the file.
public class LogThread implements Runnable {
//private Thread myThread;
private Log_Message message = null;
private LinkedList<Log_Message> lmList = null;
LogServer Log = null;
private String Username = null;
public LogThread(LinkedList<Log_Message> lmList){
this.lmList = lmList;
}
public void run(){
//System.out.println("thread running");
LogServer Log = new LogServer();
//create iterator for list
final ListIterator<Log_Message> listIterator = lmList.listIterator();
while(listIterator.hasNext()){
message = listIterator.next();
CountTrans.addTransNumber(message.TransactionNumber);
Username = message.input[2];
Log.create_Log(message.input, message.TransactionNumber, message.Message, message.CMD);
}
Log.printToFile();
init_LogServer.threadCount--;
init_LogServer.doneList();
init_LogServer.doneUser();
System.out.println("Thread "+ Thread.currentThread().getId() +" Completed user: "+ Username+"... Number of Users Complete: " + init_LogServer.getUsersComplete());
//Thread.interrupt();
}
}
The above calls the create_Log function below to build a new object generated from the XSD I was given (SystemEventType, QuoteServerType, etc.). These objects are all added to an ArrayList using the function below and attached to the Root object. Once the LogThread loop is finished, it calls printToFile, which takes the list from the Root object and marshals it to the file... overwriting what was already there. How can I add to the same file without overwriting, and without creating one master list in the heap?
public class LogServer {
public log Root = null;
public static String fileName = "LogFile.xml";
public static File XMLfile = new File(fileName);
public LogServer(){
this.Root = new log();
}
//output LogFile.xml
public synchronized void printToFile(){
System.out.println("Printing XML");
//write to xml file
try {
init_LogServer.marshaller.marshal(Root,XMLfile);
} catch (JAXBException e) {
e.printStackTrace();
}
System.out.println("Done Printing XML");
}
private BigDecimal ConvertStringtoBD(String input){
DecimalFormatSymbols symbols = new DecimalFormatSymbols();
symbols.setGroupingSeparator(',');
symbols.setDecimalSeparator('.');
String pattern = "#,##0.0#";
DecimalFormat decimalFormat = new DecimalFormat(pattern, symbols);
decimalFormat.setParseBigDecimal(true);
// parse the string
BigDecimal bigDecimal = new BigDecimal("0");
try {
bigDecimal = (BigDecimal) decimalFormat.parse(input);
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return bigDecimal;
}
public QuoteServerType Log_Quote(String[] input, int TransactionNumber){
BigDecimal quote = ConvertStringtoBD(input[4]);
BigInteger TransNumber = BigInteger.valueOf(TransactionNumber);
BigInteger ServerTimeStamp = new BigInteger(input[6]);
Date date = new Date();
long timestamp = date.getTime();
ObjectFactory factory = new ObjectFactory();
QuoteServerType quoteCall = factory.createQuoteServerType();
quoteCall.setTimestamp(timestamp);
quoteCall.setServer(input[8]);
quoteCall.setTransactionNum(TransNumber);
quoteCall.setPrice(quote);
quoteCall.setStockSymbol(input[3]);
quoteCall.setUsername(input[2]);
quoteCall.setQuoteServerTime(ServerTimeStamp);
quoteCall.setCryptokey(input[7]);
return quoteCall;
}
public SystemEventType Log_SystemEvent(String[] input, int TransactionNumber, CommandType CMD){
BigInteger TransNumber = BigInteger.valueOf(TransactionNumber);
Date date = new Date();
long timestamp = date.getTime();
ObjectFactory factory = new ObjectFactory();
SystemEventType SysEvent = factory.createSystemEventType();
SysEvent.setTimestamp(timestamp);
SysEvent.setServer(input[8]);
SysEvent.setTransactionNum(TransNumber);
SysEvent.setCommand(CMD);
SysEvent.setFilename(fileName);
return SysEvent;
}
public void create_Log(String[] input, int TransactionNumber, String Message, CommandType Command){
switch(Command.toString()){
case "QUOTE": //Quote_Log
QuoteServerType quote_QuoteType = Log_Quote(input,TransactionNumber);
Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);
break;
case "QUOTE_CACHED":
SystemEventType Quote_Cached_SysType = Log_SystemEvent(input, TransactionNumber, CommandType.QUOTE);
Root.getUserCommandOrQuoteServerOrAccountTransaction().add(Quote_Cached_SysType);
break;
}
}
EDIT: Below is the code showing how the objects are added to the ArrayList
public List<Object> getUserCommandOrQuoteServerOrAccountTransaction() {
if (userCommandOrQuoteServerOrAccountTransaction == null) {
userCommandOrQuoteServerOrAccountTransaction = new ArrayList<Object>();
}
return this.userCommandOrQuoteServerOrAccountTransaction;
}
JAXB is about mapping a Java object tree to an XML document, or vice versa. So in principle, you need the complete object tree before you can save it to XML.
Of course that would not be possible for very large data, for example a DB dump, so JAXB allows marshalling an object tree in fragments, letting the user control the moment of object creation and marshalling. A typical use case would be fetching records from a DB one by one and marshalling them one by one to a file, so there is no problem with the heap.
However, you are asking about appending one object tree to another (one fresh in memory, the second already represented in an XML file). This is not normally possible, as it is not really appending but creating a new object tree that contains the content of both (there is only one document root element, not two).
So what you could do is:
1. create the new XML representation with a manually initiated root element,
2. copy the existing XML content into the new XML, either using XMLStreamWriter/XMLStreamReader read/write operations or by unmarshalling the log objects and marshalling them one by one,
3. marshal your new log objects into the same XML stream,
4. complete the XML with the root closing element.
Vaguely, something like this:
XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8.name());
// "manually" output the beginning of the xml document == its declaration and the root element
writer.writeStartDocument();
writer.writeStartElement("YOUR_ROOT_ELM");
Marshaller mar = ...
mar.setProperty(Marshaller.JAXB_FRAGMENT, true); // instructs jaxb to output only the objects, not the whole xml document
PartialUnmarshaler existing = ...; // allows reading the xml content one by one from the existing file
while (existing.hasNext()) {
    YourObject obj = existing.next();
    mar.marshal(obj, writer);
    writer.flush();
}
List<YourObject> toAppend = ...
for (YourObject obj : toAppend) {
    mar.marshal(obj, writer);
    writer.flush();
}
// finishing the document, closing the root element
writer.writeEndElement();
writer.writeEndDocument();
Reading the objects one by one from a large XML file, and a complete implementation of PartialUnmarshaler, are described in this answer:
https://stackoverflow.com/a/9260039/4483840
That is the 'elegant' solution.
Less elegant is to have your threads write their log lists to individual files and then append them yourself. You only need to read and copy the header of the first file, then copy all its content apart from the last closing tag, copy the content of the other files ignoring the document opening and closing tags, and finally output the closing tag.
If your marshaller is set to marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
each opening/closing tag will be on a different line, so the ugly hack is to
copy all the lines from the 3rd to one before the last, then output the closing tag.
It is an ugly hack, because it is sensitive to your output format (if you, for example, change your container root element), but it is faster to implement than the full JAXB solution.
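As an illustration of that line-copying hack, here is a sketch over in-memory line lists; it assumes formatted output where the XML declaration and the root tags each occupy their own line, and the names are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UglyMerge {

    // Keep the first file intact except its last line (the closing root tag),
    // splice in lines 3..n-1 of each later file (skipping its declaration,
    // root open, and root close), then emit the closing tag once.
    public static List<String> merge(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        List<String> first = files.get(0);
        out.addAll(first.subList(0, first.size() - 1));
        for (List<String> f : files.subList(1, files.size())) {
            out.addAll(f.subList(2, f.size() - 1));
        }
        out.add(first.get(first.size() - 1));
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("<?xml version=\"1.0\"?>", "<log>", "  <e>1</e>", "</log>");
        List<String> b = Arrays.asList("<?xml version=\"1.0\"?>", "<log>", "  <e>2</e>", "</log>");
        System.out.println(merge(Arrays.asList(a, b)));
    }
}
```

In practice you would stream lines with a BufferedReader/BufferedWriter per file instead of holding lists in memory, but the index arithmetic is the same.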

Parse some elements from an XML

I want to know if it is possible to parse just some attributes from an XML file into an object in Java.
I don't want to create all the fields that are in the XML.
So, how can I do this?
For example, below there is an XML file, and I want only the data inside one particular tag.
<emit>
  <CNPJ>1109</CNPJ>
  <xNome>OESTE</xNome>
  <xFant>ABATEDOURO</xFant>
  <enderEmit>
    <xLgr>RODOVIA</xLgr>
    <nro>S/N</nro>
    <xCpl>402</xCpl>
    <xBairro>GOMES</xBairro>
    <cMun>314</cMun>
    <xMun>MINAS</xMun>
    <UF>MG</UF>
    <CEP>35661470</CEP>
    <cPais>58</cPais>
    <xPais>Brasil</xPais>
    <fone>03</fone>
  </enderEmit>
  <IE>20659</IE>
  <CRT>3</CRT>
For Java XML parsing where you don't have the XSD and don't want to create a complete object graph to represent the XML, JDOM is a great tool. It allows you to easily walk the XML tree and pick the elements you are interested in.
Here's some sample code that uses JDOM to pick arbitrary values from the XML doc:
// reading can be done using either of the two parsers, 'DOM' or 'SAX'
// we have used a SAXBuilder object here
// please note that this SAXBuilder is not the internal SAX from the JDK
SAXBuilder saxBuilder = new SAXBuilder();
// obtain the file object
File file = new File("/tmp/emit.xml");
try {
    // convert the file to a document object
    Document document = saxBuilder.build(file);
    // You don't need this or the ns parameters in getChild()
    // if your XML document has no namespace
    Namespace ns = Namespace.getNamespace("http://www.example.com/namespace");
    // get the root node from the xml (emit in your sample doc)
    Element rootNode = document.getRootElement();
    // getChild() assumes one, and only one, enderEmit element. Add error
    // checking as needed for your document
    Element enderEmitElement = rootNode.getChild("enderEmit", ns);
    // now we get two of its children
    Element xCplElement = enderEmitElement.getChild("xCpl", ns);
    // should be 402 in your example
    String xCplValue = xCplElement.getText();
    System.out.println("xCpl: " + xCplValue);
    Element cMunElement = enderEmitElement.getChild("cMun", ns);
    // should be 314 in your example
    String cMunValue = cMunElement.getText();
    System.out.println("cMun: " + cMunValue);
} catch (JDOMException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
You can use JAXB to unmarshal the XML into a Java object, from which you can read selected elements easily. With JAXB, the given XML can be represented in Java as follows:
enderEmit element :
@XmlRootElement
public class EnderEmit {
    private String xLgr;
    // Other elements. Here you can define properties for only those elements that you want to load
}
emit element (This represents your XML file):
@XmlRootElement
public class Emit {
    private String cnpj;
    private String xnom;
    private EnderEmit enderEmit;
    ..
    // Add elements that you want to load
}
Now, using the lines of code below, you can read your XML into an object:
String filePath = "filePath";
File file = new File(filePath);
JAXBContext jaxbContext = JAXBContext.newInstance(Emit.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Emit emit = (Emit) jaxbUnmarshaller.unmarshal(file);
This will give you an Emit object for the given XML.
Try to use StringUtils.substringBetween:
BufferedReader br = null;
try
{
String input = "";
br = new BufferedReader(new FileReader(FILEPATH));
String result = null;
while ((input = br.readLine()) != null) // here we read the file line by line
{
result = StringUtils.substringBetween(input, ">", "<"); // using StringUtils.substringBetween to get the data you want
if(result != null) // the result may be null, because some lines do not have tags
{
System.out.println(""+result);
}
}
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
try
{
if (br != null)
{
br.close();
}
}
catch (IOException ex)
{
ex.printStackTrace();
}
}

Update XML using XMLStreamWriter

I have a large XML file and I want to update a particular node of the XML (like removing duplicate nodes).
As the XML is huge, I considered using the StAX API class XMLStreamReader. I first read the XML using XMLStreamReader, stored the read data in user objects, and manipulated these user objects to remove duplicates.
Now I want to put this updated user object back into my original XML. What I thought is that I can marshal the user object to a string and place the string at the right position in my input XML, but I am not able to achieve it using the StAX class XMLStreamWriter.
Can this be achieved using XMLStreamWriter? Please suggest.
If not, then please suggest an alternative approach to my problem.
My main concern is memory, as I cannot load such huge XMLs into our project server's memory, which is shared across multiple processes. Hence I do not want to use DOM, because it would use a lot of memory to load these huge XMLs.
If you need to alter a particular value, like text content or a tag name, StAX might help. It would also help in removing a few elements, using createFilteredReader.
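For example, element removal can be done with a stateful EventFilter passed to createFilteredReader. This sketch drops every <ISBN> element; the element name and helper are invented for the demo, and the end-tag check deliberately does not depend on the flag, since accept() may be invoked more than once for a peeked event:

```java
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;
import java.io.StringWriter;

public class FilterDemo {

    // Copy the document, silently dropping <ISBN> elements and everything inside them.
    public static String stripIsbn(String xml) throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        EventFilter skipIsbn = new EventFilter() {
            private boolean inIsbn = false;

            @Override
            public boolean accept(XMLEvent e) {
                if (e.isStartElement()
                        && "ISBN".equals(e.asStartElement().getName().getLocalPart())) {
                    inIsbn = true;
                    return false;
                }
                if (e.isEndElement()
                        && "ISBN".equals(e.asEndElement().getName().getLocalPart())) {
                    inIsbn = false;
                    return false;
                }
                return !inIsbn;
            }
        };
        XMLEventReader reader = inFactory.createFilteredReader(
                inFactory.createXMLEventReader(new StringReader(xml)), skipIsbn);
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        writer.add(reader); // copy every event the filter lets through
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(stripIsbn("<Book><Title>Hello</Title><ISBN>12345</ISBN></Book>"));
    }
}
```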
The code below renames Name to AuthorName and adds a comment:
public class StAx {
public static void main(String[] args) throws FileNotFoundException,
XMLStreamException {
String filename = "HelloWorld.xml";
try (InputStream in = new FileInputStream(filename);
OutputStream out = System.out;) {
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLEventFactory ef = XMLEventFactory.newInstance();
XMLEventReader reader = factory.createXMLEventReader(filename, in);
XMLEventWriter writer = xof.createXMLEventWriter(out);
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isCharacters()) {
String data = event.asCharacters().getData();
if (data.contains("Hello")) {
String replace = data.replace("Hello", "Oh");
event = ef.createCharacters(replace);
}
writer.add(event);
} else if (event.isStartElement()) {
StartElement s = event.asStartElement();
String tagName = s.getName().getLocalPart();
if (tagName.equals("Name")) {
String newName = "Author" + tagName;
event = ef.createStartElement(new QName(newName), null,
null);
writer.add(event);
writer.add(ef.createCharacters("\n "));
event = ef.createComment("auto generated comment");
writer.add(event);
} else {
writer.add(event);
}
} else {
writer.add(event);
}
}
writer.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Input
<?xml version="1.0"?>
<BookCatalogue>
<Book>
<Title>HelloLord</Title>
<Name>
<first>New</first>
<last>Earth</last>
</Name>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>HelloWord</Title>
<Name>
<first>New</first>
<last>Moon</last>
</Name>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
Output
<?xml version="1.0"?><BookCatalogue>
<Book>
<Title>OhLord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Earth</last>
</AuthorName>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>OhWord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Moon</last>
</AuthorName>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
As you can see, things get really complicated when the modification is much more than this, like swapping two nodes, or deleting nodes based on the state of a few other nodes (e.g. delete all Books with a price greater than the average price).
The best solution in such cases is to produce the resulting XML using an XSLT transformation.
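As a taste of the XSLT route, an identity transform plus one empty template is enough for simple deletions. This sketch drops <ISBN> elements; the stylesheet is illustrative, not a solution to the average-price example:

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XsltDemo {

    // Identity transform: copy everything, except the empty template that
    // swallows <ISBN> elements.
    private static final String XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match='ISBN'/>"
          + "</xsl:stylesheet>";

    public static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<Book><Title>T</Title><ISBN>1</ISBN></Book>"));
    }
}
```

Note that the default JAXP transformer builds an in-memory tree of the source, so for truly huge files you would feed it a SAX or StAX source, or use a streaming-capable XSLT 3.0 processor.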
