I need to parse potentially large XML files, of which the schema is already provided to me in several XSD files, so XML binding is highly favored. I'd like to know if I can use JAXB to parse the file in chunks and if so, how.
Because code matters, here is a PartialUnmarshaller that reads a big file in chunks. It can be used like this: new PartialUnmarshaller<YourClass>(stream, YourClass.class)
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.*;
import java.io.InputStream;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import static javax.xml.stream.XMLStreamConstants.*;
public class PartialUnmarshaller<T> {
XMLStreamReader reader;
Class<T> clazz;
Unmarshaller unmarshaller;
public PartialUnmarshaller(InputStream stream, Class<T> clazz) throws XMLStreamException, FactoryConfigurationError, JAXBException {
this.clazz = clazz;
this.unmarshaller = JAXBContext.newInstance(clazz).createUnmarshaller();
this.reader = XMLInputFactory.newInstance().createXMLStreamReader(stream);
/* ignore headers */
skipElements(START_DOCUMENT, DTD);
/* ignore root element */
reader.nextTag();
/* if there's no tag, ignore root element's end */
skipElements(END_ELEMENT);
}
public T next() throws XMLStreamException, JAXBException {
if (!hasNext())
throw new NoSuchElementException();
T value = unmarshaller.unmarshal(reader, clazz).getValue();
skipElements(CHARACTERS, END_ELEMENT);
return value;
}
public boolean hasNext() throws XMLStreamException {
return reader.hasNext();
}
public void close() throws XMLStreamException {
reader.close();
}
void skipElements(int... elements) throws XMLStreamException {
int eventType = reader.getEventType();
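// box the int event codes so that contains() can be used below
List<Integer> types = IntStream.of(elements).boxed().collect(Collectors.toList());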
while (types.contains(eventType))
eventType = reader.next();
}
}
This is detailed in the user guide. The JAXB download from http://jaxb.java.net/ includes an example of how to parse one chunk at a time.
When a document is large, it's
usually because there's repetitive
parts in it. Perhaps it's a purchase
order with a large list of line items,
or perhaps it's an XML log file with
large number of log entries.
This kind of XML is suitable for
chunk-processing; the main idea is to
use the StAX API, run a loop, and
unmarshal individual chunks
separately. Your program acts on a
single chunk, and then throws it away.
In this way, you'll be only keeping at
most one chunk in memory, which allows
you to process large documents.
See the streaming-unmarshalling
example and the partial-unmarshalling
example in the JAXB RI distribution
for more about how to do this. The
streaming-unmarshalling example has an
advantage that it can handle chunks at
arbitrary nest level, yet it requires
you to deal with the push model ---
JAXB unmarshaller will "push" new
chunk to you and you'll need to
process them right there.
In contrast, the partial-unmarshalling
example works in a pull model (which
usually makes the processing easier),
but this approach has some limitations
in databinding portions other than the
repeated part.
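The pull-model loop described above can be sketched with StAX alone. This is only a minimal sketch, not the sample shipped with the JAXB RI: the class name StaxChunkLoop and the helper consumeChunk are made up for illustration, and consumeChunk merely collects the chunk's text at the point where a real program would call unmarshaller.unmarshal(reader, YourClass.class).

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxChunkLoop {

    /** Hands every direct child of the root element to the handler, one at a time. */
    static void forEachChunk(String xml, Consumer<String> handler) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // move onto the root start tag
                // each nextTag() lands on the next chunk, or on the root end tag
                while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
                    handler.accept(consumeChunk(reader));
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
    }

    /** Stand-in for JAXB unmarshalling: reads one subtree and returns "name:text". */
    private static String consumeChunk(XMLStreamReader reader) throws XMLStreamException {
        String name = reader.getLocalName();
        StringBuilder text = new StringBuilder();
        int depth = 1; // we start on the chunk's own start tag
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                text.append(reader.getText());
            }
        }
        return name + ":" + text.toString().trim();
    }
}
```

Because the handler receives each chunk as soon as it has been read and nothing is stored by the loop itself, at most one chunk is held in memory at a time.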
Yves Amsellem's answer is pretty good, but only works if all elements are of exactly the same type. Otherwise your unmarshall will throw an exception, but the reader will have already consumed the bytes, so you would be unable to recover. Instead, we should follow Skaffman's advice and look at the sample from the JAXB jar.
To explain how it works:
Create a JAXB unmarshaller.
Add a listener to the unmarshaller for intercepting the appropriate elements. This is done by "hacking" the ArrayList to ensure the elements are not stored in memory after being unmarshalled.
Create a SAX parser. This is where the streaming happens.
Use the unmarshaller to generate a handler for the SAX parser.
Stream!
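To make the push model in these steps concrete, here is a rough sketch that uses only the SAX classes of the JDK. It is an illustration under assumptions, not the JAXB sample: SaxPushSketch, its stream method and the sink parameter are hypothetical names, and the hand-written handler stands in for the one that unmarshaller.getUnmarshallerHandler() would return.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxPushSketch {

    /** Pushes the text of every element named chunkName to the sink as soon as it ends. */
    static void stream(String xml, String chunkName, Consumer<String> sink) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                private StringBuilder text; // non-null only while inside a chunk

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (chunkName.equals(localName)) {
                        text = new StringBuilder();
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (text != null) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (text != null && chunkName.equals(localName)) {
                        sink.accept(text.toString()); // the parser "pushes" the chunk
                        text = null;                  // nothing is kept afterwards
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }
}
```

The sink plays the same role as the AddInterceptor below: it sees each element once, during parsing, and the parser never accumulates them.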
I modified the solution to be generic*. However, it required some reflection. If this is not OK, please look at the code samples in the JAXB jars.
ArrayListAddInterceptor.java
import java.lang.reflect.Field;
import java.util.ArrayList;
public class ArrayListAddInterceptor<T> extends ArrayList<T> {
private static final long serialVersionUID = 1L;
private AddInterceptor<T> interceptor;
public ArrayListAddInterceptor(AddInterceptor<T> interceptor) {
this.interceptor = interceptor;
}
@Override
public boolean add(T t) {
interceptor.intercept(t);
return false;
}
public static interface AddInterceptor<T> {
public void intercept(T t);
}
public static void apply(AddInterceptor<?> interceptor, Object o, String property) {
try {
Field field = o.getClass().getDeclaredField(property);
field.setAccessible(true);
field.set(o, new ArrayListAddInterceptor(interceptor));
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
Main.java
public class Main {
public void parsePurchaseOrders(AddInterceptor<PurchaseOrder> interceptor, List<File> files) {
try {
// create JAXBContext for the primer.xsd
JAXBContext context = JAXBContext.newInstance("primer");
Unmarshaller unmarshaller = context.createUnmarshaller();
// install the callback on all PurchaseOrders instances
unmarshaller.setListener(new Unmarshaller.Listener() {
public void beforeUnmarshal(Object target, Object parent) {
if (target instanceof PurchaseOrders) {
ArrayListAddInterceptor.apply(interceptor, target, "purchaseOrder");
}
}
});
// create a new XML parser
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
XMLReader reader = factory.newSAXParser().getXMLReader();
reader.setContentHandler(unmarshaller.getUnmarshallerHandler());
for (File file : files) {
reader.parse(new InputSource(new FileInputStream(file)));
}
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
*This code has not been tested and is for illustrative purposes only.
I wrote a small library (available on Maven Central) to help to read big XML files and process them by chunks. Please note it can only be applied for files with a unique container having a list of data (even from different types). In other words, your file has to follow the structure:
<container>
<type1>...</type1>
<type2>...</type2>
<type1>...</type1>
...
</container>
Here is an example where Type1, Type2, ... are the JAXB representation of the repeating data in the file:
try (StreamingUnmarshaller unmarshaller = new StreamingUnmarshaller(Type1.class, Type2.class, ...)) {
unmarshaller.open(new FileInputStream(fileName));
unmarshaller.iterate((type, element) -> doWhatYouWant(element));
}
You can find more information with detailed examples on the GitHub page of the library.
The problem
I'm coding a Java Swing application dealing with XML files, so I'm using JAXB in order to marshal classes into documents and unmarshal the other way around.
I want to include in the class that gets marshalled a private field, that stores the backing file the class is based on (if any), in the form of a File object. In this way, I can determine if a backing file is in use, so that when saving via a Save command, if the backing file is available, I can just marshal the class directly to that file, instead of obtaining it via a "Save file" dialog.
However, it seems that with the tools available in JAXB, I cannot get the File object from the Unmarshaller, while opening it. How can I tackle the situation so that I can set that variable correctly?
As this variable is internal, I don't want to include a setter or expose it so that other classes can't change it.
Background
Being aware of class event callbacks and external listeners, I know I can use a class event callback to set a class instance private field either before or after unmarshalling, but it seems I can't retrieve the file object being in use by the Unmarshaller from inside that callback.
On the other hand, with an external listener I could get ahold of the File object being unmarshalled, as it would be at the same level with the unmarshal method call, but now the private field would either need to be public or has to include a setter in order for it to be set.
Sample code
The following is a minimal, reproducible example, split in two files: JAXBTest.java and MarshalMe.java, both placed at the same level.
MarshalMe.java
import java.io.File;
import javax.xml.bind.annotation.XmlRootElement;
@XmlRootElement
public class MarshalMe {
private File backingFile;
public File getBackingFile() {
return backingFile;
}
// Dummy function that sets the backing file beforehand.
public void processSth() {
backingFile = new File("dummy.hai");
}
}
JAXBTest.java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
public class JAXBTest {
public static void writeXML(MarshalMe me, File xml) {
try {
JAXBContext contextObj = JAXBContext.newInstance(MarshalMe.class);
Marshaller marshallerObj = contextObj.createMarshaller();
marshallerObj.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshallerObj.marshal(me, new FileOutputStream(xml));
} catch (JAXBException jaxbe) {
jaxbe.printStackTrace();
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
}
}
public static MarshalMe readXML(File xml) {
MarshalMe me = null;
try {
JAXBContext contextObj = JAXBContext.newInstance(MarshalMe.class);
Unmarshaller unmarshallerObj = contextObj.createUnmarshaller();
me = (MarshalMe) unmarshallerObj.unmarshal(xml);
} catch (JAXBException jaxbe) {
jaxbe.printStackTrace();
}
return me;
}
public static void main(String[] args) {
MarshalMe src = new MarshalMe();
src.processSth();
System.out.println(src.getBackingFile());
File meFile = new File("me.xml");
writeXML(new MarshalMe(), meFile);
MarshalMe retrieved = readXML(meFile);
System.out.println(retrieved.getBackingFile());
}
}
Expected output
Running with Java 1.8 (or later, provided a JAXB library and runtime implementation are accessible), the minimal, reproducible example outputs:
dummy.hai
null
when I expect the output to be
dummy.hai
me.xml
as the class is initially written to an XML file named me.xml before being read back.
I've found a way to set the private field without exposing it or giving it a setter: Reflection.
Using external event listeners, I can get ahold of the File object. Then, inside the beforeUnmarshal method, and after checking that I got the correct object, I use reflection to get the private field, and with the setAccessible method, I can now control when I get access to the field using reflection only.
After lifting the access checks, it's only a matter of editing the value via setting it, and reinstating the checks after that.
Relevant changes
The following snippet includes the relevant changes:
unmarshallerObj.setListener(new Unmarshaller.Listener() {
@Override
public void beforeUnmarshal(Object target, Object parent) {
if (!(target instanceof MarshalMe))
return;
MarshalMe me = (MarshalMe) target;
try {
Field meBackingFile = MarshalMe.class.getDeclaredField("backingFile");
meBackingFile.setAccessible(true);
meBackingFile.set(me, xml);
meBackingFile.setAccessible(false);
} catch (NoSuchFieldException nsfe) {
} catch (IllegalAccessException iae) {}
}
});
Including the edit in the sample program
Edit the file JAXBTest.java by adding the following code:
// Add to the import section
import java.lang.reflect.Field;
// Under this function
public static MarshalMe readXML(File xml) {
MarshalMe me = null;
try {
JAXBContext contextObj = JAXBContext.newInstance(MarshalMe.class);
Unmarshaller unmarshallerObj = contextObj.createUnmarshaller();
/* Add this code vvv */
unmarshallerObj.setListener(new Unmarshaller.Listener() {
@Override
public void beforeUnmarshal(Object target, Object parent) {
if (!(target instanceof MarshalMe))
return;
MarshalMe me = (MarshalMe) target;
try {
Field meBackingFile = MarshalMe.class.getDeclaredField("backingFile");
meBackingFile.setAccessible(true);
meBackingFile.set(me, xml);
meBackingFile.setAccessible(false);
} catch (NoSuchFieldException nsfe) {
} catch (IllegalAccessException iae) {}
}
});
/* Add this code ^^^ */
me = (MarshalMe) unmarshallerObj.unmarshal(xml);
} catch (JAXBException jaxbe) {
jaxbe.printStackTrace();
}
return me;
}
After adding the import and the code between the /* Add this code */ lines, running the program again now outputs:
dummy.hai
me.xml
as expected.
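The reflective write used in the listener above can also be pulled into a small helper that reports failures instead of swallowing them in empty catch blocks. This is only a sketch: BackingFileInjector, setPrivateField and the nested Sample class (a stand-in for MarshalMe) are hypothetical names, not part of the original program.

```java
import java.io.File;
import java.lang.reflect.Field;

final class BackingFileInjector {

    /** Stand-in for MarshalMe: a private field with a getter and no setter. */
    static class Sample {
        private File backingFile;

        File getBackingFile() {
            return backingFile;
        }
    }

    /** Sets a private field declared directly on the target's class. */
    static void setPrivateField(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // lifts the access check for this Field object only
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            // surface the problem instead of silently leaving the field unset
            throw new IllegalStateException("Cannot set field '" + fieldName + "'", e);
        }
    }
}
```

Inside beforeUnmarshal the call would then be setPrivateField(target, "backingFile", xml). The accessible flag belongs to the Field object returned by getDeclaredField rather than to the field itself, so resetting it afterwards is optional when that object is discarded.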
I have an xml like this:
<root
xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25"
xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25" >
<gl-cor:entityInformation>
<gl-bus:accountantInformation>
...............
</gl-bus:accountantInformation>
</gl-cor:entityInformation>
</root>
All I want is to extract the element "gl-cor:entityInformation" from the root, together with its child elements. However, I do not want the namespace declarations to come with it.
The code is like this:
XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
Element innerElement = xpath.evaluateFirst(xmlDoc.getRootElement());
The problem is that the inner element holds the namespace declarations now. Sample output:
<gl-cor:entityInformation xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25">
<gl-bus:accountantInformation xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25">
</gl-bus:accountantInformation>
</gl-cor:entityInformation>
This is how I get xml as string:
public static String toString(Element element) {
Format format = Format.getPrettyFormat();
format.setTextMode(Format.TextMode.NORMALIZE);
format.setEncoding("UTF-8");
XMLOutputter xmlOut = new XMLOutputter();
xmlOut.setFormat(format);
return xmlOut.outputString(element);
}
As you see the namespace declarations are passed into the inner elements. Is there a way to get rid of these declarations without losing the prefixes?
I want this because later on I will be merging these inner elements inside another parent element and this parent element has already those namespace declarations.
JDOM by design insists that the in-memory model of the XML is well structured at all times. The behaviour you are seeing is exactly what I would expect from JDOM and I consider it to be "right". JDOM's XMLOutputter also outputs well structured and internally consistent XML and XML fragments.
Changing the behaviour of the internal in-memory model is not an option with JDOM, but customizing the XMLOutputter to change its behaviour is relatively easy. The XMLOutputter is structured to have an "engine" supplied as a constructor argument: XMLOutputter(XMLOutputProcessor). In addition, JDOM supplies an easy-to-customize default XMLOutputProcessor called AbstractXMLOutputProcessor.
You can get the behaviour you want by doing the following:
private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {
@Override
protected void printNamespace(final Writer out, final FormatStack fstack,
final Namespace ns) throws IOException {
// do nothing with printing Namespaces....
}
};
Now, when you create your XMLOutputter to print your XML element fragment, you can do the following:
public static String toString(Element element) {
Format format = Format.getPrettyFormat();
format.setTextMode(Format.TextMode.NORMALIZE);
format.setEncoding("UTF-8");
XMLOutputter xmlOut = new XMLOutputter(noNamespaces);
xmlOut.setFormat(format);
return xmlOut.outputString(element);
}
Here's a full program working with your input XML:
import java.io.IOException;
import java.io.Writer;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.Namespace;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import org.jdom2.output.support.AbstractXMLOutputProcessor;
import org.jdom2.output.support.FormatStack;
import org.jdom2.output.support.XMLOutputProcessor;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;
public class JDOMEray {
public static void main(String[] args) throws JDOMException, IOException {
Document eray = new SAXBuilder().build("eray.xml");
Namespace[] NAMESPACES = {Namespace.getNamespace("gl-cor", "http://www.xbrl.org/int/gl/cor/2006-10-25")};
XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
Element innerElement = xpath.evaluateFirst(eray.getRootElement());
System.out.println(toString(innerElement));
}
private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {
@Override
protected void printNamespace(final Writer out, final FormatStack fstack,
final Namespace ns) throws IOException {
// do nothing with printing Namespaces....
}
};
public static String toString(Element element) {
Format format = Format.getPrettyFormat();
format.setTextMode(Format.TextMode.NORMALIZE);
format.setEncoding("UTF-8");
XMLOutputter xmlOut = new XMLOutputter(noNamespaces);
xmlOut.setFormat(format);
return xmlOut.outputString(element);
}
}
For me the above program outputs:
<gl-cor:entityInformation>
<gl-bus:accountantInformation>...............</gl-bus:accountantInformation>
</gl-cor:entityInformation>
I've got the WSDL for a SOAP web service
I created a "Top down, Java Bean" web service client in RAD Developer (an Eclipse based compiler used with IBM Websphere) and auto-generated a bunch of JAX-WS .java modules
Here is the auto-generated JAX-WS code for one of the operations:
@WebMethod(operationName = "CommitTransaction", action = "http://myuri.com/wsdl/gitsearchservice/CommitTransaction")
@RequestWrapper(localName = "CommitTransaction", targetNamespace = "http://myuri.com/wsdl/gitsearchservice", className = "com.myuri.shwsclients.CommitTransaction")
@ResponseWrapper(localName = "CommitTransactionResponse", targetNamespace = "http://myuri.com/wsdl/gitsearchservice", className = "com.myuri.shwsclients.CommitTransactionResponse")
public void commitTransaction(
@WebParam(name = "requestOptions", targetNamespace = "http://myuri.com/wsdl/gitsearchservice")
RequestOptions requestOptions,
@WebParam(name = "transactionData", targetNamespace = "http://myuri.com/wsdl/gitsearchservice")
TransactionData transactionData);
QUESTION:
"transactionData" comes from a large, complex XML data record. The WSDL format exactly matches the XML I'll be writing on the Java side, and exactly matches what the Web service will be reading on the server side.
Q: How do I bypass Java serialization for the "transactionData" parameter, to send raw XML in my SOAP message? Instead of having to read my XML, parse it, and pack the Java "TransactionType" structure field-by-field?
Thank you in advance!
I see two approaches:
complicated, and will fall apart as soon as any generated code is regenerated...
Dig into the Service, Dispatch, and BindingProvider implementations that are created in your generated service Proxy class -- you can get the behavior you want by substituting your own BindingProvider implementation, but you have to make other substitutions to get there.
go through XML serialization, but without the hassle of "packing field by field"
starting with your string of raw XML that you say "exactly matches" the expected format
String rawXML = someMethodThatReturnsXml();
JAXBContext context = JAXBContext.newInstance(TransactionData.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
Object obj = unmarshaller.unmarshal(new StringReader(rawXML));
TransactionData data = (TransactionData) obj;
If the JAXB-generated class 'TransactionData' is not annotated as 'XmlRootElement', you should still be able to accomplish this like so:
String rawXML = someMethodThatReturnsXml();
JAXBContext context = JAXBContext.newInstance(TransactionData.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource input = new InputSource(new StringReader(rawXML));
Document doc = db.parse(input);
JAXBElement<?> element = unmarshaller.unmarshal(doc, TransactionData.class);
Object obj = element.getValue();
TransactionData data = (TransactionData) obj;
If you deal with a lot of XML records of various types, you can throw these together, and make the desired output class a parameter and have a generic xml-to-object utility:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
/**
* @author Zach Shelton
*/
public class SampleCode {
/**
* Turn xml into an object.
*
* @param <SomeJaxbType>
* @param wellFormedXml a String of well-formed XML, with proper reference to the correct namespace
* @param jaxbClass the class representing the type of object you want to get back (probably a class generated from xsd or wsdl)
* @return an object of the requested type
* @throws JAXBException if there is a problem setting up the Unmarshaller, or performing the unmarshal operation
* @throws SAXException if the given class is not annotated as XmlRootElement, and the xml String can not be parsed to a generic DOM Document
*/
public <SomeJaxbType> SomeJaxbType xmlToObject(String wellFormedXml, Class<SomeJaxbType> jaxbClass) throws JAXBException, SAXException {
if (jaxbClass==null) throw new IllegalArgumentException("received null jaxbClass");
if (wellFormedXml==null || wellFormedXml.trim().isEmpty()) throw new IllegalArgumentException("received null or empty xml");
if (!jaxbClass.isAnnotationPresent(XmlType.class)) throw new IllegalArgumentException(jaxbClass.getName() + " is not annotated as a JAXB class");
JAXBContext context = JAXBContext.newInstance(jaxbClass);
Unmarshaller unmarshaller = context.createUnmarshaller();
Object genericObject;
if (jaxbClass.isAnnotationPresent(XmlRootElement.class)) {
genericObject = unmarshaller.unmarshal(new StringReader(wellFormedXml));
} else {//must use alternate method described in API
//http://docs.oracle.com/javaee/6/api/javax/xml/bind/Unmarshaller.html#unmarshalByDeclaredType
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db;
try {
db = dbf.newDocumentBuilder();
} catch (ParserConfigurationException e) {
throw new IllegalStateException("failed to get DocumentBuilder from factory");
}
InputSource input = new InputSource(new StringReader(wellFormedXml));
Document doc;
try {
doc = db.parse(input);
} catch (IOException e) {
throw new IllegalStateException("xml string vanished");
}
JAXBElement<?> element = unmarshaller.unmarshal(doc, jaxbClass);
genericObject = element.getValue();
}
SomeJaxbType typedObject = (SomeJaxbType) genericObject;
return typedObject;
}
}
I'm not that familiar with the usage of the RequestWrapper and ResponseWrapper annotations, but does your outbound message end up looking something like:
<CommitTransaction>
<requestOptions>...</requestOptions>
<transactionData>...</transactionData>
</CommitTransaction>
Let's pretend that it does :) and we'll also assume that the XML for the TransactionData param is represented by a Source instance. Create a custom SOAPHandler that maintains a handle to that Source:
public class TransactionDataHandler implements SOAPHandler<SOAPMessageContext> {
private final Source transactionDataSource;
public TransactionDataHandler(Source transactionDataSource) {
this.transactionDataSource = transactionDataSource;
}
@Override
public boolean handleMessage(SOAPMessageContext context) {
// no exception handling
Boolean isOutbound = (Boolean)context.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY);
if (Boolean.TRUE.equals(isOutbound)) {
SOAPMessage message = context.getMessage();
SOAPBody body = message.getSOAPBody();
Node commitTransactionNode = body.getFirstChild();
Result commitTransactionResult = new DOMResult(commitTransactionNode);
TransformerFactory.newInstance().newTransformer().transform(this.transactionDataSource, commitTransactionResult);
}
return true;
}
@Override
public Set<QName> getHeaders() {
return null;
}
@Override
public boolean handleFault(SOAPMessageContext context) {
return true;
}
@Override
public void close(MessageContext context) {
// no-op
}
}
The idea is that the transform step should create the child <transactionData> node. You'll also need a custom HandlerResolver, perhaps something like:
public class TransactionDataHandlerResolver implements HandlerResolver {
private final Handler transactionDataHandler;
public TransactionDataHandlerResolver(Source transactionDataSource) {
this.transactionDataHandler = new TransactionDataHandler(transactionDataSource);
}
@Override
public List<Handler> getHandlerChain(PortInfo portInfo) {
return Collections.singletonList(this.transactionDataHandler);
}
}
Finally, create a Service instance and hook in the XML Source and HandlerResolver:
Source transactionDataSource;
URL wsdlDocumentLocation;
QName serviceName;
Service service = Service.create(wsdlDocumentLocation, serviceName);
service.setHandlerResolver(new TransactionDataHandlerResolver(transactionDataSource));
From here, you can get a Dispatch or port proxy and fire off the operation. This may not fit exactly into your existing code/environment, but hopefully it gives you some food for thought...
Edit: If you're using a port proxy, pass null for the second arg:
port.commitTransaction(requestOptions, null);
I am using JAXB to serialize my data to XML. The class code is simple as given below. I want to produce XML that contains CDATA blocks for the value of some Args. For example, current code produces this XML:
<command>
<args>
<arg name="test_id">1234</arg>
<arg name="source">&lt;html&gt;EMAIL&lt;/html&gt;</arg>
</args>
</command>
I want to wrap the "source" arg in CDATA such that it looks like below:
<command>
<args>
<arg name="test_id">1234</arg>
<arg name="source"><![CDATA[<html>EMAIL</html>]]></arg>
</args>
</command>
How can I achieve this in the below code?
@XmlRootElement(name="command")
public class Command {
@XmlElementWrapper(name="args")
protected List<Arg> arg;
}
@XmlRootElement(name="arg")
public class Arg {
@XmlAttribute
public String name;
@XmlValue
public String value;
public Arg() {};
static Arg make(final String name, final String value) {
Arg a = new Arg();
a.name=name; a.value=value;
return a; }
}
Note: I'm the EclipseLink JAXB (MOXy) lead and a member of the JAXB (JSR-222) expert group.
If you are using MOXy as your JAXB provider then you can leverage the @XmlCDATA extension:
package blog.cdata;
import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlCDATA;
@XmlRootElement(name="c")
public class Customer {
private String bio;
@XmlCDATA
public void setBio(String bio) {
this.bio = bio;
}
public String getBio() {
return bio;
}
}
For More Information
http://bdoughan.blogspot.com/2010/07/cdata-cdata-run-run-data-run.html
http://blog.bdoughan.com/2011/05/specifying-eclipselink-moxy-as-your.html
Use JAXB's Marshaller#marshal(ContentHandler) to marshal into a ContentHandler object. Simply override the characters method on the ContentHandler implementation you are using (e.g. JDOM's SAXHandler, Apache's XMLSerializer, etc):
public class CDataContentHandler extends (SAXHandler|XMLSerializer|Other...) {
// see http://www.w3.org/TR/xml/#syntax
private static final Pattern XML_CHARS = Pattern.compile("[<>&]");
public void characters(char[] ch, int start, int length) throws SAXException {
boolean useCData = XML_CHARS.matcher(new String(ch,start,length)).find();
if (useCData) super.startCDATA();
super.characters(ch, start, length);
if (useCData) super.endCDATA();
}
}
This is much better than using the XMLSerializer.setCDataElements(...) method because you don't have to hardcode any list of elements. It automatically outputs CDATA blocks only when one is required.
Solution Review:
fred's answer is just a workaround. It fails during validation when the Marshaller is linked to a Schema, because it only modifies the string literal and does not create real CDATA sections. If you merely rewrite the String from foo to <![CDATA[foo]]>, Xerces sees a string of length 15 instead of 3.
The MOXy solution is implementation specific and does not work with the classes of the JDK alone.
The getSerializer solution relies on the deprecated XMLSerializer class.
The LSSerializer solution is just a pain.
I modified the solution of a2ndrade by using a XMLStreamWriter implementation. This solution works very well.
XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLStreamWriter streamWriter = xof.createXMLStreamWriter( System.out );
CDataXMLStreamWriter cdataStreamWriter = new CDataXMLStreamWriter( streamWriter );
marshaller.marshal( jaxbElement, cdataStreamWriter );
cdataStreamWriter.flush();
cdataStreamWriter.close();
This is the CDataXMLStreamWriter implementation. The delegate class simply delegates all method calls to the given XMLStreamWriter implementation.
import java.util.regex.Pattern;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
/**
* Implementation which is able to decide to use a CDATA section for a string.
*/
public class CDataXMLStreamWriter extends DelegatingXMLStreamWriter
{
private static final Pattern XML_CHARS = Pattern.compile( "[&<>]" );
public CDataXMLStreamWriter( XMLStreamWriter del )
{
super( del );
}
@Override
public void writeCharacters( String text ) throws XMLStreamException
{
boolean useCData = XML_CHARS.matcher( text ).find();
if( useCData )
{
super.writeCData( text );
}
else
{
super.writeCharacters( text );
}
}
}
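The DelegatingXMLStreamWriter base class is not shown above. One way to avoid writing its pass-through methods by hand is a dynamic proxy, sketched below. This is an alternative technique and not the code from this answer: CDataWriterProxy, wrap and render are hypothetical names, and render exists only to demonstrate the result without a JAXB Marshaller.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CDataWriterProxy {

    private static final Pattern XML_CHARS = Pattern.compile("[&<>]");

    /** Wraps a writer so that writeCharacters(String) emits CDATA when escaping would be needed. */
    static XMLStreamWriter wrap(XMLStreamWriter delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean isStringWrite = method.getName().equals("writeCharacters")
                    && args != null && args.length == 1 && args[0] instanceof String;
            if (isStringWrite && XML_CHARS.matcher((String) args[0]).find()) {
                // caveat: text that itself contains "]]>" is not handled by this sketch
                delegate.writeCData((String) args[0]);
                return null;
            }
            try {
                return method.invoke(delegate, args); // every other call passes through
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the delegate's own exception
            }
        };
        return (XMLStreamWriter) Proxy.newProxyInstance(
                CDataWriterProxy.class.getClassLoader(),
                new Class<?>[] {XMLStreamWriter.class},
                handler);
    }

    /** Demo helper: writes <arg>text</arg> through the wrapped writer. */
    static String render(String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    wrap(XMLOutputFactory.newInstance().createXMLStreamWriter(out));
            writer.writeStartElement("arg");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```

With JAXB, the wrapped writer would be passed to marshaller.marshal(jaxbElement, writer) in place of the CDataXMLStreamWriter instance.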
Here is the code sample referenced by the site mentioned above:
import java.io.File;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
public class JaxbCDATASample {
public static void main(String[] args) throws Exception {
// unmarshal a doc
JAXBContext jc = JAXBContext.newInstance("...");
Unmarshaller u = jc.createUnmarshaller();
Object o = u.unmarshal(...);
// create a JAXB marshaller
Marshaller m = jc.createMarshaller();
// get an Apache XMLSerializer configured to generate CDATA
XMLSerializer serializer = getXMLSerializer();
// marshal using the Apache XMLSerializer
m.marshal(o, serializer.asContentHandler());
}
private static XMLSerializer getXMLSerializer() {
// configure an OutputFormat to handle CDATA
OutputFormat of = new OutputFormat();
// specify which of your elements you want to be handled as CDATA.
// The use of the '^' between the namespaceURI and the localname
// seems to be an implementation detail of the xerces code.
// When processing xml that doesn't use namespaces, simply omit the
// namespace prefix as shown in the third CDataElement below.
of.setCDataElements(
new String[] { "ns1^foo", // <ns1:foo>
"ns2^bar", // <ns2:bar>
"^baz" }); // <baz>
// set any other options you'd like
of.setPreserveSpace(true);
of.setIndenting(true);
// create the serializer
XMLSerializer serializer = new XMLSerializer(of);
serializer.setOutputByteStream(System.out);
return serializer;
}
}
For the same reasons as Michael Ernst I wasn't that happy with most of the answers here. I could not use his solution as my requirement was to put CDATA tags in a defined set of fields - as in raiglstorfer's OutputFormat solution.
My solution is to marshal to a DOM document, and then do a null XSL transform to do the output. Transformers allow you to set which elements are wrapped in CDATA tags.
Document document = ...
jaxbMarshaller.marshal(jaxbObject, document);
Transformer nullTransformer = TransformerFactory.newInstance().newTransformer();
nullTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
nullTransformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "myElement {myNamespace}myOtherElement");
nullTransformer.transform(new DOMSource(document), new StreamResult(writer/stream));
Further info here: http://javacoalface.blogspot.co.uk/2012/09/outputting-cdata-sections-with-jaxb.html
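To make the snippet above concrete, here is a minimal, self-contained sketch of the null-transform step using only JDK classes. It is not the linked post's code: the class name CDataTransformSketch, the helper names and the sample XML are made up for illustration, and the DOM is parsed from a string as a stand-in for the document that jaxbMarshaller.marshal(jaxbObject, document) would populate.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CDataTransformSketch {

    /** Serializes the DOM with an identity ("null") transform, wrapping the
     *  text of the whitespace-separated element names in CDATA sections. */
    static String serialize(Document document, String cdataElements) throws Exception {
        Transformer nullTransformer = TransformerFactory.newInstance().newTransformer();
        nullTransformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        nullTransformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElements);
        StringWriter out = new StringWriter();
        nullTransformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    /** Hypothetical stand-in for the DOM a JAXB marshaller would fill in. */
    static Document sampleDocument() throws Exception {
        String xml = "<root><myElement>a &lt; b &amp; c</myElement>"
                   + "<plain>x &lt; y</plain></root>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Only myElement is listed, so only its text is written as CDATA;
        // the text of plain is still escaped as usual.
        System.out.println(serialize(sampleDocument(), "myElement"));
    }
}
```

Elements that are not listed in CDATA_SECTION_ELEMENTS keep the normal entity escaping, which is what makes this approach suitable for a defined set of fields.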
The following simple method adds CDATA support to JAXB, which does not support CDATA natively:
1. Declare a custom simple type, CDataString, extending string, to identify the fields that should be handled via CDATA.
2. Create a custom CDataAdapter that parses and prints the content of a CDataString.
3. Use JAXB bindings to link CDataString to your CDataAdapter. The CDataAdapter will add the CDATA markers to CDataStrings at marshal time and remove them at unmarshal time.
4. Declare a custom character escape handler that does not escape characters when printing CDATA strings, and set it as the Marshaller's CharacterEscapeEncoder.
Et voilà, any CDataString element will be encapsulated with <![CDATA[ ... ]]> at marshal time. At unmarshal time, the <![CDATA[ ... ]]> will automatically be removed.
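The adapter and escape-handler steps above could look roughly like the sketch below. It is written as plain static methods so it runs without a JAXB implementation on the classpath; the class name CDataEscapeSketch and the wrap/unwrap helpers are assumptions for illustration. In a real setup the wrap/unwrap logic would sit in an XmlAdapter<String, String>, and escape (which has the same parameters as the JAXB RI's CharacterEscapeHandler.escape) would sit in an implementation of that interface. How the handler is registered depends on the JAXB implementation and is left out here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class CDataEscapeSketch {
    static final String CDATA_START = "<![CDATA[";
    static final String CDATA_END = "]]>";

    /** What the adapter's marshal step might do: add the CDATA markers.
     *  Caveat: a value that itself contains "]]>" would need to be split. */
    static String wrap(String value) {
        return CDATA_START + value + CDATA_END;
    }

    /** What the adapter's unmarshal step might do: strip the markers if present. */
    static String unwrap(String value) {
        if (value.startsWith(CDATA_START) && value.endsWith(CDATA_END)) {
            return value.substring(CDATA_START.length(), value.length() - CDATA_END.length());
        }
        return value;
    }

    /** Escape handler logic: pass CDATA-wrapped text through untouched,
     *  escape everything else as usual. */
    static void escape(char[] buf, int start, int len, boolean isAttVal, Writer out)
            throws IOException {
        String text = new String(buf, start, len);
        if (text.startsWith(CDATA_START) && text.endsWith(CDATA_END)) {
            out.write(text);
            return;
        }
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                case '"':
                    // Quotes only need escaping inside attribute values.
                    if (isAttVal) out.write("&quot;"); else out.write(c);
                    break;
                default: out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        char[] wrapped = wrap("a < b").toCharArray();
        escape(wrapped, 0, wrapped.length, false, out);
        System.out.println(out); // the CDATA-wrapped text is written unescaped
    }
}
```

Unlike a handler that never escapes anything, this keeps normal escaping for every field that is not a CDataString.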
Supplement to @a2ndrade's answer.
I found a class to extend in JDK 8. Note, however, that the class is in a com.sun package, so you may want to keep a copy of its code in case the class is removed in a future JDK.
import java.io.IOException;
import java.io.Writer;
import java.util.regex.Pattern;
import org.xml.sax.SAXException;
public class CDataContentHandler extends com.sun.xml.internal.txw2.output.XMLWriter {
public CDataContentHandler(Writer writer, String encoding) throws IOException {
super(writer, encoding);
}
// see http://www.w3.org/TR/xml/#syntax
private static final Pattern XML_CHARS = Pattern.compile("[<>&]");
public void characters(char[] ch, int start, int length) throws SAXException {
boolean useCData = XML_CHARS.matcher(new String(ch, start, length)).find();
if (useCData) {
super.startCDATA();
}
super.characters(ch, start, length);
if (useCData) {
super.endCDATA();
}
}
}
How to use:
JAXBContext jaxbContext = JAXBContext.newInstance(...class);
Marshaller marshaller = jaxbContext.createMarshaller();
StringWriter sw = new StringWriter();
CDataContentHandler cdataHandler = new CDataContentHandler(sw,"utf-8");
marshaller.marshal(gu, cdataHandler);
System.out.println(sw.toString());
Result example:
<?xml version="1.0" encoding="utf-8"?>
<genericUser>
<password><![CDATA[dskfj>><<]]></password>
<username>UNKNOWN::UNKNOWN</username>
<properties>
<prop2>v2</prop2>
<prop1><![CDATA[v1><]]></prop1>
</properties>
<timestamp/>
<uuid>cb8cbc487ee542ec83e934e7702b9d26</uuid>
</genericUser>
As of Xerces-J 2.9, XMLSerializer has been deprecated. The suggestion is to replace it with the DOM Level 3 LSSerializer or JAXP's Transformation API for XML. Has anyone tried this approach?
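For what it's worth, here is a minimal sketch of the LSSerializer route using only JDK classes. The class name LsSerializerCDataSketch and the element names are made up for illustration. As far as I know, LSSerializer has no per-element CDATA list like OutputFormat.setCDataElements; it writes a CDATA section only where the DOM already contains a CDATASection node, so a DOM produced by JAXB (which contains plain text nodes) would need those nodes converted first, or you would use the Transformer route instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class LsSerializerCDataSketch {

    /** Serializes the document with the DOM Level 3 Load/Save API. */
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the XML declaration to keep the output short.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    /** Builds a tiny document whose foo element holds a CDATASection node. */
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "root");
        doc.appendChild(root);
        Element foo = doc.createElementNS(null, "foo");
        foo.appendChild(doc.createCDATASection("a < b & c"));
        root.appendChild(foo);
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        System.out.println(serialize(sampleDocument()));
    }
}
```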
Just a word of warning: according to the documentation of javax.xml.transform.Transformer.setOutputProperty(...), you should use the qualified-name syntax when indicating an element from another namespace. According to the JavaDoc (Java 1.6 rt.jar):
"(...) For example, if a URI and local name were obtained from an element defined with <xyz:foo xmlns:xyz="http://xyz.foo.com/yada/baz.html"/>, then the qualified name would be "{http://xyz.foo.com/yada/baz.html}foo". Note that no prefix is used."
Well, this doesn't work. The implementing class in the Java 1.6 rt.jar, com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl, only interprets elements belonging to a different namespace correctly when they are declared as "http://xyz.foo.com/yada/baz.html:foo", because the implementation parses the value by looking for the last colon. So instead of invoking:
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "{http://xyz.foo.com/yada/baz.html}foo")
which should work according to JavaDoc, but ends up being parsed as "http" and "//xyz.foo.com/yada/baz.html", you must invoke
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "http://xyz.foo.com/yada/baz.html:foo")
At least in Java 1.6.
The following code prevents the marshaller from escaping CDATA content (the handler writes every character through unescaped):
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
StringWriter stringWriter = new StringWriter();
PrintWriter printWriter = new PrintWriter(stringWriter);
DataWriter dataWriter = new DataWriter(printWriter, "UTF-8", new CharacterEscapeHandler() {
@Override
public void escape(char[] buf, int start, int len, boolean b, Writer out) throws IOException {
out.write(buf, start, len);
}
});
marshaller.marshal(data, dataWriter);
System.out.println(stringWriter.toString());
It will also keep UTF-8 as your encoding.
I just need to read the value of a single attribute inside an XML file using Java. The XML would look something like this:
<behavior name="Fred" version="2.0" ....>
and I just need to read out the version. Can someone point me in the direction of a resource that would show me how to do this?
You don't need a fancy library -- plain old JAXP versions of DOM and XPath are pretty easy to read and write for this. Whatever you do, don't use a regular expression.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class GetVersion {
public static void main(String[] args) throws Exception {
XPath xpath = XPathFactory.newInstance().newXPath();
Document doc = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse("file:////tmp/whatever.xml");
String version = xpath.evaluate("//behavior/@version", doc);
System.out.println(version);
}
}
JAXB for brevity:
private static String readVersion(File file) {
@XmlRootElement class Behavior {
@XmlAttribute String version;
}
return JAXB.unmarshal(file, Behavior.class).version;
}
StAX for efficiency:
private static String readVersionEfficient(File file)
throws XMLStreamException, IOException {
XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlReader = inFactory
.createXMLStreamReader(new StreamSource(file));
try {
while (xmlReader.hasNext()) {
if (xmlReader.next() == XMLStreamConstants.START_ELEMENT) {
if (xmlReader.getLocalName().equals("behavior")) {
return xmlReader.getAttributeValue(null, "version");
} else {
throw new IOException("Invalid file");
}
}
}
throw new IOException("Invalid file");
} finally {
xmlReader.close();
}
}
Here's one.
import javax.xml.parsers.SAXParser;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.SAXException;
import org.xml.sax.Attributes;
import javax.xml.parsers.SAXParserFactory;
/**
* Here is sample of reading attributes of a given XML element.
*/
public class SampleOfReadingAttributes {
/**
* Application entry point
* @param args command-line arguments
*/
public static void main(String[] args) {
try {
// creates and returns new instance of SAX-implementation:
SAXParserFactory factory = SAXParserFactory.newInstance();
// create SAX-parser...
SAXParser parser = factory.newSAXParser();
// .. define our handler:
SaxHandler handler = new SaxHandler();
// and parse:
parser.parse("sample.xml", handler);
} catch (Exception ex) {
ex.printStackTrace(System.out);
}
}
/**
* Our own implementation of SAX handler reading
* a purchase-order data.
*/
private static final class SaxHandler extends DefaultHandler {
// we enter to element 'qName':
public void startElement(String uri, String localName,
String qName, Attributes attrs) throws SAXException {
if (qName.equals("behavior")) {
// get version
String version = attrs.getValue("version");
System.out.println("Version is " + version );
}
}
}
}
As mentioned you can use the SAXParser.
Digester mentioned using regular expressions, which I wouldn't recommend, as it leads to code that is difficult to maintain: what if you add another version attribute in another tag, or another behavior tag? You can handle it, but it won't be pretty.
You can also use XPath, which is a language for querying xml. That's what I would recommend.
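A small sketch of why XPath copes with that better than a regex: the expression below is anchored at the root behavior element, so a version attribute on some other tag is simply ignored. The class name XPathVersionSketch and the nested step element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathVersionSketch {

    /** Returns the version attribute of the root behavior element. */
    static String readVersion(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "/behavior/@version" matches only the root element's attribute,
        // not version attributes on nested elements.
        return xpath.evaluate("/behavior/@version", new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<behavior name=\"Fred\" version=\"2.0\">"
                   + "<step version=\"9.9\"/>"
                   + "</behavior>";
        System.out.println(readVersion(xml)); // prints 2.0
    }
}
```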
If all you need is to read the version, then you can use a regex. But really, I think you need Apache Digester.
Apache Commons Configuration is nice, too. Commons Digester is based on it.