How NOT to deserialise a nested XML document with XStream - java

I have a document structure like this:
<MyDocument>
<MyChildDocument>
<SubElement>
...
</SubElement>
</MyChildDocument>
</MyDocument>
I would like XStream to de-serialise this to the following object:
@XStreamAlias("MyDocument")
public class MyDocument {
String myChildDocument;
public String getMyChildDocument() {
return myChildDocument;
}
public void setMyChildDocument(String str) {
myChildDocument = str;
}
}
The myChildDocument variable should contain the full child document as a string including the tags.
I also need the serialisation side of this to work, which means preventing XStream from entity-encoding the XML string held in the myChildDocument variable.
I've been looking at converters to do this for me, but have not found a good way to do it. Any ideas?

I managed to create a solution for this using a custom converter. In simple terms, when marshalling, feed the XML string for MyChildDocument into an XML reader and use a copier to feed this back out to the writer that is creating the marshalled result. Reverse the process when unmarshalling incoming XML!
public class MyExchangeConverter implements Converter {
// Note: a single shared parser instance is not safe for concurrent use.
protected static XmlPullParser pullParser;
protected static XmlPullParser getPullParser() {
if (pullParser == null) {
try {
pullParser = XmlPullParserFactory.newInstance().newPullParser();
}
catch (XmlPullParserException e) {
// Fail loudly rather than handing a null parser to XppReader.
throw new IllegalStateException("Could not create XmlPullParser", e);
}
}
return pullParser;
}
@Override
public boolean canConvert(@SuppressWarnings("rawtypes") Class type) {
return MyDocument.class.equals(type);
}
@Override
public void marshal(Object source, HierarchicalStreamWriter writer,
MarshallingContext context) {
MyDocument request = (MyDocument) source;
if (request.getMyChildDocument() != null) {
HierarchicalStreamReader reader;
reader = new XppReader(new StringReader(request.getMyChildDocument()), getPullParser());
HierarchicalStreamCopier copier = new HierarchicalStreamCopier();
copier.copy(reader, writer);
}
}
@Override
public Object unmarshal(HierarchicalStreamReader reader,
UnmarshallingContext context) {
MyDocument response = new MyDocument();
reader.moveDown();
Writer out = new StringWriter();
HierarchicalStreamWriter writer = new CompactWriter(out);
HierarchicalStreamCopier copier = new HierarchicalStreamCopier();
copier.copy(reader, writer);
response.setMyChildDocument(out.toString());
reader.moveUp();
return response;
}
}
Some would (rightly) argue this opens up the system to XML injection attacks to a degree. True enough, but for my particular use case, this is not a risk I am concerned about. Just something to be aware of if anybody plans to use this for public facing interfaces with unknown remote parties or the risk of man-in-the-middle attacks. You have been warned!
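For readers who only need the extraction half and are not tied to XStream, here is a minimal sketch of the same copy-the-subtree idea using only JDK classes (a StAX reader plus an identity Transformer). This is a swapped-in technique, not XStream API. The class name ChildAsString and the method firstChildAsXml are made up for illustration, and the sketch assumes the root's first child is an element with no text in front of it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class ChildAsString {

    /** Returns the first child element of the root, tags included, as a string. */
    public static String firstChildAsXml(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // move to the root start tag
            reader.nextTag(); // move to the first child start tag (assumed to exist)

            // An identity transform copies the subtree the reader is positioned on.
            Transformer copier = TransformerFactory.newInstance().newTransformer();
            copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            copier.transform(new StAXSource(reader), new StreamResult(out));
            return out.toString();
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException("Could not copy child element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MyDocument><MyChildDocument><SubElement>text</SubElement>"
                + "</MyChildDocument></MyDocument>";
        System.out.println(firstChildAsXml(xml));
    }
}
```

The returned string could then be stored in a plain String field such as myChildDocument. The same injection caveat applies: the captured markup is passed along as-is.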

Is there a possibility to store the single event information at a time in a JSONObject/JsonNode using the Jackson JsonParser

I am trying to read the events from a large JSON file one-by-one using the Jackson JsonParser. I would like to store each event temporarily in an Object something like JsonObject or any other object which I later want to use for some further processing.
Previously I read the JSON events one by one and stored them in my own custom context (Old Post for JACKSON JsonParser Context), which works fine. Now, rather than a context, I would like to store them one by one in a JsonObject or some other object.
Following is my sample JSON file:
{
"@context":"https://context.org/context.jsonld",
"isA":"SchoolManagement",
"format":"application/ld+json",
"schemaVersion":"2.0",
"creationDate":"2021-04-21T10:10:09+00:00",
"body":{
"members":[
{
"isA":"student",
"name":"ABCS",
"class":10,
"coaching":[
"XSJSJ",
"IIIRIRI"
],
"dob":"1995-04-21T10:10:09+00:00"
},
{
"isA":"teacher",
"name":"ABCS",
"department":"computer science",
"school":{
"name":"ABCD School"
},
"dob":"1995-04-21T10:10:09+00:00"
},
{
"isA":"boardMember",
"name":"ABCS",
"board":"schoolboard",
"dob":"1995-04-21T10:10:09+00:00"
}
]
}
}
I would like to store only one member at a time, such as a student or a teacher, in my JsonObject.
What is the best way to store each event in an object that I can process later, then clear that object and reuse it for the next event?
Following is the code I have so far:
public class Main {
private JSONObject eventInfo;
private final String[] eventTypes = new String[] { "student", "teacher", "boardMember" };
public static void main(String[] args) throws JsonParseException, JsonMappingException, IOException, JAXBException, URISyntaxException {
// Get the JSON Factory and parser Object
JsonFactory jsonFactory = new JsonFactory();
JsonParser jsonParser = jsonFactory.createParser(new File(Main.class.getClassLoader().getResource("inputJson.json").toURI()));
JsonToken current = jsonParser.nextToken();
// Check the first element is Object
if (current != JsonToken.START_OBJECT) {
throw new IllegalStateException("Expected content to be an object");
}
// Loop until the start of the EPCIS EventList array
while (jsonParser.nextToken() != JsonToken.START_ARRAY) {
System.out.println(jsonParser.getCurrentToken() + " --- " + jsonParser.getCurrentName());
}
// Goto the next token
jsonParser.nextToken();
// Call the method to loop until the end of the events file
eventTraverser(jsonParser);
}
// Method which will traverse through the eventList and read event one-by-one
private static void eventTraverser(JsonParser jsonParser) throws IOException {
// Loop until the end of the EPCIS events file
while (jsonParser.nextToken() != JsonToken.END_OBJECT) {
//Is there a possibility to store the complete object directly in an JSON Object or I need to again go through every token to see if is array and handle it accordingly as mentioned in my previous POST.
}
}
}
After trying a few things I was able to get it working. I am posting the whole code because it may be useful to someone in the future; I know how frustrating it is to hunt for a proper working code sample:
public class Main
{
public void xmlConverter (InputStream jsonStream) throws IOException,JAXBException, XMLStreamException
{
// jsonStream is the input JSON, normally obtained by reading the JSON file
// Get the JSON Factory and parser Object
final JsonFactory jsonFactory = new JsonFactory ();
final JsonParser jsonParser = jsonFactory.createParser (jsonStream);
final ObjectMapper objectMapper = new ObjectMapper ();
//To read the duplicate keys if there are any key duplicate json
final SimpleModule module = new SimpleModule ();
module.addDeserializer (JsonNode.class, new JsonNodeDupeFieldHandlingDeserializer ());
objectMapper.registerModule (module);
jsonParser.setCodec (objectMapper);
// Check the first element is an object; if not, the JSON is invalid, so throw an error
if (jsonParser.nextToken () != JsonToken.START_OBJECT)
{
throw new IllegalStateException ("Expected content to be an object");
}
// Advance token by token until the "members" field name is reached
while (jsonParser.nextToken () != null)
{
if (jsonParser.getCurrentToken () == JsonToken.FIELD_NAME
&& "members".equals (jsonParser.getCurrentName ()))
{
break;
}
// If you want, you can process the skipped tokens here
// I am skipping them for now
}
// Goto the next token
jsonParser.nextToken ();
while (jsonParser.nextToken () != JsonToken.END_ARRAY)
{
final JsonNode jsonNode = jsonParser.readValueAsTree ();
//Check if the JsonNode is valid if not then exit the process
if (jsonNode == null || jsonNode.isNull ())
{
System.out.println ("End Of File");
break;
}
// Get the eventType
final String eventType = jsonNode.get ("isA").asText ();
// Based on eventType call different type of class
switch (eventType)
{
case "student":
final Student studentInfo =
objectMapper.treeToValue (jsonNode, Student.class);
//I was calling the JAXB Method as I was doing the JSON to XML Conversion
xmlCreator (studentInfo, Student.class);
break;
case "teacher":
final Teacher teacherInfo =
objectMapper.treeToValue (jsonNode, Teacher.class);
xmlCreator (teacherInfo, Teacher.class);
break;
}
}
}
//Method to create the XML using the JAXB
private void xmlCreator (Object eventInfo,
Class eventType) throws JAXBException
{
final StringWriter sw = new StringWriter ();
// Create JAXB Context object
JAXBContext context = JAXBContext.newInstance (eventType);
// Create Marshaller object from JAXBContext
Marshaller marshaller = context.createMarshaller ();
// Print formatted XML
marshaller.setProperty (Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
// Do not add the <xml> version tag
marshaller.setProperty (Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
// XmlSupportExtension is an interface that every class such as Student Teacher implements
// xmlSupport is a method in XmlSupportExtension which has been implemented in all classes
// Create the XML based on type of incoming event type and store in SW
marshaller.marshal (((XmlSupportExtension) eventInfo).xmlSupport (),
sw);
// Add each event to eventsList (assumed to be a List<String> field declared elsewhere in this class)
eventsList.add (sw.toString ());
// Clear the StringWriter for the next event
sw.getBuffer ().setLength (0);
}
}
This is the class that overrides Jackson's JsonNodeDeserializer.
Use it if your JSON has duplicate keys; follow this post for the complete explanation if you need it. If you don't need it, skip this part and remove the module-registration code from the class above:
Jackson @JsonAnySetter ignores values of duplicate key when used with Jackson ObjectMapper treeToValue method
@JsonDeserialize(using = JsonNodeDupeFieldHandlingDeserializer.class)
public class JsonNodeDupeFieldHandlingDeserializer extends JsonNodeDeserializer {
@Override
protected void _handleDuplicateField(JsonParser p, DeserializationContext ctxt, JsonNodeFactory nodeFactory, String fieldName,
ObjectNode objectNode, JsonNode oldValue, JsonNode newValue) {
ArrayNode asArrayValue = null;
if (oldValue.isArray()) {
asArrayValue = (ArrayNode) oldValue;
} else {
asArrayValue = nodeFactory.arrayNode();
asArrayValue.add(oldValue);
}
asArrayValue.add(newValue);
objectNode.set(fieldName, asArrayValue);
}
}

Cannot find a matching 0-argument function named {NAME}METHOD() in built-in template rule

I registered an ExtensionFunctionDefinition that takes no parameters, but I cannot call it.
What is wrong, and how can this be fixed?
It looks like the function is not registered.
Here is the code:
Saxon
...<saxon.he.version>9.7.0-3</saxon.he.version>...
<groupId>net.sf.saxon</groupId>
<artifactId>Saxon-HE</artifactId>...
Exception
Error at char 29 in xsl:value-of/@select on line 23 column 71
XTDE1425: Cannot find a matching 0-argument function named {http://date.com}getFormattedNow()
in built-in template rule
XSLT
<xsl:stylesheet ...
xmlns:dateService="http://date.com"
exclude-result-prefixes="dateService" version="1.0">
...
<xsl:value-of select="dateService:getFormattedNow()"/>
ExtensionFunctionDefinition
public class DateExtensionFunction extends ExtensionFunctionDefinition {
public StructuredQName getFunctionQName() {
return new StructuredQName("", "http://date.com", "getFormattedNow");
}
public SequenceType[] getArgumentTypes() {
return new SequenceType[]{SequenceType.OPTIONAL_STRING};
}
public SequenceType getResultType(SequenceType[] sequenceTypes) {
return SequenceType.SINGLE_STRING;
}
public boolean trustResultType() {
return true;
}
public int getMinimumNumberOfArguments() {
return 0;
}
public int getMaximumNumberOfArguments() {
return 1;
}
public ExtensionFunctionCall makeCallExpression() {
return new ExtensionFunctionCall() {
public Sequence call(XPathContext context, Sequence[] arguments) throws XPathException {
return new StringValue("TEST");
}
};
}
}
Transformer
Processor processor = new Processor(false);
Configuration configuration = new Configuration();
TransformerFactoryImpl transformerFactory = new TransformerFactoryImpl();
processor.registerExtensionFunction(new DateExtensionFunction());
configuration.setProcessor(processor);
transformerFactory.setConfiguration(configuration);
//...newTransformer
The relationship between your Processor, Configuration, and TransformerFactory is wrong.
It's best to think of the Configuration as holding all the significant data, and the Processor and TransformerFactory as API veneers on top of the Configuration.
When you create a Processor, it creates its own Configuration underneath. Ditto for the TransformerFactoryImpl. So you have three Configuration objects here, the two that Saxon created, and the one that you created. The extension function is registered with the Configuration that underpins the (s9api) processor, which has no relationship with the one that you are using with the JAXP TransformerFactory.
I would recommend that you either use JAXP or s9api, but avoid mixing them. If you want to use JAXP, do:
TransformerFactoryImpl transformerFactory = new TransformerFactoryImpl();
Configuration config = transformerFactory.getConfiguration();
config.registerExtensionFunction(new DateExtensionFunction());
Note that from Saxon 9.7, the JAXP interface is implemented as a layer on top of the s9api interface.
Here is some code that works (tested under Saxon 9.7 HE). I don't know why yours doesn't: please put together a complete program that illustrates the problem.
import ....;
public class ExtensionTest extends TestCase {
public class DateExtensionFunction extends ExtensionFunctionDefinition {
public StructuredQName getFunctionQName() {
return new StructuredQName("", "http://date.com", "getFormattedNow");
}
public net.sf.saxon.value.SequenceType[] getArgumentTypes() {
return new net.sf.saxon.value.SequenceType[]{net.sf.saxon.value.SequenceType.OPTIONAL_STRING};
}
public net.sf.saxon.value.SequenceType getResultType(net.sf.saxon.value.SequenceType[] sequenceTypes) {
return net.sf.saxon.value.SequenceType.SINGLE_STRING;
}
public boolean trustResultType() {
return true;
}
public int getMinimumNumberOfArguments() {
return 0;
}
public int getMaximumNumberOfArguments() {
return 1;
}
public ExtensionFunctionCall makeCallExpression() {
return new ExtensionFunctionCall() {
public Sequence call(XPathContext context, Sequence[] arguments) throws XPathException {
return new StringValue("TEST");
}
};
}
}
public void testIntrinsicExtension() {
try {
TransformerFactoryImpl factory = new TransformerFactoryImpl();
factory.getConfiguration().registerExtensionFunction(new DateExtensionFunction());
String xsl = "<e xsl:version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' " +
"result='{Q{http://date.com}getFormattedNow()}'/>";
Templates t = factory.newTemplates(new StreamSource(new StringReader(xsl)));
StringWriter sw = new StringWriter();
t.newTransformer().transform(new StreamSource(new StringReader("<a/>")), new StreamResult(sw));
System.err.println(sw.toString());
} catch (TransformerConfigurationException tce) {
tce.printStackTrace();
fail();
} catch (TransformerException e) {
e.printStackTrace();
fail();
}
}
}
The output is:
<?xml version="1.0" encoding="UTF-8"?><e result="TEST"/>
Solution (works only for Saxon 9.4):
@Override
public ExtensionFunctionCall makeCallExpression() {
return new ExtensionFunctionCall() {
@Override
@SuppressWarnings("unchecked")
public SequenceIterator call(SequenceIterator[] arguments, XPathContext context) throws XPathException {
return SingletonIterator.makeIterator(StringValue.makeStringValue("TEST"));
}
};
}

How to set custom ValidationEventHandler on JAXB unmarshaller when using annotations

We're using JAX-WS in combination with JAXB to receive and parse XML web service calls. It's all annotation-based, i.e. we never get hold of the JAXBContext in our code. I need to set a custom ValidationEventHandler on the unmarshaller, so that if the date format for a particular field is not accepted, we can catch the error and report something nice back in the response. We have an XmlJavaTypeAdapter on the field in question, which does the parsing and throws an exception. I can't see how to set a ValidationEventHandler on the unmarshaller using the annotation-based configuration that we have. Any ideas?
Note: same question as this comment which is currently unanswered.
I struggled with this issue for the last week and finally managed to get a working solution. The trick is that JAXB looks for the methods beforeUnmarshal and afterUnmarshal in the object annotated with @XmlRootElement.
..
@XmlRootElement(name="MSEPObtenerPolizaFechaDTO")
@XmlAccessorType(XmlAccessType.FIELD)
public class MSEPObtenerPolizaFechaDTO implements Serializable {
..
public void beforeUnmarshal(Unmarshaller unmarshaller, Object parent) throws JAXBException, IOException, SAXException {
unmarshaller.setSchema(Utils.getSchemaFromContext(this.getClass()));
unmarshaller.setEventHandler(new CustomEventHandler());
}
public void afterUnmarshal(Unmarshaller unmarshaller, Object parent) throws JAXBException {
unmarshaller.setSchema(null);
unmarshaller.setEventHandler(null);
}
Using this ValidationEventHandler:
public class CustomEventHandler implements ValidationEventHandler{
@Override
public boolean handleEvent(ValidationEvent event) {
if (event.getSeverity() == ValidationEvent.ERROR ||
event.getSeverity() == ValidationEvent.FATAL_ERROR)
{
ValidationEventLocator locator = event.getLocator();
throw new RuntimeException(event.getMessage(), event.getLinkedException());
}
return true;
}
}
}
And this is the method getSchemaFromContext created in your Utility class:
@SuppressWarnings("unchecked")
public static Schema getSchemaFromContext(Class clazz) throws JAXBException, IOException, SAXException{
JAXBContext jc = JAXBContext.newInstance(clazz);
final List<ByteArrayOutputStream> outs = new ArrayList<ByteArrayOutputStream>();
jc.generateSchema(new SchemaOutputResolver(){
@Override
public Result createOutput(String namespaceUri,
String suggestedFileName) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
outs.add(out);
StreamResult streamResult = new StreamResult(out);
streamResult.setSystemId("");
return streamResult;
}
});
StreamSource[] sources = new StreamSource[outs.size()];
for (int i = 0; i < outs.size(); i++) {
ByteArrayOutputStream out = outs.get(i);
sources[i] = new StreamSource(new ByteArrayInputStream(out.toByteArray()), "");
}
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
return sf.newSchema(sources);
}

Jackson deserializing with custom deserializer causes a lot of GC calls and takes a lot longer

To solve my type mismatch problem discussed in this thread I created custom Deserializers and added them to ObjectMapper. However the performance deteriorates significantly with this.
With the default deserializer I get 1-2 garbage collection calls in logcat, while with the custom deserializer there are at least 7-8 GC calls, and the processing time increases significantly as a result.
My Deserializer :
public class Deserializer<T> {
public JsonDeserializer<T> getDeserializer(final Class<T> cls) {
return new JsonDeserializer<T> (){
@Override
public T deserialize(JsonParser jp, DeserializationContext arg1) throws IOException, JsonProcessingException {
JsonNode node = jp.readValueAsTree();
if (node.isObject()) {
return new ObjectMapper().convertValue(node, cls);
}
return null;
}
};
}
}
And I am using this to add to Mapper
public class DeserializerAttachedMapper<T> {
public ObjectMapper getMapperAttachedWith(final Class<T> cls , JsonDeserializer<T> deserializer) {
ObjectMapper mapper = new ObjectMapper();
SimpleModule module = new SimpleModule(deserializer.toString(), new Version(1, 0, 0, null, null, null));
module.addDeserializer(cls, deserializer);
mapper.registerModule(module);
return mapper;
}
}
EDIT: Added extra data
My JSON is of considerable size but not huge:
I have pasted it here
Now, if I parse the same JSON with this code:
String response = ConnectionManager.doGet(mAuthType, url, authToken);
FLog.d("location object response" + response);
// SimpleModule module = new SimpleModule("UserModule", new Version(1, 0, 0, null, null, null));
// JsonDeserializer<User> userDeserializer = new Deserializer<User>().getDeserializer(User.class);
// module.addDeserializer(User.class, userDeserializer);
ObjectMapper mapper = new ObjectMapper();
// mapper.registerModule(module);
JsonNode tree = mapper.readTree(response);
Integer code = Integer.parseInt(tree.get("code").asText().trim());
if(Constants.API_RESPONSE_SUCCESS_CODE == code) {
ExploreLocationObject locationObject = mapper.convertValue(tree.path("response").get("locationObject"), ExploreLocationObject.class);
FLog.d("locationObject" + locationObject);
FLog.d("locationObject events" + locationObject.getEvents().size());
return locationObject;
}
return null;
Then my logcat is like this
But if I use this code for same JSON
String response = ConnectionManager.doGet(mAuthType, url, authToken);
FLog.d("location object response" + response);
SimpleModule module = new SimpleModule("UserModule", new Version(1, 0, 0, null, null, null));
JsonDeserializer<User> userDeserializer = new Deserializer<User>().getDeserializer(User.class);
module.addDeserializer(User.class, userDeserializer);
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(module);
JsonNode tree = mapper.readTree(response);
Integer code = Integer.parseInt(tree.get("code").asText().trim());
if(Constants.API_RESPONSE_SUCCESS_CODE == code) {
ExploreLocationObject locationObject = mapper.convertValue(tree.path("response").get("locationObject"), ExploreLocationObject.class);
FLog.d("locationObject" + locationObject);
FLog.d("locationObject events" + locationObject.getEvents().size());
return locationObject;
}
return null;
Then my logcat is like this
How big is the object? Your code basically builds a tree model (a sort of DOM tree), and that will take something like 3x-5x as much memory as the original document. So I assume your input is a huge JSON document.
You can definitely write a more efficient version using Streaming API. Something like:
JsonParser jp = mapper.getJsonFactory().createJsonParser(input);
JsonToken t = jp.nextToken();
if (t == JsonToken.START_OBJECT) {
return mapper.readValue(jp, classToBindTo);
}
return null;
It is also possible to implement this with data-binding (as a JsonDeserializer), but it gets a bit complicated just because you want to delegate to the "default" deserializer.
To do this, you would need to implement BeanDeserializerModifier, and replace standard deserializer when "modifyDeserializer" is called: your own code can retain reference to the original deserializer and delegate to it, instead of using intermediate tree model.
If you are not tied to jackson you could also try Genson http://code.google.com/p/genson/.
In your case there are two main advantages: you will not lose performance, and it should be easier to implement. If the property event does not start with an uppercase letter, annotate it with @JsonProperty("Event") (and do the same for the other properties starting with an uppercase letter).
With the following code you should be done:
Genson genson = new Genson.Builder()
.withDeserializerFactory(new EventDeserializerFactory()).create();
YourRootClass[] bean = genson.deserialize(json, YourRootClass[].class);
class EventDeserializerFactory implements Factory<Deserializer<Event>> {
public Deserializer<Event> create(Type type, Genson genson) {
return new EventDeserializer(genson.getBeanDescriptorFactory().provide(Event.class,
genson));
}
}
class EventDeserializer implements Deserializer<Event> {
private final Deserializer<Event> standardEventDeserializer;
public EventDeserializer(Deserializer<Event> standardEventDeserializer) {
this.standardEventDeserializer = standardEventDeserializer;
}
public Event deserialize(ObjectReader reader, Context ctx) throws TransformationException,
IOException {
if (ValueType.ARRAY == reader.getValueType()) {
reader.beginArray().endArray();
return null;
}
return standardEventDeserializer.deserialize(reader, ctx);
}
}

Getting elements from failed XML

I have a big XML file to be validated against a big XSD. The client asked me to populate a table with different data values when there is a validation error. For example, if the Student ID is not valid, I will show the school district, region and student ID. In another section of the XML, if the state is not valid, I will show the school name, state and region. The data to show varies with the invalid data, but it is always two to four elements belonging to the parents of the invalid child element that need to be extracted.
How can I extract this data using XMLStreamReader and Validator?
I tried the following, but I can get only the invalid element, not the other data:
public class StaxReaderWithElementIdentification {
private static final StreamSource XSD = new StreamSource("files\\InterchangeEducationOrganizationExension.xsd");
private static final StreamSource XML = new StreamSource("files\\InterchangeEducationOrganizationExension.xml");
public static void main(String[] args) throws Exception {
SchemaFactory factory=SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(XSD);
XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(XML);
Validator validator = schema.newValidator();
validator.setErrorHandler(new MyErrorHandler(reader));
validator.validate(new StAXSource(reader));
}
}
and Handler is:
public class MyErrorHandler implements ErrorHandler {
private XMLStreamReader reader;
public MyErrorHandler(XMLStreamReader reader) {
this.reader = reader;
}
@Override
public void error(SAXParseException e) throws SAXException {
warning(e);
}
@Override
public void fatalError(SAXParseException e) throws SAXException {
warning(e);
}
@Override
public void warning(SAXParseException e) throws SAXException {
//System.out.println(reader.getProperty(name));
System.out.println(reader.getLocalName());
System.out.println(reader.getNamespaceURI());
e.printStackTrace(System.out);
}
}
Can anyone help me extract the other data when a validation error occurs?
I'm not sure it is the best solution, but you might try using HTML EditorKit and implement a custom ParserCallback.
In that manner you could parse the document and react only to tags you are interested in. It will chew any XML/HTML no matter how invalid it is.
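Staying with the XMLStreamReader and Validator from the question, another option might be to wrap the reader so it remembers the most recent text seen for each element name, and let the error handler take a snapshot of that when an error fires. Below is a minimal sketch under stated assumptions: the class names ContextTrackingValidation and TrackingReader, the tiny schema and the sample document are all made up for illustration, and it only captures elements that appear before the invalid one in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ContextTrackingValidation {

    // Hypothetical sample schema and document, for illustration only.
    public static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='school'><xs:complexType><xs:sequence>"
        + "<xs:element name='district' type='xs:string'/>"
        + "<xs:element name='region' type='xs:string'/>"
        + "<xs:element name='studentId' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    public static final String SAMPLE_XML =
        "<school><district>North</district><region>R7</region>"
        + "<studentId>abc</studentId></school>";

    /** Remembers the latest text seen per element name as the validator pulls events. */
    static class TrackingReader extends StreamReaderDelegate {
        final Map<String, String> lastSeen = new LinkedHashMap<>();
        private String current; // element whose direct text is being collected

        TrackingReader(XMLStreamReader reader) {
            super(reader);
        }

        @Override
        public int next() throws XMLStreamException {
            int event = super.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                current = getLocalName();
                lastSeen.put(current, ""); // reset any older value for this name
            } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                lastSeen.merge(current, getText(), String::concat);
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                current = null;
            }
            return event;
        }
    }

    /** Validates xml against xsd; each report line is the message plus the context seen so far. */
    public static List<String> validate(String xsd, String xml) {
        List<String> report = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            TrackingReader reader = new TrackingReader(
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { record(e); }
                public void error(SAXParseException e) { record(e); }
                public void fatalError(SAXParseException e) { record(e); }
                private void record(SAXParseException e) {
                    // Snapshot the tracked values at the moment the error is reported.
                    report.add(e.getMessage() + " | context=" + reader.lastSeen);
                }
            });
            validator.validate(new StAXSource(reader));
        } catch (Exception e) {
            throw new IllegalStateException("Validation could not be run", e);
        }
        return report;
    }

    public static void main(String[] args) {
        validate(SAMPLE_XSD, SAMPLE_XML).forEach(System.out::println);
    }
}
```

Because the handler does not rethrow, validation carries on and every error gets its own line. For a real document you would probably filter lastSeen down to the few element names the table needs, and key it by path rather than local name if the same name occurs at several levels.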
