Convert Arbitrary JSON with invalid XML characters to XML in Java

Background:
I'm calling web APIs that return JSON and passing the responses through
a data orchestration tool that needs them in XML format.
The orchestration tool allows custom Java procedures.
Problem:
JSON can contain keys that cause issues when converted to XML.
For example, a Twitter handle such as "#john": "somevalue" is fine as a key/value pair in JSON,
but when converted to XML, <#john>somevalue is not a well-formed element and causes the
orchestration tool to throw errors.
I'm hitting a wide variety of web APIs that change often. I need to
be able to convert arbitrary JSON to XML with little to no maintenance.
Research so far:
I've found several ways to convert JSON to XML in Java but many of them are for fixed input structures.
This StackOverflow post seems like what I want but I'm having issues getting it to work and tracking down all of the JARs required.
I've seen that some libraries will do basic character escaping for &, <, >, ' and ". Is there one that is more robust?

I ended up deserializing the JSON and traversing the data, using the following regex to find the nodes and then removing or replacing non-Latin characters.
Regex that grabs a JSON node:
"(.*?)":

Related

How to convert Badgerfish style JSON to XML in Java?

My Java application takes XML input, and parses it using the simpleframework. I want it to accept JSON as well, therefore I want to convert the JSON to XML.
Tags and attributes are important, therefore I use the Badgerfish convention.
This works well in Python with xmljson, but I can't find a decent Java package to do it. GSON doesn't seem to have a Badgerfish implementation. This topic doesn't provide any tag/attribute-retaining packages, and it is a bit old as well.
Which Java packages can do the conversion from JSON to XML while putting tags/attributes at the right place?
Suggestions for alternatives to Badgerfish are welcome as well...
Many thanks in advance!
You can use the XPath 3.1 json-to-xml() function, and then do an XSLT transformation on the generated XML to get it into the format you require.
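A rough illustration of invoking that function from Java via Saxon's s9api interface (assuming a Saxon release whose edition implements json-to-xml(); recent Saxon-HE versions do):

import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmValue;

public class JsonToXmlFn {
    public static void main(String[] args) throws Exception {
        Processor processor = new Processor(false); // false = Saxon-HE
        XPathCompiler xpath = processor.newXPathCompiler();

        String json = "{\"root\": {\"@id\": 1, \"$\": \"text\"}}";

        // json-to-xml() returns XML in the XPath 3.1 fn: vocabulary (fn:map, fn:string, ...)
        XdmValue result = xpath.evaluate("json-to-xml(.)", new XdmAtomicValue(json));
        System.out.println(result);
    }
}

json-to-xml() emits XML in the standard fn: vocabulary rather than Badgerfish, so a follow-up XSLT 3.0 stylesheet would reshape that output into the Badgerfish-style elements and attributes you need.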

CSV to XML using JAXB?

Background
I have a situation where I can get data either in the form of an XML file or Excel/CSV files. In case the data comes in a non-XML format, it will be divided into several different files/tables representing different subsections of the XML. The end goal is to validate the data and generate a valid XML file using an existing schema, regardless of the format of the input data.
When receiving an XML file, the idea is to unmarshal and validate it. For simple errors, automatic fixes will be applied, and in the end a new XML file will be marshalled from the JAXB classes.
Question
In order to generalize as much of the solution as possible, my idea was to generate a JAXB representation of the non-XML data too, and then generate the final XML file from those classes. I have been trying to find a good tutorial or introduction to converting non-XML data to a JAXB representation, but I haven't really been able to find anything useful, which makes me wonder: is this a really bad approach? Any better suggestions for how to solve this problem? In the majority of cases the files are likely to be non-XML, so I am willing to throw out the current approach if anyone has a better solution that uses some other technology.
I've worked before with univocity-parsers. They work well and are simple to use for converting CSV into Java objects, which you can then serialize using JAXB.
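A minimal sketch of that pipeline, assuming univocity-parsers and a JAXB implementation are on the classpath; the Person bean and its CSV column names are made up for illustration:

import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;

public class CsvToXmlDemo {

    // Hypothetical record type; real field names would come from your XML schema.
    @XmlRootElement(name = "person")
    public static class Person {
        @Parsed(field = "name")
        public String name;
        @Parsed(field = "age")
        public int age;
    }

    public static void main(String[] args) throws Exception {
        String csv = "name,age\nAlice,30\nBob,25";

        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);

        // univocity maps CSV rows onto the annotated bean...
        List<Person> people = new CsvRoutines(settings).parseAll(Person.class, new StringReader(csv));

        // ...and JAXB marshals each bean to XML
        Marshaller marshaller = JAXBContext.newInstance(Person.class).createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        for (Person p : people) {
            StringWriter out = new StringWriter();
            marshaller.marshal(p, out);
            System.out.println(out);
        }
    }
}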

JSON To JSON Transformation | Template Engine for Java

I have a requirement to transform an incoming JSON document into an output JSON document. For this I am looking for a solution that can work based on templates. What I have in mind is a solution along the lines of an XSLT transformation, which allows converting an XML document to a desired output format (XML, HTML, text) defined by the stylesheet.
One option (or rather a workaround) using XSLT is to convert JSON to XML, that is:
input JSON -> XML -> transform -> output JSON
This approach would have the performance overhead of converting JSON to XML, and this would become prominent as the size of the incoming object increases.
I found a Node/client-layer solution that transforms JSON based on the rules specified in a template. More details about it can be found [here][1]. However, I was not able to find any solution that works for a Java-based application.
Any thoughts/help in terms of solution/frameworks to resolve this would be really helpful.
Thanks.
You could try JOLT, advertised as a JSON to JSON transformation library written in Java.
Or you can search this thread for other libraries and tools which can transform JSON.
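A minimal sketch of what driving JOLT from Java looks like; the shift spec and field names below are illustrative only:

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

public class JoltDemo {
    public static void main(String[] args) {
        // A "shift" spec that renames input user.name to output username (hypothetical mapping)
        String spec = "[{\"operation\":\"shift\",\"spec\":{\"user\":{\"name\":\"username\"}}}]";
        String input = "{\"user\":{\"name\":\"Alice\"}}";

        Chainr chainr = Chainr.fromSpec(JsonUtils.jsonToList(spec));
        Object output = chainr.transform(JsonUtils.jsonToMap(input));

        System.out.println(JsonUtils.toJsonString(output)); // {"username":"Alice"}
    }
}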
The new XSLT 3.0 draft also includes support for JSON as an input and output format. Saxon has already started an implementation and appears to support the JSON part.
You could try JSLT, which is a transform language where you write the fixed part of the output in JSON syntax, then insert expressions to compute the values you want to insert in the template. It's quite similar to how XSLT and XPath work together.
It's implemented in Java on top of Jackson.
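A small example of the JSLT approach, assuming the Schibsted jslt library and Jackson are on the classpath; the template and field names are made up:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.schibsted.spt.data.jslt.Expression;
import com.schibsted.spt.data.jslt.Parser;

public class JsltDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode input = mapper.readTree("{\"user\":{\"name\":\"Alice\"}}");

        // The fixed part of the output is written as JSON; expressions pull values from the input
        Expression template = Parser.compileString("{\"username\": .user.name}");
        JsonNode output = template.apply(input);

        System.out.println(output); // {"username":"Alice"}
    }
}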
For simple transformations you can use the jmom library.
For a complex transformation you can use a template framework like FreeMarker,
and convert the JSON data to Map/List form using a JSON library so it can be consumed by the template framework.
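A hedged sketch of that FreeMarker route, using Jackson to turn the JSON into Maps first (assumes FreeMarker 2.3.31+ and Jackson; the template and field names are invented):

import com.fasterxml.jackson.databind.ObjectMapper;
import freemarker.template.Configuration;
import freemarker.template.Template;

import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;

public class JsonTemplateDemo {
    public static void main(String[] args) throws Exception {
        // Jackson turns the JSON into plain Maps/Lists that FreeMarker can read
        Map<String, Object> model = new ObjectMapper().readValue(
                "{\"user\":{\"name\":\"Alice\"}}", Map.class);

        // The template itself is the desired output JSON with placeholders
        String templateText = "{\"username\": \"${user.name}\"}";

        Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
        Template template = new Template("json", new StringReader(templateText), cfg);

        StringWriter out = new StringWriter();
        template.process(model, out);
        System.out.println(out); // {"username": "Alice"}
    }
}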

Unable to unmarshal strange XML format using Java and JAXB

I need to retrieve financial data using the Open Financial Exchange (OFX) protocol. In order to do this, I am using JAXB to marshal an object tree into an XML string that specifies data request parameters, and then I am sending this XML string to a bank's server. The bank then responds with an XML string containing the requested data, which I unmarshal into an object tree using JAXB. For the first couple of banks I tried, I received the data back in well-formed XML that conformed to the published OFX schema, and I was able to unmarshal it easily using JAXB.
However, when I requested data from Citigroup, they sent me back the following:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20150513180826.000
<LANGUAGE>ENG
<FI>
<ORG>Citigroup
<FID>24909
</FI>
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
Note that this is an abbreviated form of the actual output, but it is enough to illustrate the problem. The problem is that I cannot figure out how to use JAXB to unmarshal this content. It is not well-formed XML because (1) it doesn't have an XML header, (2) the custom processing instructions (the first nine lines above) are not enclosed in <?...?> tags, and (3) most importantly, the simpleTypes have only opening tags but no closing tags.
I have searched all over for an answer to this and found a similar XML-ish format in a couple of places, and one of those places indicated that this may even be a valid format for sending XML over the web. But I haven't found any information that can help me unmarshal it or parse it.
Does anyone have any suggestions? I am usually pretty resourceful when it comes to these types of problems (hence why this is my first question on here), but this one has me stumped. Thanks in advance for any help you can provide.
Your basic problem is that the input you show here is not XML, it's SGML (see DATA:OFXSGML). You will have to preprocess it to make it acceptable to an XML parser. The kind of preprocessing you have to do will be application specific, as there's no general mechanism to deal well with that. If you have the SGML DTD, you might be able to get a product such as omnimark to "mostly" fix it up.
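As a rough example of the kind of application-specific preprocessing meant here, and assuming (as in the sample above) that every unclosed element holds a single value on its own line, you could close the dangling tags yourself before handing the result to JAXB:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OfxSgmlToXml {

    // Matches lines like "<DTSERVER>20150513180826.000" that open a tag but never close it
    private static final Pattern LEAF = Pattern.compile("^\\s*<([A-Za-z0-9._]+)>([^<]+)$");

    public static String toWellFormedXml(String ofx) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n");
        boolean headerDone = false;
        for (String line : ofx.split("\\r?\\n")) {
            if (!headerDone) {
                // Skip the OFXHEADER:100 ... NEWFILEUID:NONE key/value header block
                if (line.trim().startsWith("<OFX>")) {
                    headerDone = true;
                } else {
                    continue;
                }
            }
            Matcher m = LEAF.matcher(line);
            if (m.matches()) {
                // Close the dangling leaf element: <CODE>0  ->  <CODE>0</CODE>
                xml.append('<').append(m.group(1)).append('>')
                   .append(m.group(2).trim())
                   .append("</").append(m.group(1)).append(">\n");
            } else {
                xml.append(line.trim()).append('\n');
            }
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        String raw = "OFXHEADER:100\nDATA:OFXSGML\n<OFX>\n<STATUS>\n<CODE>0\n<SEVERITY>INFO\n</STATUS>\n</OFX>";
        System.out.println(toWellFormedXml(raw));
    }
}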
Well, maybe you need to handle this bank's service in some other manner: for example, when you receive data from this bank, read the stream line by line, identify where the header ends and the <OFX> tag begins, and keep the rest of the stream. After that, the string that remains is the XML that you need, so pass it through your already implemented JAXB code.

Parsing Invalid XML Characters using XStream parser - Java [duplicate]

This question was closed as a duplicate of: How to parse invalid (bad / not well-formed) XML?
I have a classic XML validation question.
I need to parse incoming XML (from other applications, which don't use a proper XML formatter) where
there are broken tags and XML special characters embedded in the data (without a CDATA section wrapping them).
I am using the simple XStream parser to unmarshal the incoming stream, as it is a simple serialization library and not a strict parser. For special characters it throws a ConverterException and won't parse the file.
I want to know if there is any other parser which can be used to parse invalid XML files (special characters etc.).
We have no control over what is sent as the input stream, and as part of an auditing application we need to read as many good records from the incoming file as possible.
Is there a better parsing option available, or do I need to write a custom parser for these files?
I am using Spring Batch to do the batch processing and XStream (1.x) to parse the XML files.
As XSD validation is failing, I am wondering whether it is even worth exploring other parsers or the custom parser option.
Looking for your expert opinions on XML validation.
I understand that you are trying to make the best of messy input. Unfortunately, since there doesn't seem to be a clear specification of the format of that input, you are essentially on your own. One approach could be to first convert the input files to valid XML, which is basically what you would do by writing your own parser. In Java you could do this by reading and parsing the files with your own specialized code and exposing the result through a standard Java XML interface (SAX, DOM, etc.). But, depending on your knowledge, it may be faster to use a different language specialized in text parsing.
My experience is that the only real long-term solution here is to force the data suppliers to provide valid XML. The reason for this is that, although you can do your best in making valid data out of the invalid data, there is always the risk that your interpretation is wrong. And half-valid data is often worse than no data at all. IMHO it is best to leave the responsibility for correct data at the suppliers.
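As a minimal illustration of the "convert to valid XML first" idea described above: escape bare ampersands and strip characters that are illegal in XML 1.0 before the stream reaches XStream. The regexes are assumptions and will not repair structurally broken tags.

import java.util.regex.Pattern;

public class XmlPreCleaner {

    // Ampersands that do not start a known entity or character reference
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!amp;|lt;|gt;|quot;|apos;|#\\d+;|#x[0-9a-fA-F]+;)");

    // Characters outside the XML 1.0 legal character range
    // (note: this simplification also drops supplementary characters outside the BMP)
    private static final Pattern ILLEGAL_CHARS =
            Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD]");

    public static String clean(String rawXml) {
        String result = BARE_AMP.matcher(rawXml).replaceAll("&amp;");
        return ILLEGAL_CHARS.matcher(result).replaceAll("");
    }

    public static void main(String[] args) {
        String bad = "<record><name>Tom & Jerry\u0000</name></record>";
        System.out.println(clean(bad));
        // <record><name>Tom &amp; Jerry</name></record> -- now parseable by XStream or any XML parser
    }
}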
