Validate XML - remove invalid data - java

I want to validate an XML file against its schema. Once validation is complete, I want to remove any invalid data and save that invalid data to a new file. I can perform the validation; I am just stuck on removing the invalid data and saving it to a new file.

I take back everything I just wrote. ... :) You can get the node you need using the Current Element Node property at Exception time, it seems.
Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
Because the Schema is defined via Xerces, I think this will work. See http://xerces.apache.org/xerces2-j/properties.html#dom.current-element-node.
There is more explanation in the answer at "How can I get more information on an invalid DOM element through the Validator?".
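A minimal sketch of how the property could be used to collect and detach the offending elements. It assumes the JDK's built-in (Xerces-derived) validator and a DOMSource input, since the property is only meaningful when a DOM tree is being validated. The class name InvalidElementSplitter, the sample schema and the sample document are made up for illustration and are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InvalidElementSplitter {
    static final String CURRENT_ELEMENT =
            "http://apache.org/xml/properties/dom/current-element-node";

    // Hypothetical sample data, only here to make the sketch runnable.
    static final String SAMPLE_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='people'><xs:complexType><xs:sequence>"
          + "<xs:element name='person' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
          + "<xs:element name='name' type='xs:string'/>"
          + "<xs:element name='age' type='xs:int' minOccurs='0'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String SAMPLE_XML =
            "<people><person><name>Ann</name><age>30</age></person>"
          + "<person><name>Bob</name><age>abc</age></person></people>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // schema validation of a DOM needs namespace-aware nodes
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Validates doc, detaches every element that was current when an error fired, returns them. */
    static List<Element> removeInvalid(Document doc, String xsd) throws Exception {
        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();
        List<Element> invalid = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            @Override public void error(SAXParseException e) {
                try {
                    Object node = validator.getProperty(CURRENT_ELEMENT);
                    // One element can raise several errors, so record it only once.
                    if (node instanceof Element && !invalid.contains(node)) {
                        invalid.add((Element) node);
                    }
                } catch (Exception ignored) {
                    // The property is not supported by this Validator implementation.
                }
            }
        });
        validator.validate(new DOMSource(doc));
        // Modify the tree only after validation has finished walking it.
        for (Element e : invalid) {
            if (e.getParentNode() != null) {
                e.getParentNode().removeChild(e);
            }
        }
        return invalid;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE_XML);
        for (Element e : removeInvalid(doc, SAMPLE_XSD)) {
            System.out.println("removed <" + e.getTagName() + ">: " + e.getTextContent());
        }
    }
}
```

The returned elements can then be written to a separate file, for example with a javax.xml.transform.Transformer. Whether to remove the reported element itself or a whole enclosing record depends on the schema; this sketch removes only the element that was current when the error was reported.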

Related

How to parse xml with java

I'm calling a SOAP web service from my Java application.
I get the response and I want to parse it and extract the data.
The problem is that the field <tranData> contains a structure written with &lt; and &gt; instead of < and >. How can I parse this document to get the data from the field <tranData>?
This is the response structure:
<response>
<Portfolio>
<ID>1</ID>
<holder>2</holder>
</Portfolio>
<tranData> &lt;responseOne&gt;&lt;header&gt;&lt;code&gt;1&lt;/code&gt;&lt;/header&gt;&lt;/responseOne&gt;</tranData>
Please keep in mind that this is only an example of the response; the amount of data will be much bigger, so the solution should be fast.
What you show us is the actual document as it is received over the wire, right? So <tranData> contains an XML string that has been escaped to not interfere with the markup of the rest of the containing document.
When you read the content of the <tranData> element, the XML processor will 'unescape' the string and give you the 'original' value:
<responseOne><header><code>1</code></header></responseOne>
What you do with that value is a different story. You can parse it as yet another XML document and retrieve the value of the <code> element, or just pass the string along to some other processing step.
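A minimal sketch of the "parse it as yet another XML document" option, using the JDK's DOM parser. The class name TranDataParser and the sample string are illustrative; the sample assumes the payload arrives with &lt; and &gt; entities, as described above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TranDataParser {
    static Document parse(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the text of <code> from the escaped XML carried inside <tranData>. */
    static String extractCode(String responseXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document outer = parse(builder, responseXml);
        // getTextContent() hands back the already unescaped string.
        String inner = outer.getElementsByTagName("tranData").item(0).getTextContent().trim();
        // The unescaped string is itself an XML document, so parse it a second time.
        Document innerDoc = parse(builder, inner);
        return innerDoc.getElementsByTagName("code").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String response =
                "<response><Portfolio><ID>1</ID><holder>2</holder></Portfolio>"
              + "<tranData> &lt;responseOne&gt;&lt;header&gt;&lt;code&gt;1&lt;/code&gt;"
              + "&lt;/header&gt;&lt;/responseOne&gt;</tranData></response>";
        System.out.println(extractCode(response)); // prints "1"
    }
}
```

For very large responses a streaming parser may be a better fit than DOM; the two-step idea (read the text, then parse the text) stays the same.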

Cannot read the first line of a JSON file in Java

I am trying to read some data from a JSON file that I generated from a MongoDB document. But when trying to read the first entry in the document, I get an exception:
org.json.JSONException: JSONObject["Uhrzeit"] not found.
This only happens with the first entry; reading other entries does not cause an exception.
Using jsonObject.getString("") on any entry that is not the first returns the values as expected.
// Initiate MongoDB and declare the database and collection
MongoClient mongoClient = new MongoClient(new MongoClientURI("mongodb://localhost:27017"));
MongoDatabase feedbackDb = mongoClient.getDatabase("local");
MongoCollection<Document> feedback = feedbackDb.getCollection("RückmeldungenShort");
// Gets all documents in a collection. "new Document()" is the filter that returns all documents
FindIterable<Document> documents = feedback.find(new Document());
// Iterates over all documents and converts them to JSONObjects for further use
for (Document doc : documents) {
    JSONObject jsonObject = new JSONObject(doc.toJson());
    System.out.print(jsonObject.toString());
    System.out.print(jsonObject.getString("Uhrzeit"));
}
Printing jsonObject.toString() produces the JSON String for testing purposes (in one line):
{
"Ort":"Elsterwerda",
"Wetter-Keyword":"Anderes",
"Feedback\r":"Test Gelb\r",
"Betrag":"Gelb",
"Datum":"18.05.2018",
"Abweichung":"",
"Typ":"Vorhersage",
"_id":{
"$oid":"5b33453b75ef3c23f80fc416"
},
"Uhrzeit":"05:00"
}
Note that the order in which the entries appear is mixed up; the first one appearing in the database was "Uhrzeit".
The JSON file is valid according to https://jsonformatter.curiousconcept.com/.
"Uhrzeit" is even recognized within the JSONObject in debug mode.
I assumed it might have something to do with the entries themselves, so I moved "Datum" and "Ort" to the first place in the document, but that produced the same result.
There are lots of others that have posted on this error message, but it seems to me like they all had slightly different problems.
I imported a .csv with my data into MongoDB and read the documents from there. Somewhere in the process of reading the data, "\r" characters were generated where the line breaks were in my .csv (that is, at the end of each record). In this case they ended up in the key-value pair "Feedback" (visible as "Feedback\r" in the output above).
When checking my output again with another JSON validator, I noticed an "invisible" character in my JSON file that caused the key not to be found. This character sits in front of the first key (after the MongoDB _id) whenever a .csv document is imported into my DB. I imported a corrected version of the .csv into MongoDB and exported it again, and the character reappeared.
The problem was that my .csv was in "Windows" format. Converting it to "Unix" format gets rid of the generated "\r" characters. The "invisible" character was the UTF-8 byte order mark (BOM) that is added at the beginning of a document. Re-saving the .csv as plain UTF-8 without a BOM gets rid of it.
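To illustrate why the lookup failed, here is a small sketch in plain Java (no MongoDB or org.json involved). It assumes the first key was stored as U+FEFF followed by "Uhrzeit", which is what a BOM in front of the .csv header would produce. The class KeyCleaner and its methods are hypothetical helpers, meant as a defensive workaround rather than a replacement for fixing the .csv itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyCleaner {
    /** Strips a leading byte order mark (U+FEFF) and all carriage returns. */
    static String clean(String s) {
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1);
        }
        return s.replace("\r", "");
    }

    /** Returns a copy of the map with cleaned keys and values, keeping the order. */
    static Map<String, String> cleanKeys(Map<String, String> in) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            out.put(clean(e.getKey()), clean(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("\uFEFFUhrzeit", "05:00");      // the BOM is invisible when printed
        raw.put("Feedback\r", "Test Gelb\r");

        System.out.println(raw.containsKey("Uhrzeit"));      // prints "false"
        System.out.println(cleanKeys(raw).get("Uhrzeit"));   // prints "05:00"
    }
}
```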

Convert DFDL to XML

I'm trying to parse a web service response message in the following format (message tree):
Message
    Properties
        Properties..[]
    DFDL
        ObjectIWantUnmarshalled
            AllItsDataIwant[]
I want to unmarshal the "ObjectIWantUnmarshalled". However, this data is in DFDL format.
In my request, I use the following line to convert from XML to DFDL:
Document outDocument = outMessage.createDOMDocument(MbDFDL.PARSER_NAME);
But there doesn't seem to be a way to do the opposite, from DFDL to XML.
I have tried:
Document outDocument = inMessage.createDOMDocument(MbXMLNSC.PARSER_NAME);
As well as other attempts to simply unmarshal the data directly from the MbMessage:
jaxbContext_COBOL.createUnmarshaller().unmarshal(inMessage.getDOMDocument())
But I have not been able to get a Document node this way or any other way; it is always null.
Probably a lot too late, but you were going about this the wrong way.
When using WMB and IIB you should use the built-in XML support, not the javax.xml.* class library. So instead of using the JAXB unmarshaller, you should
- create an XMLNSC tree under the output message root
- copy the input DFDL message tree to the output XMLNSC message tree (one line)
...and the message flow will serialize (marshal) the tree as XML whenever it needs to: when it encounters an output node, or when you call outMessage.toBitstream().

Get an XML from a url using dom

I have stored the URL that returns the XML response in a String variable (link), and I use DOM to parse the XML data.
To be sure that I extract the data correctly, I first stored the XML on the local drive, built my parser and read the data:
document = builder.parse(new File(filepath));
Then, when I tried to get it from the URL, I used:
document = builder.parse(new URL(link).openStream());
And it didn't work. What am I missing?
The XML data is stored in a list, which is then shown in a JSF datatable.
Well, the above works just fine; the problem was the index of the elements in the NodeList. For some reason, when I was reading from the file I had to use:
obj.setattribute1(cDetails.item(1).getTextContent());
obj.setattribute2(cDetails.item(3).getTextContent());
Note that the item index increases by 2 each time.
Now that I read from a URL, the index increases by 1 each time.
I am sure there is a reason for this that I don't understand yet, probably because of my limited knowledge, but the above works once the index increases by 1 for each successive item in the NodeList.
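A likely explanation, assuming cDetails came from getChildNodes() (the question does not show this): a DOM parser keeps the whitespace between elements as text nodes, so an indented file yields text, element, text, element, while a response sent without line breaks yields only elements. The sketch below shows the difference and one way to become independent of it by filtering for element nodes. ChildIndexDemo and its sample documents are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildIndexDemo {
    /** Parses the XML and returns the root element. */
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns only the element children, skipping whitespace text nodes. */
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList all = parent.getChildNodes();
        for (int i = 0; i < all.getLength(); i++) {
            if (all.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) all.item(i));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element pretty = root("<c>\n  <a>1</a>\n  <b>2</b>\n</c>");
        Element compact = root("<c><a>1</a><b>2</b></c>");

        System.out.println(pretty.getChildNodes().getLength());   // prints "5"
        System.out.println(compact.getChildNodes().getLength());  // prints "2"
        System.out.println(elementChildren(pretty).size());       // prints "2"
    }
}
```

With the filtered list, item 0 is the first element and item 1 the second, regardless of how the source was formatted.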

SAXParseException when trying to print data in table

Caused by: org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'jr:table'. One of '{"http://jasperreports.sourceforge.net/jasperreports":component}' is expected.
Why does it occur?
You provide almost no context for your question, so your chances of getting a helpful answer are slim.
However, that looks like the sort of exception you would get when parsing an XML document that has a schema defined and the document does not match the schema.
UPDATE:
Based on your comment below, I see the document does list a schema, so my original guess still stands. Your XML document does not match the schema (e.g. there is a "table" element where the schema says there should be a "component" element).
