How to internationalize SAXParseException messages while parsing an XML file? - java

I've got a problem similar to this question: SAXParseException localized
I'm trying to parse an XML file and get the list of parser errors (SAXParseException) in several languages, for example:
XmlImporter.importFile(params, "en") should return the list of errors in English, XmlImporter.importFile(params, "fr") in French, and XmlImporter.importFile(params, "pl") in Polish.
Each call of XmlImporter.importFile(params, "...") may use a different locale.
This is my validation method:
private void validate(String xmlFilePath, String schemaFilePath) throws Exception {
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new File(schemaFilePath));
Validator validator = schema.newValidator();
XmlErrorHandler errorHandler = new XmlErrorHandler();
validator.setErrorHandler(errorHandler);
try (InputStream stream = new FileInputStream(new File(xmlFilePath))) {
validator.validate(new StreamSource(stream));
}
}
XmlErrorHandler:
public class XmlErrorHandler implements ErrorHandler {
private List<String> errorsList = new ArrayList<>();
public List<String> getErrorsList() {
return errorsList;
}
@Override
public void warning(SAXParseException exception) throws SAXException {
errorsList.add(prepareExceptionDescription(exception));
}
@Override
public void error(SAXParseException exception) throws SAXException {
errorsList.add(prepareExceptionDescription(exception));
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
errorsList.add(prepareExceptionDescription(exception));
}
private String prepareExceptionDescription(SAXParseException exception) {
return "Error: " +
"colNumber: " + exception.getColumnNumber() +
" line number: " + exception.getLineNumber() +
" message: " + exception.getLocalizedMessage();
}
}
I assume that I need to pass a java.util.Locale (or a language String) somewhere so that exception.getLocalizedMessage() returns the message in the requested language (en, fr or pl)?

By default, Xerces (the Java XML parser that performs the parsing and validation here) can provide localized messages for the following languages:
XMLSchemaMessages_de.properties XMLSchemaMessages_es.properties
XMLSchemaMessages_fr.properties XMLSchemaMessages_it.properties
XMLSchemaMessages_ja.properties XMLSchemaMessages_ko.properties
XMLSchemaMessages_pt_BR.properties XMLSchemaMessages_sv.properties
XMLSchemaMessages_zh_CN.properties XMLSchemaMessages_zh_TW.properties
To provide messages in another language:
Get the XMLSchemaMessages.properties file from Apache Xerces and copy it to a new file named XMLSchemaMessages_LANG.properties, where LANG is the code of the new language.
Translate the file's messages into the new language and place the file on the classpath (you can add it to src\main\resources\com\sun\org\apache\xerces\internal\impl\msg).
Exception messages will then appear in the new language (they are taken from the XMLSchemaMessages_LANG.properties file).
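The question also asks where a Locale can be passed per call. A minimal sketch follows, assuming the parser on the classpath honours the Apache Xerces property http://apache.org/xml/properties/locale on a Validator; this is not verified for every JDK or Xerces version, so the code falls back to the default locale if the property is rejected. The class name LocalizedValidationSketch and the in-memory XML/XSD strings are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;

public class LocalizedValidationSketch {

    // Property name documented by Apache Xerces. Assumption: the validator
    // implementation in use recognises it; otherwise it is simply skipped.
    static final String XERCES_LOCALE = "http://apache.org/xml/properties/locale";

    // Validates the XML text against the XSD text and returns the collected
    // error descriptions, requesting messages in the given locale.
    static List<String> validate(String xml, String xsd, Locale locale) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.setProperty(XERCES_LOCALE, locale);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Property not supported here: messages stay in the default locale.
        }
        final List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { errors.add(describe(e)); }
            @Override public void error(SAXParseException e) { errors.add(describe(e)); }
            @Override public void fatalError(SAXParseException e) { errors.add(describe(e)); }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getLocalizedMessage();
    }
}
```

Because the locale is set on the Validator instance rather than through Locale.setDefault, each call can ask for a different language. If no message bundle exists for the requested language (Polish, for instance, unless a translated properties file is added as described above), the messages presumably fall back to the default bundle.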

Related

How to generate custom tag names and namespaces in xml using apache camel

I'm trying to transform pipe-delimited string data to XML using Camel Bindy, but the generated tags include the class name. I would also like to add a namespace to my tags.
I tried to use a Camel processor to generate custom tags, but it's not working.
ConverterRoute.java
private static final String SOURCE_INPUT_PATH = "file://inbox?fileName=3000.txt";
private static final String SOURCE_OUTPUT_PATH = "file://outbox?fileName=itemfile.xml";
public void addRoutesToCamelContext(CamelContext context) throws Exception {
context.addRoutes(new RouteBuilder() {
public void configure() {
try {
DataFormat bindyFixed = new BindyCsvDataFormat(PartInboundIFD.class);
NameSpace nameSpace = new NameSpace("PART_INB_IFD","https://apache.org.com");
from(SOURCE_INPUT_PATH).
unmarshal(bindyFixed).
marshal().
xstream().
to(SOURCE_OUTPUT_PATH);
} catch (Exception e) {
e.printStackTrace();
}
}
});
}
}
Pojo.java
@CsvRecord(separator = "\\|",skipField = true)
public class Pojo {
@Link
private ControlSegment CONTROL_SEGMENT;
}
CamelComponent.java
public class CamelConfig extends RouteBuilder {
@Override
public void configure() throws Exception {
try {
CamelContext context = new DefaultCamelContext();
ConverterRoute route = new ConverterRoute();
route.addRoutesToCamelContext(context);
context.start();
Thread.sleep(5000);
context.stop();
} catch (Exception exe) {
exe.printStackTrace();
}
}
}
OUTPUT
Result.xml
<list>
<com.abc.domain.Pojo>
<CONTROL__SEGMENT/>
<TRNNAM>PART_TRAN</TRNNAM>
<TRNVER>9.0</TRNVER>
</com.abc.domain.Pojo>
</list>
Above is the output of the given transformation. The first tag is printed with the whole package and class name (e.g. com.abc.domain.Pojo). I'm also trying to generate a namespace, but it does not appear in my output.
Maybe you can add an additional XSLT route (https://camel.apache.org/components/latest/xslt-component.html).
Within the XSLT it's possible to transform the XML to your liking and add the correct namespaces (How can I add namespaces to the root element of my XML using XSLT?)
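To illustrate the XSLT idea outside of Camel, here is a minimal sketch using plain javax.xml.transform. The stylesheet renames the class-named element to PART_INB_IFD and moves every element into the namespace https://apache.org.com (both values are taken from the question's NameSpace line; whether they are the intended target names is an assumption). The class name RenameWithXsltSketch is hypothetical. In a Camel route the same stylesheet would presumably be applied through the XSLT component linked above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RenameWithXsltSketch {

    // Stylesheet: rename the XStream class-named element and put all
    // elements into the target namespace. Text nodes are copied by the
    // built-in XSLT template rules.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match=\"*[local-name()='com.abc.domain.Pojo']\">"
        + "<xsl:element name='PART_INB_IFD' namespace='https://apache.org.com'>"
        + "<xsl:apply-templates/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='*'>"
        + "<xsl:element name='{local-name()}' namespace='https://apache.org.com'>"
        + "<xsl:apply-templates/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the result.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The more specific template wins over the generic match='*' template because a pattern with a predicate has a higher default priority.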

Parsing Apache Tika XML Output returns Unknown Tag

Basically, I parse the XML output of Apache Tika for several files to get metadata (via meta tags) and the list of embedded files (via <div class="embedded" id="content">). However, I found that my map had several keys of the form Unknown tag (0x...). I wonder if this is caused by incomplete tag output from Tika, because the only errors I get relate to unclosed tags, which I suspect are in the body of the XML rather than in the parts I want (meta, div). That seems illogical, though, since the only code that writes into the map handles meta tags and divs (with the embedded class), which are only a small part of the document.
public class Parse {
private class internalXMLReader extends DefaultHandler{
public final Map<String, Object> entityList = new HashMap<>();
@Override
public void startElement(String uri, String localname, String qName, Attributes attributes) throws SAXException{
String key, content;
if(qName.equalsIgnoreCase("meta")){
key = attributes.getValue("name");
content = attributes.getValue("content");
if(key.contains("Content-Type")){
String tmp[] = attributes.getValue("content").replace(' ', '\0').split(";");
if(tmp.length > 1){
content = tmp[0];
}
}
entityList.put(key, content);
}
else if(qName.equalsIgnoreCase("div")){
if((attributes.getValue("class") != null) && (attributes.getValue("class").equalsIgnoreCase("embedded"))){
key = "embedded";
List<String> inlist;
if(entityList.containsKey("embedded") && (entityList.get("embedded") instanceof List)){
inlist = (List) entityList.get(key);
}
else{
inlist = new LinkedList<>();
entityList.put(key, inlist);
}
inlist.add(attributes.getValue("id"));
}
}
}
@Override
public void endElement(String uri, String localname, String qName) throws SAXException{
//no, i just did not want to validate or such..
}
@Override
public void characters(char ch[], int start, int length) throws SAXException{
//no, we don't actually read <something>this</something> yet
}
}
public Entity parse(String xml, Entity in){
try{
InputSource xmlinput = new InputSource(new StringReader(xml));
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
internalXMLReader handler = new internalXMLReader();
parser.parse(xmlinput, handler);
in.addMeta(handler.entityList);
}
catch(IOException | ParserConfigurationException | SAXException ex){
Logger.getLogger(TikaParseNCluste.class.getName()).log(Level.SEVERE, null, ex);
}
return in;
}
}
Perhaps I should take a look at my 800+ XML files.
Google and the Javadocs had no information on this, and I'm rather impatient. However, running grep -i -l -r "name=\"unknown" . showed that several JPG files had <meta name="Unknown..." contents="..."/>, which is probably the cause. I didn't expect Apache Tika to produce such output. So I changed my code to:
...
if(qName.equalsIgnoreCase("meta") && (attributes.getValue("name") != null)){
key = attributes.getValue("name");
if((key != null) && (!key.contains("Unknown"))){
content = attributes.getValue("content");
if(key.contains("Content-Type")){
String tmp[] = attributes.getValue("content").replace(' ', '\0').split(";");
if(tmp.length > 1){
content = tmp[0];
}
}
entityList.put(key, content);
}
}
...
I wonder if it's a bug or something else. So far, a quick Google search for apache tika unknown tag has only led me here.

How to extract useful information from TransformerException

I am using javax.xml.transform.* to do XSLT transformation. Since the XSLT file to be used comes from the outside world, it could contain errors, and I want to give a meaningful response back to the user.
Although I can easily catch the TransformerException, I found no way to obtain enough information from it. For example, if a tag is missing its end-tag, printStackTrace() gives only a scary message
javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(Unknown Source)
... (100 lines)
and getMessage() gives only
Could not compile stylesheet
None of them gives the real reason of the error.
I noticed that in Eclipse test console I can see the following
[Fatal Error] :259:155: The element type "sometag" must be terminated by the matching end-tag "</sometag>".
ERROR: 'The element type "sometag" must be terminated by the matching end-tag "</sometag>".'
FATAL ERROR: 'Could not compile stylesheet'
This is exactly what I want. Unfortunately, since this is a web application, the user cannot see this.
How can I display the correct error message to the user?
Put your own ErrorListener on your Transformer instance using Transformer.setErrorListener, like so:
final List<TransformerException> errors = new ArrayList<TransformerException>();
Transformer transformer = ... ;
transformer.setErrorListener(new ErrorListener() {
@Override
public void error(TransformerException exception) {
errors.add(exception);
}
@Override
public void fatalError(TransformerException exception) {
errors.add(exception);
}
@Override
public void warning(TransformerException exception) {
// handle warnings as well if you want them
}
});
// Any other transformer setup
Source xmlSource = ... ;
Result outputTarget = ... ;
try {
transformer.transform(xmlSource, outputTarget);
} catch (TransformerException e) {
errors.add(e); // Just in case one is thrown that isn't handled
}
if (!errors.isEmpty()) {
// Handle errors
} else {
// Handle output since there were no errors
}
This will log all the errors that occur into the errors list, then you can use the messages off those errors to get what you want. This has the added benefit that it will try to resume the transformation after the errors occur. If this causes any problems, just rethrow the exception by doing:
@Override
public void error(TransformerException exception) throws TransformerException {
errors.add(exception);
throw exception;
}
@Override
public void fatalError(TransformerException exception) throws TransformerException {
errors.add(exception);
throw exception;
}
Firstly, it's likely that any solution will depend on your choice of XSLT processor. Different implementations of the JAXP interface might well provide different information in the exceptions they generate.
It's possible that the error from the XML parser is available in a wrapped exception. For historic reasons, TransformerConfigurationException offers both getException() and getCause() to access wrapped exceptions, and it may be worth checking them both.
Alternatively it's possible that the information was supplied in a separate call to the ErrorListener.
Finally, this particular error is detected by the XML parser (not the XSLT processor) so in the first instance it will be handled by the parser. It may well be worth setting the parser's ErrorHandler and catching parsing errors at that level. If you want explicit control over the XML parser used by the transformation, use a SAXSource whose XMLReader is suitably initialized.
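The points above can be combined in a small sketch. In the question, the stack trace shows the failure inside TransformerFactoryImpl.newTransformer, i.e. while the stylesheet is compiled and before any Transformer exists, so this sketch registers the ErrorListener on the TransformerFactory and additionally walks the cause chain of the thrown exception. Which messages actually reach the listener depends on the XSLT processor; the class name StylesheetErrorsSketch and the message prefixes are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class StylesheetErrorsSketch {

    // Tries to compile the stylesheet text and returns every message that
    // could be collected, from the listener and from the exception chain.
    static List<String> compileErrors(String xsl) {
        final List<String> messages = new ArrayList<>();
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile-time problems are reported to the factory's listener.
        factory.setErrorListener(new ErrorListener() {
            @Override public void warning(TransformerException e) {
                messages.add("warning: " + e.getMessageAndLocation());
            }
            @Override public void error(TransformerException e) {
                messages.add("error: " + e.getMessageAndLocation());
            }
            @Override public void fatalError(TransformerException e) {
                messages.add("fatal: " + e.getMessageAndLocation());
            }
        });
        try {
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            // The parser's original error may be wrapped; walk the chain.
            for (Throwable t = e; t != null; t = t.getCause()) {
                messages.add("exception: " + t.getMessage());
            }
        }
        return messages;
    }
}
```

The returned list can then be shown to the user of the web application instead of relying on console output.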
You can redirect System.out to your own OutputStream.
An ErrorListener alone doesn't catch all of the output.
If you work with threads, you can look here (http://maiaco.com/articles/java/threadOut.php) to avoid changing System.out for other threads.
Example:
public final class XslUtilities {
private XslUtilities() {
// only static methods
}
public static class ConvertWithXslException extends Exception {
public ConvertWithXslException(String message, Throwable cause) {
super(message, cause);
}
}
public static String convertWithXsl(String input, String xsl) throws ConvertWithXslException {
ByteArrayOutputStream systemOutByteArrayOutputStream = new ByteArrayOutputStream();
PrintStream oldSystemOutPrintStream = System.out;
System.setOut(new PrintStream(systemOutByteArrayOutputStream));
ByteArrayOutputStream systemErrByteArrayOutputStream = new ByteArrayOutputStream();
PrintStream oldSystemErrPrintStream = System.err;
System.setErr(new PrintStream(systemErrByteArrayOutputStream));
String resultXml;
try {
System.setProperty("javax.xml.transform.TransformerFactory", "net.sf.saxon.TransformerFactoryImpl");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new StringReader(xsl)));
StringWriter stringWriter = new StringWriter();
transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(stringWriter));
resultXml = stringWriter.toString();
} catch (TransformerException e) {
System.out.flush();
final String systemOut = systemOutByteArrayOutputStream.toString();
System.err.flush();
final String systemErr = systemErrByteArrayOutputStream.toString();
throw new ConvertWithXslException("TransformerException - " + e.getMessageAndLocation()
+ (systemOut.length() > 0 ? ("\nSystem.out:" + systemOut) : "")
+ (systemErr.length() > 0 ? ("\nSystem.err:" + systemErr) : ""), e);
} finally {
System.setOut(oldSystemOutPrintStream);
System.setErr(oldSystemErrPrintStream);
}
return resultXml;
}
}

Getting elements from failed XML

I have a big XML file to be validated against a big XSD. The client asked me to populate a table with different data values when there is a validation error. For example, if a Student ID is not valid, I will show the school district, region and student ID. In another section of the XML, if the state is not valid, I will show the school name, state and region. The data to show varies with the invalid data, but it is always two, three or four elements that are parents of the invalid child element.
How can I extract this data using XMLStreamReader and Validator?
I tried this, and I can get only the invalid element, not the other data...
public class StaxReaderWithElementIdentification {
private static final StreamSource XSD = new StreamSource("files\\InterchangeEducationOrganizationExension.xsd");
private static final StreamSource XML = new StreamSource("files\\InterchangeEducationOrganizationExension.xml");
public static void main(String[] args) throws Exception {
SchemaFactory factory=SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(XSD);
XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(XML);
Validator validator = schema.newValidator();
validator.setErrorHandler(new MyErrorHandler(reader));
validator.validate(new StAXSource(reader));
}
}
and Handler is:
public class MyErrorHandler implements ErrorHandler {
private XMLStreamReader reader;
public MyErrorHandler(XMLStreamReader reader) {
this.reader = reader;
}
@Override
public void error(SAXParseException e) throws SAXException {
warning(e);
}
@Override
public void fatalError(SAXParseException e) throws SAXException {
warning(e);
}
@Override
public void warning(SAXParseException e) throws SAXException {
//System.out.println(reader.getProperty(name));
System.out.println(reader.getLocalName());
System.out.println(reader.getNamespaceURI());
e.printStackTrace(System.out);
}
}
Can anyone help me extract the other data when a validation error occurs?
I'm not sure it is the best solution, but you might try using HTML EditorKit and implement a custom ParserCallback.
In that manner you could parse the document and react only to tags you are interested in. It will chew any XML/HTML no matter how invalid it is.
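A different approach from the EditorKit suggestion, sketched here only as an assumption about what might fit the question: stay with schema validation, but feed the document through a SAX filter that remembers the path of open elements, and validate with a ValidatorHandler instead of a Validator. When the error handler fires, the path of ancestors is available. The class names ValidationContextSketch and PathTracker are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.XMLFilterImpl;

public class ValidationContextSketch {

    // SAX filter that records the currently open elements, root first.
    static class PathTracker extends XMLFilterImpl {
        final Deque<String> path = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            path.addLast(qName);
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // Forward first, so errors raised at the end tag still see this element.
            super.endElement(uri, localName, qName);
            path.removeLast();
        }
    }

    // Validates the XML text and prefixes each error with the element path.
    static List<String> validate(String xml, String xsd) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true); // required by ValidatorHandler
        final PathTracker tracker = new PathTracker();
        tracker.setParent(parserFactory.newSAXParser().getXMLReader());

        final List<String> errors = new ArrayList<>();
        ValidatorHandler validatorHandler = schema.newValidatorHandler();
        validatorHandler.setErrorHandler(new ErrorHandler() {
            private void record(SAXParseException e) {
                errors.add("/" + String.join("/", tracker.path) + ": " + e.getMessage());
            }
            @Override public void warning(SAXParseException e) { record(e); }
            @Override public void error(SAXParseException e) { record(e); }
            @Override public void fatalError(SAXParseException e) { record(e); }
        });
        tracker.setContentHandler(validatorHandler);
        tracker.parse(new InputSource(new StringReader(xml)));
        return errors;
    }
}
```

To report values such as the school district or region, the tracker could also store the text of selected elements as they are closed; that part is left out of the sketch.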

Why am I getting "MalformedURLException: no protocol" when using SAXParser?

I'm copying code from one part of our application (an applet) to inside the app. I'm parsing XML as a String. It's been awhile since I parsed XML, but from the error that's thrown it looks like it might have to do with not finding the .dtd. The stack trace makes it difficult to find the exact cause of the error, but here's the message:
java.net.MalformedURLException: no protocol: http://www.mycomp.com/MyComp.dtd
and the XML has this as the first couple lines:
<?xml version='1.0'?>
<!DOCTYPE MYTHING SYSTEM 'http://www.mycomp.com/MyComp.dtd'>
and here are the relevant code snippets
class XMLImportParser extends DefaultHandler {
private SAXParser m_SaxParser = null;
private String is_InputString = "";
XMLImportParser(String xmlStr) throws SAXException, IOException {
super();
is_InputString = xmlStr;
createParser();
try {
preparseString();
parseString(is_InputString);
} catch (Exception e) {
throw new SAXException(e); //"Import Error : "+e.getMessage());
}
}
void createParser() throws SAXException {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
try {
factory.setFeature("http://xml.org/sax/features/namespaces", true);
factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
m_SaxParser = factory.newSAXParser();
m_SaxParser.getXMLReader().setFeature("http://xml.org/sax/features/namespaces", true);
m_SaxParser.getXMLReader().setFeature("http://xml.org/sax/features/namespace-prefixes", true);
} catch (SAXNotRecognizedException snre){
throw new SAXException("Failed to create XML parser");
} catch (SAXNotSupportedException snse) {
throw new SAXException("Failed to create XML parser");
} catch (Exception ex) {
throw new SAXException(ex);
}
}
void preparseString() throws SAXException {
try {
InputSource lSource = new InputSource(new StringReader(is_InputString));
lSource.setEncoding("UTF-8");
m_SaxParser.parse(lSource, this);
} catch (Exception ex) {
throw new SAXException(ex);
}
}
}
It looks like the error is happening in the preparseString() method, on the line that actually does the parsing, the m_SaxParser.parse(lSource, this); line.
FYI, the 'MyComp.dtd' file does exist at that location and is accessible via http. The XML file comes from a different service on the server, so I can't change it to a file:// format and put the .dtd file on the classpath.
I think you have some extra code in the XML declaration. Try this:
<?xml version='1.0'?>
<!DOCTYPE MYTHING SYSTEM "http://www.mycomp.com/MyComp.dtd">
The above was captured from the W3C Recommendations: http://www.w3.org/QA/2002/04/valid-dtd-list.html
You can use the http link to set the Schema on the SAXParserFactory before creating your parser.
void createParser() throws SAXException {
Schema schema = SchemaFactory.newSchema(new URL("http://www.mycomp.com/MyComp.dtd"));
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setSchema(schema);
The problem is that this:
http://www.mycomp.com/MyComp.dtd
is an HTML hyperlink, not a URL. Replace it with this:
http://www.mycomp.com/MyComp.dtd
Since this XML comes from an external source, the first thing to do would be to complain to them that they are sending invalid XML.
As a workaround, you can set an EntityResolver on your parser that compares the SystemId to this invalid url and returns a correct http url:
m_SaxParser.getXMLReader().setEntityResolver(
new EntityResolver() {
public InputSource resolveEntity(final String publicId, final String systemId) throws SAXException {
if ("http://www.mycomp.com/MyComp.dtd".equals(systemId)) {
return new InputSource("http://www.mycomp.com/MyComp.dtd");
} else {
return null;
}
}
}
);
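A self-contained variation of the EntityResolver workaround, as a sketch under assumptions: instead of redirecting to another URL, the resolver serves a copy of the DTD from memory, so the parser never opens a network connection for it. The DTD text, the class name DtdResolverSketch and the one-element document are made up for the illustration; only the system identifier is taken from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdResolverSketch {

    // System identifier as it appears in the DOCTYPE of the question.
    static final String DTD_SYSTEM_ID = "http://www.mycomp.com/MyComp.dtd";

    // Hypothetical stand-in for the real DTD content.
    static final String LOCAL_DTD = "<!ELEMENT MYTHING (#PCDATA)>";

    // Parses the XML text and returns the name of its root element.
    static String parseRoot(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Serve the DTD from memory; returning null keeps the default
        // behaviour for any other external entity.
        reader.setEntityResolver((publicId, systemId) ->
                DTD_SYSTEM_ID.equals(systemId)
                        ? new InputSource(new StringReader(LOCAL_DTD))
                        : null);

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```

The resolver must be set on the XMLReader before parse is called, which matches where the answer above installs it.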
