How to validate an XML file against a given DTD file? - java

I have an XML file, which has a DTD reference in it, like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE something SYSTEM "something.dtd">
I'm using a DocumentBuilderFactory:
public static Document validateXMLFile(String xmlFilePath) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setValidating(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
builder.setErrorHandler(new ErrorHandler() {
@Override
public void error(SAXParseException exception) throws SAXException {
// do something more useful in each of these handlers
exception.printStackTrace();
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
exception.printStackTrace();
}
@Override
public void warning(SAXParseException exception) throws SAXException {
exception.printStackTrace();
}
});
Document doc = builder.parse(xmlFilePath);
return doc;
}
But now I want to validate the XML file against a DTD file on a user-defined location, and not relative to the path of the XML file.
How can I do that?
Example:
validateXMLFile("/path/to/the/xml_file.xml", "/path/to/the/dtd_file.dtd");

Use an EntityResolver:
final String dtd = "/path/to/the/dtd_file.dtd";
builder.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
if (systemId.endsWith("something.dtd")) {
return new InputSource(new FileInputStream(dtd));
}
return null;
}
});
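Note that this works only if the XML document contains a DOCTYPE declaration, because the resolver is only consulted for external entities that the document actually references.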
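Putting the question and the answer together, the two-argument method could look roughly like the sketch below. The class name DtdValidator is made up for illustration, and the resolver redirects every external entity request to the supplied DTD, which assumes the document references nothing else externally. Errors are rethrown rather than printed so that the caller notices an invalid document.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of the original post.
final class DtdValidator {

    static Document validateXMLFile(String xmlFilePath, String dtdFilePath)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Assumption: the DTD is the only external entity the document refers to,
        // so every request is answered with the user-supplied DTD file.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new File(dtdFilePath).toURI().toString()));

        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException exception) {
                // Warnings do not make the document invalid; ignored in this sketch.
            }

            @Override
            public void error(SAXParseException exception) throws SAXException {
                throw exception; // validation errors are reported here
            }

            @Override
            public void fatalError(SAXParseException exception) throws SAXException {
                throw exception; // well-formedness errors are reported here
            }
        });

        return builder.parse(new File(xmlFilePath));
    }
}
```

The document still needs its DOCTYPE line; only the location the DTD is loaded from changes.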

Related

Java Mockito - Unable to throw IOException during DocumentBuilder.parse

I am trying to create a JUnit test to fire IOException during DocumentBuilder.parse(InputSource.class).
I am not sure why my "doThrow" call is not firing the IOException.
The source code is as below:
JUnit class:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:META-INF/spring/test.xml" })
@Transactional
public class junitTestClass {
@InjectMocks
TargetClass target;
@Rule
public MockitoRule mockito = MockitoJUnit.rule();
@Mock
DocumentBuilder documentBuilder;
@Mock
DocumentBuilderFactory documentBuilderFactory;
@Mock
XPath xpath;
@Test
public void test01() throws InterruptedException, SAXException, IOException, ParserConfigurationException{
when(documentBuilderFactory.newDocumentBuilder()).thenReturn(documentBuilder);
doThrow(new IOException()).when(documentBuilder).parse(any(InputSource.class));
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><aaa><bbb>123</bbb></aaa>";
String pattern = "//aaa/bbb";
try {
target.parseXML(xml, pattern);
}catch(Exception e) {
e.printStackTrace();
}
}
}
Main Class:
private String parseXML(String xml, String pattern) {
String itemValue ;
try {
DocumentBuilderFactory dFac = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dFac.newDocumentBuilder();
Document document = db.parse(new InputSource(new StringReader(xml)));
XPath xPath = XPathFactory.newInstance().newXPath();
Node msgId = (Node) xPath.compile(pattern).evaluate(document, XPathConstants.NODE);
itemValue = msgId.getTextContent();
} catch (XPathExpressionException | SAXException | ParserConfigurationException | IOException e) {
e.printStackTrace();
}
return itemValue;
}
You should use:
doThrow(IOException.class)
Instead of instantiating it.

HTML Validation on back-end

I receive a response from an external service in HTML format and pass it directly to my front end. However, the external system sometimes returns broken HTML, which can break the page on my site. Hence, I want to check whether this HTML response is broken or valid. If it is valid I will pass it on; otherwise it will be ignored and an error will be logged.
How can I do this validation on the back end in Java?
Thank you.
I believe there is no such "generic" thing available in Java. But you can build your own validator on top of any open-source HTML parser.
I found the solution:
private static boolean isValidHtml(String htmlToValidate) throws ParserConfigurationException,
SAXException, IOException {
String docType = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
"\"https://www.w3.org/TR/xhtml11/DTD/xhtml11-flat.dtd\"> " +
"<html xmlns=\"http://www.w3.org/1999/xhtml\" " + "xml:lang=\"en\">\n";
try {
InputSource inputSource = new InputSource(new StringReader(docType + htmlToValidate));
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setValidating(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
builder.setErrorHandler(new ErrorHandler() {
@Override
public void error(SAXParseException exception) throws SAXException {
throw new SAXException(exception);
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
throw new SAXException(exception);
}
@Override
public void warning(SAXParseException exception) throws SAXException {
throw new SAXException(exception);
}
});
builder.parse(inputSource);
} catch (SAXException ex) {
//log.error(ex.getMessage(), ex); // validation message
return false;
}
return true;
}
This method can be used this way:
String htmlToValidate = "<head><title></title></head><body></body></html>";
boolean isValidHtml = isValidHtml(htmlToValidate);

Handle external Entities and Stylesheet in Sax Parser (XML)

I want to ignore external entities and external stylesheets (eg. <?xml-stylesheet type="text/xsl" href="......."?>).
I know I have to set XMLReader property to ignore external entities but I don't know how to ignore stylesheets...
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.XMLReader;
//...
final XMLReader parser = new SAXParser();
// Ignore entities
parser.setProperty("http://xml.org/sax/features/external-general-entities", false);
// Is this correct?
parser.setProperty("http://xml.org/sax/features/external-general-entities", false);
Are there more properties to set to avoid external entities and stylesheets?
How can I tell whether there are external entities or stylesheets?
This is working for me:
public class SaxParser extends DefaultHandler
implements ContentHandler, DTDHandler, EntityResolver{
public transient static final String STYLE_SHEET_TAG = "xml-stylesheet";
public transient static final String EXTERNAL_ENTITY = "ExternalEntity";
public static void main(String[] args) {
new SaxParser().execute();
}
public void execute() {
String pathFileXml = "test/XML.xml";
final XMLReader parser = new SAXParser();
parser.setContentHandler(this);
parser.setDTDHandler(this);
parser.setEntityResolver(this);
try {
parser.parse(pathFileXml);
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
if (SaxParser.STYLE_SHEET_TAG.equals(e.getMessage())
|| SaxParser.EXTERNAL_ENTITY.equals(e.getMessage())) {
System.out.println("CATCH ERRORE");
}
e.printStackTrace();
}
System.out.println("OK");
}
@Override
public void processingInstruction(String target, String data)
throws SAXException {
System.out.println("Processing Instruction");
System.out.println("PI=> target: " + target + ", data: " + data);
if (STYLE_SHEET_TAG.equalsIgnoreCase(target.trim())) {
throw new SAXException(STYLE_SHEET_TAG);
}
return;
}
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws IOException, SAXException {
System.out.println("publicId: " + publicId + ", systemId: " + systemId);
throw new SAXException(SaxParser.EXTERNAL_ENTITY);
}
}
The external stylesheet declaration is a standard processing instruction.
You can ignore processing instructions by not implementing the handler method:
void processingInstruction(java.lang.String target, java.lang.String data) {}
in your SAX handler.
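For the external-entity half of the question, here is a minimal sketch using the parser bundled with the JDK instead of a direct Xerces dependency. The identifiers under http://xml.org/sax/features/ are SAX features, so they are switched with setFeature rather than setProperty. The third URI is Xerces-specific and is assumed here to be recognised by the JDK's built-in parser. The class name SafeSaxDemo is invented for the example. The handler only records processing instructions, which shows that the parser reports the stylesheet declaration but never fetches it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original post.
final class SafeSaxDemo {

    /** Parses the XML with external entity loading switched off and
     *  returns the targets of all processing instructions it saw. */
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Standard SAX2 features: do not include external general/parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Xerces-specific feature (assumed to be supported): do not load an external DTD.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        SAXParser parser = factory.newSAXParser();
        List<String> seenTargets = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                // An xml-stylesheet declaration arrives here; recording it is optional,
                // and leaving this method out would ignore it.
                seenTargets.add(target);
            }
        });
        return seenTargets;
    }
}
```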

Conditional XML Parsing using Java

I am looking for a suitable parser for the XML below. I want to parse the full XML only when the 'employee' element has the attribute validated="False"; otherwise parsing should stop. How can we perform this kind of conditional XML parsing using SAX, StAX or any other parser?
<?xml version="1.0"?>
<database>
<employee validated="False">
<name>Lars </name>
<street validated="False"> Test </street>
<telephone number= "0123"/>
</employee>
<employee validated="True">
<name>Baegs </name>
<street validated="True"> Test </street>
<telephone number= "0123"/>
</employee>
</database>
I have tried the below SAX parser code
List<XmlObjects> xmlObjects;
String espXmlFileName;
String tmpValue;
XmlObjects xmlObjectsTmp;
public SaxParser(String espXmlFileName) {
this.espXmlFileName = espXmlFileName;
xmlObjects = new ArrayList<XmlObjects>();
parseDocument();
printDatas();
}
private void printDatas() {
for (XmlObjects tmpB : xmlObjects) {
System.out.println(tmpB.toString());
}
}
private void parseDocument() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(espXmlFileName, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfig error");
} catch (SAXException e) {
System.out.println("SAXException : xml not well formed");
} catch (IOException e) {
System.out.println("IO error");
}
}
public void startElement(String uri, String localName, String qName,
org.xml.sax.Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("employee")) {
String value1 = attributes.getValue("validated");
if (value1.equalsIgnoreCase("FALSE")) {
if (qName.equalsIgnoreCase("name")) {
String value2 = attributes.getValue("validated");
xmlObjectsTmp.setName(attributes
.getValue("name"));
}
}
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equalsIgnoreCase("employee")) {
xmlObjects.add(xmlObjectsTmp);
}
if (qName.equalsIgnoreCase("name")) {
xmlObjectsTmp.setName(tmpValue);
}
}
public static void main(String argv[]) {
new SaxParser("C:\\xml\\xml2.xml");
}
In the startElement method of your ContentHandler you can simply throw a SAXException to abort parsing when your validated attribute has the value True.
For example:
@Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes) throws SAXException {
// qName is used because localName may be empty when the parser is not namespace-aware
if(qName.equalsIgnoreCase("employee") || qName.equalsIgnoreCase("street")) {
final String validated = attributes.getValue("validated");
if(validated != null && !validated.equals("False")) {
throw new SAXException(qName + " has already been validated");
} else {
//your processing logic here
}
}
}
The exception propagates out of parse(), so you can catch it at the call site and deal with the aborted parse in your own way.
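A self-contained sketch of this abort-by-exception approach might look like the following. The names ConditionalSaxDemo, StopParsing and namesUntilValidated are invented for the example. It collects the name of each unvalidated employee and stops at the first employee whose validated attribute is not "False". A dedicated exception subclass is one possible way to tell a deliberate stop apart from a genuine parse error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original post.
final class ConditionalSaxDemo {

    /** Marker exception used only to stop the parser on purpose. */
    static final class StopParsing extends SAXException {
        StopParsing(String message) {
            super(message);
        }
    }

    static List<String> namesUntilValidated(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                text.setLength(0); // start collecting text for this element
                if (qName.equals("employee")) {
                    String validated = attributes.getValue("validated");
                    if (validated != null && !validated.equalsIgnoreCase("false")) {
                        throw new StopParsing("employee has already been validated");
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("name")) {
                    names.add(text.toString().trim());
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            // Deliberate abort: keep what was collected so far.
        }
        return names;
    }
}
```

Any other SAXException still reaches the caller, so malformed input is not silently swallowed.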

Xml not parsing String as input with sax

I have a string input from which I need to extract simple information, here is the sample xml (from mkyong):
<?xml version="1.0"?>
<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
How I parse it within my code (I have a field String name in my class) :
public String getNameFromXml(String xml) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean firstName = false;
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("firstname")) {
firstName = true;
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (firstName) {
name = new String(ch, start, length);
System.out.println("First name is : " + name);
firstName = false;
}
}
};
saxParser.parse(xml.toString(), handler);
} catch (Exception e) {
e.printStackTrace();
}
return name;
}
I'm getting a java.io.FileNotFoundException and I see that it's trying to find a file myprojectpath + the entireStringXML
What am I doing wrong?
Addon :
Here is my main method :
public static void main(String[] args) {
Text tst = new Text("<?xml version=\"1.0\"?><company> <staff> <firstname>yong</firstname> <lastname>mook kim</lastname> <nickname>mkyong</nickname> <salary>100000</salary> </staff> <staff> <firstname>low</firstname> <lastname>yin fong</lastname> <nickname>fong fong</nickname> <salary>200000</salary> </staff></company>");
NameFilter cc = new NameFilter();
String result = cc.getNameFromXml(tst);
System.out.println(result);
}
You should replace the line saxParser.parse(xml.toString(), handler); with the following one:
saxParser.parse(new InputSource(new StringReader(xml)), handler);
I'm going to highlight another issue, which you're likely to hit once you read your file correctly.
The method
public void characters(char ch[], int start, int length)
won't always give you the complete text element. It's at liberty to give you the text element (content) 'n' characters at a time. From the doc:
SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
So you should build up your text element string from each call to this method (e.g. using a StringBuilder) and only interpret/store that text once the corresponding endElement() method is called.
This may not impact you now. But it'll arise at some time in the future - likely when you least expect it. I've encountered it when moving from small to large XML documents, where buffering has been able to hold the whole small document, but not the larger one.
An example (in pseudo-code, with builder being a StringBuilder field):
public void startElement() {
builder.setLength(0); // reset the buffer for the new element
}
public void characters(char ch[], int start, int length) {
builder.append(ch, start, length);
}
public void endElement() {
// now do something with the collated text
builder.toString();
}
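A runnable version of that idea, applied to the XML from the question, could look like this sketch. The class and method names are invented. It parses the XML from a String through a StringReader and only reads the collected text in endElement.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original post.
final class FirstNameExtractor {

    /** Returns the text of the first <firstname> element, or null if there is none. */
    static String firstNameOf(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        String[] result = new String[1]; // one-element holder the anonymous class can write to

        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0); // forget text that belonged to earlier elements
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("firstname".equals(qName) && result[0] == null) {
                    result[0] = text.toString();
                }
            }
        });
        return result[0];
    }
}
```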
Maybe this helps. It uses javax.xml.parsers.DocumentBuilder, which is easier than SAX:
public Document getDomElement(String xml){
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
doc = db.parse(is);
} catch (ParserConfigurationException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (SAXException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (IOException e) {
Log.e("Error: ", e.getMessage());
return null;
}
// return DOM
return doc;
}
You can loop through the document using a NodeList and check each Node by its name.
You call parse with a String as the first parameter. According to the documentation, that string is interpreted as the URI of your file.
If you want to parse your String directly, you have to wrap it in an InputSource first, for use with the parse(InputSource is, DefaultHandler dh) method:
// transform from string to input stream
ByteArrayInputStream in = new ByteArrayInputStream(xml.toString().getBytes());
InputSource is = new InputSource();
is.setByteStream(in);
// start parsing
saxParser.parse(is, handler);
It seems you took this example from a tutorial. You need to pass a file (with an absolute path), not an XML string, to SAXParser.parse(). Look at the example closely. The parse() method it uses is defined as follows:
public void parse(File f,
DefaultHandler dh)
throws SAXException,
IOException
If you want to parse a string anyway, there is another overload that takes an InputStream:
public void parse(InputStream is,
DefaultHandler dh)
throws SAXException,
IOException
Then you need to convert your string to an InputStream.
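One possible way to do that conversion is sketched below. The class name and the element-counting handler are only there to make the example complete. It assumes the XML text either declares UTF-8 or declares no encoding at all, since the bytes are produced as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original post.
final class StringToStreamDemo {

    /** Parses XML held in a String via parse(InputStream, DefaultHandler)
     *  and returns how many elements were opened. */
    static int countElements(String xml) throws Exception {
        // Assumption: the document is UTF-8 (or declares no encoding).
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int[] count = new int[1];

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```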
