Unable to parse XML file using DocumentBuilder - java

I have this code:
if (file.exists()) {
Document doc = builder.parse(file);
NodeList list = doc.getElementsByTagName("property");
System.out.println("XML Elements: ");
for (int ii = 0; ii < list.getLength(); ii++) {
Line 2 (the builder.parse(file) call) gives the following exception:
E:\workspace\test\testDomain\src\com\test\ins\nxg\maps\Right.hbm.xml
...***java.net.SocketException: Operation timed out: connect:could be due to invalid address
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:372)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:233)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:220)

Perhaps the DocumentBuilder is trying, unsuccessfully, to fetch a DTD for your XML document over a network socket?
If there are DTD references in the XML document, try editing them out to prove the cause.
If that fixes your problem, I think you can use an EntityResolver for a more permanent solution, but I've not done it myself.
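For what it's worth, the EntityResolver idea could look roughly like the sketch below. It assumes you do not need anything the DTD defines (default attribute values, named entities), because it answers every external entity request with an empty document. The class name, method name, and the unreachable.invalid URL are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SkipDtdExample {
    /** Parses xml without fetching any external DTD or entity it references. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request with an empty document,
        // so the parser never opens a network connection.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document whose DTD lives on a host that cannot be reached.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE mapping PUBLIC \"-//Example//DTD Example//EN\""
                + " \"http://unreachable.invalid/example.dtd\">"
                + "<mapping><property name=\"id\"/></mapping>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println("XML Elements: "
                + doc.getElementsByTagName("property").getLength());
    }
}
```

If the document relies on entities declared in the DTD, an empty replacement will make the parse fail, and a resolver that returns a local copy of the DTD is the safer route.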

The answer by Brabster is very helpful to me. In my case I have an XML document starting with
<?xml version="1.0"?> <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd"> ... more to come
This caused a problem for DocumentBuilder: I got a timeout. The real culprit is the content at that URL, http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd, which pulls in further modules:
<!-- ============================================
::DATATOOL:: Generated from "gbseq.asn"
::DATATOOL:: by application DATATOOL version 1.5.0
::DATATOOL:: on 06/06/2006 23:03:48
============================================ -->
<!-- NCBI_GBSeq.dtd
This file is built from a series of basic modules.
The actual ELEMENT and ENTITY declarations are in the modules.
This file is used to put them together.
-->
<!ENTITY % NCBI_Entity_module PUBLIC "-//NCBI//NCBI Entity Module//EN"
"NCBI_Entity.mod.dtd"> %NCBI_Entity_module;
<!ENTITY % NCBI_GBSeq_module PUBLIC "-//NCBI//NCBI GBSeq Module//EN" "NCBI_GBSeq.mod.dtd"> %NCBI_GBSeq_module;
After deleting
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
my program can at least move forward!

Try to simplify your problem.
Can you fetch the document you need to parse manually, for example by downloading it and saving it locally?
If yes, try to parse the saved copy. I don't think the problem is your DocumentBuilder; it is more likely your network connection. You have to make sure that the DocumentBuilder can access every part of the XML document, including anything the document references.
If your manually stored document fails when it is validated, there will be a different error message.
Hope it helps.

Did you create a new instance of a DocumentBuilderFactory and then create a newDocumentBuilder before you parse the file?
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
Hope this link helps. It definitely helped me earlier today.

Related

How to cache a dtd file when parsing xml in java

I am parsing a few million XML files that are formatted like so:
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE test-document PUBLIC "-//TEST//TEST DOC//EN" "https://somerandomurl.com/test.dtd">
<test-document>...</test-document>
Every time I parse a file, the same https://somerandomurl.com/test.dtd file is downloaded, which consumes a lot of bandwidth and seems unnecessary. Is there a way to store the file and have my code redirect to my local copy? I can't edit the XML files, so it has to be done in my code. Given the following Java code, what would be a reasonable way to implement such a thing?
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringComments(true);
factory.setIgnoringElementContentWhitespace(true);
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource("file.xml"));//My final document object.
First read the DTD into a string variable.
Then do
builder.setEntityResolver(
(publicId, systemId) -> new InputSource(new StringReader(dtd)));
Or if you want to be more careful, have your EntityResolver check that the systemId and/or publicId are as expected before returning the contents of dtd.
Note that this will still involve parsing the DTD each time; it just saves the cost of fetching it from the network.
Also important: instantiating the XML parser is a significant cost (and instantiating a DocumentBuilderFactory is an even bigger one). Make sure you reuse both the factory and the parser.
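Putting those points together, a minimal sketch might look like this. It assumes the DTD text has already been read into a string once; the class and method names are invented for the example, and the one-line DTD in main is a stand-in for the real file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CachedDtdExample {
    // System identifier taken from the question.
    static final String DTD_URL = "https://somerandomurl.com/test.dtd";

    /** Builds one parser that serves the DTD from memory; reuse it for every file. */
    static DocumentBuilder newCachingBuilder(String dtdText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) ->
                DTD_URL.equals(systemId)
                        ? new InputSource(new StringReader(dtdText))
                        : null); // null tells the parser to resolve the entity itself
        return builder;
    }

    /** Parses one document with the shared builder and returns the root element's text. */
    static String rootText(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in DTD; in practice, load the real one from disk a single time.
        DocumentBuilder builder = newCachingBuilder("<!ELEMENT test-document (#PCDATA)>");
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE test-document PUBLIC \"-//TEST//TEST DOC//EN\" \"" + DTD_URL + "\">";
        for (String body : new String[] {"first", "second"}) {
            System.out.println(
                    rootText(builder, header + "<test-document>" + body + "</test-document>"));
        }
    }
}
```

A DocumentBuilder is not thread-safe, so with several worker threads you would probably want one builder per thread rather than one shared instance.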
If you just want to cache downloaded DTD files, the way to go is XML catalogs. In particular, you'd specify, in a resolution rule in a catalog file such as the following
<catalog
xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<system
systemId="https://somerandomurl.com/test.dtd"
uri="file:///mydir/test.dtd"/>
</catalog>
that the entity with system identifier https://somerandomurl.com/test.dtd is resolved as the file /mydir/test.dtd which should contain a downloaded local copy of the DTD file linked to by the https: URL.
Links
https://www.xml.com/pub/a/2004/03/03/catalogs.html
https://docs.oracle.com/javase/10/core/xml-catalog-api1.htm#JSCOR-GUID-96D2C9AC-641A-4BDB-BB08-9FA04358A6F4
https://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.system
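To wire such a catalog into the DocumentBuilder from the question, something along these lines should work on Java 9 or later, which ships the javax.xml.catalog API. This is a sketch rather than a tested production setup: the class and method names are invented, and the demo writes a throwaway one-line DTD and catalog into a temp directory instead of using a real download.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CatalogExample {
    /** Parses xml, resolving external identifiers through the given catalog file. */
    static String rootTextViaCatalog(Path catalogFile, String xml) throws Exception {
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver); // CatalogResolver is also an EntityResolver
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    /** Writes a stand-in DTD and a catalog that maps the remote URL to it. */
    static Path writeDemoCatalog() throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");
        Path dtd = Files.writeString(dir.resolve("test.dtd"),
                "<!ELEMENT test-document (#PCDATA)>");
        String catalog = "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + "<system systemId=\"https://somerandomurl.com/test.dtd\""
                + " uri=\"" + dtd.toUri() + "\"/>"
                + "</catalog>";
        return Files.writeString(dir.resolve("catalog.xml"), catalog);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE test-document PUBLIC \"-//TEST//TEST DOC//EN\""
                + " \"https://somerandomurl.com/test.dtd\">"
                + "<test-document>hello</test-document>";
        System.out.println(rootTextViaCatalog(writeDemoCatalog(), xml));
    }
}
```

As with the in-memory approach, creating the resolver and the builder once and reusing them across files avoids repeating the setup cost.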

Can I parse XML in Java without taking XML file input from outside?

Generally, when using a DOM, SAX, or XPath parser, we take the input from outside the Java code, like this:
File inputFile = new File("C:\\Users\\DELL\\Desktop\\catalog.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
So can you parse XML without taking a file as input like this? I want to write my XML alongside the Java code.
Use DocumentBuilder.parse(new InputSource(new StringReader(xml))) where xml is a string containing the XML to be parsed.
That's if you really must use DOM. I can't imagine why anyone uses it any more, when alternatives such as JDOM2 are so much better.
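If you do stay with DOM, a minimal sketch of parsing XML held in a string could look like this. The class name, method name, and the sample catalog content are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class InlineXmlExample {
    /** Parses XML that lives in a Java string instead of a file. */
    static Document parseString(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // InputSource can wrap any Reader, so no file on disk is needed.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical XML written directly alongside the Java code.
        String xml = "<catalog><book id=\"bk101\"><title>Sample Title</title></book></catalog>";
        Document doc = parseString(xml);
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```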

Java replacing ampersand in XML file

I am processing a list of URLs pointing to XML files. My problem is that some of them are not well formed because they contain unescaped "&" (ampersand) characters, so my code cannot parse them correctly.
<elementType>CK037 - AT&ZN -SET</elementType>
How could I avoid this? Should I first read the XML as a String and replace the "&" with "&amp;"? Are there any other, more appropriate solutions for my problem?
This is my code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = null;
try {
doc = factory.newDocumentBuilder().parse(new URL(inputURLString).openStream());
(...)
Thanks in advance.
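One way to try the approach the question proposes is sketched below: read the content as a String, escape only the ampersands that do not already start something shaped like an entity or character reference, then hand the repaired text to the parser. This is a heuristic, not a general repair. It would leave text such as "AT&T;" untouched because it looks like an entity reference, and it would also rewrite ampersands inside CDATA sections. The class and method names are made up.

```java
import java.util.regex.Pattern;

final class AmpersandFixExample {
    // An '&' that is NOT followed by "name;", "#123;" or "#x1F;".
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Escapes bare ampersands and leaves existing references such as &amp; alone. */
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String broken = "<elementType>CK037 - AT&ZN -SET</elementType>";
        System.out.println(escapeBareAmpersands(broken));
    }
}
```

The repaired string can then be parsed with builder.parse(new InputSource(new StringReader(fixed))) instead of passing the URL stream directly.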

Problems reading XML InputStream in Java

In my main activity I have this call:
InputStream stream = http_conn.getInputStream();
ParseXML.Login(stream);
I know the input stream is working as I can create a buffered reader, creating a string that I can send to the UI. The issue is this reports the entire XML document that is being returned to me.
Within my ParseXML class Login method, I have the following:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(stream);
doc.getDocumentElement().normalize();
So far so good, I think? I am new to using parsers, but basically the layout of my XML document is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<string xmlns="http://www.xxx.com/asmx">TOKEN HERE</string>
I have seen examples in which you can retrieve various items from deeper within an XML file, as per the example here: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
I'm not only new to XML parsers but new to Java as well, and I just can't figure out how to pull that string out of the XML document!
Thanks
I'm not sure I understand, but if you only want to get TOKEN HERE, try doc.getDocumentElement().getTextContent()
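Applied to the Login method from the question, that suggestion might look like the sketch below. The class name and the readToken method are invented here, and a ByteArrayInputStream stands in for the HTTP connection's stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class TokenExample {
    /** Returns the text inside the root element of the XML read from the stream. */
    static String readToken(InputStream stream) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(stream);
        // The whole response is one <string> element, so its text is the token.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String response = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<string xmlns=\"http://www.xxx.com/asmx\">TOKEN HERE</string>";
        InputStream stream =
                new ByteArrayInputStream(response.getBytes(StandardCharsets.UTF_8));
        System.out.println(readToken(stream));
    }
}
```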

How to check if a node exists using XMLConfiguration in Apache Commons?

<?xml version="1.0" encoding="utf-8"?>
<processor>
<user_config>
<a>xxxxx</a>
</user_config>
</processor>
I want to check whether user_config exists in this XML config file. Is there any method I can use in org.apache.commons.configuration.XMLConfiguration?
I solved the problem with:
if(configuration.configurationsAt( "user_config" ).size() > 0 ) {
//it exists
}
I don't like this solution. If anybody knows a better solution -> Please share.
While I am not sure about XMLConfiguration, you could use org.apache.xpath to check whether user_config exists, or parse the file into a DOM object from a string, an InputStream, or a file and check it that way.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse("");
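A minimal sketch of the DOM variant, using only the JDK, might look like this. It parses from a string for the sake of a self-contained example; the class and method names are made up, and it bypasses XMLConfiguration entirely.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NodeExistsExample {
    /** Returns true if the document contains at least one element with the given tag name. */
    static boolean hasElement(String xml, String tagName) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new InputSource(new StringReader(xml)));
        // getElementsByTagName searches the whole tree, at any depth.
        return document.getElementsByTagName(tagName).getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<processor><user_config><a>xxxxx</a></user_config></processor>";
        System.out.println(hasElement(xml, "user_config"));
    }
}
```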
boolean containsKey(String key) would solve the problem: https://commons.apache.org/proper/commons-configuration/javadocs/v1.10/apidocs/org/apache/commons/configuration/Configuration.html#containsKey(java.lang.String)
