I am trying to parse an XML file that lives on a network drive, using Java. I have reviewed lots of XML parsing info here but cannot find the answer I need. The problem is that the getDocument() routine constantly returns null even though the parser gets an accurate location and file name.
Here is the code...
String ThisXMLFile = XMLFileData.getPath();
DOMParser myXMLParser = new DOMParser();
myXMLParser.parse(ThisXMLFile);
Document doc = myXMLParser.getDocument();
Some notes:
I had to use getPath() as the getName() function did not return the fully qualified file name and path - the XML file lives on a network directory and that directory is mapped on my PC to the 'V' drive
I have imported all the required class header files for DOM objects
The variable names given above are real and accurate so if I have inadvertently used a reserved keyword in a variable declaration then please offer correction.
I have extensive programming experience in a few languages but this is my first real Java app.
All the lines of code and the variables above work until I reach the last line; then getDocument() just sets the doc variable to null, which makes the rest of the program break.
I believe that you are calling the wrong method. According to your code, you're executing DOMParser.parse(systemId) when you need to call DOMParser.parse(InputSource).
To create an InputSource you can do this:
InputSource source = new InputSource(new FileInputStream(ThisXMLFile));
myXMLParser.parse(source);
Document doc = myXMLParser.getDocument();
NOTE: remember to close the opened FileInputStream!!!
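If the Xerces DOMParser class is not a hard requirement, the same idea can be sketched with the JAXP DocumentBuilder that ships with the JDK, which is a swapped-in parser and not the one from the question. The class name XmlFileLoader and the way the path is passed in are assumptions made for illustration. try-with-resources takes care of closing the stream mentioned in the note above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of the original question's code.
final class XmlFileLoader {

    /** Parses the file at xmlPath into a DOM tree and closes the stream afterwards. */
    static Document load(Path xmlPath)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = Files.newInputStream(xmlPath)) {
            InputSource source = new InputSource(in);
            // The system ID lets the parser resolve relative references
            // and name the file in error messages.
            source.setSystemId(xmlPath.toUri().toString());
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: XmlFileLoader <path-to-xml>");
            return;
        }
        Document doc = load(Path.of(args[0]));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With this shape a failed parse surfaces as an exception and not as a silent null, which should make the original symptom easier to diagnose.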
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while (XMLReader.hasNext())
{
    if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
    {
        String XMLTag = XMLReader.getLocalName();
        if (XMLTag.equals("value"))
        {
            // getAttributeValue returns null when the element has no "id" attribute.
            String idValue = XMLReader.getAttributeValue(null, "id");
            if (idValue != null && idValue.equals(ElementName))
            {
                System.out.println(idValue);
                XMLReader.nextTag();
                System.out.println(XMLReader.getElementText());
            }
        }
    }
    XMLReader.next();
}
So this is the code I finally got to. It works and solves the issue of retrieving specific XML data from an XML file. At first I wanted to use NodeLists, Elements, Documents, etc., but those functions never did work for me, and this one did. Thanks to all for the answers given, as they helped me think this one through.
I'm creating a program which checks the legitimacy of a given URL. I've already created my own algorithm for this, but now I want to add PhishTank's services into my program.
They provide services where you can directly query a URL from their website, but they have set a certain quota on the number of queries you can make per day. The other option, which I'm going with, is to simply download their database and work with it locally, without restrictions.
The file you get is in XML, and I found some code to test with, but it seems their XML contains illegal characters (such as Unicode 0x07, the [BEL] character) inside CDATA, so the parser throws an exception.
<url><![CDATA[http://shaghaf-edu.com/sign-in/??msg=InvalidOnlineIdException&id[BEL]da9ca9b23227a572d1fb5ff4ff91e3&lpOlbResetErrorCounter=0l=&request_locale=en-us]]></url>
I've done a bit of searching, and all I've found are solutions that seem fine for rather small XML files. The one I'm working with is close to 2.7 million lines, and I'm not sure how efficiently a regex or a char-by-char comparison would work in this case.
I should note that their database is updated hourly, and has to be redownloaded. So cleaning the file once manually isn't an option.
So I'm wondering if there is any fast and efficient way of solving this problem?
I don't have the exact code with me, but what I use is a very slight variation of this, which I found here on StackOverflow:
private void start() throws Exception
{
URL url = new URL("http://localhost:8080/AutoLogin/resource/web.xml");
URLConnection connection = url.openConnection();
Document doc = parseXML(connection.getInputStream());
NodeList descNodes = doc.getElementsByTagName("description");
for(int i=0; i<descNodes.getLength();i++)
{
System.out.println(descNodes.item(i).getTextContent());
}
}
private Document parseXML(InputStream stream)
throws Exception
{
DocumentBuilderFactory objDocumentBuilderFactory = null;
DocumentBuilder objDocumentBuilder = null;
Document doc = null;
try
{
objDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
objDocumentBuilder = objDocumentBuilderFactory.newDocumentBuilder();
doc = objDocumentBuilder.parse(stream);
}
catch(Exception ex)
{
throw ex;
}
return doc;
}
Answering by asking a question ...
Why not write a simple pre-processing utility?
It could read the XML file as is (line by line); and do whatever is required to turn that content into "correct" XML.
In other words: you should explicitly distinguish between the task of "preparing your input" and the task of "actually working with that XML input". This will also make fine tuning much easier. If you find that regular expressions are too expensive, just change the "pre-processor" to not use them, and afterwards measure the effect on runtime.
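One way such a pre-processor could look, as a minimal sketch: a FilterReader that drops every character outside the XML 1.0 Char range while the parser is reading, so the file is never rewritten on disk and no regex is involved. The class name XmlCharFilterReader is made up for this example, and the sketch assumes surrogate halves in the input are correctly paired (they are passed through unchecked).

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical pre-processing reader; strips characters that XML 1.0 forbids.
final class XmlCharFilterReader extends FilterReader {

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    /** True for tab, LF, CR and everything from U+0020 to U+FFFD (surrogate halves included). */
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n;
        while ((n = super.read(buf, off, len)) != -1) {
            int kept = off;
            // Compact the allowed characters to the front of the chunk.
            for (int i = off; i < off + n; i++) {
                if (isAllowed(buf[i])) {
                    buf[kept++] = buf[i];
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // The whole chunk was illegal characters: read the next chunk.
        }
        return -1;
    }
}
```

It would be plugged in between the download and the parser, for example `builder.parse(new InputSource(new XmlCharFilterReader(new InputStreamReader(stream, StandardCharsets.UTF_8))))`, where UTF-8 is an assumption about the file's encoding. Because the filter works on a single pass over the stream, its cost should grow linearly with file size.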
I'm trying to read an XML file from an Android app using XOM as the XML library. I'm trying this:
Builder parser = new Builder();
Document doc = parser.build(context.openFileInput(XML_FILE_LOCATION));
But I'm getting nu.xom.ParsingException: Premature end of file. even when the file is empty.
I need to parse a very simple XML file, and I'm ready to use another library instead of XOM, so let me know if there's a better one, or just a solution to the problem using XOM.
In case it helps, I'm using xerces to get the parser.
------Edit-----
PS: The purpose of this wasn't to parse an empty file, the file just happened to be empty on the first run which showed this error.
If you follow this post to the end, it seems that this has to do with Xerces and the fact that it's an empty file, and they didn't reach a solution on the Xerces side.
So I handled the issue as follows:
Document doc = null;
try {
Builder parser = new Builder();
doc = parser.build(context.openFileInput(XML_FILE_LOCATION));
}catch (ParsingException ex) { //other catch blocks are required for other exceptions.
//fails to open the file with a parsing error.
//I create a new root element and a new document.
//I fill them with xml data (else where in the code) and save them.
Element root = new Element("root");
doc = new Document(root);
}
And then I can do whatever I want with doc. You can add extra checks to make sure that the cause really is an empty file (such as checking the file size, as indicated by one of sam's comments on the question).
An empty file is not a well-formed XML document. Throwing a ParsingException is the right thing to do here.
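The file-size check mentioned above could be done before the parser is ever called, so that a genuinely malformed file still raises an error. The sketch below uses the JAXP DocumentBuilder in place of XOM (a swapped-in library), and the class name EmptyAwareXmlLoader and the rootName parameter are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper; treats "missing or empty" separately from "malformed".
final class EmptyAwareXmlLoader {

    static Document loadOrCreate(File file, String rootName)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // File.length() returns 0 for a missing file as well as for an empty one.
        if (file.length() == 0L) {
            Document doc = builder.newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        }
        // Anything else goes to the parser; malformed content still throws SAXException.
        return builder.parse(file);
    }
}
```

Compared with catching the parsing exception, this keeps real syntax errors visible and does not replace a damaged file with a fresh document.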
First of all, thanks to everyone who is going to spend a little time on this question.
Second, sorry for my english (not my first language! :D).
Well, here is my problem.
I'm learning Android and I'm making an app which uses an XML file to store some info. I have no problem creating the file, but when trying to read the XML tags with XPath (DOM, XMLPullParser, etc. only gave me problems) I've only been able to read the first one.
Let's see the code.
Here is the XML file the app generates:
<dispositivo>
<id>111</id>
<nombre>Name</nombre>
<intervalo>300</intervalo>
</dispositivo>
And here is the function which reads the XML file:
private void leerXML() {
try {
XPathFactory factory=XPathFactory.newInstance();
XPath xPath=factory.newXPath();
// Load the XML into memory
File xmlDocument = new File("/data/data/com.example.gps/files/devloc_cfg.xml");
InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));
// Define the expressions used to find each value.
XPathExpression tag_id = xPath.compile("/dispositivo/id");
String valor_id = tag_id.evaluate(inputSource);
id=valor_id;
XPathExpression tag_nombre = xPath.compile("/dispositivo/nombre");
String valor_nombre = tag_nombre.evaluate(inputSource);
nombre=valor_nombre;
} catch (Exception e) {
e.printStackTrace();
}
}
The app correctly gets the id value and shows it on the screen (the "id" and "nombre" variables are each assigned to a TextView), but "nombre" is not working.
What should I change? :)
Thanks for all your time and help. This site is quite helpful!
PS: I've been searching the whole site for an answer but didn't find any.
You're using the same input stream twice, but the second time you use it it's already at the end of file. You have to either open the stream once more or buffer it e.g. in a ByteArrayInputStream and reuse it.
In your case doing this:
inputSource = new InputSource(new FileInputStream(xmlDocument));
before this line
XPathExpression tag_nombre = xPath.compile("/dispositivo/nombre");
should help.
Be aware though that you should properly close your streams.
The problem is that you cannot re-use the stream-input-source multiple times - the first call to tag_id.evaluate(inputSource) already has read the input up to the end.
One solution would be to parse Document in advance:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
Document document = documentBuilderFactory.newDocumentBuilder().parse(inputSource);
Source source = new DOMSource(document);
// evaluate XPath expressions on the DOM source
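Put together, the parse-once approach could look like the sketch below: the stream is consumed a single time to build the DOM, and every XPath expression is then evaluated against that in-memory document. The class name DeviceConfigReader and the returned map are assumptions for illustration; the element names come from the XML in the question.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical reader for the <dispositivo> file shown in the question.
final class DeviceConfigReader {

    static Map<String, String> read(InputStream in) throws Exception {
        // The stream is read exactly once, here.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);

        XPath xPath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (String tag : new String[] {"id", "nombre", "intervalo"}) {
            // Evaluating against the Document can be repeated as often as needed.
            values.put(tag, xPath.evaluate("/dispositivo/" + tag, document));
        }
        return values;
    }
}
```

In the question's code this would be called inside a try-with-resources block around the FileInputStream, so the stream is also closed properly.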
I have this XML file which doesn't have a root node. Other than manually adding a "fake" root element, is there any way I would be able to parse an XML file in Java? Thanks.
I suppose you could create a new implementation of InputStream that wraps the one you'll be parsing from. This implementation would return the bytes of the opening root tag before the bytes from the wrapped stream and the bytes of the closing root tag afterwards. That would be fairly simple to do.
I may be faced with this problem too. Legacy code, eh?
Ian.
Edit: You could also look at java.io.SequenceInputStream which allows you to append streams to one another. You would need to put your prefix and suffix in byte arrays and wrap them in ByteArrayInputStreams but it's all fairly straightforward.
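A minimal sketch of the SequenceInputStream idea using only the standard library follows. The class name RootWrapper and the rootName parameter are assumptions for illustration. The sketch also assumes the wrapped content has no XML declaration or byte order mark of its own and uses an ASCII-compatible encoding, since either of those would end up after the injected opening tag.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

// Hypothetical helper that surrounds a rootless XML stream with a synthetic root element.
final class RootWrapper {

    static InputStream wrap(InputStream fragment, String rootName) {
        InputStream open = new ByteArrayInputStream(
                ("<" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(
                ("</" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        // The streams are read back to back: opening tag, original content, closing tag.
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragment, close)));
    }
}
```

The result can be handed straight to a parser, for example `builder.parse(RootWrapper.wrap(in, "root"))`, and closing the returned stream also closes the wrapped one.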
Your XML document needs a root xml element to be considered well formed. Without this you will not be able to parse it with an xml parser.
One way is to provide your own dummy wrapper without touching the original (not well-formed) 'xml'. An external entity declaration in the wrapper's DOCTYPE can pull it in:
Syntax
<!DOCTYPE some_root_elem SYSTEM "/home/ego/some.dtd"
[
<!ENTITY entity-name "Some value to be inserted at the entity">
]>
Example:
<!DOCTYPE dummy [
<!ENTITY data SYSTEM "http://wherever-my-data-is">
]>
<dummy>
&data;
</dummy>
You could use another parser like Jsoup. It can parse XML without a root.
I think even if any API would have an option for this, it will only return you the first node of the "XML" which will look like a root and discard the rest.
So the answer is probably to do it yourself. Scanner or StringTokenizer might do the trick.
Maybe some html parsers might help, they are usually less strict.
Here's what I did:
There's an old java.io.SequenceInputStream class, which is so old that it takes Enumeration rather than List or such.
With it, you can prepend and append the root element tags (<div> and </div> in my case) around your no-root XML stream. (You shouldn't do it by concatenating Strings due to performance and memory reasons.)
public void tryExtractHighestHeader(ParserContext context)
{
String xhtmlString = context.getBody();
if (xhtmlString == null || "".equals(xhtmlString))
return;
// The XHTML needs to be wrapped, because it has no root element.
ByteArrayInputStream divStart = new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream divEnd = new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream is = new ByteArrayInputStream(xhtmlString.getBytes(StandardCharsets.UTF_8));
Enumeration<InputStream> streams = new IteratorEnumeration(Arrays.asList(new InputStream[]{divStart, is, divEnd}).iterator());
try (SequenceInputStream wrapped = new SequenceInputStream(streams);) {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(wrapped);
From here you can do whatever you like, but keep in mind the extra element.
XPath xPath = XPathFactory.newInstance().newXPath();
}
catch (Exception e) {
throw new RuntimeException("Failed parsing XML: " + e.getMessage());
}
}
I want to parse the following XML document to resolve all entities in it:
<!DOCTYPE doc SYSTEM 'mydoc.dtd'>
<doc>&title;</doc>
My EntityResolver is supposed to fetch the external entity with the given system ID from the database and then do the resolution, see below for an illustration:
private static class MyEntityResolver implements EntityResolver
{
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException
{
// At this point, systemId is always absolutized to the current working directory,
// even though the XML document specified it as relative.
// E.g. "file:///H:/mydoc.dtd" instead of just "mydoc.dtd"
// Why??? How can I prevent this???
SgmlEntity entity = findEntityFromDatabase(systemId);
InputSource is = new InputSource(new ByteArrayInputStream(entity.getContents()));
is.setPublicId(publicId);
is.setSystemId(systemId);
return is;
}
}
I tried both using DOM (DocumentBuilder) and SAX (XMLReader), set the entity resolver to MyEntityResolver (i.e. setEntityResolver(new MyEntityResolver())), but systemId in MyEntityResolver#resolveEntity(String publicId, String systemId) is always being absolutized to the current working directory.
I also tried calling setFeature("http://xml.org/sax/features/resolve-dtd-uris", false);, but that didn't help anything.
So how can I achieve what I wanted?
Thanks!
Apparently, there is another interface called EntityResolver2 which is the extension of the old EntityResolver. (Talk about confusing names!)
Anyway, I found that EntityResolver2 achieved what I wanted, that is, it does not make any changes to the systemId, so it will always exactly be what was specified in the XML document.
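A minimal sketch of what that can look like with SAX: DefaultHandler2 already implements EntityResolver2, so overriding its four-argument resolveEntity is enough. The class name LiteralSystemIdDemo is made up, and the hard-coded entity declaration is a stand-in for the database lookup in the question. The sketch relies on the parser honouring the use-entity-resolver2 feature, which is assumed to be the case for the parser bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; records the system ID exactly as the resolver receives it.
final class LiteralSystemIdDemo extends DefaultHandler2 {

    String seenSystemId;
    final StringBuilder text = new StringBuilder();

    // EntityResolver2 callback: systemId is passed as written in the document,
    // and baseURI carries the base it would be resolved against.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        seenSystemId = systemId;
        // Stand-in for findEntityFromDatabase(systemId) from the question.
        return new InputSource(new StringReader("<!ENTITY title \"Hello\">"));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static LiteralSystemIdDemo parse(String xml) throws Exception {
        LiteralSystemIdDemo handler = new LiteralSystemIdDemo();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(handler);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```

Calling `LiteralSystemIdDemo.parse("<!DOCTYPE doc SYSTEM 'mydoc.dtd'><doc>&title;</doc>")` and inspecting `seenSystemId` is a quick way to check how a given parser reports the identifier.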
From the EntityResolver Javadocs:
If the system identifier is a URL, the SAX parser must resolve it fully before reporting it to the application.
Also, the org.xml.sax docs have this to say about the resolve-dtd-uris feature:
It does not apply to EntityResolver.resolveEntity(), which is not used to report declarations...
I think you've either got to set your base-URI to something you can live with, or use public-IDs instead of system-IDs.