I am using a third-party application and would like to change one of its files. The file is stored as XML but has an invalid doctype.
When I try to read it, the parser errors out because the doctype contains "file:///ReportWiz.dtd"
(as shown, with quotes) and I get a file-not-found exception. Is there a way to tell the DocumentBuilder to ignore this? I have tried setValidating(false) and setNamespaceAware(false) on the DocumentBuilderFactory.
The only solutions I can think of are
copy the file line by line into a new file, omitting the offending line, do what I need to do, then copy it into another new file and insert the offending line back in, or
do mostly the same as above but work with a FileStream of some sort (though I am not clear on how I could do this... help?)
DocumentBuilderFactory docFactory = DocumentBuilderFactory
.newInstance();
docFactory.setValidating(false);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(file);
Tell your DocumentBuilderFactory to ignore the DTD declaration like this:
docFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
See here for a list of available features.
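As a rough end-to-end sketch of the feature above (the class and method names are made up for illustration, and it assumes a Xerces-based parser such as the one bundled with the JDK), this parses a document whose DOCTYPE points at a DTD that does not exist:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class IgnoreDtdExample {

    /** Parses XML text without trying to fetch the external DTD named in its DOCTYPE. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Xerces-specific feature: other parsers may reject it with a
        // ParserConfigurationException.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Note that setValidating(false) on its own is not enough, because a non-validating parser may still load the external DTD to pick up entity declarations and default attribute values.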
You also might find JDOM a lot easier to work with than org.w3c.dom:
org.jdom.input.SAXBuilder builder = new SAXBuilder();
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
org.jdom.Document doc = builder.build(file);
Handle resolution of the DTD manually, either by returning a copy of the DTD file (loaded from the classpath) or by returning an empty one. You can do this by setting an entity resolver on your document builder:
EntityResolver er = new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if ("file:///ReportWiz.dtd".equals(systemId)) {
System.out.println(systemId);
InputStream zeroData = new ByteArrayInputStream(new byte[0]);
return new InputSource(zeroData);
}
return null;
}
};
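The resolver still has to be attached to the builder before parsing. A minimal sketch of that wiring follows; the class and method names are illustrative, and it hands back an empty character stream rather than an empty byte stream, which is an assumption made here for simplicity:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EmptyDtdResolverExample {

    static Document parseWithEmptyDtd(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("ReportWiz.dtd")) {
                // Hand the parser an empty DTD instead of letting it open the missing file.
                return new InputSource(new StringReader(""));
            }
            return null; // default resolution for every other entity
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Unlike the load-external-dtd feature, this approach uses only the standard JAXP and SAX interfaces, so it should not depend on which parser implementation is installed.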
My first thought was dealing with it as a stream. You could make a new adapter at some level and just copy input to output except for the offending text.
If the file is shortish (under half a gig or so) you could also read the entire thing into a byte array and make your modifications there, then create a new stream from the byte array into your builder.
That's the advantage of the amazingly bulky way Java handles streams, you actually have a lot of flexibility.
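A minimal sketch of the byte-array idea, assuming the file is UTF-8 and small enough to hold in memory. The names are hypothetical and the regular expression is deliberately simplified: it handles a DOCTYPE with an optional internal subset that contains no ']' character, and nothing more exotic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StripDoctypeExample {

    // Simplified pattern: <!DOCTYPE ... > with an optional [ ... ] internal subset.
    private static final String DOCTYPE = "<!DOCTYPE[^>\\[]*(\\[[^\\]]*\\])?\\s*>";

    /** Returns the XML text with its first DOCTYPE declaration removed. */
    static String stripDoctype(String xml) {
        return xml.replaceFirst(DOCTYPE, "");
    }

    /** Reads the whole file, drops the DOCTYPE, and exposes the result as a stream. */
    static InputStream openWithoutDoctype(Path file) throws IOException {
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return new ByteArrayInputStream(stripDoctype(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

The returned stream can be passed straight to DocumentBuilder.parse(InputStream). If the DOCTYPE has to be present when the file is written back, it would need to be saved separately and re-inserted.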
Another thing I was debating was storing the whole file in a String, then doing my manipulations and writing the String out to a file. None of these seem clean or easy, but what is the best way to do this?
If you do not want to assume the parser is Xerces and want a generic solution, see this.
Related
I am trying to parse an XML file using Java that lives on a network drive. I have reviewed lots of XML parsing info here but cannot find the answer I need. The problem is that the getDocument() routine constantly returns a null value even though the parser gets an accurate location and file name.
Here is the code...
String ThisXMLFile = XMLFileData.getPath();
DOMParser myXMLParser = new DOMParser();
myXMLParser.parse(ThisXMLFile);
Document doc = myXMLParser.getDocument();
Some notes:
I had to use getPath() as the getName() function did not return the fully qualified file name and path - the XML file lives on a network directory and that directory is mapped on my PC to the 'V' drive
I have imported all the required class header files for DOM objects
The variable names given above are real and accurate so if I have inadvertently used a reserved keyword in a variable declaration then please offer correction.
I have extensive programming experience in a few languages but this is my first real Java app.
all the lines of code and the variables above work, until I reach the last line and then getDocument() just sets the doc variable to null... which makes the rest of the program break.
I believe you are calling the wrong method. According to your code, you're executing DOMParser.parse(systemId) when you need to call DOMParser.parse(InputSource).
To create an InputSource you can do this:
InputSource source = new InputSource(new FileInputStream(ThisXMLFile));
myXMLParser.parse(source);
Document doc = myXMLParser.getDocument();
NOTE: remember to close the opened FileInputStream!!!
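One way to make sure the stream gets closed is try-with-resources. The sketch below swaps in the standard JAXP DocumentBuilder in place of the Xerces DOMParser used above, purely so that it needs nothing outside the JDK; the class and method names are illustrative.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParseFromStreamExample {

    static Document parseFile(File xmlFile) throws Exception {
        // try-with-resources closes the stream even if parsing throws
        try (InputStream in = new FileInputStream(xmlFile)) {
            InputSource source = new InputSource(in);
            // Lets the parser resolve relative references against the file's location.
            source.setSystemId(xmlFile.toURI().toString());
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        }
    }
}
```

With a mapped network drive, the File would be built from the full path, for example the value returned by getPath() in the question.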
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if (idValue.equals(ElementName))
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this is the code I finally arrived at. It works and solves the issue of retrieving specific XML data from an XML file. At first I wanted to use NodeLists, Elements, Documents, etc., but those functions never did work for me. This one did. Thanks to all for the answers given, as they helped me think this one through.
I have generated an XML file from an Excel database, and it automatically contains an element called "offset". To make the new file match my needs, I want to remove this element using Java.
Here is the XML content:
<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>
I wrote code that reads the content (with a BufferedReader) and writes it to a new file, using this condition:
while (fileContent !=null){
fileContent=xmlReader.readLine();
if (!(fileContent.equals("<offset/>"))){
System.out.println("here is the line:"+ fileContent);
XMLFile+=fileContent;
}
}
But it does not work
I would personally recommend using a proper XML parser like Java DOM to check and delete your nodes, rather than dealing with your XML as raw Strings (yuck). Try something like this to remove your 'offset' node.
File xmlFile = new File("your_xml.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
NodeList nList = doc.getElementsByTagName("offset");
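// The NodeList is live: removing a node shrinks it, so iterate backwards
// to avoid skipping every second match.
for (int i = nList.getLength() - 1; i >= 0; i--) {
Node node = nList.item(i);
node.getParentNode().removeChild(node);
}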
The above code removes any 'offset' nodes in an xml file.
If resources/speed considerations are an issue (like when your_xml.xml is huge), you would be better off using SAX, which is faster (a little more code intensive) and doesn't store the XML in memory.
Once your Document has been edited you'll probably want to convert it to a String to pass to your OutputStream of choice.
Hope this helps.
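For the conversion step, a sketch along these lines should work with the standard javax.xml.transform API. The class and method names are invented for the example, and it takes and returns XML text rather than files to keep it short:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveElementExample {

    /** Removes every element with the given tag name and returns the serialized result. */
    static String removeElements(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList matches = doc.getElementsByTagName(tagName);
        // The list is live, so walk it backwards while removing.
        for (int i = matches.getLength() - 1; i >= 0; i--) {
            Node node = matches.item(i);
            node.getParentNode().removeChild(node);
        }
        // An identity transform serializes the DOM tree back to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling removeElements(xml, "offset") on the document from the question should return the same markup minus the offset element.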
Just try to replace the unwanted string with an empty string. It is quick... and seriously dirty. But might solve your problem quickly... just to reappear later on ;-)
// Strings are immutable, so the result has to be assigned back
fileContent = fileContent.replace("<offset/>", "");
String sXML= "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
System.out.println(sXML);
String sNewXML = sXML.replace("<offset/>", "");
System.out.println(sNewXML);
String xml = "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
xml = xml.replaceAll("<offset/>", "");
System.out.println(xml);
In your original code that you included you have:
while (fileContent !=null)
Are you initializing fileContent to some non-null value before that line? If not, the code inside your while block will not run.
But I do agree with the other posters that a simple replaceAll() would be more concise, and a real XML API is better if you want to do anything more sophisticated.
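For reference, the usual shape of a readLine() loop assigns and tests in the loop condition, so a null line is never dereferenced. This is only a sketch with made-up names; it strips the tag with replace() because the sample document is a single line, which means an equals("<offset/>") check on a whole line would never match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class ReadLoopExample {

    /** Concatenates all lines from the reader, dropping every "<offset/>" tag. */
    static String readWithoutOffset(Reader input) throws IOException {
        StringBuilder result = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            // Assign and test in one step: the loop ends as soon as readLine() returns null.
            while ((line = reader.readLine()) != null) {
                result.append(line.replace("<offset/>", ""));
            }
        }
        return result.toString();
    }
}
```

Like the original loop, this drops the line breaks between lines. Pass it a FileReader (or any other Reader) for the generated file.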
I have an input XML that has entities declared in it. It looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doctype PUBLIC "desc" "DTD.dtd" [
<!ENTITY SLSD_68115_jpg SYSTEM "68115.jpg" NDATA JPEG>
]>
The DTD.dtd file contains the necessary notation:
<!NOTATION JPEG SYSTEM "JPG" >
During XSLT transformation I would like to get the URI declared in the entity using the name 'SLSD_68115_jpg' like so:
<xsl:value-of select="unparsed-entity-uri('SLSD_68115_jpg')"/>
So that it would return something like "68115.jpg".
The problem is that it always returns an empty string. There is no way for me to modify the input XML. From what I found on the internet this seems to be a common problem, but I haven't found any final conclusions, solutions or alternatives.
It might be important to note that I had a problem before: since I am using a StreamSource, things like the systemId had to be set manually, and I think this is where the problem might be hidden. It's as if the transformer is unable to resolve the entity with the given id.
I'm using Xalan. I probably need to provide more details but I'm not sure what to add; I'll answer any questions if there are any.
Any help would be greatly appreciated.
I found out why the "unparsed-entity-uri" was unable to resolve the declared entities. This might be a special case, but I will post this solution so it might save someone else a lot of time.
I'm (very) new to XSLT. However, the XSL file I was given to work with as a student was pretty extreme, with multiple import statements and files containing more than 5K lines of code.
Simply put, by the time I got to the point where I needed the entities, the transformer was using a different document that was essentially a sub-document of the original one. That is okay in itself, but the entity declarations are not passed on to the sub-document, so there is no way for me to use the entities from that point on.
Now, like I said, I'm new to XSLT, but I think that lines like this, for example, can cause the problem:
<xsl:apply-templates select="exslt:node-set($nodelist)"/>
Because after this, entity references are no bueno.
If this was trivial then my apologies for wasting your time.
Thanks to everyone nonetheless!
Instead of a StreamSource, try a SAXSource configured with a validating parser:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setValidating(true);
spf.setNamespaceAware(true);
XMLReader xmlr = spf.newSAXParser().getXMLReader();
InputSource input = new InputSource(
new File("/path/to/file.xml").toURI().toString());
// if you already have an InputStream/Reader then do
// input.setByteStream or input.setCharacterStream as appropriate
SAXSource source = new SAXSource(xmlr, input);
Or you can use a DOMSource in the same way
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
dbf.setNamespaceAware(true);
File f = new File("/path/to/file.xml");
Document doc = dbf.newDocumentBuilder().parse(f);
DOMSource source = new DOMSource(doc, f.toURI().toString());
I have an XML file of which I have an element as shown;
"<Event start="2011.12.12 13:45:00:0000" end="2011.12.12 13:47:00:0000" anon="89"/>"
I want to add another attribute "comment" and write it to this XML File giving;
"<Event start="2011.12.12 13:45:00:0000" end="2011.12.12 13:47:00:0000" anon="89" comment=""/>"
How would I go about doing this?
Thanks, Matt
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringElementContentWhitespace(true);
Document document = factory.newDocumentBuilder().parse(xmlFile);
Element eventElement = (Element)document.getElementsByTagName("Event").item(0);
eventElement.setAttribute("comment", "");
FYI: I've used the DOM framework here (org.w3c.dom.*).
Use the setAttribute method to add an attribute:
// Add an attribute
element.setAttribute("newAttrName", "attrValue");
Use the following method to write to the XML file:
// This method writes a DOM document to a file
public static void writeXmlFile(Document doc, String filename) {
try {
// Prepare the DOM document for writing
Source source = new DOMSource(doc);
// Prepare the output file
File file = new File(filename);
Result result = new StreamResult(file);
// Write the DOM document to the file
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);
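} catch (TransformerException e) {
// TransformerConfigurationException is a subclass, so this covers both cases.
// Report the failure instead of silently swallowing it.
e.printStackTrace();
}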
}
Parse the file, add the attribute and write it back to disk.
There are plenty of frameworks that can do this. The DOM framework in Java is probably the first thing you should look at.
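Putting the three steps together might look roughly like this. It is a sketch with hypothetical names; it assumes the file fits in memory, and it adds the attribute to every Event element that lacks one rather than only the first.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class AddAttributeExample {

    /** Adds an empty "comment" attribute to each Event element and rewrites the file. */
    static void addCommentAttribute(File xmlFile) throws Exception {
        // 1. Parse the file into a DOM tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);

        // 2. Add the attribute where it is missing.
        NodeList events = doc.getElementsByTagName("Event");
        for (int i = 0; i < events.getLength(); i++) {
            Element event = (Element) events.item(i);
            if (!event.hasAttribute("comment")) {
                event.setAttribute("comment", "");
            }
        }

        // 3. Write the tree back over the original file.
        try (OutputStream out = Files.newOutputStream(xmlFile.toPath())) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

The serializer may reorder attributes and normalize whitespace inside tags, so the rewritten file is equivalent XML rather than a byte-for-byte copy with one attribute added.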
Using DOM, as suggested in previous answers, is certainly reasonable for this particular problem, which is relatively simple.
However, I have found that JDOM is generally much easier to use when you want to parse and/or modify XML files. Its basic approach is to load the entire file into an easy to use data structure. This works well unless your XML file is very very large.
For more info go to http://www.jdom.org/
I have this XML file which doesn't have a root node. Other than manually adding a "fake" root element, is there any way I would be able to parse an XML file in Java? Thanks.
I suppose you could create a new implementation of InputStream that wraps the one you'll be parsing from. This implementation would return the bytes of the opening root tag before the bytes from the wrapped stream and the bytes of the closing root tag afterwards. That would be fairly simple to do.
I may be faced with this problem too. Legacy code, eh?
Ian.
Edit: You could also look at java.io.SequenceInputStream which allows you to append streams to one another. You would need to put your prefix and suffix in byte arrays and wrap them in ByteArrayInputStreams but it's all fairly straightforward.
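A compact sketch of the SequenceInputStream idea, using only JDK classes. The wrapper element name and the class and method names are arbitrary choices for the example; it also assumes the rootless input is UTF-8 compatible and does not start with its own XML declaration, since a declaration after the opening tag would not be well formed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class WrapRootExample {

    /** Parses a rootless sequence of elements by surrounding it with a synthetic root. */
    static Document parseFragments(InputStream rootless) throws Exception {
        InputStream open = new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        // SequenceInputStream reads each stream in turn: prefix, content, suffix.
        try (InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, rootless, close)))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(wrapped);
        }
    }
}
```

The original top-level elements end up as children of the synthetic wrapper element, so code walking the tree has to start one level down.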
Your XML document needs a root xml element to be considered well formed. Without this you will not be able to parse it with an xml parser.
One way is to provide your own dummy wrapper document, without touching the original (not well-formed) 'xml', and pull the original in through an external entity:
Syntax
<!DOCTYPE some_root_elem SYSTEM "/home/ego/some.dtd"
[
<!ENTITY entity-name "Some value to be inserted at the entity">
]>
Example:
<!DOCTYPE dummy [
<!ENTITY data SYSTEM "http://wherever-my-data-is">
]>
<dummy>
&data;
</dummy>
You could use another parser like Jsoup. It can parse XML without a root.
I think even if any API would have an option for this, it will only return you the first node of the "XML" which will look like a root and discard the rest.
So the answer is probably to do it yourself. Scanner or StringTokenizer might do the trick.
Maybe some html parsers might help, they are usually less strict.
Here's what I did:
There's an old java.io.SequenceInputStream class, which is so old that it takes Enumeration rather than List or such.
With it, you can prepend and append the root element tags (<div> and </div> in my case) around your no-root XML stream. (You shouldn't do it by concatenating Strings due to performance and memory reasons.)
public void tryExtractHighestHeader(ParserContext context)
{
String xhtmlString = context.getBody();
if (xhtmlString == null || "".equals(xhtmlString))
return;
// The XHTML needs to be wrapped, because it has no root element.
ByteArrayInputStream divStart = new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream divEnd = new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream is = new ByteArrayInputStream(xhtmlString.getBytes(StandardCharsets.UTF_8));
// java.util.Collections.enumeration adapts the list without needing a third-party IteratorEnumeration
Enumeration<InputStream> streams = Collections.enumeration(Arrays.<InputStream>asList(divStart, is, divEnd));
try (SequenceInputStream wrapped = new SequenceInputStream(streams);) {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(wrapped);
// From here you can do whatever you like, but keep in mind the extra element.
XPath xPath = XPathFactory.newInstance().newXPath();
}
catch (Exception e) {
// Pass the original exception along as the cause so the stack trace is not lost
throw new RuntimeException("Failed parsing XML: " + e.getMessage(), e);
}
}