The markup must be well-formed - java

First off, let me say I am new to SAX and Java.
I am trying to read information from an XML file that is not well-formed.
When I try to use the SAX or DOM parser I get the following error in response:
The markup in the document following the root element must be well-formed.
This is how I set up my XML file:
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
...
Can I force the SAX or DOM parser to parse XML files even if they are not well-formed XML?
Thank you for your help. Much appreciated.
Haythem

Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can achieve that simply by putting an XML declaration on (and even that's optional) and providing a root element (which is not optional), like this:
<?xml version="1.0"?>
<wrapper>
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
</wrapper>
There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
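If you cannot (or would rather not) edit the file itself, the same wrapping can be done on the fly while reading it. The sketch below assumes the file contains only the root-less `<format>` fragment shown in the question, with no XML declaration of its own (a declaration in the middle of the stream would itself be an error). It chains `<wrapper>`, the file contents and `</wrapper>` with `SequenceInputStream` and hands the result to the JDK's DOM parser. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WrapFragment {
    // Surrounds a root-less fragment with <wrapper>...</wrapper> while it is read,
    // so the parser sees a single root element without the file being modified.
    static InputStream wrap(InputStream fragment) {
        InputStream open = new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(new SequenceInputStream(open, fragment), close);
    }

    // Parses the wrapped stream and returns "type=text" for every <format> element.
    static List<String> formatTexts(InputStream fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(wrap(fragment));
        NodeList formats = doc.getElementsByTagName("format");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < formats.getLength(); i++) {
            Element e = (Element) formats.item(i);
            result.add(e.getAttribute("type") + "=" + e.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String fragment =
                "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
              + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        InputStream in = new ByteArrayInputStream(fragment.getBytes(StandardCharsets.UTF_8));
        for (String s : formatTexts(in)) {
            System.out.println(s);
        }
    }
}
```

In real code you would pass a `FileInputStream` for your file instead of the in-memory string.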

Hint: using SAX or StAX you can successfully parse a non-well-formed XML document up to the FIRST well-formedness error encountered.
(I know that this is not of too much help...)
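To make the hint concrete, here is a minimal SAX sketch (the class and method names are hypothetical). It collects the text of each `<format>` element and catches the `SAXParseException` that the parser raises when it meets the second top-level `<format>`. Everything reported before that point is kept, which for the question's input means only the first element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PartialSax {
    // Returns the text of every <format> element reported before the first fatal error.
    static List<String> readUntilError(String xml) throws Exception {
        List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("format")) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("format") && current != null) {
                    texts.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows the error; the events already
            // delivered to the handler are still stored in 'texts'.
            System.err.println("Stopped at line " + e.getLineNumber() + ": " + e.getMessage());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
                   + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        System.out.println(readUntilError(xml));
    }
}
```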

The DOM parser scans your XML file and then builds a tree; the root node of that tree corresponds to the wrapper element in the first answer. If the parser can't find a single root element, it can't build the tree at all. So it's better to do some pre-processing on the XML file before parsing it with DOM or SAX.

Related

xml parsing using sax

I want to parse an XML file using SAXParser. I'm trying to fetch the data associated with a tag or its attributes. The XML is in the following format.
<con>
<fig>
<abc>
<name xyz="">
<id>2</id>
</name>
</abc>
</fig>
</con>
I tried a couple of examples but did not succeed in fetching the data. Could you give me any suggestions or a working example to improve my knowledge of parsing with SAX?
Use a DOM parser, as your XML is very small and it's easy to get what you are asking for.
There are many examples and plenty of help available online; please search for your question before posting on Stack Overflow next time.
Please check the following links, which have good examples of SAX parsing:
How to read XML file in Java – (SAX Parser)
Getting Attributes And its Value
Hope this helps.
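For reference, a minimal SAX handler for the XML in the question might look like the sketch below. It assumes you want the `xyz` attribute of `<name>` and the text of `<id>`; the `Result` holder class is a hypothetical name. Because the default `SAXParserFactory` is not namespace-aware, the element name is read from `qName` rather than `localName`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxIdExample {
    static final String XML =
            "<con>\n<fig>\n<abc>\n<name xyz=\"\">\n<id>2</id>\n</name>\n</abc>\n</fig>\n</con>";

    // Simple holder for the values pulled out of the document.
    static final class Result {
        String xyz;
        String id;
    }

    static Result parse(String xml) throws Exception {
        Result result = new Result();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting fresh text for this element
                if (qName.equals("name")) {
                    result.xyz = atts.getValue("xyz"); // attribute value, "" here
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("id")) {
                    result.id = text.toString().trim();
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = parse(XML);
        System.out.println("id=" + r.id + ", xyz='" + r.xyz + "'");
    }
}
```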

XML parsing /JAVA

I have an XML document, for example:
<root>
<config x="xxx" y="yyyy" z="zzz" />
<properties>blah blah blah </properties>
<example>
<name>...</name>
<descr>...</descr>
</example>
<example>
<name>...</name>
<descr>...</descr>
</example>
</root>
and I need to get the config and properties nodes and all the values in them.
Thank you
XPath can fetch the data in the config tag for you. You need to create an expression first, like this:
expression="//root/config/@x", to get the value of x (and likewise @y and @z).
For properties, follow this thread :
Parsing XML with XPath in Java
Hope this helps
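Here is a small sketch of the XPath approach using only the JDK's `javax.xml.xpath` API. It assumes the corrected sample above; `@*` selects every attribute of `<config>`, so x, y and z need not be listed one by one. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathConfig {
    static final String XML = "<root>"
            + "<config x=\"xxx\" y=\"yyyy\" z=\"zzz\" />"
            + "<properties>blah blah blah </properties>"
            + "<example><name>a</name><descr>b</descr></example>"
            + "</root>";

    // Returns every attribute of <config> plus the text of <properties>.
    static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();

        NodeList attrs = (NodeList) xpath.evaluate("/root/config/@*", doc, XPathConstants.NODESET);
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            values.put(attr.getNodeName(), attr.getNodeValue());
        }
        // evaluate(String, Object) returns the string value, i.e. the element's text
        values.put("properties", xpath.evaluate("/root/properties", doc));
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read(XML));
    }
}
```

A single value can also be fetched directly with `xpath.evaluate("//root/config/@x", doc)`.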
DOM, DOM4J, SAX...
If the XML file is small, you can use DOM or DOM4J; if it is large, use SAX.
If you want to query or fetch data directly, XPath can help; but if you want the data as Java objects so that you can use it further, use JAXB.
You can use a SAX parser to read the XML, but not to manipulate it; its parsing is event-based and consumes less memory.
If your XML requires a lot of manipulation (and fits in memory), go for DOM or DOM4J; either is good. DOM4J is a third-party library, while DOM is part of the JDK and widely used in industry.
Choose the parser that fits your requirements.
Thanks,
Pavan

XML Parent & child Attributes reading in java?

I have the following data in my XML file.
<main>
<Team Name="Development" ID="10">
<Emp Source="Business" Total="130" Active="123" New="12" />
<Emp Source="Business" Total="131" Active="124" New="13" />
</Team>
<Team Name="Testing" ID="10">
<Emp Source="Business" Total="133" Active="125" New="14" />
</Team>
</main>
I want to read the above data and store the values in arrays. Can anyone help with this?
I'm not sure why you need to convert that XML into arrays, but anyhow, you can read and parse XML in several ways. Normally we use a DOM or StAX parser (a tutorial link is here); there is also a Java SAX Parsing Example tutorial.
Hope this helps you achieve your goal. Update your question if you get stuck anywhere.
You can use a parser in Java to parse the XML document. The Java package for this purpose is javax.xml.parsers. DocumentBuilder parses XML into a Document, which is a tree-structured DOM (Document Object Model) representation of the file. Its nodes can be traversed, changed and accessed with DOM methods.
Here is a very good tutorial on XML DOM: http://www.roseindia.net/xml/dom/
and more specifically: http://www.roseindia.net/xml/dom/accessing-xml-file-java.shtml
You can also always refer to W3Schools for more theory on DOM.
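A minimal DOM sketch for the Team/Emp data, assuming one array row per `<Emp>` that also carries its parent team's Name and ID. The row layout and class name are illustrative choices, not something the question specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TeamReader {
    static final String XML = "<main>"
            + "<Team Name=\"Development\" ID=\"10\">"
            + "<Emp Source=\"Business\" Total=\"130\" Active=\"123\" New=\"12\" />"
            + "<Emp Source=\"Business\" Total=\"131\" Active=\"124\" New=\"13\" />"
            + "</Team>"
            + "<Team Name=\"Testing\" ID=\"10\">"
            + "<Emp Source=\"Business\" Total=\"133\" Active=\"125\" New=\"14\" />"
            + "</Team>"
            + "</main>";

    // Each row: {team Name, team ID, Source, Total, Active, New}.
    static String[][] readEmps(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String[]> rows = new ArrayList<>();
        NodeList teams = doc.getElementsByTagName("Team");
        for (int i = 0; i < teams.getLength(); i++) {
            Element team = (Element) teams.item(i);
            // Only the <Emp> descendants of this particular <Team>
            NodeList emps = team.getElementsByTagName("Emp");
            for (int j = 0; j < emps.getLength(); j++) {
                Element emp = (Element) emps.item(j);
                rows.add(new String[] {
                    team.getAttribute("Name"), team.getAttribute("ID"),
                    emp.getAttribute("Source"), emp.getAttribute("Total"),
                    emp.getAttribute("Active"), emp.getAttribute("New")
                });
            }
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : readEmps(XML)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

The values are kept as strings; use `Integer.parseInt` on the Total/Active/New columns if you need numeric arrays.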

Removing nodes with invalid tag names from a xml document

I transform XML with the Saxon XSLT 2.0 processor (using Java and the Saxon S9API) and have to deal with source documents that contain invalid characters in tag names and therefore can't be parsed by the DocumentBuilder.
Example:
<A>
<B />
<C>
<D />
</C>
<E!_RANDOM_ />
< />
</A>
Code:
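import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.*;
[...]
/* XSLT processor & compiler (false = no licensed-edition features) */
Processor proc = new Processor(false);
/* Build a document tree from the input; this is the step that fails on non-well-formed input */
XdmNode source = proc.newDocumentBuilder().build(new StreamSource(input));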
Error:
Error on line X column Y
SXXP0003: Error reported by XML parser: Element type
"E" must be followed by either attribute specifications, ">" or "/>".
The exclamation mark and the tag name consisting only of a space are currently my only invalid tags.
I am looking for a more robust solution than just removing whole lines of the (formatted) XML.
With some mind-bending I could come up with a regular expression to identify the invalid strings, but would struggle with the removal of the nodes containing attributes and child-nodes.
Thank you for your help!
If the input contains invalid tags then it is not XML. It's best to get your mind-set right by referring to these as non-XML documents rather than XML documents; that helps to make it clear that to process non-XML documents, you need non-XML tools. (Forget about "nodes" - there are no nodes until the document has been parsed, and it can't be parsed until you have turned it into well-formed XML). To turn non-XML into XML, you will typically want to use non-XML tools that are good at text manipulation, such as Perl. Of course, it's much better to fix the problem at source: all the benefits of XML are lost if people generate data in private non-XML formats.
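As one illustration of such text-level pre-processing, done in Java rather than Perl, the sketch below uses a regular expression to delete the two kinds of bad tags shown in the question. It is deliberately narrow: it only handles self-closing tags with no attributes whose "name" is empty (`< />`) or contains `!` (`<E!_RANDOM_ />`). Bad elements that have attributes or child content would need a real hand-written scanner, not a regex. The demo checks the result with the JDK's DOM parser; the cleaned string could equally be passed to Saxon through a `StreamSource` over a `StringReader`.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StripInvalidTags {
    // Matches "<" + whitespace + "/>"  (e.g. "< />"), or
    // "<" + a name containing '!' + optional whitespace + "/>" (e.g. "<E!_RANDOM_ />").
    // Tags with attributes, and non-empty elements, are NOT handled.
    private static final Pattern BAD_EMPTY_TAG =
            Pattern.compile("<(?:\\s+|[^\\s<>/]*![^\\s<>/]*\\s*)/>");

    static final String SAMPLE =
            "<A>\n<B />\n<C>\n<D />\n</C>\n<E!_RANDOM_ />\n< />\n</A>\n";

    // Text-level clean-up, applied before any XML parser sees the input.
    static String clean(String text) {
        return BAD_EMPTY_TAG.matcher(text).replaceAll("");
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String cleaned = clean(SAMPLE);
        System.out.println(cleaned);
        System.out.println("Root: " + parse(cleaned).getDocumentElement().getNodeName());
    }
}
```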

XML parsing with java dom parser converts xml declaration into elements

Here is the xml file that I need to process as part of my assignment.
<?xml-stylesheet type="text/xsl" href="people.xsl"?>
<People>
<Person>
...
</Person>
...
</People>
I am using javax.xml.parsers.DocumentBuilderFactory to create a DOM parser. After parsing, the resulting document does not have People at the root; instead, the root has two children: xml-stylesheet and People.
Looks like this can be avoided.
<?xml-stylesheet ... ?> is not an XML declaration. It is a Processing Instruction (PI), and the DOM spec says that a Document node may contain zero or more of them.
One approach would be to code your application to deal appropriately with (e.g. ignore) any PIs in the Document node. Alternatively, just use the Document node's documentElement attribute (the getDocumentElement() getter) to get the root Element directly.
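A short sketch of both suggestions, using the stylesheet PI from the question and a trimmed-down `<People>` element. `documentChildren` lists the Document node's direct children so you can see that the PI sits beside `<People>` rather than above it, while `getDocumentElement()` returns `<People>` directly. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class RootElementDemo {
    static final String XML =
            "<?xml-stylesheet type=\"text/xsl\" href=\"people.xsl\"?>\n"
          + "<People><Person/><Person/></People>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Describes the Document node's direct children, keeping only PIs and elements.
    static List<String> documentChildren(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                out.add("PI:" + ((ProcessingInstruction) n).getTarget());
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add("ELEMENT:" + n.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(documentChildren(doc));
        // The root element, skipping any PIs, comments or doctype at document level
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```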
