HTTP charset vs XML encoding (UTF-8, UTF-16, etc.) - Java

Which one should I use to parse the XML file? What is the recommended approach for parsing XML received over HTTP? My approach is to read the XML as a String and use DocumentBuilder to parse that String.
Is this the right approach?
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
Document doc = null;
InputSource is = null;
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
is = new InputSource(new StringReader(xmlString));
doc = dBuilder.parse(is);

XML specifies its own encoding in the XML declaration, <?xml version="1.0" encoding="..."?>, defaulting to UTF-8 when neither a declaration nor a byte order mark says otherwise.
Parsing a String through a StringReader assumes the bytes have already been decoded with some guessed encoding. That is less advisable than handing the parser the raw bytes, such as a File or InputStream, so that it can apply the declared encoding itself.
Another factor is the document base, which is used to find included documents, XSDs and DTDs. An XML catalog can help there by storing such files offline.
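A minimal sketch of the byte-level approach, assuming the HTTP response body is available as an InputStream (a ByteArrayInputStream stands in for it here; the class name, method names and sample payload are illustrative, not from the question):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ByteLevelParse {
    /** Hands raw bytes to the parser so it can apply the declared encoding itself. */
    static Document parseBytes(InputStream body) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(body);
    }

    /** Illustrative payload: declares ISO-8859-1 and really is encoded that way. */
    static byte[] sampleBody() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<greeting>caf\u00e9</greeting>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseBytes(new ByteArrayInputStream(sampleBody()));
        // The accented character survives because no String decoding happened first.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the body is never turned into a String first, no charset has to be guessed before the parser sees the declaration.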

Related

Can I parse XML in Java without taking XML file input from outside?

Generally, when using a DOM, SAX or XPath parser, we take the input from outside the Java code, like this:
File inputFile = new File("C:\\Users\\DELL\\Desktop\\catalog.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
So can you parse XML without taking the input from a file like this? I want to write my XML alongside the Java code.
Use DocumentBuilder.parse(new InputSource(new StringReader(xml))) where xml is a string containing the XML to be parsed.
That's if you really must use DOM. I can't imagine why anyone uses it any more, when alternatives such as JDOM2 are so much better.
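As a rough sketch of keeping the XML alongside the Java code, the example below stores a small document in a string constant and reads values out of it with DOM. The catalog content, class name and method name are made up for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InlineXml {
    // Hypothetical catalog, written directly in the source instead of catalog.xml.
    static final String CATALOG =
            "<catalog>"
            + "<book id=\"b1\"><title>First</title></book>"
            + "<book id=\"b2\"><title>Second</title></book>"
            + "</catalog>";

    /** Parses the given XML text and returns the text of every <title> element. */
    static List<String> titles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(CATALOG));
    }
}
```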

Why using InputSource fixes SAX parser when file contains special UTF-8 characters

I'm looking to get an explanation on why my SAX parser fails when some special UTF-8 characters are inside my XML file.
Parsing the file directly fails: Document doc = builder.parse(file);
However, when I wrap the file in an InputSource it works fine:
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream in = new FileInputStream(file);
InputSource inputSource = new InputSource(new InputStreamReader(in));
Document doc = builder.parse(inputSource);
I don't quite understand why the latter works. I've seen examples of it being used, but none explain why it works.
Does the second parse a string rather than a file, therefore the encoding will be UTF-8?
I suspect your document isn't really in the encoding you've declared. This line:
InputSource inputSource = new InputSource(new InputStreamReader(in));
will use the platform default encoding to convert the binary data into text within InputStreamReader. The XML parser doesn't get to do it any more - it doesn't get to see the raw bytes.
If this is working, your XML file is probably subtly bust - it may be declaring that it's in UTF-8, but using the platform default encoding (e.g. Windows-1252). Rather than use the workaround, you should fix the XML if you have any choice about it.
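The effect can be reproduced with a small sketch. It assumes a document that declares UTF-8 but was actually saved as ISO-8859-1 (standing in for the platform default), and all names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MislabelledEncoding {
    /** Declares UTF-8, but the bytes are really ISO-8859-1 (assumed broken file). */
    static byte[] mislabelledBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Ren\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** The parser decodes the raw bytes itself, trusting the declaration. */
    static boolean parsesFromRawBytes(byte[] data) throws Exception {
        try {
            newBuilder().parse(new ByteArrayInputStream(data));
            return true;
        } catch (IOException | SAXException e) {
            return false; // the byte sequence is not valid UTF-8
        }
    }

    /** The Reader decodes first, so the parser never sees the raw bytes. */
    static String parseViaReader(byte[] data) throws Exception {
        InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
        return newBuilder().parse(source).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = mislabelledBytes();
        System.out.println("raw bytes parse: " + parsesFromRawBytes(data));
        System.out.println("via reader: " + parseViaReader(data));
    }
}
```

The Reader variant only succeeds because the charset it decodes with happens to match how the file was really saved, which is why fixing the file is the better option.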

Writing XML according to a DTD

I would like to know if there is a way (particularly, an API) in Java to write XML in a SAX-like way (i.e., event-driven, unlike JDOM, which I cannot use) that takes a DTD and guarantees that my XML document is written correctly.
I have been using SAX for parsing and have written an XML writer layer myself, as if I were writing a plain file (through OutputStreamWriter), but I have seen that my writer layer does not always follow the DTD rules.
SAX does not know how to write XML documents; it is intended for parsing them. So you can choose any method you want to create the document and then validate it against the DTD using the SAX API.
BTW, may I ask why you are limiting yourself to tools that were almost obsolete about 10 years ago? Why not use a higher-level API that converts objects to XML and vice versa? For example, JAXB.
The standard DocumentBuilder approach can validate for you.
This snippet is taken from http://www.edankert.com/validate.html#Validate_using_internal_DTD
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Enables DTD validation against the DOCTYPE declared in the document.
factory.setValidating(true);
factory.setNamespaceAware(true);
// Adds XSD validation on top; omit these two statements if you only have a DTD.
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource("contacts.xsd")}));
DocumentBuilder builder = factory.newDocumentBuilder();
// SimpleErrorHandler is not a JDK class; supply your own org.xml.sax.ErrorHandler.
builder.setErrorHandler(new SimpleErrorHandler());
Document document = builder.parse(new InputSource("document.xml"));
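For the DTD-only case, a self-contained sketch might look like the following. It assumes the generated XML carries an internal DTD subset. The DTD, the element names and the error-collecting handler are invented for illustration and stand in for SimpleErrorHandler:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Hypothetical internal DTD: a contacts element holding any number of contact elements.
    static final String DTD =
            "<!DOCTYPE contacts ["
            + "<!ELEMENT contacts (contact*)>"
            + "<!ELEMENT contact (#PCDATA)>"
            + "]>";

    /** Parses with DTD validation on and returns every reported problem. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // Validity violations arrive here; collect them instead of aborting.
                problems.add("error: " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: parsing cannot continue
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<contacts><contact>Ann</contact></contacts>"));
        System.out.println(validate(DTD + "<contacts><person>Ann</person></contacts>"));
    }
}
```

Running the writer's output through a check like this after generation is one way to catch the cases where a hand-written writer layer drifts from the DTD.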

Java XML parser?

I'm currently converting a program I wrote in Visual Basic .NET (the 2005 variety) into Java. It used built-in XML methods to parse and generate the user's saved data. Does Java have an equivalent feature built in, or am I going to have to change file processing implementations? (I'd rather not, there's a lot of code I'd have to change.)
Yes, Java can parse XML. Here's an example that takes in a String that contains XML and builds a Document object out of it:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xml));
Document document = documentBuilder.parse(inputSource);
You can then use the XPath API to query the DOM.
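A short sketch of such a query using the JDK's javax.xml.xpath package. The settings document and the expressions are hypothetical stand-ins for the user's saved data:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQuery {
    // Hypothetical saved data.
    static final String SETTINGS =
            "<settings>"
            + "<user name=\"alice\"><score>42</score></user>"
            + "<user name=\"bob\"><score>7</score></user>"
            + "</settings>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression and returns the string value of the first match. */
    static String firstValue(Document document, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    /** Evaluates the expression as a node set and returns how many nodes matched. */
    static int count(Document document, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(SETTINGS);
        System.out.println(firstValue(document, "/settings/user[@name='bob']/score"));
        System.out.println(count(document, "//user"));
    }
}
```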
As far as serializing objects to XML, the official implementation is JAXB. It was bundled with Java SE 6 through 10; from Java 11 on it has to be added as a separate dependency. It will let you serialize and deserialize to and from XML.
You can also create a DOM object manually and add nodes to it, but it's a little more tedious:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();
Element rootNode = document.createElement("root");
Element childNode = document.createElement("child");
childNode.setTextContent("I am a child node");
childNode.setAttribute("attr", "value");
rootNode.appendChild(childNode);
document.appendChild(rootNode);
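To turn a hand-built DOM like this back into text (for saving the data), one option is the JDK's javax.xml.transform API. The sketch below rebuilds the same tree and serializes it; the class and method names are illustrative:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToString {
    /** Builds the same root/child tree as the snippet above. */
    static Document buildDocument() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element rootNode = document.createElement("root");
        Element childNode = document.createElement("child");
        childNode.setTextContent("I am a child node");
        childNode.setAttribute("attr", "value");
        rootNode.appendChild(childNode);
        document.appendChild(rootNode);
        return document;
    }

    /** Serializes the DOM with an identity transform. */
    static String serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header so only the element markup is returned.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildDocument()));
    }
}
```

Passing a StreamResult that wraps a File or OutputStream instead of a StringWriter writes the document straight to disk.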
I'm assuming you mean that the properties/structure were generated from the classes/beans themselves? If so, then the answer is no [not without a third-party component]. I've used XStream before, and that is about the closest I've gotten to .NET's XML class serialization.

how to parse XML document?

I have an XML document in a variable (not in a file). How can I get the data stored in it? I don't have any additional file for it; I have it 'inside' my source code. When I use
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(XML);
(XML is my xml variable), I get an error:
java.io.FileNotFoundException: C:\netbeans\app-s7013\<network ip_addr="10.0.0.0\8" save_ip="true"> File not found.
Read your XML into a StringReader, wrap it in an InputSource, and pass that to your DocumentBuilder:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
Assuming that XML is a String, don't be confused by the parse overload that takes a String: that string is a URI pointing at the input, not the input itself!
What you need is the overload that takes an InputStream or an InputSource.
You need to create one of those from your string (I'll try to find a code sample, but you can Google for that). Usually a StringReader is involved, wrapped in an InputSource.
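The input stream variant can be sketched as follows, assuming the string either has no XML declaration or declares UTF-8. The sample element is a well-formed stand-in for the one in the error message, and the names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class StringToStream {
    /** Encodes the string as UTF-8 bytes and parses those instead of treating it as a URI. */
    static Document parse(String xml) throws Exception {
        // Assumption: xml has no declaration or declares UTF-8, matching the bytes produced here.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<network ip_addr=\"10.0.0.0/8\" save_ip=\"true\"/>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getAttribute("save_ip"));
    }
}
```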
