SAX Parsing in Java - java

I need to parse an XML document from a URL in Java with a SAX parser. I couldn't find an example of this online; all the examples I found read the XML from a local file. Is there an example that parses XML with nested tags from a URL in Java?

Refer to this example Java snippet:
// handler is assumed to be your own org.xml.sax.helpers.DefaultHandler subclass
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
String webServiceURL = "web service url or document url here";
URL geoLocationDetailXMLURL = new URL(webServiceURL);
URLConnection geoLocationDetailXMLURLConnection = geoLocationDetailXMLURL.openConnection();
// Timeouts are in milliseconds (120000 ms = 2 minutes)
geoLocationDetailXMLURLConnection.setConnectTimeout(120000);
geoLocationDetailXMLURLConnection.setReadTimeout(120000);
BufferedReader geoLocationDetails = new BufferedReader(new InputStreamReader(geoLocationDetailXMLURLConnection.getInputStream(), "UTF-8"));
InputSource inputSource = new InputSource(geoLocationDetails);
saxParser.parse(inputSource, handler);

This should help:
SAX parser and a file from the network
The important line is
xr.parse(new InputSource(sourceUrl.openStream()));
where sourceUrl is a java.net.URL (not a String; String has no openStream() method).
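To make the two answers above concrete, here is a minimal sketch of SAX parsing from a URL with nested tags. The class name UrlSaxDemo, the PathHandler helper and the sample XML are all made up for illustration; the demo points the URL at a temporary local file so it runs without network access, but an http:// URL goes through the same openConnection() path. The handler is deliberately simple and only records elements that directly contain text.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class UrlSaxDemo {

    /** Hypothetical handler: records "path=text" for each element that directly holds text. */
    static class PathHandler extends DefaultHandler {
        final List<String> results = new ArrayList<>();
        private final List<String> stack = new ArrayList<>();   // currently open elements
        private final StringBuilder text = new StringBuilder(); // text of the current element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            stack.add(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                results.add(String.join("/", stack) + "=" + value);
            }
            stack.remove(stack.size() - 1);
            text.setLength(0);
        }
    }

    /** Opens the URL, feeds the byte stream to a SAX parser and returns what the handler collected. */
    static List<String> parse(URL url) throws Exception {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        PathHandler handler = new PathHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try (InputStream in = conn.getInputStream()) {
            // Passing bytes (not a Reader) lets the parser honour the encoding in the XML declaration.
            InputSource source = new InputSource(in);
            source.setSystemId(url.toExternalForm());
            parser.parse(source, handler);
        }
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Dune</title>"
                + "<author><name>Herbert</name></author></book></library>";
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
        // Prints [library/book/title=Dune, library/book/author/name=Herbert]
        System.out.println(parse(tmp.toUri().toURL()));
        Files.delete(tmp);
    }
}
```

For a real service, replace the temporary file URL with something like new URL("http://example.com/data.xml") (a placeholder address).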

Related

SAX error: incompatible types: String cannot be converted to InputSource

Relevant code; barfs on instantiating the SAXSource:
TransformerFactory factory = TransformerFactory.newInstance();
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(xmlReader, "http://books.toscrape.com/");
Result output = new StreamResult(System.out);
factory.newTransformer().transform(input, output);
The Javadoc says:
public SAXSource(XMLReader reader,
InputSource inputSource)
Create a SAXSource, using an XMLReader and a SAX InputSource. The
Transformer or SAXTransformerFactory will set itself to be the
reader's ContentHandler, and then will call reader.parse(inputSource).
Looking at InputSource shows:
InputSource(InputStream byteStream)
Create a new input source with a byte stream.
InputSource(Reader characterStream)
Create a new input source with a character stream.
So would this entail, for example, a character stream that reads in the HTML for the InputSource?
Would TagSoup be better used for this identity transform? If so, how?
There is a constructor https://docs.oracle.com/javase/8/docs/api/org/xml/sax/InputSource.html#InputSource-java.lang.String- that takes a system id e.g. a URL so you can use Source input = new SAXSource(xmlReader, new InputSource("http://books.toscrape.com/"));.
Alternatively, you can open an InputStream on the resource behind the URL like this:
InputStream i = new URL("http://...").openConnection().getInputStream();
Then wrap i in an InputSource for your SAXSource, e.g. new SAXSource(xmlReader, new InputSource(i)).
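As a rough sketch of the identity transform discussed above: the helper below takes any XMLReader plus an InputSource and copies the parsed document to a string. The class name SaxSourceDemo and the sample markup are assumptions for illustration, and the demo uses the JDK's built-in XML reader on well-formed input so that it runs without extra libraries. With TagSoup on the classpath, the reader from the question and new InputSource("http://books.toscrape.com/") could be passed in instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxSourceDemo {

    /** Runs an identity transform: the reader parses the source, the transformer re-serializes it. */
    static String identity(XMLReader reader, InputSource in) throws Exception {
        // newTransformer() with no stylesheet yields the identity transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // SAXSource needs an InputSource, not a bare String or InputStream.
        transformer.transform(new SAXSource(reader, in), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // An InputSource can wrap a Reader, an InputStream, or a system ID (URL string).
        InputSource in = new InputSource(new StringReader("<html><body><p>hi</p></body></html>"));
        System.out.println(identity(reader, in));
    }
}
```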

How to convert InputStream to InputSource?

ALL,
I wrote a simple SAX XML parser. It works, and I was testing it with a local XML file. Here is my code:
SAXParserFactory spf = SAXParserFactory.newInstance();
XMLParser xmlparser = null;
try
{
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
xmlparser = new XMLParser();
reader.setContentHandler( xmlparser );
reader.parse( new InputSource( getResources().openRawResource( R.raw.categories ) ) );
Now I need to read this XML file from the website. The code I'm trying is:
public InputStream getXMLFile()
{
URL url = new URL("http://example.com/test.php?param=0");
InputStream stream = url.openStream();
Document doc = docBuilder.parse(stream);
}
reader.parse( new Communicator().getXMLFile() );
I'm getting the compiler error
"The method parse(InputSource) is not applicable for the argument (InputStream)".
I need help figuring out what I need to change.
Thank you.
While I hate to sound obvious, is there any reason you're not using this constructor?
InputSource source = new InputSource(stream);
Document doc = docBuilder.parse(source);
Note that that's very similar to what you're doing in the first section of code. After all, openRawResource returns an InputStream as well...
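A small sketch of the same idea with the SAX reader from the question: whatever produced the InputStream (a raw resource, url.openStream(), or the in-memory bytes used here so the demo runs offline), wrapping it in new InputSource(stream) is what makes reader.parse accept it. The class name StreamToSourceDemo and the element-counting handler are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StreamToSourceDemo {

    /** Parses the stream with SAX and returns the number of elements seen. */
    static int countElements(InputStream stream) throws Exception {
        final int[] count = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        // XMLReader.parse takes an InputSource, so wrap the InputStream.
        reader.parse(new InputSource(stream));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<categories><category/><category/></categories>"
                .getBytes(StandardCharsets.UTF_8);
        // In the question this stream would come from url.openStream().
        System.out.println(countElements(new ByteArrayInputStream(xml))); // prints 3
    }
}
```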

How to switch java code from using local XML file to URL of an XML file

I have written a web application that reads an XML file, parses it, and does some work.
Rather than using a local file, I'd like to use the URL of the XML file (something like http://mydomain.com/daily-extract.xml).
This is what my code looks like:
private String xmlFile = "D:\\default-user\\WINXP\\Desktop\\extract-jan10d.xml";
SAXBuilder builder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
// Parse the specified file and convert it to a JDOM document
document = builder.build(new File(xmlFile));
Element root = document.getRootElement();
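How can I switch from a file to a URL on the internet?
Try replacing this line:
document = builder.build(new File(xmlFile));
with:
document = builder.build(new URL("http://mydomain.com/daily-extract.xml"));
SAXBuilder has a build(java.net.URL) overload. Wrapping an http URI in a File does not work, because new File(URI) only accepts file: URIs.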

Java XML parsing error : Content is not allowed in prolog

My code writes an XML file with the LSSerializer class:
DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS","3.0");
LSSerializer ser = implLS.createLSSerializer();
String str = ser.writeToString(doc);
System.out.println(str);
String file = racine+"/"+p.getNom()+".xml";
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
out.write(str);
out.close();
The XML is well-formed, but when I parse it, I get an error.
Parse code :
File f = new File(racine+"/"+filename);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(f);
XPathFactory xpfactory = XPathFactory.newInstance();
XPath xp = xpfactory.newXPath();
String expression;
expression = "root/nom";
String nom = xp.evaluate(expression, doc);
The error :
[Fatal Error] Terray.xml:1:40: Content is not allowed in prolog.
9 août 2011 19:42:58 controller.MakaluController activatePatient
GRAVE: null
org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
at model.MakaluModel.setPatientActif(MakaluModel.java:147)
at controller.MakaluController.activatePatient(MakaluController.java:59)
at view.ListePatientsPanel.jButtonOKActionPerformed(ListePatientsPanel.java:92)
...
Now, with some research, I found that this error is due to a "hidden" character at the very beginning of the XML.
In fact, I can fix the bug by creating an XML file manually.
But where is the error in the XML writing ? (When I try to println the string, there is no space before ths
Solution: change the serializer
I used the UTF-16 encoding solution for a while, but it was not very stable.
So I found a new solution: change the serializer of the XML document so that the encoding declared in the XML header is consistent with the file encoding:
DOMSource domSource = new DOMSource(doc);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
String file = racine+"/"+p.getNom()+".xml";
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT,"yes");
transformer.transform(domSource, new StreamResult(out));
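A minimal round-trip sketch of this approach, assuming a tiny root/nom document like the one queried in the question. The class name Utf8RoundTripDemo and its helper methods are hypothetical. One deliberate difference from the snippet above: the Transformer writes to a byte stream rather than a Writer, so it alone decides how characters become bytes and the declared encoding cannot drift from the real one.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Utf8RoundTripDemo {

    /** Builds <root><nom>...</nom></root> in memory (illustrative data only). */
    static Document sample(String nom) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        Element child = doc.createElement("nom");
        child.setTextContent(nom);
        root.appendChild(child);
        doc.appendChild(root);
        return doc;
    }

    /** Serializes the document as UTF-8; header and bytes are both produced by the Transformer. */
    static void write(Document doc, Path file) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream out = Files.newOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    /** Parses the file again and evaluates the XPath expression from the question. */
    static String readNom(Path file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file.toFile());
        return XPathFactory.newInstance().newXPath().evaluate("root/nom", doc);
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("patient", ".xml");
        write(sample("H\u00e9l\u00e8ne"), file); // non-ASCII text exercises the encoding
        System.out.println(readNom(file));
        Files.delete(file);
    }
}
```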
But where is the error in the XML writing ?
It looks like the error is not in the writing but in the parsing. As you have already discovered, there is a blank character at the beginning of the file, which causes the error in the parse call in your stack trace:
Document doc = builder.parse(f);
The reason you do not see the space when you print it out may be simply the encoding you are using. Try changing this line:
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
to use 'UTF-16' or 'US-ASCII'
I think it is probably linked to a BOM (Byte Order Mark). See Wikipedia.
You can verify this with Notepad++, for example: open your file and check the "Encoding" menu to see whether it is "UTF-8 without BOM" or "UTF-8 with BOM".
Using UTF-16 is the way to go,
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(fileName),"UTF-16");
This can read the file with no issues
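Try this code, which tells the parser the encoding explicitly (the second argument of builder.parse(InputStream, String) is a system ID, not an encoding, so use an InputSource instead):
InputSource is = new InputSource(new FileInputStream(file));
is.setEncoding("UTF-8");
Document doc = builder.parse(is);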

Writing XML in different character encodings with Java

I am attempting to write an XML library file that can be read again into my program.
The file writer code is as follows:
XMLBuilder builder = new XMLBuilder();
Document doc = builder.build(bookList);
DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();
String out = ser.writeToString(doc);
//System.out.println(out);
try{
FileWriter fstream = new FileWriter(location);
BufferedWriter outwrite = new BufferedWriter(fstream);
outwrite.write(out);
outwrite.close();
}catch (Exception e){
}
The above code does write an xml document.
However, the XML header contains an attribute declaring that the file is encoded in UTF-16.
When I read in the file, I get the error:
"content not allowed in prolog"
This error does not occur when the encoding attribute is manually changed to UTF-8.
I am trying to get the above code to write an XML document encoded in UTF-8, or to successfully parse a UTF-16 file.
The code for parsing it in is
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder loader = factory.newDocumentBuilder();
Document document = loader.parse(filename);
The last line throws the error.
The LSSerializer writeToString method does not let you pick an encoding.
Instead, call setEncoding on an LSOutput instance and pass it to LSSerializer's write method. The LSOutput's character stream can be set to the BufferedWriter, so that LSSerializer's write call writes to the file. Note that FileWriter encodes with the default charset, so the Writer's real encoding must match the one set on the LSOutput; handing LSOutput a byte stream avoids that mismatch.
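A short sketch of the LSOutput approach described above. It assumes a byte stream (setByteStream) rather than the BufferedWriter mentioned in the answer, so the serializer itself does the character-to-byte encoding; the class name LsOutputDemo and the sample document are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LsOutputDemo {

    /** Builds a tiny <library><book>Dune</book></library> document (illustrative data only). */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("library");
        Element book = doc.createElement("book");
        book.setTextContent("Dune");
        root.appendChild(book);
        doc.appendChild(root);
        return doc;
    }

    /** Serializes via LSSerializer.write with an explicit encoding set on the LSOutput. */
    static byte[] serialize(Document doc, String encoding) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding(encoding);   // the encoding writeToString cannot choose
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);    // for a file, a FileOutputStream would go here
        serializer.write(doc, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = serialize(sample(), "UTF-8");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        // Reading the bytes back should now succeed without a prolog error.
        Document back = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        System.out.println(back.getDocumentElement().getNodeName());
    }
}
```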
