Hi all,
I wrote a simple SAX XML parser. It works, and I have been testing it with a local XML file. Here is my code:
SAXParserFactory spf = SAXParserFactory.newInstance();
XMLParser xmlparser = null;
try
{
    SAXParser parser = spf.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    xmlparser = new XMLParser();
    reader.setContentHandler( xmlparser );
    reader.parse( new InputSource( getResources().openRawResource( R.raw.categories ) ) );
}
catch( Exception e )
{
    e.printStackTrace();
}
Now I need to read this XML file from a website. This is the code I'm trying:
public InputStream getXMLFile()
{
URL url = new URL("http://example.com/test.php?param=0");
InputStream stream = url.openStream();
Document doc = docBuilder.parse(stream);
}
reader.parse( new Communicator().getXMLFile() );
I'm getting a compiler error:
"The method parse(InputSource) is not applicable for the argument (InputStream)".
I need help figuring out what I need to do.
Thank you.
While I hate to sound obvious, is there any reason you're not using this constructor?
InputSource source = new InputSource( new Communicator().getXMLFile() );
reader.parse( source );
Note that that's very similar to what you're doing in the first section of code. After all, openRawResource returns an InputStream as well...
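To make the point concrete, here is a minimal sketch of SAX-parsing an arbitrary InputStream by wrapping it in an InputSource. The class InputSourceDemo and its NameCollector handler are hypothetical (they just record element names), and a ByteArrayInputStream stands in for the stream returned by getXMLFile(); a URL or raw-resource stream is handled identically.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class InputSourceDemo {

    // Hypothetical handler: records element names in document order.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Parses any InputStream (URL, raw resource, file...) by wrapping it in an InputSource.
    static List<String> elementNames(InputStream stream) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        NameCollector collector = new NameCollector();
        reader.setContentHandler(collector);
        try (InputStream in = stream) {
            // XMLReader.parse accepts an InputSource (or a system ID string), not a raw InputStream.
            reader.parse(new InputSource(in));
        }
        return collector.names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for new Communicator().getXMLFile().
        InputStream stream = new ByteArrayInputStream(
                "<categories><category>A</category><category>B</category></categories>"
                        .getBytes(StandardCharsets.UTF_8));
        System.out.println(elementNames(stream)); // [categories, category, category]
    }
}
```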
Relevant code; barfs on instantiating the SAXSource:
TransformerFactory factory = TransformerFactory.newInstance();
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(xmlReader, "http://books.toscrape.com/");
Result output = new StreamResult(System.out);
factory.newTransformer().transform(input, output);
The Javadocs say:
public SAXSource(XMLReader reader, InputSource inputSource)
Create a SAXSource, using an XMLReader and a SAX InputSource. The Transformer or SAXTransformerFactory will set itself to be the reader's ContentHandler, and then will call reader.parse(inputSource).
Looking at InputSource shows:
InputSource(InputStream byteStream)
Create a new input source with a byte stream.
InputSource(Reader characterStream)
Create a new input source with a character stream.
So would this require, for example, a character stream (InputSource(Reader)) to read in the HTML?
Would TagSoup be better suited to this identity transform? If so, how?
There is a constructor, InputSource(String systemId) (https://docs.oracle.com/javase/8/docs/api/org/xml/sax/InputSource.html#InputSource-java.lang.String-), that takes a system ID such as a URL, so you can use:
Source input = new SAXSource(xmlReader, new InputSource("http://books.toscrape.com/"));
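A minimal sketch of the whole identity transform using InputSource(String systemId) might look like this. Two assumptions: the JDK's built-in SAX parser stands in for TagSoup (so the input must be well-formed XML; with TagSoup on the classpath you would keep the question's XMLReaderFactory line instead), and a local file: URL replaces the http:// URL so the demo runs offline. SystemIdTransform and identityTransform are hypothetical names.

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SystemIdTransform {

    // Identity-transforms the document found at systemId (any URL the JDK can open).
    static String identityTransform(String systemId) throws Exception {
        // Assumption: the JDK parser replaces TagSoup's org.ccil.cowan.tagsoup.Parser here.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader xmlReader = spf.newSAXParser().getXMLReader();

        // The InputSource(String) constructor: the parser opens the URL itself.
        Source input = new SAXSource(xmlReader, new InputSource(systemId));
        StringWriter out = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(input, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Offline stand-in for "http://books.toscrape.com/"; an http:// URL is passed the same way.
        Path file = Files.createTempFile("page", ".xml");
        Files.write(file, "<html><body><p>hi</p></body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(identityTransform(file.toUri().toString()));
        Files.delete(file);
    }
}
```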
You can get access to an InputStream that reads from the resource behind the URL like this:
InputStream i = new URL("http://...").openConnection().getInputStream();
Then wrap i in an InputSource for your SAXSource, e.g. new SAXSource(xmlReader, new InputSource(i)).
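Here is a rough sketch of the stream-based variant, assuming the JDK's own SAX parser and a file: URL so it runs offline (ConnectionStreamSource and transform are hypothetical names). Since you open the stream yourself, try-with-resources closes it afterwards; setting the URL as the system ID as well is optional but lets relative references resolve.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ConnectionStreamSource {

    static String transform(URL url) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader xmlReader = spf.newSAXParser().getXMLReader();

        URLConnection connection = url.openConnection();
        // try-with-resources closes the connection's stream once the transform is done.
        try (InputStream i = connection.getInputStream()) {
            InputSource source = new InputSource(i);
            // Optional: keep the URL as system ID so relative references can still be resolved.
            source.setSystemId(url.toExternalForm());
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new SAXSource(xmlReader, source), new StreamResult(out));
            return out.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline stand-in for new URL("http://...").
        Path file = Files.createTempFile("doc", ".xml");
        Files.write(file, "<shelf><book>Dune</book></shelf>".getBytes(StandardCharsets.UTF_8));
        System.out.println(transform(file.toUri().toURL()));
        Files.delete(file);
    }
}
```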
I need to parse XML from a URL in Java with a SAX parser. I couldn't find an example of this online; all of the ones I found read the XML from a local file. Is there an example of parsing XML with nested tags from a URL in Java?
Refer to this example Java snippet:
String webServiceURL="web service url or document url here";
URL geoLocationDetailXMLURL = new URL(webServiceURL);
URLConnection geoLocationDetailXMLURLConnection = geoLocationDetailXMLURL.openConnection();
geoLocationDetailXMLURLConnection.setConnectTimeout(120000);
geoLocationDetailXMLURLConnection.setReadTimeout(120000);
BufferedReader geoLocationDetails = new BufferedReader(new InputStreamReader(geoLocationDetailXMLURLConnection.getInputStream(), "UTF-8"));
InputSource inputSource = new InputSource(geoLocationDetails);
// saxParser is a javax.xml.parsers.SAXParser; handler is your own DefaultHandler subclass
saxParser.parse(inputSource, handler);
This should help
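The snippet above leaves saxParser and handler undefined, and the question asks specifically about nested tags. A minimal sketch, assuming a hypothetical NestedUrlParser whose PathHandler keeps a stack of open elements and records "path=text" for every leaf element (it assumes data-style XML without mixed content). The timeouts are the ones from the answer; a file: URL stands in for the web service URL so it runs offline. One design choice differs from the answer: it passes the byte stream rather than a UTF-8 Reader, so the parser can honour the encoding declared in the XML itself.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NestedUrlParser {

    // Records "path=text" for each leaf element, e.g. "books/book/title=Dune".
    static class PathHandler extends DefaultHandler {
        final Deque<String> path = new ArrayDeque<>();
        final List<String> leaves = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.addLast(qName);
            text.setLength(0); // text seen so far belonged to the parent
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                leaves.add(String.join("/", path) + "=" + value);
            }
            text.setLength(0);
            path.removeLast();
        }
    }

    static List<String> parse(URL url) throws Exception {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(120_000); // milliseconds, as in the answer
        connection.setReadTimeout(120_000);
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        PathHandler handler = new PathHandler();
        try (InputStream in = connection.getInputStream()) {
            saxParser.parse(new InputSource(in), handler);
        }
        return handler.leaves;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("books", ".xml");
        Files.write(file, ("<books><book><title>Dune</title><author>Herbert</author></book>"
                + "<book><title>Emma</title></book></books>").getBytes(StandardCharsets.UTF_8));
        for (String leaf : parse(file.toUri().toURL())) {
            System.out.println(leaf);
        }
        Files.delete(file);
    }
}
```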
SAX parser and a file from the network
The important line is:
xr.parse(new InputSource(sourceUrl.openStream()));
where sourceUrl is a java.net.URL (for example, new URL("http://...")), not a String, and xr is your XMLReader.
Someone told me that Tika's XWPFWordExtractorDecorator class is used to convert docx into HTML, but I am not sure how to use this class to get the HTML from a docx file. Suggestions for any other library that does the same job are also appreciated.
You shouldn't use it directly.
Instead, call Tika in the usual way, and it will call the appropriate code for you.
If you want XHTML from parsing a file, the code looks something like this:
// Either of these will work, the latter is recommended
//InputStream input = new FileInputStream("test.docx");
InputStream input = TikaInputStream.get(new File("test.docx"));
// AutoDetect is normally best, unless you know the best parser for the type
Parser parser = new AutoDetectParser();
// Handler for indented XHTML
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory)
SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
// Call the Tika Parser
try {
Metadata metadata = new Metadata();
parser.parse(input, handler, metadata, new ParseContext());
String xml = sw.toString();
} finally {
input.close();
}
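The "Handler for indented XHTML" part works the same way for any SAX event source: the TransformerHandler receives the events and serializes them into the StringWriter. As a rough sketch without Tika (an assumption: the JDK's SAX parser replaces parser.parse(...) as the event source, and TransformerHandlerDemo and reserialize are hypothetical names):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class TransformerHandlerDemo {

    static String reserialize(String xml) throws Exception {
        // Same handler setup as in the answer: SAX events in, indented XML text out.
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        handler.setResult(new StreamResult(sw));

        // Stand-in for Tika: a plain SAX parser feeds its events into the handler.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reserialize("<list><item>one</item><item>two</item></list>"));
    }
}
```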
Currently I have a Java application that loads XML from a local file into a string. My code looks like this:
private String xmlFile = "D:\\mylocalcomputer\\extract-2339393.xml";
String fileStr = FileUtils.readFileToString(new File(xmlFile));
How can I get the contents of the XML file if it was located on the internet, at a URL like http://mydomain.com/xml/extract-2000.xml ?
Try the SAX interface:
private String xmlURL = "http://mydomain.com/xml/extract-2000.xml";
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setContentHandler(handler); // handler: your own ContentHandler, e.g. a DefaultHandler subclass
reader.parse(new InputSource(new URL(xmlURL).openStream()));
For more information about SAX, see the SAX documentation.
Check this code:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream = new URL("http://mydomain.com/xml/extract-2000.xml").openStream(); // a File cannot open an http:// URL
org.w3c.dom.Document doc = documentBuilderFactory.newDocumentBuilder().parse(inputStream);
StringWriter stw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(doc), new StreamResult(stw));
String xmlString = stw.toString();
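If you only need the raw text, like FileUtils.readFileToString does for a local file, you don't have to parse at all: read the URL's stream into a String. A minimal sketch, assuming the document is UTF-8 and Java 9+ (for InputStream.readAllBytes); UrlToString and readUrl are hypothetical names, and a file: URL stands in for the http:// one so the demo runs offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlToString {

    // Reads everything the URL returns; assumes UTF-8 encoded content.
    static String readUrl(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline stand-in for new URL("http://mydomain.com/xml/extract-2000.xml").
        Path file = Files.createTempFile("extract", ".xml");
        Files.write(file, "<extract id=\"2000\"/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUrl(file.toUri().toURL()));
        Files.delete(file);
    }
}
```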
I'm writing an android application, and I would like to get an xml string from web and get all info it contains.
First of all, I get the string (this code works):
URL url = new URL("my address here");
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
String myData = reader.readLine();
reader.close();
Then, I use DOM:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(myData));
Still no problem. When I write
Document doc = db.parse(is);
the application doesn't do anything further; it just stops, without any errors.
Can someone please tell me what's going on?
I can't tell why your code doesn't work, since there is no error message, but I can offer alternatives.
First, I'm fairly sure your InputSource "is" is unnecessary: DocumentBuilder.parse() also accepts an InputStream, so you can pass url.openStream() directly. (Note that parse(String) treats its argument as a URI, not as XML text, so you could pass the URL string but not myData itself.)
Another possible cause is that your XML data spans more than one line (I know you said the first part of your code works, but I'd rather mention it, just to be sure). If so, reader.readLine() will only give you the first line of your XML data.
I hope this will help.
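To illustrate the readLine() point: a minimal sketch that reads every line before handing the text to DOM. The class WholeResponseParse and its methods are hypothetical, and a StringReader stands in for new InputStreamReader(url.openStream()) so the example runs without a network.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WholeResponseParse {

    // Reads every line, not just the first one as a single reader.readLine() call does.
    static String readAll(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource is = new InputSource(new StringReader(xml));
        return db.parse(is);
    }

    public static void main(String[] args) throws Exception {
        String response = "<?xml version=\"1.0\"?>\n<data>\n  <name>demo</name>\n</data>";
        // Stand-in for new BufferedReader(new InputStreamReader(url.openStream())).
        BufferedReader reader = new BufferedReader(new StringReader(response));
        Document doc = parse(readAll(reader));
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent()); // demo
    }
}
```

Calling readLine() once on the same response would return only the XML declaration line, which is not a complete document.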
Use SAXParser instead of a DOM parser. SAX streams through the document instead of building the whole tree in memory, so it is usually more memory-efficient than DOM. Here are two good tutorials on SAXParser:
1. http://www.androidpeople.com/android-xml-parsing-tutorial-using-saxparser
2. http://www.anddev.org/parsing_xml_from_the_net_-_using_the_saxparser-t353.html
Use XmlPullParser; it's very fast. Pass in the string from the web and get a Hashtable with all the values.
public Hashtable<String, String> parse(String myData) {
XmlPullParser parser = Xml.newPullParser();
Hashtable<String, String> responseFromServer = new Hashtable<String, String>();
try {
parser.setInput(new StringReader(myData));
int eventType = parser.getEventType();
while (eventType != XmlPullParser.END_DOCUMENT) {
if(eventType == XmlPullParser.START_TAG) {
String currentName = parser.getName();
String currentText = parser.nextText(); // throws if this element contains child elements
if (currentText.trim().length() > 0) {
responseFromServer.put(currentName, currentText);
}
}
eventType = parser.next();
}
} catch (Exception e) {
e.printStackTrace();
}
return responseFromServer;
}
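XmlPullParser is Android-specific; on a plain JVM the closest standard pull API is StAX (javax.xml.stream), which is swapped in here. This sketch builds the same kind of name-to-text map but, unlike nextText(), does not throw on elements that contain children: it only stores leaf elements. PullParseToMap is a hypothetical name, and later duplicates of the same element name overwrite earlier ones, as with the Hashtable version.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseToMap {

    // Maps each leaf element's name to its trimmed text content.
    static Map<String, String> parse(String myData) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(myData));
        Map<String, String> values = new HashMap<>();
        StringBuilder text = new StringBuilder();
        boolean leaf = false; // true while inside an element that has not opened a child yet
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        leaf = true;
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String value = text.toString().trim();
                        if (leaf && !value.isEmpty()) {
                            values.put(reader.getLocalName(), value);
                        }
                        leaf = false; // the parent is no longer a leaf
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> values = parse("<response><status>ok</status><user><name>Ann</name></user></response>");
        System.out.println(values.get("status") + " " + values.get("name")); // ok Ann
    }
}
```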