The method createDOM does not return a document

I use HtmlCleaner 2.6.1 and XPath to parse an HTML page in an Android application.
Here are the HTML pages:
http://www.kino-govno.com/comments/42571-postery-kapitan-fillips-i-poslednij-rubezh
http://www.kino-govno.com/comments/42592-fantasticheskie-idei-i-mesta-ih-obitanija
The first link returns a document and everything works. For the second link, this line:
document = domSerializer.createDOM(tagNode);
returns nothing.
If I create a plain Java project without Android, it all works fine.
Here is the code:
String queries = "//div[starts-with(@class, 'news_text op')]/p";
URL url = new URL(link2);
TagNode tagNode = new HtmlCleaner().clean(url);
CleanerProperties cleanerProperties = new CleanerProperties();
DomSerializer domSerializer = new DomSerializer(cleanerProperties);
document = domSerializer.createDOM(tagNode);
xPath = XPathFactory.newInstance().newXPath();
pageNode = (NodeList)xPath.evaluate(queries,document, XPathConstants.NODESET);
String val = pageNode.item(0).getFirstChild().getNodeValue();

That's because HtmlCleaner wraps the paragraphs of the second HTML page in another <div/>, so they are no longer direct children of the matched div. Use the descendant-or-self axis // instead of the child axis /:
//div[starts-with(@class, 'news_text op')]//p
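The difference between the two axes can be reproduced with the standard javax.xml.xpath API alone. This is only a minimal sketch: the markup below is made up to mimic an extra wrapper <div/>, and the class name AxisDemo is a hypothetical helper, not part of HtmlCleaner.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AxisDemo {
    // Assumed sample markup: the <p> elements sit inside an extra wrapper <div>.
    public static final String XML =
        "<html><body><div class='news_text op1'>"
      + "<div><p>first</p><p>second</p></div>"
      + "</div></body></html>";

    // Parses the sample markup and returns how many nodes the query selects.
    public static int count(String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(query, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Child axis: only <p> elements that are direct children of the matched div.
        System.out.println(count("//div[starts-with(@class, 'news_text op')]/p"));
        // Descendant-or-self axis: <p> elements at any depth below the matched div.
        System.out.println(count("//div[starts-with(@class, 'news_text op')]//p"));
    }
}
```

With this sample the child-axis query selects nothing, while the // query finds both paragraphs. Checking pageNode.getLength() before calling item(0) would also avoid a NullPointerException when a query matches nothing.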

Related

Get DIV contents with Java Swing

I'm trying to get the contents of a DIV from a previously fetched HTML document. I'm using Java Swing.
final java.io.Reader stringReader = new StringReader(html);
final HTMLEditorKit htmlKit = new HTMLEditorKit();
final HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
final HTMLEditorKit.Parser parser = new ParserDelegator();
parser.parse(stringReader, htmlDoc.getReader(0), true);
final javax.swing.text.Element el = htmlDoc.getElement("id");
This code should get the DIV with the ID "id" that I have inside the HTML.
But what next? How do I get the contents of the div? I've been searching all around, but the only thing I found is how to get an attribute value, not the Element contents.
Should I move to jsoup? I would rather use native Java, but so far I'm stuck.
Thanks!

not the Element contents.
Try something like:
int start = el.getStartOffset();
int end = el.getEndOffset();
String text = htmlDoc.getText(start, end - start);
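Put together, the offset-based approach might look like the sketch below. It is an illustration under assumptions: the class DivTextDemo and the sample HTML are made up, it loads the document with HTMLEditorKit.read rather than the ParserDelegator call from the question, and it clamps the end offset to the document length as a precaution because an element's end offset can point one past the last character.

```java
import java.io.StringReader;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class DivTextDemo {
    // Returns the text covered by the element with the given id, or null if absent.
    public static String divText(String html, String id) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids a ChangedCharSetException when the page carries a charset <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        Element el = doc.getElement(id);
        if (el == null) {
            return null;
        }
        int start = el.getStartOffset();
        // Clamp defensively so getText never reads past the end of the document.
        int end = Math.min(el.getEndOffset(), doc.getLength());
        return doc.getText(start, end - start).trim();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><div id=\"id\">Hello world</div><p>after</p></body></html>";
        System.out.println(divText(html, "id"));
    }
}
```

Note that getText returns plain text only; any markup nested inside the div is not preserved.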

Casting JDom 1.1.3 Element to Document without DocumentBuilderFactory or DocumentBuilder

I need to find the easiest and most efficient way to convert a JDOM element (with all its tailoring nodes) to a Document. ownerDocument() won't work as this is JDOM version 1.
Moreover, an org.jdom.IllegalAddException: The Content already has an existing parent "root" exception occurs when using the following code.
DocumentBuilderFactory dbFac = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFac.newDocumentBuilder();
Document doc = null;
Element elementInfo = getElementFromDB();
doc = new Document(elementInfo);
XMLOutputter xmlOutput = new XMLOutputter();
byte[] byteInfo= xmlOutput.outputString(elementInfo).getBytes("UTF-8");
String stringInfo = new String(byteInfo);
doc = dBuilder.parse(stringInfo);

I think you have to use the following method of the element.
Document doc = <element>.getDocument();
Refer to the API documentation. It says:
Return this parent's owning document or null if the branch containing this parent is currently not attached to a document.

JDOM content can only have one parent at a time, and you have to detach it from one parent before you can attach it to another. Take this code:
Document doc = null;
Element elementInfo = getElementFromDB();
doc = new Document(elementInfo);
If that code is failing, it is because the getElementFromDB() method is returning an Element that is part of some other structure. You need to 'detach' it:
Element elementInfo = getElementFromDB();
elementInfo.detach();
Document doc = new Document(elementInfo);
OK, that solves the IllegalAddException.
On the other hand, if you just want to get the document node containing the element, JDOM 1.1.3 allows you to do that with getDocument:
Document doc = elementInfo.getDocument();
Note that the doc may be null.
To get the topmost element available, try:
Element top = elementInfo;
while (top.getParentElement() != null) {
top = top.getParentElement();
}
In your case, the elementInfo you get from the DB is a child of an element called 'root', something like:
<root>
<elementInfo> ........ </elementInfo>
</root>
That is why you get the message you do, with the word "root" in it:
The Content already has an existing parent "root"

From URL to Document object

I would like to transform a feed to a Document object.
I tried the following code, but it does not seem to work with a real feed (uri = null), although it works with an XML file that is already on my computer.
The transform function:
public static Document obtainDocument(String feedurl) {
Document doc = null;
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL url = new URL(feedurl);
doc = builder.parse(url.openStream());
...Exceptions...
return doc;
}
EDIT
I'm pretty sure that the URL is right, I use:
String feedurl = "http://feeds2.feedburner.com/Pressecitron";
I tried to use the following code too:
public static Document obtainDocument(String feedurl) {
Document doc = null;
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL url = new URL(feedurl);
URLConnection conn = url.openConnection();
doc = builder.parse(conn.getInputStream());
...
return doc;
}
which does not seem to work any better.
My first version of the parser used a String too, but my teammate wants me to use a Document (in case the connection doesn't work). It worked with the String, if I remember correctly.

Have you tried all the possible ways of using the parse() method?
Are you sure the URI/URL is correct?
In the method you have, you get the feed URL as a String. You can pass it directly to the parse() method and see if that works.
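As a rough sketch of that suggestion: DocumentBuilder has a parse(String uri) overload that opens the URI itself. The class name ParseByUriDemo and the temporary sample file are assumptions made so the example runs without network access; for the real feed you would pass the feed URL string instead.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseByUriDemo {
    // Lets the parser open the URI itself instead of handing it an InputStream.
    public static Document obtainDocument(String uri) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(uri);
    }

    // Writes a tiny stand-in feed to a temp file and parses it through its file: URI.
    public static String rootNameOfSample() throws Exception {
        Path tmp = Files.createTempFile("feed", ".xml");
        try {
            Files.write(tmp, "<rss><channel/></rss>".getBytes(StandardCharsets.UTF_8));
            Document doc = obtainDocument(tmp.toUri().toString());
            return doc.getDocumentElement().getNodeName();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNameOfSample());
    }
}
```

Letting exceptions propagate, rather than catching them and returning a null doc, also makes it easier to see why a particular feed fails to parse.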

Add jsp directive from server side

I want to do something like
Document dom = new Document();
Element ele = new Element("jsp:include");
dom.setRootElement(ele);
but it's throwing an error. I am using JDOM to build the DOM (org.jdom.Document, org.jdom.Element).
What's wrong with doing this?

Namespace ns = Namespace.getNamespace("jsp", "http://java.sun.com/JSP/Page");
Element element = new Element("include", ns);
Document dom = new Document(element);

xpaths not working in java

I am trying to access a URL, get the HTML from it, and use XPath to get certain values from it. I am getting the HTML just fine and JTidy seems to be cleaning it appropriately. However, when I try to get the desired values using XPath, I get an empty NodeList back. I know my XPath expression is correct; I have tested it in other ways. What's wrong with this code? Thanks for the help.
String url_string = base_url + countries[c];
URL url = new URL(url_string);
Tidy tidy = new Tidy();
tidy.setShowWarnings(false);
tidy.setXHTML(true);
tidy.setMakeClean(true);
Document doc = tidy.parseDOM(url.openStream(), null);
//tidy.pprint(doc, System.out);
String xpath_string = "id('catlisting')//a";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(xpath_string);
NodeList nodes = (NodeList)expr.evaluate(doc, XPathConstants.NODESET);
System.out.println("size="+nodes.getLength());
for (int r=0; r<nodes.getLength(); r++) {
System.out.println(nodes.item(r).getNodeValue());
}

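Try "//div[@id='catlisting']//a". The id() function only matches attributes that the parser knows to be of type ID (normally declared in a DTD), which may not be the case for the DOM that JTidy builds.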
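The behaviour can be checked with the standard JAXP classes, without JTidy. In this sketch the markup and the class name IdFunctionDemo are made up; because the sample has no DTD, the parser has no attribute typed as ID, which is the situation assumed above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IdFunctionDemo {
    // Assumed sample markup with no DTD, so "id" is an ordinary attribute.
    public static final String XML =
        "<html><body><div id='catlisting'><a href='/x'>X</a></div></body></html>";

    // Evaluates the query against the sample markup and returns the selected nodes.
    public static NodeList select(String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(query, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // id() relies on attributes of type ID, which this document does not declare.
        System.out.println(select("id('catlisting')//a").getLength());
        // Comparing the attribute value directly does not depend on a DTD.
        NodeList links = select("//div[@id='catlisting']//a");
        System.out.println(links.getLength());
        // getNodeValue() is null for element nodes; getTextContent() returns the link text.
        System.out.println(links.item(0).getTextContent());
    }
}
```

The last line matters for the loop in the question as well: calling getNodeValue() on the selected a elements yields null, so getTextContent() or an attribute lookup is needed to print anything useful.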
