I'm trying to get the contents of a DIV from a previously fetched HTML document. I'm using Java Swing.
final java.io.Reader stringReader = new StringReader(html);
final HTMLEditorKit htmlKit = new HTMLEditorKit();
final HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
final HTMLEditorKit.Parser parser = new ParserDelegator();
parser.parse(stringReader, htmlDoc.getReader(0), true);
final javax.swing.text.Element el = htmlDoc.getElement("id");
This code should get the DIV with the ID "id" that I have inside the HTML.
But what next? How do I get the contents of the DIV? I've been searching all around, but the only thing I found is how to get an attribute value, not the Element contents.
Should I move to jsoup? I would rather use native Java, but so far I'm stuck.
Thanks!
Try something like:
int start = el.getStartOffset();
int end = el.getEndOffset();
String text = htmlDoc.getText(start, end - start);
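A fuller sketch of the same idea, using only the JDK. The class name DivTextExample, the helper textOfElement and the sample markup are made up for illustration. This version loads the markup through HTMLEditorKit.read, which parses the input and flushes the result into the document before the element lookup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of Swing.
final class DivTextExample {

    // Returns the plain text inside the element with the given id,
    // or null if the document has no such element.
    static String textOfElement(String html, String id)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // The input is already a String, so ignore any <meta> charset directive.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        Element el = doc.getElement(id);
        if (el == null) {
            return null;
        }
        int start = el.getStartOffset();
        // Clamp the end offset so it never runs past the document text.
        int end = Math.min(el.getEndOffset(), doc.getLength());
        return doc.getText(start, end - start).trim();
    }

    public static void main(String[] args) throws Exception {
        // Sample markup, assumed for this example only.
        String html = "<html><body><div id=\"id\">Hello <b>world</b></div>"
                + "<p>other</p></body></html>";
        System.out.println(textOfElement(html, "id"));
    }
}
```

Note that this returns the text content only; the inline tags inside the DIV are not part of the returned string.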
Let's say I have some XML:
<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>
and I now need to get the content of the <document> element as a string, so I would have
blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd
I have been turning this over in my head for a while now, and I haven't been able to figure it out.
I hope to get some direction on what I have to do.
EDIT:
Just to be clear, let's say I have the XML like this:
SAXBuilder sb = new SAXBuilder();
Document doc = sb.build(new StringReader("<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>"));
and I now need to get the content of the <document> element.
It is very unusual to need to get an inconsistent subset of an XML document like you want. It's much more common to get just the text content: blabla test hello dfh sdfsd
Note that you can get the content of the root element as its content list (getContent()), and then output just that list as a string:
XMLOutputter xout = new XMLOutputter();
String txt = xout.outputString(doc.getRootElement().getContent());
System.out.println(txt);
To check, I ran this code:
public static void main(String[] args) throws JDOMException, IOException {
SAXBuilder sb = new SAXBuilder();
Document doc = sb.build(new StringReader("<document>blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd</document>"));
XMLOutputter xout = new XMLOutputter();
String txt = xout.outputString(doc.getRootElement().getContent());
System.out.println(txt);
}
and it output:
blabla<bold>test<list><item>hello<italics>dfh</italics></item></list></bold>sdfsd
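If JDOM is not on the classpath, roughly the same result can be sketched with JDK-only APIs: parse into a W3C DOM and serialize each child of the root with an identity Transformer. This is a different technique from the XMLOutputter call above, and the class name InnerXmlExample and the method innerXml are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or the JDK.
final class InnerXmlExample {

    // Serializes everything between the root element's start and end tags.
    static String innerXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // An identity transform copies a node to the output unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null;
                child = child.getNextSibling()) {
            identity.transform(new DOMSource(child), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document>blabla<bold>test<list><item>hello"
                + "<italics>dfh</italics></item></list></bold>sdfsd</document>";
        System.out.println(innerXml(xml));
    }
}
```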
Assume I have
<Sports>
<Soccer>
<Players>
<Player_1> Messi Leonel </Player_1>
</Players>
</Soccer>
</Sports>
How do I get the text of the Player_1 node in one line, without iteration, using Dom4J?
Return value should be: Messi Leonel
Thanks
Got it. For anyone looking for something like this:
File file = new File("/path/to/file.xml");
SAXReader reader = new SAXReader();
Document document = reader.read(file);
String name = document.selectSingleNode("//Sports/Soccer/Players/Player_1").getText();
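For comparison, the same one-line lookup can be sketched with the JDK's own javax.xml.xpath package instead of Dom4J. The class name PlayerNameExample and the method playerName are hypothetical; trim() is added because the sample element has spaces around the name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Dom4J or the JDK.
final class PlayerNameExample {

    // Evaluates the path and returns the string value of the first match.
    static String playerName(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/Sports/Soccer/Players/Player_1",
                new InputSource(new StringReader(xml))).trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Sports><Soccer><Players>"
                + "<Player_1> Messi Leonel </Player_1>"
                + "</Players></Soccer></Sports>";
        System.out.println(playerName(xml));
    }
}
```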
I have this line in my SVG file:
<text id="region1Text" class="regionText" x="77" y="167">2</text>
I can get an object of the Text class for this element, but I can't see any usable method for changing "2" to another number. The appendText method seems to do nothing, and I see there is no "setText" method.
My code:
StringReader reader = new StringReader(svgInString);
uri = SVGCache.getSVGUniverse().loadSVG(reader, "myImage");
SVGDiagram diagram = SVGCache.getSVGUniverse().getDiagram(uri);
Text text = (Text) diagram.getElement("region1Text");
text.appendText("20");
When debugging, I can see that the content variable of the text object is set to "2" (so I think the text element is created correctly), but I'm not able to change it.
After appending the text, you have to call text.rebuild(). In total it looks like this:
StringReader reader = new StringReader(svgInString);
uri = SVGCache.getSVGUniverse().loadSVG(reader, "myImage");
SVGDiagram diagram = SVGCache.getSVGUniverse().getDiagram(uri);
Text text = (Text) diagram.getElement("region1Text");
text.appendText("20");
text.rebuild();
I use HtmlCleaner 2.6.1 and XPath to parse an HTML page in an Android application.
Here are the HTML pages:
http://www.kino-govno.com/comments/42571-postery-kapitan-fillips-i-poslednij-rubezh
http://www.kino-govno.com/comments/42592-fantasticheskie-idei-i-mesta-ih-obitanija
For the first link the document is returned and everything is fine. For the second link, this line:
document = domSerializer.createDOM(tagNode);
returns nothing.
If I create a simple Java project without Android, it all works fine.
Here is the code:
String queries = "//div[starts-with(@class, 'news_text op')]/p";
URL url = new URL(link2);
TagNode tagNode = new HtmlCleaner().clean(url);
CleanerProperties cleanerProperties = new CleanerProperties();
DomSerializer domSerializer = new DomSerializer(cleanerProperties);
document = domSerializer.createDOM(tagNode);
xPath = XPathFactory.newInstance().newXPath();
pageNode = (NodeList)xPath.evaluate(queries,document, XPathConstants.NODESET);
String val = pageNode.item(0).getFirstChild().getNodeValue();
That's because HtmlCleaner wraps the paragraphs of the second HTML page in another <div/>, so they are no longer direct children. Use the descendant-or-self axis // instead of the child axis /:
//div[starts-with(@class, 'news_text op')]//p
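The difference between the two axes can be seen in a small JDK-only sketch. The markup below is invented to mimic the wrapped paragraph and is not taken from the pages in the question; the class name AxisExample and the method count are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for comparing the child and descendant axes.
final class AxisExample {

    // Returns how many nodes the XPath expression selects in the given XML.
    static int count(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The <p> sits inside an extra wrapper <div>, as described above.
        String xml = "<html><body><div class=\"news_text op1\">"
                + "<div><p>first</p></div></div></body></html>";
        String child = "//div[starts-with(@class, 'news_text op')]/p";
        String descendant = "//div[starts-with(@class, 'news_text op')]//p";
        System.out.println("child axis matches:      " + count(xml, child));
        System.out.println("descendant axis matches: " + count(xml, descendant));
    }
}
```

With the wrapper in place, the child-axis query selects nothing, so a later pageNode.item(0) would be null; the descendant query still reaches the paragraph.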
I'm a novice Java programmer trying to use the HTMLEditorKit library to traverse an HTML document and alter it to my liking (mostly for the fun of it; what I'm doing could be done by hand without a problem).
But my problem is: after I have modified my HTML file, I am left with an HTMLDocument that I have no clue how to save back to an HTML file.
HTMLEditorKit kit = new HTMLEditorKit();
File file = new File("local file");
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
// To read from a URL instead: new InputStreamReader(url.openConnection().getInputStream())
FileReader HTMLReader = new FileReader(file);
kit.read(HTMLReader, doc, 0);
After that I do my thing with the "doc" object.
Now that I'm done with that, I just want to save it back, preferably overwriting the file I got the HTML from in the first place.
Is anyone able to tell me how to save the modified HTMLDocument into an HTML file afterwards?
You can use the write method of the HTMLEditorKit class. Sample code:
FileWriter writer = new FileWriter("local file");
try {
kit.write(writer, doc, 0, doc.getLength());
} finally {
writer.close();
}
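Putting the read and write halves together, a minimal round trip might look like the sketch below. The class name SaveHtmlExample, the method roundTrip and the temp file are assumptions for illustration. It uses try-with-resources instead of an explicit finally block and fixes the encoding to UTF-8, which is a choice of this example rather than something the question specifies.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of Swing.
final class SaveHtmlExample {

    // Loads the file into an HTMLDocument and writes the document back to the same file.
    static void roundTrip(Path file) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            kit.read(in, doc, 0);
        }

        // ... modify doc here ...

        // Opening the writer truncates the file, so the old content is overwritten.
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            kit.write(out, doc, 0, doc.getLength());
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("page", ".html");
        Files.write(file, "<html><body><p>Hello</p></body></html>"
                .getBytes(StandardCharsets.UTF_8));
        roundTrip(file);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

The saved markup is regenerated by the kit's HTML writer, so indentation and line breaks will generally differ from the original file.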