I've just started using jsoup and I'm trying to get the value of a textarea. Below is the element info from the HTML:
Below is the code I'm using to try to read the value of the textarea:
try {
    String html = "http://aviprobo.doorfree.com/control.html";
    Document doc = Jsoup.connect(html).get();
    Element textarea = doc.getElementById("control");
    System.out.println("textarea value = " + textarea.val());
} catch (IOException e) {
    e.printStackTrace(); // report the failure instead of silently ignoring it
}
The value of textarea.val() is empty. Could someone please point me in the right direction?
Thanks.
Document doc = Jsoup.connect("http://sports.163.com/13/0830/22/97IFSI5I00051CD5.html").get();
Entities.EscapeMode.base.getMap().clear();
Elements elements = doc.select("textarea[id^=photoList]");
for (Element e : elements) {
    System.out.println(e.html());
}
Hello, I'm creating a JavaFX app with iText. I have an HTML editor for writing text and I want to create a PDF from it. Everything works, but when I have a really long line that is wrapped in the HTML editor, it isn't wrapped in the PDF and runs off the page. How can I make the text wrap within the page? Here is my code:
PdfWriter writer = null;
try {
    writer = new PdfWriter("doc.pdf");
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
// Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
    list = HtmlConverter.convertToElements(editor.getHtmlText());
} catch (IOException e) {
    e.printStackTrace();
}
// add elements to document
for (IElement p : list) {
    document.add((IBlockElement) p);
}
// close document
document.close();
I also want to set the line spacing for this text.
Thank you for your help.
I don't get any errors for the following code:
public class stack_overflow_0008 extends AbstractSupportTicket {
    private static String LONG_PIECE_OF_TEXT =
            "Once upon a midnight dreary, while I pondered, weak and weary," +
            "Over many a quaint and curious volume of forgotten lore—" +
            "While I nodded, nearly napping, suddenly there came a tapping," +
            "As of some one gently rapping, rapping at my chamber door." +
            "Tis some visitor,” I muttered, “tapping at my chamber door—" +
            "Only this and nothing more.";

    public static void main(String[] args) {
        PdfWriter writer = null;
        try {
            writer = new PdfWriter(getOutputFile());
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        // Initialize PDF document
        PdfDocument pdf = new PdfDocument(writer);
        // Initialize document
        Document document = new Document(pdf, PageSize.A4);
        List<IElement> list = null;
        try {
            list = HtmlConverter.convertToElements("<p>" + LONG_PIECE_OF_TEXT + "</p>");
        } catch (IOException e) {
            e.printStackTrace();
        }
        for (IElement p : list) {
            document.add((IBlockElement) p);
        }
        document.close();
    }
}
The document is a single (A4) page PDF with one string neatly wrapped.
I think perhaps the content of your string is to blame?
Could you post the HTML you get from this editor object?
Update:
Using the code from this answer on the HTML shared in a new comment to the question, I get the following result:
As you can see, the content is distributed over two lines. No content "falls off the page."
I am working on a Java RCP application. Whenever the user updates the details in the UI, we are supposed to update the same details in the HTML report as well. Is there a way to update or add HTML elements using Java? Using jsoup I am able to get the required element by its ID, but I am not able to insert a new element into it or update it.
Document htmlFile = null;
try {
    htmlFile = Jsoup.parse(new File("C:\\ItemDetails1.html"), "UTF-8");
} catch (IOException e) {
    e.printStackTrace();
}
Element div = htmlFile.getElementById("row2_comment");
System.out.println("text: " + div.html());
div.html("<li><b>Comments</b></li><ul><li>Testing for comment</li></ul>");
Any thoughts?
Try:
Element div = htmlFile.getElementById("row2_comment");
div.appendElement("p").attr("class", "beautiful").text("Some New Text");
This adds a new paragraph with a class attribute (for styling) and some text content.
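As a rough sketch of the whole round trip the question asks about (find the element, change it, get the updated markup back), the following works on an HTML string instead of the file on disk. The class name `ReportUpdater`, the method `addComment`, and the sample markup are made up for illustration; only the `row2_comment` id and the appended paragraph come from the thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of the original application.
class ReportUpdater {

    // Parses the report markup, appends a <p> to the element with the given id,
    // and returns the updated HTML as a new string.
    static String addComment(String reportHtml, String elementId, String comment) {
        Document doc = Jsoup.parse(reportHtml);
        Element target = doc.getElementById(elementId);
        if (target == null) {
            // getElementById returns null when no element has that id
            throw new IllegalArgumentException("No element with id " + elementId);
        }
        target.appendElement("p").attr("class", "beautiful").text(comment);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // Sample markup assumed for the demo
        String html = "<html><body><div id=\"row2_comment\"></div></body></html>";
        System.out.println(addComment(html, "row2_comment", "Some New Text"));
    }
}
```

Note that the parsed Document only lives in memory. To make the change show up in the report, the returned string would still have to be written back to the file, for example with java.nio.file.Files.write.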
I seem to be having an error where text is being written to a file twice, the first time with incorrect formatting and the second time with correct formatting. The method below takes in this URL after it's been converted properly. The method is supposed to print a newline between the text of each child of the div elements that are children of the div "ffaq", where all the body text resides. Any help would be appreciated. I'm fairly new to jsoup, so an explanation would be nice as well.
/**
 * Method to deal with HTML 5 Gamefaq entries.
 * @param url The location of the HTML 5 entry to read.
 **/
public static void htmlDocReader(URL url) {
    try {
        Document doc = Jsoup.parse(url.openStream(), "UTF-8", url.toString());
        //parse pagination label
        String[] num = doc.select("div.span12").
                select("ul.paginate").
                select("li").
                first().
                text().
                split("\\s+");
        //get the max page number
        final int max_pagenum = Integer.parseInt(num[num.length - 1]);
        //create a new file based on the url path
        File file = urlFile(url);
        PrintWriter outFile = new PrintWriter(file, "UTF-8");
        //Add every page to the text file
        for(int i = 0; i < max_pagenum; i++) {
            //if not the first page then change the url
            if(i != 0) {
                String new_url = url.toString() + "?page=" + i;
                doc = Jsoup.parse(new URL(new_url).openStream(), "UTF-8",
                        new_url);
            }
            Elements walkthroughs = doc.select("div.ffaq");
            for(Element elem : walkthroughs.select("div")) {
                for(Element inner : elem.children()) {
                    outFile.println(inner.text());
                }
            }
        }
        outFile.close();
    } catch(Exception e) {
        e.printStackTrace();
        System.exit(1);
    }
}
Every time you call text() on an element, you get the text of that element together with the text of all its descendants.
Consider the example below:
<div>
    text of div
    <span>text of span</span>
</div>
If you call text() on the div element, you will get
text of div text of span
Then, if you call text() on the span, you will get
text of span
To avoid the duplicates, use ownText(). It returns only the direct text of the element, not the text of its children.
Long story short, change this
for(Element elem : walkthroughs.select("div")) {
    for(Element inner : elem.children()) {
        outFile.println(inner.text());
    }
}
To this
for(Element elem : walkthroughs.select("div")) {
    for(Element inner : elem.children()) {
        String line = inner.ownText().trim();
        if(!line.equals("")) //Skip empty lines
            outFile.println(line);
    }
}
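The difference between the two calls can be checked on the small div/span example from this answer. This is only a minimal sketch: the class name `OwnTextDemo` and its two helper methods are invented for the demonstration, and the HTML is parsed from a string rather than fetched from the site.

```java
import org.jsoup.Jsoup;

// Hypothetical demo class; compares text() and ownText() on the same element.
class OwnTextDemo {

    // text(): the element's text plus the text of all its descendants
    static String fullText(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).first().text();
    }

    // ownText(): only the text nodes that are direct children of the element
    static String directText(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).first().ownText().trim();
    }

    public static void main(String[] args) {
        String html = "<div>text of div <span>text of span</span></div>";
        System.out.println(fullText(html, "div"));   // div text and span text together
        System.out.println(directText(html, "div")); // div text only
        System.out.println(fullText(html, "span"));  // span text only
    }
}
```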
I am reading a text file that contains HTML code from Google search results. Then I parse it and I try to extract the links with this code:
FileReader in = new FileReader("A.txt");
BufferedReader p = new BufferedReader(in);
while(p.readLine() != null)
{
    String html = p.readLine();
    Document doc = Jsoup.parse(html);
    Elements Link = doc.select("a[href");
    for(Element element :Link)
    {
        if(element != null)
        {
            System.out.println(element);
        }
    }
}
But I get many non-link strings. How can I show only the links and nothing else?
Please try again with a complete selector; "a[href" is missing its closing bracket:
Elements links = doc.select("a[href]"); // a with href
See the Selector documentation for the full supported syntax, especially the examples on the right side.
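Two other details of the question's loop may also contribute to the unexpected output: readLine() is called twice per iteration, so every other line is discarded, and printing the Element prints the whole tag rather than just the link. A minimal sketch that parses the complete HTML in one go and collects only the href values could look like this. The class `LinkExtractor`, the method `extractLinks`, and the sample markup are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper; takes the full HTML as one string instead of line by line.
class LinkExtractor {

    // Returns the href value of every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample: two anchors with href, one without
        String html = "<p><a href=\"http://example.com/a\">A</a> <a name=\"x\">no href</a> "
                + "<a href=\"/b\">B</a></p>";
        for (String href : extractLinks(html)) {
            System.out.println(href);
        }
    }
}
```

For a file such as A.txt, the whole content would first be read into a single string (or passed to Jsoup.parse as a File with a charset name) before calling the method.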
I'm trying to parse a page which has no encoding defined in its header; in the HTML it declares ISO-8859-1 as the encoding. jsoup isn't able to parse it with default settings (HtmlUnit and PHP's Simple HTML DOM Parser can't handle it by default either). Even if I define the encoding for jsoup myself, it still isn't working. I can't figure out why.
Here's my code:
String url = "http://www.parkett.de";
Document doc = null;
try {
    doc = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);
    // doc = Jsoup.parse(new URL(url).openStream(), "CP1252", url);
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
Element extractHtml = null;
Elements elements = null;
String title = null;
elements = doc.select("h1");
if(!elements.isEmpty()) {
    extractHtml = elements.get(0);
    title = extractHtml.text();
}
System.out.println(title);
Thanks for any suggestions!
When working with URLs, chapters 4 & 9 of the cookbook recommend using Jsoup.connect(...).get(). Chapter 5 suggests using Jsoup.parse() when loading a document from a local file.
public static void main(String[] args) {
    Document doc = null;
    try {
        doc = Jsoup.connect("http://www.parkett.de/").get();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Element firstH1 = doc.select("h1").first();
    System.out.println((firstH1 != null) ? firstH1.text() : "First <h1> not found.");
}
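For content that is already available locally, the effect of the charset argument of Jsoup.parse() can be checked without any network access. The sketch below is an illustration under assumptions, not a diagnosis of the page in the question: the class `CharsetParseDemo`, the method `firstHeading`, and the sample heading are all made up, and the bytes come from an in-memory array instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; parses locally held bytes with an explicit charset.
class CharsetParseDemo {

    // Decodes the bytes with the given charset and returns the text of the
    // first <h1>, or null when the document has none.
    static String firstHeading(byte[] htmlBytes, String charsetName) throws IOException {
        Document doc = Jsoup.parse(new ByteArrayInputStream(htmlBytes), charsetName, "");
        Element h1 = doc.select("h1").first();
        return (h1 != null) ? h1.text() : null;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample heading containing a u-umlaut, encoded as ISO-8859-1
        byte[] bytes = "<html><body><h1>Parkett f\u00fcr alle</h1></body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstHeading(bytes, "ISO-8859-1"));
    }
}
```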