I am working on a java rcp application. Whenever user updates the details in UI, we are suppose to update the same details in html report also. Is there a we can update/add the html elements using java. Using Jsoup I am able to get the required element ID, but not able to innert/update new element to it.
Document htmlFile = null;
try {
htmlFile = Jsoup.parse(new File("C:\\ItemDetails1.html"), "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
Element div = htmlFile.getElementById("row2_comment");
System.out.println("text: " + div.html());
div.html("<li><b>Comments</b></li><ul><li>Testing for comment</li></ul>");
Any thoughts
Try:
Element div =
htmlFile.getElementById("row2_comment");
div.appendElement("p").attr("class",
"beautiful").text("Some New Text")
To add a new paragraph with some style and text content
Related
Hello I'm creating javafx app with iText. I have html editor to write text and I want to create pdf from it. Everything works but when I have a really long line that is wrapped in html editor, in pdf it isn't wrapped, its out of page, how can I set wrapping page? here is my code:
PdfWriter writer = null;
try {
writer = new PdfWriter("doc.pdf");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements(editor.getHtmlText());
} catch (IOException e) {
e.printStackTrace();
}
// add elements to document
for (IElement p : list) {
document.add((IBlockElement) p);
}
// close document
document.close();
I also want to set line spacing for this text
Thank you for help
I don't get any errors for the following code:
public class stack_overflow_0008 extends AbstractSupportTicket{
private static String LONG_PIECE_OF_TEXT =
"Once upon a midnight dreary, while I pondered, weak and weary," +
"Over many a quaint and curious volume of forgotten lore—" +
"While I nodded, nearly napping, suddenly there came a tapping," +
"As of some one gently rapping, rapping at my chamber door." +
"Tis some visitor,” I muttered, “tapping at my chamber door—" +
"Only this and nothing more.";
public static void main(String[] args)
{
PdfWriter writer = null;
try {
writer = new PdfWriter(getOutputFile());
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements("<p>" + LONG_PIECE_OF_TEXT + "</p>");
} catch (IOException e) {
e.printStackTrace();
}
for (IElement p : list) {
document.add((IBlockElement) p);
}
document.close();
}
}
The document is a single (A4) page PDF with one string neatly wrapped.
I think perhaps the content of your string is to blame?
Could you post the HTML you get from this editor object?
Update:
Using the code from this answer on the HTML shared in a new comment to the question, I get the following result:
As you can see, the content is distributed over two lines. No content "falls off the page."
This is my code:
try {
dozen = magazijn.getFfd().vraagDozenOp();
for (int i = 0; i < dozen.size(); i++) {
PdfWriter.getInstance(doc, new FileOutputStream("Order" + x + ".pdf"));
System.out.println("Writer instance created");
doc.open();
System.out.println("doc open");
Paragraph ordernummer = new Paragraph(order.getOrdernummer());
doc.add(ordernummer);
doc.add( Chunk.NEWLINE );
for (String t : text) {
Paragraph klant = new Paragraph(t);
doc.add(klant);
}
doc.add( Chunk.NEWLINE );
Paragraph datum = new Paragraph (order.getDatum());
doc.add(datum);
doc.add( Chunk.NEWLINE );
artikelen = magazijn.getFfd().vraagArtikelenOp(i);
for (Artikel a : artikelen){
artikelnr.add(a.getArtikelNaam());
}
for (String nr: artikelnr){
Paragraph Artikelnr = new Paragraph(nr);
doc.add(Artikelnr);
}
doc.close();
artikelnr.clear();
x++;
System.out.println("doc closed");
}
} catch (Exception e) {
System.out.println(e);
}
I get this exception: com.itextpdf.text.DocumentException: The document has been closed. You can't add any Elements.
can someone help me fix this so that the other pdf can be created and paragrphs added?
Alright, your intent is not very clear from your code and question so I'm going to operate under the following assumptions:
You are creating a report for each box you're processing
Each report needs to be a separate PDF file
You're getting a DocumentException on the second iteration of the loop, you're trying to add content to a Document that has been closed in the previous iteration via doc.close();. 'doc.close' will finalize the Document and write everything still pending to any linked PdfWriter.
If you wish to create separate pdfs for each box, you need to create a seperate Document in your loop statement as well, since creating a new PdfWriter via PdfWriter.getInstance(doc, new FileOutputStream("Order" + x + ".pdf")); will not create a new Document on its own.
If I'm wrong with assumption 2 and you wish to add everything to a single PDF, move doc.close(); outside of the loop and create only a single PdfWriter
You can try something like this using Apache PDFBox
File outputFile = new File(path);
outputFile.createNewFile();
PDDocument newDoc = new PDDocument();
then create a PDPage and write what you wanna write in that page. After your page is ready, add it to the newDoc and in the end save it and close it
newDoc.save(outputFile);
newDoc.close()
repeat this dozen.size() times and keep changing the file's name in path for every new document.
I seem to be having this error where text is being written to a file twice, the first time with incorrect formatting and the second with correct formatting. The method below takes in this URL after it's been converted properly. The method is supposed to get print a newline in between the text conversion of all of the children of dividers that are children of the divider "ffaq" where all the body text resides. Any help would be appreciated. I'm fairly new to using jsoup so an explanation would be nice as well.
/**
* Method to deal with HTML 5 Gamefaq entries.
* #param url The location of the HTML 5 entry to read.
**/
public static void htmlDocReader(URL url) {
try {
Document doc = Jsoup.parse(url.openStream(), "UTF-8", url.toString());
//parse pagination label
String[] num = doc.select("div.span12").
select("ul.paginate").
select("li").
first().
text().
split("\\s+");
//get the max page number
final int max_pagenum = Integer.parseInt(num[num.length - 1]);
//create a new file based on the url path
File file = urlFile(url);
PrintWriter outFile = new PrintWriter(file, "UTF-8");
//Add every page to the text file
for(int i = 0; i < max_pagenum; i++) {
//if not the first page then change the url
if(i != 0) {
String new_url = url.toString() + "?page=" + i;
doc = Jsoup.parse(new URL(new_url).openStream(), "UTF-8",
new_url.toString());
}
Elements walkthroughs = doc.select("div.ffaq");
for(Element elem : walkthroughs.select("div")) {
for(Element inner : elem.children()) {
outFile.println(inner.text());
}
}
}
outFile.close();
} catch(Exception e) {
e.printStackTrace();
System.exit(1);
}
}
For every element you call text() you print all the text of its structure.
Assume the below example
<div>
text of div
<span>text of span</span>
</div>
if you call text() for div element you will get
text of div text of span
Then if you call text() for span you will get
text of span
What you need, in order to avoid duplicates is to use ownText(). This will get only the direct text of the element, and not the text of its children.
Long story sort change this
for(Element elem : walkthroughs.select("div")) {
for(Element inner : elem.children()) {
outFile.println(inner.text());
}
}
To this
for(Element elem : walkthroughs.select("div")) {
for(Element inner : elem.children()) {
String line = inner.ownText().trim();
if(!line.equals("")) //Skip empty lines
outFile.println(line);
}
}
I've just started using JSOUP and I'm trying to get the value in the textarea. The below is the element info from the HTML;
The below is the code that I'm using to attempt to read the value in the textarea;
try {
String html = "http://aviprobo.doorfree.com/control.html";
Document doc = Jsoup.connect(html).get();
Element textarea = doc.getElementById("control");
System.out.println("textarea value = " + textarea.val());
} catch (IOException e) {
//
}
The value of textarea.val() is empty. Could someone please point me in the right direction.
Thanks.
Document doc = Jsoup.connect("http://sports.163.com/13/0830/22/97IFSI5I00051CD5.html").get();
**Entities.EscapeMode.base.getMap().clear();**
Elements elements = doc.select("textarea[id^=photoList]");
for(Element e:elements){
System.out.println(e.html());
}
I have a question. When I try to get images from a web page by using Jsoup in Java.
Here is the code:
String link = "http://truyentranhtuan.com/detective-conan/856/doc-truyen/";
Document docs = Jsoup.connect(link).timeout(60000).get();
Elements comics = docs.select("#hienthitruyen img");
System.out.println(comics.size());
for (Element comic : comics) {
int i = 0;
System.out.println(comic);
String linkImage = comic.attr("src");
if (!"".equals(linkImage)) {
URL url = new URL(linkImage);
BufferedImage image = ImageIO.read(url);
ImageIO.write(image, "jpg", new File(i + ".jpg"));
i++;
}
}
The problem is I can't get any img tag in this web page. The size of Elements always be zero.
But when I view source in this web page the img tag always be there.
If you look at the real source, not the DOM structure (for example, save the HTML page and open it in Notepad), you will see that there are no img tags there. They are all populated dynamically by the means of Javascript.
Now the problem is that Jsoup is not meant to execute Javascript, therefore you can only parse the original DOM structure, before it is modified (filled with images) by Javascript.
To do what you want, you can use HTMLUnit which can execute most of the Javascript.