IText html to pdf wrapping line

IText html to pdf wrapping line - java

Hello I'm creating javafx app with iText. I have html editor to write text and I want to create pdf from it. Everything works but when I have a really long line that is wrapped in html editor, in pdf it isn't wrapped, its out of page, how can I set wrapping page? here is my code:
PdfWriter writer = null;
try {
writer = new PdfWriter("doc.pdf");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements(editor.getHtmlText());
} catch (IOException e) {
e.printStackTrace();
}
// add elements to document
for (IElement p : list) {
document.add((IBlockElement) p);
}
// close document
document.close();
I also want to set line spacing for this text
Thank you for help

I don't get any errors for the following code:
public class stack_overflow_0008 extends AbstractSupportTicket{
private static String LONG_PIECE_OF_TEXT =
"Once upon a midnight dreary, while I pondered, weak and weary," +
"Over many a quaint and curious volume of forgotten lore—" +
"While I nodded, nearly napping, suddenly there came a tapping," +
"As of some one gently rapping, rapping at my chamber door." +
"Tis some visitor,” I muttered, “tapping at my chamber door—" +
"Only this and nothing more.";
public static void main(String[] args)
{
PdfWriter writer = null;
try {
writer = new PdfWriter(getOutputFile());
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements("<p>" + LONG_PIECE_OF_TEXT + "</p>");
} catch (IOException e) {
e.printStackTrace();
}
for (IElement p : list) {
document.add((IBlockElement) p);
}
document.close();
}
}
The document is a single (A4) page PDF with one string neatly wrapped.
I think perhaps the content of your string is to blame?
Could you post the HTML you get from this editor object?
Update:
Using the code from this answer on the HTML shared in a new comment to the question, I get the following result:
As you can see, the content is distributed over two lines. No content "falls off the page."

Related

merge multiple pdfs in order

hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf

It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}

Apache POI Table of contents not updating

I am using Apache POI XWPF components and java, to extract data from a .xml file into a word document. So far so good, but I am struggling to create a table of contents.
I have to create a table of contents at the start of the method and then I update it at the end to get all the new headers. Currently I use doc.createTOC(), where doc is a variable created from XWPFDocument, to create the table at the start and then I use doc.enforceUpdateFields() to update everything at the end of the document. But when I open the document after I ran the program, the table of contents is empty, but the navigation panel does include some of the headers I specified.
A comment recommended that I include some code. So i started off by create the document from a template:
XWPFDocument doc = new XWPFDocument(new FileInputStream("D://Template.docx"));
I then create a table of contents:
doc.createTOC();
Then throughout the method I add headers to the document:
XWPFParagraph documentControlHeading = doc.createParagraph();
documentControlHeading.setPageBreak(true);
documentControlHeading.setAlignment(ParagraphAlignment.LEFT);
documentControlHeading.setStyle("Tier1Header");
After all the headers are added, I want to update the document so that all the new headers will appear in the table of contents. I do this buy using the following command:
doc.enforceUpdateFields();

Hmmm... I am looking at the createTOC() method code, and it appears that it looks for styles that look like Heading #. So Tier1Header would not be found. Try creating your text first, and use styles like Heading 1 for your headings. Then add the TOC using createTOC(). It should find all the headings when the TOC is created. I do not know if enforceUpdateFields() affects the TOC.

//Your docx template should contain the following or something similar text //which will be searched for and replaced with a WORD TOC.
//${TOC}
public static void main(String[] args) throws IOException, OpenXML4JException {
XWPFDocument docTemplate = null;
try {
File file = new File(PATH_TO_FILE); //"C:\\Reports\\Template.docx";
FileInputStream fis = new FileInputStream(file);
docTemplate = new XWPFDocument(fis);
generateTOC(docTemplate);
saveDocument(docTemplate);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (docTemplate != null) {
docTemplate.close();
}
}
}
private static void saveDocument(XWPFDocument docTemplate) throws FileNotFoundException, IOException {
FileOutputStream outputFile = null;
try {
outputFile = new FileOutputStream(OUTFILENAME);
docTemplate.write(outputFile);
} finally {
if (outputFile != null) {
outputFile.close();
}
}
}
public static void generateTOC(XWPFDocument document) throws InvalidFormatException, FileNotFoundException, IOException {
String findText = "${TOC}";
String replaceText = "";
for (XWPFParagraph p : document.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
int pos = r.getTextPosition();
String text = r.getText(pos);
if (text != null && text.contains(findText)) {
text = text.replace(findText, replaceText);
r.setText(text, 0);
addField(p, "TOC \\o \"1-3\" \\h \\z \\u");
break;
}
}
}
}
private static void addField(XWPFParagraph paragraph, String fieldName) {
CTSimpleField ctSimpleField = paragraph.getCTP().addNewFldSimple();
// ctSimpleField.setInstr(fieldName + " \\* MERGEFORMAT ");
ctSimpleField.setInstr(fieldName);
ctSimpleField.addNewR().addNewT().setStringValue("<<fieldName>>");
}

This is the code of createTOC(), obtained by inspecting XWPFDocument.class:
public void createTOC() {
CTSdtBlock block = getDocument().getBody().addNewSdt();
TOC toc = new TOC(block);
for (XWPFParagraph par : this.paragraphs) {
String parStyle = par.getStyle();
if ((parStyle != null) && (parStyle.startsWith("Heading"))) try {
int level = Integer.valueOf(parStyle.substring("Heading".length())).intValue();
toc.addRow(level, par.getText(), 1, "112723803");
} catch (NumberFormatException e) {
e.printStackTrace();
}
}
}
As you can see, it adds to the TOC all paragraphs having styles named "HeadingX", with X being a number. But, unfortunately, that's not sufficent. The method, in fact, is bugged/uncomplete in its implementation.
The page number passed to addRow() is always 1, it's not even calculated.
So, at the end, you will have a TOC with all your paragraphs and the trailing dots giving the proper indentation, but the pages will be always equal to "1".
EDIT
...but, there's a solution here.

Internal links are not working on merged PDF files

I am currently modifying the logic for generating a PDF report, using itext package (itext-2.1.7.jar). The report is generated first, then - table of contents (TOC) and then TOC is inserted into the report file, using PdfStamper and it's method - replacePage:
PdfStamper stamper = new PdfStamper(new PdfReader(finalFile.getAbsolutePath()),
new FileOutputStream(reportFile));
stamper.replacePage(new PdfReader(tocFile.getAbsolutePath()), 1, 2);
However, I wanted TOC to also have internal links to point to the right chapter. Using replacePage() method, unfortunately, removes all links and annotations, so I tried another way. I've used a method I've found to merge the report file and the TOC file. It did the trick, meaning that the links were now preserved, but they don't seem to work, the user can click them, but nothing happens. I've placed internal links on other places in the report file, for testing purposes, and they seem to work, but they don't work when placed on the actual TOC. TOC is generated separately, here's how I've created the links (addLeadingDots is a local method, not pertinent to the problem):
Chunk anchor = new Chunk(idx + ". " + chapterName + getLeadingDots(chapterName, pageNumber) + " " + pageNumber, MainReport.FONT12);
String anchorLocation = idx+chapterName.toUpperCase().replaceAll("\\s","");
anchor.setLocalGoto(anchorLocation);
line = new Paragraph();
line.add(anchor);
line.setLeading(6);
table.addCell(line);
And this is how I've created anchor destinations on the chapters in the report file:
Chunk target = new Chunk(title.toUpperCase(), font);
String anchorDestination = title.toUpperCase().replace(".", "").replaceAll("\\s","");
target.setLocalDestination(anchorDestination);
Paragraph p = new Paragraph(target);
p.setAlignment(Paragraph.ALIGN_CENTER);
doc.add(p);
Finally, here's the method that supposed to merge report file and the TOC:
public static void mergePDFs(String originalFilePath, String fileToInsertPath, String outputFile, int location) {
PdfReader originalFileReader = null;
try {
originalFileReader = new PdfReader(originalFilePath);
} catch (IOException ex) {
System.out.println("ITextHelper.addPDFToPDF(): can't read original file: " + ex);
}
PdfReader fileToAddReader = null;
try {
fileToAddReader = new PdfReader(fileToInsertPath);
} catch (IOException ex) {
System.out.println("ITextHelper.addPDFToPDF(): can't read fileToInsert: " + ex);
}
if (originalFileReader != null && fileToAddReader != null) {
int numberOfOriginalPages = originalFileReader.getNumberOfPages();
Document document = new Document();
PdfCopy copy = null;
try {
copy = new PdfCopy(document, new FileOutputStream(outputFile));
document.open();
for (int i = 1; i <= numberOfOriginalPages; i++) {
if (i == location) {
for (int j = 1; j <= fileToAddReader.getNumberOfPages(); j++) {
copy.addPage(copy.getImportedPage(fileToAddReader, j));
}
}
copy.addPage(copy.getImportedPage(originalFileReader, i));
}
document.close();
} catch (DocumentException | FileNotFoundException ex) {
m_logger.error("ITextHelper.addPDFToPDF(): can't read output location: " + ex);
} catch (IOException ex) {
m_logger.error(Level.SEVERE, ex);
}
}
}
So, the main question is, what happens to the links, after both pdf documents are merged and how to make them work? I'm not too experienced with itext, so any help is appreciated.
Thanks for your help

How to parse this site with Jsoup or another parser?

I'am trying to parse a page which has no defined encoding in its header, in the HTML it defines ISO-8859-1 as encoding. Jsoup isn't able to parse it with default settings (also HTMLunit and PHP's Simple HTML Dom Parser can't handle it by default). Even if I define the encoding for Jsoup myself it still isn't working. Can't figure out why.
Here's my code:
String url = "http://www.parkett.de";
Document doc = null;
try {
doc = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);
// doc = Jsoup.parse(new URL(url).openStream(), "CP1252", url);
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
Element extractHtml = null;
Elements elements = null;
String title = null;
elements = doc.select("h1");
if(!elements.isEmpty()) {
extractHtml = elements.get(0);
title = extractHtml.text();
}
System.out.println(title);
Thanks for any suggestions!

When working with URLs, chapters 4 & 9 of the cookbook recommend using Jsoup.connect(...).get(). Chapter 5 suggests using Jsoup.parse() when loading a document from a local file.
public static void main(String[] args) {
Document doc = null;
try {
doc = Jsoup.connect("http://www.parkett.de/").get();
} catch (IOException e) {
e.printStackTrace();
}
Element firstH1 = doc.select("h1").first();
System.out.println((firstH1 != null) ? firstH1.text() : "First <h1> not found.");
}

iText continuous PDF editing java

I'm using iText library to create and add data to a PDF.
I want to add some textLines and an image to the PDF more than once until i close the file.
numOfSamples = timeIHitTheButton();
.
.
.
*a loop tha call it the number of times choosen by numOfSamples*
DSM.saveData();
The DataStore (DSM is a DataStore instance) class creates the Document doc.pdf correctly and DSM.addText() and DSM.addPicture() prints correctly three textlines an an image on the file, but only if I press the button just once !!
I WANT TO WRITE THE SAME STRING AND AN IMAGE EVERY TIME I PRESS THE BUTTON (if I press it once i have one sample, if trwice i have two samples etc). IF I PRESS IT JUST ONCE AND I TERMINATE, I GET MY PDF WITH THE STRING AND THE PICTURES, BUT IF I PRESS IT MORE THAN ONCE, I GOT AN UNREADABLE AND DAMAGED PDF FILE. I DON'T KNOW WHY. HOW CAN I CONTINUE WRITIN A PICTURE AND THE STRING CONTINUOSLY UNTIL THE NUMBER OF SAMPLES IS FINISHED?
Here i post some code if useful ("newPic1.jpg" "newPic2.jpg" etc are the stored pictures to add to the PDF togheter with the text.):
public class DataStore{ ....
.
.
.
public DataStore(String Str1, String Str2, String Str3, int numOfSemples)
throws Exception{
document = new Document();
String1 = str1;
String2 = str2;
String3 = str3;
Samples = numOfSemples;
document.open();
}
privatevoid saveData(){
if(!created){
this.createFile();
created=true;
}
this.addText();
this.addPicture();
}
private void createFile(){
try {
OutputStream file = new FileOutputStream(
new File("Doc.pdf"));
PdfWriter.getInstance(document, file);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (DocumentException e) {
e.printStackTrace();
}
}
private void addText(){
try {
if(Samples > 0)
document.open();
document.add(new Paragraph(Double.toString(String1)));
document.add(new Paragraph(Double.toString(String2)));
document.add(new Paragraph(Double.toString(String3)));
} catch (DocumentException e) {
e.printStackTrace();
}
}
private void addPicture(){
try {
Image img = Image.getInstance("NewPic" + Samples + ".jpg");
document.add(img);
} catch (BadElementException bee) {
bee.printStackTrace();
} catch (MalformedURLException mue) {
mue.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} catch (DocumentException dee) {
dee.printStackTrace();
}
if(Samples == 0)
document.close();
else Samples--;
}
}

You use iText commands in the wrong order:
Your DataStore constructor creates a new Document and calls its open method (which is too early as there is no writer yet).
Some time later, in the first saveData call, you call createFile which creates the PdfWriter.
In all saveData calls addText is called which for Samples > 0 opens the document again each time (which is ok at the first time but shall not be done multiple times).
Eventually, in the saveData call with Samples == 0 you close the document.
Thus, in essence you do this:
document = new Document();
document.open();
[...]
PdfWriter.getInstance(document, file);
[...]
[for `Samples` times]
document.open();
[add some paragraphs]
[add an image]
[end for]
document.close();
Compare this to how it should be done:
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
[add content to the PDF]
// step 5
document.close();
(copied from the HelloWorld.java sample from iText in Action — 2nd Edition)
Only for Samples == 1 you have it about right (the superfluous document.open() in the constructor being ignored as there is no writer yet); for larger values of Samples, though, you open the document multiple times with a writer present which will likely append a PDF start over and over again to the output stream.
Quite likely you can fix the issue by removing all your current document.open() calls (including the if(Samples > 0) in addText()) and add one in createFile() right after PdfWriter.getInstance(document, file).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

IText html to pdf wrapping line - java

Related

merge multiple pdfs in order

Apache POI Table of contents not updating

Internal links are not working on merged PDF files

How to parse this site with Jsoup or another parser?

iText continuous PDF editing java

Categories

Resources