iText style parsing HTML to PDF - java

I have a problem with iText.
I followed the answer to this question: How to export html page to pdf format?
My snippet:
String str = "<html><head><body><div style=\"width:100%;height:100%;\"><h3 style=\"margin-left:5px;margin-top:40px\">First</h3><div style=\"margin-left:15px;margin-top:15px\"><title></title><p>sdasdasd shshshshdffgdfgd</p></div><h3 style=\"margin-left:5px;margin-top:40px\">The dream</h3><div style=\"margin-left:15px;margin-top:15px\"></div></div></body></head></html>";
String fileNameWithPath = "/Users/cecco/Desktop/pdf2.pdf";
com.itextpdf.text.Document document =
new com.itextpdf.text.Document(com.itextpdf.text.PageSize.A4);
FileOutputStream fos = new FileOutputStream(fileNameWithPath);
com.itextpdf.text.pdf.PdfWriter pdfWriter =
com.itextpdf.text.pdf.PdfWriter.getInstance(document, fos);
document.open();
document.addAuthor("Myself");
document.addSubject("My Subject");
document.addCreationDate();
document.addTitle("My Title");
com.itextpdf.text.html.simpleparser.HTMLWorker htmlWorker =
new com.itextpdf.text.html.simpleparser.HTMLWorker(document);
htmlWorker.parse(new StringReader(str.toString()));
document.close();
fos.close();
This works fine, but the style attributes on the h3 and div tags are ignored.
If I paste the same HTML into http://htmledit.squarefree.com/, everything renders correctly.
How can I solve this problem?

iText isn't the best HTML parser, but you can use Flying Saucer for this. Flying Saucer is built on top of iText but has a capable XML / (X)HTML parser. In short: Flying Saucer is perfect if you want HTML -> PDF.
Here's how to generate the PDF from your string:
/*
* Note: I put something in the title tag and fixed the head tag (the whole body tag was inside the head)
*/
String str = "<html><head></head><body><div style=\"width:100%;height:100%;\"><h3 style=\"margin-left:5px;margin-top:40px\">First</h3><div style=\"margin-left:15px;margin-top:15px\"><title>t</title><p>sdasdasd shshshshdffgdfgd</p></div><h3 style=\"margin-left:5px;margin-top:40px\">The dream</h3><div style=\"margin-left:15px;margin-top:15px\"></div></div></body></html>";
OutputStream os = new FileOutputStream(new File("example.pdf"));
ITextRenderer renderer = new ITextRenderer();
renderer.setDocumentFromString(str);
renderer.layout();
renderer.createPDF(os);
os.close();
But: Flying Saucer only supports valid HTML / XHTML / XML, so make sure your input is.
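One way to catch malformed input before handing it to the renderer is to run it through the JDK's own XML parser first. The following is only a sketch using standard JDK classes; the class name XhtmlCheck and the method isWellFormed are made up for illustration, and the check covers XML well-formedness only, not HTML validity.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of iText or Flying Saucer.
final class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Every tag is closed, so this parses.
        System.out.println(isWellFormed(
                "<html><head></head><body><p>ok</p></body></html>"));
        // The <p> tag is never closed, so the parser rejects it.
        System.out.println(isWellFormed(
                "<html><body><p>unclosed</body></html>"));
    }
}
```

If the check fails, the HTML probably needs cleaning up (closing tags, quoting attributes) before renderer.setDocumentFromString(str) will accept it.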

Related

Conversion of HTML( with inline css) to PDF using itext 2.1.7

I want to convert an HTML page to PDF using iText 2.1.7. I used HTMLWorker to convert the HTML file, but it does not apply the inline CSS used in the HTML. Below is my code snippet. Can anyone help me fix this issue?
PdfWriter pdfWriter = PdfWriter.getInstance(document, new
FileOutputStream("D:/testpdf.pdf"));
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(htmlContent));
document.close();
Thanks in advance!
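Use itext7-7.0.2, because iText 2.1.7 does not support inline CSS. Note that HTMLWorker, used in the snippet below, is the legacy iText 2.x/5.x class; in iText 7, HTML conversion is done through the pdfHTML add-on's HtmlConverter.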
String htmlContent = "<html><body style='color:red'> PDF project </body></html>";
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(new File("C:\\testpdf.pdf")));
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(htmlContent));
document.close();

Apache PdfBox Rotate Crop Box Only Not Text

I am trying to go from text to PDF but have only one of the pages rotated 90 degrees. The main reason is that some of my text documents are a bit too large and need to be in landscape to look normal. I have tried a few things, but everything seems to rotate the text too. Is there an easy way to rotate the page to landscape but keep the text in the same orientation?
OutputStream outputStream = response.getOutputStream();
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
Map<String, Documents> documents = getDocuments(user, documentReports);
try (PDDocument documentToPrint = new PDDocument()){
for(Document doc : documentReports){
TextToPDF textToPDF = new TextToPDF();
textToPDF.setFont(PDType1Font.COURIER);
textToPDF.setFontSize(8);
Document documentReport = documents.get(doc.getId());
try(PDDocument pdDocument = textToPDF.createPDFFromText(new InputStreamReader(new ByteArrayInputStream(documentReport.getReportText().getBytes())))) {
pdfMergerUtility.appendDocument(documentToPrint, pdDocument);
}
}
pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
LocalDateTime localUtcTime = Java8TimeUtil.getCurrentUtcTime();
documentToPrint.getDocumentInformation().setTitle(localUtcTime.toString());
response.setHeader("Content-Disposition", "inline; filename=" + localUtcTime + ".pdf");
response.setContentType("application/pdf");
documentToPrint.save(outputStream);
}
This might not work for everyone, but I figured it out for my specific requirement. TextToPDF has a method called setLandscape, which has to be called before creating the PDF from text: textToPDF.setLandscape(true);

How to set PDF page size A4 when we use ITextRenderer to generate PDF from thymeleaf HTML template?

I have generated a PDF, but the page size is not right. How do I set the page size to A4 with the ITextRenderer library in Java?
ClassLoaderTemplateResolver templateResolver = new
ClassLoaderTemplateResolver();
templateResolver.setSuffix(".html");
templateResolver.setTemplateMode("HTML5");
TemplateEngine templateEngine = new TemplateEngine();
templateEngine.setTemplateResolver(templateResolver);
Context context = new Context();
context.setVariable("name", "Thomas");
String html = templateEngine.process("templates/Quote", context);
OutputStream outputStream = new FileOutputStream("message.pdf");
ITextRenderer renderer = new ITextRenderer();
renderer.setDocumentFromString(html);
renderer.layout();
renderer.createPDF(outputStream,true);
outputStream.close();
Please be aware that you are using FlyingSaucer, not iText.
FlyingSaucer is a product that internally uses (a very old version of) iText.
You are immediately cutting yourself off from 10+ years of bugfixes and developments.
If you are comfortable going for just iText, the best way of solving this issue is with pdfHTML.
It's an add-on we wrote to the iText7 core library that is specifically designed to convert HTML into PDF.
Simple example:
File src = new File("source_html_file.html");
File dest = new File("target_pdf_file.pdf");
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document doc = new Document(pdf, PageSize.A4);
InputStream stream = new FileInputStream(src);
ConverterProperties converterProperties = new ConverterProperties();
FontProvider dfp = new DefaultFontProvider(true, true, true);
converterProperties.setFontProvider(dfp);
HtmlConverter.convertToPdf(stream, pdf, converterProperties);
Check out the tutorials online for more information
https://developers.itextpdf.com/content/itext-7-examples/itext-7-converting-html-pdf
To fix this issue when using jsoup and xhtmlrenderer from flying-saucer-pdf-openpdf, just add this CSS rule to your HTML document:
@page {
size: A4;
}
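If the template itself cannot be changed, the page-size rule can also be added to the processed HTML string before it reaches renderer.setDocumentFromString(html). This is a rough sketch that assumes the template output contains a literal lowercase </head> tag; the class PageSizeCss and the method withA4PageRule are hypothetical names, not part of Flying Saucer or Thymeleaf.

```java
// Hypothetical helper: inserts an A4 @page rule just before </head>.
final class PageSizeCss {

    static final String A4_RULE =
            "<style type=\"text/css\">@page { size: A4; }</style>";

    static String withA4PageRule(String html) {
        int headEnd = html.indexOf("</head>");
        if (headEnd < 0) {
            // Assumption: the document always has a head element.
            throw new IllegalArgumentException("no </head> tag found");
        }
        return html.substring(0, headEnd) + A4_RULE + html.substring(headEnd);
    }

    public static void main(String[] args) {
        String html =
                "<html><head><title>Quote</title></head><body><p>Hello</p></body></html>";
        System.out.println(withA4PageRule(html));
    }
}
```

With the question's code this would be called as renderer.setDocumentFromString(PageSizeCss.withA4PageRule(html)). Putting the rule directly into the Thymeleaf template is simpler when that is an option.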

pdfwriter doesn't translate special characters

I have an HTML file with external CSS. I want to create a PDF from the HTML file, but the encoding doesn't work. The HTML file displays fine, but after conversion to PDF some characters (čřě...) are missing. It happens even though I pass the Charset to parseXHtml.
How do I solve this, please?
public void createPDF() {
try {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(username + ID + ".pdf"));
document.open();
String hovinko = username + ID + ".html";
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(hovinko), Charset.forName("UTF-8"));
document.close();
System.out.println("PDF Created!");
} catch (Exception ex) {
ex.printStackTrace();
}
}
Did you try converting your special characters to HTML entities before writing them to your PDF?
yourHTMLString.replaceAll(oldChar, newChar);
ć = &#263;
ř = &#345;
ě = &#283;
If you need more special characters, visit this link.
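Instead of one replaceAll call per character, the same replacement can be applied to every non-ASCII character at once. This is a minimal sketch using only the JDK; HtmlEntities and escapeNonAscii are illustrative names, and the non-ASCII test characters are written as \u escapes so the source file encoding does not matter.

```java
// Hypothetical helper: replaces every non-ASCII character with a numeric HTML entity.
final class HtmlEntities {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                  // plain ASCII stays as-is
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+0159 becomes &#345;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u010D\u0159\u011B are the characters from the question (c, r, e with caron)
        System.out.println(escapeNonAscii("<p>\u010D\u0159\u011B</p>"));
        // prints <p>&#269;&#345;&#283;</p>
    }
}
```

This only changes how the characters are written in the markup. The font used for rendering still has to contain the glyphs.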
EDIT: Then try this out, it worked for me:
BaseFont basefont = BaseFont.createFont("C:/Windows/Fonts/ARIAL.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(basefont, 12);
document.add(new Paragraph("čřě", font));
Try the logic below, where hovinko holds the HTML content itself rather than a file name. It worked for me:
InputStream is = new ByteArrayInputStream(hovinko.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is, Charset.forName("UTF-8"));
I used xmlworker version 5.5.12 and itextpdf version 5.5.12.
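Why the explicit charset matters on both sides can be shown without iText at all. The sketch below (JDK only; CharsetRoundTrip and readBack are made-up names) encodes a string as UTF-8 and reads it back from a stream, once with the matching charset and once with a mismatched one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of iText or XML Worker.
final class CharsetRoundTrip {

    /** Encodes text as UTF-8 bytes, then decodes the byte stream with readCharset. */
    static String readBack(String text, Charset readCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        try (Reader reader =
                new InputStreamReader(new ByteArrayInputStream(utf8), readCharset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p>\u010D\u0159\u011B</p>";
        // Same charset for writing and reading: the text survives.
        System.out.println(html.equals(readBack(html, StandardCharsets.UTF_8)));
        // Mismatched charset: each two-byte character is decoded as two wrong ones.
        System.out.println(html.equals(readBack(html, StandardCharsets.ISO_8859_1)));
    }
}
```

The first comparison prints true and the second prints false, which is the kind of mismatch that can happen when a file is written in one encoding and parsed with another.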
I was struggling with the same problem (Polish special characters).
For me the solution was to set a suitable font-family in the HTML code, one that contains those characters.

white-space:nowrap does not work properly with Flying Saucer

I am using Flying Saucer to convert HTML documents to PDF. But there is a problem when I use <span style="white-space:nowrap">
Generally white-space:nowrap works fine. But when the span is near the right-margin of the document, it gets trimmed.
For example:
The HTML This is fine. <span style="white-space:nowrap">This is a test</span> gets converted to PDF like this:
[image]
which is perfect.
But when I use This is fine. This is also fine. <span style="white-space:nowrap">This is a test</span>, it gets converted to:
[image]
Notice that part of the span is trimmed at the right margin.
What I expect is:
[image]
i.e. I expect the span to move to the next line.
The code I am using to convert to pdf is:
String inputFile = "test.html";
String url = new File(inputFile).toURI().toURL().toString();
String outputFile = "firstdoc.pdf";
OutputStream os = new FileOutputStream(outputFile);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
