How to Write HTML Content in PDF file using PDFBox

I want to write HTML content to a PDF using PDFBox in Java. How can I do this? Is there a method for adding HTML content? There are various add methods, but I can't find one that accepts HTML.

As of PDFBox 2.0.6 there is no HTML rendering support. I have heard that this feature may be added in a future release.
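Since PDFBox cannot lay out HTML itself, one fallback is to reduce the HTML to plain text lines first and then draw each line yourself (for example with PDPageContentStream.showText). The following is only a minimal sketch of that first step using the JDK alone; the class name HtmlToPlainText and the regex-based tag stripping are my own assumptions, all formatting is lost, and a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox): turns simple HTML into plain text lines.
final class HtmlToPlainText {

    static List<String> toLines(String html) {
        String text = html
                // Treat the end of common block-level elements as a line break.
                .replaceAll("(?i)<br\\s*/?>|</p>|</div>|</h[1-6]>|</tr>", "\n")
                // Drop every remaining tag.
                .replaceAll("<[^>]+>", "")
                // Decode a few common entities; &amp; goes last to avoid double decoding.
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");

        List<String> lines = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        for (String line : toLines(html)) {
            System.out.println(line);
        }
    }
}
```

Each returned line could then be written to the page one at a time; tables, colours and fonts from the HTML would have to be recreated by hand.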

You can do the same thing with iText instead, using this code.
import java.io.FileOutputStream;
import java.io.StringReader;
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.pdf.PdfWriter;
public class Test {
public static void main(String ... args ) {
try {
Document document = new Document(PageSize.LETTER);
PdfWriter.getInstance(document, new FileOutputStream("E:\\yogesh\\test.pdf"));
document.open();
document.addAuthor("author");
document.addSubject("subject");
document.addCreationDate();
document.addTitle("title");
HTMLWorker htmlWorker = new HTMLWorker(document);
String str = "<html><head></head><body>"+
"<table border='1'><tr><td>Demo</td>" +
"<td bgcolor='red'>DEMO</td></tr>DEMO</table>" +
"</body></html>";
htmlWorker.parse(new StringReader(str));
document.close();
System.out.println("Done");
}
catch (Exception e) {
e.printStackTrace();
}
}
}

Related

PDF Producer properties getting changed when I use merge of Itext7 for FOP Generated PDF Statements

I have PDF statements generated with Apache FOP 2.6. When I open the document properties of those statements, the producer is shown as Apache FOP.
Now I want to merge those statements into a single PDF file. For that I used PdfMerger from iText 7, but in the merged file the producer property changes to iText 7.
Here's the code I used.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.utils.PdfMerger;
public class ApacheFOPExample {
public static void main(String[] args) {
mergePdfs();
}
public static void mergePdfs() {
try {
PdfDocument pdf = new PdfDocument(new PdfWriter(new File("D:\\Workspace\\Itext\\xsl_employee_merged.pdf")));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new File("D:\\Workspace\\Itext\\xsl_employee.pdf")));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new File("D:\\Workspace\\Itext\\xsl_employee_another.pdf")));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
I want to merge or copy multiple statements into a single PDF while keeping the properties intact. How can I achieve this? Is there an alternative way to merge them?
Note: I can't use iText 5 classes for this.
Thank You

How can I add pTab elements to docx4j while converting document to pdf

I'm getting an error while converting a document to PDF using the docx4j library in Java. The error is:
NOT IMPLEMENTED support for w:pict without v:imagedata
and it shows up in the converted PDF instead of being printed in my Java terminal.
I have gone through some articles and questions and found this one on converting docx to pdf. However, I am not sure how to apply it to my code. This is my code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.model.structure.SectionWrapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class docTopdf {
public static void main(String[] args) {
try {
InputStream is = new FileInputStream(
new File(
"test.docx"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(is);
List<SectionWrapper> sections = wordMLPackage.getDocumentModel().getSections();
for (int i = 0; i < sections.size(); i++) {
wordMLPackage.getDocumentModel().getSections().get(i)
.getPageDimensions();
}
PhysicalFonts.discoverPhysicalFonts();
@Deprecated
Map<String, PhysicalFont> physicalFonts = PhysicalFonts.getPhysicalFonts();
// 2) Prepare Pdf settings
@Deprecated
PdfSettings pdfSettings = new PdfSettings();
// 3) Convert WordprocessingMLPackage to Pdf
@Deprecated
org.docx4j.convert.out.pdf.PdfConversion conversion = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
wordMLPackage);
@Deprecated
OutputStream out = new FileOutputStream(
new File(
"test.pdf"));
conversion.output(out, pdfSettings);
} catch (Throwable e) {
e.printStackTrace();
}
}
}
And my pom.xml
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j</artifactId>
<version>3.2.1</version>
</dependency>
Any help would be appreciated, as I am new to this kind of conversion. Thanks in advance.

Creating a PDF via XSL FO doesn't support w:pict without v:imagedata (ie a graphic which isn't a simple image).
Whilst you could suppress the message by configuring logging appropriately, your PDF output would be lossy.
Your options are to correct the input docx (ie use an image instead of whatever you currently have), or to use a PDF converter with appropriate support. For one option, see https://www.docx4java.org/blog/2020/03/documents4j-for-pdf-output/
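If you go the route of correcting the input docx, it helps to know first how many offending graphics there are. Below is a rough sketch using only the JDK that counts w:pict elements with no v:imagedata descendant. The class name PictChecker is made up for this example, and it assumes the main document part sits at the conventional path word/document.xml; headers, footers and other parts are not inspected.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of docx4j): finds w:pict elements without v:imagedata.
final class PictChecker {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String V_NS = "urn:schemas-microsoft-com:vml";

    static int countPictsWithoutImageData(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so the w: and v: prefixes resolve to namespaces
        Document doc = factory.newDocumentBuilder().parse(documentXml);
        NodeList picts = doc.getElementsByTagNameNS(W_NS, "pict");
        int count = 0;
        for (int i = 0; i < picts.getLength(); i++) {
            Element pict = (Element) picts.item(i);
            // A pict with no v:imagedata descendant is what the XSL FO export cannot handle.
            if (pict.getElementsByTagNameNS(V_NS, "imagedata").getLength() == 0) {
                count++;
            }
        }
        return count;
    }

    static int countInXml(String xml) throws Exception {
        return countPictsWithoutImageData(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Assumption: the main document part is stored as word/document.xml inside the zip.
    static int countInDocx(File docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no word/document.xml in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countPictsWithoutImageData(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: PictChecker <file.docx>");
            return;
        }
        System.out.println(countInDocx(new File(args[0])) + " w:pict element(s) without v:imagedata");
    }
}
```

A count above zero means the XSL FO route will print the NOT IMPLEMENTED message for those graphics, so they would need to be replaced with plain images or converted with a different tool.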

How to write HTML text with Marathi text to PDF document using docx4j?

I am using docx4j to create PDF documents from HTML text. The HTML contains both English and Marathi text. The English text appears correctly in the PDF, but the Marathi text is not displayed.
Square boxes are shown in place of the text.
Below is the code I am using.
import java.io.FileOutputStream;
import org.docx4j.Docx4J;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class ConvertInXHTMLFragment {
static String DEST_PDF = "/home/Downloads/Sample.pdf";
public static void main(String[] args) throws Exception {
// String content = "<html>Hello</html>";
String content = "<html>पासवर्ड</html>";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(content, null));
Docx4J.toPDF(wordMLPackage, new FileOutputStream(DEST_PDF));
}
}
EDIT 1:-
This is from one of the samples from XSLFO
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.model.fields.FieldUpdater;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;
public class ConvertOutPDFviaXSLFO extends AbstractSample {
static {
inputfilepath = "/home/Downloads/100.docx";
saveFO = true;
}
static boolean saveFO;
public static void main(String[] args)
throws Exception {
try {
getInputFilePath(args);
} catch (IllegalArgumentException e) {
}
String regex = null;
PhysicalFonts.setRegex(regex);
WordprocessingMLPackage wordMLPackage;
System.out.println("Loading file from " + inputfilepath);
wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
FieldUpdater updater = null;
Mapper fontMapper = new IdentityPlusMapper();
wordMLPackage.setFontMapper(fontMapper);
PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
fontMapper.put("Mangal", font);
FOSettings foSettings = Docx4J.createFOSettings();
if (saveFO) {
foSettings.setFoDumpFile(new java.io.File(inputfilepath + ".fo"));
}
foSettings.setWmlPackage(wordMLPackage);
String outputfilepath;
if (inputfilepath==null) {
outputfilepath = System.getProperty("user.dir") + "/OUT_FontContent.pdf";
} else {
outputfilepath = inputfilepath + ".pdf";
}
OutputStream os = new java.io.FileOutputStream(outputfilepath);
Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
System.out.println("Saved: " + outputfilepath);
if (wordMLPackage.getMainDocumentPart().getFontTablePart()!=null) {
wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
}
// This would also do it, via finalize() methods
updater = null;
foSettings = null;
wordMLPackage = null;
}
}
Now I get #### in place of the Marathi text in the output PDF.

Docx4j v3.3 supports PDF output in two completely different ways.
The default is to use Plutext's PDF Converter. Things work if the mangal font you linked to is installed in the Converter and specified in the docx:
<w:r>
<w:rPr>
<w:rFonts w:ascii="mangal" w:eastAsia="mangal" w:hAnsi="mangal" w:cs="mangal"/>
</w:rPr>
<w:t>पासवर्ड</w:t>
</w:r>
Same would apply for Arial Unicode MS.
The other way is PDF via XSL FO; see https://github.com/plutext/docx4j-export-FO
If you have the relevant font installed it should just work. If you don't, then you need to tell it which font to use.
For example, suppose the docx specifies the mangal font, which I do not have. But I have Arial Unicode MS. So I tell the XSL FO process to use that instead:
fontMapper.put("mangal", PhysicalFonts.get("Arial Unicode MS"));
Note that you need to know which font your docx is specifying, and how to specify the font you want. For XHTML Import, here is what I wrote in my answer to your earlier question:
Fonts are handled by
https://github.com/plutext/docx4j-ImportXHTML/blob/master/src/main/java/org/docx4j/convert/in/xhtml/FontHandler.java#L58
Marathi might be relying on one of the other attributes in the RFonts
object. You'll need to look at a working docx to see. You can use
https://github.com/plutext/docx4j-ImportXHTML/blob/master/src/main/java/org/docx4j/convert/in/xhtml/FontHandler.java#L54
to inject a suitable font mapping.

PDF Creation for General Health report (sample image attached)

I want to generate a PDF health report on a monthly basis for each individual person. Can anybody help me create the PDF?
A sample PDF is attached.

You should use the iText API for PDF creation.

import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.net.URL;
public class ImageExample {
public static void main(String[] args) {
Document document = new Document();
try {
PdfWriter.getInstance(document,
new FileOutputStream("Image.pdf"));
document.open();
Image image1 = Image.getInstance("watermark.png");
document.add(image1);
String imageUrl = "IPaddress/sitename/images/" +
"imagename.jpg";
Image image2 = Image.getInstance(new URL(imageUrl));
document.add(image2);
document.close();
} catch(Exception e){
e.printStackTrace();
}
}
}

HTML to PDF using iText : How can produce a checkbox

I have a simple HTML page and iText is able to produce a PDF from it. That works, but the checkbox is ignored. What can I do about it?
import java.io.FileOutputStream;
import java.io.StringReader;
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;
public class HtmlToPDF {
public static void main(String ... args ) {
try {
Document document = new Document(PageSize.LETTER);
PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("c://temp//testpdf.pdf"));
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
String str = "<HTML><HEAD></HEAD><BODY><H1>Testing</H1><FORM>" +
"check : <INPUT TYPE='checkbox' CHECKED/><br/>" +
"</FORM></BODY></HTML>";
htmlWorker.parse(new StringReader(str));
document.close();
System.out.println("Done.");
}
catch (Exception e) {
e.printStackTrace();
}
}
}
I got it working with YAHP ( http://www.allcolor.org/YaHPConverter/ ).
import java.io.File;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// http://www.allcolor.org/YaHPConverter/
import org.allcolor.yahp.converter.CYaHPConverter;
import org.allcolor.yahp.converter.IHtmlToPdfTransformer;
public class HtmlToPdf_yahp {
public static void main(String ... args ) throws Exception {
htmlToPdfFile();
}
public static void htmlToPdfFile() throws Exception {
CYaHPConverter converter = new CYaHPConverter();
File fout = new File("c:/temp/x.pdf");
FileOutputStream out = new FileOutputStream(fout);
Map properties = new HashMap();
List headerFooterList = new ArrayList();
String str = "<HTML><HEAD></HEAD><BODY><H1>Testing</H1><FORM>" +
"check : <INPUT TYPE='checkbox' checked=checked/><br/>" +
"</FORM></BODY></HTML>";
properties.put(IHtmlToPdfTransformer.PDF_RENDERER_CLASS,
IHtmlToPdfTransformer.FLYINGSAUCER_PDF_RENDERER);
//properties.put(IHtmlToPdfTransformer.FOP_TTF_FONT_PATH, fontPath);
converter.convertToPdf(str,
IHtmlToPdfTransformer.A4P, headerFooterList, "file://c:/temp/", out,
properties);
out.flush();
out.close();
}
}

Are you generating the HTML?
If so, then instead of using an HTML checkbox you could use the Unicode 'ballot box' character (U+2610, ☐), written in HTML as &#x2610; or &#9744;. It's just a box; you can't electronically tick or untick it, but if the PDF is intended for printing then people can tick it with a pen or pencil.
For example:
String str = "<HTML><HEAD></HEAD><BODY><H1>Testing</H1><FORM>" +
"check : ☐<br/>" +
"</FORM></BODY></HTML>";
Note that this will only work if you're using a Unicode font in your PDF; I think that iText won't use a Unicode font unless you tell it to.
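If you are not writing the HTML by hand, the same substitution can be done in code before the string reaches the HTML parser. This is only a sketch using the JDK's regex classes: the class name CheckboxToBallotBox is made up, the regex is a simplification rather than a full HTML parser, and using U+2611 (ballot box with check) for checked boxes is my own extension of the idea.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps <input type="checkbox"> tags for ballot box characters.
final class CheckboxToBallotBox {
    // Matches an input tag whose type is checkbox (single, double, or no quotes; any case).
    private static final Pattern CHECKBOX = Pattern.compile(
            "<input\\b[^>]*\\btype\\s*=\\s*['\"]?checkbox['\"]?[^>]*>",
            Pattern.CASE_INSENSITIVE);
    // The mere presence of the "checked" attribute marks the box as ticked.
    private static final Pattern CHECKED =
            Pattern.compile("\\bchecked\\b", Pattern.CASE_INSENSITIVE);

    static String replaceCheckboxes(String html) {
        Matcher m = CHECKBOX.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean checked = CHECKED.matcher(m.group()).find();
            // U+2611 = ballot box with check, U+2610 = empty ballot box.
            m.appendReplacement(out, checked ? "\u2611" : "\u2610");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<HTML><HEAD></HEAD><BODY><H1>Testing</H1><FORM>"
                + "check : <INPUT TYPE='checkbox' CHECKED/><br/>"
                + "</FORM></BODY></HTML>";
        System.out.println(replaceCheckboxes(str));
    }
}
```

The Unicode font caveat still applies: the replacement characters only show up if the font used in the PDF contains them.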

You may be out of luck here.
The HTMLWorker class, which is used to parse the HTML tags, doesn't seem to support the input tag.
public static final String tagsSupportedString = "ol ul li a pre font span br p div body table td th tr i b u sub sup em strong s strike h1 h2 h3 h4 h5 h6 img";
You can access the source code for HTMLWorker here:
http://www.java2s.com/Open-Source/Java-Document/PDF/pdf-itext/com/lowagie/text/html/simpleparser/HTMLWorker.java.htm
It is from this source that I figured that out.
public void startElement(String tag, HashMap h) {
if (!tagsSupported.containsKey(tag))
return; //return if tag not supported
// ...
}
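To see in advance which tags in your own HTML will be skipped this way, you can compare them against that list. The sketch below uses only the JDK; the class name UnsupportedTagFinder is made up, the tag list is copied from the source quoted above (other HTMLWorker versions may differ), and the regex only looks at opening tags.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports tags that are missing from HTMLWorker's supported list.
final class UnsupportedTagFinder {
    // Copied from the HTMLWorker source quoted above.
    static final String TAGS_SUPPORTED_STRING =
            "ol ul li a pre font span br p div body table td th tr i b u sub sup em strong s strike h1 h2 h3 h4 h5 h6 img";

    private static final Set<String> SUPPORTED =
            new LinkedHashSet<>(Arrays.asList(TAGS_SUPPORTED_STRING.split(" ")));

    // Captures the element name of each opening tag, e.g. "INPUT" in <INPUT TYPE='checkbox'/>.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    // Returns the unsupported tag names in lower case, in order of first appearance.
    static Set<String> findUnsupported(String html) {
        Set<String> result = new LinkedHashSet<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            if (!SUPPORTED.contains(tag)) {
                result.add(tag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "<HTML><HEAD></HEAD><BODY><H1>Testing</H1><FORM>"
                + "check : <INPUT TYPE='checkbox' CHECKED/><br/>"
                + "</FORM></BODY></HTML>";
        System.out.println(findUnsupported(str));
    }
}
```

For the HTML in the question this reports html, head, form and input; input is the one that matters here.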

Creating PDFs from HTML with iText is a bit troublesome.
I advise using the Flying Saucer library for this.
It also uses iText in the background.

The only alternative I'm aware of at that point is to hack iText. The new XMLWorker should be considerably more extensible than The Old Way (HTMLWorker), but it'll still be Non Trivial.
There might be some magic style tag you can pass in that will show up in a "generic tag" for a PdfPageEvent... let's see here...
Reading the code, it looks like a style or attribute "generictag" will be propagated to the ...text.Chunk object via setGenericTag().
So what you need to do is XSLT your unsupported tags into div/p/whatever with a "generictag" attribute that is a string which encodes the information you need to recreate the original element.
In your PdfPageEvent's onGenericTag method, you have to parse that tag and recreate whatever it is you're trying to recreate.
That's just crazy enough to work!
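The "string which encodes the information you need" part could look something like the sketch below. The format (element name followed by ;key=value pairs) and the class name GenericTagCodec are invented for illustration and are not anything iText defines. It assumes attribute names and values never contain ';' or '='. Only the encode and decode halves are shown; wiring them into the HTML rewrite and the onGenericTag callback is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for packing an HTML element into a single generic tag string.
final class GenericTagCodec {

    // Produces e.g. "input;type=checkbox;checked=checked".
    static String encode(String element, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder(element);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Returns the element name, i.e. everything before the first ';'.
    static String decodeElement(String tag) {
        int semi = tag.indexOf(';');
        return semi < 0 ? tag : tag.substring(0, semi);
    }

    // Returns the attributes in their original order; malformed parts are ignored.
    static Map<String, String> decodeAttributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        String[] parts = tag.split(";");
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                attrs.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("type", "checkbox");
        attrs.put("checked", "checked");
        String tag = encode("input", attrs);
        System.out.println(tag);
        System.out.println(decodeElement(tag) + " " + decodeAttributes(tag));
    }
}
```

In the event handler you would decode the string you receive, check that the element is input with type checkbox, and then draw a box (ticked or not) in the rectangle you are given.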
