itext on tomcat print different fonts - java

I'm running this code both from Eclipse and inside a webapp on Tomcat:
FileInputStream is = new FileInputStream("C:/Users/admin/Desktop/dummy.txt");
try {
FontFactory.register("C:/Workspace/Osmosit/ReportManager/testSvn/ReportManagerCommon/src/main/java/com/osmosit/reportmanager/common/itext/fonts/ARIALUNI.TTF");
} catch (Exception e) {
e.printStackTrace();
}
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(1024);
Document document = new Document(PageSize.A4);
PdfWriter writer;
writer = PdfWriter.getInstance(document, byteArrayOutputStream);
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
byteArrayOutputStream.close();
FileOutputStream fos = new FileOutputStream("C:/Users/admin/Desktop/prova-web.pdf");
fos.write(byteArrayOutputStream.toByteArray());
fos.close();
dummy.txt is a simple HTML file with Arabic and Latin characters:
<div style="font-family: Arial Unicode MS;" ><p>كما. أي مدن العدّ وقام test latin</p><br /></div>
When I run it under Eclipse I get a correct PDF; when it runs on Tomcat I get this:
كما. أي مدن العدّ وقام test latin
PS: I'm using itextpdf ver 5.5.8

You have an encoding problem. Either you saved dummy.txt using the wrong encoding (e.g. as Latin-1 instead of as UTF-8), or you are reading dummy.txt using the wrong encoding.
See "html to pdf convert, cyrillic characters not displayed properly" and adapt the line in which you call parseXHtml():
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
is, null, Charset.forName("UTF-8"), fontImp);
Take a look at the ParseHtml11 example to find out what fontImp is about.
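To make the encoding diagnosis concrete, here is a minimal sketch in plain Java (no iText involved). It assumes dummy.txt is saved as UTF-8 and that the Tomcat JVM falls back to a single-byte default charset, which is common on Windows but is an assumption here, not something the question confirms. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // Encodes the text as UTF-8 (as the file on disk would be) and then decodes
    // those bytes with the charset a reader happens to use.
    static String decodeUtf8BytesAs(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        String original = "\u0643\u0645\u0627 test latin"; // three Arabic letters plus Latin text

        // The default charset can differ between an Eclipse launch and a Tomcat service.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Matching charset: the text survives unchanged.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.UTF_8));

        // Mismatched charset: every Arabic letter turns into two unrelated characters,
        // while the Latin part is unaffected.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1));
    }
}
```

If the default charset printed under Tomcat differs from the one printed under Eclipse, passing the charset explicitly to the parser, as in the line above, removes that dependency.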
You are also making another mistake: Arabic is read from right to left, and in your code, you aren't defining the run direction. See "Arabic characters from html content to pdf using iText".
In your case, I would put the Arabic text in a table and I would follow the ParseHtml7 example from the official documentation:
public void createPdf(String file) throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
// Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
PdfPTable table = new PdfPTable(1);
PdfPCell cell = new PdfPCell();
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
for (Element e : elements) {
cell.addElement(e);
}
table.addCell(cell);
document.add(table);
// step 5
document.close();
}

Related

How to render special characters during HTML to pdf conversion using iText & XMLWorker?

Hi, I am using iText and XMLWorker for HTML to PDF conversion (Java) as below:
public void convertHtmlToPdf(StringBuilder content, String path) throws Exception {
String methodName = "convertHtmlToPdf";
try {
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("C:/Users/Aaryan/Downloads/arial.ttf");
final OutputStream file = new FileOutputStream(new File(path));
final Document document = new Document();
final PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
final TagProcessorFactory tagProcessorFactory = Tags.getHtmlTagProcessorFactory();
tagProcessorFactory.removeProcessor(HTML.Tag.IMG);
tagProcessorFactory.addProcessor(new ImageTagProcessor(), HTML.Tag.IMG);
final CssFilesImpl cssFiles = new CssFilesImpl();
cssFiles.add(XMLWorkerHelper.getInstance().getDefaultCSS());
final StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver(cssFiles);
final HtmlPipelineContext hpc = new HtmlPipelineContext(new CssAppliersImpl(fontProvider));
hpc.setAcceptUnknown(true).autoBookmark(true).setTagFactory(tagProcessorFactory);
final HtmlPipeline htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, writer));
final Pipeline<?> pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
final XMLWorker worker = new XMLWorker(pipeline, true);
final Charset charset = Charset.forName("UTF-8");
final XMLParser xmlParser = new XMLParser(true, worker, charset);
InputStream is2 = new ByteArrayInputStream(content.toString().getBytes());
xmlParser.parse(is2, charset);
is2.close();
document.close();
file.close();
} catch (Exception ex) {
System.out.println("Exception in Class::" + className + "::Method::" + methodName + "::" + ex.getMessage());
ex.printStackTrace();
throw new Exception(ex);
}
}
PDF generation works fine. The HTML content parsed for the PDF conversion has special characters as appropriate entities, as below:
StringBuilder content = new StringBuilder();
content.append("<html><body style=\"font-size:12.0pt; font-family:Arial\">"
+ "<p>Testes → → Vasa efferentia → Kidney → Seminal Vescile</p></body></html>");
The generated PDF displays '?' instead of the appropriate special characters (arrow symbols): "Testes ?? Vasa efferentia ? Kidney ? Seminal Vescile". Where am I going wrong? Please guide me on this.
The solution has almost nothing to do with the code/classes/objects...
You need to set the CSS "font-family" to a font that actually contains the characters you want to output.
For example, if your special characters are inside the 'p' HTML tag, you can set the style below with the desired font-family:
<HEAD>
<style>
p {
font-family: -good-font-family-
}
</style>
</HEAD>
The w3schools entity reference might help, but try replacing the named entity (&rarr;) with its numeric form (&#8594;).
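Besides the font, one more thing may be worth ruling out: the question's code calls content.toString().getBytes() without a charset, so the bytes depend on the JVM default. The sketch below is plain Java and assumes the string contains literal arrow characters and that the default is a single-byte charset such as ISO-8859-1; neither is confirmed by the question. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UnmappableCharDemo {

    // Encodes the text to bytes in the given charset and decodes it back with the
    // same charset, which is what survives a getBytes(charset) call.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String html = "<p>Testes \u2192 Kidney</p>"; // \u2192 is the right arrow

        // ISO-8859-1 has no arrow, so getBytes substitutes '?' before any parser sees it.
        System.out.println(roundTrip(html, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent the arrow, so the text is preserved.
        System.out.println(roundTrip(html, StandardCharsets.UTF_8));
    }
}
```

If this turns out to be the cause, calling getBytes(charset) with the same UTF-8 charset that is passed to xmlParser.parse keeps both sides consistent. The font still has to contain the arrow glyph.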

HTML to PDF using Itext (checkbox, radio ) not rendered

We are converting HTML to PDF using iText and XMLWorker 5.5.5. The code follows. The issue is that radio buttons and checkboxes are not rendered in the PDF. What else is needed for checkboxes and radio buttons?
cssStr is a string containing all the CSS classes.
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(destFile));
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
// step 3
document.open();
// step 4 - Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream(cssStr.getBytes()));
cssResolver.addCss(cssFile);
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(OLTContext.getWebappDir()+"/bs/fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H);
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
ByteArrayInputStream stream = new ByteArrayInputStream(printable.getBytes("UTF-8"));
p.parse(stream, Charset.forName("UTF-8"));
// step 5
document.close();
I also tried iText 7 with a trial license of pdfHtml. Radio buttons and checkboxes, or basically any HTML input elements, are not rendered at all. I used the following code:
LicenseKey.loadLicenseFile("/Users/ashish/server-ws/workspace/Test/lib/itextkey-0.xml");
HtmlConverter.convertToPdf(new File(HTML), new File(DEST));
Unless you have a very good reason to be using XMLWorker, I suggest you try pdfHtml. It's an add-on we released for iText 7 that offers support for HTML5 and CSS3.
A trial license can be obtained for free at the iText website. The next release of pdfHtml should be AGPL licensed and open source (we are currently doing some final code cleanup).

Java iText using Lithuanian letters

I am trying to create a PDF file with Java and I need to use Lithuanian letters in it. I tried to put them in HTML and parse it with HTMLWorker, but it does not work for most letters (it works for some). If anyone could help me with this, I would appreciate it.
try {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(pathy));
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
String s = ("<html>ĄąĘęŪūČŠ"
+ "ŽčšžĖėĮįŲų</html>");
htmlWorker.parse(new StringReader(s));
document.close();
}
catch(Exception e2){
}
I solved my issue by using a Unicode font (IDENTITY_H encoding) directly; I'm still not sure why it doesn't work with the HTML code...
Document document = new Document();
try {
PdfWriter.getInstance(document, new FileOutputStream(pathy));
document.open();
BaseFont bfComic = BaseFont.createFont("c:\\windows\\fonts\\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
document.add(new Paragraph("ąĄčČęĘėĖįĮšŠųŲūŪžŽ", new Font(bfComic, 12)));
} catch (Exception e2) {
System.err.println(e2.getMessage());
}
document.close();

function that can use iText to concatenate / merge pdfs together - causing some issues

I'm using the following code to merge PDFs together using iText:
public static void concatenatePdfs(List<File> listOfPdfFiles, File outputFile) throws DocumentException, IOException {
Document document = new Document();
FileOutputStream outputStream = new FileOutputStream(outputFile);
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
for (File inFile : listOfPdfFiles) {
PdfReader reader = new PdfReader(inFile.getAbsolutePath());
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
cb.addTemplate(page, 0, 0);
}
}
outputStream.flush();
document.close();
outputStream.close();
}
This usually works great! But once in a while, it rotates some of the pages by 90 degrees. Has anyone ever had this happen?
I am looking into the PDFs themselves to see what is special about the ones that are being flipped.
There are errors once in a while because you are using the wrong method to concatenate documents. Please read chapter 6 of my book and you'll notice that using PdfWriter to concatenate (or merge) PDF documents is wrong:
You completely ignore the page size of the pages in the original document (you assume they are all of size A4),
You ignore page boundaries such as the crop box (if present),
You ignore the rotation value stored in the page dictionary,
You throw away all interactivity that is present in the original document, and so on.
Concatenating PDFs is done using PdfCopy, see for instance the FillFlattenMerge2 example:
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
PdfReader reader;
String line = br.readLine();
// loop over readers
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
// end loop
document.close();
There are other examples in the book.
In case anyone is looking for it: based on Bruno Lowagie's correct answer above, here is the version of the function that does not seem to have the page-flipping issue I described:
public static void concatenatePdfs(List<File> listOfPdfFiles, File outputFile) throws DocumentException, IOException {
Document document = new Document();
FileOutputStream outputStream = new FileOutputStream(outputFile);
PdfCopy copy = new PdfSmartCopy(document, outputStream);
document.open();
for (File inFile : listOfPdfFiles) {
PdfReader reader = new PdfReader(inFile.getAbsolutePath());
copy.addDocument(reader);
reader.close();
}
document.close();
}

pdfwriter doesn't translate special characters

I have an HTML file with an external CSS. I want to create a PDF from the HTML file, but the encoding doesn't work. The HTML file displays fine, but after conversion to PDF some characters are missing (čřě...). It happens even if I set the Charset in the parseXHtml call.
How do I solve this, please?
public void createPDF() {
try {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(username + ID + ".pdf"));
document.open();
String hovinko = username + ID + ".html";
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(hovinko), Charset.forName("UTF-8"));
document.close();
System.out.println("PDF Created!");
} catch (Exception ex) {
ex.printStackTrace();
}
}
Did you try to convert your special characters before writing them to your PDF?
yourHTMLString.replaceAll(oldChar, newChar);
ć = &#263;
ř = &#345;
ě = &#283;
If you need more special characters, visit this link.
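The replacement idea above can be generalized so that you do not need one replaceAll call per character. This is only a sketch in plain Java: the helper name toNumericEntities is made up for illustration, and it assumes the HTML is available as a String before it is handed to the parser.

```java
final class NumericEntityEscaper {

    // Replaces every non-ASCII code point with a decimal numeric character
    // reference (for example U+010D becomes &#269;). ASCII, including the
    // markup itself, is left untouched.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("<p>\u010D\u0159\u011B</p>"));
    }
}
```

After this step the HTML is pure ASCII, so the charset used to read it no longer matters. The font chosen in the document still has to contain the glyphs, otherwise they will be missing in the PDF.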
EDIT: Then try this out, it worked for me:
BaseFont basefont = BaseFont.createFont("C:/Windows/Fonts/ARIAL.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(basefont, 12);
document.add(new Paragraph("čřě", font));
Try the logic below, where hovinko must hold the HTML markup itself rather than a file name. It worked for me:
InputStream is = new ByteArrayInputStream(hovinko.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is, Charset.forName("UTF-8"));
I used xmlworker version 5.5.12 and itextpdf version 5.5.12.
I was struggling with the same problem (Polish special characters).
For me the solution was to set a suitable font-family in the HTML code.
