Unicode character Issue - openhtmltopdf library - java

I use the openhtmltopdf library (version 0.0.1-RC15) and have a problem with Unicode characters: in the PDF file I see "#" symbols instead of "ă" and "ș".
How can I fix it? Thank you.

In case it helps, we're using com.openhtmltopdf:openhtmltopdf-pdfbox:1.0.10 and have found a way that works:
new PdfRendererBuilder()
.useFastMode()
.useFont(new File(main.class.getClassLoader().getResource("arial-unicode-ms.ttf").getFile()), "Arial Unicode MS")
.withW3cDocument(new W3CDom().fromJsoup(Jsoup.parse(html)), null)
.toStream(os)
.run();
In the html we have a <style> element that declares some css for the page, in particular:
font-family: "Arial Unicode MS";
Unicode characters then display as they should.
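One caveat with the getResource(...).getFile() pattern above: it only yields a usable path while the font is a plain file on disk. Once the application is packaged as a JAR, the resource is no longer a real file. A minimal sketch of one workaround, assuming the font is read as a classpath stream and copied to a temporary file first (FontResources and copyToTempFile are hypothetical names, not part of openhtmltopdf):

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of openhtmltopdf.
final class FontResources {

    private FontResources() {}

    /** Copies the stream into a temp file that is deleted when the JVM exits. */
    static File copyToTempFile(InputStream in, String suffix) {
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new IllegalArgumentException("font resource not found on the classpath");
        }
        try (InputStream source = in) {
            Path target = Files.createTempFile("font-", suffix);
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().deleteOnExit();
            return target.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes for the demo; a real caller would pass something like
        // FontResources.class.getResourceAsStream("/arial-unicode-ms.ttf").
        File font = copyToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), ".ttf");
        System.out.println(font.getName().endsWith(".ttf") + " " + font.length());
    }
}
```

The returned File can then be handed to useFont(file, "Arial Unicode MS") exactly as in the answer's builder chain.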

I am using the openhtmltopdf library and had the same character problem: Turkish characters were rendered as "#" (İÖĞ -> ###).
As others have already said, it is a font problem.
I downloaded a TTF file into resource/fonts/ and registered it with the useFont method like this:
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFont(new File(Main.class.getClassLoader().getResource("fonts/ARIAL.TTF").getFile()), "Arial");


UTF-8 emoji problem in PDF for Spring Boot

I am using Spring Boot to create and return a PDF. When my string content contains emoji and Unicode characters, like "This is d£escript😭ion section😢😤😠😡🤬", they are skipped in the downloaded PDF. Can someone please help me resolve this issue?
My code is below:
ITextRenderer renderer = new ITextRenderer();
ResourceLoaderUserAgent callback = new ResourceLoaderUserAgent(renderer.getOutputDevice());
callback.setSharedContext(renderer.getSharedContext());
renderer.getSharedContext().setUserAgentCallback(callback);
renderer.setDocumentFromString(pdfContent(templateId, pdfData));
renderer.layout();
renderer.createPDF(outputStream);
}
String pdfContent(TemplateId templateId, Map<String, Object> pdfData) throws TemplateException, IOException {
return FreeMarkerTemplateUtils
.processTemplateIntoString(freemarkerMailConfiguration.getTemplate(templateId.getValue()), pdfData);
}
The problem is that the font you use doesn't contain emojis, so they can't be rendered in the PDF. Unfortunately, I could not find a font that covers all emojis. The best I could find is DejaVu, which covers some of the emojis in your example.
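To check a font's coverage, it helps to know exactly which code points the text contains. The following is a small sketch using only the JDK (CodePointLister is a hypothetical name); it lists every non-ASCII code point so each one can be looked up in the font's character table:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; uses only the standard library.
final class CodePointLister {

    private CodePointLister() {}

    /** Returns every code point above U+007F, formatted like "U+1F62D". */
    static List<String> nonAscii(String text) {
        List<String> result = new ArrayList<>();
        // codePoints() joins surrogate pairs, so each emoji is reported once.
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    public static void main(String[] args) {
        // The pound sign and the crying-face emoji from the question,
        // written as escapes so the source file encoding does not matter.
        System.out.println(nonAscii("d\u00A3escript\uD83D\uDE2Dion"));
        // prints [U+00A3, U+1F62D]
    }
}
```

Emoji sit above U+FFFF, so each one occupies two Java chars; iterating by code point rather than by char keeps them intact.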
To use DejaVu:
Download the DejaVu font (you will find it easily on the internet).
Include it in the rendering process (make sure you match the exact path of the file):
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("font/dejavu-sans/DejaVuSans.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Set the font in the HTML:
<html>
<head>
<meta charset="utf-8" />
<style>
body{font-family:"DejaVu Sans", sans-serif;}
</style>
</head>
<body>
<p>This is descript😭ion section😢😤😠😡🤬.</p>
</body>
</html>
Here is the result in the PDF:
Emoji are problematic as font symbols. If we display them with one font in two styles (upper left), the symbols do not match well even within that single font: in the upper style one is missing, and in the lower style two look identical.
Converted to PDF (upper middle), they look reasonable as a graphic image on the surface. However, when the text is extracted (upper right), the font styling is lost, because only one glyph is possible for each valid font character.
The lower row, on the left, shows the same text in a modern Notepad, where the same system font applies the other style. If we extract those, we get
😭😢😤😠😡🤬 as
Thus a font's emoji symbols and their styles are generally not well supported by a font system. Going via HTML is much more consistent, but then the text is no longer text.
The best we might get is a poor hybrid of images and undefined CID characters, which can be confusing because the characters all look the same.
������
������
So if you export the PDF as symbols with an image overlay, there is no visual equivalence.

How to display Chinese characters in java web applications?

I use iText 5 to create a PDF file. I followed https://developers.itextpdf.com/examples/itext-action-second-edition/chapter-1 and got a PDF in which Chinese characters display normally.
But when I generate the PDF in a web application, as described in https://developers.itextpdf.com/examples/itext-action-second-edition/chapter-9, the Chinese characters are blank when the PDF is shown in the browser.
My font code is:
String chFontPath = "c:\\fonts\\xxx.ttf"; // backslashes must be escaped in a Java string literal
BaseFont chBaseFont = BaseFont.createFont(chFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(chBaseFont, 12);
Does anybody know?
If you embed the font using an absolute path, the path will probably break in any web app you deploy. Use a relative path instead for any embedded resource (fonts, images, etc.) so you can place them on the server without trouble.
I think Bruno's answer about a relative anchor could help you to set up a relative path for your font: https://stackoverflow.com/a/27064142/4048864
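As a rough sketch of the relative-path idea, the font can be resolved against a base directory chosen at deployment time and checked before it is handed to iText. FontLocator and resolveFont are hypothetical names, and where the base directory comes from is an assumption (in a servlet container it might be derived from the web application's own directory):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of iText.
final class FontLocator {

    private FontLocator() {}

    /** Resolves relativePath under baseDir and fails fast if the font is unusable. */
    static Path resolveFont(Path baseDir, String relativePath) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path font = base.resolve(relativePath).normalize();
        if (!font.startsWith(base)) {
            throw new IllegalArgumentException("path escapes the base directory: " + relativePath);
        }
        if (!Files.isReadable(font)) {
            throw new IllegalStateException("font not readable: " + font);
        }
        return font;
    }

    public static void main(String[] args) throws IOException {
        // Demo setup: a temporary directory stands in for the deployed webapp.
        Path base = Files.createTempDirectory("webapp");
        Files.createDirectories(base.resolve("fonts"));
        Files.createFile(base.resolve("fonts").resolve("xxx.ttf"));
        System.out.println(resolveFont(base, "fonts/xxx.ttf"));
    }
}
```

The resulting path can be passed as a string to BaseFont.createFont(font.toString(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED). Failing fast with a clear message makes a missing font easier to spot than blank characters in the PDF.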

JavaFX HTML formatted text in PDF using iText with formatting

Is it possible to set formatted HTML text (color, alignment, ...) from an HTMLEditor into an "editable" PDF using iText?
I didn't find anything on the internet.
Thanks.
The easiest way of doing this is (as Amedee suggested) using pdfHTML.
It's an iText7 add-on that converts HTML5 (+CSS3) into pdf syntax.
The code is pretty straightforward:
HtmlConverter.convertToPdf(
"<b>This text should be written in bold.</b>", // html to be converted
new PdfWriter(
new File("C://users/user2002/output.pdf") // destination file
)
);
To learn more, go to https://itextpdf.com/itext7/pdfHTML
I found a solution in another post, using Flying Saucer.

How to set a label text to a special character?

I need to set a label to have some special characters, I'm trying:
Label label = new Label();
label.setText("•");
label.setText("♦");
label.setText("★");
I'm not seeing the characters rendered though (firefox 17). The output html looks like this:
<div class="gwt-Label"></div>
Is there a different way we need to set the text to those characters?
I do not think it is possible with the Label widget. You should use the HTML class, which extends Label.
SafeHtmlBuilder builder = new SafeHtmlBuilder();
builder.appendEscaped("★");
HTML widget = new HTML();
widget.setHTML(builder.toSafeHtml());
RootPanel.get().add(widget);
Also, using the SafeHtmlBuilder class is best practice.
You should make sure your source code uses UTF-8 encoding. If you are using Eclipse, you can set the default encoding in Window -> Preferences.
The short version of this is: if you do absolutely everything in your GWT project and deployment using UTF-8 encoding, then all your special characters should work as expected.
If you have lots of existing files to convert, the JDK contains a little tool to convert your files for you.
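If converting the files is not practical, one fallback that does not depend on the source encoding at all is to write non-ASCII characters as `\uXXXX` escapes in the Java source. A minimal sketch that produces such escapes (UnicodeEscaper is a hypothetical name, and this is not the JDK tool mentioned above):

```java
// Hypothetical helper; turns non-ASCII chars into Java unicode escapes.
final class UnicodeEscaper {

    private UnicodeEscaper() {}

    /** Replaces every char above 0x7F with a backslash-u escape of four hex digits. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Works per UTF-16 char, so an emoji becomes two escapes.
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The argument is a black star followed by plain ASCII text.
        System.out.println(escape("\u2605 star"));
    }
}
```

The output can be pasted into a string literal, for example as the argument of label.setText, and compiles the same way whatever encoding the file is saved in.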
You can use HTML codes: &#9733; for star (★), &#8226; for bullet (•) and so on.
http://www.quackit.com/html/html_special_characters.cfm
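For text that is not known in advance, the numeric codes can be generated rather than looked up. Here is a minimal plain-Java sketch (HtmlEntities is a hypothetical name; it deliberately leaves markup characters such as < and & alone, so it is not a substitute for proper HTML escaping):

```java
// Hypothetical helper; converts non-ASCII code points to numeric HTML entities.
final class HtmlEntities {

    private HtmlEntities() {}

    /** Replaces every code point above U+007F with a decimal entity such as &#9733;. */
    static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Black star (U+2605) and bullet (U+2022), written as escapes.
        System.out.println(toNumericEntities("\u2605 and \u2022"));
        // prints &#9733; and &#8226;
    }
}
```

Note that entities are only interpreted when the string is set as HTML (for example through the HTML widget); Label.setText shows them literally.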

HTML content to pdf in JAVA [duplicate]

Does anyone know if it is possible to convert a HTML page (url) to a PDF using iText?
If the answer is 'no' then that is OK as well, since I will stop wasting my time trying to work it out and just spend some money on one of a number of components which I know can :)
I think this is exactly what you were looking for
http://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
http://code.google.com/p/flying-saucer
Flying Saucer's primary purpose is to render spec-compliant XHTML and CSS 2.1 to the screen as a Swing component. Though it was originally intended for embedding markup into desktop applications (things like the iTunes Music Store), Flying Saucer has been extended to work with iText as well. This makes it very easy to render XHTML to PDFs, as well as to images and to the screen. Flying Saucer requires Java 1.4 or higher.
I have ended up using ABCPdf from webSupergoo.
It works really well and for about $350 it has saved me hours and hours based on your comments above.
The easiest way of doing this is using pdfHTML.
It's an iText7 add-on that converts HTML5 (+CSS3) into pdf syntax.
The code is pretty straightforward:
HtmlConverter.convertToPdf(
"<b>This text should be written in bold.</b>", // html to be converted
new PdfWriter(
new File("C://users/mark/documents/output.pdf") // destination file
)
);
To learn more, go to http://itextpdf.com/itext7/pdfHTML
The answer to your question is actually two-fold. First of all you need to specify what you intend to do with the rendered HTML: save it to a new PDF file, or use it within another rendering context (i.e. add it to some other document you are generating).
The former is relatively easily accomplished using the Flying Saucer framework, which can be found here: https://github.com/flyingsaucerproject/flyingsaucer
The latter is actually a much more comprehensive problem that needs to be categorized further.
Using iText you won't be able to (trivially, at least) combine iText elements (i.e. Paragraph, Phrase, Chunk and so on) with the generated HTML. You can hack your way out of this by using PdfContentByte's addTemplate method and rendering the HTML into this template.
If you on the other hand want to stamp the generated HTML with something like watermarks, dates or the like, you can do this using iText.
So bottom line: You can't trivially integrate the rendered HTML in other pdf generating contexts, but you can render HTML directly to a blank PDF document.
Use the iText library.
Here is the sample code. It is working perfectly fine:
String htmlFilePath = filePath + ".html";
String pdfFilePath = filePath + ".pdf";
// create an html file on given file path
Writer unicodeFileWriter = new OutputStreamWriter(new FileOutputStream(htmlFilePath), "UTF-8");
unicodeFileWriter.write(document.toString());
unicodeFileWriter.close();
ConverterProperties properties = new ConverterProperties();
properties.setCharset("UTF-8");
if (url.contains(".kr") || url.contains(".tw") || url.contains(".cn") || url.contains(".jp")) {
properties.setFontProvider(new DefaultFontProvider(false, false, true));
}
// convert the html file to pdf file.
HtmlConverter.convertToPdf(new File(htmlFilePath), new File(pdfFilePath), properties);
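The snippet above decides whether to register system fonts by looking at the URL's country suffix. As an alternative sketch, and purely an assumption about what might fit better, the decision could be based on the page content itself by checking for CJK scripts with the JDK's Character.UnicodeScript (CjkDetector is a hypothetical name):

```java
// Hypothetical helper; detects Chinese, Japanese or Korean text by Unicode script.
final class CjkDetector {

    private CjkDetector() {}

    /** Returns true if any code point belongs to Han, Hiragana, Katakana or Hangul. */
    static boolean containsCjk(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HAN
                    || script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA
                    || script == Character.UnicodeScript.HANGUL;
        });
    }

    public static void main(String[] args) {
        System.out.println(containsCjk("PDF \u4E2D\u6587")); // two Han characters
        System.out.println(containsCjk("plain text"));
    }
}
```

With this, the condition could read if (CjkDetector.containsCjk(document.toString())) instead of the chain of url.contains(...) checks, which would also catch CJK pages hosted on other domains.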
Maven dependencies
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext7-core</artifactId>
<version>7.1.6</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>html2pdf</artifactId>
<version>2.1.3</version>
</dependency>
Use iText's HTMLWorker
Example
When I needed HTML to PDF conversion earlier this year, I tried the trial of the Winnovative HTML to PDF converter (I think ExpertPDF is the same product, too). It worked great, so we bought a license from that company. I didn't look into it in much depth after that.
