Encoding for FontFactory.getFont() [duplicate]

This question already has an answer here:
iText : Unable to print mathematical characters like ∈, ∩, ∑, ∫, ∆ √, ∠
(1 answer)
Closed 6 years ago.
Hiyas
I'm trying to display this string:
λλλλλλλλλλλλλλλλλλλλλλλλ
which is read from an RTF file, parsed and put into this variable. It is NOT used as a constant in the code.
Font pdfFont = FontFactory.getFont(font.getFont().getName(), BaseFont.IDENTITY_H, embed, font.getFont().getSize2D(), style);
Phrase phrase = new Phrase("λλλλλλλλλλλλλλλλλλλλλλλλ", pdfFont);
ColumnText.showTextAligned(content[i], alignment, phrase, x, y, rotation);
I also tried CP1252 (and basically every other encoding I could find) together with a plain ArialMT.ttf font, but the string is never displayed. I can see that the conversion to a byte array inside iText (we use 5.5.0) always returns a zero-length byte array, which explains why no text is drawn, but I don't understand why that happens. Which encoding do I need to use to make this text visible in a PDF?
Thanks a lot

I suppose that you want to get a result that looks like this:
That's easy. I first tried the SunCharacter example from the official documentation. That example was written in answer to the question: iText : Unable to print mathematical characters like ∈, ∩, ∑, ∫, ∆ √, ∠
I then changed the TEXT to:
public static final String TEXT = "Always use the Unicode notation for special characters: \u03bb";
As you can see, I don't use λ in my source code (that's bad practice). Instead I use \u03bb which is the Unicode notation of λ.
The result looked like this:
That's not what you want; you want ArialMT. So I changed the FONT to:
public static final String FONT = "c:/windows/fonts/arial.ttf";
This gave me the desired PDF.
This is the full code sample:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

public class LambdaCharacter {
    public static final String DEST = "results/fonts/lambda_character.pdf";
    public static final String FONT = "c:/windows/fonts/arial.ttf";
    public static final String TEXT = "Always use the Unicode notation for special characters: \u03bb";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new LambdaCharacter().createPdf(DEST);
    }

    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();
        // IDENTITY_H addresses glyphs by Unicode; the font is embedded in the PDF
        BaseFont bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font f = new Font(bf, 12);
        Paragraph p = new Paragraph(TEXT, f);
        document.add(p);
        document.close();
    }
}
It works just fine.
Maybe you aren't really using Arial. Maybe font.getFont().getName() doesn't give you the correct name of the font. Or maybe it gives you the correct name of the font, but you forgot to register the font. In that case, you will see that Helvetica is used. Helvetica can't render a lambda. You need Arial or Cardo-Regular or Arial Unicode or another font, as long as that font knows how to render a lambda.
If you don't know how to register a font, read:
How to load custom font in FontFactory.register in iText or
Creating fonts from *.ttf files using iText or
Using Fonts in System with iTextSharp or
Get list of supported fonts in ITextSharp or
Why is my font not applied when I create a PDF document? or... (there are just too many hits when I search for an answer to that question)
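As for the CP1252 attempt mentioned in the question: a single-byte encoding such as CP1252 has no code for λ at all, so no font can help with that encoding. The following is a minimal sketch using only the JDK (not iText); the class and method names are made up for illustration, and iText's own conversion may handle unmappable characters differently from the JDK (the question reports an empty byte array rather than a replacement character).

```java
import java.nio.charset.Charset;

/** Sketch: check whether an encoding can represent a given text at all. */
class EncodingCheck {

    /** Returns true if every character of text has a code in the named charset. */
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String lambda = "\u03bb";

        // CP1252 (windows-1252) has no slot for the Greek small letter lambda
        System.out.println(canEncode(lambda, "windows-1252")); // prints false
        System.out.println(canEncode(lambda, "UTF-8"));        // prints true

        // The JDK substitutes '?' (0x3F) for characters it cannot map
        byte[] bytes = lambda.getBytes(Charset.forName("windows-1252"));
        System.out.println(bytes.length + " byte(s), first = 0x"
                + Integer.toHexString(bytes[0] & 0xFF));
    }
}
```

A check like this only tells you whether the encoding can carry the character; whether the font actually contains a glyph for it is a separate question.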

Related

Problem about font encoding in PDF/A generation

Here is my problem:
I'm currently working on a Java application that archives documents as PDF/A-1. I'm using PDFBox for PDF generation, but I can't produce a valid PDF/A-1 file because of the font. The font is embedded in the PDF file, but this website: https://www.pdf-online.com/osa/validate.aspx tells me that the file is not valid PDF/A because:
The key Encoding has a value Identity-H which is prohibited.
I looked up what this Identity-H encoding is, and it seems to be the way the font is encoded, like the ANSI encoding.
I've already tried different fonts such as Helvetica or Arial Unicode MS, but nothing works; there is always this Identity-H encoding. I'm a bit lost in all this encoding mess, so it would be great if someone could explain it to me. Here is the code I wrote to embed a font in the PDF:
// load the font as this needs to be embedded
PDFont font = PDType0Font.load(doc, getClass().getClassLoader().getResourceAsStream(fontfile), true);
if (!font.isEmbedded())
{
    throw new IllegalStateException("PDF/A compliance requires that all fonts used for"
            + " text rendering in rendering modes other than rendering mode 3 are embedded.");
}
Thanks for your help :)

Problem solved:
I used the Apache example CreatePDFA (I have no clue why that works and my code does not): see examples/src/main/java/org/apache/pdfbox/examples
I added the following to meet the PDF/A-3 requirements:
doc.getDocumentCatalog().setLanguage("en-US");
PDMarkInfo mark = new PDMarkInfo(); // new PDMarkInfo(page.getCOSObject());
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
doc.getDocumentCatalog().setMarkInfo(mark);
doc.getDocumentCatalog().setStructureTreeRoot(treeRoot);
doc.getDocumentCatalog().getMarkInfo().setMarked(true);
PDDocumentInformation info = doc.getDocumentInformation();
info.setCreationDate(date);
info.setModificationDate(date);
info.setAuthor("KairosPDF");
info.setProducer("KairosPDF");
info.setCreator("KairosPDF");
info.setTitle("Generated PDf");
info.setSubject("PDF/A3-A");
Here is my code to embed a file in the PDF:
private final PDDocument doc = new PDDocument();
private final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
private final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
private final Map<String, PDComplexFileSpecification> efMap = new HashMap<>();

public void addFile(PDDocument doc, File child) throws IOException {
    File file = new File(child.getPath());
    Calendar date = Calendar.getInstance();
    //first create the file specification, which holds the embedded file
    PDComplexFileSpecification fs = new PDComplexFileSpecification();
    fs.setFileUnicode(child.getName());
    fs.setFile(child.getName());
    InputStream is = new FileInputStream(file);
    PDEmbeddedFile ef = new PDEmbeddedFile(doc, is);
    //Setting
    ef.setSubtype("application/octet-stream");
    ef.setSize((int) file.length() + 1);
    ef.setCreationDate(date);
    ef.setModDate(date);
    COSDictionary dictionary = fs.getCOSObject();
    dictionary.setItem(COSName.getPDFName("AFRelationship"), COSName.getPDFName("Data"));
    fs.setEmbeddedFile(ef);
    efMap.put(child.getName(), fs);
    efTree.setNames(efMap);
    names.setEmbeddedFiles(efTree);
    doc.getDocumentCatalog().setNames(names);
    is.close();
}
The only remaining problem is this validation error:
File specification 'Test.txt' not associated with an object.
Hope this helps someone.

Adding support to all international currency symbol to iTextPDF in Java [duplicate]

I am using itextpdf-5.3.4.jar to create a PDF. To show the rupee symbol I am using a custom font. I tried both arial.ttf and arialbd.ttf, but with no luck: the rupee symbol is not shown. I followed these links, but they did not work for me:
How to display indian rupee symbol in iText PDF in MVC3. This is the code I have used:
BaseFont rupee =BaseFont.createFont( "assets/arial .ttf", BaseFont.IDENTITY_H,BaseFont.EMBEDDED);
createHeadings(cb,495,60,": " +edt_total.getText().toString(),12,rupee);
private void createHeadings(PdfContentByte cb, float x, float y, String text, int size, BaseFont fb) {
    cb.beginText();
    cb.setFontAndSize(fb, size);
    cb.setTextMatrix(x, y);
    cb.showText(text.trim());
    cb.endText();
}
Please help me guys.

In the comment section, Funkystein wrote that the problem you describe is typical when
you are using a font which doesn't have that glyph, or
you aren't using the right encoding.
I have written an example that illustrates this: RupeeSymbol
public static final String DEST = "results/fonts/rupee.pdf";
public static final String FONT1 = "resources/fonts/PlayfairDisplay-Regular.ttf";
public static final String FONT2 = "resources/fonts/PT_Sans-Web-Regular.ttf";
public static final String FONT3 = "resources/fonts/FreeSans.ttf";
public static final String RUPEE = "The Rupee character \u20B9 and the Rupee symbol \u20A8";

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    // write to the path passed in as parameter
    PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    Font f1 = FontFactory.getFont(FONT1, BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
    Font f2 = FontFactory.getFont(FONT2, BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
    Font f3 = FontFactory.getFont(FONT3, BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
    // same font as f3, but with the wrong encoding
    Font f4 = FontFactory.getFont(FONT3, BaseFont.WINANSI, BaseFont.EMBEDDED, 12);
    document.add(new Paragraph(RUPEE, f1));
    document.add(new Paragraph(RUPEE, f2));
    document.add(new Paragraph(RUPEE, f3));
    document.add(new Paragraph(RUPEE, f4));
    document.close();
}
The RUPEE constant is a String that contains the Rupee character as well as the Rupee symbol: "The Rupee character ₹ and the Rupee symbol ₨".
The characters are stored as Unicode values, because if we store the characters otherwise, they may not be rendered correctly. For instance: if you retrieve the values from a database as Winansi, you will end up with incorrect characters.
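To make the database remark concrete, here is a small JDK-only sketch of what can go wrong. It assumes, purely for illustration, that the stored bytes are UTF-8 and that they are decoded as windows-1252 (the JDK name for the encoding iText calls WinAnsi); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: what happens when UTF-8 bytes are read back with the wrong charset. */
class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes those bytes with wrongCharset. */
    static String decodeUtf8BytesAs(String text, String wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(wrongCharset));
    }

    public static void main(String[] args) {
        String rupee = "\u20B9"; // the Rupee character, 3 bytes in UTF-8: E2 82 B9
        String garbled = decodeUtf8BytesAs(rupee, "windows-1252");

        // Each byte became a separate, unrelated character
        System.out.println(garbled.length()); // prints 3
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Once the String holds the three wrong characters, no font or PDF encoding can bring the Rupee character back; the decoding has to be fixed at the point where the data is read.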
I test three different fonts (PlayfairDisplay-Regular.ttf, PT_Sans-Web-Regular.ttf and FreeSans.ttf) and I use IDENTITY_H as encoding three times. I also use WINANSI a fourth time to show that it goes wrong if you do.
The result is a file named rupee.pdf:
As you can see, the first two fonts know how to draw the Rupee character. The third one doesn't. The first two fonts don't know how to draw the Rupee symbol. The third one does. However, if you use the wrong encoding, none of the fonts draw the correct character or symbol.
In short: you need to find a font that knows how to draw the characters or symbols you need, then you have to make sure that you are using the correct encoding (for the String as well as the Font).
You can download the full sample code here.

Render Type3 font character as image using PDFBox

In my project, I need to parse a PDF file that contains some characters rendered with Type3 fonts. What I need to do is render such characters into a BufferedImage for further processing.
I'm not sure I'm looking in the right direction, but I'm trying to get the PDType3CharProc for such characters:
PDType3Font font = (PDType3Font)textPosition.getFont();
PDType3CharProc charProc = font.getCharProc(textPosition.getCharacterCodes()[0]);
The input stream of this procedure contains the following data:
54 0 1 -1 50 43 d1
q
49 0 0 44 1.1 -1.1 cm
BI
/W 49
/H 44
/BPC 1
/IM true
ID
<some binary data here>
EI
Q
Unfortunately, I have no idea how to use this data to render the character into an image using PDFBox (or any other Java library).
Am I looking in the right direction, and what can I do with this data?
If not, are there other tools that can solve this problem?

Unfortunately PDFBox out-of-the-box does not provide a class to render contents of arbitrary XObjects (like the type 3 font char procs), at least as far as I can see.
But it does provide a class for rendering complete PDF pages; thus, to render a given type 3 font glyph, one can simply create a page containing only that glyph and render this temporary page!
Assuming, for example, the type 3 font is defined on the first page of a PDDocument document and has name F1, all its char procs can be rendered like this:
PDPage page = document.getPage(0);
PDResources pageResources = page.getResources();
COSName f1Name = COSName.getPDFName("F1");
PDType3Font fontF1 = (PDType3Font) pageResources.getFont(f1Name);
Map<String, Integer> f1NameToCode = fontF1.getEncoding().getNameToCodeMap();

COSDictionary charProcsDictionary = fontF1.getCharProcs();
for (COSName key : charProcsDictionary.keySet())
{
    COSStream stream = (COSStream) charProcsDictionary.getDictionaryObject(key);
    PDType3CharProc charProc = new PDType3CharProc(fontF1, stream);
    PDRectangle bbox = charProc.getGlyphBBox();
    if (bbox == null)
        bbox = charProc.getBBox();
    Integer code = f1NameToCode.get(key.getName());

    if (code != null)
    {
        // temporary one-page document that shows only this glyph
        PDDocument charDocument = new PDDocument();
        PDPage charPage = new PDPage(bbox);
        charDocument.addPage(charPage);
        charPage.setResources(pageResources);
        PDPageContentStream charContentStream = new PDPageContentStream(charDocument, charPage);
        charContentStream.beginText();
        charContentStream.setFont(fontF1, bbox.getHeight());
        // %02X zero-pads the code so the hex string always has two digits
        charContentStream.getOutput().write(String.format("<%02X> Tj\n", code).getBytes());
        charContentStream.endText();
        charContentStream.close();

        File result = new File(RESULT_FOLDER, String.format("4700198773-%s-%s.png", key.getName(), code));
        PDFRenderer renderer = new PDFRenderer(charDocument);
        BufferedImage image = renderer.renderImageWithDPI(0, 96);
        ImageIO.write(image, "PNG", result);
        charDocument.close();
    }
}
(RenderType3Character.java test method testRender4700198773)
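One detail of the loop above deserves care: the hex string written in front of Tj needs two hex digits per byte. A format such as %2X pads with a space instead of a zero, and because whitespace inside a PDF hex string is ignored and a missing final digit is assumed to be 0, `< 5>` would be read as `<50>`. The following JDK-only sketch shows the difference; the helper name showGlyph is made up for illustration.

```java
/** Sketch: building the show-text operator for a single-byte character code. */
class HexOperand {

    /** Returns "<XX> Tj\n" with the code written as exactly two hex digits. */
    static String showGlyph(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("single-byte code expected: " + code);
        }
        return String.format("<%02X> Tj\n", code);
    }

    public static void main(String[] args) {
        System.out.print(showGlyph(5));                  // prints <05> Tj
        System.out.print(showGlyph(0x41));               // prints <41> Tj
        // Space padding yields a different string for codes below 0x10
        System.out.print(String.format("<%2X> Tj\n", 5)); // prints < 5> Tj
    }
}
```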
Considering the textPosition variable in the OP's code, he quite likely attempts this from a text extraction use case. Thus, he'll have to either pre-generate the bitmaps as above and simply look them up by name or adapt the code to match the available information in his use case (e.g. he might not have the original page at hand, only the font object; in that case he cannot copy the resources of the original page but instead may create a new resources object and add the font object to it).
Unfortunately the OP did not provide a sample PDF. Thus, for my test I used one from another Stack Overflow question: 4700198773.pdf from "extract text with custom font result non readble". There obviously might remain issues with the OP's own files.

I stumbled upon the same issue and I was able to render Type3 font by modifying PDFRenderer and the underlying PageDrawer:
class Type3PDFRenderer extends PDFRenderer
{
    private PDFont font;

    public Type3PDFRenderer(PDDocument document, PDFont font)
    {
        super(document);
        this.font = font;
    }

    @Override
    protected PageDrawer createPageDrawer(PageDrawerParameters parameters) throws IOException
    {
        FontType3PageDrawer pd = new FontType3PageDrawer(parameters, this.font);
        pd.setAnnotationFilter(super.getAnnotationsFilter());//as done in the super class
        return pd;
    }
}

class FontType3PageDrawer extends PageDrawer
{
    private PDFont font;

    public FontType3PageDrawer(PageDrawerParameters parameters, PDFont font) throws IOException
    {
        super(parameters);
        this.font = font;
    }

    @Override
    public PDGraphicsState getGraphicsState()
    {
        // always report the supplied font as the current text font
        PDGraphicsState gs = super.getGraphicsState();
        gs.getTextState().setFont(this.font);
        return gs;
    }
}
Simply use Type3PDFRenderer instead of PDFRenderer. Of course, if you have multiple fonts, this needs some more modification to handle them.
Edit: tested with pdfbox 2.0.9

Why is the Gujarati-Indian text not rendered correctly using Arial Unicode MS?

This is a follow-up to the question How to export fonts in Gujarati-Indian Language to pdf?. @amedee-van-gasse, QA Engineer at iText, asked me to post a question specific to iText with a relevant MCVE.
Why is this sequence of Unicode characters \u0ab9\u0abf\u0aaa\u0acd\u0ab8 not rendered correctly?
It should be rendered like this:
હિપ્સ , also tested with unicode-converter
However, this code (example adapted from iText: Chapter 11: Choosing the right font)
public class FontTest {
    /** The resulting PDF file. */
    public static final String RESULT = "fontTest.pdf";
    /** the text to render. */
    public static final String TEST = "\u0ab9\u0abf\u0aaa\u0acd\u0ab8";

    public void createPdf(String filename) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
        document.open();
        BaseFont bf = BaseFont.createFont(
                "ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf, 20);
        ColumnText column = new ColumnText(writer.getDirectContent());
        column.setSimpleColumn(36, 730, 569, 36);
        column.addElement(new Paragraph(TEST, font));
        column.go();
        document.close();
        System.out.println("DONE");
    }

    public static void main(String[] args) throws IOException, DocumentException {
        new FontTest().createPdf(RESULT);
    }
}
Generates this result:
That looks different from
હિપ્સ
I have tested with itextpdf-5.5.4.jar, itextpdf-5.5.9.jar and also itext-2.1.7.js3.jar (distributed with jasper-reports).
The font used is ARIALUNI.TTF, the one distributed with MS Office; it can be downloaded from here: Arial Unicode MS. (There may be legal issues with downloading it; see Mike 'Pomax' Kamermans' comment.)

Neither iText5 nor iText2 (which is a very outdated version, by the way) supports rendering of Indic scripts, no matter which font you select.
Rendering Indic scripts is not like rendering Latin scripts: a long series of additional steps has to be performed to get the correct result, e.g. some characters first need to be reordered according to the rules of the language.
This is a known issue at iText.
There is a stub implementation for Gujarati in iText5 called GujaratiLigaturizer, but the implementation is really poor and you cannot expect to get correct results with it.
You can try to process your string with this ligaturizer and then output the resulting string in the following way:
IndicLigaturizer g = new GujaratiLigaturizer();
String processed = g.process(inputString);
// proceed with the processed string
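To give an idea of the kind of reordering meant above, here is a deliberately simplistic JDK-only sketch. It handles just one rule: the Gujarati vowel sign I (U+0ABF) is stored after its consonant but drawn in front of it. The class and method names are made up, and this is not how iText or any real shaping engine works; a real shaper moves the sign in front of the whole consonant cluster and also forms conjuncts such as the one encoded by \u0aaa\u0acd\u0ab8.

```java
/** Toy illustration of one Indic reordering rule; not a real text shaper. */
class PreBaseMatraDemo {

    static final char GUJARATI_VOWEL_SIGN_I = '\u0ABF';

    /** Moves each U+0ABF in front of the single character that precedes it. */
    static String toVisualOrder(String logical) {
        StringBuilder sb = new StringBuilder(logical);
        for (int i = 1; i < sb.length(); i++) {
            if (sb.charAt(i) == GUJARATI_VOWEL_SIGN_I) {
                // swap the vowel sign with the preceding character
                sb.setCharAt(i, sb.charAt(i - 1));
                sb.setCharAt(i - 1, GUJARATI_VOWEL_SIGN_I);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String logical = "\u0ab9\u0abf\u0aaa\u0acd\u0ab8";
        String visual = toVisualOrder(logical);
        // Print the code points of the reordered string
        for (char c : visual.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

A font that contains all the Gujarati glyphs still produces the wrong picture if nothing performs steps like this before the glyphs are written to the PDF.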

Build your application using the latest typography jar file; that will solve your problem of Gujarati font rendering in PDF with iText.

iText monospaced fonts in html view with bold, italic from css style

I am using HTML to generate a PDF document; since it's an EMR record, I have to use monospaced fonts.
The PDF is generated fine, but the CSS styles for bold and italic are ignored: I am using a single .otf file for the font, so there is no bold or italic variant.
I was wondering how to enable them. Below are the code snippets.
Font Factory:
public static class MyFontFactory implements FontProvider, Serializable {
    public Font getFont(String fontname,
            String encoding, boolean embedded, float size,
            int style, BaseColor color) {
        BaseFont bf3 = null;
        try {
            bf3 = BaseFont.createFont("Inconsolata.otf", BaseFont.CP1252, BaseFont.EMBEDDED);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return new Font(bf3, 6);
    }

    public boolean isRegistered(String fontname) {
        return false;
    }
}
PDF Generation Code:
public void createPdf(Object object) throws Exception, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter.getInstance(document, new FileOutputStream(new File("test.pdf")));
    // step 3
    document.open();
    // create extra properties
    HashMap<String,Object> map = new HashMap<String, Object>();
    map.put(HTMLWorker.FONT_PROVIDER, new MyFontFactory());
    // step 4
    String snippet;
    // create the snippet
    snippet = createHtmlSnippet(object);
    Map<Object,Object> model = new HashMap<Object,Object>();
    model.put("object", object);
    StyleSheet css = new StyleSheet();
    Map<String, String> stylemap = new HashMap<String, String>();
    stylemap.put("font-style", "italic");
    stylemap.put("font-size", "small");
    stylemap.put("font-weight", "bold");
    css.loadStyle("header", (HashMap<String, String>) stylemap);
    css.loadStyle("strongClass", "text-decoration", "underline");
    List<Element> objects = HTMLWorker.parseToList(new StringReader(snippet), css, map);
    for (Element element : objects)
        document.add(element);
    // step 5
    document.close();
}
In the above code, the CSS supplied does not have any effect on the output. As I mentioned, this is due to the single font being defined. How can I get bold and italics?
I'd really appreciate any pointers or help.
Thanks.
Note: If I remove the monospaced font, the CSS gets applied.

You are confusing a font family with a font.
Inconsolata is a font family consisting of different fonts:
Inconsolata regular as defined in inconsolata.ttf
Inconsolata bold as defined in inconsolata-Bold.ttf
See http://code.google.com/p/googlefontdirectory/source/browse/ofl/inconsolata/
At first I didn't know of any bold, italic or bold-italic version, and I assumed that there was no bold or italic for Inconsolata. If there is no font program for a given style, you shouldn't expect iText to support that style (*).
Then I found a repository with a TTF for the bold font: http://code.google.com/p/googlefontdirectory/source/browse/ofl/inconsolata/
Searching Stack Overflow, I also found the question about Inconsolata Italic in MacVim; unfortunately, those fonts can't be used in iText.
(*) When a font doesn't support bold or italic, iText can mimic these styles by changing the render mode and/or the skew. However, you'll have better results by choosing another monospaced font.
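The "skew" part of that footnote is plain geometry: a mimicked italic slants the upright glyphs with a horizontal shear. The sketch below uses only the JDK and does not call iText; the class name, the method name and the 12 degree angle are assumptions chosen for illustration, not values taken from iText.

```java
/** Sketch: the horizontal shear behind a synthetic (mimicked) italic. */
class SyntheticItalic {

    /** Applies x' = x + tan(angle) * y, y' = y to a point given in text space. */
    static double[] skew(double x, double y, double angleDegrees) {
        double shear = Math.tan(Math.toRadians(angleDegrees));
        return new double[] { x + shear * y, y };
    }

    public static void main(String[] args) {
        // A point on the baseline does not move; a point 10 units up shifts right
        double[] base = skew(0, 0, 12);
        double[] top = skew(0, 10, 12);
        System.out.printf("baseline: (%.4f, %.4f)%n", base[0], base[1]);
        System.out.printf("top:      (%.4f, %.4f)%n", top[0], top[1]);
    }
}
```

Because every glyph is slanted by the same formula, the result looks mechanical compared to a real italic font program, which is why a font family that ships its own italic and bold files gives better output.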
