How do I set an IText7 fallback font?

How do I set an IText7 fallback font? That is, a font that is used in cases where the primary font doesn't have a specific code point.
My original code just used PdfFontFactory.createFont(...) to get a PdfFont, and then I just called setFont on the Text element where I wanted to change font. This worked as in the text was shown with the expected font, but I could not find any way to specify a fallback font.
So I changed my code, so now I start by creating a new FontProvider and then call addFont(...) with all the fonts I need, including the fallback font.
And then I add this font provider to the document.
I then call setFontFamily on the Text objects where I want to change the font, and this works, including fallback to the first added font in cases where the primary font doesn't have the specified code point.
But is this the correct way to handle this issue?
This solution does have one new problem: it always embeds the fonts in the PDF documents, and I can't find a way to prevent this. None of the addFont or setFontFamily methods have a flag to specify whether the font should be embedded.
The FontProvider does have a method called getDefaultEmbeddingFlag() which always returns true, so I tried to create a new class which extended FontProvider and changed getDefaultEmbeddingFlag to just return false. But even when I use this class as the FontProvider, the font still gets embedded.
So I guess my main question is: How do I enable fallback to a different font for unknown code points, while still having the ability not to embed fonts?
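To make the fallback behaviour described above concrete, here is a minimal conceptual sketch in plain Java. It is not iText code and does not claim to mirror how iText's FontProvider selects fonts internally; the class FallbackRuns, its Run type and the IntPredicate coverage checks are hypothetical names introduced only for illustration. It splits a string into runs, assigning each code point to the first font (in priority order) that claims to cover it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical illustration of per-code-point font fallback; not part of iText.
public class FallbackRuns {

    /** A stretch of text that is assigned to a single font family. */
    public static final class Run {
        public final String family;
        public final String text;

        Run(String family, String text) {
            this.family = family;
            this.text = text;
        }
    }

    /**
     * Splits text into runs. The map must iterate in priority order
     * (for example a LinkedHashMap); each predicate answers
     * "does this font have a glyph for the code point?".
     */
    public static List<Run> split(String text, Map<String, IntPredicate> fontsInPriorityOrder) {
        if (fontsInPriorityOrder.isEmpty()) {
            throw new IllegalArgumentException("at least one font is required");
        }
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFamily = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String family = pick(cp, fontsInPriorityOrder);
            // Close the current run whenever the chosen font changes.
            if (!family.equals(currentFamily) && current.length() > 0) {
                runs.add(new Run(currentFamily, current.toString()));
                current.setLength(0);
            }
            currentFamily = family;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFamily, current.toString()));
        }
        return runs;
    }

    // First font that covers the code point; if none does, the highest-priority font.
    private static String pick(int cp, Map<String, IntPredicate> fonts) {
        String first = null;
        for (Map.Entry<String, IntPredicate> e : fonts.entrySet()) {
            if (first == null) {
                first = e.getKey();
            }
            if (e.getValue().test(cp)) {
                return e.getKey();
            }
        }
        return first;
    }
}
```

With a "Primary" font that only covers ASCII and a "Fallback" font that covers everything, a string containing one accented letter in the middle is split into three runs, and only the accented letter is assigned to the fallback.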

Related

Java: using deriveFont does not change font size

I am loading a custom font (from ttf file) into my project, and using deriveFont(float f) to change the size. However, the size is not actually being set (stuck at 1). Here is my code:
public static void main(String[] args) {
    GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
    try {
        Font mont =
            Font.createFont(
                    Font.TRUETYPE_FONT,
                    new File(System.getProperty("user.dir") + "/data/Montserrat-MediumItalic.ttf"))
                .deriveFont(20f);
        ge.registerFont(mont);
        Arrays.stream(ge.getAllFonts())
            .filter(font -> font.getFontName().contains("Mont"))
            .forEach(font -> System.out.println(font.getFontName() + ", Size: " + font.getSize()));
    } catch (FontFormatException | IOException e) {
        e.printStackTrace();
    }
}
output:
Montserrat Medium Italic, Size: 1
note: replacing font.getSize() with font.getSize2D prints 1.0.
New: Using decode:
I am now using this
Font test = Font.decode("Montserrat Medium Italic-ITALIC-20");
(fixed class not loading)
Update 2:
this line:
Font mont = Font.createFont(Font.ITALIC, new File(System.getProperty("user.dir") + "/data/Montserrat-MediumItalic.ttf"));
throws IllegalArgumentException: font format not recognized

However, the size is not actually being set (stuck at 1).
This seems unlikely to be the case. I asked for direct confirmation in a comment on the question ("What mont.getSize() return?" -- oops, what embarrassingly bad grammar), but so far you have not answered. I am reasonably confident that if you check, you will see that mont.getSize() evaluates to the size you requested.
An alternative explanation for your observed behavior is readily available. You are using GraphicsEnvironment.getAllFonts() to report on the registered fonts, but according to its documentation, this method
Returns an array containing a one-point size instance of all fonts
available in this GraphicsEnvironment.
(Emphasis added.)
Another answer and especially comments on it suggest that the Font objects returned by GraphicsEnvironment.getAllFonts() might differ in other ways, too, from corresponding Font instances passed to GraphicsEnvironment.registerFont(). Although such variations are not documented as far as I can see, they are consistent with the intended usage of Font objects obtained from a GE, as the getAllFonts() docs describe:
Typical usage would be to allow a user to select a particular font. Then, the application can size the font and set various font attributes by calling the deriveFont method on the chosen instance.
They go on to say that
If a font in this GraphicsEnvironment has multiple programmable variations, only one instance of that Font is returned in the array, and other variations must be derived by the application.
I'm not positive that "multiple programmable variations" means attributes that can be modified when you derive one Font object from another (for then what font wouldn't have programmable variations?), but it is clear that getAllFonts() is not a mechanism for reading back the exact Font objects previously presented to GraphicsEnvironment.registerFont(). Those objects might not even be retained as such.
On the other hand, you can perhaps be relieved that you are not responsible for registering all the different font variations you may want in advance.
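A minimal sketch of the point above, using only java.awt.Font: deriveFont returns a new Font of the requested size and leaves the instance it was called on untouched, so a one-point font (such as an entry returned by getAllFonts()) can be sized at the point of use. The class name DeriveFontDemo and the helper sized are made up for this illustration, and the logical "Dialog" font stands in for the custom font from the question.

```java
import java.awt.Font;

// Illustrative only: shows that deriveFont produces a sized copy.
public class DeriveFontDemo {

    /** Returns a copy of base at the given point size; base itself is not modified. */
    public static Font sized(Font base, float points) {
        return base.deriveFont(points);
    }

    public static void main(String[] args) {
        // Stand-in for a one-point instance as returned by getAllFonts().
        Font onePoint = new Font(Font.DIALOG, Font.PLAIN, 1);
        Font twenty = sized(onePoint, 20f);
        System.out.println("original: " + onePoint.getSize() + ", derived: " + twenty.getSize());
    }
}
```

In an application, the derived instance is the one to pass to setFont on a component; the registered font only needs to be looked up by name.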

I think I know what the problem is. When you call ge.registerFont(mont), it does exactly that: it registers the underlying font, without the modifications made to the Font object. deriveFont() returns a new Font object with the requested size, but it cannot modify the actual font. When you register a font, it is registered with the size of 1. If you print the size of all the other fonts, you will find that they also have the default value of 1. I do not think that you can register a font with a custom default size, or override the default size of Font.getFont(). When you get a font by using Font.getFont(), it will always have the default size of 12 (from the specification).
If you need to have the font specially formatted, I would suggest creating a static class variable:
Font MontMediumItalic_20;
Then load the font once, either in a resource loader, or the constructor, and apply all the modifications to it.
Alternatively, you can also use Font.decode()
Please let me know if you need any help.

re-embed subset font in pdf with itext 7

I have some input PDFs, all with fully embedded fonts, and I want to "shrink" them by replacing those fonts with subsets. I know there is a way to unembed fonts and embed subset fonts, but the problem is that I don't have the source font files. I just have the fonts embedded in the source PDFs.
Can someone help me troubleshoot this issue?
ENV: java8, itext7.1.5

Here's a thread on a similar question (about embedding, not subsetting, despite the OP's question): How to subset fonts into an existing PDF file. The following statement is relevant:
If you want to subset it, you'd need to parse all the content streams
in the PDF to find out which glyphs are used. That's NOT a trivial
task.
I wouldn't recommend attempting this in iText, unless it's really necessary. It would likely end up buggy unless you have a very complete understanding of the PDF specs. It might be worth pursuing other avenues such as changing the way the PDFs are created, or use something like Distiller that can do this for you.
If you do want to do this in iText, I'm afraid you will likely have to use a PdfCanvasProcessor and some custom operator handlers. You would need to find all text fields, determine which font they use, build a new subset font with the applicable glyphs, and replace the fonts with new subset copies. This is how you would create a copy of the complete font to prepare for subsetting (assuming you don't have copies of the font files):
String encoding = PdfEncodings.WINANSI; // or another encoding if needed for more glyph support
PdfFont completeFont = ...; // get complete font from font dictionary
PdfFont subsetFont = PdfFontFactory.createFont(completeFont.getFontProgram(), encoding, true);
subsetFont.setSubset(true);
When you encounter a Font change operator (Tf), you would need to look up that font in the font dictionary and create a new (or lookup an already created) subset copy of that font to prepare for upcoming text fields. Don't forget to keep the font in a stack so you can pop back to the previous font (look for q and Q operators). And don't forget to check parent forms and page groups for the fonts if they don't exist in the current XObject or page resource dictionary.
When you encounter text (a Tj, TJ, ', or " operator), you would need to decode the text using the complete font, then re-encode it to the new subset font's encoding (unless you know for sure that all your source fonts are ASCII-compatible). Add that text's characters to the subset like this:
subsetFont.addSubsetRange(new int[]{character});
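The bookkeeping described above (track the current font across q/Q, switch on Tf, record characters on text-showing operators) can be sketched independently of iText. The class below is a hypothetical helper, not an iText API: FontUsageTracker and its method names are invented for illustration, and it assumes the caller has already decoded the text to Unicode and resolved the font resource name. In a real implementation these methods would be called from custom operator handlers, and the collected code points would then be fed to the subset font.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: records which code points are shown with which font.
public class FontUsageTracker {
    private static final String NO_FONT = ""; // sentinel: no Tf seen yet

    private final Deque<String> savedFonts = new ArrayDeque<>();
    private final Map<String, SortedSet<Integer>> used = new HashMap<>();
    private String currentFont = NO_FONT;

    /** Call on the q operator: remember the font that is active now. */
    public void saveState() {
        savedFonts.push(currentFont);
    }

    /** Call on the Q operator: return to the font active at the matching q. */
    public void restoreState() {
        if (!savedFonts.isEmpty()) {
            currentFont = savedFonts.pop();
        }
    }

    /** Call on the Tf operator with the font's resource name (for example "F1"). */
    public void setFont(String resourceName) {
        currentFont = resourceName;
    }

    /** Call on Tj, TJ, ' and " with the text already decoded to Unicode. */
    public void showText(String decodedText) {
        if (currentFont.equals(NO_FONT)) {
            return; // text shown before any font was selected is ignored here
        }
        SortedSet<Integer> set = used.computeIfAbsent(currentFont, k -> new TreeSet<>());
        decodedText.codePoints().forEach(set::add);
    }

    /** Code points recorded for a font, in ascending order (empty if none). */
    public SortedSet<Integer> usedCodePoints(String resourceName) {
        return used.getOrDefault(resourceName, new TreeSet<Integer>());
    }
}
```

Note that resource names such as "F1" are only unique within one resource dictionary, so a complete solution would key the map on the font object itself rather than on the name.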

Detect if a PDF is created from a scanned document using OCR [pdfbox]

I would like to know if a PDF was created from a scanned document using OCR.
To make the text from the scanned document selectable, I guess the same text is written using a transparent color, a special font, ...
I'm using pdfbox and I looked at the font, the color, and many other properties and I didn't find anything special.

In my case the text rendering mode was set to "Neither fill nor stroke text".
pdfbox code:
getGraphicsState().getTextState().getRenderingMode() == PDTextState.RENDERING_MODE_NEITHER_FILL_NOR_STROKE_TEXT

In most cases, the original image is still present, and the OCRd text is invisible underneath.
So, one possibility would be finding out whether there is a picture covering all the area with text.
Another possibility would be looking at the fonts and making some smart decisions based on them.

TrueType Collection (.ttc) set as font in Java Swing

I know that Java supports TrueType fonts (.ttf) and that .ttc is an extension of the TrueType format, but I can't find any information on whether Java also supports a TrueType collection (.ttc) being explicitly set as the font on a JLabel, for example.
I made an example, where I successfully load a .ttc file in my application with the following code:
InputStream is = getClass().getResourceAsStream("/resources/simsun.ttc");
Font font = Font.createFont(Font.TRUETYPE_FONT, is);
Font fontBase = font.deriveFont(15f);
field.setFont(fontBase);
The code is working well, there are no exceptions related to the creation, loading or setting of the .ttc file as a font in Swing components.
My question is: Can someone confirm that this works well and that all glyphs from the fonts inside the .ttc are used in components, or are there any disadvantages related to this?
Also, is there any difference between loading the .ttc from a jar on the client machine and having it installed in the system fonts?
I'm using Windows 7.

First of all, the difference between TTC and TTF is: a TTC can (and usually does) contain multiple fonts, but a TTF has only one font defined. The reason to put multiple fonts into one file is to save space by sharing glyphs (or sub-glyphs). For example, in SimSun and NSimSun most of the glyphs are the same, so storing them together can save a lot of space.
Second, Java supports the TTC font format, but by using Font.createFont() you can only get the first font defined in the TTC file. Currently, there is no way to specify the font index. Take a look at sun.font.FontManager.createFont2D(): when they invoke new TrueTypeFont(), the fontIndex is always zero. Shame!
For your question: if all you need is the first font in the TTC file, then everything should be okay. All the glyphs defined for the first font will be available. But if you expect the second or another font defined in that file, then you hit a block. You cannot even get the font name by using this API.
There is no difference between system-loaded fonts and created fonts. However, there is no good way to specify the font index; you may try to hack into FontManager and come up with some platform-specific code.

Splitting a pdf with pdfbox, but losing the font

I wrote some code in Java using the pdfbox API that splits a pdf document into its individual pages, looks through the pages for a specific string, and then makes a new pdf from the page with the string on it. My problem is that when the new page is saved, I lose my font. I just made a quick Word document to test it and the default font was Calibri, so when I run the program I get an error box that reads: "Cannot extract the embedded font..." So it replaces the font with some other default.
I have seen a lot of example code that shows how to change the font when you are inputting text to be placed in the pdf, but nothing that sets the font for the pdf.
If anyone is familiar with a way to do this, (or can find documentation/examples), I would greatly appreciate it!
Edit: forgot to include some sample code
if (pageContent.indexOf(findThis) >= 0){
PDPage pageToRip = pages.get(i);
>>set the font of pageToRip here
res.importPage(pageToRip); //res is the new document that will be saved
}
I don't know if that helps any, but I figured I'd include it.
Also, this is what the change looks like if the pdf is written in calibri and split:
Note: This might be a nonissue, it depends on the font used in the files that will need to be processed. I tried some things besides Calibri and it worked out fine.

From How to extract fonts from a PDF:
You actually cannot extract a font from a PDF, not even if the font is
fully embedded. There are two reasons why this is not feasible:
• Most fonts are copyrighted, making it illegal to use an extractor.
• When a font is embedded in a PDF, not all of the font data are
included. Obviously the font outline data are included as well as the
font width tables. Other information, such as data about ligatures,
are irrelevant within the PDF so those data do not get enclosed in a
PDF. I am not aware of any font extraction tools but if you come
across one, the above reasons should make it clear that these
utilities are to be avoided.
