Java: Check if otf font file contains small caps glyphs - java

Is there a way to check if otf font file contains glyphs for small caps variation?
Is there a way to do it in java?
Thanks!

There is, just don't use standard-library-Java for it. Use either a Java OpenType font analyser, or do the more sensible thing and consult Freetype2 or Harfbuzz for that information.
Really, anything that lets you check OpenType features will do: check whether the font encodes for the smcp feature - if it does, it support proper smallcaps as per the OpenType spec. If not, it doesn't, and whatever text engine you're using is going to fake it.

Related

re-embed subset font in pdf with itext 7

I have some input PDF all with full set fonts, I want to "shrink" them all creating fonts subset. I know there is the way to unembed fonts and embed subset font, but the problem is that i don't have the source file of fonts. I just have fonts embedded in source PDF.
Someone can help me to troubleshoot this issue ?
ENV: java8, itext7.1.5
Here's a thread on a similar question (about embedding, not subsetting, despite the OP's question): How to subset fonts into an existing PDF file. The following statement is relevant:
If you want to subset it, you'd need to parse all the content streams
in the PDF to find out which glyphs are used. That's NOT a trivial
task.
I wouldn't recommend attempting this in iText, unless it's really necessary. It would likely end up buggy unless you have a very complete understanding of the PDF specs. It might be worth pursuing other avenues such as changing the way the PDFs are created, or use something like Distiller that can do this for you.
If you do want to do this in iText, I'm afraid you will likely have to use a PdfCanvasProcessor and some custom operator handlers. You would need to find all text fields, determine which font they use, build a new subset font with the applicable glyphs, and replace the fonts with new subset copies. This is how you would create a copy of the complete font to prepare for subsetting (assuming you don't have copies of the font files):
String encoding = PdfEncodings.WINANSI; // or another encoding if needed for more glyph support
PdfFont completeFont = ...; // get complete font from font dictionary
PdfFont subsetFont = PdfFontFactory.createFont(completeFont.getFontProgram(), encoding, true);
subsetFont.setSubset(true);
When you encounter a Font change operator (Tf), you would need to look up that font in the font dictionary and create a new (or lookup an already created) subset copy of that font to prepare for upcoming text fields. Don't forget to keep the font in a stack so you can pop back to the previous font (look for q and Q operators). And don't forget to check parent forms and page groups for the fonts if they don't exist in the current XObject or page resource dictionary.
When you encounter text (a Tj, TJ, ', or " operator), you would need to decode the text using the complete font, then re-encode it to the new subset font's encoding (unless you know for sure that all your source fonts are ASCII-compatible). Add that text's characters to the subset like this:
subsetFont.addSubsetRange(new int[]{character});

Rendering a string in Java which is in different languages english, chinese and Indic

I need to render the below String on a JTable cell. How do we do this?
testä漢字1ગુજરાતી2
Update:
Looks like my question is not clear. The above string is a name of file. Need to display this and many other file names in JTable. Data comes dynamically. I may need a custom renderer to display this string exactly same. Currently, this displays junk characters. Simply changing the table Font from Calibri to MS Gothic, I can see the chinese characters, but not indic letters. But, as the data comes dynamically, we won't be knowing what font to use.
So, want to know if there is way so that I can check the string programmatically and render the string with different fonts as appropriate.
The simplest solution would be to use a font which is more complete. I believe the DejaVu family is decent, others may be able to suggest better. The Oracle manual suggests that the logical fonts Dialog, DialogInput, Monospaced, Serif and SansSerif are also likely to be more complete. Potentially they could map to multiple underlying fonts depending on the specific characters which need to be rendered. Oracle also mentions the Lucidia family which is distributed with Oracle's JRE as another possibility which is fairly complete, though it doesn't have Chinese, Korean or Japanese characters.
A more convoluted solution would be to run Character.UnicodeBlock.of(c) on the characters in each string, assemble a Set<Character.UnicodeBlock> and guess which font is most appropriate based on the blocks present in the string, or even write a custom renderer to render each character (or sequence of characters) with a font appropriate to the Unicode block they belong to. Unicode blocks tend to be categorized according to the script they contain.

TrueType Collection (.ttc) set as font in Java Swing

I know that Java supports TrueType Fonts (.ttf) and that .ttc is extension of TrueType format, but i can't find information that Java also supports the TrueType collection (.ttc) to be explicitly set as font on JLabel for example.
I made an example, where I successfully load a .ttc file in my application with the following code:
InputStream is = getClass().getResourceAsStream("/resources/simsun.ttc");
Font font = Font.createFont(Font.TRUETYPE_FONT, is);
Font fontBase = font.deriveFont(15f);
field.setFont(fontBase);
The code is working well, there are no exceptions related to the creation, loading or setting of the .ttc file as a font in Swing components.
My question is: Can someone confirm this to be working well and that all glyphs from the fonts inside the .ttc are used in components, or there are any disadvantages related to this?
Also, is there any difference if the .ttc is loaded from jar on client machine or it has to be installed in system fonts?
I'm using Windows 7.
First of all, the difference between TTC and TTF is: TTC can (and usually) contain multiple fonts, but TTF only have font defined. The reason to put multiple font into one file is to save space by share glyphs (or sub glyphs). For example, in SimSun and NSimSun, most of glyphs are same, save them together can save lots of space.
Second, Java support TTC font format, but by using Font.createFont() you can only get the first font defined in the TTC file. Currently, there is no way to specify the font index. Take a look at sun.font.FontManager.createFont2D(), when they invoke new TrueTypeFont(), the fontIndex is alway zero. Shame!
For your question: if all you need is the first font in TTC file, then everything would be okay. All the glyphs defined for first font would be available. But, if you expect second or other font defined in that file, then you hit a block. You cannot even get the font name by using this API.
There is no difference between system loaded fonts and created font. However there is no good way to specify the font index, you may try to hack into FontManager and come up with some platform specific code.

itext font UnsupportedCharsetException

I am trying to create pdf documents using iText (version 5.4.0) in a java web application and I have come across an issue with fonts.
The web application is multi-lingual, and so users may save information into the system in various languages (eg. english, french, lithuanian, chinese, japanese, arabic, etc.).
When I tried to configure the pdf to output some sample japanese text it didn't show up, so I started following the examples in the official "iText in Action" book. The problem I have encountered is that when I try and configure a font with BaseFont.IDENTITY_H encoding I get the following error:
java.nio.charset.UnsupportedCharsetException: Identity-H
at java.nio.charset.Charset.forName(Charset.java:505)
at com.itextpdf.text.pdf.PdfEncodings.convertToBytes(PdfEncodings.java:186)
at com.itextpdf.text.pdf.Type1Font.<init>(Type1Font.java:276)
at com.itextpdf.text.pdf.BaseFont.createFont(BaseFont.java:692)
at com.itextpdf.text.pdf.BaseFont.createFont(BaseFont.java:615)
at com.itextpdf.text.pdf.BaseFont.createFont(BaseFont.java:450)
Nothing in the book or searching Google mentions this issue.
Any suggestions as to what I might have missed?
As you probably understood from the answers from two Michaels, you made the wrong assumption that the standard Type 1 font Times Roman and IDENTITY_H are compatible. You'll have to change the font if you want to use IDENTITY_H, or change the encoding if you want to use a standard Type 1 font (in which case using BaseFont.EMBEDDED doesn't make sense because standard Type 1 fonts are never embedded). I'm sorry if I didn't mention this in my book. I thought it was kind of trivial. One can deduct it from what I wrote about composite fonts.
I don't think there's any one encoding that works for all languages, with font embedding. For example, you'd assume that choosing the UTF-8 encoding, with font embedding set to true will embed the font, but it doesn't.
I find myself having to do this, because I don't know the language of the text ahead of time:
try {
// Try to embed the font.
// This doesn't work for type 1 fonts.
return FontFactory.getFont(fontFace, BaseFont.IDENTITY_H,
true, fontSize, fontStyle, textColor);
} catch (ExceptionConverter e) {
return FontFactory.getFont(fontFace, "UTF-8", true,
fontSize, fontStyle, textColor);
}
(The exception class may be different since I'm using an older version of iText -- 2.1.)
As with a lot of iText stuff, this is poorly documented, and makes the easy stuff unnecessarily hard.

Splitting a pdf with pdfbox, but losing the font

I wrote some code in Java using the pdfbox API that splits a pdf document into it's individual pages, looks through the pages for a specific string, and then makes a new pdf from the page with the string on it. My problem is that when the new page is saved, I lose my font. I just made a quick word document to test it and the default font was calibri, so when I run the program I get an error box that reads: "Cannot extract the embedded font..." So it replaces the font with some other default.
I have seen a lot of example code that shows how to change the font when you are inputting text to be placed in the pdf, but nothing that sets the font for the pdf.
If anyone is familiar with a way to do this, (or can find documentation/examples), I would greatly appreciate it!
Edit: forgot to include some sample code
if (pageContent.indexOf(findThis) >= 0){
PDPage pageToRip = pages.get(i);
>>set the font of pageToRip here
res.importPage(pageToRip); //res is the new document that will be saved
}
I don't know if that helps any, but I figured I'd include it.
Also, this is what the change looks like if the pdf is written in calibri and split:
Note: This might be a nonissue, it depends on the font used in the files that will need to be processed. I tried some things besides Calibri and it worked out fine.
From How to extract fonts from a PDF:
You actually cannot extract a font from a PDF, not even if the font is
fully embedded. There are two reasons why this is not feasible:
•Most fonts are copyrighted, making it illegal to use an extractor.
•When a font is embedded in a PDF, not all of the font data are
included. Obviously the font outline data are included as well as the
font width tables. Other information, such as data about ligatures,
are irrelevant within the PDF so those data do not get enclosed in a
PDF. I am not aware of any font extraction tools but if you come
across one, the above reasons should make it clear that these
utilities are to be avoided.

Categories

Resources