Below is the code I use to write a PDF in Java.
Code
public class PDFTest {
public static void main(String args[]) {
Document document = new Document(PageSize.A4, 50, 50, 50, 50);
try {
File file = new File("C://test//itext-test.pdf");
FileOutputStream fileout = new FileOutputStream(file);
PdfWriter.getInstance(document, fileout);
document.addAuthor("Me");
document.addTitle("My iText Test");
document.open();
Chunk chunk = new Chunk("iText Test");
Paragraph paragraph = new Paragraph();
String test = "și";
String test1 = "şi";
if (test.equalsIgnoreCase(test1)) {
// System.out.println("equal ignore case true");
paragraph.add(test + " New Font equal with Old Font");
} else {
// System.out.println("equal ignore case X true");
paragraph.add(test1 + " New Font Not equal with Old Font");
}
paragraph.setAlignment(Element.ALIGN_CENTER);
document.add(paragraph);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
When I test with Romanian text, "ș" is missing from the generated PDF.
Any advice or reference links regarding this issue would be highly appreciated.
**EDITED**
I've used the Unicode example as shown below, and the output is still the same: "ș" is still missing.
Code
static String RESULT = "C://test/itext-unicode4.pdf";
static String FONT = "C://Users//PenangIT//Desktop//Arial Unicode.ttf";
public static void main(String args[])
{
try
{
Document doc = new Document();
PdfWriter.getInstance(doc, new FileOutputStream(RESULT));
doc.open();
BaseFont bf;
bf = BaseFont.createFont(FONT,BaseFont.IDENTITY_H,BaseFont.EMBEDDED);
doc.add(new Paragraph("Font : "+bf.getPostscriptFontName()+" with encoding: "+bf.getEncoding()));
doc.add(new Paragraph(" TESTING "));
doc.add(new Paragraph(" TESTING 1 și "));
doc.add(new Paragraph(" TESTING 2 şi "));
doc.add(Chunk.NEWLINE);
doc.close();
}
catch(Exception ex)
{
ex.printStackTrace(); // report the failure instead of silently swallowing it
}
}
It is the same for the encoding example as well: "ș" is still missing.
Please take a look at this PDF: encoding_example.pdf (*)
It contains all kinds of characters that aren't present in the default font Helvetica (which is the default font you're using as you're not defining any other font).
In the EncodingExample source, we use arialbd.ttf with a specific encoding, resulting in the use of a simple font in the PDF. In the UnicodeExample source, we use IDENTITY_H as encoding, resulting in the use of a composite font in the PDF.
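To see why both spellings of "și" fall outside a single-byte encoding, it can help to print their code points. The following is a minimal JDK-only sketch, not part of the iText examples. The class and method names are illustrative, and it assumes that Cp1252 (WinAnsi) is the character set behind the default font, falling back to ISO-8859-1 if the JVM does not provide windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class CodePointCheck {
    // Assumption: Cp1252 stands in for the default font's encoding.
    // Fall back to ISO-8859-1 if this JVM does not ship windows-1252.
    static final Charset SINGLE_BYTE = Charset.isSupported("windows-1252")
            ? Charset.forName("windows-1252")
            : StandardCharsets.ISO_8859_1;

    /** Formats every code point of s as U+XXXX, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    /** True if every character of s can be encoded in the single-byte charset. */
    static boolean fitsSingleByte(String s) {
        CharsetEncoder encoder = SINGLE_BYTE.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        String commaBelow = "\u0219i"; // s with comma below, as in the first test string
        String cedilla = "\u015Fi";    // s with cedilla, as in the second test string
        System.out.println(describe(commaBelow) + " fits: " + fitsSingleByte(commaBelow));
        System.out.println(describe(cedilla) + " fits: " + fitsSingleByte(cedilla));
        System.out.println("equalsIgnoreCase: " + commaBelow.equalsIgnoreCase(cedilla));
    }
}
```

The two strings are different code points (U+0219 and U+015F), so they never compare equal, and neither fits the single-byte charset, which is consistent with needing a font and encoding that actually cover them.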
I've adapted your code, because I see that you didn't understand my answer:
BaseFont bf = BaseFont.createFont(FONT,BaseFont.IDENTITY_H,BaseFont.EMBEDDED);
doc.add(new Paragraph(" TESTING 1 și ", new Font(bf, 12)));
doc.add(new Paragraph(" TESTING 2 \u015Fi ", new Font(bf, 12)));
Do you see the difference? In your code, you create bf, but you aren't using that object anywhere.
(*) Note: pdf.js can't interpret some glyphs because pdf.js doesn't support simple fonts with a special encoding; these glyphs show up correctly in Adobe Reader and the Chrome PDF viewer. If you want to be safe, use composite fonts, because pdf.js can render those glyphs correctly: unicode_example.pdf
I know that many people may have asked this question before. I've read almost all of those questions, but they couldn't help me solve my problem.
I'm using the iText Java library to generate a Persian PDF. How do I use PdfWriter.RUN_DIRECTION_RTL? I'm using the following code:
String ruta = txtruta.getText();
String contenido= txtcontenido.getText();
try {
FileOutputStream archivo = new FileOutputStream(ruta+".pdf");
Document doc = new Document(PageSize.A4,50,50,50,50);
PdfWriter.getInstance(doc, archivo);
doc.open();
BaseFont bfComic = BaseFont.createFont("D:\\Font\\B Lotus.ttf", BaseFont.IDENTITY_H,BaseFont.EMBEDDED);
Font font = new Font(bfComic, 12,Font.NORMAL);
doc.add(new Paragraph(contenido,font));
doc.close();
JOptionPane.showMessageDialog(null,"ok");
} catch (Exception e) {
System.out.println("Error: "+e);
}
Document.add() doesn't support RTL text. You'll have to use ColumnText.setRunDirection or PdfPTable.setRunDirection.
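Before choosing between Document.add() and ColumnText or PdfPTable, it may be useful to detect whether a string contains right-to-left characters at all. The sketch below uses java.text.Bidi from the JDK; the class and method names are hypothetical, and it only inspects the text, it does not call iText.

```java
import java.text.Bidi;

class RunDirectionCheck {
    /** True if text contains right-to-left characters and so needs bidi handling. */
    static boolean needsRtl(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String persian = "\u0633\u0644\u0627\u0645"; // a short word in Persian script
        System.out.println("Persian needs RTL: " + needsRtl(persian));
        System.out.println("English needs RTL: " + needsRtl("Hello"));
    }
}
```

When needsRtl returns true, the text is a candidate for a container whose run direction can be set to PdfWriter.RUN_DIRECTION_RTL.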
I haven't worked with the Persian language, but I think your problem is with the font you used (B Lotus.ttf). In most cases, using a registered Unicode font solves the problem. Try again with a different font.
You can also set the run direction of a phrase to RTL using the following code.
PdfPCell pdfCell = new PdfPCell(new Phrase(contenido, myUnicodePersianFont));
pdfCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
You can find a similar question here.
I succeeded with the following code:
private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
JFileChooser dlg = new JFileChooser();
int option = dlg.showSaveDialog(this);
if(option==JFileChooser.APPROVE_OPTION){
File f = dlg.getSelectedFile();
txtaddress.setText(f.toString());
}
}
private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {
String ruta = txtaddress.getText();
String con= content.getText();
try {
FileOutputStream archivo = new FileOutputStream(ruta+".pdf");
Document doc = new Document(PageSize.A4,50,50,50,50);
PdfWriter Writer = PdfWriter.getInstance(doc, archivo);
doc.open();
LanguageProcessor al = new ArabicLigaturizer();
Writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
BaseFont bfComic = BaseFont.createFont("D:\\Font\\titr.ttf", BaseFont.IDENTITY_H,BaseFont.EMBEDDED);
Font font = new Font(bfComic, 12,Font.NORMAL);
Paragraph p = new Paragraph(al.process(con),font);
p.setAlignment(Element.ALIGN_RIGHT);
doc.add(p);
doc.close();
JOptionPane.showMessageDialog(null,"Yes");
} catch (Exception e) {
System.out.println("Error: "+e);
}
}
I am trying to use Noto fonts (https://www.google.com/get/noto/) to display Chinese characters. Here is my sample code, modified from an iText sample.
public void createPdf(String filename) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(filename));
document.open();
//This is simple English Font
FontFactory.register("c:/temp/fonts/NotoSerif-Bold.ttf", "my_nato_font");
Font myBoldFont = FontFactory.getFont("my_nato_font");
BaseFont bf = myBoldFont.getBaseFont();
document.add(new Paragraph(bf.getPostscriptFontName(), myBoldFont));
//This is Chinese font
//Option 1 :
Font myAdobeTypekit = FontFactory.getFont("SourceHanSansSC-Regular", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Option 2 :
/*FontFactory.register("C:/temp/AdobeFonts/source-han-sans-1.001R/OTF/SimplifiedChinese/SourceHanSansSC-Regular.otf", "my_hans_font");
Font myAdobeTypekit = FontFactory.getFont("my_hans_font", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);*/
document.add(Chunk.NEWLINE);
document.add(new Paragraph("高興", myAdobeTypekit));
document.add(Chunk.NEWLINE);
//simplified chinese
document.add(new Paragraph("朝辞白帝彩云间", myAdobeTypekit));
document.add(Chunk.NEWLINE);
document.add(new Paragraph("高兴", myAdobeTypekit));
document.add(new Paragraph("The Source Han Sans Traditional Chinese ", myAdobeTypekit));
document.close();
}
I have downloaded the font files to my machine. I am using two approaches:
1. Use the equivalent font family in Adobe
2. Embed the OTF file in the PDF
Using approach 1, I would expect the Chinese characters to be displayed in the PDF, but only the English text is displayed and the Chinese characters are blank.
Using approach 2, when I embed the fonts in the PDF (which is not the path I would like to take), there is an error when opening the PDF.
Update :
If I look at this example http://itextpdf.com/examples/iia.php?id=214
and in this code
public void createPdf(String filename, boolean appearances, boolean font)
throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
writer.getAcroForm().setNeedAppearances(appearances);
TextField text = new TextField(writer, new Rectangle(36, 806, 559, 780), "description");
text.setOptions(TextField.MULTILINE);
if (font) {
BaseFont unicode =
BaseFont.createFont("c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
text.setExtensionFont(BaseFont.createFont());
ArrayList<BaseFont> list = new ArrayList<BaseFont>();
list.add(unicode);
text.setSubstitutionFonts(list);
BaseFont f= (BaseFont)text.getSubstitutionFonts().get(0);
System.out.println(f.getPostscriptFontName());
}
text.setText(TEXT);
writer.addAnnotation(text.getTextField());
// step 5
document.close();
}
If I substitute c:/windows/fonts/arialuni.ttf with C:/temp/fonts/NotoSansCJKtc-Thin.otf, I do not see the Chinese characters. The text to convert is now
public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
+ "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
+ "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
+ "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
Clearly you are using the wrong font. I have downloaded the fonts from the link you posted. You are using NotoSerif-Bold.ttf, a font that does not support Chinese. However, the ZIP file also contains fonts with CJK in the font name. As described on the site you refer to, CJK stands for Chinese, Japanese and Korean. Use one of those CJK fonts and you'll be able to produce Chinese text in your PDF.
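One way to notice early that a text needs a CJK font is to check which Unicode scripts it contains. The following JDK-only sketch is illustrative (class and method names are made up for this example); it inspects the text only and says nothing about which glyphs a particular font file contains.

```java
class ScriptCheck {
    /** True if any code point of text belongs to the Han script. */
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        String mixed = "\u7121\u540d (Nameless)"; // Han characters followed by Latin text
        System.out.println("mixed contains Han: " + containsHan(mixed));
        System.out.println("latin contains Han: " + containsHan("Nameless"));
    }
}
```

If containsHan returns true, a Latin-only font such as NotoSerif-Bold.ttf will not be enough, and one of the CJK fonts is needed.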
Take a look at the NotoExample, in which I create a PDF using one of the fonts from the ZIP file you refer to.
This is the code I used:
public static final String FONT = "resources/fonts/NotoSansCJKsc-Regular.otf";
public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
+ "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
+ "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
+ "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
public static final String CHINESE = "\u5341\u950a\u57cb\u4f0f";
public static final String JAPANESE = "\u8ab0\u3082\u77e5\u3089\u306a\u3044";
public static final String KOREAN = "\ube48\uc9d1";
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Font font = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Paragraph p = new Paragraph(TEXT, font);
document.add(p);
document.add(new Paragraph(CHINESE, font));
document.add(new Paragraph(JAPANESE, font));
document.add(new Paragraph(KOREAN, font));
document.close();
}
You claim that Adobe Reader XI doesn't show the Chinese glyphs, but instead shows a "Cannot extract the embedded Font" message. I cannot reproduce this [*]. I have even used Preflight in Adobe Acrobat as indicated here, and no errors were found.
[*] Update: this problem can be reproduced if you use iText 4.2.x, a version that was released by somebody unknown to iText Group NV. Please use iText versions higher than 5 only.
I was trying to use a font I installed called "Tengwar Quenya-1 Regular", but it didn't work: the PDF document keeps being written with the default font. I then tried to use the downloaded font file by embedding it, and the default font is still used. Has anyone tried this before who could tell me what I am doing wrong? Here is the code:
public void testePdf(){
Document document = new Document();
String filename = "C:\\Users\\Marcelo\\Downloads\\tengwar_quenya\\QUENCAP1.TFF";
FontFactory.register(filename);
Font fonte = FontFactory.getFont(filename, BaseFont.CP1252, BaseFont.EMBEDDED);
try {
PdfWriter.getInstance(document,
new FileOutputStream(filename+ "HelloWorld.pdf"));
document.open();
document.add(new Paragraph("A Hello World PDF document.", fonte));
document.close(); // no need to close PDFwriter?
} catch (DocumentException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
It compiles fine, but the output does not use the font I selected. If it is a glyph instead of a character, will that be a problem?
I have Googled for the font you mention. I have downloaded it and I have made a small SSCCE that you can download here: TengwarQuenya1
This is the code:
public static final String FONT = "resources/fonts/QUENCAP1.TTF";
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Font f1 = FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 12);
document.add(new Paragraph("A Hello World PDF document.", f1));
document.close();
}
This is the result: tengwarquenya1.pdf
I'm not sure what the resulting text means, but it doesn't look like the default font to me.
In other words: I can't reproduce the problem. Note that you don't need to register a font if you pass its file path to the FontFactory. Obviously, my font path is different from yours. I think that yours is wrong. Try putting the ".TTF" file in another location.
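A small pre-flight check on the font path can rule out this kind of mistake before the path is handed to FontFactory. The sketch below uses only the JDK. The class and method names are hypothetical, and the list of accepted extensions (.ttf, .otf, .ttc) is an assumption made for illustration, not a statement of everything iText accepts. Note that the path in the question ends in ".TFF" rather than ".TTF", which this check would flag.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class FontPathCheck {
    /** True if the path ends in one of the font extensions assumed for this sketch. */
    static boolean hasFontExtension(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".ttf") || lower.endsWith(".otf") || lower.endsWith(".ttc");
    }

    /** True if the path has a font extension and points to a readable regular file. */
    static boolean isUsableFontFile(String path) {
        Path p = Paths.get(path);
        return hasFontExtension(path) && Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        System.out.println("QUENCAP1.TFF has font extension: " + hasFontExtension("QUENCAP1.TFF"));
        System.out.println("QUENCAP1.TTF has font extension: " + hasFontExtension("QUENCAP1.TTF"));
        System.out.println("usable: " + isUsableFontFile("resources/fonts/QUENCAP1.TTF"));
    }
}
```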
I have an HTML file with an external CSS. I want to create a PDF from the HTML file, but the encoding doesn't work. The HTML file displays fine, but after conversion to PDF some characters (čřě...) are missing. It happens even if I set the Charset when parsing, as in the code below.
How do I solve this, please?
public void createPDF() {
try {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(username + ID + ".pdf"));
document.open();
String hovinko = username + ID + ".html";
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(hovinko), Charset.forName("UTF-8"));
document.close();
System.out.println("PDF Created!");
} catch (Exception ex) {
ex.printStackTrace();
}
}
Did you try to convert your special characters before writing them to your PDF?
yourHTMLString.replaceAll(oldChar, newChar);
ć = &#263;
ř = &#345;
ě = &#283;
If you need more special characters, visit this link.
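Rather than replacing characters one by one, the conversion can be done generically by turning every non-ASCII code point into a numeric character reference. This is a minimal sketch with a hypothetical helper name; it assumes numeric references are acceptable in the HTML being parsed.

```java
class EntityEscaper {
    /** Replaces every non-ASCII code point with a decimal numeric character reference. */
    static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.append((char) cp);                     // plain ASCII is kept as-is
            } else {
                sb.append("&#").append(cp).append(';');   // e.g. U+0159 becomes &#345;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("\u010D\u0159\u011B abc"));
    }
}
```

This only changes how the characters are written in the HTML; the font used for the PDF still has to contain the corresponding glyphs.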
EDIT: Then try this out, it worked for me:
BaseFont basefont = BaseFont.createFont("C:/Windows/Fonts/ARIAL.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(basefont, 12);
document.add(new Paragraph("čřě", font));
Try the logic below. It worked for me:
// Note: here hovinko must hold the HTML content itself, not the file name
InputStream is = new ByteArrayInputStream(hovinko.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is, Charset.forName("UTF-8"));
I used xmlworker version 5.5.12 and itextpdf version 5.5.12.
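The reason the byte encoding and the declared Charset have to agree can be shown with a round trip through bytes. The sketch below uses only the JDK and an illustrative class name; it is not iText code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    /** Encodes text to bytes with cs and decodes it again with the same charset. */
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String czech = "\u010D\u0159\u011B"; // the characters from the question
        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(czech, StandardCharsets.UTF_8).equals(czech));
        // ISO-8859-1 has no mapping for these characters, so each becomes '?'.
        System.out.println(roundTrip(czech, StandardCharsets.ISO_8859_1));
    }
}
```

If the HTML is turned into bytes with a charset that cannot represent the characters, they are already lost before the parser sees them, regardless of the Charset passed to parseXHtml.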
I was struggling with the same problem (Polish special characters).
For me, the solution was to specify a suitable font-family in the HTML code.
I am currently working on a Java project that uses Apache POI.
In this project I want to convert a .doc file to a PDF file. The conversion succeeds, but I only get the text in the PDF, without any text style or text colour.
My PDF file looks black and white, while my .doc file is coloured and uses different text styles.
This is my code:
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("/document/test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
// CharacterRun run = pr.getCharacterRun(i);
// run.setBold(true);
// run.setCapitalized(true);
// run.setItalic(true);
paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
System.out.println("Length:" + paragraphs[i].length());
System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
// add the paragraph to the document
document.add(new Paragraph(paragraphs[i]));
}
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
Please help me.
Thanks in advance.
If you look at Apache Tika, there's a good example of reading some style information from a HWPF document. The code in Tika generates HTML based on the HWPF contents, but you should find that something very similar works for your case.
The Tika class is
https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
One thing to note about word documents is that everything in any one Character Run has the same formatting applied to it. A Paragraph is therefore made up of one or more Character Runs. Some styling is applied to a Paragraph, and other parts are done on the runs. Depending on what formatting interests you, it may therefore be on the paragraph or the run.
If you use WordExtractor, you will get text only. Try using the CharacterRun class instead; it gives you the style along with the text. Please refer to the following sample code.
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
    org.apache.poi.hwpf.usermodel.Paragraph poiPara = range.getParagraph(i);
    // Iterate over the runs by count rather than looping until the offsets match,
    // so a paragraph with no runs cannot cause an out-of-range access.
    for (int j = 0; j < poiPara.numCharacterRuns(); j++) {
        CharacterRun run = poiPara.getCharacterRun(j);
        System.out.println("Color "+run.getColor());
        System.out.println("Font size "+run.getFontSize());
        System.out.println("Font Name "+run.getFontName());
        System.out.println(run.isBold()+" "+run.isItalic()+" "+run.getUnderlineCode());
        System.out.println("Text is "+run.text());
    }
}