PDFBox TextToPdf keep font


I am currently converting a text document to PDF and rendering it in the browser, and I cannot seem to keep the font. The font is Courier, but it gets converted to something else when the document is converted to a PDF. Is there an easy way to make it keep the default font, or at least to set it after converting? Here is the code:
public void downloadFile(HttpServletResponse response, List<Report> reports) throws IOException {
    OutputStream outputStream = response.getOutputStream();
    PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
    PDDocument documentToPrint = new PDDocument();
    for (Report report : reports) {
        PDDocument pdDocument = new TextToPDF().createPDFFromText(new InputStreamReader(
                new FileInputStream(fileDirectory + File.separator + report.getFileLocation()), "UTF8")
        );
        pdfMergerUtility.appendDocument(documentToPrint, pdDocument);
    }
    pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    response.setContentType("application/pdf");
    response.addHeader("Content-Disposition", "inline; filename=" + "download.pdf");
    documentToPrint.save(outputStream);
    documentToPrint.close();
}
I have also tried setting it like this before appending the document:
PDDocumentCatalog documentCatalog = pdDocument.getDocumentCatalog();
PDResources pdResources = documentCatalog.getPages().get(i).getResources();
pdResources.add(PDType1Font.COURIER);
documentCatalog.getPages().get(i++).setResources(pdResources);
But that does not seem to work either.

Because I have the font in the text document as Courier.
No, you don't. A plain text file contains no font information; editors usually just display it in Courier. So you have to set the font yourself, because TextToPDF defaults to Helvetica.
Change this:
PDDocument pdDocument = new TextToPDF().createPDFFromText(new InputStreamReader(....
to this:
TextToPDF textToPDF = new TextToPDF();
textToPDF.setFont(PDType1Font.COURIER);
PDDocument pdDocument = textToPDF.createPDFFromText(new InputStreamReader(....
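A self-contained sketch of the same fix, assuming PDFBox 2.0.x (where `PDType1Font.COURIER` exists and `TextToPDF` comes from the pdfbox-tools artifact). The class `CourierTextToPdf` and both helper methods are hypothetical; `fontNamesOnFirstPage` only reads the page resources back to confirm which font the generated PDF actually references.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.tools.TextToPDF;

public class CourierTextToPdf {

    // Builds a PDF from plain text, overriding TextToPDF's Helvetica default with Courier.
    static PDDocument toCourierPdf(String text) throws IOException {
        TextToPDF textToPDF = new TextToPDF();
        textToPDF.setFont(PDType1Font.COURIER); // PDFBox 2.0.x standard-14 font constant
        return textToPDF.createPDFFromText(new StringReader(text));
    }

    // Lists the base font names in the first page's resources, to check which font was used.
    static List<String> fontNamesOnFirstPage(String text) {
        try (PDDocument doc = toCourierPdf(text)) {
            PDResources resources = doc.getPage(0).getResources();
            List<String> names = new ArrayList<>();
            for (COSName name : resources.getFontNames()) {
                names.add(resources.getFont(name).getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fontNamesOnFirstPage("col1    col2\nvalue   value"));
    }
}
```

In the question's loop, the font has to be set on each `TextToPDF` instance before `createPDFFromText` is called; setting it on the page resources afterwards does not change the text that has already been drawn.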

Related

Subsetting OpenType Collection font in pdfbox

I'm trying to embed a subset of the Noto Regular font in my code, but I keep getting:
java.lang.UnsupportedOperationException: OTF fonts do not have a glyf table
at org.apache.fontbox.ttf.OpenTypeFont.getGlyph(OpenTypeFont.java:66)
at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:481)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:136)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:306)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:162)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138)
I downloaded the font file NotoSansCJK-Regular.ttc from https://www.google.com/get/noto/help/cjk/
Font subsetting works for .ttf fonts, and I don't get any error as long as the document I save contains no special characters.
EDIT
It appears that TrueType collection fonts can have a shared glyf table (which makes sense, since the font collection contains Japanese glyphs). So an individual PDType0Font parsed from a .ttc can't be treated as an individual font.
I loaded the font using:
ttc.processAllFonts((TrueTypeFont ttf) -> {
    PDFont font = PDType0Font.load(doc, ttf, true);
    fontList.add(font);
});
I'm guessing there is extra work I need to do to make this work, but I can't find any code samples anywhere.
EDIT2
It seems the problem is that loading certain OpenType font files (the kind the font collection contains) turns on an internal flag, isPostScript. The flag is then checked when getGlyph() is called during subsetting, and the process is aborted.
The following code generates the glyf table error when creating the PDF document:
// downloaded from Noto project site
String OTF_FILE = "./src/test/resources/NotoSansJP-Regular.otf";
PDDocument doc = new PDDocument();
PDFont otf = null;
try (InputStream inputStream = new FileInputStream(new File(OTF_FILE))) {
    otf = PDType0Font.load(doc, new OTFParser().parse(inputStream), true);
    PDPage page = new PDPage();
    PDPageContentStream stream = new PDPageContentStream(doc, page);
    stream.setFont(otf, 10f);
    stream.beginText();
    stream.newLineAtOffset(100f, 600f);
    stream.showText("二ろほス反2化みた大第リきやね景手ハニエ者性ルヤリウ円脱");
    stream.endText();
    stream.close();
    doc.addPage(page);
    doc.save("test.pdf");
} catch (IOException iox) {
    // failed
}
But it generates the PDF fine as soon as I set the subsetting parameter (embedSubset) to false in the PDType0Font.load call.
Similarly, if I load the OTF font through the collection:
String OTF_FILE = "./src/test/resources/NotoSansCJK-Regular.ttc";
PDDocument doc = new PDDocument();
PDFont otf = null;
try (InputStream inputStream = new FileInputStream(new File(OTF_FILE))) {
    TrueTypeCollection ttc = new TrueTypeCollection(inputStream);
    otf = PDType0Font.load(doc, ttc.getFontByName("NotoSansCJKjp-Regular"), true);
    PDPage page = new PDPage();
    PDPageContentStream stream = new PDPageContentStream(doc, page);
    stream.setFont(otf, 10f);
    stream.beginText();
    stream.newLineAtOffset(100f, 600f);
    stream.showText("二ろほス反2化みた大第リきやね景手ハニエ者性ルヤリウ円脱");
    stream.endText();
    stream.close();
    doc.addPage(page);
    doc.save("test.pdf");
} catch (IOException iox) {
    // failed
}
I either have to embed the whole font, or subsetting throws the error.
EDIT 3
I ended up circumventing this by downloading the OTF font from the "Language-specific OpenType/CFF (OTF)" section, which contains characters from all four regions, and converting it to TTF with otf2ttf from fonttools.
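The stack trace suggests that PDFBox's `TTFSubsetter` needs a `glyf` table, which CFF-flavoured OpenType fonts (such as the ones in the CJK `.ttc`, per EDIT 2) don't have, while the otf2ttf output does. A hedged sketch, assuming PDFBox 2.0.x: look at the first four bytes of the font file (0x00010000 or `true` for TrueType outlines, `OTTO` for CFF outlines, `ttcf` for a collection, as defined by the OpenType spec) and only load it for subset embedding when it has TrueType outlines. The class, method names and the default font path are hypothetical.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class FontOutlineCheck {

    // Classifies a font by its first four bytes (sfnt version or TTC tag).
    static String outlineKind(byte[] header) {
        if (header.length < 4) {
            return "unknown";
        }
        String tag = new String(header, 0, 4, StandardCharsets.US_ASCII);
        if ((header[0] == 0 && header[1] == 1 && header[2] == 0 && header[3] == 0) || tag.equals("true")) {
            return "TrueType"; // glyf-based outlines
        }
        if (tag.equals("OTTO")) {
            return "CFF"; // PostScript (CFF) outlines, no glyf table
        }
        if (tag.equals("ttcf")) {
            return "collection";
        }
        return "unknown";
    }

    static String outlineKind(File fontFile) throws IOException {
        byte[] header = new byte[4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fontFile))) {
            in.readFully(header);
        }
        return outlineKind(header);
    }

    // Loads a font for subset embedding, but only if it has TrueType (glyf) outlines.
    static PDFont loadSubsettable(PDDocument doc, File fontFile) throws IOException {
        String kind = outlineKind(fontFile);
        if (!kind.equals("TrueType")) {
            throw new IOException(fontFile + " has " + kind + " outlines; convert it (e.g. with otf2ttf) first");
        }
        return PDType0Font.load(doc, fontFile); // this overload embeds a subset
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path to the TTF produced by otf2ttf
        File font = new File(args.length > 0 ? args[0] : "NotoSansJP-Regular.ttf");
        try (PDDocument doc = new PDDocument()) {
            PDFont pdFont = loadSubsettable(doc, font);
            System.out.println("Loaded " + pdFont.getName());
        }
    }
}
```

Failing early with a clear message is a design choice here; the alternative the question already found is to keep the CFF font and pass `false` for `embedSubset`, at the cost of embedding the whole font.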

Apache PdfBox Rotate Crop Box Only Not Text

I am trying to go from text to PDF but have only one of the pages rotated 90 degrees. The main reason is that some of my text documents are a bit too wide and need to be in landscape to look normal. I have tried a few things, but it seems like everything rotates the text too. Is there an easy way to rotate the PDF page to landscape but keep the text upright?
OutputStream outputStream = response.getOutputStream();
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
Map<String, Documents> documents = getDocuments(user, documentReports);
try (PDDocument documentToPrint = new PDDocument()) {
    for (Document doc : documentReports) {
        TextToPDF textToPDF = new TextToPDF();
        textToPDF.setFont(PDType1Font.COURIER);
        textToPDF.setFontSize(8);
        Document documentReport = documents.get(doc.getId());
        try (PDDocument pdDocument = textToPDF.createPDFFromText(
                new InputStreamReader(new ByteArrayInputStream(documentReport.getReportText().getBytes())))) {
            pdfMergerUtility.appendDocument(documentToPrint, pdDocument);
        }
    }
    pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    LocalDateTime localUtcTime = Java8TimeUtil.getCurrentUtcTime();
    documentToPrint.getDocumentInformation().setTitle(localUtcTime.toString());
    response.setHeader("Content-Disposition", "inline; filename=" + localUtcTime + ".pdf");
    response.setContentType("application/pdf");
    documentToPrint.save(outputStream);
}

This might not work for everyone, but I figured it out for my specific requirement: TextToPDF has a setLandscape method that you call before creating the PDF from the text:
textToPDF.setLandscape(true);
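Because the question's loop creates one `TextToPDF` per report, the landscape flag can be decided per report, so only the wide ones end up in landscape after merging. A minimal sketch, assuming PDFBox 2.0.x; the "any line longer than N characters" rule, the class name and the method names are hypothetical. `firstPageIsLandscape` checks the page size of the result instead of taking the effect of `setLandscape` for granted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.tools.TextToPDF;

public class LandscapeWhenWide {

    // Hypothetical rule: a report is "wide" if any line exceeds maxPortraitColumns characters.
    static boolean needsLandscape(String text, int maxPortraitColumns) {
        return Arrays.stream(text.split("\\R")).anyMatch(line -> line.length() > maxPortraitColumns);
    }

    static PDDocument toPdf(String text, int maxPortraitColumns) throws IOException {
        TextToPDF textToPDF = new TextToPDF();
        textToPDF.setFont(PDType1Font.COURIER);
        textToPDF.setFontSize(8);
        // Must be set before createPDFFromText; the page is laid out wide, the text stays upright
        textToPDF.setLandscape(needsLandscape(text, maxPortraitColumns));
        return textToPDF.createPDFFromText(new StringReader(text));
    }

    // Returns true if the first page of the generated PDF is wider than it is tall.
    static boolean firstPageIsLandscape(String text, int maxPortraitColumns) {
        try (PDDocument doc = toPdf(text, maxPortraitColumns)) {
            PDRectangle box = doc.getPage(0).getMediaBox();
            return box.getWidth() > box.getHeight();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPageIsLandscape("narrow report", 120));
    }
}
```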

itext on tomcat print different fonts

I'm executing this code both from Eclipse and on Tomcat, inside a webapp:
FileInputStream is = new FileInputStream("C:/Users/admin/Desktop/dummy.txt");
try {
    FontFactory.register("C:/Workspace/Osmosit/ReportManager/testSvn/ReportManagerCommon/src/main/java/com/osmosit/reportmanager/common/itext/fonts/ARIALUNI.TTF");
} catch (Exception e) {
    e.printStackTrace();
}
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(1024);
Document document = new Document(PageSize.A4);
PdfWriter writer;
writer = PdfWriter.getInstance(document, byteArrayOutputStream);
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
byteArrayOutputStream.close();
FileOutputStream fos = new FileOutputStream("C:/Users/admin/Desktop/prova-web.pdf");
fos.write(byteArrayOutputStream.toByteArray());
fos.close();
dummy.txt is a simple HTML file with Arabic and Latin characters:
<div style="font-family: Arial Unicode MS;" ><p>كما. أي مدن العدّ وقام test latin</p><br /></div>
When I run it under Eclipse, I get a correct PDF; when it runs on Tomcat, I get this:
ÙƒÙ…Ø§. Ø£ÙŠ Ù…Ø¯Ù† Ø§Ù„Ø¹Ø¯Ù‘ ÙˆÙ‚Ø§Ù… test latin
PS: I'm using itextpdf version 5.5.8.

You have an encoding problem. Either you saved dummy.txt using the wrong encoding (e.g. as Latin-1 instead of as UTF-8), or you are reading dummy.txt using the wrong encoding.
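The garbled output in the question is exactly what UTF-8 bytes look like when they are decoded as windows-1252: the first Arabic word, كما, comes out as ÙƒÙ…Ø§. That points to a reader falling back to a platform default encoding on the server. The stdlib-only sketch below (hypothetical class `MojibakeDemo`) reproduces it; it assumes, but does not prove, that the Tomcat JVM's default charset is windows-1252. Non-ASCII literals are written as `\u` escapes so the file compiles regardless of the source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then decodes the bytes with the given charset,
    // which is what happens when a UTF-8 file is read with the wrong encoding.
    static String misdecode(String original, Charset decodeWith) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The charset used by readers that are not given one explicitly
        System.out.println("Platform default: " + Charset.defaultCharset());
        // The first Arabic word of dummy.txt, decoded the wrong way
        String arabic = "\u0643\u0645\u0627";
        System.out.println(misdecode(arabic, Charset.forName("windows-1252")));
    }
}
```

Running the first line of `main` on both Eclipse and Tomcat shows whether the two JVMs disagree on the default; the robust cure is to never rely on it and pass the charset explicitly, as the parseXHtml() call in this answer does.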
See "html to pdf convert, cyrillic characters not displayed properly" and adapt the line in which you call parseXHtml():
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
is, null, Charset.forName("UTF-8"), fontImp);
Take a look at the ParseHtml11 example to find out what fontImp is about.
You are also making another mistake: Arabic is read from right to left, and in your code you aren't defining the run direction. See "Arabic characters from html content to pdf using iText".
In your case, I would put the Arabic text in a table and I would follow the ParseHtml7 example from the official documentation:
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles: register a font that contains Arabic glyphs
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    // HTML
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
    // Pipelines: collect the parsed elements in a list instead of adding them to the document
    ElementList elements = new ElementList();
    ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is the path of the source HTML file (a constant defined elsewhere in the example)
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // Put the elements in a cell with right-to-left run direction so the Arabic is laid out correctly
    PdfPTable table = new PdfPTable(1);
    PdfPCell cell = new PdfPCell();
    cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
    for (Element e : elements) {
        cell.addElement(e);
    }
    table.addCell(cell);
    document.add(table);
    // step 5
    document.close();
}

CSS style is not taken while generating pdf from html using itext

I can successfully generate a PDF from an HTML string, but the problem is that it doesn't apply the CSS. How can I generate the PDF with the CSS styles applied?
Please help! I have also tried CSSResolver.
My code is here:
{
    String result = "failed";
    try {
        String html2 = "<html>" + ..... + "</html>";
        long timemilli = System.currentTimeMillis();
        String filename = "EastAfriPack2014_Ticket_" + timemilli;
        String writePath = Global.PDF_SAVE_PATH + filename;
        System.out.println("----------writePath--------------" + writePath);
        OutputStream file = new FileOutputStream(new File(writePath + ".pdf"));
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, file);
        document.open();
        InputStream is = new ByteArrayInputStream(html2.getBytes());
        CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(false);
        cssResolver.addCss("table {color: red; background-color: blue; } ", true);
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
        document.close();
        file.close();
        System.out.println("pdf created");
        result = filename;
        return filename;
    } catch (Exception e) {
        e.printStackTrace();
        return result;
    }
}

I don't think your approach works. I tried it before because it's the easiest way to create a PDF from HTML, but got bitten by the same problem. Note that the CSSResolver in your code is never passed to parseXHtml(), so the CSS you add to it has no effect.
You either provide the styles inline via the style attribute of the table,
or
pass the HTML and CSS files separately to the helper class:
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
        new FileInputStream("myhtmlFile.html"),
        new FileInputStream("myCSSFile.css"));
The HTML part can also be the InputStream you created earlier in your code.
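For the question's case, where both the HTML and the CSS are strings, each can be wrapped in a `ByteArrayInputStream`. A sketch assuming iText 5.5.x with XML Worker; it uses the `parseXHtml` overload that takes a CSS stream, a charset and a font provider (the same overload the iText/Tomcat answer calls with `null` for the CSS). The class name `HtmlWithCss` and the sample markup are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HtmlWithCss {

    // Renders an HTML string to PDF bytes, applying the CSS passed as a separate stream.
    static byte[] render(String html, String css) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        try (InputStream htmlIn = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
             InputStream cssIn = new ByteArrayInputStream(css.getBytes(StandardCharsets.UTF_8))) {
            PdfWriter writer = PdfWriter.getInstance(document, out);
            document.open();
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, htmlIn, cssIn,
                    StandardCharsets.UTF_8,
                    new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS));
            document.close(); // finishes the PDF in 'out'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String html = "<html><body><table><tr><td>ticket</td></tr></table></body></html>";
        String css = "table { color: red; background-color: blue; }";
        System.out.println(render(html, css).length + " bytes");
    }
}
```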

pdfwriter doesn't translate special characters

I have an HTML file with an external CSS file. I want to create a PDF from the HTML file, but the encoding doesn't work: the HTML file displays fine, but after converting it to PDF, some characters (čřě...) are missing. It happens even though I set the Charset explicitly.
How do I solve this, please?
public void createPDF() {
    try {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(username + ID + ".pdf"));
        document.open();
        String hovinko = username + ID + ".html";
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(hovinko), Charset.forName("UTF-8"));
        document.close();
        System.out.println("PDF Created!");
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

Did you try to convert your special characters before writing them to your PDF?
yourHTMLString = yourHTMLString.replace(oldChar, newChar);
ć = &#263;
ř = &#345;
ě = &#283;
If you need more special characters, look up their HTML entity codes.
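Doing those replacements one by one gets tedious. A stdlib-only sketch (hypothetical class `HtmlEntities`) that turns every non-ASCII character into a numeric character reference, so "č" (U+010D) becomes `&#269;`. This only rules out a problem with reading the input's encoding: the font that finally renders the text must still contain the glyphs, and a standard font such as Helvetica in WinAnsi encoding has none for č, ř or ě, which is what the EDIT below addresses.

```java
public class HtmlEntities {

    // Replaces every non-ASCII code point with a numeric character reference.
    // ASCII (including markup characters such as < and &) is left untouched,
    // because the input is assumed to be HTML already.
    static String toNumericEntities(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u010D\u0159\u011B" is "čřě"
        System.out.println(toNumericEntities("<p>\u010D\u0159\u011B</p>"));
    }
}
```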
EDIT: Then try this; it worked for me:
BaseFont basefont = BaseFont.createFont("C:/Windows/Fonts/ARIAL.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(basefont, 12);
document.add(new Paragraph("čřě", font));

Try it with the logic below. It worked for me:
// hovinko must hold the HTML markup itself here (in the question it holds the file name)
InputStream is = new ByteArrayInputStream(hovinko.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is, Charset.forName("UTF-8"));
I used xmlworker version 5.5.12 and itextpdf version 5.5.12.

I was struggling with the same problem (Polish special characters).
For me, the solution was to specify a suitable font-family in the HTML code.
