Showing emoji in PDF or Excel - Java

I have data containing emoji in a database, and I want to display it in a generated document such as a PDF or an Excel file.
I am using a Spring Boot application. Please suggest a Java library for generating PDF or Excel files that supports emoji.

iText supports this, assuming that your emoji is a Unicode character and that you use a font that contains the correct glyph for this Unicode character.
The best way to test this is to try it.
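Before choosing a font, it can help to confirm that the stored text really holds emoji as Unicode code points (rather than, say, shortcodes such as :smile:). Below is a minimal sketch using only the JDK. The class name EmojiCodePoints and the sample string are made up for illustration, and the check is a rough heuristic (code points above U+FFFF), not a complete emoji test:

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiCodePoints {

    // Returns the code points of the text that lie outside the Basic
    // Multilingual Plane, formatted as U+XXXX. Most emoji live there, but this
    // is only a heuristic: some emoji (for example U+2764) are inside the BMP.
    static List<String> supplementaryCodePoints(String text) {
        List<String> result = new ArrayList<>();
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical value as it might come back from the database:
        // "Hello " followed by U+1F600, written as a surrogate pair
        String fromDb = "Hello \uD83D\uDE00";
        System.out.println(supplementaryCodePoints(fromDb));
    }
}
```

If the list comes back empty for text that is supposed to contain emoji, the data was probably stored in some other form, and no font choice in the PDF library will make glyphs appear.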
This is how to get started with iText:
https://developers.itextpdf.com/content/itext-7-jump-start-tutorial/installing-itext-7
And this is a small code-snippet that adds text to a document with different fonts:
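PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
// The standard Type 1 fonts used here contain no emoji glyphs; for emoji text,
// load a font file that does contain them.
// FontConstants is the iText 7.0 name; iText 7.1 and later use StandardFonts.
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.TIMES_BOLD);
Text title =
    new Text("The Strange Case of Dr. Jekyll and Mr. Hyde").setFont(bold);
Text author = new Text("Robert Louis Stevenson").setFont(font);
// Combine differently styled Text objects in a single paragraph
Paragraph p = new Paragraph().add(title).add(" by ").add(author);
document.add(p);
document.close();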
For more information check out the tutorials.
https://developers.itextpdf.com/content/itext-7-building-blocks/chapter-1

Related

PDFBox returns missing descendant font dictionary

When extracting the first page of a PDF, I get java.io.IOException: Missing descendant font dictionary.
The extraction code is the following:
PDDocument pdDocument = PDDocument.load(file);
PageExtractor pageExtractor = new PageExtractor(pdDocument, 1, 1);
PDDocument singlePageDocument = pageExtractor.extract();
It only happens with a few PDFs, and the error points to the font definitions, but I am unclear on how fonts in a PDF are processed by Apache PDFBox (version 2.0.18).
Any tips?
Thanks

PDF Box flatten PDF causes weird spacing

I'm having an issue with PDFBox flattening a PDF generated by Adobe Acrobat DC.
The text field I created in Adobe Acrobat is a default text field with no settings changed.
In my example below, I have a PatientName field with the text value "Douglas McDouggelman".
When I flatten the PDF, here's what it looks like:
Anyone know what's up with this bizarre spacing?
It appears that the space + next character are combined. This is what it looks like when you try to select that character.
Code:
try (PDDocument document = PDDocument.load(pdfFormInputStream)) {
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDAcroForm acroForm = catalog.getAcroForm();
    acroForm.getField("PatientName").setValue("Douglas McDouggelman");
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    if (flattenPdfs) {
        acroForm.flatten();
    }
    document.save(byteArrayOutputStream);
}
I realized this PDF came from another group, and I had no way of knowing how they had built it. So I found the source Word document, recreated the form in Adobe Acrobat DC, and added the fields back to the document. After that it was totally fine.
PDFBox was not the problem. The cause was some unknown incorrect step taken by the person who originally prepared the PDF.

How to embed a standard font into a generated PDF with PDFBox

I need to add some text to PDF/A files using the Apache PDFBox library for Java. The problem is that, because it needs to be a valid PDF/A file, all the used fonts must be embedded in it. I know that I can embed a TTF font using PDFBox, but I'd like to avoid having to provide a font file with the application, so I was wondering if there's a way to embed one of the standard fonts available in PDFBox as if it was external.
For example, when I write something using one of the standard fonts, the PDF validator complains about this:
I've used the following code to write the text:
PDFont standardFont = PDType1Font.HELVETICA_BOLD;
PDPage pag = new PDPage();
pag.setResources(new PDResources());
PDPageContentStream contentStream = new PDPageContentStream(pdfFile, pag);
//Begin the Content stream
contentStream.beginText();
//Setting the font to the Content stream
contentStream.setFont(standardFont, 12);
//Setting the position for the line
contentStream.newLineAtOffset(25, 500);
//Adding text in the form of string
contentStream.showText("JUST A SAMPLE STRING");
//Ending the content stream
contentStream.endText();
//Closing the content stream
contentStream.close();
pdfFile.addPage(pag);
pdfFile.save(file);
pdfFile.close();
Is there any option to force the font to be embedded when setting it?
Thanks in advance,
PDFBox bundles only one font file. You can load and embed it this way:
PDFont font = PDType0Font.load(doc, SomePdfboxClass.class.getResourceAsStream(
"/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

HTML to PDF using iText, formatting issues

I'm working with iText in Java to write PDF files. I'm trying to write a paragraph that begins with a heading, with the text starting right after the heading on the same line, like this:
Heading: this is a para now ...
The heading is bold and the paragraph text is normal, but I'm unable to do this using iText. I tried to use:
fonts[2] = new Font(Font.HELVETICA, 8, Font.BOLD);
Paragraph paranumber = new Paragraph(
fonts[2].getCalculatedLeading(1),
headingText.trim()
+ " ", fonts[0]);
Paragraph para = new Paragraph(
fonts[0].getCalculatedLeading(1), contentText.trim(), fonts[0]);
para.setAlignment(Element.ALIGN_JUSTIFIED);
para.setSpacingAfter(3f);
//Now add para to paranumber, which holds the heading, expecting the text
//to follow directly after the heading. This does not give the correct
//result; the formatting is wrong.
paranumber.add(para);
mct.addElement(paranumber);
I also tried creating a new paragraph and adding both paragraphs (the heading paragraph and the normal text paragraph) to it, but that does not give the proper result either. Please see the snippet below.
Paragraph newPara = new Paragraph();
newPara.add(paranumber);
newPara.add(para);
But this does not show proper formatting either.
If anyone can suggest another way to create a PDF from HTML, that would be good too, so that I can rewrite the module to create the required PDF. Please advise.
Paragraphs typically use concepts like indentation and increased leading to set them apart visually. They are block level elements, not inline.
It doesn't make sense to add a paragraph inside another paragraph. The added paragraph would typically start on a new line, essentially making it a separate paragraph anyway.
To get a paragraph with different fonts, like your example, you can use Chunks in iText. A Chunk is basically a piece of text with an associated font.
Font fontbold = new Font(BaseFont.createFont(BaseFont.HELVETICA_BOLD,
BaseFont.WINANSI, BaseFont.NOT_EMBEDDED), 12);
Font fontregular = new Font(BaseFont.createFont(BaseFont.HELVETICA,
BaseFont.WINANSI, BaseFont.NOT_EMBEDDED), 12);
Chunk header = new Chunk("Heading: ", fontbold);
Chunk content = new Chunk("this is a para now ...", fontregular);
Paragraph paragraph = new Paragraph();
paragraph.add(header);
paragraph.add(content);
document.add(paragraph);
The result looks like this:
It's not clear from your question and code sample how HTML is involved. I assume you are somehow parsing HTML input and converting the parsed content to PDF using iText Elements. This is a valid approach. Alternatively, you can look into iText XML Worker, which does XHTML (+CSS) to PDF conversion.

docx to PDF conversion with Korean font

I hope someone can help me. This is about docx to PDF conversion when the docx document contains Korean characters.
I'm able to convert a docx document to PDF with docx4j.
In the PDF document, I can see the result. But if my docx document contains Korean text, none of the Korean characters appear in the PDF; only the Latin digits do.
What do I have to do to get the Korean text from the docx document into my PDF?
Here is my code:
// Keep the path as a String; a File cannot be assigned from a string literal
String docXPath = "E:/contract2Files/test.docx";
File docXFile = new File(docXPath);
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docXFile);
String out = docXPath.replace("docx","pdf");
File pdfFile = new File(out);
OutputStream pdfFileOs = new java.io.FileOutputStream(pdfFile);
org.docx4j.convert.out.pdf.PdfConversion c = new JanoPdfConversion(wordprocessingMLPackage);
c.output(pdfFileOs);
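A side note on the snippet above: replace("docx","pdf") rewrites every occurrence of docx in the path, including directory names. Below is a small sketch of a variant that swaps only the trailing extension, using plain JDK string handling. The class and method names are invented for this illustration and are not part of docx4j:

```java
public class PdfPathSketch {

    // Swaps a trailing ".docx" (any letter case) for ".pdf"; a path without
    // that extension simply gets ".pdf" appended.
    static String toPdfPath(String docxPath) {
        String ext = ".docx";
        int start = docxPath.length() - ext.length();
        // regionMatches returns false for a negative start, so short paths are safe
        if (docxPath.regionMatches(true, start, ext, 0, ext.length())) {
            return docxPath.substring(0, start) + ".pdf";
        }
        return docxPath + ".pdf";
    }

    public static void main(String[] args) {
        // Hypothetical path with "docx" in a directory name as well
        System.out.println(toPdfPath("E:/docx-files/test.docx"));
    }
}
```

With this helper, only the file extension changes and the directory name in the sample path is left alone.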
Please try http://www.docx4java.org/docx4j/docx4j-3_0-beta2.zip (link updated 15 Nov)
You might need to configure your font mapper, though things work out of the box with the Identity mapper on my Windows box, since I have the relevant font installed.
If this doesn't help, please put a sample docx somewhere StackOverflow users can see it.
Thanks again. I tried the newest jar file with the code below and it worked! Now the Korean letters appear in the PDF.
ThemePart themePart =
wordprocessingMLPackage.getMainDocumentPart().getThemePart();
org.docx4j.dml.BaseStyles.FontScheme fontScheme = themePart.getFontScheme();
org.docx4j.dml.TextFont textFont = fontScheme.getMinorFont().getLatin();
textFont.setTypeface("Malgun Gothic");
