HTML to docx with docx4j: tables break across page breaks - java

I am using docx4j to generate a docx from an HTML string. It works well, but tables break when they run across a page break.
This is the code:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(wordMLPackage);
// file is an htmlStringFile
wordMLPackage.getMainDocumentPart().getContent().addAll(xhtmlImporter.convert(file, null));
File fileDos = new File(urlWord);
// write the generated package to the target file
wordMLPackage.save(fileDos);
I'm using docx4j 8.3.2 with Java 8.
Does anybody know how to change properties for the whole document?
For example: the spacing between paragraphs, or the spacing after a line.
Another problem: in some cases the table cells end up with different spacing between them, and I don't know why.

Related

PDFBox flattening a PDF causes weird spacing

I'm having an issue with PDFBox flattening a PDF generated by Adobe Acrobat DC.
The Adobe Acrobat text field I created is a completely default text field.
In my example below, I have a PatientName field with the text value "Douglas McDouggelman".
When I flatten the PDF, here's what it looks like:
Anyone know what's up with this bizarre spacing?
It appears that the space and the character after it are combined. This is what it looks like when you try to select that character.
Code:
try (PDDocument document = PDDocument.load(pdfFormInputStream)) {
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDAcroForm acroForm = catalog.getAcroForm();
    acroForm.getField("PatientName").setValue("Douglas McDouggelman");
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    if (flattenPdfs) {
        acroForm.flatten();
    }
    document.save(byteArrayOutputStream);
}
I realized this PDF came from another group, and I have no idea how they produced it. So I found the source Word document, recreated the form in Adobe Acrobat DC, added the fields back to the document, and then it was totally fine.
PDFBox was not the problem. It was some unknown incorrect step taken by the person who originally prepared the PDF.

Showing emoji in PDF or Excel

I have data containing emoji in a database. I want to display it in a generated document, such as a PDF or an Excel file.
I am using a Spring Boot application. Please suggest a Java library for generating either PDF or Excel that supports emoji.
iText supports this, assuming that:
- your emoji is a Unicode character
- you use a font that contains the correct glyph for this Unicode character
The best way to test this is to try it.
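For the first assumption, it can help to look at which code points the text from the database actually contains. Below is a minimal sketch using only the JDK; the class and method names are made up for illustration and are not part of iText. It only reports characters outside the Basic Multilingual Plane, so emoji that live inside the BMP (some older symbols do) would not show up here.

```java
import java.util.ArrayList;
import java.util.List;

class EmojiCodePoints {

    /** Lists the code points of the text that lie outside the Basic Multilingual Plane. */
    static List<String> supplementaryCodePoints(String text) {
        List<String> found = new ArrayList<>();
        // codePoints() walks whole code points, so a surrogate pair counts once
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> found.add(String.format("U+%X", cp)));
        return found;
    }

    public static void main(String[] args) {
        String sample = "Hello \uD83D\uDE00"; // "Hello " followed by U+1F600
        System.out.println(sample.length());                  // 8: the emoji takes two chars
        System.out.println(supplementaryCodePoints(sample));  // [U+1F600]
    }
}
```

Whether the PDF then shows the emoji still depends on the second assumption: the font you pass to iText has to contain a glyph for each reported code point.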
This is how to get started with iText:
https://developers.itextpdf.com/content/itext-7-jump-start-tutorial/installing-itext-7
And this is a small code-snippet that adds text to a document with different fonts:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.TIMES_BOLD);
Text title = new Text("The Strange Case of Dr. Jekyll and Mr. Hyde").setFont(bold);
Text author = new Text("Robert Louis Stevenson").setFont(font);
Paragraph p = new Paragraph().add(title).add(" by ").add(author);
document.add(p);
document.close();
For more information check out the tutorials.
https://developers.itextpdf.com/content/itext-7-building-blocks/chapter-1

HTML to Docx using Docx4j

I have been trying to convert HTML content to docx using docx4j. A docx file is created when I run my app, but its content is blank even though the HTML does have content. Please check the code below. I have included all the necessary libraries from the AndroidDocxtoHTML example on git.
Code:
// HTML Code
String html = "<html><head><title>Import me</title></head><body><p>Hello World!</p></body></html>";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/hw.html"));
afiPart.setBinaryData(html.getBytes());
afiPart.setContentType(new ContentType("text/html"));
Relationship altChunkRel = wordMLPackage.getMainDocumentPart().addTargetPart(afiPart);
// .. the bit in document body
CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
ac.setId(altChunkRel.getId() );
wordMLPackage.getMainDocumentPart().addObject(ac);
// .. content type
wordMLPackage.getContentTypeManager().addDefaultContentType("html", "text/html");
I do not understand what is missing in the code that makes the document come out blank.
I found this code for Java and modified it for Android.
Some people suggested using the nightly build jar for XHTML conversions. Do I need to use that?
AlternativeFormatInputPart doesn't actually convert your HTML to normal docx content.
It is up to the application displaying the docx to do that (and most can't).
Instead, consider using docx4j-ImportXHTML to do the conversion.
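Note that docx4j-ImportXHTML expects well-formed XHTML, so it may be worth checking the input string before handing it to the importer. A rough sketch using only the JDK's XML parser; XhtmlCheck and isWellFormed are hypothetical names, not part of docx4j:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class XhtmlCheck {

    /** Returns true if the string parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of letting the parser print them to stderr
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Import me</title></head>"
                + "<body><p>Hello World!</p></body></html>";
        System.out.println(isWellFormed(html));                 // true
        System.out.println(isWellFormed("<p>Hello<br></p>"));   // false: <br> is never closed
    }
}
```

The HTML string from the question passes this check. Typical browser-style HTML, with unclosed tags such as <br>, does not and would need to be cleaned up first.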

docx to PDF conversion with Korean font

I hope someone can help me. This is about docx to PDF conversion when the docx document contains Korean characters.
I'm able to convert a docx document to PDF with docx4j.
I can see the result in the PDF document. But if my docx document contains Korean text, none of it appears in the PDF; only the Latin numbers do.
What do I have to do to get the Korean text from the docx document into my PDF?
Here is my code:
// a String cannot be assigned to a File, so keep the path and the File separately
String docXPath = "E:/contract2Files/test.docx";
File docXFile = new File(docXPath);
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docXFile);
// swap only the extension, not every "docx" in the path
String out = docXPath.replace(".docx", ".pdf");
File pdfFile = new File(out);
OutputStream pdfFileOs = new java.io.FileOutputStream(pdfFile);
org.docx4j.convert.out.pdf.PdfConversion c = new JanoPdfConversion(wordprocessingMLPackage);
c.output(pdfFileOs);
Please try http://www.docx4java.org/docx4j/docx4j-3_0-beta2.zip (link updated 15 Nov)
You might need to configure your font mapper, though things work out of the box with the Identity mapper on my Windows box, since I have the relevant font installed.
If this doesn't help, please put a sample docx somewhere StackOverflow users can see it.
Thanks again. I tried the newest jar file with the code below and it worked! Now I get the Korean letters in the PDF.
ThemePart themePart = wordprocessingMLPackage.getMainDocumentPart().getThemePart();
org.docx4j.dml.BaseStyles.FontScheme fontScheme = themePart.getFontScheme();
org.docx4j.dml.TextFont textFont = fontScheme.getMinorFont().getLatin();
textFont.setTypeface("Malgun Gothic");

Convert entire JSP to PDF [duplicate]

I have a webpage with an option to export to PDF. I have to display the contents of the page in the PDF. Currently I use the iText PDF library to generate PDFs. The problem is that creating a PDF with iText is quite a challenge. Moreover, we get frequent layout/UI changes for the webpage, so we have to make the same changes to the PDF.
Is there any way I can convert my JSP output to PDF? For example, if we set the content type to contentType="application/vnd.ms-excel", a JSP table can be rendered as an Excel document.
Have you checked JasperReports? It has the concept of XML templates. The same template can also be used to generate Word / XLS / PDF / CSV / XML output.
You don't need to change the iText code generation if you use it in combination with Flying Saucer (a.k.a. XhtmlRenderer). It's then basically as simple as:
String inputPath = new File("/file.xhtml").toURI().toURL().toString();
OutputStream outputStream = new FileOutputStream("/file.pdf");
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(inputPath);
renderer.layout();
renderer.createPDF(outputStream);
outputStream.close();
You can find a blog with more code samples here.
You should check out wkhtmltopdf.
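wkhtmltopdf is a command-line tool, so from Java it would typically be launched as an external process. A minimal sketch, assuming the wkhtmltopdf binary is installed and on the PATH. The class name, the URL and the output file name are placeholders for illustration only:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class WkhtmltopdfRunner {

    /** Builds the basic command line: wkhtmltopdf <input URL or file> <output PDF>. */
    static List<String> buildCommand(String input, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", input, outputPdf);
    }

    /** Runs wkhtmltopdf and returns its exit code; throws IOException if the binary is not found. */
    static int convert(String input, String outputPdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputPdf))
                .inheritIO() // show the tool's own progress and error output
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: point the input at the URL that serves the rendered JSP
        String input = args.length > 0 ? args[0] : "http://localhost:8080/report.jsp";
        String output = args.length > 1 ? args[1] : "report.pdf";
        System.out.println("Running: " + String.join(" ", buildCommand(input, output)));
        System.out.println("Exit code: " + convert(input, output));
    }
}
```

Since the tool renders the page the way a browser would, layout changes in the JSP should carry over to the PDF without touching the Java code.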
