Convert html to pdf in landscape mode using iText - java

I'm trying to convert HTML to PDF using iText.
Here is the simple code, which works fine:
ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfStream);
Now I want the PDF in landscape orientation, so I tried:
ConverterProperties converterProperties = new ConverterProperties();
MediaDeviceDescription mediaDeviceDescription = new MediaDeviceDescription(MediaType.SCREEN);
mediaDeviceDescription.setOrientation(LANDSCAPE);
converterProperties.setMediaDeviceDescription(mediaDeviceDescription);
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfStream, converterProperties);
and also :
PdfDocument pdfDoc = new PdfDocument(writer);
pdfDoc.setDefaultPageSize(PageSize.A4.rotate());
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfDoc, new ConverterProperties());
I've also combined both approaches, but the result is the same: the final PDF still has the default orientation.

The best way to achieve landscape page size when converting HTML to PDF is to provide the corresponding CSS instruction for the page to become landscape.
This is done with the following CSS:
@page {
    size: landscape;
}
Now, if you have your input HTML document in the htmlAsStringToConvert variable, you can process it as HTML with the Jsoup library, which iText embeds. We simply add the necessary CSS rule to the <head>:
Document htmlDoc = Jsoup.parse(htmlAsStringToConvert);
htmlDoc.head().append("<style>" +
        "@page { size: landscape; } "
        + "</style>");
HtmlConverter.convertToPdf(htmlDoc.outerHtml(), new FileOutputStream(outPdf));
Beware that if your HTML already contains @page declarations, the one you append might conflict with them. In that case, make sure your declaration is inserted last (the code above does this as long as all of your declarations are in the <head> element).
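The "insert it last" point can also be illustrated without Jsoup. The following is only a minimal sketch using plain string handling, under the assumption that the HTML contains a literal closing </head> tag; the names LandscapeCss and appendLandscapeRule are made up for this example and are not part of iText or Jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LandscapeCss {
    // Hypothetical helper, not an iText API: the rule that should come last in <head>.
    static final String RULE = "<style>@page { size: landscape; }</style>";

    // Assumption: the closing tag is written as </head>, in any letter case.
    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    /**
     * Inserts the landscape rule immediately before the first closing head tag,
     * so it follows any @page rules already declared inside the head element.
     */
    static String appendLandscapeRule(String html) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            // This sketch does not try to repair documents without a head element.
            throw new IllegalArgumentException("No </head> tag found");
        }
        int idx = m.start();
        return html.substring(0, idx) + RULE + html.substring(idx);
    }

    public static void main(String[] args) {
        String html = "<html><head><style>@page { size: A4; }</style></head>"
                + "<body>Hi</body></html>";
        System.out.println(appendLandscapeRule(html));
    }
}
```

The resulting string could then be passed to HtmlConverter.convertToPdf in place of htmlDoc.outerHtml(). Jsoup remains the more robust choice, because it also copes with malformed or missing <head> elements.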

Related

itext7 - How to copy page as form XObject while keeping hidden OCGs hidden

I am using PdfFormXObject pageCopy = sourcePage.CopyAsFormXObject(pdf); to then insert pageCopy into a new PDF page using pdfCanvas.AddXObjectFittedIntoRectangle. The copied page is visible in the new PDF as expected, but it now has its 'hidden' OCGs visible.
The reason I am doing this is to be able to take a PDF page, scale and crop it and add it to a new PDF where it may be collated with other contents.
Is there a way to remove OCG content from the PDF before creating the XObject, or is there a different way of achieving my goal, without going the XObject route, that lets me keep the 'off' status of hidden OCGs?
OCG removal functionality is not yet available in iText 7.
There is, however, a workaround that you can try to apply: we can copy all the information about OCGs from your source document to the target document which should create the same OCGs in the target document and preserve default on/off states.
To copy the OCGs, you can copy a page from one document to another one (which is going to copy all the OCGs) and then remove that page.
When OCG removal becomes available in iText, the approach will become cleaner, but for now you can use code similar to the following:
PdfDocument sourceDocument = new PdfDocument(new PdfReader(sourcePdfPath));
PdfDocument targetDocument = new PdfDocument(new PdfWriter(targetPdfPath));
PdfFormXObject pageCopy = sourceDocument.getFirstPage().copyAsFormXObject(targetDocument);
PdfPage page = targetDocument.addNewPage();
PdfCanvas canvas = new PdfCanvas(page);
canvas.addXObject(pageCopy);
// Workaround: copying the page from source document to destination document also copies OCGs
sourceDocument.copyPagesTo(1, 1, targetDocument);
// Workaround: remove the page that we only copied to make sure OCGs are copied
targetDocument.removePage(targetDocument.getNumberOfPages());
sourceDocument.close();
targetDocument.close();

Html and css with images overlaps in iText7 html to pdf conversion in java

I am adding images to my HTML file and trying to convert it to PDF. Although the HTML file shows the images and CSS styles properly, the content overlaps in the PDF and the complete HTML file is not shown.
File dest = new File("\\some path\\ gen.pdf");
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdfDocument = new PdfDocument(writer);
//pdfDocument.setDefaultPageSize(PageSize.B0);
// I have tried PageSize and addPage but that didn't work maybe because I have added writer in PdfDocument and then added pages. I didn't find any other way of adding it (this also didn't work).
ConverterProperties converterProperties = new ConverterProperties();
converterProperties.setBaseUri(absolutePath);
FileInputStream fis = new FileInputStream(newHtmlFile); // passing here html file
HtmlConverter.convertToPdf(fis, pdfDocument, converterProperties);
HTML: in the styles I have set @page { size: 11in 14in; }, and the elements are positioned with position:absolute;
After converting to pdf I get the error:
c.i.layout.renderer.AbstractRenderer : Occupied area has not been initialized. Absolute positioning might be applied incorrectly.
This is screenshot of html:
pdf created using itext7:
Thanks in advance. I have tried everything, but nothing worked for me. I have also tried Flying Saucer, but the images do not load there either. Any help will be appreciated.

PDFBox flatten PDF causes weird spacing

I'm having an issue with PDFBox flattening a PDF generated by Adobe Acrobat DC.
The text field I created in Adobe Acrobat is a plain default text field.
In my example below, I have a PatientName field with the text value "Douglas McDouggelman".
When I flatten the PDF, here's what it looks like:
Anyone know what's up with this bizarre spacing?
It appears that the space + next character are combined. This is what it looks like when you try to select that character.
Code:
try (PDDocument document = PDDocument.load(pdfFormInputStream)) {
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDAcroForm acroForm = catalog.getAcroForm();
    acroForm.getField("PatientName").setValue("Douglas McDouggelman");
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    if (flattenPdfs) {
        acroForm.flatten();
    }
    document.save(byteArrayOutputStream);
}
I realized this PDF came from another group, and who knows what they did to it. So I found the source Word document, recreated the form in Adobe Acrobat DC, added the fields back to the document, and then it was totally fine.
PDFBox was not the problem; it was some unknown incorrect step taken by whoever originally prepared the PDF.

extract thumbnail from 3d pdf using itextpdf

When I view a 3D pdf (aka PDF/E) with Adobe Acrobat Reader, it shows a thumbnail on the left side:
Is it possible to extract this thumbnail from the pdf using itext or is it generated on the fly by the viewer?
This is possible, though from what I am seeing I doubt your PDF has a specific thumbnail image; the viewer most likely just renders the page as the thumbnail.
First, let's create a PDF that has a thumbnail according to the PDF specification, since I couldn't find one. Section 12.3.4 of ISO 32000-2 (the PDF specification) states the following:
The thumbnail image for a page shall be an image XObject specified by the Thumb entry in the page object...
This can be easily created using iText like so:
PdfWriter writer = new PdfWriter(OUTPUT_FILE);
PdfDocument pdfDocument = new PdfDocument(writer);
Document document = new Document(pdfDocument);
document.add(new Paragraph("Hello world"));
PdfImageXObject thumbnail = new PdfImageXObject(ImageDataFactory.create(getInput("itext.png")));
pdfDocument.getFirstPage().getPdfObject().put(PdfName.Thumb, thumbnail.getPdfObject());
document.close();
where getInput("itext.png") resolves to the full path of our image.
This gives us output.pdf.
You'll note that neither Acrobat nor Reader displays the thumbnail image; they simply render the page. Other readers do use our new thumbnail:
Since you are using Reader, I would think the thumbnail you see is simply the rendered page, since embedded thumbnails appear to be ignored.
To answer your question, getting the thumbnail is simply the reverse of the operation above: we get the page's dictionary and look for a /Thumb entry:
PdfReader reader = new PdfReader(OUTPUT_FILE);
PdfDocument pdfDocument = new PdfDocument(reader);
PdfStream thumbnailStream = pdfDocument.getFirstPage().getPdfObject().getAsStream(PdfName.Thumb);
if (thumbnailStream != null) {
    PdfImageXObject thumbnail = new PdfImageXObject(thumbnailStream);
    BufferedImage image = thumbnail.getBufferedImage();
    // Output to file, memory, etc.
}
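For the "output to file, memory, etc." step, one possible approach is the standard javax.imageio API. The sketch below is an assumption about how you might want to store the result, not part of iText: the class ThumbnailOutput and the method toPng are hypothetical names, and the blank image in main merely stands in for the BufferedImage obtained from the thumbnail XObject.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ThumbnailOutput {
    /**
     * Encodes the image as PNG and returns the bytes (hypothetical helper).
     * I/O failures are rethrown unchecked so callers need not declare them.
     */
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false if no suitable writer is found.
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("No PNG writer for this image type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the image returned by thumbnail.getBufferedImage().
        BufferedImage image = new BufferedImage(106, 150, BufferedImage.TYPE_INT_RGB);
        byte[] png = toPng(image);
        System.out.println("Encoded " + png.length + " bytes");
    }
}
```

To write straight to disk instead, ImageIO.write also accepts a java.io.File as its third argument.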

Changing page zoom of an existing pdf with PDFBox

I have a pdf which I'm iterating through using PDFBox as below:
PDDocument doc = PDDocument.load(new ByteArrayInputStream(bytearray));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
for(PDPage page : catalog.getPages()){
...
}
I want to set the default magnification for the pages so that when the document is opened in a PDF reader, it opens at 75% zoom by default. Is this possible? I've seen a few posts where the zoom is set using PDPageXYZDestination, but I'm not sure whether that is applicable in my case.
Thanks.
Do this; it applies only to the first page shown, i.e. when the document is opened:
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDPage page = doc.getPage(0); // zero-based; you can also put another number to jump to a specific existing page
PDPageXYZDestination dest = new PDPageXYZDestination();
dest.setPage(page);
dest.setZoom(0.75f);
dest.setLeft((int) page.getCropBox().getLowerLeftX());
dest.setTop((int) page.getCropBox().getUpperRightY());
PDActionGoTo action = new PDActionGoTo();
action.setDestination(dest);
catalog.setActions(null);
catalog.setOpenAction(action);
doc.save(...);
