Adding hyperlink to inner PDF files - java

I need to create a PDF file with iText in Java that contains two other PDF files, arranged as a tree structure.
I need to create bookmarks named after the PDF files, each linking to its PDF. When a bookmark is clicked, the respective PDF should open within the generated PDF itself, not as a separate PDF.
PDFTREE
  pdf1
  pdf2

In the PDF specification (PDF 32000-1:2008, p.367), such bookmarks are called outline items:
The outline consists of a tree-structured hierarchy of outline items (sometimes called bookmarks), which serve as a visual table of contents to display the document’s structure to the user.
If you merge the documents with PdfMerger, the outlines are copied to the resulting PDF by default. However, you want one main node per document, not a flat list of bookmarks. Since cloning and copying outlines is no trivial task, it is best to let iText handle it. Unfortunately, we have little direct control over how outlines are merged.
We can build a SpecialMerger as a wrapper around PdfMerger to extract the cloned outlines (first step) and get them into a hierarchical structure afterwards (second step). The outline of each merged PDF is temporarily stored in the outlineList together with the desired name of the main node and its reference (page number in the merged PDF). After all the PDFs are merged, we can attach the temporarily stored outlines back to the root-node.
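The page bookkeeping behind `nextPageNr` is plain arithmetic: each merged document starts one page after the last page already in the output. A library-free sketch of just that step, assuming the page counts of the input files are known up front (the class and method names are hypothetical, not part of iText):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OutlinePageOffsets {
    // For each input file (in merge order), returns the page number in the merged
    // PDF where that file starts, mirroring how SpecialMerger updates nextPageNr.
    static Map<String, Integer> startPages(Map<String, Integer> pageCounts) {
        Map<String, Integer> starts = new LinkedHashMap<>();
        int nextPageNr = 1; // the output PDF is empty before the first merge
        for (Map.Entry<String, Integer> e : pageCounts.entrySet()) {
            starts.put(e.getKey(), nextPageNr);
            nextPageNr += e.getValue(); // same as outputPdf.getNumberOfPages() + 1
        }
        return starts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("pdf1.pdf", 3);
        counts.put("pdf2.pdf", 5);
        System.out.println(startPages(counts)); // {pdf1.pdf=1, pdf2.pdf=4}
    }
}
```

These start pages are what each main bookmark's `createFit` destination points to.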
// Nested helper class (place it inside your main class). iText 7 imports it needs:
// com.itextpdf.kernel.pdf.PdfDocument, com.itextpdf.kernel.pdf.PdfOutline,
// com.itextpdf.kernel.pdf.navigation.PdfExplicitDestination,
// com.itextpdf.kernel.utils.PdfMerger, java.util.ArrayList, java.util.List
public static class SpecialMerger {
    private final PdfDocument outputPdf;
    private final PdfMerger merger;
    private final PdfOutline rootOutline;
    private final List<DocumentOutline> outlineList = new ArrayList<>();
    private int nextPageNr = 1;
    public SpecialMerger(final PdfDocument outputPdf) {
        if (outputPdf.getNumberOfPages() != 0) {
            throw new IllegalArgumentException("PDF must be empty");
        }
        this.outputPdf = outputPdf;
        // mergeTags = true, mergeOutlines = true
        this.merger = new PdfMerger(outputPdf, true, true);
        this.rootOutline = outputPdf.getOutlines(false);
    }
    public void merge(PdfDocument from, int fromPage, int toPage, String filename) {
        merger.merge(from, fromPage, toPage); // merge with the normal PdfMerger
        // take the outlines copied by this merge and detach them from the root
        final List<PdfOutline> children = new ArrayList<>(rootOutline.getAllChildren());
        rootOutline.getAllChildren().clear(); // clear root outline
        outlineList.add(new DocumentOutline(filename, nextPageNr, children));
        nextPageNr = outputPdf.getNumberOfPages() + 1; // first page of the next merged PDF
    }
    public void writeOutline() {
        outlineList.forEach(o -> {
            final PdfOutline outline = rootOutline.addOutline(o.getName()); // bookmark with PDF name
            outline.addDestination(PdfExplicitDestination.createFit(outputPdf.getPage(o.getPageNr())));
            outline.setStyle(PdfOutline.FLAG_BOLD);
            o.getChildren().forEach(outline::addOutline); // re-attach the extracted child bookmarks
        });
    }
    // Holds one merged PDF's outline until writeOutline() re-attaches it.
    private static class DocumentOutline {
        private final String name;
        private final int pageNr;
        private final List<PdfOutline> children;
        public DocumentOutline(final String pdfName, final int pageNr, final List<PdfOutline> children) {
            this.name = pdfName;
            this.pageNr = pageNr;
            this.children = children;
        }
        public String getName() {
            return name;
        }
        public int getPageNr() {
            return pageNr;
        }
        public List<PdfOutline> getChildren() {
            return children;
        }
    }
}
Now, we can use this custom merger to merge the PDFs and then add the outline with writeOutline:
public static void main(String[] args) throws IOException {
    String filename1 = "pdf1.pdf";
    String filename2 = "pdf2.pdf";
    try (
        PdfDocument generatedPdf = new PdfDocument(new PdfWriter("output.pdf"));
        PdfDocument pdfDocument1 = new PdfDocument(new PdfReader(filename1));
        PdfDocument pdfDocument2 = new PdfDocument(new PdfReader(filename2))
    ) {
        final SpecialMerger merger = new SpecialMerger(generatedPdf);
        merger.merge(pdfDocument1, 1, pdfDocument1.getNumberOfPages(), filename1);
        merger.merge(pdfDocument2, 1, pdfDocument2.getNumberOfPages(), filename2);
        merger.writeOutline();
    }
}
The result looks like this (Preview and Adobe Acrobat Reader on macOS):
Another option is to make a portfolio by embedding the PDFs. However, this is not supported by all PDF viewers and most users are not accustomed to these portfolios.
public static void main(String[] args) throws IOException {
    String filename1 = "pdf1.pdf";
    String filename2 = "pdf2.pdf";
    try (PdfDocument generatedPdf = new PdfDocument(new PdfWriter("portfolio.pdf"))) {
        Document doc = new Document(generatedPdf);
        doc.add(new Paragraph("This PDF contains embedded documents."));
        doc.add(new Paragraph("Use a compatible PDF viewer if you cannot see them."));
        PdfCollection collection = new PdfCollection();
        collection.setView(PdfCollection.TILE);
        generatedPdf.getCatalog().setCollection(collection);
        addAttachment(generatedPdf, filename1, filename1);
        addAttachment(generatedPdf, filename2, filename2);
    }
}
private static void addAttachment(PdfDocument doc, String attachmentPath, String name) throws IOException {
    PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(doc, attachmentPath, name, name, null, null);
    doc.addFileAttachment(name, fileSpec);
}
The result in Adobe Acrobat Reader on macOS:


Apache POI - Java - Get Section Name - PowerPoint

Microsoft PowerPoint can group slides into sections (a logical grouping).
What's the best way to extract the section name?
Tech Stack -
Apache POI - v5.2.2
Java
I've achieved this in VBA:
sectionName = ActivePresentation.SectionProperties.Name(currentSlide.sectionIndex)
Apache POI is based on Office Open XML as defined in 2006 and first used in Office 2007. That version of OOXML knows nothing about sections in presentations; sections were introduced later (2010).
Even ECMA-376 5th edition does not define sections in presentations, and Microsoft has not publicly published XSDs for this extension yet. So XmlBeans cannot have generated classes for it.
So to use this feature, one must manipulate the XML directly.
How do you find out which XML needs to be manipulated?
All Office Open XML files, including PowerPoint *.pptx files, are ZIP archives containing XML files and other files in a special directory structure. One can simply unzip a *.pptx file and look inside.
Look at /ppt/presentation.xml to see the XML. The sections are stored in its extension list as p14:section elements, each listing the ids of its slides in p14:sldId elements.
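To inspect that structure without Apache POI, the archive can also be read with the JDK alone. The sketch below is only an illustration, not part of the POI approach: it opens `ppt/presentation.xml` with `java.util.zip.ZipFile`, parses it with a namespace-aware DOM parser, and maps every `p14:sldId/@id` to the `name` of its enclosing `p14:section`. These ids are the same numeric slide ids (`p:sldId/@id`) that `getSlideId` in the POI example looks up. The p14 namespace URI is the one used in that example; the class and method names are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PresentationSections {
    // Namespace of the PowerPoint 2010 extension elements (p14:section, p14:sldId).
    static final String P14 = "http://schemas.microsoft.com/office/powerpoint/2010/main";

    // Parses presentation.xml and maps each slide id listed in a section to the section name.
    static Map<Long, String> parseSections(InputStream presentationXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder().parse(presentationXml);
            Map<Long, String> sectionBySlideId = new LinkedHashMap<>();
            NodeList sections = doc.getElementsByTagNameNS(P14, "section");
            for (int i = 0; i < sections.getLength(); i++) {
                Element section = (Element) sections.item(i);
                String name = section.getAttribute("name");
                NodeList sldIds = section.getElementsByTagNameNS(P14, "sldId");
                for (int j = 0; j < sldIds.getLength(); j++) {
                    Element sldId = (Element) sldIds.item(j);
                    sectionBySlideId.put(Long.valueOf(sldId.getAttribute("id")), name);
                }
            }
            return sectionBySlideId;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("cannot parse presentation.xml", e);
        }
    }

    // Opens the .pptx as a ZIP archive and parses its ppt/presentation.xml entry.
    static Map<Long, String> readSections(String pptxPath) throws IOException {
        try (ZipFile zip = new ZipFile(pptxPath)) {
            ZipEntry entry = zip.getEntry("ppt/presentation.xml");
            if (entry == null) {
                throw new IOException("ppt/presentation.xml not found in " + pptxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return parseSections(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java PresentationSections ./PPTXUsingSections.pptx
        System.out.println(readSections(args[0]));
    }
}
```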
What can be used to manipulate the XML?
One can use the org.openxmlformats.schemas.presentationml.x2006.main.* classes contained in poi-ooxml-full-5.*.jar as far as possible, and otherwise org.apache.xmlbeans.XmlObject and/or org.apache.xmlbeans.XmlCursor contained in xmlbeans-5.*.jar. But using XmlObject directly can be very laborious.
A complete example showing how to get the sections and the section names:
import java.io.FileInputStream;
import org.apache.poi.xslf.usermodel.*;
import org.apache.xmlbeans.XmlObject;
import javax.xml.namespace.QName;
public class PowerPointGetSectionProperties {
    // Finds the numeric slide id (p:sldId/@id) of a slide via its relationship id.
    static Long getSlideId(XSLFSlide slide) {
        if (slide == null) return null;
        Long slideId = null;
        XMLSlideShow presentation = slide.getSlideShow();
        String slideRId = presentation.getRelationId(slide);
        org.openxmlformats.schemas.presentationml.x2006.main.CTPresentation ctPresentation = presentation.getCTPresentation();
        org.openxmlformats.schemas.presentationml.x2006.main.CTSlideIdList sldIdLst = ctPresentation.getSldIdLst();
        for (org.openxmlformats.schemas.presentationml.x2006.main.CTSlideIdListEntry sldId : sldIdLst.getSldIdList()) {
            if (sldId.getId2().equals(slideRId)) {
                slideId = sldId.getId();
                break;
            }
        }
        return slideId;
    }
    // Selects all p14:section elements from the presentation's extension list.
    static XmlObject[] getSections(org.openxmlformats.schemas.presentationml.x2006.main.CTExtensionList extList) {
        if (extList == null) return new XmlObject[0];
        XmlObject[] sections = extList.selectPath(
            "declare namespace p14='http://schemas.microsoft.com/office/powerpoint/2010/main' "
            + ".//p14:section");
        return sections;
    }
    // Selects the p14:sldId elements (slide references) of one section.
    static XmlObject[] getSectionSldIds(XmlObject section) {
        if (section == null) return new XmlObject[0];
        XmlObject[] sldIds = section.selectPath(
            "declare namespace p14='http://schemas.microsoft.com/office/powerpoint/2010/main' "
            + ".//p14:sldId");
        return sldIds;
    }
    // Reads the numeric id attribute of a p14:sldId element.
    static Long getSectionSldId(XmlObject sectionSldId) {
        if (sectionSldId == null) return null;
        Long sldIdL = null;
        XmlObject sldIdO = sectionSldId.selectAttribute(new QName("id"));
        if (sldIdO instanceof org.apache.xmlbeans.impl.values.XmlObjectBase) {
            String sldIdS = ((org.apache.xmlbeans.impl.values.XmlObjectBase) sldIdO).getStringValue();
            try {
                sldIdL = Long.valueOf(sldIdS);
            } catch (NumberFormatException ex) {
                // not a numeric id: treat as missing
            }
        }
        return sldIdL;
    }
    // Returns the p14:section element that lists the given slide, or null.
    static XmlObject getSection(XSLFSlide slide) {
        Long slideId = getSlideId(slide);
        if (slideId != null) {
            XMLSlideShow presentation = slide.getSlideShow();
            org.openxmlformats.schemas.presentationml.x2006.main.CTPresentation ctPresentation = presentation.getCTPresentation();
            org.openxmlformats.schemas.presentationml.x2006.main.CTExtensionList extList = ctPresentation.getExtLst();
            XmlObject[] sections = getSections(extList);
            for (XmlObject section : sections) {
                XmlObject[] sectionSldIds = getSectionSldIds(section);
                for (XmlObject sectionSldId : sectionSldIds) {
                    Long sldIdL = getSectionSldId(sectionSldId);
                    if (slideId.equals(sldIdL)) {
                        return section;
                    }
                }
            }
        }
        return null;
    }
    // Reads the name attribute of a p14:section element.
    static String getSectionName(XmlObject section) {
        if (section == null) return null;
        String sectionName = null;
        XmlObject name = section.selectAttribute(new QName("name"));
        if (name instanceof org.apache.xmlbeans.impl.values.XmlObjectBase) {
            sectionName = ((org.apache.xmlbeans.impl.values.XmlObjectBase) name).getStringValue();
        }
        return sectionName;
    }
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("./PPTXUsingSections.pptx");
             XMLSlideShow slideShow = new XMLSlideShow(in)) {
            for (XSLFSlide slide : slideShow.getSlides()) {
                System.out.println(slide.getSlideName());
                XmlObject section = getSection(slide);
                String sectionName = getSectionName(section);
                System.out.println(sectionName);
            }
        }
    }
}

encoding issue after pdfbox

I want to extract text from a PDF in Java, so I use the PDFBox library. The PDF file seems to have been written in HWP (a Korean word processor) before it was converted to PDF.
This is my simple API.
@RestController
@RequiredArgsConstructor
public class QuestionController {
    private final QuestionParseService questionParseService;
    @GetMapping("/")
    public ResponseEntity<?> parsePDF() throws IOException {
        return ResponseEntity.ok(questionParseService.parsePDF());
    }
}
@Service
public class QuestionParseService {
    public String parsePDF() throws IOException {
        File file = new File("filePath");
        // PDDocument.load is the PDFBox 2.x API; try-with-resources closes the document
        try (PDDocument document = PDDocument.load(file)) {
            PDFTextStripper s = new PDFTextStripper();
            return s.getText(document);
        }
    }
}
This is my PDF file.
But the API result for question 1 was:


×
 

의 값은? [2점]
①  ②  ③  ④  ⑤ 
How can I get correctly encoded text?
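One possible diagnostic, offered only as a sketch: if the symbols in the output above are Private Use Area (PUA) code points (an assumption, since they cannot be identified from the text here), that would suggest the PDF's fonts do not map their glyphs to real Unicode characters, so the problem lies in the file rather than in the Java code. The helper below (a hypothetical name, JDK only) counts such code points in the string returned by `PDFTextStripper.getText`:

```java
public class PrivateUseCheck {
    // Counts code points in the Unicode Private Use Areas (U+E000..U+F8FF and the
    // supplementary PUA planes), which Character.getType reports as PRIVATE_USE.
    static long countPrivateUse(String text) {
        return text.codePoints()
                .filter(cp -> Character.getType(cp) == Character.PRIVATE_USE)
                .count();
    }

    public static void main(String[] args) {
        String extracted = "\uF06C\u00D7 \uF0B7"; // hypothetical sample, not the real output
        System.out.println(countPrivateUse(extracted)); // 2
        System.out.println(countPrivateUse("question 1 [2]")); // 0
    }
}
```

A non-zero count on the real output would point toward the font mapping; a zero count would rule that explanation out.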

Creating a new header in docx4j

I have a Maven project with docx4j. I have successfully converted an HTML file to DOCX. Now I'd like to insert a header into the DOCX file.
The docx4j GitHub repository has a sample (link), which I used, and it worked as expected:
Relationship relationship = createHeaderPart(wordMLPackage);
public static Relationship createHeaderPart(
        WordprocessingMLPackage wordprocessingMLPackage)
        throws Exception {
    HeaderPart headerPart = new HeaderPart();
    Relationship rel = wordprocessingMLPackage.getMainDocumentPart()
            .addTargetPart(headerPart);
    // After addTargetPart, so image can be added properly
    headerPart.setJaxbElement(getHdr(wordprocessingMLPackage, headerPart));
    return rel;
}
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
        Part sourcePart) throws Exception {
    // objectFactory: assumed to be a static org.docx4j.wml.ObjectFactory field, as in the sample
    Hdr hdr = objectFactory.createHdr();
    // I modified it for simplicity
    P headerParagraph = wordprocessingMLPackage.getMainDocumentPart().createParagraphOfText("hi there");
    hdr.getContent().add(headerParagraph);
    return hdr;
}
This works as expected.
However, I'd like to use dynamic content from HTML, so I tried:
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
        Part sourcePart) throws Exception {
    Hdr hdr = objectFactory.createHdr();
    String html = "<html><body><p>hi there</p></body></html>";
    XHTMLImporter XHTMLImporter = new XHTMLImporterImpl(wordprocessingMLPackage);
    hdr.getContent().add(XHTMLImporter.convert(html, null));
    return hdr;
}
This doesn't work at all. Any ideas?
I just noticed that XHTMLImporter.convert returns a list of objects, so I now add its first element:
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
        Part sourcePart) throws Exception {
    Hdr hdr = objectFactory.createHdr();
    String html = "<html><body><p>hi there</p></body></html>";
    XHTMLImporter XHTMLImporter = new XHTMLImporterImpl(wordprocessingMLPackage);
    List<Object> list = XHTMLImporter.convert(html, null);
    hdr.getContent().add(list.get(0));
    return hdr;
}
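The difference between the two HTML-based `getHdr` versions comes down to `List.add` versus `List.addAll`: `hdr.getContent()` is a `List<Object>`, so passing the whole result of `convert` to `add` compiles but inserts the list itself as one element, while `list.get(0)` keeps only the first converted element. A JDK-only sketch, with strings standing in for the converted docx4j objects (helper names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class AddVersusAddAll {
    // Mimics hdr.getContent().add(convertedList): the whole list becomes ONE element.
    static List<Object> appendWithAdd(List<Object> content, List<Object> converted) {
        content.add(converted);
        return content;
    }

    // Mimics hdr.getContent().addAll(convertedList): each element is added separately.
    static List<Object> appendWithAddAll(List<Object> content, List<Object> converted) {
        content.addAll(converted);
        return content;
    }

    public static void main(String[] args) {
        // Stand-in for what XHTMLImporter.convert(...) returns.
        List<Object> converted = List.of("paragraph 1", "paragraph 2");
        System.out.println(appendWithAdd(new ArrayList<>(), converted).size());    // 1
        System.out.println(appendWithAddAll(new ArrayList<>(), converted).size()); // 2
    }
}
```

Applied to the header code, this would presumably become `hdr.getContent().addAll(XHTMLImporter.convert(html, null));`, which keeps every converted paragraph; that line is an untested assumption.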

How to change direction of Hebrew letters?

I'm using iText 7.1.8 and I need to write Hebrew text to my document. I found a solution here, but it doesn't work for me.
My code looks like the following:
public class RunItextApp {
    public static void main(String[] args) throws Exception {
        final String filename = "simple.pdf";
        final String hebrew = "שדג";
        final String text = "\u05E9\u05D3\u05D2";
        createSimplePdf(filename, hebrew);
    }
    private static void createSimplePdf(String filename, String text) throws Exception {
        final String path = RunItextApp.class.getResource("/Arial.ttf").getPath();
        final PdfFont font = PdfFontFactory.createFont(path, PdfEncodings.IDENTITY_H, true);
        Style hebrewStyle = new Style()
                .setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
                .setTextAlignment(TextAlignment.RIGHT)
                .setFontSize(14)
                .setFont(font);
        final PdfWriter pdfWriter = new PdfWriter(filename);
        final PdfDocument pdfDocument = new PdfDocument(pdfWriter);
        final Document pdf = new Document(pdfDocument);
        pdf.setBaseDirection(BaseDirection.RIGHT_TO_LEFT);
        pdf.add(
            new Paragraph(text)
                .setFontScript(Character.UnicodeScript.HEBREW)
                .addStyle(hebrewStyle)
        );
        pdf.close();
    }
}
Why doesn't this code work?
How can I set text direction?
Please take a look at this page: https://kb.itextpdf.com/home/it5kb/faq/how-to-set-rtl-direction-for-hebrew-when-converting-html-to-pdf. It describes the same problem you have. I think the main trick is the font.
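To make "text direction" concrete: Hebrew is stored in logical order (first letter first) and must be displayed right to left, so a renderer has to reorder it. The JDK's `java.text.Bidi` class (not an iText API) can report the direction and compute that visual order. This sketch only illustrates the reordering; it does not mirror characters such as brackets or apply any shaping, so it is not a substitute for bidi support in the PDF library. The class and method names are hypothetical.

```java
import java.text.Bidi;

public class BidiVisualOrder {
    // True if the whole string is right-to-left (e.g. pure Hebrew).
    static boolean isRightToLeft(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isRightToLeft();
    }

    // Returns a single-line string in visual (left-to-right display) order.
    // Limitation: no mirroring of characters such as '(' and ')'.
    static String toVisualOrder(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
        int n = logical.length();
        byte[] levels = new byte[n];
        Character[] chars = new Character[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getLevelAt(i); // odd level = right-to-left run
            chars[i] = logical.charAt(i);
        }
        Bidi.reorderVisually(levels, 0, chars, 0, n); // reverses the RTL runs
        StringBuilder sb = new StringBuilder(n);
        for (Character c : chars) {
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05D3\u05D2"; // the same string as in the question
        System.out.println(isRightToLeft(hebrew)); // true
        System.out.println(toVisualOrder(hebrew).equals("\u05D2\u05D3\u05E9")); // true
    }
}
```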

How do you create a PDF document using iText that has pages with differing page sizes? [duplicate]

I'm using iTextSharp to generate a large document. In this document I want some specific pages in landscape. All the rest is portrait. Does anyone know how I can do this?
Starting a new document is not an option.
Thanks!
You can set the page size on the document; it affects the pages that follow. Some snippets:
Set up your document somewhere (you know that already):
var document = new Document();
PdfWriter pdfWriter = PdfWriter.GetInstance(
    document, new FileStream(destinationFile, FileMode.Create)
);
pdfWriter.SetFullCompression();
pdfWriter.StrictImageSequence = true;
pdfWriter.SetLinearPageMode();
Now loop over your pages (you probably do that as well already) and decide what page size you want per page:
for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++) {
    // Define the page size here, _before_ you start the page.
    // You can easily switch from landscape to portrait to whatever
    document.SetPageSize(new Rectangle(600, 800));
    if (document.IsOpen()) {
        document.NewPage();
    } else {
        document.Open();
    }
}
Alternatively, try this iText 7 sample, which rotates individual pages with a page-start event handler:
using System;
using System.IO;
using iText.Kernel.Events;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
namespace iText.Samples.Sandbox.Events
{
    public class PageOrientations
    {
        public static readonly String DEST = "results/sandbox/events/page_orientations.pdf";
        public static readonly PdfNumber PORTRAIT = new PdfNumber(0);
        public static readonly PdfNumber LANDSCAPE = new PdfNumber(90);
        public static readonly PdfNumber INVERTEDPORTRAIT = new PdfNumber(180);
        public static readonly PdfNumber SEASCAPE = new PdfNumber(270);
        public static void Main(String[] args)
        {
            FileInfo file = new FileInfo(DEST);
            file.Directory.Create();
            new PageOrientations().ManipulatePdf(DEST);
        }
        protected void ManipulatePdf(String dest)
        {
            PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
            // The default page orientation is set to portrait in the custom event handler.
            PageOrientationsEventHandler eventHandler = new PageOrientationsEventHandler();
            pdfDoc.AddEventHandler(PdfDocumentEvent.START_PAGE, eventHandler);
            Document doc = new Document(pdfDoc);
            doc.Add(new Paragraph("A simple page in portrait orientation"));
            eventHandler.SetOrientation(LANDSCAPE);
            doc.Add(new AreaBreak());
            doc.Add(new Paragraph("A simple page in landscape orientation"));
            eventHandler.SetOrientation(INVERTEDPORTRAIT);
            doc.Add(new AreaBreak());
            doc.Add(new Paragraph("A simple page in inverted portrait orientation"));
            eventHandler.SetOrientation(SEASCAPE);
            doc.Add(new AreaBreak());
            doc.Add(new Paragraph("A simple page in seascape orientation"));
            doc.Close();
        }
        private class PageOrientationsEventHandler : IEventHandler
        {
            private PdfNumber orientation = PORTRAIT;
            public void SetOrientation(PdfNumber orientation)
            {
                this.orientation = orientation;
            }
            public void HandleEvent(Event currentEvent)
            {
                PdfDocumentEvent docEvent = (PdfDocumentEvent) currentEvent;
                docEvent.GetPage().Put(PdfName.Rotate, orientation);
            }
        }
    }
}
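Note that the two answers do different things: the first sets a new page size (for example `new Rectangle(600, 800)`) before each page, while the second keeps the page size and writes a `/Rotate` entry, which per the PDF specification must be a multiple of 90 and turns the page clockwise when it is displayed. A JDK-only sketch (names hypothetical) of how `/Rotate` changes the size a viewer shows, using A4 portrait (roughly 595 × 842 points) as the example:

```java
import java.util.Arrays;

public class RotatedPageSize {
    // Returns {displayedWidth, displayedHeight} for a page of width x height points
    // whose /Rotate entry is `rotate` degrees (must be a multiple of 90).
    static float[] displayedSize(float width, float height, int rotate) {
        if (rotate % 90 != 0) {
            throw new IllegalArgumentException("Rotate must be a multiple of 90: " + rotate);
        }
        int normalized = ((rotate % 360) + 360) % 360; // e.g. -90 -> 270
        boolean swap = normalized == 90 || normalized == 270;
        return swap ? new float[] {height, width} : new float[] {width, height};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(displayedSize(595f, 842f, 0)));   // [595.0, 842.0]
        System.out.println(Arrays.toString(displayedSize(595f, 842f, 90)));  // [842.0, 595.0]
        System.out.println(Arrays.toString(displayedSize(595f, 842f, 180))); // [595.0, 842.0]
    }
}
```

With `/Rotate` the content drawn on the page turns with it, whereas a new page size gives a genuinely wider page to lay out content on.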
