Apache POI - Error while merging pptx - java

I have a scenario where I need to copy a few slides from a pptx (source.pptx) into a separate pptx file (output.pptx) and offer that file for download. Which slides are copied depends on the presentation notes in the slides.
I am using Apache POI to achieve this. This is my code:
String filename = filepath + "\\source.pptx";
try {
    XMLSlideShow ppt = new XMLSlideShow(new FileInputStream(filename));
    XMLSlideShow outputppt = new XMLSlideShow();
    XSLFSlide[] slides = ppt.getSlides();
    for (int i = 0; i < slides.length; i++) {
        try {
            XSLFNotes mynotes = slides[i].getNotes();
            for (XSLFShape shape : mynotes) {
                if (shape instanceof XSLFTextShape) {
                    XSLFTextShape txShape = (XSLFTextShape) shape;
                    for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
                        if (xslfParagraph.getText().equals("NOTES1") || xslfParagraph.getText().equals("NOTES2")) {
                            outputppt.createSlide().importContent(slides[i]);
                        }
                    }
                }
            }
        } catch (Exception e) {
        }
    }
    FileOutputStream out = new FileOutputStream("output.pptx");
    outputppt.write(out);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}
When I open the generated output.pptx, I get the following error:
"PowerPoint found a problem with the content in output.pptx. PowerPoint can attempt to repair the presentation. If you trust the source of this presentation, click Repair."
After clicking Repair: "PowerPoint removed unreadable content in merged.pptx [Repaired]. You should review this presentation to determine whether any content was unexpectedly changed or removed."
I then see blank slides with "Click to add Title" and "Click to add Subtitle".
Any suggestions to solve this issue?

This code works for me to copy slide content, layout and notes.
Modify it to your needs if you want to stick to your original question. I assume you simply have to:
- not import the slide content from its source slide
- copy the notes content to the slide instead
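// srcSlide is the slide to copy; ppt is the target XMLSlideShow and
// defaultMaster one of its slide masters (both assumed to exist already)
// get the layout from the source slide
XSLFSlideLayout layout = srcSlide.getSlideLayout();
XSLFSlide newslide = ppt
        .createSlide(defaultMaster.getLayout(layout.getType()))
        .importContent(srcSlide);
// copy the notes as well; getNotes() returns null if the slide has none
XSLFNotes srcNotes = srcSlide.getNotes();
if (srcNotes != null) {
    XSLFNotes newNotes = ppt.getNotesSlide(newslide);
    newNotes.importContent(srcNotes);
}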

I had the same error in a case where some text boxes were empty. I solved it by always setting an empty text in all placeholders when creating slides.
XSLFSlide slide = presentation.createSlide(slideMaster.getLayout(layout));
// remove any placeholder texts
for (XSLFTextShape ph : slide.getPlaceholders()) {
    ph.clearText();
    ph.setText("");
}
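One more detail about the loop in the question: createSlide() is called once per matching notes paragraph, so a slide whose notes contain both "NOTES1" and "NOTES2" as separate paragraphs would be imported twice. A minimal, JDK-only sketch of the selection step that picks each slide at most once follows. The class and method names are made up for illustration, the notes are assumed to have been read into plain strings beforehand, and trimming the paragraph text is an assumption that goes beyond the exact equals() in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class NotesSelectionSketch {

    /**
     * Returns the indexes of the slides whose notes contain at least one marker.
     * notesPerSlide.get(i) holds the notes paragraphs of slide i
     * (an empty list if the slide has no notes).
     */
    static List<Integer> selectSlides(List<List<String>> notesPerSlide, Set<String> markers) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < notesPerSlide.size(); i++) {
            for (String paragraph : notesPerSlide.get(i)) {
                if (markers.contains(paragraph.trim())) {
                    selected.add(i);
                    break; // one hit is enough, so a slide is never selected twice
                }
            }
        }
        return selected;
    }
}
```

The returned indexes can then drive the actual copy, one createSlide(...).importContent(...) call per selected slide.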

Related

Java Aspose Slides find and replace text cannot keep text style

I'm working with the Aspose.Slides library to read PPT and PPTX files.
When I replace text with other text, the font size breaks.
Original:
After replacing the text:
public void asposeTranslate(String fileName) throws IOException {
    Locale.setDefault(new Locale("en-us"));
    // Load presentation
    Presentation pres = new Presentation(URL + "/" + fileName);
    // Loop through each slide
    for (ISlide slide : pres.getSlides()) {
        // Get all text frames in the slide
        ITextFrame[] tf = SlideUtil.getAllTextBoxes(slide);
        for (int i = 0; i < tf.length; i++) {
            for (IParagraph para : tf[i].getParagraphs()) {
                for (IPortion port : para.getPortions()) {
                    String originText = port.getText();
                    String newText = translateText(originText); // method that creates the new text
                    port.setText(newText); // replace with new text
                }
            }
        }
    }
    pres.save(URL + "/new_" + fileName, SaveFormat.Pptx);
}
I followed this blog post: https://blog.aspose.com/slides/find-and-replace-text-in-powerpoint-using-java/#API-to-Find-and-Replace-Text-in-PowerPoint
After replacing the text, how can I keep all the styles of the original text?
I am using aspose-slides-21.7.
Thanks,
You can post the issue on the Aspose.Slides forum, provide a sample presentation, and get help there. I work as a Support Developer at Aspose.

How to read text in XSLFGraphicFrame with Apache POI for PowerPoint

I'm writing a Java program to find occurrences of a particular keyword in documents. I want to read many file formats, including all Microsoft Office documents.
I already have this working for all of them except PowerPoint. I'm using Apache POI code snippets found on Stack Overflow and elsewhere.
I discovered that slides are made of shapes (XSLFTextShape), but many of them are objects of class XSLFGraphicFrame or XSLFTable, for which I can't simply use toString(). How can I extract all of the text contained in them using Java?
This is the code/pseudocode:
File f = new File("C:\\Users\\Windows\\Desktop\\Modulo 9.pptx");
PrintStream out = System.out;
FileInputStream is = new FileInputStream(f);
XMLSlideShow ppt = new XMLSlideShow(is);
for (XSLFSlide slide : ppt.getSlides()) {
    for (XSLFShape shape : slide) {
        if (shape instanceof XSLFTextShape) {
            XSLFTextShape txShape = (XSLFTextShape) shape;
            out.println(txShape.getText());
        } else if (shape instanceof XSLFPictureShape) {
            // do nothing
        } else if (shape instanceof XSLFGraphicFrame) {
            // also matches XSLFTable, which extends XSLFGraphicFrame
            // TODO: print all text in it or in its children
        }
    }
}
If your requirement "to find occurrences of a particular keyword in documents" only needs a search through all text content of the slide show, then SlideShowExtractor could be an approach. It can also act as the entry point to a POITextExtractor for getting the textual content of the document metadata/properties, such as author and title.
Example:
import java.io.FileInputStream;
import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.SlideShow;
import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.extractor.POITextExtractor;

public class SlideShowExtractorExample {
    public static void main(String[] args) throws Exception {
        SlideShow<XSLFShape,XSLFTextParagraph> slideshow
            = new XMLSlideShow(new FileInputStream("Performance_Out.pptx"));
        SlideShowExtractor<XSLFShape,XSLFTextParagraph> slideShowExtractor
            = new SlideShowExtractor<XSLFShape,XSLFTextParagraph>(slideshow);
        slideShowExtractor.setCommentsByDefault(true);
        slideShowExtractor.setMasterByDefault(true);
        slideShowExtractor.setNotesByDefault(true);
        String allTextContentInSlideShow = slideShowExtractor.getText();
        System.out.println(allTextContentInSlideShow);
        System.out.println("===========================================================================");
        POITextExtractor textExtractor = slideShowExtractor.getMetadataTextExtractor();
        String metaData = textExtractor.getText();
        System.out.println(metaData);
    }
}
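Once the whole text is available as one string, as with getText() above, the keyword search itself needs nothing beyond the JDK. The following is only a sketch: the class and method names are made up for illustration, and matching case-insensitively and counting non-overlapping hits are assumptions about what "occurrences" should mean here.

```java
import java.util.Locale;

final class KeywordCountSketch {

    /** Counts non-overlapping, case-insensitive occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        // indexOf returns -1 once there are no further hits
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length(); // continue after the current hit
        }
        return count;
    }
}
```

A call such as countOccurrences(allTextContentInSlideShow, "budget") would then give the number of hits for one presentation.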
Of course, there are kinds of XSLFGraphicFrame that SlideShowExtractor does not read because Apache POI does not support them so far, for example all kinds of SmartArt graphics. Their text content is stored in /ppt/diagrams/data*.xml document parts, which are referenced from the slides. Since Apache POI does not support this so far, it can only be read using low-level methods.
For example, to additionally get all text out of all /ppt/diagrams/data parts, which hold the texts of SmartArt graphics, we could do:
...
System.out.println("===========================================================================");
//additionally get all text out of all /ppt/diagrams/data which are texts in SmartArt graphics:
StringBuilder sb = new StringBuilder();
for (XSLFSlide slide : ((XMLSlideShow)slideshow).getSlides()) {
    for (org.apache.poi.ooxml.POIXMLDocumentPart part : slide.getRelations()) {
        if (part.getPackagePart().getPartName().getName().startsWith("/ppt/diagrams/data")) {
            org.apache.xmlbeans.XmlObject xmlObject = org.apache.xmlbeans.XmlObject.Factory.parse(part.getPackagePart().getInputStream());
            org.apache.xmlbeans.XmlCursor cursor = xmlObject.newCursor();
            while (cursor.hasNextToken()) {
                if (cursor.isText()) {
                    sb.append(cursor.getTextValue() + "\r\n");
                }
                cursor.toNextToken();
            }
            sb.append(slide.getSlideNumber() + "\r\n\r\n");
        }
    }
}
String allTextContentInDiagrams = sb.toString();
System.out.println(allTextContentInDiagrams);
...
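Because a .pptx file is a ZIP archive, the same diagram parts can also be read without POI. The sketch below is a JDK-only alternative to the snippet above, not part of POI: it scans the archive for entries named ppt/diagrams/data*.xml and collects the content of the DrawingML text elements (a:t). The class and method names are hypothetical, Java 9 or later is assumed (for readAllBytes), and unlike the POI version it does not know which slide references which diagram.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class SmartArtTextSketch {

    // Namespace of DrawingML; its <a:t> elements carry the visible text
    private static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns the text runs found in all ppt/diagrams/data*.xml entries. */
    static List<String> extractTexts(InputStream pptx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid XML external entity problems
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> texts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("ppt/diagrams/data") || !name.endsWith(".xml")) {
                    continue;
                }
                // Copy the entry first so the parser cannot close the ZIP stream
                byte[] xml = zip.readAllBytes();
                Document doc = builder.parse(new ByteArrayInputStream(xml));
                NodeList runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                for (int i = 0; i < runs.getLength(); i++) {
                    texts.add(runs.item(i).getTextContent());
                }
            }
        }
        return texts;
    }
}
```

Restricting the result to a:t elements skips attribute-like text that a plain token walk would also pick up; whether that is wanted depends on the search.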

Why are Form Fields set with PDFBox(2.0.11) not being displayed?

I'm using PDFBox 2.0.11 to open a PDF form and pull out the values. This works as expected. When I try to set a value, it appears to work, but when I open the PDF the value is not displayed. If I click in the field, the value is displayed as set, but it disappears again when I click out of the field.
This seems to be a common issue, but none of the fixes I've seen seem to work.
if (file.exists())
{
    PDDocument doc = PDDocument.load(file);
    doc.setAllSecurityToBeRemoved(true);
    PDDocumentCatalog catalog = doc.getDocumentCatalog();
    PDAcroForm form = catalog.getAcroForm();
    // Add Font
    PDResources resources = new PDResources();
    resources.put(COSName.getPDFName("Helv"), PDType1Font.HELVETICA);
    form.setDefaultResources(resources);
    // End Add Font
    form.setNeedAppearances(false);
    List<PDField> fields = form.getFields();
    for (Object field : fields)
    {
        if (field instanceof PDTextField) {
            PDTextField pdTextbox = (PDTextField) field;
            System.out.println("PDTextBox " + pdTextbox.getFullyQualifiedName() + " " + pdTextbox.getValue());
            if (pdTextbox.getFullyQualifiedName().equalsIgnoreCase("a3_5"))
            {
                try {
                    pdTextbox.getWidgets().get(0).setHidden(false);
                    pdTextbox.setValue("5500");
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        else
        {
            System.out.print(field);
            System.out.print(" = ");
            System.out.print(field.getClass());
            System.out.println();
        }
    }
    doc.save("..._MINE_UPDATE.pdf");
    doc.close();
}
Stack Trace
java.io.IOException: Could not find font: /Helvetica
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:179)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:132)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:108)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:86)
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:100)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:262)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at com.controller.TestPDFBox.loadData(TestPDFBox.java:87)
This for loop is skipped:
for (COSName fontResourceName : widgetResources.getFontNames())
{
    try
    {
        if (acroFormResources.getFont(fontResourceName) == null)
        {
            LOG.debug("Adding font resource " + fontResourceName + " from widget to AcroForm");
            acroFormResources.put(fontResourceName, widgetResources.getFont(fontResourceName));
        }
    }
    catch (IOException e)
    {
        LOG.warn("Unable to match field level font with AcroForm font");
    }
}
Thanks to Tilman Hausherr for helping me get to an answer, which ultimately came down to a quirk of macOS's Preview application.
For some reason Preview strips out functionality from the PDF, and as a result values can no longer be set correctly.
The code above works correctly, although I did change the Add Font section to the following:
// Add Font
PDResources resources = form.getDefaultResources();
if (resources == null)
{
    resources = new PDResources();
}
resources.put(COSName.getPDFName("Helvetica"), PDType1Font.HELVETICA);
if (form.getDefaultResources() == null)
{
    form.setDefaultResources(resources);
}
// End Add Font
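The stack trace ("Could not find font: /Helvetica") suggests that the field's default appearance string refers to a font resource named Helvetica, whereas the question's code registered the font under the name Helv. The sketch below is a JDK-only way to check which resource name a default appearance string expects. The class and method names are made up for illustration, and the string itself would have to be read from the field first (in PDFBox 2.x presumably via getDefaultAppearance() on the text field).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefaultAppearanceFontSketch {

    // Matches the "set font" operator of a default appearance string: /<name> <size> Tf
    private static final Pattern SET_FONT =
            Pattern.compile("/(\\S+)\\s+[-+]?\\d*\\.?\\d+\\s+Tf");

    /** Returns the font resource name used in a string such as "/Helv 0 Tf 0 g". */
    static Optional<String> fontResourceName(String defaultAppearance) {
        if (defaultAppearance == null) {
            return Optional.empty();
        }
        Matcher m = SET_FONT.matcher(defaultAppearance);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

If the name returned here has no entry in the form's default resources, registering the font under exactly that name, as the Add Font section above does for Helvetica, is the matching fix.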
In case it is not obvious: don't create, edit or save your template PDF with Mac's Preview if you want to use it with PDFBox.
I ran into the same issue and had to re-create the PDF in Acrobat Pro. With this PDF the above code worked perfectly fine.

Add bookmarks in PDF files using PDFRenderer API

I am using PDF Renderer for reading and updating PDFs.
I want to add a bookmark to a PDF and update it using the same API.
Is it possible to do so with PDF Renderer?
Here is a code snippet for updating bookmarks in a PDF, which is not working:
File file = new File("C:\\test.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
OutlineNode rootNode = new OutlineNode("New Bookmark");
PDFPage page = pdffile.getPage(0);
OutlineNode node = pdffile.getOutline();
OutlineNode node2 = (OutlineNode)node.getNextNode();
node2.add(rootNode);
I am using the PDFRenderer-0.9.0.jar library for the above example.
If anyone has worked with PDF Renderer, please advise.
You can add a bookmark using the code below. Note that it uses the PDFTron PDFNet SDK (see the imports), not PDF Renderer.
For the full code and implementation, follow the link.
You can also follow the link if you want to access the bookmark later.
import pdftron.Common.PDFNetException;
import pdftron.PDF.*;
import pdftron.SDF.Obj;
import pdftron.SDF.SDFDoc;
public class BookmarkTest {
public static void main(String[] args)
{
PDFNet.initialize();
// Relative path to the folder containing test files.
String input_path = "../../TestFiles/";
String output_path = "../../TestFiles/Output/";
// The following example illustrates how to create and edit the outline tree
// using high-level Bookmark methods.
try
{
PDFDoc doc=new PDFDoc((input_path + "numbered.pdf"));
doc.initSecurityHandler();
// Lets first create the root bookmark items.
Bookmark red = Bookmark.create(doc, "Red");
Bookmark green = Bookmark.create(doc, "Green");
Bookmark blue = Bookmark.create(doc, "Blue");
doc.addRootBookmark(red);
doc.addRootBookmark(green);
doc.addRootBookmark(blue);
// You can also add new root bookmarks using Bookmark.AddNext("...")
blue.addNext("foo");
blue.addNext("bar");
// We can now associate new bookmarks with page destinations:
// The following example creates an 'explicit' destination (see
// section '8.2.1 Destinations' in PDF Reference for more details)
Destination red_dest = Destination.createFit((Page)(doc.getPageIterator().next()));
red.setAction(Action.createGoto(red_dest));
// Create an explicit destination to the first green page in the document
green.setAction(Action.createGoto(
Destination.createFit((Page)(doc.getPage(10))) ));
// The following example creates a 'named' destination (see
// section '8.2.1 Destinations' in PDF Reference for more details)
// Named destinations have certain advantages over explicit destinations.
byte[] key={'b','l','u','e','1'};
Action blue_action = Action.createGoto(key,
Destination.createFit((Page)(doc.getPage(19))) );
blue.setAction(blue_action);
// We can now add children Bookmarks
Bookmark sub_red1 = red.addChild("Red - Page 1");
sub_red1.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(1)))));
Bookmark sub_red2 = red.addChild("Red - Page 2");
sub_red2.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(2)))));
Bookmark sub_red3 = red.addChild("Red - Page 3");
sub_red3.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(3)))));
Bookmark sub_red4 = sub_red3.addChild("Red - Page 4");
sub_red4.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(4)))));
Bookmark sub_red5 = sub_red3.addChild("Red - Page 5");
sub_red5.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(5)))));
Bookmark sub_red6 = sub_red3.addChild("Red - Page 6");
sub_red6.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(6)))));
// Example of how to find and delete a bookmark by title text.
Bookmark foo = doc.getFirstBookmark().find("foo");
if (foo.isValid())
{
foo.delete();
}
else
{
throw new Exception("Foo is not Valid");
}
Bookmark bar = doc.getFirstBookmark().find("bar");
if (bar.isValid())
{
bar.delete();
}
else
{
throw new Exception("Bar is not Valid");
}
// Adding color to Bookmarks. Color and other formatting can help readers
// get around more easily in large PDF documents.
red.setColor(1, 0, 0);
green.setColor(0, 1, 0);
green.setFlags(2); // set bold font
blue.setColor(0, 0, 1);
blue.setFlags(3); // set bold and italic
doc.save((output_path + "bookmark.pdf"), 0, null);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
// The following example illustrates how to traverse the outline tree using
// Bookmark navigation methods: Bookmark.GetNext(), Bookmark.GetPrev(),
// Bookmark.GetFirstChild () and Bookmark.GetLastChild ().
try
{
// Open the document that was saved in the previous code sample
PDFDoc doc=new PDFDoc((output_path + "bookmark.pdf"));
doc.initSecurityHandler();
Bookmark root = doc.getFirstBookmark();
PrintOutlineTree(root);
doc.close();
System.out.println("Done.");
}
catch(Exception e)
{
System.out.println(e);
}
// The following example illustrates how to create a Bookmark to a page
// in a remote document. A remote go-to action is similar to an ordinary
// go-to action, but jumps to a destination in another PDF file instead
// of the current file. See Section 8.5.3 'Remote Go-To Actions' in PDF
// Reference Manual for details.
try
{
// Open the document that was saved in the previous code sample
PDFDoc doc=new PDFDoc((output_path + "bookmark.pdf"));
doc.initSecurityHandler();
// Create file specification (the file referred to by the remote bookmark)
Obj file_spec = doc.createIndirectDict();
file_spec.putName("Type", "Filespec");
file_spec.putString("F", "bookmark.pdf");
FileSpec spec=new FileSpec(file_spec);
Action goto_remote = Action.createGotoRemote(spec, 5, true);
Bookmark remoteBookmark1 = Bookmark.create(doc, "REMOTE BOOKMARK 1");
remoteBookmark1.setAction(goto_remote);
doc.addRootBookmark(remoteBookmark1);
// Create another remote bookmark, but this time using the low-level SDF/Cos API.
// Create a remote action
Bookmark remoteBookmark2 = Bookmark.create(doc, "REMOTE BOOKMARK 2");
doc.addRootBookmark(remoteBookmark2);
Obj gotoR = remoteBookmark2.getSDFObj().putDict("A");
{
gotoR.putName("S","GoToR"); // Set action type
gotoR.putBool("NewWindow", true);
// Set the file specification
gotoR.put("F", file_spec);
// jump to the tenth page (index 9). Note that pages are indexed from 0.
Obj dest = gotoR.putArray("D"); // Set the destination
dest.pushBackNumber(9);
dest.pushBackName("Fit");
}
doc.save((output_path + "bookmark_remote.pdf"), SDFDoc.e_linearized, null);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
PDFNet.terminate();
}
static void PrintIndent(Bookmark item) throws PDFNetException
{
int ident = item.getIndent() - 1;
for (int i=0; i<ident; ++i) System.out.print( " ");
}
// Prints out the outline tree to the standard output
static void PrintOutlineTree(Bookmark item) throws PDFNetException
{
for (; item.isValid(); item=item.getNext())
{
PrintIndent(item);
System.out.print((item.isOpen() ? "- " : "+ ") + item.getTitle() + " ACTION -> ");
// Print Action
Action action = item.getAction();
if (action.isValid())
{
if (action.getType() == Action.e_GoTo)
{
Destination dest = action.getDest();
if (dest.isValid())
{
Page page = dest.getPage();
System.out.println("GoTo Page #" + page.getIndex());
}
}
else
{
System.out.println("Not a 'GoTo' action");
}
}
else
{
System.out.println("NULL");
}
if (item.hasChildren()) // Recursively print children sub-trees
{
PrintOutlineTree(item.getFirstChild());
}
}
}
}

Read .doc file content and write into pdf file in java

I'm writing Java code that uses Apache POI to read an MS Office .doc file and the iText API to create and write a PDF file. Reading the text and the tables in the .doc file already works. Now I'm looking for a way to read the images in the document. I wrote the following code to read them. Why is it not working?
public static void main(String[] args) {
    POIFSFileSystem fs = null;
    Document document = new Document();
    WordExtractor extractor = null;
    try {
        fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\tableandImage.doc"));
        HWPFDocument hdocument = new HWPFDocument(fs);
        extractor = new WordExtractor(hdocument);
        OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/tableandImage.pdf"));
        PdfWriter.getInstance(document, fileOutput);
        document.open();
        Range range = hdocument.getRange();
        String readText = null;
        PdfPTable createTable;
        CharacterRun run;
        PicturesTable picture;
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph par = range.getParagraph(i);
            readText = par.text();
            if (!par.isInTable()) {
                if (readText.endsWith("\n")) {
                    readText = readText + "\n";
                    document.add(new com.itextpdf.text.Paragraph(readText));
                } if (readText.endsWith("\r")) {
                    readText += "\n";
                    document.add(new com.itextpdf.text.Paragraph(readText));
                }
                run = range.getCharacterRun(i);
                picture = hdocument.getPicturesTable();
                if (picture.hasPicture(run)) {
                    //if(run.isSpecialCharacter()) {
                    Picture pic = picture.extractPicture(run, true);
                    byte[] picturearray = pic.getContent();
                    com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(picturearray);
                    document.add(image);
                }
            } else if (par.isInTable()) {
                Table table = range.getTable(par);
                TableRow tRow1 = table.getRow(0);
                int numColumns = tRow1.numCells();
                createTable = new PdfPTable(numColumns);
                for (int rowId = 0; rowId < table.numRows(); rowId++) {
                    TableRow tRow = table.getRow(rowId);
                    for (int cellId = 0; cellId < tRow.numCells(); cellId++) {
                        TableCell tCell = tRow.getCell(cellId);
                        PdfPCell c1 = new PdfPCell(new Phrase(tCell.text()));
                        createTable.addCell(c1);
                    }
                }
                document.add(createTable);
            }
        }
    } catch (IOException e) {
        System.out.println("IO Exception");
        e.printStackTrace();
    }
    catch (Exception exep) {
        exep.printStackTrace();
    } finally {
        document.close();
    }
}
The problems are:
1. The condition if(picture.hasPicture(run)) is never satisfied, although the document contains a JPEG image.
2. I'm getting the following exception while reading a table:
java.lang.IllegalArgumentException: This paragraph is not the first one in the table
at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:876)
at pagecode.ReadDocxOrDocFile.main(ReadDocxOrDocFile.java:113)
Can anybody help me solve these problems?
Thank you.
Regarding your exception:
Your code iterates over all paragraphs and calls isInTable() for each one of them. Since tables are commonly composed of several such paragraphs, your call to getTable() also gets executed several times for a single table.
However, what your code should do instead is to find the first paragraph of a table, then process all paragraphs therein (via getRow(m).getCell(n)) and ultimately continue with the outer loop in the first paragraph after the table. Codewise this may look roughly like the following (assuming no merged cells, no nested tables and no other funny edge cases):
if (par.isInTable()) {
    Table table = range.getTable(par);
    for (int rn = 0; rn < table.numRows(); rn++) {
        TableRow row = table.getRow(rn);
        for (int cn = 0; cn < row.numCells(); cn++) {
            TableCell cell = row.getCell(cn);
            for (int pn = 0; pn < cell.numParagraphs(); pn++) {
                Paragraph cellParagraph = cell.getParagraph(pn);
                // your PDF conversion code goes here
            }
        }
    }
    i += table.numParagraphs() - 1; // skip the already processed (table-)paragraphs in the outer loop
}
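The index arithmetic in the last line is the essential part. The JDK-only sketch below shows the same skipping pattern without POI: a whole table is handled when its first paragraph is reached, and the outer loop index is then moved to the table's last paragraph. The Para class and the P:/T: output format are hypothetical stand-ins, used only to make the control flow runnable.

```java
import java.util.ArrayList;
import java.util.List;

final class TableSkipSketch {

    /** Hypothetical stand-in for a paragraph: its text and its table id (-1 = not in a table). */
    static final class Para {
        final String text;
        final int tableId;

        Para(String text, int tableId) {
            this.text = text;
            this.tableId = tableId;
        }
    }

    /** Emits one entry per plain paragraph and one entry per whole table. */
    static List<String> process(List<Para> paragraphs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            Para p = paragraphs.get(i);
            if (p.tableId < 0) {
                out.add("P:" + p.text);
                continue;
            }
            // i is the first paragraph of a table: consume the whole table here
            StringBuilder cells = new StringBuilder();
            int end = i;
            while (end < paragraphs.size() && paragraphs.get(end).tableId == p.tableId) {
                if (cells.length() > 0) {
                    cells.append('|');
                }
                cells.append(paragraphs.get(end).text);
                end++;
            }
            out.add("T:" + cells);
            // same effect as "i += table.numParagraphs() - 1" in the snippet above
            i = end - 1;
        }
        return out;
    }
}
```

Because the loop header still increments i, stopping at the table's last paragraph makes the next iteration start at the first paragraph after the table.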
Regarding the pictures issue:
Am I guessing right that you are trying to obtain the picture which is anchored within a given paragraph? Unfortunately, the predefined methods of POI only work if the picture is not embedded within a field (which is rather rare, actually). For field-based images (i.e. preview images of embedded OLEs) you should do something like the following (untested!):
PictureStore pictureStore = new PictureStore(hdocument);
// bla bla ...
for (int cr = 0; cr < par.numCharacterRuns(); cr++) {
    CharacterRun characterRun = par.getCharacterRun(cr);
    Field field = hdocument.getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
    if (field != null && field.getType() == 0x3A) { // 0x3A is type "EMBED"
        Picture pic = pictureStore.getPicture(field.secondSubrange(characterRun));
    }
}
For a list of possible values of Field.getType() see here.
