Aspose.Slides for Java: find and replace text does not keep the text style

I'm using the Aspose.Slides library to read PPT and PPTX files.
When I replace text with other text, the font size is broken.
Original:
After replacing the text:
public void asposeTranslate(String fileName) throws IOException {
    Locale.setDefault(new Locale("en-us"));
    // Load presentation
    Presentation pres = new Presentation(URL + "/" + fileName);
    // Loop through each slide
    for (ISlide slide : pres.getSlides()) {
        // Get all text frames in the slide
        ITextFrame[] tf = SlideUtil.getAllTextBoxes(slide);
        for (int i = 0; i < tf.length; i++) {
            for (IParagraph para : tf[i].getParagraphs()) {
                for (IPortion port : para.getPortions()) {
                    String originText = port.getText();
                    String newText = translateText(originText); // method that produces the new text
                    port.setText(newText); // replace with the new text
                }
            }
        }
    }
    pres.save(URL + "/new_" + fileName, SaveFormat.Pptx);
}
I followed this blog post: https://blog.aspose.com/slides/find-and-replace-text-in-powerpoint-using-java/#API-to-Find-and-Replace-Text-in-PowerPoint
After replacing the text, how can I keep all the styles of the original text?
I am using aspose-slides-21.7.
Thanks.

You can post the issue on the Aspose.Slides forum and provide a sample presentation to get help. I work as a Support Developer at Aspose.

Related

How to remove/replace specific text in a PDF if the text to replace is drawn using multiple consecutive instructions?

I've tried the following code. It works fine, but only in limited cases, such as when the text is added with a single instruction. How can I do this when the text is added by multiple instructions? Can anyone help me out with this?
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
        final StringBuilder recentChars = new StringBuilder();

        @Override
        protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
                throws IOException {
            // Collect the text drawn since the last instruction was written
            String string = font.toUnicode(code);
            if (string != null)
                recentChars.append(string);
            super.showGlyph(textRenderingMatrix, font, code, displacement);
        }

        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String recentText = recentChars.toString();
            recentChars.setLength(0);
            String operatorString = operator.getName();
            // Drop a text-showing instruction only if it drew exactly the target text
            if (TEXT_SHOWING_OPERATORS.contains(operatorString) && "Text which is to be replace".equals(recentText))
            {
                return;
            }
            super.write(contentStreamWriter, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    editor.processPage(page);
}
document.save("watermark-RemoveByText.pdf");
You can use pdfSweep (an iText 7 add-on), which removes or redacts information from a PDF document.
For more information see https://itextpdf.com/en/products/itext-7/pdf-redaction-pdfsweep
The Java code that removes the text "Text which is to be replace" from a PDF document using pdfSweep looks like this:
try (PdfDocument pdf = new PdfDocument(new PdfReader("Path to source file"), new PdfWriter("Path to out file"))) {
    ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy("Text which is to be replace")
            .setRedactionColor(ColorConstants.WHITE);
    PdfCleaner.autoSweepCleanUp(pdf, cleanupStrategy);
}
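For the multi-instruction case raised in the question, the matching step can be sketched independently of any PDF library. The idea is to keep the text drawn by each consecutive text-showing instruction (for example, what the question's recentChars holds at each write call) and to look for the target in the concatenation, then report which instructions overlap the match. The class and method names below are hypothetical and are not part of PDFBox or iText; this is only a sketch of the matching logic, not a complete editor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of PDFBox or iText. */
final class InstructionSpanMatcher {

    /**
     * Each chunk is the text drawn by one text-showing instruction, in order.
     * Returns the indices of the chunks that overlap the first occurrence of
     * target in the concatenated text, or an empty list if there is none.
     */
    static List<Integer> findCoveringInstructions(List<String> chunks, String target) {
        List<Integer> covering = new ArrayList<>();
        if (target.isEmpty()) {
            return covering;
        }
        StringBuilder joined = new StringBuilder();
        for (String chunk : chunks) {
            joined.append(chunk);
        }
        int start = joined.indexOf(target);
        if (start < 0) {
            return covering;
        }
        int end = start + target.length(); // exclusive
        int offset = 0;
        for (int i = 0; i < chunks.size(); i++) {
            int chunkEnd = offset + chunks.get(i).length();
            // A non-empty chunk [offset, chunkEnd) overlaps the match [start, end)
            if (chunkEnd > offset && offset < end && chunkEnd > start) {
                covering.add(i);
            }
            offset = chunkEnd;
        }
        return covering;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("Hello ", "Text which ", "is to be ", "replace", " bye");
        // prints [1, 2, 3]
        System.out.println(findCoveringInstructions(chunks, "Text which is to be replace"));
    }
}
```

Note that the first and last covering instruction may also draw text outside the match, so dropping them whole would remove more than the target; a real editor would have to rewrite those instructions rather than skip them.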

How to read text in XSLFGraphicFrame with Apache POI for PowerPoint

I'm writing a Java program to find occurrences of a particular keyword in documents. I want to read many file formats, including all Microsoft Office documents.
I have already handled all of them except PowerPoint, using Apache POI code snippets found on StackOverflow and in other sources.
I discovered that all slides are made of shapes (XSLFTextShape), but many of them are objects of class XSLFGraphicFrame or XSLFTable, for which I can't simply use the toString() method. How can I extract all of the text contained in them using Java?
This is the piece of code/pseudocode:
File f = new File("C:\\Users\\Windows\\Desktop\\Modulo 9.pptx");
PrintStream out = System.out;
FileInputStream is = new FileInputStream(f);
XMLSlideShow ppt = new XMLSlideShow(is);
for (XSLFSlide slide : ppt.getSlides()) {
    for (XSLFShape shape : slide) {
        if (shape instanceof XSLFTextShape) {
            XSLFTextShape txShape = (XSLFTextShape) shape;
            out.println(txShape.getText());
        } else if (shape instanceof XSLFPictureShape) {
            // do nothing
        } else if (shape instanceof XSLFGraphicFrame || shape instanceof XSLFTable) {
            // TODO: print all text in it or in its children
        }
    }
}
If your requirement "to find occurrences of a particular keyword in documents" only needs a search over all text content of the slide show, then simply using SlideShowExtractor could be an approach. It can also act as the entry point to a POITextExtractor for getting the textual content of the document metadata / properties, such as author and title.
Example:
import java.io.FileInputStream;
import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.SlideShow;
import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.extractor.POITextExtractor;

public class SlideShowExtractorExample {
    public static void main(String[] args) throws Exception {
        SlideShow<XSLFShape,XSLFTextParagraph> slideshow
            = new XMLSlideShow(new FileInputStream("Performance_Out.pptx"));
        SlideShowExtractor<XSLFShape,XSLFTextParagraph> slideShowExtractor
            = new SlideShowExtractor<XSLFShape,XSLFTextParagraph>(slideshow);
        slideShowExtractor.setCommentsByDefault(true);
        slideShowExtractor.setMasterByDefault(true);
        slideShowExtractor.setNotesByDefault(true);
        String allTextContentInSlideShow = slideShowExtractor.getText();
        System.out.println(allTextContentInSlideShow);
        System.out.println("===========================================================================");
        POITextExtractor textExtractor = slideShowExtractor.getMetadataTextExtractor();
        String metaData = textExtractor.getText();
        System.out.println(metaData);
    }
}
Of course, there are kinds of XSLFGraphicFrame that SlideShowExtractor does not read because apache poi does not support them so far, for example all kinds of SmartArt graphics. Their text content is stored in /ppt/diagrams/data*.xml document parts, which are referenced from the slides. Since apache poi does not support this so far, it can only be read using the low-level underlying methods.
For example, to additionally get all text out of all /ppt/diagrams/data parts, which hold the texts of SmartArt graphics, we could do:
...
System.out.println("===========================================================================");
//additionally get all text out of all /ppt/diagrams/data which are texts in SmartArt graphics:
StringBuilder sb = new StringBuilder();
for (XSLFSlide slide : ((XMLSlideShow)slideshow).getSlides()) {
    for (org.apache.poi.ooxml.POIXMLDocumentPart part : slide.getRelations()) {
        if (part.getPackagePart().getPartName().getName().startsWith("/ppt/diagrams/data")) {
            org.apache.xmlbeans.XmlObject xmlObject = org.apache.xmlbeans.XmlObject.Factory.parse(part.getPackagePart().getInputStream());
            org.apache.xmlbeans.XmlCursor cursor = xmlObject.newCursor();
            while(cursor.hasNextToken()) {
                if (cursor.isText()) {
                    sb.append(cursor.getTextValue() + "\r\n");
                }
                cursor.toNextToken();
            }
            sb.append(slide.getSlideNumber() + "\r\n\r\n");
        }
    }
}
String allTextContentInDiagrams = sb.toString();
System.out.println(allTextContentInDiagrams);
...
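Because a .pptx file is a ZIP package, the same diagram parts can also be reached without apache poi. The following is a minimal sketch of that idea using only the JDK (java.util.zip and StAX). The class DiagramTextReader and its methods are hypothetical names, and, like the cursor loop above, it simply collects every non-blank text node instead of interpreting the diagram structure. It reads the whole package, so the texts are not attributed to individual slides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper; not part of Apache POI. */
final class DiagramTextReader {

    /** Collects all non-blank text nodes from the ppt/diagrams/data*.xml parts. */
    static List<String> readDiagramTexts(InputStream pptx) {
        List<String> texts = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // no DTDs expected
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // one event per text node
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // ZIP entry names have no leading slash
                if (!entry.getName().startsWith("ppt/diagrams/data")) {
                    continue;
                }
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new ByteArrayInputStream(readAll(zip)));
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                        texts.add(reader.getText());
                    }
                }
                reader.close();
            }
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("Could not read diagram parts", e);
        }
        return texts;
    }

    /** Reads the current ZIP entry to its end. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Demo helper: builds a one-entry ZIP in memory. */
    static byte[] zipOf(String entryName, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Simplified stand-in for a real diagram data part
        String xml = "<d><t>Step one</t><t>Step two</t></d>";
        byte[] pptx = zipOf("ppt/diagrams/data1.xml", xml);
        // prints [Step one, Step two]
        System.out.println(readDiagramTexts(new ByteArrayInputStream(pptx)));
    }
}
```

For a real file, readDiagramTexts(new FileInputStream("Performance_Out.pptx")) would be the corresponding call.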

How to read DOCX using Apache POI in page by page mode

I would like to read a docx file and search it for a particular text. The program should print the page on which the text was found and the document name.
I have written this simple method, but it does not count any pages:
private static void searchDocx(File file, String searchText) throws IOException {
    FileInputStream fis = new FileInputStream(file.getAbsolutePath());
    XWPFDocument document = new XWPFDocument(fis);
    int pageNo = 1;
    for (XWPFParagraph paragraph : document.getParagraphs()) {
        String text = paragraph.getText();
        if (text != null) {
            if (text.toLowerCase().contains(searchText.toLowerCase())) {
                System.out.println("found on page: " + pageNo + " in: " + file.getAbsolutePath());
            }
        }
        if (paragraph.isPageBreak()) {
            pageNo++;
        }
    }
}
How can I read the file so that I can print the page on which the searchText was found? Is there any way to know the page when reading a docx with Apache POI?
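One possible direction, sketched here with the JDK only and under explicit assumptions: a .docx does not store a fixed page layout, but Word writes w:lastRenderedPageBreak hints into word/document.xml when it saves a file. Counting those hints gives an approximate page number. As far as I can tell, XWPFParagraph.isPageBreak() only reflects the paragraph's "page break before" setting, which would explain why the counter above never moves. The class DocxPageHintSearch is hypothetical, it assumes the transitional WordprocessingML namespace, and files saved by other tools may contain no hints at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper; not part of Apache POI. */
final class DocxPageHintSearch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the approximate page of each paragraph containing searchText (case-insensitive). */
    static List<Integer> findPages(InputStream docx, String searchText) {
        List<Integer> pages = new ArrayList<>();
        String needle = searchText.toLowerCase();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue;
                }
                XMLStreamReader r = factory.createXMLStreamReader(new ByteArrayInputStream(readAll(zip)));
                int page = 1;
                boolean inText = false;
                StringBuilder paragraph = new StringBuilder();
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT && W.equals(r.getNamespaceURI())) {
                        String name = r.getLocalName();
                        if (name.equals("p")) paragraph.setLength(0);
                        else if (name.equals("lastRenderedPageBreak")) page++; // hint from the last save
                        else if (name.equals("t")) inText = true;
                    } else if (event == XMLStreamConstants.END_ELEMENT && W.equals(r.getNamespaceURI())) {
                        String name = r.getLocalName();
                        if (name.equals("t")) inText = false;
                        // Page is taken at the paragraph end; a paragraph that spans
                        // a page break is therefore reported on its last page
                        else if (name.equals("p") && paragraph.toString().toLowerCase().contains(needle)) {
                            pages.add(page);
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                        paragraph.append(r.getText());
                    }
                }
                r.close();
            }
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("Could not read word/document.xml", e);
        }
        return pages;
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
        return out.toByteArray();
    }

    /** Demo helper: builds a one-entry ZIP in memory. */
    static byte[] zipOf(String entryName, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body>"
                + "<w:p><w:r><w:t>first page text</w:t></w:r></w:p>"
                + "<w:p><w:r><w:lastRenderedPageBreak/><w:t>needle on second page</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        byte[] docx = zipOf("word/document.xml", xml);
        // prints [2]
        System.out.println(findPages(new ByteArrayInputStream(docx), "needle"));
    }
}
```

The sketch ignores headers, footers, and paragraphs nested inside text boxes, and the result is only as accurate as the hints stored in the file.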

How to convert split PDF files to Excel files using Java and PDFBox

I'm new to PDFBox. I have one PDF file that contains 60 pages. I'm using the Apache PDFBox-app-1.8.10 jar to split the PDF file.
public class SplitDemo {
    public static void main(String[] args) throws IOException {
        JButton open = new JButton();
        JFileChooser fc = new JFileChooser();
        fc.setCurrentDirectory(new java.io.File("C:/Users"));
        fc.setDialogTitle("Select PDF");
        // Stop if the user cancels the dialog, because no file was chosen
        if (fc.showOpenDialog(open) != JFileChooser.APPROVE_OPTION) {
            return;
        }
        String a = fc.getSelectedFile().getAbsolutePath();
        PDDocument document = PDDocument.load(a);
        // Create a Splitter object
        Splitter splitter = new Splitter();
        // We are receiving the split pages as a list of PDFs
        List<PDDocument> listOfSplitPages = splitter.split(document);
        // We need an iterator to iterate through them
        Iterator<PDDocument> iterator = listOfSplitPages.listIterator();
        // I am using variable i to denote page numbers.
        int i = 1;
        while (iterator.hasNext()) {
            PDDocument pd = iterator.next();
            try {
                // Saving each page with its assumed page no.
                pd.save("G://PDFCopy/Page " + i++ + ".pdf");
            } catch (COSVisitorException anException) {
                // Something went wrong with a PDF object
                System.out.println("Something went wrong with page " + (i-1) + "\n Here is the error message" + anException);
            }
        }
    }
}
The PDFCopy folder now contains the list of PDF files. How can I convert all of them to Excel format and save them in a target folder? I am completely confused by this conversion.

Apache POI - Error while merging pptx

I have a scenario where I need to copy a few slides from a pptx (source.pptx) and download them as a separate pptx file (output.pptx), based on the presentation notes available in the slides.
I am using apache poi to achieve this. This is my code:
String filename = filepath+"\\source.pptx";
try {
    XMLSlideShow ppt = new XMLSlideShow(new FileInputStream(filename));
    XMLSlideShow outputppt = new XMLSlideShow();
    XSLFSlide[] slides = ppt.getSlides();
    for (int i = 0; i < slides.length; i++) {
        try {
            XSLFNotes mynotes = slides[i].getNotes();
            for (XSLFShape shape : mynotes) {
                if (shape instanceof XSLFTextShape) {
                    XSLFTextShape txShape = (XSLFTextShape) shape;
                    for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
                        if (xslfParagraph.getText().equals("NOTES1") || xslfParagraph.getText().equals("NOTES2")) {
                            outputppt.createSlide().importContent(slides[i]);
                        }
                    }
                }
            }
        } catch (Exception e) {
        }
    }
    FileOutputStream out = new FileOutputStream("output.pptx");
    outputppt.write(out);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}
When I open the output.pptx that is created, I get the following error:
"PowerPoint found a problem with the content in output.pptx
PowerPoint can attempt to repair the presentation
If you trust the source of this presentation, click Repair."
Upon clicking Repair: "PowerPoint removed unreadable content in merged.pptx
[Repaired]. You should review this presentation to determine whether any content
was unexpectedly changed or removed"
After that I see blank slides with "Click to add Title" and "Click to add Subtitle".
Any suggestions to solve this issue?
This code works for me to copy slide content, layout and notes.
Just modify the code to your needs if you want to follow your original question. I assume you simply have to:
- not import the slide content from its source slide
- copy the notes content to the slide instead
// get the layout from the source slide
XSLFSlideLayout layout = srcSlide.getSlideLayout();
XSLFSlide newslide = ppt
    .createSlide(defaultMaster.getLayout(layout.getType()))
    .importContent(srcSlide);
XSLFNotes srcNotes = srcSlide.getNotes();
XSLFNotes newNotes = ppt.getNotesSlide(newslide);
newNotes.importContent(srcNotes);
I had the same error in a case where some text boxes were empty. I solved it by always setting an empty text in all placeholders when creating slides.
XSLFSlide slide = presentation.createSlide(slideMaster.getLayout(layout));
// remove any placeholder texts
for (XSLFTextShape ph : slide.getPlaceholders()) {
    ph.clearText();
    ph.setText("");
}
