Delete paragraph from PDF - Java - java

Greeting,
What is the easiest way to delete text/paragraph from a PDF document. The program takes a PDF document and creates a separate PDF document for each page. In each document I have text from the original that I would like to delete.
I tried a couple of examples but it doesn't work or they use old libraries
I am using the iText 7 library
private void processPDF(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
if (object instanceof PRStream) {
PRStream stream = (PRStream) object;
byte[] data = PdfReader.getStreamBytes(stream);
String dd = new String(data, "UTF8")
.replace("Hand made software", "");
stream.setData(dd.getBytes("UTF8"));
if (dd.contains("Hand made software")) {
System.out.println("Contains");
}
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
private void processPDF2(String src, String dest) throws InvalidPasswordException, IOException {
Map<String, String> map = new HashMap<>();
map.put("Hand made software", "");
File template = new File(src);
PDDocument document = PDDocument.load(template);
List<PDField> fields = document.getDocumentCatalog().getAcroForm().getFields();
for (PDField field : fields) {
for (Map.Entry<String, String> entry : map.entrySet()) {
if (entry.getKey().equals(field.getFullyQualifiedName())) {
field.setValue(entry.getValue());
field.setReadOnly(true);
}
}
}
File out = new File(dest);
document.save(out);
document.close();
}
I want to delete line "Hand made software"

You can make it easy iterating through the PDF elements. First create a PDF reader and writter that will read the template located on src string path.
File template = new File(src);
PdfReader reader = new PdfReader(template);
File out = new File(dest);
PdfWritter writter = new PdfWritter(out);
Then create a Document object by firstly creating a PdfDocument:
PdfDocument pdf = new PdfDocument(reader, writter);
Document document = new Document(pdf);
Last iterate through the elements of the pdf document until "Hand made software" line is found:
for (int i = 0; i < document.getRoots().size(); i++) {
if (document.getRoots().get(i) instanceof Paragraph) { //iterate only through paragraphs
Paragraph paragraph = (Paragraph) document.getRoots().get(i);
if(paragraph.getText().equals("Hand made software")){ //if the paragraph equals to the string to be removed, remove from the document
document.getRoots().remove(i);
i--;
}
}
}
Finally close de document to save the changes
document.close();

Related

iText7 : how to use the same reader in multiple page of the same pdf

As the title said,I have an existing pdf that contains an empty table with a custom design, I want to create a pdf file that write data in that table, but I need to have that same table on every page of my new pdf.
public byte[] pdfRdv(Rdv rdv, List<RdvClient> clients) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
//this file contains a empty table in which I want to insert each iteration of my
//table clients
String src = "C://Users//ASUS//Downloads//rdv.pdf";
PdfReader reader = new PdfReader(src);
PdfWriter writer = new PdfWriter(baos);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
Document document = new Document(pdfDoc);
for(int i = 0; i < clients.size(); i++){
if(i > 0){
// I tried to add for each itaration a new page but i don't know how to insert in
//it the table in the existing pdf (reader)
pdfDoc.addNewPage(i+2);
}
Paragraph p1 = new Paragraph(clients.get(i).getId().toString()).setFontSize(10);
p1.setFixedPosition(140, 687, 400);
document.add(p1);
//..
}
document.close();
pdfDoc.close();
byte[] pdf = baos.toByteArray();
return pdf;
}
Thank you in advance

Merging of different file format into single pdf using itext is giving corrupt pdf

I have a multipart form in which different type of files will be uploaded in single request like pdf,png,jpg etc. I need to read these files and create final merged single pdf file. Does iText will support this kind of merger of different file types.
I can able to merge all image type but when i try to merge pdf with imgae getting some issues.Any sample example on this.Below code creating final pdf without images.
public void generateMergedPDF(Map<String, String> dataMap, MultipartFile[] files) throws Exception {
ClassPathResource resource = new ClassPathResource("templatepdfForm.pdf");
FileOutputStream userInfo = new FileOutputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf"));
//Update filed values to template Starts
PdfReader reader = new PdfReader(resource.getInputStream());
PdfStamper stamper = new PdfStamper(reader, userInfo);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
Map<String, Item> fieldMap = form.getFields();
for (String key : fieldMap.keySet()) {
String fieldValue = dataMap.get(key);
if (fieldValue != null) {
form.setField(key, fieldValue);
}
}
stamper.close();
//Update filed values to template Ends
Document mergePdfDoc = new Document();
PdfCopy pdfCopy;
boolean smartCopy = false;
FileOutputStream finalFile = new FileOutputStream("C:\\pdf\\finalmergedfile.pdf");
//Merge updated pdf with multipart content
if(smartCopy)
pdfCopy = new PdfSmartCopy(mergePdfDoc, finalFile);
else
pdfCopy = new PdfCopy(mergePdfDoc, finalFile);
PdfWriter writer = PdfWriter.getInstance(mergePdfDoc, finalFile);
mergePdfDoc.open();
PdfReader mergeReader = new PdfReader(new FileInputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf")));
pdfCopy.addDocument(mergeReader);
pdfCopy.freeReader(mergeReader);
mergeReader.close();
PdfReader[] pdfReader = new PdfReader[files.length];
for(int i=0; i<=files.length-1;i++) {
if(FileContentType.APPLICATION_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart pdf content
pdfReader[i] = new PdfReader(files[i].getInputStream());
pdfCopy.addDocument(pdfReader[i]);
pdfCopy.freeReader(pdfReader[i]);
pdfReader[i].close();
}else if(FileContentType.IMAGE_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart image content
System.out.println("Image Type Loop");
Image fileImage = Image.getInstance(files[i].getBytes());
mergePdfDoc.setPageSize(fileImage);
mergePdfDoc.newPage();
claimImage.setAbsolutePosition(0, 0);
mergePdfDoc.add(fileImage);
}
}
pdfCopy.setMergeFields();
//mergePdfDoc.close(); //If i enable this close stream closed error
//writer.close(); //If i enable this close stream closed error
memInfo.close();
finalFile.close();
}

merge multiple pdfs in order

hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf
It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}

IText merge documents with acrofields

I currently have a PdfReader and a PdfStamper that I am filling out the acrofields with. I now have to copy another pdf to the end of that form I have been filling out and when I do I lose the acrofield on the new form I copy over. Here is the code.
public static void addSectionThirteenPdf(PdfStamper stamper, Rectangle pageSize, int pageIndex){
PdfReader reader = new PdfReader(FacesContext.getCurrentInstance().getExternalContext().getResourceAsStream("/resources/documents/Section13.pdf"));
AcroFields fields = reader.getAcroFields();
fields.renameField("SecurityGuidancePage3", "SecurityGuidancePage" + pageIndex);
stamper.insertPage(pageIndex, pageSize);
stamper.replacePage(reader, 1, pageIndex);
}
The way that I am creating the original document is like this.
OutputStream output = FacesContext.getCurrentInstance().getExternalContext().getResponseOutputStream();
PdfReader pdfTemplate = new PdfReader(FacesContext.getCurrentInstance().getExternalContext().getResourceAsStream("/resources/documents/dd254.pdf"));
PdfStamper stamper = new PdfStamper(pdfTemplate, output);
stamper.setFormFlattening(true);
AcroFields fields = stamper.getAcroFields();
Is there a way to merge using the first piece of code and merge both of the acrofields together?
Depending on what you want exactly, different scenarios are possible, but in any case: you are doing it wrong. You should use either PdfCopy or PdfSmartCopy to merge documents.
The different scenarios are explained in the following video tutorial.
You can find most of the examples in the iText sandbox.
Merging different forms (having different fields)
If you want to merge different forms without flattening them, you should use PdfCopy as is done in the MergeForms example:
public void createPdf(String filename, PdfReader[] readers) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
copy.setMergeFields();
document.open();
for (PdfReader reader : readers) {
copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
reader.close();
}
}
In this case, readers is an array of PdfReader instances containing different forms (with different field names), hence we use PdfCopy and we make sure that we don't forget to use the setMergeFields() method, or the fields won't be copied.
Merging identical forms (having identical fields)
In this case, we need to rename the fields, because we probably want different values on different pages. In PDF a field can only have a single value. If you merge identical forms, you have multiple visualizations of the same field, but each visualization will show the same value (because in reality, there is only one field).
Let's take a look at the MergeForms2 example:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
copy.setMergeFields();
document.open();
List<PdfReader> readers = new ArrayList<PdfReader>();
for (int i = 0; i < 3; ) {
PdfReader reader = new PdfReader(renameFields(src, ++i));
readers.add(reader);
copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
reader.close();
}
}
public byte[] renameFields(String src, int i) throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, baos);
AcroFields form = stamper.getAcroFields();
Set<String> keys = new HashSet<String>(form.getFields().keySet());
for (String key : keys) {
form.renameField(key, String.format("%s_%d", key, i));
}
stamper.close();
reader.close();
return baos.toByteArray();
}
As you can see, the renameFields() method creates a new document in memory. That document is merged with other documents using PdfSmartCopy. If you'd use PdfCopy here, your document would be bloated (as we'll soon find out).
Merging flattened forms
In the FillFlattenMerge1, we fill out the forms using PdfStamper. The result is a PDF file that is kept in memory and that is merged using PdfCopy. While this example is fine if you'd merge different forms, this is actually an example on how not to do it (as explained in the video tutorial).
The FillFlattenMerge2 shows how to merge identical forms that are filled out and flattened correctly:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
ByteArrayOutputStream baos;
PdfReader reader;
PdfStamper stamper;
AcroFields fields;
StringTokenizer tokenizer;
BufferedReader br = new BufferedReader(new FileReader(DATA));
String line = br.readLine();
while ((line = br.readLine()) != null) {
// create a PDF in memory
baos = new ByteArrayOutputStream();
reader = new PdfReader(SRC);
stamper = new PdfStamper(reader, baos);
fields = stamper.getAcroFields();
tokenizer = new StringTokenizer(line, ";");
fields.setField("name", tokenizer.nextToken());
fields.setField("abbr", tokenizer.nextToken());
fields.setField("capital", tokenizer.nextToken());
fields.setField("city", tokenizer.nextToken());
fields.setField("population", tokenizer.nextToken());
fields.setField("surface", tokenizer.nextToken());
fields.setField("timezone1", tokenizer.nextToken());
fields.setField("timezone2", tokenizer.nextToken());
fields.setField("dst", tokenizer.nextToken());
stamper.setFormFlattening(true);
stamper.close();
reader.close();
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
}
br.close();
document.close();
}
These are three scenarios. Your question is too unclear for anyone but you to decide which scenario is the best fit for your needs. I suggest that you take the time to learn before you code. Watch the video, try the examples, and if you still have doubts, you'll be able to post a smarter question.

Unable to read unicode character in pdf using java

I am trying to convert Pdf document that contains Tamil unicode characters into a word document retaining all the formatting. I am not able to read the unicode character in the Pdf they are appearing as junk character in word. I am using the below code can someone please help?
public static void main(String[] args) throws IOException {
System.out.println("Document converted started");
XWPFDocument doc = new XWPFDocument();
String pdf = "D:\\sample1.pdf";
PdfReader reader = new PdfReader(pdf);
// InputStreamReader isr = new InputStreamReader(reader,"UTF8");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i,
new SimpleTextExtractionStrategy());
System.out.println(strategy.getResultantText());
String text = strategy.getResultantText();
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
// run.setFontFamily(new Font("Arial"));
run.setFontSize(14);
run.setText(text);
// run.addBreak(BreakType.PAGE);
}
FileOutputStream out = new FileOutputStream("D:\\tamildoc.docx");
doc.write(out);
out.close();
reader.close();
System.out.println("Document converted successfully");
}
You can use the library Apache PDFBox https://pdfbox.apache.org/download.cgi . With the component PDFTextStripper, invoking method getText(PDDocument doc) you will obtain a simple String that represents the content of .pdf file
Here an example :
UploadedFile file = new UploadedFile(fileName);
InputStream is = file.getInputStream();
PDDocument doc = PDDocument.load(is);
String content = new PDFTextStripper().getText(doc);
doc.close();
And after that you can write on your file

Categories

Resources