Can't remove whiteSpace in Java itext - java

Here i am combining 2 pdf documents using the Itext packages.
Merging was done successfully using the code below
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
for (InputStream in : list)
{
PdfReader reader = new PdfReader(in);
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
document.newPage();
//import the page from source pdf
PdfImportedPage page = writer.getImportedPage(reader, i);
//add the page to the destination pdf
cb.addTemplate(page, 0, 0);
}
}
outputStream.flush();
document.close();
outputStream.close();
Here the list is an InputStream List.
And outputStream is an output stream
The problem i am having is i want to append the PDFdocuments in the list after the 1st PDF is added
(i.e 1st PDF has 4 lines...i want the 2nd PDF to continue in the same page after the 4th line).
What i am getting is the 2nd PDF is added in the second page.
Is there any alternate keyword for document.newPage();
Can anyone help me with it.
Thanks would like to hear any responses:)

It depends on the requirements you have. As long as
you only are interested in the page contents of the merged PDFs, not in the page annotations and
the pages have no content but the text lines you mention, in particular no background graphics, watermarks, or header/footer lines,
you can you use either the
PdfDenseMergeTool from this answer or the
PdfVeryDenseMergeTool from this answer.
If you are interested in annotations, it should be no problem to extend those classes accordingly. If your PDDFs have background graphics or watermarks, headers or footers, they should be removed beforehand.

Related

Remain only used font subsets while splitting PDF in Java

I am using iText to split a PDF document into separate pages as PDF files. Each file seems to be too large as all fonts used in the input PDF are saved into all result pages, which is apparently not very clean.
Code of splitting is as below. Notice PdfSmartCopy and setFullCompression doesn't help to reduce size (which I have no idea why).
public List<byte[]> split(byte[] input) throws IOException, DocumentException {
PdfReader pdfReader = new PdfReader(input);
List<byte[]> pdfFiles = new ArrayList<>();
int pageCount = pdfReader.getNumberOfPages();
int pageIndex = 0;
while (++pageIndex <= pageCount) {
Document document = new Document(pdfReader.getPageSizeWithRotation(pageIndex));
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfCopy pdfCopy = new PdfSmartCopy(document, byteArrayOutputStream);
pdfCopy.setFullCompression();
PdfImportedPage pdfImportedPage = pdfCopy.getImportedPage(pdfReader, pageIndex);
document.open();
pdfCopy.addPage(pdfImportedPage);
document.close();
pdfCopy.close();
pdfFiles.add(byteArrayOutputStream.toByteArray());
}
return pdfFiles;
}
So is there a way in Java (iText or not) to solve these problem?
Update with demo PDF
Here is a 377KB PDF using multiple CJK fonts where any page in it using 1 or 2 fonts. The summary size of sub-PDFs is 1.2MB. Considering CJK fonts are very bloated, I would like to find a way to remove unused font and even remove unused characters in used fonts.
So my idea is to remain only used characters in used fonts and embed them in sub files and then un-embed all other fonts. Any advice?

iText Pdf Page Byte Size

I have a business requirement that requires me to splits pdfs into multiple documents.
Lets say I have a 100MB pdf, I need to split that into for simplicity sake, into multiple pdfs no larger than 10MB a piece.
I am using iText.
I am going to get the original pdf, and loop through the pages, but how can I determine the file size of each page without writing it separately to the disk?
Sample code for simplicity
int numPages = reader.getNumberOfPages();
PdfImportedPage page;
for (int currentPage = 0; currentPage &lt numPages; ){
++currentPage;
//Get page from reader
page = writer.getImportedPage(reader, currentPage);
// I need the size in bytes here of the page
}
I think the easiest way is to write it to the disk and delete it afterwards:
Document document = new Document();
File f= new File("C:\\delete.pdf"); //for instance
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(f));
document.open();
document.add(page);
document.close();
long filesize = f.length(); //this is the filesize in byte
f.delete();
I'm not absolutely sure, I admit, but I don't know how it should be possible to figure out the filesize if the file is not existing.

iText merge a stamped pdf with a pdf created at runtime

I want to merge 2 pdf documents using iText in java, one of the pdfs is created at runtime while the other is an existing pdf that I read in and using the PdfStamper function stamp an image onto it. I want to then merge these two pdfs and display them using a servlet.
I want to know if this is possible and how to do it.
I have no problem creating or stamping them separately but I just can't seem to figure out how to merge them.
Thanks
I suppose this code can help you. You would have to import IText.Jar for this
public static void doMerge(List<InputStream> list,
OutputStream outputStream) throws DocumentException,
IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
float k = 0;
for (InputStream in : list) {
PdfReader reader = new PdfReader(in);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
// document.newPage();
//import the page from source pdf
PdfImportedPage page = writer.getImportedPage(reader, i);
//add the page to the destination pdf
cb.addTemplate(page, 0, 0);
System.out.println(page.getHeight());
}
}
outputStream.flush();
document.close();
outputStream.close();
}

How to show a limited number of pages of PDF file in JSP using iText?

I have to display a PDF document in a JSP page. The PDF document has 25 pages, but I want to display only 10 pages of the PDF file. How can I achieve this with help of iText?
Assuming you have the PDF file already.
You can use PdfStamper and PdfCopy to slice the PDF up:
PdfReader reader = new PdfReader("THE PDF SOURCE");
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Document document = new Document();
PdfCopy copy = new PdfCopy(document, outputStream);
document.open();
PdfStamper stamper = new PdfStamper(reader, outputStream);
for (int i = 1; i < reader.getNumberOfPages(); i++) {
// Select what pages you need here
PdfImportedPage importedPage = stamper.getImportedPage(reader, i);
copy.addPage(importedPage);
}
copy.freeReader(reader);
outputStream.flush();
document.close();
// Now you can send the byte array to your user
// set content type to application/pdf
As for sending the pdf to display, it depends on the way you display it. The outputstream will at the end of the supplied code contain the pages you copy in the loop, in the example it is all of the pages.
This essentially is a new PDF file, but in memory. If it is the same 10 pages of the same file every time, you may consider saving it as a file.

Remove page from PDF

I'm currently using iText and I'm wondering if there is a way to delete a page from a PDF file?
I have opened it up with a reader etc., and I want to remove a page before it is then saved back to a new file; how can I do that?
The 'better' way to 'delete' pages is doing
reader.selectPages("1-5,10-12");
Which means we only select pages 1-5, 10-12 effectively 'deleting' pages 6-9.
Get the reader of existing pdf file by
PdfReader pdfReader = new PdfReader("source pdf file path");
Now update the reader by
pdfReader.selectPages("1-5,15-20");
then get the pdf stamper object to write the changes into a file by
PdfStamper pdfStamper = new PdfStamper(pdfReader,
new FileOutputStream("destination pdf file path"));
close the PdfStamper by
pdfStamper.close();
It will close the PdfReader too.
Cheers.....
For iText 7 I found this example:
PdfReader pdfReader = new PdfReader(PATH + name + ".pdf");
PdfDocument srcDoc = new PdfDocument(pdfReader);
PdfDocument resultDoc = new PdfDocument(new PdfWriter(PATH + name + "_cut.pdf"));
resultDoc.initializeOutlines();
srcDoc.copyPagesTo(1, 2, resultDoc);
resultDoc.close();
srcDoc.close();
See also here: clone-reordering-pages
and here: clone-splitting-pdf-file
You can use a PdfStamper in combination with PdfCopy.
In this answer it is explained how to copy a whole document. If you change the criteria for the loop in the sample code you can remove the pages you don't need.
Here is a removing function ready for real life usage. Proven to work ok with itext 2.1.7. It does not use "strigly typing" also.
/**
* Removes given pages from a document.
* #param reader document
* #param pagesToRemove pages to remove; 1-based
*/
public static void removePages(PdfReader reader, int... pagesToRemove) {
int pagesTotal = reader.getNumberOfPages();
List<Integer> allPages = new ArrayList<>(pagesTotal);
for (int i = 1; i <= pagesTotal; i++) {
allPages.add(i);
}
for (int page : pagesToRemove) {
allPages.remove(new Integer(page));
}
reader.selectPages(allPages);
}

Categories

Resources