How to convert split PDF files to Excel using Java PDFBox


I'm new to PDFBox. I have one PDF file that contains 60 pages. I'm using Apache pdfbox-app-1.8.10.jar to split the PDF file into single pages:
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.swing.JFileChooser;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.Splitter;

public class SplitDemo {
    public static void main(String[] args) throws IOException {
        JFileChooser fc = new JFileChooser();
        fc.setCurrentDirectory(new File("C:/Users"));
        fc.setDialogTitle("Select PDF");
        // Stop if the user cancels; getSelectedFile() would return null otherwise
        if (fc.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
            return;
        }
        String a = fc.getSelectedFile().getAbsolutePath();
        PDDocument document = PDDocument.load(a);
        try {
            // Create a Splitter object (by default it splits after every page)
            Splitter splitter = new Splitter();
            // We are receiving the split pages as a list of PDFs
            List<PDDocument> listOfSplitPages = splitter.split(document);
            // I am using variable i to denote page numbers.
            int i = 1;
            for (PDDocument pd : listOfSplitPages) {
                try {
                    // Saving each page with its assumed page no.
                    pd.save("G:/PDFCopy/Page " + i + ".pdf");
                } catch (COSVisitorException anException) {
                    // Something went wrong with a PDF object
                    System.out.println("Something went wrong with page " + i + "\n Here is the error message: " + anException);
                } finally {
                    pd.close();
                }
                i++;
            }
        } finally {
            document.close();
        }
    }
}
**In the PDFCopy folder I have a list of PDF files. How can I convert all of these PDFs to Excel format and save them in a target folder? I am completely confused about this conversion.**
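PDFBox has no built-in Excel export. One common route is to pull the text out of each split page with PDFTextStripper (in 1.8.x, `org.apache.pdfbox.util.PDFTextStripper`, method `getText(PDDocument)`) and then write that text out as spreadsheet rows. The sketch below covers only the second step and swaps in CSV, which Excel opens directly, instead of .xlsx, so it needs nothing beyond the JDK. It assumes the columns in the extracted text are separated by two or more spaces or a tab, which will not hold for every table layout; `TextToCsv`, `toRows`, and `toCsv` are hypothetical names. For a real .xlsx file, Apache POI would be the usual library.

```java
import java.util.ArrayList;
import java.util.List;

public class TextToCsv {
    // Assumption: columns in the extracted page text are separated by 2+ spaces or a tab.
    static List<String[]> toRows(String pageText) {
        List<String[]> rows = new ArrayList<>();
        for (String line : pageText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(line.trim().split("\\s{2,}|\\t"));
        }
        return rows;
    }

    // Quote a cell if it contains a comma, quote, or newline; embedded quotes are doubled.
    static String csvCell(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    // Join rows into CSV text with CRLF line endings.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(csvCell(row[c]));
            }
            sb.append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String page = "Item    Qty   Price\nWidget, large    2    9.99\n";
        System.out.print(toCsv(toRows(page)));
    }
}
```

Each split page would then be written to its own .csv file in the target folder, for example with `java.nio.file.Files.write`.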

Related

Merge multiple PDFs in order

Hey guys, sorry for the long post, my bad English, and any unnecessary details.
I created multiple one-page PDFs from one PDF template using an Excel document.
I now have something like this:
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
I'm trying to merge all the files into one single PDF using iText 5,
but it seems that the pages in the resulting PDF are not in the order I wanted.
For example:
tempfile0.pdf is on the first page,
tempfile1.pdf is on page 2000.
Here is the code I'm using.
The procedure I'm using is:
1. Fill a form from a HashMap.
2. Save the filled form as one PDF.
3. Merge all the files into one single PDF.
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
// compare strings with equals(), not ==
if ("ADRESSE1".equals(key)){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if ("IMEI".equals(key)){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
This is the code for merging:
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
My question is: how can I keep the files in the same order in the resulting PDF?

It is because of how the files are named. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Here tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting the names alphabetically before the merge.
It is better to left-pad the file number with zeros, using the leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat, so that the names look like tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ..., tempcontrat0-999999.pdf.
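A minimal sketch of the zero-padding idea with java.text.DecimalFormat, keeping the question's tempcontrat prefix and trailing underscore. The six-digit width comes from the example names and assumes fewer than a million files per batch; `PaddedNames` and `tempName` are hypothetical names.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PaddedNames {
    // Six digits, as in the example names; Locale.ROOT keeps the digits ASCII 0-9.
    private static final DecimalFormat SIX_DIGITS =
            new DecimalFormat("000000", DecimalFormatSymbols.getInstance(Locale.ROOT));

    // Hypothetical helper: same name pattern as fillPdfitext, with the index zero-padded.
    static String tempName(int debut, int i) {
        return "tempcontrat" + debut + "-" + SIX_DIGITS.format(i) + "_.pdf";
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i : new int[] {1000, 11, 2, 0}) {
            names.add(tempName(0, i));
        }
        // A plain alphabetical sort now matches numeric order
        Collections.sort(names);
        names.forEach(System.out::println);
    }
}
```

In fillPdfitext, the output name would then be built as `outputfolder + "\\" + PaddedNames.tempName(debut, i)`, and the existing alphabetical sort in Merge() keeps working unchanged.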
You can also make this much simpler: skip the steps of writing each filled form to a file and reading it back, and merge the documents right after filling each form, which will be faster. Whether that is feasible depends on how many documents you are merging, how big they are, and how much memory you have.
To do this, save the filled document into a ByteArrayOutputStream, and after stamper.close() create a new PdfReader for the bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short, it can look like this:
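// initialize once, before the loop
ByteArrayOutputStream memoryStream = new ByteArrayOutputStream();
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (int i = debut; i < fin; i++) {
    // fill the form into memory instead of into a temp file
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    PdfStamper stamper = new PdfStamper(new PdfReader(this.sourcePdfTemplateFile.toURI().getPath()), out);
    // fill in the form here (same AcroFields code as in fillPdfitext)
    stamper.setFormFlattening(true);
    stamper.close();
    PdfReader reader = new PdfReader(out.toByteArray());
    reader.consolidateNamedDestinations();
    PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
    pdfSmartCopy.addPage(pdfImportedPage);
    pdfSmartCopy.freeReader(reader);
    reader.close();
    // other actions ...
}
document.close();
// memoryStream.toByteArray() now holds the merged PDF, pages in loop order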
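If the temp files from the question already exist, another option (a swapped-in alternative, not what this answer proposes) is to sort by the numbers inside the file name instead of alphabetically. A sketch assuming names exactly of the form tempcontrat&lt;debut&gt;-&lt;i&gt;_.pdf, as produced by fillPdfitext; `IndexOrder`, `key`, and `BY_INDEX` are hypothetical names.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexOrder {
    // Assumed name format: tempcontrat<debut>-<i>_.pdf
    private static final Pattern NAME = Pattern.compile("tempcontrat(\\d+)-(\\d+)_\\.pdf");

    // Parse debut and i from the name; names that don't match sort last.
    static long[] key(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return new long[] {Long.MAX_VALUE, Long.MAX_VALUE};
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    // Compare numerically by debut, then by i
    static final Comparator<File> BY_INDEX =
            Comparator.<File>comparingLong(f -> key(f.getName())[0])
                      .thenComparingLong(f -> key(f.getName())[1]);

    public static void main(String[] args) {
        File[] files = {
            new File("tempcontrat0-1000_.pdf"),
            new File("tempcontrat0-11_.pdf"),
            new File("tempcontrat0-2_.pdf")
        };
        Arrays.sort(files, BY_INDEX);
        for (File f : files) {
            System.out.println(f.getName());
        }
    }
}
```

In Merge(), this would replace the alphabetical comparator: `Arrays.sort(documentPaths, IndexOrder.BY_INDEX);`.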

iText HTML to PDF line wrapping

Hello, I'm creating a JavaFX app with iText. I have an HTML editor to write text, and I want to create a PDF from it. Everything works, but when I have a really long line that is wrapped in the HTML editor, it isn't wrapped in the PDF; it runs off the page. How can I make the text wrap? Here is my code:
PdfWriter writer = null;
try {
writer = new PdfWriter("doc.pdf");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements(editor.getHtmlText());
} catch (IOException e) {
e.printStackTrace();
}
// add elements to document
for (IElement p : list) {
document.add((IBlockElement) p);
}
// close document
document.close();
I also want to set the line spacing for this text.
Thank you for your help.

I don't get any errors for the following code:
public class stack_overflow_0008 extends AbstractSupportTicket{
private static String LONG_PIECE_OF_TEXT =
"Once upon a midnight dreary, while I pondered, weak and weary," +
"Over many a quaint and curious volume of forgotten lore—" +
"While I nodded, nearly napping, suddenly there came a tapping," +
"As of some one gently rapping, rapping at my chamber door." +
"Tis some visitor,” I muttered, “tapping at my chamber door—" +
"Only this and nothing more.";
public static void main(String[] args)
{
PdfWriter writer = null;
try {
writer = new PdfWriter(getOutputFile());
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements("<p>" + LONG_PIECE_OF_TEXT + "</p>");
} catch (IOException e) {
e.printStackTrace();
}
for (IElement p : list) {
document.add((IBlockElement) p);
}
document.close();
}
}
The document is a single (A4) page PDF with one string neatly wrapped.
I suspect the content of your string is to blame.
Could you post the HTML you get from this editor object?
Update:
Using the code from this answer on the HTML shared in a new comment to the question, the content is distributed over two lines. No content "falls off the page."

How to read DOCX using Apache POI in page by page mode

I would like to read DOCX files to search for a particular text. I would like the program to print the page on which the text was found and the document name.
I have written this simple method, but it doesn't count any pages:
private static void searchDocx(File file, String searchText) throws IOException {
// try-with-resources closes the stream and the document
try (FileInputStream fis = new FileInputStream(file);
XWPFDocument document = new XWPFDocument(fis)) {
int pageNo = 1;
for (XWPFParagraph paragraph : document.getParagraphs()) {
String text = paragraph.getText();
if (text != null) {
if (text.toLowerCase().contains(searchText.toLowerCase())) {
System.out.println("found on page: " + pageNo+ " in: " + file.getAbsolutePath());
}
}
if (paragraph.isPageBreak()) {
pageNo++;
}
}
}
}
How can I read the file so that I can print the page on which searchText was found? Is there any way to know the page when reading a DOCX file using Apache POI?
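A .docx file does not store pages: Word lays them out when it renders the document, so POI has no page model to expose. XWPFParagraph.isPageBreak() only reports the paragraph's "page break before" setting, which is why pages created by text simply flowing over are never counted. The closest thing stored in the file is the `<w:lastRenderedPageBreak/>` marker, which Word writes where a page began the last time it laid out the document. The rough sketch below uses only the JDK (Java 9+ for `readAllBytes`) and counts those markers. It assumes the file was last saved by Word (other editors may not write the markers, and edits since the last save can make them stale), reports the page on which each matching paragraph's first text appears, and does not handle nested paragraphs such as text boxes. `DocxPageSearch`, `readDocumentXml`, and `findPages` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DocxPageSearch {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the main document part (word/document.xml) straight out of the .docx zip.
    static String readDocumentXml(File docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("no word/document.xml in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    // Approximate 1-based page of each paragraph containing searchText (case-insensitive).
    static List<Integer> findPages(String documentXml, String searchText) {
        List<Integer> pages = new ArrayList<>();
        String needle = searchText.toLowerCase();
        int page = 1;
        int paragraphPage = 1;
        boolean inText = false;
        StringBuilder paragraph = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs in document.xml
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(documentXml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT && W.equals(r.getNamespaceURI())) {
                    String name = r.getLocalName();
                    if (name.equals("p")) {
                        paragraph.setLength(0);
                        paragraphPage = page;
                    } else if (name.equals("lastRenderedPageBreak")) {
                        page++; // Word started a new page here when it last rendered
                    } else if (name.equals("t")) {
                        inText = true;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                    // The paragraph's page is where its first visible text lands
                    if (paragraph.length() == 0) {
                        paragraphPage = page;
                    }
                    paragraph.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && W.equals(r.getNamespaceURI())) {
                    String name = r.getLocalName();
                    if (name.equals("t")) {
                        inText = false;
                    } else if (name.equals("p") && paragraph.toString().toLowerCase().contains(needle)) {
                        pages.add(paragraphPage);
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("document.xml is not well-formed", e);
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        File docx = new File(args[0]);
        for (int p : findPages(readDocumentXml(docx), args[1])) {
            System.out.println("found on page: " + p + " in: " + docx.getAbsolutePath());
        }
    }
}
```

Run it as `java DocxPageSearch file.docx searchText`; it prints in the same format as the question's method.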

Making more than one PDF document in Java

This is my code:
try {
dozen = magazijn.getFfd().vraagDozenOp();
for (int i = 0; i < dozen.size(); i++) {
PdfWriter.getInstance(doc, new FileOutputStream("Order" + x + ".pdf"));
System.out.println("Writer instance created");
doc.open();
System.out.println("doc open");
Paragraph ordernummer = new Paragraph(order.getOrdernummer());
doc.add(ordernummer);
doc.add( Chunk.NEWLINE );
for (String t : text) {
Paragraph klant = new Paragraph(t);
doc.add(klant);
}
doc.add( Chunk.NEWLINE );
Paragraph datum = new Paragraph (order.getDatum());
doc.add(datum);
doc.add( Chunk.NEWLINE );
artikelen = magazijn.getFfd().vraagArtikelenOp(i);
for (Artikel a : artikelen){
artikelnr.add(a.getArtikelNaam());
}
for (String nr: artikelnr){
Paragraph Artikelnr = new Paragraph(nr);
doc.add(Artikelnr);
}
doc.close();
artikelnr.clear();
x++;
System.out.println("doc closed");
}
} catch (Exception e) {
System.out.println(e);
}
I get this exception: com.itextpdf.text.DocumentException: The document has been closed. You can't add any Elements.
Can someone help me fix this so that the other PDFs can be created and paragraphs added?

Alright, your intent is not very clear from your code and question, so I'm going to operate under the following assumptions:
1. You are creating a report for each box you're processing.
2. Each report needs to be a separate PDF file.
You're getting a DocumentException on the second iteration of the loop because you're trying to add content to a Document that was closed in the previous iteration via doc.close(). doc.close() finalizes the Document and writes everything still pending to any linked PdfWriter.
If you wish to create a separate PDF for each box, you need to create a separate Document inside your loop as well, since creating a new PdfWriter via PdfWriter.getInstance(doc, new FileOutputStream("Order" + x + ".pdf")); will not create a new Document on its own.
If I'm wrong about assumption 2 and you wish to add everything to a single PDF, move doc.close(); outside of the loop, create only a single PdfWriter, and open the Document only once, before the loop.

You can try something like this using Apache PDFBox:
File outputFile = new File(path);
outputFile.createNewFile();
PDDocument newDoc = new PDDocument();
Then create a PDPage and write what you want on that page. When the page is ready, add it to newDoc, and at the end save and close the document:
newDoc.save(outputFile);
newDoc.close();
Repeat this dozen.size() times, changing the file name in path for every new document.

Add bookmarks in PDF files using PDFRenderer API

I am using PDF Renderer for reading and updating PDFs.
I want to add a bookmark to that PDF and save it using the same API.
Is it possible to do so with PDF Renderer?
Here is a code snippet, which is not working, that tries to update the bookmarks in the PDF:
File file = new File("C:\\test.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
OutlineNode rootNode = new OutlineNode("New Bookmark");
PDFPage page = pdffile.getPage(0);
OutlineNode node = pdffile.getOutline();
OutlineNode node2 = (OutlineNode)node.getNextNode();
node2.add(rootNode);
I am using the PDFRenderer-0.9.0.jar library for the above example.
If anyone has worked with PDF Renderer, please advise.

You can add bookmarks using the PDFTron PDFNet SDK (a different library from PDF Renderer), as in the sample below.
For the full code and implementation, follow the link.
You can also follow the link if you want to access the bookmark later.
import pdftron.Common.PDFNetException;
import pdftron.PDF.*;
import pdftron.SDF.Obj;
import pdftron.SDF.SDFDoc;
public class BookmarkTest {
public static void main(String[] args)
{
PDFNet.initialize();
// Relative path to the folder containing test files.
String input_path = "../../TestFiles/";
String output_path = "../../TestFiles/Output/";
// The following example illustrates how to create and edit the outline tree
// using high-level Bookmark methods.
try
{
PDFDoc doc=new PDFDoc((input_path + "numbered.pdf"));
doc.initSecurityHandler();
// Lets first create the root bookmark items.
Bookmark red = Bookmark.create(doc, "Red");
Bookmark green = Bookmark.create(doc, "Green");
Bookmark blue = Bookmark.create(doc, "Blue");
doc.addRootBookmark(red);
doc.addRootBookmark(green);
doc.addRootBookmark(blue);
// You can also add new root bookmarks using Bookmark.AddNext("...")
blue.addNext("foo");
blue.addNext("bar");
// We can now associate new bookmarks with page destinations:
// The following example creates an 'explicit' destination (see
// section '8.2.1 Destinations' in PDF Reference for more details)
Destination red_dest = Destination.createFit((Page)(doc.getPageIterator().next()));
red.setAction(Action.createGoto(red_dest));
// Create an explicit destination to the first green page in the document
green.setAction(Action.createGoto(
Destination.createFit((Page)(doc.getPage(10))) ));
// The following example creates a 'named' destination (see
// section '8.2.1 Destinations' in PDF Reference for more details)
// Named destinations have certain advantages over explicit destinations.
byte[] key={'b','l','u','e','1'};
Action blue_action = Action.createGoto(key,
Destination.createFit((Page)(doc.getPage(19))) );
blue.setAction(blue_action);
// We can now add children Bookmarks
Bookmark sub_red1 = red.addChild("Red - Page 1");
sub_red1.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(1)))));
Bookmark sub_red2 = red.addChild("Red - Page 2");
sub_red2.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(2)))));
Bookmark sub_red3 = red.addChild("Red - Page 3");
sub_red3.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(3)))));
Bookmark sub_red4 = sub_red3.addChild("Red - Page 4");
sub_red4.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(4)))));
Bookmark sub_red5 = sub_red3.addChild("Red - Page 5");
sub_red5.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(5)))));
Bookmark sub_red6 = sub_red3.addChild("Red - Page 6");
sub_red6.setAction(Action.createGoto(Destination.createFit((Page)(doc.getPage(6)))));
// Example of how to find and delete a bookmark by title text.
Bookmark foo = doc.getFirstBookmark().find("foo");
if (foo.isValid())
{
foo.delete();
}
else
{
throw new Exception("Foo is not Valid");
}
Bookmark bar = doc.getFirstBookmark().find("bar");
if (bar.isValid())
{
bar.delete();
}
else
{
throw new Exception("Bar is not Valid");
}
// Adding color to Bookmarks. Color and other formatting can help readers
// get around more easily in large PDF documents.
red.setColor(1, 0, 0);
green.setColor(0, 1, 0);
green.setFlags(2); // set bold font
blue.setColor(0, 0, 1);
blue.setFlags(3); // set bold and italic
doc.save((output_path + "bookmark.pdf"), 0, null);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
// The following example illustrates how to traverse the outline tree using
// Bookmark navigation methods: Bookmark.GetNext(), Bookmark.GetPrev(),
// Bookmark.GetFirstChild () and Bookmark.GetLastChild ().
try
{
// Open the document that was saved in the previous code sample
PDFDoc doc=new PDFDoc((output_path + "bookmark.pdf"));
doc.initSecurityHandler();
Bookmark root = doc.getFirstBookmark();
PrintOutlineTree(root);
doc.close();
System.out.println("Done.");
}
catch(Exception e)
{
System.out.println(e);
}
// The following example illustrates how to create a Bookmark to a page
// in a remote document. A remote go-to action is similar to an ordinary
// go-to action, but jumps to a destination in another PDF file instead
// of the current file. See Section 8.5.3 'Remote Go-To Actions' in PDF
// Reference Manual for details.
try
{
// Open the document that was saved in the previous code sample
PDFDoc doc=new PDFDoc((output_path + "bookmark.pdf"));
doc.initSecurityHandler();
// Create file specification (the file referred to by the remote bookmark)
Obj file_spec = doc.createIndirectDict();
file_spec.putName("Type", "Filespec");
file_spec.putString("F", "bookmark.pdf");
FileSpec spec=new FileSpec(file_spec);
Action goto_remote = Action.createGotoRemote(spec, 5, true);
Bookmark remoteBookmark1 = Bookmark.create(doc, "REMOTE BOOKMARK 1");
remoteBookmark1.setAction(goto_remote);
doc.addRootBookmark(remoteBookmark1);
// Create another remote bookmark, but this time using the low-level SDF/Cos API.
// Create a remote action
Bookmark remoteBookmark2 = Bookmark.create(doc, "REMOTE BOOKMARK 2");
doc.addRootBookmark(remoteBookmark2);
Obj gotoR = remoteBookmark2.getSDFObj().putDict("A");
{
gotoR.putName("S","GoToR"); // Set action type
gotoR.putBool("NewWindow", true);
// Set the file specification
gotoR.put("F", file_spec);
// Jump to page index 9, i.e. the 10th page. Note that pages in a remote destination are indexed from 0.
Obj dest = gotoR.putArray("D"); // Set the destination
dest.pushBackNumber(9);
dest.pushBackName("Fit");
}
doc.save((output_path + "bookmark_remote.pdf"), SDFDoc.e_linearized, null);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
PDFNet.terminate();
}
static void PrintIndent(Bookmark item) throws PDFNetException
{
int ident = item.getIndent() - 1;
for (int i=0; i<ident; ++i) System.out.print( " ");
}
// Prints out the outline tree to the standard output
static void PrintOutlineTree(Bookmark item) throws PDFNetException
{
for (; item.isValid(); item=item.getNext())
{
PrintIndent(item);
System.out.print((item.isOpen() ? "- " : "+ ") + item.getTitle() + " ACTION -> ");
// Print Action
Action action = item.getAction();
if (action.isValid())
{
if (action.getType() == Action.e_GoTo)
{
Destination dest = action.getDest();
if (dest.isValid())
{
Page page = dest.getPage();
System.out.println("GoTo Page #" + page.getIndex());
}
}
else
{
System.out.println("Not a 'GoTo' action");
}
}
else
{
System.out.println("NULL");
}
if (item.hasChildren()) // Recursively print children sub-trees
{
PrintOutlineTree(item.getFirstChild());
}
}
}
}
