Apache PDFBox wrong access permission [duplicate]


This question already has an answer here: Security Method is No Security but Page Extraction and Document Assembly is not Allowed
I'm trying to extract access permissions with Apache PDFBox. The problem is that all the permissions are set to true.
For example, I extracted the Document Assembly property as follows:
PDDocument doc = PDDocument.load(new File(filePath));
AccessPermission ap = doc.getCurrentAccessPermission();
boolean documentAssembly = ap.canAssembleDocument();
The documentAssembly variable is true. However, when I check the permissions in Adobe Reader, Document Assembly is listed as Not Allowed.
Is there a way to extract all the correct information, as shown on Adobe Reader's Security tab?

What you see on the Security tab is a summary of all document restrictions that apply. In particular, some restrictions depend only on the PDF viewer you use. If I look at the same dialog in Adobe Acrobat (not Reader), for example, I see a different summary.
Obviously, PDFBox does not know which viewer you will use, so it cannot take viewer-specific restrictions into account.
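To see exactly what PDFBox reads from the file itself, it can help to print every permission flag together with whether the document is encrypted at all. This is a sketch assuming PDFBox 2.x (matching the PDDocument.load call in the question); the class name PermissionReport and the map keys are illustrative. As far as I know, for a document that was not decrypted, getCurrentAccessPermission() returns owner permissions (everything true), which would match the all-true result described in the question.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;

public class PermissionReport {

    // Collect the flags AccessPermission exposes, in a fixed order for printing.
    static Map<String, Boolean> flags(AccessPermission ap) {
        Map<String, Boolean> m = new LinkedHashMap<>();
        m.put("print", ap.canPrint());
        m.put("modify", ap.canModify());
        m.put("extractContent", ap.canExtractContent());
        m.put("modifyAnnotations", ap.canModifyAnnotations());
        m.put("fillInForm", ap.canFillInForm());
        m.put("extractForAccessibility", ap.canExtractForAccessibility());
        m.put("assembleDocument", ap.canAssembleDocument());
        return m;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            // An unencrypted file has no permission entry, so PDFBox reports owner (all-true) permissions.
            System.out.println("encrypted: " + doc.isEncrypted());
            flags(doc.getCurrentAccessPermission())
                    .forEach((name, allowed) -> System.out.println(name + ": " + allowed));
        }
    }
}
```

If "encrypted" prints false, the restrictions shown in Adobe Reader cannot come from the file's permission bits and are viewer-specific.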

Related

PDDocument.load(file) isn't a method (PDFBox)

I wanted to make a simple program to get the text content of a PDF file in Java. Here is the code:
PDFTextStripper ts = new PDFTextStripper();
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = PDDocument.load(file);
String allText = ts.getText(doc1);
String gradeText = allText.substring(allText.indexOf("GRADE 10B"), allText.indexOf("GRADE 10C"));
System.out.println("Meeting ID for English: "
+ gradeText.substring(gradeText.indexOf("English") + 7, gradeText.indexOf("English") + 20));
This is just part of the code, but this is the part with the problem.
The error is: The method load(File) is undefined for the type PDDocument
I learned how to use PDFBox from JavaTPoint. I have followed the instructions for installing the PDFBox libraries and adding them to the Build Path.
My PDFBox version is 3.0.0
I have also searched the source files and their methods, and I am unable to find the load method there.
Thank you in advance.

As per the PDFBox 3.0 migration guide, PDDocument.load has been replaced with the static Loader methods:
For loading a PDF, PDDocument.load has been replaced with the Loader methods. The same is true for loading an FDF document.
When saving a PDF, this will now be done in compressed mode by default. To override that, use PDDocument.save with CompressParameters.NO_COMPRESSION.
PDFBox now loads a PDF document incrementally, reducing the initial memory footprint. This will also reduce the memory needed to consume a PDF if only certain parts of the PDF are accessed. Note that, due to the nature of PDF, uses such as iterating over all pages, accessing annotations, signing a PDF etc. might still load all parts of the PDF over time, leading to a similar memory consumption as with PDFBox 2.0.
The input file must not be used as output for saving operations. It will corrupt the file and throw an exception, as parts of the file are read for the first time when saving it.
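The two saving rules from the guide can be illustrated with a short sketch, assuming PDFBox 3.x on the classpath: roundTrip builds a document in memory, saves it with CompressParameters.NO_COMPRESSION and reloads the bytes with Loader.loadPDF, while main copies a file to a different path and refuses to overwrite its own input. The class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class SaveUncompressed {

    // Create a document in memory, save it uncompressed, reload it and report the page count.
    static int roundTrip(int pageCount) throws IOException {
        byte[] bytes;
        try (PDDocument doc = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage());
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out, CompressParameters.NO_COMPRESSION); // 3.x compresses unless told otherwise
            bytes = out.toByteArray();
        }
        try (PDDocument reloaded = Loader.loadPDF(bytes)) {
            return reloaded.getNumberOfPages();
        }
    }

    // Copy args[0] to args[1] uncompressed; the output must not be the file being read.
    public static void main(String[] args) throws IOException {
        File in = new File(args[0]);
        File out = new File(args[1]);
        if (in.getCanonicalFile().equals(out.getCanonicalFile())) {
            throw new IllegalArgumentException("Output must be a different file than the input");
        }
        try (PDDocument doc = Loader.loadPDF(in);
             OutputStream os = new BufferedOutputStream(new FileOutputStream(out))) {
            doc.save(os, CompressParameters.NO_COMPRESSION);
        }
    }
}
```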
You can either switch back to an earlier 2.x version of PDFBox, or use the new Loader methods. I believe this should work:
import org.apache.pdfbox.Loader; // PDFBox 3.x
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = Loader.loadPDF(file);
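Putting the question's code together with Loader.loadPDF, a minimal PDFBox 3.x sketch might look like this. The between helper is hypothetical: it guards against indexOf returning -1, which would otherwise make substring throw StringIndexOutOfBoundsException when one of the markers is missing from the extracted text.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class MeetingIdReader {

    // Return the text from start (inclusive) up to the next end marker, or null if either is missing.
    static String between(String text, String start, String end) {
        int s = text.indexOf(start);
        int e = s < 0 ? -1 : text.indexOf(end, s);
        return (s < 0 || e < 0) ? null : text.substring(s, e);
    }

    public static void main(String[] args) throws IOException {
        File file = new File("C:\\Meeting IDs.pdf");
        try (PDDocument doc = Loader.loadPDF(file)) { // replaces PDDocument.load in 3.x
            String allText = new PDFTextStripper().getText(doc);
            String gradeText = between(allText, "GRADE 10B", "GRADE 10C");
            if (gradeText == null) {
                System.out.println("Markers not found in the extracted text");
            } else {
                System.out.println(gradeText);
            }
        }
    }
}
```

try-with-resources also closes the document, which the original snippet never did.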

Print an in-memory PDF in Java

I am developing a module that is supposed to print documents from the server. These are the requirements:
The module should be able to print a PDF from a URL, with and without saving it.
The module should be able to accept page numbers as parameters and only print/save those pages.
The module should be able to accept the printer name as a parameter and use only that printer.
Is there any library available for this? How should I go about implementing this?

The answer was Apache PDFBox. I was able to load the PDF into a PDDocument object like this:
PDDocument pdf = PDDocument.load(new URL(download_pdf_from).openStream());
Splitting the document was as easy as:
Splitter splitter = new Splitter();
List<PDDocument> splittedDocuments = splitter.split(pdf);
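Now, to get a reference to any particular page (by default the splitter produces one document per page, and pageNo here is a zero-based index, so the first page is 0):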
splittedDocuments.get(pageNo);
Saving the entire document, or just a given page:
pdf.save(path); //saving the entire document to device
splittedDocuments.get(pageNo).save(path); //saving a particular page number to device
For the printing part, this answer helped me.
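For the printing requirements (only some pages, a named printer, nothing written to disk), PDFBox's PDFPageable can be handed to java.awt.print.PrinterJob. This is a sketch assuming PDFBox 2.x as in the answer; the class PdfPrinter, its methods and the case-insensitive printer-name lookup are illustrative choices, not part of the original answer. Unwanted pages are removed from the in-memory copy (highest index first) instead of being moved into a new document, so inherited page attributes stay intact; pages are printed in document order. With 3.x, PDDocument.load(in) would become Loader.loadPDF(in.readAllBytes()).

```java
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import javax.print.PrintService;
import javax.print.PrintServiceLookup;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.printing.PDFPageable;

public class PdfPrinter {

    // Find an installed print service by name (case-insensitive); null if none matches.
    static PrintService findPrinter(String name) {
        for (PrintService s : PrintServiceLookup.lookupPrintServices(null, null)) {
            if (s.getName().equalsIgnoreCase(name)) {
                return s;
            }
        }
        return null;
    }

    // Zero-based indexes of pages NOT in keepOneBased, highest first so removals don't shift each other.
    static List<Integer> indexesToRemove(int pageCount, Collection<Integer> keepOneBased) {
        List<Integer> remove = new ArrayList<>();
        for (int page = pageCount; page >= 1; page--) {
            if (!keepOneBased.contains(page)) {
                remove.add(page - 1);
            }
        }
        return remove;
    }

    // Download a PDF into memory, drop unwanted pages, and send the rest to the named printer.
    static void print(String url, Collection<Integer> pages, String printerName)
            throws IOException, PrinterException {
        PrintService service = findPrinter(printerName);
        if (service == null) {
            throw new IllegalArgumentException("No printer named " + printerName);
        }
        try (InputStream in = new URL(url).openStream();
             PDDocument doc = PDDocument.load(in)) {
            for (int index : indexesToRemove(doc.getNumberOfPages(), pages)) {
                doc.removePage(index); // only the in-memory copy changes
            }
            PrinterJob job = PrinterJob.getPrinterJob();
            job.setPrintService(service);
            job.setPageable(new PDFPageable(doc));
            job.print();
        }
    }
}
```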

How can I automatically fill in the password field of a PDF document using Java code?

I am working on a project that requires a password to open every downloaded PDF file. The password is to be fetched from a database (I am using MySQL).
I have searched for Java code related to this kind of task, but found almost nothing. Most answers are about filling in forms after the document has already been downloaded.
I have thought about turning the PDF files into templates that only display information once an auto-filled password field is set (in case I am forced to use this option), but I am afraid that would take a lot of work.
I have read about iText by Bruno Lowagie, following a question on Stack Overflow, but the closest I got was this snippet, which answers a question by "affan" on how to fill in a PDF automatically using external data from a database.
I reckon this snippet is meant to fill in an already opened PDF document.
This is the snippet:
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
form.setField(key, value);
stamper.setFormFlattening(true);
stamper.close();
reader.close();
Could someone please help me with how to automatically fill in the password field of a PDF that requires a password to open?

It seems that you are mixing two different things:
Form fields, in which you have a text field for which a specific field flag is set so that the text that is added is obfuscated (e.g. "Password" is shown as "********"). Reading your requirement, this is not what you want. Also: I see that you set the form flattening to true and that removes all the field information, including the field flag that changes a text field into a password field.
Encryption, in which you protect a document with one or two passwords.
I assume that you want to protect a document, but I have to admit that my assumption could be wrong. I have no idea why you are mentioning form fields in your question.
If you protect a document with an owner password only, everyone has access to that document, but you can put in place some permissions. This is not a secure solution, because many viewers ignore the owner password; moreover, it is very easy to remove the owner password.
If you protect a document with an owner and a user password, only people who know either the owner password or the user password can view the document. All the content of the document is encrypted (except for the Metadata if you want the metadata to be accessible).
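Since the goal is to open a protected PDF with a password fetched from a database, here is a sketch of both sides using iText 5 (the library in the question's snippet): protect encrypts an existing PDF with a user and an owner password, and open reads it back by passing the password to PdfReader. The class and method names are hypothetical, the MySQL lookup is left out, passwords are assumed to be plain ASCII, and iText 5 needs BouncyCastle (bcprov) on the classpath for encryption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfWriter;

public class PdfPasswords {

    // Build a one-page PDF in memory (a stand-in for the real downloaded file).
    static byte[] samplePdf() throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        document.add(new Paragraph("Hello"));
        document.close();
        return out.toByteArray();
    }

    // Encrypt an existing PDF: the user password is needed to open it, the owner password lifts restrictions.
    static byte[] protect(byte[] src, String userPassword, String ownerPassword)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfStamper stamper = new PdfStamper(reader, out);
        stamper.setEncryption(
                userPassword.getBytes(StandardCharsets.UTF_8),
                ownerPassword.getBytes(StandardCharsets.UTF_8),
                PdfWriter.ALLOW_PRINTING,
                PdfWriter.ENCRYPTION_AES_128);
        stamper.close();
        reader.close();
        return out.toByteArray();
    }

    // Open a protected PDF with a password obtained elsewhere (e.g. a database); returns the page count.
    static int open(byte[] pdf, String password) throws IOException {
        PdfReader reader = new PdfReader(pdf, password.getBytes(StandardCharsets.UTF_8));
        try {
            return reader.getNumberOfPages();
        } finally {
            reader.close();
        }
    }
}
```

A wrong password makes the PdfReader constructor throw, so there is no "field" to fill in: the password is simply an argument when the file is opened.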
Please read the answers to the following questions:
How to protect a PDF with a username and password?
How to protect an already existing PDF with a password?
iText setEncryption error
BadPasswordException: Bad user password (this is probably the most interesting one)
For a more elaborate answer, please consult the FAQ on the official web site.
for iText 5: How to decrypt a PDF document with the owner password?
for iText 7: How to decrypt a PDF document with the owner password?
Form fields (even when using the password flag on a text field) and using encryption to protect a document, are two completely separate topics in the PDF specification (ISO-32000). You can't "automatically fill a password field" in a PDF form, and hope that you can use the value of that field to open a protected document.

Creating a non-editable PDF document using PD4ML

I am using PD4ML to create PDF documents, but I don't want users to be able to edit those documents using MS Word 2013.
Here is what I have tried so far:
pd4ml = new PD4ML();
pd4ml.setPageSize(PD4Constants.A4);
pd4ml.setPageInsetsMM(new Insets(TOPVALUE, LEFTVALUE, BOTTOMVALUE, RIGHTVALUE));
pd4ml.setHtmlWidth(USERSPACEWIDTH);
pd4ml.enableImgSplit(false);
pd4ml.disableHyperlinks();
//some more code
pd4ml.render(arrayOfURLs, byteArrayOutputStream);
//some more code
Then I read the PD4ML API documentation and added the line pd4ml.generatePdfa(true);. I thought the problem was solved when I opened the document in Adobe Reader and saw the message "This file claims compliance with the PDF/A standard and has been opened read-only", but of course it was still editable. Any suggestions on how this is done in PD4ML, or a reference to an API I can use to add this restriction to the generated PDF, would be very welcome.

Have you tried this?
This is from the documentation, by the way:
AllowModify
public static final int AllowModify
Document access permission (bit 4, value = 8).
Modify the contents of the document by operations other than those controlled by bits 6, 9, and 11.
See Also:
PD4ML.setPermissions(String, int, boolean), Constant Field Values
You may also want to do:
pd4ml.setPermissions("", 0xffffffff ^ PD4Constants.AllowModify, false);
to disable modification.
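The expression 0xffffffff ^ PD4Constants.AllowModify works because AllowModify is a single bit (bit 4, value 8): XOR-ing it into an all-ones mask clears just that bit and leaves every other permission granted. The plain-Java sketch below shows the same arithmetic using the 1-based bit numbering of the PDF specification's user access permissions; the constant names are my own, not PD4ML's. These bits only take effect when the PDF is actually encrypted, and, as noted in the password answer above, many viewers do not honor them strictly.

```java
public class PermissionBits {

    // 1-based bit positions from the PDF spec's user access permissions, as int masks.
    static final int PRINT = 1 << 2;          // bit 3, value 4
    static final int MODIFY = 1 << 3;         // bit 4, value 8 (PD4ML's AllowModify)
    static final int EXTRACT = 1 << 4;        // bit 5, value 16
    static final int ANNOTATE = 1 << 5;       // bit 6, value 32
    static final int FILL_FORMS = 1 << 8;     // bit 9, value 256
    static final int ACCESSIBILITY = 1 << 9;  // bit 10, value 512
    static final int ASSEMBLE = 1 << 10;      // bit 11, value 1024
    static final int PRINT_HIGH = 1 << 11;    // bit 12, value 2048

    // Start from "everything allowed" and clear each denied bit.
    static int allExcept(int... denied) {
        int p = 0xffffffff;
        for (int d : denied) {
            p &= ~d;
        }
        return p;
    }

    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) != 0;
    }

    public static void main(String[] args) {
        int p = allExcept(MODIFY);
        System.out.println(Integer.toHexString(p));  // prints fffffff7
        System.out.println(isAllowed(p, MODIFY));    // prints false
        System.out.println(isAllowed(p, PRINT));     // prints true
    }
}
```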
More information here: http://pd4ml.com/cookbook/pd4ml_pdf_security.htm

Unable to read a PDF file using PDFBox

I am trying to fill in a PDF form using Java, but when I try to get the fields using the code below, the list is empty.
PDDocument pdDoc = PDDocument.load(filename);
PDAcroForm pdform = pdDoc.getDocumentCatalog().getAcroForm();
List<PDField> field = pdform.getFields();
Then I tried to read the file using PDFTextStripper
PDFTextStripper stripper = new PDFTextStripper();
System.out.println(stripper.getText(pdDoc));
and the output was as follows:
"Please wait...
If this message is not eventually replaced by the proper contents of the document, your PDF
viewer may not be able to display this type of document.
You can upgrade to the latest version of Adobe Reader for Windows®, Mac, or Linux® by
visiting http://www.adobe.com/go/reader_download.
For more assistance with Adobe Reader visit http://www.adobe.com/go/acrreader.
Windows is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Mac is a trademark
of Apple Inc., registered in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other
countries."
But I'm able to open the file manually and fill in the fields as well. I've also tried other tools like iText, but again I wasn't able to get the fields.
How can I resolve this issue?

Maybe it is too late to answer, but anyway: you can get an empty list if your PDF file has an XFA structure.
PDDocument pdDoc = PDDocument.load(filename);
PDAcroForm pdform = pdDoc.getDocumentCatalog().getAcroForm();
List<PDField> field = pdform.getFields();
Use these lines to start working with the XFA data:
PDXFA xfa = pdform.getXFA();
Document xfaDocument = xfa.getDocument();
NodeList elements = xfaDocument.getElementsByTagName( "SomeElement" );
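In PDFBox 2.x the XFA class is PDXFAResource rather than PDXFA, and PDAcroForm offers hasXFA() and xfaIsDynamic(), so you can tell up front whether getFields() can be expected to return anything. The "Please wait..." text in the question is the fallback page typically shown for dynamic XFA forms. A sketch assuming PDFBox 2.x; the class name, the labels returned by formKind, and looking up the XFA template's "field" elements are illustrative choices.

```java
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XfaCheck {

    // Classify the form so callers know whether getFields() is worth calling.
    static String formKind(PDAcroForm form) {
        if (form == null) {
            return "no form";
        }
        if (form.xfaIsDynamic()) {
            return "dynamic XFA"; // XFA present and no AcroForm fields
        }
        if (form.hasXFA()) {
            return "XFA with AcroForm fields";
        }
        return "AcroForm";
    }

    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
            System.out.println(formKind(form));
            if (form != null && form.hasXFA()) {
                // The XFA packet is XML; getDocument() parses it into a W3C DOM.
                Document xfa = form.getXFA().getDocument();
                NodeList fields = xfa.getElementsByTagName("field");
                System.out.println("XFA <field> elements: " + fields.getLength());
            }
        }
    }
}
```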

While struggling with Alfresco's content search abilities, I had some trouble with PDFBox (used by Alfresco to extract text and metadata) reading PDF files written by old applications (like QuarkXPress) that use the old Acrobat 4.0 format. PDFBox seemed unable to extract metadata or text from this old format, although the files were perfectly viewable in any PDF reader application.
The solution was to have all old PDF files re-printed (saved as...) using a more modern PDF format (for instance Acrobat 10.0). This can be done in one batch with some bash scripting.
I didn't try intermediate Acrobat versions between 4.0 and 10.0.
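To find which files need re-saving before running such a batch, a rough check of each file's header version may be enough. This is a plain-Java sketch (Java 9+ for readNBytes); the class name and the assumption that "Acrobat 4.0 format" corresponds to a %PDF-1.3 header are mine. A document's catalog can override the header version, so treat the result as a first pass only; the string comparison assumes single-digit minor versions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class OldPdfFinder {

    // Parse a "%PDF-1.3" style header; returns null when the bytes are not a PDF header.
    static String headerVersion(byte[] head) {
        String s = new String(head, StandardCharsets.ISO_8859_1);
        if (!s.startsWith("%PDF-") || s.length() < 8) {
            return null;
        }
        return s.substring(5, 8);
    }

    // Read only the first 8 bytes of a file and parse its header.
    static String headerVersion(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[8];
            int n = in.readNBytes(head, 0, 8);
            return n < 8 ? null : headerVersion(head);
        }
    }

    // List every .pdf under args[0] whose header version is 1.3 or older.
    public static void main(String[] args) throws IOException {
        try (Stream<Path> paths = Files.walk(Paths.get(args[0]))) {
            paths.filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                 .forEach(p -> {
                     try {
                         String v = headerVersion(p);
                         if (v != null && v.compareTo("1.3") <= 0) {
                             System.out.println(v + "  " + p);
                         }
                     } catch (IOException e) {
                         System.err.println("cannot read " + p + ": " + e.getMessage());
                     }
                 });
        }
    }
}
```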
