Protecting PDF using PDFBox


I'm really struggling with the documentation for PDFBox. For such a popular library, information seems to be a little thin on the ground (for me, at least!).
Anyway, the problem I'm having relates to protecting the PDF. At the moment all I want is to control the users' access permissions; specifically, I want to prevent the user from being able to modify the PDF.
If I omit the access-permission code, everything works perfectly. I am reading in a PDF from an external resource, reading and populating its fields, and adding some images before saving the new PDF. That all works.
The problem comes when I add the following code to manage the access:
/* Secure the PDF so that it cannot be edited */
try {
    String ownerPassword = "DSTE$gewRges43";
    String userPassword = "";
    AccessPermission ap = new AccessPermission();
    ap.setCanModify(false);
    StandardProtectionPolicy spp = new StandardProtectionPolicy(ownerPassword, userPassword, ap);
    pdf.protect(spp);
} catch (BadSecurityHandlerException ex) {
    Logger.getLogger(PDFManager.class.getName()).log(Level.SEVERE, null, ex);
}
When I add this code, all the text and images are stripped from the output PDF. The fields are still present in the document, but they are all empty, and all the text and images that were part of the original PDF or that were added dynamically in the code are gone.
UPDATE:
OK, as far as I can tell, the problem comes from a bug related to the form fields. I'm going to try a different approach without the form fields and see what happens.

I found the solution to this problem. A PDF that comes from an external source is sometimes already protected or encrypted.
If you get blank output when you load a PDF from an external source and add protections, you are probably working with an encrypted document. I have a stream-processing system working on PDF documents, so the following code works for me. If you are only working with PDF inputs, you can integrate it into your own flow.
// PDFBox 1.x API: PDDocument.load(InputStream, boolean) and decrypt(String)
public InputStream convertDocument(InputStream dataStream) throws Exception {
    // Just acts as a pass-through, since the input is already in PDF format.
    // Buffer in memory: writing to a PipedOutputStream with no reading thread
    // blocks forever once the pipe's small buffer is full.
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    System.setProperty("org.apache.pdfbox.baseParser.pushBackSize", "2024768"); // for large files
    PDDocument doc = PDDocument.load(dataStream, true);
    try {
        if (doc.isEncrypted()) { // remove the existing security before adding new protections
            doc.decrypt("");
            doc.setAllSecurityToBeRemoved(true);
        }
        doc.save(os);
    } finally {
        doc.close();
        dataStream.close();
    }
    return new ByteArrayInputStream(os.toByteArray());
}
Now take the returned InputStream and use it in your security step:
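// Same PDFBox 1.x API as above; secureData.data() is your own source of the stream
// (for example, the InputStream returned by convertDocument).
ByteArrayOutputStream os = new ByteArrayOutputStream();
System.setProperty("org.apache.pdfbox.baseParser.pushBackSize", "2024768");
InputStream dataStream = secureData.data();
PDDocument doc = PDDocument.load(dataStream, true);
try {
    AccessPermission ap = new AccessPermission();
    // add whatever permissions you need...
    ap.setCanModify(false);
    ap.setCanExtractContent(false);
    ap.setCanPrint(false);
    ap.setCanPrintDegraded(false);
    ap.setReadOnly(); // locks the object: later setter calls have no effect
    // A random owner password means nobody can open the file with owner rights later.
    StandardProtectionPolicy spp = new StandardProtectionPolicy(UUID.randomUUID().toString(), "", ap);
    doc.protect(spp);
    doc.save(os);
} finally {
    doc.close();
    dataStream.close();
}
InputStream is = new ByteArrayInputStream(os.toByteArray()); // the protected PDF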
This should now give you a proper document instead of blank output.
The trick is to remove the existing encryption first!
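If you want to stream the PDF through PipedOutputStream/PipedInputStream instead of buffering it in memory, the writing side must run on a different thread from the reading side: on a single thread, doc.save(os) blocks once the pipe's buffer (1024 bytes by default) is full, because nothing is reading yet. A minimal sketch, where PipeUtil and DocumentWriter are hypothetical names; the callback would wrap whatever writes the document (for example doc.save(out), wrapping any checked exception other than IOException):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public final class PipeUtil {
    /** Hypothetical callback that writes a document to the given stream. */
    public interface DocumentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the writer on a background thread and returns the read end of the pipe. */
    public static InputStream pipe(DocumentWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024); // larger buffer than the 1024-byte default
        PipedOutputStream out = new PipedOutputStream(in);
        Thread t = new Thread(() -> {
            try (OutputStream o = out) { // closing signals end-of-stream to the reader
                writer.writeTo(o);
            } catch (IOException e) {
                // Sketch only: the reader just sees a truncated stream; real code should report this.
                e.printStackTrace();
            }
        }, "pdf-pipe-writer");
        t.setDaemon(true);
        t.start();
        return in;
    }
}
```

The caller reads from the returned stream on its own thread while the background thread writes, so neither side waits on itself.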

Related

Not able to generate multiple documents using ServletOutputStream in Java [duplicate]

For example, I would like to download one ZIP file and one CSV file in one response. Is there any way to do this other than compressing both files into a single ZIP file?

Although ServletResponse is not meant to do this, we can tweak it programmatically to send multiple files, which all client browsers except IE seem to handle properly. A sample code snippet is given below.
response.setContentType("multipart/x-mixed-replace;boundary=END");
ServletOutputStream out = response.getOutputStream();
for (File f : files) {
    out.println("--END");                                  // boundary line before each part
    out.println("Content-Type: application/octet-stream"); // part header
    out.println();                                         // blank line ends the part headers
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(f))) {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
    out.println();                                         // CRLF before the next boundary
    out.flush();
}
out.println("--END--");                                    // closing boundary
out.close();
This will not work in IE.
N.B.: try/catch blocks are not included.

Jason Hunter's code for handling servlet requests and responses with multiple parts has been the de facto standard for years. You can find it at servlets.com.

No, you cannot do that. Whenever you send data in a request, it travels in the stream available on the request, and you read it back from that stream (request.getInputStream()). Note also that once you have consumed this stream, you cannot read it again.
The example above is a tweak that Google also uses when sending multipart email with multiple attachments: each attachment is delimited by a boundary, and the client has to recognize these boundaries when it retrieves and renders the content.
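The boundary framing described above can be sketched independently of the servlet API. This helper is hypothetical (class and method names are illustrative): it writes each part as a boundary line, its headers, a blank line and the body, then the closing boundary. In a servlet you would pass response.getOutputStream() after setting a multipart content type that declares the same boundary; how browsers handle such a response varies, as the answers above note.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public final class MultipartWriter {
    private static final byte[] CRLF = {'\r', '\n'};

    /** Writes each part as: --boundary, headers, blank line, body; then the closing --boundary--. */
    public static void writeParts(OutputStream out, String boundary, Map<String, byte[]> parts)
            throws IOException {
        for (Map.Entry<String, byte[]> part : parts.entrySet()) {
            writeLine(out, "--" + boundary);
            writeLine(out, "Content-Type: application/octet-stream");
            writeLine(out, "Content-Disposition: attachment; filename=\"" + part.getKey() + "\"");
            out.write(CRLF);            // blank line ends the part headers
            out.write(part.getValue()); // raw file bytes
            out.write(CRLF);            // CRLF before the next boundary line
        }
        writeLine(out, "--" + boundary + "--"); // closing delimiter
    }

    private static void writeLine(OutputStream out, String line) throws IOException {
        out.write(line.getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
    }
}
```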

How to read / write into docx file using commons.io.FileUtils?

I need some quick help. I am trying to write a Java program to generate a report, and I have the report template in a .docx file.
What I want to do is use that .docx file as a template, fill it with data multiple times for various records, and write the result to a new .docx file. The main thing is that I want to keep the formatting and indentation of the contents of the .docx file. The content is bulleted data, and that's where the problem is.
Below is the piece of code handling this operation:
public void readWriteDocx(HashMap<String, String> detailsMap) {
    try {
        File reportTemplateFile = new File("ReportTemplate.docx");
        File actualReportFile = new File("ActualReport.docx");
        StringBuilder preReport = new StringBuilder();
        preReport.append("Some details about pre report goes here...: ");
        preReport.append(System.lineSeparator());
        String docxContent = "";
        for (Map.Entry<String, String> entry : detailsMap.entrySet()) {
            docxContent = FileUtils.readFileToString(reportTemplateFile, StandardCharsets.UTF_8);
            // code to fetch and get data to insert into docxContent
            docxContent = docxContent.replace("$filename", keyFilename);
            docxContent = docxContent.replace("$expected", expectedFile);
            docxContent = docxContent.replace("$actual", actualFile);
            docxContent = docxContent.replace("$reportCount", String.valueOf(reportCount));
            docxContent = docxContent.replace("$diffMessage", key);
            FileUtils.writeStringToFile(actualReportFile, docxContent, StandardCharsets.UTF_8, true);
        }
        preReport.append(FileUtils.readFileToString(actualReportFile, StandardCharsets.UTF_8));
        System.out.print(preReport.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
As you can see, I am using the FileUtils read and write methods with UTF-8 encoding. That's just a guess; I am not sure it is right. I am also trying to append the newly generated .docx contents to a StringBuilder and print it to the console, but that's secondary. The main thing is that the .docx should be written properly, but no luck.
When this prints, it's all weird characters and nothing is readable. When I try to open the newly generated .docx file, it doesn't even open.
Any idea what I should do to get the data in the proper format? I have attached an image of what my ReportTemplate.docx, which I use as the template for this report, looks like. I am using commons-io-2.4.jar.
Please guide me if you can. Thanks a lot.

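You can use Apache POI or docx4j to create and edit .doc/.docx files. A .docx file is a ZIP archive of XML parts rather than a text file, so reading it into a String and writing it back (especially in append mode) produces a corrupt file. Without such a library, there is no simple way to edit .doc or .docx files.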
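To see why the FileUtils approach fails: a .docx is a ZIP archive, and the visible text lives in the part word/document.xml. The following is a minimal stdlib-only sketch, not a replacement for POI or docx4j, that copies the archive entry by entry and replaces placeholders in that one XML part. It assumes the XML is UTF-8 and that each placeholder ($filename etc.) sits inside a single text run; Word often splits typed text across several runs, in which case a plain string replace will miss it. The class name DocxPlaceholders is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public final class DocxPlaceholders {
    /** Copies template to target, replacing placeholder keys in word/document.xml only. */
    public static void fill(Path template, Path target, Map<String, String> values) throws IOException {
        try (ZipFile zip = new ZipFile(template.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                byte[] data = readAll(zip.getInputStream(entry));
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        xml = xml.replace(e.getKey(), escapeXml(e.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName())); // fresh entry: size and CRC recomputed
                out.write(data);
                out.closeEntry();
            }
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        try (InputStream src = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = src.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    /** Values are inserted into XML text, so the markup characters must be escaped. */
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

This fills one document per call; repeating a bulleted block once per record means duplicating paragraph XML, which is exactly where POI or docx4j save a lot of work.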

Downloading a PDF file from a protected webpage

So I've been trying this for a couple of days now and I really don't have any time left, since the project is due tomorrow. I was wondering if someone could help me out with this. I'm trying to download a PDF file from this link, which points to a page of PDF content. I have tried using Jsoup, but Jsoup cannot parse pages whose content is a PDF. This is the code I've been trying to use:
System.out.println("opening connection");
URL url = new URL("https://www.capitaliq.com/CIQDotNet/Filings/DocumentRedirector.axd?versionId=1257051021&type=pdf&forcedownload=false");
InputStream in = url.openStream();
FileOutputStream fos = new FileOutputStream("/Users/HIDDEN/Desktop/fullreport.pdf");
System.out.println("reading file...");
int length = -1;
byte[] buffer = new byte[1024]; // buffer for a portion of data from the connection
while ((length = in.read(buffer)) > -1) {
    fos.write(buffer, 0, length);
}
fos.close();
in.close();
System.out.println("file was downloaded");
The problem with this code is that the link automatically redirects to a login page where I have to type my username and password. Therefore, I have to find a way to log in to my account and connect to the page without using Jsoup (which, as mentioned, cannot read PDF contents). If someone could look at the HTML of the login page and adjust this code so that I can log in and then download the PDF, I would be eternally grateful. Thank you!

HtmlUnit is what I use for this kind of thing, especially when speed is not critical.
Here's a random-ish piece of pseudocode from another one of my answers:
WebClient wc = new WebClient(BrowserVersion.CHROME);
HtmlPage p = wc.getPage(url);
// userNameId, passId, submitBtnId and downloadBtn are the id attributes of the page's elements
((HtmlTextInput) p.getElementById(userNameId)).setText(userName);
((HtmlPasswordInput) p.getElementById(passId)).setText(pass); // password fields are HtmlPasswordInput
p = ((HtmlElement) p.getElementById(submitBtnId)).click();
// Just as an example of something I've had to do: I use
// UnexpectedPage when the "content-type" is "application/zip"
UnexpectedPage up = ((HtmlElement) p.getElementById(downloadBtn)).click();
InputStream in = up.getInputStream();
...
Then use another library to read the PDF.
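Whatever fetches the file, saving the resulting stream to disk does not need the manual buffer loop from the question: java.nio.file.Files.copy does the copying. A minimal sketch; DownloadSaver is a hypothetical helper name, and the stream could be the one returned by up.getInputStream() above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class DownloadSaver {
    /** Copies the whole stream to target (overwriting it) and closes the stream; returns bytes copied. */
    public static long save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```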

Apache POI Formatting issue

I was wondering if someone could help me figure out why my text is not lining up when I read a .doc file. So far I am using WordExtractor, but I am having formatting issues with text not lining up correctly. Here is my code, written for Java 1.7:
public class Doc {
    File docFile = null;
    WordExtractor docExtractor = null;
    WordExtractor exprExtractor = null;

    public void read() {
        docFile = new File("blue.doc");
        // try-with-resources closes the stream; printing inside the try avoids an NPE if loading fails
        try (FileInputStream fis = new FileInputStream(docFile.getAbsolutePath())) {
            HWPFDocument doc = new HWPFDocument(fis);
            docExtractor = new WordExtractor(doc);
            System.out.println(docExtractor.getText());
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
This is how the program displays the document:
A E
I'm stuck in Folsom Prison, and time keeps draggin on.
It is supposed to be displayed like this:
A E
I'm stuck in Folsom Prison, and time keeps draggin on.

This will not work the way you expect. You are extracting the content of the document into a String variable, which discards its layout (paragraphs, tabs and so on). Furthermore, you are printing that text to the console and expecting it to look exactly as it does in Microsoft Word.
Next, think about what you actually want to do. Assuming you want to check both the formatting and the content of the document, here is my answer. Converting the document to plain text with getText() gives you its content in a distorted format, which does not help you. With the POI library you should instead access each paragraph and table in the document and verify, read or write whatever you need.
doc.getRange() gives you a Range object. Work with this object, referring to http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/Range.html, and you will be able to access all paragraphs, tables and sections in the document. That should help you process the Word document programmatically.

Generating a pdf, and sending it to user with using HttpServletResponse(headers)

Hello all.
I'm trying to create a PDF and send it to the user without saving the file on my server first.
I'm using Hibernate + Struts 2.
My sample code:
CreatePDF.java (class that generates the PDF)
Method BuildPdf():
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
    document = new Document();
    PdfWriter.getInstance(document, baos);
    document.open();
    buildPage(document, snippet, snippetContent);
    document.close();
    response.setContentType("application/pdf");
    response.setContentLength(baos.size());
    response.setHeader("Content-Disposition", "attachment;filename=document.pdf");
    ServletOutputStream out = response.getOutputStream();
    baos.writeTo(out);
    out.flush();
    response.flushBuffer();
} catch (Exception e) {
    Log4jUtil.debug(logger, "Can not buid pdf-file", e);
}
My sample action:
method index():
pdf = new CreatePDF();
pdf.buildPdf(snippet, snippetContent);
return SUCCESS;
Could you please check my code for errors? Could there be something wrong with it?
Please help me. I need ideas or example code to solve this task.

First, Hibernate is completely irrelevant here. Struts 2 is relevant, but you are not using it; you are using the plain (low-level) Servlet API. That should probably work, but if your webapp is built around Struts 2, it is not the recommended way. You should use the Stream result instead.
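A minimal sketch of what the Stream result approach could look like. It assumes a struts.xml mapping (not shown in the original) whose result has type="stream" with the parameters inputName=inputStream, contentType=application/pdf and contentDisposition=attachment;filename="document.pdf". The class name PdfDownloadAction and the writePdf hook are hypothetical; the iText code from the question (PdfWriter.getInstance(document, out) and so on) would go inside writePdf. The action never touches HttpServletResponse; the stream result writes the headers and body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of a POJO Struts 2 action whose PDF is served by the "stream" result type. */
public class PdfDownloadAction {
    private InputStream inputStream;

    public String execute() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        writePdf(baos); // build the PDF in memory only, nothing is saved on the server
        inputStream = new ByteArrayInputStream(baos.toByteArray());
        return "success"; // same value as Action.SUCCESS
    }

    /** Hypothetical hook: replace the placeholder with the iText code from the question. */
    protected void writePdf(OutputStream out) throws IOException {
        out.write("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)); // placeholder bytes, not a valid PDF
    }

    /** Read by the stream result through its inputName parameter. */
    public InputStream getInputStream() {
        return inputStream;
    }
}
```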

For creating PDF documents, you can use Smart PDF Creator. It creates professional PDFs in a couple of clicks. You can try it for free here: http://www.smartpdfcreator.com
