I'm working on removing Protected View from a series of PDFs, and am trying to use the iText library within VBA. My main issue at this point is that I have no idea what method to use, and the iText documentation is pretty dense.
I'm also feeling my way forward on calling the iText library from VBA, so any help on syntax to do this is also appreciated, though I'm sure I could get there myself if I knew which method to call...
Currently, I've got:
Dim program As WshExec
program = Shell("Java.exe -jar " & mypath & "\itext-5.5.6\itextpdf-5.5.6.jar")
'Debug.print program returns a value here, so this line works.
'I'm thinking I need something like:
'Set program = RunProgram("Java.exe -jar " & mypath & "\itext-5.5.6\itextpdf-5.5.6.jar", & _
methodName, param1)
I've been using the following questions to get me this far...
Calling Java library (JAR) from VBA/VBScript/Visual Basic Classic
Microsoft Excel Macro to run Java program
Desired functionality is to have an unprotected PDF sitting in a folder on mypath.
The jar you are trying to run is not an executable jar. iText is a library that can be used in a Java application by adding itextpdf-5.5.6.jar to the CLASSPATH. If you don't write any Java code, the jar won't do anything, hence your Shell() and RunProgram() calls are useless: there is nothing to execute.
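To see for yourself whether a jar can be started with java -jar, you can check its manifest for a Main-Class entry. The following is only a minimal sketch using the JDK's java.util.jar classes; the class name JarInspector and the method name mainClassOf are made up for illustration.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper (not part of iText): reports whether a jar is executable.
final class JarInspector {

    /** Returns the Main-Class manifest entry, or null if the jar declares none. */
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when the jar has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```

A null result means java -jar has no entry point to launch, so the jar can only be used as a library on the classpath of your own Java code.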
Moreover: from your question, it is far from certain that you have a Java environment on your machine. You are working in a VBA environment, which makes one wonder why you'd use the Java version of iText. Have you tried using iTextSharp, which is the .NET version of iText (written in C#)?
Take a look at this tutorial: Programmatically Complete PDF Form Fields using Visual Basic and the iTextSharp DLL
In this tutorial, we take an existing PDF, we fill out a form, and we get another PDF based on the original PDF, but with extra data. You can easily adapt the code so that it takes an existing PDF, doesn't add anything to the PDF, but saves the original PDF without its passwords, as is explained in my answer to How can I decrypt a PDF document with the owner password?
If you combine what you can learn from my Java code:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader.unethicalreading = true;
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();
}
with what you learn from the form filling tutorial, you get something like this (provided that you use the iTextSharp DLL instead of the iText jar):
Dim pdfTemplate As String = "c:\Temp\PDF\encrypted.pdf"
Dim newFile As String = "c:\Temp\PDF\decrypted.pdf"
PdfReader.unethicalreading = true
Dim pdfReader As New PdfReader(pdfTemplate)
Dim pdfStamper As New PdfStamper(pdfReader, New FileStream(
newFile, FileMode.Create))
pdfStamper.Close()
pdfReader.Close()
IMPORTANT: this will only remove the password if the file is protected with just an owner password (which is what I assume you mean by protected view). If the file is protected in any other way, you'll have to clarify. Also note that the name of the unethicalreading parameter is not without meaning: make sure that you're not doing anything unethical by removing the protection.
I was having to manipulate protected PDF files using iText.
I just put in my pom.xml the following dependency and nothing more.
<!-- https://mvnrepository.com/artifact/org.bouncycastle/bcprov-jdk15on -->
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
<version>1.59</version>
</dependency>
I wanted to make a simple program to get text content from a pdf file through Java. Here is the code:
PDFTextStripper ts = new PDFTextStripper();
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = PDDocument.load(file);
String allText = ts.getText(doc1);
String gradeText = allText.substring(allText.indexOf("GRADE 10B"), allText.indexOf("GRADE 10C"));
System.out.println("Meeting ID for English: "
+ gradeText.substring(gradeText.indexOf("English") + 7, gradeText.indexOf("English") + 20));
This is just part of the code, but this is the part with the problem.
The error is: The method load(File) is undefined for the type PDDocument
I have learnt using PDFBox from JavaTPoint. I have followed the correct instructions for installing the PDFBox libraries and adding them to the Build Path.
My PDFBox version is 3.0.0
I have also searched the source files and their methods, and I am unable to find the load method there.
Thank you in advance.
As per the 3.0 migration guide, the PDDocument.load method has been replaced by the methods of the Loader class:
For loading a PDF PDDocument.load has been replaced with the Loader
methods. The same is true for loading a FDF document.
When saving a PDF this will now be done in compressed mode per
default. To override that use PDDocument.save with
CompressParameters.NO_COMPRESSION.
PDFBox now loads a PDF Document incrementally reducing the initial
memory footprint. This will also reduce the memory needed to consume a
PDF if only certain parts of the PDF are accessed. Note that, due to
the nature of PDF, uses such as iterating over all pages, accessing
annotations, signing a PDF etc. might still load all parts of the PDF
overtime leading to a similar memory consumption as with PDFBox 2.0.
The input file must not be used as output for saving operations. It
will corrupt the file and throw an exception as parts of the file are
read the first time when saving it.
So you can either switch to an earlier 2.x version of PDFBox, or you need to use the new Loader class (org.apache.pdfbox.Loader). I believe this should work:
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = Loader.loadPDF(file);
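The last point of the migration notes quoted above (the input file must not be used as the output when saving) can be guarded against before calling save. This is just a sketch using java.nio.file and no PDFBox calls; SaveTargetCheck and isSafeSaveTarget are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard: refuses a save target that is the file the document was loaded from.
final class SaveTargetCheck {

    static boolean isSafeSaveTarget(Path input, Path output) throws IOException {
        // A target that does not exist yet cannot be the input file.
        if (!Files.exists(output)) {
            return true;
        }
        // isSameFile also catches different spellings of one path and symbolic links.
        return !Files.isSameFile(input, output);
    }
}
```

If the check returns false, save to a different file first and replace the original afterwards.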
I need to convert a docx to a pdf. The following code uses the xdocreport library and works pretty well.
The problem is with some specific docx files which contain drawings: they are not visible in the final pdf. I've tested the conversion with the live demo available from the GitHub page and I get the same problem.
So I'm wondering: is this possible, or do I need to use another library? Which one? (docx4j doesn't seem to work either.)
final XWPFDocument document = new XWPFDocument(inputStream);
final OutputStream outPdf = new FileOutputStream("myFile.pdf");
PdfConverter.getInstance().convert(document, outPdf, optionsPdf);
outPdf.close();
XDocReport doesn't support drawings. It could support them, since docx->pdf is based on iText, which supports drawing, but it's a big task (any contributions are welcome!)
You can see the limitations of the XDocReport docx->pdf converter here.
My question: how can I create a new linked document and attach it to an element (in my case a Note element of an activity diagram)?
The Element class supports these three methods:
GetLinkedDocument ()
LoadLinkedDocument (string Filename)
SaveLinkedDocument (string Filename)
I am missing a function like
CreateLinkedDocument (string Filename)
My goal: I create an activity diagram programmatically, and some notes are too big to be displayed nicely in the activity diagram. So I want to put this text into a linked document instead of directly in the activity diagram.
Regards
EDIT
Thank you very much to Uffe for the solution of my problem. Here is my solution code:
public void addLinkedDocumentToElement(Element element, String noteText) throws IOException {
    String filePath = "C:\\rtfNote.rtf";
    PrintWriter writer;
    //create new file on the disk
    writer = new PrintWriter(filePath, "UTF-8");
    //convert string to ea-rtf format
    String rtfText = repository.GetFormatFromField("RTF", noteText);
    //write content to file
    writer.write(rtfText);
    writer.close();
    //create linked document to element by loading the before created rtf file
    element.LoadLinkedDocument(filePath);
    element.Update();
}
EDIT EDIT
It is also possible to work with a temporary file:
File f = File.createTempFile("rtfdoc", ".rtf");
FileOutputStream fos = new FileOutputStream(f);
String rtfText = repository.GetFormatFromField("RTF", noteText);
fos.write(rtfText.getBytes());
fos.flush();
fos.close();
element.LoadLinkedDocument(f.getAbsolutePath());
element.Update();
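A variation on the temporary-file approach, in case it is useful: rtfText.getBytes() uses the platform default charset, and the stream stays open if the write fails. The sketch below uses java.nio.file with an explicit charset instead. RtfTempFile is a hypothetical helper name, and which charset EA expects is an assumption you would need to verify.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes RTF text to a temporary file and returns its path.
final class RtfTempFile {

    static Path write(String rtfText, Charset charset) throws IOException {
        Path file = Files.createTempFile("rtfdoc", ".rtf");
        file.toFile().deleteOnExit();                  // clean up when the JVM exits
        Files.write(file, rtfText.getBytes(charset));  // opens, writes and closes the file
        return file;
    }
}
```

The returned path can then be passed on as before, e.g. element.LoadLinkedDocument(RtfTempFile.write(rtfText, StandardCharsets.UTF_8).toString()) followed by element.Update().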
First up, let's separate the linked document, which is stored in the EA project and displayed in EA's built-in RTF viewer, from an RTF file, which is stored on disk.
Element.LoadLinkedDocument() is the only way to create a linked document. It reads an RTF file and stores its contents as the element's linked document. An element can only have one linked document, and I think it is overwritten if the method is called again but I'm not absolutely sure (you could get an error instead, but the EA API tends not to work that way).
In order to specify the contents of your linked document, you must create the file and then load it. The only other way would be to go hacking around in EA's internal and undocumented database, which people sometimes do but which I strongly advise against.
In .NET you can create RTF documents using Microsoft's Word API, but to my knowledge there is no corresponding API for Java. A quick search turns up jRTF, an open-source RTF library for Java. I haven't tested it but it looks as if it'll do the trick.
You can also use EA's API to create RTF data. You would then create your intended content in EA's internal display format and use Repository.GetFormatFromField() to convert it to RTF, which you would then save in the file.
If you need to, you can use Repository.GetFieldFromFormat() to convert plain-text or HTML-formatted text to EA's internal format.
We have an Oracle BPM 10g activity that:
Reads a form-fill protected Word document template.
Merges data into the fields.
Saves the merged/filled copy to the filesystem.
Prints the document to a selected, pre-defined printer, OR to the default printer.
All of this works fine when printing to a "real" printer. However, there is now a need to output the Word document to TIFF. Attempting to use "Microsoft Document Image Writer" as one of the printer selections does not work as expected. Normally, when printing to the Microsoft Document Image Writer from Word (or any other application) directly, you're prompted for a location to save the resultant file. This prompting does not occur when attempting to print from this particular activity in BPM 10g.
Ideally, we actually would like to bypass the dialog and output the TIFF directly to the filesystem. However, I have not found a way to control this programmatically. That is, being able to specify the destination filename in code. Right now, I'm just trying to get output to the Microsoft Document Image Writer at all, to make sure it works.
So, the bottom line question(s) is/are:
Can this be done? I.e., printing to Microsoft Document Image Writer
If yes, can the file location dialog be suppressed?
How?
You said nothing about the way you're automating Word.
In Word VBA, you may use this sample to print out the active document immediately without showing the print dialog:
Public Sub PrintToXPS()
'Presume that Microsoft XPS Document Writer was already
'set up as ActivePrinter
Dim strFilePath As String
strFilePath = "C:\temp\helloworld.xps"
ActiveDocument.PrintOut Background:=False, outputfilename:=strFilePath
End Sub
There's no need to go through the print dialog for this. However, if you want to operate through the dialog object, that can be done in Word using a variable of type Word.Dialog and providing the necessary parameters, e.g.
Dim dlgFilePrint As Word.Dialog
Set dlgFilePrint = Application.Dialogs(wdDialogFilePrint)
dlgFilePrint.Update
dlgFilePrint.PrToFileName = strFilePath
dlgFilePrint.printtofile = True
'add other parameters as needed ...
'look up parameter names in Word VBA Online Help using "WdWordDialog-Enumeration"
'as key word
dlgFilePrint.Execute
What I did here with the XPS printer, you may of course do also with any other printer.
Thank you, domke consulting.
After more searching, I found this forum post on MSDN.
Adding these registry entries to suppress the dialog box and suppress post-generation output seemed to do the trick:
In HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\MODI\MDI Writer
PrivateFlags = 17 (Decimal)
OpenInMODI = 0 (Decimal)
For our purposes, this seems to work fine if we call the printOut() method with the following relevant arguments (other arguments omitted here for brevity):
document.printOut(outputFileName : "C:\\temp\\fileName.tif", printToFile : true);
I am trying to generate a PDF document from a *.doc document.
So far, thanks to Stack Overflow, I have succeeded in generating it, but with some problems.
My sample code below generates the PDF without formatting or images, just the text.
The document includes blank spaces and images which are not included in the PDF.
Here is the code:
in = new FileInputStream(sourceFile.getAbsolutePath());
out = new FileOutputStream(outputFile);
WordExtractor wd = new WordExtractor(in);
String text = wd.getText();
Document pdf= new Document(PageSize.A4);
PdfWriter.getInstance(pdf, out);
pdf.open();
pdf.add(new Paragraph(text));
docx4j includes code for creating a PDF from a docx using iText. It can also use POI to convert a doc to a docx.
There was a time when we supported both methods equally (as well as PDF via XHTML), but we decided to focus on XSL-FO.
If it's an option, you'd be much better off using docx4j to convert a docx to PDF via XSL-FO and FOP.
Use it like so:
wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
// Set up font mapper
Mapper fontMapper = new IdentityPlusMapper();
wordMLPackage.setFontMapper(fontMapper);
// Example of mapping missing font Algerian to installed font Comic Sans MS
PhysicalFont font
= PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
fontMapper.getFontMappings().put("Algerian", font);
org.docx4j.convert.out.pdf.PdfConversion c
= new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
// = new org.docx4j.convert.out.pdf.viaIText.Conversion(wordMLPackage);
OutputStream os = new java.io.FileOutputStream(inputfilepath + ".pdf");
c.output(os);
Update July 2016
As of docx4j 3.3.0, Plutext's commercial PDF renderer is docx4j's default option for docx to PDF conversion. You can try an online demo at converter-eval.plutext.com
If you want to use the existing docx to XSL-FO to PDF (or other target supported by Apache FOP) approach, then just add the docx4j-export-FO jar to your classpath.
Either way, to convert docx to PDF, you can use the Docx4J facade's toPDF method.
The old docx to PDF via iText code can be found at https://github.com/plutext/docx4j-export-FO/.../docx4j-extras/PdfViaIText/
WordExtractor just grabs the plain text, nothing else. That's why all you're seeing is the plain text.
What you'll need to do is get each paragraph individually, then grab each run, fetch the formatting, and generate the equivalent in PDF.
One option may be to find some code that turns XHTML into a PDF. Then, use Apache Tika to turn your word document into XHTML (it uses POI under the hood, and handles all the formatting stuff for you), and from the XHTML on to PDF.
Otherwise, if you're going to do it yourself, take a look at the code in Apache Tika for parsing word files. It's a really great example of how to get at the images, the formatting, the styles etc.
I have successfully used Apache FOP to convert a 'WordML' document to PDF. WordML is the Office 2003 way of saving a Word document as XML. XSLT stylesheets can be found on the web to transform this XML to XSL-FO, which in turn can be rendered by FOP into PDF (among other outputs).
It's not so different from the solution plutext offered, except that it doesn't read a .doc document, whereas docx4j apparently does. If your requirements are flexible enough to have WordML style documents as input, this might be worth looking into.
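For what it's worth, the first half of that pipeline (XML to XSL-FO via XSLT) needs nothing beyond the JDK. Below is a rough sketch with javax.xml.transform; the class name WordMlToFo is made up, the stylesheet is assumed to be one you have found or written yourself, and the FOP rendering step is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet to an XML document held in memory.
final class WordMlToFo {

    static String transform(String sourceXml, String stylesheetXml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet (assumed to map the source XML to XSL-FO).
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheetXml)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString(); // the XSL-FO document, ready to hand to a renderer
    }
}
```

For real documents you would probably use file-based StreamSource and StreamResult objects rather than strings, so the whole document does not have to sit in memory.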
Good luck with your project!
Wim
Use OpenOffice/LibreOffice and JODConverter
This also mostly works for .doc to .docx. There are problems with graphics that I have not yet worked out, though.
private static void transformDocXToPDFUsingJOD(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat pdf = converter.getFormatRegistry().getFormatByExtension("pdf");
    converter.convert(in, out, pdf);
}
private static OfficeManager officeManager;
@BeforeClass
public static void setupStatic() throws IOException {
    /*officeManager = new DefaultOfficeManagerConfiguration()
    .setOfficeHome("C:/Program Files/LibreOffice 3.6")
    .buildOfficeManager();
    */
    officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();
    officeManager.start();
}
@AfterClass
public static void shutdownStatic() throws IOException {
    officeManager.stop();
}
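Since the external manager in the code above connects to an office process that must already be running, a quick check of the port beforehand can give a clearer error than a failed connection. This is only a sketch with plain java.net sockets; OfficePortCheck is a hypothetical helper name, and port 8100 is simply the one used above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: tests whether anything accepts TCP connections on host:port.
final class OfficePortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // something accepted the connection
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }
}
```

For example, setupStatic() could fail early with a descriptive message when OfficePortCheck.isListening("localhost", 8100, 1000) returns false. Note that this only shows that some process is listening, not that it is LibreOffice.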
You need to be running LibreOffice as a server to make this work.
From the command line you can do this using:
"C:\Program Files\LibreOffice 3.6\program\soffice.exe" -accept="socket,host=0.0.0.0,port=8100;urp;LibreOffice.ServiceManager" -headless -nodefault -nofirststartwizard -nolockcheck -nologo -norestore
Another option I came across recently is using the OpenOffice (or LibreOffice) API (see here). I have not been able to get into this but it should be able to open documents in various formats and output them in a pdf format. If you look into this, let me know how it worked!