Is there a way to micro-adjust exactly where a PDF prints on a page with iText?
Here's my problem: differences in printer manufacturer, printer age, and so on appear to cause minor variations when printing a PDF document. These variations are typically very small, but for two sample printers (both laser printers of the same model and manufacturer) the text placement differs by 1-2 millimeters between printers. This would be fine for most printing; for this task, however, it is outside acceptable tolerances.
My gut reaction is to provide some way to make micro-adjustments to exactly where the print lands, and thus account for any variation between the printers themselves. Printing does appear to be consistent for all jobs sent to a single printer, however.
Presently I have a PDF created in Adobe Acrobat Pro X that has form fields, which are filled out by a Java application and then sent to the printer.
Thank you for any and all suggestions
My first attempt to solve this issue would be to try changing the default settings of the printer driver... somehow. But I can't give any useful pointers about that.
If you want to adjust the PDFs, you should probably change the page boundaries to shift the content. For example, this code shifts the content of the first page 50 units down, for a simple PDF that has only a MediaBox.
PdfReader reader = new PdfReader("in.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("out.pdf"));
PdfDictionary pagedict = reader.getPageN(1);
PdfArray mediabox = pagedict.getAsArray(PdfName.MEDIABOX);
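// Entries 1 and 3 are the lower-left and upper-right y coordinates; floatValue() keeps fractional box values intact.
mediabox.set(1, new PdfNumber(mediabox.getAsNumber(1).floatValue() + 50));
mediabox.set(3, new PdfNumber(mediabox.getAsNumber(3).floatValue() + 50));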
stamper.close();
You'll have to adjust any other boxes (CropBox, BleedBox, etc) accordingly. Take a look at the PDF spec for information on the different page boundaries.
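Since the question measures the drift in millimeters while page boxes are expressed in PDF units, a small conversion helper may be useful. The following is only a sketch: the class and method names (PrintOffset, mmToPoints, shiftBox) are made up for illustration, and it assumes the default user space of 72 units per inch. It computes the shifted box values; writing them back into the page dictionary would still be done as in the iText code above.

```java
/** Hypothetical helper: turns a measured printer offset in millimeters into shifted page-box coordinates. */
final class PrintOffset {

    /** Default PDF user space has 72 units per inch; one inch is 25.4 mm. */
    static float mmToPoints(float mm) {
        return mm * 72f / 25.4f;
    }

    /**
     * Returns a copy of a page box [llx, lly, urx, ury] moved so that the
     * printed content shifts right by dxMm and down by dyMm.
     * Moving the box up moves the content down on paper; moving the box
     * left moves the content right.
     */
    static float[] shiftBox(float[] box, float dxMm, float dyMm) {
        float dx = mmToPoints(dxMm);
        float dy = mmToPoints(dyMm);
        return new float[] { box[0] - dx, box[1] + dy, box[2] - dx, box[3] + dy };
    }
}
```

Because each printer seems to be consistent with itself, one measured (dx, dy) pair per printer could be stored and applied to every box of the page before sending the job.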
Related
I'm attempting to perform some string validation against individual PDF pages in a file via the use of Apache PDFBox.
I'm going to be using PDFTextStripper for the majority of this, so the first issue to tackle was that all the PDFs I'm going to validate are generated as 2-up: page 1 of 2 and page 2 of 2 sit on the same page, as if you had scanned an open book face down. In addition, they were oriented incorrectly and needed rotating by 90 degrees so that PDFTextStripper could read them properly.
Using elements of the questions/solutions below, I have built a method which first crops each page exactly in half, exports the cropped pages in order to a new file, rotates each page to the correct orientation, and then saves the file:
Rotate PDF around its center using PDFBox in java
Split a PDF page in two parts [duplicate]
Visually, my method seems to work as expected until I run PDFTextStripper against the result: it appears to return the text of not just the page I want, but also of the page I cropped out of it.
To confirm the issue, I extracted a single page from the document and saved it as a new file. When running PDFTextStripper, I still get the same results even though all I can see is literally one page. Adobe's search doesn't bring up the hidden, legacy data either.
I can only assume that during my transform method, I need to redefine the cropped page so that it holds only the contents of the cropped page.
My question is, how can I do this?
P.S. I haven't posted my code as it's basically an amalgamation of the solutions in the links above; however, if it is needed, I can provide it.
The PDFTextStripper ignores the CropBox you set to crop the pages. It also ignores whether text is covered by a filled rectangle or an image, and whether the text is invisible; it extracts all text (except text in patterns or text contained in Type 3 font characters).
You might want to try the PDFTextStripperByArea instead. This class (which is derived from PDFTextStripper) restricts itself to regions you can define.
(Unfortunately these regions have to be defined using a different coordinate system than the one used for the CropBox, so usually you will have to transform the coordinates first.)
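The coordinate transformation mentioned above can be sketched as follows. This is a minimal illustration, not PDFBox code: the class and method names are hypothetical, and it assumes an unrotated page whose MediaBox starts at (0, 0), with the region expected in a top-left-origin system where y grows downwards.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: converts a crop box in PDF user space to a top-left-origin rectangle. */
final class CropBoxRegion {

    /**
     * PDF boxes are given as lower-left (llx, lly) and upper-right (urx, ury)
     * with the origin at the bottom-left and y pointing up. The returned
     * rectangle uses the top-left corner as origin with y pointing down.
     * Assumes an unrotated page whose MediaBox starts at (0, 0).
     */
    static Rectangle2D.Float toTopLeftRegion(float llx, float lly, float urx, float ury, float pageHeight) {
        float width = urx - llx;
        float height = ury - lly;
        float top = pageHeight - ury; // distance from the top edge of the page to the top of the box
        return new Rectangle2D.Float(llx, top, width, height);
    }
}
```

The resulting rectangle is the kind of value one would pass to PDFTextStripperByArea.addRegion(name, rect) before calling extractRegions(page) and getTextForRegion(name). For rotated pages the mapping is more involved and is not covered by this sketch.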
I'm generating a document from XSL to be printed on a pre-printed sheet, which works fine.
Now I want to give the user a better preview (in the on-screen PDF version) by adding a background image which emulates the "pre-printed" stuff on the sheet, to give the user some context for what they are printing.
The question is: is there any way to set a background image in XSL (using Apache FOP) that is visible only in the PDF but not in the printed version of it?
Thank you all for reading or giving any advice.
Although, as the comments state, you can't have content in the PDF that does not come out in a physical printed copy, here is one possible workaround. Depending on how your users will ultimately be using FOP for PDF rendering and how you are driving the workflow, it's possible to pass a parameter into an XSLT file before the transformation phase is run. So you could potentially do a dual rendering of the same PDF: one that is presented to the user with the background image enabled, and one that gets printed. You could set a variable similar to how they do in this Example, call it something like $isPreview, and use a simple if or choose statement to check for 'Y' or 'N'.
Since you are sending to a printer, you may even want to take advantage of FOP's ability to generate PostScript rather than PDF. I've used this feature quite extensively for print documents while also producing a PDF copy for electronic delivery via email or hosted services, and I've yet to find any discrepancy between the PDF rendering and what is printed after sending a rendered PostScript file, so it should work well for you too.
As I said, this is not truly a solution to your problem as you've presented it, but as a workaround it could get you the desired results if you're clever about how you implement it.
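The parameter idea can be tried without FOP, using only the JDK's built-in XSLT support. The sketch below is an illustration under assumptions: the stylesheet is a toy one that emits a <background/> marker element instead of a real XSL-FO background image, and the class name and render method are made up for this example. Only the parameter name isPreview and its 'Y'/'N' values come from the suggestion above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo of switching stylesheet output with an isPreview parameter. */
final class PreviewSwitch {

    // Toy stylesheet: <background/> is emitted only when isPreview is 'Y'.
    // In a real XSL-FO stylesheet the xsl:if would wrap the background image instead.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:param name='isPreview' select=\"'N'\"/>"
        + "<xsl:template match='/'>"
        + "<page><xsl:if test=\"$isPreview = 'Y'\"><background/></xsl:if>"
        + "<body><xsl:value-of select='doc'/></body></page>"
        + "</xsl:template></xsl:stylesheet>";

    /** Runs the transformation once, with the parameter set for preview or for print. */
    static String render(String xml, boolean preview) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("isPreview", preview ? "Y" : "N");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling render twice on the same input, once with preview set to true and once with false, gives the two variants: one for the screen and one for the printer.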
I don't think the statement that it is not possible is true; I am just not sure how to create such a PDF with FOP. Certainly you can add an image field: one would use a button field and place the image in the button, then set the properties of that button so that it does not print (printable false).
PDF supports images in fields: https://answers.acrobatusers.com/adding-image-field-form-q41825.aspx
RenderX supports PDF Form fields but I do not see where they support an image inside the button, only text: http://www.renderx.com/reference.html#PDF%20Forms. But they do support setting a field to "printable".
I have been using the iText library for Java to fill in a PDF document automatically. The first thing I do is map every field. Once every field is mapped, I save the field names in Strings so that they are easy to access.
So far, so good. The problem is that I have a group of 6 checkboxes with the same variable name. For example, they are named topmostSubform[0].Page2[0].p2_cb01[0].
Through some tests I figured out that if I check the first checkbox, then topmostSubform[0].Page2[0].p2_cb01[0] = 1
If I check the second one (which unchecks the first automatically), then topmostSubform[0].Page2[0].p2_cb01[0] = 2
Then topmostSubform[0].Page2[0].p2_cb01[0] = 3, and so on until it reaches 6, which is the last one.
I am using form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1"); to fill in the fields. When I fill in the value 1, the first checkbox gets checked, but when I fill in the value 2, which should check the second checkbox, it does not work. It does not matter whether I choose 2, 3, 4, 5 or 6; it just does not work. The checkboxes stay empty and I can't check them.
Here is a piece of the code:
String _5_1 = "topmostSubform[0].Page2[0].p2_cb01[0]";
AcroFields form = stamper.getAcroFields();
form.setField(_5_1, "3");
Please, I need suggestions.
Allow me to quote from ISO-32000-1 section 12.7.3.2 "Field names":
It is possible for different field dictionaries to have the same fully
qualified field name if they are descendants of a common ancestor with
that name and have no partial field names (T entries) of their own.
Such field dictionaries are different representations of the same
underlying field; they should differ only in properties that specify
their visual appearance. In particular, field dictionaries with the
same fully qualified field name shall have the same field type (FT),
value (V), and default value (DV).
If we apply this to your question: it is possible for different field dictionaries to have the same name topmostSubform[0].Page2[0].p2_cb01[0]. Such field dictionaries are different representations of the same field and they shall have the same value.
There are two options:
If you have a PDF with field dictionaries with name (topmostSubform[0].Page2[0].p2_cb01[0]) that have different values, then you don't have a valid PDF file: it is in violation of ISO-32000-1, which is the official PDF specification.
Maybe you think that you have check boxes with the same field name and different values, but maybe those check boxes are in reality a radio field with different radio buttons. Maybe you are not using the correct values. Maybe something else is at play. For a SO reader to be able to help you, he'd need to see the PDF file.
If option 1 applies, abandon all hope: you have a bad PDF. Fix it or throw it away. If option 2 applies, please share the PDF.
Update after inspecting the PDF file:
Option 2 applies. You have a hybrid form, which means that the form is described twice inside the PDF, once using AcroForm technology and once using XFA. Please start by reading my answer to the following question: PDFTK and removing the XFA format
When you open the PDF in Adobe Reader, you will notice that the fields act as if they are radio buttons. When you click one, it is selected; when you click another, that one is selected and the first one is deselected.
What you see, is the form as described in XFA, and there are some important differences between the XFA form and the AcroForm description. This isn't an error. It's inherent to hybrid forms.
When you fill out the form using:
form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1");
iText fills out the AcroForm correctly, but it fails at filling out the XFA form because iText makes an educated guess (not an accurate guess) as to where the corresponding value should be set in the XFA stream (which is actually expressed in XML). For more details: this is explained in chapter 8 of iText in Action - Second Edition.
What I usually do in such cases is exactly what the person who asked if he could safely throw away the XFA part does: I remove the XFA part:
AcroFields form = stamper.getAcroFields();
form.removeXfa();
This simplifies things dramatically, but it doesn't solve your problem yet. To solve your problem, we need to look inside the PDF:
As you can see in the screen shot (taken from iText RUPS), there are two different descriptions for the form: you have a /Fields array (the AcroForm description) and you have an /XFA part that consists of different streams that, if you join them, form a large XML file.
We also see that where you think there's a single field topmostSubform[0].Page2[0].p2_cb01[0], there are in reality 6 fields:
topmostSubform[0].Page2[0].p2_cb01[0]
topmostSubform[0].Page2[0].p2_cb01[1]
topmostSubform[0].Page2[0].p2_cb01[2]
topmostSubform[0].Page2[0].p2_cb01[3]
topmostSubform[0].Page2[0].p2_cb01[4]
topmostSubform[0].Page2[0].p2_cb01[5]
Now let's take a look inside those fields.
This is field topmostSubform[0].Page2[0].p2_cb01[0]:
These are AcroForm check boxes, but there is an instruction meant for humans that says: select only one. This instruction can be understood by humans only, not by machines or software.
My first attempt at writing the FillHybridForm example failed because I made a similar error to yours. I didn't look closely enough at the different appearance states. I thought that the On value of topmostSubform[0].Page2[0].p2_cb01[0] was 0, of topmostSubform[0].Page2[0].p2_cb01[1] was 1, and so on. It wasn't... The On value of topmostSubform[0].Page2[0].p2_cb01[0] was 1, of topmostSubform[0].Page2[0].p2_cb01[1] was 2, and so on.
This is how you can fill out all the check boxes:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
form.removeXfa();
form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1");
form.setField("topmostSubform[0].Page2[0].p2_cb01[1]", "2");
form.setField("topmostSubform[0].Page2[0].p2_cb01[2]", "3");
form.setField("topmostSubform[0].Page2[0].p2_cb01[3]", "4");
form.setField("topmostSubform[0].Page2[0].p2_cb01[4]", "5");
form.setField("topmostSubform[0].Page2[0].p2_cb01[5]", "6");
stamper.close();
reader.close();
}
Now all the check boxes are checked. See f8966_filled.pdf:
Of course: being human, we know that we shouldn't do this, because we should treat the fields as if they were radio buttons, but there is no technical reason in the AcroForm description why we couldn't. The logic that prevents us from doing so is only present in the XFA description.
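If you want to mimic the "select only one" behavior yourself, you could compute the values for all six fields before calling setField. The following is only a sketch: the class and method names are hypothetical, it assumes (as found in this particular form) that the On value of box i is i + 1, and it assumes the off state is named Off.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: computes field values so that exactly one check box of a group is on. */
final class ExclusiveCheckBoxes {

    /**
     * Returns field name -> value pairs for baseName[0] .. baseName[count - 1].
     * Assumption taken from this specific form: the On value of box i is i + 1.
     * All other boxes are set to "Off".
     */
    static Map<String, String> selectOne(String baseName, int count, int selected) {
        if (selected < 0 || selected >= count) {
            throw new IllegalArgumentException("selected must be between 0 and " + (count - 1));
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String fieldName = baseName + "[" + i + "]";
            values.put(fieldName, i == selected ? String.valueOf(i + 1) : "Off");
        }
        return values;
    }
}
```

Each entry of the returned map could then be passed to form.setField(name, value) in a loop, for example with baseName set to topmostSubform[0].Page2[0].p2_cb01 and count set to 6.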
This solves your problem if it is acceptable to throw away the XFA part. It will also solve your problem if it's OK to flatten the form in which case you should add:
stamper.setFormFlattening(true);
If the above options aren't acceptable, you shouldn't throw away the XFA part, but fill out the AcroForm part as described above and use iText to extract the XML dataset (see datasets in the first screen shot), update it the way the US government expects you to update it, and use iText to put the updated dataset back in the datasets object.
Phew... This is one of the longest answers I ever wrote on StackOverflow.
Disclaimer: I have been trying to do this with iText. I have read, studied, asked, queried, experimented, and did everything I could think of to make this work. I am infuriated. Please don't think I haven't tried and just came here so that I can get someone else to do this for me; that's not the case. I want to learn, I want to figure this out. I am looking for a good direction from someone that has experience with this.
I have a PDF that contains editable text fields. What I want to do is programmatically read from that PDF and take in the text from the text fields that are already there, take text from somewhere else in my app, and write the previous information + the new text information back to the original PDF.
What I have tried:
- reading the PDF with PdfReader
- using PdfStamper(PdfReader, FileOutputStream) to write to the PDF
- using reader.getAcroFields() to get the text fields
I have scoured the web for days now and I can't get this to resolve. When I do this:
String in = "C:/Users/me/Desktop/file.pdf";
String out = "C:/Users/me/Desktop/file.pdf";
PdfReader reader = new PdfReader(in);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(out));
AcroFields form = stamper.getAcroFields();
form.setField("dateDisc1", "5/21");
it ends up creating a PDF (file.pdf) that is corrupted.
If there is an easier way to do this, please help me to shine some light on this.
Thank you!
Yeah Sorry, I didn't notice the filenames.
Original answer:
Okay, there is no simple way; I found this out many moons ago.
PDF is closely related to PostScript (it additionally contains fonts and such), and converting PDF to PostScript is very easy (I just ran a command and worked on the PostScript from there).
It's not like LaTeX. PostScript is for printers: it has a stack where you can push states and such, every letter is positioned absolutely, and a PostScript file is a set of instructions for a virtual machine that the printer then interprets.
Text highlighting and such comes from higher-level knowledge that the text flows from left to right and so on. I read the PostScript standard, got what I wanted and have not touched it since. This isn't a great answer, but it will certainly point you in the right direction.
Remember that PDFs and PostScript documents are not made to be edited. They don't do text wrapping and such, and if you zoom you have to pan; they exist to preserve the format, for printers and the like.
Figured it out on my own. I created a walkthrough for others in the future looking to do something similar:
I have scoured the web for days trying to find a simple way of doing this. Unsuccessful, I dug my heels in and was determined that if it is possible, I will make it work. I have seen several dozen locations all over the internet of people asking how to do this; now, here is a well documented example.
//Define the location of the PDF and establish a new file to write to. We will change the target later//
String dest = System.getProperty("user.home").concat("directory_and_name_of_PDF.pdf");
String out = System.getProperty("user.home").concat("directory_and_name_that_will_be_changed.pdf");
PdfReader pdfreader = new PdfReader(dest);
PdfStamper stamper = new PdfStamper(pdfreader, new FileOutputStream(out));
AcroFields form = stamper.getAcroFields();
//Append text to the text fields//
form.setField("text field name", "text to add");
form.setField("text field name2", "repeat");
form.setField("text field name3", "repeat");
form.setField("text field name4", "repeat");
stamper.close();
pdfreader.close();
//Change the file name of fileOutputStream to the original that was read from//
File destfile = new File(dest);
File outfile = new File(out);
destfile.delete();
outfile.renameTo(destfile);
And there we have it. Be aware that this will delete the original file and rename the fileOutputStream to the original file name; if there was any information from the original that you didn't read in, it will be lost. Make sure to gather ALL of the information that you wish to copy over. Also, bear in mind that I set up a conditional statement (if) to make sure that whatever text in the text fields that was read in wouldn't be written over. You will have to do this or you risk not carrying over the previous text to the new PDF.
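The delete-then-rename step can also be written with java.nio.file, which reports failures as exceptions instead of the boolean return values of File.delete() and File.renameTo(). The following is a sketch under assumptions: the SafeOverwrite class and its Writer callback are hypothetical names, and the callback stands in for the iText stamping code, which would write to the supplied stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: writes to a temporary file first, then replaces the original. */
final class SafeOverwrite {

    /** Hypothetical callback that produces the new file contents. */
    interface Writer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes the new contents to a temporary file next to the target and only
     * then moves it over the target, so the original is never open for
     * reading and writing at the same time.
     */
    static void replace(Path target, Writer writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "stamped-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if writing or moving failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With this shape the original file is replaced only after the new one has been written completely; if writing fails, the original stays untouched.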
If there are any questions, feel free to ask. I am in no way a professional developer but I can offer advice on what I know. Remember to research before you ask. Good luck!
Using JSP and Jasper Reports, I made an application for printing A4 label pages.
I have to configure my application (set alignment on the PDF page to be generated) according to different pages (2×5, 2×7, 3×10 and 3×11 grids), different printers (Kyocera, OKI and HP) and different PDF viewers (Adobe, Foxit and Nitro).
Example: in Jasper Reports I set up an A4 page with a 2×5 grid, and a user who has Foxit Reader prints it on a Kyocera. If another user also has a Kyocera but is using Adobe Reader, the space between the columns gets smaller. And if the user has Foxit Reader but prints on an OKI, the whole document shifts left and even gets cut off.
Configuring each individual label page is unavoidable, but is it possible to avoid setting the page alignment for a specific PDF viewer or printer (at least one of those)? The answer could be a way to skip the PDF generation, or some conventional configuration that all printers would interpret, so that my page is printed exactly the same regardless of PDF viewer or printer.
Are you sure this is not a setting in Foxit Reader and/or Adobe Reader that is causing the issue?
I know that in Adobe Reader there is a setting in the print dialog under Page Sizing & Handling. Users should choose Actual Size so that no scaling or manipulation of the image takes place.
In Foxit Reader it is under Print Handling. You need to set Scaling Type to None. The default seems to be Fit to Printer Margins.
You should not have to do anything different for each PDF reader and/or each version and combination of printer. That is a maintenance nightmare that you should not try to take on. You would have to make changes every time they bought a new printer, and potentially whenever an upgrade to their reader comes out.
Best bet is to figure out why they are producing different results, and tackle that issue, instead of brute forcing the problem. I am pretty sure this is more a training issue with your users, and telling them about these settings should clear up the problem.
UPDATE: After some more digging it seems to be possible to set the value of the Print Scaling while exporting. After you create your JRPdfExporter you need to set JRPdfExporterParameter.PRINT_SCALING to JRPdfExporterParameter.PRINT_SCALING_NONE:
exporter.setParameter(JRPdfExporterParameter.PRINT_SCALING, JRPdfExporterParameter.PRINT_SCALING_NONE);
I do not know if this will work for Foxit Reader also, but I would assume it would.