Cropping PDF using iText (java PDF library)

I need to crop a PDF file using this code. cb will output the needed coordinates for the cropping.
How can I do this efficiently for every page in a PDF file?
I wrote this code, but it does not work for every PDF input:
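import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfRectangle;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextMarginFinder;
public class CropPages {
    public static final String PREFACE = "input.pdf";
    public static final String RESULT = "cropped.pdf";
    public void addMarginRectangle(String src, String dest)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // Write to the dest argument instead of the RESULT constant
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        TextMarginFinder finder;
        PdfDictionary pageDict;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Collect the bounding box of all text on page i
            finder = parser.processContent(i, new TextMarginFinder());
            //PdfContentByte cb = stamper.getOverContent(i);
            // Pad the text bounding box by 5 units on every side
            PdfRectangle rect = new PdfRectangle((finder.getLlx() - 5), (finder.getLly() - 5),
                    (finder.getUrx() + 5), (finder.getUry() + 5));
            if (i <= 10) {
                System.out.println(rect);
            }
            pageDict = reader.getPageN(i);
            pageDict.put(PdfName.CROPBOX, rect);
        }
        stamper.close();
    }
    public static void main(String[] args) throws IOException, DocumentException {
        new CropPages().addMarginRectangle(PREFACE, RESULT);
    }
}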
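One thing that may be worth checking when a crop like this works for some inputs but not others (an assumption, since the failing PDFs are not shown) is whether the padded rectangle still lies inside the page's media box, and what should happen on a page where no text is found. The stand-alone sketch below uses plain float arrays rather than iText types. The class name, the method name and the "null means no text" convention are all hypothetical.

```java
import java.util.Arrays;

/** Stand-alone sketch: pad a text bounding box and clamp it to the page's media box. */
final class CropBoxSketch {

    /**
     * Boxes are {llx, lly, urx, ury} in PDF user-space units.
     * textBox may be null, which this sketch (hypothetically) treats as "no text on this page".
     */
    static float[] paddedCropBox(float[] textBox, float[] mediaBox, float padding) {
        if (textBox == null) {
            // Nothing to crop around: keep the whole page
            return mediaBox.clone();
        }
        // Grow the text box by the padding, but never beyond the media box
        float llx = Math.max(mediaBox[0], textBox[0] - padding);
        float lly = Math.max(mediaBox[1], textBox[1] - padding);
        float urx = Math.min(mediaBox[2], textBox[2] + padding);
        float ury = Math.min(mediaBox[3], textBox[3] + padding);
        if (llx >= urx || lly >= ury) {
            // Degenerate result: fall back to the whole page
            return mediaBox.clone();
        }
        return new float[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        float[] a4 = {0f, 0f, 595f, 842f};
        System.out.println(Arrays.toString(paddedCropBox(new float[] {72f, 100f, 500f, 780f}, a4, 5f)));
        System.out.println(Arrays.toString(paddedCropBox(new float[] {2f, 100f, 594f, 780f}, a4, 5f)));
        System.out.println(Arrays.toString(paddedCropBox(null, a4, 5f)));
    }
}
```

In the loop of the question, the four finder values and the page's media box would be the inputs, and the returned values would go into the PdfRectangle.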

Related

Problem with footer only first page docx4j

Good morning,
I am trying to create a .docx document using docx4j.
In the document I need to insert the footer only on the first page.
This is my code:
private WordprocessingMLPackage wordMLPackage;
private ObjectFactory factory;
private FooterPart footerPart;
private Ftr footer;
public void creaDocumentoConFooterImmagine() throws Exception {
wordMLPackage = WordprocessingMLPackage.createPackage();
factory = Context.getWmlObjectFactory();
Relationship relationship = this.createFooterPart();
this.createFooterReference(relationship);
wordMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
URL logo = getClass().getClassLoader().getResource("Footer.jpg");
InputStream is = logo.openStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
int n;
while ( (n = is.read(byteChunk)) > 0 ) {
baos.write(byteChunk, 0, n);
}
this.addImageInline(baos.toByteArray());
addPageBreak();
wordMLPackage.getMainDocumentPart().addParagraphOfText("This is page 2!");
File exportFile = new File("C:\\footerConImmagine.docx");
wordMLPackage.save(exportFile);
}
private void addImageInline(byte[] byteArray) throws Exception {
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordMLPackage, footerPart, byteArray);
int docPrId = 1;
int cNvPrId = 2;
Inline inLine = imagePart.createImageInline("Filename hint", "Alternative text", docPrId, cNvPrId, false);
if (footer != null) {
this.addInlineImageToFooter(inLine);
}
}
private void addInlineImageToFooter(Inline inLine) {
// Now add the in-line image to a paragraph
ObjectFactory factory2 = new ObjectFactory();
P paragraph2 = factory2.createP();
R run = factory.createR();
paragraph2.getContent().add(run);
Drawing drawing = factory.createDrawing();
run.getContent().add(drawing);
drawing.getAnchorOrInline().add(inLine);
footer.getContent().add(paragraph2);
}
private void createFooterReference(Relationship relationship) {
List<SectionWrapper> sections = wordMLPackage.getDocumentModel().getSections();
SectPr sectionProperties = sections.get(sections.size() - 1).getSectPr();
// There is always a section wrapper, but it might not contain a sectPr
if (sectionProperties == null) {
sectionProperties = factory.createSectPr();
wordMLPackage.getMainDocumentPart().addObject(sectionProperties);
sections.get(0).setSectPr(sectionProperties);
}
FooterReference footerReference = factory.createFooterReference();
footerReference.setId(relationship.getId());
footerReference.setType(HdrFtrRef.FIRST);
sectionProperties.getEGHdrFtrReferences().add(footerReference);
}
private Relationship createFooterPart() throws InvalidFormatException {
footerPart = new FooterPart();
footerPart.setPackage(wordMLPackage);
footerPart.setJaxbElement(this.createFooter("Text"));
return wordMLPackage.getMainDocumentPart().addTargetPart(footerPart);
}
// insert the footer text
private Ftr createFooter(String content) {
footer = factory.createFtr();
P paragraph = factory.createP();
R run = factory.createR();
Text text = new Text();
text.setValue(content);
run.getContent().add(text);
paragraph.getContent().add(run);
footer.getContent().add(paragraph);
return footer;
}
/**
* Adds a page break to the document.
*/
private void addPageBreak() {
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
Br breakObj = new Br();
breakObj.setType(STBrType.PAGE);
P paragraph = factory.createP();
paragraph.getContent().add(breakObj);
documentPart.getJaxbElement().getBody().getContent().add(paragraph);
}
If I set
footerReference.setType(HdrFtrRef.DEFAULT);
the footer is created properly, so the code seems correct.
I am using version 6.1.2 of the docx4j library.
How can I debug the problem?
Is there an example of this in the library's documentation?
Thanks
Regards

Itext7 Hebrew reverse issue

I have a simple piece of code that writes a PDF. Sometimes this PDF will contain RTL languages like Hebrew or Arabic.
I was able to manipulate the text and mirror it using Bidi (from IBM's ICU library).
But the lines of text still come out in reverse order.
To illustrate with English text, instead of:
The quick
brown fox
jumps over
the lazy dog
It appears as:
the lazy dog
jumps over
brown fox
The quick
Complete code:
@Test
public void generatePdf() {
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd-hh.mm.ss");
String dest = "c:\\temp\\" + formatter.format(Calendar.getInstance().getTime()) + ".pdf";
String fontPath = "C:\\Windows\\Fonts\\ARIALUNI.TTF";
FontProgramFactory.registerFont(fontPath, "arialUnicode");
OutputStream pdfFile = null;
Document doc = null;
try {
ByteArrayOutputStream output = new ByteArrayOutputStream();
PdfFont PdfFont = PdfFontFactory.createRegisteredFont("arialUnicode", PdfEncodings.IDENTITY_H, true);
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(output));
pdfDoc.setDefaultPageSize(PageSize.A4);
pdfDoc.addFont(PdfFont);
doc = new Document(pdfDoc);
doc.setBaseDirection(BaseDirection.RIGHT_TO_LEFT);
String txt = "בתשרי נתן הדקל פרי שחום נחמד בחשוון ירד יורה ועל גגי רקד בכסלו נרקיס הופיע בטבת ברד ובשבט חמה הפציעה ליום אחד. 1234 באדר עלה ניחוח מן הפרדסים בניסן הונפו בכוח כל החרמשים";
Bidi bidi = new Bidi();
bidi.setPara(txt, Bidi.RTL, null);
String mirrTxt = bidi.writeReordered(Bidi.DO_MIRRORING);
Paragraph paragraph1 = new Paragraph(mirrTxt)
.setFont(PdfFont)
.setFontSize(9)
.setTextAlignment(TextAlignment.CENTER)
.setHeight(200)
.setWidth(70);
paragraph1.setBorder(new SolidBorder(3));
doc.add(paragraph1);
Paragraph paragraph2 = new Paragraph(txt)
.setFont(PdfFont)
.setFontSize(9)
.setTextAlignment(TextAlignment.CENTER)
.setHeight(200)
.setWidth(70);
paragraph2.setBorder(new SolidBorder(3));
doc.add(paragraph2);
doc.close();
doc.flush();
pdfFile = new FileOutputStream(dest);
pdfFile.write(output.toByteArray());
ProcessBuilder b = new ProcessBuilder("cmd.exe","/C","explorer " + dest);
b.start();
} catch (Exception e) {
e.printStackTrace();
}finally {
try { if (pdfFile != null) pdfFile.close(); } catch (IOException e) { e.printStackTrace(); } // pdfFile is null if an earlier step failed
}
}

The only solution that I have found with iText7 and IBM ICU4J, without any other third-party libraries, is to first render the lines and then mirror them one by one. This requires a helper class, LineMirroring. It is not exactly the most elegant solution, but it will produce the output that you expect.
Line-mirroring class:
public class LineMirroring {
private final PageSize pageSize;
private final String fontName;
private final int fontSize;
public LineMirroring(PageSize pageSize, String fontName, int fontSize) {
this.pageSize = pageSize;
this.fontName = fontName;
this.fontSize = fontSize;
}
public String mirrorParagraph(String input, int height, int width, Border border) {
final StringBuilder mirrored = new StringBuilder();
try (ByteArrayOutputStream output = new ByteArrayOutputStream()) {
PdfFont font = PdfFontFactory.createRegisteredFont(fontName, PdfEncodings.IDENTITY_H, true);
final PdfWriter writer = new PdfWriter(output);
final PdfDocument pdfDoc = new PdfDocument(writer);
pdfDoc.setDefaultPageSize(pageSize);
pdfDoc.addFont(font);
final Document doc = new Document(pdfDoc);
doc.setBaseDirection(BaseDirection.RIGHT_TO_LEFT);
final LineTrackingParagraph paragraph = new LineTrackingParagraph(input);
paragraph.setFont(font)
.setFontSize(fontSize)
.setTextAlignment(TextAlignment.RIGHT)
.setHeight(height)
.setWidth(width)
.setBorder(border);
LineTrackingParagraphRenderer renderer = new LineTrackingParagraphRenderer(paragraph);
doc.add(paragraph);
Bidi bidi;
for (LineRenderer lr : paragraph.getWrittenLines()) {
bidi = new Bidi(((TextRenderer) lr.getChildRenderers().get(0)).getText().toString(), Bidi.RTL);
mirrored.append(bidi.writeReordered(Bidi.DO_MIRRORING));
}
doc.close();
pdfDoc.close();
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
return mirrored.toString();
}
private class LineTrackingParagraph extends Paragraph {
private List<LineRenderer> lines;
public LineTrackingParagraph(String text) {
super(text);
}
public void addWrittenLines(List<LineRenderer> lines) {
this.lines = lines;
}
public List<LineRenderer> getWrittenLines() {
return lines;
}
@Override
protected IRenderer makeNewRenderer() {
return new LineTrackingParagraphRenderer(this);
}
}
private class LineTrackingParagraphRenderer extends ParagraphRenderer {
public LineTrackingParagraphRenderer(LineTrackingParagraph modelElement) {
super(modelElement);
}
@Override
public void drawChildren(DrawContext drawContext) {
((LineTrackingParagraph)modelElement).addWrittenLines(lines);
super.drawChildren(drawContext);
}
@Override
public IRenderer getNextRenderer() {
return new LineTrackingParagraphRenderer((LineTrackingParagraph) modelElement);
}
}
}
Minimal JUnit Test:
public class Itext7HebrewTest {
@Test
public void generatePdf() {
final SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd-hh.mm.ss");
final String dest = "F:\\Temp\\" + formatter.format(Calendar.getInstance().getTime()) + ".pdf";
final String fontPath = "C:\\Windows\\Fonts\\ARIALUNI.TTF";
final String fontName = "arialUnicode";
FontProgramFactory.registerFont(fontPath, "arialUnicode");
try (ByteArrayOutputStream output = new ByteArrayOutputStream()) {
PdfFont arial = PdfFontFactory.createRegisteredFont(fontName, PdfEncodings.IDENTITY_H, true);
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(output));
pdfDoc.setDefaultPageSize(PageSize.A4);
pdfDoc.addFont(arial);
LineMirroring mirroring = new LineMirroring(pdfDoc.getDefaultPageSize(), fontName,9);
Document doc = new Document(pdfDoc);
doc.setBaseDirection(BaseDirection.RIGHT_TO_LEFT);
final String txt = "בתשרי נתן הדקל פרי שחום נחמד בחשוון ירד יורה ועל גגי רקד בכסלו נרקיס הופיע בטבת ברד ובשבט חמה הפציעה ליום אחד. 1234 באדר עלה ניחוח מן הפרדסים בניסן הונפו בכוח כל החרמשים";
final int height = 200;
final int width = 70;
final Border border = new SolidBorder(3);
Paragraph paragraph1 = new Paragraph(mirroring.mirrorParagraph(txt, height, width, border));
paragraph1.setFont(arial)
.setFontSize(9)
.setTextAlignment(TextAlignment.RIGHT)
.setHeight(height)
.setWidth(width)
.setBorder(border);
doc.add(paragraph1);
doc.close();
doc.flush();
try (FileOutputStream pdfFile = new FileOutputStream(dest)) {
pdfFile.write(output.toByteArray());
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
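To see why the order of the two steps matters, here is a small stand-alone sketch in plain Java. It uses simple character reversal as a stand-in for the ICU4J reordering and a greedy word wrap as a stand-in for iText's line breaking, so it only illustrates the idea and is not what either library does internally. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch: mirroring a whole paragraph before wrapping reverses the line order. */
final class LineMirrorSketch {

    /** Greedy word wrap: starts a new line when the next word would exceed width characters. */
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.split(" ")) {
            if (current.length() > 0 && current.length() + 1 + word.length() > width) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    /** Stand-in for bidi reordering of pure right-to-left text: reverse the characters. */
    static String mirror(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Mirrors (or un-mirrors) every line separately. */
    static List<String> mirrorEach(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(mirror(line));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        // Approach of the question: mirror the whole paragraph, then let the layout wrap it
        List<String> mirrorThenWrap = wrap(mirror(text), 12);
        // Approach of the answer: wrap first, then mirror each line on its own
        List<String> wrapThenMirror = mirrorEach(wrap(text, 12));
        // mirrorEach(...) shows what a right-to-left reader would see on each line
        System.out.println(mirrorEach(mirrorThenWrap));
        System.out.println(mirrorEach(wrapThenMirror));
    }
}
```

With the first approach the end of the paragraph lands on the first line, which is the symptom described in the question. Wrapping first and mirroring each line afterwards keeps the lines in their original order.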

IText7 only creates form/widgets on new documents

When running this code with the PdfDocument not having a read source, it works properly. When I try reading from a premade pdf it stops creating the form/widgets, but still adds the paragraph as expected. There is no error given. Does anyone understand why this is happening?
Here is the code I'm running:
public class HelloWorld {
public static final String DEST = "sampleOutput.pdf";
public static final String SRC = "sample.pdf";
public static void main(String args[]) throws IOException {
File file = new File(DEST);
new HelloWorld().createPdf(SRC, DEST);
}
public void createPdf(String src, String dest) throws IOException {
//Initialize PDF reader and writer
PdfReader reader = new PdfReader(src);
PdfWriter writer = new PdfWriter(dest);
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer); //if i do (reader, writer) the widget isn't added to the first page anymore.
// Initialize document
Document document = new Document(pdf);
HelloWorld.addAcroForm(pdf, document);
//Close document
document.close();
}
public static PdfAcroForm addAcroForm(PdfDocument pdf, Document doc) throws IOException {
Paragraph title = new Paragraph("Test Form")
.setTextAlignment(TextAlignment.CENTER)
.setFontSize(16);
doc.add(title);
doc.add(new Paragraph("Full name:").setFontSize(12));
//Add acroform
PdfAcroForm form = PdfAcroForm.getAcroForm(doc.getPdfDocument(), true);
//Create text field
PdfTextFormField nameField = PdfFormField.createText(doc.getPdfDocument(),
new Rectangle(99, 753, 425, 15), "name", "");
form.addField(nameField);
return form;
}
}

I adapted your code like this:
public static PdfAcroForm addAcroForm(PdfDocument pdf, Document doc) throws IOException {
    Paragraph title = new Paragraph("Test Form")
            .setTextAlignment(TextAlignment.CENTER)
            .setFontSize(16);
    doc.add(title);
    doc.add(new Paragraph("Full name:").setFontSize(12));
    PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
    PdfTextFormField nameField = PdfFormField.createText(pdf,
            new Rectangle(99, 525, 425, 15), "name", "");
    form.addField(nameField, pdf.getPage(1));
    return form;
}
You'll notice two changes:
I changed the Y offset of the field (525 instead of 753). Now the field is added inside the visible area of the page. In your code, the field was added, but it wasn't visible.
I defined the page to which the field needs to be added by passing pdf.getPage(1) as the second parameter of the addField() method.
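A quick way to reason about the first change is to check a field rectangle against the page size before adding it. The sketch below is plain Java and makes several assumptions: the rectangle is given as (x, y, width, height) with the origin at the lower-left corner of the page, the page's lower-left corner is at (0, 0), and the page sizes used in main are only examples, since the size of sample.pdf is not stated. The class and method names are hypothetical.

```java
/** Stand-alone sketch: does a field rectangle fit on a page of a given size? */
final class FieldPlacementSketch {

    /** True if the rectangle (x, y, width, height), measured from the lower-left corner, lies fully on the page. */
    static boolean fitsOnPage(float x, float y, float width, float height,
                              float pageWidth, float pageHeight) {
        return x >= 0 && y >= 0 && x + width <= pageWidth && y + height <= pageHeight;
    }

    /** Converts a distance measured from the top edge into the y of the rectangle's lower edge. */
    static float yFromTop(float pageHeight, float distanceFromTop, float fieldHeight) {
        return pageHeight - distanceFromTop - fieldHeight;
    }

    public static void main(String[] args) {
        // Example page sizes only; the real values would come from the existing document
        System.out.println(fitsOnPage(99f, 753f, 425f, 15f, 595f, 842f));
        System.out.println(fitsOnPage(99f, 753f, 425f, 15f, 612f, 600f));
        System.out.println(fitsOnPage(99f, 525f, 425f, 15f, 612f, 600f));
        System.out.println(yFromTop(842f, 74f, 15f));
    }
}
```

In real code the page width and height would be read from the page that the field is being added to.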

Convert .doc with images to .html using xdocreport

I am converting a doc to HTML using the following code:
private static final String docName = "This is a test page.docx";
private static final String outputlFolderPath = "C://";
String htmlNamePath = "docHtml1.html";
String zipName="_tmp.zip";
static File docFile = new File(outputlFolderPath+docName);
File zipFile = new File(zipName);
public void ConvertWordToHtml() {
try {
InputStream doc = new FileInputStream(new File(outputlFolderPath+docName));
System.out.println("InputStream"+doc);
XWPFDocument document = new XWPFDocument(doc);
XHTMLOptions options = XHTMLOptions.create(); //.URIResolver(new FileURIResolver(new File("word/media")));;
String root = "target";
File imageFolder = new File( root + "/images/" + doc );
options.setExtractor( new FileImageExtractor( imageFolder ) );
options.URIResolver( new FileURIResolver( imageFolder ) );
OutputStream out = new FileOutputStream(new File(htmlPath()));
XHTMLConverter.getInstance().convert(document, out, options);
} catch (Exception ex) {
ex.printStackTrace(); // report conversion errors instead of silently swallowing them
}
}
public static void main(String[] args) throws IOException, ParserConfigurationException, Exception {
Convertion cwoWord=new Convertion();
cwoWord.ConvertWordToHtml();
}
public String htmlPath(){
return outputlFolderPath+htmlNamePath;
}
public String zipPath(){
// d:/_tmp.zip
return outputlFolderPath+zipName;
}
The code above converts the doc to HTML fine. The issue comes when I try to convert a doc file that contains graphics,
such as a circle (shown in the screenshot). In this case, the graphics do not show up in the HTML file.
Please help me work out how to preserve the graphics from the doc in the HTML file after conversion. Thanks in advance.

You can embed the images in the html by using the following code:
Base64ImageExtractor imageExtractor = new Base64ImageExtractor();
options.setExtractor(imageExtractor);
options.URIResolver(imageExtractor);
where Base64ImageExtractor looks like:
public class Base64ImageExtractor implements IImageExtractor, IURIResolver {
    private byte[] picture;
    public void extract(String imagePath, byte[] imageData) throws IOException {
        this.picture = imageData;
    }
    private static final String EMBED_IMG_SRC_PREFIX = "data:;base64,";
    public String resolve(String uri) {
        StringBuilder sb = new StringBuilder(picture.length + EMBED_IMG_SRC_PREFIX.length())
                .append(EMBED_IMG_SRC_PREFIX)
                .append(Base64Utility.encode(picture));
        return sb.toString();
    }
}
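The class above leaves the media type of the data URI empty. As a minimal sketch of how such a URI is put together, here is a plain-Java version that uses java.util.Base64 from the JDK and fills in a media type guessed from the file extension. The class name, the method names and the extension table are hypothetical and not part of xdocreport.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Stand-alone sketch: build a base64 data URI for an image. */
final class DataUriSketch {

    /** Builds "data:<mimeType>;base64,<payload>". */
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    /** Very small extension lookup; unknown extensions fall back to a generic binary type. */
    static String guessMimeType(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            return "image/jpeg";
        }
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would pass the image data handed to extract()
        byte[] bytes = "hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri(guessMimeType("word/media/image1.png"), bytes));
    }
}
```

A resolver along these lines could keep the image path it received in extract() and use it to choose the media type in resolve().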

How to generate a dynamic number of pages using PDFBox

I have to generate a PDF file depending on some input. Each time the code runs, the input length may vary, so how can I add pages to the document dynamically depending on my input content?
public class pdfproject
{
    static int lineno=768;
    public static void main (String[] args) throws Exception
    {
        PDDocument doc= new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDPageContentStream cos = new PDPageContentStream(doc, page);
        for(int i=0;i<2000;i++)
        {
            renderText("hello"+i,cos,60);
        }
        cos.close();
        doc.save("test.pdf");
        doc.close();
    }
    static void renderText(String Info,PDPageContentStream cos,int marginwidth) throws Exception
    {
        lineno-=12;
        System.out.print("lineno="+lineno);
        PDFont fontPlain = PDType1Font.HELVETICA;
        cos.beginText();
        cos.setFont(fontPlain, 10);
        cos.moveTextPositionByAmount(marginwidth,lineno);
        cos.drawString(Info);
        cos.endText();
    }
}
How do I ensure that the content is rendered on the next page, by adding a new page dynamically, when there is no space left on the current page?

PDFBox does not include any automatic layout support. Thus, you have to keep track of how full a page is, and when it is full you have to close the current page, create a new one, reset the fill indicators, etc.
This obviously should not be done in static members of some project class but instead in a dedicated class and its instance members, e.g.
public class PdfRenderingSimple implements AutoCloseable
{
    //
    // rendering
    //
    public void renderText(String Info, int marginwidth) throws IOException
    {
        if (content == null || textRenderingLineY < 12)
            newPage();
        textRenderingLineY-=12;
        System.out.print("lineno=" + textRenderingLineY);
        PDFont fontPlain = PDType1Font.HELVETICA;
        content.beginText();
        content.setFont(fontPlain, 10);
        content.moveTextPositionByAmount(marginwidth, textRenderingLineY);
        content.drawString(Info);
        content.endText();
    }
    //
    // constructor
    //
    public PdfRenderingSimple(PDDocument doc)
    {
        this.doc = doc;
    }
    //
    // AutoCloseable implementation
    //
    /**
     * Closes the current page
     */
    @Override
    public void close() throws IOException
    {
        if (content != null)
        {
            content.close();
            content = null;
        }
    }
    //
    // helper methods
    //
    void newPage() throws IOException
    {
        close();
        PDPage page = new PDPage();
        doc.addPage(page);
        content = new PDPageContentStream(doc, page);
        content.setNonStrokingColor(Color.BLACK);
        textRenderingLineY = 768;
    }
    //
    // members
    //
    final PDDocument doc;
    private PDPageContentStream content = null;
    private int textRenderingLineY = 0;
}
(PdfRenderingSimple.java)
You can use it like this
PDDocument doc = new PDDocument();
PdfRenderingSimple renderer = new PdfRenderingSimple(doc);
for (int i = 0; i < 2000; i++)
{
    renderer.renderText("hello" + i, 60);
}
renderer.close();
doc.save(new File("renderSimple.pdf"));
doc.close();
(RenderSimple.java)
For more specialized rendering support you will implement improved rendering classes, e.g. PdfRenderingEndorsementAlternative.java from this answer.
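The bookkeeping itself can be tried out without PDFBox. The following plain-Java sketch only tracks the baseline position and counts page breaks, using the same start height (768) and line height (12) as the code above. The class name, the method names and the bottomMargin parameter are hypothetical additions for illustration.

```java
/** Stand-alone sketch: track the text baseline and count the pages that would be needed. */
final class PageCursorSketch {
    private final float topY;
    private final float bottomMargin;
    private final float lineHeight;
    private float y;
    private int pageCount;

    PageCursorSketch(float topY, float bottomMargin, float lineHeight) {
        this.topY = topY;
        this.bottomMargin = bottomMargin;
        this.lineHeight = lineHeight;
    }

    /** Moves to the next line, starting a new page when the line would fall below the bottom margin. */
    float nextLine() {
        if (pageCount == 0 || y - lineHeight < bottomMargin) {
            // This is the point where a real renderer would close the stream and add a page
            pageCount++;
            y = topY;
        }
        y -= lineHeight;
        return y;
    }

    int pageCount() {
        return pageCount;
    }

    public static void main(String[] args) {
        PageCursorSketch cursor = new PageCursorSketch(768f, 0f, 12f);
        float lastY = 0f;
        for (int i = 0; i < 2000; i++) {
            lastY = cursor.nextLine();
        }
        System.out.println("pages=" + cursor.pageCount() + ", last baseline y=" + lastY);
    }
}
```

With a bottom margin of 0 the page break happens at the same point as in PdfRenderingSimple. A larger margin keeps the last line away from the bottom edge of the page.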
