Create a New Page During Text to PDF conversion Using Itext


I am converting a text file to PDF using iText. The conversion works fine, but I need a new PDF page to be started whenever the BufferedReader encounters a certain text during conversion. This is what I have tried, but a new page is not started when that text is encountered. My sample code is below (just the relevant part):
Document output = new Document(PageSize.B3);
FileInputStream fs = new FileInputStream("C:/ABC Statements final/File.TXT");
FileOutputStream file = new FileOutputStream(new File("C:/Pdf Statements/File.PDF"));
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
PdfWriter writer = PdfWriter.getInstance(output, file);
output.open();
writer.open();
.............................
String pageend = "Page Total";
String trimmedend = br.readLine().trim();
if (trimmedend.startsWith(pageend)) {
output.newPage();
}

Maybe you need to change your if-statement to something like this:
String pageend = "page total";
...
if (trimmedend.toLowerCase().contains(pageend)) {
...
}
This way, you avoid case sensitivity, and you avoid the problem of characters that aren't considered white space appearing before "page total". Of course, this is just an educated guess; I don't know what your original data stream looks like.
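A minimal pure-Java sketch of the answer's suggestion, with the iText calls left out so the detection logic can be checked on its own. It also covers one more possible cause: the snippet in the question calls br.readLine() only once, so unless the elided part loops, only the first line is ever tested. The helper name splitIntoPages, and the choice to keep the marker line at the bottom of the current page, are assumptions for illustration; in the real conversion loop, the point where a new page list is started is where output.newPage() would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PageSplitter {

    // Hypothetical helper: groups the lines of a text file into pages.
    // A line containing the marker (case-insensitively, anywhere in the line)
    // stays as the last line of the current page; the next line starts a new page.
    public static List<List<String>> splitIntoPages(BufferedReader br, String marker) throws IOException {
        String needle = marker.toLowerCase(Locale.ROOT);
        List<List<String>> pages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String line;
        // readLine() has to be called in a loop; a single call reads only one line
        while ((line = br.readLine()) != null) {
            current.add(line);
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                pages.add(current);          // in iText: output.newPage() here
                current = new ArrayList<>();
            }
        }
        // keep any trailing lines that come after the last marker
        if (!current.isEmpty()) {
            pages.add(current);
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        String text = "line 1\nline 2\n   PAGE TOTAL: 10\nline 3\nPage Total: 5\nline 4\n";
        List<List<String>> pages =
                splitIntoPages(new BufferedReader(new StringReader(text)), "page total");
        System.out.println(pages.size()); // 3
    }
}
```

Locale.ROOT is used so that toLowerCase behaves the same regardless of the machine's default locale (for example, the Turkish dotless i).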

Related

Apache PdfBox Rotate Crop Box Only Not Text

I am trying to convert text to PDF but want only one of the pages rotated 90 degrees. The main reason is that some of my text documents are a bit too large and need to be in landscape to look normal. I have tried a few things, but everything seems to rotate the text too. Is there an easy way to rotate the PDF to landscape but keep the text in the same orientation?
OutputStream outputStream = response.getOutputStream();
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
Map<String, Document> documents = getDocuments(user, documentReports);
try (PDDocument documentToPrint = new PDDocument()){
for(Document doc : documentReports){
TextToPDF textToPDF = new TextToPDF();
textToPDF.setFont(PDType1Font.COURIER);
textToPDF.setFontSize(8);
Document documentReport = documents.get(doc.getId());
try(PDDocument pdDocument = textToPDF.createPDFFromText(new InputStreamReader(new ByteArrayInputStream(documentReport.getReportText().getBytes())))) {
pdfMergerUtility.appendDocument(documentToPrint, pdDocument);
}
}
pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
LocalDateTime localUtcTime = Java8TimeUtil.getCurrentUtcTime();
documentToPrint.getDocumentInformation().setTitle(localUtcTime.toString());
response.setHeader("Content-Disposition", "inline; filename=" + localUtcTime + ".pdf");
response.setContentType("application/pdf");
documentToPrint.save(outputStream);
}

This might not work for everyone, but I figured it out for my specific requirement. TextToPDF has a setLandscape method, which you call before creating the PDF from the text:
textToPDF.setLandscape(true);

Java Pdf content to String

I'm wondering whether there is a way to obtain the content of a PDF file (raw bytes) as a String using Apache PDFBox 2.0.8. What I'm doing is saving the PDDocument object to a ByteArrayOutputStream and then creating a new String from the ByteArrayOutputStream's byte array. But if I save the String to a file, the result is a blank PDF. The reason is that the bytes of the PDF's stream sections differ from those of a PDF saved directly from the PDDocument object to a file. After learning this, I tried to detect the ByteArrayOutputStream's character encoding using juniversalchardet, but had no luck. So, is there a way to accomplish this?
This is what I have tried so far:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PDDocument doc = new PDDocument();
... //Add page, font, pdPageContentStream and text only to doc object with some latin chars (áéíóú)
doc.save(baos);
So, if I create a file using the baos object, the PDF file looks as expected, but if I do this:
String str = new String(baos.toByteArray());
and then create a file using the bytes of str, the PDF file only shows a blank page.
Hope I was clear enough this time :)

Using this, you can append all the extracted text to a String:
StringBuilder sb = new StringBuilder();
try (PDDocument document = PDDocument.load(new File("your\\path\\file.pdf"))) {
    if (!document.isEncrypted()) {
        PDFTextStripper stripper = new PDFTextStripper();
        // order the extracted text by its position on the page
        stripper.setSortByPosition(true);
        String pdfFileInText = stripper.getText(document);
        // split on both \n and \r\n line endings
        String[] lines = pdfFileInText.split("\\r?\\n");
        for (String line : lines) {
            sb.append(line);
        }
    }
}
return sb.toString();
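The answer above extracts the text; it does not return the raw bytes the question asks about. A likely cause of the blank page is that new String(baos.toByteArray()) decodes the binary PDF bytes with the platform default charset, which can replace byte sequences it cannot decode, so writing the String back does not reproduce the original file. A minimal sketch, assuming a lossless String form is really what is needed: ISO-8859-1 maps each of the 256 byte values to exactly one char, so decoding and re-encoding with it round-trips any byte array. The class and method names are hypothetical, and the small byte array stands in for baos.toByteArray().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class PdfBytesAsString {

    // ISO-8859-1 maps every byte 0x00..0xFF to exactly one char (U+0000..U+00FF),
    // so this conversion loses nothing.
    public static String bytesToString(byte[] pdfBytes) {
        return new String(pdfBytes, StandardCharsets.ISO_8859_1);
    }

    // The inverse: each char U+0000..U+00FF becomes the original byte again.
    public static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for baos.toByteArray(): a PDF header followed by binary bytes
        // like those found in compressed content streams.
        byte[] original = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, (byte) 0x9C, 0x00};

        byte[] viaLatin1 = stringToBytes(bytesToString(original));
        System.out.println(Arrays.equals(original, viaLatin1)); // true

        // Decoding as UTF-8 replaces the malformed sequences with U+FFFD,
        // so the bytes written back are not the original PDF bytes.
        byte[] viaUtf8 = new String(original, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, viaUtf8)); // false

        // Base64 is the usual alternative when the String must pass through text formats.
        String b64 = Base64.getEncoder().encodeToString(original);
        System.out.println(Arrays.equals(original, Base64.getDecoder().decode(b64))); // true
    }
}
```

If the String has to travel through JSON, XML, or similar text formats, Base64 is usually the safer choice, since an ISO-8859-1 string built from PDF bytes can contain control characters.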

How to copy/move AcroForm fields from one document to new blank one using IText5 or IText7?

I need to copy whole AcroForm including field positions and values from template PDF to a new blank PDF file. How can I do that?
In short, I need to get rid of the "background" from the template and keep only the form fields.
The whole point of this is to create a PDF with content that will be printed on pre-printed templates.
I am using iText 5, but I can switch to 7 if useful examples are provided.

After a lot of trial and error, I have found the solution to "How to copy AcroForm fields into another PDF". It is an iText 7 version. I hope it will help somebody someday.
private byte[] copyFormElements(byte[] sourceTemplate) throws IOException {
PdfReader completeReader = new PdfReader(new ByteArrayInputStream(sourceTemplate));
PdfDocument completeDoc = new PdfDocument(completeReader);
ByteArrayOutputStream out = new ByteArrayOutputStream();
PdfWriter offsetWriter = new PdfWriter(out);
PdfDocument offsetDoc = new PdfDocument(offsetWriter);
offsetDoc.initializeOutlines();
PdfPage blank = offsetDoc.addNewPage();
PdfAcroForm originalForm = PdfAcroForm.getAcroForm(completeDoc, false);
// originalForm.getPdfObject().copyTo(offsetDoc,false);
PdfAcroForm offsetForm = PdfAcroForm.getAcroForm(offsetDoc, true);
for (String name : originalForm.getFormFields().keySet()) {
PdfFormField field = originalForm.getField(name);
PdfDictionary copied = field.getPdfObject().copyTo(offsetDoc, false);
PdfFormField copiedField = PdfFormField.makeFormField(copied, offsetDoc);
offsetForm.addField(copiedField, blank);
}
offsetDoc.close();
completeDoc.close();
return out.toByteArray();
}

Did you check the PdfCopyForms class?
Allows you to add one (or more) existing PDF document(s) to create a new PDF and add the form of another PDF document to this new PDF.
I didn't find an example, but you could try something like this:
PdfReader reader1 = new PdfReader(src1); // a document without a form
PdfReader reader2 = new PdfReader(src2); // a document with a form
PdfCopyForms copy = new PdfCopyForms(new FileOutputStream(dest));
copy.addDocument(reader1); // add the document without the form
copy.copyDocumentFields(reader2); // add the fields of the document with the form
copy.close();
reader1.close();
reader2.close();
I see that the class is deprecated. I'm not sure if that's because iText 7 makes this much easier, or because there were technical problems with the class.

how to read bullets from RTF file

I have an RTF file containing some text with bullets, as shown in the screenshot below.
I want to extract the data along with the bullets, but when I print it to the console I get junk values. How do I print exactly the same text to the console?
This is what I tried:
public static void main(String[] args) throws IOException, BadLocationException {
RTFEditorKit rtf = new RTFEditorKit();
Document doc = rtf.createDefaultDocument();
FileInputStream fis = new FileInputStream("C:\\Users\\Guest\\Desktop\\abc.rtf");
InputStreamReader i =new InputStreamReader(fis,"UTF-8");
rtf.read(i,doc,0);
System.out.println(doc.getText(0,doc.getLength()));
}
Console output:
I assumed the junk values were due to the console not supporting the charset, so I tried to generate a PDF file, but the PDF shows the same junk values.
This is the PDF code:
Paragraph de=new Paragraph();
Phrase pde=new Phrase();
pde.add(new Chunk(getText("C:\\Users\\Guest\\Desktop\\abc.rtf"),smallNormal_11));
de.add(pde);
de.getFont().setStyle(BaseFont.IDENTITY_H);
document.add(de);
public static String getText(String path) throws IOException, BadLocationException {
RTFEditorKit rtf = new RTFEditorKit();
Document doc = rtf.createDefaultDocument();
FileInputStream fis = new FileInputStream(path);
InputStreamReader i =new InputStreamReader(fis,"UTF-8");
rtf.read(i,doc,0);
String output=doc.getText(0,doc.getLength());
return output;
}

Despite what you said, my guess is that it is a console encoding problem.
Anyway, you can easily check it.
Just replace this line:
System.out.println(doc.getText(0,doc.getLength()));
With these two lines:
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
ps.println(doc.getText(0,doc.getLength()));
This forces the text to be written to the console as UTF-8 bytes (the console itself must also be set to display UTF-8).
If it is still wrong, I would suspect your file is not fully RTF-compliant.
I ran some tests, and your code (the console version; I did not try the PDF one) works well under Linux, but there the console is natively UTF-8.
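To see what the PrintStream suggestion actually changes, the sketch below writes into a byte buffer instead of System.out and prints the bytes produced for U+2022 (BULLET). That character is used only as an example; which character the RTF bullets actually decode to is not known from the question. A UTF-8 console shows these three bytes as a bullet, while a console decoding them as windows-1252 would show junk such as â€¢. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8PrintStreamDemo {

    // Encodes text the same way the UTF-8 PrintStream in the answer does,
    // but into a byte buffer so the resulting bytes can be inspected.
    public static byte[] encodeLikeUtf8Console(String text) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buffer, true, "UTF-8");
        ps.print(text);
        ps.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = encodeLikeUtf8Console("\u2022"); // U+2022 BULLET
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // the line printed is: E2 80 A2
    }
}
```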

white-space:nowrap does not work properly with Flying Saucer

I am using Flying Saucer to convert HTML documents to PDF, but there is a problem when I use <span style="white-space:nowrap">.
Generally, white-space:nowrap works fine. But when the span is near the right margin of the document, it gets trimmed.
For example:
This HTML: This is fine. <span style="white-space:nowrap">This is a test</span> gets converted to PDF like this:
which is perfect.
But when I use This is fine. This is also fine. <span style="white-space:nowrap">This is a test</span>, it gets converted to:
Notice that part of the span is cut off at the right margin.
What I expect is:
i.e. I expect the span to move to the next line.
The code I am using to convert to pdf is:
String inputFile = "test.html";
String url = new File(inputFile).toURI().toURL().toString();
String outputFile = "firstdoc.pdf";
OutputStream os = new FileOutputStream(outputFile);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
