get all acrosfields avalibles in a pdf document itext7 - java

I am trying to automatize the modification of an pdf template according to some data computing (using Java)
I have no experience with pdf modification and I am being trying to use itext7 to do this.
I have been reading how to add text to a pdf and even here I saw how to field Acrosfield if they exist using a "key"
Nevertheless, I didn't made the pdf template I am using (which is modifiable) so I don't know if the fields with you can manually fill are made with Acrosfields or another tecnology and I don'w know what are the keys or each field If they have one...
I saw this question; where it is says how to get all the fields and their values but when I try the code that appear in the only answer I get;
main.java:[40,0] error: illegal start of type
main.java:[40,19] error: ')' expected
main.java:[40,30] error: <identifier> expected
3 errors
In this part:
for (String fldName : fldNames) {
System.out.println( fldName + ": " + fields.getField( fldName ) );
}
After try a bit, I have been finding more information but I can't find a way to get these "keys" if it's possible...
------- EDIT -------
I've made this code in order to make a copy of my pdf-template which have the name of the Acrosfield's key in each field:
package novgenrs;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Set;
public class MakePDF {
public static void MakePDF(String[] args) throws IOException, DocumentException{
PdfReader reader = new PdfReader("template.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("result.pdf"));
//AcroFields form = stamper.getAcroFields();
AcroFields fields = reader.getAcroFields();
AcroFields wrt = stamper.getAcroFields();
Set<String> fldNames = fields.getFields().keySet();
for (String fldName : fldNames) {
wrt.setField(fldName, fldName) ;
}
stamper.close();
reader.close();
}
}
NOTE: this only work with itext5. For some reason when I tried to do this with itext7 I couldn't made it work so I tried to do it with itext5 and it worked!

If you want a full answer to your question, you will have to provide the PDF so that we can inspect it, but these are already some answers that will get you in the right direction.
When you refer to How to fill out a pdf file programmatically? (AcroForm technology), you refer to the iText 7 version of How to fill out a pdf file programmatically? (AcroForm technology) which is the answer to the same question, but for developers who use iText 5. As you can see, there's a big difference between iText 5 and iText 7.
However, when you refer to How do I get all the fields and value from a PDF file using iText? you get an answer that is to be used with iText 5 only. If you are using iText 7, that code won't work because it's iText 5 code.
The code you need can be found here: How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc
PdfReader reader = new PdfReader(src);
PdfDocument pdfDoc = new PdfDocument(reader);
// Get the fields from the reader (read-only!!!)
PdfAcroForm form = PdfAcroForm.getAcroForm(pdfDoc, true);
// Loop over the fields and get info about them
Set<String> fields = form.getFormFields().keySet();
for (String key : fields) {
writer.print(key + ": ");
PdfName type = form.getField(key).getFormType();
if (0 == PdfName.Btn.compareTo(type)) {
if(((PdfButtonFormField)form.getField(key)).isPushButton()){
writer.println("Pushbutton");
} else {
if(((PdfButtonFormField)form.getField(key)).isRadio()){
writer.println("Radiobutton");
}else {
writer.println("Checkbox");
}
}
} else if (0 == PdfName.Ch.compareTo(type)) {
writer.println("Choicebox");
} else if (0 == PdfName.Sig.compareTo(type)) {
writer.println("Signature");
} else if (0 == PdfName.Tx.compareTo(type)) {
writer.println("Text");
}else {
writer.println("?");
}
}
This code will loop over all the fields, and write the key to the System.out, as well as the type of field that corresponds with that key. You can also use RUPS to inspect your PDF.
You mention:
main.java:[40,0] error: illegal start of type
main.java:[40,19] error: ')' expected
main.java:[40,30] error: <identifier> expected
It isn't clear to me if this is a compiler error or a run-time error.
If it's a compiler error, you are simply missing a ) somewhere, in which case your problem is totally unrelated to iText (which is what I suspect: you just have a very simple programming error).
If it's a run-time error, there is something wrong with your PDF. In PDF, a string is delimited by parentheses. Maybe a bracket is missing somewhere in your PDF (but I doubt that: you'd get a different type of error).
In short: try a good IDE, and that IDE will tell give you an indication of where a parenthesis is missing. If you don't find that location immediately, clean up your code by adding indentation and spaces. That should make it clear where you forget a ).

Related

Fill out field for pdf revision when signing with PDFbox

I am trying to add input to my existing field in a pdf that should be signed at the end. I used the library from swisscom (https://github.com/SwisscomTrustServices/pdfbox-ais-client/blob/8d52c759ade267b0c443fcd6f15bc9635c745d72/src/main/java/com/swisscom/ais/client/impl/PdfDocument.java#L97) with PDFbox (v2.0.24) and added these lines
try {
PDAcroForm acroForm = pdDocument.getDocumentCatalog().getAcroForm();
acroForm.setSignaturesExist(true);
acroForm.setAppendOnly(true);
acroForm.getCOSObject().setDirect(true);
acroForm.getCOSObject().setNeedToBeUpdated(true);
FieldInput[] fields = new FieldInput[1];
COSObject pdfFields = acroForm.getCOSObject().getCOSObject(COSName.FIELDS);
if (pdfFields != null) {
pdfFields.setNeedToBeUpdated(true);
}
fields[0] = new FieldInput("1", "foobar");
for (int i = 0; i < fields.length; i++) {
PDField field = acroForm.getField(fields[0].id);
if (field != null) {
field.setValue(fields[0].value);
Log.info("set field: " + field.getFullyQualifiedName());
}
}
pdDocument.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
} catch (Exception e) {
Log.warn(e);
}
I get the log output that the field was set, but in the final document the field is still empty. Using this answer from PDFBox 2.0 create signature field and save incremental with already signed document was no luck for me, I think I messed up the form handling since pdfFields is null.
Update:
I added the suggestion from Tilman, now the entry gets set with
field.getCOSObject().setNeedToBeUpdated(true);
But when looking at the signature I have no information which fields are filled out:
Is it possible with pdfbox to achieve the same output as AdobeSign like where you can store more detailed information in the revision metadata? I am not able to open the file with itext rups because the pdf gets locked with a password at the end....
And what would be the best way to look the fields when everyone is done (so the fields are not shown as editable fields anymore):
setting a lock like in Adobe
setting some protection after the field was filled
The update flag must be set on the field itself
field.getCOSObject().setNeedToBeUpdated(true);
and on the appearance of the widget of the field
field.getWidgets().get(0).getAppearance().getCOSObject().setNeedToBeUpdated(true);
(this assumes that the field is its own widget and that there is only one)
It might still work if the second code part is missing because Adobe Reader updates the appearance when displaying.

Extracting answers to a flattened PDF form with iText 7

We have a few forms created from Adobe LiveCycle where users fill the dynamic forms and submits the document to our office where we stamp it with our signature and flatten it (at least most of the time - I've seen a few documents in our system that haven't been flattened yet but that can be a separate question, I'll focus on the flattened documents here because that's most of what we have).
I'm trying to use iText 7 to parse/extract the user's answers to our forms for migrating to an electronic solution that will happen a few months from now. I was able to make the example work in Java but I don't understand the process.
/*
This file is part of the iText (R) project.
Copyright (c) 1998-2020 iText Group NV
Authors: iText Software.
For more information, please contact iText Software at this address:
sales#itextpdf.com
*/
/**
* Example written by Bruno Lowagie in answer to:
* http://stackoverflow.com/questions/24506830/can-we-use-text-extraction-strategy-after-applying-location-extraction-strategy
*/
package ca.umanitoba.ad.research;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.filter.TextRegionEventFilter;
import com.itextpdf.kernel.pdf.canvas.parser.listener.FilteredEventListener;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.FileOutputStream;
import java.io.Writer;
import java.io.BufferedWriter;
public class Main {
public static final String DEST = "./target/txt/parse_custom.txt";
public static final String SRC = "./src/main/resources/pdfs/nameddestinations.pdf";
public static void main(String[] args) throws IOException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new Main().manipulatePdf(DEST);
}
protected void manipulatePdf(String dest) throws IOException {
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));
Rectangle rect = new Rectangle(36, 750, 523, 56);
CustomFontFilter fontFilter = new CustomFontFilter(rect);
FilteredEventListener listener = new FilteredEventListener();
// Create a text extraction renderer
LocationTextExtractionStrategy extractionStrategy = listener
.attachEventListener(new LocationTextExtractionStrategy(), fontFilter);
// Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.reset()
PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
parser.processPageContent(pdfDoc.getFirstPage());
// Get the resultant text after applying the custom filter
String actualText = extractionStrategy.getResultantText();
pdfDoc.close();
// See the resultant text in the console
System.out.println(actualText);
try (Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dest)))) {
writer.write(actualText);
}
}
/*
* The custom filter filters only the text of which the font name ends with Bold or Oblique.
*/
protected class CustomFontFilter extends TextRegionEventFilter {
public CustomFontFilter(Rectangle filterRect) {
super(filterRect);
}
#Override
public boolean accept(IEventData data, EventType type) {
if (type.equals(EventType.RENDER_TEXT)) {
TextRenderInfo renderInfo = (TextRenderInfo) data;
PdfFont font = renderInfo.getFont();
if (null != font) {
String fontName = font.getFontProgram().getFontNames().getFontName();
return fontName.endsWith("Bold") || fontName.endsWith("Oblique");
}
}
return false;
}
}
}
Why is there a need to specify a Rectangle? Our forms are dynamic so users can add more fields as needed and we also accept paragraphs on some of the questions so the length will always vary so it's unlikely that the coordinates of the texts will be the same.
How can I change the flow so that I can perhaps just search for the question and then get the text right after it (presumably the answer) - I don't really know what the best way to parse a PDF is. If there's no other way except providing a Rectangle, can I programmatically determine the coordinates/dimensions of the rectangles?
From the example it looks like it's filtering the text based on whether it's bolded or italicized which I probably don't need but it looks to be easy enough to fix by modifying/removing the accept() method.
Please take a look at what that example is for: In the JavaDoc comment you can read
/**
* Example written by Bruno Lowagie in answer to:
* http://stackoverflow.com/questions/24506830/can-we-use-text-extraction-strategy-after-applying-location-extraction-strategy
*/
and that stack overflow question starts with
I used the following code to get data in PDF from a particular location. I want to get bold text present in that location
When you wonder, therefore,
Why is there a need to specify a Rectangle?
the answer is: because the example is about finding bold text in a particular location.
You mention your forms were dynamic before flattening and fields don't have fixed positions. Thus, this filter probably is not optimal for your use case.
How can I change the flow so that I can perhaps just search for the question and then get the text right after it
In that case simply don't filter at all but use a plain LocationTextExtractionStrategy to extract text, search for the question text in the extracted text, and use the text thereafter up to the next question text.
Alternatively, if you still have the unflattened dynamic forms, you may consider extracting the xfa xml and extract the filled-in data from that xml.

PDFbox saying PDDocument closed when its not

I am trying to populate repeated forms with PDFbox. I am using a TreeMap and populating the forms with individual records. The format of the pdf form is such that there are six records listed on page one and a static page inserted on page two. (For a TreeMap larger than six records, the process repeats). The error Im getting is specific to the size of the TreeMap. Therein lies my problem. I can't figure out why when I populate the TreeMap with more than 35 entries I get this warning:
Apr 23, 2018 2:36:25 AM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document
public class test {
public static void main(String[] args) throws IOException, IOException {
// TODO Auto-generated method stub
File dataFile = new File("dataFile.csv");
File fi = new File("form.pdf");
Scanner fileScanner = new Scanner(dataFile);
fileScanner.nextLine();
TreeMap<String, String[]> assetTable = new TreeMap<String, String[]>();
int x = 0;
while (x <= 36) {
String lineIn = fileScanner.nextLine();
String[] elements = lineIn.split(",");
elements[0] = elements[0].toUpperCase().replaceAll(" ", "");
String key = elements[0];
key = key.replaceAll(" ", "");
assetTable.put(key, elements);
x++;
}
PDDocument newDoc = new PDDocument();
int control = 1;
PDDocument doc = PDDocument.load(fi);
PDDocumentCatalog cat = doc.getDocumentCatalog();
PDAcroForm form = cat.getAcroForm();
for (String s : assetTable.keySet()) {
if (control <= 6) {
PDField IDno1 = (form.getField("IDno" + control));
PDField Locno1 = (form.getField("locNo" + control));
PDField serno1 = (form.getField("serNo" + control));
PDField typeno1 = (form.getField("typeNo" + control));
PDField maintno1 = (form.getField("maintNo" + control));
String IDnoOne = assetTable.get(s)[1];
//System.out.println(IDnoOne);
IDno1.setValue(assetTable.get(s)[0]);
IDno1.setReadOnly(true);
Locno1.setValue(assetTable.get(s)[1]);
Locno1.setReadOnly(true);
serno1.setValue(assetTable.get(s)[2]);
serno1.setReadOnly(true);
typeno1.setValue(assetTable.get(s)[3]);
typeno1.setReadOnly(true);
String type = "";
if (assetTable.get(s)[5].equals("1"))
type += "Hydrotest";
if (assetTable.get(s)[5].equals("6"))
type += "6 Year Maintenance";
String maint = assetTable.get(s)[4] + " - " + type;
maintno1.setValue(maint);
maintno1.setReadOnly(true);
control++;
} else {
PDField dateIn = form.getField("dateIn");
dateIn.setValue("1/2019 Yearlies");
dateIn.setReadOnly(true);
PDField tagDate = form.getField("tagDate");
tagDate.setValue("2019 / 2020");
tagDate.setReadOnly(true);
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
control = 1;
doc = PDDocument.load(fi);
cat = doc.getDocumentCatalog();
form = cat.getAcroForm();
}
}
PDField dateIn = form.getField("dateIn");
dateIn.setValue("1/2019 Yearlies");
dateIn.setReadOnly(true);
PDField tagDate = form.getField("tagDate");
tagDate.setValue("2019 / 2020");
tagDate.setReadOnly(true);
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
newDoc.save("PDFtest.pdf");
Desktop.getDesktop().open(new File("PDFtest.pdf"));
}
I cant figure out for the life of me what I'm doing wrong. This is the first week I've been working with PDFbox so I'm hoping its something simple.
Updated Error Message
WARNING: Warning: You did not close a PDF Document
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1192)
at test.test.main(test.java:87)
The warning by itself
You appear to get the warning wrong. It says:
Warning: You did not close a PDF Document
So in contrast to what you think, "PDFbox saying PDDocument closed when its not", PDFBox says that you did not close a document!
After your edit one sees that it actually says that a COSStream has been closed and that a possible cause is that the enclosing PDDocument already has been closed. This is a mere possibility!
The warning in your case
That been said, by adding pages from one document to another you probably end up having references to those pages from both documents. In that case in the course of closing both documents (e.g. automatically via garbage collection), the second one closing may indeed stumble across some already closed COSStream instances.
So my first advice to simply do close the documents at the end by
doc.close();
newDoc.close();
probably won't remove the warnings, merely change their timing.
Actually you don't merely create two documents doc and newDoc, you even create new PDDocument instances and assign them to doc again and again, in the process setting the former document objects in that variable free for garbage collection. So you eventually have a big bunch of documents to be closed as soon as not referenced anymore.
I don't think it would be a good idea to close all those documents in doc early, in particular not before saving newDoc.
But if your code will eventually be run as part of a larger application instead of as a small, one-shot test application, you should collect all those PDDocument instances in some Collection and explicitly close them right after saving newDoc and then clear the collection.
Actually your exception looks like one of those lost PDDocument instances has already been closed by garbage collection, so you should collect the documents even in case of a simple one-shot utility to keep them from being GC disposed.
(#Tilman, please correct me if I'm wrong...)
Importing pages
To prevent problems with different documents sharing pages, you can try and import the pages to the target document and thereafter add the imported page to the target document page tree. I.e. replace
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
by
newDoc.addPage(newDoc.importPage(doc.getPage(0)));
newDoc.addPage(newDoc.importPage(doc.getPage(1)));
This should allow you to close each PDDocument instance in doc before losing it.
There are certain drawbacks to this, though, cf. the method JavaDoc and this answer here.
An actual issue in your code
In your combined document you will have many fields with the same name (at least in case of a sufficiently high number of entries in your CSV file) which you initially set to different values. And you access the fields from the PDAcroForm of the respective original document but don't add them to the PDAcroForm of the combined result document.
This is asking for trouble! The PDF format does consider forms to be document-wide with all fields referenced (directly or indirectly) from the AcroForm dictionary of the document, and it expects fields with the same name to effectively be different visualizations of the same field and therefore to all have the same value.
Thus, PDF processors might handle your document fields in unexpected ways, e.g.
by showing the same value in all fields with the same name (as they are expected to have the same value) or
by ignoring your fields (as they are not in the document AcroForm structure).
In particular programmatic reading of your PDF field values will fail because in that context the form is definitively considered document-wide and based in AcroForm. PDF viewers on the other hand might first show your set values and make look things ok.
To prevent this you should rename the fields before merging. You might consider using the PDFMergerUtility which does such a renaming under the hood. For an example usage of that utility class have a look at the PDFMergerExample.
Even though the above answer was marked as the solution to the problem, since the solution is buried in the comments, I wanted to add this answer at this level. I spent several hours searching for the solution.
My code snippets and comments.
// Collection solely for purpose of preventing premature garbage collection
List<PDDocument> sourceDocuments = new ArrayList<>( );
...
// Source document (actually inside a loop)
PDDocument docIn = PDDocument.load( artifactBytes );
// Add document to collection before using it to prevent the problem
sourceDocuments.add( docIn );
// Extract from source document
PDPage extractedPage = docIn.getPage( 0 );
// Add page to destination document
docOut.addPage( extractedPage );
...
// This was failing with "COSStream has been closed and cannot be read."
// Now it works.
docOut.save( bundleStream );

Writing Arabic with PDFBOX with correct characters presentation form without being separated

I'm trying to generate a PDF that contains Arabic text using PDFBox Apache but the text is generated as separated characters because Apache parses given Arabic string to a sequence of general 'official' Unicode characters that is equivalent to the isolated form of Arabic characters.
Here is an example:
Target text to Write in PDF "Should be expected output in PDF File" -> جملة بالعربي
What I get in PDF File ->
I tried some methods but it's no use here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 && UTF-16 and extracting values from them
There is some approach seems very promising to get the value "Unicode" of each character But it generate general "official Unicode" Here is what I mean
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
output is 644 but fee0 was the expected output because this character is in middle from then I should get the middle Unicode fee0
so what I want is some method that generates the correct Unicode not the just the official one
The very Left column in the first table in the following link represents the general Unicode
Arabic Unicode Tables Wikipedia
Notice:
The sample code in this answer might be outdated please refer to h q's answer for the working sample code
At First I will thank Tilman Hausherr and M.Prokhorov for showing me the library that made writing Arabic possible using PDFBox Apache.
This Answer will be divided into two Sections:
Downloading the library and installing it
How to use the library
Downloading the library and installing it
We are going to use ICU Library.
ICU stands for International Components for Unicode and it is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
To download the Library go to the downloads page from here.
Choose the latest version of ICU4J as shown in the following image.
You will be transferred to another page and you will find a box with direct links of the needed components .Go ahead and download three Files you will find the highlighted in next image.
icu4j-docs.jar
icu4j-src.jar
icu4j.jar
The following explanation for creating and adding a library in Netbeans IDE
Navigate to the Toolbar and Click tools
Choose Libraries
At the bottom left you will find new Library button Create yours
Navigate to the library that you created in libraries list
Click it and add jar folders like that
Add icu4j.jar in class path
Add icu4j-src.jar in Sources
Add icu4j-docs.jar in Javadoc
View your opened projects from the very right
Expand the project that you want to use the library in
Right Click on the libraries folder and choose add library
Finally choose the library that you had just created.
Now you are ready to use the library just import what you want like that
import com.ibm.icu.What_You_Want_To_Import;
How to use the library
With ArabicShaping Class and reversing the String we can write a correct attached Arabic LINE
Here is the Code Notice the comments in the following code
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("Arabic Font File of format.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
//The Trick in the next Line of Code But Here is some few Notes first
//We have to reverse the string because PDFBox is Writting from the left but Arabic is RTL Language
//The output will be perfect except every line will be justified to the left "It's not hard to resolve this"
// So we have to write arabic string to pdf line by line..It will be like this
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(new StringBuilder(new ArabicShaping(reverseNumbersInString(ArabicShaping.LETTERS_SHAPE).shape(s))).reverse().toString());
// Note the previous line of code throws ArabicShapingExcpetion
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
}
Here is the output
I hope that I had gone over everything.
Update : After reversing make sure to reverse the numbers again in order to get the same proper number
Here is a couple of functions that could help
public static boolean isInt(String Input)
{
try{Integer.parseInt(Input);return true;}
catch(NumberFormatException e){return false;}
}
public static String reverseNumbersInString(String Input)
{
char[] Separated = Input.toCharArray();int i = 0;
String Result = "",Hold = "";
for(;i<Separated.length;i++ )
{
if(isInt(Separated[i]+"") == true)
{
while(i < Separated.length && (isInt(Separated[i]+"") == true || Separated[i] == '.' || Separated[i] == '-'))
{
Hold += Separated[i];
i++;
}
Result+=reverse(Hold);
Hold="";
}
else{Result+=Separated[i];}
}
return Result;
}
Here is a code that works. Download a sample font, e.g. trado.ttf
Make sure the pdfbox-app and icu4j jar files are in your classpath.
import java.io.File;
import java.io.IOException;
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("trado.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(bidiReorder(s));
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
private static String bidiReorder(String text)
{
try {
Bidi bidi = new Bidi((new ArabicShaping(ArabicShaping.LETTERS_SHAPE)).shape(text), 127);
bidi.setReorderingMode(0);
return bidi.writeReordered(2);
}
catch (ArabicShapingException ase3) {
return text;
}
}
}

Arabic Text in PdfBox [duplicate]

I'm trying to generate a PDF that contains Arabic text using PDFBox Apache but the text is generated as separated characters because Apache parses given Arabic string to a sequence of general 'official' Unicode characters that is equivalent to the isolated form of Arabic characters.
Here is an example:
Target text to Write in PDF "Should be expected output in PDF File" -> جملة بالعربي
What I get in PDF File ->
I tried some methods but it's no use here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 && UTF-16 and extracting values from them
There is some approach seems very promising to get the value "Unicode" of each character But it generate general "official Unicode" Here is what I mean
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
output is 644 but fee0 was the expected output because this character is in middle from then I should get the middle Unicode fee0
so what I want is some method that generates the correct Unicode not the just the official one
The very Left column in the first table in the following link represents the general Unicode
Arabic Unicode Tables Wikipedia
Notice:
The sample code in this answer might be outdated please refer to h q's answer for the working sample code
At First I will thank Tilman Hausherr and M.Prokhorov for showing me the library that made writing Arabic possible using PDFBox Apache.
This Answer will be divided into two Sections:
Downloading the library and installing it
How to use the library
Downloading the library and installing it
We are going to use ICU Library.
ICU stands for International Components for Unicode and it is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
To download the Library go to the downloads page from here.
Choose the latest version of ICU4J as shown in the following image.
You will be transferred to another page and you will find a box with direct links of the needed components .Go ahead and download three Files you will find the highlighted in next image.
icu4j-docs.jar
icu4j-src.jar
icu4j.jar
The following explanation for creating and adding a library in Netbeans IDE
Navigate to the Toolbar and Click tools
Choose Libraries
At the bottom left you will find new Library button Create yours
Navigate to the library that you created in libraries list
Click it and add jar folders like that
Add icu4j.jar in class path
Add icu4j-src.jar in Sources
Add icu4j-docs.jar in Javadoc
View your opened projects from the very right
Expand the project that you want to use the library in
Right Click on the libraries folder and choose add library
Finally choose the library that you had just created.
Now you are ready to use the library just import what you want like that
import com.ibm.icu.What_You_Want_To_Import;
How to use the library
With ArabicShaping Class and reversing the String we can write a correct attached Arabic LINE
Here is the Code Notice the comments in the following code
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("Arabic Font File of format.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
//The Trick in the next Line of Code But Here is some few Notes first
//We have to reverse the string because PDFBox is Writting from the left but Arabic is RTL Language
//The output will be perfect except every line will be justified to the left "It's not hard to resolve this"
// So we have to write arabic string to pdf line by line..It will be like this
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(new StringBuilder(new ArabicShaping(reverseNumbersInString(ArabicShaping.LETTERS_SHAPE).shape(s))).reverse().toString());
// Note the previous line of code throws ArabicShapingExcpetion
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
}
Here is the output
I hope that I had gone over everything.
Update : After reversing make sure to reverse the numbers again in order to get the same proper number
Here is a couple of functions that could help
public static boolean isInt(String Input)
{
try{Integer.parseInt(Input);return true;}
catch(NumberFormatException e){return false;}
}
public static String reverseNumbersInString(String Input)
{
char[] Separated = Input.toCharArray();int i = 0;
String Result = "",Hold = "";
for(;i<Separated.length;i++ )
{
if(isInt(Separated[i]+"") == true)
{
while(i < Separated.length && (isInt(Separated[i]+"") == true || Separated[i] == '.' || Separated[i] == '-'))
{
Hold += Separated[i];
i++;
}
Result+=reverse(Hold);
Hold="";
}
else{Result+=Separated[i];}
}
return Result;
}
Here is a code that works. Download a sample font, e.g. trado.ttf
Make sure the pdfbox-app and icu4j jar files are in your classpath.
import java.io.File;
import java.io.IOException;
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("trado.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(bidiReorder(s));
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
private static String bidiReorder(String text)
{
try {
Bidi bidi = new Bidi((new ArabicShaping(ArabicShaping.LETTERS_SHAPE)).shape(text), 127);
bidi.setReorderingMode(0);
return bidi.writeReordered(2);
}
catch (ArabicShapingException ase3) {
return text;
}
}
}

Categories

Resources