iText 7 only creates form fields/widgets on new documents - java

When running this code with a PdfDocument that has no read source, it works properly. When I try reading from a premade PDF, it stops creating the form/widgets but still adds the paragraph as expected. No error is given. Does anyone understand why this is happening?
Here is the code I'm running:
public class HelloWorld {

    public static final String DEST = "sampleOutput.pdf";
    public static final String SRC = "sample.pdf";

    public static void main(String[] args) throws IOException {
        File file = new File(DEST);
        new HelloWorld().createPdf(SRC, DEST);
    }

    public void createPdf(String src, String dest) throws IOException {
        // Initialize PDF reader and writer
        PdfReader reader = new PdfReader(src);
        PdfWriter writer = new PdfWriter(dest);
        // Initialize PDF document
        PdfDocument pdf = new PdfDocument(writer); // if I do (reader, writer), the widget isn't added to the first page anymore
        // Initialize document
        Document document = new Document(pdf);
        HelloWorld.addAcroForm(pdf, document);
        // Close document
        document.close();
    }

    public static PdfAcroForm addAcroForm(PdfDocument pdf, Document doc) throws IOException {
        Paragraph title = new Paragraph("Test Form")
                .setTextAlignment(TextAlignment.CENTER)
                .setFontSize(16);
        doc.add(title);
        doc.add(new Paragraph("Full name:").setFontSize(12));
        // Add acroform
        PdfAcroForm form = PdfAcroForm.getAcroForm(doc.getPdfDocument(), true);
        // Create text field
        PdfTextFormField nameField = PdfFormField.createText(doc.getPdfDocument(),
                new Rectangle(99, 753, 425, 15), "name", "");
        form.addField(nameField);
        return form;
    }
}

I adapted your code like this:
public static PdfAcroForm addAcroForm(PdfDocument pdf, Document doc) throws IOException {
    Paragraph title = new Paragraph("Test Form")
            .setTextAlignment(TextAlignment.CENTER)
            .setFontSize(16);
    doc.add(title);
    doc.add(new Paragraph("Full name:").setFontSize(12));
    PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
    PdfTextFormField nameField = PdfFormField.createText(pdf,
            new Rectangle(99, 525, 425, 15), "name", "");
    form.addField(nameField, pdf.getPage(1));
    return form;
}
You'll notice two changes:
I changed the Y offset of the field (525 instead of 753). Now the field is added inside the visible area of the page. In your code, the field was added, but it wasn't visible.
I defined to which page the field needs to be added by passing pdf.getPage(1) as the second parameter of the addField() method.
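For completeness, a sketch of how the calling code would look once the document is opened with both a reader and a writer, so the existing pages are kept (the class name, the file paths, and the hard-coded page 1 are assumptions for the sketch, not from the original code):

```java
import java.io.IOException;

import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.forms.fields.PdfTextFormField;

public class StampForm {
    // Opens an existing PDF (reader AND writer) and stamps a text field
    // onto its first page; src must point at a PDF with at least one page.
    public static void stamp(String src, String dest) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
        PdfTextFormField name = PdfFormField.createText(pdf,
                new Rectangle(99, 525, 425, 15), "name", "");
        form.addField(name, pdf.getPage(1)); // attach the widget to page 1 explicitly
        pdf.close();
    }
}
```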

Lucene index HTML Headings

I want to index HTML files and be able to jump to the corresponding heading after receiving my search results.
I currently use an HTMLStripCharFilter for parsing my files.
public class MyAnalyzer extends Analyzer {

    public MyAnalyzer() {
        super();
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new HTMLStripCharFilter(reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        StandardTokenizer source = new StandardTokenizer();
        TokenStream result = new StandardFilter(source);
        result = new LowerCaseFilter(result);
        return new TokenStreamComponents(source, result);
    }
}
The method indexMyFile gets the path to one HTML file and creates the index, but it currently only stores the file name.
private static void indexMyFile(IndexWriter writer, Path file,
        long lastModified) throws IOException {
    try (InputStream stream = Files.newInputStream(file)) {
        Document doc = new Document();
        Field pathField = new StringField("path", file.toString(),
                Field.Store.YES);
        doc.add(pathField);
        doc.add(new TextField("contents", new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))));
        writer.addDocument(doc);
    }
}
My solution would be to add a new TextField to this Lucene Document, but at this point in the code I don't know the headings.
Is there a way of using Lucene so that I can link the content to the current heading and file name? Or do I have to use JSoup or JTidy, pass my indexMyFile method the text after each heading, and create a Lucene Document per heading, similar to this post?
I used JSoup to parse the HTML tags. Then, instead of indexing the whole file, I created a Document for each heading, containing several fields:
private void indexString(Path path, String title, String heading,
        String urlHeading, String content) throws IOException {
    Document doc = new Document();
    doc.add(new Field("title", title, TextField.TYPE_STORED));
    doc.add(new Field("heading", heading, TextField.TYPE_STORED));
    doc.add(new StringField("path", path.toString(), Field.Store.YES));
    doc.add(new StringField("urlHeading", urlHeading, Field.Store.YES));
    doc.add(new TextField("contents", content, Store.NO));
    writer.addDocument(doc);
}
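For reference, the JSoup side can look roughly like this (a sketch; the class name and the splitAtHeadings helper are my own, not from the original code): parse the file, walk the body in document order, and group text under the most recent heading, so each (heading, content) pair can then be handed to indexString.

```java
import java.util.LinkedHashMap;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: cut a parsed HTML document into per-heading chunks.
public class HeadingExtractor {

    /** Walks the body in document order and groups leaf text under the last seen heading. */
    static LinkedHashMap<String, StringBuilder> splitAtHeadings(Document html) {
        LinkedHashMap<String, StringBuilder> chunks = new LinkedHashMap<>();
        String current = ""; // bucket for any text before the first heading
        chunks.put(current, new StringBuilder());
        for (Element el : html.body().select("*")) {
            if (el.tagName().matches("h[1-6]")) {
                current = el.text();               // start a new chunk at each heading
                chunks.putIfAbsent(current, new StringBuilder());
            } else if (el.children().isEmpty()) {  // leaf element: collect its text
                chunks.get(current).append(el.text()).append(' ');
            }
        }
        return chunks;
    }
}
```

Each entry of the returned map would then become one Lucene Document via indexString.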

Create PDF Table from HTML String with UTF-8 encoding

I want to create a PDF table from an HTML string. I can create the table, but instead of text I'm getting question marks. Here is my code:
public class ExportReportsToPdf implements StreamSource {

    private static final long serialVersionUID = 1L;
    private ByteArrayOutputStream byteArrayOutputStream;
    public static final String FILE_LOC = "C:/Users/KiKo/CasesWorkspace/case/Export.pdf";
    private static final String CSS = ""
            + "table {text-align:center; margin-top:20px; border-collapse:collapse; border-spacing:0; border-width:1px;}"
            + "th {font-size:14px; font-weight:normal; padding:10px; border-style:solid; overflow:hidden; word-break:normal;}"
            + "td {padding:10px; border-style:solid; overflow:hidden; word-break:normal;}"
            + "table-header {font-weight:bold; background-color:#EAEAEA; color:#000000;}";

    public void createReportPdf(String tableHtml, Integer type) throws IOException, DocumentException {
        // step 1
        Document document = new Document(PageSize.A4, 20, 20, 50, 20);
        // step 2
        PdfWriter.getInstance(document, new FileOutputStream(FILE_LOC));
        // step 3
        byteArrayOutputStream = new ByteArrayOutputStream();
        PdfWriter writer = PdfWriter.getInstance(document, byteArrayOutputStream);
        if (type != null) {
            writer.setPageEvent(new Watermark());
        }
        // step 4
        document.open();
        // step 5
        document.add(getTable(tableHtml));
        // step 6
        document.close();
    }

    private PdfPTable getTable(String tableHtml) throws IOException {
        // CSS
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream(CSS.getBytes()));
        cssResolver.addCss(cssFile);
        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        // Pipelines
        ElementList elements = new ElementList();
        ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser parser = new XMLParser(worker);
        InputStream inputStream = new ByteArrayInputStream(tableHtml.getBytes());
        parser.parse(inputStream);
        return (PdfPTable) elements.get(0);
    }

    private static class Watermark extends PdfPageEventHelper {

        @Override
        public void onEndPage(PdfWriter writer, Document document) {
            try {
                URL url = Thread.currentThread().getContextClassLoader().getResource("/images/memotemp.jpg");
                Image background = Image.getInstance(url);
                float width = document.getPageSize().getWidth();
                float height = document.getPageSize().getHeight();
                writer.getDirectContentUnder().addImage(background, width, 0, 0, height, 0, 0);
            } catch (DocumentException | IOException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    public InputStream getStream() {
        return new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
    }
}
This code is working, and I'm getting this:
I've tried to add UTF-8,
InputStream inputStream = new ByteArrayInputStream(tableHtml.getBytes("UTF-8"));
but then I'm getting this:
I want to get something like this:
I think the problem is with the encoding, but I don't know how to solve this bug. Any suggestions?
To get bytes from a (Unicode) String in some encoding, specify it; otherwise the default system encoding is used.
tableHtml.getBytes(StandardCharsets.UTF_8)
In your case, however, "Windows-1251" seems a better match, as the PDF does not seem to use UTF-8.
Maybe the original tableHtml String was read with the wrong encoding. You might check that, if it came from a file or a database.
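A quick standard-library illustration of the mismatch (the Cyrillic sample string is my own, not from the question): encoding text with a charset that cannot represent it yields the question marks from the screenshot, while encoding and decoding with one suitable charset round-trips cleanly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        String text = "Привет"; // hypothetical Cyrillic sample

        // A charset without Cyrillic glyph mappings replaces each
        // unmappable character with '?' - the symptom in the question.
        byte[] latin = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin, StandardCharsets.ISO_8859_1)); // ??????

        // Encoding and decoding with the same matching charset round-trips.
        Charset cp1251 = Charset.forName("windows-1251");
        byte[] bytes = text.getBytes(cp1251);
        System.out.println(new String(bytes, cp1251).equals(text)); // true
    }
}
```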
You need to tell iText what encoding to use by creating an instance of the BaseFont class. Then, in your document.add(getTable(tableHtml)); step, you can add a call to the font. There is an example at http://itextpdf.com/examples/iia.php?id=199.
I can't tell how you create the table, but the PdfPTable class has an addCell(PdfPCell) method, and one constructor of PdfPCell takes a Phrase. The Phrase can be constructed with a String and a Font, and the Font class takes a BaseFont as a constructor argument.
If you look around the Javadoc for iText, you will see that various classes take a Font as a constructor argument.
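Concretely, in iText 5 that chain (BaseFont → Font → Phrase → PdfPCell) looks roughly like this sketch; the font path is an assumption, and any TTF containing Cyrillic glyphs will do:

```java
import com.itextpdf.text.Font;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;

public class CyrillicCellDemo {
    public static PdfPTable buildTable() throws Exception {
        // A font file that actually contains Cyrillic glyphs; the path is
        // an assumption - point it at any suitable TTF on your system.
        BaseFont bf = BaseFont.createFont("c:/windows/fonts/arial.ttf",
                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf, 12);

        PdfPTable table = new PdfPTable(1);
        table.addCell(new PdfPCell(new Phrase("Привет", font))); // cell text uses the Cyrillic-capable font
        return table;
    }
}
```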

How to generate a dynamic number of pages using PDFBox

I have to generate a PDF file depending on some input. Each time the code runs, the input length may vary, so how can I add pages to the document dynamically, depending on my input content?
public class pdfproject {

    static int lineno = 768;

    public static void main(String[] args) throws Exception {
        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDPageContentStream cos = new PDPageContentStream(doc, page);
        for (int i = 0; i < 2000; i++) {
            renderText("hello" + i, cos, 60);
        }
        cos.close();
        doc.save("test.pdf");
        doc.close();
    }

    static void renderText(String info, PDPageContentStream cos, int marginwidth) throws Exception {
        lineno -= 12;
        System.out.print("lineno=" + lineno);
        PDFont fontPlain = PDType1Font.HELVETICA;
        cos.beginText();
        cos.setFont(fontPlain, 10);
        cos.moveTextPositionByAmount(marginwidth, lineno);
        cos.drawString(info);
        cos.endText();
    }
}
How do I ensure that the content is rendered on the next page, by adding a new page dynamically when there is no space left on the current page?
PDFBox does not include any automatic layout support. Thus, you have to keep track of how full a page is, and you have to close the current page, create a new one, reset fill indicators, etc.
This obviously should not be done in static members of some project class, but in some dedicated class and its instance members, e.g.:
public class PdfRenderingSimple implements AutoCloseable
{
    //
    // rendering
    //
    public void renderText(String info, int marginwidth) throws IOException
    {
        if (content == null || textRenderingLineY < 12)
            newPage();
        textRenderingLineY -= 12;
        System.out.print("lineno=" + textRenderingLineY);
        PDFont fontPlain = PDType1Font.HELVETICA;
        content.beginText();
        content.setFont(fontPlain, 10);
        content.moveTextPositionByAmount(marginwidth, textRenderingLineY);
        content.drawString(info);
        content.endText();
    }

    //
    // constructor
    //
    public PdfRenderingSimple(PDDocument doc)
    {
        this.doc = doc;
    }

    //
    // AutoCloseable implementation
    //
    /**
     * Closes the current page
     */
    @Override
    public void close() throws IOException
    {
        if (content != null)
        {
            content.close();
            content = null;
        }
    }

    //
    // helper methods
    //
    void newPage() throws IOException
    {
        close();
        PDPage page = new PDPage();
        doc.addPage(page);
        content = new PDPageContentStream(doc, page);
        content.setNonStrokingColor(Color.BLACK);
        textRenderingLineY = 768;
    }

    //
    // members
    //
    final PDDocument doc;
    private PDPageContentStream content = null;
    private int textRenderingLineY = 0;
}
(PdfRenderingSimple.java)
You can use it like this:
PDDocument doc = new PDDocument();
PdfRenderingSimple renderer = new PdfRenderingSimple(doc);
for (int i = 0; i < 2000; i++)
{
    renderer.renderText("hello" + i, 60);
}
renderer.close();
doc.save(new File("renderSimple.pdf"));
doc.close();
(RenderSimple.java)
For more specialized rendering support you will implement improved rendering classes, e.g. PdfRenderingEndorsementAlternative.java from this answer.

Term frequency in Lucene 4.0

Trying to calculate term frequency using Lucene 4.0. I got document frequency working just fine, but can't figure out how to do term frequency using the API. Here's the code I have:
private static void addDoc(IndexWriter writer, String content) throws IOException {
    FieldType fieldType = new FieldType();
    fieldType.setStoreTermVectors(true);
    fieldType.setStoreTermVectorPositions(true);
    fieldType.setIndexed(true);
    fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
    fieldType.setStored(true);
    Document doc = new Document();
    doc.add(new Field("content", content, fieldType));
    writer.addDocument(doc);
}

public static void main(String[] args) throws IOException, ParseException {
    Directory directory = new RAMDirectory();
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    IndexWriter writer = new IndexWriter(directory, config);
    addDoc(writer, "Lucene is stupid");
    addDoc(writer, "Java is great");
    writer.close();
    IndexReader reader = DirectoryReader.open(directory);
    System.out.println(reader.docFreq(new Term("content", "Lucene")));
    reader.close();
}
I've tried doing something like reader.getTermVector(0, "content"), but I can't find a method to just get the frequency of a particular term in that document.
Thanks!
OK, figured it out. You can get a DocsEnum object from MultiFields and then iterate over that.
private static void addDoc(IndexWriter writer, String content) throws IOException {
    FieldType fieldType = new FieldType();
    fieldType.setStoreTermVectors(true);
    fieldType.setStoreTermVectorPositions(true);
    fieldType.setIndexed(true);
    fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
    fieldType.setStored(true);
    Document doc = new Document();
    doc.add(new Field("content", content, fieldType));
    writer.addDocument(doc);
}

public static void main(String[] args) throws IOException, ParseException {
    Directory directory = new RAMDirectory();
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    IndexWriter writer = new IndexWriter(directory, config);
    addDoc(writer, "bla bla bla bleu bleu");
    addDoc(writer, "bla bla bla bla");
    writer.close();
    DirectoryReader reader = DirectoryReader.open(directory);
    DocsEnum de = MultiFields.getTermDocsEnum(reader, MultiFields.getLiveDocs(reader), "content", new BytesRef("bla"));
    int doc;
    while ((doc = de.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
        System.out.println(de.freq());
    }
    reader.close();
}

Cropping PDF using iText (java PDF library)

I need to crop a PDF file using this code; cb will output the needed coordinates for the cropping.
How can I efficiently do this for every page in a PDF file?
I wrote the code below, but it does not work for every PDF input:
public class CropPages {

    public static final String PREFACE = "input.pdf";
    public static final String RESULT = "cropped.pdf";

    public void addMarginRectangle(String src, String dest)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(RESULT));
        TextMarginFinder finder;
        PdfDictionary pageDict;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            finder = parser.processContent(i, new TextMarginFinder());
            //PdfContentByte cb = stamper.getOverContent(i);
            PdfRectangle rect = new PdfRectangle(finder.getLlx() - 5, finder.getLly() - 5,
                    finder.getUrx() + 5, finder.getUry() + 5);
            if (i <= 10) {
                System.out.println(rect);
            }
            pageDict = reader.getPageN(i);
            pageDict.put(PdfName.CROPBOX, rect);
        }
        stamper.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        new CropPages().addMarginRectangle(PREFACE, RESULT);
    }
}
