Re-write existing pdf via iText - java

Is there a way to blank out an existing PDF file and write new data to it? I know it is possible to trim a document by deleting pages from the middle, but I haven't found any way to clear a document entirely. Thank you.

I agree with @Samuel Huylebroeck: if you are looking to create new content, just create new pages or a new document.
If you really want to, though, you should be able to remove the existing content of a page by going through the lower-level APIs that deal with content streams. (Content streams are not specific to iText, so if you want to learn more about PDF in general, you can read about them anywhere.)
I don't know whether iText will let you set a page's content stream, or the content stream's data, to null. It would be quick to try, though, if you are really committed to this approach.

Related

Make existing PDFs into templates - iText

I am trying to make some existing PDFs into templates.
Because these documents hold real data, I am replacing that data, such as names and addresses, with dummy placeholders.
Examples
[[Name]]
[[Address1]]
When I replace the text programmatically using the iText version 5 library, I can use the resulting template.
To speed things up, I tried making the replacements in Adobe DC instead.
With this method, the template stops working.
Any ideas?
From what I understand of your question, you want to:
have a template document,
fill in the template with data from a program, and
turn the result back into a PDF.
You can easily achieve some of your goals with iText.
I suggest you look into http://developers.itextpdf.com/examples/form-examples/clone-filling-out-forms

How to persistently store styled text from a Document in a database?

So I'm currently working on a program that allows users to create "posts" with styled text. Right now I'm using Java's DefaultStyledDocument, but I'm open to other options (preferably ones that implement StyledDocument, though). I originally posted something about directly serializing DefaultStyledDocuments here. However, it may be that there is a better way to store these documents. How can I do this?
Additionally, I want to be able to store these styles in a database (probably MySQL), would there be anything else I need to know when thinking about that? Can I directly export to XML?
Finally, a quick discussion on HTMLDocuments. I could use HTMLDocuments for this, but I've heard bad things about Java's HTML renderer, and I also want users to be able to easily edit the styled text. DefaultStyledDocument allows very easy editing using StyledEditorKit. So HTMLDocuments have their drawbacks, and unless an alternative can be found, I'd prefer to stick with DefaultStyledDocuments.
The easiest way to do this is to write the document to RTF text using either RTFEditorKit or the AdvancedRTFEditorKit. Which is preferable depends on what you need the document for. The advanced one has support for tables, images, indentation, and more, but requires a separate jar file. RTFEditorKit is built in, so it doesn't need a separate jar file, but it supports fewer features.
To write with AdvancedRTFEditorKit:
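// content is the StyledDocument to serialize
AdvancedRTFEditorKit editor = new AdvancedRTFEditorKit();
Writer writer = new StringWriter();
// Write the whole document, from offset 0 to its full length
editor.write(writer, content, 0, content.getLength());
writer.close();
String rtfText = writer.toString();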
RTFEditorKit uses a similar process, except that it writes to an OutputStream rather than a Writer. Either way, the result can be held as a String, which is easy to store in most types of databases.
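For the built-in kit, the round trip might look something like the sketch below. It is only an illustration: the class and method names (RtfStorageSketch, toRtf, fromRtf) are made up for this example, and it assumes the RTF bytes are kept as a Latin-1 string so that every byte survives the conversion unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper, not part of any library.
final class RtfStorageSketch {

    /** Serializes a styled document to RTF markup that fits in a text column. */
    static String toRtf(StyledDocument doc) throws IOException, BadLocationException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The built-in kit writes bytes, so an OutputStream is used here.
        new RTFEditorKit().write(out, doc, 0, doc.getLength());
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Rebuilds a styled document from RTF markup produced by toRtf. */
    static StyledDocument fromRtf(String rtf) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        byte[] bytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
        new RTFEditorKit().read(new ByteArrayInputStream(bytes), doc, 0);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        doc.insertString(0, "Hello", bold);
        doc.insertString(doc.getLength(), " world", new SimpleAttributeSet());

        String rtf = toRtf(doc);               // this string is what goes into the database
        StyledDocument restored = fromRtf(rtf); // and this is how it comes back out
        System.out.println(rtf);
        System.out.println(restored.getText(0, restored.getLength()));
    }
}
```

The string returned by toRtf can then go into an ordinary text column through whatever database access code you already have.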

How can I add a text field to an existing PDF template?

I want to create a PDF template from another template. The resulting PDF should still be a template that I can then fill with data.
I tried to use PdfStamper, but the resulting PDF is not a template. Can anyone help me? Thanks.
Let's distinguish two situations, depending on the nature of your PDF template:
You are talking about an XFA template:
In this case, the PDF is merely a container for an XML stream that defines your form. The only way to change it is by editing the XML. This is best done manually using Adobe LiveCycle Designer, but if you really want to do it programmatically, you can extract the XML from the PDF using iText, manipulate the XML using any type of XML editing software, and finally put the XML back into the PDF using iText. The programmatic solution is very difficult, as it requires you to be familiar with the XFA syntax, and the specs for XFA run to several hundred pages.
You are talking about an AcroForm template
In this case, the root dictionary has an /AcroForm dictionary of which one of the entries is a /Fields array that isn't empty. You can create a PdfReader instance for this template and pass the reader object to PdfStamper. You then create the extra fields you need (text fields, button fields,...) and add them to the stamper using the addAnnotation() method.
This is shown in the SubscribeForm example. We have an existing template subscribe.pdf and we add several buttons to it, resulting in the new template subscribe_me.pdf.
If this doesn't answer your question, please clarify, as it's generally not accepted to limit your question to saying "I try to use PdfStamper but the result pdf is not template", you should at least show what you've tried, otherwise you risk that your question will be closed.

RTF Java Parser

Here is my issue.
I need to read an RTF document and render it to a webpage (something like Google Docs), but these documents are templates. The idea is that the user can only edit certain text, and not the text that is marked as "template logic".
So far I've seen a bunch of RTF libraries that only perform rendering but won't give you access to an object that can be iterated dynamically to walk the structure of the RTF document.
My idea is to determine what can be editable and what can not, put all that info (images, text, tables, headers, footers) into a json and send it to my JS client.
Maybe this is a crazy idea, any suggestions?
When I read "template", I think "Velocity". I wonder if you can solve this by separating the template from the dynamic data: let users modify only the dynamic data, and marry it with the static, unedited template at the last minute.
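To make the idea concrete, here is a minimal sketch of that last-minute merge using only the JDK. It is not Velocity's syntax or API: the [[Field]] placeholder form and the PlaceholderMerge class are assumptions chosen purely for illustration, and a real template engine would do far more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Velocity or any other library.
final class PlaceholderMerge {

    // Matches placeholders of the assumed form [[Name]].
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[\\[(\\w+)\\]\\]");

    /** Returns the template with every known placeholder replaced by its value. */
    static String merge(String template, Map<String, String> data) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = data.get(matcher.group(1));
            // Placeholders without a value are left in place so they stay visible.
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in user data from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The template stays read-only; users only ever edit this map.
        Map<String, String> data = new HashMap<>();
        data.put("Name", "Ada");
        System.out.println(merge("Dear [[Name]], your order has shipped.", data));
    }
}
```

With this split, the client only needs the list of field names and their current values, and the static template text never has to be exposed for editing.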
It's possible that Docmosis can help because it lets you use documents and templates and you can extract from Docmosis an "analysis" of the template (eg a list of fields). It's hard to be sure if it will fit your purpose though from your description. Please note I work for the company that produces Docmosis.

How to link scanned document with its text content to make it searchable?

I have PDF documents containing several images/pages of scanned documents. Their (OCR-produced) text content comes in separate XML files.
Is it possible to use/link the text content from XML somehow to my PDF files? (Ideally there would be no additional files left in the repository to confuse unaware users.)
I've been told there's a 65k limit on a text property, so I can't simply put the text content into a property, as the text of a PDF might easily exceed that limit.
A suggestion has been made to pass a stream with the text content to the cm:content property of my PDF file. I'm kind of lost here, as in my opinion that means I'm either providing a reference or assigning a huge string again. The former would mean the text content has to be preserved somewhere as a separate document. The latter sounds like I would hit the 65k limit again.
Also, I think setting cm:content would probably delete the PDF content itself. I need the PDF binary data to remain untouched.
This is where the suggestion is being discussed. I'm currently trying that anyway.
So, it is actually quite easy. What needs to be done is to define a property of type "d:content" on your document; I do that via an aspect.
model.xml:
<aspects>
  <aspect name="mm:my_aspect">
    ...
    <properties>
      <property name="mm:myTextContentProperty">
        <type>d:content</type>
      </property>
    </properties>
  </aspect>
</aspects>
Then, when I have both PDF and its text representation in the repository, I link those two by adding the aspect and populating the property...
getNodeService().addAspect(pdfNodeRef, myAspect, null);
getNodeService().setProperty(pdfNodeRef, MyModel.MY_TEXT_CONTENT_PROPERTY, new ContentData("store://....bin", "text/plain", size, "UTF-8"));
Now the PDF can be found via both of the following queries, even though it does not contain any text data itself:
"@\\{http\\://mymodel.ns/content/1.0\\}myTextContentProperty:\"" + string + "\""
"TEXT:\"" + string + "\""
The latter is also hinted at here, and I guess that is how regular search in Alfresco Web Client works, because the PDF is now reachable using the regular search input.
There is one issue, though: the search returns both the PDF document and the document I link using the property. So now I need to hide the latter from search results...
(When searching using the first query only the PDF is found, as expected; but that approach is of little use to me.)
Hopefully this saves other Alfresco newbies some time. :)
Another way to achieve what I need would be setting MY_TEXT_CONTENT_PROPERTY using contentService...
ContentWriter writer = getContentService().getWriter(pdfNodeRef, MyModel.MY_TEXT_CONTENT_PROPERTY, true);
writer.setMimetype("text/plain");
writer.setEncoding("UTF-8");
writer.putContent(stringFromXmlDescription); // the source XML gets thrown away
(The important thing seems to be to put the content after the mimetype and encoding are set. Otherwise the content/property is not searchable.)
With this approach there's no need to hide the linked text documents, because there aren't any.
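In case it helps, this is roughly how the stringFromXmlDescription value could be produced with nothing but the JDK. It is a sketch under assumptions: the layout of the OCR XML is not shown in this thread, so the code just collects every text node in document order, and the OcrXmlText class name is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; the real OCR XML schema is not known here.
final class OcrXmlText {

    /** Returns all text found in the XML, in document order, separated by single spaces. */
    static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Walks the tree depth-first and appends each non-blank text node.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<page><line>Hello</line><line> scanned world </line></page>";
        System.out.println(extractText(xml));
    }
}
```

If the OCR output stores words in attributes rather than element text, the traversal would need to read those attributes instead.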
