Adding custom metadata to binary files - java

How can I attach custom metadata to files, without using a database, so that it can be extracted later with Apache Tika? The file extensions are .doc, .docx, .pdf, .txt, ...

For PDF files, this can be done using PDFBox.
There is an example on GitHub, with its description on Medium:
https://github.com/enisinanaj/pdfbox-metadata-example
https://medium.com/#enisinanaj/how-to-write-custom-metadata-to-a-pdf-document-in-java-with-pdfbox-f52a82ab1b09
Just add a main method and call insertMetadata().
To set a custom metadata value on the document's PDDocumentInformation object (info), use
info.setCustomMetadataValue("ispublished", "true");

Related

How to create a package with excel sheet data in AEM?

I have to build a package using the Package Manager in CRXDE. I need to add hundreds of paths to the filter by copying them in one by one, which takes a lot of time. Is there a way to upload a data file, such as an Excel or CSV file containing all the paths, and build a package from it?
If the paths are query-able via AEM, you may try using the Query Packager tool from ACS AEM Commons.
Alternatively, you can write code to read through the CSV file and create the package programmatically.
A few people have already created similar tools that you can use (see below), or you can build your own:
Create Package in AEM from Excel File
Selective content packaging
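As a rough sketch of the "read the CSV and build it programmatically" route: the snippet below turns a list of paths (first column of each CSV line) into the filter.xml that a content package carries under META-INF/vault. The class and method names are made up for illustration, and it only produces the filter definition. Assembling and installing the package itself is left to whichever packaging API or tool you use.

```java
import java.util.ArrayList;
import java.util.List;

class FilterXmlBuilder {

    // Returns the first column of every line that looks like a repository path.
    // Header rows and blank lines are skipped because they do not start with "/".
    static List<String> readPaths(List<String> csvLines) {
        List<String> paths = new ArrayList<>();
        for (String line : csvLines) {
            String first = line.split(",", 2)[0].trim();
            if (first.startsWith("/")) {
                paths.add(first);
            }
        }
        return paths;
    }

    // Minimal escaping for values placed inside a double-quoted XML attribute.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Builds a workspace filter with one <filter root="..."/> element per path.
    static String buildFilterXml(List<String> paths) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<workspaceFilter version=\"1.0\">\n");
        for (String path : paths) {
            xml.append("    <filter root=\"").append(escapeXml(path)).append("\"/>\n");
        }
        xml.append("</workspaceFilter>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Inline sample data; in practice the lines would come from Files.readAllLines(csvFile).
        List<String> csv = List.of("path,comment", "/content/site/en,home page", "", "/content/dam/site/logo.png");
        System.out.print(buildFilterXml(readPaths(csv)));
    }
}
```

The sample paths are placeholders. An Excel file would first need to be exported to CSV, or read with a spreadsheet library, before being passed to readPaths.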

Generate PDF files using iText and apache velocity template(.vm)

What is the general workflow to generate a PDF using iText and an Apache Velocity template file (.vm) in Java?
I am interested in knowing steps like: parse template file, put Java object in context and steps to be performed to generate pdf etc.
I know this is a very basic question. But I am not able to find even a single example of this type on the web. I found XDocReport, but I am interested to know other alternatives as well.
Please help me with some sample project link or at least the steps to get started.
Yes, you can.
It all depends on how complex you want the PDFs to be.
Here are the steps for basic functionality:
Generate an HTML file using an Apache Velocity template file (.vm).
Use com.itextpdf.text.html.simpleparser.HTMLWorker (deprecated) to parse/convert that HTML file into a PDF.
Additionally, you can use com.itextpdf.text.pdf.PdfCopy.PageStamp to add content (borders, stamps, notes, annotations, etc.) to an existing PDF.
There is also com.itextpdf.tool.xml.XMLWorker for more advanced HTML conversion (adding style sheets, etc.).
Generating a PDF directly from an Apache Velocity template file (.vm) with iText is not possible, because:
PDF is a binary format,
Velocity generates plain text content.
In other words, Velocity cannot generate PDF.
XDocReport is able to generate a docx/odt report by merging a docx/odt template, which contains some Velocity/Freemarker syntax, with a Java context. The generated docx/odt report can then be converted to pdf/xhtml.
It works because docx/odt files are zip archives that contain several XML entries. If you unzip a docx you will see word/document.xml. In this entry, you will see the content that you typed in MS Word. word/document.xml is plain text, so Velocity can be used in this case.
Here is the XDocReport process to generate a pdf from a docx template which uses Velocity:
Load the docx template. This step consists of unzipping the docx and storing each XML entry in a map (entry name as key, byte array as value). For instance, the map contains the key word/document.xml with the XML content of this entry as its value.
Loop over each XML entry that must be merged with the Java context. For instance, word/document.xml is merged with the Java context by using Velocity, and the result of the merge replaces the word/document.xml value in the map.
Rebuild a new docx by zipping each entry of the map.
At this point we have a generated docx (the report).
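The three steps above can be sketched with nothing but the JDK's zip classes. This is not XDocReport's actual code: plain replacement of $name placeholders stands in for the real Velocity merge, the class and method names are invented for illustration, and the "template" is a one-entry archive rather than a real docx.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxMergeSketch {

    // Step 1: unzip the archive into a map of entry name -> entry bytes.
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                entries.put(e.getName(), in.readAllBytes());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    // Step 2: merge one XML entry with the context and put the result back in the map.
    // Simple "$key" replacement is a stand-in for a template engine; values are not XML-escaped.
    static void merge(Map<String, byte[]> entries, String entryName, Map<String, String> context) {
        String xml = new String(entries.get(entryName), StandardCharsets.UTF_8);
        for (Map.Entry<String, String> variable : context.entrySet()) {
            xml = xml.replace("$" + variable.getKey(), variable.getValue());
        }
        entries.put(entryName, xml.getBytes(StandardCharsets.UTF_8));
    }

    // Step 3: zip every entry of the map into a new archive.
    static byte[] zip(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Build a tiny stand-in template; a real docx contains many more entries.
        Map<String, byte[]> template = new LinkedHashMap<>();
        template.put("word/document.xml", "<w:t>Hello $name</w:t>".getBytes(StandardCharsets.UTF_8));
        byte[] docxTemplate = zip(template);

        Map<String, byte[]> entries = unzip(docxTemplate);
        merge(entries, "word/document.xml", Map.of("name", "World"));
        byte[] report = zip(entries);

        // prints: <w:t>Hello World</w:t>
        System.out.println(new String(unzip(report).get("word/document.xml"), StandardCharsets.UTF_8));
    }
}
```

Keeping every entry in the map, not only the merged one, is what lets the rebuilt archive carry over the template's untouched parts (styles, relationships, and so on).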
To convert it to another format, XDocReport provides a docx-to-pdf converter based on Apache POI and iText. Here is the XDocReport process to convert a docx to pdf:
Load the docx with Apache POI.
Loop over each POI structure (XWPFParagraph, etc.) to create the corresponding iText structure (iText Paragraph).
Note that XDocReport is modular and you can use other converters as well.
First, we use a FreeMarker template to generate an HTML file, and then render the HTML to a PDF file with ITextRenderer. Finally, we can view the PDF file in the browser; there is a very useful JavaScript tool for this called pdf.js. Maybe you can try it.

Java Utility to convert content of any file to text file.

I am looking for a Java utility that can convert any type of file (pdf, doc, docx, xls, xlsx, csv, rtf, txt) to text. We have a requirement where a user can upload any type of file, and we need to read the content of the file (text only), convert it, and store it in an object. That can be done using Apache POI, but I am wondering whether a Java utility for this already exists.
You may be interested in Apache Tika, which includes the functionality of Apache POI and PDFBox. From the project description, the toolkit: "detects and extracts metadata and structured text content from various documents using existing parser libraries."
I imagine you can't have some sort of universal function for every type of file. You will need to implement conversion methods for each file type. This link helps with PDF files, and will also give you a template to work with your other file types.
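One way to organise "a conversion method per file type" is a small registry keyed by file extension. The sketch below is only an illustration under that assumption: the class and method names are hypothetical, and only plain-text formats are wired up, since the binary formats (pdf, doc, xls, ...) would each need a parser library behind their entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class TextExtractors {

    // Maps a lower-case file extension to the method that extracts text from such a file.
    private static final Map<String, Function<Path, String>> BY_EXTENSION = new HashMap<>();

    static {
        // Plain-text formats can be read directly.
        BY_EXTENSION.put("txt", TextExtractors::readPlainText);
        BY_EXTENSION.put("csv", TextExtractors::readPlainText);
        // Binary formats would be registered here, each backed by a suitable parser.
    }

    // Reads the whole file as UTF-8 text (assumes the upload is UTF-8 encoded).
    static String readPlainText(Path file) {
        try {
            return Files.readString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the lower-case extension without the dot, or "" if there is none.
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Dispatches to the extractor registered for the file's extension.
    static String extractText(Path file) {
        Function<Path, String> extractor = BY_EXTENSION.get(extensionOf(file));
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + file.getFileName());
        }
        return extractor.apply(file);
    }
}
```

Calling TextExtractors.extractText(Path.of("notes.txt")) returns the file's content, while an unregistered type such as a .pdf fails fast with an IllegalArgumentException until an extractor is added for it.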

Converting a xml file into jrxml file

Is there a way to convert an XML file to a JRXML file?
I am working on a project where I need to write data from the database into an MS Word file that is already designed with the required template. I have converted that document into XML format.
I am using iReport to generate the resulting doc file, but it requires a JRXML file and fails to read the XML file.
Bijil,
JRXML is a template that defines the format of the content shown in the report.
And from what I understand, your XML contains the input data.
JasperReports works like this: you create a JASPER file by compiling the JRXML file (this can be done in iReport or from your Java code). You then pass this JASPER file an object from your Java code that contains the data used to fill the report.
Please see this link for details
Edited: iReport is a designer tool for creating Jasper reports. I am not sure whether there is any tool that can convert XML to JRXML. A jrxml file contains syntax specific to JasperReports.
What we used to do was recreate the client's report in iReport (matching its look and feel) to get the final jrxml.
Compile the jrxml in iReport and compare the look and feel of the generated sample report with the sample Word document.
Then use the compiled jasper file in the application directly. Using the compiled jasper file has two advantages:
you can use unicode characters in your report
you avoid the overhead of compiling the report every time before generating it.
Disadvantage:
you need to keep track of the jrxml separately, in order to fix any defect in a previously compiled jasper file.
Save your MS Word file (the template) in the report directory and, using Apache POI,
make the necessary edits to your template. Then save the file wherever you like.
Apache POI tutorials
Link 01:
To print, save your edited file in a temp directory and call Desktop.getDesktop().print(file).
Link 02:
Wish you good luck.
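The Desktop.getDesktop().print(file) call mentioned above is not available everywhere (for example on a headless server), so it is worth guarding. A minimal sketch, with a hypothetical helper class name, assuming the edited file has already been saved to disk:

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;

class PrintHelper {

    // Returns true only if the file was handed over to the platform's print handler.
    static boolean printIfSupported(File file) {
        if (!file.isFile()) {
            return false; // nothing to print
        }
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false; // no desktop integration on this machine
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.PRINT)) {
            return false; // the platform does not offer a print action
        }
        try {
            desktop.print(file);
            return true;
        } catch (IOException e) {
            return false; // no associated application, or it failed to launch
        }
    }
}
```

Printing is delegated to whatever application the operating system associates with the file type, so this only works on a machine where such an application is installed.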

Adding content to ms word, Include macro

I have a JSP file that writes all the data from the database into an MS Word document by setting the content type.
I need to add a header and footer to the same document. I couldn't find a direct way to do this from JSP without using APIs like POI, so I created a macro, which works locally.
How do I add this to a dynamically generated Word file?
I had a similar problem with POI and Excel.
The solution is to manually create a template .doc file, with the macro present. Then in your code, load that document, amend it with your data, and save it. The macro will be preserved from the template document.
I'd use POI or docx4j to create a docx file on the server, and add the header/footer as part of that process.
