Java docx4j modify template for header footer with SQL data - java

I want to take a Word doc/docx template that already has pre-designed headers and footers and replace certain words with values specific to the generated document, taken from user input that has been saved in MySQL. I already have a working program that collects the user input and saves it to MySQL. However, I'm a little confused about how the Word manipulation fits into this.
I found docx4j and a tutorial that shows what I am looking for here, and I found example code in another question on this site here. As I'm a beginner with this, the things I'm confused about are:
I understand JAXB is used for converting to and from XML. Why is this relevant in a situation like this? Or if it's not, in what case would it be?
I am seeing two versions of loading:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File("P:\\Engineering\\Projects\\Naming\\EX_TEMP.docx"));
and the second example:
private WordprocessingMLPackage getTemplate(String name) throws Docx4JException, FileNotFoundException {
    WordprocessingMLPackage template = WordprocessingMLPackage.load(new FileInputStream(new File(name)));
    return template;
}
(Where would you put the file directory in the second example, or how can you specify the file you want to load?)
What does hyperlinkresolver do and why is it necessary? (second link)
What does applying the binding mean in this situation? (second link)
What is the content accessor? (first link)
Am I going about this the right way, or is there an easier/better way of doing this?
I am using Eclipse with Java on Windows 7, if that helps.
I would appreciate any help, thanks!
Also if anyone has any examples with good comments or explanations, that would be helpful!

You probably ought to take a step back and decide which approach to injecting your data you want to take. Docx4j supports three approaches:
replacing variables on the document surface (brittle but simple)
mail merge (using MERGEFIELD), good for legacy documents
content control data binding (your 2nd link; the modern/sophisticated/powerful approach, but you need to understand XML, and may be overkill here)
For answers to most of your specific questions, please take the time to read docx4j's Getting Started guide.
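To make the first approach concrete: variable replacement amounts to a text substitution inside the XML of a document part. The following is a minimal sketch of that idea in plain Java, not docx4j's own implementation; the ${name} placeholder syntax, the class and method names, and the sample XML are assumptions made for illustration. It also shows why the approach is brittle: if Word has split a placeholder across two runs, the substitution silently misses it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of docx4j).
class PlaceholderSketch {
    // Matches placeholders of the assumed form ${name}
    private static final Pattern VAR = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Replaces every known ${name} in the XML text; unknown names are left as-is.
    static String replaceVariables(String xml, Map<String, String> values) {
        Matcher m = VAR.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Values come from user input, so escape the characters XML treats specially.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("projectName", "Bridge & Tunnel"); // e.g. a value read from MySQL

        // Placeholder contained in a single run: replaced.
        String header = "<w:p><w:r><w:t>Project: ${projectName}</w:t></w:r></w:p>";
        System.out.println(replaceVariables(header, values));

        // Placeholder split across two runs: not matched, so left unchanged.
        String split = "<w:r><w:t>${project</w:t></w:r><w:r><w:t>Name}</w:t></w:r>";
        System.out.println(replaceVariables(split, values));
    }
}
```

With docx4j itself, the same substitution would be applied to the main document part and to each header and footer part of the template; the Getting Started guide covers the actual API.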

Related

Is there any way to create a dynamic word document to an existing template in Java?

I need to automatically generate 4 different types of CVs using Java/Spring. The information is already in the database in a structured way, but we need to generate a Word document for each of the 4 CV types. As you may have noticed, the Europass format has sections such as work experience and education and training that need to be repeated more than once.
I have seen a docx4j approach where you create an XML file and adjust the Word document to comply with that XML. However, what I can't figure out for now is how to add repeating sections, for example a list of experiences. Not only do I have to repeat the actual data, but I also have to duplicate the text in the existing template.
If any of you know of any other library/plug-in/tech that might help me dynamically create a Word document (the CV) using Java, please let me know.

XDocReport generate report : loop thru collection in table (java)

I have been struggling with trying to follow a code sample by XDocReport(open source project).
I followed this tutorial from the website:
https://code.google.com/p/xdocreport/wiki/DocxReportingJavaMainListFieldInTable
I used the Freemarker template style.
It would not iterate and create the table; I just got back $variable as text in the output doc.
Then I dug further, and discovered that this tutorial on the website was probably not updated for the newer version. I found some more examples in this url, which contains a zip file.
https://code.google.com/p/xdocreport/downloads/detail?name=docxandfreemarker-1.0.4-sample.zip
I still could not get it to work.
I was hoping someone would have a working code sample that takes a java collection and populates a table in a Word document.
I hope one of the developers of XDocReport, angelo.zerr, would give some input on this.
Sincerely,
P
I was hoping someone would have a working code sample that takes a java collection and populates a table in a Word document.
What is the problem with https://code.google.com/p/xdocreport/wiki/DocxReportingJavaMainListFieldInTable?
I suggest that you create an issue on the XDocReport forum with a very simple case (simple Java main + docx).
It seems that the issue was the template. If one sets up a mail merge field in a Word template and doesn't use it in the Java program, the program then complains that it can't find the variable, or something to that effect. And if you just delete the mail merge text in the document, it may still exist as a mail merge field variable in the Word document.
So it seems one needs to be very careful about how things are set up in the template.
I think the API should be able to ignore a field that is set up in the template but not referenced in the code. In any case, cleaning up the template solved the problem.

How to find and extract "main" image in website

I need help tackling a problem. I need a program which, given a site, finds and extracts the "main" picture, i.e. the one which represents the site. (To say it is the biggest or the first picture is sometimes but not always true).
How should I approach this? Are there any libraries that could help me with this?
Thanks!
OPTION 1
You could check out Goose. It does something similar to what Pocket and Readability do, i.e. it tries to extract the main article from a given webpage using a set of heuristics. It can apparently also extract the main image from that article, but it is a bit hit and miss, so 60% of the time it works every time.
It used to be a Java project but was rewritten in Scala.
From the readme
Goose will try to extract the following information:
Main text of an article
Main image of article
Any Youtube/Vimeo movies embedded in article
Meta Description
Meta tags
Publish Date
Try it here: http://jimplush.com/blog/goose
OPTION 2
You could use a Java wrapper (e.g. GhostDriver) for running a headless browser, like PhantomJS. Then, fetch the website and find the img element with the largest dimensions. This GhostDriver test case shows how to query the DOM for elements and get their rendered size.
OPTION 3
Use a library like jsoup that helps you parse HTML. Then get the value from the src attribute from all img tags. Request each URL you find for an image and measure their sizes. The one with the biggest dimensions is likely to be the website's main image.
Another solution would be to extract the meta tags for social media sharing first. If they are present, you are lucky; otherwise you can still try the other solutions.
<meta property="og:image" content="http://www.example.com/image.jpg"/>
<meta name="twitter:image" content="http://www.example.com/image.jpg">
<meta itemprop="image" content="http://www.example.com/image.jpg">
If you are using jsoup, the code would look like this:
String imageUrlOpenGraph = document.select("meta[property=og:image]").stream()
        .findFirst()
        .map(doc -> doc.attr("content").trim())
        .orElse(null);
String imageUrlTwitter = document.select("meta[name=twitter:image]").stream()
        .findFirst()
        .map(doc -> doc.attr("content").trim())
        .orElse(null);
String imageUrlGooglePlus = document.select("meta[itemprop=image]").stream()
        .findFirst()
        .map(doc -> doc.attr("content").trim())
        .orElse(null);
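Each of the three lookups yields either a URL or null, so some rule is needed to pick one. A minimal sketch in plain Java that takes the first usable value in a fixed order; the class name, the method name, and the priority order (Open Graph, then Twitter, then itemprop) are assumptions for illustration, not something the meta tag conventions prescribe:

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
class MainImagePicker {
    // Returns the first candidate that is neither null nor blank, trimmed.
    static Optional<String> firstUsable(String... candidates) {
        return Stream.of(candidates)
                .filter(c -> c != null && !c.trim().isEmpty())
                .map(String::trim)
                .findFirst();
    }

    public static void main(String[] args) {
        // Stand-ins for the three values extracted from the page
        String imageUrlOpenGraph = null;
        String imageUrlTwitter = "http://www.example.com/image.jpg";
        String imageUrlGooglePlus = null;

        Optional<String> mainImage =
                firstUsable(imageUrlOpenGraph, imageUrlTwitter, imageUrlGooglePlus);

        // If no meta tag is present, fall back to one of the other options.
        System.out.println(mainImage.orElse("no meta image found"));
    }
}
```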
You could use a service like embedly. Among a lot of other information they allow you to extract the main image of any page. Works particularly well for articles. You can try it here.
You need artificial intelligence to do so, Computer Vision namely.
It is too big to fit in an answer. This link might help.
If you are a mathematician with experience of Probability and Bayes rule, then you can just take the unit called Image Processing and Computer Vision.
If you are looking for available software you want to use check this out...
This stackoverflow thread might help...
There's this software called moodstocks which might help.
ImageResolver can do that for you without the need of server side interaction, except for a small proxy script.

How to edit docx field contents with Java?

I have a .docx template with fields defined in it. I need to take data inputted by a user in a web-service and insert it into those fields using Java.
My team and I have been researching this for most of the day, and we have been unable to find a straightforward solution to this.
Is there a way to do this relatively easily?
Thanks.
EDIT:
After pressing alt+F9, all of the fields display like this: { FORMTEXT }
POI doesn't seem to have sufficient support to do this.
I was unable to successfully set up the Open Office SDK in Windows XP because I couldn't fulfill all of its dependencies.
docx4j may work, but MailMerger in it is currently not filling the fields in with the given data.
If I extract the docx and open the word/document.xml file, this is what the XML around one field looks like: http://pastebin.com/uXBtz7X5 (search for FieldName and FieldValue to see where these are defined)
Have a look at docx4j, which you can use to update fields in docx documents. There is also an example:
fieldupdater example
Disclosure: my company sponsors docx4j
Have a look at MailMerger; see the main method at the bottom.
For fields of other types, you can try the more generic field support.
The docx format is a zip file, with XML and other files inside. You may be able to edit the XML files using standard XML tools.
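As a rough sketch of that idea using only the JDK: copy the archive entry by entry and pass the XML of selected parts through an edit function. The class and method names are made up for illustration, the part names (word/document.xml, word/headerN.xml, word/footerN.xml) are the ones Word conventionally writes rather than guaranteed ones, and the parts are assumed to be UTF-8 encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class DocxZipSketch {
    // Assumed part names: main document plus headers and footers
    private static final String EDITABLE = "word/(document|header\\d*|footer\\d*)\\.xml";

    // Copies the archive, applying 'edit' to the XML text of the editable parts.
    static byte[] rewriteXmlParts(byte[] docx, UnaryOperator<String> edit) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                byte[] data = readAll(in);
                if (e.getName().matches(EDITABLE)) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = edit.apply(xml).getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(e.getName()));
                out.write(data);
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result.toByteArray();
    }

    // Returns the text of one entry, or null if the archive has no such entry.
    static String readPart(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (e.getName().equals(name)) {
                    return new String(readAll(in), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Builds a small in-memory archive from name/content pairs (demo data only).
    static byte[] zipOf(String... namesAndContents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                out.putNextEntry(new ZipEntry(namesAndContents[i]));
                out.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Reads the current entry (or any stream) to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] template = zipOf("word/document.xml", "<w:t>FieldValue</w:t>");
        byte[] filled = rewriteXmlParts(template, xml -> xml.replace("FieldValue", "Alice"));
        System.out.println(readPart(filled, "word/document.xml"));
    }
}
```

The edit is passed in as a function so that the plain string replacement used in main can be swapped for a proper XML parser, which is the safer choice for anything beyond a simple text swap.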
Docmosis and JODReports might help you - they are Java libraries for producing documents / populating templates in several formats. Docmosis can work with DocX and since they are based on the same technologies JODReports probably can too. I don't know if the particular {FORMTEXT} field is going to work, but Docmosis can work with plain-text fields or Word's merge fields which look like {MERGEFIELD} when you press ALT-F9.

What technologies are there for formatted, structured data input and output?

I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.
The second part of this is getting a "reduced" resume back out of the database. In other words, I want to search the uploaded content I now have and only print out new resumes for people who have, let's say, Java programming experience. So I can go into my database, find the people who originally had Java as a skill, and output a new set of resumes that are still in a nice templated format and only have the relevant info in them, instead of ALL the content.
I have been writing some software to do this in Java that basically uses a docx template, overwriting the items in the customXML which are bound to the content controls in the doc, so the new data shows up and can be saved as a new docx with that custom data.
This seems really cumbersome to me, and has some limitations. For one, let's say my template has a place for 3 skills, and a particular person has 8 skills. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I don't want to have to go back into my software and edit source code to change that additional data input XML tag to bold instead of italic.
I was reading up on using InfoPath to create a form that I could use to get the input, connecting to some SharePoint data source or something to store the stripped-out data. However, I can't seem to find out whether it is possible using SharePoint to get the data back out in a nicely formatted way. What would the general steps for this be? I couldn't find very much about this topic with some quick googling.
Thanks
You could set up the skills:
<skills>
  <skill>..</skill>
  <skill>..</skill>
</skills>
and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.
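On the Java side, the XML for such a container can be generated from a list of any length, so the template never needs to know how many skills a person has. A minimal sketch using the JDK's DOM and Transformer APIs; the class and method names are hypothetical, the element names follow the snippet above, and wiring the result into the docx as the bound custom XML part is left out:

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
class SkillsXmlSketch {
    // Builds <skills> with one <skill> child per list entry.
    static String toXml(List<String> skills) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("skills");
            doc.appendChild(root);
            for (String name : skills) {
                Element skill = doc.createElement("skill");
                skill.setTextContent(name); // escaped when serialized
                root.appendChild(skill);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build skills XML", e);
        }
    }

    public static void main(String[] args) {
        // e.g. skills loaded from the database for one person
        List<String> skills = Arrays.asList("Java", "Grails", "SQL", "R&D");
        System.out.println(toXml(skills));
    }
}
```

Because formatting lives in the template's repeat content control, changing a skill from italic to bold is then a template edit rather than a source code change.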
