Apache POI or docx4j for dealing with docx documents [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Which library is better for reading a docx document as Java objects, and why?
In other words, which library supports the most Word tags?

Disclosure: I lead the docx4j project
Although docx4j can also handle pptx and xlsx, it is mostly used for docx manipulation. By way of illustration, as at the time of writing, there are nearly 1000 topics in the docx4j forum. The pptx forum has only 10% of the volume.
Whatever you want to do with the docx document, docx4j ought to be able to help you. There's a single page overview of a generic workflow.
For many common requirements, docx4j provides a higher-level API. These include:
Create/open/save docx (of course)
Report/document generation, using a variety of approaches: (i) Variable substitution, (ii) XML data binding (particularly strong), and (iii) Mailmerge
Export as HTML, XHTML
Export as PDF (with font support)
For anything else, you can manipulate the JAXB representation of the docx to your heart's content. JAXB is a Java community standard, included in Java 6, and with a strong alternative implementation in EclipseLink's MOXy. (POI uses XML Beans instead of JAXB)
There's a web app to help you explore a docx, and generate Java code to create corresponding Java objects.
Of course, if there is some specific task you have in mind, it may be that docx4j or POI has a particular strength there.
Both docx4j and POI are ASL v2 licensed.
docx4j is actively maintained; its source code is on GitHub.
In addition, commercial support is available for docx4j if you want it, as are several commercial extensions eg MergeDocx.
docx4j does rely on POI as a library for its implementation of the OLE 2 Compound Document format, which we're grateful for.

I think Apache POI's main focus is on spreadsheets, though it has features for reading Word documents, and it uses XMLBeans to do so.
docx4j mainly deals with docx documents, using JAXB. JAXB maps XML to Java objects, so I think docx4j would be preferable for your case.

If you are dealing with docx documents, docx4j is more convenient than Apache POI.
You can use the following links to learn the basics of docx4j. There is also a helpful docx4j forum.
1. http://blog.iprofs.nl/2012/09/06/creating-word-documents-with-docx4j/
2. http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j?

I tried Apache POI, but when printing anything from a docx file (for example, all "Heading1" elements), the output contained a lot of bad data and whitespace. docx4j avoids this bad data; I tried it.
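To make the "Heading1" example concrete, here is a minimal sketch that uses only the JDK, not the docx4j or POI API. It relies on two facts about the format: a docx file is a zip archive, and paragraph styles are recorded in WordprocessingML as `w:pStyle` elements. It also makes some assumptions: that the main part lives at `word/document.xml` (the usual location, though strictly it is found via the package relationships) and that the heading style id is literally `Heading1`. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: plain-JDK extraction of paragraphs with a given style id.
class HeadingExtractor {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every w:p whose w:pStyle value equals styleId. */
    static List<String> headings(InputStream documentXml, String styleId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(documentXml);

            List<String> result = new ArrayList<>();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                NodeList styles = p.getElementsByTagNameNS(W_NS, "pStyle");
                if (styles.getLength() == 0) {
                    continue; // paragraph has no explicit style
                }
                String val = ((Element) styles.item(0)).getAttributeNS(W_NS, "val");
                if (!styleId.equals(val)) {
                    continue;
                }
                // A paragraph's visible text is split across w:t elements.
                StringBuilder text = new StringBuilder();
                NodeList runs = p.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    text.append(runs.item(j).getTextContent());
                }
                result.add(text.toString());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    /** Opens the docx as a zip; assumes the main part is word/document.xml. */
    static List<String> headingsFromDocx(String docxPath, String styleId) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return headings(in, styleId);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            headingsFromDocx(args[0], "Heading1").forEach(System.out::println);
            return;
        }
        // No file given: run against a tiny hand-written document body.
        String sample = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:pPr><w:pStyle w:val=\"Heading1\"/></w:pPr>"
                + "<w:r><w:t>Introduction</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Body text</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        headings(in, "Heading1").forEach(System.out::println);
    }
}
```

This only covers the simplest case. Style inheritance, localized style ids, and text inside fields or tracked changes are the kind of detail a library such as docx4j or POI handles for you.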

Related

Any open source API to convert to a PDF file in Java [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I need to convert the file formats below to PDF:
TIF, TIFF, TXT, JPG, JPEG, BMP, DOC, DOCX, XLS, XLSX, PPT, PPTX, GIF, PDF
Is there an open source API that can convert these to PDF? I tried Apache POI, but it does not look sufficient. Please let me know if any open source API is available.

Creating a PDF that contains nothing but an image is quite easy using the iText library; its web site has an example that shows how to do that.
Converting Excel files is not hard; the Apache POI library can be used for reading the Excel file, and then again the iText library can be used for creating PDFs that contain tables.
Word can be dealt with in a similar manner (POI also supports it), but it'll be quite a bit trickier, especially if the file contains tables and images, since the POI API for handling DOC/DOCX isn't as advanced as the one handling XLS/XLSX, and of course Word files have a less regular structure than Excel files.
JAI won't be of any help with this.
There are commercial packages available that can be used from Java applications; you may want to investigate those before embarking on writing your own, especially if you need to deal with complex documents - writing your own converter that handles those and generates good quality output could easily take a couple of weeks (or a month) of your time.

Create a pdf for J2EE applications [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
Could anyone please tell me a way to create a PDF document for a J2EE application other than iText?
We previously used iText, but the problem is the html file (which is generated from Jsp) display is different with the generated PDF. So I need some other way to create a PDF that looks the same as the JSP display.
Can anyone suggest libraries other than iText?
Thanks in advance.

You are probably using iText 5 and XML Worker. Have you tried iText 7 and pdfHTML? See the HTML to PDF tutorial.
You will need:
iText 7: https://github.com/itext/itext7
the pdfHTML add-on: https://github.com/itext/i7j-pdfhtml
You claim:
the problem is the html file (which is generated from Jsp) display is different with the generated PDF.
That is certainly true when you use HTMLWorker (which you shouldn't) and it's true in many cases for XML Worker. But we rewrote iText from scratch because of the mismatch between the old iText architecture and the requirements when converting HTML to PDF.
If you have a problem with the HTML to PDF conversion, please explain the problem in a question and tag that question as an iText question. If we can improve iText 7 + pdfHTML, why wouldn't we do that?

This is always tricky because HTML and PDF have differing purposes (with a lot of overlap). That means "simply" converting between the two is sometimes not going to work well.
You can:
Snapshot an image of the HTML and PDF the image. This has various downsides (can't search / extract text easily, larger, poor zoom, pagination) but is simple if it doesn't conflict with your requirements.
Use a PDF system (like iText) to construct the PDF as desired using the same data. This is obviously more work (possibly a lot more), but is the optimal result in terms of PDF quality and fitness for purpose.
Simplify/adjust your HTML so it converts better into PDF. This depends on what HTML tools/libraries you are using - you might not have much control over the HTML.
Try various other conversion libraries to see if you find a tool that works better for your HTML.

How can I get words out of a PDF file in Android [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am new to Android programming.
I am trying to build a small app that uses the words in a PDF file.
Is there any way to get them?
Maybe a library that works with the PDF format?

If you are looking for a good quality commercial solution, take a look at Aspose.Pdf for Android. It is a PDF processing library that enables you to create, manipulate and edit documents. Features include:
PDF compression options, support for graph objects, extensive hyperlink functionality, extended security controls, custom font handling, integration with data sources, adding or removing bookmarks, working with attachments and annotations, importing or exporting PDF form data, working with text and images, splitting, concatenating, extracting or inserting pages, converting pages to images and much more.
Here is a simple example to extract text from a PDF file.
String input = new File(Environment.getExternalStorageDirectory(), "Document1.pdf").toString();
// Load the PDF document
Document doc = new Document(input);
// Create a text absorber
TextAbsorber absorber = new TextAbsorber();
// Accept page 1 for absorber.
doc.getPages().get_Item(1).accept(absorber);
// Extract all text from page 1
String text = absorber.getText();
Log.i("PDF", text);
PS: I am a developer at Aspose.

PDF reading/writing is a huge problem that many Android developers face and, sadly, there are very few open source resources available. Most libraries that would work on the JVM use Swing and other libraries that are not compatible with Android's VM.
MuPDF and PlugPDF will work if you are allowing the user to read the PDF and select the text they want to extract from it. Both are free, although PlugPDF is free only if you are an indie developer.
If you are willing to pay money, there are many commercial libraries which are capable of extracting text from a pdf (iText and Aspose come to mind).

PDF Generation Library for Java [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I know this has been asked before, but I'm still undecided on which PDF generation framework to use for my current project.
My requirements
on-the-fly generation of PDF documents (mainly order forms, invoices)
Java based
easy to layout
should be open source
easy to change layout
A lot of people seem to use iText, but I have some concerns (apart from the changed licence) regarding separation of concerns: In an HTML context there's good MVC support, where I usually stick to Spring MVC and FreeMarker to separate logic and layout. I'm a little bit worried that with iText you end up mixing code and layout a lot.
I am aware that Apache FOP could be a solution here, but then again I find XSLT tedious to work with, and I read that FOP can be slow when it comes to a huge throughput of many documents?
I also considered JasperReports, but from my understanding this is more suited for reports containing tabular datasets rather than single documents such as invoices which require a lot of layout formatting?
Any thoughts on this?

Give JasperReports a try. Use iReport to create the .jrxml files. JasperReports can handle complex layouts. For those parts of the report based on different queries, have a look at using subreports embedded into the main report.
Just like @Adrian Smith's solution, this approach will separate the report layout editing from the data sourcing.

I have implemented a good solution where my software creates a format-independent "pure" XML file, then I give my boss the XSD and he puts it into Altova StyleVision where he can WYSIWYG design reports based on data he plucks out from the XSD. That software produces an XSLT. So my program:
Produces the format-independent "pure" XML
Transforms it with the XSLT, the output of which is XSL-FO
Uses Apache FOP to convert the XSL-FO into PDF
This is a really great solution: I (as a programmer) no longer have to change my code each time my boss wants to change a color in the report. My job is simply to produce "pure" XML.
Update: I should also point out that I give my boss access to our SVN repository with Tortoise SVN which is sufficiently easy to use that he can use it without error. So he can check the XSLT files straight into SVN and run the build/deploy without even having to interrupt me from my work. Obviously that workflow only works with people who are sufficiently exact that they don't make mistakes etc., but it works out well for us in that case.
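The first two steps of that pipeline need nothing beyond the JDK's built-in XSLT support, so they can be sketched as below. This is an illustration under assumptions, not the poster's actual code: the invoice XML, the stylesheet, and the names `XmlToFo`, `toFo` and `SAMPLE_XSLT` are all made up for the example, and a real stylesheet would come from the design tool. The resulting XSL-FO would then be handed to Apache FOP for the PDF step, which is left out here because it needs the FOP library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of "pure XML + XSLT -> XSL-FO" using only the JDK.
class XmlToFo {
    // Made-up stylesheet: maps <invoice><number>..</number></invoice> to one FO page.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
            + "<xsl:template match=\"/invoice\">"
            + "<fo:root>"
            + "<fo:layout-master-set>"
            + "<fo:simple-page-master master-name=\"A4\""
            + " page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">"
            + "<fo:region-body/>"
            + "</fo:simple-page-master>"
            + "</fo:layout-master-set>"
            + "<fo:page-sequence master-reference=\"A4\">"
            + "<fo:flow flow-name=\"xsl-region-body\">"
            + "<fo:block>Invoice <xsl:value-of select=\"number\"/></fo:block>"
            + "</fo:flow>"
            + "</fo:page-sequence>"
            + "</fo:root>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the data XML and returns the XSL-FO as a string. */
    static String toFo(String dataXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(dataXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The program only has to produce this "pure" XML; layout lives in the XSLT.
        String data = "<invoice><number>42</number></invoice>";
        System.out.println(toFo(data, SAMPLE_XSLT));
    }
}
```

The point of the design is visible in `main`: changing fonts, colors or page size means editing the stylesheet only, while the Java code keeps emitting the same data XML.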

Based on my experience, I would suggest you consider the following Java PDF libraries for creating PDF reports:
DynamicReports
Apache PDFBox
iText PDF
PDF Clown
For your requirements, I think DynamicReports would be the right choice. I have been using DynamicReports for the last 3 years for all my PDF reporting requirements. With very little code, you can easily create a truly dynamic PDF. DynamicReports is a wrapper around JasperReports, so it uses JasperReports internally.

Docmosis allows you to create templates in Word or OpenOffice writer - separating concerns nicely and layout is then in the most familiar tools.

I have been using JODConverter for a while and I really like it.
What we do is use JODReports to generate dynamic OpenOffice.org documents (which internally uses FreeMarker). Then we convert these documents to PDF documents using JODConverter.
It sounds like a lot of work, but it really isn't.

One possibility is
to create your documents in PostScript format and then
convert them to PDF using Ghostscript (ps2pdf)
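As a rough sketch of that route: the Java side only has to emit PostScript text, and the conversion is delegated to the external `ps2pdf` command. The class and method names below are hypothetical, the layout is deliberately trivial (one font, fixed line spacing, ASCII text only), and `convert` assumes Ghostscript is installed with `ps2pdf` on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: generate a one-page PostScript document as text.
class PostScriptPage {
    /** Escapes the characters that are special inside a PostScript (string). */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Renders the lines top-down, starting one inch from the left and near the top of the page. */
    static String render(List<String> lines) {
        StringBuilder ps = new StringBuilder("%!PS\n");
        ps.append("/Helvetica findfont 12 scalefont setfont\n");
        int y = 720; // PostScript units are points, origin at bottom left
        for (String line : lines) {
            ps.append("72 ").append(y).append(" moveto (")
              .append(escape(line)).append(") show\n");
            y -= 16;
        }
        ps.append("showpage\n");
        return ps.toString();
    }

    /** Runs "ps2pdf in.ps out.pdf"; assumes Ghostscript is installed and on the PATH. */
    static int convert(Path psFile, Path pdfFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("ps2pdf", psFile.toString(), pdfFile.toString())
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Prints the PostScript; redirect it to a file and run ps2pdf on that file.
        System.out.print(render(Arrays.asList("Invoice 42", "Total (EUR): 10.00")));
    }
}
```

Hand-written PostScript gets tedious quickly for anything beyond simple text, so this is mostly attractive when the layout is fixed and small.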

Is there a Java Open source Library for parsing Excel 2007 Files? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
Is there a Java Open source Library for parsing Excel 2007 Files?

Apache POI looks promising.

Apache POI is the pure java answer to the question. 2007 format support is in beta right now.
OpenXLS may support it already (if GPL is fine for you). The commercial version of the same product (ExtenXLS) does support it.
Although not strictly part of the question, I should point out that any rewrite of access to Excel files will always have some deficiency over the original, so Joel Spolsky's advice is a good alternative, if you need it.

Apache POI

From http://poi.apache.org/apidocs/index.html
DDF - Dreadful Drawing Format
This package contains classes for decoding the Microsoft Office Drawing format otherwise known as escher henceforth known in POI as the Dreadful Drawing Format.
HPSF - Horrible Property Set Format
HSSF - Horrible Spreadsheet Format
I love those guys. We will try to use POI to read Excel files, I will look at the JExcel solution too.

http://openxml4j.org/

Not POI. Andy Khan's JExcel is what you want.

I did an assessment of POI and JExcel some time ago, and JExcel was far superior. Both use a lot of memory, though, if you have very large data files. By this I mean that I was not able to figure out how to construct an Excel file through a stream, so that I didn't have to load the entire file in memory.
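On the reading side at least, the Excel 2007 format makes streaming possible in principle, because each worksheet is an XML part inside the zip and can be walked with a pull parser instead of being loaded whole. The sketch below uses only the JDK's StAX API, not POI or JExcel; the class name and callback shape are hypothetical. It reports the raw `<v>` content of each cell, which for string cells is usually an index into the shared strings part rather than the text itself, and it does not address the writing problem described above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: stream a worksheet part cell by cell with StAX.
class SheetStreamReader {
    /**
     * Calls handler(cellReference, rawValue) for every cell that has a v element.
     * Only the current event is held in memory, not the whole sheet.
     */
    static void forEachCell(InputStream sheetXml, BiConsumer<String, String> handler) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
            String currentRef = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("c".equals(name)) {
                    // The "r" attribute holds the cell reference, for example "B7".
                    currentRef = reader.getAttributeValue(null, "r");
                } else if ("v".equals(name) && currentRef != null) {
                    handler.accept(currentRef, reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read worksheet XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny hand-written worksheet; a real one is an entry such as
        // xl/worksheets/sheet1.xml inside the .xlsx zip (assumed location).
        String sheet = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData><row r=\"1\">"
                + "<c r=\"A1\"><v>3.5</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(sheet.getBytes(StandardCharsets.UTF_8));
        forEachCell(in, (ref, value) -> System.out.println(ref + " -> " + value));
    }
}
```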

I am currently comparing JExcelApi and Apache POI. POI supports Office 2007 in beta and looks like the best option (in many ways).
