Looking for a "Universal" Document viewer component/library [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I am looking for an Applet with similar functionality to the Oracle/Stellent OutsideIn ActiveX control or the Autonomy KeyView technology that act as a browser plug-in allowing the rendering/display of a large number of file formats (Word processing, spreadhseet, graphics, etc.) I currently use the Stellent solution, but due to some restrictions of some of our clients would prefer something that either exists as a Java Applet, Silverlight control, or has a Java API that I could build an applet on top of (neither of the two I mentioned do).
At a bare minimum it would need to display at least the following formats:
MS Word, Excel, PowerPoint
MS Outlook MSG files
Adobe PDF
Standard image formats: BMP, PNG, JPEG, TIFF
WordPerfect
HTML
Any suggestions?

If a commercial product is an option, ViewOne is a nice product. It's an Applet and you can view a large variety of document.

It's not a plugin, but multivalent is a java library and browser for a large number of document formats, but probably not all the ones you'd like to cover.
It does at least cover the PDF, HTML, and any reasonable image format, but not any of the proprietary formats.

If you are looking for pure Java component that supports all these formats, I'm pretty confident that it doesn't exist. If what you want is to embed Browser, MS Office, Acrobat etc. you would need an ActiveX container.
Here are some choices:
JDIC - if you are using Swing (see the Document Viewer demo.)
SWT ActiveX container - if you are using SWT
TeamDev WinPack - if your time is more valuable than your money ;-) The product is very polished, the price is reasonable and the support is excellent.
Note that with any of these you need to have installed Acrobat, MS Office (or the free doc viewers) and whatever else applications you need to edit the file formats.

Have you looked at Adeptol AJAX Document Viewer.
A no plugin non applet no install viewer which supports more than 300 file typess.
See ajaxdocumentviewer.com

You may be interested in Net-it Central. It uses an Active-x plugin or java applet and works with several different formats. I am using it for Word and Excel currently.

Related

Create a pdf for J2EE applications [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
Could you please anyone tell me the way to create a pdf document for J2EE Application other than iText.
We are previously used the iText, but the problem is the html file (which is generated from Jsp) display is different with the generated PDF. So I need some other way to create a pdf as same as jsp display.
Any one please suggest me the libraries other than iText?
Thanks in Advance.
You are probably using iText 5 and XML Worker. Have you tried iText 7 and pdfHTML? See the HTML to PDF tutorial.
You will need:
iText 7: https://github.com/itext/itext7
the pdfHTML add-on: https://github.com/itext/i7j-pdfhtml
You claim:
the problem is the html file (which is generated from Jsp) display is different with the generated PDF.
That is certainly true when you use HTMLWorker (which you shouldn't) and it's true in many cases for XML Worker. But we rewrote iText from scratch because of the mismatch between the old iText architecture and the requirements when converting HTML to PDF.
If you have a problem with the HTML to PDF conversion, please explain the problem in a question and tag that question as an iText question. If we can improve iText 7 + pdfHTML, why wouldn't we do that?
This is always tricky because HTML and PDF have differing purposes (with a lot of overlap). That means "simply" converting between the two is sometimes not going to work well.
You can
Snapshot an image of the HTML and PDF the image. This has various downsides (can't search / extract text easily, larger, poor zoom, pagination) but is simple if it doesn't conflict with your requirements.
Use a PDF system (like iText) to construct the PDF as desired using the same data. This is obviously more work (possibly a lot more), but is the optimal result in terms of PDF quality and fitness for purpose.
Simplify/adjust your HTML so it converts better into PDF. This depends on what HTML tools/libraries you are using - you might not have much control over the HTML.
Try various other conversion libraries to see if you find a tool that works better for your HTML.

PDF Generation Library for Java [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I know this has been asked before, but I'm still undecided on which PDF generation framework to use for my current project.
My requirements
on-the-fly generation of PDF documents (mainly order forms, invoices)
Java based
easy to layout
should be open source
easy to change layout
A lot of people seem to use iText, but I have some concerns (apart from the changed licence) regarding separation of concerns: In an HTML context there's good MVC support, where I usually stick to Spring MVC and FreeMarker to separate logic and layout. I'm a little bit worried that with iText you end up mixing code and layout a lot.
I am aware, that Apache FOP could be a solution here, but then again I find XSLT tedious to work with and I read that FOP can be slow when it comes to huge throuput of many documents?
I also considered JasperReports, but from my understanding this is more suited for reports containing tabular datasets rather than single documents such as invoices which require a lot of layout formatting?
Any thoughts on this?
Give JasperReports a try. Use iReport to create the .jrxml files. JapserReports can handle complex layouts. For those parts of the report based on different queries have a look at using subreports embedded into the main report.
Just like #Adrian Smith's solution this approach will separate the report layout editing from the data sourcing.
I have implemented a good solution where my software creates a format-independent "pure" XML file, then I give my boss the XSD and he puts it into Altova StyleVision where he can WYSIWYG design reports based on data he plucks out from the XSD. That software produces an XSLT. So my program:
Produces the format-independent "pure" XML
Transforms it with the XSLT, the output of which is XML-FO
Use Apache FOP to convert the XML-FO into PDF
This is a really great solution, means no more do I (as a programmer) have to change my code each time my boss wants to change a color in the report, my job is simply to produce "pure" XML.
Update: I should also point out that I give my boss access to our SVN repository with Tortoise SVN which is sufficiently easy to use that he can use it without error. So he can check the XSLT files straight into SVN and run the build/deploy without even having to interrupt me from my work. Obviously that workflow only works with people who are sufficiently exact that they don't make mistakes etc., but it works out well for us in that case.
Based on my experience, I would suggest you to consider following Java PDF Libraries for creating PDF reports,
DynamicReports
Apache PDF Box
iText PDF
PDF Clown
For your requirement, I think DynamicReports would be the right choice. I have been using Dynamic Reports from last 3 years for all my PDF Reporting requirements. With a very less amount of code, you can easily create a truly dynamic PDF. Dynamicreports is a wrapper around Jasper Report. So, it internally makes use of Jasper report.
Docmosis allows you to create templates in Word or OpenOffice writer - separating concerns nicely and layout is then in the most familiar tools.
I have been using JODConverter for a while and I really like it.
What we do is use JODReports to generate dynamic OpenOffice.org documents (which internally uses FreeMarker). Then we convert these documents to PDF documents using JODConverter.
It sounds like a lot of work, but it really isn't.
One possibility is
to create your documents in PostScript format and then
convert it to pdf using ghostscript (ps2pdf)

Convert PDF to Word in Java [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Is it possible to convert PDF to Word in Java? I'm not talking about parsing a PDF document and then custom render it again to Word. I want a Java library that can directly convert it.
Reading PDF documents is a very involved process and there are no good free libraries for extracting non-text information from PDF documents in Java. Worse yet, PDF documents have a lot of layout information that is hard to reconstruct, for example a table in a Word document becomes some lines and a bunch of pieces of text in PDF.
It is almost impossible to recreate semantic information from an arbitrary PDF. If you have the same tool that wrote it you have somewhat more chance but even so there is much uncertainty. The only thing you can be sure of in a (text) PDF is the position of each character on the page. (Note that some PDFs include bitmaps in which textual information occurs and that has to rely on OCR).
There are several groups in computer science departments and elsewqhere who are spending very significant effort to try and get semantic information. We collaborate with Penn State - one of the leaders - and they are working on extracting tables. In good casees they get 90% in bad ones 50%.
So the answer is formally that you cannot, but you may occasionally be fortunate. (We do a lot of this for chemistry and count ourselves lucky if we get 50% on a regular basis).
You can try to do it with the iText library. Read the PDF and then write it as an RTF.
This is not that simple though, as you have to preserve the different style that the PDF has.
You can use some external tools.
Install some free program like "Free PDF to Doc" and execute it from you java program.
This Works fine in most cases.
use the Acrobat Pro SDK from you java code.
Best of luck

HTML rendering algorithm [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 months ago.
Improve this question
I am making an e-book reader for the J2ME and I wonder if I could make it render HTML pages. For the moment, I am using some simplified styling of my own.
So, could anyone point me to a good in-depth tutorial or a specification of an open-source HTML engine? Of course, I have some idea about it all, i.e. the main steps involved, the usage of finite-state machines an so forth, but it's not enough.
But why reinvent the wheel, when it's complicated enough? Do you know of any HTML engine written purely in Java, and light enough to be used as a lib in a J2ME project?
P.S. For the J2ME know-hows:
Porting from Java SE to J2ME is not necessarily an issue for me
I am not yet concerned about the inability (or at least unsuitability) of using vector fonts
UPDATE
If you could only point me to a detailed guide about layouting HTML code, I'd be more than grateful! I need to layout some very simple HTML, like text with basic styling, images, divs and tables. That's all.
(I know it's not trivial even though I need simple layouting, that's why I am asking.)
Webkit comes to mind.
I think Firefox uses Gecko Layout engine. Could prove helpful. More here
https://developer.mozilla.org/en/docs/Gecko and
https://wiki.mozilla.org/Gecko:Home_Page and
For some videos http://redivide.com/blog/gecko-reflow-awesome-visualization-of-web-page-layout/
Dear me, I seem to be answering my own question.
The only possibilities that I found are:
J2ME Polish HTML Browser Component
J2MEHTML
Fire
Unfortunatelly, neither of these seems to be agile enough so that I could implement it for my own puproses, which are:
render on any Graphics object
support for bitmap fonts
split content to pages
TeX hyphenation
be able to obtain the word (if any) at a given point on the image.
This all I've done, but the trouble is that it is not rendering html, but custom and limited styling.
I googled and found Cobra
Another option would be LWUIT
It has an HTML component in last version.(see http://www.nextgenmoco.com/2010/05/css-support-added-to-htmlcomponent.html)
LWUIT is a swing-insipered set of UI components for J2ME, it's open source and had some sort of SUN support, I don't know if oracle will still support it.

PDF Text Extraction Approach Using OCR [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction. Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written.
I'm familiar with pdfbox, which is now an Apache incubator project at version 0.8.x, but it's text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.
I've not tried Asprise JavaPDF yet, in the process of trying that, but wanted to know more about the OCR approach (if it's possible).
Any help would be appreciated.
If you have a text-based PDF, I'd strongly recommend PDFTextStream. It's not free, but licensing is reasonable, and it is much much better than PDFBox. PDFBox chokes on many PDF files which are generated by newer tools, and is not too consistent about PDFs it can handle. PDFTextStream handles any PDF I throw at it, including PDFs with embedded PNG images, which PDFBox can not do.
If you heckle the PDFTextStream folks to add OCR, they may listen up.
We use ABBYY FineReader Engine 11. They have java wrapper.
Pros:
It works great with all the languages (English, Russian, Uzbek etc) and doing real OCR (even if you have pdf without OCR they perform rendering at first and OCRing).
Cons:
It costs. You have to buy developer license and end-user license.
And it is EXTREMELY slow.
If you want to extract OCR from text based PDF you may have to convert it to an image first.
You can use Java wrappers of Tesseract - tesjeract or Tess4J - to perform OCR. However, for PDF, you'll need to convert to image (PNG or TIFF) first before feeding it to the OCR engine.
VietOCR calls Tesseract executable to perform the text extraction. It uses GhostScript to do PDF-to-image conversion.

Categories

Resources