MHT to HTML to PDF in Unix

I have a project that requires converting MHT documents to PDF. The documents are large-format drawings (sizes C, D, and E).
The documents are manually loaded into my web application (Apache/Tomcat on AIX), and the requirement is to convert each MHT file on the fly into a more portable format.
I broke the project down into two steps:
1) MHT to HTML extraction (with images)
2) HTML to PDF conversion.
For step 1, thanks to this link, How to read or parse MHTML (.mht) files in java, I was able to come up with a Java solution that extracts the content and creates an HTML file, and it is working well. I had to enhance the code a little to make it work in my environment.
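To make step 1 concrete, here is a rough sketch of what the extraction involves. It is not the code from the linked answer and not my production code: it is a JDK-only illustration, and the class name MhtSplitter and its Part type are made up for this example. It assumes the MHT file is a single-level multipart/related message whose parts use base64, quoted-printable, or plain text encoding, and that the file was read as ISO-8859-1 so that bytes survive the round trip through a String.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits an MHT (MIME multipart) document into decoded parts.
public class MhtSplitter {

    /** One MIME part: header names are lower-cased, body is already decoded. */
    public static final class Part {
        public final Map<String, String> headers = new LinkedHashMap<>();
        public byte[] body = new byte[0];
    }

    /** Splits MHT text (read as ISO-8859-1) into its decoded parts. */
    public static List<Part> split(String mht) {
        String text = mht.replace("\r\n", "\n");
        Matcher m = Pattern.compile("boundary=\"?([^\"\\n;]+)\"?", Pattern.CASE_INSENSITIVE)
                .matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("No MIME boundary found");
        }
        // Every part is introduced by "--" followed by the declared boundary.
        String[] chunks = text.split(Pattern.quote("--" + m.group(1)));
        List<Part> parts = new ArrayList<>();
        for (int i = 1; i < chunks.length; i++) {      // chunks[0] is the top-level header
            String chunk = chunks[i];
            if (chunk.startsWith("--")) {
                break;                                  // closing delimiter reached
            }
            int blank = chunk.indexOf("\n\n");          // blank line ends the part headers
            if (blank < 0) {
                continue;
            }
            Part part = new Part();
            // Unfold continuation lines, then read "Name: value" pairs.
            String headerBlock = chunk.substring(0, blank).replaceAll("\\n[ \\t]+", " ");
            for (String line : headerBlock.split("\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    part.headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                            line.substring(colon + 1).trim());
                }
            }
            String raw = chunk.substring(blank + 2);
            if (raw.endsWith("\n")) {
                raw = raw.substring(0, raw.length() - 1); // newline belongs to the delimiter
            }
            String encoding = part.headers
                    .getOrDefault("content-transfer-encoding", "7bit").toLowerCase(Locale.ROOT);
            if (encoding.equals("base64")) {
                part.body = Base64.getMimeDecoder().decode(raw);
            } else if (encoding.equals("quoted-printable")) {
                part.body = decodeQuotedPrintable(raw);
            } else {
                part.body = raw.getBytes(StandardCharsets.ISO_8859_1);
            }
            parts.add(part);
        }
        return parts;
    }

    /** Decodes "=XX" escapes and drops "=" soft line breaks. */
    static byte[] decodeQuotedPrintable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '=') {
                out.write(c);
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                i++;                                    // soft line break
            } else if (i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            }
        }
        return out.toByteArray();
    }
}
```

Each part's content-type and content-location headers then tell you whether it is the HTML page or an image, and what file name the HTML expects for it. Nested multiparts and malformed escapes are not handled in this sketch.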
For step 2, things have been a little more difficult. I started by looking into the HTMLDOC software http://www.msweet.org/projects.php?Z1 , but after spending a few days building the code, I found out it only handles letter and legal size documents.
I then started looking at wkhtmltopdf http://wkhtmltopdf.org/ , but building it is becoming a task of its own.
Overall, AIX is not the friendliest environment to build applications in, and most options only run on other operating systems. I'm using the xlc compiler whenever possible.
I'd prefer a Java solution, but anything I can simply execute would be fine.
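For reference, this is roughly how I would expect to call an external converter from the web application once a binary is available. It is only a sketch under several assumptions: that a working wkhtmltopdf executable exists on the machine (which is exactly the part I have not solved on AIX), that the drawing sizes C, D, and E are the ANSI sheet sizes (17x22, 22x34, and 34x44 inches), and that the class and method names below are made up for illustration. The only wkhtmltopdf options used are --page-width and --page-height.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper around an external wkhtmltopdf binary.
public class WkhtmltopdfRunner {

    /** Assumed ANSI sheet sizes in inches (portrait: width x height). */
    public enum Sheet {
        C(17, 22), D(22, 34), E(34, 44);

        final int widthIn;
        final int heightIn;

        Sheet(int widthIn, int heightIn) {
            this.widthIn = widthIn;
            this.heightIn = heightIn;
        }
    }

    /** Formats a length in inches as millimetres, e.g. 22 -> "558.8mm". */
    static String mm(int inches) {
        return String.format(Locale.ROOT, "%.1fmm", inches * 25.4);
    }

    /** Builds the command line without running it. */
    public static List<String> command(String binary, Sheet sheet,
                                       String htmlPath, String pdfPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(binary);
        cmd.add("--page-width");
        cmd.add(mm(sheet.widthIn));
        cmd.add("--page-height");
        cmd.add(mm(sheet.heightIn));
        cmd.add(htmlPath);
        cmd.add(pdfPath);
        return cmd;
    }

    /** Runs the command and returns the process exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Usage would be something like run(command("wkhtmltopdf", Sheet.D, "drawing.html", "drawing.pdf")), with a non-zero exit code treated as a failed conversion.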

Related

Creating an editable document via java web application

I am looking for a convenient way to export some data from my database into a form that can be edited afterwards. The perfect scenario would be to export a Word document; a brutally simple alternative would be to generate HTML and copy/paste it into Word.
I've looked at several open source libraries for generating Word documents, but they seem a bit too simple or incomplete. I need support for tables and embedded images, and control over formatting of fonts, table borders, etc. (too much formatting seems to be lost when copying HTML and pasting it into Word).
Although Word is the end format, it would be fine to generate any format that Word can open and subsequently save as DOCX.
I really haven't been able to find anything about generating ODT files (server side, without a client installation).
I would just dive into the ASPOSE libraries, but it will take ages (and significant pain) to get a purchase order sorted out, so I need to make sure it's the only viable option before taking that route.
I could generate an Excel file and copy that into Word; this currently looks like the best option.

Word2Fo Java PlugIn

I'm using Word2Fo to generate XSL-FO code so that my Java application can generate a PDF instead of a Doc. This is all well and good, and I now have the raw XSL-FO code, but there are over a thousand lines of FO instructions, and it would be a pain to go over every single line to format it for Java output in w.write.
Is there any plugin for Word2Fo that can do this automatically on conversion? Pointing me to a Word2Fo plugin library would be a help in itself, since I can't seem to find one anywhere.
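To show what I mean by "format it for Java output", the mechanical part could be done by a small throwaway converter like the sketch below. This is not a Word2Fo plugin and uses nothing from Word2Fo; the class name FoToJava and its methods are made up for illustration, and the writer variable name w is just the one from my own code. It escapes backslashes and double quotes and wraps each FO line in a write call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-off tool: turns lines of XSL-FO into Java write statements.
public class FoToJava {

    /** Escapes a line so it can sit inside a Java string literal. */
    static String escape(String line) {
        return line.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Produces one writer.write("...\n"); statement per FO line. */
    public static List<String> toWriteStatements(List<String> foLines, String writerName) {
        List<String> statements = new ArrayList<>();
        for (String line : foLines) {
            statements.add(writerName + ".write(\"" + escape(line) + "\\n\");");
        }
        return statements;
    }

    /** Reads the FO file named by the first argument and prints the statements. */
    public static void main(String[] args) throws IOException {
        List<String> fo = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String statement : toWriteStatements(fo, "w")) {
            System.out.println(statement);
        }
    }
}
```

An alternative that avoids generated code altogether would be to keep the .fo file next to the application and copy it to the writer at run time, but that only works if the FO content does not need per-line Java logic.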

Extracting contents from a webpage and comparing using Java

I am developing a Java project with a sub-module that needs to extract contents (text, image, color) from a web page and compare them with another web page. I am planning to use WinHTTrack to download the web page locally, but the problem is that it doesn't save it as HTML. How can I download a web page with an .html extension using software such as WinHTTrack (or is simply saving the page with Ctrl+S enough)? I am also planning to use an HTML parser to extract the three content types (text, image, color) after downloading the page locally. Which parser should I go with?

Well, I use HTTrack and it fetches HTML files as well. You are probably treating the WinHTTrack project file as the only output, but if you look inside the project directory there are HTML files (together with images, etc.). I would suggest using http://htmlparser.sourceforge.net/. It is a Java library, and since your project is in Java it should be fairly easy to use. You can also save the whole website locally using org.htmlparser.parserapplications.SiteCapturer (and specify whether resources such as images should be captured as well). Hope it helps.
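If you want to see the shape of the extraction step before pulling in a library, here is a minimal sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) instead of htmlparser. The class name PageContent is made up for this example, and it only picks up colors given as color or bgcolor attributes; colors set through CSS are not covered.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical holder for the three content types taken from one page.
public class PageContent {
    public final List<String> texts = new ArrayList<>();
    public final List<String> images = new ArrayList<>();
    public final List<String> colors = new ArrayList<>();

    /** Parses the HTML and collects text runs, image sources, and color attributes. */
    public static PageContent extract(String html) {
        final PageContent page = new PageContent();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    page.texts.add(text);
                }
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        page.images.add(src.toString());
                    }
                }
                HTML.Attribute[] colorAttributes = {HTML.Attribute.COLOR, HTML.Attribute.BGCOLOR};
                for (HTML.Attribute attribute : colorAttributes) {
                    Object value = attrs.getAttribute(attribute);
                    if (value != null) {
                        page.colors.add(value.toString());
                    }
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true = ignore charset declarations inside the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page;
    }
}
```

Comparing two pages then comes down to comparing the three lists of two PageContent objects. The same callback structure should carry over to whichever parser you finally choose.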

Displaying .ai files in Java

I've been talking with an artist and she is planning to send me .ai files for a project I'm working on that is using Java for its front end. Unfortunately, I'm having a lot of trouble searching for this issue because search engines are replacing .ai with "a" (even when I specifically say not to) or are searching for artificial intelligence. Obviously neither of those are what I'm looking for.
Is anyone aware of a Java library capable of rendering .ai files as static images?

.ai files are vector graphics; they shouldn't be used in production. When the final copy of your image is ready, your artist should send you a .png, .jpg, or similar finished file.
.ai stands for Adobe Illustrator, and the files are intended for use only by Illustrator. It's like a developer creating .java files and sending them to a client, who would more likely want an executable jar or a program installer.
Worst case, you should install CS5.5 (there's a trial version) and export the .ai files to a static file type yourself.

making ePub with Java API

I'm relatively new to the ePub format, but if I understand correctly, creating an ePub programmatically from XHTML or PDF content could mean:
choose the HTML or XHTML content and validate it with an XHTML validator (or clean it with Tidy)
choose the PDF file to insert in the ePub
create the XML manifest or XML packaging files and the TOC file
zip all the files into an .epub file
validate the ePub (I saw something on Google Code)
So my question is whether there is some sort of high-level Java API for these steps. Sure, I can use the Java APIs for ZIP and XML, but are there higher-level tools?
thanks a lot
------ EDIT -------
I've developed an open source project to do that!
http://scribaebookmake.sourceforge.net/

I haven't seen a java epub toolchain; however, I have been having good success with Sigil.
If the goal is to make an epub, I'd give Sigil a go. Before I used it I was rolling my epubs by hand (with the automation of an ant build.xml).
If the goal is to make a java based epub toolchain, then it shouldn't be terribly hard, depending on how much validation and pipelining you wish to do. Personally, I'd start with writing an epub viewer.
As far as the PDF parts go, I just embed XHTML. I haven't had a need for embedding PDF yet. As far as epub validation goes, if all the XML is valid and there are no dangling links prior to zipping, you're going to have a valid epub.
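One detail of the zipping step is worth spelling out: the EPUB container format expects an entry named mimetype, containing application/epub+zip, to be the first entry in the archive and to be stored uncompressed. The sketch below shows that part with the plain java.util.zip API. The class name EpubZipper is made up for this example, and the OEBPS/content.opf path in the container file is only a common convention, not a requirement; the package document, TOC, and XHTML files still have to be supplied by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: packs already-prepared files into an EPUB container.
public class EpubZipper {

    /** Minimal META-INF/container.xml; the OPF location is an assumed convention. */
    public static final String CONTAINER_XML =
            "<?xml version=\"1.0\"?>\n"
            + "<container version=\"1.0\""
            + " xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
            + "  <rootfiles>\n"
            + "    <rootfile full-path=\"OEBPS/content.opf\""
            + " media-type=\"application/oebps-package+xml\"/>\n"
            + "  </rootfiles>\n"
            + "</container>\n";

    /** Packs files (archive path -> content) into EPUB bytes, mimetype entry first. */
    public static byte[] pack(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                // The mimetype entry must be first and STORED (no compression),
                // which requires size and CRC to be set up front.
                byte[] mime = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
                CRC32 crc = new CRC32();
                crc.update(mime);
                ZipEntry first = new ZipEntry("mimetype");
                first.setMethod(ZipEntry.STORED);
                first.setSize(mime.length);
                first.setCompressedSize(mime.length);
                first.setCrc(crc.getValue());
                zip.putNextEntry(first);
                zip.write(mime);
                zip.closeEntry();

                // Everything else can use the default (deflated) method.
                for (Map.Entry<String, byte[]> file : files.entrySet()) {
                    zip.putNextEntry(new ZipEntry(file.getKey()));
                    zip.write(file.getValue());
                    zip.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a LinkedHashMap keeps the remaining entries in the order you added them. A generic zip tool that compresses everything, or puts mimetype anywhere but first, can produce a file that some readers and validators reject even though all the XML inside is valid.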

You should take a look at this project which seemed to be converting PDF to epub.

The following is a shameless plug for a project that I've been working on myself. It is basically EPUB tooling written in Java, for Eclipse. It comes with an API, UI and an Ant task that allows you to do pretty much everything. See http://help.eclipse.org/kepler/topic/org.eclipse.mylyn.docs.epub.help/help/introduction.html
