Convert HTML string to PPTX in Java

In my application, users enter notes in the browser. The notes can be formatted (font, size, color, etc.) and are saved in the database as HTML strings.
Now I want to export this formatted text to PPTX. Is there a solution for this? So far I have tried Apache POI, which supports formatted text but does not accept an HTML string as input.
I am looking for an open-source library, so using Aspose is difficult. Somehow I need to render the HTML text and copy it as-is into the PPTX.
Any solution or approach would be helpful.
EDIT: I am thinking of custom-parsing the HTML string: using JAXB to convert the tags into objects and then some Java logic to feed them into POI. Any help on achieving this would be appreciated.
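The custom-parsing idea from the EDIT could look roughly like the sketch below. It is only an illustration under stated assumptions: `HtmlRuns` and `Run` are hypothetical names, only `<b>`/`<strong>`, `<i>`/`<em>` and `<u>` are recognized, and a plain regex tokenizer is used instead of JAXB because HTML produced by a browser editor is not guaranteed to be well-formed XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a small HTML subset into uniformly styled runs. */
final class HtmlRuns {

    /** One stretch of text that shares the same formatting flags. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        final boolean underline;

        Run(String text, boolean bold, boolean italic, boolean underline) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.underline = underline;
        }
    }

    // Group 1: "/" for a closing tag; group 2: tag name; group 3: plain text.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");

    static List<Run> parse(String html) {
        List<Run> runs = new ArrayList<>();
        int bold = 0, italic = 0, underline = 0; // nesting depth per style
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                // Text token: emit a run carrying the styles currently open.
                runs.add(new Run(unescape(m.group(3)), bold > 0, italic > 0, underline > 0));
                continue;
            }
            int delta = m.group(1).isEmpty() ? 1 : -1; // opening tag: +1, closing tag: -1
            switch (m.group(2).toLowerCase(Locale.ROOT)) {
                case "b": case "strong": bold = Math.max(0, bold + delta); break;
                case "i": case "em":     italic = Math.max(0, italic + delta); break;
                case "u":                underline = Math.max(0, underline + delta); break;
                default: break; // other tags (p, span, font, ...) are ignored in this sketch
            }
        }
        return runs;
    }

    // Decodes only a few common entities; "&amp;" goes last to avoid double decoding.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&nbsp;", " ").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        for (Run r : parse("Plain <b>bold <i>both</i></b> end")) {
            System.out.println("[" + r.text + "] bold=" + r.bold + " italic=" + r.italic);
        }
    }
}
```

Each `Run` could then be copied into a POI text run, for example via `XSLFTextParagraph.addNewTextRun()` followed by `setText`, `setBold`, `setItalic` and `setUnderlined`. Font size and color would need extra handling of `<font>` attributes or inline `style` values, which this sketch leaves out.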

Aspose.Slides lets you import HTML text into a presentation and also export a presentation to HTML. I suggest visiting the following documentation link for this purpose. You are right that Aspose.Slide
I work as a developer evangelist at Aspose.

Related

Java Detect language of Content from large String

I am working on a project where the PDFs contain content in both English and Spanish. I am interested only in the English part, which I want to save to a database. I am using Apache PDFBox to extract the text. How can I skip the Spanish content and keep only the English text? I tried libraries such as Apache Tika and https://code.google.com/p/language-detection/, but they do not give correct results in some cases. Can anyone suggest a reliable solution or another way to achieve this?
Thanks in advance.
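One low-tech approach, when whole paragraphs are in a single language, is to count common function words per paragraph and keep the paragraphs where English wins. The sketch below is a heuristic only, not a replacement for a trained detector. `EnglishFilter` is a hypothetical name, the two stopword lists are small illustrative samples, and it assumes the extracted text separates paragraphs with blank lines, which PDFBox output may not always do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: keeps paragraphs that look more English than Spanish. */
final class EnglishFilter {

    // Small illustrative samples; words valid in both languages ("a", "no") are left out.
    private static final Set<String> EN = new HashSet<>(Arrays.asList(
            "the", "and", "of", "to", "in", "is", "that", "for", "with", "this",
            "are", "was", "it", "on"));
    private static final Set<String> ES = new HashSet<>(Arrays.asList(
            "el", "la", "los", "las", "de", "y", "que", "en", "es", "para",
            "con", "por", "una", "un", "del", "se"));

    // Counts how many words of the text appear in the given stopword set.
    static int score(String text, Set<String> stopwords) {
        int hits = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (stopwords.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    // A tie (including zero hits on both sides) is treated as "not English".
    static boolean looksEnglish(String paragraph) {
        return score(paragraph, EN) > score(paragraph, ES);
    }

    // Splits on blank lines and rejoins only the paragraphs that look English.
    static String keepEnglish(String text) {
        StringBuilder out = new StringBuilder();
        for (String paragraph : text.split("\\n\\s*\\n")) {
            if (looksEnglish(paragraph)) {
                if (out.length() > 0) {
                    out.append("\n\n");
                }
                out.append(paragraph.trim());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "The report is ready for review and it was sent to the client.\n\n"
                + "El informe está listo para la revisión y se envió al cliente.";
        System.out.println(keepEnglish(text));
    }
}
```

Very short paragraphs carry few stopwords, so headings and table cells tend to be dropped by the tie rule. A combined approach could fall back to a library detector for paragraphs where both scores are low.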

Creating Docx, PDF, XSL-FO

[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx, PDF or printed documents. This solution broke in the latest version of Windows Server 2012. We learned that MS does not intend Word to work in this manner, and after troubleshooting with MS support engineers we have concluded that it will never work.
[Currently]
I am currently researching technologies and tools that my company can use to regain this functionality. We need to be able to create Docx and PDF files and to print to a local printer.
I have looked into a number of tools already and am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for advice and suggested tools that we could use to implement a pure Java approach. Currently our application creates HTM files with all the required information, so ideally we would like to take these HTM files and convert them into Docx/XSL-FO format.
[Question]
So my question, which I'm hoping you can help me with:
What are the best tools I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank you
It depends on the level of control you need and the complexity of the source HTML. There are HTML-to-FO stylesheets, but you might find them wanting for your specific needs.
So you could use the Jericho parser to read the HTML and generate FO. Or you could generate the target format directly using Apache PDFBox and Apache POI.
It all boils down to the level of control you want or need.
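To give an idea of the "generate FO" half of that suggestion, here is a minimal sketch that wraps already-extracted paragraph text in an XSL-FO document that a formatter such as Apache FOP could then render. `FoWriter` is a hypothetical name, the A4 page geometry and spacing are arbitrary choices, and the HTML parsing step (Jericho or similar) is assumed to have happened before this code runs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: wraps plain paragraphs in a minimal XSL-FO document. */
final class FoWriter {

    static String toFo(List<String> paragraphs) {
        StringBuilder fo = new StringBuilder();
        fo.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
          .append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n")
          // One page master describing an A4 page with uniform margins.
          .append("  <fo:layout-master-set>\n")
          .append("    <fo:simple-page-master master-name=\"page\"")
          .append(" page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">\n")
          .append("      <fo:region-body/>\n")
          .append("    </fo:simple-page-master>\n")
          .append("  </fo:layout-master-set>\n")
          // The page sequence flows all blocks into the body region.
          .append("  <fo:page-sequence master-reference=\"page\">\n")
          .append("    <fo:flow flow-name=\"xsl-region-body\">\n");
        for (String paragraph : paragraphs) {
            fo.append("      <fo:block space-after=\"6pt\">")
              .append(escape(paragraph))
              .append("</fo:block>\n");
        }
        fo.append("    </fo:flow>\n")
          .append("  </fo:page-sequence>\n")
          .append("</fo:root>\n");
        return fo.toString();
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toFo(Arrays.asList("First paragraph.", "Terms & conditions")));
    }
}
```

The sketch does not guard against an empty paragraph list; an `fo:flow` needs at least one block, so a caller would have to handle that case. Tables, images and inline styling would each need their own mapping to FO elements, which is where the level of control mentioned above comes in.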
docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution, e.g. LibreOffice/OpenOffice) to go from docx to PDF.
docx4j supports docx to XSL-FO, and by default uses FOP.

Convert HTML page into MS word using java or any API

I want to convert an HTML page into MS Word. I want to know which APIs would be helpful and whether there is any other way to do this.
The entire page is to be converted into .doc (e.g. if there is a table in the HTML page, a similar table must be created in the Word document).
Apache POI does not provide an option to format the Word document as in the HTML page.
I need something that can give me a completely formatted Word document.
Some of the options I have looked at are jsoup, docx4j, JasperReports, and JODConverter.
I tried parsing the HTML page using jsoup, and I get the contents of the page in my Java program. Now I need to pass these contents to a doc/docx file. Can docx4j help me get a formatted docx file?
Please help.
Thank you.
I would go with Ashwini Raman's suggestion. It won't work in every scenario: for a complex HTML document with many images and the like, Word will not do a good job. But for most cases it should be fine. Otherwise, there is a complex task ahead of you: you will have to parse your HTML document, using the jsoup library for example, and then use the docx4j library to create your Word document.
Links to both are here:
http://www.docx4java.org/trac/docx4j
http://jsoup.org/
Even when you do it this way, the formatting might be iffy.
To answer your original question: no, there is no ready-made library that does what you are expecting. At least I haven't come across any.
I found a workaround. First I get the parsed objects using jsoup and pass them to a document template. I am now looking for options that let me create templates easily and generate the document dynamically.
I have asked another question regarding the same.

Java api for pdf

Which Java APIs help in extracting table metadata from a PDF and presenting that table in a web page?
The result should be that when the source of the page is viewed, it shows the HTML code of that table.
iText is useful in this context:
http://itextpdf.com/
I assume that you need a PDF library for Java.
PDFBox is one of the popular libraries for PDF manipulation, and I think it is worth a look.
Try the Metadata Extract Tool, which extracts metadata from specific file types including PDF. Then you can parse the XML output with any Java XML parser. Once you're able to parse it, the elements can easily be laid out in your view page.

Changing/Replacing text inside a PDF using Java

Any clue about a good library to programmatically produce a PDF in Java, using a PDF as a template?
Try iText. It has many, many goodies. There is also a book about it from Manning, iText in Action.
If you want to be able to edit the text in the template, you should set up the template carefully and use forms for text content. You can't easily replace text in PDFs because they do not contain text structure.
There is a blog article highlighting some of the issues at http://pdf.jpedal.org/java-pdf-blog/bid/17370/Problems-editing-PDF-files
