I am working on a project that converts HTML to a .doc file. I built the HTML with divs rather than tables (td/table). When I generate and download the .doc file, the CSS used in the HTML is not applied.
I did some research and found that Word's .doc rendering does not support some CSS properties, e.g. position and float.
https://superuser.com/questions/146453/css-absolute-position-dont-work-in-ms-word
Is there any alternative way to get the CSS applied in the .doc format?
Can someone please help?
Here is a list of the supported attributes:
MS Word supported HTML tags
Go through it and use only the supported tags and attributes in your project. That should let you get there with minor changes.
Most tags have full support, except the div tag.
Link to check COREEXTENDED
div has only COREEXTENDED support. You can see this at the link.
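Since position and float are ignored, one common workaround is to express the layout with a table and inline styles before saving the HTML with a .doc extension. The following is only a minimal sketch of that idea. The class name `WordHtml` and its methods are hypothetical, and it assumes that Word renders simple tables and inline `style` attributes more reliably than floated divs.

```java
// Hypothetical helper: builds Word-friendly HTML using a table instead of floated divs.
final class WordHtml {

    // Lays out two HTML fragments side by side with a table row,
    // replacing what "float: left" on two divs would do in a browser.
    static String twoColumns(String leftHtml, String rightHtml) {
        return "<table style=\"width:100%;border-collapse:collapse\"><tr>"
             + "<td style=\"width:50%;vertical-align:top\">" + leftHtml + "</td>"
             + "<td style=\"width:50%;vertical-align:top\">" + rightHtml + "</td>"
             + "</tr></table>";
    }

    // Wraps a body fragment in a complete HTML document.
    // The caller is expected to pass an already-escaped title.
    static String wrap(String title, String bodyHtml) {
        return "<html><head><meta charset=\"utf-8\"><title>" + title
             + "</title></head><body>" + bodyHtml + "</body></html>";
    }
}
```

The resulting string can then be written out as the .doc download, for example `WordHtml.wrap("Report", WordHtml.twoColumns("<p>Left</p>", "<p>Right</p>"))`.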
Related
I want to convert a portion of my JSP page to PDF, and I am considering iText for this. iText will convert HTML code to PDF, which is fine. But how do I fetch my HTML code with all the styling applied?
My CSS classes are in an external CSS file. When I fetch the HTML code, the CSS rules are not applied to the elements.
Is there any alternative way?
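One alternative is to inline the class rules into `style` attributes before handing the HTML to the converter, so the markup no longer depends on the external file. The sketch below is deliberately naive and only illustrates the idea. The class `CssInliner` is hypothetical, the rules are assumed to be already parsed into a class-name-to-declarations map, and only double-quoted `class="..."` attributes on elements without an existing `style` attribute are handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces class="..." attributes with equivalent inline styles.
final class CssInliner {

    // Matches double-quoted class attributes only (an assumption of this sketch).
    private static final Pattern CLASS_ATTR = Pattern.compile("class=\"([^\"]*)\"");

    // rules maps a class name (without the dot) to its declarations, e.g. "color:red".
    static String inline(String html, Map<String, String> rules) {
        Matcher m = CLASS_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder style = new StringBuilder();
            for (String cls : m.group(1).trim().split("\\s+")) {
                String decl = rules.get(cls);
                if (decl != null) {
                    String d = decl.trim();
                    style.append(d);
                    if (!d.endsWith(";")) {
                        style.append(';');
                    }
                }
            }
            // Leave the attribute untouched when no rule is known for its classes.
            String replacement = style.length() == 0
                    ? m.group()
                    : "style=\"" + style + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This ignores selectors other than plain class names, as well as specificity and inheritance, so a real page would need a proper CSS parser.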
Does Apache FOP support an external CSS file when generating a PDF from an HTML document? I specify the CSS file path in the HTML file, but the styles are not applied to the generated PDF report. I also tried copying the entire style content into a style tag in the HTML document, but the generated report still does not have the styles applied. Since I am a beginner with FOP, it would be really great if someone could tell me what I am missing here.
A more fundamental question would be: does Apache FOP support external CSS files at all?
To use CSS with XSL FO you would need something that processes CSS into XSL FO. You can look at the link below which is not Apache FOP but uses RenderX XEP behind the scenes. It allows for XML and/or HTML with CSS internal and external, leveraging XSL FO technology to format content.
http://www.cloudformatter.com/CSS2Pdf
I want to convert an HTML page into an MS Word document. I want to know which APIs would be helpful and whether there is any other way to do this.
The entire page is to be converted into .doc (e.g. if there is a table in the HTML page, a similar table must be created in the Word document).
Apache POI does not provide an option to format the Word document the way the HTML page is formatted.
I need something that can give me a completely formatted Word document.
Some of the options I am looking at are jsoup, docx4j, JasperReports, and JODConverter.
I tried parsing the HTML page using jsoup, and I get the contents of the page in my Java program. Now I need to pass these contents to a doc/docx file. Can docx4j help me get a formatted docx file?
Please help.
Thank you.
I would go with Ashwini Raman's suggestion. It won't work in every scenario: with a complex HTML document containing many images and similar content, Word will not do a good job. But for most cases it should be fine. Otherwise, there is a complex task ahead of you. You will have to parse your HTML document, using the jsoup library for example, and then use the docx4j library to create your Word document.
Links to both are here:
http://www.docx4java.org/trac/docx4j
http://jsoup.org/
Even when you do it this way, the formatting might be iffy.
To answer your original question: no, there is no ready-made library that does what you are expecting. At least I haven't come across any.
I found a workaround to do this. First I get the parsed objects using jsoup and pass them to a document template. I am now looking for options that let me create templates easily and build the document dynamically.
I have asked another question about this.
Which APIs in Java help in extracting table metadata from a PDF and presenting that table in a web page?
The result should be that when the page source is viewed, it shows the HTML code of that table.
iText is useful in this context:
http://itextpdf.com/
I assume that you need a PDF library for Java.
PDFBox is one of the popular libraries for PDF manipulation, and I think it is worth looking at.
Try the Metadata Extract Tool, which extracts metadata from specific file types, including PDF. You can then parse the XML output with any Java XML parser. Once you are able to parse it, the elements can easily be laid out in your view page.
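Whichever library does the extraction, the last step is turning the extracted cells into table markup that shows up in the page source. A minimal sketch of that step follows, assuming the rows have already been pulled out of the PDF as lists of strings. The class `HtmlTableRenderer` is hypothetical and is not part of iText, PDFBox, or the Metadata Extract Tool.

```java
import java.util.List;

// Hypothetical helper: renders already-extracted rows as an HTML table.
final class HtmlTableRenderer {

    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Each inner list is one table row; each string is one cell.
    static String render(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (List<String> row : rows) {
            sb.append("  <tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>").toString();
    }
}
```

The returned string can be written into the JSP or servlet response, so viewing the page source shows the table's HTML.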
I am using Apache Tika to convert RTF documents to HTML.
I changed Tika's RTFParser class to generate the HTML file using HTMLEditorKit, and I am now able to generate the HTML file.
I want to add the metadata tags to the head tag of the generated HTML file.
Can anybody give me an idea of how to proceed?
Check this out:
Add Metadata
I'm not sure that this will help, but I think it is worth checking out.
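If the generated HTML is available as a string, one simple option is to post-process it and insert `<meta>` elements just before the closing head tag. The sketch below assumes the metadata has already been copied into a plain map of names to values. The class `MetaTagInjector` is hypothetical and is not part of Tika.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inserts <meta name="..." content="..."> tags into an HTML head.
final class MetaTagInjector {

    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    // Escapes characters that are unsafe inside a double-quoted attribute value.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    // Returns the HTML unchanged when it has no closing head tag.
    static String addMetaTags(String html, Map<String, String> metadata) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            return html;
        }
        StringBuilder tags = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            tags.append("<meta name=\"").append(escapeAttr(e.getKey()))
                .append("\" content=\"").append(escapeAttr(e.getValue()))
                .append("\">\n");
        }
        return html.substring(0, m.start()) + tags + html.substring(m.start());
    }
}
```

With Tika, the map could be filled by iterating over the `Metadata` object's `names()` and calling `get(name)` for each one.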