Keep CKEditor formatting when exporting to MS Word - java

I'm trying to export a text area (for which I use ckeditor) into a Word document. I'm using JSP, and setting HTTP headers of a target page to receive the textarea value in request scope:
<%@page contentType="application/vnd.ms-word"%>
<% response.setHeader("Content-Disposition", "attachment;filename=responseLetter.doc"); %>
...
<%=textAreaReqScopeValue%>
However, the generated Word document loses the formatting and styles of my CKEditor source (example below):
<p>Dear Anonymous,</p><p>This is in response to your <strong><em><u>request regarding your continued ...
Is there any way to keep the formatting, either by generating the Word document or through CKEditor?

Using googoose.js or html-doc.js solved my problem. An Open XML library is needed to process the HTML tags for the MS Word output.
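For a server-side variant of the same idea, here is a minimal sketch that wraps the CKEditor fragment in a complete HTML document before it is written to the response, so that Word receives a full page rather than a bare fragment. The class name WordHtmlWrapper and its methods are hypothetical and not part of any library; whether every CKEditor style survives depends on Word's HTML import, so treat this as a starting point rather than a confirmed fix.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a library class): wraps an HTML fragment in a full document. */
class WordHtmlWrapper {

    /** Returns a complete HTML page that declares the Office namespaces and UTF-8. */
    static String wrap(String fragment) {
        return "<html xmlns:o=\"urn:schemas-microsoft-com:office:office\"\n"
             + "      xmlns:w=\"urn:schemas-microsoft-com:office:word\"\n"
             + "      xmlns=\"http://www.w3.org/TR/REC-html40\">\n"
             + "<head><meta charset=\"utf-8\"><title>Export</title></head>\n"
             + "<body>\n" + fragment + "\n</body>\n</html>\n";
    }

    /** Encodes the wrapped page as UTF-8 bytes, ready for the response output stream. */
    static byte[] toBytes(String fragment) {
        return wrap(fragment).getBytes(StandardCharsets.UTF_8);
    }
}
```

In the JSP or servlet, the result of WordHtmlWrapper.toBytes(textAreaReqScopeValue) would be written to the response output stream after the Content-Disposition header has been set.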

Related

parse text from xml

I have the following link:
https://hero.epa.gov/hero/ws/swift.cfc?method=getProjectRIS&project_id=993&getallabstracts=true
I want to parse this XML to get only the text, like:
Provider: HERO - 2.xx
DBvendor=EPA
Text-encoding=UTF-8
How can I parse it?
Well, it's not a text file, it's an HTML file. If you open the file in a browser and select View Source, you will see the text enclosed in <char> tags.
When it's opened in a browser, these tags and the other HTML content are interpreted and the output is rendered on the page (that's why it looks like text). If you want to implement similar behavior in Java, look into PhantomJS and/or JSoup examples.
It looks like a text file, but it is an XML file and the browser just displays its text content.
To verify, right-click and view the page source.
You can use a library like Jsoup to parse the file and get the contents.
https://jsoup.org/cookbook/introduction/parsing-a-document
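If adding Jsoup is not an option, the same idea can be sketched with the DOM parser that ships with the JDK. This is a swapped-in technique, not the Jsoup approach from the answer, and it assumes the response is well-formed XML; the class name XmlTextExtractor is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: returns the text content of an XML string with all tags dropped. */
class XmlTextExtractor {

    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates the text of every descendant node
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed input (for example HTML that is not valid XML) ends up here
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

If the parser rejects the input, that suggests the response is HTML rather than strict XML, in which case a lenient parser such as Jsoup is the better fit.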

Content skipped during HTML to Excel conversion using Aspose.Cells in Java

I want to convert HTML to Excel using Aspose.Cells in Java, but the generated Excel file skips some of the content.
HTML content :
Hi Fanny,
Urgent !! 
SPR'17 - S/545175 -- ADSO# 16843754;
SPR'17 - S/545175 -- ADSO# 16843754;
fdzjchxk;shdgasz;ASDO;fhsjdzx
dyzhbsxz;sdhbdugvfd;36457q;sfdnzcx;
Best regards
Tel: 0123-1234 8765
Generated Excel file content :
SPR'17 - S/545175 -- ADSO# 16843754;
I am using aspose cells-16.12.0.jar and the code runs without any error. The content doesn't contain any images or tables, but it does contain special symbols. I suspect the special symbols are causing the problem.
Aspose.Cells supports reading and writing MS Excel-oriented HTML files, which means you have to use only those HTML tags that MS Excel itself uses when it creates such a file. I suspect your template HTML is not suitable. Try opening the HTML file in MS Excel first to check whether it opens correctly. If MS Excel opens the file correctly, the Aspose.Cells APIs should be able to read it as well.

POI enable different header/footer for the first page in word docx file

I'm generating a docx file using Apache POI 3.13 and I'm stuck on headers/footers for the first page.
I create the XWPFParagraph[] arrays without any problem. Next I create headers and footers like this (I've tried different orders):
policy.createHeader(XWPFHeaderFooterPolicy.DEFAULT, defaultHeader);
policy.createFooter(XWPFHeaderFooterPolicy.DEFAULT, defaultFooter);
policy.createHeader(XWPFHeaderFooterPolicy.FIRST, firstHeader);
policy.createFooter(XWPFHeaderFooterPolicy.FIRST, firstFooter);
Once I generate my docx file, I see my default header/footer on every page, including the first one. But if I manually select the option to use a different header/footer for the first page, my first-page header and footer appear correctly.
How can I make this happen automatically via code? And is there any good documentation with examples for POI?
If you want to set a first-page header in a section, you must add a title page tag inside the section properties tag (w:sectPr). The title page tag can be empty, but it must be present. In your case, you only need to add two lines of code:
CTSectPr sect = document.getDocument().getBody().getSectPr();
sect.addNewTitlePg();
Best regards!

HTML -> PDF rendering issues

I'm generating PDFs from HTML pages with an application. Sometimes the PDF is formatted correctly (with styles); other times, it lacks style elements.
In the log file I can see the message "Error in rendering".
We build the HTML in a string buffer and convert it to a PDF file. I'm not sure why the formatting is sometimes missing from the generated PDF.
So sometimes the CSS file (style) is applied during conversion of the HTML file, and sometimes it is not.
I'm guessing that you use an external CSS file. If I were you, I would try putting your CSS code directly in your HTML file, inside the head element, like this:
<style>
body {background-color:#fff}
h1 {color:#eee}
</style>
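Since the HTML is assembled as a string before conversion, the suggestion above could be applied in code roughly as follows. This is a minimal sketch: CssInliner is a hypothetical helper, and it assumes the CSS text has already been loaded into a string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: embeds CSS text in an HTML string as a style block. */
class CssInliner {

    private static final Pattern HEAD_END =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    static String inline(String html, String css) {
        String styleBlock = "<style>\n" + css + "\n</style>\n";
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            // No closing head tag found: put the style block in front of the markup
            return styleBlock + html;
        }
        // Insert the style block immediately before the closing head tag
        return html.substring(0, m.start()) + styleBlock + html.substring(m.start());
    }
}
```

With the styles embedded this way, the converter no longer has to fetch a separate file, which removes one possible source of the intermittent failures.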

How to place HTML text into an OpenOffice document using the OpenOffice API

Consider this example:
I've got HTML-tagged text:
<font size="100">Example text</font>
I have an *.odt (OpenDocument Text) document into which I want to place this HTML text, formatted according to its HTML tags (in this example the font tag should be omitted and the text "Example text" should have a 100-point font size in the resulting *.odt file).
I would prefer (though this is not a strict requirement) to use the OpenOffice UNO API for Java. Is there a way to inject this HTML text into the body of the *.odt document using a built-in HTML-to-ODT converter in the UNO API or something similar, or do I have to walk through the HTML tags manually and then use the UNO API to place text with specific formatting (e.g. font size)?
OK, this is what I've done to achieve this (using the OpenOffice UNO API with Java):
1. Load the odt document in which you want to place the HTML text.
2. Go to the place where you want to insert the HTML text.
3. Save the HTML text to a temp file on the system (it may be possible to use an http URL without saving, but I haven't tested that).
4. Insert the HTML into the odt by following these instructions, passing the URL of the temp HTML file (remember to convert the system path to an OO path).
Maybe you can use JODConverter, or you can use the XSLT from xhtml2odt.
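The temp-file step and the path conversion mentioned above can be sketched with the standard library alone. HtmlTempFile is a hypothetical helper; the UNO insert call itself is not shown, and the sketch assumes that OpenOffice accepts the file: URL produced by Path.toUri().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: stores HTML in a temp file and returns its file: URL. */
class HtmlTempFile {

    static String writeAndGetUrl(String html) {
        try {
            Path tmp = Files.createTempFile("fragment", ".html");
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
            // toUri() turns the system path into a file: URL
            return tmp.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned string is what would be passed as the URL argument of the insert call in the final step.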
