html2pdf conversion converting with different language

I am using itextpdf.html2pdf to convert HTML to PDF, but after conversion the text in the PDF appears in a different language than in the HTML, and the alignment is not correct either.
I am using the code below to convert HTML to PDF:
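// Imports: org.springframework.web.client.RestTemplate, org.jsoup.Jsoup,
// com.itextpdf.html2pdf.HtmlConverter, java.io.ByteArrayOutputStream
RestTemplate rTemplate = new RestTemplate();
String d = rTemplate.getForObject(url, String.class);
// Append an empty cell to every table row
d = d.replaceAll("</tr>", "<td/></tr>");
// Note: the second argument of Jsoup.parse(String, String) is the base URI, not a charset
org.jsoup.nodes.Document document = Jsoup.parse(d, "utf-8");
// Remove all images before conversion
document.select("img").remove();
ByteArrayOutputStream outputArray = new ByteArrayOutputStream();
HtmlConverter.convertToPdf(document.html(), outputArray);
return outputArray.toByteArray();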
I want the PDF to have the same content as the HTML file, with the same alignment.
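The question does not show how the HTML is encoded, so the following is only one possible cause, offered as an assumption: if the page is decoded with the wrong charset before it reaches HtmlConverter (for example, when the server declares no charset and the String conversion falls back to a default such as ISO-8859-1), non-Latin text turns into unrelated characters. The JDK-only sketch below (class and method names are hypothetical) shows the effect. With RestTemplate, one way to control the charset yourself is to fetch `byte[].class` and decode the bytes explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Decodes raw response bytes with an explicitly chosen charset instead of a default.
    static String decodeBody(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Hypothetical page content in Cyrillic, sent by the server as UTF-8
        String original = "<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: every UTF-8 byte becomes a separate Latin-1 character
        String wrong = decodeBody(body, StandardCharsets.ISO_8859_1);
        // Matching charset: the text survives unchanged
        String right = decodeBody(body, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```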

Related

HTML to PDF conversion using OpenhtmlTopdf with ByteArrayOutputStream

I am trying to convert HTML to PDF as a Base64-encoded string, using the openhtmltopdf library. I don't want to create a new file in the user's environment, so I am using a ByteArrayOutputStream.
Following is my code:
Document document = Jsoup.parse(html, "UTF-8");
document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().prettyPrint(false);
// File outputpdf = new File("output.pdf");
try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {
    PdfRendererBuilder pdfRendererBuilder = new PdfRendererBuilder();
    pdfRendererBuilder.toStream(os);
    pdfRendererBuilder.withW3cDocument(new W3CDom().fromJsoup(document), "/");
    pdfRendererBuilder.run();
    // os.writeTo(new FileOutputStream(outputpdf));
    byte[] encoded = java.util.Base64.getEncoder().encode(os.toString().getBytes());
    String encodedString = new String(encoded);
}
While testing, I used an online Base64-to-PDF decoder to generate the PDF, and the PDF comes out empty. When I replace the ByteArrayOutputStream with a FileOutputStream(<fileName>), it creates a proper PDF file, and when I decode the string it also comes out correctly.
What am I missing in ByteArrayOutputStream?
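A likely culprit is `os.toString()`: `ByteArrayOutputStream.toString()` decodes the bytes with the platform's default charset, and a PDF is binary data, so byte sequences that are not valid text get replaced and the bytes you then Base64-encode are no longer the original PDF. Encoding the raw bytes avoids the String round trip, e.g. `Base64.getEncoder().encodeToString(os.toByteArray())`. The JDK-only sketch below (hypothetical class name and sample bytes; it uses UTF-8 explicitly to stand in for a typical default charset) compares the two paths:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class PdfBase64Demo {
    // Correct: Base64-encode the raw bytes (e.g. os.toByteArray()) directly.
    static String encode(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    // Broken: what os.toString().getBytes() effectively does, a bytes -> String -> bytes round trip.
    static byte[] viaString(byte[] pdfBytes) {
        return new String(pdfBytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A PDF header followed by binary marker bytes, as many PDFs contain (sample data)
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n', '%',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] decoded = Base64.getDecoder().decode(encode(sample));
        System.out.println(Arrays.equals(sample, decoded));           // true
        System.out.println(Arrays.equals(sample, viaString(sample)));  // false
    }
}
```

This is also why the FileOutputStream version works: the bytes go straight to the file without ever passing through a String.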

HTML tags are getting converted

I have the following code snippet to generate output from XML data that is stored in a database table:
ServletOutputStream os = response.getOutputStream();
String contentDisposition = "attachment;filename=Test.HTML";
response.setHeader("Content-Disposition", contentDisposition);
response.setContentType("text/html");
// Fetch the view object data as XML from the application module
XMLNode xmlNode = (XMLNode) am.invokeMethod("getDataXML");
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
xmlNode.print(outputStream);
ByteArrayInputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
// Apply the "MyTemplate" template to the XML and render it as HTML
OADBTransactionImpl txn =
        (OADBTransactionImpl) pageContext.getApplicationModule(webBean).getOADBTransaction();
ByteArrayOutputStream pdfFile = new ByteArrayOutputStream();
TemplateHelper.processTemplate(txn.getAppsContext(),
        "INV",
        "MyTemplate",
        txn.getUserLocale().getLanguage(),
        txn.getUserLocale().getCountry(),
        inputStream,
        TemplateHelper.OUTPUT_TYPE_HTML,
        null, pdfFile);
// Send the generated document to the browser
byte[] b = pdfFile.toByteArray();
response.setContentLength(b.length);
os.write(b, 0, b.length);
os.flush();
os.close();
pdfFile.flush();
pdfFile.close();
public XMLNode getDataXML() {
    // Write all rows of the DataVO view object as XML
    OAViewObject vo = (OAViewObject) findViewObject("DataVO");
    XMLNode xmlNode = (XMLNode) vo.writeXML(4, XMLInterface.XML_OPT_ALL_ROWS);
    return xmlNode;
}
I have HTML tags stored in the table, for example:
<STRONG>this</STRONG> is only a test.
However, in the output this is converted to:
&lt;STRONG&gt;this&lt;/STRONG&gt;is only a test.
How can I preserve the original HTML tags when I execute the code, or how can I convert them back to the original without any third-party libraries? We are not allowed to use third-party libraries on the server.

Take a look at this for more information:
The HTML character encoder converts all applicable characters to their
corresponding HTML entities. Certain characters have special
significance in HTML and should be converted to their correct HTML
entities to preserve their meanings. For example, it is not possible
to use the < character as it is used in the HTML syntax to create and
close tags. It must be converted to its corresponding &lt; HTML entity
to be displayed in the content of an HTML page. HTML entity names are
case sensitive.
Then this may help you: use Apache Commons' StringEscapeUtils.unescapeHtml4() for this:
Unescapes a string containing entity escapes to a string containing
the actual Unicode characters corresponding to the escapes. Supports
HTML 4.0 entities.
Edit
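It seems Java itself has this method (now deprecated):
URLDecoder.decode(String stringToDecode)
and this one:
URLDecoder.decode(String stringToDecode, String charset);
Note, however, that URLDecoder decodes URL percent-encoding (such as %3C), not HTML entities (such as &lt;), so on its own it will not restore the tags.
Hope this helps.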
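StringEscapeUtils (in Apache Commons Lang, and in Commons Text in newer versions) is a third-party library, which the question rules out. As a JDK-only alternative, here is a minimal sketch (hypothetical class and method names) that unescapes the five predefined XML entities plus decimal and hexadecimal character references. It does not cover the full HTML 4 entity table that unescapeHtml4() supports, and it leaves unknown or invalid references unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BasicEntityUnescaper {
    // Matches &name; or &#decimal; or &#xhex;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the result literal
            m.appendReplacement(out, Matcher.quoteReplacement(resolve(m.group(1), m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the character(s) for one entity body, or the original text if it is not recognized.
    private static String resolve(String body, String original) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
            }
            if (body.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            return original; // number too large or not a valid code point
        }
        switch (body) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return original;
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;STRONG&gt;this&lt;/STRONG&gt; is only a test."));
        // prints: <STRONG>this</STRONG> is only a test.
    }
}
```

Because the string is processed in a single pass, a double-escaped "&amp;lt;" becomes "&lt;" rather than "<", which is the usual expected behavior.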

Can I convert docx to PDF while also encrypting it at the same time?

I'm currently converting docx to pdf, then encrypting the pdf. Here is my code:
//Convert
XWPFDocument document = new XWPFDocument(inStream);
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(document, outStream, options);
//Encrypt
PdfReader reader = new PdfReader("C:\\uploads\\Resume.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("C:\\uploads\\ResumeEncrypt.pdf"));
stamper.setEncryption("hello123".getBytes(), "hello".getBytes(),
PdfWriter.ALLOW_PRINTING, PdfWriter.ENCRYPTION_AES_128 | PdfWriter.DO_NOT_ENCRYPT_METADATA);
stamper.close();
reader.close();
By doing this I get 2 files: first I convert Resume.docx to Resume.pdf, then I encrypt Resume.pdf to ResumeEncrypt.pdf.
But I want only one file that is already converted and encrypted.
Is it possible to get a single file after converting and encrypting?

Try using a ByteArrayInputStream and give it the converted PDF, instead of writing the PDF to a file first.
I did something similar a few days ago: decoding Base64 and un-gzipping it to XML, all in streams. If you want, I can give you that code as a tip.
So maybe you can base your solution on this code:
// Decode Base64 and unzip to XML in streams (strLista holds the Base64 text)
ByteArrayInputStream in = new ByteArrayInputStream(strLista.getBytes(StandardCharsets.US_ASCII));
byte[] buff = new byte[4096];
try (InputStream reader = Base64.getMimeDecoder().wrap(in);
     GZIPInputStream gis = new GZIPInputStream(reader);
     ByteArrayOutputStream out = new ByteArrayOutputStream()) {
    int readGis;
    while ((readGis = gis.read(buff)) != -1) {
        out.write(buff, 0, readGis);
    }
    // ... use out.toByteArray() (the unzipped XML) here
}
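The snippet above only shows the decoding side. As a self-contained illustration of the same all-in-memory stream chaining, here is a JDK-only round-trip sketch (class and method names are hypothetical; it assumes the XML is UTF-8 text). Applied to the question, the analogous step would be to convert the DOCX into a ByteArrayOutputStream and pass its toByteArray() to the encryption step; iText 5's PdfReader has a constructor that accepts a byte[], so no intermediate Resume.pdf needs to be written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {
    // Gzips the text and Base64-encodes the compressed bytes, entirely in memory.
    static String encode(String xml) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getMimeEncoder().encodeToString(compressed.toByteArray());
    }

    // Base64-decodes and gunzips, mirroring the answer's snippet.
    static String decode(String base64) {
        InputStream in = new ByteArrayInputStream(base64.getBytes(StandardCharsets.US_ASCII));
        try (InputStream gis = new GZIPInputStream(Base64.getMimeDecoder().wrap(in));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buff = new byte[4096];
            int n;
            while ((n = gis.read(buff)) != -1) {
                out.write(buff, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<lista><item>1</item></lista>";
        System.out.println(decode(encode(xml)).equals(xml)); // true
    }
}
```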

java apache IOUtils breaks file content

I need to encode/decode a PDF file to and from Base64.
So I read the file from disk into a String (because in the future I will receive the file as a Base64 String):
String pdfString = IOUtils.toString(new FileInputStream(new
File("D:\\vrpStamped.pdf")));
byte[] encoded = Base64.encodeBase64(pdfString.getBytes());
byte[] newPdfArray = Base64.decodeBase64(encoded);
FileOutputStream imageOutFile = new FileOutputStream(
"D:\\1.pdf");
imageOutFile.write(newPdfArray);
imageOutFile.close();
imageOutFile.flush();
My D:\1.pdf doesn't open in Adobe Reader. But if I read the file straight into a byte array using IOUtils.toByteArray(..) instead, everything works and D:\1.pdf opens successfully in Adobe Reader:
byte[] encoded = Base64.encodeBase64(IOUtils.toByteArray(new FileInputStream(new File("D:\\vrpStamped.pdf"))));
It seems to me that IOUtils.toString(..) changes something in the file content. So how can I convert a file to a String without breaking its content?

How to encode a pdf...
byte[] bytes = IOUtils.toByteArray(new FileInputStream(new File("/home/fschaetz/test.pdf")));
byte[] encoded = Base64.encode(bytes);
String str = new String(encoded);
...now do something with this encoded String, for example, send it via a Rest service.
And now, if you receive an encoded String, you can decode and save it like this...
byte[] decoded = Base64.decode(str.getBytes());
FileOutputStream output = new FileOutputStream(new File("/home/fschaetz/result.pdf"));
output.write(decoded);
output.close();
Works perfectly fine with all files, not limited to images or pdfs.
What your example is doing is...
Read the PDF into a String (which pretty much destroys the data, since you are reading binary data into a String)
Encode this string (which is in all likelihood no longer a valid representation of the original PDF)
Decode it and save it to disk
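The same approach works without Commons IO or a third-party Base64 class. A minimal JDK-only sketch (Java 8+; class and method names are hypothetical) reads the file with Files.readAllBytes, encodes it with java.util.Base64, and writes the decoded bytes back; selfCheck round-trips sample bytes through two temporary files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

public class PdfFileBase64 {
    // Reads the file as raw bytes (never via a String) and Base64-encodes them.
    static String encodeFile(Path pdf) throws IOException {
        return Base64.getEncoder().encodeToString(Files.readAllBytes(pdf));
    }

    // Decodes a received Base64 String and writes the original bytes to disk.
    static void decodeToFile(String encoded, Path target) throws IOException {
        Files.write(target, Base64.getDecoder().decode(encoded));
    }

    // Writes sample bytes to a temp file, encodes, decodes to a second file, and compares.
    static boolean selfCheck(byte[] sample) {
        try {
            Path in = Files.createTempFile("in", ".pdf");
            Path out = Files.createTempFile("out", ".pdf");
            try {
                Files.write(in, sample);
                decodeToFile(encodeFile(in), out);
                return Arrays.equals(sample, Files.readAllBytes(out));
            } finally {
                Files.deleteIfExists(in);
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n', '%', (byte) 0xE2, (byte) 0xE3};
        System.out.println(selfCheck(sample)); // true
    }
}
```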

How to keep character "&" from ISO-8859-1 to UTF-8

I have just written a Java file in Eclipse with the ISO-8859-1 encoding.
In this file, I want to create a String like this (in order to create XML content and save it into a database):
// <image><img src="path_of_picture"></image>
String xmlContent = "<image><img src=\"" + path_of_picture+ "\"></image>";
In another file, I take this String and create a new String with this constructor:
String myNewString = new String(xmlContent.getBytes(), "UTF-8");
To be understood by an XML parser, my XML content must be converted to:
<image><img src="path_of_picture"></image>
Unfortunately, I can't find how to write xmlContent to get this result in myNewString.
I tried two methods :
// First :
String xmlContent = "<image><img src=\"" + content + "\"></image>";
// But the result is just myNewString = <image><img src="path_of_picture"></image>
// and my XML parser can't get the content of <image/>
//Second :
String xmlContent = "<image><img src=\"" + content + "\"></image>";
// But the result is just myNewString = <image>&lt;img src="path_of_picture"&gt;</image>
Do you have any ideas?

This is unclear. But Strings don't have an encoding. So when you write
String s = new String(someOtherString.getBytes(), someEncoding);
you will get various results depending on your default encoding setting (which is used for the getBytes() method).
If you want to read a file encoded with ISO-8859-1, you simply do:
read the bytes from the file: byte[] bytes = Files.readAllBytes(path);
create a string using the file's encoding: String content = new String(bytes, "ISO-8859-1");
If you need to write back the file with a UTF-8 encoding you do:
convert the string to bytes with UTF-8 encoding: byte[] utfBytes = content.getBytes("UTF-8");
write the bytes to the file: Files.write(path, utfBytes);
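The steps above can be sketched with the StandardCharsets constants (Java 7+), which avoid the checked UnsupportedEncodingException of the overloads that take a charset name; the class and method names are hypothetical, and the bytes are kept in memory instead of being read from a file. Note that ASCII characters such as & and < are encoded identically in ISO-8859-1 and UTF-8, so re-encoding alone never adds or removes entity escapes like &lt;.

```java
import java.nio.charset.StandardCharsets;

public class RecodeDemo {
    // Decode with the file's real encoding, then re-encode as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String content = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return content.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // "café" in ISO-8859-1
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(utf8.length); // 5: the accented letter becomes the two bytes 0xC3 0xA9
    }
}
```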

I don't think your question is related to encoding, but if you want to create XML content to save into a database, you can use this code to parse it:
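import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Parses an XML string into a DOM Document
public static Document loadXMLFromString(String xml) throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(xml));
    return builder.parse(is);
}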
Refer to this SO answer.
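If the goal is for <img src="..."> to appear as text inside <image> (escaped as &lt;img ...&gt;), one option, offered as an alternative to string concatenation rather than as this answer's own method, is to build the XML with the same DOM API and let the serializer do the escaping. The sketch below (hypothetical class and method names) stores the markup as a text node, serializes it with a Transformer, and parses it back to show that the text is recovered unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlEscapeDemo {
    // Builds <image>...</image> with the <img> markup stored as text; the serializer escapes it.
    static String buildImageXml(String pathOfPicture) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element image = doc.createElement("image");
            image.appendChild(doc.createTextNode("<img src=\"" + pathOfPicture + "\">"));
            doc.appendChild(image);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML and returns the root element's text, with the escapes resolved.
    static String readImageText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = buildImageXml("path_of_picture");
        System.out.println(xml);                 // the <img ...> text is written escaped, as &lt;img ...
        System.out.println(readImageText(xml));  // the original <img src="path_of_picture"> text
    }
}
```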
