PDF file content to Base64 and vice versa in Java

I need to convert PDF content to Base64 and use it as a String.
When I test with the program below, out.pdf comes out blank.
byte[] pdfRawData = FileUtils.readFileToByteArray(new File("C:\\in.pdf")) ;
String pdfStr = new String(pdfRawData);
//My data is available in the form of String
BASE64Encoder encoder = new BASE64Encoder();
String encodedPdf = encoder.encode(pdfStr.getBytes());
System.out.println(encodedPdf);
// Decode the encoded content to test
BASE64Decoder decoder = new BASE64Decoder();
FileUtils.writeByteArrayToFile(new File("C:\\out.pdf") , decoder.decodeBuffer(encodedPdf));
Can anyone please help me?

Why are you doing:
String pdfStr = new String(pdfRawData);
instead of passing pdfRawData to the encoder?
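Doing so leads to lots of encoding issues, because you don't specify which charset to use when building the string from the byte array (so the platform default is used). A PDF is binary data, and byte sequences that are not valid in that charset are replaced, so pdfStr.getBytes() does not give you back the original bytes. It is also clearly redundant (byte array -> string -> byte array).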
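A minimal sketch of the fix, assuming Java 8+ so that java.util.Base64 can replace sun.misc.BASE64Encoder/BASE64Decoder (internal JDK classes that were removed in Java 9), and using java.nio.file.Files instead of Commons IO FileUtils. The class name, method names and default file names are placeholders, not part of the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;

public class PdfBase64RoundTrip {

    // Encode the raw PDF bytes directly: no intermediate new String(bytes).
    static String encodeFile(Path pdf) throws IOException {
        byte[] raw = Files.readAllBytes(pdf);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode the Base64 text back into the original bytes and write them out unchanged.
    static void decodeToFile(String encoded, Path out) throws IOException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        Files.write(out, raw);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: pass your own files as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "in.pdf");
        Path out = Paths.get(args.length > 1 ? args[1] : "out.pdf");

        String encodedPdf = encodeFile(in);
        System.out.println(encodedPdf);

        decodeToFile(encodedPdf, out);
        // Prints whether out.pdf is byte-for-byte identical to in.pdf.
        System.out.println(Arrays.equals(Files.readAllBytes(in), Files.readAllBytes(out)));
    }
}
```

The only essential change from the question is that the byte array goes straight into the encoder, so every byte value survives the round trip.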

Related

How to 'decode' a UTF-8 String which is built upon gzipped byte array

I have some legacy text data that was created by gzipping UTF-8 text and then decoding the gzipped byte array as if it were UTF-8.
I'm wondering whether I can get the raw data back,
something like:
String text = "Hello World!";
byte[] binData = text.getBytes("UTF-8");
byte[] compressData = gzip(binData);//via GZIPOutputStream
//this is what I have
String encodedString = new String(compressData, "UTF-8");
assertEquals(text, smartDecode(encodedString));
Is it possible to write a function like smartDecode that retrieves the original text 'Hello World!'?
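Whether smartDecode is possible depends on the charset the string was built with. A minimal sketch, using hypothetical gzip/gunzip helpers in place of the question's gzip(...): the gzip header begins with the bytes 0x1f 0x8b, and 0x8b is not valid UTF-8 on its own, so new String(compressData, "UTF-8") replaces it with U+FFFD and the original bytes cannot be recovered reliably. A single-byte charset such as ISO-8859-1, by contrast, maps every byte to exactly one char and back, so data stored that way can be restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStringRoundTrip {

    // Hypothetical helper standing in for the question's gzip(...).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

    // True if decoding the bytes with the charset and re-encoding gives the same bytes back.
    static boolean survives(byte[] data, Charset cs) {
        return Arrays.equals(data, new String(data, cs).getBytes(cs));
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("Hello World!".getBytes(StandardCharsets.UTF_8));

        // 0x8b in the gzip header is malformed UTF-8 and becomes U+FFFD.
        System.out.println(survives(compressed, StandardCharsets.UTF_8));      // false

        // ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto chars, so it is reversible.
        System.out.println(survives(compressed, StandardCharsets.ISO_8859_1)); // true

        String latin1 = new String(compressed, StandardCharsets.ISO_8859_1);
        byte[] restored = gunzip(latin1.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(restored, StandardCharsets.UTF_8));      // Hello World!
    }
}
```

If the legacy strings really were built with UTF-8, the gzip magic number itself is already damaged, so GZIPInputStream rejects the re-encoded bytes and a general smartDecode is most likely not possible.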

Cannot properly decode Base64 MIME image to byte array (Java)

I'm trying to write a Selenium/Java test that checks the 2FA configuration process. To do that, I have to read a QR code from the page and process it with ZXing. The image is embedded as a Base64 string, and I'm struggling to decode it into a byte array. The following code should convert the Base64 string to a byte array and then write it to a file.
Here is the code I wrote:
String base64Source = LocalDriverManager.get().findElement(By.xpath("//img[@class='qr-code']")).getAttribute("src");
String base64Image = base64Source.split(",")[1];
byte[] decoded = Base64.getMimeDecoder().decode(base64Image);
try (OutputStream stream = new FileOutputStream("QR_CODE.png")){
stream.write(decoded);
}
This code compiles with no errors, but when I try to open generated png file I get only "Fatal error reading PNG image file: Decompression error in IDAT".
I know that base64 string is valid as I was able to convert it to the image using some online converter. Also, I checked the string with online validator and it said that this is a valid base64 MIME string.
Example of the Base64 string:
iVBORw0KGgoAAAANSUhEUgAAAeoAAAHqAQAAAADjFjCXAAAET0lEQVR4nO2dXYrrOgyAP50E5jGB%0AWUCX4uxgljScJd0dxEvpAgacx4KDzoPsxJ3hcqHppadUegiZxB9uQEjWjz2iHJD46wgNjjvuuOOO%0AO+64447fF5ciPbCUi8i4CnEEYBWZAJmWOnS63+yOvygeVFU1AfF0MV3TmU5lWnobob+lB+hUVVWv%0A8YOzO/6i+FLMl3yee2DI9kznzcx9pjK02MR7zu74a+H99wdx7LJAnyUoENJ7FpZ3lXi6yL1nd/w1%0A8R9aB6DxI6EsIxo/LqKQ/5/ZHX9NvGrdoMACwCp11dZlCQmVMK890Cks0OaVn/rbHX8wHkVEZARC%0A6lQ+zz0yATWkfVOCPVsthL3r7I6/GG62rjFfcezQeMooZLuD4SIwZPTa0j36xzv+pDiWBwkJ2NIi%0AJvOgqpo61XnI9e31uPmpv93xR+FUDcuoasZ0babk6wiaG8Xctc4yfK51jt8ku23D4oXdpA3F4Jlz%0AnSlWz+7c1jl+EJdpEYHhIjoDMgGq5x5gFQskPtMqsLypTEO2IX/Hj3f86fDqPrX4VXOpxa6pElJX%0AXli9YvO/buscv132qmrxmqlWWkMqvraEFECJOgb1dZ3jR6TRsNmebIu2Tbk21Su6NmS3dY4fkK0i%0AtgpB154wr6Iso9k1jeNXr1a5GPayWOc9J44fkbZvqfmzU8uc2LN0nUhJnjlx/JDUaGJP1e2BBJbD%0AI9SL5+scvwdeK2LLiIRznyWkUQhp7ZWly7C8F4MXpWtt4l1md/w18epShx91sN3+WeQaUmdRh8ew%0Ajt+pIgatXpWkSb3MUPWvZvNc6xy/Xbb+uq+eeLqIWgfdAqXTBASGhICAeWJdvdPJ8SPS5Ib3Smux%0AfzWugGrr2LtP3NY5frPsHtYammzlBm1tTBM0RbNmEfjU3+74o/A2XxdSUb1duZoti00M4bbO8WOy%0Ax7BNa1MJKaoSliE/1dG1zvGbpGSJm/r+FrTWAkXTbrf7Wrd1jt8uu4e9ustX6zrdfK3WdifXOsdv%0AlxrD0uThml46bdLCuyaCe1jHbxfL10mYFdvszzLaG2XpswCi8aOWwOKYtD0B4Km/3fFH4TVLvIxo%0AFNBy4gRQtv0DLAIMqca6/7zXpqen/nbHH4VfRRNbzcvWdQnM11odVrXuUfQY1vFDslXELqIsfdY4%0AfiEAGk+51zi9bS517Quw9G7rHD8iei0WNGyFryZeLXHtHle4rXP8VmlqE6W+muv2nGHbI7u72bRt%0AmXWtc/wgHmr6RCY6FTmVUxOt8SSc37YlXbnIdMfZHX8xvD3nZC/8A9TMnTabx8qL5PthHb8rvkez%0AZutYRSarza5iuvb75LbO8SPy84TYU0bCWcqpYXF6Uyx8HXJPnAAG35no+BH5flZnTZ+AMHz126md%0AaocTBwWBtb546m93/FH49xi2Pts26qheNRnvuyp8Xef4rSL632P+Xfy/1znuuOOOO+64447/Lfgf%0AFuoX02DU2vMAAAAASUVORK5CYII=
Try this:
String base64Source = LocalDriverManager.get().findElement(By.xpath("//img[@class='qr-code']")).getAttribute("src");
String base64Image = base64Source.split(";")[1].split(",")[1]; //Try this
byte[] decoded = Base64.getMimeDecoder().decode(base64Image);
try (OutputStream stream = new FileOutputStream("QR_CODE.png")){
stream.write(decoded);
}
OK, I figured it out and it works now.
The thing is, when I run:
String base64Source = LocalDriverManager.get().findElement(By.xpath("//img[@class='qr-code']")).getAttribute("src");
the returned string contains URL-encoded newline characters (%0A). The MIME decoder skips the '%' but treats '0' and 'A' as Base64 data, which corrupts the image. So before decoding it to a byte array I need to run base64Image = base64Image.replaceAll("%0A", ""); to remove them.
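The fix can be folded into one helper. A minimal sketch, assuming the src attribute is a data URI of the form "data:image/png;base64,..." whose payload may contain URL-encoded line breaks (%0A, and possibly %0D); the class and method names are hypothetical. Note that URLDecoder is deliberately not used, because it would turn the '+' characters of the Base64 alphabet into spaces.

```java
import java.util.Arrays;
import java.util.Base64;

public class DataUriDecoder {

    // Decodes the payload of a data URI such as "data:image/png;base64,iVBOR...".
    static byte[] decodeDataUri(String src) {
        int comma = src.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Not a data URI: no comma found");
        }
        // Strip URL-encoded line breaks first; otherwise the MIME decoder would skip '%'
        // but keep '0' and 'A' as data, producing a corrupt PNG.
        String payload = src.substring(comma + 1)
                .replace("%0A", "")
                .replace("%0D", "");
        // The MIME decoder additionally tolerates any literal line breaks left in the text.
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0x89, 'P', 'N', 'G', 13, 10, 26, 10};
        String b64 = Base64.getEncoder().encodeToString(original);
        String src = "data:image/png;base64," + b64.substring(0, 4) + "%0A" + b64.substring(4);
        System.out.println(Arrays.equals(original, decodeDataUri(src))); // true
    }
}
```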

Junk Characters Coming while Decoding Data in Base 64

I receive PDF content that is Base64-encoded. I tried to decode it in NiFi with the Base64EncodeContent processor, and I send the decoded file by email. Below is a small sample of the output in the email:
"No data should be available in . ¹ Check if sent . . All documents are sent as pdf to* 9 : ’ ³: > < âA m¬‘²#%é‚ÇŽÇ¢|ÀÈ™$Éز§Uû÷LÒTB¨ l,îåù˜$â´º?6N¬JC¤ŒÃ°‰_Ïg -æ¿;ž‰ìÛÖYl`õ?èÓÌ[ ÿÿ PK"
How can I extract the PDF data as sent by the third party?
I also tried to decode it with Java code, and that fails too: the PDF won't open and contains the same kind of junk characters.
The file ConvertedJPGPDF.pdf used below contains the Base64-encoded string.
String filePath = "C:\\Users\\xyz\\Desktop\\";
String originalFileName = "ConvertedJPGPDF.pdf";
String newFileName = "test.pdf";
byte[] input_file =
Files.readAllBytes(Paths.get(filePath+originalFileName));
// byte[] decodedBytes = Base64.getDecoder().decode(input_file);
byte[] decodedBytes1 = Base64.getMimeDecoder().decode(input_file);
FileOutputStream fos = new FileOutputStream(filePath+newFileName);
fos.write(decodedBytes1);
fos.flush();
fos.close();
You mentioned that the file already contains a Base64-encoded string:
ConvertedJPGPDF.pdf file used below contains Base64 encoded String.
So, you don't need to run this line:
byte[] encodedBytes = Base64.getEncoder().encode(input_file);
Doing so encodes those bytes a second time.
Decode the input_file array directly and save the resulting byte array to a .pdf file.
Update:
ConvertedJPGPDF.pdf doesn't really need to be named .pdf; since it is Base64-encoded, it is really a plain-text file.
Anyway, the following code works for me:
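import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
String filePath = "C:\\Users\\xyz\\Desktop\\";
String originalFileName = "ConvertedJPGPDF.pdf";
String newFileName = "test.pdf";
// The input file already holds Base64 text, so read it as-is and decode it (no re-encoding)
byte[] input_file = Files.readAllBytes(Paths.get(filePath+originalFileName));
// The MIME decoder tolerates line breaks inside the Base64 text
byte[] decodedBytes1 = Base64.getMimeDecoder().decode(input_file);
Files.write(Paths.get(filePath+newFileName), decodedBytes1);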
Hope this helps!
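Since the decoding itself is correct, a quick way to check whether the third party actually sent a PDF is to look for the "%PDF-" header that PDF files normally begin with. A minimal sketch; the class and method names are hypothetical, and it only inspects the first bytes, not the validity of the whole document.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class PdfBase64Check {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Decodes Base64 text that may be wrapped across lines (MIME style)
    // and reports whether the result starts with the "%PDF-" header.
    static boolean looksLikePdf(byte[] base64Text) {
        byte[] decoded = Base64.getMimeDecoder().decode(base64Text);
        return decoded.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(decoded, PDF_MAGIC.length), PDF_MAGIC);
    }

    public static void main(String[] args) {
        byte[] fakePdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] encoded = Base64.getMimeEncoder().encode(fakePdf);
        System.out.println(looksLikePdf(encoded)); // true
    }
}
```

If the check fails on the real input, the problem most likely lies in what is being sent rather than in the Base64 decoding.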

Convert byte[] to String and back

I'm trying to save the content of a PDF file in JSON, and thought of storing the PDF as a String converted from its byte[].
byte[] byteArray = feature.convertPdfToByteArray(Paths.get("path.pdf"));
String byteString = new String(byteArray, StandardCharsets.UTF_8);
byte[] newByteArray = byteString.getBytes(StandardCharsets.UTF_8);
String secondString = new String(newByteArray, StandardCharsets.UTF_8);
System.out.println(secondString.equals(byteString));
System.out.println(Arrays.equals(byteArray, newByteArray));
System.out.println(byteArray.length + " vs " + newByteArray.length);
The result of the above code is as follows:
true
false
421371 vs 760998
The two Strings are equal, but the two byte[]s are not. Why is that, and how do I correctly convert/save a PDF inside JSON?
You are probably using the wrong charset to decode the bytes read from the PDF file.
For example, the character é (e with acute) is the single byte 0xE9 in ISO-8859-1, and that byte on its own is not valid UTF-8:
byte[] byteArray = "é".getBytes(StandardCharsets.ISO_8859_1);
String byteString = new String(byteArray, StandardCharsets.UTF_8);
byte[] newByteArray = byteString.getBytes(StandardCharsets.UTF_8);
String secondString = new String(newByteArray, StandardCharsets.UTF_8);
System.out.println(secondString.equals(byteString));
System.out.println(Arrays.equals(byteArray, newByteArray));
System.out.println(byteArray.length + " vs " + newByteArray.length);
Output:
true
false
1 vs 3
Why is that
If the byteArray indeed contains a PDF, it most likely is not valid UTF-8. Thus, wherever
String byteString = new String(byteArray, StandardCharsets.UTF_8);
stumbles over a byte sequence which is not valid UTF-8, it will replace that by a Unicode replacement character. I.e. this line damages your data, most likely beyond repair. So the following
byte[] newByteArray = byteString.getBytes(StandardCharsets.UTF_8);
does not result in the original byte array but instead a damaged version of it.
The newByteArray, on the other hand, is the result of UTF-8 encoding a given string, byteString. Thus, newByteArray is valid UTF-8 and
String secondString = new String(newByteArray, StandardCharsets.UTF_8);
does not need to replace anything, so in particular byteString and secondString are equal.
how to correctly convert/save a pdf inside a json?
As @mammago explained in his comment:
JSON is not the appropriate format for binary content (like files). You should probably use something like Base64 to create a string out of your PDF and store that in your JSON object.
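A minimal sketch of the Base64 approach using only the JDK. The JSON is built by hand and the field name "pdf" is a hypothetical choice to keep the example dependency-free; in a real application a JSON library such as Jackson or Gson would build and parse the object. The standard Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') needs no escaping inside a JSON string.

```java
import java.util.Arrays;
import java.util.Base64;

public class PdfInJson {

    // Wraps arbitrary bytes in a JSON object as a Base64 string.
    static String toJson(byte[] pdfBytes) {
        return "{\"pdf\":\"" + Base64.getEncoder().encodeToString(pdfBytes) + "\"}";
    }

    // Naive extraction that only understands the exact shape produced by toJson.
    static byte[] fromJson(String json) {
        String prefix = "{\"pdf\":\"";
        String base64 = json.substring(prefix.length(), json.length() - 2); // drop trailing "}
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] byteArray = new byte[256];
        for (int i = 0; i < byteArray.length; i++) {
            byteArray[i] = (byte) i; // every byte value, including ones that are invalid UTF-8
        }
        byte[] newByteArray = fromJson(toJson(byteArray));
        System.out.println(Arrays.equals(byteArray, newByteArray)); // true
    }
}
```

Unlike the new String(byteArray, UTF_8) route, this round trip is lossless, at the cost of roughly a third more space.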

Issues in converting base64 decoded byte array to String in java

public static String decode(String strcontent) throws Exception
{
BASE64Decoder decoder = new BASE64Decoder();
byte[] imgBytes = decoder.decodeBuffer(strcontent);
return new String(imgBytes);
}
With the above code I'm trying to create a String from the Base64-decoded byte array (imgBytes); the input strcontent is a Base64-encoded string. It works fine for text files, but for PDF and image files the string conversion causes problems. I have tried different encodings such as UTF-8, UTF-16, etc., with no luck: the returned string is different from the original one.
When I write the byte array to a file instead:
OutputStream out = new FileOutputStream(##path);
out.write(imgBytes);
out.close();
the file is created correctly, without any issues.
I also tried the code below:
byte[] imgBytes= ( new String(imgBytes1)).getBytes(); //Converting to String and back to bytes
OutputStream out = new FileOutputStream(##Filename);
out.write(imgBytes);
out.close();
This time the image file is corrupted.
Please suggest.
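A minimal sketch of the usual remedy: let the decode method return byte[] for binary content, and only build a String for content known to be text, using the charset it was written in (UTF-8 is an assumption here). The class and method names are hypothetical; java.util.Base64 replaces the internal sun.misc.BASE64Decoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64Decoding {

    // For binary content (PDF, images): keep the decoded bytes, never wrap them in a String.
    static byte[] decodeBytes(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // Only for content known to be text; assumes it was encoded as UTF-8.
    static String decodeText(String base64) {
        return new String(decodeBytes(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] image = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF, 0x00};
        String encoded = Base64.getEncoder().encodeToString(image);

        byte[] viaBytes = decodeBytes(encoded);
        // 0x89 and 0xFF are not valid UTF-8, so the String detour replaces them with U+FFFD.
        byte[] viaString = decodeText(encoded).getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(image, viaBytes));  // true
        System.out.println(Arrays.equals(image, viaString)); // false
    }
}
```

This is why the text files work and the PDF and image files do not: no charset choice turns arbitrary binary data into a String and back reliably, so the bytes should be written to the file directly.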
