Damaged Pdf after setting content from a server response

I am currently making REST calls to a server to sign a PDF document.
I send a PDF (binary content) and receive the binary content of the signed PDF.
I read the binary content from the InputStream like this:
try (InputStream inputStream = conn.getInputStream()) {
    if (inputStream != null) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))) {
            String lines;
            while ((lines = br.readLine()) != null) {
                output += lines;
            }
        }
    }
}
signedPdf.setBinaryContent(output.getBytes());
(signedPdf is a DTO with a byte[] attribute.)
But when I try to set the content of the PDF from the response:
ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(signedPdf);
pdf.setContent(signedPdf);
and then try to open it, the viewer says that the PDF is damaged and cannot be repaired.
Has anyone encountered something similar? Do I need to set the Content-Length as well for the output stream?

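PDF is binary data. Reading it as text corrupts it: a Reader decodes the bytes into Unicode characters (a Java String is always Unicode), and readLine() additionally drops the line terminators.
It is also wasteful: holding each byte as a char can double the memory usage, and
there are two conversions, from bytes to String and back, each using some encoding.
When the bytes are decoded as UTF-8, sequences that are not valid UTF-8 cannot be decoded faithfully, so the original bytes cannot be restored afterwards.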
try (InputStream inputStream = conn.getInputStream()) {
    if (inputStream != null) {
        byte[] content = inputStream.readAllBytes();
        signedPdf.setBinaryContent(content);
    }
}
Whether to wrap the stream in a BufferedInputStream depends, for instance, on the expected PDF size.
Furthermore, new String(byte[], Charset) and String.getBytes(Charset) with an explicit Charset (such as StandardCharsets.UTF_8) are preferable to the overloads without a Charset. Those use the platform's default encoding and hence make the code non-portable: it can behave differently on another platform or computer.
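InputStream.readAllBytes() was only added in Java 9. On older runtimes the same byte-for-byte read can be sketched with a plain copy loop. The class and method names below (BinaryBody, readFully) are illustrative and not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BinaryBody {
    /** Reads the stream to its end and returns the bytes unchanged; no text decoding is involved. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns the number of bytes placed in the buffer, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

With the question's code this would be called as signedPdf.setBinaryContent(BinaryBody.readFully(inputStream)).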

Related

How to unzip file from InputStream

I'm trying to get a zip file from the server.
I'm using HttpURLConnection to get an InputStream, and this is what I have:
myInputStream.toString().getBytes().toString() is equal to [B@4.....
byte[] bytes = Base64.decode(myInputStream.toString(), Base64.DEFAULT);
String string = new String(bytes, "UTF-8");
string == �&ܢ��z�m����y....
I really tried to unzip this file but nothing works. There are also so many questions: should I use GZIPInputStream or ZipInputStream? Do I have to save this stream as a file, or can I work on the InputStream directly?
Please help, my boss is getting impatient :O
I have no idea what is in this file; I have to find out :)

GZIPInputStream and ZipInputStream read two different formats (gzip and zip). https://en.wikipedia.org/wiki/Gzip
It is not a good idea to retrieve a string directly from the stream. From an InputStream, you can create a File and write the data into it using a FileOutputStream.
Base64 decoding is a separate matter. If the stream has already been decoded upstream, that's fine; otherwise you have to wrap your stream in another input stream that decodes the Base64 format.
The best practice is to copy through a fixed-size buffer, so that the whole content is never held in memory at once.
Here is some Kotlin code that decompresses the zipped InputStream into a file (simpler than Java, where handling byte[] is tedious):
val fileBinaryDecompress = File(..path..)
val outputStream = FileOutputStream(fileBinaryDecompress)
val zipStream = ZipInputStream(myInputStream)
zipStream.nextEntry // position the stream at the first entry; before this call, read() returns -1
readFromStream(zipStream, BUFFER_SIZE_BYTES,
    object : ReadBytes {
        override fun read(buffer: ByteArray) {
            outputStream.write(buffer)
        }
    })
outputStream.close()
interface ReadBytes {
    /**
     * Called after each buffer fill
     * @param buffer filled
     */
    @Throws(IOException::class)
    fun read(buffer: ByteArray)
}
@Throws(IOException::class)
fun readFromStream(inputStream: InputStream, bufferSize: Int, readBytes: ReadBytes) {
    val buffer = ByteArray(bufferSize)
    var read = 0
    while (read != -1) {
        read = inputStream.read(buffer, 0, buffer.size)
        if (read != -1) {
            // Hand over only the bytes that were actually read
            val optimizedBuffer: ByteArray = if (buffer.size == read) {
                buffer
            } else {
                buffer.copyOf(read)
            }
            readBytes.read(optimizedBuffer)
        }
    }
}
If you want to get the file from the server without decompressing it, pass myInputStream to readFromStream directly instead of wrapping it in a ZipInputStream.
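For readers working in Java, the same idea can be sketched as follows: wrap the stream in a ZipInputStream, move to the first entry, and copy it through a buffer. The names ZipFirstEntry and readFirstEntry are made up for this illustration, and the sketch assumes the archive's first entry is the one of interest.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipFirstEntry {
    /** Returns the uncompressed bytes of the first entry of a zip archive read from the stream. */
    static byte[] readFirstEntry(InputStream in) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("No zip entry found; the data may not be a zip archive");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            // After getNextEntry(), reads on the ZipInputStream return that entry's data
            while ((n = zis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

For large entries, writing to a FileOutputStream instead of a ByteArrayOutputStream keeps memory use bounded by the buffer size.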

GZIPInputStream and ZipInputStream are not interchangeable: gzip is a single compressed stream, while zip is an archive containing entries, so only the class that matches the data will work.
Next, you need to identify whether the zipped stream was Base64 encoded, or whether some Base64 encoded content was put into a zipped stream. From what you put in your question, it seems to be the latter.
So you should try
ZipInputStream zis = new ZipInputStream( myInputStream );
ZipEntry ze = zis.getNextEntry();
InputStream is = zis; // after getNextEntry(), reading from zis yields the data of that entry
and proceed from there ...

Basically, wrapping the InputStream in a GZIPInputStream should let you read the actual content.
Also, for simplicity, IOUtils from Apache Commons IO makes your life easier.
This works for me:
InputStream is = myInputStream; // the stream obtained from the connection
is = new GZIPInputStream(is);
byte[] bytes = IOUtils.toByteArray(is);
String s = new String(bytes);
System.out.println(s);
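When it is unclear which of the two formats the server sends, the first two bytes can be inspected: gzip data starts with 0x1f 0x8b and zip archives start with the letters "PK". The sketch below is one possible way to do that with the JDK only; CompressionSniffer and open are hypothetical names, not an existing API.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipInputStream;

final class CompressionSniffer {
    /** Peeks at the first two bytes and wraps the stream in the matching decompressor, if any. */
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);           // remember the start so the peeked bytes can be re-read
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        if (b0 == 0x1f && b1 == 0x8b) {
            return new GZIPInputStream(in);
        }
        if (b0 == 'P' && b1 == 'K') {
            // The caller still has to call getNextEntry() before reading
            return new ZipInputStream(in);
        }
        return in;            // neither signature matched: hand back the data unchanged
    }
}
```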

Who is adding "\n" in Base64 encoded image when I write it in a file? Java

WHAT I'M DOING
I need to send, via an HTTPS request, a JsonArray with some data and images as Base64 encoded strings. This works well if the data is stored in memory.
Now I need to avoid loading all the data into memory, so I'm creating a temporary file on the Android device with all the data that I need to send.
To create the file, I'm writing lots of JsonObjects into it. Some of these JsonObjects have a field that represents an image. When I detect one, I get the image path and encode the image with Base64 as a String.
UPDATE:
First of all, I initialize the file and get the BufferedWriter:
File f = new File(privateSincronizePath+File.separator+"upload_"+timefile+".json");
f.createNewFile();
FileWriter fw = new FileWriter(f.getAbsoluteFile(), true);
BufferedWriter bw = new BufferedWriter(fw);
Here is the code that creates the image entry when one exists:
JSONObject jobInf = new JSONObject();
jobInf.put("xx", "INSERT");
jobInf.put("xx", c.getString(c.getColumnIndex("xx")));
jobInf.put("xx", ""+c.getInt(c.getColumnIndex("xx")));
jobInf.put("xx", c.getString(c.getColumnIndex("xx")));
JSONObject jsonObject = new JSONObject(docu.filtre(dades, docu.getXx()));
Iterator<?> keys = jsonObject.keys();
boolean updated = false;
while(keys.hasNext() && !updated){
    String key = (String)keys.next();
    if (key != null && key.equals("path") && key.length() > 0){
        jsonObject.put(key, getBase64EncodeFromImage(jsonObject.getString(key)));
        updated = true;
    }
}
jobInf.put("object", jsonObject);
escriure(bw, ","+jobInf.toString());
Method escriure():
UPDATE: This method is called every time I finish creating a JsonObject. It only appends the JsonObject as a String to the file.
private void escriure(BufferedWriter bw, String s) throws IOException {
    uploadLength += s.length();
    bw.append(s);
}
Finally, when the file is created, I read it and, using an OutputStream, send the data as POST parameters to the server:
this.Https_url = new URL(url+"/sync.php?");
HttpsURLConnection con = (HttpsURLConnection) Https_url.openConnection();
con.setReadTimeout(60000);
con.setConnectTimeout(60000);
con.setRequestMethod("POST");
con.setDoInput(true);
con.setDoOutput(true);
con.setFixedLengthStreamingMode(uploadLength);
OutputStream os = con.getOutputStream();
InputStream inp = new FileInputStream(new File(privateSincronizePath+File.separator+"upload_"+timefile+".json"));
byte[] buffer = new byte[1024];
int len;
while ((len = inp.read(buffer)) != -1) {
    os.write(buffer, 0, len);
}
inp.close();
os.close();
// Establish the connection:
con.connect();
WHAT IS THE PROBLEM
The problem is simple. When I open the image on the server, the file is corrupted and the image is not shown.
WHAT I NOTICED
If I capture the Base64 encoded image string before writing it to the file and upload it to the server, the image is OK. After it has been written to the file, the image appears to be corrupted.
After investigating, I noticed that the Base64 encoded string, once written to the file, contains a "\n" every x characters.
If I delete all of these "\n" line breaks, the server can open the image correctly.
THE QUESTION
Who is inserting these line breaks? How can I write the Base64 encoded String "as is"?
Thanks all for your help in advance!
THE ANSWER
As Schoenobates answered, the solution was to use the NO_WRAP flag.
To add more information: on the server side we use this function to read the Base64 string encoded with the URL_SAFE flag.
The function, taken from a comment by TOM on php.net, is:
<?php
function base64url_decode($base64url)
{
$base64 = strtr($base64url, '-_', '+/');
$plainText = base64_decode($base64);
return ($plainText);
}
?>
Thanks to StackOverflow, Axel and Schoenobates for your time!

That'll be the Android Base64 class - you need to set the flags to remove the newlines:
byte[] image = ...;
Base64.encodeToString(image, Base64.NO_WRAP | Base64.URL_SAFE);
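The wrapping behaviour is easy to reproduce outside Android. The sketch below uses the JDK's java.util.Base64 (Java 8+), which is a different class from android.util.Base64 and is used here only as an analogy: its MIME encoder inserts line breaks, while its basic and URL-safe encoders produce a single line. The class name Base64Wrapping is made up for the example.

```java
import java.util.Base64;

final class Base64Wrapping {
    /** MIME flavour: a "\r\n" is inserted after every 76 characters of output. */
    static String wrapped(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Basic flavour: one continuous line using the '+' and '/' alphabet. */
    static String singleLine(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** URL-safe flavour: one continuous line using '-' and '_' instead of '+' and '/'. */
    static String urlSafeSingleLine(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }
}
```

Whichever alphabet is chosen on the sending side, the receiving side has to decode with the same one, which is what the strtr() call in the PHP function above takes care of.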

inputStream and utf 8 sometimes shows "?" characters

So I've been dealing with this problem for over a month now. I have checked almost every related solution here and on Google, but I couldn't find anything that really solved my case.
My problem is that I'm trying to download the HTML source of a website, but in most cases some of the text shows "?" characters, most likely because the site is in Hebrew.
Here's my code:
public static InputStream openHttpGetConnection(String url)
        throws Exception {
    InputStream inputStream = null;
    HttpClient httpClient = new DefaultHttpClient();
    HttpResponse httpResponse = httpClient.execute(new HttpGet(url));
    inputStream = httpResponse.getEntity().getContent();
    return inputStream;
}
public static String downloadSource(String url) {
    int BUFFER_SIZE = 1024;
    InputStream inputStream = null;
    try {
        inputStream = openHttpGetConnection(url);
    } catch (Exception e) {
        // TODO: handle exception
    }
    int bytesRead;
    String str = "";
    byte[] inpputBuffer = new byte[BUFFER_SIZE];
    try {
        while ((bytesRead = inputStream.read(inpputBuffer)) > 0) {
            String read = new String(inpputBuffer, 0, bytesRead,"UTF-8");
            str +=read;
        }
    } catch (Exception e) {
        // TODO: handle exception
    }
    return str;
}
Thanks.

To read characters from a byte stream with a given encoding, use a Reader. In your case it would be something like:
InputStreamReader isr = new InputStreamReader(inputStream, "UTF-8");
char[] inputBuffer = new char[BUFFER_SIZE];
int charsRead;
while ((charsRead = isr.read(inputBuffer, 0, BUFFER_SIZE)) > 0) {
    String read = new String(inputBuffer, 0, charsRead);
    str += read;
}
You can see that the bytes are read in directly as characters. It is the reader's job to know whether it needs one, two or more bytes to create each character in the buffer. It's basically your approach, but decoding as the bytes are being read in instead of afterwards.

Converting an InputStream to a String entails specifying an encoding, just as you do at new String(inpputBuffer, 0, bytesRead,"UTF-8");.
But your approach has several drawbacks.
How do you know you have to use UTF-8?
When retrieving HTTP content, you generally cannot know in advance which encoding the HTTP response will use. But HTTP provides a mechanism for specifying it: the Content-Type header.
More specifically, the response should have a Content-Type header with a parameter called charset. In the response, it should look something like:
Content-Type: text/html; charset=UTF-8
You should use whatever comes after the charset= part to transform your bytes into chars.
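In HTTP, the Content-Type parameter that carries this information is named charset. A minimal sketch of extracting it from a raw header value, with a caller-supplied fallback, could look like this. ContentTypeCharset and fromHeader are hypothetical names for illustration; real HTTP client libraries usually offer their own way to do this.

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class ContentTypeCharset {
    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    static Charset fromHeader(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```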
Since you seem to be using Apache HttpClient, their documentation states:
You can set the content type header for a request with the addRequestHeader method in each method and retrieve the encoding for the response body with the getResponseCharSet method.
If the response is known to be a String, you can use the getResponseBodyAsString method which will automatically use the encoding specified in the Content-Type header or ISO-8859-1 if no charset is specified.
Alternate way
If there is no Content-Type header, and you know your content is HTML, then you can try to convert it to a String using some encoding (UTF-8 or ISO Latin preferably), look for content matching <meta charset="UTF-8">, and use that as the charset. This should only be a fallback.
Not every byte sequence is convertible to a String
Drawback number two is that you read an arbitrary number of bytes from your stream and try to convert them to a String, which may not be possible.
In practice, UTF-8 encodes some characters across several bytes. For example, "é" is encoded as the two bytes 0xC3 0xA9. So say, for example, that the response consists of two "é" characters. If your first call to read returns:
[c3, a9, c3]
then new String(bytes, offset, length, encoding) decodes the first "é" but cannot decode the trailing byte, because on its own it is not a valid UTF-8 sequence; it is replaced by the replacement character.
Your following read will get what is left:
[a9]
which (whatever it decodes to) is not an "é" character.
Bottom line: you cannot reliably convert even a valid UTF-8 byte sequence to a String using your pattern.
Going forward: since you use HttpClient, use its method for converting the HTTP response to a String.
If you wish to do it yourself, the easy way is to copy your input into a byte array and then convert the byte array. Something along these lines:
ByteArrayOutputStream responseContent = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
while ((n = responseInputStream.read(chunk)) != -1) {
    responseContent.write(chunk, 0, n);
}
byte[] rawResponse = responseContent.toByteArray();
String stringResponse = new String(rawResponse, encoding);
But you could also use a CharsetDecoder if you want a fully streamed implementation (one that does not buffer the whole response in memory), or, as @jas answers, wrap your InputStream in a Reader and concatenate the output (preferably into a StringBuilder, which should be faster when many concatenations occur).
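The difference between decoding each chunk separately and letting a Reader decode across chunk boundaries can be sketched as follows. StreamText and its two methods are illustrative names; the chunk size parameter exists only to make the split multi-byte sequence from the example above reproducible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class StreamText {
    /** Problematic pattern: each chunk is decoded on its own, so a character split across chunks is damaged. */
    static String decodePerChunk(InputStream in, Charset cs, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) > 0) {
            sb.append(new String(buf, 0, n, cs));
        }
        return sb.toString();
    }

    /** Reader-based pattern: the decoder keeps incomplete byte sequences until the remaining bytes arrive. */
    static String decodeWithReader(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = new InputStreamReader(in, cs)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Feeding the four bytes c3 a9 c3 a9 to decodePerChunk with a chunk size of 3 reproduces the damaged result described above, while decodeWithReader returns the two "é" characters intact.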

base64 decoded file is not equal to the original unencoded file

I have a normal PDF file, A.pdf. A third party encodes the file in Base64 and sends it to me through a web service as one long string (I have no control over the third party).
My problem is that when I decode the string with Java's org.apache.commons.codec.binary.Base64 and write the output to a file called B.pdf,
I expect B.pdf to be identical to A.pdf, but B.pdf turns out a little different from A.pdf. As a result, B.pdf is not recognized as a valid PDF by Acrobat.
Does Base64 have different types of encoding/charset mechanisms? Can I detect how the string I received is encoded, so that B.pdf = A.pdf?
EDIT: this is the file I want to decode; after decoding it should open as a PDF:
my encoded file
These are the headers of the files opened in Notepad++:
**A.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0 obj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/Type /Catalog
**B.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0! bj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/]
pe /Catalog
this is how I decode the string
private static void decodeStringToFile(String encodedInputStr,
        String outputFileName) throws IOException {
    BufferedReader in = null;
    BufferedOutputStream out = null;
    try {
        in = new BufferedReader(new StringReader(encodedInputStr));
        out = new BufferedOutputStream(new FileOutputStream(outputFileName));
        decodeStream(in, out);
        out.flush();
    } finally {
        if (in != null)
            in.close();
        if (out != null)
            out.close();
    }
}
private static void decodeStream(BufferedReader in, OutputStream out)
        throws IOException {
    while (true) {
        String s = in.readLine();
        if (s == null)
            break;
        //System.out.println(s);
        byte[] buf = Base64.decodeBase64(s);
        out.write(buf);
    }
}

You are breaking your decoding by working line by line. Base64 decoders simply ignore whitespace, which means that a byte of the original content could very well be split across two lines of Base64 text. You should concatenate all the lines and decode the content in one go.
Prefer using byte[] rather than String when supplying content to the Base64 class methods. A String implies a character set encoding, which may not do what you want.
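Decoding in one go can be sketched with the JDK's java.util.Base64 (Java 8+) instead of Commons Codec: strip all whitespace first, then decode the whole string once. Base64Whole, decodeWhole and decodeToFile are made-up names for this illustration, and the sketch assumes the standard '+' and '/' alphabet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class Base64Whole {
    /** Removes line breaks and other whitespace, then decodes the complete text in a single call. */
    static byte[] decodeWhole(String encoded) {
        String compact = encoded.replaceAll("\\s", "");
        return Base64.getDecoder().decode(compact);
    }

    /** Writes the decoded bytes to the target file without any further conversion. */
    static void decodeToFile(String encoded, Path target) throws IOException {
        Files.write(target, decodeWhole(encoded));
    }
}
```

Because the text is decoded as a whole, it no longer matters where the sender happened to break the lines.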

Corrupt Gzip string due to character encoding

I have some corrupted Gzip log files that I'm trying to restore. The files were transferred to our servers through a Java-backed web page. The files had always been sent as plain text, but we recently started to receive log files that were Gzipped. These Gzipped files appear to be corrupted and cannot be unzipped, and the originals have been deleted. I believe this is caused by the character encoding in the method below.
Is there any way to reverse the process and restore the files to their original zipped format? I have the resulting String's binary data in a database blob.
Thanks for any help you can give!
private String convertStreamToString(InputStream is) throws IOException {
    /*
     * To convert the InputStream to String we use the
     * Reader.read(char[] buffer) method. We iterate until the
     * Reader return -1 which means there's no more data to
     * read. We use the StringWriter class to produce the string.
     */
    if (is != null) {
        Writer writer = new StringWriter();
        char[] buffer = new char[1024];
        try {
            Reader reader = new BufferedReader(
                    new InputStreamReader(is, "UTF-8"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    } else {
        return "";
    }
}

If this is the method that was used to convert the InputStream to a String, then your data is almost certainly lost.
The problem is that UTF-8 has quite a few byte sequences that are simply not legal (i.e. they don't represent any value). These sequences will be replaced with the Unicode replacement character.
That character is the same no matter which invalid byte sequence was decoded. Therefore the specific information in those bytes is lost.

If that's the code you have, you should never have converted the data through a Reader (or, in fact, to a String). Only keeping it as a stream (or byte array) avoids corrupting binary files. Once it has been read into a String, illegal sequences (and there are many in UTF-8) WILL be discarded.
So no, unless you are quite lucky, there is no way to recover the information. You'll have to provide another process that handles the raw stream and stores it as a BLOB, not a CLOB.
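Why the information is gone can be shown with a small round trip: bytes that are not valid UTF-8 all decode to the same replacement character, so different inputs become indistinguishable. The class LossyRoundTrip below is only a demonstration under that assumption and is not a recovery tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyRoundTrip {
    /** Decodes the bytes as UTF-8 text and encodes the text back to bytes, as the question's method effectively did. */
    static byte[] throughUtf8String(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    /** True only if the bytes come back unchanged from the round trip. */
    static boolean survives(byte[] original) {
        return Arrays.equals(original, throughUtf8String(original));
    }
}
```

Plain ASCII text survives the round trip, which is why the problem went unnoticed while the logs were sent as plain text. The gzip signature bytes 0x1f 0x8b already do not survive, because 0x8b on its own is not a valid UTF-8 sequence.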
