Getting a binary HTTP POST parameter in a Java/Tomcat HttpServlet

I have a binary value that is URL-encoded and then POSTed to an HttpServlet. The following code shows how I first attempted to extract this data. It's very simple, except that the result is a String, not bytes.
This seemed to work at first, except that an extra byte appeared three bytes from the end. What I eventually figured out was that my data was being treated as text and converted from one character encoding to UTF-8.
So, other than getting the entire POST body and parsing it myself, how can I extract my data without treating it as a string after the URL encoding is decoded? Have I misunderstood the specs for posted data in general, or is this a Java/Tomcat-specific issue?
protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Receive/Parse the request
    String requestStr = request.getParameter("request");
    // getBytes() without an argument uses the platform default charset
    byte[] rawRequestMsg = requestStr.getBytes();
    // ... process rawRequestMsg ...
}
Here is a snippet of the Python test script I'm using for the request:
urlRequest = urllib.urlencode( {'request': rawRequest} )
connection = urllib.urlopen(self.url, data = urlRequest)
result = connection.readlines()
connection.close()

There are two possible solutions:
ASCII-encode your data before POSTing it. Base64 would be a sensible choice. Decode it in your servlet and you have your original binary again.
Use the form content type multipart/form-data (http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4) to send your binary data as a stream of bytes; your servlet can then read that part as binary through an InputStream (for example request.getPart("request").getInputStream() on Servlet 3.0+ with @MultipartConfig) rather than through getReader(), which decodes the bytes into characters.
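Here is a minimal sketch of the first option, assuming Java 10+ (for the Charset overloads of URLEncoder/URLDecoder). The class and method names are hypothetical. The form field name request comes from the question. The sketch Base64-encodes the bytes, URL-encodes the result as a form value, and then reverses both steps. In a real servlet the container already does the URL-decoding, so the server side reduces to Base64.getDecoder().decode(request.getParameter("request")).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64FormParam {

    // Client side: Base64 turns arbitrary bytes into ASCII (A-Z a-z 0-9 + / =),
    // which is then URL-encoded as an ordinary form field named "request".
    static String encodeForForm(byte[] raw) {
        String ascii = Base64.getEncoder().encodeToString(raw);
        return "request=" + URLEncoder.encode(ascii, StandardCharsets.UTF_8);
    }

    // Server side: URL-decode the value (the servlet container normally does this
    // for request.getParameter), then Base64-decode to recover the exact bytes.
    static byte[] decodeFromForm(String formBody) {
        String value = formBody.substring(formBody.indexOf('=') + 1);
        String ascii = URLDecoder.decode(value, StandardCharsets.UTF_8);
        return Base64.getDecoder().decode(ascii);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, (byte) 0xFF, (byte) 0x80, 0x41, (byte) 0xC3};
        String body = encodeForForm(raw);
        System.out.println(body);                                     // request=AP%2BAQcM%3D
        System.out.println(Arrays.equals(raw, decodeFromForm(body))); // true
    }
}
```

Base64 output can contain +, / and =, so it still has to be URL-encoded. If a + reached the servlet unescaped, it would be decoded as a space, and Base64.getDecoder() would reject the value. Base64.getUrlEncoder() avoids + and / entirely if you would rather not rely on that step.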

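I think this should work (it treats the request as a single-byte encoding, so the conversion to String is completely reversible). Note that setCharacterEncoding() has to be called before the first getParameter() call; otherwise it has no effect: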
String someSingleByteEncoding = "ISO-8859-1";
request.setCharacterEncoding(someSingleByteEncoding);
String requestStr = request.getParameter("request");
byte[] rawRequestMsg = requestStr.getBytes(someSingleByteEncoding);
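The following self-contained check shows why the ISO-8859-1 trick is lossless. It assumes Java 10+, and the class and method names are hypothetical. It simulates the container URL-decoding a percent-encoded value with a given charset, followed by the servlet converting the String back to bytes. With ISO-8859-1, every byte 0x00-0xFF maps to exactly one char and back. Re-encoding the same String as UTF-8 turns each byte in 0x80-0xFF into two bytes. Suppose the container decoded with ISO-8859-1 (the Servlet default when no charset is declared) and the platform default charset was UTF-8. That combination would produce exactly the extra byte described in the question.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Simulates the container URL-decoding a form value with `charset`
    // (what setCharacterEncoding selects), then the servlet calling getBytes(charset).
    static byte[] decodeParam(String percentEncoded, Charset charset) {
        String param = URLDecoder.decode(percentEncoded, charset);
        return param.getBytes(charset);
    }

    // Percent-encodes every byte value 0x00..0xFF. A real client may leave safe
    // ASCII characters unescaped; the decoded result is the same.
    static String allBytesEncoded() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 256; i++) {
            sb.append(String.format("%%%02X", i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String wire = allBytesEncoded();

        // ISO-8859-1 maps each byte to exactly one char and back: lossless.
        System.out.println(Arrays.equals(all, decodeParam(wire, StandardCharsets.ISO_8859_1))); // true

        // Decoding as ISO-8859-1 but re-encoding as UTF-8 (as getBytes() does when the
        // platform default is UTF-8) turns each byte 0x80..0xFF into two bytes.
        String param = URLDecoder.decode(wire, StandardCharsets.ISO_8859_1);
        System.out.println(param.getBytes(StandardCharsets.UTF_8).length); // 384
    }
}
```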

You can do this with a servlet wrapper (HttpServletRequestWrapper): intercept the request and grab the raw request body before it's decoded.
But the best way is probably to send the data as a file upload (multipart/form-data content type).
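If you do want to take the raw body yourself, as the wrapper approach implies, you can do the parsing entirely on bytes. The following is a rough sketch, not a full implementation of the form-encoding rules. RawFormParser and its methods are hypothetical names. In a servlet you would read the body from request.getInputStream() (for example with readAllBytes() on Java 9+) before anything calls getParameter(), because the body can only be consumed once.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawFormParser {

    // Parses an application/x-www-form-urlencoded body into name -> raw bytes.
    // Values are percent-decoded straight to bytes and never pass through a charset.
    static Map<String, byte[]> parse(byte[] body) {
        Map<String, byte[]> params = new LinkedHashMap<>();
        int start = 0;
        while (start < body.length) {
            int amp = indexOf(body, (byte) '&', start, body.length);
            int end = amp < 0 ? body.length : amp;
            int eq = indexOf(body, (byte) '=', start, end);
            int nameEnd = eq < 0 ? end : eq;
            // Field names are assumed to be text; decode them as UTF-8.
            String name = new String(percentDecode(body, start, nameEnd), StandardCharsets.UTF_8);
            byte[] value = eq < 0 ? new byte[0] : percentDecode(body, eq + 1, end);
            if (!name.isEmpty()) {
                params.put(name, value);
            }
            start = end + 1;
        }
        return params;
    }

    // Index of b within data[from, to), or -1 if absent.
    static int indexOf(byte[] data, byte b, int from, int to) {
        for (int i = from; i < to; i++) {
            if (data[i] == b) {
                return i;
            }
        }
        return -1;
    }

    // Decodes '+' (space) and %XX escapes in data[from, to) directly to bytes.
    // A '%' without two following characters is kept literally.
    static byte[] percentDecode(byte[] data, int from, int to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = from; i < to; i++) {
            byte c = data[i];
            if (c == '+') {
                out.write(' ');
            } else if (c == '%' && i + 2 < to) {
                int hi = Character.digit(data[i + 1], 16);
                int lo = Character.digit(data[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "request=%00%FF%80A&x=1".getBytes(StandardCharsets.US_ASCII);
        byte[] value = parse(body).get("request");
        System.out.println(value.length); // 4
    }
}
```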

Related

Headers getting added to file content while retrieving file from APIGatewayProxyRequestEvent in AWS lambda

I am using AWS Lambda to push the file to S3 through Java code.
When I send the file from Postman or from Angular and print its content in my Java function, multipart headers show up in the file content automatically, like:
----------------------------965855468995803568737630
Content-Disposition: form-data; name="test"; filename="test.pdf"
Content-Type: application/pdf
How can I get the file content without the headers from APIGatewayProxyRequestEvent?
This is the code I am using to print the file content:
context.getLogger().log("Input File: "+apiGatewayProxyRequestEvent.getBody());
This is a tricky one for you to solve. The method getBody() gives you the actual request body sent through the APIGatewayProxyRequest. It therefore returns exactly what was sent: the file encoded as form-data, with a Content-Type and a filename. It's up to you to convert the form-data back into an understandable object format if you want to print the content.
If you have a look at this tutorial on Medium, you can see one approach to this. It boils down to processing the data and working with the multipart boundary:
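// Get the uploaded file and decode it from Base64
// (Base64 is org.apache.commons.codec.binary.Base64; the body arrives Base64-encoded
// when binary media types are enabled in API Gateway)
byte[] bI = Base64.decodeBase64(event.getBody().getBytes());
// Get the content-type header and extract the boundary
String contentType = null;
Map<String, String> hps = event.getHeaders();
if (hps != null) {
    // the header name's case depends on the client, so check both spellings
    contentType = hps.getOrDefault("content-type", hps.get("Content-Type"));
}
String[] boundaryArray = contentType.split("=");
// Transform the boundary to a byte array
byte[] boundary = boundaryArray[1].getBytes();
// Log the decoded body for verification purposes
LambdaLogger logger = context.getLogger();
logger.log(new String(bI, "UTF-8") + "\n");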
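That last line logs the decoded body. Note that this is still the complete multipart payload, including the boundary and the part headers. To get just the file content, you have to split the payload on the boundary extracted above. If the file is in a binary format such as PDF, printing it as text might not be very useful anyway. I'd recommend giving that tutorial a full read, as it shows how to iterate through the data stream and create the object.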
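The remaining step is to cut out the bytes between the end of the part headers and the next boundary. Below is a minimal sketch that assumes:

- a single-part upload,
- CRLF line endings, as the multipart format requires, and
- a boundary already taken from the Content-Type header.

MultipartExtract and firstPartContent are hypothetical names. For real uploads with several parts, a tested parser (for example the one in Apache Commons FileUpload) is the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MultipartExtract {

    // Returns the content of the first part of a multipart body: the bytes between
    // the blank line that ends the part's headers and the next boundary delimiter.
    static byte[] firstPartContent(byte[] body, String boundary) {
        byte[] delimiter = ("--" + boundary).getBytes(StandardCharsets.US_ASCII);
        byte[] headersTerminator = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] nextDelimiter = ("\r\n--" + boundary).getBytes(StandardCharsets.US_ASCII);

        int partStart = indexOf(body, delimiter, 0);
        if (partStart < 0) {
            throw new IllegalArgumentException("boundary not found");
        }
        int headersEnd = indexOf(body, headersTerminator, partStart + delimiter.length);
        if (headersEnd < 0) {
            throw new IllegalArgumentException("part headers not terminated");
        }
        int contentStart = headersEnd + headersTerminator.length;
        int contentEnd = indexOf(body, nextDelimiter, contentStart);
        if (contentEnd < 0) {
            throw new IllegalArgumentException("closing boundary not found");
        }
        return Arrays.copyOfRange(body, contentStart, contentEnd);
    }

    // Naive byte-array search; fine for a sketch, O(n*m) in the worst case.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String boundary = "exampleBoundary123"; // hypothetical; normally parsed from Content-Type
        String body = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"test\"; filename=\"test.pdf\"\r\n"
                + "Content-Type: application/pdf\r\n"
                + "\r\n"
                + "%PDF-1.4 fake content"
                + "\r\n--" + boundary + "--\r\n";
        byte[] content = firstPartContent(body.getBytes(StandardCharsets.ISO_8859_1), boundary);
        System.out.println(new String(content, StandardCharsets.ISO_8859_1)); // %PDF-1.4 fake content
    }
}
```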

How to ensure that the JSON string is UTF-8 encoded in Java

I am working on legacy web service client code where JSON data is sent to a web service. Recently we found that for some requests the service returns an HTTP 400 response because of invalid (non-UTF-8) characters in the JSON body.
Below is one example of the data which is causing the issue.
String value = "zu3z5eq tô‰U\f‹Á‹€z";
I am using the org.json.JSONObject.toString() method to generate the JSON string. Can you please let me know how I can ensure that the JSON string is UTF-8 encoded?
I have already tried a few solutions available online, like converting to a byte array and back or using Java charset methods, but they did not work. Either they also convert valid values such as Chinese/Japanese characters, or they don't work at all.
Can you please provide some input on this?
You need to set the character encoding for OutputStreamWriter when you create it:
httpConn.connect();
wr = new OutputStreamWriter(httpConn.getOutputStream(), StandardCharsets.UTF_8);
wr.write(jsonObject.toString());
wr.flush();
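Otherwise the writer uses the platform default charset, which depends on the operating system and locale of the machine you are running on (for example windows-1252 on many Western Windows installations). Since Java 18 (JEP 400) the default is UTF-8, but on older JVMs you should always pass the charset explicitly.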
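Here is a self-contained sketch of the same idea. A ByteArrayOutputStream stands in for httpConn.getOutputStream() so the bytes can be inspected, and the class and method names are hypothetical. Characters such as U+00F4 (ô) and U+20AC (€) come out as multi-byte UTF-8 sequences, identical to String.getBytes(StandardCharsets.UTF_8). It is also worth telling the server what it receives by calling, for example, httpConn.setRequestProperty("Content-Type", "application/json; charset=UTF-8") before connecting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8JsonWriter {

    // Writes JSON text through a Writer that is explicitly UTF-8, as the answer does
    // with httpConn.getOutputStream(); a byte buffer stands in for the connection.
    static byte[] writeUtf8(String json) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer wr = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            wr.write(json);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // close() (via try-with-resources) has flushed the writer into `sink`
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+00F4 and U+20AC encode to 2 and 3 bytes in UTF-8
        String json = "{\"value\":\"t\u00f4\u20ac\"}";
        byte[] bytes = writeUtf8(json);
        System.out.println(bytes.length); // 18
        System.out.println(Arrays.equals(bytes, json.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```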
Alternatively, Base64-encode the value's UTF-8 bytes before sending it, so that only ASCII travels over the wire, and decode it on the receiving end:
String value = "zu3z5eq tô‰U\f‹Á‹€z";
// WHILE SENDING ENCODE THE VALUE
byte[] encodedBytes = Base64.getEncoder().encode(value.getBytes("UTF-8"));
String encodedValue = new String(encodedBytes, "UTF-8");
// TRANSPORT....
// ON RECEIVING END DECODE THE VALUE
byte[] decodedBytes = Base64.getDecoder().decode(encodedValue.getBytes("UTF-8"));
System.out.println( new String(decodedBytes, "UTF-8"));

How to transfer binary pdf data when company won't accept Base64, but insists on JSON API calls

Seriously.
I've been scratching around trying to find the answer to this conundrum for a while.
The request size is too large if the String is encoded, and the company won't take Base64 anyway. They actually want the binary data, but in JSON. Can anyone shed any light on how they think other people might do this? Currently I'm processing it like this:
String addressProof = null;
if (proofRequired)
{
Part filePart = request.getPart("proof_of_address");
addressFileName = getSubmittedFileName(filePart);
InputStream fileContent = filePart.getInputStream();
final byte[] bytes = IOUtils.toByteArray(fileContent);
addressProof = new String(bytes);
//byte[] bytes64 = Base64.encodeBase64(fileBytes);
//addressProof = new String(fileBytes);
fileContent.close();
}
Am I being dim, or is this whole request just a little bit flawed?
Many thanks.
You can send (or receive) it as a hex string. See the Stack Overflow question "How to convert a byte array to a hex string in Java".
Example output would be (if enclosed by a JSON object):
{
"content": "C5192E4190E54F5985ED09C6CD0D4BCC"
}
or just plain hex string: "C5192E4190E54F5985ED09C6CD0D4BCC"
You don't have to write (or read) it all at once. You can open two streams (input and output) and stream the data: from the file to the response output stream, or from the request input stream to a file.
Sorry, but I am not sure whether you want to send the bytes or receive them.
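The sketch below combines both suggestions, hex encoding and streaming, and uses hypothetical names. It hex-encodes an InputStream chunk by chunk straight into a Writer, producing the {"content": "..."} shape shown above without holding the whole PDF in memory. It also includes the reverse conversion. The output format and field name are assumptions about what the receiving API expects. Keep in mind that hex doubles the payload size, more than Base64's roughly 4/3 overhead, so it only helps if the other side specifically accepts hex.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;

public class HexJsonStreamer {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Streams `in` into `out` as {"content":"<HEX>"}: each chunk that is read is
    // hex-encoded and written immediately, so the file is never fully in memory.
    static void writeHexJson(InputStream in, Writer out) throws IOException {
        out.write("{\"content\":\"");
        byte[] buf = new byte[8192];
        char[] hex = new char[buf.length * 2];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                hex[2 * i] = HEX[(buf[i] >> 4) & 0x0F];
                hex[2 * i + 1] = HEX[buf[i] & 0x0F];
            }
            out.write(hex, 0, 2 * n);
        }
        out.write("\"}");
        out.flush();
    }

    // Reverse direction: turns a hex string back into bytes.
    static byte[] fromHex(String s) {
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdfStart = {0x25, 0x50, 0x44, 0x46, (byte) 0xC5, 0x19}; // "%PDF" + two binary bytes
        StringWriter json = new StringWriter();
        writeHexJson(new ByteArrayInputStream(pdfStart), json);
        System.out.println(json); // {"content":"25504446C519"}
    }
}
```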

JAX-RS and character encoding problems

I am using JAX-RS and have a simple POST web service that takes an InputStream containing a MIME message (XML + file).
The MIME message is in UTF-8. The file contained in one body part is an email message (MIME, RFC 822) in ISO-8859-1 encoding, which I'm converting to PDF using Aspose.
When running as a web service, the resulting PDF has incorrect characters (ø, å, etc.). But when I use the exact same input read from a file and call the method with a FileInputStream, the resulting PDF is OK.
Here is the simplified version of the code:
@POST
@Path(value = "/documents/convert/{flag}")
@Produces("text/plain")
public String convertFile(InputStream input, @PathParam("flag") String flag) throws WebApplicationException {
    FileInfo info = convertToPdf(input);
    return info.getResponse();
}
If I run this as a web service, it produces a PDF with incorrectly decoded characters: a "box" appears instead of some characters (such as ø, å, etc.). When I run the same code with the same input by calling
FileInputStream fis = new FileInputStream(file);
convertFile(fis);
the resulting PDF has the correct characters (the web service runs on a server; the file test is done on my local machine).
Could this be incorrect setting of locale on the server?
Do you use an InputStreamReader to read the FileInputStream? If so, did you initialize it with the two-argument constructor, passing Charset.forName("UTF-8") as the second argument (since you mentioned that the incoming stream is already in UTF-8)?
You might need to tell the container that it's UTF-8.
Something like:
@Produces("text/plain; charset=utf-8")
Apparently your local file and your MIME message body are not encoded the same way.
Your post states that the file is encoded in ISO-8859-1.
If you are using an InputStreamReader (as Xavier Coulon suggests), you should pass the expected encoding to it, in this case:
Charset.forName("ISO-8859-1")
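Here is a small sketch of what the explicit charset changes. The class and method names are hypothetical, and since convertToPdf's internals aren't shown, it only covers the decoding step. The same ISO-8859-1 bytes read through an InputStreamReader come out correctly with ISO_8859_1, but turn into U+FFFD replacement characters with UTF-8. U+FFFD is often rendered as a box or question mark, which is consistent with the symptom. A reader created without a charset falls back to the JVM default, which can differ between the server and a local machine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {

    // Reads all text from a stream using an explicit charset instead of the
    // JVM default charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F8, space, U+00E5 encoded as ISO-8859-1: 0xF8 0x20 0xE5
        byte[] latin1 = {(byte) 0xF8, 0x20, (byte) 0xE5};
        String right = readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        String wrong = readAll(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8);
        System.out.println(right.equals("\u00f8 \u00e5")); // true
        System.out.println(wrong.indexOf('\uFFFD') >= 0);  // true: malformed bytes were replaced
    }
}
```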
If this does not help, could you please provide the content of the convertToPdf(InputStream is) method?

Convert and Display the UTF8 Encoded String

I have a JSON response which I want to store in a DB and display in a TextView or EditText. The JSON response is UTF-8 encoded.
The response is something like:
"currencies": [[0,"RUR"," ",1,0],[1,"EUR","â¬",1.44,100],[2,"GBP","£",1.6,100],[3,"JPY","Â¥",0.0125,100],[4,"AUD","$",1.1,100]]}
where â¬, £ and Â¥ are currency symbols. I have to decode them and then display them. These symbols are Unicode symbols (transferred as UTF-8). How can I convert these encoded symbols? Please help.
I tried this, but it doesn't work:
byte[] b = stringSymbol.getBytes("UTF-8"); // â¬,£,Â¥
final String str = new String(b);
You're showing text with non-currency symbols in place of the currency symbols. It's as if you're taking the original text, encoding it as UTF-8, and then decoding it as ISO-8859-1.
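The effect can be reproduced in a few lines (names are hypothetical). The UTF-8 bytes of € (E2 82 AC), read as ISO-8859-1, become â, the invisible control character U+0082 and ¬, which is exactly the â¬ in the question. Likewise, ¥ (C2 A5) becomes Â¥. The fix belongs at the point where the bytes are first decoded, using the charset they were actually written in, rather than patching the resulting String.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Reproduces the bug: UTF-8 bytes decoded as if they were ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // What the reading side should do: decode with the charset the bytes were written in.
    static String decodeCorrectly(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+20AC (euro) is E2 82 AC in UTF-8 -> U+00E2, U+0082 (invisible), U+00AC
        System.out.println(misdecode("\u20ac").equals("\u00e2\u0082\u00ac")); // true
        // U+00A5 (yen) is C2 A5 in UTF-8 -> U+00C2, U+00A5
        System.out.println(misdecode("\u00a5").equals("\u00c2\u00a5"));       // true
        byte[] fromServer = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeCorrectly(fromServer).equals("\u20ac"));     // true
    }
}
```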
It's just text - you shouldn't need to do anything to it afterwards, and you should never see it in this broken format. If you have to convert the text back to bytes and then to a string again, that means you've already lost, basically.
Check the headers on the HTTP response which returns the JSON - I suspect you'll find that it's claiming the data is ISO-8859-1 rather than UTF-8. The actual encoding has to match the encoding that's specified in the headers, otherwise you end up with this sort of effect.
Another possibility is that whatever's returning the JSON is accurately giving you the data that it knows about, and that the data is broken upstream. You should follow the data step by step (assuming you own all the links in the chain) until you can see where you're first encountering this brokenness.
