I send an HTTP HEAD request with URLConnection and get back an unreadable Content-Disposition header value, like below.
Content-Disposition: attachment; filename="৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨ | Motivational Video in Bangla.mp4"
How to resolve this text ৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨ to ৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন
Your issue is that the Bengali text in the header was encoded as UTF-8 but decoded with a single-byte encoding. I couldn't find the exact one, but it seems to be something close to "Windows-1252".
Running the code below gives the following output, which still has issues with some composite characters:
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public static void main(String[] args) throws UnsupportedEncodingException {
var source = "৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨";
var bytes = source.getBytes("Windows-1252");
System.out.println("Expected: " + "৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন");
System.out.println("Actual : " + new String(bytes, StandardCharsets.UTF_8));
}
Expected: ৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন
Actual : ৩টি ধাপে সহজেই আত�ববিশ�বাসী হয়ে উ� �ন
The solution may be to find the right decoder for this encoding of Bengali text so you can convert it to Unicode.
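One decoder worth trying is sketched below. It rests on an assumption I haven't checked against this particular server: as far as I know, the JDK's HttpURLConnection turns each header byte into one char, which is effectively ISO-8859-1. Unlike Windows-1252, ISO-8859-1 maps all 256 byte values. That means bytes such as 0x81 and 0x8D, which occur inside ু and ্ in UTF-8, survive the round trip. The class and method names are hypothetical. The sketch only works on the raw value returned by getHeaderField, not on text that was copied and pasted, because pasting tends to drop those invisible control characters.

```java
import java.nio.charset.StandardCharsets;

public class HeaderFix {
    // Undo "UTF-8 bytes read as ISO-8859-1": turn each char back into its
    // original byte, then decode those bytes as UTF-8.
    static String latin1ToUtf8(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "আত্ববিশ্বাসী" written with escapes so the source file encoding does not matter.
        String expected = "\u0986\u09A4\u09CD\u09AC\u09AC\u09BF\u09B6\u09CD\u09AC\u09BE\u09B8\u09C0";
        // Simulate what the connection hands back for a UTF-8 header value.
        String garbled = new String(expected.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(latin1ToUtf8(garbled).equals(expected)); // prints true
    }
}
```

In the question's setup this would be applied as latin1ToUtf8(connection.getHeaderField("Content-Disposition")).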
Best of luck!
Related
When I try to output a string in Java like this:
System.out.println("Привет");
Console output shows me this result:
Привет
I have a REST API method where I receive a string from an outside request. When I send the exact same Привет string with UTF-8 encoding and try to output it like this:
post("/check", (req, res) -> {
String receivedString = req.body();
System.out.println(receivedString);
return "OK"; // the route handler must return a value
});
It shows this:
������
What do I need to do to turn these question marks into a proper, readable string?
You can try wrapping System.out in a PrintStream that encodes with UTF-8:
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(receivedString);
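Here is a self-contained variant of the same idea; the class and method names are made up. It uses the PrintStream(OutputStream, boolean, Charset) constructor available since Java 10. It also writes into a ByteArrayOutputStream instead of the console, so the encoded bytes can be inspected. This only changes how the text is printed. If receivedString already contains replacement characters, the request body was decoded with the wrong charset before it ever reached println.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Print {
    // Prints text through a UTF-8 PrintStream and returns the bytes it produced.
    static byte[] printUtf8(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true, StandardCharsets.UTF_8);
        out.print(text);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // "Привет" written with escapes so the source file encoding does not matter.
        byte[] bytes = printUtf8("\u041F\u0440\u0438\u0432\u0435\u0442");
        // Each of these Cyrillic letters needs 2 bytes in UTF-8.
        System.out.println(bytes.length); // prints 12
    }
}
```

To print to the console itself, pass System.out instead of the buffer. The terminal also has to be set to display UTF-8.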
I am trying to implement HTTP/1.0 in a project, with a website served by a ServerSocket I've coded. It works fine with character-based files. Image files are a different story. I've specified that they should be returned as the Base64-encoded version of the image, and that doesn't work, even though the right headers are set, such as Content-Type: image/png and Content-Transfer-Encoding: base64 (RFC 2045). Looking at the packets in Chrome's network tool, it seems to treat the response as a document even though it's an image file. I have no idea what to do; I've been stuck on this issue for a couple of DAYS! I've searched all of Stack Overflow and Google, and I am basically stuck.
I posted this question a day or two ago, and it was recommended to use a byte reader (which I've also tried), without luck. Any visual input is greatly appreciated.
I have two relevant methods.
The first one chooses how to read the file depending on whether it's an image or text.
public String readUri(String reqUri) {
returnFile = "";
if (this.fileExists(reqUri)) {
fileType = this.fileType(reqUri); // returns e.g. "image" from "image/png"
if (fileType.equals("text")) {
// bufferedreader ...
} else if (fileType.equals("image")) {
try {
// Files.readAllBytes (java.nio.file) reads the whole file; a single InputStream.read() call may stop early
byte[] bytes = Files.readAllBytes(Paths.get(reqUri));
returnFile = Base64.getEncoder().encodeToString(bytes);
} catch (IOException e) {
e.printStackTrace();
}
}
}
return returnFile;
}
The second one takes the data from the method above. It is called in my GET request controller and sends the data back to the client through the ServerSocket.
StringBuilder response = new StringBuilder();
public String response(
String HTTPVersion, int statusCode, String fileContent, String contentType) {
response.append(
HTTPVersion + " " +
statusCode + " " +
this.getHTTPStatusText(statusCode) + "\n"
);
response.append("Content-transfer-encoding: BASE64");
response.append("Content-Type: " + contentType + "\n");
response.append("content-length: " + fileContent.length() + "\n");
response.append("Date: " + date() + "\n");
response.append("\n");
response.append(fileContent + "\n");
return response.toString();
}
Here is a request/response from Chrome's network tool:
This is how the image is currently loaded with the base64 encoding:
HTTP IS NOT MIME
RFC 2045 is MIME, and although HTTP is similar in some respects to MIME, it is not MIME, and it differs in other respects. In particular it DOES NOT USE Content-Transfer-Encoding. It DOES USE Content-Encoding with a similar meaning. See https://www.rfc-editor.org/rfc/rfc1945#section-10.3 and https://www.rfc-editor.org/rfc/rfc1945#appendix-C.3 et seq.
Also, you are terminating the lines of the response header with only Java \n, which is LF. The standards call for CR LF (Java \r\n) and always have. Some receivers are tolerant, following Postel's dictum, but you shouldn't rely on that. Worse, your code doesn't appear to terminate the CTE line at all, although since Chrome parsed it okay I'm guessing you just posted the wrong code. Also, you should NOT add a line terminator after the body that isn't counted in Content-Length. If you are using original HTTP/1.0, i.e. without keepalive, this won't matter, because there can't be another request and response on the same transport connection.
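Below is a minimal sketch of a response writer that follows these rules; the class and method names are hypothetical. It assumes the image is sent as raw bytes rather than Base64, because as far as I know HTTP has no coding that makes a browser Base64-decode a body. Consequently, Content-Length counts bytes, every header line ends in CR LF, and nothing is written after the body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class HttpResponseWriter {
    private static final String CRLF = "\r\n"; // HTTP header lines end in CR LF

    // Writes the status line and headers as ASCII, a blank line, then the raw body bytes.
    static void writeResponse(OutputStream out, int statusCode, String statusText,
                              String contentType, byte[] body) throws IOException {
        StringBuilder head = new StringBuilder();
        head.append("HTTP/1.0 ").append(statusCode).append(' ').append(statusText).append(CRLF);
        head.append("Content-Type: ").append(contentType).append(CRLF);
        // Content-Length counts bytes of the body, not characters of a String.
        head.append("Content-Length: ").append(body.length).append(CRLF);
        head.append(CRLF); // empty line ends the header section
        out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
        out.write(body); // no Base64 and no line terminator after the body
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; a real server would read the image file's bytes here.
        byte[] body = {(byte) 0x89, 'P', 'N', 'G'};
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeResponse(wire, 200, "OK", "image/png", body);
        System.out.println(wire.size() + " bytes written");
    }
}
```

With a real socket, out would be socket.getOutputStream(). Headers and body are both written to that same byte stream, so there is no String step that could corrupt the image bytes.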
From Angular I send one parameter whose value is Ébénisterie, but when I print the value of that variable in Java I get Ã?bénisterie. How can I convert it back to the original text, Ébénisterie? Which encoding/decoding do I have to apply?
I have tried the following:
new String(readable.getBytes("ISO-8859-15"), "UTF-8");
new String(readable.getBytes("UTF-8"), "ISO-8859-15");
but neither of them works.
String readable ="�bénisterie Distinction";
String test = null;
try {
test = new String(readable.getBytes("ISO-8859-15"), "UTF-8");
System.out.println("test"+test);
} catch (UnsupportedEncodingException e) {
e.printStackTrace(); // don't swallow the exception silently
}
Expected: Ébénisterie
Actual: �bénisterie
After a lot of research I didn't find anything.
So I came up with a workaround: Base64 encoding and decoding. Now AngularJS sends the encoded text, and on the Java side I decode it.
Here is the sample code:
AngularJS
window.btoa("Ébénisterie")
Java
String actualString = new String(Base64.getDecoder().decode("ENCODED STRING"), StandardCharsets.ISO_8859_1); // btoa() encodes each char as one Latin-1 byte
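Here is a small sketch of the Java side; the class name is made up. window.btoa() only accepts characters up to U+00FF and encodes each one as a single byte, which is exactly the ISO-8859-1 mapping. Decoding with ISO-8859-1 therefore gives back the original text. The platform default would not: it is UTF-8 on Java 18 and later. The main method simulates btoa in Java rather than calling it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BtoaDecode {
    // Reverses window.btoa(): Base64 -> bytes -> one char per byte (ISO-8859-1).
    static String decodeBtoa(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Ébénisterie" with Unicode escapes so the source file encoding does not matter.
        String original = "\u00C9b\u00E9nisterie";
        // Simulate btoa(original) on the Java side.
        String encoded = Base64.getEncoder().encodeToString(original.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(encoded);
        System.out.println(decodeBtoa(encoded).equals(original)); // prints true
    }
}
```

For characters outside Latin-1, btoa() throws. In that case the client would have to convert the text to UTF-8 bytes before Base64-encoding, and the server would decode with StandardCharsets.UTF_8 instead.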
I'm trying to send multipart form data with a file using only JavaScript, writing the request myself. My JavaScript code is the following:
var data =
'------------f8n51w2QYCsvNftihodgfJ\n' +
'Content-Disposition: form-data; name="upload-id"\n' +
'\n' +
'uploadedFiles\n' +
'------------f8n51w2QYCsvNftihodgfJ\n' +
'Content-Disposition: form-data; name="file"; filename="doc1.txt"\n' +
'Content-Type: text/plain\n' +
'\n' +
'azerty\n' +
'------------f8n51w2QYCsvNftihodgfJ--\n';
var xhr = new XMLHttpRequest();
xhr.open('POST', '/upload');
xhr.setRequestHeader('Content-Type', 'multipart/form-data; boundary=----------f8n51w2QYCsvNftihodgfJ');
xhr.sendAsBinary(data);
I run this JavaScript in Firefox 18.
I have a servlet on /upload. Here's the code:
protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
RequestContext request_context = new ServletRequestContext(request);
boolean is_multipart = ServletFileUpload.isMultipartContent(request_context);
if (is_multipart) {
FileUpload file_upload = new FileUpload(fileItemFactory);
List<FileItem> file_items = file_upload.parseRequest(request_context); // This line crash
}
}
As the comment says, the line file_upload.parseRequest(request_context); crashes and throws the following exception:
org.apache.commons.fileupload.MultipartStream$MalformedStreamException: Stream ended unexpectedly
at org.apache.commons.fileupload.MultipartStream.readHeaders(MultipartStream.java:539)
at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.findNextItem(FileUploadBase.java:976)
at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:942)
at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
I just don't know why I get this exception. Any ideas?
It seems like MultipartStream can't find the request headers, but if I log the headers, they are all there and they are correct.
My servlet code works with a "normal" form. I tried to log the request body and headers of a normal form, and they are the same (except the boundary, of course).
I also tried replacing the data variable with invalid content. The error is still the same, so there's definitely a problem with my headers, but I don't see what.
I found the solution.
\n IS NOT a valid separator for multipart form. You must use \r\n. Now my code works properly.
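For reference, here is the same body the question builds, assembled in Java with CR LF line endings throughout. It is a sketch: the class and method names are hypothetical, while the boundary, field names and file content are copied from the question. Each delimiter line is "--" followed by the boundary declared in the Content-Type header, and the final delimiter also ends in "--".

```java
import java.nio.charset.StandardCharsets;

public class MultipartBody {
    private static final String CRLF = "\r\n"; // multipart lines must end with CR LF

    // Builds a multipart/form-data body with one text field and one text file.
    static String build(String boundary, String fieldName, String fieldValue,
                        String fileField, String fileName, String fileContent) {
        StringBuilder sb = new StringBuilder();
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"").append(fieldName).append('"').append(CRLF);
        sb.append(CRLF);
        sb.append(fieldValue).append(CRLF);
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"").append(fileField)
          .append("\"; filename=\"").append(fileName).append('"').append(CRLF);
        sb.append("Content-Type: text/plain").append(CRLF);
        sb.append(CRLF);
        sb.append(fileContent).append(CRLF);
        sb.append("--").append(boundary).append("--").append(CRLF); // closing delimiter
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same value as in the question's Content-Type header (10 leading dashes).
        String boundary = "----------f8n51w2QYCsvNftihodgfJ";
        String body = build(boundary, "upload-id", "uploadedFiles", "file", "doc1.txt", "azerty");
        System.out.print(body);
        System.out.println("Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length);
    }
}
```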
I don't understand why you use sendAsBinary. Unless it's absolutely necessary, I wouldn't assemble the payload (the data variable) myself; I'd use FormData instead.
https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/FormData/Using_FormData_Objects
var oMyForm = new FormData();
oMyForm.append("username", "Groucho");
oMyForm.append("accountnum", 123456); // number 123456 is immediately converted to string "123456"
// HTML file input user's choice...
oMyForm.append("userfile", fileInputElement.files[0]);
// JavaScript file-like object...
var oFileBody = '<a id="a"><b id="b">hey!</b></a>'; // the body of the new file...
var oBlob = new Blob([oFileBody], { type: "text/xml"});
oMyForm.append("webmasterfile", oBlob);
var oReq = new XMLHttpRequest();
oReq.open("POST", "http://foo.com/submitform.php");
oReq.send(oMyForm);
Try changing f8n51w2QYCsvNftihodgfJ to f8n51w2QYCsvNftihodgfM.
I've tried running your code with different random boundaries, and it turns out only f8n51w2QYCsvNftihodgfJ\n has the issue. I reckon you can try a different boundary, since it's really just a random string.
I'm not writing a mail application, so I don't have access to all the headers and such. All I have is something like the block at the end of this question. I've tried using the JavaMail API to parse this, using something like
Session s = Session.getDefaultInstance(new Properties());
InputStream is = new ByteArrayInputStream(<< String to parse >>);
MimeMessage message = new MimeMessage(s, is);
Multipart multipart = (Multipart) message.getContent();
But it just tells me that message.getContent() returns a String, not a Multipart or MimeMultipart. Plus, I don't really need all the overhead of the whole JavaMail API; I just need to parse the text into its parts. Here's an example:
This is a multi-part message in MIME format.\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/plain;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\nStuff:\n\n Please read this stuff at the beginning of each week. =\nFeel free to discuss it throughout the week.\n\n\n--=20\n\nMrs. Suzy M. Smith\n555-555-5555\nsuzy#suzy.com\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/html;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\n\n\n\n\n\n\n\n\nStuff:\n =20\nPlease read this stuff at the beginning of each =\nweek. Feel=20\nfree to discuss it throughout the week.\n-- Mrs. Suzy M. Smith555-555-5555suzy#suzy.com\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0--\n\n
First I took your example message and replaced all occurrences of \n with newlines and \t with tabs.
Then I downloaded the JARs from the Mime4J project, a subproject of Apache James, and executed the GUI parsing example org.apache.james.mime4j.samples.tree.MessageTree with the transformed message above as input. And apparently Mime4J was able to parse the message and to extract the HTML message part.
There are a few things wrong with the text you posted.
It is not a valid multipart MIME message. Check out the Wikipedia reference, which, while non-normative, is still correct.
The mime boundary is not defined. From the wikipedia example: Content-Type: multipart/mixed; boundary="frontier" shows that the boundary is "frontier". In your example, "----=_NextPart_000_005D_01CC73D5.3BA43FB0" is the boundary, but that can only be determined by scanning the text (i.e. the mime is malformed). You need to instruct the goofball that is passing you the mime content that you also need to know the mime boundary value, which is not defined in a message header. If you get the entire body of the message you will have enough because the body of the message starts with MIME-Version: 1.0 followed by Content-Type: multipart/mixed; boundary="frontier" where frontier will be replaced with the value of the boundary for the encoded mime.
If the person who is sending the body is a goofball (changed from monkey because monkey is too judgemental - my bad DwB), and will not (more likely does not know how to) send the full body, you can derive the boundary by scanning the text for a line that starts and ends with "--" (i.e. --boundary--). Note that I mentioned a "line". The terminal boundary is actually "--boundary--\n".
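Here is a rough sketch of that workaround, with hypothetical names. It is a simplified variant: it takes the first line that starts with "--" as the delimiter, then splits the text on that delimiter. It assumes the preamble never contains a line starting with "--". It does not parse part headers or decode quoted-printable content; it only separates the raw parts. Lines are re-joined with \n, so each part keeps a trailing newline.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryScan {
    // Infer the boundary from the first line starting with "--", because the
    // Content-Type header that would normally declare it is missing.
    static String inferBoundary(String body) {
        for (String line : body.split("\r?\n")) {
            if (line.startsWith("--") && line.length() > 2) {
                // Strip the trailing "--" if the first match is a closing delimiter.
                return line.endsWith("--") ? line.substring(2, line.length() - 2) : line.substring(2);
            }
        }
        return null;
    }

    // Split the body into the raw parts (headers + content) between delimiter lines.
    static List<String> splitParts(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        StringBuilder current = null; // stays null while skipping the preamble
        for (String line : body.split("\r?\n", -1)) {
            if (line.equals(delimiter) || line.equals(delimiter + "--")) {
                if (current != null) parts.add(current.toString());
                if (line.equals(delimiter + "--")) break; // closing delimiter ends the multipart
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "preamble\n--abc\nContent-Type: text/plain\n\nhello\n--abc--\n";
        String boundary = inferBoundary(body);
        System.out.println(boundary + " -> " + splitParts(body, boundary).size() + " part(s)");
    }
}
```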
Finally, the stuff you posted has 2 parts. The first part appears to define substitutions to take place in the second part. If this is true, the Content-Type: of the first part should probably be something other than "text/plain". Perhaps "companyname/substitution-definition" or something like that. This will allow for multiple (as in future enhancements) substitution formats.
You can create a MimeMultipart from the HTTP request:
javax.mail.internet.MimeMultipart m = new MimeMultipart(new ServletMultipartDataSource(httpRequest));
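import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import javax.activation.DataSource;
import javax.servlet.ServletRequest;

public class ServletMultipartDataSource implements DataSource {
private final String contentType;
private final InputStream inputStream;
public ServletMultipartDataSource(ServletRequest request) throws IOException {
// prepends "\n" to the request body stream
inputStream = new SequenceInputStream(new ByteArrayInputStream("\n".getBytes()), request.getInputStream());
contentType = request.getContentType(); // includes the boundary parameter
}
public InputStream getInputStream() throws IOException {
return inputStream;
}
public OutputStream getOutputStream() throws IOException {
throw new IOException("read-only data source"); // the request body cannot be written to
}
public String getContentType() {
return contentType;
}
public String getName() {
return "ServletMultipartDataSource";
}
}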
To get a submitted form parameter, you need to parse the BodyPart headers (m is the MimeMultipart created above):
public String getStringParameter(String name) throws MessagingException, IOException {
for (int i = 0; i < m.getCount(); i++) {
BodyPart bodyPart = m.getBodyPart(i);
String[] nameHeader = bodyPart.getHeader("Content-Disposition");
Object content = bodyPart.getContent();
if (nameHeader != null && content instanceof String) {
for (String bodyName : nameHeader) {
if (bodyName.contains("name=\"" + name + "\"")) return (String) content;
}
}
}
return null;
}
If you are using javax.servlet.http.HttpServlet to receive the message, you will have to use HttpServletRequest.getHeaders to obtain the value of the HTTP Content-Type header. You will then use org.apache.james.mime4j.stream.MimeConfig.setHeadlessParsing to configure MimeConfig with that value so that it can properly process the MIME message.
It appears that you are using HttpServletRequest.getInputStream to read the contents of the request. The input stream returned only has the content of the message after the HTTP headers (terminated by a blank line). That is why you have to extract content-type from the HTTP headers and feed it to the parser using setHeadlessParsing.