I'm trying to fix a character encoding issue. I realize this is not a good way to go about it, but for now I am just going to bandage it up; when character encoding comes up on a new to-do list I will bring a proper solution.
Anyway, I've currently fixed a character encoding issue with French characters by doing this in the action:
String folderName = request.getParameter(PK_FOLDER_NAME);
if (response.getCharacterEncoding().equals("ISO-8859-1") && folderName != null) {
folderName = URLDecoder.decode(new String(folderName.getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
}
However, what if the parameter is an array? How would I do it?
For example, what if the values are obtained like this:
String[] memos = request.getParameterValues(PK_MEMO);
How would I convert them using the URLDecoder then?
Thanks, guys...
The answer I was looking for was this (which works):
if (response.getCharacterEncoding().equals("ISO-8859-1") && memos != null) {
for(int n=0; n< memos.length; n++) {
memos[n] = URLDecoder.decode(new String(memos[n].getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
}
}
You're going about it completely the wrong way.
You're first obtaining the request parameter (which triggers parsing of the request, so by then it is too late to set the proper encoding for request parameter parsing!) and you're determining the encoding of the response instead of the request. This makes no sense.
Just set the request encoding before ever getting the first parameter. It will then be used during parsing the request parameters for the first time.
request.setCharacterEncoding("UTF-8");
String folderName = request.getParameter(PK_FOLDER_NAME);
String[] memos = request.getParameterValues(PK_MEMO);
// ...
Note that you'd normally like to call request.setCharacterEncoding("UTF-8") in a servlet filter so that you don't need to repeat it over all servlets of your webapp.
The response encoding is normally to be configured on the JSP side by @page pageEncoding on a per-JSP basis or <page-encoding> in web.xml on an application-wide basis.
Don't try to introduce bandages/workarounds, it would only make things worse. Just do it the right way from the beginning on.
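To see why the workaround in the question only appears to work, here is a minimal sketch using plain JDK classes. The class and method names (MojibakeDemo, misdecode, repair) are made up for illustration; the assumption is that the browser sent UTF-8 bytes and the container decoded them as ISO-8859-1 because no request encoding was set before the first getParameter() call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any servlet API.
final class MojibakeDemo {

    /** Decodes UTF-8 bytes with the wrong charset, as happens when no request encoding is set. */
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the mis-decoding; this is the getBytes/new String part of the workaround. */
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "d\u00E9j\u00E0";                    // the text "deja" with accents on e and a
        String garbled = misdecode(sent);
        System.out.println(garbled.length());              // 6: each accented letter became two chars
        System.out.println(repair(garbled).equals(sent));  // true
    }
}
```

The repair step has to be repeated for every parameter and silently corrupts values that were decoded correctly in the first place, which is why setting the request encoding once, before parsing, is the cleaner route.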
See also:
Unicode - How to get the characters right?
In my Rest API it should be possible to retrieve data which is inside a bounding box. Because the bounding box has four coordinates I want to design the GET requests in such way, that they accept the bounding box as JSON. Therefore I need to be able to send and document JSON strings as URL parameter.
The test itself works, but I can not document these requests with Spring RestDocs (1.0.0.RC1). I reproduced the problem with a simpler method. See below:
@Test public void ping_username() throws Exception
{
String query = "name={\"user\":\"Müller\"}";
String encodedQuery = URLEncoder.encode(query, "UTF-8");
mockMvc.perform(get(URI.create("/ping?" + encodedQuery)))
.andExpect(status().isOk())
.andDo(document("ping_username"));
}
When I remove .andDo(document("ping_username")) the test passes.
Stacktrace:
java.lang.IllegalArgumentException: Illegal character in query at index 32: http://localhost:8080/ping?name={"user":"Müller"}
at java.net.URI.create(URI.java:852)
at org.springframework.restdocs.mockmvc.MockMvcOperationRequestFactory.createOperationRequest(MockMvcOperationRequestFactory.java:79)
at org.springframework.restdocs.mockmvc.RestDocumentationResultHandler.handle(RestDocumentationResultHandler.java:93)
at org.springframework.test.web.servlet.MockMvc$1.andDo(MockMvc.java:158)
at application.rest.RestApiTest.ping_username(RestApiTest.java:65)
After I received the suggestion to encode the URL I tried it, but the problem remains.
The String which is used to create the URI in my test is now /ping?name%3D%7B%22user%22%3A%22M%C3%BCller%22%7D.
I checked the class MockMvcOperationRequestFactory which appears in the stacktrace, and in line 79 the following code is executed:
URI.create(getRequestUri(mockRequest)
+ (StringUtils.hasText(queryString) ? "?" + queryString : ""))
The problem here is that an unencoded String is used (in my case http://localhost:8080/ping?name={"user":"Müller"}), so the creation of the URI fails.
Remark:
Andy Wilkinson's answer is the solution to the problem. However, I think that David Sinfield is right and JSON should be avoided in the URL to keep it simple. For my bounding box I will use a comma-separated string, as is done in WMS 1.1: BBOX=x1,y1,x2,y2
You haven't mentioned the version of Spring REST Docs that you're using, but I would guess that the problem is with URIUtil. I can't tell for certain as I can't see where URIUtil is from.
Anyway, using the JDK's URLEncoder works for me with Spring REST Docs 1.0.0.RC1:
String query = "name={\"user\":\"Müller\"}";
String encodedQuery = URLEncoder.encode(query, "UTF-8");
mockMvc.perform(get(URI.create("/baz?" + encodedQuery)))
.andExpect(status().isOk())
.andDo(document("ping_username"));
You can then use URLDecoder.decode on the server side to get the original JSON:
URLDecoder.decode(request.getQueryString(), "UTF-8")
The problem is that URIs have to be encoded as ASCII. And ü is not a valid ASCII character, so it must be escaped in the URL with % escaping.
If you are using Tomcat, you can set URIEncoding="UTF-8" on the Connector element in server.xml so that percent-escaped bytes in request URIs are decoded as UTF-8. With UTF-8 escaping, ü is written as %C3%BC, which is the percent-encoded form of the two UTF-8 bytes 0xC3 0xBC that represent the Unicode code point U+00FC (ü).
Edit: It seems that I have missed the exact point of the error, but it is still the same kind of error. Curly braces are invalid in a URI. Only the following characters are acceptable according to RFC 3986:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%
So these must be escaped too.
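The escaping described above can be sketched with JDK classes only. The class and method names (QueryEscapingDemo, pingUri) are hypothetical, and /ping and the name parameter are taken from the question. Only the parameter value is escaped here, so that = keeps its role as a separator; note that URLEncoder produces form encoding, which writes a space as +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it does not use Spring REST Docs or MockMvc.
final class QueryEscapingDemo {

    /** Percent-escapes a query parameter value as UTF-8. */
    static String escape(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Reverses escape(). */
    static String unescape(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds "/ping?name=<escaped value>"; URI.create accepts it because only ASCII remains. */
    static URI pingUri(String jsonValue) {
        return URI.create("/ping?name=" + escape(jsonValue));
    }

    public static void main(String[] args) {
        String json = "{\"user\":\"M\u00FCller\"}";
        URI uri = pingUri(json);
        System.out.println(uri.getRawQuery()); // name=%7B%22user%22%3A%22M%C3%BCller%22%7D
        String rawValue = uri.getRawQuery().substring("name=".length());
        System.out.println(unescape(rawValue).equals(json)); // true
    }
}
```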
I am trying to submit a form with fields containing special characters, such as €ŠšŽžŒœŸ. As far as I can see from the ISO-8859-15 Wikipedia page, these characters are included in the standard. Even though the encoding for both request and response is set to ISO-8859-15, when I try to display the values (using FreeMarker 2.3.18 in a Java EE environment), the values are ???????. I have set the form's accepted charset to ISO-8859-15 and I have checked (using Firebug) that the form is submitted with content-type text/html;charset=ISO-8859-15, but I can't figure out how to display the correct characters. If I run the following code, the correct hex value is displayed (e.g. Ÿ = be).
What am I missing? Thank you in advance!
System.out.println(Integer.toHexString(myString.charAt(i)));
EDIT:
I am having the following code as I process the request:
PrintStream ps = new PrintStream(System.out, true, "ISO-8859-15");
String firstName = request.getParameter("firstName");
// check for null before
for (int i = 0; i < firstName.length(); i++) {
ps.println(firstName.charAt(i)); // prints "?"
}
BufferedWriter file=new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path), "ISO-8859-15"));
file.write(firstName); // writes "?" to file (checked with notepad++, correct encoding set)
file.close();
According to the hex value, the form data is submitted correctly.
The problem seems to be related to the output. Java replaces a character with ? if it cannot be represented with the charset in use.
You have to use a correct charset when constructing the output stream. What commands do you use for that? I do not know FreeMarker but there will probably be something like
Writer out = new OutputStreamWriter(System.out);
This should be replaced with something resembling this:
Writer out = new OutputStreamWriter(System.out, "iso-8859-15");
By the way, UTF-8 is usually a much better choice for the encoding charset.
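The ? replacement is easy to reproduce without FreeMarker. The following is a small sketch with made-up names (ReplacementDemo, firstByte); it assumes the JDK in use ships the ISO-8859-15 charset, which standard JDK builds do. It encodes the euro sign once with a charset that contains it and once with one that does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class ReplacementDemo {

    /** Returns the first encoded byte of s in the given charset, as an unsigned value. */
    static int firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        String euro = "\u20AC"; // the euro sign

        // ISO-8859-15 has a slot for the euro sign.
        System.out.println(Integer.toHexString(firstByte(euro, latin9)));                      // a4
        // ISO-8859-1 does not, so the encoder substitutes '?' (0x3f).
        System.out.println(Integer.toHexString(firstByte(euro, StandardCharsets.ISO_8859_1))); // 3f
    }
}
```

If the bytes written are already 0x3f, the character was lost at encoding time, and no viewer setting can bring it back.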
I'm processing an MMS and get its text part with:
mmsBodyPart.getContent();
It's simply an Object. Now I need to convert it to a String using UTF-8. I have tried:
String contentText = (String) mmsBodyPart.getContent();
but it doesn't work with some specific characters, and strange chars appear.
I also tried:
String content = new String(contentText.getBytes("UTF-8"), "UTF-8");
Unsurprisingly, that failed too.
How can that be done?
EDIT: The problem was caused by a bad encoding in the file. Nothing was wrong in the code; I just didn't think of that in the first place...
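Strings don't have an encoding in Java. An encoding only matters when converting between bytes and characters, so if you need a particular one, start from the byte[] and specify the encoding when you create the String.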
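A short sketch of that point, with hypothetical names (DecodeBytesDemo, roundTrip, decode): encoding a String to UTF-8 and decoding it straight back changes nothing, so it cannot repair text that was already decoded wrongly. The charset has to be chosen at the moment the raw bytes become a String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class DecodeBytesDemo {

    /** Encodes and immediately decodes with the same charset: a no-op for any valid String. */
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    /** The only place where the charset matters: turning raw bytes into characters. */
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC5, (byte) 0x82}; // UTF-8 bytes of U+0142

        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.length());                 // 1
        System.out.println(wrong.length());                 // 2
        System.out.println(roundTrip(wrong).equals(wrong)); // true: still wrong
    }
}
```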
I'm seeking a better way to extract data from a String that contains a HTTP header. For example, I'd like to get the number 160 from the content length portion of the string: "Content-Length: 160\r\n".
It appears that all the data in the HTTP header is preceded with a name, colon and space, and after the value immediately follows the '\r' and '\n' characters.
At the moment I am doing this:
int contentLengthIndex = serverHeader.lastIndexOf("Content-Length: ");
int contentLastIndex = serverHeader.length()-1;
String contentText = serverHeader.substring(contentLengthIndex + 16, contentLastIndex);
contentText = contentText.replaceAll("(\\r|\\n)", "");
int contentLength = Integer.parseInt(contentText);
But it seems messy, and it is only good for getting the "Content-Length" when it is at the end of the string. Is there a better, more universal solution for extracting values from a String containing an HTTP header, one that can be adjusted to obtain either int values or String values?
I should also mention that the connection needs to be able to return data to the browser after a request, which from my understanding prevents me from reaping the benefits of using HttpURLConnection.
A quick solution will be:
new Scanner(serverHeader).useDelimiter("[^\\d]+").nextInt();
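Another way, if you want to build a Hashtable of the headers, is to split each header line at the first colon only (values such as dates or URLs may contain colons themselves):
String[] fields = serverHeader.trim().split(":", 2);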
String key = fields[0].trim();
String value = fields[1].trim();
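Extending the split idea to a whole header block could look like the sketch below. HeaderParser and parse are hypothetical names, and the code assumes the lines are separated by \r\n and that a repeated header may simply overwrite the earlier value. Header names are matched case-insensitively, since HTTP field names are not case-sensitive.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the JDK's HTTP support.
final class HeaderParser {

    /** Parses "Name: value" lines into a map with case-insensitive keys. */
    static Map<String, String> parse(String rawHeader) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : rawHeader.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // blank line or a status line without a colon
            }
            String name = line.substring(0, colon).trim();
            if (name.indexOf(' ') >= 0) {
                continue; // field names contain no spaces, so this is a request/status line
            }
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: 160\r\n\r\n";
        Map<String, String> headers = parse(raw);
        int contentLength = Integer.parseInt(headers.get("content-length"));
        System.out.println(contentLength); // 160
    }
}
```

String values come straight from the map, and int values only need an Integer.parseInt on top, so the same lookup serves both cases from the question.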
I am not sure why you are doing this manually; there is already an API for this!
Use the class java.net.HttpURLConnection.
Edit: see also the methods URLConnection.getContentLength() and URLConnection.getContentLengthLong().
Have you tried just stripping all non-numeric characters from the string?
serverHeader = serverHeader.replaceAll("[^\\d.]", "");
If you are using the Socket class to read HTTP data, I suggest you use HttpURLConnection instead, as it provides convenient methods for parsing the header fields.
It has a public Map<String, List<String>> getHeaderFields() method which you can use to get all the fields.
If you want a guide to start using HttpURLConnection you can see it here.
I fetch a site's source code in Java and assign it to a String. But when I look at the content of that String, there are ? characters instead of ç, ş, İ, ğ. I hope you can help me.
DataInputStream.readLine is capable of reading latin1-encoded text only. The characters you want are not in latin1 so the page must have some different encoding, such as UTF-8.
Assuming the page is encoded in UTF-8 you can read it if you substitute the part where you declare and initialize the variable in with the following:
Reader in = null;
try {
in = new BufferedReader(new InputStreamReader(u.getInputStream(), "UTF-8"));
If you don't know the page encoding beforehand, you may be able to use the URLConnection.getContentType() method to find out. It returns the value of the HTTP Content-Type header, which may carry a charset parameter. (URLConnection.getContentEncoding() returns the Content-Encoding header, for example gzip, not the character set.) If the content type does not declare a charset, you just have to guess.
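A minimal sketch of pulling the charset out of a Content-Type value such as text/html; charset=UTF-8. ContentTypeCharset and charsetOf are hypothetical names, and the fallback parameter stands for whatever guess you settle on when the header says nothing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading the charset parameter of a Content-Type value.
final class ContentTypeCharset {

    /** Returns the charset named in contentType, or fallback if none is declared or it is unknown. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for a parameter starting with "charset=".
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1);
        System.out.println(cs.name()); // UTF-8
    }
}
```

The resulting Charset can be passed directly to the InputStreamReader constructor in place of the hard-coded "UTF-8".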