I'm working on a legacy system, and I ran into this piece of code that I can't make sense of.
String note = URLDecoder.decode(URLEncoder.encode(
message.replaceAll("\\<.*?\\>", ""),
"UTF-8").replace("%0D%0A", "<br>"), "UTF-8");
What does this do, and why is the string encoded and then decoded again?
FYI: This "message" is appended to an email which is sent.
The replace call swaps every CRLF (carriage return and line feed characters, which appear as %0D%0A once URL-encoded) for a <br> tag.
The replaceAll call, which actually runs first, removes all tags (like <tag>).
UTF-8 is the character encoding used to convert between raw bytes and actual characters. The W3C (World Wide Web Consortium) states that UTF-8 should be used.
From a coding perspective, breaking it down into the following steps may make it easier to understand:
String updatedMessage = message.replaceAll("\\<.*?\\>", "");
System.out.println(updatedMessage);
String encodedMessage = URLEncoder.encode(updatedMessage, "UTF-8");
System.out.println(encodedMessage);
String updatedEncodedMessage = encodedMessage.replace("%0D%0A", "<br>");
System.out.println(updatedEncodedMessage);
String note = URLDecoder.decode(updatedEncodedMessage, "UTF-8");
System.out.println(note);
Only the replaceAll step uses a regular expression; the replace on the encoded string is a plain literal string replacement.
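To see what the round trip buys you, here is a small runnable sketch that compares the original one-liner with a direct replacement on the raw string. The class and method names (NoteDemo, viaEncoding, direct) are made up for illustration, and I'm assuming ordinary, well-formed input text. For such input both versions should give the same result, which suggests the encode/decode detour is only there to make CRLF show up as the literal text %0D%0A.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the legacy code base.
final class NoteDemo {

    // The original approach: strip tags, encode, swap the encoded CRLF, decode.
    static String viaEncoding(String message) {
        try {
            String stripped = message.replaceAll("\\<.*?\\>", "");
            String encoded = URLEncoder.encode(stripped, "UTF-8");
            return URLDecoder.decode(encoded.replace("%0D%0A", "<br>"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // A direct version for ordinary input: no encode/decode round trip.
    static String direct(String message) {
        return message.replaceAll("<.*?>", "").replace("\r\n", "<br>");
    }

    public static void main(String[] args) {
        String message = "<b>Hello</b>,\r\nsee you at 5 + 1 o'clock";
        System.out.println(viaEncoding(message));
        System.out.println(direct(message));
    }
}
```

Both calls in main print the same line, with the tags gone and the line break turned into <br>.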
Related
I am consuming an API which returns a String with special characters, so I replace them with a blank or some other user-readable character.
My code:
String text = response;
if (text != null) {
text = text.replace("Â", "");
//same for other special char
}
The above code works fine on a Windows machine, but on Linux "Â" is converted into "?", and all the other special characters are converted into "?" as well.
I am using Java, UTF-8 in my HTML.
Please let me know of any platform-independent solution. Thanks.
I am consuming a REST API, so I have to maintain UTF-8 encoding while reading the output.
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
I have added StandardCharsets.UTF_8.
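One possible cause (an assumption, since the build setup isn't shown) is that the literal "Â" in the source file is compiled with a different default encoding on each machine. A minimal sketch that sidesteps this reads the stream as UTF-8 explicitly and writes the character as a Unicode escape, so the source file stays pure ASCII. The class and method names here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
final class Utf8ReadDemo {

    // Decode the response bytes as UTF-8 regardless of the platform default.
    static String readUtf8(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Remove U+00C2, written as an escape so the compiler's source
    // encoding cannot change which character is meant.
    static String stripStray(String text) {
        return text == null ? null : text.replace("\u00C2", "");
    }

    public static void main(String[] args) {
        // Simulated response body: "price: " + U+00C2 + pound sign + "5".
        byte[] body = "price: \u00C2\u00A35".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(body));
        System.out.println(stripStray(text).equals("price: \u00A35"));
    }
}
```

Alternatively, keeping the literal character and compiling with an explicit source encoding on every machine should have the same effect.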
I get a string from a 3rd party library, which is not well encoded.
Unfortunately I'm not allowed to change the library or use another one...
So the actual problem is that the 3rd party library's result string encodes characters like "è ò à ù ì ä ö ü, ..." as SHIFT_JIS (Kanji) inside a UTF-8 string, but only if the character is attached to a word and isn't standalone.
For example:
"Ö Just a simple test"
"ÖJust a simple test"
I tried the following without success:
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
UPDATE 1:
That's the content of "resultString".
Note:
The byte array shown is without any modifications (such as getBytes("Shift_JIS")); it's just the resultString as bytes.
Do you have any ideas?
Any help would be greatly appreciated.
Thank you.
Well, very strange:
As
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
didn't work for me, I tried the following:
String value = new String(resultString.getBytes("SHIFT-JIS"), "UTF-8");
Works like a charm.
Maybe it was because of the underscore and lower case character in "Shift_JIS".
I'm processing an MMS and got its text part as:
mmsBodyPart.getContent();
It's simply an Object. Now I need to convert it to a String using UTF-8. I have tried:
String contentText = (String) mmsBodyPart.getContent();
but it doesn't work with special characters, and some strange characters appear.
I also tried:
String content = new String(contentText.getBytes("UTF-8"), "UTF-8");
Unsurprisingly, that failed as well.
How can that be done?
EDIT: The problem was caused by bad encoding in the file. Nothing was wrong with the code; I just didn't think of that in the first place...
Strings don't have an encoding in Java. If you need to control the encoding, start from the byte[] and specify the encoding when you create the String.
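To illustrate that point, here is a small sketch (class and method names are hypothetical) showing that the charset only matters at the moment bytes are turned into a String. Encoding a String to UTF-8 and decoding it straight back changes nothing, while decoding the original bytes with the wrong charset produces the familiar strange characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class DecodeBytesDemo {

    // The encoding is chosen here, when raw bytes become characters.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for a lowercase e with acute accent (U+00E9).
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};

        String good = decode(raw, StandardCharsets.UTF_8);
        String bad = decode(raw, StandardCharsets.ISO_8859_1);

        // Round-tripping an existing String through UTF-8 is a no-op.
        String roundTrip = new String(
                good.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        System.out.println(good.length());          // one character
        System.out.println(bad.length());           // two mojibake characters
        System.out.println(roundTrip.equals(good)); // unchanged
    }
}
```

So if the text is already wrong once it is a String, the fix belongs wherever the bytes were first decoded.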
I want to replace a carriage return followed by quotation marks with just quotation marks. For example, if I have:
Hello World
"Hello World"
I would like the result to be:
Hello World"Hello World"
This is my attempt, where String text is what I have above:
String adjusted = text.replaceAll("[\n][\"], "\"");
However, my IDE does not accept this.
Thanks for the help!
String adjusted = text.replaceAll("(?m)\r?\n\"", "\"");
The (?m) flag enables multi-line mode (it only affects ^ and $, so it is optional here), and \r? matches the CR of a Windows line ending (CR+LF).
You can use replace instead of replaceAll to match literal text rather than a regular expression.
String adjusted = text.replace("\n\"", "\"");
If you want this to use your operating system's line separator, you should use
String adjusted = text.replace(System.lineSeparator()+"\"", "\"");
You should do it in a platform-agnostic way, like:
String newline = System.getProperty("line.separator");
String newStr = str.replace(newline + "\"", "\"");
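For comparison, here is a minimal runnable sketch of the regex variant, assuming the goal is the one from the question (drop a line break only when it is directly followed by a quotation mark). The class and method names are made up. Matching \r?\n handles both Unix and Windows line endings in the text itself, independent of the machine the code runs on.

```java
// Hypothetical demo class for illustration only.
final class NewlineQuoteDemo {

    // Remove a line break (LF or CRLF) that is directly followed by a quote.
    static String joinQuotedLine(String text) {
        return text.replaceAll("\\r?\\n\"", "\"");
    }

    public static void main(String[] args) {
        String unix = "Hello World\n\"Hello World\"";
        String windows = "Hello World\r\n\"Hello World\"";
        System.out.println(joinQuotedLine(unix));
        System.out.println(joinQuotedLine(windows));
    }
}
```

Both lines printed by main read Hello World"Hello World". A line break that is not followed by a quotation mark is left alone.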
I am doing the following:
String url = String.format(WEBSERVICE_WITH_CITYSTATE, cityName, stateName);
String urlUtf8 = new String(url.getBytes(), "UTF8");
Log.d(TAG, "URL: [" + urlUtf8 + "]");
Reader reader = WebService.queryApi(url);
The output that I am looking for is essentially to get the city name with blanks (e.g., "Overland Park") to be formatted as Overland%20Park.
Is this the best way?
Assuming you actually want to encode your string for use in a URL (i.e., "Overland Park" can also be formatted as "Overland+Park"), you want URLEncoder.encode(url, "UTF-8"). Other unsafe characters will be converted to the %xx format you are asking for.
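A minimal sketch of how that could look, encoding each parameter value rather than the finished URL. The template string below is a hypothetical stand-in, since the real WEBSERVICE_WITH_CITYSTATE isn't shown in the question, and the helper names are made up. URLEncoder turns a space into "+"; if the service insists on %20, replacing "+" afterwards is safe because a literal plus sign in the input has already become %2B.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class for illustration only.
final class CityUrlDemo {

    // Assumed template; the real WEBSERVICE_WITH_CITYSTATE is not shown.
    static final String TEMPLATE = "http://example.com/api?city=%s&state=%s";

    // Encode one parameter value, using %20 instead of + for spaces.
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Encode the values first, then drop them into the template.
    static String buildUrl(String city, String state) {
        return String.format(TEMPLATE, encodeParam(city), encodeParam(state));
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("Overland Park", "KS"));
    }
}
```

Encoding the whole URL in one go would also escape the ":" and "/" characters, which is usually not what you want.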
The simple answer is to use URLEncoder.encode(...) as stated by @Recurse. However, if part or all of the URL has already been encoded, then this can lead to double encoding. For example:
http://foo.com/pages/Hello%20There
or
http://foo.com/query?keyword=what%3f
Another concern with URLEncoder.encode(...) is that it doesn't understand that certain characters should be escaped in some contexts and not others. So for example, a '?' in a query parameter should be escaped, but the '?' that marks the start of the "query part" should not be escaped.
I think a safer way to add missing escapes would be the following:
String safeURI = new URI(url).toASCIIString();
However, I haven't tested this ...
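On the untested suggestion: as far as I know, the single-argument URI(String) constructor does not add escapes. It throws URISyntaxException when it meets an illegal character such as a space. The multi-argument constructors, on the other hand, take unencoded components and quote illegal characters for you. Here is a hedged sketch with made-up class and method names. Note that these constructors always quote '%', so they are meant for raw components, not for strings that are already escaped.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class for illustration only.
final class UriEscapeDemo {

    // Build an http URI from raw (unencoded) parts; illegal characters
    // in the path and query are percent-escaped by the constructor.
    static String build(String authority, String path, String query) {
        try {
            return new URI("http", authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if the single-argument constructor accepts the string unchanged.
    static boolean parsesAsIs(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(build("foo.com", "/pages/Hello There", null));
        System.out.println(build("foo.com", "/query", "keyword=Overland Park"));
        System.out.println(parsesAsIs("http://foo.com/pages/Hello There"));
    }
}
```

Because each component is passed separately, the constructor knows that the "?" between path and query is a delimiter and leaves it alone.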