Special character "Â" does not work in Linux, gets converted into "?" - Java

I am consuming an API which returns a String with special characters, so I replace them with a blank or some other user-readable character.
My code:
String text = response;
if (text != null) {
    text = text.replace("Â", "");
    // same for other special characters
}
The above code works fine on a Windows machine, but on Linux "Â" is converted into "?", and all the other special characters are converted into "?" as well.
I am using Java, and UTF-8 in my HTML.
Please let me know of any platform-independent solution. Thanks

I am consuming the REST API, so while reading the output I have to maintain UTF-8 encoding.
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
I have added StandardCharsets.UTF_8.
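One platform-independent option (a sketch, not a full fix) is to spell the character as a Unicode escape in the source, so the match no longer depends on how the .java file itself is encoded on Windows vs. Linux; the sample input string below is made up:

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        // "\u00C2" is the Unicode escape for "Â"; unlike a literal "Â" in the
        // source file, it survives any source-file/platform encoding
        String text = "foo\u00C2bar"; // hypothetical API response
        if (text != null) {
            text = text.replace("\u00C2", "");
        }
        System.out.println(text); // foobar
    }
}
```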

Related

Reading characters with UTF-8 standards from console in java

I want to read some Unicode characters (Farsi characters) from the console.
I have used System.in, but it didn't work. It looks like standard input does not understand the characters I'm typing, so it just returns some mumbo jumbo to my String variable. I am absolutely sure the String variable's encoding is set to "UTF-8". Believe me, I double-checked.
Some pieces of code that I tried.
String t = new String (new Scanner(System.in).nextLine().getBytes() , "UTF-8");
didn't work.
byte b[] = new byte[4];
System.in.read(b);
String st = new String (b , "UTF-16");
System.out.println(st);
I wrote the above code to read just one Farsi character; it didn't work either.
First of all, the console must be in UTF-8 mode.
If using NetBeans, edit the file <NetBeansRoot>/etc/netbeans.conf.
Under netbeans_default_options, add -J-Dfile.encoding=UTF-8.
Once you're sure the console and your project encoding are set to UTF-8, try this:
Scanner console = new Scanner(new InputStreamReader(System.in, "UTF-8"));
while (console.hasNextLine()) {
    System.out.println(console.nextLine());
}
Note: System.in is an InputStream, i.e. a stream of bytes; it passes the bytes from the console through 1-to-1.
To read characters you need a Reader. A Reader takes an InputStream plus an encoding, and produces characters.
If it doesn't help, try another console (e.g. Windows cmd, but first run chcp 65001).
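The byte-to-character step the answer describes can be sketched with an in-memory stream standing in for System.in (the Farsi word below is just example input):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ConsoleReaderDemo {
    public static void main(String[] args) throws IOException {
        // Simulated console input: the raw UTF-8 bytes of a Farsi word
        // (System.in is likewise just a byte stream)
        byte[] consoleBytes = "سلام".getBytes(StandardCharsets.UTF_8);
        // The Reader turns those bytes back into characters using the given encoding
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(consoleBytes), StandardCharsets.UTF_8));
        String line = reader.readLine();
        System.out.println(line.equals("سلام")); // true: bytes decoded back to characters
    }
}
```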

Special characters from UNIX not being read properly by Java

I have a Java application in which a string is read from a file on UNIX. The string is then passed to another application using a URL POST. However, it has problems when there are special characters such as:
~
^
[
]
\
{
}
|
I am constructing the URL using a StringBuilder:
new StringBuilder().append("message=").append(message).toString()
Is there a standard for how these characters should be encoded when going from UNIX to Java? I believe this is the issue here.
Those are characters used in regular expressions, so somewhere you are placing the string in a position where a regex is expected, e.g. in one of:
replaceFirst
replaceAll instead of replace
split
format
printf
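For instance, "[" is a regex metacharacter, so replaceAll throws a PatternSyntaxException on it unless it is quoted, while plain replace treats its arguments literally. A minimal sketch:

```java
import java.util.regex.Pattern;

public class RegexQuoteDemo {
    public static void main(String[] args) {
        String s = "a[b]c";
        // s.replaceAll("[", "") would throw PatternSyntaxException,
        // because "[" opens a character class in a regex
        String quoted = s.replaceAll(Pattern.quote("["), ""); // regex, safely quoted
        String literal = s.replace("[", "");                  // plain literal replacement
        System.out.println(quoted);  // ab]c
        System.out.println(literal); // ab]c
    }
}
```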
Encoding cannot be the error here (these are all plain ASCII characters). However, be aware that FileReader is an old utility class that reads a file with the default platform encoding.
When the file is in a known encoding, say UTF-8, it is better to do:
Path path = file.toPath();
try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    ...
}
To properly read characters from a file in Java you need to specify the character set. E.g. like this (error handling left out for brevity):
String charset = "UTF-8"; // replace with what you are really using in your Unix system
Reader reader = new InputStreamReader(new FileInputStream(file), charset);
// use the reader...
A URL requires that certain characters be encoded. This is nothing to do with Unix, or Java; it is part of the specification for URLs.
In Java, you can encode arbitrary text to make it suitable for URLs via the URLEncoder.encode method:
new StringBuilder()
    .append("message=")
    .append(URLEncoder.encode(message, "UTF-8"))
    .toString()
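As a quick check of what the encoder produces for the characters listed above (a minimal sketch; the message value is made up):

```java
import java.net.URLEncoder;

public class UrlEncodeDemo {
    public static void main(String[] args) throws Exception {
        String message = "a{b}|c"; // hypothetical message containing URL-unsafe characters
        String query = new StringBuilder()
                .append("message=")
                .append(URLEncoder.encode(message, "UTF-8")) // percent-encodes { } |
                .toString();
        System.out.println(query); // message=a%7Bb%7D%7Cc
    }
}
```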

Greek characters display issue Tomcat 7

I am facing an issue displaying Greek characters. The characters should appear as σ μυστικός αυτό? but they appear as ó ìõóôéêüò áõôü?
Some other Greek characters appear fine, but the above text is garbled.
The content is read from an HTML file by a servlet, using the following code:
public String getResponse() {
    StringBuffer sb = new StringBuffer();
    try {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fn), "8859_1"));
        String line = null;
        while ((line = in.readLine()) != null) {
            sb.append(line);
        }
        in.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sb.toString();
}
I am setting the encoding to UTF-8 while sending back the response:
PrintWriter out;
if ((encodings != null) && (encodings.indexOf("gzip") != -1)) {
    OutputStream out1 = response.getOutputStream();
    out = new PrintWriter(new GZIPOutputStream(out1), false);
    response.setHeader("Content-Encoding", "gzip");
} else {
    out = response.getWriter();
}
response.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
out.println(getResponse());
The characters appear fine on my local development machine (which is Windows), but appear garbled when deployed on a CentOS Server. Both machines have JDK7 and Tomcat 7 installed.
I'm 99% sure the problem is your input encoding (when you read the data). You're decoding it as ISO-8859-1 when it's probably ISO-8859-7 instead. This would cause the symptoms you see.
The simplest way to check would be to open the HTML in a hex editor and examine the character encodings directly. If the Greek characters take up one byte each then it's almost definitely ISO-8859-7 (not -1). If they take up 2 bytes each then it's UTF-8.
From what you posted it looks like ISO-8859-7. In that character set, the lower-case sigma σ is 0xF3, while in ISO-8859-1 that same code maps to ó, which matches the data you showed. I'm sure if you mapped all the remaining characters you'd see a 1-to-1 match in the codes. Maybe your Windows system's default codepage is ISO-8859-7?
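The byte-level mismatch described above can be reproduced directly; a minimal sketch decoding the same byte both ways:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GreekDecodeDemo {
    public static void main(String[] args) {
        // 0xF3 is σ in ISO-8859-7 but ó in ISO-8859-1: the exact garbling reported
        byte[] data = { (byte) 0xF3 };
        System.out.println(new String(data, Charset.forName("ISO-8859-7"))); // σ
        System.out.println(new String(data, StandardCharsets.ISO_8859_1));   // ó
    }
}
```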

Why am I getting ?? when I try to read the ä character from a text file in Java?

I am trying to read text from a text file. There are some special characters like å, ä and ö. When I build a String and print it, I get ?? in place of these special characters. I am using the following code:
File fileDir = new File("files/myfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream(fileDir), "UTF8"));
String strLine;
while ((strLine = br.readLine()) != null) {
    System.out.println("strLine: " + strLine);
}
Can anybody tell me what the problem is? I want strLine to show and save å, ä and ö as they appear in the text file. Thanks in advance.
The problem might not be with the file but with the console where you are trying to print. I suggest you follow these steps:
Make sure the file you are reading is encoded in UTF-8.
Make sure the console you are printing to has the proper encoding/charset to display these characters.
Finally, this article Unicode - How to get characters right? is a must read.
Check here for the lists of Java supported encodings
The most common single-byte encoding that includes non-ASCII characters is ISO8859_1; maybe your file is in that encoding, and you should specify it for your FileInputStream.
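If the console encoding turns out to be the culprit, one workaround is to print through a PrintStream with an explicit charset (Java 10+ overload shown) instead of relying on the platform default. A minimal sketch, writing to an in-memory buffer so the round trip is visible:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8ConsoleDemo {
    public static void main(String[] args) {
        // Buffer stands in for the console; in a real program you would wrap System.out
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
        out.print("å ä ö"); // sent through an explicit UTF-8 stream
        String roundTripped = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(roundTripped.equals("å ä ö")); // true: no "?" substitution
    }
}
```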

Display Hindi language in console using Java

StringBuffer contents = new StringBuffer();
BufferedReader input = new BufferedReader(new FileReader("/home/xyz/abc.txt"));
String line = null; // not declared within while loop
while ((line = input.readLine()) != null) {
    contents.append(line);
}
System.out.println(contents.toString());
File abc.txt contains
\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 \u091c\u0928\u0924\u093e \u091c\u094b \u091a\u093e\u0939\u0924\u0940 \u0939\u0948 \u092
I want to display Hindi in the console using Java.
If I simply print it like this:
String str="\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 \u091c\u0928\u0924\u093e \u091c\u094b \u091a\u093e\u0939\u0924\u0940 \u0939\u0948 \u092";
System.out.println(str);
then it works fine, but when I try to read it from a file it doesn't work.
Help me out.
Use Apache Commons Lang.
import org.apache.commons.lang3.StringEscapeUtils;
// open the file as ASCII, read it into a string, then
String escapedStr; // = "\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 ..."
// (to include such a string in a Java program you would have to double each \)
String hindiStr = StringEscapeUtils.unescapeJava( escapedStr );
System.out.println(hindiStr);
(Make sure your console is set up to display Hindi (correct fonts, etc) and the console's encoding matches your Java encoding. The Java code above is just the bare bones.)
You should store the contents of the file as UTF-8 encoded Hindi characters. For instance, in your case it would be अभी समय है जनता जो चाहती है. That is, instead of saving Unicode escapes, save the raw Hindi characters directly. You can then read the file normally.
You just have to make sure that the editor you use saves it using UTF-8 encoding. See Spanish language chars are not displayed properly?
Otherwise, you'll have to make the file a .properties file and read using java.util.Properties as it offers unicode unescaping support inherently.
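Since java.util.Properties understands the same \uXXXX escapes, the unescaping can be sketched without any external library (the key name text below is made up):

```java
import java.io.StringReader;
import java.util.Properties;

public class UnescapeDemo {
    public static void main(String[] args) throws Exception {
        // Properties files use the same \uXXXX escapes as the question's file,
        // and Properties.load() decodes them automatically
        Properties p = new Properties();
        p.load(new StringReader("text=\\u0905\\u092d\\u0940"));
        System.out.println(p.getProperty("text")); // अभी
    }
}
```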
Also read Reading unicode character in java
