Android convert diamond question marks to UTF-8 Arabic string - java

I'm using an API that sends and receives raw bytes.
But I have a problem displaying the Arabic words that come over the API: they show up as diamond question marks "���".
I've tried to convert the string from and to UTF-8.
This example returns plain question marks, without the black diamond ("??? ???"):
String str = new String(originalStr.getBytes("ISO-8859-1"), "UTF-8");
This one returns empty string :
String str = new String(originalStr.getBytes("WINDOWS-1256"), "UTF-8");
And this one also returns an empty string :
String str = new String(originalStr.getBytes("WINDOWS-1252"), "UTF-8");
I've succeeded in displaying the Arabic words in PHP by converting from cp1256 to utf-8:
echo iconv('cp1256', 'utf-8', $string);
So the correct character encoding for the Arabic text is cp1256.
How can I achieve the same in Java?
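A minimal Java sketch of the conversion that the PHP iconv call performs, assuming the raw bytes from the API are still available before anything has turned them into a String. Once undecodable bytes have been replaced by U+FFFD (the diamond question mark), the original data is gone and no later getBytes call can bring it back. The class name Cp1256Decoder and the sample bytes are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Cp1256Decoder {
    // Assumption: the API really sends windows-1256 (cp1256), as the PHP test suggests.
    private static final Charset CP1256 = Charset.forName("windows-1256");

    /** Decodes raw cp1256 bytes into a Java String (UTF-16 internally). */
    static String decode(byte[] rawBytes) {
        return new String(rawBytes, CP1256);
    }

    /** Re-encodes the decoded text as UTF-8 bytes, e.g. for storage or forwarding. */
    static byte[] toUtf8(byte[] rawBytes) {
        return decode(rawBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload: four Arabic letters encoded in cp1256.
        byte[] raw = {(byte) 0xD3, (byte) 0xE1, (byte) 0xC7, (byte) 0xE3};
        String text = decode(raw);
        System.out.println(text.length());      // 4 characters
        System.out.println(toUtf8(raw).length); // 8 bytes, two per Arabic letter
    }
}
```

The decoded String can be handed directly to a TextView or similar; an explicit UTF-8 step is only needed when the text has to leave the program as bytes again.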

Related

Print unicode character in java

Displaying a Unicode character in Java shows a "?" sign. For example, I tried to print "अ". Its Unicode code point is U+0905 and its HTML representation is "&#2309;".
The code below prints "?" instead of the Unicode character.
char aa = '\u0905';
String myString = aa + " result" ;
System.out.println(myString); // displays "? result"
Is there a way to display a Unicode character directly, without using its code point? I.e. "अ" is saved in a file, and that file should now be displayed in a JSP.
Java defines two types of streams, byte and character.
System.out is a PrintStream, which is a byte stream: it converts each 16-bit char to bytes using the platform default encoding, and a character that the encoding cannot represent is written as "?".
To control how Unicode characters are encoded, use a character-based stream with an explicit charset, e.g. a PrintWriter wrapped around an OutputStreamWriter.
PrintWriter supports the print() and println() methods, so you can use them in the same way as you used them with System.out.
PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
char aa = '\u0905';
printWriter.println("aa = " + aa);
Try the UTF-8 character set:
Charset utf8 = Charset.forName("UTF-8");
String charToPrint = "\u0905";
PrintStream printStream = new PrintStream(System.out, true, utf8.name());
printStream.println(charToPrint); // should print your character if the console displays UTF-8
Your myString variable contains the perfectly correct value. The problem must be the output from System.out.println(myString) which has to send some bytes to some output to show the glyphs that you want to see.
System.out is a PrintStream using the "platform default encoding" to convert characters to byte sequences - maybe your platform doesn't support that character. E.g. on my Windows 7 computer in Germany, the default encoding is CP1252, and there's no byte sequence in this encoding that corresponds to your character.
Or maybe the encoding is correct, but the font that creates graphical glyphs from characters simply doesn't have that character.
If you are sending your output to a Windows CMD.EXE window, then maybe both reasons apply.
But be assured, your string is correct, and if you send it to a destination that can handle it (e.g. a Swing JTextField), it'll show up correctly.
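The two failure modes described above can be told apart in code. The following sketch (class and method names are illustrative) checks whether a charset can encode the character at all, and shows which bytes a PrintStream with an explicit encoding writes for it. Whether the glyph then appears still depends on the console and its font.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class ConsoleEncodingCheck {
    /** True if the named charset has a byte sequence for the character. */
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    /** Bytes that a PrintStream with the given encoding writes for the text. */
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        char aa = '\u0905';
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("windows-1252 can encode: " + canEncode("windows-1252", aa));  // false
        System.out.println("UTF-8 can encode: " + canEncode("UTF-8", aa));                // true
        System.out.println("UTF-8 byte count: " + printedBytes("" + aa, "UTF-8").length); // 3
        // With an unsuitable encoding, PrintStream substitutes '?' (byte value 63).
        System.out.println("windows-1252 byte: " + printedBytes("" + aa, "windows-1252")[0]); // 63
    }
}
```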
I ran into the same problem with Eclipse. I solved it by switching the console encoding from ISO-8859-1 to UTF-8. You can do this in the Run/Run Configurations/Common menu.
https://eclipsesource.com/blogs/2013/02/21/pro-tip-unicode-characters-in-the-eclipse-console/
Unicode assigns a unique code point to every character or symbol, and a Java string literal can reference one with a \uXXXX escape.
You can look up code points at https://unicode-table.com/en/
Below is an example of printing a symbol in Java.
package Basics;
/**
 * @author shelc
 */
public class StringUnicode {
    public static void main(String[] args) {
        String var1 = "Cyntia";
        String var2 = new String(" is my daughter!");
        System.out.println(var1 + " \u263A" + var2);
        // printing a heart using Unicode
        System.out.println("Hello World \u2665");
    }
}
Output:
Cyntia ☺ is my daughter!
Hello World ♥

char from ldap not displaying correctly in java

In the Eclipse LDAP Browser plugin, I see an attribute value that contains a non-ASCII character (a lowercase n with a ~ above it). This is UTF-8 bytes C3 B1, or UCS-2 character 00F1, which I've read Java uses for its Strings. But when I print it to the log file with the following code, it shows up as an uppercase A with a ~ above it, followed by a +/- symbol. All three output statements show the same thing.
while(allAttributes.hasNext()) {
LDAPAttribute attribute = (LDAPAttribute)allAttributes.next();
String attributeValue = new String(attribute.getStringValue());
byte[] attByteValue = attribute.getByteValue();
String utf8Str = new String( attByteValue, "UTF-8");
log.debug("attribute.getStringValue="+attribute.getStringValue());
log.debug("attributeValue="+attributeValue);
log.debug("utf8Str="+utf8Str);
boolean isValidUTF8 = Base64.isValidUTF8(attByteValue, true);
if (isValidUTF8) log.debug("string contains all valid UTF8 chars and UCS2 chars");
else log.debug("string contains invalid UTF8 char(s) or invalid UCS2 char(s)");
}
isValidUTF8 returns true, so it seems there are no invalid chars. Any suggestions on how to make it display correctly in the log?
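The output described (an uppercase A with a tilde followed by a plus/minus sign) is what the UTF-8 bytes C3 B1 look like when they are read as ISO-8859-1 or windows-1252. That would suggest, though the question does not confirm it, that the string itself is fine and that the log file is written as UTF-8 but viewed in a single-byte encoding. A small sketch reproducing the effect (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then misreads those bytes as ISO-8859-1. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00F1";                  // n with tilde
        String garbled = misreadUtf8AsLatin1(original);
        // garbled is U+00C3 (A with tilde) followed by U+00B1 (plus/minus sign)
        System.out.println(garbled.length());        // 2
        System.out.println((int) garbled.charAt(0)); // 195
        System.out.println((int) garbled.charAt(1)); // 177
    }
}
```

If that matches what the log shows, the encoding configured on the log appender and the encoding used by the log viewer would be the places to check next.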

java convert String windows-1251 to utf8

Scanner sc = new Scanner(System.in);
System.out.println("Enter text: ");
String text = sc.nextLine();
try {
String result = new String(text.getBytes("windows-1251"), Charset.forName("UTF-8"));
System.out.println(result);
} catch (UnsupportedEncodingException e) {
System.out.println(e);
}
I'm trying to convert text between keyboard layouts, so that what is typed on a Latin layout comes out as if typed on a Cyrillic one. Example: qwerty => йцукен
It doesn't work. Can anyone tell me what I'm doing wrong?
First, text in Java (String, char, Reader, Writer) is internally Unicode, so it can combine all scripts.
This is a major difference from, for instance, C/C++, where there is no such standard.
System.in, however, is an InputStream for historical reasons, so it needs to be told which encoding is in use.
Scanner sc = new Scanner(System.in, "Windows-1251");
The above explicitly sets the conversion for System.in to Cyrillic. Without this optional parameter the default encoding is taken. If that was not changed by the software, it would be the platform encoding. So this might have been correct too.
Now text is correct, containing the Cyrillic from System.in as Unicode.
You would get the UTF-8 bytes as:
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
The old "recoding" of text was wrong; drop that line. In fact, not all Windows-1251 byte sequences are valid UTF-8 multi-byte sequences.
String result = text;
System.out is a PrintStream, a rather old class that prints using the default platform encoding. In most cases you can rely on that default being correct for the console.
System.out.println(result);
To write to a UTF-8 encoded file:
byte[] bytes = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8);
Path path = Paths.get("C:/Temp/test.txt");
Files.writeAllBytes(path, bytes);
Here I have added a Unicode BOM character in front, so Windows Notepad may recognize the encoding as UTF-8. In general one should avoid using a BOM. It is a zero-width no-break space (i.e. invisible) and plays havoc with all kinds of formats and operations: CSV, XML, file concatenation, cut-copy-paste.
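The steps above can be sketched end to end. To keep the example self-contained, System.in is replaced by an in-memory stream holding Windows-1251 bytes; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class Cp1251Input {
    /** Reads one line from the stream, decoding its bytes as Windows-1251. */
    static String readLine(InputStream in) {
        Scanner sc = new Scanner(in, "windows-1251");
        return sc.nextLine();
    }

    public static void main(String[] args) {
        // Stand-in for System.in: six Cyrillic letters in Windows-1251, then a newline.
        byte[] cp1251 = {(byte) 0xEF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2,
                         (byte) 0xE5, (byte) 0xF2, '\n'};
        String text = readLine(new ByteArrayInputStream(cp1251));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length()); // 6 characters
        System.out.println(utf8.length);   // 12 bytes, two per Cyrillic letter
    }
}
```

The String is decoded once, at the input boundary, and encoded once, at the output boundary; nothing in between needs to know about Windows-1251.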
The reason why you have gotten the answer to a different question, and nobody answered yours, is because your title doesn't fit the question. You were not attempting to convert between charsets, but rather between keyboard layouts.
Here you shouldn't worry about character encodings at all: simply read the line, convert it to an array of characters, and go through them, replacing each one via a predefined map.
The code will be something like this:
// Generic type arguments must be the wrapper type Character, not the primitive char.
Map<Character, Character> table = new TreeMap<>();
table.put('q', 'й');
table.put('Q', 'Й');
table.put('w', 'ц');
// .... etc
String text = sc.nextLine();
char[] cArr = text.toCharArray();
for (int i = 0; i < cArr.length; ++i) {
    if (table.containsKey(cArr[i])) {
        cArr[i] = table.get(cArr[i]);
    }
}
text = new String(cArr);
System.out.println(text);
I don't have time to test that code, but you should get the idea of how to do your task.
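A fuller sketch of the same idea, building the table from two parallel strings instead of individual put calls. It covers only the 26 letter keys of the common QWERTY and ЙЦУКЕН layouts; punctuation keys are left out, and the class name is illustrative. The Cyrillic letters are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.HashMap;
import java.util.Map;

final class LayoutConverter {
    private static final String LATIN = "qwertyuiopasdfghjklzxcvbnm";
    // The Cyrillic letters found on the same keys, in the same order.
    private static final String CYRILLIC =
            "\u0439\u0446\u0443\u043a\u0435\u043d\u0433\u0448\u0449\u0437" // q w e r t y u i o p
          + "\u0444\u044b\u0432\u0430\u043f\u0440\u043e\u043b\u0434"       // a s d f g h j k l
          + "\u044f\u0447\u0441\u043c\u0438\u0442\u044c";                  // z x c v b n m

    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < LATIN.length(); i++) {
            char latin = LATIN.charAt(i);
            char cyrillic = CYRILLIC.charAt(i);
            TABLE.put(latin, cyrillic);
            TABLE.put(Character.toUpperCase(latin), Character.toUpperCase(cyrillic));
        }
    }

    /** Replaces each mapped Latin letter; all other characters pass through unchanged. */
    static String toCyrillic(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            Character mapped = TABLE.get(chars[i]);
            if (mapped != null) {
                chars[i] = mapped;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toCyrillic("qwerty 123"));
    }
}
```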

Java Output UTF-8 to Real Characters?

In Java, how can I output UTF-8 to real string?
我们
\u6211\u4eec
String str = new String("\u6211\u4eec");
System.out.println(str); // still outputs \u6211\u4eec, but I expect 我们 as the output
-----
String tmp = request.getParameter("tag");
System.out.println("request:"+tmp);
System.out.println("character set :"+request.getCharacterEncoding());
String tmp1 = new String("\u6211\u4eec");
System.out.println("string equal:"+(tmp.equalsIgnoreCase(tmp1)));
String tag = new String(tmp);
System.out.println(tag);
request:\u6211\u4eec
character set :UTF-8
string equal:false
\u6211\u4eec
From the output, the value from the request is the same as the string value of tmp1, but why does equalsIgnoreCase output false?
Did you try to display just one of them? Like:
String str = new String("\u6211");
System.out.println(str);
I bet there is a problem in how you create that string.
Java Strings are encoded in UTF-16. I do not see any problem in your code; I believe the problem comes from your console, which doesn't show the content of the String correctly.
If you are using Eclipse, change the encoding to UTF-8 here:
Eclipse > Preferences > General > Workspace > Text file encoding
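Another reading of the output, offered as a guess: the line "request:\u6211\u4eec" suggests that the request parameter contains twelve literal characters (a backslash, the letter u, four hex digits, and the same again) rather than two Chinese characters, whereas the compiler turns the Java literal "\u6211\u4eec" into two characters. That would explain why equalsIgnoreCase returns false. If so, the escapes have to be decoded at run time. A minimal sketch, with a hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each literal backslash-u-XXXX sequence with the character it names. */
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fromRequest = "\\u6211\\u4eec";    // 12 characters: backslash, u, 6, 2, ...
        String decoded = unescape(fromRequest);
        System.out.println(fromRequest.length()); // 12
        System.out.println(decoded.length());     // 2
        System.out.println(decoded.equals("\u6211\u4eec")); // true
    }
}
```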

java.text.ParseException: Unparseable number: "ä¢è»ÅÒËÅèÍ"

I have a varchar field in a database (SQL Server 2000) in which I have stored a Thai sentence (in the form of Unicode). I am using a Locale object to convert the Unicode data into a Thai sentence as follows:
NumberFormat thai = NumberFormat.getNumberInstance(new Locale("th", "TH", "TH"));//Line1
String thaiText = ResultSet.getString(i);// Data Fetched From DB//Line2
double number = thai.parse(thaiText).doubleValue();//Line3
String outputString= nf.format(number);//Line4
I am getting the following exception on line 3:
java.text.ParseException: Unparseable number: "ä¢è»ÅÒËÅèÍ"
The problem is not in line 3; i.e. it is not with the way you are parsing the string.
The contents of thaiText has been corrupted due to an earlier problem with encodings. You need to track down where the text is going bad.
The text could be bad before you put it into the database.
The text could be going bad when you put it into the database.
The text could be going bad when you retrieve it from the database.
Figure out which of the above is the case, and that will tell you where you need to fix the problem.
The problem is that the data you are parsing is in the wrong encoding.
You need to find out what the data really is. You can use a character set converter tool, for example this one: http://kanjidict.stc.cx/recode.php, to find out the encoding of "ä¢è»ÅÒËÅèÍ",
then use the following code to set the correct encoding.
String original = "ä¢è»ÅÒËÅèÍ";
String thaiText = new String(original.getBytes(charset1), charset2); // you need to work out charset1 and charset2 here by yourself
Your problem is most likely that the String you are reading from the database is not being decoded correctly. You have indicated this in your comments. You could try reading the raw bytes out and forcing the encoding. This example uses UTF-8:
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(rs.getBytes(i)), "UTF-8");
StringWriter sw = new StringWriter();
char[] cbuf = new char[4096];
int len;
while ((len = isr.read(cbuf, 0, cbuf.length)) != -1) {
    sw.write(cbuf, 0, len);
}
isr.close();
sw.close();
String data = sw.toString();
Check that "data" has the correct information, then decode that into a number (if that makes sense) as you are already doing.
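As an illustration of the charset1/charset2 re-decoding suggested above: if the garbled text is assumed to be TIS-620 (Thai) bytes that were decoded as ISO-8859-1, the pair below turns every character of the sample into a Thai letter. Both charsets are guesses made from the sample string, not something the question states, and the class name is hypothetical. The sample is written with Unicode escapes so that the source does not depend on file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ThaiRecoder {
    // Assumption: the bytes stored in the database are TIS-620.
    private static final Charset THAI = Charset.forName("TIS-620");

    /** Recovers the original bytes via ISO-8859-1 and decodes them as TIS-620. */
    static String recode(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, THAI);
    }

    public static void main(String[] args) {
        // The ten garbled characters from the exception message.
        String garbled = "\u00E4\u00A2\u00E8\u00BB\u00C5\u00D2\u00CB\u00C5\u00E8\u00CD";
        String thai = recode(garbled);
        for (char c : thai.toCharArray()) {
            // Each code point falls in the Thai block, U+0E00 to U+0E7F.
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

This only repairs text that has already been mis-decoded; the cleaner fix remains finding the step where the wrong encoding is applied, as the earlier answers recommend.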
