Convert String to unicode? - java

How do I convert a String to Unicode? Characters are easy: for a char you can just use (int) charVariable. But if I have "C" stored as a String, how can I convert it to its Unicode value?
Actually, I am using String.split() to split a String and then want to check whether the first character is uppercase or lowercase. Integer.parseInt does not work; it throws a NumberFormatException.

You may try this:
import java.nio.charset.Charset;
// decode a byte array as UTF-8 into a String
byte[] bytes = new byte[10];
String str = new String(bytes, Charset.forName("UTF-8"));
System.out.println(str);
For more detail, see this tutorial.
To check the first character, you can use str.charAt(0).
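A minimal sketch of what the question seems to need. Integer.parseInt expects decimal digits, which is why passing "C" throws NumberFormatException; the Unicode value of a String's first character comes from charAt(0) or codePointAt(0) instead. The class and method names below are illustrative, not from the original answer.

```java
public class FirstCharCheck {
    // Unicode code point of the first character of a non-empty string.
    // codePointAt also handles characters outside the BMP (e.g. emoji) as one value.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // True if the string starts with an uppercase letter; false for an empty string.
    static boolean startsWithUpperCase(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.codePointAt(0));
    }

    public static void main(String[] args) {
        for (String part : "Cat dog".split(" ")) {
            System.out.println(part + ": U+" + Integer.toHexString(firstCodePoint(part)).toUpperCase()
                    + ", uppercase=" + startsWithUpperCase(part));
        }
    }
}
```

Running main prints "Cat: U+43, uppercase=true" and "dog: U+64, uppercase=false".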

Related

Converting hex to binary with leading zeros

I am using the following method:
new BigInteger(str, 16).toString(2);
It works really well, but it removes leading zeros. I need the output to be 64 bits if the input string is "3031323334353637",
but it returns 62 characters. I could use a for loop to do that. Is there any other way to do it without a loop?
You can pad with spaces using String.format("%64s") and then replace spaces with zeros. This has the advantage of working for any size of input, not just something in the int range. I'm guessing you're working with arbitrary inputs from your use of BigInteger...
String value = new BigInteger("3031323334353637", 16).toString(2);
System.out.println(String.format("%64s", value).replace(" ", "0"));
Output
0011000000110001001100100011001100110100001101010011011000110111
Explanation: String.format("%64s", value) outputs the earlier String left-padded with spaces to fill 64 characters:
"  11000000110001001100100011001100110100001101010011011000110111"
The leading spaces are then replaced with '0' characters using String.replace(oldString, newString). Since a binary string contains no spaces, only the padding changes:
"0011000000110001001100100011001100110100001101010011011000110111"
The following may be the easiest:
new BigInteger("1" + str,16).toString(2).substring(1)
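Why the one-liner above works, as a small sketch (the class name is illustrative): prefixing the hex digit "1" adds a single 1 bit just above the 4 × n bits of an n-digit input, so BigInteger.toString(2) can no longer drop the input's leading zeros, and substring(1) strips that sentinel bit again. It assumes the input contains only hex digits, with no sign.

```java
import java.math.BigInteger;

public class SentinelPad {
    // "1" + hex has the value 16^n + x with x < 16^n, whose binary form is a 1
    // followed by exactly 4*n bits; dropping the first character leaves those bits.
    static String toPaddedBinary(String hex) {
        return new BigInteger("1" + hex, 16).toString(2).substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toPaddedBinary("3031323334353637")); // 64 characters
        System.out.println(toPaddedBinary("00"));               // 00000000
    }
}
```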
Check out this question.
You can do it using String.format():
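String unpaddedBinary = new BigInteger("a12", 16).toString(2);
// The 0 flag only works with numeric conversions, so "%064s" throws
// FormatFlagsConversionMismatchException; pad with spaces, then swap in zeros.
String paddedBinary = String.format("%64s", unpaddedBinary).replace(' ', '0');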
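The String.format padding can be made independent of a fixed width such as 64. The sketch below assumes the goal is 4 bits per hex digit and that the input holds only hex digits; the class and method names are hypothetical.

```java
import java.math.BigInteger;

public class HexToBinary {
    // Converts a hex string to binary, left-padded with zeros to exactly
    // 4 bits per hex digit (16 hex digits -> 64 bits).
    static String toPaddedBinary(String hex) {
        String bits = new BigInteger(hex, 16).toString(2);
        int width = hex.length() * 4;
        // "%<width>s" right-aligns with leading spaces; binary digits contain no
        // spaces, so replacing every space with '0' only touches the padding.
        return String.format("%" + width + "s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toPaddedBinary("3031323334353637"));
    }
}
```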

Encode/decode two strings into one string, and back to two

This is an interview question. I was thinking of a solution in Java. This question seems very simple; is there a catch here?
I was thinking of the following solution:
string1 + 1*hash(String1) + string2 + 2*hash(String2).
If I concatenate the strings like this, I can easily decode them back into two separate strings.
Am I missing something in the question?
Encode:
String encoded = new JsonArray().add(str1).add(str2).toString();
Decode:
JsonArray arr = JsonArray.readFrom(encoded);
String str1 = arr.get(0).asString();
String str2 = arr.get(1).asString();
Here I use the minimal-json library, but it works much the same with any other JSON library.
Note that it's usually a bad idea to invent a new format for encoding information into a string, since plenty of existing ones (XML, JSON, YAML, etc.) already solve issues such as escaping special characters and error handling.
To encode:
String encoded = ""+str1.length()+"/"+str1+str2;
To decode:
String[] temp = encoded.split("/", 2);
int length1 = Integer.parseInt(temp[0]);
String str1 = temp[1].substring(0, length1);
String str2 = temp[1].substring(length1);
Explanation:
The encoded string is in the form "<number>/<str1><str2>". When you call split(regex, limit) the size of the resulting array will be at most limit, considering only the first matches of regex. Thus even if your strings contain the character / you can be sure that the resulting array will be {"<number>", "<str1><str2>"}.
substring(begin, end) returns a string starting at begin (inclusive) and ending at end (exclusive), so the result has length end - begin. Since you call it with (0, length1), where length1 equals str1.length(), you get exactly str1. The last call returns the substring from length1, which is the index of the first character of str2, to the end of the string (which is the end of str2).
Reference: String javadoc page
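The encode/decode steps above, wrapped into a runnable sketch (class and method names are illustrative). The round trip works even when both strings contain "/", because split with a limit of 2 only cuts at the first "/", which always ends the length prefix.

```java
public class LengthPrefixCodec {
    // Encodes as "<length of s1>/<s1><s2>".
    static String encode(String s1, String s2) {
        return s1.length() + "/" + s1 + s2;
    }

    // Splits only at the first "/" (limit 2), reads the length, then cuts the rest.
    static String[] decode(String encoded) {
        String[] parts = encoded.split("/", 2);
        int length1 = Integer.parseInt(parts[0]);
        String rest = parts[1];
        return new String[] { rest.substring(0, length1), rest.substring(length1) };
    }

    public static void main(String[] args) {
        String encoded = encode("a/b", "/c/");
        System.out.println(encoded); // 3/a/b/c/
        String[] decoded = decode(encoded);
        System.out.println(decoded[0] + " | " + decoded[1]); // a/b | /c/
    }
}
```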
One way is to use the length of the first string. The decoder needs that length, so it has to be stored or sent along with the concatenated string.
// encode
String concat = string1 + string2;
// decode
String str1 = concat.substring( 0, string1.length() );
String str2 = concat.substring( string1.length(), concat.length() );
Another way is to use a delimiter, but the delimiter character must not appear in the strings being joined.
// encode
String concat = "hello" + "`" + "world!";
// decode: limit 2 splits only at the first delimiter and keeps an empty second string
String[] decoded = concat.split("`", 2);
String str1 = decoded[0];
String str2 = decoded[1];
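A sketch of the delimiter approach with the constraint enforced. It assumes rejecting bad input is acceptable (the exception choice is mine, not from the answer). Because decoding splits only at the first backtick, only the first string has to be checked; the second may contain backticks or be empty.

```java
public class DelimiterCodec {
    private static final String DELIMITER = "`"; // not a regex metacharacter

    // Joins the two strings; the first one must not contain the delimiter.
    static String encode(String s1, String s2) {
        if (s1.contains(DELIMITER)) {
            throw new IllegalArgumentException("first string contains the delimiter");
        }
        return s1 + DELIMITER + s2;
    }

    // Limit 2: split only at the first delimiter and keep a trailing empty string.
    static String[] decode(String encoded) {
        return encoded.split(DELIMITER, 2);
    }

    public static void main(String[] args) {
        String[] parts = decode(encode("hello", "world!"));
        System.out.println(parts[0] + " / " + parts[1]); // hello / world!
    }
}
```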

Handling Strings with octal ASCII code (in Java)

I'm having some trouble with a text file that contains strings like these:
Grandchamp-le-Ch\303\242teau
It's the name of a Wikipedia page, by the way. The two octal codes represent "â", I think.
Is there any piece of software that easily converts the string above into
Grandchamp-le-Château
or maybe
Grandchamp-le-Ch%C3%A2teau
I would prefer a Java-based solution, but any other idea is welcome too!
Any advice or hint is very much appreciated!
This is a slightly hacky way to achieve your goal:
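import static java.lang.Integer.parseInt;
import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String name = "Grandchamp-le-Ch\\303\\242teau";
// a backslash followed by three octal digits
final Matcher m = Pattern.compile("\\\\([0-7]{3})").matcher(name);
final StringBuffer out = new StringBuffer();
while (m.find()) m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf((char) parseInt(m.group(1), 8))));
m.appendTail(out);
final String decoded = new String(out.toString().getBytes(ISO_8859_1), UTF_8);
System.out.println(decoded);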
How it works:
the regular expression matches the octal character notation;
the original string is transformed by replacing each such octal notation with a char whose numeric value equals that octal number;
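the new string (now in "mojibake" state) is written out as bytes using ISO-8859-1, the single-byte encoding that maps every char from U+0000 to U+00FF to the byte with the same value (other single-byte encodings such as windows-1252 do not, so not just any one will do);
the bytes are re-read, now assuming they are a UTF-8-encoded string.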
The code will print out
Grandchamp-le-Château
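A variant sketch of the same idea, assuming the file holds literal backslash-plus-three-digit sequences and that each escape is one byte of UTF-8 data. Instead of the ISO-8859-1 round trip, it collects the bytes directly, so non-ASCII characters already present unescaped in the text also survive. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OctalEscapes {
    // A backslash followed by three octal digits in the byte range 000-377.
    private static final Pattern OCTAL = Pattern.compile("\\\\([0-3][0-7]{2})");

    static String decode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = OCTAL.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the ordinary text before this escape as UTF-8 bytes.
            appendUtf8(bytes, text.substring(last, m.start()));
            // The escape itself is one raw byte (octal 303 -> 0xC3).
            bytes.write(Integer.parseInt(m.group(1), 8));
            last = m.end();
        }
        appendUtf8(bytes, text.substring(last));
        // Reinterpret the whole byte sequence as UTF-8.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    private static void appendUtf8(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        System.out.println(decode("Grandchamp-le-Ch\\303\\242teau")); // Grandchamp-le-Château
    }
}
```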
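If the escapes are inside a Java string literal, as here, the compiler has already turned each octal escape such as \303 into a single char, so only the re-decoding step is needed (text read from a file still contains literal backslashes and needs the regex step above):
import java.nio.charset.StandardCharsets;
String myString = "Grandchamp-le-Ch\303\242teau";
// StandardCharsets avoids the checked UnsupportedEncodingException of the String-name overloads
byte[] byteArray = myString.getBytes(StandardCharsets.ISO_8859_1);
String result = new String(byteArray, StandardCharsets.UTF_8);
System.out.println(result);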
This prints:
Grandchamp-le-Château
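For the question's second target form, Grandchamp-le-Ch%C3%A2teau, one option is to percent-encode the decoded string as UTF-8. This sketch assumes Java 10 or later for URLEncoder.encode(String, Charset). Note that URLEncoder performs HTML form encoding, so a space would become "+" rather than "%20". The class name is illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncode {
    // Percent-encodes each non-safe character as the %XX form of its UTF-8 bytes.
    static String percentEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("Grandchamp-le-Ch\u00e2teau")); // Grandchamp-le-Ch%C3%A2teau
    }
}
```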

how to convert string into its "html" ascii code using Java?

e.g.
&#66; is uppercase B.
So if I have a string like "BOY", I want it converted to &#66;&#79;&#89;
I'm hoping there's already a library I can use. I've searched the net but I didn't see it.
thanks
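Those codes are just the numeric Unicode value of each character, wrapped in &# and ;. ((int) ch gives the UTF-16 code unit, which equals the code point for characters in the Basic Multilingual Plane.) You can iterate over each character in the string and do: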
output.append("&#")
.append((int)ch)
.append(";");
Where, output refers to a StringBuilder instance.
You could try writing your own utility:
String input = "BOY";
char[] chars = input.toCharArray();
StringBuilder output = new StringBuilder();
for (char c : chars)
{
output.append("&#").append((int) c).append(";");
}
output content after execution:
&#66;&#79;&#89;
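Casting each char covers the "BOY" case, but a character outside the Basic Multilingual Plane (such as an emoji) is stored as two chars and would come out as two surrogate entities. A sketch that iterates over code points instead (class and method names are illustrative):

```java
public class HtmlNumericEntities {
    // Emits &#<code point>; for every character, treating a surrogate pair
    // as one character rather than two.
    static String toEntities(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> out.append("&#").append(cp).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEntities("BOY")); // &#66;&#79;&#89;
    }
}
```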

Converting to utf-8 in java

I just have the string \u0130smail and I want to convert it to
İsmail and also convert
\u0130 --> İ
\u00E7 --> ç
I tried
String str = "\u0130smail";
System.out.println(str);
and it worked, but whenever I get the string "\u0130smail" from the DB or the internet it doesn't give the correct result.
static String deneme(String string){
String string2 = null;
try {
byte[] utf8 = string.getBytes("UTF-8");
string2 = new String(utf8, "UTF-8");
} catch (UnsupportedEncodingException e) {
}
return string2;
}
didn't work either.
Strings "\u0130smail" and "İsmail" are absolutely the same from the language standpoint. If you mean that you get a string "\\u0130smail" (note that I've escaped the backslash), then you will have to find the pattern of the unicode code points and convert them to normal unicode letters or just print the number, whichever you need. Regular expressions could help you in this case.
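A sketch of the regular-expression approach this answer suggests. It assumes the incoming text contains the six literal characters backslash, u, and four hex digits, and turns each such sequence into the corresponding char. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // A literal backslash, the letter 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a resulting '$' or backslash from being
            // interpreted specially by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0130smail")); // İsmail
    }
}
```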
Converting the existing string to bytes and back again isn't going to help you. You need to look at the exact characters in the string you've got - and work out how you got them.
I suggest you print out the integer value of each character in the string as an integer (ideally in hex) to find out exactly what you've got... then trace it back as far as you can, to work out what's going wrong.
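A small diagnostic sketch of that suggestion (names are illustrative): it lists every char with its index and hex value, which makes it obvious whether a string holds the single character U+0130 or the six characters of a backslash escape.

```java
public class DumpChars {
    // One line per UTF-16 char: index, the char itself, and its value in hex.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(String.format("%d: '%c' U+%04X\n", i, c, (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("\u0130s"));   // a real İ followed by 's'
        System.out.print(describe("\\u0130s"));  // backslash, 'u', digits, 's'
    }
}
```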
