Decoding a String Containing Percentage(%) - java

I am trying to decode a String that Contains (%) percentage, it's throwing an Exception
Exception:URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "%&"
My Code:
public class DecodeCbcMsg {
public static void main(String[] args) throws UnsupportedEncodingException
{
String msg="Hello%%&&$$";
String strTMsg = URLDecoder.decode(msg,"UTF-8");
System.out.println(strTMsg);
}

It doesn't look like your string is encoded correctly...
Maybe you should ensure it is properly encoded first?
For example, the encoded character representation for % is %25...
So please try decoding Hello%25%25%26%26%24%24 instead, and see what you get :)

your msg is not a valid encoded url, so it cannot be decode.
just like you try to decode a invalid base64 encode string.
ps:
from URLDecoder code
case '%':
/*
* Starting with this instance of %, process all
* consecutive substrings of the form %xy. Each
* substring %xy will yield a byte. Convert all
* consecutive bytes obtained this way to whatever
* character(s) they represent in the provided
* encoding.
*/
try {
// (numChars-i)/3 is an upper bound for the number
// of remaining bytes
if (bytes == null)
bytes = new byte[(numChars-i)/3];
int pos = 0;
while ( ((i+2) < numChars) &&
(c=='%')) {
int v = Integer.parseInt(s.substring(i+1,i+3),16);
if (v < 0)
throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
bytes[pos++] = (byte) v;
i+= 3;
if (i < numChars)
c = s.charAt(i);
}
// A trailing, incomplete byte encoding such as
// "%x" will cause an exception to be thrown
if ((i < numChars) && (c=='%'))
throw new IllegalArgumentException(
"URLDecoder: Incomplete trailing escape (%) pattern");
sb.append(new String(bytes, 0, pos, enc));
} catch (NumberFormatException e) {
throw new IllegalArgumentException(
"URLDecoder: Illegal hex characters in escape (%) pattern - "
+ e.getMessage());
}
so it try to parse int of string %&, it will throw exception

in order to decode an URL-encoded string, the string would first really need to be url-encoded. In a properly encoded URL, a % sign will be followed by two hex digits 0-9,A-F and so the URLDecoder considers your %% as illegal. The message is quite clear. Make sure you encode your URL properly. Use URLEncoder first, to encode your msg String.

Related

Java - What is the proper way to convert a UTF-8 String to binary?

I'm using this code to convert a UTF-8 String to binary:
public String toBinary(String str) {
byte[] buf = str.getBytes(StandardCharsets.UTF_8);
StringBuilder result = new StringBuilder();
for (int i = 0; i < buf.length; i++) {
int ch = (int) buf[i];
String binary = Integer.toBinaryString(ch);
result.append(("00000000" + binary).substring(binary.length()));
result.append(' ');
}
return result.toString().trim();
}
Before I was using this code:
private String toBinary2(String str) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
int ch = (int) str.charAt(i);
String binary = Integer.toBinaryString(ch);
if (ch<256)
result.append(("00000000" + binary).substring(binary.length()));
else {
binary = ("0000000000000000" + binary).substring(binary.length());
result.append(binary.substring(0, 8));
result.append(' ');
result.append(binary.substring(8));
}
result.append(' ');
}
return result.toString().trim();
}
These two method can return different results; for example:
toBinary("è") = "11000011 10101000"
toBinary2("è") = "11101000"
I think that because the bytes of è are negative while the corresponding char is not (because char is a 2 byte unsigned integer).
What I want to know is: which of the two approaches is the correct one and why?
Thanks in advance.
Whenever you want to convert text into binary data (or into text representing binary data, as you do here) you have to use some encoding.
Your toBinary uses UTF-8 for that encoding.
Your toBinary2 uses something that's not a standard encoding: it encodes every UTF-16 codepoint * <= 256 in a single byte and all others in 2 bytes. Unfortunately that one is not a useful encoding, since for decoding you'll have to know if a single byte is stand-alone or part of a 2-byte sequence (UTF-8/UTF-16 do that by indicating with the highest-level bits which one it is).
tl;dr toBinary seems correct, toBinary2 will produce output that can't uniquely be decoded back to the original string.
* You might be wondering where the mention of UTF-16 comes from: That's because all String objects in Java are implicitly encoded in UTF-16. So if you use charAt you get UTF-16 codepoints (which just so happen to be equal to the Unicode code number for all characters that fit into the Basic Multilingual Plane).
This code snippet might help.
String s = "Some String";
byte[] bytes = s.getBytes();
StringBuilder binary = new StringBuilder();
for(byte b:bytes){
int val =b;
for(int i=;i<=s.length;i++){
binary.append((val & 128) == 0 ? 0 : 1);
val<<=1;
}
}
System.out.println(" "+s+ "to binary" +binary);

org.apache.commons.codec.DecoderException: Odd number of characters

Sending hex string in url parameter and trying to convert it in to string at server side.
Converting user input string by using following javascript encoding code
function encode(string) {
var number = "";
var length = string.trim().length;
string = string.trim();
for (var i = 0; i < length; i++) {
number += string.charCodeAt(i).toString(16);
}
return number;
}
Now I'm trying to parse hex string 419 for russian character Й in java code as follows
byte[] bytes = "".getBytes();
try {
bytes = Hex.decodeHex(hex.toCharArray());
sb.append(new String(bytes,"UTF-8"));
} catch (DecoderException e) {
e.printStackTrace(); // Here it gives error 'Odd number of characters'
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
but it gives following error
"org.apache.commons.codec.DecoderException: Odd number of characters."
How it can be resolved. As there are many russian character have hex code 3 digit and due to this it is not able to convert it to .toCharArray().
Use Base64 instead
val aes = KeyGenerator.getInstance("AES")
aes.init(128)
val secretKeySpec = aes.generateKey()
val base64 = Base64.encodeToString(secretKeySpec.encoded, 0)
val bytes = Base64.decode(base64, 0)
SecretKeySpec(bytes, 0, bytes.size, "AES") == secretKeySpec
In the case you mentioned Й is U+0419 and most cyrillic characters start with a leading 0. This apparently means that adding a 0 before odd numbered character arrays before converting would help.
Testing the javascript seems that this could be safe only for 1 letter long strings: Ѓ(U+0403) returned 403, Ѕ(U+0405) returned 405, but ЃЅ returned 403405 instead of 04030405 or 4030405, which is even worse, becouse it is even and would not trigger the exception and could decode to something completely different.
This question dealing with padding with leading zeros may help with the javascript part.
Instead of
sb.append(new String(bytes,"UTF-8"));
Try this
sb.append(new String(bytes,"Windows-1251"));

Error when show UTF8 String with build mode in java

I have a hex string (sA) convert from UTF8 string.
When I convert hex string sA to UTF8 string, I can't show it in form UI with build mode (run file .jar) but when I run with run mode or debug mode UTF8 string can show in form UI.
I use netbeans IDE 7.3.1.
My code below:
public String hexToString(String txtInHex) {
byte[] txtInByte = new byte[txtInHex.length() / 2];
int j = 0;
for (int i = 0; i < txtInHex.length(); i += 2) {
txtInByte[j++] = Byte.parseByte(txtInHex.substring(i, i + 2), 16);
}
return new String(txtInByte);
}
private String asHex(byte[] buf) {
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}
There are multiple problems with this code.
The valid range for byte values is -128 to 127, or -80 to 7F in hex, and Byte.parseByte enforces this. If your asHex method has to process a character whose second byte is greater than 127 it will produce a string that can't be decoded by toHexString.
The asHex method processes only the second byte of the input characters, so it will work correctly only for the first 256 Unicode characters and produce bogus output for the rest of them.
The toHexString method decodes a string from a byte array assuming some platform-specific default encoding, which will give incorrect results if the data was supposedly encoded in UTF-8 and the default encoding is something else.
Why are you trying to create your own methods for encoding and decoding hex strings instead of using a well known and tested library?
new String(txtInByte, "UTF-8");
Without the encoding the platform encoding is taken, for instance Windows-1252. The same holds for its inverse: String.getBytes-
String s = "....";
byte[] b = s.getBytes("UTF-8");

Extract hexadecimal values from a percent encoded URL

Let's say for example i have URL containing the following percent encoded character : %80
It is obviously not an ascii character.
How would it be possible to convert this value to the corresponding hex string in Java.
i tried the following with no luck.Result should be 80.
public static void main(String[] args) {
System.out.print(byteArrayToHexString(URLDecoder.decode("%80","UTF-8").getBytes()));
}
public static String byteArrayToHexString(byte[] bytes)
{
StringBuffer buffer = new StringBuffer();
for(int i=0; i<bytes.length; i++)
{
if(((int)bytes[i] & 0xff) < 0x10)
buffer.append("0");
buffer.append(Long.toString((int) bytes[i] & 0xff, 16));
}
return buffer.toString();
}
The best way to deal with this is to parse the url using either java.net.URL or java.net.URI, and then use the relevant getters to extract the components that you require. These will take care of decoding any %-encoded portions in the appropriate fashion.
The problem with your current idea is that %80 does not represent "80", or 80. Rather it represents a byte that further needs to be interpreted in the context of the character encoding of the URL. And if the encoding is UTF-8, then the %80 needs to be followed by one or two more %-encoded bytes ... otherwise this is a malformed UTF-8 character representation.
I don't really see what you are trying. However, I'll give it a try.
When you have got this String: "%80" and you want to got the string "80", you can use this:
String str = "%80";
String hex = str.substring(1); // Cut off the '%'
If you are trying to extract the value 0x80 (which is 128 in decimal) out of it:
String str = "%80";
String hex = str.substring(1); // Cut off the '%'
int value = Integer.parseInt(hex, 16);
If you are trying to convert an int to its hexadecimal representation use this:
String hexRepresenation = Integer.toString(value, 16);

Check if a String is valid UTF-8 encoded in Java

How can I check if a string is in valid UTF-8 format?
Only byte data can be checked. If you constructed a String then its already in UTF-16 internally.
Also only byte arrays can be UTF-8 encoded.
Here is a common case of UTF-8 conversions.
String myString = "\u0048\u0065\u006C\u006C\u006F World";
System.out.println(myString);
byte[] myBytes = null;
try
{
myBytes = myString.getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
e.printStackTrace();
System.exit(-1);
}
for (int i=0; i < myBytes.length; i++) {
System.out.println(myBytes[i]);
}
If you don't know the encoding of your byte array, juniversalchardet is a library to help you detect it.
The following post is taken from the official Java tutorials available at: https://docs.oracle.com/javase/tutorial/i18n/text/string.html.
The StringConverter program starts by creating a String containing
Unicode characters:
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
When printed, the String named original appears as:
AêñüC
To convert the String object to UTF-8, invoke the getBytes method and
specify the appropriate encoding identifier as a parameter. The
getBytes method returns an array of bytes in UTF-8 format. To create a
String object from an array of non-Unicode bytes, invoke the String
constructor with the encoding parameter. The code that makes these
calls is enclosed in a try block, in case the specified encoding is
unsupported:
try {
byte[] utf8Bytes = original.getBytes("UTF8");
byte[] defaultBytes = original.getBytes();
String roundTrip = new String(utf8Bytes, "UTF8");
System.out.println("roundTrip = " + roundTrip);
System.out.println();
printBytes(utf8Bytes, "utf8Bytes");
System.out.println();
printBytes(defaultBytes, "defaultBytes");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
The StringConverter program prints out the values in the utf8Bytes and
defaultBytes arrays to demonstrate an important point: The length of
the converted text might not be the same as the length of the source
text. Some Unicode characters translate into single bytes, others into
pairs or triplets of bytes.
The printBytes method displays the byte arrays by invoking the byteToHex method, which is defined in the source file,
UnicodeFormatter.java. Here is the printBytes method:
public static void printBytes(byte[] array, String name) {
for (int k = 0; k < array.length; k++) {
System.out.println(name + "[" + k + "] = " + "0x" +
UnicodeFormatter.byteToHex(array[k]));
}
}
The output of the printBytes method follows. Note that only the first
and last bytes, the A and C characters, are the same in both arrays:
utf8Bytes[0] = 0x41
utf8Bytes[1] = 0xc3
utf8Bytes[2] = 0xaa
utf8Bytes[3] = 0xc3
utf8Bytes[4] = 0xb1
utf8Bytes[5] = 0xc3
utf8Bytes[6] = 0xbc
utf8Bytes[7] = 0x43
defaultBytes[0] = 0x41
defaultBytes[1] = 0xea
defaultBytes[2] = 0xf1
defaultBytes[3] = 0xfc
defaultBytes[4] = 0x43

Categories

Resources