Java %u20AC conversion to euro €

Java %u20AC conversion to euro € - java

how can I convert a string like:
URLDecoder.decode("promo desc %u20AC", "UTF-16");
into "promo desc €" ?
In fact the method above doesn't work because % indicates a hex string whilst u20AC is not a valid hex string.
The string to decode is generated by a Javascript like this:
var string = escape("{€ć") ---> "%7B%u20AC%u0107"
I didn't want to use URLDecoder because, semantically, it's not a URL I'm trying to decode but a very long text. In java % indicates a hex string and %u is illegal. I think that converting % to \ is a bit naive, there may be sequences of % in the text.
What I am after is this function here:
unescape("%7B%u20AC%u0107")
that exists in Javascript but not in Java to my knowledge. How can I achieve this in Java?
Thanks

I was curious, because I've not seen the %u escapes before, but it turns out unescaping them is fairly easy:
private static final Pattern JAVASCRIPT_ESCAPE_SEQUENCE= Pattern.compile("%(u[0-9a-fA-F]{4}|[0-9a-fA-F]{2})");
/**
* Unescape a JavaScript-escaped string.
* Undoes the effect of calling the <a href="https://developer.mozilla.org/de/docs/Web/JavaScript/Reference/Global_Objects/escape">
* the JavaScript escape method</a>.
*/
static String unescape(String input) {
Matcher matcher = JAVASCRIPT_ESCAPE_SEQUENCE.matcher(input);
StringBuilder sb = new StringBuilder(input.length());
while(matcher.find()) {
String escapeSequence = matcher.group(1);
if (escapeSequence.startsWith("u")) {
escapeSequence = escapeSequence.substring(1);
}
char c = (char) Integer.parseInt(escapeSequence, 16);
matcher.appendReplacement(sb, Character.toString(c));
}
matcher.appendTail(sb);
return sb.toString();
}
Given this method unescape("%7B%u20AC%u0107") produces the desired output {€ć.

Related

how to detect base64 encoded strings? [duplicate]

I want to decode a Base64 encoded string, then store it in my database. If the input is not Base64 encoded, I need to throw an error.
How can I check if a string is Base64 encoded?

You can use the following regular expression to check if a string constitutes a valid base64 encoding:
^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$
In base64 encoding, the character set is [A-Z, a-z, 0-9, and + /]. If the rest length is less than 4, the string is padded with '=' characters.
^([A-Za-z0-9+/]{4})* means the string starts with 0 or more base64 groups.
([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$ means the string ends in one of three forms: [A-Za-z0-9+/]{4}, [A-Za-z0-9+/]{3}= or [A-Za-z0-9+/]{2}==.

If you are using Java, you can actually use commons-codec library
import org.apache.commons.codec.binary.Base64;
String stringToBeChecked = "...";
boolean isBase64 = Base64.isArrayByteBase64(stringToBeChecked.getBytes());
[UPDATE 1] Deprecation Notice
Use instead
Base64.isBase64(value);
/**
* Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the
* method treats whitespace as valid.
*
* #param arrayOctet
* byte array to test
* #return {#code true} if all bytes are valid characters in the Base64 alphabet or if the byte array is empty;
* {#code false}, otherwise
* #deprecated 1.5 Use {#link #isBase64(byte[])}, will be removed in 2.0.
*/
#Deprecated
public static boolean isArrayByteBase64(final byte[] arrayOctet) {
return isBase64(arrayOctet);
}

Well you can:
Check that the length is a multiple of 4 characters
Check that every character is in the set A-Z, a-z, 0-9, +, / except for padding at the end which is 0, 1 or 2 '=' characters
If you're expecting that it will be base64, then you can probably just use whatever library is available on your platform to try to decode it to a byte array, throwing an exception if it's not valid base 64. That depends on your platform, of course.

As of Java 8, you can simply use java.util.Base64 to try and decode the string:
String someString = "...";
Base64.Decoder decoder = Base64.getDecoder();
try {
decoder.decode(someString);
} catch(IllegalArgumentException iae) {
// That string wasn't valid.
}

Try like this for PHP5
//where $json is some data that can be base64 encoded
$json=some_data;
//this will check whether data is base64 encoded or not
if (base64_decode($json, true) == true)
{
echo "base64 encoded";
}
else
{
echo "not base64 encoded";
}
Use this for PHP7
//$string parameter can be base64 encoded or not
function is_base64_encoded($string){
//this will check if $string is base64 encoded and return true, if it is.
if (base64_decode($string, true) !== false){
return true;
}else{
return false;
}
}

var base64Rejex = /^(?:[A-Z0-9+\/]{4})*(?:[A-Z0-9+\/]{2}==|[A-Z0-9+\/]{3}=|[A-Z0-9+\/]{4})$/i;
var isBase64Valid = base64Rejex.test(base64Data); // base64Data is the base64 string
if (isBase64Valid) {
// true if base64 formate
console.log('It is base64');
} else {
// false if not in base64 formate
console.log('it is not in base64');
}

Try this:
public void checkForEncode(String string) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(string);
if (m.find()) {
System.out.println("true");
} else {
System.out.println("false");
}
}

It is impossible to check if a string is base64 encoded or not. It is only possible to validate if that string is of a base64 encoded string format, which would mean that it could be a string produced by base64 encoding (to check that, string could be validated against a regexp or a library could be used, many other answers to this question provide good ways to check this, so I won't go into details).
For example, string flow is a valid base64 encoded string. But it is impossible to know if it is just a simple string, an English word flow, or is it base 64 encoded string ~Z0

There are many variants of Base64, so consider just determining if your string resembles the varient you expect to handle. As such, you may need to adjust the regex below with respect to the index and padding characters (i.e. +, /, =).
class String
def resembles_base64?
self.length % 4 == 0 && self =~ /^[A-Za-z0-9+\/=]+\Z/
end
end
Usage:
raise 'the string does not resemble Base64' unless my_string.resembles_base64?

Check to see IF the string's length is a multiple of 4. Aftwerwards use this regex to make sure all characters in the string are base64 characters.
\A[a-zA-Z\d\/+]+={,2}\z
If the library you use adds a newline as a way of observing the 76 max chars per line rule, replace them with empty strings.

/^([A-Za-z0-9+\/]{4})*([A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}==)$/
this regular expression helped me identify the base64 in my application in rails, I only had one problem, it is that it recognizes the string "errorDescripcion", I generate an error, to solve it just validate the length of a string.

For Flutter, I tested couple of the above comments and translated that into dart function as follows
static bool isBase64(dynamic value) {
if (value.runtimeType == String){
final RegExp rx = RegExp(r'^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$',
multiLine: true,
unicode: true,
);
final bool isBase64Valid = rx.hasMatch(value);
if (isBase64Valid == true) {return true;}
else {return false;}
}
else {return false;}
}

In Java below code worked for me:
public static boolean isBase64Encoded(String s) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s);
return m.find();
}

This works in Python:
import base64
def IsBase64(str):
try:
base64.b64decode(str)
return True
except Exception as e:
return False
if IsBase64("ABC"):
print("ABC is Base64-encoded and its result after decoding is: " + str(base64.b64decode("ABC")).replace("b'", "").replace("'", ""))
else:
print("ABC is NOT Base64-encoded.")
if IsBase64("QUJD"):
print("QUJD is Base64-encoded and its result after decoding is: " + str(base64.b64decode("QUJD")).replace("b'", "").replace("'", ""))
else:
print("QUJD is NOT Base64-encoded.")
Summary: IsBase64("string here") returns true if string here is Base64-encoded, and it returns false if string here was NOT Base64-encoded.

C#
This is performing great:
static readonly Regex _base64RegexPattern = new Regex(BASE64_REGEX_STRING, RegexOptions.Compiled);
private const String BASE64_REGEX_STRING = #"^[a-zA-Z0-9\+/]*={0,3}$";
private static bool IsBase64(this String base64String)
{
var rs = (!string.IsNullOrEmpty(base64String) && !string.IsNullOrWhiteSpace(base64String) && base64String.Length != 0 && base64String.Length % 4 == 0 && !base64String.Contains(" ") && !base64String.Contains("\t") && !base64String.Contains("\r") && !base64String.Contains("\n")) && (base64String.Length % 4 == 0 && _base64RegexPattern.Match(base64String, 0).Success);
return rs;
}

There is no way to distinct string and base64 encoded, except the string in your system has some specific limitation or identification.

This snippet may be useful when you know the length of the original content (e.g. a checksum). It checks that encoded form has the correct length.
public static boolean isValidBase64( final int initialLength, final String string ) {
final int padding ;
final String regexEnd ;
switch( ( initialLength ) % 3 ) {
case 1 :
padding = 2 ;
regexEnd = "==" ;
break ;
case 2 :
padding = 1 ;
regexEnd = "=" ;
break ;
default :
padding = 0 ;
regexEnd = "" ;
}
final int encodedLength = ( ( ( initialLength / 3 ) + ( padding > 0 ? 1 : 0 ) ) * 4 ) ;
final String regex = "[a-zA-Z0-9/\\+]{" + ( encodedLength - padding ) + "}" + regexEnd ;
return Pattern.compile( regex ).matcher( string ).matches() ;
}

If the RegEx does not work and you know the format style of the original string, you can reverse the logic, by regexing for this format.
For example I work with base64 encoded xml files and just check if the file contains valid xml markup. If it does not I can assume, that it's base64 decoded. This is not very dynamic but works fine for my small application.

This works in Python:
def is_base64(string):
if len(string) % 4 == 0 and re.test('^[A-Za-z0-9+\/=]+\Z', string):
return(True)
else:
return(False)

Try this using a previously mentioned regex:
String regex = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
if("TXkgdGVzdCBzdHJpbmc/".matches(regex)){
System.out.println("it's a Base64");
}
...We can also make a simple validation like, if it has spaces it cannot be Base64:
String myString = "Hello World";
if(myString.contains(" ")){
System.out.println("Not B64");
}else{
System.out.println("Could be B64 encoded, since it has no spaces");
}

if when decoding we get a string with ASCII characters, then the string was
not encoded
(RoR) ruby solution:
def encoded?(str)
Base64.decode64(str.downcase).scan(/[^[:ascii:]]/).count.zero?
end
def decoded?(str)
Base64.decode64(str.downcase).scan(/[^[:ascii:]]/).count > 0
end

Function Check_If_Base64(ByVal msgFile As String) As Boolean
Dim I As Long
Dim Buffer As String
Dim Car As String
Check_If_Base64 = True
Buffer = Leggi_File(msgFile)
Buffer = Replace(Buffer, vbCrLf, "")
For I = 1 To Len(Buffer)
Car = Mid(Buffer, I, 1)
If (Car < "A" Or Car > "Z") _
And (Car < "a" Or Car > "z") _
And (Car < "0" Or Car > "9") _
And (Car <> "+" And Car <> "/" And Car <> "=") Then
Check_If_Base64 = False
Exit For
End If
Next I
End Function
Function Leggi_File(PathAndFileName As String) As String
Dim FF As Integer
FF = FreeFile()
Open PathAndFileName For Binary As #FF
Leggi_File = Input(LOF(FF), #FF)
Close #FF
End Function

import java.util.Base64;
public static String encodeBase64(String s) {
return Base64.getEncoder().encodeToString(s.getBytes());
}
public static String decodeBase64(String s) {
try {
if (isBase64(s)) {
return new String(Base64.getDecoder().decode(s));
} else {
return s;
}
} catch (Exception e) {
return s;
}
}
public static boolean isBase64(String s) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s);
return m.find();
}

For Java flavour I actually use the following regex:
"([A-Za-z0-9+]{4})*([A-Za-z0-9+]{3}=|[A-Za-z0-9+]{2}(==){0,2})?"
This also have the == as optional in some cases.
Best!

I try to use this, yes this one it's working
^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$
but I added on the condition to check at least the end of the character is =
string.lastIndexOf("=") >= 0

Encode only specific characters in String

I have to encode only some special characters in a string to numeric value.
Say,
String name = "test $#";
I want to encode only characters $ and # in the above string. I tried using below code but it did not work out.
String encode = URLEncoder.encode(StringEscapeUtils.escapeJava(name), "UTF-8");
The encoded value will be like, for white space the encoded value is &#160

What about to split that String (by string#split method - with space as regex), from Array, which it returns you can use last item and you will get there symbols, what you need :)
String name = "test $#";
String nameSplittedArr = name.split(" ");
String yourChars = nameSplittedArr[nameSplittedArr.length-1]; //indexes from zero
That should works :)

As per the comments, I think you are after a customized encoding function. Something like:
public static String EncodeString(String text) {
StringBuffer sb = new StringBuffer();
for (char c : text.toCharArray()) {
if (Character.isLetterOrDigit(c)) {
sb.append(c);
} else {
sb.append("&#" + (int)c + ";");
}
}
return sb.toString();
}
An example of this is here.

java how to escape accented character in string

For example
{"orderNumber":"S301020000","customerFirstName":"ke ČECHA ","customerLastName":"张科","orderStatus":"PENDING_FULFILLMENT_REQUEST","orderSubmittedDate":"May 13, 2015 1:41:28 PM"}
how to get the accented character like "Č" in above json string and escape it in java
Just give some context of this question, please check this question from me
Ajax unescape response text from java servlet not working properly
Sorry for my English :)

You should escape all characters that are greater than 0x7F. You can loop through the String's characters using the .charAt(index) method. For each character ch that needs escaping, replace it with:
String hexDigits = Integer.toHexString(ch).toUpperCase();
String escapedCh = "\\u" + "0000".substring(hexDigits.length) + hexDigits;
I don't think you will need to unescape them in JavaScript because JavaScript supports escaped characters in string literals, so you should be able to work with the string the way it is returned by the server. I'm guessing you will be using JSON.parse() to convert the returned JSON string into a JavaScript object, like this.
Here's a complete function:
public static escapeJavaScript(String source)
{
StringBuilder result = new StringBuilder();
for (int i = 0; i < source.length(); i++)
{
char ch = source.charAt(i);
if (ch > 0x7F)
{
String hexDigits = Integer.toHexString(ch).toUpperCase();
String escapedCh = "\\u" + "0000".substring(hexDigits.length) + hexDigits;
result.append(escapedCh);
}
else
{
result.append(ch);
}
}
return result.toString();
}

Want to replace special characters with equivalent UTF-8 symbols

As part of my application I have written a custom method to extract data from the DB and return it as a string. My string has special characters like the pound sign, which when extracted looks like this:
"MyMobile Blue £54.99 [12 month term]"
I want the £ to be replaced with actual pound symbol. Below is my method:
public String getOfferName(String offerId) {
log(Level.DEBUG, "Entered getSupOfferName");
OfferClient client = (OfferClient) ApplicationContext
.get(OfferClient.class);
OfferObject offerElement = getOfferElement(client, offerId);
if (offerElement == null) {
return "";
} else {
return offerElement.getDisplayValue();
}
}
Can some one help on this?

The document contains XML/HTML entities .
You can use the StringEscapeUtils.unescapeXml() method from commons-lang to parse these back to their unicode equivalents.
If this is HTML rather than XML use the other methods as there are differences in the two sets of entities.

I voted for StringEscapeUtils.unescapeXml() solution. Anyway, here's is a custom solution
String s = "MyMobile Blue £54.99 [12 month term]";
Pattern p = Pattern.compile("&#(\\d+?);");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while(m.find()) {
int c = Integer.parseInt(m.group(1));
m.appendReplacement(sb, "" + (char)c);
}
m.appendTail(sb);
System.out.println(sb);
output
MyMobile Blue £54.99 [12 month term]
note that it does not accept hex entity reference

Java: Find a specific pattern using Pattern and Matcher

This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. It is temperature and dewpoint, where M means minus. So, the place in the String may differ and the temperature may be presented as M10/M13 or 10/13 or M10/13.
I have done the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!

This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13

The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to add a check for white space on the front and end:
"\\sM?\\d+/M?\\d+\\s"
But this will depend on where you think you are going to find the pattern, as it will not be matched if it is at the end of the string, so instead we should use:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This specifies that if there isn't any whitespace at the end or front we must match the end of the string or the start of the string instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13

Try:
Pattern regex = Pattern.compile(".*\\sM?(\\d+)/M?(\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 2) {
temperature = matcher.group(1);
System.out.println(temperature);
}

Alternative for regex.
Some times a regex is not the only solution. It seems that in you case, you must get the 6th block of text. Each block is separated by a space character. So, what you need to do is count the blocks.
Considering that each block of text does NOT HAVE fixed length
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
Considering that each block of text does HAVE fixed length
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
Prettier alternative, as pointed by Adrian:
String result = rawString.split(" ")[5];
Note that split acctualy receives a regex pattern as parameter

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java %u20AC conversion to euro € - java

Related

how to detect base64 encoded strings? [duplicate]

Encode only specific characters in String

java how to escape accented character in string

Want to replace special characters with equivalent UTF-8 symbols

Java: Find a specific pattern using Pattern and Matcher

Categories

Resources