Forming a regular expression to a Java string - java

I have a String in which only digits and zero, one or more percent signs are allowed,
so my regex would be: [\d+%] (you can test it here).
For Java I have to escape it:
public static final String regex = "[\\d+\\%]";
and to test it i use this function
public static final String regex = "[\\d+\\%]";
public boolean validate(String myString) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(myString);
    // matches() succeeds only if the entire input matches the pattern
    return matcher.matches();
}
The regular expression is not working, also if i use
public static final String regex = "[\\d+%]";
Is there any good online tool for escaping a long regular expression for java?
A more advanced question:
The % should be allowed only if the String contains at least one digit; a lone % shouldn't be allowed. And: numbers without a % are allowed only if they have exactly 8 digits, no fewer (so 1234567 is bad, but 12345678 is good).
Testcases:
Bad: %, (empty string), 23b, -1, 7.5, %5a, 1, 1234567
Good: 12345678, 23%, 1%53%53, %7

I have a String, where only numbers and none, one or more percentages are allowed so my regex would be: [\d+%]
Actually, that matches ONE character which may be a digit, a + or a %.
To match what you have described in words, you need something like this:
[\d%]*\d[\d%]*
which matches a string containing at least one digit with optional percent signs. Note that the % character is not a meta-character and hence doesn't need to be escaped in the regex. It will match all of the following:
0
00
%0
0%0
00%
0%0%0
0%%0
and so on, but not just % or any string that contains characters other than digits or % characters.
Is there any good online tool for escaping a long regular expression for java?
I'm not aware of one. But escaping wasn't the reason your regex wasn't working.
A more advanced question: the % should be only allowed if a minimum of one digit is in the String, only a % shouldn't be allowed!
I think my regex above does that. And for the record, here is what it looks like as a Java String literal:
"[\\d%]*\\d[\\d%]*"
Unless you have TAB, NL, CR, etc. characters in the regex, it is sufficient to replace each individual \ with \\.
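The pattern above does not yet cover the second half of the advanced question (a number without a % must have exactly 8 digits). One possible sketch follows, assuming the test cases listed in the question are the full specification; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

final class PercentNumberValidator {
    // Alternative 1: exactly 8 digits and nothing else.
    // Alternative 2: only digits and %, with at least one digit and at least one %.
    private static final Pattern VALID =
            Pattern.compile("\\d{8}|(?=.*\\d)(?=.*%)[\\d%]+");

    static boolean validate(String s) {
        // matches() requires the whole input to match, so no ^ or $ is needed
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12345678", "23%", "1%53%53", "%7", "%", "", "23b", "1", "1234567"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + validate(s));
        }
    }
}
```

The two lookaheads only check that a digit and a % occur somewhere; the character class [\d%]+ is what rejects every other character.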

Related

Regex for extracting all heading digits from a string

I am trying to extract all leading digits from a string using a Java regex, without writing additional code, and I could not find anything that works:
"12345XYZ6789ABC" should give me "12345".
"X12345XYZ6789ABC" should give me nothing
public final class NumberExtractor {
private static final Pattern DIGITS = Pattern.compile("what should be my regex here?");
public static Optional<Long> headNumber(String token) {
var matcher = DIGITS.matcher(token);
return matcher.find() ? Optional.of(Long.valueOf(matcher.group())) : Optional.empty();
}
}
Use a word boundary \b:
\b\d+
See live demo.
If you strictly want to match only digits at the start of the input, and not from each word (same thing when the input contains only one word), use ^:
^\d+
Pattern DIGITS = Pattern.compile("\\b\\d+"); // leading digits of all words
Pattern DIGITS = Pattern.compile("^\\d+"); // leading digits of input
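I'd think something like "^[0-9]+" would work (with * instead of +, find() would also succeed on an empty match, and Long.valueOf("") would then throw). In Java, \d is equivalent to [0-9] by default; it matches other Unicode digits only if the UNICODE_CHARACTER_CLASS flag is set.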
Edit: removed errant . from the string.
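Putting the ^\d+ suggestion into the class from the question gives roughly the following. This is a minimal sketch that keeps the question's names; it assumes the leading digit run is short enough to fit in a long.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberExtractor {
    // ^ anchors the match to the start of the input, \d+ needs at least one digit
    private static final Pattern DIGITS = Pattern.compile("^\\d+");

    static Optional<Long> headNumber(String token) {
        Matcher matcher = DIGITS.matcher(token);
        // Long.valueOf throws NumberFormatException if the digit run overflows a long
        return matcher.find() ? Optional.of(Long.valueOf(matcher.group())) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(headNumber("12345XYZ6789ABC"));
        System.out.println(headNumber("X12345XYZ6789ABC"));
    }
}
```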

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
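To match a non-negative decimal number that contains a decimal point, you need this regex:
^(\d*\.\d+|\d+\.\d*)$
or in Java syntax: "^(\\d*\\.\\d+|\\d+\\.\\d*)$"
The group matters: without it, ^ applies only to the first alternative and $ only to the second. Since the dot is mandatory here, a plain integer such as 19 is not matched.
String regex = "^(\\d*\\.\\d+|\\d+\\.\\d*)$";
String string = "123.43253";
if (string.matches(regex))
    System.out.println("true");
else
    System.out.println("false");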
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
but this will also accept just a dot without any digits. So, if you want digits to be mandatory, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function (this needs import java.util.regex.Pattern):
fun String.findDecimalDigits(): String =
    Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }
Your initial understanding was probably right, but you were being thrown off because matcher.find() looks for the first valid match anywhere within the string, and every part of your regex is optional, so it can match even a zero-length string. For "5p4" it simply finds "5".
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
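The difference between find() and matches() can be seen in a small sketch. The class and constant names here are illustrative only; the two patterns are the question's original attempt and the anchored suggestion above.

```java
import java.util.regex.Pattern;

final class DecimalCheck {
    // The question's pattern: every part is optional, so it can match nothing at all
    static final Pattern LOOSE = Pattern.compile("[0-9]*[.]?[0-9]*");
    // The anchored suggestion: digits with an optional fraction, or a dot followed by digits
    static final Pattern STRICT = Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    static boolean isNonNegativeDecimal(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // find() is satisfied by the "5" at the start of the input
        System.out.println(LOOSE.matcher("5p4").find());
        // matches() has to consume the whole input and fails on 'p'
        System.out.println(LOOSE.matcher("5p4").matches());
        for (String s : new String[] {"12.2", "3.7", "2.", "0.3", ".89", "19", "5p4", "."}) {
            System.out.println(s + " -> " + isNonNegativeDecimal(s));
        }
    }
}
```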
There are actually two ways to match a literal dot. One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class (square brackets), like [.]. Most special characters become literal inside square brackets, including the dot. Using \\. shows your intention more clearly than [.] if all you want is to match a literal dot. Use [] if you need to match one of several things; for example, the regex [\\d.] means: match a single digit or a literal dot.
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

split on integer values but not floating point values

I have a java program where I need to split on integer values but not floating point values
ie. "1/\\2" should produce: [1,/\\,2]
but "1.0/\\2.0" should produce: [1.0,/\\,2.0]
does anybody have any ideas?
or could anybody point me in the direction of how to split on the specific strings "\\/" and "/\\" ?
UPDATE: sorry! one more case! for the string "100 /\ 3.4e+45" I need to split it into:
[100,/\,3.4,e,+,45]
my current regex is (kind of really ugly):
line.split("\\s+|(?<=[-+])|(?=[-+])|(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))|(?<=[-+()])|(?=[-+()])|(?<=e)|(?=e)");
and for the string: "100 /\ 3.4e+45" is giving me:
[100,/\,3.4,+,45]
This regex should do it:
(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))
It's two checks, basically matching:
A digit not followed by a digit, a decimal point, or the end of text.
A digit not preceded by a digit, a decimal point, or the start of text.
It will match the empty space after/before the digit, so you can use this regex in split().
See regex101 for demo.
Follow-up
could anybody point me in the direction of how to split on the specific strings "\/" and "/\"?
If you want to split before a specific pattern, use a positive lookahead: (?=xxx). If you want to split after a specific pattern, use a positive lookbehind: (?<=xxx). To do either, separate by |:
(?<=xxx)|(?=xxx)
where xxx is the text \/ or /\, i.e. the regex \\/|/\\, and doubling for Java string literal:
"(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)"
See regex101 for demo.
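As a rough illustration of the lookbehind/lookahead split on the two operators, here is a sketch; the class and method names are hypothetical, and only the regex literal comes from the answer above.

```java
import java.util.Arrays;

final class OperatorSplitDemo {
    // Zero-width split points: right after or right before the text \/ or /\
    private static final String AROUND_OPERATORS = "(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)";

    static String[] splitAroundOperators(String s) {
        return s.split(AROUND_OPERATORS);
    }

    public static void main(String[] args) {
        // In Java source, "1/\\2" is the four-character text 1/\2
        System.out.println(Arrays.toString(splitAroundOperators("1/\\2")));
        System.out.println(Arrays.toString(splitAroundOperators("1.0/\\2.0")));
    }
}
```

Because the split points are zero-width, the operator itself is kept as its own element instead of being consumed as a delimiter.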
You could try something like this:
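import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// the dot is escaped so that it means a literal decimal point, not "any character"
String regex = "\\d+(\\.\\d+)?", str = "1//2";
Matcher m = Pattern.compile(regex).matcher(str);
ArrayList<String> list = new ArrayList<String>();
int index = 0;
for (index = 0; m.find(); index = m.end()) {
    // text between the previous number and this one
    if (index != m.start()) list.add(str.substring(index, m.start()));
    list.add(str.substring(m.start(), m.end()));
}
// trailing text after the last number, if any
if (index < str.length()) list.add(str.substring(index));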
The idea is to find number using regex and Matcher, and also add the strings in between.

Password check using regex in Java

My application has a feature to check password. I need to get this scenario:
password length 10 ~ 32 .
It has to be a combination of either:
character and numbers
characters and special characters
numbers and special characters
Current code in application:
private boolean isSpecialMixedText(String password)
{
String number = "[0-9]";
String english = "[a-zA-Z]";
String special = "[!##\\$%^&*()~`\\-=_+\\[\\]{}|:\\\";',\\./<>?£¥\\\\]";
Pattern numberPattern = Pattern.compile(number);
Matcher numberMatcher = numberPattern.matcher(password);
Pattern englishPattern = Pattern.compile(english);
Matcher englishMatcher = englishPattern.matcher(password);
Pattern specialPattern = Pattern.compile(special);
Matcher specialMatcher = specialPattern.matcher(password);
return numberMatcher.find() && englishMatcher.find() || specialMatcher.find();
}
Please help me get the combination working
Actually, the regexes look fine. The problem is in this statement:
return numberMatcher.find() && englishMatcher.find() ||
specialMatcher.find();
It actually needs to be something like this:
boolean n = numberMatcher.find();
boolean e = englishMatcher.find();
boolean s = specialMatcher.find();
return (n && e) || (n && s) || (e && s);
And I agree with @adelphus's comment. Your rules for deciding what passwords are acceptable are very English-language-centric.
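The corrected boolean logic above still leaves out the 10 to 32 length rule from the question. A minimal sketch that combines both is shown below. Note the assumption: "special" is simplified here to "any character that is not an ASCII letter or digit", which is broader than the character class in the question, and the class name is made up for illustration.

```java
import java.util.regex.Pattern;

final class PasswordCheck {
    private static final Pattern NUMBER = Pattern.compile("[0-9]");
    private static final Pattern ENGLISH = Pattern.compile("[a-zA-Z]");
    // Assumption: simplified stand-in for the question's explicit list of special characters
    private static final Pattern SPECIAL = Pattern.compile("[^a-zA-Z0-9]");

    static boolean isSpecialMixedText(String password) {
        if (password == null || password.length() < 10 || password.length() > 32) {
            return false;
        }
        // Evaluate each class once, then require at least two of the three
        boolean n = NUMBER.matcher(password).find();
        boolean e = ENGLISH.matcher(password).find();
        boolean s = SPECIAL.matcher(password).find();
        return (n && e) || (n && s) || (e && s);
    }

    public static void main(String[] args) {
        System.out.println(isSpecialMixedText("abcde12345"));
        System.out.println(isSpecialMixedText("abcdefghij"));
    }
}
```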
In my opinion your logic is wrong, because you want a combination (so only these characters are allowed) of: characters and numbers OR characters and special characters OR numbers and special characters. However, with a pair of patterns like [0-9] and [a-zA-Z] you are actually checking that the String contains some digits and some letters, so it could also be 123ABC#$%#$%$#%#$ (because it has letters and digits).
What you need is a check that the given string is composed ONLY of an allowed combination of characters. I think you can use one regex here (not too elegant, but effective), like:
^(?:((?=.*[A-Za-z].*)(?=.*[0-9].*)[A-Za-z0-9]{10,32})|((?=.*[-!##\\$%^&*()~`\=_+\[\]{}|:\";',.\/<>?£¥\\].*)(?=.*[0-9].*)[0-9-!##\\$%^&*()~`\=_+\[\]{}|:\";',.\/<>?£¥\\]{10,32})|((?=.*[-!##\\$%^&*()~`\=_+\[\]{}|:\";',.\/<>?£¥\\].*)(?=.*[A-Za-z].*)[-A-Za-z!##\\$%^&*()~`\=_+\[\]{}|:\";',.\/<>?£¥\\]{10,32}))$
DEMO - it shows valid and invalid matches.
This is quite a long regex, but mainly because of your special character class. The regular expression is composed of three parts with a similar structure:
positive lookahead for required characters + character class of allowed characters
For example:
(?=.*[A-Za-z].*)(?=.*[0-9].*)[A-Za-z0-9]{10,32}
means that the string needs to have:
(?=.*[A-Za-z].*) - at least one letter (positive lookahead for a letter, which may be surrounded by other characters),
(?=.*[0-9].*) - at least one digit (positive lookahead for a digit, which may be surrounded by other characters),
[A-Za-z0-9]{10,32} - from 10 to 32 letters or digits,
so in effect the password needs to have 10 to 32 characters, including both letters and digits; the proportion is not important.
What's more, the ^ at the beginning and the $ at the end ensure that the whole examined string has this composition.
I would also agree with the others that it is not the best idea to restrict the allowed characters in a password like that, but it is your decision.

Escaping special characters in Java Regular Expressions

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.
For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:
String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");
Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);
System.out.println("Regex 1: " + regex1);
if (numbers1.matcher("1.2").matches()) {
System.out.println("\tMatch");
} else {
System.out.println("\tNo match");
}
System.out.println("Regex 2: " + regex2);
if (numbers2.matcher("1.2").matches()) {
System.out.println("\tMatch");
} else {
System.out.println("\tNo match");
}
Not surprisingly, the output produced by the above code is:
Regex 1: \d+\.\d+
Match
Regex 2: \Qd+.d+\E
No match
That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).
So, is there a method that would automatically escape each regex meta-character?
If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of
Pattern.escape('.')
would be the string "\.", but
Pattern.escape(',')
should just produce ",", since it is not a meta-character. Similarly,
Pattern.escape('d')
could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean literal 'd', which wouldn't be misunderstood by the regex interpreter to be something else, as would be the case with '.').
Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work but there is no nice Pattern.escape('.') function to help with this.
So if you are trying to match "\\d" (the string \d instead of a decimal character) then you would do:
// this will match on \d as opposed to a decimal character
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";
The 4 backslashes in the Java string literal turn into 2 backslashes in the regex pattern, and 2 backslashes in a regex pattern match the backslash itself. Prepending a backslash to any special character turns it into a normal character instead of a special one.
String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
...
In your post you use the Pattern.quote(string) method. This method wraps your pattern between "\\Q" and "\\E" so you can match a string even if it happens to have a special regex character in it (+, ., \\d, etc.)
I wrote this pattern:
Pattern SPECIAL_REGEX_CHARS = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");
And use it in this method:
String escapeSpecialRegexChars(String str) {
return SPECIAL_REGEX_CHARS.matcher(str).replaceAll("\\\\$0");
}
Then you can use it like this, for example:
Pattern toSafePattern(String text)
{
return Pattern.compile(".*" + escapeSpecialRegexChars(text) + ".*");
}
We needed to do that because, after escaping, we add some regex expressions. If not, you can simply use \Q and \E:
Pattern toSafePattern(String text)
{
    // note: this breaks if text itself contains \E; Pattern.quote(text) handles that case
    return Pattern.compile(".*\\Q" + text + "\\E.*");
}
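For comparison, the two approaches from this answer can be put side by side in one sketch. The class name and the two contains... method names are made up for illustration. The second variant uses Pattern.quote instead of hand-written \Q and \E, because Pattern.quote also copes with input that itself contains \E.

```java
import java.util.regex.Pattern;

final class SafePatternDemo {
    private static final Pattern SPECIAL_REGEX_CHARS =
            Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");

    // Puts a backslash in front of every regex meta-character in str
    static String escapeSpecialRegexChars(String str) {
        return SPECIAL_REGEX_CHARS.matcher(str).replaceAll("\\\\$0");
    }

    // Variant 1: escape each character, then add our own regex around it
    static Pattern containsEscaped(String text) {
        return Pattern.compile(".*" + escapeSpecialRegexChars(text) + ".*");
    }

    // Variant 2: let Pattern.quote wrap the literal text
    static Pattern containsQuoted(String text) {
        return Pattern.compile(".*" + Pattern.quote(text) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(escapeSpecialRegexChars("1+1=2?"));
        System.out.println(containsEscaped("a.b").matcher("xxa.bxx").matches());
        System.out.println(containsQuoted("a.b").matcher("xxaXbxx").matches());
    }
}
```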
The only way the regex matcher knows you are looking for a digit and not the letter d is to escape the letter (\d). To type the regex escape character in java, you need to escape it (so \ becomes \\). So, there's no way around typing double backslashes for special regex chars.
The Pattern.quote(String s) method sort of does what you want. However, it leaves a little to be desired; it doesn't actually escape the individual characters, it just wraps the string in \Q...\E.
There is not a method that does exactly what you are looking for, but the good news is that it is actually fairly simple to escape all of the special characters in a Java regular expression:
regex.replaceAll("[\\W]", "\\\\$0")
Why does this work? Well, the documentation for Pattern specifically says that it's permissible to escape non-alphabetic characters that don't necessarily have to be escaped:
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
For example, ; is not a special character in a regular expression. However, if you escape it, Pattern will still interpret \; as ;. Here are a few more examples:
> becomes \> which is equivalent to >
[ becomes \[ which is the escaped form of [
8 is still 8.
\) becomes \\\) which is the escaped forms of \ and ) concatenated.
Note: The key is the definition of "non-alphabetic", which in the documentation really means "non-word" characters, or characters outside the character set [a-zA-Z_0-9].
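Applied to the example from the question, the \W replacement lets the literal parts be escaped individually before they are combined with real meta-characters. This is a sketch only; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class EscapeNonWordDemo {
    // Prefix every non-word character with a backslash so it is taken literally
    static String escape(String literal) {
        return literal.replaceAll("[\\W]", "\\\\$0");
    }

    // Builds digits + literal separator + digits, e.g. \d+\.\d+ for "."
    static Pattern decimalPattern(String point) {
        return Pattern.compile("\\d+" + escape(point) + "\\d+");
    }

    public static void main(String[] args) {
        System.out.println(escape("."));
        System.out.println(escape(","));
        System.out.println(decimalPattern(".").matcher("1.2").matches());
        System.out.println(decimalPattern(".").matcher("1x2").matches());
    }
}
```

Word characters such as d are left alone, so this helper never turns a literal letter into an escape sequence like \d; the meta-sequences have to be written by hand.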
Use this utility function escapeQuotes() to escape strings that go between groups and sets of a regular expression.
List of Regex Literals to escape <([{\^-=$!|]})?*+.>
public class RegexUtils {
static String escapeChars = "\\.?![]{}()<>*+-=^$|";
public static String escapeQuotes(String str) {
if(str != null && str.length() > 0) {
return str.replaceAll("[\\W]", "\\\\$0"); // \W designates non-word characters
}
return "";
}
}
From the Pattern class documentation: the backslash character ('\') serves to introduce escaped constructs. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
Example: the string to be matched is (hello) and the regex with a group is (\(hello\)). From here you only need to escape the matched string as shown below. Test Regex online
public static void main(String[] args) {
String matched = "(hello)", regexExpGrup = "(" + escapeQuotes(matched) + ")";
System.out.println("Regex : "+ regexExpGrup); // (\(hello\))
}
Agree with Gray, as you may need your pattern to have both literals (\[, \]) and meta-characters ([, ]). So with some utility you should be able to escape all characters first, and then add the meta-characters you want to the same pattern.
use
Pattern p = Pattern.compile("\"");
String s = p.toString() + "yourcontent" + p.toString();
which gives "yourcontent", i.e. the content wrapped in double quotes
