String.replaceAll to replace all characters not in pattern

I have a Java regex:
^[a-zA-Z_][a-zA-Z0-9_]{1,126}$
It means:
Begin with an alphabetic character or underscore character.
Subsequent characters may include letters, digits or underscores.
Be between 1 and 127 characters in length (note that, as written, {1,126} after the first character requires at least 2 characters in total).
Now I want to take a string and replace every character that does not fit that regex with an underscore.
Example:
final String label = "23_fgh99##";
System.out.println(label.replaceAll("^[^a-zA-Z_][^a-zA-Z0-9_]{1,126}$", "_"));
But the result is still 23_fgh99##.
How can I "convert" it to _3_fgh99__?

Use this code:
final String label = "23_fgh99##";
System.out.println(label.replaceAll("^[^a-zA-Z_]|(?<!^)[^a-zA-Z0-9_]", "_"));
It outputs _3_fgh99__.
To replace what is "not in the original pattern", negate the first character class and test it only at the beginning of the string (^[^a-zA-Z_]), then test every character that is not at the beginning against the negated second character class ((?<!^)[^a-zA-Z0-9_]). The alternation operator | combines both patterns so that a single replacement operation applies them.
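As a minimal sketch of the answer above, the pattern can be wrapped in a small helper. The class name LabelSanitizer and the method name sanitize are hypothetical, chosen only for illustration:

```java
// Hypothetical helper class; only the regex comes from the answer above.
final class LabelSanitizer {
    // First alternative: an invalid character in the first position.
    // Second alternative: an invalid character anywhere except the start.
    private static final String INVALID_CHAR = "^[^a-zA-Z_]|(?<!^)[^a-zA-Z0-9_]";

    static String sanitize(String label) {
        return label.replaceAll(INVALID_CHAR, "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("23_fgh99##"));  // _3_fgh99__
        System.out.println(sanitize("valid_name1")); // unchanged: valid_name1
    }
}
```

Note that this only replaces characters; it does not enforce the length limit of the original pattern, which would need a separate check.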

Related

Regex Pattern Format Validation

I need a RegEx pattern which will be sent by the client where the starting characters will be alphanumeric, the length of this starting String will be defined by the number after this String. This is followed by a special character which will always be a single character. This is again followed by a variable length string of alphanumeric characters.
The closest I have come is the input strings and the format below.
[A-Za-z0-9]{4}-[A-Za-z0-9]{5} - RegEx Input String
[A-Za-z0-9]{2}#[A-Za-z0-9]{6} - RegEx Input String
[0-9]{3}#[0-9]{5} - RegEx Input String
[a-z]{5}#[a-z]{5} - RegEx Input String
[A-Z]{4}#[a-z]{4} - RegEx Input String
[\w]{\d{1,1}}(\S{1,1})[\w]{\d{1,1}} - RegEx Format
Is the above pattern and format correct?
Can we validate the RegEx input string against the required RegEx format?
This is a web service whose input will be something like [A-Za-z0-9]{4}-[A-Za-z0-9]{5}. I need two things here. First, how do I validate this input to see whether it matches the format I want before proceeding? The format is the one I gave above as the RegEx format.
This regular expression should match the subset of regular expressions you're interested in:
\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\](?:\{\d\})?\S\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\](?:\{\d\})?
Let's break it down:
it matches a line that contains, in sequence, a character class, an optional quantifier, a separator, a second character class and a second optional quantifier
the separator is any non-whitespace character, \S (you might want to change that to something more specific, or to something that also allows some whitespace)
the optional quantifier is easy: a digit surrounded by literal curly brackets, the whole enclosed in a non-capturing group that we make optional: (?:\{\d\})?. Note that this accepts neither multi-digit lengths (you might want to change the \d to \d+) nor the more specific {m,n} range quantifier.
a character class is a sequence of 0 or more characters or character ranges, enclosed in literal brackets.
a character is a letter or digit: [a-zA-Z0-9](?:-[a-zA-Z0-9])? when the non-capturing group isn't matched
a character range is a character followed by the literal - followed by another character: [a-zA-Z0-9](?:-[a-zA-Z0-9])? when the non-capturing group is matched
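The breakdown above can be sketched in Java by assembling the pattern from its named parts. The class name FormatValidator, the method isAcceptedFormat and the constant names are hypothetical; the regex pieces are the ones from the answer:

```java
import java.util.regex.Pattern;

// Hypothetical validator; it checks that a client-supplied regex string
// has the shape "class, optional {d}, separator, class, optional {d}".
final class FormatValidator {
    // A bracketed sequence of letters/digits or letter/digit ranges.
    private static final String CHAR_CLASS = "\\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\\]";
    // An optional single-digit quantifier such as {4}.
    private static final String QUANTIFIER = "(?:\\{\\d\\})?";
    private static final Pattern FORMAT =
            Pattern.compile(CHAR_CLASS + QUANTIFIER + "\\S" + CHAR_CLASS + QUANTIFIER);

    static boolean isAcceptedFormat(String clientRegex) {
        // matches() requires the whole input to fit the format.
        return FORMAT.matcher(clientRegex).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedFormat("[A-Za-z0-9]{4}-[A-Za-z0-9]{5}")); // true
        System.out.println(isAcceptedFormat("[0-9]{3}#[0-9]{5}"));             // true
        System.out.println(isAcceptedFormat("[a-z]{12}#[a-z]{5}"));            // false: two-digit length
    }
}
```

The last call illustrates the single-digit limitation mentioned above; replacing \d with \d+ in QUANTIFIER would accept it.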

Java string validation

I'm looking for a regular expression that adheres to the rules below.
Allowed Characters
Alphabet : a-z A-Z
Numbers : 0-9
I am using [^a-zA-Z0-9], but when I call
String regex = "[^a-zA-Z0-9]";
String key = "message";
if (!key.matches(regex))
    message = "Invalid key";
the system shows "Invalid key", although the key should be valid. Could you please help me?
If you want to allow these characters [a-zA-Z0-9] you should not use ^ since it negates what is inside the [].
The expression [^a-zA-Z0-9] means any character that is not a-z, A-Z or 0-9.
You may have seen ^ used outside the [] at the beginning of a regular expression to indicate the beginning of the string, as in ^[a-zA-Z0-9].
The regex below allows one or more alphanumeric characters:
^[A-Za-z0-9]+$
Your regex [^a-zA-Z0-9] matches a single character that is not alphanumeric. [^..] is called a negated character class; it matches any character that is not listed inside the brackets.
You don't need start or end anchors in the regex when it is passed to the matches method, because matches always tests the whole string. So [A-Za-z0-9]+ would be enough.
Explanation:
^ Anchor that denotes the start.
[A-Za-z0-9]+ The + repeats the preceding token [A-Za-z0-9] one or more times.
$ Anchor that denotes the end of the line.
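A minimal sketch of the check described above, assuming a hypothetical helper class KeyValidator with a method isValidKey:

```java
// Hypothetical helper; shows String.matches with a positive character class.
final class KeyValidator {
    static boolean isValidKey(String key) {
        // String.matches tests the entire string, so no ^ or $ is needed.
        return key.matches("[A-Za-z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("message"));  // true
        System.out.println(isValidKey("mess age")); // false: contains a space
        // The negated class from the question matches only a single
        // non-alphanumeric character, so it rejects "message".
        System.out.println("message".matches("[^a-zA-Z0-9]")); // false
    }
}
```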
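I think you just have to remove the not-operator and search for an invalid character instead. Here is the same example with the variable renamed; it uses find(), because matches() would require the whole key to be a single invalid character:
import java.util.regex.Pattern;
String invalidChars = "[^a-zA-Z0-9]";
String key = "message";
// find() reports an invalid character anywhere in the key
if (Pattern.compile(invalidChars).matcher(key).find()) {
    message = "Invalid key";
}
(However, the negated logic is not very readable.)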
Try the alphanumeric regex below:
"^[a-zA-Z0-9]+$"
^ - Start of string
[a-zA-Z0-9]+ - one or more of the allowed characters
$ - End of string
For validation, use the \A and \z anchors instead of ^ and $, because $ also matches just before a line terminator at the end of the input:
\\A[a-zA-Z0-9]+\\z
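The difference between the two pairs of anchors shows up when the pattern is used with Matcher.find() on input that ends in a newline. This is a small sketch; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing ^...$ with \A...\z under Matcher.find().
final class AnchorDemo {
    private static final Pattern DOLLAR = Pattern.compile("^[a-zA-Z0-9]+$");
    private static final Pattern ABSOLUTE = Pattern.compile("\\A[a-zA-Z0-9]+\\z");

    static boolean findWithDollar(String s) {
        return DOLLAR.matcher(s).find();
    }

    static boolean findWithAbsolute(String s) {
        return ABSOLUTE.matcher(s).find();
    }

    public static void main(String[] args) {
        String withNewline = "abc\n";
        // $ also matches before a final line terminator, so this succeeds.
        System.out.println(findWithDollar(withNewline));   // true
        // \z matches only at the very end of the input.
        System.out.println(findWithAbsolute(withNewline)); // false
    }
}
```

With String.matches() or Matcher.matches() the whole input must be consumed, so both patterns reject "abc\n" there; the distinction matters mainly for find().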

How to write this Java regex?

I need to split a string into words separated by hyphens. For example:
"WorkInProgress" is converted to "Work-In-Progress"
"NotComplete" is converted to "Not-Complete"
In most cases a word starts with a capital letter and ends with a lowercase one.
But there is one exception: "CIInProgress" should be converted to "CI-In-Progress".
I wrote the code below so that any run of lowercase letters or "CI" that is followed by a capital letter gets a "-" inserted in the middle. But it still doesn't work for "CIInProgress". Can anyone tell me how to correct it?
String str;
String pattern = "([a-z|CI]+)([A-Z])";
str= str.replaceAll(pattern, "$1\\-$2");
You could use a negative lookbehind,
Regex:
(?<!^)([A-Z][a-z])
Replacement string:
-$1
Explanation:
(?<!^) is a negative lookbehind. It asserts that the position just before the match is not the start of the string, so an uppercase letter [A-Z] followed by a lowercase letter [a-z] is matched everywhere except at the very beginning. The parentheses () form a capturing group that stores the matched characters, and the replacement string refers to them by their group index, here $1.
Code:
System.out.println("WorkInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("NotComplete".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("CIInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
Output:
Work-In-Progress
Not-Complete
CI-In-Progress
You can't have | in a character class; it will just get interpreted as a literal vertical bar character. Try:
String pattern = "([a-z]+|CI)([A-Z])";
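Try this (it inserts a hyphen at every lowercase-to-uppercase boundary, so it does not handle the "CIInProgress" exception):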
str= str.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
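To compare the three suggestions side by side, here is a small sketch. The class name HyphenSplitter and its method names are hypothetical; the patterns are the ones given in the answers above:

```java
// Hypothetical comparison of the three approaches from the answers.
final class HyphenSplitter {
    // Hyphen before every "Upper + lower" pair that is not at the start.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!^)([A-Z][a-z])", "-$1");
    }

    // Fixed version of the question's pattern: alternation outside the class.
    static String withAlternation(String s) {
        return s.replaceAll("([a-z]+|CI)([A-Z])", "$1-$2");
    }

    // Hyphen at every lowercase-to-uppercase boundary.
    static String withCaseBoundary(String s) {
        return s.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
    }

    public static void main(String[] args) {
        System.out.println(withLookbehind("CIInProgress"));   // CI-In-Progress
        System.out.println(withAlternation("CIInProgress"));  // CI-In-Progress
        System.out.println(withCaseBoundary("CIInProgress")); // CIIn-Progress
    }
}
```

All three agree on "WorkInProgress" and "NotComplete"; only the first two handle the "CI" prefix.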

Check string contains whitespace along with some other char sequence using regex in java

I am using a regular expression to check whether a string contains whitespace.
My regex is: ^\\s+$
For example, if my string is "my name", the match should return true.
But it returns true only if the string consists entirely of whitespace, with no other characters.
How can I check whether a string contains a space, tab or carriage return at the start, in the middle or at the end?
^(.*\s+.*)+$ seems to work for me. It accepts anything as long as there is at least one whitespace character in the string, and it matches the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
The difference is that without the anchors "^" and "$" there may be other characters around the whitespace, so the whitespace character(s) may be anywhere in the string. Test it with Matcher.find(), because matches() always requires the whole string to match.
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"
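The three pattern strings above can be tried out with Matcher.find(). This is a minimal sketch; the class name WhitespaceCheck and its method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the three patterns from the answer.
final class WhitespaceCheck {
    private static final Pattern ANYWHERE = Pattern.compile("\\s+");
    private static final Pattern AT_START = Pattern.compile("^\\s+");
    private static final Pattern AT_END = Pattern.compile("\\s+$");

    // find() succeeds if the pattern matches any part of the input.
    static boolean containsWhitespace(String s) {
        return ANYWHERE.matcher(s).find();
    }

    static boolean startsWithWhitespace(String s) {
        return AT_START.matcher(s).find();
    }

    static boolean endsWithWhitespace(String s) {
        return AT_END.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("my name"));   // true
        System.out.println(containsWhitespace("myname"));    // false
        System.out.println(startsWithWhitespace("my name")); // false
        System.out.println(endsWithWhitespace("my name "));  // true
        // The question's anchored pattern needs an all-whitespace string.
        System.out.println("my name".matches("^\\s+$"));     // false
    }
}
```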
Use org.apache.commons.lang.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.

In a java regex, how can I get a character class e.g. [a-z] to match a - minus sign?

Pattern pattern = Pattern.compile("^[a-z]+$");
String string = "abc-def";
assertTrue( pattern.matcher(string).matches() ); // obviously fails
Is it possible to have the character class match a "-" ?
Don't put the minus sign between characters.
"[a-z-]"
Escape the minus sign
[a-z\\-]
Inside a character class [...] a - is treated specially (as a range operator) if it's surrounded by characters on both sides. That means if you put the - at the beginning or at the end of the character class, it will be treated literally (non-special).
So you can use the regex:
^[a-z-]+$
or
^[-a-z]+$
Since the - that we added is treated literally, there is no need to escape it, although escaping it is not an error.
Another (less recommended) way is to not include the - in the character class:
^(?:[a-z]|-)+$
Note that the parentheses are not optional in this case, because | has very low precedence. Without the parentheses,
^[a-z]|-+$
matches either a single lowercase letter at the beginning of the string or one or more - at the end.
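A short sketch that exercises the variants discussed above, including the precedence pitfall. The class name HyphenInClass and its two helper methods are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo of a literal - inside a character class.
final class HyphenInClass {
    // Whole-string match.
    static boolean matchesWith(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Match anywhere in the input.
    static boolean findsWith(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc-def";
        System.out.println(matchesWith("^[a-z]+$", input));        // false
        System.out.println(matchesWith("^[a-z-]+$", input));       // true
        System.out.println(matchesWith("^[-a-z]+$", input));       // true
        System.out.println(matchesWith("^[a-z\\-]+$", input));     // true
        System.out.println(matchesWith("^(?:[a-z]|-)+$", input));  // true
        // Without the group, | splits the pattern into "^[a-z]" and "-+$".
        System.out.println(matchesWith("^[a-z]|-+$", input));      // false
        System.out.println(findsWith("^[a-z]|-+$", "a1 2 3"));     // true
    }
}
```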
I'd rephrase the "don't put it between characters" a little more concretely.
Make the dash the first or last character in the character class. For example, "[-a-z1-9]" matches lower-case letters, the digits 1 to 9, or a dash.
This works for me
Pattern p = Pattern.compile("^[a-z\\-]+$");
String line = "abc-def";
Matcher matcher = p.matcher(line);
System.out.println(matcher.matches()); // true
