Can someone explain the regular expression part of String#replaceAll(..) method? [duplicate] - java

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
// It's a question on replacement of duplicate characters
public class RemoveDuplicateChars {
    static String testcase1 = "DPMD Jayawardene";

    public static void main(String args[]) {
        RemoveDuplicateChars testInstance = new RemoveDuplicateChars();
        String result = testInstance.remove(testcase1);
        System.out.println(result);
    }

    // write your code here
    public String remove(String str) {
        return str.replaceAll("(.)(?=.*\\1)", ""); // how does this replacement work?
    }
}

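As the class name suggests, it removes duplicate characters: every character that occurs again later in the string is deleted, so only the last occurrence of each character survives.
Breakdown:
(.) - matches any single character (except a line terminator); the parentheses capture it as group 1, so it can be referenced later as \1
(?=...) - a positive lookahead: it checks what follows without consuming it
(?=.*\\1) - inside the lookahead, .* skips any number of characters and \\1 (the Java string form of \1) then requires the captured character to appear again
If the lookahead succeeds, the captured character is the match and is replaced with the empty string; the text inspected by the lookahead is left untouched.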

From java.util.regex.Pattern:
(.) : Match any character in a capture group (basically a variable named \1)
(?= : Zero-width positive lookahead (make sure the rest of the string matches)
.* any number of characters followed by
\\1 the captured group
In other words, it matches any character that also appears later in the string (i.e. is a duplicate). In Java, this would be:
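for (int i = 0; i < str.length(); i++) {
    char captured = str.charAt(i);            // (.)
    // String#matches must match the whole input, so use indexOf to ask
    // "does this char occur anywhere after position i?"
    if (str.indexOf(captured, i + 1) >= 0) {  // (?=.*\1)
        // the char is a duplicate, replace it with ""
    }
}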
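To see both readings side by side, here is a minimal sketch that runs the regex from the question next to an equivalent hand-written loop. The class and method names are made up for illustration, and the loop assumes the input contains no line terminators (which . would not match).

```java
class DuplicateCharDemo {
    // The regex from the question: delete every char that occurs again later.
    static String removeWithRegex(String str) {
        return str.replaceAll("(.)(?=.*\\1)", "");
    }

    // Hand-written equivalent: keep a char only if it does not occur again later.
    static String removeWithLoop(String str) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (str.indexOf(c, i + 1) < 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeWithRegex("DPMD Jayawardene")); // prints "PMD Jywardne"
        System.out.println(removeWithLoop("DPMD Jayawardene"));  // prints "PMD Jywardne"
    }
}
```

Note that the comparison is case-sensitive (D and d are different characters) and that it is the last occurrence of each character that is kept, not the first.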

Related

How can I test if a string contains any uppercase letter using Java with regular expression? [duplicate]

This question already has answers here:
How do I check if a Java String contains at least one capital letter, lowercase letter, and number?
(9 answers)
Closed 5 years ago.
I want to use only regular expression
Raida => true
raida => false
raiDa => true
I have tried this :
String string = "Raida";
boolean bool = string.contains("?=.*[A-Z]");
System.out.println(bool);
but it is not working
A simple solution would be:
boolean hasUpperCase = !string.equals(string.toLowerCase());
This converts the string to lowercase; if the result equals the original, the original contains no uppercase letter.
In your example Raida, you would be comparing Raida to raida. These two are not equal, which means the original string contains an uppercase letter.
The answer with regular expression solution has already been posted as well as many other rather convenient options. What I would also suggest here is using Java 8 API for that purpose. It might not be the best option in terms of performance, but it simplifies code a lot. The solution can be written within one line:
string.chars().anyMatch(Character::isUpperCase);
The benefit of this solution is readability: the intention is clear, even if you want to invert the check.
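As a small sketch of the stream-based check and its inverse (Java 8 or later; the class and method names are only illustrative):

```java
class UpperCaseCheckDemo {
    // True if at least one char of s is an uppercase letter.
    static boolean hasUpperCase(String s) {
        return s.chars().anyMatch(Character::isUpperCase);
    }

    // The inverted check: true if no char of s is an uppercase letter.
    static boolean hasNoUpperCase(String s) {
        return s.chars().noneMatch(Character::isUpperCase);
    }

    public static void main(String[] args) {
        System.out.println(hasUpperCase("Raida")); // prints true
        System.out.println(hasUpperCase("raida")); // prints false
        System.out.println(hasUpperCase("raiDa")); // prints true
    }
}
```

Unlike a [A-Z] character class, Character.isUpperCase also accepts non-ASCII uppercase letters, which may or may not be what you want.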
Here is something close to your original idea: check whether the string contains an uppercase letter anywhere, with any other characters allowed before and after it. I included a small main method for testing purposes.
public static void main(String[] args) {
    test("raid");
    test("raId");
    test("Raida");
    test("R");
    test("r");
    test(".");
    test("");
}

public static void test(String word) {
    // (?s) enables the DOTALL mode
    System.out.println(word + " -> " + word.matches("(?s).*[A-Z].*"));
}
I edited the example to deal with line breaks too. I just tested it on a Windows machine. This now uses DOTALL: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL. In this mode, the expression . matches any character, including a line terminator.
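Your lookahead works once it is wrapped in parentheses and run through a Matcher, since String#contains does not interpret regular expressions.
Pattern pattern = Pattern.compile("(?=.*[A-Z])");
Matcher matcher = pattern.matcher(string);
boolean bool = matcher.find(); // true if an uppercase letter occurs anywhere in the string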
If you want an alternative to regex, try using Character.isUpperCase().
boolean bool = false;
for (int i = 0; i < string.length(); i++) {
    char c = string.charAt(i);
    if (Character.isUpperCase(c)) {
        bool = true;
        break;
    }
}
System.out.println(bool);
You can remove every character that is not an uppercase letter and check the length of what remains; if it is greater than or equal to 1, there is at least one uppercase letter:
String input = "raiDa";
boolean check = input.replaceAll("[^A-Z]", "").length() >= 1;
Besides, String::contains does not use regex; read its documentation.
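To illustrate the difference, here is a minimal sketch (class and method names are hypothetical) showing that contains looks for the literal characters of its argument, while a Matcher actually evaluates the regex:

```java
import java.util.regex.Pattern;

class ContainsVsRegexDemo {
    private static final Pattern UPPER = Pattern.compile("[A-Z]");

    // Regex-based check: find() succeeds if any part of s matches [A-Z].
    static boolean containsUpperCase(String s) {
        return UPPER.matcher(s).find();
    }

    public static void main(String[] args) {
        // contains() searches for the literal text "?=.*[A-Z]", not for a regex match.
        System.out.println("Raida".contains("?=.*[A-Z]"));       // prints false
        System.out.println("x?=.*[A-Z]y".contains("?=.*[A-Z]")); // prints true
        System.out.println(containsUpperCase("Raida"));          // prints true
        System.out.println(containsUpperCase("raida"));          // prints false
    }
}
```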

Replace all characters between two delimiters using regex

I'm trying to replace all characters between two delimiters with another character using regex. The replacement should have the same length as the removed string.
String string1 = "any prefix [tag=foo]bar[/tag] any suffix";
String string2 = "any prefix [tag=foo]longerbar[/tag] any suffix";
String output1 = string1.replaceAll(???, "*");
String output2 = string2.replaceAll(???, "*");
The expected outputs would be:
output1: "any prefix [tag=foo]***[/tag] any suffix"
output2: "any prefix [tag=foo]*********[/tag] any suffix"
I've tried "\\[tag=.*?](.*?)\\[/tag]" but this replaces the whole sequence with a single "*".
I think that "(.*?)" is the problem here because it captures everything at once.
How would I write something that replaces every character separately?
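You can use the regex
\w(?=\w*?\[)
which matches every word character that is followed, after optional further word characters, by a "[". Note that this relies on no other word character sitting directly before a "[" elsewhere in the input.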
You can match the characters inside one at a time and replace each of them with *:
public static String replaceByStar(String str) {
    String pattern = "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)";
    while (str.matches(pattern)) {
        str = str.replaceAll(pattern, "$1*$2");
    }
    return str;
}
Used like this, it prints your two expected outputs:
public static void main(String[] args) {
    System.out.println(replaceByStar("any prefix [tag=foo]bar[/tag] any suffix"));
    System.out.println(replaceByStar("any prefix [tag=foo]loooongerbar[/tag] any suffix"));
}
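So the pattern "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)":
(.*\\[tag=.*\\].*) captures the beginning, up to and including the opening tag, possibly followed by some characters of the content
\\w is the character you want to replace
(.*\\[\\/tag\\].*) captures the end: possibly some characters of the content, then the closing tag and everything after it
The substitution $1*$2:
The pattern has the shape (text$1)oneChar(text$2), and each pass replaces it with (text$1)*(text$2). The loop repeats until no word character is left between the tags.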
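If you would rather do it in one pass, a possible alternative (not taken from the answers above) is to match the whole tagged span with a Matcher and rebuild it with as many stars as the content has characters. This is only a sketch: the class name, method name and the exact pattern are my own assumptions, and it masks every character between the tags, not only word characters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMasker {
    // Assumed pattern: group 1 = opening tag, group 2 = content (lazy), group 3 = closing tag.
    private static final Pattern TAGGED =
            Pattern.compile("(\\[tag=[^\\]]*\\])(.*?)(\\[/tag\\])");

    static String maskTagContent(String input) {
        Matcher m = TAGGED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Build a run of '*' with the same length as the captured content.
            char[] stars = new char[m.group(2).length()];
            Arrays.fill(stars, '*');
            String replacement = m.group(1) + new String(stars) + m.group(3);
            // quoteReplacement keeps any '$' or '\' in the tags from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "any prefix [tag=foo]***[/tag] any suffix"
        System.out.println(maskTagContent("any prefix [tag=foo]bar[/tag] any suffix"));
        // prints "any prefix [tag=foo]*********[/tag] any suffix"
        System.out.println(maskTagContent("any prefix [tag=foo]longerbar[/tag] any suffix"));
    }
}
```

The trade-off is a few more lines of code in exchange for a single scan of the input instead of one full regex pass per replaced character.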

Java Regex X{n,m} X, at least n but not more than m times

I am trying to understand how to match an email address to the following pattern:
myEmail#something.any
The "any" part should be between 2 and 4 characters long.
Please find Java code below. I cannot understand why it returns true. Thanks!
public static void main(String[] args) {
    String a = "daniel#gmail.com";
    String b = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+.[a-zA-Z]{2,4}";
    String c = "MyNameis1#abcx.comfff";
    Boolean b1 = c.matches(b);
    System.out.println(b1);
}
OUTPUT: true
In regex, . matches any character (except newline). If you want to match . literally, you need to escape it:
[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\\.[a-zA-Z]{2,4}
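This is better. When used with Matcher#find, however, it would still match the MyNameis1#abcx.comf portion, so there we can add an end-of-string anchor ($) to ensure there are no trailing unmatched characters. String#matches already requires the whole string to match, so in your code the anchor is redundant but harmless: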
[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\\.[a-zA-Z]{2,4}$
Escape the . in String b = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+.[a-zA-Z]{2,4}";. It is a special character in regex.
Use : String b = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\\.[a-zA-Z]{2,4}";
In your expression, the dot means any character. Escape it, and make sure that no character can follow the 2 to 4 letters, like below:
[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\\.[a-zA-Z]{2,4}$
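A quick way to check this is to run both patterns against the strings from the question. The sketch below uses made-up class and constant names; the patterns themselves are the ones shown above.

```java
import java.util.regex.Pattern;

class EmailPatternDemo {
    // Original pattern: the unescaped '.' matches any character.
    static final String UNESCAPED = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+.[a-zA-Z]{2,4}";
    // Fixed pattern: '\\.' matches a literal dot only.
    static final String ESCAPED = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\\.[a-zA-Z]{2,4}";

    public static void main(String[] args) {
        String bad = "MyNameis1#abcx.comfff";
        System.out.println(bad.matches(UNESCAPED));              // prints true
        System.out.println(bad.matches(ESCAPED));                // prints false
        System.out.println("daniel#gmail.com".matches(ESCAPED)); // prints true

        // With find(), the anchor matters: without it a prefix of the input is accepted.
        System.out.println(Pattern.compile(ESCAPED).matcher(bad).find());       // prints true
        System.out.println(Pattern.compile(ESCAPED + "$").matcher(bad).find()); // prints false
    }
}
```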

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
    String var = "ROOT_CONTEXT_MATCHER";
    boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
    System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or are an underscore
What's going on here?!?
The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the line, and $ for the end of the line:
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here, because String#matches already has to match the entire string.
You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
Since \\w is equivalent to [a-zA-Z0-9_] (a word character).
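For reference, a short sketch comparing the variants discussed above (the class and method names are only for illustration):

```java
class SlashDelimiterDemo {
    static boolean matchesWithSlashes(String s) {
        // The slashes are literal characters here, so s itself must start and end with '/'.
        return s.matches("/[A-Z][a-zA-Z0-9_]*/");
    }

    static boolean matchesIdentifier(String s) {
        return s.matches("[A-Z]\\w*");
    }

    public static void main(String[] args) {
        String var = "ROOT_CONTEXT_MATCHER";
        System.out.println(matchesWithSlashes(var));           // prints false
        System.out.println(matchesWithSlashes("/" + var + "/")); // prints true
        System.out.println(var.matches("[A-Z][a-zA-Z0-9_]*")); // prints true
        System.out.println(matchesIdentifier(var));            // prints true
    }
}
```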

Regular Expression - inserting space after comma only if succeeded by a letter or number

In Java I want to insert a space after a comma, but only if the comma is followed by a digit or letter. I am hoping to use the replaceAll method, which takes a regular expression as a parameter. So far I have the following:
String s1="428.0,chf";
s1 = s1.replaceAll(",(\\d|\\w)",", ");
This code does successfully distinguish between the String above and one where there is already a space after the comma. My problem is that I can't figure out how to write the expression so that the space is inserted. The code above will replace the c in the String shown above with a space. This is not what I want.
s1 should look like this after executing the replaceAll: "428.0 chf"
s1 = s1.replaceAll(",(?=[\\da-zA-Z])", " ");
(?=[\\da-zA-Z]) is a positive lookahead that checks for a digit or a letter after the comma. The character seen by the lookahead is not replaced, because a lookahead is never part of the match. It is just a check.
NOTE
\w includes digits, letters and _, so \d is redundant next to \w.
[\da-zA-Z] is a better choice than \w here, since \w also includes _, which you do not need to match.
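A minimal sketch of the lookahead variant on a few inputs (the class and method names are hypothetical; the regex is the one from this answer, written as a Java string literal):

```java
class CommaSpaceDemo {
    // Replace a comma with a space only when a digit or ASCII letter follows it.
    static String replaceCommaBeforeAlnum(String s) {
        return s.replaceAll(",(?=[\\da-zA-Z])", " ");
    }

    public static void main(String[] args) {
        System.out.println(replaceCommaBeforeAlnum("428.0,chf"));  // prints "428.0 chf"
        System.out.println(replaceCommaBeforeAlnum("428.0, chf")); // prints "428.0, chf"
        System.out.println(replaceCommaBeforeAlnum("428.0,_chf")); // prints "428.0,_chf"
    }
}
```

Because strings are immutable, the result of replaceAll has to be assigned or returned; calling it without using the return value changes nothing.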
Try this, and note that $1 refers to your matched grouping:
s1 = s1.replaceAll(",(\\d|\\w)", " $1");
Note that String.replaceAll() works in the same way as a Matcher.replaceAll(). From the doc:
The replacement string may contain references to captured subsequences
String s1 = "428.0,chf";
s1 = s1.replaceAll(",([^\\W_])", " $1"); // match a letter or digit (a word character other than '_') after ','
System.out.println(s1);
Output:
428.0 chf
\w matches letters, digits and the underscore. [^\W_] matches any character that is neither a non-word character nor an underscore, which leaves exactly the letters and digits.
$1 represents the captured group. Here c is captured after the comma, so ",c" is replaced with " $1", that is, a space followed by c.
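Try this. Note that it replaces every comma with a space, regardless of what follows it:
public class Tes {
    public static void main(String[] args) {
        String s1 = "428.0,chf";
        String[] sArr = s1.split(",");
        StringBuilder finalStr = new StringBuilder();
        for (int i = 0; i < sArr.length; i++) {
            if (i > 0) {
                finalStr.append(" "); // separator only between parts, so there is no leading space
            }
            finalStr.append(sArr[i]);
        }
        System.out.println(finalStr);
    }
}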
