I am trying to apply the below pattern:
Pattern p = Pattern.compile(".*?");
Matcher m = p.matcher("RAJ");
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, "L");
}
m.appendTail(sb);
Expected Output : LLL
Actual output : LRLALJL
Does the dot (.) in the above regex match the position between the characters? If not, why is the above output produced?
The .*? matches any number of characters, but as few as necessary to match the whole regex (the ? makes the * reluctant (also known as lazy)). Since there's nothing after that in the regex, this will always match the empty string (a.k.a the place between characters).
If you want at least a single character to be matched try .+?. Note that this is the same as just . if there's nothing else after it in the regex.
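The difference can be checked with a small sketch that reruns the loop from the question against several patterns. The class and method names (LazyQuantifierDemo, replaceEach) are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierDemo {
    // Replaces every match of regex in input with "L", using the same
    // find/appendReplacement loop as the question.
    static String replaceEach(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "L");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEach(".*?", "RAJ")); // LRLALJL (four empty matches)
        System.out.println(replaceEach(".+?", "RAJ")); // LLL
        System.out.println(replaceEach(".", "RAJ"));   // LLL
    }
}
```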
You can get it doing this:
String s = "RAJ";
s = s.replaceAll(".","L");
System.out.println(s);
You could also do it with a Matcher and find(), but replaceAll already accepts a regex, so the explicit loop is not needed.
It is not that . matches between the characters, but that * means 0 or more and the ? means as few as possible.
So "Zero or more things, and as few of them as possible" will always match Zero things, as that is the fewest possible, if it's not followed by something else the expression is looking for.
.{1} would result in an output of LLL as it matches anything once.
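To see where the matches actually land, here is a rough sketch that records the start and end index of each match reported by find(). The helper name spans is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchPositions {
    // Returns "start-end" for every match that find() reports.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Four zero-length matches: before R, before A, before J, and at the end.
        System.out.println(spans(".*?", "RAJ"));  // [0-0, 1-1, 2-2, 3-3]
        // Three one-character matches.
        System.out.println(spans(".{1}", "RAJ")); // [0-1, 1-2, 2-3]
    }
}
```

Each zero-length match receives its own "L", which is why the original loop produced four of them interleaved with the untouched letters.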
The * in your regex .*? means none or more repetitions. If you want to match at least a single character use the regex .+?.
I have this string "u2x4m5x7" and I want to replace all the characters except a number followed by an x with "".
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!
Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while(m.find()) {
b.append(m.group(0));
}
System.out.println(b);
This prints:
2x5x
It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler
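As a quick check of the three-part alternation above, a sketch in Java (the class and method names are invented here) could look like this. It also shows one of the cases the pattern misses: it assumes single-digit numbers.

```java
class RemoveUnwanted {
    // Deletes digits not followed by x, anything that is neither a digit
    // nor x, and any x that is not preceded by a digit.
    static String keepDigitX(String input) {
        return input.replaceAll("\\d(?!x)|[^0-9x]|(?<!\\d)x", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigitX("u2x4m5x7")); // 2x5x
        System.out.println(keepDigitX("xx1x"));     // 1x
        // Limitation: only the last digit of a multi-digit number survives.
        System.out.println(keepDigitX("12x"));      // 2x
    }
}
```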
Capture what you need into group 1, OR match any single character, and replace each match with $1 (which is empty whenever the |. alternative matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run
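A self-contained sketch of this capture-or-skip idea, with a made-up class name, might look like the following. It relies on Java substituting nothing for $1 when group 1 did not take part in the match.

```java
class KeepCaptured {
    // Either captures "digits followed by x" into group 1, or consumes one
    // other character; replacing with $1 keeps only the captured parts.
    static String keepDigitsX(String input) {
        return input.replaceAll("(\\d+x)|.", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsX("u2x4m5x7")); // 2x5x
        System.out.println(keepDigitsX("a12xb"));    // 12x
    }
}
```

Unlike the character-by-character blacklist, this version handles multi-digit numbers because \d+ is tried before the catch-all dot.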
I'm executing this regex code expecting a group value of 11, but I am getting 1. The group seems to contain the correct regex for capturing one or more digits before a known value. I'm sure it is simple, but I cannot seem to figure it out.
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Try this
public static void main(String a1[]) {
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*?(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
}
Output
11
The problem is that .* will try to consume/match as much as possible before the next part is checked. Thus in your regex P.*(\d+)H.* the first .* will match 0Y0M0W0DT1 since that's as much as can be matched with the group still being able to match a single digit afterwards.
If you make that quantifier lazy/reluctant (i.e. .*?), it will try to match as little as possible so of the possible matches 0Y0M0W0DT1 and 0Y0M0W0DT it will select the shorter one and leave all the digits for the group to match.
Thus the regex P.*?(\d+)H.* should do what you want.
Additional note: since you're using Matcher#find() you don't need the catch-all expression .* at the end. Also be aware that this regex matches any string that contains the character H preceded by at least one digit, with a P somewhere in front of those digits. So if you want to be more restrictive, your regex would need to be enhanced.
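The greedy and reluctant behaviours described above can be compared side by side with a small sketch. The helper firstGroup is hypothetical; it returns group 1 of the first match found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String data = "P0Y0M0W0DT11H0M0S";
        // Greedy .* swallows the first '1' and leaves one digit for the group.
        System.out.println(firstGroup("P.*(\\d+)H.*", data));  // 1
        // Reluctant .*? stops as early as possible, so the group gets both digits.
        System.out.println(firstGroup("P.*?(\\d+)H.*", data)); // 11
        // With find(), the trailing .* is not needed.
        System.out.println(firstGroup("P.*?(\\d+)H", data));   // 11
    }
}
```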
I have the following string:
CLASSIC STF
CLASSIC
am using regexp to match the strings.
Pattern p = Pattern.compile("^CLASSIC(\\s*)$", Pattern.CASE_INSENSITIVE);
CLASSIC STF is also being displayed.
am using m.find()
How can I make sure that only CLASSIC is displayed, and not CLASSIC STF?
Thanks for helping.
If you use Matcher.find() the expression CLASSIC(\s*) will match CLASSIC STF.
Matcher.matches() will return false, however, since it requires the expression to match the entire input.
To make Matcher.find() do the same, change the expression to ^CLASSIC(\s*)$, as said by reto.
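A minimal sketch contrasting find() and matches(), assuming the case-insensitive flag from the question; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // True if the regex matches anywhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    // True only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(found("CLASSIC(\\s*)", "CLASSIC STF"));        // true
        System.out.println(matchesWhole("CLASSIC(\\s*)", "CLASSIC STF")); // false
        System.out.println(found("^CLASSIC(\\s*)$", "CLASSIC STF"));      // false
        System.out.println(found("^CLASSIC(\\s*)$", "CLASSIC"));          // true
    }
}
```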
By default ^ and $ match against the beginning and end of the entire input string respectively, ignoring any newlines. I would expect that your expression would not match on the string you mention. Indeed:
String pattern = "^CLASSIC(\\s*)$";
String input = "CLASSIC STF%nCLASSIC";
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(String.format(input));
while (m.find()) {
System.out.println(m.group());
}
prints no results.
If you want ^ and $ to match the beginning and end of all lines in the string you should enable "multiline mode". Do so by replacing the Pattern.compile line above with Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE); (flags are normally combined with |, although + gives the same value here). When I do so I get one result, namely: "CLASSIC".
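The effect of the MULTILINE flag can be sketched as follows. The helper findAll is hypothetical, and the input uses a plain \n instead of %n so the example does not depend on the platform line separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineAnchors {
    // Collects every match of regex (compiled with the given flags) in input.
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "CLASSIC STF\nCLASSIC";
        String regex = "^CLASSIC(\\s*)$";
        // Without MULTILINE, ^ and $ only anchor to the whole input.
        System.out.println(findAll(regex, Pattern.CASE_INSENSITIVE, input)); // []
        // With MULTILINE, they anchor to each line.
        System.out.println(findAll(regex,
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, input));       // [CLASSIC]
    }
}
```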
You also asked why "CLASSIC STF" is not matched. Let's break down your pattern to see why. The pattern says: match anything that...
1. starts at the beginning of a line ~ ^
2. begins with a C, followed by an L, A, S, S, I and C ~ CLASSIC
3. after which 0 or more whitespace characters follow ~ (\s*)
4. after which we see a line ending ~ $
After matching the space in "CLASSIC STF" (step 3) we are looking at a character "S". This doesn't match a line ending (step 4), so we cannot match the regex.
Note that the parentheses in your regex are not necessary. You can leave them out.
The Javadoc of the Pattern class is very elaborate. It could be helpful to read it.
EDIT:
If you want to check if a string/line contains the word "CLASSIC" using a regex, then I'd recommend to use the regex \bCLASSIC\b. If you want to see if a string starts with the word "CLASSIC", then I'd use ^CLASSIC\b.
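A short sketch of the two word-boundary patterns suggested above; the method names containsWord and startsWithWord are made up for this example.

```java
import java.util.regex.Pattern;

class WordBoundary {
    private static final Pattern CONTAINS = Pattern.compile("\\bCLASSIC\\b");
    private static final Pattern STARTS = Pattern.compile("^CLASSIC\\b");

    // True if CLASSIC appears anywhere as a whole word.
    static boolean containsWord(String input) {
        return CONTAINS.matcher(input).find();
    }

    // True if the input begins with the whole word CLASSIC.
    static boolean startsWithWord(String input) {
        return STARTS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("THE CLASSIC STF")); // true
        System.out.println(containsWord("CLASSICAL"));       // false
        System.out.println(startsWithWord("CLASSIC STF"));   // true
        System.out.println(startsWithWord("THE CLASSIC"));   // false
    }
}
```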
I wonder if this would help:
practice = c("CLASSIC STF", "CLASSIC")
grep("^CLASSIC[[:space:]STF]?", practice)
I'm trying to make a regex all or nothing in the sense that the given word must EXACTLY match the regular expression - if not, a match is not found.
For instance, if my regex is:
^[a-zA-Z][a-zA-Z|0-9|_]*
Then I would want to match:
cat9
cat9_
bob_____
But I would NOT want to match:
cat7-
cat******
rango78&&
I want my regex to be as strict as possible, going for an all or nothing approach. How can I go about doing that?
EDIT: To make my regex absolutely clear, a pattern must start with a letter, followed by any number of numbers, letters, or underscores. Other characters are not permitted. Below is the program in question I am using to test out my regex.
Pattern p = Pattern.compile("^[a-zA-Z][a-zA-Z|0-9|_]*");
Scanner in = new Scanner(System.in);
String result = "";
while(!result.equals("-1")){
result = in.nextLine();
Matcher m = p.matcher(result);
if(m.find())
{
System.out.println(result);
}
}
I think that if you use String.matches(regex), then you will get the effect you are looking for. The documentation says that matches() will return true only if the entire string matches the pattern.
The regex won't match the second example. It's already strict, since * and & are not in the allowed set of characters.
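It may match a prefix, but you can avoid this by adding '$' to the end of the regex, which explicitly matches end of input. Also note that | has no special meaning inside square brackets, so [a-zA-Z|0-9|_] additionally allows a literal | character; drop the pipes. So try,
^[a-zA-Z][a-zA-Z0-9_]*$
This will ensure the match is against the entire input string, and not just a prefix.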
Note that \w is the same as [A-Za-z0-9_]. And you need to anchor to the end of the string like so:
Pattern p = Pattern.compile("^[a-zA-Z]\\w*$");
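Putting the suggestions from this thread together, a possible all-or-nothing check could look like the sketch below. It uses matches(), so no anchors are needed; the names StrictIdentifier, isValid and pipeClassAccepts are invented for illustration.

```java
import java.util.regex.Pattern;

class StrictIdentifier {
    // A letter followed by any number of letters, digits or underscores.
    private static final Pattern ID = Pattern.compile("[a-zA-Z]\\w*");

    // matches() succeeds only if the whole string fits the pattern.
    static boolean isValid(String s) {
        return ID.matcher(s).matches();
    }

    // Shows that the original character class also accepts a literal '|'.
    static boolean pipeClassAccepts(String s) {
        return Pattern.compile("^[a-zA-Z][a-zA-Z|0-9|_]*$").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("cat9"));         // true
        System.out.println(isValid("bob_____"));     // true
        System.out.println(isValid("cat7-"));        // false
        System.out.println(isValid("rango78&&"));    // false
        System.out.println(pipeClassAccepts("a|b")); // true
    }
}
```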
I am trying to search this string:
,"tt" : "ABC","r" : "+725.00","a" : "55.30",
For:
"r" : "725.00"
And here is my current code:
Pattern p = Pattern.compile("([r]\".:.\"[+|-][0-9]+.[0-9][0-9]\")");
Matcher m = p.matcher(raw_string);
I've been trying multiple variations of the pattern, and a match is never found. A second set of eyes would be great!
Your regexp actually works; it's almost correct. You just need to call find() before reading the result:
Pattern p = Pattern.compile("\"[r]\".:.\"[+|-][0-9]+.[0-9][0-9]\"");
Matcher m = p.matcher(raw_string);
if (m.find()){
String res = m.toMatchResult().group(0);
}
The next line should read:
if ( m.find() ) {
Are you doing that?
A few other issues: You're using . to match the spaces surrounding the colon; if that's always supposed to be whitespace, you should use " +" (a space followed by +, meaning one or more spaces) or \s+ (one or more whitespace characters). On the other hand, the dot between the digits is supposed to match a literal ., so you should escape it: \. Of course, since this is a Java String literal, you need to escape the backslashes: \\s+ and \\.
You don't need the square brackets around the r, and if you don't want to match a | in front of the number you should change [+|-] to [+-].
While some of these issues I've mentioned could result in false positives, none of them would prevent it from matching valid input. That's why I suspect you aren't actually applying the regex by calling find(). It's a common mistake.
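A tightened version of the pattern, applying the points above, might look like this sketch. The capture group around the number and the names AmountField and extract are additions for illustration, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountField {
    // "r", whitespace around the colon, then a signed amount with two decimals.
    // Group 1 (an assumption added here) captures just the number.
    private static final Pattern R_FIELD =
            Pattern.compile("\"r\"\\s+:\\s+\"([+-][0-9]+\\.[0-9][0-9])\"");

    // Returns the captured amount, or null if the field is not found.
    static String extract(String raw) {
        Matcher m = R_FIELD.matcher(raw);
        return m.find() ? m.group(1) : null; // find() must be called first
    }

    public static void main(String[] args) {
        String raw = ",\"tt\" : \"ABC\",\"r\" : \"+725.00\",\"a\" : \"55.30\",";
        System.out.println(extract(raw)); // +725.00
    }
}
```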
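First thing: try to escape your dot symbol: ...[0-9]+\.[0-9][0-9]...
because an unescaped dot matches any character.
Second thing: [+|-] defines a set of characters, and as written one of them is mandatory.
If the sign is optional, try [+-]? (the | inside the brackets would match a literal pipe, so drop it).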
Alban.
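To illustrate the optional sign, here is a small sketch that validates just the amount, assuming the pipe is dropped from the character class. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class OptionalSign {
    // Optional + or -, digits, a literal dot, exactly two decimals.
    private static final Pattern AMOUNT = Pattern.compile("[+-]?[0-9]+\\.[0-9][0-9]");

    static boolean isAmount(String s) {
        return AMOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmount("+725.00")); // true
        System.out.println(isAmount("725.00"));  // true
        System.out.println(isAmount("725x00"));  // false
        System.out.println(isAmount("|725.00")); // false
    }
}
```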