A regular expression to match and exclude word/string

I have strings like
patric NY abc other
patric ny
Expected output: patric NY and patric ny.
Here patric ny is a variable part, which could be an address, and abc might or might not be present.
So I want to retrieve whatever comes before abc,
and if abc is not present, get the complete string.
I tried
(.+?(?=abc))
It gives me a result for patric NY abc other but not for patric ny.
Any help would be appreciated.

Extracting approach
You may use
^(.*?)(?:\s+abc\b.*)?$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
(?:\s+abc\b.*)? - an optional non-capturing group that matches 1+ whitespaces, abc, a word boundary and any 0+ chars other than line break chars, as many as possible
$ - end of string.
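As a minimal sketch of the extracting approach in Java (the class name and the extract helper are hypothetical, introduced only for illustration), Group 1 can be read with a Matcher:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractBeforeAbc {
    // Group 1 lazily captures the text before an optional " abc ..." tail.
    private static final Pattern BEFORE_ABC =
            Pattern.compile("^(.*?)(?:\\s+abc\\b.*)?$");

    // Hypothetical helper: returns the part before "abc",
    // or the input unchanged if the pattern does not match at all.
    static String extract(String input) {
        Matcher m = BEFORE_ABC.matcher(input);
        return m.find() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        System.out.println(extract("patric NY abc other")); // patric NY
        System.out.println(extract("patric ny"));           // patric ny
    }
}
```

Because the tail group is optional, a string without abc is captured whole, which is the case the lookahead-only attempt in the question missed.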
Replacing approach
You may just remove 1+ whitespaces, abc and the rest from your string:
String result = input.replaceFirst("(?s)\\s+abc.*", "");
Or, if abc is a whole word:
String result = input.replaceFirst("(?s)\\s+abc\\b.*", "");
See the regex demo.
The replaceFirst() matches the first occurrence of the pattern and removes it.
Pattern details
(?s) - DOTALL flag making . match any char
\s+ - 1+ whitespaces
abc - an abc substring
\b - a word boundary
.* - the rest of the string
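The question writes the marker both as abc and ABC, so a case-insensitive variant may be useful. The sketch below assumes that any casing of abc should count; it adds the i flag next to s, and the stripTail helper name is hypothetical:

```java
class RemoveAbcTail {
    // Hypothetical helper: removes 1+ whitespaces, the whole word "abc"
    // in any letter case, and everything after it.
    // (?i) = case-insensitive, (?s) = DOTALL so . also matches line breaks.
    static String stripTail(String input) {
        return input.replaceFirst("(?is)\\s+abc\\b.*", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTail("patric NY abc other")); // patric NY
        System.out.println(stripTail("patric NY ABC other")); // patric NY
        System.out.println(stripTail("patric ny"));           // patric ny
    }
}
```

If nothing matches, replaceFirst returns the input unchanged, so strings without abc pass through as they are.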

you can try this:
input.replaceFirst("(patric (?:NY|ny)) ((?:abc|ABC).*)","$1")

Related

Find a three-digit number in a string using replaceAll()

I have String from which I need to extract a keyword.
Something like: "I have 100 friends and 1 evil".
I need to extract "100" from that String using only replaceAll function and appropriate regex.
I tried to do it in that way:
String input = "I have 100 friends and 1 evil";
String result = input.replaceAll("[^\\d{3}]", "");
But it doesn't work. Any help would be appreciated.

You can consider any of the solutions below:
String result = input.replaceFirst(".*?(\\d{3}).*", "$1");
String result = input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
String result = input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1");
String result = input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1");
See the regex demo. NOTE you may use replaceAll here, too, but it makes little sense as the replacement must occur only once in this case.
Here,
.*? - matches any zero or more chars other than line break chars, as few as possible
(\d{3}) - captures into Group 1 any three digits
.* - matches any zero or more chars other than line break chars, as many as possible.
The (?<!\d) / (?!\d) lookarounds are digit boundaries: there is no match if the sequence is four or more digits. \b are word boundaries: there will be no match if the three digits are glued to a letter, digit or underscore. The (?<!\S) / (?!\S) lookarounds are whitespace boundaries: there must be a space or start of string before the match and either a space or end of string after.
The replacement is $1, the value of Group 1.
See the Java demo:
String input = "I have 100 friends and 1 evil";
System.out.println(input.replaceFirst(".*?(\\d{3}).*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1"));
System.out.println(input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1"));
All output 100.
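With the sample sentence all four patterns agree, so the difference between the boundary types is not visible. The sketch below uses a made-up input (an assumption, not from the question) where each pattern picks a different number; the first helper is hypothetical:

```java
class ThreeDigitBoundaries {
    // Hypothetical helper: replaces the whole match with Group 1.
    static String first(String input, String regex) {
        return input.replaceFirst(regex, "$1");
    }

    public static void main(String[] args) {
        String input = "a1234 x100y 200, 300"; // made-up test input
        // No boundaries: the first three digits anywhere.
        System.out.println(first(input, ".*?(\\d{3}).*"));                 // 123
        // Digit boundaries: exactly three digits, letters may touch them.
        System.out.println(first(input, ".*?(?<!\\d)(\\d{3})(?!\\d).*"));  // 100
        // Word boundaries: no letter, digit or underscore on either side.
        System.out.println(first(input, ".*?\\b(\\d{3})\\b.*"));           // 200
        // Whitespace boundaries: whitespace or string start/end on both sides.
        System.out.println(first(input, ".*?(?<!\\S)(\\d{3})(?!\\S).*"));  // 300
    }
}
```

Note that if the input contains no matching number, replaceFirst leaves the string unchanged rather than returning an empty result.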

Java regex match anything except a single expression

I am trying to replace everything except a specific expression including digits in java using only the replaceAll() method and a single regex.
Given the String P=32 N=5 M=2 I want to extract each variable independently.
I can match the expression N=5 with the regex N=\d, but I can't seem to find an inverse expression that will match anything but N=x, where x may be any digit.
I do not want to use Pattern or Matcher but solve this using regex only. So for x, y, z being any digit, I want to be able to replace everything but the expression N=y in a String P=x N=y M=z:
String input = "P=32 N=5 M=2";
String output = input.replaceAll(regex, "");
System.out.println(output);
// expected "N=5"

You may use
s = s.replaceAll("\\s*\\b(?!N=\\d)\\w+=\\d+", "").trim();
See the Java demo and the regex demo.
Details
\s* - 0+ whitespaces
\b - a word boundary
(?!N=\d) - immediately to the right, there should be no N= and any digit
\w+ - 1+ letters/digits/_
= - an = sign
\d+ - 1+ digits.
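A small runnable sketch of the replacement above, applied to the string from the question (the class and the keepN helper are hypothetical names):

```java
class KeepAssignment {
    // Hypothetical helper: removes every name=digits pair that does not
    // start with N= followed by a digit, then trims leftover whitespace.
    static String keepN(String s) {
        return s.replaceAll("\\s*\\b(?!N=\\d)\\w+=\\d+", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(keepN("P=32 N=5 M=2")); // N=5
    }
}
```

The trim() call is needed because the space in front of N=5 is not part of any removed match.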

Java regex shortest match

I have the following string, (a.1) (b.2) (c.3) (d.4). I want to change it to (1) (2) (3) (4). I use the following method.
str.replaceAll("\(.*[.](.*)\)","($1)"). And I only get (4). What is the correct method?
Thanks

Couple things here. First, your escapes for the parentheses are incorrect. In Java string literals, backslash itself is an escape character, meaning you need to use \\( to represent \( in regex.
I think your question is how to do non-greedy matches in regex. Use ? to specify non-greedy matching; e.g. *? means "zero or more times, but as few times as possible".
This doesn't negate other answers, but they depend on your test input being as simple as it is in your question. This gives me the correct output without changing the spirit of your original regex (that only the parentheses and dot delimiter are known to be present):
String test = "(a.1) (b.2) (c.3) (d.4)";
String replaced = test.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
System.out.println(replaced); // "(1) (2) (3) (4)"

Root cause
You want to match ()-delimited substrings, but are using .* greedy dot pattern that can match any 0 or more chars (other than line break chars). The \(.*[.](.*)\) pattern will match the first ( in (a.1) (b.2) (c.3) (d.4), then .* will grab the whole string, and backtracking will start trying to accommodate text for the subsequent obligatory subpatterns. [.] will find the last . in the string, the one before the last digit, 4. Then, (.*) will again grab all the rest of the string, but since the ) is required right after, due to backtracking the last (.*) will only capture 4.
Why is lazy / reluctant .*? not a solution?
Even if you use \(.*?[.](.*?)\), if there are (xxx) like substrings inside the string, they will get matched together with expected matches, as . matches any char but line break chars.
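To make the difference concrete, here is a sketch that runs the greedy, lazy and negated-character-class patterns side by side. The class and method names are hypothetical; the inputs are the ones used in this answer:

```java
class GreedyVsLazy {
    static String greedy(String s) {
        return s.replaceAll("\\(.*[.](.*)\\)", "($1)");
    }

    static String lazy(String s) {
        return s.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
    }

    static String negated(String s) {
        return s.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)");
    }

    public static void main(String[] args) {
        String plain = "(a.1) (b.2) (c.3) (d.4)";
        String mixed = "(a.1) (xxxx) (b.2) (c.3) (d.4)";
        System.out.println(greedy(plain));  // (4)
        System.out.println(lazy(plain));    // (1) (2) (3) (4)
        // The lazy dot runs across ") (" looking for a dot, so (xxxx) is lost.
        System.out.println(lazy(mixed));    // (1) (2) (3) (4)
        // The negated class cannot cross parentheses, so (xxxx) survives.
        System.out.println(negated(mixed)); // (1) (xxxx) (2) (3) (4)
    }
}
```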
Solution
.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)")
See the regex demo. The [^()] will only match any char BUT a ( and ).
Details
\( - a ( char
[^()]* - a negated character class matching 0 or more chars other than ( and )
\. - a dot
([^()]*) - Group 1 (its value is later referred to with $1 from the replacement pattern): any 0+ chars other than ( and )
\) - a ) char.
Java demo:
List<String> strs = Arrays.asList("(a.1) (b.2) (c.3) (d.4)", "(a.1) (xxxx) (b.2) (c.3) (d.4)");
for (String str : strs)
System.out.println("\"" + str.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)") + "\"");
Output:
"(1) (2) (3) (4)"
"(1) (xxxx) (2) (3) (4)"

try this one, it will match any alphabets, . and " and replace them all with empty ""
str.replaceAll("[a-zA-Z\\.\"]", "")
Edit:
You can use also [^\\d)(\\s] to match all characters that are not number, space and )( and replace them all with empty "" string
String str = "(a.1) (b.2) (c.3) (d.4)";
System.out.println(str.replaceAll("[^\\d)(\\s]",""));

Try this
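str.replaceAll("[A-Za-z0-9]+\\.","");
[A-Za-z0-9] matches upper-case letters, lower-case letters and digits. The dot must be written as \\. in a Java string literal, since a single backslash before . is not a valid escape. If you want to match anything before the dot (.), you can use .+ or .* in place of [A-Za-z0-9]+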

Java regex - allow only if n digits in group from long string

i need a regex which allows a string, unless there are 6 or more numbers in a group at any point.
my current wrong regex:
^([a-zA-Z ]*)|(\d{0,5})$
match:
teststring 12345
teststring
1234 teststring
teststring 123 teststring
test1234string
not match:
1234567 teststring
teststring 123456
test123456789string
i hope someone can help.
thx guys
UPDATE:
this regex does the job:
^(?!.*\d{6}).*$
thx #WiktorStribiżew

The pattern to match a string that has no 6 consecutive digits is
^(?!.*\d{6}).*
The regex demo is available here. If there can be line breaks inside, you need to add a DOTALL modifier that will make . match all chars including line break chars: (?s)^(?!.*\d{6}).*.
Details
^ - start of string (implicit in matches())
(?!.*\d{6}) - a negative lookahead that fails the match if there are 0+ chars as many as possible followed with 6 consecutive digits
.* - any 0+ chars as many as possible
In Java, you may use it in the following way:
boolean found = s.matches("(?s)(?!.*\\d{6}).*");
Note you may just try to find 6 digits with Matcher#find and, if not found, proceed with the code execution:
if (!Pattern.compile("\\d{6}").matcher(s).find()) {
// Cool, proceed
}
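As a quick check, the sketch below runs the matches() version against the sample strings from the question. The class name and the allowed helper are hypothetical:

```java
class NoSixDigits {
    // Hypothetical helper: true when the string has no run of 6+ digits.
    static boolean allowed(String s) {
        return s.matches("(?s)(?!.*\\d{6}).*");
    }

    public static void main(String[] args) {
        String[] samples = {
            "teststring 12345", "teststring", "1234 teststring",
            "teststring 123 teststring", "test1234string",
            "1234567 teststring", "teststring 123456", "test123456789string"
        };
        // The first five should be allowed, the last three rejected.
        for (String s : samples) {
            System.out.println(s + " -> " + allowed(s));
        }
    }
}
```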

You can replace the first group that matches \d{6,} and then compare the length of the result with the original length:
String text = "1234567 teststring";
boolean check = text.replaceFirst("\\d{6,}", "").length() == text.length();

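What about this regex:
^(\D*)\d{0,5}(\D*)$
Note that it accepts at most one run of digits in the whole string, so an input with two separate short groups, such as 12 ab 34, is rejected.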

Java Regexp to match words only (', -, space)

What is the Java Regular expression to match all words containing only :
From a to z and A to Z
The ', - and space characters, but they must not be at the beginning or the end.
Examples
test'test match
test' doesn't match
'test doesn't match
-test doesn't match
test- doesn't match
test-test match

You can use the following pattern: ^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$
Below are the examples:
String s1 = "abc";
String s2 = " abc";
String s3 = "abc ";
System.out.println(s1.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$")); // true
System.out.println(s2.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$")); // false
System.out.println(s3.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$")); // false
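Note that the character class in the pattern above contains only letters, so a string such as test'test is rejected even though the question expects it to match. One possible alternative is sketched below. It assumes that each ', - or space must sit between two runs of letters (so doubled separators are not allowed) and that "space" means a literal space character; the class and helper names are hypothetical:

```java
class WordCheck {
    // Hypothetical helper: letters, optionally joined by single ' - or space
    // separators that never appear at the start or the end.
    static boolean isWord(String s) {
        return s.matches("[a-zA-Z]+(?:[-' ][a-zA-Z]+)*");
    }

    public static void main(String[] args) {
        System.out.println(isWord("test'test")); // true
        System.out.println(isWord("test'"));     // false
        System.out.println(isWord("'test"));     // false
        System.out.println(isWord("-test"));     // false
        System.out.println(isWord("test-"));     // false
        System.out.println(isWord("test-test")); // true
    }
}
```

Since matches() already requires the whole string to match, the ^ and $ anchors are not needed here.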

If you mean the space character, the character class is: [a-zA-Z ]
It matches a-z (lowercase), A-Z (uppercase) and the space character. If the string contains anything else, the test will fail.

Here's my solution:
/(\w{2,}(-|'|\s)\w{2,})/g
You can take it for a spin on Regexr.
It is first checking for a word with \w, then any of the three qualifiers with "or" logic using |, and then another word. The brackets {} are making sure the words on either end are at least 2 characters long so contractions like don't aren't captured. You could set that to any value to prevent longer words from being captured or omit them entirely.
Caveat: \w also looks for _ underscores. If you don't want that you could replace it with [a-zA-Z] like so:
/([a-zA-Z]{2,}(-|'|\s)[a-zA-Z]{2,})/g
