I am trying to replace everything except a specific expression including digits in Java, using only the replaceAll() method and a single regex.
Given the String P=32 N=5 M=2, I want to extract each variable independently.
I can match the expression N=5 with the regex N=\d, but I can't seem to find an inverse expression that matches anything but N=\d, where \d may be any digit.
I do not want to use Pattern or Matcher but solve this using regex only. So for x, y, z being any digit, I want to be able to replace everything but the expression N=y in a String P=x N=y M=z:
String input = "P=32 N=5 M=2";
String output = input.replaceAll(regex, ""); // regex is the pattern I am looking for
System.out.println(output);
// expected "N=5"
You may use
s = s.replaceAll("\\s*\\b(?!N=\\d)\\w+=\\d+", "").trim();
See the Java demo and the regex demo.
Details
\s* - 0+ whitespaces
\b - a word boundary
(?!N=\d) - immediately to the right, there must not be N= followed by a digit
\w+ - 1+ letters/digits/_
= - an = sign
\d+ - 1+ digits.
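The same regex can be wrapped in a small helper that keeps any one pair, not just N. This is a sketch: the class and method names are hypothetical, and it assumes the variable name contains only word characters, so it can be spliced into the lookahead without escaping.

```java
// Hypothetical helper generalizing the answer's regex: removes every
// name=digits pair except the one whose name is given.
class KeepOnePair {
    // Assumes `name` consists of word characters only (no regex metacharacters).
    static String keepOnly(String input, String name) {
        return input.replaceAll("\\s*\\b(?!" + name + "=\\d)\\w+=\\d+", "").trim();
    }

    public static void main(String[] args) {
        String input = "P=32 N=5 M=2";
        System.out.println(keepOnly(input, "N")); // N=5
        System.out.println(keepOnly(input, "P")); // P=32
        System.out.println(keepOnly(input, "M")); // M=2
    }
}
```

The trim() call is still needed because the leading \s* only removes whitespace in front of the pairs that are deleted.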
Related
I have a String from which I need to extract a keyword.
Something like: "I have 100 friends and 1 evil".
I need to extract "100" from that String using only the replaceAll function and an appropriate regex.
I tried to do it in that way:
String input = "I have 100 friends and 1 evil";
String result = input.replaceAll("[^\\d{3}]", "");
But it doesn't work. Any help would be appreciated.
You can consider any of the solutions below:
String result = input.replaceFirst(".*?(\\d{3}).*", "$1");
String result = input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
String result = input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1");
String result = input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1");
See the regex demo. NOTE: you may use replaceAll here too, but it makes little sense, as the replacement must occur only once in this case.
Here,
.*? - matches any zero or more chars other than line break chars, as few as possible
(\d{3}) - captures into Group 1 any three digits
.* - matches any zero or more chars other than line break chars, as many as possible.
The (?<!\d) / (?!\d) lookarounds are digit boundaries: three digits that are part of a longer run of digits will not be matched. \b are word boundaries: there will be no match if the three digits are glued to a letter, digit or underscore. The (?<!\S) / (?!\S) lookarounds are whitespace boundaries: there must be whitespace or the start of the string before the match, and whitespace or the end of the string after it.
The replacement is $1, the value of Group 1.
See the Java demo:
String input = "I have 100 friends and 1 evil";
System.out.println(input.replaceFirst(".*?(\\d{3}).*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1"));
System.out.println(input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1"));
All four lines print 100.
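With the sample sentence all four patterns return the same value, so the difference between the boundary styles is not visible. The sketch below runs them on a made-up input, v1234 x567 -890 321, chosen so that each variant picks a different number; the class and method names are illustrative only.

```java
// Sketch: compares the four replaceFirst variants on a made-up input.
class ThreeDigitVariants {
    static final String[] PATTERNS = {
        ".*?(\\d{3}).*",                 // no boundaries
        ".*?(?<!\\d)(\\d{3})(?!\\d).*",  // digit boundaries
        ".*?\\b(\\d{3})\\b.*",           // word boundaries
        ".*?(?<!\\S)(\\d{3})(?!\\S).*"   // whitespace boundaries
    };

    // Returns Group 1 of the chosen variant, or the input unchanged if nothing matches.
    static String extract(String input, int variant) {
        return input.replaceFirst(PATTERNS[variant], "$1");
    }

    public static void main(String[] args) {
        String input = "v1234 x567 -890 321";
        for (int i = 0; i < PATTERNS.length; i++) {
            System.out.println(extract(input, i)); // 123, then 567, 890, 321
        }
        System.out.println(extract("no digits here", 0)); // no digits here
    }
}
```

Note the last line: when the pattern does not match, replaceFirst returns the original string, so a caller may want to check for that case.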
How can I change 4 -1/4 -5 to 4/1 -1/4 -5/1 using a regex?
String str = "4 -1/4 -5";
String regex = "(-?\\d+/\\d+)";
Matcher matcher = Pattern.compile(regex).matcher(str);
My code finds only the fractions, but I want to find the integers that are not part of a fraction.
String result = str.replaceAll("(?<!/)\\b\\d+\\b(?!/)", "$0/1");
looks for whole numbers (\b\d+\b) that are neither preceded ((?<!/)) nor followed ((?!/)) by a slash, and appends /1 to them ($0 refers to the whole match).
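A minimal runnable wrapper around that replaceAll call, with a second made-up input containing multi-digit numbers; the class and method names are hypothetical.

```java
class WholeNumbersToFractions {
    // \b\d+\b        : a complete run of digits
    // (?<!/), (?!/)  : the run is not the denominator or numerator of a fraction
    // $0             : the entire match, so "5" becomes "5/1"
    static String toFractions(String str) {
        return str.replaceAll("(?<!/)\\b\\d+\\b(?!/)", "$0/1");
    }

    public static void main(String[] args) {
        System.out.println(toFractions("4 -1/4 -5")); // 4/1 -1/4 -5/1
        System.out.println(toFractions("10 3/12"));   // 10/1 3/12
    }
}
```

The \b anchors matter for multi-digit numbers: without them, the 2 in 3/12 would qualify on its own, since it is neither preceded nor followed by a slash.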
Try (?<=-| |^)(\d+)(?!\d*\/)
Explanation:
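(?<=...) - positive lookbehind: assert that what precedes matches the pattern inside
-| |^ - match either -, a space, or the beginning of the line ^
(\d+) - match one or more digits and store them in the first capturing group
(?!\d*\/) - negative lookahead: assert that what follows is not zero or more digits followed by a slash (\/)
Replace it with \1/1, i.e. the first capturing group followed by /1. In Java's replaceAll, write the group reference as $1 ("$1/1"), because \1 is not a back-reference in a Java replacement string.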
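A sketch of this approach in Java (the class and method names are hypothetical). It runs the pattern with both $1/1 and \\1/1 to show why the replacement syntax matters: in a Java replacement string a backslash only escapes the next character, so \\1 inserts a literal 1.

```java
class LookbehindFractions {
    // The answer's pattern; the "\/" from the regex demo is written as a plain "/" here.
    static final String REGEX = "(?<=-| |^)(\\d+)(?!\\d*/)";

    // $1 is Java's syntax for "Group 1" in a replacement string.
    static String withDollar(String str) {
        return str.replaceAll(REGEX, "$1/1");
    }

    // "\\1" escapes the character 1, so every match becomes the literal text "1/1".
    static String withBackslash(String str) {
        return str.replaceAll(REGEX, "\\1/1");
    }

    public static void main(String[] args) {
        System.out.println(withDollar("4 -1/4 -5"));    // 4/1 -1/4 -5/1
        System.out.println(withBackslash("4 -1/4 -5")); // 1/1 -1/4 -1/1
    }
}
```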
I'm not sure I understand what you want to do here, but if you want to get rid of the slashes, you can replace them with spaces:
str.replaceAll("\\/", " ");
This will leave you with a string containing only integers (for the example input, "4 -1 4 -5").
I have strings like
patric NY abc other
patric ny
Expected output: patric NY and patric ny, respectively.
The patric ny part varies and could be an address, and abc MIGHT be there.
So I want to retrieve the address, i.e. whatever comes before abc, and if abc is not present, get the complete string.
I tried
(.+?(?=abc))
It gives me result for patric NY abc other but not for patric ny.
Any help would be gratefully appreciated.
Extracting approach
You may use
^(.*?)(?:\s+abc\b.*)?$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
(?:\s+abc\b.*)? - an optional non-capturing group that matches 1+ whitespaces, abc, a word boundary and any 0+ chars other than line break chars, as many as possible
$ - end of string.
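Under the question's constraint (no Pattern or Matcher), one way to use this extracting pattern is replaceFirst with $1 as the replacement, so the whole string is replaced by Group 1. A minimal sketch with a hypothetical class name:

```java
class AddressBeforeAbc {
    // Group 1 captures everything before an optional "whitespace + abc (whole word) + rest";
    // replacing the whole match with $1 keeps only that part.
    static String extract(String input) {
        return input.replaceFirst("^(.*?)(?:\\s+abc\\b.*)?$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("patric NY abc other")); // patric NY
        System.out.println(extract("patric ny"));           // patric ny
    }
}
```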
Replacing approach
You may just remove 1+ whitespaces, abc and the rest from your string:
String result = input.replaceFirst("(?s)\\s+abc.*", "");
Or, if abc is a whole word:
String result = input.replaceFirst("(?s)\\s+abc\\b.*", "");
See the regex demo.
replaceFirst() matches the first occurrence of the pattern and removes it (replaces it with an empty string).
Pattern details
(?s) - the DOTALL flag, making . match any char, including line breaks
\s+ - 1+ whitespaces
abc - an abc substring
\b - a word boundary (whole-word version only)
.* - the rest of the string
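The difference between the two replacing patterns, and the role of (?s), can be seen on a couple of made-up inputs. A sketch with hypothetical method names:

```java
class RemoveAbcTail {
    // Removes whitespace + "abc" + everything after it, even if "abc" starts a longer word.
    static String removeSubstring(String input) {
        return input.replaceFirst("(?s)\\s+abc.*", "");
    }

    // Same, but \b requires "abc" to end at a word boundary.
    static String removeWholeWord(String input) {
        return input.replaceFirst("(?s)\\s+abc\\b.*", "");
    }

    public static void main(String[] args) {
        System.out.println(removeSubstring("patric ny abcd"));              // patric ny
        System.out.println(removeWholeWord("patric ny abcd"));              // patric ny abcd
        System.out.println(removeWholeWord("patric NY abc other\nline 2")); // patric NY
    }
}
```

Without (?s), the trailing .* would stop at the line break and the "\nline 2" part would remain.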
You can try this:
input.replaceFirst("(patric (?:NY|ny)) ((?:abc|ABC).*)", "$1");
I have the following string, (a.1) (b.2) (c.3) (d.4). I want to change it to (1) (2) (3) (4). I use the following method.
str.replaceAll("\(.*[.](.*)\)","($1)"). And I only get (4). What is the correct method?
Thanks
A couple of things here. First, your escapes for the parentheses are incorrect. In Java string literals, the backslash itself is an escape character, meaning you need to use \\( to represent \( in the regex.
I think your question is how to do non-greedy matches in regex. Use ? to specify non-greedy matching; e.g. *? means "zero or more times, but as few times as possible".
This doesn't negate other answers, but they depend on your test input being as simple as it is in your question. This gives me the correct output without changing the spirit of your original regex (that only the parentheses and dot delimiter are known to be present):
String test = "(a.1) (b.2) (c.3) (d.4)";
String replaced = test.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
System.out.println(replaced); // "(1) (2) (3) (4)"
Root cause
You want to match ()-delimited substrings, but are using .* greedy dot pattern that can match any 0 or more chars (other than line break chars). The \(.*[.](.*)\) pattern will match the first ( in (a.1) (b.2) (c.3) (d.4), then .* will grab the whole string, and backtracking will start trying to accommodate text for the subsequent obligatory subpatterns. [.] will find the last . in the string, the one before the last digit, 4. Then, (.*) will again grab all the rest of the string, but since the ) is required right after, due to backtracking the last (.*) will only capture 4.
Why is lazy / reluctant .*? not a solution?
Even if you use \(.*?[.](.*?)\), if the string contains (xxx)-like substrings without a dot, each of them gets matched together with the next expected match, because . matches any char but line break chars, including ( and ).
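Both failure modes are easy to reproduce. This sketch (hypothetical class and method names) applies the original greedy pattern and the lazy variant to the question's string and to a version with an extra (xxxx) group.

```java
class GreedyVsLazy {
    // The question's pattern, with the backslashes doubled so it compiles.
    static String greedy(String s) {
        return s.replaceAll("\\(.*[.](.*)\\)", "($1)");
    }

    // The reluctant variant.
    static String lazy(String s) {
        return s.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
    }

    public static void main(String[] args) {
        String simple = "(a.1) (b.2) (c.3) (d.4)";
        String withXxxx = "(a.1) (xxxx) (b.2) (c.3) (d.4)";
        System.out.println(greedy(simple));  // (4)
        System.out.println(lazy(simple));    // (1) (2) (3) (4)
        System.out.println(lazy(withXxxx));  // (1) (2) (3) (4)  ("(xxxx) (b.2)" was one match)
    }
}
```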
Solution
.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)")
See the regex demo. [^()] matches any char except ( and ).
Details
\( - a ( char
[^()]* - a negated character class matching 0 or more chars other than ( and )
\. - a dot
([^()]*) - Group 1 (its value is later referred to with $1 from the replacement pattern): any 0+ chars other than ( and )
\) - a ) char.
Java demo:
import java.util.Arrays;
import java.util.List;
List<String> strs = Arrays.asList("(a.1) (b.2) (c.3) (d.4)", "(a.1) (xxxx) (b.2) (c.3) (d.4)");
for (String str : strs) {
    System.out.println("\"" + str.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)") + "\"");
}
Output:
"(1) (2) (3) (4)"
"(1) (xxxx) (2) (3) (4)"
Try this one: it matches any letter, . or " and replaces them all with an empty string:
str.replaceAll("[a-zA-Z\\.\"]", "");
Edit:
You can also use [^\\d)(\\s] to match every character that is not a digit, whitespace, ( or ), and replace them all with an empty string:
String str = "(a.1) (b.2) (c.3) (d.4)";
System.out.println(str.replaceAll("[^\\d)(\\s]",""));
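Both character-class variants can be checked side by side. This sketch (hypothetical class and method names) also shows one limitation: because these patterns only delete characters, a group without a dot such as (xxxx) is reduced to () rather than left intact.

```java
class StripLetters {
    // Deletes letters, dots and double quotes.
    static String removeLettersDotsQuotes(String s) {
        return s.replaceAll("[a-zA-Z\\.\"]", "");
    }

    // Deletes everything that is not a digit, whitespace, ( or ).
    static String keepDigitsParensSpaces(String s) {
        return s.replaceAll("[^\\d)(\\s]", "");
    }

    public static void main(String[] args) {
        String str = "(a.1) (b.2) (c.3) (d.4)";
        System.out.println(removeLettersDotsQuotes(str));                // (1) (2) (3) (4)
        System.out.println(keepDigitsParensSpaces(str));                 // (1) (2) (3) (4)
        System.out.println(keepDigitsParensSpaces("(a.1) (xxxx) (b.2)")); // (1) () (2)
    }
}
```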
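Try this:
str.replaceAll("[A-Za-z0-9]+\\.", "");
[A-Za-z0-9] matches uppercase letters, lowercase letters and digits. If you want to match anything before the dot (.), you can use .+ or .* in place of [A-Za-z0-9]+, but note that a greedy .+ or .* then matches everything up to the last dot in the string, so for this input the result would be 4).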
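A quick way to see the greedy pitfall is to run both prefixes on the question's string. A sketch with hypothetical names:

```java
class PrefixBeforeDot {
    // An alphanumeric run immediately before a dot, e.g. "a." in "(a.1)".
    static String removeAlnumPrefix(String s) {
        return s.replaceAll("[A-Za-z0-9]+\\.", "");
    }

    // A greedy .+ runs to the LAST dot in the string, swallowing the other groups.
    static String removeGreedyPrefix(String s) {
        return s.replaceAll(".+\\.", "");
    }

    public static void main(String[] args) {
        String str = "(a.1) (b.2) (c.3) (d.4)";
        System.out.println(removeAlnumPrefix(str));  // (1) (2) (3) (4)
        System.out.println(removeGreedyPrefix(str)); // 4)
    }
}
```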
I am using a regex to check the correctness of strings in my application. I want to check whether a string has the following pattern: x=y&a=b&..., where x, y, a, b, etc. can be empty.
Example of correct strings:
abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def
Example of incorrect strings:
abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda
This is my code with the current solution:
String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
if(!Pattern.matches(pattern, s)){
throw new IllegalArgumentException(s);
}
This solution is bad because it accepts strings like:
abc=def&gef=def&
Can anyone help me with correct pattern?
You may use the following regex:
^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$
See the regex demo
When used with matches(), the ^ and $ anchors may be omitted.
Details:
^ - start of string
[a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
= - a = symbol
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?: - start of a non-capturing group matching sequences of...
& - a & symbol
[a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
)* - ... zero or more occurrences
$ - end of string
NOTE: If you want to make the pattern more generic, you may use [^=&], which matches any char other than = and &, instead of the more restrictive [a-zA-Z0-9]:
^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
See this regex demo
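Both patterns can be checked against the question's own examples with String.matches(), which requires the whole string to match, so the anchors are left out. A sketch with hypothetical names:

```java
class QueryStringCheck {
    // Alphanumeric keys and values only.
    static final String ALNUM =
        "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*";
    // Any chars other than = and & in keys and values.
    static final String GENERIC =
        "[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*";

    static boolean isValid(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        String[] good = {"abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=&gef=def", "=abc&gef=def"};
        String[] bad = {"abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"};
        for (String s : good) System.out.println(s + " -> " + isValid(s, ALNUM)); // true
        for (String s : bad) System.out.println(s + " -> " + isValid(s, ALNUM));  // false
        // Only the generic version accepts characters such as '-' or ' ':
        System.out.println(isValid("a-b=c d&e=f", ALNUM));   // false
        System.out.println(isValid("a-b=c d&e=f", GENERIC)); // true
    }
}
```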
I believe you want this.
([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*
This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.
Here you go:
^\w*=\w*(?:&(?:\w*=\w*))*$
^ is the starting anchor
\w*=\w* represents parameters like abc=def
\w matches a word character [a-zA-Z0-9_]
\w* matches 0 or more word characters
& represents the actual ampersand literal
(?:&(?:\w*=\w*))* matches any subsequent parameters like &b=d etc.
$ represents the ending anchor
EDIT: Made all groups non-capturing.
Note: As @WiktorStribiżew has pointed out in the comments, \w matches _ as well, so if underscores should not be allowed, replace \w in the above regex with [A-Za-z0-9].
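The underscore difference is easy to confirm with String.matches() (anchors omitted, since matches() checks the whole string). A sketch with hypothetical constant names:

```java
class UnderscoreCheck {
    // The answer's pattern: \w also accepts '_'.
    static final String WORD_CHARS = "\\w*=\\w*(?:&(?:\\w*=\\w*))*";
    // The same structure with \w replaced by [A-Za-z0-9].
    static final String NO_UNDERSCORE =
        "[A-Za-z0-9]*=[A-Za-z0-9]*(?:&(?:[A-Za-z0-9]*=[A-Za-z0-9]*))*";

    public static void main(String[] args) {
        System.out.println("a_b=c".matches(WORD_CHARS));              // true
        System.out.println("a_b=c".matches(NO_UNDERSCORE));           // false
        System.out.println("abc=def&gef=def&".matches(WORD_CHARS));   // false
    }
}
```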