I have the following Java String.replaceAll call, which uses a regular expression to replace variables of the form ${var} with zero:
String s = "( 200828.22 +400000.00 ) / ( 2.00 + ${16!*!8!1} ) + 200828.22 + ${16!*!8!0}";
s = s.replaceAll("\\$\\{.*\\}", "0");
The problem is that the resulting string s is:
"( 200828.22 +400000.00 ) / ( 2.00 + 0"
What's wrong with this code?
Change your regex to the following (note the ? added after .*):
\\$\\{.*?\\}
* is greedy: the engine repeats it as many times as it can. So after ${ has matched, .* consumes everything up to the end of the string. The engine then backtracks one character at a time until the next token, }, can match, which happens at the last } in the string.
For example, if you have the regex
\\{.*\\}
and the string
"{this is} a {test} string"
it'll match as follows:
{ matches the first {
.* matches everything up to and including the final g
the engine fails to match }, because it is already at the end of the string
it backtracks one character at a time until .* ends just after the t of test; the next } can then match, giving "{this is} a {test}"
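The walk-through above can be reproduced with a small sketch. The class name GreedyVsLazy and the helper allMatches are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: collects every match of regex found in input.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "{this is} a {test} string";
        // Greedy: .* runs to the end of the input, then backtracks to the last }
        System.out.println(allMatches("\\{.*\\}", input));   // [{this is} a {test}]
        // Lazy: .*? grows one character at a time and stops at the first }
        System.out.println(allMatches("\\{.*?\\}", input));  // [{this is}, {test}]
    }
}
```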
To make it non-greedy, add a ? after the *. The quantifier then becomes lazy and stops at the first } it encounters.
As mentioned in the comments, an alternative is [^}]*, which matches any run of characters other than } (because } is placed in a negated character class).
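Applied to the string from the question, both fixes can be checked side by side. This is only a sketch; the class and method names (ReplacePlaceholders, lazy, negated) are illustrative and not part of the original code.

```java
public class ReplacePlaceholders {
    // Lazy quantifier: each ${...} ends at the nearest }
    static String lazy(String s) {
        return s.replaceAll("\\$\\{.*?\\}", "0");
    }

    // Negated character class: the body of ${...} may not contain }
    static String negated(String s) {
        return s.replaceAll("\\$\\{[^}]*\\}", "0");
    }

    public static void main(String[] args) {
        String s = "( 200828.22 +400000.00 ) / ( 2.00 + ${16!*!8!1} ) + 200828.22 + ${16!*!8!0}";
        // Both print: ( 200828.22 +400000.00 ) / ( 2.00 + 0 ) + 200828.22 + 0
        System.out.println(lazy(s));
        System.out.println(negated(s));
    }
}
```

The negated class never has to backtrack past a }, which is why it is often suggested alongside the lazy form.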
In Java I am using the regex \".*?\".
I use it to replace every double-quoted string with the term String.
Ex:
INPUT: Functions.unescapeJson("test")
Result : Functions.unescapeJson("String")
But now I want to allow strings that themselves contain a double quote, so I am using / as the escape character. How can I achieve this?
Ex:
INPUT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.con"),"payloads_ul.dataFrameOutput"),"[/"Dimming Value/"]")
RESULT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(String), String),String),String)
But the result I am getting if I use the previous regex is:
Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.mIntegerm/:sgn.nev.rep), String),String),StringDimming ValueString)
How can I achieve this with a regex, so that a quote preceded by / is not treated as the end of the string?
The code that I am using
public static void main(String[] args) {
String STRINGVALIDATIONREGEX = "\".*?\"";
String formula = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.m2m/:sgn.nev.rep), \"m2m:cin.con\"),\"payloads_ul.dataFrameOutput\"),\"[\"Dimming Value\"]\")";
System.out.println(formula.replaceAll(STRINGVALIDATIONREGEX, "String"));
}
You can use this regex:
\"(\/?.)*?\"
Use [^/] to match anything that is not a slash.
For example, [^/]?\".*[^/]?\" would catch quotes not preceded by /
"((?:[^"]|(?<=\/)")*)"
" match a "
[^"] match a non-quote character
| or
(?<=\/)" a quote character that is preceded by a /
* match sub-expressions 2 - 4 zero or more times.
" match a "
See Regex demo
If you believe that a string such as "abc/" is invalid, then you should use the stricter regex:
"((?:[^"\/]|\/")*)"
" match "
[^"\/] match any character that isn't a quote or /
| or
\/" match a /" combination
* match sub-expressions 2 - 4 zero or more times.
" match a "
See Regex demo
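As a rough check of the stricter pattern in Java, here is a sketch that masks every quoted string, treating /" as an escaped quote. The class name EscapedQuotes, the method maskStrings, and the shortened formula are assumptions made for the example; in a Java regex the / needs no backslash, so it is written unescaped.

```java
public class EscapedQuotes {
    // Stricter pattern from above: inside the quotes, a " may only appear as /"
    static final String QUOTED = "\"((?:[^\"/]|/\")*)\"";

    // Hypothetical helper: replaces each quoted string with the word String.
    static String maskStrings(String formula) {
        return formula.replaceAll(QUOTED, "String");
    }

    public static void main(String[] args) {
        String formula =
            "Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"[/\"Dimming Value/\"]\")";
        // Prints: Functions.getJsonPath(Functions.unescapeJson(String), String)
        System.out.println(maskStrings(formula));
    }
}
```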
String text = "$.example(\"This is the tes't\")";
final String quoteRegex = "example.*?(\"[^is].*?\")";
Matcher matcher0 = Pattern.compile(quoteRegex).matcher(text);
while (matcher0.find()) {
System.out.println(matcher0.group(1));
}
It returns This is the tes't. I expected no result because of the negation [^is], which I thought means "do not match is". Why does it return This is the tes't?
Similarly, the regex example.*?(\".*?\") returns This is the tes't, but example(\".*?\") does not. Why?
[^is] does not mean "do not match is"; it means "match one character that is neither i nor s". Your example has T after the ", so it matches.
If you want to match zero or more characters and exclude the string "is", you can do:
example.*?(\"(?:(?!is).)*?\")
If you only want to not match is immediately after the " (which is not what your example has):
example.*?(\"(?!is).*?\")
You also ask why example(\".*?\") does not match; that regex only matches if there is a " immediately after example, while your example has a ( between. You could match the ( but still capture the quoted string with:
example\((\"...
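To see how these variants behave on the text from the question, a small sketch follows. NegationDemo and firstGroup are illustrative names, not part of the original code. Note that group 1 includes the surrounding quotes, since they are inside the parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegationDemo {
    // Hypothetical helper: group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "$.example(\"This is the tes't\")";
        // [^is] only rejects a single i or s right after the quote, so this matches
        System.out.println(firstGroup("example.*?(\"[^is].*?\")", text));
        // (?:(?!is).)* refuses to step over any position where "is" starts: null
        System.out.println(firstGroup("example.*?(\"(?:(?!is).)*?\")", text));
        // (?!is) checks only the position right after the quote, so this matches
        System.out.println(firstGroup("example.*?(\"(?!is).*?\")", text));
        // No .*? to skip the ( between example and the quote: null
        System.out.println(firstGroup("example(\".*?\")", text));
    }
}
```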
I have a string that I want to make sure that the format is always a + followed by digits.
The following would work:
String parsed = inputString.replaceAll("[^0-9]+", "");
if(inputString.charAt(0) == '+') {
result = "+" + parsed;
}
else {
result = parsed;
}
But is there a regex for replaceAll that keeps the + (if present) at the beginning of the string and removes all other non-digits in that one statement?
The following statement with the given regex would do the job:
String result = inputString.replaceAll("(^\\+)|[^0-9]", "$1");
(^\\+) matches a plus sign at the beginning of the string and captures it in group 1 ($1),
| or
[^0-9] matches a single character that is not a digit;
$1 replaces each match with the contents of group 1: the plus sign if the match was the leading +, otherwise nothing.
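A quick sketch of that replacement on a few inputs. KeepLeadingPlus and normalize are illustrative names, and the sample inputs are made up. It relies on Java expanding $1 to an empty string when group 1 did not take part in the match.

```java
public class KeepLeadingPlus {
    // Group 1 is set only when the match is a leading '+'; otherwise $1 expands to "".
    static String normalize(String input) {
        return input.replaceAll("(^\\+)|[^0-9]", "$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+49 (0) 30-1234")); // +490301234
        System.out.println(normalize("12+34"));           // 1234 (the + is not at the start)
        System.out.println(normalize("abc"));             // empty string
    }
}
```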
You can use this expression:
String r = s.replaceAll("((?<!^)[^0-9]|^[^0-9+])", "");
The idea is to replace any non-digit when it is not the initial character of the string (that's the (?<!^)[^0-9] part with a lookbehind) or any character that is not a digit or plus that is the initial character of the string (the ^[^0-9+] part).
Demo.
What about just
(?!^)\D+
Java string:
"(?!^)\\D+"
Demo at regex101.com
\D matches a character that is not a digit [^0-9]
(?!^) is a negative lookahead that checks that the match does not start at the beginning of the string
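One detail worth checking with this pattern: the lookahead protects position 0 whatever character is there, not only a +. The sketch below illustrates that; SkipFirstChar and strip are hypothetical names, and the inputs are made up.

```java
public class SkipFirstChar {
    // Removes every run of non-digits that does not start at position 0.
    static String strip(String input) {
        return input.replaceAll("(?!^)\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("+12-34")); // +1234
        System.out.println(strip("a12b"));   // a12 (a leading letter survives too)
        System.out.println(strip("ab12"));   // a12 (only the first character is protected)
    }
}
```

So this form fits when the first character is already known to be a digit or a +.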
Yes you can use this kind of replacement:
String parsed = inputString.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
If a + is present before the first digit in the string, it is captured in group 1. For example, dfd+sdfd12+sdf12 becomes +1212 (the second + is removed because it comes after the first digit).
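The example from this answer can be run as a sketch like the following. FirstPlusBeforeDigits and parse are illustrative names; the second input is an added test case, not from the original.

```java
public class FirstPlusBeforeDigits {
    // Alternative 1 swallows any leading junk and captures the first '+';
    // alternative 2 removes every other run of non-digits ($1 is then empty).
    static String parse(String input) {
        return input.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(parse("dfd+sdfd12+sdf12")); // +1212
        System.out.println(parse("12+34"));            // 1234
    }
}
```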
try this
1- This allows negative and positive numbers: it matches all special characters except a - or + in the first position.
(?!^[-+])[^0-9.]
2- If you only want to allow + at first position
(?!^[+])[^0-9.]
I have an input string in the following format
String input = "00IG356001110002005064007000000";
Characters 3-7 are the code.
Characters 8-12 are the amount.
Based on the code in the input string (IG356 in the sample input string), I need to capture the amount (00111 in the sample).
The amount (characters 8-12) should be picked up only for specific codes; the logic is detailed below.
The code must not be SG356. If it is SG356, there is no match and processing stops.
a. If the code is not SG356, check whether it is IG902 or SG350; in that case capture the amount (00111).
else
b. Check the 3 digits in the code (characters 5-7, 356 in this sample). If they are 200, 201, 356, or 370, go ahead and capture the amount.
I am using the regular expression shown below:
Using positive lookahead and if then else construct.
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
The regular expression works fine if the code in the input string is IG902 or SG350 (when the 'if' part of the regex matches), but when the 'else' part matches, I am unable to capture the amount.
This regular expression is working fine while just checking for a match.
.{2}(?!SG356)((?=IG902|SG350).+|.{2}(?=200|201|356|370).+)
The problem is only while capturing the group.
I am running this in Java. Any help would be greatly appreciated.
The java code i am using is :
public String getTsqlSum(String input, String regex){
String value = null;
Matcher m = Pattern.compile(regex).matcher(input);
System.out.println("Group Count: " + m.groupCount());
if (m.matches()) {
for (int i=0;i<m.groupCount();i++){
System.out.println("For i: " + i +" Value: " + m.group(i));
}
}
return value;
}
public void forumTest(){
//String input = "00IG902001110002005064007000000";
String input = "00IG356001110002005064007000000";
String regex= ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+";
System.out.println(match(input, regex));
String match = getTsqlSum(input, regex);
System.out.println("Match: " + match);
}
The regular expression works fine if the code in the input string is IG902 or SG350 (when the 'if' part of the regex matches), but when the 'else' part matches, I am unable to capture the amount.
You are able to capture the amount; the expression works fine. But when the second part of the alternation matches (this is not a regex if-then-else), the result is in a different capturing group: you will find it in group 3, not in group 2 as when the first part of the alternation matches.
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
Group numbers: 1 is the outer ( that opens right after (?!SG356), 2 is the first (.{5}), 3 is the second (.{5}).
In a regular expression, capturing groups are numbered by their opening parentheses, and the numbering continues across an alternation. In Perl there is a construct that gives the capturing groups of an alternation the same number, but I think that's the only flavour that is able to do this.
In Java you need to check after matching which group holds the result.
See my answer here, similar topic
You can change your regex and make the alternation before the capturing group
try this
.{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+
You will find your result in both cases in the group 1. (I made the first one a non capturing group using the ?:)
Update after the source was added
Your loop is wrong: group numbering starts at 1, so if you want the content of group one you have to use m.group(1), and the loop has to run up to and including m.groupCount().
m.group(0) holds the whole matched string.
Try this
for (int i=1;i<=m.groupCount();i++){
System.out.println("For i: " + i +" Value: " + m.group(i));
}
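To make the group numbering concrete, here is a sketch that runs both expressions against the sample inputs. GroupNumbering, ORIGINAL, REWRITTEN and the group helper are names invented for this example; the regexes and inputs are the ones from the question and answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Original regex: the amount lands in group 2 or group 3 depending on the branch.
    static final String ORIGINAL =
        ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
    // Rewritten regex: alternation in a non-capturing group, amount always in group 1.
    static final String REWRITTEN =
        ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+";

    // Hypothetical helper: the requested group, or null if the whole input does not match.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String ig356 = "00IG356001110002005064007000000";
        String ig902 = "00IG902001110002005064007000000";
        System.out.println(group(ORIGINAL, ig902, 2));  // 00111
        System.out.println(group(ORIGINAL, ig356, 2));  // null: this branch did not match
        System.out.println(group(ORIGINAL, ig356, 3));  // 00111
        System.out.println(group(REWRITTEN, ig902, 1)); // 00111
        System.out.println(group(REWRITTEN, ig356, 1)); // 00111
    }
}
```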
I need to be able to return signed and unsigned integer constants with no
intervening symbols, possibly preceded by + or -. The only allowed digits are 3, 4, and 5.
I can't figure out a way to say that the expression must not contain a period before or after the integer.
This is what I have so far, but if I pass say "34.5 - 43" the string returned will be: "34 5 43".
All that needs to be returned is "43".
public String getInts(String toBeScanned){
String INT = "";
Pattern p = Pattern.compile("\\b[+-]?[3-5]+\\b");
Matcher m = p.matcher(toBeScanned);
if (m.matches() == true){
INT = toBeScanned;
}
else{
m = p.matcher(" " + toBeScanned);
while (m.find()){
INT = INT + m.group() + " ";
}
}
return INT;
}
Any thoughts or pushes in the right direction are appreciated. Is there a way to say it that the first and last character can be [\b and not .]
This is frustrating the heck out of me. Help!
You don't want a word boundary \b here. I think the best approach is to write your own assertions. Try this:
(?<![.\d])[+-]?[3-5]+(?![.\d])
See it here on Regexr
(?<![.\d]) is a negative lookbehind assertion; it says that no dot and no digit is allowed immediately before the pattern.
(?![.\d]) is a negative lookahead assertion; it says that no dot and no digit is allowed immediately after the pattern.
Improvement
To avoid matching things like "hf34", we can make it stricter:
(?<![.\w])[+-]?[3-5]+(?![.\w])
See it on Regexr
The word boundary \b
\b matches at a change from a word character to a non-word character (or the reverse). A word character is a letter, a digit, or _. That means you will also get problems with your \b before the [+-], because there is no \b between a space (or the start of the string) and a + or -.
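The difference between the word-boundary version and the lookaround version can be seen in a short sketch. BoundaryVsLookaround and allMatches are illustrative names; the second input is an added test case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryVsLookaround {
    // Hypothetical helper: collects every match of regex found in input.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String boundary = "\\b[+-]?[3-5]+\\b";
        String lookaround = "(?<![.\\w])[+-]?[3-5]+(?![.\\w])";
        // \b sees a boundary on both sides of the dot, so 34.5 is split up
        System.out.println(allMatches(boundary, "34.5 - 43"));         // [34, 5, 43]
        // The lookarounds reject digits that touch a dot or a word character
        System.out.println(allMatches(lookaround, "34.5 - 43"));       // [43]
        System.out.println(allMatches(lookaround, "-43 +5 hf34 3.4")); // [-43, +5]
    }
}
```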
"\b[+-]?[3-5]+[.][3-5]+\b"
This pattern says that in order to match, there must be at least one number before, and one number after the decimal point.
Is there a way to say it that the first and last character can be [\b and not .]
[^\.\b]
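is meant to match \b but not '.'; note, however, that inside a character class \b is no longer a word-boundary assertion, and a character class always consumes one character.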
Is that what you are looking for?
[^\.\b][+-]?[3-5]+[^\.\b]
Will match '43' but not '34.5'