In Java, replace regex matches with a string

In Java I am using the regex \".*?\".
I use it to replace every double-quoted string with the term String.
Ex:
INPUT: Functions.unescapeJson("test")
RESULT: Functions.unescapeJson(String)
Now I also need to handle strings that themselves contain double quotes. I am using / as the escape character for those inner quotes. How can I achieve this?
Ex:
INPUT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.con"),"payloads_ul.dataFrameOutput"),"[/"Dimming Value/"]")
RESULT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(String), String),String),String)
But the result I get with the previous regex is:
Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.mIntegerm/:sgn.nev.rep), String),String),StringDimming ValueString)
How can I write the regex so that a quote preceded by / is skipped and does not end the match?
The code that I am using:
public static void main(String[] args) {
String STRINGVALIDATIONREGEX = "\".*?\"";
String formula = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.m2m/:sgn.nev.rep), \"m2m:cin.con\"),\"payloads_ul.dataFrameOutput\"),\"[\"Dimming Value\"]\")";
System.out.println(formula.replaceAll(STRINGVALIDATIONREGEX, "String"));
}

You can use this regex:
\"(\/?.)*?\"

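Use a negative lookbehind, (?<!/), to require that a quote is not preceded by a slash.
For example, (?<!/)\".*?(?<!/)\" matches from a quote that is not preceded by / up to the next quote that is not preceded by /.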

"((?:[^"]|(?<=\/)")*)"
" match a "
[^"] match a non-quote character
| or
(?<=\/)" match a quote character that is preceded by a /
* repeat the group (either alternative) zero or more times
" match a "
If you believe that a string such as "abc/" is invalid, then you should use the stricter regex:
"((?:[^"\/]|\/")*)"
" match a "
[^"\/] match any character that isn't a quote or a /
| or
\/" match a /" combination
* repeat the group (either alternative) zero or more times
" match a "
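As a rough sketch of how the stricter regex above could be applied to the formula from the question, the snippet below replaces each quoted string (including any /" escapes inside it) with the word String. The class name QuotedStringMasker and the method mask are made up for illustration, and the shortened formula is only a sample input.

```java
import java.util.regex.Pattern;

public class QuotedStringMasker {
    // The stricter regex from the answer, written as a Java string literal:
    // an opening quote, then any run of (non-quote, non-slash characters
    // or the escape sequence /"), then the closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"((?:[^\"/]|/\")*)\"");

    // Hypothetical helper: replaces every quoted string with the word String.
    public static String mask(String formula) {
        return QUOTED.matcher(formula).replaceAll("String");
    }

    public static void main(String[] args) {
        String formula =
            "Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"[/\"Dimming Value/\"]\")";
        System.out.println(mask(formula));
        // prints Functions.getJsonPath(Functions.unescapeJson(String), String)
    }
}
```

Note that replaceAll is needed here; String.replace treats its first argument as literal text, not as a regex.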

Related

Java regex to ignore the pattern between 2 words and report a match in an if condition

I have to check for a match between from and IN_TXT; anything between the 2 words should be ignored and the input should still count as matched. I tried the expression below, but it is not working.
String table="IN_TXT";
String s="select * from JAN_X.IN_TXT";
if (s.matches("from" + "(.*)" + table)) {
    System.out.println("Matched");
}
What might be missing here?
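matches() only succeeds when the whole input matches, as if the regex were wrapped in ^ and $, so a regex that begins with from cannot match an input that begins with select.
You can prefix the regex with .*?, giving ".*?from" + "(.*)" + table, where .*? covers the text that occurs before from.
.*? matches as few characters as possible.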
String s = "select * from JAN_X.IN_TXT";
String table = "IN_TXT";
if ((s.matches(".*?from" + "(.*)" + table))) {
System.out.println("Matched");
}
If you want to extract JAN_X, you can use:
// $1 represents (.*) capture group
String s2= s.replaceAll(".*?from (.*)\\."+table,"$1");
System.out.println(s2);
output
JAN_X
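An alternative to padding the regex with .*? is Matcher.find(), which looks for the pattern anywhere in the input instead of requiring the whole input to match. The sketch below contrasts the two; the class and method names are hypothetical, and wrapping the table name in Pattern.quote is an added precaution, not something the original code did.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // String.matches requires the entire input to match the regex.
    public static boolean matchesWhole(String s, String table) {
        return s.matches("from.*" + Pattern.quote(table));
    }

    // Matcher.find succeeds if the regex matches any substring of the input.
    public static boolean containsMatch(String s, String table) {
        return Pattern.compile("from.*" + Pattern.quote(table)).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "select * from JAN_X.IN_TXT";
        System.out.println(matchesWhole(s, "IN_TXT"));  // false
        System.out.println(containsMatch(s, "IN_TXT")); // true
    }
}
```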

Regex pattern for repeated words

I am very new to regex and am learning it now. I have a requirement like this:
The string starts with #newline# and also ends with #newline#. Between these two tokens there can be zero or more spaces or zero or more #newline# tokens.
Below is an example:
#newline# #newline# #newline##newline# #newline##newline##newline#.
What regex matches this?
I have tried this, but it is not working:
^#newline#|(\s+#newline#)|#newline#|#newline#$
Your ^#newline#|(\s+#newline#)|#newline#|#newline#$ matches either a #newline# at the start of the string (^#newline#), or 1+ whitespaces followed with #newline# ((\s+#newline#)), or #newline#, or (and this never matches as the previous catches all the cases of #newline#) a #newline# at the end of the string (#newline#$).
You may match these strings with
^#newline#(?:\s*#newline#)*$
or (if there should be at least 2 occurrences of #newline# in the string)
^#newline#(?:\s*#newline#)+$
^ - start of string
#newline# - literal string
(?:\s*#newline#)* - zero (NOTE: replacing * with + will require at least 1) or more sequences of
\s* - 0+ whitespaces
#newline# - a literal substring
$ - end of string.
Java demo:
String s = "#newline# #newline# #newline##newline# #newline##newline##newline#";
System.out.println(s.matches("#newline#(?:\\s*#newline#)+"));
// => true
Note: inside matches(), the expression is already anchored, and ^ and $ can be removed.
As far as I understand the requirements, it should be this:
^#newline#(\s|#newline#)*#newline#$
this will not match your example string, since it does not start with #newline#
without the ^ and the $ it matches a sub-string.
Check out http://www.regexplanet.com/ to play around with Regular Expressions.
Use the Pattern and Matcher classes to find the occurrences.
You can supply the pattern string at runtime, for example findTheMatch("newline"):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void findTheMatch(String patternString)
{
    String text = "#newline# #newline# #newline##newline# #newline##newline##newline# ";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        // group() returns the whole match; group(1) would throw here
        // because the pattern has no capture group.
        System.out.println("found: " + matcher.group());
    }
}
You can try this as well:
#newline#[\s\S]+#newline#
It says, match anything that starts with #newline# followed by any combination of whitespace or non-whitespace characters and ends with #newline#.
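The patterns suggested in these answers differ in what they accept between the tokens, which the following sketch tries to make visible. It compares the strict pattern #newline#(?:\s*#newline#)* with the looser #newline#[\s\S]+#newline#; the class and method names are invented for the example.

```java
public class NewlineTokenCheck {
    // Only whitespace and further #newline# tokens may follow the first token.
    public static boolean strict(String s) {
        return s.matches("#newline#(?:\\s*#newline#)*");
    }

    // Anything at all (at least one character) may sit between the two tokens.
    public static boolean loose(String s) {
        return s.matches("#newline#[\\s\\S]+#newline#");
    }

    public static void main(String[] args) {
        String valid = "#newline# #newline##newline#";
        String withText = "#newline#abc#newline#";
        System.out.println(strict(valid));    // true
        System.out.println(strict(withText)); // false
        System.out.println(loose(withText));  // true
    }
}
```

If other text between the tokens should be rejected, the strict form is the safer choice.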

How can I strip all non digits in a string except the first character?

I have a string whose format I want to normalize so that it is always a + followed by digits.
The following would work:
String parsed = inputString.replaceAll("[^0-9]+", "");
String result;
if (inputString.charAt(0) == '+') {
    result = "+" + parsed;
}
else {
    result = parsed;
}
But is there a regex for replaceAll that keeps the + (if it exists) at the beginning of the string and removes all other non-digits in a single call?
The following statement with the given regex does the job:
String result = inputString.replaceAll("(^\\+)|[^0-9]", "$1");
(^\\+) matches a plus sign at the beginning of the string and captures it in group 1,
| or
[^0-9] matches any character that is not a digit;
$1 replaces each match with the content of group 1: the plus sign when the first alternative matched, otherwise nothing.
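To see this replacement on a few inputs, here is a small sketch. The class name PlusDigits, the method normalize, and the sample strings are assumptions made for the demonstration. It relies on Java substituting an empty string for $1 when group 1 did not take part in the match.

```java
public class PlusDigits {
    // Keeps a leading '+' (captured in group 1) and drops every other non-digit.
    public static String normalize(String input) {
        return input.replaceAll("(^\\+)|[^0-9]", "$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+1 (555) 010-9999")); // +15550109999
        System.out.println(normalize("1+2"));               // 12
        System.out.println(normalize("abc"));               // prints an empty line
    }
}
```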
You can use this expression:
String r = s.replaceAll("((?<!^)[^0-9]|^[^0-9+])", "");
The idea is to replace any non-digit when it is not the initial character of the string (that's the (?<!^)[^0-9] part with a lookbehind) or any character that is not a digit or plus that is the initial character of the string (the ^[^0-9+] part).
What about just
(?!^)\D+
Java string:
"(?!^)\\D+"
Demo at regex101.com
\D matches a character that is not a digit [^0-9]
(?!^) using a negative lookahead to check, if it is not the initial character
Yes you can use this kind of replacement:
String parsed = inputString.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
if present and before the first digit in the string, the + character is captured in group 1. For example: dfd+sdfd12+sdf12 returns +1212 (the second + is removed since its position is after the first digit).
Try this:
1- This allows negative and positive numbers: it matches every special character except a - or + in the first position.
(?!^[-+])[^0-9.]
2- If you only want to allow a + in the first position:
(?!^[+])[^0-9.]

Remove leading trailing non numeric characters from a string in Java

I need to strip all the leading and trailing characters from a string, up to the first and last digit respectively.
Example: OBC9187A-1%A
Should return: 9187A-1
How do I achieve this in Java?
I understand regex is the solution, but I am not good at it.
I tried replaceAll("([^0-9.*0-9])","")
But it returns only the digits and strips all the alphabetic and special characters.
Here is a self-contained example that uses a Java regex to solve your problem. I would also suggest working through a regex tutorial.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripExample {
    public static void main(String[] args) {
        String test = "OBC9187A-1%A";
        // A digit, then anything (greedy), then a digit.
        Pattern p = Pattern.compile("\\d.*\\d");
        Matcher m = p.matcher(test);
        while (m.find()) {
            System.out.println("Match: " + m.group());
        }
    }
}
Output:
Match: 9187A-1
\d matches any digit, .* matches any character zero or more times, and the final \d matches any digit. We write \\d in the Java source because \ is the escape character in string literals and must itself be escaped. So this regex matches a digit, followed by anything, followed by another digit. Because .* is greedy, it takes the longest possible match, which runs from the first digit to the last digit with everything in between. The while loop would iterate over all matches if there were more than one. In this case there can be at most one match, so you can replace the while loop with an if, like this:
if(m.find())
{
System.out.println("Match: " + m.group());
}
This will strip leading and trailing non-digit characters from string s.
String s = "OBC9187A-1%A";
s = s.replaceAll("^\\D+", "").replaceAll("\\D+$", "");
System.out.println(s);
// prints 9187A-1
Regex explanation
^\D+
^ assert position at start of the string
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
\D+$
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
$ assert position at end of the string

Regex for specific url format

I am trying to get a regex expression to match a specific url format. Specifically the api urls for stackexchange. For example I want both of these to match:
http://api.stackoverflow.com/1.1/questions/1234/answers
http://api.physics.stackexchange.com/1.0/questions/5678/answers
Where
everything outside the variable parts (the site name after api., the last digit of the version, and the question id) must be identical.
The first variable part can only be made of the letters a to z, with either one full stop or none.
Also, it would be good if, when there is a full stop, the word "stackexchange" must follow it. However, this isn't crucial.
The second variable part can only be a 1 or a 0.
The last variable part can contain only the digits 0 to 9 and can be any length.
There can't be anything at all before or after the URL, not even a trailing slash.
Pattern.compile("^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z")
The ^ makes sure it starts at the start of input, and the \\z makes sure it ends at the end of input. All the dots are escaped so they are literal. The (?i:...) part makes the domain and scheme case-insensitive as per the URL spec. The [01] only matches the characters 0 or 1. The [0-9]+ matches 1 or more Arabic digits. The rest is self explanatory.
^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$
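^ matches the start of the string, $ matches the end of the input (or just before a final line terminator; use \z, written \\z in a Java string, to exclude that case), and [.] is an alternative way to escape the dot instead of a backslash (which itself would need to be escaped as \\.).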
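A minimal sketch of how this pattern might be used from Java follows. Because Matcher.matches() already requires the whole input to match, the ^ and $ anchors are left out here. The class name ApiUrlCheck and the method isApiUrl are hypothetical.

```java
import java.util.regex.Pattern;

public class ApiUrlCheck {
    // Same pattern as above without the anchors; matches() checks the whole input.
    private static final Pattern API_URL = Pattern.compile(
        "http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers");

    public static boolean isApiUrl(String url) {
        return API_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers"));          // true
        System.out.println(isApiUrl("http://api.physics.stackexchange.com/1.0/questions/5678/answers"));  // true
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers/"));         // false
    }
}
```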
This tested Java program has a commented regex which should do the trick:
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        Pattern p = Pattern.compile(
            "http://api\\. # Scheme and api subdomain.\n" +
            "(?: # Group for domain alternatives.\n" +
            " stackoverflow # Either one\n" +
            "| physics\\.stackexchange # or the other\n" +
            ") # End group for domain alternatives.\n" +
            "\\.com # TLD\n" +
            "/1\\.[01] # Either 1.0 or 1.1\n" +
            "/questions/\\d+/answers # Rest of path.",
            Pattern.COMMENTS);
        Matcher m = p.matcher(s);
        if (m.matches()) {
            System.out.print("Match found.\n");
        } else {
            System.out.print("No match found.\n");
        }
    }
}
