How to match a string of tuples in Java? - java

I have strings like "(C,D) (E,F) (G,H) (J,K)" and "(C,D) (E,F) (G,H) (J,K)" or "((C,D) (E,F) (G,H) (J,K)". How to return true if regex matches pattern like in first string (which is a one tuple or series of tuples seperated by one whitespace). I tried something like "(\([A-Z],[A-Z]\)[ |$])+?", but it does not capture the final pair of tuple. In case of 2nd and 3rd string it should return false.

Here is the problem of your regex:
(\([A-Z],[A-Z]\)[ |$])+?
^^^^^
You thought that meant "space or end of string", didn't you? It actually means "space or | or dollar sign". A lot of special characters lose their special meaning when placed inside a character class.
You should replace it with (?: |$) instead. Also, the +? at the end should be a greedy +:
(\([A-Z],[A-Z]\)(?: |$))+
Personally, I don't really like this "space or end of string" thing. I would prefer repeating the tuple pattern (especially when the repeated pattern is not long):
(?:\([A-Z],[A-Z]\) )*(?:\([A-Z],[A-Z]\))
Needless to say, you should match with matches, not find.

If you want to match a string of parenthesised pairs of comma-separated capital letters, with a single space between each pair, you could use a pattern like this:
^\\([A-Z],[A-Z]\\)( \\([A-Z],[A-Z]\\))*$
That is: letter,comma,letter all in parentheses, following by zero or more occurrences of the similar parenthetic expressions, each preceded by a space.

I guess, you might be able to do that with:
\s*|\(([^()\r\n]+)\)
If the pattern would not return an empty string, would be false.
RegEx Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "\\([^()\\r\\n]+\\)|\\s*";
final String string = "(C,D) (E,F) (G,H) (J,K)\n"
+ "(C,D) (E,F) (G,H) (J,K)\n"
+ "((C,D) (E,F) (G,H) (J,K)";
final String subst = "";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
}
}
Output
(
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Source
Regular expression to match balanced parentheses

Related

placeholders for regular expressions

I'm pretty new to regular expressions, and i don't really know how to use them correctly yet.
As input i have a string, in which i want to look for a certain pattern, let's say a word enclosed in !, like this: "Hello, my name is !John!". Now i want to replace the substring inside with something different. How do i look for the substring without knowing what is inside?
String str = "I don't !know! how to do this";
str = str.replace("!placeholder!", "X");
Just like that.
str.replaceAll("!.*!", "X") would be a way to do it. There are however many different "placeholders" and special characters you should be aware of (at least to escape them). In this instance I used . to match any character and * to signify that I want any number of those. The expression then reads as "replace all exclamation points, followed by any number of characters and ending in another exclamation point with the letter X".
That would also replace the exclamation points, so perhaps you want to write str.replaceAll("!.*!", "!X!"). Or maybe you don't want to replace the string "!!" so you'd use "!.+!". But to explore all the possibilities, you should really read some tutorial like this one: https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Maybe,
!\w+!\s*
or,
!\w+!
might simply work OK for those examples.
Demo
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "!\\w+!\\s*";
final String string = "I don't !know! how to do this\n"
+ "Hello, my name is !John! ";
final String subst = "something_else ";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
}
}
Output
I don't something_else how to do this
Hello, my name is something_else
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
RegEx Circuit
jex.im visualizes regular expressions:

Java RegEx doesn't replaceAll

I was trying to replace concatenation symbol '+' with '||' in given multi-line script, however it seems that java regex just replaces 1 occurrence, instead of all.
String ss="A+B+C+D";
Matcher mm=Pattern.compile("(?imc)(.+)\\s*\\+\\s*(.+)").matcher(ss);
while(mm.find())
{
System.out.println(mm.group(1));
System.out.println(mm.group(2));
ss=mm.replaceAll("$1 \\|\\| $2");
}
System.out.println(ss); // Output: A+B+C||D, Expected: A||B||C||D
The reason you only replace one element, is because you match the entire line. The regular expression you use "(?imc)(.+)\\s*\\+\\s*(.+)", matches anything (.+) until the end, then reverts, so it can match the rest \\s*\\+.... So basically your group 1 is .+ almost everything, but the last + and beyond. Therefore replaceAll can only match once, and will terminate after that one replacement.
What you need is a replacement that finds + optionally wrapped in spaces:
Pattern.compile("(?imc)\\s*\\+\\s*");
This should match all you want to match, and does not match the entire line, but only your replacement character.
You could just use:
ss = ss.replaceAll("\\+", "||")
as #ernest_k has pointed out. If you really want to continue using a matcher with iteration, then use Matcher#appendReplacement with a StringBuffer:
String ss = "A+B+C+D";
Matcher mm = Pattern.compile("\\+").matcher(ss);
StringBuffer sb = new StringBuffer();
while (mm.find()) {
mm.appendReplacement(sb, "||");
}
mm.appendTail(sb);
System.out.println(sb);
I thing maybe we would just need a simple string replace:
Demo
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\+";
final String string = "A+B+C+D";
final String subst = "||";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
This link on the right panel explains your original expression. The first capturing group does match between one and unlimited times, as many times as possible, thus it would not work here. If we would have changed them to (.+?), it would have partially worked, yet still unnecessary.

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
To match non negative decimal number you need this regex:
^\d*\.\d+|\d+\.\d*$
or in java syntax : "^\\d*\\.\\d+|\\d+\\.\\d*$"
String regex = "^\\d*\\.\\d+|\\d+\\.\\d*$"
String string = "123.43253";
if(string.matches(regex))
System.out.println("true");
else
System.out.println("false");
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
if you need to make the dot as mandatory you remove the ? mark:
[0-9]*\.[0-9]*
but this will accept just a dot without any number as well... So, if you want the validation to consider number as mandatory you use + ( which means one or more) instead of *(which means zero or more). That case it becomes:
[0-9]+\.[0-9]+
If you on Kotlin, use ktx:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown because when using matcher.find(), your regex will find the first valid match within the string, and all of your examples would match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
There are actually 2 ways to match a literal .. One is using backslash-escaping like you do there \\., and the other way is to enclose it inside a character class or the square brackets like [.]. Most of the special characters become literal characters inside the square brackets including .. So use \\. shows your intention clearer than [.] if all you want is to match a literal dot .. Use [] if you need to match multiple things which represents match this or that for example this regex [\\d.] means match a single digit or a literal dot
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

Java regex to match the start of the word?

Objective: for a given term, I want to check if that term exist at the start of the word. For example if the term is 't'. then in the sentance:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing trailing and leading slashes and adding flags as (?<flags>) at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b"
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround regex with / so instead of "/regex/flags" we just write regex. If you want to add flags you can do it with (?flags) syntax and place it in regex at position from which flag should apply, for instance a(?i)a will be able to find aa and aA but not Aa because flag was added after first a.
You can also compile your regex into Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is String (again not enclosed with /) and flag is integer build from constants from Pattern like Pattern.DOTALL or when you need more flags you can use Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
Next thing which may confuse you is matches method. Most people are mistaken by its name, because they assume that it will try to check if it can find in string element which can be matched by regex, but in reality, it checks if entire string can be matched by regex.
What you seem to want is mechanism to test of some regex can be found at least once in string. In that case you may either
add .* at start and end of your regex to let other characters which are not part of element you want to find be matched by regex engine, but this way matches must iterate over entire string
use Matcher object build from Pattern (representing your regex), and use its find() method, which will iterate until it finds match for regex, or will find end of string. I prefer this approach because it will not need to iterate over entire string, but will stop when match will be found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
In case your term could contain some regex special characters but you want regex engine to treat them as normal characters you need to make sure that they will be escaped. To do this you can use Pattern.quote method which will add all necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^[+"+term+"].*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

regex for letters or numbers in brackets

I am using Java to process text using regular expressions. I am using the following regular expression
^[\([0-9a-zA-Z]+\)\s]+
to match one or more letters or numbers in parentheses one or more times. For instance, I like to match
(aaa) (bb) (11) (AA) (iv)
or
(111) (aaaa) (i) (V)
I tested this regular expression on http://java-regex-tester.appspot.com/ and it is working. But when I use it in my code, the code does not compile. Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("^[\([0-9a-zA-Z]+\)\s]+");
String[] words = pattern.split("(a) (1) (c) (xii) (A) (12) (ii)");
String w = pattern.
for(String s:words){
System.out.println(s);
}
}
}
I tried to use \ instead of \ but the regex gave different results than what I expected (it matches only one group like (aaa) not multiple groups like (aaa) (111) (ii).
Two questions:
How can I fix this regex and be able to match multiple groups?
How can I get the individual matches separately (like (aaa) alone and then (111) and so on). I tried pattern.split but did not work for me.
Firstly, you want to escape any backslashes in the quotation marks with another backslash. The Regex will treat it as a single backslash. (E.g. call a word character \w in quotation marks, etc.)
Secondly, you got to finish the line that reads:
String w = pattern.
That line explains why it doesn't compile.
Here is my final solution to match the individual groups of letters/numbers in brackets that appear at the beginning of a line and ignore the rest
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
static ArrayList<String> listOfEnums;
public static void main(String[] args) {
listOfEnums = new ArrayList<String>();
Pattern pattern = Pattern.compile("^\\([0-9a-zA-Z^]+\\)");
String p = "(a) (1) (c) (xii) (A) (12) (ii) and the good news (1)";
Matcher matcher = pattern.matcher(p);
boolean isMatch = matcher.find();
int index = 0;
//once you find a match, remove it and store it in the arrayList.
while (isMatch) {
String s = matcher.group();
System.out.println(s);
//Store it in an array
listOfEnums.add(s);
//Remove it from the beginning of the string.
p = p.substring(listOfEnums.get(index).length(), p.length()).trim();
matcher = pattern.matcher(p);
isMatch = matcher.find();
index++;
}
}
}
1) Your regex is incorrect. You want to match individual groups of letters / numbers in brackets, and the current regex will match only a single string of one or more such groups. I.e. it will match
(abc) (def) (123)
as a single group rather than three separate groups.
A better regex that would match only up to the closing bracket would be
\([0-9a-zA-Z^\)]+\)
2) Java requires you to escape all backslashes with another backslash
3) The split() method will not do what you want. It will find all matches in your string then throw them away and return an array of what is left over. You want to use matcher() instead
Pattern pattern = Pattern.compile("\\([0-9a-zA-Z^\\)]+\\)");
Matcher matcher = pattern.matcher("(a) (1) (c) (xii) (A) (12) (ii)");
while (matcher.find()) {
System.out.println(matcher.group());
}

Categories

Resources