java.util.regex matching parentheses - java

I am using java.util.regex for matching, as shown below:
public static void main(String[] args) {
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String patternStr = "I love you (LT):";
String noParentStr = "I love you";
Pattern pattern = Pattern.compile(patternStr);
Pattern noParentPattern = Pattern.compile(noParentStr);
Matcher matcher = pattern.matcher(input);
Matcher noParrentTheseMatcher = noParentPattern.matcher(input);
System.out.println("result:" + matcher.find());
System.out.println("result no parenthese:" + noParrentTheseMatcher.find());
}
I can see that the input string contains patternStr, "I love you (LT):", but I get this result:
result:false
result no parenthese:true
How can I match a string that contains parentheses, '(' and ')'?

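In regex, parentheses are metacharacters, i.e. they are reserved for special use, specifically for a feature called "capture groups".
Try escaping them with a \ before each parenthesis:
I love you \(LT\):
In a Java string literal the backslash itself must be doubled, so the pattern is written "I love you \\(LT\\):".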
List of all special characters that need to be escaped in a regex
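A minimal sketch of the escaping described above, using the input from the question. The class name EscapedParens and the helper found are illustrative names, not part of the original post.

```java
import java.util.regex.Pattern;

public class EscapedParens {
    // Returns true if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
        // Unescaped: (LT) is a capture group, so the regex looks for "I love you LT:"
        System.out.println(found("I love you (LT):", input));     // false
        // Escaped: \( and \) match literal parentheses
        System.out.println(found("I love you \\(LT\\):", input)); // true
    }
}
```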

As has been pointed out in the comments, you don't need a regex to check whether your input String contains I love you (LT):. There is no actual pattern to represent here, only a character-by-character comparison between a portion of your input and the string you're looking for.
To achieve what you want, you could use the contains method of the String class, which suits your needs perfectly.
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String strToLookFor = "I love you (LT):";
System.out.println("Result w Contains: " + input.contains(strToLookFor)); //Returns true
If you actually need a regex because it is a requirement, then, as @Yarin already said, you need to escape the parentheses, since they are characters with a special meaning: they delimit capturing groups.
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String strPattern = "I love you \\(LT\\):"; // parentheses escaped so they match literally
Pattern pattern = Pattern.compile(strPattern);
Matcher matcher = pattern.matcher(input);
System.out.println("Result w Pattern: " + matcher.find()); //Returns true
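When the text to look for is not known in advance, escaping by hand is error-prone. The sketch below shows two standard alternatives from java.util.regex: Pattern.quote, which escapes the whole string, and the Pattern.LITERAL flag. The class and method names (LiteralSearch, findQuoted, findLiteralFlag) are made up for illustration.

```java
import java.util.regex.Pattern;

public class LiteralSearch {
    // Pattern.quote wraps the text so that every character is matched literally.
    static boolean findQuoted(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    // Pattern.LITERAL makes the compiler treat the whole pattern as plain text.
    static boolean findLiteralFlag(String literal, String input) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
        String strToLookFor = "I love you (LT):";
        System.out.println(findQuoted(strToLookFor, input));      // true
        System.out.println(findLiteralFlag(strToLookFor, input)); // true
    }
}
```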

Related

Extracting a string using Regex

I have the following code to extract the string within double quotes using Regex.
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
The output I get now is Java programming. But from the String str I want the content of the second pair of double quotes, which is programming. Can anyone tell me how to do that using regex?
If you take your example, and change it slightly to:
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
int i = 0;
while(matcher.find()){
System.out.println("match " + ++i + ": " + matcher.group(1));
}
You should find that it prints:
match 1: Java
match 2: programming
This shows that you are able to loop over all of the matches. If you only want the last match, then you have a number of options:
Store the match in the loop, and when the loop is finished, you have the last match.
Change the regex to ignore everything until your pattern, with something like: Pattern.compile(".*\"([^\"]*)\"")
If you really want explicitly the second match, then the simplest solution is something like Pattern.compile("\"([^\"]*)\"[^\"]*\"([^\"]*)\""). This gives two matching groups.
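The first and third options above can be sketched as follows. The class QuotedTokens and its two helper methods are hypothetical names; the regexes are the ones given in the list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokens {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Option 1: remember each match; after the loop the variable holds the last one.
    static String lastQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return last; // null when nothing is quoted
    }

    // Option 3: one regex with two groups; group 2 is the second quoted token.
    static String secondQuoted(String input) {
        Matcher m = Pattern.compile("\"([^\"]*)\"[^\"]*\"([^\"]*)\"").matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String str = "\"Java\",\"programming\"";
        System.out.println(lastQuoted(str));   // programming
        System.out.println(secondQuoted(str)); // programming
    }
}
```

With more than two quoted tokens the two helpers differ: lastQuoted returns the final token, secondQuoted always returns the second.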
If you want the last token inside double quotes, add an end-of-line anchor ($):
final Pattern pattern = Pattern.compile("\"([^\"]*)\"$");
In this case, you can replace while with if if your input is a single line.
Great answer from Paul. You can also try this pattern:
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
Java program
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Explanation
,\": matches a comma, followed by a quotation mark "
(\\w+): matches one or more word characters (letters, digits or underscore)
\": matches the last quotation mark "
Then the group (\\w+) is captured (group 1, to be precise)
Output
programming

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags) at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can do it with the (?flags) syntax, placed in the regex at the position from which the flag should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
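The effect of the flag position can be checked with a small sketch. InlineFlagPosition and found are illustrative names only.

```java
import java.util.regex.Pattern;

public class InlineFlagPosition {
    // Returns true if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?i) placed after the first a: only the second a ignores case
        System.out.println(found("a(?i)a", "aa")); // true
        System.out.println(found("a(?i)a", "aA")); // true
        System.out.println(found("a(?i)a", "Aa")); // false
        // (?i) placed at the start: the whole regex ignores case
        System.out.println(found("(?i)aa", "Aa")); // true
    }
}
```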
You can also compile your regex into a Pattern like this:
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again not enclosed in /) and flags is an integer built from constants in Pattern, such as Pattern.DOTALL; when you need more than one flag, combine them, e.g. Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
The next thing that may confuse you is the matches method. Its name misleads most people: they assume it checks whether the string contains some element that the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a mechanism to test whether some regex can be found at least once in the string. In that case you may either
add .* at the start and end of your regex so that the regex engine can match the other characters that are not part of the element you want to find, but this way matches must iterate over the entire string
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which iterates until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to iterate over the entire string; it stops as soon as a match is found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
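The difference between matches() and find() described above can be seen side by side in this sketch. The class MatchesVsFind and its helpers are hypothetical names.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // matches(): the regex must cover the entire input.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // find(): the regex only has to occur somewhere in the input.
    static boolean occursSomewhere(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wholeStringMatches("\\bt", str));     // false
        System.out.println(wholeStringMatches(".*\\bt.*", str)); // true
        System.out.println(occursSomewhere("\\bt", str));        // true
    }
}
```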
In case your term could contain regex special characters that you want the regex engine to treat as normal characters, you need to make sure they are escaped. To do this you can use the Pattern.quote method, which adds all the necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
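To see why quoting matters, consider a term such as "t.", where the dot would otherwise match any character. This is only a sketch; QuotedTerm, wordStartsWith and the quote parameter are invented for the comparison.

```java
import java.util.regex.Pattern;

public class QuotedTerm {
    // Checks whether some word in text starts with term.
    // When quote is false the term is inserted into the regex unescaped.
    static boolean wordStartsWith(String term, String text, boolean quote) {
        String body = quote ? Pattern.quote(term) : term;
        return Pattern.compile("\\b" + body, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted: "." is a wildcard, so "t." also matches "th"
        System.out.println(wordStartsWith("t.", "the end", false)); // true
        // Quoted: only a literal "t." counts
        System.out.println(wordStartsWith("t.", "the end", true));  // false
        System.out.println(wordStartsWith("t.", "see t. 4", true)); // true
    }
}
```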
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
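Calling find repeatedly, as suggested above, could look like the sketch below. The class WordsStartingWith and its method are illustrative; Pattern.quote is added as a precaution and is an assumption about how the term should be treated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsStartingWith {
    // Collects every whole word that begins with term, ignoring case.
    static List<String> wordsStartingWith(String term, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {       // each call continues after the previous match
            words.add(m.group()); // m.start() would give the offset of this word
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wordsStartingWith("t", str)); // [This, the, Thats]
    }
}
```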
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^"+Pattern.quote(term)+".*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

Regular Expression in java- code error

I wrote a regex program in Java. I think it is correct, but the result is not what I expect. Please help me fix it.
My code:
String text ="My wife back me up over my decision to quit my job";
String patternString = "[/w/s]*back(\\s\\w+\\s)*up[/w/s]*.";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();
System.out.println("matches = " + matches);
output:
matches = false
I'm new to Java programming. I want to write a program that uses a regex to test whether "back up" occurs in the input sentence.
Thanks for your attention.
I think your pattern should be like this:
String patternString = "[\\w\\s]*back(\\s\\w+\\s)*up[\\w\\s]*.";
You are using forward slashes instead of backslashes:
String patternString = "[/w/s]*back(\\s\\w+\\s)*up[/w/s]*.";
                         ^ ^                       ^ ^
The two are not interchangeable (and don't forget that backslashes need to be doubled up).
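A short sketch comparing the two spellings on the sentence from the question. With forward slashes, [/w/s] is just a character class containing the literal characters /, w and s. The names SlashVsBackslash and matchesWhole are made up for the example.

```java
import java.util.regex.Pattern;

public class SlashVsBackslash {
    // matches(): the regex has to cover the whole text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "My wife back me up over my decision to quit my job";
        String wrong = "[/w/s]*back(\\s\\w+\\s)*up[/w/s]*.";
        String fixed = "[\\w\\s]*back(\\s\\w+\\s)*up[\\w\\s]*.";
        System.out.println(matchesWhole(wrong, text)); // false
        System.out.println(matchesWhole(fixed, text)); // true
        // The forward-slash class only accepts the characters '/', 'w' and 's'
        System.out.println(matchesWhole("[/w/s]*", "/w/s")); // true
    }
}
```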

Regular expression is not working (Java)

I want to make a regular expression that matches the form (+92)-(21)-1234.... I made this program
public static void main(String[] args) {
// A regex and a string in which to search are specified
String regEx = "([+]\\d{2})-(\\d{2})-\\d+";
String phoneNumber = "(+92)-(21)-1234567890";
// Obtain the required matcher
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(phoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
} else {
System.out.println("Phone Number must be in the form (+xx)-(xx)-xxxxx..");
}
} //end of main()
The regular expression I created starts with an opening bracket ((), a plus ([+]), two digits (\d{2}), a closing bracket ()), a dash (-), an opening bracket ((), two digits (\d{2}), a closing bracket ()), a dash (-) and then any number of digits (\d+). But it is not working. What am I doing wrong?
Thanks
The regular expression i created like starts with the bracket(()
No, it starts with a grouping construct - that's what an unescaped ( means in a regular expression. I haven't looked at the rest of the expression in detail, but try just escaping the brackets:
String regEx = "\\([+]\\d{2}\\)-\\(\\d{2}\\)-\\d+";
Or a nicer (IMO) way of saying that you need the +
String regEx = "\\(\\+\\d{2}\\)-\\(\\d{2}\\)-\\d+";
Escape the parentheses. The dashes do not need escaping outside a character class.
You need to escape the parentheses (as Jon already mentioned, they create a capturing group):
public static void main(String[] args) {
// A regex and a string in which to search are specified
String regEx = "\\([+]\\d{2}\\)-\\(\\d{2}\\)-\\d+";
String phoneNumber = "(+92)-(21)-1234567890";
// Obtain the required matcher
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(phoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
} else {
System.out.println("Phone Number must be in the form (+xx)-(xx)-xxxxx..");
}
}
Output:
Phone Number Valid
The correct regex is
[(][+]\\d{2}[)]-[(]\\d{2}[)]-\\d+
You just needed to put your brackets between [ and ].
If the plus symbol is always there, you could just write \\+; if it may or may not be there, write \\+?. You should escape all regex-specific characters like this:
String regEx = "\\(\\+\\d{2}\\)-\\(\\d{2}\\)-\\d+";
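Escaped and unescaped parentheses can be combined: the escaped ones match the literal brackets, while unescaped ones inside them capture the digits. The following is a sketch under that idea; PhoneParts and parse are hypothetical names and the extra capture groups are an addition to the regex above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneParts {
    // \\( and \\) match literal brackets; the plain ( ) pairs are capture groups.
    private static final Pattern PHONE =
            Pattern.compile("\\(\\+(\\d{2})\\)-\\((\\d{2})\\)-(\\d+)");

    // Returns {country code, area code, number}, or null if the format is wrong.
    static String[] parse(String phone) {
        Matcher m = PHONE.matcher(phone);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("(+92)-(21)-1234567890");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // 92 21 1234567890
    }
}
```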

forming correct regular expression in dynamic string

I have a FileInputStream that reads a file which somewhere contains a string subset looking like:
...
OperatorSpecific(XXX)
{
Customer(someContent)
SaveImage()
{
...
I would like to identify the Customer(someContent) part of the string and switch the someContent inside the parenthesis for something else.
someContent will be a dynamic parameter and will contain a string of maybe 5-10 chars.
I have used regEx before, like once or twice, but I feel that in a context such as this where I don't know what value will be inside the parenthesis I'm at a loss of how I should express it...
In summary I want to have a string returned to me which has my someContent value inside the Customer-parenthesis.
Does anyone have any bright ideas of how to get this done?
Try this one (double the escaping backslashes for the use in java!)
(?<=Customer\()[^\)]*
And replace with your content.
See it here at Regexr
(?<=Customer\() is a lookbehind assertion. It checks at every position whether there is a "Customer(" on the left; if so, it matches all following characters that are not a ")" with [^\)]*. This is the part that will be replaced.
Some working java code
Pattern p = Pattern.compile("(?<=Customer\\()[^\\)]*");
String original = "Customer(someContent)";
String Replacement = "NewContent";
Matcher m = p.matcher(original);
String result = m.replaceAll(Replacement);
System.out.println(result);
This will print
Customer(NewContent)
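Because the new content is dynamic, it may contain $ or \, which replaceAll interprets as group references and escapes. A hedged sketch that guards against this with Matcher.quoteReplacement; the class ReplaceCustomer and the method withCustomer are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceCustomer {
    // Lookbehind for "Customer(", then everything up to the closing parenthesis.
    private static final Pattern CONTENT = Pattern.compile("(?<=Customer\\()[^)]*");

    // Replaces the text inside Customer(...) with newContent, taken literally.
    static String withCustomer(String text, String newContent) {
        return CONTENT.matcher(text).replaceAll(Matcher.quoteReplacement(newContent));
    }

    public static void main(String[] args) {
        System.out.println(withCustomer("Customer(someContent)", "NewContent")); // Customer(NewContent)
        // Without quoteReplacement, "$1" would be read as a reference to group 1
        System.out.println(withCustomer("Customer(someContent)", "$1 off"));     // Customer($1 off)
    }
}
```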
Using a capturing group with a non-greedy quantifier also works:
String s =
"OperatorSpecific(XXX)\n {\n" +
" Customer(someContent)\n" +
" SaveImage() {";
Pattern p = Pattern.compile("Customer\\((.*?)\\)");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
will print
someContent
Untested, but something like the following should work:
Pattern pattern = Pattern.compile("\\s+Customer\\(\\s*(\\w+)\\s*\\)\\s*");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) { // find() rather than matches(): the pattern covers only part of the input
System.out.println(matcher.group(1));
}
EDIT
This of course won't work in all possible cases. \w covers letters, digits and the underscore, so content with any other character is not matched:
// legal variable name that \w+ does not match
Customer($some_Content)
