regex expression in java using wildcards

Is there a way to use a regular expression with wildcards? Specifically, I have a String phrase and another String target. I would like to use the match method to find the first occurrence of the target in the phrase where the characters immediately before and after the target are anything other than a-z.
Updated:
Is there a way to use the String method matches() with the following regex:
"(?<![a-z])" + "hello" + "(?![a-z])";

You can use the regex, "(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z])"
Demo at regex101 with phrase = "hello".
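(?<![a-z]): Negative lookbehind; the match fails if the character immediately before it is in a-z
(?![a-z]): Negative lookahead; the match fails if the character immediately after it is in a-z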
Java Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class Main {
    public static void main(String[] args) {
        // Test
        String phrase = "hello";
        String regex = "(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z])";
        Pattern pattern = Pattern.compile(regex);
        Stream.of(
                "hi hello world",
                "hihelloworld"
        ).forEach(s -> {
            Matcher matcher = pattern.matcher(s);
            System.out.print(s + " => ");
            if (matcher.find()) {
                System.out.println("Match found");
            } else {
                System.out.println("No match found");
            }
        });
    }
}
Output:
hi hello world => Match found
hihelloworld => No match found
In case you want the full match, use the regex ".*(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z]).*" as demonstrated at regex101.com. The pattern .* means any character (except line terminators, by default) any number of times. The rest of the pattern is already explained above. The .* before and after the match lets the regex cover the whole string, which matches() requires.
Java Demo:
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class Main {
    public static void main(String[] args) {
        // Test
        String phrase = "hello";
        String regex = ".*(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z]).*";
        Stream.of(
                "hi hello world",
                "hihelloworld"
        ).forEach(s -> System.out.println(s + " => " + (s.matches(regex) ? "Match found" : "No match found")));
    }
}
Output:
hi hello world => Match found
hihelloworld => No match found
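Since the question asks for the first occurrence, here is a minimal sketch that returns its position instead of a yes/no answer. The class name FirstOccurrenceDemo and the helper indexOfStandalone are made up for illustration; the sketch assumes the same lookaround pattern as above and uses Matcher.start() to report where the match begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstOccurrenceDemo {
    // Hypothetical helper: returns the start index of the first occurrence of
    // target in phrase that is neither preceded nor followed by a-z, or -1.
    static int indexOfStandalone(String phrase, String target) {
        Pattern pattern = Pattern.compile("(?<![a-z])" + Pattern.quote(target) + "(?![a-z])");
        Matcher matcher = pattern.matcher(phrase);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfStandalone("hi hello world", "hello"));  // 3
        System.out.println(indexOfStandalone("hihelloworld", "hello"));    // -1
        System.out.println(indexOfStandalone("othello, hello!", "hello")); // 9
    }
}
```

The third call skips the "hello" inside "othello" because it is preceded by a lowercase letter.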

Related

How to capture lookbehind using java

I am trying to capture text that is matched by lookbehind.
My code :
private static final String t1="first:\\w*";
private static final String t2="(?<=\\w+)=\".+\"";
private static final String t=t1+'|'+t2;
Pattern p=Pattern.compile(t);
Matcher m=p.matcher("first:second=\"hello\"");
while(m.find())
    System.out.println(m.group());
The output:
first:second
="hello"
I expected:
first:second
second="hello"
How can I change my regex so that I could get what I expect.
Thank you

Why don't you just use one regex to match it all?
(first:)(\w+)(=".+")
And then simply use one match, and use the groups 1 and 2 for the first expected row and the groups 2 and 3 for the second expected row.
I modified your example so that it compiles and shows my approach:
package examples.stackoverflow.q71651411;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Q71651411 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(first:)(\\w+)(=\".+\")");
        Matcher m = p.matcher("first:second=\"hello\"");
        while (m.find()) {
            System.out.println("part 1: " + m.group(1) + m.group(2));
            System.out.println("part 2: " + m.group(2) + m.group(3));
        }
    }
}
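If keeping track of group numbers gets confusing, named groups are one possible variation on the same single-regex idea. This is only a sketch: the group names prefix, key and value, the class NamedGroupsDemo and the helper splitParts are my own choices, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Same regex as above, but each part gets a (hypothetical) name.
    private static final Pattern P =
            Pattern.compile("(?<prefix>first:)(?<key>\\w+)(?<value>=\".+\")");

    // Returns the two expected rows, or an empty array if there is no match.
    static String[] splitParts(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] {
                m.group("prefix") + m.group("key"),
                m.group("key") + m.group("value")
        };
    }

    public static void main(String[] args) {
        for (String part : splitParts("first:second=\"hello\"")) {
            System.out.println(part);
        }
        // prints:
        // first:second
        // second="hello"
    }
}
```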

Camel-Case to Sentence-Case in Java

I have the following code to convert a camel-case phrase to sentence-case. It works fine for almost all cases, but it can't handle acronyms. How can this code be corrected to work with acronyms?
private static final Pattern UPPERCASE_LETTER = Pattern.compile("([A-Z]|[0-9]+)");
static String toSentenceCase(String camelCaseString) {
    return camelCaseString.substring(0, 1).toUpperCase()
            + UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
            .replaceAll(matchResult -> " " + (matchResult.group(1).toLowerCase()));
}
JUnit5 test:
@ParameterizedTest(name = "#{index}: Convert {0} to sentence case")
@CsvSource(value = {"testOfAcronymUSA:Test of acronym USA"}, delimiter = ':')
void shouldSentenceCaseAcronym(String input, String expected) {
    //TODO: currently fails
    assertEquals(expected, toSentenceCase(input));
}
Output:
org.opentest4j.AssertionFailedError:
Expected :Test of acronym USA
Actual :Test of acronym u s a
I thought to add (?=[a-z]) to the end of the regex, but then it doesn't handle the spacing correctly.
I'm on Java 14.

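Change the regex to (?<=[a-z])[A-Z]+|[A-Z](?=[a-z])|[0-9]+ where
(?<=[a-z])[A-Z]+ matches one or more uppercase letters preceded by a lowercase letter (positive lookbehind for [a-z])
[A-Z](?=[a-z]) matches a single uppercase letter followed by a lowercase letter (positive lookahead for [a-z])
[0-9]+ matches one or more digits
Note that you do not need any capturing group.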
Demo:
import java.util.regex.Pattern;
public class Main {
    private static final Pattern UPPERCASE_LETTER = Pattern.compile("(?<=[a-z])[A-Z]+|[A-Z](?=[a-z])|[0-9]+");
    static String toSentenceCase(String camelCaseString) {
        return camelCaseString.substring(0, 1).toUpperCase() + UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
                .replaceAll(matchResult -> !matchResult.group().matches("[A-Z]{2,}")
                        ? " " + matchResult.group().toLowerCase()
                        : " " + matchResult.group());
    }
    public static void main(String[] args) {
        System.out.println(toSentenceCase("camelCaseString"));
        System.out.println(toSentenceCase("USA"));
        System.out.println(toSentenceCase("camelCaseStringUSA"));
    }
}
Output:
Camel case string
USA
Camel case string USA

To fix your immediate issue you may use
private static final Pattern UPPERCASE_LETTER = Pattern.compile("([A-Z]{2,})|([A-Z]|[0-9]+)");
static String toSentenceCase(String camelCaseString) {
    return camelCaseString.substring(0, 1).toUpperCase()
            + UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
            .replaceAll(m -> m.group(1) != null ? " " + m.group(1) : " " + m.group(2).toLowerCase() );
}
See the Java demo.
Details
([A-Z]{2,})|([A-Z]|[0-9]+) regex matches and captures into Group 1 two or more uppercase letters, or captures into Group 2 a single uppercase letter or 1+ digits
.replaceAll(m -> m.group(1) != null ? " " + m.group(1) : " " + m.group(2).toLowerCase() ) replaces with space + Group 1 if Group 1 matched, else with a space and Group 2 turned to lower case.
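To try the two-group approach locally, here is a self-contained sketch wrapping the method above in a class. The class name CamelCaseDemo and the sample inputs other than testOfAcronymUSA are my own; it assumes Java 9 or later, since Matcher.replaceAll with a function argument was added in that release.

```java
import java.util.regex.Pattern;

class CamelCaseDemo {
    // Group 1: a run of two or more uppercase letters (treated as an acronym).
    // Group 2: a single uppercase letter or a run of digits.
    private static final Pattern UPPERCASE_LETTER =
            Pattern.compile("([A-Z]{2,})|([A-Z]|[0-9]+)");

    static String toSentenceCase(String camelCaseString) {
        return camelCaseString.substring(0, 1).toUpperCase()
                + UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
                .replaceAll(m -> m.group(1) != null
                        ? " " + m.group(1)                 // keep acronyms as-is
                        : " " + m.group(2).toLowerCase()); // lower-case everything else
    }

    public static void main(String[] args) {
        System.out.println(toSentenceCase("testOfAcronymUSA")); // Test of acronym USA
        System.out.println(toSentenceCase("camelCaseString"));  // Camel case string
        System.out.println(toSentenceCase("version2Update"));   // Version 2 update
    }
}
```

One limitation to be aware of: an acronym followed directly by another capitalized word is matched as a single run of uppercase letters, so an input like parseHTTPResponse comes out as "Parse HTTPResponse".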

Regex Pattern required in java for matching string starts with '{{' and ends with "}}"

Hi,
I need to create a regex pattern that picks out substrings that start with '{{' and end with '}}' from a given string.
The pattern I have created treats strings starting with '{{{' the same as those starting with '{{', and likewise strings ending with '}}}' and '}}'.
Output of the code below:
matches = {{phone2}}
matches = {{phone3}}
matches = {{phone5}}
Expected output:
matches = {{phone5}}
I need only the strings wrapped in exactly two consecutive '{' and '}' characters, not three.
The code is shared below:
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] args) {
        String text = "<test>{{#phone1}}{{{phone3}}}{{/phone4}} {{phone5}}></test>";
        //String pattern = "\\{\\{\\s*?(\\w*?)\\s*?(?!.*\\}\\}\\}$)";
        String pattern = "\\{\\{\\s*?(\\w*?)\\s*?}}";
        Pattern placeholderPattern = Pattern.compile(pattern);
        Matcher placeholderMatcher = placeholderPattern.matcher(text);
        while (placeholderMatcher.find()) {
            System.out.println("matches = " + placeholderMatcher.group());
        }
    }
}

You may use
String pattern = "(?<!\\{)\\{{2}\\s*(\\w*)\\s*\\}{2}(?!\\})";
Or, if empty or blank {{...}} are not expected, use
String pattern = "(?<!\\{)\\{{2}\\s*(\\w+)\\s*\\}{2}(?!\\})";
See the regex demo.
Details
(?<!\{) - a negative lookbehind failing the match if there is a { char immediately to the left of the current location
\{{2} - {{ substring
\s* - 0+ whitespaces
(\w*) - Group 1: zero or more word chars (one or more if the + quantifier is used)
\s* - 0+ whitespaces
\}{2} - }} string
(?!\}) - a negative lookahead that fails the match if there is a } char immediately to the right of the current location.
See the Java demo:
String text = "<test>{{#phone1}}{{{phone3}}}{{/phone4}} {{phone5}}></test>";
String pattern = "(?<!\\{)\\{{2}\\s*(\\w*)\\s*\\}{2}(?!\\})";
Pattern placeholderPattern = Pattern.compile(pattern);
Matcher placeholderMatcher = placeholderPattern.matcher(text);
while (placeholderMatcher.find()) {
    System.out.println("Match: " + placeholderMatcher.group());
    System.out.println("Group 1: " + placeholderMatcher.group(1));
}
Output:
Match: {{phone5}}
Group 1: phone5
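To see how the two lookarounds behave on edge cases, here is a small sketch that collects only the placeholder names. The class PlaceholderDemo, the helper placeholderNames and the test string are invented for illustration; the pattern is the \w+ variant from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {
    // Exactly two braces on each side; a third brace on either side rejects the match.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("(?<!\\{)\\{{2}\\s*(\\w+)\\s*\\}{2}(?!\\})");

    // Hypothetical helper: returns the Group 1 value of every match, in order.
    static List<String> placeholderNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // {{{b}}} has three braces, {{d}}} has an extra closing brace,
        // and {{#e}} contains a non-word character, so only a and c survive.
        System.out.println(placeholderNames("{{a}} {{{b}}} {{ c }} {{d}}} {{#e}}")); // [a, c]
    }
}
```").equals(java.util.List.of("phone5"));
</test>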

Why Regular expression matches one character less at the end?

The problem is that the last character never gets matched.
When I display the match using group(), it shows the whole match except for the last character.
It is the same in all cases.
Below is the code and its output.
package mon;
import java.util.*;
import java.util.regex.*;
class HackerRank {
    static void Pattern(String text) {
        String p="\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
        String pattern="(("+p+")\\.){3}"+p;
        Pattern pi=Pattern.compile(pattern);
        Matcher m=pi.matcher(text);
        // System.out.println(m.group());
        if(m.find() && m.group().equals(text))
            System.out.println(m.group()+"true");
        else
            System.out.println(m.group()+" false");
    }
    public static void main(String[] args) {
        Scanner sc=new Scanner(System.in);
        while(sc.hasNext()) {
            Pattern(sc.next());
        }
        sc.close();
    }
}
I/P:000.12.12.034;
O/P:000.12.12.03 false

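The trailing p in your pattern is not grouped, so its | alternation splits the whole expression at the top level; find() then succeeds on the first alternative, which ends with \d{1,2} and therefore stops after two digits of the last octet. You should properly group the alternatives inside the octet pattern:
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
// note the non-capturing group (?:...) now wrapping the alternatives
Then build the pattern like
String pattern = p + "(?:\\." + p + "){3}";
It will become a bit more efficient. Then, use matches to require a full string match: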
if(m.matches()) {...
See a Java demo:
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
String pattern = p + "(?:\\." + p + "){3}";
String text = "192.156.34.56";
// System.out.println(pattern); => (?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])(?:\.(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])){3}
Pattern pi=Pattern.compile(pattern);
Matcher m=pi.matcher(text);
if(m.matches())
    System.out.println(m.group()+" => true"); // prints: 192.156.34.56 => true
else
    System.out.println("False");
And here is the resulting regex demo.
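To see the effect of the missing group side by side, here is a sketch that runs the question's construction and the grouped one on the same input. The class OctetGroupingDemo and the helpers firstFind and isIpv4 are names I made up; the regex pieces are taken from the question and the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OctetGroupingDemo {
    // Octet alternatives as in the question: no group around them.
    static final String UNGROUPED = "\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
    // The same alternatives wrapped in a non-capturing group.
    static final String GROUPED = "(?:" + UNGROUPED + ")";

    // The question's construction: the trailing octet is left ungrouped.
    static String firstFind(String text) {
        Matcher m = Pattern.compile("((" + UNGROUPED + ")\\.){3}" + UNGROUPED).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Grouped construction plus a full-string match.
    static boolean isIpv4(String text) {
        return Pattern.matches(GROUPED + "(?:\\." + GROUPED + "){3}", text);
    }

    public static void main(String[] args) {
        System.out.println(firstFind("000.12.12.034")); // 000.12.12.03
        System.out.println(isIpv4("000.12.12.034"));    // true
        System.out.println(isIpv4("256.1.1.1"));        // false
    }
}
```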

Regex back reference to match a number (or any char sequence) with itself

I am missing something basic here. I have this regex (.*)=\1 and I am using it to match 100=100, and it's failing. When I remove the back reference from the regex and continue to use the capturing group, it shows that the captured group is '100'. Why does it not work when I try to use the back reference?
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] args) {
        String eqPattern = "(.*)=\1";
        String input[] = {"1=1"};
        testAndPrint(eqPattern, input); // this does not work
        eqPattern = "(.*)=";
        input = new String[]{"1=1"};
        testAndPrint(eqPattern, input); // this works when the backreference is removed from the expr
    }
    static void testAndPrint(String regexPattern, String[] input) {
        System.out.println("\n Regex pattern is "+regexPattern);
        Pattern p = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
        boolean found = false;
        for (String str : input) {
            System.out.println("Testing "+str);
            Matcher matcher = p.matcher(str);
            while (matcher.find()) {
                System.out.println("I found the text "+ matcher.group() +" starting at " + "index "+ matcher.start()+" and ending at index "+matcher.end());
                found = true;
                System.out.println("Group captured "+matcher.group(1));
            }
            if (!found) {
                System.out.println("No match found");
            }
        }
    }
}
When I run this, I get the following output
Regex pattern is (.*)=\1
Testing 100=100
No match found
Regex pattern is (.*)=
Testing 100=100
I found the text 100= starting at index 0 and ending at index 4
Group captured 100 --> If the group contains 100, why doesn't it match when I add \1 above?

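You have to escape the backslash in the pattern string. In a Java string literal, "\1" is an octal escape for the character U+0001, so the regex engine never sees a backreference.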
String eqPattern = "(.*)=\\1";

I think you need to escape the backslash.
String eqPattern = "(.*)=\\1";
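Here is a quick sketch to check this yourself. The class BackreferenceDemo and the helper sameOnBothSides are hypothetical names; the sketch uses matches() so that the whole input has to have the form X=X.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // "\\1" in Java source becomes the two characters \1 in the regex,
    // which the regex engine reads as a backreference to group 1.
    private static final Pattern SAME_ON_BOTH_SIDES = Pattern.compile("(.*)=\\1");

    static boolean sameOnBothSides(String input) {
        return SAME_ON_BOTH_SIDES.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without the extra backslash, "\1" is a single character with code 1.
        System.out.println("\1".length());        // 1
        System.out.println((int) "\1".charAt(0)); // 1

        System.out.println(sameOnBothSides("100=100")); // true
        System.out.println(sameOnBothSides("100=200")); // false
    }
}
```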
