Regex to match repeated punctuation in Java - java

I have an array of punctuation characters, punctuation = {'.', ',', '!', '?'}, and I want to create a regex that matches any word made up of those punctuation characters.
For example, some strings I want to find: "....???", "!!!!!......", "??.....!", and so on.
Thanks for any advice.

Use String.matches() with the POSIX character class for punctuation:
str.matches("\\p{Punct}+");
FYI, according to the Pattern javadoc, \p{Punct} is one of
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Also, the ^ and $ anchors aren't needed in the expression, because matches() must match the whole input to return true, so start and end are implied.
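To illustrate the point about implied anchors, here is a minimal sketch comparing matches() with an unanchored find(). The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class PunctMatchDemo {
    // matches() must consume the whole input, so no ^ or $ is required.
    static boolean isAllPunctuation(String s) {
        return s.matches("\\p{Punct}+");
    }

    // find() succeeds as soon as any run of punctuation occurs anywhere.
    static boolean containsPunctuation(String s) {
        return Pattern.compile("\\p{Punct}+").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllPunctuation("....???"));    // true
        System.out.println(isAllPunctuation("??..abc"));    // false
        System.out.println(containsPunctuation("??..abc")); // true
    }
}
```

If only the four characters from the question should count, the character class [.,!?] can be swapped in for \p{Punct}.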

Try this, it should match and group all the symbols written between []:
([.,!?]+)
Tested it with
??..,..!fsdgsdfgsdfgsdfg
And output was
??..,..!
Also tested with this:
String s = "??.....!fsdgsdfgsdfgsdfg?.,!0000a";
Pattern p = Pattern.compile("([.,!?]+)");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1));
}
And output was
??.....!
?.,!

You can use the POSIX character class for punctuation, \p{Punct}, and a while loop over the matches in your input, as such:
String test = "!...abcd??...!!efgh....!!??abc!";
Pattern pattern = Pattern.compile("\\p{Punct}{2,}");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
!...
??...!!
....!!??
Note: this only matches punctuation sequences of at least 2 characters (hence the last "!" is not matched, by design). To change the minimum length of the punctuation sequence, adjust the {2,} part of the Pattern.
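As a rough sketch of tuning the {2,} part, the minimum length can be passed in as a parameter. The helper name findRuns and the class name are hypothetical, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctRuns {
    // Collects every run of punctuation that is at least minLength characters long.
    static List<String> findRuns(String input, int minLength) {
        Pattern pattern = Pattern.compile("\\p{Punct}{" + minLength + ",}");
        Matcher matcher = pattern.matcher(input);
        List<String> runs = new ArrayList<>();
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String test = "!...abcd??...!!efgh....!!??abc!";
        System.out.println(findRuns(test, 2)); // [!..., ??...!!, ....!!??]
        System.out.println(findRuns(test, 1)); // [!..., ??...!!, ....!!??, !]
    }
}
```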

Related

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both the inputs
You need to capture all characters other than / after /test/:
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape / since it is not a special character and one needs no regex delimiters.
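A small sketch, under the assumption that both inputs from the question should be handled by one helper, showing the two patterns mentioned in this answer side by side. The names segment, name and firstGroup are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSegmentDemo {
    // Captures the segment after /test/, braces included.
    private static final Pattern WITH_BRACES = Pattern.compile("/test/([^/]+)");
    // Variant from the code comment in the answer: captures only the text inside the braces.
    private static final Pattern INSIDE_BRACES = Pattern.compile("/test/\\{([^/}]+)");

    // Returns group 1 of the first match, or null when the pattern is not found.
    static String firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    static String segment(String path) {
        return firstGroup(WITH_BRACES, path);
    }

    static String name(String path) {
        return firstGroup(INSIDE_BRACES, path);
    }

    public static void main(String[] args) {
        System.out.println(segment("/something/test/{abc}/listed")); // {abc}
        System.out.println(segment("/something/test/{abc}"));        // {abc}
        System.out.println(name("/something/test/{abc}/listed"));    // abc
    }
}
```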
This should work for you:
public static void main(String[] args) {
    String s1 = "/something/test/{abc}/listed";
    String s2 = "/something/test/{abc}";
    System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
    System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
Output:
{abc}
{abc}
Regex (as Java string, that is with doubled backslashes):
".*\\/test\\/([^/]*).*"

What is wrong with my regexp in Java

I want to get the word text2, but it returns null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to match every part of the text inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
Your regex doesn't match your string because you didn't specify the opening parentheses. Also, \\w+ matches only sequences of word characters, so it won't match the space or the & characters.
Instead you can use a negated character class, [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
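To make the failure concrete, here is a minimal sketch that runs the original pattern and the corrected one against the same string. The helper firstGroup and the class name are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetvarDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        // Original pattern: \w+ cannot match "((" right after SETVAR, so nothing is found.
        System.out.println(firstGroup("SETVAR\\w+&&(\\w+)'\\)\\)", str));         // null
        // Escaped parentheses plus a negated class up to the opening quote.
        System.out.println(firstGroup("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)", str)); // text2
    }
}
```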

Android Java regexp pattern

I ping a host and get the standard output as the result. Below is my REGEXP, but it does not work correctly. Where did I make a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
You only need \\d+ in your regex.
A Matcher looks for the pattern it was created from and tries to find every occurrence of that pattern in the string being matched.
Use while (matcher.find()) in case of multiple occurrences.
Each () represents a capturing group.
You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal digit; the "+" means 1 or more occurrences.
To get the "\" to the matcher, it needs to be escaped, and the escape character is also "\".
The brackets define the matching group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.
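The difference between the two escapings can be checked with a short sketch like the one below. The sample input "ttl=64 time=32ms" is a made-up stand-in for real ping output, and firstGroup is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String good = "time=(\\d+)ms";   // the regex engine sees: time=(\d+)ms
        String bad = "time=(\\\\d+)ms";  // the regex engine sees: time=(\\d+)ms
        System.out.println(firstGroup(good, "ttl=64 time=32ms")); // 32
        System.out.println(firstGroup(bad, "ttl=64 time=32ms"));  // null
        // The over-escaped pattern only matches a literal backslash followed by d's.
        System.out.println(firstGroup(bad, "time=\\ddms"));       // \dd
    }
}
```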

Java regex to match tuples

I need to extract tuples out of a string,
e.g. (1,1,A)(2,1,B)(1,1,C)(1,1,D)
and thought some regex like:
String tupleRegex = "(\\(\\d,\\d,\\w\\))*";
would work, but it just gives me the first tuple. What would be the proper regex to match all the tuples in the string?
Remove the * from the regex and iterate over the matches using a java.util.regex.Matcher:
String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
String tupleRegex = "(\\(\\d,\\d,\\w\\))";
Pattern pattern = Pattern.compile(tupleRegex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group());
}
The * character is a quantifier that matches zero or more tuples. Hence your original regex matches the entire input string as a single match instead of producing one match per tuple.
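A brief sketch of the difference, with hypothetical method names: without the * each find() yields one tuple, while with the * a single match spans the input and the capturing group keeps only the text of its last repetition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TupleDemo {
    // One find() per tuple when the pattern has no trailing quantifier.
    static List<String> allTuples(String input) {
        Matcher matcher = Pattern.compile("\\(\\d,\\d,\\w\\)").matcher(input);
        List<String> tuples = new ArrayList<>();
        while (matcher.find()) {
            tuples.add(matcher.group());
        }
        return tuples;
    }

    // With the trailing *, group 1 holds only the last repetition of the group.
    static String groupOneWithStar(String input) {
        Matcher matcher = Pattern.compile("(\\(\\d,\\d,\\w\\))*").matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
        System.out.println(allTuples(input));        // [(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
        System.out.println(groupOneWithStar(input)); // (1,1,D)
    }
}
```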
A one-line solution using the String.split() method, with the pattern (?!^\\()(?=\\():
Arrays.toString("(1,1,A)(2,1,B)(1,1,C)(1,1,D)".split("(?!^\\()(?=\\()"))
output:
[(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
Pattern explanation:
(?!        look ahead to see if there is not:
  ^        the beginning of the string
  \(       '('
)          end of look-ahead
(?=        look ahead to see if there is:
  \(       '('
)          end of look-ahead
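The split approach can be wrapped in a runnable sketch as follows. The second helper is an assumption on my part: on Java 8 and later, String.split is documented not to produce a leading empty string for a zero-width match at the start, so the shorter pattern (?=\\() should give the same result there. On older runtimes the full pattern is the safer choice.

```java
import java.util.Arrays;

class TupleSplitDemo {
    // Splits before every '(' except the one at the very start of the input.
    static String[] splitTuples(String input) {
        return input.split("(?!^\\()(?=\\()");
    }

    // Shorter variant; relies on Java 8+ dropping the leading empty string.
    static String[] splitTuplesShort(String input) {
        return input.split("(?=\\()");
    }

    public static void main(String[] args) {
        String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
        System.out.println(Arrays.toString(splitTuples(input)));
        System.out.println(Arrays.toString(splitTuplesShort(input)));
    }
}
```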

java Pattern Matching issue

I am having trouble writing a proper regex to match a URL.
String input = "AAAhttp://www.gmail.comBBBBabc@gmail.com";
String regex = "www.*.com"; // To match www.gmail.com URL
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
}
Here I want to remove the URL www.gmail.com. However, the match runs to the end of the string, so it also includes the email address, which ends with gmail.com.
Can someone help me find a proper regex that matches only the URL?
.* does a greedy match. You have to add ? after * to get a reluctant match.
"www\\..*?\\.com"
Your code would be,
String s = "AAAhttp://www.gmail.comBBBBabc@gmail.com";
Pattern p = Pattern.compile("www\\..*?\\.com");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
String regex = "www\\..*?\\.com";
Use non-greedy repetition of the wildcard '.', and escape the dot where it is meant literally.
A negated character class is faster than .*?
Use this regex:
www\.[^.]+\.com
[^.]+ means one or more characters that are not a dot.
In Java we need to escape some characters:
// for instance
Pattern regex = Pattern.compile("www\\.[^.]+\\.com");
// etc
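The three patterns discussed in this thread can be compared with a sketch along these lines. The helper names firstMatch and removeUrl are hypothetical, and removeUrl is only one possible way to do the removal the question asks about.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlMatchDemo {
    // Returns the first substring matched by regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Removes the URL using the negated character class variant.
    static String removeUrl(String input) {
        return input.replaceAll("www\\.[^.]+\\.com", "");
    }

    public static void main(String[] args) {
        String input = "AAAhttp://www.gmail.comBBBBabc@gmail.com";
        System.out.println(firstMatch("www.*.com", input));         // greedy: runs to the last "com"
        System.out.println(firstMatch("www\\..*?\\.com", input));   // www.gmail.com
        System.out.println(firstMatch("www\\.[^.]+\\.com", input)); // www.gmail.com
        System.out.println(removeUrl(input));                       // AAAhttp://BBBBabc@gmail.com
    }
}
```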
