Need a regex pattern to split a String - Java

How can I split the following string
String str = "(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1)";
so that I get
string 1 = "obj.userAge EQUALS 51";
string 2 = "obj.userAddress CONTAINS STREET1";
I tried
str.split("AND")
but I need the strings without the brackets.
Sometimes I get a string from the database like
String str = "(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1) OR (obj2.salary >= 3000)";
so now an OR is added as well.

Try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args) {
String str = "(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1)";
Pattern pattern = Pattern.compile("\\((.+?)\\)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Output:
obj.userAge EQUALS 51
obj.userAddress CONTAINS STREET1
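Because this loop only looks at the parentheses, it should also cope with the OR variant from the question. Below is a minimal sketch that wraps the same pattern in a helper; the class and method names are illustrative only, and it assumes conditions never contain nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class ConditionExtractor {
    // Lazy (.+?) stops at the first ')' after each '(', so every
    // parenthesised condition is captured separately in group 1.
    // Assumes no nested parentheses inside a condition.
    private static final Pattern CONDITION = Pattern.compile("\\((.+?)\\)");

    // Returns the text inside each pair of parentheses,
    // ignoring whatever connects them (AND, OR, ...).
    public static List<String> conditions(String str) {
        List<String> result = new ArrayList<>();
        Matcher matcher = CONDITION.matcher(str);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1) OR (obj2.salary >= 3000)";
        for (String condition : conditions(str)) {
            System.out.println(condition);
        }
    }
}
```

The connectors themselves are simply skipped; if you also need to know whether AND or OR joined two conditions, you would have to capture the text between the matches as well.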

You can match the contents of each pair of brackets with this regex (written as a Java string literal):
(?<=\\().*?(?=\\))
. matches any single character (by default, except line terminators).
* is a quantifier that matches the preceding element 0 or more times.
.*? matches 0 or more characters lazily, i.e. as few as possible.
(?<=...) is a positive lookbehind, which checks for a pattern before the current position. So (?<=a)b matches b only if it is preceded by a.
(?=...) is a positive lookahead, which checks for a pattern after the current position. So a(?=b) matches a only if it is followed by b.
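Since the lookarounds do not consume the brackets, the whole match is already the text you want, so no capturing group is needed. A small sketch under that reading (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class LookaroundExtractor {
    // (?<=\()  position must be preceded by '(' (not consumed)
    // .*?      as few characters as possible
    // (?=\))   position must be followed by ')' (not consumed)
    private static final Pattern INSIDE_PARENS = Pattern.compile("(?<=\\().*?(?=\\))");

    public static List<String> inside(String str) {
        List<String> result = new ArrayList<>();
        Matcher matcher = INSIDE_PARENS.matcher(str);
        while (matcher.find()) {
            // group() is the whole match; the parentheses lie outside it.
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(inside("(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1)"));
    }
}
```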

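This regex creates one capturing group for the contents of each pair of brackets, which you can then retrieve with group(1) and group(2): \((.*)\)\sAND\s\((.*)\) Note that it assumes exactly two conditions joined by a single AND.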
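A sketch of that two-group regex used with Matcher#matches() (the names are illustrative). It also shows the limitation: with the OR string the expression still matches, but the greedy second group swallows the rest of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class AndPairExtractor {
    // Exactly two parenthesised conditions joined by a single " AND ".
    private static final Pattern AND_PAIR = Pattern.compile("\\((.*)\\)\\sAND\\s\\((.*)\\)");

    // Returns {left, right}, or null when the whole string does not have that shape.
    public static String[] split(String str) {
        Matcher matcher = AND_PAIR.matcher(str);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1)");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

For input with more than two conditions, a find() loop over the individual parentheses is the more robust choice.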

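And one more (just for fun), using Guava, where input is the string to split:
Iterable<String> result = Splitter.on(Pattern.compile("\\(|\\)|AND")).omitEmptyStrings().trimResults().split(input);
Note that this also splits on AND occurring inside a condition (for example in STANDARD), and it does not handle OR.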
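If Guava is not on the classpath, roughly the same effect can be sketched with the JDK alone: Pattern#split followed by trimming and dropping empty pieces. This is a swapped-in JDK equivalent, not Guava's API, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical JDK-only analogue of the Guava Splitter chain, for illustration only.
public class JdkSplitter {
    // Same separator as the Guava answer: '(' or ')' or the literal AND.
    private static final Pattern SEPARATOR = Pattern.compile("\\(|\\)|AND");

    public static List<String> split(String input) {
        return Arrays.stream(SEPARATOR.split(input))
                .map(String::trim)          // plays the role of trimResults()
                .filter(s -> !s.isEmpty())  // plays the role of omitEmptyStrings(), after trimming
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(split("(obj.userAge EQUALS 51) AND (obj.userAddress CONTAINS STREET1)"));
    }
}
```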

Related

How to use Pattern, Matcher in Java regex API to remove a specific line

I have a complicated string to split. I need to remove the comments and spaces and keep all the numbers, but break every other string into single characters. If the - sign is at the start and followed by a number, treat it as a negative number rather than an operator.
The comment has the form ?<space>comments<space>? (where comments is a placeholder).
Input :
-122+2 ? comments ?sa b
-122+2 ? blabla ?sa b
output :
["-122","+","2","?","s","a","b"]
(every string split into characters, with no spaces and no comments)
Replace the unwanted part, matched by \s*\?\s*\w+\s*(?=\?), with "". You can chain another String#replaceAll call to remove any remaining whitespace. Note that (?=...) is a positive lookahead; here it means \s*\?\s*\w+\s* followed by a ?, without consuming that ?. As a reminder, \s matches a whitespace character and \w matches a word character.
Then you can use the regex ^-\d+|\d+|\D, which matches either a negative integer at the beginning (^-\d+), or a run of digits (\d+), or a single non-digit (\D).
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "-122+2 ? comments ?sa b";
str = str.replaceAll("\\s*\\?\\s*\\w+\\s*(?=\\?)", "").replaceAll("\\s+", "");
Pattern pattern = Pattern.compile("^-\\d+|\\d+|\\D");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
-122
+
2
?
s
a
b
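A sketch that wraps the same two steps in a reusable method and runs them on both sample inputs; the class and method names are made up for illustration. As written, the comment pattern assumes the comment is a single word (\w+), so a comment such as ? two words ? would not be removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class CommentTokenizer {
    // Optional spaces, '?', optional spaces, ONE word, optional spaces,
    // and then (not consumed) the closing '?'.
    private static final Pattern COMMENT = Pattern.compile("\\s*\\?\\s*\\w+\\s*(?=\\?)");
    // A negative integer at the very start, or any integer, or a single non-digit.
    private static final Pattern TOKEN = Pattern.compile("^-\\d+|\\d+|\\D");

    public static List<String> tokenize(String input) {
        // Step 1: drop the comment, then every remaining whitespace run.
        String cleaned = COMMENT.matcher(input).replaceAll("").replaceAll("\\s+", "");
        // Step 2: collect the tokens in order.
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(cleaned);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-122+2 ? comments ?sa b"));
        System.out.println(tokenize("-122+2 ? blabla ?sa b"));
    }
}
```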

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both inputs
You need to capture all characters other than / after /test/:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape /: it is not a special character, and Java uses no regex delimiters.
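A sketch checking both patterns from this answer against both inputs from the question; the class and helper names are hypothetical. The first pattern keeps the braces, while the commented variant captures only the text between them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class SegmentExtractor {
    // Everything after /test/ up to the next '/' (or the end of the string).
    public static final Pattern SEGMENT = Pattern.compile("/test/([^/]+)");
    // Variant from the code comment: only the text after '{' up to '}' or '/'.
    public static final Pattern BRACED = Pattern.compile("/test/\\{([^/}]+)");

    // Returns group 1 of the first match, or null if there is none.
    public static String firstGroup(Pattern pattern, String s) {
        Matcher matcher = pattern.matcher(s);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "/something/test/{abc}/listed", "/something/test/{abc}" }) {
            System.out.println(firstGroup(SEGMENT, s) + " / " + firstGroup(BRACED, s));
        }
    }
}
```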
This should work for you:
public static void main(String[] args) {
String s1 = "/something/test/{abc}/listed";
String s2 = "/something/test/{abc}";
System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
Output:
{abc}
{abc}
Regex (as a Java string literal, that is, with doubled backslashes), to be used with Matcher#matches() and group(1), or with String#replaceAll(regex, "$1"):
".*\\/test\\/([^/]*).*"
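Because this regex begins and ends with .*, it is meant to match the entire string, which is what Matcher#matches() requires. A minimal sketch (the class and method names are hypothetical); the escaped slashes are kept as in the answer, even though they are unnecessary in Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class WholeStringExtractor {
    // The answer's regex; \\/ is a harmless (but unneeded) escape of '/'.
    private static final Pattern WHOLE = Pattern.compile(".*\\/test\\/([^/]*).*");

    public static String extract(String s) {
        Matcher matcher = WHOLE.matcher(s);
        // matches() must cover the whole input, hence the leading and trailing .*
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("/something/test/{abc}/listed"));
        System.out.println(extract("/something/test/{abc}"));
    }
}
```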

What is wrong with my regexp in Java?

I want to get the word text2, but it returns null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to match everything inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to skip ahead to the value in single quotes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
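The breakdown above can be checked with a short sketch that applies the [^()]* version to the string from the question; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class SetvarExtractor {
    // SETVAR, two '(', anything except parentheses, then '&&<word>' in
    // single quotes (the word is group 1), then two ')'.
    private static final Pattern SETVAR = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");

    public static String extract(String str) {
        Matcher matcher = SETVAR.matcher(str);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Text SETVAR((&&text1 '&&text2'))"));
    }
}
```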
Your regex doesn't match your string because you didn't account for the opening parentheses. Also, \\w+ matches only word characters, so it won't match the parentheses, the space, or the & characters.
Instead, you can use the negated character class [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
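To see the difference, here is a sketch that runs the question's original pattern and the [^']+ fix side by side (class and helper names are hypothetical). The original never matches because \w+ directly after SETVAR cannot match '('.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only.
public class SetvarComparison {
    // The question's pattern: fails, since '(' follows SETVAR and is not a word character.
    public static final Pattern ORIGINAL = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
    // The fix: literal "((", then [^']+ for everything up to the first single quote.
    public static final Pattern FIXED = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");

    // Returns group 1 of the first match, or null if there is none.
    public static String firstGroup(Pattern pattern, String s) {
        Matcher matcher = pattern.matcher(s);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        System.out.println("original: " + firstGroup(ORIGINAL, str));
        System.out.println("fixed:    " + firstGroup(FIXED, str));
    }
}
```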

Extracting both matching and not matching regex

I have a String like abc3a de'f gHi?jk and I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other words, I want to return the substrings that match the regular expression [a-zA-Z0-9'] and the substrings that do not match it. If there is a way to tell whether each resulting substring is a match or not, that would be a plus.
Thanks!
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want the whitespace:
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk
You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
Output:
abc3a
de'f
gHi
?
jk
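The question also asks whether each piece can be tagged as a match or not. One possible sketch puts each alternative of this regex in its own capturing group and checks which one participated. Two assumptions here: the class name is hypothetical, and \s replaces the literal space so that any whitespace is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class TaggedTokens {
    // Group 1: a run of characters from the set.
    // Group 2: a run of other non-whitespace characters (\s used instead of ' ').
    private static final Pattern TOKEN = Pattern.compile("([a-zA-Z0-9']+)|([^a-zA-Z0-9'\\s]+)");

    public static List<String> tag(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Only one alternative participates, so the other group is null.
            if (m.group(1) != null) {
                out.add("MATCH:" + m.group(1));
            } else {
                out.add("MISS:" + m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        tag("abc3a de'f gHi?jk").forEach(System.out::println);
    }
}
```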
myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at every boundary between a run of characters in that set and a run of characters outside it.
In each alternative, the lookbehind (?<=...) checks the character before the split position and the lookahead (?=...) checks the character after it, so together they match the zero-width boundary between an in-set character and an out-of-set, non-whitespace character (in either order).
The \\s+ alternative is not a boundary match: it matches a run of whitespace characters, which has the effect of removing whitespace from the result entirely.
The | alternation lets the split happen at either kind of boundary or at a run of whitespace.
Since both lookarounds are positive (each needs an actual character on its side), the boundaries never match at the start or end of the string, so there is no need to discard empty strings from the output unless the string starts with whitespace (String#split already drops trailing empty strings).
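A quick sketch of this split on the sample string, plus an input with a leading space to show the one case where an empty string does appear (the class name is hypothetical):

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class BoundarySplit {
    // Split on whitespace runs, or on the zero-width boundary between an in-set
    // character and an out-of-set non-whitespace character (in either order).
    private static final String BOUNDARIES =
            "\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])";

    public static String[] split(String s) {
        return s.split(BOUNDARIES);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("abc3a de'f gHi?jk")));
        // A leading whitespace run is a positive-width match at the start,
        // so split() keeps an empty first element here.
        System.out.println(Arrays.toString(split(" abc3a")));
    }
}
```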
You can use lookarounds to split:
import java.util.ArrayList;
import java.util.List;
private static String[] splitString(final String s) {
final String[] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final List<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means xxx follows this position, and (?<=xxx) means xxx precedes it.
Since you don't want whitespace-only pieces in the result, you need to filter the array returned by split.

Returning substring without markers using Pattern&Matcher

I want to use Pattern and Matcher to return the following string as multiple variables.
ArrayList <Pattern> pArray = new ArrayList <Pattern>();
pArray.add(Pattern.compile("\\[[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}\\]"));
pArray.add(Pattern.compile("\\[\\d{1,5}\\]"));
pArray.add(Pattern.compile("\\[[a-zA-Z[^#0-9]]+\\]"));
pArray.add(Pattern.compile("\\[#.+\\]"));
pArray.add(Pattern.compile("\\[[0-9]{10}\\]"));
Matcher iMatcher;
String infoString = "[03/12/13 10:00][30][John Smith][5554215445][#Comment]";
for (int i = 0 ; i < pArray.size() ; i++)
{
//out.println(pArray.get(i).toString());
iMatcher = pArray.get(i).matcher(infoString);
while (iMatcher.find())
{
String found = iMatcher.group();
out.println(found.substring(1, found.length()-1));
}
}
}
the program outputs:
[03/12/13 10:00]
[30]
[John Smith]
[#Comment]
[5554215445]
The only thing I need is to have the program not print the brackets and the # character.
I can easily avoid printing the brackets by using substrings inside the loop, but I cannot avoid the # character. # is only a comment identifier in the string.
Can this be done inside the loop?
How about this?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args) {
String infoString = "[03/12/13 10:00][30][John Smith][5554215445][#Comment]";
final Pattern pattern = Pattern.compile("\\[#?(.+?)\\]");
final Matcher matcher = pattern.matcher(infoString);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
You just need to make the .+ non-greedy (.+?) so that it matches everything between a single pair of square brackets. We then use a capturing group to grab what we want rather than using the whole match; a capturing group is written as (pattern). The #? matches an optional hash before the group so that it doesn't end up inside the group.
The group's contents are retrieved with matcher.group(1).
Output:
03/12/13 10:00
30
John Smith
5554215445
Comment
Use lookarounds, i.e. replace every \\[ in your regexes with a positive lookbehind:
(?<=\\[)
then replace every \\] with a positive lookahead:
(?=\\])
and finally replace \\[# with the positive lookbehind:
(?<=\\[#)
Because lookarounds are zero-width, group() then returns the text without the brackets and without the #.
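A sketch of the question's pattern list after those substitutions. One assumption to flag: the name pattern is simplified here to [a-zA-Z ]+ instead of the question's [a-zA-Z[^#0-9]] class, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class LookaroundFields {
    // Each \\[ became (?<=\\[), each \\] became (?=\\]), and \\[# became (?<=\\[#).
    // The name pattern is SIMPLIFIED to [a-zA-Z ]+ (an assumption, not the question's class).
    static final List<Pattern> PATTERNS = Arrays.asList(
            Pattern.compile("(?<=\\[)[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}(?=\\])"),
            Pattern.compile("(?<=\\[)\\d{1,5}(?=\\])"),
            Pattern.compile("(?<=\\[)[a-zA-Z ]+(?=\\])"),
            Pattern.compile("(?<=\\[#).+?(?=\\])"),
            Pattern.compile("(?<=\\[)[0-9]{10}(?=\\])"));

    public static List<String> fields(String info) {
        List<String> found = new ArrayList<>();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(info);
            while (m.find()) {
                // The brackets and '#' are outside the match, so no substring() is needed.
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        fields("[03/12/13 10:00][30][John Smith][5554215445][#Comment]").forEach(System.out::println);
    }
}
```

As in the question, results come out in pattern order, which is why the phone number is printed after the comment.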
