java regex matching &[text] - java

I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text] and need to get at the text. Strings could look like: something &[text] &[text] anything &[text] etc.
I have tried the following regex but I cannot seem to get it to work: &\[(.*)\]
Any help would be greatly appreciated.

Brackets are a bit tricky regarding escaping. Try this:
Pattern r = Pattern.compile("&\\[([^\\]]*)\\]");
Matcher m = r.matcher("foo &[bla] [foo] &[blub]&[blab]");
while (m.find()) {
System.out.println("Found value: " + m.group(1));
}
I replaced your dot with a group of any sign that is not a closing bracket. The star operator would otherwise greedily match until the very end of the string. You could also suppress the greedy matching with a question mark, this reads even better: "&\\[(.*?)\\]"

Two things you need to do:
Double escape your square brackets
Prevent the capture group from matching other occurrences of the pattern, by preventing it from matching an opening or a closing bracket
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "&[test] something ] something &[test2]";
Pattern pattern = Pattern.compile("&\\[([^\\[\\]]*)\\]");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println("capture group: " + matcher.group(1));
}
}
}

Related

Need regex help - How to extract a String using regex?

I am looking for regex to extract a string from another string.
"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED
storeKey=true principal='test#test.net' validate=true serviceName=esaas
keyTab='<some value>' useKeyTab=true;"
How to I extract the string after keyTab= I want to retrieve the value inside the single quotes -
Use the regex keyTab='(.*?)' and match the group 1. In java, your code should look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String content = "\"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED \r\n" +
"storeKey=true principal='test#test.net' validate=true serviceName=esaas \r\n" +
"keyTab='<some value>' useKeyTab=true;\"";
Pattern pattern = Pattern.compile("keyTab='(.*?)'");
Matcher matcher = pattern.matcher(content);
matcher.find();
System.out.println(matcher.group(1)); //<some value>
}
}
Something that will work in most regex engine is to look for both the thing you want and the thing before it.
And put the thing you want in a capture group
This regex will put what's between the quotes in capture group \1
\bkeyTab=\'([^\']*)\'
The \b is a word boundary to make sure keyTab isn't part of a larger word.
You can use this expression to find it:
keyTab='(.*?)'
It will find all the values around keyTab='...', but will only capture what is between the quotes.
[\n\r].*keyTab='\s*([^\n\r]*)'
Your desired match will be in capture group 1.

Java Regex capture nested matches

I am having trouble with regex here.
Say i have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 out of the string, but all i get is
8094.21.1
I think my problem here is, that 21.1.2014 starts within the (wrong) match before. Is there a simple way to make the matcher look for the next match not after the end of the match before but one character after the beginning of the match before?
You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Working demo
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
If the date is always surrounded by dot:
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.
I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
String pattern = "[0-9]{2}.[0-9]{1}.[0-9]{4}";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(x);
if (m.find( )) {
System.out.println("Found value: " + m.group() );
}else {
System.out.println("NO MATCH");
}
}

regex for letters or numbers in brackets

I am using Java to process text using regular expressions. I am using the following regular expression
^[\([0-9a-zA-Z]+\)\s]+
to match one or more letters or numbers in parentheses one or more times. For instance, I like to match
(aaa) (bb) (11) (AA) (iv)
or
(111) (aaaa) (i) (V)
I tested this regular expression on http://java-regex-tester.appspot.com/ and it is working. But when I use it in my code, the code does not compile. Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("^[\([0-9a-zA-Z]+\)\s]+");
String[] words = pattern.split("(a) (1) (c) (xii) (A) (12) (ii)");
String w = pattern.
for(String s:words){
System.out.println(s);
}
}
}
I tried to use \ instead of \ but the regex gave different results than what I expected (it matches only one group like (aaa) not multiple groups like (aaa) (111) (ii).
Two questions:
How can I fix this regex and be able to match multiple groups?
How can I get the individual matches separately (like (aaa) alone and then (111) and so on). I tried pattern.split but did not work for me.
Firstly, you want to escape any backslashes in the quotation marks with another backslash. The Regex will treat it as a single backslash. (E.g. call a word character \w in quotation marks, etc.)
Secondly, you got to finish the line that reads:
String w = pattern.
That line explains why it doesn't compile.
Here is my final solution to match the individual groups of letters/numbers in brackets that appear at the beginning of a line and ignore the rest
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
static ArrayList<String> listOfEnums;
public static void main(String[] args) {
listOfEnums = new ArrayList<String>();
Pattern pattern = Pattern.compile("^\\([0-9a-zA-Z^]+\\)");
String p = "(a) (1) (c) (xii) (A) (12) (ii) and the good news (1)";
Matcher matcher = pattern.matcher(p);
boolean isMatch = matcher.find();
int index = 0;
//once you find a match, remove it and store it in the arrayList.
while (isMatch) {
String s = matcher.group();
System.out.println(s);
//Store it in an array
listOfEnums.add(s);
//Remove it from the beginning of the string.
p = p.substring(listOfEnums.get(index).length(), p.length()).trim();
matcher = pattern.matcher(p);
isMatch = matcher.find();
index++;
}
}
}
1) Your regex is incorrect. You want to match individual groups of letters / numbers in brackets, and the current regex will match only a single string of one or more such groups. I.e. it will match
(abc) (def) (123)
as a single group rather than three separate groups.
A better regex that would match only up to the closing bracket would be
\([0-9a-zA-Z^\)]+\)
2) Java requires you to escape all backslashes with another backslash
3) The split() method will not do what you want. It will find all matches in your string then throw them away and return an array of what is left over. You want to use matcher() instead
Pattern pattern = Pattern.compile("\\([0-9a-zA-Z^\\)]+\\)");
Matcher matcher = pattern.matcher("(a) (1) (c) (xii) (A) (12) (ii)");
while (matcher.find()) {
System.out.println(matcher.group());
}

Extracting both matching and not matching regex

I have a String like this one abc3a de'f gHi?jk I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other terms, I want to return Strings that match the regular expression [a-zA-Z0-9'] and the Strings that do not match this regular expression. If there is a way to tell whether each resulting substring is a match or not, this will be a plus.
Thanks!
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want white space.
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk
You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
OUTPUT
abc3a
de'f
gHi
?
jk
myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at all the boundaries between runs of characters in that charset.
The lookbehind (?<=...) matches after a character in a run, while the lookahead (?=...) matches before a character in a run of characters outside the set.
The \\s+ is not a boundary match, and matches a run of whitespace characters. This has the effect of removing white-space from the result entirely.
The | allows causing splitting to happy at either boundary or at a run of white-space.
Since the lookbehind and lookahead are both positive, the boundaries will not match at the start or end of the string, so there's no need to ignore empty strings in the output unless there is white-space there.
You can use anchors to split
private static String[] splitString(final String s) {
final String [] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final ArrayList<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means xxx will follow here and (?<=xxx) mean xxx precedes this position.
As you did not want to include all-whitespace-matches into the result you need to filter the Array given by split.

Java regex patterns

I need help with this matter. Look at the following regex:
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);
I want to look for words like this: "home-made", "aaaa-bbb" and not "aaa - bbb", but not
"aaa--aa--aaa". Basically, I want the following:
word - hyphen - word.
It is working for everything, except this pattern will pass: "aaa--aaa--aaa" and shouldn't. What regex will work for this pattern?
Can can remove the backslash from your expression:
"[A-Za-z]+-[A-Za-z]+"
The following code should work then
Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
match = matcher.matches();
Note that you can use Matcher.matches() instead of Matcher.find() in order to check the complete string for a match.
If instead you want to look inside a string using Matcher.find() you can use the expression
"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"
but note that then only words separated by whitespace will be found (i.e. no words like aaa-bbb.). To capture also this case you can then use lookbehinds and lookaheads:
"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"
which will read
(?<![A-Za-z-]) // before the match there must not be and A-Z or -
[A-Za-z]+ // the match itself consists of one or more A-Z
- // followed by a -
[A-Za-z]+ // followed by one or more A-Z
(?![A-Za-z-]) // but afterwards not by any A-Z or -
An example:
Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
System.out.println(matcher.group()); // => home-made
}
Actually I can't reproduce the problem mentioned with your expression, if I use single words in the String. As cleared up with the discussion in the comments though, the String s contains a whole sentence to be first tokenised in words and then matched or not.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExp {
private static void match(String s) {
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("'" + s + "' match");
} else {
System.out.println("'" + s + "' doesn't match");
}
}
/**
* #param args
*/
public static void main(String[] args) {
match(" -home-made");
match("home-made");
match("aaaa-bbb");
match("aaa - bbb");
match("aaa--aa--aaa");
match("home--home-home");
}
}
The output is:
' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match

Categories

Resources