Extract two substrings from a String surrounded by a special string (Java) - java

I have a String like this:
I am a !!!guy!!! but I like !!!cats!!! better than dogs.
I need the strings within the exclamation Strings (!!!), a collection of Strings or array will do.
I can probably do this a dirty way with String's substring and indexOf, but if you can suggest a better way with regular expressions or just cleaner code that would be much appreciated.
Thanks.

You can use a simple regex like this:
!!!(.*?)!!!
And then grab the capturing group content
Match information
MATCH 1
1. [10-13] `guy`
MATCH 2
1. [31-35] `cats`
You can use something like this Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "I am a !!!guy!!! but I like !!!cats!!! better than dogs.";
        String pattern = "!!!(.*?)!!!";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            // If you want an array or list, collect m.group(1) here instead of printing it.
            System.out.println("Found value: " + m.group(1));
        }
    }
}
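To get the collection the question asks for, each m.group(1) can be added to a list inside the loop. The following is a minimal sketch of that idea, not part of the original answer: the class name DelimitedExtractor and the method between are made up for illustration, and Pattern.quote is used on the assumption that the delimiter should always be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class DelimitedExtractor {
    // Returns every substring enclosed by a pair of the given delimiter.
    static List<String> between(String text, String delimiter) {
        String d = Pattern.quote(delimiter); // treat the delimiter as literal text
        Matcher m = Pattern.compile(d + "(.*?)" + d).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "I am a !!!guy!!! but I like !!!cats!!! better than dogs.";
        List<String> words = between(line, "!!!");
        System.out.println(words);
        // Convert to an array if that is the preferred result type.
        String[] asArray = words.toArray(new String[0]);
        System.out.println(asArray.length);
    }
}
```

The lazy quantifier (.*?) is what keeps each match from running on to a later delimiter.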

Related

Java Regex capture nested matches

I am having trouble with a regex here.
Say I have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this:
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 from the string, but all I get is:
8094.21.1
I think my problem is that 21.1.2014 starts inside the previous (wrong) match. Is there a simple way to make the matcher look for the next match starting one character after the beginning of the previous match, rather than after its end?
You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
If the date is always surrounded by dots:
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.
I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
        // Escape the dots so they match a literal '.' rather than any character.
        String pattern = "[0-9]{2}\\.[0-9]{1}\\.[0-9]{4}";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object.
        Matcher m = r.matcher(x);
        if (m.find()) {
            System.out.println("Found value: " + m.group());
        } else {
            System.out.println("NO MATCH");
        }
    }
}
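As for the literal question of restarting the search one character after the start of the previous match: Matcher.find(int) resets the matcher and searches from a given index, so overlapping matches can be collected in a loop. This is only a sketch; OverlappingFinder and findOverlapping are hypothetical names. With the regex from the question, the result includes 21.1.2014 alongside the unwanted 8094.21.1, so tightening the regex as suggested above is probably still the simpler fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class OverlappingFinder {
    // Collects matches, restarting the search one character after the
    // start of the previous match so that overlapping matches are found.
    static List<String> findOverlapping(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= input.length() && m.find(from)) {
            hits.add(m.group());
            from = m.start() + 1;
        }
        return hits;
    }

    public static void main(String[] args) {
        String regex = "(\\d\\d\\d\\d\\.\\d?\\d\\.\\d?\\d)|(\\d?\\d\\.\\d?\\d\\.\\d?\\d?\\d\\d)";
        String input = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
        for (String hit : findOverlapping(regex, input)) {
            System.out.println(hit);
        }
    }
}
```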

java regex matching &[text]

I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text] and need to get at the text. Strings could look like: something &[text] &[text] anything &[text] etc.
I have tried the following regex but I cannot seem to get it to work: &\[(.*)\]
Any help would be greatly appreciated.
Brackets are a bit tricky regarding escaping. Try this:
Pattern r = Pattern.compile("&\\[([^\\]]*)\\]");
Matcher m = r.matcher("foo &[bla] [foo] &[blub]&[blab]");
while (m.find()) {
System.out.println("Found value: " + m.group(1));
}
I replaced your dot with a character class that matches any character other than a closing bracket. Otherwise the star operator greedily matches up to the last closing bracket in the string. You could also make the quantifier lazy with a question mark, which reads even better: "&\\[(.*?)\\]"
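The difference between the three variants can be checked side by side. A small sketch, assuming the same sample input as above; BracketCaptureDemo and captures are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original answer.
class BracketCaptureDemo {
    // Returns the content of group 1 for every match of regex in input.
    static List<String> captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "foo &[bla] [foo] &[blub]&[blab]";
        // Greedy: one match that runs to the last closing bracket.
        System.out.println(captures("&\\[(.*)\\]", input));
        // Lazy: stops at the first closing bracket after each &[.
        System.out.println(captures("&\\[(.*?)\\]", input));
        // Negated character class: cannot cross a closing bracket at all.
        System.out.println(captures("&\\[([^\\]]*)\\]", input));
    }
}
```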
Two things you need to do:
Double escape your square brackets
Prevent the capture group from matching other occurrences of the pattern, by preventing it from matching an opening or a closing bracket
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String s = "&[test] something ] something &[test2]";
        Pattern pattern = Pattern.compile("&\\[([^\\[\\]]*)\\]");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println("capture group: " + matcher.group(1));
        }
    }
}

Convert this pattern to regex for Pattern.matches(..)

Some of my strings may contain a substring that looks like #[alph4Num3ric-alph4Num3ric], where I need to find the alphanumeric id and replace it with the corresponding text value mapped to that key in a map.
My first inclination was to check string.contains("#["), but I want to be more specific,
so now I am looking at Pattern.matches(, but I am unsure of the regex and the full expression.
How would I write a regex for #[ ...... - .... ] for use with Pattern.matches? It must also account for dashes, and I'm not sure what needs to be escaped or wildcarded in this syntax.
I am also not 100% sure this is the best method. I want to get a boolean from Pattern.matches first, and then get the real value and modify the string with those values, which seems good enough, but I want to minimize computations.
Please try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String expression = "String contains #[alph4Num3ric-alph4Num3ric] as substring";
        // Group 1 is the part before the dash, group 2 the part after it.
        Pattern pattern = Pattern
                .compile("\\#\\[([a-zA-Z0-9]+)-([a-zA-Z0-9]+)\\]");
        Matcher matcher = pattern.matcher(expression);
        while (matcher.find()) {
            System.out.println("matched: " + matcher.group());
            System.out.println("group1: " + matcher.group(1));
            System.out.println("group2: " + matcher.group(2));
            // String.replace substitutes every occurrence of group 1 in the whole string.
            System.out
                    .println("after replace: " + expression.replace(matcher.group(1), "customkey"));
        }
    }
}
output :
matched: #[alph4Num3ric-alph4Num3ric]
group1: alph4Num3ric
group2: alph4Num3ric
after replace: String contains #[customkey-customkey] as substring
Try using this:
/#[(a-zA-Z0-9-)+]/
I haven't tried it, but I hope it helps. If it returns an error, add a backslash between 9 and -, e.g. /#[(a-zA-Z0-9\-)+]/
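Note that Pattern.matches only returns true when the whole string matches the regex, so a separate boolean check is not needed before replacing: a single find() loop both detects and replaces. The following is a sketch under assumptions that are not stated in the question: the map key is taken to be the whole id between the brackets, unknown ids are left untouched, and the names PlaceholderReplace and expand are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the key format is an assumption.
class PlaceholderReplace {
    // In Java the regex is written without surrounding slashes,
    // and the literal square brackets must be escaped.
    private static final Pattern ID =
            Pattern.compile("#\\[([a-zA-Z0-9]+-[a-zA-Z0-9]+)\\]");

    // Replaces each #[id] whose id is a key in values; leaves others as they are.
    static String expand(String input, Map<String, String> values) {
        Matcher m = ID.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("abc1-def2", "hello");
        System.out.println(expand("say #[abc1-def2] to #[zzz9-yyy8]", values));
    }
}
```

If no placeholder is present, the loop body never runs and the input comes back unchanged, so the string is scanned only once either way.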

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of:
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expression in a way Java understands. That means removing the leading and trailing slashes and adding the flags as (?flags), e.g. (?i), at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write the regex. Flags can be added with the (?flags) syntax, placed in the regex at the position from which they should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this:
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again, not enclosed in /) and flags is an int built from constants in Pattern, such as Pattern.DOTALL; to combine several flags, use something like Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
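The effect of flag placement and of combining flag constants can be verified with a few lines. This is only an illustrative sketch; FlagPlacementDemo and fullMatch are made-up names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for inline flags versus compile-time flags.
class FlagPlacementDemo {
    // True if the whole input matches the regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The inline flag applies only from its position onward.
        System.out.println(fullMatch("a(?i)a", 0, "aA"));
        System.out.println(fullMatch("a(?i)a", 0, "Aa"));
        // A compile-time flag applies to the whole expression.
        System.out.println(fullMatch("aa", Pattern.CASE_INSENSITIVE, "Aa"));
        // Several flags are combined with bitwise OR.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(Pattern.compile("^b$", flags).matcher("a\nB").find());
    }
}
```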
The next thing that may confuse you is the matches method. Its name misleads many people into assuming it checks whether the string contains some part the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a way to test whether the regex can be found at least once in the string. In that case you can either
add .* at the start and end of your regex so that the characters around the part you want to find are also matched, but this way matches must process the entire string, or
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which scans until it finds a match or reaches the end of the string. I prefer this approach because it stops as soon as a match is found instead of processing the entire string.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
If your term could contain regex special characters that you want the regex engine to treat as ordinary characters, you need to make sure they are escaped. You can use the Pattern.quote method, which adds the necessary escaping for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
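The points above (matches() versus find(), and quoting the term) can be tried out together. A minimal sketch; WordStartCheck and anyWordStartsWith are hypothetical names, and the term "t." is just an example of input containing a regex metacharacter.

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the quoted, case-insensitive pattern.
class WordStartCheck {
    // True if at least one word in text starts with the literal term.
    static boolean anyWordStartsWith(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        // matches() needs the regex to cover the whole string.
        System.out.println(str.matches("(?i)\\bt"));
        System.out.println(str.matches("(?i).*\\bt.*"));
        // find() only needs one occurrence somewhere in the string.
        System.out.println(anyWordStartsWith(str, "t"));
        System.out.println(anyWordStartsWith(str, "ficult"));
        // Quoted, "t." means a literal t followed by a literal dot.
        System.out.println(anyWordStartsWith(str, "t."));
        System.out.println(anyWordStartsWith("at t.co", "t."));
    }
}
```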
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
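Calling find() repeatedly with this full-word regex might look like the sketch below. The class WordsStartingWith and its method find are invented for the example, and Pattern.quote is added on the assumption that the term should be matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class WordsStartingWith {
    // Returns every word in text that starts with term, ignoring case.
    static List<String> find(String text, String term) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\w*");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(find(str, "t"));
        // start() gives the offset of each match if only positions are needed.
        Matcher m = Pattern.compile("(?i)\\bt\\w*").matcher(str);
        while (m.find()) {
            System.out.println(m.start() + ": " + m.group());
        }
    }
}
```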
String str = "This is the difficult one Thats it";
String term = "t";
// Match whole words that start with the term; Pattern.quote keeps the term literal.
Pattern pattern = Pattern.compile("^" + Pattern.quote(term) + ".*", Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
    // matches() must cover the whole word, hence the trailing .*
    System.out.println(s + "-->" + pattern.matcher(s).matches());
}

Need help in Regex to exclude splitting string within "

I need to split a String using a comma as the separator, but any part of the string enclosed in double quotes must be left intact from the opening " to the closing ", even if it contains commas.
Can anyone please help me solve this using a regex with lookaround?
Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless...".
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code:
import java.util.regex.*;
class Program {
    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only for commas outside quotes: mark them as split points.
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            // Quoted strings are copied back unchanged; quoteReplacement keeps $ and \ literal.
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits)
            System.out.println(split);
    } // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3
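The same alternation can also be used without a placeholder string, by cutting the input at the position of each captured comma. This is a sketch of that variation rather than the answer's own code; QuoteAwareSplit and split are hypothetical names, and escaped quotes inside quoted sections are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation that avoids the "SplitHere" marker.
class QuoteAwareSplit {
    // Left side consumes quoted strings; right side captures bare commas.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|(,)");

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int last = 0; // start of the current field
        while (m.find()) {
            if (m.group(1) != null) { // a comma outside quotes
                parts.add(input.substring(last, m.start()));
                last = m.end();
            }
        }
        parts.add(input.substring(last)); // trailing field
        return parts;
    }

    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        for (String part : split(subject)) {
            System.out.println(part);
        }
    }
}
```

Working with match offsets means the result cannot be corrupted by input that happens to contain the marker text.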
Please try this:
(?<!\G\s*"[^"]*),
If you put this regex in your program, it should be:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But two things are not clear:
Does the " only start right after a ,, or can it start in the middle of the content, such as AAA, BB"CC,DD" ? The regex above only deals with a " that starts near a , .
If the content itself contains a ", how is it escaped: with "" or \"? The regex above does not handle any escaped " format.
