How can I use the $+ or $& regular expressions in Java? - java

I've been searching for a while trying to figure out why this won't work. I found the exact thing I want to accomplish in "Simple regex replace to keep original string", but I can't seem to use the $+ or $& replacement references in Java,
like so:
String S1 = "bob";
String S2 = "the builder";
Pattern p = Pattern.compile(S1, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(ST);
ST = m.replaceAll("$+/"+S2);

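Use $0 to refer to the whole match in the replacement string. Java's Matcher.replaceAll does not support $& or $+; it only understands numbered references such as $0 and $1 and named references such as ${name}.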
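As a minimal sketch of that advice, the snippet below appends a suffix after every case-insensitive match by referring to the match as $0. The class and method names are made up for illustration, and the use of Pattern.quote and Matcher.quoteReplacement is an added assumption so that special characters in either string are taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchReplace {
    // Hypothetical helper: keeps each match ($0) and appends "/" + suffix after it.
    static String appendAfterMatch(String input, String search, String suffix) {
        Pattern p = Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        // quoteReplacement protects any '$' or '\' that the suffix might contain.
        return m.replaceAll("$0/" + Matcher.quoteReplacement(suffix));
    }

    public static void main(String[] args) {
        // prints "Bob/the builder and bob/the builder"
        System.out.println(appendAfterMatch("Bob and bob", "bob", "the builder"));
    }
}
```

Because $0 is the matched text itself, the original capitalisation of each match is preserved.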

Related

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of any word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expression in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags), for example (?i), at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can use the (?flags) syntax and place it in the regex at the position from which the flag should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
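The effect of the flag position can be checked with a small sketch like the one below. The class and helper names are invented for this illustration; the regex a(?i)a is the one from the paragraph above.

```java
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // Hypothetical helper: true if the regex can be found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "a(?i)a"; // only the second 'a' is case-insensitive
        System.out.println(found(regex, "aa"));    // true
        System.out.println(found(regex, "aA"));    // true
        System.out.println(found(regex, "Aa"));    // false
        System.out.println(found("(?i)aa", "Aa")); // true: flag applies from the start
    }
}
```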
You can also compile your regex into Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again, not enclosed in /) and flags is an int built from constants in Pattern, such as Pattern.DOTALL. When you need more than one flag, combine them with |, for example Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
The next thing that may confuse you is the matches method. Its name misleads most people: they assume it checks whether some part of the string can be matched by the regex, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a way to test whether the regex can be found at least once in the string. In that case you can either
add .* at the start and end of your regex so that the characters around the element you want to find are also matched, although this way matches must process the entire string
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which scans until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to process the entire string; it stops as soon as a match is found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
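To make the difference between matches and find concrete, here is a small sketch that runs both on the sentence from the question. The class and method names are assumptions made for this example only.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // String.matches succeeds only if the regex covers the entire input.
    static boolean wholeStringMatches(String regex, String input) {
        return input.matches(regex);
    }

    // Matcher.find succeeds if the regex matches anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wholeStringMatches("(?i)\\bt", str));     // false
        System.out.println(wholeStringMatches("(?i).*\\bt.*", str)); // true
        System.out.println(containsMatch("(?i)\\bt", str));          // true
    }
}
```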
In case your term could contain regex special characters that you want the regex engine to treat as normal characters, you need to make sure they are escaped. To do this you can use the Pattern.quote method, which adds all necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
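The sketch below shows what quoting changes in practice, using the term "t." as an assumed example of user input that contains a special character. The class name, the method name, and the boolean switch exist only for this comparison.

```java
import java.util.regex.Pattern;

public class QuoteTermDemo {
    // Hypothetical helper: does some word in the text start with the term?
    // When quote is true the term is treated literally via Pattern.quote.
    static boolean startsSomeWord(String term, String text, boolean quote) {
        String body = quote ? Pattern.quote(term) : term;
        Pattern pattern = Pattern.compile("\\b" + body, Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, the dot means "any character", so "th" in "this" matches.
        System.out.println(startsSomeWord("t.", "this", false));    // true
        // Quoted, only a literal "t." can match.
        System.out.println(startsSomeWord("t.", "this", true));     // false
        System.out.println(startsSomeWord("t.", "see t. 5", true)); // true
    }
}
```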
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
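Calling find repeatedly with that full-word pattern could look like the sketch below. The class and method names are invented here, and wrapping the term in Pattern.quote is an extra assumption so that the term is always taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsStartingWith {
    // Hypothetical helper: collects every word that begins with the term.
    static List<String> wordsStartingWith(String term, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group()); // m.start() would give the offset instead
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [This, the, Thats]
        System.out.println(wordsStartingWith("t", "This is the difficult one Thats it"));
    }
}
```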
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^"+Pattern.quote(term)+".*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

Find a substring in a string using a regular expression

Suppose I have the string kk a.b.cjkmkc jjkocc a.b.c.
I want to find the substring a.b.c in that string, but it is not working.
Here is my code:
Pattern p = Pattern.compile("a.b.c");
Matcher m = p.matcher(str);
int x = m.find()
The . in Java Pattern is a special character: "Any character (may or may not match line terminators)" (from the java.util.regex.Pattern web page).
Try escaping it:
Pattern p = Pattern.compile("a\\.b\\.c");
Also note:
Matcher.find returns a boolean, not an int.
In a Java string literal the backslash itself must be escaped, so the regex \. is written as "\\.".
As others have mentioned, . is a special character in regular expressions. You can let Java quote special characters using Pattern.quote. By the way, what about String.indexOf(String), which is faster? If you really need regular expressions, have a look at this:
String str = "kk a.b.cjkmkc jjkocc a.b.c.";
Pattern p = Pattern.compile(Pattern.quote("a.b.c"));
Matcher m = p.matcher(str);
while (m.find()) {
int x = m.start();
// ...
}
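For comparison, the String.indexOf route mentioned above might be sketched as follows. The class and method names are made up for the example, and the guard against an empty search string is an added assumption to keep the loop from running forever.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfAll {
    // Hypothetical helper: start offsets of every occurrence of needle in text.
    static List<Integer> allIndexesOf(String text, String needle) {
        List<Integer> offsets = new ArrayList<>();
        if (needle.isEmpty()) {
            return offsets; // an empty needle would otherwise never advance
        }
        int from = 0;
        while (true) {
            int i = text.indexOf(needle, from);
            if (i < 0) {
                break;
            }
            offsets.add(i);
            from = i + 1; // continue searching after this occurrence's start
        }
        return offsets;
    }

    public static void main(String[] args) {
        // prints [3, 21]
        System.out.println(allIndexesOf("kk a.b.cjkmkc jjkocc a.b.c.", "a.b.c"));
    }
}
```

No escaping is needed here because indexOf treats its argument as plain text.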

Extract text from two different types of chars using regex

I'm trying to extract strings from a text that features two different types of characters.
The characters are | and # and the text is coming from an outside source.
I will give you an example:
Input: #hello|#what|whatsup| should return hello| and whatsup.
Input: #hello# should return hello
Input: |ola|1 should return ola
Input: |hello#|what#whatsup#node should return hello# and whatsup
This works for your strings. I don't know if I completely understood what you need, but I think it can be tuned if necessary:
String s1 = "#hello|#what|whatsup|";
String s2 = "#hello#";
String s3 = "|ola|1";
String s4 = "|hello#|what#whatsup#node";
Pattern pattern = Pattern.compile("((\\w)+)(\\||#)(\\||#)?");
Matcher matcher = pattern.matcher(s4);
while(matcher.find()) {
System.out.println(matcher.group(1) + (matcher.group(4) != null ? matcher.group(4).equals("|")? "#" : "|" : ""));
matcher.find(); //to jump over the next match
}
Update:
I just read the middlerecursion example. Doesn't work for that, I'm afraid, and I have to leave my computer for a while. So this is just something to get you started.
Updated version that works for all the examples:
String s1 = "#hello|#what|whatsup|";
String s2 = "#hello#";
String s3 = "|ola|1";
String s4 = "|hello#|what#whatsup#node";
String s5 = "#||##||MiddleRecursion||##||#";
Pattern pattern = Pattern.compile("(#|\\|)((#|\\|)*\\w+(#|\\|)*)(#|\\|)");
Matcher matcher = pattern.matcher(s1);
while(matcher.find()) {
System.out.println(matcher.group(2));
}
Since #||##||MiddleRecursion||##||# --> ||##||MiddleRecursion||##||, I'm afraid you have to do bracket matching. In this case there is no general solution using regex (you can force it to work if you know the maximum number of consecutive | and # characters). The reason is that this is middle recursion; regular expressions can only handle left or right recursion.
This is also one of the reasons why HTML parsing is not possible with regex.
Ok, I'll start.
So you have to match #something# or |something|
Can you write two separate regexps that do that?
Where you'll get annoyed first is that the pipe | is a magic character in regexp. If you want to match on that character, you'll have to prefix it with \\ as per the other thread I linked.
When you have those two regexp working, let me know and I'll post more.
(I'm heading out for a few hours...)
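As a rough starting point for the two separate expressions suggested above, the sketch below handles only the simplest inputs from the question (#hello# and |ola|1). It does not attempt the mixed-delimiter or nested cases, and the class, constant, and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedWords {
    // '#' has no special meaning in a Java regex by default; '|' must be escaped.
    static final String BETWEEN_HASHES = "#(\\w+)#";
    static final String BETWEEN_PIPES = "\\|(\\w+)\\|";

    // Hypothetical helper: returns group 1 of every match of regex in input.
    static List<String> extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract(BETWEEN_HASHES, "#hello#")); // [hello]
        System.out.println(extract(BETWEEN_PIPES, "|ola|1"));   // [ola]
    }
}
```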

java regular expressions matching specific urls

I once built a program in PHP that used very specific regular expressions to match links, but that pattern doesn't seem to work in Java. I'm trying to find the Java equivalent of
"~http://(bit.ly|t.co)~"
In PHP this would match links such as http://t.co/UURRNlrK and http://bit.ly/AenG5W. What would be the Java equivalent?
I think you are looking for
String str = "http://t.co/UURRNlrK";
String p = "(http://(t\\.co|bit\\.ly).*)";
Pattern pattern = Pattern.compile(p);
Matcher matcher = pattern.matcher(str);
if(matcher.find())
System.out.println(matcher.group(0));
Output = http://t.co/UURRNlrK
if str = "http://bit.ly/AenG5W"
Output = http://bit.ly/AenG5W
Here is a nice Regex Tutorial for java.
http://(bit\.ly|t\.co)/\w*
I think this one gives the same result as the ones above.
I tried this:
String str = "http://bit.ly/asdfsd";
if(str.matches("http://(bit\\.ly|t\\.co).+")){
System.out.println("hurray");
}
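If the links are embedded in longer text, a sketch along the following lines could pull them out with find. It reuses the http://(bit\.ly|t\.co)/\w* expression from above with the backslashes doubled for the Java string literal; the class and method names and the sample sentence are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortLinkFinder {
    // Dots are escaped so they match a literal '.', not any character.
    private static final Pattern SHORT_LINK =
            Pattern.compile("http://(bit\\.ly|t\\.co)/\\w*");

    // Hypothetical helper: returns every bit.ly or t.co link found in the text.
    static List<String> findShortLinks(String text) {
        Matcher m = SHORT_LINK.matcher(text);
        List<String> links = new ArrayList<>();
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "see http://t.co/UURRNlrK and http://bit.ly/AenG5W, not http://example.com/x";
        // prints [http://t.co/UURRNlrK, http://bit.ly/AenG5W]
        System.out.println(findShortLinks(text));
    }
}
```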

extract substring in java using regex

I need to extract "URPlus1_S2_3" from the string:
"Last one: http://abc.imp/Basic2#URPlus1_S2_3,"
using regular expression in Java language.
Can someone please help me? I am using regex for the first time.
Try
Pattern p = Pattern.compile("#([^,]*)");
Matcher m = p.matcher(myString);
if (m.find()) {
doSomethingWith(m.group(1)); // The matched substring
}
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3,";
Matcher m = Pattern.compile("(URPlus1_S2_3)").matcher(s);
if (m.find()) System.out.println(m.group(1));
You gotta learn how to specify your requirements ;)
You haven't really defined what criteria you need to use to find that string, but here is one way to approach based on '#' separator. You can adjust the regex as necessary.
expr: .*#([^,]*)
extract: \1
Go here for syntax documentation:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3,";
String result = s.replaceAll(".*#", "");
The above returns the full String in case there's no "#". There are better ways using regex, but the best solution here is using no regex. There are classes URL and URI doing the job.
Since this is the first time you are using regular expressions, I would suggest going another way, which is more understandable for now (until you master regular expressions ;) and is easy to modify if you ever need to:
String yourPart = yourString.split("#")[1]; // yourString is a placeholder for the input text
Here's a long version:
String url = "http://abc.imp/Basic2#URPlus1_S2_3,";
String anchor = null;
String ps = "#(.+),";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(url);
if (m.find()) { // find(), not matches(): the pattern covers only part of the URL
anchor = m.group(1);
}
The main point to understand is the use of parentheses: they create groups that can be extracted from a match. In the Matcher object, the group method returns them in order starting at index 1, while the full match is returned by index 0.
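The difference between group 0 and group 1 can be seen in a short sketch such as this one, which uses the #(.+), pattern from the long version above. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns {full match, first group}, or null if nothing matches.
    static String[] fullAndFirstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] groups = fullAndFirstGroup("#(.+),", "http://abc.imp/Basic2#URPlus1_S2_3,");
        System.out.println(groups[0]); // #URPlus1_S2_3,
        System.out.println(groups[1]); // URPlus1_S2_3
    }
}
```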
If you just want everything after the #, use split:
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3," ;
System.out.println(s.split("#")[1]);
Alternatively, if you want to parse the URI and get the fragment component you can do:
URI u = new URI("http://abc.imp/Basic2#URPlus1_S2_3,");
System.out.println(u.getFragment());
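A self-contained variant of the URI approach might look like the sketch below. It assumes the input is already a bare URL (the "Last one: " prefix contains spaces and would be rejected by the parser), and stripping the trailing comma is an added step to get exactly the value asked for. The class and method names are invented.

```java
import java.net.URI;

public class FragmentDemo {
    // Hypothetical helper: returns the fragment without a trailing comma,
    // or null if the URL has no fragment.
    static String fragmentOf(String url) {
        // URI.create throws IllegalArgumentException for malformed input.
        String fragment = URI.create(url).getFragment();
        if (fragment == null) {
            return null;
        }
        return fragment.endsWith(",")
                ? fragment.substring(0, fragment.length() - 1)
                : fragment;
    }

    public static void main(String[] args) {
        // prints URPlus1_S2_3
        System.out.println(fragmentOf("http://abc.imp/Basic2#URPlus1_S2_3,"));
    }
}
```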
