What's up with this regular expression not matching? - java

public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.)"));
}
}
This program prints "false". What?!
I am expecting to match the prefix of the string: "117_117_0009v0_1"
I know this stuff, really I do... but for the life of me, I've been staring at this for 20 minutes and have tried every variation I can think of and I'm obviously missing something simple and obvious here.
Hoping the many eyes of SO can pick it out for me before I lose my mind over this.
Thanks!
The final working version ended up as:
String text = "117_117_0009v0_172_5738_5740";
String regex = "[0-9_]+v._.";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
if (m.lookingAt()) {
System.out.println(m.group());
}
One non-obvious discovery/reminder for me was that before accessing matcher groups, one of matches(), lookingAt(), or find() must be called. If not, an IllegalStateException is thrown with the unhelpful message "No match found". Despite this, groupCount() will still return a value, but it only counts the capturing groups in the pattern and says nothing about whether a match exists. Do not believe it.
I forgot how ugly this API is. Argh...
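The point about calling matches(), lookingAt(), or find() before reading groups can be checked with a small sketch. The class and method names here are made up for illustration; only the Pattern and Matcher calls come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the original post.
class GroupBeforeMatchDemo {

    // True if group() throws IllegalStateException on a matcher
    // that has not attempted any match yet.
    static boolean groupThrowsBeforeMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // groupCount() counts capturing groups in the pattern,
    // whether or not anything has matched.
    static int groupCountWithoutMatching(String regex, String text) {
        return Pattern.compile(regex).matcher(text).groupCount();
    }

    // Returns the matched prefix, or null if the text does not start with a match.
    static String prefixMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "117_117_0009v0_172_5738_5740";
        System.out.println(groupThrowsBeforeMatch("([0-9_]+v._.)", text));
        System.out.println(groupCountWithoutMatching("([0-9_]+v._.)", "no match here"));
        System.out.println(prefixMatch("[0-9_]+v._.", text));
    }
}
```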

String.matches() only succeeds if the entire string matches, as if Java had wrapped the pattern in ^ and $, so something like this should work:
public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.).*$"));
}
}
returns:
true
Match content:
117_117_0009v0_1
This is the code I used to extract the match:
Pattern p = Pattern.compile("^([0-9_]+v._.).*$");
String str = "117_117_0009v0_172_5738_5740";
Matcher m = p.matcher(str);
if (m.matches())
{
System.out.println(m.group(1));
}

If you want to check whether a string starts with a certain pattern, you should use the Matcher.lookingAt() method:
Pattern pattern = Pattern.compile("([0-9_]+v._.)");
Matcher matcher = pattern.matcher("117_117_0009v0_172_5738_5740");
if (matcher.lookingAt()) {
int groupCount = matcher.groupCount();
for (int i = 0; i <= groupCount; i++) {
System.out.println(i + " : " + matcher.group(i));
}
}
Javadoc:
boolean java.util.regex.Matcher.lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched. If the match succeeds then more information can be obtained via the start, end, and group methods.
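To make the difference between the three Matcher methods concrete, here is a minimal sketch that runs the same pattern through matches(), lookingAt(), and find(). The class and method names are illustrative. A fresh Matcher is created for each call, because find() would otherwise resume after a previous successful match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three ways to apply a pattern.
class MatchModesDemo {

    static String describe(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(text).matches();     // entire input must match
        boolean prefix = p.matcher(text).lookingAt();  // match must start at index 0
        boolean anywhere = p.matcher(text).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        String regex = "[0-9_]+v._.";
        System.out.println(describe(regex, "117_117_0009v0_172_5738_5740"));
        System.out.println(describe(regex, "xx117_1v0_1"));
    }
}
```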

I don't know the Java flavor of regular expressions, but this PCRE regular expression should work:
^([\d_]+v\d_\d).+
I don't know why you are using ._. instead of \d_\d.

Related

Unable to Match Using Regex in Java

I asked this question a while ago, but did not get a proper answer, so giving it another shot.
class Test {
public static void main (String[] args) throws java.lang.Exception
{
String file_name = "C:\\Temp\\Test.txt";
String string = FileUtils.readFileToString(new File(file_name), "UTF-8");
String regex = "^(ipv6 pim(?: vrf .*?)? rp-address .*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
} else {
System.out.println("No Matches");
}
}
}
The file contains a lot of lines, more than 750 I guess, and I want to extract all the lines that match the regex. The problem is that my code does not return any matches. It only finds one if the first line of the file matches the regex; if the matching line is somewhere in the middle, no luck. I thought the newlines were causing the problem, but even after converting the string into a single line, nothing is returned unless the match is at the beginning.
A sample matching string: ipv6 pim rp-address 20:20:20::F
Try passing the MULTILINE flag:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Instead of using an if condition, switch it to a while loop.
while (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
}
find() searches for one matching value. To get the next one, you must invoke find() again, hence the loop.
Additionally, the ^ prevents you from matching again and again, because subsequent searches do not start at the beginning of the input. So you may drop the ^.
Alternatively, as Rambler suggested, use the Pattern.MULTILINE flag. This makes ^ match at the beginning of every line instead of only once at the beginning of the whole string.
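Putting the two suggestions together (a find() loop plus Pattern.MULTILINE), a sketch could look like the following. The sample configuration text is invented for the example, and the file reading from the question is replaced by an in-memory string so the snippet stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the sample input is made up.
class MultilineFindDemo {

    static final String REGEX = "^(ipv6 pim(?: vrf .*?)? rp-address .*)";

    // Collects group 1 of every match; flags is 0 or Pattern.MULTILINE.
    static List<String> matchingLines(String text, int flags) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        while (m.find()) {          // each call advances to the next match
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hostname r1\n"
                + "ipv6 pim rp-address 20:20:20::F\n"
                + "interface x\n"
                + "ipv6 pim vrf blue rp-address 30::1\n";
        // Without MULTILINE, ^ only matches at the very start of the input.
        System.out.println(matchingLines(text, 0));
        // With MULTILINE, ^ also matches after each line terminator.
        System.out.println(matchingLines(text, Pattern.MULTILINE));
    }
}
```

Because . does not match line terminators by default, the trailing .* stops at the end of each line, so each match is exactly one line.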

My regex search only prints out the last match

I wrote a regular expression to search for web URLs in a text (full code below), but when I run the code, the console prints only the last URL in the text. I don't know what's wrong, since I did use a while loop. See the code below and kindly help me correct it. Thanks
import java.util.*;
import java.util.regex.*;
public class Main
{
static String query = "This is a URL http://facebook.com"
+ " and this is another, http://twitter.com "
+ "this is the last URL http://instagram.com"
+ " all these URLs should be printed after the code execution";
public static void main(String args[])
{
String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(query);
while(m.find())
{
System.out.println(m.group(2));
}
}
}
On running the above code, only http://instagram.com gets printed to the console output
I found another RegEx here
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
It also accepts https, and it seems to work in your case.
I'm getting all 3 URLs printed with this code :
public class Main {
static String query = "This is a URL http://facebook.com"
+ " and this is another, http://twitter.com "
+ "this is the last URL http://instagram.com"
+ " all these URLs should be printed after the code execution";
public static void main(String[] args) {
String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(query);
while (m.find()) {
System.out.println(m.group());
}
}
}
I hope this clears it up for you: you are matching too many characters. Your pattern should be as restrictive as possible, because regex quantifiers are greedy by default and will try to match as much as possible.
Here is my take on your code:
public class Main {
static String query = "This is a URL http://facebook.com"
+ " and this is another, http://twitter.com "
+ "this is the last URL http://instagram.com"
+ " all these URLs should be printed after the code execution";
public static void main(String args[]) {
String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+.com)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(query);
while(m.find())
{
System.out.println(m.group(1));
}
}
}
The above code will match only your examples; if you wish to match more, you need to tweak it to your needs.
A great way to live-test patterns is http://www.regexpal.com/. You can tweak your pattern there to match exactly what you want; just remember to replace each \ with a double \\ in Java for escaped characters.
I'm not sure how reliable this pattern is, but it prints out all the URLs when I run your example.
(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})
You will have to modify it if you encounter a URL that looks like this:
http://www.instagram.com
As it will only capture URLs without the 'www'.
Perhaps you're looking for this regex:
http://(\w+(?:\.\w+)+)
For example, from this string:
http://ww1.amazon.com and http://npr.org
it extracts
"ww1.amazon.com"
"npr.org"
To break down how it works:
http:// is literal
( ... ) is the main capture group
\w+ finds one or more word characters (letters, digits, or underscore)
(?: ... ) ...followed by a non-capturing group
\.\w+ ...that contains a literal period followed by at least one word character
+ repeated one or more times
Hope this helps.
Your problem is that your regex quantifiers (i.e. the * and + characters) are greedy, meaning that they match as much as possible. You need to use reluctant quantifiers. See the corrected code pattern below - just two extra characters - a ? character after the * and + to match as little as possible.
String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
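The effect of switching to reluctant quantifiers can be seen by running both versions of the pattern over the question's text. This is a sketch with illustrative class and method names; the two patterns and the input are taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy and reluctant quantifiers.
class GreedyVsReluctantDemo {

    static final String QUERY = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    static final String GREEDY = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
    static final String RELUCTANT = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";

    // Collects group 2 (the URL) of every match found by the given pattern.
    static List<String> urls(String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(QUERY);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy: the first group swallows everything up to the last URL.
        System.out.println(urls(GREEDY));
        // Reluctant: each match stops at the first URL it can complete.
        System.out.println(urls(RELUCTANT));
    }
}
```

Note that the dot in (.com) is still unescaped in both patterns, so it matches any character before "com"; writing \\. would make it a literal period.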

What is wrong with this pattern: Pattern.compile("\\p{Upper}{4}")

I'm writing a Pattern matching a String consisting of 4 upper-case letters.
For instance:
"AAAA"
"ABCD"
"ZZZZ"
... are all correct matches, while:
"1DFG"
"!@#$"
"1234"
... should not be matched.
Find my code below.
It keeps returning false on "AAAA".
Can anyone shed some light on this please?
public static boolean checkSettings(String str) {
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher("%str".format(str));
if (m.matches()) {
return true;
} else {
// System.exit(1)
return false;
}
}
I think there's nothing wrong with your Pattern, probably something bad with your input String.
Take this example:
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher("%str".format("AAAA"));
System.out.println(m.find());
Output:
true
Warning
\\p{Upper}{4} and \\P{Upper}{4} are not the same Pattern; each is the opposite of the other.
The second one matches 4 characters that are not upper-case (note the uppercase "P"). I'm pointing this out because your question title indicates the wrong Pattern.
Final note
If you only plan to use ASCII alphabetic characters for your Pattern, you might want to use [A-Z] (upper-case important here), as mentioned by others in this thread. It's the exact equivalent of \\p{Upper}.
There is a slight difference with \\p{Lu}, which would match the Unicode category for upper-case letter.
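The differences between \\p{Upper}, \\P{Upper}, and \\p{Lu} can be checked with a short sketch. The class and method names are illustrative, and the accented test string is an assumption chosen to show the ASCII versus Unicode distinction under Java's default flags (without UNICODE_CHARACTER_CLASS).

```java
// Hypothetical demo class; uses only String.matches from the standard library.
class UpperClassDemo {

    // Four ASCII upper-case letters (POSIX class, same as [A-Z] by default).
    static boolean fourAsciiUpper(String s) {
        return s.matches("\\p{Upper}{4}");
    }

    // Four characters that are NOT ASCII upper-case letters (capital P negates).
    static boolean fourNonUpper(String s) {
        return s.matches("\\P{Upper}{4}");
    }

    // Four characters in the Unicode "uppercase letter" category.
    static boolean fourUnicodeUpper(String s) {
        return s.matches("\\p{Lu}{4}");
    }

    public static void main(String[] args) {
        // Four accented capital letters, written as escapes to avoid encoding issues.
        String accented = "\u00C0\u00C9\u00CE\u00D5";
        System.out.println(fourAsciiUpper("AAAA"));
        System.out.println(fourNonUpper("1234"));
        System.out.println(fourAsciiUpper(accented));
        System.out.println(fourUnicodeUpper(accented));
    }
}
```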
Change your pattern to:
Pattern p = Pattern.compile("[A-Z]{4}");
Change your matcher to:
Matcher m = p.matcher(str);
Your code should give the correct result if you really pass in AAAA.
You should however rewrite your code like this:
public static boolean checkSettings(String str) {
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher(String.format(str));
return m.matches();
}
or even
public static boolean checkSettings(String str) {
return str.matches("\\p{Upper}{4}");
}
These samples are largely equivalent to your code. I just tested them, and they return true for AAAA.

How would I do this in Java Regex?

I'm trying to make a regex that grabs all occurrences of a word, let's just say chicken, that are not in brackets. So
chicken
Would be selected but
[chicken]
Would not. Does anyone know how to do this?
String template = "[chicken]";
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while (m.find())
{
System.out.println(m.group());
}
It uses a combination of negative look-behind and negative look-aheads and boundary matchers.
(?<!\\[) //negative look behind
(?!\\]) //negative look ahead
(\\w+) //capture group for the word
\\G //is a boundary matcher for marking the end of the previous match
(please read the following edits for clarification)
EDIT 1:
If one needs to account for situations like:
"chicken [chicken] chicken [chicken]"
We can replace the regex with:
String regex = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
EDIT 2:
If one also needs to account for situations like:
"[chicken"
"chicken]"
As in one still wants the "chicken", then you could use:
String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";
Which essentially accounts for the two cases of having only one bracket on either side. It accomplishes this through the | which acts as an or, and by using ? after the look-ahead/behinds, where ? means 0 or 1 of the previous expression.
I guess you want something like:
final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");
private List<String> findAllUnbracketedWords(final String s) {
final List<String> ret = new ArrayList<String>();
final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
while (m.find()) {
ret.add(m.group());
}
return Collections.unmodifiableList(ret);
}
Use this:
/(?<![\[\w])\w+(?![\w\]])/
i.e., consecutive word characters with no square bracket or word character before or after.
This needs to check both left and right for both a square bracket and a word character, else for your input of [chicken] it would simply return
hicke
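Here is a minimal Java sketch of that lookaround pattern, next to the narrower version that only checks for brackets, to show where "hicke" comes from. The class and method names are illustrative; the second test string is borrowed from another answer in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the lookaround approach.
class UnbracketedWordsDemo {

    // Checks for a bracket OR a word character on both sides.
    static final String FULL = "(?<![\\[\\w])\\w+(?![\\w\\]])";
    // Checks for brackets only; the regex can shrink the match to dodge them.
    static final String BRACKETS_ONLY = "(?<!\\[)\\w+(?!\\])";

    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(BRACKETS_ONLY, "[chicken]"));
        System.out.println(findAll(FULL, "[chicken]"));
        System.out.println(findAll(FULL, "pig [cow] chicken bull] [grain"));
    }
}
```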
Without look around:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchingTest
{
private static String x = "pig [cow] chicken bull] [grain";
public static void main(String[] args)
{
Pattern p = Pattern.compile("(\\[?)(\\w+)(\\]?)");
Matcher m = p.matcher(x);
while(m.find())
{
String firstBracket = m.group(1);
String word = m.group(2);
String lastBracket = m.group(3);
if ("".equals(firstBracket) && "".equals(lastBracket))
{
System.out.println(word);
}
}
}
}
Output:
pig
chicken
A bit more verbose, sure, but I find it more readable and easier to understand. Certainly simpler than a huge regular expression trying to handle all possible combinations of brackets.
Note that this won't filter out input like [fence tree grass]; it will indicate that tree is a match. You cannot skip tree in that without a parser. Hopefully, this is not a case you need to handle.
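If the [fence tree grass] case does matter, one possible approach is a tiny hand-written scanner that tracks bracket depth instead of using a regex. This is only a sketch under assumptions that are not from the answer above: a stray ] is ignored, an unclosed [ hides everything after it, and a "word" is a run of letters, digits, or underscores. Note that it therefore treats "bull]" differently from the regex version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of any answer in this thread.
class BracketDepthScanner {

    static List<String> wordsOutsideBrackets(String input) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open '[' brackets

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean wordChar = Character.isLetterOrDigit(c) || c == '_';
            if (wordChar) {
                if (depth == 0) {
                    current.append(c); // only collect text outside brackets
                }
                continue;
            }
            // Any non-word character ends the current word.
            if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--; // assumption: a stray ']' is simply ignored
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsOutsideBrackets("pig [cow] chicken [fence tree grass] goat"));
    }
}
```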

How can I extract a value using regex in Java?

I need to extract just the number from this text. I currently use substring to extract it, but sometimes the number gets shorter, so I end up with a wrong value...
example(16656);
Use Pattern to compile your regular expression and Matcher to get a particular captured group. The regex I'm using is:
example\((\d+)\)
which captures the digits (\d+) within the parentheses. So:
Pattern p = Pattern.compile("example\\((\\d+)\\)");
Matcher m = p.matcher(text);
if (m.find()) {
int i = Integer.valueOf(m.group(1));
...
}
Look at the Java regular expression samples here:
http://java.sun.com/developer/technicalArticles/releases/1.4regex/
Focus especially on the find method.
String yourString = "example(16656);";
Pattern pattern = Pattern.compile("\\w+\\((\\d+)\\);");
Matcher matcher = pattern.matcher(yourString);
if (matcher.matches())
{
int value = Integer.parseInt(matcher.group(1));
System.out.println("Your number: " + value);
}
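The two answers above differ mainly in find() versus matches(). As a sketch, the find() variant can be wrapped in a small helper that also handles the "no number" case. The class name, method name, and the use of OptionalInt are choices made for this example, not something from the answers.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper around the example\((\d+)\) pattern from the thread.
class ExampleNumberExtractor {

    private static final Pattern EXAMPLE_CALL = Pattern.compile("example\\((\\d+)\\)");

    // Returns the digits inside example(...) if present, however many there are.
    static OptionalInt extract(String text) {
        Matcher m = EXAMPLE_CALL.matcher(text);
        if (!m.find()) {               // find() allows text before and after
            return OptionalInt.empty();
        }
        // Note: parseInt throws NumberFormatException if the digits overflow int.
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(extract("example(16656);"));
        System.out.println(extract("x = example(7); // shorter number"));
        System.out.println(extract("example();"));
    }
}
```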
I would suggest writing your own logic for this. Pattern and Matcher are good practice, but they are general-purpose tools and are not always the most efficient fit. cletus provided a very neat solution, but behind the scenes it runs a pattern-matching algorithm just to locate the digits. I don't think you need pattern matching here; you just need to extract the digits from a string (like 123 from "a1b2c3"). The following code does that cleanly in O(n), without the extra work that Pattern and Matcher do for you (just copy, paste, and run :) ):
public class DigitExtractor {
/**
* @param args
*/
public static void main(String[] args) {
String sample = "sdhj12jhj345jhh6mk7mkl8mlkmlk9knkn0";
String digits = getDigits(sample);
System.out.println(digits);
}
private static String getDigits(String sample) {
StringBuilder out = new StringBuilder(10);
int stringLength = sample.length();
for(int i = 0; i <stringLength ; i++)
{
char currentChar = sample.charAt(i);
int charDiff = currentChar -'0';
boolean isDigit = ((9-charDiff)>=0&& (9-charDiff <=9));
if(isDigit)
out.append(currentChar);
}
return out.toString();
}
}
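For comparison, the same digit filtering can be written more compactly. The sketch below shows a plain range check and a one-line replaceAll; the class and method names are illustrative. The replaceAll version does use the regex engine internally, so it trades the answer's efficiency argument for brevity.

```java
// Hypothetical comparison class; both methods keep only ASCII digits 0-9.
class DigitFilterDemo {

    // Manual loop with a direct range check, no regex involved.
    static String digitsByLoop(String sample) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sample.length(); i++) {
            char c = sample.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append(c);
            }
        }
        return out.toString();
    }

    // One-liner: delete every character that is not an ASCII digit.
    static String digitsByRegex(String sample) {
        return sample.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        String sample = "sdhj12jhj345jhh6mk7mkl8mlkmlk9knkn0";
        System.out.println(digitsByLoop(sample));
        System.out.println(digitsByRegex(sample));
    }
}
```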
