I'm writing a Pattern matching a String consisting of 4 upper-case letters.
For instance:
"AAAA"
"ABCD"
"ZZZZ"
... are all correct matches, while:
"1DFG"
"!##$"
"1234"
... should not be matched.
Find my code below.
It keeps returning false on "AAAA".
Can anyone shed some light on this please?
public static boolean checkSettings(String str) {
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher("%str".format(str));
if (m.matches()) {
return true;
} else {
// System.exit(1)
return false;
}
}
I think there's nothing wrong with your Pattern, probably something bad with your input String.
Take this example:
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher("%str".format("AAAA"));
System.out.println(m.find());
Output:
true
Warning
\\p{Upper}{4} and \\P{Upper}{4} are not the same Pattern; they are opposites of each other.
The second one matches 4 characters that are not upper-case (note the uppercase "P"). I'm pointing this out because your question title indicates the wrong Pattern.
Final note
If you only plan to use ASCII alphabetic characters for your Pattern, you might want to use [A-Z] (upper-case important here), as mentioned by others in this thread. It's the exact equivalent of \\p{Upper}.
There is a slight difference with \\p{Lu}, which would match the Unicode category for upper-case letter.
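To make the difference between these classes concrete, here is a minimal sketch. The class name UpperClassDemo, the helper fourOf and the sample strings are made up for illustration, and it assumes the default flags (no UNICODE_CHARACTER_CLASS):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are not from the question.
class UpperClassDemo {
    // True if the whole input is exactly four characters of the given class.
    static boolean fourOf(String charClass, String input) {
        return Pattern.matches(charClass + "{4}", input);
    }

    public static void main(String[] args) {
        // Four accented upper-case letters, written as escapes to stay ASCII-safe.
        String accented = "\u00C0\u00C9\u00D6\u00DC";
        System.out.println(fourOf("\\p{Upper}", "AAAA"));   // true
        System.out.println(fourOf("[A-Z]", "AAAA"));        // true
        System.out.println(fourOf("\\P{Upper}", "AAAA"));   // false, negated class
        System.out.println(fourOf("\\P{Upper}", "1234"));   // true
        System.out.println(fourOf("\\p{Upper}", accented)); // false, ASCII only
        System.out.println(fourOf("\\p{Lu}", accented));    // true, Unicode category
    }
}
```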
change your pattern to:
Pattern p = Pattern.compile("[A-Z]{4}");
change your matcher to:
Matcher m = p.matcher(str);
Your code should give the correct result if you really pass in AAAA.
You should however rewrite your code like this:
public static boolean checkSettings(String str) {
Pattern p = Pattern.compile("\\p{Upper}{4}");
Matcher m = p.matcher(String.format(str));
return m.matches();
}
or even
public static boolean checkSettings(String str) {
return str.matches("\\p{Upper}{4}");
}
These samples are largely equivalent to your code. I just tested it, and it returns true for AAAA.
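If the check runs often, one possible variation is to compile the Pattern once and reuse it. This is only a sketch: the class name SettingsCheck, the constant FOUR_UPPER and the null check are my own assumptions, not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class around the checkSettings method from the question.
class SettingsCheck {
    // Compiled once and reused for every call.
    private static final Pattern FOUR_UPPER = Pattern.compile("\\p{Upper}{4}");

    static boolean checkSettings(String str) {
        // Treating null as "no match" is an assumption, not from the question.
        return str != null && FOUR_UPPER.matcher(str).matches();
    }

    public static void main(String[] args) {
        // Sample inputs taken from the question.
        for (String s : new String[] {"AAAA", "ABCD", "ZZZZ", "1DFG", "!##$", "1234"}) {
            System.out.println(s + " -> " + checkSettings(s));
        }
    }
}
```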
I need to match string as below:
match everything up to ;
If - occurs, match only up to the -, excluding the -
For e.g. :
abc; should return abc
abc-xyz; should return abc
Pattern.compile("^(?<string>.*?);$");
Using the above I can achieve half of it, but I don't know how to change this pattern to meet the second requirement. How do I change .*? so that it stops at the first occurrence of -?
I am not good with regex. Any help would be great.
EDIT
I need to capture it as a group. I can't change that, since there are many other patterns to match and capture. What I have posted is only part of it.
Code looks something like below.
public static final Pattern findString = Pattern.compile("^(?<string>.*?);$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}
Just use a negated char class.
^[^-;]*
ie.
Pattern p = Pattern.compile("^[^-;]*");
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
This matches zero or more characters at the start of the string that are neither - nor ;.
This should do what you are looking for:
[^-;]*
It matches characters that are not - or ;.
Tip: If you don't feel confident with regular expressions, there are great online tools to test your input, e.g. https://regex101.com/
UPDATE
I see you have an issue in the code since you try to access .group in the Pattern object, while you need to use the .group method of the Matcher object:
public static String GetTheGroup(String str) {
Pattern findString = Pattern.compile("(?s)^(?<string>.*?)[;-]");
Matcher matcher = findString.matcher(str);
if (matcher.find())
{
return matcher.group("string"); //you have to change something here.
}
else
return "";
}
And call it as
System.out.println(GetTheGroup("abc-xyz;"));
See IDEONE demo
OLD ANSWER
Your ^(?<string>.*?);$ regex only matches 0 or more characters other than a newline from the beginning up to the first ; that is the last character in the string. I guess it is not what you expect.
You should learn more about using character classes in regex, as you can match 1 symbol from a specified character set that is defined with [...].
You can achieve this with a String.split taking the first element only and a [;-] regex that matches a ; or - literally:
String res = "abc-xyz;".split("[;-]")[0];
System.out.println(res);
Or with replaceAll and the (?s)[;-].*$ regex (which matches the first ; or - and then anything up to the end of the string):
res = "abc-xyz;".replaceAll("(?s)[;-].*$", "");
System.out.println(res);
See IDEONE demo
I have found the solution without removing groupings.
(?<string>.*?) matches everything up to the next part of the pattern.
(?:-.*?)? is a non-capturing group that starts with - and occurs zero or one time.
; is the end character.
So putting all together:
public static final Pattern findString = Pattern.compile("^(?<string>.*?)(?:-.*?)?;$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}
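As the other answer points out, find() and group() belong to Matcher, not Pattern, so the snippet above does not compile as written. Here is a runnable sketch of the same pattern. The class name FindStringDemo, the method extract and the null return for non-matching input are assumptions of mine:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; only the regex itself comes from the answer above.
class FindStringDemo {
    static final Pattern FIND_STRING =
            Pattern.compile("^(?<string>.*?)(?:-.*?)?;$");

    // Returns the named group "string", or null when the input does not match
    // (returning null here is an assumption).
    static String extract(String input) {
        Matcher m = FIND_STRING.matcher(input);
        return m.find() ? m.group("string") : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("abc;"));     // abc
        System.out.println(extract("abc-xyz;")); // abc
    }
}
```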
I need to check a string whether it includes a specific arrangements of letters and numbers.
Valid arrangements are for example:
X
X-Y
A-H-K-L-J-Y
A-H-J-Y
123
12?
12*
12-17
Invalid are for example:
-X-Y
-XY
*12
?12
I have written this method in Java to solve this problem (but I don't have much experience with regular expressions):
public boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(Pattern.quote(searchPattern),
Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.find();
}
return patternFounded;
}
How can I implement this requirement with regular expressions?
By the way: is it a good solution to check whether a string has numeric content by using the method isNumeric from the Java class StringUtils?
//EDIT
The link that was added by the admins does not cover specific arrangements of characters, only matching characters with regular expressions in general!
After a good while trying to help, answering to constantly changing questions, just found out that the same was asked yesterday, and that the OP doesn't accept answers to his questions...all I have left to say is good night sir, good luck
n-th answer follows:
First pattern: [a-z](-[a-z])* : a letter, possibly followed by more letters, separated by -.
Second pattern: \d+(-\d+)*[?*]* : a number, possibly followed by more numbers, separated by -, and possibly ending with ? or *.
So join them together: ^([a-z](-[a-z])*)|(\d+(-\d+)*[?*]*)$. ^ and $ mark the beginning and the end of the string.
A few more comments on the code: you don't need to use Pattern.quote, and you should use matches() instead of find(), because find() returns true if any part of the string matches the pattern, and you want the whole string:
public static boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.matches();
}
return patternFounded;
}
Called like this: checkPatternMatching(s, "^([a-z](-[a-z])*)|(\\d+(-\\d+)*[?*]*)$")
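A quick way to check this against the examples from the question is a small sketch like the one below. The class name ArrangementCheck and its two helper methods are made up for illustration. It also shows why matches() matters here: in this pattern ^ only applies to the first alternative and $ only to the second, so find() accepts input such as "X!!".

```java
import java.util.regex.Pattern;

// Hypothetical test harness; the regex is the one from the answer above.
class ArrangementCheck {
    static final Pattern ARRANGEMENT = Pattern.compile(
            "^([a-z](-[a-z])*)|(\\d+(-\\d+)*[?*]*)$", Pattern.CASE_INSENSITIVE);

    // Whole-string check, as recommended above.
    static boolean isValid(String s) {
        return s != null && ARRANGEMENT.matcher(s).matches();
    }

    // Substring check, shown only to illustrate the find() pitfall.
    static boolean foundAnywhere(String s) {
        return ARRANGEMENT.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] valid = {"X", "X-Y", "A-H-K-L-J-Y", "A-H-J-Y", "123", "12?", "12*", "12-17"};
        String[] invalid = {"-X-Y", "-XY", "*12", "?12"};
        for (String s : valid) System.out.println(s + " -> " + isValid(s));
        for (String s : invalid) System.out.println(s + " -> " + isValid(s));
        System.out.println("X!! with find(): " + foundAnywhere("X!!"));
        System.out.println("X!! with matches(): " + isValid("X!!"));
    }
}
```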
About the second question, this is the current implementation of StringUtils.isNumeric:
public static boolean isNumeric(final CharSequence cs) {
if (isEmpty(cs)) {
return false;
}
final int sz = cs.length();
for (int i = 0; i < sz; i++) {
if (Character.isDigit(cs.charAt(i)) == false) {
return false;
}
}
return true;
}
So no, there is nothing wrong about it, that is as simple as it gets. But you need to include an external JAR in your program, which I find unnecessary if you just want to use such a simple method.
I believe that you should first remove the Pattern.quote() call, because it turns the input pattern into a string literal, and that is not really useful in your context.
To match the valid arrangements with letters, something like this should work:
^[a-z](?:-[a-z])*$
For the numbers (if I understood the rules correctly):
^\\d+(?:[?*]|-\\d+)*$
And if you want to combine them:
^(?:[a-z](?:-[a-z])*|\\d+(?:[?*]|-\\d+)*)$
I'm not familiar with Java itself, nor the isNumeric method, sorry.
As per your comment, if you want to accept *12 or 1?2 or 12*456, you can use:
^\\*?\\d+(?:[?*]\\d*|-\\d+)*$
Then add it to the previous regex like so:
^(?:[a-z](?:-[a-z])*|\\*?\\d+(?:[?*]\\d*|-\\d+)*)$
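For anyone wanting to try the combined regex from Java, here is a rough sketch. The class name WildcardNumberCheck and the helper isValid are hypothetical, and the CASE_INSENSITIVE flag is carried over from the question's code rather than from this answer:

```java
import java.util.regex.Pattern;

// Hypothetical harness around the combined regex from the answer above.
class WildcardNumberCheck {
    static final Pattern COMBINED = Pattern.compile(
            "^(?:[a-z](?:-[a-z])*|\\*?\\d+(?:[?*]\\d*|-\\d+)*)$",
            Pattern.CASE_INSENSITIVE); // flag assumed from the question's code

    static boolean isValid(String s) {
        return s != null && COMBINED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"*12", "1?2", "12*456", "X-Y", "12-17", "?12", "-X-Y", "12-"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```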
How do you build a regex to return for the characters between < and # of a string?
For example <1001#10.2.2.1> would return 1001.
Would something using <.?> work?
Would something using "<.?>" work?
A slightly modified version of it would work: <.*?# (you need an # at the end, and you need a reluctant quantifier *? in place of an optional mark ?). However it could be inefficient because of backtracking. Something like this would be better:
<([^#]*)#
This expression starts by finding <, then takes as many non-# characters as it can (these are what gets captured), and finally matches the # before stopping.
Parentheses denote a capturing group. Use regex API to extract it:
Pattern p = Pattern.compile("<([^#]*)#");
Matcher m = p.matcher("<1001#10.2.2.1>");
if (m.find()) {
System.out.println(m.group(1));
}
This prints 1001 (demo).
What about the following:
(?<=<)[^#]*
e.g.:
private static final Pattern REGEX_PATTERN =
Pattern.compile("(?<=<)[^#]*");
public static void main(String[] args) {
String input = "<1001#10.2.2.1>";
Matcher matcher = REGEX_PATTERN.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Output:
1001
Um.
<([0-9]*?)#
I'm assuming it's numbers only.
If it can be any characters, use this:
<(.*?)#
tested here..
Maybe i'm lacking knowledge but my understanding of regex is that you need () to get the capture groups... otherwise if you don't you'll just be selecting characters without actually "capturing" them.
so this..
<.?>
won't do anything .
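To see this in Java, here is a small sketch (the class name AngleHashDemo and both helpers are made up for illustration). <.?> only matches a < followed by at most one character and then >, so it finds nothing in the sample input, while the capturing variants return 1001. Note that group() without an index still returns the whole match, so parentheses are only needed when you want part of it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regexes come from the answers in this thread.
class AngleHashDemo {
    // True if the regex matches anywhere in the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns capture group 1 of the first match, or null if there is no match.
    // Only call this with a regex that has at least one capturing group.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<1001#10.2.2.1>";
        System.out.println(matchesAnywhere("<.?>", input));   // false
        System.out.println(firstGroup("<([0-9]*?)#", input)); // 1001
        System.out.println(firstGroup("<(.*?)#", input));     // 1001
    }
}
```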
Trying to make a regex that grabs all words like lets just say, chicken, that are not in brackets. So like
chicken
Would be selected but
[chicken]
Would not. Does anyone know how to do this?
String template = "[chicken]";
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while (m.find())
{
System.out.println(m.group());
}
It uses a combination of negative look-behind and negative look-aheads and boundary matchers.
(?<!\\[) //negative look behind
(?!\\]) //negative look ahead
(\\w+) //capture group for the word
\\G //is a boundary matcher for marking the end of the previous match
(please read the following edits for clarification)
EDIT 1:
If one needs to account for situations like:
"chicken [chicken] chicken [chicken]"
We can replace the regex with:
String regex = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
EDIT 2:
If one also needs to account for situations like:
"[chicken"
"chicken]"
As in one still wants the "chicken", then you could use:
String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";
Which essentially accounts for the two cases of having only one bracket on either side. It accomplishes this through the | which acts as an or, and by using ? after the look-ahead/behinds, where ? means 0 or 1 of the previous expression.
I guess you want something like:
final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");
private List<String> findAllUnbracketedWords(final String s) {
final List<String> ret = new ArrayList<String>();
final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
while (m.find()) {
ret.add(m.group());
}
return Collections.unmodifiableList(ret);
}
Use this:
/(?<![\[\w])\w+(?![\w\]])/
i.e., consecutive word characters with no square bracket or word character before or after.
This needs to check both left and right for both a square bracket and a word character, else for your input of [chicken] it would simply return
hicke
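Here is a sketch of that in Java, where the backslashes have to be doubled inside the string literal. The class name UnbracketedWords, the constant names and the helper findAll are my own. It compares a version that only checks for brackets with the version above that also checks for word characters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two look-around variants.
class UnbracketedWords {
    // Only checks for brackets; backtracks into the middle of a bracketed word.
    static final String BRACKET_ONLY = "(?<!\\[)\\w+(?!\\])";
    // Also forbids a word character on either side, as in the regex above.
    static final String BRACKET_AND_WORD = "(?<![\\[\\w])\\w+(?![\\w\\]])";

    // Collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(BRACKET_ONLY, "[chicken]"));     // [hicke]
        System.out.println(findAll(BRACKET_AND_WORD, "[chicken]")); // []
        System.out.println(findAll(BRACKET_AND_WORD, "chicken [chicken] chicken"));
    }
}
```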
Without look around:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchingTest
{
private static String x = "pig [cow] chicken bull] [grain";
public static void main(String[] args)
{
Pattern p = Pattern.compile("(\\[?)(\\w+)(\\]?)");
Matcher m = p.matcher(x);
while(m.find())
{
String firstBracket = m.group(1);
String word = m.group(2);
String lastBracket = m.group(3);
if ("".equals(firstBracket) && "".equals(lastBracket))
{
System.out.println(word);
}
}
}
}
Output:
pig
chicken
A bit more verbose, sure, but I find it more readable and easier to understand. Certainly simpler than a huge regular expression trying to handle all possible combinations of brackets.
Note that this won't filter out input like [fence tree grass]; it will indicate that tree is a match. You cannot skip tree in that without a parser. Hopefully, this is not a case you need to handle.
public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.)"));
}
}
This program prints "false". What?!
I am expecting to match the prefix of the string: "117_117_0009v0_1"
I know this stuff, really I do... but for the life of me, I've been staring at this for 20 minutes and have tried every variation I can think of and I'm obviously missing something simple and obvious here.
Hoping the many eyes of SO can pick it out for me before I lose my mind over this.
Thanks!
The final working version ended up as:
String text = "117_117_0009v0_172_5738_5740";
String regex = "[0-9_]+v._.";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
if (m.lookingAt()) {
System.out.println(m.group());
}
One non-obvious discovery/reminder for me was that before accessing matcher groups, one of matches(), lookingAt() or find() must be called. If not, an IllegalStateException is thrown with the unhelpful message "No match found". Despite this, groupCount() will still return non-zero, but it lies (it counts the capturing groups in the pattern, not the groups of an actual match). Do not believe it.
I forgot how ugly this API is. Argh...
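For anyone who wants to reproduce that behaviour, a minimal sketch follows. The class name MatcherStateDemo and its helpers are hypothetical; the pattern and input are the ones from the question, with the capturing parentheses kept so that groupCount() has something to count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of Matcher state before any match attempt.
class MatcherStateDemo {
    static Matcher freshMatcher() {
        return Pattern.compile("([0-9_]+v._.)")
                .matcher("117_117_0009v0_172_5738_5740");
    }

    // True if group() throws when no match attempt has been made yet.
    static boolean groupBeforeMatchThrows() {
        try {
            freshMatcher().group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // groupCount() depends only on the pattern, not on a successful match.
    static int groupCountBeforeMatch() {
        return freshMatcher().groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupBeforeMatchThrows()); // true
        System.out.println(groupCountBeforeMatch());  // 1
    }
}
```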
String.matches() requires the entire string to match, as if the pattern were wrapped in ^ and $, so something like this should work:
public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.).*$"));
}
}
returns:
true
Match content:
117_117_0009v0_1
This is the code I used to extract the match:
Pattern p = Pattern.compile("^([0-9_]+v._.).*$");
String str = "117_117_0009v0_172_5738_5740";
Matcher m = p.matcher(str);
if (m.matches())
{
System.out.println(m.group(1));
}
If you want to check whether a string starts with a certain pattern, you should use the Matcher.lookingAt() method:
Pattern pattern = Pattern.compile("([0-9_]+v._.)");
Matcher matcher = pattern.matcher("117_117_0009v0_172_5738_5740");
if (matcher.lookingAt()) {
int groupCount = matcher.groupCount();
for (int i = 0; i <= groupCount; i++) {
System.out.println(i + " : " + matcher.group(i));
}
}
Javadoc:
boolean java.util.regex.Matcher.lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched. If the match succeeds then more information can be obtained via the start, end, and group methods.
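Side by side, the three Matcher methods behave roughly like this. It is only a sketch: the class name MatchModes, the helper names and the second and third sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing matches(), lookingAt() and find().
class MatchModes {
    static final Pattern P = Pattern.compile("[0-9_]+v._.");

    // The whole input must match.
    static boolean whole(String s) { return P.matcher(s).matches(); }

    // A match must start at the beginning, but may end anywhere.
    static boolean prefix(String s) { return P.matcher(s).lookingAt(); }

    // A match may start and end anywhere.
    static boolean anywhere(String s) { return P.matcher(s).find(); }

    public static void main(String[] args) {
        String full = "117_117_0009v0_172_5738_5740"; // input from the question
        String shifted = "id=117_117_0009v0_172";     // made-up input
        System.out.println(whole(full));       // false
        System.out.println(prefix(full));      // true
        System.out.println(anywhere(full));    // true
        System.out.println(prefix(shifted));   // false
        System.out.println(anywhere(shifted)); // true
        System.out.println(whole("117v0_1"));  // true
    }
}
```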
I don't know the Java flavor of regular expressions; however, this PCRE regular expression should work:
^([\d_]+v\d_\d).+
I don't know why you are using ._. instead of \d_\d
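For what it's worth, the same expression can be used from Java if the backslashes are doubled inside the string literal. A minimal sketch, with the class name PcreInJava and the helper prefix made up for illustration and null returned when there is no match (an assumption):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the regex is the PCRE one above with backslashes doubled.
class PcreInJava {
    static final Pattern P = Pattern.compile("^([\\d_]+v\\d_\\d).+");

    // Returns capture group 1, or null if the input does not match.
    static String prefix(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefix("117_117_0009v0_172_5738_5740")); // 117_117_0009v0_1
    }
}
```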