Regex - Match numbers & special cases - java

I'm trying to make a regex that would produce the following results :
for 7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3 : 7.0, 5, :asc, 8.256, :b, 2, :d, 3
for -+*-/^^ )รง# : nothing
It's should first match numbers which can be float, so in my regex I have : [0-9]+(\\.[0-9])? but it should also mach special cases like :a or :Abc.
To be more precise, it should (if possible) match anything but mathematical operators /*+^- and parentheses.
So here is my final regex : ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+) but it's not working because matcher.groupCount() returns 3 for both of the examples I gave.

Groups are what you specifically group in the regex. Anything surrounded in parentheses is a group. (Hello) World has 1 group, Hello. What you need to be doing is finding all the matches.
In your code ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+), 3 sets of parentheses can be seen. This is why you will always be given 3 groups in every match.
Your code works fine as it is, here is an example:
String text = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";
Pattern p = Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");
Matcher m = p.matcher(text);
List<String> matches = new ArrayList<String>();
while (m.find()) matches.add(m.group());
for (String match : matches) System.out.println(match);
The ArrayList matches will contain all of the matches that your regex finds.
The only change I made was add a + after the second [0-9].
Here is the output:
7.0
5
:asc
8.256
:b
2
:d
3
Here is some more information about groups in java.
Does that help?

Your regex is correct, run the following code:
String input = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3"; // your input
String regex = "(\\d+(\\.\\d+)?)|(:[a-z-A-Z]+)"; // exactly yours.
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
Your problem is the understanding of the method matcher.groupCount(). JavaDoc clearly says
Returns the number of capturing groups in this matcher's pattern.

([^\()+\-*\s])+ //put any mathematical operator inside square bracket

Related

regex for not matching alpha plus numeric range

I have the following regex
.{19}_.{3}PDR_.{8}(ABCD|CTNE|PFRE)006[0-9][0-9].{3}_.{6}\.POC
a match is for example
NRM_0157F0680884976_598PDR_T0060000ABCD00619_00_6I1N0T.POC
and would like to negate the (ABCD|CTNE|PFRE)006[0-9][0-9]
portion such that
NRM_0157F0680884976_598PDR_T0060000ABCD00719_00_6I1N0T.POC
is a match but
NRM_0157F0680884976_598PDR_T0060000ABCD007192_00_6I1N0T.POC
or
NRM_0157F0680884976_598PDR_T0060000ABCD0061_00_6I1N0T.POC
is not (the negated part must be 9 chars long just like the non negated part for a total length of 58 chars).
Consider using the following pattern:
\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\b
Sample Java code:
String input = "Matching value is ABCD00601 but EFG123 is non matching";
Pattern r = Pattern.compile("\\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\\b");
Matcher m = r.matcher(input);
while (m.find()) {
System.out.println("Found a match: " + m.group());
}
This prints:
Found a match: ABCD00601
I would like to propose this expression
(ABCD|CTNE|PFRE)006\d{1,2}
where \d{1,2} catches any one or two digit number
that is it would get any alphanumeric values from ABCD0060~ABCD00699 or CTNE0060~CTNE00699 or PFRE0060~PFRE00699
Edit #1:
as user #Hao Wu mentioned the above regex would also accept if its ABCD0060 which is not ideal so
this should do the job by removing 1 from the { } we can get
alphanumeric values from ABCD00600~ABCD00699 or CTNE00600~CTNE00699 or PFRE00600~PFRE00699
so the resulting regex would be
(ABCD|CTNE|PFRE)006\d{2}

add +n to the name of some Strings id ends with "v + number"

I have an array of Strings
Value[0] = "Documento v1.docx";
Value[1] = "Some_things.pdf";
Value[2] = "Cosasv12.doc";
Value[3] = "Document16.docx";
Value[4] = "Nodoc";
I want to change the name of the document and add +1 to the version of every document. But only the Strings of documents that ends with v{number} (v1, v12, etc).
I used the regex [v]+a*^ but only i obtain the "v" and not the number after the "v"
If all your strings ending with v + digits + extension are to be processed, use a pattern like v(\\d+)(?=\\.[^.]+$) and then manipulate the value of Group 1 inside the Matcher#appendReplacement method:
String[] strs = { "Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc"};
Pattern pat = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");
for (String s: strs) {
StringBuffer result = new StringBuffer();
Matcher m = pat.matcher(s);
while (m.find()) {
int n = 1 + Integer.parseInt(m.group(1));
m.appendReplacement(result, "v" + n);
}
m.appendTail(result);
System.out.println(result.toString());
}
See the Java demo
Output:
Documento v2.docx
Some_things.pdf
Cosasv13.doc
Document16.docx
Nodoc
Pattern details
v - a v
(\d+) - Group 1 value: one or more digits
(?=\.[^.]+$) - that are followed with a literal . and then 1+ chars other than . up to the end of the string.
The Regex v\d+ should match on the letter v, followed by a number (please note that you may need to write it as v\\d+ when assigning it to a String). Further enhancement of the Regex depends in what your code looks like. You may want to to wrap in a Capturing Group like (v\d+), or even (v(\d+)).
The first reference a quick search turns up is
https://docs.oracle.com/javase/tutorial/essential/regex/ ,
which should be a good starting point.
Try a regex like this:
([v])([1-9]{1,3})(\.)
notice that I've already included the point in order to have less "collisions" and a maximum of 999 versions({1,3}).
Further more I've used 3 different groups so that you can easily retrieve the version number increase it and replace the string.
Example:
String regex = ;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(time);
if(matcher.matches()){
int version = matcher.group(2); // don't remember if is 0 or 1 based
}

Regex to find a group which not match with given pattern?

I have a string say :
test=t1,test2=1,test3=t4
I want to find group or value where test2 value is not equal to 1,
I know I can find its value easily by using regex like .+,test2=(.+?),.+. but it also give me where test2=1, but I want test2 value only if it is not equal to one?
You can use negative lookahead assertion:
"test2=(?!1\\b)([^,]*)"
Above pattern will matchtest2 will match only if it is not followed by 1 (word boundary \b is used to not match numbers like 17, but only match 1)
This will work for you :
String s = "test=t1,test2=2,test3=t4";
Pattern p = Pattern.compile("test2=(?!1,)(\\d+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
I/O :
"test=t1,test2=2,test3=t4" 2
"test=t1,test2=11,test3=t4" 11
"test=t1,test2=1,test3=t4" no result

Regular expression matching "dictionary words"

I'm a Java user but I'm new to regular expressions.
I just want to have a tiny expression that, given a word (we assume that the string is only one word), answers with a boolean, telling if the word is valid or not.
An example... I want to catch all words that is plausible to be in a dictionary... So, i just want words with chars from a-z A-Z, an hyphen (for example: man-in-the-middle) and an apostrophe (like I'll or Tiffany's).
Valid words:
"food"
"RocKet"
"man-in-the-middle"
"kahsdkjhsakdhakjsd"
"JESUS", etc.
Non-valid words:
"gipsy76"
"www.google.com"
"me#gmail.com"
"745474"
"+-x/", etc.
I use this code, but it won't gave the correct answer:
Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());
What's wrong with my regex?
Add a + after the expression to say "one or more of those characters":
Escape the hyphen with \ (or put it last).
Remove those & characters:
Here's the code:
Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());
Complete test:
String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me#gmail.com", "745474","+-x/" };
Pattern p = Pattern.compile("[A-Za-z'-]+");
for (String shouldMatch : ok)
if (!p.matcher(shouldMatch).matches())
System.out.println("Error on: " + shouldMatch);
for (String shouldNotMatch : notOk)
if (p.matcher(shouldNotMatch).matches())
System.out.println("Error on: " + shouldNotMatch);
(Produces no output.)
This should work:
"[A-Za-z'-]+"
But "-word" and "word-" are not valid. So you can uses this pattern:
WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*/
You can use above regex if you don't want successive "'" or "-".
It will give you accurate matching your text.
It accepts
man-in-the-middle
asd'asdasd'asd
It rejects following string
man--in--midle
asdasd''asd
Hi Aloob please check with this, Bit lengthy, might be having shorter version of this, Still...
[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*

Author and time matching regex

I would to use a regex in my Java program to recognize some feature of my strings.
I've this type of string:
`-Author- has wrote (-hh-:-mm-)
So, for example, I've a string with:
Cecco has wrote (15:12)
and i've to extract author, hh and mm fields. Obviously I've some restriction to consider:
hh and mm must be numbers
author hasn't any restrictions
I've to consider space between "has wrote" and (
How can I can use regex?
EDIT: I attach my snippet:
String mRegex = "(\\s)+ has wrote \\((\\d\\d):(\\d\\d)\\)";
Pattern mPattern = Pattern.compile(mRegex);
String[] str = {
"Cecco CQ has wrote (14:55)", //OK (matched)
"yesterday you has wrote that I'm crazy", //NO (different text)
"Simon has wrote (yesterday)", // NO (yesterday isn't numbers)
"John has wrote (22:32)", //OK
"James has wrote(22:11)", //NO (missed space between has wrote and ()
"Tommy has wrote (xx:ss)" //NO (xx and ss aren't numbers)
};
for(String s : str) {
Matcher mMatcher = mPattern.matcher(s);
while (mMatcher.find()) {
System.out.println(mMatcher.group());
}
}
homework?
Something like:
(.+) has wrote \((\d\d):(\d\d)\)
Should do the trick
() - mark groups to capture (there are three in the above)
.+ - any chars (you said no restrictions)
\d - any digit
\(\) escape the parens as literals instead of a capturing group
use:
Pattern p = Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
To cope with an optional (HH:mm) at the end you need to start to use some dark regex voodoo:
Pattern p = Pattern.compile("(.+) has wrote\\s?(?:\\((\\d\\d):(\\d\\d)\\))?");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
m = p.matcher("Gareth has wrote");
if( m.matches()){
System.out.println(m.group(1));
// m.group(2) == null since it didn't match anything
}
The new unescaped pattern:
(.+) has wrote\s?(?:\((\d\d):(\d\d)\))?
\s? optionally match a space (there might not be a space at the end if there isn't a (HH:mm) group
(?: ... ) is a none capturing group, i.e. allows use to put ? after it to make is optional
I think #codinghorror has something to say about regex
The easiest way to figure out regular expressions is to use a testing tool before coding.
I use an eclipse plugin from http://www.brosinski.com/regex/
Using this I came up with the following result:
([a-zA-Z]*) has wrote \((\d\d):(\d\d)\)
Cecco has wrote (15:12)
Found 1 match(es):
start=0, end=23
Group(0) = Cecco has wrote (15:12)
Group(1) = Cecco
Group(2) = 15
Group(3) = 12
An excellent turorial on regular expression syntax can be found at http://www.regular-expressions.info/tutorial.html
Well, just in case you didn't know, Matcher has a nice function that can draw out specific groups, or parts of the pattern enclosed by (), Matcher.group(int). Like if I wanted to match for a number between two semicolons like:
:22:
I could use the regex ":(\\d+):" to match one or more digits between two semicolons, and then I can fetch specifically the digits with:
Matcher.group(1)
And then its just a matter of parsing the String into an int. As a note, group numbering starts at 1. group(0) is the whole match, so Matcher.group(0) for the previous example would return :22:
For your case, I think the regex bits you need to consider are
"[A-Za-z]" for alphabet characters (you could probably also safely use "\\w", which matchers alphabet characters, as well as numbers and _).
"\\d" for digits (1,2,3...)
"+" for indicating you want one or more of the previous character or group.

Categories

Resources