I need to build a regex for a name with the following pattern, so that John D.E. would pass the regex test.
Basically what I want is:
N letters (a-zA-Z) come first
Then there's exactly one space
Exactly one letter (a-zA-Z)
Exactly one dot
Exactly one letter (a-zA-Z)
Exactly one dot
I wrote this regex ^([a-zA-Z]*)+( {1})+([a-zA-Z]{1})+(\.)+([a-zA-Z]{1})+(\.), but it doesn't seem to work properly (for example, the expression still allows any number of spaces). How do I restrict it? {1} doesn't work.
Try this:
^([a-zA-Z])+([ ]{1})([a-zA-Z]{1})([.])([a-zA-Z]{1})([.])
I've put the space and the dots into character classes ([]). If you don't do this with the dot, it means any character. Also, the pluses are redundant: they mean one or more of the preceding item.
P.S.: As f1sh correctly notes, {1} doesn't change a thing, so the shorter form would be:
^([a-zA-Z])+([ ])([a-zA-Z])([.])([a-zA-Z])([.])
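A minimal Java check of the shorter form, assuming the name is validated with Matcher.matches(), which requires the whole input to match (so no trailing $ is needed). The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

class NameRegex {
    // Shorter form from the answer: letters, one space, letter, dot, letter, dot.
    static final Pattern NAME =
            Pattern.compile("^([a-zA-Z])+([ ])([a-zA-Z])([.])([a-zA-Z])([.])");

    // matches() anchors at both ends, so any extra input makes the check fail.
    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("John D.E."));  // true
        System.out.println(isValid("John  D.E.")); // false: two spaces
        System.out.println(isValid("John D.E"));   // false: final dot missing
    }
}
```

If you use find() instead (for example, to search inside a longer string), add $ at the end of the pattern; otherwise input such as John D.E.xyz would still be accepted.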
I have some larger text which in essence looks like this:
abc12..manycharshere...hi - abc23...manyothercharshere...jk
Obviously there are two items, each starting with "abc"; the numbers (12 and 23) are interesting, as are the "hi" and "jk" at the end.
I would like to create a regular expression that parses out the numbers, but only if the two characters at the end match; i.e. I am looking for the number related to "jk". However, the following regular expression matches the whole string and thus returns "12", not "23", even when the part in between is matched non-greedily:
abc([0-9]+).*?jk
Is there a way to construct a regular expression which matches text like the one above, i.e. retrieving "23" for items ending in "jk"?
Basically I need something like: match "abc" followed by a number, but only if "jk" appears at the end before another instance of "abc" followed by a number.
Note: the texts/matches are an abstraction here. The actual text is more complicated, especially the things that can appear as "manyothercharshere"; I simplified it to show the underlying problem more clearly.
Use a regex like this: .*abc([0-9]+).*?jk
I think you want something like this:
abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)
You need to use negative lookahead here to make it work:
abc(?!.*?abc)([0-9]+).*?jk
Here (?!.*?abc) is a negative lookahead that makes sure abc is matched only where it is NOT followed by another abc, thus ensuring that the closest string between abc and jk is matched.
Being non-greedy does not change the rule that the first match is returned. So abc([0-9]+).*?jk will find the first jk after “abcnumber” rather than the last one, but it still matches the first “abcnumber”.
One way to solve this is to specify that the dot must not match the start of another abc([0-9]+):
abc([0-9]+)((?!abc([0-9]+)).)*jk
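To make the difference concrete, here is a small Java sketch that runs the lazy pattern from the question and the "tempered" dot version against the simplified sample text, using find() and printing group 1. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDot {
    static final String TEXT =
            "abc12..manycharshere...hi - abc23...manyothercharshere...jk";

    // Returns group 1 of the first match found, or null if there is none.
    static String firstNumber(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Lazy .*? still starts at the first "abc12", so this prints 12.
        System.out.println(firstNumber("abc([0-9]+).*?jk", TEXT));
        // The dot may not step over another "abc<number>", so the match
        // starting at "abc12" fails and the search succeeds at "abc23": prints 23.
        System.out.println(firstNumber("abc([0-9]+)((?!abc([0-9]+)).)*jk", TEXT));
    }
}
```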
If the entire match does not need to be exactly the item, you can do it more simply:
.*(abc([0-9]+).*?jk)
In this case, it’s group 1 that contains your intended match. The pattern uses a greedy match-all to ensure that the last possible “abcnumber” is matched within the group.
Assuming that hyphen separates "items", this regex will capture the numbers from the target item:
abc([0-9]+)[^-]*?jk
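On this simplified sample, the other answers in this thread also pick out 23. The Java sketch below (class and helper names are hypothetical) runs each of them with find(); note that they rest on different assumptions (the last one, for instance, assumes "-" only ever separates items), so they may diverge on the real, more complicated text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OtherAnswers {
    static final String TEXT =
            "abc12..manycharshere...hi - abc23...manyothercharshere...jk";

    // Returns the given capturing group of the first match, or null.
    static String find(String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Lookahead answer: jk must follow with no "jk" or "abc<digit>" in between.
        System.out.println(find("abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)", 1));
        // Negative lookahead answer: this "abc" must not be followed by another "abc".
        System.out.println(find("abc(?!.*?abc)([0-9]+).*?jk", 1));
        // Greedy prefix answer: group 1 holds the whole item, group 2 the number.
        System.out.println(find(".*(abc([0-9]+).*?jk)", 2));
        // Hyphen answer: assumes "-" never occurs inside an item.
        System.out.println(find("abc([0-9]+)[^-]*?jk", 1));
    }
}
```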
I have a regex like this (which is thanks to you guys in a big way):
(?<=( |\\s|\\A|^))#(!)[\\w]{3,}+ ?[\\w]*
Which works great; however, I now need to match one more case and I can't work out how to do it. I need a minimum of 3 chars after the #, which I've done, but I also need to allow a minimum of 3 chars with at least two before a space and one after it, where the space is optional. So I need to match these patterns:
#tst
#tst test
#ts t
How can I enforce a minimum of three chars if there's no space, or a minimum of two chars, a space, and then at least one more char? I can do it as two separate expressions, but I'm hoping it's possible to do it with one.
Can anyone point me in the right direction?
Thanks.
EDIT:
Ok I think I've kind of achieved what I want with:
(?<=( |\\s|\\A|^))#{1}(([\\w]{3,}+ ?[\\w]*)|([\\w]{2,} {1}[\\w]{1,}))
Is there a more efficient way, or is this how it should be done?
I think you can simplify your expression a bit:
String regex = "(?<=(\\s|\\A|^))#(\\w{2,} ?\\w+)";
What I have done:
Removed the redundant space alternative from the lookbehind (\s already covers it).
Simplified the last expression. As per your description, it now accepts a minimum of 2 characters, followed by an optional space, followed by at least one more character.
I'm not sure what the point of the (!) part was, so it is removed in this version to match your test cases.
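A quick Java check of the simplified pattern against the three cases from the question, plus three inputs I would expect to be rejected (#ts, #t t and a#tst are my own examples; the class name is hypothetical). It uses find() because the tag may appear inside longer text, and reads group 2 because group 1 is the capturing group inside the lookbehind.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTag {
    // '#' preceded by whitespace or the start of input, then at least two word
    // chars, an optional single space, and at least one more word char.
    static final Pattern TAG = Pattern.compile("(?<=(\\s|\\A|^))#(\\w{2,} ?\\w+)");

    // Returns the text after '#', or null when there is no match.
    static String tag(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"#tst", "#tst test", "#ts t", "#ts", "#t t", "a#tst"}) {
            System.out.println(s + " -> " + tag(s));
        }
    }
}
```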
The following one should suit your needs:
(?<=(?<!\w)#)\w{2,} ?\w+
Don't forget to escape the backslashes in a Java string literal:
(?<=(?<!\\w)#)\\w{2,} ?\\w+
The simplest regex I can think of is:
(?<=\s|^)#(\s*\w){3,}
Pattern
^\\d{1}-\\d{10}|\\d{1,9}|^TWC([0-9){12})$
should validate any of these
1-23232445
1-232323
1-009121212
12
12222
TWC12222
TWC1222324
When I test the TWC pattern, it doesn't match. I added "|" for an OR condition, and then allowed digits 0-9, limited to 12 digits. What am I missing?
TWC([0-9)
I think this is where it might not be working.
You need
TWC([0-9]{12})
Complete answer...
(\d{1}-\d{1,12})|^TWC(\d{1,12})$
An even nicer answer:
^(\\d-|TWC|)(\\d{1,12})$ // I believe this syntax will match your needs.
tested :)
^([0-9]-|TWC|)([0-9]{1,12})$ // or
^(\d-|TWC|)(\d{1,12})$
breakdown
^
this denotes the start of the string
\d or [0-9]
denotes one character from 0 through 9 (note: \d might not work in some languages, or may require different syntax!)
|
is essentially an OR
{1,12}
accepts the preceding pattern only 1 to 12 times; in my code, that pattern is \d or [0-9]
$
is the end of the line
This essentially checks whether the line starts with a digit followed by a -, with TWC, or with nothing at all (the empty alternative after the last |), and then reads 1 to 12 digits. It should work for all your cases.
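A short Java sketch that runs the "even nicer" pattern over the examples from the question. The last three inputs (TWC, 1- and a TWC prefix with 13 digits) are my own additions to show rejected cases, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class IdRegex {
    // Optional prefix (a digit and a dash, "TWC", or nothing), then 1 to 12 digits.
    static final Pattern ID = Pattern.compile("^(\\d-|TWC|)(\\d{1,12})$");

    static boolean isValid(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"1-23232445", "1-232323", "1-009121212", "12", "12222",
                           "TWC12222", "TWC1222324", "TWC", "1-", "TWC1234567890123"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```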
NOTE:
Look at the syntax of the language you are using; in some cases you need to escape characters with \ for the pattern to work. In C/C++ string literals, for example, each backslash must be written twice (\\). Please be very wary of language-specific syntax.
Sorry for all the confusion, and also for lying a whole bunch apparently. The issue you're having is that you are using exact quantifiers in a couple of places you don't mean to, namely the {10} and {12}. This requires exactly ten or twelve digits in those spots. What you presumably want is for those to be {1,10} and {1,12} respectively.
What I would do is something like this, using parentheses and quantifiers to clean everything up and repeating yourself as little as possible, to avoid confusion. You've got three possible prefixes (a digit and a dash, or "TWC", or nothing). I'd put those possibilities all together, and then add the rest. This makes the regex much easier to look at.
^(\\d-|TWC){0,1}\\d{1,12}$
The breakdown:
^ is at the beginning, always.
(\\d-|TWC){0,1} Next comes either a single digit followed by a dash, or the string "TWC". This prefix occurs either zero times (for no prefix) or one time.
\\d{1,12}$ Finally, there is a string of one to twelve digits, followed by the end of the line/input (depending on whether the MULTILINE flag is set, of course).
Of course you won't be able to simplify it quite this much if the different prefixes can only allow certain numbers of digits, but this is the basic idea.
You've also got what looks like a typo; TWC([0-9){12}) should be TWC([0-9]{12}). I'm guessing this was just a typo when writing out the question though, since what you have right now would blow up at runtime when you tried to use it otherwise, and it sounds like it's working for some of your inputs.
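Both points can be checked in Java. The sketch below (class and helper names are hypothetical) shows that the pattern exactly as written in the question fails to compile, and that with only the missing ] restored, the exact quantifiers {10} and {12} still reject inputs from the question's own list.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class TwcTypo {
    // The pattern exactly as written in the question: the '[' is never closed.
    static final String ORIGINAL = "^\\d{1}-\\d{10}|\\d{1,9}|^TWC([0-9){12})$";
    // Only the typo fixed; the exact quantifiers {10} and {12} are unchanged.
    static final String TYPO_FIXED = "^\\d{1}-\\d{10}|\\d{1,9}|^TWC([0-9]{12})$";

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    static boolean matchesTypoFixed(String s) {
        return Pattern.matches(TYPO_FIXED, s);
    }

    public static void main(String[] args) {
        System.out.println(compiles(ORIGINAL));                  // false: unclosed character class
        System.out.println(matchesTypoFixed("TWC12222"));        // false: {12} needs exactly 12 digits
        System.out.println(matchesTypoFixed("1-23232445"));      // false: {10} needs exactly 10 digits
        System.out.println(matchesTypoFixed("TWC123456789012")); // true
    }
}
```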
I want to create a regular expression, in Java, that will match the following:
*A*B
where A and B are ANY character except an asterisk, and there can be any number of A characters and B characters. The A(s) are preceded by an asterisk, and the B(s) are preceded by an asterisk.
Will the following work? It seems to work when I run it, but I want to be absolutely sure.
Pattern.matches("\\A\\*([^\\*]{1,})\\*([^\\*]{1,})\\Z", someString)
It will work, however you can rewrite it as this (unquoted):
\A\*([^*]+)\*([^*]+)\Z
there is no need to escape the star in a character class;
{1,} and + are the same quantifier (once or more).
Note 1: you use .matches() which automatically anchors the regex at the beginning and end; you may therefore do without \A and \Z.
Note 2: I have retained the capturing groups -- do you actually need them?
Note 3: it is unclear whether you want the same character repeated between the stars; the example above assumes not. If you want the same, then use this:
\A\*(([^*])\2*)\*(([^*])\4*)\Z
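A small Java sketch comparing the two rewrites from this answer (class name hypothetical). *AAA*BBB and *ABC*DEF illustrate Note 3; **B and *A*B*C are my own examples of inputs both patterns reject.

```java
import java.util.regex.Pattern;

class StarRegex {
    // Any non-'*' characters between the stars.
    static final Pattern ANY = Pattern.compile("\\A\\*([^*]+)\\*([^*]+)\\Z");
    // The same character repeated in each part (back-reference variant, Note 3).
    static final Pattern SAME = Pattern.compile("\\A\\*(([^*])\\2*)\\*(([^*])\\4*)\\Z");

    public static void main(String[] args) {
        for (String s : new String[] {"*AAA*BBB", "*ABC*DEF", "**B", "*A*B*C"}) {
            System.out.println(s + "  any=" + ANY.matcher(s).matches()
                    + "  same=" + SAME.matcher(s).matches());
        }
    }
}
```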
If I understood correctly, it can be as simple as
^\\*((?!\\*).)+\\*((?!\\*).)+
If you want a match on *AAA*BBB but not on *ABC*DEF use
^\*([a-zA-Z])\1*\*([a-zA-Z])\2*$
This won't match on this either
*A_$-123*B<>+-321
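The two answers above differ in what they allow between the stars. A hedged Java comparison (class name hypothetical), assuming both are used with matches() and that the letters-only restriction is what you want:

```java
import java.util.regex.Pattern;

class StarLetters {
    // Lookahead answer: any characters except '*' between the stars.
    static final Pattern ANY_NON_STAR = Pattern.compile("^\\*((?!\\*).)+\\*((?!\\*).)+");
    // Back-reference answer: one letter, repeated, in each part.
    static final Pattern SAME_LETTER = Pattern.compile("^\\*([a-zA-Z])\\1*\\*([a-zA-Z])\\2*$");

    public static void main(String[] args) {
        for (String s : new String[] {"*AAA*BBB", "*ABC*DEF", "*A_$-123*B<>+-321"}) {
            System.out.println(s + "  anyNonStar=" + ANY_NON_STAR.matcher(s).matches()
                    + "  sameLetter=" + SAME_LETTER.matcher(s).matches());
        }
    }
}
```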
I need a regular expression for the pattern below:
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My problem is that it can have a space between numbers, like /3232 323/, but not between slashes, like / /.
How can I validate it?
What I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
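The same check in Java, run over the valid and invalid strings from the question plus the two space cases from the problem statement (/3232 323/ and / /). The class name is hypothetical.

```java
import java.util.regex.Pattern;

class SlashDigits {
    // Optional leading slashes, then one or more chunks; each chunk is a digit,
    // optionally followed by " <digit>" pairs, then any number of slashes.
    // A space can therefore only appear between two digits.
    static final Pattern P = Pattern.compile("^/*(?:\\d(?: \\d)*/*)+$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"3232////33 43/323//", "3232////3343/323//", "/3232////343/323//",
                           "/3232 323/", "/sas/3232/////dsds/", "/ /34343///// /////",
                           "///////////", "/ /"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```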
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criteria.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically: digits or slashes, followed by either one digit and an optional space, or one slash and an optional space that is not followed by another slash. Repeat that. The space is optional because I assume there's none at the end of your string.
Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character of a number (i.e. spaces must be between numbers).
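A Java sketch comparing the lookaround answer and the lookbehind answer on the question's examples (class name hypothetical). The lookbehind pattern has no anchors, so it is assumed here to be used with Pattern.matches(), which anchors it at both ends.

```java
import java.util.regex.Pattern;

class SlashAnswers {
    // Lookaround answer, as posted.
    static final String LOOKAROUND = "^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$";
    // Lookbehind answer; assumed to be used with matches().
    static final String LOOKBEHIND = "/*(\\d[\\d ]*(?<=\\d)/+)+";

    static boolean ok(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String[] inputs = {"3232////33 43/323//", "3232////3343/323//", "/3232////343/323//",
                           "/3232 323/", "/sas/3232/////dsds/", "/ /34343///// /////",
                           "///////////", "/ /", "3232 323"};
        for (String s : inputs) {
            System.out.println(s + "  lookaround=" + ok(LOOKAROUND, s)
                    + "  lookbehind=" + ok(LOOKBEHIND, s));
        }
    }
}
```

They agree on every example from the question. They differ on 3232 323 (no trailing slash, my own example): the lookbehind version requires each run of digits to end in at least one slash, so it rejects that input, while the lookaround version accepts it. Which behavior is right depends on your data.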
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind that it will also match this:
/3232///// from /sas/3232/////dsds/
This is because part of the invalid string is correct.
If you are reading line by line, anchor the regex with ^ and $; if you are reading an entire block of text, look for \r\n around the regex above to match each line.