Regular Expression Match after double space, and before comma - java

I am trying to match the bolded portion of the below String, which would represent a city.
1795 New Test Dr Test TEst Wildwood, MI 48769-1100
There are two spaces between Dr and Test, the starting portion should happen after those double spaces, and end before the comma.
I feel like I am very close to having this correct but can't quite get it 100%, as it is including the white space characters before Test.
(?=\s{2})[\w+\s]*[^,]
The above is what I have so far. The many alternatives I tried did not work either: they still include the whitespace characters I do not want at the beginning.
I feel like I am missing something simple, but even after looking in many places I cannot seem to find the regex that would match this pattern.
Also I know this can be easily accomplished with split and substrings, but the requirement is a regex unfortunately, as this is for a database driven automation application and the format should be able to change on the fly without requiring a deploy due to code changes.

You need a look-behind for the spaces rather than a look-ahead, as you want the match to start immediately after them. From that point on, you can simply do a greedy match for anything that is not a comma:
(?<=\s{2})[^,]*
The * is greedy and will consume as many characters as it can, ending the match immediately before the comma.
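A minimal Java sketch of how the look-behind pattern could be applied. The class and method names (CityExtractor, extractCity) are made up for illustration, and the sample address is assumed to really contain two spaces between "Dr" and "Test":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityExtractor {
    // Look-behind for two whitespace characters, then everything up to the comma.
    private static final Pattern CITY = Pattern.compile("(?<=\\s{2})[^,]*");

    // Returns the first match, or null when the input has no double whitespace.
    static String extractCity(String address) {
        Matcher m = CITY.matcher(address);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Note the two spaces between "Dr" and "Test".
        String address = "1795 New Test Dr  Test TEst Wildwood, MI 48769-1100";
        System.out.println(extractCity(address)); // Test TEst Wildwood
    }
}
```

Because the look-behind does not consume anything, the match itself starts right after the two spaces, so no trimming is needed.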

\s also matches whitespace other than the space character, which may or may not be what you want.
How about ^.*?  ([^,]*).*$? That's a non-greedy match at the beginning of the line (^.*?), followed by two literal spaces, then a group capturing everything that isn't a comma, then a match of everything else to the end of the line.
Be aware, though, that when I copy and paste your example text, it does not contain two spaces. This might be causing you problems, or it's just a transcription issue and your original has the two spaces.
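For the capture-group variant, a short Java sketch might look like this. The names CityCapture and extractCity are hypothetical, and the two literal spaces are written as " {2}" only so they stay visible in the source; that is equivalent to typing two spaces:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityCapture {
    // Lazy prefix, exactly two literal spaces, then the city captured in group 1.
    private static final Pattern LINE = Pattern.compile("^.*? {2}([^,]*).*$");

    // Returns the captured city, or null when the line has no two consecutive spaces.
    static String extractCity(String address) {
        Matcher m = LINE.matcher(address);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String address = "1795 New Test Dr  Test TEst Wildwood, MI 48769-1100";
        System.out.println(extractCity(address)); // Test TEst Wildwood
    }
}
```

Unlike \s{2}, this version does not treat tabs or other whitespace as a separator, which is the difference pointed out above.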

Related

JAVA - Add ! and ? to regex pattern [duplicate]

I would like to extract sentences with the word "flung" in the whole text.
For example, in the following text, I'd like to extract the sentence "It was exactly as if a hand had clutched them in the centre and flung them aside." using regular expression.
I tried to use this .*? flung (?<sub>.*?)\., but it starts searching from the beginning of the line.
How could I solve the problem?
As she did so, a most extraordinary thing happened. The bed-clothes gathered themselves together, leapt up suddenly into a sort of peak, and then jumped headlong over the bottom rail. It was exactly as if a hand had clutched them in the centre and flung them aside. Immediately after, .........
Here you go,
[^.]* flung [^.]*\.
OR
[^.?!]*(?<=[.?\s!])flung(?=[\s.?!])[^.?!]*[.?!]
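As a rough illustration, the second pattern can be wrapped in a small Java helper that collects every sentence containing a given word. SentenceFinder and sentencesContaining are invented names, and the word is passed through Pattern.quote as an extra precaution that is not part of the answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceFinder {
    // Collects all sentences (ending in '.', '?' or '!') that contain the word.
    static List<String> sentencesContaining(String text, String word) {
        Pattern p = Pattern.compile(
            "[^.?!]*(?<=[.?\\s!])" + Pattern.quote(word) + "(?=[\\s.?!])[^.?!]*[.?!]");
        List<String> sentences = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            // The match keeps the space that follows the previous sentence, so trim it.
            sentences.add(m.group().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Stop! Who flung that? Nobody knows.";
        System.out.println(sentencesContaining(text, "flung")); // [Who flung that?]
    }
}
```

One limitation of this sketch: the look-behind needs a character before the word, so a word at the very start of the text is not found.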
Simply match anything between dots:
without a dot
[A-Za-z," ]+word[A-Za-z," ]+
with a dot
[A-Za-z," ]+word[A-Za-z," ]+\.
"[A-Z]\\s?\\w*\\s?(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*(?:your target\\s)(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*[\\.|\\?|!]"
A sentence starts with any capital letter, in the middle it may contain decimal or abbreviation.
(?<=^|\s)[A-Z][^!?.]*( word\s*)[^!?.]*(?=\.|\!|\?)
Before the first capital letter there is a line start or a whitespace character. Then come zero or more characters other than !, ? or ., then your target word with or without whitespace after it (in case it is at the end of the sentence), then again zero or more characters other than !, ? or ., and finally the sentence ends with ., ! or ?.

Java Regex to validate String

I have just bought a book on regex to try and get my head around it, but I'm still really struggling with it. I am trying to create a Java regex that validates a string configuration with the following rules:
Can contain lowercase letters ([a-z])
Can contain commas (,) but only between words
Can contain colon (:) but must be separated by words or multiply (*)
Can contain hyphens (-) but must be separated by words
Can contain multiply (*) but if used it must be the only character before/between/after the colon
Cannot contain spaces; 'words' are delimited by hyphens (-), commas (,), colons (:) or the end of the string
So for example the following would be true:
foo:bar
foo-bar:foo
foo,bar:foo
foo-bar,foo:bar,foo-bar
foo:bar:foo,bar
*:foo
foo:*
*:*:*
But the following would be false:
foo :bar
,foo:bar
foo-:bar
-foo:bar
foo,:bar-
foo:bar,
foo,*:bar
foo-*:bar
This is what I have so far:
^[a-z-]|*[:?][a-z-]|*[:?][a-z-]|*
Here is a regex that will work for all your cases:
([a-z]+([,-][a-z]+)*|\*)(:([a-z]+([,-][a-z]+)*|\*))*
Here is a detailed analysis:
One of the basic structures used to build complicated regular expressions like this is actually pretty simple, and has the form text(separator text)*. A regex of that form will match:
one text
one text, a separator, and another text
one text, a separator, another text, another separator, and yet another text
or more, just add another separator and a text to the end.
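The text(separator text)* building block can be sketched as a tiny Java helper. SeparatedList and separated are hypothetical names, and both arguments are assumed to be valid regex fragments:

```java
import java.util.regex.Pattern;

class SeparatedList {
    // Builds text(separator text)* from two regex fragments,
    // wrapping each fragment in a non-capturing group.
    static Pattern separated(String text, String separator) {
        return Pattern.compile(
            "(?:" + text + ")(?:(?:" + separator + ")(?:" + text + "))*");
    }

    public static void main(String[] args) {
        Pattern words = separated("[a-z]+", "[,-]");
        System.out.println(words.matcher("foo-bar,baz").matches()); // true
        System.out.println(words.matcher("foo,").matches());        // false
    }
}
```

A trailing or leading separator is rejected because every separator must be followed by another text.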
So here is a breakdown of the code:
[a-z]+([,-][a-z]+)* is an instance of the pattern I discussed above: the text here is [a-z]+, and the separator is [,-].
([a-z]+([,-][a-z]+)*|\*) allows an asterisk to be matched instead.
([a-z]+([,-][a-z]+)*|\*)(:([a-z]+([,-][a-z]+)*|\*))* is another instance of the pattern I discussed above: the text is ([a-z]+([,-][a-z]+)*|\*), and the separator is :.
If you plan to use this as a component of an even larger regex, in which the group matches will be important, I would recommend making the internal parens non-grouping, and place grouping parens around the entire regex, like so:
((?:[a-z]+(?:[,-][a-z]+)*|\*)(?::(?:[a-z]+(?:[,-][a-z]+)*|\*))*)
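A quick way to check the regex from the breakdown against the question's examples is a small Java harness like the one below. ConfigValidator and isValid are invented names; the pattern is the one spelled out in the breakdown, with the whole text-or-asterisk alternative grouped after the colon:

```java
import java.util.regex.Pattern;

class ConfigValidator {
    // text(separator text)* nested twice: words joined by , or -,
    // then word groups (or a lone *) joined by :
    private static final Pattern CONFIG = Pattern.compile(
        "([a-z]+([,-][a-z]+)*|\\*)(:([a-z]+([,-][a-z]+)*|\\*))*");

    static boolean isValid(String s) {
        // matches() requires the whole string to match, so no ^ or $ is needed.
        return CONFIG.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"foo:bar", "*:*:*", "foo-:bar", "foo,*:bar"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Using matches() rather than find() matters here: find() would happily accept a valid fragment inside an invalid string.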
We rarely see somebody here who can define positive and negative test cases. That makes life much easier.
Here's my regex with a 95% solution:
"(([a-z]+|\\*)[:,-])*([a-z]+|\\*)" (JAVA-Version)
(([a-z]+|\*)[:,-])*([a-z]+|\*) (plain regex)
It simply differentiates between words (a-z or *) and separators (one of :-,). The string must contain at least one word, and words must be separated by a separator. It works for the positive cases and for the negative cases, except the last two negative ones.
One remark: such a complex "syntax" would in real life be implemented with a grammar definition tool like ANTLR (or, a few years ago, with lex/yacc or flex/bison). Regex can do that, but it will not be easy to maintain.
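To make the "95%" concrete, here is a small Java sketch (LooseValidator and isValid are made-up names) that runs this simpler regex. It treats * like any other word, which is why the last two negative examples from the question slip through:

```java
import java.util.regex.Pattern;

class LooseValidator {
    // Words (a-z run or *) separated by exactly one of : , -
    private static final Pattern LOOSE =
        Pattern.compile("(([a-z]+|\\*)[:,-])*([a-z]+|\\*)");

    static boolean isValid(String s) {
        return LOOSE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("foo:bar"));   // true
        System.out.println(isValid("foo-:bar"));  // false
        System.out.println(isValid("foo,*:bar")); // true, although the question wants false
    }
}
```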

Need regular expression for this pattern

I need a regular expression for below pattern
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My problem is that it can have a space between numbers, like /3232 323/, but not between slashes, like / /.
How can I validate it?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
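A minimal Java sketch that runs this regex over the strings from the question. SlashNumberValidator and isValid are illustrative names only:

```java
import java.util.regex.Pattern;

class SlashNumberValidator {
    // Optional leading slashes, then one or more runs of
    // "digit, optionally followed by single-space-separated digits, then any slashes".
    private static final Pattern VALID = Pattern.compile("^/*(?:\\d(?: \\d)*/*)+$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3232////33 43/323//")); // true
        System.out.println(isValid("/ /34343///// /////")); // false
        System.out.println(isValid("///////////"));         // false
    }
}
```

The space is only allowed directly between two digits, which is what rules out "/ /", and the mandatory digit in each repetition rules out a string of slashes alone.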
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criteria.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind that it will also match
/3232///// from /sas/3232/////dsds/
because part of the invalid string is valid on its own.
If you are reading line by line, anchor the pattern with ^ and $; if you are reading an entire block of text, search for \r\n around the regex above to match each line.

Java Regular Expressions find matches within x characters

I've been pulling my hair out over this, and I know it's a simple solution that just seems to escape me at the moment.
I am attempting to perform a match using a Regex code (client side, character classes only) that will match "looking for" within 20 spaces (any character) of "male".
I don't care what the characters or spaces are, it must not find a match if the two words/phrases are more than 20 characters apart.
I have the code set up to match the phrases; I just need to know how to set the parameter of a distance search: "Only match Looking for with Male if they are within zero to twenty characters of each other."
(?i).*looking for.{0,20}male.*
The (?i) flag is just "ignore case".
EDIT:
with the suggestions:
Pattern.compile("(?is).*\\blooking for\\b.{0,20}\\bman\\b.*");
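A short Java sketch of the distance check. ProximityMatcher and withinDistance are hypothetical names, and the second phrase is assumed to be "male" as in the question:

```java
import java.util.regex.Pattern;

class ProximityMatcher {
    // (?i) ignores case, (?s) lets '.' also match line breaks;
    // .{0,20} allows at most twenty characters between the two phrases.
    private static final Pattern NEAR = Pattern.compile(
        "(?is).*\\blooking for\\b.{0,20}\\bmale\\b.*");

    static boolean withinDistance(String text) {
        return NEAR.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(withinDistance("Looking for a tall male"));  // true
        System.out.println(withinDistance("looking for a female"));     // false
    }
}
```

The word boundaries are what keep "female" from counting as "male"; the plain version without \b would accept it.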
Maybe you shouldn't pull your hair out but instead start with the root of the issue? I mean, can't you structure your code/application more logically, so you wouldn't need such a weird string search with even weirder distance matching?

Regex that excludes matches within quotes

I'm working on a pretty big refactoring project and I'm using IntelliJ's find/replace with regexp to help me out.
This is the regexp I'm using:
\b(?<!\.)Units(?![_\w(.])\b
I find that most matches that are not useful for my purpose are the matches that occur with strings within quotes, for example: "units"
I'd like to find a way to have the above expression not match when it finds a matching string that's between quotes...
Thx in advance, this place rocks!
Assuming the quotes are always paired on a given line, you could create matches before and after for an even number of quotes, and make sure the whole line is matched:
^([^"]*("[^"]*")*[^"]*)*\b(?<!\.)Units(?![_\w(.])\b([^"]*("[^"]*")*[^"]*)*$
this works because the fragment
([^"]*("[^"]*")*[^"]*)*
will only match paired quotes. By adding the begin and end line anchors, it forces the quotes on the left and right side of your regex to be an even count.
This won't handle embedded escaped quotes properly, and multiline quoted strings will be trouble.
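The even-number-of-quotes idea can also be tried out in plain Java. The sketch below is a simplified variant, not the exact regex above: it keeps the question's lookarounds around Units and adds a lookahead that only succeeds when the rest of the line holds an even number of quotes. QuoteAwareSearch and countOutsideQuotes are invented names, and escaped quotes are ignored, as in the caveat above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSearch {
    // After "Units", the rest of the line must consist of zero or more
    // complete "..." pairs plus quote-free text, i.e. an even quote count.
    private static final Pattern OUTSIDE_QUOTES = Pattern.compile(
        "(?m)(?<![\\w.])Units(?![\\w(.])"
        + "(?=(?:[^\"\\r\\n]*\"[^\"\\r\\n]*\")*[^\"\\r\\n]*$)");

    // Counts occurrences of Units that are not inside a quoted string.
    static int countOutsideQuotes(String source) {
        Matcher m = OUTSIDE_QUOTES.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "int x = Units + 1;\n"
            + "String s = \"Units\";\n"
            + "foo(Units, \"a\");\n"
            + "String t = \"a Units b\";";
        System.out.println(countOutsideQuotes(source)); // 2
    }
}
```

Checking only the text to the right of the match is enough when quotes are paired on each line: an odd count on the right means the match sits inside a string.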
IntelliJ uses Java regexes, doesn't it? Try this:
(?m)(?<![\w.])Units(?![\w(.])(?=(?:[^\r\n"\\]++|\\.)*+[^\r\n"\\]*+$)
The first part is your regex after a little cosmetic surgery:
(?<![\w.])Units(?![\w(.])
The \b at the beginning and end were effectively the same as a negative lookbehind and a negative lookahead (respectively) for \w, so I folded them into your existing lookarounds. The new lookahead matches the rest of the line if it contains an even number (including zero) of unescaped quotation marks:
(?=(?:[^\r\n"\\]++|\\.)*+[^\r\n"\\]*+$)
That handles pathological cases like the one Welbog pointed out, and unlike Michael's regex it will find multiple occurrences of the text on the same line. But it doesn't take comments into account. Is IntelliJ's find/replace feature intelligent enough to disregard text in comments? Come to think of it, doesn't it have some kind of refactoring support built in?
