I would like to extract every sentence that contains the word "flung" from a text.
For example, from the following text I'd like to extract the sentence "It was exactly as if a hand had clutched them in the centre and flung them aside." using a regular expression.
I tried .*? flung (?<sub>.*?)\., but the match starts at the beginning of the line rather than at the beginning of the sentence.
How can I solve this problem?
As she did so, a most extraordinary thing happened. The bed-clothes gathered themselves together, leapt up suddenly into a sort of peak, and then jumped headlong over the bottom rail. It was exactly as if a hand had clutched them in the centre and flung them aside. Immediately after, .........
Here you go,
[^.]* flung [^.]*\.
OR
[^.?!]*(?<=[.?\s!])flung(?=[\s.?!])[^.?!]*[.?!]
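A minimal Java sketch of the first pattern above, applied to a shortened copy of the sample text. The class and method names are made up for illustration, and the only change to the pattern is the doubled backslash needed in a Java string literal. Because the match begins right after the previous full stop, it carries a leading space, which the sketch trims.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlungSentences {
    // First pattern from the answer; "\\." is a literal dot in a Java string.
    private static final Pattern SENTENCE = Pattern.compile("[^.]* flung [^.]*\\.");

    // Collects every dot-terminated sentence that contains " flung ".
    static List<String> extract(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            // The match starts just after the previous dot, so trim the leading space.
            sentences.add(m.group().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "As she did so, a most extraordinary thing happened. "
                + "The bed-clothes gathered themselves together, leapt up suddenly "
                + "into a sort of peak, and then jumped headlong over the bottom rail. "
                + "It was exactly as if a hand had clutched them in the centre "
                + "and flung them aside.";
        extract(text).forEach(System.out::println);
    }
}
```

This only treats "." as a sentence end; the second pattern above also accepts "?" and "!".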
Simply match anything between dots:
without a dot
[A-Za-z," ]+word[A-Za-z," ]+
with a dot
[A-Za-z," ]+word[A-Za-z," ]+\.
"[A-Z]\\s?\\w*\\s?(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*(?:your target\\s)(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*(([^(\\.\\s)|(\\?\\s)|(!\\s)])|\\s)*[\\.|\\?|!]"
A sentence starts with a capital letter; in the middle it may contain a decimal number or an abbreviation.
(?<=^|\s)[A-Z][^!?.]*( word\s*)[^!?.]*(?=\.|\!|\?)
Before the first capital letter there is either the start of the line or a whitespace character. The sentence may then contain zero or more characters other than those in the set [!?.], followed by your target word with or without whitespace after it (in case it is at the end of the sentence), followed again by zero or more characters other than those in [!?.]. It finally ends with ., ! or ?.
I'm looking for a regex that will match a period character, ONLY if none of that period's surrounding characters are also periods.
Fine by me... leave! FAIL
Okay.. You win. SUCCEED
Okay. SUCCEED //Note here, the period is the last char in the string.
I was thinking of something like:
[^\\.*]\\.
But that is just wrong and probably not at all in the right direction. I hope this question helps others in the same situation as well.
Thanks.
You need to wrap the dot in negative look arounds:
(?<![.])[.](?![.])
I prefer [.] over \\., because:
It's easier to read - there are too many backslashes in Java literals already
[.] looks a bit like an X wing fighter from Star Wars ™
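A short Java sketch of the lookaround pattern, checked against the three sample strings from the question. The class and method names are hypothetical; find() is used because the period may sit anywhere in the string.

```java
import java.util.regex.Pattern;

class LonePeriod {
    // A dot that is neither preceded nor followed by another dot.
    private static final Pattern LONE_DOT = Pattern.compile("(?<![.])[.](?![.])");

    // True if the string contains at least one isolated period.
    static boolean hasLonePeriod(String s) {
        return LONE_DOT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLonePeriod("Fine by me... leave!")); // false
        System.out.println(hasLonePeriod("Okay.. You win."));      // true
        System.out.println(hasLonePeriod("Okay."));                // true
    }
}
```

Since lookarounds are zero-width, the match itself is just the single period, which matters if you go on to replace or split on it.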
You can use negative look ahead and look behind or this alternative regex:
String regex = "(^\\.[^\\.]|[^\\.]\\.[^\\.]|[^\\.]\\.$)";
The first alternative checks the beginning ^ of the string (if it can start with a dot), the second looks for any dot inside and the third looks for a dot at the end of the string $.
That regex will still match any period that isn't preceded by another period.
[^\.]\.[^\.] Takes care of both sides of the target period.
EDIT: Java doesn't have a raw string like Python, so you would need full escapes: [^.]\\.[^.]|^\\.[^.]|[^.]\\.$
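For comparison, here is a sketch of the fully escaped alternation in Java, with made-up class and method names. Unlike a lookaround, each alternative consumes a neighbouring character, so the match is two or three characters long and a string that consists of a single "." is not matched at all.

```java
import java.util.regex.Pattern;

class LonePeriodAlternation {
    // Dot with non-dot neighbours, dot at the start, or dot at the end.
    private static final Pattern LONE_DOT =
            Pattern.compile("[^.]\\.[^.]|^\\.[^.]|[^.]\\.$");

    // True if some period has no period directly next to it
    // (and at least one neighbouring character exists).
    static boolean hasLonePeriod(String s) {
        return LONE_DOT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLonePeriod("Fine by me... leave!")); // false
        System.out.println(hasLonePeriod("Okay.. You win."));      // true
        System.out.println(hasLonePeriod("Okay."));                // true
    }
}
```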
I am trying to match the bolded portion of the below String, which would represent a city.
1795 New Test Dr Test TEst Wildwood, MI 48769-1100
There are two spaces between Dr and Test, the starting portion should happen after those double spaces, and end before the comma.
I feel like I am very close to having this correct but can't quite get it 100%, as it is including the white space characters before Test.
(?=\s{2})[\w+\s]*[^,]
The above is what I have so far. The many other alternatives I tried did not work either: they still include the whitespace characters that I do not want at the beginning.
I feel like I am missing something simple, but even after looking in many places I cannot seem to find the regex that would match this pattern.
Also I know this can be easily accomplished with split and substrings, but the requirement is a regex unfortunately, as this is for a database driven automation application and the format should be able to change on the fly without requiring a deploy due to code changes.
You need a look-behind for the spaces rather than a look-ahead, as you want the match to start immediately after them. From that point on, you can simply do a greedy match for anything that is not a comma:
(?<=\s{2})[^,]*
The * is greedy and will consume as many characters as it can, ending the match immediately before the comma.
Note that \s also matches whitespace other than the space character, which may or may not be what you want.
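A small Java sketch of the look-behind approach, with hypothetical class and method names. The sample address is assembled from pieces so that the two spaces after "Dr" stay visible; that layout is an assumption taken from the question's description.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityLookbehind {
    // Start right after two whitespace characters, then take everything up to a comma.
    private static final Pattern CITY = Pattern.compile("(?<=\\s{2})[^,]*");

    // Returns the text between the double space and the comma, if present.
    static Optional<String> city(String address) {
        Matcher m = CITY.matcher(address);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // " " + " " makes the double space explicit.
        String address = "1795 New Test Dr" + " " + " " + "Test TEst Wildwood, MI 48769-1100";
        System.out.println(city(address).orElse("(no match)")); // Test TEst Wildwood
    }
}
```

If the input has only single spaces, no position satisfies the look-behind and the result is empty.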
How about ^.*?  ([^,]*).*$? That's a non-greedy match at the beginning of the line (^.*?), followed by two literal spaces, then a group capturing everything that isn't a comma, then a match for everything else up to the end of the line.
Be aware, though, that when I copy and paste your example text, it does not contain two spaces. This might be causing you problems, or it's just a transcription issue and your original has the two spaces.
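The capturing-group variant might look like this in Java. This is a sketch under two assumptions: the pattern is meant to contain two literal spaces (written here as " {2}" so they cannot be lost in copying), and the class and method names are invented for the example. The city is read from group 1 rather than from the whole match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityCapture {
    // Lazy prefix, exactly two spaces, then capture everything up to the first comma.
    private static final Pattern LINE = Pattern.compile("^.*? {2}([^,]*).*$");

    // Returns capture group 1 when the whole line fits the pattern.
    static Optional<String> city(String address) {
        Matcher m = LINE.matcher(address);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String address = "1795 New Test Dr" + " " + " " + "Test TEst Wildwood, MI 48769-1100";
        System.out.println(city(address).orElse("(no match)")); // Test TEst Wildwood
    }
}
```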
I have created a regex for phone numbers as follows: \\d+ ?\\w{0,9} ?\\d+ . My problem is that it only accepts digits. Sometimes I receive the phone number with some digits masked by stars,
so it can be 011***1334. How can I incorporate the stars into the above regex?
I'm having trouble getting your regex to work in the first place, without stars,
but anyway... you can represent the star by escaping it.
\*
So you should just turn all of your \d into [\d\*], or [\\d\\*] if you have to escape the \ first in your Java string.
Some regular expression engines don't require you to escape special characters inside [], so watch for that behavior if it doesn't work at first.
Anywhere you use \\d, turn it into [\\d\\*]
[\\d\\*]+ ?\\w{0,9} ?[\\d\\*]+
I agree with Sam I am though; the original seems odd. If you just need numbers and asterisks, this should do (escaped Java-style):
[\\d\\*]{7,10}
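Both suggested patterns can be tried in Java with String.matches, which requires the whole input to match. The class and method names below are illustrative only; the patterns are the two given above.

```java
class StarredPhone {
    // Original pattern with every \d widened to "digit or star".
    static boolean matchesLoose(String s) {
        return s.matches("[\\d\\*]+ ?\\w{0,9} ?[\\d\\*]+");
    }

    // Simpler variant: 7 to 10 characters, each a digit or a star.
    static boolean matchesStrict(String s) {
        return s.matches("[\\d\\*]{7,10}");
    }

    public static void main(String[] args) {
        System.out.println(matchesLoose("011***1334"));  // true
        System.out.println(matchesStrict("011***1334")); // true
        System.out.println(matchesStrict("12345"));      // false: shorter than 7
    }
}
```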
I need a regular expression for below pattern
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My problem is that spaces are allowed between numbers, as in /3232 323/, but not between slashes, as in / /.
How can I validate this?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
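Here is a sketch of that pattern as a Java validator, run against the valid and invalid strings from the question. The class and method names are hypothetical. Every repetition of the group must start with a digit, so a space can only appear between two digits.

```java
import java.util.regex.Pattern;

class SlashNumberValidator {
    // Optional leading slashes, then one or more groups of
    // "digit, optional (space digit) pairs, optional slashes".
    private static final Pattern VALID = Pattern.compile("^/*(?:\\d(?: \\d)*/*)+$");

    // True if the whole string fits the pattern.
    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "3232////33 43/323//", "3232////3343/323//", "/3232////343/323//",
            "/sas/3232/////dsds/", "/ /34343///// /////", "///////////"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first three samples are accepted and the last three are rejected.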
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criterion.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind that it will also match this:
/3232///// from /sas/3232/////dsds/
This is because part of the invalid string is valid on its own.
If you are reading line by line, anchor the pattern with ^ and $; if you are reading an entire block of text, match \r\n around the regex above so that each line is checked separately.
I've been pulling my hair out over this, and I know it's a simple solution that just seems to escape me at the moment.
I am attempting to perform a match using a Regex code (client side, character classes only) that will match "looking for" within 20 spaces (any character) of "male".
I don't care what the characters or spaces are, but it must not match if the two words/phrases are more than 20 characters apart.
I have the code set up to match the phrases; I just need to know how to set the parameter of a distance search: "Only match Looking for with Male if they are within zero to twenty characters of each other."
(?i).*looking for.{0,20}male.*
The (?i) flag is just "ignore case".
EDIT:
with the suggestions:
Pattern.compile("(?is).*\\blooking for\\b.{0,20}\\bman\\b.*");
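A minimal sketch of the distance check in Java, using the first pattern above with the question's target word "male". The class and method names are made up; matches() is used because the pattern already has .* on both ends.

```java
import java.util.regex.Pattern;

class ProximityMatch {
    // "looking for", then at most 20 arbitrary characters, then "male"; case-insensitive.
    private static final Pattern NEAR = Pattern.compile("(?i).*looking for.{0,20}male.*");

    // True if "male" starts within 0 to 20 characters after "looking for".
    static boolean isNear(String s) {
        return NEAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNear("Looking for a tall male")); // true
        System.out.println(isNear(
                "Looking for someone who is kind, funny and definitely male")); // false
    }
}
```

Without the \b word boundaries from the edited version, "male" is matched as a plain substring, so it is also found inside longer words.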
Maybe you shouldn't pull your hair out but instead start with the root of the issue? I mean, can't you structure your code/application more logically, so that you wouldn't need such a weird string search with even weirder distance matching?