How to match regex pattern on single line only? - java

I have the following regex and sample input:
http://regex101.com/r/xK9dE3
As you can see, it matches starting at the first "yo". I only want the pattern to match when "yo" is on the same line as "cut me" (the second "yo").
How can I make sure that the regex match is only on the same line?
Output:
Hi
Expected Output (this is what I really want):
Hi
yo keep this here
Keep this here

You can use this regex with s (DOTALL) regex flag:
^.*?(?=yo\b[^\n]*cut me:)
Online Demo: http://regex101.com/r/oV3eP7
yo\b[^\n]*cut me: is lookahead pattern that makes sure that yo with word boundary and cut me: are matched in the same line.
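A minimal Java sketch of this lookahead approach. The sample input is made up (the original sits behind the regex101 link), and the class and method names are hypothetical. Pattern.DOTALL stands in for the s flag:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SameLineLookahead {
    // DOTALL lets ".*?" cross line breaks, while "[^\n]*" inside the
    // lookahead forces "yo" and "cut me:" to sit on the same line.
    private static final Pattern KEEP =
            Pattern.compile("^.*?(?=yo\\b[^\\n]*cut me:)", Pattern.DOTALL);

    // Returns the text before the first "yo ... cut me:" line,
    // or the input unchanged when no such line exists.
    static String keepBefore(String input) {
        Matcher m = KEEP.matcher(input);
        return m.find() ? m.group() : input;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the one from the question.
        String input = "Hi\nyo keep this here\nKeep this here\nyo cut me: a\nb";
        System.out.print(keepBefore(input));
    }
}
```

The match ends right before the "yo" that shares a line with "cut me:", so the kept text still includes the line break in front of it.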

Remove the s or DOTALL flag and change your regex to the following:
^.*?((\byo\b.*?(cut me:)[\s\S]*))
With the DOTALL flag enabled . will match newline characters, so your match can span multiple lines including lines before yo or between yo and cut me. By removing this flag you can ensure that you only match the line with both yo and cut me, and then change the .* at the end to [\s\S]* which will match any character including newlines so that you can match to the end of the string.
http://regex101.com/r/sX2kL0
Edit: Note that this takes a slightly different approach than the other answer: it matches the portion of the string that you want deleted, so you can replace that portion with an empty string to remove it.

Related

Regex Match substring if said substring contains another word

Sample String
Apple, Pinapple, NONE\nOrange, Pears, Apples\nMango, None, Banana\nLemon, NONEDLE, Grape
Regex I have tried
(?<=\\n)(.*?)(?=\\n) - This matches each of the substrings, but I could not figure out how to only match the ones with NONE in them
Desired result
I have tried to build a regex that will match each of the lines in the sample string (a line being between one \n to another).
However, I would like it to only match if that line contains the word NONE as a whole word. I have tried to reverse engineer the result from Regex Match text within a Capture Group but wasn't able to get far.
I'm writing a java method that should remove parts of the string that match the regex.
Any help would be appreciated!!!
Try this.
String input = "Apple, Pinapple, NONE\nOrange, Pears, Apples\nMango, None, Banana\nLemon, NONEDLE, Grape";
input.lines()
.filter(s -> s.matches(".*\\bNONE\\b.*"))
.forEach(System.out::println);
output
Apple, Pinapple, NONE
Use word boundaries either side of NONE:
"(?m)^.*\\bNONE\\b.*$"
With the multiline flag on, ^ and $ match the start and end of each line, and since the dot doesn't match newlines, this will match whole lines with NONE in them.
Use this regex with a Matcher; each call to find() will give you the lines you want.
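As a rough sketch of the Matcher loop described above (the class and method names are hypothetical; the input is the sample string from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordLines {
    // (?m) makes ^ and $ match at line boundaries; \b keeps NONE a whole word.
    private static final Pattern NONE_LINE =
            Pattern.compile("(?m)^.*\\bNONE\\b.*$");

    // Collects every line that contains NONE as a whole word.
    static List<String> linesWithNone(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = NONE_LINE.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Apple, Pinapple, NONE\nOrange, Pears, Apples\n"
                + "Mango, None, Banana\nLemon, NONEDLE, Grape";
        linesWithNone(input).forEach(System.out::println);
    }
}
```

The match is case-sensitive, so the "None" line is skipped, and "NONEDLE" fails the trailing word boundary.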

How to match and exclude "!x" with regex?

I have been trying to come up with a regex for Java to match a bot command:
!x play search words here
where the x can be any alphanumeric character and it works with:
"(?:\\w)(\\w+)"
However if I want to use alias "p" for "play", the regex will skip the "p" also. I've been also trying to get the skip match to work with exclamation mark without success.
One workaround I found was to use:
"[^\\!\\w]+(\\w+)"
but then the first match is " p" with whitespace. I just can't figure this out!
To avoid matching words preceded with !, you may use
"\\b(?<!!)\\w+"
See the regex demo
Details:
\b - word boundary
(?<!!) - a negative lookbehind making sure there cannot be ! right before the current position
\w+ - 1 or more word chars.
Note that lookbehinds are zero-width assertions, they just signal the regex engine whether to go on matching or stop (the text matched does not get added to the current matched text).
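A small sketch of how this pattern could be applied to the bot command from the question. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandWords {
    // A word boundary followed by a check that the previous char is not '!'.
    private static final Pattern WORD = Pattern.compile("\\b(?<!!)\\w+");

    // Returns every word in the command except the one directly after '!'.
    static List<String> wordsAfterPrefix(String command) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(command);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsAfterPrefix("!x play search words here"));
        System.out.println(wordsAfterPrefix("!x p search words here"));
    }
}
```

The first element of the returned list is the command name ("play" or its alias "p"), and the rest are the search words.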

Java Regex ignoring newline character WITHOUT Dotall

I have to parse returned emails for a specific object id. The problem is that, when the email is returned, the id may be split into several lines. Usually it should look like this:
foo#bar-20130101-103000#12345
where I'm interested in the last part, "12345". The problem is that the string tends to be split by a newline, for example:
foo#bar-20130101-103000#12
345
which causes my regex
[a-zA-Z0-9äöüÄÖÜß]{1,5}#[a-zA-Z0-9äöüÄÖÜß]{1,5}-\d{8}-\d{6}#(\d+)
to only find "12" instead of "12345". All the hints I find on the net say to use Pattern.MULTILINE and/or Pattern.DOTALL, but MULTILINE only influences the ^ and $ anchors, and DOTALL only makes . match newline characters too. The problem is that I don't have a . here, and it's not really applicable either, because I only want digits.
So how can I make my regex match the whole thing and not stop at the line break?
[\d\r\n] will match a digit or a new line, so try with ([\d\r\n]+).
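A minimal sketch of that idea, assuming the line breaks should simply be stripped from the captured group afterwards. The class and method names are hypothetical, and the umlauts and sharp s of the original character class are written as Unicode escapes so the source stays ASCII:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitIdParser {
    // Same character class as in the question, 1 to 5 characters long.
    private static final String PART =
            "[a-zA-Z0-9\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df]{1,5}";

    // The last group accepts digits and line-break characters.
    private static final Pattern ID = Pattern.compile(
            PART + "#" + PART + "-\\d{8}-\\d{6}#([\\d\\r\\n]+)");

    // Returns the numeric id with line breaks removed, or null if not found.
    static String extractId(String mail) {
        Matcher m = ID.matcher(mail);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        System.out.println(extractId("foo#bar-20130101-103000#12\n345"));
    }
}
```

One caveat of this sketch: if the id is complete and the next line happens to start with digits, those digits are absorbed into the result as well.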
Since your number is at the end, you can try:
"(?s)^[a-zA-Z0-9äöüÄÖÜß]{1,5}#[a-zA-Z0-9äöüÄÖÜß]{1,5}-\\d{8}-\\d{6}#(.*)$"
i.e. capture everything after the last # with DOTALL.
The following should also work without DOTALL:
"^[a-zA-Z0-9äöüÄÖÜß]{1,5}#[a-zA-Z0-9äöüÄÖÜß]{1,5}-\\d{8}-\\d{6}#([\\d\\r\\n]+)$"

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the regex only starts matching after finding one.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks (according to this regex tester) like just doing /()_+$/ would work: the empty parentheses match anything before a single or repeating match at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, then you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
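Both options from this answer could be sketched in Java roughly as follows. The class and method names are hypothetical, and the per-line variant uses Pattern.MULTILINE as described above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreTrim {
    private static final Pattern EDGES = Pattern.compile("^_+|_+$");
    private static final Pattern EDGES_PER_LINE =
            Pattern.compile("^_+|_+$", Pattern.MULTILINE);
    private static final Pattern CORE = Pattern.compile("^_*(.*?)_*$");

    // Option 1: delete leading and trailing underscores of a single line.
    static String trimByReplace(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    // Option 2: keep only what the first capture group matched.
    static String trimByGroup(String s) {
        Matcher m = CORE.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    // Option 1 applied to every line of a multi-line string.
    static String trimEachLine(String text) {
        return EDGES_PER_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trimByReplace("__this_is_my___example_line_4__"));
        System.out.println(trimByGroup("_this_is_my_ _example_line_3_"));
        System.out.println(trimEachLine("line_1_\n__line_2__"));
    }
}
```

Underscores in the middle of the string are left alone in all three variants, because the patterns are anchored to the start or end of the string (or line).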
How about [^_\n\r](.*[^_\n\r])??
Demo
String data=
"this_is_my_example_line_0\n" +
"this_is_my_example_line_1_\n" +
"this_is_my_example_line_2___\n" +
"_this_is_my_ _example_line_3_\n" +
"__this_is_my___example_line_4__";
Pattern p=Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m=p.matcher(data);
while(m.find()){
System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

Using java.util.regex.Pattern

I'm not a programmer, so I'm a newbie in this field. I need to create a regular expression to check two lines. Between these two lines, A and B, there could be one, two or more different lines.
I've been reviewing http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html but I haven't reached the solution, although I think I'm very close.
I am testing the expression
^(.*$)
and this gets an entire line. If I write this expression twice, it gets two lines. So it seems that the expression gets as many entire lines as there are occurrences of it.
But I would like to check an undetermined number of lines between A and B. I know that there will be at least one line.
If I write ^(.*$){1,} it doesn't work.
Does anyone know what the mistake could be?
Thank you for your time
Andres
DOT . in regex matches any character except newline character.
You're looking for DOTALL or s flag here that makes dot match any character including newline character as well. So if you want to match all the lines between literals A and B then use this regex:
(?s)A.*?B
(?s) is for DOTALL that will make .*? match all the characters including newline characters between A and B.
? is to make above regex non-greedy.
Read More: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
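To illustrate the effect of the flag, here is a minimal sketch that runs the same A.*?B pattern with and without DOTALL. The class name, method name and sample text are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenMarkers {
    // Finds the first stretch of text from literal A to the nearest literal B.
    // Pass Pattern.DOTALL as flags to let the dot cross line breaks, or 0 for none.
    static String firstBlock(String text, int flags) {
        Matcher m = Pattern.compile("A.*?B", flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: two lines between the A line and the B line.
        String text = "A\nfirst\nsecond\nB\nend";
        System.out.println(firstBlock(text, Pattern.DOTALL)); // spans four lines
        System.out.println(firstBlock(text, 0));              // no match: null
    }
}
```

Passing Pattern.DOTALL to Pattern.compile is equivalent to putting (?s) at the start of the expression.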
Why don't you use Scanner ? It might be more related to what you want:
Scanner sc = new ...
while (sc.nextLine().compareTo(strB)!=0) {
whatYouWantToDo
}
You could try to search for line terminators \r and \n. Depending on the source of the file you maybe have to experiment a bit.
As far as I understood it, you want to match the lines, with at least one empty line in between? Try ^(.*)$\n{2,}^(.*)$
If you want to find two equal lines, using regex:
Pattern pattern = Pattern.compile("^(?:.*\n)*(.*\n)(?:.*\n)*\\1");
// Skip some lines, find a line, skip some lines, find the first group `(...)`
Matcher m = pattern.matcher(text);
while (m.find()) {
System.out.println("Double: " + m.group(1));
}
The (?: ...) is a non-capturing group; that is, not available through m.group(#).
However this won't find line B in: "A\nB\nA\nB\n".
