Regex Match substring if said substring contains another word

Sample String
Apple, Pinapple, NONE\nOrange, Pears, Apples\nMango, None, Banana\nLemon, NONEDLE, Grape
Regex I have tried
(?<=\\n)(.*?)(?=\\n) - This matches each of the substrings, but I could not figure out how to only match the ones with NONE in them
Desired result
I have tried to build a regex that will match each of the lines in the sample string (a line being between one \n to another).
However, I would like it to only match if that line contains the word NONE as a whole word. I have tried to reverse engineer the result from Regex Match text within a Capture Group but wasn't able to get far.
I'm writing a Java method that should remove the parts of the string that match the regex.
Any help would be appreciated!!!

Try this.
String input = "Apple, Pinapple, NONE\nOrange, Pears, Apples\nMango, None, Banana\nLemon, NONEDLE, Grape";
input.lines()
    .filter(s -> s.matches(".*\\bNONE\\b.*"))
    .forEach(System.out::println);
output
Apple, Pinapple, NONE

Use word boundaries either side of NONE:
"(?m)^.*\\bNONE\\b.*$"
With the multiline flag on, ^ and $ match the start and end of each line, and since the dot doesn't match newlines, this will match whole lines with NONE in them.
Use this regex with a Matcher; each call to find() will give you the lines you want.
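Since the goal in the question is to remove the matching lines, the same pattern can be used both to find and to delete them. The following is only a sketch, not the answerer's code: the class name NoneLines and the methods findLines and removeLines are made up for illustration, and the removal pattern additionally consumes the line break after the line (\R, available since Java 8) so that no empty line is left behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative only.
class NoneLines {
    // (?m) lets ^ and $ anchor at every line. \b rejects NONEDLE, and the
    // match is case-sensitive, so "None" is not matched either.
    private static final Pattern NONE_LINE = Pattern.compile("(?m)^.*\\bNONE\\b.*$");

    // Collects every line that contains NONE as a whole word.
    static List<String> findLines(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = NONE_LINE.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Removes those lines; (?:\R|$) also swallows the line break that follows.
    static String removeLines(String input) {
        return input.replaceAll("(?m)^.*\\bNONE\\b.*(?:\\R|$)", "");
    }

    public static void main(String[] args) {
        String input = "Apple, Pinapple, NONE\nOrange, Pears, Apples\n"
                + "Mango, None, Banana\nLemon, NONEDLE, Grape";
        System.out.println(findLines(input));
        System.out.println(removeLines(input));
    }
}
```

For the sample string, findLines returns only the first line, and removeLines leaves the other three lines untouched.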

Related

Filter out exact words from part of string - java regex

I want to match all lines that start with fn- except ones that use specific words after the hyphen.
match:
fn-bar
fn-foo
fn-foobarb
dont match (foobar and fubar are my exact negative filters):
fn-foobar
fn-fubar
xn-blah
So far I have:
(fn)-(?!(fubar|foobar)$)
which does not match the whole line

You can use the word boundary \b so that only the exact words "foobar" and "fubar" are excluded. To match the whole line, it is enough to put a negative lookahead for the full phrases you don't want in front of the pattern.
(?!fn-foobar\b|fn-fubar\b)fn-.*
Check the demo here.
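As a rough check of the pattern against the lines from the question, here is a small sketch. The class FnFilter and the method keep are hypothetical names, and using matches() (which requires the whole line to match) is an assumption about how the pattern would be applied.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative only.
class FnFilter {
    // The negative lookahead rejects the exact words foobar and fubar;
    // \b keeps longer words such as foobarb from being rejected.
    static final Pattern FN_LINE =
            Pattern.compile("(?!fn-foobar\\b|fn-fubar\\b)fn-.*");

    // Keeps only the lines that match the pattern in full.
    static List<String> keep(List<String> lines) {
        return lines.stream()
                .filter(s -> FN_LINE.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList(
                "fn-bar", "fn-foo", "fn-foobarb", "fn-foobar", "fn-fubar", "xn-blah");
        System.out.println(keep(lines)); // [fn-bar, fn-foo, fn-foobarb]
    }
}
```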

How to match regex pattern on single line only?

I have the following regex and sample input:
http://regex101.com/r/xK9dE3
As you can see, it matches starting from the first "yo". I only want the pattern to match when "yo" is on the same line as "cut me" (the second "yo").
How can I make sure that the regex match is only on the same line?
Output:
Hi
Expected Output (this is what I really want):
Hi
yo keep this here
Keep this here

You can use this regex with s (DOTALL) regex flag:
^.*?(?=yo\b[^\n]*cut me:)
Online Demo: http://regex101.com/r/oV3eP7
yo\b[^\n]*cut me: is lookahead pattern that makes sure that yo with word boundary and cut me: are matched in the same line.
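The sample input is only available behind the regex101 link, so the sketch below uses an assumed input that is consistent with the expected output in the question. SameLineLookahead and keepPrefix are hypothetical names. It shows the lookahead stopping at the "yo" that shares a line with "cut me:" and keeping everything before it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the input used in main is an assumption.
class SameLineLookahead {
    // DOTALL lets .*? cross line breaks, while [^\n]* inside the lookahead
    // forces "yo" and "cut me:" to sit on the same line.
    static final Pattern KEEP =
            Pattern.compile("^.*?(?=yo\\b[^\\n]*cut me:)", Pattern.DOTALL);

    // Returns the text before the matching "yo", or the input if none is found.
    static String keepPrefix(String text) {
        Matcher m = KEEP.matcher(text);
        return m.find() ? m.group() : text;
    }

    public static void main(String[] args) {
        String text = "Hi\nyo keep this here\nKeep this here\n"
                + "yo cut me: remove\nmore to remove";
        System.out.print(keepPrefix(text));
    }
}
```

With this assumed input the first "yo" is skipped, because "cut me:" does not follow it on the same line.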

Remove the s (DOTALL) flag and change your regex to the following:
^.*?((\byo\b.*?(cut me:)[\s\S]*))
With the DOTALL flag enabled, . matches newline characters, so your match can span multiple lines, including lines before yo or between yo and cut me. By removing this flag you ensure that yo and cut me are matched on the same line. The .* at the end is changed to [\s\S]*, which matches any character including newlines, so that the match still extends to the end of the string.
http://regex101.com/r/sX2kL0
Edit: note that this takes a slightly different approach than the other answer: it matches the portion of the string that you want deleted, so you can replace that portion with an empty string to remove it.
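A sketch of this removal approach in Java follows. It rests on several assumptions: the input is made up to be consistent with the expected output in the question, the pattern is written with word boundaries on both sides of yo (\byo\b) and without the capture groups, and the MULTILINE flag (?m) is turned on so that ^ can anchor at the line that contains yo. SameLineRemoval and cut are hypothetical names.

```java
// Hypothetical helper; the input used in main is an assumption.
class SameLineRemoval {
    // Without DOTALL, .*? stays on one line, so "yo" and "cut me:" must
    // share a line. [\s\S]* then runs to the end of the input.
    static String cut(String text) {
        return text.replaceFirst("(?m)^.*?\\byo\\b.*?cut me:[\\s\\S]*", "");
    }

    public static void main(String[] args) {
        String text = "Hi\nyo keep this here\nKeep this here\n"
                + "yo cut me: remove\nmore to remove";
        System.out.print(cut(text));
    }
}
```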

How to extract multi-line text delimited by 2 strings

I have the following text:
Claims(40)
This is good.
This is good, too.
Description
This is description.
The delimiter strings in this case are:
1st delimiter: "Claims(40)"
2nd delimiter: "Description"
I want to extract text between these delimiters while excluding the delimiters.
Also, in the above text, following rules exist:
1st delimiter starts on the 1st column in the text and it's the only word on the line.
In the first delimiter, the opening parenthesis, the digits, and the closing parenthesis may be absent. However, if the opening parenthesis is present, the digits and the closing parenthesis are present too.
2nd delimiter starts on the 1st column in the text and it's the only word on the line.
My regular expression:
String regxStr = "^Claims(\\(\\d+\\)?)$(.*?)^Description$";
This doesn't work.
I tried many other regexes, but none of them worked. Finally I resorted to a brute-force approach with this regex:
String regxStr = "Claims(.*?)Description";
But neither regex works, and I can't figure out what is wrong with them.
I'm using the Matcher class and its find() method for further processing.
Please help me.

This captures the text you want, although I'm not totally clear on your requirements for the (40) part. @lovetostrike's answer addresses that.
\bClaims(?:\(\d+\))?\s+(.+?)\s+Description\b
You must activate the DOTALL flag when compiling the pattern:
Pattern.compile(regxStr, Pattern.DOTALL)
Escaped in a Java string:
"\\bClaims(?:\\(\\d+\\))?\\s+(.+?)\\s+Description\\b"

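Here's a one-line solution (the (?s) flag lets the dot match newlines, and the optional (40) group is non-capturing so that $1 refers to the text between the delimiters):
String target = input.replaceAll("(?s).*Claims(?:\\(\\d+\\))?\\s+(.*?)Description.*", "$1");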

In addition to @aliteralmind's answer: regex isn't a good tool for nested structures such as matching paren pairs. But in your simple case you can use the OR operator, '|', in your pattern. The outer parens group the two alternatives: the first with parens, the second without.
(\\(\\d+\\)|\\d+)
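To tie the answers above together, here is a sketch that applies the DOTALL pattern from the first answer with Matcher.find(), as the question describes. ClaimsExtractor and extract are hypothetical names, and returning null when nothing matches is an arbitrary choice for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class ClaimsExtractor {
    // DOTALL lets (.+?) span several lines; (?:\(\d+\))? makes "(40)" optional.
    static final Pattern CLAIMS = Pattern.compile(
            "\\bClaims(?:\\(\\d+\\))?\\s+(.+?)\\s+Description\\b", Pattern.DOTALL);

    // Returns the text between the two delimiters, or null if they are absent.
    static String extract(String text) {
        Matcher m = CLAIMS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Claims(40)\nThis is good.\nThis is good, too.\n"
                + "Description\nThis is description.";
        System.out.println(extract(text));
    }
}
```

Because the \s+ on both sides of the group absorbs the surrounding line breaks, the extracted text has no leading or trailing newline.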

capture all characters between match character (single or repeated) on string

I'm trying to extract the string between occurrences of a specific character, even when that character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore the any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the regex only starts matching after finding one.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks (according to this regex tester) like just doing this would work: /()_+$/. The empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!

There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but do use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
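The first option can be sketched in Java as follows. UnderscoreTrim, trim and trimEachLine are hypothetical names, and (?m) is the inline form of Pattern.MULTILINE.

```java
// Hypothetical helper; the names are illustrative only.
class UnderscoreTrim {
    // Strips underscores at the very start and very end of the string.
    static String trim(String s) {
        return s.replaceAll("^_+|_+$", "");
    }

    // Same idea per line: (?m) makes ^ and $ anchor at every line.
    static String trimEachLine(String s) {
        return s.replaceAll("(?m)^_+|_+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("__this_is_my___example_line_4__"));
        System.out.println(trimEachLine("this_is_my_example_line_2___\n_this_is_my_ _example_line_3_"));
    }
}
```

Underscores in the middle of the string are left alone because neither alternative can match without touching a line start or end.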

How about [^_\n\r](.*[^_\n\r])??
Demo
String data =
    "this_is_my_example_line_0\n" +
    "this_is_my_example_line_1_\n" +
    "this_is_my_example_line_2___\n" +
    "_this_is_my_ _example_line_3_\n" +
    "__this_is_my___example_line_4__";
Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

java regex tricky pattern

I've been stuck for a while on a regex that does the following:
split my sentences with this: "[\W+]"
but if it finds a word like this: "aaa-aa" (not "aaa - aa" or "aaa--aaa-aa"), the word isn't split; it is kept whole.
Basically, I want to split a sentence into words, while also treating "aaa-aa" as a word. I have successfully done that by creating two separate functions, one for splitting with \w, and another to find words like "aaa-aa". Finally, I add both results and remove the parts of each compound word.
For example, the sentence:
"Hello my-name is Richard"
First i collect {Hello, my, name, is, Richard}
then i collect {my-name}
then i add {my-name} to {Hello, my, name, is, Richard}
then i take out {my} and {name} in here {Hello, my, name, is, Richard}.
result: {Hello, my-name, is, Richard}
This approach does what I need, but for parsing large files it becomes too heavy, because too many copies are needed for each sentence. So my question is: is there anything I can do to include everything in one pattern? Like:
"split the text using this pattern "[\W+]", but if you find a word like "aaa-aa", consider it one word and not two."

If you want to use a split() rather than explicitly matching the words you are interested in, the following should do what you want:
[\s-]{2,}|\s
To break that down, you first split on two or more whitespaces and/or hyphens, so a single '-' won't match and 'one-two' will be left alone, but something like 'one--two', 'one - two' or even 'one - --- - two' will be split into 'one' and 'two'.
That still leaves the 'normal' case of a single whitespace, 'one two', unmatched, so we add an or ('|') followed by a single whitespace (\s).
Note that the order of the alternatives is important: RE subexpressions separated by '|' are evaluated left-to-right, so we need to put the spaces-and-hyphens alternative first. If we did it the other way around, when presented with something like 'one -two' we'd match on the first whitespace and return 'one', '-two'.
If you want to interactively play around with Java REs I can thoroughly recommend http://myregexp.com/signedJar.html which allows you to edit the RE and see it matching against a sample string as you edit the RE.
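A short sketch of that split in Java, using the sentence from the question. HyphenSplit and words are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the names are illustrative only.
class HyphenSplit {
    // Runs of two or more spaces/hyphens are tried first, then a single
    // whitespace, so a lone hyphen inside a word never splits it.
    static List<String> words(String sentence) {
        return Arrays.asList(sentence.split("[\\s-]{2,}|\\s"));
    }

    public static void main(String[] args) {
        System.out.println(words("Hello my-name is Richard")); // [Hello, my-name, is, Richard]
        System.out.println(words("one - two"));                // [one, two]
    }
}
```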

Why not use the pattern \\s+? This does exactly what you want without any tricks: it splits the text into words separated by whitespace.

Your description isn't clear enough, but why not just split it up by spaces?

I am not sure whether this pattern will work, because I don't have developer tools for Java, but you might try it. It uses character class subtraction, which as far as I know is supported only in Java regex:
[\W&&[^-]]+
It means: match characters that are both [\W] and [^-], that is, non-word characters other than the hyphen.
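Since the answer could not try the pattern, here is a small sketch that does, using String.split. IntersectionSplit and words are hypothetical names. One caveat that follows from the pattern itself: a hyphen surrounded by spaces is not a separator here, so it survives as a token of its own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the names are illustrative only.
class IntersectionSplit {
    // [\W&&[^-]] is the intersection of "non-word character" and
    // "anything but a hyphen", so hyphens never act as separators.
    static List<String> words(String sentence) {
        return Arrays.asList(sentence.split("[\\W&&[^-]]+"));
    }

    public static void main(String[] args) {
        System.out.println(words("Hello my-name is Richard")); // [Hello, my-name, is, Richard]
        System.out.println(words("aaa - aa"));                 // [aaa, -, aa]
    }
}
```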

Almost the same regular expression as in your previous question:
String sentence = "Hello my-name is Richard";
Pattern pattern = Pattern.compile("(?<!\\w)\\w+(-\\w+)?(?!\\w)");
Matcher matcher = pattern.matcher(sentence);
while (matcher.find()) {
    System.out.println(matcher.group());
}
I just added the optional group (...)? so that non-hyphenated words are matched too.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
