Negative Look-Ahead assertion for multiline text [duplicate]

This question already has answers here:
How to use java regex to match a line
(2 answers)
Closed 4 years ago.
I'm looking for a way to check whether a multiline string (from a PDF) contains a certain letter combination that must not start with a specific prefix. Specifically, I'm trying to find strings that contain ARC but not NON-ARC.
I found this great example, Regular expression for a string that does not start with a sequence, but it does not seem to work for my problem. With my pattern ^(?!NON\\-)ARC.* I get the expected result in a single-line test, but with real (multiline) input the match fails where it should succeed. Here is what I did:
import java.util.regex.Pattern;
import org.junit.Test;

public class RegexLookAheadTest {
@Test
public void testRegexLookAhead() {
String strTestSimplePos = "ARC 0.1-1";
String strTestSimpleNeg = "NON-ARC 3.4-1";
String strTestRealPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
String strTestRealNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
//based on https://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence
String regexNoNON = "^(?!NON\\-)ARC.*";
Pattern noNONPattern = Pattern.compile(regexNoNON);
System.out.println(noNONPattern.matcher(strTestSimplePos).find()); //true OK
System.out.println(noNONPattern.matcher(strTestSimpleNeg).find()); //false OK
System.out.println(noNONPattern.matcher(strTestRealPos).find()); //false but should be true -> does not work as intended
System.out.println(noNONPattern.matcher(strTestRealNeg).find()); //false OK
}
}
I would be glad if anyone could point out what went wrong.
Edit: This was marked as a duplicate of How to use java regex to match a line. However, I didn't try to use a regex to match a line at all; I just needed a way to find a specific sequence (with a negative look-ahead) in multiline text input. One approach to the other question (compiling the pattern with java.util.regex.Pattern.MULTILINE) also solves this one, but the questions are at best related.

Your input strings have multiple lines and you're using the caret (^), so you need to add the MULTILINE flag:
Pattern.compile(regexNoNON, java.util.regex.Pattern.MULTILINE);
About MULTILINE:
Enables multiline mode.
In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
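A minimal runnable sketch of this fix, reusing the question's pattern and test strings; the class and method names (MultilineArcCheck, containsArcLine) are made up for illustration. It also shows the embedded flag (?m), which switches on the same mode from inside the pattern:

```java
import java.util.regex.Pattern;

public class MultilineArcCheck {
    // The question's pattern; with MULTILINE, ^ also matches right after each line terminator.
    private static final Pattern NO_NON_ARC =
            Pattern.compile("^(?!NON\\-)ARC.*", Pattern.MULTILINE);

    // True if some line of the text starts with ARC (and not with NON-).
    public static boolean containsArcLine(String text) {
        return NO_NON_ARC.matcher(text).find();
    }

    public static void main(String[] args) {
        String realPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
        String realNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
        System.out.println(containsArcLine(realPos)); // true
        System.out.println(containsArcLine(realNeg)); // false

        // Same mode via the embedded flag (?m) instead of the compile flag
        System.out.println(Pattern.compile("(?m)^(?!NON\\-)ARC.*").matcher(realPos).find()); // true
    }
}
```

Because ^ anchors ARC to the start of a line, a line starting with NON-ARC cannot match ARC at that position anyway, so the (?!NON\\-) part mostly documents intent. If ARC may also occur in the middle of a line, a lookbehind such as (?<!NON-)ARC is probably the closer fit.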

Try this Regex:
HEADLINE(?:(?!HEADLINE)[\s\S])*(?<!NON-)ARC(?:(?!HEADLINE)[\s\S])*
Explanation:
HEADLINE - matches the word HEADLINE
(?:(?!HEADLINE)[\s\S])* - matches 0+ occurrences of any character (including line breaks), as long as the word HEADLINE does not start at that position, so the match cannot run into the next HEADLINE block
(?<!NON-)ARC - matches the word ARC if it is not immediately preceded by NON-
(?:(?!HEADLINE)[\s\S])* - matches the rest of the block in the same way: 0+ characters, stopping before the next HEADLINE
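A possible way to use this regex from Java, sketched with the question's test strings; the class and method names (HeadlineArcCheck, hasArcBlock) are hypothetical. Note that every backslash has to be doubled inside the Java string literal:

```java
import java.util.regex.Pattern;

public class HeadlineArcCheck {
    // The answer's regex, backslashes doubled for Java:
    // HEADLINE, characters that do not start another HEADLINE,
    // ARC not immediately preceded by NON-, then the rest of that block.
    private static final Pattern ARC_BLOCK = Pattern.compile(
            "HEADLINE(?:(?!HEADLINE)[\\s\\S])*(?<!NON-)ARC(?:(?!HEADLINE)[\\s\\S])*");

    public static boolean hasArcBlock(String text) {
        return ARC_BLOCK.matcher(text).find();
    }

    public static void main(String[] args) {
        String realPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
        String realNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
        System.out.println(hasArcBlock(realPos)); // true
        System.out.println(hasArcBlock(realNeg)); // false
    }
}
```

Unlike the ^-anchored pattern, the lookbehind (?<!NON-) also accepts ARC in the middle of a line, as long as it is not directly preceded by NON-.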

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created the following regex pattern:
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for two different strings, given:
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I can't figure out why. As far as I can tell, it may be because of the extra spaces involved when more than two strings are separated by commas.
I'm not sure what the correct pattern for more than two strings should be.
Any help will be appreciated.

If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation |, and each side of it matches exactly one mandatory comma.
matches() must match the whole string, which is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11, however, contains two commas, so you get a partial match on the first part of the string, but not on the whole string, and matches() returns false.
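To make the partial match visible, here is a small sketch that rebuilds the question's regex and compares matches() with find(); the helper names questionRegex and firstMatch are made up for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatchDemo {
    // Rebuilds the regex from the question for a given value.
    public static String questionRegex(String value) {
        return Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
                + Pattern.quote(value);
    }

    // Returns the first substring the regex matches, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = questionRegex("207e/160");
        String channelStr = "207e/160, 149/80, 11";
        System.out.println(channelStr.matches(regex));     // false: the whole string is not matched
        System.out.println(firstMatch(regex, channelStr)); // 207e/160, 149/80
    }
}
```

find() succeeds on the first two items, while matches() needs the pattern to cover ", 11" as well, which neither side of the alternation can do.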
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in a character class you can put - at the end without escaping it, and + and / do not need to be escaped either.

You can use regex101 to test your regex. It describes everything that's going on, which helps with debugging, and its quick-reference section in the bottom right shows what you can do, with examples.
A few things: you can write literal characters by escaping them with \, so \" is a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
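A short sketch of these tokens in Java, applied to the question's comma-separated values; the class and method names are hypothetical, and splitting is used here only to show \s with a quantifier, not as the answer's prescribed fix:

```java
import java.util.Arrays;
import java.util.List;

public class QuantifierDemo {
    // Splits on a comma followed by zero or more whitespace characters.
    // \\s is the whitespace token; * allows none, + would require at least one.
    public static List<String> splitItems(String channelStr) {
        return Arrays.asList(channelStr.split(",\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(splitItems("207e/160, 149/80,   11")); // [207e/160, 149/80, 11]
        // \" inside a Java string literal is a literal double quote
        System.out.println("say \"hi\"".matches("say \"\\w+\"")); // true
    }
}
```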
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you provide your current regex, what you want to match, and the strings you're using to test it, I'll be happy to give you an example.

^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
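A quick way to check this pattern against the question's inputs, sketched with a hypothetical ItemListRegex class; the regex string is copied as written above, already escaped for Java:

```java
public class ItemListRegex {
    // One or more items; every item except the last must be followed by a comma
    // (optionally surrounded by spaces), and a trailing comma is rejected by (?!$).
    private static final String ITEMS = "^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$";

    public static boolean isItemList(String s) {
        return s.matches(ITEMS);
    }

    public static void main(String[] args) {
        System.out.println(isItemList("207e/160, 149/80"));     // true
        System.out.println(isItemList("207e/160, 149/80, 11")); // true
        System.out.println(isItemList("207e/160,"));            // false (trailing comma)
    }
}
```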

java regex find all whitespace in a string [duplicate]

This question already has answers here:
Whitespace Matching Regex - Java
(11 answers)
Regexp Java for password validation
(17 answers)
Closed 5 years ago.
I have seen numerous suggestions for a regex to find whitespace in a string, none of which have worked so far. Yes, looping through the string with a for loop would work, but I would really like to learn how to do this with regex and Pattern/Matcher. My question is: what do I need to add to my regex string, and where, so that it returns false when the input contains whitespace? In the code below, I have tried numerous incarnations of (\\s) to no avail. I do not want to remove the whitespace.
I tested the code from the suggested duplicate, and it does not work; see the link in the comments.
String tstr = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[$#$!%*?&])[A-Za-z\\d$#$!%*?&]";
String astr = etPW.getText().toString().trim();
Pattern regex = Pattern.compile(tstr);
Matcher regexMatcher = regex.matcher(astr);
boolean foundMatch = regexMatcher.find();
if(foundMatch == false){
Toast.makeText( MainActivity.this, "Password must have one Numeric Value\n"
+ "\nOne Upper & Lower Case Letters\n"
+ "\nOne Special Character $ # ! % * ? &", Toast.LENGTH_LONG ).show();
//etPW.setText("");
//etCPW.setText("");
// Two lines of code above are optional
// Also by design these fields can be set to input type Password in the XML file
etPW.requestFocus();
return ;
}

You can use negative lookahead to check for spaces:
^(?!.* )
^ - Start matching at the beginning of the string.
(?! - Begin a negative lookahead group (the pattern inside the parentheses must not come next).
.* (followed by a literal space) - Any non-newline character, any number of times, followed by a space.
) - Close the negative lookahead group.
Combined with the full regex pattern (also cleaned up a bit to remove redundancy):
^(?!.* )(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+
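A sketch of the combined pattern used with find(), as in the question's code; PasswordCheck and isValid are illustrative names, not part of the original:

```java
import java.util.regex.Pattern;

public class PasswordCheck {
    // No space anywhere, at least one lower-case letter, one upper-case letter,
    // one digit and one special character, then 1+ allowed characters.
    private static final Pattern RULES = Pattern.compile(
            "^(?!.* )(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+");

    public static boolean isValid(String password) {
        return RULES.matcher(password).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abc1!xyz"));  // true
        System.out.println(isValid("Abc 1!xyz")); // false: contains a space
    }
}
```

The pattern has no trailing $, so with find() only a leading run of the password has to consist of allowed characters; appending $ would require the whole string to.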

How to replace a specific occurrence of a sub-string in a string and ignoring incomplete matches in java? [duplicate]

This question already has answers here:
Search for a word in a String
(9 answers)
Closed 7 years ago.
In my application I'm giving dictionary word suggestions and replacing the selected word with the suggested word using .replaceAll(). However, that replaces every occurrence of the substring in the entire string.
For example, in this string:
String sentence = "od and not odds as a sample sam. but not odinary";
if I suggest odd for the first word, .replaceAll() will replace every occurrence of od with odd, turning the fourth word into oddds and changing the sentence to:
sentence = sentence.replaceAll("od", "odd");
// sentence is now
// "odd and not oddds as a sample sam. but not oddinary"
Replacing the od to odd has affected all the other words which have the od characters in them.
Can anyone help me with a better approach?

Use a regex. For your example, "\bod\b" will match od only as a whole word. \b is a word boundary, meaning either the start or the end of a word (whether the word ends with a dot, a whitespace, or anything else).
The replaceAll method can already take in a regex, but if you need more power you can look at the Matcher class.
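String REPLACE_WORD = "od";
sentence = sentence.replaceAll("\\b" + REPLACE_WORD + "\\b", "odd");
will give you the correct answer. The extra \ tells Java that you want a literal backslash in the string rather than the \b escape (which in a Java string literal is the backspace character): Java first parses the string literal, and then the regex engine parses the resulting string as a regex.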
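A slightly more defensive variant, sketched under the assumption that the suggested words may contain regex metacharacters: Pattern.quote() escapes the word and Matcher.quoteReplacement() escapes $ and \ in the replacement. The helper name replaceWholeWord is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplace {
    // Replaces only whole-word occurrences of word.
    // \b works as expected when word starts and ends with word characters.
    public static String replaceWholeWord(String text, String word, String replacement) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String sentence = "od and not odds as a sample sam. but not odinary";
        sentence = replaceWholeWord(sentence, "od", "odd"); // strings are immutable: reassign
        System.out.println(sentence); // odd and not odds as a sample sam. but not odinary
    }
}
```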

As mentioned, you can use a Matcher from java.util.regex, which has a lot of useful functionality.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String text = "I detect quite an od odour.";
String searchTerm = "\\bod\\b";
Pattern pattern = Pattern.compile(searchTerm);
Matcher matcher = pattern.matcher(text);
text = matcher.replaceAll("odd");
System.out.println(text);
The output would be:
I detect quite an odd odour.

Use the regular expression in the replaceAll() method:
\bod\b
This will skip occurrences of od inside any other word.
Of course, when you use it in a Java string literal, you need to escape the \
So
replaceAll("\\bod\\b", "odd");
should do it.

How to write a regex that prevents partial matching [duplicate]

This question already has answers here:
Regex whitespace word boundary
(3 answers)
Closed 2 years ago.
How do I build a regex pattern that searches a text T for a search string S?
There are 2 requirements:
S could be made of any character.
S could be anywhere in the string but can't be part of a word.
I know that to escape special regex characters I can put the search string between \Q and \E, like this:
\QMySearch_String\E
How do I prevent finding partial matching of S in T?

You can do it like this if "can't be part of a word" is interpreted as "preceded by start-of-string or whitespace, and followed by end-of-string or whitespace":
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "3894$75\\/^()";
String text = "fdsfsd3894$75\\/^()dasdasd 22348 3894$75\\/^()";
Matcher m = Pattern.compile("(?<=^|\\s)\\Q" + s + "\\E(?=\\s|$)").matcher(text);
while (m.find()) {
System.out.println("Found match! :'" + m.group() + "'");
}
This prints only one match:
Found match! :'3894$75\/^()'

I think what you're trying to find can easily be solved with lookaheads and lookbehinds.
There's a bit of double negation involved: you look ahead and behind for the absence of a non-space character (\S). You don't want to look for a space character directly, because S might be at the start or end of the string, where there is no character at all. Like so:
(?<!\S)S(?!\S)
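A sketch that combines this lookaround idea with Pattern.quote(), which escapes S for you (including any \E inside it); findStandalone is a hypothetical helper that returns the start index of every standalone occurrence:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StandaloneSearch {
    // (?<!\S): no non-space character before S (start of string or whitespace).
    // (?!\S): no non-space character after S (end of string or whitespace).
    public static List<Integer> findStandalone(String text, String s) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(s) + "(?!\\S)");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "3894$75\\/^()";
        String text = "fdsfsd3894$75\\/^()dasdasd 22348 3894$75\\/^()";
        System.out.println(findStandalone(text, s)); // [32]
    }
}
```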

Regex matching in Java

I have a string in Java that I need to split using "<$" and "$>" as delimiters.
But if I have something like "\<$something_we_dont_care_what$>", then we ignore it and move on.
I've been trying to write a regex that does this for a while, but I keep failing, and reading about regular expressions in Java is just making me more and more confused...
Can anyone tell me the right way to do this?
Thank you.

Suppose you have two strings, not in your code but read from a file or a JTextField:
s = "\<$foo$>";
p = "[^\\]?<\$[^\$]*\$>";
And you want to match the pattern to the String.
What I have done so far:
[^\\]? An optional single character that is not a backslash.
<\$ A literal <$. The dollar sign is special in regex, so it has to be escaped with a backslash, just like the backslash in the class before it.
[^\$]* Any number of characters other than a dollar sign.
\$> A dollar sign followed by a greater-than sign. Again, the dollar is escaped.
A question for your domain is whether the foo part (your something_we_dont_care_what) might contain a dollar sign that is not followed by >. I assumed not.
s.matches (p);
should now return true or false. The remaining problem is how to get the pattern into your code: not only the regex engine but Java itself treats the backslash as an escape character, so you have to double each of them:
p = "[^\\\\]?<\\$[^\\$]*\\$>";
If the test string is also a literal in your code, the same applies to it:
"\\<$foo$>".matches (p);
When trying patterns out, it helps to have a tool that lets you skip the Java escaping at first, such as a simple GUI with two JTextFields, or code that reads the pattern from a properties file; this saves you from repeated recompiles.
public class PM
{
public static void main (String args[])
{
String bad = "\\<$foo$>";
String good = "<$foo$>";
String p = "[^\\\\]?<\\$[^\\$]*\\$>";
System.out.println ("bad:\t" + bad.matches (p));
System.out.println ("good:\t" + good.matches (p));
}
}

Never mind.
I've found a solution after a few hours of browsing and experimenting.
The regex that does exactly what I wanted is the following:
// char $ needs to be escaped because it has a special meaning in regular expressions
// <$
String leftDelimiter = "(<\\$)";
// $>
String rightDelimiter = "(\\$>)";
// leftDelimiter | rightDelimiter
// when used to split a string, this would split it each time it detected either of those two patterns,
// including the case where I don't want it to split:
// the "\<$foo$>" case, where the delimiters are "escaped" in the string
// to solve it, we match our leftDelimiter only if the char \ isn't before it
// matches all <$ that don't start with \
String fixedLeftDelimiter = "(?<!\\\\)"+leftDelimiter;
// whatCanBeInTags is everything that can be in our tags besides the $ sign
// (declared before betterRightDelimiter, which uses it; its trailing ) closes the (?<! group there)
// we are using {0,"+(Integer.MAX_VALUE-3)+"}? instead of *? because Java only allows
// lookbehind assertions that have an obvious maximum length
String whatCanBeInTags = "[^\\$]{0,"+(Integer.MAX_VALUE-3)+"}?)";
// the problem presents itself with the rightDelimiter because it needs to check
// whether there had been a leftDelimiter before it that has been escaped
// the following takes care of that:
// matches all $> that don't have a <$ starting with \ before them
String betterRightDelimiter = "(?<!\\\\"+leftDelimiter+whatCanBeInTags+rightDelimiter;
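For comparison, a different technique (not the lookbehind approach above): scan with a Matcher whose first alternative matches a whole escaped tag \<$...$>, so it is consumed and kept as text, while the other two alternatives match the real delimiters. This sketch assumes, like the answers above, that an escaped tag contains no $ and is always closed by $>; the class and method names are hypothetical. Unlike String.split(), it keeps trailing empty parts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSplitter {
    // Alternatives are tried left to right at each position:
    // 1) an escaped tag \<$...$> (no $ inside), which is skipped and kept as text,
    // 2) the real delimiter <$, 3) the real delimiter $>.
    private static final Pattern TOKENS = Pattern.compile("\\\\<\\$[^$]*\\$>|<\\$|\\$>");

    public static List<String> splitOnTags(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int last = 0; // start of the current part
        while (m.find()) {
            if (m.group().startsWith("\\")) {
                continue; // escaped tag: leave it inside the current part
            }
            parts.add(input.substring(last, m.start()));
            last = m.end();
        }
        parts.add(input.substring(last)); // text after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnTags("a<$b$>c\\<$d$>e")); // [a, b, c\<$d$>e]
    }
}
```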
