Regex Multiple Strings With "or" Operator - java

I need to write a Java regex that recognizes the following three cases:
Any combination/amount of the following characters: "ACTGactg:"
or
A single question mark "?"
or
The string "NTC"
I will list what I have tried so far and the errors that have arisen.
public static final String VALID_STRING = "[ACTGactg:]*";
// Matches the first case but not the second or third,
// as expected.
public static final String VALID_STRING = "\\?|[ACTGactg:]*";
// Matches all three cases, including "NTC", which I did not
// expect: I believed it should match only the first two.
public static final String VALID_STRING = "?|[ACTGactg:]*";
// Yields PatternSyntaxException: dangling metacharacter '?'
What I would expect to be correct is the following:
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]*";
But I want to make sure that if I remove "NTC" from the pattern, the string "NTC" is reported as invalid.
Here is the method I am using to test these regexes:
private static boolean isValid(String thisString){
boolean valid = false;
Pattern checkRegex = Pattern.compile(VALID_STRING);
Matcher matchRegex = checkRegex.matcher(thisString);
while (matchRegex.find()){
if (matchRegex.group().length() != 0){
valid = true;
}
}
return valid;
}
So here are my closing questions:
Could the "\\?" regex possibly be acting as a wildcard character that is accepting the "NTC" string?
Are the or operators "|" appropriate here?
Do I need to use parentheses with these or operators?
Here are some example incoming strings:
A:C
T:G
AA:CC
T:C:A:G
NTC
?
Thank you

Yes, the regex you provided works, as long as the whole string has to match (note the + instead of *, so that an empty string is rejected):
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]+";
...
boolean valid = str.matches(VALID_STRING);
If you remove NTC| from the regex, the string NTC becomes invalid.
You can test it and experiment with it yourself.
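A minimal sketch of this answer, assuming the whole input must match (String.matches()). The class name DnaTokenCheck, the WITHOUT_NTC constant and the extra inputs "??" and "" are illustrative additions; WITHOUT_NTC is the same pattern with the NTC| alternative removed, to show that "NTC" is then rejected.

```java
class DnaTokenCheck {
    // "NTC", a single "?", or one or more of A, C, T, G (either case) and ':'
    static final String VALID_STRING = "NTC|\\?|[ACTGactg:]+";
    // Hypothetical variant: the same pattern without the "NTC" alternative
    static final String WITHOUT_NTC = "\\?|[ACTGactg:]+";

    // String.matches() succeeds only if the entire input matches the pattern
    static boolean isValid(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        String[] inputs = {"A:C", "T:G", "AA:CC", "T:C:A:G", "NTC", "?", "??", ""};
        for (String s : inputs) {
            System.out.printf("%-8s with NTC: %-5b without NTC: %b%n",
                    s, isValid(s, VALID_STRING), isValid(s, WITHOUT_NTC));
        }
    }
}
```

Because matches() needs the whole string, "??" fails (only one "?" is allowed) and the empty string fails because of the +.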

Since you are using the Matcher.find() method, you are looking for your pattern anywhere in the string.
This means the strings A:C, T:G, AA:CC etc. match in their entirety. But what about NTC?
It matches because find() looks for a match anywhere: the TC part of it matches, so you get true.
If you want to match only whole strings, either use the matches() method, or wrap the pattern in ^(?:...)$ so that the anchors apply to every alternative.
Note that you don't have to check that the match is longer than 0 if you change your pattern to [ACTGactg:]+ instead of [ACTGactg:]*.
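To see the difference on the input NTC, the sketch below compares the question's find()-based loop, matches(), and an anchored pattern used with find(). The class and method names are illustrative; the unanchored pattern is the question's second attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // The question's second attempt: "?" or any (possibly empty) run of the DNA characters
    static final Pattern ORIGINAL = Pattern.compile("\\?|[ACTGactg:]*");
    // Same alternatives, grouped and anchored, with + so empty input is rejected
    static final Pattern ANCHORED = Pattern.compile("^(?:\\?|[ACTGactg:]+)$");

    // Mirrors the question's isValid(): true if find() ever returns a non-empty match
    static boolean validWithFind(String s) {
        Matcher m = ORIGINAL.matcher(s);
        while (m.find()) {
            if (m.group().length() != 0) {
                return true;
            }
        }
        return false;
    }

    // matches() requires the entire string to match the pattern
    static boolean validWithMatches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    // find() with ^ and $ around the whole alternation also forces a whole-string match
    static boolean validWithAnchors(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(validWithFind("NTC"));    // true: find() locates "TC"
        System.out.println(validWithMatches("NTC")); // false
        System.out.println(validWithAnchors("NTC")); // false
    }
}
```

Note that validWithMatches still accepts the empty string because of the *; the anchored variant uses + to avoid that.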

Related

match all characters in a string independent of their order in the sequence

I want to match a certain group of characters in a String, independent of their order in the String, using a regex. The only requirement is that they must all be present.
I have tried
String elD = "15672";
String t = "12";
if ((elD.matches(".*[" + t + "].*"))) {
System.out.println(elD);
}
This one checks whether any of the characters are present. But I want all of them to be there.
Also I tried
String elD = "15672";
String t = "12";
if ((elD.matches(".*(" + t + ").*"))) {
System.out.println(elD);
}
This does not work either. I have searched for quite a while, but I could not find an example where all of the characters from the pattern must be present in the String, independent of their order.
Thanks
You can write a regex for this, but it would not look nice. If you want to check whether your string contains both x and y anywhere, you need one look-ahead per character, like
^(?=.*x)(?=.*y).*$
and use it like
yourString.matches(regex);
But this way you would need to write your own method that generates the regex dynamically, adding (?=.*X) for each character you want to check. You would also need to make sure that the character is escaped if it is special in regex, like ? or +.
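A minimal sketch of such a generator. The names AllCharsRegex, buildRegex and containsAll are hypothetical; Pattern.quote() handles characters like ? or +, and the (?s) prefix is an added assumption so that inputs containing line breaks are also handled.

```java
import java.util.regex.Pattern;

class AllCharsRegex {
    // Builds (?s)^(?=.*c1)(?=.*c2)...$ with one look-ahead per searched character.
    // Pattern.quote() escapes characters such as ? or + that are special in regex.
    static String buildRegex(String searchFor) {
        StringBuilder sb = new StringBuilder("(?s)^");
        for (char c : searchFor.toCharArray()) {
            sb.append("(?=.*").append(Pattern.quote(String.valueOf(c))).append(")");
        }
        return sb.append(".*$").toString();
    }

    static boolean containsAll(String input, String searchFor) {
        return input.matches(buildRegex(searchFor));
    }

    public static void main(String[] args) {
        System.out.println(buildRegex("12"));           // (?s)^(?=.*\Q1\E)(?=.*\Q2\E).*$
        System.out.println(containsAll("15672", "12")); // true
        System.out.println(containsAll("15672", "13")); // false
    }
}
```

The regex is recompiled on every call; if the same search string is used often, caching a compiled Pattern would be the obvious refinement.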
A simpler and no less effective solution would be to write your own method that checks whether your string contains all the searched characters, something like
public static boolean containsUnordered(String input, String searchFor){
char[] characters = searchFor.toCharArray();
for (char c: characters)
if (!input.contains(String.valueOf(c)))
return false;
return true;
}
You can build a pattern from the search string using the replaceAll method:
String s = "12";
String pattern = s.replaceAll("(.)", "(?=[^$1]*$1)");
Note: this cannot require the same character several times (e.g. 112 gives (?=[^1]*1)(?=[^1]*1)(?=[^2]*2), which is exactly the same as (?=[^1]*1)(?=[^2]*2)).
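Since the generated pattern consists only of zero-width look-aheads, one way to apply it is Matcher.lookingAt() at position 0. This is a sketch with hypothetical names; because nothing is escaped, it assumes the search string contains only letters and digits.

```java
import java.util.regex.Pattern;

class LookaheadFromReplaceAll {
    // "12" becomes "(?=[^1]*1)(?=[^2]*2)": one look-ahead per character.
    // Nothing is escaped, so this assumes the search string holds only letters/digits.
    static String toPattern(String searchFor) {
        return searchFor.replaceAll("(.)", "(?=[^$1]*$1)");
    }

    // The pattern is zero-width, so lookingAt() at position 0 is enough:
    // it succeeds only if every look-ahead finds its character somewhere in the input.
    static boolean containsAll(String input, String searchFor) {
        return Pattern.compile(toPattern(searchFor)).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(toPattern("12"));            // (?=[^1]*1)(?=[^2]*2)
        System.out.println(containsAll("15672", "12")); // true
        System.out.println(containsAll("15672", "13")); // false
    }
}
```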
But in my opinion, Pshemo's method is probably more efficient.

Check special arrangement of specific signs in a string in Java

I need to check whether a string contains a specific arrangement of letters and numbers.
Valid arrangements are for example:
X
X-Y
A-H-K-L-J-Y
A-H-J-Y
123
12?
12*
12-17
Invalid are for example:
-X-Y
-XY
*12
?12
I have written this method in Java to solve this problem (but I don't have much experience with regular expressions):
public boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(Pattern.quote(searchPattern),
Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.find();
}
return patternFounded;
}
How can I implement this requirement with regular expressions?
By the way: is it a good solution to check whether a string contains numeric content by using the method isNumeric from the Java class StringUtils?
EDIT:
The linked question, which was added by the admins, only covers the presence of characters with regular expressions in general, not specific arrangements of characters!
After a good while trying to help and answering constantly changing questions, I just found out that the same question was asked yesterday, and that the OP doesn't accept answers to his questions... All I have left to say is: good night sir, and good luck.
n-th answer follows:
First pattern: [a-z](-[a-z])* : a letter, possibly followed by more letters, separated by -.
Second pattern: \d+(-\d+)*[?*]* : a number, possibly followed by more numbers, separated by -, and possibly ending with ? or *.
So join them together: ^([a-z](-[a-z])*|\d+(-\d+)*[?*]*)$. ^ and $ mark the beginning and the end of the string; the alternation is wrapped in a single group so that both anchors apply to both branches.
A few more comments on the code: you don't need to use Pattern.quote, and you should use matches() instead of find(), because find() returns true if any part of the string matches the pattern, and you want the whole string:
public static boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.matches();
}
return patternFounded;
}
Called like this: checkPatternMatching(s, "^([a-z](-[a-z])*|\\d+(-\\d+)*[?*]*)$")
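A runnable sketch that checks every example from the question against this approach. The class name ArrangementCheck and the constant ARRANGEMENT are illustrative; note that the sketch groups the alternation as ^(...|...)$ so that both anchors cover both branches, and it relies on CASE_INSENSITIVE so that [a-z] also accepts the uppercase examples.

```java
import java.util.regex.Pattern;

class ArrangementCheck {
    // Letters separated by '-', or numbers separated by '-' optionally ending in ? or *
    static final String ARRANGEMENT = "^([a-z](-[a-z])*|\\d+(-\\d+)*[?*]*)$";

    static boolean checkPatternMatching(String sourceToScan, String searchPattern) {
        if (sourceToScan == null) {
            return false;
        }
        // matches() tests the whole string, not just some part of it
        return Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE)
                .matcher(sourceToScan)
                .matches();
    }

    public static void main(String[] args) {
        String[] valid = {"X", "X-Y", "A-H-K-L-J-Y", "A-H-J-Y", "123", "12?", "12*", "12-17"};
        String[] invalid = {"-X-Y", "-XY", "*12", "?12"};
        for (String s : valid) {
            System.out.println(s + " -> " + checkPatternMatching(s, ARRANGEMENT)); // true
        }
        for (String s : invalid) {
            System.out.println(s + " -> " + checkPatternMatching(s, ARRANGEMENT)); // false
        }
    }
}
```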
About the second question, this is the current implementation of StringUtils.isNumeric:
public static boolean isNumeric(final CharSequence cs) {
if (isEmpty(cs)) {
return false;
}
final int sz = cs.length();
for (int i = 0; i < sz; i++) {
if (Character.isDigit(cs.charAt(i)) == false) {
return false;
}
}
return true;
}
So no, there is nothing wrong with it; that is as simple as it gets. But you need to include an external JAR in your program, which I find unnecessary if you just want to use such a simple method.
I believe you should first remove the Pattern.quote() call, because it turns the input patterns into literal strings, which is not useful in your context.
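If you would rather not add the JAR, a JDK-only equivalent of the isNumeric implementation quoted above could look like this. It is a sketch (class name hypothetical) that keeps the same semantics: null or empty input is rejected, signs and decimal points are rejected, and any character for which Character.isDigit returns true is accepted.

```java
class DigitsOnly {
    // Same rules as the quoted isNumeric: non-empty and every char is a digit
    static boolean isNumeric(CharSequence cs) {
        return cs != null && cs.length() > 0 && cs.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("12345")); // true
        System.out.println(isNumeric("12.5"));  // false
        System.out.println(isNumeric(""));      // false
    }
}
```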
To match the valid arrangements with letters, something like this should work:
^[a-z](?:-[a-z])*$
For the numbers (if I understood the rules correctly):
^\\d+(?:[?*]|-\\d+)*$
And if you want to combine them:
^(?:[a-z](?:-[a-z])*|\\d+(?:[?*]|-\\d+)*)$
I'm not familiar with Java itself, nor the isNumeric method, sorry.
As per your comment, if you want to accept *12 or 1?2 or 12*456, you can use:
^\\*?\\d+(?:[?*]\\d*|-\\d+)*$
Then add it to the previous regex like so:
^(?:[a-z](?:-[a-z])*|\\*?\\d+(?:[?*]\\d*|-\\d+)*)$
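A small sketch comparing the combined regex with its extended version on a few inputs. The class name is illustrative, and Pattern.CASE_INSENSITIVE is an added assumption so that [a-z] accepts the uppercase examples from the question.

```java
import java.util.regex.Pattern;

class StrictVersusExtended {
    // Letters joined by '-', or digits followed by ?, * or -digits
    static final Pattern STRICT = Pattern.compile(
            "^(?:[a-z](?:-[a-z])*|\\d+(?:[?*]|-\\d+)*)$", Pattern.CASE_INSENSITIVE);
    // Also accepts a leading * and digits after ? or *
    static final Pattern EXTENDED = Pattern.compile(
            "^(?:[a-z](?:-[a-z])*|\\*?\\d+(?:[?*]\\d*|-\\d+)*)$", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        for (String s : new String[] {"X-Y", "12-17", "*12", "1?2", "12*456", "?12", "-XY"}) {
            System.out.printf("%-7s strict=%-5b extended=%b%n",
                    s, STRICT.matcher(s).matches(), EXTENDED.matcher(s).matches());
        }
    }
}
```

Only the leading * is allowed by the extended version, so ?12 is still rejected by both.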

Java Regex is including new line in match

I'm trying to match a regular expression to textbook definitions that I get from a website.
Each entry always consists of the word, a new line, and then the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither"), I keep getting the newline character as well.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that doesn't seem to match the word at all. I've been testing with Rubular (http://rubular.com/r/LPEHCnS0ri), which matches all my attempts the way I want, but Java doesn't.
Here's my snippet:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by trimming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!
Try using the Pattern.MULTILINE option:
Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
This makes ^ and $ also match at the line terminators in your string; otherwise they only match at the start and end of the whole input.
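A minimal sketch of the difference, using a shortened version of the example entry. The class and method names are illustrative; the method returns null when find() fails.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstLineWord {
    // Returns the first line consisting only of non-whitespace characters, or null
    static String firstWord(String text, int flags) {
        Matcher m = Pattern.compile("^(\\S+)$", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Zither\nDefinition: An instrument of music used in Austria and Germany";
        // Without MULTILINE, $ cannot match right before the first "\n", so find() fails
        System.out.println(firstWord(str, 0));                 // null
        // With MULTILINE, $ matches before the line terminator and \S+ excludes it
        System.out.println(firstWord(str, Pattern.MULTILINE)); // Zither
    }
}
```

Because \S+ cannot consume the line terminator and $ is zero-width, the returned word never includes the newline, so no trimming is needed.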
Although it makes no difference for this pattern, the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) based on the number you specify. Your pattern specifies one capture group, which is what you want captured. If you'd included \s in your Pattern, as you said you tried, then Matcher.group() would have included that whitespace in its return value.
With regular expressions, group 0 is always the complete matching string. In your case you want group 1, not group 0.
So changing mtch.group() to mtch.group(1) should do the trick:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\w+)\\s");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group(1);
terms.add(new SearchTerm(result, System.nanoTime()));
}
A late response, but if you are not using Pattern and Matcher directly (for example with String.matches()), you can use the inline equivalent of Pattern.DOTALL in your regex string:
(?s)[Your Expression]
Basically, (?s) tells the dot to match all characters, including line breaks.
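A short sketch of inline flags with plain String methods (class name illustrative). Besides (?s), it shows (?m), the inline form of Pattern.MULTILINE, which is the flag that changes how ^ and $ behave for the question's pattern.

```java
class InlineFlags {
    static final String STR = "Zither\nDefinition: An instrument of music";

    public static void main(String[] args) {
        // Without (?s), '.' does not match '\n', so matching the whole string fails
        System.out.println(STR.matches("Zither.Definition.*"));     // false
        // (?s) is the inline form of Pattern.DOTALL: '.' now also matches '\n'
        System.out.println(STR.matches("(?s)Zither.Definition.*")); // true
        // (?m) is the inline form of Pattern.MULTILINE: ^ and $ also match at line breaks
        System.out.println(STR.replaceFirst("(?m)^(\\S+)$[\\s\\S]*", "$1")); // Zither
    }
}
```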
Detailed information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Just replace:
String result = mtch.group();
with:
String result = mtch.group(1);
This limits your output to the contents of the capturing group (e.g. (\\w+)).
Try the following:
import java.util.regex.Pattern;
// ...
/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN =
Pattern.compile("^(\\w+)\\r?\\n(.*)$");
public static void main(String[] args) {
String input = "Zither\nDefinition: An instrument of music";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
); // prints "Zither = Definition: An instrument of music"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1")
); // prints "Zither"
}

RegEx - Java - Matching Strings (012|123|234|345|456|567|678|789|890)

I am working on password enhancement, and the client wants passwords that do not contain consecutive characters, e.g. 123 or 234.
I figured out that you can list the strings you want to match in a regex as an alternation, like (012|123|234|345|456|567|678|789|890), so I made that my regex.
This sequence is separated from the other sequences for easy reading.
The problem is that I cannot match the password against the pattern, even when the password contains 123 or 234.
I've read that regex cannot detect 123 as consecutive numbers, but as a string, can it do so?
If you have a limited set of character sequences, you can use a Pattern, call .find() on a matcher for your input, and just invert the test:
// Only the alternation is needed, no need for the capture
private static final Pattern PATTERN
= Pattern.compile("012|123|234|345|456|567|678|789|890");
// ...
if (PATTERN.matcher(input).find())
// fail: illegal sequence found
But if you want to detect that code points follow one another you have to use character functions:
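// needs java.nio.CharBuffer; counts steps where a char is exactly one more than the previous one
final CharBuffer buf = CharBuffer.wrap(input);
int maxSuccessive = 0;
int successive = 0;
char prev = buf.hasRemaining() ? buf.get() : '\0'; // guard against empty input
char next;
while (buf.hasRemaining()) {
next = buf.get();
if (next - prev == 1)
successive++;
else {
maxSuccessive = Math.max(maxSuccessive, successive);
successive = 0;
}
prev = next;
}
// account for a run that lasts until the end of the input
maxSuccessive = Math.max(maxSuccessive, successive);
// test maxSuccessive (e.g. "123" gives 2: two consecutive steps)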
Note however that this will test successive characters according to "canonical ordering", not collation. In some locales, for instance, what is immediately after a is A, not b.
More generally, if you want to test password requirements and the constraints evolve, you are better off splitting things up a bit. For instance, consider this:
public interface PasswordChecker
{
boolean isValid(final String passwd);
}
Implement this interface for each of your checks (for instance, length, presence/absence of certain characters, etc.), and when you check a password, use a List of checkers; the password is invalid if any checker returns false:
private final List<PasswordChecker> checkers = ...;
// then
for (final PasswordChecker checker: checkers)
if (!checker.isValid(passwd))
return false;
return true;
If you use Guava, you can forget about PasswordChecker and use Predicate<String>.
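A sketch of the same design using the JDK's own java.util.function.Predicate (Java 8+) instead of Guava's. The three rules and their thresholds are hypothetical placeholders for the client's real requirements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PasswordRules {
    // Hypothetical rules for illustration only
    static final Predicate<String> MIN_LENGTH = p -> p.length() >= 8;
    static final Predicate<String> HAS_DIGIT = p -> p.chars().anyMatch(Character::isDigit);
    static final Predicate<String> NO_DIGIT_RUN =
            p -> !p.matches("(?s).*(012|123|234|345|456|567|678|789|890).*");

    static final List<Predicate<String>> CHECKERS = Arrays.asList(MIN_LENGTH, HAS_DIGIT, NO_DIGIT_RUN);

    // The password is valid only if every checker accepts it
    static boolean isValid(String passwd) {
        for (Predicate<String> checker : CHECKERS) {
            if (!checker.test(passwd)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("s3cr3tPass")); // true
        System.out.println(isValid("pass1234"));   // false: contains 123
    }
}
```

Adding or removing a rule only changes the CHECKERS list; isValid itself stays the same.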
If you're only dealing with these strings of digits that you want to exclude, you can achieve this using a negative lookahead assertion:
^(?!.*(012|123|234|345|456|567|678|789|890))<regex>
where <regex> is the actual regex you're using to match the password, and (?!...) is a negative lookahead asserting that none of those strings occurs anywhere in the password.
If you're asking about any increasing sequence of characters, then regex is not the right tool for this. You would have to do that programmatically.
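A sketch of the lookahead approach with a concrete stand-in for <regex>. The rule [A-Za-z0-9]{6,} (at least six letters or digits) is purely hypothetical; substitute your actual password regex.

```java
import java.util.regex.Pattern;

class NoDigitRunPassword {
    // (?!...) rejects any password containing one of the listed runs anywhere;
    // [A-Za-z0-9]{6,} is a hypothetical stand-in for the real <regex>
    static final Pattern PASSWORD = Pattern.compile(
            "^(?!.*(012|123|234|345|456|567|678|789|890))[A-Za-z0-9]{6,}$");

    static boolean isAcceptable(String passwd) {
        return PASSWORD.matcher(passwd).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("abc135")); // true
        System.out.println(isAcceptable("abc123")); // false
    }
}
```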

Regex NOT operator doesn't work

I'm trying to filter files in a folder. I need the files that don't end with ".xml-test". The following regex works as expected (ok1,ok2,ok3 = false, ok4 = true)
String regex = ".+\\.xml\\-test$";
boolean ok1 = Pattern.matches(regex, "database123.xml");
boolean ok2 = Pattern.matches(regex, "database123.sql");
boolean ok3 = Pattern.matches(regex, "log_file012.txt");
boolean ok4 = Pattern.matches(regex, "database.xml-test");
Now I just need to negate it, but it doesn't work for some reason:
String regex = "^(.+\\.xml\\-test)$";
I still get ok1,ok2,ok3 = false, ok4 = true
Any ideas? (As people pointed out, this could easily be done without regex. But for argument's sake, assume I have to use a single regex pattern and nothing else, i.e. !Pattern.matches(..) is also not allowed.)
I think you are looking for:
if (! someString.endsWith(".xml-test")) {
...
}
No regular expression required. Throw this into a FilenameFilter as follows:
public boolean accept(File dir, String name) {
return ! name.endsWith(".xml-test");
}
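A runnable sketch of the FilenameFilter approach, written as a lambda. The temporary directory and file names are made up for the demo and are scheduled for deletion when the JVM exits.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;

class ListNonXmlTest {
    // Accept every name that does not end with ".xml-test"
    static final FilenameFilter NOT_XML_TEST = (dir, name) -> !name.endsWith(".xml-test");

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("filter-demo").toFile();
        dir.deleteOnExit(); // registered first, so it is deleted after the files
        for (String name : new String[] {"database123.xml", "database123.sql", "database.xml-test"}) {
            File f = new File(dir, name);
            f.createNewFile();
            f.deleteOnExit();
        }
        // Prints database123.xml and database123.sql (order not guaranteed)
        for (File f : dir.listFiles(NOT_XML_TEST)) {
            System.out.println(f.getName());
        }
    }
}
```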
The meaning of ^ depends on its position in the regex. When it is the first character inside a character class [], it negates the character class; outside a character class, it matches the beginning of the input (or of a line, in MULTILINE mode).
The easiest way to negate a result of a match is to use a positive pattern in regex, and then to add a ! on the Java side to do the negation, like this:
boolean isGoodFile = !Pattern.matches(regex, "database123.xml");
The following Java regex asserts that a string does NOT end with: .xml-test:
String regex = "^(?:(?!\\.xml-test$).)*$";
This regex walks the string one character at a time and asserts that at each and every position the remainder of the string is not .xml-test.
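Applied to the four file names from the question, this regex gives the inverted results. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class NotXmlTest {
    // Each character is consumed only if ".xml-test" does not end the string at that position
    static final String REGEX = "^(?:(?!\\.xml-test$).)*$";

    static boolean isAllowed(String name) {
        return Pattern.matches(REGEX, name);
    }

    public static void main(String[] args) {
        String[] names = {"database123.xml", "database123.sql", "log_file012.txt", "database.xml-test"};
        for (String name : names) {
            System.out.println(name + " -> " + isAllowed(name)); // true, true, true, false
        }
    }
}
```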
Simple!
^ is not negation in a regex; outside a character class it marks the beginning of the line.
You probably need (?!X), which the Pattern documentation describes as "X, via zero-width negative lookahead".
But I suggest using the File#listFiles method with a FilenameFilter implementation that accepts a name when:
!name.endsWith(".xml-test")
If you really need to test it with a regex, use a negative lookbehind, which the Pattern class supports:
String regex = "^.*(?<!\\.xml-test)$";
How it works:
first, you match the whole string: from the start (^), all characters (.*),
then you check that what you have already matched doesn't end with ".xml-test" (a lookbehind at the position you have already reached),
finally, you test that this is the end of the string.
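The lookbehind version can be checked the same way on the question's file names. This sketch uses String.matches(), which requires the whole name to match; the class name is illustrative.

```java
class NotXmlTestLookbehind {
    // Consume everything, then look back from the end: fail if the last chars are ".xml-test"
    static final String REGEX = "^.*(?<!\\.xml-test)$";

    static boolean isAllowed(String name) {
        return name.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] names = {"database123.xml", "database123.sql", "log_file012.txt", "database.xml-test"};
        for (String name : names) {
            System.out.println(name + " -> " + isAllowed(name)); // true, true, true, false
        }
    }
}
```

The lookbehind has a fixed length (nine characters), which Java's Pattern supports without restriction.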
