Finding whether a string meets a certain pattern

I recently had an interview with Google for a Software Engineering position, and the question asked was to build a pattern matcher.
You have to build the function
boolean isPattern(String givenPattern, String stringToMatch)
which does the following:
givenPattern is a string that contains:
a) 'a'-'z' chars
b) '*' chars, which can be matched by 0 or more letters
c) '?' chars, each of which matches exactly one character (basically any letter)
So a call could be something like:
isPattern("abc", "abcd") returns false, as it does not match the pattern ('d' is extra)
isPattern("a*bc", "aksakwjahwhajahbcdbc") returns true, as we have an 'a' at the start, many characters after it, and then it ends with "bc"
isPattern("a?bc", "adbc") returns true, as each character of the pattern matches the given string
During the interview, time being short, I figured one could walk through the pattern, check whether each character is a letter, a * or a ?, and then match the characters in the given string accordingly. But that turned into a complicated set of for-loops, and we didn't reach a solution within the 45 minutes.
Could someone please tell me how they would solve this problem quickly and efficiently?
Many thanks!

Assuming you are allowed to use regexes, you could have written something like:
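import java.util.regex.Pattern;

static boolean isPattern(String givenPattern, String stringToMatch) {
    // '*' -> ".*" (any chars, 0 or more times), '?' -> "." (any char, exactly once)
    String regex = "^" + givenPattern.replace("*", ".*").replace("?", ".") + "$";
    return Pattern.compile(regex).matcher(stringToMatch).matches();
}
"^" is the start of the string
"$" is the end of the string
. is for "any character", exactly once
.* is for "any character", 0 or more times
(matches() already requires the whole string to match, so "^" and "$" are redundant there, but harmless.)
Note: If you want to restrict * and ? to letters only, you can use [a-zA-Z] instead of . in the replacements.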
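The replace chain is safe because the stated pattern alphabet is only 'a'-'z', '*' and '?'. If a pattern might ever contain other characters that are special in regex syntax (for example . or +), a slightly more defensive sketch translates the pattern one character at a time and quotes every literal with Pattern.quote. The class name WildcardRegex is hypothetical.

```java
import java.util.regex.Pattern;

public class WildcardRegex {
    // Translate a wildcard pattern into a regex, quoting every literal character
    static String toRegex(String givenPattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : givenPattern.toCharArray()) {
            if (c == '*') {
                sb.append(".*");                              // zero or more characters
            } else if (c == '?') {
                sb.append('.');                               // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));  // literal, even if it is a regex metachar
            }
        }
        return sb.toString();
    }

    static boolean isPattern(String givenPattern, String stringToMatch) {
        // Pattern.matches only succeeds if the entire input is consumed, so no anchors are needed
        return Pattern.matches(toRegex(givenPattern), stringToMatch);
    }

    public static void main(String[] args) {
        System.out.println(isPattern("abc", "abcd"));                  // false
        System.out.println(isPattern("a*bc", "aksakwjahwhajahbcdbc"));  // true
        System.out.println(isPattern("a?bc", "adbc"));                  // true
        System.out.println(isPattern("a.c", "abc"));                    // false: '.' is a literal here
    }
}
```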

boolean isPattern(String givenPattern, String stringToMatch) {
    if (givenPattern.isEmpty())
        return stringToMatch.isEmpty();      // an empty pattern matches only the empty string
    char patternCh = givenPattern.charAt(0);
    boolean atEnd = stringToMatch.isEmpty();
    if (patternCh == '*') {
        // '*' matches nothing (drop it) or swallows one more character (keep it)
        return isPattern(givenPattern.substring(1), stringToMatch)
            || (!atEnd && isPattern(givenPattern, stringToMatch.substring(1)));
    } else if (patternCh == '?') {
        // '?' consumes exactly one character, whatever it is
        return !atEnd && isPattern(givenPattern.substring(1),
                                   stringToMatch.substring(1));
    }
    // a letter must equal the next input character
    return !atEnd && patternCh == stringToMatch.charAt(0)
        && isPattern(givenPattern.substring(1), stringToMatch.substring(1));
}
(Recursion being easiest to understand.)
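The recursion can examine the same (pattern position, input position) pair many times, which gets slow for patterns with several '*'s. One common way to make it efficient (not taken from the answers above) is dynamic programming over those pairs: this sketch runs in O(m·n) time for a pattern of length m and an input of length n, keeping only one row of the table at a time. The class name WildcardDP is hypothetical.

```java
public class WildcardDP {
    static boolean isPattern(String givenPattern, String stringToMatch) {
        int m = givenPattern.length(), n = stringToMatch.length();
        // prev[j] is true when the pattern chars processed so far match the first j input chars
        boolean[] prev = new boolean[n + 1];
        prev[0] = true;                          // empty pattern matches empty input
        for (int i = 1; i <= m; i++) {
            char p = givenPattern.charAt(i - 1);
            boolean[] cur = new boolean[n + 1];
            // empty input can only be matched if every pattern char so far is '*'
            cur[0] = prev[0] && p == '*';
            for (int j = 1; j <= n; j++) {
                char s = stringToMatch.charAt(j - 1);
                if (p == '*') {
                    // '*' matches nothing (prev[j]) or one more character (cur[j - 1])
                    cur[j] = prev[j] || cur[j - 1];
                } else {
                    // '?' or an equal letter consumes exactly one character
                    cur[j] = prev[j - 1] && (p == '?' || p == s);
                }
            }
            prev = cur;
        }
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(isPattern("abc", "abcd"));
        System.out.println(isPattern("a*bc", "aksakwjahwhajahbcdbc"));
        System.out.println(isPattern("a?bc", "adbc"));
    }
}
```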

Related

Regex to require one alphabet and one non-alphabet and no whitespace

I want to create a regex in Java to match at least 1 letter and 1 non-letter (could be anything except A-Za-z) and no whitespace.
The regex below works only partially:
^([A-Za-z]{1,}[^A-Za-z]{1,})+$
It matches aaaa7777
but doesn't match 777aaaaa.
Any help would be appreciated.

Your regex implicitly assumes an order for the characters you want to match: it says that a letter must come before a non-letter. However, you want the letter and the non-letter to come in either order, so you need to account for both cases. Also note that it should be [^\sa-zA-Z] instead of [^a-zA-Z], as you don't allow spaces.
(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])
At the start and end, any non-space character is allowed, so:
^\S*(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])\S*$
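A small harness for this pattern, assuming it is used with Matcher.matches(); inside a Java string literal every backslash is doubled. It works because in any whitespace-free string that contains at least one letter and at least one non-letter, some letter must sit directly next to some non-letter, which is exactly what the alternation looks for. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class LetterAndNonLetter {
    // An adjacent letter/non-letter pair somewhere, with only non-space chars around it
    static final Pattern P = Pattern.compile(
            "^\\S*(?:[a-zA-Z][^\\sa-zA-Z]|[^\\sa-zA-Z][a-zA-Z])\\S*$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaaa7777", "777aaaaa", "aaaa", "aa 77"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```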

You may use
s.matches("(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*")
This is how the pattern works.
Details
The pattern will match a whole string since ^ and \z anchors are implicit in matches
(?=\P{Alpha}*\p{Alpha}) - a lookahead that requires at least one ASCII letter after any 0+ chars other than an ASCII letter
(?=\p{Alpha}*\P{Alpha}) - a lookahead that requires a char other than an ASCII letter after 0 or more ASCII letters
\S* - zero or more non-whitespace chars.
To make the regex Unicode aware replace \p{Alpha} with \p{L} and \P{Alpha} with \P{L}.
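To see what that substitution changes, this sketch (class and method names are hypothetical) runs both variants on a string made of a non-ASCII letter followed by a digit. By default Java's \p{Alpha} covers only ASCII letters, so the first pattern treats é as a non-letter and finds no letter at all, while the \p{L} version accepts it.

```java
public class AlphaVsUnicode {
    static boolean asciiOnly(String s) {
        return s.matches("(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*");
    }

    static boolean unicodeAware(String s) {
        return s.matches("(?=\\P{L}*\\p{L})(?=\\p{L}*\\P{L})\\S*");
    }

    public static void main(String[] args) {
        String s = "\u00e97";  // e with acute accent, then the digit 7
        System.out.println(asciiOnly(s));     // false
        System.out.println(unicodeAware(s));  // true
    }
}
```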

Regular expressions aren't the right tool for this type of validation. Just write out the plain logic. For your specific example:
public class Main {
    public static void main(String[] args) {
        System.out.println("'foo' ? " + doesMatch("foo"));
        System.out.println("'bar7' ? " + doesMatch("bar7"));
        System.out.println("'55baz' ? " + doesMatch("55baz"));
    }
    public static boolean doesMatch(String input) {
        boolean hasAlpha = false,
                hasNonAlpha = false;
        for (char ch : input.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                return false;             // whitespace is not allowed anywhere
            }
            if (ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z') {
                hasAlpha = true;          // ASCII letter
            } else {
                hasNonAlpha = true;       // any other non-whitespace character
            }
        }
        return hasAlpha && hasNonAlpha;
    }
}
Anyone can understand which inputs match and which don't. With regular expressions this wouldn't be so simple.

Regular expression for a phrase containing letters and numbers, but not numbers only, with a fixed length range

I want a regular expression that checks that the input characters are a-z and 0-9, but I do not want to allow input that is just a numeric value (it must have at least one alphabetic character).
For example:
413123123123131
is not allowed, but if it has even one alphabetic character anywhere in the phrase, it's OK.
I tried to define the correct regex for that and finally arrived at
[0-9]*[a-z].*
but now I'm confused about how to define the {x,y} length of the phrase. I want {9,31}, but after the last * I can't add a length block. I tried to define a group, but it didn't work.
Tested at https://www.debuggex.com/
How can I add it?

What you seek is
String regex = "(?=.{9,31}$)\\p{Alnum}*\\p{Alpha}\\p{Alnum}*";
Use it with String#matches() / Pattern#matches() method to require a full string match:
if (s.matches(regex)) {
return true;
}
Details
^ - implicit in matches() - matches the start of string
(?=.{9,31}$) - a positive lookahead that requires 9 to 31 chars (other than line break chars) between the start and the end of the string
\\p{Alnum}* - 0 or more alphanumeric chars
\\p{Alpha} - an ASCII letter
\\p{Alnum}* - 0 or more alphanumeric chars
Java demo:
String lines[] = {"413123123123131", "4131231231231a"};
Pattern p = Pattern.compile("(?=.{9,31}$)\\p{Alnum}*\\p{Alpha}\\p{Alnum}*");
for(String line : lines)
{
Matcher m = p.matcher(line);
if(m.matches()) {
System.out.println(line + ": MATCH");
} else {
System.out.println(line + ": NO MATCH");
}
}
Output:
413123123123131: NO MATCH
4131231231231a: MATCH

This might be what you are looking for.
[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*
To help explain it, think of the middle term as your one required character and the outer terms as any number of alphanumeric characters.
Edit: to restrict the length of the string as a whole, you may have to check it manually after matching, i.e.
if (str.length() >= 9 && str.length() <= 31)
Wiktor does provide a solution that does more of the work in the regex; please look at his answer for a better regex pattern.

Try this Regex:
^(?:(?=[a-z])[a-z0-9]{9,31}|(?=\d.*[a-z])[a-z0-9]{9,31})$
OR a bit shorter form:
^(?:(?=[a-z])|(?=\d.*[a-z]))[a-z0-9]{9,31}$
Explanation (for the 1st regex):
^ - the start of the string
(?=[a-z])[a-z0-9]{9,31} means: if the string starts with a letter, match letters and digits, minimum 9 and maximum 31
| - OR
(?=\d.*[a-z])[a-z0-9]{9,31} means: if the string starts with a digit and a letter follows somewhere in the string, match letters and digits, minimum 9 and maximum 31. This also ensures that if the string starts with a digit and there is no letter anywhere in the string, there won't be a match
$ - the end of the string
OUTPUT:
413123123123131 NO MATCH(no alphabets)
kjkhsjkf989089054835werewrew65 MATCH
kdfgfd4374985794379857984379857weorjijuiower NO MATCH(length more than 31)
9087erkjfg9080980984590p465467 MATCH
4131231231231a MATCH
kjdfg34 NO MATCH(Length less than 9)
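To reproduce that output in Java (a sketch; the class name is hypothetical), the shorter form can be compiled once and applied with matches(). In a Java string literal, \d has to be written as \\d.

```java
import java.util.regex.Pattern;

public class LengthAndLetter {
    // Starts with a letter, or starts with a digit and has a letter later;
    // the whole string must be 9 to 31 chars from [a-z0-9]
    static final Pattern P = Pattern.compile("^(?:(?=[a-z])|(?=\\d.*[a-z]))[a-z0-9]{9,31}$");

    public static void main(String[] args) {
        String[] inputs = {
            "413123123123131",
            "kjkhsjkf989089054835werewrew65",
            "kdfgfd4374985794379857984379857weorjijuiower",
            "9087erkjfg9080980984590p465467",
            "4131231231231a",
            "kjdfg34"
        };
        for (String s : inputs) {
            System.out.println(s + (P.matcher(s).matches() ? " MATCH" : " NO MATCH"));
        }
    }
}
```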

Here's the regex:
[a-zA-Z\d]*[a-zA-Z][a-zA-Z\d]*
The trick here is to have something that is not optional. The leading and trailing [a-zA-Z\d] has a * quantifier, so they are optional. But the [a-zA-Z] in the middle there is not optional. The string must have a character that matches [a-zA-Z] in order to be matched.
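However, with this regex you need to check the length of the string separately with length() afterwards (a lookahead such as (?=.{9,31}$) can also do it inside the regex).
Actually, I think you can do this without a regex pretty easily:
private static boolean matches(String input) {
    if (input.length() < 9 || input.length() > 31) return false;
    boolean hasLetter = false;
    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') {
            hasLetter = true;            // at least one letter is required
        } else if (c < '0' || c > '9') {
            return false;                // only letters and digits are allowed
        }
    }
    return hasLetter;
}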

Use regex to replace sequences in a string with modified characters

I am trying to solve a codingbat problem using regular expressions, whether or not that works on the website.
So far, I have the following code, which does not add a * between two consecutive equal characters. Instead, it just bulldozes over them and replaces them with a fixed string.
public String pairStar(String str) {
Pattern pattern = Pattern.compile("([a-z])\\1", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
if(matcher.find())
matcher.replaceAll(str);//this is where I don't know what to do
return str;
}
I want to know how I could keep using regex and replace the whole string. If needed, I think a recursive system could help.

This works:
while(str.matches(".*(.)\\1.*")) {
str = str.replaceAll("(.)\\1", "$1*$1");
}
return str;
Explanation of the regex:
The search regex (.)\\1:
(.) means "any character" (the .) and the brackets create a group - group 1 (the first left bracket)
\\1, which in regex is \1 (a java literal String must escape a backslash with another backslash) means "the first group" - this kind of term is called a "back reference"
So together (.)\1 means "any repeated character"
The replacement regex $1*$1:
The $1 term means "the content captured as group 1"
Recursive solution:
Technically, the solution called for on that site is a recursive one, so here is a recursive implementation:
public String pairStar(String str) {
if (!str.matches(".*(.)\\1.*")) return str;
return pairStar(str.replaceAll("(.)\\1", "$1*$1"));
}

FWIW, here's a non-recursive solution:
public String pairStar(String str) {
int len = str.length();
StringBuilder sb = new StringBuilder(len*2);
char last = '\0';
for (int i=0; i < len; ++i) {
char c = str.charAt(i);
if (c == last) sb.append('*');
sb.append(c);
last = c;
}
return sb.toString();
}

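I don't know Java, but I believe there is a replace function for strings in Java that works with regular expressions. Your match string would be
([a-z])\\1
and the replace string would be
$1*$1
After some searching, I think you are looking for this (two passes are needed because each match consumes both characters of a pair, so runs of three or more identical letters are only fully separated by the second pass):
str.replaceAll("([a-z])\\1", "$1*$1").replaceAll("([a-z])\\1", "$1*$1");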

These are my own solutions.
Recursive solution (which is probably more or less the solution that the problem is designed for)
public String pairStar(String str) {
if (str.length() <= 1) return str;
else return str.charAt(0) +
(str.charAt(0) == str.charAt(1) ? "*" : "") +
pairStar(str.substring(1));
}
If you want to complain about substring, then you can write a helper function pairStar(String str, int index) which does the actual recursion work.
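A sketch of that helper idea, using a hypothetical two-argument overload pairStar(String str, int index): it walks the string by index instead of creating substrings, comparing each character only with the one after it.

```java
public class PairStar {
    static String pairStar(String str) {
        return pairStar(str, 0);
    }

    // Hypothetical helper: builds the result for str[index..] without calling substring
    static String pairStar(String str, int index) {
        if (index >= str.length()) return "";
        char c = str.charAt(index);
        // insert '*' only when the next character is identical to this one
        boolean pair = index + 1 < str.length() && c == str.charAt(index + 1);
        return c + (pair ? "*" : "") + pairStar(str, index + 1);
    }

    public static void main(String[] args) {
        System.out.println(pairStar("hello"));
        System.out.println(pairStar("xxyy"));
        System.out.println(pairStar("aaaa"));
    }
}
```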
Regex one-liner one-function-call solution
public String pairStar(String str) {
return str.replaceAll("(.)(?=\\1)", "$1*");
}
Both solutions have the same spirit. They both check whether the current character is the same as the next character. If they are the same, insert a * between the 2 identical characters. Then move on to check the next character. This produces the expected output a*a*a*a from the input aaaa.
The plain regex solution "(.)\\1" has a problem: it consumes 2 characters per match. As a result, it never compares the 2nd character of a match with the character after it. The lookahead resolves this problem: it compares with the next character without consuming it.
This is similar to the recursive solution, where we compare with the next character (str.charAt(0) == str.charAt(1)) while calling the function recursively on the substring with only the current character removed (pairStar(str.substring(1))).
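To see the consuming-versus-lookahead difference on the aaaa example, this small sketch (class and method names are hypothetical) applies both replacements side by side.

```java
public class ConsumeVsLookahead {
    // Each match consumes two chars, so the boundary between the 2nd and 3rd 'a' is never checked
    static String consuming(String s) {
        return s.replaceAll("(.)\\1", "$1*$1");
    }

    // The lookahead only peeks at the next char, so every adjacent pair is checked
    static String lookahead(String s) {
        return s.replaceAll("(.)(?=\\1)", "$1*");
    }

    public static void main(String[] args) {
        System.out.println(consuming("aaaa"));  // a*aa*a
        System.out.println(lookahead("aaaa"));  // a*a*a*a
    }
}
```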

Java - How to test if a String contains both letters and numbers

I need a regex which will satisfy both conditions.
It should give me true only when a String contains both A-Z and 0-9.
Here's what I've tried:
if (PNo[0].matches("^[A-Z0-9]+$"))
It does not work.

I suspect that the regex below is slowed down by the look-around, but it should work regardless:
.matches("^(?=.*[A-Z])(?=.*[0-9])[A-Z0-9]+$")
The regex asserts that there is an uppercase alphabetical character (?=.*[A-Z]) somewhere in the string, and asserts that there is a digit (?=.*[0-9]) somewhere in the string, and then it checks whether everything is either alphabetical character or digit.

It's easier to write and read if you use two separate regular expressions:
String s = "blah-FOO-test-1-2-3";
String numRegex = ".*[0-9].*";
String alphaRegex = ".*[A-Z].*";
if (s.matches(numRegex) && s.matches(alphaRegex)) {
    System.out.println("Valid: " + s);
}
Better yet, write a method:
public boolean isValid(String s) {
String n = ".*[0-9].*";
String a = ".*[A-Z].*";
return s.matches(n) && s.matches(a);
}
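A variant of the same two-check idea, sketched with a hypothetical class name: it compiles each pattern once and uses Matcher.find(), which looks for a match anywhere in the string, so the .* padding is no longer needed. Like the method above, it only checks that both kinds of character are present, so other characters such as - are still accepted.

```java
import java.util.regex.Pattern;

public class LettersAndDigits {
    // Compiled once and reused; find() searches anywhere in the input
    private static final Pattern DIGIT = Pattern.compile("[0-9]");
    private static final Pattern UPPER = Pattern.compile("[A-Z]");

    public static boolean isValid(String s) {
        return DIGIT.matcher(s).find() && UPPER.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("blah-FOO-test-1-2-3"));
        System.out.println(isValid("AXD123"));
        System.out.println(isValid("ABCDEF"));
    }
}
```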

A letter may be either before or after the digit, so this expression should work:
(([A-Z].*[0-9])|([0-9].*[A-Z]))
Here is a code example that uses this expression:
Pattern p = Pattern.compile("(([A-Z].*[0-9])|([0-9].*[A-Z]))");
Matcher m = p.matcher("AXD123");
boolean b = m.find();
System.out.println(b);

Here is the regex for you
Basics:
Match any single character (except a line terminator): .
Match 0 or more of the preceding element: *
Match anything in the current line: .*
Match any character in the set (range) of characters: [start-end]
Match one of the regexes from a group: (regex1|regex2|regex3)
Note that the start and end of a range follow character-code order (the same as ASCII order for these characters), and the start must come before the end. For example, you can write [0-Z], but not [Z-0]. Keep in mind that [0-Z] also includes the symbols that sit between '9' and 'A' in ASCII (:;<=>?@).
Check the string against regex
Simply call yourString.matches(theRegexAsString)
Check if string contains letters:
Check if there is a letter: yourString.matches(".*[a-zA-Z].*")
Check if there is a lowercase letter: yourString.matches(".*[a-z].*")
Check if there is an uppercase letter: yourString.matches(".*[A-Z].*")
Check if string contains numbers:
yourString.matches(".*[0-9].*")
Check if string contains both number and letter:
The simplest way is to match twice with letters and numbers
yourString.matches(".*[a-zA-Z].*") && yourString.matches(".*[0-9].*")
If you prefer to match everything at once, the regex has to match a string that has a letter somewhere followed later by a digit, or the other way around. So your regex will be:
yourString.matches(".*([a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]).*")
Extra regex for your reference:
Check if the string starts with a letter
yourString.matches("[a-zA-Z].*")
Check if the string ends with number
yourString.matches(".*[0-9]")

This should solve your problem:
^(?:[A-Z]+[0-9][A-Z0-9]*|[0-9]+[A-Z][A-Z0-9]*)$
But it's unreadable. I would suggest first checking the input with "^[A-Z0-9]+$", then checking with "[A-Z]" (using find(), not matches()) to ensure it contains at least one letter, then checking with "[0-9]" to ensure it contains at least one digit. This way you can add new restrictions easily and the code remains readable.
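The step-by-step approach could look like this sketch (class and constant names are hypothetical). The whole-string rule uses matches(), while the two "contains" rules use find(), since matches("[A-Z]") would only accept a string that is a single letter. Adding another rule means adding one more pattern and one more condition.

```java
import java.util.regex.Pattern;

public class StepwiseCheck {
    private static final Pattern ALLOWED = Pattern.compile("[A-Z0-9]+");
    private static final Pattern LETTER = Pattern.compile("[A-Z]");
    private static final Pattern DIGIT = Pattern.compile("[0-9]");

    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches()   // only A-Z and 0-9, and not empty
                && LETTER.matcher(s).find()   // at least one letter somewhere
                && DIGIT.matcher(s).find();   // at least one digit somewhere
    }

    public static void main(String[] args) {
        System.out.println(isValid("AXD123"));
        System.out.println(isValid("ABC"));
        System.out.println(isValid("AB-12"));
    }
}
```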

What about ([A-Z].*[0-9]+)|([0-9].*[A-Z]+) ?

Try using (([A-Z]+[0-9])|([0-9]+[A-Z])) with Matcher.find(). It should solve it.

use this method:
private boolean isValid(String str)
{
String Regex_combination_of_letters_and_numbers = "^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$";
String Regex_just_letters = "^(?=.*[a-zA-Z])[a-zA-Z]+$";
String Regex_just_numbers = "^(?=.*[0-9])[0-9]+$";
String Regex_just_specialcharachters = "^(?=.*[##$%^&+=])[##$%^&+=]+$";
String Regex_combination_of_letters_and_specialcharachters = "^(?=.*[a-zA-Z])(?=.*[##$%^&+=])[a-zA-Z##$%^&+=]+$";
String Regex_combination_of_numbers_and_specialcharachters = "^(?=.*[0-9])(?=.*[##$%^&+=])[0-9##$%^&+=]+$";
String Regex_combination_of_letters_and_numbers_and_specialcharachters = "^(?=.*[a-zA-Z])(?=.*[0-9])(?=.*[##$%^&+=])[a-zA-Z0-9##$%^&+=]+$";
if(str.matches(Regex_combination_of_letters_and_numbers))
return true;
if(str.matches(Regex_just_letters))
return true;
if(str.matches(Regex_just_numbers))
return true;
if(str.matches(Regex_just_specialcharachters))
return true;
if(str.matches(Regex_combination_of_letters_and_specialcharachters))
return true;
if(str.matches(Regex_combination_of_numbers_and_specialcharachters))
return true;
if(str.matches(Regex_combination_of_letters_and_numbers_and_specialcharachters))
return true;
return false;
}
You can delete some conditions according to your taste

codingbat wordEnds using regex

I'm trying to solve wordEnds from codingbat.com using regex.
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
wordEnds("XYXY", "XY") → "XY"
This is as simple as I can make it with my current knowledge of regex:
public String wordEnds(String str, String word) {
return str.replaceAll(
".*?(?=word)(?<=(.|^))word(?=(.|$))|.+"
.replace("word", java.util.regex.Pattern.quote(word)),
"$1$2"
);
}
replace is used to place in the actual word string into the pattern for readability. Pattern.quote isn't necessary to pass their tests, but I think it's required for a proper regex-based solution.
The regex has two major parts:
If, after matching as few characters as possible ".*?", word can still be found "(?=word)", then look behind to capture any character immediately preceding it "(?<=(.|^))", match "word", and look ahead to capture any character following it "(?=(.|$))".
The initial "if" test ensures that the atomic lookbehind captures only if there's a word
Using lookahead to capture the following character doesn't consume it, so it can be used as part of further matching
Otherwise match what's left "|.+"
Groups 1 and 2 would capture empty strings
I think this works in all cases, but it's obviously quite complex. I'm just wondering if others can suggest a simpler regex to do this.
Note: I'm not looking for a solution using indexOf and a loop. I want a regex-based replaceAll solution. I also need a working regex that passes all codingbat tests.
I managed to reduce the occurrence of word within the pattern to just one.
".+?(?<=(^|.)word)(?=(.?))|.+"
I'm still looking if it's possible to simplify this further, but I also have another question:
With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?

Based on your solution I managed to simplify the code a little bit:
public String wordEnds(String str, String word) {
return str.replaceAll(".*?(?="+word+")(?<=(.|^))"+word+"(?=(.|$))|.+","$1$2");
}
Another way of writing it would be:
public String wordEnds(String str, String word) {
    // %1$s is replaced by word in both places
    return str.replaceAll(
            String.format(".*?(?=%1$s)(?<=(.|^))%1$s(?=(.|$))|.+", word),
            "$1$2");
}

With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?
In Oracle's implementation, the behavior of look-behind is as follows:
By "studying" the regex (with the study() method in each node), it knows the maximum and minimum length of the pattern in the look-behind group. (study() is what makes look-behind with an obvious maximum length possible.)
It verifies the look-behind by starting a match at every position from index (current - min_length) down to index (current - max_length), and exits early as soon as the condition is satisfied.
Effectively, it will try to verify the look-behind on the shortest string first.
The implementation multiplies the matching complexity by a factor of O(k), where k = max_length - min_length + 1 is the number of starting positions tried.
This explains why changing ^|. to .? doesn't work: due to the starting position, it effectively checks for word before .word. The quantifier doesn't have a say here, since the ordering is imposed by the match range.
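A small Java harness to observe this on the first codingbat example, assuming the word XY is written directly into both patterns (class and method names are hypothetical). With (^|.), the shortest attempt only succeeds when the word is at the very start of the string, so otherwise the engine moves one position further back and captures the preceding character. With (.?), the shortest attempt already succeeds with .? matching nothing, so group 1 stays empty and the preceding characters are lost.

```java
public class LookbehindOrder {
    // The asker's working pattern, with the word "XY" inlined
    static String withAnchorOrChar(String s) {
        return s.replaceAll(".+?(?<=(^|.)XY)(?=(.?))|.+", "$1$2");
    }

    // The attempted simplification: ^|. replaced by .?
    static String withOptionalChar(String s) {
        return s.replaceAll(".+?(?<=(.?)XY)(?=(.?))|.+", "$1$2");
    }

    public static void main(String[] args) {
        String s = "abcXY123XYijk";
        System.out.println(withAnchorOrChar(s));
        System.out.println(withOptionalChar(s));
    }
}
```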
You can check the code of match method in Pattern.Behind and Pattern.NotBehind inner classes to verify what I said above.
In .NET's flavor, look-behind is likely implemented by the reverse matching feature, which means that no extra factor is incurred on the matching complexity.
My suspicion comes from the fact that the capturing group in (?<=(a+))b matches all a's in aaaaaaaaaaaaaab. The quantifier is shown to have free rein in the look-behind group.
I have tested that ^|. can be simplified to .? in .NET and the regex works correctly.

I am working with .NET's regex, but I was able to change your pattern to:
.+?(?<=(\w?)word)(?=(\w?))|.+
with positive results. You know it's a word (alphanumeric) character, so why not give the parser a valid hint about that: instead of any character, it's an optional alphanumeric character.
It may also explain why you don't need to specify the ^ and $ anchors: what exactly is $? Is it \r, \n, or something else? (.NET has issues with $; maybe you are not actually capturing the empty match at $ but the \r or \n, which allowed you to change it to .? for $.)

Another solution to look at...
public String wordEnds(String str, String word) {
if(str.equals(word)) return "";
int i = 0;
String result = "";
int stringLen = str.length();
int wordLen = word.length();
int diffLen = stringLen - wordLen;
while(i<=diffLen){
if(i==0 && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i+wordLen);
}else if(i==diffLen && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1);
}else if(str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1) + str.charAt(i+wordLen) ;
}
i++;
}
if(result.length()==1) result = result + result;
return result;
}

Another possible solution:
public String wordEnds(String str, String word) {
String result = "";
if (str.contains(word)) {
for (int i = 0; i < str.length(); i++) {
if (str.startsWith(word, i)) {
if (i > 0) {
result += str.charAt(i - 1);
}
if ((i + word.length()) < str.length()) {
result += str.charAt(i + word.length());
}
}
}
}
return result;
}
