I am trying to write a regular expression for a string. Say the string is RBY_YBR, where _ represents an empty slot. A letter can repeatedly be moved into an empty slot, so the string can be rearranged into RRBBYY_. Two or more groups of identical letters may be formed, or a single group such as RRR.
Conditions
1). Every letter must have the same letter on its left or on its right.
2). If there is no _, the letters must already be grouped, like RRBBYY, not RBRBYY or RBYRBY etc.
3). There can be more than one underscore _ .
With the regular expression I want to check whether a given string can be turned into a pattern of consecutive identical letters by swapping letters with _.
The regular expression which I wrote is
String regEx = "[A-ZA-Z_]";
But this regular expression fails for RBRB, since there is no empty slot to move the letters into and RBRB is not already grouped.
How can I write a regular expression that solves this?
Ok, as I understand it, a matching string shall either consist only of same characters being grouped together, or must contain at least one underscore.
So, RRRBBR would be invalid, while RRRRBB, RRRBBR_, and RRRBB_R_ would all be valid.
After comment of question creator, additional condition: Every character must occur 0 or 2 or more times.
As far as I know, this is not possible with Regular Expressions, as Regular Expressions are finite-state machines without "storage". You would have to "store" each character found in the string to check that it won't appear later again.
I would suggest a very simple method for verifying such strings:
public static boolean matchesMyPattern(String s) {
    boolean withUnderscore = s.contains("_");
    int[] found = new int[26];
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if (ch != '_' && (ch < 'A' || ch > 'Z')) {
            return false;
        }
        if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
                && !withUnderscore) {
            return false;
        }
        if (ch != '_') {
            found[ch - 'A']++;
        }
    }
    for (int i = 0; i < found.length; i++) {
        if (found[i] == 1) {
            return false;
        }
    }
    return true;
}
Please take my answer with a grain of salt, since it's a bit of a "Fastest gun in the West" post.
It follows the same assumptions as Florian Albrecht's answer. (thanks)
I believe that this will solve your problem:
(([A-Za-z])(\2|_)+)+
https://regex101.com/r/7TfSVc/1
It works by using the second capturing group and ensuring that more of it follow, or there are underscores.
Known bug: it does not work if an underscore starts a string.
EDIT
This one is better, though I forgot what I was doing by the end of it.
(([A-Za-z_])(\2|_)+|_+[A-Za-z]_*)+
https://regex101.com/r/7TfSVc/4
I'm very new to Java so please bear with me.
My assignment:
Ask the user to input a password and write a message stating whether or not it is acceptable. The password requirements:
the password is at least 8 characters long
it has upper case and lower case letters
at least one letter is followed by a number
it has one of the special characters $#?!_-=%
I really don't know what to do for requirements 3 and 4. I've read something about regex, but we haven't covered that in class. Are there any other possible methods?
For number 3 you can use a loop. Inside it, check each character with Character.isLetter() and then check the following element of your char array with Character.isDigit(). Stop one element before the end so that array[i + 1] stays in bounds. (Here array is assumed to be your password as a char[], for example from password.toCharArray().)
boolean isLetterFollowedByNumber = false;
for (int i = 0; i < array.length - 1; i++) {
    if (Character.isLetter(array[i]) && Character.isDigit(array[i + 1])) {
        isLetterFollowedByNumber = true;
    }
}
For number 4 you can just compare every element of your char array with the special characters
boolean hasCharacter = false;
for (char a : array) {
    if (a == '$' || a == '#' || a == '?' || a == '!' || a == '_' || a == '-' || a == '=' || a == '%') {
        hasCharacter = true;
    }
}
The second example uses a for-each loop, but you can use an indexed for loop there as well. Good luck with your task!
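Putting the four requirements together, a minimal sketch could look like the following. The class name PasswordCheck, the method name isAcceptable, and the SPECIALS constant are made up for illustration; the assignment does not prescribe them, and reading the password from the user is left out.

```java
class PasswordCheck {
    // The special characters listed in the assignment.
    static final String SPECIALS = "$#?!_-=%";

    static boolean isAcceptable(String password) {
        char[] array = password.toCharArray();
        boolean hasUpper = false;
        boolean hasLower = false;
        boolean letterFollowedByDigit = false;
        boolean hasSpecial = false;

        for (int i = 0; i < array.length; i++) {
            char c = array[i];
            if (Character.isUpperCase(c)) hasUpper = true;
            if (Character.isLowerCase(c)) hasLower = true;
            if (SPECIALS.indexOf(c) >= 0) hasSpecial = true;
            // Requirement 3: look one element ahead, staying inside the array.
            if (Character.isLetter(c) && i + 1 < array.length
                    && Character.isDigit(array[i + 1])) {
                letterFollowedByDigit = true;
            }
        }
        return array.length >= 8 && hasUpper && hasLower
                && letterFollowedByDigit && hasSpecial;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Passw0rd!"));  // all four requirements met
        System.out.println(isAcceptable("Password!1")); // no letter directly before a digit
    }
}
```

Keeping one boolean flag per requirement makes it easy to print a specific message for whichever requirement failed.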
I want to create a regex in Java that matches a string with at least one letter and at least one non-letter (anything except A-Za-z) and no whitespace.
The regex below works only partially:
^([A-Za-z]{1,}[^A-Za-z]{1,})+$
It matches aaaa7777
but doesn't match 777aaaaa.
Any Help would be appreciated.
Your regex implicitly assumes the order of the characters you want to match. The regex is saying that a letter must come before a non-letter. However, you want the letter and the non-letter to come in either order, so you need to account for both cases. Also note that it should be [^\sa-zA-Z] instead of [^a-zA-Z], as you don't allow spaces.
(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])
At the start and end, any non-space character is allowed, so:
^\S*(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])\S*$
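To try the anchored pattern above from Java, a small harness along these lines may help. The class name LetterAndNonLetter and the method name isValid are placeholders chosen for this sketch; the only change to the regex itself is doubling the backslashes for the Java string literal.

```java
import java.util.regex.Pattern;

class LetterAndNonLetter {
    // Same pattern as above, with backslashes escaped for a Java string.
    private static final Pattern PATTERN = Pattern.compile(
            "^\\S*(?:[a-zA-Z][^\\sa-zA-Z]|[^\\sa-zA-Z][a-zA-Z])\\S*$");

    static boolean isValid(String s) {
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"aaaa7777", "777aaaaa", "aaaa", "7777", "a 7"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```

The pattern works because any whitespace-free string containing both a letter and a non-letter must have the two kinds adjacent somewhere.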
You may use
s.matches("(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*")
This is how the pattern works.
Details
The pattern will match a whole string since ^ and \z anchors are implicit in matches
(?=\P{Alpha}*\p{Alpha}) - a lookahead that requires at least one ASCII letter after any 0+ chars other than an ASCII letter
(?=\p{Alpha}*\P{Alpha}) - a lookahead that requires a char other than an ASCII letter after 0 or more ASCII letters
\S* - zero or more non-whitespace chars.
To make the regex Unicode aware replace \p{Alpha} with \p{L} and \P{Alpha} with \P{L}.
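As a rough illustration of the difference between the ASCII and the Unicode-aware variants, the sketch below runs both against a string with an accented letter. The class and method names are invented for the example.

```java
class UnicodeLetterCheck {
    // \p{Alpha} only covers ASCII letters by default in java.util.regex.
    static boolean asciiCheck(String s) {
        return s.matches("(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*");
    }

    // \p{L} covers any Unicode letter.
    static boolean unicodeCheck(String s) {
        return s.matches("(?=\\P{L}*\\p{L})(?=\\p{L}*\\P{L})\\S*");
    }

    public static void main(String[] args) {
        String accented = "\u00e97"; // e with acute accent, followed by the digit 7
        System.out.println("ASCII variant:   " + asciiCheck(accented));
        System.out.println("Unicode variant: " + unicodeCheck(accented));
    }
}
```

With the ASCII variant the accented character counts as a non-letter, so a string made only of that character and a digit contains no letter at all as far as the pattern is concerned.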
Regular expressions aren't the right tool for this type of validation. Just write out the plain logic. For your specific example:
public class Main {
    public static void main(String[] args) {
        System.out.println("'foo' ? " + doesMatch("foo"));
        System.out.println("'bar7' ? " + doesMatch("bar7"));
        System.out.println("'55baz' ? " + doesMatch("55baz"));
    }

    public static boolean doesMatch(String input) {
        boolean hasAlpha = false;
        boolean hasNonAlpha = false;
        for (char ch : input.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                // The question forbids whitespace anywhere in the input.
                return false;
            }
            if (ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z') {
                hasAlpha = true;
            } else {
                hasNonAlpha = true;
            }
        }
        return hasAlpha && hasNonAlpha;
    }
}
Anyone can understand what inputs do match and which inputs don't. If you use regular expressions this wouldn't be so simple.
I'm looking for patterns like "tip" and "top" in the string -- length-3, starting with 't' and ending with 'p'. The goal is to return a string where for all such words, the middle letter is gone. So for example, "tipXtap" yields "tpXtp".
So far, I've thought about using recursion, and the replace() method, but am not sure if that is the best way to approach this problem.
Here is my code thus far:
String result = "";
if(str.length() < 3)
    return str;
for(int i = 0; i <= str.length() - 2; i++){
    if(str.charAt(i) == 't' && str.charAt(i + 2) == 'p'){
        str.replaceAll(str.substring(i + 1, i + 2), "");
    }
    return str;
}
return str;
Use this Java code:
String str = "tipXtap";
str = str.replaceAll("t.p", "tp");
This uses regular expressions and the String.replaceAll function. The . (dot) character is a regex metacharacter that matches any single character.
One way of doing this.
Convert the String to a char array.
Inside a loop over the char array, use an if condition to check whether the current char is 't' and the char two positions later is 'p'.
If the condition is true, remove the middle element. You will have to shift the remaining elements of the char array to close the gap.
Convert the char array to a String and return it.
Hope this helps.
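The char-array steps above could be sketched roughly as follows. Instead of shifting elements inside the array, this version copies the characters it keeps into a StringBuilder, which is a simplification of the step that removes the middle element. The names MiddleLetterRemover and removeMiddle are made up for the example.

```java
class MiddleLetterRemover {
    static String removeMiddle(String str) {
        char[] chars = str.toCharArray();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < chars.length) {
            // A length-3 window starting with 't' and ending with 'p'.
            if (i + 2 < chars.length && chars[i] == 't' && chars[i + 2] == 'p') {
                out.append("tp"); // keep the ends, drop the middle letter
                i += 3;           // continue after the matched window
            } else {
                out.append(chars[i]);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeMiddle("tipXtap")); // prints "tpXtp"
    }
}
```

Skipping three positions after a match means windows never overlap, which mirrors how a left-to-right regex replacement consumes its matches.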
Here's a JavaScript solution to this problem using regular expressions:
foo = 'tipXtop'
foo.replace(/t\wp/g, 'tp')
The \w regex operator matches a word character like a-z, A-Z, 0-9 or _.
The g regex flag will match all instances of the regex in the string.
I know this is yet another question about regexes, but although I searched, I couldn't find a clear answer. So here is my problem: I have a string like this:
{1,2,{3,{4},5},{5,6}}
I'm removing the outermost braces (they come from the input, and I don't need them), so now I have this:
1,2,{3,{4},5},{5,6}
And now I need to split this string into an array of elements, treating everything inside a pair of braces as one "seamless" element:
Arr[0] 1
Arr[1] 2
Arr[2] {3,{4},5}
Arr[3] {5,6}
I have tried doing it using lookahead but so far, I'm failing (miserably). What would be the neatest way of dealing with those things in terms of regex?
You cannot do this if elements like this should be kept together: {{1},{2}}. The reason is that a regex for this is equivalent to parsing the balanced parenthesis language. This language is context-free and cannot be parsed using a regular expression. The best way to handle this is not to use regex but use a for loop with a stack (the stack gives power to parse context-free languages). In pseudo code we could do:
last = 0
for i, char in input
    if stack is empty and char is ','
        add substring(last, i) to output array
        last = i + 1
    if char is '{'
        push '{' on stack
    if char is '}'
        pop from stack
add substring(last, end of input) to output array
This pseudo code will construct the array as desired, note that it's best to loop over the indexes of the chars in the given string as you'll need those to determine the boundaries of the substrings to add to the array.
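One possible Java rendering of the pseudo code is sketched below. Because only one kind of bracket is involved, an integer depth counter stands in for the stack; that substitution, along with the names BraceSplitter and splitTopLevel, is a choice made for this example. The input is assumed to be well formed, with balanced braces.

```java
import java.util.ArrayList;
import java.util.List;

class BraceSplitter {
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // number of currently open '{', plays the role of the stack
        int last = 0;  // start index of the element being collected
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // A comma outside all braces ends the current element.
                parts.add(input.substring(last, i));
                last = i + 1;
            }
        }
        parts.add(input.substring(last)); // the final element has no trailing comma
        return parts;
    }

    public static void main(String[] args) {
        List<String> arr = splitTopLevel("1,2,{3,{4},5},{5,6}");
        for (int i = 0; i < arr.size(); i++) {
            System.out.println("Arr[" + i + "] " + arr.get(i));
        }
    }
}
```

A real stack would only be needed if several bracket types had to be matched against each other.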
This is close to the requirement but not complete, and I am running out of time. I will finish the rest later (one comma is still handled incorrectly).
Regex: ,(?=[^}]*(?:{|$))
To check regex validity: Go to http://regexr.com/
To implement this pattern in Java, there is a slight difference. \ needs to be added before { and }.
Hence, regex for Java Input: ,(?=[^\\}]*(?:\\{|$))
String numbers = "{1,2,{3,{4},5},{5,6}}";
numbers = numbers.substring(1, numbers.length()-1);
String[] separatedValues = numbers.split(",(?=[^\\}]*(?:\\{|$))");
System.out.println(separatedValues[0]);
I could not figure out a regex solution, but here's a non-regex solution. It parses each number (not in curly braces) up to the next comma (unless it's the last number in the string) and parses each group (in curly braces) until the closing curly brace of the group is found.
If regex solution is found, I'd love to see it.
// Requires: import java.util.ArrayList; import java.util.List;
public static void main(String[] args) throws Exception {
    String data = "1,2,{3,{4},5},{5,6},-7,{7,8},{8,{9},10},11";
    List<String> list = new ArrayList<>();
    for (int i = 0; i < data.length(); i++) {
        if ((Character.isDigit(data.charAt(i))) ||
                // Include negative numbers
                (data.charAt(i) == '-') && (i + 1 < data.length() && Character.isDigit(data.charAt(i + 1)))) {
            // Get the number before the comma, unless it's the last number
            int commaIndex = data.indexOf(",", i);
            String number = commaIndex > -1
                    ? data.substring(i, commaIndex)
                    : data.substring(i);
            list.add(number);
            // Land on the comma; the loop's i++ then steps past it
            i += number.length();
        } else if (data.charAt(i) == '{') {
            // Get the group of numbers until you reach the final
            // closing curly brace
            StringBuilder sb = new StringBuilder();
            int openCount = 0;
            int closeCount = 0;
            do {
                if (data.charAt(i) == '{') {
                    openCount++;
                } else if (data.charAt(i) == '}') {
                    closeCount++;
                }
                sb.append(data.charAt(i));
                i++;
            } while (closeCount < openCount);
            list.add(sb.toString());
        }
    }
    for (int i = 0; i < list.size(); i++) {
        System.out.printf("Arr[%d]: %s\r\n", i, list.get(i));
    }
}
Results:
Arr[0]: 1
Arr[1]: 2
Arr[2]: {3,{4},5}
Arr[3]: {5,6}
Arr[4]: -7
Arr[5]: {7,8}
Arr[6]: {8,{9},10}
Arr[7]: 11
I recently had an interview with Google for a Software Engineering position and the question asked regarded building a pattern matcher.
So you have to build the
boolean isPattern(String givenPattern, String stringToMatch)
Function that does the following:
givenPattern is a string that contains:
a) 'a'-'z' chars
b) '*' chars which can be matched by 0 or more letters
c) '?' which just matches to a character - any letter basically
So the call could be something like
isPattern("abc", "abcd") - returns false as it does not match the pattern ('d' is extra)
isPattern("a*bc", "aksakwjahwhajahbcdbc"), which is true as we have an 'a' at the start, many characters after and then it ends with "bc"
isPattern("a?bc", "adbc") returns true as each character of the pattern matches in the given string.
During the interview, time being short, I figured one could walk through the pattern, see if a character is a letter, a * or a ? and then match the characters in the given string respectively. But that ended up being a complicated set of for-loops and we didn't manage to come to a conclusion within the given 45 minutes.
Could someone please tell me how they would solve this problem quickly and efficiently?
Many thanks!
Assuming you are allowed to use regexes, you could have written something like:
// Requires: import java.util.regex.Pattern;
static boolean isPattern(String givenPattern, String stringToMatch) {
    String regex = "^" + givenPattern.replace("*", ".*").replace("?", ".") + "$";
    return Pattern.compile(regex).matcher(stringToMatch).matches();
}
"^" is the start of the string
"$" is the end of the string
. is for "any character", exactly once
.* is for "any character", 0 or more times
Note: If you want to restrict * and ? to letters only, you can use [a-zA-Z] instead of the dot.
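The letters-only variant mentioned in the note might look like the sketch below. It assumes, as the question states, that the pattern contains only letters, * and ?, so no other regex metacharacters need escaping. The class name LetterOnlyWildcard is invented for the example.

```java
import java.util.regex.Pattern;

class LetterOnlyWildcard {
    static boolean isPattern(String givenPattern, String stringToMatch) {
        // '*' becomes "zero or more letters", '?' becomes "exactly one letter".
        String regex = givenPattern
                .replace("*", "[a-zA-Z]*")
                .replace("?", "[a-zA-Z]");
        // Pattern.matches requires the whole input to match, so no anchors are needed.
        return Pattern.matches(regex, stringToMatch);
    }

    public static void main(String[] args) {
        System.out.println(isPattern("abc", "abcd"));
        System.out.println(isPattern("a*bc", "aksakwjahwhajahbcdbc"));
        System.out.println(isPattern("a?bc", "adbc"));
        System.out.println(isPattern("a?bc", "a1bc"));
    }
}
```

The order of the two replace calls matters only in that the first one must not introduce a ?, which it does not here.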
boolean isPattern(String givenPattern, String stringToMatch) {
    if (givenPattern.isEmpty())
        return stringToMatch.isEmpty();
    char patternCh = givenPattern.charAt(0);
    boolean atEnd = stringToMatch.isEmpty();
    if (patternCh == '*') {
        // '*' either matches nothing, or consumes one character and stays in the pattern
        return isPattern(givenPattern.substring(1), stringToMatch)
                || (!atEnd && isPattern(givenPattern, stringToMatch.substring(1)));
    } else if (patternCh == '?') {
        return !atEnd && isPattern(givenPattern.substring(1),
                stringToMatch.substring(1));
    }
    return !atEnd && patternCh == stringToMatch.charAt(0)
            && isPattern(givenPattern.substring(1), stringToMatch.substring(1));
}
(Recursion being easiest to understand.)
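Since the question also asks about efficiency, here is a table-based alternative to the recursive approach above. This is a different technique (bottom-up dynamic programming), not something the answers above propose; it is a sketch that fills a table where dp[i][j] records whether the first i pattern characters match the first j input characters. The class name WildcardDp is made up.

```java
class WildcardDp {
    static boolean isPattern(String givenPattern, String stringToMatch) {
        int m = givenPattern.length();
        int n = stringToMatch.length();
        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true; // empty pattern matches empty string

        for (int i = 1; i <= m; i++) {
            char pc = givenPattern.charAt(i - 1);
            if (pc == '*') {
                dp[i][0] = dp[i - 1][0]; // '*' may match the empty string
            }
            for (int j = 1; j <= n; j++) {
                if (pc == '*') {
                    // '*' matches nothing (drop it) or one more character (keep it).
                    dp[i][j] = dp[i - 1][j] || dp[i][j - 1];
                } else if (pc == '?' || pc == stringToMatch.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                }
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(isPattern("abc", "abcd"));
        System.out.println(isPattern("a*bc", "aksakwjahwhajahbcdbc"));
        System.out.println(isPattern("a?bc", "adbc"));
    }
}
```

The table has (m + 1) by (n + 1) cells and each is filled in constant time, so the running time is proportional to the product of the two lengths, whereas plain recursion can revisit the same suffix pairs many times when the pattern contains several * characters.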