I'm trying to write a Regex expression that can determine if a string contains an odd number of " - quotation marks.
An answerer on this question has accomplished something very similar for determining if a string of letters contains an odd number of a certain letter. However I am having trouble adapting it to my problem.
What I have so far, but is not exactly working:
String regexp = "(\\b[^\"]*\"(([^\"]*\"){2})*[^\"]*\\b)";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher("bbacac");
if(matcher.find()){
System.out.println("Found");
}
else
System.out.println("Not Found");
Regex is a fairly poor solution for this. <-- I thought you were talking about nesting, not pair matching.
Iterating over all characters in the string, counting instances of " would be a faster and more efficient way to achieve this.
int quoteCount = 0;
for(char ch : inputString.toCharArray())
{
if(ch == '"') quoteCount++;
}
boolean even = quoteCount % 2 == 0;
If you want a regex, this is simple to accomplish:
boolean oddQuotes = subjectString.matches("[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*");
Explanation: (without all the Java quote escapes):
[^"]*" # Match any number of non-quote characters, then a quote
(?: # Now match an even number of quotes by matching:
[^"]*" # any number of non-quote characters, then a quote
[^"]*" # twice
)* # and repeat any number of times.
[^"]* # Finally, match any remaining non-quote characters
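To see the pattern above in action, here is a minimal sketch that wraps it in a helper and tries a few inputs. The class and method names (OddQuoteRegexDemo, hasOddQuotes) are made up for illustration; only the regex comes from the answer.

```java
import java.util.regex.Pattern;

class OddQuoteRegexDemo {
    // Same pattern as in the answer, compiled once so it can be reused.
    private static final Pattern ODD_QUOTES =
            Pattern.compile("[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*");

    // Hypothetical helper: true if s contains an odd number of double quotes.
    static boolean hasOddQuotes(String s) {
        // matches() requires the whole string to fit the pattern.
        return ODD_QUOTES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasOddQuotes("say \"hi"));    // one quote
        System.out.println(hasOddQuotes("say \"hi\""));  // two quotes
        System.out.println(hasOddQuotes("no quotes"));   // zero quotes
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.matches would do.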
So far, this is probably slower than a simple "count the quotes" solution. But we can do one better: we can design the regex to also handle escaped quotes, i.e. not to count a quote if it's preceded by an odd number of backslashes:
boolean oddQuotes = subjectString.matches("(?:\\\\.|[^\\\\\"])*\"(?:(?:\\\\.|[^\\\\\"])*\"(?:\\\\.|[^\\\\\"])*\")*(?:\\\\.|[^\\\\\"])*");
Now admittedly, this looks horrible, but mainly because of Java's string escaping rules. The actual regex is straightforward:
(?: # Match either
\\. # an escaped character
| # or
[^\\"] # a character except backslash or quote
)* # any number of times.
" # Then match a quote.
(?: # The rest of the regex works just the same way (as above)
(?:\\.|[^\\"])*"
(?:\\.|[^\\"])*"
)*
(?:\\.|[^\\"])*
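The Java string is easier to read if the repeated sub-pattern is pulled into a constant. The following is only a sketch of that idea; the names RUN, EscapedQuoteDemo and hasOddUnescapedQuotes are invented here, and the resulting regex is the same one as above.

```java
import java.util.regex.Pattern;

class EscapedQuoteDemo {
    // A run of escaped characters or characters that are neither backslash
    // nor quote. Without the Java escapes this is (?:\\.|[^\\"])*
    private static final String RUN = "(?:\\\\.|[^\\\\\"])*";

    // RUN " ( RUN " RUN " )* RUN  -> an odd number of unescaped quotes
    private static final Pattern ODD_UNESCAPED_QUOTES =
            Pattern.compile(RUN + "\"(?:" + RUN + "\"" + RUN + "\")*" + RUN);

    // Hypothetical helper: true if s has an odd number of unescaped quotes.
    static boolean hasOddUnescapedQuotes(String s) {
        return ODD_UNESCAPED_QUOTES.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The string a \" b contains only an escaped quote, so it counts as zero.
        System.out.println(hasOddUnescapedQuotes("a \\\" b"));
        // The string a \" b " has one escaped and one real quote.
        System.out.println(hasOddUnescapedQuotes("a \\\" b \""));
    }
}
```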
Don't use regex for this. Just iterate through the characters in the string and count the " characters. It's going to be a lot more efficient: it's an O(n) algorithm.
Especially since it's simple and makes the solution a lot easier to read than some obscure regex pattern.
boolean odd = false;
for(int i=0; i<s.length(); i++) {
// flip the flag on every quote; it ends up true for an odd count
if(s.charAt(i) == '\"') odd = !odd;
}
Or, use a regex, replace everything except for quotation marks with empty strings, and check the length of the result.
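A minimal sketch of that replace-and-measure idea, assuming only double quotes matter (the class and method names are made up for illustration):

```java
class QuoteStripDemo {
    // Hypothetical helper: true if s contains an odd number of double quotes.
    static boolean hasOddQuotes(String s) {
        // Delete every character that is not a double quote,
        // so the remaining string consists of quotes only.
        String quotesOnly = s.replaceAll("[^\"]", "");
        return quotesOnly.length() % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasOddQuotes("a\"b"));
        System.out.println(hasOddQuotes("\"a\""));
    }
}
```

This builds a temporary string, so the plain loop is still the cheaper option for long inputs.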
You can use split and check whether the number of elements in the returned array is even or odd to determine whether that character occurs an odd or even number of times. Pass a negative limit so that trailing empty strings are kept; n quotes then always produce n + 1 parts.
String s = ".. what ever is in your string";
String[] parts = s.split("\"", -1);
if(parts.length % 2 == 0){
//String has odd number of quotes
}else{
//String has even number of quotes
}
I would have to say it is probably better to just count the number of "s manually, but if you really want a regular expression, here is one that should work:
"(^(([^\"]*\"){2})*[^\"]*$)"
I just anchored the expression to the start and end of the string and made sure there are only pairs of "s, blindly absorbing anything that is not a " between them. Note that this pattern matches strings with an even number of quotes, so a string that fails to match has an odd number.
Related
I'm trying to write code to count the number of letters, characters, spaces and symbols in a String. But I don't know how to count symbols.
Is there any such function available in java?
That very much depends on your definition of the term symbol.
A straightforward solution could be something like
Set<Character> SYMBOLS = Set.of('#', ' ', ....
for (int i = 0; i < someString.length(); i++) {
if (SYMBOLS.contains(someString.charAt(i))) {
That iterates over the chars of someString and checks, for each char, whether it can be found within the predefined SYMBOLS set.
Alternatively, you could use a regular expression to define "symbols", or, you can rely on a variety of existing definitions. When you check the regex Pattern language for java, you can find
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
for example. And various other shortcuts that denote this or that set of characters already.
Please post what you have tried so far
If you need the count of each individual character, you had better iterate over the string and use a map to track each character with its count.
Or
you can use a regex if just the overall count is enough, like below:
while (matcher.find()) { count++; }
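Both suggestions can be sketched roughly as follows. The definition of "symbol" used here (anything that is not a letter, digit or whitespace) is an assumption, as are the class and method names.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolCountDemo {
    // Map-based variant: one counter per distinct symbol.
    // Assumption: a "symbol" is neither a letter, a digit nor whitespace.
    static Map<Character, Integer> countEach(String s) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char ch : s.toCharArray()) {
            if (!Character.isLetterOrDigit(ch) && !Character.isWhitespace(ch)) {
                counts.merge(ch, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Regex variant: only the overall number of symbols.
    // This class is ASCII-only, unlike Character.isLetterOrDigit above.
    static int countAll(String s) {
        Matcher matcher = Pattern.compile("[^A-Za-z0-9\\s]").matcher(s);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEach("a!b!! #"));
        System.out.println(countAll("a!b!! #"));
    }
}
```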
One way of doing it would be to just iterate over the String and compare each character to the ASCII value you are looking for
String str = "abcd!##";
for(int i=0;i<str.length();i++)
{
if(33==str.charAt(i))
System.out.println("Found !");
}
Look up the ASCII values here: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
I am attempting to only accept regular words with or without a hyphen in the middle of the word or an apostrophe in either the middle or at the end of the word. So any numeric string or one with any other special characters would be replaced with white-space. Also preceding white-space would not need to be accounted for as these strings would be read in from a file and already separated using white-space.
I.e. "0", "-hi", "hi-", and "'hello" would all be rejected.
However, "apple", "Ben's", "Ben'", and "well-respected" would be accepted.
I am trying to figure this out with Java's String replaceAll functionality. I'd like to know how to do this with a "simple" regular expression and also how to utilize a more advanced lookbehind/lookahead to achieve this.
So far, in regard to the RegEx, this is what I have attempted:
String tempString;
tempString = tempString.replaceAll("^([a-zA-Z]+(-)?[a-zA-Z]+)"," ");
tempString = tempString.replaceAll("^([a-zA-Z]+(')?[a-zA-Z]*)"," ");
//Basically if it does not meet this condition, replace w/ whitespace
As of right now, the syntax of the regular expressions are not even correct. Testing either of these two against the associated "non-accepted" words above will not replace them with " ". On top of this, I need to combine these two RegEx's into one to allow for proper overall functionality.
On a similar note, and as I understand, I can utilize a lookahead/lookbehind to achieve the desired result. However, after reading up on the process, I am confused as to the syntax that would be inserted into the replaceAll function.
So, my two questions are as follows:
What can I change in the RegEx's syntax to check for both hyphens and apostrophes in one replaceAll function call?
How can I utilize a lookahead/lookbehind to achieve the same goal?
Please note I am NOT looking for other solutions as I am trying to better understand RegEx's. Also this is my first question on here so apologies for any formatting issues or other dumb things.
Thanks!
This regex should work. But we must first split the input file into words and then apply the regex to each word, because String.replaceAll searches for matching sequences anywhere in the input. E.g. for '-apple', replaceAll will simply skip the '-' and then match 'apple'.
The pattern is: legal chars + ( ( - or ' ) and legal chars ) + ( ( - or ' ) and legal chars ) + ...
@Test
public void test() {
Pattern pattern = Pattern.compile("([\\w]*[a-zA-Z][\\w]*)([-'][\\w]*[a-zA-Z][\\w]*)*");
Matcher m = pattern.matcher("0");
Assert.assertFalse( m.matches());
m = pattern.matcher("apple");
Assert.assertTrue( m.matches());
m = pattern.matcher("apple-");
Assert.assertFalse( m.matches());
m = pattern.matcher("-apple");
Assert.assertFalse( m.matches());
m = pattern.matcher("apple-a0");
Assert.assertTrue( m.matches());
m = pattern.matcher("Tom-Jerry's");
Assert.assertTrue( m.matches());
}
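The split-then-match step described above could look roughly like this. It is only a sketch: the pattern is reused unchanged from the test, while the class and method names and the choice to put a single space in place of each rejected word are assumptions. Because the pattern requires a letter after every hyphen or apostrophe, a trailing apostrophe as in "Ben'" is rejected by it.

```java
import java.util.regex.Pattern;

class WordFilterDemo {
    // Pattern taken as-is from the test above.
    private static final Pattern WORD =
            Pattern.compile("([\\w]*[a-zA-Z][\\w]*)([-'][\\w]*[a-zA-Z][\\w]*)*");

    // Hypothetical helper: keeps accepted words and replaces every
    // rejected word with a single space.
    static String blankInvalidWords(String line) {
        String[] tokens = line.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // matches() tests the whole token, so "-apple" cannot slip through
            // the way it does when replaceAll searches inside the string.
            if (!WORD.matcher(tokens[i]).matches()) {
                tokens[i] = " ";
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println("[" + blankInvalidWords("apple 0 Ben's") + "]");
    }
}
```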
I need to validate an input string such that validation returns true only if the string contains one of the special characters @ # $ %, only one, and one time at the most. Letters and numbers can be anywhere and can be repeated any number of times, but at least one number or letter should be present.
For example:
a# : true
#a : true
a#$: false
a#n01 : true
an01 : false
a : false
# : false
I tried
[0-9A-Za-z]*[@#%$]{1}[0-9A-Za-z]*
I was hoping this would match one occurrence of any of the special characters. But, no. I need only one occurrence of any one in the set.
I also tried alternation but could not solve it.
Vivek, your regex was really close. Here is the one-line regex you are looking for.
^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[@#$%][0-9a-zA-Z]*$
See demo
How does it work?
The ^ and $ anchors ensure that whatever we are matching is the whole string, avoiding partial matches with forbidden characters later.
The (?=.*?[0-9a-zA-Z]) lookahead ensures that we have at least one number or letter.
The [0-9a-zA-Z]*[@#$%][0-9a-zA-Z]* matches zero or more letters or digits, followed by exactly one character that is either a @, #, $ or %, followed by zero or more letters or digits, ensuring that we have one special character but no more.
Implementation
I am sure you know how to implement this in Java, but to test if the string matches, you could use something like this:
boolean foundMatch = subjectString.matches("^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[@#$%][0-9a-zA-Z]*$");
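As a quick check, here is a small sketch that runs the one-line regex against the examples from the question. It assumes the special characters are @, #, $ and %; the class and method names are made up.

```java
import java.util.regex.Pattern;

class SingleSpecialCharDemo {
    // Lookahead: at least one letter or digit somewhere.
    // Body: letters/digits, exactly one special character, letters/digits.
    private static final Pattern VALID =
            Pattern.compile("^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[@#$%][0-9a-zA-Z]*$");

    // Hypothetical helper wrapping the pattern.
    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a#", "#a", "a#$", "a#n01", "an01", "a", "#"};
        for (String sample : samples) {
            System.out.println(sample + " : " + isValid(sample));
        }
    }
}
```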
What was wrong with my regex?
Actually, your regex was nearly there. Here is what was missing.
Because you didn't have the ^ and $ anchors, the regex was able to match a subset of the string, for instance a# in a##%%, which means that special characters could appear in the string, but outside of the match. Not what you want: we need to validate the whole string by anchoring it.
You needed something to ensure that at least one letter or digit was present. You could definitely have done it with an alternation, but in this case a lookahead is more compact.
Alternative with Alternation
Since you tried alternations, for the record, here is one way to do it:
^(?:[0-9a-zA-Z]+[@#$%][0-9a-zA-Z]*|[0-9a-zA-Z]*[@#$%][0-9a-zA-Z]+)$
See demo.
Let me know if you have any questions.
I hope this answer will be useful for you, if not, it might be for future readers. I am going to make two assumptions here up front: 1) You do not need regex per se, you are programming in Java. 2) You have access to Java 8.
This could be done the following way:
private boolean stringMatchesChars(final String str, final List<Character> characters) {
return (str.chars()
.filter(ch -> characters.contains((char)ch))
.count() == 1);
}
Here I am:
Using as input a String and a List<Character> of the ones that are allowed.
Obtaining an IntStream (consisting of chars) from the String.
Filtering every char to only remain in the stream if they are in the List<Character>.
Returning true only if count() == 1, that is, if exactly one occurrence of a character from the List<Character> is present.
The code can be used as:
String str1 = "a";
String str2 = "a#";
String str3 = "a##a";
String str4 = "a##a";
List<Character> characters = Arrays.asList('@', '#', '$', '%');
System.out.println("stringMatchesChars(str1, characters) = " + stringMatchesChars(str1, characters));
System.out.println("stringMatchesChars(str2, characters) = " + stringMatchesChars(str2, characters));
System.out.println("stringMatchesChars(str3, characters) = " + stringMatchesChars(str3, characters));
System.out.println("stringMatchesChars(str4, characters) = " + stringMatchesChars(str4, characters));
Resulting in false, true, false, false.
I am coding in Java here.
I know that the regex for matching any number or string of letters is
"(0|[1-9][0-9]*)(\\.[0-9]+)?|[a-zA-Z]+"
But I would like to match anything except letter or number, ie symbols like !, #, +, -
I tried doing [^.. ] but it doesn't work.
For example, let's say I want to do the opposite, ie return all parts of the string that contains numbers or strings of letters or #, I would do
public ArrayList<String> findMatch(String string){
ArrayList <String> outputArr = new ArrayList<String>();
Pattern p = Pattern.compile("(0|[1-9][0-9]*)(\\.[0-9]+)?|[a-zA-Z]+|\\#");
// recognizes number, string, and #
Matcher m = p.matcher(string);
while (m.find()) {
outputArr.add(m.group());
}
return outputArr;
}
Let's say I want to find the opposite of the code above, how can I change line 3?
You'll probably want to use just this:
\W+
That will match a string of any characters that aren't "word characters", defined as:
[a-zA-Z0-9_]
or "all letters, numbers, and underscore". If you want the match to include underscores as well, try the following:
[\W_]+
Or, if you'd rather have it explicit:
[^A-Za-z0-9]+
Which means "everything but letters and numbers".
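Plugged into a method shaped like the one in the question, the explicit variant might look like this. This is a sketch with made-up names; note that whitespace also counts as "not a letter or number" here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolRunsDemo {
    // Hypothetical counterpart to findMatch: collects every run of
    // characters that are neither ASCII letters nor digits.
    static List<String> findSymbolRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("[^A-Za-z0-9]+").matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findSymbolRuns("ab+12-!c"));
    }
}
```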
Hope this helps.
The simplest regex pattern that you can use is: [^\w]+
This will match all the special characters that are neither digits nor letters (nor underscores, since \w includes the underscore). You can check your regex against sample inputs in a regex tester. Hope this helps.
From the example you have provided, what I understand is that you want all the characters except letters, digits and '#'.
In regex, '\w' matches any letter, any digit and the underscore. So you need to negate this to get other symbolic characters like '$,#' etc.
The expression below will solve your issue: [^\w#]+
Inside a character class, a leading '^' is the negation symbol. Here '[^\w]' means 'match anything except letters, digits or underscores'. I have also added the '#' symbol to the class, as you need to ignore it as well.
Hope this will answer your question.
If you can give some more detail, what is your requirement? and what you expect?
It will help me to figure out the solution.
What you put in your query looks like you want to match special characters only. Am I right?
If so you can just try:
[^A-Za-z0-9][your quantifier here]
quantifier can be:
? for 0 or 1 frequency
+ for >=1 frequency
* for >=0 frequency
Suppose you have a String like
String s="shyuit6785%^7kui!#*&123f#$annds";
//And you want to find out the characters except alphabets and numerals . (I hope its your requirement)
Pattern p = Pattern.compile("[^A-Za-z0-9#]+");
Matcher m = p.matcher(s);
while (m.find())
{
System.out.println("Found a required character " + m.group() + " at index number " +m.start());
}
I made a regular expression for checking the length of a String: all characters must be digits and it must start with a given number, e.g. 123.
Following is my expression:
REGEX = "^123\\d+{9}$";
But it is unable to check the length of the String. It should validate only those strings whose length is 9 and which start with 123.
But if I pass the String 1234567891, it also validates it. What am I doing wrong?
As already answered here, the simplest way is just removing the +:
^123\\d{9}$
or
^123\\d{6}$
Depending on what you need exactly.
You can also use another, a bit more complicated and generic approach, a negative lookahead:
(?!.{10,})^123\\d+$
Explanation:
This: (?!.{10,}) is a negative look-ahead ((?= would be a positive look-ahead). It means that if the text following the look-ahead's position matches this pattern, then the overall string doesn't match. Roughly: the regular expression can only match if the pattern inside the negative look-ahead doesn't match.
In this case, the string matches only if .{10,} doesn't match, i.e. only if it does not contain 10 or more characters, so the rest of the pattern can match at most 9 characters.
A positive look-ahead does the opposite, only matching if the criteria in the look-ahead also matches.
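A small sketch of how that look-ahead behaves (the class and method names are made up for illustration). One thing it makes visible: the look-ahead only caps the length at 9, so shorter strings such as 1234 are accepted as well.

```java
class LengthLookaheadDemo {
    // Hypothetical helper using the negative look-ahead pattern from above.
    static boolean isValid(String s) {
        // (?!.{10,}) fails as soon as 10 or more characters follow,
        // then ^123\d+$ checks the prefix and that the rest are digits.
        return s.matches("(?!.{10,})^123\\d+$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789"));   // 9 characters
        System.out.println(isValid("1234567891"));  // 10 characters
        System.out.println(isValid("1234"));        // 4 characters
    }
}
```

If the length has to be exactly 9, the fixed quantifier ^123\\d{6}$ is the simpler choice.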
Just putting this here for curiosity sake, it's more complex than what you need for this.
Try using this one:
^123\\d{6}$
I changed it to 6 because 1, 2, and 3 should probably still count as digits.
Also, I removed the +. With it, it would match 1 or more \ds (therefore an unbounded number of digits).
Based on your comment below Doorknobs's answer you can do this:
int length = 9;
String prefix = "123"; // or whatever
String regex = "^" + prefix + "\\d{" + (length - prefix.length()) + "}$";
if (input.matches(regex)) {
// good
} else {
// bad
}