checking for specific characters in String using matches()


What I'm trying to do is reject any string that contains characters outside a-z, 0-9 or _.
I tried using the matches function below, as I'd seen elsewhere, but I can't get it to work correctly. It will either tell me the string is fine when it's not, or tell me it's not fine when it is.
public static Boolean checkc(String word) {
String w = word;
for (int i = 0; i < w.length(); i++) {
if (w.substring(i, i).matches("[A-Za-z0-9_]")) {
return true;
}
}
return false;
}
The logic might be wrong now because I've fiddled with it trying to get it working but, to be fair, it wasn't working in the first place. I'm checking a few things in the function that's calling this, so I just need to know if the string is fine given the rules.

The end index argument to substring is exclusive, so substring(i, i) always returns a 0 length string. You could fix this by using substring(i, i+1), but there's no reason to use a loop here. You can just use word.matches("[A-Za-z0-9_]+") and check the entire string at once. The regex quantifier + means "one or more". You could also use the quantifier * which means "zero or more", if the method should return true if the string is empty.
Edit: There's also another problem with your loop logic that I just noticed. Your conditional in the loop returns true the first time the condition is met:
for (...) {
if ( /* condition is met */ )
return true;
}
return false;
That logic only requires that the condition be met at least once, and then it returns true, but you probably meant the following:
for (...) {
if (! /* condition is met */ )
return false;
}
return true;
That requires that the condition be met for every character.
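As a minimal sketch of the whole-string approach described above (the class name CharCheck and the method name isValid are made up for illustration, and it assumes an empty string should be rejected):

```java
import java.util.regex.Pattern;

public class CharCheck {
    // Compiled once so the regex is not recompiled on every call.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]+");

    // Returns true only if the string is non-empty and every character
    // is a letter, a digit, or an underscore.
    public static boolean isValid(String word) {
        return ALLOWED.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello_world1")); // true
        System.out.println(isValid("hello world"));  // false (contains a space)
        System.out.println(isValid(""));             // false (+ needs at least one character)
    }
}
```

Swapping the + for * in the pattern would make the empty string valid as well.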

Try this:
public static boolean check(String word) {
return word.matches("[^a-zA-Z0-9_]+");
}
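This method returns true when word is non-empty and consists only of characters that are not listed in the square brackets. Inside the brackets, a leading ^ negates the set, much like logical ! (for example, !true == false). The + after the brackets means the bracketed class may repeat one or more times. Note that a string mixing both kinds of characters, such as "ab!", does not match either.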
javadoc link to Pattern class (regex explanations and examples)
Regex101 convenient online regex debug tool

stringToCheck.matches("[^0-9a-zA-Z_]")
This returns a boolean that is true only when the string is exactly one character that is not a digit, a letter, or an underscore, because matches() has to match the entire string.

Related

Using a recursive method to determine if a word is elf-ish

public static boolean Xish
This method should take in two parameters, in the following order: a String of the word to check and a String made up of the letters to check for. For example, a word is considered elf-ish if it contains the letters e, l, and f, in any order ("waffle", "rainleaf"), and a true return of the method would be Xish("waffle", "elf"). If there are multiple occurrences of a letter to check for, it must occur multiple times in the search word. Return true if the word contains all the needed characters and false if it does not contain all the characters.
This is what I have so far, but I am lost on how I would call the method recursively and check whether there are multiple occurrences (the second part).
public static boolean Xish(String check, String letters) {
String word = check;
String contains= letters;
if(word.indexOf(contains) >= 0)
return true;
else
return false;
}

Actually, doing this recursively will also take care of the multiple occurrences issue.
First, your own method is not really correct - it looks for the whole letters in the word. That is, if letters is elf, then true will be returned for self, but not for heartfelt, and that's wrong. You are supposed to look for the individual letters, because the order is not important.
For recursion:
If the letters is an empty string - return true. You can say that any word is fine if there are no restrictions.
If the check is an empty string - return false. An empty string does not contain the letters in letters (and we already know that letters is not empty).
Take the first letter in letters. Look for it in check. If it's not there, return false.
If it was there, then call the same method, but pass only what remains of check and letters. For example, if check was selfish and letters was elf, you found that e exists. Return the result of Xish("slfish","lf"). This will take care of the multiple occurrences. You do that by using substring and concatenating the applicable parts.
If multiple occurrences weren't an issue, you could pass the check as-is to the next level of the recursion. But since they matter, we need to remove one letter for each letter requested, to make sure that we don't match the same position again for the next occurrence.
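The steps above can be sketched roughly as follows. The class name Elfish is an assumption for illustration, and the method is spelled xish in lower case to follow Java naming conventions:

```java
public class Elfish {
    // Recursive check: does 'check' contain every letter of 'letters',
    // counting repeated letters separately?
    public static boolean xish(String check, String letters) {
        if (letters.isEmpty()) {
            return true;   // nothing left to look for
        }
        if (check.isEmpty()) {
            return false;  // letters remain but the word is used up
        }
        int pos = check.indexOf(letters.charAt(0));
        if (pos < 0) {
            return false;  // the first required letter is missing
        }
        // Remove the matched character so it cannot be matched again.
        String remaining = check.substring(0, pos) + check.substring(pos + 1);
        return xish(remaining, letters.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(xish("waffle", "elf"));    // true
        System.out.println(xish("waffle", "elff"));   // true  (two f's available)
        System.out.println(xish("waffle", "elfff"));  // false (only two f's)
        System.out.println(xish("heartfelt", "elf")); // true  (order does not matter)
    }
}
```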

The title mentions a recursive function so I will propose a recursive solution.
For each character in your check string, compare it against the first character in your letters string.
If the compared characters are equivalent, remove the first character from your letters string and pass both strings back into your function.
If the check string is fully iterated without finding a character in the letters string, return false
If letters is empty at any point, return true
This is a brute force approach, and there are several other ways to accomplish what you are looking for. Maybe think about how you could check every character in your check string a single time?

public static boolean Xish(String check, String letters) {
boolean ish = true;
String word = check;
char[] contains= letters.toCharArray();
for(int i = 0; i < contains.length; i++){
if(word.indexOf(contains[i]) < 0){
ish = false;
}else {
StringBuilder sb = new StringBuilder(word);
sb.deleteCharAt(word.indexOf(contains[i]));
word = sb.toString();
// System.out.println(word);
}
}
return ish;
}
This could be one way, but it is not recursive.
Xish("Waffle", "elff") returns true, but
Xish("Waffle", "elfff") returns false.

Not sure whether it solves your question 100%, but I tried a recursive method. See if this helps.
package com.company;
public class Selfish {
public static void main(String args[]) {
String check = "waffle";
String letters = "elf"; // "eof"
int xishCount = xish(check, letters, 0);
if(letters.length()== xishCount) {
System.out.println("TRUE");
}else{
System.out.println("FALSE");
}
}
static int xish(String check, String letters, int xishCount) {
if(letters.length() < 1) {
return 0;
}
if(check.contains(letters.substring(0, 1))) {
xishCount = 1;
}
return xishCount + xish(check, letters.substring(1, letters.length()), 0);
}
}

How to know if a string could match a regular expression by adding more characters

This is a tricky question, and maybe in the end it has no solution (or not a reasonable one, at least). I'd like to have a Java specific example, but if it can be done, I think I could do it with any example.
My goal is to find a way of knowing whether a string being read from an input stream could still match a given regular expression pattern. Or, in other words, read the stream until we've got a string that definitely will not match such a pattern, no matter how many characters you add to it.
A declaration for a minimalist simple method to achieve this could be something like:
boolean couldMatch(CharSequence charsSoFar, Pattern pattern);
Such a method would return true in case that charsSoFar could still match pattern if new characters are added, or false if it has no chance at all to match it even adding new characters.
To put a more concrete example, say we have a pattern for float numbers like "^([+-]?\\d*\\.?\\d*)$".
With such a pattern, couldMatch would return true for the following example charsSoFar parameter:
"+"
"-"
"123"
".24"
"-1.04"
And so on and so forth, because you can continue adding digits to all of these, plus one dot in the first three.
On the other hand, all these examples derived from the previous one should return false:
"+A"
"-B"
"123z"
".24."
"-1.04+"
It's clear at first sight that these will never comply with the aforementioned pattern, no matter how many characters you add to it.
EDIT:
I add my current non-regex approach right now, so to make things more clear.
First, I declare the following functional interface:
public interface Matcher {
/**
* It will return the matching part of "source" if any.
*
* @param source
* @return
*/
CharSequence match(CharSequence source);
}
Then, the previous function would be redefined as:
boolean couldMatch(CharSequence charsSoFar, Matcher matcher);
And a (drafted) matcher for floats could look like (note this does not support the + sign at the start, just the -):
public class FloatMatcher implements Matcher {
@Override
public CharSequence match(CharSequence source) {
StringBuilder rtn = new StringBuilder();
if (source.length() == 0)
return "";
if ("0123456789-.".indexOf(source.charAt(0)) != -1 ) {
rtn.append(source.charAt(0));
}
boolean gotDot = false;
for (int i = 1; i < source.length(); i++) {
if (gotDot) {
if ("0123456789".indexOf(source.charAt(i)) != -1) {
rtn.append(source.charAt(i));
} else
return rtn.toString();
} else if (".0123456789".indexOf(source.charAt(i)) != -1) {
rtn.append(source.charAt(i));
if (source.charAt(i) == '.')
gotDot = true;
} else {
return rtn.toString();
}
}
return rtn.toString();
}
}
Inside the omitted body for the couldMatch method, it will just call matcher.match() iteratively with a new character added at the end of the source parameter and return true while the returned CharSequence is equal to the source parameter, and false as soon as it's different (meaning that the last char added broke the match).

You can do it as easy as
boolean couldMatch(CharSequence charsSoFar, Pattern pattern) {
Matcher m = pattern.matcher(charsSoFar);
return m.matches() || m.hitEnd();
}
If the sequence does not match and the engine did not reach the end of the input, it implies that there is a contradicting character before the end, which won’t go away when adding more characters at the end.
Or, as the documentation says:
Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
When this method returns true, then it is possible that more input would have changed the result of the last search.
This is also used by the Scanner class internally, to determine whether it should load more data from the source stream for a matching operation.
Using the method above with your sample data yields
Pattern fpNumber = Pattern.compile("[+-]?\\d*\\.?\\d*");
String[] positive = {"+", "-", "123", ".24", "-1.04" };
String[] negative = { "+A", "-B", "123z", ".24.", "-1.04+" };
for(String p: positive) {
System.out.println("should accept more input: "+p
+", couldMatch: "+couldMatch(p, fpNumber));
}
for(String n: negative) {
System.out.println("can never match at all: "+n
+", couldMatch: "+couldMatch(n, fpNumber));
}
should accept more input: +, couldMatch: true
should accept more input: -, couldMatch: true
should accept more input: 123, couldMatch: true
should accept more input: .24, couldMatch: true
should accept more input: -1.04, couldMatch: true
can never match at all: +A, couldMatch: false
can never match at all: -B, couldMatch: false
can never match at all: 123z, couldMatch: false
can never match at all: .24., couldMatch: false
can never match at all: -1.04+, couldMatch: false
Of course, this doesn’t say anything about the chances of turning a nonmatching content into a match. You could still construct patterns for which no additional character could ever match. However, for ordinary use cases like the floating point number format, it’s reasonable.
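To connect this back to the original goal of reading from a stream, here is a rough sketch that consumes characters one at a time and stops as soon as the accumulated text can no longer match. The class name PrefixReader and the helper readWhileViable are hypothetical, and a StringReader stands in for the real input stream:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixReader {
    public static boolean couldMatch(CharSequence charsSoFar, Pattern pattern) {
        Matcher m = pattern.matcher(charsSoFar);
        return m.matches() || m.hitEnd();
    }

    // Reads characters until the accumulated text can no longer match,
    // then returns the longest prefix that was still viable.
    public static String readWhileViable(Reader in, Pattern pattern) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
                if (!couldMatch(sb, pattern)) {
                    sb.setLength(sb.length() - 1); // drop the offending character
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern fpNumber = Pattern.compile("[+-]?\\d*\\.?\\d*");
        System.out.println(readWhileViable(new StringReader("-1.04+2"), fpNumber)); // -1.04
        System.out.println(readWhileViable(new StringReader("123z"), fpNumber));    // 123
    }
}
```

Re-running the matcher on the whole buffer after every character is quadratic in the input length, which is fine for short tokens but worth keeping in mind for long ones.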

I have no specific solution, but you might be able to do this with negations.
If you setup regex patterns in a blacklist that definitely do not match with your pattern (e.g. + followed by char) you could check against these. If a blacklisted regex returns true, you can abort.
Another idea is to use negative lookaheads (https://www.regular-expressions.info/lookaround.html)

Java: How to implement wildcard matching?

I'm researching on how to find k values in the BST that are closest to the target, and came across the following implementation with the rules:
'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "cab") → false
Code:
import java.util.*;
public class WildcardMatching {
boolean isMatch(String s, String p) {
int i=0, j=0;
int ii=-1, jj=-1;
while(i<s.length()) {
if(j<p.length() && p.charAt(j)=='*') {
ii=i;
jj=j;
j++;
} else if(j<p.length() &&
(s.charAt(i) == p.charAt(j) ||
p.charAt(j) == '?')) {
i++;
j++;
} else {
if(jj==-1) return false;
j=jj;
i=ii+1;
}
}
while(j<p.length() && p.charAt(j)=='*') j++;
return j==p.length();
}
public static void main(String args[]) {
String s = "aab";
String p = "a*";
WildcardMatching wcm = new WildcardMatching();
System.out.println(wcm.isMatch(s, p));
}
}
And my question is, what's the reason for having two additional indexes, ii and jj, and why do they get initialized with -1? What's the purpose of each? Wouldn't traversing it with i and j be enough?
And what's the purpose of ii=i; and jj=j; in the first if case, and i=ii+1; and j=jj; in the third if case?
Lastly, in what case would you encounter while(j<p.length() && p.charAt(j)=='*') j++;?
Examples would be extremely helpful in understanding.
Thank you in advance and will accept answer/up vote.

It looks like ii and jj are used to handle the wildcard "*", which matches to any sequence. Their initialization to -1 acts as a flag: it tells us if we've hit an unmatched sequence and are not currently evaluating a "*". We can walk through your examples one at a time.
Notice that i is related to the parameter s (the original string) and j is related to the parameter p (the pattern).
isMatch("aa","a"):
this returns false because the j<p.length() statement will fail before we leave the while loop, since the length of p ("a") is only 1 whereas the length of s ("aa") is 2, so we'll jump to the else block. This is where the -1 initialization comes in: since we never saw any wildcards in p, jj is still -1, indicating that there's no way the strings can match, so we return false.
isMatch("aa","aa"):
s and p are exactly the same, so the program repeatedly evaluates the else-if block with no problems and finally breaks out of the while loop once i equals 2 (the length of "aa"). The second while loop never runs, since j is not less than p.length() - in fact, since the else-if increments i and j together, they are both equal to 2, and 2 is not less than the length of "aa". We return j == p.length(), which evaluates to 2 == 2, and get true.
isMatch("aaa","aa"): this one fails for the same reason as the first. Namely, the strings are not the same length and we never hit a wildcard character.
isMatch("aa","*"): this is where it gets interesting. First we'll enter the if block, since we've seen a "*" in p. We set ii and jj to 0 and increment j only. On the second iteration, j<p.length() fails, so we jump to the else block. jj is not -1 anymore (it's 0), so we reset j to 0 and set i to 0+1. This basically allows us to keep evaluating the wildcard, since j just gets reset to jj, which holds the position of the wildcard, and ii tells us where to start from in our original string. This test case also explains the second while loop. In some cases our pattern may be much shorter than the original string, so we need to make sure it's matched up with wildcards. For example, isMatch("aaaaaa","a**") should return true, but the final return statement is checking to see if j == p.length(), asking if we checked the entire pattern. Normally we would stop at the first wildcard, since it matches anything, so we need to finally run through the rest of the pattern and make sure it only contains wildcards.
From here you can figure out the logic behind the other test cases. I hope this helped!

Let's look at this a bit out of order.
First, this is a parallel iteration of the string (s) and the wildcard pattern (p), using variable i to index s and variable j to index p.
The while loop will stop iterating when the end of s is reached. When that happens, hopefully the end of p has been reached too, in which case it'll return true (j==p.length()).
If however p ends with a *, that is also valid (e.g. isMatch("ab", "ab*")), and that's what the while(j<p.length() && p.charAt(j)=='*') j++; loop ensures, i.e. any * in the pattern at this point is skipped, and if that reaches end of p, then it returns true. If end of p is not reached, it returns false.
That was the answer to your last question. Now let's look at the loop. The else if will advance both i and j as long as there is a match, e.g. 'a' == 'a' or 'a' == '?'.
When a * wildcard is found (first if), it saves the current positions in ii and jj, in case backtracking becomes necessary, then skips the wildcard character.
This basically starts by assuming the wildcard matches the empty string (e.g. isMatch("ab", "a*b")). When it continues iterating, the else if will match the rest and method ends up returning true.
Now, if a mismatch is found (the else block), it will try to backtrack. Of course, if it doesn't have a saved wildcard (jj==-1), it can't backtrack, so it just returns false. That's why jj is initialized to -1, so it can detect if a wildcard was saved. ii could be initialized to anything, but is initialized to -1 for consistency.
If a wildcard position was saved in ii and jj, it will restore those values, then forward i by one, i.e. assuming that if the next character is matched against the wildcard, the rest of the matching will succeed and return true.
That's the logic. Now, it could be optimized a tiny bit, because that backtracking is sub-optimal. It currently resets j back to the *, and i back to the next character. When it loops around, it will enter the if, save the same value again in jj, save the i value in ii, and then increment j. Since that is a given (unless end of s is reached), the backtracking could just do that too, saving a loop iteration, i.e.
} else {
if(jj==-1) return false;
i=++ii;
j=jj+1;
}
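For reference, here is a sketch of the whole method with that shortened backtracking step dropped in, annotated with what each variable does. The class name WildcardDemo is made up, and the printed cases come from the examples in the question:

```java
public class WildcardDemo {
    public static boolean isMatch(String s, String p) {
        int i = 0, j = 0;     // current positions in s and p
        int ii = -1, jj = -1; // saved positions for the most recent '*'
        while (i < s.length()) {
            if (j < p.length() && p.charAt(j) == '*') {
                ii = i;       // where in s the '*' started matching
                jj = j;       // where the '*' sits in the pattern
                j++;          // first try: '*' matches the empty sequence
            } else if (j < p.length()
                    && (s.charAt(i) == p.charAt(j) || p.charAt(j) == '?')) {
                i++;
                j++;
            } else {
                if (jj == -1) return false; // no '*' to fall back on
                i = ++ii;     // let the '*' absorb one more character
                j = jj + 1;   // resume right after the '*'
            }
        }
        // Any '*' left at the end of the pattern can match the empty sequence.
        while (j < p.length() && p.charAt(j) == '*') j++;
        return j == p.length();
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aa", "a"));      // false
        System.out.println(isMatch("aa", "*"));      // true
        System.out.println(isMatch("ab", "?*"));     // true
        System.out.println(isMatch("aab", "cab"));   // false
        System.out.println(isMatch("abcde", "a*e")); // true
    }
}
```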

The code looks buggy to me. (See below)
The ostensible purpose of ii and jj is to implement a form of backtracking.
For example, when you try to match "abcde" against the pattern "a*e", the algorithm will first match the "a" in the pattern against the "a" in the input string. Then it will eagerly match the "*" against the rest of the string ... and find that it has made a mistake. At that point, it needs to backtrack and try an alternative.
The ii and jj record the point to backtrack to, and the uses of those variables are either recording a new backtrack point or backtracking.
Or at least, that was probably the author's intent at some point.
The while(j<p.length() && p.charAt(j)=='*') j++; seems to be dealing with an edge-case
However, I don't think this code is correct.
It certainly won't cope with backtracking in the case where there are multiple "*" wildcards in the pattern. That requires a recursive solution.
The part:
if(j<p.length() && p.charAt(j)=='*') {
ii=i;
jj=j;
j++;
doesn't make much sense. I'd have thought it should increment i not j. It might "mesh" with the behavior of the else part, but even if it does this is a convoluted way of coding this.
Advice:
Don't use this code as an example. Even if it works (in a limited sense) it is not a good way to do this task, or an example of clarity or good style.
I would handle this by translating the wildcard pattern into a regex and then using Pattern / Matcher to do the matching.
For example: Wildcard matching in Java

I know you are asking about BST, but to be honest there is also a way of doing that with regex (not for competitive programming, but stable and fast enough to be used in a production environment):
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class WildCardMatcher{
public static void main(String []args){
// Test
String urlPattern = "http://*.my-webdomain.???",
urlToMatch = "http://webmail.my-webdomain.com";
WildCardMatcher wildCardMatcher = new WildCardMatcher(urlPattern);
System.out.printf("\"%s\".matches(\"%s\") -> %s%n", urlToMatch, wildCardMatcher, wildCardMatcher.matches(urlToMatch));
}
private final Pattern p;
public WildCardMatcher(final String urlPattern){
// Split the pattern into runs of literal text (group 1), each followed by a run of wildcards (group 2)
Pattern charsToEscape = Pattern.compile("([^*?]*)([*?]*)");
// Literal runs are quoted; "*" becomes ".*" and "?" becomes "." (exactly one character)
Matcher m = charsToEscape.matcher(urlPattern);
StringBuffer sb = new StringBuffer();
String replacement, g1, g2;
while(m.find()){
g1 = m.group(1);
g2 = m.group(2);
// Quote the literal text first (it may contain regex metacharacters), then escape the '\' and '$' characters that have a special meaning in replacement strings
replacement = (g1 == null ? "" : Matcher.quoteReplacement(Pattern.quote(g1))) +
(g2 == null ? "" : g2.replace("*", ".*").replace("?", ".")); // "*" -> ".*", "?" -> "."
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
p = Pattern.compile(sb.toString());
}
@Override
public String toString(){
return p.toString();
}
public boolean matches(final String urlToMatch){
return p.matcher(urlToMatch).matches();
}
}
There is still a list of improvements that you could implement (lowercase / uppercase distinction, setting a maximum length on the string being checked so that attackers cannot make you match against a 4-gigabyte string, ...).

Checking to see if a string is letters + spaces ONLY?

I want to write a static method that is passed a string and that checks to see if the string is made up of just letters and spaces. I can use String's methods length() and charAt(i) as needed.
I was thinking something like the following: (Sorry about the pseudocode)
public static boolean onlyLettersSpaces(String s){
for(i=0;i<s.length();i++){
if (s.charAt(i) != a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z) {
return false;
break;
}else {
return true;
}
}
I know there is probably an error in my coding, and there is probably a much easier way to do it, but please let me know your suggestions!

Use a regex. This one only matches if the string starts with, contains, and ends with only letters and spaces.
^[ A-Za-z]+$
In Java, initialize this as a pattern and check if it matches your strings.
Pattern p = Pattern.compile("^[ A-Za-z]+$");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

That isn't how you test character equality, one easy fix would be
public static boolean onlyLettersSpaces(String s){
for(int i=0;i<s.length();i++){
char ch = s.charAt(i);
if (Character.isLetter(ch) || ch == ' ') {
continue;
}
return false;
}
return true;
}

For the constraints you mentioned (use of only length() and charAt()), you got it almost right.
You do loop over each character and check if it's one of the acceptable characters, and that's the right way. If you find a non-acceptable character, you immediately return false, which is also good. What's wrong is that as soon as you decide to accept a character, you return true. But the definition says to return true only if all characters are accepted. You need to move the "return true" to after the loop (that's the point at which you know that all characters were accepted).
So you change your pseudocode to:
for (all characters in string) {
if (character is bad) {
// one bad character means reject the string, we're done.
return false;
}
}
// we now know all chars are good
return true;
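Translated into Java with only length() and charAt(), that pseudocode might look like the sketch below. The class name LetterSpaceCheck is made up, and it assumes that "letters" means the ASCII letters a-z and A-Z:

```java
public class LetterSpaceCheck {
    public static boolean onlyLettersSpaces(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            boolean isLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (!isLetter && ch != ' ') {
                // One bad character means reject the string, we're done.
                return false;
            }
        }
        // Every character was a letter or a space.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(onlyLettersSpaces("hello world")); // true
        System.out.println(onlyLettersSpaces("hello2"));      // false
    }
}
```

With this structure an empty string returns true, since it contains no bad characters; add an explicit length check if that should be rejected.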

How to validate phone number(US format) in Java?

I just want to know where I am going wrong here:
import java.io.*;
class Tokens{
public static void main(String[] args)
{
//String[] result = "this is a test".split("");
String[] result = "4543 6546 6556".split("");
boolean flag= true;
String num[] = {"0","1","2","3","4","5","6","7","8","9"};
String specialChars[] = {"-","#","#","*"," "};
for (int x=1; x<result.length; x++)
{
for (int y=0; y<num.length; y++)
{
if ((result[x].equals(num[y])))
{
flag = false;
continue;
}
else
{
flag = true;
}
if (flag == true)
break;
}
if (flag == false)
break;
}
System.out.println(flag);
}
}

If this is not homework, is there a reason you're avoiding regular expressions?
Here are some useful ones: http://regexlib.com/DisplayPatterns.aspx?cattabindex=6&categoryId=7
More generally, your code doesn't seem to validate that you have a phone number; it seems to merely validate that your string consists only of digits. You're also not allowing any special characters right now.

Aside from the regex suggestion (which is a good one), it would seem to make more sense to deal with arrays of characters rather than single-char Strings.
In particular, the split("") call (shudder) could/should be replaced by toCharArray(). This lets you iterate over each individual character, which more clearly indicates your intent, is less prone to bugs as you know you're treating each character at once, and is more efficient*. Likewise your valid character sets should also be characters.
Your logic is pretty strangely expressed; you're not even referencing the specialChars set at all, and the looping logic once you've found a match seems odd. I think this is your bug; the matching seems to be the wrong way round in that if the character matches the first valid char, you set flag to false and continue round the current loop; so it will definitely not match the next valid char and hence you break out of the loop with a true flag. Always.
I would have thought something like this would be more intuitive:
private static final Set<Character> VALID_CHARS = ...;
public boolean isValidPhoneNumber(String number)
{
for (char c : number.toCharArray())
{
if (!VALID_CHARS.contains(c))
{
return false;
}
}
// All characters were valid
return true;
}
This doesn't take sequences into account (e.g. the strings "--------** " and "1" would be valid because all individual characters are valid) but then neither does your original code. A regex is better because it lets you specify the pattern; I supply the above snippet as an example of a clearer way of iterating through the characters.
*Yes, premature optimization is the root of all evil, but when better, cleaner code also happens to be faster that's an extra win for free.
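To illustrate how a regex captures the sequence rather than just the character set, here is a small sketch. The accepted formats are an assumption (three digits, optionally in parentheses, then three digits, then four digits, with an optional space or dash between groups); real-world US phone validation has more rules than this, and the class name PhoneRegex is made up:

```java
import java.util.regex.Pattern;

public class PhoneRegex {
    // Assumed format: (ddd) or ddd, then ddd, then dddd,
    // each group optionally separated by one space or dash.
    private static final Pattern PHONE =
            Pattern.compile("(\\(\\d{3}\\)|\\d{3})[ -]?\\d{3}[ -]?\\d{4}");

    public static boolean isValidPhoneNumber(String number) {
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhoneNumber("(555) 123-4567")); // true
        System.out.println(isValidPhoneNumber("555-123-4567"));   // true
        System.out.println(isValidPhoneNumber("--------** "));    // false
        System.out.println(isValidPhoneNumber("1"));              // false
    }
}
```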

Maybe this is overkill, but with a grammar similar to:
<phone_number> := <area_code><space>*<local_code><space>*<number> |
<area_code><space>*"-"<space>*<local_code><space>*"-"<space>*<number>
<area_code> := <digit><digit><digit> |
"("<digit><digit><digit>")"
<local_code> := <digit><digit><digit>
<number> := <digit><digit><digit><digit>
you can write a recursive descent parser. See this page for an example.
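A hand-written recursive descent parser for the grammar above could be sketched along these lines, with one method per grammar rule. The class and method names are hypothetical, and the two alternatives of the first rule are folded into a single method by remembering whether the first separator was a dash:

```java
public class PhoneParser {
    private final String in;
    private int pos = 0;

    private PhoneParser(String in) { this.in = in; }

    // True only if the whole input is derivable from the phone number rule.
    public static boolean isPhoneNumber(String s) {
        PhoneParser p = new PhoneParser(s);
        return p.phoneNumber() && p.pos == s.length();
    }

    // Phone number rule: either both separators are dashes or neither is.
    private boolean phoneNumber() {
        if (!areaCode()) return false;
        spaces();
        boolean dashed = accept('-');
        spaces();
        if (!digits(3)) return false;              // local code
        spaces();
        if (dashed && !accept('-')) return false;  // second dash is mandatory
        spaces();
        return digits(4);                          // number
    }

    // Area code rule: three digits, optionally wrapped in parentheses.
    private boolean areaCode() {
        if (accept('(')) {
            return digits(3) && accept(')');
        }
        return digits(3);
    }

    // Consumes exactly n ASCII digits, or fails.
    private boolean digits(int n) {
        for (int k = 0; k < n; k++) {
            if (pos >= in.length()) return false;
            char ch = in.charAt(pos);
            if (ch < '0' || ch > '9') return false;
            pos++;
        }
        return true;
    }

    // Consumes c if it is the next character.
    private boolean accept(char c) {
        if (pos < in.length() && in.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    // Consumes zero or more spaces.
    private void spaces() {
        while (accept(' ')) { }
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("(555) 123 4567"));   // true
        System.out.println(isPhoneNumber("555 - 123 - 4567")); // true
        System.out.println(isPhoneNumber("555 123-4567"));     // false (mixed separators)
    }
}
```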

You can check out the Pattern class in Java; it makes regular expressions really easy to work with:
https://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html.
