In Java, I need to make sure a String only contains alphanumeric, space and dash characters.
I found the class org.apache.commons.lang.StringUtils and the almost adequate method isAlphanumericSpace(String)... but I also need to include dashes.
What is the best way to do this? I don't want to use Regular Expressions.
You could use:
StringUtils.isAlphanumericSpace(string.replace('-', ' '));
Hum... just program it yourself using String.charAt(int), it's pretty easy...
Iterate through all char in the string using a position index, then compare it using the fact that ASCII characters 0 to 9, a to z and A to Z use consecutive codes, so you only need to check that character x numerically verifies one of the conditions:
between '0' and '9'
between 'a' and 'z'
between 'A' and 'Z'
a space ' '
a hyphen '-'
Here is a basic code sample (using CharSequence, which lets you pass a String but also a StringBuilder as arg):
public boolean isValidChar(CharSequence seq) {
int len = seq.length();
for(int i=0;i<len;i++) {
char c = seq.charAt(i);
// Test for all positive cases
if('0'<=c && c<='9') continue;
if('a'<=c && c<='z') continue;
if('A'<=c && c<='Z') continue;
if(c==' ') continue;
if(c=='-') continue;
// ... insert more positive character tests here
// If we get here, we had an invalid char, fail right away
return false;
}
// All seen chars were valid, succeed
return true;
}
Just iterate through the string, using the character-class methods in java.lang.Character to test whether each character is acceptable or not. Which is presumably all that the StringUtils methods do, and regular expressions are just a way of driving a generalised engine to do much the same.
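A minimal sketch of that idea, assuming the hypothetical class and method names below. One caveat: Character.isLetterOrDigit accepts any Unicode letter or digit, not just ASCII, so this check is broader than plain range comparisons on 'a'..'z', 'A'..'Z' and '0'..'9'.

```java
class CharClassCheck {
    // Hypothetical helper: true if every char is a letter, a digit, a space or a dash.
    static boolean isAlphanumericSpaceDash(CharSequence seq) {
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            // Character.isLetterOrDigit also accepts non-ASCII letters and digits
            if (!Character.isLetterOrDigit(c) && c != ' ' && c != '-') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumericSpaceDash("abc-123 xyz")); // true
        System.out.println(isAlphanumericSpaceDash("abc_123"));     // false
    }
}
```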
You have 1 of 2 options:
1. Compose a list of chars that CAN be in the string, then loop over the string checking to make sure each character IS in the list.
2. Compose a list of chars that CANNOT be in the string, then loop over the string checking to make sure each character IS NOT in the list.
Choose whatever option is quicker to compose the list.
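As a rough illustration of option 1, the sketch below keeps the allowed characters in a String and uses indexOf to test membership. The class name, the method name and the contents of ALLOWED are assumptions made for the example; option 2 would be the same loop with the test inverted against a list of forbidden characters.

```java
class AllowListCheck {
    // Assumed allow-list for illustration: ASCII letters, digits, space and dash.
    private static final String ALLOWED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -";

    // True if every character of s appears in ALLOWED.
    static boolean containsOnlyAllowed(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (ALLOWED.indexOf(s.charAt(i)) < 0) {
                return false; // found a character that is not in the list
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAllowed("some-text 42")); // true
        System.out.println(containsOnlyAllowed("some_text"));    // false
    }
}
```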
Definitely use a regular expression. There's no point in writing your own system when a very comprehensive one is already in place for this exact task. If you need to learn about or brush up on regex, check out this website, it's great: http://regexr.com
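For reference, a regex version of the check could look roughly like this. The class and method names are made up for the example, and the pattern assumes "alphanumeric" means ASCII letters and digits only.

```java
import java.util.regex.Pattern;

class RegexCheck {
    // Letters, digits, space and dash; the dash is placed last so it is read literally.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9 -]*");

    static boolean isValid(String s) {
        // matches() requires the whole string to match the pattern
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc-123 xyz")); // true
        System.out.println(isValid("abc_123"));     // false
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the expression on every call.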
I would challenge yourself on this one.
Related
I'm trying to write a code to count number of letters,characters,space and symbols in a String. But I don't know how to count Symbols.
Is there any such function available in java?
That very much depends on your definition of the term symbol.
A straightforward solution could be something like
Set<Character> SYMBOLS = Set.of('#', ' ', ....
for (int i=0; i < someString.length(); i++) {
if (SYMBOLS.contains(someString.charAt(i))) {
That iterates over the chars of someString and checks whether each char can be found in the predefined SYMBOLS set.
Alternatively, you could use a regular expression to define "symbols", or, you can rely on a variety of existing definitions. When you check the regex Pattern language for java, you can find
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
for example. And various other shortcuts that denote this or that set of characters already.
Please post what you have tried so far
If you need the count of individual characters - you better iterate the string and use a map to track the character with its count
Or
You can use a regex if just the overall count is enough, like below
while (matcher.find()) { count++; }
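Both suggestions might be sketched as follows. The names are hypothetical, and "symbol" is assumed here to mean anything that is not an ASCII letter, a digit or whitespace; adjust the pattern to your own definition.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolCount {
    // Assumed definition of "symbol": not an ASCII letter, digit or whitespace.
    private static final Pattern SYMBOL = Pattern.compile("[^A-Za-z0-9\\s]");

    // Per-character counts, tracked in a map.
    static Map<Character, Integer> countEach(String s) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    // Overall number of symbols, counted with Matcher.find().
    static int countSymbols(String s) {
        Matcher matcher = SYMBOL.matcher(s);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSymbols("abcd!##"));       // 3
        System.out.println(countEach("abcd!##").get('#')); // 2
    }
}
```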
One way of doing it would be to just iterate over the String and compare each character to its ASCII value
String str = "abcd!##";
for(int i=0;i<str.length();i++)
{
if(33==str.charAt(i))
System.out.println("Found !");
}
Look up the ASCII values here: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
I want to create a program for checking whether any inputted character is a special character or not. The problem is that I have no idea what to do: either check for special characters or check for the ASCII value. Can anyone tell me if I can just check for the numerical ASCII value using an 'if' statement or if I need to check each special character?
You can use regex (Regular Expressions):
if (String.valueOf(character).matches("[^a-zA-Z0-9]")) {
//Your code
}
The code in the if statement will execute if the character is not alphanumeric. (whitespace will count as a special character.) If you don't want white space to count as a special character, change the string to "[^a-zA-Z0-9\\s]".
Further reading:
JavaDoc for the matches method
An excellent regex tutorial
More info about regex in Java
A regex builder (pointed out by #Wietlol)
You can use isLetter(char c) and isDigit(char c). You could do it like this:
char c;
//assign c in some way
if(!Character.isLetter(c) && !Character.isDigit(c)) {
//do something in case of special character
} else {
//do something for non-special character
}
EDIT: As pointed out in the comments it may be more viable to use isLetterOrDigit(char c) instead.
EDIT2: As ostrichofevil pointed out (which I did not think or know of when i posted the answer) this solution won't restrict "non-special" characters to A-Z, a-z and 0-9, but will include anything that is considered a letter or number in Unicode. This probably makes ostrichofevil's answer a more practical solution in most cases.
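To make the difference concrete, here is a small sketch comparing the Unicode-aware check with an ASCII-only range check. The helper isAsciiLetterOrDigit is a hypothetical name introduced for the example.

```java
class LetterDemo {
    // Hypothetical helper restricted to A-Z, a-z and 0-9.
    static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        char eAcute = '\u00e9'; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(Character.isLetterOrDigit(eAcute)); // true
        System.out.println(isAsciiLetterOrDigit(eAcute));      // false
        System.out.println(isAsciiLetterOrDigit('q'));         // true
    }
}
```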
you can achieve it in this way :
char[] specialCh = {'!','#',']','#','$','%','^','&','*'}; // you can specify all special characters in this array
boolean hasSpecialChar = false;
char current = '#'; // placeholder value: assign the character you want to test
for (Character c : specialCh) {
if (current == c){
hasSpecialChar = true;
}
}
I need to test whether character is a letter or a space before moving on further with processing. So, i
for (Character c : take.toCharArray()) {
if (!(Character.isLetter(c) || Character.isSpaceChar(c)))
continue;
data.append(c);
Once i examined the data, i saw that it contains characters which look like a unicode representation of characters from outside of Latin alphabet. How can i modify the above code to tighten my conditions to only accept letter characters which fall in range of [a-z][A-Z]?
Is Regex a way to go, or there is a better (faster) way?
If you specifically want to handle only those 52 characters, then just handle them:
public static boolean isLatinLetter(char c) {
return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
If you just want to strip out non-ASCII letter characters, then a quick approach is to use String.replaceAll() and Regex:
s.replaceAll("[^a-zA-Z]", "")
Can't say anything about performance vs. a character by character scan and append to StringBuilder, though.
I'd use the regular expression you specified for this. It's easy to read and should be quite speedy (especially if you allocate it statically).
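"Allocating it statically" could look something like the sketch below, where the Pattern is compiled once and reused instead of being recompiled by every String.replaceAll call. The class, constant and method names are assumptions for the example.

```java
import java.util.regex.Pattern;

class LatinOnly {
    // Compiled once; matches any character outside a-z and A-Z.
    private static final Pattern NON_LATIN_LETTER = Pattern.compile("[^a-zA-Z]");

    static String stripNonLatinLetters(String s) {
        return NON_LATIN_LETTER.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // The accented letter, the space and the digit are all removed.
        System.out.println(stripNonLatinLetters("h\u00e9llo w0rld")); // hllowrld
    }
}
```

Note that this pattern also removes spaces; to keep them, a pattern such as "[^a-zA-Z ]" could be used instead.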
Consider the following as tokens:
+, -, ), (
alpha characters and underscore
integer
Implement 1.getToken() - returns a string corresponding to the next token
2.getTokPos() - returns the position of the current token in the input string
Example input: (a+b)-21)
Output: (| a| +| b| )| -| 21| )|
Note: Cannot use the java string tokenizer class
Work in progress - Successfully tokenized +,-,),(. Need to figure out characters and numbers:
OUTPUT: +|-|+|-|(|(|)|)|)|(| |
java.util.StringTokenizer is a legacy class; its use is discouraged in new code.
Tokenizing Strings in Java is much easier with "String.split()" since Java 1.4 :
String[] tokens = "(a+b)-21)".split("[-+)(]"); // the dash goes first so it is not read as a range
If it is homework, you probably have to reimplement a "split" method:
read the String character by character
if the character is not a special char, add it to a buffer
when you encounter a special char, add the buffer content to a list and clear the buffer
Since it is (probably) homework, I'll let you implement it.
Java lets you examine the characters in a String one by one with the charAt method. So use that in a for loop and examine each character. When you encounter a TOKEN you wrap that token with the pipes and any other character you just append to the output.
public static final char PLUS_TOKEN = '+';
// add all other tokens as constants in the same way
public String doStuff(String input)
{
StringBuilder output = new StringBuilder();
for (int index = 0; index < input.length(); index++)
{
if (input.charAt(index) == PLUS_TOKEN)
{
// when you see a token you need to append the pipes (|) around it
output.append('|');
output.append(input.charAt(index));
output.append('|');
}
// else if (...): compare the current character with each of the other tokens in the same way
else
{
// just add to new output
output.append(input.charAt(index));
}
}
return output.toString();
}
If it's not a homework assignment, use String.split(). If it is a homework assignment, say so and tag it so that we can give the appropriate level of help (I did so for you, just in case...).
Because the string needs to be cut in several different ways, not just on whitespace or parens, using the String.split method with any of the symbols there will not work. Split removes the character used as a separator. You could try to split on the empty string, but this wouldn't get compound symbols, like 21. To correctly parse this string, you will need to effectively implement your own tokenizer. Try thinking about how you could tell you had a complete token if you looked at the string one character at a time. You could probably start a string that collects the characters until you have identified a complete token, and then you can remove the characters from the original and return the string. Starting from this point, you can probably make a basic tokenizer.
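One possible shape for such a hand-written tokenizer, using the getToken() and getTokPos() names from the question, is sketched below. The class name, the decision to skip whitespace, returning null at the end of input and throwing on unknown characters are all assumptions, not requirements from the assignment. Instead of removing characters from the original string, it keeps an index into it.

```java
class SimpleTokenizer {
    private final String input;
    private int pos = 0;     // index of the next unread character
    private int tokPos = -1; // start index of the most recently returned token

    SimpleTokenizer(String input) {
        this.input = input;
    }

    // Returns the next token, or null when the input is exhausted (assumed convention).
    String getToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++; // skip whitespace between tokens (assumption)
        }
        if (pos >= input.length()) {
            return null;
        }
        tokPos = pos;
        char c = input.charAt(pos);
        if (isAlpha(c)) {
            while (pos < input.length() && isAlpha(input.charAt(pos))) pos++;
        } else if (isDigit(c)) {
            while (pos < input.length() && isDigit(input.charAt(pos))) pos++;
        } else if ("+-()".indexOf(c) >= 0) {
            pos++; // single-character token
        } else {
            throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
        }
        return input.substring(tokPos, pos);
    }

    // Position of the current (most recently returned) token in the input string.
    int getTokPos() {
        return tokPos;
    }

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        SimpleTokenizer t = new SimpleTokenizer("(a+b)-21)");
        StringBuilder out = new StringBuilder();
        String tok;
        while ((tok = t.getToken()) != null) {
            out.append(tok).append('|');
        }
        System.out.println(out); // (|a|+|b|)|-|21|)|
    }
}
```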
If you'd rather learn how to make a full strength tokenizer, most of them are defined by creating a regular expression that only matches the tokens.
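A sketch of that regex-driven approach for the tokens in the question, with made-up class and method names: each alternative in the pattern describes one kind of token, and Matcher.find() walks through the input. Be aware that find() silently skips characters that match no alternative, so a real tokenizer would need to detect such gaps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // One alternative per token kind: operator/paren, identifier, integer.
    private static final Pattern TOKEN = Pattern.compile("[-+()]|[A-Za-z_]+|[0-9]+");

    // Returns each token prefixed with its start position, e.g. "6:21".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.start() + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(a+b)-21)"));
        // [0:(, 1:a, 2:+, 3:b, 4:), 5:-, 6:21, 8:)]
    }
}
```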
I need to convert a string like
"string"
to
"*s*t*r*i*n*g*"
What's the regex pattern? Language is Java.
You want to match an empty string, and replace with "*". So, something like this works:
System.out.println("string".replaceAll("", "*"));
// "*s*t*r*i*n*g*"
Or better yet, since the empty string can be matched literally without regex, you can just do:
System.out.println("string".replace("", "*"));
// "*s*t*r*i*n*g*"
Why this works
It's because any instance of a string startsWith(""), and endsWith(""), and contains(""). Between any two characters in any string, there's an empty string. In fact, there are an infinite number of empty strings at these locations.
(And yes, this is true for the empty string itself. That is an "empty" string contains itself!).
The regex engine and String.replace automatically advance the index when looking for the next match in these kinds of cases to prevent an infinite loop.
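These claims are easy to check with a few lines; the class and method names below are only for the demonstration.

```java
class EmptyMatchDemo {
    // Inserts "*" at every position where the empty string matches.
    static String starred(String s) {
        return s.replace("", "*");
    }

    public static void main(String[] args) {
        System.out.println("abc".startsWith("")); // true
        System.out.println("abc".endsWith(""));   // true
        System.out.println("abc".contains(""));   // true
        System.out.println("".contains(""));      // true
        System.out.println(starred("ab"));        // *a*b*
        System.out.println(starred(""));          // *
    }
}
```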
A "real" regex solution
There's no need for this, but it's shown here for educational purpose: something like this also works:
System.out.println("string".replaceAll(".?", "*$0"));
// "*s*t*r*i*n*g*"
This works by matching "any" character with ., and replacing it with * and that character, by backreferencing to group 0.
To add the asterisk after the last character, we allow . to be matched optionally with .?. This works because ? is greedy and will always take a character if possible, i.e. everywhere except at the very end of the string, where it matches the empty string instead.
If the string may contain newline characters, then use Pattern.DOTALL/(?s) mode.
References
regular-expressions.info/Dot Matches (Almost) Any Character and Grouping and Backreferences
I think "" is the regex you want.
System.out.println("string".replaceAll("", "*"));
This prints *s*t*r*i*n*g*.
If this is all you're doing, I wouldn't use a regex:
public static String glitzItUp(String text) {
return insertPeriodically(text, "*", 1);
}
Putting char into a java string for each N characters
public static String insertPeriodically(
String text, String insert, int period)
{
StringBuilder builder = new StringBuilder(
text.length() + insert.length() * (text.length()/period)+1);
int index = 0;
while (index <= text.length())
{
builder.append(insert);
builder.append(text.substring(index,
Math.min(index + period, text.length())));
index += period;
}
return builder.toString();
}
Another benefit (besides simplicity) is that it's about ten times faster than a regex.
IDEOne | Working example
Just to be a jerk, I'm going to say use J:
I've spent a school year learning Java, and taught myself a bit of J over the course of the summer, and if you're going to be doing this for yourself, it's probably most productive to use J simply because this whole inserting an asterisk thing is easily done with one simple verb definition using one loop.
asterisked =: 3 : 0
i =. 0
running_String =. '*'
while. i < #y do.
NB. #y returns tally, or number of items in y: right operand to the verb
running_String =. running_String, (i{y) , '*'
i =. >: i
end.
]running_String
)
This is why I would use J: I know how to do this, and have only studied the language for a couple months loosely. This isn't as succinct as the whole .replaceAll() method, but you can do it yourself quite easily and edit it to your specifications later. Feel free to delete this/ troll this/ get inflamed at my suggestion of J, I really don't care: I'm not advertising it.