Regular Expression to restrict some special characters - java

I am trying to write a regular expression that restricts certain characters. Which characters to restrict depends on requirements from various users.
I am trying to use this regex: [(char1|char2|char3|...)$]
Note: each char comes from the requirement.
If the string the user entered contains any of these characters, I will return true. What I want to know is
whether this expression will work in all cases.
For example: requirement1 = .:, requirement2 = .:&%
I concatenate | between the characters and then generate the regular expression in Java. This works for requirement1 but not for requirement2.
My sample Java code:
String requirement = ":>&%";
String regExp1 = null;
for (int i = 0; i < requirement.length(); i++) {
regExp1 = "[(" + requirement.charAt(i);
if (i - 1 != requirement.length()) {
regExp1.concat("|");
}
}
if (regExp1 != null) {
regExp1.concat(")]$");
}
Pattern p = Pattern.compile(regExp);
Matcher m = p.matcher(arg);
if (m.find())
return true;
else
return false;
How can I generate standard regular expression?

If you want "one of these characters", the brackets are good enough. No need for parentheses and pipes; inside brackets, (, | and ) are just more literal characters to match.
Something like [.:,] and [.:&%] should work. If you want them one or more times, add + at the end of your regex (i.e. [.:&%]+).
As said in the comments, beware of special characters. The dot means "any character" outside brackets but is literal inside them; within brackets it is \, ], [, ^ and - that need escaping.
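Generating the character class from a requirement string could look roughly like this. It is a minimal sketch, not a definitive implementation: RestrictedChars, buildPattern and containsRestricted are made-up names, and it assumes every character of the requirement should be matched literally, so each non-alphanumeric character is escaped with a backslash.

```java
import java.util.regex.Pattern;

class RestrictedChars {
    // Builds a character class such as [\.\:\&\%] from the raw requirement string.
    // Assumption: the requirement is non-empty and lists literal characters only.
    static Pattern buildPattern(String requirement) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : requirement.toCharArray()) {
            // Java allows a backslash before any non-alphabetic character,
            // but digits must stay unescaped (\1 would be a back-reference).
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return Pattern.compile(sb.append(']').toString());
    }

    // Returns true if the input contains at least one restricted character.
    static boolean containsRestricted(String requirement, String input) {
        if (requirement.isEmpty()) {
            return false; // nothing is restricted, and "[]" would not compile
        }
        return buildPattern(requirement).matcher(input).find();
    }
}
```

For example, containsRestricted(".:&%", "a&b") is true, while containsRestricted(".:&%", "abc") is false.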

How to split a string and save the 2 characters that I split with?

I am trying to split a given string using Java's split method. The string should be divided at two different characters (+ and -), and I want to keep each of those characters in the array as well, at the same index as the substring that follows it.
for example :
input : String s = "4x^2+3x-2"
output :
arr[0] = 4x^2
arr[1] = +3x
arr[2] = -2
I know how to get the + or - characters in a different index between the numbers but it is not helping me,
any suggestions please?
You can approach this problem in many ways. I'm sure there are clever and fancy ways to split this expression. I will show you the simplest problem-solving process that can help you.
State the problem you need to solve, the input and output
Problem: Split a math expression into subexpressions at + and - signs
Input: 4x^2+3x-2
Output: 4x^2,+3x,-2
Create pseudocode with some logic you think might work
Given an expression string
Create an empty list of expressions
Create a subExpression string
For each character in the expression
    Check if the character is + or -, then
        add the subExpression to the list and start a new subExpression with that character
    otherwise, append the character to the subExpression
In the end, add the remaining subExpression to the list
Implement the pseudocode in the programming language of your choice
String expression = "4x^2+3x-2";
List<String> expressions = new ArrayList<>();
StringBuilder subExpression = new StringBuilder();
for (int i = 0; i < expression.length(); i++) {
    char character = expression.charAt(i);
    if (character == '-' || character == '+') {
        // Skip the empty subexpression that a leading sign would otherwise produce.
        if (subExpression.length() > 0) {
            expressions.add(subExpression.toString());
        }
        // Start the next subexpression with the sign so it is not lost.
        subExpression = new StringBuilder(String.valueOf(character));
    } else {
        subExpression.append(character);
    }
}
expressions.add(subExpression.toString());
System.out.println(expressions);
Output
[4x^2, +3x, -2]
You will end with one algorithm that works for your problem. You can start to improve it.
Try this code:
String s = "4x^2+3x-2";
s = s.replace("+", "#+");
s = s.replace("-", "#-");
String[] ss = s.split("#");
for (int i = 0; i < ss.length; i++) {
Log.e("XOP",ss[i]);
}
This code replaces + and - with #+ and #- respectively and then splits the string with #. That way the + and - operators are not lost in the result.
If you require # as input character then you can use any other Unicode character instead of #.
Try this one:
String s = "4x^2+3x-2";
String[] arr = s.split("[\\+-]");
for(int i=0;i<arr.length;i++){
System.out.println(arr[i]);
}
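Note that splitting on [\\+-] consumes the signs, so this prints 4x^2, 3x and 2. To keep each sign with the term that follows it, one option is to split on a zero-width lookahead instead. A minimal sketch (SignedSplit and splitKeepingSigns are illustrative names only):

```java
class SignedSplit {
    // (?=[+-]) matches the empty position just before a + or -,
    // so the split consumes no characters and each sign stays with its term.
    static String[] splitKeepingSigns(String s) {
        return s.split("(?=[+-])");
    }
}
```

For example, SignedSplit.splitKeepingSigns("4x^2+3x-2") yields the three elements 4x^2, +3x and -2.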
Personally I like it better to have positive matches of patterns, especially if the split pattern itself is empty.
So for instance you could use a Pattern and Matcher like this:
Pattern p = Pattern.compile("(^|[+-])([^+-]*)");
Matcher m = p.matcher("4x^2+3x-2");
while (m.find()) {
System.out.printf("%s or %s %s%n", m.group(), m.group(1), m.group(2));
}
This matches the start of the string or a plus or minus: ^|[+-], followed by any amount of characters that are not a plus or minus: [^+-]*.
Do note that the ^ first matches the start of the string, and is then used to negate a character class when used between brackets. Regular expressions are tricky like that.
Bonus: you can also use the two groups (within the parentheses in the pattern) to match the operators - if any.
All this is presuming that you want to use/test regular expressions; generally things like this require a parser rather than a regular expression.
A one-liner for persons thinking that this is too complex:
var expressions = Pattern.compile("(?:^|[+-])[^+-]*")
        .matcher("4x^2+3x-2")
        .results()
        .map(r -> r.group())
        .collect(Collectors.toList());

java pattern regular expression matching

I'm really bad with pattern matching. I'm trying to take in a password and just check that it meets these criteria:
contains at least 1 lowercase letter
contains at least 1 uppercase letter
contains at least 1 number
contains at least one of these special chars: @#$%
has a minimum length of 8 characters
has a maximum length of 10 characters
This is what I have:
Pattern pattern = Pattern.compile("((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{8,10})");
Matcher matcher = pattern.matcher(in);
if(!matcher.find())
{
return false;
}
else
{
return true;
}
I would also like to do something like this:
int MIN = 8,
MAX = 10;
"((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{MIN,MAX})"
but I get some weird message about malformed expression.
Something isn't right. My program crashes with this. I don't know what's wrong. Any ideas?
private boolean isValidPassword(String in)
{
/* PASSWORD MUST:
* contains at least 1 lowercase letter
* contains at least 1 uppercase letter
* contains at least 1 number
* contains at least one of these special chars: @#$%
* has a minimum length of 8 characters
* has a maximum length of 10 characters
*/
Pattern hasLowercase = Pattern.compile(".*[a-z].*");
Pattern hasUppercase = Pattern.compile(".*[A-Z].*");
Pattern hasNumber = Pattern.compile(".*[0-9].*");
Pattern hasSpecial = Pattern.compile(".*[@#$%].*");
Matcher matcher = hasLowercase.matcher(in);
if (!matcher.matches()) //a-z
{
return false;
}
matcher = hasUppercase.matcher(in);
if (!matcher.matches()) //A-Z
{
return false;
}
matcher = hasNumber.matcher(in);
if (!matcher.matches()) //0-9
{
return false;
}
matcher = hasSpecial.matcher(in);
if (!matcher.matches()) //@#$%
{
return false;
}
if(in.length() < MIN_LENGTH || in.length() > MAX_LENGTH) //length must be min-to-max.
{
return false;
}
return true;
}
If you really want to do this with regular expressions, it would be much easier to test the input against multiple simple expressions rather than one single and excessively complex expression.
Test your input against the following regexes.
If one of them fails, then the input is invalid.
.*[a-z].*
.*[A-Z].*
.*[0-9].*
.*[@#$%].*
Additionally, check the length of the input with basic string methods.
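The per-rule checks above could be sketched roughly as follows. The class name PasswordRules is made up for illustration, the special-character set is assumed to be @#$%, and find() is used so that the patterns need no leading and trailing .*:

```java
import java.util.regex.Pattern;

class PasswordRules {
    static final int MIN = 8;  // minimum length from the question
    static final int MAX = 10; // maximum length from the question

    // One small pattern per rule; each must be found somewhere in the password.
    private static final Pattern[] RULES = {
        Pattern.compile("[a-z]"),   // at least one lowercase letter
        Pattern.compile("[A-Z]"),   // at least one uppercase letter
        Pattern.compile("[0-9]"),   // at least one digit
        Pattern.compile("[@#$%]")   // at least one special char (assumed set)
    };

    static boolean isValidPassword(String in) {
        // The length rule is a plain string check, no regex needed.
        if (in == null || in.length() < MIN || in.length() > MAX) {
            return false;
        }
        for (Pattern rule : RULES) {
            if (!rule.matcher(in).find()) {
                return false;
            }
        }
        return true;
    }
}
```

Inside brackets the $ is a literal character, which is why a character class is used here rather than an alternation with a bare $.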
I am not sure how to help you with crashing without more information, but I do have a suggestion.
Instead of trying to create one giant regex expression, I would recommend making one expression for each rule, then test them all on the string individually. This allows you to easily edit the individual rules if you decide you want to change/add/remove rules. This also makes them easier to understand.
There is also the option of not using regex, which would make your rules pretty easy using the string contains method with these character classes
As for the malformed expression, you should concatenate MIN and MAX like this:
"((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{" + MIN + "," + MAX + "})", which inserts the values of MIN and MAX into the string.
I think that your expression might be off, but I found one that meets what you are looking for.
"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*]).{8,10}$"
Inside a character class, $, ^ and * need no backslashes, and in a Java string literal \$ would not even compile. The trailing .{8,10}$ is what enforces the maximum length; a bare lookahead (?=.{8,10}) would also accept longer input.
You can modify the min and max length by using
"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*]).{" + MIN + "," + MAX + "}$"
I have included this RegEx in Regexr so you can see how it works.
http://regexr.com/3gnbd
Also, for future reference when testing regular expressions, regexr.com is very helpful for seeing the different components.
You also don't need an if/else statement to return true or false, because you can just return the tested condition: return matcher.find() eliminates the need for the if statement.

Replacing Strings with a number in it without a for loop

So I currently have this code;
for (int i = 1; i <= this.max; i++) {
in = in.replace("{place" + i + "}", this.getUser(i)); // Get the place of a user.
}
This works well, but I would like to keep it simple (using pattern matching),
so I used this code to check whether it matches:
System.out.println(StringUtil.matches("{place5}", "\\{place\\d\\}"));
StringUtil's matches method:
public static boolean matches(String string, String regex) {
if (string == null || regex == null) return false;
Pattern compiledPattern = Pattern.compile(regex);
return compiledPattern.matcher(string).matches();
}
This returns true. Then comes the next part I need help with: replacing the {place5} so I can parse the number. I could strip "{place" and "}", but what if there are several placeholders in one string ("{place5} {username}")? Then I can't do that anymore, as far as I'm aware. If you know a simple way to do this, please let me know; if not, I can just stick with the for loop.
then comes the next part I need help with, replacing the {place5} so I can parse the number
In order to obtain the number after {place, you can use
s = s.replaceAll(".*\\{place(\\d+)}.*", "$1");
The regex matches an arbitrary number of characters before the string we are searching for, then {place, then matches and captures one or more digits with (\d+), then }, and finally matches the rest of the string with .*. Note that if the string contains newline characters, you should prepend (?s) to the pattern. $1 in the replacement pattern "restores" the captured value.
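If the goal is to replace every {placeN} in one pass rather than extract a single number, a Matcher loop is one option. A minimal sketch, assuming the lookup is passed in as a function standing in for this.getUser(i); PlaceholderReplacer and fill are hypothetical names:

```java
import java.util.function.IntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderReplacer {
    // Captures the digits between "{place" and "}".
    private static final Pattern PLACE = Pattern.compile("\\{place(\\d+)\\}");

    // getUser is a stand-in for the question's this.getUser(i).
    static String fill(String in, IntFunction<String> getUser) {
        Matcher m = PLACE.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int place = Integer.parseInt(m.group(1));
            // quoteReplacement keeps $ and \ in user names from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(getUser.apply(place)));
        }
        m.appendTail(out); // copy whatever follows the last placeholder
        return out.toString();
    }
}
```

With a lookup of i -> "user" + i, the input {place5} {username} {place12} becomes user5 {username} user12; other placeholders such as {username} are left alone.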

Efficient Regular Expression for big data, if a String contains a word

I have code that works but is extremely slow. It determines whether a string contains a keyword. It needs to be efficient for hundreds of keywords that I will search for in thousands of documents.
What can I do to make finding the keywords efficient (without falsely matching a word that merely contains the keyword)?
For example:
String keyword="ac";
String document = "..."; // a file a few pages long
If I use:
if(document.contains(keyword) ){
//do something
}
It will also return true if document contains a word like "account";
so I tried to use regular expression as follows:
String pattern = "(.*)([^A-Za-z]"+ keyword +"[^A-Za-z])(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(document);
if(m.find()){
//do something
}
Summary:
Hopefully this will be useful to someone else:
My regular expression would work, but it is extremely impractical when working with big data (it didn't terminate).
@anubhava perfected the regular expression. It was easy to understand and implement, and it managed to terminate, which is a big thing, but it was still a bit slow (roughly 240 seconds).
@Tomalak's solution is a bit complex to implement and understand, but it was the fastest solution, so hats off, mate (18 seconds).
So @Tomalak's solution was ~13 times faster than @anubhava's.
Don't think you need to have .* in your regex.
Try this regex:
String pattern = "\\b"+ Pattern.quote(keyword) + "\\b";
Here \\b is a word boundary. If the keyword can contain special characters, make sure they are not at the start or end of the keyword, or the word boundaries will fail to match.
You must also use Pattern.quote if your keyword contains special regex characters.
EDIT: You can use this regex if your keywords are separated by whitespace.
String pattern = "(?<=\\s|^)"+ Pattern.quote(keyword) + "(?=\\s|$)";
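Used with find(), the word-boundary version might look like this. It is only a sketch; KeywordMatch and containsWord are illustrative names, and when the same keyword is checked against many documents the compiled Pattern should be built once and reused rather than recompiled per call:

```java
import java.util.regex.Pattern;

class KeywordMatch {
    // Returns true only if the keyword occurs as a whole word in the document.
    static boolean containsWord(String document, String keyword) {
        // Pattern.quote makes any regex metacharacters in the keyword literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b");
        return p.matcher(document).find();
    }
}
```

For example, containsWord("open an account", "ac") is false, while containsWord("the ac unit", "ac") is true.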
The fastest-possible way to find substrings in Java is to use String.indexOf().
To achieve "entire-word-only" matches, you would need to add a little bit of logic to check the characters before and after a possible match to make sure they are non-word characters:
public class IndexOfWordSample {
    public static void main(String[] args) {
        String input = "There are longer strings than this not very long one.";
        String search = "long";
        int index = indexOfWord(input, search);
        if (index > -1) {
            System.out.println("Hit for \"" + search + "\" at position " + index + ".");
        } else {
            System.out.println("No hit for \"" + search + "\".");
        }
    }

    public static int indexOfWord(String input, String word) {
        String nonWord = "^\\W?$", before, after;
        int index, before_i, after_i = 0;
        while (true) {
            index = input.indexOf(word, after_i);
            if (index == -1 || word.isEmpty()) break;
            before_i = index - 1;
            after_i = index + word.length();
            before = "" + (before_i > -1 ? input.charAt(before_i) : "");
            after = "" + (after_i < input.length() ? input.charAt(after_i) : "");
            if (before.matches(nonWord) && after.matches(nonWord)) {
                return index;
            }
        }
        return -1;
    }
}
This would print:
Hit for "long" at position 44.
This should perform better than a pure regular expressions approach.
Consider whether ^\W?$ already matches your expectation of a "non-word" character. The regular expression is a compromise here and may cost performance if your input string contains many "almost"-matches.
For extra speed, ditch the regex and work with the Character class, checking a combination of the many properties it provides (like isAlphabetic, etc.) for before and after.
I've created a Gist with an alternative implementation that does that.
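For illustration, a regex-free variant could look roughly like the following. This is a sketch written for this answer, not the code from the Gist; WordSearch and isWordChar are made-up names, and "word character" is assumed to mean a letter, a digit or an underscore:

```java
class WordSearch {
    // Returns the index of the first whole-word occurrence of word, or -1.
    static int indexOfWord(String input, String word) {
        if (word.isEmpty()) {
            return -1;
        }
        int from = 0;
        while (true) {
            int index = input.indexOf(word, from);
            if (index == -1) {
                return -1;
            }
            int end = index + word.length();
            // A hit counts only if it is not glued to a word character on either side.
            boolean startOk = index == 0 || !isWordChar(input.charAt(index - 1));
            boolean endOk = end == input.length() || !isWordChar(input.charAt(end));
            if (startOk && endOk) {
                return index;
            }
            from = index + 1; // keep searching after this rejected hit
        }
    }

    // Assumed definition of a word character; adjust to your data.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

On the sample sentence above this also returns 44 for "long", because the earlier hit inside "longer" is rejected.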

Splitting input string for a calculator

I'm trying to split the input given by the user for my calculator.
For example,
if the user inputs "23+45*(1+1)" I want this to be split into [23,+,45,*,(,1,+,1,)].
What you're looking for is called a lexer. A lexer splits up input into chunks (called tokens) that you can read.
Fortunately, your lexer is pretty simple and can be written by hand. For more complicated lexers, you can use flex (as in "The Fast Lexical Analyzer"--not Adobe Flex), or (since you're using Java) ANTLR (note, ANTLR is much more than just a lexer).
Simply come up with a list of regular expressions, one for each token to match (note that since your input is so simple, you can probably do away with this list and merge them all into one single regex. However, for more advanced lexers, it helps to do one regex for each token) e.g.
\d+
\+
-
\*
/
\(
\)
Then start a loop: while there are more characters to be parsed, go through each of your regular expressions and attempt to match them against the beginning of the string. If they match, add the first matched group to your list of input. Otherwise, continue matching (if none of them match, tell the user they have a syntax error).
Pseudocode:
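List<String> input = new LinkedList<String>();
while (userInputString.length() > 0) {
    boolean matched = false;
    for (final Pattern p : myRegexes) {
        final Matcher m = p.matcher(userInputString);
        if (m.find()) {
            input.add(m.group());
            // Remove the token we found from the user's input string so that we
            // can match the rest of the string against our regular expressions.
            userInputString = userInputString.substring(m.group().length());
            matched = true;
            break;
        }
    }
    if (!matched) {
        // No token pattern fits here: report a syntax error instead of looping forever.
        throw new IllegalArgumentException("Syntax error near: " + userInputString);
    }
}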
Implementation notes:
You may want to prepend the ^ character to all of your regular expressions. This makes sure you anchor your matches against the beginning of the string. My pseudocode assumes you have done this.
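Putting the pieces together, here is a minimal sketch of such a lexer that uses the single merged regex mentioned above instead of a list of patterns. CalcLexer and tokenize are illustrative names, and whitespace is not handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CalcLexer {
    // One token is either a run of digits or a single operator/parenthesis.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/()]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict matching to the unread part; lookingAt() then requires
            // the token to start exactly at pos, like a ^ anchor would.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Syntax error at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }
}
```

Calling CalcLexer.tokenize("23+45*(1+1)") gives the list [23, +, 45, *, (, 1, +, 1, )]. Advancing an index avoids creating a new substring for every token.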
I think using stacks to split the operand and operator and evaluate the expression would be more appropriate. In the calculator we generally use Infix notation to define the arithmetic expression.
Operand1 op Operand2
Check the Shunting-yard algorithm used in many such cases to parse the mathematical expression. This is also a good read.
This might be a little sloppy, because I am learning still, but it does split them into strings.
import java.util.ArrayList;
import java.util.Scanner;

public class TestClass {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        ArrayList<String> separatedInput = new ArrayList<String>();
        System.out.print("Values: ");
        String input = sc.next();
        String numbers = ""; // digits of the number currently being read
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (Character.isDigit(ch)) {
                numbers = numbers + ch;
            } else {
                // An operator or parenthesis ends the current number, if there is one.
                // Without this check, "*(" would add an empty string to the list.
                if (!numbers.isEmpty()) {
                    separatedInput.add(numbers);
                    numbers = "";
                }
                separatedInput.add(String.valueOf(ch));
            }
        }
        // Flush a number that ends the input.
        if (!numbers.isEmpty()) {
            separatedInput.add(numbers);
        }
        System.out.println(separatedInput);
    }
}
