I need a Java regular expression - java

I am currently using the following regular expression:
^[a-zA-Z]{0,}(\\*?)?[a-zA-Z0-9]{0,}
to check that a string starts with an alphabetic character, ends with alphanumeric characters, and contains an asterisk (*) anywhere in the string at most once. The problem is that a string still passes if it starts with a number but doesn't contain an *, which should fail. How can I rework the regex to fail this case?
ex.
TE - pass
*TE - pass
TE* - pass
T*E - pass
*9TE - pass
*TE* - fail (multiple asterisk)
9E - fail (starts with number)
EDIT:
Sorry to introduce a late edit but I also need to ensure that the string is 8 characters or less, can I include that in the regex as well? Or should I just check the string length after the regex validation?

This passes your example:
"^([a-zA-Z]+\\*?|\\*)[a-zA-Z0-9]*$"
It says:
start with: [a-zA-Z]+\\*? (one or more letters and maybe a star)
| (or)
\\* a single star
and end with [a-zA-Z0-9]* (zero or more alphanumeric characters)
Code to test it:
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(final String[] args) {
        // Same pattern as above; [a-zA-Z0-9] is used instead of \w because \w also matches '_'.
        final Pattern p = Pattern.compile("^([a-zA-Z]+\\*?|\\*)[a-zA-Z0-9]*$");
        System.out.println(p.matcher("TE").matches());   // true
        System.out.println(p.matcher("*TE").matches());  // true
        System.out.println(p.matcher("TE*").matches());  // true
        System.out.println(p.matcher("T*E").matches());  // true
        System.out.println(p.matcher("*9TE").matches()); // true
        System.out.println(p.matcher("*TE*").matches()); // false
        System.out.println(p.matcher("9E").matches());   // false
    }
}
Per Stargazer, if you allow alphanumeric before the star, then use this:
^([a-zA-Z][a-zA-Z0-9]*\\*?|\\*)[a-zA-Z0-9]*$

One possible way is to separate into 2 conditions:
^(?=[^*]*\*?[^*]*$)[a-zA-Z*][a-zA-Z0-9*]*$
The (?=[^*]*\*?[^*]*$) part ensures there is at most one * in the string.
The [a-zA-Z*][a-zA-Z0-9*]* part ensures it starts with a letter or a *, followed only by alphanumeric characters or *.
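As a rough sketch, this is how the lookahead pattern might be used from Java, with the 8-character limit from the question's edit folded in by bounding the second quantifier. The class and method names are made up for illustration, and the {0,7} bound is one possible way to express the limit, not the only one.

```java
import java.util.regex.Pattern;

class StarPatternCheck {
    // Lookahead: at most one '*' in the whole string.
    // Body: first char is a letter or '*', then up to 7 more letters, digits or '*'
    // (the {0,7} bound is an assumption that caps the total length at 8).
    static final Pattern P =
            Pattern.compile("^(?=[^*]*\\*?[^*]*$)[a-zA-Z*][a-zA-Z0-9*]{0,7}$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] tests = {"TE", "*TE", "TE*", "T*E", "*9TE", "*TE*", "9E", "ABCDEFGHI"};
        for (String t : tests) {
            System.out.println(t + " - " + (isValid(t) ? "pass" : "fail"));
        }
    }
}
```

Checking the length separately with s.length() after the regex validation would work just as well.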

It might be easier to develop and maintain later if you just break your regular expression into a few pieces, e.g., one for the start and end, and one for the asterisk. I am not sure what the overall performance effect would be: you would have simpler expressions but would have to run several of them.
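A minimal sketch of that idea, assuming the rules from the question (letters, digits and at most one asterisk, no leading digit, 8 characters or fewer). The class name, the method name and the exact split into four checks are illustrative choices only.

```java
class SplitChecks {
    static boolean isValid(String s) {
        // Check 1: length (1 to 8 characters; the lower bound is an assumption).
        if (s.isEmpty() || s.length() > 8) return false;
        // Check 2: only letters, digits and asterisks are allowed.
        if (!s.matches("[a-zA-Z0-9*]+")) return false;
        // Check 3: must not start with a digit.
        if (Character.isDigit(s.charAt(0))) return false;
        // Check 4: at most one asterisk (first and last occurrence coincide, or both are -1).
        return s.indexOf('*') == s.lastIndexOf('*');
    }

    public static void main(String[] args) {
        String[] tests = {"TE", "*TE", "TE*", "T*E", "*9TE", "*TE*", "9E"};
        for (String t : tests) {
            System.out.println(t + " - " + (isValid(t) ? "pass" : "fail"));
        }
    }
}
```

Each check is trivial to read on its own, at the cost of scanning the string more than once.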

This is Python, it'll need some massaging for Java:
>>> import re
>>> p = re.compile('^([a-z][^*]*[*]?[^*]*[a-z0-9]|[*][^*]*[a-z0-9]|[a-z][^*]*[*])$', re.I)
>>> for test in ['TE', '*TE', 'TE*', 'T*E', '*9TE', '*TE*', '9E']:
...     if p.match(test):
...         print test, 'pass'
...     else:
...         print test, 'fail'
...
TE pass
*TE pass
TE* pass
T*E pass
*9TE pass
*TE* fail
9E fail
Hope I didn't miss anything.

How about this, it's easier to read:
boolean pass = input.replaceFirst("\\*", "").matches("^[a-zA-Z].*\\w$");
Assuming I read right, you want to:
Start with an alpha character
End with an alphanumeric character
Allow up to one * anywhere

At most one asterisk, alphabetic characters anywhere and numbers anywhere but at start.
String alpha = "[a-zA-Z]";
String alnum = "[a-zA-Z0-9]";
String asteriskNone = "^" + alpha + "+" + alnum + "*";
String asteriskStart = "^\\*" + alnum + "*";
String asteriskInside = "^" + alpha + "+" + alnum + "*\\*" + alnum + "*"; // alnum* (not alnum+) before the star, so that "T*E" passes
String yourRegex = asteriskNone + "|" + asteriskStart + "|"
+ asteriskInside;
String[] tests = {"TE","*TE","TE*","T*E","*9TE","*TE*", "9E"};
for (String test : tests)
System.out.println(test + " " + (test.matches(yourRegex)?"PASS":"FAIL"));

Look for two possible patterns, one starting with *, and one with an alpha char:
^([a-zA-Z][a-zA-Z0-9]*\*?[a-zA-Z0-9]*|\*[a-zA-Z0-9]*)$

^([a-zA-Z][a-zA-Z0-9]*\*|\*|[a-zA-Z])([a-zA-Z0-9])*$
The parentheses around the second half are for clarity and can safely be omitted.

This was a tough one (liked the challenge), but here it is:
^(\*[a-zA-Z0-9]+|[a-zA-Z]+[\*]{1}[a-zA-Z]*)$
In order to comply with T9*Z, as pointed out on another post with StarGazer712, I had to change it to:
^(\*[a-zA-Z0-9]+|[a-zA-Z]{1}[a-zA-Z0-9]*[\*]{1}[a-zA-Z0-9]*)$

Related

Java date pattern separated by hyphen

I want to know whether this is the right pattern for matching strings like the following:
String samples
23.04.2019-30.04.2019
3.06.2019-20.06.2019
Pattern
private final Pattern TIMELINE_PATTERN = Pattern.compile("^\\d{2}.\\d{2}.\\d{4}-\\d{2}.\\d{2}.\\d{4}$");
If the day/month components could be one or two digit characters, then you should use this pattern:
^\d{1,2}\.\d{1,2}\.\d{4}-\d{1,2}\.\d{1,2}\.\d{4}$
Presumably the years might also not be fixed width, but it is probably unlikely that a year earlier than 1000 would appear, so we can fix the year at 4 digits. Also, literal dot in a regex pattern needs to be escaped with a backslash.
Edit:
If you want to first validate the string, and then separate the two dates, then consider this:
String input = "3.06.2019-20.06.2019";
if (input.matches("\\d{1,2}\\.\\d{1,2}\\.\\d{4}-\\d{1,2}\\.\\d{1,2}\\.\\d{4}")) {
String[] dates = input.split("-");
System.out.println("date1: " + dates[0]);
System.out.println("date2: " + dates[1]);
}
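If the two dates are needed as real date values rather than strings, capture groups plus java.time could be used. The following is only a sketch: the class and method names are hypothetical, and the formatter pattern "d.M.yyyy" is an assumption chosen to match the one- or two-digit day and month allowed by the regex.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRange {
    // Two capture groups, one per date, separated by the hyphen.
    private static final Pattern RANGE = Pattern.compile(
            "(\\d{1,2}\\.\\d{1,2}\\.\\d{4})-(\\d{1,2}\\.\\d{1,2}\\.\\d{4})");
    // "d.M.yyyy" accepts one- or two-digit day and month values.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("d.M.yyyy");

    static LocalDate[] parseRange(String input) {
        Matcher m = RANGE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a date range: " + input);
        }
        // LocalDate.parse throws DateTimeParseException for values such as 45.13.2019.
        return new LocalDate[] {
            LocalDate.parse(m.group(1), FMT),
            LocalDate.parse(m.group(2), FMT)
        };
    }

    public static void main(String[] args) {
        LocalDate[] range = parseRange("3.06.2019-20.06.2019");
        System.out.println("date1: " + range[0]);
        System.out.println("date2: " + range[1]);
    }
}
```

The regex only checks the shape of the string; the parsing step is what rejects out-of-range day and month numbers.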
Two problems in your current regex,
First quantifier needs to be {1,2} instead of just {2} to support either one digit or two
You need to escape dot
The correct regex you need to use should be this,
^\d{1,2}\.\d{2}\.\d{4}-\d{2}\.\d{2}\.\d{4}$
Java code,
List<String> list = Arrays.asList("23.04.2019-30.04.2019", "3.06.2019-20.06.2019");
list.forEach(x -> {
System.out.println(x + " --> " + x.matches("^\\d{1,2}\\.\\d{2}\\.\\d{4}-\\d{2}\\.\\d{2}\\.\\d{4}$"));
});
Prints,
23.04.2019-30.04.2019 --> true
3.06.2019-20.06.2019 --> true

Regex does not store the element in the first index

I have a function which takes a String containing a math expression such as 6+9*8 or 4+9 and it evaluates them from left to right (without normal order of operation rules).
I've been stuck on this problem for the past couple of hours and have finally found the culprit, BUT I have no idea why it does what it does. When I split the string with regex (.split("\\d") and .split("\\D")), I put the results into two arrays: an int[] containing the numbers involved in the expression and a String[] containing the operations.
What I've realized is that when I do the following:
String question = "5+9*8";
String[] mathOperations = question.split("\\d");
for(int i = 0; i < mathOperations.length; i++) {
System.out.println("Math Operation at " + i + " is " + mathOperations[i]);
}
it does not put the first operation sign in index 0, rather it puts it in index 1. Why is this?
This is the system.out on the console:
Math Operation at 0 is
Math Operation at 1 is +
Math Operation at 2 is *
Because on position 0 of mathOperations there's an empty String. In other words
mathOperations = {"", "+", "*"};
According to split documentation
The array returned by this method contains each substring of this
string that is terminated by another substring that matches the given
expression or is terminated by the end of the string. ...
Why isn't there an empty string at the end of the array too?
Trailing empty strings are therefore not included in the resulting
array.
More detailed explanation - your regex matched the String like this:
"(5)+(9)*(8)" -> "" + (5) + "+" + (9) + "*" + (8) + ""
but the trailing empty string is discarded as specified by the documentation.
(hope this silly illustration helps)
Also worth noting: the regex you used, "\\d", would split the string "55+5" into
["", "", "+"]
That's because you match only a single digit at a time; you should probably use "\\d+".
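Putting this together, a left-to-right evaluator could look roughly like the sketch below. It assumes non-negative integers, only the operators + - * /, and no whitespace; the class and method names are invented for the example.

```java
class LeftToRight {
    static int evaluate(String expr) {
        String[] numbers = expr.split("\\D+"); // "5+9*8" -> ["5", "9", "8"]
        String[] ops = expr.split("\\d+");     // "5+9*8" -> ["", "+", "*"]
        int result = Integer.parseInt(numbers[0]);
        for (int i = 1; i < numbers.length; i++) {
            int n = Integer.parseInt(numbers[i]);
            // ops[0] is the leading empty string, so ops[i] is the operator before numbers[i].
            switch (ops[i]) {
                case "+": result += n; break;
                case "-": result -= n; break;
                case "*": result *= n; break;
                case "/": result /= n; break;
                default: throw new IllegalArgumentException("Unknown operator: " + ops[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("5+9*8")); // (5 + 9) * 8 = 112
    }
}
```

Because the leading empty string shifts the operators by one, the indices of the two arrays line up without any extra bookkeeping.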
You may find the following variation on your program helpful, as one split does the jobs of both of yours...
public class zw {
public static void main(String[] args) {
String question = "85+9*8-900+77";
String[] bits = question.split("\\b");
for (int i = 0; i < bits.length; ++i) System.out.println("[" + bits[i] + "]");
}
}
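and its output (as produced on Java 7; from Java 8 onward, a zero-width match at the start of the string no longer yields the leading empty element):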
[]
[85]
[+]
[9]
[*]
[8]
[-]
[900]
[+]
[77]
In this program, I used \b as a "zero-width boundary" to do the splitting. No characters were harmed during the split, they all went into the array.
More info here: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
and here: http://www.regular-expressions.info/wordboundaries.html

Regular expression query (runtime customizable)

I have a special requirement: my regular expression pattern will be determined at run time. For example, I have a date and would like it to be checked against mm-dd-yyyy or mm/dd/yyyy or d.mm.yyyy. Basically, I would be feeding in a pattern such as NN-NN-TTTT, where N means a number and T means a letter, and the expression can be anything. Can we write a regular expression that will work for this kind of requirement?
My form will look like the one at http://jsfiddle.net/E2EHZ/; the data will be matched against the pattern specified in the text box.
T - letter
N - Numeric
A - Alphanum
So essentially you would have your users enter a pattern containing T, N or A as placeholders with other characters that need to match literally in between? If so, then it's rather easy: Just replace your placeholders by appropriate character classes, quote the rest (so regex metacharacters are escaped) and use the result as a regex.
First escape everything that is not A, N or T. How to do this varies by language, but essentially you'd replace [^ANT]+ by an escaped version of the match. In C# it might look like this:
Regex.Replace(s, "[^ANT]+", m => Regex.Escape(m.Value));
or in Java:
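s.replaceAll("[^ANT]+", "\\\\Q$0\\\\E"); // backslashes are doubled because replaceAll treats \ as an escape in the replacement string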
The translations to perform then are easy:
T → [a-zA-Z]
N → [0-9]
A → [0-9a-zA-Z]
That is, assuming ASCII-only. For Unicode you might want
T → \p{L}
N → \p{Nd}
A → [\p{L}\p{Nd}]
instead. Also note that if you perform simple string replacements, the order matters: replace A first for the ASCII versions (the replacement for T, [a-zA-Z], itself contains an A) and N first for the Unicode variants (\p{Nd} contains an N), so that you don't replace letters inside earlier results.
In the end you might want to prefix your string with ^ and suffix it with $ if you want to match complete strings.
A sample implementation in C# (with a tiny optimisation):
string CreateRegex(string pattern) {
string result = Regex.Replace(pattern, "[^ANT]+", m => Regex.Escape(m.Value));
result = Regex.Replace(result, "A+", m => "[0-9a-zA-Z]" + (m.Length > 1 ? "{"+m.Length+"}" : ""));
result = Regex.Replace(result, "T+", m => "[a-zA-Z]" + (m.Length > 1 ? "{"+m.Length+"}" : ""));
result = Regex.Replace(result, "N+", m => "[0-9]" + (m.Length > 1 ? "{"+m.Length+"}" : ""));
return "^" + result + "$";
}
which for example results in the following:
NN-NN-TTTT → ^[0-9]{2}-[0-9]{2}-[a-zA-Z]{4}$
*(#&#^(&%(# AA-AA-NN-TTTTTTTT lreglig → \*\(#&\#\^\(&%\(#\ \ [0-9a-zA-Z]{2}-[0-9a-zA-Z]{2}-[0-9]{2}-[a-zA-Z]{8}\ lreglig
Or in Java (without said optimisation, because I cannot figure out how to use a function as replacement):
String createRegex(String pattern) {
String result = pattern.replaceAll("[^ANT]+", "\\\\Q$0\\\\E"); // "\\\\Q" emits a literal \Q; a single escaped backslash would be swallowed by replaceAll
result = result.replaceAll("A", "[0-9a-zA-Z]");
result = result.replaceAll("T", "[a-zA-Z]");
result = result.replaceAll("N", "[0-9]");
return "^" + result + "$";
}
The resulting regexes will be a bit longer because the code above won't use repetition for identical tokens.
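For what it's worth, the repetition optimisation can be had in Java without a replacement function by walking the pattern with a Matcher and building the result by hand. This is a sketch only: the class name and the TOKEN pattern are my own, and Pattern.quote is used for the literal parts instead of hand-written \Q...\E.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRegex {
    // A run of one placeholder letter, or a run of literal (non-placeholder) characters.
    private static final Pattern TOKEN = Pattern.compile("A+|T+|N+|[^ANT]+");

    static String createRegex(String pattern) {
        StringBuilder sb = new StringBuilder("^");
        Matcher m = TOKEN.matcher(pattern);
        while (m.find()) {
            String run = m.group();
            String cls;
            switch (run.charAt(0)) {
                case 'A': cls = "[0-9a-zA-Z]"; break;
                case 'T': cls = "[a-zA-Z]"; break;
                case 'N': cls = "[0-9]"; break;
                default:
                    // Literal text: quote it so regex metacharacters lose their meaning.
                    sb.append(Pattern.quote(run));
                    continue;
            }
            sb.append(cls);
            // Collapse repeated placeholders into a {n} quantifier.
            if (run.length() > 1) {
                sb.append('{').append(run.length()).append('}');
            }
        }
        return sb.append('$').toString();
    }

    public static void main(String[] args) {
        String regex = createRegex("NN-NN-TTTT");
        System.out.println(regex);
        System.out.println("12-34-abcd".matches(regex));
    }
}
```

Building the string directly also sidesteps the replacement-order issue, since no placeholder letter is ever searched for in already generated output.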

Regular expression pattern to find a number within a semicolon delimited list of numbers

String temp = "77"; // It can be 0 or 100 or any value
// So the pattern will be like this only but number can be change anytime
String inclusion = "100;0;77;200;....;90";
I need to write a regular expression to check whether temp exists in inclusion, so I wrote a regexPattern like this.
// This is the regular Expression I wrote.
String regexPattern = "(^|.*;)" + temp + "(;.*|$)";
Will this regular expression work every time, or is there a problem with that regexPattern?
if(inclusion.matches(regexPattern)) {
}
You could run into issues if temp can contain special characters for regular expressions, but if it is always integers then your method should be fine.
However, a more straightforward way to do this would be to split your string on semi-colons and then see if temp is in the resulting array.
If you do stick with regex, you can simplify it a bit by dropping the .*, the following will work the same way as your current regex:
"(^|;)" + temp + "(;|$)"
Edit: Oops, the above will not actually work with matches(). I am a bit unfamiliar with regex in Java and didn't realize that the entire string needs to match. Thanks, Affe!
You don't need regex:
String temp = "77";
String searchPattern = ";" + temp + ";";
String inclusion = ";" + "100;0;77;200;....;90" + ";";
boolean found = inclusion.indexOf(searchPattern) != -1;
Another alternative without regex
String inclusion2 = ";" + inclusion + ";"; // To ensure that all number are between semicolons
if (inclusion2.indexOf(";" + temp + ";") != -1) {
// found
}
Of course, no pattern recognition here (wildcards and the like)
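The split-based approach and the shortened regex mentioned above could be sketched as follows. The class and method names are hypothetical. The regex variant relies on Matcher.find(), which looks for a match anywhere in the input rather than requiring the whole string to match, and on Pattern.quote in case temp ever contains regex metacharacters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class InclusionCheck {
    // Split on semicolons and look for an exact element match.
    static boolean containsBySplit(String inclusion, String temp) {
        return Arrays.asList(inclusion.split(";")).contains(temp);
    }

    // "(^|;)" and "(;|$)" make sure temp is a whole element, not part of a longer number.
    static boolean containsByRegex(String inclusion, String temp) {
        Pattern p = Pattern.compile("(^|;)" + Pattern.quote(temp) + "(;|$)");
        return p.matcher(inclusion).find();
    }

    public static void main(String[] args) {
        String inclusion = "100;0;77;200;90";
        System.out.println(containsBySplit(inclusion, "77"));
        System.out.println(containsByRegex(inclusion, "7"));
    }
}
```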

Help building a regex

I need to build a regular expression that finds the word "int" only if it's not part of some string.
I want to find whether int is used in the code. (not in some string, only in regular code)
Example:
int i; // the regex should find this one.
String example = "int i"; // the regex should ignore this line.
logger.i("int"); // the regex should ignore this line.
logger.i("int") + int.toString(); // the regex should find this one (because of the second int)
thanks!
It's not going to be bullet-proof, but this works for all your test cases:
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
It uses a lookbehind and a lookahead to assert that there are either zero or two preceding/following double quotes.
Here's the code in java with the output:
String regex = "(?<=^([^\"]*|[^\"]*\"[^\"]*\"[^\"]*))\\bint\\b(?=([^\"]*|[^\"]*\"[^\"]*\"[^\"]*)$)";
System.out.println(regex);
String[] tests = new String[] {
"int i;",
"String example = \"int i\";",
"logger.i(\"int\");",
"logger.i(\"int\") + int.toString();" };
for (String test : tests) {
System.out.println(test.matches("^.*" + regex + ".*$") + ": " + test);
}
Output (included regex so you can read it without all those \ escapes):
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
true: int i;
false: String example = "int i";
false: logger.i("int");
true: logger.i("int") + int.toString();
Using a regex is never going to be 100% accurate - you need a language parser. Consider escaped quotes in Strings "foo\"bar", in-line comments /* foo " bar */, etc.
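A different, still imperfect technique is to strip string literals first and then search what is left. Note that this is not the lookaround approach above but a swapped-in alternative, sketched with made-up names. It copes with escaped quotes inside a literal but still ignores comments and char literals, so the caveat about needing a real parser continues to apply.

```java
import java.util.regex.Pattern;

class IntFinder {
    // A double-quoted string literal: either an escaped character (such as \")
    // or any character that is neither a quote nor a backslash, repeated.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");
    private static final Pattern INT_WORD = Pattern.compile("\\bint\\b");

    static boolean usesInt(String line) {
        // Replace every string literal with an empty one, then look for the word int.
        String withoutStrings = STRING_LITERAL.matcher(line).replaceAll("\"\"");
        return INT_WORD.matcher(withoutStrings).find();
    }

    public static void main(String[] args) {
        String[] tests = {
            "int i;",
            "String example = \"int i\";",
            "logger.i(\"int\");",
            "logger.i(\"int\") + int.toString();"
        };
        for (String test : tests) {
            System.out.println(usesInt(test) + ": " + test);
        }
    }
}
```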
Not exactly sure what your complete requirements are but
^\s*\bint\b
perhaps
Assuming input will be each line,
^int\s[\$_a-zA-Z\;]*$
it follows basic variable naming rules :)
If you want to parse code and search for the isolated word int, this works:
(^int|[\(\ \;,]int)
It finds int when it is preceded only by a space, comma, semicolon, or left parenthesis, or when it is the first word of a line.
You can try it here and enhance it http://www.regextester.com/
PS: this works in all your test cases.
^[^"]*\bint\b
should work. I can't think of a situation where you can use a valid int identifier after the character '"'.
Of course this only applies if the code is limited to one statement per line.
