I understand the basics of regex, but I am not able to create a regular expression satisfying all of these conditions. Can anybody give me an idea of how to do it?
1. The string must be at least 20 characters long.
2. The string must contain a digit.
3. The digit must be preceded by some non-numeric character.
4. The end of the string must be a date in the format DD/MM/YYYY HH:MM (yes, there is a space in between, and all the digits must be present). Digits occurring in the date part of the string do not count toward satisfying rule 2.
5. If there is a $ sign before the first numeric digit, the string is invalid.
I have tried this code:
if (sCurrentLine.length() >= 20) {
    for (int i = 0; i <= sCurrentLine.length() - 1; i++) {
        char character = sCurrentLine.charAt(i);
        int ascii = (int) character;
        // 48..57 are the ASCII codes of '0'..'9'
        if (((ascii >= 48) && (ascii <= 57)) && (i != 0)) {
            char character2 = sCurrentLine.charAt(i - 1);
            int ascii2 = (int) character2;
            if (!((ascii2 >= 48) && (ascii2 <= 57))) {
                // found a digit preceded by a non-digit; remaining checks not written yet
            }
        }
    }
}
but it seems too complicated.
Is there a regex approach that could solve this?
Try this:
if (sCurrentLine.matches("(?=.{20})[^$]*[^\\d$]\\d.*\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}"))
The length is checked using a lookahead that asserts there are 20 characters ahead (which of course means there are at least 20 characters).
Your required-digit logic may be expressed as "starts with any number of non-dollar characters, then a character that is neither a digit nor a dollar, then a digit", which is the first part of the regex.
The last part is the date format. Note that this checks only that there are digits in the right place, not that it's a legitimate date.
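A minimal sketch of how the single-regex check above might be called from Java; the class name `LineCheck`, the method `isValid` and the sample lines are made up for illustration. `String.matches` requires the pattern to match the whole string, so no `^`/`$` anchors are needed.

```java
public class LineCheck {
    // Pattern from the answer: the lookahead asserts at least 20 chars,
    // [^$]*[^\d$]\d requires a digit preceded by a non-digit with no $ before it,
    // and the tail requires DD/MM/YYYY HH:MM digits (not validated as a real date).
    static final String PATTERN =
            "(?=.{20})[^$]*[^\\d$]\\d.*\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}";

    static boolean isValid(String line) {
        return line.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {
            "order a7 received 12/03/2021 14:05",   // valid
            "order $a7 received 12/03/2021 14:05",  // $ before the first digit
            "no digit here 12/03/2021 14:05",       // only the date has digits
            "a1 12/03/2021 14:05"                   // only 19 characters
        };
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```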
This should do everything except one point, the minimum length of 20, which is easy to check separately:
Regex used: (\D+(?<!\$)\d.+)(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2})
Edit: it captures the beginning of the string as the first group and the "date" as the second group.
So it should be easy to check that the whole string is at least 20 characters long and that the second group is a valid date.
Matching a valid date with a regex is quite a pain, and there are libraries for that (which take leap years etc. into account).
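A sketch combining the group-based regex above with a length check on the whole string and a strict date check using `java.time`; `GroupCheck` and `isValid` are hypothetical names. Note that, as written, `(?<!\$)` only rejects a `$` immediately before the first digit, not one further back in the string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCheck {
    // Regex from the answer: group 1 = text before the date, group 2 = the date part.
    static final Pattern LINE =
            Pattern.compile("(\\D+(?<!\\$)\\d.+)(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2})");

    // STRICT resolving rejects impossible dates such as 31/04 or 29/02 in a
    // non-leap year; with STRICT, the year pattern must be 'uuuu', not 'yyyy'.
    static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("dd/MM/uuuu HH:mm")
                    .withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String line) {
        if (line.length() < 20) {
            return false;
        }
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        try {
            LocalDateTime.parse(m.group(2), DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("order a7 received 29/02/2020 14:05")); // 2020 is a leap year
        System.out.println(isValid("order a7 received 29/02/2021 14:05")); // 2021 is not
    }
}
```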
Trying to do this with a lookahead seems difficult. I tried some mechanisms where the first part of the string is 4 or more characters (note that the DD/MM/YYYY HH:MM portion will need to be exactly 16 characters), and a lookahead is used to make sure that the 4+-character portion has a digit preceded by a non-digit that isn't $. Unfortunately, I don't know how to get the lookahead to stop at the end of that first portion. Perhaps using a lookbehind will work.
But I'd recommend just splitting into two regexes. Anything that uses one regex is going to be a whole lot less readable and probably less efficient.
if (sCurrentLine.length() >= 20 &&
        sCurrentLine.substring(sCurrentLine.length() - 16)
            .matches("\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}") &&
        sCurrentLine.substring(0, sCurrentLine.length() - 16)
            .matches("[^$]*[^\\d$]\\d.*")) {
    // the line satisfies all the conditions
}
This makes sure that the last 16 characters match the date/time format, and the first part of the string has a character that isn't a digit or $, followed by a digit.
Note: not tested
Edit: I interpreted condition 5 as meaning there can't be a $ immediately preceding the first digit, but I think I got this wrong. It now makes sure there isn't a $ anywhere before the first digit.
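The same two-step check wrapped in a self-contained method, as a minimal sketch; `TwoStepCheck` and `isValidLine` are hypothetical names. The prefix pattern ends in `.*` because `matches()` must consume the whole prefix, and the required digit need not be the last character before the date.

```java
public class TwoStepCheck {
    // The "DD/MM/YYYY HH:MM" suffix is always exactly 16 characters.
    static final int DATE_LEN = 16;

    static boolean isValidLine(String line) {
        if (line.length() < 20) {
            return false;
        }
        String date = line.substring(line.length() - DATE_LEN);
        String prefix = line.substring(0, line.length() - DATE_LEN);
        // Prefix: a non-digit, non-$ char followed by a digit, with no $ anywhere
        // before that digit; anything may follow it.
        return date.matches("\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}")
                && prefix.matches("[^$]*[^\\d$]\\d.*");
    }

    public static void main(String[] args) {
        System.out.println(isValidLine("order a7 received 12/03/2021 14:05"));
        System.out.println(isValidLine("order $a7 received 12/03/2021 14:05"));
    }
}
```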
Related
We get XML with invalid durations, like PT10HMS (note the lack of numbers before M and S). I have handled this by reading the file and fixing the duration string character by character, inserting 0 between two letters that are side by side (except between P and T). I was wondering if there is a more elegant solution, maybe using a regex with sed or anything else?
Thanks for any suggestions.
An idea for a Java solution (sed can surely be used too):
String incorrectDuration = "PT10HMS";
String dur = incorrectDuration.replaceAll("(?<!\\d+)[HMS]", "0$0");
This produces
PT10H0M0S
Personally I would prefer deleting the letters that do not have a number in front of them:
String dur = incorrectDuration.replaceAll("(?<!\\d+)[HMS]", "");
Now I get
PT10H
In both cases Duration.parse(dur) works and gives the expected result.
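(?<!\\d+) is a negative lookbehind: with it, the regex only matches an H, M or S that is not preceded by a string of digits. Since being preceded by one or more digits is the same as being preceded by a digit, (?<!\\d) works the same way.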
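A runnable sketch of both replacements, checked with `Duration.parse`; the class and method names are made up for illustration. It uses the equivalent single-digit lookbehind `(?<!\\d)`.

```java
import java.time.Duration;

public class DurationFix {
    // Insert a 0 before every H, M or S that has no digit right before it.
    // In the replacement, "$0" stands for the whole match (the letter itself).
    static String padWithZero(String s) {
        return s.replaceAll("(?<!\\d)[HMS]", "0$0");
    }

    // Alternatively, drop every H, M or S that has no digit right before it.
    static String dropEmpty(String s) {
        return s.replaceAll("(?<!\\d)[HMS]", "");
    }

    public static void main(String[] args) {
        String bad = "PT10HMS";
        System.out.println(padWithZero(bad));                 // PT10H0M0S
        System.out.println(dropEmpty(bad));                   // PT10H
        System.out.println(Duration.parse(padWithZero(bad))); // PT10H
    }
}
```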
Edit: I am probably overdoing it in the following. I was just curious how I could produce my preferred string also in the case where you have got for example PTHMS as you mentioned in the comment. For production code you will probably want to stick with the simpler solution above.
String durationString = "PTHMS";
// if no digits, insert 0 before last letter
if (! durationString.matches(".*\\d.*")) {
durationString = durationString.replaceFirst("(?=[HMS]$)", "0");
}
// then delete letters that do not have a digit before them
durationString = durationString.replaceAll("(?<!\\d)[HMS]", "");
This produces
PT0S
(?=[HMS]$) is a lookahead. It matches the empty string but only if this empty string is followed by either H, M or S and then the end of the string. So replacing this empty string with 0 gives us PTHM0S. Confident that there is now at least one digit in the string, we can proceed to delete letters that don’t have a digit before them.
It still wouldn’t work if you had just PT. As I understand it, this doesn’t happen. If it did, you would prefer, for example, durationString = "PT0S"; inside the if statement instead.
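Putting the steps together, a minimal sketch of a normalizer; `DurationNormalizer` and `normalize` are hypothetical names, and returning `PT0S` when there is no unit letter to pad (such as bare `PT`) is an assumption following the suggestion above.

```java
import java.time.Duration;

public class DurationNormalizer {
    static String normalize(String s) {
        if (!s.matches(".*\\d.*")) {
            // No digits at all: give the last unit a 0 (PTHMS -> PTHM0S).
            String withZero = s.replaceFirst("(?=[HMS]$)", "0");
            if (withZero.equals(s)) {
                // Nothing to pad (e.g. bare "PT"): assume a zero duration.
                return "PT0S";
            }
            s = withZero;
        }
        // Delete units that have no digit immediately before them.
        return s.replaceAll("(?<!\\d)[HMS]", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"PT10HMS", "PTHMS", "PT"}) {
            String fixed = normalize(s);
            System.out.println(s + " -> " + fixed + " = " + Duration.parse(fixed));
        }
    }
}
```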
I want to match a string starting with a capital letter and having a length < 70.
I tried the regex ([A-Z][a-zA-Z\s\/\-]*\:?\'?) to check if the string starts with a capital letter, and it is working fine. But to check the length, I changed it to (([A-Z][a-zA-Z\s\/\-]*\:?\'?){4,70}) and it is not working.
I could check the length using the length() method of String in the if statement, but doing so would make the if statement lengthy. I want to combine the length check into the regex itself. I think it can be done in regex, but I am not sure how.
Update (forgot to mention): the string can contain either of two symbols, : or ', and only one of the two may appear, zero or one time, in the string.
E.g., acceptable strings: Looking forwards to an opportunity, WORK EXPERIENCE: , WORK EXPERIENCE- , India's Prime Minister
Unacceptable strings: Work Experience:: , Manager's Educational Qualification- , work experience: , Education - 2014 - 2017 , Education (Graduation)
Kindly help me.
Thanks in advance.
You'll certainly need anchors and lookarounds
(?=^[^-':\n]*[-':]{0,1}[^-':\n]*$)^[A-Z][-':\w ]{4,70}$
Thus, a string of 5 to 71 characters will be matched; see a demo on regex101.com. Additionally, it checks for the presence of zero or one of your special characters (with the help of the lookahead).
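A sketch of the lookahead regex above used from Java, run against the example strings from the question (assuming the commas there only separate the examples); `CapitalizedCheck` and `isAcceptable` are made-up names. Backslashes are doubled for the Java string literal.

```java
public class CapitalizedCheck {
    // Lookahead: at most one of - ' : in the whole (single-line) string.
    // Main part: an uppercase letter, then 4 to 70 of - ' : word chars or spaces.
    static final String PATTERN =
            "(?=^[^-':\\n]*[-':]{0,1}[^-':\\n]*$)^[A-Z][-':\\w ]{4,70}$";

    static boolean isAcceptable(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] acceptable = {
            "Looking forwards to an opportunity", "WORK EXPERIENCE:",
            "WORK EXPERIENCE-", "India's Prime Minister"
        };
        String[] unacceptable = {
            "Work Experience::", "Manager's Educational Qualification-",
            "work experience:", "Education - 2014 - 2017", "Education (Graduation)"
        };
        for (String s : acceptable) {
            System.out.println(isAcceptable(s) + "  " + s);
        }
        for (String s : unacceptable) {
            System.out.println(isAcceptable(s) + "  " + s);
        }
    }
}
```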
I would add ^ and $ to your regex:
^[A-Z].{0,69}$
should work. This means:
^ beginning of the string
[A-Z] any capital character (in English anyway)
.{0,69} up to 69 other characters
$ end of the string
for a total length of up to 70 characters...
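A quick Java check of the anchored pattern; the class and method names are made up. Java's regex engine requires the lower bound in a bounded quantifier, so it is written `{0,69}`, and the `^`/`$` anchors can be dropped with `matches()`, which always matches the whole string.

```java
public class LengthCheck {
    // One uppercase A-Z letter followed by up to 69 further characters.
    static boolean startsUpperMax70(String s) {
        return s.matches("[A-Z].{0,69}");
    }

    public static void main(String[] args) {
        System.out.println(startsUpperMax70("Scary"));
        System.out.println(startsUpperMax70("scary"));
    }
}
```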
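Why would the if statement be lengthy?
String str = "Scary";
if (!str.isEmpty() && str.length() < 70
        && str.charAt(0) >= 'A' && str.charAt(0) <= 'Z') {
    // starts with an uppercase A-Z letter and is shorter than 70 characters
}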
Specify a lookahead assertion at the start of the regex that asserts the string contains between 4 and 70 characters:
(?=.{4,70}$)
So you would write:
String regex = "(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?";
Working REGEX =
/\A^[A-Z][A-Za-z]*\z/
I have to match an 8-character string, which must contain exactly 2 letters (1 uppercase and 1 lowercase) and exactly 6 digits, but they can be permuted arbitrarily.
So, basically:
K82v6686 would pass
3w28E020 would pass
1276eQ900 would fail (too long)
98Y78k9k would fail (three letters)
A09B2197 would fail (two capital letters)
I've tried using positive lookaheads to make sure that the string contains digits, uppercase and lowercase letters, but I have trouble limiting them to a certain number of occurrences. I suppose I could go about it by including all possible combinations of where the letters and digits can occur:
(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z]) ([A-Z][a-z][0-9]{6})|([A-Z][0-9][a-z][0-9]{5})| ... | ([0-9]{6}[a-z][A-Z])
But that's a very roundabout way of doing it, and I'm wondering if there's a better solution.
You can use
^(?=[^A-Z]*[A-Z][^A-Z]*$)(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
See the regex demo (a bit modified due to the multiline input). In Java, do not forget to use double backslashes (e.g. \\d to match a digit).
Here is a breakdown:
^ - start of string (assuming no multiline flag is to be used)
(?=[^A-Z]*[A-Z][^A-Z]*$) - check if there is only 1 uppercase letter (use \p{Lu} to match any Unicode uppercase letter and \P{Lu} to match any character other than that)
(?=[^a-z]*[a-z][^a-z]*$) - similar check if there is only 1 lowercase letter (alternatively, use \p{Ll} and \P{Ll} to match Unicode letters)
(?=(?:\D*\d){6}\D*$) - check that there are exactly six digits in the string: from the beginning of the string, six repetitions of zero or more non-digit characters (\D matches any character but a digit; you may also replace it with [^0-9]) followed by a digit (\d), then zero or more non-digit characters (\D*) up to the end of the string ($)
[a-zA-Z0-9]{8} - match exactly 8 alphanumeric characters.
$ - end of string.
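A sketch running the full pattern against the five examples from the question; `MixedCodeCheck` and `isValid` are made-up names, and the backslashes are doubled for Java.

```java
public class MixedCodeCheck {
    // Exactly 1 uppercase, exactly 1 lowercase, exactly 6 digits, 8 alphanumerics total.
    static final String PATTERN =
            "^(?=[^A-Z]*[A-Z][^A-Z]*$)(?=[^a-z]*[a-z][^a-z]*$)"
            + "(?=(?:\\D*\\d){6}\\D*$)[a-zA-Z0-9]{8}$";

    static boolean isValid(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"K82v6686", "3w28E020", "1276eQ900", "98Y78k9k", "A09B2197"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```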
Following the logic, we can even reduce this to just
^(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
One condition can be removed: [a-zA-Z0-9] only allows lowercase letters, uppercase letters and digits, so once we require exactly one lowercase letter and exactly six digits among eight characters, the remaining character must be an uppercase letter.
When using it with Java's matches() method, there is no need for the ^ and $ anchors at the start and end of the pattern, but you still need $ in the lookaheads:
String s = "K82v6686";
String rx = "(?=[^a-z]*[a-z][^a-z]*$)" + // 1 lowercase letter check
"(?=(?:\\D*\\d){6}\\D*$)" + // 6 digits check
"[a-zA-Z0-9]{8}"; // matching 8 alphanum chars exactly
if (s.matches(rx)) {
System.out.println("Valid");
}
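import java.util.regex.Pattern;

boolean valid = Pattern.matches(".*[A-Z].*", s) &&   // at least one uppercase letter
        Pattern.matches(".*[a-z].*", s) &&           // at least one lowercase letter
        Pattern.matches(".*(\\D*\\d){6}.*", s) &&    // at least six digits
        Pattern.matches(".{8}", s);                  // exactly 8 characters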
Since a single regex for this task would need an alternating automaton, it's much simpler to use a conjunction of regexes, one for each type of character.
We require at least one lowercase letter, one uppercase letter and 6 digits, and these three classes are mutually exclusive. With the last condition we require the length of the string to be exactly the sum of these numbers, which leaves no room for extra characters beyond the desired types. Of course we could use s.length() == 8 as the last condition, but this would break the style :).
Sort the characters of the string by char value (digits sort before uppercase letters, which sort before lowercase letters) and then match against ^[0-9]{6}[A-Z][a-z]$.
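A sketch of the sort-then-match idea. In a plain `char`-value sort, '0'-'9' come before 'A'-'Z', which come before 'a'-'z', so a valid input such as K82v6686 sorts to 266688Kv; the sketch therefore matches the sorted string against `[0-9]{6}[A-Z][a-z]`. `SortedCheck` and `isValid` are made-up names.

```java
import java.util.Arrays;

public class SortedCheck {
    static boolean isValid(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars); // ascending char value: digits, then A-Z, then a-z
        return new String(chars).matches("[0-9]{6}[A-Z][a-z]");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"K82v6686", "3w28E020", "1276eQ900", "98Y78k9k", "A09B2197"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```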
How can I match a character exactly once with a regex in Java? Let's say I want to look for strings that contain the digit 3 exactly once, no matter where it is.
I tried to do this with ".*3{1}.*", but obviously this also matches "330", since the period means I don't care what character it is. How can I fix this?
^[^3]*3[^3]*$
Match (not three), then three, then (not three).
Edit: added ^ and $ at the beginning and end. This forces the regex to match the whole line. Thanks @Bobbyrogers and @Mindastic.
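A minimal Java check of the pattern; the class and method names are made up. With `matches()` the whole string must match, so the anchors can be left out.

```java
public class ExactlyOnce {
    // Any non-3 chars, exactly one 3, then any non-3 chars.
    static boolean hasExactlyOneThree(String s) {
        return s.matches("[^3]*3[^3]*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"330", "103", "12"}) {
            System.out.println(s + " -> " + hasExactlyOneThree(s));
        }
    }
}
```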
A non-regex solution:
int index = s.indexOf('3');
boolean unique = index != -1 && index == s.lastIndexOf('3');
Basically the character is unique if the first and last occurrences are at the same place and exist in the string (not -1).
I have numbers like these that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out, as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 (that is what the + quantifier is for; similarly, the * quantifier means zero or more) at the beginning of the string (that's given by the caret, ^) with the empty string.
The accepted solution will fail if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
Thank you to the commenter - I have updated the regex to match the description from the author.
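A side-by-side sketch of the two regexes; the class and method names are made up. The only difference shows up for an all-zero input: `^0+` removes everything, while `^0+(?!$)` stops before the final 0.

```java
public class StripZeros {
    // Removes all leading zeros ("00" becomes "").
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // Removes leading zeros but never the last character ("00" becomes "0").
    static String keepLastZero(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000004334300343", "0003030435243", "00"}) {
            System.out.println(s + " -> [" + stripAll(s) + "] / [" + keepLastZero(s) + "]");
        }
    }
}
```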
If you know the input strings contain only digits, you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
Converting to Long automatically strips all leading zeros (as long as the value does not exceed Long.MAX_VALUE).
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of the input (or of a line, in MULTILINE mode)
0+ - match the zero digit character one or more times
You can find an exhaustive list of patterns in the Pattern class documentation.
\b0+\B will do the work. \b anchors your match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end where there is no word boundary (so as not to eliminate the last 0 when you have only 00...000).
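A sketch of the word-boundary version in Java (backslashes doubled); the class and method names are made up. Because it relies on \b rather than ^, it also strips leading zeros from each number in a longer text, as the last example shows.

```java
public class WordBoundaryZeros {
    // Zeros that start a word, as long as at least one word char follows them.
    static String strip(String s) {
        return s.replaceAll("\\b0+\\B", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000004334300343", "00", "007 0042"}) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```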
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex will match 0 character in quantity of one and more at the string beginning.
There is no reason to worry about the replaceAll method, as the regex starts with ^ (beginning of input), which ensures the replacement happens only once.
Alternatively, you can use Java's built-in features to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
System.out.println(number); // prints 4334300343
The leading zeros will be stripped for you automatically.
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
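A sketch confirming the three cases described above; the class and method names are made up, and in Java source the digit class must be written `\\d`.

```java
public class GroupZeros {
    // Group 1: optional leading zeros; group 2: at least one digit, kept as-is.
    static String strip(String s) {
        return s.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000004334300343", "000", "123"}) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```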
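Here is a simple solution in JavaScript (note that this is JavaScript syntax, not Java):
str = str.replaceAll(/^0+/g, "");
In JavaScript, the global flag g is required when passing a regex to replaceAll; without it, replaceAll throws a TypeError.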