Regex: starts with a capital letter and has length < 70 - java

I want to match a string that starts with a capital letter and has a length < 70.
I tried this regex ([A-Z][a-zA-Z\s\/\-]*\:?\'?) to check whether the string starts with a capital letter, and it works fine. But to check the length, I changed it to (([A-Z][a-zA-Z\s\/\-]*\:?\'?){4,70}) and it does not work.
I could check the length with the length() method of String in an if statement, but doing so would make the if statement lengthy. I want to combine the length check into the regex itself. I think it can be done in a regex, but I am not sure how.
Update (forgot to mention): the string can contain either of two symbols (: or '), and only one of them may appear, zero or one time, in the string.
E.g. acceptable strings: Looking forwards to an opportunity, WORK EXPERIENCE: , WORK EXPERIENCE- , India's Prime Minister
Unacceptable strings: Work Experience:: , Manager's Educational Qualification- , work experience: , Education - 2014 - 2017 , Education (Graduation)
Kindly help me.
Thanks in advance.

You'll certainly need anchors and lookarounds
(?=^[^-':\n]*[-':]{0,1}[^-':\n]*$)^[A-Z][-':\w ]{4,70}$
Thus, a string of 5 to 71 characters will be matched; see a demo on regex101.com. Additionally, the lookahead at the start checks that at most one of your special characters (-, ' or :) is present.
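A minimal Java sketch for trying the pattern above against the sample strings from the question. The class and method names (CapitalLengthCheck, isAcceptable) are only illustrative; the pattern itself is copied from the answer, with backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

public class CapitalLengthCheck {
    // Pattern from the answer: the lookahead permits at most one of - ' :
    // and the main part requires a capital letter plus 4 to 70 more characters.
    private static final Pattern P = Pattern.compile(
        "(?=^[^-':\\n]*[-':]{0,1}[^-':\\n]*$)^[A-Z][-':\\w ]{4,70}$");

    static boolean isAcceptable(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Looking forwards to an opportunity", "WORK EXPERIENCE:", "WORK EXPERIENCE-",
            "India's Prime Minister", "Work Experience::", "Manager's Educational Qualification-",
            "work experience:", "Education - 2014 - 2017", "Education (Graduation)"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAcceptable(s));
        }
    }
}
```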

I would add ^ and $ to your regex:
^[A-Z].{0,69}$
should work. This means:
^ beginning of the string
[A-Z] any capital character (in English anyway)
.{0,69} up to 69 other characters
$ end of the string
for a total length of up to 70 characters...
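A small sketch of the same check with String.matches(), which already requires the whole string to match, so the ^ and $ anchors are optional there. The helper capitalPlus is hypothetical and only builds long test strings. Note that {0,69} allows a total of exactly 70 characters; if the length must be strictly below 70, as the question states, {0,68} would be the bound to use.

```java
public class CapitalPrefixCheck {
    // A capital A-Z followed by 0 to 69 characters, i.e. 1 to 70 characters in total.
    // '.' does not match line terminators unless DOTALL is enabled.
    static boolean matches(String s) {
        return s.matches("^[A-Z].{0,69}$");
    }

    // Hypothetical helper: "A" followed by n copies of 'x'.
    static String capitalPlus(int n) {
        StringBuilder sb = new StringBuilder("A");
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matches("Scary"));          // true
        System.out.println(matches("scary"));          // false
        System.out.println(matches(capitalPlus(69)));  // true: 70 characters
        System.out.println(matches(capitalPlus(70)));  // false: 71 characters
    }
}
```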

Why would the if statement be lengthy?
String str = "Scary";
if (!str.isEmpty() && str.length() < 70 && Character.isUpperCase(str.charAt(0))) {
}

Specify a lookahead assertion at the start of the regex that asserts that the string contains between 4 and 70 characters:
(?=.{4,70}$)
You would write it like this:
String regex = "(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?";
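A quick sketch, assuming the regex is used with String.matches(). The lookahead only adds the length limit; the rest is the asker's original expression, which as written allows the optional : or ' only at the very end of the string, so a sample like India's Prime Minister is rejected regardless of its length. The class and method names are illustrative.

```java
public class LookaheadLengthCheck {
    // The answer's regex: a lookahead for 4..70 characters, then the asker's original pattern.
    static final String REGEX = "(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?";

    static boolean check(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(check("WORK EXPERIENCE:"));        // true
        System.out.println(check("Abc"));                     // false: only 3 characters
        System.out.println(check("India's Prime Minister"));  // false: ' is only allowed at the end
    }
}
```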

Working regex (note: it only allows letters after the initial capital and does not limit the length):
/\A^[A-Z][A-Za-z]*\z/

Related

Regular expression replace characters by a given match between strings

I am trying to replace a given character by a regular expression match.
For example, given the following string:
If you look at what you have in life, you'll always have more. If you look at what you don't have in life, you'll never have enough
I would like to replace every 't' with '!', but only where it occurs between the characters 'ok' and 'fe'.
I get the match between 'ok' and 'fe' with this regular expression:
(?<=ok).*?(?=fe)
And I can only match one character with the following regex:
(?<=ok).*?(t).*?(?=fe)
I tried to transform that regex in the following way but it does not work:
(?<=ok).*?((t).*?)*?(?=fe)
How can I match all 't' between 'ok' and 'fe'?
https://regex101.com/r/ORgseA/1
You can use
String result = text.replaceAll("(?s)(\\G(?!\\A)|ok)((?:(?!ok|fe|t).)*)t(?=(?:(?!ok|fe).)*fe)", "$1$2!");
See the regex demo and the Java demo:
String text = "If you look at what you have in life, you'll always have more. If you look at what you don't have in life, you'll never have enough";
String result = text.replaceAll("(?s)(\\G(?!\\A)|ok)((?:(?!ok|fe|t).)*)t(?=(?:(?!ok|fe).)*fe)", "$1$2!");
System.out.println(result);
// => If you look a! wha! you have in life, you'll always have more. If you look a! wha! you don'! have in life, you'll never have enough
Details:
(?s) - Pattern.DOTALL embedded flag option (to make . match line break chars)
(\G(?!\A)|ok) - Group 1 ($1): ok or the end of the previous successful match
((?:(?!ok|fe|t).)*) - Group 2 ($2): zero or more characters, as many as possible, none of which starts an ok, fe or t character sequence
t - a t char
(?=(?:(?!ok|fe).)*fe) - a positive lookahead: immediately to the right there must be zero or more characters, as many as possible, none of which starts an ok or fe sequence, followed by a fe substring.
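For comparison, a different technique (not the answer's): match each segment with the asker's own (?<=ok).*?(?=fe) and rewrite the t characters inside it with a replacement function. This sketch assumes Java 9 or later for Matcher.replaceAll(Function); ReplaceBetweenMarkers and replaceTBetween are hypothetical names. On the sample text it should produce the same output as the answer, but edge cases such as a second ok before the closing fe may be handled differently, since the lookbehind segment simply runs from the first ok to the nearest fe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceBetweenMarkers {
    // Segments that follow "ok" and end right before the nearest "fe" (reluctant .*?).
    private static final Pattern SEGMENT = Pattern.compile("(?s)(?<=ok).*?(?=fe)");

    static String replaceTBetween(String text) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
        // quoteReplacement keeps any $ or \ in the segment from being read as group references.
        return SEGMENT.matcher(text).replaceAll(
            m -> Matcher.quoteReplacement(m.group().replace('t', '!')));
    }

    public static void main(String[] args) {
        String text = "If you look at what you have in life, you'll always have more. "
            + "If you look at what you don't have in life, you'll never have enough";
        System.out.println(replaceTBetween(text));
    }
}
```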

How to check if a word starts and ends with a common symbol and replace it with 1 as many times as it appears

I am facing a little challenge; here's what I've been trying to do.
Assume I have these two variables:
String word1 ="hello! hello!! %can you hear me%? Yes I can.";
And then this one
String word2 ="*Java is awesome* Do you % agree % with us?";
I want to be able to check whether a variable contains a word that begins and ends with a particular symbol, such as the % and * I am using, and replace that word with '1' (one). Here's what I tried.
StringTokenizer st = new StringTokenizer(word1);
while(st.hasMoreTokens()){
String block = st.nextToken();
if ((block.startsWith("%") && block.endsWith("%")) || (block.startsWith("*") && block.endsWith("*"))) {
word1.replace (block,"1");
}
}
//output
"hello!hello!!%canyouhearme%?YesIcan."
//expected
"hello! hello!! 1? Yes I can.";
It just ended up trimming it. I guess this is because the delimiter used is a space, and since the closing % is followed by ?, it was read as a single block.
When I tried the same for word2
I got "1Doyou%agree%withus?"
//expected
"1 Do you 1 with us?"
And assuming I have another word like
String word3 ="%*hello*% friends";
I want to be able to produce
//output
"1friends"
//expected
"11 friends"
Since it has 4-symbols
Any help would be truly appreciated; I'm just sharpening my Java skills. Thanks.
You can use a regular expression (RegEx) with the String.matches() method to determine whether a string meets the specific criteria, for example:
if (word1.matches(".*\\*.*\\*.*|.*\\%.*\\%.*")) {
// Replace desired test with the value of 1 here...
}
If you want a full explanation of this regular expression, go to regex101.com and enter the following expression: .*\*.*\*.*|.*\%.*\%.*
The above if statement condition uses the String.matches() method to check whether the string contains text (or no text) between either asterisks (*) or percent (%) characters. If it does, we simply use the String.replaceAll() method to replace those string sections (between and including *...* and %...%) with the value 1, something like this:
String word1 = "hello! hello!! %can you hear me%? Yes I can.";
if (word1.matches(".*\\*.*\\*.*|.*\\%.*\\%.*")) {
String newWord1 = word1.replaceAll("\\*.*\\*|%.*%", "1");
System.out.println(newWord1);
}
The Console window will display:
hello! hello!! 1? Yes I can.
If you were to run this string: "*Java is awesome* Do you % agree % with us?" through the above code, your console window would display:
1 Do you 1 with us?
Keep in mind that this will provide the same output to console if your supplied string was "** Do you %% with us?". If you don't really want this then you will need to modify the RegEx within the matches() method a wee bit to something like this:
".*\\*.+\\*.*|.*\\%.+\\%.*"
and you will need to modify the RegEx within the replaceAll() method to this:
"\\*.+\\*|%.+%"
With this change, there must now be text between the asterisks and/or the percent characters before validation succeeds and a change is made.
The question isn't clear (I'm not sure how %*hello*% translates to 11, and I didn't understand what you mean by "Since it has 4-symbols"), but wouldn't regular expressions work?
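A runnable sketch of the approach above, using the .+ variants of both patterns. SymbolReplaceDemo and replaceMarked are hypothetical names. One thing worth knowing: .+ is greedy, so a string with two separate pairs such as %a% and %b% collapses into a single 1; a reluctant .+? would be one way to get 1 and 1 instead.

```java
public class SymbolReplaceDemo {
    // The answer's ".+" variants: text is required between the paired symbols.
    static final String CHECK = ".*\\*.+\\*.*|.*\\%.+\\%.*";
    static final String REPLACE = "\\*.+\\*|%.+%";

    static String replaceMarked(String s) {
        // Only rewrite when the check pattern matches; otherwise return the input unchanged.
        return s.matches(CHECK) ? s.replaceAll(REPLACE, "1") : s;
    }

    public static void main(String[] args) {
        System.out.println(replaceMarked("hello! hello!! %can you hear me%? Yes I can."));
        System.out.println(replaceMarked("*Java is awesome* Do you % agree % with us?"));
        System.out.println(replaceMarked("** Do you %% with us?"));
    }
}
```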
Can't you simply do:
String replaced = word1.replaceAll("\\*[^\\*]+\\*", "1")
.replaceAll("\\%[^\\%]+\\%", "1");
I would say your presumption that the special characters will be replaced twice is wrong. String.replace() only replaces literal occurrences of the given string; it does not interpret patterns, so it cannot express "a % sign, some text, then another % sign". replaceAll() takes a regular expression, so it can. In your code you are trying to replace the special characters together with the text inside them, so replaceAll() is the function to use. Note also that Java strings are immutable: word1.replace(block, "1") returns a new string, and your loop discards that result.
In other words, when replaceAll() runs, it finds each occurrence of the pattern and replaces it once. You don't need StringTokenizer (a legacy java.util class) for this. So no matter what you do, you would only see 1 friends instead of 11 friends, and you wouldn't need an if statement either. Credit goes to jbx above for the regex. Now you could shorten your code like this, still bearing in mind that whatever is inside the special characters is replaced by the single number 1.
You won't need an if statement to search: replaceAll() and replace() already search the string they are called on, so the if statement is redundant and just makes the code verbose.
package object_list_stackoverflow;
public class Object_list_stackoverflow {
public static void main(String[] args) {
String word1 = "hello! hello!! %can you hear me%? Yes I can.";
String word2 ="*Java is awesome* Do you % agree % with us?";
String word3 ="%*hello*% friends";
String regex = "\\*[^\\*]+\\*";
String regex1= "\\%[^\\%]+\\%";
System.out.println(word3.replaceAll(regex, "1").replaceAll(regex1, "1"));
}
}
Also see the similar question: Find Text between special characters and replace string
You can also get rid of non-alphanumeric characters; see dhuma1981's answer to: How to replace special characters in a string?
Syntax to remove non-alphanumeric characters from a String:
replaceAll("[^a-zA-Z0-9]", "");
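A tiny sketch showing what that call does: [^a-zA-Z0-9] matches every character that is not an ASCII letter or digit, so replacing those matches with an empty string keeps only letters and digits (spaces are removed as well). The class and method names are illustrative.

```java
public class StripNonAlphanumeric {
    // [^a-zA-Z0-9] matches any character that is NOT an ASCII letter or digit;
    // replacing every such match with "" keeps only letters and digits.
    static String strip(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("%*hello*% friends")); // hellofriends
    }
}
```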

java - Fix an invalid Duration

We get XML with invalid durations, like PT10HMS (note the lack of numbers before M and S). I have handled this by reading the file and fixing it: I iterate over the duration string character by character and insert 0 between two letters that are side by side (except between P and T). I was wondering if there is a more elegant solution, maybe using a regex with sed or anything else?
Thanks for any suggestions.
An idea for a Java solution here (sure sed can be used too).
String incorrectDuration = "PT10HMS";
String dur = incorrectDuration.replaceAll("(?<!\\d+)[HMS]", "0$0");
This produces
PT10H0M0S
Personally I would prefer deleting the letters that do not have a number in front of them:
String dur = incorrectDuration.replaceAll("(?<!\\d+)[HMS]", "");
Now I get
PT10H
In both cases Duration.parse(dur) works and gives the expected result.
(?<!\\d+) is a negative lookbehind: with this the regex only matches if the H, M or S is not preceded by a string of digits.
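A minimal check of the claim that Duration.parse() accepts both repaired strings. This sketch uses (?<!\\d) rather than (?<!\\d+), as the answer's later snippet does; since a run of digits ends right before a letter exactly when the preceding character is a digit, the two lookbehinds should behave the same here. FixDuration, insertZeros and dropEmpty are hypothetical names.

```java
import java.time.Duration;

public class FixDuration {
    // Prefix H, M or S with 0 when it is not directly preceded by a digit ($0 = the matched letter).
    static String insertZeros(String s) {
        return s.replaceAll("(?<!\\d)[HMS]", "0$0");
    }

    // Delete H, M or S when it is not directly preceded by a digit.
    static String dropEmpty(String s) {
        return s.replaceAll("(?<!\\d)[HMS]", "");
    }

    public static void main(String[] args) {
        String incorrect = "PT10HMS";
        System.out.println(insertZeros(incorrect) + " -> " + Duration.parse(insertZeros(incorrect)));
        System.out.println(dropEmpty(incorrect) + " -> " + Duration.parse(dropEmpty(incorrect)));
    }
}
```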
Edit: I am probably overdoing it in the following. I was just curious how I could produce my preferred string also in the case where you have got for example PTHMS as you mentioned in the comment. For production code you will probably want to stick with the simpler solution above.
String durationString = "PTHMS";
// if no digits, insert 0 before last letter
if (! durationString.matches(".*\\d.*")) {
durationString = durationString.replaceFirst("(?=[HMS]$)", "0");
}
// then delete letters that do not have a digit before them
durationString = durationString.replaceAll("(?<!\\d)[HMS]", "");
This produces
PT0S
(?=[HMS]$) is a lookahead. It matches the empty string but only if this empty string is followed by either H, M or S and then the end of the string. So replacing this empty string with 0 gives us PTHM0S. Confident that there is now at least one digit in the string, we can proceed to delete letters that don’t have a digit before them.
It still wouldn’t work if you had just PT. As I understand it, this doesn’t happen. If it did, you might prefer, for example, durationString = "PT0S"; inside the if statement instead.
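Putting the steps together as one helper, following the suggestion above to fall back to PT0S when the string has no digits at all (this also covers the bare PT case). The helper name normalize is hypothetical, and the sketch assumes only H, M and S designators appear after T, as in the examples.

```java
import java.time.Duration;

public class NormalizeDuration {
    // Hypothetical helper combining the answer's steps.
    static String normalize(String s) {
        if (!s.matches(".*\\d.*")) {
            // No digits anywhere (e.g. "PTHMS" or "PT"): fall back to a zero duration.
            return "PT0S";
        }
        // Delete every H, M or S that does not have a digit directly before it.
        return s.replaceAll("(?<!\\d)[HMS]", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"PT10HMS", "PTHMS", "PT", "PT10HM5S"}) {
            String fixed = normalize(s);
            System.out.println(s + " -> " + fixed + " -> " + Duration.parse(fixed));
        }
    }
}
```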

RegEx commands in java

I understand the basics of regex, but I am not able to create a regular expression satisfying all of these conditions. Can anybody give me an idea how to do it?
The string must be at least 20 characters long
The string must contain a digit
The digit must be preceded by some non-numeric character
The end of the string must be a date of format DD/MM/YYYY HH:MM - yes, there is a space in between and all the digits must be present. Digits occurring in the date part of the string are not considered as satisfiability condition of rule 2.
If a $ sign occurs before the first numeric digit, the string is invalid
I have tried this code:
if (sCurrentLine.length() >= 20) {
for (int i=0; i <= sCurrentLine.length() - 1; i++) {
char character = sCurrentLine.charAt(i);
int ascii = (int) character;
if (((ascii >= 48) && (ascii <= 57)) && (i!=0)) {
char character2 = sCurrentLine.charAt(i-1);
int ascii2 = (int) character2;
if(!((ascii2 >= 48) && (ascii2 <= 57))) {
//
}
}
}
}
but it seems too complicated.
Is there a regex approach that could solve this?
Try this:
if (sCurrentLine.matches("(?=.{20})[^$]*[^\\d$]\\d.*\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}"))
The length is checked using a lookahead that asserts there are 20 characters ahead (which of course means there are at least 20 characters).
Your required-digit logic may be expressed as "starts with any number of non-dollar chars, a non-dollar/digit char then a digit", which is the first part of the regex.
The last part is the date format. Note that this checks only that there are digits in the right place, not that it's a legitimate date.
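A small harness for trying the regex above against a few hand-made lines, one per rule. The sample strings and the names LineRuleCheck and isValid are made up for illustration; as the answer says, the date part is only checked for digit positions.

```java
public class LineRuleCheck {
    // The answer's single regex, used with String.matches() (whole-string match).
    static final String RULES =
        "(?=.{20})[^$]*[^\\d$]\\d.*\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}";

    static boolean isValid(String line) {
        return line.matches(RULES);
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc1 text here 01/02/2020 10:30",        // satisfies all five rules
            "$abc1 text here 01/02/2020 10:30",       // $ before the first digit
            "no digits here at all 01/02/2020 10:30", // only the date has digits
            "123abc 01/02/2020 10:30",                // no digit outside the date follows a non-digit
            "a1 01/02/2020 10:30"                     // only 19 characters
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```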
This should work except for one point, the length of 20, which is easy to check separately:
Regex used: (\D+(?<!\$)\d.+)(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2})
Edit: it captures the string as first group and the "date" as second group.
So it should be easy to check that the first is > 20 and that the second is a valid date.
Matching a valid date with a regex is quite a pain, and there are libraries for that (which take leap years etc. into account).
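A sketch of that idea with java.time: use the answer's regex (with the \/ escapes simplified to /) to split off the date, then let DateTimeFormatter validate it. ResolverStyle.STRICT together with the uuuu year pattern rejects impossible dates such as 31/04 and respects leap years. DateTailCheck and isValid are hypothetical names, and the separate length check follows the answer's suggestion.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateTailCheck {
    // The answer's regex: group 1 is the leading text, group 2 the trailing date/time.
    private static final Pattern LINE =
        Pattern.compile("(\\D+(?<!\\$)\\d.+)(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2})");

    // STRICT resolution with "uuuu" rejects dates like 31/04 and handles leap years.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("dd/MM/uuuu HH:mm").withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String line) {
        if (line.length() < 20) {
            return false; // the length rule, checked outside the regex
        }
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        try {
            LocalDateTime.parse(m.group(2), FMT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc1 text here 29/02/2020 10:30"));
        System.out.println(isValid("abc1 text here 29/02/2019 10:30"));
        System.out.println(isValid("abc1 text here 31/04/2020 10:30"));
    }
}
```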
Trying to do this with a lookahead seems difficult. I tried some mechanisms where the first part of the string is 4 or more characters (note that the DD/MM/YYYY HH:MM portion will need to be exactly 16 characters), and a lookahead is used to make sure that the 4+-character portion has a digit preceded by a non-digit that isn't $. Unfortunately, I don't know how to get the lookahead to stop at the end of that first portion. Perhaps using lookbehind will work.
But I'd recommend just splitting into two regexes. Anything that uses one regex is going to be a whole lot less readable and probably less efficient.
if (sCurrentLine.length() >= 20 &&
    sCurrentLine.substring(sCurrentLine.length() - 16)
        .matches("\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}") &&
    sCurrentLine.substring(0, sCurrentLine.length() - 16)
        .matches("[^$]*[^\\d$]\\d.*"))
This makes sure that the last 16 characters match the date/time format, and the first part of the string has a character that isn't a digit or $, followed by a digit.
Note: not tested
Edit: I interpreted condition 5 as meaning there can't be a $ immediately preceding the first digit, but I think I got this wrong. Now it makes sure there isn't a $ anywhere before the first digit.
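A runnable version of the two-regex approach, as a sketch. It assumes the date/time part is always the last 16 characters, and it appends .* to the pattern for the first part, because String.matches() must consume that whole substring; without it, the required digit would have to be the very last character before the date. SplitRuleCheck and isValid are hypothetical names.

```java
public class SplitRuleCheck {
    // Length of the trailing "DD/MM/YYYY HH:MM" part.
    private static final int DATE_LEN = 16;

    static boolean isValid(String line) {
        if (line.length() < 20) {
            return false;
        }
        String head = line.substring(0, line.length() - DATE_LEN);
        String tail = line.substring(line.length() - DATE_LEN);
        return tail.matches("\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}")
            // No $ before the non-digit/digit pair; ".*" lets the pair sit anywhere in the head.
            && head.matches("[^$]*[^\\d$]\\d.*");
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc1 text here 01/02/2020 10:30",
            "$abc1 text here 01/02/2020 10:30",
            "no digits here at all 01/02/2020 10:30",
            "a1 01/02/2020 10:30"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```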

Java regex - parts of words

Is it possible to write a regex in Java that matches parts of words? For example, I want to search for the strings 'ab', 'cd' and 'ef'. A match should be returned in the following examples:
[lab stef ecde], [lecde effe aaaab]
So a match should be returned if all of the strings ('ab', 'cd', 'ef') are parts of words anywhere in the text; order is not important.
However, a match shouldn't be returned if any of the strings is missing:
[lab stef]
If it doesn't have to be regex then Tichodroma's answer is the one you are looking for.
But if you really need to complicate your life and use regex, you can try look-around mechanisms like look-ahead and create something like
"lab stef ecde".matches("(?=.*ab)(?=.*cd)(?=.*ef).*") //true
"lab stef".matches("(?=.*ab)(?=.*cd)(?=.*ef).*") //false
To explain it more clearly, in
(?=.*ab)(?=.*cd)(?=.*ef).*
(?=.*ab) checks whether your string contains .*ab, where .* matches any characters before the ab part.
Also, the look-ahead (?=...) is zero-width, which means the cursor is reset to the position it had before the look-ahead started, so in our case it will still be at the start of the string.
This way we can use (?=.*cd) and (?=.*ef) as well.
But we also need to include .* at the end of our regex, because matches() checks whether the entire string matches the regex, so the pattern has to consume the whole string.
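The same pattern in a compiled, reusable form, as a sketch; AllPartsCheck and containsAll are illustrative names. Compiling the Pattern once avoids recompiling it on every String.matches() call.

```java
import java.util.regex.Pattern;

public class AllPartsCheck {
    // Three zero-width lookaheads, each scanning from the start, then .* to consume the input.
    private static final Pattern ALL_PARTS = Pattern.compile("(?=.*ab)(?=.*cd)(?=.*ef).*");

    static boolean containsAll(String s) {
        return ALL_PARTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsAll("lab stef ecde"));    // true
        System.out.println(containsAll("lecde effe aaaab")); // true
        System.out.println(containsAll("lab stef"));         // false: no "cd"
    }
}
```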
Search for each substring in the input and combine the resulting boolean values with &&.
String s = "lab stef ecde";
boolean ab = s.indexOf("ab") > -1;
boolean cd = s.indexOf("cd") > -1;
boolean ef = s.indexOf("ef") > -1;
boolean match = ab && cd && ef; // true
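If the list of substrings may grow, the same indexOf idea can be generalized with a stream, as a sketch (containsAll is a hypothetical name; List.of needs Java 9 or later). String.contains() is equivalent to indexOf(...) > -1.

```java
import java.util.List;

public class AllSubstrings {
    // True only if every needle occurs somewhere in the text.
    static boolean containsAll(String text, List<String> needles) {
        return needles.stream().allMatch(text::contains);
    }

    public static void main(String[] args) {
        List<String> needles = List.of("ab", "cd", "ef");
        System.out.println(containsAll("lab stef ecde", needles));    // true
        System.out.println(containsAll("lecde effe aaaab", needles)); // true
        System.out.println(containsAll("lab stef", needles));         // false
    }
}
```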
Edit
In Germany there is a proverb:
Warum einfach wenn es auch kompliziert geht?
"Why simple when you can do it complicated?"
That's what I think about regular expressions in this case.
This will do:
^.*(ab.*cd.*ef|ab.*ef.*cd|cd.*ab.*ef|cd.*ef.*ab|ef.*ab.*cd|ef.*cd.*ab).*$
You can test it here: http://www.regexplanet.com/advanced/java/index.html
I believe it is overkill, though; a more optimized solution would be better.
