Java regex - parts of words

Is it possible to write a regex in Java that matches parts of words? For example, I want to search for the strings 'ab', 'cd' and 'ef'. A match should be returned in the following examples:
[lab stef ecde], [lecde effe aaaab]
So a match should be returned if all the strings ('ab', 'cd', 'ef') appear as parts of words anywhere in the text; order is not important.
However, no match should be returned if any of the strings is missing:
[lab stef]

If it doesn't have to be a regex, then Tichodroma's answer is the one you are looking for.
But if you really need to complicate your life and use a regex, you can use look-around mechanisms such as look-ahead and write something like
"lab stef ecde".matches("(?=.*ab)(?=.*cd)(?=.*ef).*") //true
"lab stef".matches("(?=.*ab)(?=.*cd)(?=.*ef).*") //false
To explain it more clearly, in
(?=.*ab)(?=.*cd)(?=.*ef).*
(?=.*ab) checks whether your string contains .*ab, where .* matches any characters before the ab part.
A look-ahead (?=...) is also zero-width, which means it resets the cursor to the position it had before the look-ahead started, so in our case it is still at the start of the string.
This way we can apply (?=.*cd) and (?=.*ef) from the same position.
But we also need to put .* at the end of the regex, because matches() checks whether the entire string matches the regex, so the pattern has to consume the whole string.
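The look-ahead trick generalizes to any number of fragments. A minimal sketch, assuming the fragments are plain text (so `Pattern.quote` escapes any regex metacharacters in them) and that the input may span several lines (hence the `(?s)` DOTALL flag). The class and method names are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;

class LookaheadAllOf {
    // Hypothetical helper: builds (?s)(?=.*\Qab\E)(?=.*\Qcd\E)...(?=...).* for the given fragments.
    static Pattern allOf(List<String> fragments) {
        // (?s) = DOTALL, so .* can also cross line breaks
        StringBuilder regex = new StringBuilder("(?s)");
        for (String fragment : fragments) {
            // one zero-width look-ahead per fragment, all evaluated from position 0
            regex.append("(?=.*").append(Pattern.quote(fragment)).append(")");
        }
        // the trailing .* consumes the whole input, which matches() requires
        regex.append(".*");
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = allOf(List.of("ab", "cd", "ef"));
        System.out.println(p.matcher("lab stef ecde").matches());    // true
        System.out.println(p.matcher("lecde effe aaaab").matches()); // true
        System.out.println(p.matcher("lab stef").matches());         // false
    }
}
```

Quoting matters as soon as a fragment contains characters like `.` or `+`. Without `Pattern.quote`, the fragment "a.b" would also match "axb".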

Look for each substring in the input and combine the resulting boolean values with &&.
String s = "lab stef ecde";
boolean ab = s.indexOf("ab") > -1;
boolean cd = s.indexOf("cd") > -1;
boolean ef = s.indexOf("ef") > -1;
boolean match = ab && cd && ef; // true
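The same idea scales to an arbitrary list of fragments without any regex. A sketch using `String.contains`, with a hypothetical helper name:

```java
import java.util.List;

class ContainsAll {
    // Hypothetical helper: true only if every fragment occurs somewhere in text (order irrelevant).
    static boolean containsAll(String text, List<String> fragments) {
        // String.contains is a plain substring search, so no regex escaping is needed
        return fragments.stream().allMatch(text::contains);
    }

    public static void main(String[] args) {
        List<String> parts = List.of("ab", "cd", "ef");
        System.out.println(containsAll("lab stef ecde", parts));    // true
        System.out.println(containsAll("lecde effe aaaab", parts)); // true
        System.out.println(containsAll("lab stef", parts));         // false
    }
}
```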
In Germany there is a proverb:
Warum einfach wenn es auch kompliziert geht?
"Why simple when you can do it complicated?"
That's what I think about regular expressions in this case.

This will do:
^.*(ab.*cd.*ef|ab.*ef.*cd|cd.*ab.*ef|cd.*ef.*ab|ef.*ab.*cd|ef.*cd.*ab).*$
You can test it here: http://www.regexplanet.com/advanced/java/index.html
I believe it is overkill, though: the number of alternatives grows as n! with the number of strings (3! = 6 here), so a more optimized solution would be better.

Related

Regex that starts with a capital letter and has length < 70

I want to match a string that starts with a capital letter and has length < 70.
I tried this regex ([A-Z][a-zA-Z\s\/\-]*\:?\'?) to check whether the string starts with a capital letter, and it works fine. But to check the length, I changed it to (([A-Z][a-zA-Z\s\/\-]*\:?\'?){4,70}) and it does not work.
I could check the length with the String length() method in an if statement, but that would make the if statement lengthy. I want to combine the length check into the regex itself. I think it can be done in a regex, but I am not sure how.
Update (forgot to mention): the string can contain either of two symbols, : or ', and only one of them may appear, zero or one time.
Acceptable strings:
Looking forwards to an opportunity
WORK EXPERIENCE:
WORK EXPERIENCE-
India's Prime Minister
Unacceptable strings:
Work Experience::
Manager's Educational Qualification-
work experience:
Education - 2014 - 2017
Education (Graduation)
Kindly help me.
Thanks in advance.
You'll certainly need anchors and lookarounds:
(?=^[^-':\n]*[-':]{0,1}[^-':\n]*$)^[A-Z][-':\w ]{4,70}$
Thus, a string of 5 to 71 characters will be matched (see a demo on regex101.com). Additionally, the lookahead checks that at most one of your special characters is present.
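In Java source the same regex needs doubled backslashes for \n and \w. A sketch that runs it over the sample strings from the question (with trailing spaces dropped, which is an assumption about the intended input); the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class CapitalHeading {
    // The answer's regex as a Java string literal.
    // Lookahead: at most one of - ' : anywhere; body: a capital letter plus 4..70 of [-':\w ].
    static final Pattern HEADING =
        Pattern.compile("(?=^[^-':\\n]*[-':]{0,1}[^-':\\n]*$)^[A-Z][-':\\w ]{4,70}$");

    static boolean isAcceptable(String s) {
        return HEADING.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Looking forwards to an opportunity", "WORK EXPERIENCE:", "WORK EXPERIENCE-",
            "India's Prime Minister", "Work Experience::", "Manager's Educational Qualification-",
            "work experience:", "Education - 2014 - 2017", "Education (Graduation)"
        };
        for (String s : samples) {
            // the first four samples should print true, the rest false
            System.out.println(isAcceptable(s) + "  " + s);
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which `String.matches` would do.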
I would add ^ and $ to your regex:
^[A-Z].{0,69}$
should work. This means:
^ beginning of the string
[A-Z] any capital character (in English anyway)
.{0,69} up to 69 other characters
$ end of the string
for a total length of up to 70 characters (use .{0,68} if the length must be strictly less than 70).
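Why would the if statement be lengthy?
String str = "Scary";
// Character.isUpperCase rejects a lowercase first letter; charAt(0) >= 'A' alone would also accept 'a'..'z'
if (!str.isEmpty() && str.length() < 70 && Character.isUpperCase(str.charAt(0))) {
}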
Put a lookahead assertion at the start of the regex that asserts the string contains between 4 and 70 characters:
(?=.{4,70}$)
In Java you would write:
String regex = "(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?";
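A quick way to see the lookahead enforcing the length bounds is to test strings just inside and just outside them. A sketch (class and method names are hypothetical; `String.repeat` needs Java 11+):

```java
import java.util.regex.Pattern;

class LengthLookahead {
    // The answer's regex: (?=.{4,70}$) checks the total length without consuming any characters.
    static final Pattern P = Pattern.compile("(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?");

    static boolean check(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("Scary"));              // true  (5 chars)
        System.out.println(check("Abc"));                // false (3 chars, below 4)
        System.out.println(check("A" + "b".repeat(69))); // true  (70 chars)
        System.out.println(check("A" + "b".repeat(70))); // false (71 chars)
    }
}
```

Note that this pattern only accepts an apostrophe at the very end, so a string such as "India's Prime Minister" is rejected even though the question lists it as acceptable.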
Working regex (written with /.../ delimiters, which Java does not use; note that it does not restrict the length):
/\A^[A-Z][A-Za-z]*\z/

Check a string order in a sentence

I want to find out if a specific word comes before another. Partial words are not a match.
Some example tests:
“Hi my name is AB, I’m from London and I love it here ..."
if "from" is before "Hi" -> return false
if "Hi" is before "AB" -> return true
There are several ways of doing this:
Use indexOf: this is perhaps the simplest approach. Get the indexes of the strings and compare them; the string with the lower index comes before the other (indexOf returns -1 if a string is not found, so check for that first).
Use regular expressions: construct a regex that matches the strings in the desired order, for example "from.*?Hi". This approach may require several regular expressions.
One twist on the first approach would be to start searching for the second word at the index of the first word plus the length of the word, and avoid index comparisons. With many searches and long strings this could save you some CPU cycles.
Note: Depending on the requirements you may need to watch out for the Scunthorpe problem, when you get a false positive for a match on a substring. If your requirement is that "Hi, my friend AB" should be matched, but "Higher than AB" should not be matched, then the regex approach with \b anchors on both ends of the word would provide an easier solution than manipulating string indexes. The "from.*?Hi" regex above becomes "\\bfrom\\b.*?\\bHi\\b".
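Both ideas above can be sketched side by side: the indexOf twist that starts the second search after the end of the first word, and the regex with \b anchors that avoids substring false positives. The method names are hypothetical, and `Pattern.quote` is used on the assumption that the words are plain text:

```java
import java.util.regex.Pattern;

class WordOrder {
    // indexOf twist: search for the second word only after the end of the first one.
    static boolean comesBefore(String text, String first, String second) {
        int i = text.indexOf(first);
        // -1 means the first word is missing, so it cannot come before anything
        return i >= 0 && text.indexOf(second, i + first.length()) >= 0;
    }

    // Regex variant with \b anchors, so "Hi" does not match inside "Higher".
    static boolean comesBeforeWord(String text, String first, String second) {
        Pattern p = Pattern.compile(
            "\\b" + Pattern.quote(first) + "\\b.*?\\b" + Pattern.quote(second) + "\\b",
            Pattern.DOTALL);
        // find() searches anywhere in the text, so no leading/trailing .* is needed
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String s = "Hi my name is AB, I'm from London and I love it here";
        System.out.println(comesBefore(s, "from", "Hi"));                    // false
        System.out.println(comesBefore(s, "Hi", "AB"));                      // true
        System.out.println(comesBefore("Higher than AB", "Hi", "AB"));       // true (substring false positive)
        System.out.println(comesBeforeWord("Higher than AB", "Hi", "AB"));   // false
        System.out.println(comesBeforeWord("Hi, my friend AB", "Hi", "AB")); // true
    }
}
```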
yourString.matches(".*? Hi\\b.*? AB\\b.*")
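This makes sure there is a space before each word and that you match whole words (though it will not match when "Hi" is the very first word, as in the example sentence, because no space precedes it).
If you're dealing with Latin American text, where punctuation can come right before words, this is more general: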
yourString.matches(".*?\\bHi\\b.*?\\bAB\\b.*")
Breaking that down you have
.*? = anything, even the empty string. Ignore the ? for now.
\\b = a word boundary
So that regex means
<anything><word boundary>Hi<word boundary><anything><word boundary>AB<word boundary><anything>
which is the same as
if "Hi" is before "AB" -> return true
which would be used as
if(yourString.matches(".*?\\bHi\\b.*?\\bAB\\b.*")){
return true;
}
You can take a look at indexOf(String str), which returns an integer denoting the position of the substring, or -1 if it is not found. You could use that to see which string precedes the other.
You can use the indexOf method to get the first occurrence of each word and then compare them. For example:
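String sentence = "Hi my name is AB, I’m from London and I love it here …";
int fromIndex = sentence.indexOf("from");
int hiIndex = sentence.indexOf("Hi");
// "from" comes before "Hi" only if both words are present and "from" has the lower index
boolean fromBeforeHi = fromIndex >= 0 && hiIndex >= 0 && fromIndex < hiIndex;
System.out.println(fromBeforeHi); // false for this sentence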
Note that if a word does not exist within the sentence, then indexOf will return -1.

Java's regex confusion

Basically, I want to match file names with a .json extension, excluding files that start with . and excluding list.json.
This is what I came up with (without Java string escapes):
(?i)^([^\.][^list].+|list.+)\.json$
I used an online regex tester, Regexplanet, to try my regex:
http://fiddle.re/x9g86
Everything works fine in the regex tester; however, when I tried it in Java, everything containing the letters l, i, s, or t was excluded, which is very confusing to me.
Can anyone give me some clues?
Many thanks in advance.
I want to match filename with .json extension but not file that start with . and excluding list.json.
I am not sure you need regular expressions for this. I find the following much easier on the eye:
boolean match = s.endsWith(".json") && !s.startsWith(".") && !s.equals("list.json");
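The question's regex uses (?i), so names like "a.Json" should probably match too, while the plain endsWith/equals check above is case-sensitive. A sketch of a case-insensitive variant, assuming that is the intent (class and method names are hypothetical):

```java
import java.util.Locale;

class JsonFileFilter {
    // Case-insensitive version of the endsWith/startsWith/equals check.
    static boolean isWanted(String name) {
        // Locale.ROOT avoids locale-specific lowercasing (e.g. Turkish dotless i in "LIST")
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".json") && !lower.startsWith(".") && !lower.equals("list.json");
    }

    public static void main(String[] args) {
        for (String n : new String[] {"abc.json", "a.Json", "list.json", "LIST.JSON", ".json", "notes.txt"}) {
            System.out.println(n + " -> " + isWanted(n));
        }
    }
}
```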
You're using a negated character class, [^list], which ignores character order: instead of excluding the word list, it matches a single character that is not l, i, s, or t.
Instead, you want to use a negative lookahead:
(?i)(?!^list\.json$)[^\.].*\.json
A negative look-ahead will do it.
(?i)(?!\.|list\.json$).*\.json
(?!\.|list\.json$) is a negative look-ahead checking that what follows is neither list.json followed by the end of the string, nor a dot (.).
Code:
String regex = "(?i)(?!\\.|list\\.json$).*\\.json";
System.out.println("list.json".matches(regex)); // false
System.out.println(".json".matches(regex)); // false
System.out.println("a.Json".matches(regex)); // true
System.out.println("abc.json".matches(regex)); // true
But NPE's more readable solution is probably preferred.

Java: substring.matches numbers

I'm trying to check whether a substring of my teleInput string contains numbers, but I can't get it to return true. The result always ends up false. What am I doing wrong?
String teleInput = "(555)555-5555";
boolean returntT = teleInput.substring(1,3).matches(".*[0-9].*");
I'm an extreme beginner, so I don't know if I'm missing something obvious, like mixing up methods, or if something is wrong with the rest of my code. Either way, I would really appreciate the help!
Your problem is your substring: you are only getting two characters, because the second parameter is the end index, which is exclusive (the character at that index is not included). To get 3 characters, you need:
teleInput.substring(1,4)
So try this (notice that matches() only needs to check that it's "all digits", because the length is already known to be 3):
teleInput.substring(1,4).matches("\\d*");
Or just forget substring and use matches alone:
teleInput.matches(".\\d{3}.*");
Which is the regex for "any character then 3 digits then anything".
But you can validate the entire input in one line:
teleInput.matches("(.\\d{3}){3}\\d");
You might have to read up on regex to understand this pattern, but it works!
Note: matches() must match the entire string to be true.
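To see the exclusive end index in action, here is a small sketch. The stricter full-format check assumes the input must look exactly like (555)555-5555, which the question does not state; the method names are hypothetical:

```java
class TeleInputCheck {
    // substring(begin, end): begin is inclusive, end is exclusive.
    static String areaCode(String teleInput) {
        return teleInput.substring(1, 4); // indexes 1, 2, 3 -> three characters
    }

    // Assumed stricter format: exactly "(ddd)ddd-dddd".
    static boolean isStrictPhone(String teleInput) {
        return teleInput.matches("\\(\\d{3}\\)\\d{3}-\\d{4}");
    }

    public static void main(String[] args) {
        String teleInput = "(555)555-5555";
        System.out.println(teleInput.substring(1, 3));             // 55
        System.out.println(areaCode(teleInput));                   // 555
        System.out.println(areaCode(teleInput).matches("\\d{3}")); // true
        System.out.println(isStrictPhone(teleInput));              // true
        System.out.println(isStrictPhone("(555)555 5555"));        // false
    }
}
```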
If you want to check whether the 2nd character in your string is a digit, you can do it like this:
String teleInput = "(555)555-5555";
boolean returntT = teleInput.substring(1,2).matches("[0-9]");
Instead of this line:
boolean returntT = teleInput.substring(1,3).matches(".*[0-9].*");
You should be doing:
boolean returntT = teleInput.substring(1,3).matches("[0-9]+");
OR this:
boolean returntT = teleInput.substring(1,3).matches("\\d{2}");
The regex [0-9]+ makes sure that the given input consists only of one or more digits. For a good regex tutorial, read: http://www.regular-expressions.info/

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine; you are just missing a \ before \d in the Java string literal:
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. To write the regex ^(\d+)(1\+1).* as a Java string literal, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output looks like the result of a ^(\d+)(1+1).* replacement, because a backslash is missing from the string (e.g. "^(\\d+)(1\+1).*").
Your regex looks fine to me. I don't have access to Java, but in JavaScript the code
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
returns 1234, as you'd expect. This also works on a string with several 'codes' in it, e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
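For comparison, a sketch of the same replacement in Java: `String.replaceAll` replaces every match, like the JavaScript `g` flag. The method name is hypothetical:

```java
class OnePlusOneReplace {
    // Same pattern as the JavaScript example, written as a Java string literal:
    // \d and \+ need doubled backslashes.
    static String stripCodes(String s) {
        return s.replaceAll("(\\d+)(1\\+1)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripCodes("12341+1"));         // 1234
        System.out.println(stripCodes("12341+1 54321+1")); // 1234 5432
        System.out.println(stripCodes("12241+1R1"));       // 1224R1
    }
}
```

Without the ^ anchor and trailing .* from the question's pattern, any suffix after "1+1" (such as R1) is kept; the anchored "^(\\d+)(1\\+1).*" turns "12241+1R1" into 1224.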
Personally, I wouldn't use a regex at all (it'd be like using a hammer on a thumbtack); I'd just take a substring:
stringName.substring(0, stringName.indexOf("1+1"))
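One caveat with the substring approach: indexOf returns -1 when "1+1" is absent, and substring(0, -1) throws a StringIndexOutOfBoundsException. A guarded sketch; returning the input unchanged in that case is an assumption, and the method name is hypothetical:

```java
class BeforeOnePlusOne {
    static String prefixBeforeOnePlusOne(String s) {
        int i = s.indexOf("1+1");
        // i == -1 means "1+1" was not found; return s unchanged instead of throwing
        return i >= 0 ? s.substring(0, i) : s;
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeOnePlusOne("12341+1"));    // 1234
        System.out.println(prefixBeforeOnePlusOne("12241+1R1"));  // 1224
        System.out.println(prefixBeforeOnePlusOne("100001+1R2")); // 10000
        System.out.println(prefixBeforeOnePlusOne("no code"));    // no code
    }
}
```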
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. (Thus ^(\d+?)(1\+1) matches as few digits as possible until it finds "1+1", NOT INCLUDING the "1+1" in the group, whereas with your original greedy \d+ the engine first grabs that 1 as well and then has to backtrack to give it back.)
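A quick check of both quantifiers in Java, as a sketch: on the question's inputs the greedy \d+ and the reluctant \d+? capture the same group, because the greedy version backtracks until 1\+1 can match. The method names are hypothetical:

```java
class GreedyVsReluctant {
    // Greedy: \d+ first takes as many digits as possible, then backtracks.
    static String greedy(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    // Reluctant: \d+? takes as few digits as possible, then extends one at a time.
    static String reluctant(String s) {
        return s.replaceAll("^(\\d+?)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12341+1", "12241+1R1", "100001+1R2"}) {
            System.out.println(greedy(s) + " " + reluctant(s));
        }
    }
}
```

Since \d can never match the '+', both quantifiers end up splitting the input at the same place here; the choice mainly affects how much backtracking the engine does.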
