How to use regex in Java to match a word

I am trying to match a word (Source Ip) where each letter can be lowercase or uppercase, so I have written a regex pattern, but m.find() returns false even for a correct match.
Is there something wrong with my regex pattern, or with the way I have written it?
String word = "Source Ip";
String pattern = "[S|s][O|o][U|u][R|r][C|c][E|e]\\s*[I|i][P|p]";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(word);
System.out.println(m.find());

You can simply use
String pattern = "SOURCE\\s*IP";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Pattern.CASE_INSENSITIVE will make the matching case insensitive.

You don't need to list both the uppercase and the lowercase form of every letter. (Note, as others have mentioned, that a character class does not need | to separate alternatives: putting | between square brackets also matches a literal |.)
You can parametrize your Pattern initialization with the Pattern.CASE_INSENSITIVE flag (alternatively, put the inline flag (?i) at the start of your pattern).
For instance:
Pattern.compile("(?i)source\\s*ip");
... or ...
Pattern.compile("source\\s*ip", Pattern.CASE_INSENSITIVE);
Note: the available flags are documented in the Pattern API.
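The following runnable sketch checks that the two approaches above behave the same. The class name and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

final class CaseInsensitiveDemo {
    // Case-insensitivity enabled through the compile flag.
    static final Pattern WITH_FLAG = Pattern.compile("source\\s*ip", Pattern.CASE_INSENSITIVE);
    // The same thing expressed with the inline (?i) modifier.
    static final Pattern INLINE = Pattern.compile("(?i)source\\s*ip");

    static boolean flagFinds(String s) {
        return WITH_FLAG.matcher(s).find();
    }

    static boolean inlineFinds(String s) {
        return INLINE.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Source Ip", "SOURCE IP", "sOuRcE   iP", "Target Ip"}) {
            // Both patterns should report the same result for every input.
            System.out.println(s + " -> " + flagFinds(s) + " / " + inlineFinds(s));
        }
    }
}
```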

This solution has the problem of accepting sourceip as well.
source\s*ip
Therefore the correct answer should be
source\s+ip
to force the presence of at least one whitespace between the two words.
To use this expression in Java you have to escape the backslash and use something like:
Pattern.compile("source\\s+ip", Pattern.CASE_INSENSITIVE);
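The following sketch shows the difference on the input "SourceIp". The class name and inputs are only illustrative.

```java
import java.util.regex.Pattern;

final class WhitespaceRequired {
    // \s* allows zero whitespace characters, so "SourceIp" is accepted.
    static final Pattern OPTIONAL_SPACE = Pattern.compile("source\\s*ip", Pattern.CASE_INSENSITIVE);
    // \s+ requires at least one whitespace character between the two words.
    static final Pattern REQUIRED_SPACE = Pattern.compile("source\\s+ip", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        for (String s : new String[] {"Source Ip", "SourceIp"}) {
            System.out.println(s + ": \\s* -> " + OPTIONAL_SPACE.matcher(s).find()
                    + ", \\s+ -> " + REQUIRED_SPACE.matcher(s).find());
        }
    }
}
```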

If you really want to handle the case in the regex itself rather than use the Pattern flags, this regex should suit your needs, and it works fine in Java for me:
[s|S][o|O][U|u][r|R][c|C][e|E][ ]*[i|I][p|P]
Your use of
\\s*
won't identify spaces, but this will:
\s*
You had one slash too many :)
EDIT:
I demonstrate my ignorance: after checking RegexPal I was wrong, and [sS] is better than [s|S]:
[sS][oO][Uu][rR][cC][eE][ ]*[iI][pP]
Thank you, Pshemo.
And yes, I completely forgot about escaping characters. Thank you, Mena, for pointing that out.
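Here is a short sketch of why [sS] is preferable. Inside square brackets, | is an ordinary character, so [s|S] also accepts a literal pipe. Because of this, the question's original pattern even matches a string made entirely of pipes. The class and method names are made up.

```java
import java.util.regex.Pattern;

final class CharClassPipe {
    // Inside [...] the | is literal, so [s|S] is the set {s, |, S}.
    static boolean pipeClassMatches(String s) {
        return Pattern.matches("[s|S]", s);
    }

    // [sS] is exactly the two letters s and S.
    static boolean plainClassMatches(String s) {
        return Pattern.matches("[sS]", s);
    }

    // The pattern from the question; every class also accepts '|'.
    static boolean originalPatternMatches(String s) {
        return Pattern.matches("[S|s][O|o][U|u][R|r][C|c][E|e]\\s*[I|i][P|p]", s);
    }

    public static void main(String[] args) {
        System.out.println(pipeClassMatches("|"));               // true
        System.out.println(plainClassMatches("|"));              // false
        System.out.println(originalPatternMatches("||||||||"));  // true
    }
}
```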

Related

Regex For All String Except Certain Characters

I am trying to write a regular expression that matches a certain word that is not preceded by 2 dashes (--) or a slash and a star (/*). I tried some expression but none seem to work.
Below is the text I am testing on
a_func(some_param);
/* a comment initialization */
init;
I am trying to write a regex that matches only the word init on the last line. What I've tried so far matches init both inside initialization and on the last line. I looked for existing answers and found some that use negative lookahead, but they still matched the init in initialization. Below are the expressions I tried:
(?!\/\*).*(init)
[^(\-\-|\/\*)].*(init)
(?<!\/\*).*(init)
While reading regex101's quick reference, I found this negative lookbehind, which I believe had an example similar to what I need, but I still couldn't get what I want. Should I look into negative lookbehind more, or is this not the way to achieve what I want?
My knowledge of regular expressions is not that extensive, so I don't know whether what I want is possible. Is it doable?
Assuming the -- or /* are on the same line as the init, there are some options. As the commenters said, multiline comments will likely require stronger techniques.
The simplest way I know is to actually preprocess the strings to remove the --.*$ and /\*.*$, then look for init (or init\b if you don't want to match initialization):
String input = "if init then foo";
String checkme = input.replaceAll("--.*$", "").replaceAll("/\\*.*$", "");
Pattern pattern = Pattern.compile("init"); // or "init\\b"
Matcher matcher = pattern.matcher(checkme);
System.out.println(matcher.find());
You can also use negative lookbehind, as in @olsli's answer.
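The preprocessing idea above can be applied to the question's three-line text. One caveat: without MULTILINE, $ matches only at the end of the input, so on multi-line text "/\\*.*$" would not remove a comment on a middle line. The sketch below therefore adds the inline (?m) flag and searches for \binit\b. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class StripCommentsThenSearch {
    // (?m) makes $ match at every line end, and "." never crosses a newline,
    // so each replacement removes only the rest of the line after "--" or "/*".
    static String stripComments(String text) {
        return text.replaceAll("(?m)--.*$", "").replaceAll("(?m)/\\*.*$", "");
    }

    // True if a standalone "init" survives after the comment tails are removed.
    static boolean containsInit(String text) {
        return Pattern.compile("\\binit\\b").matcher(stripComments(text)).find();
    }

    public static void main(String[] args) {
        String text = "a_func(some_param);\n/* a comment initialization */\ninit;";
        System.out.println(containsInit(text));          // true (the last line)
        System.out.println(containsInit("/* init */"));  // false
        System.out.println(containsInit("x -- init"));   // false
    }
}
```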
You can start with:
String input = "/*init";
Pattern pattern = Pattern.compile("^((?!--|\\/\\*).)*init");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.find());
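The following sketch extends the pattern above so that it works on the whole multi-line text from the question. The changes are (?m), so that ^ anchors at each line start, and \b around init, so that initialization is not counted. Both additions and the class name are assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadPerLine {
    // Each character before "init" is consumed only if neither "--" nor "/*"
    // starts at that position; "." does not cross newlines, so the check
    // stays within one line.
    static final Pattern INIT_NOT_COMMENTED =
            Pattern.compile("(?m)^(?:(?!--|/\\*).)*\\binit\\b");

    static boolean hasUncommentedInit(String text) {
        return INIT_NOT_COMMENTED.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "a_func(some_param);\n/* a comment initialization */\ninit;";
        Matcher m = INIT_NOT_COMMENTED.matcher(text);
        while (m.find()) {
            // Prints the matched line prefix ending in "init".
            System.out.println("match: " + m.group());
        }
    }
}
```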
I have added more parentheses to separate things out. This should also work; I tested it in Regexr and IDEONE:
Pattern p = Pattern.compile("^(?!=((\\/\\*)|(--)))([.]*?)init[.]*$", Pattern.MULTILINE|Pattern.CASE_INSENSITIVE);
String s = "/* Initialisation";
Matcher m = p.matcher(s);
System.out.println(m.find()); // true if there's a match

Can't get a match for regular expression in Java

This is the format/example of the string I want to get data:
<span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#B82933;font-size:120%' href='/cartelera/pelicula/18312'>Español </a></span><br><span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#FBEBC4;font-size:120%' href='/cartelera/pelicula/18313'>Subtitulada </a></span><br> </div>
And this is the regular expression I'm using for it:
"pelicula/([0-9]*)'>([\\w\\s]*)</a>"
I tested this regular expression in RegexPlanet, and it turned out OK, it gave me the expected result:
group(1) = 18313
group(2) = Subtitulada
But when I try to implement that regular expression in Java, it won't match anything. Here's the code:
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");
Matcher matcher = pattern.matcher(inputLine);
while(matcher.find()){
version = matcher.group(2);
}
}
What's the problem? The regular expression is already tested, and in the same code I search for other patterns, but I'm having trouble with two of them (I'm showing just one here). Thank you in advance!
EDIT:
I discovered the problem. If I check the source code of the page in a browser, it shows everything, but when I consume it from Java, I get different source code. Why? Because the page asks for your city so it can show information for it. I don't know if there's a workaround to actually access the information I want, but that's it.
Your regex is correct, but \w does not match ñ: by default, Java's \w only covers [a-zA-Z_0-9].
I changed the regex to
"pelicula/([0-9]*)'>(.*?)</a>"
and it seems to match both the occurrences.
Here I've used the reluctant *? quantifier to prevent .* from matching everything between the first <a> and the last </a>.
See What is the difference between `Greedy` and `Reluctant` regular expression quantifiers? for explanation.
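The sketch below runs both regexes on a shortened version of the question's HTML. The shortening is an assumption, and \u00f1 is used for "ñ" to avoid source-encoding issues. It also tries a third option: keep [\\w\\s]* but compile with Pattern.UNICODE_CHARACTER_CLASS, which makes \w Unicode-aware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MovieLinks {
    // Shortened HTML from the question.
    static final String HTML =
            "<a href='/cartelera/pelicula/18312'>Espa\u00f1ol </a></span><br>"
          + "<a href='/cartelera/pelicula/18313'>Subtitulada </a></span><br>";

    // Default \w is ASCII-only, so the "Espanol" link (with n-tilde) is skipped.
    static final Pattern ASCII_WORD = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");
    // Reluctant .*? stops at the first "</a>" after each link.
    static final Pattern RELUCTANT = Pattern.compile("pelicula/([0-9]*)'>(.*?)</a>");
    // Keeps \w but makes it match Unicode letters.
    static final Pattern UNICODE_WORD = Pattern.compile(
            "pelicula/([0-9]*)'>([\\w\\s]*)</a>", Pattern.UNICODE_CHARACTER_CLASS);

    // Collects "id=text" for every match.
    static List<String> extract(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract(ASCII_WORD, HTML).size());   // 1
        System.out.println(extract(RELUCTANT, HTML).size());    // 2
        System.out.println(extract(UNICODE_WORD, HTML).size()); // 2
    }
}
```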
@Bohemian is correct in pointing out that you might also need to enable the Pattern.DOTALL flag if the text in <a> contains line breaks.
If your input spans several lines (i.e. it contains newline characters), you'll need to turn on "dot matches newline". This only changes what . matches, so it matters for a pattern such as (.*?); \s already matches newlines.
There are two ways to do this:
Use the "dot matches newline" inline switch (?s) in your regex:
Pattern pattern = Pattern.compile("(?s)pelicula/([0-9]*)'>(.*?)</a>");
or use the Pattern.DOTALL flag in the call to Pattern.compile():
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>(.*?)</a>", Pattern.DOTALL);
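Here is a small check of which patterns survive a line break inside the link text. The input string is made up for this sketch.

```java
import java.util.regex.Pattern;

final class DotAllDemo {
    // Made-up link whose text contains a newline.
    static final String INPUT = "href='/cartelera/pelicula/18313'>Subti\ntulada </a>";

    static final Pattern DEFAULT = Pattern.compile("pelicula/([0-9]*)'>(.*?)</a>");
    static final Pattern INLINE_S = Pattern.compile("(?s)pelicula/([0-9]*)'>(.*?)</a>");
    static final Pattern FLAG = Pattern.compile("pelicula/([0-9]*)'>(.*?)</a>", Pattern.DOTALL);
    // No "." here: \s already matches '\n', so no flag is needed.
    static final Pattern WORD_SPACE = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");

    public static void main(String[] args) {
        System.out.println(DEFAULT.matcher(INPUT).find());    // false: "." stops at '\n'
        System.out.println(INLINE_S.matcher(INPUT).find());   // true
        System.out.println(FLAG.matcher(INPUT).find());       // true
        System.out.println(WORD_SPACE.matcher(INPUT).find()); // true
    }
}
```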

Java Regex for username

I'm looking for a regex in Java (java.util.regex) that accepts only letters, ’, -, and ., including Unicode letters such as umlauts, eszett, letters with diacritics, and other valid letters from European languages.
What I don't want is numbers, spaces like “ ” or “ Tom”, or special characters like !”£$% etc.
So far I'm finding it very confusing.
I started with this
[A-Za-z.\\s\\-\\.\\W]+$
And ended up with this:
[A-Za-z.\\s\\-\\.\\D[^!\"£$%\\^&*:;##~,/?]]+$
I used the caret to negate the inner square brackets, according to the documentation.
Anyone have any suggestions for a new regex or reasons why the above isn't working?
For my answer, I want to use a simpler regex similar to yours: [A-Z[^!]]+, which means "at least once: (a character from A to Z) or (a character that is not '!')".
Note that "not '!'" already includes A to Z, so everything else in the outer character class ([A-Z...) is pointless.
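Try [\p{Alpha}'.-]+ and compile the regex with the Pattern.UNICODE_CHARACTER_CLASS flag. Keep the hyphen last in the class: written as '-. it becomes a range from ' to ., which also admits ( ) * + and ,.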
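The suggestion can be sketched as a validator as follows. The class name and sample names are illustrative, and letters are written as \u escapes to avoid source-encoding issues. The sketch also shows the range pitfall: the '-. form accepts "a*b".

```java
import java.util.regex.Pattern;

final class UsernameCheck {
    // Hyphen last, so it is literal. UNICODE_CHARACTER_CLASS makes \p{Alpha}
    // accept letters such as u-umlaut and eszett, not just ASCII A-Z/a-z.
    static final Pattern NAME =
            Pattern.compile("[\\p{Alpha}'.-]+", Pattern.UNICODE_CHARACTER_CLASS);
    // '-. is a range from ' (U+0027) to . (U+002E), which includes ( ) * + ,
    static final Pattern RANGE_BUG =
            Pattern.compile("[\\p{Alpha}'-.]+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isValid(String name) {
        // matches() requires the whole string to fit, so no ^...$ is needed.
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"M\u00fcller", "Stra\u00dfer", "O'Brien", "Jean-Luc",
                            "Tom1", " Tom", "Tom!", "a*b"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
        System.out.println("range version accepts a*b: " + RANGE_BUG.matcher("a*b").matches());
    }
}
```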
Use (?=.*[##$%&\s]): it returns true when the username contains at least one special character (from the set) or a space.
You can add more special characters as your requirements dictate. For example:
String str = "k$shor";
String regex = "(?=.*[##$%&\\s])";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find()); // prints true

Find string in between two strings using regular expression

I am using a regular expression to find a string between two other strings.
Code:
Pattern pattern = Pattern.compile("EMAIL_BODY_XML_START_NODE"+"(.*)(\\n+)(.*)"+"EMAIL_BODY_XML_END_NODE");
Matcher matcher = pattern.matcher(part);
if (matcher.find()) {
..........
It works fine for plain text, but it breaks when the text contains special characters such as newlines.
You need to compile the pattern so that . matches line terminators as well. To do this, use the DOTALL flag.
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
edit: Sorry, it's been a while since I've had this problem. You'll also have to change the middle of the regex from (.*)(\\n+)(.*) to (.*?). You need the lazy quantifier (*?) if you have multiple EMAIL_BODY_XML_START_NODE elements; otherwise the regex will match from the start of the first element to the end of the last element, rather than producing a separate match for each element. I'm guessing, though, that this is unlikely to be the case for you.
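Both fixes together (DOTALL plus the lazy (.*?)) are sketched below, with a greedy version for contrast. The marker names come from the question; the class name and input text are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BetweenMarkers {
    // DOTALL lets "." cross line breaks; the lazy (.*?) stops at the nearest
    // end marker, so each START...END pair is captured separately.
    static final Pattern LAZY = Pattern.compile(
            "EMAIL_BODY_XML_START_NODE(.*?)EMAIL_BODY_XML_END_NODE", Pattern.DOTALL);
    // Greedy (.*) runs from the first start marker to the last end marker.
    static final Pattern GREEDY = Pattern.compile(
            "EMAIL_BODY_XML_START_NODE(.*)EMAIL_BODY_XML_END_NODE", Pattern.DOTALL);

    static List<String> bodies(Pattern p, String part) {
        List<String> result = new ArrayList<>();
        Matcher matcher = p.matcher(part);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String part = "EMAIL_BODY_XML_START_NODE<a>\nline 1\n</a>EMAIL_BODY_XML_END_NODE"
                + "\nnoise\n"
                + "EMAIL_BODY_XML_START_NODE<b/>EMAIL_BODY_XML_END_NODE";
        System.out.println(bodies(LAZY, part).size());   // 2
        System.out.println(bodies(GREEDY, part).size()); // 1
    }
}
```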

Java Regex Question - Ignore Quotations

I am trying to write a program using regex. The format for an identifier, as I might have explained in another question of mine, is that it can only begin with a letter (and the rest of it can contain whatever). I have this part worked out for the most part.
However, anything within quotes cannot count as an identifier either.
Currently I am using Pattern pattern = Pattern.compile("[A-Za-z][_A-Za-z0-9]*"); as my pattern, which requires the first character to be a letter. How can I edit this to check whether the word is surrounded by quotation marks (and exclude those words)?
Use negative lookaround assertions:
"(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")"
Example:
Pattern pattern = Pattern.compile("(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")");
Matcher matcher = pattern.matcher("Foo \"bar\" baz");
while (matcher.find())
{
System.out.println(matcher.group());
}
Output:
Foo
baz
Use lookarounds.
"(?<![\"A-Za-z])[A-Z...
The (?<![\"A-Za-z]) part means "if the previous character is not a quotation mark or a letter".
