Regex For All String Except Certain Characters - java

I am trying to write a regular expression that matches a certain word only when it is not preceded by two dashes (--) or a slash and a star (/*). I tried several expressions, but none seems to work.
Below is the text I am testing on:
a_func(some_param);
/* a comment initialization */
init;
I am trying to write a regex that matches only the word init on the last line. What I have tried so far matches the init inside initialization as well as the one on the last line. I looked at existing answers that use negative lookahead, but they still matched the init in initialization. Below are the expressions I tried:
(?!\/\*).*(init)
[^(\-\-|\/\*)].*(init)
(?<!\/\*).*(init)
While reading regex101's quick reference, I found negative lookbehind, which I believe had an example similar to what I need, but I still could not get what I want. Should I look into negative lookbehind further, or is this not the way to achieve it?
My knowledge of regular expressions is not that extensive, so I don't know whether what I want is possible. Is it doable?

Assuming the -- or /* are on the same line as the init, there are some options. As the commenters said, multiline comments will likely require stronger techniques.
The simplest way I know is to actually preprocess the strings to remove the --.*$ and /\*.*$, then look for init (or init\b if you don't want to match initialization):
String input = "if init then foo";
String checkme = input.replaceAll("--.*$", "").replaceAll("/\\*.*$", "");
Pattern pattern = Pattern.compile("init"); // or "init\\b"
Matcher matcher = pattern.matcher(checkme);
System.out.println(matcher.find());
You can also use negative lookbehind as in @olsli's answer.

You can start with:
String input = "/*init";
Pattern pattern = Pattern.compile("^((?!--|\\/\\*).)*init");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.find());
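To see how this kind of pattern behaves on the multi-line sample text from the question, here is a minimal sketch. It assumes the comment marker and init are on the same line; the class name InitLineFinder, the method findInitOffsets, and the added \b word boundaries are illustrative choices, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class InitLineFinder {
    // Every character consumed before "init" must not begin "--" or "/*".
    // MULTILINE lets ^ match at the start of each line; \b keeps "initialization" out.
    private static final Pattern INIT_OUTSIDE_COMMENT =
            Pattern.compile("^((?!--|/\\*).)*?\\b(init)\\b", Pattern.MULTILINE);

    // Returns the start offset of each "init" that is not preceded by a comment marker.
    static List<Integer> findInitOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher matcher = INIT_OUTSIDE_COMMENT.matcher(text);
        while (matcher.find()) {
            offsets.add(matcher.start(2)); // group 2 is the word itself
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "a_func(some_param);\n"
                + "/* a comment initialization */\n"
                + "init;";
        // Only the init on the last line is reported.
        System.out.println(findInitOffsets(text));
    }
}
```

Comments that span several lines are not handled by this sketch; they would need a different approach, as noted in the first answer.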

I have added more braces to separate things out. This should also work, tested it in Regexr and IDEONE
// Every character before "init" must not begin "/*" or "--".
Pattern p = Pattern.compile("^((?!/\\*|--).)*?init.*$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
String s = "/* Initialisation";
Matcher m = p.matcher(s);
System.out.println(m.find()); // find() returns a boolean: false here, because "init" is preceded by "/*"

Related

Regex matches exact string contains word

I want to "catch" the following path to perform some action on it:
/root/m/api/users/<user-id-can be any combination of characters and digits>/content
The path must end with content.
For example:
/root/m/api/users/acme/content
To do so, I match the path against a regex to check whether it is the correct one:
private boolean isPathAllow(final String urlToBlock) {
Matcher matcher = Pattern.compile("^/root/m/api/users/.*/content$").matcher(urlToBlock);
return matcher.matches();
}
But it returns true even for requests like:
/root/m/api/users/acme/applications/versions/1.0/content
So I must be doing something wrong in the pattern I pass to matches.
How can I make it work as intended?
I succeeded with:
Matcher matcher = Pattern.compile("^/root/m/api/users/\\w*/content$").matcher(urlToBlock);
or
Matcher matcher = Pattern.compile("^/root/m/api/users/[^/]+/content$").matcher(urlToBlock);
So what is the difference between them (\\w* vs [^/]+)?
.* is greedy, so it takes everything between users/ and the final /content, including any slashes.
Use [^/]+ to match everything that is not a / between users/ and /content. Making the .* lazy by appending a question mark (.*?) does not help here: the pattern is anchored with $, so the lazy quantifier simply keeps expanding until /content sits at the end of the input.
A 'greedy' quantifier tries to match as many tokens as possible. A 'lazy' quantifier tries to match as few as possible and stops at the first position where the rest of the pattern matches.
In some cases, greedy quantifiers can also be much less efficient, because the regex engine first consumes more (or a lot more) tokens than the actual match needs and backtracks only after a failure.
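The following sketch compares the variants side by side. The class name UserContentPath and the constant names are made up for illustration. It also hints at the difference asked about in the question: \w* accepts only letters, digits and underscores and may be empty, while [^/]+ accepts any character except a slash and requires at least one.

```java
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
class UserContentPath {
    static final Pattern GREEDY = Pattern.compile("^/root/m/api/users/.*/content$");
    static final Pattern LAZY = Pattern.compile("^/root/m/api/users/.*?/content$");
    static final Pattern ONE_SEGMENT = Pattern.compile("^/root/m/api/users/[^/]+/content$");
    static final Pattern WORD_CHARS = Pattern.compile("^/root/m/api/users/\\w*/content$");

    // Accepts exactly one non-empty path segment between users/ and /content.
    static boolean isPathAllowed(String path) {
        return ONE_SEGMENT.matcher(path).matches();
    }

    public static void main(String[] args) {
        String nested = "/root/m/api/users/acme/applications/versions/1.0/content";
        System.out.println(GREEDY.matcher(nested).matches());      // true: .* crosses slashes
        System.out.println(LAZY.matcher(nested).matches());        // true: anchoring forces expansion
        System.out.println(ONE_SEGMENT.matcher(nested).matches()); // false
        System.out.println(WORD_CHARS.matcher(nested).matches());  // false

        // A user id containing a dot passes [^/]+ but not \w*.
        System.out.println(isPathAllowed("/root/m/api/users/john.doe/content")); // true
        // An empty user id passes \w* but not [^/]+.
        System.out.println(WORD_CHARS.matcher("/root/m/api/users//content").matches()); // true
    }
}
```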

A strange regular expression with lookbehind

I wrote a piece of code to fetch the content of a string between ":" (which may be absent) and "#", in that order. For example, from "url:123#my.com" I want to fetch "123", and from "123#my.com" I also want "123". I wrote a regular expression to implement this, but it does not work. Here is the first version:
Pattern pattern = Pattern.compile("(?<=:?).*?(?=#)");
Matcher matcher = pattern.matcher("sip:+8610086#dmcw.com");
if (matcher.find()) {
Log.d("regex", matcher.group());
} else {
Log.d("regex", "not match");
}
It does not work: in the first case, "url:123#my.com", it returns "url:123", which is obviously not what I want.
So I wrote the second version:
Pattern pattern = Pattern.compile("(?<=:??).*?(?=#)");
But it throws an error; somebody said Java does not support variable-length lookbehind.
So I tried the third version:
Pattern pattern = Pattern.compile("(?<=:).*?(?=#)|.*?(?=#)");
Its result is the same as the first version. But shouldn't the first alternative be considered first?
It behaves the same as
Pattern pattern = Pattern.compile(".*?(?=#)|(?<=:).*?(?=#)");
so the alternatives do not seem to be tried from left to right! I thought I understood regular expressions, but now I am confused again. Thanks in advance.
Try this (slightly edited, see comments):
String test = "sip:+8610086#dmcw.com";
String test2 = "8610086#dmcw.com";
Pattern pattern = Pattern.compile("(.+?:)?(.+?)(?=#)");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
matcher = pattern.matcher(test2);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Output:
+8610086
8610086
Let me know if you need explanations on the pattern.
You really don't need any lookaheads or lookbehinds here. What you need can be accomplished with a greedy quantifier and some alternation:
.*(?:^|:)([^#]+)
By default, Java regular expression quantifiers (*, +, ?, {n}) are greedy: they match as many characters as possible while still allowing the overall match to succeed. They can be made lazy by adding a question mark after the quantifier, like so: .*?
You will want to output capture group 1 for this expression; capture group 0 returns the entire match.
As you said, you can't use a variable-length lookbehind in Java.
Instead, you can do something like this; no lookbehind or other lookaround is needed.
Regex: :?([^#:]*)#
With this regex, the first capture group contains what you need, and you don't have to do anything special. Sometimes the easiest solution is the best.
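As a rough illustration of the capture-group approach, the sketch below wraps the regex :?([^#:]*)# in a small helper. The class name UserPartExtractor and the method extract are hypothetical, and returning null when there is no "#" is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class UserPartExtractor {
    // Optional ':', then a run of characters that are neither '#' nor ':', then '#'.
    private static final Pattern USER_PART = Pattern.compile(":?([^#:]*)#");

    // Returns the text between the optional ':' and the '#', or null if there is no '#'.
    static String extract(String input) {
        Matcher matcher = USER_PART.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("sip:+8610086#dmcw.com")); // +8610086
        System.out.println(extract("8610086#dmcw.com"));      // 8610086
        System.out.println(extract("url:123#my.com"));        // 123
    }
}
```

Because the character class excludes ':', an attempt that starts before the colon cannot reach the '#', so the engine moves on until the match begins at the colon itself.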

Finding substring in RegEx Java

Hello I have a question about RegEx. I am currently trying to find a way to grab a substring of any letter followed by any two numbers such as: d09.
I came up with the RegEx ^[a-z]{1}[0-9]{2}$ and ran it on the string
sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0
However, it never finds r30, the code below shows my approach in Java.
Pattern pattern = Pattern.compile("^[a-z]{1}[0-9]{2}$");
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
if(matcher.matches())
System.out.println(matcher.group(1));
It never prints anything because the matcher never finds the substring (I checked in the debugger). What am I doing wrong?
There are three errors:
Your expression contains anchors. ^ matches only at the start of the string, and $ only matches at the end. So your regular expression will match "r30" but not "foo_r30_bar". You are searching for a substring so you should remove the anchors.
matches() should be find(): matches() succeeds only when the entire input matches the pattern.
You don't have a group 1 because you have no parentheses in your regular expression. Use group() instead of group(1).
Try this:
Pattern pattern = Pattern.compile("[a-z][0-9]{2}");
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
if(matcher.find()) {
System.out.println(matcher.group());
}
Matcher Documentation
A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
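A small sketch can make the three operations concrete. The class MatchKinds and its describe method are hypothetical names used only for this illustration; the Matcher methods themselves are the ones quoted above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class MatchKinds {
    // Runs the three kinds of match operation on fresh matchers and reports each result.
    static String describe(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        boolean whole = pattern.matcher(input).matches();     // entire input must match
        boolean prefix = pattern.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = pattern.matcher(input).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        String regex = "[a-z][0-9]{2}";
        System.out.println(describe(regex, "r30"));
        System.out.println(describe(regex, "r30.reed"));
        System.out.println(describe(regex, "sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0"));
    }
}
```

For the string from the question, only find succeeds, which is why the unanchored pattern combined with find() is the suggested fix.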
It doesn't match because ^ and $ delimit the start and the end of the string. If you want the match to be anywhere in the string, remove them and you will succeed.
Your regex is anchored, as such it will never match unless the whole input matches your regex. Use [a-z][0-9]{2}.
Don't use .matches() but .find(): .matches() is shamefully misnamed and tries to match the whole input.
How about "[a-z][0-9][0-9]"? That should find all of the substrings that you are looking for.
^[a-z]{1}[0-9]{2}$
sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0
As far as I can read this, the pattern means: one lowercase letter at the very start, followed by exactly two digits, then the end of the string. So it can only match a string that is exactly 3 characters long.
If you give me more information about your string, maybe I can help further.
EDIT
If you are sure about the number of dots, then
change this line
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
to
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0".split("\\.")[0]);
Note: with my solution you should omit the leading ^ from the pattern.
See the String.split documentation for details on splitting strings.

Find string in between two strings using regular expression

I am using a regular expression to find the string between two strings.
Code:
Pattern pattern = Pattern.compile("EMAIL_BODY_XML_START_NODE"+"(.*)(\\n+)(.*)"+"EMAIL_BODY_XML_END_NODE");
Matcher matcher = pattern.matcher(part);
if (matcher.find()) {
..........
It works fine for plain text, but it breaks when the text contains special characters such as newlines.
You need to compile the pattern so that . matches line terminators as well. To do this, use the DOTALL flag.
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
Edit: Sorry, it's been a while since I've had this problem. You'll also have to change the middle regex from (.*)(\\n+)(.*) to (.*?). You need the lazy quantifier (*?) if you have multiple EMAIL_BODY_XML_START_NODE elements. Otherwise the regex will match from the start of the first element to the end of the last element instead of producing a separate match for each element. Though I'm guessing this is unlikely to be the case for you.
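Here is a minimal sketch of DOTALL combined with a lazy quantifier. It assumes the two markers are the literal strings shown in the question; the class name BodyExtractor, the method extractBodies, and the use of Pattern.quote to escape the markers are additions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class BodyExtractor {
    // Assumption: the markers appear literally in the text.
    private static final String START = "EMAIL_BODY_XML_START_NODE";
    private static final String END = "EMAIL_BODY_XML_END_NODE";

    // DOTALL lets . match line terminators; (.*?) stops at the nearest end marker.
    private static final Pattern BODY = Pattern.compile(
            Pattern.quote(START) + "(.*?)" + Pattern.quote(END), Pattern.DOTALL);

    // Returns the text between each start/end marker pair, in order of appearance.
    static List<String> extractBodies(String part) {
        List<String> bodies = new ArrayList<>();
        Matcher matcher = BODY.matcher(part);
        while (matcher.find()) {
            bodies.add(matcher.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String part = START + "line one\nline two" + END
                + " filler " + START + "second" + END;
        System.out.println(extractBodies(part).size()); // 2
    }
}
```

With a greedy (.*) instead, the same input would produce a single match running from the first start marker to the last end marker.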

How can I make a Java regex all or nothing?

I'm trying to make a regex all or nothing in the sense that the given word must EXACTLY match the regular expression - if not, a match is not found.
For instance, if my regex is:
^[a-zA-Z][a-zA-Z|0-9|_]*
Then I would want to match:
cat9
cat9_
bob_____
But I would NOT want to match:
cat7-
cat******
rango78&&
I want my regex to be as strict as possible, going for an all or nothing approach. How can I go about doing that?
EDIT: To make my regex absolutely clear, a pattern must start with a letter, followed by any number of numbers, letters, or underscores. Other characters are not permitted. Below is the program in question I am using to test out my regex.
Pattern p = Pattern.compile("^[a-zA-Z][a-zA-Z|0-9|_]*");
Scanner in = new Scanner(System.in);
String result = "";
while(!result.equals("-1")){
result = in.nextLine();
Matcher m = p.matcher(result);
if(m.find())
{
System.out.println(result);
}
}
I think that if you use String.matches(regex), then you will get the effect you are looking for. The documentation says that matches() will return true only if the entire string matches the pattern.
The regex won't match the second example. It's already strict, since * and & are not in the allowed set of characters.
It may match a prefix, but you can avoid this by adding '$' to the end of the regex, which explicitly matches end of input. So try,
^[a-zA-Z][a-zA-Z|0-9|_]*$
This will ensure the match is against the entire input string, and not just a prefix.
Note that \w is the same as [A-Za-z0-9_]. And you need to anchor to the end of the string like so:
Pattern p = Pattern.compile("^[a-zA-Z]\\w*$");
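To check the sample words from the question, here is a short sketch that uses Matcher.matches() so the whole word has to fit the pattern. The class name IdentifierCheck and the method isIdentifier are hypothetical. One side note: inside a character class, | is a literal character, so the original class [a-zA-Z|0-9|_] also accepts the pipe symbol, whereas \w does not.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class IdentifierCheck {
    // A letter followed by any number of letters, digits or underscores.
    // No ^ or $ needed: matches() already requires the entire input to match.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z]\\w*");

    static boolean isIdentifier(String word) {
        return IDENTIFIER.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {"cat9", "cat9_", "bob_____", "cat7-", "cat******", "rango78&&"};
        for (String word : words) {
            System.out.println(word + " -> " + isIdentifier(word));
        }
    }
}
```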
