Matching pound (#) or empty line comments with regex

To start with, I am using Java, in case this influences the regex.
I am trying to match a line that starts with any number of whitespace characters (and nothing else), followed by any number of pound signs (#), followed by any characters, and ending with a newline.
Alternatively, a fully empty line containing only whitespace or a newline.
I tried writing the first part myself, but it doesn't seem to match any of the comments:
^(?!.+)#+.*$
It doesn't work even if I include \r*\n* on the end

In your regexr example you have selected JavaScript and enabled the s flag so that the dot matches a newline.
If you want to match all lines, you can enable the multiline and global flags instead, and use
^[^\S\r\n]*(?:#.*)?\r?\n
In Java, you might use
^\h*(?:#.*)?\R
With doubled backslashes in a Java string literal:
String regex = "^\\h*(?:#.*)?\\R";
The pattern matches:
^ Start of the string (start of each line when Pattern.MULTILINE is set)
\h* Match optional horizontal whitespace chars
(?:#.*)? Optionally match # followed by the rest of the line
\R Match any Unicode newline sequence
If you want to match the whole line without consuming the newline, you can assert the end of the line with the anchor $ instead of matching \R:
^\h*(?:#.*)?$
Regex demo
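As a minimal sketch of how the $-anchored pattern could be used from Java: the class and method names below are hypothetical, and the sketch assumes Pattern.MULTILINE is wanted so that ^ and $ apply to every line rather than only to the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class CommentLineFinder {
    // MULTILINE makes ^ and $ match at every line boundary,
    // not only at the start and end of the whole input.
    private static final Pattern COMMENT_OR_BLANK =
            Pattern.compile("^\\h*(?:#.*)?$", Pattern.MULTILINE);

    // Collects every line that is blank or holds only a # comment.
    static List<String> commentOrBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = COMMENT_OR_BLANK.matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "# header\nkey = value\n\n   ## indented\nlast";
        for (String line : commentOrBlankLines(text)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Without the flag, the same pattern would only be tried at the very start of the input, which is one likely reason a pattern that works in an online tester finds nothing in Java.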

Related

What is the Regular Expression to get all the newline characters from the end of the string

I have tried [\s]+$ and (?:$|\s)+$ but I don't get the desired output.
What I am looking for is this:
String str = "this is a string ending with multiple newlines\n\n\n";
The newline can be \n, \r or \r\n depending on the OS, so I use \s+ here.
I need to find all the newline characters at the end of the string,
and I have to use it in Java code.

The point is that \s, in Java, matches only ASCII whitespace by default ([ \t\n\x0B\f\r]); it matches any Unicode whitespace if you use (?U)\s.
You can use
String regex = "\\R+$";
String regex = "\\R+\\z";
See the regex demo.
If you need to get each individual line break sequence at the end of string, you can use
String regex = "\\R(?=\\R*$)";
See this regex demo.
These patterns mean
\R+ - one or more line break sequences
$ - at the end of the string (\z matches the very end of string and will work identically in this case)
\R(?=\R*$) - any line break sequence followed with zero or more line break sequences up to the end of the whole string.
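The two patterns above could be exercised roughly as follows. This is only a sketch: the class and method names are made up for illustration, and it uses \z rather than $ throughout, which the answer notes behaves identically here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class TrailingLineBreaks {
    // All line break sequences at the very end of the input, as one match.
    private static final Pattern TRAILING = Pattern.compile("\\R+\\z");
    // One match per line break sequence that is followed only by line breaks.
    private static final Pattern EACH_TRAILING = Pattern.compile("\\R(?=\\R*\\z)");

    // Returns the run of trailing line breaks, or "" if there is none.
    static String trailingLineBreaks(String s) {
        Matcher m = TRAILING.matcher(s);
        return m.find() ? m.group() : "";
    }

    // Counts trailing line break sequences; "\r\n" counts as one.
    static int countTrailingLineBreaks(String s) {
        Matcher m = EACH_TRAILING.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "this is a string ending with multiple newlines\n\n\n";
        System.out.println(countTrailingLineBreaks(str));
        System.out.println(trailingLineBreaks(str).length());
    }
}
```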

Regex string validation

Trying to write some regex to validate a string, where null and empty strings are not allowed, but characters + new line should be allowed. The string I'm trying to validate is as follows:
First line \n
Second line \n
This is as far as I got:
^(?!\s*$).+
This fails my validation because of the new line. Any ideas? I should add that I cannot use awk.

Code
The following regex matches the entire line.
See regex in use here
^[^\r\n]*?\S.*$
The following regexes do the same as above, except that they are meant for validation only: they don't match the whole line, they simply ensure it is properly formed. The benefit of using these regexes over the one above is the number of steps (performance). In the regex101 links below they show as 28 steps as opposed to 34 for the pattern above.
See regex in use here
^[^\r\n]*?\S
See regex in use here
^.*?\S
Results
Input
First line \n
Second line \n
s
Output
Matches only
First line \n
Second line \n
s
Explanation
^ Assert position at the start of the line
[^\r\n]*? Match any character not present in the set (any character except the carriage return or line-feed characters) any number of times, but as few as possible (making this lazy improves performance: fewer steps)
\S Match any non-whitespace character
.* Match any character (excludes newline characters) any number of times
$ Assert position at the end of the line
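A possible way to apply the validation-only pattern in Java is sketched below. The class and method names are hypothetical, and the null check is an assumption taken from the question's requirement that null is not allowed. The sketch shows both readings of ^: start of the input (no flag) and start of any line (Pattern.MULTILINE).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class NonBlankValidator {
    // Without flags, ^ only matches at the start of the whole input.
    private static final Pattern FIRST_LINE =
            Pattern.compile("^[^\\r\\n]*?\\S");
    // With MULTILINE, ^ also matches right after each line terminator.
    private static final Pattern ANY_LINE =
            Pattern.compile("^[^\\r\\n]*?\\S", Pattern.MULTILINE);

    // True if the first line contains at least one non-whitespace character.
    static boolean firstLineHasText(String s) {
        return s != null && FIRST_LINE.matcher(s).find();
    }

    // True if any line contains at least one non-whitespace character.
    static boolean anyLineHasText(String s) {
        return s != null && ANY_LINE.matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "First line \nSecond line \n";
        System.out.println(firstLineHasText(input));
        System.out.println(anyLineHasText(input));
    }
}
```

find() is used rather than matches() because the pattern is only meant to confirm that a suitable prefix exists, not to consume the whole input.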

Try this pattern:
([\S ]*(\n)*)*

How to Java Regex to match everything but specified pattern

I am trying to match everything but garbage values in the entire string. The pattern I am trying to use is:
^.*(?!\w|\s|-|\.|[#:,]).*$
I have been testing the pattern on regexPlanet and it seems to match the entire string. The input string I was using was:
Vamsi///#k03#g!!!l.com 123**5
How can I get it to match only the garbage characters? I would like to replace anything that matches with an empty space or a special character of my choice.

The pattern, as written, is supposed to match the whole string.
^ - start of string.
.* - zero or more of any character.
(?!\w|\s|-|\.|[#:,]) - negative look-ahead for some characters.
.* - zero or more of any character.
$ - end of string.
If you only want to match characters which aren't one of the supplied characters, try simply:
[^-\w\s.#:,]
[^...] is a negated character class, it will match any characters not supplied in the brackets. See this for more information.
Test.
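For the replacement step the question asks about, the negated character class can be passed to String.replaceAll. The following is a sketch only; the class and method names are hypothetical, and Matcher.quoteReplacement is used on the assumption that the replacement may contain characters such as $ or \ that would otherwise be interpreted.

```java
import java.util.regex.Matcher;

// Hypothetical helper, not part of the original answer.
final class GarbageFilter {
    // Replaces every character that is not a word character, whitespace,
    // or one of - . # : , with the given replacement text.
    static String replaceGarbage(String input, String replacement) {
        return input.replaceAll("[^-\\w\\s.#:,]",
                Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "Vamsi///#k03#g!!!l.com 123**5";
        System.out.println(replaceGarbage(input, ""));
        System.out.println(replaceGarbage(input, " "));
    }
}
```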

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (e.g. underscore '_'), like this:
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore the any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the pattern only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks like (according to this regex tester) just doing this would work: /()_+$/. The empty parentheses match anything before a single or repeating match at the end of the line. Would that be correct?
Thank you!

There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
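Both options can be sketched in Java as follows, assuming single-line input (so neither Pattern.MULTILINE nor Pattern.DOTALL is set). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class UnderscoreTrimmer {
    private static final Pattern EDGE_UNDERSCORES = Pattern.compile("^_+|_+$");
    private static final Pattern CORE = Pattern.compile("^_*(.*?)_*$");

    // Option 1: delete the leading and trailing runs of underscores.
    static String trimByReplace(String s) {
        return EDGE_UNDERSCORES.matcher(s).replaceAll("");
    }

    // Option 2: capture what lies between the leading and trailing runs.
    // Falls back to the input unchanged if the pattern does not match,
    // which can happen for input containing line breaks.
    static String trimByCapture(String s) {
        Matcher m = CORE.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        String[] samples = {
            "this_is_my_example_line_0",
            "this_is_my_example_line_2___",
            "__this_is_my___example_line_4__"
        };
        for (String sample : samples) {
            System.out.println(trimByReplace(sample) + " | " + trimByCapture(sample));
        }
    }
}
```

The lazy (.*?) in the second pattern is what keeps underscores in the middle of the string: it expands only until the remainder of the input consists of underscores alone.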

How about [^_\n\r](.*[^_\n\r])??
Demo
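import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Demo {
    public static void main(String[] args) {
        String data =
                "this_is_my_example_line_0\n" +
                "this_is_my_example_line_1_\n" +
                "this_is_my_example_line_2___\n" +
                "_this_is_my_ _example_line_3_\n" +
                "__this_is_my___example_line_4__";
        // One non-underscore character, optionally followed by the rest of
        // the line up to its last non-underscore character.
        Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
        Matcher m = p.matcher(data);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}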
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

Check string contains whitespace along with some other char sequence using regex in java

I am using a regular expression to check whether a string contains whitespace.
My regex is: ^\\s+$
For example, if my string is my name, then the match should return true.
But it returns true only if my string contains nothing but spaces.
How can I check whether a string contains a space, tab or carriage return character anywhere in it (start, middle or end)?

^(.*\s+.*)+$ seems to work for me. It accepts anything as long as there is at least one whitespace character in the string, and it matches the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
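The difference between the anchored and unanchored checks can be illustrated with a short sketch. The class and method names are hypothetical; find() looks for a match anywhere in the input, while matches() requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answers.
final class WhitespaceCheck {
    private static final Pattern ANY_WHITESPACE = Pattern.compile("\\s");
    private static final Pattern ONLY_WHITESPACE = Pattern.compile("^\\s+$");

    // True if at least one whitespace character occurs anywhere in s.
    static boolean containsWhitespace(String s) {
        return ANY_WHITESPACE.matcher(s).find();
    }

    // True only if s consists entirely of whitespace (the question's pattern).
    static boolean isOnlyWhitespace(String s) {
        return ONLY_WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("my name"));
        System.out.println(isOnlyWhitespace("my name"));
    }
}
```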

Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
The difference is that without the anchors "^" and "$" there may be other characters around the whitespace character. The whitespace character(s) may be everywhere in the string.
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"

Use org.apache.commons.lang.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.
