Regex string validation

Regex string validation - java

Trying to write some regex to validate a string, where null and empty strings are not allowed, but characters + new line should be allowed. The string I'm trying to validate is as follows:
First line \n
Second line \n
This is as far as I got:
^(?!\s*$).+
This fails my validation because of the newline. Any ideas? I should add that I cannot use awk.

Code
The following regex matches the entire line.
^[^\r\n]*?\S.*$
The following regexes do the same as above, except that they are used for validation only: they don't match the whole line, they simply ensure it's properly formed. The benefit of using them over the one above is performance (fewer steps): on regex101 they take 28 steps, as opposed to 34 for the pattern above.
^[^\r\n]*?\S
^.*?\S
Results
Input
First line \n
Second line \n
s
Output
Matches only
First line \n
Second line \n
s
Explanation
^ Assert position at the start of the line
[^\r\n]*? Match any character not present in the set (any character except the carriage return or line-feed characters) any number of times, but as few as possible (making this lazy improves performance: fewer steps)
\S Match any non-whitespace character
.* Match any character (excludes newline characters) any number of times
$ Assert position at the end of the line
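A minimal Java sketch of using the validation-only pattern. It assumes the regex101 demos used the multiline (m) flag and that a string counts as valid when at least one line contains a non-whitespace character; NotBlankValidator and isValid are hypothetical names:

```java
import java.util.regex.Pattern;

class NotBlankValidator {
    // Validation-only pattern from the answer. Pattern.MULTILINE is an assumption
    // (mirroring regex101's "m" flag): it lets ^ match at the start of every line,
    // so find() succeeds as soon as any line contains a non-whitespace character.
    private static final Pattern HAS_CONTENT =
            Pattern.compile("^[^\\r\\n]*?\\S", Pattern.MULTILINE);

    static boolean isValid(String s) {
        // A regex cannot reject null by itself, so check it before matching.
        return s != null && HAS_CONTENT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("First line \nSecond line \n")); // true
        System.out.println(isValid(" \n\t\n"));                      // false
        System.out.println(isValid(""));                            // false
        System.out.println(isValid(null));                          // false
    }
}
```

Newlines are allowed anywhere; only null, empty and whitespace-only input is rejected.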

Try this pattern:
([\S ]*(\n)*)*

Related

Matching pound (#) or empty line comments with regex

As a start, I am using Java, if this influences the regex.
I am trying to match the contents of a line that starts with any number of whitespace characters (and nothing else), followed by any number of pounds (#), followed by any characters, and ending with a newline.
Or, a completely empty line containing only whitespace or a newline.
I tried finding the first part myself but it doesn't seem to match any of the comments:
^(?!.+)#+.*$
It doesn't work even if I include \r*\n* at the end.

In your regexr example you have selected JavaScript and enabled the s flag, which makes the dot match a newline.
If you want to match all lines, you can enable the multiline and global flags instead, and use
^[^\S\r\n]*(?:#.*)?\r?\n
In Java, you might use
^\h*(?:#.*)?\R
With the backslashes doubled (escaped) in a Java string literal:
String regex = "^\\h*(?:#.*)?\\R";
The pattern matches:
^ Start of string (start of each line when the MULTILINE flag is enabled)
\h* Match optional horizontal whitespace chars
(?:#.*)? Optionally match # followed by the rest of the line
\R Match any Unicode newline sequence
If you want to match the whole line and assert the end of the string instead of matching a newline, you can use the anchor $ instead of \R:
^\h*(?:#.*)?$
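A minimal Java sketch that applies the end-anchored pattern to a whole block of text. It assumes Pattern.MULTILINE, so that ^ and $ apply to each line rather than to the entire input; CommentLines and commentOrBlankLines are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentLines {
    // \h* = optional horizontal whitespace, (?:#.*)? = optional comment,
    // $ = end of line (because of MULTILINE).
    private static final Pattern COMMENT_OR_BLANK =
            Pattern.compile("^\\h*(?:#.*)?$", Pattern.MULTILINE);

    // Returns every line that is blank or a comment, without its line terminator.
    static List<String> commentOrBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = COMMENT_OR_BLANK.matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "code();\n  # a comment\n\n   \nmore(); # trailing\n##header";
        // "more(); # trailing" is skipped because the line starts with code.
        for (String line : commentOrBlankLines(text)) {
            System.out.println("[" + line + "]");
        }
    }
}
```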

Unable to understand the result of RegEx split for a string

I am running the following two lines in Java 8:
String dirTree = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext";
String[] result = dirTree.split("\\n\\t[^\\t]");
Result seen - result:
["dir", "ubdir1", "ubdir2\n\t\tfile.ext"]
I was expecting - result:
["dir", "subdir1", "subdir2\n\t\tfile.ext"]
Can someone please explain why the first character of each string in the result is missing (e.g. "ubdir1" instead of "subdir1")?

Just split by this:
\n\t(?!\t)
Explanation:
\n\t matches one newline and one tab
(?!\t) negative lookahead to ensure that no \t immediately follows the \n\t
So the difference between (?!\t) and [^\t] is that the first one is a zero-width check (it only succeeds or fails) while the second one matches, and consumes, a character. So in your case, [^\t] matched the non-tab character s, and that character became part of the delimiter used to split.
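A small Java sketch that runs both splits on the question's string, so the consumed "s" is easy to see; SplitDemo and its two methods are hypothetical names:

```java
import java.util.Arrays;

class SplitDemo {
    // [^\t] is a character class: it consumes one character, so the "s" of "subdir"
    // becomes part of the delimiter and disappears from the result.
    static String[] splitConsuming(String s) {
        return s.split("\\n\\t[^\\t]");
    }

    // (?!\t) is a zero-width lookahead: it only checks that the next character is not
    // a tab, so nothing after "\n\t" is removed.
    static String[] splitLookahead(String s) {
        return s.split("\\n\\t(?!\\t)");
    }

    public static void main(String[] args) {
        String dirTree = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext";
        System.out.println(Arrays.toString(splitConsuming(dirTree)));
        System.out.println(Arrays.toString(splitLookahead(dirTree)));
    }
}
```

The "\n\t\t" before file.ext is not split in either case, because the character after "\n\t" is a tab.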

You have to understand how the regex pattern works before applying it.
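Your regex pattern is \n\t[^\t]. This searches for \n\t followed by any character except \t: [^\t] is a negated character class, which matches any single character other than \t. So in your case it matches the s, because s is a character other than \t, and that s is removed along with the rest of the delimiter.
To get your expected result, the pattern must not consume that character. Note that plain \n\t is not enough on its own, because it would also split "subdir2\n\t\tfile.ext" into "subdir2" and "\tfile.ext".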

How to spot * in regular expressions?

I want to spot and delete all lines that have *** in them. How can I do this?
I tried to use regex but got
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 6
Here is my regular expression: (?m)^**.*.
.........text...........
***..........text....... //want to delete this line
........................

The * character has a special meaning in a regular expression. To tell Pattern that you don't mean this special meaning, you have to "escape" it. The easiest way to do that is to pass your literal text through Pattern.quote().
For example:
String searchFor = Pattern.quote("***");
Then use that string when building your search pattern.
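A minimal sketch of that idea, assuming the goal from the question (delete every line containing ***, including its line break); StarLines and removeStarLines are hypothetical names:

```java
import java.util.regex.Pattern;

class StarLines {
    // Pattern.quote("***") returns \Q***\E, so the asterisks are matched literally.
    private static final String STARS = Pattern.quote("***");

    // (?m) makes ^ and $ work per line; the trailing group also consumes the
    // line break after a deleted line.
    static String removeStarLines(String input) {
        return input.replaceAll("(?m)^.*" + STARS + ".*(?:\\r?\\n|$)", "");
    }

    public static void main(String[] args) {
        System.out.println(removeStarLines("abc***def\nghijkl\n***mnop\n**blah"));
    }
}
```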

Note that * is a special character in regex, so in a Java string literal you have to escape it as \\*
Your expression will be: (?m)^\\*\\*\\*.*

This is not perfect, but it'll get you started:
// 4 lines, 2 of them containing "***" at random locations
String input = "abc***def\nghijkl\n***mnop\n**blah";
// replacing multiline pattern starting with any character 0 or more times,
// followed by 3 escaped "*"s,
// followed by any character 0 or more times
System.out.println(input.replaceAll("(?m).*\\*{3}.*", ""));
Output:
ghijkl
**blah

If the three asterisks are not always at the beginning of the line, you can use this pattern, which removes the newlines too:
(\r?\n)?[^\r\n*]*\Q***\E.*((1)?|\r?\n?)

If all you're doing is looking for three specific characters together in a string, you don't need a regex at all:
if (line.contains("***")) {
...
}
(But if things get more complicated and you do need a regex, then use a backslash or Pattern.quote as the other answers say.)
(This is assuming you're reading lines one at a time, instead of having one big long buffer containing all the lines with newline characters. Some of the other answers handle the latter case.)
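For the line-at-a-time case, a minimal Java 8 sketch that needs no regex at all could look like this; StarFilter and dropStarLines are hypothetical names, and re-joining the surviving lines with \n is an assumption about the desired output:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

class StarFilter {
    // Splits the text into lines (BufferedReader accepts \n, \r and \r\n), keeps only
    // the lines that do not contain the literal text "***", and joins them with "\n".
    // A StringReader holds no external resources, so it is not closed here.
    static String dropStarLines(String text) {
        return new BufferedReader(new StringReader(text)).lines()
                .filter(line -> !line.contains("***"))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(dropStarLines("....text....\n***....text....\n............"));
    }
}
```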

Matching at line endings

This is a pretty trivial thing: replace a line that possibly contains trailing blanks and is ended by '\n', '\r', '\r\n' or nothing, with a line containing no trailing blanks and ended by '\n'.
I thought I could do it via a simple regex. Here, "\\s+$" doesn't work, as the $ matches before the final \n. That's what \\z is for, or so I thought. But
"\n".replaceAll("\\s*\\z", "\n").length()
returns 2. Actually, $, \\z, and \\Z do exactly the same thing here. I'm confused...
The explanation by Alan Moore was helpful, but only now did it occur to me that, to replace arbitrary trailing blank garbage at EOF, I can do
replaceFirst("\\s*\\z", "\n");
instead of replaceAll. A simple solution doing all the things described above is
replaceAll("(?<!\\s)\\s*\\z|[ \t]*(\r?\n|\r)", "\n");
I'm afraid it's not very fast, but it's acceptable.
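A minimal sketch wrapping the question's combined pattern in a method; LineNormalizer and normalize are hypothetical names. It trims trailing blanks and tabs on each line, turns \r\n and \r into \n, and collapses any trailing whitespace at the very end of the text into a single \n:

```java
class LineNormalizer {
    // Pattern from the question: either
    //   (?<!\s)\s*\z      trailing whitespace at the very end of the text; the lookbehind
    //                     lets it start only after a non-whitespace character (or at the
    //                     start of input), so it matches at most once, or
    //   [ \t]*(\r?\n|\r)  trailing blanks/tabs plus one line break of any style.
    // Each match is replaced by a single "\n".
    private static final String PATTERN = "(?<!\\s)\\s*\\z|[ \t]*(\r?\n|\r)";

    static String normalize(String text) {
        return text.replaceAll(PATTERN, "\n");
    }

    public static void main(String[] args) {
        String in = "first  \r\nsecond\t\rthird   ";
        System.out.println(normalize(in).equals("first\nsecond\nthird\n")); // true
    }
}
```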

Actually, the \z is irrelevant. On the first match attempt, \s* consumes the linefeed (\n) and \z succeeds because it's now at the end of the string. So it replaces the linefeed with a linefeed, then it tries to match at the position after the linefeed, which is the end of the string. It matches again because \s* is allowed to match an empty string, so it replaces the empty string with another linefeed.
You might expect it to go on matching nothing and replacing it with infinite linefeeds, but that can't happen. Unless you reset it, the regex can't match twice at the same position. Or more accurately, starting at the same position. In this case, the first match started at position #0, and the second at position #1.
By the way, \s+$ should match the string "\n"; $ can match the very end of the string as well as before a line separator at the end of the string.
Update: In order to handle both cases, (1) getting rid of unwanted whitespace at the end of the line and (2) adding a linefeed in cases where there's no unwanted whitespace, I think your best bet is to use a lookbehind:
line = line.replaceAll("(?<!\\s)\\s*\\z", "\n");
This will still match every line, but it will only match once per line.
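The behaviour described above can be reproduced with a minimal Java sketch; LineEndings, naive and withLookbehind are hypothetical names, and the two patterns are the ones from the question and from the update:

```java
class LineEndings {
    // Question's version: after consuming the trailing whitespace, \s* can also match
    // the empty string at the very end, so a second match appends an extra "\n".
    static String naive(String line) {
        return line.replaceAll("\\s*\\z", "\n");
    }

    // Update's version: a match may only start where the previous character is not
    // whitespace (or at the start of input), so there is at most one match per line.
    static String withLookbehind(String line) {
        return line.replaceAll("(?<!\\s)\\s*\\z", "\n");
    }

    public static void main(String[] args) {
        System.out.println(naive("\n").length());          // 2
        System.out.println(withLookbehind("\n").length()); // 1
    }
}
```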

Could you just do something like the following?
String result = myString.trim() + '\n';

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (i.e. underscore '_'), like this:
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since those strings don't start with an underscore and the regex only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. Judging by this regex tester, it looks like just doing this would work: /()_+$/. The empty parentheses match anything before a single or repeating match at the end of the line. Would that be correct?
Thank you!

There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, then you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but do use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
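Both approaches in a minimal Java sketch; TrimUnderscores, byReplace and byCapture are hypothetical names. The replace version uses Pattern.MULTILINE so that it also works line by line on a multi-line string, while the capture version is meant for one line at a time:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimUnderscores {
    // Approach 1: remove a run of '_' at the start or end of each line.
    // MULTILINE makes ^ and $ apply per line instead of to the whole input.
    private static final Pattern EDGES = Pattern.compile("^_+|_+$", Pattern.MULTILINE);

    // Approach 2: the lazy group (.*?) takes everything between the optional
    // leading and trailing runs of '_' (intended for a single line).
    private static final Pattern MIDDLE = Pattern.compile("^_*(.*?)_*$");

    static String byReplace(String text) {
        return EDGES.matcher(text).replaceAll("");
    }

    static String byCapture(String line) {
        Matcher m = MIDDLE.matcher(line);
        // Fall back to the input if it doesn't match (e.g. it contains a line break).
        return m.matches() ? m.group(1) : line;
    }

    public static void main(String[] args) {
        String[] lines = {
            "this_is_my_example_line_0",
            "this_is_my_example_line_1_",
            "this_is_my_example_line_2___",
            "_this_is_my_ _example_line_3_",
            "__this_is_my___example_line_4__"
        };
        for (String line : lines) {
            System.out.println(byReplace(line) + " | " + byCapture(line));
        }
    }
}
```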

How about [^_\n\r](.*[^_\n\r])? as the pattern?
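import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data =
        "this_is_my_example_line_0\n" +
        "this_is_my_example_line_1_\n" +
        "this_is_my_example_line_2___\n" +
        "_this_is_my_ _example_line_3_\n" +
        "__this_is_my___example_line_4__";
// Each match starts and ends on a character that is neither '_' nor a line break;
// '.' does not cross line breaks, so every match stays within one line.
Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}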
Output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
