How to spot * in regular expressions? - java

I want to spot and delete all lines that have *** in them. How can I do this?
I tried to use regex but got
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 6
Here is my regular expression: (?m)^**.*.
.........text...........
***..........text....... //want to delete this line
........................

The * character has a special meaning in a regular expression. To tell Pattern that you want a literal asterisk rather than the quantifier, you have to "escape" it. The easiest way is to pass your text through Pattern.quote().
For example:
String searchFor = Pattern.quote("***");
Then use that string in your search.
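A minimal sketch of how the quoted string might be used, assuming each line is checked on its own. The class and method names here are made up for illustration:

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Pattern.quote wraps the text in \Q...\E so every character is matched literally.
    private static final Pattern THREE_STARS = Pattern.compile(Pattern.quote("***"));

    // Returns true when the line contains three consecutive asterisks anywhere.
    static boolean hasThreeStars(String line) {
        return THREE_STARS.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("***"));      // prints \Q***\E
        System.out.println(hasThreeStars("***text"));  // prints true
        System.out.println(hasThreeStars("**text"));   // prints false
    }
}
```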

Note that * is a special character in regex, so you have to escape it as \\* (in a Java string literal).
Your expression will be: (?m)^\\*\\*\\*.*

This is not perfect, but it'll get you started:
// 4 lines, 2 of them containing "***" at different positions
String input = "abc***def\nghijkl\n***mnop\n**blah";
// replacing multiline pattern starting with any character 0 or more times,
// followed by 3 escaped "*"s,
// followed by any character 0 or more times
System.out.println(input.replaceAll("(?m).*\\*{3}.*", ""));
Output (the two matched lines are emptied but still print as blank lines, omitted here):
ghijkl
**blah

If the three asterisks are not always at the beginning of the line, you can use this pattern, which removes newlines too:
(\r?\n)?[^\r\n*]*\Q***\E.*((1)?|\r?\n?)
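As an alternative sketch (a different, simpler pattern, not the one above), the whole line and its trailing line break can be removed in one pass. The class and method names are hypothetical:

```java
final class StarLineRemover {
    // (?m) lets ^ match at the start of every line; \Q***\E is a literal "***".
    // The optional (?:\r?\n)? at the end also consumes the line break of a removed line.
    static String removeStarLines(String text) {
        return text.replaceAll("(?m)^.*\\Q***\\E.*(?:\\r?\\n)?", "");
    }

    public static void main(String[] args) {
        String input = "abc***def\nghijkl\n***mnop\n**blah";
        System.out.println(removeStarLines(input));
        // prints:
        // ghijkl
        // **blah
    }
}
```

One limitation of this sketch: if the last line of the input contains *** and has no trailing line break, the line break before it is left in place.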

If all you're doing is looking for three specific characters together in a string, you don't need a regex at all:
if (line.contains("***")) {
...
}
(But if things get more complicated and you do need a regex, then use a backslash or Pattern.quote as the other answers say.)
(This is assuming you're reading lines one at a time, instead of having one big long buffer containing all the lines with newline characters. Some of the other answers handle the latter case.)
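A short sketch of the line-at-a-time approach, assuming the lines are already available as a list (for example from Files.readAllLines). The class and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LineFilter {
    // Keeps only the lines that do not contain the literal text "***".
    static List<String> withoutStarLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.contains("***"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("abc***def", "ghijkl", "***mnop", "**blah");
        System.out.println(withoutStarLines(lines)); // prints [ghijkl, **blah]
    }
}
```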

Related

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?

Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
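For illustration, here is how the whitespace-tolerant variant might be checked against the examples from the question. This sketch uses Java rather than the asker's language, and the class and method names are made up:

```java
import java.util.regex.Pattern;

final class CsvAlnumValidator {
    // One alphanumeric item, then zero or more ", item" groups with optional whitespace.
    private static final Pattern LIST =
            Pattern.compile("^[0-9a-zA-Z]+(\\s*,\\s*[0-9a-zA-Z]+)*$");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123, 4A67, GGG, 767"));   // prints true
        System.out.println(isValid("12333, 78787&*, GH778")); // prints false
        System.out.println(isValid("fghkjhfdg8797<"));        // prints false
    }
}
```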

Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols.
The ,? means match 0 or 1 commas (basically, the comma is optional).
The \s* handles 0 or more whitespace characters after the comma,
and finally the outer + says match 1 or more repetitions of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.
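These caveats are easy to confirm. A minimal Java sketch (Java chosen only for illustration; the class and method names are invented for this example) that runs the pattern against the sample strings:

```java
import java.util.regex.Pattern;

final class LenientListCheck {
    // The pattern from this answer: the comma is optional, so it is deliberately lenient.
    private static final Pattern LENIENT =
            Pattern.compile("^([a-zA-Z0-9]+,?\\s*)+$");

    static boolean accepts(String s) {
        return LENIENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("123, 4A67, GGG, 767"));   // prints true
        System.out.println(accepts("123"));                   // prints true
        System.out.println(accepts("123 123 abc"));           // prints true (no commas)
        System.out.println(accepts("123,"));                  // prints true (trailing comma)
        System.out.println(accepts("12333, 78787&*, GH778")); // prints false
    }
}
```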

Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.

You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = [ "123, 4A67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print re.match( r, s )
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Matching zero times allows for input with no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+$

Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
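The $LONGSTUFF(,$LONGSTUFF)* shape can be assembled from a single definition of the item, so the long part is written only once. A minimal Java sketch, with a hypothetical helper name and a simple alphanumeric item standing in for $LONGSTUFF:

```java
import java.util.regex.Pattern;

final class ListPatternBuilder {
    // Builds ^item(,item)*$ by string concatenation from one item sub-pattern.
    static Pattern listOf(String itemRegex) {
        String item = "(?:" + itemRegex + ")"; // group it so alternations inside stay contained
        return Pattern.compile("^" + item + "(?:," + item + ")*$");
    }

    public static void main(String[] args) {
        Pattern p = listOf("[a-zA-Z0-9]+");
        System.out.println(p.matcher("123,4A67,GGG").matches()); // prints true
        System.out.println(p.matcher("123,").matches());         // prints false
    }
}
```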
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)

Try ^(?!,)((, *)?([a-zA-Z0-9])\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
The word boundary ensures that a comma is required when more items follow in the string.

Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set the maximum word size to 45, as the longest word in English is 45 characters; this can be changed as required.

Regex-How to prevent repeated special characters?

I don't have experience with regular expressions. I need a regular expression that doesn't allow special characters (+-*/& etc.) to be repeated.
The string can contain digits, alphanumerics, and special characters.
This should be valid: abc,df
This should be invalid: abc-,df
I would really appreciate your help. Thanks in advance.

Two solutions presented so far match a string that is not allowed.
But the title is How to prevent..., so I assume that the regex
should match the allowed string. It means that the regex should:
match the whole string if it does not contain 2
consecutive special characters,
not match otherwise.
You can achieve this putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special
characters (marked here as ...), in any place,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a char class (between [ and ]) a backslash
escaping the following char should be placed before - (if in
the middle of the sequence), closing square bracket,
a backslash itself and / (regex terminator).
Or if you want to apply the regex to individual words (not the whole
string), then the regex should be:
\b(?!\S*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
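A sketch of the whole-string variant in Java, offered as an illustration rather than a definitive implementation. The class name, method name, and the exact set of "special" characters are assumptions. Note that inside a Java character class an unescaped [ starts a nested class, so the opening bracket is escaped here as well:

```java
import java.util.regex.Pattern;

final class NoRepeatedSpecials {
    // Character class of "special" characters; adjust the set to your needs.
    private static final String SPECIAL = "[!@#$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]";

    // Negative lookahead rejects any string with two specials in a row;
    // .+ then matches the whole (non-empty) string.
    private static final Pattern VALID =
            Pattern.compile("^(?!.*" + SPECIAL + "{2}).+$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc,df"));  // prints true
        System.out.println(isValid("abc-,df")); // prints false
    }
}
```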

[\,\+\-\*\/\&]{2,} Add more characters in the square bracket if you want.
Demo https://regex101.com/r/CBrldL/2

Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}

[^\w!\s]{2,} This would be the shortest version to match any two consecutive special characters (ignoring spaces).
If you want to consider space, please use [^\w]{2,}

How to extract multi-line text delimited by 2 strings

I've following pattern:
Claims(40)
This is good.
This is good, too.
Description
This is description.
The delimiter strings in this case are:
1st delimiter: "Claims(40)"
2nd delimiter: "Description"
I want to extract text between these delimiters while excluding the delimiters.
Also, in the above text, following rules exist:
1st delimiter starts on the 1st column in the text and it's the only word on the line.
In the first delimiter, the opening parenthesis, the digits, and the closing parenthesis may be absent. However, if the opening parenthesis is present, the digits and the closing parenthesis are present too.
2nd delimiter starts on the 1st column in the text and it's the only word on the line.
My regular expression:
String regxStr = "^Claims(\\(\\d+\\)?)$(.*?)^Description$";
This doesn't work.
I tried many other regexes, but none worked. So finally, I resorted to a brute-force approach with the regex:
String regxStr = "Claims(.*?)Description";
But neither regex works. I can't figure out what is going wrong, or where.
I'm using Matcher class and find() method of Matcher class for further processing.
Please help me.

This captures the text you want, although I'm not totally clear on your requirements for the (40) part. @lovetostrike's answer addresses that.
\bClaims(?:\(\d+\))?\s+(.+?)\s+Description\b
You must activate the DOTALL flag when compiling the pattern:
Pattern.compile(regxStr, Pattern.DOTALL)
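A minimal sketch of how this might be wired up with Matcher.find(), assuming the sample text from the question. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClaimsExtractor {
    // DOTALL lets "." cross line breaks, so (.+?) can span several lines.
    private static final Pattern CLAIMS = Pattern.compile(
            "\\bClaims(?:\\(\\d+\\))?\\s+(.+?)\\s+Description\\b", Pattern.DOTALL);

    // Returns the text between the two delimiters, or null if they are not found.
    static String extract(String text) {
        Matcher m = CLAIMS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Claims(40)\nThis is good.\nThis is good, too.\nDescription\nThis is description.";
        System.out.println(extract(text));
        // prints:
        // This is good.
        // This is good, too.
    }
}
```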
Escaped in a Java string:
"\\bClaims(?:\\(\\d+\\))?\\s+(.+?)\\s+Description\\b"

Here's a one-line solution:
String target = input.replaceAll("(?s).*Claims(?:\\(\\d+\\))?\\s+(.*?)\\s*Description.*", "$1");

In addition to @aliteralmind's answer: regex isn't a good tool for nested structures, i.e. matching paren pairs. But in your simple case, you can use the OR operator, '|', in your pattern. The outer parens group the two alternatives for the OR operator: the first with parens, and the second without.
(\\(\\d+\\)|\\d+)

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message on example 0,1,2 since the string doesn't start with underscore and it only starts matching after finding a underscore..
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks like (by this regex tester) just doing this would work: /()_+$/. The empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!

There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, then you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
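The replacement approach might look like this in Java. This is a sketch with made-up class and method names, using the inline (?m) flag as the equivalent of Pattern.MULTILINE:

```java
final class UnderscoreTrimmer {
    // Strips leading and trailing underscores from a single-line string.
    static String trim(String s) {
        return s.replaceAll("^_+|_+$", "");
    }

    // Same idea applied to every line of a multi-line string.
    static String trimEachLine(String text) {
        return text.replaceAll("(?m)^_+|_+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("__this_is_my___example_line_4__"));
        // prints this_is_my___example_line_4
        System.out.println(trimEachLine("this_is_my_example_line_1_\n_this_is_my_ _example_line_3_"));
        // prints:
        // this_is_my_example_line_1
        // this_is_my_ _example_line_3
    }
}
```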

How about [^_\n\r](.*[^_\n\r])??
Demo
String data =
        "this_is_my_example_line_0\n" +
        "this_is_my_example_line_1_\n" +
        "this_is_my_example_line_2___\n" +
        "_this_is_my_ _example_line_3_\n" +
        "__this_is_my___example_line_4__";
Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

Need regular expression for pattern this

I need a regular expression for below pattern
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My Problem is, it can have space between numbers like /3232 323/ but not / /.
How to validate it ?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)

This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
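To check the pattern against the strings from the question, a small Java sketch could look like the following. The class and method names are invented for this example:

```java
import java.util.regex.Pattern;

final class SlashNumberValidator {
    // Optional leading slashes, then one or more groups of:
    // a digit, optional " digit" pairs (single spaces only between digits), optional slashes.
    private static final Pattern VALID =
            Pattern.compile("^/*(?:\\d(?: \\d)*/*)+$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3232////33 43/323//")); // prints true
        System.out.println(isValid("/3232////343/323//"));  // prints true
        System.out.println(isValid("/sas/3232/////dsds/")); // prints false
        System.out.println(isValid("/ /34343///// /////")); // prints false
        System.out.println(isValid("///////////"));         // prints false
    }
}
```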

My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$

Just use lookarounds for the last criteria.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.

Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)

"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind it will also match
/3232///// from /sas/3232/////dsds/
because part of the invalid string is valid on its own.
If you are reading line by line, anchor the pattern with ^ and $; if you are reading an entire block of text, search for \r\n around the regex above to match each line.
