I need to split my text into pieces and also keep the delimiter as well, I know I can use below code to do that as explained Here:
Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))"))
but what I'm stuck is that my text contains delimiter with a value inside it, my delimiter is #[x]#
where x is a value which can be any number. eg: #[1]#, #[44]#. What I want to achieve is to get an array as below:
text : "Hello my Name is blabla.#[1]#How are you today?#[2]#ByeBye"
and what I need to get:
[ "Hello my Name is blabla.", "#[1]#", "How are you today?", "#[2]#", "ByeBye" ]
How can I achieve that? Thanks in advance.
Try the following regex as a delimiter:
((?<=(#\\[\\d\\]#))|(?=(#\\[\\d\\]#)))
basically replacing the semi-colon with the expression (#\\[\\d\\]#) where \d matches any digit.
If more than one digit can exist, you can specify a range for the possible number of digits, for example \d{1,1000} instead of \d to have a maximum of 1000 digits. An unknown number of digits using an expression like \d+ cannot be used with Java lookbehind regular expressions.
Related
I don't write many regular expressions so I'm going to need some help on the one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 1 or more spaces after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match
123 123 abc (no commas) which might be a problem
This will also match 123, (ends with a comma) which might be a problem.
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x [ "123, $a67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
... print re.match( r, s )
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : capture one or more (but not zero) of the listed ranges, and space.
(?:[...]+,)* : In non-capturing parenthesis, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Capturing zero times allows for no comma.
[...]+ : capture at least one of these. This does not include a comma. This is to ensure that it does not accept a trailing comma. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9])\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
The match of a word boundary make sure that a comma is necessary if more arguments are stacked in string.
Please use - ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set max word size to 45, as longest word in english is 45 characters, can be changed as per requirement
I am working with a calculator , after writing up the whole statement in the EditText, I want to extract the numbers and calculator symbols from the edit text field. The numbers can be simple integers or decimal numbers, i.e; Float and Double. For example I have this string now, 2 + 2.7 - 10 x 20.000. The regex will extract the +,- and x separately and the numbers as; 2,2.7,10,20.000. Basically i need to regex for each.
The regex you're searching for will be different from the regex from the split. If you use a Pattern for this, things will get really slow and complicated. So here we're using String#split().
Here is your regex:
"\s*([^\d.]+)\s*"
Regex Explanation:
\s* - Matches every white space if available
([^\d.]+) - Capture everything but a digit and a dot together as many as possible
\s* - Matches every whitespace if available
Note[1]: Once again, as it's going to be used in the split, you don't want to capture the numbers, this regex will match anything but numbers and remove them from the string, so if you use them in a Pattern#exec() you'll have a different output.
So this way we can simply:
"2 + 2.7 - 10 x 20.000".split("\\s*([^\\d.]+)\\s*");
Note[2]: The regex inside a String in Java must have its backslashes escaped like above's. And also, the capture groups (...) are useless since we're just splitting it, feel free to remove them if you want.
JShell Output:
$1 ==> String[4] { "2", "2.7", "10", "20.000" }
Java snippet:
https://ideone.com/CHjKNB
I have this regex expression written that should extract toll-free numbers but when there is a number like 1-800-343-2432 (when there is a 1 before the 800 stuff) it doesn't work
(?!(\$|#|800|855|866|877|888))\(?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}
how can i modify this expression to not take numbers like 1-866-343-1232 too ?!
Without checking your full regex you can use this regex to block 1-888:
(?!(?:1-)?(\\$|#|800|855|866|877|888))\(?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}
Prepend (1-)? to your regex. This will work for optional 1-.
Modifying your regular expression:
(\+)?(1-)?\(?(\\$|#|800|855|866|877|888)\)?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}
The key differences here are the following:
(\+)? :: A lazy quantifier `?' matches a + character if it takes place prior to the 1. Many numbers display like +1-800-343-2432
(1-)? :: Matches a 1 followed by a - character. The ? is a lazy quantifier that matches the 1- if it exists.
And I also added \(? and \)? which allow you to match on numbers that present in the format +1-(800)-343-2432
I have a list of files in a folder:
maze1.in.txt
maze2.in.txt
maze3.in.txt
I've used substring to remove the .txt extensions.
How do I use regex to match the front and the back of the file name?
I need it to match "maze" at the front and ".in" at the back, and the middle must be a digit (can be single or double digit).
I've tried the following
if (name.matches("name\\din")) {
//dosomething
}
It doesn't match anything. What is the correct regex expression to use?
I'm a little confused what you are asking for in particular
^(maze[0-9]*\.in)$
This will match maze(any number).in
^(maze[0-9]*\.in)\.txt$
this will match maze(any number).in.txt -- excludes the .txt NO NEED FOR USING SUB STRING!
Edit live on Debuggex
The think i would be wary about as of right now is the capture groups... I'm not particularly sure what you are doing with this regex. However, I believe explaining capture groups could benefit you.
A capture group for instance is denoted by () this is basically store them in the pattern array and is a way to parse stuff.
example maze1.in.txt
So if you want to capture the entire line minus .txt i would use this ^(maze[0-9]*\.in\.txt)$
However, if I wanted to capture things separately I would do this ^(maze)([0-9]*)(\.in)\.txt$ this will exclude .txt but include maze, the number, and .in IN separate indexes of the pattern array.
Your original solution doesn't work because string "name" is not in your text. It is "maze".
You can try this
name.matches("maze\\d{1,2}\\.in")
d{1,2} is used to match a digit(can be single or double digit).
You need regex anchors that tell the regex to
start at the beginning: ^
and signal the end of the string: $
^maze[\d]{0,2}\.in$
or in Java:
name.matches("^maze[\\d]{0,2}\\.in$");
Also, your regex wasn't matching strings with a dot (.) which would not accept your examples given. You need to add \. to the regex to accept dots because . is a special character.
It is always good to think of what you are trying to do in english, before you create regular expressions.
You want to match a word maze followed by a digit, followed by a literal period . followed by another word.
word `\w` matches a word character
digit `\d` matches a single digit
period `\.` matches a literal period
word `\w` matches a word character
putting it all together into a single string you get (keep in mind the double backslash for the Java escape and the pluses to repeat the previous match one or more times):
"\\w+\\d\\.\\w+"
The above is the generic case for any file name in the format xxx1.yyy, if you wanted to match maze and in specifically, you can just add those in as literal strings.
"maze\\d+\\.in"
example: http://ideone.com/rS7tw1
name.matches("^maze[0-9]+\\.in\\.txt$")
I need a regular expression for below pattern
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My Problem is, it can have space between numbers like /3232 323/ but not / /.
How to validate it ?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criteria.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
this will match what you want but keep in mind it will also match this
/3232///// from /sas/3232/////dsds/
this is because part of the invalid string is correct
if you reading line by line then match the ^ $ and if you are reading an entire block of text then search for \r\n around the regex above to match each new line