Append a digit in front of another digit between /s? - java

I have a bunch of strings that look like this:
/files/etc/hosts/2/ipaddr
/files/etc/hosts/2/canonical
/files/etc/hosts/2/alias[1]
/files/etc/hosts/3/ipaddr
/files/etc/hosts/3/canonical
/files/etc/hosts/3/alias[1]
/files/etc/hosts/4/ipaddr
/files/etc/hosts/4/canonical
/files/etc/hosts/4/alias[1]
I would like to insert a 0 in front of any digit that sits between two slashes. After the change, the results should look like this:
/files/etc/hosts/02/ipaddr
/files/etc/hosts/02/canonical
/files/etc/hosts/02/alias[1]
/files/etc/hosts/03/ipaddr
/files/etc/hosts/03/canonical
/files/etc/hosts/03/alias[1]
/files/etc/hosts/04/ipaddr
/files/etc/hosts/04/canonical
/files/etc/hosts/04/alias[1]
I am pretty sure that I need to use a simple regular expression for searching. I think /\d*/ should be sufficient but I am not sure how to modify the string to insert the digit 0. Can someone give me some advice?

Use:
newString = string.replaceAll("/(\\d)(?=/)", "/0$1");
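\\d matches exactly one digit. Feel free to use \\d+ for one or more digits (note that /10/ would then become /010/).
$1 refers to the text matched by the first capturing group, that is, the first parenthesized part of the pattern, here (\\d). Non-capturing constructs such as (?=...) are not counted.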
You need to use look-ahead (?=...) instead of just /(\\d)/ because otherwise
/files/etc/hosts/3/5/canonical
will become
/files/etc/hosts/03/5/canonical
instead of
/files/etc/hosts/03/05/canonical
(the / between 3 and 5 will get consumed during matching the 3 if you don't use look-ahead, thus it won't match on the 5).
This is not an issue (and you can simply use /(\\d)/) if the above string is not a possible input.
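A minimal Java sketch of the difference described above. The class name PadDigits and the two helper names are hypothetical and only serve the illustration:

```java
class PadDigits {
    // Look-ahead version: the trailing slash is checked but not consumed,
    // so it is still available as the leading slash of the next match.
    static String withLookahead(String path) {
        return path.replaceAll("/(\\d)(?=/)", "/0$1");
    }

    // Consuming version: the trailing slash is part of the match,
    // so an immediately following /digit/ segment is skipped.
    static String withoutLookahead(String path) {
        return path.replaceAll("/(\\d)/", "/0$1/");
    }

    public static void main(String[] args) {
        String input = "/files/etc/hosts/3/5/canonical";
        System.out.println(withLookahead(input));
        System.out.println(withoutLookahead(input));
    }
}
```

For inputs like the ones in the question, where a single-digit segment is never directly followed by another one, both helpers should give the same result.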
Java regex reference.

Related

Regular expression for the name

I need to build a regex for a name with the following pattern, so
John D.E. would pass the regex test.
Basically what I want is:
N number of chars(a-zA-Z) goes first
Then there's exactly one space
Exactly one char(a-zA-Z)
Exactly one dot
Exactly one char(a-zA-Z)
Exactly one dot
I wrote this regex ^([a-zA-Z]*)+( {1})+([a-zA-Z]{1})+(\.)+([a-zA-Z]{1})+(\.), but it doesn't seem to work properly (the expression still allows n number of spaces, for example). How do I restrict it? {1} doesn't work.
Try this:
^([a-zA-Z])+([ ]{1})([a-zA-Z]{1})([.])([a-zA-Z]{1})([.])
I've put the space and the dots into character classes ([]). An unescaped dot outside a class matches any character. Also, most of the pluses are redundant: + means one or more repetitions.
P.S.: @f1sh correctly notes that {1} doesn't change anything, so the shorter form would be:
^([a-zA-Z])+([ ])([a-zA-Z])([.])([a-zA-Z])([.])
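As a rough check of the shorter form in Java, here is a sketch that assumes the whole string must be the name. The class and method names are hypothetical. Matcher.matches() already requires the entire input to match, so the ^ anchor is left out, and the capturing groups are dropped because nothing is extracted:

```java
import java.util.regex.Pattern;

class NameCheck {
    // letters, one space, letter, dot, letter, dot
    private static final Pattern NAME =
            Pattern.compile("[a-zA-Z]+ [a-zA-Z]\\.[a-zA-Z]\\.");

    static boolean isValidName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("John D.E."));   // one space
        System.out.println(isValidName("John  D.E."));  // two spaces
    }
}
```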

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
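For readers working in Java, a hedged sketch of the whitespace-tolerant variant. java.util.regex does not use the [[:alnum:]] notation; it spells the ASCII alphanumeric class \p{Alnum}. The class and method names below are hypothetical:

```java
import java.util.regex.Pattern;

class AlnumList {
    // item, then zero or more ", item" with optional whitespace around the comma
    private static final Pattern LIST =
            Pattern.compile("\\p{Alnum}+(?:\\s*,\\s*\\p{Alnum}+)*");

    static boolean isValid(String s) {
        // matches() requires the whole input to match, so ^ and $ are implied
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123, 4A67, GGG, 767"));
        System.out.println(isValid("12333, 78787&*, GH778"));
        System.out.println(isValid("fghkjhfdg8797<"));
    }
}
```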
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles zero or more whitespace characters after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma) which might be a problem.
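The two caveats are easy to confirm. A small Java sketch, with a hypothetical class name, that runs this looser pattern against the problem inputs:

```java
import java.util.regex.Pattern;

class LooseList {
    // The pattern from this answer: comma and whitespace are both optional.
    private static final Pattern LOOSE =
            Pattern.compile("([a-zA-Z0-9]+,?\\s*)+");

    static boolean accepts(String s) {
        return LOOSE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("123, 4A67, GGG, 767")); // intended input
        System.out.println(accepts("123 123 abc"));         // no commas at all
        System.out.println(accepts("123,"));                // trailing comma
    }
}
```

If either of the last two inputs must be rejected, a pattern of the form item(,item)* is the stricter choice.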
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = [ "123, 4A67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Matching zero times allows for input with no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+$
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
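The repetition can be generated rather than typed out. A minimal Java sketch, assuming a hypothetical helper commaList that wraps the item in a non-capturing group so that any alternation inside it stays contained:

```java
import java.util.regex.Pattern;

class CommaListRegex {
    // Builds ITEM(,ITEM)* from the regex source of a single item.
    static String commaList(String item) {
        String group = "(?:" + item + ")";
        return group + "(?:," + group + ")*";
    }

    // Convenience wrapper: does the whole input form a comma list of items?
    static boolean matchesList(String item, String input) {
        return Pattern.compile(commaList(item)).matcher(input).matches();
    }

    public static void main(String[] args) {
        String pair = "[a-z]+=[a-z]+";
        System.out.println(matchesList(pair, "a=b,c=d")); // well-formed list
        System.out.println(matchesList(pair, "a=b,"));    // trailing comma
    }
}
```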
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
import re

xend_fudge_item_re = r"""
    e[a-d]x=            #register of the call return value to fudge
    (
        0x[0-9A-F]+ |   #either hardcode the reply
        [10xks]{32}     #or edit the bitfield directly
    )
"""
xend_string_item_re = r"""
    (0x)?[0-9A-F]+:     #leafnum (the contents of EAX before the call)
    %s                  #one fudge
    (,%s)*              #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
    \[                  #a list of
    '%s'                #string elements
    (,'%s')*            #repeated multiple times
    \]
    $                   #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a leading comma (good for the upcoming "loop").
Match an optional comma and spaces.
Match one or more of the characters you like.
The word boundary makes sure that a comma is required when more items follow in the string.
Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here I have set the maximum item length to 45, as the longest word in English is 45 characters; it can be changed as required.

Java place a "-" between odd numbers in a string using regex

I am trying to place a - between all odd numbers in a string. So if a string is passed in as Hel776o it should output Hel7-76o. Dashes should only be placed between two consecutive odd numbers.
I am trying to do this in one line via String.replaceAll()
I have the following line:
return str.replaceAll(".*([13579])([13579]).*","$1-$2");
If any odd number, followed by an odd number place a - between them. But it's destructively replacing everything except for the last match.
E.g. if I pass in "999477" it will output 7-7 instead of 9-9-947-7. Are more groupings needed so I don't replace everything except the matches?
I already did this with a traditional loop through each char in string but wanted to do it in a one-liner with regex replace.
Edit: I should say I meant return str.replaceAll(".*([13579])([13579]).*","$0-$1"); and not $1 and $2
Remove .* from your regex to prevent it from consuming all characters in one match.
Also, if you want to reuse part of a previous match, you can't consume it. For instance, if your string is 135 and you match 13, you will not be able to reuse that matched 3 in the next match with 5.
To solve this problem, use look-around mechanisms. They are zero-length, which means they do not consume the part they match.
So to describe a place which has
an odd number before it, use look-behind (?<=[13579]),
an odd number after it, use look-ahead (?=[13579]).
So your code can look like
return str.replaceAll("(?<=[13579])(?=[13579])","-");
You can also let the regex consume only one of the two odd numbers, so that the other one can be reused:
return str.replaceAll("[13579](?=[13579])","$0-");
return str.replaceAll("(?<=[13579])[13579]","-$0");
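A small Java sketch that puts the three variants side by side. The class and method names are hypothetical. All three should produce the same output, because in each case at most one of the two odd digits is consumed:

```java
class OddDashes {
    // Zero-length match between two odd digits; nothing is consumed.
    static String betweenOdds(String s) {
        return s.replaceAll("(?<=[13579])(?=[13579])", "-");
    }

    // Consumes the left digit only; the right one is just looked at.
    static String afterLeftDigit(String s) {
        return s.replaceAll("[13579](?=[13579])", "$0-");
    }

    // Consumes the right digit only; the left one is just looked at.
    static String beforeRightDigit(String s) {
        return s.replaceAll("(?<=[13579])[13579]", "-$0");
    }

    public static void main(String[] args) {
        System.out.println(betweenOdds("999477"));
        System.out.println(afterLeftDigit("Hel776o"));
        System.out.println(beforeRightDigit("135"));
    }
}
```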

Need regular expression for pattern this

I need a regular expression for below pattern
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My Problem is, it can have space between numbers like /3232 323/ but not / /.
How to validate it ?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
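To try this regex against the sample strings from the question, a Java sketch along these lines may help. The class and method names are hypothetical, and the anchors are omitted because Matcher.matches() already requires the whole input to match:

```java
import java.util.regex.Pattern;

class SlashNumbers {
    // optional leading slashes, then one or more digit groups:
    // a digit, optionally followed by "space digit" pairs, then any slashes
    private static final Pattern VALID =
            Pattern.compile("/*(?:\\d(?: \\d)*/*)+");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3232////33 43/323//"));
        System.out.println(isValid("/ /34343///// /////"));
        System.out.println(isValid("///////////"));
    }
}
```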
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criteria.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this Java regex:
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind that it will also match this:
/3232///// from /sas/3232/////dsds/
This is because part of the invalid string is valid on its own.
If you are reading line by line, anchor the pattern with ^ and $. If you are reading an entire block of text, look for \r\n around the regex above so that each line is matched separately.

Pattern/Regular expression to grab a number *only* if it's the only field in the record

This has been driving me crazy the past couple of days. I'm trying to kill two birds with one stone by validating a record and extracting a field at the same time. My strategy has been to do this with a regular expression:
private Pattern firstNumber = Pattern.compile("\\d{1}");
Which I understand to mean "the first number in the line (record)." So far this has been effective at grabbing the first field (and ensuring that it's a number), but I want to take this a step further:
How can I tweak the regexp to specify that I want the number only if it's the sole field?
That is, if the record is simply 10, I want to grab 10. But if the record is 10 4, I don't want to grab anything (as this is an invalid record for the project).
I tried:
private Pattern oneNumberOnly = Pattern.compile("\\d{1}\n");
But -- to my chagrin -- this (and any other permutation of it) does not pick up any numbers. Is there something I'm missing here?
You can denote beginning of line/string with ^ and end of line/string with $, so the pattern would be
^\d+$
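The {1} won't work because it matches exactly one digit, so a record such as 10 is never matched as a whole. Using \d+ indicates one or more digits. In Java, \d matches only the ASCII digits 0-9 unless the UNICODE_CHARACTER_CLASS flag is set; some other regex flavors also match non-ASCII Unicode digits, so write [0-9] if you want to be explicit. Neither form matches a decimal point or a minus sign.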
Specifying {1} is always redundant, by the way, because by default an atom is matched once.
You can use the start line character and end line character. If you are trying to grab a number that is on its own line you can use:
Pattern.compile("^(\\d)++$");
By adding the {1} you will only get 1 digit of a number. You should also trim the string you are comparing against to get rid of any extra whitespace.
^ - start-of-line anchor
\\d - digit character [0-9]
+ - one or more repetitions of \d
+ - possessive (the quantifier keeps every digit it has matched and never backtracks, which can be faster than a greedy quantifier)
$ - end-of-line anchor
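Validation and extraction can be combined in one step. A minimal Java sketch, assuming the record arrives as a String and the number fits in an int. The class and method names are hypothetical:

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

class SoleNumber {
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");

    // Returns the number if it is the only field in the record, else empty.
    static OptionalInt parse(String record) {
        String trimmed = record.trim(); // drop a trailing newline or padding
        if (DIGITS_ONLY.matcher(trimmed).matches()) {
            // Assumption: the value fits in an int; parseInt throws
            // NumberFormatException for longer digit runs.
            return OptionalInt.of(Integer.parseInt(trimmed));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("10"));
        System.out.println(parse("10 4"));
    }
}
```

Because matches() must cover the entire trimmed string, a record such as 10 4 is rejected without any explicit ^ or $.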
