Java: place a "-" between odd numbers in a string using regex

I am trying to place a - between all consecutive odd digits in a string. So if the string Hel776o is passed in, it should output Hel7-76o. Dashes should only be placed between two consecutive odd digits.
I am trying to do this in one line via String.replaceAll().
I have the following line:
return str.replaceAll(".*([13579])([13579]).*","$1-$2");
The intent is: if an odd digit is followed by another odd digit, place a - between them. But it is destructively replacing everything except the last match.
E.g. if I pass in "999477" it will output 7-7 instead of 9-9-947-7. Are more groupings needed so I don't replace everything except the matches?
I already did this with a traditional loop over each char in the string, but wanted to do it as a one-liner with regex replace.
Edit: I should say I meant return str.replaceAll(".*([13579])([13579]).*","$0-$1"); and not $1 and $2

Remove .* from your regex to prevent it from consuming all characters in one match.
Also, if you want to reuse some part of a previous match, you can't consume it. For instance, if your string is 135 and you match 13, you will not be able to reuse that matched 3 in the next match with 5.
To solve this problem, use look-around mechanisms, which are zero-length, meaning they do not consume the part they match.
So to describe a place which has
an odd digit before it, use look-behind (?<=[13579]),
an odd digit after it, use look-ahead (?=[13579]).
So your code can look like
return str.replaceAll("(?<=[13579])(?=[13579])","-");
You can also let the regex consume only one of the two odd digits, so that the other one can be reused:
return str.replaceAll("[13579](?=[13579])","$0-");
return str.replaceAll("(?<=[13579])[13579]","-$0");
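The three replacements above can be compared side by side. This is a minimal sketch; the class and method names (OddDashDemo, withLookarounds, consumeLeft, consumeRight) are made up for illustration and are not part of the original answer.

```java
class OddDashDemo {
    // Zero-width variant: matches the empty position between two odd digits.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<=[13579])(?=[13579])", "-");
    }

    // Consumes only the left digit; the right digit stays available for the next match.
    static String consumeLeft(String s) {
        return s.replaceAll("[13579](?=[13579])", "$0-");
    }

    // Consumes only the right digit; the left digit is checked by the look-behind.
    static String consumeRight(String s) {
        return s.replaceAll("(?<=[13579])[13579]", "-$0");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hel776o", "999477", "135"}) {
            System.out.println(s + " -> " + withLookarounds(s));
        }
    }
}
```

All three variants should give the same result for the inputs from the question, for example 9-9-947-7 for 999477.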

Related

How do I implement this regex function in Java?

The string should be 15 characters maximum, composed of numbers only, and there should be at least two single-character whitespaces anywhere in the string.
It is easy to find the solution for numeric only, but I'm getting stuck adding the condition for the whitespace.
I tried searching the most frequently asked regex question but couldn't find anything similar.
EDIT:
Additional conditions
whitespaces cannot be next to each other
they must not be placed in first or last character
I suppose for your demands, something like this would work:
\d+(\s\d+){2,}
But you'll need to check the length separately (e.g. input.length() <= 15).
This expression says:
Digits in the beginning.
Then a single space followed by digits, with at least two such combinations.
This ensures that no space is adjacent to another space, and that there are at least two of them. It also prevents the spaces from being at the beginning or the end, and allows for more than two of them.
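Put together, the pattern and the separate length check might look like the sketch below. The class name SpacedDigits and the method isValid are hypothetical. Note that \s also accepts tabs and line breaks; if only the plain space character should count, a literal space in the pattern would be the stricter choice.

```java
class SpacedDigits {
    static final int MAX_LENGTH = 15;

    // True when the input is at most 15 characters long and consists of
    // digit runs separated by single whitespace characters, with at least
    // two separators and none at the start or end.
    static boolean isValid(String input) {
        return input.length() <= MAX_LENGTH
                && input.matches("\\d+(\\s\\d+){2,}");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12 34 56", "12 3456", " 12 34 56", "12  34 56"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```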
You can use this regex: [0-9\s]{2,15}
And in your Java code you check if there are three parts separated by a whitespace:
String input = ...;
if (input.matches("[0-9\\s]{2,15}") && input.split(" ").length == 3) {
    System.out.println("valid input");
}
Edited: Leading and ending whitespaces, connected whitespaces are not allowed

^ and $ in Java regular expression

I know that ^ and $ mean "matches the beginning of the line" and "matches the end of the line".
However, when I did some coding today, I didn't notice any difference between including them and excluding them in a regular expression used in Java.
For example, I want to match a positive Integer using
^[1-9]\\d*$
, and when I exclude them in the regular expression like
[1-9]\\d*
, it seems that there is no difference. I have tried to test with a String that "contains" an integer like ###123###, and the second regular expression can still recognize it is not valid like the first one.
So are the two regular expressions above completely equivalent? Thanks!
Do you need to search a string like 2343, or [SPACE]2345, or abc234?
The anchored regex will only find the number in the first string. The un-anchored will find them in all strings.
It all depends on what your requirements are. Are you analyzing lines in a text file, where each line contains only digits? Or are you analyzing the text in a prose document or source code, where digits may be interspersed among a whole bunch of other stuff?
In the former case, the anchors are good. In the latter, they are bad.
More info: http://www.regular-expressions.info/anchors.html
They are different: the first pattern checks the whole line, from the beginning to the end, while the second doesn't care where in the line the match occurs.
For more check: regex-bounds
Well...no, the regular expressions aren't equivalent. They're also not doing what you think they are.
You intend to match a positive digit - what your regular expression aims to do is to match some character between 1 and 9, then match any number of digit characters after that (which includes zero).
The difference between the two is the anchoring, as you've noted - the first regex will only match values that literally begin with a 1 through 9, then zero or more digits, then expect there to be nothing else in the string.
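The regex to match a positive number anywhere in the string would look like this (the first character class must not be starred: [1-9]*\\d* makes every part optional, so it would also match the empty string and 0):
[1-9]\\d*
...and the regex to match any line that is a positive number would be this:
^[1-9]\\d*$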
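The question does not show which method was used for the test. If it was String.matches() or Matcher.matches(), that would explain why the anchors seemed to make no difference: matches() always requires the entire input to match, while find() searches for the pattern anywhere in the input. A small sketch under that assumption, with made-up names (AnchorDemo, foundAnywhere, matchesWhole):

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final Pattern ANCHORED = Pattern.compile("^[1-9]\\d*$");
    static final Pattern UNANCHORED = Pattern.compile("[1-9]\\d*");

    // find() looks for the pattern anywhere in the input.
    static boolean foundAnywhere(Pattern p, String s) {
        return p.matcher(s).find();
    }

    // matches() requires the entire input to match, with or without anchors.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "###123###";
        System.out.println("matches, anchored:   " + matchesWhole(ANCHORED, s));
        System.out.println("matches, unanchored: " + matchesWhole(UNANCHORED, s));
        System.out.println("find, anchored:      " + foundAnywhere(ANCHORED, s));
        System.out.println("find, unanchored:    " + foundAnywhere(UNANCHORED, s));
    }
}
```

With matches() both patterns reject ###123###; with find() only the un-anchored one locates the 123 inside it.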

Parse content-page using Regex?

I'm writing Java code using regex to parse a content-page extracted from a PDF document.
In a string the regex must match: a number (up to three digits) followed by a space (or many) followed by a word (or many [word: any sequence of characters]). And vice versa: (word(s) space(s) digit(s)); they all must be in the string. It should also allow for leading spaces and be case insensitive.
The extracted content-page could look something like this:
Directors’ responsibilities 8
Corporate governance 9
Remuneration report 10
The numbering style is not consistent and the number of spaces between digits and string varies, so it could also look like:
01 Contents
02 Strategy and highlights
04 Chairman’s statement
The regex I'm using matches any number of words followed by any number of spaces and then a number of no more than 3 digits:
(?i)([a-z\\s])*[0-9]{1,3}(?i)
It works, but not quite well, and I can't tell what I'm doing wrong. I also wish there were a way to detect both numbering styles (page numbers to the left or to the right of the string) instead of repeating the regex with the order flipped.
Cheers
If you want to match phrases you should include any punctuation you want to match in your regex. AFAIK there is no way in regex to say if a phrase is "before or after", so you should flip one and append it with a |. Something along the lines of:
[a-zA-Z'".,!\s]+\d{1,3}|\d{1,3}[a-zA-Z'".,!\s]+
Also, you don't need two instances of (?i), as the regex will apply the case insensitivity until the end of the string or if it encounters a (?-i).
You can use this pattern with multiline mode, if there is always a number before or after each item:
"^(?:(?<nb1>\\d{1,3}) +)?(?<item>\\S+(?: +\\S+)*?)(?: +(?<nb2>\\d{1,3})|$)"
Then take whichever of m.group("nb1") and m.group("nb2") is not null to obtain the number for each whole match (in Java, a group that did not participate in the match returns null, so the two cannot simply be concatenated).
But if you must check there is at least a number, you must repeat the whole pattern:
"^(?:(?<nb1>\\d{1,3}) +(?<item1>\\S+(?: +\\S+)*)|(?<item2>\\S+(?: +\\S+)*) +(?<nb2>\\d{1,3}))$"
Then:
item = m.group("item1") != null ? m.group("item1") : m.group("item2");
nb = m.group("nb1") != null ? m.group("nb1") : m.group("nb2");
Notice: since the patterns are anchored at the beginning and at the end, you may have to add some optional spaces to make them work: ^\\s* and \\s*$
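A possible way to apply the first multiline pattern to a whole extracted page is sketched below. The class name ContentsPageParser, the method parseAll and the "number | title" output format are illustrative assumptions; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentsPageParser {
    // Pattern from the answer, compiled in multiline mode so ^ and $ apply per line.
    static final Pattern ENTRY = Pattern.compile(
            "^(?:(?<nb1>\\d{1,3}) +)?(?<item>\\S+(?: +\\S+)*?)(?: +(?<nb2>\\d{1,3})|$)",
            Pattern.MULTILINE);

    // Returns one "number | title" string per matched line.
    static List<String> parseAll(String text) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            // Only one of nb1/nb2 participates in a match; the other is null.
            String nb = m.group("nb1") != null ? m.group("nb1") : m.group("nb2");
            entries.add((nb == null ? "?" : nb) + " | " + m.group("item"));
        }
        return entries;
    }

    public static void main(String[] args) {
        String page = "Corporate governance 9\nRemuneration report 10\n02 Strategy and highlights";
        parseAll(page).forEach(System.out::println);
    }
}
```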

Append a digit in front of another digit between /s?

I have a bunch of strings that look like this
/files/etc/hosts/2/ipaddr
/files/etc/hosts/2/canonical
/files/etc/hosts/2/alias[1]
/files/etc/hosts/3/ipaddr
/files/etc/hosts/3/canonical
/files/etc/hosts/3/alias[1]
/files/etc/hosts/4/ipaddr
/files/etc/hosts/4/canonical
/files/etc/hosts/4/alias[1]
I would like to append a 0 in front of any digit that sits between the / and /. After the append, the results should look like this...
/files/etc/hosts/02/ipaddr
/files/etc/hosts/02/canonical
/files/etc/hosts/02/alias[1]
/files/etc/hosts/03/ipaddr
/files/etc/hosts/03/canonical
/files/etc/hosts/03/alias[1]
/files/etc/hosts/04/ipaddr
/files/etc/hosts/04/canonical
/files/etc/hosts/04/alias[1]
I am pretty sure that I need to use a simple regular expression for searching. I think /\d*/ should be sufficient but I am not sure how to modify the string to insert the digit 0. Can someone give me some advice?
Use:
newString = string.replaceAll("/(\\d)(?=/)", "/0$1");
\\d is only one digit. Feel free to use \\d+ for one-or-more digits.
$1 means the part of the string matched by the first capturing group, that is, the first parenthesized part of the pattern (with some exceptions such as non-capturing groups), thus (\\d).
You need to use a look-ahead (?=...) instead of just /(\\d)/ because otherwise
/files/etc/hosts/3/5/canonical
will become
/files/etc/hosts/03/5/canonical
instead of
/files/etc/hosts/03/05/canonical
(the / between 3 and 5 will get consumed during matching the 3 if you don't use look-ahead, thus it won't match on the 5).
This is not an issue (and you can simply use /(\\d)/) if the above string is not a possible input.
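The difference between the two forms can be checked with a short sketch. The class and method names (PadPathDigits, withLookahead, withoutLookahead) are hypothetical.

```java
class PadPathDigits {
    // The look-ahead checks for the trailing slash without consuming it,
    // so that slash can start the next match.
    static String withLookahead(String path) {
        return path.replaceAll("/(\\d)(?=/)", "/0$1");
    }

    // The trailing slash is consumed here, so an adjacent single-digit
    // segment is skipped.
    static String withoutLookahead(String path) {
        return path.replaceAll("/(\\d)/", "/0$1/");
    }

    public static void main(String[] args) {
        String path = "/files/etc/hosts/3/5/canonical";
        System.out.println(withLookahead(path));
        System.out.println(withoutLookahead(path));
    }
}
```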
Java regex reference.

Need a regular expression for this pattern

I need a regular expression for the pattern below:
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My problem is that it can have a space between numbers, like /3232 323/, but not / /.
How can I validate it?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
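As a quick check, the pattern above can be run against the sample strings from the question. This is only a sketch; the class name SlashNumberValidator and the method isValid are made up for illustration.

```java
class SlashNumberValidator {
    // Optional leading slashes, then one or more digits (optionally joined
    // by single spaces), each followed by any number of slashes.
    static boolean isValid(String s) {
        return s.matches("^/*(?:\\d(?: \\d)*/*)+$");
    }

    public static void main(String[] args) {
        String[] samples = {
            "3232////33 43/323//", "3232////3343/323//", "/3232////343/323//",
            "/sas/3232/////dsds/", "/ /34343///// /////", "///////////"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The three valid samples from the question should be accepted and the three invalid ones rejected.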
My solution is not so simple, but it works:
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criterion.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this Java regex:
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character of a number (i.e. spaces must be between digits).
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind it will also match
/3232///// from /sas/3232/////dsds/
because that part of the invalid string is itself valid.
If you are reading line by line, anchor the pattern with ^ and $. If you are reading an entire block of text, search for \r\n around the regex above to match each line.
