The string should be at most 15 characters long, consist of digits only, and contain at least two single-character whitespaces anywhere in the string.
The digits-only part is easy; I'm stuck on adding the condition for the whitespace.
I tried searching the most frequently asked regex questions but couldn't find anything similar.
EDIT:
Additional conditions
whitespaces cannot be next to each other
they must not be the first or last character
For your requirements, something like this should work:
\d+(\s\d+){2,}
But you'll need to check the length separately (e.g. input.length() <= 15).
This expression says:
Digits in the beginning.
Then a single space followed by digits - at least two such combinations
This ensures that no space is adjacent to another space, and that there are at least two of them. It also prevents a space at the beginning or the end, and allows more than two of them.
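A minimal Java sketch of this approach, assuming the length is checked separately as suggested above. The class and method names are made up for illustration. Note that \s also accepts tabs and other whitespace; a literal space could be used in the pattern if only spaces should count.

```java
import java.util.regex.Pattern;

class SpacedDigits {
    // Digits, then at least two "single whitespace + digits" groups.
    private static final Pattern SHAPE = Pattern.compile("\\d+(\\s\\d+){2,}");

    // Hypothetical helper: the regex checks the shape, length() checks the 15-character limit.
    static boolean isValid(String input) {
        return input.length() <= 15 && SHAPE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12 345 6789"));  // true
        System.out.println(isValid("12 3456789"));   // false: only one space
        System.out.println(isValid("12  345 6789")); // false: adjacent spaces
        System.out.println(isValid(" 12 345 67"));   // false: leading space
    }
}
```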
You can use this regex: [0-9\s]{2,15}
And in your Java code you check if there are three parts separated by a whitespace:
String input =...;
if (input.matches("[0-9\\s]{2,15}") && (input.split(" ").length == 3) ) {
    System.out.println("valid input");
}
Edited: Leading and ending whitespaces, connected whitespaces are not allowed
I have an input string like this:
one `two three` four five `six` seven
where some parts can be wrapped by grave accent character (`).
I want to match only the parts that are not wrapped by it: one, four five and seven in the example (skipping two three and six).
I tried to do it using lookarounds ((?<=) and (?=)), but it recognised the four five group in the same way as two three and six. Is it possible to solve this problem using regex only, or do I have to do it programmatically? (I'm using Java 1.8.)
If you are sure that there are no unclosed backticks, you could do this:
((?:\w| )+)(?=(?:[^`]*`[^`]*`)*[^`]*$)
This will match:
"one "
" four five "
" seven"
But it's a little expensive: at every candidate position the lookahead scans the rest of the line to check that the number of remaining backticks is even, so the whole match can take O(n^2) time.
Note that this works regardless of where the whitespace is: it really counts the backticks and does not care about their relative position. If you don't need this kind of robustness, @anubhava's answer is certainly more performant.
Demo: regex101.
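For reference, a small Java sketch that runs the pattern above against the sample input from the question. The class and method names are illustrative only, and it assumes, as stated above, that every backtick is closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideBackticks {
    // Word characters or spaces, followed by an even number of backticks up to the end of the input.
    private static final Pattern OUTSIDE =
            Pattern.compile("((?:\\w| )+)(?=(?:[^`]*`[^`]*`)*[^`]*$)");

    // Collects every run of text that sits outside a backtick pair.
    static List<String> find(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = OUTSIDE.matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // matches: "one ", " four five ", " seven"
        for (String part : find("one `two three` four five `six` seven")) {
            System.out.println("\"" + part + "\"");
        }
    }
}
```

The matches keep their surrounding spaces, so calling trim() on each part may be useful.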
You may use this regex using a lookahead and lookbehind:
(?<!`)\b\w+(?:\s+\w+)*\b(?!`)
RegEx Demo
Explanation:
- (?<!`): Negative Lookbehind to assert that we don't have ` at previous position
- \b\w+(?:\s+\w+)*\b: Match our text surrounded by word boundaries
- (?!`): Negative Lookahead to assert that we don't have ` at next position
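A short Java sketch of this lookaround pattern on the question's sample input; the class and method names are hypothetical. One caveat worth knowing: because the pattern only inspects the characters directly next to the match, a word in the middle of a longer quoted part (for example b in `a b c`) is still matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotNextToBacktick {
    // Words (possibly several, separated by whitespace) with no backtick directly before or after.
    private static final Pattern P =
            Pattern.compile("(?<!`)\\b\\w+(?:\\s+\\w+)*\\b(?!`)");

    static List<String> find(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(find("one `two three` four five `six` seven")); // [one, four five, seven]
        System.out.println(find("`a b c`"));                                // [b]
    }
}
```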
I solve issues like this by specifying to exclude closing characters (in your case whitespace) like so:
`[^\s]+`
I'm practicing reading input and then tokenizing it.
For example, if I have [882,337] I want to just get the numbers 882 and 337. I tried using the following code:
String test = "[882,337]";
String[] tokens = test.split("\\[|\\]|,");
System.out.println(tokens[0]);
System.out.println(tokens[1]);
System.out.println(tokens[2]);
It kind of works, the output is:
(blank line)
882
337
What I don't understand is why tokens[0] is empty. I would expect there to be only two tokens, where tokens[0] = 882 and tokens[1] = 337.
I checked out some links but didn't find the answer.
Thanks for the help!
Split splits the given String. If you split "[882,337]" on "[" or "," or "]" then you actually have:
nothing
882
337
nothing
But, as you have called String.split(delimiter), this calls String.split(delimiter, limit) with a limit of zero.
From the documentation:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
(emphasis mine)
So in this configuration the final, empty, strings are discarded. You are therefore left with exactly what you have.
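The effect of the limit can be seen in a short Java sketch (the class and helper names are illustrative): a limit of zero drops the trailing empty string, while a negative limit keeps it.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Splits on "[", "]" or "," with an explicit limit.
    static String[] split(String input, int limit) {
        return input.split("\\[|\\]|,", limit);
    }

    public static void main(String[] args) {
        // Limit 0 is what split(regex) uses: trailing empty strings are discarded.
        System.out.println(Arrays.toString(split("[882,337]", 0)));  // [, 882, 337]
        // A negative limit keeps every piece, including the trailing empty string.
        System.out.println(Arrays.toString(split("[882,337]", -1))); // [, 882, 337, ]
    }
}
```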
Usually, to tokenize something like this, one would go for a combination of replaceAll and split:
final String[] tokens = input.replaceAll("^\\[|\\]$", "").split(",");
This will first strip off the start (^[) and end (]$) brackets and then split on ,. This way you don't have to have somewhat obtuse program logic where you start looping from an arbitrary index.
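As a runnable sketch of the strip-then-split idea (the class and method names are made up; note that replaceAll takes the replacement string as its second argument):

```java
import java.util.Arrays;

class BracketTokens {
    // Remove one leading "[" and one trailing "]", then split on commas.
    static String[] tokenize(String input) {
        return input.replaceAll("^\\[|\\]$", "").split(",");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("[882,337]"))); // [882, 337]
    }
}
```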
As an alternative, for more complex tokenizations, one can use Pattern - might be overkill here, but worth bearing in mind before you get into writing multiple replaceAll chains.
First we need to define, in Regex, the tokens we want (rather than those we're splitting on) - in this case it's simple, it's just digits so \d.
So, in order to extract all digit-only values (no thousands/decimal separators) from an arbitrary String, one would do the following:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final List<Integer> tokens = new ArrayList<>();        // to hold the tokens
final Pattern pattern = Pattern.compile("\\d++");      // the compiled regex
final Matcher matcher = pattern.matcher(input);        // the matcher on input
while (matcher.find()) {                               // for each matched token
    tokens.add(Integer.parseInt(matcher.group()));     // parse as an int and store
}
N.B: I have used a possessive regex pattern for efficiency
So, you see, the above code is somewhat more complex than the simple replaceAll().split(), but it is much more extensible. You can use an arbitrarily complex regex to tokenize almost any input.
The symbols where the string is split are here:
String test = "[882,337]";
               ^   ^   ^
Because the first character matches your delimiter, everything to the left of it becomes the first result. To the left of the first character there is nothing, so the result is the empty string.
One could expect the same behaviour for the end, since the last symbol also matches the delimiter. But:
Trailing empty strings are therefore not included in the resulting array.
See Javadoc.
Splitting creates two (or more) things from one thing. For instance if you split a,b by , you will get a and b.
But in case of ",b" you will get "" and "b". You can think of it this way:
"" exists at start, end and even in-between all characters of string:
""+","+"b" -> ",b" so if we split on this "," we are getting left and right part: "" and "b"
Something similar happens in the case of "a,": at first the result array is ["a",""], but here the split method removes trailing empty strings and returns only ["a"] (you can turn off this clearing mechanism by using split(",", -1)).
So in case of
String test = "[882,337]";
String[] tokens = test.split("\\[|\\]|,");
you are splitting:
""+"["+"882"+","+"337"+"]"+""
here: ^ ^ ^
which at first creates the array ["", "882", "337", ""], but then the trailing empty string is removed and you finally receive:
["", "882", "337"]
The only case where an empty string is removed from the start of the result array is when
you are using Java 8 (or newer) and splitting on a regex which is zero-length, like split("") or, let's say, before each x with split("(?=x)") (more info at: Why in Java 8 split sometimes removes empty strings at start of result array?)
and when this empty string was produced by the split method itself. For instance "".split("") will not remove "", more info here: https://stackoverflow.com/a/25058091/1393766
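These cases can be checked with a small Java sketch, assuming Java 8 or newer (the class and helper names are illustrative):

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Non-zero-length delimiter at the start: the leading empty string is kept.
        System.out.println(Arrays.toString(split(",b", ",")));  // [, b]
        // Zero-length match at the start (Java 8+): no leading empty string.
        System.out.println(Arrays.toString(split("abc", "")));  // [a, b, c]
        // The empty input itself is returned as the single element.
        System.out.println(split("", "").length);               // 1
    }
}
```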
That's because each delimiter has a "before" and "after" result, even if it is empty. Consider
882,337
You expect that to produce two results.
Similarly, you expect
882,337,
to produce three, with the last one being empty (assuming your limit is big enough, or assuming you're using almost any other language / implementation of split()). Extending that logically,
,882,337,
must produce four, with the first and last results being empty. This is exactly the case you have, except you have multiple delimiters.
I am trying to place a - between all odd numbers in a string. So if a string is passed in as Hel776o it should output Hel7-76o. Dashes should only be placed between two consecutive odd numbers.
I am trying to do this in one line via String.replaceAll()
I have the following line:
return str.replaceAll(".*([13579])([13579]).*","$1-$2");
The idea: if any odd number is followed by an odd number, place a - between them. But it's destructively replacing everything except for the last match.
E.g. if I pass in "999477" it will output 7-7 instead of 9-9-947-7. Are more groupings needed so I don't replace everything except the matches?
I already did this with a traditional loop through each char in string but wanted to do it in a one-liner with regex replace.
Edit: I should say I meant return str.replaceAll(".*([13579])([13579]).*","$0-$1"); and not $1 and $2
Remove .* from your regex to prevent consuming all characters in one match.
Also, if you want to reuse some part of a previous match, you can't consume it. For instance, if your string is 135 and you match 13, you will not be able to reuse that matched 3 in the next match with 5.
To solve this problem, use look-around mechanisms, which are zero-length, meaning they do not consume the part they match.
So to describe a place which has
an odd number before it, use look-behind (?<=[13579]),
an odd number after it, use look-ahead (?=[13579]).
So your code can look like
return str.replaceAll("(?<=[13579])(?=[13579])","-");
You can also let regex consume only one of two odd numbers to let other one be reused:
return str.replaceAll("[13579](?=[13579])","$0-");
return str.replaceAll("(?<=[13579])[13579]","-$0");
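The three variants above can be compared side by side in a short Java sketch (the class and method names are made up for illustration); each produces the same result on the inputs from the question.

```java
class OddDashes {
    // Zero-length match between two odd digits: nothing is consumed.
    static String betweenOdds(String str) {
        return str.replaceAll("(?<=[13579])(?=[13579])", "-");
    }

    // Consume only the left odd digit and append a dash to it.
    static String consumeLeft(String str) {
        return str.replaceAll("[13579](?=[13579])", "$0-");
    }

    // Consume only the right odd digit and prepend a dash to it.
    static String consumeRight(String str) {
        return str.replaceAll("(?<=[13579])[13579]", "-$0");
    }

    public static void main(String[] args) {
        System.out.println(betweenOdds("999477"));  // 9-9-947-7
        System.out.println(consumeLeft("999477"));  // 9-9-947-7
        System.out.println(consumeRight("Hel776o")); // Hel7-76o
    }
}
```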
I need a regular expression for the pattern below:
It can start with / or number
It can only contain numbers, no text
Numbers can have space in between them.
It can contain /*, at least 1 number and space or numbers and /*
Valid Strings:
3232////33 43/323//
3232////3343/323//
/3232////343/323//
Invalid Strings:
/sas/3232/////dsds/
/ /34343///// /////
///////////
My problem is that it can have a space between numbers, like /3232 323/, but not between slashes, like / /.
How can I validate it?
I have tried so far:
(\\d[\\d ]*/+) , (/*\\d[\\d ]*/+) , (/*)(\\d*)(/*)
This regex should work for you:
^/*(?:\\d(?: \\d)*/*)+$
Live Demo: http://www.rubular.com/r/pUOYFwV8SQ
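A quick Java sketch that checks this regex against the valid and invalid strings listed in the question. The class and method names are hypothetical, and the test strings are only the six examples given, so other edge cases are not covered.

```java
import java.util.regex.Pattern;

class SlashNumbers {
    // Optional leading slashes, then one or more "digit(s), optionally space-separated, plus optional slashes".
    private static final Pattern P = Pattern.compile("^/*(?:\\d(?: \\d)*/*)+$");

    static boolean isValid(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3232////33 43/323//"));  // true
        System.out.println(isValid("/3232////343/323//"));   // true
        System.out.println(isValid("/ /34343///// /////"));  // false
        System.out.println(isValid("///////////"));          // false
    }
}
```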
My solution is not so simple but it works
^(((\d[\d ]*\d)|\d)|/)*((\d[\d ]*\d)|\d)(((\d[\d ]*\d)|\d)|/)*$
Just use lookarounds for the last criterion.
^(?=.*?\\d)([\\d/]*(?:/ ?(?!/)|\\d ?))+$
The best would have been to use conditional regex, but I think Java doesn't support them.
Explanation:
Basically, numbers or slashes, followed by one number and a space, or one slash and a space which is not followed by another slash. Repeat that. The space is made optional because I assume there's none at the end of your string.
Try this java regex
/*(\\d[\\d ]*(?<=\\d)/+)+
It meets all your criteria.
Although you didn't specifically state it, I have assumed that a space may not appear as the first or last character for a number (ie spaces must be between numbers)
"(?![A-z])(?=.*[0-9].*)(?!.*/ /.*)[0-9/ ]{2,}(?![A-z])"
This will match what you want, but keep in mind that it will also match this:
/3232///// from /sas/3232/////dsds/
This is because part of the invalid string is valid on its own.
If you are reading line by line, anchor the pattern with ^ and $; if you are reading an entire block of text, search for \r\n around the regex above to match each line.
I need to check that a file contains some amounts that match a specific format:
between 1 and 15 characters (numbers or ",")
may contain at most one "," separator for decimals
must have at least one number before the separator
this amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude the malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not meet the requirement, as it might match up to 30 characters.
Unfortunately, for some reasons that are too long to explain here, I cannot simply pick a substring or use any other java call. The match has to be in a single, java-compatible, regular expression.
^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
$ end of the string (the lookahead only checks the overall length, so this anchor is still what stops the match from ending early, e.g. on 12abc).
You might have to escape backslashes for Java: ^(?=.{1,15}$)\\d+(,\\d+)?$
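A minimal Java sketch of the anchored version, checking whole strings only; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class AmountCheck {
    // Lookahead limits the total length to 1..15; the rest enforces digits with at most one comma.
    private static final Pattern AMOUNT = Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?$");

    static boolean isAmount(String input) {
        return AMOUNT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmount("123,45"));            // true
        System.out.println(isAmount("123456789012345"));   // true: exactly 15 characters
        System.out.println(isAmount("12345678,1234567"));  // false: 16 characters
        System.out.println(isAmount(",5"));                // false: no digit before the comma
    }
}
```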
update: If you're looking for this in the middle of another string, use word boundaries \b instead of string boundaries (^ and $).
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"
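One thing to watch for: \b does not match between a letter and a digit, so the word-boundary versions may miss an amount that is directly bounded by letters, as the question describes. A possible alternative, which is not part of the answer above and is offered only as a sketch, replaces the boundaries with lookarounds that require the amount not to touch any other digit or comma. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedAmount {
    // Assumption: an amount is a maximal run of digits/commas, 1..15 long, with at most one comma.
    private static final Pattern AMOUNT = Pattern.compile(
            "(?<![0-9,])(?=[0-9,]{1,15}(?![0-9,]))[0-9]+(,[0-9]+)?(?![0-9,])");

    // Returns every well-formed amount found inside the surrounding text.
    static List<String> find(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            amounts.add(m.group());
        }
        return amounts;
    }

    public static void main(String[] args) {
        System.out.println(find("abc12,34def"));            // [12,34]
        System.out.println(find("x12,345678901234567y"));   // []: 18 characters
        System.out.println(find("x1,2,3y"));                // []: more than one comma
    }
}
```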