Java Regex to split numbers with unknown number of spaces

I have a string of numbers with slightly odd formatting. The source I'm pulling from uses a non-standard format, and I'm trying to switch from a .split where I have to specify the exact delimiter (2 spaces, 3 spaces, etc.) to a replaceAll regex.
My data looks like this:
23574 123451 81239 1234 19274 4312457 1234719
I want to end up with
23574,xxxxx,xxxxx,xxxx
So I can just do a String.split on the ,

I would use the \s regex, which matches any whitespace character.
This is how to use it in Java:
String[] numbers = myString.split("\\s+");
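To get the comma-separated form the question asks for, the split result can be joined back together. This is a minimal sketch, assuming Java 8+ for String.join; the class and method names are only illustrative, and the sample input uses irregular spacing on purpose:

```java
class WhitespaceToCsv {
    // Collapse every run of whitespace (spaces, tabs, newlines) into a single comma.
    static String toCsv(String input) {
        // trim() first so leading whitespace does not produce an empty first element
        String[] numbers = input.trim().split("\\s+");
        return String.join(",", numbers);
    }

    public static void main(String[] args) {
        System.out.println(toCsv("23574   123451  81239 1234"));
        // prints 23574,123451,81239,1234
    }
}
```

If only the array is needed, the join step can be dropped and the result of split used directly.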

final Iterable<String> splitted = Splitter.on(' ').trimResults().omitEmptyStrings().split(input);
final String output = Joiner.on(',').join(splitted);
with Guava Splitter and Joiner

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String pattern = "\\s+";
Pattern regex = Pattern.compile(pattern);
Matcher match = regex.matcher(inputString);
// replaceAll returns the new string; it does not modify the matcher or the input
String stringToSplit = match.replaceAll(",");
I think that should do it for you. If not, searching for the Matcher and Pattern classes in the Java API documentation will be very helpful.

I understand this problem as a way to obtain integer numbers from a string with blank (not only space) separators.
The accepted solution does not work if the separator is a TAB \t for instance or if it has an \n at the end.
If we define an integer number as a sequence of digits, the best way to solve this is using a simple regular expression. Checking the Java 8 Pattern API, we can find that \D represents any non digit character:
\D A non-digit: [^0-9]
Since String.split() accepts a regular expression describing the possible separators, it is easy to pass "\\D+" to a trimmed string and get the result in one shot, like this:
String source = "23574 123451 81239 1234 19274 4312457 1234719";
String trimmed = source.trim();
String[] numbers = trimmed.split("\\D+");
This reads as: split the trimmed string wherever a sequence of one or more non-digit characters occurs.
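A small sketch of how the "\\D+" split behaves with tabs and newlines, and of one caveat: if the string starts with non-digit characters that trim() does not remove, the first element of the result is an empty string. The class name and sample inputs are made up for illustration:

```java
import java.util.Arrays;

class NonDigitSplit {
    // Split on any run of non-digit characters after trimming surrounding whitespace.
    static String[] digitsOnly(String source) {
        return source.trim().split("\\D+");
    }

    public static void main(String[] args) {
        // Tabs, repeated spaces and newlines are all treated as separators.
        System.out.println(Arrays.toString(digitsOnly("23574\t123451  81239\n1234\n")));
        // prints [23574, 123451, 81239, 1234]

        // Leading non-digits that trim() leaves in place yield an empty first element.
        System.out.println(Arrays.toString(digitsOnly("id: 12 34")));
        // prints [, 12, 34]
    }
}
```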

Related

Java regex, replace certain characters except

I have this string "u2x4m5x7" and I want to replace every character with "", except for each number that is followed by an x (keeping that x as well).
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!
Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while (m.find()) {
    b.append(m.group(0));
}
System.out.println(b);
This prints:
2x5x
It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler
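For reference, the three-branch pattern above could be applied in Java roughly like this. It is only a sketch with an illustrative class name; the second call shows one of the cases it misses, since \d(?!x) checks one digit at a time and so strips the leading digits of a multi-digit number:

```java
class RemoveNonMatches {
    // Remove digits not followed by x, non-digit/non-x characters, and x's not preceded by a digit.
    static String keepDigitX(String input) {
        return input.replaceAll("\\d(?!x)|[^0-9x]|(?<!\\d)x", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigitX("u2x4m5x7")); // prints 2x5x
        System.out.println(keepDigitX("a12xb"));    // prints 2x (the leading 1 is lost)
    }
}
```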
Capture what you need into group 1, or else match any single character, and replace each match with $1 (which is empty whenever the |. alternative matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run

Matching a string which occurs after a certain pattern

I want to match a string which occurs after a certain pattern but I am not able to come up with a regex to do that (I am using Java).
For example, let's say I have this string,
caa,abb,ksmf,fsksf,fkfs,admkf
and I want my regex to match only those commas which are prefixed by abb. How do I do that? Is it even possible using regexes?
If I use the regex abb, it matches the whole string abb, but I only want to match the comma after that.
I ask this because I wanted to use this regex in a split method which accepts a regex. If I pass abb, as the regex, it will consider the string abb, to be the delimiter and not the , which I want.
Any help would be greatly appreciated.
String test = "caa,abb,ksmf,fsksf,fkfs,admkf";
String regex = "(?<=abb),";
String[] split = test.split(regex);
for (String s : split) {
    System.out.println(s);
}
Output:
caa,abb
ksmf,fsksf,fkfs,admkf
See here for information:
https://www.regular-expressions.info/lookaround.html
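To make the difference concrete, here is a short sketch contrasting the plain pattern abb, with the lookbehind version on a shortened copy of the question's string. The class and method names are hypothetical:

```java
import java.util.Arrays;

class LookbehindSplit {
    // The delimiter is the whole text "abb,", so "abb" is consumed and lost.
    static String[] splitConsuming(String s) {
        return s.split("abb,");
    }

    // The delimiter is only the comma; "abb" is asserted by the lookbehind but not consumed.
    static String[] splitAfterAbb(String s) {
        return s.split("(?<=abb),");
    }

    public static void main(String[] args) {
        String test = "caa,abb,ksmf,fsksf";
        System.out.println(Arrays.toString(splitConsuming(test))); // prints [caa,, ksmf,fsksf]
        System.out.println(Arrays.toString(splitAfterAbb(test)));  // prints [caa,abb, ksmf,fsksf]
    }
}
```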

Regex allow ; and at least 5 digit numbers and trim leading/trailing semicolon in JAVA

This is what I am after:
Replace all characters that are not digits and not semicolon ; with nothing: "".
Numbers must be at least 5 digits long.
Trim leading and trailing semicolon ;
So:
567834 is valid
123456;654321;3456789 is valid
123;456 is not valid (the numbers are too short) and will be replaced with the empty string ""
;123456; will be trimmed to 123456
;567890 will be trimmed to 567890
456789; will be trimmed to 456789
I was thinking of using replaceAll method to do the work.
str.replaceAll("(\\d+\\;?)*\\d+", "");
But this doesn't take care of trimming leading and trailing semicolons and doesn't replace too short numbers with "".
Any help is appreciated!
I'd recommend breaking the problem into steps. This is an easy problem if you do. A single regex will be challenging, both to develop today and to read for every day after. Readable, easily understandable code should be your objective.
String trimmedStr = str.trim();
String noSemicolons = trimmedStr.replaceAll(";", "");
Matcher matcher = Pattern.compile("^\\d{5,}$").matcher(noSemicolons);
boolean isValid = matcher.matches();
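One way the step-by-step idea might look when it also produces the cleaned string is sketched below. It rests on an assumption about the requirements: each number is checked individually, and numbers shorter than 5 digits are dropped while the others are kept. The class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

class SemicolonNumbers {
    static String clean(String input) {
        // Step 1: drop everything that is neither a digit nor a semicolon.
        String digitsAndSemicolons = input.replaceAll("[^\\d;]", "");

        // Step 2: keep only the numbers with at least 5 digits.
        List<String> kept = new ArrayList<>();
        for (String token : digitsAndSemicolons.split(";")) {
            if (token.length() >= 5) {
                kept.add(token);
            }
        }

        // Step 3: re-join with single semicolons, so none can lead or trail.
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        System.out.println(clean("123456;654321;3456789")); // prints 123456;654321;3456789
        System.out.println(clean(";123456;"));              // prints 123456
        System.out.println(clean("123;456").isEmpty());     // prints true
    }
}
```

Because the result is rebuilt with String.join, trimming the leading and trailing semicolons needs no separate step.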
You can use:
String repl = input.replaceAll(";?\\b(\\d{5,})\\b;?|[\\d;]*", "$1");
You can use this replacement:
String result = input.replaceAll("(\\d{5,})|\\d{1,4}(?:;+|\\z)|;+\\d{0,4}\\z|\\A;", "$1");
The idea is to preserve numbers with at least 5 digits in a capture group first (because the leftmost branch that succeeds wins). The other branches describe what you need to remove.
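A quick way to check the capture-and-discard replacement against the examples from the question is sketched here. The wrapper class and method are hypothetical; the pattern is the one given above:

```java
class PreserveLongNumbers {
    // Group 1 keeps numbers of 5+ digits; every other branch matches text to discard.
    static String clean(String input) {
        return input.replaceAll("(\\d{5,})|\\d{1,4}(?:;+|\\z)|;+\\d{0,4}\\z|\\A;", "$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "567834", "123456;654321;3456789", "123;456", ";123456;", ";567890", "456789;"
        };
        for (String sample : samples) {
            // Quotes make an empty result visible in the output.
            System.out.println("\"" + sample + "\" -> \"" + clean(sample) + "\"");
        }
    }
}
```

In Java, a reference to a group that did not take part in the match is replaced with nothing, which is why the discard branches simply vanish from the result.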
Another way:
String result = input.replaceAll("((?:\\d{5,}(?:;(?!\\z))?)*+)(?:;*\\d{0,4}(?:;+|\\z))++", "$1");
This one describes the string as a succession of parts to remove preceded by an optional part to preserve.

Split String if it has number

Hi guys, it's been a while since I last asked a question.
I have this String, which consists of names each followed by a number.
Ex.
String myString = "give11arrow123test2356read809cell1245cable1257give222...";
Now what I am trying to do is split it wherever a number ends and the next name begins.
I have to split it so that I could have a result like this
give11, arrow123, test2356, read809, cell1245, cable1257, give222, ....
I could use this code, but I can't find the right regex:
String[] arrayString = myString.split("Regex");
Thanks for your help.
You can use a combination of lookarounds to split your string.
Lookarounds are zero-width assertions: they do not consume any characters of the string. They only check whether a pattern can or cannot be matched looking ahead or looking back from the current position, without adding those characters to the overall match. Here the split happens at every position that is preceded by a digit and followed by a non-digit.
String s = "give11arrow123test2356read809cell1245cable1257give222...";
String[] parts = s.split("(?<=\\d)(?=\\D)");
System.out.println(Arrays.toString(parts));
Output
[give11, arrow123, test2356, read809, cell1245, cable1257, give222, ...]
Use this regex for splitting:
String regex = "(?<=\\d)(?=\\D)";
I am unfamiliar with using regex in Java, but this expression matches what you need on www.rubular.com:
([A-Za-z]+[0-9]+)
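In Java, a matching expression like this would be used with Matcher.find() rather than split. The following is a sketch only: the class and method names are invented, and the capture group is dropped because the whole match is what gets collected:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameNumberTokens {
    // One or more letters followed by one or more digits.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+[0-9]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group()); // the whole match, e.g. "give11"
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("give11arrow123test2356"));
        // prints [give11, arrow123, test2356]
    }
}
```

Unlike the split approach, text that does not fit the letters-then-digits shape (such as a trailing "...") is skipped rather than returned as its own element.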

Java String Split on any character (including regex special characters)

I'm sure I'm just overlooking something here...
Is there a simple way to split a String on an explicit character without applying RegEx rules?
For instance, I receive a string with a dynamic delimiter, I know the 5th character defines the delimiter.
String s = "This,is,a,sample";
For this, it's simple to do
String delimiter = String.valueOf(s.charAt(4));
String[] result = s.split(delimiter);
However, when I have a delimiter that's a special RegEx character, this doesn't work:
String s = "This*is*a*sample";
So... is there a way to split the string on an explicit character without trying to apply extra RegEx rules? I feel like I must be missing something pretty simple.
split takes a regular expression as its argument. * is a metacharacter that matches zero or more occurrences of the preceding element, so on its own it is not a valid pattern. You can use Pattern#quote to have the character treated literally:
String[] result = s.split(Pattern.quote(delimiter));
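Put together with the question's rule that the 5th character defines the delimiter, a runnable sketch might look like this. The class and method names are assumptions made for the example:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralDelimiterSplit {
    // Uses the 5th character (index 4) as a literal delimiter, whatever it is.
    static String[] splitOnFifthChar(String s) {
        String delimiter = String.valueOf(s.charAt(4));
        // Pattern.quote makes split match the delimiter verbatim instead of as a regex.
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnFifthChar("This,is,a,sample")));
        System.out.println(Arrays.toString(splitOnFifthChar("This*is*a*sample")));
        System.out.println(Arrays.toString(splitOnFifthChar("This|is|a|sample")));
        // each line prints [This, is, a, sample]
    }
}
```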
You do not need to worry about the character type if you build a Pattern from the quoted character:
Pattern regex = Pattern.compile(Pattern.quote(String.valueOf(s.charAt(4))));
Matcher matcher = regex.matcher(yourString);
if (matcher.find()) {
    // do something
}
You can run Pattern.quote on the delimiter before feeding it to split. This wraps it as a regex literal, so any regex-specific characters are matched verbatim:
delimiter = Pattern.quote(delimiter);
String[] result = s.split(delimiter);
Alternatively, StringUtils.split does not use regex at all, so it takes the unquoted delimiter:
StringUtils.split(s, String.valueOf(s.charAt(4)));
That will treat the delimiter as just a character, not use it like a regex.
StringUtils is part of the Apache Commons Lang library, which has tons of useful methods. It is worth taking a look; it could save you some time in the future.
Simply put your delimiter between []
String delimiter = "["+s.charAt(4)+"]";
String[] result = s.split(delimiter);
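This works because a character class [ ] matches any single character listed between the brackets, and most metacharacters lose their special meaning there. Some characters, such as \, ], [ and ^, are still special inside a class, so this is not safe for every delimiter. You can also specify a list of delimiters like [*,.+-]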
