Java (Regex?) split string between number/letter combination - java

I've been looking through pages and pages of Google results but haven't come across anything that could help me.
What I'm trying to do is split a string like Bananas22Apples496Pears3, and break it down into some kind of readable format. Since String.split() cannot do this, I was wondering if anyone could point me to a regex snippet that could accomplish this.
Expanding a bit: the above string would be split into (String[] for simplicity's sake):
{"Bananas:22", "Apples:496", "Pears:3"}

Try this
String s = "Bananas22Apples496Pears3";
String[] res = s.replaceAll("(?<=\\p{L})(?=\\d)", ":").split("(?<=\\d)(?=\\p{L})");
for (String t : res) {
System.out.println(t);
}
The first step inserts a ":" at every position that has a letter on its left, checked by the lookbehind assertion (?<=\\p{L}), and a digit on its right, checked by the lookahead assertion (?=\\d). Both assertions are zero-width, so nothing is removed from the string.
The second step splits the result at every position that has a digit on its left and a letter on its right.
\\p{L} is a Unicode property that matches every letter in every language.
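In case it helps, here is the same replace-then-split idea wrapped in a small method. This is only a sketch: the class name LetterDigitSplit and the method name splitPairs are made up for illustration, and it assumes the input alternates between runs of letters and runs of digits.

```java
import java.util.Arrays;

public class LetterDigitSplit {
    // Hypothetical helper: inserts ':' at each letter-to-digit boundary,
    // then splits at each digit-to-letter boundary.
    static String[] splitPairs(String input) {
        return input
                .replaceAll("(?<=\\p{L})(?=\\d)", ":")
                .split("(?<=\\d)(?=\\p{L})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPairs("Bananas22Apples496Pears3")));
        // prints [Bananas:22, Apples:496, Pears:3]
    }
}
```

Because both regexes only use lookarounds, no character of the original string is consumed; the only thing added is the ":".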

You need to replace and then split the string. You can't do it with split alone.
1> Replace all matches of the following regex
(\\w+?)(\\d+)
and replace them with
$1:$2
2> Now split the result with this regex
(?<=\\d)(?=[a-zA-Z])

This should do what you want:
import java.util.regex.*;
String d = "Bananas22Apples496Pears3";
Pattern p = Pattern.compile("[A-Za-z]+|[0-9]+");
Matcher m = p.matcher(d);
while (m.find()) {
System.out.println(m.group());
}
// Bananas
// 22
// Apples
// 496
// Pears
// 3
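If you want the "name:number" pairs from the question rather than separate tokens, one possible variation is to capture the letters and the digits as two groups and join them yourself. This is a sketch, not the code above: the class PairMatcher and the method pairs are hypothetical names, and it assumes ASCII letters only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairMatcher {
    // Group 1: a run of ASCII letters, group 2: the run of digits that follows it.
    private static final Pattern PAIR = Pattern.compile("([A-Za-z]+)([0-9]+)");

    // Hypothetical helper: returns one "letters:digits" entry per match.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + ":" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("Bananas22Apples496Pears3"));
        // prints [Bananas:22, Apples:496, Pears:3]
    }
}
```

If you really need a String[], calling toArray(new String[0]) on the returned list should do it.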

String myText = "Bananas22Apples496Pears3";
System.out.println(myText.replaceAll("([A-Za-z]+)([0-9]+)", "$1:$2,"));

Replace \d+ with :$0 and then split at (?<=\d)(?=[a-zA-Z]).

Related

Extract letter from String characters and numbers

I have these Strings:
"Turtle123456_fly.me"
"birdy_12345678_prd.tr"
I want the first words of each, ie:
Turtle
birdy
I tried this:
Pattern p = Pattern.compile("//d");
String[] items = p.split(String);
but of course it's wrong. I am not familiar with using Pattern.
Replace the stuff you don't want with nothing:
String firstWord = str.replaceAll("[^a-zA-Z].*", "");
to leave only the part you want.
The regex [^a-zA-Z] means "not a letter" and .* means "everything after it", so everything from (and including) the first non-letter to the end is "removed".
String s1 ="Turtle123456_fly.me";
String s2 ="birdy_12345678_prd.tr";
Pattern p = Pattern.compile("^([A-Za-z]+)[^A-Za-z]");
Matcher matcher = p.matcher(s1);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
The first part ^([A-Za-z]+) is a group that captures all the letters anchored to the beginning of the input (using the ^ anchor).
The second part [^A-Za-z] matches the first non-letter (it is not captured) and serves as a terminator for the sequence of letters.
Then all we have left to do is to fetch the group with index 1 (group 1 is what we have in the first parenthesis).
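A possible variation, in case some inputs consist of letters only: drop the terminator and match just the leading letters. This is a sketch under that assumption; the class FirstWord and the method firstWord are hypothetical names, and only ASCII letters are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstWord {
    // Leading run of ASCII letters, anchored to the start of the input.
    private static final Pattern LEADING_LETTERS = Pattern.compile("^[A-Za-z]+");

    // Hypothetical helper: returns the leading letters,
    // or "" if the input does not start with a letter.
    static String firstWord(String input) {
        Matcher m = LEADING_LETTERS.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Turtle123456_fly.me"));   // prints Turtle
        System.out.println(firstWord("birdy_12345678_prd.tr")); // prints birdy
    }
}
```

Without the trailing [^A-Za-z], an input such as "Turtle" on its own still yields "Turtle" instead of no match.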
maybe you should try this \d+\w+.*

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
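To see the pattern at work on several criteria at once, here is a rough sketch. It makes two assumptions that are not in the question: the criteria are joined with commas into one search string, and the operator is wrapped in its own group so it can be read back as group 2. The names SearchCriteriaParser and parse are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchCriteriaParser {
    // Group 1: key, group 2: operator (:, < or >), group 3: value up to the next comma.
    private static final Pattern CRITERION = Pattern.compile("(\\w+)([:<>])([^,]+)");

    // Hypothetical helper: returns {key, operator, value} for each criterion found.
    static List<String[]> parse(String search) {
        List<String[]> result = new ArrayList<>();
        Matcher m = CRITERION.matcher(search);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return result;
    }

    public static void main(String[] args) {
        String search = "firstName:Giorgio,email:giorgio.rossi#libero.it,dataregistrazione:27/10/2016";
        for (String[] c : parse(search)) {
            System.out.println(c[0] + " " + c[1] + " " + c[2]);
        }
        // prints:
        // firstName : Giorgio
        // email : giorgio.rossi#libero.it
        // dataregistrazione : 27/10/2016
    }
}
```

Since the value group is [^,]+, a value that itself contains a comma would be cut short, so this only fits data where the comma is strictly a separator.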
You can use regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But remember that this regex works only for your data format. To build a more universal regex, consult the RFC documents and articles about the email format.
Hope it helps.
The character class \w matches [A-Za-z0-9_], so it cannot match characters such as ., # or /. Change the regex to (\\w+?)(:|<|>)(.*), to match any character from the operator up to the comma.
Or list all the characters you expect, e.g. (\\w+?)(:|<|>)[#.\\w\\/]*,

Java extract only first letters/characters from String

Hello guys I want to extract only first letters from this String:
String str = "使 徒 行 傳 16:31 ERV-ZH";
I only want to get these characters:
使 徒 行 傳
and not include
ERV-ZH
Only the letters or characters before the numbers plus the colon.
Note that the Chinese characters could also be English or other letters.
this is what I've tried:
str.split(" ")[0];
But I'm only getting the first letter. Do you have an idea how to achieve my requirement? Any help will be appreciated. Thanks.
NOTE:
Also, strings are dynamic so I only presented sample characters.
This should give you the desired output
String str = "使 徒 行 傳 16:31 ERV-ZH";
String[] test = str.split("\\d\\d:\\d\\d");
for (String s : test) {
System.out.println(s);
}
The first element is the part before the time, the second element is the part after it.
Edit: if you need to handle times like 6:31 or 16:6 as well, you could use the regex "\\d{1,2}:\\d{1,2}" instead.
You can use the following regex ^([\\D\\s]+), this is what you need:
String str = "使 徒 行 傳 16:31 ERV-ZH";
String pattern = "^([\\D\\s]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(str);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
} else {
System.out.println("NO MATCH");
}
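In the regex ^([\\D\\s]+):
^ matches only at the beginning of the string.
\\D matches any character that is not a digit (whitespace included, so the \\s is redundant), which means the match stops at the first digit.
Note that this works for any string, not only for the sample above.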
If you don't always have a date pattern that can be used as a delimiter in the middle, and are looking for a more generic solution, you could go with this: str.replaceAll("[^\\p{L}\\s]+.*", "")
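A rough sketch of that generic version, with a trim() added because the replacement otherwise leaves the space in front of the number. The class LeadingText and the method leadingLetters are hypothetical names, and the sketch assumes single-line input, since . does not match line breaks by default.

```java
public class LeadingText {
    // Hypothetical helper: removes everything from the first character that is
    // neither a letter (in any script) nor whitespace, then trims the result.
    static String leadingLetters(String input) {
        return input.replaceAll("[^\\p{L}\\s]+.*", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(leadingLetters("使 徒 行 傳 16:31 ERV-ZH")); // prints 使 徒 行 傳
        System.out.println(leadingLetters("John 3:16 KJV"));           // prints John
    }
}
```

Because \\p{L} covers letters in every script, the same call handles the Chinese sample and an English one.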

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut, so I use a lookbehind (preceded by a digit [0-9]) and a lookahead (followed by a letter [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
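As a quick check that nothing is removed, here is a small sketch that splits the sample string and joins the tokens back together. The class BoundarySplit and the method splitTokens are made-up names, and the lookahead is widened to [A-Za-z] on the assumption that uppercase letters may also occur.

```java
import java.util.Arrays;

public class BoundarySplit {
    // Hypothetical helper: cuts the string wherever a digit is directly
    // followed by an ASCII letter, without consuming either character.
    static String[] splitTokens(String input) {
        return input.split("(?<=[0-9])(?=[A-Za-z])");
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";
        String[] tokens = splitTokens(s);
        System.out.println(Arrays.toString(tokens));
        // prints [xyz213123, kop234430099, kpf4532]
        System.out.println(String.join("", tokens).equals(s));
        // prints true
    }
}
```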
You could split at the boundary between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
Nothing in your string matches the regular expression. The expression starts with ^ (beginning of string) and ends with $ (end of string), so it would either match the whole string or nothing at all. In addition, Java has no /.../ regex delimiters, so the leading and trailing / are treated as literal slashes. Because the expression doesn't match anywhere in the string, split finds no delimiter, and that's why you get just one big token.
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)

Problems with building this regex [1,2,3]

I have a problem building a regex for the following string:
[1,2,3,4]
I found a workaround, but I think it's ugly:
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone please help me build one regex that I can use in the split function?
Thanks for the help.
Edit:
I want to get from the string "[1,2,3,4]" to an array with 4 entries. The entries are the 4 numbers in the string, so I need to eliminate "[", "]" and ",". The "," isn't the problem.
The first and last numbers contain [ or ], so I needed the fix with replaceAll. But I think that if I can pass split a regex for ",", I can also pass a regex that eliminates "[" and "]" too. I just can't figure out what this regex should look like.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
Maybe substring [ and ] from beginning and end, then split the rest by ,
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
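If you end up wanting numbers rather than strings, the same substring-then-split idea can be taken one step further. This is only a sketch: the class IdListParser and the method parseIds are hypothetical, and it assumes the input always starts with [ and ends with ], with optional spaces around the commas.

```java
import java.util.Arrays;

public class IdListParser {
    // Hypothetical helper: strips the surrounding brackets, splits on commas
    // (ignoring spaces around them) and parses each entry as an int.
    static int[] parseIds(String text) {
        String trimmed = text.trim();
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        if (inner.isEmpty()) {
            return new int[0]; // "[]" has no entries
        }
        String[] parts = inner.split("\\s*,\\s*");
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Integer.parseInt(parts[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseIds("[1,2,3,4]")));
        // prints [1, 2, 3, 4]
    }
}
```

The empty check is there because "".split(",") returns one empty string, which Integer.parseInt would reject.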
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
import java.util.*;
import java.util.regex.*;
// Collects every run of digits in the input, ignoring brackets, commas and spaces.
static String[] nums(String in) {
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the number items, then the following code fragment should work:
try {
Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(stringIds);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
