How to split a string based on a pattern using regex - java

I'm having trouble splitting a string based on a regex.
String str = "1=(1-2,3-4),2=2,3=3,4=4";
Pattern commaPattern = Pattern.compile("\\([0-9-]+,[0-9-]+\\)|(,)") ;
String[] arr = commaPattern.split(str);
for (String s : arr)
{
System.out.println(s);
}
Expected output:
1=(1-2,3-4)
2=2
3=3
4=4
Actual output:
1=
2=2
3=3
4=4

This regex would split as required
,(?![^()]*\\))
------------
|->split with , only if it is not within ()
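A minimal sketch of this lookahead split, assuming the parentheses are balanced and never nested. The class and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.List;

class SplitOutsideParens {
    // Split on a comma only when the text after it does not reach a ')'
    // before meeting another '(' or ')', i.e. the comma is outside parentheses.
    static List<String> splitTopLevel(String input) {
        return Arrays.asList(input.split(",(?![^()]*\\))"));
    }

    public static void main(String[] args) {
        for (String part : splitTopLevel("1=(1-2,3-4),2=2,3=3,4=4")) {
            System.out.println(part);
        }
    }
}
```

With nested parentheses the lookahead is no longer reliable, and a small hand-written scanner is usually the safer choice.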

This isn't well suited for a split(...). Consider scanning through the input and matching instead:
String str = "1=(1-2,3-4),2=2,3=3,4=4";
Matcher m = Pattern.compile("(\\d+)=(\\d+|\\([^)]*\\))").matcher(str);
while(m.find()) {
String key = m.group(1);
String value = m.group(2);
System.out.printf("key=%s, value=%s\n", key, value);
}
which would print:
key=1, value=(1-2,3-4)
key=2, value=2
key=3, value=3
key=4, value=4

You will have to use a lookahead here. As I see it, you are trying to split on commas that are not inside parentheses. But your regular expression says:
Split on a comma OR on a parenthesized pair of numbers separated by a comma
So your String gets split in 4 places:
1) (1-2,3-4)
2-4) each remaining comma
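One way to see this is to list what the original pattern actually matches, since everything split() matches is treated as a delimiter and thrown away. The sketch below uses a hypothetical helper class for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterMatches {
    // Collect every substring the regex matches; split() would discard exactly these.
    static List<String> delimiters(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "1=(1-2,3-4),2=2,3=3,4=4";
        // The first match is the whole "(1-2,3-4)" group, followed by three commas.
        System.out.println(delimiters("\\([0-9-]+,[0-9-]+\\)|(,)", str));
    }
}
```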

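String[] arr = commaPattern.split(str);
is valid as written (Pattern.split accepts a CharSequence) and is equivalent to
String[] arr = str.split(commaPattern.pattern());
Note that String.split takes the regex as a String, not a Pattern, so str.split(commaPattern) would not compile.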

Related

How to extract integers from a complicated string?

I am having a hard time figuring this out. Say I have a String like this:
String s could equal
s = "{1,4,204,3}"
at another time it could equal
s = "&5,3,5,20&"
or it could equal at another time
s = "/4,2,41,23/"
Is there any way I could just extract the numbers out of this string and make a char array for example?
You can use regex for this sample:
String s = "&5,3,5,20&";
System.out.println(s.replaceAll("[^0-9,]", ""));
result:
5,3,5,20
This removes every character that is not a digit or a comma. To extract the individual numbers, call split, e.g. String[] sArray = s.split(",");, and iterate over the array to get each number between the commas.
You can use RegEx and extract all the digits from the string.
stringWithOnlyNumbers = str.replaceAll("[^\\d,]+","");
After this you can use split() with the delimiter ',' to get the numbers in an array.
I think split() with replace() must help you with that
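Putting the replaceAll and split steps together, a minimal sketch might look like the following. It assumes the numbers are non-negative integers that fit in an int; the class and method names are made up for illustration:

```java
import java.util.Arrays;

class ExtractNumbers {
    // Strip everything except digits and commas, split on commas, then parse each token.
    static int[] extract(String s) {
        String cleaned = s.replaceAll("[^\\d,]+", "");
        return Arrays.stream(cleaned.split(","))
                .filter(token -> !token.isEmpty()) // skip empty tokens such as "1,,2" would produce
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("{1,4,204,3}")));
        System.out.println(Arrays.toString(extract("&5,3,5,20&")));
        System.out.println(Arrays.toString(extract("/4,2,41,23/")));
    }
}
```

An int[] is used here rather than a char[] because values such as 204 do not fit in a single character.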
Use regular expressions
String a = "asdf4sdf5323ki";
String regex = "([0-9]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(a);
while (matcher.find())
{
String group = matcher.group(1);
if (group.length() > 0)
{
System.out.println(group);
}
}
From your examples, if the format of the string is the same in all cases, then something like the code below would work (exception handling is not shown here):
String[] sArr = s.split(",");
sArr[0] = sArr[0].substring(1); // drop the leading wrapper character
int last = sArr.length - 1; // arrays expose length as a field, not a method
sArr[last] = sArr[last].substring(0, sArr[last].length() - 1); // drop the trailing wrapper character
then convert the String[] to char[] , here is an example converter method
You can use Scanner class with , delimiter
String s = "{1,4,204,3}";
Scanner in = new Scanner(s.substring(1, s.length() - 1)); // Will scan the 1,4,204,3 part
in.useDelimiter(",");
while(in.hasNextInt()){
int x = in.nextInt();
System.out.print(x + " ");
// do something with x
}
The above will print:
1 4 204 3

Parsing String with Regex in Java

I have a String which is formatted as such
[dgdds,dfse][fsefsf,sefs][fsfs,fsef]
How would I use Regex to quickly parse this to return an ArrayList with each value containing one "entry" as such?
ArrayList <String>:
0(String): [dgdds,dfse]
1(String): [fsefsf,sefs]
2(String): [fsfs,fsef]
Really stuck with this, any help would be great.
How about
String myData = "[dgdds,dfse][fsefsf,sefs][fsfs,fsef]";
List<String> list = new ArrayList<>(Arrays.asList(myData
.split("(?<=\\])")));
for (String s : list)
System.out.println(s);
Output:
[dgdds,dfse]
[fsefsf,sefs]
[fsfs,fsef]
This regex uses a lookbehind to split at each position that directly follows a ].
You should try this regex :
Pattern pattern = Pattern.compile("\\[\\w*,\\w*\\]");
Old, easy, awesome way :)
String s = "[dgdds,dfse][fsefsf,sefs][fsfs,fsef]";
String[] token = s.split("]");
for (String string : token) {
System.out.println(string + "]");
}
You can use the simple regex \[.*?\], which means: match a string starting with [, followed by zero or more characters (as few as possible, not greedily; that is what the ? in .*? does), ending with ].
This works, you can test it on Ideone:
List<String> result = new ArrayList<String>();
String input = "[dgdds,dfse][fsefsf,sefs][fsfs,fsef]";
Pattern pattern = Pattern.compile("\\[.*?\\]");
Matcher matcher = pattern.matcher(input);
while (matcher.find())
{
result.add(matcher.group());
}
System.out.println(result);
Output:
[[dgdds,dfse], [fsefsf,sefs], [fsfs,fsef]]
You may need to do it in two passes:
(1) Split out by the brackets if it's just a 1D array (not clear in the question):
String s = "[dgdds,dfse][fsefsf,sefs][fsfs,fsef]";
String[] sArray = s.split("\\[|\\]\\[|\\]");
(2) Split by the commas if you want to also divide, say "dgdds,dfse"
sArray[i].split(",");
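The two passes can be sketched together as follows. Note that this sketch swaps the bracket split from step (1) for a Matcher with a capturing group, because splitting on the brackets leaves an empty first element in the array. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketGroups {
    static List<List<String>> parse(String input) {
        List<List<String>> groups = new ArrayList<>();
        // Pass 1: capture the text between each [ and the nearest following ].
        Matcher m = Pattern.compile("\\[(.*?)\\]").matcher(input);
        while (m.find()) {
            // Pass 2: divide the captured text on commas.
            groups.add(Arrays.asList(m.group(1).split(",")));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("[dgdds,dfse][fsefsf,sefs][fsfs,fsef]"));
    }
}
```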
We can use split(regex) function directly by escaping "]": "\\]" and then use it as the regex for pattern matching:
String str = "[dgdds,dfse][fsefsf,sefs][fsfs,fsef]";
String bal[] = str.split("\\]");
ArrayList<String>finalList = new ArrayList<>();
for(String s:bal)
{
finalList.add(s+"]");
}
System.out.println(finalList);
Splitting on (?:(?<=\])|^)(?=\[) might work if there is nothing between ] and [.

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut: a lookbehind (preceded by a digit, [0-9]) and a lookahead (followed by a letter, [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
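A small runnable sketch of the lookaround split, assuming every token is lowercase letters followed by digits as in the question. The class name is illustrative:

```java
class LookaroundSplit {
    // Cut at every position that has a digit on its left and a lowercase letter on its right.
    // Both assertions are zero-width, so no characters are consumed by the split.
    static String[] tokens(String s) {
        return s.split("(?<=[0-9])(?=[a-z])");
    }

    public static void main(String[] args) {
        for (String token : tokens("xyz213123kop234430099kpf4532")) {
            System.out.println(token);
        }
    }
}
```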
You could split at the boundary between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string). So it would either match the whole string, or nothing at all. But because it doesn't match the string, it is not found when you split the string into tokens. That's why you get just one big token.
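This can be checked directly. The sketch below (with a hypothetical helper) also covers a detail not mentioned above: in Java the surrounding slashes are not regex delimiters but literal characters, so the posted pattern additionally requires a '/' that the input does not contain.

```java
import java.util.regex.Pattern;

class AnchoredPatternCheck {
    // True if the regex matches anywhere in the input.
    static boolean matchesAnywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";
        // The pattern as posted, with literal slashes.
        System.out.println(matchesAnywhere("/^[a-zA-z]+[0-9]+$/", s));
        // Without the slashes, ^ and $ still demand a single letters-then-digits run.
        System.out.println(matchesAnywhere("^[a-zA-Z]+[0-9]+$", s));
        // With no delimiter match, split returns the whole input as one element.
        System.out.println(s.split("/^[a-zA-z]+[0-9]+$/").length);
    }
}
```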
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)"); // A-Z, not A-z: the range A-z also covers [ \ ] ^ _ and `
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)

Splitting a string on the double pipe(||) using String.split()

I'm trying to split a string with a double pipe (||) as the delimiter. The string looks something like this:
String str ="user#email1.com||user#email2.com||user#email3.com";
I'm able to split it using StringTokenizer. The Javadoc says that use of this class is discouraged and suggests String.split as an alternative.
StringTokenizer token = new StringTokenizer(str, "||");
The above code works fine, but I can't figure out why the String.split call below does not give me the expected result.
String[] strArry = str.split("\\||");
Where am I going wrong?
String.split() uses regular expressions. You need to escape the string that you want to use as divider.
Pattern has a method to do this for you, namely Pattern.quote(String s).
String[] split = str.split(Pattern.quote("||"));
You must escape every single | like this str.split("\\|\\|")
Try this:
String[] strArry = str.split("\\|\\|");
You can try this too...
String[] splits = str.split("[\\|]+");
Please note that you have to escape the pipe since it has a special meaning in regular expression and the String.split() method expects a regular expression argument.
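To make the difference concrete: in "\\||" only the first pipe is escaped, so the second one is still the alternation operator with an empty right-hand side, and the regex means "a literal pipe OR the empty string". The sketch below compares the variants; the class and method names are illustrative, not part of any library:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DoublePipeSplit {
    // Both pipes escaped: the delimiter is the two-character sequence "||".
    static String[] splitEscaped(String s) {
        return s.split("\\|\\|");
    }

    // Pattern.quote turns the whole string into a literal pattern.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("||"));
    }

    // Only the first pipe escaped: matches a pipe or the empty string,
    // so the input is also cut between ordinary characters.
    static String[] splitSecondPipeUnescaped(String s) {
        return s.split("\\||");
    }

    public static void main(String[] args) {
        String str = "user#email1.com||user#email2.com||user#email3.com";
        System.out.println(Arrays.toString(splitEscaped(str)));
        System.out.println(Arrays.toString(splitQuoted(str)));
        System.out.println(splitSecondPipeUnescaped(str).length + " pieces from the unescaped variant");
    }
}
```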
There are two different approaches here; use whichever suits you best:
Approach 1:
Using String.split
String str = "a||b||c||d";
String[] parts = str.split("\\|\\|");
This will return you an array of different values after the split:
parts[0] = "a"
parts[1] = "b"
parts[2] = "c"
parts[3] = "d"
Approach 2:
By using PATTERN and MATCHER
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "a||b||c||d";
Pattern p = Pattern.compile("\\|\\|");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println("Found two consecutive pipes at index " + m.start());
}
This prints the index at which each pair of consecutive pipes starts:
Found two consecutive pipes at index 1
Found two consecutive pipes at index 4
Found two consecutive pipes at index 7
Try this
String yourstring="Hello || World";
String[] storiesdetails = yourstring.split("\\|\\|");

java split () method

I've got a string '123' (yes, it's a string in my program). Could anyone explain, when I use this method:
String[] str1Array = str2.split(" ");
Why I got str1Array[0]='123' rather than str1Array[0]=1?
str2 does not contain any spaces, therefore split copies the entire contents of str2 to the first index of str1Array.
You would have to do:
String str2 = "1 2 3";
String[] str1Array = str2.split(" ");
Alternatively, to find every character in str2 you could do:
for (char ch : str2.toCharArray()){
System.out.println(ch);
}
You could also assign it to the array in the loop.
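To split a string into its individual characters, try this:
str2.split("");
Output:
[, 1, 2, 3]
On Java 7 and earlier this returns an empty first value. Since Java 8, a zero-width match at the start of the input no longer produces a leading empty string, so the result there is [1, 2, 3].
To avoid the empty first value on every version, exclude the start of the string:
str2.split("(?!^)");
Output:
[1, 2, 3]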
The regular expression that you pass to split() must match somewhere in the string; the string is split at each place where a match is found. Here you are passing " ", which does not occur in '123', so no split happens.
Because there's no space in your String.
If you want single chars, try char[] characters = str2.toCharArray()
Simple...You are trying to split string by space and in your string "123", there is no space
This is because the split() method splits the string wherever the regular expression given as its parameter matches.
The matched delimiter is removed, and a new String is formed from the text between consecutive matches.
String[] strs = "123".split(" ");
The String "123" does not have the character " " (space) and therefore cannot be split apart. So returned is just a single item in the array - { "123" }.
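A short sketch contrasting the two cases, using hypothetical helper names: splitting on a delimiter that never occurs, and splitting at every position except the start of the string.

```java
import java.util.Arrays;

class SplitIntoChars {
    // The delimiter " " is removed wherever it occurs; if it never occurs,
    // the result is a one-element array holding the original string.
    static String[] bySpace(String s) {
        return s.split(" ");
    }

    // A zero-width pattern that matches everywhere except at the start,
    // which cuts the string between every pair of characters.
    static String[] perCharacter(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bySpace("123")));
        System.out.println(Arrays.toString(bySpace("1 2 3")));
        System.out.println(Arrays.toString(perCharacter("123")));
    }
}
```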
To do the "Split" you must use a delimiter, in this case insert a "," between each number
public static void main(String[] args) {
String[] list = "123456".replaceAll("(\\d)", ",$1").substring(1)
.split(",");
for (String string : list) {
System.out.println(string);
}
}
Try this:
String str = "123";
String[] res = str.split(""); // split returns a String[], not a String
On Java 8 and later, this returns an array with the following elements:
1,2,3
