Best way to split "a{b}c{d}" - java

I'm struggling to find anything other than a brute-force method to split
String str = "a{b}c{d}"
into
String[] arr;
arr[0] = "a"
arr[1] = "{b}"
arr[2] = "c"
arr[3] = "{d}"
I'm wondering if there's a more efficient way than using indexOf and substring.

Based on your current edit, it looks like you want to split at places that are either
directly before {
directly after }
In that case you can use the split method, which supports regex (regular expressions). Regex provides lookaround mechanisms such as
(?=subregex) to see if we are directly before something which can be matched by subregex
(?<=subregex) to see if we are directly after something which can be matched by subregex
Also, { and } are regex metacharacters: they describe a number of repetitions, as in {m,n} (for example, a{1,3} can match a, aa, or aaa, but not aaaa as a whole). To treat them as literals we need to escape them as \{ and \}, written "\\{" and "\\}" inside a Java string literal.
The last thing you need is the OR operator, which is represented as |.
So your code can look like this:
String str = "a{b}c{d}";
String[] arr = str.split("(?=\\{)|(?<=\\})"); // split at places before "{" OR after "}"
for (String s : arr){
System.out.println(s);
}
Output:
a
{b}
c
{d}
Demo: https://ideone.com/FdUbKs
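For a self-contained version of the code above, here is a minimal sketch (the class and the splitBraces helper are hypothetical names, and it assumes braces are not nested). It compiles the lookaround pattern once with Pattern.compile, so repeated calls do not re-parse the regex:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LookaroundSplit {
    // Zero-width split points: directly before a '{' OR directly after a '}'.
    // Compiled once so repeated calls do not re-parse the regex.
    private static final Pattern BRACE_BOUNDARY = Pattern.compile("(?=\\{)|(?<=\\})");

    static String[] splitBraces(String s) {
        return BRACE_BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBraces("a{b}c{d}"))); // [a, {b}, c, {d}]
        System.out.println(Arrays.toString(splitBraces("ab{cd}ef"))); // [ab, {cd}, ef]
    }
}
```

Pattern.split(input) behaves like split with a limit of 0, so the empty string produced by the match at the very end of "a{b}c{d}" is dropped.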

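Just use the String.split(String regex) method. Note that it needs a regex argument (String has no no-argument split()), for example:
arr = str.split("(?=\\{)|(?<=\\})");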

If the parts are separated by spaces, you can use the String.split(String regex) method:
String str = "a {b} c {d}";
String[] arr = str.split(" ");
System.out.println(Arrays.toString(arr)); // [a, {b}, c, {d}], requires import java.util.Arrays

Use String.split(); this works only when the input separates the parts with spaces:
String[] arr = str.split(" ");

I don't know if it's as efficient as the previous regex solutions, but this puts a single space before each { and after each }, then splits the string on " ":
String str = "a{b}c{d}";
String[] split = str.replace("{"," {").replace("}","} ").split(" ");
System.out.println(Arrays.toString(split));
Output:
[a, {b}, c, {d}]
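For comparison with the regex solutions, here is a sketch of a single manual pass over the string (the class and the scanBraces helper are hypothetical, and it assumes braces are balanced and never nested). It still uses substring, but it visits each character once and does not involve the regex engine:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BraceScanner {
    // Hypothetical helper: one pass, cutting before each '{' and after each '}'.
    // Assumes braces are balanced and never nested.
    static String[] scanBraces(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0; // index where the current part begins
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{' && i > start) {
                parts.add(s.substring(start, i)); // plain text before the '{'
                start = i;
            } else if (c == '}') {
                parts.add(s.substring(start, i + 1)); // the whole {...} group
                start = i + 1;
            }
        }
        if (start < s.length()) {
            parts.add(s.substring(start)); // trailing plain text
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scanBraces("a{b}c{d}"))); // [a, {b}, c, {d}]
    }
}
```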

Related

How to split uppercase string? [duplicate]

I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)
I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.
Note that the first letter is not uppercase.
You can use a regex with a zero-width positive lookahead; it finds uppercase letters but doesn't include them in the delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
See the java.util.regex.Pattern Javadoc for more on Java regex syntax.
EDIT: By the way, it doesn't work with thisIsMyÜberString, because \p{Upper} covers only US-ASCII uppercase letters. For non-ASCII uppercase letters you need the Unicode uppercase character class instead of the POSIX one:
String[] r = s.split("(?=\\p{Lu})");
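Or, for ASCII letters only:
String[] camelCaseWords = s.split("(?=[A-Z])");
For anyone wondering what the pattern looks like when the string to split might start with an uppercase character (the lookbehind (?<=.) keeps split from matching at position 0, which on Java 7 and earlier would produce an empty first element):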
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
gives: [This, Is, My, String]
Since String::split takes a regular expression you can use a look-ahead:
String[] x = "thisIsMyString".split("(?=[A-Z])");
Try this; compiling the Pattern once lets you reuse it for several strings:
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...
This regex splits before each uppercase letter except at the start of the string, so it should work for both camelCase and PascalCase (proper case):
(?<=.)(?=(\\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test
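A small runnable check of the lookbehind variant, as a sketch (the class and the splitCamel helper are hypothetical). It uses \p{Lu} rather than \p{Upper} so that Ü also counts as uppercase, and it shows that a run of capitals such as NYC is split into single letters:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CamelSplit {
    // Split before every Unicode uppercase letter (\p{Lu}), but not at index 0:
    // the lookbehind (?<=.) needs at least one character before the split point.
    private static final Pattern UPPER_BOUNDARY = Pattern.compile("(?<=.)(?=\\p{Lu})");

    static String[] splitCamel(String s) {
        return UPPER_BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        // The escape in the third input is U+00DC, LATIN CAPITAL LETTER U WITH DIAERESIS.
        String[] inputs = {"thisIsMyString", "ThisIsMyString", "thisIsMy\u00DCberString", "NYC"};
        for (String in : inputs) {
            System.out.println(in + " -> " + Arrays.toString(splitCamel(in)));
        }
    }
}
```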
A simple Scala suggestion (the regex itself works in Java too) that does not split runs of uppercase letters such as NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}
Test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for( example <- examples) {
println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
It produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
You can modify the regex to split at digits too.
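The same findAllIn idea can be written in Java with Matcher.find(), as a sketch (the class and method names are hypothetical); empty matches are skipped explicitly, just as the Scala version filters them out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleUppercaseSplitter {
    // Same regex as the Scala answer: a run of uppercase letters,
    // then a run of characters that are not uppercase letters.
    private static final Pattern WORD = Pattern.compile("\\p{Lu}*[^\\p{Lu}]*");

    static List<String> splitAtMiddleUppercase(String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = WORD.matcher(token);
        while (m.find()) {
            // Both parts of the regex are optional, so it also matches the empty string
            // (for example at the end of the input); skip those matches.
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] examples = {"catch22", "iPhone", "TotalRecall", "NYC", "JGHSD87"};
        for (String example : examples) {
            System.out.println(example + " -> " + splitAtMiddleUppercase(example));
        }
    }
}
```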
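Without a regex, you can insert a space before each uppercase letter:
String str = "IAmAJavaProgrammer";
StringBuilder expected = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (Character.isUpperCase(str.charAt(i))) {
expected.append(" ");
}
expected.append(str.charAt(i));
}
// trim() removes the leading space added before the first 'I'
System.out.println(expected.toString().trim()); // I Am A Java Programmer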

Splitting a string on the double pipe(||) using String.split()

I'm trying to split a string using a double pipe (||) as the delimiter. The string looks something like this:
String str = "user#email1.com||user#email2.com||user#email3.com";
I'm able to split it using StringTokenizer, but its Javadoc says the use of this class is discouraged and recommends String.split instead.
StringTokenizer token = new StringTokenizer(str, "||");
The above code works fine, but I can't figure out why the String.split call below doesn't give me the expected result:
String[] strArry = str.split("\\||");
Where am I going wrong?
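String.split() uses regular expressions, so you need to escape the string you want to use as the divider. In your pattern "\\||" only the first pipe is escaped; the second | is the alternation operator, so the pattern means "a literal | or the empty string", and the empty alternative matches between characters.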
Pattern has a method to do this for you, namely Pattern.quote(String s).
String[] split = str.split(Pattern.quote("||"));
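A minimal runnable sketch of the Pattern.quote approach (the class and the splitOnDoublePipe helper are hypothetical names). Pattern.quote returns the text wrapped in \Q...\E, so both pipes are matched literally:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteSplit {
    // Pattern.quote("||") returns "\Q||\E", a regex that matches the two pipes literally.
    private static final String DOUBLE_PIPE = Pattern.quote("||");

    static String[] splitOnDoublePipe(String s) {
        return s.split(DOUBLE_PIPE);
    }

    public static void main(String[] args) {
        String str = "user#email1.com||user#email2.com||user#email3.com";
        System.out.println(Arrays.toString(splitOnDoublePipe(str)));
        // [user#email1.com, user#email2.com, user#email3.com]
    }
}
```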
You must escape each |, like this: str.split("\\|\\|")
Try this:
String[] strArry = str.split("\\|\\|");
You can try this too (note that it splits on any run of one or more pipes, so a single | is also treated as a delimiter):
String[] splits = str.split("[\\|]+");
Please note that you have to escape the pipe since it has a special meaning in regular expression and the String.split() method expects a regular expression argument.
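To see the difference between the two-pipe pattern and the [\\|]+ pattern, here is a small sketch (hypothetical class and method names) run on an input that contains a single pipe:

```java
import java.util.Arrays;

public class PipeRunSplit {
    // Splits on each pair of consecutive pipes.
    static String[] splitExact(String s) {
        return s.split("\\|\\|");
    }

    // Splits on any run of one or more pipes, so a single '|' also counts.
    static String[] splitAnyRun(String s) {
        return s.split("[\\|]+");
    }

    public static void main(String[] args) {
        String s = "a|b||c";
        System.out.println(Arrays.toString(splitExact(s)));  // [a|b, c]
        System.out.println(Arrays.toString(splitAnyRun(s))); // [a, b, c]
    }
}
```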
There are two different approaches; follow whichever suits you best:
Approach 1:
Using String.split
String str = "a||b||c||d";
String[] parts = str.split("\\|\\|");
This returns an array of the values between the delimiters:
parts[0] = "a"
parts[1] = "b"
parts[2] = "c"
parts[3] = "d"
Approach 2:
Using Pattern and Matcher
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "a||b||c||d";
Pattern p = Pattern.compile("\\|\\|");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println("Found two consecutive pipes at index " + m.start());
}
This prints the index position of each pair of consecutive pipes:
Found two consecutive pipes at index 1
Found two consecutive pipes at index 4
Found two consecutive pipes at index 7
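Approach 2 as written only locates the delimiters. If you also want the parts from a Matcher, one sketch (hypothetical class and method names) is to take the text between consecutive matches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherSplit {
    private static final Pattern DOUBLE_PIPE = Pattern.compile("\\|\\|");

    // Collects the text between consecutive "||" matches.
    static List<String> splitWithMatcher(String str) {
        List<String> parts = new ArrayList<>();
        Matcher m = DOUBLE_PIPE.matcher(str);
        int start = 0; // where the current part begins
        while (m.find()) {
            parts.add(str.substring(start, m.start())); // text before this "||"
            start = m.end();                            // next part starts after it
        }
        parts.add(str.substring(start)); // text after the last "||"
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitWithMatcher("a||b||c||d")); // [a, b, c, d]
    }
}
```

Unlike String.split with its default limit, this keeps trailing empty strings, so "a||" gives [a, ].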
Try this (the resulting parts keep the spaces around the delimiter, "Hello " and " World"):
String yourstring="Hello || World";
String[] storiesdetails = yourstring.split("\\|\\|");

java split () method

I've got a string "123" (yes, it's a string in my program). Could anyone explain why, when I use this method:
String[] str1Array = str2.split(" ");
I get str1Array[0] = "123" rather than str1Array[0] = "1"?
str2 does not contain any spaces, therefore split copies the entire contents of str2 to the first index of str1Array.
You would have to do:
String str2 = "1 2 3";
String[] str1Array = str2.split(" ");
Alternatively, to iterate over every character in str2 you could do:
for (char ch : str2.toCharArray()){
System.out.println(ch);
}
You could also assign it to the array in the loop.
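Try this to split a string into its individual characters:
str2.split("");
On Java 7 and earlier the output is:
[, 1, 2, 3]
that is, with an empty first value. Since Java 8, a zero-width match at the beginning never produces a leading empty string, so the output is [1, 2, 3].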
To get [1, 2, 3] on any Java version, split at every position except the start of the string:
str2.split("(?!^)");
Output:
[1, 2, 3]
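A runnable comparison of the three ideas in this thread, as a sketch (hypothetical class and method names) that assumes Java 8 or later, where split("") no longer yields an empty first element:

```java
import java.util.Arrays;

public class CharSplit {
    // Empty regex: matches at every position; on Java 8+ no empty first element.
    static String[] byEmptyRegex(String s) {
        return s.split("");
    }

    // Negative lookahead: matches at every position except the start.
    static String[] byLookahead(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        String str2 = "123";
        System.out.println(Arrays.toString(byEmptyRegex(str2)));      // [1, 2, 3]
        System.out.println(Arrays.toString(byLookahead(str2)));       // [1, 2, 3]
        // No regex at all: one char per array slot.
        System.out.println(Arrays.toString(str2.toCharArray()));      // [1, 2, 3]
    }
}
```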
The regular expression you pass to split() must match somewhere in the string; the string is split at each place where it matches. Here you are passing " ", which does not occur in "123", so no split happens.
Because there's no space in your String.
If you want single chars, try char[] characters = str2.toCharArray();
You are trying to split the string by a space, and there is no space in your string "123".
This is because the split() method splits the string at each match of the regular expression given as a parameter. The matched text is removed, and a new String is formed from the text between matches.
String[] strs = "123".split(" ");
The String "123" does not have the character " " (space) and therefore cannot be split apart. So returned is just a single item in the array - { "123" }.
To split it, you need a delimiter. In this case, insert a "," before each digit, drop the leading comma with substring(1), and split on ",":
public static void main(String[] args) {
String[] list = "123456".replaceAll("(\\d)", ",$1").substring(1)
.split(",");
for (String string : list) {
System.out.println(string);
}
}
Try this (on Java 8 and later; earlier versions add an empty first element):
String str = "123";
String[] res = str.split("");
This returns an array containing the following elements:
1, 2, 3

Split()-ing in java

So let's say I have:
String string1 = "123,234,345,456,567*nonImportantData";
String[] stringArray = string1.split(", ");
String[] lastPart = stringArray[stringArray.length-1].split("*");
stringArray[stringArray.length-1] = lastPart[0];
Is there any easier way of making this code work? My objective is to get all the numbers separated, whether stringArray includes nonImportantData or not. Should I maybe use the substring method?
Actually, the String.split(...) method's argument is not a separator string but a regular expression.
You can use
String[] splitStr = string1.split(",|\\*");
where | is the regex OR operator and \\ escapes *, which is a special operator (a quantifier) in regex. Your split("*") would actually throw a java.util.regex.PatternSyntaxException.
Assuming the input always has the format you've provided:
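A runnable sketch of the alternation approach and of splitting off the part before '*' first, assuming the input format from the question (the class and helper names are hypothetical). It also confirms that split("*") throws PatternSyntaxException:

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

public class SplitNumbers {
    // "|" is regex alternation and "\\*" is a literal asterisk,
    // so this splits on every comma and on the '*'.
    static String[] splitAll(String s) {
        return s.split(",|\\*");
    }

    // Keep only what comes before the first '*', then split on commas.
    static String[] numbersOnly(String s) {
        return s.split("\\*")[0].split(",");
    }

    public static void main(String[] args) {
        String string1 = "123,234,345,456,567*nonImportantData";
        System.out.println(Arrays.toString(splitAll(string1)));
        // [123, 234, 345, 456, 567, nonImportantData]
        System.out.println(Arrays.toString(numbersOnly(string1)));
        // [123, 234, 345, 456, 567]
        try {
            string1.split("*"); // a bare '*' has nothing to repeat
        } catch (PatternSyntaxException e) {
            System.out.println("split(\"*\") throws PatternSyntaxException");
        }
    }
}
```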
String input = "123,234,345,456,567*nonImportantData";
String[] numbers = input.split("\\*")[0].split(",");
I'd probably remove the unimportant data before splitting the string.
int idx = string1.indexOf('*');
if (idx >= 0)
string1 = string1.substring(0, idx);
String[] arr = string1.split(","); // the sample input has no space after each comma
If '*' is always present, you can shorten it like this:
String[] arr = string1.substring(0, string1.indexOf('*')).split(",");
This is different from MarianP's approach because the "unimportant data" isn't preserved as an element of the array. This may or may not be helpful, depending on your application.
