Java: Split string when an uppercase letter is found

Java: Split string when an uppercase letter is found - java

I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)
I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.
Please notice the first letter is not uppercase.

You may use a regexp with zero-width positive lookahead - it finds uppercase letters but doesn't include them into delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X into match. So (?=\\p{Upper}) matches an empty sequence followed by a uppercase letter, and split uses it as a delimiter.
See javadoc for more info on Java regexp syntax.
EDIT: By the way, it doesn't work with thisIsMyÜberString. For non-ASCII uppercase letters you need a Unicode uppercase character class instead of POSIX one:
String[] r = s.split("(?=\\p{Lu})");

String[] camelCaseWords = s.split("(?=[A-Z])");

For anyone that wonders how the Pattern is when the String to split might start with an upper case character:
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
gives: [This, Is, My, String]

Since String::split takes a regular expression you can use a look-ahead:
String[] x = "thisIsMyString".split("(?=[A-Z])");

Try this;
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...

This regex will split on Caps, omitting the first. So it should work for camel-case and proper-case.
(?<=.)(?=(\\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test

A simple scala/java suggestion that does not split at entire uppercase strings like NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}
test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for( example <- examples) {
println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
it produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
Modify the regex to cut at digits too.

String str = "IAmAJavaProgrammer";
StringBuilder expected = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if(Character.isUpperCase(str.charAt(i))){
expected.append(" ");
}
expected.append(str.charAt(i));
}
System.out.println(expected);

Related

How to split uppercase string? [duplicate]

I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)
I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.
Please notice the first letter is not uppercase.

You may use a regexp with zero-width positive lookahead - it finds uppercase letters but doesn't include them into delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X into match. So (?=\\p{Upper}) matches an empty sequence followed by a uppercase letter, and split uses it as a delimiter.
See javadoc for more info on Java regexp syntax.
EDIT: By the way, it doesn't work with thisIsMyÜberString. For non-ASCII uppercase letters you need a Unicode uppercase character class instead of POSIX one:
String[] r = s.split("(?=\\p{Lu})");

String[] camelCaseWords = s.split("(?=[A-Z])");

For anyone that wonders how the Pattern is when the String to split might start with an upper case character:
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
gives: [This, Is, My, String]

Since String::split takes a regular expression you can use a look-ahead:
String[] x = "thisIsMyString".split("(?=[A-Z])");

Try this;
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...

This regex will split on Caps, omitting the first. So it should work for camel-case and proper-case.
(?<=.)(?=(\\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test

A simple scala/java suggestion that does not split at entire uppercase strings like NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}
test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for( example <- examples) {
println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
it produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
Modify the regex to cut at digits too.

String str = "IAmAJavaProgrammer";
StringBuilder expected = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if(Character.isUpperCase(str.charAt(i))){
expected.append(" ");
}
expected.append(str.charAt(i));
}
System.out.println(expected);

How to split a string and save the 2 characters that I split with?

I am trying to split a given string using the java split method while the string should be devided by two different characters (+ and -) and I am willing to save the characters inside the array aswell in the same index the string has been saven.
for example :
input : String s = "4x^2+3x-2"
output :
arr[0] = 4x^2
arr[1] = +3x
arr[2] = -2
I know how to get the + or - characters in a different index between the numbers but it is not helping me,
any suggestions please?

You can face this problem in many ways. I´m sure there are clever and fancy ways to split this expression. I will show you the simplest problem-solving process that can help you.
State the problem you need to solve, the input and output
Problem: Split a math expression into subexpressions at + and - signals
Input: 4x^2+3x-2
Output: 4x^2,+3x,-2
Create a pseudo code with some logic you might think works
Given an expression string
Create an empty list of expressions
Create a subExpression string
For each character in the expression
Check if the character is + ou - then
add the subExpression in the list and create a new empty subexpression
otherwise, append the character in the subExpression
In the end, add the left subexpression in the list
Implement the pseudo-code in the programming language of your choice
String expression = "4x^2+3x-2";
List<String> expressions = new ArrayList();
StringBuilder subExpression = new StringBuilder();
for (int i = 0; i < expression.length(); i++) {
char character = expression.charAt(i);
if (character == '-' || character == '+') {
expressions.add(subExpression.toString());
subExpression = new StringBuilder(String.valueOf(character));
} else {
subExpression.append(String.valueOf(character));
}
}
expressions.add(subExpression.toString());
System.out.println(expressions);
Output
[4x^2, +3x, -2]
You will end with one algorithm that works for your problem. You can start to improve it.

Try this code:
String s = "4x^2+3x-2";
s = s.replace("+", "#+");
s = s.replace("-", "#-");
String[] ss = s.split("#");
for (int i = 0; i < ss.length; i++) {
Log.e("XOP",ss[i]);
}
This code replaces + and - with #+ and #- respectively and then splits the string with #. That way the + and - operators are not lost in the result.
If you require # as input character then you can use any other Unicode character instead of #.

Try this one:
String s = "4x^2+3x-2";
String[] arr = s.split("[\\+-]");
for(int i=0;i<arr.length;i++){
System.out.println(arr[i]);
}

Personally I like it better to have positive matches of patterns, especially if the split pattern itself is empty.
So for instance you could use a Pattern and Matcher like this:
Pattern p = Pattern.compile("(^|[+-])([^+-]*)");
Matcher m = p.matcher("4x^2+3x-2");
while (m.find()) {
System.out.printf("%s or %s %s%n", m.group(), m.group(1), m.group(2));
}
This matches the start of the string or a plus or minus: ^|[+-], followed by any amount of characters that are not a plus or minus: [^+-]*.
Do note that the ^ first matches the start of the string, and is then used to negate a character class when used between brackets. Regular expressions are tricky like that.
Bonus: you can also use the two groups (within the parenthesis in the pattern) to match the operators - if any.
All this is presuming that you want to use/test regular expressions; generally things like this require a parser rather than a regular expression.
A one-liner for persons thinking that this is too complex:
var expressions = Pattern.compile("^|[+-][^+-]*")
.matcher("4x^2+3x-2")
.results()
.map(r -> r.group())
.collect(Collectors.toList());

Java regex: Replace all characters with `+` except instances of a given string

I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but its not letting me use the method without putting it in quotations. If I just want to replace the variable string str how can I do that? I ran it with the string manually typed and it worked on the method, but can I just input a variable?
as of right now I believe its looking for the string "str" and not the variable string.
Here is the output its right for so many cases except for two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"

Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if doesn't match, and skip the length of the word if found.
public String plusOut(String str, String word) {
StringBuilder out = new StringBuilder(str);
for (int i = 0; i < out.length(); ) {
if (!str.startsWith(word, i))
out.setCharAt(i++, '+');
else
i += word.length();
}
return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
return str.replaceAll(java.util.regex.Pattern.quote(word), "#").replaceAll("[^#]", "+").replaceAll("#", word);
}
Not a proper solution since it assumes that a certain character or sequence of character doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
Solution 3: Regex with \G
public String plusOut(String str, String word) {
word = java.util.regex.Pattern.quote(word);
return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts from where the previous match leaves off
((?:word)*+) picks out 0 or more instance of word - if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep any instance of the word it finds. Otherwise, the regex will not work correctly when the word appear at the end of the string, as the regex backtracks to match .
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallow backtrack. We will replace this with +
Solution 4: Streaming
public String plusOut(String str, String word) {
return String.join(word,
Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
.map((String s) -> s.replaceAll("(?s:.)", "+"))
.collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace the rest as strings of +. In Java 11, we can use s -> String.repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (List in this case) and joining them for the result

This is a bit trickier than you might initially think because you don't just need to match characters, but the absence of specific phrase - a negated character set is not enough. If the string is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
String replacement = m.group(0).replaceAll(".", "+");
m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain character with a special meaning in a regular expression, you'll have to escape them first before concatenating into the pattern.

You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" either side of each character and puts them back (a blank if there's no "123"):

So instead of coming up with a regular expression that matches the absence of a string. We might as well just match the selected phrase and append + the number of skipped characters.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
while (m.find()) {
for (int i = 0; i < m.start(); i++) sb.append('+');
sb.append(str);
}
int remaining = input.length() - sb.length();
for (int i = 0; i < remaining; i++) {
sb.append('+');
}

Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly it took a lot more that I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
int size = match.length();
CharBuffer cb = CharBuffer.wrap(input.toCharArray());
CharBuffer word = CharBuffer.wrap(match);
int x = 0;
for (; cb.remaining() > 0;) {
if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
cb.put(x, '+');
cb.clear().position(++x);
} else {
cb.clear().position(x = x + size);
}
}
return cb.clear().toString();
}

To make this work you need a beast of a pattern. Let's say you you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
public static void plusOut(String text, String exclude) {
StringBuilder pattern = new StringBuilder("");
for (int i=0; i<exclude.length(); i++) {
Character target = exclude.charAt(i);
String prefix = (i > 0) ? exclude.substring(0, i) : "";
String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
// add the look-behind (?<!X)Y
if (!prefix.isEmpty()) {
pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
.append(Pattern.quote(target.toString())).append("|");
}
// add the look-ahead X(?!YZ)
if (!postfix.isEmpty()) {
pattern.append(Pattern.quote(target.toString()))
.append("(?!").append(Pattern.quote(postfix)).append(")|");
}
}
// add in the other character exclusion
pattern.append("[^" + Pattern.quote(exclude) + "]");
System.out.println(text.replaceAll(pattern.toString(), "+"));
}
public static void main(String [] args) {
plusOut("12xy34", "xy");
plusOut("12xy34", "1");
plusOut("12xy34xyabcxy", "xy");
plusOut("abXYabcXYZ", "ab");
plusOut("abXYabcXYZ", "abc");
plusOut("abXYabcXYZ", "XY");
plusOut("abXYxyzXYZ", "XYZ");
plusOut("--++ab", "++");
plusOut("aaxxxxbb", "xx");
plusOut("123123", "3");
}
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.

The problem in your solution that you put a set of instance string str.replaceAll("[^str]","+") which it will exclude any character from the variable str and that will not solve your problem
EX: when you try str.replaceAll("[^XYZ]","+") it will exclude any combination of character X , character Y and character Z from your replacing method so you will get "++XY+++XYZ".
Actually you should exclude a sequence of characters instead in str.replaceAll.
You can do it by using capture group of characters like (XYZ) then use a negative lookahead to match a string which does not contain characters sequence : ^((?!XYZ).)*$
Check this solution for more info about this problem but you should know that it may be complicated to find regular expression to do that directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for(int i = 0; i < str.length(); i++){
// exclude any instance string of exWord from replacing process in str
if(str.substring(i, str.length()).indexOf(exWord) + i == i){
i = i + exWord.length()-1;
}
else{
str = str.substring(0,i) + "+" + str.substring(i+1);//replace each character with '+' symbol
}
}
Note : str.substring(i, str.length()).indexOf(exWord) + i this if statement will exclude any instance string of exWord from replacing process in str.
Output:
+++++++XYZ
Solution 2:
You can try this Approach using ReplaceAll method and it doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note : This solution will work only if your input string str doesn't contain any * symbol.
Also you should escape any character with a special meaning in a regular expression in phrase instance string exWord like : exWord = "++".

Best way to split "a{b}c{d}"

I'm struggling other than brute force method to split
String str = "a{b}c{d}"
into
String[] arr;
arr[0] = "a"
arr[1] = "{b}"
arr[2] = "c"
arr[3] = "{d}"
Wondering if there's a more efficient way other out there than using indexOf and subString

Based on your current edit it looks like you want to split on place which is either
directly before {
directly after }
In that case you can use split method which supports regex (regular expression). Regex provides lookaround mechanisms like
(?=subregex) to see if we are directly before something which can be matched by subregex
(?<=subregex) to see if we are directly after something which can be matched by subregex
Also { and } are considered regex metacharacters (we can use them like {m,n} to describe amount of repetitions like a{1,3} can match a, aa, aaa but not aaaa or more) so to make it normal literal we need to escape it like \{ and \}
Last thing you need is OR operator which is represented as |.
So your code can look like:
String str = "a{b}c{d}";
String[] arr = str.split("(?=\\{)|(?<=\\})"); // split at places before "{" OR after "}"
for (String s : arr){
System.out.println(s);
}
Output:
a
{b}
c
{d}
Demo: https://ideone.com/FdUbKs

just use the String.split() method (documentation)
arr = str.split()

You may use the String.split(String delimiter) method :
String str = "a {b} c {d}";
String[] arr = str.split(" ");
System.out.println(Arrays.toString(arr)); // [a, {b], c, {d}]

Use String.split()...
String[] arr = str.split(" ");

I don't know if it's as efficient as the previous regex solutions; I'm putting a single white space before { and after } then splitting string by " ":
String str = "a{b}c{d}";
String[] split = str.replace("{"," {").replace("}","} ").split(" ");
System.out.println(Arrays.toString(split));
Desired output:
[a, {b}, c, {d}]

How to extract integers from a complicated string?

I am having a hard time figuring with out. Say I have String like this
String s could equal
s = "{1,4,204,3}"
at another time it could equal
s = "&5,3,5,20&"
or it could equal at another time
s = "/4,2,41,23/"
Is there any way I could just extract the numbers out of this string and make a char array for example?

You can use regex for this sample:
String s = "&5,3,5,20&";
System.out.println(s.replaceAll("[^0-9,]", ""));
result:
5,3,5,20
It will replace all the non word except numbers and commas. If you want to extract all the number you can just call split method -> String [] sArray = s.split(","); and iterate to all the array to extract all the number between commas.

You can use RegEx and extract all the digits from the string.
stringWithOnlyNumbers = str.replaceAll("[^\\d,]+","");
After this you can use split() using deliminator ',' to get the numbers in an array.

I think split() with replace() must help you with that

Use regular expressions
String a = "asdf4sdf5323ki";
String regex = "([0-9]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(a);
while (matcher.find())
{
String group = matcher.group(1);
if (group.length() > 0)
{
System.out.println(group);
}
}

from your cases, if the pattern of string is same in all cases, then something like below would work, check for any exceptions, not mentioned here :
String[] sArr= s.split(",");
sArr[0] = sArr[0].substring(1);
sArr[sArr.length()-1] =sArr[sArr.length()-1].substring(0,sArr[sArr.length()-1].length()-1);
then convert the String[] to char[] , here is an example converter method

You can use Scanner class with , delimiter
String s = "{1,4,204,3}";
Scanner in = new Scanner(s.substring(1, s.length() - 1)); // Will scan the 1,4,204,3 part
in.useDelimiter(",");
while(in.hasNextInt()){
int x = in.nextInt();
System.out.print(x + " ");
// do something with x
}
The above will print:
1 4 204 3

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java: Split string when an uppercase letter is found - java

I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :) I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}. Please notice the first letter is not uppercase.

String[] camelCaseWords = s.split("(?=[A-Z])");

For anyone that wonders how the Pattern is when the String to split might start with an upper case character: String s = "ThisIsMyString"; String[] r = s.split("(?<=.)(?=\\p{Lu})"); System.out.println(Arrays.toString(r)); gives: [This, Is, My, String]

Since String::split takes a regular expression you can use a look-ahead: String[] x = "thisIsMyString".split("(?=[A-Z])");

Try this; static Pattern p = Pattern.compile("(?=\\p{Lu})"); String[] s1 = p.split("thisIsMyFirstString"); String[] s2 = p.split("thisIsMySecondString"); ...

This regex will split on Caps, omitting the first. So it should work for camel-case and proper-case. (?<=.)(?=(\\p{Upper})) TestText = Test, Text thisIsATest = this, Is, A, Test

String str = "IAmAJavaProgrammer"; StringBuilder expected = new StringBuilder(); for (int i = 0; i < str.length(); i++) { if(Character.isUpperCase(str.charAt(i))){ expected.append(" "); } expected.append(str.charAt(i)); } System.out.println(expected);

Related

How to split uppercase string? [duplicate]

How to split a string and save the 2 characters that I split with?

Java regex: Replace all characters with `+` except instances of a given string

Best way to split "a{b}c{d}"

How to extract integers from a complicated string?

Categories

Resources