problem with java split()

I have a string:
strArray= "-------9---------------";
I want to extract the 9 from the string. The string may also look like this:
strArray= "---4-5-5-7-9---------------";
Now I want to extract only the digits from the string. I need the values 9, 4, and so on, ignoring the '-' characters. I tried the following:
strArray= strignId.split("-");
but it fails, since there are multiple '-' characters, and I don't get my output. Which Java function should I use?
My input and output should be as follows:
input="-------9---------------";
output="9";
input="---4-5-5-7-9---------------";
output="45579";
What should I do?

The + is a regex metacharacter of "one-or-more" repetition, so the pattern -+ is "one or more dash". This would allow you to use str.split("-+") instead, but you may get an empty string as first element.
If you just want to remove all -, then you can do str = str.replace("-", ""). This uses replace(CharSequence, CharSequence) method, which performs literal String replacement, i.e. not regex patterns.
If you want a String[] with each digit in its own element, then it's easiest to do in two steps: first remove all non-digits, then use zero-length assertion to split everywhere that's not the beginning of the string (?!^) (to prevent getting an empty string as a first element). If you want a char[], then you can just call String.toCharArray()
Lastly, if the string can be very long, it's better to use a java.util.regex.Matcher in a find() loop looking for a digit \d, or a java.util.Scanner with a delimiter \D*, i.e. a sequence (possibly empty) of non-digits. This will not give you an array, but you can use the loop to populate a List (see Effective Java 2nd Edition, Item 25: Prefer lists to arrays).
References
regular-expressions.info/Repetition with Star and Plus, Character Class, Lookaround
Snippets
Here are some examples to illustrate the above ideas:
System.out.println(java.util.Arrays.toString(
"---4--5-67--8-9---".split("-+")
));
// [, 4, 5, 67, 8, 9]
// note the empty string as first element
System.out.println(
"---4--5-67--8-9---".replace("-", "")
);
// 456789
System.out.println(java.util.Arrays.toString(
"abcdefg".toCharArray()
));
// [a, b, c, d, e, f, g]
The next example first deletes all non-digits \D, then splits everywhere except the beginning of the string (?!^), to get a String[] in which each element contains one digit:
System.out.println(java.util.Arrays.toString(
"#*#^$4#!#5ajs67>?<{8_(9SKJDH"
.replaceAll("\\D", "")
.split("(?!^)")
));
// [4, 5, 6, 7, 8, 9]
This uses a Scanner, with \D* as delimiter, to get each digit as its own token, using it to populate a List<String>:
List<String> digits = new ArrayList<String>();
String text = "(&*!##123ask45{P:L6";
Scanner sc = new Scanner(text).useDelimiter("\\D*");
while (sc.hasNext()) {
digits.add(sc.next());
}
System.out.println(digits);
// [1, 2, 3, 4, 5, 6]
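The Matcher-based alternative mentioned earlier has no snippet of its own, so here is a minimal sketch of a find() loop looking for \d and populating a List<String>. The class and method names (DigitFinder, findDigits) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class DigitFinder {
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Collects every digit in the input as a one-character string.
    static List<String> findDigits(String text) {
        List<String> digits = new ArrayList<>();
        Matcher m = DIGIT.matcher(text);
        while (m.find()) {
            digits.add(m.group());
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(findDigits("---4-5-5-7-9---------------"));
        // [4, 5, 5, 7, 9]
    }
}
```

Unlike split, this never produces empty strings, because only the matches themselves are collected.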
Common problems with split()
Here are some common beginner problems when dealing with String.split:
Lesson #1: split takes a regular expression pattern
This is probably the most common beginner mistake:
System.out.println(java.util.Arrays.toString(
"one|two|three".split("|")
));
// [, o, n, e, |, t, w, o, |, t, h, r, e, e]
// (Java 8 and later omit the leading empty string)
System.out.println(java.util.Arrays.toString(
"not.like.this".split(".")
));
// []
The problem here is that | and . are regex metacharacters, and since they are intended to be matched literally, they need to be escaped by preceding with a backslash, which as a Java string literal is "\\".
System.out.println(java.util.Arrays.toString(
"one|two|three".split("\\|")
));
// [one, two, three]
System.out.println(java.util.Arrays.toString(
"not.like.this".split("\\.")
));
// [not, like, this]
Lesson #2: split discards trailing empty strings by default
Sometimes it's desired to keep trailing empty strings (which are discarded by default split):
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";")
));
// [a, b, , d, , , g]
Note that there are slots for the "missing" values for c, e, f, but not for h and i. To fix this, you can use a negative limit argument to String.split(String regex, int limit).
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";", -1)
));
// [a, b, , d, , , g, , ]
You can also use a positive limit of n to apply the pattern at most n - 1 times (i.e. resulting in no more than n elements in the array).
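As a small sketch of the positive-limit case, the following splits a line at its first '=' only, so the value part may itself contain '='. The class name, method name and sample input are assumptions made for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; names and sample data are illustrative only.
class SplitLimitDemo {
    // A limit of 2 applies the pattern at most once,
    // so the result has at most two elements.
    static String[] splitKeyValue(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeyValue("url=http://example.com/?a=b")));
        // [url, http://example.com/?a=b]

        System.out.println(Arrays.toString("a;b;;d;;;g;;".split(";", 3)));
        // [a, b, ;d;;;g;;]
    }
}
```

The last element holds the unsplit remainder of the input, including any further delimiters.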
Zero-width matching split examples
Here are more examples of splitting on zero-width matching constructs; this can be used to split a string but also keep "delimiters".
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
Split before capital letters (except the first!)
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
)); // prints "[Oh, My, God]"
A variety of examples is provided in related questions below.
References
regular-expressions.info/Lookarounds
Related questions
Can you use zero-width matching regex in String split?
"abc<def>ghi<x><x>" -> "abc", "<def>", "ghi", "<x>", "<x>"
How do I convert CamelCase into human-readable names in Java?
"AnXMLAndXSLT2.0Tool" -> "An XML And XSLT 2.0 Tool"
C# version: is there a elegant way to parse a word and add spaces before capital letters
Java split is eating my characters
Is there a way to split strings with String.split() and include the delimiters?
Regex split string but keep separators

You don't use split!
Split is for getting the things BETWEEN the separators.
Here you want to eliminate the unwanted character, '-'.
The solution is simple:
out=in.replaceAll("-","");

Use something like this to split out the individual values. I'd rather eliminate the unwanted chars first to avoid getting empty strings in the result array.
// Hypothetical signature: the original snippet omitted the method header.
static String[] split(String original, String separator) {
final Vector nodes = new Vector();
int index = original.indexOf(separator);
while (index >= 0) {
nodes.addElement(original.substring(0, index));
original = original.substring(index + separator.length());
index = original.indexOf(separator);
}
nodes.addElement(original);
final String[] result = new String[nodes.size()];
if (nodes.size() > 0) {
for (int loop = 0; loop < nodes.size(); loop++) {
result[loop] = (String) nodes.elementAt(loop);
}
}
return result;
}

Related

Regex to split a string when there's nothing between two occurrences of the delimiter

Suppose I want to split this string a^b^c^d^e^^^f^g^h^^^ , I simply do string.split("\\^") which returns me an array of length 10 i.e [a, b, c, d, e, , , f, g, h] . However I want an array of length 13 which takes occurrences of the delimiter after h into consideration.
I can do something like this to achieve what I want
string = string.replace("^", "^ ");
String[] split = string.split("\\^");
for(String x : split){
System.out.println(x.trim());
}
but this seems like overkill. Is there a regex to do this?

You can do this
String[] split = string.split("\\^", -1);
and it won't drop the trailing empty strings.
If you really want to trim the last separator to get 12 values, you can do
String[] split = string.replaceAll("\\^$", "").split("\\^", -1);
Alternatively, this ensures you always have 12 elements. It either truncates or pads the array as required (padding with null when expanding):
String[] split = Arrays.copyOf(string.split("\\^", -1), 12);
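To make the element counts concrete, here is a minimal sketch that runs the variants above on the string from the question. The class and method names are made up for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class TrailingEmptyDemo {
    // A negative limit keeps trailing empty strings.
    static String[] splitKeepingTrailing(String s) {
        return s.split("\\^", -1);
    }

    public static void main(String[] args) {
        String s = "a^b^c^d^e^^^f^g^h^^^";

        System.out.println(s.split("\\^").length);            // 10
        System.out.println(splitKeepingTrailing(s).length);   // 13

        // Remove one trailing separator first, then keep the rest.
        System.out.println(s.replaceAll("\\^$", "").split("\\^", -1).length); // 12

        // Arrays.copyOf pads with null when the target length is larger.
        System.out.println(Arrays.toString(Arrays.copyOf("a^b".split("\\^", -1), 4)));
        // [a, b, null, null]
    }
}
```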

why split() produces extra , after sets limit -1

I want to split the area code and the rest of the number out of a telephone number, without the brackets, so I did this:
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::->
Length::->0
c[1]::->079
Length::->3
c[2]::->25894029
Length::->8
Expected Output:
c[0]::->079
Length::->3
c[1]::->25894029
Length::->8
So my question is: why does split() produce an extra blank at the start, e.g.
[, 079, 25894029]? Is this its normal behavior, or did I do something wrong here?
How can I get my expected outcome?

First, you have unnecessary escaping inside your character class. Your regex is the same as:
String pattern = "[(?=)]";
Now, you are getting an empty result because ( is the very first character in the string, and a match at position 0 does indeed produce an empty leading string.
To avoid that result use this code:
String str = "(079)25894029";
String[] toks = (Character.isDigit(str.charAt(0))? str:str.substring(1)).split( "[(?=)]" );
for (String tok: toks)
System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>

From the Java8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
You can check whether the first element of the result is an empty string and, if it is, drop that element.
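A minimal sketch of that check, assuming the simpler pattern [()] (only the two parentheses) instead of the pattern from the question. The class and method names are made up for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class LeadingEmptyDemo {
    // Splits on '(' or ')' and drops the empty first element, if there is one.
    static String[] splitPhone(String s) {
        String[] parts = s.split("[()]");
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPhone("(079)25894029")));
        // [079, 25894029]
    }
}
```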

Your regex has problems, as does your approach - you can't solve it using your approach with any regex. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
See live demo of this:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]

Split string containing newline characters Java

Say I have a following string str:
GTM =0.2
Test =100
[DLM]
ABCDEF =5
(Yes, it contains newline characters.) I am trying to split it on the [DLM] delimiter substring like this:
String[] strArr = str.split("[DLM]");
Why is it that when I do:
System.out.print(strArr[0]);
I get this output: GT
and when I do
System.out.print(strArr[1]);
I get =0.2
Does this make any sense at all?

str.split("[DLM]"); should be str.split("\\[DLM\\]");
Why?
[ and ] are special characters and String#split accepts regex.
A solution that I like more is using Pattern#quote:
str.split(Pattern.quote("[DLM]"));
quote returns a literal pattern String for the given String, so any metacharacters in it lose their special meaning.
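Here is a minimal sketch of the Pattern.quote approach applied to the string from the question, assuming the lines are separated by "\n". The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class QuoteDemo {
    // Splits on the delimiter taken literally, not as a regex.
    static String[] splitOnLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Assumption: the input uses "\n" as its line separator.
        String str = "GTM =0.2\nTest =100\n[DLM]\nABCDEF =5";
        String[] parts = splitOnLiteral(str, "[DLM]");

        System.out.println(parts.length);      // 2
        System.out.println(parts[1].trim());   // ABCDEF =5
    }
}
```

The surrounding newlines stay attached to the parts, so trim() may be needed afterwards.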

Yes, you're giving a regex which says "split with either D, or L, or M".
You should escape those boys like this: str.split("\\[DLM\\]");
It's being split at the first M.

Escape the brackets
("\\[DLM\\]")
When you use unescaped brackets in the pattern, the regex engine reads them as a character class: each character inside the brackets is a delimiter. So in your case, M was a delimiter.

Use
String[] strArr = str.split("\\[DLM\\]");
instead of
String[] strArr = str.split("[DLM]");
Otherwise it will split on either D, L, or M.

Writing a regex in java. Using the string.split() method. I want it to stop splitting after the first occurrence of '('

I have strings with this format: "a,b, c,d" and this format: "a(b,c,d)"
I want to split on ',' or ', ' but I want to terminate splitting when I encounter the '(' in the second format.
This is what I had before I started hacking.
String [] stringArray = string.split(", |,");
The array of the first format would contain: 'a', 'b', 'c', 'd'
The array of the second format would contain 'a(b,c,d)'
Example:
String string1 = "ab,cd, de";
String string2 = "ab(de,ef)";
String [] array1 = string1.split(...);
String [] array2 = string2.split(...);
array1 result: ["ab" "cd" "de"]
array2 result: ["ab(de,ef)"]
The number of characters between the commas is not limited. I hope this is clearer.
Thanks.

If you know the parentheses are always properly balanced and they'll never be nested inside other parens, this will work:
String[] result = source.split(",\\s*(?![^()]*\\))");
If the lookahead finds a ) without seeing a ( first, it must be inside a pair of parens. Given this string:
"ab,cd, de,ef(gh,ij), kl,mn"
...result will be:
["ab", "cd", "de", "ef(gh,ij)", "kl", "mn"]
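As a quick check, this sketch runs the same pattern against the two inputs from the question. The class and method names are made up for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class ParenAwareSplit {
    // Splits on commas (plus optional whitespace) that are not
    // followed by a ')' without an intervening '(' or ')'.
    static String[] splitOutsideParens(String source) {
        return source.split(",\\s*(?![^()]*\\))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOutsideParens("ab,cd, de")));
        // [ab, cd, de]
        System.out.println(Arrays.toString(splitOutsideParens("ab(de,ef)")));
        // [ab(de,ef)]
    }
}
```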

I think what you need is a negative lookbehind. According to the docs, Java regexes are (more or less) like Perl regexes, but variable-length lookbehind is not implemented in Perl, so (?<!\(.*),\s* won't work. (It would match a comma followed by any number of spaces, including none, and not preceded by a ( followed by anything; that is, it would match a comma only if no ( came before it.)
I believe the easiest thing is to split on the first ( occurrence (you can avoid regex to do so) and treat the two resulting segments differently: split the first on , and then add the second to the final array (with the ( restored if it was lost).
EDIT
Since "a(b,d)" should give "a(b,d)", you must append whatever comes after the ( (inclusive) to the last string split off from the "first" segment. The concept is otherwise as described above.

Use the indexOf() method.
Initially, check if a string has a "(".
int index = string.indexOf('(');
if (index == -1) // it means there is no '('
{
string.split(...);
}
else
{
String substring = string.substring(0, index); // the part of the string before the '('
// now do the following-
// 1. proceed with split on substring
// array1 = substring.split(...)
// 2. Create a new array, insert the elements of array1 in it,
// followed by the remaining part of the string
// array2 = combine(array1, string.substring(index + 1)); // <-- you will need to write this method
}
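The indexOf() outline above can be sketched as runnable code. This version is one possible reading, not the answerer's exact method: it assumes ",\\s*" as the separator pattern and keeps everything from the first '(' onward attached to the last element, so that "ab(de,ef)" stays whole. The class and method names are made up for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; names and separator pattern are assumptions.
class SplitBeforeParen {
    // Splits on commas only in the part before the first '('.
    // Everything from '(' onward is appended to the last element.
    static String[] split(String s) {
        int index = s.indexOf('(');
        if (index == -1) {
            return s.split(",\\s*");
        }
        // Limit -1 keeps a trailing empty element, so there is always
        // a last element to attach the parenthesized part to.
        String[] head = s.substring(0, index).split(",\\s*", -1);
        head[head.length - 1] += s.substring(index);
        return head;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("ab,cd, de")));  // [ab, cd, de]
        System.out.println(Arrays.toString(split("ab(de,ef)")));  // [ab(de,ef)]
    }
}
```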

Apply regex conditions after certain character

I have a string that I want to parse into an array.
The given string has the form P[AB, AC, AD] (A1, A2, A3).
I want to store it in an array like this so that all the data after the first ( will be filtered by the regex conditions:
P[AB, AC, AD]
A1
A2
A3
This is what I came up with:
String regex = "/^(([,()]+)$";
String[] numbers = stringIn.split(regex);
My problem is that it simply does not work because the regex won't filter out the pieces, everything is stored at numbers[0].

I think this is what you were trying for:
String regex = "[ ,()]+(?=[^\\]]*$)";
String stringIn = "P[AB, AC, AD] (A1, A2, A3)";
String[] numbers = stringIn.split(regex);
for (String n : numbers)
{
System.out.println(n);
}
output:
P[AB, AC, AD]
A1
A2
A3
The [ ,()]+ part tries to match spaces, commas and parentheses wherever they appear, but (?=[^\\]]*$) (a positive lookahead) rejects any match that still has a ] somewhere after it. I'm assuming there's only the one set of square brackets in the string.

Thought of a less elegant, but possibly easier to follow, way of doing this:
// Split the string into two parts.
// parts[0] == "P[AB, AC, AD]"
// parts[1] == "A1, A2, A3"
String[] parts = stringIn.split(" *\\(|\\)");
// Split the second part into its components.
String[] secondParts = parts[1].split(", *");
// Combine the results.
String[] numbers = new String[secondParts.length + 1];
numbers[0] = parts[0];
System.arraycopy(secondParts, 0, numbers, 1, secondParts.length);

You need to escape the parentheses that you want to match literally, you don't need slashes around the pattern body, and you probably want to match instead of splitting.
Pattern regex = Pattern.compile("\\(([^)]+)");
Matcher m = regex.matcher(stringIn);
if (m.find()) {
String[] numbers = m.group(1).split("[,\\s]+");
}
Finally, unlike in JavaScript, $ in Java does not match only at the very end of the input: it matches at the end of the input or just before the last linebreak if there is a linebreak at the end of the input. To match only at the very end, use \\z instead.
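To illustrate the difference between $ and \z, here is a minimal sketch using an arbitrary sample pattern (abc). The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample pattern are illustrative only.
class EndAnchorDemo {
    private static final Pattern WITH_DOLLAR = Pattern.compile("abc$");
    private static final Pattern WITH_Z = Pattern.compile("abc\\z");

    static boolean matchesDollar(String s) {
        return WITH_DOLLAR.matcher(s).find();
    }

    static boolean matchesZ(String s) {
        return WITH_Z.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesDollar("abc\n")); // true: $ matches before the final linebreak
        System.out.println(matchesZ("abc\n"));      // false: \z needs the very end of the input
        System.out.println(matchesZ("abc"));        // true
    }
}
```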
