Splitting String with wildcard - java

I have a String variable which contains the values I need, plus splitters. The problem is that both the length of the string and the type of splitter vary. They arrive through an XML file.
A string will look like this:
1+"."+20+"."+51+"."+2+"name.jpg"
but can also be:
1+"*"+20+"*"+51+"name.jpg"
The fixed facts are:
the digits are IDs which I need to retrieve.
the splitter values will be between "quotes".
the number of IDs is unknown; it can be one, it can be 200.
the value used to split can be anything, but will always be between two quotes.
I was looking for a way to split the string on the "." but with a wildcard in place of the dot (.), which can match one character or several.
Note: The value between the quotes can be anything! It doesn't even have to be a single character.

Try to split by regular expression, i.e. like this:
String regex = "\\+?\"[^\"]*\"\\+?";
System.out.println(Arrays.toString( "1+\".\"+20+\".\"+51+\".\"+2+\"name.jpg\"".split( regex ) ));
System.out.println(Arrays.toString( "1+\"*\"+20+\"*\"+51+\"name.jpg\"".split( regex ) ));
Output:
[1, 20, 51, 2]
[1, 20, 51]
The regex matches any pair of double quotes with non-double-quote characters in between, preceded and followed by optional pluses. You could expand that to allow whitespace as well, e.g. "\\s*\\+?\\s*\"[^\"]*\"\\s*\\+?\\s*". The only thing that's not allowed in a splitter is a double quote.
If you need the name as well, you might try and define the potential splitters in the regex,
e.g. "(\\+?\"[\\.\\*]*\"\\+?)|\\+?\""
Note that in that case you'd have to account for the quotes around the name, i.e. to split 2+"name.jpg" you have to add the alternative \+?" (double quotes preceded by an optional plus).
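As a rough sketch of the two variants above: the class and method names here are made up for illustration, and the input with extra spaces is an assumption about what the XML might contain, not something taken from the question.

```java
import java.util.Arrays;

class SplitterDemo {
    // Whitespace-tolerant variant: any quoted splitter, with optional pluses and spaces around it.
    static final String ANY_SPLITTER = "\\s*\\+?\\s*\"[^\"]*\"\\s*\\+?\\s*";

    // Variant that keeps the name: only "." and "*" count as splitter characters,
    // and a lone quote (with an optional plus before it) is stripped from around the name.
    static final String KNOWN_SPLITTERS = "(\\+?\"[\\.\\*]*\"\\+?)|\\+?\"";

    static String[] ids(String input) {
        return input.split(ANY_SPLITTER);
    }

    static String[] idsAndName(String input) {
        return input.split(KNOWN_SPLITTERS);
    }

    public static void main(String[] args) {
        // prints [1, 20, 51]
        System.out.println(Arrays.toString(ids("1 + \".\" + 20 + \".\" + 51 + \"name.jpg\"")));
        // prints [1, 20, 51, 2, name.jpg]
        System.out.println(Arrays.toString(idsAndName("1+\".\"+20+\".\"+51+\".\"+2+\"name.jpg\"")));
    }
}
```

The second pattern only works while the set of possible splitter characters is known in advance, which the question says is not guaranteed.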
Update:
Additional examples (input -> output)
5+".."+272+"..."+21+"splitter"+2+"name.jpg" --> [5, 272, 21, 2]
444+"()"+0+"abc"+51+"__"+2+"name.jpg" --> [444, 0, 51, 2]
1+"."+20+"."+51+"."+2+"name.jpg" --> [1, 20, 51, 2]
1+"*"+20+"*"+51+"name.jpg" --> [1, 20, 51]

Hmm, can't you try something like this:
String oldStr = 1+"."+20+"."+51+"."+2+"name.jpg";
// Drop the file name; a regex such as oldStr.replaceAll("[a-zA-Z]+\\.\\w+$", "") could also be used
String newStr = oldStr.replace("name.jpg", "");
// split() takes a regex, so the literal dot and star have to be escaped
String[] array = newStr.split("\\.");
if (array.length <= 1) { // no dot was found, so try the other splitter
    array = newStr.split("\\*");
}

So, just that I get it right, possible filenames / string values are:
1.20.51.2name.jpg
1*20*51name.jpg
Right?
So, more generally, you could say: an unknown number of digit groups, separated by non-digit characters?
You could run the regex \d+ against each string.
If it is applied globally, you get a list of all the numbers. So for
1.20.51.2name.jpg
I got
1, 20, 51, 2
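In Java, "executing globally" would roughly correspond to a Matcher.find() loop. A minimal sketch, assuming the string has already been evaluated to the plain form shown above (the class and method names are invented for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRuns {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits, in order of appearance.
    static List<String> find(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(find("1.20.51.2name.jpg")); // [1, 20, 51, 2]
        System.out.println(find("1*20*51name.jpg"));   // [1, 20, 51]
    }
}
```

Note that this also picks up any digits inside the file name or inside a splitter, so it only fits if those never contain digits.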

Using this :
String x = 1+"."+20+"."+51+"."+2+"name.jpg";
String y = 1+"*"+20+"*"+51+"name.jpg";
System.out.println(Arrays.toString(x.split("\\.|\\*")));
System.out.println(Arrays.toString(y.split("\\.|\\*")));
Will give you the following output:
[1, 20, 51, 2name, jpg]
[1, 20, 51name, jpg]

Related

Split comma-separated string but ignore comma followed by a space

public static void main(String[] args) {
String title = "Today, and tomorrow,2,1,2,5,0";
String[] titleSep = title.split(",");
System.out.println(Arrays.toString(titleSep));
System.out.println(titleSep[0]);
System.out.println(titleSep[1]);
}
output:
[Today, and tomorrow, 2, 1, 2, 5, 0]
Today
(space) and tomorrow
I want to treat "Today, and tomorrow" as a phrase representing the first index value of titleSep (do not want to separate at comma it contains).
What is the split method argument that would split the string only at commas NOT followed by a space?
(Java 8)
Use a negative look ahead:
String[] titleSep = title.split(",(?! )");
The regex (?! ) means "the input following the current position is not a space".
FYI a negative look ahead has the form (?!<some regex>) and a positive look ahead has the form (?=<some regex>)
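To make the two forms concrete, here is a small sketch that splits the same title once with a negative and once with a positive look ahead. The class and method names are only illustrative, and the positive variant assumes every field after the phrase starts with a digit:

```java
import java.util.Arrays;

class LookaheadSplit {
    // Negative look ahead: split at a comma unless a space follows it.
    static String[] commaNotBeforeSpace(String s) {
        return s.split(",(?! )");
    }

    // Positive look ahead: split at a comma only if a digit follows it.
    static String[] commaBeforeDigit(String s) {
        return s.split(",(?=\\d)");
    }

    public static void main(String[] args) {
        String title = "Today, and tomorrow,2,1,2,5,0";
        // Both print [Today, and tomorrow, 2, 1, 2, 5, 0] with 6 elements
        System.out.println(Arrays.toString(commaNotBeforeSpace(title)));
        System.out.println(Arrays.toString(commaBeforeDigit(title)));
    }
}
```

In both cases the look ahead consumes nothing, so the character after the comma stays in the next element.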
The argument to the split function is a regex, so we can use a negative lookahead to split by comma-not-followed-by-space:
String title = "Today, and tomorrow,2,1,2,5,0";
String[] titleSep = title.split(",(?! )"); // comma not followed by space
System.out.println(Arrays.toString(titleSep));
System.out.println(titleSep[0]);
System.out.println(titleSep[1]);
The output is:
[Today, and tomorrow, 2, 1, 2, 5, 0]
Today, and tomorrow
2

Java String.split() - splitting a string using //s+ doesn't capture parentheses as separate elements?

Let's say I have a string: "(2 * 32) + 5 ^ 2"
I'd like to turn this into a String array: [(, 2, *, 32, ), +, 5, ^, 2]
i.e. I don't want to capture spaces in the original string and I want to split by whitespace characters.
So I tried string.split("\\s+") but the result looks like [(2, *, 32), +, 5, ^, 2].
Can someone explain why it doesn't split "(2" into ( and 2? Thank you!
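split("\\s+") only cuts where whitespace occurs, and there is none between ( and 2, so they stay in one element. The following works instead, and has the added benefit of not splitting numbers longer than 1 digit, and not requiring spaces between tokens.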
String str = "(2*32) + 5 ^ 2";
String[] tokens = str.replace(" ", "").split("\\b|(?=\\D)");
Output:
[ (, 2, *, 32, ), +, 5, ^, 2 ]
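An alternative to splitting is to match the tokens directly. This is only a sketch and it assumes a token is either a run of digits or a single non-whitespace character; the class name and the pattern are my own choice, not part of the answer above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // A token is a run of digits, or else any single non-whitespace character.
    private static final Pattern TOKEN = Pattern.compile("\\d+|\\S");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(2 * 32) + 5 ^ 2")); // [(, 2, *, 32, ), +, 5, ^, 2]
    }
}
```

Whitespace is simply never matched, so no separate step is needed to remove it.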

Java an unremoveable white space string

I have this string from a MySQL DB; it should be: 2100428169/2010
This is my code:
String str = rs.getString("str");
str = str.replaceAll("\\s+","");
str = str.trim();
char[] strCH = str.toCharArray();
and I get this:
[, 2, 1, 0, 0, 4, 2, 8, 1, 6, 9, /, 2, 0, 1, 0]
Why?
It's a problem because I need to use str1.equals(str), but it doesn't work, because after
Object obj = (Object) str;
obj again has a space at the beginning, just like when I use toCharArray(), so equals() fails.
I finally found the solution:
The problem was character 65279 (U+FEFF), the byte order mark (BOM). It is not an ASCII character, and trim() does not remove it.
This helped: str = str.replace("\uFEFF", "");
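A small sketch to reproduce the effect. The leading U+FEFF is an assumption about what the database returned, and the class and method names are invented for the example:

```java
class BomDemo {
    // Removes every U+FEFF (byte order mark / zero width no-break space) from the string.
    static String stripBom(String s) {
        return s.replace("\uFEFF", "");
    }

    public static void main(String[] args) {
        String raw = "\uFEFF2100428169/2010"; // assumed shape of the value read from the DB
        System.out.println(raw.trim().length());                     // 16: trim() leaves U+FEFF alone
        System.out.println(raw.replaceAll("\\s+", "").length());     // 16: \s does not match it either
        System.out.println(stripBom(raw).length());                  // 15
        System.out.println(stripBom(raw).equals("2100428169/2010")); // true
    }
}
```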
Neither replaceAll("\\s+", "") nor trim() will work for some characters.
In fact there are several characters that cannot be removed this way. I have even seen files containing characters that the Java compiler could not recognize, which leads to very confusing errors.
trim() removes only characters up to U+0020 (space) from both ends of the string, and replaceAll("\\s+", "") removes only what \s matches; neither covers a character such as U+FEFF.
Instead, use the following, which keeps only word characters and the slash:
str = str.replaceAll("[^\\w/]+", "");
You don't need to call trim() now.

replaceAll not working as expected on a String

I have this line of code: temp5.replaceAll("\\W", "");
The contents of temp5 at this point are: [1, 2, 3, 4] but the regex doesn't remove anything. And when I do a toCharArray() I end up with this: [[, 1, ,, , 2, ,, , 3, ,, , 4, ]]
Am I not using the regex correctly? I was under the impression that \W should remove all punctuation and white space.
Note: temp5 is a String
And I just tested using \w, \W, and various others. Same output for all of them
Strings are immutable. replaceAll() returns the string with the changes made, it does not modify temp5. So you might do something like this instead:
temp5 = temp5.replaceAll("\\W", "");
After that, temp5 will be "1234".
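The difference is easy to see side by side. A minimal sketch, using the value from the question (the class and helper names are made up):

```java
class ImmutableDemo {
    // Returns a new string with every non-word character removed.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        String temp5 = "[1, 2, 3, 4]";

        temp5.replaceAll("\\W", "");   // result is thrown away, temp5 is untouched
        System.out.println(temp5);     // [1, 2, 3, 4]

        temp5 = stripNonWord(temp5);   // keep the returned string
        System.out.println(temp5);     // 1234
    }
}
```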
Assign the result back, for example:
String temp5 = "1, 2, 3, 4";
temp5 = temp5.replaceAll("\\W", "");
System.out.println(temp5.toCharArray()); // prints 1234

problem with java split()

I have a string:
strArray= "-------9---------------";
I want to find 9 from the string. The string may be like this:
strArray= "---4-5-5-7-9---------------";
Now I want to extract only the digits from the string. I need the values 9, 4, and so on, ignoring the '-'. I tried the following:
strArray= strignId.split("-");
but it gives an error, since there are multiple '-' and I don't get my output. So which Java function should I use?
My input and output should be as follows:
input="-------9---------------";
output="9";
input="---4-5-5-7-9---------------";
output="45579";
What should I do?
The + is a regex metacharacter of "one-or-more" repetition, so the pattern -+ is "one or more dash". This would allow you to use str.split("-+") instead, but you may get an empty string as first element.
If you just want to remove all -, then you can do str = str.replace("-", ""). This uses replace(CharSequence, CharSequence) method, which performs literal String replacement, i.e. not regex patterns.
If you want a String[] with each digit in its own element, then it's easiest to do in two steps: first remove all non-digits, then use zero-length assertion to split everywhere that's not the beginning of the string (?!^) (to prevent getting an empty string as a first element). If you want a char[], then you can just call String.toCharArray()
Lastly, if the string can be very long, it's better to use a java.util.regex.Matcher in a find() loop looking for a digit \d, or a java.util.Scanner with a delimiter \D*, i.e. a sequence (possibly empty) of non-digits. This will not give you an array, but you can use the loop to populate a List (see Effective Java 2nd Edition, Item 25: Prefer lists to arrays).
References
regular-expressions.info/Repetition with Star and Plus, Character Class, Lookaround
Snippets
Here are some examples to illustrate the above ideas:
System.out.println(java.util.Arrays.toString(
"---4--5-67--8-9---".split("-+")
));
// [, 4, 5, 67, 8, 9]
// note the empty string as first element
System.out.println(
"---4--5-67--8-9---".replace("-", "")
);
// 456789
System.out.println(java.util.Arrays.toString(
"abcdefg".toCharArray()
));
// [a, b, c, d, e, f, g]
The next example first deletes all non-digits \D, then splits everywhere except at the beginning of the string (?!^), to get a String[] in which each element contains one digit:
System.out.println(java.util.Arrays.toString(
"#*#^$4#!#5ajs67>?<{8_(9SKJDH"
.replaceAll("\\D", "")
.split("(?!^)")
));
// [4, 5, 6, 7, 8, 9]
This uses a Scanner, with \D* as delimiter, to get each digit as its own token, using it to populate a List<String>:
List<String> digits = new ArrayList<String>();
String text = "(&*!##123ask45{P:L6";
Scanner sc = new Scanner(text).useDelimiter("\\D*");
while (sc.hasNext()) {
digits.add(sc.next());
}
System.out.println(digits);
// [1, 2, 3, 4, 5, 6]
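The Matcher alternative mentioned earlier, a find() loop looking for \d, could look roughly like this. It is a sketch only; the class and method names are not from the answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitFinder {
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Walks the text once and collects each digit as its own element.
    static List<String> digits(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGIT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(digits("(&*!##123ask45{P:L6"));                    // [1, 2, 3, 4, 5, 6]
        System.out.println(String.join("", digits("---4-5-5-7-9--------"))); // 45579
    }
}
```

No intermediate string is built here, which is the point of preferring it for very long inputs.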
Common problems with split()
Here are some common beginner problems when dealing with String.split:
Lesson #1: split takes a regular expression pattern
This is probably the most common beginner mistake:
System.out.println(java.util.Arrays.toString(
"one|two|three".split("|")
));
// [, o, n, e, |, t, w, o, |, t, h, r, e, e]
// (Java 8 and later omit the leading empty string)
System.out.println(java.util.Arrays.toString(
"not.like.this".split(".")
));
// []
The problem here is that | and . are regex metacharacters, and since they are intended to be matched literally, they need to be escaped by preceding with a backslash, which as a Java string literal is "\\".
System.out.println(java.util.Arrays.toString(
"one|two|three".split("\\|")
));
// [one, two, three]
System.out.println(java.util.Arrays.toString(
"not.like.this".split("\\.")
));
// [not, like, this]
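If the delimiter is not known in advance, escaping by hand gets awkward. One option is java.util.regex.Pattern.quote, which turns any string into a literal pattern. A short sketch (the helper name splitLiteral is hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Splits on the delimiter taken literally, whatever metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("one|two|three", "|"))); // [one, two, three]
        System.out.println(Arrays.toString(splitLiteral("not.like.this", "."))); // [not, like, this]
    }
}
```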
Lesson #2: split discards trailing empty strings by default
Sometimes it's desired to keep trailing empty strings (which are discarded by default split):
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";")
));
// [a, b, , d, , , g]
Note that there are slots for the "missing" values for c, e, f, but not for h and i. To fix this, you can use a negative limit argument to String.split(String regex, int limit).
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";", -1)
));
// [a, b, , d, , , g, , ]
You can also use a positive limit of n to apply the pattern at most n - 1 times (i.e. resulting in no more than n elements in the array).
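For illustration, here is the same input with a positive limit. This is a sketch with an invented helper name, and it assumes the caller passes a limit greater than zero:

```java
import java.util.Arrays;

class LimitDemo {
    // Splits on ';' into at most maxParts pieces; the last piece keeps the unsplit remainder.
    static String[] splitAtMost(String input, int maxParts) {
        return input.split(";", maxParts);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtMost("a;b;;d;;;g;;", 3)));
        // [a, b, ;d;;;g;;]
    }
}
```

With a limit of 3 the pattern is applied twice, so everything after the second ';' ends up untouched in the last element.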
Zero-width matching split examples
Here are more examples of splitting on zero-width matching constructs; this can be used to split a string but also keep "delimiters".
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
Split before capital letters (except the first!)
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
)); // prints "[Oh, My, God]"
A variety of examples is provided in related questions below.
References
regular-expressions.info/Lookarounds
Related questions
Can you use zero-width matching regex in String split?
"abc<def>ghi<x><x>" -> "abc", "<def>", "ghi", "<x>", "<x>"
How do I convert CamelCase into human-readable names in Java?
"AnXMLAndXSLT2.0Tool" -> "An XML And XSLT 2.0 Tool"
C# version: is there a elegant way to parse a word and add spaces before capital letters
Java split is eating my characters
Is there a way to split strings with String.split() and include the delimiters?
Regex split string but keep separators
You don't use split!
Split is to get the things BETWEEN the separator.
For this you want to eliminate the unwanted chars; '-'
The solution is simple
out=in.replaceAll("-","");
Use something like this to split out the individual values. I'd rather eliminate the unwanted chars first, to avoid getting empty strings in the result array.
import java.util.Vector;

// Splits on a literal separator (no regex). The method name and signature are assumed,
// because the original snippet starts in the middle of the method.
static String[] split(String original, final String separator) {
    final Vector<String> nodes = new Vector<String>();
    int index = original.indexOf(separator);
    while (index >= 0) {
        // Everything before the separator is one value
        nodes.addElement(original.substring(0, index));
        original = original.substring(index + separator.length());
        index = original.indexOf(separator);
    }
    nodes.addElement(original); // the remainder after the last separator
    final String[] result = new String[nodes.size()];
    for (int loop = 0; loop < nodes.size(); loop++) {
        result[loop] = nodes.elementAt(loop);
    }
    return result;
}
