I have the following string:
;Spe \,\:\; cial;;;
and I want to split it using the semicolon as the delimiter. However, a semicolon preceded by "\" should not be counted as a delimiter. So I would like to get something like
["", "Spe \,\:\; cial", "", "", ""]
Update:
Java representation looks like:
String s = ";Spe \\,\\:\\; cial;;;";
Use a negative look-behind:
(?<!\\\\);
(Note that conceptually there is only a single \ in this expression, i.e. "a semicolon not preceded by \". The backslash has to be escaped twice, though: the regex engine needs \\ to match one literal backslash, and the Java compiler needs each of those two backslashes doubled again inside a string literal, giving \\\\.)
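As a rough sketch of how the look-behind behaves on the string from the question (the class and method names here are made up for illustration): passing a negative limit to split() is what keeps the trailing empty strings, since the single-argument split() drops them.

```java
import java.util.Arrays;

class EscapedSemicolonSplit {
    // Splits on ';' unless it is immediately preceded by a backslash.
    // The limit of -1 keeps trailing empty strings in the result.
    static String[] splitUnescaped(String s) {
        return s.split("(?<!\\\\);", -1);
    }

    public static void main(String[] args) {
        String s = ";Spe \\,\\:\\; cial;;;";
        // Prints five elements: "", "Spe \,\:\; cial", "", "", ""
        System.out.println(Arrays.toString(splitUnescaped(s)));
    }
}
```

This assumes a backslash always escapes the character after it and is never itself escaped; an input such as `a\\;b` (escaped backslash followed by a real delimiter) would need a more careful pattern.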
You want to extract the parts captured by the following regex : ;?([^;]*)\\\\?;
So search this pattern in your string as long as a match is found :
Pattern pattern = Pattern.compile(";?([^;]*)\\\\?;");
Matcher matcher = pattern.matcher(yourString);
List<String> tokens = new ArrayList<String>();
while(matcher.find()){
tokens.add(matcher.group(1));
}
String[] yourArray = tokens.toArray(new String[0]); // if you prefer an array
// rather than a list
Related
I would like to split a String, but I am despairing over the regex pattern.
I need to split a string like this: Hi I want "to split" this (String) into a String array like this:
String[] array = {"Hi", "I", "want", "\"", "to", "split", "\"", "this", "(", "String", ")"};
This is what I have tried, but it deletes the delimiter.
public static void main(String[] args) {
String string = "Hi \"why should\" (this work)";
String[] array;
array = string.split("\\s"
+ "|\\s(?=\")"
+ "|\\w(?=\")"
+ "|\"(?=\\w)"
+ "|\\s(?=\\()"
+ "|\\w(?=\\))"
+ "|\\((?=\\w)");
for (String str : array) {
System.out.println(str);
}
}
Result:
Hi
why
shoul
"
this
wor
)
You can match the tokens with the regex \w+|[^\w\s], assuming that you want the punctuation characters to end up in different tokens:
String input = "Hi I want \"to split\" this (String).";
Matcher matcher = Pattern.compile("\\w+|[^\\w\\s]").matcher(input);
List<String> out = new ArrayList<>();
while (matcher.find()) {
out.add(matcher.group());
}
The output ArrayList contains:
[Hi, I, want, ", to, split, ", this, (, String, ), .]
You might want to use the (?U) flag to make \w and \s follow the Unicode definitions of word and whitespace characters. By default, \w and \s recognize only word and whitespace characters in the ASCII range.
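To make the effect of the flag concrete, here is a small sketch that runs the same tokenizing regex with and without the embedded (?U) flag. The class name, method name, and sample input are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeTokenizer {
    // Tokenizes into runs of word characters and single punctuation characters.
    // When unicode is true, the embedded (?U) flag switches \w and \s to their
    // Unicode definitions.
    static List<String> tokenize(String input, boolean unicode) {
        String regex = (unicode ? "(?U)" : "") + "\\w+|[^\\w\\s]";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "h\u00e9llo w\u00f6rld!"; // "héllo wörld!"
        System.out.println(tokenize(input, false)); // accented letters split off
        System.out.println(tokenize(input, true));  // whole words kept together
    }
}
```

Without the flag the accented letters are not word characters, so each one becomes its own "punctuation" token; with the flag the two words stay intact.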
For the sake of completeness, here is the solution in split(), which works on Java 8 and above. There will be an extra empty string at the beginning in Java 7.
String tokens[] = input.split("\\s+|(?<![\\w\\s])(?=\\w)|(?<=\\w)(?![\\w\\s])|(?<=[^\\w\\s])(?=[^\\w\\s])");
The regex is rather complex, since the empty string splits between punctuation character and word character need to avoid the cases already split by \s+.
Since the regex in the split solution is quite a mess, please use the match solution instead.
What language are you trying to write this in?
You could write regex groups something like: (.+)(\s)
This would match any quantity of characters followed by a space
I've been trying to split Strings using RegEx with no success. The idea is to split a given music file metadata from its file name in a way so that:
"01. Kodaline - Autopilot.mp3"
.. would result in..
metadata[0] = "01"
metadata[1] = "Kodaline"
metadata[2] = "Autopilot"
This is the RegEx I've been trying to use in its original form:
^(.*)\.(.*)\-(.*)\.(mp3|flac)
From what I've read, I need to format the RegEx for String.split(String regex) to work. So here's my formatted RegEx:
^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)
..and this is what my code looks like:
String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
String[] metadata = filename.split(regex);
But I'm not receiving the result I expected. Can you help me on this?
Your regex is fine for matching the input string. Your problem is that you used split(), which expects a regex with a totally different purpose. For split(), the regex you give it matches the delimiters (separators) that separate parts of the input; they don't match the entire input. Thus, in a different situation (not your situation), you could say
String[] parts = s.split("[\\- ]");
The regex matches one character that is either a dash or a space. So this will look for dashes and spaces in your string and return the parts separated by the dashes and spaces.
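A quick sketch of what "the regex matches the delimiters" means in practice, using a made-up input. Note that every single delimiter character counts, so a space, a dash, and another space in a row produce empty strings in between; adding + to the character class treats the whole run as one delimiter.

```java
import java.util.Arrays;

class DelimiterSplit {
    // Each dash or space is a separate delimiter.
    static String[] splitEach(String s) {
        return s.split("[\\- ]");
    }

    // A run of dashes and spaces is treated as a single delimiter.
    static String[] splitRuns(String s) {
        return s.split("[\\- ]+");
    }

    public static void main(String[] args) {
        String s = "Kodaline - Autopilot";
        System.out.println(Arrays.toString(splitEach(s))); // [Kodaline, , , Autopilot]
        System.out.println(Arrays.toString(splitRuns(s))); // [Kodaline, Autopilot]
    }
}
```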
To use your regex to match the input string, you need something like this:
String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(filename);
String[] metadata = new String[4];
if (matcher.find()) {
metadata[0] = matcher.group(1); // in real life I'd use a loop
metadata[1] = matcher.group(2);
metadata[2] = matcher.group(3);
metadata[3] = matcher.group(4);
// the rest of your code
}
which sets metadata to the strings "01", " Kodaline ", " Autopilot", "mp3", which is close to what you want except maybe for extra spaces (which you can look for in your regex). Unfortunately, I don't think there's a built-in Matcher function that returns all the groups in one array.
(By the way, in your regex, you don't need the backslashes in front of -, but they're harmless, so I left them in. The - doesn't normally have a special meaning, so it doesn't need to be escaped. Inside square brackets, however, a hyphen is special, so you should use backslashes if you want to match a set of characters and a hyphen is one of those characters. That's why I used backslashes in my split example above.)
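If the extra spaces are a problem, one possible variation is to let the regex consume them. The sketch below is only an illustration under some assumptions that are not in the question: the track number is all digits, and the first hyphen separates artist from title. The class and method names are invented.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrackNameParser {
    // \s* around the separators swallows the surrounding whitespace, and the
    // lazy (.*?) stops at the first hyphen.
    private static final Pattern FILE_NAME =
            Pattern.compile("^(\\d+)\\.\\s*(.*?)\\s*-\\s*(.*)\\.(mp3|flac)$");

    static String[] parse(String filename) {
        Matcher matcher = FILE_NAME.matcher(filename);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + filename);
        }
        // Copy all capture groups into an array with a loop.
        String[] metadata = new String[matcher.groupCount()];
        for (int i = 0; i < metadata.length; i++) {
            metadata[i] = matcher.group(i + 1);
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Prints [01, Kodaline, Autopilot, mp3]
        System.out.println(Arrays.toString(parse("01. Kodaline - Autopilot.mp3")));
    }
}
```

An artist name that itself contains a hyphen would be cut at that hyphen, so this is not a general solution.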
This worked for me:
str.split("\\.\\s+|\\s+-\\s+|\\.(mp3|flac)");
Try something like:
String filename = "01. Kodaline - Autopilot.mp3";
String fileWithoutExtension = filename.substring(0, filename.lastIndexOf('.'));
System.out.println(Arrays.toString(fileWithoutExtension.replaceAll("[^\\w\\s]", "").split("\\s+")));
Output:
[01, Kodaline, Autopilot]
While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut: a lookbehind (preceded by a digit, [0-9]) and a lookahead (followed by a letter, [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
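A small sketch to check the claim that a zero-width split removes nothing: joining the pieces back together should reproduce the input exactly. The class and method names are invented, and the lookahead is widened to [a-zA-Z] on the assumption that upper-case letters may also start a token.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Cuts at every position that has a digit on the left and a letter on the
    // right. The match itself is empty, so no characters are consumed.
    static String[] splitAtDigitLetterBoundary(String s) {
        return s.split("(?<=[0-9])(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";
        String[] tokens = splitAtDigitLetterBoundary(s);
        System.out.println(Arrays.toString(tokens)); // [xyz213123, kop234430099, kpf4532]
        System.out.println(String.join("", tokens).equals(s)); // true
    }
}
```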
You could split at each position between a digit and a non-digit:
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<=\\d)(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string), so it could only ever match the whole string or nothing at all. (The surrounding slashes make things worse: Java has no regex literal syntax, so / is matched as a literal character.) Because the expression never matches, split finds no delimiter and returns the whole string as one big token.
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)
I have a problem building a regex for the following string:
[1,2,3,4]
I found a workaround, but I think it's ugly:
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone please help me build a single regex that I can use in the split function?
Thanks for the help.
Edit:
I want to get from the string "[1,2,3,4]" to an array with 4 entries. The entries are the 4 numbers in the string, so I need to eliminate "[", "]" and ",". The "," isn't the problem.
The first and last numbers contain [ or ], which is why I needed the fix with replaceAll. But I think that if I can pass split a regex for ",", I can also pass one that eliminates "[" and "]" too. I just can't figure out what this regex should look like.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
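One way around the extra leading element, sketched here with Java 8 streams, is to split on the same three characters and then drop any empty pieces. The class and method names are invented, and trimming each piece is an added assumption so that input like "[1, 2, 3, 4]" also works.

```java
import java.util.Arrays;

class BracketListSplit {
    // Splits on '[', ']' or ',' and discards the empty strings that the
    // leading bracket would otherwise leave behind.
    static String[] ids(String s) {
        return Arrays.stream(s.split("[\\[\\],]"))
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ids("[1,2,3,4]"))); // [1, 2, 3, 4]
    }
}
```

Here the three alternatives are written as a character class, [\[\],], which is equivalent to the alternation above for single characters.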
Maybe substring [ and ] from beginning and end, then split the rest by ,
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
static String[] nums(String in) {
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the number items, then a code fragment like the following could work:
try {
Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(stringIds);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
The String will look like this:
String temp = "IF (COND_ITION) (ACT_ION)";
// Only has one whitespace in either side of the parentheses
or
String temp = " IF (COND_ITION) (ACT_ION) ";
// Have more irrelevant whitespace in the String
// But no whitespace in condition or action
I hope to get a new String array which contains three elements, ignoring the parentheses:
String[] tempArray = new String[3];
tempArray[0] = "IF";
tempArray[1] = "COND_ITION";
tempArray[2] = "ACT_ION";
I tried to use String.split(regex) method but I don't know how to implement the regex.
If your input string will always be in the format you described, it is better to parse it based on the whole pattern instead of just the delimiter, as this code does:
Pattern pattern = Pattern.compile("(\\S+)\\s+\\((.*?)\\)\\s+\\((.*?)\\)");
Matcher matcher = pattern.matcher(inputString);
String[] tempArray = new String[3];
if (matcher.find()) {
tempArray[0] = matcher.group(1);
tempArray[1] = matcher.group(2);
tempArray[2] = matcher.group(3);
}
Pattern breakdown:
(\\S+) IF
\\s+ white space
\\((.*?)\\) (COND_ITION)
\\s+ white space
\\((.*?)\\) (ACT_ION)
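For a stricter take on the whole-pattern idea, here is a self-contained sketch that uses matches() so the entire input has to fit the expected shape, tolerating any amount of whitespace around the three parts. The class name, method name, and the exact pattern are illustrative choices, assuming the keyword consists of word characters and that neither parenthesized part contains nested parentheses.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleParser {
    // keyword, then two parenthesized parts, with optional whitespace anywhere
    // between them and at both ends.
    private static final Pattern RULE =
            Pattern.compile("\\s*(\\w+)\\s*\\(([^()]*)\\)\\s*\\(([^()]*)\\)\\s*");

    static String[] parse(String input) {
        Matcher matcher = RULE.matcher(input);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("IF (COND_ITION) (ACT_ION)")));
        System.out.println(Arrays.toString(parse("   IF   (COND_ITION)   (ACT_ION)  ")));
    }
}
```

Both calls print [IF, COND_ITION, ACT_ION]; malformed input is rejected instead of being silently half-parsed.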
You can use StringTokenizer to split into strings delimited by whitespace. From Java documentation:
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
this
is
a
test
Then write a loop to process the strings to replace the parentheses.
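As a possible shortcut, StringTokenizer also has a two-argument constructor that takes the set of delimiter characters, so the parentheses can be treated as delimiters alongside whitespace and no second pass is needed. This is only a sketch; the class and method names are invented, and it relies on the question's guarantee that the condition and action contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Treats space, tab, '(' and ')' as delimiter characters; runs of
    // delimiters never produce empty tokens.
    static List<String> tokens(String input) {
        StringTokenizer st = new StringTokenizer(input, " \t()");
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  IF   (COND_ITION)   (ACT_ION)  ")); // [IF, COND_ITION, ACT_ION]
    }
}
```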
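I think you want a regular expression like "\\) *\\(?| *\\(", applied to the string after trim(), assuming any whitespace inside the parentheses is not to be removed. Every alternative contains a parenthesis, so the pattern can never match the empty string (a pattern in which every part is optional would, and split() would then cut between every character). Note that this doesn't validate that the parentheses match properly. Hope this helps.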