Splitting String using RegEx in Android - java

I've been trying to split Strings using RegEx with no success. The idea is to extract a music file's metadata from its file name so that:
"01. Kodaline - Autopilot.mp3"
.. would result in..
metadata[0] = "01"
metadata[1] = "Kodaline"
metadata[2] = "Autopilot"
This is the RegEx I've been trying to use in its original form:
^(.*)\.(.*)\-(.*)\.(mp3|flac)
From what I've read, I need to format the RegEx for String.split(String regex) to work. So here's my formatted RegEx:
^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)
..and this is what my code looks like:
String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
String[] metadata = filename.split(regex);
But I'm not receiving the result I expected. Can you help me on this?

Your regex is fine for matching the input string. Your problem is that you used split(), which expects a regex with a totally different purpose. For split(), the regex you give it matches the delimiters (separators) between the parts of the input; it doesn't match the entire input. For example, in a different situation (not yours), you could write
String[] parts = s.split("[\\- ]");
The regex matches one character that is either a dash or a space. So this will look for dashes and spaces in your string and return the parts separated by the dashes and spaces.
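As a quick illustration of those delimiter semantics, the sketch below (the class name SplitDelimiters is made up for this example) applies "[\\- ]" to the file name from the question. Because split() returns whatever lies between two delimiter matches, the adjacent space, dash and space in " - " leave two empty strings behind, and the dots survive because they are not delimiters. Adding + collapses each run of delimiters into one.

```java
final class SplitDelimiters {
    private SplitDelimiters() {}

    // "[\\- ]" is the regex [\- ]: exactly one dash or one space per delimiter match.
    static String[] single(String s) {
        return s.split("[\\- ]");
    }

    // Adding "+" makes a run of dashes/spaces count as a single delimiter.
    static String[] runs(String s) {
        return s.split("[\\- ]+");
    }
}
```

Either way the dots stay attached to "01." and "Autopilot.mp3", which is one more reason split() is an awkward fit for this file name.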
To use your regex to match the input string, you need something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(filename);
String[] metadata = new String[4];
if (matcher.find()) {
metadata[0] = matcher.group(1); // in real life I'd use a loop
metadata[1] = matcher.group(2);
metadata[2] = matcher.group(3);
metadata[3] = matcher.group(4);
// the rest of your code
}
which sets metadata to the strings "01", " Kodaline ", " Autopilot", "mp3", which is close to what you want except maybe for extra spaces (which you can look for in your regex). Unfortunately, I don't think there's a built-in Matcher function that returns all the groups in one array.
(By the way, in your regex, you don't need the backslashes in front of -, but they're harmless, so I left them in. The - doesn't normally have a special meaning, so it doesn't need to be escaped. Inside square brackets, however, a hyphen is special, so you should use backslashes if you want to match a set of characters and a hyphen is one of those characters. That's why I used backslashes in my split example above.)
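Since Matcher has no method that hands back every group as an array, a loop over groupCount() is one way to fill that gap. The sketch below does that and also adjusts the pattern so the surrounding spaces are not captured; the class name FileNameParser and the assumed "<track>. <artist> - <title>.<ext>" layout are illustrative, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameParser {
    // Assumed layout: "<track>. <artist> - <title>.<mp3|flac>".
    // \\s* around the separators keeps surrounding spaces out of the groups,
    // and the lazy (.+?) groups stop as early as the rest of the pattern allows.
    private static final Pattern TRACK =
            Pattern.compile("^(\\d+)\\.\\s*(.+?)\\s*-\\s*(.+?)\\.(mp3|flac)$");

    private FileNameParser() {}

    /** Returns groups 1..groupCount() as an array, or null if the name does not fit the layout. */
    static String[] parse(String filename) {
        Matcher m = TRACK.matcher(filename);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1); // group 0 is the whole match, so start at 1
        }
        return groups;
    }
}
```

With this pattern, parse("01. Kodaline - Autopilot.mp3") should return 01, Kodaline, Autopilot and mp3. Because the artist group is lazy, a title that itself contains " - " ends up entirely in the title group.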

this worked for me
str.split("\\.\\s+|\\s+-\\s+|\\.(mp3|flac)");
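Wrapped up so it can be run on its own (DelimiterSplit is a hypothetical name), the one-liner above splits on three kinds of delimiter. One detail worth knowing: the final .mp3 match leaves an empty string at the end, and split(regex) silently drops trailing empty strings; passing a negative limit keeps it.

```java
final class DelimiterSplit {
    // Delimiters: ". " after the track number, " - " between artist and
    // title, and the ".mp3"/".flac" extension at the end.
    static final String DELIMS = "\\.\\s+|\\s+-\\s+|\\.(mp3|flac)";

    private DelimiterSplit() {}

    static String[] split(String filename) {
        // split(regex) behaves like split(regex, 0): trailing empty strings
        // (here the one after the extension) are removed.
        return filename.split(DELIMS);
    }

    static String[] splitKeepingEmpty(String filename) {
        // A negative limit keeps the trailing empty string.
        return filename.split(DELIMS, -1);
    }
}
```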

Try something like:
import java.util.Arrays;

String filename = "01. Kodaline - Autopilot.mp3";
String fileWithoutExtension = filename.substring(0, filename.lastIndexOf('.'));
System.out.println(Arrays.toString(fileWithoutExtension.replaceAll("[^\\w\\s]", "").split("\\s+")));
Output:
[01, Kodaline, Autopilot]
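A runnable version of that approach (StripAndSplit is an illustrative name), assuming the file name always has an extension. It splits on whitespace, so it only gives three elements when the artist and title are single words; an artist such as "The Band" comes back as two separate entries.

```java
final class StripAndSplit {
    private StripAndSplit() {}

    // Assumes the file name has an extension; lastIndexOf returns -1 otherwise
    // and substring would throw.
    static String[] words(String filename) {
        String base = filename.substring(0, filename.lastIndexOf('.'));
        // Remove everything that is neither a word character nor whitespace
        // ('.' and '-' here), then split on runs of whitespace.
        return base.replaceAll("[^\\w\\s]", "").split("\\s+");
    }
}
```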

Related

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
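Applied to the inputs from the question, the pattern might be used like this. The sketch wraps the operator in a group as well, so that groups 1 to 3 line up with the question's original (\\w+?)(:|<|>)(\\w+?) layout; FieldParser and the null-on-no-match convention are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldParser {
    // Group 1: field name (word chars), group 2: operator (: < or >),
    // group 3: value, i.e. everything up to the next comma.
    private static final Pattern FIELD = Pattern.compile("(\\w+)([:<>])([^,]+)");

    private FieldParser() {}

    /** Returns {name, operator, value}, or null when no field is found. */
    static String[] parse(String search) {
        Matcher m = FIELD.matcher(search);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```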
You can use a regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But remember that this regex will only work for your data format. To build a general-purpose regex for email addresses, consult the RFC documents and articles that describe the email address format.
Hope it helps.
The character class \w matches [A-Za-z0-9_], so it stops at characters such as ., # and /. Change the regex to (\\w+?)(:|<|>)(.*), to match any characters between the operator and the comma.
Or list all the characters you expect, i.e. (\\w+?)(:|<|>)([#.\\w\\/]*), .

Splitting text by punctuation and special cases like :) or space

I have a following string:
Hello world!!!
or
Hello world:)
Now I want to split this string into an array of strings containing Hello, world, !, !, ! or Hello, world, :).
The problem is that if there were a space between all the parts I could use split(" "),
but here the !!! or :) is attached to the word.
I also tried this code:
String Text = "But I know. For example, the word \"can\'t\" should";
String[] Res = Text.split("[\\p{Punct}\\s]+");
System.out.println(Res.length);
for (String s:Res){
System.out.println(s);
}
which I found here, but it isn't really helpful in my case:
Splitting strings through regular expressions by punctuation and whitespace etc in java
Can anyone help?
It seems to me that you don't want to split but rather to capture certain groups. split() discards the parts you split on (if you split on spaces, there are no spaces in the output array), so if you split on "!" you won't get the exclamation marks in your output. Something like this should capture the pieces you are interested in:
(\w+)|(!)|(:\))
Note that you don't use String.split() with it; instead, run the regex against your string with whatever engine/language you are using and collect every match. In Java it would be something like:
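import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Hello world!!!:)";
// In a Java string literal each regex backslash must be doubled.
Pattern p = Pattern.compile("(\\w+)|(!)|(:\\))");
Matcher m = p.matcher(input);
List<String> matches = new ArrayList<String>();
while (m.find()) {
    matches.add(m.group());
}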
The matches list will then contain:
["Hello", "world", "!", "!", "!", ":)"]
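If you need more than ! and :), one possible generalization (an assumption, not something the question asked for verbatim) is to try word runs first, then the :) emoticon, then any single punctuation character via \p{Punct}. Whitespace is never matched, so find() simply skips it. The Tokenizer class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Tokenizer {
    // Alternatives are tried left to right at each position, so ":)" is
    // taken as one token before the single-punctuation fallback applies.
    private static final Pattern TOKEN = Pattern.compile("\\w+|:\\)|\\p{Punct}");

    private Tokenizer() {}

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```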

Regex to split on delimiter

I have following string:
;Spe \,\:\; cial;;;
and I want to split it with semicolon as the delimiter; however, a semicolon preceded by "\" should not count as a delimiter. So I would like to get something like
["", "Spe \,\:\; cial", "", "", ""]
Update:
Java representation looks like:
String s = ";Spe \\,\\:\\; cial;;;";
Use a negative look-behind:
(?<!\\\\);
(The lookbehind is meant to match a single literal backslash. The regex engine needs that backslash written as \\, and the Java compiler needs each of those two backslashes escaped again, which is why the string literal contains four.)
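Used with split(), that lookbehind still loses the trailing empty strings the question asks for, because split(regex) removes them; passing a negative limit keeps them. A minimal sketch (EscapedSplit is a hypothetical name), assuming the input never contains an escaped backslash: a sequence like \\; (an escaped backslash followed by a real delimiter) would be misread as an escaped semicolon.

```java
final class EscapedSplit {
    // Regex (?<!\\); : a ';' not immediately preceded by a backslash.
    // Each regex backslash is doubled again in the Java string literal.
    private static final String DELIM = "(?<!\\\\);";

    private EscapedSplit() {}

    static String[] split(String s) {
        // Limit -1 keeps trailing empty strings, which split(DELIM) alone would drop.
        return s.split(DELIM, -1);
    }
}
```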
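You want to extract the parts captured by the following regex: ((?:\\\\.|[^;\\\\])*)(;|$). Group 1 is one field, in which a backslash and the character after it are consumed as a pair (so \; stays inside the field); group 2 is the ; that ends the field, or an empty string at the end of the input.
So search for this pattern in your string as long as a match is found:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("((?:\\\\.|[^;\\\\])*)(;|$)");
Matcher matcher = pattern.matcher(yourString);
List<String> tokens = new ArrayList<String>();
while (matcher.find()) {
    tokens.add(matcher.group(1));
    if (matcher.group(2).isEmpty()) {
        break; // matched the end of the input, so this was the last field
    }
}
String[] yourArray = tokens.toArray(new String[0]); // if you prefer an array rather than a list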

Java regex validating special chars

This seems like a well-known question, but I am really facing a problem with it.
Here is what I have and what I've done so far.
I have to validate an input string; these characters are not allowed:
&%$###!~
So I coded it like this:
String REGEX = "^[&%$###!~]";
String username= "jhgjhgjh.#";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(username);
if (matcher.matches()) {
System.out.println("matched");
}
Change your first line of code to this:
String REGEX = "[^&%$##!~]*";
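And it should work fine. ^ outside a character class denotes the start of the line; ^ inside a character class [] negates the characters inside it. Your original pattern never matched a normal username because matches() must consume the whole input, and ^[&%$###!~] matches just one forbidden character. If you don't want to match empty usernames, use this regex: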
String REGEX = "[^&%$##!~]+";
I think you want this:
[^&%$###!~]*
To match a valid input:
String REGEX = "[^&%$##!~]*";
To match an invalid input:
String REGEX = ".*[&%$##!~]+.*";
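Putting the two approaches side by side, here is a small sketch (UsernameValidator is an illustrative name; the duplicated # in the original character list is dropped since it has no effect). matches() must cover the whole input, so the allowed-characters pattern needs a quantifier, while find() can look for a single forbidden character anywhere without the .* padding.

```java
import java.util.regex.Pattern;

final class UsernameValidator {
    // The whole input must consist of allowed characters; "+" rejects "".
    private static final Pattern ALLOWED = Pattern.compile("[^&%$#!~]+");
    // A single forbidden character anywhere is enough to reject the input.
    private static final Pattern FORBIDDEN = Pattern.compile("[&%$#!~]");

    private UsernameValidator() {}

    static boolean isValid(String username) {
        return ALLOWED.matcher(username).matches();
    }

    static boolean containsForbidden(String username) {
        return FORBIDDEN.matcher(username).find();
    }
}
```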

Escape comma when using String.split

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
It works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to give an example.
How can I escape the comma, so it doesn't match intermediate commas?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking of something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to create the split to avoid matching the comma.
I've tried:
String [] parts = input.split("[^\,],");
But it isn't working.
You can solve it using a negative look behind.
String[] parts = str.split("(?<!\\\\), ");
It splits on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
This splits on each ", " that is followed by some word characters and an =.
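A runnable sketch of the lookahead hack (KeyValueSplit is an illustrative name). It relies on the assumption that a value never contains a comma followed by something that looks like key=; if one does, the split happens there anyway.

```java
final class KeyValueSplit {
    private KeyValueSplit() {}

    // Split on ", " only when the text right after it looks like "key=".
    // The lookahead is zero-width, so the key itself stays in the next part.
    static String[] split(String line) {
        return line.split(", (?=\\w+=)");
    }
}
```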
I'm afraid there's no perfect solution with String.split. A matcher with one group per part would work when there are always three parts; if the number of parts varies, I'd recommend a loop with matcher.find(). Something like this, maybe:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String s = "type=simple, output=Hello\\, world, repeat=true";
// Group 1: non-comma, non-backslash chars or backslash-escaped pairs; group 2: the comma, or "" at the end.
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,|$)");
final Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
    if (m.group(2).isEmpty()) break; // last field reached; avoids an extra empty match at the end
}
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,\\s*|$)");
It's not really complicated; just note that you need four backslashes in order to match one.
Escaping works with a negative lookbehind. (Update: aioobe's answer now uses the same construct; I didn't know that when I wrote this.)
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
Pattern: Special Constructs
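I think
input.split("[^\\\\],");
should work. It splits at every comma that is not preceded by a backslash. One caveat: the character class also consumes the character in front of each comma, so "type=simple" comes back as "type=simpl"; a zero-width negative lookbehind such as "(?<!\\\\)," avoids losing that character.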
BTW if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug Regexes.
