I am trying to parse a string to retrieve the home and away teams, as well as the result.
So the strings can be something like this:
Football: Real Madrid 2-1 FC Barcelona
Football: Atletico de Madrid 4-2 Real Madrid
That is, the home team name comes first, then the result as {homeTeamGoals}-{awayTeamGoals}, then the away team name.
I want to use regexp to parse the string and retrieve the team names and result. I thought of having something like this:
String PATTERN_SPORT = "([a-zA-Z]+ ?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ": " + PATTERN_NAME + " " + PATTERN_RESULT + " ?"
+ PATTERN_NAME + "?$");
But it does not match, and I don't know why, since I used (.*) for the name pattern. Any clue?
I would use the following regex: (\w*:)\s?(.*)\s?(\d{1,2}-\d{1,2})\s?(.*)
group 1 (\w*:) matches the sport and the : (optionally you can capture only the sport without the : by using (\w*): instead)
group 2 (.*) is the first team name
group 3 (\d{1,2}-\d{1,2}) matches any score (0-0 to 99-99)
group 4 (.*) is the second team name
The \s? parts only skip the optional whitespace between the groups.
This works only for your format (for other formats the regex can be adjusted).
Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {
public static void main(String [] args){
String s = "Football: Hannover 96 3-3 1.FC Nuernberg";
String PATTERN_SPORT = "(\\w*:)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "(\\d{1,2}-\\d{1,2})";
Pattern PATTERN_RESULTS= Pattern.compile("^" + PATTERN_SPORT + "\\s?" + PATTERN_NAME + "\\s?" + PATTERN_RESULT + "\\s?" + PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = PATTERN_RESULTS.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
}
}
}
Output:
Football:
Hannover 96
3-3
1.FC Nuernberg
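One thing worth checking with this pattern: because (.*) is greedy and every \s? is optional, a two-digit home score can be split, with its first digit ending up in the team name. The sketch below compares the pattern above with a hypothetical stricter variant (reluctant first name group, mandatory whitespace around the score). The class name, the STRICT pattern and the sample line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreParseDemo {
    // Pattern from the answer above: greedy name group, optional whitespace.
    static final Pattern GREEDY = Pattern.compile(
            "^(\\w*:)\\s?(.*)\\s?(\\d{1,2}-\\d{1,2})\\s?(.*)$",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Assumed stricter variant: reluctant home-team group and
    // required whitespace on both sides of the score.
    static final Pattern STRICT = Pattern.compile(
            "^(\\w+):\\s*(.*?)\\s+(\\d{1,2}-\\d{1,2})\\s+(.*)$",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Returns {sport, home team, result, away team}, or null if the line does not match.
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "Football: Hannover 96 10-2 1.FC Nuernberg";
        // prints Football:|Hannover 96 1|0-2|1.FC Nuernberg
        System.out.println(String.join("|", parse(GREEDY, line)));
        // prints Football|Hannover 96|10-2|1.FC Nuernberg
        System.out.println(String.join("|", parse(STRICT, line)));
    }
}
```

With single-digit scores both patterns return the same score, so the difference only shows up once a team reaches 10 goals.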
You need to make sure you match all Unicode whitespace (the first one after : is a non-breaking space). Replacing all spaces with \s and compiling with the Pattern.UNICODE_CHARACTER_CLASS option will solve the issue:
String PATTERN_SPORT = "([a-zA-Z]+\\s?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ":\\s" + PATTERN_NAME + "\\s" + PATTERN_RESULT + "\\s?"
+ PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Java demo:
String s = "Football: Real Madrid 2-1 FC Barcelona";
String PATTERN_SPORT = "([a-zA-Z]+\\s?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ":\\s" + PATTERN_NAME + "\\s" + PATTERN_RESULT + "\\s?" + PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
System.out.println(matcher.group(5));
}
Output:
Football
Real Madrid
2
1
FC Barcelona
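To see the non-breaking-space effect in isolation, here is a small sketch. It assumes the input contains U+00A0 right after the colon, as described above; the class name and the shortened patterns are only for illustration.

```java
import java.util.regex.Pattern;

class NbspDemo {
    // A literal ASCII space after the colon.
    static final Pattern LITERAL_SPACE = Pattern.compile("^Football: .*$");
    // \s without any flag: only ASCII whitespace ([ \t\n\x0B\f\r]).
    static final Pattern ASCII_WS = Pattern.compile("^Football:\\s.*$");
    // \s with UNICODE_CHARACTER_CLASS: any Unicode White_Space character.
    static final Pattern UNICODE_WS =
            Pattern.compile("^Football:\\s.*$", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // \u00A0 is the non-breaking space; it looks like a normal space when printed.
        String s = "Football:\u00A0Real Madrid 2-1 FC Barcelona";
        System.out.println(matches(LITERAL_SPACE, s)); // prints false
        System.out.println(matches(ASCII_WS, s));      // prints false
        System.out.println(matches(UNICODE_WS, s));    // prints true
    }
}
```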
You can try this pattern: (?<=: )(?<homeTeam>[\w ]+) (?<result>\d{1,2}-\d{1,2}) (?<awayTeam>[\w ]+). (Java writes named groups as (?<name>...), and group names may contain only letters and digits.)
You might want to use a different lookbehind, (?<=Football: ), to parse only football results.
I also assumed that a team won't score 100 or more goals :) \d{1,2} matches scores in the range 0-99.
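A minimal Java sketch of this lookbehind and named-group idea. Java spells named groups (?<name>...) and does not allow underscores in group names, so the sketch uses homeTeam, result and awayTeam. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // The lookbehind (?<=: ) requires ": " just before the home team without consuming it.
    static final Pattern RESULT = Pattern.compile(
            "(?<=: )(?<homeTeam>[\\w ]+) (?<result>\\d{1,2}-\\d{1,2}) (?<awayTeam>[\\w ]+)");

    // Returns {home team, result, away team}, or null if nothing is found.
    static String[] parse(String line) {
        Matcher m = RESULT.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("homeTeam"), m.group("result"), m.group("awayTeam") };
    }

    public static void main(String[] args) {
        String[] parts = parse("Football: Atletico de Madrid 4-2 Real Madrid");
        // prints Atletico de Madrid | 4-2 | Real Madrid
        System.out.println(String.join(" | ", parts));
    }
}
```

Note that [\w ] does not cover names with punctuation such as 1.FC Nuernberg, so the character class would need widening for those.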
Using these two regular expressions, regPrefix and regSuffix,
final String POEM = "1. Twas brillig, and the slithy toves\n" +
"2. Did gyre and gimble in the wabe.\n" +
"3. All mimsy were the borogoves,\n" +
"4. And the mome raths outgrabe.\n\n";
String regPrefix = "(?m)^(\\S+)"; // for the first word in each line.
String regSuffix = "(?m)\\S+\\s+\\S+\\s+\\S+$"; // for the last 3 words in each line.
Matcher m1 = Pattern.compile(regPrefix).matcher(POEM);
Matcher m2 = Pattern.compile(regSuffix).matcher(POEM);
while (m1.find() && m2.find()) {
System.out.println(m1.group() + " " + m2.group());
}
I am getting the correct output as:
1. the slithy toves
2. in the wabe.
3. were the borogoves,
4. mome raths outgrabe.
Is it possible to merge those two regex expressions into one, and get the same output? I tried something like:
String singleRegex = "(?m)^(\\S+)\\S+\\s+\\S+\\s+\\S+$";
but it didn't work for me.
Use a single pattern with two capture groups:
String regex = "(?m)^(\\S+).*?((?:\\s+\\S+){3})$";
Matcher m = Pattern.compile(regex).matcher(POEM);
while (m.find()) {
System.out.println(m.group(1) + m.group(2));
}
1. the slithy toves
2. in the wabe.
3. were the borogoves,
4. mome raths outgrabe.
I'm trying to parse names out of the following samples:
++++++++++++++++++SELIZABETH+COLLAZO+++++++++++++++++++
+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++
+++++++++++++++++++NOAH+BLAKEMORE++++++++++++++++++++++
I've tried
//++(.*?)+(.*?)//++
but that's way off.
I would like to parse the first and last names into two strings.
You can use the regex (\w+)\+(\w+) or \+{2,}(.*?)\+(.*?)\+{2,} with Pattern like this:
String str = "++++++++++++++++++SELIZABETH+COLLAZO+++++++++++++++++++\n"
+ "+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++\n"
+ "+++++++++++++++++++NOAH+BLAKEMORE++++++++++++++++++++++";
Pattern pattern = Pattern.compile("(\\w+)\\+(\\w+)"); // or instead "\\+{2,}(.*?)\\+(.*?)\\+{2,}"
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
Outputs
SELIZABETH COLLAZO
PALOMA CORREA
NOAH BLAKEMORE
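Since the goal is two separate strings, here is a small sketch of the second pattern, \+{2,}(.*?)\+(.*?)\+{2,}, wrapped in a helper that returns the first and last name for one line. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameSplitDemo {
    // Two or more '+', first name, a single '+', last name, two or more '+'.
    static final Pattern PADDED_NAME = Pattern.compile("\\+{2,}(.*?)\\+(.*?)\\+{2,}");

    // Returns {firstName, lastName}, or null if the line has a different shape.
    static String[] splitName(String line) {
        Matcher m = PADDED_NAME.matcher(line);
        if (!m.find()) {
            return null;
        }
        String firstName = m.group(1);
        String lastName = m.group(2);
        return new String[] { firstName, lastName };
    }

    public static void main(String[] args) {
        String[] name = splitName("+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++");
        System.out.println("first: " + name[0]); // prints first: PALOMA
        System.out.println("last: " + name[1]);  // prints last: CORREA
    }
}
```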
I have this regex:
String regexPattern = "[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor";
I want to test it against:
String lineString = "8th floor, Prince's Building, 12 Chater Road";
so I do:
boolean isMatching = lineString.matches(regexPattern);
and it returns false. Why?
I thought it had something to do with whitespaces in Java, so I removed the whitespace in the regexPattern variable so it reads
regexPattern = "[0-9A-Za-z]+(st|nd|rd|th)floor";
and matched it with a string without white space:
String lineString = "8thfloor,Prince'sBuilding,12ChaterRoad";
it still returns false. Why? Any help very much appreciated.
String.matches() only returns true if the entire string matches the pattern.
Try adding .* to the beginning and end of your regex.
Example:
String regex = ".*[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor.*";
This is not the best approach, however...
Here's a better alternative:
String input = "8th floor, Prince's Building, 12 Chater Road";
String regex = "[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor";
Pattern p = Pattern.compile(regex);
boolean isMatch = p.matcher(input).find();
If you want to extract the floor number, do this:
String input = "8th floor, Prince's Building, 12 Chater Road";
String regex = "([0-9A-Za-z]+)(st|nd|rd|th)" + " " + "floor"; // + inside the group, so multi-digit numbers are captured whole
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
if (m.find()) {
String num = m.group(1);
String suffix = m.group(2);
System.out.println("Welcome to the " + num + suffix + " floor!");
// prints 'Welcome to the 8th floor!'
}
Check out the Pattern API for a boatload of info about Java regular expressions.
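To make the difference concrete, here is a short sketch that runs the same pattern through matches(), lookingAt() and find(). The second input string and the class name are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchModesDemo {
    static final Pattern FLOOR = Pattern.compile("[0-9A-Za-z]+(st|nd|rd|th) floor");

    // The whole input must match the pattern.
    static boolean whole(String s) {
        return FLOOR.matcher(s).matches();
    }

    // The pattern must match at the start of the input, but may stop early.
    static boolean prefix(String s) {
        return FLOOR.matcher(s).lookingAt();
    }

    // The pattern may match anywhere inside the input.
    static boolean anywhere(String s) {
        return FLOOR.matcher(s).find();
    }

    public static void main(String[] args) {
        String a = "8th floor, Prince's Building, 12 Chater Road";
        String b = "Suite 5, 8th floor"; // hypothetical second input
        System.out.println(whole(a) + " " + prefix(a) + " " + anywhere(a)); // prints false true true
        System.out.println(whole(b) + " " + prefix(b) + " " + anywhere(b)); // prints false false true
        System.out.println(whole("8th floor"));                             // prints true
    }
}
```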
Edited, per comments ...
The [0-9A-Za-z]+ part greedily matches through the end of th before backtracking.
Try [0-9]+ instead.
I have an input string that will follow the pattern /user/<id>?name=<name>, where <id> is alphanumeric but must start with a letter, and <name> is a letter-only string that can have multiple spaces. Some examples of matches would be:
/user/ad?name=a a
/user/one111?name=one ONE oNe
/user/hello?name=world
I came up with the following regex:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
All of the above examples match the regex, but it only looks at the first word in <name>. Shouldn't the sequence \s allow me to have white spaces?
The code that I made to test what it is doing is:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
// Check to see that input matches pattern
if(Pattern.matches(regex, str) == true){
str = str.replaceFirst("/user/", "");
str = str.replaceFirst("name=", "");
String[] tokens = str.split("\\?");
System.out.println("size = " + tokens.length);
System.out.println("tokens[0] = " + tokens[0]);
System.out.println("tokens[1] = " + tokens[1]);
} else
System.out.println("Didn't match.");
So for example, one test might look like:
/user/myID123?name=firstName LastName
size = 2
tokens[0] = myID123
tokens[1] = firstName
whereas the desired output would be
tokens[1] = firstName LastName
How can I change my regex to do this?
Not sure what you think is the problem in your code. tokens[1] will indeed contain firstName LastName in your example.
Here's an ideone.com demo showing this.
However, have you considered using capturing groups for the id and the name?
If you write it like
String regex = "/user/(\\w+)\\?name=([a-zA-Z\\s]+)";
Matcher m = Pattern.compile(regex).matcher(input);
you can get hold of myID123 and firstName LastName through m.group(1) and m.group(2), once m.matches() (or m.find()) has returned true.
I don't find any fault in your code, but you may capture the groups like this:
String str = "/user/myID123?name=firstName LastName ";
String regex = "/user/([a-zA-Z]+\\w*)\\?name=([a-zA-Z\\s]+)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1) + ", " + m.group(2));
}
Note that * is greedy by default. You can make it reluctant by adding a ?, although here both forms capture the same id, because \w can never match the literal ? that follows. Using capturing groups with matches() gives you both parts:
List<String> str = Arrays.asList("/user/ad?name=a a", "/user/one111?name=one ONE oNe", "/user/hello?name=world");
String regex = "/user/([a-zA-Z]+\\w*?)\\?name=([a-zA-Z\\s]+)";
for (String s : str) {
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.matches()) {
System.out.println("user: " + matcher.group(1));
System.out.println("name: " + matcher.group(2));
}
}
Output:
user: ad
name: a a
user: one111
name: one ONE oNe
user: hello
name: world
I'm not new to Java, but have not dealt with Regex and Patterns before. What I'm looking to do is take a string like
"Class: " + data1 + "\nFrom: " + data2 + " To: " + data3 + "\nOccures: " + data4 + " In: " + data5 + " " + data6;
and pull out only data1 to data6.
I appreciate any help.
Use this regex:
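// The trailing $ makes the last reluctant group run to the end of the input instead of stopping after one character.
Pattern pattern = Pattern.compile("Class: (.+?)\nFrom: (.+?) To: (.+?)\nOccures: (.+?) In: (.+?) (.+?)$");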
Matcher matcher = pattern.matcher(yourInputString);
if (matcher.find())
{
String data1 = matcher.group(1);
String data2 = matcher.group(2);
String data3 = matcher.group(3);
String data4 = matcher.group(4);
String data5 = matcher.group(5);
String data6 = matcher.group(6);
} else
{
// String didn't match the specified format
}
Explanation:
.+? matches one or more of any character (except line terminators), reluctantly, i.e. as few as possible.
Parentheses () create a capturing group. Groups are numbered starting from 1 (group 0 is the entire match).
So (.+?) captures a run of arbitrary characters as a group.
The matcher searches for the whole pattern somewhere in the input string. Since you specified the format, we know exactly what the entire string will look like. All you have to do is copy the format and replace each piece of data you want to extract with (.+?), which gets an index because it is a group.
Afterwards, the matcher tries to find the pattern (via matcher.find()), and you ask it for the contents of groups 1 to 6.
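As a runnable sketch of this approach, the example below builds a sample input in the question's format and extracts the six values. The sample values and the class and method names are made up. The pattern uses a trailing $ so that the last reluctant group extends to the end of the input rather than stopping after a single character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatParseDemo {
    // The format from the question, with each data item replaced by a capturing group.
    static final Pattern FORMAT = Pattern.compile(
            "Class: (.+?)\nFrom: (.+?) To: (.+?)\nOccures: (.+?) In: (.+?) (.+?)$");

    // Returns the six captured values, or null if the input has a different format.
    static String[] extract(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] data = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            data[i - 1] = m.group(i); // groups are numbered from 1
        }
        return data;
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for data1..data6.
        String input = "Class: " + "Algebra" + "\nFrom: " + "09:00" + " To: " + "10:30"
                + "\nOccures: " + "Monday" + " In: " + "Room" + " " + "12B";
        // prints Algebra, 09:00, 10:30, Monday, Room, 12B
        System.out.println(String.join(", ", extract(input)));
    }
}
```

This only splits data5 and data6 correctly when data5 itself contains no space, since a single space is the only separator between them.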
How about using split() with ":" and then taking string[2i+1] from the resulting String[] (with i starting from 0)?