How to use regex in Java to manipulate a string

String s = "My cake should have ( sixteen | sixten | six teen ) candles, I love and ( should be | would be ) puff them.";
Desired final string:
My cake should have <div><p id="1">sixteen</p><p id="2">sixten</p><p id="3">six teen</p></div> candles, I love and <div><p id="1">should be</p><p id="2">would be</p></div> puff them.
What I have tried so far:
Pattern pattern = Pattern.compile("\\(\\s*(.*?)(?=\\s*\\))");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}

You can match the text between parentheses, split it on the pipe character, and build the replacement dynamically using Matcher.appendReplacement:
String s = "My cake should have ( sixteen | sixten | six teen ) candles, I love and ( should be | would be ) puff them.";
String rx = "\\(([^()]*)\\)";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(rx).matcher(s);
while (m.find()) {
String add = "";
String[] items = m.group(1).split("\\|");
for (int i=1; i<=items.length; i++) {
add += "<p id=\"" + i + "\">" + items[i-1].trim() + "</p>";
}
m.appendReplacement(result, "<div>"+add+"</div>");
}
m.appendTail(result);
System.out.println(result.toString());
Output:
My cake should have <div><p id="1">sixteen</p><p id="2">sixten</p><p id="3">six teen</p></div> candles, I love and <div><p id="1">should be</p><p id="2">would be</p></div> puff them.
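As a hedged alternative to the appendReplacement loop above, the same rewrite can be sketched with Matcher.replaceAll and a replacement function, which assumes Java 9 or later. The class name AlternativesToHtml and the method name convert are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativesToHtml {
    // Matches one parenthesised group that contains no nested parentheses.
    private static final Pattern GROUP = Pattern.compile("\\(([^()]*)\\)");

    // Rewrites every "( a | b )" group as <div><p id="1">a</p><p id="2">b</p></div>.
    static String convert(String input) {
        return GROUP.matcher(input).replaceAll(match -> {
            String[] items = match.group(1).split("\\|");
            StringBuilder html = new StringBuilder("<div>");
            for (int i = 0; i < items.length; i++) {
                html.append("<p id=\"").append(i + 1).append("\">")
                    .append(items[i].trim())
                    .append("</p>");
            }
            html.append("</div>");
            // quoteReplacement keeps any '$' or '\' inside the items literal.
            return Matcher.quoteReplacement(html.toString());
        });
    }

    public static void main(String[] args) {
        System.out.println(convert("I love and ( should be | would be ) puff them."));
    }
}
```

The quoteReplacement call matters only when the alternatives may contain $ or \, which the replacement syntax would otherwise interpret as group references or escapes.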

Related

Parse football teams and result

I am trying to parse a string to retrieve the home and away teams, and also the result.
The strings look something like this:
Football: Real Madrid 2-1 FC Barcelona
Football: Atletico de Madrid 4-2 Real Madrid
That is, the home team name, then the result as {homeTeamGoals}-{awayTeamGoals}, then the away team name.
I want to use a regexp to parse the string and retrieve the team names and the result. I thought of having something like this:
String PATTERN_SPORT = "([a-zA-Z]+ ?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ": " + PATTERN_NAME + " " + PATTERN_RESULT + " ?"
+ PATTERN_NAME + "?$");
But it does not match, and I don't know why, since the name pattern (.*) should match anything. Any clue?
I would use the following regex: (\w*:)\s?(.*)\s?(\d{1,2}-\d{1,2})\s?(.*)
group 1 (\w*:) matches the sport and the colon (to capture only the sport without the colon, use (\w*): instead)
group 2 (.*) first team name
group 3 (\d{1,2}-\d{1,2}) matches any score from 0-0 to 99-99
group 4 (.*) second team name
The \s? parts only skip optional whitespace between the groups.
This works only for your format (for other formats the regex can be adjusted).
Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {
public static void main(String [] args){
String s = "Football: Hannover 96 3-3 1.FC Nuernberg";
String PATTERN_SPORT = "(\\w*:)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "(\\d{1,2}-\\d{1,2})";
Pattern PATTERN_RESULTS= Pattern.compile("^" + PATTERN_SPORT + "\\s?" + PATTERN_NAME + "\\s?" + PATTERN_RESULT + "\\s?" + PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = PATTERN_RESULTS.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
}
}
}
Output:
Football:
Hannover 96
3-3
1.FC Nuernberg
You need to make sure you match all Unicode whitespace (the first space after the colon is a non-breaking space). Replacing all literal spaces with \s and compiling with the Pattern.UNICODE_CHARACTER_CLASS option solves the issue:
String PATTERN_SPORT = "([a-zA-Z]+\\s?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ":\\s" + PATTERN_NAME + "\\s" + PATTERN_RESULT + "\\s?"
+ PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Java demo:
String s = "Football: Real Madrid 2-1 FC Barcelona";
String PATTERN_SPORT = "([a-zA-Z]+\\s?[0-9]?)";
String PATTERN_NAME = "(.*)";
String PATTERN_RESULT = "([0-9]*)-([0-9]*)";
Pattern PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM = Pattern.compile("^" + PATTERN_SPORT + ":\\s" + PATTERN_NAME + "\\s" + PATTERN_RESULT + "\\s?" + PATTERN_NAME + "$", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = PATTERN_SPORT_AND_HOME_TEAM_RESULT_AWAY_TEAM.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
System.out.println(matcher.group(5));
}
Output:
Football
Real Madrid
2
1
FC Barcelona
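To illustrate why the flag matters, here is a minimal sketch that checks whether \s matches a non-breaking space (U+00A0) with and without Pattern.UNICODE_CHARACTER_CLASS. The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

class NbspWhitespaceDemo {
    // Returns true if the whole input is matched by a single \s under the given flags.
    static boolean isRegexWhitespace(String input, int flags) {
        return Pattern.compile("\\s", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // non-breaking space
        // An ordinary space matches \s with the default flags.
        System.out.println(isRegexWhitespace(" ", 0));
        // By default \s covers only [ \t\n\x0B\f\r], so the non-breaking space is not matched.
        System.out.println(isRegexWhitespace(nbsp, 0));
        // With the flag, \s follows the Unicode White_Space property, which includes U+00A0.
        System.out.println(isRegexWhitespace(nbsp, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```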
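You can try this pattern: (?<=: )(?<homeTeam>[\w ]+) (?<result>\d{1,2}-\d{1,2}) (?<awayTeam>[\w ]+). Note that Java writes named groups as (?<name>...) and allows only letters and digits in the name, so the (?P<home_team>...) form used by some other regex flavors does not compile in Java.
You might want to use a different lookbehind, (?<=Football: ), to parse only football results.
I also assumed that a team won't score 100 or more goals :) \d{1,2} matches scores in the range 0-99.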
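A minimal Java sketch of the named-group approach follows. It assumes team names consist only of word characters and spaces (so a name such as "1.FC Nuernberg" would not match), and the class name, method name, and group names homeTeam, result, and awayTeam are chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupResultParser {
    // Java named groups use (?<name>...) and the name may contain only letters and digits.
    private static final Pattern RESULT = Pattern.compile(
        "(?<=: )(?<homeTeam>[\\w ]+) (?<result>\\d{1,2}-\\d{1,2}) (?<awayTeam>[\\w ]+)");

    // Returns {homeTeam, result, awayTeam}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = RESULT.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("homeTeam"), m.group("result"), m.group("awayTeam") };
    }

    public static void main(String[] args) {
        String[] parts = parse("Football: Atletico de Madrid 4-2 Real Madrid");
        System.out.println(String.join(" | ", parts));
    }
}
```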

Extracting tags and text between tags using regex

I am trying to extract both XML tags and the text within the tags using regex. I understand that regex is not the best option, but my inline text file has only a few tags, so I did not opt for an XML parser.
String txt="American Airlines made <TRIPS> 100 </TRIPS> flights in <DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";
String re1="<([^>]+)>"; // Tag 1
String re2="([^<]*)"; // Variable Name 1
String re3="</([^>]+)>"; // Tag 2
// String re3 = re1;
Pattern p = Pattern.compile(re1+re2+re3,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String tag1=m.group(1);
String var1=m.group(2);
System.out.println(tag1.toString());
System.out.println(var1.toString());
}
The problem is that it only identifies the first tag and none of the subsequent ones.
Current Output
TRIPS
100
Desired Output
TRIPS
100
DATE
December
ROUTE
Altantic
Change if to while:
String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in <DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";
String re1 = "<([^>]+)>"; // Opening tag
String re2 = "([^<]*)"; // Text between the tags
String re3 = "</([^>]+)>"; // Closing tag
Pattern p = Pattern.compile(re1 + re2 + re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
// Each call to find() advances to the next match, so loop until it returns false.
while (m.find()) {
    String tag1 = m.group(1);
    String var1 = m.group(2);
    System.out.println(tag1);
    System.out.println(var1);
}
If you came to this post looking for a way to parse XML, don't read this. Use an XML parser instead.
Solution:
Change if (m.find()) to while (m.find()). You can iterate to find all matches.
This is the general case to find all regex matches:
Pattern p = Pattern.compile(regex,flags);
Matcher m = p.matcher(text);
while (m.find())
{
System.out.println("First group: " + m.group(1) +
"\nSecond group: " + m.group(2) );
}
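As a self-contained sketch of the find-all loop, the variant below also uses a backreference (\1) so that a closing tag must carry the same name as its opening tag. That check is an addition for illustration and is not part of the answers above; the class and method names are likewise made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Group 1: tag name, group 2: text between the tags, \1: the same tag name again.
    private static final Pattern TAGGED = Pattern.compile("<([^>/]+)>([^<]*)</\\1>");

    // Returns one "TAG=text" entry per matched tag pair, in document order.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TAGGED.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in <DATE> December </DATE>";
        System.out.println(extract(txt));
    }
}
```

This remains a sketch for simple, non-nested tags; anything more complex is better handled by an XML parser, as noted above.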

match a string with text and store the next text

I have a string containing some text (in Greek) that was extracted from a PDF.
How can I find a particular piece of text, say id.name: 123, and then store the number 123?
You can find it using a regular expression:
String s = "Έχω ένα string που περιέχει κάποιο κείμενο ( στην ελληνική γλώσσα ), "
+ "το οποίο εξήχθη από ένα PDF .\nΠως μπορώ να ιδρύσω ένα συγκεκριμένο κείμενο "
+ "ας πούμε id.name : 123 και στη συνέχεια να αποθηκεύσετε τον αριθμό 123";
Pattern p = Pattern.compile("id\\.name\\s*:\\s*(\\d+)"); // allow optional whitespace around the colon
Matcher m = p.matcher(s);
if(m.find()){
System.out.println(m.group(1));
}
There are many ways to do it; regular expressions are one option.
For instance, suppose we have a string s1 that contains "today is monday" and we want to check whether it contains the word monday:
String s1 = "today is monday";
Pattern p2 = Pattern.compile(".*monday.*");
Matcher m2 = p2.matcher(s1);
// matches() requires the whole string to match, hence the leading and trailing .*
boolean b2 = m2.matches();
if (b2) {
    System.out.println(p2 + " found");
} else {
    System.out.println(p2 + " not found");
}
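To actually store the number, the captured digits can be converted to an int. The following is a minimal sketch that assumes the label is written id.name followed by a colon with optional whitespace around it, and that the number fits in an int. The names IdNameExtractor and findId are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdNameExtractor {
    // Accepts "id.name: 123" as well as "id.name : 123".
    private static final Pattern ID_NAME = Pattern.compile("id\\.name\\s*:\\s*(\\d+)");

    // Returns the first number that follows the label, or an empty OptionalInt if absent.
    // Integer.parseInt throws NumberFormatException if the digits do not fit in an int.
    static OptionalInt findId(String text) {
        Matcher m = ID_NAME.matcher(text);
        return m.find()
            ? OptionalInt.of(Integer.parseInt(m.group(1)))
            : OptionalInt.empty();
    }

    public static void main(String[] args) {
        OptionalInt id = findId("some extracted text id.name : 123 and more text");
        id.ifPresent(value -> System.out.println("Stored id: " + value));
    }
}
```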

Java Regex replace all

My text will look like this
| birth_date = {{birth date|1925|09|2|df=y}}
| birth_place = [[Bristol]], [[England]], UK
| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place = [[Eastbourne]], [[Sussex]], England, UK
| origin =
| instrument = [[Piano]]
| genre =
| occupation = [[Musician]]
I would like to get everything that is inside of [[ ]]. I tried to use replace all to replace everything that is not inside the [[ ]] and then use split by new line to get a list of text with [[ ]].
input = input.replaceAll("^[\\[\\[(.+)\\]\\]]", "");
Required output:
[[Bristol]]
[[England]]
[[Eastbourne]]
[[Sussex]]
[[Piano]]
[[Musician]]
But this does not give the desired output. What am I missing here? There are thousands of documents; is this the fastest way to do it? If not, please tell me the best way to get the desired output.
You need to match it, not replace it:
Matcher m = Pattern.compile("\\[\\[\\w+\\]\\]").matcher(input);
while (m.find()) {
    String result = m.group(); // one [[...]] link per iteration
}
Use Matcher.find. For example:
import java.util.regex.*;
...
String text =
"| birth_date = {{birth date|1925|09|2|df=y}}\n" +
"| birth_place = [[Bristol]], [[England]], UK\n" +
"| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}\n" +
"| death_place = [[Eastbourne]], [[Sussex]], England, UK\n" +
"| origin = \n" +
"| instrument = [[Piano]]\n" +
"| genre = \n" +
"| occupation = [[Musician]]\n";
Pattern pattern = Pattern.compile("\\[\\[.+?\\]\\]");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Just for fun, using replaceAll:
String output = input.replaceAll("(?s)(\\]\\]|^).*?(\\[\\[|$)", "$1\n$2");
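Since the question mentions thousands of documents, one reasonable approach is to compile the pattern once and reuse it for every document. The sketch below collects the links into a list; the class and method names are invented here, and no performance measurements back it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkExtractor {
    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern LINK = Pattern.compile("\\[\\[.+?\\]\\]");

    // Returns every [[...]] occurrence in the text, brackets included, in order.
    static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "| birth_place = [[Bristol]], [[England]], UK\n"
                    + "| origin = \n"
                    + "| instrument = [[Piano]]\n";
        for (String link : extract(text)) {
            System.out.println(link);
        }
    }
}
```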

How to compile different Patterns with String?

I need to define multiple Patterns and run them against a String; the program should print everything in the string that matches the format of one of my patterns. Here is the code:
String line = "This order was places for QT 30.00$ !OK ? ";
Pattern[] patterns = new Pattern[]{
Pattern.compile("\\d+[.,]\\d+.[$] ", Pattern.CASE_INSENSITIVE),
Pattern.compile("\\d:\\d\\d",Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)
}; // Create a Pattern object
// Now create matcher object.
for (Pattern scriptPattern : patterns){
Matcher m = scriptPattern.matcher(line);
System.out.println(m.group());
} }
Is this what you are looking for?
private static Pattern[] patterns = new Pattern[]{
Pattern.compile("Your pattern ", Pattern.CASE_INSENSITIVE),
Pattern.compile("your pattern",Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)
};
You can use this to go through the patterns and match them
for (Pattern scriptPattern : patterns){
Matcher m = scriptPattern.matcher(line);
while (m.find()) {
String d = m.group();
if(d != null) {
System.out.print(d);
}
}
}
Using your original question prior to the edit, here are a few tweaks to make it work:
public static void main(String[] args) {
// String to be scanned to find the pattern.
String line = "This order was places for QT 30.00$ !OK ? 2:37 ";
String pattern = "(\\d+[.,]\\d+.[$])(.*)(\\d:\\d\\d)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(1) + " with time: " + m.group(3));
}
}
Output:
Found value: 30.00$ with time: 2:37
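For completeness, here is a runnable sketch of the array-of-patterns loop applied to the sample line. It assumes the amount pattern can be simplified to \d+[.,]\d+[$] (dropping the extra dot from the original), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPatternScanner {
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\d+[.,]\\d+[$]"), // amounts such as 30.00$
        Pattern.compile("\\d:\\d\\d")       // times such as 2:37
    };

    // Runs every pattern over the line and collects all matches, pattern by pattern.
    static List<String> findAll(String line) {
        List<String> matches = new ArrayList<>();
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(line);
            // group() is only valid after a successful find(), so call find() first.
            while (m.find()) {
                matches.add(m.group());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("This order was places for QT 30.00$ !OK ? 2:37 "));
    }
}
```

Calling m.group() without a preceding successful find() or matches() throws IllegalStateException, which is why the loop is structured around find().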
