Find all words between < and > with regex - java

I want to find the words between < and > in a String.
For example:
String str = "your mobile number is <A> and username is <B> thanks <C>";
I want to get A, B, C from the String.
I have tried
import java.util.regex.*;
public class Main
{
public static void main (String[] args)
{
String example = "your mobile number is <A> and username is <B> thanks <C>";
Matcher m = Pattern.compile("\\<([^)]+)\\>").matcher(example);
while(m.find()) {
System.out.println(m.group(1));
}
}
}
What's wrong with what I am doing?

Use the following idiom and back-reference to get the values for your A, B and C placeholders:
String example = "your mobile number is <A> and username is <B> thanks <C>";
// ┌ left delimiter - no need to escape here
// | ┌ group 1: 1+ of any character, reluctantly quantified
// | | ┌ right delimiter
// | | |
Matcher m = Pattern.compile("<(.+?)>").matcher(example);
while (m.find()) {
System.out.println(m.group(1));
}
Output
A
B
C
Note
If you prefer a solution that uses look-arounds instead of an indexed group, you can achieve the same with the following code:
String example = "your mobile number is <A> and username is <B> thanks <C>";
// ┌ positive look-behind for left delimiter
// | ┌ 1+ of any character, reluctantly quantified
// | | ┌ positive look-ahead for right delimiter
// | | |
Matcher m = Pattern.compile("(?<=<).+?(?=>)").matcher(example);
while (m.find()) {
// no index for back-reference here, catching main group
System.out.println(m.group());
}
I personally find the latter less readable in this instance.

You need to put > or <> inside the negated character class. [^)]+ in your regex matches any character except ), one or more times, so it also matches the < and > symbols and the greedy match runs past the first >.
Matcher m = Pattern.compile("<([^<>]+)>").matcher(example);
while(m.find()) {
System.out.println(m.group(1));
}
OR
Use lookarounds.
Matcher m = Pattern.compile("(?<=<)[^<>]*(?=>)").matcher(example);
while(m.find()) {
System.out.println(m.group());
}
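To see the difference concretely, here is a minimal, self-contained sketch that runs the pattern from the question and the corrected pattern against the same input. The class name PlaceholderDemo and the helper extract are made up for illustration and are not part of any answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PlaceholderDemo {

    // Collects capture group 1 of every match of the given regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String example = "your mobile number is <A> and username is <B> thanks <C>";
        // Pattern from the question: [^)]+ is greedy and happily crosses '<' and '>',
        // so the single match runs from the first '<' to the last '>'.
        System.out.println(extract("\\<([^)]+)\\>", example));
        // Corrected pattern: the negated class excludes both delimiters.
        System.out.println(extract("<([^<>]+)>", example));
    }
}
```

With the sample sentence, the first call yields one long match, while the second yields the three placeholders A, B and C.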

Can you please try this?
public static void main(String[] args) {
String example = "your mobile number is <A> and username is <B> thanks <C>";
Matcher m = Pattern.compile("\\<(.+?)\\>").matcher(example);
while(m.find()) {
System.out.println(m.group(1));
}
}

Related

Java Regex get all numbers

I need to retrieve all numbers from a String, for example:
"a: 1 | b=2 ; c=3.2 / d=4,2"
I want to get this result:
1
2
3.2
4,2
I don't know how to express that as a regex in Java.
Currently I have this:
(?<=\D)(?=\d)|(?<=\d)(?=\D)
It splits between letters and digits (but decimal values are not kept together), and the result is:
1
2
3
2 (problem)
4
2 (problem)
Can you help me ?
Thanks :D
You might use a capturing group with a character class:
[a-z][:=]\h*(\d+(?:[.,]\d+)?)
Explanation
[a-z] Match a char a-z
[:=] Match either : or =
\h* Match 0+ horizontal whitespace chars
( Capture group 1
\d+(?:[.,]\d+)? Match 1+ digits with an optional decimal part with either . or ,
) Close group
For example
String regex = "[a-z][:=]\\h*(\\d+(?:[.,]\\d+)?)";
String string = "a: 1 | b=2 ; c=3.2 / d=4,2";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1
2
3.2
4,2
You can do it as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
// Test
String s = "a: 1 | b=2 ; c=3.2 / d=4,2";
showNumbers(s);
}
static void showNumbers(String s) {
Pattern regex = Pattern.compile("\\d[\\d,.]*");
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
1
2
3.2
4,2
You can use
/\d+(?:[.,]\d+)?/g
You may just need a regex like (\d(?:[.,]\d)?) that
finds a digit,
optionally followed by a dot/comma and another digit.
The multi-digit version is (\d+(?:[.,]\d+)?)
String value = "a: 1 | b=2 ; c=3.2 / d=4,2";
Pattern p = Pattern.compile("(\\d(?:[.,]\\d)?)");
Matcher m = p.matcher(value);
while (m.find()) {
System.out.println(m.group());
}
// 1
// 2
// 3.2
// 4,2
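If the extracted values are needed as numbers rather than strings, the multi-digit pattern can be combined with a small normalization step. The following is only a sketch: the class name NumberExtractDemo and the method parseNumbers are hypothetical, and it assumes that a comma inside a number is a decimal separator (as in "4,2"), not a thousands separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NumberExtractDemo {

    // 1+ digits, optionally followed by '.' or ',' and 1+ more digits.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)?");

    // Assumption: ',' is a decimal separator, so "4,2" is read as 4.2.
    static List<Double> parseNumbers(String input) {
        List<Double> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            values.add(Double.parseDouble(m.group().replace(',', '.')));
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [1.0, 2.0, 3.2, 4.2]
        System.out.println(parseNumbers("a: 1 | b=2 ; c=3.2 / d=4,2"));
    }
}
```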

Regex to find Integers in particular string lines

I have this regex to find integers in a string (with newlines). However, I want to filter the results: the regex should find the number in certain lines and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
String pattern = "(?<=,)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, for all the lines. What if I want a particular regex that finds the integers only in the line containing "test1", another for "test2", and another for "test3"? How should I do this? I want to create three different regexes, but my regex skills are weak.
The first regex should print 2, the second 8 and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,] portion skips everything between the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside lookbehind expressions, because the Java regex engine requires lookbehinds to have a bounded maximum length. If you need to allow skipping more than 100 characters, adjust the maximum length accordingly.
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// | "test" literal
// | | any number of digits
// | | | comma
// | | | any number of digits
// | | | | comma
// | | | | | group1, your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example parsing the strings is only done for demonstration purposes, but it could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0" you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma-separated ints).
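Since the question asks for one regex per line label, one way to get that without writing three patterns by hand is to build the pattern from the label. This is a sketch under assumptions: the class LabelledValueDemo and the method secondValue are hypothetical names, and the text is assumed to have the "label,int,int,int" layout from the question.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LabelledValueDemo {

    // Returns the second number after the given label, e.g. "test2,0,8,0" -> 8.
    // Pattern.quote keeps any regex metacharacters in the label literal.
    static OptionalInt secondValue(String text, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + ",\\d+,(\\d+)");
        Matcher m = p.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        for (String label : new String[] {"test1", "test2", "test3"}) {
            System.out.println(label + ": " + secondValue(test, label).getAsInt());
        }
    }
}
```

Returning an OptionalInt makes the "label not present" case explicit instead of relying on an exception or a sentinel value.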

Extract substring that appears after certain pattern

I need to extract a substring that appears after a certain pattern in the input string. I have been trying various combinations but not getting expected output.
The input string can be in following 2 forms
1. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE
2. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507
I need to write a regex that applies to both variations above and extracts the '149IF1007JMO2507' part that follows 'SNDR REF:'.
Please find below the sample program that I have written.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final String input = "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE";
private static Pattern pattern = Pattern.compile(".*SNDR REF:(.*?)(\\s.)*");
private static Matcher matcher = pattern.matcher(input);
public static void main (String[] args) {
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
}
}
Output:149IF1007JMO2507 BISCAYNE BLVD STE
I want output to be '149IF1007JMO2507'
Thank you.
You can use the following idiom to find your sub-string:
String[] examples = {
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE",
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"
};
// ┌ look-behind for "SNDR REF:"
// | ┌ anything, reluctantly quantified
// | | ┌ lookahead for
// | | | whitespace or end of input
Pattern p = Pattern.compile("(?<=SNDR\\sREF:).+?(?=\\s|$)");
// iterating examples
for (String s: examples) {
Matcher m = p.matcher(s);
// iterating single matches (one per example here)
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
}
Output
Found: 149IF1007JMO2507
Found: 149IF1007JMO2507
Note
I expect you don't know in advance it's going to be "149IF1007JMO2507", hence the contextual matching.
You can use this regexp:
private static Pattern pattern = Pattern.compile(".*SNDR REF:([^\\s]+).*");
This captures the run of non-whitespace characters that follows "SNDR REF:".
You can do it with replaceAll
str = str.replaceAll(".*(REF:(\\S+)).*", "$2");
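One thing to keep in mind with the replaceAll approach is that it returns the input unchanged when the pattern does not match, so the caller cannot tell a missing reference from a found one. A possible alternative, sketched here with the hypothetical names SenderRefDemo and senderRef, is to use find() with a capture group and return an Optional.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SenderRefDemo {

    // Literal "SNDR REF:" followed by a captured run of non-whitespace characters.
    private static final Pattern SNDR_REF = Pattern.compile("SNDR REF:(\\S+)");

    // Returns the reference if present, otherwise an empty Optional.
    static Optional<String> senderRef(String line) {
        Matcher m = SNDR_REF.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(senderRef(
                "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE"));
        System.out.println(senderRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"));
        System.out.println(senderRef("88,TRN:2014091900217161"));
    }
}
```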

Split by excel conditions

I want to split this string:
=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))
into:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
I tried it with that regex:
String[] result = text.substring(1, text.length()).split("[;()]+");
However, I am getting:
[IF, BI18=0, INT, YEAR, TODAY, IF, INT, YEAR, BI18, >2025, 2025, INT, YEAR, BI18]
I am struggling to identify the Excel functions generically.
I would appreciate an answer that splits the string generically as expected.
Following up on the comments, if you want the main contents of the IF(...) conditions wherein the ... is the content, here's a quick solution.
Please note that although this solution works for the input at hand, it may be unreliable in other cases with nested statements; it is basically a workaround.
String formula = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
// | positive lookbehind: starts with "IF("
// | | any character, reluctantly quantified
// | | | positive lookahead, followed by
// | | | ")", then...
// | | | | ";" or end of input
// | | | |
Pattern p = Pattern.compile("(?<=IF\\().+?(?=\\)(;|$))");
Matcher m = p.matcher(formula);
while (m.find()) {
System.out.println(m.group());
}
Output
BI18=0;INT(YEAR(TODAY())
INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18)))
Try,
String str1 = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
ArrayList<String> strList = new ArrayList<String>();
for(String str2 : str1.replaceFirst("=", "").split(";")){
if(str2.contains("IF")){
strList.add("IF");
strList.add(str2.replaceAll("IF|\\(|\\)", ""));
}else{
strList.add(str2.replaceAll("\\(|\\)", ""));
}
}
System.out.println(strList.toString());
Output:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
You can use this regex:
^=([^(]+)\(|\G([^;]+)[;|)$]
We retrieve the matches from capture Groups 1 and 2.
In Java, this means something like this:
Pattern regex = Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");
Matcher regexMatcher = regex.matcher(your_original_string);
while (regexMatcher.find()) {
// check Group 1, which is regexMatcher.group(1)
// check Group 2, which is regexMatcher.group(2)
}
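The loop body above is left as comments, so here is one way it could be filled in: a sketch that collects whichever of the two groups took part in each match. The class FormulaTokensDemo and the method tokens are hypothetical names. Note that with this pattern the tokens keep their parentheses and the nested IF stays attached to its first argument, so the result is not identical to the list requested in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FormulaTokensDemo {

    // Same pattern as above: group 1 is the leading function name,
    // group 2 is each following chunk up to ';', '|', ')' or a literal '$'.
    private static final Pattern TOKEN =
            Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");

    static List<String> tokens(String formula) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(formula);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(
                "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))"));
    }
}
```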

Regular Expression strings in Java

I want to use a regular expression that extracts a substring with the following properties in Java:
The substring begins with 'WWW'
The end of the substring is a colon ':'
I have some experience in SQL with using the Like clause such as:
Select field1 from A where field2 like '%[A-Z]'
So if I were using SQL I would code:
like '%WWW%:'
How can I start this in Java?
Pattern p = Pattern.compile("WWW.*:");
Matcher m = p.matcher("zxdfefefefWWW837eghdehgfh:djf");
while (m.find()){
System.out.println(m.group());
}
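Be aware that .* is greedy: if the input contains more than one colon, "WWW.*:" extends to the last one. If the match should stop at the first colon, a reluctant quantifier (.*?) is one option. The sketch below compares the two; the class GreedyVsReluctantDemo, the helper firstMatch and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyVsReluctantDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "zxWWW837:abc:def";
        // Greedy: runs to the last ':' -> WWW837:abc:
        System.out.println(firstMatch("WWW.*:", input));
        // Reluctant: stops at the first ':' -> WWW837:
        System.out.println(firstMatch("WWW.*?:", input));
    }
}
```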
Here's a different example using substring.
public static void main(String[] args) {
String example = "http://www.google.com:80";
String substring = example.substring(example.indexOf("www"), example.lastIndexOf(":"));
System.out.println(substring);
}
If you want to match only word characters and ., then you may want to use the regular expression "WWW[\\w.]+:"
Pattern p = Pattern.compile("WWW[\\w.]+:");
Matcher m = p.matcher("WWW.google.com:hello");
System.out.println(m.find()); //prints true
System.out.println(m.group()); // prints WWW.google.com:
If you want to match any character, then you may want to use the regular expression "WWW[\\w\\W]+:"
Pattern p = Pattern.compile("WWW[\\w\\W]+:");
Matcher m = p.matcher("WWW.googgle_$#.com:hello");
System.out.println(m.find());
System.out.println(m.group());
Explanation: WWW and : are literals. \\w - any word character, i.e. a-z, A-Z, 0-9 and _. \\W - any non-word character.
If I understood it right
String input = "aWWW:bbbWWWa:WWW:aWWWaaa:WWWa:WWWabc:WWW:";
Pattern p = Pattern.compile("WWW[^(WWW)|^:]*:");
Matcher m = p.matcher(input);
while(m.find()) {
System.out.println(m.group());
}
Output:
WWW:
WWWa:
WWW:
WWWaaa:
WWWa:
WWWabc:
WWW:
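A caveat about the pattern above: inside square brackets, (, ), | and ^ (when not first) are ordinary characters, so [^(WWW)|^:] is simply "any character except (, W, ), |, ^ and :". It does not express "not the sequence WWW". The sketch below illustrates this with a made-up input and shows one possible alternative using a negative lookahead; the class CharClassDemo and the helper firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The class excludes '|', so the pattern cannot get past "a|b".
        System.out.println(firstMatch("WWW[^(WWW)|^:]*:", "WWWa|b:"));
        // Alternative: consume non-colon characters as long as "WWW" does not start here.
        String tempered = "WWW(?:(?!WWW)[^:])*:";
        System.out.println(firstMatch(tempered, "WWWa|b:"));
        System.out.println(firstMatch(tempered, "WWWxWWWa:"));
    }
}
```

For the sample input used in the answer, both patterns produce the same seven matches; they only differ on inputs containing characters such as |, ( or a lone W.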
