1) Pattern pattern = Pattern.compile("34238");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida, 34238");
if (matcher.find()) {
System.out.println("ok");
}
2) Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
Matcher matcher = pattern.matcher("34238");
if (matcher.find()) {
System.out.println("ok");
}
Both of the above blocks print: ok
But the following code does not print anything:
3) Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida, 34238");
if (matcher.find()) {
System.out.println("ok");
}
Why doesn't this print ok? I am using the same pattern here as well.
Although the pattern is the same, the input strings are different:
In your second example, the input string consists entirely of a zip code, so the ^...$ expression matches.
The third example does not start with the zip code, so the ^ anchor prevents your regex from matching.
^ and $ anchors are used when you want your expression to match the entire input line. When you want to match at the beginning, keep ^ and remove $; when you want to match at the end, remove ^ and keep $; when you want to match anywhere inside the string, remove both anchors.
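A minimal sketch of the four anchor combinations described above, each run with find() against the address string from the question. The class name AnchorDemo and the helper found are hypothetical, introduced only for illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final String ADDRESS = "6003 Honore Ave Suite 101 Sarasota Florida, 34238";

    // Returns true if the regex matches anywhere in the input (find() semantics)
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String zip = "[0-9]{5}(?:-[0-9]{4})?";
        // Both anchors: the whole input must be a zip code -> false
        System.out.println(found("^" + zip + "$", ADDRESS));
        // Start anchor only: the input must begin with a zip code -> false ("6003 " has only 4 digits)
        System.out.println(found("^" + zip, ADDRESS));
        // End anchor only: the input must end with a zip code -> true
        System.out.println(found(zip + "$", ADDRESS));
        // No anchors: a zip code anywhere in the input -> true
        System.out.println(found(zip, ADDRESS));
    }
}
```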
The code is fine and works as expected. Blocks 2) and 3) in your question use the same regex but different input strings.
However, if you just want to check whether a string contains a US zip code, the problem is that your regex uses anchors, so it only matches input that starts and ends with a zip code.
The only strings that match your regex are ones like 34238 or 34238-1234; it won't match something 12345 something.
If you remove the anchors, then you will match whatever 12345 whatever:
// Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
// ^--------- Here -------^
Pattern pattern = Pattern.compile("[0-9]{5}(?:-[0-9]{4})?");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida, 34238");
if (matcher.find()) {
System.out.println("ok");
}
Btw, if you just want to check whether a string contains a zip code, you can also use String.matches(..). It must match the whole string, hence the leading and trailing .*:
String str = "6003 Honore Ave Suite 101 Sarasota Florida, 34238";
if (str.matches(".*[0-9]{5}(?:-[0-9]{4})?.*")) {
System.out.println("ok");
}
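The difference between String.matches(..) and find() comes down to which Matcher method runs. The sketch below (hypothetical class name MatcherModes) applies the unanchored zip-code pattern to the address with find(), matches() and lookingAt(). Only find() succeeds, because matches() needs the entire input to match and lookingAt() needs a match that starts at the first character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherModes {
    static final Pattern ZIP = Pattern.compile("[0-9]{5}(?:-[0-9]{4})?");
    static final String ADDRESS = "6003 Honore Ave Suite 101 Sarasota Florida, 34238";

    public static void main(String[] args) {
        Matcher m = ZIP.matcher(ADDRESS);
        System.out.println("find:      " + m.find());      // match anywhere -> true
        m.reset();
        System.out.println("matches:   " + m.matches());   // entire input -> false
        m.reset();
        System.out.println("lookingAt: " + m.lookingAt()); // must start at index 0 -> false
        // String.matches(regex) behaves like matches(), hence the .* on both sides -> true
        System.out.println("String.matches: " + ADDRESS.matches(".*" + ZIP.pattern() + ".*"));
    }
}
```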
Related
I want to extract version numbers from a string.
That is, I want to match numbers that appear between:
_ and /
/ and /
I have prepared the following regex, but it doesn't work as expected:
.*[\/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})\/.*
For the following example, the regex should match twice:
Input: name_1.1.1/9.10.0/abc. Expected result: 1.1.1 and 9.10.0.
However, my regex returns only 9.10.0; 1.1.1 is omitted. Do you have any idea what is wrong?
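To see why only one version comes back, here is a minimal sketch that runs the question's pattern (with the unneeded \/ escapes dropped) in a find() loop. The class and method names are hypothetical. It also shows a second problem: even without the .*, matching the trailing / consumes the slash that 9.10.0 needs as its leading delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalVersionRegex {
    // The pattern from the question, as a Java string literal
    static final Pattern ORIGINAL =
            Pattern.compile(".*[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})/.*");
    // Same pattern without the surrounding .*, still consuming the trailing /
    static final Pattern NO_DOT_STAR =
            Pattern.compile("[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})/");

    // Collects group 1 of every successive find()
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "name_1.1.1/9.10.0/abc";
        // Greedy leading .* backtracks only to the LAST version; trailing .* eats the rest
        System.out.println(findAll(ORIGINAL, input));    // [9.10.0]
        // The / after 1.1.1 is consumed, so 9.10.0 has no leading delimiter left
        System.out.println(findAll(NO_DOT_STAR, input)); // [1.1.1]
    }
}
```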
You could just split the string on _ or /, and then retain components which appear to be versions:
List<String> versions = new ArrayList<>();
String input = "name_1.1.1/9.10.0/abc";
String[] parts = input.split("[_/]");
for (String part : parts) {
if (part.matches("\\d+(?:\\.\\d+)*")) {
versions.add(part);
}
}
System.out.println(versions); // [1.1.1, 9.10.0]
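Your pattern finds only one version for two reasons. The leading .* is greedy, so it backtracks only as far as the last version, and the trailing .* then consumes the rest of the input, so find() succeeds just once. Omit the .* and assert the / at the end with a lookahead instead of matching it; otherwise the / after 1.1.1 is consumed and can no longer serve as the leading delimiter of 9.10.0.
Note that you don't have to escape the / in a Java regex.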
[/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})(?=/)
Example code
String regex = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";
String string = "name_1.1.1/9.10.0/abc";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1.1.1
9.10.0
Another option is to use a positive lookbehind to assert a / or _ on the left, so that the match itself is just the version and no capturing group is needed.
(?<=[/_])\d{1,2}[.]\d{1,2}[.]\d{1,2}(?=/)
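A minimal Java sketch of the lookbehind variant; the class and method names are hypothetical. Because both lookarounds are zero-width, group() returns just the version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindVersions {
    // (?<=[/_]) and (?=/) check the delimiters without including them in the match
    static final Pattern VERSION =
            Pattern.compile("(?<=[/_])\\d{1,2}[.]\\d{1,2}[.]\\d{1,2}(?=/)");

    static List<String> versions(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = VERSION.matcher(input);
        while (m.find()) {
            out.add(m.group()); // whole match; no capturing group needed
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(versions("name_1.1.1/9.10.0/abc")); // [1.1.1, 9.10.0]
    }
}
```

The lookbehind may inspect the / that ended the previous match, which is why 9.10.0 is still found.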
Alternatively, match only the version number itself, wherever it appears:
String regex = "(\\d+\\.\\d+\\.\\d+)"; // \\. matches a literal dot; an unescaped . would match any character
String string = "name_1.1.1/9.10.0/abc";
String string2 = "randomversion4.5.6/09.7.8_9.88.9";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
Matcher matcher2 = pattern.matcher(string2);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
while (matcher2.find()) {
System.out.println(matcher2.group(1));
}
Out:
1.1.1
9.10.0
4.5.6
09.7.8
9.88.9
Just write a regex for what you want to match, in this case only the version number.
A regex can be used either to match a whole string or to find a substring inside a string.
When you use it to find a substring, you don't have to describe the rest of the input (the file name or whatever surrounds the version); match only what you want to find.
This way you can find the versions no matter what string they appear in.
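When matching only the version, it matters that the dots are escaped: an unescaped . matches any character, so a bare digit-dot-digit pattern also accepts text that is not a version. A small sketch (hypothetical class name, made-up input "build 1a2b3"):

```java
import java.util.regex.Pattern;

class DotEscaping {
    // Unescaped: each . matches ANY character
    static final Pattern LOOSE = Pattern.compile("\\d+.\\d+.\\d+");
    // Escaped: each \. matches only a literal dot
    static final Pattern STRICT = Pattern.compile("\\d+\\.\\d+\\.\\d+");

    public static void main(String[] args) {
        String notAVersion = "build 1a2b3";
        System.out.println(LOOSE.matcher(notAVersion).find());           // true (false positive)
        System.out.println(STRICT.matcher(notAVersion).find());          // false
        System.out.println(STRICT.matcher("randomversion4.5.6").find()); // true
    }
}
```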
I want to read comments from a .sql file and extract the values:
<!--
#fake: some
#author: some
#ticket: ti-1232323
#fix: some fix
#release: master
#description: This is test example
-->
Code:
String text = String.join("", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("^\\s*#(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find())
{
if (matcher.group("key").equals("author")) {
author = matcher.group("value");
}
if (matcher.group("key").equals("description")) {
description = matcher.group("value");
}
}
The first key (fake in this case) is always empty. If I put author as the first key, it is empty too. Do you know how I can fix the regex pattern?
Use the following regex pattern:
(?<!\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))
The negative lookbehind (?<!\S) used above asserts that the # is preceded by whitespace or sits at the start of the string, covering the initial edge case. The negative lookahead (?![^#]) at the end of the pattern stops the value right before the next # term begins, or at the end of the input.
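A tiny sketch of how (?<!\S) behaves; the class name and sample strings are hypothetical. It also shows why the lines must keep some whitespace separator: when they are joined with "", each # directly follows the previous value, and the lookbehind fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotAfterNonSpace {
    // (?<!\S): the # must be at the start of the input or right after whitespace
    static final Pattern TAG = Pattern.compile("(?<!\\S)#\\w+");

    static List<String> tags(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tags("#a x#b #c"));      // [#a, #c]
        System.out.println(tags("some#author: x")); // [] because # follows 'e'
    }
}
```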
String text = String.join("\n", Files.readAllLines(file.toPath())); // keep a whitespace separator so (?<!\S) can match before each #
Pattern pattern = Pattern.compile("(?<!\\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))", Pattern.DOTALL);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
if ("author".equals(matcher.group("key"))) {
author = matcher.group("value");
}
if ("description".equals(matcher.group("key"))) {
description = matcher.group("value");
}
}
If the <!-- and --> delimiters must be present, you could use the \G anchor to get consecutive matches and keep the groups.
Note that the alternatives are already inside the named capturing group (?<key>, so you don't have to wrap them in another group. The value group does not need to be non-greedy, because it matches up to the end of the line anyway.
As Wiktor Stribiżew mentioned, you are joining the lines back together without a newline, so the separate parts cannot be matched using, for example, the $ anchor asserting the end of a line.
Pattern
(?:^<!--(?=.*(?:\R(?!-->).*)*\R-->)|\G(?!^))\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$
Explanation
(?: Non capture group
^ Start of line
<!-- Match literally
(?=.*(?:\R(?!-->).*)*\R-->) Assert an ending -->
| Or
\G(?!^) Assert the end of the previous match, not at the start
) Close group
\R# Match a unicode newline sequence and #
(?<key> Named group key, match any of the alternatives
fake|author|description|fix|ticket|release
): Match literally
(?<value>.*)$ Named group value: match any char except a newline until the end of the line
Example code
String text = String.join("\n", Files.readAllLines(file.toPath()));
String regex = "(?:^<!--(?=.*(?:\\R(?!-->).*)*\\R-->)|\\G(?!^))\\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
if (matcher.group("key").equals("author")) {
System.out.println(matcher.group("value"));
}
if (matcher.group("key").equals("description")) {
System.out.println(matcher.group("value"));
}
}
Output
some
This is test example
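Following the point about joining the lines without a newline, the question's own pattern already works once the lines are joined with "\n". A minimal sketch: LINES is a hypothetical in-memory stand-in for Files.readAllLines(file.toPath()), and the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JoinSeparatorDemo {
    // The pattern from the question, unchanged
    static final Pattern KEY_VALUE = Pattern.compile(
            "^\\s*#(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$",
            Pattern.MULTILINE);

    // Hypothetical stand-in for Files.readAllLines(file.toPath())
    static final List<String> LINES = Arrays.asList(
            "<!--",
            "#fake: some",
            "#author: some",
            "#ticket: ti-1232323",
            "#fix: some fix",
            "#release: master",
            "#description: This is test example",
            "-->");

    // Collects every key/value pair in order of appearance
    static Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            result.put(m.group("key"), m.group("value"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Joined with "": one long line, so ^ only matches before "<!--" -> {}
        System.out.println(extract(String.join("", LINES)));
        // Joined with "\n": ^ and $ (MULTILINE) see each original line again
        System.out.println(extract(String.join("\n", LINES)));
    }
}
```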
I would like to test whether a string contains insert and name, with any intervening characters. If it does, I would like to print the match.
With the code below, only the third pattern matches, and the entire line is printed. How can I match only insert...name?
String x = "aaa insert into name sdfdf";
Matcher matcher = Pattern.compile("insert.*name").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
matcher = Pattern.compile(".*insert.*name").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
matcher = Pattern.compile(".*insert.*name.*").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
Try using a capturing group, like this: .*(insert.*name).*
Matcher matcher = Pattern.compile(".*(insert.*name).*").matcher(x);
if (matcher.matches()) {
System.out.print(matcher.group(1));
//-----------------------------^
}
Or, in your case, you can simply use:
x = x.replaceAll(".*(insert.*name).*", "$1");
Both of them print:
insert into name
You just need to use find() instead of matches() in your code:
String x = "aaa insert into name sdfdf";
Matcher matcher = Pattern.compile("insert.*?name").matcher(x);
if (matcher.find())
System.out.print(matcher.group(0));
matches() expects you to match entire input string whereas find() lets you match your regex anywhere in the input.
I also suggest using .*? instead of .*, in case your input contains multiple insert ... name pairs.
This code sample will output:
insert into name
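To illustrate the .* versus .*? advice, here is a sketch with a made-up input containing two insert ... name pairs (the class and helper names are hypothetical). The greedy version runs to the last name; the lazy version stops at the first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns the first match found anywhere in the input, or null
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String x = "aaa insert into name sdfdf, insert other name";
        System.out.println(firstMatch("insert.*name", x));  // insert into name sdfdf, insert other name
        System.out.println(firstMatch("insert.*?name", x)); // insert into name
    }
}
```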
Just use multiple positive lookaheads:
(?=.*insert)(?=.*name).+
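Note that this pattern answers the "does the string contain both words" part: each lookahead scans from the start without consuming anything, so the words may appear in any order, but the match itself covers the whole line rather than just insert ... name. A sketch using matches() (hypothetical class and method names):

```java
import java.util.regex.Pattern;

class BothWords {
    // Two zero-width checks from position 0, then .+ consumes the whole line
    static final Pattern BOTH = Pattern.compile("(?=.*insert)(?=.*name).+");

    static boolean containsBoth(String s) {
        return BOTH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsBoth("aaa insert into name sdfdf")); // true
        System.out.println(containsBoth("name first, then insert"));     // true
        System.out.println(containsBoth("insert only"));                 // false
    }
}
```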
I am trying to use the pattern \w(?=\w) to find 2 consecutive characters.
Although the lookahead works, I want to output the actual match without consuming it.
Here is the code:
Pattern pattern = Pattern.compile("\\w(?=\\w)");
Matcher matcher = pattern.matcher("abcde");
while (matcher.find())
{
System.out.println(matcher.group(0));
}
I want the matching output: ab bc cd de
but I only get a b c d
Any idea?
The content of the lookahead has zero width, so it is not part of group zero. To do what you want, you need to explicitly capture the content of the lookahead, and then reconstruct the combined text+lookahead, like this:
Pattern pattern = Pattern.compile("\\w(?=(\\w))");
// ^ ^
// | |
// Add a capturing group
Matcher matcher = pattern.matcher("abcde");
while (matcher.find()) {
// Use the captured content of the lookahead below:
System.out.println(matcher.group(0) + matcher.group(1));
}
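A common variant of the same idea, shown here as an alternative rather than as the answer's method, puts the entire pair inside the lookahead. Each match is then empty, and after an empty match find() resumes one character further, so the pairs overlap naturally. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // The whole pair is captured inside a zero-width lookahead
    static final Pattern PAIR = Pattern.compile("(?=(\\w\\w))");

    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // group 0 is empty; group 1 holds the pair
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("abcde")); // [ab, bc, cd, de]
    }
}
```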
This is my input string. I want to break it up into 5 parts according to the regex below so that I can print out the 5 groups, but I always get a "No match found" error. What am I doing wrong?
String content="beit Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,,<m>Surface Transportation Extension Act of 2012.,<xm>";
Pattern regEx = Pattern.compile("^(.*)(<m>)(.*)(<xm>)(.*)$", Pattern.MULTILINE);
System.out.println(regEx.matcher(content).group(1));
System.out.println(regEx.matcher(content).group(2));
System.out.println(regEx.matcher(content).group(3));
System.out.println(regEx.matcher(content).group(4));
System.out.println(regEx.matcher(content).group(5));
Pattern regEx = Pattern.compile("^(.*)(<m>)(.*)(<xm>)(.*)$", Pattern.MULTILINE);
Matcher matcher = regEx.matcher(content);
if (matcher.find()) { // calling find() is important
// if the regex matches multiple times, use while instead of if
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
System.out.println(matcher.group(5));
} else {
System.out.println("Regex didn't match");
}
Your original code never actually runs the regex: each call to regEx.matcher(content) creates a fresh Matcher, and calling group() on it before find() or matches() throws IllegalStateException ("No match found"). Call regEx.matcher() once, call find() (or matches()) on it, and then pull all the groups out of that one Matcher. Note also that group 5 will be empty, because there is no content after the <xm>.
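A short sketch reproducing both points with a shortened copy of the question's input (the shortened CONTENT string and the class name are hypothetical): group() on a fresh Matcher throws, and after find() group 5 is empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStateDemo {
    static final Pattern REGEX =
            Pattern.compile("^(.*)(<m>)(.*)(<xm>)(.*)$", Pattern.MULTILINE);
    // Shortened version of the question's input
    static final String CONTENT =
            "beit Be it enacted,,<m>Surface Transportation Extension Act of 2012.,<xm>";

    public static void main(String[] args) {
        try {
            // A fresh Matcher has not run the regex yet, so group() has nothing to report
            REGEX.matcher(CONTENT).group(1);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }

        Matcher m = REGEX.matcher(CONTENT); // create the Matcher once
        if (m.find()) {                     // actually run the regex
            System.out.println(m.group(3));           // Surface Transportation Extension Act of 2012.,
            System.out.println(m.group(5).isEmpty()); // true: nothing follows <xm>
        }
    }
}
```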