First pattern key is always not found


I want to read comments from .sql file and get the values:
<!--
#fake: some
#author: some
#ticket: ti-1232323
#fix: some fix
#release: master
#description: This is test example
-->
Code:
String text = String.join("", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("^\\s*#(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find())
{
if (matcher.group("key").equals("author")) {
author = matcher.group("value");
}
if (matcher.group("key").equals("description")) {
description = matcher.group("value");
}
}
The first key (fake in this case) is always empty. If I put author first, it is empty as well. Do you know how I can fix the regex pattern?

Use the following regex pattern:
(?<!\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))
The negative lookbehind (?<!\S) asserts that the # is preceded either by whitespace or by the start of the string, which covers the initial edge case. The negative lookahead (?![^#]) at the end of the pattern stops the value before the next # begins, or upon hitting the end of the input.
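// Join with "\n": with "" every # would directly follow a non-whitespace character and the lookbehind would fail.
String text = String.join("\n", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("(?<!\\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))", Pattern.DOTALL);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
// The value runs up to the next # (or the end of the input), so trim the trailing line break.
// Note that the last value also contains whatever follows it, such as a closing -->.
if ("author".equals(matcher.group("key"))) {
author = matcher.group("value").trim();
}
if ("description".equals(matcher.group("key"))) {
description = matcher.group("value").trim();
}
}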

If the <!-- and --> parts should be there, you could make use of the \G anchor to get consecutive matches and keep the groups.
Note that the alternatives are already inside the named capturing group (?<key>...), so you don't have to wrap them in another group. The part in group value does not need to be non-greedy, as you are matching to the end of the line.
As @Wiktor Stribiżew mentioned, you are joining the lines back without a newline, so the separate parts cannot be matched using, for example, the anchor $, which with Pattern.MULTILINE asserts the end of a line.
Pattern
(?:^<!--(?=.*(?:\R(?!-->).*)*\R-->)|\G(?!^))\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$
Explanation
(?: Non capture group
^ Start of line
<!-- Match literally
(?=.*(?:\R(?!-->).*)*\R-->) Assert an ending -->
| Or
\G(?!^) Assert the end of the previous match, not at the start
) Close group
\R# Match a unicode newline sequence and #
(?<key> Named group key, match any of the alternatives
fake|author|description|fix|ticket|release
): Match literally
(?<value>.*)$ Named group value, match any char except a newline until the end of the line
Example code
String text = String.join("\n", Files.readAllLines(file.toPath()));
String regex = "(?:^<!--(?=.*(?:\\R(?!-->).*)*\\R-->)|\\G(?!^))\\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
if (matcher.group("key").equals("author")) {
System.out.println(matcher.group("value"));
}
if (matcher.group("key").equals("description")) {
System.out.println(matcher.group("value"));
}
}
Output
some
This is test example
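To see the effect of the join separator in isolation, here is a minimal sketch. The class name HeaderComments, the method parse and the simplified pattern are my own assumptions for illustration and are not part of either answer. It joins the lines with the given separator and collects every key into a map. With "\n" all keys are found; with "" the same pattern has nothing to match, because ^ and $ then only apply at the very start and end of the text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderComments {
    // Simplified pattern (an assumption): one "#key: value" pair per line.
    private static final Pattern LINE = Pattern.compile(
            "^\\s*#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$",
            Pattern.MULTILINE);

    // Joins the lines with the given separator and returns all pairs in file order.
    static Map<String, String> parse(List<String> lines, String separator) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = LINE.matcher(String.join(separator, lines));
        while (matcher.find()) {
            values.put(matcher.group("key"), matcher.group("value"));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("<!--", "#fake: some", "#author: some",
                "#ticket: ti-1232323", "#fix: some fix", "#release: master",
                "#description: This is test example", "-->");
        System.out.println(parse(lines, "\n")); // all six keys are found
        System.out.println(parse(lines, ""));   // empty map: nothing matches
    }
}
```

Collecting into a map also avoids one if block per key: the caller can simply ask for values.get("author").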

Related

Regular Expression (regex). How to ignore or exclude everything in between?

I have this input text:
142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48
I want to use regular expression to extract 000781fe0000326f and -51.984, so the output looks like this
000781fe0000326f-51.984
I can use [0-9]{5,7}(?:[a-z][a-z0-9_]*) and ([-]?\\d*\\.\\d+)(?![-+0-9\\.]) to extract 000781fe0000326f and -51.984, respectively.
Is there a way to ignore or exclude everything between 000781fe0000326f and -51.984, that is, everything that would be captured by the non-greedy filler (.*?)?
String ref="[0-9]{5,7}(?:[a-z][a-z0-9_]*)_____([-]?\\d*\\.\\d+)(?![-+0-9\\.])";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group();
//list3.add(all);
}

For your example data you could use an alternation | to match either one of the regexes in your question and then concatenate the matches.
Note that in your regex you could write (?:[a-z][a-z0-9_]*) as [a-z][a-z0-9_]* and you don't have to escape the dot in a character class.
For example:
[0-9]{5,7}[a-z][a-z0-9_]*|-?\d*\.\d+(?![-+0-9.])
String regex = "[0-9]{5,7}[a-z][a-z0-9_]*|-?\\d*\\.\\d+(?![-+0-9.])";
String string = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = "";
while (matcher.find()) {
result += matcher.group(0);
}
System.out.println(result); // 000781fe0000326f-51.984

There's no way to combine strings together like that in pure regex, but it's easy to create a group for the first match, a group for the second match, and then use m.group(1) + m.group(2) to concatenate the two groups together and create your desired combined string.
Also note that [0-9] simplifies to \d, a character set with only one token in it simplifies to just that token, [a-z0-9_] with the i flag simplifies to \w, and there's no need to escape a . inside a character set:
String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
String ref="(\\d{5,7}(?:[a-z]\\w*)).*?((?:-?\\d*\\.\\d+)(?![-+\\d.]))";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group(1) + m.group(2);
System.out.println(all);
}

You cannot really ignore the words in between, but you can match them all.
Something like this will include all of them:
[0-9]{5,7}(?:[a-z][a-z0-9_]*)[a-zA-Z0-9_ ]*([-]?\d*\.\d+)(?![-+0-9.])
But that is not what you want.
I think the best bet is either to use two regular expressions and combine the results, or to split the string on space/tab characters and pick the n-th elements as required.
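The splitting idea could look roughly like this. It is only a sketch: the class name TokenPick and the column indexes 1 and 7 are assumptions read off the single sample line in the question, so it only holds if every input line has the same layout.

```java
class TokenPick {
    // Splits on runs of whitespace and concatenates the two wanted columns.
    // Indexes 1 and 7 are assumptions taken from the sample line.
    static String extract(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens[1] + tokens[7];
    }

    public static void main(String[] args) {
        String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
        System.out.println(extract(input)); // 000781fe0000326f-51.984
    }
}
```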

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it ?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);

One way to do it is to match the whole expression inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.

Your regex doesn't match your string because you didn't account for the opening parentheses. Also, \\w+ matches only word characters, so it cannot match the space or the & characters.
Instead you can use a negated character class [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern pattern = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
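For comparison, here is a small runnable sketch that tries the pattern from the question and the negated-character-class pattern on the same input. The class name SetVarDemo and the helper firstGroup are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetVarDemo {
    // Returns group 1 of the first match, or null when the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        // Pattern from the question: \w+ cannot match "((", so find() fails.
        System.out.println(firstGroup("SETVAR\\w+&&(\\w+)'\\)\\)", str)); // null
        // Negated character class: [^']+ skips everything up to the quote.
        System.out.println(firstGroup("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)", str)); // text2
    }
}
```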

Java - Regular Expressions matching one to another

I am trying to retrieve bits of data using RE. Problem is I'm not very fluent with RE. Consider the code.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class HTTP{
private static String getServer(httpresp){
Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); //What RE syntax do I use here?
Matcher m = p.matcher(httpresp);
if (m.find()){
return m.group(2);
public static void main(String[] args){
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; //Test data
System.out.println(getServer(testdata));
How would I get "Server:" to the next "\r\n" out which would output "Apache"? I googled around and tried myself, but have failed.

It's a one liner:
private static String getServer(String httpresp) {
// (?s) lets . match line breaks, so the regex can cover the whole response
return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
}
The trick here is two-part:
use .*?, which is a reluctant match (it consumes as little as possible while still matching)
the regex matches the whole input, but only the desired target is captured and returned using a back reference
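To illustrate why the reluctant quantifier matters, here is a small sketch. The class name ServerHeader, the extra Content-Length header in the test data and the (?s) flag are my assumptions; (?s) is needed so that . can cross the line breaks and the regex can cover the whole response.

```java
class ServerHeader {
    // Replaces the whole input with group 1. (?s) lets '.' cross line breaks.
    // valuePattern is the quantified part that captures the header value.
    static String extract(String response, String valuePattern) {
        return response.replaceAll("(?s).*Server: (" + valuePattern + ")\r\n.*", "$1");
    }

    public static void main(String[] args) {
        // Hypothetical response with one more header after Server.
        String response = "HTTP/1.1 302 Found\r\nServer: Apache\r\nContent-Length: 0\r\n\r\n";
        System.out.println(extract(response, ".*?")); // Apache
        // The greedy variant also swallows the following header line.
        System.out.println(extract(response, ".*").contains("Content-Length")); // true
    }
}
```

Keep in mind that replaceAll returns the input unchanged when nothing matches, so a response without a Server header comes back as the full response text.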

You could use a capturing group together with a positive lookahead.
Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");
Then print group 1.
Example:
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
OR
Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Output:
Apache
Explanation:
(?:\\bServer:\\s*) is a non-capturing group, written (?:...), which only matches and does not capture. \b is a word boundary, which matches between a word character and a non-word character. Server: matches the literal string Server:, and \s* matches the zero or more whitespace characters that follow.
(.*?) is a capturing group, written (...), which captures the characters matched by the pattern inside it. In our case (.*?) captures as few characters as possible, up to the point where
(?=[\r\n]+) one or more line breaks are detected. (?=...) is a positive lookahead, which asserts that the match must be followed by text matching the pattern inside the lookahead.

java regular expression lookahead non-capture but output it

I am trying to use the pattern \w(?=\w) to find 2 consecutive characters using the code below.
Although the lookahead works, I want to output the actual match without consuming it.
Here is the code:
Pattern pattern = Pattern.compile("\\w(?=\\w)");
Matcher matcher = pattern.matcher("abcde");
while (matcher.find())
{
System.out.println(matcher.group(0));
}
I want the matching output: ab bc cd de
but I can only get: a b c d e
Any idea?

The content of the lookahead has zero width, so it is not part of group zero. To do what you want, you need to explicitly capture the content of the lookahead, and then reconstruct the combined text+lookahead, like this:
Pattern pattern = Pattern.compile("\\w(?=(\\w))");
// The inner (\\w) is the added capturing group.
Matcher matcher = pattern.matcher("abcde");
while (matcher.find()) {
// Use the captured content of the lookahead below:
System.out.println(matcher.group(0) + matcher.group(1));
}
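A variant of the same idea, shown here only as a sketch (the class name OverlappingPairs is made up): put both characters inside the lookahead, so that every match is zero-width and group 1 already holds the complete pair. After an empty match, Matcher.find() continues one position further, which is what produces the overlapping results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // Every match is zero-width; the pair itself is captured inside the lookahead.
    private static final Pattern PAIR = Pattern.compile("(?=(\\w\\w))");

    // Returns all overlapping two-character pairs of word characters.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("abcde")); // [ab, bc, cd, de]
    }
}
```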

Java regex patterns

I need help with this matter. Look at the following regex:
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);
I want to look for words like this: "home-made", "aaaa-bbb" and not "aaa - bbb", but not
"aaa--aa--aaa". Basically, I want the following:
word - hyphen - word.
It is working for everything, except this pattern will pass: "aaa--aaa--aaa" and shouldn't. What regex will work for this pattern?

You can remove the backslash from your expression:
"[A-Za-z]+-[A-Za-z]+"
The following code should work then
Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
boolean match = matcher.matches();
Note that you can use Matcher.matches() instead of Matcher.find() in order to check the complete string for a match.
If instead you want to look inside a string using Matcher.find() you can use the expression
"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"
but note that then only words separated by whitespace will be found (i.e. no words like aaa-bbb.). To capture also this case you can then use lookbehinds and lookaheads:
"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"
which reads as
(?<![A-Za-z-]) // before the match there must not be a letter or -
[A-Za-z]+ // the match itself consists of one or more A-Z
- // followed by a -
[A-Za-z]+ // followed by one or more A-Z
(?![A-Za-z-]) // but afterwards not by any A-Z or -
An example:
Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
System.out.println(matcher.group()); // => home-made
}
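To check several words in one sentence, the lookaround version can be run in a find() loop. This is a sketch under my own naming (class HyphenWords, method findAll); the sample sentence is assembled from the words in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenWords {
    // word - hyphen - word, not preceded or followed by a letter or another hyphen
    private static final Pattern WORD = Pattern.compile(
            "(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");

    // Returns every hyphenated word found in the text, in order of appearance.
    static List<String> findAll(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // "aaa--aa--aaa" and "aaa - bbb" are skipped.
        System.out.println(findAll("home-made aaa--aa--aaa aaaa-bbb aaa - bbb"));
        // [home-made, aaaa-bbb]
    }
}
```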

Actually, I can't reproduce the problem with your expression if I use single words in the String. As cleared up in the comments, though, the String s contains a whole sentence, which first has to be tokenised into words that are then matched one by one.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExp {
private static void match(String s) {
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("'" + s + "' match");
} else {
System.out.println("'" + s + "' doesn't match");
}
}
/**
* @param args
*/
public static void main(String[] args) {
match(" -home-made");
match("home-made");
match("aaaa-bbb");
match("aaa - bbb");
match("aaa--aa--aaa");
match("home--home-home");
}
}
The output is:
' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match
