getting NULL values from Java regex Matcher with a found pattern

I'm trying to get the following regex to work on my String:
Pattern Regex = Pattern.compile("(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher(myString);
while (RegexMatcher.find()) {
...
}
It basically splits a string like 1day 3 hours into matched regex groups.
The problem I'm having is that inside the while loop, calls to RegexMatcher.group(i) always return null, meaning the groups were not found in the string.
When I try to output RegexMatcher.group(0), it returns an empty string, even though myString definitely contains something like "hello 1d", which should return at least the first group as "1" and the second as "d".
I've checked and double-checked the regex and it seems to be OK. No idea what's wrong here.
Thanks for any ideas :-)

For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
If the match was successful but the group specified failed to match any part of the input sequence, then null is returned. Note that some groups, for example (a*), match the empty string. This method will return the empty string when such a group successfully matches the empty string in the input.
If you want to iterate over all the matches, you can write code like this:
Pattern Regex = Pattern
.compile(
"(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE
| Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher("1 d 3 hours");
while (RegexMatcher.find()) {
System.out.println(RegexMatcher.group());
}
Note: m.group() is equivalent to m.group(0)
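A likely reason for the empty group(0) in the question: every group in the pattern is optional, so the pattern as a whole can match the empty string. find() then reports a zero-length match at each position where no duration starts (for example at the "h" of "hello 1d"), and all capturing groups are null for those matches. Below is a minimal sketch that simply skips the empty matches. The class and method names are made up for illustration, and the CANON_EQ and UNICODE_CASE flags are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationMatchDemo {
    // Same pattern as in the question; only CASE_INSENSITIVE is kept.
    private static final Pattern DURATION = Pattern.compile(
        "(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?"
        + "(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every non-empty match found in the input.
    static List<String> nonEmptyMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = DURATION.matcher(input);
        while (m.find()) {
            if (m.group().isEmpty()) {
                continue; // zero-length match: all groups are null here
            }
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyMatches("hello 1d"));
        System.out.println(nonEmptyMatches("1 d 3 hours"));
    }
}
```

Inside the loop, m.group(1) and m.group(2) hold the number and unit of the days part whenever that part took part in the match; groups for parts that did not match are still null, so check for null before using them.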

Related

First pattern key is always not found

I want to read comments from .sql file and get the values:
<!--
#fake: some
#author: some
#ticket: ti-1232323
#fix: some fix
#release: master
#description: This is test example
-->
Code:
String text = String.join("", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("^\\s*#(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find())
{
if (matcher.group("key").equals("author")) {
author = matcher.group("value");
}
if (matcher.group("key").equals("description")) {
description = matcher.group("value");
}
}
The first key (fake in this case) is always empty. If I put author as the first key, it's empty again. Do you know how I can fix the regex pattern?

Use the following regex pattern:
(?<!\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))
The negative lookbehind (?<!\S) used above asserts that the # is preceded by either whitespace or the start of the string, covering the initial edge case. The negative lookahead (?![^#]) at the end of the pattern stops the value before the next # term begins, or upon hitting the end of the input.
String text = String.join("", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("(?<!\\S)#(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^#]))", Pattern.DOTALL);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
if ("author".equals(matcher.group("key"))) {
author = matcher.group("value");
}
if ("description".equals(matcher.group("key"))) {
description = matcher.group("value");
}
}

If the <!-- and --> parts should be there, you could make use of the \G anchor to get consecutive matches and keep the groups.
Note that the alternatives are already in a named capturing group (?<key>...), so you don't have to wrap them in another group. The part in group value does not have to be non-greedy, as you are matching to the end of the line anyway.
As @Wiktor Stribiżew mentioned, you are joining the lines back together without a newline, so the separate parts will not be matched using, for example, the anchor $, which with Pattern.MULTILINE asserts the end of a line.
Pattern
(?:^<!--(?=.*(?:\R(?!-->).*)*\R-->)|\G(?!^))\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$
Explanation
(?:                            Non-capturing group
  ^                            Start of line
  <!--                         Match literally
  (?=.*(?:\R(?!-->).*)*\R-->)  Assert that a closing --> follows
  |                            Or
  \G(?!^)                      Assert the end of the previous match, not at the start
)                              Close group
\R#                            Match a Unicode newline sequence and #
(?<key>                        Named group key, match any of the alternatives
  fake|author|description|fix|ticket|release
):                             Close group key, then match ": " literally
(?<value>.*)$                  Named group value, match any char except a newline until the end of the line
Regex demo | Java demo
Example code
String text = String.join("\n", Files.readAllLines(file.toPath()));
String regex = "(?:^<!--(?=.*(?:\\R(?!-->).*)*\\R-->)|\\G(?!^))\\R#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
if (matcher.group("key").equals("author")) {
System.out.println(matcher.group("value"));
}
if (matcher.group("key").equals("description")) {
System.out.println(matcher.group("value"));
}
}
Output
some
This is test example
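If the surrounding <!-- and --> do not need to be checked, the observation about joining the lines with a newline can be sketched on its own. The example below assumes the file content shown in the question, uses an inline string instead of reading a file, and keeps a simple ^...$ pattern with Pattern.MULTILINE. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlCommentKeysDemo {
    // With MULTILINE, ^ and $ match at the start and end of every line.
    private static final Pattern KEY_VALUE = Pattern.compile(
        "^\\s*#(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$",
        Pattern.MULTILINE);

    // Hypothetical helper: collects all key/value pairs in order of appearance.
    static Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            values.put(m.group("key"), m.group("value"));
        }
        return values;
    }

    public static void main(String[] args) {
        // Lines joined with "\n", not "", so the line anchors have something to match.
        String text = String.join("\n",
            "<!--", "#fake: some", "#author: some", "#ticket: ti-1232323",
            "#fix: some fix", "#release: master",
            "#description: This is test example", "-->");
        Map<String, String> values = parse(text);
        System.out.println(values.get("author"));
        System.out.println(values.get("description"));
    }
}
```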

Assign value to variable from ArrayList with regex

I need to extract the first 6 characters (ICAO) and assign it to a local variable from an ArrayList (taken from https://opensky-network.org/apidoc/rest.html) that looks like this:
["1234a5","ABC123","Schlumpfhausen",1572255699,1572255699,8.9886,48.3756,6278.88,false,155.16,216.64,-6.18,null,6484.62,"3026",false,0
"44035c","LDM87NM ","Austria",1572430045,1572430052,9.2009,48.6891,null,true,0,163,null,null,null,"6463",false,0
.
.
.
]
It is required to use java.util.regex to solve this problem.
String icao=null;
Pattern icaoPattern = Pattern.compile("([+-]*[a-zA-Z0-9.]*)");
Matcher matcher = icaoPattern.matcher(sentence.getAircraftJson());
if(matcher.find()) {
icao=matcher.group(1);
}
The outcome should be printed like this:
ICAO: 1234a5 | Callsign: ABC123 | ...
ICAO: 44035c | Callsign: LDM87NM| ...
but all I get is
ICAO: | Callsign: | ...
ICAO: | Callsign: | ...

You want group(0) for the first match. The regex also needs some attention unless you can be very sure that the content is always clean. I removed the parentheses, changed the first * to ? and the other to +, but there are a million things that could be done here to make it safer:
Pattern icaoPattern = Pattern.compile("[+-]?[a-zA-Z0-9.]+");
Matcher matcher = icaoPattern.matcher(sentence);
while(matcher.find()) {
System.out.println(matcher.group(0));
}

You can try the following regex to get every field, whether it is a double-quoted string or a run of any characters other than commas:
Pattern icaoPattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = icaoPattern.matcher(result);
while (matcher.find()) {
System.out.println(matcher.group());
}
In your code you run matcher.find() once (in if() statement). You should run it until it does not find any other match, thus the while() loop.
You can later use String methods to remove the leading and trailing quotes, etc.

Your pattern would match the empty string "" at the beginning of the sentence.
Also, the requirement of taking the first 6 characters is tricky when only using regex.
The simplest would be:
Pattern icaoPattern = Pattern.compile("("
+ "[a-zA-Z0-9.]{6}"
+ "|[-+][a-zA-Z0-9.]{5}"
+ "|[-+]{2}[a-zA-Z0-9.]{4}"
+ "|[-+]{3}[a-zA-Z0-9.]{3}"
+ "|[-+]{4}[a-zA-Z0-9.]{2}"
+ "|[-+]{5}[a-zA-Z0-9.]"
+ "|[-+]{6}"
+ ")");
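As an alternative sketch, the ICAO code and callsign can be read with capturing groups anchored at the start of each record. This relies on assumptions that are not confirmed by the question: each record starts on its own line (optionally preceded by the opening bracket), and the first two fields are quoted strings, the first being exactly 6 letters or digits. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IcaoDemo {
    // Assumed record layout: ["icao24","callsign",... with one record per line.
    // Group 1 is the 6-character ICAO code, group 2 the callsign.
    private static final Pattern RECORD_START = Pattern.compile(
        "^\\[?\"([A-Za-z0-9]{6})\",\"([^\"]*)\"", Pattern.MULTILINE);

    // Hypothetical helper: one output line per record that matches the assumed layout.
    static List<String> describe(String json) {
        List<String> lines = new ArrayList<>();
        Matcher m = RECORD_START.matcher(json);
        while (m.find()) {
            // trim() drops the padding that callsigns such as "LDM87NM " carry
            lines.add("ICAO: " + m.group(1) + " | Callsign: " + m.group(2).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        String json = "[\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699\n"
                    + "\"44035c\",\"LDM87NM \",\"Austria\",1572430045\n]";
        for (String line : describe(json)) {
            System.out.println(line);
        }
    }
}
```

Because the pattern requires at least the quoted 6-character field, it cannot produce the empty match that the original pattern finds at the very first character.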

Regular Expression (regex). How to ignore or exclude everything in between?

I have this input text:
142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48
I want to use regular expression to extract 000781fe0000326f and -51.984, so the output looks like this
000781fe0000326f-51.984
I can use [0-9]{5,7}(?:[a-z][a-z0-9_]*) and ([-]?\\d*\\.\\d+)(?![-+0-9\\.]) to extract 000781fe0000326f and -51.984, respectively.
Is there a way to ignore or exclude everything between 000781fe0000326f and -51.984? That is, to ignore everything that would be captured by the non-greedy filler (.*?)?
String ref="[0-9]{5,7}(?:[a-z][a-z0-9_]*)_____([-]?\\d*\\.\\d+)(?![-+0-9\\.])";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group();
//list3.add(all);
}

For your example data, you might use an alternation | to match either one of the regexes in your question and then concatenate the matches.
Note that in your regex you could write (?:[a-z][a-z0-9_]*) as [a-z][a-z0-9_]* and you don't have to escape the dot in a character class.
For example:
[0-9]{5,7}[a-z][a-z0-9_]*|-?\d*\.\d+(?![-+0-9.])
Regex demo
String regex = "[0-9]{5,7}[a-z][a-z0-9_]*|-?\\d*\\.\\d+(?![-+0-9.])";
String string = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = "";
while (matcher.find()) {
result += matcher.group(0);
}
System.out.println(result); // 000781fe0000326f-51.984
Demo Java

There's no way to combine strings together like that in pure regex, but it's easy to create a group for the first match, a group for the second match, and then use m.group(1) + m.group(2) to concatenate the two groups together and create your desired combined string.
Also note that [0-9] simplifies to \d, a character set with only one token in it simplifies to just that token, [a-z0-9_] with the i flag simplifies to \w, and there's no need to escape a . inside a character set:
String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
String ref="(\\d{5,7}(?:[a-z]\\w*)).*?((?:-?\\d*\\.\\d+)(?![-+\\d.]))";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group(1) + m.group(2);
System.out.println(all);
}

You cannot really ignore the words in between. You can include them all.
Something like this will include all of them:
[0-9]{5,7}(?:[a-z][a-z0-9_]*)[a-zA-Z0-9_ ]*([-]?\d*\.\d+)(?![-+0-9.])
But that is not what you want.
I think the best bet is either to use 2 regular expressions and combine the results, or to split the string on space/tab characters and pick the nth elements as required.
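The splitting approach could look like the sketch below. It assumes that the layout of the line is fixed, so that the hex id is always the 2nd whitespace-separated field and the decimal value the 8th, as in the sample input. The class and method names are made up for illustration.

```java
public class SplitFieldsDemo {
    // Hypothetical helper: concatenates the 2nd and 8th whitespace-separated fields.
    static String idAndValue(String line) {
        String[] fields = line.trim().split("\\s+");
        // Assumption: fixed column positions (index 1 = hex id, index 7 = decimal value).
        return fields[1] + fields[7];
    }

    public static void main(String[] args) {
        String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
        System.out.println(idAndValue(input)); // 000781fe0000326f-51.984
    }
}
```

This avoids regex backtracking questions entirely, but it throws an ArrayIndexOutOfBoundsException on lines with fewer than 8 fields, so real input would need a length check.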

Single pattern to match digits in a SMS message with 2 allowed formats

I am extracting digits from an SMS, and I want to extract only the digits in either one of these formats:
ID is xxx
For User: yyyy ID:xxxx
When I use "\\d+" it extracts the username (yyyy) instead of xxxx.
I also tried the pattern \d+ | [ID:]\d+ but it only works for the SMS with a username, not for the first type.
Is there any way to write a regular expression so that, if one pattern is not matched, it checks for another pattern in Android?
Also, I tried 2 different patterns with an if and else if, but that didn't work either. For example:
public Pattern p = Pattern.compile("\\d+");
public Pattern q=Pattern.compile("[ID:]\\d+");
if (msgbody.contains("ID")) {
final Matcher m = p.matcher(msgbody);
final Matcher n = q.matcher(msgbody);
if(m.find()){
// first pattern p matched.
} else if(n.find()){
// Second Pattern q matched
}
}
}

While it is perfectly possible to test as many regex matches as you want, both of your test cases match \\d+, so the first branch is always taken and you never test against the 2nd expression (which wouldn't achieve what you want anyway, because [ID:] is a character class that matches a single I, D or : character).
You can match both ID is xxx and For User: yyyy ID:xxxx with a single regex.
Use the following expression:
^(?:For User: \\S+ ID:|ID is )(\\d+)$
(?: ... | ... ) is a non-capturing group with 2 alternatives.
For User: \\S+ ID: matches the literal text for your second case, with any username that does not contain spaces.
ID is matches your first case literally.
(\\d+) matches a number, capturing the match (which we can later reference as m.group(1)).
Code
String msgbody = "ID is 12345"; //For testing purposes
Pattern idPatt = Pattern.compile("^(?:For User: \\S+ ID:|ID is )(\\d+)$");
Matcher m = idPatt.matcher(msgbody);
if (m.find()) {
//Print the text matched by the first group (in parentheses)
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("Invalid message body");
}
ideone demo
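To see both message formats side by side, here is a small sketch that runs the expression above against each of them and also shows what the character class from the question actually finds. The sample messages and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmsIdDemo {
    private static final Pattern ID_PATTERN =
        Pattern.compile("^(?:For User: \\S+ ID:|ID is )(\\d+)$");

    // Hypothetical helper: returns the captured ID, or null if the message has neither format.
    static String extractId(String msg) {
        Matcher m = ID_PATTERN.matcher(msg);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: returns the first match of an arbitrary regex, or null.
    static String firstMatch(String regex, String msg) {
        Matcher m = Pattern.compile(regex).matcher(msg);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("ID is 12345"));            // 12345
        System.out.println(extractId("For User: 1234 ID:5678")); // 5678
        System.out.println(extractId("Hello 42"));               // null

        // [ID:] matches ONE of the characters I, D or ':' and then digits,
        // so it only finds ":5678" and nothing at all in "ID is 123".
        System.out.println(firstMatch("[ID:]\\d+", "For User: 1234 ID:5678")); // :5678
        System.out.println(firstMatch("[ID:]\\d+", "ID is 123"));              // null
    }
}
```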

java regular expression lookahead non-capture but output it

I am trying to use the pattern \w(?=\w) to find 2 consecutive characters using the following code.
Although the lookahead works, I want to output the text it matched without consuming it.
Here is the code:
Pattern pattern = Pattern.compile("\\w(?=\\w)");
Matcher matcher = pattern.matcher("abcde");
while (matcher.find())
{
System.out.println(matcher.group(0));
}
I want the matching output: ab bc cd de
but I can only get: a b c d
Any idea?

The content of the lookahead has zero width, so it is not part of group zero. To do what you want, you need to explicitly capture the content of the lookahead, and then reconstruct the combined text+lookahead, like this:
Pattern pattern = Pattern.compile("\\w(?=(\\w))");
// ^ ^
// | |
// Add a capturing group
Matcher matcher = pattern.matcher("abcde");
while (matcher.find()) {
// Use the captured content of the lookahead below:
System.out.println(matcher.group(0) + matcher.group(1));
}
Demo on ideone.
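A variation on the same idea, shown here only as a sketch: put both characters inside the lookahead and capture them there. The overall match is then zero-width, and group(1) holds the whole pair, so nothing has to be concatenated. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingPairsDemo {
    // The lookahead consumes nothing, so consecutive matches can overlap.
    private static final Pattern PAIR = Pattern.compile("(?=(\\w\\w))");

    // Hypothetical helper: returns every pair of adjacent word characters.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group(0) is empty; the pair is in group 1
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("abcde")); // [ab, bc, cd, de]
    }
}
```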
