Assign value to variable from ArrayList with regex - java

I need to extract the first 6 characters (ICAO) and assign it to a local variable from an ArrayList (taken from https://opensky-network.org/apidoc/rest.html) that looks like this:
["1234a5","ABC123","Schlumpfhausen",1572255699,1572255699,8.9886,48.3756,6278.88,false,155.16,216.64,-6.18,null,6484.62,"3026",false,0
"44035c","LDM87NM ","Austria",1572430045,1572430052,9.2009,48.6891,null,true,0,163,null,null,null,"6463",false,0
.
.
.
]
It is required to use java.util.regex to solve this problem.
String icao=null;
Pattern icaoPattern = Pattern.compile("([+-]*[a-zA-Z0-9.]*)");
Matcher matcher = icaoPattern.matcher(sentence.getAircraftJson());
if(matcher.find()) {
icao=matcher.group(1);
}
The outcome should be printed like this:
ICAO: 1234a5 | Callsign: ABC123 | ...
ICAO: 44035c | Callsign: LDM87NM| ...
but all I get is
ICAO: | Callsign: | ...
ICAO: | Callsign: | ...

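You want group(0) (the whole match), since the pattern below has no capturing group. The regex also needs some attention unless you can be very sure the content is always clean: with * on both parts, the pattern can match the empty string, and that empty match at position 0 is exactly what find() returns first. I removed the parentheses and changed the first * to ? and the second to +, but there are a million things that could be done here to make it safer.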
Pattern icaoPattern = Pattern.compile("[+-]?[a-zA-Z0-9.]+");
Matcher matcher = icaoPattern.matcher(sentence.getAircraftJson());
while(matcher.find()) {
System.out.println(matcher.group(0));
}

You can try the following regex to get every field, whether or not it is enclosed in double quotes ("). It matches quoted strings (which may themselves contain commas) or any run of characters other than commas:
Pattern icaoPattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = icaoPattern.matcher(result);
while (matcher.find()) {
System.out.println(matcher.group());
}
In your code you call matcher.find() only once (in the if statement). You should keep calling it until it finds no further match, hence the while loop.
You can later use String methods to remove the leading and trailing quotes, etc.
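Instead of stripping the quotes afterwards, you can also capture only the text inside them. A minimal sketch, assuming the response looks like the OpenSky REST API "states" array, i.e. every aircraft is its own [...] array whose first two elements are the quoted icao24 and callsign (the sample in the question seems to have lost those inner brackets). IcaoExtractor and format are hypothetical names, and for anything beyond this exercise a real JSON parser would be the safer choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IcaoExtractor {
    // Assumption: each row starts with "[" followed by two quoted strings
    // (icao24, callsign). Group 1 and group 2 capture the text inside the quotes.
    private static final Pattern ROW = Pattern.compile("\\[\"([^\"]*)\",\"([^\"]*)\"");

    public static String format(String json) {
        StringBuilder out = new StringBuilder();
        Matcher m = ROW.matcher(json);
        // Loop until find() fails, so every row is visited, not just the first one.
        while (m.find()) {
            String icao = m.group(1);
            String callsign = m.group(2).trim(); // callsigns may be padded with spaces
            out.append("ICAO: ").append(icao)
               .append(" | Callsign: ").append(callsign)
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "[[\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699,false,0],"
                + "[\"44035c\",\"LDM87NM \",\"Austria\",1572430045,true,0]]";
        System.out.print(format(json));
    }
}
```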

Your pattern matches the empty string "" at the beginning of the sentence, because both of its parts use * and can therefore match nothing.
Also, the requirement of taking exactly the first 6 characters is tricky using regex alone.
The simplest would be:
Pattern icaoPattern = Pattern.compile("("
+ "[a-zA-Z0-9.]{6}"
+ "|[-+][a-zA-Z0-9.]{5}"
+ "|[-+]{2}[a-zA-Z0-9.]{4}"
+ "|[-+]{3}[a-zA-Z0-9.]{3}"
+ "|[-+]{4}[a-zA-Z0-9.]{2}"
+ "|[-+]{5}[a-zA-Z0-9.]"
+ "|[-+]{6}"
+ ")");
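Used with a single find(), that alternation picks up the first run of six allowed characters, which for the sample data is the ICAO code. A minimal sketch (FirstSixDemo and firstToken are illustrative names, not from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSixDemo {
    // The alternation from the answer: exactly six characters, where any
    // leading '+' or '-' signs count towards the six.
    private static final Pattern SIX = Pattern.compile("("
            + "[a-zA-Z0-9.]{6}"
            + "|[-+][a-zA-Z0-9.]{5}"
            + "|[-+]{2}[a-zA-Z0-9.]{4}"
            + "|[-+]{3}[a-zA-Z0-9.]{3}"
            + "|[-+]{4}[a-zA-Z0-9.]{2}"
            + "|[-+]{5}[a-zA-Z0-9.]"
            + "|[-+]{6}"
            + ")");

    // Returns the first six-character token, or null if there is none.
    static String firstToken(String row) {
        Matcher m = SIX.matcher(row);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String row = "[\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699,false,0]";
        System.out.println("ICAO: " + firstToken(row)); // ICAO: 1234a5
    }
}
```

Keep in mind that it does not check that a token is exactly six characters long: a longer value such as "Schlumpfhausen" would yield "Schlum".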

Related

Regular Expression (regex). How to ignore or exclude everything in between?

I have this input text:
142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48
I want to use regular expression to extract 000781fe0000326f and -51.984, so the output looks like this
000781fe0000326f-51.984
I can use [0-9]{5,7}(?:[a-z][a-z0-9_]*) and ([-]?\\d*\\.\\d+)(?![-+0-9\\.]) to extract 000781fe0000326f and -51.984, respectively.
Is there a way to ignore or exclude everything between 000781fe0000326f and -51.984, i.e. to ignore everything that would be captured by the non-greedy filler (.*?)?
String ref="[0-9]{5,7}(?:[a-z][a-z0-9_]*)_____([-]?\\d*\\.\\d+)(?![-+0-9\\.])";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group();
//list3.add(all);
}
For your example data you could use an alternation | to match either of the regexes in your question and then concatenate the matches.
Note that in your regex you could write (?:[a-z][a-z0-9_]*) as [a-z][a-z0-9_]* and that you don't have to escape the dot in a character class.
For example:
[0-9]{5,7}[a-z][a-z0-9_]*|-?\d*\.\d+(?![-+0-9.])
String regex = "[0-9]{5,7}[a-z][a-z0-9_]*|-?\\d*\\.\\d+(?![-+0-9.])";
String string = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = "";
while (matcher.find()) {
result += matcher.group(0);
}
System.out.println(result); // 000781fe0000326f-51.984
There's no way to combine strings together like that in pure regex, but it's easy to create a group for the first match, a group for the second match, and then use m.group(1) + m.group(2) to concatenate the two groups together and create your desired combined string.
Also note that [0-9] simplifies to \d, a character set with only one token in it simplifies to just that token, [a-z0-9_] with the i flag simplifies to \w, and there's no need to escape a . inside a character set:
String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
String ref="(\\d{5,7}(?:[a-z]\\w*)).*?((?:-?\\d*\\.\\d+)(?![-+\\d.]))";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group(1) + m.group(2);
System.out.println(all);
}
You cannot really ignore the words in between, but you can include them all.
Something like this will include all of them:
[0-9]{5,7}(?:[a-z][a-z0-9_]*)[a-zA-Z0-9_ ]*([-]?\d*\.\d+)(?![-+0-9.])
But that is not what you want.
I think the best bet is either to use 2 regular expressions and then combine the results, or to split the string on spaces/tab characters and check the n-th elements as required.
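If you go the splitting route, a minimal sketch could look like this. It assumes the fields always appear in the same whitespace-separated positions (the id at index 1 and the decimal value at index 7, as in the sample line); FieldPicker and combine are hypothetical names:

```java
import java.util.regex.Pattern;

public class FieldPicker {
    // One or more whitespace characters (spaces or tabs) separate the fields.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Assumption: fixed layout, id at index 1 and decimal value at index 7.
    static String combine(String line) {
        String[] fields = WHITESPACE.split(line.trim());
        return fields[1] + fields[7];
    }

    public static void main(String[] args) {
        String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
        System.out.println(combine(input)); // 000781fe0000326f-51.984
    }
}
```

This avoids regex gymnastics entirely, at the cost of breaking if the number or order of fields changes.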

Subtracting characters in a back reference from a character class in java.util.regex.Pattern

Is it possible to subtract the characters in a Java regex back reference from a character class?
e.g., I want to use String#matches(regex) to match either:
any group of characters that are [a-z'] that are enclosed by "
Matches: "abc'abc"
Doesn't match: "1abc'abc"
Doesn't match: 'abc"abc'
any group of characters that are [a-z"] that are enclosed by '
Matches: 'abc"abc'
Doesn't match: '1abc"abc'
Doesn't match: "abc'abc"
The following regex won't compile because [^\1] isn't supported:
(['"])[a-z'"&&[^\1]]*\1
Obviously, the following will work:
'[a-z"]*'|"[a-z']*"
But, this style isn't particularly legible when a-z is replaced by a much more complex character class that must be kept the same in each side of the "or" condition.
I know that, in Java, I can just use String concatenation like the following:
String charClass = "a-z";
String regex = "'[" + charClass + "\"]*'|\"[" + charClass + "']*\"";
But, sometimes, I need to specify the regex in a config file, like XML, or JSON, etc., where java code is not available.
I assume that what I'm asking is almost definitely not possible, but I figured it wouldn't hurt to ask...
One approach is to use a negative look-ahead to make sure that every character in between the quotes is not the quotes:
(['"])(?:(?!\1)[a-z'"])*+\1
         ^^^^^^
(I also make the quantifier possessive, since there is no use for backtracking here)
This approach is, however, rather inefficient, since the pattern checks for the quote character at every single character, on top of checking that the character is one of the allowed characters.
The alternative with 2 branches in the question '[a-z"]*'|"[a-z']*" is better, since the engine only checks for the quote character once and goes through the rest by checking that the current character is in the character class.
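To compare the two approaches on the examples from the question, here is a small sketch that runs both patterns through Matcher#matches (which, like String#matches, must consume the whole string). The class and method names are made up for illustration. Note that the lookahead pattern is a single string with no Java concatenation, so it could be stored in an XML or JSON config file (with that format's own escaping of quotes and backslashes):

```java
import java.util.regex.Pattern;

public class QuotedCheck {
    // Lookahead version: (?!\1) rejects the opening quote inside the body.
    static final Pattern LOOKAHEAD = Pattern.compile("(['\"])(?:(?!\\1)[a-z'\"])*+\\1");
    // Two-branch version from the question, for comparison.
    static final Pattern TWO_BRANCH = Pattern.compile("'[a-z\"]*'|\"[a-z']*\"");

    static boolean lookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean twoBranch(String s) {
        return TWO_BRANCH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"\"abc'abc\"", "'abc\"abc'", "\"1abc'abc\"", "'1abc\"abc'", "\"ab\"c\""};
        for (String s : samples) {
            System.out.println(s + " -> lookahead=" + lookahead(s) + ", twoBranch=" + twoBranch(s));
        }
    }
}
```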
You could use two patterns in one OR-separated pattern, expressing both your cases:
// | case 1: [a-z'] enclosed by "
// | | OR
// | | case 2: [a-z"] enclosed by '
Pattern p = Pattern.compile("(?<=\")([a-z']+)(?=\")|(?<=')([a-z\"]+)(?=')");
String[] test = {
// will match group 1 (for case 1)
"abcd\"efg'h\"ijkl",
// will match group 2 (for case 2)
"abcd'efg\"h'ijkl",
};
for (String t: test) {
Matcher m = p.matcher(t);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
}
Output
efg'h
null
null
efg"h
Note
There is nothing stopping you from specifying the enclosing characters or the character class itself somewhere else, then building your Pattern with components unknown at compile-time.
Something along the lines of:
// both strings are emulating unknown-value arguments
String unknownEnclosingCharacter = "\"";
String unknownCharacterClass = "a-z'";
// probably want to catch a PatternSyntaxException here for potential
// issues with the given arguments
Pattern p = Pattern.compile(
String.format(
"(?<=%1$s)([%2$s]+)(?=%1$s)",
unknownEnclosingCharacter,
unknownCharacterClass
)
);
String[] test = {
"abcd\"efg'h\"ijkl",
"abcd'efg\"h'ijkl",
};
for (String t: test) {
Matcher m = p.matcher(t);
while (m.find()) {
// note: only main group here
System.out.println(m.group());
}
}
Output
efg'h

Regex to find Integers in particular string lines

I have this regex to find integers in a string that spans several lines. However, I want to filter the results: I want the regex to find the number in certain lines, and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, in all the lines. I want a particular regex to find the integer in the line containing "test1", another for "test2", and another for "test3". How should I do this? I want to create three different regexes, but my regex skills are weak.
First regex should print out 2. The second 8 and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The [^,]{0,100},[^,]{1,100}, portion skips everything up to and including the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside the lookbehind, because the Java regex engine requires lookbehinds to have a bounded maximum length. If you need to skip more than 100 characters, adjust the maximum accordingly.
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// | "test" literal
// | | any number of digits
// | | | comma
// | | | any number of digits
// | | | | comma
// | | | | | group1, your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example the strings are parsed only for demonstration purposes, but parsing can be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0", you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (before the comma-separated ints).
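If you really want one regex per line, as the question asks, a sketch could build the pattern from the label with Pattern.quote, so the label is matched literally. It assumes the layout label,int,int from the sample data; LineValue and valueFor are hypothetical names, and valueFor returns null when the label is absent:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineValue {
    // Builds a separate pattern per label, e.g. \Qtest2\E,\d+,(\d+),
    // and returns the second integer after that label (group 1).
    static String valueFor(String text, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + ",\\d+,(\\d+)");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        for (String label : new String[] {"test1", "test2", "test3"}) {
            System.out.println(label + " -> " + valueFor(test, label));
        }
    }
}
```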

Looking for advice on a regex

I'm trying to create a regex which has to match these patterns:
\n 700000000123
I mean this "\n"+"white space"+"12 digits"
So I tried:
(\\\\)(n)(\\s)(\\d{12})
or something like this:
(\\\\)(n)(\\s)(\\[0-9]{12})
But it still doesn't work. As I understand it, {12} means repeat a digit (\d or [0-9]) 12 times?
My idea is Java code that checks whether a string contains a match for this regex:
Boolean result = false;
String string_to_match = "a random string \n 700000000123";
String re1="(\\\\)";
String re2="(n)";
String re3="(\\s)";
String re4="([0-9]{11})";
Pattern p = Pattern.compile(re1+re2+re3+re4, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
if (string_to_match.contains(p.toString())) {
result = true;
}
I tried to use: http://www.txt2re.com/ to help me.
Do you have any advice on building this regex? I would also like to understand why it doesn't work at the moment.
You need to use String#matches instead of String#contains to match a regex: contains looks for the literal text of the pattern, not for a match of it.
Following should work:
String re1="(\\n)";
String re2="( )";
String re3="(\\d{12})";
Pattern p = Pattern.compile(re1+re2+re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
System.out.println("\n 700000000123".matches(p.pattern())); // true
Or simply:
System.out.println( "\n 700000000123".matches("(\n)( )(\\d{12})") ); // true
You can also chain the calls and use Matcher#find, which, unlike String#matches, does not require the pattern to match the entire input.
For instance:
String input = "\n 700000000123";
System.out.println(Pattern.compile("\n\\s\\d{12}").matcher(input).find());
Output
true
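To see the difference between matches() and find() on the input from the question, here is a minimal sketch (ContainsMatch and containsMatch are illustrative names). matches() has to consume the whole input, so the leading "a random string " makes it fail, while find() only needs the pattern to occur somewhere:

```java
import java.util.regex.Pattern;

public class ContainsMatch {
    // "\n" is a real newline character in the pattern; \\s and \\d{12} are regex escapes.
    private static final Pattern NEWLINE_SPACE_12_DIGITS = Pattern.compile("\n\\s\\d{12}");

    // A "contains" check: find() searches anywhere in the input.
    static boolean containsMatch(String input) {
        return NEWLINE_SPACE_12_DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a random string \n 700000000123";
        System.out.println(input.matches("\n\\s\\d{12}")); // false: must match the whole string
        System.out.println(containsMatch(input));          // true
    }
}
```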
^\\\\n\\s+\\d{12}$
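Written as a Java string literal, this should work if your input contains a literal backslash followed by n (the two characters \ and n) rather than a newline character. See the demo: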
https://regex101.com/r/eZ0yP4/33
\n only covers Unix-style line endings; text produced on Windows usually ends lines with \r\n. If your input uses the platform's line separator and the code has to work on both Linux and Windows, build the pattern from System.getProperty("line.separator") instead:
System.out.println(Pattern.compile(System.getProperty("line.separator") + "\\s\\d{12}").matcher(input).find());

getting NULL values from Java regex Matcher with a found pattern

I'm trying to get the following regex to work on my String:
Pattern Regex = Pattern.compile("(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher(myString);
while (RegexMatcher.find()) {
...
}
It basically splits a string like 1day 3 hours into matched regex groups.
The problem I'm having is that when I get into the while loop, calls to RegexMatcher.group(i) will always return a NULL value, meaning they were not found in the string.
When I try to output RegexMatcher.group(0), it returns an empty string, even though myString definitely contains something like "hello 1d", which should return at least the 1st group as "1" and the second as "d".
I've checked and double-checked the regex and it seems to be OK. No idea what's wrong here.
Thanks for any ideas :-)
From the Matcher#group(int) Javadoc:
For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
If the match was successful but the group specified failed to match any part of the input sequence, then null is returned. Note that some groups, for example (a*), match the empty string. This method will return the empty string when such a group successfully matches the empty string in the input.
If you want to iterate over all the matches, you can write it like this:
Pattern Regex = Pattern
.compile(
"(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE
| Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher("1 d 3 hours");
while (RegexMatcher.find()) {
System.out.println(RegexMatcher.group());
}
Note: m.group() is equivalent to m.group(0)
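The underlying cause is that every part of the pattern is optional, so find() also succeeds with a zero-length match at every position where no duration starts, such as position 0 of "hello 1d"; that is where the empty group(0) and the null groups come from. A minimal sketch that keeps only the non-empty matches (CANON_EQ dropped for brevity; DurationMatches and durations are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationMatches {
    // The pattern from the question. Since every group is optional, it can
    // also match the empty string at any position.
    private static final Pattern DURATION = Pattern.compile(
            "(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?"
                    + "(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Collects only the matches that actually consumed characters.
    static List<String> durations(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DURATION.matcher(input);
        while (m.find()) {
            if (m.end() > m.start()) { // skip zero-length matches
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(durations("hello 1d"));     // [1d]
        System.out.println(durations("1day 3 hours")); // [1day, 3 hours]
    }
}
```

Inside the if block, m.group(1) and m.group(2) then hold "1" and "d" for the "hello 1d" case.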
