I have this regex to find integers in a string (with newlines). However, I want to filter the matches: the regex should find the number in certain lines and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, for all the lines. What if I want a separate regex for each of the lines containing "test1", "test2", and "test3"? How should I do this? I want to create three different regexes, but my regex skills are weak.
First regex should print out 2. The second 8 and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,]{1,100}, portion skips everything between the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside the lookbehind, because Java's regex engine requires a lookbehind to have a bounded maximum length. If you need to skip more than 100 characters, adjust the maximum accordingly.
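Since the question asks for three separate regexes, here is a minimal sketch of building one bounded lookbehind per label. The class name PerLineLookbehind and the helper secondIntAfter are hypothetical names introduced for illustration, and the sample string assumes the lines are joined with newlines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerLineLookbehind {
    // Hypothetical helper: returns the second integer after the given label,
    // or null when the label does not occur in the input.
    static String secondIntAfter(String label, String input) {
        // Pattern.quote keeps any regex metacharacters in the label literal.
        // All quantifiers inside the lookbehind are bounded, as Java requires.
        Pattern p = Pattern.compile(
                "(?<=" + Pattern.quote(label) + "[^,]{0,100},[^,]{1,100},)\\d+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumption: the three lines are separated by newlines.
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        for (String label : new String[] {"test1", "test2", "test3"}) {
            System.out.println(label + ": " + secondIntAfter(label, test));
        }
    }
}
```

Only the first match is returned for each label, which is enough when every label appears once.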
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// | "test" literal
// | | any number of digits
// | | | comma
// | | | any number of digits
// | | | | comma
// | | | | | group1, your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example the strings are parsed only for demonstration purposes, but parsing could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0" you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma-separated ints).
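As a rough sketch of the updated pattern in use, the following collects each test number together with its extracted value. The class TestNumberGroups and the method extract are hypothetical names, and the two-line input is an assumed sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestNumberGroups {
    // Maps each test number (group 1) to the second integer after it (group 2).
    static Map<Integer, Integer> extract(String input) {
        Pattern p = Pattern.compile("test([123])\\D*,\\d+,(\\d+),");
        Matcher m = p.matcher(input);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        while (m.find()) {
            result.put(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample: the first line has extra text between "test1" and the first comma.
        String input = "ytrt.ytrwyt.test1.ytrwyt,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0";
        System.out.println(extract(input)); // {1=2, 2=8}
    }
}
```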
I need to extract the first 6 characters (ICAO) and assign it to a local variable from an ArrayList (taken from https://opensky-network.org/apidoc/rest.html) that looks like this:
["1234a5","ABC123","Schlumpfhausen",1572255699,1572255699,8.9886,48.3756,6278.88,false,155.16,216.64,-6.18,null,6484.62,"3026",false,0
"44035c","LDM87NM ","Austria",1572430045,1572430052,9.2009,48.6891,null,true,0,163,null,null,null,"6463",false,0
.
.
.
]
It is required to use java.util.regex to solve this problem.
String icao=null;
Pattern icaoPattern = Pattern.compile("([+-]*[a-zA-Z0-9.]*)");
Matcher matcher = icaoPattern.matcher(sentence.getAircraftJson());
if(matcher.find()) {
icao=matcher.group(1);
}
The outcome should be printed like this:
ICAO: 1234a5 | Callsign: ABC123 | ...
ICAO: 44035c | Callsign: LDM87NM| ...
but all I get is
ICAO: | Callsign: | ...
ICAO: | Callsign: | ...
You want group(0) for the first match. The regex also needs some attention unless you can be very sure that the content is always clean. I removed the parentheses and changed the first * to ? and the other to +, but there are a million things that could be done here to make it safer:
Pattern icaoPattern = Pattern.compile("[+-]?[a-zA-Z0-9.]+");
Matcher matcher = icaoPattern.matcher(sentence);
while(matcher.find()) {
System.out.println(matcher.group(0));
}
You can try the following regex to get every field, whether it is a double-quoted string or a run of any characters other than commas:
Pattern icaoPattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = icaoPattern.matcher(result);
while (matcher.find()) {
System.out.println(matcher.group());
}
In your code you run matcher.find() only once (in the if statement). You should keep calling it until it finds no further match, hence the while loop.
You can later use String methods to remove the leading and trailing quotes, etc.
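A minimal sketch of that idea, combining the find loop with String methods that strip the quotes. The class FieldSplitter and the method fields are hypothetical names, and the sample is assumed to be a single row without the enclosing [ ] of the full list, since the pattern would otherwise attach the bracket to the first field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSplitter {
    private static final Pattern FIELD =
            Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");

    // Returns the comma-separated fields of one row with surrounding quotes removed.
    static List<String> fields(String row) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(row);
        while (m.find()) {
            String f = m.group();
            // Strip one leading and one trailing double quote, if both are present.
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample: one row only, without the surrounding brackets.
        String row = "\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699,false";
        List<String> f = fields(row);
        System.out.println("ICAO: " + f.get(0) + " | Callsign: " + f.get(1));
    }
}
```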
Your pattern would match the empty string "" at the beginning of the sentence.
Also, the requirement of taking the first 6 characters is tricky using only regex.
The simplest would be:
Pattern icaoPattern = Pattern.compile("("
+ "[a-zA-Z0-9.]{6}"
+ "|[-+][a-zA-Z0-9.]{5}"
+ "|[-+]{2}[a-zA-Z0-9.]{4}"
+ "|[-+]{3}[a-zA-Z0-9.]{3}"
+ "|[-+]{4}[a-zA-Z0-9.]{2}"
+ "|[-+]{5}[a-zA-Z0-9.]"
+ "|[-+]{6}"
+ ")");
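To illustrate the empty-match point, this sketch compares the first match of the question's all-optional pattern with the first match of a fixed-length alternative. The class IcaoFirstMatch, the helper firstMatch and the shortened JSON string are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IcaoFirstMatch {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed, shortened sample of one row.
        String json = "[\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699]";
        // Every part of this pattern is optional, so it matches "" right before the '['.
        System.out.println("<" + firstMatch("[+-]*[a-zA-Z0-9.]*", json) + ">"); // <>
        // Requiring exactly six characters skips the bracket and the quote.
        System.out.println(firstMatch("[a-zA-Z0-9.]{6}", json)); // 1234a5
    }
}
```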
I want to capture all the consecutive groups in a binary string
1000011100001100111100001
should give me
1
0000
111
0000
11
00
1111
0000
1
I wrote the regex ([1?|0?]+) in my Java application to group the consecutive 1s or 0s in a string like 10000111000011.
But when I run it in my code, nothing is printed to the console:
String name ="10000111000011";
regex("(\\[1?|0?]+)" ,name);
public static void regex(String regex, String searchedString) {
Pattern pattern = Pattern.compile(regex);
Matcher regexMatcher = pattern.matcher(searchedString);
while (regexMatcher.find())
if (regexMatcher.group().length() > 0)
System.out.println(regexMatcher.group());
}
To avoid a regex syntax error at runtime, I changed ([1?|0?]+) to (\\[1?|0?]+).
Why does the regex produce no groups?
First, as an explanation: your regex defines a character class ([ ... ]) that matches any of the characters 1, ?, | or 0, one or more times (+). I think you meant to use a group ( ... ) instead, which would turn the | into an alternation between an optional 1 and an optional 0. But that is not what you want either (I think ;).
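A small sketch of how the two variants from the question behave, as far as I can tell. The class CharClassDemo and the helper matches are hypothetical names. The character class swallows the whole input as one run, while the escaped version needs literal [ or ] characters that the input does not contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Collects all non-empty matches of the regex in the input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "10000111000011";
        // Character class: any mix of 1, ?, | and 0, so the whole string is a single match.
        System.out.println(matches("([1?|0?]+)", s));   // [10000111000011]
        // Escaped bracket: "\[1?" needs a literal '[' and "0?]+" needs a literal ']'.
        System.out.println(matches("(\\[1?|0?]+)", s)); // []
    }
}
```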
Now, the solution might be this:
([01])\1*
which matches a 0 or a 1 and captures it, then matches any further repetitions of that same digit (\1 is a back reference to whatever the first capture group captured, in this case the 0 or the 1).
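In Java the backslash has to be doubled inside the string literal. A minimal sketch, with BinaryRuns and runs as hypothetical names, applied to the input from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BinaryRuns {
    // Splits a binary string into runs of identical digits.
    static List<String> runs(String bits) {
        List<String> out = new ArrayList<>();
        // ([01]) captures one digit; \1* repeats that same digit zero or more times.
        Matcher m = Pattern.compile("([01])\\1*").matcher(bits);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", runs("1000011100001100111100001")));
        // prints: 1 0000 111 0000 11 00 1111 0000 1
    }
}
```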
You can try this:
(1+|0+)
Sample Code:
final String regex = "(1+|0+)";
final String string = "10000111000011\n"
+ "11001111110011";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Group " + 1 + ": " + matcher.group(1));
}
I am trying to pull two strings (which represent integers i where -999999 <= i <= 999999) out of the string 'left.' There will always be exactly two strings representing two integers. Also I want the regex to match {"-1", "2"} for "-1-2", not {"-1", "-2"}. I've been going through the tutorials on http://www.regular-expressions.info and the stackoverflow regex page for going on four hours now. I am testing my expressions in a Java program. Here's what I've got
String left = "-123--4567";
Pattern pattern = Pattern.compile("-?[0-9]{1,6}");
Matcher matcher = pattern.matcher(left);
arg1 = matcher.group(1);
arg2 = matcher.group(2);
System.out.println("arg1: " + arg1 + " arg2: " + arg2);
This code should produce
arg1: -123 arg2: -4567
Here's a self-contained example of what you're probably trying to do:
String[] examples = {
"-123--4567",
"123-4567",
"-123-4567",
"123--4567"
};
// ┌ group 1:
// |┌ zero or one "-"
// || ┌ any number of digits (at least one)
// || | ┌ zero or one "-" as separator
// || | | ┌ group 2
// || | | |┌ zero or one "-"
// || | | || ┌ any number of digits (at least one)
Pattern p = Pattern.compile("(-?\\d+)-?(-?\\d+)");
// iterating over examples
for (String s: examples) {
// matching
Matcher m = p.matcher(s);
// iterating over matches (only 1 per example here)
while (m.find()) {
// printing out group1 --> group 2 back references
System.out.printf("%s --> %s%n", m.group(1), m.group(2));
}
}
Output
-123 --> -4567
123 --> 4567
-123 --> 4567
123 --> -4567
You can use this regex:
(-?[0-9]{1,6})-?
and grab capture group #1 from each successive match.
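A rough sketch of how that could look in Java. Note that the question's snippet calls group() without ever calling find(), which throws an IllegalStateException. The class TwoInts and the method parsePair are hypothetical names, and this version does not validate anything beyond finding two matches.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoInts {
    // Reads exactly two integers; a single "-" after the first number is the separator.
    static int[] parsePair(String left) {
        Matcher m = Pattern.compile("(-?[0-9]{1,6})-?").matcher(left);
        int[] pair = new int[2];
        for (int i = 0; i < 2; i++) {
            // find() must succeed before group(1) may be read.
            if (!m.find()) {
                throw new IllegalArgumentException("expected two integers: " + left);
            }
            pair[i] = Integer.parseInt(m.group(1));
        }
        return pair;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePair("-123--4567"))); // [-123, -4567]
        System.out.println(Arrays.toString(parsePair("-1-2")));       // [-1, 2]
    }
}
```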
I want to split this string:
=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))
into:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
I tried it with that regex:
String[] result = text.substring(1, text.length()).split("[;()]+");
However, I am getting:
[IF, BI18=0, INT, YEAR, TODAY, IF, INT, YEAR, BI18, >2025, 2025, INT, YEAR, BI18]
I am struggling to identify the Excel functions generically.
I would appreciate your answer, to split the string generically as expected.
Following up on the comments, if you want the main contents of the IF(...) conditions wherein the ... is the content, here's a quick solution.
Please note that although this solution works for the input at hand, it may be unreliable in other cases with nested statements; it is basically a workaround.
String formula = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
// | positive lookbehind: starts with "IF("
// | | any character, reluctantly quantified
// | | | positive lookahead, followed by
// | | | ")", then...
// | | | | ";" or end of input
// | | | |
Pattern p = Pattern.compile("(?<=IF\\().+?(?=\\)(;|$))");
Matcher m = p.matcher(formula);
while (m.find()) {
System.out.println(m.group());
}
Output
BI18=0;INT(YEAR(TODAY())
INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18)))
Try,
String str1 = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
ArrayList<String> strList = new ArrayList<String>();
for(String str2 : str1.replaceFirst("=", "").split(";")){
if(str2.contains("IF")){
strList.add("IF");
strList.add(str2.replaceAll("IF|\\(|\\)", ""));
}else{
strList.add(str2.replaceAll("\\(|\\)", ""));
}
}
System.out.println(strList.toString());
Output:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
You can use this regex:
^=([^(]+)\(|\G([^;]+)[;|)$]
We retrieve the matches from capture Groups 1 and 2.
In Java, this means something like this:
Pattern regex = Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");
Matcher regexMatcher = regex.matcher(your_original_string);
while (regexMatcher.find()) {
// check Group 1, which is regexMatcher.group(1)
// check Group 2, which is regexMatcher.group(2)
}
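One way to fill in that loop, as a sketch only: in each match just one of the two groups participates, and the other is null. The class FormulaTokens and the method tokens are hypothetical names. As far as I can tell, the group 2 tokens still contain their parentheses, so further cleanup may be needed to reach the exact output the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaTokens {
    // Collects whichever capture group participated in each successive match.
    static List<String> tokens(String formula) {
        Pattern regex = Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");
        Matcher m = regex.matcher(formula);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Group 1 is set for the leading "=NAME(" part, group 2 for the later parts.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String formula =
                "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
        for (String token : tokens(formula)) {
            System.out.println(token);
        }
    }
}
```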
I have this regex in java
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
It works as intended but I have a new problem to get some valid dates:
1st problem:
If I have the string It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000, I should get febrero-enero-diciembre, but I only get febrero-enero.
2nd problem
If I have a single date in a string, like 12-octubre-1989, I get an empty string.
Why does my pattern require whitespace at the start and end of each date? Because I must only catch valid months in dates that stand alone. In a string like adsadasd 12-validMonth-2999 asd 11-validMonth-1989 I should get both validMonth values, but in a string like asdadsad12-validMonth-1989 asdadsad 23-validMonth-1989 I should get only the last validMonth, because the first date is glued to the preceding letters.
PS: My Java code is
String resultado = "";
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(fecha);
while (m.find()) {
resultado += m.group().split("-")[1] + "-";
}
return (resultado.compareTo("") == 0 ? "" : resultado.substring(0, resultado.length() - 1));
You might want to use a word boundary instead:
\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b
And I believe some of the months can be optimized a little bit (it could reduce readability unfortunately, but should speed things up by a notch):
\\b(\\d{2}-)((?:en|febr)ero|ma(?:rz|y)o|abril|ju[ln]io|agosto|(?:septiem|octu|noviem|diciem)bre)(-\\d{4})\\b
Perhaps try using a \b instead of \s:
String pattern = "\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b";
This will only match strings where the first digit is not preceded by another word character (digit, letter, or underscore), and the last digit is not followed by a word character. I've also dropped the capturing groups that surrounded the \s, because a \b always matches a zero-length string.
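A minimal sketch of the word-boundary pattern applied to the cases from the question. The class MonthExtractor and the method months are hypothetical names. The month is read from group 2 instead of splitting the match on "-".

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthExtractor {
    private static final Pattern DATE = Pattern.compile(
            "\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto"
            + "|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b");

    // Joins the month names of all stand-alone dates with "-"; returns "" if none match.
    static String months(String fecha) {
        Matcher m = DATE.matcher(fecha);
        StringJoiner joined = new StringJoiner("-");
        while (m.find()) {
            joined.add(m.group(2)); // group 2 is the month name
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(months(
                "It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000"));
        // prints: febrero-enero-diciembre
        System.out.println(months("12-octubre-1989")); // octubre
        // The first date is glued to letters, so only the second one counts.
        System.out.println(months("asdadsad12-enero-1989 asdadsad 23-marzo-1989")); // marzo
    }
}
```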
I wouldn't use a word boundary as a delimiter.
I'd suggest using either whitespace or NOT digit,
or no delimiter, and adding a validation range of numbers for day/year.
This way you may catch more embedded dates that are in close
proximity (adjacent) to letters and underscores.
Something like:
# "(?<!\\d)\\d{2}-(?:enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)-\\d{4}(?!\\d)"
(?<! \d ) # Not a digit before us
\d{2} - # Two digits followed by dash
(?: # A month
enero
| febrero
| marzo
| abril
| mayo
| junio
| julio
| agosto
| septiembre
| octubre
| noviembre
| diciembre
)
- \d{4} # Dash followed by four digits
(?! \d ) # Not a digit after us
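To make the trade-off concrete, this sketch compares the word-boundary pattern with the digit-guard pattern on a date embedded in letters. The class DigitGuardDemo, its constants and the sample strings are assumptions for illustration. Note that the digit guard accepts dates glued to letters, which the question wanted to exclude.

```java
import java.util.regex.Pattern;

class DigitGuardDemo {
    static final String MONTHS = "(?:enero|febrero|marzo|abril|mayo|junio|julio|agosto"
            + "|septiembre|octubre|noviembre|diciembre)";
    static final String WORD_BOUNDARY = "\\b\\d{2}-" + MONTHS + "-\\d{4}\\b";
    static final String DIGIT_GUARD = "(?<!\\d)\\d{2}-" + MONTHS + "-\\d{4}(?!\\d)";

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String embedded = "abc12-enero-1989xyz";
        System.out.println(found(WORD_BOUNDARY, embedded)); // false
        System.out.println(found(DIGIT_GUARD, embedded));   // true
        // A third leading digit is rejected by both patterns.
        String extraDigit = "112-enero-1989";
        System.out.println(found(WORD_BOUNDARY, extraDigit)); // false
        System.out.println(found(DIGIT_GUARD, extraDigit));   // false
    }
}
```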