Split by excel conditions - java

I want to split this string:
=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))
into the following tokens:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
I tried it with this regex:
String[] result = text.substring(1, text.length()).split("[;()]+");
However, I am getting:
[IF, BI18=0, INT, YEAR, TODAY, IF, INT, YEAR, BI18, >2025, 2025, INT, YEAR, BI18]
I am struggling to identify the Excel functions generically.
I would appreciate an answer that shows how to split the string generically, as expected.

Following up on the comments: if you want the contents of each IF(...) call, where ... stands for the content, here's a quick solution.
Please note that although this solution works for the input at hand, it may be unreliable in other cases, especially with nested statements; it is basically a workaround.
String formula = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
// (?<=IF\\()     positive lookbehind: preceded by "IF("
// .+?            any character, reluctantly quantified
// (?=\\)(;|$))   positive lookahead: followed by ")", then ";" or end of input
Pattern p = Pattern.compile("(?<=IF\\().+?(?=\\)(;|$))");
Matcher m = p.matcher(formula);
while (m.find()) {
System.out.println(m.group());
}
Output
BI18=0;INT(YEAR(TODAY())
INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18)))
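Since the regex above is only a workaround for nesting, here is a minimal sketch of a non-regex alternative that tracks parenthesis depth instead. The class and method names (FormulaSplitter, splitTopLevel, tokenize) are made up for illustration, and the sketch assumes that an IF call always spans the whole expression it appears in and that the formula contains no string literals with ";" or parentheses inside them.

```java
import java.util.ArrayList;
import java.util.List;

public class FormulaSplitter {

    // Splits s at every ';' that is not nested inside parentheses.
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ';' && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    // Emits "IF" followed by its arguments (recursively);
    // any other expression is flattened by dropping its parentheses.
    private static void collect(String expr, List<String> out) {
        if (expr.startsWith("IF(") && expr.endsWith(")")) {
            out.add("IF");
            String inner = expr.substring(3, expr.length() - 1);
            for (String arg : splitTopLevel(inner)) {
                collect(arg, out);
            }
        } else {
            out.add(expr.replaceAll("[()]", ""));
        }
    }

    // Entry point: strips an optional leading '=' and tokenizes the rest.
    static List<String> tokenize(String formula) {
        String expr = formula.startsWith("=") ? formula.substring(1) : formula;
        List<String> out = new ArrayList<>();
        collect(expr, out);
        return out;
    }

    public static void main(String[] args) {
        String formula = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
        // prints [IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]
        System.out.println(tokenize(formula));
    }
}
```

Because the split only happens at depth 0, a ";" inside a nested call never cuts an argument in half, which is the case the lookaround regex cannot handle in general.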

Try this:
String str1 = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
ArrayList<String> strList = new ArrayList<String>();
for(String str2 : str1.replaceFirst("=", "").split(";")){
if(str2.contains("IF")){
strList.add("IF");
strList.add(str2.replaceAll("IF|\\(|\\)", ""));
}else{
strList.add(str2.replaceAll("\\(|\\)", ""));
}
}
System.out.println(strList.toString());
Output:
[IF, BI18=0, INTYEARTODAY, IF, INTYEARBI18>2025, 2025, INTYEARBI18]

You can use this regex. In the demo, make sure to look at the capture groups on the right.
^=([^(]+)\(|\G([^;]+)[;|)$]
We retrieve the matches from capture Groups 1 and 2.
In Java, this means something like this:
Pattern regex = Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");
Matcher regexMatcher = regex.matcher(your_original_string);
while (regexMatcher.find()) {
// check Group 1, which is regexMatcher.group(1)
// check Group 2, which is regexMatcher.group(2)
}
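For reference, a runnable sketch of the loop above could look like the following. The class name GroupCollector and the collect helper are only illustrative; the pattern is the one from this answer, unchanged. Exactly one of the two groups takes part in each match, so the other one is null and has to be checked for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCollector {
    private static final Pattern REGEX =
            Pattern.compile("^=([^(]+)\\(|\\G([^;]+)[;|)$]");

    // Collects whichever of group 1 or group 2 took part in each match.
    static List<String> collect(String formula) {
        List<String> parts = new ArrayList<>();
        Matcher m = REGEX.matcher(formula);
        while (m.find()) {
            // The group from the alternative that did not match is null.
            parts.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return parts;
    }

    public static void main(String[] args) {
        String formula = "=IF(BI18=0;INT(YEAR(TODAY()));IF(INT(YEAR(BI18))>2025;2025;INT(YEAR(BI18))))";
        for (String part : collect(formula)) {
            System.out.println(part);
        }
    }
}
```

Note that with this pattern the nested calls stay inside group 2 with their parentheses intact, so the parts still need post-processing if you want the flattened tokens from the question.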

Related

Assign value to variable from ArrayList with regex

I need to extract the first 6 characters (ICAO) and assign it to a local variable from an ArrayList (taken from https://opensky-network.org/apidoc/rest.html) that looks like this:
["1234a5","ABC123","Schlumpfhausen",1572255699,1572255699,8.9886,48.3756,6278.88,false,155.16,216.64,-6.18,null,6484.62,"3026",false,0
"44035c","LDM87NM ","Austria",1572430045,1572430052,9.2009,48.6891,null,true,0,163,null,null,null,"6463",false,0
.
.
.
]
It is required to use java.util.regex to solve this problem.
String icao=null;
Pattern icaoPattern = Pattern.compile("([+-]*[a-zA-Z0-9.]*)");
Matcher matcher = icaoPattern.matcher(sentence.getAircraftJson());
if(matcher.find()) {
icao=matcher.group(1);
}
The outcome should be printed like this:
ICAO: 1234a5 | Callsign: ABC123 | ...
ICAO: 44035c | Callsign: LDM87NM| ...
but all I get is
ICAO: | Callsign: | ...
ICAO: | Callsign: | ...
You want group(0) for the first match. The regex also needs some attention unless you can be very sure that the content is always clean. I removed the parentheses, changed the first * to ?, and changed the other one to +, but there are a million things that could be done here to make it safer:
Pattern icaoPattern = Pattern.compile("[+-]?[a-zA-Z0-9.]+");
Matcher matcher = icaoPattern.matcher(sentence);
while(matcher.find()) {
System.out.println(matcher.group(0));
}
You can try the following regex to match every field, whether it is a double-quoted string (quotes included) or a run of any characters other than commas:
Pattern icaoPattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = icaoPattern.matcher(result);
while (matcher.find()) {
System.out.println(matcher.group());
}
In your code you run matcher.find() once (in if() statement). You should run it until it does not find any other match, thus the while() loop.
You can later use String methods to remove the leading and trailing quotes, etc.
Your pattern would match the empty string "" at the beginning of the sentence.
Also, the requirement of taking the first 6 characters is tricky when using regex alone.
The simplest would be:
Pattern icaoPattern = Pattern.compile("("
+ "[a-zA-Z0-9.]{6}"
+ "|[-+][a-zA-Z0-9.]{5}"
+ "|[-+]{2}[a-zA-Z0-9.]{4}"
+ "|[-+]{3}[a-zA-Z0-9.]{3}"
+ "|[-+]{4}[a-zA-Z0-9.]{2}"
+ "|[-+]{5}[a-zA-Z0-9.]"
+ "|[-+]{6}"
+ ")");
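If each row really starts with the quoted ICAO code followed by the quoted callsign, as in the sample data, a narrower pattern may be enough. The following is only a sketch under that assumption; the class name IcaoExtractor, the format helper, and the single-row input string are hypothetical and not part of the OpenSky API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IcaoExtractor {
    // Group 1: six alphanumeric characters in quotes (ICAO).
    // Group 2: the content of the next quoted field (callsign).
    private static final Pattern ROW =
            Pattern.compile("\"([0-9A-Za-z]{6})\"\\s*,\\s*\"([^\"]*)\"");

    // Returns a printable line for one row, or null if the row does not match.
    static String format(String row) {
        Matcher m = ROW.matcher(row);
        if (!m.find()) {
            return null;
        }
        String icao = m.group(1);
        String callsign = m.group(2).trim(); // callsigns may be padded with spaces
        return "ICAO: " + icao + " | Callsign: " + callsign;
    }

    public static void main(String[] args) {
        String row = "[\"1234a5\",\"ABC123\",\"Schlumpfhausen\",1572255699]";
        // prints ICAO: 1234a5 | Callsign: ABC123
        System.out.println(format(row));
    }
}
```

Requiring the surrounding quotes keeps the pattern from matching the empty string at the start of the input, which is what produced the blank fields in the question.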

Regex to find Integers in particular string lines

I have this regex to find integers in a string (with newlines). However, I want to filter the results: the regex should find the number in certain lines and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, in all the lines. What if I want a particular regex to find the integer only in the line containing "test1", another one for "test2", and another one for "test3"? How should I do this? I want to create three different regexes, but my regex skills are weak.
The first regex should print 2, the second 8, and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,]{1,100}, portion skips everything between the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside lookbehind expressions, because the Java regex engine requires lookbehinds to have a bounded maximum length. If you need to allow skipping more than 100 characters, adjust the maximum length accordingly.
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// "test" literal, then one or more digits, a comma,
// one or more digits, another comma,
// and finally group 1: the digits you want
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example, parsing the strings is done only for demonstration purposes, but it could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0", you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma-separated ints).
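Since the question asks for one regex per line, here is a small sketch that builds a separate pattern for each label. The class name PerLineNumber and the helper secondNumberAfter are hypothetical; Pattern.quote is used so that the label is always treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineNumber {

    // Returns the second number after the given label, or -1 if it is not found.
    static int secondNumberAfter(String label, String input) {
        // label, comma, first number, comma, then the number we want (group 1)
        Pattern p = Pattern.compile(Pattern.quote(label) + ",\\d+,(\\d+)");
        Matcher m = p.matcher(input);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        System.out.println(secondNumberAfter("test1", test)); // prints 2
        System.out.println(secondNumberAfter("test2", test)); // prints 8
        System.out.println(secondNumberAfter("test3", test)); // prints 3
    }
}
```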

What is the regex pattern for comma, space plus a certain word?

I need to split a string using Java's split() method. How do I write the regex pattern for a delimiter that is a certain word, for example "and"?
I have the pattern for splitting on spaces and commas, which is [,\\s], but I want to add the word "and" so that it also becomes a delimiter.
I tried many combinations, including [,\\s]|(and), but had no luck.
Not really sure without an input and desired output, but you could change your last pattern to something like: \\s(?!and|,)|\\s*,\\s*|\\s+and\\s+.
For instance:
String toSplit = "Blah,blah, foo ,bar and blah again";
System.out.println(
Arrays.toString(
toSplit.split(
// ┌ whitespace not followed by "and" or ","
// | ┌ or
// | | ┌ 0/more whitespace, ",", 0/more whitespace
// | | | ┌ or
// | | | |┌ 1/more whitespace, "and", 1/more ws
// | | | ||
"\\s(?!and|,)|\\s*,\\s*|\\s+and\\s+"
)
)
);
Output
[Blah, blah, foo, bar, blah, again]
You can try:
String[] toks = input.split( "\\s*\\band\\b\\s*|[,\\s]" );
You can use an alternation operator. Here is a sample program:
String string = "My mother and I";
String[] parts = string.split("(?:[,\\s]|and)");
for (int i=0; i<parts.length; i++) {
System.out.println(parts[i]);
}
Output (the two empty strings that split() produces around "and" print as blank lines, omitted here):
My
mother
I
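To avoid empty strings in the result, the whitespace around a comma or around the word can be folded into the delimiter itself. The following is a sketch of that idea, not one of the patterns posted above; the class name WordDelimiterSplit is made up, and \\b keeps words such as "sandy" from being cut at the embedded "and".

```java
import java.util.Arrays;

public class WordDelimiterSplit {
    // Either a comma or the whole word "and" with optional surrounding
    // whitespace, or else a plain run of whitespace.
    static final String DELIMITER = "\\s*(?:,|\\band\\b)\\s*|\\s+";

    static String[] split(String input) {
        return input.split(DELIMITER);
    }

    public static void main(String[] args) {
        // prints [My, mother, I]
        System.out.println(Arrays.toString(split("My mother and I")));
        // prints [Blah, blah, foo, bar, blah, again]
        System.out.println(Arrays.toString(split("Blah,blah, foo ,bar and blah again")));
    }
}
```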

Matching a whitespace or empty string using regex in Java

I have this regex in java
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
It works as intended, but I have a new problem getting some valid dates:
1st problem:
If I have the String "It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000", I should get back the string febrero-enero-diciembre, but I only get febrero-enero.
2nd problem:
If I have a single date in a String, like 12-octubre-1989, I get an empty String.
Why does my pattern require whitespace at the start and end of each date? Because I must only catch valid months in a String like "adsadasd 12-validMonth-2999 asd 11-validMonth-1989", where I should get both validMonth values, and never catch a validMonth that is glued to other text: in "asdadsad12-validMonth-1989 asdadsad 23-validMonth-1989" I should only get the last validMonth.
PS: My Java code is:
String resultado = "";
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(fecha);
while (m.find()) {
resultado += m.group().split("-")[1] + "-";
}
return (resultado.compareTo("") == 0 ? "" : resultado.substring(0, resultado.length() - 1));
You might want to use a word boundary instead:
\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b
And I believe some of the months can be optimized a little bit (it could reduce readability unfortunately, but should speed things up by a notch):
\\b(\\d{2}-)((?:en|febr)ero|ma(?:rz|y)o|abril|ju[ln]io|agosto|(?:septiem|octu|noviem|diciem)bre)(-\\d{4})\\b
Perhaps try using a \b instead of \s:
String pattern = "\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b";
This will only match strings where the first digit is not preceded by another word character (digit, letter, or underscore), and the last digit is not followed by a word character. I've also removed the capturing groups around the \b, because it would always be a zero-length string, if matched.
I wouldn't use a word boundary as a delimiter.
I'd suggest using either whitespace or NOT-a-digit as the delimiter,
or no delimiter at all, validating the numeric range of the day and year instead.
This way you may catch more embedded dates that sit directly
next to letters or underscores.
Something like:
# "(?<!\\d)\\d{2}-(?:enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)-\\d{4}(?!\\d)"
(?<! \d ) # Not a digit before us
\d{2} - # Two digits followed by dash
(?: # A month
enero
| febrero
| marzo
| abril
| mayo
| junio
| julio
| agosto
| septiembre
| octubre
| noviembre
| diciembre
)
- \d{4} # Dash followed by four digits
(?! \d ) # Not a digit after us
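If the original requirement (whitespace or nothing at all around each date) has to be kept exactly, one option that none of the answers above uses is a pair of lookarounds: (?<!\S) means "not preceded by a non-whitespace character" and (?!\S) means "not followed by one", so both also succeed at the start and end of the input. A minimal sketch, with a made-up class name MonthExtractor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonthExtractor {
    // The lookarounds consume nothing, so two dates can share one space.
    private static final Pattern DATE = Pattern.compile(
            "(?<!\\S)\\d{2}-(enero|febrero|marzo|abril|mayo|junio|julio|agosto"
            + "|septiembre|octubre|noviembre|diciembre)-\\d{4}(?!\\S)");

    // Returns the matched month names joined by "-", or "" if there are none.
    static String months(String fecha) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(fecha);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the month name
        }
        return String.join("-", found);
    }

    public static void main(String[] args) {
        // prints febrero-enero-diciembre
        System.out.println(months("It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000"));
        // prints octubre
        System.out.println(months("12-octubre-1989"));
        // prints marzo
        System.out.println(months("asdadsad12-enero-1989 asdadsad 23-marzo-1989"));
    }
}
```

Capturing the month in its own group also removes the need for the split("-") step and for trimming a trailing dash.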

getting NULL values from Java regex Matcher with a found pattern

I'm trying to get the following regex to work on my String:
Pattern Regex = Pattern.compile("(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher(myString);
while (RegexMatcher.find()) {
...
}
It basically splits a string like 1day 3 hours into matched regex groups.
The problem I'm having is that when I get into the while loop, calls to RegexMatcher.group(i) always return null, meaning the groups were not found in the string.
When I output RegexMatcher.group(0), it returns an empty string, even though myString definitely contains something like "hello 1d", which should return at least the first group as "1" and the second as "d".
I've checked and double-checked the regex and it seems to be OK. No idea what's wrong here.
Thanks for any ideas :-)
For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
If the match was successful but the group specified failed to match any part of the input sequence, then null is returned. Note that some groups, for example (a*), match the empty string. This method will return the empty string when such a group successfully matches the empty string in the input.
If you want to iterate over all the matches, you can write code like this:
Pattern Regex = Pattern
.compile(
"(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE
| Pattern.UNICODE_CASE);
Matcher RegexMatcher = Regex.matcher("1 d 3 hours");
while (RegexMatcher.find()) {
System.out.println(RegexMatcher.group());
}
Note: m.group() is equivalent to m.group(0)
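To make the behaviour described in the documentation concrete: every part of the pattern is optional, so find() can succeed with an empty match at a position where no duration starts (for example at the "h" of "hello 1d"), and in that match every capturing group is null. A minimal sketch that skips such empty matches follows; the class name DurationDemo and the helper are illustrative, and the flags are reduced to CASE_INSENSITIVE for brevity, which is an assumption about what the original code needs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationDemo {
    static final Pattern DURATION = Pattern.compile(
            "(?:(\\d+) ?(days?|d) *?)?(?:(\\d+) ?(hours?|h) *?)?"
            + "(?:(\\d+) ?(minutes?|m) *?)?(?:(\\d+) ?(seconds?|s))?",
            Pattern.CASE_INSENSITIVE);

    // Returns the first match that actually consumed some input, or null.
    static String firstNonEmptyMatch(String input) {
        Matcher m = DURATION.matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                return m.group();
            }
            // Empty match: all groups are null here, so just keep searching.
        }
        return null;
    }

    public static void main(String[] args) {
        Matcher m = DURATION.matcher("hello 1d");
        m.find();
        System.out.println("[" + m.group() + "] " + m.group(1)); // prints [] null
        System.out.println(firstNonEmptyMatch("hello 1d"));       // prints 1d
    }
}
```

Inside the loop, each group still needs a null check before use, because a non-empty match only fills the groups for the units that were actually present.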
