Find string after last underscore before dot extension - java

I need to find 20140809T0000Z in this string:
PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc
I tried the following to keep the string before the .nc:
(?<=_)(.*)(?=.nc)
I have the following to start from the last underscore:
/_[^_]*$/
How can I find the string after the last underscore and before the dot extension, using a regex?

RegEx is not always the best solution... :)
String pattern = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
int start = pattern.lastIndexOf("_") + 1;
int end = pattern.lastIndexOf(".");
if (start != 0 && end != -1 && end > start) {
    System.out.println(pattern.substring(start, end));
}
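For reuse, the lastIndexOf() answer could be wrapped in a small helper. This is a minimal sketch: the class name StampExtractor and the method name are hypothetical, and returning null on failure is an assumption (an Optional would work just as well).

```java
// Minimal sketch: the lastIndexOf() answer wrapped in a helper.
// StampExtractor and betweenLastUnderscoreAndDot are hypothetical names.
class StampExtractor {

    // Returns the text between the last '_' and the last '.', or null when
    // either character is missing or there is nothing between them in that order.
    static String betweenLastUnderscoreAndDot(String fileName) {
        int start = fileName.lastIndexOf('_') + 1; // 0 when there is no '_'
        int end = fileName.lastIndexOf('.');       // -1 when there is no '.'
        if (start == 0 || end == -1 || end <= start) {
            return null;
        }
        return fileName.substring(start, end);
    }

    public static void main(String[] args) {
        // prints 20140809T0000Z
        System.out.println(betweenLastUnderscoreAndDot(
                "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc"));
        // prints null: there is no underscore
        System.out.println(betweenLastUnderscoreAndDot("no-underscore.nc"));
    }
}
```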

You just need lookahead for this requirement.
You can use:
[^._]+(?=[^_]*$)
// matches and returns 20140809T0000Z
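A self-contained sketch of the lookahead-only answer (class and method names are hypothetical). Note that the lookahead only says "no underscore from here to the end", so the pattern does not actually require an underscore to be present:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the lookahead-only answer; LookaheadStamp and find are hypothetical names.
class LookaheadStamp {

    // [^._]+      one or more characters that are neither '.' nor '_'
    // (?=[^_]*$)  followed by nothing but non-underscores up to the end,
    //             i.e. the run lies after the last underscore
    private static final Pattern STAMP = Pattern.compile("[^._]+(?=[^_]*$)");

    static String find(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc")); // 20140809T0000Z
        System.out.println(find("file.nc")); // file: no underscore is required
    }
}
```

If the underscore is mandatory in your file names, check for it separately (for example with lastIndexOf('_')).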

You could use the below regex,
(?<=_)[^_]*(?=\.nc)
In your pattern, just replace .* with [^_]* so that the match cannot run across an underscore and stops at the segment right before .nc.
String s = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
Pattern regex = Pattern.compile("(?<=_)[^_]*(?=\\.nc)");
Matcher regexMatcher = regex.matcher(s);
if (regexMatcher.find()) {
    String resultString = regexMatcher.group();
    System.out.println(resultString);
} //=> 20140809T0000Z

You could use a simpler pattern with a capturing group
.*_(.*)\.nc
By default the first .* will be "greedy" and consume as many characters as possible before the _, leaving just the desired string inside the (.*).
Demo: http://regex101.com/r/aI2xQ9/1
Java code:
String input = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
Pattern pattern = Pattern.compile(".*_(.*)\\.nc");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String group = matcher.group(1);
// ...
}
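To see why the greedy .* matters, here is a sketch that runs the answer's pattern next to a reluctant variant, .*?_(.*)\.nc. The reluctant variant is added only for contrast, and the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch contrasting a greedy and a reluctant prefix; names are hypothetical.
class GreedyPrefixDemo {

    // Returns capturing group 1 of the first match, or null if there is none.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
        // Greedy: .* runs to the end, then backs off only as far as the LAST '_'.
        System.out.println(group1(".*_(.*)\\.nc", input));  // 20140809T0000Z
        // Reluctant: .*? stops at the FIRST '_', so group 1 swallows the rest.
        System.out.println(group1(".*?_(.*)\\.nc", input)); // F2-MARS3D-MENOR1200_20140809T0000Z
    }
}
```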

So you need a sequence of characters that are neither underscores nor periods, immediately preceding a period.
Try [^_.]+(?=\.)
Demo: https://regex101.com/r/sLAnVs/2
Thanks to Cary Swoveland for pointing out that "no need to escape a period in a character class".
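A sketch of [^_.]+(?=\.) in Java (class and method names are hypothetical). The dot inside the character class is literal, but the one in the lookahead must be escaped, which in a Java string literal is written \\.:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of [^_.]+(?=\.); DotLookaheadStamp and find are hypothetical names.
class DotLookaheadStamp {

    // [^_.]+   one or more characters that are neither '_' nor '.'
    // (?=\.)   immediately followed by a literal dot ("\\." in Java source)
    private static final Pattern STAMP = Pattern.compile("[^_.]+(?=\\.)");

    static String find(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc")); // 20140809T0000Z
        System.out.println(find("a_b.tar.gz")); // b: the first run that is followed by a dot
    }
}
```

With several dots, as in the made-up name a_b.tar.gz, find() returns the first qualifying run, which may or may not be what you want.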

Related

How to delete everything after the last number in a String in Java?

If I have a String that consists of letters and numbers, how can I get rid of everything after the last number in the String?
Example:
banana_orange_62_34_wednesday would become banana_orange_62_34
1234_4564_www_6_j_1_rrrr would become 1234_4564_www_6_j_1
I tried this so far:
int endIndex = inputXMLFilename.lastIndexOf("\\d+");
inputXMLFilename = inputXMLFilename.substring(0, endIndex);
Use regex replace:
str = str.replaceAll("\\D+$", "");
What the regex means:
\D means “non-digit”
+ means “one or more of the previous term, greedy (as much of the input as possible)”
$ means “end of input”
The $ anchors the match to the end, without which this would match (and delete) all non-digits.
lastIndexOf() only works with plain text, not regex.
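A minimal sketch of the replaceAll answer as a reusable method (the names are hypothetical). One edge case worth knowing: if the input contains no digit at all, \D+$ matches the whole string and the result is empty.

```java
// Sketch of the replaceAll answer; TrimAfterLastDigit and trim are hypothetical names.
class TrimAfterLastDigit {

    // \D+$ : one or more non-digits that run up to the end of the input
    static String trim(String s) {
        return s.replaceAll("\\D+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("banana_orange_62_34_wednesday")); // banana_orange_62_34
        System.out.println(trim("1234_4564_www_6_j_1_rrrr"));      // 1234_4564_www_6_j_1
        System.out.println(trim("no digits").isEmpty());           // true: everything is removed
    }
}
```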
@Test
public void cutAfterLastDigit() {
    String s = "banana_orange_62_34_wednesday";
    Pattern pattern = Pattern.compile("^(.*\\d)");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }
}

How to check if specific pattern precedes some character?

I am new to Java regex and I couldn't find an answer.
This is my regex: -?\\d*\\.?\\d+(?!i)
and I want it not to recognize, e.g., the string 551i.
This is my method:
private static double regexMatcher(String s, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s.replaceAll("\\s+", ""));
if (!matcher.find()) {
return 0;
}
String found = matcher.group();
return Double.parseDouble(matcher.group());
}
I want this method to return 0.0 but it keeps returning 55.0.
What am I doing wrong?
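Use an atomic group to prevent backtracking into the digit-dot-digit part of the pattern. Without it, when (?!i) fails after 551, the engine gives back digits until it reaches 55, which is followed by 1 rather than i, so the lookahead succeeds and 55 is returned. With the atomic group, the digits cannot be given back once matched: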
"-?(?>\\d*\\.?\\d+)(?!i)"
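The sketch below reproduces the question's method (renamed firstNumber, a hypothetical name, with the unused local removed) and runs both patterns so the backtracking difference is visible. The input -3.5+2i is made up to show that ordinary numbers still match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch comparing the plain and atomic patterns; names are hypothetical.
class AtomicGroupDemo {

    // Same logic as the question's method: strip whitespace, return the first
    // match as a double, or 0 when nothing matches.
    static double firstNumber(String s, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(s.replaceAll("\\s+", ""));
        if (!matcher.find()) {
            return 0;
        }
        return Double.parseDouble(matcher.group());
    }

    public static void main(String[] args) {
        String plain = "-?\\d*\\.?\\d+(?!i)";
        // (?>...) is an atomic (independent) group: no backtracking into it
        String atomic = "-?(?>\\d*\\.?\\d+)(?!i)";
        System.out.println(firstNumber("551i", plain));     // 55.0
        System.out.println(firstNumber("551i", atomic));    // 0.0
        System.out.println(firstNumber("-3.5+2i", atomic)); // -3.5
    }
}
```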

How to match a String in a line having immediate special character?

I have the condition below, which does not match lines like from table1; or insert into table1(col1,col2 ..)
if (Arrays.asList(line.split("\"")).contains("table1")
        || Arrays.asList(line.split(" ")).contains("table1"))
    System.out.println(line);
Which logic do I need to follow?
Use a regular expression and put all the special characters you need to split on inside it:
if (Arrays.asList(line.split("[\",\\s.;(]")).contains("table1"))
Or split with a character class like this:
if (Arrays.asList(line.split("[\", .;(]")).contains("table1"))
Note that you can put whatever characters you want to split the line on inside the square brackets.
You can use regex:
Pattern pat = Pattern.compile("(?<!\\p{L})table1(?!\\p{L})");
if (pat.matcher(line).find())
{
System.out.println(line);
}
If I understand your question properly, you can do it without split():
String stringPattern = ".*table1.*";
Pattern pattern = Pattern.compile(stringPattern);
Matcher matcher = pattern.matcher(line);
if (matcher.matches())
System.out.println(line);
You can use a regex with a negative lookahead and a negative lookbehind:
String input = "from table1;";
Pattern p = Pattern.compile("(?<![a-zA-Z0-9_])table1(?![a-zA-Z0-9_])");
Matcher matcher = p.matcher(input);
if (matcher.find())
System.out.println(input);
This matches any occurrence of "table1" that is not preceded or followed by a letter, digit or underscore.
Try this:
if (Arrays.asList(line.split("[^a-zA-Z0-9_]")).contains("table1")) {
    System.out.println(line);
}
Or, as RealSkeptic suggests, use regular expression matching:
if (line.matches(".*\\btable1\\b.*")) {
    System.out.println(line);
}
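A sketch that checks the question's sample lines with the \b pattern from the last answer, using find() so the surrounding .* is not needed. The lines containing table12 and my_table1 are made-up inputs to show what \b rejects; class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Sketch of a word-boundary check; TableNameCheck and mentionsTable1 are hypothetical names.
class TableNameCheck {

    // \b on both sides: "table1" must not be preceded or followed by a word
    // character (letter, digit or underscore).
    private static final Pattern TABLE1 = Pattern.compile("\\btable1\\b");

    static boolean mentionsTable1(String line) {
        return TABLE1.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "from table1;",                     // true
            "insert into table1(col1,col2 ..)", // true
            "select * from table12",            // false: followed by a digit
            "from my_table1"                    // false: preceded by '_'
        };
        for (String line : lines) {
            System.out.println(mentionsTable1(line) + "  " + line);
        }
    }
}
```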

What is wrong with my regexp in Java?

I want to get the word text2, but the result is null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to spell out the whole pattern inside the parentheses:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to skip straight to the value inside the single quotes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
Your regex doesn't match your string because you didn't account for the opening parentheses after SETVAR. Also, \\w+ matches only word characters, so it cannot match the space and the & characters.
Instead, you can use the negated character class [^']+, which matches one or more characters of any kind except a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
"SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)"
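A sketch that runs the question's original regex and the three suggested fixes against the same input (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch comparing the original regex with the suggested fixes; names are hypothetical.
class SetVarDemo {

    // Returns capturing group 1 of the first match, or null if there is none.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        String[] regexes = {
            "SETVAR\\w+&&(\\w+)'\\)\\)",               // original: '(' is not \w, prints null
            "SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}",  // prints text2
            "SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}",      // prints text2
            "SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)"        // prints text2
        };
        for (String regex : regexes) {
            System.out.println(extract(regex, str));
        }
    }
}
```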

Java Pattern matching issue

I'm having trouble writing a proper regex to match a URL.
String input = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
String regex = "www.*.com"; // To match the www.gmail.com URL
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group()); // prints www.gmail.comBBBBabc#gmail.com
}
Here I want to remove the URL www.gmail.com. However, the match runs to the end of the string, so it also takes in the email address, which ends with gmail.com.
Can someone help me to get proper regex to match only the URL?
.* is a greedy match. Add ? after * to make it a reluctant (lazy) match:
"www\\..*?\\.com"
Your code would be,
String s = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
Pattern p = Pattern.compile("www\\..*?\\.com");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
String regex = "www\\..*?\\.com";
Use a non-greedy repetition of the wildcard '.', and escape the dot wherever it should match literally.
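A negated character class is usually faster than .*?, because the engine does not have to extend a lazy match one character at a time and retry the rest of the pattern after each step.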
Use this regex:
www\.[^.]+\.com
[^.]+ means any character that is not a dot.
In Java, the backslashes have to be doubled inside the string literal:
// for instance
Pattern regex = Pattern.compile("www\\.[^.]+\\.com");
// etc
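A sketch comparing the greedy, reluctant and negated-class patterns on the question's input, plus replaceAll to actually remove the URL as the question asks (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch comparing the three URL patterns; UrlMatchDemo and findAll are hypothetical names.
class UrlMatchDemo {

    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
        System.out.println(findAll("www.*.com", s));              // [www.gmail.comBBBBabc#gmail.com]
        System.out.println(findAll("www\\..*?\\.com", s));        // [www.gmail.com]
        System.out.println(findAll("www\\.[^.]+\\.com", s));      // [www.gmail.com]
        System.out.println(s.replaceAll("www\\.[^.]+\\.com", "")); // AAAhttp://BBBBabc#gmail.com
    }
}
```

The two fixes are not equivalent: [^.]+ cannot cross a dot, so it would not match a host such as www.mail.google.com, while www\..*?\.com would.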
