Regex expression to get the file name - java

I want to extract only filename from the complete file name + time stamp . below is the input.
String filePath = "fileName1_20150108.csv";
expected output should be: "fileName1"
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv"
And expected output should be: "fileName1_filedesc1"
I wrote a below code in java to get the file name but it is working for first part (filePath) but not for filepath2.
Pattern pattern = Pattern.compile(".*.(?=_)");
String filePath = "fileName1_20150108.csv";
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv";
Matcher matcher = pattern.matcher(filePath);
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
Can somebody please help me to correct the regex so i can parse both filepath using same regex?
Thanks

Anchor the start, and make the .* non-greedy:
^.*?(_\D.*?)?(?=[_.])
Update: change the second group (for fileDesc) to optional, and enforce that it starts with a non-digit character. This will work as long as your fileDesc strings never start with numbers.

You can get the characters before the first underscode, the first underscore, and then the characters until the next underscore:
^[^_]*_[^_]*

This should work: "^(.*?)_([0-9_]*)\\.([^.]*)$"
It will return you 3 groups:
the base name (assuming not a single part will be all numbers)
the timestamp info
the extension.
You can test here: http://fiddle.re/v0hne6 (RegexPlanet)

Related

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a java regex to catch some groups of words from a String using a Matcher.
Say i got this string: "Hello, we are #happy# to see you today".
I would like to get 2 group of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.
From comment:
I may incur in a string where I got more than one instances of words wrapped by #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different stylings, the code need to process each block of text separately, and needs to know if the text is inside or outside a #..# section.
Note, in the following code, it will silently skip the last #, if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
if (m.start(1) != -1) {
String outsideText = m.group(1);
System.out.println("Outside: \"" + outsideText + "\"");
} else {
String insideText = m.group(2);
System.out.println("Inside: \"" + insideText + "\"");
}
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
The best I could do is splitting in 3 groups, then merging the group 1 and 4 :
(^.*)(\#(.+?)\#)(.*)
Test it here
EDIT: Taking remarks from the comments :
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to #Lino we don't capture the useless group with # anymore, and we capture anything except #, instead of any non whitespace character in the 1st and 2nd groups.
Test it here
Is this solution fine?
Pattern pattern =
Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher =
pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string :
notBetween) {
System.out.println(string);
}
System.out.println("Printing group 2");
for (String string :
between) {
System.out.println(string);
}

Need help in regex matching

It may be very simple, but I am extremely new to regex and have a requirement where I need to do some regex matches in a string and extract the number in it. Below is my code with sample i/p and required o/p. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but my regex match itself is returning false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str1).matches());
All of above SOPs are returning false. I am using java 8, is there is any way by which in a single statement I can match the pattern and then extract the digit from the string.
I would be great if somebody can point me on how to debug/develop the regex.Please feel free to let me know if something is not clear in my question.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
See the regex demo
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
See another Java demo.
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because you match always the last number in your regex, I would Like to just use replaceAll with this regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(!strResult1.isEmpty() ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(!strResult2.isEmpty() ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(!strResult3.isEmpty() ? "result " + strResult3 : "no result");
If the result is empty then you don't have any number.
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.

Regex to match file extension

I have file names separated by colon :
This one is working as expected
String fileName = "test.pdf:test1.txt:test2.png:test3.jpg:test4.jpeg:test5.doc";
String ext = "pdf";
System.out.println(fileName.matches(".*\\b\\."+ext+":\\b.*"));
but when a matching file is at the end, above solution does not work
String fileName = "test1.txt:test2.png:test3.jpg:test4.jpeg:test5.doc:test.pdf";
What is the regex to achieve it?
Change the pattern to look for : or the end $:
".*\\." + ext + "(:|$).*"
(Also, I removed the unnecessary \\b.)
You can use pattern and matcher.
Pattern pdfPattern = Pattern.compile("\\.pdf");
if(pdfPattern.matcher(fileName).find()){
System.out.println("Found PDF");
}

Java Split String by colon on both side

Can you suggest me an approach by which I can split a String which is like:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
So I tried to parse that string with
:[A-za-z]|\\d:
This kind of regular expression, but it is not working . Please suggest me a regular expression by which I can split that string with 20 , 31C , 31D etc as Keys and 150318 , 150425 IN BANGLADESH etc as Values .
If I use string.split(":") then it would not serve my purpose.
If a string is like:
:20: MY VALUES : ARE HERE
then It will split up into 3 string , and key 20 will be associated with "MY VALUES" , and "ARE HERE" will not associated with key 20 .
You may use matching mechanism instead of splitting since you need to match a specific colon in the string.
The regex to get 2 groups between the first and second colon and also capture everything after the second colon will look like
^:([^:]*):(.*)$
See demo. The ^ will assert the beginning of the string, ([^:]*) will match and capture into Group 1 zero or more characters other than :, and (.*) will match and capture into Group 2 the rest of the string. $ will assert the position at the end of a single line string (as . matches any symbol but a newline without Pattern.DOTALL modifier).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE
You can use the following to split:
^[:]+([^:]+):
Try with split function of String class
String[] splited = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa

Java last occurence of regex till end of string

I need a regex that gets the text between the last occurrence of .java;[number] or java;NONE and the end of the string.
Here's an example of the text I have as input:
user: ilian
branch: HEAD
changed files:
FlatFilePortfolioImportController.java;1.78
ConvertibleBondParser.java;1.52
OptionKnockedOutException.java;1.1.2.1
RebatePayoff.java;NONE
possible dead-lock. The suggested solution is to first create a TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK and PositionManagerSQL
Basically I need to get the comment at the end of the commit, which is after the last changed file, which could end in something like 1.52, 1.1.2.1 or NONE.
String regex = "\\.java;\\d+\\.\\d+(.+)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(input);
if (m.find()) {
System.out.println(m.group(1));
}
Edited
Solution assuming input is in a single line (lines in original post only for clarity, see OP's comment below).
String input = "user: ilian branch: "
+ "HEAD changed files: "
+ "FlatFilePortfolioImportController.java;1.78 "
+ "ConvertibleBondParser.java;1.52 "
+ "possible dead-lock. The suggested solution is to first create a "
+ "TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK "
+ "and PositionManagerSQL";
// checks last occurrence of java;x.xx, optional space(s), anything until end of input
Pattern pattern = Pattern.compile(".+java;[\\d\\.]+\\s+?(.+?)$");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
possible dead-lock. The suggested solution is to first create a TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK and PositionManagerSQL
String comment = mydata.replaceAll("(?s).*java;[0-9,.]+|.*java;NONE", "");
System.out.println(comment);
Works for all the file endings and prints correctly.

Categories

Resources