Regex function rename file issue - java

I'm using the following code to rename a file automatically:
public static String getNewNameForCopyFile(final String originalName, final boolean firstCall) {
if (firstCall) {
final Pattern p = Pattern.compile("(.*?)(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the name, group 2 is the extension
String name = m.group(1);
String extension = m.group(2);
if (extension == null) {
extension = "";
}
return name + "-Copy1" + extension;
} else {
throw new IllegalArgumentException();
}
} else {
final Pattern p = Pattern.compile("(.*?)(-Copy(\\d+))?(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the prefix, group 2 is the number, group 3 is the suffix
String prefix = m.group(1);
String numberMatch = m.group(3);
String suffix = m.group(4);
return prefix + "-Copy" + (numberMatch == null ? 1 : (Integer.parseInt(numberMatch) + 1)) + (suffix == null ? "" : suffix);
} else {
throw new IllegalArgumentException();
}
}
}
This mostly works. I only have a problem with the following filename, and I don't know how to adapt my code:
test.abc.txt
The renamed file becomes 'test-Copy1.abc.txt' but should be 'test.abc-Copy1.txt'.
Do you know how I can achieve this with my method?

If I understand you correctly, you want to insert a copy number before the last dot ('.') in the file name if there is any, and instead you get insertion before the first dot. This arises because you are using a reluctant quantifier for the first group, and the second group is able to match a filename tail containing any number of dots. I think you will do better with this:
final Pattern p = Pattern.compile("(.*?)(\\.[^.]*)?");
Note that if it is present, the second group starts with a dot, but cannot contain other dots.
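As a rough sketch of how that pattern could slot into the original method, the class below folds both branches into a single pattern. The names CopyNamer and nextCopyName are invented here for illustration, and the sketch assumes that a -CopyN marker only counts when it sits directly before the last extension.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative, not from the question.
class CopyNamer {
    // Group 1: base name, group 3: existing copy number (if any),
    // group 4: last extension, which starts with a dot and contains no other dot.
    private static final Pattern NAME =
            Pattern.compile("(.*?)(-Copy(\\d+))?(\\.[^.]*)?");

    static String nextCopyName(String originalName) {
        Matcher m = NAME.matcher(originalName);
        if (!m.matches()) {
            throw new IllegalArgumentException(originalName);
        }
        String number = m.group(3);
        String extension = m.group(4);
        // Start at 1 when there is no copy marker yet, otherwise increment it.
        int next = (number == null) ? 1 : Integer.parseInt(number) + 1;
        return m.group(1) + "-Copy" + next + (extension == null ? "" : extension);
    }

    public static void main(String[] args) {
        System.out.println(nextCopyName("test.abc.txt"));       // test.abc-Copy1.txt
        System.out.println(nextCopyName("test.abc-Copy1.txt")); // test.abc-Copy2.txt
        System.out.println(nextCopyName("README"));             // README-Copy1
    }
}
```

Because the extension group cannot contain a dot, the reluctant first group is forced to grow up to the last dot before the whole name can match.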

I think what you're trying to do is find the last '.' in the file name, correct? In that case you need to use greedy matching .* (which matches as much as possible) instead of .*?
final Pattern p = Pattern.compile("(.*)(\\..*)");
You will need to handle the case with no dot separately:
if (originalName.indexOf('.') == -1)
return originalName + "-Copy1";
// ... your other code
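Put together, the greedy variant might look like the sketch below. GreedySplit and firstCopyName are made-up names, and only the first-call branch of the original method is covered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: GreedySplit and firstCopyName are hypothetical names.
class GreedySplit {
    // Greedy (.*) takes as much as it can, so group 2 starts at the LAST dot.
    private static final Pattern LAST_DOT = Pattern.compile("(.*)(\\..*)");

    static String firstCopyName(String originalName) {
        Matcher m = LAST_DOT.matcher(originalName);
        if (!m.matches()) {
            // No dot at all: nothing to split, so append the marker at the end.
            return originalName + "-Copy1";
        }
        return m.group(1) + "-Copy1" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(firstCopyName("test.abc.txt")); // test.abc-Copy1.txt
        System.out.println(firstCopyName("test"));         // test-Copy1
    }
}
```

Here the failed match doubles as the "no dot" check, so a separate indexOf test is not strictly needed.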

Related

Remove pattern from string in Java

I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how could I remove the three sequences, so that s contains just "Q" at the end of the calculation, without an endless loop?
You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of many patterns, use the | alternation operator, and capture each pattern in its own group.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
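On Java 9 or later, the same loop can be written more compactly with Matcher.replaceAll(Function<MatchResult, String>). The sketch below is an alternative to the appendReplacement loop, not the answer's own code; SequenceRemover and removeSequences are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; requires Java 9+ for Matcher.replaceAll(Function).
class SequenceRemover {
    private static final Pattern SEQ = Pattern.compile("(A+B+)|(C+D+)");

    // Removes every A+B+ or C+D+ run and records each removed run in 'found'.
    static String removeSequences(String s, List<String> found) {
        Matcher m = SEQ.matcher(s);
        return m.replaceAll(match -> {
            found.add(match.group()); // the "do stuff" step goes here
            return "";                // replace the matched run with nothing
        });
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        String remain = removeSequences("AAAAAAABBCCCDDABQ", found);
        System.out.println("Found: " + found);   // [AAAAAAABB, CCCDD, AB]
        System.out.println("Remain: " + remain); // Q
    }
}
```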
This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);
You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically replaces each run of a repeated character with a single one.
If you want to remove the runs completely, pass "" instead of ch as the replacement argument of replaceAll inside the if body.

Regular expression in java that encloses some url

I have this problem:
I have to write a regular expression that handles these URLs:
http://www.amazon.it/TP-LINK-TL-WR841N-Wireless-300Mbps-Ethernet/dp/B001FWYGJS?ie=UTF8&redirect=true&ref_=s9_simh_gw_p147_d0_i2
http://www.amazon.it/gp/product/B014KMQWU0/
http://www.amazon.it/gp/product/glance/B014KMQWU0/
I need a regular expression that matches the full URL up to and including the ASIN of the product (the ASIN is a 10-character code of capital letters and digits).
I have written this regex, but it does not do what I want:
String regex="http:\\/\\/(?:www\\.|)amazon\\.com\\/(?:gp\\ product|| gp\\ product\\ glance || [^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
while (urlAmazonMatcher.find()) {
System.out.println("PROVA "+urlAmazonMatcher.group(0));
}
This is my solution. Finally it works :D
String regex="(http|www\\.)amazon\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
String toReturn = null;
while (urlAmazonMatcher.find()) {
toReturn=urlAmazonMatcher.group(0);
}
How about
/[^/?]{10}(/$|\?)
This matches 10 characters that are neither / nor ? following a slash if those characters are followed by a final slash or a question mark.
You can get the part that precedes or follows the ASIN using one of the various Matcher functions.
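A minimal sketch of that idea, matching from the start of the URL up to the ASIN. It assumes the ASIN is a whole path segment of exactly 10 uppercase letters or digits, as in the three sample URLs; AsinUrl, upToAsin and asin are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the ASIN format is an assumption based on the sample URLs.
class AsinUrl {
    // Skips whole path segments lazily until a segment of 10 [A-Z0-9] characters
    // is found that is followed by '/', '?', '#' or the end of the input.
    private static final Pattern UP_TO_ASIN = Pattern.compile(
            "^https?://(?:www\\.)?amazon\\.[a-z.]+/(?:[^/?#]+/)*?([A-Z0-9]{10})(?=[/?#]|$)");

    // Returns the URL cut off right after the ASIN, or null if none is found.
    static String upToAsin(String url) {
        Matcher m = UP_TO_ASIN.matcher(url);
        return m.find() ? m.group() : null;
    }

    // Returns just the ASIN (capture group 1), or null if none is found.
    static String asin(String url) {
        Matcher m = UP_TO_ASIN.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.amazon.it/gp/product/glance/B014KMQWU0/";
        System.out.println(upToAsin(url)); // http://www.amazon.it/gp/product/glance/B014KMQWU0
        System.out.println(asin(url));     // B014KMQWU0
    }
}
```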
Here is my work from a previous project that was to extract URLs from text:
private Pattern getUriPattern() {
if(uriPattern == null) {
// taken from http://labs.apache.org/webarch/uri/rfc/rfc3986.html
//TODO implement the full URI syntax
String genDelims = "\\:\\/\\?\\#\\[\\]\\@";
String subDelims = "\\!\\$\\&\\'\\*\\+\\,\\;\\=";
String reserved = genDelims + subDelims;
String unreserved = "\\w\\-\\.\\~"; // i.e. ALPHA / DIGIT / "-" / "." / "_" / "~"
String allowed = reserved + unreserved;
// ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
uriPattern = Pattern.compile("((?:[^\\:/\\?\\#]+:)?//[" + allowed + "&&[^\\?\\#]]*(?:\\?([" + allowed + "&&[^\\#]]*))?(?:\\#[" + allowed + "]*)?).*");
}
return uriPattern;
}
You can use the above method as follows:
Matcher uriMatcher = getUriPattern().matcher(text);
if(uriMatcher.matches()) {
String candidateUriString = uriMatcher.group(1);
try {
new URI(candidateUriString); // check once again if you matched a URL
// your code here
} catch (Exception e) {
// error handling
}
}
This will catch the whole URL, including parameters. You can then cut it at the first occurrence of '?' (if any) and keep the first part. Of course, you can rework the regex too.
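Cutting at the first '?' needs no regex at all. A tiny sketch, with QueryStripper and stripQuery as hypothetical names:

```java
// Hypothetical helper for dropping the query string from a matched URL.
class QueryStripper {
    // Returns everything before the first '?', or the whole string if there is none.
    static String stripQuery(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    public static void main(String[] args) {
        System.out.println(stripQuery("http://www.amazon.it/x/dp/B001FWYGJS?ie=UTF8&redirect=true"));
        System.out.println(stripQuery("http://www.amazon.it/gp/product/B014KMQWU0/"));
    }
}
```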

Replace environment variable place-holders with their actual value?

In my Application.properties file I am using a key and value like this:
report.custom.templates.path=${CATALINA_HOME}\\\\Medic\\\\src\\\\main\\\\reports\\\\AllReports\\\\
I need to replace the ${CATALINA_HOME} with its actual path:
{CATALINA_HOME} = C:\Users\s57893\softwares\apache-tomcat-7.0.27
Here is my code:
public class ReadEnvironmentVar {
public static void main(String args[]) {
String path = getConfigBundle().getString("report.custom.templates.path");
System.out.println("Application Resources : " + path);
String actualPath = resolveEnvVars(path);
System.out.println("Actual Path : " + actualPath);
}
private static ResourceBundle getConfigBundle() {
return ResourceBundle.getBundle("medicweb");
}
private static String resolveEnvVars(String input) {
if (null == input) {
return null;
}
Pattern p = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String envVarName = null == m.group(1) ? m.group(2) : m.group(1);
String envVarValue = System.getenv(envVarName);
m.appendReplacement(sb, null == envVarValue ? "" : envVarValue);
}
m.appendTail(sb);
return sb.toString();
}
}
My code produces this result:
Actual Path :
C:Userss57893softwaresapache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
but I need this result:
Actual Path :
C:\Users\s57893\softwares\apache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
Could you show me an example?
Because of the way appendReplacement() works, you'll need to escape backslashes that you find in the environment variable. From the Javadocs:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
I would use:
m.appendReplacement(sb,
null == envVarValue ? "" : Matcher.quoteReplacement(envVarValue));
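For a self-contained illustration of the difference, the sketch below runs the same loop against a Map instead of System.getenv, so the result does not depend on the machine's environment. The Map parameter and the class name EnvVarResolver are assumptions made for this example only.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of resolveEnvVars that takes the variables as a Map.
class EnvVarResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    static String resolve(String input, Map<String, String> env) {
        Matcher m = VAR.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = env.get(name);
            // quoteReplacement escapes '\' and '$' so the value is inserted literally.
            m.appendReplacement(sb, value == null ? "" : Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = Collections.singletonMap(
                "CATALINA_HOME", "C:\\Users\\s57893\\softwares\\apache-tomcat-7.0.27");
        // Prints C:\Users\s57893\softwares\apache-tomcat-7.0.27\Medic\src
        System.out.println(resolve("${CATALINA_HOME}\\Medic\\src", env));
    }
}
```

Without quoteReplacement, each backslash in the value would be consumed as an escape character, which is what produces the C:Userss57893... output.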

How can I match a particular format in input using java.util.regex in Java?

INPUT
Input can be in any of the forms shown below, with the following mandatory content: TXT{Any comma separated strings in any format}
String loginURL = "http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}";
String loginURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}";
String loginURL2 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}";
String loginURL3 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}";
String loginURL4 = "http://ip:port/path?username=abcd&password={PASS}";
Required Output
1. OutputURL corresponding to loginURL.
String outputURL = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL2 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL3 = "http://ip:port/path?";
String outputURL4 = "http://ip:port/path?username=abcd&password={PASS}";
2. Deleted pattern (if any)
String deletedPattern = "TXT{UE-IP,UE-Username,UE-Password}";
My Attempts
String loginURLPattern = TXT+"\\{([\\w-,]*)\\}&*";
System.out.println("1. ");
getListOfTemplates(loginURL, loginURLPattern);
System.out.println();
System.out.println("2. ");
getListOfTemplates(loginURL1, loginURLPattern);
System.out.println();
private static void getListOfTemplates(String inputSequence,String pattern){
System.out.println("Input URL : " + inputSequence);
Matcher templateMatcher = Pattern.compile(pattern).matcher(inputSequence);
if (templateMatcher.find() && templateMatcher.group(1).length() > 0) {
System.out.println(templateMatcher.group(1));
System.out.println("OutputURL : " + templateMatcher.replaceAll(""));
}
}
OUTPUT obtained
1.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
UE-IP,UE-Username,UE-Password}&password={PASS
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&
2.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
UE-IP,UE-Username,UE-Password
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&
DRAWBACK OF ABOVE PATTERN
If I add any string containing characters such as # or % inside TXT{}, my code breaks.
How can I achieve this with the java.util.regex library, so that the user can put any comma-separated strings inside TXT{Any Comma Separated Strings}?
I would recommend using Matcher.appendReplacement:
public static void main(final String[] args) throws Exception {
final String[] loginURLs = {
"http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}",
"http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?username=abcd&password={PASS}"};
final Pattern patt = Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");
for (final String loginURL : loginURLs) {
System.out.printf("%1$-10s %2$s%n", "Processing", loginURL);
final StringBuffer sb = new StringBuffer();
final Matcher matcher = patt.matcher(loginURL);
while (matcher.find()) {
final String found = matcher.group(2);
System.out.printf("%1$-10s %2$s%n", "Found", found);
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
}
matcher.appendTail(sb);
System.out.printf("%1$-10s %2$s%n%n", "Processed", sb.toString());
}
}
Output:
Processing http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path
Processing http://ip:port/path?username=abcd&password={PASS}
Processed http://ip:port/path?username=abcd&password={PASS}
As you rightly point out, there are 3 possible cases:
"?{TEXT}&" -> "?"
"&{TEXT}&" -> "&"
"?{TEXT}" -> ""
So what we need to do is test for those cases in the regex. Here is the pattern:
(\\?)?&?(TXT\\{[^}]++})(&)?
Explanation:
(\\?)? optionally matches and captures a ?
&? optionally matches an & (without capturing it)
(TXT\\{[^}]++}) matches and captures TXT, followed by {, followed by one or more characters that are not } (possessively), followed by } (closing braces don't need to be escaped)
(&)? optionally matches and captures a &
We have 3 groups:
potentially a ?
the required text
potentially an &
Now when we find a match we need to replace with the appropriate capture of case 1..3
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
If groups 1 and 3 are both present:
We must be in case 1; we must replace with "?" which is in group 1 so $1.
Otherwise we are in case 2 or 3:
In case 2 we need to replace with "&" and in 3 with "".
In case 2 group 3 will hold "&", and in case 3 group 3 did not match, so $3 expands to the empty string; we can therefore replace with $3 in both these cases.
Here I only capture the TXT{...} part using a match group. This means that although the leading ? or & is replaced, it is not in the String found. If you only want the bit between {}, just move the parentheses.
Note that I reuse the Pattern - you can also reuse the Matcher if performance is a concern. You should always reuse the Pattern as it is (very) expensive to create. Store it in a static final if you can - it's threadsafe, matchers are not. The usual way to do it is to store the Pattern in a static final and then reuse the Matcher in the context of a method.
Also, the use of Matcher.appendReplacement is much more efficient than your current approach as it only needs to process the input once. Your approach parses the string twice.
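The advice about keeping the Pattern in a static final could be sketched as follows. TxtStripper and strip are hypothetical names; the pattern and the $1/$3 replacement logic are the ones explained above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern discussed above.
class TxtStripper {
    // Compiled once and shared: Pattern instances are thread-safe.
    private static final Pattern TXT =
            Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");

    static String strip(String url) {
        // Matchers are not thread-safe, so a fresh one is created per call.
        Matcher matcher = TXT.matcher(url);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            boolean afterQuestionMark =
                    matcher.group(1) != null && matcher.group(3) != null;
            // "?TXT{..}&" keeps the "?"; otherwise keep the trailing "&" if there is one.
            matcher.appendReplacement(sb, afterQuestionMark ? "$1" : "$3");
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip(
                "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd"));
        System.out.println(strip("http://ip:port/path?username=abcd&password={PASS}"));
    }
}
```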

Regex to match "path/*.extension"

I am trying to find a regular expression that would match the following format:
path/*.file_extension
For example:
temp/*.jpg
usr/*.pdf
var/lib/myLib.so
tmp/
Using the regex, I want to store the matching parts into a String array, such as:
String[] tokens;
// regex magic here
String path = tokens[0];
String filename = tokens[1];
String extension = tokens[2];
In the last case, tmp/, which contains no filename and no extension, tokens[1] and tokens[2] would be null.
In the case of:
usr/*.pdf
tokens[1] would contain only the string "*".
Thank you very much for your help.
If you can use Java7 then you can use named groups like this
String data = "temp/*.jpg, usr/*.pdf, var/lib/*.so, tmp/*, usr/*, usr/*.*";
Pattern p = Pattern
.compile("(?<path>(\\w+/)+)((?<name>\\w+|[*]))?([.](?<extension>\\w+|[*]))?");
Matcher m = p.matcher(data);
while (m.find()) {
System.out.println("data=" + m.group());
System.out.println("path=" + m.group("path"));
System.out.println("name=" + m.group("name"));
System.out.println("extension=" + m.group("extension"));
System.out.println("------------");
}
This code should work:
String line = "var/lib/myLib.so";
Pattern p = Pattern.compile("(.+?(?=/[^/]*$))/([^.]+)\\.(.+)$");
Matcher m = p.matcher(line);
List<String> tokens = new ArrayList<String>();
if (m.find()) {
for (int i=1; i <= m.groupCount(); i++) {
tokens.add(m.group(i));
}
}
System.out.println("Tokens => " + tokens);
OUTPUT:
Tokens => [var/lib, myLib, so]
I'm assuming you're using Java. This should work:
Pattern.compile("path/(.*?)(?:\\.(file_extension))?");
Why use a regular expression?
I personally find lastIndexOf more readable.
String path;
String filename;
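@Nullable String extension;
// Look for the last slash
int lastSlash = fullPath.lastIndexOf('/');
// Everything up to and including the last slash is the path
path = fullPath.substring(0, lastSlash + 1);
// Look for the last dot; it only counts if it comes after the last slash
int lastDot = fullPath.lastIndexOf('.');
if (lastDot <= lastSlash) {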
filename = fullPath.substring(lastSlash + 1);
// If there is no dot, then there is no extension which
// is distinct from the empty extension in "foo/bar."
extension = null;
} else {
filename = fullPath.substring(lastSlash + 1, lastDot);
extension = fullPath.substring(lastDot + 1);
}
As a different approach, a simple use of the substring()/lastIndexOf() methods should serve the purpose:
String filePath = "var/lib/myLib.so";
String fileNameWithExt = filePath.substring(filePath.lastIndexOf('/') + 1);
String path = filePath.substring(0, filePath.lastIndexOf('/'));
String fileName = fileNameWithExt.substring(0, fileNameWithExt.lastIndexOf('.'));
String extension = fileNameWithExt.substring(fileNameWithExt.lastIndexOf('.') + 1);
Please note: you need to handle the alternate scenarios, e.g. a file path without an extension.
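One way those alternate scenarios might be handled is sketched below. It returns the three-element tokens array the question asks for, with null for a missing filename or extension. PathTokens and split are invented names, and keeping the trailing slash in the path is a choice made here, not something the question specifies.

```java
import java.util.Arrays;

// Hypothetical helper returning {path, filename, extension}.
class PathTokens {
    static String[] split(String fullPath) {
        int lastSlash = fullPath.lastIndexOf('/');
        String path = fullPath.substring(0, lastSlash + 1); // keeps the trailing slash
        String rest = fullPath.substring(lastSlash + 1);
        if (rest.isEmpty()) {
            // e.g. "tmp/": no filename and no extension
            return new String[] {path, null, null};
        }
        int lastDot = rest.lastIndexOf('.');
        if (lastDot < 0) {
            // filename without any dot: no extension
            return new String[] {path, rest, null};
        }
        return new String[] {path, rest.substring(0, lastDot), rest.substring(lastDot + 1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("var/lib/myLib.so"))); // [var/lib/, myLib, so]
        System.out.println(Arrays.toString(split("usr/*.pdf")));        // [usr/, *, pdf]
        System.out.println(Arrays.toString(split("tmp/")));             // [tmp/, null, null]
    }
}
```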
