Regex to get the string between slashes of a URL - Java

On my website, the user enters a URL such as http://mydomain/shirt/abc. I need to extract abc as the keyword (following the logic given below). This is possible with String utilities, but I think a regex would be easier, and I am not good at regex.
Java code
String testURLs[] = {"", "/" ,"/shirt", "/shirt/", "/shirt/abc", "/shirt/abc/", "/shirt/abc/xyz", "/shirt/abc?x=y", "/shirt/xyz/abc/pqr",
"abc/something", "/abc/some", "abc/shirt/something"};
for(String x : testURLs){
System.out.println(x + " --> " + getAppNameFromURL(x));
}
private String getAppNameFromURL(String userEnteredURL){
String uri = userEnteredURL.replaceFirst("/shirt/","");
int index = uri.indexOf("/");
if(index == -1 || index == 0){
return null;
}
return uri.substring(0,index);
}
Output: (expected)
--> NULL
/ --> NULL
/shirt --> "" (empty string)
/shirt/ --> "" (empty string)
/shirt/abc --> abc
/shirt/abc/ --> abc
/shirt/abc/xyz --> abc
/shirt/abc?x=y --> abc
/shirt/xyz/abc/pqr --> xyz
abc/something --> NULL
/abc/some --> NULL
abc/shirt/something --> NULL
Literally, the expected logic is :
Get the string just after /shirt/
If starts with "" (empty string) - return NULL
If starts with "/" - return NULL
If starts with "/shirt/" - return "" (empty string)
If starts with "/shirt" - return "" (empty string)
If starts with "/shirt?" - return "" (empty string)
If starts with "/shirt/?" - return "" (empty string)
If starts with "/shirt/abc" - return abc
If starts with "/shirt/xyz/" - return xyz
If starts with "/shirt/xyz/abc" - return xyz
Is there any way to do this with regex ?
I tried in regex but failed!!

Yes it is!
shirt\/(\w+)
is what you need.
private String getAppNameFromURL(String userEnteredURL){
Pattern p = Pattern.compile("shirt/(\\w+)"); // backslashes must be doubled in a Java string literal; "/" needs no escape
Matcher m = p.matcher(userEnteredURL);
if(m.find()){
return m.group(1);
}
return null;
}
This will always give you the first path segment after shirt/, provided it starts with a word character (a letter, digit, or underscore). The match stops at the first non-word character, such as /, ? or -.
Also check this https://regexr.com/3o3jm
You can place your expected logic here to check the regex.

Thank you all for your time to help. At last I got an exact idea from all your inputs.
Here is a DEMO
private static String getAppNameFromURL(String userEnteredURL){
if(userEnteredURL == null || !userEnteredURL.startsWith("/shirt")){ // also rejects null, which would otherwise reach p.matcher(null)
return null;
}
Pattern p = Pattern.compile("^/shirt/([^?]([^\\?/]+|.*))");
Matcher m = p.matcher(userEnteredURL);
if(m.find()){
return m.group(1);
}
return "";
}
Output:
empty string-->null
/-->null
/shirt--> empty string
/shirt/--> empty string
/shirt/abc-->abc
/shirt/abc/-->abc
/shirt/abc/xyz-->abc
/shirt/abc?x=y-->abc
/shirt/xyz/abc/pqr-->xyz
abc/something-->null
/abc/some-->null
abc/shirt/something-->null
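For comparison, the expected table in the question can also be covered by one pattern used with matches(), without the separate startsWith check. This is only a sketch: the class name ShirtKeyword and the method keywordAfterShirt are made up for illustration, and it assumes that a path such as /shirtxyz (no slash or ? right after shirt) should yield null, which the question does not specify.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShirtKeyword {
    // "/shirt", then optionally "/" plus a segment without "/" or "?",
    // then optionally a remainder that starts with "/" or "?".
    private static final Pattern SHIRT_PATH =
            Pattern.compile("/shirt(?:/([^/?]*))?(?:[/?].*)?");

    // Hypothetical helper: null when the input is not under /shirt,
    // "" when there is no keyword, otherwise the first segment after /shirt/.
    static String keywordAfterShirt(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = SHIRT_PATH.matcher(url);
        if (!m.matches()) {
            return null;
        }
        String keyword = m.group(1);
        return keyword == null ? "" : keyword;
    }

    public static void main(String[] args) {
        String[] samples = {"", "/", "/shirt", "/shirt/abc?x=y", "/shirt/xyz/abc/pqr", "/abc/some"};
        for (String s : samples) {
            System.out.println("'" + s + "' --> " + keywordAfterShirt(s));
        }
    }
}
```

Because matches() must consume the whole input, inputs such as abc/shirt/something are rejected without an explicit ^ anchor.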

Related

Java StringUtils.stripEnd with period, hyphen or underscore

I'm trying to strip trailing characters off of a string using StringUtils.stripEnd, and noticed if I try to strip "_FOO" from "FOO_FOO", this returns an empty string. For example,
import org.apache.commons.lang3.StringUtils;
public class StripTest {
public static void printStripped(String s1, String suffix){
String result = StringUtils.stripEnd(s1, suffix);
System.out.println(String.format("Stripping '%s' from %s --> %s", suffix, s1, result));
}
public static void main(String[] args) {
printStripped("FOO.BAR", ".BAR");
printStripped("BAR.BAR", ".BAR");
printStripped("FOO_BAR", "_BAR");
printStripped("BAR_BAR", "_BAR");
printStripped("FOO-BAR", "-BAR");
printStripped("BAR-BAR", "-BAR");
}
}
Which outputs
Stripping '.BAR' from FOO.BAR --> FOO
Stripping '.BAR' from BAR.BAR -->
Stripping '_BAR' from FOO_BAR --> FOO
Stripping '_BAR' from BAR_BAR -->
Stripping '-BAR' from FOO-BAR --> FOO
Stripping '-BAR' from BAR-BAR -->
Can someone explain this behavior? Didn't see any examples from docs of this case. Using Java 7.
Look at the documentation and examples present in the StringUtils Javadoc:
Strips any of a set of characters from the end of a String.
A null input String returns null. An empty string ("") input returns the empty string.
If the stripChars String is null, whitespace is stripped as defined by Character.isWhitespace(char).
StringUtils.stripEnd(null, *) = null
StringUtils.stripEnd("", *) = ""
StringUtils.stripEnd("abc", "") = "abc"
StringUtils.stripEnd("abc", null) = "abc"
StringUtils.stripEnd(" abc", null) = " abc"
StringUtils.stripEnd("abc ", null) = "abc"
StringUtils.stripEnd(" abc ", null) = " abc"
StringUtils.stripEnd(" abcyx", "xyz") = " abc"
StringUtils.stripEnd("120.00", ".0") = "12"
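This is not what you want: stripEnd treats its second argument as a SET of characters and keeps removing trailing characters for as long as they belong to that set. In "BAR.BAR" every character is in the set ".BAR", so nothing is left. I believe you are looking for removeEnd(...):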
Removes a substring only if it is at the end of a source string, otherwise returns the source string.
A null source string will return null. An empty ("") source string will return the empty string. A null search string will return the source string.
StringUtils.removeEnd(null, *) = null
StringUtils.removeEnd("", *) = ""
StringUtils.removeEnd(*, null) = *
StringUtils.removeEnd("www.domain.com", ".com.") = "www.domain.com"
StringUtils.removeEnd("www.domain.com", ".com") = "www.domain"
StringUtils.removeEnd("www.domain.com", "domain") = "www.domain.com"
StringUtils.removeEnd("abc", "") = "abc"
removeEnd(...) operates not on a set of characters but on a substring, which is what you are trying to remove.
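To make the difference concrete without pulling in Commons Lang, here is a rough plain-Java sketch of the two behaviours. The helper names stripEndSet and removeSuffix are hypothetical stand-ins written for this illustration, not the library's implementation, and they assume non-null arguments.

```java
final class StripVsRemove {
    // Character-set semantics: drop trailing characters while each one
    // occurs somewhere in 'chars' (mirrors the documented stripEnd behaviour).
    static String stripEndSet(String s, String chars) {
        int end = s.length();
        while (end > 0 && chars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(0, end);
    }

    // Substring semantics: drop 'suffix' once, only if the string ends with it
    // (mirrors the documented removeEnd behaviour).
    static String removeSuffix(String s, String suffix) {
        if (suffix.isEmpty() || !s.endsWith(suffix)) {
            return s;
        }
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println("[" + stripEndSet("BAR.BAR", ".BAR") + "]");
        System.out.println("[" + removeSuffix("BAR.BAR", ".BAR") + "]");
    }
}
```

With the set-based version every character of "BAR.BAR" is consumed, while the suffix-based version leaves "BAR".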

Parsing a String on '$', which is used as the separator but may also be part of a value

"$1200-$2000$amol" -->{$1200-$2000, amol}.
"amol$$1200-$2000" -->{amol,$1200-$2000}.
"amol$1200-2000" -->{amol,1200-2000}.
"amol$$1200-$2000$patare" -->{amol,$1200-$2000,patare}.
"amol$$1200-$2000$patare$$12-$20" -->{amol,$1200-$2000,patare,$12-$20}.
Here, I am looking for logic that parses the string on the left-hand side into the vector on the right-hand side. '$' is used as a separator, but '$' may also be part of a value. For example, in the second pattern "amol$$1200-$2000", the first '$' is the separator between "amol" and "$1200-$2000", while the other '$' characters are part of the value "$1200-$2000".
// lovValue is the string to be parsed
private Vector getTockensForLovValue(String lovValue) {
int beginIndex = 0;
Vector vector = new Vector();
while (beginIndex < lovValue.length())
{
int dollarIndex = lovValue.indexOf("$", beginIndex);
if (dollarIndex != -1)
{
String s1 = lovValue.substring(beginIndex, dollarIndex);
vector.add(s1);
beginIndex = dollarIndex + 1;
}
else
{
vector.add(lovValue.substring(beginIndex));
beginIndex = lovValue.length();
}
}
return vector;
}
UPDATE: The answer has been updated with an extended regex that also maps amol$$1200$patare to {amol,$1200,patare}, as requested in a comment.
You can use a regular expression to do this:
\$\d+(?:-\$\d+)?(?=\$|$)|[^$]+|(?<=^|\$)(?=\$|$)
It says: First try to match $99-$99 or $99. The match must be followed by $ or end-of-string.
If that fails, match a sequence of any character that is not $.
Also match the empty string between two $ or before leading $ or after trailing $.
When specified in a Java string literal, double the \.
private static List<String> parse(String input) {
Pattern p = Pattern.compile("\\$\\d+(?:-\\$\\d+)?(?=\\$|$)|[^$]+|(?<=^|\\$)(?=\\$|$)");
List<String> list = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
list.add(m.group());
return list;
}
See regex101 for demo.
TEST
public static void main(String[] args) {
test("$1200-$2000$amol");
test("amol$$1200-$2000");
test("amol$1200-2000");
test("amol$$1200-$2000$patare");
test("amol$$1200-$2000$patare$$12-$20");
test("amol$$1200$patare");
test("$$$1200$$");
test("$$$1200x$$");
}
private static void test(String input) {
System.out.println(input + " --> " + parse(input));
}
OUTPUT
$1200-$2000$amol --> [$1200-$2000, amol]
amol$$1200-$2000 --> [amol, $1200-$2000]
amol$1200-2000 --> [amol, 1200-2000]
amol$$1200-$2000$patare --> [amol, $1200-$2000, patare]
amol$$1200-$2000$patare$$12-$20 --> [amol, $1200-$2000, patare, $12-$20]
amol$$1200$patare --> [amol, $1200, patare]
$$$1200$$ --> [, , $1200, , ]
$$$1200x$$ --> [, , , 1200x, , ]

Regex function rename file issue

I'm using following code to rename a file automatically:
public static String getNewNameForCopyFile(final String originalName, final boolean firstCall) {
if (firstCall) {
final Pattern p = Pattern.compile("(.*?)(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the name, group 2 is the extension
String name = m.group(1);
String extension = m.group(2);
if (extension == null) {
extension = "";
}
return name + "-Copy1" + extension;
} else {
throw new IllegalArgumentException();
}
} else {
final Pattern p = Pattern.compile("(.*?)(-Copy(\\d+))?(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the prefix, group 3 is the copy number, group 4 is the suffix
String prefix = m.group(1);
String numberMatch = m.group(3);
String suffix = m.group(4);
return prefix + "-Copy" + (numberMatch == null ? 1 : (Integer.parseInt(numberMatch) + 1)) + (suffix == null ? "" : suffix);
} else {
throw new IllegalArgumentException();
}
}
}
This mostly works; I only have a problem with the following file name, and I don't know how to adapt my code:
test.abc.txt
The renamed file becomes 'test-Copy1.abc.txt' but should be 'test.abc-Copy1.txt'.
Do you know how I can achieve this with my method?
If I understand you correctly, you want to insert a copy number before the last dot ('.') in the file name if there is any, and instead you get insertion before the first dot. This arises because you are using a reluctant quantifier for the first group, and the second group is able to match a filename tail containing any number of dots. I think you will do better with this:
final Pattern p = Pattern.compile("(.*?)(\\.[^.]*)?");
Note that if it is present, the second group starts with a dot, but cannot contain other dots.
I think what you're trying to do is find the last '.' in the file name, correct? In that case you need to use greedy matching .* (which matches as much as possible) instead of .*?
final Pattern p = Pattern.compile("(.*)(\\..*)");
You will need to handle the case with no dot separately:
if (originalName.indexOf('.') == -1)
return originalName + "-Copy1";
// ... your other code
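Putting the "extension may not contain further dots" idea together with the copy counter, one possible shape for the whole method is sketched below. It is an illustration under assumptions, not the asker's final code: the class CopyNameSketch and method nextCopyName are invented names, and the firstCall flag is dropped on the assumption that an existing -CopyN marker should always be incremented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CopyNameSketch {
    // group 1: base name (reluctant), group 2: existing copy number (optional),
    // group 3: extension starting at the LAST dot (optional, contains no other dot)
    private static final Pattern COPY_NAME =
            Pattern.compile("(.*?)(?:-Copy(\\d+))?(\\.[^.]*)?");

    static String nextCopyName(String originalName) {
        Matcher m = COPY_NAME.matcher(originalName);
        if (!m.matches()) {
            throw new IllegalArgumentException(originalName);
        }
        String base = m.group(1);
        String number = m.group(2);
        String extension = m.group(3) == null ? "" : m.group(3);
        int next = number == null ? 1 : Integer.parseInt(number) + 1;
        return base + "-Copy" + next + extension;
    }

    public static void main(String[] args) {
        System.out.println(nextCopyName("test.abc.txt"));
        System.out.println(nextCopyName("test.abc-Copy1.txt"));
    }
}
```

Because group 3 cannot contain a second dot, the reluctant first group is forced to grow until only the last extension remains, which puts the copy marker before the final dot.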

Replace environment variable place-holders with their actual value?

In my Application.properties file I am using a key and value like that
report.custom.templates.path=${CATALINA_HOME}\\\\Medic\\\\src\\\\main\\\\reports\\\\AllReports\\\\
I need to replace the ${CATALINA_HOME} with its actual path:
{CATALINA_HOME} = C:\Users\s57893\softwares\apache-tomcat-7.0.27
Here is my code:
public class ReadEnvironmentVar {
public static void main(String args[]) {
String path = getConfigBundle().getString("report.custom.templates.path");
System.out.println("Application Resources : " + path);
String actualPath = resolveEnvVars(path);
System.out.println("Actual Path : " + actualPath);
}
private static ResourceBundle getConfigBundle() {
return ResourceBundle.getBundle("medicweb");
}
private static String resolveEnvVars(String input) {
if (null == input) {
return null;
}
Pattern p = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String envVarName = null == m.group(1) ? m.group(2) : m.group(1);
String envVarValue = System.getenv(envVarName);
m.appendReplacement(sb, null == envVarValue ? "" : envVarValue);
}
m.appendTail(sb);
return sb.toString();
}
}
With this code I get the following result:
Actual Path :
C:Userss57893softwaresapache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
but I need this result:
Actual Path :
C:\Users\s57893\softwares\apache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
Could you show me an example?
Because of the way appendReplacement() works, you'll need to escape backslashes that you find in the environment variable. From the Javadocs:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
I would use:
m.appendReplacement(sb,
null == envVarValue ? "" : Matcher.quoteReplacement(envVarValue));
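For context, here is how that one-line change might sit in the full method. This is a sketch rather than a drop-in replacement: the class name EnvVarResolver is invented, and the environment is passed in as a Map (an assumption made so the method can be exercised with fixed values; System.getenv() returns such a map).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EnvVarResolver {
    // Matches ${NAME} (group 1) or $NAME (group 2), as in the question.
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    static String resolve(String input, Map<String, String> env) {
        if (input == null) {
            return null;
        }
        Matcher m = VAR.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = env.get(name);
            // quoteReplacement escapes '\' and '$' so the value is inserted literally.
            m.appendReplacement(sb, value == null ? "" : Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Uses the real process environment; the output depends on the machine.
        System.out.println(resolve("${CATALINA_HOME}\\Medic\\src", System.getenv()));
    }
}
```

Without quoteReplacement, each backslash in a Windows path is read as an escape character and dropped, which is how C:\Users turns into C:Users.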

How can I match a particular format in input using java.util.regex in Java?

INPUT
The input can take any of the forms shown below; the mandatory content is TXT{any comma-separated strings in any format}.
String loginURL = "http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}";
String loginURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}";
String loginURL2 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}";
String loginURL3 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}";
String loginURL4 = "http://ip:port/path?username=abcd&password={PASS}";
Required Output
1. OutputURL corresponding to loginURL.
String outputURL = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL2 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL3 = "http://ip:port/path?";
String outputURL4 = "http://ip:port/path?username=abcd&password={PASS}";
2. Deleted pattern(if any)
String deletedPattern = "TXT{UE-IP,UE-Username,UE-Password}";
My Attempts
String loginURLPattern = TXT+"\\{([\\w-,]*)\\}&*";
System.out.println("1. ");
getListOfTemplates(loginURL, loginURLPattern);
System.out.println();
System.out.println("2. ");
getListOfTemplates(loginURL1, loginURLPattern);
System.out.println();
private static void getListOfTemplates(String inputSequence,String pattern){
System.out.println("Input URL : " + inputSequence);
Matcher templateMatcher = Pattern.compile(pattern).matcher(inputSequence);
if (templateMatcher.find() && templateMatcher.group(1).length() > 0) {
System.out.println(templateMatcher.group(1));
System.out.println("OutputURL : " + templateMatcher.replaceAll(""));
}
}
OUTPUT obtained
1.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
UE-IP,UE-Username,UE-Password}&password={PASS
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&
2.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
UE-IP,UE-Username,UE-Password
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&
DRAWBACK OF ABOVE PATTERN
If I add any string containing characters like # or % between TXT{ and }, my code breaks.
How can I achieve this with the java.util.regex library, so that the user can put any comma-separated strings inside TXT{...}?
I would recommend using Matcher.appendReplacement:
public static void main(final String[] args) throws Exception {
final String[] loginURLs = {
"http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}",
"http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?username=abcd&password={PASS}"};
final Pattern patt = Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");
for (final String loginURL : loginURLs) {
System.out.printf("%1$-10s %2$s%n", "Processing", loginURL);
final StringBuffer sb = new StringBuffer();
final Matcher matcher = patt.matcher(loginURL);
while (matcher.find()) {
final String found = matcher.group(2);
System.out.printf("%1$-10s %2$s%n", "Found", found);
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
}
matcher.appendTail(sb);
System.out.printf("%1$-10s %2$s%n%n", "Processed", sb.toString());
}
}
Output:
Processing http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path
Processing http://ip:port/path?username=abcd&password={PASS}
Processed http://ip:port/path?username=abcd&password={PASS}
As you rightly point out, there are 3 possible cases:
"?{TEXT}&" -> "?"
"&{TEXT}&" -> "&"
"?{TEXT}" -> ""
So what we need to do is test for those cases in the regex. Here is the pattern:
(\\?)?&?(TXT\\{[^}]++})(&)?
Explanation:
(\\?)? optionally matches and captures a ?
&? optionally matches an & (without capturing it)
(TXT\\{[^}]++}) matches and captures TXT, followed by {, followed by one or more characters that are not } (possessively), followed by } (closing braces don't need to be escaped)
(&)? optionally matches and captures a &
We have 3 groups:
potentially a ?
the required text
potentially an &
Now when we find a match we need to replace with the appropriate capture of case 1..3
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
If groups 1 and 3 are both present:
We must be in case 1; we must replace with "?" which is in group 1 so $1.
Otherwise we are in case 2 or 3:
In case 2 we need to replace with "&" and in 3 with "".
In case 2 group 3 will hold "&". In case 3 group 3 does not take part in the match, and a reference to an unmatched group expands to the empty string, so we can replace with $3 in both these cases.
Here I only capture the TXT{...} part using a match group. This means that although the leading ? or & is replaced, it is not in the String found. If you only want the bit between {} then just move the parentheses.
Note that I reuse the Pattern - you can also reuse the Matcher if performance is a concern. You should always reuse the Pattern as it is (very) expensive to create. Store it in a static final if you can - it's threadsafe, matchers are not. The usual way to do it is to store the Pattern in a static final and then reuse the Matcher in the context of a method.
Also, the use of Matcher.appendReplacement is much more efficient than your current approach as it only needs to process the input once. Your approach parses the string twice.
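As a compact illustration of keeping the Pattern in a static final field, the loop above could be wrapped in a helper along these lines. The names TxtStripper and strip are hypothetical, and the sketch spells the three cases out as literal replacements instead of $1/$3 references, which is a stylistic assumption rather than something the answer requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TxtStripper {
    // Compiled once and shared; Pattern instances are safe to share between threads.
    private static final Pattern TXT_PARAM =
            Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");

    static String strip(String url) {
        Matcher m = TXT_PARAM.matcher(url); // a Matcher is per call, not shared
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            boolean afterQuestionMark = m.group(1) != null;
            boolean moreParamsFollow = m.group(3) != null;
            String replacement;
            if (afterQuestionMark && moreParamsFollow) {
                replacement = "?";   // "?TXT{..}&" -> "?"
            } else if (moreParamsFollow) {
                replacement = "&";   // "&TXT{..}&" -> "&"
            } else {
                replacement = "";    // TXT{..} was the last parameter
            }
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("http://ip:port/path?username=abcd&TXT{UE-IP,#x,%y}&password={PASS}"));
    }
}
```

Since [^}] accepts any character except the closing brace, values containing # or % inside TXT{...} are handled as well.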
