regex to get string in between slash of a url

On my website, users enter URLs such as http://mydomain/shirt/abc. I need to extract abc as the keyword (the logic is given below). This is possible with String utilities, but I think a regex would be simpler, and I am not good at regex.
Java code
String testURLs[] = {"", "/" ,"/shirt", "/shirt/", "/shirt/abc", "/shirt/abc/", "/shirt/abc/xyz", "/shirt/abc?x=y", "/shirt/xyz/abc/pqr",
"abc/something", "/abc/some", "abc/shirt/something"};
for(String x : testURLs){
System.out.println(x + " --> " + getAppNameFromURL(x));
}
private String getAppNameFromURL(String userEnteredURL){
String uri = userEnteredURL.replaceFirst("/shirt/","");
int index = uri.indexOf("/");
if(index == -1 || index == 0){
return null;
}
return uri.substring(0,index);
}
Output: (expected)
--> NULL
/ --> NULL
/shirt --> "" (empty string)
/shirt/ --> "" (empty string)
/shirt/abc --> abc
/shirt/abc/ --> abc
/shirt/abc/xyz --> abc
/shirt/abc?x=y --> abc
/shirt/xyz/abc/pqr --> xyz
abc/something --> NULL
/abc/some --> NULL
abc/shirt/something --> NULL
Literally, the expected logic is :
Get the string just after /shirt/
If starts with "" (empty string) - return NULL
If starts with "/" - return NULL
If starts with "/shirt/" - return "" (empty string)
If starts with "/shirt" - return "" (empty string)
If starts with "/shirt?" - return "" (empty string)
If starts with "/shirt/?" - return "" (empty string)
If starts with "/shirt/abc" - return abc
If starts with "/shirt/xyz/" - return xyz
If starts with "/shirt/xyz/abc" - return xyz
Is there any way to do this with regex?
I tried with regex but failed.

Yes, there is!
shirt\/(\w+)
is what you need.
private String getAppNameFromURL(String userEnteredURL){
Pattern p = Pattern.compile("shirt/(\\w+)"); // in a Java string literal the backslash must be doubled; "/" needs no escape
Matcher m = p.matcher(userEnteredURL);
if(m.find()){
return m.group(1);
}
return null;
}
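This returns the first path segment after shirt/, provided that segment starts with word characters (\w matches letters, digits, and underscore); the capture stops at the first character that is not a word character. Note that find() looks for shirt/ anywhere in the input, so abc/shirt/something yields something rather than null.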
Also check this https://regexr.com/3o3jm
You can place your expected logic here to check the regex.

Thank you all for your time to help. At last I got an exact idea from all your inputs.
Here is a DEMO
private static String getAppNameFromURL(String userEnteredURL){
if(userEnteredURL == null || !userEnteredURL.startsWith("/shirt")){ // also guards against a null input, which would otherwise throw in p.matcher(...)
return null;
}
Pattern p = Pattern.compile("^/shirt/([^?]([^\\?/]+|.*))");
Matcher m = p.matcher(userEnteredURL);
if(m.find()){
return m.group(1);
}
return "";
}
Output:
empty string-->null
/-->null
/shirt--> empty string
/shirt/--> empty string
/shirt/abc-->abc
/shirt/abc/-->abc
/shirt/abc/xyz-->abc
/shirt/abc?x=y-->abc
/shirt/xyz/abc/pqr-->xyz
abc/something-->null
/abc/some-->null
abc/shirt/something-->null
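For comparison, the expected logic listed in the question can also be written as a single pattern used with matches() instead of find(). This is only a sketch, not the code behind the demo above: the class name ShirtKeyword and the method name extract are made up for illustration, and the pattern assumes the prefix is always the literal /shirt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShirtKeyword {
    // "/shirt", then optionally "/" plus one captured segment,
    // then optionally a tail that starts with "/" or "?".
    private static final Pattern SHIRT =
            Pattern.compile("/shirt(?:/([^/?]*))?(?:[/?].*)?");

    // Hypothetical helper: returns null when the input does not start with
    // /shirt, "" when nothing follows it, otherwise the first segment.
    static String extract(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = SHIRT.matcher(url);
        if (!m.matches()) {
            return null;
        }
        String segment = m.group(1); // null when there is no "/" after /shirt
        return segment == null ? "" : segment;
    }

    public static void main(String[] args) {
        String[] urls = {"", "/", "/shirt", "/shirt/abc?x=y", "/shirt/xyz/abc/pqr", "abc/shirt/something"};
        for (String url : urls) {
            System.out.println(url + " --> " + extract(url));
        }
    }
}
```

Because matches() must consume the whole input, inputs such as abc/shirt/something or /shirts are rejected without a separate startsWith check.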

Related

Java StringUtils.stripEnd with period, hyphen or underscore

I'm trying to strip trailing characters off of a string using StringUtils.stripEnd, and noticed if I try to strip "_FOO" from "FOO_FOO", this returns an empty string. For example,
import org.apache.commons.lang3.StringUtils;
public class StripTest {
public static void printStripped(String s1, String suffix){
String result = StringUtils.stripEnd(s1, suffix);
System.out.println(String.format("Stripping '%s' from %s --> %s", suffix, s1, result));
}
public static void main(String[] args) {
printStripped("FOO.BAR", ".BAR");
printStripped("BAR.BAR", ".BAR");
printStripped("FOO_BAR", "_BAR");
printStripped("BAR_BAR", "_BAR");
printStripped("FOO-BAR", "-BAR");
printStripped("BAR-BAR", "-BAR");
}
}
Which outputs
Stripping '.BAR' from FOO.BAR --> FOO
Stripping '.BAR' from BAR.BAR -->
Stripping '_BAR' from FOO_BAR --> FOO
Stripping '_BAR' from BAR_BAR -->
Stripping '-BAR' from FOO-BAR --> FOO
Stripping '-BAR' from BAR-BAR -->
Can someone explain this behavior? Didn't see any examples from docs of this case. Using Java 7.

Look at the documentation and examples present in the StringUtils Javadoc:
Strips any of a set of characters from the end of a String.
A null input String returns null. An empty string ("") input returns the empty string.
If the stripChars String is null, whitespace is stripped as defined by Character.isWhitespace(char).
StringUtils.stripEnd(null, *) = null
StringUtils.stripEnd("", *) = ""
StringUtils.stripEnd("abc", "") = "abc"
StringUtils.stripEnd("abc", null) = "abc"
StringUtils.stripEnd(" abc", null) = " abc"
StringUtils.stripEnd("abc ", null) = "abc"
StringUtils.stripEnd(" abc ", null) = " abc"
StringUtils.stripEnd(" abcyx", "xyz") = " abc"
StringUtils.stripEnd("120.00", ".0") = "12"
This is not what you want, because stripEnd treats its second argument as a SET of characters and keeps removing trailing characters for as long as each one is in that set. I believe you are looking for removeEnd(...):
Removes a substring only if it is at the end of a source string, otherwise returns the source string.
A null source string will return null. An empty ("") source string will return the empty string. A null search string will return the source string.
StringUtils.removeEnd(null, *) = null
StringUtils.removeEnd("", *) = ""
StringUtils.removeEnd(*, null) = *
StringUtils.removeEnd("www.domain.com", ".com.") = "www.domain.com"
StringUtils.removeEnd("www.domain.com", ".com") = "www.domain"
StringUtils.removeEnd("www.domain.com", "domain") = "www.domain.com"
StringUtils.removeEnd("abc", "") = "abc"
removeEnd(...) operates not on a set of characters but on a substring, which is what you are trying to remove.
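To make the difference concrete without depending on Commons Lang, here is a small plain-Java sketch. The helpers stripEndChars and removeEndSuffix are hypothetical approximations written for this illustration; they are not the library's implementation, and they do not handle null arguments the way StringUtils does.

```java
class StripVsRemove {
    // Approximates stripEnd: drop trailing characters for as long as
    // each one appears somewhere in the given set of characters.
    static String stripEndChars(String s, String chars) {
        int end = s.length();
        while (end > 0 && chars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(0, end);
    }

    // Approximates removeEnd: drop the suffix only if the string
    // ends with it as a whole; otherwise return the string unchanged.
    static String removeEndSuffix(String s, String suffix) {
        if (suffix.isEmpty() || !s.endsWith(suffix)) {
            return s;
        }
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        // Every character of "BAR_BAR" is in the set {_, B, A, R}, so nothing survives.
        System.out.println("[" + stripEndChars("BAR_BAR", "_BAR") + "]");
        // Only the trailing "_BAR" substring is removed.
        System.out.println("[" + removeEndSuffix("BAR_BAR", "_BAR") + "]");
    }
}
```

With FOO_BAR the two approaches happen to agree, because F and O are not in the set and stop the stripping; BAR_BAR is the case where they diverge.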

Parsing a String where '$' is used as the separator and may also be part of a value

"$1200-$2000$amol" -->{$1200-$2000, amol}.
"amol$$1200-$2000" -->{amol,$1200-$2000}.
"amol$1200-2000" -->{amol,1200-2000}.
"amol$$1200-$2000$patare" -->{amol,$1200-$2000,patare}.
"amol$$1200-$2000$patare$$12-$20" -->{amol,$1200-$2000,patare,$12-$20}.
Here, I am looking for logic that parses the string (left-hand side) into a vector (right-hand side). '$' is used as a separator, and '$' may also be part of a value. For example, in the second pattern "amol$$1200-$2000", the first $ is the separator between "amol" and "$1200-$2000", while the other $ characters are part of the value "$1200-$2000".
// lovValue is the string to be parsed
private Vector getTockensForLovValue(String lovValue) {
int beginIndex = 0;
Vector vector = new Vector();
while (beginIndex < lovValue.length())
{
int dollarIndex = lovValue.indexOf("$", beginIndex);
if (dollarIndex != -1)
{
String s1 = lovValue.substring(beginIndex, dollarIndex);
vector.add(s1);
beginIndex = dollarIndex + 1;
}
else
{
vector.add(lovValue.substring(beginIndex));
beginIndex = lovValue.length();
}
}
return vector;
}

UPDATE: The answer has been updated with an extended regex that also maps amol$$1200$patare to {amol,$1200,patare}, as requested in a comment.
You can use a regular expression to do this:
\$\d+(?:-\$\d+)?(?=\$|$)|[^$]+|(?<=^|\$)(?=\$|$)
It says: First try to match $99-$99 or $99. The match must be followed by $ or end-of-string.
If that fails, match a sequence of any character that is not $.
Also match the empty string between two $ or before leading $ or after trailing $.
When specified in a Java string literal, double the \.
private static List<String> parse(String input) {
Pattern p = Pattern.compile("\\$\\d+(?:-\\$\\d+)?(?=\\$|$)|[^$]+|(?<=^|\\$)(?=\\$|$)");
List<String> list = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
list.add(m.group());
return list;
}
See regex101 for demo.
TEST
public static void main(String[] args) {
test("$1200-$2000$amol");
test("amol$$1200-$2000");
test("amol$1200-2000");
test("amol$$1200-$2000$patare");
test("amol$$1200-$2000$patare$$12-$20");
test("amol$$1200$patare");
test("$$$1200$$");
test("$$$1200x$$");
}
private static void test(String input) {
System.out.println(input + " --> " + parse(input));
}
OUTPUT
$1200-$2000$amol --> [$1200-$2000, amol]
amol$$1200-$2000 --> [amol, $1200-$2000]
amol$1200-2000 --> [amol, 1200-2000]
amol$$1200-$2000$patare --> [amol, $1200-$2000, patare]
amol$$1200-$2000$patare$$12-$20 --> [amol, $1200-$2000, patare, $12-$20]
amol$$1200$patare --> [amol, $1200, patare]
$$$1200$$ --> [, , $1200, , ]
$$$1200x$$ --> [, , , 1200x, , ]

Regex function rename file issue

I'm using following code to rename a file automatically:
public static String getNewNameForCopyFile(final String originalName, final boolean firstCall) {
if (firstCall) {
final Pattern p = Pattern.compile("(.*?)(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the name, group 2 is the extension
String name = m.group(1);
String extension = m.group(2);
if (extension == null) {
extension = "";
}
return name + "-Copy1" + extension;
} else {
throw new IllegalArgumentException();
}
} else {
final Pattern p = Pattern.compile("(.*?)(-Copy(\\d+))?(\\..*)?");
final Matcher m = p.matcher(originalName);
if (m.matches()) { //group 1 is the prefix, group 2 is the number, group 3 is the suffix
String prefix = m.group(1);
String numberMatch = m.group(3);
String suffix = m.group(4);
return prefix + "-Copy" + (numberMatch == null ? 1 : (Integer.parseInt(numberMatch) + 1)) + (suffix == null ? "" : suffix);
} else {
throw new IllegalArgumentException();
}
}
}
This mostly works; I only have a problem with the following filename, and I don't know how to adapt my code:
test.abc.txt
The renamed file becomes 'test-Copy1.abc.txt' but should be 'test.abc-Copy1.txt'.
Do you know how I can achieve this with my method?

If I understand you correctly, you want to insert a copy number before the last dot ('.') in the file name if there is any, and instead you get insertion before the first dot. This arises because you are using a reluctant quantifier for the first group, and the second group is able to match a filename tail containing any number of dots. I think you will do better with this:
final Pattern p = Pattern.compile("(.*?)(\\.[^.]*)?");
Note that if it is present, the second group starts with a dot, but cannot contain other dots.

I think what you're trying to do is find the last '.' in the filename, correct? In that case you need to use greedy matching .* (which matches as much as possible) instead of .*?
final Pattern p = Pattern.compile("(.*)(\\..*)");
You will need to handle the case with no dot separately:
if (originalName.indexOf('.') == -1)
return originalName + "-Copy1";
// ... your other code
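Putting the two suggestions together, the following sketch inserts or increments the copy number before the last dot. It is an illustration under assumptions, not the asker's final code: the class name CopyNamer and method name nextCopyName are invented here, and the method behaves like the firstCall == false branch of the original (an existing -CopyN is incremented rather than appended to).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CopyNamer {
    // group 1: base name (reluctant), group 2: existing copy number (optional),
    // group 3: last extension, which starts with a dot and contains no other dot.
    private static final Pattern NAME =
            Pattern.compile("(.*?)(?:-Copy(\\d+))?(\\.[^.]*)?");

    // Hypothetical helper: returns the next copy name for a file name.
    static String nextCopyName(String originalName) {
        Matcher m = NAME.matcher(originalName);
        if (!m.matches()) {
            throw new IllegalArgumentException(originalName);
        }
        int next = m.group(2) == null ? 1 : Integer.parseInt(m.group(2)) + 1;
        String extension = m.group(3) == null ? "" : m.group(3);
        return m.group(1) + "-Copy" + next + extension;
    }

    public static void main(String[] args) {
        System.out.println(nextCopyName("test.abc.txt"));
        System.out.println(nextCopyName("test.abc-Copy1.txt"));
        System.out.println(nextCopyName("README"));
    }
}
```

Because the extension group cannot contain a second dot and matches() must consume the whole name, the reluctant first group is forced to grow up to the last dot, so no separate no-dot branch is needed.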

Replace environment variable place-holders with their actual value?

In my Application.properties file I am using a key and value like that
report.custom.templates.path=${CATALINA_HOME}\\\\Medic\\\\src\\\\main\\\\reports\\\\AllReports\\\\
I need to replace the ${CATALINA_HOME} with its actual path:
${CATALINA_HOME} = C:\Users\s57893\softwares\apache-tomcat-7.0.27
Here is my code:
public class ReadEnvironmentVar {
public static void main(String args[]) {
String path = getConfigBundle().getString("report.custom.templates.path");
System.out.println("Application Resources : " + path);
String actualPath = resolveEnvVars(path);
System.out.println("Actual Path : " + actualPath);
}
private static ResourceBundle getConfigBundle() {
return ResourceBundle.getBundle("medicweb");
}
private static String resolveEnvVars(String input) {
if (null == input) {
return null;
}
Pattern p = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String envVarName = null == m.group(1) ? m.group(2) : m.group(1);
String envVarValue = System.getenv(envVarName);
m.appendReplacement(sb, null == envVarValue ? "" : envVarValue);
}
m.appendTail(sb);
return sb.toString();
}
}
My code produces this result:
Actual Path :
C:Userss57893softwaresapache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
but I need this result:
Actual Path :
C:\Users\s57893\softwares\apache-tomcat-7.0.27\Medic\src\main\reports\AllReports\
Could someone show me an example?

Because of the way appendReplacement() works, you'll need to escape backslashes that you find in the environment variable. From the Javadocs:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
I would use:
m.appendReplacement(sb,
null == envVarValue ? "" : Matcher.quoteReplacement(envVarValue));
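Here is a self-contained sketch of the same fix. So that it runs the same way on any machine, it looks values up in a Map instead of calling System.getenv; the class name PlaceholderResolver, the method resolve, and the sample value C:\tomcat are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderResolver {
    // Matches ${NAME} (group 1) or $NAME (group 2).
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    // Hypothetical helper: replaces each placeholder with its value from vars,
    // or with the empty string when the name is unknown.
    static String resolve(String input, Map<String, String> vars) {
        if (input == null) {
            return null;
        }
        Matcher m = VAR.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = vars.get(name);
            // quoteReplacement keeps backslashes and dollar signs in the value literal.
            m.appendReplacement(sb, value == null ? "" : Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("CATALINA_HOME", "C:\\tomcat");
        // prints C:\tomcat\Medic\reports
        System.out.println(resolve("${CATALINA_HOME}\\Medic\\reports", vars));
    }
}
```

Only the replacement value needs quoting. The text appended by appendTail is copied as-is, which is why the backslashes after the placeholder survived in the original output while those inside the variable value were lost.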

How can i match particular format in input using java.util.regex in java?

INPUT
Input can take any of the forms shown below, with the following mandatory content: TXT{any comma-separated strings in any format}
String loginURL = "http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}";
String loginURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}";
String loginURL2 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}";
String loginURL3 = "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}";
String loginURL4 = "http://ip:port/path?username=abcd&password={PASS}";
Required Output
1. OutputURL corresponding to loginURL.
String outputURL = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL1 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL2 = "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}";
String outputURL3 = "http://ip:port/path?";
String outputURL4 = "http://ip:port/path?username=abcd&password={PASS}";
2. Deleted pattern(if any)
String deletedPatteren = "TXT{UE-IP,UE-Username,UE-Password}";
My Attempts
String loginURLPattern = TXT+"\\{([\\w-,]*)\\}&*";
System.out.println("1. ");
getListOfTemplates(loginURL, loginURLPattern);
System.out.println();
System.out.println("2. ");
getListOfTemplates(loginURL1, loginURLPattern);
System.out.println();
private static void getListOfTemplates(String inputSequence,String pattern){
System.out.println("Input URL : " + inputSequence);
Matcher templateMatcher = Pattern.compile(pattern).matcher(inputSequence);
if (templateMatcher.find() && templateMatcher.group(1).length() > 0) {
System.out.println(templateMatcher.group(1));
System.out.println("OutputURL : " + templateMatcher.replaceAll(""));
}
}
OUTPUT obtained
1.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
UE-IP,UE-Username,UE-Password}&password={PASS
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&
2.
Input URL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
UE-IP,UE-Username,UE-Password
OutputURL : http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&
DRAWBACK OF ABOVE PATTERN
If I add any string containing characters such as # or % between TXT{ and }, my code breaks.
How can I achieve this with the java.util.regex library so that the user can enter any comma-separated strings inside TXT{...}?

I would recommend using Matcher.appendReplacement:
public static void main(final String[] args) throws Exception {
final String[] loginURLs = {
"http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}",
"http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}",
"http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}",
"http://ip:port/path?username=abcd&password={PASS}"};
final Pattern patt = Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");
for (final String loginURL : loginURLs) {
System.out.printf("%1$-10s %2$s%n", "Processing", loginURL);
final StringBuffer sb = new StringBuffer();
final Matcher matcher = patt.matcher(loginURL);
while (matcher.find()) {
final String found = matcher.group(2);
System.out.printf("%1$-10s %2$s%n", "Found", found);
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
}
matcher.appendTail(sb);
System.out.printf("%1$-10s %2$s%n%n", "Processed", sb.toString());
}
}
Output:
Processing http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}
Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}
Found TXT{UE-IP,UE-Username,UE-Password}
Processed http://ip:port/path
Processing http://ip:port/path?username=abcd&password={PASS}
Processed http://ip:port/path?username=abcd&password={PASS}
As you rightly point out, there are 3 possible cases:
"?{TEXT}&" -> "?"
"&{TEXT}&" -> "&"
"?{TEXT}" -> ""
So what we need to do is test for those cases in the regex. Here is the pattern:
(\\?)?&?(TXT\\{[^}]++})(&)?
Explanation:
(\\?)? optionally matches and captures a ?
&? optionally matches an & (without capturing it)
(TXT\\{[^}]++}) matches and captures TXT, followed by {, followed by one or more characters that are not } (possessively), followed by } (closing brackets don't need to be escaped)
(&)? optionally matches and captures a &
We have 3 groups:
potentially a ?
the required text
potentially an &
Now when we find a match we need to replace with the appropriate capture of case 1..3
if (matcher.group(1) != null && matcher.group(3) != null) {
matcher.appendReplacement(sb, "$1");
} else {
matcher.appendReplacement(sb, "$3");
}
If groups 1 and 3 are both present:
We must be in case 1; we must replace with "?", which is in group 1, so $1.
Otherwise we are in case 2 or 3:
In case 2 we need to replace with "&" and in case 3 with "".
In case 2 group 3 holds "&"; in case 3 group 3 did not take part in the match, and Java substitutes nothing for $3, so we can replace with $3 in both cases. The same holds when TXT{...} comes last after an &, as in the second URL.
Here I only capture the TXT{...} part using a match group. This means that although the leading ? or & is replaced, it is not in the String found. If you only want the bit between {} then just move the parentheses.
Note that I reuse the Pattern - you can also reuse the Matcher if performance is a concern. You should always reuse the Pattern as it is (very) expensive to create. Store it in a static final if you can - it's threadsafe, matchers are not. The usual way to do it is to store the Pattern in a static final and then reuse the Matcher in the context of a method.
Also, the use of Matcher.appendReplacement is much more efficient than your current approach as it only needs to process the input once. Your approach parses the string twice.
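As a compact illustration of the advice about reusing the Pattern, the sketch below stores it in a static final field and wraps the replacement logic in one method. The class name TxtRemover and the method removeTxt are hypothetical; the pattern is the one from this answer, and the replacement is chosen explicitly rather than through $1 or $3, which is an equivalent variation rather than the answer's exact code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TxtRemover {
    // Compiled once and shared; Pattern instances are safe to share between threads.
    private static final Pattern TXT =
            Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");

    // Hypothetical helper: removes every TXT{...} block and repairs the
    // surrounding ? and & separators in a single pass over the input.
    static String removeTxt(String url) {
        Matcher m = TXT.matcher(url); // a Matcher is created per call; it is not thread-safe
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            boolean afterQuestionMark = m.group(1) != null;
            boolean beforeAmpersand = m.group(3) != null;
            String keep;
            if (afterQuestionMark && beforeAmpersand) {
                keep = "?";   // ?TXT{...}& becomes ?
            } else if (beforeAmpersand) {
                keep = "&";   // &TXT{...}& becomes &
            } else {
                keep = "";    // TXT{...} at the end: drop the separator too
            }
            m.appendReplacement(sb, keep);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeTxt("http://ip:port/path?TXT{UE-IP,UE-Username}&username=abcd"));
    }
}
```

Since [^}] accepts anything except the closing brace, characters such as # or % inside TXT{...} do not break the match.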
