Apache Commons UrlValidator does not support Unicode. Is an alternative available? - java

I am trying to validate URLs, but UrlValidator does not support Unicode.
Here is the code:
public static boolean isValidHttpUrl(String url) {
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid(url)) {
System.out.println("url is valid");
return true;
}
System.out.println("url is invalid");
return false;
}
String url = "ftp://hi.com";
boolean isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http:// hi.com";
isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http://hi.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
// This is the problem: isValidHttpUrl returns false here, so the assertion fails
url = "http://안녕.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
Do you know of any alternative URL validator that supports Unicode?
I have added another case: http://seapy_hi.com is reported as invalid. Why?
Isn't an underscore valid in a domain name? Why is this URL invalid?

UrlValidator doesn't support IDN. You need to convert the URL to Punycode first. Try this:
isValid = isValidHttpUrl(IDN.toASCII(url));
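As a rough illustration of what the Punycode conversion produces, here is a minimal sketch using java.net.IDN from the JDK. The class and method names (IdnHostExample, toAsciiHost) are made up for this example, and it assumes you pass only the host name, since IDN.toASCII is meant for host names rather than complete URLs.

```java
import java.net.IDN;

// Hypothetical helper class for illustration; not part of Commons Validator.
final class IdnHostExample {

    // Converts an internationalized host name to its ASCII (Punycode) form.
    // Assumption: the argument is a bare host, without scheme, port, or path.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // The escape below is U+00FC (u with umlaut), kept as an escape so the source stays ASCII.
        System.out.println(toAsciiHost("b\u00fccher.de")); // xn--bcher-kva.de
        System.out.println(toAsciiHost("hi.com"));          // hi.com (already ASCII, unchanged)
    }
}
```

The ASCII form can then be placed back into the URL before it is handed to the validator.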

There may be a more recent RFC that supersedes this one, but technically speaking URLs do not support Unicode. See RFC 1738.
The relevant section in particular:
No corresponding graphic US-ASCII: URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.
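To make "these must be encoded" concrete, the sketch below shows how a non-ASCII path character ends up as percent-encoded UTF-8 octets when a java.net.URI is rendered with toASCIIString(). The class name, method name, and example.com host are assumptions chosen only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; the host and path are illustrative values.
final class PercentEncodeExample {

    // Builds a URI from components and returns its US-ASCII form.
    // Non-ASCII characters in the path come out as percent-encoded UTF-8 octets.
    static String toAscii(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The escape is U+00E9 (e with acute accent); its UTF-8 octets are C3 A9.
        System.out.println(toAscii("http", "example.com", "/caf\u00e9"));
        // http://example.com/caf%C3%A9
    }
}
```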

As Kaerber mentions in a comment on the accepted answer, that approach has a bug if the string starts with a scheme.
Here is my solution, which fixes that:
public static String convertUnicodeURLToAscii(String url) throws URISyntaxException {
if(url == null) {
return null;
}
url = url.trim();
URI uri = new URI(url);
boolean includeScheme = true;
// URI needs a scheme to work properly with authority parsing
if(uri.getScheme() == null) {
uri = new URI("http://" + url);
includeScheme = false;
}
String scheme = uri.getScheme() != null ? uri.getScheme() + "://" : null;
String authority = uri.getRawAuthority() != null ? uri.getRawAuthority() : ""; // includes domain and port
String path = uri.getRawPath() != null ? uri.getRawPath() : "";
String queryString = uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "";
String fragment = uri.getRawFragment() != null ? "#" + uri.getRawFragment() : "";
// Must convert domain to punycode separately from the path
url = (includeScheme ? scheme : "") + IDN.toASCII(authority) + path + queryString + fragment;
// Convert path from unicode to ascii encoding
return new URI(url).normalize().toASCIIString();
}

Related

How to ignore encoding certain characters in a url in java?

I have a url that looks like this: https://123.com/screen-shot-2021-02-25-at-7.31.10%2520PM.png
screen-shot-2021-02-25-at-7.31.10%2520PM.png is the file name and %25 is the encoded value for %
This gives me a 404. I need % to not be encoded. What is the proper way to ignore this when encoding a url using Google's UrlEscapers.urlFragmentEscaper().escape(); for Java other than using a replace() method?
Code for encoding:
private static String FILENAME_REGEX = ".*//?(.*)$";
private static Pattern FILENAME_PATTERN = Pattern.compile(FILENAME_REGEX);
public String sanitizedURL(@NonNull String url) throws URISyntaxException {
String contentUrl = url;
Matcher matcher = FILENAME_PATTERN.matcher(url);
if (matcher.matches()) {
String filename = matcher.group(1);
String encodedFilename = UrlEscapers.urlFragmentEscaper().escape(filename);
contentUrl = url.replace(filename, encodedFilename);
//contentUrl = contentUrl.replace("%25", "%");
}
// validate this is a good URI
URI uri = new URI(contentUrl);
return uri.toString();
}
Try URLDecoder.decode(String s, String enc), e.g.
jshell> URLDecoder.decode("https://123.com/screen-shot-2021-02-25-at-7.31.10%2520PM.png", "UTF-8")
$1 ==> "https://123.com/screen-shot-2021-02-25-at-7.31.10%20PM.png"
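The same idea as a small standalone sketch: decode the already-encoded file name once before escaping it again, so that %25 does not pile up. The class and method names are hypothetical, and the Charset overload of URLDecoder.decode assumes Java 10 or later. Keep in mind that URLDecoder follows form-encoding rules, so it also turns '+' into a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for undoing one level of percent-encoding.
final class DoubleEncodingExample {

    // Decodes a single level of percent-encoding (requires Java 10+ for this overload).
    static String decodeOnce(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String doubleEncoded = "screen-shot-2021-02-25-at-7.31.10%2520PM.png";
        // %25 decodes to '%', leaving the single-encoded form.
        System.out.println(decodeOnce(doubleEncoded));
        // screen-shot-2021-02-25-at-7.31.10%20PM.png
    }
}
```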

How to encode Arabic words in a URL [duplicate]

How do you encode a URL in Android?
I thought it was like this:
final String encodedURL = URLEncoder.encode(urlAsString, "UTF-8");
URL url = new URL(encodedURL);
If I do the above, the http:// in urlAsString is replaced by http%3A%2F%2F in encodedURL and then I get a java.net.MalformedURLException when I use the URL.
You don't encode the entire URL, only parts of it that come from "unreliable sources".
Java:
String query = URLEncoder.encode("apples oranges", Charsets.UTF_8.name());
String url = "http://stackoverflow.com/search?q=" + query;
Kotlin:
val query: String = URLEncoder.encode("apples oranges", Charsets.UTF_8.name())
val url = "http://stackoverflow.com/search?q=$query"
Alternatively, you can use Strings.urlEncode(String str) of DroidParts that doesn't throw checked exceptions.
Or use something like
String uri = Uri.parse("http://...")
.buildUpon()
.appendQueryParameter("key", "val")
.build().toString();
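For the Arabic case in particular, a minimal sketch of "encode only the part that comes from outside" could look like the following. The class and method names are invented for the example, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: only the query value is encoded, never the whole URL.
final class QueryEncodingExample {

    // Appends a single encoded query parameter named "q" to a trusted base URL.
    static String searchUrl(String base, String term) {
        return base + "?q=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("http://stackoverflow.com/search", "apples oranges"));
        // http://stackoverflow.com/search?q=apples+oranges

        // U+0645 is the Arabic letter meem; its UTF-8 octets are D9 85.
        System.out.println(searchUrl("http://stackoverflow.com/search", "\u0645"));
        // http://stackoverflow.com/search?q=%D9%85
    }
}
```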
I'm going to add one suggestion here. You can do this which avoids having to get any external libraries.
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.
The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
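To see what the component-based URI constructors actually emit, here is a small sketch that passes the components in directly instead of taking them from a URL object. The class and method names are assumptions. It also shows one caveat: these constructors always quote '%', so input that is already percent-encoded gets encoded a second time.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class for the multi-argument URI constructors.
final class ComponentEscapingExample {

    // Builds an http URI from a host and an unescaped path.
    static String build(String host, String path) {
        try {
            return new URI("http", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces in the path are escaped as %20.
        System.out.println(build("abc.dev.domain.com", "/0007AC/ads/800x480 15sec h.264.mp4"));
        // http://abc.dev.domain.com/0007AC/ads/800x480%2015sec%20h.264.mp4

        // Caveat: an existing escape is quoted again, because '%' is always escaped here.
        System.out.println(build("example.com", "/a%20b"));
        // http://example.com/a%2520b
    }
}
```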
For android, I would use
String android.net.Uri.encode(String s)
Encodes characters in the given string as '%'-escaped octets using the UTF-8 scheme. Leaves letters ("A-Z", "a-z"), numbers ("0-9"), and unreserved characters ("_-!.~'()*") intact. Encodes all other characters.
Ex/
String urlEncoded = "http://stackoverflow.com/search?q=" + Uri.encode(query);
Also you can use this
private static final String ALLOWED_URI_CHARS = "@#&=*+-_.,:!?()/~'%";
String urlEncoded = Uri.encode(path, ALLOWED_URI_CHARS);
This is the simplest method:
try {
query = URLEncoder.encode(query, "utf-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
You can use the methods below:
public static String parseUrl(String surl) throws Exception
{
URL u = new URL(surl);
return new URI(u.getProtocol(), u.getAuthority(), u.getPath(), u.getQuery(), u.getRef()).toString();
}
or
public String parseURL(String url, Map<String, String> params)
{
Builder builder = Uri.parse(url).buildUpon();
for (String key : params.keySet())
{
builder.appendQueryParameter(key, params.get(key));
}
return builder.build().toString();
}
The second one is better than the first.
Find the Arabic characters and replace them with their percent-encoded UTF-8 form.
Something like this:
for (int i = 0; i < urlAsString.length(); i++) {
if (urlAsString.charAt(i) > 255) {
urlAsString = urlAsString.substring(0, i) + URLEncoder.encode(urlAsString.charAt(i)+"", "UTF-8") + urlAsString.substring(i+1);
}
}
encodedURL = urlAsString;
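A variation on the same idea, written as a sketch that walks the string by code point so that characters outside the Basic Multilingual Plane are handled as well. The class and method names are hypothetical. It escapes every non-ASCII code point and leaves ASCII characters, including the URL delimiters, untouched.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes only the non-ASCII code points of a string.
final class NonAsciiEscaper {

    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                // ASCII is copied through unchanged, so "http://" and "/" survive.
                out.append((char) cp);
            } else {
                // Encode the full code point (possibly a surrogate pair) as UTF-8 octets.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0645 (Arabic letter meem) becomes %D9%85.
        System.out.println(escapeNonAscii("http://example.com/\u0645"));
    }
}
```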

Validate and encode urls containing unicode characters in Java

I am working on an application in which we need to validate URLs, check whether they start with http (if not, prepend 'http'), and finally encode them. My problem is that the URLs we receive can contain all sorts of things: invalid; valid but not starting with http; already encoded; or valid but containing spaces or Unicode characters.
Currently I am using the UrlValidator class, but it rejects URLs that contain spaces or Unicode characters. Here is my code:
if (url != null && !url.trim().isEmpty()) {
url = URLDecoder.decode(url, "UTF-8");
if (!url.matches("^(https?)://.*$")) {
url = "http://" + url; // prepend the full scheme prefix, not just "http"
}
UrlValidator validator = new UrlValidator();
if (url.contains("(")) {
if (validator.isValid(url.substring(0, url.indexOf("(")))) {
return getEncodedSiteUrl(url);
}
return null;
}
if (validator.isValid(url)) {
return getEncodedSiteUrl(url);
}
}
But this code filters out all valid URLs that contain a space or Unicode characters. Given the variety of URLs we get, I don't think I should use UrlValidator. Can anybody please help or guide me? Thank you.
Check this URL which has a method you may use.
public static boolean isURL(String url)
{
if (url == null) {
return false;
}
// Assigning the url format regular expression
String urlPattern = "^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*";
return url.matches(urlPattern);
}
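Since the regex above only accepts ASCII, another option is to normalize the input before validating it. The following is only a sketch under simplifying assumptions: the input has no port, user info, query, or fragment, and everything after the first / is treated as the path. The class and method names are hypothetical. It prepends a scheme when one is missing, converts the host with IDN.toASCII, and lets java.net.URI escape the path.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; assumes no port, user info, query, or fragment in the input.
final class UnicodeUrlNormalizer {

    static String toAsciiUrl(String input) {
        String s = input.trim();
        // Prepend a default scheme when the input does not start with http:// or https://.
        if (!s.matches("(?i)^https?://.*")) {
            s = "http://" + s;
        }
        int schemeEnd = s.indexOf("://");
        String scheme = s.substring(0, schemeEnd);
        String rest = s.substring(schemeEnd + 3);
        int slash = rest.indexOf('/');
        String host = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash);
        try {
            // The host goes through Punycode; the path is percent-encoded by URI.
            return new URI(scheme, IDN.toASCII(host), path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + input, e);
        }
    }

    public static void main(String[] args) {
        // U+00FC is u with umlaut; the space in the path becomes %20.
        System.out.println(toAsciiUrl("b\u00fccher.de/a b"));
        // http://xn--bcher-kva.de/a%20b
    }
}
```

The resulting ASCII-only string can then be passed to a validator such as UrlValidator. Input that is already percent-encoded would be encoded twice by this sketch, so it should be decoded first.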

Append relative URL to java.net.URL

Provided I have a java.net.URL object, pointing to let's say
http://example.com/myItems or http://example.com/myItems/
Is there some helper somewhere to append some relative URL to this?
For instance append ./myItemId or myItemId to get :
http://example.com/myItems/myItemId
URL has a constructor that takes a base URL and a String spec.
Alternatively, java.net.URI adheres more closely to the standards, and has a resolve method to do the same thing. Create a URI from your URL using URL.toURI.
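A short sketch of how URI.resolve treats the two base forms from the question. The class and method names are made up for the example. The trailing slash on the base decides whether the last path segment is kept or replaced.

```java
import java.net.URI;

// Hypothetical demo class for URI.resolve with and without a trailing slash.
final class ResolveExample {

    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // With a trailing slash the relative part is appended.
        System.out.println(resolve("http://example.com/myItems/", "myItemId"));
        // http://example.com/myItems/myItemId

        // Without it, the last segment ("myItems") is replaced.
        System.out.println(resolve("http://example.com/myItems", "myItemId"));
        // http://example.com/myItemId
    }
}
```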
This one does not need any extra libs or code and gives the desired result:
//import java.net.URL;
URL url1 = new URL("http://petstore.swagger.wordnik.com/api/api-docs?foo=1&bar=baz");
URL url2 = new URL(url1.getProtocol(), url1.getHost(), url1.getPort(), url1.getPath() + "/pet" + "?" + url1.getQuery(), null);
System.out.println(url1);
System.out.println(url2);
This prints:
http://petstore.swagger.wordnik.com/api/api-docs?foo=1&bar=baz
http://petstore.swagger.wordnik.com/api/api-docs/pet?foo=1&bar=baz
The accepted answer only works if there is no path after the host (IMHO the accepted answer is wrong)
You can just use the URI class for this:
import java.net.URI;
URI uri = URI.create("http://example.com/basepath/");
URI uri2 = uri.resolve("./relative");
// => http://example.com/basepath/relative
Note the trailing slash on the base path and the base-relative format of the segment that's being appended. You can also use the URIBuilder class from Apache HTTP client:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.3</version>
</dependency>
...
import java.net.URI;
import org.apache.http.client.utils.URIBuilder;
URI uri = URI.create("http://example.com/basepath");
URI uri2 = appendPath(uri, "relative");
// => http://example.com/basepath/relative
public URI appendPath(URI uri, String path) {
URIBuilder builder = new URIBuilder(uri);
builder.setPath(URI.create(builder.getPath() + "/").resolve("./" + path).getPath());
return builder.build();
}
Here is a helper function I've written to add to the url path:
public static URL concatenate(URL baseUrl, String extraPath) throws URISyntaxException,
MalformedURLException {
URI uri = baseUrl.toURI();
String newPath = uri.getPath() + '/' + extraPath;
URI newUri = uri.resolve(newPath);
return newUri.toURL();
}
I cannot believe how nasty URI.resolve() really is; it's full of edge cases.
new URI("http://localhost:80").resolve("foo") => "http://localhost:80foo"
new URI("http://localhost:80").resolve("//foo") => "http://foo"
new URI("http://localhost:80").resolve(".//foo") => "http://foo"
The tidiest solution I have seen that handles these edge cases in a predictable way is:
URI addPath(URI uri, String path) {
String newPath;
if (path.startsWith("/")) newPath = path.replaceAll("//+", "/");
else if (uri.getPath().endsWith("/")) newPath = uri.getPath() + path.replaceAll("//+", "/");
else newPath = uri.getPath() + "/" + path.replaceAll("//+", "/");
return uri.resolve(newPath).normalize();
}
Results:
jshell> addPath(new URI("http://localhost"), "sub/path")
$3 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "sub/path")
$4 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "/sub/path")
$5 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "/sub/path")
$6 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "./sub/path")
$7 ==> http://localhost/random-path/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "../sub/path")
$8 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost"), "../sub/path")
$9 ==> http://localhost/../sub/path
jshell> addPath(new URI("http://localhost/"), "//sub/path")
$10 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "//sub/./path")
$11 ==> http://localhost/sub/path
I've searched far and wide for an answer to this question. The only implementation I can find is in the Android SDK: Uri.Builder. I've extracted it for my own purposes.
private String appendSegmentToPath(String path, String segment) {
if (path == null || path.isEmpty()) {
return "/" + segment;
}
if (path.charAt(path.length() - 1) == '/') {
return path + segment;
}
return path + "/" + segment;
}
This is where I found the source.
In conjunction with Apache URIBuilder, this is how I'm using it: builder.setPath(appendSegmentToPath(builder.getPath(), segment));
You can use URIBuilder and the method URI#normalize to avoid duplicate / in the URI:
URIBuilder uriBuilder = new URIBuilder("http://example.com/test");
URI uri = uriBuilder.setPath(uriBuilder.getPath() + "/path/to/add")
.build()
.normalize();
// expected : http://example.com/test/path/to/add
UPDATED
I believe this is the shortest solution:
URL url1 = new URL("http://domain.com/contextpath");
String relativePath = "/additional/relative/path";
URL concatenatedUrl = new URL(url1.toExternalForm() + relativePath);
Concatenate a relative path to a URI:
java.net.URI uri = URI.create("https://stackoverflow.com/questions");
java.net.URI res = uri.resolve(uri.getPath() + "/some/path");
res will contain https://stackoverflow.com/questions/some/path
A pragmatic solution without any external libs is given below.
(Comment: After reading through all the answers given so far, I am really not happy with the solutions provided, especially as this question is eight years old. None of them deals properly with queries, fragments and so on.)
Extension method on URL
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
class URLHelper {
public static URL appendRelativePathToURL(URL base, String relPath) {
/*
foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
| _____________________|__
/ \ / \
urn:example:animal:ferret:nose
see https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
*/
try {
URI baseUri = base.toURI();
// cut initial slash of relative path
String relPathToAdd = relPath.startsWith("/") ? relPath.substring(1) : relPath;
// cut trailing slash of present path
String path = baseUri.getPath();
String pathWithoutTrailingSlash = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
return new URI(baseUri.getScheme(),
baseUri.getAuthority(),
pathWithoutTrailingSlash + "/" + relPathToAdd,
baseUri.getQuery(),
baseUri.getFragment()).toURL();
} catch (URISyntaxException e) {
throw new MalformedURLRuntimeException("Error parsing URI.", e);
} catch (MalformedURLException e) {
throw new MalformedURLRuntimeException("Malformed URL.", e);
}
}
public static class MalformedURLRuntimeException extends RuntimeException {
public MalformedURLRuntimeException(String msg, Throwable cause) {
super("Malformed URL: " + msg, cause);
}
}
}
Testing
private void demo() {
try {
URL coolURL = new URL("http://fun.de/path/a/b/c?query&another=3#asdf");
URL notSoCoolURL = new URL("http://fun.de/path/a/b/c/?query&another=3#asdf");
System.out.println(URLHelper.appendRelativePathToURL(coolURL, "d"));
System.out.println(URLHelper.appendRelativePathToURL(coolURL, "/d"));
System.out.println(URLHelper.appendRelativePathToURL(notSoCoolURL, "d"));
System.out.println(URLHelper.appendRelativePathToURL(notSoCoolURL, "/d"));
} catch (MalformedURLException e) {
e.printStackTrace();
}
}
On Android you can use android.net.Uri. The following creates a Uri.Builder from an existing URL given as a String and then appends to it:
Uri.parse(baseUrl) // Create Uri from String
.buildUpon() // Creates a "Builder"
.appendEncodedPath("path/to/add")
.appendQueryParameter("at_ref", "123") // To add ?at_ref=123
.fragment("anker") // To add #anker
.build()
Note that appendEncodedPath doesn't expect a leading /. It only checks whether the "baseUrl" ends with one, and adds one before the path if it does not.
According to the docs, this supports
Absolute hierarchical URI reference following the pattern
<scheme>://<authority><absolute path>?<query>#<fragment>
Relative URI with pattern
<relative or absolute path>?<query>#<fragment>
//<authority><absolute path>?<query>#<fragment>
Opaque URI with pattern
<scheme>:<opaque part>#<fragment>
Some examples using the Apache URIBuilder http://hc.apache.org/httpcomponents-client-4.3.x/httpclient/apidocs/org/apache/http/client/utils/URIBuilder.html:
Ex1:
String url = "http://example.com/test";
URIBuilder builder = new URIBuilder(url);
builder.setPath((builder.getPath() + "/example").replaceAll("//+", "/"));
System.out.println("Result 1 -> " + builder.toString());
Result 1 -> http://example.com/test/example
Ex2:
String url = "http://example.com/test";
URIBuilder builder = new URIBuilder(url);
builder.setPath((builder.getPath() + "///example").replaceAll("//+", "/"));
System.out.println("Result 2 -> " + builder.toString());
Result 2 -> http://example.com/test/example
My solution, based on twhitbeck's answer:
import java.net.URI;
import java.net.URISyntaxException;
public class URIBuilder extends org.apache.http.client.utils.URIBuilder {
public URIBuilder() {
}
public URIBuilder(String string) throws URISyntaxException {
super(string);
}
public URIBuilder(URI uri) {
super(uri);
}
public org.apache.http.client.utils.URIBuilder addPath(String subPath) {
if (subPath == null || subPath.isEmpty() || "/".equals(subPath)) {
return this;
}
return setPath(appendSegmentToPath(getPath(), subPath));
}
private String appendSegmentToPath(String path, String segment) {
if (path == null || path.isEmpty()) {
path = "/";
}
if (path.charAt(path.length() - 1) == '/' || segment.startsWith("/")) {
return path + segment;
}
return path + "/" + segment;
}
}
Test:
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class URIBuilderTest {
@Test
public void testAddPath() throws Exception {
String url = "http://example.com/test";
String expected = "http://example.com/test/example";
URIBuilder builder = new URIBuilder(url);
builder.addPath("/example");
assertEquals(expected, builder.toString());
builder = new URIBuilder(url);
builder.addPath("example");
assertEquals(expected, builder.toString());
builder.addPath("");
builder.addPath(null);
assertEquals(expected, builder.toString());
url = "http://example.com";
expected = "http://example.com/example";
builder = new URIBuilder(url);
builder.addPath("/");
assertEquals(url, builder.toString());
builder.addPath("/example");
assertEquals(expected, builder.toString());
}
}
Gist: https://gist.github.com/enginer/230e2dc2f1d213a825d5
I had some difficulty with the encoding of URIs. Appending was not working for me because the URI was of the content:// type and it did not like the "/". This solution assumes no query and no fragment (we are working with paths, after all):
Kotlin code:
val newUri = Uri.parse(myUri.toString() + Uri.encode("/$relPath"))
Support for appending paths was added to URIBuilder in Apache HttpClient 5.1 with the appendPath method:
import org.apache.hc.core5.net.URIBuilder;
..
URI uri = new URIBuilder("https://stackoverflow.com/questions")
.appendPath("7498030")
.appendPath("append-relative-url")
.build();
// https://stackoverflow.com/questions/7498030/append-relative-url
Maven dependency:
<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
<artifactId>httpclient5</artifactId>
<version>5.1</version>
</dependency>
For android make sure you use .appendPath() from android.net.Uri
public String joinUrls(String baseUrl, String extraPath) {
    try {
        // Append a slash in case neither side has one
        URI uri = URI.create(baseUrl + "/");
        URI newUri = uri.resolve(extraPath);
        return newUri.toURL().toString();
    } catch (IllegalArgumentException | MalformedURLException e) {
        // The method must return or throw on this path, otherwise it does not compile
        throw new IllegalArgumentException("Cannot join " + baseUrl + " and " + extraPath, e);
    }
}
A handmade URI segment joiner:
public static void main(String[] args) {
System.out.println(concatURISegments(
"http://abc/",
"/dfg/",
"/lmn",
"opq"
));
}
public static String concatURISegments(String... segmentArray) {
if (segmentArray.length == 0) {
return "";
} else if (segmentArray.length == 1) {
return segmentArray[0];
}
List<String> segmentList = new ArrayList<>();
for (String s : segmentArray) {
if (s != null && s.length() > 0) {
segmentList.add(s);
}
}
if (segmentList.size() == 0) {
return "";
} else if (segmentList.size() == 1) {
return segmentList.get(0);
}
StringBuilder sb = new StringBuilder();
sb.append(segmentList.get(0));
String prevS;
String currS;
boolean prevB;
boolean currB;
for (int i = 1; i < segmentList.size(); i++) {
prevS = segmentList.get(i - 1);
currS = segmentList.get(i);
prevB = prevS.endsWith("/");
currB = currS.startsWith("/");
if (!prevB && !currB) {
sb.append("/").append(currS);
} else if (prevB && currB) {
sb.append(currS.substring(1));
} else {
sb.append(currS);
}
}
return sb.toString();
}
This takes only one line: normalize() is your friend here. Always add an extra / between the two parts being concatenated.
When baseUrl ends with /, normalize() removes the extra one. If it doesn't end with /, the one added deliberately covers it.
String unknownBaseUrl = "https://example.com/apples/";
String result = URI.create(unknownBaseUrl + "/" + "1209").normalize().toString();
System.out.println(result);
output:
https://example.com/apples/1209
A sample with extra / characters is likewise normalized to a sane path, as per RFC 2396:
String unknownBaseUrl = "https://example.com/apples/";
String result = URI.create(unknownBaseUrl + "/" + "/1209").normalize().toString();
System.out.println(result);
output:
https://example.com/apples/1209
To get around all the edge cases, the best approach is to combine two standard classes: URIBuilder from Apache HttpClient and java.nio.file.Paths:
String rootUrl = "http://host:80/root/url";
String relativePath = "relative/path";
URIBuilder builder = new URIBuilder(rootUrl);
String combinedPath = Paths.get(builder.getPath(), relativePath).toString();
builder.setPath(combinedPath);
URL targetUrl = builder.build().toURL();
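It results in: http://host:80/root/url/relative/path
This works with any number of leading and trailing / and also when / are absent. Note that Paths.get uses the platform's file separator, so on Windows the combined path contains backslashes instead of /.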
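A variant that joins the path with plain string handling, and so does not depend on the file system's separator, might look like the sketch below. It uses only java.net.URI. The class and method names are assumptions for illustration, and it keeps the base URI's query and fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: appends a relative path without java.nio.file.Paths.
final class PortableAppend {

    static URI appendPath(URI base, String relativePath) {
        // Strip trailing slashes from the base path and leading slashes from the addition.
        String basePath = base.getPath() == null ? "" : base.getPath().replaceAll("/+$", "");
        String rel = relativePath.replaceAll("^/+", "");
        try {
            return new URI(base.getScheme(), base.getAuthority(),
                    basePath + "/" + rel, base.getQuery(), base.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendPath(URI.create("http://host:80/root/url"), "relative/path"));
        // http://host:80/root/url/relative/path
        System.out.println(appendPath(URI.create("http://host:80/root/url/?a=1"), "/relative/path"));
        // http://host:80/root/url/relative/path?a=1
    }
}
```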
