Append relative URL to java.net.URL - java

Provided I have a java.net.URL object, pointing to let's say
http://example.com/myItems or http://example.com/myItems/
Is there some helper somewhere to append some relative URL to this?
For instance append ./myItemId or myItemId to get :
http://example.com/myItems/myItemId

URL has a constructor that takes a base URL and a String spec.
Alternatively, java.net.URI adheres more closely to the standards, and has a resolve method to do the same thing. Create a URI from your URL using URL.toURI.
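As a minimal sketch of the resolve approach (the class and method names here are made up for illustration), the following shows how the result depends on whether the base ends with a slash:

```java
import java.net.URI;

// Hypothetical demo class; only java.net.URI from the JDK is used.
class UriResolveSketch {

    // Resolves a relative reference against a base URI string.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Base ends with '/': the reference is appended below myItems.
        System.out.println(resolve("http://example.com/myItems/", "myItemId"));
        // Base has no trailing '/': the last path segment is replaced instead.
        System.out.println(resolve("http://example.com/myItems", "myItemId"));
    }
}
```

Starting from a URL object, the same idea would be url.toURI().resolve("myItemId").toURL(). If the base may lack the trailing slash, it has to be added before resolving.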

This one does not need any extra libs or code and gives the desired result:
//import java.net.URL;
URL url1 = new URL("http://petstore.swagger.wordnik.com/api/api-docs?foo=1&bar=baz");
URL url2 = new URL(url1.getProtocol(), url1.getHost(), url1.getPort(), url1.getPath() + "/pet" + "?" + url1.getQuery(), null);
System.out.println(url1);
System.out.println(url2);
This prints:
http://petstore.swagger.wordnik.com/api/api-docs?foo=1&bar=baz
http://petstore.swagger.wordnik.com/api/api-docs/pet?foo=1&bar=baz
The accepted answer only works if there is no path after the host (IMHO the accepted answer is wrong)

You can just use the URI class for this:
import java.net.URI;
URI uri = URI.create("http://example.com/basepath/");
URI uri2 = uri.resolve("./relative");
// => http://example.com/basepath/relative
Note the trailing slash on the base path and the base-relative format of the segment that's being appended. You can also use the URIBuilder class from Apache HTTP client:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.3</version>
</dependency>
...
import java.net.URI;
import org.apache.http.client.utils.URIBuilder;
URI uri = URI.create("http://example.com/basepath");
URI uri2 = appendPath(uri, "relative");
// => http://example.com/basepath/relative
public URI appendPath(URI uri, String path) {
URIBuilder builder = new URIBuilder(uri);
builder.setPath(URI.create(builder.getPath() + "/").resolve("./" + path).getPath());
return builder.build();
}

Here is a helper function I've written to add to the url path:
public static URL concatenate(URL baseUrl, String extraPath) throws URISyntaxException,
MalformedURLException {
URI uri = baseUrl.toURI();
String newPath = uri.getPath() + '/' + extraPath;
URI newUri = uri.resolve(newPath);
return newUri.toURL();
}

I cannot believe how nasty URI.resolve() really is. It's full of nasty edge cases.
new URI("http://localhost:80").resolve("foo") => "http://localhost:80foo"
new URI("http://localhost:80").resolve("//foo") => "http://foo"
new URI("http://localhost:80").resolve(".//foo") => "http://foo"
The tidiest solution I have seen that handles these edge cases in a predictable way is:
URI addPath(URI uri, String path) {
String newPath;
if (path.startsWith("/")) newPath = path.replaceAll("//+", "/");
else if (uri.getPath().endsWith("/")) newPath = uri.getPath() + path.replaceAll("//+", "/");
else newPath = uri.getPath() + "/" + path.replaceAll("//+", "/");
return uri.resolve(newPath).normalize();
}
Results:
jshell> addPath(new URI("http://localhost"), "sub/path")
$3 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "sub/path")
$4 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "/sub/path")
$5 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "/sub/path")
$6 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "./sub/path")
$7 ==> http://localhost/random-path/sub/path
jshell> addPath(new URI("http://localhost/random-path"), "../sub/path")
$8 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost"), "../sub/path")
$9 ==> http://localhost/../sub/path
jshell> addPath(new URI("http://localhost/"), "//sub/path")
$10 ==> http://localhost/sub/path
jshell> addPath(new URI("http://localhost/"), "//sub/./path")
$11 ==> http://localhost/sub/path

I've searched far and wide for an answer to this question. The only implementation I can find is in the Android SDK: Uri.Builder. I've extracted it for my own purposes.
private String appendSegmentToPath(String path, String segment) {
if (path == null || path.isEmpty()) {
return "/" + segment;
}
if (path.charAt(path.length() - 1) == '/') {
return path + segment;
}
return path + "/" + segment;
}
This is where I found the source.
In conjunction with Apache URIBuilder, this is how I'm using it: builder.setPath(appendSegmentToPath(builder.getPath(), segment));

You can use URIBuilder and the method URI#normalize to avoid duplicate / in the URI:
URIBuilder uriBuilder = new URIBuilder("http://example.com/test");
URI uri = uriBuilder.setPath(uriBuilder.getPath() + "/path/to/add")
.build()
.normalize();
// expected : http://example.com/test/path/to/add

UPDATED
I believe this is the shortest solution:
URL url1 = new URL("http://domain.com/contextpath");
String relativePath = "/additional/relative/path";
URL concatenatedUrl = new URL(url1.toExternalForm() + relativePath);

Concatenate a relative path to a URI:
java.net.URI uri = URI.create("https://stackoverflow.com/questions");
java.net.URI res = uri.resolve(uri.getPath() + "/some/path");
res will contain https://stackoverflow.com/questions/some/path

A pragmatic solution without any external libs is given below.
(Comment: After reading through all the answers given so far, I am really not happy with the solutions provided, especially as this question is eight years old. None of them deals properly with queries, fragments and so on.)
Extension method on URL
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
class URLHelper {
public static URL appendRelativePathToURL(URL base, String relPath) {
/*
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose
see https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
*/
try {
URI baseUri = base.toURI();
// cut initial slash of relative path
String relPathToAdd = relPath.startsWith("/") ? relPath.substring(1) : relPath;
// cut trailing slash of present path
String path = baseUri.getPath();
String pathWithoutTrailingSlash = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
return new URI(baseUri.getScheme(),
baseUri.getAuthority(),
pathWithoutTrailingSlash + "/" + relPathToAdd,
baseUri.getQuery(),
baseUri.getFragment()).toURL();
} catch (URISyntaxException e) {
throw new MalformedURLRuntimeException("Error parsing URI.", e);
} catch (MalformedURLException e) {
throw new MalformedURLRuntimeException("Malformed URL.", e);
}
}
public static class MalformedURLRuntimeException extends RuntimeException {
public MalformedURLRuntimeException(String msg, Throwable cause) {
super("Malformed URL: " + msg, cause);
}
}
}
Testing
private void demo() {
try {
URL coolURL = new URL("http://fun.de/path/a/b/c?query&another=3#asdf");
URL notSoCoolURL = new URL("http://fun.de/path/a/b/c/?query&another=3#asdf");
System.out.println(URLHelper.appendRelativePathToURL(coolURL, "d"));
System.out.println(URLHelper.appendRelativePathToURL(coolURL, "/d"));
System.out.println(URLHelper.appendRelativePathToURL(notSoCoolURL, "d"));
System.out.println(URLHelper.appendRelativePathToURL(notSoCoolURL, "/d"));
} catch (MalformedURLException e) {
e.printStackTrace();
}
}

On Android you can use android.net.Uri. The following creates a Uri.Builder from an existing URL given as a String and then appends to it:
Uri.parse(baseUrl) // Create Uri from String
.buildUpon() // Creates a "Builder"
.appendEncodedPath("path/to/add")
.appendQueryParameter("at_ref", "123") // To add ?at_ref=123
.fragment("anker") // To add #anker
.build()
Note that appendEncodedPath doesn't expect a leading / and only contains a check if the "baseUrl" ends with one, otherwise one is added before the path.
According to the docs, this supports
Absolute hierarchical URI reference following the pattern
<scheme>://<authority><absolute path>?<query>#<fragment>
Relative URI with pattern
<relative or absolute path>?<query>#<fragment>
//<authority><absolute path>?<query>#<fragment>
Opaque URI with pattern
<scheme>:<opaque part>#<fragment>

Some examples using the Apache URIBuilder http://hc.apache.org/httpcomponents-client-4.3.x/httpclient/apidocs/org/apache/http/client/utils/URIBuilder.html:
Ex1:
String url = "http://example.com/test";
URIBuilder builder = new URIBuilder(url);
builder.setPath((builder.getPath() + "/example").replaceAll("//+", "/"));
System.out.println("Result 1 -> " + builder.toString());
Result 1 -> http://example.com/test/example
Ex2:
String url = "http://example.com/test";
URIBuilder builder = new URIBuilder(url);
builder.setPath((builder.getPath() + "///example").replaceAll("//+", "/"));
System.out.println("Result 2 -> " + builder.toString());
Result 2 -> http://example.com/test/example

My solution, based on twhitbeck's answer:
import java.net.URI;
import java.net.URISyntaxException;
public class URIBuilder extends org.apache.http.client.utils.URIBuilder {
public URIBuilder() {
}
public URIBuilder(String string) throws URISyntaxException {
super(string);
}
public URIBuilder(URI uri) {
super(uri);
}
public org.apache.http.client.utils.URIBuilder addPath(String subPath) {
if (subPath == null || subPath.isEmpty() || "/".equals(subPath)) {
return this;
}
return setPath(appendSegmentToPath(getPath(), subPath));
}
private String appendSegmentToPath(String path, String segment) {
if (path == null || path.isEmpty()) {
path = "/";
}
if (path.charAt(path.length() - 1) == '/' || segment.startsWith("/")) {
return path + segment;
}
return path + "/" + segment;
}
}
Test:
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class URIBuilderTest {
@Test
public void testAddPath() throws Exception {
String url = "http://example.com/test";
String expected = "http://example.com/test/example";
URIBuilder builder = new URIBuilder(url);
builder.addPath("/example");
assertEquals(expected, builder.toString());
builder = new URIBuilder(url);
builder.addPath("example");
assertEquals(expected, builder.toString());
builder.addPath("");
builder.addPath(null);
assertEquals(expected, builder.toString());
url = "http://example.com";
expected = "http://example.com/example";
builder = new URIBuilder(url);
builder.addPath("/");
assertEquals(url, builder.toString());
builder.addPath("/example");
assertEquals(expected, builder.toString());
}
}
Gist: https://gist.github.com/enginer/230e2dc2f1d213a825d5

I had some difficulty with the encoding of URIs. Appending was not working for me because the URI was of the content:// type and it did not like the "/". This solution assumes there is no query and no fragment (we are working with paths, after all):
Kotlin code:
val newUri = Uri.parse(myUri.toString() + Uri.encode("/$relPath"))

Support for appending paths was added to URIBuilder in Apache HttpClient 5.1 with the appendPath method:
import org.apache.hc.core5.net.URIBuilder;
..
URI uri = new URIBuilder("https://stackoverflow.com/questions")
.appendPath("7498030")
.appendPath("append-relative-url")
.build();
// https://stackoverflow.com/questions/7498030/append-relative-url
Maven dependency:
<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
<artifactId>httpclient5</artifactId>
<version>5.1</version>
</dependency>

For android make sure you use .appendPath() from android.net.Uri

public String joinUrls(String baseUrl, String extraPath) {
try {
URI uri = URI.create(baseUrl + "/"); // add a trailing slash in case neither side has one
URI newUri = uri.resolve(extraPath);
return newUri.toURL().toString();
} catch (IllegalArgumentException | MalformedURLException e) {
// invalid input: no joined URL can be produced
return null;
}
}

A handmade URI segment joiner:
public static void main(String[] args) {
System.out.println(concatURISegments(
"http://abc/",
"/dfg/",
"/lmn",
"opq"
));
}
public static String concatURISegments(String... segmentArray) {
if (segmentArray.length == 0) {
return "";
} else if (segmentArray.length == 1) {
return segmentArray[0];
}
List<String> segmentList = new ArrayList<>();
for (String s : segmentArray) {
if (s != null && s.length() > 0) {
segmentList.add(s);
}
}
if (segmentList.size() == 0) {
return "";
} else if (segmentList.size() == 1) {
return segmentList.get(0);
}
StringBuilder sb = new StringBuilder();
sb.append(segmentList.get(0));
String prevS;
String currS;
boolean prevB;
boolean currB;
for (int i = 1; i < segmentList.size(); i++) {
prevS = segmentList.get(i - 1);
currS = segmentList.get(i);
prevB = prevS.endsWith("/");
currB = currS.startsWith("/");
if (!prevB && !currB) {
sb.append("/").append(currS);
} else if (prevB && currB) {
sb.append(currS.substring(1));
} else {
sb.append(currS);
}
}
return sb.toString();
}

This takes only one line. normalize() is your friend here; always add an extra / between the two parts when concatenating.
When baseUrl ends with /, normalize() removes the extra one. If it doesn't end with /, we've covered that by adding one deliberately.
String unknownBaseUrl = "https://example.com/apples/";
String result = URI.create(unknownBaseUrl + "/" + "1209").normalize().toString();
System.out.println(result);
output:
https://example.com/apples/1209
A sample with several extra / characters, which is normalized to a sane path as per RFC 2396:
String unknownBaseUrl = "https://example.com/apples/";
String result = URI.create(unknownBaseUrl + "/" + "/1209").normalize().toString();
System.out.println(result);
output:
https://example.com/apples/1209

To get around all the edge cases, the best approach would be to combine two standard classes: URIBuilder from Apache HttpClient and java.nio.file.Paths:
String rootUrl = "http://host:80/root/url";
String relativePath = "relative/path";
URIBuilder builder = new URIBuilder(rootUrl);
String combinedPath = Paths.get(builder.getPath(), relativePath).toString();
builder.setPath(combinedPath);
URL targetUrl = builder.build().toURL();
It results in: http://host:80/root/url/relative/path
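This works with any number of leading and trailing / and also when / are absent. Note that Paths uses the platform's separator, so on Windows the combined path contains backslashes.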

Related

System.out.format not working in java for loop

Below is the concerned code. Basically what the code is supposed to do is output the URL, name and version of each GitHub release defined by GetUpdateInfo.getInfo().
GetUpdateInfo.getInfo (NOTE Strings login, oauth and password omitted for security reasons.):
import java.util.List;
import org.kohsuke.github.*;
import org.apache.commons.lang3.ArrayUtils;
public class GetUpdateInfo {
public static getInfo() {
String version = "";
String url = "";
String[] urls = {};
String[] names = {};
String[] versions = {};
String[] releases = {};
GitHub github = GitHubBuilder.fromEnvironment(login, password, oauth).build();
//Get the repo name from the organization
GHOrganization gho = github.getOrganization("NuovoLauncher-Mods");
PagedIterable<GHRepository> repos = gho.listRepositories();
List<GHRepository> repos_list = repos.asList();
for(int i=0; i < repos_list.size(); i++) {
GHRepository repo_test = repos_list.get(i);
GHRelease latest = repo_test.getLatestRelease();
ArrayUtils.add(releases, latest.toString());
ArrayUtils.add(names, latest.getName());
ui.setName(names);
ui.setRelease(releases);
List<GHAsset> assets = latest.getAssets();
for( int x = 0; x < assets.size(); x++ ) {
GHAsset asset = assets.get(x);
url = asset.getBrowserDownloadUrl();
version = url.split("/")[7];
System.out.format("URL: %s, Name: %s, Latest Release: %s. Version %s\n", url, latest.getName(), latest, version);
ArrayUtils.add(urls, url);
ArrayUtils.add(versions, version);
ui.setURL(urls);
ui.setVersion(versions);
}
}
return ui;
}
public static void main(String[] args) throws Exception {
GetUpdateInfo.getInfo();
}
}
DownloadUpdate.runner:
public static void runner() throws Exception {
String system_type = System.getProperty("os.name");
File fpath = new File("");
UpdateInfo ui = GetUpdateInfo.getInfo();
for(int i = 0; i < ui.getName().length; i++) {
System.out.format("URL: %s, Name %s, Version, %s", ui.getURL()[i], ui.getName()[i], ui.getVersion()[i]);
System.out.format("Downloading %s-%s", ui.getName()[i], ui.getVersion()[i]);
System.out.print("\n");
if(system_type.equals("Linux")) {
fpath = new File(System.getProperty("user.home") + "/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
} else if(system_type.equals("Windows")) {
fpath = new File(System.getProperty("user.home") + "/AppData/Roaming/.minecraft/mods" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
} else {
fpath = new File(System.getProperty("user.home") + "/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
}
String url = ui.getURL()[i];
FileUtils.copyURLToFile(new URL(url), fpath);
}
}
public static void main(String[] args) throws Exception {
System.out.println("DEBUG START");
DownloadUpdate.runner();
}
}
Looking at the code, I cannot see a reason why the code is not outputting like expected; I am getting zero output on console, simply the line stating that the code is being executed. No exceptions either.
EDIT: variable ui is not being returned properly. For example, ui.getName()[0] throws an ArrayIndexOutOfBoundsException, due to the length being zero. Seeing this, I now understand why the for loop isn't behaving as expected. Is this a scope issue? What am I missing here?
An obvious problem with your code is the use of ArrayUtils.add: it returns a new array, so you have to assign its result back to the array variable. Arrays in Java cannot grow in place the way lists can.
Use it like this:
releases = ArrayUtils.add(releases, latest.toString());
names = ArrayUtils.add(names, latest.getName());
and later in the for-loop:
urls = ArrayUtils.add(urls, url);
versions = ArrayUtils.add(versions, version);
Also you don't need to set the elements in each loop cycle to the result:
ui.setURL(urls);
ui.setVersion(versions);
Those would be sufficient once the for-loop has completed.
An alternative would be to first use List<String> instead of the arrays. If you have control over the UpdateInfo class, change it there to be lists too, otherwise create an array from the lists before you set it in UpdateInfo.
As general advice, I would recommend that you get rid of your static methods. Create instances and keep your credentials (login, password, oauth) as member fields, or even pass in the whole GitHub instance. This way you would have an easier time writing proper tests.
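The List-based alternative mentioned above could look roughly like the sketch below. It is only an illustration: the class name and the sample values are made up, and no GitHub calls are involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the collecting logic in getInfo().
class ReleaseNamesSketch {

    // Collects values in a List, then converts to the String[] a setter might expect.
    static String[] collect(String... found) {
        List<String> names = new ArrayList<>();
        for (String name : found) {
            names.add(name); // a List grows in place, unlike an array
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] names = collect("mod-a", "mod-b");
        System.out.println(names.length);
    }
}
```

The conversion to an array happens once, after the loop, which also matches the advice to call the setters only after the loop has completed.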

How to ignore encoding certain characters in a url in java?

I have a url that looks like this: https://123.com/screen-shot-2021-02-25-at-7.31.10%2520PM.png
screen-shot-2021-02-25-at-7.31.10%2520PM.png is the file name and %25 is the encoded value for %
This gives me a 404. I need % to not be encoded. What is the proper way to ignore this when encoding a url using Google's UrlEscapers.urlFragmentEscaper().escape(); for Java other than using a replace() method?
Code for encoding:
private static String FILENAME_REGEX = ".*//?(.*)$";
private static Pattern FILENAME_PATTERN = Pattern.compile(FILENAME_REGEX);
public String sanitizedURL(@NonNull String url) throws URISyntaxException {
String contentUrl = url;
Matcher matcher = FILENAME_PATTERN.matcher(url);
if (matcher.matches()) {
String filename = matcher.group(1);
String encodedFilename = UrlEscapers.urlFragmentEscaper().escape(filename);
contentUrl = url.replace(filename, encodedFilename);
//contentUrl = contentUrl.replace("%25", "%");
}
// validate this is a good URI
URI uri = new URI(contentUrl);
return uri.toString();
}
Try URLDecoder.decode(String s, String enc)
e.g.
jshell> URLDecoder.decode("https://123.com/screen-shot-2021-02-25-at-7.31.10%2520PM.png", "UTF-8")
$1 ==> "https://123.com/screen-shot-2021-02-25-at-7.31.10%20PM.png"

Java convert string into url title characters only [duplicate]

How do you encode a URL in Android?
I thought it was like this:
final String encodedURL = URLEncoder.encode(urlAsString, "UTF-8");
URL url = new URL(encodedURL);
If I do the above, the http:// in urlAsString is replaced by http%3A%2F%2F in encodedURL and then I get a java.net.MalformedURLException when I use the URL.
You don't encode the entire URL, only parts of it that come from "unreliable sources".
Java:
String query = URLEncoder.encode("apples oranges", Charsets.UTF_8.name());
String url = "http://stackoverflow.com/search?q=" + query;
Kotlin:
val query: String = URLEncoder.encode("apples oranges", Charsets.UTF_8.name())
val url = "http://stackoverflow.com/search?q=$query"
Alternatively, you can use Strings.urlEncode(String str) of DroidParts that doesn't throw checked exceptions.
Or use something like
String uri = Uri.parse("http://...")
.buildUpon()
.appendQueryParameter("key", "val")
.build().toString();
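To make the "encode only the parts" advice concrete, here is a minimal plain-Java sketch. The class and method names are assumptions for illustration; it only uses java.net.URLEncoder from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class.
class QueryEncodingSketch {

    // Form-encodes a single value, such as a query parameter.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Appends an encoded q parameter to a base URL that has no query yet.
    static String searchUrl(String base, String query) {
        return base + "?q=" + encode(query);
    }

    public static void main(String[] args) {
        // Encoding the value only keeps the URL intact.
        System.out.println(searchUrl("http://stackoverflow.com/search", "apples oranges"));
        // Encoding a whole URL escapes the scheme separator as well.
        System.out.println(encode("http://example.com"));
    }
}
```

The second call shows why the code in the question fails: the colon and slashes of http:// are escaped too, so the result is no longer a valid URL.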
I'm going to add one suggestion here. You can do this which avoids having to get any external libraries.
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.
The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
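A small sketch of the escaping behaviour described above, passing the components to the URI constructor directly rather than splitting them with URL first. The class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; the host and path come from the example above.
class ComponentEscapingSketch {

    // Builds a URI from separate components so illegal path characters get escaped.
    static String build(String scheme, String host, String path) {
        try {
            // No user info, no explicit port (-1), no query, no fragment.
            return new URI(scheme, null, host, -1, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http", "abc.dev.domain.com",
                "/0007AC/ads/800x480 15sec h.264.mp4"));
    }
}
```

One caveat: these multi-argument constructors always quote the percent character, so a path that is already escaped (for example one containing %20) ends up double-encoded as %2520. The approach fits raw, unescaped input.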
For android, I would use
String android.net.Uri.encode(String s)
Encodes characters in the given string as '%'-escaped octets using the UTF-8 scheme. Leaves letters ("A-Z", "a-z"), numbers ("0-9"), and unreserved characters ("_-!.~'()*") intact. Encodes all other characters.
Ex/
String urlEncoded = "http://stackoverflow.com/search?q=" + Uri.encode(query);
Also you can use this
private static final String ALLOWED_URI_CHARS = "@#&=*+-_.,:!?()/~'%";
String urlEncoded = Uri.encode(path, ALLOWED_URI_CHARS);
This is the simplest method:
try {
query = URLEncoder.encode(query, "utf-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
You can use the methods below:
public static String parseUrl(String surl) throws Exception
{
URL u = new URL(surl);
return new URI(u.getProtocol(), u.getAuthority(), u.getPath(), u.getQuery(), u.getRef()).toString();
}
or
public String parseURL(String url, Map<String, String> params)
{
Uri.Builder builder = Uri.parse(url).buildUpon();
for (String key : params.keySet())
{
builder.appendQueryParameter(key, params.get(key));
}
return builder.build().toString();
}
The second one is better than the first.
Find the Arabic characters and replace them with their UTF-8 percent-encoding.
Something like this:
for (int i = 0; i < urlAsString.length(); i++) {
if (urlAsString.charAt(i) > 255) {
urlAsString = urlAsString.substring(0, i) + URLEncoder.encode(urlAsString.charAt(i)+"", "UTF-8") + urlAsString.substring(i+1);
}
}
encodedURL = urlAsString;

URI - getHost returns null. Why?

Why is the 1st one returning null, while the 2nd one is returning mail.yahoo.com?
Isn't this weird? If not, what's the logic behind this behavior?
Is the underscore the culprit? Why?
public static void main(String[] args) throws Exception {
java.net.URI uri = new java.net.URI("http://broken_arrow.huntingtonhelps.com");
String host = uri.getHost();
System.out.println("Host = [" + host + "].");
uri = new java.net.URI("http://mail.yahoo.com");
host = uri.getHost();
System.out.println("Host = [" + host + "].");
}
As mentioned in the comments by @hsz, it is a known bug.
But, let's debug and look inside the sources of URI class. The problem is inside the method:
private int parseHostname(int start, int n):
Parsing the first URI fails at the line if ((p < n) && !at(p, n, ':')) fail("Illegal character in hostname", p);
This is because the _ symbol isn't foreseen inside the scan block, which allows only letters, digits and the - symbol (L_ALPHANUM, H_ALPHANUM, L_DASH and H_DASH).
And yes, this is not yet fixed in Java 7.
It's because of the underscore in the base URI.
Just remove the underscore to check that; then it works.
As shown below:
public static void main(String[] args) throws Exception {
java.net.URI uri = new java.net.URI("http://brokenarrow.huntingtonhelps.com");
String host = uri.getHost();
System.out.println("Host = [" + host + "].");
uri = new java.net.URI("http://mail.yahoo.com");
host = uri.getHost();
System.out.println("Host = [" + host + "].");
}
I don't think it's a bug in Java, I think Java is parsing hostnames correctly according to the spec, there are good explanations of the spec here: http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names and here: http://www.netregister.biz/faqit.htm#1
Specifically hostnames MUST NOT contain underscores.
Consider using: new java.net.URL("http://broken_arrow.huntingtonhelps.com").getHost() instead. It has an alternative parsing implementation. If you have a URI myUri instance, then call myUri.toURL().getHost().
I faced this URI issue in OpenJDK 1.8 and it worked fine with URL.
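The behaviour discussed in this thread can be reproduced with a short sketch; the class and method names are made up for illustration. Besides going through URL as suggested above, the name is still available on the URI itself through getAuthority(), because the URI is then treated as having a registry-based authority.

```java
import java.net.URI;

// Hypothetical demo class for the underscore case from the question.
class UnderscoreHostSketch {

    // Returns the parsed host, or null when the authority is not a valid server-based one.
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    // Returns the authority component as written (may also contain user info and port).
    static String authority(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        String broken = "http://broken_arrow.huntingtonhelps.com";
        System.out.println("Host = [" + host(broken) + "].");
        System.out.println("Authority = [" + authority(broken) + "].");
        System.out.println("Host = [" + host("http://mail.yahoo.com") + "].");
    }
}
```

Since the authority can include user info and a port, it is only a drop-in replacement for getHost() when the URI is known to carry neither.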
As mentioned, it is a known JVM bug.
Although, if you want to do an HTTP request to such a host, you still can try to use a workaround.
The main idea is to construct the request based on the IP, not on the 'wrong' hostname. In that case you also need to add a "Host" header to the request, with the correct (original) hostname.
1: Cut the hostname from the URL (it's a rough example; you can do this in a smarter way):
int n = url.indexOf("://");
if (n > 0) { n += 3; } else { n = 0; }
int m = url.indexOf(":", n);
int k = url.indexOf("/", n);
if (-1 == m) { m = k; }
String hostHeader;
if (k > -1) {
hostHeader = url.substring(n, k);
} else {
hostHeader = url.substring(n);
}
String hostname;
if (m > -1) {
hostname = url.substring(n, m);
} else {
hostname = url.substring(n);
}
2: Get hostname's IP:
String IP = InetAddress.getByName(hostname).getHostAddress();
3: Construct a new URL based on the IP:
String newURL = url.substring(0, n) + IP + (m > -1 ? url.substring(m) : "");
4: Now use an HTTP library to prepare a request for the new URL (pseudocode):
HttpRequest req = ApacheHTTP.get(newURL);
5: And now you should add "Host" header with the correct (original) hostname:
req.addHeader("Host", hostHeader);
6: Now you can do the request (pseudocode):
String resp = req.getResponse().asString();

Apache Commons UrlValidator does not support Unicode. Is an alternative available?

I am trying to validate URLs,
but UrlValidator does not support Unicode.
Here is the code:
public static boolean isValidHttpUrl(String url) {
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid(url)) {
System.out.println("url is valid");
return true;
}
System.out.println("url is invalid");
return false;
}
String url = "ftp://hi.com";
boolean isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http:// hi.com";
isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http://hi.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
// this is the problem: this assertion fails
url = "http://안녕.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
Do you know of any alternative URL validator that supports Unicode?
I have added one more case: http://seapy_hi.com is reported as invalid. Why?
If an underscore is allowed in a domain name, why is this invalid?
It doesn't support IDN. You need to convert URL to Punycode first. Try this,
isValid = isValidHttpUrl(IDN.toASCII(url));
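As a hedged illustration of the Punycode step, the sketch below converts a host name only, not a whole URL. The class name and the sample host are assumptions chosen for the example; java.net.IDN is part of the JDK.

```java
import java.net.IDN;

// Hypothetical demo class.
class IdnHostSketch {

    // Converts an internationalized host name to its ASCII (Punycode) form.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher", written with an escape to avoid source-encoding issues.
        System.out.println(toAsciiHost("b\u00fccher.example"));
        // Plain ASCII hosts pass through unchanged.
        System.out.println(toAsciiHost("hi.com"));
    }
}
```

Applying the conversion to the host part alone avoids mangling the scheme and path, which matters when the input is a full URL.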
There may be a more recent RFC that supersedes this one, but technically speaking URLs do not support Unicode. See RFC 1738.
The relevant section in particular:
No corresponding graphic US-ASCII:
URLs are written only with the
graphic printable characters of the
US-ASCII coded character set. The
octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F
and 7F hexadecimal represent
control characters; these must be
encoded.
As Kaerber mentioned in a comment on the accepted answer, that answer has a bug if the string starts with a scheme.
So here's my solution, which fixes that:
public static String convertUnicodeURLToAscii(String url) throws URISyntaxException {
if(url == null) {
return null;
}
url = url.trim();
URI uri = new URI(url);
boolean includeScheme = true;
// URI needs a scheme to work properly with authority parsing
if(uri.getScheme() == null) {
uri = new URI("http://" + url);
includeScheme = false;
}
String scheme = uri.getScheme() != null ? uri.getScheme() + "://" : null;
String authority = uri.getRawAuthority() != null ? uri.getRawAuthority() : ""; // includes domain and port
String path = uri.getRawPath() != null ? uri.getRawPath() : "";
String queryString = uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "";
String fragment = uri.getRawFragment() != null ? "#" + uri.getRawFragment() : "";
// Must convert domain to punycode separately from the path
url = (includeScheme ? scheme : "") + IDN.toASCII(authority) + path + queryString + fragment;
// Convert path from unicode to ascii encoding
return new URI(url).normalize().toASCIIString();
}
