URI - getHost returns null. Why? - java

Why does the first one return null, while the second one returns mail.yahoo.com?
Isn't this weird? If not, what's the logic behind this behavior?
Is the underscore the culprit? Why?
public static void main(String[] args) throws Exception {
    java.net.URI uri = new java.net.URI("http://broken_arrow.huntingtonhelps.com");
    String host = uri.getHost();
    System.out.println("Host = [" + host + "].");
    uri = new java.net.URI("http://mail.yahoo.com");
    host = uri.getHost();
    System.out.println("Host = [" + host + "].");
}

As mentioned in the comments by @hsz, it is a known bug.
But let's debug and look inside the source of the URI class. The problem is in the method:
private int parseHostname(int start, int n)
Parsing the first URI fails at this line:
if ((p < n) && !at(p, n, ':')) fail("Illegal character in hostname", p);
This is because the _ character isn't accepted by the scan block, which allows only letters, digits and the - character (L_ALPHANUM, H_ALPHANUM, L_DASH and H_DASH). Because the URI(String) constructor then falls back to treating the authority as registry-based instead of rejecting it, you never see the error: the URI is created, but getHost() returns null.
And yes, this is not yet fixed in Java 7.
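The fallback is easy to observe. For the underscore host, the URI object is still created and keeps the authority as a plain (registry-based) string, while getHost() is null; calling parseServerAuthority() brings the hidden parse error back. A minimal demo using only java.net.URI (UnderscoreHostDemo is just an illustrative class name):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UnderscoreHostDemo {
    public static void main(String[] args) throws Exception {
        URI broken = new URI("http://broken_arrow.huntingtonhelps.com");
        // Server-based parsing rejects '_', so the authority is kept as a registry-based string
        System.out.println(broken.getHost());      // prints: null
        System.out.println(broken.getAuthority()); // prints: broken_arrow.huntingtonhelps.com

        try {
            // Forces server-based parsing, which surfaces the "Illegal character in hostname" error
            broken.parseServerAuthority();
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
        }

        URI fine = new URI("http://mail.yahoo.com");
        System.out.println(fine.getHost());        // prints: mail.yahoo.com
    }
}
```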

It's because of the underscore in the base URI.
Just remove the underscore to check: then it works.
See below:
public static void main(String[] args) throws Exception {
    java.net.URI uri = new java.net.URI("http://brokenarrow.huntingtonhelps.com");
    String host = uri.getHost();
    System.out.println("Host = [" + host + "].");
    uri = new java.net.URI("http://mail.yahoo.com");
    host = uri.getHost();
    System.out.println("Host = [" + host + "].");
}

I don't think it's a bug in Java. I think Java is parsing hostnames correctly according to the spec; there are good explanations of the spec here: http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names and here: http://www.netregister.biz/faqit.htm#1
Specifically, hostnames MUST NOT contain underscores.
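As a rough illustration of the rules described on those pages, here is a sketch of a check for the traditional letters-digits-hyphen hostname syntax: each dot-separated label is 1 to 63 characters of ASCII letters, digits and hyphens and does not start or end with a hyphen, and the whole name is at most 253 characters. HostnameRules.isValidHostname is a hypothetical helper, not a JDK API, and it deliberately ignores internationalized names and a trailing root dot.

```java
import java.util.regex.Pattern;

public final class HostnameRules {
    // One label: 1-63 letters, digits or hyphens, not starting or ending with '-'
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /** Returns true if every dot-separated label follows the letters-digits-hyphen rule. */
    public static boolean isValidHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // Limit -1 keeps empty labels, so "a..com" and "a.com." are rejected
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false; // e.g. "broken_arrow" fails because '_' is not allowed
            }
        }
        return true;
    }
}
```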

Consider using new java.net.URL("http://broken_arrow.huntingtonhelps.com").getHost() instead. URL uses a different parsing implementation. If you have a URI instance myUri, call myUri.toURL().getHost().
I faced this URI issue in OpenJDK 1.8, and it worked fine with URL.
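A sketch that combines both parsers, assuming you want URI's strict host whenever it is available and URL's lenient one otherwise. HostFallback.hostOf is a hypothetical helper; note that the URL(String) constructor is deprecated since JDK 20 but still works.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public final class HostFallback {
    /**
     * Returns the host as parsed by java.net.URI or, if URI rejects it (getHost() == null),
     * the host as parsed by the more lenient java.net.URL.
     */
    @SuppressWarnings("deprecation") // URL(String) is deprecated since JDK 20 but still usable
    public static String hostOf(String spec) {
        try {
            String host = new URI(spec).getHost();
            return (host != null) ? host : new URL(spec).getHost();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Cannot parse " + spec, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://broken_arrow.huntingtonhelps.com"));
        System.out.println(hostOf("http://mail.yahoo.com"));
    }
}
```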

As mentioned, it is a known JVM bug.
However, if you want to make an HTTP request to such a host, you can still try a workaround.
The main idea is to build the request from the IP address rather than from the 'wrong' hostname. In that case, you also need to add a "Host" header to the request with the correct (original) hostname.
1: Cut the hostname out of the URL (this is a rough example; you can do it in a smarter way):
int n = url.indexOf("://");
if (n > 0) { n += 3; } else { n = 0; } // n: start of the host
int m = url.indexOf(":", n);           // m: start of the port, if any
int k = url.indexOf("/", n);           // k: start of the path, if any
// A ':' after the first '/' belongs to the path, not to the port
if (m == -1 || (k > -1 && m > k)) { m = k; }
String hostHeader;                     // host[:port], for the "Host" header
if (k > -1) {
    hostHeader = url.substring(n, k);
} else {
    hostHeader = url.substring(n);
}
String hostname;                       // bare hostname, for the DNS lookup
if (m > -1) {
    hostname = url.substring(n, m);
} else {
    hostname = url.substring(n);
}
2: Get the hostname's IP (java.net.InetAddress):
String ip = InetAddress.getByName(hostname).getHostAddress();
3: Construct a new URL based on the IP (an IPv6 address would additionally need [brackets]):
String newUrl = url.substring(0, n) + ip + (m > -1 ? url.substring(m) : "");
4: Now use an HTTP library to prepare a request for the new URL (pseudocode):
HttpRequest req = ApacheHTTP.get(newUrl);
5: Then add the "Host" header with the correct (original) hostname (pseudocode):
req.addHeader("Host", hostHeader);
6: Now you can send the request (pseudocode):
String resp = req.getResponse().asString();
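Steps 1 and 3 are plain string handling, so they can be packaged into a small helper and tested without any network access. IpUrlRewriter and its Parts type are hypothetical names; the DNS lookup from step 2 and the HTTP request from steps 4 to 6 stay with the caller. This sketch also brackets IPv6 literals, which a URL requires.

```java
public final class IpUrlRewriter {
    /** The pieces of the original URL that the workaround needs (hypothetical helper type). */
    public static final class Parts {
        public final String hostname;   // bare host, for the DNS lookup
        public final String hostHeader; // host[:port], for the "Host" header
        final String prefix;            // scheme plus "://", or "" if absent
        final String rest;              // everything after the hostname (":port/path?query"), or ""

        Parts(String hostname, String hostHeader, String prefix, String rest) {
            this.hostname = hostname;
            this.hostHeader = hostHeader;
            this.prefix = prefix;
            this.rest = rest;
        }

        /** Rebuilds the URL with the given IP literal in place of the hostname. */
        public String withIp(String ip) {
            // IPv6 literals contain ':' and must be wrapped in [brackets] inside a URL
            String host = ip.contains(":") ? "[" + ip + "]" : ip;
            return prefix + host + rest;
        }
    }

    public static Parts split(String url) {
        int n = url.indexOf("://");
        n = (n > 0) ? n + 3 : 0;           // start of the host
        int m = url.indexOf(':', n);       // start of the port, if any
        int k = url.indexOf('/', n);       // start of the path, if any
        if (m == -1 || (k > -1 && m > k)) {
            m = k;                         // a ':' inside the path is not a port separator
        }
        String hostHeader = (k > -1) ? url.substring(n, k) : url.substring(n);
        String hostname = (m > -1) ? url.substring(n, m) : url.substring(n);
        String rest = (m > -1) ? url.substring(m) : "";
        return new Parts(hostname, hostHeader, url.substring(0, n), rest);
    }
}
```

Typical use would be `Parts p = IpUrlRewriter.split(url); String newUrl = p.withIp(InetAddress.getByName(p.hostname).getHostAddress());`. The trick is aimed at plain HTTP; with HTTPS, certificate hostname verification would typically be performed against the IP and fail.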

Related

System.out.format not working in java for loop

Below is the relevant code. It is supposed to output the URL, name and version of each GitHub release returned by GetUpdateInfo.getInfo().
GetUpdateInfo.getInfo (NOTE: the strings login, oauth and password are omitted for security reasons):
import java.util.List;
import org.kohsuke.github.*;
import org.apache.commons.lang3.ArrayUtils;
public class GetUpdateInfo {
    public static UpdateInfo getInfo() throws Exception {
        String version = "";
        String url = "";
        String[] urls = {};
        String[] names = {};
        String[] versions = {};
        String[] releases = {};
        GitHub github = GitHubBuilder.fromEnvironment(login, password, oauth).build();
        //Get the repo name from the organization
        GHOrganization gho = github.getOrganization("NuovoLauncher-Mods");
        PagedIterable<GHRepository> repos = gho.listRepositories();
        List<GHRepository> repos_list = repos.asList();
        for (int i = 0; i < repos_list.size(); i++) {
            GHRepository repo_test = repos_list.get(i);
            GHRelease latest = repo_test.getLatestRelease();
            ArrayUtils.add(releases, latest.toString());
            ArrayUtils.add(names, latest.getName());
            ui.setName(names);
            ui.setRelease(releases);
            List<GHAsset> assets = latest.getAssets();
            for (int x = 0; x < assets.size(); x++) {
                GHAsset asset = assets.get(x);
                url = asset.getBrowserDownloadUrl();
                version = url.split("/")[7];
                System.out.format("URL: %s, Name: %s, Latest Release: %s. Version %s\n", url, latest.getName(), latest, version);
                ArrayUtils.add(urls, url);
                ArrayUtils.add(versions, version);
                ui.setURL(urls);
                ui.setVersion(versions);
            }
        }
        return ui;
    }
    public static void main(String[] args) throws Exception {
        GetUpdateInfo.getInfo();
    }
}
DownloadUpdate.runner:
public class DownloadUpdate {
    public static void runner() throws Exception {
        String system_type = System.getProperty("os.name");
        File fpath = new File("");
        UpdateInfo ui = GetUpdateInfo.getInfo();
        for (int i = 0; i < ui.getName().length; i++) {
            System.out.format("URL: %s, Name %s, Version, %s", ui.getURL()[i], ui.getName()[i], ui.getVersion()[i]);
            System.out.format("Downloading %s-%s", ui.getName()[i], ui.getVersion()[i]);
            System.out.print("\n");
            if (system_type.equals("Linux")) {
                fpath = new File(System.getProperty("user.home") + "/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
            } else if (system_type.startsWith("Windows")) { // os.name is e.g. "Windows 10", not just "Windows"
                fpath = new File(System.getProperty("user.home") + "/AppData/Roaming/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
            } else {
                fpath = new File(System.getProperty("user.home") + "/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
            }
            String url = ui.getURL()[i];
            FileUtils.copyURLToFile(new URL(url), fpath);
        }
    }
    public static void main(String[] args) throws Exception {
        System.out.println("DEBUG START");
        DownloadUpdate.runner();
    }
}
Looking at the code, I can't see why it isn't producing the expected output: I get no output on the console except the "DEBUG START" line stating that the code is running. No exceptions either.
EDIT: the variable ui is not being returned properly. For example, ui.getName()[0] throws an ArrayIndexOutOfBoundsException because the array's length is zero. Seeing this, I now understand why the for loop isn't behaving as expected. Is this a scope issue? What am I missing here?
An obvious problem in your code is the use of ArrayUtils.add: it does not modify the array you pass in but returns a new, longer array, so you have to assign its result back to the variable. Java arrays have a fixed length and cannot grow like lists.
Use it like this:
releases = ArrayUtils.add(releases, latest.toString());
names = ArrayUtils.add(names, latest.getName());
and later in the for-loop:
urls = ArrayUtils.add(urls, url);
versions = ArrayUtils.add(versions, version);
Also, you don't need to pass the arrays to the setters on every loop iteration:
ui.setURL(urls);
ui.setVersion(versions);
Calling them once, after the for-loop has completed, is sufficient.
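To see why the unassigned calls leave the arrays empty, here is a stdlib-only sketch. The append helper is hypothetical and only mimics what ArrayUtils.add does (copy into a new array one element longer); it is not the Commons Lang implementation.

```java
import java.util.Arrays;

public final class ArrayAppendDemo {
    /** Returns a NEW array one element longer; the input array is left untouched. */
    public static String[] append(String[] array, String element) {
        String[] copy = Arrays.copyOf(array, array.length + 1);
        copy[array.length] = element;
        return copy;
    }

    public static void main(String[] args) {
        String[] ignored = {};
        append(ignored, "v1.0");                   // result discarded: 'ignored' is still empty
        String[] reassigned = {};
        reassigned = append(reassigned, "v1.0");   // result kept
        System.out.println(ignored.length);        // prints: 0
        System.out.println(reassigned.length);     // prints: 1
    }
}
```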
An alternative would be to use List<String> instead of arrays in the first place. If you have control over the UpdateInfo class, change it to use lists too; otherwise, convert the lists to arrays before you pass them to UpdateInfo.
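A sketch of the List-based alternative, reduced to collecting names: releaseNames stands in for the latest.getName() values from the GitHub loop, and collect is a hypothetical helper. The conversion to an array happens once, at the end, just before the result would be handed to a setter such as ui.setName(...).

```java
import java.util.ArrayList;
import java.util.List;

public final class ListCollectDemo {
    /** Accumulates values in a growable list, then converts to an array exactly once. */
    public static String[] collect(List<String> releaseNames) {
        List<String> names = new ArrayList<>();
        for (String name : releaseNames) {
            names.add(name); // List.add mutates the list in place; no reassignment needed
        }
        return names.toArray(new String[0]);
    }
}
```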
As general advice, I would recommend getting rid of your static methods. Create instances and keep your credentials (login, password, oauth) as member fields, OR pass in the whole GitHub instance. That way, you will have an easier time writing proper tests.

UriComponents returns IP instead of domain

I'm not strong in network technologies, so maybe you can help me. I have this simple code:
HttpServletRequest request = ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes())
        .getRequest();
UriComponents uriComponents = UriComponentsBuilder.fromHttpUrl(request.getRequestURL().toString()).build();
UriComponents newUriComponents = UriComponentsBuilder.newInstance().scheme(uriComponents.getScheme())
        .host(uriComponents.getHost()).port(uriComponents.getPort()).build();
return newUriComponents.toUriString() + request.getContextPath();
This code should return a link to my server with a specific path. The problem is that on the production server, uriComponents.getHost() returns an IP address instead of the domain name. The domain works when I go to the server via a browser. I can go to
http://exmaple.com/some/one/path and want to get back (in JSON; there are no redirects, just a GET request and a JSON answer) http://exmaple.com/some/another/path, but the code I showed returns http://78.54.128.98/some/another/path (the IP address is just an example). So I don't know why my code returns the IP and not the domain name. The only other thing I can say is that on my local machine I don't have any problems with it: the code returns localhost, or, if I add 127.0.0.1 exmaple.com to the hosts file, it correctly returns exmaple.com, not an IP.
This is not a problem with UriComponents; it parses whatever it gets as input. More specifically, looking at the source of UriComponentsBuilder.fromHttpUrl, you see:
public static UriComponentsBuilder fromHttpUrl(String httpUrl) {
    Assert.notNull(httpUrl, "HTTP URL must not be null");
    Matcher matcher = HTTP_URL_PATTERN.matcher(httpUrl);
    if (matcher.matches()) {
        UriComponentsBuilder builder = new UriComponentsBuilder();
        String scheme = matcher.group(1);
        builder.scheme(scheme != null ? scheme.toLowerCase() : null);
        builder.userInfo(matcher.group(4));
        String host = matcher.group(5);
        if (StringUtils.hasLength(scheme) && !StringUtils.hasLength(host)) {
            throw new IllegalArgumentException("[" + httpUrl + "] is not a valid HTTP URL");
        }
        builder.host(host);
        String port = matcher.group(7);
        if (StringUtils.hasLength(port)) {
            builder.port(port);
        }
        builder.path(matcher.group(8));
        builder.query(matcher.group(10));
        return builder;
    }
    else {
        throw new IllegalArgumentException("[" + httpUrl + "] is not a valid HTTP URL");
    }
}
Here you can see that a pattern matcher is defined for the expected structure of the URL, and the parts are extracted according to the matcher. If you see an IP address, it means that the URL passed in as input (request.getRequestURL().toString()) contained the IP address as its host.
This means you should look for the culprit further up the chain, starting with whoever calls this piece of code and following the links until you find the cause.
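To illustrate that the parser only copies what it is given, here is a heavily simplified stand-in for that matching step, written with plain java.util.regex. The pattern is an assumption for illustration only, not Spring's actual HTTP_URL_PATTERN (it ignores user info and IPv6 literals); the host group is taken verbatim from the input, so an IP address in request.getRequestURL() comes straight back out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SimpleHttpUrl {
    // Simplified illustration: scheme, host, optional port, path, query, fragment
    private static final Pattern HTTP_URL = Pattern.compile(
            "^(https?)://([^/:?#]+)(?::(\\d+))?([^?#]*)(?:\\?([^#]*))?(?:#.*)?$",
            Pattern.CASE_INSENSITIVE);

    private static Matcher match(String httpUrl) {
        Matcher m = HTTP_URL.matcher(httpUrl);
        if (!m.matches()) {
            throw new IllegalArgumentException("[" + httpUrl + "] is not a valid HTTP URL");
        }
        return m;
    }

    /** The host exactly as written in the input: a domain stays a domain, an IP stays an IP. */
    public static String host(String httpUrl) {
        return match(httpUrl).group(2);
    }

    /** The port digits, or null if the URL has no explicit port. */
    public static String port(String httpUrl) {
        return match(httpUrl).group(3);
    }
}
```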

Apache Commons UrlValidator does not support Unicode. Is an alternative available?

I'm trying to validate URLs,
but UrlValidator does not support Unicode.
Here is the code:
public static boolean isValidHttpUrl(String url) {
    String[] schemes = {"http", "https"};
    UrlValidator urlValidator = new UrlValidator(schemes);
    if (urlValidator.isValid(url)) {
        System.out.println("url is valid");
        return true;
    }
    System.out.println("url is invalid");
    return false;
}
String url = "ftp://hi.com";
boolean isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http:// hi.com";
isValid = isValidHttpUrl(url);
assertFalse(isValid);
url = "http://hi.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
// this is the problem: isValid is false here, not true
url = "http://안녕.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);
Do you know of an alternative URL validator that supports Unicode?
I added another case: http://seapy_hi.com is invalid. Why?
Isn't an underscore valid in a domain name? Why is it invalid?
It doesn't support IDNs (internationalized domain names). You need to convert the URL to Punycode first. Try this:
isValid = isValidHttpUrl(IDN.toASCII(url));
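IDN.toASCII works label by label, splitting on dots, so it is safest to apply it to the host only; otherwise the scheme ends up inside the first label (the bug mentioned in a later answer). A minimal sketch that converts just the host; asciiHost is a hypothetical helper that ignores user info and IPv6 literals.

```java
import java.net.IDN;

public final class IdnHostDemo {
    /** Converts only the host of a "scheme://host/rest" string to Punycode. */
    public static String asciiHost(String url) {
        int start = url.indexOf("://");
        start = (start >= 0) ? start + 3 : 0;
        int end = start;
        // The host ends at the first '/', '?', '#' or ':' (port separator)
        while (end < url.length() && "/?#:".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(0, start) + IDN.toASCII(url.substring(start, end)) + url.substring(end);
    }
}
```

The converted string can then be passed to isValidHttpUrl as before.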
There may be a more recent RFC that supersedes this one, but technically speaking, URLs do not support Unicode (RFC 1738).
The relevant section in particular:
No corresponding graphic US-ASCII: URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.
As Kaerber mentioned in a comment on the accepted answer, that answer has a bug if the string starts with a scheme.
So here is my solution, which fixes that:
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;
public static String convertUnicodeURLToAscii(String url) throws URISyntaxException {
    if (url == null) {
        return null;
    }
    url = url.trim();
    URI uri = new URI(url);
    boolean includeScheme = true;
    // URI needs a scheme to work properly with authority parsing
    if (uri.getScheme() == null) {
        uri = new URI("http://" + url);
        includeScheme = false;
    }
    String scheme = uri.getScheme() != null ? uri.getScheme() + "://" : null;
    String authority = uri.getRawAuthority() != null ? uri.getRawAuthority() : ""; // includes domain and port
    String path = uri.getRawPath() != null ? uri.getRawPath() : "";
    String queryString = uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "";
    String fragment = uri.getRawFragment() != null ? "#" + uri.getRawFragment() : "";
    // Must convert domain to punycode separately from the path
    url = (includeScheme ? scheme : "") + IDN.toASCII(authority) + path + queryString + fragment;
    // Convert path from unicode to ascii encoding
    return new URI(url).normalize().toASCIIString();
}

How to identify the URL of a Java web application from within?

My Java web application contains a startup servlet. Its init() method is invoked when the web application server (Tomcat) starts. Within this method I need the URL of my web application. Since there is no HttpServletRequest, how can I get this information?
You can't, because there is no "URL of a Java web application" as seen from within. A servlet is not tied to a URL; that mapping is done from the outside. (Perhaps you have an Apache server that connects to Tomcat; Tomcat can't know about it.)
It makes sense to ask an HttpServletRequest for its URL, because that is information about an event (the URL that was actually used to make this request). It does not make sense to ask for a configured URL.
A workaround could be to perform the initialization lazily when the first request arrives. You can implement a filter that does this once, e.g. by storing a boolean flag in a static variable and synchronizing access to the flag correctly. This adds a little overhead, because every subsequent request still passes through the filter, which then skips the initialization. It was just a thought.
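A minimal sketch of that once-only flag, assuming the filter passes request.getRequestURL().toString() in; OnceInitializer and its init callback are hypothetical names, not servlet API. It uses double-checked locking on a volatile field, so requests after the first pay only one volatile read, and concurrent first requests wait until the initialization has finished.

```java
import java.util.function.Consumer;

public final class OnceInitializer {
    private volatile boolean initialized = false; // read without locking on the fast path
    private final Consumer<String> init;          // hypothetical init action taking the webapp URL

    public OnceInitializer(Consumer<String> init) {
        this.init = init;
    }

    /** Runs init exactly once, with the URL of the first request; later calls return immediately. */
    public void ensureInitialized(String requestUrl) {
        if (initialized) {
            return;
        }
        synchronized (this) {
            if (!initialized) {
                init.accept(requestUrl);
                initialized = true;
            }
        }
    }
}
```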
There is nothing in the servlet API that provides this information; moreover, any given resource may be bound to multiple URLs.
What you CAN do is, when you receive an actual request, inspect it and its servlet context to see which URL was used.
Here is how it works for me and probably for most configurations:
import java.net.InetAddress;
import java.net.UnknownHostException;
import javax.servlet.ServletConfig;
// Assumes the default port (80 for http, 443 for https); append ":" + port otherwise
public static String getWebappUrl(ServletConfig servletConfig, boolean ssl) {
    String protocol = ssl ? "https" : "http";
    String host = getHostName();
    // getContextPath() (Servlet 2.5+) returns "" for the root context or e.g. "/myapp";
    // getServletContextName() would return the <display-name> from web.xml instead
    String context = servletConfig.getServletContext().getContextPath();
    return protocol + "://" + host + context;
}
public static String getHostName() {
    String[] hostnames = getHostNames();
    if (hostnames.length == 0) return "localhost";
    if (hostnames.length == 1) return hostnames[0];
    // Prefer any name other than "localhost"
    for (String hostname : hostnames) {
        if (!"localhost".equals(hostname)) return hostname;
    }
    return hostnames[0];
}
public static String[] getHostNames() {
    String localhostName;
    try {
        localhostName = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException ex) {
        return new String[] {"localhost"};
    }
    InetAddress[] ia;
    try {
        ia = InetAddress.getAllByName(localhostName);
    } catch (UnknownHostException ex) {
        return new String[] {localhostName};
    }
    String[] sa = new String[ia.length];
    for (int i = 0; i < ia.length; i++) {
        sa[i] = ia[i].getHostName();
    }
    return sa;
}

Java: Common way to validate and convert "host:port" to InetSocketAddress?

What is the common way in Java to validate and convert a string of the form host:port into an instance of InetSocketAddress?
It would be nice if following criteria were met:
No address lookups;
Working for IPv4, IPv6, and "string" hostnames;
(For IPv4 it's ip:port, for IPv6 it's [ip]:port, right? Is there some RFC which defines all these schemes?)
Preferable without parsing the string by hand.
(I'm thinking about all those special cases where someone thinks they know all valid forms of socket addresses but forgets about "that special case", which leads to unexpected results.)
I'll propose one possible workaround myself.
Convert the string into a URI (this validates it automatically) and then query the URI's host and port components.
Sadly, a URI with a host component MUST have a scheme. This is why this solution is "not perfect".
String string = ... // some string which has to be validated
try {
    // WORKAROUND: add any scheme to make the resulting URI valid.
    URI uri = new URI("my://" + string); // may throw URISyntaxException
    String host = uri.getHost();
    int port = uri.getPort();
    if (uri.getHost() == null || uri.getPort() == -1) {
        throw new URISyntaxException(uri.toString(),
                "URI must have host and port parts");
    }
    // here, additional checks can be performed, such as
    // presence of path, query, fragment, ...
    // validation succeeded
    return new InetSocketAddress(host, port);
} catch (URISyntaxException ex) {
    // validation failed
}
This solution needs no custom string parsing, works with IPv4 (1.1.1.1:123), IPv6 ([::0]:123) and host names (my.host.com:123).
Coincidentally, this solution is well suited to my scenario. I was going to use URI schemes anyway.
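A self-contained version of this workaround, with one change worth noting: new InetSocketAddress(host, port) attempts to resolve the hostname, which conflicts with the "no address lookups" criterion, so this sketch uses InetSocketAddress.createUnresolved instead and strips the brackets that URI.getHost() keeps around IPv6 literals. HostPortParser is a hypothetical class name, and rejecting user info, path, query and fragment is one possible choice for the "additional checks".

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;

public final class HostPortParser {
    /** Validates "host:port" via java.net.URI; returns an UNRESOLVED address (no DNS lookup). */
    public static InetSocketAddress parse(String hostPort) {
        URI uri;
        try {
            // Any scheme works; it only makes the authority parse as host[:port]
            uri = new URI("my://" + hostPort);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid host:port: " + hostPort, e);
        }
        String host = uri.getHost();     // null if the host is not a valid server-based host
        int port = uri.getPort();        // -1 if no port was given
        String path = uri.getRawPath();
        if (host == null || port == -1 || uri.getRawUserInfo() != null
                || (path != null && !path.isEmpty())
                || uri.getRawQuery() != null || uri.getRawFragment() != null) {
            throw new IllegalArgumentException("Expected exactly host:port: " + hostPort);
        }
        // URI keeps the brackets around IPv6 literals, e.g. "[::0]"
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);
        }
        // Also throws IllegalArgumentException if the port is out of range
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```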
A regex will do this quite neatly:
Pattern p = Pattern.compile("^\\s*(.*?):(\\d+)\\s*$");
Matcher m = p.matcher("127.0.0.1:8080");
if (m.matches()) {
    String host = m.group(1);
    int port = Integer.parseInt(m.group(2));
}
You can extend this in many ways, such as making the port optional or doing some validation on the host.
It doesn't answer the question exactly, but this answer could still be useful to others like me who just want to parse a host and port, but not necessarily into a full InetAddress. Guava has a HostAndPort class with a fromString method.
Another person has given a regex answer, which is what I was going to do when I originally asked the question about hosts. I will still post mine, because it is an example of a slightly more advanced regex that can help determine what kind of address you are dealing with.
String ipPattern = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}):(\\d+)";
String ipV6Pattern = "\\[([a-zA-Z0-9:]+)\\]:(\\d+)";
String hostPattern = "([\\w\\.\\-]+):(\\d+)"; // note will allow _ in host name
Pattern p = Pattern.compile(ipPattern + "|" + ipV6Pattern + "|" + hostPattern);
Matcher m = p.matcher(someString);
if (m.matches()) {
    if (m.group(1) != null) {
        // group(1) IP address, group(2) is port
    } else if (m.group(3) != null) {
        // group(3) is IPv6 address, group(4) is port
    } else if (m.group(5) != null) {
        // group(5) is hostname, group(6) is port
    } else {
        // Not a valid address
    }
}
Making the port optional is pretty straightforward: wrap the ":(\d+)" as "(?::(\d+))?" and then check group(2), etc. for null.
Edit: I'll note that there's no "common way" that I'm aware of, but the above is how I'd do it if I had to.
Also note: the IPv4 case can be removed if hosts and IPv4 addresses will actually be handled the same way. I split them out because sometimes you can avoid an ultimate host lookup if you know you have an IP address.
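A sketch of the optional-port variant described above, using the same three alternatives with each ":(\\d+)" wrapped as "(?::(\\d+))?". HostPortRegex.parse and its {kind, host, port} return shape are hypothetical; a missing port comes back as null, and the host branch still accepts '_'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class HostPortRegex {
    private static final String IP = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(?::(\\d+))?";
    private static final String IPV6 = "\\[([a-zA-Z0-9:]+)\\](?::(\\d+))?";
    private static final String HOST = "([\\w\\.\\-]+)(?::(\\d+))?"; // still allows '_'
    // Groups: 1-2 IPv4 (address, port), 3-4 IPv6, 5-6 hostname
    private static final Pattern P = Pattern.compile(IP + "|" + IPV6 + "|" + HOST);

    /** Returns {kind, host, portOrNull}, or null if the string matches none of the forms. */
    public static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        if (m.group(1) != null) {
            return new String[] {"ipv4", m.group(1), m.group(2)};
        }
        if (m.group(3) != null) {
            return new String[] {"ipv6", m.group(3), m.group(4)};
        }
        return new String[] {"host", m.group(5), m.group(6)};
    }
}
```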
new InetSocketAddress(
        addressString.substring(0, addressString.lastIndexOf(":")),
        Integer.parseInt(addressString.substring(addressString.lastIndexOf(":") + 1)));
Something like that? I probably made some silly little mistake. I'm assuming you just want a new InetSocketAddress object from a String in exactly the host:port format.
All kinds of peculiar hackery and elegant-but-unsafe solutions have been provided elsewhere. Sometimes the inelegant brute-force solution is the way to go.
public static InetSocketAddress parseInetSocketAddress(String addressAndPort) throws IllegalArgumentException {
    int portPosition = addressAndPort.length();
    int portNumber = 0;
    while (portPosition > 1 && Character.isDigit(addressAndPort.charAt(portPosition - 1))) {
        --portPosition;
    }
    String address;
    if (portPosition > 1 && addressAndPort.charAt(portPosition - 1) == ':') {
        try {
            portNumber = Integer.parseInt(addressAndPort.substring(portPosition));
        } catch (NumberFormatException ignored) {
            throw new IllegalArgumentException("Invalid port number.");
        }
        address = addressAndPort.substring(0, portPosition - 1);
    } else {
        portNumber = 0;
        address = addressAndPort;
    }
    return new InetSocketAddress(address, portNumber);
}
The open-source IPAddress Java library has a HostName class which will do the required parsing. Disclaimer: I am the project manager of the IPAddress library.
It will parse IPv4, IPv6 and string host names, with or without ports, and it handles all the various formats of hosts and addresses. BTW, there is no single RFC for this; a number of RFCs apply in different ways.
String hostName = "[a:b:c:d:e:f:a:b]:8080";
String hostName2 = "1.2.3.4:8080";
String hostName3 = "a.com:8080";
try {
    HostName host = new HostName(hostName);
    host.validate();
    InetSocketAddress address = host.asInetSocketAddress();
    HostName host2 = new HostName(hostName2);
    host2.validate();
    InetSocketAddress address2 = host2.asInetSocketAddress();
    HostName host3 = new HostName(hostName3);
    host3.validate();
    InetSocketAddress address3 = host3.asInetSocketAddress();
    // use socket address
} catch (HostNameException e) {
    String msg = e.getMessage();
    // handle improperly formatted host name or address string
}
URI can accomplish this:
URI uri = new URI(null, "example.com:80", null, null, null);
Unfortunately, there's a bug in current OpenJDK (or in the documentation) where the authority isn't properly validated. The documentation states:
The resulting URI string is then parsed as if by invoking the URI(String) constructor and then invoking the parseServerAuthority() method upon the result
Unfortunately, that call to parseServerAuthority just doesn't happen, so the real solution that properly validates is:
URI uri = new URI(null, "example.com:80", null, null, null).parseServerAuthority();
then
InetSocketAddress address = new InetSocketAddress(uri.getHost(), uri.getPort());
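The same approach wrapped in a method, as a sketch. AuthorityParser is a hypothetical name; it converts URISyntaxException into IllegalArgumentException, insists on an explicit port, and uses InetSocketAddress.createUnresolved so that no DNS lookup happens (new InetSocketAddress(host, port) would try to resolve the name). IPv6 hosts are returned with their brackets, as URI.getHost() reports them.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;

public final class AuthorityParser {
    /** Validates "host:port" without inventing a scheme; throws IllegalArgumentException if invalid. */
    public static InetSocketAddress parse(String hostPort) {
        try {
            // Five-argument constructor (scheme, authority, path, query, fragment), then force
            // server-based parsing so that an invalid host raises URISyntaxException
            URI uri = new URI(null, hostPort, null, null, null).parseServerAuthority();
            if (uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("Expected host:port, got " + hostPort);
            }
            return InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid host:port: " + hostPort, e);
        }
    }
}
```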
