URL valid characters. java to validate - java

a string like: 'www.test.com' is good.
a string like: 'www.888.com' is good.
a string like: 'stackoverflow.com' is good.
a string like: 'GOoGle.Com' is good.
why ? because those are valid urls. it does not necessarely matter if they have been registered or not.
now bad strings are:
'goog*d\x'
'manydots...com'
why because you can't register those urls.
if I have a string in java which is supposed to be a good url
what's the best way to validate it ?
thanks a lot

use UrlValidator from the Apache Commons library. Binary package: http://www.mirrorservice.org/sites/ftp.apache.org/commons/validator/binaries/commons-validator-1.3.1.zip (zip contains .jar files)
Example of usage (Construct a UrlValidator with valid schemes of "http", and "https"):
String[] schemes = {"http","https"}.
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
System.out.println("url is valid");
} else {
System.out.println("url is invalid");
}
prints "url is invalid"
If instead the default constructor is used.
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
System.out.println("url is valid");
} else {
System.out.println("url is invalid");
}
prints out "url is valid"

Those examples are hostnames. They're not valid URLs in themselves.
Hostnames are made of .-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.
You can match this with a pattern like (assuming case-insensitive):
([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.[a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])*\.?
However this matches strings like 1.2.3.4 as well, which although they technically could be host/domain names will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.

I think that new URL(yourString) will do the trick: it is supposed to raise MalformedURLException if url is not compliant (actually on java api it says If the string specifies an unknown protocol, but you can try it anyway):
try
{
new URL(string);
} catch (MalformedURLException e) {
//do whatever
}

I also believe you can use the URL in java.net
URL url = new URL("www.google.com");
The api says
public URL(String spec) throws MalformedURLException
Parameters:
spec - the String to parse as a URL.
Throws:
MalformedURLException - If the string specifies an unknown protocol.
So an exception is thrown if the URL is invalid.

You can do this kind of "url validation" through Regular Expressions.
And here is where you can get some good URL regex's (so you don't have to write your own).

Related

Documenting JSON in URL not possible

In my Rest API it should be possible to retrieve data which is inside a bounding box. Because the bounding box has four coordinates I want to design the GET requests in such way, that they accept the bounding box as JSON. Therefore I need to be able to send and document JSON strings as URL parameter.
The test itself works, but I can not document these requests with Spring RestDocs (1.0.0.RC1). I reproduced the problem with a simpler method. See below:
#Test public void ping_username() throws Exception
{
String query = "name={\"user\":\"Müller\"}";
String encodedQuery = URLEncoder.encode(query, "UTF-8");
mockMvc.perform(get(URI.create("/ping?" + encodedQuery)))
.andExpect(status().isOk())
.andDo(document("ping_username"));
}
When I remove .andDo(document("ping_username")) the test passes.
Stacktrace:
java.lang.IllegalArgumentException: Illegal character in query at index 32: http://localhost:8080/ping?name={"user":"Müller"}
at java.net.URI.create(URI.java:852)
at org.springframework.restdocs.mockmvc.MockMvcOperationRequestFactory.createOperationRequest(MockMvcOperationRequestFactory.java:79)
at org.springframework.restdocs.mockmvc.RestDocumentationResultHandler.handle(RestDocumentationResultHandler.java:93)
at org.springframework.test.web.servlet.MockMvc$1.andDo(MockMvc.java:158)
at application.rest.RestApiTest.ping_username(RestApiTest.java:65)
After I received the suggestion to encode the URL I tried it, but the problem remains.
The String which is used to create the URI in my test is now /ping?name%3D%7B%22user%22%3A%22M%C3%BCller%22%7D.
I checked the class MockMvcOperationRequestFactory which appears in the stacktrace, and in line 79 the following code is executed:
URI.create(getRequestUri(mockRequest)
+ (StringUtils.hasText(queryString) ? "?" + queryString : ""))
The problem here is that a not encoded String is used (in my case http://localhost:8080/ping?name={"user":"Müller"}) and the creation of the URI fails.
Remark:
Andy Wilkinson's answer is the solution for the problem. Although I think that David Sinfield is right and JSONs should be avoided in the URL to keep it simple. For my bounding box I will use a comma separated string, as it is used in WMS 1.1: BBOX=x1,y1,x2,y2
You haven't mentioned the version of Spring REST Docs that you're using, but I would guess that the problem is with URIUtil. I can't tell for certain as I can't see where URIUtil is from.
Anyway, using the JDK's URLEncoder works for me with Spring REST Docs 1.0.0.RC1:
String query = "name={\"user\":\"Müller\"}";
String encodedQuery = URLEncoder.encode(query, "UTF-8");
mockMvc.perform(get(URI.create("/baz?" + encodedQuery)))
.andExpect(status().isOk())
.andDo(document("ping_username"));
You can then use URLDecoder.decode on the server side to get the original JSON:
URLDecoder.decode(request.getQueryString(), "UTF-8")
The problem is that URIs have to be encoded as ACII. And ü is not a valid ASCII character, so it must be escaped in the url with % escaping.
If you are using Tomcat, you can use URIEncoding="UTF-8" in the Connector element of the server.xml, to configure UTF-8 escaping as default. If you do this, ü will be automatically converted to %C3%BC, which is the ASCII representation of the \uC3BC Unicode code-point (which represents ü).
Edit: It seems that I have missed the exact point of the error, but it is still the same error. Curly braces are invalid in a URI. Only the following characters are acceptable according to RFC 3986:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]#!$&'()*+,;=%
So these must be escaped too.

Play framework WS url spaces

I have a problem calling WS.url() in play framework 2.3.3 with url containing spaces. All other characters all url encoded automatically but not spaces. When i try to change all spaces to "%20", WS convert it to "%2520" because of "%" character. With spaces i've got java.net.URISyntaxException: Illegal character in query. How can i handle this ?
part of the URL's query String:
&input=/mnt/mp3/music/folder/01 - 23.mp3
The code looks like this:
Promise<JsonNode> jsonPromise = WS.url(url).setAuth("", "cube", WSAuthScheme.BASIC).get().map(
new Function<WSResponse, JsonNode>() {
public JsonNode apply(WSResponse response) {
System.out.println(response.getBody());
JsonNode json = response.asJson();
return json;
}
}
);
You should "build" your URL based on the way java.net.URL(which Play! uses for it's WS) does it. WS.url() follows the same logic.
The use of URLEncoder/Decoder is recommended only for form data.
From JavaDoc:
"Note, the java.net.URI class does perform escaping of its component
fields in certain circumstances. The recommended way to manage the
encoding and decoding of URLs is to use java.net.URI, and to convert
between these two classes using toURI() and URI.toURL(). The
URLEncoder and URLDecoder classes can also be used, but only for HTML
form encoding, which is not the same as the encoding scheme defined
in RFC2396."
So, the solution is to use THIS:
WS.url(baseURL).setQueryString(yourQueryString);
Where:
baseURL is your scheme + host + path etc.
yourQueryString is... well, your query String, but WITHOUT the ?: input=/mnt/mp3/music/folder/01 - 23.mp3
Or, if you want to use a more flexible, programmatic approach, THIS:
WS.url(baseURL).setQueryParameter(param, value);
Where:
param is the parameter's name in the query String
value is the value of the parameter
If you want multiple parameters with values in your query you need to chain them by adding another .setQueryParameter(...). This implies that this approach is not very accomodating for complex, multi-parameter query Strings.
Cheers!
If you check the console you will find that the exception is : java.net.URISyntaxException: Illegal character in path at index ...
That's because play Java api uses java.net.URL (as you can see here in line 47).
You can use java.net.URLEncoder to encode your URL
WS.url("http://" + java.net.URLEncoder.encode("google.com/test me", "UTF-8"))
UPDATE
If you want an RFC 2396 compliant method you can do this :
java.net.URI u = new java.net.URI(null, null, "http://google.com/test me",null);
System.out.println("encoded url " + u.toASCIIString());

Get a specific string from a url

How to get the ip address/address from the url string using substring in java.
http://abc.com:8080/abc/abc?abc=abc
I want to show the output abc.com from the above url. how can I extract this from the url.
Below is my code, I have retrieved, but is it a good way?
String a = servlet.substring(servlet.indexOf(":")+1);
String b = a.substring(2,a.indexOf(":"));
System.out.println(a);
System.out.println(b);
String c = servlet.replace(b, "192.168.0.1");
System.out.println(c);
Use URL class:
URL url = new URL("http://abc.com:8080/abc/abc?abc=abc");
System.out.println(url.getHost());
Why not use the existing URL class and call getHost() ?
Gets the host name of this URL, if applicable. The format of the host
conforms to RFC 2732, i.e. for a literal IPv6 address, this method
will return the IPv6 address enclosed in square brackets ('[' and
']').
Note the other useful methods on this (getPort() etc.). It's worth using these existing utility classes rather than roll your own solution. It looks a simple solution but the existing utilities will cater for all the edge cases.

URL validation Without protocol

I Have used URLValidator class in java to validate URL . But I Want that if user won't give any protocol in URL then also the validation should be returned as valid.
Explaned Correctly: If this is supplied in URL "http://www.google.com" then also it should be a valid URL and if "www.google.com" is supplied then also the validation should returned as valid URL.
I have tried a lot .Please Help me in this.
Thanks In Advance.
check if that works for you:
boolean foundMatch = false;
try {
Pattern regex = Pattern.compile("\\b(?:(https?|ftp|file)://|www\\.)?[-A-Z0-9+&#/%?=~_|$!:,.;]*[A-Z0-9+&##/%=~_|$]\\.[-A-Z0-9+&##/%?=~_|$!:,.;]*[A-Z0-9+&##/%=~_|$]", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.matches();
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
The best thing to do would be to prepend http (because that is the default protocol for urls) if there is no protocol and then validate the url.
You requirement is very specific and cannot be generalized. The URLValidator you are using validates correctly. The presence of a protocol string (like http://) is basic requirement for a URL to be valid. For your needs, you can do following
Create a custom URL validator (either by extending URLValidator or use composition).
Inside its validate method, check if the URL has protocol string in the beginning or not. If its not present add it and invoke the URLValidator to actually validate the URL.

How to encode URL to avoid special characters in Java? [duplicate]

This question already has answers here:
HTTP URL Address Encoding in Java
(24 answers)
Closed 5 years ago.
i need java code to encode URL to avoid special characters such as spaces and % and & ...etc
URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".
RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).
In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).
I also spent quite some time with this issue, so that's my solution:
String urlString2Decode = "http://www.test.com/äüö/path with blanks/";
String decodedURL = URLDecoder.decode(urlString2Decode, "UTF-8");
URL url = new URL(decodedURL);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String decodedURLAsString = uri.toASCIIString();
If you don't want to do it manually use Apache Commons - Codec library. The class you are looking at is: org.apache.commons.codec.net.URLCodec
String final url = "http://www.google.com?...."
String final urlSafe = org.apache.commons.codec.net.URLCodec.encode(url);
Here is my solution which is pretty easy:
Instead of encoding the url itself i encoded the parameters that I was passing because the parameter was user input and the user could input any unexpected string of special characters so this worked for me fine :)
String review="User input"; /*USER INPUT AS STRING THAT WILL BE PASSED AS PARAMTER TO URL*/
try {
review = URLEncoder.encode(review,"utf-8");
review = review.replace(" " , "+");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
String URL = "www.test.com/test.php"+"?user_review="+review;
I would echo what Wyzard wrote but add that:
for query parameters, HTML encoding is often exactly what the server is expecting; outside these, it is correct that URLEncoder should not be used
the most recent URI spec is RFC 3986, so you should refer to that as a primary source
I wrote a blog post a while back about this subject: Java: safe character handling and URL building

Categories

Resources