Java - Convert String to valid URI object

I am trying to get a java.net.URI object from a String. The string has some characters which will need to be replaced by their percentage escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / are replaced with their escape sequences.
How can I get a valid encoded URL from a String object?
http://www.google.com?q=a b gives http%3A%2F%2Fwww.google.com... whereas I want the output to be http://www.google.com?q=a%20b
Can someone please tell me how to achieve this.
I am trying to do this in an Android app. So I have access to a limited number of libraries.

You might try: org.apache.commons.httpclient.util.URIUtil.encodeQuery in Apache commons-httpclient project
Like this (see URIUtil):
URIUtil.encodeQuery("http://www.google.com?q=a b")
will become:
http://www.google.com?q=a%20b
You can of course do it yourself, but URI parsing can get pretty messy...

Android has always had the Uri class as part of the SDK:
http://developer.android.com/reference/android/net/Uri.html
You can simply do something like:
String requestURL = String.format("http://www.example.com/?a=%s&b=%s", Uri.encode("foo bar"), Uri.encode("100% fubar'd"));

I'm going to add one suggestion here aimed at Android users. You can do this which avoids having to get any external libraries. Also, all the search/replace characters solutions suggested in some of the answers above are perilous and should be avoided.
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.
The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
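A minimal sketch of the approach described above, wrapped in a reusable method. The class and method names (UrlComponentEncoder, toEncodedUri) are made up for illustration; only java.net.URL and java.net.URI are used. Note that the multi-argument URI constructor always quotes a literal %, so this sketch assumes the input is not already percent-encoded.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlComponentEncoder {

    /** Splits a raw URL string into components and lets URI escape each one. */
    static URI toEncodedUri(String raw) {
        try {
            // URL only splits the string into components; it does not escape anything.
            URL url = new URL(raw);
            // The seven-argument constructor quotes illegal characters per component.
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(),
                    url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                toEncodedUri("http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4"));
    }
}
```

The checked exceptions are wrapped in an IllegalArgumentException only to keep the call site short; rethrowing them unchanged would work just as well.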

Even if this is an old post with an already accepted answer, I post my alternative answer because it works well for the present issue and it seems nobody mentioned this method.
With the java.net.URI library:
URI uri = URI.create(URLString);
And if you want a URL-formatted string corresponding to it:
String validURLString = uri.toASCIIString();
Unlike many other methods (e.g. java.net.URLEncoder), this one escapes only non-ASCII characters (like ç, é...) and leaves the URL's structural characters alone.
In the above example, if URLString is the following String:
"http://www.domain.com/façon+word"
the resulting validURLString will be:
"http://www.domain.com/fa%C3%A7on+word"
which is a well-formatted URL.
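A small sketch of what this method does and does not handle, assuming only java.net.URI. The class name UriCreateDemo and its helpers are hypothetical. It suggests that toASCIIString() escapes non-ASCII characters, while URI.create() itself rejects characters such as a space, so it does not cover the original "a b" example on its own.

```java
import java.net.URI;

final class UriCreateDemo {

    /** Parses the string as a URI and escapes any non-ASCII characters. */
    static String toAscii(String s) {
        return URI.create(s).toASCIIString();
    }

    /** Returns true if URI.create refuses the string (e.g. because it contains a space). */
    static boolean isRejected(String s) {
        try {
            URI.create(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // \u00E7 is the letter c with cedilla, written as an escape to avoid source-encoding issues.
        System.out.println(toAscii("http://www.domain.com/fa\u00E7on+word"));
        System.out.println(isRejected("http://www.google.com?q=a b"));
    }
}
```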

If you don't like libraries, how about this?
Note that you should not use this function on the whole URL, instead you should use this on the components...e.g. just the "a b" component, as you build up the URL - otherwise the computer won't know what characters are supposed to have a special meaning and which ones are supposed to have a literal meaning.
/** Converts a string into something you can safely insert into a URL. */
public static String encodeURIcomponent(String s)
{
StringBuilder o = new StringBuilder();
// Work on the UTF-8 bytes so that non-ASCII characters become valid %XX sequences
for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
int ch = b & 0xFF;
if (isUnsafe(ch)) {
o.append('%');
o.append(toHex(ch / 16));
o.append(toHex(ch % 16));
}
else o.append((char) ch);
}
return o.toString();
}
private static char toHex(int ch)
{
return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}
private static boolean isUnsafe(int ch)
{
// Control characters, DEL and every non-ASCII byte must be escaped
if (ch > 126 || ch < 32)
return true;
return " %$&+,/:;=?#<>".indexOf(ch) >= 0;
}

You can use the multi-argument constructors of the URI class. From the URI javadoc:
The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.
So if you use
URI uri = new URI("http", "www.google.com?q=a b", null);
Then you get http:www.google.com?q=a%20b which isn't quite right, but it's a little closer.
If you know that your string will not have URL fragments (e.g. http://example.com/page#anchor), then you can use the following code to get what you want:
String s = "http://www.google.com?q=a b";
String[] parts = s.split(":",2);
URI uri = new URI(parts[0], parts[1], null);
To be safe, you should scan the string for # characters, but this should get you started.
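The split-and-rebuild idea above, including the suggested scan for a # character, could be sketched like this. The class and method names (SchemeSplitEncoder, encode) are assumptions for illustration, and the sketch assumes the input is not already percent-encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SchemeSplitEncoder {

    /** Splits off the scheme and an optional fragment, then lets URI quote the rest. */
    static URI encode(String s) {
        String fragment = null;
        int hash = s.indexOf('#');
        if (hash >= 0) {
            // Everything after the first '#' is treated as the fragment.
            fragment = s.substring(hash + 1);
            s = s.substring(0, hash);
        }
        String[] parts = s.split(":", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("No scheme found in: " + s);
        }
        try {
            // (scheme, scheme-specific part, fragment): illegal characters are quoted.
            return new URI(parts[0], parts[1], fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("http://www.google.com?q=a b"));
    }
}
```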

I had similar problems for one of my projects to create a URI object from a string. I couldn't find any clean solution either. Here's what I came up with :
public static URI encodeURL(String url) throws MalformedURLException, URISyntaxException
{
URI uriFormatted = null;
URL urlLink = new URL(url);
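// Reuse the scheme of the input instead of hard-coding "http", so https URLs survive
uriFormatted = new URI(urlLink.getProtocol(), urlLink.getHost(), urlLink.getPath(), urlLink.getQuery(), urlLink.getRef());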
return uriFormatted;
}
You can use the following URI constructor instead to specify a port if needed:
URI uri = new URI(scheme, userInfo, host, port, path, query, fragment);

Well I tried using
String converted = URLDecoder.decode("toconvert","UTF-8");
I hope this is what you were actually looking for?

The java.net blog had a class the other day that might have done what you want (but it is down right now so I cannot check).
This code here could probably be modified to do what you want:
http://svn.apache.org/repos/asf/incubator/shindig/trunk/java/common/src/main/java/org/apache/shindig/common/uri/UriBuilder.java
Here is the one I was thinking of from java.net: https://urlencodedquerystring.dev.java.net/

Or perhaps you could use this class:
http://developer.android.com/reference/java/net/URLEncoder.html
Which is present in Android since API level 1.
Annoyingly however, it treats spaces specially (replacing them with + instead of %20). To get round this we simply use this fragment:
URLEncoder.encode(value, "UTF-8").replace("+", "%20");
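As a sketch, the fragment above can be wrapped in a helper for encoding a single component (never the whole URL). The names ComponentEncoder and encodeComponent are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ComponentEncoder {

    /** Form-encodes one component and turns the '+' used for spaces into %20. */
    static String encodeComponent(String value) {
        try {
            // A literal '+' in the input is already encoded as %2B at this point,
            // so every remaining '+' stands for a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("http://www.google.com?q=" + encodeComponent("a b"));
    }
}
```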

I ended up using the httpclient-4.3.6:
import org.apache.http.client.utils.URIBuilder;
public static void main (String [] args) {
URIBuilder uri = new URIBuilder();
uri.setScheme("http")
.setHost("www.example.com")
.setPath("/somepage.php")
.setParameter("username", "Hello Günter")
.setParameter("p1", "parameter 1");
System.out.println(uri.toString());
}
Output will be:
http://www.example.com/somepage.php?username=Hello+G%C3%BCnter&p1=parameter+1

Related

How to include a ? in Java 11 HTTP Client URL? [duplicate]

Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
When you're still not on Java 10 or newer, then use StandardCharsets.UTF_8.toString() as charset argument, or when you're still not on Java 7 or newer, then use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).
Also note that there are three encode() methods: one without a second argument, one with a Charset as second argument, and one with a String as second argument which throws a checked exception. The one without a charset argument is deprecated. Never use it; always specify the charset. The javadoc even explicitly recommends using the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
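To illustrate encoding each parameter name and value separately while leaving the separators alone, here is a minimal sketch. The class QueryBuilder and its method names are invented for this example, and it uses the String-charset overload so that it also compiles on older Java versions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryBuilder {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** Appends encoded name=value pairs; the separators ?, & and = are added unencoded. */
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = baseUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "random word \u00A3500 bank $"); // \u00A3 is the pound sign
        System.out.println(withQuery("http://example.com/query", params));
    }
}
```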
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs) and inefficient (it uses a StringBuffer instead of a StringBuilder and does a couple of other things that are slow), it's also way too easy to screw up.
Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient.
The reason is that you have to escape the query parameter's name (i.e. q in BalusC's answer) differently than the parameter value.
The only downside to the above (that I found out painfully) is that URLs are not a true subset of URIs.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank $");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
Guava 15 has now added a set of straightforward URL escapers.
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
Here are some examples that will also work properly
{
"in" : "http://نامه‌ای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
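The advice above about checking whether a URL is already encoded matters because the multi-argument URI constructor always quotes %. A rough sketch, with hypothetical names (AsciiUrlEncoder, toAsciiUrl, looksPercentEncoded) and a deliberately naive check for existing escapes:

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class AsciiUrlEncoder {

    /** Encodes the structural parts separately and returns a pure-ASCII URL string. */
    static String toAsciiUrl(String raw) {
        try {
            URL url = new URL(raw);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + raw, e);
        }
    }

    /** Naive heuristic: does the string contain at least one %XX escape? */
    static boolean looksPercentEncoded(String s) {
        return s.matches(".*%[0-9A-Fa-f]{2}.*");
    }

    public static void main(String[] args) {
        System.out.println(toAsciiUrl("http://example.com/a b"));   // space gets escaped
        System.out.println(toAsciiUrl("http://example.com/a%20b")); // the % itself gets escaped again
    }
}
```

The heuristic cannot tell an escape from a literal percent sign followed by two hex digits, so treat it as a hint rather than a guarantee.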
Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}
In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is an android.net.Uri
In my case I just needed to pass the whole URL and encode only the value of each parameter.
I didn't find common code to do that, so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);

ISO-8859-1 to UTF-8 only in URL, only invalid characters

Problem: sometimes we are getting links/phrases with invalid(for us) encoding.
Examples and my first solution below
Description:
I have to fix invalidly encoded strings in one part of the application. Sometimes it is a word or phrase, but sometimes also a URL. When it's a URL I would like to change only the wrongly encoded characters. If I decode with ISO-8859-1 and encode to UTF-8, the special URL characters are also encoded (/ : ? = &). I coded a solution which works just fine for my cases, but those hashes you will see below smell bad to me.
Have you had a similar problem, or do you know a library which allows decoding a phrase except for some characters? Something like this:
decode(String value, char[] ignored)
I also thought about breaking the URL into pieces and fixing only the path and query, but it would be even more of a mess with parsing them etc.
TLDR: Decode an ISO-8859-1 encoded URL and encode it to UTF-8. Don't touch URL-specific characters (/ ? = : &)
Input/Output examples:
// wrong input
"http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4"
"t%E9l%E9phone"
// good output
"http://some.url/xxx/a/%C3%A4t%C3%BCr%C3%A4/b/%C3%A4t%C3%BCr%C3%A4"
"t%C3%A9l%C3%A9phone"
// very wrong output
"http%3A%2F%2Fsome.url%2Fxxx%2Fa%2F%C3%A4t%C3%BCr%C3%A4%2Fb%2F%C3%A4t%C3%BCr%C3%A4"
My first solution:
class EncodingFixer {
private static final String SLASH_HASH = UUID.randomUUID().toString();
private static final String QUESTION_HASH = UUID.randomUUID().toString();
private static final String EQUALS_HASH = UUID.randomUUID().toString();
private static final String AND_HASH = UUID.randomUUID().toString();
private static final String COLON_HASH = UUID.randomUUID().toString();
EncodingFixer() {
}
String fix(String value) {
if (isBlank(value)) {
return value;
}
return tryFix(value);
}
private String tryFix(String str) {
try {
String replaced = replaceWithHashes(str);
String fixed = java.net.URLEncoder.encode(java.net.URLDecoder.decode(replaced, ISO_8859_1), UTF_8);
return replaceBack(fixed);
} catch (Exception e) {
return str;
}
}
private String replaceWithHashes(String str) {
return str
.replaceAll("/", SLASH_HASH)
.replaceAll("\\?", QUESTION_HASH)
.replaceAll("=", EQUALS_HASH)
.replaceAll("&", AND_HASH)
.replaceAll(":", COLON_HASH);
}
private String replaceBack(String fixed) {
return fixed
.replaceAll(SLASH_HASH, "/")
.replaceAll(QUESTION_HASH, "?")
.replaceAll(EQUALS_HASH, "=")
.replaceAll(AND_HASH, "&")
.replaceAll(COLON_HASH, ":");
}
}
Or it should be more like: ???
Check if input is an URL
Create URL
Get path
Split by /
Fix every part
Put it back together
Same for query but little more complicated
??
I also thought about it but it seems even more messy than those replaceAlls above :/
If you are able to clearly recognize that a string is a URL, then following user @jschnasse's answer to a similar question on SO, this might be the solution you need:
URL url= new URL("http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString();
System.out.println(correctEncodedURL);
outputs:
http://some.url/xxx/a/%25e4t%25fcr%25E4/b/%25e4t%25fcr%25E4
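The output above shows the existing escapes being escaped again rather than converted. A different technique, offered here only as a sketch, is to rewrite just the %XX escapes themselves and leave every other character untouched. The class name Latin1EscapeFixer is made up, and the sketch assumes every escape at or above %80 really is an ISO-8859-1 byte; input that is already UTF-8 encoded would be mangled.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Latin1EscapeFixer {

    private static final Pattern ESCAPE = Pattern.compile("%([0-9A-Fa-f]{2})");

    /** Re-encodes ISO-8859-1 percent escapes as UTF-8 escapes; all other text is copied as is. */
    static String fix(String value) {
        Matcher m = ESCAPE.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());
            int b = Integer.parseInt(m.group(1), 16);
            if (b < 0x80) {
                // ASCII escapes mean the same thing in both encodings.
                out.append(m.group());
            } else {
                // Read the byte as an ISO-8859-1 character, then escape its UTF-8 bytes.
                for (byte u : String.valueOf((char) b).getBytes(StandardCharsets.UTF_8)) {
                    out.append(String.format("%%%02X", u & 0xFF));
                }
            }
            last = m.end();
        }
        return out.append(value, last, value.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4"));
        System.out.println(fix("t%E9l%E9phone"));
    }
}
```

Because /, ?, =, : and & are never matched by the pattern, no placeholder substitution is needed.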

java.net.MalformedURLException: no protocol on URL based on a string modified with URLEncoder

So I was attempting to use this String in a URL :-
http://site-test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
In this code: -
String fileToDownloadLocation = //The above string
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
But at this point I get the error: -
java.net.URISyntaxException: Illegal character in query at index 169:Blahblahblah
I realised with a bit of googling this was due to the characters in the URL (guessing the &), so I then added in some code so it now looks like so: -
String fileToDownloadLocation = //The above string
fileToDownloadLocation = URLEncoder.encode(fileToDownloadLocation, "UTF-8");
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
However, when I try and run this I get an error when I try and create the URL, the error then reads: -
java.net.MalformedURLException: no protocol: http%3A%2F%2Fsite-test.testsite.com%2FMeetings%2FIC%2FDownloadDocument%3FmeetingId%3Dc21c905c-8359-4bd6-b864-844709e05754%26itemId%3Da4b724d1-282e-4b36-9d16-d619a807ba67%26file%3D%5C%5Cs604132shvw140%5CTest-Documents%5Cc21c905c-8359-4bd6-b864-844709e05754_attachments%5C7e89c3cb-ce53-4a04-a9ee-1a584e157987%5CmyDoc.pdf
It looks like I can't do the encoding until after I've created the URL else it replaces slashes and things which it shouldn't, but I can't see how I can create the URL with the string and then format it so its suitable for use. I'm not particularly familiar with all this and was hoping someone might be able to point out to me what I'm missing to get string A into a suitably formatted URL to then use with the correct characters replaced?
Any suggestions greatly appreciated!
You need to encode your parameter's values before concatenating them to URL.
The backslash \ is a special character which has to be escaped as %5C
Escaping example:
String paramValue = "param\\with\\backslash";
String yourURLStr = "http://host.com?param=" + java.net.URLEncoder.encode(paramValue, "UTF-8");
java.net.URL url = new java.net.URL(yourURLStr);
The result is http://host.com?param=param%5Cwith%5Cbackslash which is properly formatted url string.
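A short sketch contrasting the two orders of operation, using only java.net classes. The class ParamEncodingDemo and its helper names are hypothetical. It suggests why encoding the whole string leads to the "no protocol" error while encoding only the value does not.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;

final class ParamEncodingDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** Encodes only the parameter value before concatenating it to the URL. */
    static String buildUrl(String base, String name, String rawValue) {
        return base + "?" + name + "=" + encode(rawValue);
    }

    /** Returns true if java.net.URL accepts the string. */
    static boolean parses(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = buildUrl("http://host.com", "param", "param\\with\\backslash");
        System.out.println(good + " parses: " + parses(good));
        // Encoding everything also escapes "://", so no protocol can be found any more.
        String bad = encode("http://host.com?param=x");
        System.out.println(bad + " parses: " + parses(bad));
    }
}
```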
I had the same problem. I read the URL from a properties file:
String configFile = System.getenv("system.Environment");
if (configFile == null || "".equalsIgnoreCase(configFile.trim())) {
configFile = "dev.properties";
}
// Load properties
Properties properties = new Properties();
properties.load(getClass().getResourceAsStream("/" + configFile));
//read url from file
apiUrl = properties.getProperty("url").trim();
URL url = new URL(apiUrl);
//throw exception here
URLConnection conn = url.openConnection();
dev.properties
url = "https://myDevServer.com/dev/api/gate"
it should be
dev.properties
url = https://myDevServer.com/dev/api/gate
without "" and my problem is solved.
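The effect of the quotes can be reproduced without a file, as in this sketch. The class QuotedPropertyDemo and the stripQuotes helper are assumptions for illustration; fixing the properties file itself, as described above, remains the simpler remedy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;

final class QuotedPropertyDemo {

    /** Loads properties from text and returns the "url" entry exactly as stored. */
    static String readUrlProperty(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("url").trim();
    }

    /** Hypothetical defensive helper: drops one pair of surrounding double quotes. */
    static String stripQuotes(String v) {
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    static boolean parses(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Properties keeps the quotes as part of the value.
        String value = readUrlProperty("url = \"https://myDevServer.com/dev/api/gate\"");
        System.out.println(value + " parses: " + parses(value));
        System.out.println(stripQuotes(value) + " parses: " + parses(stripQuotes(value)));
    }
}
```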
According to the Oracle documentation, MalformedURLException is:
Thrown to indicate that a malformed URL has occurred. Either no legal protocol could be found in a specification string or the string could not be parsed.
So the exception means that the string itself could not be parsed as a URL.
You want to use URI templates. Look carefully at the README of this project: URLEncoder.encode() does NOT work for URIs.
Let us take your original URL:
http://site-test.test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
and convert it to a URI template with three variables (on multiple lines for clarity):
http://site-test.test.com/Meetings/IC/DownloadDocument
?meetingId={meetingID}&itemId={itemID}&file={file}
Now let us build a variable map with these three variables using the library mentioned in the link:
final VariableMap vars = VariableMap.newBuilder()
.addScalarValue("meetingID", "c21c905c-8359-4bd6-b864-844709e05754")
.addScalarValue("itemID", "a4b724d1-282e-4b36-9d16-d619a807ba67")
.addScalarValue("file", "\\\\s604132shvw140\\Test-Documents"
+ "\\c21c905c-8359-4bd6-b864-844709e05754_attachments"
+ "\\7e89c3cb-ce53-4a04-a9ee-1a584e157987\\myDoc.pdf")
.build();
final URITemplate template
= new URITemplate("http://site-test.test.com/Meetings/IC/DownloadDocument"
+ "?meetingId={meetingID}&itemId={itemID}&file={file}");
// Generate URL as a String
final String theURL = template.expand(vars);
This is GUARANTEED to return a fully functional URL!
Thanks to Erhun's answer I finally realised that my JSON mapper was returning the quotation marks around my data too! I needed to use "asText()" instead of "toString()"
It's not an uncommon issue - one's brain doesn't see anything wrong with the correct data, surrounded by quotes!
discoveryJson.path("some_endpoint").toString();
"https://what.the.com/heck"
discoveryJson.path("some_endpoint").asText();
https://what.the.com/heck
This code worked for me
public static void main(String[] args) {
try {
java.net.URL url = new java.net.URL("http://path");
System.out.println("Instantiated new URL: " + url);
}
catch (MalformedURLException e) {
e.printStackTrace();
}
}
Instantiated new URL: http://path
Very simple fix
String encodedURL = UriUtils.encodePath(request.getUrl(), "UTF-8");
Works no extra functionality needed.

how to avoid recursive url-decoding

I am writing something like an image proxy where I receive URLs from my site front-end, and I download images , re-size them, and return smaller images for the front end and client to download from the "proxy".
This means I need to take care of all sorts of URL patterns, which is why I chose to decode the given URL and then re-encode it using URIUtil:
private String fixUrl(String fromUrl) throws URIException {
fromUrl = URIUtil.decode(fromUrl);
fromUrl = URIUtil.encodeQuery(fromUrl);
return fromUrl;
}
This should help me take care of urls that are already encoded.
My problem is that some of the URLs are double encoded, and from what I saw, URIUtil.decode performs a recursive decode, which means that for double-encoded URLs I will get a bad URL that does not work.
Is there a simple way to decode only once?
I'd try to check that URL still contains character %. If it does not contain any % it is not encoded and you can stop your decoding procedure.
Easiest option I know is to use the built-in
java.net.URLDecoder.decode
In order to decode automatically only once.
If like me you have the case that sometimes URL's are double / triple encoded - you can use this recursive function in order to decode again and again until there are no "%" or "+" :
private static String completeDecode(String url) {
if(url.contains("%") || url.contains("+"))
{
try
{
return(completeDecode(java.net.URLDecoder.decode(url, "UTF-8")));
}
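catch (UnsupportedEncodingException | IllegalArgumentException e)
{
// IllegalArgumentException: a leftover "%" that is not a valid escape cannot be decoded further
e.printStackTrace();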
}
}
return url;
}
Cheers
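For comparison, here is a sketch of both behaviours using only java.net.URLDecoder: a single decode pass, and a bounded loop that stops once the string no longer changes. The class SingleDecode, its method names and the maxRounds parameter are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class SingleDecode {

    /** Decodes exactly one level of percent-encoding. */
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** Decodes repeatedly, but stops when nothing changes or an escape is invalid. */
    static String decodeUntilStable(String s, int maxRounds) {
        String current = s;
        for (int i = 0; i < maxRounds; i++) {
            String next;
            try {
                next = decodeOnce(current);
            } catch (IllegalArgumentException e) {
                // A stray "%" that is not a valid escape: keep what we have.
                return current;
            }
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("a%2520b"));           // one level removed
        System.out.println(decodeUntilStable("a%2520b", 5)); // all levels removed
    }
}
```

Comparing the result with the input avoids relying on the presence of % or + alone, since a fully decoded string may legitimately contain a literal percent sign.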

how to check protocol present in url or not?

How can I check whether a protocol is present in a URL and, if it is not, prepend one?
Is there any class to achieve this in Java?
E.g. String URL = "www.google.com"
should become http://www.google.com
Just use String.startsWith("http://") to check this.
public String ensure_has_protocol(final String a_url)
{
if (!a_url.startsWith("http://"))
{
return "http://" + a_url;
}
return a_url;
}
EDIT:
An alternative would use a java.net.URL instance, whose constructor would throw a java.net.MalformedURLException if the URL did not contain a (legal) protocol (or was invalid for any other reason):
public URL make_url(final String a_url) throws MalformedURLException
{
try
{
return new URL(a_url);
}
catch (final MalformedURLException e)
{
}
return new URL("http://" + a_url);
}
You can use URL.toString() to obtain the string representation of the URL. This is an improvement on the startsWith() approach, as it guarantees that the returned URL is valid.
Let's say you have String url = www.google.com. String class methods would be enough for the goal of checking protocol identifiers. For example, url.startsWith("https://") would check whether a specific string is starting with the given protocol name.
However, are these controls enough for validation?
I think they aren't enough. First of all, you should define a list of valid protocol identifiers, e.g. a String array like {"http", "ftp", "https", ...}. Then you can parse your input String with regex ("://") and test your URL header whether it belongs to the list of valid protocol identifiers. And domain name validation methods are beyond this question, you can/should handle it with different techniques as well.
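The idea of testing the prefix against a list of valid protocol identifiers could be sketched as follows. The class SchemeCheck, the allowed list and the default of http are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SchemeCheck {

    // A scheme is a letter followed by letters, digits, '+', '.' or '-', then "://".
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*://");
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("http", "https", "ftp"));

    /** Returns true if the string starts with one of the allowed protocols (case-insensitive). */
    static boolean hasAllowedScheme(String url) {
        Matcher m = SCHEME.matcher(url);
        if (!m.find()) {
            return false;
        }
        String scheme = url.substring(0, m.end() - 3).toLowerCase(Locale.ROOT);
        return ALLOWED.contains(scheme);
    }

    /** Prepends "http://" only when no scheme prefix of any kind is present. */
    static String ensureScheme(String url) {
        return SCHEME.matcher(url).find() ? url : "http://" + url;
    }

    public static void main(String[] args) {
        System.out.println(ensureScheme("www.google.com"));
        System.out.println(hasAllowedScheme("HTTP://www.google.com"));
    }
}
```

This only inspects the prefix; as noted above, validating the domain name is a separate problem.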
Just for completeness, I would do something like the following:
import com.google.common.base.Strings;
private static boolean isUrlHttps(String url){
if(Strings.isNullOrEmpty(url))
return false;
return url.toLowerCase().startsWith("https://");
}
