Illegal character in URL - java

I get the error "Illegal character in URL" in my code and I don't know why.
I have a token and a hash, both of type String.
String currentURL = "http://platform.shopyourway.com" +
"/products/get-by-tag?tagId=220431" +
"&token=" + token +
"&hash=" + hash;
HttpURLConnection urlConnection = null;
BufferedReader reader = null;
try {
URL url = new URL(currentURL);
urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.connect();
[...]
But when I write:
URL url = new URL("http://platform.shopyourway.com/products/get-by-tag?tagId=220431&token=0_11800_253402300799_1_a9c1d19702ed3a5e873fd3b3bcae6f8e3f8b845c9686418768291042ad5709f1&hash=e68e41e4ea4ed16f4dbfb32668ed02b080bf1f2cbee64c2692ef510e7f7dc26b");
it works. I can't hard-code the URL like this, though, because the hash and the token are generated anew every time, so I don't know them in advance.
Thanks.

According to the Oracle docs on creating URLs, you need to escape the values in your URL string:
URL addresses with Special characters
Some URL addresses contain special characters, for example the space
character. Like this:
http://example.com/hello world/ To make these characters legal they
need to be encoded before passing them to the URL constructor.
URL url = new URL("http://example.com/hello%20world");
Encoding the special character(s) in this example is easy as there is
only one character that needs encoding, but for URL addresses that
have several of these characters or if you are unsure when writing
your code what URL addresses you will need to access, you can use the
multi-argument constructors of the java.net.URI class to automatically
take care of the encoding for you.
URI uri = new URI("http", "example.com", "/hello world/", "");
And then convert the URI to a URL.
URL url = uri.toURL();
As noted in the comments, see also this other post, which uses URLEncoder to replace any offending characters.
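Applied to the URL from the question, the URLEncoder approach could look like the sketch below. It assumes the error comes from a character in the generated token or hash (for example a space, '+' or '=') that is not legal in a query string. The class name TokenUrlBuilder, its method names and the sample values are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TokenUrlBuilder {
    // Encodes a single query-string value as UTF-8 form data.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    // Builds the request URL from the question, encoding only the generated values.
    static String buildUrl(String token, String hash) {
        return "http://platform.shopyourway.com"
                + "/products/get-by-tag?tagId=220431"
                + "&token=" + encodeValue(token)
                + "&hash=" + encodeValue(hash);
    }

    public static void main(String[] args) {
        // Hypothetical values containing characters that must not appear raw in a query.
        System.out.println(buildUrl("a b", "c+d="));
    }
}
```

Only the values are encoded; the fixed part of the URL, including the ?, & and = separators, is left untouched.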

Related

How to avoid special character from URL using java

I am using the code below to remove the special characters from a URL:
String url1 = "https://dev/ABC/v1/XYZ?itemnumber%255Bin%255D=%255B3001%252C3005%252C202%255D&limit=2&apikey=4zVYEk2Xg8zvwYxNnW&offset=2";
String decodedURL = URLDecoder.decode(url1, "UTF-8");
System.out.println(decodedURL);
Expected output:
https://dev/ABC/v1/XYZ?itemnumber[in]=[3001,3005,20]&limit=2&offset=1&apikey=4zVYEk2Xg8zvwYxNnW
Actual output:
https://dev/ABC/v1/XYZ?itemnumber%5Bin%5D=%5B3001%2C3005%2C202%5D&limit=2&apikey=4zVYEk2Xg8zvwYxNnW&offset=1
Your string is URL-encoded twice, so it has to be decoded twice (see https://ideone.com/CQQbPz):
String url1 = "https://dev/ABC/v1/XYZ?itemnumber%255Bin%255D=%255B3001%252C3005%252C202%255D&limit=2&apikey=4zVYEk2Xg8zvwYxNnW&offset=2";
System.out.println(URLDecoder.decode(url1, "UTF-8"));
System.out.println(URLDecoder.decode(URLDecoder.decode(url1, "UTF-8"), "UTF-8"));
Output:
https://dev/ABC/v1/XYZ?itemnumber%5Bin%5D=%5B3001%2C3005%2C202%5D&limit=2&apikey=4zVYEk2Xg8zvwYxNnW&offset=2
https://dev/ABC/v1/XYZ?itemnumber[in]=[3001,3005,202]&limit=2&apikey=4zVYEk2Xg8zvwYxNnW&offset=2
Browsers and many other HTTP clients convert characters that are not legal in a URL into percent-encoding: a percent sign (%) followed by two hexadecimal digits. Decode the string before use:
String decoded = java.net.URLDecoder.decode(request, "UTF-8");
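The double decoding described above can be wrapped in a small helper. This is only a sketch: the class name RepeatedDecode and the method decodeTimes are made up for illustration, and the sample string is the query part of the URL from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class RepeatedDecode {
    // Applies URLDecoder.decode to the text the given number of times.
    static String decodeTimes(String text, int times) {
        String result = text;
        try {
            for (int i = 0; i < times; i++) {
                result = URLDecoder.decode(result, "UTF-8");
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "itemnumber%255Bin%255D=%255B3001%252C3005%252C202%255D";
        System.out.println(decodeTimes(value, 1)); // still percent-encoded once
        System.out.println(decodeTimes(value, 2)); // fully decoded
    }
}
```

Decoding a fixed number of times is deliberate: decoding until the string stops changing could corrupt data that legitimately contains a literal percent sign.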

why Encoding in http request?

I am trying to learn how to send requests to a server and retrieve data from it over HTTP in Java. This is the code I found in the Oracle tutorial on networking (the code is pasted at the bottom of the question).
Question 1: in out.write("string=" + stringToReverse); why isn't "string=" encoded like the stringToReverse variable?
String stringToReverse = URLEncoder.encode(args[1], "UTF-8");
Question 2:
There are two snippets below, one from the Oracle tutorial and one from an Android Studio tutorial.
Code from the Oracle tutorial:
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
Code from the Android tutorial:
inputStream = urlConnection.getInputStream();
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));
BufferedReader reader = new BufferedReader(inputStreamReader);
Why is Charset.forName("UTF-8") missing from the Oracle code?
Note: an explanation from the basics would be very helpful :)
import java.io.*;
import java.net.*;
public class Reverse {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: java Reverse "
+ "http://<location of your servlet/script>"
+ " string_to_reverse");
System.exit(1);
}
String stringToReverse = URLEncoder.encode(args[1], "UTF-8");
URL url = new URL(args[0]);
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
OutputStreamWriter out = new OutputStreamWriter(
connection.getOutputStream());
out.write("string=" + stringToReverse);
out.close();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
String decodedString;
while ((decodedString = in.readLine()) != null) {
System.out.println(decodedString);
}
in.close();
}
}
Question 1:
There is no need to encode "string=" (as it does not contain any special characters as explained in https://docs.oracle.com/javase/6/docs/api/java/net/URLEncoder.html)
Question 2:
The charset in the following example is not explicitly defined:
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
Therefore the default charset is used (which may not be UTF-8):
Every instance of the Java virtual machine has a default charset,
which may or may not be one of the standard charsets. The default
charset is determined during virtual-machine startup and typically
depends upon the locale and charset being used by the underlying
operating system. (https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html)
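To avoid depending on the default charset, the charset can be passed to InputStreamReader explicitly, as the Android snippet in the question does. The sketch below reads from an in-memory stream instead of a real connection so that it runs without a network; the class name ExplicitCharsetDemo and its helper method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Decodes the bytes as UTF-8 and returns the first line, whatever the JVM default charset is.
    static String firstLineAsUtf8(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a server would send; \u00e9 is a non-ASCII character.
        byte[] body = "h\u00e9llo\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLineAsUtf8(body));
    }
}
```

With a real connection, connection.getInputStream() would take the place of the ByteArrayInputStream, assuming the server actually sends UTF-8.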
In a URL, the string after ? is called the query string:
example.com/users/profile?key1=value1&key2=value2
For the URL above, the query string is "key1=value1&key2=value2".
A query string consists of key-value pairs that a server script can access. These pairs are called request parameters and are separated by &. Characters such as ?, & and space are called special characters in a URL because they are treated specially when the URL is processed.
So what happens if value1 itself contains an & character? The server will inadvertently end the value at the &, which in the example below is right after user1:
name=user1&23=hello&place=hyd
The example above will not work as expected.
That is why you use URL encoding to convert special characters such as &, ? and space to other, non-special characters when they appear in a query string. The server converts them back to their original form once the request is received.
Now, your question 1: strictly speaking, this is not URL encoding, because you are not sending string_to_reverse as a request parameter in the query string. As Jesper pointed out, you are sending it in the request body through the output stream.
Now question 2: the documentation of the URLEncoder class (http://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html) states:
Utility class for HTML form encoding. This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.
So HTML form data is posted as application/x-www-form-urlencoded, and in your case URLEncoder takes care of that. If no charset is specified, the default character set is used (see "How to Find the Default Charset/Encoding in Java?").
The name URLEncoder is a little misleading: here it is not used to encode a URL but to encode the request body (string_to_reverse) as application/x-www-form-urlencoded.
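The key-value encoding described above can be sketched as follows. The class name FormQueryBuilder and the method buildQuery are hypothetical, and the sample assumes that the intended value of name in the earlier example was "user1&23=hello".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormQueryBuilder {
    // Joins alternating keys and values into key=value pairs separated by '&',
    // encoding each key and value as application/x-www-form-urlencoded.
    static String buildQuery(String... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of arguments");
        }
        StringBuilder query = new StringBuilder();
        try {
            for (int i = 0; i < keysAndValues.length; i += 2) {
                if (query.length() > 0) {
                    query.append('&');
                }
                query.append(URLEncoder.encode(keysAndValues[i], "UTF-8"))
                     .append('=')
                     .append(URLEncoder.encode(keysAndValues[i + 1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // The '&' and '=' inside the first value are encoded, so the server sees two parameters.
        System.out.println(buildQuery("name", "user1&23=hello", "place", "hyd"));
    }
}
```

The same string can be used as a query string or written to the request body of a form POST, which is what the Oracle example does with "string=" + stringToReverse.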

OrientDB http request failed...?

I'm trying to send a POST request in Java. I build the command for OrientDB like this:
"http://xxxxxxxxxxx:2480/command/mydb/sql/CREATE VERTEX V SET name = ' datoAletarorio'"
I need to use the write and flush methods to send the command.
After running this code, my DB is still empty.
Where is my error? Here is my code:
//...
PrintWriter out = null;
//...
conexion = (HttpURLConnection) url.openConnection();
conexion.setDoOutput(true);
conexion.setRequestMethod("POST");
out = new PrintWriter(conexion.getOutputStream());
conexion.connect();
//...
String cumuloDatos1 = "http://xxxxxxxxxxx:2480/command/mydb/sql/CREATE VERTEX V SET name = ' datoAletarorio'";
out.write(cumuloDatos1);
out.flush();
//..
conexion.disconnect();
Thanks in advance.
The docs say:
The command-text can appear in either the URL or the content of the
POST transmission. Where the command-text is included in the URL, it
must be encoded as per normal URL encoding.
So you probably have to encode the URL before sending the request:
String cumuloDatos1 =
"http://xxxxxxxxxxx:2480/command/mydb/sql/" +
"CREATE%20VERTEX%20V%20SET%20name%20%3D%20%27%20datoAletarorio%27";
In any case, if the request isn't valid, you should see a 400 or similar error in the server logs.
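Instead of escaping the command by hand, the multi-argument java.net.URI constructor can quote the illegal characters in the path. The following is a sketch under assumptions: the class name OrientCommandUrl and its method are hypothetical, and localhost stands in for the elided host name. Note that this constructor only escapes characters that are illegal in a path, so the spaces become %20 while = and ' are left as they are.

```java
import java.net.URI;
import java.net.URISyntaxException;

class OrientCommandUrl {
    // Builds http://host:port/command/<database>/sql/<sql> with the path percent-encoded.
    static String build(String host, int port, String database, String sql) {
        String path = "/command/" + database + "/sql/" + sql;
        try {
            // Arguments: scheme, userInfo, host, port, path, query, fragment.
            return new URI("http", null, host, port, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "localhost" is a placeholder for the real server name.
        System.out.println(build("localhost", 2480, "mydb",
                "CREATE VERTEX V SET name = ' datoAletarorio'"));
    }
}
```

The resulting string is what should be passed to new URL(...) before opening the connection. According to the quoted documentation, the command text can alternatively be sent as the content of the POST.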

Java get filename of download from redirected 'friendly' url

I'm trying to download a file from a given URL which may or may not be a direct link to the file. Does anyone know how I can detect the filename to write to if the URL is an indirect link (e.g. http://www.example.com/download.php?getFile=1)?
If the URL is a direct link, it is no problem to extract the filename from the URL and start writing to that file. With a redirect link, however, the only method I have found so far is to write to an arbitrary filename (foo.txt) and then try to work with that. The problem is that I really need the filename (and extension) to be correct.
A sample of the code I am using follows (the section in the 'else' clause is neither finished nor working):
public static boolean dlFile(String URL, String dest){
try{
URL grab = new URL(URL);
ReadableByteChannel rbc = Channels.newChannel(grab.openStream());
String fnRE = ".*/([a-zA-Z0-9\\-\\._]+)$";
Pattern pattern = Pattern.compile(fnRE);
Matcher matcher = pattern.matcher(URL);
String fName = "";
if(matcher.find()) fName = matcher.group(1);
else { //filename cannot be extracted - do something here - the code below doesn't work, it raises MalformedURLException
URL foo = new URL(URL);
HttpURLConnection fooConnection = (HttpURLConnection) foo.openConnection();
URL secondFoo = new URL(fooConnection.getHeaderField("Location"));
System.out.println("Redirect URL: "+secondFoo);
fooConnection.setInstanceFollowRedirects(false);
URLConnection fooURL = secondFoo.openConnection();
}
System.out.println("Connection to "+URL+" established!");
if(!dest.endsWith("/")) dest+="/";
System.out.println("Writing "+fName+" to "+dest);
FileOutputStream fos = new FileOutputStream(dest+fName);
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
I am sure there must be a simple way to get the filename from the headers or something like that, but I cannot work out how. Thanks in advance.
Assuming the response has a "Location" header field, I was able to obtain the direct link to a url containing multiple redirects like this:
String location = "http://www.example.com/download.php?getFile=1";
HttpURLConnection connection = null;
for (;;) {
URL url = new URL(location);
connection = (HttpURLConnection) url.openConnection();
connection.setInstanceFollowRedirects(false);
String redirectLocation = connection.getHeaderField("Location");
if (redirectLocation == null) break;
location = redirectLocation;
}
//and finally:
String fileName = location.substring(location.lastIndexOf('/') + 1, location.length());
I think it's better to use the Java Jsoup library with the method below:
public static void downloadFileJsoup(String URL, String PATH) throws IOException {
Response res = Jsoup.connect(URL)
.userAgent("Mozilla")
.timeout(30000)
.followRedirects(true)
.ignoreContentType(true)
.maxBodySize(20000000)//Increase value if download is more than 20MB
.execute();
String remoteFilename=res.header("Content-Disposition").replaceFirst("(?i)^.*filename=\"?([^\"]+)\"?.*$", "$1");
String filename = PATH + remoteFilename;
FileOutputStream out = (new FileOutputStream(new java.io.File(filename)));
out.write( res.bodyAsBytes());
out.close();
}
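The method above fails with a NullPointerException when the server sends no Content-Disposition header. A null-safe way to extract the filename could look like the sketch below, which uses only the standard library. The class name ContentDispositionParser, the method name and the fallback name are hypothetical, and the extended filename*= form of the header is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentDispositionParser {
    // Matches filename=value or filename="value", stopping at a quote or a semicolon.
    private static final Pattern FILENAME =
            Pattern.compile("filename=\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the filename from a Content-Disposition header value,
    // or the fallback when the header is missing or has no filename parameter.
    static String fileNameOrDefault(String headerValue, String fallback) {
        if (headerValue == null) {
            return fallback;
        }
        Matcher matcher = FILENAME.matcher(headerValue);
        return matcher.find() ? matcher.group(1).trim() : fallback;
    }

    public static void main(String[] args) {
        System.out.println(fileNameOrDefault("attachment; filename=\"report.pdf\"", "download.bin"));
        System.out.println(fileNameOrDefault(null, "download.bin"));
    }
}
```

The header value itself could come from res.header("Content-Disposition") in the Jsoup code or from getHeaderField("Content-Disposition") on an HttpURLConnection.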
No, in general there is no way. The response normally doesn't contain that information, unless you control the server and add your own protocol information to the data stream.
Anyway, you are asking about the file name extension. The correct Content-Type may be all you need.

What is the proper way to escape a URL for URLConnection.getInputStream()?

I'm having a really bizarre problem with URLConnection.getInputStream() when I have a space (' ') in the query string portion of a URI. Specifically, I have one URL that works and another that does not, when I think they should both fail or both succeed. The behavior is the same every time.
Working URL: http://minneapolis.craigslist.ca/search/sss?catAbb=sss&query=iPhone+sprint&sort=date&srchType=A&format=rss
Failed URL (exception below) : http://winnipeg.craigslist.ca/search/sss?catAbb=sss&query=iPhone+sprint&sort=date&srchType=A&format=rss
conn.getInputStream() throws the IO exception: "Illegal character in query at index 67: http://winnipeg.en.craigslist.ca/search/sss?catAbb=sss&query=iPhone sprint two&sort=date&srchType=A&format=rss"
It appears openConnection can't handle the space (which I've already replaced with a '+', as I'd expect to have to in a URL). I've also tried '%20', with the same results.
Additionally, URL.toString() reports the URLs as I printed them above, with the '+', not the space.
The code is as follows; searchUrl is a URL instance.
URLConnection conn = null;
conn = searchUrl.openConnection();
conn.setConnectTimeout(CONNECT_TIMEOUT);
conn.setUseCaches(true);
conn.setAllowUserInteraction(false);
ByteArrayOutputStream oStream = new ByteArrayOutputStream();
InputStream istream = conn.getInputStream();
int numBytesRead, numBytesWritten = 0;
byte[] buffer = new byte[8 * 1024];
while ((numBytesRead = istream.read(buffer, 0, 8 * 1024)) > 0) {
oStream.write(buffer, 0, numBytesRead); // the offset refers to buffer, so it is always 0
numBytesWritten += numBytesRead;
}
Any ideas on where to deal with this? I'm about to pitch URLConnection and go another route...
Thanks
Kenny.
There is something wrong with your question (see my comment).
However, the fundamental problem here is that a URL with a space character in the query part is not a legal URL, notwithstanding that a typical web browser will accept it. The exception is therefore correct.
Your example URLs seem to show that the space is escaped with a '+'. This is HTML form escaping, not proper URL escaping. You seem to be saying that you get the same result if you use %20, which would be correct escaping.
So my theory is that you are actually passing this URL to your code via a route that is removing the escapes, notwithstanding what your trace prints seem to be telling you. (If I could see an SSCCE, we'd be able to test this theory.)
FWIW, fixing the problem by calling URLEncoder.encode, as some of the other answers have suggested, is a bad idea. The problem is that it is likely to "encode" other characters that shouldn't be encoded.
The URL itself is best encoded with new URI(null, url, null).toASCIIString().
Each key and value in the query string can be separately encoded with URLEncoder.encode(). According to RFC 2396 this isn't correct and the whole thing should be encoded as for the URL itself, but I've never seen it fail.
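Both approaches from this answer can be compared in a short sketch. The class name UrlEscaping and its method names are hypothetical, and example.com replaces the site from the question. One caveat: the URI constructor always quotes the percent character, so a URL that is already escaped would end up escaped twice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class UrlEscaping {
    // Escapes the illegal characters of a complete, not yet escaped URL.
    static String escapeWholeUrl(String rawUrl) {
        try {
            return new URI(null, rawUrl, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Encodes a single query key or value using HTML form encoding.
    static String encodeQueryPart(String part) {
        try {
            return URLEncoder.encode(part, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Whole-URL escaping turns each space into %20.
        System.out.println(escapeWholeUrl(
                "http://example.com/search?query=iPhone sprint two&sort=date"));
        // Per-value form encoding turns each space into '+'.
        System.out.println("http://example.com/search?query="
                + encodeQueryPart("iPhone sprint two") + "&sort=date");
    }
}
```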
Did you try URLEncoder.encode(string, "UTF-8")?
Here is an example.
Replace
String url = "http://somesite.com/page?user=" + user;
with
String url = "http://somesite.com/page?user="
+ URLEncoder.encode(user, "UTF-8");
String url = URLEncoder.encode("your URL without http or your query string part here", "UTF-8");
URL searchUrl = new URL("http://" + url);
URLConnection conn = null;
conn = searchUrl.openConnection();
