URL encoding the character @ in query path - java

There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.
I am looking to find out which version is correct.
Example string: "someone@example.com".
If I go to https://www.urlencoder.org/ and try to encode the above string, I get
someone%40example.com
If I use org.springframework.web.util.UriUtils, I get these results:
String s1 = UriUtils.encodePathSegment("someone@example.com", "UTF-8");
String s2 = UriUtils.encodeQueryParam("someone@example.com", "UTF-8");
String s3 = UriUtils.encodePath("someone@example.com", "UTF-8");
System.out.println("----------s1: " + s1);
System.out.println("----------s2: " + s2);
System.out.println("----------s3: " + s3);
...outputs
----------s1: someone@example.com
----------s2: someone@example.com
----------s3: someone@example.com
RestEasy-Client v4.0.0.Final does not encode the "@" character in path segments.
WSO2 ESB complains when receiving a path parameter that contains the @ character (well, it can't find the resource at said moment).
Who is right? What should the correct outcome be: should "@" be transformed to "%40" or not?

There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.
The standard for which characters must be escaped in a path segment is RFC 3986, Appendix A.
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
Notice that depending on the path production you are using, there are three different flavors of segment
segment = *pchar
segment-nz = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
; non-zero-length segment without any colon ":"
but...
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
So @ is allowed in any path segment.
Is it required? As far as I can tell, the answer is no -- using the pct-encoded representation instead is permitted when @ is not serving the role of a delimiter. There's nothing explicit, but this observation about unreserved characters is a hint:
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
This suggests that pct-encodings of unreserved characters are permitted, even though that's clearly not required. So that should hold true for other characters after the delimiters have been resolved.
For reference: the unreserved set is pretty much what you would expect.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
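As a rough illustration of the point above, here is a minimal sketch using only java.net.URI from the JDK. The class name, method names and the example.com URLs are made up for this example. It shows that the multi-argument URI constructor leaves "@" alone in a path, and that a path containing "%40" decodes to the same text.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class AtSignInPathSketch {

    // Builds a URI from components and returns the path exactly as it
    // would appear on the wire (illegal characters quoted, "@" untouched).
    static String buildRawPath(String path) {
        try {
            return new URI("https", "example.com", path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses a full URI string and returns the percent-decoded path.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        System.out.println(buildRawPath("/users/someone@example.com"));
        System.out.println(decodedPath("https://example.com/users/someone%40example.com"));
    }
}
```

Both forms name the same path segment once decoded, so a server that cannot find the resource for one of them is likely comparing the raw, undecoded text.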

If you call a URL like login(:password)@url.com, it will connect you to that endpoint with your credentials, so I would not escape them at that point. But if they appear after the .com, I would escape them, because they should not be used as a separator.

Related

HMAC-SHA256 - how to?

I am doing HMAC-SHA256 in Android. Here is the following code :
String baseString = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI2NjU0MjA5MGE2NGJhYWU0MzI4NGFiYTY0MmNkNWJmNmFlNzdkNjFhIiwiYXVkIjoiaHR0cHM6Ly9hcHAuaWZvcm1idWlsZGVyLmNvbS9leHphY3QvYXBpL29hdXRoL3Rva2VuIiwiZXhwIjoxNTEwNDMyMzcyLCJpYXQiOjE1MTA0MzE3NzJ9";
String clientSecret = "167edb4d9c3e603131619ae4a92c76307e3f9631";
Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
SecretKeySpec secret_key = new
SecretKeySpec(clientSecret.getBytes("UTF-8"), "HmacSHA256");
sha256_HMAC.init(secret_key);
String jwtSignature =
Base64.encodeToString(sha256_HMAC.doFinal(baseString.getBytes("UTF-8")), Base64.NO_WRAP);
Log.d("JWT-SIGNATURE", jwtSignature);
I get the JWT-SIGNATURE value as 2nFaU/7jcc99jTWCO0VLriN/fiLwqi/ap7eeuVhhal4=
Instead the correct JWT-SIGNATURE value should be 2nFaU_7jcc99jTWCO0VLriN_fiLwqi_ap7eeuVhhal4
There are a few characters that are not correct, i.e. "/" and a "=" at the end. Can someone kindly help me out?
The encoding you need to use is a variant of Base64 encoding called base64url.
From wikipedia:
Using standard Base64 in URL requires encoding of '+', '/' and '='
characters into special percent-encoded hexadecimal sequences ('+'
becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes
the string unnecessarily longer.
For this reason, modified Base64 for URL variants exist, where the '+'
and '/' characters of standard Base64 are respectively replaced by '-'
and '_', so that using URL encoders/decoders is no longer necessary
and have no impact on the length of the encoded value, leaving the
same encoded form intact for use in relational databases, web forms,
and object identifiers in general. Some variants allow or require
omitting the padding '=' signs to avoid them being confused with field
separators, or require that any such padding be percent-encoded. Some
libraries will encode '=' to '.'.
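A minimal sketch of the base64url idea, assuming java.util.Base64 is available (Java 8+, or Android API level 26+). The class and method names are made up for this example. On older Android versions, android.util.Base64 with the flags URL_SAFE | NO_PADDING | NO_WRAP should give the same kind of output.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only.
class JwtSignatureSketch {

    // HMAC-SHA256 over baseString, encoded as unpadded base64url.
    static String sign(String baseString, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts an existing standard Base64 string to unpadded base64url:
    // '+' becomes '-', '/' becomes '_', and the '=' padding is dropped.
    static String toBase64Url(String standardBase64) {
        return standardBase64.replace('+', '-').replace('/', '_').replace("=", "");
    }

    public static void main(String[] args) {
        // The value from the question, converted after the fact.
        System.out.println(toBase64Url("2nFaU/7jcc99jTWCO0VLriN/fiLwqi/ap7eeuVhhal4="));
    }
}
```

Encoding directly with the URL-safe encoder is usually preferable to string replacement afterwards; toBase64Url is shown only because it maps the question's two values onto each other.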

How to use an array of coordinates with Google Elevation API?

I am trying to send several locations in one request to the Google Elevation API; you are supposed to be able to send up to 512 locations per request. Their documentation says to use:
An array of coordinates separated using the pipe ('|') character: locations=40.714728,-73.998672|-34.397,150.644
but I am getting back the error:
Caused by: java.net.URISyntaxException: Illegal character in query at index 99: https://maps.googleapis.com/maps/api/elevation/json?locations=51.606013718523265,-8.432384161819547|51.606031961540985,-8.432374430210215|51.60607166348032,-8.432334651837888|51.60610446039263,-8.4322494395575&key=myAPIkey
It works if I just send a single point. I am told to use the pipe ('|') character yet it won't accept it. My code is
position = ellipsePositions.get(index1);
longitude = position.getLongitude();
latitude = position.getLatitude();
APIstring = latitude + "," + longitude;
for (int index = 1; index < indexList.size(); index++)
{
if (ellipsePositions.get(index) != null)
{
position = ellipsePositions.get(index);
longitude = position.getLongitude();
latitude = position.getLatitude();
APIstring = APIstring + "|" + latitude + "," + longitude;
}
}
and
WebResource webResource = client.resource("https://maps.googleapis.com/maps/api/elevation/json?locations="+ APIstring + "&key=myAPIkey");
ClientResponse response = webResource.accept("application/json").get(ClientResponse.class);
String data = response.getEntity(String.class);
Can anyone help?
You need to URL encode any "unsafe" characters in the query string. Per the documentation on web services
Building a Valid URL
You may think that a "valid" URL is self-evident, but that's not quite the case. A URL entered within an address bar in a browser, for example, may contain special characters (e.g. "上海+中國"); the browser needs to internally translate those characters into a different encoding before transmission. By the same token, any code that generates or accepts UTF-8 input might treat URLs with UTF-8 characters as "valid", but would also need to translate those characters before sending them out to a web server. This process is called URL-encoding.
We need to translate special characters because all URLs need to conform to the syntax specified by the W3 Uniform Resource Identifier specification. In effect, this means that URLs must contain only a special subset of ASCII characters: the familiar alphanumeric symbols, and some reserved characters for use as control characters within URLs.
Some common characters that must be encoded are:
Unsafe character Encoded value
| %7C
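A minimal sketch of applying that advice to the locations parameter, assuming Java 10+ for the URLEncoder.encode(String, Charset) overload. The class and method names are made up for this example. Only the parameter values are encoded, not the whole URL, so the "?", "=" and "&" delimiters stay intact.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ElevationUrlSketch {

    // Encodes each query value separately, then assembles the URL.
    static String buildUrl(String locations, String apiKey) {
        return "https://maps.googleapis.com/maps/api/elevation/json?locations="
                + URLEncoder.encode(locations, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildUrl("40.714728,-73.998672|-34.397,150.644", "myAPIkey");
        // URI.create throws if the string still contains illegal characters.
        System.out.println(URI.create(url));
    }
}
```

In the question's code, this would mean encoding APIstring once the loop has finished, before concatenating it into the resource URL.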

urlencode() the 'asterisk' (star?) character

I'm testing PHP urlencode() vs. Java java.net.URLEncoder.encode().
Java
String all = "";
for (int i = 32; i < 256; ++i) {
all += (char) i;
}
System.out.println("All characters: -||" + all + "||-");
try {
System.out.println("Encoded characters: -||" + URLEncoder.encode(all, "utf8") + "||-");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
PHP
$all = "";
for($i = 32; $i < 256; ++$i)
{
$all = $all.chr($i);
}
echo($all.PHP_EOL);
echo(urlencode(utf8_encode($all)).PHP_EOL);
All characters seem to be encoded in the same way with both functions, except for the 'asterisk' character that is not encoded by Java, and translated to %2A by PHP. Which behaviour is supposed to be the 'right' one, if any?
Note: I tried with rawurlencode(), too - no luck.
It is okay to have a * in a URL, (but it is also okay to have it in its encoded form).
RFC1738: Uniform Resource Locators (URL) states the following:
Reserved:
[...]
Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.
Wikipedia suggests that * is a reserved character when it comes to URIs, and that it must be encoded if not used for the reserved purpose. According to RFC3986, pages 12-13:
URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
(The reason the URL RFC still allows the * character to go unencoded is that it doesn't have a reserved purpose in URLs, and as such doesn't have to be encoded. So whether you have to encode it or not depends on what sort of URI you're creating.)
Javadoc of URLEncoder refers to the HTML specification:
This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format. For more information about HTML form encoding, consult the HTML specification.
HTML4 is quite unclear regarding this question and refers to RFC1738, which is quoted by aioobe:
Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').
However, HTML5 directly states that * should not be encoded:
If the character isn't in the range U+0020, U+002A, U+002D, U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to U+007A
Replace the character with a string formed as follows:
...
Otherwise
Leave the character as is.
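A small sketch of the Java side of this, assuming Java 10+ for the Charset overload of URLEncoder.encode. The class and method names are made up for this example. It shows that URLEncoder leaves "*" alone, and one way to get "%2A" instead if the receiving side expects the form the question reports for PHP.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class AsteriskSketch {

    // Plain application/x-www-form-urlencoded encoding; '*' is kept as is.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Same encoding, but with '*' additionally percent-encoded.
    static String formEncodeAsteriskToo(String s) {
        return formEncode(s).replace("*", "%2A");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a*b c"));            // a*b+c
        System.out.println(formEncodeAsteriskToo("a*b c")); // a%2Ab+c
    }
}
```

Either output decodes to the same text, so the extra replacement only matters when two encoded strings are compared byte for byte, for example when computing a signature over them.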

Encoding URL query parameters in Java

How does one encode query parameters to go on a url in Java? I know, this seems like an obvious and already asked question.
There are two subtleties I'm not sure of:
Should spaces be encoded on the url as "+" or as "%20"? In chrome if I type in "http://google.com/foo=?bar me" chrome changes it to be encoded with %20
Is it necessary/correct to encode colons ":" as %3A? Chrome doesn't.
Notes:
java.net.URLEncoder.encode doesn't seem to work, it seems to be for encoding data to be form submitted. For example, it encodes space as + instead of %20, and encodes colon which isn't necessary.
java.net.URI doesn't encode query parameters
java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.
URLEncoder.encode(query, "UTF-8");
On the other hand, Percent-encoding (also known as URL encoding) encodes space with %20. Colon is a reserved character, so : will still remain a colon, after encoding.
Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).
URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.
Best solution I could come up with:
return URLEncoder.encode(raw, "UTF-8").replaceAll("\\+", "%20");
If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
EDIT: I had this code in here first which doesn't encode "?", "&", "=" properly:
//don't use - doesn't properly encode "?", "&", "="
new URI(null, null, null, raw, null).toString().substring(1);
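The URLEncoder-plus-replace approach from this answer can be sketched as below, assuming Java 10+ for the Charset overload. The class and method names are made up for this example. Replacing "+" after encoding is safe because a literal "+" in the input has already become "%2B" by then.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class QueryParamSketch {

    // Form-encodes the value, then rewrites the '+' that stands for a
    // space into "%20". String.replace is literal, so no regex is needed.
    static String encodeQueryValue(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("bar me"));  // bar%20me
        System.out.println(encodeQueryValue("a+b=c&d")); // a%2Bb%3Dc%26d
        System.out.println(encodeQueryValue("12:30"));   // 12%3A30
    }
}
```

This encodes more than RFC 3986 strictly requires (the colon, for instance), which is harmless but makes the URL longer.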
EDIT: URIUtil is no longer available in more recent versions, better answer at Java - encode URL or by Mr. Sindi in this thread.
URIUtil of Apache httpclient is really useful, although there are some alternatives
URIUtil.encodeQuery(url);
For example, it encodes space as "+" instead of "%20"
Both are perfectly valid in the right context. Although if you really preferred you could issue a string replace.
It is not necessary to encode a colon as %3A in the query, although doing so is not illegal.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
It also seems that only percent-encoded spaces are valid, as I doubt that space is an ALPHA or a DIGIT.
Look to the URI specification for more details.
The built in Java URLEncoder is doing what it's supposed to, and you should use it.
A "+" or "%20" are both valid replacements for a space character in a URL. Either one will work.
A ":" should be encoded, as it's a separator character. i.e. http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.
As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.
URLEncoder.encode(yourUrl, "UTF-8");
I just want to add another way to resolve this problem.
If your project depends on spring web, you can use their utils.
import org.springframework.web.util.UriUtils;
import java.nio.charset.StandardCharsets;
UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);
Output:
vip%3A104534049%3A5
String param="2019-07-18 19:29:37";
param="%27"+param.trim().replace(" ", "%20")+"%27";
I observed that for a datetime (timestamp) value like this,
URLEncoder.encode(param,"UTF-8") does not work.
The white space character " " is converted into a + sign when using URLEncoder.encode. This is opposite to other programming languages like JavaScript which encodes the space character into %20. But it is completely valid as the spaces in query string parameters are represented by +, and not %20. The %20 is generally used to represent spaces in URI itself (the URL part before ?).
If spaces are the only problem in your URL, the code below worked fine for me:
String url;
URL myUrl = new URL(url.replace(" ","%20"));
Example: if url is
www.xyz.com?para=hello sir
then the output of myUrl is
www.xyz.com?para=hello%20sir

Why am I getting a StringIndexOutOfBoundsException when I try to replace `\\` with `\`?

I have to replace \\ with \ in Java. The code I am using is
System.out.println( (MyConstants.LOCATION_PATH + File.separator + myObject.getStLocation() ).replaceAll("\\\\", "\\") );
But I don't know why it is throwing StringIndexOutOfBoundsException.
It says String index out of range: 1
What could be the reason? I guess it is because the first argument replaceAll accepts a pattern. What could be the possible solution?
Stacktrace
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:558)
at java.util.regex.Matcher.appendReplacement(Matcher.java:696)
at java.util.regex.Matcher.replaceAll(Matcher.java:806)
at java.lang.String.replaceAll(String.java:2000)
Answer Found
asalamon74 posted the code I required, but I don't know why he deleted it. In any case here it is.
There is a bug already filed in Java's bug database. (Thanks for this reference, asalamon.)
yourString.replaceAll("\\\\", "\\\\");
Amazingly, both search and replace string are the same :) but still it does what I require.
Use String.replace instead of replaceAll to avoid it using a regex:
String original = MyConstants.LOCATION_PATH + File.separator
+ myObject.getStLocation();
System.out.println(original.replace("\\\\", "\\"));
Personally I wouldn't do it this way though - I'd create MyConstants.LOCATION_PATH_FILE as a File and then you could write:
File location = new File(MyConstants.LOCATION_PATH_FILE,
myObject.getStLocation());
which will do the right thing automatically.
Well, I tried
String test = "just a \\ test with some \\\\ and others \\\\ or \\ so";
String result = test.replaceAll("\\\\", "\\\\");
System.out.println(test);
System.out.println(result);
System.out.println(test.equals(result));
and got, as expected
just a \ test with some \\ and others \\ or \ so
just a \ test with some \\ and others \\ or \ so
true
What you really need is
string.replaceAll("\\\\\\\\", "\\\\");
to get
just a \ test with some \\ and others \\ or \ so
just a \ test with some \ and others \ or \ so
false
You want to find: \\  (2 slashes)
which needs to be escaped in the regex: \\\\ (4 slashes)
and escaped in Java: "\\\\\\\\" (8 slashes)
same for the replacement...
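The escaping levels described above can be compared side by side in a short sketch. The class and method names are made up for this example. All three variants collapse a doubled backslash into a single one; they differ only in how much escaping the caller has to write.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class BackslashSketch {

    // Literal replacement: only Java string escaping applies.
    static String collapseLiteral(String s) {
        return s.replace("\\\\", "\\");
    }

    // Regex replacement: each backslash is escaped once for the regex
    // (or replacement syntax) and once more for the Java string literal.
    static String collapseRegex(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    // Regex replacement with the library doing the regex-level escaping.
    static String collapseQuoted(String s) {
        return s.replaceAll(Pattern.quote("\\\\"), Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        String input = "C:\\\\dir\\\\file"; // the text C:\\dir\\file
        System.out.println(collapseLiteral(input));
        System.out.println(collapseRegex(input));
        System.out.println(collapseQuoted(input));
    }
}
```

When no pattern matching is needed, String.replace is the simplest of the three and avoids the replacement-string pitfalls altogether.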
For the regex, if you want to change \ to \\, you should do this:
if (str.indexOf('\\') > -1)
str = str.replaceAll("\\\\", "\\\\\\\\");
str = "\"" + str + "\"";
Where \\\\ means \ and \\\\\\\\ means \\.
File.separator is already escaped, as is any String object, so you are escaping them twice.
You only need to escape values that you are entering as a string literal.
The best way is :
str.replace('\\', '/'); //with char method not String
Try this
cadena.replaceAll("\\\\","\\\\\\\\")
I suspect the problem is that replaceAll() uses regexps and the backslash is an escape character in regexps as well as in Java - it might be necessary to double the number of backslashes.
In general you should always post the full stack trace of exceptions, it is much easier to diagnose the problem that way.
I believe what you need to do is:
System.out.println( (MyConstants.LOCATION_PATH + File.separator + myObject.getStLocation() ).replaceAll("\\\\\\\\", "\\\\") );
The regular expression String is actually four backslashes, which is a regular expression that matches two backslashes.
The replacement String has to be four slashes as per Java documentation, from:
http://java.sun.com/javase/6/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String)
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
final StringBuilder result = new StringBuilder();
final StringCharacterIterator iterator = new StringCharacterIterator(str);
char character = iterator.current();
while (character != CharacterIterator.DONE )
{
if (character == '\\') {
// Emit one backslash; if the next character is also a backslash, skip it.
result.append('\\');
character = iterator.next();
if (character != '\\') {
continue;
}
}
else {
result.append(character);
}
character = iterator.next();
}
System.out.print(result);
