One of my REST APIs has a property "url" that expects a URL as input from the user. I am using ESAPI to protect against XSS attacks. The problem is that the user-supplied URL looks something like
http://example.com/alpha?abc=def&phil=key%3dbdj
The canonicalize method of the ESAPI encoder throws an intrusion exception here, claiming that the input has mixed encoding: the input is URL encoded, and the piece '&phi' is treated as an HTML entity.
I had a similar problem with sanitizing one of my application URLs, where the second query parameter started with 'pa' or 'pi' and was converted to delta or pi characters by HTML decoding. Please refer to my previous Stackoverflow question here
Since the entire URL comes as input from the user, I cannot simply parse out the query parameters and sanitize them individually: malicious input can be built by combining two query parameters, and sanitizing them individually won't work in that case.
Example: <scr comes as the last part of the first query param value, and ipt>alert(0); or something similar comes as the first part of the next query param.
Has anyone faced a similar problem? I would really like to know what solutions you guys implemented. Thanks for any pointers.
EDIT: The answer below from 'avgvstvs' does not throw the intrusion exception (Thanks!!). However, the canonicalize method now changes the original input string. ESAPI treats the &phi of the query param as an HTML-encoded character and replaces it with a '?' character, much like in my previous question, which is linked here. The difference is that that was a URL of my application, whereas this is user input. Is my only option to maintain a whitelist here?
The problem you're facing is that different parts of a URL have different encoding rules. From memory, there are four sections in a URL with different rules. First, understand why, in Java, you need to build URLs using the UriBuilder class. The URL specification will help with the nitty-gritty details.
Now since the problem is that since the entire URL is coming as input
from the user, I cannot simply parse out the Query parameters and
sanitize them individually, since malicious input can be created
combining the two query parameters and sanitizing them individually
wont work in that case.
The only real option here is java.net.URI.
Try this:
URI dirtyURI = new URI("http://example.com/alpha?abc=def&phil=key%3dbdj");
String cleanURIStr = enc.canonicalize( dirtyURI.getPath() );
URI.getPath() returns the path with its percent-escapes already decoded, and URI.getQuery() does the same for the query string (the part your example trips on). If enc.canonicalize() still detects double encoding after that stage, then you really DO have a double-encoded string and should inform the caller that you only accept single-encoded URL strings. The URI getters are smart enough to apply the decoding rules for each part of the URL string.
If it's still giving you trouble, the API reference lists other methods that extract other parts of the URL, in case you need to treat different parts differently. If you ever need to manually parse the parameters of a GET request, for example, you can have it return the query string itself, and it will already have done a decoding pass on it.
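To make the difference between the decoded and raw getters concrete, here is a minimal JDK-only sketch using the URL from the question. The class and method names (UriPartsDemo, path, decodedQuery, rawQuery) are made up for illustration; only java.net.URI is real API.

```java
import java.net.URI;

class UriPartsDemo {
    // Path component with percent-escapes decoded.
    static String path(String url) {
        return URI.create(url).getPath();
    }

    // Query component with percent-escapes decoded (null if there is no query).
    static String decodedQuery(String url) {
        return URI.create(url).getQuery();
    }

    // Query component exactly as it appeared in the input.
    static String rawQuery(String url) {
        return URI.create(url).getRawQuery();
    }

    public static void main(String[] args) {
        String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
        System.out.println(path(input));         // /alpha
        System.out.println(rawQuery(input));     // abc=def&phil=key%3dbdj
        System.out.println(decodedQuery(input)); // abc=def&phil=key=bdj
    }
}
```

Note that the decoded query still contains the literal text "&phil", so anything that later applies HTML-entity decoding to the whole string can still misread it.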
=============JUNIT Test Case============
package org.owasp.esapi;

import java.net.URI;
import java.net.URISyntaxException;
import org.junit.Test;

public class TestURLValidation {
    @Test
    public void test() throws URISyntaxException {
        Encoder enc = ESAPI.encoder();
        String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
        URI dirtyURI = new URI(input);
        enc.canonicalize(dirtyURI.getQuery());
    }
}
=================Answer for updated question=====================
There's no way around it: Encoder.canonicalize() is intended to reduce escaped character sequences into their reduced, native-to-Java form. URLs are most likely considered a special case so they were most likely deliberately excluded from consideration. Here's the way I would handle your case--without a whitelist, and it will guarantee that you are protected by Encoder.canonicalize().
Use the code above to get a URI representation of your input.
Step 1: Canonicalize all of the URI parts except URI.getQuery()
Step 2: Use a library parser to parse the query string into a data structure. I would use httpclient-4.3.3.jar and httpcore-4.3.3.jar from commons. You'll then do something like this:
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Iterator;
import java.util.List;
import javax.ws.rs.core.UriBuilder;
import org.apache.http.client.utils.URLEncodedUtils;
import org.junit.Test;
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Encoder;

public class TestURLValidation
{
    @Test
    public void test() throws URISyntaxException {
        Encoder enc = ESAPI.encoder();
        String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
        URI dirtyURI = new URI(input);
        UriBuilder uriData = UriBuilder.fromUri(enc.canonicalize(dirtyURI.getScheme()));
        uriData.path(enc.canonicalize(enc.canonicalize(dirtyURI.getAuthority() + dirtyURI.getPath())));
        println(uriData.build().toString());
        List<org.apache.http.NameValuePair> params = URLEncodedUtils.parse(dirtyURI, "UTF-8");
        Iterator<org.apache.http.NameValuePair> it = params.iterator();
        while(it.hasNext()) {
            org.apache.http.NameValuePair nValuePair = it.next();
            uriData.queryParam(enc.canonicalize(nValuePair.getName()), enc.canonicalize(nValuePair.getValue()));
        }
        String canonicalizedUrl = uriData.build().toString();
        println(canonicalizedUrl);
    }

    public static void println(String s) {
        System.out.println(s);
    }
}
What we're really doing here is using standard libraries to parse the inputURL (thus taking all the burden off of us) and then canonicalizing the parts after we've parsed each section.
Please note that the code I've listed won't work for all url types... there are more parts to a URL than scheme/authority/path/queries. (Missing is the possibility of userInfo or port, if you need those, modify this code accordingly.)
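If you would rather not pull in HttpClient just for the query parsing, the same split-then-decode idea can be sketched with the JDK alone. This is only an illustration under some assumptions: QueryParamSplit and parse are hypothetical names, repeated parameter names overwrite each other, and URLDecoder.decode(String, Charset) needs Java 10 or later. Each decoded name and value would then be handed to enc.canonicalize() individually, as in the code above.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParamSplit {
    // Splits the RAW query on '&' and '=' first, then decodes each piece,
    // so an encoded '&' or '=' inside a value cannot change the structure.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String raw = URI.create(url).getRawQuery();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Simplification: a repeated name replaces the earlier value.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
        // Prints {abc=def, phil=key=bdj}
        System.out.println(parse(input));
    }
}
```

Because the names and values are separated before any decoding happens, "phil" arrives on its own and the "&phil" sequence never reaches an HTML-entity decoder as one string.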
public Response getCustomerByName(
        @PathParam("customerName") String customerName)
Problem :
I am passing customerName as "stack overflow" (in the URL it is encoded as stack%20overflow). I want to receive it as a decoded string (stack overflow, without the %20) in my Java code.
What I tried :
This works perfectly fine, but I feel it is not a generic way of doing it.
URLDecoder.decode(customerName, "UTF-8");
A more generic solution is required :
I want to make similar changes in the rest of the APIs as well, so calling URLDecoder in each API is a burden. Is there any common practice I can follow to impose this decoding at the application level? (@PathParam is already decoded when I receive the request)
The value is decoded automatically, so you don't need explicit decoding with URLDecoder.decode(customerName, "UTF-8");
As mentioned in the javadoc of PathParam:
The value is URL decoded unless this is disabled using the Encoded annotation.
I just verified the code below, and it works as the javadoc describes (on a WebLogic server):
@GET
@Produces(value = { "text/plain"})
@Path("{customerName}")
public Response getCustomerByName(@PathParam("customerName") String customerName) {
    System.out.println(customerName);
    return Response.ok().entity(customerName).type("text/plain").build();
}
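Since the container has already decoded the value, an extra URLDecoder pass is not just unnecessary, it can corrupt data. A small JDK-only sketch of that effect (DecodeOnceDemo is a made-up name, and URLDecoder.decode(String, Charset) assumes Java 10 or later):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeOnceDemo {
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the explicit call does to the raw path segment.
        System.out.println(decode("stack%20overflow")); // stack overflow

        // Decoding a value that is ALREADY decoded: URLDecoder follows
        // form-encoding rules, so each '+' silently becomes a space.
        System.out.println("[" + decode("C++") + "]");  // [C  ]
    }
}
```

So a customer name that legitimately contains '+' or '%' would be altered (or rejected) if every resource method decoded its @PathParam a second time.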
I have a URL and I want to print in my graphical user interface the ID value after the hashtag.
For example, we have www.site.com/index.php#hello and I want to print hello value on a label in my GUI.
How can I do this using Java in Netbeans?
Simple solution is getRef() in URL class:
URL url = new URL("http://www.anyhost.com/index.php#hello");
jLabel.setText(url.getRef());
EDIT: According to @Henry's comment:
I would recommend to use the java.net.URI as it also deals with encoding. The Javadocs say: "Note, the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL()."
and this comment:
Why not just doing uri.getFragment()
URI uri = new URI("http://www.anyhost.com/index.php#hello");
jLabel.setText(uri.getFragment());
Use the String.split() Method.
public static String getId(String url) {
    return url.split("#")[1];
}
String.split() returns an array of Strings that are delimited, or "Split," by the value you pass to it, or in this case #.
Because you want only the string after the #, you can just use the second item in the array that it returns by adding [1] to the end of it.
For more on String.split() go to Tutorials Point.
By the way, the part of the URL after the # is called the fragment identifier. It usually names an element ID and is used to jump to that element on a web page.
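One thing to watch with either approach is a URL that has no # at all: split("#")[1] throws an ArrayIndexOutOfBoundsException, and getFragment() returns null. A minimal sketch that returns an empty string instead (FragmentDemo and fragmentOf are hypothetical names):

```java
import java.net.URI;

class FragmentDemo {
    // Returns the text after '#', or "" when the URL has no fragment.
    static String fragmentOf(String url) {
        String fragment = URI.create(url).getFragment();
        return fragment == null ? "" : fragment;
    }

    public static void main(String[] args) {
        System.out.println(fragmentOf("http://www.anyhost.com/index.php#hello")); // hello
        System.out.println(fragmentOf("http://www.anyhost.com/index.php").isEmpty()); // true
    }
}
```

The result can be passed straight to jLabel.setText(...) without a null check.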
So I'm trying to scrape a grammar website that gives you conjugations of verbs, but I'm having trouble accessing the pages that require accents, such as the page for the verb "fág".
Here is my current code:
String url = "http://www.teanglann.ie/en/gram/"+ URLEncoder.encode("fág","UTF-8");
System.out.println(url);
I've tried this both with and without the URLEncoder.encode() method, and it just keeps giving me a '?' in place of the 'á' when working with it, and my URL search returns nothing. Basically, I was wondering if there was something similar to Python's 'urllib.parse.quote_plus'. I've tried searching and tried many different methods from StackOverflow, all to no avail. Any help would be greatly appreciated.
Eventually, I'm going to replace the given string with a user-supplied argument. I'm just using it to test at the moment.
Solution: It wasn't Java, but IntelliJ.
Summary from comment
The test code works fine.
import java.io.UnsupportedEncodingException;
import static java.net.URLEncoder.encode;
public class MainApp {
public static void main(String[] args) throws UnsupportedEncodingException {
String url = "http://www.teanglann.ie/en/gram/"+ encode("fág", "UTF-8");
System.out.println(url);
}
}
It prints:
http://www.teanglann.ie/en/gram/f%C3%A1g
which goes to the correct page.
The correct steps are:
1. Ensure that the source code encoding is correct (IntelliJ probably cannot always guess it correctly).
2. Run the program with the appropriate encoding (UTF-8 in this case).
(See "What is the default encoding of the JVM?" for a relevant discussion.)
Edit from Wyzard's comment
The above code works by accident (for example, because the word does not contain whitespace). The correct way to get an encoded URL is shown below:
..
String url = "http://www.teanglann.ie/en/gram/fág";
System.out.println(new URI(url).toASCIIString());
This uses URI.toASCIIString(), which adheres to RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax).
I am currently properly escaping my filters, either by using the Spring LDAP Filter classes or by going through LdapEncoder.filterEncode().
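One caveat: the single-argument URI constructor rejects input that contains a space, so for arbitrary user input the multi-argument constructors, which quote illegal characters for you, may be a better fit. A minimal sketch, assuming a fixed "http" scheme; AsciiUrlDemo and encodePath are made-up names, and the accented letter is written as a \u escape so the result does not depend on the source file encoding:

```java
import java.net.URI;
import java.net.URISyntaxException;

class AsciiUrlDemo {
    // Builds http://host/path and returns it as pure ASCII.
    static String encodePath(String host, String path) {
        try {
            // The 4-argument constructor quotes characters that are illegal
            // in a path (such as spaces); toASCIIString() then percent-encodes
            // the remaining non-ASCII characters as UTF-8.
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e1 is the letter a-acute.
        System.out.println(encodePath("www.teanglann.ie", "/en/gram/f\u00e1g"));
        // http://www.teanglann.ie/en/gram/f%C3%A1g
        System.out.println(encodePath("www.teanglann.ie", "/en/gram/f\u00e1g abc"));
        // http://www.teanglann.ie/en/gram/f%C3%A1g%20abc
    }
}
```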
At the same time, I am using WireShark to capture packets being exchanged between my local machine and the LDAP server.
And I seem to have a problem. Even if I properly escape values (which I have confirmed through debugging), they come out unescaped through the network. I have also confirmed (through debugging) that the value stays encoded all the way until it enters javax.naming.InitialContext.
Here is an example (note that I am using Spring LDAP 1.3.0, and that these happen on both Oracle JDK 6u45 and Oracle JDK 7u45).
In my own code, on the service layer, the call being made is:
String lMailAddress = (String) ldapTemplate.searchForObject("", new EqualsFilter(ldapUserSearchFilterAttribute, principal).encode(), new ContextMapper() {
    @Override
    public Object mapFromContext(Object ctx) {
        DirContextAdapter lContext = (DirContextAdapter) ctx;
        return lContext.getStringAttribute("mail");
    }
});
At this point, I can confirm that the String returned by the encode() method on the filter is "(sAMAccountName=boi\2a)"
The last point I can debug the code is the following one (starts at line 229 of org.springframework.ldap.core.LdapTemplate):
SearchExecutor se = new SearchExecutor() {
public NamingEnumeration executeSearch(DirContext ctx) throws javax.naming.NamingException {
return ctx.search(base, filter, controls);
}
};
When executeSearch() is later invoked, I can also verify that the filter String contains "(sAMAccountName=boi\2a)".
I cannot debug any further, since I do not have the source code for javax.naming.* or com.sun.jndi.ldap.* (com.sun.jndi.ldap.LdapCtx is what gets invoked).
However, as soon as the call returns from executeSearch(), WireShark informs me that an LDAP packet containing a searchRequest with the filter "(sAMAccountName=boi*)" has been transmitted (the * is no longer escaped).
I have used similar encoding and used different methods of LdapTemplate that yielded the result I was expecting (I saw the encoded filter being transmitted in WireShark), but I cannot explain why, in the case I just exposed, the value gets decoded before being transmitted.
Please help me understand the situation. Hopefully, I am the one who does not properly understand the LDAP protocol here.
Thanks.
Disclaimer: I have posted the same question to Spring LDAP forums.
TL;DR: Why is com.sun.jndi.ldap.LdapCtx decoding LDAP encoded filters (like \2a to *) before transmitting them to the LDAP server?
Update: Tried and observed the same behavior with IBM's J9 JDK7.
Although I'm not familiar with Spring LDAP, it doesn't sound like there's necessarily a reason to be concerned. LDAP filters aren't transmitted as clear text, but rather in a binary encoding, and there is no need for escaping in this mechanism (nor would it be correct to do so).
Let's take "(sAMAccountName=boi*)" as an example. As written, this filter is a substring filter with a subInitial component of "boi". As you point out, if you want it to be an equality filter rather than a substring filter, then the string representation would have to be "(sAMAccountName=boi\2a)". However, the binary encodings for these filters don't use any escaping, but instead use an ASN.1 BER type to differentiate between substring and equality filters.
If you want "(sAMAccountName=boi*)" as a substring filter, then the encoded representation would be:
a417040e73414d4163636f756e744e616d6530058003626f69
On the other hand, if you want "(sAMAccountName=boi\2a)" as an equality filter, the encoding would be:
a316040e73414d4163636f756e744e616d650404626f692a
The full explanation of the encoding isn't something I want to get into, but the "a4" at the beginning of the first one indicates that it's a substring filter, whereas the "a3" at the beginning of the second indicates that it's an equality filter.
You should be able to verify the actual bytes sent in WireShark. It may well be that WireShark doesn't properly escape the filter when generating the string representation, but that would be an issue with WireShark itself. The directory server only gets the binary representation, and it's hard to believe that an LDAP server would misinterpret that.
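If it helps when comparing against the bytes in the capture, the two encodings above can be reproduced with a few lines of Java. This is only a toy encoder covering these two filter shapes with short-form lengths, not a general BER implementation; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class LdapFilterBer {
    // Tag-length-value with a short-form length (content under 128 bytes).
    static byte[] tlv(int tag, byte[] content) {
        if (content.length > 127) {
            throw new IllegalArgumentException("short-form length only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);
        out.write(content.length);
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    static byte[] octetString(String s) {
        return tlv(0x04, s.getBytes(StandardCharsets.UTF_8));
    }

    // Equality filter (tag a3): the value bytes are sent as-is, no escaping.
    static byte[] equality(String attr, String value) {
        return tlv(0xa3, concat(octetString(attr), octetString(value)));
    }

    // Substring filter (tag a4) with a single subInitial component (tag 80).
    static byte[] substringInitial(String attr, String initial) {
        byte[] subs = tlv(0x30, tlv(0x80, initial.getBytes(StandardCharsets.UTF_8)));
        return tlv(0xa4, concat(octetString(attr), subs));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(substringInitial("sAMAccountName", "boi")));
        System.out.println(hex(equality("sAMAccountName", "boi*")));
    }
}
```

In the equality case the asterisk travels as the plain byte 2a inside the value; it is the a3 tag, not any escaping, that tells the server it is a literal.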
OWASP suggests encoding strings for searches:
public static final String escapeLDAPSearchFilter(String filter) {
    StringBuffer sb = new StringBuffer(); // If using JDK >= 1.5 consider using StringBuilder
    for (int i = 0; i < filter.length(); i++) {
        char curChar = filter.charAt(i);
        switch (curChar) {
            case '\\':
                sb.append("\\5c");
                break;
            case '*':
                sb.append("\\2a");
                break;
            case '(':
                sb.append("\\28");
                break;
            case ')':
                sb.append("\\29");
                break;
            case '\u0000':
                sb.append("\\00");
                break;
            default:
                sb.append(curChar);
        }
    }
    return sb.toString();
}
DN strings are escaped differently. See the link below.
https://www.owasp.org/index.php/Preventing_LDAP_Injection_in_Java
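To see what this escaping buys you, here is a small self-contained sketch that applies the same five substitutions to a value before it is placed in a filter string. LdapFilterEscapeDemo, escape, and equalityFilter are illustrative names, and the second input is just a made-up injection attempt.

```java
class LdapFilterEscapeDemo {
    // Same substitutions as the OWASP routine: \ * ( ) and NUL.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\u0000': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Only the user-supplied value is escaped, never the whole filter.
    static String equalityFilter(String attr, String userInput) {
        return "(" + attr + "=" + escape(userInput) + ")";
    }

    public static void main(String[] args) {
        System.out.println(equalityFilter("sAMAccountName", "boi*"));
        // (sAMAccountName=boi\2a)
        System.out.println(equalityFilter("uid", "x)(uid=*"));
        // (uid=x\29\28uid=\2a)
    }
}
```

In the second case the parentheses and asterisk stay inside the value, so the input cannot close the filter early and append a clause of its own.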
The best way is to use the parameterized filter search method, so that each parameter is properly encoded for you.
See https://docs.oracle.com/javase/jndi/tutorial/ldap/search/search.html
// Perform the search
NamingEnumeration answer = ctx.search("ou=NewHires",
"(&(mySpecialKey={0}) (cn=*{1}))", // Filter expression
new Object[]{key, name}, // Filter arguments
null); // Default search controls
I want to send a URI as the value of a query/matrix parameter. Before I can append it to an existing URI, I need to encode it according to RFC 2396. For example, given the input:
http://google.com/resource?key=value1 & value2
I expect the output:
http%3a%2f%2fgoogle.com%2fresource%3fkey%3dvalue1%2520%26%2520value2
Neither java.net.URLEncoder nor java.net.URI will generate the right output. URLEncoder is meant for HTML form encoding which is not the same as RFC 2396. URI has no mechanism for encoding a single value at a time so it has no way of knowing that value1 and value2 are part of the same key.
Jersey's UriBuilder encodes URI components using application/x-www-form-urlencoded and RFC 3986 as needed. According to the Javadoc
Builder methods perform contextual encoding of characters not permitted in the corresponding URI component following the rules of the application/x-www-form-urlencoded media type for query parameters and RFC 3986 for all other components. Note that only characters not permitted in a particular component are subject to encoding so, e.g., a path supplied to one of the path methods may contain matrix parameters or multiple path segments since the separators are legal characters and will not be encoded. Percent encoded values are also recognized where allowed and will not be double encoded.
You could also use Spring's UriUtils
I don't have enough reputation to comment on answers, but I just wanted to note that downloading the JSR-311 api by itself will not work. You need to download the reference implementation (jersey).
Only downloading the api from the JSR page will give you a ClassNotFoundException when the api tries to look for an implementation at runtime.
I wrote my own, it's short, super simple, and you can copy it if you like:
http://www.dmurph.com/2011/01/java-uri-encoder/
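For readers who just want the idea, a hand-rolled encoder of this kind can be very small. The following is an independent sketch, not the code behind the link: it percent-encodes every UTF-8 byte except the RFC 3986 "unreserved" characters, which is stricter than necessary but safe for a single query or matrix parameter value. PercentEncoder and encode are made-up names.

```java
import java.nio.charset.StandardCharsets;

class PercentEncoder {
    // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static String encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            if (UNRESERVED.indexOf(v) >= 0) {
                sb.append((char) v);                    // safe as-is
            } else {
                sb.append(String.format("%%%02X", v)); // e.g. ' ' -> %20
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("http://google.com/resource?key=value1 & value2"));
        // http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
    }
}
```

A single pass turns a space into %20. The %2520 in the question's expected output is what you get when the inner URI was already percent-encoded (space as %20) before being encoded again as a parameter value.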
It seems that CharEscapers from Google GData-java-client has what you want. It has uriPathEscaper method, uriQueryStringEscaper, and generic uriEscaper. (All return Escaper object which does actual escaping). Apache License.
I think that the URI class is the one that you are looking for.
Mmhh, I know you've already discarded URLEncoder, but despite what the docs say, I decided to give it a try.
You said:
For example, given an input:
http://google.com/resource?key=value
I expect the output:
http%3a%2f%2fgoogle.com%2fresource%3fkey%3dvalue
So:
C:\oreyes\samples\java\URL>type URLEncodeSample.java
import java.net.*;
public class URLEncodeSample {
public static void main( String [] args ) throws Throwable {
System.out.println( URLEncoder.encode( args[0], "UTF-8" ));
}
}
C:\oreyes\samples\java\URL>javac URLEncodeSample.java
C:\oreyes\samples\java\URL>java URLEncodeSample "http://google.com/resource?key=value"
http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
As expected.
What would be the problem with this?
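The difference only shows up once the input contains a space, as in the "value1 & value2" example from the question: URLEncoder follows the HTML form rules and emits '+' rather than %20. A quick sketch (FormEncodingDemo is a made-up name, and URLEncoder.encode(String, Charset) assumes Java 10 or later):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingDemo {
    // Plain application/x-www-form-urlencoded encoding.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Common workaround: URLEncoder writes a literal '+' as %2B, so any '+'
    // left in its output stands for a space and can be swapped for %20.
    static String formEncodeWithPercent20(String s) {
        return formEncode(s).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("value1 & value2"));              // value1+%26+value2
        System.out.println(formEncodeWithPercent20("value1 & value2")); // value1%20%26%20value2
        System.out.println(formEncodeWithPercent20("1+1"));             // 1%2B1
    }
}
```

Whether the '+' form is acceptable depends on how the receiving side decodes the value, so this is worth checking against whatever consumes the parameter.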