Regex pattern for query string

I need help finding a Java regex pattern to extract one query parameter from a URI.
For instance, the URI here is
"GET /6.2/calculateroute.xml?routeattributes=sm,wp,lg,bb&legattributes=mn&maneuverattributes=ac,po,tt,le,-rn,-sp,-di,no,nu,nr,sh&instructionFormat=html&language=en_US&mode=fastest;car;traffic:default&waypoint0=37.79548,-122.392025&waypoint1=36.0957717,-115.1745167&resolution=786&app_id=D4KnHBzGYyJtbM8lVfYX&token=TRKB7vnBguWLam5rdWshTA HTTP/1.1"
I need to extract four values from it, which I have managed to do:
GET
/6.2/calculateroute.xml
routeattributes=sm,wp,lg,bb&legattributes=mn&maneuverattributes=ac,po,tt,le,-rn,-sp,-di,no,nu,nr,sh&instructionFormat=html&language=en_US&mode=fastest;car;traffic:default&waypoint0=37.79548,-122.392025&waypoint1=36.0957717,-115.1745167&resolution=786&app_id=D4KnHBzGYyJtbM8lVfYX&token=TRKB7vnBguWLam5rdWshTA
HTTP/1.1
Now the question is: how do I write a regex for the app_id value in the query string? Note that app_id does not appear in every request, so the regex should be generic and should not fail if app_id is missing. Please help.

Your question can be simplified to: "How do I extract an optional query parameter from a string". Here's how:
String appId = input.replaceAll("(.*(app_id=(\\w+)).*)|.*", "$3");
The appId variable will contain the app_id value if it's present, or an empty string otherwise.
Here's some test code with the code bundled as a utility method:
public static String getParameterValue(String input, String parameter) {
    // Group 3 holds the value; the bare ".*" alternative yields "" when the parameter is absent
    return input.replaceAll("(.*(" + parameter + "=(\\w+)).*)|.*", "$3");
}
public static void main(String[] args) {
String input1 = "foo=bar&app_id=D4KnHBzGYyJtbM8lVfYX&x=y";
String input2 = "foo=bar&XXXXXX=D4KnHBzGYyJtbM8lVfYX&x=y";
System.out.println("app_id1:" + getParameterValue(input1, "app_id"));
System.out.println("app_id2:" + getParameterValue(input2, "app_id"));
}
Output:
app_id1:D4KnHBzGYyJtbM8lVfYX
app_id2:
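Note that \\w+ stops at the first character that is not a letter, digit or underscore, and that the pattern above also matches a longer name that merely ends in app_id. A minimal sketch of an alternative using Pattern and Matcher, assuming the input is the bare query string; the class and method names are illustrative only and not part of the answer above:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalParamSketch {
    // Returns the value of the named parameter, or Optional.empty() if it is absent.
    static Optional<String> find(String query, String name) {
        // (?:^|&) anchors the name at the start of a key; [^&]* reads up to the next '&'.
        Pattern p = Pattern.compile("(?:^|&)" + Pattern.quote(name) + "=([^&]*)");
        Matcher m = p.matcher(query);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Returning an Optional makes the "parameter missing" case explicit instead of overloading the empty string.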

Related

java grouping regex fail to match string+text

I wrote this test
@Test
public void removeRequestTextFromRouteError() throws Exception {
String input = "Failed to handle request regression_4828 HISTORIC_TIME from=s:33901510 tn:27825741 bd:false st:Winifred~Dr to=s:-1 d:false f:-1.0 x:-73.92752 y:40.696857 r:-1.0 cd:-1.0 fn:-1 tn:-1 bd:true 1 null false null on subject RoutingRequest";
final String answer = stringUtils.removeRequestTextFromError(input);
String expected = "Failed to handle request _ on subject RoutingRequest";
assertThat(answer, equalTo(expected));
}
which runs this method, but fails
public String removeRequestTextFromError(String answer) {
answer = answer.replaceAll("regression_\\d\\[.*?\\] on subject", "_ on subject");
return answer;
}
The input text stays the same and is not replaced with "_".
How can I change the pattern to fix this?

You are using the wrong regex. You are escaping [ and ] (not necessary at all; escaped, they match literal brackets that the input does not contain) and using \\d instead of \\d+, so only a single digit is matched. Also, you should use a positive look-ahead instead of actually matching and replacing the string "on subject".
Use:
public static void main(String[] args) {
String input = "Failed to handle request regression_4828 HISTORIC_TIME from=s:33901510 tn:27825741 bd:false st:Winifred~Dr to=s:-1 d:false f:-1.0 x:-73.92752 y:40.696857 r:-1.0 cd:-1.0 fn:-1 tn:-1 bd:true 1 null false null on subject RoutingRequest";
final String answer = input.replaceAll("regression_.* (?=on subject)", "_ ");
System.out.println(answer);
String expected = "Failed to handle request _ on subject RoutingRequest";
System.out.println(answer.equals(expected));
}
Output:
Failed to handle request _ on subject RoutingRequest
true
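The look-ahead idea can also be sketched with a precompiled Pattern. This variant is an assumption of mine rather than the answer's exact regex: it uses \\d+ plus a reluctant .*? and puts the leading space inside the look-ahead; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

final class RequestTextSketch {
    // (?= on subject) asserts that " on subject" follows without consuming it,
    // so the replacement does not have to repeat that text.
    private static final Pattern REQUEST_TEXT =
            Pattern.compile("regression_\\d+.*?(?= on subject)");

    static String strip(String message) {
        return REQUEST_TEXT.matcher(message).replaceFirst("_");
    }
}
```

A message without the "regression_" marker is returned unchanged.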

As an alternative to the answer given by @TheLostMind, you can try breaking your input into 3 pieces, the second piece being what you want to match and then replace.
Each quantity in parentheses, if matched, will be available as a capture group. Here is the regex with the capture groups labelled:
(.*)(regression_\\d+.*)( on subject.*)
$1  $2                 $3
You want to retain $1 and $3, with "_" in place of $2:
public String removeRequestTextFromError(String answer) {
    // Keep the text before and after the request details; replace the details with "_"
    return answer.replaceAll("(.*)(regression_\\d+.*)( on subject.*)", "$1_$3");
}

Can't seem to get ESAPI Validator getValidInput() Working for URL Parameters

I am trying to use ESAPI Encoder to identify and canonicalize URL-encoded query parameters. It sort of works, but not in the way the API seems to indicate. Here is my class, and below is the output it generates:
CODE
package test.test;
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Validator;
import org.owasp.esapi.errors.EncodingException;
import org.owasp.esapi.errors.IntrusionException;
import org.owasp.esapi.errors.ValidationException;
public class ESAPITester {
public static void main(String argsp[]) throws ValidationException,
IntrusionException, EncodingException {
String searchString = "-/+=_ !$*?@";
String singleEncoded = ESAPI.encoder().encodeForURL(searchString);
String doubleEncoded = ESAPI.encoder().encodeForURL(singleEncoded);
Validator validator = ESAPI.validator();
System.out.println("Searched : " + searchString);
System.out.println("Single encoded : " + singleEncoded);
System.out.println("Double encoded : " + doubleEncoded);
System.out.println("Decode from URL : " + ESAPI.encoder().decodeFromURL(singleEncoded));
System.out.println("Canonicalized : " + ESAPI.encoder().canonicalize(singleEncoded));
System.out.println("Valid input : " + validator.getValidInput("http",
searchString, "HTTPParameterValue", 100, true, true));
System.out.println("Valid from Encoded : " + validator.getValidInput("http",
singleEncoded, "HTTPParameterValue", 100, true, true));
}
}
OUTPUT
Searched : -/+=_ !$*?@
Single encoded : -%2F%2B%3D_+%21%24*%3F%40
Double encoded : -%252F%252B%253D_%2B%2521%2524*%253F%2540
Decode from URL : -/ =_ !$*?@
Canonicalized : -/+=_+!$*?@
Valid input : -/+=_ !$*?@
log4j:WARN No appenders could be found for logger (IntrusionDetector).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.owasp.esapi.errors.ValidationException: http: Invalid input. Please conform to regex ^[\p{L}\p{N}.\-/+=_ !$*?@]{0,1000}$ with a maximum length of 100
at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:144)
at org.owasp.esapi.reference.validation.StringValidationRule.checkWhitelist(StringValidationRule.java:160)
at org.owasp.esapi.reference.validation.StringValidationRule.getValid(StringValidationRule.java:284)
at org.owasp.esapi.reference.DefaultValidator.getValidInput(DefaultValidator.java:214)
at test.test.ESAPITester.main(ESAPITester.java:25)
My question is: Why does the getValidInput() not canonicalize the URL-encoded input parameter? I'm curious as to why the canonicalize() method does so, but getValidInput() with the final argument ('canonicalize') set to true doesn't.

So the question becomes:
why the 2nd validator.getValidInput() call throws an exception, when
all it is expected to do is to canonicalize the input and validate
that it matches the expected value. In other words, the direct call to
canonicalize() works, but the call to getValidInput() fails.
Something is very wrong here. In the version of HTTPParameterValue that you get from the OWASP source repo, the regex is ^[a-zA-Z0-9.\\-\\/+=@_ ]*$. Someone has modified HTTPParameterValue to look more like SafeString: ^[\\s\\p{L}\\p{N}.]{0,1024}$
See line 440.
This is wrong. Default ESAPI values shouldn't be changed; if you need custom rules, write a brand new validator.properties entry using the established pattern.
Your test will still fail, however, because the string decodes to -/+=_ !$*?@ and ? is a reserved character within HTTP queries.
From an earlier spec:
3.4. Query Component
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.
As to why the input fails against the regex you're running with, ^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$, read the code. At line 266 you'll see the affected method.
Here's what you want to look at:
public String getValid( String context, String input ) throws ValidationException
{
String data = null;
// checks on input itself
// check for empty/null
if(checkEmpty(context, input) == null)
return null;
if (validateInputAndCanonical)
{
//first validate pre-canonicalized data
// check length
checkLength(context, input);
// check whitelist patterns
checkWhitelist(context, input);
// check blacklist patterns
checkBlacklist(context, input);
// canonicalize
data = encoder.canonicalize( input );
} else {
//skip canonicalization
data = input;
}
// check for empty/null
if(checkEmpty(context, data, input) == null)
return null;
// check length
checkLength(context, data, input);
// check whitelist patterns
checkWhitelist(context, data, input);
// check blacklist patterns
checkBlacklist(context, data, input);
// validation passed
return data;
The regex gets checked before it even attempts to canonicalize your input.
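The effect of that ordering can be reproduced without ESAPI. The sketch below is a stand-in built only on the JDK: it uses the whitelist from the exception message (assuming the last character of the test string is @, as the %40 in the encoded form indicates) and a single URLDecoder pass in place of ESAPI's canonicalize(). The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

final class DecodeThenValidateSketch {
    // Whitelist taken from the exception message in the question.
    private static final Pattern WHITELIST =
            Pattern.compile("^[\\p{L}\\p{N}.\\-/+=_ !$*?@]{0,1000}$");

    static boolean matchesWhitelist(String value) {
        return WHITELIST.matcher(value).matches();
    }

    // Stand-in for canonicalization: one URL-decoding pass (not ESAPI's canonicalize()).
    static String decodeOnce(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With the encoded input from the question, matchesWhitelist(encoded) is false because '%' is not in the character class, while matchesWhitelist(decodeOnce(encoded)) is true. That mirrors why a whitelist check on the raw input rejects a value that would pass once decoded.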

If a string contains a letter, return the entire String

Weird one but:
Let's say you have a huge HTML page, and if the page contains an email address (looking for an @ sign) you want to return that email.
So far I know I need something like this:
String email;
if (myString.contains("@")) {
email = myString.substring("@")
}
I know how to get to the @, but how do I go back in the string to find what's before it, etc.?

If myString is the email string you received from the HTML page, then
you can return the same string if it contains an @, right? Something like below:
String email;
if (myString.contains("@")) {
email = myString;
}
What is the challenge here? Can you explain what the difficulty is?

This method will give you a list of all the email addresses contained in a string.
static ArrayList<String> getEmailAdresses(String str) {
ArrayList<String> result = new ArrayList<>();
Matcher m = Pattern.compile("\\S+?@[^. ]+(\\.[^. ]+)*").matcher(str.replaceAll("\\s", " "));
while(m.find()) {
result.add(m.group());
}
return result;
}

String email = null;
// Locate the @
int atLocation = myString.indexOf("@");
if (atLocation >= 0) {
    // Get the string before the @
    String start = myString.substring(0, atLocation);
    // Keep only what follows the last space before the @
    start = start.substring(start.lastIndexOf(" ") + 1);
    // Get the string after the @
    String end = myString.substring(atLocation + 1);
    // Keep only what precedes the first space after the @, if there is one
    int space = end.indexOf(" ");
    if (space >= 0) {
        end = end.substring(0, space);
    }
    // Stick it all together
    email = start + "@" + end;
}
This may be a little off as I've been writing javascript all day. :)

Rather than exact code, I would like to give you an approach.
Checking just for the @ symbol might not be appropriate, as it can appear in other contexts as well.
Search the internet for a regex pattern that matches an email address, or create your own.
(If you want, you can add a check for email providers as well.) Here is a link: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
Use the regex to find the index of the match in the string and extract the substring (the email, in your case).
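The approach above can be sketched as follows. The pattern is a deliberately simple one of my own choosing, not the one from the linked page, and it does not cover every address the RFCs allow; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailFinderSketch {
    // Simplified pattern: local part, '@', domain labels, then a TLD of 2+ letters.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            // m.start() and m.end() give the position of this match within text
            found.add(m.group());
        }
        return found;
    }
}
```

Because the match is anchored on the pattern rather than on surrounding spaces, it also works when the address is embedded in markup such as a mailto: link.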

Using regular expressions to rename a string

In java, I want to rename a String so it always ends with ".mp4"
Suppose we have an encoded link, looking as follows:
String link = www.somehost.com/linkthatIneed.mp4?e=13974etc...
So, how do I rename the link String so it always ends with ".mp4"?
link = www.somehost.com/linkthatIneed.mp4 <--- that's what I need the final String to be.

Just get the string until the .mp4 part using the following regex:
^(.*\.mp4)
and the first captured group is what you want.
Demo: http://regex101.com/r/zQ6tO5
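In Java, the capture-group approach might look like the sketch below. The class and method names are illustrative, and falling back to the unchanged link when there is no ".mp4" is my assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Mp4LinkSketch {
    // Group 1 captures everything from the start through the last ".mp4".
    private static final Pattern UP_TO_MP4 = Pattern.compile("^(.*\\.mp4)");

    static String trim(String link) {
        Matcher m = UP_TO_MP4.matcher(link);
        // Fall back to the unchanged link when it contains no ".mp4"
        return m.find() ? m.group(1) : link;
    }
}
```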

Another way to do this would be to split the string with ".mp4" as the separator and then append it again :)
Something like:
String splitChar = ".mp4";
String link = "www.somehost.com/linkthatIneed.mp4?e=13974etcrezkhjk";
// split() takes a regex, so quote the separator (java.util.regex.Pattern) to keep "." literal
String finalStr = link.split(Pattern.quote(splitChar))[0] + splitChar;
Easy to do ^^
PS: I prefer going through a regex, but that requires more knowledge of regular expressions ^^

Well you can also do this:
Match the string with the below regex
\?.*
and replace it with empty string.
Demo: http://regex101.com/r/iV1cZ8
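A one-line sketch of this in Java, using String.replaceFirst (the class and method names are illustrative). Note that it only drops the query part; it does not by itself guarantee that the result ends in ".mp4".

```java
final class StripQuerySketch {
    static String stripQuery(String link) {
        // "\\?.*" matches the first '?' and everything after it
        return link.replaceFirst("\\?.*", "");
    }
}
```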

Try the code below:
private String trimStringAfterOccurrence(String link, String occurrenceString) {
    int occurrenceIndex = link.indexOf(occurrenceString);
    if (occurrenceIndex < 0) {
        // occurrenceString not found: return the link unchanged
        return link;
    }
    String trimmedString = link.substring(0, occurrenceIndex + occurrenceString.length());
    System.out.println(trimmedString);
    return trimmedString;
}

Getting paramValue for paramName in specifed querystring using regex

I would like to write a Java utility method that returns the paramValue for a paramName in a specified query string.
Pattern p = Pattern.compile("\\&?(\\w+)\\= (I don't know what to put here) ");
public String getParamValue(String entireQueryString, String paramName)
{
Matcher m = p.matcher(entireQueryString);
while(m.find()) {
if(m.group(1).equals(paramName)) {
return m.group(2);
}
}
return null;
}
I will be invoking this method from my servlet,
String qs = request.getQueryString(); //action=initASDF&requestId=9078-32&redirect=http://www.mydomain.com?actionId=4343
System.out.println(getParamValue(qs, "requestId"));
The output should be, 9078-32

You can use a negated character class. See this other SO question: Regular Expressions and negating a whole character group
You'll need to match everything except an &, for example with [^&]*.
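Filling the gap in the question's pattern with a negated character class could look like this sketch. I have also replaced the optional leading \\&? with (?:^|&), so that a name is only recognised at the start of a key; that change and the class name are my own assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParamValueSketch {
    // Group 1 is the name, group 2 the value: every character up to the next '&'.
    private static final Pattern PARAM = Pattern.compile("(?:^|&)(\\w+)=([^&]*)");

    static String getParamValue(String entireQueryString, String paramName) {
        Matcher m = PARAM.matcher(entireQueryString);
        while (m.find()) {
            if (m.group(1).equals(paramName)) {
                return m.group(2);
            }
        }
        return null; // parameter not present
    }
}
```

With the query string from the question, requestId yields 9078-32, while actionId yields null because it is part of the redirect value rather than a parameter of its own.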

Use the proper API to do it: request.getParameter("requestId")

Could you split the string on ampersands (&) and then search the resulting array for the key (looking up to the equals sign)?
Here's a link to String.split(): http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
Here's the type of thing I'm talking about:
private static final String KEY_VALUE_SEPARATOR = "=";
private static final String QUERY_STRING_SEPARATOR = "&";
public String getParamValue(String entireQueryString, String paramName) {
    String[] fragments = entireQueryString.split(QUERY_STRING_SEPARATOR);
    for (String fragment : fragments) {
        int separatorIndex = fragment.indexOf(KEY_VALUE_SEPARATOR);
        // Skip fragments without "=", which would otherwise make substring() throw
        if (separatorIndex >= 0 && fragment.substring(0, separatorIndex).equalsIgnoreCase(paramName)) {
            return fragment.substring(separatorIndex + 1);
        }
    }
    throw new RuntimeException("can't find value");
}
The Exception at the end is a pretty rubbish idea but that's not really the important part of this.
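One way to avoid the exception is to parse the query string into a map once and let the caller handle a missing key. A minimal sketch, assuming that the first occurrence of a repeated key wins and that values are left URL-encoded; the class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryMapSketch {
    static Map<String, String> parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String fragment : queryString.split("&")) {
            // Limit 2 keeps any further '=' inside the value
            String[] pair = fragment.split("=", 2);
            if (pair.length == 2 && !pair[0].isEmpty()) {
                params.putIfAbsent(pair[0], pair[1]);
            }
        }
        return params;
    }
}
```

Map.get then returns null for a parameter that is not present, and the keys here are matched case-sensitively, unlike the equalsIgnoreCase comparison above.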
