URL Pattern in spark

I'm trying to make a filter that matches the following URLs:
/foo and /foo/*
So anything under /foo/ and also the base case /foo
I have this filter:
import spark.Spark;

Spark.before("/foo/*", (request, response) -> {
    String ticket = request.cookie("session");
    if (ticket == null) {
        Spark.halt(302);
    }
});
But of course this filter does not run when I navigate to /foo.
I tried the following patterns, with no luck:
/foo*
/foo.*
/foo/
Is there any way to achieve this? Or perhaps a way to use a list of URLs, so that I can assign both URLs to the same filter?
And please don't suggest storing the function in a variable so that I can use it twice, because I don't think that is clean at all.

According to https://github.com/perwendel/spark/blob/1ecd428c8e2e5b0d1b8f221e9bf9e82429bd73bb/src/main/java/spark/route/RouteEntry.java#L31 (where the path matching takes place), it does not look like what you want to do is possible.
The RouteEntry code splits both the given pattern and the given URL at the '/' character, and then looks for a "match" (equality or wildcard) for each of the components.
Here's a more detailed explanation:
The URL /foo/blah has 2 parts (in the terminology of the RouteEntry code linked above), while /foo has 1 part. For a pattern to match the first, it must have 2 parts: /foo/* is the only one that makes sense. But because this pattern has 2 parts, /foo fails the if checks on both lines 49 and 78. The only special case is the hack on line 71, which should make the pattern /foo/* match the URL /foo/, but not /foo.
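A deliberately simplified model of the matching rules described above may make the part-counting argument easier to see. This is a hypothetical sketch, not Spark's RouteEntry code: it ignores :param segments and the trailing-slash hack on line 71, and RouteMatchSketch and matches are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of Spark's path matching; not the real RouteEntry code.
// It ignores ":param" segments and the trailing-slash hack mentioned above.
class RouteMatchSketch {

    // Split on '/' and drop empty segments: "/foo/bar" -> [foo, bar], "/foo" -> [foo].
    static List<String> parts(String path) {
        List<String> result = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    static boolean matches(String pattern, String url) {
        // Without a trailing "*", a pattern and URL where only one ends in '/' do not match.
        if (!pattern.endsWith("*") && pattern.endsWith("/") != url.endsWith("/")) {
            return false;
        }
        List<String> p = parts(pattern);
        List<String> u = parts(url);
        boolean trailingWildcard = !p.isEmpty() && p.get(p.size() - 1).equals("*");
        // A trailing "*" lets the URL have more parts than the pattern, never fewer.
        if (trailingWildcard ? u.size() < p.size() : u.size() != p.size()) {
            return false;
        }
        // Each pattern part must be "*" or equal to the URL part at the same position.
        for (int i = 0; i < p.size(); i++) {
            String part = p.get(i);
            if (!part.equals("*") && !part.equals(u.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] patterns = {"/foo/*", "/foo*", "/foo.*", "/foo/"};
        for (String pattern : patterns) {
            System.out.println(pattern + " matches /foo: " + matches(pattern, "/foo")
                    + ", /foo/bar: " + matches(pattern, "/foo/bar"));
        }
    }
}
```

In this model none of the four patterns from the question matches /foo, and only /foo/* matches /foo/bar, because /foo has fewer parts than /foo/* and "foo*" or "foo.*" is compared literally against "foo".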

Related

Regex to validate wildcard domains with special conditions

I am looking to validate wildcard domains against the Samsung Knox Firewall. Below are the full criteria for all domains:
A list of URLs for specified domain names to block DNS resolution. The format of the URL must comply with RFC standards and must also match one of the following rules:
Full URL: "www.google.com"
Partial URL: "android.com"; "www.samsung"; "google". The character "*" (wildcard) must be at the beginning and/or at the end of the URL; otherwise the URL is invalid.
Special case, matches any URL : "*"
Valid domains
The following examples are considered valid by Knox.
*.test.com
*test.com
*test
*test*
test.*
test1.test.*
Invalid domains
The following examples are considered invalid by Knox.
*test-
*test.
*test.com-
*test-.com
Is anybody able to offer a hand? I am struggling to accommodate all of the requirements with this one.
Current code:
(?=^\*|.*\*$)^(?:\*\.?)?(?:(?:[a-z0-9-]+(?(?=\.)(?<!-)\.(?!-)))+[a-z]+)(?:\.?\*)?$
Edit: Actually, it looks like conditional regex may not even be supported in Java.

BASED ON YOUR PROVIDED EXAMPLES
If you're trying to pre-filter the domains, this one matches all of your "Valid" examples and rejects all of your "Invalid" examples:
^[\w*]([\w*-]+[\w*])?(\.[\w*]([\w*-]+[\w*])?)*$
If you are testing a file or a newline-separated field that contains all of these, you may want to use the "multiline" flag, like so:
(?m)^[\w*]([\w*-]+[\w*])?(\.[\w*]([\w*-]+[\w*])?)*$
Since you tagged Java, that would be encoded as a Java string as follows:
"(?m)^[\\w*]([\\w*-]+[\\w*])?(\\.[\\w*]([\\w*-]+[\\w*])?)*$"
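As a sketch of how the multiline version might be used from Java, the hypothetical helper below runs find() over a newline-separated block and collects each line that passes. It assumes one domain per line; DomainPrefilter and passingLines are illustrative names, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainPrefilter {
    // The multiline pre-filter from the answer above, as a Java string literal.
    static final Pattern PREFILTER =
            Pattern.compile("(?m)^[\\w*]([\\w*-]+[\\w*])?(\\.[\\w*]([\\w*-]+[\\w*])?)*$");

    // Hypothetical helper: with (?m), ^ and $ anchor at each line, so every find() is one whole line.
    static List<String> passingLines(String block) {
        List<String> result = new ArrayList<>();
        Matcher m = PREFILTER.matcher(block);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(passingLines("*.test.com\n*test-\ntest.*\n*test.\n"));
    }
}
```

Lines such as *test- and *test. produce no match at all, so they are simply left out of the result.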
EDIT - Matching all the rules, in addition to your provided examples
This expression seems to work:
^(\*|(\*|\*\.)?\w+(\.\w+)*(\.\*|\*)?)$
Matching/Non-matching examples:
MATCHING        NON-MATCHING
------------    ------------
*               *test-
*.test.com      *test.
*test.com       *test.com-
*test           *test-.com
*test*          test*.com
test.*          test.*com
test1.test.*    -test.com
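To check the final expression against both columns, a small Java harness like the following could be used. KnoxDomainCheck and isValid are hypothetical names; the pattern is the one above written as a Java string literal. Since matches() already requires the whole string to match, the ^ and $ anchors are redundant here but harmless.

```java
import java.util.List;
import java.util.regex.Pattern;

class KnoxDomainCheck {
    // The final expression above as a Java string: either a lone "*", or word labels
    // joined by dots with an optional "*" or "*." prefix and an optional "*" or ".*" suffix.
    static final Pattern KNOX_DOMAIN =
            Pattern.compile("^(\\*|(\\*|\\*\\.)?\\w+(\\.\\w+)*(\\.\\*|\\*)?)$");

    // Hypothetical helper: matches() requires the whole input to match.
    static boolean isValid(String domain) {
        return KNOX_DOMAIN.matcher(domain).matches();
    }

    public static void main(String[] args) {
        List<String> valid = List.of("*", "*.test.com", "*test.com", "*test", "*test*", "test.*", "test1.test.*");
        List<String> invalid = List.of("*test-", "*test.", "*test.com-", "*test-.com", "test*.com", "test.*com", "-test.com");
        for (String d : valid) {
            System.out.println("expected valid:   " + d + " -> " + isValid(d));
        }
        for (String d : invalid) {
            System.out.println("expected invalid: " + d + " -> " + isValid(d));
        }
    }
}
```

Note that \w does not include the hyphen, so a domain such as my-site.com is also rejected by this expression; whether that matters depends on the domains you need to accept.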

Using regex to find chars in a string and replace

When reading an incoming request in my network-based app, I get a string like this:
'post http://a.com\r\nHost: a.com\r\n'
The issue is that the host is always changing, so I need to replace it with my own defined host. To accomplish that I tried using a regex, but I am stuck trying to find the 'Host: a.com' characters in the string and replace them with a defined value.
I tried adapting this example www.javamex.com/tutorials/regular_expressions/search_replace_loop.shtml#.VUWvt541jqB, changing the compiled pattern to :([\\d]+), but the string still remains unchanged.
My goal is to replace given chars in a string with a defined value and returning the new string with the defined value.
Any pointers?
EDIT:
Sample of a typical incoming request:
Post http://example.com\r\nHost: example.com\r\nConnection: close\r\n
Another incoming request might take this form:
GET http://example2.net\r\nContent-Length: 2\r\nConnection: close\r\nHost: example2.net\r\n
I want to turn them into these forms:
Post http://example.com\r\nHost: mycustomhostvalue.com\r\nConnection: close\r\n
GET http://example2.net\r\nContent-Length: 2\r\nConnection: close\r\nHost: mycustomhostvalue.com\r\n

Use a regex to replace it, like this:
content = content.replaceAll("Host:\\s*(\\w)*\\.\\w*", "Host: newhost.com");
This replaces "Host:" together with a simple name.tld host after it (such as example.com) with "Host: newhost.com".
Note: as per the comment by cfqueryparam, you may want to use a regex like this to cover .co.uk and such:
Host:\\s*.*?(?=\\\\r\\\\n)
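An alternative sketch, assuming the \r\n in the request are real carriage-return/line-feed characters: anchor on the start of the Host line and replace everything up to the end of that line. HostRewrite and replaceHost are hypothetical names, and the case-insensitive flag is an added assumption based on HTTP header names being case-insensitive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostRewrite {
    // (?i): header names are case-insensitive; (?m): ^ matches at the start of every line.
    // [^\r\n]* stops at the end of the Host line, so hosts like example.co.uk are replaced whole.
    private static final Pattern HOST_LINE = Pattern.compile("(?im)^Host:[^\\r\\n]*");

    // Hypothetical helper; assumes real CR/LF characters separate the header lines.
    static String replaceHost(String request, String newHost) {
        // quoteReplacement keeps '$' or '\' in newHost from being read as group references.
        return HOST_LINE.matcher(request).replaceAll(Matcher.quoteReplacement("Host: " + newHost));
    }

    public static void main(String[] args) {
        String request = "GET http://example2.net\r\nContent-Length: 2\r\nConnection: close\r\nHost: example2.net\r\n";
        System.out.print(replaceHost(request, "mycustomhostvalue.com"));
    }
}
```

Anchoring with ^ also keeps a header such as X-Forwarded-Host from being rewritten, since its "Host:" is not at the start of a line.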

java regex matcher results != to notepad++ regex find result

I am trying to extract data out of a website access log as part of a java program. Every entry in the log has a url. I have successfully extracted the url out of each record.
Within the url, there is a parameter that I want to capture so that I can use it to query a database. Unfortunately, it doesn't seem that the web developers used any one standard to write the parameter's name.
The parameter is usually called "course_id", but I have also seen "courseId", "course%3DId", "course%253Did", etc. The format for the parameter name and value is usually course_id=_22222_1, where the number I want is between the "_" and "_1". (The value is always the same, even if the parameter name varies.)
So, my idea was to use the regex /^.*course_id[^_]*_(\d*)_1.*$/i to find and extract the number.
In java, my code is
java.util.regex.Pattern courseIDPattern = java.util.regex.Pattern.compile(".*course[^i]*id[^_]*_(\\d*)_1.*", java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher courseIDMatcher = courseIDPattern.matcher(_url);
_courseID = "";
if(courseIDMatcher.matches())
{
_courseID = retrieveCourseID(courseIDMatcher.group(1));
return;
}
This works for many of the records. However, for some records the course ID is not captured, even though the parameter is in the URL. One such example is this record:
/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2
However, I used Notepad++ to do a regex replace on this (in fact, every) URL using the regex above, and the URL was successfully replaced by the course ID, implying that the regex itself is correct.
Am I doing something wrong in the java code, or is the java matcher broken?
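A hedged sketch, not a confirmed diagnosis of the failure above: using Matcher.find() instead of matches() removes the need for the leading and trailing .*, because find() searches for the pattern anywhere in the string. One behaviour worth checking: by default . does not match line terminators, so if a stored URL carried a stray \r or \n, the matches() version would fail on it while find() would still succeed. CourseIdExtractor and extract are hypothetical names, and \d+ replaces \d* so that an empty number is never captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CourseIdExtractor {
    // Same idea as the question's pattern, minus the surrounding ".*" that matches() needed.
    private static final Pattern COURSE_ID =
            Pattern.compile("course[^i]*id[^_]*_(\\d+)_1", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the number between "_" and "_1", or "" if none is found.
    static String extract(String url) {
        Matcher m = COURSE_ID.matcher(url);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1"
                + "&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2"));
    }
}
```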

Website/URL Validation Regex in JAVA

I need a regex to match URLs starting with "http://", "https://", or "www.", as well as bare domains like "google.com".
The code I tried is:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
Matcher m;
m=p.matcher(urlAddress);
but this code can only match URLs such as "http://www.google.com".
I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirement. Will someone please help me? Thank you.

You need to make the (http://|https://) part of your regex optional:
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

You can use the Apache Commons Validator library (org.apache.commons.validator.routines.UrlValidator) to validate a URL:
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
And then use:
urlValidator.isValid(yourUrl)
Then there is no need for a regex.
Link:
https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html

If you use Java, I recommend this regex (I wrote it myself):
^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as a Java String
Explanation:
^ = line start
(https?://)? = "http://" or "https://" may occur.
(www\.)? = "www." may occur.
([\w]+\.)+ = one or more labels, each a word ([a-zA-Z0-9_]) followed by a dot. (Extend this if you need special characters like ü, ä, ö or others in your URL, and remember to use IDN.toASCII(url) if you do. If you need to know which characters are legal in general, see https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url)
[\w]{2,63} = a word ([a-zA-Z0-9_]) with 2 to 63 characters has to occur exactly once. (A TLD, or top-level domain such as .com, cannot be shorter than 2 or longer than 63 characters.)
/? = a "/" character may occur. (Some people or servers put a / at the end.)
$ = line end
-
If you extend it with special characters, it could look like this:
^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as a Java String
The answer of Avinash Raj is not fully correct:
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
The dots are not escaped, which means they match any character. My version is also simpler, and I have never heard of a domain like "test..com" (which that regex actually matches).
Demo: https://regex101.com/r/vM7wT6/279
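A minimal sketch to compare the two patterns side by side, using the Java string from this answer (with www\\. escaped) and a Java string version of Avinash Raj's regex. UrlRegexCompare is a hypothetical name. With matches(), "test..com" is rejected by the first pattern but accepted by the second, which is the unescaped-dot problem described above.

```java
import java.util.regex.Pattern;

class UrlRegexCompare {
    // The recommended pattern from this answer, as a Java string.
    static final Pattern RECOMMENDED =
            Pattern.compile("^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$");
    // Avinash Raj's pattern with its unescaped dots, each of which matches any character.
    static final Pattern UNESCAPED_DOTS =
            Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    public static void main(String[] args) {
        String[] inputs = {"http://www.google.com", "https://google.com", "www.google.com", "google.com", "test..com"};
        for (String input : inputs) {
            System.out.println(input
                    + "  recommended=" + RECOMMENDED.matcher(input).matches()
                    + "  unescaped=" + UNESCAPED_DOTS.matcher(input).matches());
        }
    }
}
```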
Edit:
Since some people need a regex that also matches server directories, I wrote this:
^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$
While this may not be the best one, since I didn't spend much time on it, maybe it helps someone. You can see how it works here: https://regex101.com/r/vM7wT6/700
It also matches URLs like "hello.to/test/whatever.cgi".

A Java-compatible version of Avinash's answer would be:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m;
m=p.matcher(urlAddress);
boolean matches = m.matches();

String pattern = "w{3}\\.[a-z]+\\.?[a-z]{2,3}(|\\.[a-z]{2,3})";
This will only accept addresses such as www.google.com and www.google.co.in.

// I use this:
static boolean esURL(String cadena) {
    // esURL = "is URL", cadena = the input string; String.matches() requires the whole string to match.
    boolean bandera = cadena.matches("\\b(https?://|ftp://|file://|www.)[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
    return bandera;
}

Regex to Extract First Part of URL

I need a Java regex to extract parts of a URL.
For example, take the following URLs:
http://localhost:81/example
https://test.com/test
http://test.com/
I would want my regex expression to return:
http://localhost:81
https://test.com
http://test.com
I will be using this in a Java matcher.
This is what I have so far; the problem is that it matches the whole URL:
^https?:\/\/(?!.*:\/\/)\S+

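import java.net.MalformedURLException;
import java.net.URL;

static String baseUrl(String urlString) throws MalformedURLException {
    URL url = new URL(urlString);
    return url.getProtocol() + "://" + url.getAuthority();
}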
The right tool for the right job.

Building off your attempt, try this:
^https?://[^/]+
I'm assuming you want to capture everything up to the first / after http://. (That's what I took from your examples; if not, please post some more.)
Are these URLs given as one input, or are each a different string?
Edit: It was pointed out that there were unnecessary escapes, so I changed it to a more condensed version.
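A minimal Java sketch of the condensed pattern, applied to the three example URLs. find() is used rather than matches(), because the pattern only describes the beginning of the URL; BaseUrlExtractor and base are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BaseUrlExtractor {
    // Scheme plus everything up to (not including) the first '/' after "://".
    private static final Pattern BASE = Pattern.compile("^https?://[^/]+");

    static Optional<String> base(String url) {
        Matcher m = BASE.matcher(url);
        // find() only needs the start of the string to match; the path is ignored.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String url : new String[] {"http://localhost:81/example", "https://test.com/test", "http://test.com/"}) {
            System.out.println(base(url).orElse("(no match)"));
        }
    }
}
```

For http://localhost:81/example this prints http://localhost:81; the port stays included because [^/]+ only stops at a slash.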

Language-independent answer:
For the whitespace: replace /^\s+/ with the empty string.
For removing the path information from the URL, if you can assume there aren't any slashes in the path (i.e. you're not dealing with http://localhost:81/foo/bar/baz), replace /\/[^\/]+$/ with the empty string. If there might be more slashes, you might try something like replacing /(^\s*.*:\/\/[^\/]+)\/.*/ with $1.

A simple one: ^(https?://[^/]+)
