How to trim characters from the beginning of a string. Android - java

I am writing an Android app and need some help.
I have a string that contains a URL. Sometimes I get extra text before the url and need to trim that off.
I get this "Some cool sitehttp://somecoolsite.com"
And want this "http://somecoolsite.com"
First, I need to detect if the string does not start with http:// and then if not, I need to trim everything in front of http://
Is there an easy way to do this?
I can do the first part.
if (url.startsWith("http://") == false) {
url.replace("", replacement)
}
Any help?

To check if the string starts with http:// you do
if (inputUrl.startsWith("http://")) {
...
}
To trim off the prefix up until the first occurrence of http:// you do
int index = inputUrl.indexOf("http://");
if (index != -1)
inputUrl = inputUrl.substring(index);
The API documentation for the String class should provide you with all the information you need here.

Use this:
if (inputURL.contains("http://")) {
    inputURL = inputURL.substring(inputURL.indexOf("http://"));
}

Another option would be:
inputUrl = inputUrl.replaceAll(".*http://","http://");
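It should work in most cases, though the regular expression is probably a bit less efficient. Note that . does not match line terminators by default, so text on earlier lines of a multi-line string is not removed.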

Please note that provided answers assume that the string will be in lower case (no "HTTP" or "Http") and that no strings contain https://
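To cover the cases mentioned in the note above, a case-insensitive pattern that also accepts https:// could be used. This is only a minimal sketch: the class name UrlPrefixTrimmer and the method name trimBeforeScheme are made up for illustration, and it assumes the first scheme found in the string is the start of the URL you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPrefixTrimmer {
    // Matches "http://" or "https://" in any letter case.
    private static final Pattern SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    // Returns the input starting at the first scheme,
    // or the input unchanged when no scheme is found.
    static String trimBeforeScheme(String input) {
        Matcher m = SCHEME.matcher(input);
        return m.find() ? input.substring(m.start()) : input;
    }

    public static void main(String[] args) {
        System.out.println(trimBeforeScheme("Some cool sitehttp://somecoolsite.com"));
        System.out.println(trimBeforeScheme("Some cool siteHTTPS://somecoolsite.com"));
    }
}
```

The original letter case of the scheme is kept as it appears in the input.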

Related

Regex Remove everything after / except when certain string exists

I have certain URLs that I am trying to shorten. I want to remove everything after the / of the URL, except when the URL is on plus.google.com
For example:
www.somerubbish.com/about/64848372.meh.php will shorten to www.somerubbish.com
plus.google.com/756934692387498237/about will be left untouched
Any ideas on how I can do this?
My failed attempt is below. I know that the | means OR, so that's why it also matches the / in the first line.
\b!(?:plus.google.com\/.*)\b|\b(?:\/.*)\b
http://regexr.com/3cv6n
Ok I have it.
The answer was to use a negative lookbehind and remove the pipe
(?<!plus.google.com)\b(?:\/.*)\b
https://regex101.com/r/pU3hU4/1
What's wrong with:
if( ! url.contains("plus.google.com")) {
url = StringUtils.substringBefore(url, "/");
}
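If you would rather not pull in Apache Commons Lang for StringUtils, roughly the same idea can be sketched with plain String methods. The class and method names here are hypothetical, and the sketch assumes the URLs have no scheme (as in the question); with a leading "http://" the first "/" would be inside the scheme.

```java
class UrlShortener {
    // Keeps only the part before the first '/', unless the URL
    // contains plus.google.com, in which case it is left untouched.
    static String shorten(String url) {
        if (url.contains("plus.google.com")) {
            return url;
        }
        int slash = url.indexOf('/');
        return slash == -1 ? url : url.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(shorten("www.somerubbish.com/about/64848372.meh.php"));
        System.out.println(shorten("plus.google.com/756934692387498237/about"));
    }
}
```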

Website/URL Validation Regex in JAVA

I need a regex string to match URL starting with "http://", "https://", "www.", "google.com"
The code I tried is:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
Matcher m;
m=p.matcher(urlAddress);
But this code can only match URLs such as "http://www.google.com".
I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirement. Will someone please help me? Thank you.
You need to make the (http://|https://) part of your regex optional.
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
You can use the Apache Commons Validator library (org.apache.commons.validator.routines.UrlValidator) to validate a URL:
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
And use:
urlValidator.isValid(urlAddress)
Then there is no need for a regex.
Link:
https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html
If you use Java, I recommend this regex (I wrote it myself):
^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as a Java string
To explain:
^ = line start
(https?:\/\/)? = "http://" or "https://" may occur.
(www\.)? = "www." may occur.
([\w]+\.)+ = a word ([a-zA-Z0-9_]) followed by a dot has to occur one or more times. (Extend here if you need special characters like ü, ä, ö or others in your URL; remember to use IDN.toASCII(url) if you use special characters. If you need to know which characters are legal in general: https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url)
[\w]{2,63} = a word ([a-zA-Z0-9_]) with 2 to 63 characters has to occur exactly once. (A TLD (top-level domain, for example .com) cannot be shorter than 2 or longer than 63 characters.)
\/? = a "/" character may occur. (Some people or servers put a / at the end.)
$ = line end
If you extend it with special characters, it could look like this:
^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as a Java string
Avinash Raj's answer is not fully correct.
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
The dots are not escaped, which means each one matches any character. My version is also simpler, and I have never heard of a domain like "test..com" (which that regex actually matches).
Demo: https://regex101.com/r/vM7wT6/279
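The effect of the unescaped dots can be checked quickly in Java. The sketch below compares the two patterns quoted above on a few sample strings; the class name DotEscapingDemo and the helper matches are made up for this illustration, and the forward slashes are written without backslashes because Java does not require escaping them.

```java
import java.util.regex.Pattern;

class DotEscapingDemo {
    // Pattern with unescaped dots: each '.' matches any character.
    static final Pattern UNESCAPED = Pattern.compile(
            "^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    // Pattern with escaped dots: each "\\." matches a literal dot only.
    static final Pattern ESCAPED = Pattern.compile(
            "^(https?://)?(www\\.)?(\\w+\\.)+\\w{2,63}/?$");

    // True when the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"test..com", "http://www.google.com", "google.com"};
        for (String s : samples) {
            System.out.println(s + " unescaped=" + matches(UNESCAPED, s)
                    + " escaped=" + matches(ESCAPED, s));
        }
    }
}
```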
Edit:
Since I saw that some people need a regex which also matches server directories, I wrote this:
^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$
This may not be the best one, since I didn't spend much time on it, but maybe it helps someone. You can see how it works here: https://regex101.com/r/vM7wT6/700
It also matches urls like "hello.to/test/whatever.cgi"
A Java-compatible version of @Avinash's answer would be:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m;
m=p.matcher(urlAddress);
boolean matches = m.matches();
pattern="w{3}\.[a-z]+\.?[a-z]{2,3}(|\.[a-z]{2,3})"
This will only accept addresses such as www.google.com and www.google.co.in.
// I use this:
static boolean esURL(String cadena) {
    // Accepts http://, https://, ftp://, file:// or www. followed by typical URL characters
    return cadena.matches("\\b(https?://|ftp://|file://|www\\.)[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
}

Replace design pattern in query string

I currently have some URLs like this:
?param=value&offset=19&size=100
Or like this:
?offset=45&size=50&param=lol
In each case I would like to remove the "offset" and the "value". I'm using a regex, but I don't really understand how it works. Can you please help me with that?
I also want to get the values of both the offset and the size.
Here is my attempt:
\(?|&)offset=([0-9])[*]&size=([0-9])[*]\
But it doesn't work at all!
Thanks.
Assuming this is JavaScript and you only want to remove the offset parameter:
str.replace(/offset=[0-9]*&?/, "")
For Java:
str=str.replaceAll("offset=[0-9]*&?","");
//to remove & and ? at the end in some cases
if (str.endsWith("?") || str.endsWith("&"))
str=str.substring(0,str.length()-1);
Without regex:
String queryString ="param=value&offset=19&size=100";
String[] splitters = queryString.split("&");
for (String str : splitters) {
if (str.startsWith("offset")) {
String offset = str.substring(str.indexOf('=') + 1); // do likewise for size
System.out.println(offset); //prints 19
}
}
If you need to use a regular expression for this then try this string in java for the regular expression (replace with nothing):
"(?>(?<=\\?)|&)(?>value|offset)=.*?(?>(?=&)|$)"
It will remove any parameter in your URL that has the name 'offset' or 'value'. It will also conserve any required parameter tokens for other parameters in the URL.
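Putting the two requirements together (reading offset and size, and removing a parameter), one possible approach is to parse the query string into a map and rebuild it. This is only a sketch under some assumptions: the names QueryStringDemo, parse and without are hypothetical, values are not URL-decoded, and a repeated parameter name keeps only its last value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryStringDemo {
    // Parses "?a=1&b=2" (leading '?' optional) into an insertion-ordered map.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        String q = query.startsWith("?") ? query.substring(1) : query;
        for (String pair : q.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq == -1 ? pair : pair.substring(0, eq);
            String value = eq == -1 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    // Rebuilds the query string without the named parameter.
    // Returns an empty string when no parameters remain.
    static String without(String query, String nameToRemove) {
        StringJoiner joined = new StringJoiner("&", "?", "");
        joined.setEmptyValue("");
        for (Map.Entry<String, String> e : parse(query).entrySet()) {
            if (!e.getKey().equals(nameToRemove)) {
                joined.add(e.getKey() + "=" + e.getValue());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String query = "?param=value&offset=19&size=100";
        Map<String, String> params = parse(query);
        System.out.println("offset=" + params.get("offset") + ", size=" + params.get("size"));
        System.out.println(without(query, "offset"));
    }
}
```

Rebuilding from the map avoids leftover "?&" or trailing "&" sequences that a plain regex replacement can leave behind.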

Java regular expression on matching asterisk only when it is the last character

Can anyone spot the error in this code?
String value = "/files/etc/hosts/*";
if (value.matches("\\*$")) {
System.out.println("MATCHES!");
}
I am trying to do some operation when the last character of a string is an asterisk.
The syntax looks correct to me, I tested it on http://regexpal.com/
Thanks in advance!
Why not just use:
if (value.endsWith("*")) {
String.matches() only returns true if the regex matches the entire CharSequence.
Try either this:
value.matches(".*?\\*$")
Or use a Pattern object.
EDIT: Per comment request.
Pattern glob = Pattern.compile("\\*$");
if (glob.matcher(value).find()) {
System.out.println("MATCHES!");
}
You need to match everything in the String when using String#matches:
if (value.matches(".*\\*$")) {
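The difference between the approaches above can be seen side by side in a small sketch. The class name MatchesVsFind and the helper endsWithStar are invented for this example; the sample value is the one from the question.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern TRAILING_STAR = Pattern.compile("\\*$");

    // Uses find(), which looks for the pattern anywhere in the input
    // instead of requiring the whole input to match.
    static boolean endsWithStar(String value) {
        return TRAILING_STAR.matcher(value).find();
    }

    public static void main(String[] args) {
        String value = "/files/etc/hosts/*";
        // matches() must consume the entire string, so this one fails.
        System.out.println(value.matches("\\*$"));
        // Allowing any prefix lets the whole string match.
        System.out.println(value.matches(".*\\*$"));
        // find() succeeds on a partial match at the end.
        System.out.println(endsWithStar(value));
        // No regex needed at all for this check.
        System.out.println(value.endsWith("*"));
    }
}
```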

Regex to Extract First Part of URL

I need a java regex to extract parts of a URL.
For example, take the following URLs:
http://localhost:81/example
https://test.com/test
http://test.com/
I would want my regex expression to return:
http://localhost:81
https://test.com
http://test.com
I will be using this in a Java patcher.
This is what I have so far, problem is it takes the whole URLs:
^https?:\/\/(?!.*:\/\/)\S+
import java.net.URL;
//snip
URL url = new URL(urlString); // throws MalformedURLException for an invalid URL
return url.getProtocol() + "://" + url.getAuthority();
The right tool for the right job.
Building off your attempt, try this:
^https?://[^/]+
I'm assuming that you want to capture everything until the first / after http://? (That's what I was getting from your examples - if not, please post some more).
Are these URLs given as one input, or are each a different string?
Edit: It was pointed out that there were unnecessary escapes, so fixed to a more condensed version
Language independent answer:
For the whitespace: replace /^\s+/ with the empty string.
For removing the path information from the URL, if you can assume there aren't any slashes in the path (i.e. you're not dealing with http://localhost:81/foo/bar/baz), replace /\/[^\/]+$/ with the empty string. If there might be more slashes, you might try something like replacing /(^\s*.*:\/\/[^\/]+)\/.*/ with $1.
A simple one: ^(https?://[^/]+)
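For completeness, here is how that pattern might be applied in Java with a Matcher. It is a minimal sketch: the class name UrlOrigin and the method extract are hypothetical, and it assumes each URL is passed in as a separate string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlOrigin {
    // Scheme plus everything up to, but not including, the first '/' after it.
    private static final Pattern ORIGIN = Pattern.compile("^https?://[^/]+");

    // Returns the scheme-and-host part, or empty when the input
    // does not start with http:// or https://.
    static Optional<String> extract(String url) {
        Matcher m = ORIGIN.matcher(url);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost:81/example",
            "https://test.com/test",
            "http://test.com/"
        };
        for (String url : urls) {
            System.out.println(extract(url).orElse("(no match)"));
        }
    }
}
```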
