I'm using this method to find plain-text URLs in some HTML and turn them into links:
private String fixLinks(String body) {
String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
body = body.replaceAll(regex, "<a href=\"$1\">$1</a>");
Log.d(TAG, body);
return body;
}
However, no URLs are replaced in the HTML. The regular expression does match URLs in other regular expression testers. What's going on?
The ^ anchor means the regex can only match at the start of the string. Try removing it.
Also, it looks like you mean $0 rather than $1, since you want the entire match and not the first capture group, which is (https?|ftp|file).
In summary, the following works for me:
private String fixLinks(String body) {
String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
body = body.replaceAll(regex, "<a href=\"$0\">$0</a>");
Log.d(TAG, body);
return body;
}
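To see the effect of the anchor in isolation, here is a minimal sketch. It assumes the replacement is meant to wrap each URL in an anchor tag; the class name LinkFixerDemo and the sample sentence are made up for illustration, and Log.d is left out so it runs outside Android.

```java
import java.util.regex.Pattern;

class LinkFixerDemo {
    // Character classes copied from the answer above; no leading ^ anchor.
    private static final String URL_BODY =
            "(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";

    private static final Pattern UNANCHORED = Pattern.compile(URL_BODY);
    private static final Pattern ANCHORED = Pattern.compile("^" + URL_BODY);

    // $0 is the whole match; $1 would only be the scheme (http, ftp, ...).
    static String fixLinks(String body) {
        return UNANCHORED.matcher(body).replaceAll("<a href=\"$0\">$0</a>");
    }

    // Same replacement with the ^ anchor kept, for comparison.
    static String fixLinksAnchored(String body) {
        return ANCHORED.matcher(body).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        String body = "See http://example.com/page?x=1, then reply.";
        System.out.println(fixLinksAnchored(body)); // unchanged: URL is not at the start
        System.out.println(fixLinks(body));         // URL wrapped in an <a> tag
    }
}
```

The trailing comma stays outside the link because the last character class of the regex does not contain it.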
Related
I'm trying to create a redirect URL for my client. We have a service where you specify "fromUrl" -> "toUrl", and it uses a Java regex Matcher. But I can't get it to include the token when it converts the URL. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns, such as " ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*) etc", but none of them seems to work.
I'm not that familiar with regex. How can I solve this?
This can easily be done with a simple string replace, but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf ";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);
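For comparison, the plain string replace mentioned above could look like the sketch below. The class and method names are made up, and it assumes the path segment is exactly /fromurl/; String.replace treats both arguments as literal text, so nothing in the token needs escaping.

```java
class LiteralReplaceDemo {
    // String.replace works on literal text, not on a regex.
    static String redirect(String url) {
        return url.replace("/fromurl/", "/tourl/");
    }

    public static void main(String[] args) {
        String url = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB";
        System.out.println(redirect(url)); // /tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB
    }
}
```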
If I understand you correctly you need something like this?
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This will replace everything between the first and the last forward slash.
Can you describe the format of the original URL in more detail? The regular expression depends on it.
Assuming only the first occurrence of fromurl needs to be replaced, the following code is enough:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpresion = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpresion);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
to = from.replaceAll(regularExpresion, "$1tourl$3");
}
NOTE: the pre and pos parts are placeholders, because I don't know the real format of the URL
NOTE 2: $1 and $3 refer to the first and the third group
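As a concrete instance of the template above, here is a sketch with the placeholders filled in. The choices are assumptions for illustration only: pre is taken to be the leading slash and pos to be everything from the next slash onward; the class name GroupRedirectDemo is made up as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRedirectDemo {
    // Hypothetical stand-ins for <<pre>> and <<pos>>:
    // group 1 = leading slash, group 2 = fromurl, group 3 = rest of the URL.
    private static final Pattern FROM = Pattern.compile("^(/)(fromurl)(/.*)$");

    static String redirect(String from) {
        Matcher matcher = FROM.matcher(from);
        // matches() succeeds only if the whole input fits the pattern.
        if (matcher.matches()) {
            // Keep groups 1 and 3, swap group 2 for the new segment.
            return matcher.replaceFirst("$1tourl$3");
        }
        return from; // leave URLs of any other shape untouched
    }

    public static void main(String[] args) {
        System.out.println(redirect("/fromurl/login?token=abc%2Fdef"));
        // /tourl/login?token=abc%2Fdef
    }
}
```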
The existing answers should solve the issue, and some are similar, but the solution below may also help. It uses a fairly simple regex and assumes the input always has the same format as your example:
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
If the format changes and you need to adjust the regex, you can write a test to check that it still works for the expected inputs and outputs:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}
I tried searching for something similar, and couldn't find anything. I'm having difficulty trying to replace a few characters after a specific part in a URL.
Here is the URL: https://scontent-b.xx.fbcdn.net/hphotos-xpf1/v/t1.0-9/s130x130/10390064_10152552351881633_355852593677844144_n.jpg?oh=479fa99a88adea07f6660e1c23724e42&oe=5519DE4B
I want to remove the /v/ part, leave the t1.0-9, and also remove the /s130x130/. I cannot just replace s130x130, because those dimensions may differ from URL to URL. How do I go about doing that?
I have a previous URL where I am using this code:
if (pictureUri.indexOf("&url=") != -1)
{
String replacement = "";
String url = pictureUri.replaceAll("&", "/");
String result = url.replaceAll("().*?(/url=)",
"$1" + replacement + "$2");
String pictureUrl = null;
if (result.startsWith("/url="))
{
pictureUrl = result.replace("/url=", "");
}
}
Can I do something similar with the above URL?
With the regex
/v/|/s\d+x\d+/
replaced with
/
It turns the string from
https://scontent-b.xx.fbcdn.net/hphotos-xpf1/v/t1.0-9/s130x130/10390064_10152552351881633_355852593677844144_n.jpg?oh=479fa99a88adea07f6660e1c23724e42&oe=5519DE4B
to
https://scontent-b.xx.fbcdn.net/hphotos-xpf1/t1.0-9/10390064_10152552351881633_355852593677844144_n.jpg?oh=479fa99a88adea07f6660e1c23724e42&oe=5519DE4B
Is this what you're trying to do?
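In Java, the regex above can be passed straight to String.replaceAll; only the backslashes need doubling inside the string literal. A minimal sketch follows (the class name FbUrlCleaner is made up):

```java
class FbUrlCleaner {
    // Collapses "/v/" and any "/s<width>x<height>/" segment into a single "/".
    static String clean(String url) {
        return url.replaceAll("/v/|/s\\d+x\\d+/", "/");
    }

    public static void main(String[] args) {
        String url = "https://scontent-b.xx.fbcdn.net/hphotos-xpf1/v/t1.0-9/s130x130/"
                + "10390064_10152552351881633_355852593677844144_n.jpg"
                + "?oh=479fa99a88adea07f6660e1c23724e42&oe=5519DE4B";
        System.out.println(clean(url));
    }
}
```

Because the size segment is matched with \d+x\d+, other dimensions such as s320x320 are handled the same way.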
In Java, I want to trim a String so that it always ends with ".mp4".
Suppose we have an encoded link, looking as follows:
String link = www.somehost.com/linkthatIneed.mp4?e=13974etc...
So, how do I rename the link String so it always ends with ".mp4"?
link = www.somehost.com/linkthatIneed.mp4 <--- that's what I need the final String to be.
Just get the string until the .mp4 part using the following regex:
^(.*\.mp4)
and the first captured group is what you want.
Demo: http://regex101.com/r/zQ6tO5
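In Java this could look like the sketch below. The class and method names are made up, and returning the input unchanged when there is no ".mp4" is my own choice, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Mp4TrimDemo {
    // Group 1 captures everything from the start up to and including ".mp4".
    private static final Pattern UP_TO_MP4 = Pattern.compile("^(.*\\.mp4)");

    static String upToMp4(String link) {
        Matcher m = UP_TO_MP4.matcher(link);
        // Fall back to the original link when ".mp4" does not occur.
        return m.find() ? m.group(1) : link;
    }

    public static void main(String[] args) {
        System.out.println(upToMp4("www.somehost.com/linkthatIneed.mp4?e=13974etc"));
        // www.somehost.com/linkthatIneed.mp4
    }
}
```

Note that .* is greedy, so if ".mp4" appears more than once, the match runs up to the last occurrence.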
Another way to do this is to split the string on ".mp4" and then append it again :)
Something like:
String splitChar = ".mp4";
String link = "www.somehost.com/linkthatIneed.mp4?e=13974etcrezkhjk";
// split() takes a regex, so quote the separator to keep the dot literal (needs java.util.regex.Pattern)
String finalStr = link.split(Pattern.quote(splitChar))[0] + splitChar;
Easy to do ^^
PS: I prefer the regex approach, but it requires more regex knowledge ^^
Well you can also do this:
Match the string with the below regex
\?.*
and replace it with empty string.
Demo: http://regex101.com/r/iV1cZ8
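A minimal Java sketch of this approach (the class and method names are made up). It assumes the part to drop always starts at the first question mark:

```java
class QueryStripDemo {
    // Removes the first '?' and everything after it.
    static String stripQuery(String link) {
        return link.replaceAll("\\?.*", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuery("www.somehost.com/linkthatIneed.mp4?e=13974etc"));
        // www.somehost.com/linkthatIneed.mp4
    }
}
```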
Try the code below:
private String trimStringAfterOccurance(String link, String occuranceString) {
int occuranceIndex = link.indexOf(occuranceString);
if (occuranceIndex < 0) {
return link; // marker not found: leave the link unchanged
}
// Keep everything up to and including the first occurrence
String trimmedString = link.substring(0, occuranceIndex + occuranceString.length());
System.out.println(trimmedString);
return trimmedString;
}
How can I capture the URL of an href without using Pattern & Matcher?
I am using the GWT platform, and it doesn't have Pattern & Matcher.
This code works, but it uses Pattern & Matcher.
public static String getTheUrlOfHref(String href){
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(href);
String url = null;
if (m.find()) {
url = m.group(1); // this variable should contain the link URL
}
return url;
}
So how can I extract the URL of a hyperlink with a Java regex, without using Pattern & Matcher?
Hmm, you could perhaps remove everything you don't need instead?
public static String getTheUrlOfHref(String href){
href = href.replaceAll("^.*?href=\"", ""); // Remove everything before
// and including href="
String url = href.substring(0, href.indexOf('"')); // Get everything till
// first " character
return url;
}
I didn't put anything to handle errors. I guess you could add that yourself.
Or maybe:
public static String getTheUrlOfHref(String href){
String url = href.replaceAll("^.*?href=\"([^\"]*)\".*", "$1");
return url;
}
If you need to do more string processing on client side, then you will have to learn how to write Javascript and incorporate it into the GWT-generated javascript code.
See for example JSNI dynamic function reference in GWT.
If you want to do this on server side, then Pattern and Matcher will work just fine.
Greetings all.
I am using the following regex to detect URLs in a string
and wrap them inside an <a> tag:
public static String detectUrls(String text) {
String newText = text
.replaceAll("(?:https?|ftps?|http?)://[\\w/%.-?&=]+",
"<a href='$0'>$0</a>").replaceAll(
"(www\\.)[\\w/%.-?&=]+", "<a href='http://$0'>$0</a>");
return newText;
}
I am not that good with regex, so please advise.
My problem is that the following links are not detected correctly:
http://code.google.com/p/shindig-dnd/
http://confluence.atlassian.com/display/GADGETDEV/Gadgets+and+JIRA+Portlets
www.liferay.com/web/raymond.auge/blog/
(www.opensocial.org/)
http://www.google.com
I'm using this:
private static final String URL_REGEX =
"http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
private static final Pattern URL_PATTERN = Pattern.compile(URL_REGEX);
Matcher matcher = URL_PATTERN.matcher(text);
text = matcher.replaceAll("<a href=\"$0\">$0</a>");
return text;
The problem is that you are using - inside a character class ([]) without escaping it, so it defines the range .-? (i.e. the characters ./0123456789:;<=>?) and a literal - is never matched. Either escape it as \\- or put it at the end of the character class so that it doesn't form a range.
public static String detectUrls(String text) {
String newText = text
.replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
"<a href='$0'>$0</a>").replaceAll(
"(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0'>$0</a>");
return newText;
}
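The difference is easy to check on one of the URLs from the question. This is only a sketch: the helper firstMatch and the class name are made up, and the scheme part of the regex is shortened to https? to keep the focus on the character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassRangeDemo {
    // Returns the first substring matching the regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "http://code.google.com/p/shindig-dnd/";

        // Unescaped: ".-?" is a range from '.' to '?', which does not include '-'.
        System.out.println(firstMatch("https?://[\\w/%.-?&=]+", url));
        // http://code.google.com/p/shindig

        // Escaped: '-' is a literal member of the class.
        System.out.println(firstMatch("https?://[\\w/%.\\-?&=]+", url));
        // http://code.google.com/p/shindig-dnd/
    }
}
```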
As marcog said, you should escape the -, and to match the last two examples you gave, you have to make the protocol optional. Also, http? matches htt, which is not a valid protocol.
So the regex will be:
"(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+"