Regex Remove everything after / except when certain string exists - java

I have certain URLs that I am trying to shorten. I want to remove everything from the first / onward, except when the URL's host is plus.google.com
For example:
www.somerubbish.com/about/64848372.meh.php will shorten to www.somerubbish.com
plus.google.com/756934692387498237/about will be left untouched
Any ideas on how I can do this?
My failed attempt is below. I know that the | means OR, so that's why it also matches the / in the first line.
\b!(?:plus.google.com\/.*)\b|\b(?:\/.*)\b
http://regexr.com/3cv6n

Ok I have it.
The answer was to use a negative lookbehind and remove the pipe
(?<!plus.google.com)\b(?:\/.*)\b
https://regex101.com/r/pU3hU4/1
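One caveat with the lookbehind version, as far as I can tell: it only guards the slash directly after plus.google.com, so a later slash in the same URL (the one before about, for instance) can still match. Here is a minimal sketch that anchors at the start of the string instead and uses a negative lookahead on the host. The class and method names (UrlShortener, shorten) are made up for illustration, and it assumes the URLs have no scheme prefix, as in the examples above.

```java
import java.util.regex.Pattern;

class UrlShortener {
    // Anchored at the start: if the string begins with plus.google.com
    // (followed by / or end of input) the pattern does not match at all.
    // Otherwise group 1 is everything before the first slash.
    private static final Pattern PATH =
            Pattern.compile("^(?!plus\\.google\\.com(?:/|$))([^/]+)/.*$");

    // Hypothetical helper: keeps only the host unless it is plus.google.com.
    static String shorten(String url) {
        return PATH.matcher(url).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(shorten("www.somerubbish.com/about/64848372.meh.php"));
        System.out.println(shorten("plus.google.com/756934692387498237/about"));
    }
}
```

Strings without any slash are returned unchanged, since the pattern requires at least one / to match.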

What's wrong with:
if (!url.contains("plus.google.com")) {
    url = StringUtils.substringBefore(url, "/");
}
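StringUtils here comes from Apache Commons Lang. If you would rather not pull in that dependency, the same idea can be sketched with plain String methods. The names PlainShortener and shorten are hypothetical, and like the snippet above this treats any URL containing plus.google.com as one to leave alone.

```java
class PlainShortener {
    // Hypothetical helper mirroring the StringUtils version with JDK calls only.
    static String shorten(String url) {
        if (url.contains("plus.google.com")) {
            return url; // leave Google+ URLs untouched
        }
        int slash = url.indexOf('/');
        // No slash means there is nothing to cut off.
        return slash == -1 ? url : url.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(shorten("www.somerubbish.com/about/64848372.meh.php"));
        System.out.println(shorten("plus.google.com/756934692387498237/about"));
    }
}
```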

Related

Regex: Read value between multiple brackets

I am currently working on translating a website (Smarty) with Poedit. To get all the text from the .tpl files I'm using a regex to capture the data between {t} and {/t}. An example:
{t}Password incorrect, please try again{/t}
The regex will read Password incorrect, please try again and place it in a .po file. This is all working fine. It goes wrong when it gets a little more advanced.
Sometimes the text between the {t} tags uses a parameter. That looks like this:
{t 1=$email|escape 2=$mailbox}No $1 given, please check your $2{/t}
This is also working great.
The real problem starts when I use brackets inside the parameter, like this:
{t 1={site info='name'} 2=$mailbox}visit %1 or go to your %2{/t}
My regex stops at the first closing bracket, so the result is 2=$mailbox}visit %1 or go to your %2.
My regex looks like this:
\{t.*?\}?[}]([^\{]+)\{\/t\}|\{t\}([^\{]+)\{\/t\}
The regex is used inside a java program.
Does anybody have a way to fix this problem?
The easiest solution I see is to normalize the .tpl files. Just use a regex that matches all tags, something like this one:
{[^}]*[^{]*}
I had the same issue to solve and normalizing worked pretty well.
The normalizing-method would look like this:
final String regex = "\\{[^\\}]*[^\\{]*\\}";

private String normalizeContent(String content) {
    return content.replaceAll(regex, "");
}
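To see what the normalizing does on the samples from the question, here is a small self-contained sketch. The wrapper class TemplateNormalizer and the constant name TAG_REGEX are made up for the demo; the regex itself is the one from above. Note that this strips the tags and keeps the text, and that a literal } inside the translatable text would probably confuse it.

```java
class TemplateNormalizer {
    // Same pattern as above: an opening brace, anything up to a closing
    // brace, then as much non-"{" text as still ends in a closing brace.
    private static final String TAG_REGEX = "\\{[^\\}]*[^\\{]*\\}";

    static String normalizeContent(String content) {
        return content.replaceAll(TAG_REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(normalizeContent(
                "{t}Password incorrect, please try again{/t}"));
        System.out.println(normalizeContent(
                "{t 1={site info='name'} 2=$mailbox}visit %1 or go to your %2{/t}"));
    }
}
```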

Need java Regex to remove/replace the XML elements from specific string

I have a problem getting the correct regular expression. I have the XML below as a string:
<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam#testmail.com>"
</user_input>
Now I need to remove the XML characters from the Answer attribute only.
So I need the below:-
<user_input>
<UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
I have tried the regex below, but it did not work:
str1.replaceAll("Answer=.*?<([^<]*)>", "$1");
It removes all the text before the match.
Can anyone help please?
You need to put ? within the first group to make it non-greedy; also, you don't need Answer=.*?:
str1.replaceAll("<([^<]*?)>", "$1")
Although variable width look-behinds are not supported in Java, you can work around it with .{0,1000} that should suffice.
Please check out this approach using 2 regexes, or 1 regex and 1 replace. Choose the one that suits best (I removed the \n line break from the first input string to show the flaw with using simple replace):
String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam#testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);
Output of a sample program:
<user_input UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
<user_input><UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
First off, in your sample above there is a trailing " after the email address and its >, and I do not know whether it was placed there by mistake.
However, I will keep it there as according to your expected result, you need it to still be present.
This is my hack.
(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>) to replace it with
$1$3$5$8 $10
The explanation...
(Answer=)(<)(value)(>) matches from Answer to the start of the value 0
(.+?([^<]*)) matches the value (here 0) up to the < that starts the closing value tag
(</) here, I still select this since it was dropped in the previous expression
(><) I will later replace this with a space
(.+?([^>]*)) This matches from the start of the email and excludes the > after the .com
(>) this one selects the last > which I will later drop when replacing.
The trailing " is not selected as I will rather not touch it as requested.
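For anyone who wants to try the group-based replacement in Java, here is a minimal sketch. The class and method names (AnswerCleaner, clean) are only for illustration; the pattern and replacement string are the ones from the hack above, with the usual Java string escaping.

```java
class AnswerCleaner {
    // Twelve capture groups in total; the replacement keeps groups
    // 1, 3, 5, 8 and 10 and inserts a space where "><" used to be.
    private static final String REGEX =
            "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";

    static String clean(String input) {
        return input.replaceAll(REGEX, "$1$3$5$8 $10");
    }

    public static void main(String[] args) {
        String input = "<UserInput Question=\"test Q?\" "
                + "Answer=<value>0</value><sam#testmail.com>\"";
        System.out.println(clean(input));
    }
}
```

This only handles the exact value-then-email shape from the question; a different tag name or extra elements inside Answer would need a different pattern.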

Java replace pattern

I need to replace the ROOMS start and end tag from an xml file.
<A><ROOMS><B></B></ROOMS></A>
becomes
<A><B></B></A>
And also
<A><ROOMS><B></B></ROOMS></A>
becomes
<A><B></B></A>
I tried
Pattern.compile("\\\\\\\\<(.*)ROOMS\\\\\\\\>").matcher(xml).replaceAll("")
, but it does not work.
Can anybody help me?
Your regex is absurd. Just use:
xml = xml.replaceAll( "</?ROOMS>", "" );
Try using
<[/]?ROOMS>
as your pattern. It uses the ? quantifier to indicate that the forward slash of a closing tag should occur 0 or 1 times.
You can probably use this regex:
<[\/]?ROOMS>
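Putting the suggested pattern into a runnable form, a minimal sketch might look like this. RoomsStripper and stripRooms are hypothetical names. It assumes the tags appear exactly as <ROOMS> and </ROOMS>, with no attributes or whitespace inside them; for anything less regular an XML parser would likely be the sturdier choice.

```java
class RoomsStripper {
    // "</?ROOMS>" matches both the opening and the closing tag,
    // because the slash is optional.
    static String stripRooms(String xml) {
        return xml.replaceAll("</?ROOMS>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripRooms("<A><ROOMS><B></B></ROOMS></A>"));
    }
}
```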

Regex to Extract First Part of URL

I need a java regex to extract parts of a URL.
For example, take the following URLs:
http://localhost:81/example
https://test.com/test
http://test.com/
I would want my regex to return:
http://localhost:81
https://test.com
http://test.com
I will be using this in a Java patcher.
This is what I have so far; the problem is that it matches the whole URL:
^https?:\/\/(?!.*:\/\/)\S+
import java.net.URL;
// snip
URL url = new URL(urlString);
return url.getProtocol() + "://" + url.getAuthority();
The right tool for the right job.
Building off your attempt, try this:
^https?://[^/]+
I'm assuming that you want to capture everything until the first / after http://? (That's what I was getting from your examples - if not, please post some more).
Are these URLs given as one input, or is each one a separate string?
Edit: It was pointed out that there were unnecessary escapes, so fixed to a more condensed version
Language independent answer:
For the whitespace: replace /^\s+/ with the empty string.
For removing the path information from the URL, if you can assume there aren't any slashes in the path (i.e. you're not dealing with http://localhost:81/foo/bar/baz), replace /\/[^\/]+$/ with the empty string. If there might be more slashes, you might try something like replacing /(^\s*.*:\/\/[^\/]+)\/.*/ with $1.
A simple one: ^(https?://[^/]+)
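If you do go the regex route, wiring that last pattern into Java could look roughly like the sketch below. UrlPrefix and extract are made-up names, and it assumes each URL arrives as its own string with no leading whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPrefix {
    // Scheme, "://", then everything up to (not including) the next slash.
    private static final Pattern PREFIX = Pattern.compile("^(https?://[^/]+)");

    // Returns scheme plus authority, or null when the input is not an http(s) URL.
    static String extract(String url) {
        Matcher m = PREFIX.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://localhost:81/example"));
        System.out.println(extract("https://test.com/test"));
        System.out.println(extract("http://test.com/"));
    }
}
```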

How to trim characters from the beginning of a string. Android

I am writing an Android app and need some help.
I have a string that contains a URL. Sometimes I get extra text before the url and need to trim that off.
I get this "Some cool sitehttp://somecoolsite.com"
And want this "http://somecoolsite.com"
First, I need to detect if the string does not start with http:// and then if not, I need to trim everything in front of http://
Is there an easy way to do this?
I can do the first part.
if (url.startsWith("http://") == false) {
url.replace("", replacement)
}
Any help?
To check if the string starts with http:// you do
if (inputUrl.startsWith("http://")) {
...
}
To trim off the prefix up until the first occurrence of http:// you do
int index = inputUrl.indexOf("http://");
if (index != -1)
inputUrl = inputUrl.substring(index);
The API documentation for the String class should provide you with all the information you need here.
Use this:
if (inputURL.contains("http://"))
    inputURL = inputURL.substring(inputURL.indexOf("http://"));
Another option would be:
inputUrl = inputUrl.replaceAll(".*http://","http://");
it should work under all conditions (but I assume the regular expression is a bit less efficient).
Please note that provided answers assume that the string will be in lower case (no "HTTP" or "Http") and that no strings contain https://
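If mixed case or https:// can show up, one way to cover both is a case-insensitive search for the scheme. This is only a sketch; UrlTrimmer and trimToUrl are hypothetical names, and it assumes the first http:// or https:// in the string marks the start of the URL you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlTrimmer {
    // Matches http:// or https:// regardless of letter case.
    private static final Pattern SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    // Drops everything before the first scheme; returns the input unchanged
    // when no scheme is found.
    static String trimToUrl(String input) {
        Matcher m = SCHEME.matcher(input);
        return m.find() ? input.substring(m.start()) : input;
    }

    public static void main(String[] args) {
        System.out.println(trimToUrl("Some cool sitehttp://somecoolsite.com"));
        System.out.println(trimToUrl("Some cool siteHTTPS://somecoolsite.com"));
    }
}
```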
