Java replace pattern

I need to remove the ROOMS start and end tags from an XML file.
<A><ROOMS><B></B></ROOMS></A>
becomes
<A><B></B></A>
And also
<A><ROOMS><B></B></ROOMS></A>
becomes
<A><B></B></A>
I tried
Pattern.compile("\\\\\\\\<(.*)ROOMS\\\\\\\\>").matcher(xml).replaceAll("")
but it does not work.
Can anybody help me?

Your regex is absurd. Just use:
xml = xml.replaceAll( "</?ROOMS>", "" );

Try using
<[/]?ROOMS>
as your pattern. It uses the ? quantifier to indicate that the closing forward slash should occur zero or one time.

You can probably use this regex (escaping the / is harmless but unnecessary in Java):
<[\/]?ROOMS>
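A minimal runnable sketch of the replaceAll approach, using the sample input from the question. The class and method names are illustrative. The `</?ROOMS>` and `<[/]?ROOMS>` forms both make the slash optional, so they give the same result here.

```java
public class RemoveRoomsTags {
    // "</?ROOMS>" matches both "<ROOMS>" and "</ROOMS>": the ? makes the slash optional.
    static String stripRooms(String xml) {
        return xml.replaceAll("</?ROOMS>", "");
    }

    public static void main(String[] args) {
        String xml = "<A><ROOMS><B></B></ROOMS></A>";
        System.out.println(stripRooms(xml)); // <A><B></B></A>

        // The character-class form "<[/]?ROOMS>" is equivalent and gives the same result.
        System.out.println(xml.replaceAll("<[/]?ROOMS>", "")); // <A><B></B></A>
    }
}
```

These patterns only match the bare tags; a tag with attributes, such as <ROOMS id="1">, would be left in place.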

Related

Regex Remove everything after / except when certain string exists

I have certain URLs that I am trying to shorten. I want to remove everything after the first / of the URL, except when the URL is on plus.google.com.
For example:
www.somerubbish.com/about/64848372.meh.php will shorten to www.somerubbish.com
plus.google.com/756934692387498237/about will be left untouched
Any ideas on how I can do this?
My failed attempt is below. I know that | means OR, so that's why it is matching the / in the first line as well.
\b!(?:plus.google.com\/.*)\b|\b(?:\/.*)\b
http://regexr.com/3cv6n
OK, I have it.
The answer was to use a negative lookbehind and remove the pipe:
(?<!plus.google.com)\b(?:\/.*)\b
https://regex101.com/r/pU3hU4/1
What's wrong with:
import org.apache.commons.lang3.StringUtils; // Apache Commons Lang
if (!url.contains("plus.google.com")) {
    url = StringUtils.substringBefore(url, "/");
}
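For comparison, here is a plain-JDK sketch without Commons Lang that uses a single regex. It anchors a negative lookahead at the start of the string instead of using a lookbehind: on the Google URL, the lookbehind pattern above still matches the later /about segment, because that slash is not preceded by plus.google.com. The class name is illustrative. The sketch assumes scheme-less URLs like the examples; with a leading http:// it would cut at the first slash of the scheme.

```java
import java.util.regex.Pattern;

public class ShortenUrl {
    // Assumes scheme-less URLs like those in the question (no "http://").
    // ^ anchors the negative lookahead at the start, so only the host is checked;
    // group 1 is everything before the first '/', and the rest is discarded.
    private static final Pattern AFTER_HOST =
            Pattern.compile("^(?!plus\\.google\\.com/)([^/]*)/.*$");

    static String shorten(String url) {
        return AFTER_HOST.matcher(url).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(shorten("www.somerubbish.com/about/64848372.meh.php")); // www.somerubbish.com
        System.out.println(shorten("plus.google.com/756934692387498237/about"));   // unchanged
    }
}
```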

Java(Apex) RegEx not working?

I am having trouble with a regex in Salesforce Apex. Since Apex uses the same regex syntax and logic as Java, I am aiming this at Java developers too.
I debugged the String and it is correct. street equals 'str 3 B'.
On http://www.regexr.com/, the regex ('\d \w$') works.
The code:
Matcher hasString = Pattern.compile('\\d \\w$').matcher(street);
if(hasString.matches())
My problem is that hasString.matches() resolves to false. Can anyone tell me what I did wrong? I tried it without the $, with different casing, etc., and I just can't get it to work.
Thanks in advance!
You need to use find() instead of matches() for a partial match, because matches() attempts to match the complete input text.
Matcher hasString = Pattern.compile("\\d \\w$").matcher(street);
if(hasString.find()) {
// matched
System.out.println("Start position: " + hasString.start());
}
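A small plain-Java sketch (not Apex) contrasting the two methods on the sample value 'str 3 B'. The class, constant and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {
    private static final Pattern DIGIT_SPACE_CHAR = Pattern.compile("\\d \\w$");

    // find() looks for the pattern anywhere in the input; returns the matched text or null.
    static String findSuffix(String street) {
        Matcher m = DIGIT_SPACE_CHAR.matcher(street);
        return m.find() ? m.group() : null;
    }

    // matches() succeeds only if the pattern covers the entire input.
    static boolean matchesWhole(String street) {
        return DIGIT_SPACE_CHAR.matcher(street).matches();
    }

    public static void main(String[] args) {
        String street = "str 3 B";
        System.out.println(matchesWhole(street));        // false: "str " is not covered
        System.out.println(findSuffix(street));          // 3 B
        // Alternatively, widen the pattern so that it spans the whole string.
        System.out.println(street.matches(".*\\d \\w")); // true
    }
}
```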

Need java Regex to remove/replace the XML elements from specific string

I'm having trouble getting the correct regular expression. I have the XML below as a string:
<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam#testmail.com>"
</user_input>
I need to remove the XML characters from the Answer attribute only.
So I need the following:
<user_input>
<UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
I tried the regex below, but it did not work:
str1.replaceAll("Answer=.*?<([^<]*)>", "$1");
It removes all the text before it.
Can anyone help please?
You need to put ? inside the first group to make it non-greedy; you also don't need Answer=.*?:
str1.replaceAll("<([^<]*?)>", "$1")
Although Java does not support unbounded lookbehinds such as (?<=Answer=.*), you can work around this with a bounded quantifier such as .{0,1000}, which should suffice.
Please check out these two approaches: two regexes, or one regex plus one plain replace. Choose the one that suits you best (I removed the \n line break between <user_input> and <UserInput> in the input string to show the flaw of using a plain replace):
String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam#testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);
Output of a sample program:
<user_input UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
<user_input><UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
First off, in your sample above there is a trailing " after the email and its closing >, and I don't know whether it was placed there by mistake.
However, I will keep it, because according to your expected result it still needs to be present.
This is my hack.
(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>) to replace it with
$1$3$5$8 $10
The explanation...
(Answer=)(<)(value)(>) matches from Answer= up to the start of the value 0
(.+?([^<]*)) matches the value itself (0), up to the < that starts the closing value tag
(</) has to be selected here because the previous group stops before it; it is dropped in the replacement
(><) I will later replace this with a space
(.+?([^>]*)) matches from the start of the email up to, but not including, the > after .com
(>) selects the last >, which I drop in the replacement
The trailing " is not selected, because I would rather leave it untouched, as requested.
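A minimal Java sketch that applies this capture-group pattern and the $1$3$5$8 $10 replacement to the sample from the question. It assumes the Answer attribute has exactly this <value>…</value><email> shape; any other layout will not match and is left unchanged. The class name is illustrative.

```java
public class AnswerAttributeCleanup {
    // Groups 1, 3, 5, 8 and 10 are kept; groups 2, 4, 7, 9 and 12 (the < > </ >< > pieces) are dropped.
    private static final String PATTERN =
            "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";

    static String clean(String xml) {
        // Java reads "$10" as group 10 because the pattern defines 12 groups.
        return xml.replaceAll(PATTERN, "$1$3$5$8 $10");
    }

    public static void main(String[] args) {
        String xml = "<user_input>\n"
                + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam#testmail.com>\"\n"
                + "</user_input>";
        System.out.println(clean(xml));
    }
}
```

The lazy .+? in groups 5 and 10 only needs to consume one character here, because the following [^<]* and [^>]* stop at the next tag delimiter anyway.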

Need a regex expression to get value between two tags

I need a regular expression to extract the values between >xxxxx<. Can anybody help me with this?
<ChangeID type="String">C10286</ChangeID>
<ChangeID type="String">C10296</ChangeID>
Is it possible to get the two values in a comma separated format like C10286,C10296 in a single regex expression?
Thanks and Regards
Riyas Hussain A
Try this:
(?<=>)[^<]*
Test it with grep -Po:
kent$ echo '<ChangeID type="String">C10286</ChangeID>
<ChangeID type="String">C10296</ChangeID>'|grep -Po '(?<=>)[^<]*'
C10286
C10296
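In Java, the whole two-line string is usually searched at once, and there (?<=>)[^<]* also picks up the line break between the elements and an empty match at the end of the input. The sketch below therefore swaps in a capture group tied to the ChangeID tags (my substitution, not the answer's pattern). A regex alone only finds the values; the comma-separated result is produced by joining them in Java. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangeIdExtractor {
    // Group 1 is the text between the opening <ChangeID ...> tag and </ChangeID>.
    private static final Pattern CHANGE_ID =
            Pattern.compile("<ChangeID[^>]*>([^<]*)</ChangeID>");

    static String extractCommaSeparated(String xml) {
        List<String> values = new ArrayList<>();
        Matcher m = CHANGE_ID.matcher(xml);
        while (m.find()) {
            values.add(m.group(1));
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        String xml = "<ChangeID type=\"String\">C10286</ChangeID>\n"
                + "<ChangeID type=\"String\">C10296</ChangeID>";
        System.out.println(extractCommaSeparated(xml)); // C10286,C10296
    }
}
```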
My idea would be to look up all words and exclude the ones we don't need (in case you have more than one value inside your tag):
(?!ChangeID\b)(?!type\b)(?!String\b)\b\w+
You can try it out at http://regexpal.com/

Regex to Extract First Part of URL

I need a java regex to extract parts of a URL.
For example, take the following URLs:
http://localhost:81/example
https://test.com/test
http://test.com/
I would want my regex expression to return:
http://localhost:81
https://test.com
http://test.com
I will be using this in a Java patcher.
This is what I have so far; the problem is that it matches the whole URL:
^https?:\/\/(?!.*:\/\/)\S+
import java.net.URL;
//snip
URL url = new URL(urlString);
return url.getProtocol() + "://" + url.getAuthority();
The right tool for the right job.
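A self-contained sketch of the same idea. It swaps in java.net.URI (my substitution, not the answerer's code) because recent JDKs (20 and later) deprecate the URL(String) constructors; getScheme() and getRawAuthority() play the roles of getProtocol() and getAuthority(). The class and method names are illustrative.

```java
import java.net.URI;

public class UrlBase {
    // Returns "scheme://authority", where the authority is the host plus any explicit port.
    static String base(String urlString) {
        URI uri = URI.create(urlString); // throws IllegalArgumentException on malformed input
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    public static void main(String[] args) {
        String[] urls = {"http://localhost:81/example", "https://test.com/test", "http://test.com/"};
        for (String u : urls) {
            System.out.println(base(u));
        }
    }
}
```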
Building off your attempt, try this:
^https?://[^/]+
I'm assuming that you want to capture everything up to the first / after http:// (that's what I gathered from your examples; if not, please post some more).
Are these URLs given as one input, or is each a different string?
Edit: It was pointed out that there were unnecessary escapes, so I changed it to a more condensed version.
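A runnable sketch of the ^https?://[^/]+ pattern with Pattern and Matcher on the three sample URLs; the class and method names are illustrative. Because the pattern starts with ^, find() can only match at the beginning of the input, and in Java the slashes need no escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPrefix {
    // "http" or "https", then "://", then every character up to (not including) the next '/'.
    private static final Pattern BASE = Pattern.compile("^https?://[^/]+");

    // Returns the scheme-plus-authority prefix, or null if the input does not start with http(s)://.
    static String base(String url) {
        Matcher m = BASE.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] urls = {"http://localhost:81/example", "https://test.com/test", "http://test.com/"};
        for (String u : urls) {
            System.out.println(base(u));
        }
    }
}
```

Since [^/]+ stops only at a slash, a URL with a query string but no path (for example http://test.com?x=1) would be returned whole.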
Language independent answer:
For the whitespace: replace /^\s+/ with the empty string.
For removing the path information from the URL, if you can assume there aren't any further slashes in the path (that is, you're not dealing with something like http://localhost:81/foo/bar/baz), replace /\/[^\/]+$/ with the empty string. If there might be more slashes, you could try replacing /(^\s*.*:\/\/[^\/]+)\/.*/ with $1.
A simple one: ^(https?://[^/]+)
