My regex does not remove the dash between numbers in Java

I have tried to use a regex in Java to strip any unwanted characters from a mobile-number string; however, it doesn't seem to remove the '-' between the numbers.
Here is my code:
// Remove all (,),-,.,[,],<,>,{,} from string
myMobileNumber.replaceAll("[^\\d]", "");
Example: 65-12345678
The - still gets through without being removed. =(

You need to assign the result back to the variable. A String is immutable, so no method, including .replaceAll, modifies it in place; each one returns a new String.
myMobileNumber = myMobileNumber.replaceAll("[^\\d]", "");
(BTW, the pattern "\\D" is equivalent to "[^\\d]".)
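A minimal sketch of the difference between discarding and keeping the result. The class name MobileNumberCleaner and the helper digitsOnly are made up for illustration and are not part of the original question.

```java
class MobileNumberCleaner {
    // Hypothetical helper: returns a new string with every non-digit removed.
    static String digitsOnly(String raw) {
        // replaceAll returns a new String; the argument itself is never modified.
        return raw.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String myMobileNumber = "65-12345678";

        myMobileNumber.replaceAll("[^\\d]", "");      // result is thrown away
        System.out.println(myMobileNumber);           // prints 65-12345678

        myMobileNumber = digitsOnly(myMobileNumber);  // result is kept
        System.out.println(myMobileNumber);           // prints 6512345678
    }
}
```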

Related

Java String Regex replacement

Sample Input:
a:b
a.in:b
asds.sdsd:b
a:b___a.sds:bc___ab:bd
Sample Output:
a:replaced
a.in:replaced
asds.sdsd:replaced
a:replaced___a.sds:replaced___ab:replaced
The string that comes after : should be replaced using a custom function.
I have done this without a regex, but I feel a regex would fit, since we are trying to extract a string out of a specific pattern.
For the first three cases it's simple enough to extract the string after :, but I couldn't find a way to deal with the last case, unless I split the string on ___, apply the approach for the first type of pattern to each piece, and concatenate them again.
Just replace only the letters that come right after : with the string replaced.
string.replaceAll("(?<=:)[A-Za-z]+", "replaced");
or
If you also want to deal with digits, then add \d inside the char class.
string.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
(:)[a-zA-Z]+
You can simply do this with string.replaceAll: replace by $1replaced. See demo.
https://regex101.com/r/fX3oF6/18
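A rough sketch that puts the two suggestions side by side and adds a variant for the "custom function" part of the question. The class and method names are hypothetical, and upper-casing is only a stand-in for whatever function you actually need.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterColonReplacer {
    // Look-behind version: only the letters/digits after ':' are matched.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
    }

    // Capture-group version: the ':' is matched too and put back via $1.
    static String withGroup(String s) {
        return s.replaceAll("(:)[A-Za-z\\d]+", "$1replaced");
    }

    // Computes each replacement in code instead of using a fixed string.
    static String withFunction(String s) {
        Matcher m = Pattern.compile("(?<=:)[A-Za-z\\d]+").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Stand-in for any custom function applied to the matched value.
            String computed = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(computed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a:b___a.sds:bc___ab:bd";
        System.out.println(withLookbehind(input));
        System.out.println(withGroup(input));
        System.out.println(withFunction(input));
    }
}
```

The appendReplacement/appendTail loop works on old Java versions too, which is why it is used here.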

How can I write a regex in Java that will perform a .replaceFirst on a group that is not in a comment?

I need to return a modified String in which the first instance of a token is replaced with another token, while skipping comments. Here's an example of what I'm talking about:
This whole quote is one big String
-- I don't want to replace this ##
But I want to replace this ##!
Being a former .NET developer, I thought this was easy. I'd just do a negative lookbehind like this:
(?<!--.*)##
But then I learned Java can't do this. So upon learning that the curly braces are okay, I tried this:
(?<!--.{0,9001})##
That didn't throw an exception, but it did match the ## in the comment.
When I test this regex with a Java regex tester, it works as expected. About the only thing I can think of is that I'm using Java 1.5. Is it possible that Java 1.5 has a bug in its regex engine? Assuming it does, how do I get Java 1.5 to do what I want it to do without breaking up my string and reassembling it?
EDIT I changed the # to the -- operator since it looks like the regex will be more complex with two chars instead of one. I originally did not reveal that I was modifying a query in order to avoid off topic discussion on "Well you shouldn't modify queries that way!" I have a very good reason for doing this. Please don't discuss query modification good practices. Thanks
You really don't need a negative look-behind here. You can do it without that too.
It would be like this:
String str = "I don't want to replace this ##";
str = str.replaceAll("^([^#].*?)##", "$1");
This replaces the text up to and including the first ## with just the part before the ##, provided the string does not start with #. In other words, the ## is removed. replaceAll is fine here because the pattern is anchored with ^ and uses a reluctant quantifier, .*?, so the match stops at the first ##.
As @nhahtdh correctly pointed out in the comments, this might fail if your comment is at the end of the line. So you can use this one instead:
String str = "I don't want to # replace this ##";
str = str.replaceAll("^([^#]*?)##", "$1");
This one will work for any case. And in the given example case, it won't replace the ##, as it is a part of the comment.
If your comment start is denoted by two characters, then negated character class won't work. You would need to use negative look-ahead like this:
String str = "This whole quote ## is one big String -- asdf ##\n" +
"-- I don't want to replace this ##\n" +
"But I want to replace this ##!";
str = str.replaceAll("(?m)^(((?!--).)*?)##", "$1");
System.out.println(str);
Output:
This whole quote is one big String -- asdf ##
-- I don't want to replace this ##
But I want to replace this !
(?m) at the beginning of the pattern enables MULTILINE mode, so ^ matches at the start of each line rather than only at the start of the entire input.
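Since the question asks for the first occurrence only, here is a small sketch that uses the same look-ahead pattern with replaceFirst and inserts a replacement token instead of just deleting the ##. The class name, method name and the token "FOO" are assumptions for illustration.

```java
import java.util.regex.Matcher;

class FirstTokenOutsideComments {
    // Replaces the first "##" that has no "--" before it on the same line.
    static String replaceFirstOutsideComments(String text, String replacement) {
        // (?m): ^ matches at every line start.
        // ((?!--).)*? : any run of characters that never begins a "--".
        return text.replaceFirst("(?m)^(((?!--).)*?)##",
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "This whole quote is one big String\n"
                + "-- I don't want to replace this ##\n"
                + "But I want to replace this ##!";
        System.out.println(replaceFirstOutsideComments(str, "FOO"));
    }
}
```

Only the ## on the third line is touched; the one after -- stays as it is.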
You can use something like this:
String string = "This whole quote is one big String\n" +
"# I don't want to replace this ##\n" +
"And I also # don't want to replace this ##\n" +
"But I want to replace this ##!\n" +
"But not this ##!";
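// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// [^#\n]* only allows text with no # before the ## on the same line,
// so any line that contains a # comment before the ## is skipped.
Matcher m =
Pattern.compile (
"^([^#\n]*)##", Pattern.MULTILINE).
matcher (string);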
StringBuffer result = new StringBuffer ();
if (m.find ())
m.appendReplacement (result, "$1FOO");
m.appendTail (result);
System.out.println (result.toString ());

String replaceAll with conditions

I am not good at regular expressions and I need help with a string replacement.
String str = "Name_XYZ_";
str = "XYZ_NAME_";
So how can I replace "Name_" or "_NAME_" in the two strings above with an empty string?
The conditions are that "Name" can be in any case, and it can be at index 0 or at any other index as long as it is preceded by "_".
So far I tried,
String replacedString = str.replaceAll("(?i)Name_", ""); // This is not correct.
This is not homework. I am working on an XML file that needs this kind of processing.
String replacedString = str.replaceAll("(?i)(?:^|_)name_", "");
You were close. What you have to do is either anchor name to the beginning of the string (with ^) or require an underscore there. I also changed Name to name, because there is no reason to mix lower and upper case if you are treating the pattern case-insensitively anyway. Note that ?: is just an optimization (and good practice). It suppresses capturing, which you don't need in this case.
If you want to improve your regex skills, I can highly recommend this tutorial.
I'm using .NET's regex instead of Java's, but in that context (_?Name_) should work.
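A quick sketch comparing the anchored pattern with the looser optional-underscore one. The class and method names are invented, and the Java port of the second pattern (with (?i) added) is my assumption about what was meant.

```java
class NamePrefixRemover {
    // Requires "name_" to sit at the start of the string or right after "_".
    static String strip(String s) {
        return s.replaceAll("(?i)(?:^|_)name_", "");
    }

    // Optional underscore: also matches "name_" in the middle of a longer word.
    static String stripLoose(String s) {
        return s.replaceAll("(?i)_?name_", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Name_XYZ_"));        // prints XYZ_
        System.out.println(strip("XYZ_NAME_"));        // prints XYZ
        System.out.println(strip("Surname_XYZ"));      // prints Surname_XYZ
        System.out.println(stripLoose("Surname_XYZ")); // prints SurXYZ
    }
}
```

The last line shows why the (?:^|_) alternation matters: with an optional underscore, the pattern also fires inside words that merely end in "name_".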

Regular expression to select all whitespace that isn't in quotes?

I'm not very good at RegEx, can someone give me a regex (to use in Java) that will select all whitespace that isn't between two quotes? I am trying to remove all such whitespace from a string, so any solution to do so will work.
For example:
(this is a test "sentence for the regex")
should become
(thisisatest"sentence for the regex")
Here's a single regex-replace that works:
\s+(?=([^"]*"[^"]*")*[^"]*$)
which will replace:
(this is a test "sentence for the regex" foo bar)
with:
(thisisatest"sentence for the regex"foobar)
Note that if the quotes can be escaped, this even more verbose regex will do the trick:
\s+(?=((\\[\\"]|[^\\"])*"(\\[\\"]|[^\\"])*")*(\\[\\"]|[^\\"])*$)
which replaces the input:
(this is a test "sentence \"for the regex" foo bar)
with:
(thisisatest"sentence \"for the regex"foobar)
(note that it also works with escaped backslashes: (thisisatest"sentence \\\"for the regex"foobar))
Needless to say (?), this really shouldn't be used to perform such a task: it makes one's eyes bleed, and it performs its task in quadratic time, while a simple linear solution exists.
EDIT
A quick demo:
String text = "(this is a test \"sentence \\\"for the regex\" foo bar)";
String regex = "\\s+(?=((\\\\[\\\\\"]|[^\\\\\"])*\"(\\\\[\\\\\"]|[^\\\\\"])*\")*(\\\\[\\\\\"]|[^\\\\\"])*$)";
System.out.println(text.replaceAll(regex, ""));
// output: (thisisatest"sentence \"for the regex"foobar)
Here is a regex that works for both single and double quotes (assuming that all strings are delimited properly):
\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)
It won't work with strings that themselves contain quotes.
This just isn't something regexes are good at. Search-and-replace functions with regexes are always a bit limited, and any sort of nesting/containment at all becomes difficult and/or impossible.
I'd suggest an alternate approach: Split your string on quote characters. Go through the resulting array of strings, and strip the spaces from every other substring (whether you start with the first or second depends on whether your string started with a quote or not). Then join them back together, using quotes as separators. That should produce the results you're looking for.
Hope that helps!
PS: Note that this won't handle nested strings, but since you can't make nested strings with the ASCII double-quote character, I'm gonna assume you don't need that behaviour.
PPS: Once you're dealing with your substrings, then it's a good time to use regexes to kill those spaces - no containing quotes to worry about. Just make sure it's a global replacement and not just the first match (the /.../g modifier in many languages; in Java, replaceAll already replaces every match).
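The split-and-rejoin idea could look roughly like this in Java. The class and method names are made up; the sketch assumes plain double quotes with no escaping.

```java
class QuoteSplitStripper {
    static String stripOutsideQuotes(String input) {
        // Limit -1 keeps trailing empty pieces, so index parity stays reliable.
        // A leading quote yields an empty first piece, so even indexes are
        // always the parts outside quotes.
        String[] parts = input.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('"'); // put back the separator we split on
            }
            // Even index: outside quotes, strip whitespace. Odd: keep as is.
            out.append(i % 2 == 0 ? parts[i].replaceAll("\\s+", "") : parts[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                stripOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```

Because split is called with a negative limit, there is no need to check separately whether the string started with a quote.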
Groups of whitespace outside of quotes are separated by stuff that's a) not whitespace, or b) inside quotes.
Perhaps something like:
(\s+)([^ "]+|"[^"]*")*
The first part matches a sequence of spaces; the second part matches non-spaces (and non-quotes), or some stuff in quotes, either repeated any number of times. The second part is the separator.
This will give you two groups for each item in the result; just ignore the second one. (The parentheses there are needed for precedence rather than for capturing.) Or you could, say, concatenate all the second groups, though you then need to match the first non-space word too, or, as in this example, make the spaces optional:
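// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
StringBuilder b = new StringBuilder();
// Group 2 wraps the whole repeated separator, so every piece is captured
// and not only the last repetition. [^\s"] also excludes tabs and newlines.
Pattern p = Pattern.compile("(\\s+)?((?:[^\\s\"]+|\"[^\"]*\")*)");
Matcher m = p.matcher("this is \"a test\"");
while (m.find()) {
    // Group 2 is never null here; it is empty when nothing was matched.
    b.append(m.group(2));
}
System.out.println(b.toString());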
(I haven't done much regex in Java so expect bugs.)
Finally: this is how I'd do it if regexes were compulsory. ;-)
As well as Xavier's technique, you could simply do it the way you'd do it in C: just iterate over the input characters, and copy each to the new string if either it's non-space, or you've counted an odd number of quotes up to that point.
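A minimal sketch of that single-pass loop, assuming plain double quotes with no escape handling. The class and method names are hypothetical.

```java
class QuoteAwareScanner {
    static String stripOutsideQuotes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean insideQuotes = false; // true after an odd number of quotes
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;
            }
            // Keep everything inside quotes; outside, keep only non-whitespace.
            if (insideQuotes || !Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripOutsideQuotes(
                "(this is a test \"sentence for the regex\" foo bar)"));
    }
}
```

This runs in linear time and touches each character once.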
If there is only one set of quotes, you can do this:
String s = "(this is a test \"sentence for the regex\") a b c";
Matcher matcher = Pattern.compile("^[^\"]+|[^\"]+$").matcher(s);
while (matcher.find())
{
String group = matcher.group();
s = s.replace(group, group.replaceAll("\\s", ""));
}
System.out.println(s); // (thisisatest"sentence for the regex")abc
This isn't an exact solution, but you can accomplish your goal by doing the following:
STEP 1: Match the two segments
\(([a-zA-Z ]*)"([a-zA-Z ]*)"\)
STEP 2: remove spaces
temp = $1 replace " " with ""
STEP 3: rebuild your string
(temp"$2")

What's wrong with this regex?

I am trying the following code on Java:
String test = "http://asda.aasd.sd.google.com/asdasdawrqwfqwfqwfqwf";
String regex = "[http://]{0,1}([a-zA-Z]*.)*\\.google\\.com/[-a-zA-Z/_.?&=]*";
System.out.println(test.matches(regex));
It runs for several minutes (after which I killed the VM) with no result.
Can anyone help me?
BTW: What would you recommend to speed up web-link-testing regexes in the future?
[http://] is a character class, meaning any one of those characters from the set.
Just leave those particular square brackets off if it must start with http://. If it's optional, you can use (http://)?.
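To see the difference between the character class and the group, here is a tiny sketch (class and method names are made up):

```java
class SchemePrefixDemo {
    // [http://] matches exactly one character out of h, t, p, : and /.
    static boolean matchesCharClass(String s) {
        return s.matches("[http://]");
    }

    // (http://)? matches the whole literal prefix, or nothing at all.
    static boolean matchesOptionalGroup(String s) {
        return s.matches("(http://)?");
    }

    public static void main(String[] args) {
        System.out.println(matchesCharClass("h"));            // prints true
        System.out.println(matchesCharClass("http://"));      // prints false
        System.out.println(matchesOptionalGroup("http://"));  // prints true
        System.out.println(matchesOptionalGroup(""));         // prints true
    }
}
```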
One obvious problem is that you're looking for the sequence ([a-zA-Z]+.)*\\.google - this will do a lot of backtracking due to that naked . which means "any character" rather than the literal period that you wanted.
But even if you replace it with what you meant, ([a-zA-Z]+\\.)*\\.google, you still have a problem - this will then require two . characters immediately before google. You should instead try:
String regex = "(http://)?([a-zA-Z]+\\.)*google\\.com/[-a-zA-Z/_.?&=]*";
That returns immediately for me with a true match.
Keep in mind that this currently requires the / at the end of google.com. If that's a problem, it's a minor fix, but I've left it there since you had it in your original regex.
You are trying to match the scheme as a character class using square brackets. That means only zero or one of the characters from that set. You want a subpattern, with parentheses. You can also change {0,1} to just say ?.
Also, you should remove the period just before google\\.com because you're already looking for a period in the subdomain subpattern of your regex. As cherouvim points out, you forgot to escape that period as well.
String regex = "(http://)?([a-zA-Z]+\\.)*google\\.com/[-a-zA-Z/_.?&=]*";
In the ([a-zA-Z]*.) part you either need to escape the . (because right now it means "all characters") or remove it.
There are two problems with the regular expression.
The first is easy, as was mentioned by others. You need to match "http://" as a subpattern, not as a character class. Change the brackets to parentheses.
The second problem causes the very poor performance. It's causing the regex to backtrack repeatedly, trying to match the pattern.
What you're trying to do is match zero or more subdomains, which are groups of letters followed by a dot. Since you want to match the dot explicitly, escape the dot. Also remove the dot in front of "google" so you can match "http://google.com/etc" (ie, no leading dot in front of google).
So your expression becomes:
String regex = "(http://){0,1}([a-zA-Z]+\\.)*google\\.com/[-a-zA-Z/_.?&=]*";
Running this regex on your example takes just a fraction of a second.
Assuming you fix the ([a-zA-Z]*\\.) you need to change * to + so the part becomes ([a-zA-Z]+\\.). Otherwise you'll be accepting http://...google.com and this is not valid.
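Putting the fixes from the answers together, a small sketch with a precompiled pattern (which also helps if you test many links). The class and method names are assumptions for illustration.

```java
import java.util.regex.Pattern;

class GoogleUrlMatcher {
    // Group instead of character class for the scheme, escaped dot after
    // each subdomain label, + instead of *, and no extra dot before "google".
    private static final Pattern GOOGLE_URL = Pattern.compile(
            "(http://)?([a-zA-Z]+\\.)*google\\.com/[-a-zA-Z/_.?&=]*");

    static boolean isGoogleUrl(String url) {
        return GOOGLE_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGoogleUrl(
                "http://asda.aasd.sd.google.com/asdasdawrqwfqwfqwfqwf")); // prints true
        System.out.println(isGoogleUrl("http://...google.com/"));          // prints false
    }
}
```

Each subdomain label now has to end in a literal dot, so there is only one way to split the host into labels and the engine no longer backtracks through huge numbers of alternatives.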
By grouping the part before google.com, I assume you are looking for part of the URL's host name. Regexes are a powerful tool, but you can simply use Java's URL class, which has a getHost() method. Then you can check whether the host name ends with google.com and split it, or use a simpler regex on the host name alone.
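// Requires: import java.net.URL;
// The URL constructor throws the checked MalformedURLException.
URL url = new URL("http://asda.aasd.sd.google.com/asdasdawrqwfqwfqwfqwf");
String host = url.getHost();
if (host.endsWith("google.com"))
{
    // Print each label of the host name on its own line.
    String[] parts = host.split("\\.");
    for (String s : parts)
        System.out.println(s);
}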
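A variation on the same idea, sketched with java.net.URI and a slightly stricter host check, since a bare endsWith("google.com") would also accept a host such as evilgoogle.com. The class and method names are hypothetical; note that URI.create throws IllegalArgumentException for a syntactically invalid string.

```java
import java.net.URI;

class GoogleHostCheck {
    static boolean isGoogleHost(String url) {
        // getHost() returns null when the URI has no host component.
        String host = URI.create(url).getHost();
        return host != null
                && (host.equals("google.com") || host.endsWith(".google.com"));
    }

    public static void main(String[] args) {
        System.out.println(isGoogleHost(
                "http://asda.aasd.sd.google.com/asdasdawrqwfqwfqwfqwf")); // prints true
        System.out.println(isGoogleHost("http://evilgoogle.com/"));        // prints false
    }
}
```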
