I am not good at regular expressions and I need help replacing part of a string.
String str = "Name_XYZ_";
str = "XYZ_NAME_";
So how can I replace "Name_" or "_NAME_" in the two strings above with an empty string?
The conditions are that "Name" can be in any case, and it can appear either at index 0 or at any other index if it is preceded by "_".
So far I tried,
String replacedString = str.replaceAll("(?i)Name_", ""); // This is not correct.
This is not homework. I am working on an XML file that needs this kind of processing.
String replacedString = str.replaceAll("(?i)(?:^|_)name_", "");
You were close. What you have to do is either anchor name to the beginning of the string (with ^) or require an underscore there. I also changed Name to name, because there is no reason to mix lower and upper case if you are treating the pattern case-insensitively anyway. Note that ?: is just an optimization (and good practice). It suppresses capturing, which you don't need in this case.
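To see the pattern applied to both sample strings, here is a minimal sketch. The class and method names (NamePrefixDemo, stripName) are made up for illustration; only the regex comes from the answer above.

```java
import java.util.regex.Pattern;

final class NamePrefixDemo {
    // (?i)      case-insensitive matching
    // (?:^|_)   non-capturing group: start of input OR a literal underscore
    // name_     the token to remove (any case, because of (?i))
    private static final Pattern NAME_TOKEN = Pattern.compile("(?i)(?:^|_)name_");

    // Hypothetical helper: removes every occurrence of the token.
    static String stripName(String input) {
        return NAME_TOKEN.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripName("Name_XYZ_"));   // XYZ_
        System.out.println(stripName("XYZ_NAME_"));   // XYZ
        System.out.println(stripName("Surname_XYZ")); // Surname_XYZ (not at index 0, not preceded by "_")
    }
}
```

Note that when the token is matched via the underscore branch, that leading underscore is removed as well, which is what the question asks for with "_NAME_".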
If you want to improve your regex skills, I can highly recommend this tutorial.
I'm using .NET's regex instead of Java's, but in that context (_?Name_) should work.
I am very confused about how Java's regular expressions work. I want to extract two strings from a pattern that looks like this:
String userstatus = "username = not ready";
Matcher matcher = Pattern.compile("(\\.*)=(\\.*)").matcher(userstatus);
System.out.println(matcher.matches());
But printing this will return false. I want to get the username, as well as the space that follows it, and the status string on the right of the equals sign and then store them both separately into two strings.
How would I do this? Thank you!
So the resulting strings should look like this:
String username = "username ";
String status = " not ready";
First, I assume that you are doing this as a learning exercise on regex, because a non-regex solution is easier to implement and to understand.
The problem with your solution is that you are escaping the dot, telling regex engine that you want to match literally a dot '.', not "any character". A simple fix to this problem is to remove \\:
(.*)=(.*)
This would work, but it is not ideal. A better approach would be to say "match everything except =", like this:
([^=]*)=(.*)
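As a rough sketch of how the second pattern could be used to pull out both halves, the following reads the two capturing groups from a Matcher. The class and method names are hypothetical, and throwing on a missing = is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueDemo {
    // Group 1: everything up to the first '='; group 2: everything after it.
    private static final Pattern KEY_VALUE = Pattern.compile("([^=]*)=(.*)");

    // Hypothetical helper: returns {left, right}, or throws if there is no '='.
    static String[] splitOnFirstEquals(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no '=' found in: " + input);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = splitOnFirstEquals("username = not ready");
        System.out.println("[" + parts[0] + "]"); // [username ]
        System.out.println("[" + parts[1] + "]"); // [ not ready]
    }
}
```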
With \\.* you are trying to match zero or many dot characters. So the expression "(\\.*)=(\\.*)" actually expects something like ..=. or ..=. The wild card for any character is a simple .. To fix your code, you can change your regular expression to "(.*)=(.*)". This would match as many characters as it can before the = symbol and all the characters afterwards.
However, this solution is ugly and is not the best approach to do the job. The best thing to do is to use the split method if you want to extract what's on the left and the right side of the = sign.
You can use the split() method of a String.
String[] parts = userstatus.split("=");
String username = parts[0];
String status = parts[1];
- If it's just a question of fetching two Strings with = as the point of separation, then I think a regex is overkill.
- One can use the split() method to handle this.
String lhs_Str = userstatus.split("=")[0];
String rhs_Str = userstatus.split("=")[1];
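One caveat with the split() approach: indexing [1] directly throws if the input has no =, and a plain split("=") breaks a value that itself contains =. A hedged sketch that uses the two-argument split(regex, limit) overload and checks the length first (SplitDemo and splitOnce are illustrative names, not from the answers above):

```java
final class SplitDemo {
    // limit = 2 means "split at most once", so the right-hand side keeps any further '='.
    static String[] splitOnce(String input) {
        return input.split("=", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitOnce("username = not ready");
        if (parts.length == 2) {
            System.out.println("[" + parts[0] + "]"); // [username ]
            System.out.println("[" + parts[1] + "]"); // [ not ready]
        }
        // No '=' at all: the array has a single element, so parts[1] would throw.
        System.out.println(splitOnce("novalue").length); // 1
    }
}
```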
I need to return a modified String in which the first instance of a token is replaced with another token, while skipping comments. Here's an example of what I'm talking about:
This whole quote is one big String
-- I don't want to replace this ##
But I want to replace this ##!
Being a former .NET developer, I thought this was easy. I'd just do a negative lookbehind like this:
(?<!--.*)##
But then I learned Java can't do this. So upon learning that the curly braces are okay, I tried this:
(?<!--.{0,9001})##
That didn't throw an exception, but it did match the ## in the comment.
When I test this regex with a Java regex tester, it works as expected. About the only thing I can think of is that I'm using Java 1.5. Is it possible that Java 1.5 has a bug in its regex engine? Assuming it does, how do I get Java 1.5 to do what I want it to do without breaking up my string and reassembling it?
EDIT I changed the # to the -- operator since it looks like the regex will be more complex with two chars instead of one. I originally did not reveal that I was modifying a query in order to avoid off topic discussion on "Well you shouldn't modify queries that way!" I have a very good reason for doing this. Please don't discuss query modification good practices. Thanks
You really don't need a negative look-behind here. You can do it without that too.
It would be like this:
String str = "I don't want to replace this ##";
str = str.replaceAll("^([^#].*?)##", "$1");
So, it replaces first occurrence of ## in the string that does not start with # with the part of the string before ##. So, ## is removed. Here replaceAll works because it uses a reluctant quantifier - .*?. So, it will automatically stop at the first ##.
As @nhahtdh correctly pointed out in the comments, this might fail if your comment is at the end of the line. So you can use this one instead:
String str = "I don't want to # replace this ##";
str = str.replaceAll("^([^#]*?)##", "$1");
This one will work for any case. And in the given example case, it won't replace the ##, as it is a part of the comment.
If your comment start is denoted by two characters, then negated character class won't work. You would need to use negative look-ahead like this:
String str = "This whole quote ## is one big String -- asdf ##\n" +
"-- I don't want to replace this ##\n" +
"But I want to replace this ##!";
str = str.replaceAll("(?m)^(((?!--).)*?)##", "$1");
System.out.println(str);
Output:
This whole quote is one big String -- asdf ##
-- I don't want to replace this ##
But I want to replace this !
(?m) at the beginning of the pattern enables MULTILINE mode, so ^ matches at the start of each line rather than only at the start of the entire input.
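Since the question asks for only the first token outside a comment to be replaced with another token, the same pattern can be combined with replaceFirst. This is a sketch under that reading of the question; the class and method names are invented, and Matcher.quoteReplacement is used so that a replacement containing $ or \ is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentAwareReplace {
    // From the start of a line, lazily consume characters as long as "--" does not
    // begin at the current position, then require the ## token.
    private static final Pattern TOKEN_OUTSIDE_COMMENT =
            Pattern.compile("(?m)^(((?!--).)*?)##");

    // Hypothetical helper: replaces only the first ## that is not inside a "--" comment.
    static String replaceFirstToken(String text, String replacement) {
        return TOKEN_OUTSIDE_COMMENT.matcher(text)
                .replaceFirst("$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "This whole quote is one big String\n"
                + "-- I don't want to replace this ##\n"
                + "But I want to replace this ##!";
        System.out.println(replaceFirstToken(text, "FOO"));
        // This whole quote is one big String
        // -- I don't want to replace this ##
        // But I want to replace this FOO!
    }
}
```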
You can use something like this:
String string = "This whole quote is one big String\n" +
"# I don't want to replace this ##\n" +
"And I also # don't want to replace this ##\n" +
"But I want to replace this ##!\n" +
"But not this ##!";
Matcher m =
Pattern.compile (
"^((?:[^##]|#[^#]|#[^\n]*)*)##", Pattern.MULTILINE).
matcher (string);
StringBuffer result = new StringBuffer ();
if (m.find ())
m.appendReplacement (result, "$1FOO");
m.appendTail (result);
System.out.println (result.toString ());
I have tried to use a regex in Java to strip any stray characters from a mobile-number string, but it doesn't seem to remove the '-' between the digits.
Here is my code:
// Remove all (,),-,.,[,],<,>,{,} from string
myMobileNumber.replaceAll("[^\\d]", "");
Example: 65-12345678
It still lets the - through without deleting it. =(
You should reassign the result. A String is an immutable object, and all methods including .replaceAll won't modify it.
myMobileNumber = myMobileNumber.replaceAll("[^\\d]", "");
(BTW, the pattern "\\D" is equivalent to "[^\\d]".)
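A small sketch of the difference between discarding and reassigning the result (the class and helper names are illustrative only):

```java
final class DigitsOnlyDemo {
    // \D matches any non-digit character, so this keeps digits only.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String myMobileNumber = "65-12345678";

        // The result is computed and then thrown away; the original is untouched.
        myMobileNumber.replaceAll("\\D", "");
        System.out.println(myMobileNumber); // 65-12345678

        // Reassigning keeps the cleaned-up copy.
        myMobileNumber = digitsOnly(myMobileNumber);
        System.out.println(myMobileNumber); // 6512345678
    }
}
```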
Looking to find the appropriate regular expression for the following conditions:
I need to clean certain tags within free flowing text. For example, within the text I have two important tags: <2004:04:12> and <name of person>. Unfortunately some of tags have missing "<" or ">" delimiter.
For example, some are as follows:
1) <2004:04:12 , I need this to be <2004:04:12>
2) 2004:04:12>, I need this to be <2004:04:12>
3) <John Doe , I need this to be <John Doe>
I attempted to use the following for situation 1:
String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]";
String output = content.replaceAll(regex,"$0>");
This did find all instances of "<2004:04:12" and the result was "<2004:04:12 >".
However, I need to eliminate the space prior to the ending tag.
Not sure this is the best way. Any suggestions.
Thanks
Basically, you are looking for a negative look-ahead, like this:
String regex = "<\\d{4}-\\d{2}-\\d{2}(?!>)";
String output = content.replaceAll(regex,"$0>");
This will help with the numeric "tags", but since no regex can be intelligent enough to match an arbitrary name, you either must define very closely what a name can look like, or deal with the fact that the same approach is impossible for "name" tags.
To fix the dates, you can match any date with zero, one, or two angle brackets:
String regex = "(\\s?\\<?)(\\d{4}:\\d{2}:\\d{2})(\\>?\\s)";
String replace = " <$2> ";
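A simplified variant of the same idea, assuming colon-separated dates as in the question's examples: make both brackets optional, capture only the date, and write it back with both brackets. It leaves the surrounding whitespace alone rather than normalizing it, and the class and method names are made up for the sketch.

```java
import java.util.regex.Pattern;

final class DateTagFixer {
    // Optional '<', the date itself (captured), optional '>'.
    private static final Pattern DATE_TAG =
            Pattern.compile("<?(\\d{4}:\\d{2}:\\d{2})>?");

    // Hypothetical helper: rewrites every date so it has both delimiters.
    static String fixDateTags(String content) {
        return DATE_TAG.matcher(content).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        System.out.println(fixDateTags("on <2004:04:12 we met"));  // on <2004:04:12> we met
        System.out.println(fixDateTags("on 2004:04:12> we met"));  // on <2004:04:12> we met
        System.out.println(fixDateTags("on <2004:04:12> we met")); // on <2004:04:12> we met
    }
}
```

Be aware that this also wraps a date that has no bracket at all, which may or may not be wanted for a given text.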
To recognise a name, we assume parts of the name begin with a capital letter and the only separator is a space. We match the angle bracket explicitly at the start or end, and the preceding/succeeding char before/after the name should be only a space or punctuation.
String regex = "(\\<[A-Z][a-zA-Z]*(\\s[A-Z][a-zA-Z]*)*)(?=[\\.!?:;\\s])";
String replace = "$1>";
String regex = "(?<=[\\.!?:;\\s])([A-Z][a-zA-Z]*(\\s[A-Z][a-zA-Z]*)*\\>)";
String replace = "<$1";
I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", but not as in a new line, but literally "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n".
This doesn't seem to work:
String string;
string = string.replaceAll("\\n", "");
Neither does this:
String string;
string = string.replaceAll("\n", "");
I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.
Also, what would be an effective way to remove different patterns of wrong text from a String? I'm using regular expressions to detect them (HTML reserved characters, etc.) together with replaceAll, but every time I use replaceAll, the whole String is read, right?
UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D
Hooknc is right. I'd just like to post a little explanation:
"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n" and thinks new line, and would remove those (and not the literal "\n" you have).
"\n" is translated to a real new line by the compiler. So the new line character is sent to the regex engine.
"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes the second, so the pattern checks for the literal characters '\' and 'n', giving you the desired result.
Java is nice (it's the language I work in) but having to think to basically double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
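The escaping levels described above can be checked with a short sketch. The test string contains both a literal backslash-n pair and a real newline; the class and method names are illustrative.

```java
final class BackslashNDemo {
    // Java source "\\\\n" -> regex \\n -> matches a literal backslash followed by 'n'.
    static String removeLiteralBackslashN(String s) {
        return s.replaceAll("\\\\n", "");
    }

    // Java source "\\n" -> regex \n -> matches a real newline character.
    static String removeRealNewlines(String s) {
        return s.replaceAll("\\n", "");
    }

    public static void main(String[] args) {
        // Characters: a, backslash, n, b, LF, c
        String raw = "a\\nb\nc";

        // Literal pair removed, real newline kept: a, b, LF, c
        System.out.println(removeLiteralBackslashN(raw).equals("ab\nc")); // true

        // Real newline removed, literal pair kept: a, backslash, n, b, c
        System.out.println(removeRealNewlines(raw).equals("a\\nbc"));     // true
    }
}
```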
I think you need to add a couple more slashies...
String string;
string = string.replaceAll("\\\\n", "");
Explanation:
The number of slashies has to do with the fact that "\n" by itself is a control character in Java.
So to get the real characters of "\n" somewhere we need to use "\\n", which if printed out will give us: "\n"
You're looking to replace all "\n" in your file, but you're not looking to replace the control character "\n". So you tried "\\n", which will be converted into the characters "\n". Great, but maybe not so much. My guess is that the replaceAll method will actually create a regular expression from the "\n" characters, which will then be read as the control character "\n".
Whew, almost done.
Using replaceAll("\\\\n", "") will first convert "\\\\n" -> "\\n", which is what the regular expression receives. The "\\n" in the regular expression then represents your literal text "\n", which is what you're looking to replace.
Instead of String.replaceAll(), which uses regular expressions, you might be better off using String.replace(), which does simple string substitution (if you are using at least Java 1.5).
String replacement = string.replace("\\n", "");
should do what you want.
string = string.replaceAll(""+(char)10, " ");
Try this. Hope it helps.
raw = raw.replaceAll("\t", "");
raw = raw.replaceAll("\n", "");
raw = raw.replaceAll("\r", "");
The other answers have sufficiently covered how to do this with replaceAll, and how you need to escape backslashes as necessary.
Since Java 1.5, there is also String.replace(CharSequence, CharSequence), which performs literal string replacement. This can greatly simplify many string replacement problems, because there is no need to escape any regular expression metacharacters like ., *, |, and yes, \ itself.
Thus, given a string that can contain the substring "\n" (not '\n'), we can delete them as follows:
String before = "Hi!\\n How are you?\\n I'm \n good!";
System.out.println(before);
// Hi!\n How are you?\n I'm
// good!
String after = before.replace("\\n", "");
System.out.println(after);
// Hi! How are you? I'm
// good!
Note that if you insist on using replaceAll, you can prevent the ugliness by using Pattern.quote:
System.out.println(
before.replaceAll(Pattern.quote("\\n"), "")
);
// Hi! How are you? I'm
// good!
You should also use Pattern.quote when you're given an arbitrary string that must be matched literally instead of as a regular expression pattern.
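For instance, here is a sketch of a helper that removes an arbitrary, caller-supplied substring. The helper name removeLiteral is hypothetical; without Pattern.quote, the dot in "1.5" would act as a wildcard.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Hypothetical helper: treats 'literal' as plain text, not as a regex.
    static String removeLiteral(String text, String literal) {
        return text.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String text = "1.5 and 1x5";

        // Unquoted: '.' matches any character, so "1x5" is removed too.
        System.out.println("[" + text.replaceAll("1.5", "") + "]"); // [ and ]

        // Quoted: only the exact substring "1.5" is removed.
        System.out.println("[" + removeLiteral(text, "1.5") + "]"); // [ and 1x5]
    }
}
```

For a fixed literal, String.replace gives the same result without involving Pattern.quote.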
I used this solution to solve that problem:
String replacement = str.replaceAll("[\n\r]", "");
Normally \n works fine. Otherwise you can opt for multiple replaceAll statements.
First apply one replaceAll to the text, then apply replaceAll again to the result. That should do what you are looking for.
I believe replaceAll() is an expensive operation. The below solution will probably perform better:
String temp = "Hi \n Wssup??";
System.out.println(temp);
StringBuilder result = new StringBuilder();
StringTokenizer t = new StringTokenizer(temp, "\n");
while (t.hasMoreTokens()) {
result.append(t.nextToken().trim());
}
String result_of_temp = result.toString();
System.out.println(result_of_temp);