Java Regex Remove Text Between and Including Parenthesis from String

I am programming in Java, and I have a few Strings that look similar to this:
"Avg. Price ($/lb)"
"Average Price ($/kg)"
I want to remove the ($/lb) and ($/kg) from both Strings and be left with
"Avg. Price"
"Average Price".
My code checks whether a String str variable matches one of the strings above, and if it does, replaces the text inside including the parentheses with an empty string:
if(str.matches(".*\\(.+?\\)")){
str = str.replaceFirst("\\(.+?\\)", "");
}
When I change str.matches to str.contains("$/lb"); as a test, the wanted substring is removed which leads me to believe there is something wrong with the if statement. Any help as to what I am doing wrong? Thank you.
Update
I changed the if statement to:
if(str.contains("(") && str.contains (")"))
Maybe not an elegant solution but it seems to work.

str.matches has always been problematic for me: it behaves as if the regex you pass it were wrapped in '^' and '$', so it only succeeds when the whole string matches.
Since you just care about replacing any occurrence of the string in question - try the following:
str = str.replaceAll("\\s+\\(\\$\\/(lb|kg)\\)", "");
There is an online regex testing tool that you can also try out to see how your expression works out.
EDIT With regard to your comment, the expression could be altered to just:
str = str.replaceAll("\\s+\\([^)]+\\)$", "");
This would mean, find any section of content starting with one or more white-space characters, followed by a literal '(', then look for any sequence of non-')' characters, followed by a literal ')' at the end of the line.
Is that more in-line with your expectation?
Additionally, heed the comment regarding matches() vs. find(): that difference is most likely what is affecting your code here.
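The matches() vs. find() difference mentioned above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the class name MatchesVsFind and its helper methods are made up for the example, and only standard java.util.regex calls are used.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class MatchesVsFind {
    private static final Pattern PARENS = Pattern.compile("\\(.+?\\)");

    // matches(): true only if the ENTIRE input is matched by the pattern.
    static boolean wholeStringMatches(String s) {
        return PARENS.matcher(s).matches();
    }

    // find(): true if the pattern matches ANY part of the input.
    static boolean containsMatch(String s) {
        return PARENS.matcher(s).find();
    }

    // Removes a trailing parenthesized group plus the whitespace before it.
    static String stripTrailingParens(String s) {
        return s.replaceAll("\\s+\\([^)]+\\)$", "");
    }

    public static void main(String[] args) {
        String s = "Avg. Price ($/lb)";
        System.out.println(wholeStringMatches(s));  // false
        System.out.println(containsMatch(s));       // true
        System.out.println(stripTrailingParens(s)); // Avg. Price
    }
}
```

Since replaceAll simply returns the input unchanged when nothing matches, the surrounding if test is optional in this sketch.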

Unlike most other popular application languages, the matches() method in java only returns true if the regex matches the whole string (not part of the string like in perl, ruby, php, javascript etc).
The regex to match bracketed input, including any leading spaces, is:
" *\\(.+?\\)"
and the code to use this to remove matches is:
str = str.replaceAll(" *\\(.+?\\)", "");
Here's some test code:
String str = "foo (stuff) bar(whatever)";
str = str.replaceAll(" *\\(.+?\\)", "");
System.out.println(str);
Output:
"foo bar"

This code works fine:
String str = "Avg. Price ($/lb) Average Price ($/kg)";
if (str.matches(".*\\(.+?\\)")) {
str = str.replaceAll("\\(.+?\\)", "");
}
System.out.println("str: "+str);
This will print Avg. Price Average Price, which is what you need.
Note: I changed replaceFirst to replaceAll here.

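String first = "^(\\w+\\.\\s\\w+)";
Applied to the first string with a Matcher, group 1 of this pattern would give Avg. Price
String second = "^(\\w+\\s\\w+)";
Applied to the second string, group 1 of this pattern would give Average Price
Hope this simple answer helps.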

Related

Replace a nth character using regex in Java

I'm trying to learn regex in Java.
So far, I've been trying some little mini challenges and I'm wondering if there is a way to define a nth character.
For instance, let's say I have this string: todayiwasnotagoodday
If I want to replace the third (or fourth, or seventh) character, how can I define a regex that changes a specific "index"? For this example, I want to replace the 'd' with an empty space.
I've been searching about it, but so far my implementations match from the first element to the third: ^[a-z]{3}
Is it possible to define this regex?
Thanks in advance.

If you want to replace the third character with a space via regex, you could try a regex replace all:
String input = "todayiwasnotagoodday";
String output = input.replaceAll("^(.{2}).(.*)$", "$1 $2");
System.out.println(output); // to ayiwasnotagoodday
Note that you could also avoid regex here, and just use substring operations:
String output = input.substring(0, 2) + " " + input.substring(3);
System.out.println(output); // to ayiwasnotagoodday
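The regex above can be generalized to any position by building the quantifier from the index. The following is a sketch under that assumption; NthCharReplacer and replaceNth are hypothetical names, and n is taken to be 1-based.

```java
import java.util.regex.Matcher;

// Hypothetical helper class, for illustration only.
final class NthCharReplacer {
    // Replaces the n-th character (1-based) of input with replacement.
    // Returns input unchanged when n is out of range.
    static String replaceNth(String input, int n, String replacement) {
        if (n < 1 || n > input.length()) {
            return input;
        }
        // (?s) lets '.' match line terminators too; group 1 keeps the first n-1 chars.
        String regex = "(?s)^(.{" + (n - 1) + "}).";
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return input.replaceFirst(regex, "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "todayiwasnotagoodday";
        System.out.println(replaceNth(input, 3, " ")); // to ayiwasnotagoodday
        System.out.println(replaceNth(input, 4, ""));  // todyiwasnotagoodday
        System.out.println(replaceNth(input, 7, "_")); // todayi_asnotagoodday
    }
}
```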

How to check if a word ends and starts with a common symbol and replace it as many times it appears with 1

I am facing a little challenge, here's what I've been trying to do.
Assuming I have these 2 variables
String word1 ="hello! hello!! %can you hear me%? Yes I can.";
And then this one
String word2 ="*Java is awesome* Do you % agree % with us?";
I want to be able to check whether a variable contains a word that begins and ends with a particular symbol, like the % and * I am using, and replace that word with '1' (one). Here's what I tried.
StringTokenizer st = new StringTokenizer(word1);
while(st.hasMoreTokens()){
String block = st.nextToken();
if( (block.startsWith("%") && block.endsWith("%") ||(block.startsWith("*") && block.endsWith("*")){
word1.replace (block,"1");
}
}
//output
'hello!hello!!%canyouhearme%?YesIcan."
//expected
"hello! hello!! 1? Yes I can.";
It just ended up stripping the spaces. I guess this is because the delimiter used is a space, and since the last token ends with %? rather than %, it is read as a single block.
When I tried the same for word2
I got "1Doyou%agree%withus?"
//expected
"1 Do you 1 with us?"
And assuming I have another word like
String word3 ="%*hello*% friends";
I want to be able to produce
//output
"1friends"
//expected
"11 friends"
Since it has 4-symbols
Any help would be truly appreciated, just sharpening my java skills. Thanks.

You can use a Regular Expression (RegEx) within the String.matches() method for determining if a string contains the specific criteria, for example:
if (word1.matches(".*\\*.*\\*.*|.*\\%.*\\%.*")) {
// Replace desired test with the value of 1 here...
}
If you want the full explanation of this regular expression then go to regex101.com and enter the following expression: .*\*.*\*.*|.*\%.*\%.*.
The above if statement condition utilizes the String.matches() method to validate whether or not the string contains text (or no text) between either asterisks (*) or between percent (%) characters. If it does we simply use the String.replaceAll() method to replace those string sections (between and including *...* and %...%) with the value of 1, something like this:
String word1 = "hello! hello!! %can you hear me%? Yes I can.";
if (word1.matches(".*\\*.*\\*.*|.*\\%.*\\%.*")) {
String newWord1 = word1.replaceAll("\\*.*\\*|%.*%", "1");
System.out.println(newWord1);
}
The Console window will display:
hello! hello!! 1? Yes I can.
If you were to plug this string: "*Java is awesome* Do you % agree % with us?" into the above code, your console window will display:
1 Do you 1 with us?
Keep in mind that this will provide the same output to console if your supplied string was "** Do you %% with us?". If you don't really want this then you will need to modify the RegEx within the matches() method a wee bit to something like this:
".*\\*.+\\*.*|.*\\%.+\\%.*"
and you will need to modify the RegEx within the replaceAll() method to this:
"\\*.+\\*|%.+%"
With this change there must now be text between the asterisks or the percent characters before validation is successful and a change is made.

The question isn't clear (not sure about how %*hello*% somehow translates to 11, and didn't understand what you mean by Since it has 4-symbols), but wouldn't regular expressions work?
Can't you simply do:
String replaced = word1.replaceAll("\\*[^\\*]+\\*", "1")
.replaceAll("\\%[^\\%]+\\%", "1");
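To see how the negated character classes above behave on the three sample strings from the question, here is a small runnable sketch. DelimitedReplacer and its method names are hypothetical; the greedy variant is included only to show why `.+` can swallow too much when a string contains more than one delimited section.

```java
// Hypothetical helper class, for illustration only.
final class DelimitedReplacer {
    // Negated character classes stop at the next closing delimiter.
    static String replaceDelimited(String s) {
        return s.replaceAll("\\*[^*]+\\*", "1")
                .replaceAll("%[^%]+%", "1");
    }

    // Greedy variant: '.+' runs to the LAST delimiter on the line.
    static String replaceGreedy(String s) {
        return s.replaceAll("\\*.+\\*|%.+%", "1");
    }

    public static void main(String[] args) {
        System.out.println(replaceDelimited("hello! hello!! %can you hear me%? Yes I can."));
        // hello! hello!! 1? Yes I can.
        System.out.println(replaceDelimited("*Java is awesome* Do you % agree % with us?"));
        // 1 Do you 1 with us?
        System.out.println(replaceDelimited("%*hello*% friends"));
        // 1 friends
        System.out.println(replaceDelimited("*a* b *c*")); // 1 b 1
        System.out.println(replaceGreedy("*a* b *c*"));    // 1
    }
}
```

Note that "%*hello*% friends" becomes "1 friends" here, not "11 friends": the inner *hello* is replaced first, and the remaining %1% is then replaced as a single section.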

I would say your presumption that the special characters will be replaced twice is wrong. String.replace only substitutes a literal character sequence, so it cannot express "everything between two symbols"; for that you need replaceAll, which takes a regular expression. In your code you are trying to replace the special characters together with the text between them, so replaceAll is the right tool.
In other words, when replaceAll runs it finds each delimited section and replaces it once. You don't need StringTokenizer (a legacy class in java.util) for this. So, with these patterns, you would only see 1 friends instead of 11 friends, and you don't need the if statement either. Credit goes to jbx above for the regex. You could shorten your code like this, bearing in mind that everything between the special characters, including the characters themselves, is replaced by a single 1.
You don't need an if statement to search first: replace and replaceAll already search the string you call them on, so the if statement is redundant and only makes the code more verbose.
package object_list_stackoverflow;
import java.util.StringTokenizer;
public class Object_list_stackoverflow {
public static void main(String[] args) {
String word1 = "hello! hello!! %can you hear me%? Yes I can.";
String word2 ="*Java is awesome* Do you % agree % with us?";
String word3 ="%*hello*% friends";
String regex = "\\*[^\\*]+\\*";
String regex1= "\\%[^\\%]+\\%";
System.out.println(word3.replaceAll(regex, "1").replaceAll(regex1, "1"));
}
}
Also read similar question by going to : Find Text between special characters and replace string
You can also get rid of special (non-alphanumeric) characters by looking at dhuma1981's answer: How to replace special characters in a string?
Syntax to remove non-alphanumeric characters from a String:
replaceAll("[^a-zA-Z0-9]", "");

Eliminating spaces and words starting with particular chars from JAVA string

With the following code spaces between string words are eliminated:
String str1= "This is symbel for snow and silk. Grapes are very dear";
String str2=str1.replaceAll(" ","");
System.out.println(str2);
It gives this output:-
output:
Thisissymbelforsnowandsilk.Grapesareverydear
But I want to eliminate all the words in str1 starting with char 's' (symbel snow silk) and char 'd' (dear) to get the following output:-
output:
Thisisforand.Grapesarevery
How can this be achieved by amending this code?

The best solution is to use a Regular Expression also known as a Regex.
These are designed specifically for complex search and replace functionality in strings.
This one:
"([sd]\\w+)|\\s+"
matches a group (indicated by the parentheses) consisting of 's' or 'd' followed by one or more "word" characters (\\w = any alphanumeric character or underscore), OR one or more whitespace characters (\\s = whitespace). When used as an argument to the String replaceAll function like so:
s.replaceAll("([sd]\\w+)|\\s+", "");
every occurrence that matches either of these two patterns is replaced with the empty string.
There is comprehensive information on regexes in Oracle's java documentation here:
http://docs.oracle.com/javase/tutorial/essential/regex/
Although they seem cryptic at first, learning them can greatly simplify your code. Regexes are available in almost all modern languages so any knowledge you gain about regexes is useful and transferable.
Furthermore, the web is littered with handy sites where you can test your regexes out before committing them to code.
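One caveat worth sketching: the pattern above has no word-boundary anchor, so it can also match an 's' or 'd' in the middle of a word (for example the "sed" in "used"). The variant below adds \b so that only words that start with 's' or 'd' are removed. It is an assumption about what the asker wants, not part of the original answer, and WordStripper is a hypothetical name. It uses \w* rather than \w+, so a one-letter word "s" or "d" is removed as well.

```java
// Hypothetical helper class, for illustration only.
final class WordStripper {
    // Original pattern from the answer: no word boundary.
    static String stripUnanchored(String s) {
        return s.replaceAll("([sd]\\w+)|\\s+", "");
    }

    // Variant with \b: the 's' or 'd' must be at the start of a word.
    static String stripAnchored(String s) {
        return s.replaceAll("\\b[sd]\\w*|\\s+", "");
    }

    public static void main(String[] args) {
        String str1 = "This is symbel for snow and silk. Grapes are very dear";
        System.out.println(stripUnanchored(str1)); // Thisisforand.Grapesarevery
        System.out.println(stripAnchored(str1));   // Thisisforand.Grapesarevery

        // The two differ once an 's' or 'd' appears mid-word:
        System.out.println(stripUnanchored("He used sand")); // Heu
        System.out.println(stripAnchored("He used sand"));   // Heused
    }
}
```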

Do like this
String str1= "This is symbel for snow and silk. Grapes are very dear";
System.out.print(str1.replaceAll("[sd][a-z]+|[ ]+",""));
Explanation

try this
s = s.replaceAll("([sd]\\w+)|\\s+", "");

How can I write a regex in Java that will perform a .replaceFirst on a group that is not in a comment?

So I need to return modified String where it replaces the first instance of a token with another token while skipping comments. Here's an example of what I'm talking about:
This whole quote is one big String
-- I don't want to replace this ##
But I want to replace this ##!
Being a former .NET developer, I thought this was easy. I'd just do a negative lookbehind like this:
(?<!--.*)##
But then I learned Java can't do this. So upon learning that the curly braces are okay, I tried this:
(?<!--.{0,9001})##
That didn't throw an exception, but it did match the ## in the comment.
When I test this regex with a Java regex tester, it works as expected. About the only thing I can think of is that I'm using Java 1.5. Is it possible that Java 1.5 has a bug in its regex engine? Assuming it does, how do I get Java 1.5 to do what I want it to do without breaking up my string and reassembling it?
EDIT I changed the # to the -- operator since it looks like the regex will be more complex with two chars instead of one. I originally did not reveal that I was modifying a query in order to avoid off topic discussion on "Well you shouldn't modify queries that way!" I have a very good reason for doing this. Please don't discuss query modification good practices. Thanks

You really don't need a negative look-behind here. You can do it without that too.
It would be like this:
String str = "I don't want to replace this ##";
str = str.replaceAll("^([^#].*?)##", "$1");
So, it replaces the first occurrence of ## in a string that does not start with # with the part of the string before the ##; in other words, the ## is removed. replaceAll works here because the pattern uses a reluctant quantifier, .*?, so it stops at the first ##.
As correctly pointed out by @nhahtdh in the comments, this might fail if the comment is at the end of the line. So you can use this one instead:
String str = "I don't want to # replace this ##";
str = str.replaceAll("^([^#]*?)##", "$1");
This one will work for any case. And in the given example case, it won't replace the ##, as it is a part of the comment.
If your comment start is denoted by two characters, then negated character class won't work. You would need to use negative look-ahead like this:
String str = "This whole quote ## is one big String -- asdf ##\n" +
"-- I don't want to replace this ##\n" +
"But I want to replace this ##!";
str = str.replaceAll("(?m)^(((?!--).)*?)##", "$1");
System.out.println(str);
Output:
This whole quote is one big String -- asdf ##
-- I don't want to replace this ##
But I want to replace this !
(?m) at the beginning of the pattern enables MULTILINE mode, so the ^ will match at the start of each line rather than only at the start of the entire input.
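Because replaceAll with this pattern removes the first ## on every non-comment line, a caller who wants only the first instance in the whole text (as the question asks) could use replaceFirst with the same pattern. The following is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;

// Hypothetical helper class, for illustration only.
final class FirstTokenOutsideComment {
    // Replaces only the first "##" that is not preceded by "--" on its line.
    static String replaceFirstToken(String text, String replacement) {
        // (?m): ^ matches at each line start.
        // ((?!--).)*? : lazily consume characters, refusing to step onto a "--".
        return text.replaceFirst("(?m)^(((?!--).)*?)##",
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "This whole quote is one big String\n"
                + "-- I don't want to replace this ##\n"
                + "But I want to replace this ##!\n"
                + "And this one stays ##";
        System.out.println(replaceFirstToken(text, "FOO"));
        // Only the "##" on the third line becomes FOO.
    }
}
```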

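You can use something like this:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String string = "This whole quote is one big String\n" +
"# I don't want to replace this ##\n" +
"And I also # don't want to replace this ##\n" +
"But I want to replace this ##!\n" +
"But not this ##!";
// Line start, then any run of characters that are neither '#' nor a newline,
// then the "##" token. A line with an earlier '#' (a comment) cannot match.
Matcher m = Pattern.compile("^([^#\n]*)##", Pattern.MULTILINE).matcher(string);
StringBuffer result = new StringBuffer();
// find() is called once, so only the first match is replaced.
if (m.find())
m.appendReplacement(result, "$1FOO");
m.appendTail(result);
System.out.println(result.toString());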

Text cleaning and replacement: delete \n from a text in Java

I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", but not as in a new line, but literally "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n".
This doesn't seem to work:
String string;
string = string.replaceAll("\\n", "");
Neither does this:
String string;
string = string.replaceAll("\n", "");
I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.
Also, what would be an effective way to remove different patterns of wrong text from a String? I'm using regular expressions to detect them (stuff like HTML reserved characters, etc.) together with replaceAll, but every time I use replaceAll, the whole String is read, right?
UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D

Hooknc is right. I'd just like to post a little explanation:
"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n" and thinks new line, and would remove those (and not the literal "\n" you have).
"\n" is translated to a real new line by the compiler. So the new line character is sent to the regex engine.
"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes it so that translates to checking for the literal characters '\' and 'n', giving you the desired result.
Java is nice (it's the language I work in) but having to think to basically double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
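The escaping levels described above can be checked with a short sketch. The class and method names are hypothetical; the sample text contains both a literal backslash followed by n and a real line break, so the effect of each call is visible.

```java
// Hypothetical helper class, for illustration only.
final class LiteralBackslashN {
    // Regex sees \\n: an escaped backslash followed by 'n' (the two literal characters).
    static String removeLiteralViaRegex(String s) {
        return s.replaceAll("\\\\n", "");
    }

    // String.replace takes no regex, so one level of escaping is enough.
    static String removeLiteralViaReplace(String s) {
        return s.replace("\\n", "");
    }

    // Regex sees \n: the newline escape, so REAL line breaks are removed.
    static String removeRealNewlines(String s) {
        return s.replaceAll("\\n", "");
    }

    public static void main(String[] args) {
        // Contains backslash + 'n' after "first" and a real newline after "second".
        String text = "first\\nsecond\nthird";
        System.out.println(removeLiteralViaRegex(text));   // firstsecond, then third on a new line
        System.out.println(removeLiteralViaReplace(text)); // same result
        System.out.println(removeRealNewlines(text));      // first\nsecondthird on one line
    }
}
```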

I think you need to add a couple more slashies...
String string;
string = string.replaceAll("\\\\n", "");
Explanation:
The number of slashies has to do with the fact that "\n" by itself is a control character in Java.
So to get the real characters of "\n" somewhere we need to use "\\n". Which, if printed out, will give us: "\n"
You're looking to replace all "\n" in your file. But you're not looking to replace the control "\n". So you tried "\\n" which will be converted into the characters "\n". Great, but maybe not so much. My guess is that the replaceAll method will actually create a Regular Expression now using the "\n" characters which will be misread as the control character "\n".
Whew, almost done.
Using replaceAll("\\\\n", "") will first convert "\\\\n" -> "\\n" which will be used by the Regular Expression. The "\\n" will then be used in the Regular Expression and actually represents your text of "\n". Which is what you're looking to replace.

Instead of String.replaceAll(), which uses regular expressions, you might be better off using String.replace(), which does simple string substitution (if you are using at least Java 1.5).
String replacement = string.replace("\\n", "");
should do what you want.

string = string.replaceAll(""+(char)10, " ");

Try this. Hope it helps.
raw = raw.replaceAll("\t", "");
raw = raw.replaceAll("\n", "");
raw = raw.replaceAll("\r", "");

The other answers have sufficiently covered how to do this with replaceAll, and how you need to escape backslashes as necessary.
Since Java 1.5, there is also String.replace(CharSequence, CharSequence), which performs literal string replacement. This can greatly simplify many string replacement problems, because there is no need to escape any regular expression metacharacters like ., *, |, and yes, \ itself.
Thus, given a string that can contain the substring "\n" (not '\n'), we can delete them as follows:
String before = "Hi!\\n How are you?\\n I'm \n good!";
System.out.println(before);
// Hi!\n How are you?\n I'm
// good!
String after = before.replace("\\n", "");
System.out.println(after);
// Hi! How are you? I'm
// good!
Note that if you insist on using replaceAll, you can prevent the ugliness by using Pattern.quote:
System.out.println(
before.replaceAll(Pattern.quote("\\n"), "")
);
// Hi! How are you? I'm
// good!
You should also use Pattern.quote when you're given an arbitrary string that must be matched literally instead of as a regular expression pattern.
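As a sketch of that last point, the helper below deletes every occurrence of an arbitrary literal from a text by quoting it first. LiteralDelete and deleteAll are hypothetical names; Pattern.quote and String.replaceAll are the standard library calls discussed above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LiteralDelete {
    // Treats 'literal' as plain text, even if it contains regex metacharacters.
    static String deleteAll(String text, String literal) {
        return text.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String text = "1.5 + 2.5";
        // Unquoted, "." is a regex that matches every character.
        System.out.println("[" + text.replaceAll(".", "") + "]"); // []
        // Quoted, only the literal dots are removed.
        System.out.println(deleteAll(text, "."));                 // 15 + 25
    }
}
```

For a fixed literal like this, text.replace(".", "") gives the same result without any regex; Pattern.quote is mainly useful when the literal has to be embedded in a larger pattern.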

I used this solution to solve that problem:
String replacement = str.replaceAll("[\n\r]", "");

Normally \n works fine. Otherwise, you can chain multiple replaceAll statements:
first apply one replaceAll to the text, and then apply replaceAll again to the result. That should do what you are looking for.

I believe replaceAll() is an expensive operation. The below solution will probably perform better:
String temp = "Hi \n Wssup??";
System.out.println(temp);
StringBuilder result = new StringBuilder();
StringTokenizer t = new StringTokenizer(temp, "\n");
while (t.hasMoreTokens()) {
result.append(t.nextToken().trim()).append("");
}
String result_of_temp = result.toString();
System.out.println(result_of_temp);
