Remove special characters surrounded by white space - java

How can I remove special characters that have white space on one side?
String webcontent = "This is my string. i got this string from blabla.com.";
When I use this regex
webcontent.replaceAll("[-.:,+^]*", "");
it becomes like this
String webcontent = "This is my string i got this string from blablacom"
which is not what I want. I want
"This is my string i got this string from blabla.com"

You must test for a whitespace character or the end of the string with a lookahead (?=...) ("followed by"):
webcontent.replaceAll("[-.?:,+^\\s]+(?:(?=\\s)|$)", "");
The lookahead is only a test and doesn't consume characters.
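If you want to do the same with all punctuation characters, you can use the punctuation character class \p{Punct}:
webcontent.replaceAll("[\\p{Punct}\\s+^]+(?:(?=\\s)|$)", "");
(Note: in Java, \p{Punct} is by default the POSIX US-ASCII punctuation class, which already includes + and ^. Listing them explicitly only matters for the Unicode punctuation category \p{P}, where + and ^ count as symbols rather than punctuation.)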
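As a quick check of the lookahead approach, here is a minimal sketch that wraps the first regex in a helper. The class and method names are made up for illustration; only the pattern comes from the answer.

```java
class LookaheadCleanup {
    // Removes runs of the listed special characters (or whitespace) only when
    // the run is followed by whitespace or sits at the end of the string.
    // The lookahead (?=\s) tests the next character without consuming it,
    // so the space between words is kept.
    static String stripTrailingSpecials(String text) {
        return text.replaceAll("[-.?:,+^\\s]+(?:(?=\\s)|$)", "");
    }

    public static void main(String[] args) {
        String webcontent = "This is my string. i got this string from blabla.com.";
        // The dot inside "blabla.com" is followed by a letter, so it survives.
        System.out.println(stripTrailingSpecials(webcontent));
    }
}
```

Because \s is part of the character class, extra whitespace that is itself followed by whitespace is removed as well.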

You can use negative lookahead to avoid this:
webcontent = webcontent.replaceAll("[-.:?,+^]+(?!\\w)", "");
//=> This is my string i got this string from blabla.com

Try this one
// any one or more special characters followed by space or in the end
// replace with single space
webcontent.replaceAll("[-.:,+]+(\\s|$)", " ").trim();
--EDIT--
If the special character is at the beginning:
webcontent.replaceAll("^([-.:,+]+)|[-.:,+]+(\\s|$)", " ").trim();
input:
.This is my string. i got this string from blabla.com.
output:
This is my string i got this string from blabla.com
--EDIT--
I want to replace ? also
webcontent.replaceAll("^([-.:,+]+|\\?+)|([-.:,+]+|\\?+)(\\s|$)", " ").trim();
input
..This is my string.. ?? i got this string from blabla.com..
output (".. " and "?? " are each replaced by one space, so a double space remains after "string")
This is my string  i got this string from blabla.com

Use the regex [-.:?,+^](\s|$) and remove the character for each match with basic string manipulation. It's a few more lines of code but much, much cleaner.
A pure Java solution, where you loop over all special characters and check the next character, is also quite simple.
As soon as there are lookaheads/lookbehinds involved, I usually fall back to a non-regex solution for clarity.
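A non-regex version could look roughly like the sketch below. It is only one way to do it: the class name, the method name and the SPECIALS constant are assumptions for illustration, and the character set is copied from the regex above.

```java
class SpecialCharStripper {
    // Assumption: the same character set as in the regex [-.:?,+^].
    private static final String SPECIALS = "-.:?,+^";

    // Drops a special character when it is the last character of the string
    // or when the next character is whitespace; everything else is copied.
    static String stripBeforeWhitespace(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean special = SPECIALS.indexOf(c) >= 0;
            boolean atEnd = i == text.length() - 1;
            boolean beforeSpace = !atEnd && Character.isWhitespace(text.charAt(i + 1));
            if (special && (atEnd || beforeSpace)) {
                continue; // skip this character
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripBeforeWhitespace("This is my string. i got this string from blabla.com."));
    }
}
```

Like the single-character regex, this removes only one character per position, so a run such as ".." before a space loses just its last dot in one pass.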

Related

How to check and replace a sequence of characters in a String?

Here is what the program is expected to output:
if originalString = "CATCATICATAMCATCATGREATCATCAT";
Output should be "I AM GREAT".
The code must find the sequence of characters (CAT in this case) and remove it. The resulting String must also have spaces between the words.
String origString = remixString.replace("CAT", "");
I figured out that I have to use String.replace, but what logic would detect the parts that are not CAT and produce the resulting string with spaces between the words?
First off, you probably want to use the replaceAll method instead, because it accepts a regular expression (replace already replaces every occurrence, but only of a literal string). Then, you want to introduce spaces, so instead of an empty String, replace "CAT" with " " (space).
As pointed out in a comment, there might be multiple spaces between words, so we use a regular expression to replace one or more consecutive instances of "CAT" with a single space. The '+' symbol means "one or more".
Finally, trim the String to get rid of leading and trailing white space.
remixString.replaceAll("(CAT)+", " ").trim()
You can use replaceAll which accepts a regular expression:
String remixString = "CATCATICATAMCATCATGREATCATCAT";
String origString = remixString.replaceAll("(CAT)+", " ").trim();
Note: the naming of replace and replaceAll is very confusing. They both replace all instances of the matching string; the difference is that replace takes a literal text as an argument, while replaceAll takes a regular expression.
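To make the difference concrete, here is a small sketch that runs both calls on the question's input. The class and method names are invented for the example.

```java
class RemixDecoder {
    // Literal replacement: every "CAT" becomes its own space,
    // so adjacent CATs leave runs of spaces behind.
    static String withLiteralReplace(String remix) {
        return remix.replace("CAT", " ");
    }

    // Regex replacement: one or more consecutive CATs collapse into a single
    // space, and trim() removes the leading and trailing space.
    static String withRegexReplace(String remix) {
        return remix.replaceAll("(CAT)+", " ").trim();
    }

    public static void main(String[] args) {
        String remixString = "CATCATICATAMCATCATGREATCATCAT";
        System.out.println("[" + withLiteralReplace(remixString) + "]");
        System.out.println("[" + withRegexReplace(remixString) + "]"); // [I AM GREAT]
    }
}
```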
Maybe this will help
String result = remixString.replaceAll("(CAT){1,}", " ");

How can we remove ':' characters from a string?

I have strings like
#lle #mme: #crazy #upallnight:
I would like to remove the words which start with either # or #. It works perfectly fine if those words don't contain the ':' character. However, the ':' character is left behind whenever I delete the words. Therefore I decided to replace the ':' characters before deleting the words, using String.replace(). However, they are still not removed.
String example = "#lle #mme: #crazy #upallnight:";
example.replace(':',' ');
The result : #lle #mme: #crazy #upallnight:
I am pretty stuck here; any help would be appreciated.
You can do this:
example = example.replaceAll(" +[##][^ ]+", "");
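This replaces every substring that matches the regex pattern " +[##][^ ]+" (one or more spaces, the marker character, then a run of non-space characters) with the empty string. A trailing ':' is part of the non-space run, so it is removed together with the word. Note that the leading " +" requires at least one space before the word, so a word at the very start of the string is not matched.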
Demo of the pattern on Regex101
From Java docs:
String s = "Abc: abc#:";
String result = s.replace(':',' ');
Output in variable result = "Abc  abc# " (each ':' becomes a space)
I think you forgot to store the value returned by the replace() method in a String variable.
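Since Java strings are immutable, replace() never changes the string it is called on; it returns a new one. A minimal sketch of the fix, with a made-up helper name:

```java
class ColonRemover {
    // Returns a copy of the text with every ':' removed.
    // The original String object is left untouched.
    static String removeColons(String text) {
        return text.replace(":", "");
    }

    public static void main(String[] args) {
        String example = "#lle #mme: #crazy #upallnight:";
        example.replace(':', ' ');       // result is discarded, example is unchanged
        System.out.println(example);
        example = removeColons(example); // keep the returned value
        System.out.println(example);
    }
}
```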

Remove unwanted characters from string by regex in Java

I have a string here:
javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello
I want to remove everything before the ":", including the ":" itself. This would leave only "Hello". I read about regex, but no combination I tried worked. Can someone tell me how to do it. Thanks in advance!
You need to use replaceAll method or replaceFirst.
string.replaceFirst(".*:\\s*", "");
or
string.replaceAll(".*:\\s*", "");
This would give you only Hello. If you remove the \\s* pattern, then it would give you <space>Hello.
.* Matches any character zero or more times, greedily.
: Matches up to the (last) colon.
\\s* Matches zero or more whitespace characters.
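Put together, the pattern can be tried out like this. The class and method names are just placeholders for the example.

```java
class LabelText {
    // Strips everything up to and including the last ':' plus any whitespace
    // after it. The greedy .* is what makes it run to the last colon.
    static String afterLastColon(String text) {
        return text.replaceFirst(".*:\\s*", "");
    }

    public static void main(String[] args) {
        String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
        System.out.println(afterLastColon(sample)); // Hello
    }
}
```

If the text after the label can itself contain a colon, a reluctant ".*?" would stop at the first colon instead.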
You could also just split the string by : and take the second string. Like this
String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
System.out.println(sample.split(":", -1)[1]);
This will output
<space>Hello
If you want to get rid of that leading space just trim it off like
System.out.println(sample.split(":", -1)[1].trim());

Java Regex to replace string surrounded by non alphanumeric characters

I need a way to replace words in sentences so for example, "hi, something". I need to replace it with "hello, something".
str.replaceAll("hi", "hello") gives me "hello, somethellong".
I've also tried str.replaceAll(".*\\W.*" + "hi" + ".*\\W.*", "hello"), which I saw on another solution on here however that doesn't seem to work either.
What's the best way to achieve this so I only replace words not surrounded by other alphanumeric characters?
Word boundaries should serve you well in this case (and IMO are the better solution). A more general method is to use negative lookahead and lookbehind:
String input = "ab, abc, cab";
String output = input.replaceAll("(?<!\\w)ab(?!\\w)", "xx");
System.out.println(output); //xx, abc, cab
This searches for occurrences of "ab" that are not preceded or followed by another word character. You can swap out "\w" for any regex (well, with practical limitations, as many regex engines, Java included, don't allow unbounded lookbehind).
Use \\b for word boundaries:
String regex = "\\bhi\\b";
e.g.,
String text = "hi, something";
String regex = "\\bhi\\b";
String newString = text.replaceAll(regex, "hello");
System.out.println(newString);
If you're going to be doing any amount of regular expressions, make this Regular Expressions Tutorial your new best friend. I can't recommend it too highly!
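For comparison, here is a small sketch that runs the naive replacement, the word-boundary version and the lookaround version side by side. The class and method names are made up; the patterns are the ones from the answers above.

```java
class WordReplacer {
    // Naive version from the question: also hits "hi" inside "something".
    static String naive(String text) {
        return text.replaceAll("hi", "hello");
    }

    // \b matches between a word character and a non-word character,
    // so only a standalone "hi" is replaced.
    static String withWordBoundary(String text) {
        return text.replaceAll("\\bhi\\b", "hello");
    }

    // Same idea with explicit lookbehind/lookahead, using the "ab" example.
    static String withLookaround(String text) {
        return text.replaceAll("(?<!\\w)ab(?!\\w)", "xx");
    }

    public static void main(String[] args) {
        System.out.println(naive("hi, something"));            // hello, somethellong
        System.out.println(withWordBoundary("hi, something")); // hello, something
        System.out.println(withLookaround("ab, abc, cab"));    // xx, abc, cab
    }
}
```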

Removing all standalone occurences of a word from a string with regular expressions in Java

Need advice on how to replace a sub-string like: #sometext, but not replace "#someothertext#somemail.com" sub-string.
For example, when I've got a string something like:
An example with #sometext and also with "#someothertext#somemail.com" sometextafter
And the result, after replacing sub-strings in string above should look like:
An example with and also with "#someothertext#somemail.com" sometextafter
After getting string from a field, I'm using:
String textMod = someText.replaceAll("( |^)[^\"]#[^#]+?( |$)","");
someText = textMod + "#\"" + someone.getEmail() + "\" ";
And then I'm setting this string into field.
You can match a standalone occurrence with a regex this way:
\b#sometext\b
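Putting \b before and after #sometext is meant to make sure that it's a standalone word and not part of another word like #someothertext#sometext.com. (Be aware that \b only matches between a word character and a non-word character; since # is not a word character, the leading \b does not match when # follows a space or starts the string.) If a match is found, the result is put in $match, and you can then do whatever you want with $match.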
Hope this helps
From https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
The \b in the pattern indicates a word boundary, so only the distinct word "web" is matched, and not a word partial like "webbing" or "cobweb"
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
}
^ PHP example but you get the point
If there is always a space before and behind the tags to replace, this might suffice.
/\s(#\w+)\s/g
Try this
(?<!\w)#[^#\s]+(?!\S)
See it here on Regexr
Match on a #, but only if there is no word character \w before it: (?<!\w). Then match a sequence of characters that are neither # nor whitespace \s, but only if it's not followed by a non-whitespace character \S.
(?<!\w) is called a negative lookbehind assertion
[^#\s] is called a negated character class, means match anything that is not part of the class
(?!\S) is a negative lookahead assertion
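In Java this could be used roughly as follows. This is only a sketch: the class and method names are invented, and the second replaceAll that collapses the leftover double space is my addition, not part of the pattern above.

```java
class StandaloneTagRemover {
    static String removeStandaloneTags(String text) {
        // Remove "#word" only when it is not glued to a preceding word
        // character and is followed by whitespace or the end of the string.
        String withoutTags = text.replaceAll("(?<!\\w)#[^#\\s]+(?!\\S)", "");
        // Assumption: collapse the gap left behind by the removed tag.
        return withoutTags.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        String someText = "An example with #sometext and also with "
                + "\"#someothertext#somemail.com\" sometextafter";
        System.out.println(removeStandaloneTags(someText));
    }
}
```

The quoted address is left alone because its first # run is followed by another #, and its second # is preceded by a word character.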
This should correspond to your needs:
str = str.replaceAll("#\\w+[^#]", "");
(c#, regex based)
//match #xxx sequences, but only if i can look back and NOT see a #xxx immediately preceding me, and if I don't end with a #
string input = @"[An example with #hello and also with ""##hello#somemail.com"" sometext #lastone";
var pattern = @"(?<!#\w+)(?>#\w+)(?!#)";
var matches = Regex.Matches(input, pattern);
Simply adding spaces before and after "#sometext" would not work if "#sometext" is at the start or end of a sentence. However, just adding a pattern checking for start or end of sentence would not work either, as when you match "#sometext " at the start of a sentence and leave a space " ", this will make the resulting string look strange. Same goes for the end of a sentence.
We need to split the regex replace into two actions and perform two separate regex replaces:
str = str.replaceAll(" #sometext ", " ");
str = str.replaceAll("^#sometext | #sometext$|(?:#sometext ){2,}", "");
^ means start of line, $ means end of line.
EDIT: Added corner case handling of when several #sometext's are after each other.
myString = myString.replaceAll(" #hello ", " ");
If #hello is a single word, then it has spaces before and after, right? So you should find every #hello with a space before and after it and replace it with a space.
If you need to remove not only #hello but all words which start with # and do not contain another #, use this:
myString = myString.replaceAll(" #[^#]+? ", " ");
[^#] is any symbol except #. +? is a lazy quantifier: it matches at least one character and stops at the first space.
If you want to remove only words made of alphanumeric characters, use \\w instead of [^#].
EDIT:
Yeah, ohaal's right. To make it match at the start and the end of string use this pattern:
( |^)#[^#]+?( |$)
myString = myString.replaceAll("( |^)#hello( |$)", " ");
