Java regex to replace a string surrounded by non-alphanumeric characters

I need a way to replace whole words in sentences. For example, given "hi, something", I need to turn it into "hello, something".
str.replaceAll("hi", "hello") gives me "hello, somethellong".
I've also tried str.replaceAll(".*\\W.*" + "hi" + ".*\\W.*", "hello"), which I saw in another answer here, but that doesn't work either.
What's the best way to achieve this so I only replace words not surrounded by other alphanumeric characters?

Word boundaries should serve you well in this case (and IMO are the better solution). A more general method is to use negative lookahead and lookbehind:
String input = "ab, abc, cab";
String output = input.replaceAll("(?<!\\w)ab(?!\\w)", "xx");
System.out.println(output); //xx, abc, cab
This searches for occurrences of "ab" that are not preceded or followed by another word character. You can swap out "\w" for any regex, within practical limits: engines restrict what a lookbehind may contain, and Java, for example, rejects many lookbehinds that have no obvious maximum length.
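To illustrate swapping out \w: the question asks about alphanumeric characters, but \w also counts _ as a word character. Here is a minimal sketch, assuming ASCII letters and digits are what "alphanumeric" should mean. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlnumBoundary {
    // Replaces `word` only where it is NOT preceded or followed by an ASCII letter or digit.
    // Assumption: [A-Za-z0-9] stands for "alphanumeric"; unlike \w it does not include '_'.
    static String replaceWord(String input, String word, String replacement) {
        String regex = "(?<![A-Za-z0-9])" + Pattern.quote(word) + "(?![A-Za-z0-9])";
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("hi, something", "hi", "hello")); // hello, something
        System.out.println(replaceWord("hi_there", "hi", "hello"));      // hello_there
        // With \w in the lookarounds, '_' counts as a word character, so nothing changes:
        System.out.println("hi_there".replaceAll("(?<!\\w)hi(?!\\w)", "hello")); // hi_there
    }
}
```

Pattern.quote keeps the searched word literal, and Matcher.quoteReplacement does the same for $ and \ in the replacement.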

Use \\b for word boundaries:
String regex = "\\bhi\\b";
e.g.,
String text = "hi, something";
String regex = "\\bhi\\b";
String newString = text.replaceAll(regex, "hello");
System.out.println(newString);
If you're going to be writing regular expressions in any quantity, make this Regular Expressions Tutorial your new best friend. I can't recommend it too highly!
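If the word comes from a variable, quote it before building the \b pattern. Below is a sketch using a hypothetical helper, replaceWholeWord. It assumes the word starts and ends with word characters, because \b next to a non-word character behaves differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryReplace {
    // Replaces every whole-word occurrence of `word`.
    // Pattern.quote makes the word literal (e.g. '.' loses its regex meaning);
    // Matcher.quoteReplacement does the same for '$' and '\' in the replacement.
    static String replaceWholeWord(String text, String word, String replacement) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWord("hi, something", "hi", "hello")); // hello, something
        System.out.println(replaceWholeWord("cost: X and X2", "X", "$5"));    // cost: $5 and X2
    }
}
```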

Related

Regex to check if String is one word in Java

I need a regex to check whether a String contains only one word (e.g. "This", "Country", "Boston ", " Programming ").
So far I have used an alternative approach, which is to check whether the String contains spaces. However, I am sure this can be done with a regex.
One possible way, in my opinion, is "^\w{2,}\s". Does this work properly? Are there any other possible answers?
The pattern ^\w{2,}\s matches 2 or more word characters from the start of the string, followed by a mandatory whitespace character (which can also be a newline). As a result, a word without trailing whitespace, such as This, does not match.
As the pattern is not anchored at the end, it can also match Boston in Boston test.
If you want to match a single word of at least 2 characters surrounded by optional horizontal whitespace, use \h* on both sides and add the anchor $ to assert the end of the string:
^\h*\w{2,}\h*$
In Java
String regex = "^\\h*\\w{2,}\\h*$";
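Here is a small harness for the pattern above, run against the examples from the question. String.matches must match the entire string, so the anchors are redundant there but harmless. Note that Java's \w is ASCII-only by default ([a-zA-Z_0-9]), so a word like Zürich would not match without the UNICODE_CHARACTER_CLASS flag. The class and method names are illustrative.

```java
class SingleWordCheck {
    // Optional horizontal whitespace, 2+ word characters, optional horizontal whitespace.
    static boolean isSingleWord(String s) {
        return s.matches("^\\h*\\w{2,}\\h*$");
    }

    public static void main(String[] args) {
        String[] samples = {"This", "Country", "Boston ", " Programming ", "Boston test", "a"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isSingleWord(s));
        }
    }
}
```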

Matching a string which occurs after a certain pattern

I want to match a string which occurs after a certain pattern but I am not able to come up with a regex to do that (I am using Java).
For example, let's say I have this string,
caa,abb,ksmf,fsksf,fkfs,admkf
and I want my regex to match only those commas which are prefixed by abb. How do I do that? Is it even possible using regexes?
If I use the regex abb, it matches the whole string abb, but I only want to match the comma after that.
I'm asking because I want to use this regex with split, which accepts a regex. If I pass abb, as the regex, split treats the whole string abb, as the delimiter, whereas I want only the comma to be the delimiter.
Any help would be greatly appreciated.
String test = "caa,abb,ksmf,fsksf,fkfs,admkf";
String regex = "(?<=abb),";
String[] split = test.split(regex);
for (String s : split) {
    System.out.println(s);
}
Output:
caa,abb
ksmf,fsksf,fkfs,admkf
See here for more information on lookaround:
https://www.regular-expressions.info/lookaround.html
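In Java, a lookbehind also accepts alternatives of different fixed lengths, so the same split can key on more than one prefix. In the sketch below, fkfs is an arbitrary second prefix picked from the sample string.

```java
import java.util.Arrays;

class SplitAfterPrefix {
    // Splits only on commas that directly follow "abb" or "fkfs".
    // Each alternative inside the lookbehind has a fixed length, which Java accepts.
    static String[] split(String input) {
        return input.split("(?<=abb|fkfs),");
    }

    public static void main(String[] args) {
        String test = "caa,abb,ksmf,fsksf,fkfs,admkf";
        System.out.println(Arrays.toString(split(test))); // [caa,abb, ksmf,fsksf,fkfs, admkf]
    }
}
```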

Regular expression to remove unwanted characters from the String

I need to remove unwanted characters from a String in Java.
For example,
the input String is
Income ......................4,456
liability........................56,445.99
and I want the output to be
Income 4,456
liability 56,445.99
What is the best approach to write this in Java? I am parsing large documents, so it needs to be optimized for performance.
You can do this replacement with a single line of code:
System.out.println("asdfadf ..........34,4234.34".replaceAll("[ ]*\\.{2,}"," "));
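The question mentions large documents. String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so calling it once per line compiles the regex every time. The sketch below compiles the same pattern once and reuses it. The class name and the line-by-line usage are assumptions.

```java
import java.util.regex.Pattern;

class DotLeaderCleaner {
    // Compiled once and reused for every line. "[ ]*\\.{2,}" matches optional spaces
    // followed by two or more dots, so a single '.' such as a decimal point is left alone.
    private static final Pattern DOT_LEADER = Pattern.compile("[ ]*\\.{2,}");

    static String clean(String line) {
        return DOT_LEADER.matcher(line).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(clean("Income ......................4,456"));         // Income 4,456
        System.out.println(clean("liability........................56,445.99")); // liability 56,445.99
    }
}
```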
For this particular example, I might use the following replacement:
String input = "Income ......................4,456";
input = input.replaceAll("(\\w+)\\s*\\.+(.*)", "$1 $2");
System.out.println(input);
Here is an explanation of the pattern being used:
(\\w+) match AND capture one or more word characters
\\s* match zero or more whitespace characters
\\.+ match one or more literal dots
(.*) match AND capture the rest of the line
The two quantities in parentheses are known as capture groups. The regex engine remembers what these were while matching, and makes them available, in order, as $1 and $2 to use in the replacement string.
Output:
Income 4,456
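You can also read the same capture groups directly from a Matcher instead of using a replacement string. This helps if you need the label and the amount separately. Here is a sketch with hypothetical names; it returns the line unchanged when the line does not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LedgerLine {
    // Same pattern as above: group 1 = the label, group 2 = everything after the dots.
    private static final Pattern LINE = Pattern.compile("(\\w+)\\s*\\.+(.*)");

    static String reformat(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return line; // leave lines without a dot leader untouched
        }
        String label = m.group(1);  // what $1 refers to in the replacement string
        String amount = m.group(2); // what $2 refers to
        return label + " " + amount;
    }

    public static void main(String[] args) {
        System.out.println(reformat("Income ......................4,456"));         // Income 4,456
        System.out.println(reformat("liability........................56,445.99")); // liability 56,445.99
    }
}
```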
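One simple way to do that is:
String result = yourString.replaceAll("[-+.^:,]","");
That replaces each of these special characters with nothing. Note that it also strips the commas and the decimal point inside the numbers (56,445.99 becomes 5644599), so the amounts are not kept intact.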

Give lookbehind priority over the actual regular expression

I am looking for a regular expression that can strip all 'a' characters from the beginning of an input word (consisting only of English letters).
How would I do this using a regular expression?
The following lookbehind-based regex fails to do the job:
(?<=a*?)(\w)+
because for the input abc it returns abc.
Is there a clean way to do this using lookbehinds?
A (brute-force-ish) regular expression that does work uses negation:
(?<=a*)([[^a]&&\w])*
which returns the correct answer of bc for an input word abc.
But I was wondering if there could be a more elegant regular expression, say, using the correct quantifier?
Pattern removeWords = Pattern.compile("\\b(?:a)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher fix = removeWords.matcher(YourWord);
String fixedString = fix.replaceAll("");
This removes the standalone word a (or A, since the pattern is case-insensitive), plus any whitespace after it, from the string. If you want to remove some other letters as well, you can do it this way:
Pattern removeWords = Pattern.compile("\\b(?:a|b|c)\\b\\s*",Pattern.CASE_INSENSITIVE);
I think that a regex for this problem is overkill.
You could instead do:
str = str.startsWith("a") ? str.substring(1) : str;
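The one-liner above removes only a single leading a, so "aaabc" becomes "aabc". If all leading a's should go, a simple loop keeps the no-regex approach. Here is a sketch with a hypothetical helper name.

```java
class LeadingA {
    // Removes every leading 'a' (lowercase only, as in the question), e.g. "aaabc" -> "bc".
    static String stripLeadingA(String str) {
        int i = 0;
        while (i < str.length() && str.charAt(i) == 'a') {
            i++;
        }
        return str.substring(i);
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingA("abc"));   // bc
        System.out.println(stripLeadingA("aaabc")); // bc
        System.out.println(stripLeadingA("bca"));   // bca
    }
}
```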
Try:
(?i)\\ba?(\\w+)\\b
and replace each word with capture group 1. Note that a? removes at most one leading a (or A, because of (?i)).
Code example:
String word = "aWord Another";
word = word.replaceAll("(?i)\\ba?(\\w+)\\b", "$1");
System.out.println(word);
with output:
Word nother
There are much simpler ways to do this, but since you insist on using lookbehinds, I will give one. The regex is
(?<=\b)a+(\w*)
Regex Breakdown
(?<=\b) #Assert a word boundary at this position
a+ #Match the literal character a one or more times; the word boundary ensures that only a's at the start of a word are matched
(\w*) #Capture the remaining word characters
Java Code
String str = "abc cdavbvhsza aaabcd";
System.out.println(str.replaceAll("(?<=\\b)a+(\\w*)", "$1"));

Remove special characters surrounded by white space

How can I remove special characters that have whitespace next to them?
String webcontent = "This is my string. i got this string from blabla.com.";
When I use this regex
webcontent.replaceAll("[-.:,+^]*", "");
it becomes
String webcontent = "This is my string i got this string from blablacom";
which is not what I want. I want:
"This is my string i got this string from blabla.com"
You need to test for a whitespace character or the end of the string with a lookahead (?=...) ("followed by"):
webcontent.replaceAll("[-.?:,+^\\s]+(?:(?=\\s)|$)", "");
The lookahead is only a test and doesn't consume characters.
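If you want to do the same with all punctuation characters, you can use the POSIX punctuation class \p{Punct}:
webcontent.replaceAll("[\\p{Punct}\\s+^]+(?:(?=\\s)|$)", "");
(Note that in Java, \p{Punct} matches ASCII punctuation only and already includes + and ^, so listing them again is harmless. It is the Unicode punctuation category \p{P} that excludes + and ^.)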
You can use negative lookahead to avoid this:
webcontent = webcontent.replaceAll("[-.:?,+^]+(?!\\w)", "");
//=> This is my string i got this string from blabla.com
Try this one:
// any one or more special characters followed by space or in the end
// replace with single space
webcontent.replaceAll("[-.:,+]+(\\s|$)", " ").trim();
--EDIT--
If the special character is at the beginning:
webcontent.replaceAll("^([-.:,+]+)|[-.:,+]+(\\s|$)", " ").trim();
input:
.This is my string. i got this string from blabla.com.
output:
This is my string i got this string from blabla.com
--EDIT--
To replace ? as well:
webcontent.replaceAll("^([-.:,+]+|\\?+)|([-.:,+]+|\\?+)(\\s|$)", " ").trim();
input
..This is my string.. ?? i got this string from blabla.com..
output
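This is my string  i got this string from blabla.com
(note the double space where ".. ??" was: each of the two adjacent matches is replaced by its own space)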
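The regex above replaces each of two adjacent matches (such as ".. " and "?? ") with its own space, so the output can contain two spaces in a row. One way to tidy that up is to collapse runs of whitespace afterwards. This cleanup step is an assumption added here, not part of the original answer.

```java
class PunctuationCleanup {
    static String clean(String webcontent) {
        return webcontent
                // the answer's regex: special chars at the start, or followed by whitespace/end
                .replaceAll("^([-.:,+]+|\\?+)|([-.:,+]+|\\?+)(\\s|$)", " ")
                // collapse any run of 2+ whitespace characters into a single space
                .replaceAll("\\s{2,}", " ")
                .trim();
    }

    public static void main(String[] args) {
        String input = "..This is my string.. ?? i got this string from blabla.com..";
        System.out.println(clean(input)); // This is my string i got this string from blabla.com
    }
}
```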
Use the regex [-.:?,+^](\s|$) and remove the character for each match with basic string manipulation. It's a few more lines of code but much, much cleaner.
A pure Java solution, where you loop over the characters and check the character after each special character, is also quite simple.
As soon as there are lookaheads/lookbehinds involved, I usually fall back to a non-regex solution for clarity.
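Here is a sketch of the pure-Java loop described above. It assumes the special characters are the ones from the question (-.:?,+^) and that a run of them should be dropped when whitespace or the end of the string follows it. The class and method names are hypothetical.

```java
class SpecialCharStripper {
    // Characters treated as "special"; taken from the question's character class.
    private static final String SPECIAL = "-.:?,+^";

    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) < 0) {
                out.append(c);
                i++;
                continue;
            }
            // Find the end of this run of special characters.
            int j = i;
            while (j < s.length() && SPECIAL.indexOf(s.charAt(j)) >= 0) {
                j++;
            }
            // Drop the run if whitespace or the end of the string follows it; otherwise keep it.
            boolean dropRun = j == s.length() || Character.isWhitespace(s.charAt(j));
            if (!dropRun) {
                out.append(s, i, j);
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("This is my string. i got this string from blabla.com."));
        // This is my string i got this string from blabla.com
    }
}
```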
