Using a regex to check for special characters in Java

I want to check whether a word contains a special character and remove it. Let's say I have String word = "hello-there". I want to loop through the word, check each character, and remove any character that is not a letter, concatenating what remains. So I want to turn hello-there into hellothere using a regex. I have tried this, but I can't figure out how to check the individual characters of a string against a regex.
public static void main(String[] args){
String word = "hello-there";
for(int i = 0; i < word.length(); i++)
{
if(word.charAt(i).matches("^[a-zA-Z]+"))
But the last if statement doesn't work. Anybody know how to take care of this?

You may use the following regex, which matches any run of characters that are not lower-case or upper-case ASCII letters.
[^a-zA-Z]+
Java:
class RegEx {
public static void main(String[] args) {
String s = "hello-there";
String r = "[^a-zA-Z]+";
String o = s.replaceAll(r, "");
System.out.println(o); //-> hellothere
}
}
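If you do want to test characters one at a time, as in the question, note that charAt returns a primitive char, which has no matches method, so the original if statement cannot compile. A minimal sketch of a per-character version follows. The class and method names are made up for illustration, and it assumes only ASCII letters should be kept.

```java
class LetterFilter {
    // Hypothetical helper: keeps only ASCII letters and drops every other character.
    static String keepLetters(String word) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            // Convert the char to a one-character String so that matches() is available.
            String ch = String.valueOf(word.charAt(i));
            if (ch.matches("[a-zA-Z]")) {
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepLetters("hello-there")); // hellothere
    }
}
```

For a single pass over the whole string, the replaceAll call above is shorter and avoids compiling a pattern for every character.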

Related

How to exclude a character from regular expression?

I want to remove all non-word characters from a string, but I also need to check whether the word has a hyphen in it, and the replace deletes the hyphen.
Is there a way to do that after I replace everything that is not a letter, or do I have to check before replacing?
This is my code:
word = word.replaceAll("[^a-zA-Z]", "").toLowerCase();
Use the regex [^\w-], which means NOT (a word character or -).
public class Main {
public static void main(String[] args) {
// Test
String word = "Hello :) Hi, How are you doing? The Co-operative bank is open 2day!";
word = word.replaceAll("[^\\w-]", "").toLowerCase();
System.out.println(word);
}
}
Output:
hellohihowareyoudoingtheco-operativebankisopen2day
Note that a word character (i.e. \w) includes A-Za-z0-9_. If you want to keep only letters and hyphens, use [^A-Za-z\-] instead:
public class Main {
public static void main(String[] args) {
// Test
String word = "Hello :) Hi, How are you doing? The Co-operative bank is open 2day!";
word = word.replaceAll("[^A-Za-z\\-]", "").toLowerCase();
System.out.println(word);
}
}
Output:
hellohihowareyoudoingtheco-operativebankisopenday
I need to check if the word has a hyphen in it but the replace will delete the hyphen
So check if there is a hyphen before you strip non-alpha characters.
if(word.contains("-")) {
//do whatever
}
//remove non-alpha chars
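Put together, the check-then-strip order might look like the sketch below. The class, the method names, and the sample word are assumptions for illustration; the regex is the one from the question.

```java
class HyphenCheck {
    // True if the raw word contains at least one hyphen.
    static boolean hasHyphen(String word) {
        return word.contains("-");
    }

    // Removes everything that is not an ASCII letter, then lower-cases the rest.
    static String lettersOnly(String word) {
        return word.replaceAll("[^a-zA-Z]", "").toLowerCase();
    }

    public static void main(String[] args) {
        String word = "Co-operative!";
        boolean hyphenated = hasHyphen(word); // check first, while the hyphen is still there
        String cleaned = lettersOnly(word);
        System.out.println(hyphenated + " " + cleaned); // true cooperative
    }
}
```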

String Split using a regular expression in Java?

I am trying to split a string on a regular expression, "[ .,?!]+'", that should cover all these characters including a single space, but the split is not happening.
Here's my class:
public class splitStr {
public static void main(String[] args) {
String S="He is a very very good boy, isn't he?";
S.trim();
if(1<=S.length() && S.length()<=400000){
String delim ="[ .,?!]+'";
String []s=S.split(delim);
System.out.println(s.length);
for(String d:s)
{
System.out.println(d);
}
}
}
}
The reason it's not working is that not all the delimiters are within the square brackets. As written, the pattern only matches a run of those characters immediately followed by an apostrophe.
String delim ="[ .,?!]+'"; // you wrote this
change to this:
String delim ="[ .,?!']";
Must the characters +, ', [ and ] be part of the split?
I'm asking because the plus sign and brackets have special meaning in regular expressions, and if you want them to be matched literally, they must be escaped with \
So, if you want an expression that includes all these characters, it should be:
delim = "[\\[ .,\\?!\\]\\+']"
Note that I had to write \\ because the backslash itself needs to be escaped inside Java string literals. The backslashes before ? and + are optional: inside a character class those two characters are already literal, so the pattern behaves the same with or without them.
I'm not in front of a computer right now, so I haven't tested it, but I believe it should work.
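A quick way to settle the escaping question is to compare both forms. The sketch below (class name, method names, and sample input are made up) splits on ? and + with and without backslashes inside the character class; both calls should produce the same pieces.

```java
import java.util.Arrays;

class EscapeCheck {
    // ? and + are literal inside [...], so no backslashes are needed.
    static String[] splitUnescaped(String input) {
        return input.split("[?+]");
    }

    // Escaping them is allowed but redundant.
    static String[] splitEscaped(String input) {
        return input.split("[\\?\\+]");
    }

    public static void main(String[] args) {
        String input = "a?b+c";
        System.out.println(Arrays.toString(splitUnescaped(input))); // [a, b, c]
        System.out.println(Arrays.equals(splitUnescaped(input), splitEscaped(input))); // true
    }
}
```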
import java.util.*;
import java.util.stream.Collectors;
public class StringToken {
public static void main(String[] args) {
String S="He is a very very good boy, isn't he?";
S = S.trim(); // strings are immutable, so the trimmed result must be assigned
if(1<=S.length() && S.length()<=400000){
String delim = "[ .,?!']";
String []s=S.split(delim);
List<String> d = Arrays.asList(s);
d= d.stream().filter(item-> (item.length() > 0)).collect(Collectors.toList());
System.out.println(d.size());
for(String m:d)
{
System.out.println(m);
}
}
}
}
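An alternative to filtering out empty strings afterwards is to put the + quantifier after the character class, so that a run of delimiters such as ", " counts as one separator. This is a sketch under that assumption; the class and method names are illustrative only.

```java
import java.util.Arrays;

class Tokenizer {
    // Splits on one or more spaces, periods, commas, question marks,
    // exclamation marks or apostrophes.
    static String[] tokens(String s) {
        return s.trim().split("[ .,?!']+");
    }

    public static void main(String[] args) {
        String[] parts = tokens("He is a very very good boy, isn't he?");
        System.out.println(parts.length); // 10
        System.out.println(Arrays.toString(parts));
    }
}
```

The trailing "?" does not produce an extra token because split drops trailing empty strings. Text that starts with a punctuation delimiter would still yield a leading empty string, so that case needs separate handling.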

How to check specific special character in String

I have the String value below. How can I find only these four kinds of special characters in it: [], :, {} and - (square brackets, colon, curly braces and hyphen)?
String str = "[1-10],{10-20},dhoni:kholi";
Kindly help me as I am new to Java.
I think you can use a regular expression like this.
class MyRegex
{
public static void main (String[] args) throws java.lang.Exception
{
String str = "[1-10],{10-20},dhoni:kholi";
String text = str.replaceAll("[a-zA-Z0-9]",""); // replacing all numbers and alphabets with ""
System.out.print(text); // result string
}
}
Hope this will help you.
If you only want to keep those specific characters, you can use the String.replaceAll method with a negated character class:
System.out.println("[Hello {}:-,World]".replaceAll("[^\\]\\[:\\-{}]", ""));
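If the goal is to detect the characters rather than print what is left, a Matcher can report whether any of them occurs. The sketch below is one way to do that. The class, method, and constant names are assumptions, and the character class lists exactly the brackets, braces, colon, and hyphen from the question.

```java
import java.util.regex.Pattern;

class SpecialCharFinder {
    // Matches one of: [ ] { } : -
    private static final Pattern SPECIAL = Pattern.compile("[\\[\\]{}:\\-]");

    // True if at least one of the listed characters occurs in s.
    static boolean containsSpecial(String s) {
        return SPECIAL.matcher(s).find();
    }

    // Returns only the listed characters, in their original order.
    static String extractSpecial(String s) {
        return s.replaceAll("[^\\[\\]{}:\\-]", "");
    }

    public static void main(String[] args) {
        String str = "[1-10],{10-20},dhoni:kholi";
        System.out.println(containsSpecial(str)); // true
        System.out.println(extractSpecial(str));  // [-]{-}:
    }
}
```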

replacing consecutive identical character using replace and replaceAll in Java

I'd like to fill every gap between 2 consecutive commas (",,") with a marker, but I've found that I can't replace the second occurrence. The code is as follows:
String addresses = "a,,,b";
String b = addresses.replace(",,", ",EMPTYADDR,");
System.out.println(b);
I expect the result to be:
a,EMPTYADDR,EMPTYADDR,b
But instead, I get:
a,EMPTYADDR,,b
How should I change the code to get the desired result?
Pass a lookaround-based regex to replaceAll. Lookarounds don't consume any characters; they only assert whether a match is possible at the current position.
string.replaceAll("(?<=,)(?=,)", "EMPTYADDR");
(?<=,) is a lookbehind: it asserts that the current position is immediately preceded by a comma.
(?=,) is a lookahead: it asserts that the current position is immediately followed by a comma.
Together they match the empty positions between adjacent commas, so two positions are matched in "a,,,b". Replacing those positions with EMPTYADDR gives the desired output.
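To see why the plain replace call stops after the first pair while the lookaround version does not, compare the two side by side. In "a,,,b" the first ",," match consumes two of the three commas, so only one comma is left and no second pair can be found. The lookaround pattern consumes nothing, so every gap between two commas is matched. This is a small sketch; the class and method names are made up for illustration.

```java
class EmptyAddressFiller {
    // Literal replacement: matches cannot overlap, so ",,," contains only one ",," match.
    static String withReplace(String s) {
        return s.replace(",,", ",EMPTYADDR,");
    }

    // Zero-width match at every position that has a comma on both sides.
    static String withLookaround(String s) {
        return s.replaceAll("(?<=,)(?=,)", "EMPTYADDR");
    }

    public static void main(String[] args) {
        System.out.println(withReplace("a,,,b"));    // a,EMPTYADDR,,b
        System.out.println(withLookaround("a,,,b")); // a,EMPTYADDR,EMPTYADDR,b
    }
}
```

An empty field at the very start or end of the string is not filled by this pattern, because such a position does not have a comma on both sides.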
A simple non-regex method using a while loop:
public static void main(String[] args) {
String addresses = "a,,,b";
while (addresses.contains(",,")){
addresses = addresses.replace(",,", ",EMPTYADDR,");
}
System.out.println(addresses);
}
Results:
a,EMPTYADDR,EMPTYADDR,b
You could also split the string, fill in the empty elements, and then reconstruct it with String.join():
public static void main(String[] args) {
String addresses = "a,,,b";
String[] pieces = addresses.split(",");
for (int i = 0; i < pieces.length; i++) {
if (pieces[i].isEmpty()) {
pieces[i] = "EMPTYADDR";
}
}
addresses = String.join(",", pieces);
System.out.println(addresses);
}
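One caveat with the split approach: split(",") discards trailing empty strings, so input that ends in commas loses its last fields. Passing a negative limit keeps them. A hedged sketch follows; the class and method names are illustrative, and the second sample input is made up.

```java
class SplitFiller {
    // Fills every empty field, including leading and trailing ones.
    static String fill(String addresses) {
        // A negative limit tells split to keep trailing empty strings.
        String[] pieces = addresses.split(",", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (pieces[i].isEmpty()) {
                pieces[i] = "EMPTYADDR";
            }
        }
        return String.join(",", pieces);
    }

    public static void main(String[] args) {
        System.out.println(fill("a,,,b")); // a,EMPTYADDR,EMPTYADDR,b
        System.out.println(fill("a,,b,")); // a,EMPTYADDR,b,EMPTYADDR
    }
}
```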

Replace word with special characters from string in Java

I am writing a method that should replace every word matching one from a list with '****'
characters. So far my code works, but words made of special characters are ignored.
I tried adding "\\W" to my expression, but it looks like I didn't use it correctly, so I could use some help.
Here's the code I have so far:
for(int i = 0; i < badWords.size(); i++) {
if (StringUtils.containsIgnoreCase(stringToCheck, badWords.get(i))) {
stringToCheck = stringToCheck.replaceAll("(?i)\\b" + badWords.get(i) + "\\b", "****");
}
}
E.g. I have the list of words ['bad', '#$$'].
If I have the string "This is bad string with #$$", I expect the method to return "This is **** string with ****".
Note that the method should be case-insensitive, e.g. TesT and test should be handled the same way.
I'm not sure why you use StringUtils; you can directly replace the words that match the bad words. This code works for me:
public static void main(String[] args) {
ArrayList<String> badWords = new ArrayList<String>();
badWords.add("test");
badWords.add("BadTest");
badWords.add("\\$\\$");
String test = "This is a TeSt and a $$ with Badtest.";
for(int i = 0; i < badWords.size(); i++) {
test = test.replaceAll("(?i)" + badWords.get(i), "****");
}
test = test.replaceAll("\\w*\\*{4}", "****");
System.out.println(test);
}
Output:
This is a **** and a **** with ****.
The problem is that special characters such as $ are regex metacharacters, not literal characters. You'll need to escape every occurrence of the following characters in a bad word with a backslash (written as two backslashes in a Java string literal):
{}()\[].+*?^$|
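Instead of escaping each metacharacter by hand, the standard Pattern.quote method can wrap a whole word so that it is matched literally. The sketch below is one possible take, not the only way to do it: the class and method names are made up, and the (?<!\w) and (?!\w) lookarounds are an assumed substitute for \b, which cannot match between a space and a symbol such as #.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BadWordMasker {
    // Replaces each bad word with "****", ignoring case and treating the word literally.
    static String mask(String text, List<String> badWords) {
        for (String word : badWords) {
            // Pattern.quote makes $, #, ! and other metacharacters match themselves.
            String regex = "(?i)(?<!\\w)" + Pattern.quote(word) + "(?!\\w)";
            text = text.replaceAll(regex, "****");
        }
        return text;
    }

    public static void main(String[] args) {
        List<String> badWords = Arrays.asList("bad", "#$$");
        System.out.println(mask("This is bad string with #$$", badWords));
        // This is **** string with ****
    }
}
```

With the lookarounds in place, a longer word that merely contains a bad word (for example "badge") is left alone.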
My guess is that your list of bad words contains special characters that have particular meanings when interpreted in a regular expression (which is what the replaceAll method does). $, for example, typically matches the end of the string/line. So I'd recommend a combination of things:
Don't use containsIgnoreCase to identify whether a replacement needs to be done. Just let the replaceAll run each time - if there is no match against the bad word list, nothing will be done to the string.
Characters like $ that have special meanings in regular expressions should be escaped when they are added to the bad word list. For example, badWords.add("#\\$\\$");
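Try something like this (note that wrapping the word in [ ]+ builds a character class, so it matches any run of the word's characters in any order, not only the exact word):
String stringToCheck = "This is b!d string with #$$";
List<String> badWords = Arrays.asList("b!d", "#$$"); // needs java.util.Arrays and java.util.List
for (int i = 0; i < badWords.size(); i++) {
    if (StringUtils.containsIgnoreCase(stringToCheck, badWords.get(i))) {
        // "[b!d]+" matches any sequence made of b, ! and d, so "dd" or "!b" would be replaced too
        stringToCheck = stringToCheck.replaceAll("[" + badWords.get(i) + "]+", "****");
    }
}
System.out.println(stringToCheck);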
Another solution: bad words matched with word boundaries (and case-insensitively).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BadWordFilter { // class name added only to make the snippet compile
    public static void main(String[] args) {
        Pattern badWords = Pattern.compile("\\b(a|b|ĉĉĉ|dddd)\\b",
                Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
        String text = "adfsa a dfs bb addfdsaf ĉĉĉ adsfs dddd asdfaf a";
        Matcher m = badWords.matcher(text);
        StringBuffer sb = new StringBuffer(text.length());
        while (m.find()) {
            // Replace each matched bad word with one star per character
            m.appendReplacement(sb, stars(m.group(1)));
        }
        m.appendTail(sb);
        String cleanText = sb.toString();
        System.out.println(text);
        System.out.println(cleanText);
    }

    private static String stars(String s) {
        // (?s) lets . match line terminators too; every code point becomes one star
        return s.replaceAll("(?su).", "*");
        /*
        int cpLength = s.codePointCount(0, s.length());
        final String stars = "******************************";
        return cpLength >= stars.length() ? stars : stars.substring(0, cpLength);
        */
    }
}
The commented-out alternative in stars also yields the correct count: one star per Unicode code point, even when that code point is stored as a surrogate pair (two UTF-16 chars).
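The code-point count matters for characters outside the Basic Multilingual Plane. The following sketch (class and method names are assumptions) builds the mask from codePointCount, so a character stored as a surrogate pair still yields a single star.

```java
class StarMask {
    // One star per Unicode code point, not per UTF-16 char.
    static String stars(String s) {
        int count = s.codePointCount(0, s.length());
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs two UTF-16 chars, so length() is 3
        String word = "a\uD83D\uDE00";
        System.out.println(word.length()); // 3
        System.out.println(stars(word));   // **
    }
}
```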
