I need to remove a doubled letter from a string using regex operations in Java.
E.g.: PRINCEE -> PRINCE
APPLE -> APLE
Simple Solution (remove duplicate characters)
Like this:
final String str = "APPLEE";
String replaced = str.replaceAll("(.)\\1", "$1");
System.out.println(replaced);
Output:
APLE
Not Just Any Characters, Letters Only
As @Jim comments correctly, the above matches any doubled character, not just letters. Here are a few variations that only match letters:
// the basics, ASCII letters. these two are equivalent:
str.replaceAll("([A-Za-z])\\1", "$1");
str.replaceAll("(\\p{Alpha})\\1", "$1");
// Unicode Letters
str.replaceAll("(\\p{L})\\1", "$1");
// anything where Character.isLetter(ch) returns true
str.replaceAll("(\\p{javaLetter})\\1", "$1");
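To see the difference in practice, here is a minimal sketch (the class name DoubledLetterDemo and the sample strings are made up for illustration) that runs the "any character" pattern and two of the letter-only variants over the same input. It assumes the default flags, under which \p{Alpha} covers ASCII letters only while \p{L} covers all Unicode letters.

```java
public class DoubledLetterDemo {
    // Collapses any doubled character, letters or not
    static String anyChar(String s) {
        return s.replaceAll("(.)\\1", "$1");
    }

    // Collapses doubled ASCII letters only (default flags)
    static String asciiLetters(String s) {
        return s.replaceAll("(\\p{Alpha})\\1", "$1");
    }

    // Collapses doubled Unicode letters, including accented ones
    static String unicodeLetters(String s) {
        return s.replaceAll("(\\p{L})\\1", "$1");
    }

    public static void main(String[] args) {
        String input = "APPLEE 1122";
        System.out.println(anyChar(input));      // APLE 12
        System.out.println(asciiLetters(input)); // APLE 1122

        String accented = "\u00C9\u00C9"; // two E-acute characters
        System.out.println(asciiLetters(accented).length());   // 2
        System.out.println(unicodeLetters(accented).length()); // 1
    }
}
```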
References:
Character.isLetter(ch) (javadocs)
Any method in Character of the form Character.isXyz(char) enables a pattern named \p{javaXyz} (mind the capitalization). This mechanism is described in the Pattern javadocs.
Unicode blocks and categories can also be matched with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property. This mechanism is also described in the Pattern javadocs.
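A small sketch of both mechanisms, using \p{javaLowerCase} (backed by Character.isLowerCase) and the negated Unicode category \P{L}; the class and method names are illustrative only.

```java
public class JavaPropertyClassesDemo {
    // \p{javaLowerCase} matches wherever Character.isLowerCase(char) is true
    static String dropLowerCase(String s) {
        return s.replaceAll("\\p{javaLowerCase}", "");
    }

    // \P{L} is the negation of \p{L}: anything that is NOT a Unicode letter
    static String lettersOnly(String s) {
        return s.replaceAll("\\P{L}", "");
    }

    public static void main(String[] args) {
        System.out.println(dropLowerCase("aBcD"));  // BD
        System.out.println(lettersOnly("a1-b2_c")); // abc
    }
}
```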
String s = "...";
String replaced = s.replaceAll( "([A-Z])\\1", "$1" );
If you want to replace just duplicates ("AA" -> "A", "AAA" -> "AA"), use
public String undup(String str) {
return str.replaceAll("(\\w)\\1", "$1");
}
To replace triplicates etc., use: str.replaceAll("(\\w)\\1+", "$1");
To drop only a single dupe from a longer run (AAAA -> AAA, AAA -> AA), use: str.replaceAll("(\\w)(\\1+)", "$2");
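The three variants above only differ on runs longer than two. A minimal sketch (the class DupVariants and the method names other than undup are hypothetical) applying each to the same input:

```java
public class DupVariants {
    // Collapse each pair once: "AA" -> "A", "AAA" -> "AA"
    static String undup(String str) {
        return str.replaceAll("(\\w)\\1", "$1");
    }

    // Collapse a whole run to one character: "AAA" -> "A"
    static String collapseRuns(String str) {
        return str.replaceAll("(\\w)\\1+", "$1");
    }

    // Drop exactly one character from each run: "AAAA" -> "AAA"
    static String dropOne(String str) {
        return str.replaceAll("(\\w)(\\1+)", "$2");
    }

    public static void main(String[] args) {
        String s = "AAAA BBB CC D";
        System.out.println(undup(s));        // AA BB C D
        System.out.println(collapseRuns(s)); // A B C D
        System.out.println(dropOne(s));      // AAA BB C D
    }
}
```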
This can be done simply by iterating over the String instead of having to resort to regexes.
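// (method name chosen for illustration)
public static String removeRepeats(String text) {
    if (text.isEmpty()) return "";
    StringBuilder ret = new StringBuilder(text.length());
    ret.append(text.charAt(0));
    for (int i = 1; i < text.length(); i++) {
        // append a character only when it differs from its predecessor
        if (text.charAt(i) != text.charAt(i - 1))
            ret.append(text.charAt(i));
    }
    return ret.toString();
}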
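For comparison, a sketch that puts the loop next to a regex (class and method names are hypothetical). The loop collapses a whole run to a single character ("AAA" -> "A"), so its regex counterpart is "(.)\\1+" rather than "(.)\\1":

```java
public class CollapseRunsDemo {
    // Loop version: keep a character only if it differs from the previous one
    static String collapseLoop(String text) {
        if (text.isEmpty()) return "";
        StringBuilder ret = new StringBuilder(text.length());
        ret.append(text.charAt(0));
        for (int i = 1; i < text.length(); i++) {
            if (text.charAt(i) != text.charAt(i - 1)) {
                ret.append(text.charAt(i));
            }
        }
        return ret.toString();
    }

    // Regex version: a captured character followed by one or more copies of itself
    static String collapseRegex(String text) {
        return text.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"PRINCEE", "APPLE", "AAAB"}) {
            System.out.println(collapseLoop(s) + " " + collapseRegex(s));
        }
    }
}
```

One difference remains: '.' does not match line terminators by default, so the regex leaves "\n\n" untouched while the loop collapses it to a single newline.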
I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
It is supposed to print
dkoe
but it prints nothing!
Welcome to Java's misnamed .matches() method... It tries to match ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find()) {
    // match
}
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
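A minimal sketch contrasting the two approaches on the question's input (class and method names are illustrative): find() succeeds for every word that contains at least one lowercase letter, while matches("[a-z]+") succeeds only for dkoe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern LOWER = Pattern.compile("[a-z]");

    // true if a lowercase ASCII letter occurs anywhere in s
    static boolean containsLower(String s) {
        Matcher m = LOWER.matcher(s);
        return m.find();
    }

    // true only if the ENTIRE string consists of lowercase ASCII letters
    static boolean allLower(String s) {
        return s.matches("[a-z]+");
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        for (String s : words) {
            System.out.println(s + " find=" + containsLower(s) + " matches=" + allLower(s));
        }
    }
}
```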
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches() tries to match the whole string;
that's different from Perl regexes, which try to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+.
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
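A sketch of the .*[a-z].* form with matches() (names are illustrative). One caveat: '.' does not match line terminators by default, so a string with a newline before its only lowercase letter fails unless the (?s) flag (DOTALL) is added.

```java
public class WholeStringPatterns {
    // matches() must consume the whole input, so .* on both sides means "contains"
    static boolean containsLower(String s) {
        return s.matches(".*[a-z].*");
    }

    // (?s) enables DOTALL so that '.' also matches line terminators
    static boolean containsLowerDotAll(String s) {
        return s.matches("(?s).*[a-z].*");
    }

    public static void main(String[] args) {
        System.out.println(containsLower("12f"));        // true
        System.out.println(containsLower("123"));        // false
        System.out.println(containsLower("1\nf"));       // false: '.' stops at the newline
        System.out.println(containsLowerDotAll("1\nf")); // true
    }
}
```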
Use:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with the pattern wrapped in ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches strings of length 1. Use something like [a-z]+.
You must put at least one capture group () in the pattern to match, and correct the pattern like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
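Combined with a whole-string check, a sketch (names are illustrative) that accepts dkoe as well as a mixed-case variant. By default CASE_INSENSITIVE only folds US-ASCII, which is all [a-z] needs here.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {
    private static final Pattern LETTERS =
            Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

    // Whole-string check like String.matches(), but ignoring ASCII case
    static boolean isAllLetters(String s) {
        return LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f", "DKoe"};
        for (String s : words) {
            if (isAllLetters(s)) {
                System.out.println(s); // prints dkoe and DKoe
            }
        }
    }
}
```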
I have a string which is of the form
String str = "124333 is the otp of candidate number 9912111242. "
        + "Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that running the above method on every character is inefficient. Is there a way I can get a string array of all the numbers?
Use a regex (see Pattern and Matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(str); // str is your input string
while (m.find()) {
//m.group() contains the digits you want
}
You can easily build an ArrayList that contains each match you find.
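A sketch of that idea, collecting every m.group() into an ArrayList and converting it to the String[] the question asks for (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumbers {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String[] numbers(String str) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(str);
        while (m.find()) {
            found.add(m.group()); // each maximal run of ASCII digits
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(String.join(", ", numbers(str)));
        // 124333, 9912111242, 12323335465645
    }
}
```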
Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in Java string literals, hence the need for \\.
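One caveat, sketched below: when the input starts with a non-digit, as in the example above, split() returns a leading empty string (trailing empty strings are dropped, leading ones are not). The helper numbers(...) is a hypothetical workaround that strips the leading non-digits first.

```java
import java.util.Arrays;

public class SplitCaveat {
    // Hypothetical helper: remove leading non-digits so split() has no empty first element
    static String[] numbers(String s) {
        String trimmed = s.replaceFirst("^\\D+", "");
        // "".split(...) would return [""], so handle the no-digits case explicitly
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\D+");
    }

    public static void main(String[] args) {
        String input = "blabla 123 blabla 345";
        System.out.println(Arrays.toString(input.split("\\D+"))); // [, 123, 345]
        System.out.println(Arrays.toString(numbers(input)));      // [123, 345]
    }
}
```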
You can use String.split():
String[] nbs = str.split("[^0-9]+");
This will split the String on any run of non-digit characters.
And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ matches one or more non-digit characters. Splitting the input on runs of non-digit characters gives you the desired output.
Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
Ah, if you only need a String array, then you can just use String.split as the other answers suggest.
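If you prefer to stay with the stream style but still end up with a String[], a minimal sketch (names are illustrative) can skip the parse step and filter out a possible leading empty string:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StreamToStringArray {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    static String[] numbers(String str) {
        return NON_DIGITS.splitAsStream(str)
                .filter(s -> !s.isEmpty()) // a leading non-digit run yields an empty first element
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(Arrays.toString(numbers(str)));
        // [124333, 9912111242, 12323335465645]
    }
}
```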
Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non-digit characters
Output
[124333, 9912111242, 12323335465645]
The first thing that came to mind was filter and split; then I realized it can be done with
String[] result =str.split("\\D+");
\D matches any non-digit character, + says that one or more of these are needed, and the leading \ escapes the other \, since in a Java string literal \D would be parsed as the escape sequence \D, which is invalid.
I have a string with lots of special characters. I want to remove all those, but keep alphabetical characters.
How can I do this?
That depends on what you mean. If you just want to get rid of them, do this:
(Update: apparently you want to keep digits as well; use the second lines in that case.)
String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");
or the equivalent:
String alphaOnly = input.replaceAll("[^\\p{Alpha}]+","");
String alphaAndDigits = input.replaceAll("[^\\p{Alpha}\\p{Digit}]+","");
(All of these can be significantly improved by precompiling the regex pattern and storing it in a constant)
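A sketch of the precompiled variant; String.replaceAll compiles its regex on every call, while a Pattern constant is compiled once. The constant and method names are illustrative:

```java
import java.util.regex.Pattern;

public class StripSpecial {
    // Compiled once when the class is loaded, then reused for every call
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]+");

    static String alphaAndDigits(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(alphaAndDigits("He@llo, W*orld 42!")); // HelloWorld42
    }
}
```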
Or, with Guava:
private static final CharMatcher ALNUM =
CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'))
.or(CharMatcher.inRange('0', '9')).precomputed();
// ...
String alphaAndDigits = ALNUM.retainFrom(input);
But if you want to turn accented characters into something sensible that's still ASCII, look at these questions:
Converting Java String to ASCII
Java change áéőűú to aeouu
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars
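For the accented-character case, one common standard-library approach (not taken from the answer above) is java.text.Normalizer: decompose to NFD, then delete the combining marks. A minimal sketch with an illustrative method name; letters that have no decomposition, such as ø or ł, are left unchanged.

```java
import java.text.Normalizer;

public class StripDiacritics {
    // NFD splits e.g. an e-acute into 'e' followed by a combining acute accent;
    // \p{M} then matches (and removes) the combining marks.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // the five characters from the question title linked above
        System.out.println(stripDiacritics("\u00e1\u00e9\u0151\u0171\u00fa")); // aeouu
    }
}
```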
I am using this.
s = s.replaceAll("\\W", "");
It removes all special characters (in fact every non-word character, including spaces) from the string.
Here
\w : A word character, short for [a-zA-Z_0-9]
\W : A non-word character
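A quick sketch of what \W actually removes under the default flags (the helper name is illustrative): spaces and punctuation go, '_' stays because it counts as a word character, and non-ASCII letters such as é are removed as well, since \w is ASCII-only by default.

```java
public class NonWordDemo {
    static String removeNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(removeNonWord("snake_case, 2 words!")); // snake_case2words
        System.out.println(removeNonWord("caf\u00e9"));            // caf
    }
}
```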
You can use the following method to keep alphanumeric characters.
replaceAll("[^a-zA-Z0-9]", "");
And if you want to keep only alphabetical characters use this
replaceAll("[^a-zA-Z]", "");
Replace any special character with
replaceAll("\\your special character","new character");
e.g. to replace every occurrence of * with a space:
replaceAll("\\*"," ");
Note: this statement can only replace one type of special character at a time.
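Several special characters can be handled in one call by listing them in a character class; inside [...] characters such as * and + are literal, so they need no escaping. A sketch with an illustrative method name:

```java
public class ReplaceManySpecials {
    // One character class covers *, # and + in a single pass
    static String blankOut(String s) {
        return s.replaceAll("[*#+]", " ");
    }

    public static void main(String[] args) {
        System.out.println(blankOut("a*b#c+d")); // a b c d
    }
}
```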
Following the example of Andrzej Doyle's answer, I think the better solution is to use org.apache.commons.lang3.StringUtils.stripAccents():
package bla.bla.utility;
import org.apache.commons.lang3.StringUtils;
public class UriUtility {
public static String normalizeUri(String s) {
String r = StringUtils.stripAccents(s);
r = r.replace(" ", "_");
r = r.replaceAll("[^\\.A-Za-z0-9_]", "");
return r;
}
}
string Output = Regex.Replace(Input, @"([^ a-zA-Z0-9&, _]|^\s)", "");
Here all the special characters except space, comma, and ampersand are removed. To remove commas and ampersands as well, use the following regular expression:
string Output = Regex.Replace(Input, @"([^ a-zA-Z0-9_]|^\s)", "");
where Input is the string whose characters we want to replace.
Here is a function I used to remove all possible special characters from the string
name = name.replace(/[&\/\\#,+()$~%!.„'":*‚^_¤?<>|@ª{«»§}©®™ ]/g, '').toLowerCase();
You can use basic regular expressions on strings to find all special characters, or use the Pattern and Matcher classes to search, modify or delete user-defined strings. This link has some simple, easy-to-understand examples of regular expressions: http://www.vogella.de/articles/JavaRegularExpressions/article.html
You can get the Unicode value of that junk character from the Character Map tool on a Windows PC and write it with \u, e.g. \u00a9 for the copyright symbol.
You can then use that escape to target that particular junk character; rather than removing junk characters, replace them with the proper Unicode character.
To keep spaces, use the pattern "[^a-z A-Z 0-9]".
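A sketch of that pattern in use (the method name is illustrative). The repeated spaces inside the class are redundant; a single space in the class would behave the same.

```java
public class KeepSpaces {
    // Removes everything except ASCII letters, digits and the space character
    static String keepAlnumAndSpaces(String s) {
        return s.replaceAll("[^a-z A-Z 0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAlnumAndSpaces("Hello, World! 42?")); // Hello World 42
    }
}
```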