Regex: replace whitespaces with hyphens and allow only a-z - java

I'm struggling with regex here.
How do I replace whitespaces with hyphens and allow only a-z symbols?
public String filterSpeciesName(String species) {
return species.replaceAll("[^a-zA-Z]", "").toLowerCase();
}
An example would be
input string "Bar''r$ack Put1in"
output string "barrack-putin"

return species.trim().replaceAll("\\s", "-").replaceAll("[^a-zA-Z-]", "").toLowerCase();

To replace every whitespace character with a hyphen, use String#replaceAll("\\s", "-").
Then, if you simply want to remove the characters that are not a-z, use replaceAll("[^a-zA-Z-]", ""); the hyphen inside the character class keeps your newly added hyphens :)
But I would rather recommend that you:
check whether species.replaceAll("\\s", "-") matches ^[a-zA-Z-]+$
throw an Exception if it does not
return the formatted value otherwise
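The validate-then-format steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name SpeciesNames, the method name format, the choice of IllegalArgumentException, and the leading trim() (borrowed from the one-liner) are all assumptions.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class SpeciesNames {
    // Letters and hyphens only; compiled once (hypothetical constant name).
    private static final Pattern VALID = Pattern.compile("^[a-zA-Z-]+$");

    // Hypothetical helper: rejects invalid input instead of silently stripping it.
    static String format(String species) {
        // Assumption: surrounding whitespace is trimmed before hyphenating.
        String candidate = species.trim().replaceAll("\\s", "-");
        if (!VALID.matcher(candidate).matches()) {
            throw new IllegalArgumentException("Invalid species name: " + species);
        }
        return candidate.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(format("Barrack Putin")); // prints barrack-putin
    }
}
```

With this variant, an input such as "Bar''r$ack Put1in" is rejected with an exception rather than cleaned up, so it only fits if invalid names should be reported to the caller.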

Related

find if a string ends with semicolon or a word

I need a regex to determine if a string ends with semicolon or "BEGIN" or "THEN". Also before BEGIN and THEN words, there must be a white space or line break character.
if(strLineText.matches(";|THEN|BEGIN$"))
This works for THEN and BEGIN but not for semicolon. And also with this regex I could not determine if THEN and BEGIN are exact words.
You need to put them inside a group.
if(strLineText.matches("(?s).*(?:;|\\bTHEN|\\bBEGIN)$"))
or
if(strLineText.matches("(?s).*(?:;|\\sTHEN|\\sBEGIN)$"))
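As a rough check of the grouped pattern, here is a small sketch using the \\s variant, since the question requires whitespace or a line break before the keyword. The class and method names (LineTerminators, endsStatement) are made up for illustration.

```java
import java.util.regex.Pattern;

class LineTerminators {
    // (?s) lets '.' cross line breaks; the group keeps '$' applied to every alternative.
    private static final Pattern ENDS_STATEMENT =
            Pattern.compile("(?s).*(?:;|\\sTHEN|\\sBEGIN)$");

    // Hypothetical helper wrapping the pattern from the answer.
    static boolean endsStatement(String line) {
        return ENDS_STATEMENT.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(endsStatement("x := 1;"));       // true
        System.out.println(endsStatement("IF a > b THEN")); // true
        System.out.println(endsStatement("loop\nBEGIN"));   // true
        System.out.println(endsStatement("NOTHEN"));        // false: no whitespace before THEN
    }
}
```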
You can use a simple lookahead for the same.
^(?=.*(?:;|[ \\n]THEN|[ \\n]BEGIN)$).*$
This is not a regex solution, but you can use the .endsWith() method too.
String str = "hey;";
if(str.endsWith(";"))
System.out.println("Ends with a ;");
public static boolean endsWithWord(String str, String word)
{
return str.endsWith(word);
}
System.out.println(endsWithWord("hey;", ";"));
System.out.println(endsWithWord("umm BEGIN", "BEGIN"));
System.out.println(endsWithWord("umm THEN", "THEN"));

Regular expression to remove HTML tags doesn't match

I have a String like <li><font color='#008000'> [INFO]a random user. and I want to eliminate html tags such as <li> and <font> from this String.
I tried to achieve this with String.replaceAll method in Java but it doesn't work...
public static String removeHTMLTags(String original){
String str = original.replaceAll("^<.+>$", "");
return str;
}
Your regex isn't finding a match because the ^ and $ anchors specify that the very first character in the input string must be < and the very last must be >.
Without those anchors, your regex still won't do what you want, however, because quantifiers (such as .+) are by default greedy.
So if your input string was text1 <a href=foo>bar</a> text2, your transformed output would be text1 text2, because the regex would match everything from the first < to the last >.
So in order to stop at the first >, you should make your quantifier non-greedy: .+?.
Remove the ^ and $ and use a reluctant quantifier with the dotall flag (so dot matches newlines too):
public static String removeHTMLTags(String original){
return original.replaceAll("(?s)<.+?>", "");
}
or use a negated character class (which will match newlines)
public static String removeHTMLTags(String original){
return original.replaceAll("<[^>]+>", "");
}
You're transforming an HTML string that might contain newline characters as well. By default, the dot doesn't match newline characters in a regex. You need to use the (?s) (DOTALL) flag with a lazy quantifier and without anchors:
String str = original.replaceAll("(?s)<.+?>", "");
Though I must caution you that using regex to parse or transform HTML can be error-prone.
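To see the difference between the greedy, reluctant, and negated-class forms discussed in the answers, a small comparison along these lines may help. The class and method names are invented for this sketch; the patterns are the ones from the answers, plus a greedy variant without anchors for contrast.

```java
class TagStripDemo {
    // Greedy: one match runs from the first '<' to the last '>'.
    static String greedy(String s) {
        return s.replaceAll("(?s)<.+>", "");
    }

    // Reluctant: each match stops at the nearest '>'.
    static String reluctant(String s) {
        return s.replaceAll("(?s)<.+?>", "");
    }

    // Negated class: cannot run past a '>' and matches newlines without (?s).
    static String negatedClass(String s) {
        return s.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String input = "text1 <a href=foo>bar</a> text2";
        System.out.println(greedy(input));       // "text1  text2" (two spaces remain)
        System.out.println(reluctant(input));    // "text1 bar text2"
        System.out.println(negatedClass(input)); // "text1 bar text2"
    }
}
```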

Why is this Java regex not working?

I'm trying to match any string consisting of:
Any alphanumeric string of 1+ chars; then
Two periods (".."); then
Any alphanumeric string of 1+ chars
For example:
mydatabase..mytable
anotherDatabase23..table28
etc.
Given the following function:
public boolean isValidDBTableName(String candidate) {
if(candidate.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"))
return true;
else
return false;
}
Passing this function the value "mydb..tablename" causes it to return false. Why? Thanks in advance!
As NeplatnyUdaj has pointed out in a comment, your current regex should return true for the input "mydb..tablename".
However, your regex has the problem of over-matching, where it returns true for invalid names such as nodotname.
You need to escape ., since in Java regex, it will match any character except for line separators:
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"
In regex, you can escape meta-characters (characters with special meaning) with \. To write \ in a Java string literal, you need to escape it again, hence \\.
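The over-matching problem can be reproduced with a short comparison like the one below. The class name TableNames and the helper matchesUnescaped are hypothetical; isValidDBTableName reuses the name from the question.

```java
import java.util.regex.Pattern;

class TableNames {
    // Unescaped dots: each '.' matches any character except a line terminator.
    private static final Pattern UNESCAPED =
            Pattern.compile("[a-zA-Z0-9]+..[a-zA-Z0-9]+");

    // Escaped dots: each "\\." matches a literal period only.
    private static final Pattern ESCAPED =
            Pattern.compile("[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+");

    static boolean matchesUnescaped(String candidate) {
        return UNESCAPED.matcher(candidate).matches();
    }

    static boolean isValidDBTableName(String candidate) {
        return ESCAPED.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesUnescaped("nodotname"));         // true (over-match)
        System.out.println(isValidDBTableName("nodotname"));       // false
        System.out.println(isValidDBTableName("mydb..tablename")); // true
    }
}
```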
You must escape the period in regexes. As a \ must also be escaped, this gives
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"
I just tried your regex in Eclipse and it worked. Or at least did not fail. Try stripping whitespace characters.
@Test
public void test()
{
String testString = "mydb..tablename";
Assert.assertTrue("no match", testString.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
Assert.assertFalse("falsematch", "a.b".matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
}

How to replace special characters in a string?

I have a string with lots of special characters. I want to remove all those, but keep alphabetical characters.
How can I do this?
That depends on what you mean. If you just want to get rid of them, do this:
(Update: Apparently you want to keep digits as well; use the second line in that case)
String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");
or the equivalent:
String alphaOnly = input.replaceAll("[^\\p{Alpha}]+","");
String alphaAndDigits = input.replaceAll("[^\\p{Alpha}\\p{Digit}]+","");
(All of these can be significantly improved by precompiling the regex pattern and storing it in a constant)
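A precompiled version might look something like this. It is a sketch only: the class name SpecialCharFilter and the constant names are assumptions, and the patterns are the ones shown above.

```java
import java.util.regex.Pattern;

class SpecialCharFilter {
    // Compiled once and reused, instead of recompiling on every replaceAll call.
    private static final Pattern NON_ALPHA = Pattern.compile("[^a-zA-Z]+");
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]+");

    static String alphaOnly(String input) {
        return NON_ALPHA.matcher(input).replaceAll("");
    }

    static String alphaAndDigits(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(alphaOnly("Bar''r$ack Put1in"));      // BarrackPutin
        System.out.println(alphaAndDigits("Bar''r$ack Put1in")); // BarrackPut1in
    }
}
```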
Or, with Guava:
private static final CharMatcher ALNUM =
CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'))
.or(CharMatcher.inRange('0', '9')).precomputed();
// ...
String alphaAndDigits = ALNUM.retainFrom(input);
But if you want to turn accented characters into something sensible that's still ascii, look at these questions:
Converting Java String to ASCII
Java change áéőűú to aeouu
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars
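For reference, one common standard-library approach to folding accents is Unicode decomposition followed by removal of the combining marks. The sketch below is not taken from the linked questions; the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class AccentFolding {
    // \p{M} matches Unicode combining marks, which NFD splits off from base letters.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Hypothetical helper: decompose, then drop the combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // The escapes spell the accented sample from the question title above.
        System.out.println(stripDiacritics("\u00e1\u00e9\u0151\u0171\u00fa")); // aeouu
    }
}
```

Letters that have no canonical decomposition (for example ɲ or ƞ) pass through unchanged, so this only covers characters built from a base letter plus combining marks.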
I am using this:
s = s.replaceAll("\\W", "");
It removes every non-word character from the string.
Here:
\w : A word character, short for [a-zA-Z_0-9]
\W : A non-word character, short for [^\w]
Note that the underscore counts as a word character, so it is kept.
You can use the following method to keep alphanumeric characters.
replaceAll("[^a-zA-Z0-9]", "");
And if you want to keep only alphabetical characters use this
replaceAll("[^a-zA-Z]", "");
To replace one specific special character, escape it in the pattern:
replaceAll("\\your special character","new character");
For example, to remove every occurrence of *:
replaceAll("\\*","");
Note: this statement can only replace one kind of special character at a time.
Following the example in Andrzej Doyle's answer, I think the better solution is to use org.apache.commons.lang3.StringUtils.stripAccents():
package bla.bla.utility;
import org.apache.commons.lang3.StringUtils;
public class UriUtility {
public static String normalizeUri(String s) {
String r = StringUtils.stripAccents(s);
r = r.replace(" ", "_");
r = r.replaceAll("[^\\.A-Za-z0-9_]", "");
return r;
}
}
string Output = Regex.Replace(Input, @"[^ a-zA-Z0-9&,_]", "");
Here all special characters except space, comma, ampersand, and underscore are removed. To remove commas and ampersands as well, use the following regular expression:
string Output = Regex.Replace(Input, @"[^ a-zA-Z0-9_]", "");
where Input is the string whose characters we need to replace.
Here is the JavaScript I used to remove all possible special characters from a string:
const cleaned = name.replace(/[&\/\\#,+()$~%!.„'":*‚^_¤?<>|@ª{«»§}©®™ ]/g, '').toLowerCase();
You can use basic regular expressions on strings to find all special characters or use pattern and matcher classes to search/modify/delete user defined strings. This link has some simple and easy to understand examples for regular expressions: http://www.vogella.de/articles/JavaRegularExpressions/article.html
You can look up the Unicode code point of the junk character with the Character Map tool on a Windows PC and write it with a \u escape, e.g. \u00a9 for the copyright symbol.
You can then use that escape in your string: instead of removing the junk character, replace it with the proper Unicode escape.
To keep spaces as well, use the pattern "[^a-z A-Z 0-9]".

Remove end of line characters from end of Java String

I have a string from which I'd like to remove the end-of-line characters, but only at the very end of the string, using Java:
"foo\r\nbar\r\nhello\r\nworld\r\n"
which I'd like to become
"foo\r\nbar\r\nhello\r\nworld"
(This question is similar to, but not the same as question 593671)
You can use s = s.replaceAll("[\r\n]+$", "");. This trims the \r and \n characters at the end of the string
The regex is explained as follows:
[\r\n] is a character class containing \r and \n
+ is one-or-more repetition of the preceding element
$ is the end-of-string anchor
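Put together, the pattern can be tried out with a small sketch like this one. The class and method names are hypothetical; the second method uses \\s+$ for comparison, which also strips trailing spaces and tabs.

```java
import java.util.regex.Pattern;

class TrailingNewlines {
    // One or more CR/LF characters anchored at the end of the string.
    private static final Pattern TRAILING_EOL = Pattern.compile("[\r\n]+$");
    // Any trailing whitespace, for comparison.
    private static final Pattern TRAILING_WS = Pattern.compile("\\s+$");

    static String stripTrailingEol(String s) {
        return TRAILING_EOL.matcher(s).replaceAll("");
    }

    static String stripTrailingWhitespace(String s) {
        return TRAILING_WS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
        // Inner line breaks are kept; only the final "\r\n" is removed.
        System.out.println(stripTrailingEol(text).equals("foo\r\nbar\r\nhello\r\nworld")); // true
    }
}
```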
References
regular-expressions.info/Anchors, Character Class, Repetition
Related topics
You can also use String.trim() to trim any whitespace characters from the beginning and end of the string:
s = s.trim();
If you need to check if a String contains nothing but whitespace characters, you can check if it isEmpty() after trim():
if (s.trim().isEmpty()) {
//...
}
Alternatively you can also see if it matches("\\s*"), i.e. zero-or-more of whitespace characters. Note that in Java, the regex matches tries to match the whole string. In flavors that can match a substring, you need to anchor the pattern, so it's ^\s*$.
Related questions
regex, check if a line is blank or not
how to replace 2 or more spaces with single space in string and delete leading spaces only
Wouldn't String.trim do the trick here?
i.e you'd call the method .trim() on your string and it should return a copy of that string minus any leading or trailing whitespace.
The Apache Commons Lang StringUtils.stripEnd(String str, String stripChars) will do the trick; e.g.
String trimmed = StringUtils.stripEnd(someString, "\n\r");
If you want to remove all whitespace at the end of the String:
String trimmed = StringUtils.stripEnd(someString, null);
Well, everyone gave some way to do it with regex, so I'll give the fastest way I know instead:
public String replace(String val) {
for (int i=val.length()-1;i>=0;i--) {
char c = val.charAt(i);
if (c != '\n' && c != '\r') {
return val.substring(0, i+1);
}
}
return "";
}
Benchmark says it operates ~45 times faster than regexp solutions.
If you have Google's guava-libraries in your project (if not, you arguably should!) you'd do this with a CharMatcher:
String result = CharMatcher.anyOf("\r\n").trimTrailingFrom(input);
String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
String result = text.replaceAll("[\r\n]+$", "");
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("\\s+$", "")
or
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("[\r\n]+$", "")
