regex strip spaces hyphen

regex strip spaces hyphen - java

I am unable to strip one space before and after a hyphen. I have tried:
sample.replaceAll("[\\s\\-\\s]","")
and permutations of it, to no avail. I don't want to strip all spaces, nor all the intervening spaces. I am trying to parse a string based on " " but want to eliminate "-". Any insight appreciated.

[\s\-\s] is a character class, so it does not match a space followed by - followed by a space. It matches any single character that is either whitespace or -, and replaceAll replaces each one with an empty string.
You can use this:
sample = sample.replaceAll("[ ]-[ ]","-");
Or even String.replace would work here; you don't really need replaceAll:
sample = sample.replace(" - ", "-");
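To see the difference concretely, here is a small sketch comparing the character-class attempt with the literal replacement. The class name HyphenSpaces, the method names and the sample string are made up for illustration.

```java
class HyphenSpaces {
    // Removes the single space on each side of a hyphen; other spaces are kept.
    static String tighten(String s) {
        return s.replace(" - ", "-");
    }

    // The character-class attempt from the question: it deletes every
    // whitespace character and every hyphen, wherever they occur.
    static String characterClass(String s) {
        return s.replaceAll("[\\s\\-\\s]", "");
    }

    public static void main(String[] args) {
        String sample = "alpha - beta gamma";
        System.out.println(tighten(sample));        // alpha-beta gamma
        System.out.println(characterClass(sample)); // alphabetagamma
    }
}
```

After tighten, splitting on " " yields "alpha-beta" and "gamma" as separate tokens.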

Related

Java, Regex, strip unwanted characters [trailing, leading, between]

I need help with a regular expression to strip unwanted characters from a String (in Java).
I solved this with 4 regular expressions applied one after another.
The replacement is called many times (peaks: 50+ times/sec), and this decreases performance.
But I think it is surely possible with a single expression, which should improve performance a little.
The test string is
" ! ... my-Cruc i#l_\\/Disp lay.Na#m3 ?;()! "
The tasks I would like to perform with the regex:
Remove all leading non-alpha characters – [Beginning of String]
Remove all trailing non-alphanumeric characters – [End of String]
Remove all non-alphanumeric characters (except [_-.]) in between
So the result will be
my-Crucil_Display.Nam3
The problem is switching between the built-in classes Alnum and Alpha depending on the position in the string (beginning, end), plus the exception characters [_-.] in between.
I have tried this many times over the last few days, but I cannot get it to work.
Removing leading non-alpha characters works with the regex
^([^\\p{Alpha}]+)?
But once I append the "between" part, nothing works any longer.
Removing trailing non-alphanumeric characters with the regex
([^\\p{Alnum}]+$)
works, but not in combination with the other regexes.
One of my last attempts is
(^[^\\p{Alpha}]+)?[^\\p{Alnum}\\._-]+([^\\p{Alnum}]+$)
Can anyone help me get this working?

You may use
^\P{Alpha}+|\P{Alnum}+$|[^\p{Alnum}_.-]
Java:
s = s.replaceAll("^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]", "");
Or, to make it Unicode aware, add the (?U) flag:
s = s.replaceAll("(?U)^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]", "");
Details
^\P{Alpha}+ - any 1 or more chars other than alphabetic chars at the start of the string
| - or
\P{Alnum}+$ - any 1 or more chars other than alphanumeric chars at the end of the string
| - or
[^\p{Alnum}_.-] - any char other than alphanumeric, _, . and - chars anywhere in the string
See the regex demo.
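Since the question mentions the replacement runs 50+ times per second, one option is to compile the pattern once and reuse it, because String.replaceAll compiles its regex on every call. This is only a sketch: the class and method names are hypothetical, and whether it helps noticeably in the asker's setup is an assumption, not something measured here.

```java
import java.util.regex.Pattern;

class StripUnwanted {
    // Compiled once and reused across calls.
    private static final Pattern UNWANTED =
            Pattern.compile("^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]");

    // Removes leading non-letters, trailing non-alphanumerics, and any other
    // character that is not alphanumeric, '_', '.' or '-'.
    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // In Java source, \\ is a single backslash in the actual string.
        String test = " ! ... my-Cruc i#l_\\/Disp lay.Na#m3 ?;()! ";
        System.out.println(clean(test)); // my-Crucil_Display.Nam3
    }
}
```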

Remove unwanted characters from string by regex in Java

I have a string here:
javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello
I want to remove everything before the ":", including the ":" itself. This would leave only "Hello". I read about regex, but no combination I tried worked. Can someone tell me how to do it. Thanks in advance!

You need to use the replaceAll or replaceFirst method. Strings are immutable, so assign the result:
string = string.replaceFirst(".*:\\s*", "");
or
string = string.replaceAll(".*:\\s*", "");
This would give you only Hello. If you remove the \\s* part, it would give you <space>Hello.
.* matches any character zero or more times, greedily,
: up to the last colon on the line.
\\s* matches zero or more whitespace characters.

You could also just split the string by : and take the second string. Like this
String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
System.out.println(sample.split(":", -1)[1]);
This will output
<space>Hello
If you want to get rid of that leading space just trim it off like
System.out.println(sample.split(":", -1)[1].trim());
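A small sketch putting the regex and the split approach side by side, with made-up class and method names. The two differ once the text after the first colon contains another colon: the greedy .* cuts at the last colon, while split with a limit of 2 cuts at the first one. Which of these is wanted is an assumption you would have to settle for your own data.

```java
class AfterColon {
    // Greedy .* runs to the last ':' on the line; \s* then eats the spaces after it.
    static String afterLastColon(String s) {
        return s.replaceFirst(".*:\\s*", "");
    }

    // Splits at the first ':' only (limit 2) and trims what follows.
    // Returns the input unchanged when there is no ':' at all.
    static String afterFirstColon(String s) {
        String[] parts = s.split(":", 2);
        return parts.length == 2 ? parts[1].trim() : s;
    }

    public static void main(String[] args) {
        String label = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
        System.out.println(afterLastColon(label));      // Hello
        System.out.println(afterFirstColon(label));     // Hello
        System.out.println(afterLastColon("a: b: c"));  // c
        System.out.println(afterFirstColon("a: b: c")); // b: c
    }
}
```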

Correct existing regular expression / create a new one

I am trying to learn regular expressions and am trying to replace punctuation in a string with whitespace, using a regular expression, before feeding the string into a tokenizer. The string might contain many punctuation marks. However, I do not want to replace the punctuation in words which contain an apostrophe/hyphen within them.
For example,
six-pack => six-pack
He's => He's
This,that => This That
I tried to replace all punctuation with whitespace initially, but that would not work.
I tried to replace only those punctuation marks by specifying word boundaries, as in
\B[^\p{L}\p{N}\s]+\B|\b[^\p{L}\p{N}\s]+\B|\B[^\p{L}\p{N}\s]+\b
But, I am not able to exclude the hyphen and apostrophe from them.
My guess is that the above regex is also very cumbersome and there should be a better way. Is there any?
So, all I am trying to do is:
Replace all punctuation with whitespace
Do not do the above if it is a hyphen/apostrophe
Do replace if the hyphen/apostrophe occurs at the start/end of a word.
Any help is appreciated.

You can probably work out a set of punctuation characters that are ok between words, and another set that isn't, then define your regular expression based on that.
For instance:
String[] input = {
    "six-pack",  // => six-pack
    "He's",      // => He's
    "This,that"  // => This that
};
for (String s : input) {
    System.out.println(s.replaceAll("(?<=\\w)[\\p{Punct}&&[^'-]](?=\\w)", " "));
}
Output
six-pack
He's
This that
Note
Here I'm defining the pattern as a character class covering all POSIX punctuation (\p{Punct}), intersected with a negated character class that excludes ' and -, and requiring the match to be preceded and followed by a word character.

You can use this lookahead based regex:
(?!((?!^)['-].))\\p{Punct}
RegEx Demo

You could use negative lookahead assertion like below,
String s = "six-pack\n"
+ "He's\n"
+ "This,that";
System.out.println(s.replaceAll("(?m)^['-]|['-]$|(?!['-])\\p{Punct}", " "));
Output:
six-pack
He's
This that
Explanation:
(?m) Multiline Mode
^['-] Matches ' or - which are at the start.
| OR
['-]$ Matches ' or - which are at the end of the line.
| OR
(?!['-])\\p{Punct} Matches every punctuation character except ' and - . Hyphens and apostrophes inside a line are therefore left alone; only those at the start or end of a line are matched, by the first two alternatives.
RegEx Demo
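One thing worth checking with this pattern: ^ and $ anchor to the line, not to each word. A rough sketch, with a made-up class name and inputs, showing a hyphen at the start of a line being replaced while one at the start of a later word is kept:

```java
class LineEdgePunct {
    // Replaces ' or - at the start or end of a line, and every other
    // punctuation character anywhere, with a single space.
    static String clean(String s) {
        return s.replaceAll("(?m)^['-]|['-]$|(?!['-])\\p{Punct}", " ");
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + clean("-six-pack'") + "]"); // [ six-pack ]
        System.out.println("[" + clean("a -b") + "]");       // [a -b]
    }
}
```

If the input holds several words per line, the hyphen in "a -b" survives, so the "start/end of a word" requirement from the question is only met when each word sits on its own line.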

Regular Expression: Replace except from specific characters and whitespace

I am coding in Java and I have a string where I want to keep letters, digits, ":", "-" and whitespaces and remove everything else. So, I have used this piece of code:
str=str.replaceAll("[^\\dA-Za-z#:-\\s*]", "");
It doesn't work.
It works fine with
str=str.replaceAll("[^\\dA-Za-z#:-]", "");
where everything except letters, digits and the characters ":" and "-" is removed.
But when I try to add the condition for whitespace characters, I run into problems.
I would appreciate your help.
Thank you in advance.

- denotes a range when used inside a character class.
In your case you were actually trying to match characters in the range from : to \s, which is an invalid range.
Move - to the start
[^-\\dA-Za-z#:\\s]
or end
[^\\dA-Za-z#:\\s-]

The dash must be the first or last character in a character class, or it will be interpreted as a range indicator (as in [A-Z]); in your case [:-\\s] is a meaningless range. Use
str = str.replaceAll("[^\\dA-Za-z#:\\s-]+", "");
(or did you want to keep asterisks in your text, too)?
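A short sketch of how the position of the hyphen changes the meaning of a character class. The class name, method names and sample strings are invented for the example. [#-:] is a valid range (from '#' to ':' in character-code order, which happens to include '+'), whereas [#:-] lists three literal characters.

```java
class HyphenInClass {
    // Hyphen last, so it is a literal '-': letters, digits, '#', ':', '-'
    // and whitespace are kept, everything else is removed.
    static String keepAllowed(String s) {
        return s.replaceAll("[^\\dA-Za-z#:\\s-]+", "");
    }

    // Hyphen between two characters: a range from '#' to ':'.
    static String removeRange(String s) {
        return s.replaceAll("[#-:]", "");
    }

    // Hyphen at the end: the three literal characters '#', ':' and '-'.
    static String removeLiterals(String s) {
        return s.replaceAll("[#:-]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAllowed("Room #12: a-b (c) *d*!")); // Room #12: a-b c d
        System.out.println(removeRange("a+b"));    // ab   ('+' lies inside the range)
        System.out.println(removeLiterals("a+b")); // a+b
    }
}
```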

Removing all standalone occurrences of a word from a string with regular expressions in Java

Need advice on how to replace a sub-string like: #sometext, but not replace "#someothertext#somemail.com" sub-string.
For example, when I've got a string something like:
An example with #sometext and also with "#someothertext#somemail.com" sometextafter
And the result, after replacing sub-strings in string above should look like:
An example with and also with "#someothertext#somemail.com" sometextafter
After getting string from a field, I'm using:
String textMod = someText.replaceAll("( |^)[^\"]#[^#]+?( |$)","");
someText = textMod + "#\"" + someone.getEmail() + "\" ";
And then I'm setting this string into field.

You can match a standalone occurrence this way
(?<!\w)#sometext\b
The \b after #sometext makes sure the tag is not the start of a longer word such as #sometextafter. Because # is not a word character, a \b in front of it would only match when # directly follows a word character, which is the opposite of what is wanted here, so the leading check is the negative lookbehind (?<!\w) instead. Once a match is found, you can replace it or do whatever else you want with it.
Hope this helps
From https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
The \b in the pattern indicates a word boundary, so only the distinct word "web" is matched, and not a word partial like "webbing" or "cobweb"
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
}
^ PHP example but you get the point

If there is always a space before and behind the tags to replace, this might suffice.
/\s(#\w+)\s/g

Try this
(?<!\w)#[^#\s]+(?!\S)
See it here on Regexr
Match on a # but only if there is no word character \w before it, (?<!\w). Then match a sequence of characters that are not # and not whitespace \s, but only if it's not followed by a non-whitespace character \S.
(?<!\w) is called a negative lookbehind assertion
[^#\s] is called a negated character class, means match anything that is not part of the class
(?!\S) is a negative lookahead assertion
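Applied to the string from the question, the lookaround pattern can be sketched in Java as follows. The class and method names are hypothetical. Note that the match does not include the surrounding spaces, so removing a tag from the middle of a sentence leaves two spaces behind; collapsing those would be a separate step.

```java
class StandaloneTag {
    // Removes #tags that are not preceded by a word character and that run
    // up to whitespace or the end of the string without hitting another '#'.
    static String removeTags(String s) {
        return s.replaceAll("(?<!\\w)#[^#\\s]+(?!\\S)", "");
    }

    public static void main(String[] args) {
        String text = "An example with #sometext and also with "
                + "\"#someothertext#somemail.com\" sometextafter";
        // The quoted address is untouched; "#sometext" is gone.
        System.out.println(removeTags(text));
        System.out.println("[" + removeTags("#start middle #end") + "]"); // [ middle ]
    }
}
```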

This should correspond to your needs:
str = str.replaceAll("#\\w+[^#]", "");

(C#, regex based)
// Match #xxx sequences, but only if I can look back and NOT see a #xxx immediately preceding me, and if I don't end with a #
string input = @"[An example with #hello and also with ""##hello#somemail.com"" sometext #lastone";
var pattern = @"(?<!#\w+)(?>#\w+)(?!#)";
var matches = Regex.Matches(input, pattern);

Simply adding spaces before and after "#sometext" would not work if "#sometext" is at the start or end of a sentence. However, just adding a pattern checking for start or end of sentence would not work either, as when you match "#sometext " at the start of a sentence and leave a space " ", this will make the resulting string look strange. Same goes for the end of a sentence.
We need to split the regex replace into two actions and perform two separate regex replaces:
str = str.replaceAll(" #sometext ", " ");
str = str.replaceAll("^#sometext | #sometext$|(?:#sometext ){2,}", "");
^ means start of line, $ means end of line.
EDIT: Added corner case handling of when several #sometext's are after each other.

myString = myString.replaceAll(" #hello ", " ");
If #hello is a single word, then it has spaces before and after, right? So you should find every #hello with a space before and after and replace it with a single space.
If you need to remove not only #hello but all words that start with # and contain no other #, use this:
myString = myString.replaceAll(" #[^#]+? ", " ");
[^#] is any symbol except #. +? means match at least one character, and as few as possible, until reaching the first space.
If you want to remove only words made of alphanumeric characters, use \\w instead of [^#].
EDIT:
Yeah, ohaal's right. To make it match at the start and the end of string use this pattern:
( |^)#[^#]+?( |$)
myString = myString.replaceAll("( |^)#hello( |$)", " ");
