Multiple regex for replacing characters in java - java

I have the following string:
String str = "Klaße, STRAßE, FUß";
Using a combination of regexes, I want to replace the German letter ß with ss or SS, depending on the case of the surrounding word. To do this I have:
String replaceUml = str
.replaceAll("ß", "ss")
.replaceAll("A-Z|ss$", "SS")
.replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");
Expected result:
Klasse, STRASSE, FUSS
Actual result:
Klasse, STRAssE, FUSS
Where am I going wrong?

First of all, if you're trying to match some character in the range A-Z, you need to put it in square brackets. This
.replaceAll("A-Z|ss$", "SS")
will look for the three characters A-Z in the source, which isn't what you want. Second, I think you're confused about what | means. If you say this:
.replaceAll("[A-Z]|ss$", "SS")
it will replace every upper-case letter, and also an ss at the end of the string, with SS, because | means "match this or that" and it splits the whole pattern into the two alternatives [A-Z] and ss$.
A third problem with your approach is that the second and third replaceAll's will look for any ss that was in the original string, even if it didn't come from a ß. This may or may not be what you want.
Here's what I'd do:
String replaceUml = str
.replaceAll("(?<=[A-Z])ß", "SS")
.replaceAll("ß", "ss");
This will first replace all ß by SS if the character before the ß is an upper-case letter; then if there are any ß's left over, they get replaced by ss. Actually, this won't work if the character before ß is an umlaut like Ä, so you probably should change this to
String replaceUml = str
.replaceAll("(?<=[A-ZÄÖÜ])ß", "SS")
.replaceAll("ß", "ss");
(There may be a better way to specify an "upper-case Unicode letter"; I'll look for it.)
EDIT:
String replaceUml = str
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replaceAll("ß", "ss");
A problem is that it won't work if ß is the second character in the text, and the first letter of the word is upper-cased but the rest of the word isn't. In that case you probably want lower-case "ss".
String replaceUml = str
.replaceAll("(?<=\\b\\p{Lu})ß(?=\\P{Lu})", "ss")
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replaceAll("ß", "ss");
Now the first one will replace ß by ss if it's preceded by an upper-case letter that is the first letter of the word but followed by a character that isn't an upper-case letter. \P{Lu} with an upper-case P will match any character other than an upper-case letter (it's the negative of \p{Lu} with a lower-case p). I also included \b to test for the first character of a word.
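A minimal sketch of the three-step chain above as a runnable class, assuming the order of the replacements matters exactly as described. The class name EszettDemo, the method replaceEszett and the input "Aßlar" are made up for illustration; ß is written as the escape \u00DF only so the file compiles regardless of source encoding.

```java
import java.util.List;

public class EszettDemo {
    // The letter ß, written as a Unicode escape (illustrative choice, see lead-in).
    private static final String SZ = "\u00DF";

    // Applies the three replacements from the answer above, in order.
    static String replaceEszett(String text) {
        return text
                // Capitalised word: one upper-case first letter, then ß, then a non-upper-case character.
                .replaceAll("(?<=\\b\\p{Lu})" + SZ + "(?=\\P{Lu})", "ss")
                // Any remaining ß directly after an upper-case letter becomes "SS".
                .replaceAll("(?<=\\p{Lu})" + SZ, "SS")
                // Everything that is left becomes "ss".
                .replaceAll(SZ, "ss");
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "Kla" + SZ + "e, STRA" + SZ + "E, FU" + SZ,
                "A" + SZ + "lar");
        for (String input : inputs) {
            System.out.println(replaceEszett(input));
        }
    }
}
```

With the string from the question this should print Klasse, STRASSE, FUSS, and the capitalised example keeps its lower-case ss.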

String replaceUml = str
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replace("ß", "ss");
This uses a regex that requires a preceding Unicode upper-case letter (as in "SÜß") to produce a capital "SS".
The (?<= ... ) is a look-behind, a kind of context matching. You could also do
.replaceAll("(\\p{Lu})ß", "$1SS")
as ß will not occur at the beginning of a word.
Your main trouble was not using brackets [A-Z].

Breaking your regex into parts:
Regex
/ß/g
Description
ß Literal ß
g modifier: global. All matches (don't return on first match)
Regex
/([A-Z])ss$/g
Description
1st Capturing group ([A-Z])
Char class [A-Z] matches:
A-Z A character range between Literal A and Literal Z
ss Literal ss
$ End of string
g modifier: global. All matches (don't return on first match)
Regex
/([A-Z]+)ss([A-Z]+)/g
Description
1st Capturing group ([A-Z]+)
Char class [A-Z] 1 to infinite times [greedy] matches:
A-Z A character range between Literal A and Literal Z
ss Literal ss
2nd Capturing group ([A-Z]+)
Char class [A-Z] 1 to infinite times [greedy] matches:
A-Z A character range between Literal A and Literal Z
g modifier: global. All matches (don't return on first match)
Specifically for you
String replaceUml = str
.replaceAll("ß", "ss")
.replaceAll("([A-Z])ss$", "$1SS")
.replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2");

Use String.replaceFirst() instead of String.replaceAll().
replaceAll("ß", "ss")
This will replace all the occurrences of "ß". Hence the output after this statement becomes:
Klasse, STRAssE, FUss
Now replaceAll("A-Z|ss$", "SS") replaces only an "ss" at the very end of the string (without brackets, A-Z matches just the literal text A-Z), hence your final result looks like this:
Klasse, STRAssE, FUSS
To get your expected result for this particular input, where the only ß in a lower-case word is the first one, try this:
String replaceUml = str.replaceFirst("ß", "ss").replaceAll("ß", "SS");

Related

Regex to find the first word in a string java without using the string name

I have a string that can contain a sentence with symbols and numbers, and the sentence can have different lengths.
For example:
String myString = " () Huawei manufactures phones";
and the next time myString can contain the following words:
String myString = " * Audi has amazing cars &^";
How can I use a regex to get the first word from the string, so that the only word I get from the first myString is "Huawei" and the word I get from the second myString is "Audi"?
Below is what I have tried, but it fails when there are spaces and symbols before the first word:
String regexString = myString.replaceAll("\\s.*", "");
You may use this regex with a capture group for matching:
^\W*\b(\w+).*
and replace with: $1
Java Code:
s = s.replaceAll("^\\W*\\b(\\w+).*", "$1");
RegEx Details:
^: Start
\W*: Match 0 or more non-word characters
\b: Word boundary
(\w+): Match 1+ word characters and capture it in group #1
.*: Match anything afterwards
See how you get on with:
s = s.replaceAll("^[^\\p{Alpha}]*", "");
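To compare the two suggestions, here is a small sketch that runs both patterns on the strings from the question. The class and method names are made up for illustration. Note that the second pattern only strips the leading non-letters and keeps the rest of the sentence.

```java
public class FirstWordDemo {
    // Capture-group pattern from the first answer: keeps only the first run of word characters.
    static String firstWord(String s) {
        return s.replaceAll("^\\W*\\b(\\w+).*", "$1");
    }

    // Pattern from the second answer: removes leading non-letters, keeps everything after them.
    static String stripLeadingNonLetters(String s) {
        return s.replaceAll("^[^\\p{Alpha}]*", "");
    }

    public static void main(String[] args) {
        String first = " () Huawei manufactures phones";
        String second = " * Audi has amazing cars &^";

        System.out.println(firstWord(first));
        System.out.println(firstWord(second));
        System.out.println(stripLeadingNonLetters(first));
    }
}
```

If the input contains no word characters at all, the first pattern does not match and the string comes back unchanged, so a caller may want to check for that case.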

Remove leading uppercase char in a String with Regex

I am struggling with another regex case at work. I need to be able to remove a leading letter that is upper-case. However, the catch is that I only want to remove/replace this character if it is the first character and stands by itself. What I mean is that it must not be removed if it stands next to another letter; it has to be the only upper-case letter in its word. In my code below I have managed to remove the first upper-case character. However, my regex also removes "TH", which is two characters that I don't want to remove. Any tips for adjusting my regex?
String test = "B, 02 abc";
String test2= "TH - 2. tv";
String works1 = test.replaceAll("^.*([A-Z])", "");
String works2 =test2.replaceAll("^.*([A-Z])", "");
System.out.println(works1);
System.out.println(works2);
//Desired result for works1 = ",02 abc"
//Desired result for works2= "TH- 2. tv"
You can use:
str = str.replaceFirst("^\\p{Lu}\\b", "");
RegEx Details:
^: Start
\\p{Lu}: Match any uppercase letter
\\b: Word boundary
Note that if you want to allow optional non-word characters before uppercase letter then use:
str = str.replaceFirst("^\\W*\\p{Lu}\\b", "");
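A short sketch that applies the first pattern to the two strings from the question; the class and method names are only illustrative. The word boundary is what protects "TH": there is no boundary between T and H, so nothing is removed.

```java
public class LeadingUppercaseDemo {
    // Removes a single upper-case letter at the start, but only if it is a whole word by itself.
    static String dropLoneLeadingUppercase(String s) {
        return s.replaceFirst("^\\p{Lu}\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(dropLoneLeadingUppercase("B, 02 abc"));   // the lone B is removed
        System.out.println(dropLoneLeadingUppercase("TH - 2. tv"));  // left unchanged
    }
}
```

The pattern removes only the letter itself, so the comma and the space after "B" stay in the result.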

Scanning letters and floats using the java scanner

I have a string which looks like this:
"m 535.71429,742.3622 55.71428,157.14286 c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z"
and I want the Java Scanner to output the following strings: "m", "535.71429", "742.3622", "55.71428", "157.14286", "c", ...
so everything separated by a comma or a space, but I am having trouble getting it to work.
This is what my code looks like:
Scanner scanner = new Scanner(path_string);
scanner.useDelimiter(",||//s");
String s = scanner.next();
if (s.equals("m")){
s = scanner.next();
point[0] = Float.parseFloat(s);
s = scanner.next();
point[1] = Float.parseFloat(s);
....
but the strings that come out are: "m", " ", "5", "3", ...
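I think the trouble is with //s: the whitespace class is written with backslashes (\\s in a Java string literal), and the || leaves an empty alternative that matches between every pair of characters, which is why you get single characters back. You have to use this pattern: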
scanner.useDelimiter("(,|\\s)");
Regex patterns:
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
We use a double \ because the backslash is an escape character inside a Java string literal and has to be escaped itself; | needs no escaping there.
If you want the output to be strings, Float.parseFloat(s) is of no use for your problem. Is your array a float array?
Because if it is, you should not get any output but a NumberFormatException, because the string "m" cannot be parsed into a float.
Furthermore, to solve the problem of the single-character values, you could use a StringBuilder that assembles your numbers and skips the letters and commas. The letters would need separate handling.
Finally, unless it is absolutely necessary, use double instead of float. It has more precision and might save you from some more problems within your program!
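As a rough sketch of the delimiter fix, the class below collects every token from the path string. It uses the character class [,\\s]+ rather than the (,|\\s) shown above; that is a variation, chosen here so that a comma followed by a space would not produce empty tokens. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PathTokenizer {
    // Splits the input on runs of commas and/or whitespace and returns the tokens in order.
    static List<String> tokenize(String path) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            scanner.useDelimiter("[,\\s]+");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String path = "m 535.71429,742.3622 55.71428,157.14286 "
                + "c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z";
        // Letters such as "m", "c" and "z" come back as tokens too; parse only the numeric ones.
        for (String token : tokenize(path)) {
            System.out.println(token);
        }
    }
}
```

The command letters still have to be handled separately before calling Float.parseFloat or Double.parseDouble on the remaining tokens.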

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
String var = "ROOT_CONTEXT_MATCHER";
boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or are an underscore
What's going on here?!?
The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the line, and $ for the end of the line:
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here because String#matches would match the entire string.
You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
Since \\w is equivalent to [a-zA-Z0-9_] (a word character).
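The following sketch puts the variants side by side. The helper name isValidName is made up for illustration; the point it shows is that String#matches has to match the whole string, so the literal slashes make the original pattern fail.

```java
public class MatchesDemo {
    // True if the whole string is an upper-case ASCII letter followed by word characters.
    static boolean isValidName(String s) {
        return s.matches("[A-Z]\\w*");
    }

    public static void main(String[] args) {
        String var = "ROOT_CONTEXT_MATCHER";

        // The slashes are literal characters, so the string would have to start and end with '/'.
        System.out.println(var.matches("/[A-Z][a-zA-Z0-9_]*/"));

        // Without the slashes the whole string matches.
        System.out.println(var.matches("[A-Z][a-zA-Z0-9_]*"));
        System.out.println(isValidName(var));

        // matches() is anchored at both ends, so a partial match is not enough.
        System.out.println(isValidName("x ROOT"));
    }
}
```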

Regular Expression - inserting space after comma only if succeeded by a letter or number

In Java I want to insert a space after a comma, but only if the comma is followed by a digit or letter. I am hoping to use the replaceAll method, which takes a regular expression as a parameter. So far I have the following:
String s1="428.0,chf";
s1 = s1.replaceAll(",(\\d|\\w)",", ");
This code does successfully distinguish between the String above and one where there is already a space after the comma. My problem is that I can't figure out how to write the expression so that the space is inserted. The code above will replace the c in the String shown above with a space. This is not what I want.
s1 should look like this after executing the replaceAll: "428.0 chf"
s1 = s1.replaceAll(",(?=[\\da-zA-Z])", " ");
(?=[\da-zA-Z]) is a positive lookahead that checks for a digit or a letter after the comma. The lookahead is not replaced, since it is never included in the match. It's just a check.
NOTE
\w includes digits, letters and _, so there is no need for \d alongside it.
A better way to represent it is [\da-zA-Z] instead of \w, since \w also includes _, which you do not need to match.
Try this, and note that $1 refers to your matched grouping:
s1 = s1.replaceAll(",(\\d|\\w)", " $1");
Note that String.replaceAll() works in the same way as a Matcher.replaceAll(). From the doc:
The replacement string may contain references to captured subsequences
String s1="428.0,chf";
s1 = s1.replaceAll(",([^\\W_])", " $1"); // Match a letter or digit (a word character other than '_') after ','
System.out.println(s1);
Output:
428.0 chf
Since \w matches letters, digits and the underscore, the class [^\W_] ("not a non-word character and not an underscore") matches only letters and digits.
$1 represents the captured group. Here you captured the c after the comma, so ",c" is replaced with a space followed by $1, i.e. " c".
Try this; note that it replaces every comma, whatever follows it:
public class Tes {
public static void main(String[] args) {
String s1 = "428.0,chf";
// Split on commas and re-join the pieces with single spaces.
String finalStr = String.join(" ", s1.split(","));
System.out.println(finalStr);
}
}
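A small sketch of the lookahead variant, to show which commas it touches and which it leaves alone. The class and method names are hypothetical, and the replacement is a single space, matching the output asked for in the question.

```java
public class CommaSpaceDemo {
    // Replaces a comma with a space, but only when a digit or ASCII letter follows directly.
    static String commaToSpace(String s) {
        return s.replaceAll(",(?=[\\da-zA-Z])", " ");
    }

    public static void main(String[] args) {
        System.out.println(commaToSpace("428.0,chf"));   // comma followed by a letter: replaced
        System.out.println(commaToSpace("428.0, chf"));  // comma followed by a space: unchanged
        System.out.println(commaToSpace("a,_b"));        // comma followed by '_': unchanged
    }
}
```

Because the lookahead consumes nothing, the character after the comma never needs to be captured and put back.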
