Remove leading uppercase char in a String with Regex - java

I am struggling with another regex case at work. I need to remove a leading uppercase letter from a string, but only when it is the first character and stands by itself. That is, it must not be removed if it sits next to another letter; it has to be the only uppercase letter in its spot. With the code below I have managed to remove the first uppercase char, but my regex also removes "TH", which is two chars that I don't want to remove. Any tips for adjusting my regex?
String test = "B, 02 abc";
String test2= "TH - 2. tv";
String works1 = test.replaceAll("^.*([A-Z])", "");
String works2 =test2.replaceAll("^.*([A-Z])", "");
System.out.println(works1);
System.out.println(works2);
//Desired result for works1 = ",02 abc"
//Desired result for works2= "TH- 2. tv"

You can use:
str = str.replaceFirst("^\\p{Lu}\\b", "");
RegEx Details:
^: Start
\\p{Lu}: Match any uppercase letter
\\b: Word boundary
Note that if you want to allow optional non-word characters before the uppercase letter (they are removed along with it), use:
str = str.replaceFirst("^\\W*\\p{Lu}\\b", "");
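As a minimal sketch, the two patterns above can be tried against the strings from the question. The class name LeadingUppercase and both method names are made up for this illustration.

```java
public class LeadingUppercase {

    // Removes a single uppercase letter at the start of the string,
    // but only when a word boundary follows it directly.
    static String stripLeadingUppercase(String s) {
        return s.replaceFirst("^\\p{Lu}\\b", "");
    }

    // Variant that also accepts (and removes) leading non-word characters.
    static String stripLeadingUppercaseLenient(String s) {
        return s.replaceFirst("^\\W*\\p{Lu}\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingUppercase("B, 02 abc"));          // ", 02 abc"
        System.out.println(stripLeadingUppercase("TH - 2. tv"));         // "TH - 2. tv" (unchanged)
        System.out.println(stripLeadingUppercaseLenient("  B, 02 abc")); // ", 02 abc"
    }
}
```

"TH" is left alone because there is no word boundary between T and H; both are word characters.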

Related

Merging 2 regex that allow only English and Arabic characters

I have a string and I want to remove every other character, such as (0..9!##$%^&*()_., ...), and keep only alphabetic characters.
After looking around and doing some tests, I ended up with two regexes:
String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z]", "");
str = str.replaceAll("\\P{InArabic}+", "");
System.out.println(str);
This should return "hello مرحبا ok".
But of course this returns an empty string, because the first regex removes every non-Latin character and the second then removes every non-Arabic character.
My question is: how can I merge these two regexes into one that keeps only Arabic and English characters?
Use a lowercase p, since the negation is already handled by ^ in the character class. No quantifier is needed (though one wouldn't hurt), because replaceAll replaces every match:
String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z \\p{InArabic}]", "");
System.out.println(str);
Prints:
hello مرحبا ok
Note that, based on your expected result, you want spaces kept, so a space is included in the character class.
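The merged character class can be wrapped in a small helper, sketched below. The class and method names are hypothetical, and the Arabic word from the question is written with Unicode escapes only so that the source file encoding does not matter.

```java
public class KeepLatinAndArabic {

    // Removes everything except ASCII letters, spaces and characters
    // from the Unicode "Arabic" block.
    static String keepLatinAndArabic(String s) {
        return s.replaceAll("[^a-zA-Z \\p{InArabic}]", "");
    }

    public static void main(String[] args) {
        // Same input as in the question, with the Arabic word escaped.
        String input = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";
        System.out.println(keepLatinAndArabic(input));
    }
}
```

One thing to keep in mind: \p{InArabic} selects a whole Unicode block, which also contains Arabic-Indic digits and punctuation, so those would be kept as well.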

Regex to find the first word in a string java without using the string name

I have a string that can hold a sentence containing symbols and numbers, and the sentence can vary in length.
For example:
String myString = " () Huawei manufactures phones";
The next time, myString might hold the following:
String myString = " * Audi has amazing cars &^";
How can I use a regex to get the first word from the string, so that I get "Huawei" from the first myString and "Audi" from the second?
Below is what I have tried, but it fails when there are spaces or symbols before the first word:
String regexString = myString.replaceAll("\\s.*", "");
You may use this regex with a capture group for matching:
^\W*\b(\w+).*
and replace with: $1
Java Code:
s = s.replaceAll("^\\W*\\b(\\w+).*", "$1");
RegEx Details:
^: Start
\W*: Match 0 or more non-word characters
\b: Word boundary
(\w+): Match 1+ word characters and capture it in group #1
.*: Match anything afterwards
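Put together, the pattern described above behaves as in the following sketch, which uses the two sample strings from the question. FirstWord and firstWord are illustrative names only.

```java
public class FirstWord {

    // Replaces the whole string with its first run of word characters.
    static String firstWord(String s) {
        return s.replaceAll("^\\W*\\b(\\w+).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstWord(" () Huawei manufactures phones")); // Huawei
        System.out.println(firstWord(" * Audi has amazing cars &^"));    // Audi
    }
}
```

If the input contains no word characters at all, the pattern does not match and the string comes back unchanged, so callers may want to check for that case.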
Alternatively, to strip only the leading non-letter characters (the rest of the sentence is kept), see how you get on with:
s = s.replaceAll("^[^\\p{Alpha}]*", "");

How can I strip all non digits in a string except the first character?

I have a string, and I want to make sure its format is always a + followed by digits.
The following would work:
String parsed = inputString.replaceAll("[^0-9]+", "");
String result;
// startsWith also copes with an empty input, where charAt(0) would throw
if (inputString.startsWith("+")) {
    result = "+" + parsed;
} else {
    result = parsed;
}
But is there a way to write a single regex for replaceAll that keeps the + (if it exists) at the beginning of the string and removes all other non-digits in a single line?
The following statement with the given regex would do the job:
String result = inputString.replaceAll("(^\\+)|[^0-9]", "$1");
(^\\+) matches a plus sign at the beginning of the string and captures it in group 1 ($1),
| or
[^0-9] matches a single character that is not a digit.
$1 replaces each match with the contents of group 1: the plus sign if the first alternative matched, otherwise nothing.
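A short sketch of this replacement in use, with made-up sample inputs and a hypothetical class name. It relies on the fact that in Java a group that did not take part in the match contributes nothing to the replacement.

```java
public class PlusAndDigits {

    // Keeps an optional leading '+' and all digits; drops everything else.
    static String keepPlusAndDigits(String input) {
        return input.replaceAll("(^\\+)|[^0-9]", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepPlusAndDigits("+49 (0) 123-456")); // +490123456
        System.out.println(keepPlusAndDigits("12+34"));           // 1234
    }
}
```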
You can use this expression:
String r = s.replaceAll("((?<!^)[^0-9]|^[^0-9+])", "");
The idea is to replace any non-digit when it is not the initial character of the string (that's the (?<!^)[^0-9] part with a lookbehind) or any character that is not a digit or plus that is the initial character of the string (the ^[^0-9+] part).
What about just
(?!^)\D+
Java string:
"(?!^)\\D+"
\D matches a character that is not a digit, i.e. [^0-9]
(?!^) is a negative lookahead that checks the match does not start at the initial character. Note that this keeps the first character whatever it is, not only a +.
Yes you can use this kind of replacement:
String parsed = inputString.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
If present and located before the first digit in the string, the + character is captured in group 1. For example: dfd+sdfd12+sdf12 returns +1212 (the second + is removed since its position is after the first digit).
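The example above can be checked with a small sketch like this one. The class and method names are assumptions made for illustration; the second and third inputs are made up.

```java
public class LeadingPlus {

    // Keeps a '+' only when it appears before the first digit,
    // and removes every other non-digit character.
    static String normalize(String input) {
        return input.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("dfd+sdfd12+sdf12")); // +1212
        System.out.println(normalize("12+34"));            // 1234
        System.out.println(normalize(" +1 (555) 010"));    // +1555010
    }
}
```

Unlike a pattern anchored strictly on the first character, this one tolerates junk before the plus sign.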
Try this:
1- This allows both negative and positive numbers: it matches every special character except a - or + in the first position (the decimal point is also kept).
(?!^[-+])[^0-9.]
2- If you only want to allow a + in the first position:
(?!^[+])[^0-9.]

JAVA: Replacing words in string

I want to replace words in a string, but I am having some difficulty. Here is what I want to do. I have this string:
String a = "I want to replace some words in this string";
It should work like some kind of translator. I am doing this with String.replaceAll(), but it doesn't quite work. Let's say I am translating from English to German; then this should be the output (Ich means I in German).
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
Now the output of the String a will be this:
"ich want to replace some words ichn thichs strichng"
How do I replace just the whole words, not the substrings inside other words?
replaceAll uses regex, so you can add word boundaries or look-around assertions to check that no non-space characters surround the word you want to replace.
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll("(?<!\\S)"+toTranslate.toLowerCase()+"(?!\\S)", translated.toLowerCase());
You can also quote the word to escape any regex metacharacters in it, such as + * (. By the way, you don't need to change your string to lower case; simply add the case-insensitive flag (?i) to the regex.
a = a.replaceAll("(?i)(?<!\\S)"+Pattern.quote(toTranslate)+"(?!\\S)", translated.toLowerCase());
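The look-around approach can be sketched as a reusable helper. The names WholeWordReplace and replaceWholeWord are hypothetical. The sketch also passes the replacement through Matcher.quoteReplacement, an addition not in the answer above, so that a $ or backslash in the translated word is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplace {

    // Replaces 'word' only where it is surrounded by whitespace or the
    // string edges, ignoring case and treating 'word' as literal text.
    static String replaceWholeWord(String text, String word, String replacement) {
        String regex = "(?i)(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String a = "I want to replace some words in this string";
        System.out.println(replaceWholeWord(a, "I", "Ich"));
        // Ich want to replace some words in this string
        System.out.println(replaceWholeWord("C++ is not C", "C++", "Java"));
        // Java is not C
    }
}
```

Because the look-arounds test for whitespace, a word directly followed by punctuation (for example "I,") is not matched; \b word boundaries behave differently there.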
Use split(" ") to get each word in the sentence, then compare each whole word against the one you want to translate. Calling replaceAll on each word would still change substrings, such as the i in "in".
String a = "I want to replace some words in this string";
String toTranslate = "I";
String translated = "Ich";
String[] words = a.split(" ");
for (int i = 0; i < words.length; i++) {
    // Replace only when the whole word matches; ignoring case ensures
    // you don't miss an uppercase or lowercase toTranslate
    if (words[i].equalsIgnoreCase(toTranslate)) {
        words[i] = translated.toLowerCase();
    }
}
a = String.join(" ", words);
System.out.println("after translation = " + a);
String toTranslate = "I ";
String translated = "Ich ";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
If you add a space after the "I", it is replaced only where a space follows it, so letters inside words like "in" are left alone. A word that ends in "i" is still a problem, though.
If you assume that I will always be capitalized in English, as it should be, then
a = a.replaceAll(toTranslate, translated);
will work; otherwise you need to replace both cases:
a = a.replaceAll(toTranslate, translated);
a = a.replaceAll("([^a-zA-Z])("+toTranslate.toLowerCase()+")([^a-zA-Z])", "$1"+translated.toLowerCase()+"$3");
Yes, word boundaries are the solution. This is what I did in the regex:
text.replaceAll("\\b" + parts1[i] + "\\b", map.element.value);
Don't be confused by the second argument; it is just a string (taken from a hash table).
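A rough sketch of that idea, looping over a lookup table and wrapping each key in \b word boundaries. The class name, the translate method and the sample dictionary are all invented for illustration, and Pattern.quote and Matcher.quoteReplacement are added here as a precaution rather than taken from the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionaryReplace {

    // Hypothetical sample data standing in for the answer's hash table.
    static Map<String, String> sampleDictionary() {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("I", "Ich");
        dictionary.put("want", "will");
        return dictionary;
    }

    // Replaces each key with its value, but only where the key
    // appears as a whole word (case-sensitive).
    static String translate(String text, Map<String, String> dictionary) {
        String result = text;
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String regex = "\\b" + Pattern.quote(entry.getKey()) + "\\b";
            result = result.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String a = "I want to replace some words in this string";
        System.out.println(translate(a, sampleDictionary()));
        // Ich will to replace some words in this string
    }
}
```

Since the replacements run one after another, a translated word could be picked up again by a later dictionary entry; a single-pass approach would be needed if that matters.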
You can use the regex word boundary, which is \b:
String toTranslate = "\\bI\\b";
String translated = "Ich";
// Do not lowercase the pattern: "\\bi\\b" would no longer match the capital I
a = a.replaceAll(toTranslate, translated);
This ensures that I is matched only when it stands as its own word.
Edit: I misread the question and have since realized you want whole words. See above, as I have accounted for that.

How can I find repeated characters with a regex in Java?

Can anyone give me a Java regex to identify repeated characters in a string? I am only looking for characters that are repeated immediately and they can be letters or digits.
Example:
abccde <- looking for this (immediately repeating c's)
abcdce <- not this (c's separated by another character)
Try "(\\w)\\1+"
The \\w matches any word character (letter, digit, or underscore) and the \\1+ matches whatever was in the first set of parentheses, one or more times. So you wind up matching any occurrence of a word character, followed immediately by one or more of the same word character again.
(Note that I gave the regex as a Java string, i.e. with the backslashes already doubled for you)
String stringToMatch = "abccdef";
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
System.out.println("Duplicate character " + m.group(1));
}
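To collect every repeated run rather than just the first, the same pattern can be used in a loop. This is only a sketch; RepeatedChars and findRepeats are illustrative names, and the third sample input is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedChars {

    // A word character followed by one or more copies of itself.
    private static final Pattern REPEAT = Pattern.compile("(\\w)\\1+");

    // Returns each run of immediately repeated characters, in order.
    static List<String> findRepeats(String s) {
        List<String> runs = new ArrayList<>();
        Matcher m = REPEAT.matcher(s);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRepeats("abccde"));   // [cc]
        System.out.println(findRepeats("abcdce"));   // []
        System.out.println(findRepeats("aabbbc11")); // [aa, bbb, 11]
    }
}
```

Note that \w also matches an underscore; if only letters and digits should count, a character class such as [A-Za-z0-9] could be swapped in.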
Regular expressions add overhead. For a check this simple, you would probably be better off just storing the last character and checking whether the next one is the same.
Something along the lines of:
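String s = "abccde"; // sample input
char c1 = s.charAt(0);
for (int i = 1; i < s.length(); i++) {
    char c2 = s.charAt(i);
    // A character equal to the one before it is an immediate repeat
    if (c1 == c2) {
        System.out.println("Duplicate character " + c2);
    }
    c1 = c2;
}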
