Replace all non numeric characters by only one word - java

I want to perform the following replacement:
WORD1234 -> W1234
So, I'm using the regex:
([^\d]*)([0-9]+)([^\d]*)
Replacement: W$2
If the word is WORD1234AAAAA, using the previous regex I have the same result: W1234, which is what I want.
But if the word is WO12RD34 the result I have is: W12W34
What I want basically in all the cases is to remove all non-numeric characters and add the letter W at the beginning.
Update:
The input string does not always start with a W. It can be for example ABC12DE34 and the desired result is: FA1234. Meaning, remove all non-numeric characters and add a word at the beginning.

Try this:
String regex = "(?<start>^W)|(\\D)";
String replacement = "${start}";
System.out.println("WO12RD34".replaceAll(regex, replacement)); //prints W1234
System.out.println("WORD1234AAAAA".replaceAll(regex, replacement)); //prints W1234
With this regex, the "start" capturing group is set only when the W at the very start of the string is matched. Otherwise, it is empty.
The idea is that, when a W at the start of the string is matched, the named group "start" captures that W, and the replacement ${start} puts it back unchanged.
When any other non-digit character is matched, the "start" group does not participate in the match, so ${start} expands to nothing and the character is removed. Note that this only keeps an existing leading W; if the input does not start with W, no prefix is added.
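Since the update says the input does not always start with W, a simpler route may be to strip every non-digit and prepend the wanted prefix yourself. The following is only a sketch: the class name DigitsWithPrefix and the helper digitsWithPrefix are made up for illustration, and the prefix is assumed to be known in advance.

```java
class DigitsWithPrefix {
    // Hypothetical helper: removes every non-digit character (\D),
    // then prepends the caller-supplied prefix.
    static String digitsWithPrefix(String input, String prefix) {
        return prefix + input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsWithPrefix("WO12RD34", "W"));      // W1234
        System.out.println(digitsWithPrefix("WORD1234AAAAA", "W")); // W1234
        System.out.println(digitsWithPrefix("ABC12DE34", "FA"));    // FA1234
    }
}
```

This avoids the named group entirely, at the cost of doing the prefixing outside the regex.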

Related

matching empty row with regex and skipping

I am trying to implement a regex match for an empty row coming from a CSV file whose last column contains the row number,
e.g.: "","","","","","","","","",5
The regex pattern I am using is (\W*\d\W). It works for now, but I am not sure whether in the long run it will reliably detect an empty row whose last column is a digit.
Could a better pattern be suggested? I am still new to regex.
You do not need a regex to match an empty string. Java has a dedicated method for that: isEmpty. Just call it on the string that you want to check:
String str = ...;
if (str.isEmpty()) {
// do something if the string is empty
} else {
// do something if the string is not empty
}
The reason your regex does not do what you expect is that it matches:
\W* - zero or more non-word characters
\d - any digit (0-9)
\W - one more non-word character
This will match something completely different. Your regex will match sequences like:
[[[[[9]
You can read more about what these regex symbols mean in the documentation of java.util.regex.Pattern.
If you want to match an empty String with regex you can try the following regex:
^$
Which means:
^ - matches the beginning of a line
$ - matches the end of a line
This will only match a line that has nothing between its beginning and its end, that is, an empty line.
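If the goal is to recognise a row like the one in the question (every field empty, row number last), a stricter pattern than (\W*\d\W) could look like the sketch below. It assumes every empty field is written exactly as "" followed by a comma and that the last column holds only digits; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class EmptyCsvRow {
    // Assumed row shape: one or more empty quoted fields (""), each followed
    // by a comma, then a numeric row number as the last column.
    private static final Pattern EMPTY_ROW = Pattern.compile("^(?:\"\",)+\\d+$");

    static boolean isEmptyRow(String line) {
        return EMPTY_ROW.matcher(line).matches();
    }

    public static void main(String[] args) {
        String emptyRow = "\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",5";
        System.out.println(isEmptyRow(emptyRow));          // true
        System.out.println(isEmptyRow("\"a\",\"\",5"));    // false: first field is not empty
        System.out.println(isEmptyRow(""));                // false: no fields and no row number
    }
}
```

A real CSV parser is usually the safer choice if fields can contain spaces, escaped quotes, or be left unquoted.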

Java regex negative lookahead to replace non-triple characters

I'm trying to take a number, convert it into a string and replace all characters that are not a triple.
Eg. if I pass in 1222331 my replace method should return 222. I can find that this pattern exists but I need to get the value and save it into a string for additional logic. I don't want to do a for loop to iterate through this string.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?
You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
The regex ((\d)\2{2})|\d either matches a digit repeated three times in a row (capturing the whole triple in Group 1) or matches any other single digit. The replacement $1 puts the captured triple back into the result and drops everything else, because Group 1 is empty whenever the second alternative matches.
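To see the behaviour on the input from the question, here is a small self-contained sketch. The class name KeepTriples and the method keepTriples are made up, and the number is assumed to be non-negative (a minus sign is not a digit and would be left in place).

```java
class KeepTriples {
    // Keeps only runs of three identical digits; every other digit is
    // matched by the second alternative and replaced by the empty Group 1.
    static String keepTriples(int number) {
        String digits = Integer.toString(number);
        return digits.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(1222331)); // 222
        System.out.println(keepTriples(111222));  // 111222
        System.out.println(keepTriples(12345).isEmpty()); // true: no triple present
    }
}
```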

How to write this Java regex?

I need to break the string into words by a hyphen. For example:
"WorkInProgress" is converted to "Work-In-Progress"
"NotComplete" is converted to "Not-Complete"
In most cases a word starts with a capital letter and ends with a lowercase letter.
But there is one exception, "CIInProgress" should be converted to "CI-In-Progress".
I wrote the code below: wherever a run of lowercase letters or "CI" is followed by a capital letter, a "-" is inserted in between. But it still doesn't work for "CIInProgress". Can anyone tell me how to correct it?
String str;
String pattern = "([a-z|CI]+)([A-Z])";
str= str.replaceAll(pattern, "$1\\-$2");
You could use a negative lookbehind,
Regex:
(?<!^)([A-Z][a-z])
Replacement string:
-$1
Explanation:
(?<!^) is a negative lookbehind asserting that the match does not begin at the start of the string. [A-Z][a-z] matches an uppercase letter followed by a lowercase letter, so such a pair is matched only when it is not at the very beginning. The parentheses () form a capturing group that stores the matched characters; the replacement string refers to them by group index ($1).
Code:
System.out.println("WorkInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("NotComplete".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("CIInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
Output:
Work-In-Progress
Not-Complete
CI-In-Progress
Inside a character class, | does not mean alternation; it is just interpreted as a literal vertical bar character. Try:
String pattern = "([a-z]+|CI)([A-Z])";
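Try this (note that it only splits at lowercase-to-uppercase boundaries, so "CIInProgress" becomes "CIIn-Progress"):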
str= str.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
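As a quick check of the last two suggestions, the sketch below runs the alternation pattern and the lowercase-to-uppercase boundary pattern on the inputs from the question. The class and method names are invented for the comparison.

```java
class HyphenateCamelCase {
    // Alternation variant: a run of lowercase letters or the literal "CI",
    // followed by a capital letter; the hyphen goes between the two groups.
    static String byAlternation(String s) {
        return s.replaceAll("([a-z]+|CI)([A-Z])", "$1-$2");
    }

    // Boundary variant: inserts a hyphen at every position that has a
    // lowercase letter before it and an uppercase letter after it.
    static String byBoundary(String s) {
        return s.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
    }

    public static void main(String[] args) {
        System.out.println(byAlternation("WorkInProgress")); // Work-In-Progress
        System.out.println(byAlternation("CIInProgress"));   // CI-In-Progress
        System.out.println(byBoundary("NotComplete"));       // Not-Complete
        System.out.println(byBoundary("CIInProgress"));      // CIIn-Progress
    }
}
```

Only the alternation variant handles the "CI" exception, because the boundary variant never splits between two capital letters.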

simple java regular expression not working

I have this simple example of a regular expression. But it is not working. I don't know what I am doing wrong:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]"));
It returns false, but it should return true.
Use:
name.matches("[a-zA-Z]+") // matches one or more letters
or name.matches("\\w+") // matches one or more word characters (letters, digits, underscore)
name.matches("[a-zA-Z]") // matches exactly one character.
Add + to your regex to match one or more letters:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]+"));
Your regex [a-zA-Z] matches exactly one letter, not more than one.
[a-zA-Z] matches a lowercase letter from a-z or an uppercase letter from A-Z.
The reason this evaluates to false is that String.matches() tries to match the entire string (see the documentation of String.matches()) against the pattern [A-Za-z], which only matches a single character. Either use
Pattern.compile("[A-Za-z]").matcher(str).find() to see if a substring matches (this returns true in this case), or alter the regex to account for multiple characters. The cleanest way of doing so is
Pattern.compile("^[A-Za-z]+$");
The ^ marks "start of string" and $ marks "end of string". + means "previous token at least once".
If you want to allow the empty String as well, use
Pattern.compile("^[A-Za-z]*$");
instead (* means "match the previous token 0 or more times")
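The difference between whole-string matching and substring search can be checked with a short sketch like the one below. The class name and helper methods are made up for the illustration; only String.matches, Pattern.compile, Matcher.find and Matcher.matches come from the JDK.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // matches() must consume the whole input, so + is needed for longer strings.
    static boolean isAllLetters(String s) {
        return s.matches("[a-zA-Z]+");
    }

    // find() succeeds as soon as any substring matches the pattern.
    static boolean containsLetter(String s) {
        return Pattern.compile("[A-Za-z]").matcher(s).find();
    }

    // Variant with * that also accepts the empty string.
    static boolean isLettersOrEmpty(String s) {
        return Pattern.compile("^[A-Za-z]*$").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("abc".matches("[a-zA-Z]")); // false: pattern covers one character only
        System.out.println(isAllLetters("abc"));       // true
        System.out.println(containsLetter("abc"));     // true
        System.out.println(isLettersOrEmpty(""));      // true
    }
}
```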
Try with [a-zA-Z]+
[a-zA-Z] indicates:

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the regex only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks (according to this regex tester) as if just doing /()_+$/ would work: the empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, then you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
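Both options from this answer can be tried with a sketch along these lines, here for single-line strings only (so neither MULTILINE nor DOTALL is set). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimUnderscores {
    private static final Pattern EDGES = Pattern.compile("^_+|_+$");
    private static final Pattern CORE = Pattern.compile("^_*(.*?)_*$");

    // Option 1: delete the underscore runs at the start and at the end.
    static String trimByReplace(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    // Option 2: capture everything between the leading and trailing runs.
    // Falls back to the input if it does not match (e.g. it contains a newline).
    static String trimByGroup(String s) {
        Matcher m = CORE.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(trimByReplace("__this_is_my___example_line_4__")); // this_is_my___example_line_4
        System.out.println(trimByGroup("_this_is_my_ _example_line_3_"));     // this_is_my_ _example_line_3
    }
}
```

Underscores in the middle of the string are left alone in both variants, since only the anchored runs are removed or skipped.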
How about [^_\n\r](.*[^_\n\r])??
Demo
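import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String data =
            "this_is_my_example_line_0\n" +
            "this_is_my_example_line_1_\n" +
            "this_is_my_example_line_2___\n" +
            "_this_is_my_ _example_line_3_\n" +
            "__this_is_my___example_line_4__";
        // Start at a non-underscore, then optionally extend to the last non-underscore on the line.
        Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
        Matcher m = p.matcher(data);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}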
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
