Java regex, replace certain characters except

Java regex, replace certain characters except - java

I have this string "u2x4m5x7" and I want replace all the characters but a number followed by an x with "".
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!

Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while(m.find()) {
b.append(m.group(0));
}
System.out.println(b)
This prints:
2x5x

It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler

Capture what you need to $1 OR any character and replace with captured $1 (empty if |. matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run

Related

regex break string into words in dictionary

I want to create an regex in order to break a string into words in a dictionary. If the string matches, I can iterate each group and make some change. some of the words are prefix of others. However, a regex like /(HH|HH12)+/ will not match string HH12HH link. what's wrong with the regex? should it match the first HH12 and then HH in the string?

You want to match an entire string in Java that should only contain HH12 or HH substrings. It is much easier to do in 2 steps: 1) check if the string meets the requirements (here, with matches("(?:HH12|HH)+")), 2) extract all tokens (here, with HH12|HH or HH(?:12)?, since the first alternative in an unanchored alternation group "wins" and the rest are not considered).
String str = "HH12HH";
Pattern p = Pattern.compile("HH12|HH");
List<String> res = new ArrayList<>();
if (str.matches("(?:HH12|HH)+")) { // If the whole string consists of the defined values
Matcher m = p.matcher(str);
while (m.find()) {
res.add(m.group());
}
}
System.out.println(res); // => [HH12, HH]
See the Java demo
An alternative is a regex that will check if a string meets the requirements with a lookahead at the beginning, and then will match consecutive tokens with a \G operator:
String str = "HH12HH";
Pattern p = Pattern.compile("(\\G(?!^)|^(?=(?:HH12|HH)+$))(?:HH12|HH)");
List<String> res = new ArrayList<>();
Matcher m = p.matcher(str);
while (m.find()) {
res.add(m.group());
}
System.out.println(res);
See another Java demo
Details:
(\\G(?!^)|^(?=(?:HH12|HH)+$)) - the end of the previous successful match (\\G(?!^)) or (|) start of string (^) that is followed with 1+ sequences of HH12 or HH ((?:HH12|HH)+) up to the end of string ($)
(?:HH12|HH) - either HH12 or HH.

In the string HH12HH, the regex (HH|HH12)+ will work this way:
HH12HH
^ - both option work, continue
HH12HH
^ - First condition is entierly satisfied, mark it as match
HH12HH
^ - No Match
HH12HH
^ - No Match
As you setted the A flag, which add the anchor to the start of the string, the rest will not raise a match. If you remove it, the pattern will match both HH at the start & at the end.
In this case, you have three options:
Put the longuest pattern first /(HH12|HH)/Ag. See demoThe one I prefer.
Mutualize the sharing part and use an optional group /(HH(?:12)?)/Ag. See second demo
Put a $ at the end like so /(HH|HH12)$/Ag

The problem you are having is entirely related to the way the regex engine decides what to match.
As I explained here, there are some regex flavors that pick the longest alternation... but you're not using one. Java's regex engine is the other type: the first matching alternation is used.
Your regex works a lot like this code:
if(bool1){
// This is where `HH` matches
} else if (bool1 && bool2){
// This is where `HH12` would match, but this code will never execute
}
The best way to fix this is to order your words in reverse, so that HH12 occurs before HH.
Then, you can just match with an alteration:
HH12|HH
It should be pretty obvious what matches, since you can get the results of each match.
(You could also put each word in its own capture group, but that's a bit harder to work with.)

Why does this regex capture the excluded character?

I have a regex like this:
(?:(\\s| |\\A|^))(?:#)[A-Za-z0-9]{2,}
What I am trying to do is find a pattern that starts with an # and has two or more characters after, however it can't start in the middle of a word.
I'm new to regex but was under the impression ?: matches but then excludes the character however my regex seems to match but include the characters. Ideally I'd like for "#test" to return "test" and "test#test" to not match at all.
Can anyone tell me what I've done wrong?
Thanks.

Your understanding is incorrect. The difference between (...) and (?:...) is only that the former also creates a numbered match group which can be referred to with a backreference from within the regex, or as a captured match group from code following the match.
You could change the code to use lookbehinds, but the simple and straightforward fix is to put ([A-Za-z0-9]{2,}) inside regular parentheses, like I have done here, and retrieve the first matched group. (The # doesn't need any parentheses around it in this scenario, but the ones you have are harmless.)

Try this : You could use word boundary to specify your condition.
public static void main(String[] args) {
String s1 = "#test";
String s2 = "test#test";
String pattern = "\\b#\\w{2,}\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s1);
m.find();
System.out.println(m.group());
}
o/p :
#test
throws `IllegalStateException` in the second case (s2)..

How about:
\W#[\S]{2}[\S]*
The strings caught by this regular expression needs to be trimmed and remove the first character.

I guess you better need the following one:
(?<=(?<!\w)#)\w{2,}
Debuggex Demo
Don't forget to escape the backslashes in Java since in a string literal:
(?<=(?<!\\w)#)\\w{2,}

how to check all character in a string is lowercase using java

I tried like this but it outputs false,Please help me
String inputString1 = "dfgh";// but not dFgH
String regex = "[a-z]";
boolean result;
Pattern pattern1 = Pattern.compile(regex);
Matcher matcher1 = pattern1.matcher(inputString1);
result = matcher1.matches();
System.out.println(result);

Your solution is nearly correct. The regex must say "[a-z]+"—include a quantifier, which means that you are not matching a single character, but one or more lowercase characters. Note that the uber-correct solution, which matches any lowercase char in Unicode, and not only those from the English alphabet, is this:
"\\p{javaLowerCase}+"
Additionally note that you can achieve this with much less code:
System.out.println(input.matches("\\p{javaLowerCase}*"));
(here I am alternatively using the * quantifier, which means zero or more. Choose according to the desired semantics.)

you are almost there, except that you are only checking for one character.
String regex = "[a-z]+";
the above regex would check if the input string would contain any number of characters from a to z
read about how to use Quantifiers in regex

Use this pattern :
String regex = "[a-z]*";
Your current pattern only works if the tested string is one char only.
Note that it does exactly what it looks like : it doesn't really test if the string is in lowercase but if it doesn't contain chars outside [a-z]. This means it returns false for lowercase strings like "àbcd". A correct solution in a Unicode world would be to use the Character.isLowercase() function and loop over the string.

It should be
^[a-z]+$
^ is the start of string
$ is the end of string
[a-z]+ matches 1 to many small characters
You need to use quantifies like * which matches 0 to many chars,+ which matches 1 to many chars..They would matches 0 or 1 to many times of the preceding character or range

Why bother with a regular expression ?
String inputString1 = "dfgh";// but not dFgH
boolean result = inputString1.toLowerCase().equals( inputString1 );

Java Regex to split numbers with unknown number of spaces

I have a string of numbers that are a little weird. The source I'm pulling from has a non-standard formatting and I'm trying to switch from a .split where I need to specify an exact method to split on (2 spaces, 3 spaces, etc.) to a replaceall regex.
My data looks like this:
23574 123451 81239 1234 19274 4312457 1234719
I want to end up with
23574,xxxxx,xxxxx,xxxx
So I can just do a String.split on the ,

I will use \s Regex
This is its usage on Java
String[] numbers = myString.split("\\s+");

final Iterable<String> splitted = Splitter.on('').trimResults().omitEmptyStrings().split(input);
final String output = Joiner.on(',').join(splitted);
with Guava Splitter and Joiner

String pattern = "(\s+)";
Pattern regex = Pattern.compile(pattern);
Matcher match = r.matcher(inputString);
match.replaceAll(",");
String stringToSplit = match.toString();
I think that should do it for you. If not, googling for the Matcher and Pattern classes in the java api will be very helpful.

I understand this problem as a way to obtain integer numbers from a string with blank (not only space) separators.
The accepted solution does not work if the separator is a TAB \t for instance or if it has an \n at the end.
If we define an integer number as a sequence of digits, the best way to solve this is using a simple regular expression. Checking the Java 8 Pattern API, we can find that \D represents any non digit character:
\D A non-digit: [^0-9]
So if the String.split() method accepts a regular expression with the possible separators, it is easy to send "\\D+" to a trimmed string and get the result in one shot like this.
String source = "23574 123451 81239 1234 19274 4312457 1234719";
String trimmed = source.trim();
String[] numbers = trimmed.split("\\D+");
It is translated as split this trimmed string using any non digit character sequence as a possible separator.

How can I make a Java regex all or nothing?

I'm trying to make a regex all or nothing in the sense that the given word must EXACTLY match the regular expression - if not, a match is not found.
For instance, if my regex is:
^[a-zA-Z][a-zA-Z|0-9|_]*
Then I would want to match:
cat9
cat9_
bob_____
But I would NOT want to match:
cat7-
cat******
rango78&&
I want my regex to be as strict as possible, going for an all or nothing approach. How can I go about doing that?
EDIT: To make my regex absolutely clear, a pattern must start with a letter, followed by any number of numbers, letters, or underscores. Other characters are not permitted. Below is the program in question I am using to test out my regex.
Pattern p = Pattern.compile("^[a-zA-Z][a-zA-Z|0-9|_]*");
Scanner in = new Scanner(System.in);
String result = "";
while(!result.equals("-1")){
result = in.nextLine();
Matcher m = p.matcher(result);
if(m.find())
{
System.out.println(result);
}
}

I think that if you use String.matches(regex), then you will get the effect you are looking for. The documentation says that matches() will return true only if the entire string matches the pattern.

The regex won't match the second example. It's already strict, since * and & are not in the allowed set of characters.
It may match a prefix, but you can avoid this by adding '$' to the end of the regex, which explicitly matches end of input. So try,
^[a-zA-Z][a-zA-Z|0-9|_]*$
This will ensure the match is against the entire input string, and not just a prefix.

Note that \w is the same as [A-Za-z0-9_]. And you need to anchor to the end of the string like so:
Pattern p = Pattern.compile("^[a-zA-Z]\\w*$")

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java regex, replace certain characters except - java

Capture what you need to $1 OR any character and replace with captured $1 (empty if |. matched). String s = string.replaceAll("(\\d+x)|.", "$1"); See this demo at regex101 or a Java demo at tio.run

Related

regex break string into words in dictionary

Why does this regex capture the excluded character?

how to check all character in a string is lowercase using java

Java Regex to split numbers with unknown number of spaces

How can I make a Java regex all or nothing?

Categories

Resources