Give look behind the priority over the actual regular expression

I am looking for a regular expression that can strip all 'a' characters from the beginning of an input word (consisting only of letters of the English alphabet).
How would I do this using a regular expression?
The following look behind based regex fails to do the job:
(?<=a*?)(\w)+
as for input abc the above regular expression would return abc.
Is there a clean way to do this using lookbehinds?
A (brute force-ish) regular expression that does work is using negation:
(?<=a*)([[^a]&&\w])*
which returns the correct answer of bc for an input word abc.
But I was wondering if there could be a more elegant regular expression, say, using the correct quantifier?

Pattern removeWords = Pattern.compile("\\b(?:a)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher fix = removeWords.matcher(YourWord);
String fixedString = fix.replaceAll("");
This removes the standalone word a from the string. If you want to remove some other letters as well, you can do it this way:
Pattern removeWords = Pattern.compile("\\b(?:a|b|c)\\b\\s*", Pattern.CASE_INSENSITIVE);

I think that a regex for this problem is overkill.
You could instead do:
str = str.startsWith("a") ? str.substring(1) : str;
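Note that the ternary above drops only a single leading a. If every leading a should go, as the question asks, one possible sketch without lookbehinds is an anchored replaceFirst. The class and method names below are illustrative and not part of the original answer.

```java
class LeadingAStripper {
    // Removes every 'a' at the start of the word; the ^ anchor leaves later a's untouched.
    static String stripLeadingA(String word) {
        return word.replaceFirst("^a+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingA("abc"));    // bc
        System.out.println(stripLeadingA("aaabca")); // bca
        System.out.println(stripLeadingA("bca"));    // bca
    }
}
```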

Try with:
(?i)\\ba?(\\w+)\\b
and replace a word with captured group 1.
Code example:
String word = "aWord Another";
word = word.replaceAll("(?i)\\ba?(\\w+)\\b", "$1");
System.out.println(word);
with output:
Word nother

There are much simpler ways to do this, but as you insist on using lookbehinds, I will give one. The regex is
(?<=\b)a+(\w*)
Regex Breakdown
(?<=\b) #Assert a word boundary at the current position
a+ #Match the character a literally at least once. The word boundary already ensures that only a's at the start of a word are matched
(\w*) #Capture the remaining characters of the word
Java Code
String str = "abc cdavbvhsza aaabcd";
System.out.println(str.replaceAll("(?<=\\b)a+(\\w*)", "$1"));

Related

Java regex, replace certain characters except

I have this string "u2x4m5x7" and I want to replace every character except a number followed by an x with "".
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!

Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while(m.find()) {
b.append(m.group(0));
}
System.out.println(b);
This prints:
2x5x

It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler
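As a rough comparison of the two ideas in this answer, the sketch below applies the three-part removal regex and the simpler \dx matching to the same input. The class and method names are made up for illustration, and only a couple of inputs were considered, so edge cases may still differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitXFilter {
    // Blacklist approach: delete everything that is not part of a digit-x pair.
    static String blacklist(String input) {
        return input.replaceAll("\\d(?!x)|[^0-9x]|(?<!\\d)x", "");
    }

    // Whitelist approach: collect each digit that is directly followed by an x.
    static String whitelist(String input) {
        Matcher m = Pattern.compile("\\dx").matcher(input);
        StringBuilder kept = new StringBuilder();
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "u2x4m5x7";
        System.out.println(blacklist(input)); // 2x5x
        System.out.println(whitelist(input)); // 2x5x
    }
}
```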

Capture what you need in group 1, or otherwise match any single character, and replace each match with $1 (which is empty when the . alternative matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run

Regex to catch all the words and the "i'm you're etc" in Java

I am trying to split lines of a document, by creating a Pattern in Java.
The default Pattern in WordCount example is something like this: "\\s*\\b\\s*".
The problem with this pattern however, is that it splits everything to a single word, while I want to keep things such as (I'm, You're, it's) together. So far, what I've tried is [a-zA-Z]+'{0,1}[a-zA-Z]*,
the problem is that when I have a test string, for example:
Pattern BOUNDARY = Pattern.compile("[a-zA-Z]+'{0,1}[a-zA-Z]*");
String test = "Hello i'm #£$#you ##can !!be.";
and run
for (String word : BOUNDARY.split(test)) {
System.out.println(word);
}
I get no results. Ideally, I want to get
Hello
i'm
you
can
be
Any ideas are welcome. In the regex101.com the regex I've put up works like a charm, so I'm guessing I have misunderstood something in the Java part.

Your initial pattern splits at a word boundary that is optionally surrounded by whitespace. Your second pattern, by contrast, matches the words themselves, so it should be used with Matcher.find() rather than split().
Use it like this:
String BOUNDARY_STR = "[a-zA-Z]+(?:'[a-zA-Z]+)?";
String test = "Hello i'm #£$#you ##can !!be.";
Matcher matcher = Pattern.compile(BOUNDARY_STR).matcher(test);
List<String> results = new ArrayList<>();
while (matcher.find()){
results.add(matcher.group(0));
}
System.out.println(results); // => [Hello, i'm, you, can, be]
Note that I used [a-zA-Z]+(?:'[a-zA-Z]+)?, which matches:
[a-zA-Z]+ - 1 or more ASCII letters
(?:'[a-zA-Z]+)? - an optional substring of
    ' - an apostrophe
    [a-zA-Z]+ - 1 or more ASCII letters
You may also wrap the pattern with word boundaries to only match words that are enclosed with non-word chars, "\\b[a-zA-Z]+(?:'[a-zA-Z]+)?\\b".
To find all Unicode letters, use "\\p{L}+(?:'\\p{L}+)?".
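A minimal sketch of the Unicode variant, wrapped in a helper so it can be reused. The class name WordExtractor and the second sample sentence are assumptions for illustration; non-ASCII characters are written as \u escapes so the source does not depend on file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // One or more Unicode letters, optionally followed by an apostrophe and more letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)?");

    static List<String> words(String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // \u00a3 is the pound sign from the question's test string.
        System.out.println(words("Hello i'm #\u00a3$#you ##can !!be."));
        // Accented letters are matched by \p{L} but would be split by [a-zA-Z].
        System.out.println(words("na\u00efve caf\u00e9 l'\u00e9t\u00e9"));
    }
}
```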

Why does this regex capture the excluded character?

I have a regex like this:
(?:(\\s| |\\A|^))(?:#)[A-Za-z0-9]{2,}
What I am trying to do is find a pattern that starts with a # and has two or more characters after it; however, it can't start in the middle of a word.
I'm new to regex, but I was under the impression that ?: matches and then excludes the character. However, my regex seems to match and still include the characters. Ideally I'd like "#test" to return "test" and "test#test" to not match at all.
Can anyone tell me what I've done wrong?
Thanks.

Your understanding is incorrect. The difference between (...) and (?:...) is only that the former also creates a numbered match group which can be referred to with a backreference from within the regex, or as a captured match group from code following the match.
You could change the code to use lookbehinds, but the simple and straightforward fix is to put ([A-Za-z0-9]{2,}) inside regular parentheses, like I have done here, and retrieve the first matched group. (The # doesn't need any parentheses around it in this scenario, but the ones you have are harmless.)
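The capture-group fix described here could look roughly like the following. This is a sketch, not the answerer's exact code: the leading alternation is simplified to whitespace or start of input, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagCapture {
    // The non-capturing group still consumes the leading whitespace;
    // only the part inside plain parentheses is available as group 1.
    private static final Pattern TAG =
            Pattern.compile("(?:\\s|\\A)#([A-Za-z0-9]{2,})");

    // Returns the text after the first valid #, or null if there is none.
    static String firstTag(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstTag("#test"));     // test
        System.out.println(firstTag("test#test")); // null
    }
}
```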

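Try this: you can use a boundary assertion to express your condition. Note that \b before the # would require a word character in front of it (the opposite of what you want), so use \B there instead.
public static void main(String[] args) {
String s1 = "#test";
String s2 = "test#test";
// \B: no word character may precede the #
String pattern = "\\B#\\w{2,}\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s1);
m.find();
System.out.println(m.group());
}
Output:
#test
In the second case (s2) nothing matches, so m.group() throws `IllegalStateException`; check the result of m.find() first.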

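How about:
\W#[\S]{2}[\S]*
The strings matched by this regular expression need to be trimmed and have their first character removed. Note that \W must consume a character, so a # at the very start of the input is not matched.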

I think you need the following one instead:
(?<=(?<!\w)#)\w{2,}
Don't forget to escape the backslashes in Java, since the pattern is written as a string literal:
(?<=(?<!\\w)#)\\w{2,}
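To see how the escaped lookbehind pattern behaves, here is a small sketch that collects every match from an input. The class and method names and the third sample string are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagLookbehind {
    // The outer lookbehind requires a # directly before the match;
    // the inner negative lookbehind rejects a # that follows a word character.
    private static final Pattern TAG =
            Pattern.compile("(?<=(?<!\\w)#)\\w{2,}");

    static List<String> tags(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group()); // the # itself is not part of the match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(tags("#test"));                // [test]
        System.out.println(tags("test#test"));            // []
        System.out.println(tags("see #java and #regex")); // [java, regex]
    }
}
```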

java regular expression [A-Z]{6}-[A-Z]{4}-[A-Z]{4}

I'm trying to write a regular expression in Java for this:
"/[A-Z]{6}-[A-Z]{4}-[A-Z]{4}/"
But it is not working. For example
"AASAAA-AAAA-AAAA".matches("/[A-Z]{6}-[A-Z]{4}-[A-Z]{4}/")
returns false.
What is the correct way?

Java is not JavaScript: here you don't need to surround the regex with /, so try
"AASAAA-AAAA-AAAA".matches("[A-Z]{6}-[A-Z]{4}-[A-Z]{4}")
Otherwise the regex also requires a literal / at the start and end of the input.
Also note that matches checks whether the regex matches the entire String, so
"aaa".matches("aa")
is the same as
"aaa".matches("^aa$")
which returns false, since the String cannot be fully matched by the regex.
To find substrings that match the regex, you need to use find:
String input = "abcd";
Pattern regex = Pattern.compile("\\w{2}");
Matcher matcher = regex.matcher(input);
while (matcher.find()) { // each call to find() advances to the next match
System.out.println(matcher.group());
}
Output:
ab
cd
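Applied to the pattern from the question, the difference between matches and find might be sketched like this. The class name, method names, and the longer sample input are assumptions for illustration.

```java
import java.util.regex.Pattern;

class SerialCheck {
    private static final Pattern SERIAL =
            Pattern.compile("[A-Z]{6}-[A-Z]{4}-[A-Z]{4}");

    // matches(): the whole input must fit the pattern.
    static boolean isSerial(String input) {
        return SERIAL.matcher(input).matches();
    }

    // find(): the pattern may occur anywhere inside the input.
    static boolean containsSerial(String input) {
        return SERIAL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isSerial("AASAAA-AAAA-AAAA"));           // true
        System.out.println(isSerial("id=AASAAA-AAAA-AAAA;"));       // false
        System.out.println(containsSerial("id=AASAAA-AAAA-AAAA;")); // true
    }
}
```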

^[A-Z]{6}-[A-Z]{4}-[A-Z]{4}$
You just shouldn't put slashes at the start and the end; use ^ and $ instead.
Also, I hadn't noticed at first that you used JavaScript's syntax. Java != JavaScript.

Finding duplicate words within a string regex C/W

I'm currently dabbling in regex in Java and want to find duplicate words in strings, for example in a string such as 'This this is great.'. I was using \\b(\\w+) \\1\\b, but that only recognizes two duplicate words such as 'this this' in a string.
Any help regarding this?

Add the "ignore case" switch (?i) to your regex:
(?i)\\b(\\w+) \\1\\b
Alternatively, you could fold the input to lower case first:
input.toLowerCase()
Note: If you're using String.matches(), the regex must match the entire input, so you'd add .* to both ends of your regex:
.*(?i)\\b(\\w+) \\1\\b.*

String pattern = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
You can use Matcher.group() and Matcher.group(1) to replace all duplicate words with this approach.
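One way to read this suggestion: every match covers a word plus any immediate repeats of it, and group 1 holds the first occurrence, so replacing each match with $1 collapses the repeats. A minimal sketch under that reading, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

class DuplicateWords {
    // A word, followed by zero or more repetitions of the same word
    // separated by non-word characters; case is ignored.
    private static final Pattern REPEATED = Pattern.compile(
            "\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.CASE_INSENSITIVE);

    // Keeps the first occurrence of each run of repeated words.
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("This this is great."));     // This is great.
        System.out.println(collapse("Hello hello HELLO world")); // Hello world
    }
}
```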
