Regex match for anything EXCEPT pattern - java

I am coding in Java here.
I know that the regex for matching any number or string of letters is
"(0|[1-9][0-9]*)(\\.[0-9]+)?|[a-zA-Z]+"
But I would like to match anything except letters or numbers, i.e. symbols like !, #, +, -
I tried doing [^.. ] but it doesn't work.
For example, let's say I want to do the opposite, i.e. return all parts of the string that contain numbers, strings of letters, or #. I would write:
public ArrayList<String> findMatch(String string) {
    ArrayList<String> outputArr = new ArrayList<String>();
    Pattern p = Pattern.compile("(0|[1-9][0-9]*)(\\.[0-9]+)?|[a-zA-Z]+|\\#");
    // recognizes numbers, strings of letters, and #
    Matcher m = p.matcher(string);
    while (m.find()) {
        outputArr.add(m.group());
    }
    return outputArr;
}
Let's say I want to find the opposite of the code above. How can I change the regex passed to Pattern.compile?

You'll probably want to use just this:
\W+
That will match a run of one or more characters that aren't "word characters", defined as:
[a-zA-Z0-9_]
or "all letters, numbers, and underscore". If you want underscores to be matched as well, try the following:
[\W_]+
Or, if you'd rather have it explicit:
[^A-Za-z0-9]+
Which means "everything but letters and numbers".
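To see how these three classes differ, here is a minimal sketch that runs each of them with Matcher.find() over one sample string containing an underscore. The class name NonWordDemo and the sample input are made up for illustration. Note that in a Java string literal the backslash has to be doubled, so \W+ is written "\\W+".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonWordDemo {
    // Collects every substring matched by regex, scanning left to right with find()
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a_b!!c#d";
        // \W excludes the underscore, because _ counts as a word character
        System.out.println(findAll("\\W+", input));          // [!!, #]
        // Adding _ to the class makes underscores match too
        System.out.println(findAll("[\\W_]+", input));       // [_, !!, #]
        // Explicit ASCII version: anything that is not a letter or digit
        System.out.println(findAll("[^A-Za-z0-9]+", input)); // [_, !!, #]
    }
}
```

For plain ASCII input, [\W_]+ and [^A-Za-z0-9]+ behave the same; the explicit form just makes the intent easier to read.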
Hope this helps.

The simplest regex pattern that you can use is: [^\w]+
This will match all the special characters, i.e. anything that is not a letter, a digit, or an underscore (\w includes the underscore, so _ is not matched). You can check your regex for correctness in an online regex tester. Hope this helps.
From the example you have provided, what I understand is that you want all the characters except letters, numbers and '#'.
In regex, \w matches any letter, digit, or underscore. So you need to negate it to get the other symbol characters, like $ or #.
The following expression should solve your issue: [^\w#]+
The ^ at the start of a character class negates it, so [^\w] means "match anything except letters, digits, or underscore". I have also added # to the class, since you want to exclude it as well.
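Applied to the question's findMatch method, a sketch of the inverse might look like the following. The method name findNonMatches and the sample input are hypothetical. Note that this negates individual characters, not the question's pattern as a whole, so the decimal point inside 3.5 is reported as a symbol even though the question's number pattern would treat 3.5 as a single number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InverseMatch {
    // Hypothetical counterpart of findMatch: returns the runs of characters
    // that are not letters, digits, underscores or '#'
    public static List<String> findNonMatches(String string) {
        List<String> outputArr = new ArrayList<>();
        Pattern p = Pattern.compile("[^\\w#]+");
        Matcher m = p.matcher(string);
        while (m.find()) {
            outputArr.add(m.group());
        }
        return outputArr;
    }

    public static void main(String[] args) {
        System.out.println(findNonMatches("12+ab#-3.5!")); // [+, -, ., !]
    }
}
```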
Hope this will answer your question.

If you can give some more detail: what is your requirement, and what do you expect?
It will help me figure out the solution.
From your question, it looks like you want to match special characters only. Am I right?
If so, you can just try:
[^A-Za-z0-9][your quantifier here]
The quantifier can be:
? for 0 or 1 occurrences
+ for 1 or more occurrences
* for 0 or more occurrences
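A quick way to see what the quantifier changes is to count the matches find() reports with and without +. This sketch uses an arbitrary sample string and a made-up class name; * and ? are left out because they also allow zero-length matches, which find() reports as empty strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Counts how many times find() succeeds for regex on input
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a!!b#c";
        // Without a quantifier every symbol is its own match: "!", "!", "#"
        System.out.println(countMatches("[^A-Za-z0-9]", input));  // 3
        // With + adjacent symbols are grouped into one match: "!!", "#"
        System.out.println(countMatches("[^A-Za-z0-9]+", input)); // 2
    }
}
```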
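Suppose you have a String like the one below, and you want to find the characters other than letters, digits and # (I hope that's your requirement):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindSymbols {
    public static void main(String[] args) {
        String s = "shyuit6785%^7kui!#*&123f#$annds";
        // [^A-Za-z0-9#]+ : one or more characters that are not a letter, a digit or #
        Pattern p = Pattern.compile("[^A-Za-z0-9#]+");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println("Found a required character " + m.group() + " at index number " + m.start());
        }
    }
}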

Related

How to write a regex to match this String in Java?

I want to make a regular expression to the following string :
String s = "fdrt45B45"; // s is a password (just an example) with only
// letters and digits(must have letters and digits)
I tried with this pattern:
String pattern= "^(\\w{1,}|\\d{1,})$"; //but it doesn't work
I want no match if my password doesn't contain both letters and digits, and a match if it contains both.
I know that \w is a word character, [a-zA-Z_0-9], and \d is a digit, [0-9], but I can't figure out how to combine \w and \d to get what I want. Any help or tips for a newbie would be much appreciated.
A positive lookahead would do the trick:
String s = "fdrt45B45";
System.out.println(s.matches("(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z0-9]+"));
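To check the lookahead version against a few inputs, here is a small sketch; the class name and the sample passwords are made up. Because [a-zA-Z0-9]+ has to consume the whole string under String.matches(), any other character, including the underscore, makes the check fail.

```java
public class PasswordCheck {
    // Each (?=...) lookahead only tests a condition without consuming input;
    // [a-zA-Z0-9]+ then has to consume the whole string
    static boolean isValid(String s) {
        return s.matches("(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValid("fdrt45B45")); // true: letters and digits
        System.out.println(isValid("abcdef"));    // false: no digit
        System.out.println(isValid("12345"));     // false: no letter
        System.out.println(isValid("abc_123"));   // false: underscore not allowed
    }
}
```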
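You should be able to use the following regex to achieve what you want. It uses positive lookaheads and will match any string containing at least one letter and at least one digit:
^(?=.*\\d)(?=.*[a-zA-Z]).*
Use [a-zA-Z] rather than \w in the letter lookahead: \w also matches digits, so (?=.*\\w) would accept a digits-only string such as 12345. Note that the trailing .* accepts any other characters as well.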

Cannot match my regular expression

I am trying to match a string that looks like "WIFLYMODULE-xxxx" where the x can be any digit. For example, I want to be able to find the following...
WIFLYMODULE-3253
WIFLYMODULE-1585
WIFLYMODULE-1632
I am currently using
final Pattern q = Pattern.compile("[WIFLYMODULE]-[0-9]{3}");
but I am not picking up the string that I want. So my question is, why is my regular expression not working? Am I going about it the wrong way?
You should use (...) instead of [...]. [...] is used for a character class.
With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters.
(WIFLYMODULE)-[0-9]{4}
Note: in this case the parentheses aren't needed at all. (...) creates a capturing group, which is only useful if you want to access it with Matcher.group(index).
Important note: use \b as a word boundary so that only the whole word is matched.
\\bWIFLYMODULE-[0-9]{4}\\b
Sample code:
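import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "WIFLYMODULE-3253 WIFLYMODULE-1585 WIFLYMODULE-1632";
        Pattern p = Pattern.compile("\\bWIFLYMODULE-[0-9]{4}\\b");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}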
output:
WIFLYMODULE-3253
WIFLYMODULE-1585
WIFLYMODULE-1632
The regex should be:
"WIFLYMODULE-[0-9]{4}"
The square brackets mean: one of the characters listed inside. Also, you were matching three digits instead of four. So you were matching strings like these (where xxx is a three-digit number):
W-xxx, I-xxx, F-xxx, L-xxx, Y-xxx, M-xxx, O-xxx, D-xxx, U-xxx, L-xxx, E-xxx
You had it match on 3 digits instead of 4. And putting WIFLYMODULE inside [] makes it match on only one of those characters.
final Pattern q = Pattern.compile("WIFLYMODULE-[0-9]{4}");
[...] means that one character out of the ones in the bracket must match and not the string within it.
You, however, want to match the literal text WIFLYMODULE, so you have to use Pattern.compile("WIFLYMODULE-[0-9]{3}"); or Pattern.compile("(WIFLYMODULE)-[0-9]{3}");
{n} means that the character (or group) must match n-times. In your example you need 4 instead of 3: Pattern.compile("WIFLYMODULE-[0-9]{4}");
This way will work:
final Pattern q = Pattern.compile("WIFLYMODULE-[0-9]{4}");
The pattern breaks down to:
WIFLYMODULE- The literal string WIFLYMODULE-
[0-9]{4} Exactly four digits
What you had was:
[WIFLYMODULE] Any one of the characters in WIFLYMODULE
- The literal string -
[0-9]{3} Exactly three digits
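Running both patterns over the same input makes the difference visible. This is a small sketch with a made-up class name and helper (firstMatch); with find(), the bracketed pattern still succeeds, but only on the last letter of the word plus three digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WiflyDemo {
    // Returns the first substring matched by regex, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "WIFLYMODULE-3253";
        // Character class: one of W,I,F,L,Y,M,O,D,U,E, then '-', then three digits
        System.out.println(firstMatch("[WIFLYMODULE]-[0-9]{3}", input)); // E-325
        // Literal text, then exactly four digits
        System.out.println(firstMatch("WIFLYMODULE-[0-9]{4}", input));   // WIFLYMODULE-3253
    }
}
```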

RegEx to find the word between last Upper Case word and another word

My problem is to find a word between two words. One of these two words is an all-UPPERCASE word, which can be anything, and the other word is "is". I tried a few regexes, but none of them work. Here is my example:
String :
In THE house BIG BLACK cat is very good.
Expected output :
cat
RegEx used :
(?<=[A-Z]*\s)(.*?)(?=\sis)
The above RegEx gives me BIG BLACK cat as output whereas I just need cat.
One solution is to simplify your regular expression a bit,
[A-Z]+\s(\w+)\sis
and use only the captured group (i.e., \1, or matcher.group(1) in Java).
Since you came up with something more complex, I assume you understand all the parts of the above expression but for someone who might come along later, here are more details:
[A-Z]+ will match one or more upper-case characters
\s will match a space
(\w+) will match one or more word characters ([a-zA-Z0-9_]) and store the match in the first capturing group
\s will match a space
is will match "is"
My example is very specific and may break down for different input. Your question didn't provide many details about what other inputs you expect, so I'm not confident my solution will work in all cases.
Try this one:
String testInput = "In THE house BIG BLACK cat is very good.";
Pattern p = Pattern.compile(
        "(?<=\\b\\p{Lu}+\\s) # lookbehind assertion: an uppercase word must come before\n"
      + "\\p{L}+              # matching at least one letter\n"
      + "(?=\\sis)            # lookahead assertion: whitespace followed by is must come next\n",
        Pattern.COMMENTS);
Matcher m = p.matcher(testInput);
if (m.find()) {
    System.out.println(m.group(0));
}
It matches only "cat".
\p{L} is a Unicode property for a letter in any language.
\p{Lu} is a Unicode property for an uppercase letter in any language.
You want to check a condition that depends on several parts of the information and then retrieve only one specific part of it. The simplest way to do that is with a capturing group. In Java you can do it like this:
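import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // capturing group 1, (\\w+), holds the word between the uppercase word and "is"
        Pattern pattern = Pattern.compile("[A-Z]+\\s(\\w+)\\sis");
        Matcher matcher = pattern.matcher("In THE house BIG BLACK cat is very good.");
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}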
group(1) is the part inside the parentheses, in this case (\w+), and that's your word. group() returns a String, so you can use it right away.
The following part has a strange behavior:
(?<=[A-Z]*\s)(.*?)
Because * also allows zero repetitions, [A-Z]* can match an empty string, so the lookbehind is satisfied after any whitespace, and (.*?) can then pick up several words, such as BIG BLACK, before cat. With a few tweaks, I think the following will work (but it still matches some false positives):
(?<=[A-Z]+\s)(\w+)(?=\sis)
A slightly better regex would be:
(?<=\b[A-Z]+\s)(\w+)(?=\sis)
Hope it helps
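If you prefer the lookbehind approach in Java, a sketch like the following gives the uppercase run an explicit upper bound. Java requires a lookbehind to have an obvious maximum length, so a bounded quantifier keeps that requirement explicit; the limit of 20 letters and the names WordBeforeIs and wordBeforeIs are arbitrary assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBeforeIs {
    // Lookbehind: a whole uppercase word of 1 to 20 letters (assumed limit) plus whitespace.
    // Lookahead: whitespace followed by "is".
    private static final Pattern P = Pattern.compile("(?<=\\b[A-Z]{1,20}\\s)\\w+(?=\\sis)");

    // Returns the first word found between an uppercase word and "is", or null
    static String wordBeforeIs(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wordBeforeIs("In THE house BIG BLACK cat is very good.")); // cat
    }
}
```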
String m = "In THE house BIG BLACK cat is very good.";
Pattern p = Pattern.compile("[A-Z]+\\s\\w+\\sis");
Matcher m1 = p.matcher(m);
if(m1.find()){
String group []= m1.group().split("\\s");// split by space
System.out.println(group[1]);// print the 2 position
}

Java Regex for username

I'm looking for a regex in Java, java.util.regex, to accept only letters, the characters ’, - and ., and a range of Unicode characters such as umlauts, eszett, diacritics and other valid letters from European languages.
What I don't want is numbers, spaces like “ ” or “ Tom”, or special characters like !”£$% etc.
So far I'm finding it very confusing.
I started with this
[A-Za-z.\\s\\-\\.\\W]+$
And ended up with this:
[A-Za-z.\\s\\-\\.\\D[^!\"£$%\\^&*:;##~,/?]]+$
I used the caret to negate the inner square brackets, according to the documentation.
Anyone have any suggestions for a new regex or reasons why the above isn't working?
For my answer, I'll use a simpler regex similar to yours: [A-Z[^!]]+, which means "at least once: (a character from A to Z) or (a character that is not '!')".
Note that "not '!'" already includes A to Z, so everything else in the outer character class ([A-Z...) is pointless.
Try [\p{Alpha}'.-]+ and compile the regex with the Pattern.UNICODE_CHARACTER_CLASS flag. Put the - last: written as '-. it would form a range from ' to ., which also includes ( ) * + and the comma.
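A minimal sketch of that suggestion, assuming matches() is used so the whole name must consist of allowed characters. It also adds the typographic apostrophe ’ (U+2019) that the question lists; that addition, the class name and the sample names are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class UsernameCheck {
    // \p{Alpha} covers Unicode letters only when UNICODE_CHARACTER_CLASS is set.
    // The '-' is placed last so it is a literal hyphen, not a range.
    // The typographic apostrophe (U+2019) is an assumed addition from the question.
    private static final Pattern NAME =
            Pattern.compile("[\\p{Alpha}'\u2019.-]+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("M\u00fcller")); // true: umlaut
        System.out.println(isValid("Stra\u00dfe")); // true: eszett
        System.out.println(isValid("Jean-Luc"));    // true
        System.out.println(isValid("O'Brien"));     // true
        System.out.println(isValid(" Tom"));        // false: leading space
        System.out.println(isValid("Tom1"));        // false: digit
        System.out.println(isValid("A*B"));         // false: '*' is not allowed
    }
}
```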
Use (?=.*[##$%&\s]): it returns true when the username contains at least one special character from the set, or a space.
You can add more special characters as required. For example:
String str = "k$shor";
String regex = "(?=.*[##$%&\\s])";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find()); // prints true

How to match repeated patterns?

I would like to match:
some.name.separated.by.dots
But I don't have any idea how.
I can match a single part like this
\w+\.
How can I say "repeat that"?
Try the following:
\w+(?:\.\w+)+
The + after (?: ... ) tells it to match what is inside the parentheses one or more times.
Note that by default \w in Java only matches ASCII word characters ([a-zA-Z_0-9]), so a word like café wouldn't be matched in full by \w+ unless you compile the pattern with Pattern.UNICODE_CHARACTER_CLASS.
EDIT
The difference between [...] and (?:...) is that [...] always matches a single character. It is called a "character set" or "character class". So, [abc] does not match the string "abc", but matches one of the characters a, b or c.
\w+[\.\w+]* also matches your string because [\.\w+] matches a ., a character from \w, or a literal + (inside a character class, + is not a quantifier), and that class is then repeated zero or more times by the * after it. But \w+[\.\w+]* will therefore also match strings like aaaaa or aaa............
The (?:...) is, as I already mentioned, simply used to group characters (and possibly repeat those groups).
More info on character sets: http://www.regular-expressions.info/charclass.html
More info on groups: http://www.regular-expressions.info/brackets.html
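To make the character-class versus group difference concrete, here is a small sketch (class and method names are made up) that checks the question's string and the two problem strings with String.matches(), which requires the whole input to match.

```java
public class ClassVsGroup {
    // [\.\w+]* is a character class: any mix of '.', word characters and a literal '+'
    static boolean classVersion(String s) {
        return s.matches("\\w+[\\.\\w+]*");
    }

    // (?:\.\w+)+ is a non-capturing group: one or more ".word" parts
    static boolean groupVersion(String s) {
        return s.matches("\\w+(?:\\.\\w+)+");
    }

    public static void main(String[] args) {
        String[] inputs = {"some.name.separated.by.dots", "aaaaa", "aaa............"};
        for (String s : inputs) {
            System.out.println(s + " -> class: " + classVersion(s) + ", group: " + groupVersion(s));
        }
    }
}
```

The class version accepts all three inputs, while the group version accepts only some.name.separated.by.dots.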
EDIT II
Here's an example in Java (seeing you post mostly Java answers):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String text = "some.text.here only but not Some other " +
"there some.name.separated.by.dots and.we are done!";
Pattern p = Pattern.compile("\\w+(?:\\.\\w+)+");
Matcher m = p.matcher(text);
while(m.find()) {
System.out.println(m.group());
}
}
}
which will produce:
some.text.here
some.name.separated.by.dots
and.we
Note that m.group(0) and m.group() are equivalent: meaning "the entire match".
This will also work:
(\w+(\.|$))+
You can use ? to match 0 or 1 of the preceding part, * to match 0 or more, and + to match 1 or more.
So (\w\.)? will match w. or the empty string, (\w\.)* will match r.2.5.3.1.s.r.g.s. or the empty string, and (\w\.)+ will match any of the above except the empty string.
If you want to match something like your example, you could try (\w+\.)+, which means "one or more word characters followed by a period, repeated at least once":
(\w+\.)+
Note that every repetition has to end in a period, so on some.name.separated.by.dots it matches only some.name.separated.by. and leaves out the final dots.
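For a side-by-side check of the three patterns suggested in this thread, here is a sketch using the question's sample string; the class name and the firstMatch helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatDemo {
    // Returns the first substring matched by regex, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "some.name.separated.by.dots";
        System.out.println(firstMatch("\\w+(?:\\.\\w+)+", input)); // some.name.separated.by.dots
        System.out.println(firstMatch("(\\w+(\\.|$))+", input));   // some.name.separated.by.dots
        // Each repetition must end in '.', so the final part is left out
        System.out.println(firstMatch("(\\w+\\.)+", input));       // some.name.separated.by.
    }
}
```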
