Make Regex Match Whitespaces in Java - java

How can I make this regex match white spaces? Currently, it can only match the following:
abcdatcsdotuniversitydotedu
I would like it to match the following:
abcd at cs dot university dot edu
This is the Regex:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s[ ]?(at)[ ]*([A-Za-z][A-Za-z0-9\\-_(dot)]*[ ]?(dot)[ ]*[A-Za-z]+)

\s matches a white-space character and when this is used in a java string you need to escape the \ so it would be \\s. If you want to match zero-or-more white-space then use \\s*.
This will match a single domain and TLD:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*[A-Za-z]+)
However, you are trying to match multiple levels of sub-domains, so you need to wrap the domain part of the regular expression, [A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*, in ()+ to get one-or-more of them:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*(([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*)+[A-Za-z]+)
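(The additions are the extra opening parenthesis at the start of the domain part and the )+ after its final \\s*.)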
Something like this:
import java.util.regex.Pattern;

public class RegexpMatch {
    static Pattern Regex = Pattern.compile(
        "([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*(([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*)+[A-Za-z]+)"
    );

    public static void main( final String[] args ){
        final String[] tests = {
            "abcdatcsdotuniversitydotedu",
            "abcd at cs dot university dot edu"
        };
        for ( final String test : tests )
            System.out.println( test + " - " + ( Regex.matcher( test ).matches() ? "Match" : "No Match" ) );
    }
}
Which outputs:
abcdatcsdotuniversitydotedu - Match
abcd at cs dot university dot edu - Match
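The same pattern also captures the two halves of the address: group 1 holds the user name and group 3 holds the whole domain part. A minimal sketch of reading them back; the class and method names (ObfuscatedEmail, user, domain) are made up for illustration and are not part of the answer above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObfuscatedEmail {
    // Same pattern as in the answer above.
    private static final Pattern EMAIL = Pattern.compile(
        "([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*(([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*)+[A-Za-z]+)");

    // Returns capture group 1 (the user name), or null if the input does not match.
    static String user(String input) {
        Matcher m = EMAIL.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Returns capture group 3 (the whole domain part), or null if the input does not match.
    static String domain(String input) {
        Matcher m = EMAIL.matcher(input);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String test = "abcd at cs dot university dot edu";
        System.out.println(user(test) + " / " + domain(test));
    }
}
```

The domain group keeps the spaces and the word "dot" exactly as they appear in the input; turning it into a real address would need a separate replacement step.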

public static boolean isAlphaNumericWithWhiteSpace(String text) {
    return text != null && text.matches("^[\\p{L}\\p{N}ın\\s]*$");
}
\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.
I am using this code.
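A small runnable sketch of the same check, to show what \p{L} and \p{N} accept. It is a simplification under two assumptions: the literal ı and n are dropped from the character class because \p{L} already covers both, and the ^ and $ anchors are omitted because String.matches() always tests the whole input. The class name UnicodeCheck is hypothetical.

```java
class UnicodeCheck {
    // \p{L} = any Unicode letter, \p{N} = any Unicode numeric character, \s = whitespace.
    static boolean isAlphaNumericWithWhiteSpace(String text) {
        return text != null && text.matches("[\\p{L}\\p{N}\\s]*");
    }

    public static void main(String[] args) {
        // Accented Latin letters and ASCII digits.
        System.out.println(isAlphaNumericWithWhiteSpace("h\u00e9llo w\u00f6rld 42"));
        // Arabic-Indic digits are numeric characters too.
        System.out.println(isAlphaNumericWithWhiteSpace("\u0661\u0662\u0663"));
        // Punctuation is neither a letter, a number, nor whitespace.
        System.out.println(isAlphaNumericWithWhiteSpace("hello!"));
    }
}
```

Because the quantifier is *, the empty string is accepted as well; use + instead if at least one character is required.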

Related

regex for pattern in java

I need a regex (named myRegex in the code) which will match all of the functions (function1, function2, function3).
public static void main(String[] args) {
    String template = "f[0-9]"; // like f1, f2 etc.
    String myRegex = "fun\\((" + template + "*)\\)"; // TODO: what regex?
    Pattern myPattern = Pattern.compile(myRegex);
    String function1 = "fun(f1)";
    String function2 = "fun(f1,f9)"; // myRegex doesn't match
    String function3 = "fun(f1,f9,f4)"; // myRegex doesn't match
    List<String> functions = Arrays.asList(function1, function2, function3);
    for (String function : functions) {
        Matcher matcher = myPattern.matcher(function);
        while (matcher.find()) {
            System.out.println(function + " match!"); // works only for function1
        }
    }
}
Elements in brackets must be separated by commas (,).
It must also work for other functions with many arguments, such as:
function4 = "fun(f1,f9,f4,f5,f7)";
Use the pattern below.
String myRegex = "fun\\((" + template + ",)*" + template + "\\)";
If you also want to accept fun() without any parameters, use the pattern below (note that it also accepts a trailing comma, as in fun(f1,)):
String myRegex = "fun\\((" + template + ",)*(" + template + ")?\\)";
Use the following regex pattern:
fun\(f[0-9](?:,f[0-9]){0,2}\)
This will match any call to fun() having between 1 and 3 f arguments; replace {0,2} with * to allow any number of arguments, as function4 requires. In Java, the pattern should be defined as:
Pattern myPattern = Pattern.compile("fun\\(f[0-9](?:,f[0-9]){0,2}\\)");
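For an argument list of any length, one option is "first argument, then zero or more comma-plus-argument repetitions". A minimal sketch, assuming the question's template f[0-9]; the class name FunArgs and the helper isFunCall are invented for illustration:

```java
import java.util.regex.Pattern;

class FunArgs {
    static final String TEMPLATE = "f[0-9]"; // like f1, f2 etc.

    // fun( <arg> ( , <arg> )* )  with no spaces allowed and at least one argument
    static final Pattern FUN = Pattern.compile(
        "fun\\(" + TEMPLATE + "(?:," + TEMPLATE + ")*\\)");

    // matches() requires the whole string to be a fun(...) call.
    static boolean isFunCall(String s) {
        return FUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] functions = {
            "fun(f1)", "fun(f1,f9)", "fun(f1,f9,f4)", "fun(f1,f9,f4,f5,f7)", "fun(f1,)"
        };
        for (String function : functions) {
            System.out.println(function + " -> " + isFunCall(function));
        }
    }
}
```

Putting the comma inside the repeated group, in front of each additional argument, is what rules out both a trailing and a leading comma.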
Regex
Use the following regex. It will also work if there are spaces between arguments or parentheses:
fun\s*\(\s*(?:f[0-9])?(?:\s*,\s*f[0-9])*\s*\)
Demo
https://regex101.com/r/hndzov/1
Java string
Pattern myPattern = Pattern.compile("fun\\s*\\(\\s*(?:f[0-9])?(?:\\s*,\\s*f[0-9])*\\s*\\)");
Explanation
fun - Match exactly the characters fun
\s - Match any whitespace character
\s* - Match 0 or more whitespace characters
\(, \) - Match opening and closing parentheses
f - Match the character f
[0-9] - Match any digit from 0 to 9
(?:) - Make a group but don't capture it
Cons
Also matches a parameter list with a leading comma like fun(,f2, f3).
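One way to avoid the leading-comma problem while still tolerating spaces is to make the whole argument list optional, rather than only its first argument. This is a sketch of that variation, not the pattern from the answer; the class name FunArgsLenient and the helper isFunCall are hypothetical:

```java
import java.util.regex.Pattern;

class FunArgsLenient {
    // The optional group wraps "first argument plus any ', argument' repetitions",
    // so a comma can only appear after an argument.
    static final Pattern FUN = Pattern.compile(
        "fun\\s*\\(\\s*(?:f[0-9](?:\\s*,\\s*f[0-9])*)?\\s*\\)");

    static boolean isFunCall(String s) {
        return FUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] calls = { "fun()", "fun( f1 , f2 )", "fun (f1,f9,f4)", "fun(,f2, f3)" };
        for (String call : calls) {
            System.out.println(call + " -> " + isFunCall(call));
        }
    }
}
```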

Java regex 2 ignore pattern between 2 words and should say match in if condition

I need to check that a string contains from followed later by IN_TXT; anything between the two words should be ignored and the string should still count as matched. I tried the expression below, but it is not working.
String table="IN_TXT";
String s="select * from JAN_X.IN_TXT";
if((s.matches("from"+"(.*)"+table))){
    System.out.println("Matched");
}
What might be missing here?
matches() requires the whole input to match, as if the regex were wrapped in ^ and $ anchors, so your regex does not match the complete input.
You can prepend .*? to get ".*?from" + "(.*)" + table, where .*? covers the text that occurs before from.
.*? matches as few characters as possible.
String s = "select * from JAN_X.IN_TXT";
String table = "IN_TXT";
if ((s.matches(".*?from" + "(.*)" + table))) {
    System.out.println("Matched");
}
If you want to extract JAN_X, you can use:
// $1 represents (.*) capture group
String s2= s.replaceAll(".*?from (.*)\\."+table,"$1");
System.out.println(s2);
Output:
JAN_X
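An alternative to padding the regex with .*? is Matcher.find(), which looks for the pattern anywhere in the input instead of requiring a full match. A minimal sketch; the class name FromTable and the helper mentionsTable are invented for illustration, and Pattern.quote is used on the assumption that the table name should be treated as literal text:

```java
import java.util.regex.Pattern;

class FromTable {
    // True if "from" is followed, anywhere later in the input, by the table name.
    static boolean mentionsTable(String sql, String table) {
        Pattern p = Pattern.compile("from.*" + Pattern.quote(table));
        return p.matcher(sql).find(); // find() does not require a full-string match
    }

    public static void main(String[] args) {
        String table = "IN_TXT";
        String s = "select * from JAN_X.IN_TXT";
        // matches() fails because the input does not start with "from".
        System.out.println("matches: " + s.matches("from(.*)" + table));
        System.out.println("find:    " + mentionsTable(s, table));
    }
}
```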

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
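A short sketch of how the two groups can be read back with this pattern. The class name SearchCriteria and the helpers key and value are hypothetical; the sample strings are the ones from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchCriteria {
    private static final Pattern CRITERION = Pattern.compile("(\\w+)[:<>]([^,]+)");

    // Group 1: the word characters before the operator, or null if nothing matches.
    static String key(String search) {
        Matcher m = CRITERION.matcher(search);
        return m.find() ? m.group(1) : null;
    }

    // Group 2: everything after the operator up to the next comma, or null.
    static String value(String search) {
        Matcher m = CRITERION.matcher(search);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] searches = {
            "firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"
        };
        for (String search : searches) {
            System.out.println(key(search) + " = " + value(search));
        }
    }
}
```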
You can use regex like this:
public static void main(String[] args) {
    String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
    String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
    for(String str : arr){
        if(str.matches(pattern))
            System.out.println(str);
    }
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But keep in mind that this regex only works for data in your format. To build a more general regex, consult the RFC documents and articles on the email address format.
Hope it helps.
The character class \w matches [A-Za-z0-9_]. So change the regex to (\\w+?)(:|<|>)(.*), to match any characters between the operator and the trailing comma.
Or list all the characters that you expect, i.e. (\\w+?)(:|<|>)([#.\\w\\/]*), .

RegEx: Matching n-char long sequence of repeating character

I want to split a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning I have n-times the same special character, but I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
This one would return for the first example
( (only 1 time) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.
I suggest the following solution with two regexes: use (?s)(\\W)\\1+\\w.* to check whether the string starts with the same non-word character repeated, and if it does, split with a simple (?<=\\W)(?=\\w) pattern (the position between a non-word and a word character); otherwise, just return a list containing the whole string (as if not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
    if (str.matches("(?s)(\\W)\\1+\\w.*")) {
        System.out.println(Arrays.toString(str.split(ptrn)));
    } else {
        System.out.println(Arrays.asList(str));
    }
}
See IDEONE demo
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Pattern p = Pattern.compile(ptrn);
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
See another IDEONE demo
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Capture group 1 matching and capturing into Group 2 a non-word character followed by the same character (since a backreference \2 is used) 1 or more times.
(\\w.*) - matches and captures into Group 3 a word character and then zero or more characters.
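The reason the question's regex returns only one character is that its Group 1 wraps just the first character, while the repetition \\1+ sits outside the group. A small sketch comparing it with a wrapped version of the same regex; the class name RepeatedPrefix and the helper prefix are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedPrefix {
    // The question's regex: Group 1 is a single character, the repeats are not captured.
    static final String ORIGINAL = "([^a-zA-Z0-9])\\1+([a-zA-Z].*)";
    // Same regex with an outer group around "character plus its repeats".
    static final String WRAPPED = "(([^a-zA-Z0-9])\\2+)([a-zA-Z].*)";

    // Returns Group 1 of the given regex, or null if the input does not match.
    static String prefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefix(ORIGINAL, "(((Hello!"));
        System.out.println(prefix(WRAPPED, "(((Hello!"));
        System.out.println(prefix(WRAPPED, "########No?"));
    }
}
```

Adding the outer group shifts the numbering, so the backreference becomes \\2 and the remaining text moves to Group 3.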

How can I change the invalid characters to valid chars in Java?

private static void isValidName(String[] filename){
    FileSystem fs = FileSystems.getDefault();
    System.out.println(fs);
    String pattern = ("^[\\w&[^?\\\\/. ]]+?\\.*[\\w&[^?\\\\/. ]]+$");
    for (String s: filename) {
        //System.out.println(s.matches(pattern));
        if (s.matches(pattern)==false){
            System.out.println(s.matches(pattern));
        }
    }
}
Now I call this function:
String[] name2={"valami.txt."};
isValidName(name2);
How can I replace the invalid characters in if(s.matches(pattern)==false) with valid characters?
Output:
false
You may use this piece of code to remove/replace invalid characters:
String[] bad = {
    "foo.tar.gz",
    " foo.txt",
    "foo?",
    "foo/",
    "foo\\",
    ".foo",
    "foo."
};
String remove_pattern = "^[ .]+|\\.+$|\\.(?=[^.]*\\.[^.]*$)|[?\\\\/:;]";
for (String s: bad) {
    System.out.println(s.replaceAll(remove_pattern, "_"));
}
See IDEONE demo
Output:
foo_tar.gz
_foo.txt
foo_
foo_
foo_
_foo
foo_
The regex contains several alternatives joined with the | alternation operator, so that only the invalid character(s) are matched:
^[ .]+ - Matches 1 or more leading spaces or dots
\\.+$ - Matches final ., 1 or more occurrences (change to [. ]+$ if you plan to also replace trailing spaces)
\\.(?=[^.]*\\.[^.]*$) - Matches a . that is followed by an optional number of characters and another dot (thus, leaving the last dot in the string)
[?\\\\/:;] - Matches ?, \, /, : and ; literally.
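If the replacement is needed in more than one place, the same pattern can be wrapped in a small helper with a precompiled Pattern. A minimal sketch; the class name FileNameSanitizer and the method sanitize are hypothetical, and "_" is kept as the replacement character from the answer above:

```java
import java.util.regex.Pattern;

class FileNameSanitizer {
    // Same alternatives as remove_pattern above, compiled once.
    private static final Pattern INVALID = Pattern.compile(
        "^[ .]+|\\.+$|\\.(?=[^.]*\\.[^.]*$)|[?\\\\/:;]");

    // Replaces every invalid character (or run of leading/trailing ones) with "_".
    static String sanitize(String name) {
        return INVALID.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        String[] names = { "foo.tar.gz", " foo.txt", "foo?", "report.txt" };
        for (String name : names) {
            System.out.println(name + " -> " + sanitize(name));
        }
    }
}
```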
