private static void isValidName(String[] filename){
FileSystem fs = FileSystems.getDefault();
System.out.println(fs);
String pattern = ("^[\\w&[^?\\\\/. ]]+?\\.*[\\w&[^?\\\\/. ]]+$");
for (String s: filename) {
//System.out.println(s.matches(pattern));
if (s.matches(pattern)==false){
System.out.println(s.matches(pattern));
}
}
}
Now I call this function:
String[] name2={"valami.txt."};
isValidName(name2);
How can I replace the invalid characters in if(s.matches(pattern)==false) with valid characters?
Output:
false
You may use this piece of code to remove/replace invalid characters:
String[] bad = {
"foo.tar.gz",
" foo.txt",
"foo?",
"foo/",
"foo\\",
".foo",
"foo."
};
String remove_pattern = "^[ .]+|\\.+$|\\.(?=[^.]*\\.[^.]*$)|[?\\\\/:;]";
for (String s: bad) {
System.out.println(s.replaceAll(remove_pattern, "_"));
}
Output:
foo_tar.gz
_foo.txt
foo_
foo_
foo_
_foo
foo_
The regex contains several alternatives joined with the | alternation operator, so that only the invalid character(s) are matched:
^[ .]+ - Matches 1 or more leading spaces or dots
\\.+$ - Matches final ., 1 or more occurrences (change to [. ]+$ if you plan to also replace trailing spaces)
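\\.(?=[^.]*\\.[^.]*$) - Matches a . that is followed by exactly one more dot before the end of the string, i.e. the second-to-last dot (so the last dot in the string is kept). If the name has three or more dots, only that one dot is replaced; \\.(?=.*\\.) would match every dot except the last.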
[?\\\\/:;] - Matches ?, \, /, : and ; literally.
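To tie this back to the question, the replacement can be wrapped in a small helper. This is only a minimal sketch: the class name FileNameSanitizer and both method names are hypothetical, the first pattern is the one from the answer above, and the second method uses the assumed variant \\.(?=.*\\.) for replacing every dot except the last. Note that for the question's input valami.txt. both dots are replaced, because the trailing dot counts as the last dot.

```java
final class FileNameSanitizer {
    // Pattern from the answer above: leading spaces/dots, trailing dots,
    // the second-to-last dot, and the characters ? \ / : ;
    private static final String REMOVE_PATTERN =
            "^[ .]+|\\.+$|\\.(?=[^.]*\\.[^.]*$)|[?\\\\/:;]";

    // Replaces each invalid run or character with an underscore.
    static String sanitize(String name) {
        return name.replaceAll(REMOVE_PATTERN, "_");
    }

    // Assumed variant: replaces every dot that still has another dot after it.
    static String collapseInnerDots(String name) {
        return name.replaceAll("\\.(?=.*\\.)", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("valami.txt."));      // valami_txt_
        System.out.println(sanitize("a.b.c.d"));          // a.b_c.d
        System.out.println(collapseInnerDots("a.b.c.d")); // a_b_c.d
    }
}
```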
I'm trying to replace all characters between two delimiters with another character using regex. The replacement should have the same length as the removed string.
String string1 = "any prefix [tag=foo]bar[/tag] any suffix";
String string2 = "any prefix [tag=foo]longerbar[/tag] any suffix";
String output1 = string1.replaceAll(???, "*");
String output2 = string2.replaceAll(???, "*");
The expected outputs would be:
output1: "any prefix [tag=foo]***[/tag] any suffix"
output2: "any prefix [tag=foo]*********[/tag] any suffix"
I've tried "\\[tag=.*?](.*?)\\[/tag]" but this replaces the whole sequence with a single "*".
I think that "(.*?)" is the problem here because it captures everything at once.
How would I write something that replaces every character separately?
You can use the regex
\w(?=\w*?\[)
which matches every word character that is followed, after any further word characters, by a "[".
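As a rough illustration, the lookahead can be passed straight to replaceAll. The class name LookaheadMask is hypothetical. One caveat of this sketch: it masks any run of word characters that sits directly before a "[", so a word glued to the opening tag (no space in between) would be masked too.

```java
final class LookaheadMask {
    // A word character followed by optional word characters and then "[".
    static String mask(String s) {
        return s.replaceAll("\\w(?=\\w*?\\[)", "*");
    }

    public static void main(String[] args) {
        // any prefix [tag=foo]***[/tag] any suffix
        System.out.println(mask("any prefix [tag=foo]bar[/tag] any suffix"));
        // any prefix [tag=foo]*********[/tag] any suffix
        System.out.println(mask("any prefix [tag=foo]longerbar[/tag] any suffix"));
    }
}
```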
You can match the characters inside the tags one at a time and replace each of them with *:
public static String replaceByStar(String str) {
String pattern = "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)";
while (str.matches(pattern)) {
str = str.replaceAll(pattern, "$1*$2");
}
return str;
}
Used like this, it prints results in your expected form:
public static void main(String[] args) {
System.out.println(replaceByStar("any prefix [tag=foo]bar[/tag] any suffix"));
System.out.println(replaceByStar("any prefix [tag=foo]loooongerbar[/tag] any suffix"));
}
So, the pattern "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)":
(.*\\[tag=.*\\].*) captures the beginning, up to and including the opening tag, possibly followed by some characters inside the tag pair
\\w is the character you want to replace
(.*\\[\\/tag\\].*) captures the end, possibly preceded by some characters inside the tag pair
The substitution $1*$2:
The pattern has the form (text$1)oneChar(text$2), and each pass replaces it with (text$1)*(text$2). The loop repeats until no word character is left between the tags.
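A single-pass alternative is to find each tagged span with a Matcher and build a replacement of the same length. The sketch below is an assumption-laden variant, not the method from the answer above: the class name TagMasker is hypothetical, and unlike the loop it masks every character between the tags, not only \w characters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagMasker {
    // Group 1: opening tag, group 2: content (lazy), group 3: closing tag.
    private static final Pattern TAGGED =
            Pattern.compile("(\\[tag=[^\\]]*\\])(.*?)(\\[/tag\\])", Pattern.DOTALL);

    static String mask(String input) {
        Matcher m = TAGGED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Build a run of '*' exactly as long as the captured content.
            char[] stars = new char[m.group(2).length()];
            Arrays.fill(stars, '*');
            String replacement = m.group(1) + new String(stars) + m.group(3);
            // quoteReplacement keeps any '$' or '\' in the tags literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // any prefix [tag=foo]***[/tag] any suffix
        System.out.println(mask("any prefix [tag=foo]bar[/tag] any suffix"));
    }
}
```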
How can I make this regex match white spaces? Currently, it can only match the following:
abcdatcsdotuniversitydotedu
I would like it to match the following:
abcd at cs dot university dot edu
This is the Regex:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s[ ]?(at)[ ]*([A-Za-z][A-Za-z0-9\\-_(dot)]*[ ]?(dot)[ ]*[A-Za-z]+)
\s matches a whitespace character. In a Java string literal the backslash must itself be escaped, so it is written \\s. To match zero or more whitespace characters, use \\s*.
This will match a single domain and TLD:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*[A-Za-z]+)
However, you are trying to match multiple levels of subdomains, so you need to wrap the domain part of the regular expression, [A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*, in ()+ to get one or more of them:
([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*(([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*)+[A-Za-z]+)
                                        ^                                       ^^
Something like this:
public class RegexpMatch {
static Pattern Regex = Pattern.compile(
"([A-Za-z][A-Za-z0-9.\\-_]*)\\s*(at)\\s*(([A-Za-z][A-Za-z0-9\\-_()]*\\s*(dot)\\s*)+[A-Za-z]+)"
);
public static void main( final String[] args ){
final String[] tests = {
"abcdatcsdotuniversitydotedu",
"abcd at cs dot university dot edu"
};
for ( final String test : tests )
System.out.println( test + " - " + ( Regex.matcher( test ).matches() ? "Match" : "No Match" ) );
}
}
Which outputs:
abcdatcsdotuniversitydotedu - Match
abcd at cs dot university dot edu - Match
public static boolean isAlphaNumericWithWhiteSpace(String text) {
return text != null && text.matches("^[\\p{L}\\p{N}ın\\s]*$");
}
\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.
I am using this code.
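For illustration, here is a trimmed-down sketch of the same check. It assumes the extra literal characters in the class above are redundant (they are letters, which \p{L} already covers) and drops ^ and $, since String.matches always tests the whole string. The class name AlphaNumericCheck is hypothetical.

```java
final class AlphaNumericCheck {
    // True for null-safe input made up only of letters, digits and whitespace.
    static boolean isAlphaNumericWithWhiteSpace(String text) {
        return text != null && text.matches("[\\p{L}\\p{N}\\s]*");
    }

    public static void main(String[] args) {
        System.out.println(isAlphaNumericWithWhiteSpace("abcd at cs dot university dot edu")); // true
        System.out.println(isAlphaNumericWithWhiteSpace("abc!"));                              // false
        System.out.println(isAlphaNumericWithWhiteSpace(null));                                // false
    }
}
```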
I want to split a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning I have n-times the same special character, but I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
For the first example, this returns
( (only once) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.
I suggest the following solution with two regexes: use (?s)(\\W)\\1+\\w.* to check whether the string starts with a repeated non-word symbol and, if so, split with the simple pattern (?<=\\W)(?=\\w) (the position between a non-word and a word character). Otherwise, just return a list containing the whole string, unsplit:
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
if (str.matches("(?s)(\\W)\\1+\\w.*")) {
System.out.println(Arrays.toString(str.split(ptrn)));
} else {
System.out.println(Arrays.asList(str));
}
}
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Pattern p = Pattern.compile(ptrn);
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Group 1 captures the whole run: a non-word character (captured into Group 2) followed by the same character 1 or more times (via the backreference \2).
(\\w.*) - matches and captures into Group 3 a word character followed by zero or more characters.
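Packaged as a reusable method with its imports, the single-regex approach could look like the sketch below. The class name LeadingRunSplitter and the method name split are hypothetical; the pattern is the one explained above, compiled once instead of on every iteration. Because \\2+ needs at least one repetition, a string with a single leading symbol such as (Hello! is returned unsplit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingRunSplitter {
    private static final Pattern LEADING_RUN =
            Pattern.compile("(?s)((\\W)\\2+)(\\w.*)");

    // Returns [run, rest] when the string starts with a repeated non-word
    // character followed by a word character, else a one-element list.
    static List<String> split(String s) {
        Matcher m = LEADING_RUN.matcher(s);
        if (m.matches()) {
            return Arrays.asList(m.group(1), m.group(3));
        }
        return Arrays.asList(s);
    }

    public static void main(String[] args) {
        System.out.println(split("(((Hello!"));   // [(((, Hello!]
        System.out.println(split("########No?")); // [########, No?]
        System.out.println(split("$%^&^Hello!")); // [$%^&^Hello!]
    }
}
```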
I have a string "'GLO', FLO". Now I want a regular expression that checks each word in the string and:
-if a word begins and ends with a single quote, replaces the single quotes with spaces
-if a comma is encountered between words, splits the two words using a space.
So, in the end, I should get GLO FLO.
Any help on how to do this using replaceAll() method on the string?
This regex didn't do it for me : "'([^' ]+)|\\s+'"
public static void displaySplitString(final String str) {
String pattern1 = "^'?(\\w+)'?,\\s+(\\w+)$";
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens()) {
String delim = (strTok.nextToken());
delim.replaceAll(pattern1, "$1$2");
System.out.println(delim);
}
} //in main method displaySplitString("'GLO', FLO");
Here is the snippet that should get you going:
public static void displaySplitString(String str)
{
String pattern1 = "^'?(\\w+)'?(?=\\S)";
str = str.replaceAll(pattern1, " $1 ");
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens())
{
String delim = (strTok.nextToken());
System.out.println(delim);
}
}
Here,
I changed the str parameter so that it is no longer final (so that str can be reassigned inside the method)
I use the regex ^'?(\\w+)'?(?=\\S) to remove any single quotes from around the first word
Since you use a StringTokenizer, just two lines inside the while block are enough.
The regex means:
^ - Start looking for the match at the very start of the string
'? - match 0 or 1 single quote
(\\w+) - match and capture 1 or more word characters (letters, digits or underscore; we'll refer to them as $1 in the replacement pattern)
'? - match 0 or 1 single quote
(?=\\S) - match only if a non-whitespace character follows the optional single quote. You could even replace this lookahead with a plain , if a comma always follows the first word.
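Since the question asked about replaceAll() specifically, here is a hedged sketch that does the whole job with two chained replaceAll calls and no tokenizer. It is an alternative under stated assumptions, not the approach from the answer above: the class name QuoteAndCommaCleaner is hypothetical, and it assumes quotes only ever wrap whole words.

```java
final class QuoteAndCommaCleaner {
    static String clean(String s) {
        return s
                // Drop single quotes that wrap a whole word: 'GLO' -> GLO
                .replaceAll("'(\\w+)'", "$1")
                // Turn a comma plus surrounding whitespace into one space.
                .replaceAll("\\s*,\\s*", " ");
    }

    public static void main(String[] args) {
        System.out.println(clean("'GLO', FLO")); // GLO FLO
    }
}
```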
I wrote a regex that should check whether a string contains the word 'Page' followed by any number.
This is the code:
public static void main(String[] args) {
String str1 = "12/15/14 7:01:44 Page 10 ";
String str2 = "12/15/14 7:01:44 Page 9 ";
System.out.println(containsPage(str2));
}
private static boolean containsPage(String str) {
String regExp = "^.*Page[ ]{1,}[0-9].$";
return Pattern.matches(regExp, str);
}
Result: str1: false, str2:true
Can you help me find what is wrong?
Change the regex to the following:
String regExp = "^.*Page[ ]{1,}[0-9]+.$";
so that it matches one or more digits (hence the [0-9]+).
You also don't need the boundary matchers (^ and $) since Pattern#matches would match the entire input string; and [ ]{1,} is equivalent to [ ]+:
String regExp = ".*Page +[0-9]+.";
Change it to:
String regExp = "^.*Page[ ]{1,}[0-9]+.$"; //or \\d+
                                    ↑
With your original regex, [0-9] matches 9 in the second example, and . matches the space.
In the first example, [0-9] matches 1, . matches 0, and the remaining space is left unmatched. Note that ^ and $ are not really needed here.
Your regex can be simplified to:
String regExp = ".*Page\\s+\\d+.";
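Put together, the corrected check might look like the sketch below. The class name PageCheck and the second method, containsPageAnywhere, are hypothetical additions: the latter uses find() so that the surrounding .* and the trailing . are not needed when the goal is only to test whether "Page" plus a number occurs somewhere in the string.

```java
import java.util.regex.Pattern;

final class PageCheck {
    // Whole-string match, as in the simplified regex above.
    private static final Pattern WHOLE = Pattern.compile(".*Page\\s+\\d+.");
    // Substring search: "Page", whitespace, then one or more digits.
    private static final Pattern ANYWHERE = Pattern.compile("Page\\s+\\d+");

    static boolean containsPage(String s) {
        return WHOLE.matcher(s).matches();
    }

    static boolean containsPageAnywhere(String s) {
        return ANYWHERE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPage("12/15/14 7:01:44 Page 10 ")); // true
        System.out.println(containsPage("12/15/14 7:01:44 Page 9 "));  // true
        System.out.println(containsPageAnywhere("see Page 3"));        // true
    }
}
```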