Regular expression to determine whether a string starts with ';' (Java)

The requirement is simple. The given string should match if it:
starts with ';', or
starts with one or more of the characters '\r', '\n', '\t', ' ', followed by ';'.
For example, ";", "\r;", "\r\n;", and " \r\n \t;" should all be accepted.
Here is my code, which does not work:
private static String regex = "[\\r|\\n| |\\t]+;";
private static boolean startsWithSemicolon(String str) {
return str.matches(regex);
}
Thanks for any help.

You have 2 choices:
Use matches(), in which case the regex must match the entire input, so you'd have to add matching of characters following the ;.
Regex: str.matches("[\\r\\n\\t ]*;.*")
or: Pattern.compile("[\\r\\n\\t ]*;.*").matcher(str).matches()
Use find(), in which case the regex must be anchored to the beginning of the input:
Regex: Pattern.compile("^[\\r\\n\\t ]*;").matcher(str).find()
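A third option, not mentioned above, is Matcher.lookingAt(), which anchors at the start of the input without requiring the whole input to match. Here is a minimal sketch, assuming the class name SemicolonPrefix and the constant name are free to choose (both are made up for illustration). One thing to keep in mind with the matches() variant: by default . does not match line terminators, so ".*" will not get past a newline that follows the semicolon unless DOTALL is enabled, e.g. with (?s).

```java
import java.util.regex.Pattern;

class SemicolonPrefix {
    // Optional run of CR, LF, tab, or space, followed by a semicolon.
    private static final Pattern LEADING_SEMICOLON =
            Pattern.compile("[\\r\\n\\t ]*;");

    // lookingAt() always starts at the beginning of the input but,
    // unlike matches(), does not need to consume the entire input.
    static boolean startsWithSemicolon(String str) {
        return LEADING_SEMICOLON.matcher(str).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithSemicolon(";"));                    // true
        System.out.println(startsWithSemicolon(" \r\n \t; rest\nmore")); // true
        System.out.println(startsWithSemicolon("x;"));                   // false
        // The matches() form stops at the newline because '.' excludes it:
        System.out.println(";\nmore".matches("[\\r\\n\\t ]*;.*"));       // false
    }
}
```

Compiling the pattern once in a constant also avoids recompiling it on every call.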

Related

Replace all characters between two delimiters using regex

I'm trying to replace all characters between two delimiters with another character using regex. The replacement should have the same length as the removed string.
String string1 = "any prefix [tag=foo]bar[/tag] any suffix";
String string2 = "any prefix [tag=foo]longerbar[/tag] any suffix";
String output1 = string1.replaceAll(???, "*");
String output2 = string2.replaceAll(???, "*");
The expected outputs would be:
output1: "any prefix [tag=foo]***[/tag] any suffix"
output2: "any prefix [tag=foo]*********[/tag] any suffix"
I've tried "\\[tag=.*?](.*?)\\[/tag]" but this replaces the whole sequence with a single "*".
I think that "(.*?)" is the problem here because it captures everything at once.
How would I write something that replaces every character separately?
You can use the regex
\w(?=\w*?\[)
which matches every word character that is followed only by further word characters and then a "[".
See the regex demo and the online compiler demo.
You can capture the chars inside, one by one and replace them by * :
public static String replaceByStar(String str) {
String pattern = "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)";
while (str.matches(pattern)) {
str = str.replaceAll(pattern, "$1*$2");
}
return str;
}
Used like this, it prints output in the form you expect:
public static void main(String[] args) {
System.out.println(replaceByStar("any prefix [tag=foo]bar[/tag] any suffix"));
System.out.println(replaceByStar("any prefix [tag=foo]loooongerbar[/tag] any suffix"));
}
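In the pattern "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)":
(.*\\[tag=.*\\].*) captures everything up to and including the opening tag, possibly followed by some characters of the tag content
\\w is the single character to replace
(.*\\[\\/tag\\].*) captures the rest of the tag content, if any, plus the closing tag and everything after it
In the substitution $1*$2:
The input has the form (text$1)oneChar(text$2) and is replaced by (text$1)*(text$2). Each pass of the loop replaces one character, so the loop runs until no word character is left between the tags.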
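For comparison, the same result can be produced in a single pass with Matcher.appendReplacement. This is only a sketch under a few assumptions: the class and method names (TagMasker, maskTagContent) are made up, the tag content sits on one line, and every character between the tags is masked, not only \w characters as in the loop version.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMasker {
    // Group 1: opening tag, group 2: content to mask, group 3: closing tag.
    private static final Pattern TAGGED =
            Pattern.compile("(\\[tag=[^\\]]*\\])(.*?)(\\[/tag\\])");

    static String maskTagContent(String input) {
        Matcher m = TAGGED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Build a run of '*' with the same length as the tag content.
            char[] stars = new char[m.group(2).length()];
            Arrays.fill(stars, '*');
            String replacement = m.group(1) + new String(stars) + m.group(3);
            // quoteReplacement keeps any '$' or '\' in the tags literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // any prefix [tag=foo]***[/tag] any suffix
        System.out.println(maskTagContent("any prefix [tag=foo]bar[/tag] any suffix"));
        // any prefix [tag=foo]*********[/tag] any suffix
        System.out.println(maskTagContent("any prefix [tag=foo]longerbar[/tag] any suffix"));
    }
}
```

Because each tagged section is handled once, the cost does not grow with the number of characters to replace the way the repeated matches()/replaceAll() loop does.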

Ensuring a string is of a certain pattern

I am trying to work out if there is a way to get a check to ensure the string I am checking follows a structure.
eg: String s = "abcd, afsfsfs, abcdef, 90> 20, abeds"
I need to confirm that the string contains a ', ' followed by a ', ' followed by a ', ' followed by a '> ' and finally a ', '. The letters and numbers between these separators can vary in length.
I am a bit stuck on this. Any help would be appreciated.
If you want to allow any number of letters or digits between the separators, you can use this regex:
public static void main(String[] args) {
String s = "abcd, afsfsfs, abcdef, 90> 20, abeds";
boolean matches = s.matches("\\w+, \\w+, \\w+, \\d+> \\d+, \\w+");
System.out.println(matches);
}
You can use the following regex pattern in conjunction with String#matches():
.*, .*, .*, .*>.*, .*
Code sample:
public static void main(String args[])
{
String s = "abcd, afsfsfs, abcdef, 90> 20, abeds";
if (s.matches(".*, .*, .*, .*>.*, .*")) {
System.out.println("match");
}
else {
System.out.println("no match");
}
}
Demo here:
Rextester
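The two patterns above differ a lot in how strict they are, which is easy to see by running both against a few inputs. A small sketch (the class name, constant names, and the extra sample strings are made up for illustration):

```java
class StructureCheck {
    // Word characters between the commas, digits around the '>'.
    static final String STRICT = "\\w+, \\w+, \\w+, \\d+> \\d+, \\w+";
    // Anything (including nothing) between the separators.
    static final String LOOSE = ".*, .*, .*, .*>.*, .*";

    public static void main(String[] args) {
        String[] samples = {
            "abcd, afsfsfs, abcdef, 90> 20, abeds", // strict=true  loose=true
            ", , , >, ",                            // strict=false loose=true
            "a b, c, d, 1> 2, e"                    // strict=false loose=true
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" strict=" + s.matches(STRICT)
                    + " loose=" + s.matches(LOOSE));
        }
    }
}
```

The loose pattern only checks that the separators appear in order, so it also accepts empty fields. Which one fits depends on how much of the field content needs validating.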
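Try something like the following (FYI, not tested yet). With [^,]+ followed by , you are saying "match anything but a comma, and then match a comma". The second kind of pattern, [^>]+ followed by >, matches any character but > and then matches >.
[] denotes a character class (character set).
^ inside [ ] negates the character class.
^(?![\s]*$)[^,]+,[^,]+,[^,]+,[^>]+>[^,]+$
Reading the parts from left to right: ^ is the start, (?![\s]*$) rejects empty or blank input, the three [^,]+, groups are the 1st, 2nd and 3rd fields, [^>]+> is the 4th, and [^,]+$ runs to the end.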
Try this:
^\s*(?:\s*\w+\s*,\s*){3}\w+\s*>\s*\w+,(?!.*[,>]).*$
Regex Demo 1
This makes sure the format is exactly what you wanted and that there is no further , or > sign in the rest of the string. If you intend to allow further , or > characters after the format has been matched, remove the next-to-last part, i.e. (?!.*[,>]), so that the regex becomes:
^\s*(?:\s*\w+\s*,\s*){3}\w+\s*>\s*\w+,.*$
Regex Demo 2

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
String var = "ROOT_CONTEXT_MATCHER";
boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or an underscore
What's going on here?!?
The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the input and $ for the end of the input (or of each line, in MULTILINE mode):
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here because String#matches already requires the regex to match the entire string.
You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
since \\w is equivalent to [a-zA-Z0-9_] (a word character).
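To see that the slashes are matched literally, here is a short sketch (the class and method names are made up): the original regex fails on the plain name but succeeds once the input itself is wrapped in slashes.

```java
class SlashDemo {
    // Upper-case letter followed by any number of word characters.
    static boolean isConstantName(String s) {
        return s.matches("[A-Z]\\w*");
    }

    public static void main(String[] args) {
        String var = "ROOT_CONTEXT_MATCHER";
        String withSlashes = "/[A-Z][a-zA-Z0-9_]*/";

        // The slashes in the regex must appear in the input too.
        System.out.println(var.matches(withSlashes));               // false
        System.out.println(("/" + var + "/").matches(withSlashes)); // true

        // Without the delimiters the name matches as intended.
        System.out.println(isConstantName(var));                    // true
    }
}
```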

Java split regex non-greedy match not working

Why is non-greedy match not working for me? Take following example:
public String nonGreedy(){
String str2 = "abc|s:0:\"gef\";s:2:\"ced\"";
return str2.split(":.*?ced")[0];
}
In my eyes the result should be: abc|s:0:\"gef\";s:2 but it is: abc|s
The .*? in your regex matches any character except \n (0 or more times, matching the least amount possible).
You can try the regular expression:
:[^:]*?ced
On another note, you should use a constant Pattern to avoid recompiling the expression every time, something like:
private static final Pattern REGEX_PATTERN =
Pattern.compile(":[^:]*?ced");
public static void main(String[] args) {
String input = "abc|s:0:\"gef\";s:2:\"ced\"";
System.out.println(java.util.Arrays.toString(
REGEX_PATTERN.split(input)
)); // prints "[abc|s:0:"gef";s:2, "]"
}
It is behaving as expected. The non-greedy match will match as little as it has to, and with your input, the minimum characters to match is the first colon to the next ced.
You could try limiting the number of characters consumed. For example, to limit the term to "up to 3 characters":
:.{0,3}ced
To make it split as close to ced as possible, use a negative look-ahead, with this regex:
:(?!.*:.*ced).*ced
This makes sure there isn't a closer colon to ced.
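Putting the patterns from the answers side by side on the question's input shows the difference. The regex engine takes the leftmost position where a match is possible, and laziness only shortens the match from that starting point. A sketch, with a made-up helper firstPart:

```java
class SplitDemo {
    // Returns the text before the first match of the given regex.
    static String firstPart(String input, String regex) {
        return input.split(regex)[0];
    }

    public static void main(String[] args) {
        String input = "abc|s:0:\"gef\";s:2:\"ced\"";

        // Lazy quantifier: still starts at the FIRST colon.
        System.out.println(firstPart(input, ":.*?ced"));            // abc|s

        // Excluding ':' forces the match to start at the last colon.
        System.out.println(firstPart(input, ":[^:]*?ced"));         // abc|s:0:"gef";s:2

        // Bounded repetition: at most 3 characters between ':' and "ced".
        System.out.println(firstPart(input, ":.{0,3}ced"));         // abc|s:0:"gef";s:2

        // Negative look-ahead: no later colon may precede "ced".
        System.out.println(firstPart(input, ":(?!.*:.*ced).*ced")); // abc|s:0:"gef";s:2
    }
}
```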

Why is this Java regex not working?

I'm trying to match any string consisting of:
Any alphanumeric string of 1+ chars; then
Two periods (".."); then
Any alphanumeric string of 1+ chars
For example:
mydatabase..mytable
anotherDatabase23..table28
etc.
Given the following function:
public boolean isValidDBTableName(String candidate) {
if(candidate.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"))
return true;
else
return false;
}
Passing this function the value "mydb..tablename" causes it to return false. Why? Thanks in advance!
As NeplatnyUdaj has pointed out in comment, your current regex should return true for the input "mydb..tablename".
However, your regex has the problem of over-matching, where it returns true for invalid names such as nodotname.
You need to escape ., since in Java regex, it will match any character except for line separators:
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"
In regex, you can escape meta-characters (character with special meaning) with \. To specify \ in string literal, you need to escape it again.
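A short sketch of the difference, assuming a precompiled pattern in a helper class (the class name TableNameCheck is made up; the method name comes from the question):

```java
import java.util.regex.Pattern;

class TableNameCheck {
    // "\\." in the Java literal is the regex \. which matches a literal period.
    private static final Pattern DB_TABLE =
            Pattern.compile("[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+");

    static boolean isValidDBTableName(String candidate) {
        return DB_TABLE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidDBTableName("mydb..tablename")); // true
        System.out.println(isValidDBTableName("nodotname"));       // false
        System.out.println(isValidDBTableName("mydb.tablename"));  // false

        // The unescaped version accepts any two characters in the middle:
        System.out.println("nodotname".matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+")); // true
    }
}
```

Returning the boolean expression directly also removes the need for the if/else in the original method.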
You must escape the period in regexes. As a \ must also be escaped, this gives
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"
I just tried your regex in Eclipse and it worked. Or at least did not fail. Try stripping whitespace characters.
@Test
public void test()
{
String testString = "mydb..tablename";
Assert.assertTrue("no match", testString.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
Assert.assertFalse("falsematch", "a.b".matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
}
