Check string using regex - java

I have the following string:
String s = "http://www.[VP_ANY].com:8080/servlet/[VP_ALL]";
I need to check if this string contains the words [VP_ANY] or [VP_ALL]. I tried something like this (and many combinations), but it doesn't work:
Pattern.compile("\b(\\\\[VP_ANY\\\\]|\\\\[VP_ALL\\\\])\b").matcher(s).matches()
What am I doing wrong?
I tried the following:
s = "www.[VP_ANY].com:8080/servlet/[VP_ALL]";
System.out.println(Pattern.compile("\[VP_ANY\]").matcher(s).matches());
System.out.println(s.replaceAll("\[VP_ANY\]", "A"));
The first System.out.println prints false, and the second one prints the replacement correctly.
I'm escaping the "[" and "]" characters with 2 backslashes, but when I save the post only one is shown. I am using 2 in my code, though.

Pattern.compile("\b(\\\\[VP_ANY\\\\]|\\\\[VP_ALL\\\\])\b").matcher(s).matches()
String s = "http://www.[VP_ANY].com:8080/servlet/[VP_ALL]";
^^ ^^ ^^ ^
NoWB NoWB NoWB WB
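Your regex will not work because there is no word boundary between . and [, between ] and ., or between / and [. A word boundary (\b) only exists between a word character and a non-word character, and [, ], . and / are all non-word characters.
Additionally, I think your escaping is wrong: in a Java string literal the word boundaries need one more backslash (\\b) and the brackets need two fewer (\\[ and \\]).
Also note that matches() only succeeds if the whole input matches the pattern; to look for the token anywhere in the string, use find().
So, since the word boundaries are not working, you should be fine with
Pattern.compile("\\[VP_(?:ANY|ALL)\\]").matcher(s).find()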
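To make the difference concrete, here is a minimal sketch that contrasts matches(), which must consume the whole input, with find(), which looks for the pattern anywhere in it. The class and method names are made up for illustration and are not from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PlaceholderCheck {
    // "\\[" in Java source is the regex \[ , i.e. a literal opening bracket.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[VP_(?:ANY|ALL)\\]");

    // True if the input contains [VP_ANY] or [VP_ALL] anywhere.
    static boolean containsPlaceholder(String input) {
        return PLACEHOLDER.matcher(input).find();
    }

    // True only if the entire input is exactly one placeholder.
    static boolean isPlaceholder(String input) {
        return PLACEHOLDER.matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "http://www.[VP_ANY].com:8080/servlet/[VP_ALL]";
        System.out.println(containsPlaceholder(s));    // true
        System.out.println(isPlaceholder(s));          // false
        System.out.println(isPlaceholder("[VP_ALL]")); // true
    }
}
```

Compiling the pattern once into a constant also avoids recompiling it on every call.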

Try this one
try {
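// find() looks anywhere in the input; String.matches() would require the whole string to match
boolean foundMatch = Pattern.compile("(?i)\\bVP_(?:ANY|ALL)\\b").matcher(subjectString).find();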
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
or this
try {
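// find() looks anywhere in the input; String.matches() would require the whole string to match
boolean foundMatch = Pattern.compile("(?i)\\[VP_(?:ANY|ALL)\\]").matcher(subjectString).find();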
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

Try This
\[VP_ANY\]|\[VP_ALL\]
My go at Java
try {
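// find() is used because String.matches() only succeeds when the whole string matches
boolean foundMatch = Pattern.compile("\\[VP_ANY\\]|\\[VP_ALL\\]").matcher("www.[VP_ANY].com:8080/servlet/[VP_ALL]").find();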
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

"http://www.[VP_ANY].com:8080/servlet/[VP_ALL]".replaceAll ("http://www.(\\[VP_ANY\\]).com:8080/servlet/(\\[VP_ALL\\])", "$1:$2")
res117: java.lang.String = [VP_ANY]:[VP_ALL]
If you're looking for a literal [, you have to mask (escape) it; otherwise it starts a character class like [A-Z].
If you read the regex from a file or a JTextField at runtime, that's all you need. But if you write it in your source code, the compiler sees the \ and treats it as its own escape character, the one needed to mask quotes, as in
char apo = '\'';
String quote = "He said: \"Whut?\"";
So you have to mask it again, because only "\\" in source code means a single \ in the string.
So during development, to avoid getting too confused, it is a good idea to have a simple GUI app with 2 or 3 text fields for testing regexps. Once a pattern works, you only have to add the second level of masking; while developing it, you can keep that level out of the way.
Divide et impera, like the ancient roman programmers told us.
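A small sketch of the two masking levels, in case it helps: printing the string shows what the regex engine actually receives after the compiler has consumed one level of backslashes. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative.
class EscapeLevels {
    // The source-code literal "\\[VP_ANY\\]" holds the 10 characters \[VP_ANY\]
    static final String REGEX = "\\[VP_ANY\\]";

    // True if the literal token [VP_ANY] occurs anywhere in the input.
    static boolean found(String input) {
        return Pattern.compile(REGEX).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                     // prints \[VP_ANY\]
        System.out.println(REGEX.length());            // prints 10
        System.out.println(found("www.[VP_ANY].com")); // prints true
    }
}
```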

Related

Using replace function in Java

I am trying to use replace function to replace "('", "'" and "')" from the below string -
String foo = "('UK', 'IT', 'DE')";
I am trying to use the below code in order to do this operation -
(foo.contains("('")?foo.replaceAll("('", ""):foo.replace("'",""))?foo.replaceAll("')",""):foo
but its failing as -
java.util.regex.PatternSyntaxException: Unclosed group near index 2
Am I missing anything here?
replaceAll takes a regular expression as its search pattern. Since ( is a special character in regular expressions, it needs to be escaped: "\\(". Furthermore, there's no need for the contains test:
final String bar = foo.replaceAll("\\('", "") …
Lastly, you can combine all your replacements into one regular expression:
final String bar = foo.replaceAll("\\(?'([^']*)'\\)?", "$1");
// Output: UK, IT, DE
This will replace each occurrence of a single-quoted part inside your string with its content without the quotes, and it will allow (and discard) surrounding opening and closing parentheses.
foo.replaceAll("[(')]", "") will do the job :)
As other answer(s) and the error message point out, replaceAll() deals with regular expressions, where the opening parenthesis ( has special meaning. Some characters have special meaning even in the replacement argument for the same reason.
If you want to be absolutely sure that your strings are going to behave as strings, there are two built-in "quote" methods (both are static) for "neutralizing" patterns:
Pattern.quote() for wrapping the replacee pattern
Matcher.quoteReplacement() for wrapping the replacement
Example code attempting to replaceAll() two ( to $ symbols:
System.out.println("Naive:");
try {
System.out.println("(("
.replaceAll("(", "$"));
} catch (Exception ex) {
System.out.println(ex);
}
System.out.println("\nPattern.quote:");
try {
System.out.println("q: "+Pattern.quote("("));
System.out.println("(("
.replaceAll(Pattern.quote("("), "$"));
} catch (Exception ex) {
System.out.println(ex);
}
System.out.println("\nPattern.quote+Matcher.quoteReplacement:");
try {
System.out.println("q: "+Pattern.quote("("));
System.out.println("qR: "+Matcher.quoteReplacement("$"));
System.out.println("(("
.replaceAll(Pattern.quote("("), Matcher.quoteReplacement("$")));
} catch (Exception ex) {
System.out.println(ex);
}
Output:
Naive:
java.util.regex.PatternSyntaxException: Unclosed group near index 1
(
Pattern.quote:
q: \Q(\E
java.lang.IllegalArgumentException: Illegal group reference: group index is missing
Pattern.quote+Matcher.quoteReplacement:
q: \Q(\E
qR: \$
$$
Of course, by the time one learns about these methods, one has usually long since become accustomed to escaping the special characters manually.

Regex that matches the string ÷x% [duplicate]

This question already has answers here:
Why does replaceAll fail with "illegal group reference"?
(8 answers)
Closed 4 years ago.
I've been trying to create a regex that matches the following pattern:
÷x%
here is my code:
String string = "÷x%2%x#3$$#";
String myregex = "all the things I've tried";
string = string.replaceAll(myregex,"÷1x#1$%");
I've tried the following regexes: (÷x%) , [÷][x][%] , [÷]{1}[x]{1}[%]{1}
I am using NetBeans IDE and it gives me an
Illegal group reference
However, when I change the value of string to something else, a word for example, NetBeans does not give me an exception.
any thoughts, thanks
To replace all occurrences of a sub-string you don't need a pattern. You can use String.replace():
String input = "÷x%abc÷x%def÷x%";
String output = input.replace("÷x%", "÷1x#1$%");
System.out.println(output); // ÷1x#1$%abc÷1x#1$%def÷1x#1$%
As per method javadoc:
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
As per the comments in the question, I am hoping that this will shed some light on how the replaceAll works.
As per the JavaDoc, the replaceAll takes in a regular expression as first argument. In your case, the regular expression appears to be sound, so there is no issue there.
The second argument that the replaceAll accepts, is the string that will be used to replace whatever the regular expression matches.
In some cases, you will need to replace the same pattern with the same (hard coded, if you will) string:
String myString = "123abc1344";
myString = myString.replaceAll("[a-z]+", "word");
myString = myString.replaceAll("\\d+", "number");
System.out.println(myString); // Prints: numberwordnumber
BUT, there are situations where you want to use chunks of what you are replacing in the replacement string itself. This is where the $ comes in:
String myString = "Age:9;Gender:Male";
Let us say that you want to change the format of the string to the following: "I am a {Gender} and I am {Age} years of age.".
In this case, your replacement string needs to extract information from the string to be replaced and inject it in the replacement itself. You do this by using the following:
String myString = "Age:9;Gender:Male";
myString = myString.replaceAll("Age:(\\d+);Gender:(\\w+)", "I am a $2 and I am $1 years of age.");
The above should yield the string that you are after. Notice that I am using $1 and $2 to access regular expression groups. In regular expressions, group 0 is whatever is matched by the entire regular expression. Each further pair of round parentheses denotes another group, which you can access through the $ syntax.
This is why a literal $ in the replacement string needs to be escaped.
In a Java replacement string you have to escape the $ sign.
If you write $% you would refer to the group %, which does not exist.
You can try:
try {
String string = "÷x%2%x#3$$#";
String myregex = "÷x%";
String replace = "÷1x#1\\$%";
String resultString = string.replaceAll(myregex, replace);
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
// Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
// Non-existent backreference used in the replacement text
}
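As an alternative to escaping each $ by hand, the two quoting methods from java.util.regex can be combined. This is only a sketch: the helper name replaceLiterally is made up, and \u00F7 is simply the ÷ character written as a Unicode escape so the source stays ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class LiteralReplace {
    // Treats both the target and the replacement as plain text,
    // even though replaceAll itself is regex-based.
    static String replaceLiterally(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "\u00F7x%2%x#3$$#";
        // The $ in the replacement is no longer read as a group reference.
        System.out.println(replaceLiterally(input, "\u00F7x%", "\u00F71x#1$%"));
    }
}
```

For purely literal text this gives the same result as String.replace(), which is the simpler choice when no regex features are needed.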

Regex expression in java htmlunit

I am trying to advance my knowledge of java, by trying to automate webpage scraping and form input. I have experimented with jsoup and now htmlunit. I found a htmlunit example that I am trying to run.
public class GoogleHtmlUnitTest {
static final WebClient browser;
static {
browser = new WebClient();
browser.getOptions().setJavaScriptEnabled(false);
// browser.setJavaScriptEnabled(false);
}
public static void main(String[] arguments) {
boolean result;
try {
result = searchTest();
} catch (Exception e) {
e.printStackTrace();
result = false;
}
System.out.println("Test " + (result? "passed." : "failed."));
if (!result) {
System.exit(1);
}
}
private static boolean searchTest() {
HtmlPage currentPage;
try {
currentPage = (HtmlPage) browser.getPage("http://www.google.com");
} catch (Exception e) {
System.out.println("Could not open browser window");
e.printStackTrace();
return false;
}
System.out.println("Simulated browser opened.");
try {
((HtmlTextInput) currentPage.getElementByName("q")).setValueAttribute("qa automation");
currentPage = currentPage.getElementByName("btnG").click();
System.out.println("contents: " + currentPage.asText());
return containsPattern(currentPage.asText(), "About .* results");
} catch (Exception e) {
System.out.println("Could not search");
e.printStackTrace();
return false;
}
}
public static boolean containsPattern(String string, String regex) {
Pattern pattern = Pattern.compile(regex);
// Check for the existence of the pattern
Matcher matcher = pattern.matcher(string);
return matcher.find();
}
}
It works with some htmlunit errors, that I have found on stackoverflow to ignore. The program runs correctly, so I am taking the advice and ignoring the errors.
Jul 31, 2016 7:29:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.com/search?q=qa+automation&sa=G&gbv=1&sei=_eCdV63VGMjSmwHa85kg' [1:1467] Error in declaration. '*' is not allowed as first char of a property.
My problem at the moment is the regex expression being used for the search. If I am understanding this correctly, “qa automation” is being googled and the retrieved page is being searched by:
return containsPattern(currentPage.asText(), "About .* results");
What is throwing me is “About .* results”. This is the regex, but I don't get how it is being interpreted. What is being searched for on the retrieved page?
.* means "zero or more of any character," in other words, a complete wildcard. It can be
About 28 results
About 2864 results
About 2,864 results
About ERROR results
About  results (with two spaces between the words)
(Response to comments.)
To be honest, you should find a quick regular expressions tutorial. You're missing some very basic things and instead relying on your own intuitive sense of how "searching" should work, which is leading to confusion.
I like teaching though, so here's a little more :-)
Go to this RegExr link. I already set it up with this expression:
/^About .* results$/gm
Ignore the /^ and the $/gm. (If you really want to know, the two slashes is just the conventional notation for regular expressions. The ^ and $ are "anchors" that force a "full match"—that's why it seemed like "About" had to be in position 0. Whatever regex engine you're using, it seems to force anchors. The g is a flag that just means "Highlight every match," and the m is a flag that means, "Treat every line as a separate entry.") Anyway, back to the main expression:
About .* results
And its matches:
See how if you put a character on either side, it's no longer a match? Again, that's because of anchoring. The expression expects "A" as the first character, so "x" fails. The expression also expects the last character to be "s", so "x" would fail there too. But why did About results fail? It's because there's a space around each side of the .*. The .* wildcard is allowed to match nothing, but the spaces have to match just like letters and numbers. So a single space won't cut it; you need at least two.
You wrote that you tried 230 .* results. See, you're not understanding that regex works character by character, with certain "special" characters you can use. Your expression means, "A string that begins with 230, a space, then anything, a space, "results", and nothing after."
[...] how would I code regex to find the "230" in any position followed by "results", ie "foobar 230 foobar2 results"?
In other words, you want to find a string that starts with anything, has 230 somewhere, has more of anything, a space, "results", and nothing more:
.*230.* results
Do you want the exact number, 230?
.* 230 results
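In Java specifically, whether a pattern behaves as if it were anchored depends on which Matcher method is called: find() searches anywhere in the text, while matches() requires the whole text to match. A minimal sketch with the expression from the question follows; the sample strings and the matchesWhole helper are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; containsPattern mirrors the method in the question.
class AboutResults {
    // Unanchored search: the pattern may occur anywhere in the text.
    static boolean containsPattern(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Whole-string match: behaves as if the pattern were wrapped in ^ and $.
    static boolean matchesWhole(String text, String regex) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String regex = "About .* results";
        String page = "Page 1. About 2,864 results (0.5 s)";
        System.out.println(containsPattern(page, regex));             // true
        System.out.println(matchesWhole(page, regex));                // false
        System.out.println(matchesWhole("About 2,864 results", regex)); // true
        // One space is not enough: the pattern has a literal space on each side of .*
        System.out.println(containsPattern("About results", regex));  // false
        System.out.println(containsPattern("About  results", regex)); // true
    }
}
```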

Regex with special signs received as input: replaceAll() throws errors

A function receives input containing special signs such as (,>_$' and Java's replaceAll throws an error.
SAMPLE INPUT
I get an error if the input is something like this:
[ FAILED ] the appendtext variable contains, joined with System.lineSeparator():
$model_fsdfdsfdsfdsfdsfds->load('fsdfdsfdsfdsfdsfds','dsfsdfsd');
$model_fsdfdsfdsfdsfdsfds->fsdfdsfdsfdsfdsfds->index();
No error if the input is:
[ OKAY ] the appendtext variable contains simple input, joined with System.lineSeparator():
mysomethingmodel
blabla
EXPLANATIONS
appendtext is concatenated into a String with other parts:
String allappend = "Something simple var" + System.lineSeparator() + "\t{" + System.lineSeparator() + appendtext;
Okay. Then it goes into replaceAll with a regex, which throws an error:
str_list = rs.replaceAll(regex_newlinebracket, allappend);
regex_newlinebracket is a regex built by another function:
public String RegexPatternsFunction(String types, String function_name)
{
// neutralize special regex characters in the function name
String function_name_quoted = Pattern.quote(function_name);
switch (types) {
case "newlinebracket":
return function_name_quoted + "(\\s|\\t|\\n)+[{]";
}
return null;
}
ERRORS
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:808)
at java.util.regex.Matcher.replaceAll(Matcher.java:906)
at java.lang.String.replaceAll(String.java:2162)
or, more precisely, inside the appendReplacement function in Matcher.java:
// The first number is always a group
refNum = (int)nextChar - '0';
if ((refNum < 0)||(refNum > 9))
throw new IllegalArgumentException(
"Illegal group reference");
cursor++;
PROBLEM
Using special characters, as in
$model_fsdfdsfdsfdsfdsfds->load('fsdfdsfdsfdsfdsfds','dsfsdfsd');
$model_fsdfdsfdsfdsfdsfds->fsdfdsfdsfdsfdsfds->index();
throws an error when combined with replaceAll and a regex pattern.
THE PROJECT WORKS IF THERE ARE NO SPECIAL SIGNS.
I'm using Pattern.quote to escape special characters, because otherwise replaceAll with a regex would not work when input like () comes in.
In C++ Qt it works well; in Java it does not.
Any solutions?
It's fine (and necessary) that you use Pattern.quote. But what's causing the actual problem is the replacement string, since it contains $ (which is the relevant referencing-character in replacement strings). Luckily, Java provides you with another quoting function just to make replacement strings safe: Matcher.quoteReplacement()
So just try
allappend = Matcher.quoteReplacement(allappend);
str_list = rs.replaceAll(regex_newlinebracket, allappend);
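A self-contained sketch of the combination, under some assumptions: the class, method, and sample strings are invented, and the whitespace part of the pattern is simplified to \s+, which covers the same characters as (\s|\t|\n)+ in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class SafeAppend {
    // Replaces "<functionName><whitespace>{" with the replacement, taken literally.
    static String replaceHeader(String source, String functionName, String replacement) {
        // Pattern.quote protects the search side, e.g. the parentheses in "index()".
        String regex = Pattern.quote(functionName) + "\\s+[{]";
        // Matcher.quoteReplacement protects the replacement side, e.g. "$model".
        return source.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String source = "function index()\n{\n}";
        String replacement = "function index()\n\t{\n$model->load('a','b');";
        System.out.println(replaceHeader(source, "function index()", replacement));
    }
}
```

Without quoteReplacement, the $m in the replacement is read as the start of a group reference, which is what triggers the IllegalArgumentException from the stack trace.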

Regular expression, value in between quotes

I'm having a little trouble constructing the regular expression using java.
The constraint is, I need to split a string separated by !. The two strings will be enclosed in double quotes.
For example:
"value"!"value"
If I performed a java split() on the string above, I want to get:
value
value
However the catch is value can be any characters/punctuations/numerical character/spaces/etc..
So here's a more concrete example. Input:
""he! "l0"!"wor!"d1"
Java's split() should return:
"he! "l0
wor!"d1
Any help is much appreciated. Thanks!
Try this expression: (".*")\s*!\s*(".*")
Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.
String input = "\" \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
if(m.matches())
{
String s1 = m.group(1); //" "he""""! "l0"
String s2 = m.group(2); //"wor!"d1"
}
Edit:
This would not work for all cases, e.g. "he"!"llo" ! "w" ! "orld" would get the wrong groups. In that case it would be really hard to determine which ! should be the separator. That's why rarely used characters are often chosen to separate parts of a string, like # in email addresses :)
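Split the value on "!" (quote, exclamation mark, quote) instead of on ! alone:
String REGEX = "\"!\"";
String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";
// Strip the outer quotes first, then split on the "!" separator
String[] items = INPUT.substring(1, INPUT.length() - 1).split(REGEX);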
It feels like you need to parse on:
DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*
It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested even further) BUT it might be better served by a very simple grammar.
This might work... excusing missing escapes and the like
^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
Another option would be to match against the first part, then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language, I'm just trying to illustrate the idea):
resultList = []
while(string matches \^"([^"]*|"[^"]*")*(.*)$" => match(1)) {
resultList += match
string = match(2)
if(string.beginsWith("!")) {
string = string[1:end]
} elseif(string.length > 0) {
// throw an error, since there was no exclamation and the string isn't done
}
}
if(string.length > 0) {
// throw an exception since the string isn't done
}
resultsList == the list of items in the string
EDIT: I realized that my answer doesn't really work. You can have a single doublequote inside the strings, as well as exclamation marks. As such, you really CAN'T have "!" inside one of the strings. As such, the idea of 1) pull quotes off the ends, 2) split on '"!"' is really the right way to go.
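The two steps (pull the quotes off the ends, then split on "!") could look roughly like this in Java. It is a sketch, not a full parser: the class and method names are made up, and it assumes the line really starts and ends with a double quote.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class QuotedSplit {
    // Assumes the line is wrapped in double quotes and items are joined by "!".
    static String[] splitItems(String line) {
        if (line.length() < 2 || !line.startsWith("\"") || !line.endsWith("\"")) {
            throw new IllegalArgumentException("expected a line wrapped in double quotes");
        }
        // Step 1: pull the quotes off both ends.
        String inner = line.substring(1, line.length() - 1);
        // Step 2: split on the three-character separator "!" (it contains no
        // regex metacharacters, so it is safe to pass to split as-is).
        return inner.split("\"!\"", -1);
    }

    public static void main(String[] args) {
        String input = "\"\"he! \"l0\"!\"wor!\"d1\"";
        // Expected items: "he! "l0 and wor!"d1
        System.out.println(Arrays.toString(splitItems(input)));
    }
}
```

The limit of -1 keeps trailing empty items, so a line ending in an empty value is not silently shortened.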
