How to properly escape this regex pattern in Java? - java

This is the input I want to process. I want to extract the value of the operation attribute:
<h:outputLink value="#" id="temp_solution">
<rich:componentControl
for="panel"
attachTo="temp_solution"
operation="show"
event="onclick"/>
</h:outputLink>
With the help of an online regex tester I came up with the following regular expression
(?<=operation=")(\w+)(?=")
To be a bit more dynamic, I replaced operation with %s so I can use this template for different situations. But I encountered a problem, while trying to test my "creation" with the help of a small test program:
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
private static final String INPUT = "<h:outputLink value=\"#\" id=\"temp_solution\">\n"
+ " <rich:componentControl \n"
+ " for=\"panel\" \n"
+ " attachTo=\"temp_solution\" \n"
+ " operation=\"show\""
+ " event=\"onclick\"/> \n"
+ "</h:outputLink>";
private static final String REGEX_TEMPLATE = "(?<=%s=\")(\\w+)(?=\")";
public static void main(String[] args) throws IOException {
final String actualRegex = String.format(REGEX_TEMPLATE, "operation");
final Pattern pattern = Pattern.compile(actualRegex);
final Matcher matcher = pattern.matcher(INPUT);
System.out.println("Regex: " + pattern);
System.out.println(matcher.matches() ? matcher.group(0) : "Nothing found");
}
}
Output:
Regex: (?<=operation=")(\w+)(?=")
Nothing found
Even double escaping the regex inside my code:
private static final String REGEX_TEMPLATE = "(?<=%s=\\\")(\\\\w+)(?=\\\")";
doesn't help:
Regex: (?<=operation=\")(\\w+)(?=\")
Nothing found
Please give me some advice on this.

There is nothing wrong with your regex. However, it doesn't match the entire input, so you can't use matches(). Change it to find(), which only tries to find a matching subsequence:
System.out.println(matcher.find() ? matcher.group(0) : "Nothing found");
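The difference between the two calls can be checked with a minimal sketch. The class name FindVsMatches, the helper extractAttribute, and the shortened input string are assumptions made for illustration; only the regex template comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // Template from the question; %s is replaced by the attribute name.
    private static final String REGEX_TEMPLATE = "(?<=%s=\")(\\w+)(?=\")";

    // Hypothetical helper: returns the attribute value, or null if it is absent.
    static String extractAttribute(String input, String attribute) {
        String regex = String.format(REGEX_TEMPLATE, attribute);
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // find() looks for a matching subsequence anywhere in the input.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question.
        String input = "<rich:componentControl for=\"panel\" operation=\"show\" event=\"onclick\"/>";
        Pattern pattern = Pattern.compile(String.format(REGEX_TEMPLATE, "operation"));

        // matches() only succeeds if the pattern covers the entire input.
        System.out.println("matches(): " + pattern.matcher(input).matches());
        System.out.println("find():    " + extractAttribute(input, "operation"));
    }
}
```

With this input, matches() reports false while find() returns show, which is the behaviour described above.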

Try a regex like this:
(?<=operation=\")(\w+)

Related

Regular expression in java( pattern to match a part of a string)

Could any one please help me write a regular expression to match a part of the string given below.
"Cecscec\n:90A:/5645644343\nvalue1\nvalue2\nvalue3\n:80F:/245343\nfglfejfj\n"
I want to extract only value2 and value3 from the above string. They appear after :90A:/, which can be used as the look-behind anchor.
Output:
Value2
Value3
Pattern:
((?<=(:90A:/).{0,40}[\n].{0,40}[\n]).*)[^:]*
I have been struggling with this for many days. I would be very grateful if someone could help me with it.
Thanks
Alternative regex:
":90A:/(?:[^\n]{0,40}\n){2}([^\n]+)\n([^\n]+)"
Regex in context:
public static void main(String[] args) {
String input = "Cecscec\n:90A:/5645644343\nvalue1\nvalue2\nvalue3\n:80F:/245343\nfglfejfj\n";
Matcher matcher = Pattern.compile(":90A:/(?:[^\n]{0,40}\n){2}([^\n]+)\n([^\n]+)").matcher(input);
if(matcher.find()) {
System.out.println("Group1: '" + matcher.group(1) + "'");
System.out.println("Group2: '" + matcher.group(2) + "'");
}
}
Output:
Group1: 'value2'
Group2: 'value3'

Get specific values from a string with regex

I try to extract string values from this string:
String str = "[{\"name:\"s2\"},{},{\"name\":\"f2\"},{\"name\":\"f2\"},{},{\"name\":\"l\"}]";
I want to use a regex to extract "s2", "f2", "f2" and "l".
I thought about one solution: define a regex that finds strings which begin with ":" followed by a quotation mark and end with a quotation mark.
I'm not very familiar with regex, but I assumed my regex would look something like this: ":\".?\""
public static void main(String... args) {
Pattern p = Pattern.compile(":\".?\"");
String str = "[{\"name:\"s2\"},{},{\"name\":\"f2\"},{\"name\":\"f2\"},{},{\"name\":\"l\"}]";
Matcher m = p.matcher(str);
System.out.println(str);
while (m.find()) {
System.out.println("groupe = " + m.group());
}
}
Thanks for your help.
You can use this pattern:
"(?<=:")[^"]*"
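As a rough sketch of how that pattern could be used from Java: the example below assumes the outer quotation marks in the pattern are only delimiters, so the regex itself is (?<=:")[^"]*. The class name QuotedValues and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValues {
    // The regex (?<=:")[^"]* as a Java string literal: each " is escaped as \".
    private static final Pattern VALUE = Pattern.compile("(?<=:\")[^\"]*");

    // Hypothetical helper: collects every run of non-quote characters that follows :"
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        // Input string copied from the question.
        String str = "[{\"name:\"s2\"},{},{\"name\":\"f2\"},{\"name\":\"f2\"},{},{\"name\":\"l\"}]";
        System.out.println(extract(str)); // prints [s2, f2, f2, l]
    }
}
```

Because the look-behind is zero-width, the leading :" is not part of the match, so no trimming is needed afterwards.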

Convert Java Regex into PHP regex

I got the following Java code from Apache commons to validate email addresses. I code in PHP so I'm trying to see if these regex can be used directly in PHP without any modification.
LEGAL_ASCII_REGEX = "^\\p{ASCII}+$";
EMAIL_REGEX = "^\\s*?(.+)#(.+?)\\s*$";
IP_DOMAIN_REGEX = "^\\[(.*)\\]$";
USER_REGEX = "^\\s*" + WORD + "(\\." + WORD + ")*$";
If an email address fails any of these 4 conditions above, then it would be considered invalid.
I don't have any experience with Java so any advice on modifications on these regex needed for PHP is hugely appreciated!
Best,
Update:
the code I'm using is:
$email_to_test='www.jinfu66#foxmail.com';
if(filter_var($email_to_test, FILTER_VALIDATE_EMAIL)&&preg_match('/^[[:ascii:]]+$/', $email_to_test)&&preg_match('/^\s*?(.+)#(.+?)\s*$/', $email_to_test))
{
echo 'It passed';
}
else
{
echo 'It did not pass';
}
I'm not sure how to add the condition that $email_to_test must match the requirement from $USER_REGEX in order for it to echo 'It passed'. Thank you!
2nd update:
Here's what WORD stands for in the original JAVA regex:
private static final String SPECIAL_CHARS = "\\p{Cntrl}\\(\\)<>#,;:'\\\\\\\"\\.\\[\\]";
private static final String VALID_CHARS = "[^\\s" + SPECIAL_CHARS + "]";
private static final String QUOTED_USER = "(\"[^\"]*\")";
private static final String WORD = "((" + VALID_CHARS + "|')+|" + QUOTED_USER + ")";
PHP regexes don't need the doubled \\ that Java string literals require.
PCRE regexes use [[:ascii:]] instead of \\p{ASCII}.
PCRE regexes need delimiters, unlike Java regexes.
The following PHP regexes should work for you:
$LEGAL_ASCII_REGEX = '/^[[:ascii:]]+$/';
$EMAIL_REGEX = '/^\s*?(.+)#(.+?)\s*$/';
$IP_DOMAIN_REGEX = '/^\[(.*)\]$/';
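// $WORD must hold the PHP translation of the Java WORD pattern. PHP concatenates strings with ".", not "+",
// and preg_quote() must not be applied here because it would escape the regex syntax inside WORD.
$USER_REGEX = '/^\s*' . $WORD . '(\.' . $WORD . ')*$/';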

Using regular expression to find a set number of + JAVA

I have a program where I want to filter Strings with a set number of "+"'s at the beginning.
For example:
+++Adam is working very well.
++Adam is working well.
+Adam is doing OK.
How do I only pick up each particular case (i.e. only one plus sign, only two plus signs, only three plus signs)? I usually get a return of anything beginning with a +.
I have the following regex patterns compiled, but I either get only one return (usually the two ++) or all of them:
public static String regexpluschar = "^\\Q+\\E{1}[\\w <]";
public static String regexpluspluschar = "^\\Q+\\E{2}[\\w <]";
public static String regexpluspluspluschar = "^\\Q+\\E{3}[\\w <]";
Pattern plusplusplus = Pattern.compile(regexpluspluspluschar);
Pattern plusplus = Pattern.compile(regexpluspluschar);
Pattern plus = Pattern.compile(regexpluschar);
I then try to find using a Matcher class - I've used .find() and .matches() but don't get the result I'm after (java+regex newbie alert here).
Matcher matcherplusplusplus = plusplusplus.matcher(check);
Matcher matcherplusplus = plusplus.matcher(check);
Matcher matcherplus = plus.matcher(check);
//OK we have 3+'s
if ((matcherplusplusplus.find())==true){
System.out.println("Filtering 3 +s.");
System.out.println("filter is " + filter + " in the 3 + filter.");
String toChange = getItem(i);
setItemFiltered(i, toChange);
}
//OK - we have 2 +'s
if ((matcherplusplus.find())==true){
System.out.println("Filtering 2 +s.");
System.out.println("filter is " + filter + " in the 2 + filter.");
String toChange = getItem(i);
setItemFiltered(i, toChange);
}
//OK - we have 1 +'s
if ((matcherplus.find())==true){
System.out.println("Filtering 1 +.");
System.out.println("filter is " + filter + " in the 1 + filter.");
String toChange = getItem(i);
setItemFiltered(i, toChange);
}
For the very curious, the above if's are embedded in a for loop that cycles around some JTextFields. Full code at: http://pastebin.ca/2199327
Why not something simpler:
public static String regexpluschar = "^\\+[\\w <]";
public static String regexpluspluschar = "^\\+{2}[\\w <]";
public static String regexpluspluspluschar = "^\\+{3}[\\w <]";
or even
public static String regexpluschar = "^\\+[^\\+]";
public static String regexpluspluschar = "^\\+{2}[^\\+]";
public static String regexpluspluspluschar = "^\\+{3}[^\\+]";
Edit: It works in my test program, but I had to remove your specific code:
String toChange = getItem(i);
setItemFiltered(i, toChange);
Proof: my output is:
Filtering 3 +s.
+++Adam is working very well. is in the 3 + filter.
Filtering 2 +s.
++Adam is working well. is in the 2 + filter.
Filtering 1 +.
+Adam is doing OK. is in the 1 + filter.
Your filter is working, but your specific code may not be (maybe have a look at setItemFiltered?).
I was thinking something like this would be easier:
public static void main(String[] args) {
// Group 1 captures the one to three leading plus signs.
Pattern pattern = Pattern.compile("^(\\+{1,3}).*");
// check is the string being tested, as in the question.
Matcher matcher = pattern.matcher(check);
if (matcher.matches()) {
String pluses = matcher.group(1);
switch (pluses.length()) {
// handle the 1, 2 and 3 plus sign cases here
}
}
}
And if you want to be sure that "++++This is insane" does not match, then change the pattern to:
Pattern pattern = Pattern.compile("^(\\+{1,3})[^+].*");
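A minimal sketch of how the stricter pattern could drive the three filters, assuming single-line inputs. The class name PlusCounter and the helper countLeadingPluses are hypothetical; the sample lines are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusCounter {
    // One to three leading '+' that must be followed by a character other than '+'.
    private static final Pattern LEADING_PLUSES = Pattern.compile("^(\\+{1,3})[^+].*");

    // Hypothetical helper: number of leading plus signs (1 to 3), or 0 if the line does not qualify.
    static int countLeadingPluses(String line) {
        Matcher matcher = LEADING_PLUSES.matcher(line);
        return matcher.matches() ? matcher.group(1).length() : 0;
    }

    public static void main(String[] args) {
        String[] lines = {
            "+++Adam is working very well.",
            "++Adam is working well.",
            "+Adam is doing OK.",
            "++++This is insane"
        };
        for (String line : lines) {
            switch (countLeadingPluses(line)) {
                case 3: System.out.println("Filtering 3 +s: " + line); break;
                case 2: System.out.println("Filtering 2 +s: " + line); break;
                case 1: System.out.println("Filtering 1 +: " + line); break;
                default: System.out.println("No filter: " + line);
            }
        }
    }
}
```

Note that a line consisting only of plus signs is reported as 0 here, because the pattern requires at least one non-plus character after them.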

How do I escape '+' in pattern matching to highlight keyword?

I'm implementing a keyword highlighter in Java. I'm using java.util.regex.Pattern to highlight (making bold) keyword within String content. The following piece of code is working fine for alphanumeric keywords, but it is not working for some special characters. For example, in String content, I would like to highlight the keyword c++ which has the special character + (plus), but it's not getting highlighted properly. How do I escape + character so that c++ is highlighted?
public static void main(String[] args)
{
String content = "java,c++,ejb,struts,j2ee,hibernate";
System.out.println("CONTENT: " + content);
String highlight = "C++";
System.out.println("HIGHLIGHT KEYWORD: " + highlight);
//highlight = highlight.replaceAll(Pattern.quote("+"), "\\\\+");
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("\\b" + highlight + "\\b", java.util.regex.Pattern.CASE_INSENSITIVE);
System.out.println("PATTERN: " + pattern.pattern());
java.util.regex.Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
System.out.println("Match found!!!");
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
content = matcher.replaceAll("<B>" + matcher.group(i) + "</B>");
}
}
System.out.println("RESULT: " + content);
}
Output:
CONTENT: java,c++,ejb,struts,j2ee,hibernate
HIGHLIGHT KEYWORD: C++
PATTERN: \bC++\b
Match found!!!
c
RESULT: java,c++,ejb,struts,j2ee,hibernate
I even tried to escape '+' before calling Pattern.compile like this,
highlight = highlight.replaceAll(Pattern.quote("+"), "\\\\+");
but still I'm not able to get the syntax right. Can somebody help me solve this?
This should do what you need:
Pattern pattern = Pattern.compile(
"\\b"
+ Pattern.quote(highlight)
+ "\\b",
Pattern.CASE_INSENSITIVE);
Update: you are right, the above doesn't work for C++ (\b matches word boundaries and doesn't recognize ++ as a word). We need a more complicated solution:
Pattern pattern = Pattern.compile(
"\\b"
+ Pattern.quote(highlight)
+ "(?![^\\p{Punct}\\s])", // matches if the match is not followed by
// anything other than whitespace or punctuation
Pattern.CASE_INSENSITIVE);
Update in response to comments: it seems that you need more logic in your pattern creation. Here's a helper method to create the pattern for you:
private static final String WORD_BOUNDARY = "\\b";
// edit this to suit your needs:
private static final String ALLOWED = "[^,.!\\-\\s]";
private static final String LOOKAHEAD = "(?!" + ALLOWED + ")";
private static final String LOOKBEHIND = "(?<!" + ALLOWED + ")";
public static Pattern createHighlightPattern(final String highlight) {
final Pattern pattern = Pattern.compile(
(Character.isLetterOrDigit(highlight.charAt(0))
? WORD_BOUNDARY : LOOKBEHIND)
+ Pattern.quote(highlight)
+ (Character.isLetterOrDigit(highlight.charAt(highlight.length() - 1))
? WORD_BOUNDARY : LOOKAHEAD),
Pattern.CASE_INSENSITIVE);
return pattern;
}
And here is some test code to check that it works:
private static void testMatch(final String haystack, final String needle) {
final Matcher matcher = createHighlightPattern(needle).matcher(haystack);
if (!matcher.find())
System.out.println("Failed to find pattern " + needle);
while (matcher.find())
System.out.println("Found additional match: " + matcher.group() +
" for pattern " + needle);
}
public static void main(final String[] args) {
final String testString = "java,c++,hibernate,.net,asp.net,c#,spring";
testMatch(testString, "java");
testMatch(testString, "c++");
testMatch(testString, ".net");
testMatch(testString, "c#");
}
When I run this method, I don't see any output (which is good :-))
The problem is that the \b word boundary anchor does not match, because + is a non-word character and I assume it is followed by whitespace, which is also a non-word character.
A word boundary \b matches at a change from a word character (a member of \w) to a non-word character (not a member of \w), or the other way round.
Also, if you want to match a + literally, you have to escape it. As written, C++ means "match at least one C": the ++ is a possessive quantifier that matches one or more C and does not backtrack.
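Both effects can be observed with a small sketch. The class name WordBoundaryDemo, the helper firstMatch, and the shortened content string are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: first case-insensitive match of regex in content, or null.
    static String firstMatch(String regex, String content) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(content);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String content = "java,c++,ejb";

        // Unescaped: "C++" is read as 'C' with the possessive quantifier "++",
        // so only the letter is matched.
        System.out.println(firstMatch("\\bC++\\b", content));

        // Escaped with Pattern.quote, but the trailing \b cannot match
        // between '+' and ',' because both are non-word characters.
        System.out.println(firstMatch("\\b" + Pattern.quote("C++") + "\\b", content));

        // Escaped and without the trailing \b: the full keyword is found.
        System.out.println(firstMatch("\\b" + Pattern.quote("C++"), content));
    }
}
```

The three calls print c, null and c++ respectively, so escaping alone is not enough: the trailing boundary check also has to change.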
Try changing your pattern to something like this:
java.util.regex.Pattern.compile("\\b" + highlight + "(?=\\s)", java.util.regex.Pattern.CASE_INSENSITIVE);
(?=\s) is a positive lookahead that checks whether whitespace follows your highlight.
Additionally, you will need to escape the + you are searching for.
All you need is this:
Pattern.compile("\\Q"+highlight+"\\E", java.util.regex.Pattern.CASE_INSENSITIVE);
Assuming your keyword does not begin or end with punctuation, here is a commented regex which uses lookahead and lookbehind to achieve your desired matching behavior:
// Compile regex to match a keyword or keyphrase.
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(
"(?<=[\\s'\".?!,;:]|^) # Word preceded by ws, quote, punct or BOS.\n" +
// Escape any regex metacharacters in the keyword phrase.
java.util.regex.Pattern.quote(highlight) + " # Keyword to be matched.\n" +
"(?=[\\s'\".?!,;:]|$) # Word followed by ws, quote, punct or EOS.",
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
Note that this solution works even if your keyword is a phrase containing spaces.
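To show the same lookaround idea applied to the actual highlighting step, here is a minimal sketch. It is a simplified variant without Pattern.COMMENTS, not the answer's exact code; the class name KeywordHighlighter, the constant DELIMITERS and the method highlight are hypothetical names.

```java
import java.util.regex.Pattern;

class KeywordHighlighter {
    // Characters treated as keyword delimiters (same set as in the regex above).
    private static final String DELIMITERS = "[\\s'\".?!,;:]";

    // Hypothetical helper: wraps every standalone occurrence of keyword in <B>...</B>.
    static String highlight(String content, String keyword) {
        Pattern pattern = Pattern.compile(
                "(?<=" + DELIMITERS + "|^)"     // preceded by a delimiter or start of input
                + Pattern.quote(keyword)        // keyword taken literally, metacharacters escaped
                + "(?=" + DELIMITERS + "|$)",   // followed by a delimiter or end of input
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // $0 refers to the whole match, so the original casing in content is kept.
        return pattern.matcher(content).replaceAll("<B>$0</B>");
    }

    public static void main(String[] args) {
        String content = "java,c++,ejb,struts,j2ee,hibernate";
        System.out.println(highlight(content, "C++"));
        // prints java,<B>c++</B>,ejb,struts,j2ee,hibernate
    }
}
```

Using $0 in the replacement avoids calling matcher.group() inside a find() loop, which is what the code in the question does.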
