Making group constraints in Regex in android

So I have this String:
String articleContent = "dfgh{jdf%g{%qf234ad%22!#$56a%}vzsams{%3%45%}678456{78";
I want to remove everything between {% and %}, including the delimiters themselves.
So the result would be something like:
dfgh{jdf%gvzsams678456{78
I tried this:
String regex = "[{%][^[%}]&&\\p{Graph}]*[%}]";
String abc = articleContent.replaceAll(regex, "");
But what I get is:
dfghfgqf234ad}vzsams3}678456{78
I suppose what I'm doing wrong is that I can't make "{%" a group: [{%] is a character class, which acts like an OR condition ({ or %).
Any suggestions?
EDIT 1:
The string above is just an example. Any special characters can appear between {% and %}, not only ! and %.

You can do it with this pattern:
String regex = "\\{%(?>[^%]++|%(?!}))*%}";
explanations:
The goal of this pattern is to keep backtracking to a minimum:
\\{% # { needs to be escaped
(?> # open an atomic group *
[^%]++ # all characters but %, one or more times (possessive *)
| # OR
%(?!}) # % not followed by } (<-no need to escape)
)* # close the atomic group, repeat zero or more times
%}
(* more information about possessive quantifiers and atomic groups)
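A minimal sketch of the pattern above in use; the class and method names (StripTemplateTags, strip) are made up for illustration. It applies the atomic-group pattern to the string from the question with String.replaceAll.

```java
class StripTemplateTags {
    // {% ... %} blocks: [^%]++ eats runs of non-% characters possessively,
    // %(?!}) accepts a lone % that does not close the block.
    static final String TAG_REGEX = "\\{%(?>[^%]++|%(?!}))*%}";

    static String strip(String text) {
        return text.replaceAll(TAG_REGEX, "");
    }

    public static void main(String[] args) {
        String articleContent = "dfgh{jdf%g{%qf234ad%22!#$56a%}vzsams{%3%45%}678456{78";
        System.out.println(strip(articleContent)); // dfgh{jdf%gvzsams678456{78
    }
}
```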

Try this way
String abc = articleContent.replaceAll("\\{%.*?%}", "");
Since { is a special character, you need to escape it. You can do this with \\{ or [{].
Now, to match all characters between {% and %} you could use \{%.*%}, but the * quantifier is greedy, so it will match the longest possible substring, from the first {% to the last %}. To make it match the shortest substring instead, add ? after *, which makes it reluctant.
You can find more info about quantifiers here.
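To see the difference the ? makes, this sketch (hypothetical class and method names) runs both the greedy and the reluctant pattern on the question's string. Note that . does not match line terminators by default, so {% ... %} blocks spanning several lines would also need the (?s) flag.

```java
class GreedyVsReluctant {
    // Greedy: .* runs to the end, then backs off only as far as the LAST %}
    static String greedy(String s)    { return s.replaceAll("\\{%.*%}", ""); }
    // Reluctant: .*? grows one character at a time, stopping at the FIRST %}
    static String reluctant(String s) { return s.replaceAll("\\{%.*?%}", ""); }

    public static void main(String[] args) {
        String s = "dfgh{jdf%g{%qf234ad%22!#$56a%}vzsams{%3%45%}678456{78";
        System.out.println(greedy(s));    // dfgh{jdf%g678456{78
        System.out.println(reluctant(s)); // dfgh{jdf%gvzsams678456{78
    }
}
```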

Related

How to remove everything after specific character in string using Java

I have a string that looks like this:
analitics#gmail.com#5
And it represents my userId.
I need to pass that userId as a parameter to a function, with the number 5 after the second # removed and a new number appended.
I started with something like this:
userService.getUser(user.userId.substring(0, userAfterMigration.userId.indexOf("#") + 1) + 3);
What is the best way to remove everything that comes after the second # character in the string above using Java?

Here is a splitting option:
String input = "analitics#gmail.com#5";
String output = String.join("#", input.split("#")[0], input.split("#")[1]) + "#";
System.out.println(output); // analitics#gmail.com#
Assuming your input only ever has two # symbols, you could use a regex replacement here:
String input = "analitics#gmail.com#5";
String output = input.replaceAll("#[^#]*$", "#");
System.out.println(output); // analitics#gmail.com#
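A small sketch combining either approach with the goal from the question, appending a new number after the trimmed id. The class name, the method names and the value 3 are illustrative assumptions.

```java
class ReplaceUserIdSuffix {
    // Split approach: keep the first two #-separated parts, then re-append the separator.
    static String viaSplit(String userId) {
        String[] parts = userId.split("#");
        return String.join("#", parts[0], parts[1]) + "#";
    }

    // Regex approach: replace the last # and everything after it with a single #.
    static String viaRegex(String userId) {
        return userId.replaceAll("#[^#]*$", "#");
    }

    public static void main(String[] args) {
        String input = "analitics#gmail.com#5";
        int newNumber = 3; // hypothetical new value, as in the question's "+ 3"
        System.out.println(viaSplit(input) + newNumber); // analitics#gmail.com#3
        System.out.println(viaRegex(input) + newNumber); // analitics#gmail.com#3
    }
}
```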

You can capture in group 1 what you want to keep, and match what comes after it so that it gets removed.
In the replacement, use capture group 1, referenced as $1:
^((?:[^#\s]+#){2}).+
^ Start of string
( Capture group 1
(?:[^#\s]+#){2} Repeat 2 times matching 1+ chars other than #, and then match the #
) Close group 1
.+ Match 1 or more characters that you want to remove
String s = "analitics#gmail.com#5";
System.out.println(s.replaceAll("^((?:[^#\\s]+#){2}).+", "$1"));
Output
analitics#gmail.com#
If the string can also start with ##1 and you want to keep ##, you can use this instead:
^((?:[^#]*#){2}).+
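A sketch comparing the two patterns; the class and method names are hypothetical. The first pattern needs at least one non-# character before each #, so it leaves "##1" unchanged, while the second keeps "##".

```java
class KeepFirstTwoSegments {
    // Group 1 keeps everything up to and including the second #; .+ is dropped.
    static String strict(String s)  { return s.replaceAll("^((?:[^#\\s]+#){2}).+", "$1"); }
    // [^#]* also allows empty segments, so "##1" becomes "##".
    static String lenient(String s) { return s.replaceAll("^((?:[^#]*#){2}).+", "$1"); }

    public static void main(String[] args) {
        System.out.println(strict("analitics#gmail.com#5")); // analitics#gmail.com#
        System.out.println(strict("##1"));                   // ##1 (no match)
        System.out.println(lenient("##1"));                  // ##
    }
}
```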

The simplest way that would seem to work for you:
str = str.replaceAll("#[^.]*$", "");
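This matches a # followed by non-dot characters up to the end of the string and replaces the match with an empty string, deleting it. In this input only the second # qualifies, because the first one is followed by a dot.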
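A minimal check of this one-liner (hypothetical names). Unlike the other answers it also deletes the # itself, so to append a new number you would add "#" back; and because of [^.], it only matches when the part after that # contains no dot.

```java
class DropLastHashSegment {
    // Deletes "#" plus any non-dot characters running to the end of the string.
    static String drop(String s) {
        return s.replaceAll("#[^.]*$", "");
    }

    public static void main(String[] args) {
        String str = drop("analitics#gmail.com#5");
        System.out.println(str);            // analitics#gmail.com
        System.out.println(str + "#" + 3);  // analitics#gmail.com#3 (3 is a made-up new number)
    }
}
```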

REGEX: GETTING VALUE OF href="" EXCEPT FOR PARTICULAR STRING

Here's my regex code:
\\s*(?i)href\\s*=\\s*(\"(([^\"]*\")|'[^']*'|([^'\">\\s]+)))
Actually, the real problem is this: I want to change the value of every href that matches, except for these two: <link href="foo.css"> and <link href="boo.ico">. I want to keep the values of these two strings.
Pattern p = Pattern.compile(HTML_A_HREF);
Matcher m = p.matcher(getLine());
setNewLine(m.replaceAll((String.format("%-1s", sp))+"href=\"javascript:history.go(0)\"" + (String.format("%-1s", sp))));
getLine() returns the HTML file itself, and sp is declared as:
String sp = "";

Your regex is off. To show you, let me explode it:
\\s*(?i)href\\s*=\\s*
(\"
(
([^\"]*\")
|
'[^']*'
|
([^'\">\\s]+)
)
)
The leading double-quote is outside the multi-choice block. It needs to be in the first choice section.
Also:
You should put (?i) first.
With \" inside the first choice, one set of parentheses goes away.
You don't need parentheses in the choice sections.
The parentheses around the choice block should be non-capturing.
So, that means:
(?i)\\s*href\\s*=\\s*
(?:
\"[^\"]*\"
|
'[^']*'
|
[^'\">\\s]+
)
Which is (?i)\\s*href\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^'\">\\s]+).
As for the replacement code:
String sp = "";
m.replaceAll((String.format("%-1s", sp))
+
"href=\"javascript:history.go(0)\""
+
(String.format("%-1s", sp))
)
What is the purpose of (String.format("%-1s", sp)) when sp = ""? That formats an empty string left-aligned in a field at least 1 character wide, which is just a single space, i.e. " ". So why all that overhead?
m.replaceAll(" href=\"javascript:history.go(0)\" ")
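A quick check of that claim; the class and the helper name pad are made up for this sketch.

```java
class FormatPaddingDemo {
    // %-1s: left-justify in a field at least 1 character wide
    static String pad(String s) {
        return String.format("%-1s", s);
    }

    public static void main(String[] args) {
        String sp = "";
        System.out.println("[" + pad(sp) + "]"); // [ ]
    }
}
```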
Finally, you want to exclude foo.css and boo.ico.
One way to do that is with a negative lookahead. Since you have 3 choices, you need to repeat it 3 times:
(?i)\\s*href\\s*=\\s*
(?:
\"(?!foo\\.css|boo\\.ico)[^\"]*\"
|
'(?!foo\\.css|boo\\.ico)[^']*'
|
(?!foo\\.css|boo\\.ico)[^'\">\\s]+
)
I'll let you collapse that back to one line.
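Collapsed to one line, the lookahead version might be used like this. This is a sketch: the class and method names and the sample HTML are assumptions, and the replacement string is the simplified one from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRewriteLookahead {
    // Lookahead version on one line: values starting with foo.css or boo.ico are kept.
    static final Pattern HREF = Pattern.compile(
        "(?i)\\s*href\\s*=\\s*(?:\"(?!foo\\.css|boo\\.ico)[^\"]*\""
        + "|'(?!foo\\.css|boo\\.ico)[^']*'"
        + "|(?!foo\\.css|boo\\.ico)[^'\">\\s]+)");

    static String rewrite(String html) {
        Matcher m = HREF.matcher(html);
        return m.replaceAll(" href=\"javascript:history.go(0)\" ");
    }

    public static void main(String[] args) {
        String html = "<link href=\"foo.css\"><link href=\"bar.css\">";
        System.out.println(rewrite(html));
        // <link href="foo.css"><link href="javascript:history.go(0)" >
    }
}
```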
UPDATE
If you want to exclude all .css and .ico files, use a negative lookbehind instead.
Also, I forgot to escape the . before, sorry. Fixed that.
(?i)\\s*href\\s*=\\s*
(?:
\"[^\"]*(?<!\\.css|\\.ico)\"
|
'[^']*(?<!\\.css|\\.ico)'
|
[^'\">\\s]+(?<!\\.css|\\.ico)
)
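A sketch of the lookbehind version in use (hypothetical names and sample HTML). One caveat worth checking: in the unquoted branch, [^'\">\\s]+ can give back characters until the lookbehind passes, so href=site.css would match as href=site.cs. This sketch makes that branch possessive (++), which is a change from the answer, so the whole value is tested at once.

```java
import java.util.regex.Pattern;

class HrefRewriteLookbehind {
    // Any value ending in .css or .ico is kept; everything else is rewritten.
    // The unquoted branch uses ++ (possessive) so it cannot backtrack past the lookbehind.
    static final Pattern HREF = Pattern.compile(
        "(?i)\\s*href\\s*=\\s*(?:\"[^\"]*(?<!\\.css|\\.ico)\""
        + "|'[^']*(?<!\\.css|\\.ico)'"
        + "|[^'\">\\s]++(?<!\\.css|\\.ico))");

    static String rewrite(String html) {
        return HREF.matcher(html).replaceAll(" href=\"javascript:history.go(0)\" ");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<link href=\"site.css\"><a href=\"page.html\">x</a>"));
        // <link href="site.css"><a href="javascript:history.go(0)" >x</a>
        System.out.println(rewrite("<a href=site.css>")); // unchanged
    }
}
```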

Java regex to replace whitespaces in css class string with dots

Given the class attribute of an HTML element, I want to generate a String suitable for a CSS selector.
For instance, the element's class attribute value:
' foo bar baz '
OR
'foo bar baz '
Should both become a css selector:
'.foo.bar.baz'
Right now, I'm using:
String classCssSelector = "." + classProp.trim().replaceAll("\\s+", ".");
But I want to get rid of the leading hardcoded "." + and the trim() and make it all one replaceAll call.
EDIT:
I used the regex provided in @anubhava's answer, but then wanted it to also handle an already 'dotted' class like this:
' foo bar baz '
The following regex works for both cases:
^(?!\.)| +(?= *\S)

You can use this regex in replaceAll:
^ *| +(?= *\S)
Code:
String classCssSelector = classProp.replaceAll("^ *| +(?= *\\S)", ".");
Explanation:
^ - Match the beginning
^ * - Match 0 or more spaces at the start
| - Alternation in regex
 + - Match 1 or more spaces (the part after the |)
(?= *\S) - Lookahead to make sure there is at least one non-space character after the spaces matched
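A sketch of the one-call version (hypothetical class and method names). As written, the pattern does not touch trailing spaces, because they are not followed by a non-space character, so they stay in the result.

```java
class CssSelectorDemo {
    // Leading spaces (or the empty start) and inner runs of spaces each become one dot.
    static String toSelector(String classProp) {
        return classProp.replaceAll("^ *| +(?= *\\S)", ".");
    }

    public static void main(String[] args) {
        System.out.println(toSelector("  foo bar  baz"));          // .foo.bar.baz
        System.out.println("[" + toSelector("foo bar baz ") + "]"); // [.foo.bar.baz ]
    }
}
```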

Weird password check matching using regex in Java

I'm trying to check a password with the following constraints:
at least 9 characters
at least 1 upper case letter
at least 1 lower case letter
at least 1 special character from the following list:
~ ! @ # $ % ^ & * ( ) _ - + = { } [ ] | : ; " ' < > , . ?
no accented letters
Here's the code I wrote:
Pattern pattern = Pattern.compile(
"(?!.*[âêôûÄéÆÇàèÊùÌÍÎÏÐîÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ€£])"
+ "(?=.*\\d)"
+ "(?=.*[a-z])"
+ "(?=.*[A-Z])"
+ "(?=.*[`~!@#$%^&*()_\\-+={}\\[\\]\\\\|:;\"'<>,.?/])"
+ ".{9,}");
Matcher matcher = pattern.matcher(myNewPassword);
if (matcher.matches()) {
//do what you've got to do when you
}
The issue is that some characters like € or £ don't make the password invalid.
I don't understand why this is working that way since I explicitly exclude € and £ from the authorized list.

Rather than trying to disallow those non-ASCII characters, why not make your regex accept only ASCII characters, like this:
Pattern pattern = Pattern.compile(
"(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\\p{Punct})\\p{ASCII}{9,}");
Also note the use of \p{Punct} (ASCII punctuation) instead of the big character class; \p{Print} would not enforce the special-character rule, because it also matches letters and digits. I believe that should suffice for you.
Check the Pattern Javadoc for more details.
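A sketch of the ASCII-only check with matches() (the class name and isValid are hypothetical). The € in the third sample is written as \u20AC so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class PasswordAsciiCheck {
    // Digit, lower case, upper case and ASCII punctuation required; only ASCII allowed overall.
    private static final Pattern PASSWORD = Pattern.compile(
        "(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\\p{Punct})\\p{ASCII}{9,}");

    static boolean isValid(String password) {
        return PASSWORD.matcher(password).matches(); // matches() anchors the whole input
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abcdefgh!1"));       // true
        System.out.println(isValid("Abcdefgh12"));       // false: no special character
        System.out.println(isValid("Abcdefg!\u20AC1"));  // false: € is not ASCII
    }
}
```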

This only allows printable ASCII. Note that it allows the space character; you could disallow spaces by starting the range at \x21 instead.
Edit: I didn't see a digit in your list of requirements, only in your regex, so I wasn't sure whether to require one.
# "^(?=.*[A-Z])(?=.*[a-z])(?=.*[`~!@#$%^&*()_\\-+={}\\[\\]|:;\"'<>,.?])[\\x20-\\x7E]{9,}$"
^
(?= .* [A-Z] )
(?= .* [a-z] )
(?= .* [`~!@#$%^&*()_\-+={}\[\]|:;"'<>,.?] )
[\x20-\x7E]{9,}
$
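A sketch of this printable-ASCII pattern as a Java string (hypothetical class and method names); é is written as \u00e9 to keep the source ASCII.

```java
import java.util.regex.Pattern;

class PrintableAsciiPassword {
    // Upper case, lower case and one listed special character; 9+ chars in \x20-\x7E.
    private static final Pattern PASSWORD = Pattern.compile(
        "^(?=.*[A-Z])(?=.*[a-z])(?=.*[`~!@#$%^&*()_\\-+={}\\[\\]|:;\"'<>,.?])[\\x20-\\x7E]{9,}$");

    static boolean isValid(String password) {
        return PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abcdefg!x"));      // true
        System.out.println(isValid("Abcdefg x"));      // false: space is not in the special list
        System.out.println(isValid("Abcdefg!\u00e9")); // false: outside \x20-\x7E
    }
}
```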

RegEx split string on a delimiter (semicolon ;) except those that appear inside a string

I have a Java String which is actually an SQL script.
CREATE OR REPLACE PROCEDURE Proc
AS
b NUMBER:=3;
c VARCHAR2(2000);
begin
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
end Proc;
I want to split the script on semicolons, except those that appear inside a string literal.
The desired output is the following four strings:
1- CREATE OR REPLACE PROCEDURE Proc AS b NUMBER:=3
2- c VARCHAR2(2000)
3- begin c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
4- end Proc
Java's split() method would also split the following line into tokens, but I want to keep it intact because its semicolons are inside quotes:
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
Output of split() for that line:
1- c := 'BEGIN ' || ' :1 := :1 + :2
2- ' || 'END
3- '
Please suggest a regex that splits the string on semicolons, except those that appear inside a string literal.
===================== CASE-2 ========================
The section above has been answered, and the answer works.
Here is another, more complex case:
======================================================
I have an SQL script and I want to tokenize each SQL query. Queries are separated by either a semicolon (;) or a forward slash (/).
1- Semicolons and / signs that appear inside a string literal must be ignored, e.g.
...WHERE col1 = 'some ; name/' ..
2- The expression must also ignore anything inside multiline comments (/* ... */).
Here is the input
/*Query 1*/
SELECT
*
FROM tab t
WHERE (t.col1 in (1, 3)
and t.col2 IN (1,5,8,9,10,11,20,21,
22,23,24,/*Reaffirmed*/
25,26,27,28,29,30,
35,/*carnival*/
75,76,77,78,79,
80,81,82, /*Damark accounts*/
84,85,87,88,90))
;
/*Query 2*/
select * from table
/
/*Query 3*/
select col form tab2
;
/*Query 4*/
select col2 from tab3 /*this is a multi line comment*/
/
Desired Result
[1]: /*Query 1*/
SELECT
*
FROM tab t
WHERE (t.col1 in (1, 3)
and t.col2 IN (1,5,8,9,10,11,20,21,
22,23,24,/*Reaffirmed*/
25,26,27,28,29,30,
35,/*carnival*/
75,76,77,78,79,
80,81,82, /*Damark accounts*/
84,85,87,88,90))
[2]:/*Query 2*/
select * from table
[3]: /*Query 3*/
select col form tab2
[4]:/*Query 4*/
select col2 from tab3 /*this is a multi line comment*/
Half of this can already be achieved with what was suggested to me in the previous post, but once comment syntax (/*) appears in the queries and queries can also be separated by a forward slash (/), the expression no longer works.

The regular expression pattern ((?:(?:'[^']*')|[^;])*); should give you what you need. Use a while loop and Matcher.find() to extract all the SQL statements. Something like:
Pattern p = Pattern.compile("((?:(?:'[^']*')|[^;])*);");
Matcher m = p.matcher(s);
int cnt = 0;
while (m.find()) {
System.out.println(++cnt + ": " + m.group(1));
}
Using the sample SQL you provided, this will output:
1: CREATE OR REPLACE PROCEDURE Proc
AS
b NUMBER:=3
2:
c VARCHAR2(2000)
3:
begin
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;'
4:
end Proc
If you want to get the terminating ;, use m.group(0) instead of m.group(1).
For more information on regular expressions, see the Pattern JavaDoc and this great reference. Here's a synopsis of the pattern:
( Start capturing group
(?: Start non-capturing group
(?: Start non-capturing group
' Match the literal character '
[^'] Match a single character that is not '
* Greedily match the previous atom zero or more times
' Match the literal character '
) End non-capturing group
| Match either the previous or the next atom
[^;] Match a single character that is not ;
) End non-capturing group
* Greedily match the previous atom zero or more times
) End capturing group
; Match the literal character ;
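A self-contained version of the loop above (the class name and split method are hypothetical). Each statement is trimmed only for tidier output. Text after the last ; is not returned, because the pattern requires a terminating semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSqlOnSemicolons {
    // Quoted literals are consumed whole, so semicolons inside them are not terminators.
    private static final Pattern STATEMENT = Pattern.compile("((?:(?:'[^']*')|[^;])*);");

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(script);
        while (m.find()) {
            statements.add(m.group(1).trim()); // group(1) excludes the terminating ;
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "CREATE OR REPLACE PROCEDURE Proc\n"
                + "AS\n"
                + "b NUMBER:=3;\n"
                + "c VARCHAR2(2000);\n"
                + "begin\n"
                + "c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';\n"
                + "end Proc;";
        int cnt = 0;
        for (String stmt : split(script)) {
            System.out.println(++cnt + ": " + stmt);
        }
    }
}
```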

What you might try is simply splitting on ";". Then, for each piece that contains an odd number of single quotes ('), concatenate it with the following pieces, adding the ";" back in, until it contains an even number of quotes.
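A sketch of this quote-parity idea (class and method names are hypothetical). It does not handle comments or backslash escapes; doubled quotes ('') keep the count even, so they work out naturally.

```java
import java.util.ArrayList;
import java.util.List;

class SplitByQuoteParity {
    static long countQuotes(CharSequence s) {
        return s.chars().filter(c -> c == '\'').count();
    }

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = null;
        // -1 keeps trailing empty pieces so every ';' is accounted for
        for (String piece : script.split(";", -1)) {
            if (current == null) {
                current = new StringBuilder(piece);
            } else {
                current.append(';').append(piece); // put back the ';' removed by split
            }
            if (countQuotes(current) % 2 == 0) {   // quotes balanced: statement complete
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) statements.add(stmt);
                current = null;
            }
        }
        if (current != null) statements.add(current.toString().trim()); // unterminated literal
        return statements;
    }

    public static void main(String[] args) {
        String script = "b NUMBER:=3;\nc := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';\nend Proc;";
        split(script).forEach(System.out::println);
    }
}
```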

I was having the same issue. I saw previous recommendations and decided to improve handling for:
Comments
Escaped single quotes
Single queries not terminated by a semicolon
My solution is written for Java. Things such as backslash escaping and DOTALL mode may differ from one language to another.
This worked for me: "(?s)\\s*((?:'(?:\\\\.|[^\\\\']|'')*'|/\\*.*?\\*/|(?:--|#)[^\r\n]*|[^\\\\'])*?)(?:;|$)"
"
(?s) DOTALL mode. Means the dot includes \r\n
\\s* Initial whitespace
(
(?: Grouping content of a valid query
' Open string literal
(?: Grouping content of a string literal expression
\\\\. Any escaped character. Doesn't matter if it's a single quote
|
[^\\\\'] Any character which isn't escaped. Escaping is covered above.
|
'' Escaped single quote
) Any of these regexps are valid in a string literal.
* The string can be empty
' Close string literal
|
/\\* C-style comment start
.*? Any characters, but as few as possible (doesn't include */)
\\*/ C-style comment end
|
(?:--|#) SQL comment start
[^\r\n]* One line comment which ends with a newline
|
[^\\\\'] Anything which doesn't have to do with a string literal
) These four tokens basically define the contents of a query
*? Lazy repetition, so the query ends at the first ; (or the end of text) that is not inside a string or comment
)
(?:;|$) After a series of query tokens, find ; or EOT
"
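A sketch running this pattern with Matcher.find() on a small made-up script (the class name, method name and sample input are assumptions). Because (?:;|$) can also match an empty statement at the very end of the input, blank results are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSqlWithComments {
    // String literals (with \x and '' escapes), /* */ and --/# comments are consumed whole.
    private static final Pattern STATEMENT = Pattern.compile(
        "(?s)\\s*((?:'(?:\\\\.|[^\\\\']|'')*'|/\\*.*?\\*/|(?:--|#)[^\r\n]*|[^\\\\'])*?)(?:;|$)");

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(script);
        while (m.find()) {
            String stmt = m.group(1).trim();
            if (!stmt.isEmpty()) statements.add(stmt); // skip the empty match at end of input
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "INSERT INTO t VALUES ('it''s; ok');\n"
                + "SELECT 1 /* a; b */;\n"
                + "SELECT 2";
        split(script).forEach(System.out::println);
    }
}
```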
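As for your second case, note that the last part of the regexp defines how a query ends. Right now it only accepts a semicolon or the end of text, but you can add whatever you want there. For example, (?:;|@|/|$) accepts @ and slash as ending characters. I haven't tested this for your case, but it shouldn't be hard. Keep in mind that because of the lazy *?, the ending is tried before each token, so a bare / alternative would also end a query at the / that opens a /* comment; the slash needs to be restricted further, for example to a / on a line of its own.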

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
