Java grouping regex fails to match string+text


I wrote this test:
@Test
public void removeRequestTextFromRouteError() throws Exception {
String input = "Failed to handle request regression_4828 HISTORIC_TIME from=s:33901510 tn:27825741 bd:false st:Winifred~Dr to=s:-1 d:false f:-1.0 x:-73.92752 y:40.696857 r:-1.0 cd:-1.0 fn:-1 tn:-1 bd:true 1 null false null on subject RoutingRequest";
final String answer = stringUtils.removeRequestTextFromError(input);
String expected = "Failed to handle request _ on subject RoutingRequest";
assertThat(answer, equalTo(expected));
}
It calls this method, but the test fails:
public String removeRequestTextFromError(String answer) {
answer = answer.replaceAll("regression_\\d\\[.*?\\] on subject", "_ on subject");
return answer;
}
The input text stays the same; the request text is not replaced with "_".
How can I change the pattern to fix this?

You are using the wrong regex. The escaped \\[ and \\] match literal square brackets, which do not occur in the input, and \\d matches a single digit where you need \\d+. Also, you should use a positive look-ahead instead of actually matching and replacing the string "on subject".
Use:
public static void main(String[] args) {
String input = "Failed to handle request regression_4828 HISTORIC_TIME from=s:33901510 tn:27825741 bd:false st:Winifred~Dr to=s:-1 d:false f:-1.0 x:-73.92752 y:40.696857 r:-1.0 cd:-1.0 fn:-1 tn:-1 bd:true 1 null false null on subject RoutingRequest";
final String answer = input.replaceAll("regression_.* (?=on subject)", "_ ");
System.out.println(answer);
String expected = "Failed to handle request _ on subject RoutingRequest";
System.out.println(answer.equals(expected));
}
Output:
Failed to handle request _ on subject RoutingRequest
true
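For comparison, the original pattern can also be repaired without a look-ahead: match the digits with \\d+, match the rest lazily with .*?, and put " on subject" back in the replacement. This is only a minimal sketch; the class name RequestTextRemover is hypothetical and the input is shortened.

```java
public class RequestTextRemover {
    // Replaces everything from "regression_<digits>" up to the first
    // " on subject" with "_", then re-inserts " on subject".
    public static String removeRequestTextFromError(String answer) {
        return answer.replaceAll("regression_\\d+.*? on subject", "_ on subject");
    }

    public static void main(String[] args) {
        String input = "Failed to handle request regression_4828 HISTORIC_TIME "
                + "x:-73.92752 y:40.696857 null false null on subject RoutingRequest";
        // prints: Failed to handle request _ on subject RoutingRequest
        System.out.println(removeRequestTextFromError(input));
    }
}
```

The lazy quantifier stops at the first " on subject", whereas the greedy .* in the look-ahead version runs to the last one; for this input both give the same result.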

As an alternative to the answer given by @TheLostMind, you can break your input into 3 pieces, the second piece being the part you want to match and replace.
Each parenthesized sub-pattern, if matched, is available as a capture group. Here is the regex with the capture groups labelled:
(.*)(regression_\\d+.*)( on subject.*)
$1 $2 $3
You want to retain $1 and $3 and put "_" in place of $2:
public String removeRequestTextFromError(String answer) {
return answer.replaceAll("(.*)(regression_\\d+.*)( on subject.*)", "$1_$3");
}

Related

Mask Email in JSON String Using regex

I have an email masking regex and now I am trying to apply it on JSON Strings for masking email.
Regex: (?<=.{1})(?=[a-zA-Z0-9]).(?=.*#)
It works fine if we apply it on email in String variable.
String s = "test.ing%02#gmail.com";
s= s.replaceAll("(?<=.{1})(?=[a-zA-Z0-9]).(?=.*#)", "*");
Output: t***.***%**#gmail.com
Now I am trying to apply it to a JSON string that contains an email field. I anchored the pattern to the email field, but the regex does not match its value:
String jsonString = "{ \"name\":\"jhon\", \"email\":\"test.ing%02#gmail.com\" }";
String result = jsonString.replaceAll("(?<=email\":\")((?<=.{1})(?=[a-zA-Z0-9]).(?=.*#))(?=\")", "*");
System.out.println(result);
Actual Output: { "name":"jhon", "email":"test.ing%02#gmail.com" }
Expected Output: { "name":"jhon", "email":"t***.***%**#gmail.com" }

You might update the pattern to make use of a finite lookbehind assertion:
(?<=email":"[^\s"]{1,100})[a-zA-Z0-9](?=[^\s"#]*#)
The pattern in parts:
(?<=email":" Positive lookbehind, assert email":" to the left
[^\s"]{1,100} Match 1-100 times a non whitespace char other than " to the left (adjust the quantifier as needed)
) Close the lookbehind
[a-zA-Z0-9] Match a single char a-zA-Z0-9
(?=[^\s"#]*#) Positive lookahead, assert a # to the right without crossing double quotes
Example in Java with the doubled backslashes and escaped double quotes:
String jsonString = "{ \"name\":\"jhon\", \"email\":\"test.ing%02#gmail.com\" }";
String result = jsonString.replaceAll("(?<=email\":\"[^\\s\"]{1,100})[a-zA-Z0-9](?=[^\\s\"#]*#)", "*");
System.out.println(result);
Output
{ "name":"jhon", "email":"t***.***%**#gmail.com" }
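If the bounded lookbehind feels fragile, another option is to capture the email value first and mask only its local part. The following is a rough sketch under a few assumptions: Java 9 or later (for Matcher.replaceAll with a function), the key and value written exactly as "email":"..." with no whitespace around the colon, and # as the separator, as in the sample above. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonEmailMasker {
    // Group 1: the key prefix, group 2: the local part, group 3: "#domain" plus closing quote.
    private static final Pattern EMAIL_FIELD =
            Pattern.compile("(\"email\":\")([^\"#]+)(#[^\"]*\")");

    // Masks every letter or digit except the very first character of the local part.
    static String maskLocalPart(String local) {
        return local.replaceAll("(?<=.)[a-zA-Z0-9]", "*");
    }

    public static String maskEmails(String json) {
        Matcher m = EMAIL_FIELD.matcher(json);
        // quoteReplacement keeps any '$' or '\' in the value from being read as a group reference.
        return m.replaceAll(r -> Matcher.quoteReplacement(
                r.group(1) + maskLocalPart(r.group(2)) + r.group(3)));
    }

    public static void main(String[] args) {
        String json = "{ \"name\":\"jhon\", \"email\":\"test.ing%02#gmail.com\" }";
        // prints: { "name":"jhon", "email":"t***.***%**#gmail.com" }
        System.out.println(maskEmails(json));
    }
}
```

For anything beyond a fixed, known layout, parsing the JSON with a real JSON library and masking the field value is likely more robust than any regex over the raw text.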

Find the position of slash 1

String uRL = JOptionPane.showInputDialog("Enter a URL ");
int colon = uRL.indexOf(":");
System.out.println("The position of colon is "+ colon);
String protocol = uRL.substring(0,colon);
System.out.println("the protocol is "+ protocol);
// Declare Statements, extract and print the second part
String restOFURL = uRL.substring(colon+7);
System.out.println("The rest of Url is "+restOFURL);
int positionOfSlash1 = restOFURL.indexOf("/");
System.out.println(positionOfSlash1);
The input will be a URL, for example http://www.pcwebopedia.com/files/index.html
The URL will always start with a prefix such as http://www. (or FTP, etc.).
The project I'm doing is to break a URL down into its parts, and I'm stuck on one of the questions.
The question asks me to find the position of slash 1. As you can see in the code, the rest of the URL is
pcwebopedia.com/files/index.html, where the position of the first / is 15,
while in http://www.pcwebopedia.com/files/index.html it is 26, which is what I want.
One suggestion was to first count the characters from the start of the string through "www." and add that value to positionOfSlash1.

The indexOf() method also takes a second parameter (fromIndex) that specifies where the search starts. Instead of searching restOFURL, search the full URL, starting just after the "://":
int positionOfSlash1 = uRL.indexOf("/", colon + 3);
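The suggestion mentioned in the question (add the length of the stripped prefix back to the index found in the rest of the URL) works as well. A small sketch of that arithmetic, assuming, as the question does, that the prefix is always the colon plus the seven characters up to and including "www."; the class name SlashOffset is hypothetical.

```java
public class SlashOffset {
    // Assumes the URL contains ':' and has at least colon + 7 characters,
    // e.g. "http://www.": otherwise substring() throws.
    public static int positionOfSlash1(String url) {
        int colon = url.indexOf(':');
        int prefixLength = colon + 7;            // same offset the question uses for restOFURL
        String rest = url.substring(prefixLength);
        int inRest = rest.indexOf('/');          // index relative to the rest of the URL
        return inRest == -1 ? -1 : prefixLength + inRest;
    }

    public static void main(String[] args) {
        // colon = 4, prefixLength = 11, inRest = 15, so this prints 26
        System.out.println(positionOfSlash1("http://www.pcwebopedia.com/files/index.html"));
    }
}
```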

There are many ways to do it. I would do it using Java Regex API.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(final String args[]) {
// Test
System.out.println(indexOfDomainNameSeparator("pcwebopedia.com/files/index.html"));
System.out.println(indexOfDomainNameSeparator("http://www.pcwebopedia.com/files/index.html"));
System.out.println(indexOfDomainNameSeparator("Hello WOrld"));
}
static int indexOfDomainNameSeparator(String url) {
String regex = "(?<![:/])/";
Matcher matcher = Pattern.compile(regex).matcher(url);
// If found, return the index; otherwise, return -1
return matcher.find() ? matcher.start() : -1;
}
}
Output:
15
26
-1
The regex (?<![:/])/ means a / not preceded by a : or /. In regex terminology, "not preceded by" is known as a negative lookbehind, and the expression inside [ ] is known as a character class.
Alternatively, using String#indexOf(int, int) and String#indexOf(int):
public class Main {
public static void main(final String args[]) {
// Test
System.out.println(indexOfDomainNameSeparator("pcwebopedia.com/files/index.html"));
System.out.println(indexOfDomainNameSeparator("http://www.pcwebopedia.com/files/index.html"));
System.out.println(indexOfDomainNameSeparator("Hello WOrld"));
}
static int indexOfDomainNameSeparator(String url) {
int indexOfColon = url.indexOf(':');
if (indexOfColon != -1) {
return url.indexOf('/', indexOfColon + 3);// Starting from indexOfColon + 3
} else {
return url.indexOf('/');
}
}
}
Output:
15
26
-1

SwiftMessage Regular expression

I have the below message:
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}
I want it to be converted as shown below, with whitespace inserted in block 4, which should become:
{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA -}
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA -}{5:{CHK:1DBBF1D81EE1}{TNG:}}
I tried to extract the blocks using groups and then apply a regular expression, but I was unsuccessful and cannot find the error I am making.
public static void StringReplace() {
String data = "{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}";
Pattern pat = Pattern.compile("(({1:\\w+})({2:\\w+})({4::\\d+:\\w+:\\d+.:\\w+:\\d+.:\\d+\\w+,\\d:\\d+.:\\w+,\\d+:\\d+.:\\w+:\\d+:\\w+:\\d+.:\\w+-})({5:{\\w+:.\\w+}{\\w+.}}))");
Matcher m = pat.matcher(data);
if(m.matches()) {
System.out.println(m.group(0));
}
}
Thanks in advance.

You have only matched the string and printed it; you haven't added any logic to introduce the spaces in block 4.
Looking at the expected output, you can first capture block 4 using this regex,
(.*)(\\{4.*?\\})(.*)
and then replace each colon with a space followed by a colon ( :) in the content of group 2, which is your block 4. I see you are not adding a space before every colon, only before a colon that is followed by 2-3 characters and another colon. I have implemented that logic in the replaceAll() call.
Here is the modified Java code,
public static void StringReplace() {
String data = "{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}";
Pattern pat = Pattern.compile("(.*)(\\{4.*?\\})(.*)");
Matcher m = pat.matcher(data);
if (m.find()) {
String g1 = m.group(1);
String g2 = m.group(2).replaceAll(":(?=\\w{2,3}:)", " :");
String g3 = m.group(3);
System.out.println(g1 + g2 + g3);
} else {
System.out.println("Didn't match");
}
}
This prints the following output, which matches your expected result except that no space is added before the final - in block 4:
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}
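The expected output in the question also has a space before the trailing - of block 4. One possible way to get that is a second replacement on the block, sketched below. It assumes the message is on a single line and that block 4 contains no nested braces; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwiftBlock4Spacer {
    // Lazily matches block 4 up to its first closing brace.
    private static final Pattern BLOCK4 = Pattern.compile("\\{4:.*?\\}");

    public static String spaceOutBlock4(String message) {
        Matcher m = BLOCK4.matcher(message);
        if (!m.find()) {
            return message; // no block 4: leave the message untouched
        }
        String block = m.group()
                .replaceAll(":(?=\\w{2,3}:)", " :") // space before each ":NN:" / ":NNA:" tag
                .replaceAll("-\\}$", " -}");        // space before the trailing "-"
        return message.substring(0, m.start()) + block + message.substring(m.end());
    }

    public static void main(String[] args) {
        String data = "{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}"
                + "{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00"
                + ":50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}";
        System.out.println(spaceOutBlock4(data));
    }
}
```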

masking of email address in java

I am trying to mask an email address with "*", but I am bad at regex.
input : nileshxyzae#gmail.com
output : nil********#gmail.com
My code is
String maskedEmail = email.replaceAll("(?<=.{3}).(?=[^#]*?.#)", "*");
but it gives me the output nil*******e#gmail.com, and I don't understand what is going wrong. Why is the last character not converted?
Also, can someone explain what each part of this regex means?

Your look-ahead (?=[^#]*?.#) requires at least 1 character to be there in front of # (see the dot before #).
If you remove it, you will get all the expected symbols replaced:
(?<=.{3}).(?=[^#]*?#)
Replacing each match with * then gives the expected result.
However, this is not a proper regex for the task. You need a regex that matches each character after the first 3 characters up to the first #:
(^[^#]{3}|(?!^)\G)[^#]
Replace with $1*. Here, [^#] matches any character that is not #, so addresses like abc#example.com are not matched at all. Only emails with 4+ characters in the username part are masked.
In Java:
String s = "nileshkemse#gmail.com";
System.out.println(s.replaceAll("(^[^#]{3}|(?!^)\\G)[^#]", "$1*"));
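To see how the \G-based pattern behaves on short local parts, here is a small sketch that wraps the same replaceAll call in a method and runs it on a few inputs. The class name AnchoredMaskDemo is hypothetical, and # is used as the separator, as in the samples above.

```java
public class AnchoredMaskDemo {
    // Keeps the first three characters, then masks each following character
    // up to (not including) the first '#'. \G chains each match to the previous one.
    public static String mask(String email) {
        return email.replaceAll("(^[^#]{3}|(?!^)\\G)[^#]", "$1*");
    }

    public static void main(String[] args) {
        System.out.println(mask("nileshkemse#gmail.com")); // nil********#gmail.com
        System.out.println(mask("abc#example.com"));       // abc#example.com (unchanged)
        System.out.println(mask("ab#example.com"));        // ab#example.com (unchanged)
    }
}
```

Because every match must either start at the beginning of the string or continue directly from the previous match, the domain part is never touched.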

If you're bad at regular expressions, don't use them :) I don't know if you've ever heard the quote:
Some people, when confronted with a problem, think
"I know, I'll use regular expressions." Now they have two problems.
You might get a working regular expression here, but will you understand it today? tomorrow? in six months' time? And will your colleagues?
An easy alternative is using a StringBuilder, and I'd argue that it's a lot more straightforward to understand what is going on here:
StringBuilder sb = new StringBuilder(email);
for (int i = 3; i < sb.length() && sb.charAt(i) != '#'; ++i) {
sb.setCharAt(i, '*');
}
email = sb.toString();
"Starting at the fourth character (index 3), replace the characters with a * until you reach the end of the string or #."
(You don't even need to use StringBuilder: you could simply manipulate the elements of email.toCharArray(), then construct a new string at the end).
Of course, this doesn't work correctly for email addresses where the local part is shorter than 3 characters - it would actually then mask the domain.
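One way to guard against that case is to look up the position of # first and only mask when there is something beyond the first three characters. A minimal sketch, with a hypothetical class name and # as the separator, as in the samples above:

```java
public class EmailMasker {
    // Masks the local part from index 3 up to the '#'. Returns the input
    // unchanged if there is no '#' or the local part has 3 or fewer characters.
    public static String maskLocalPart(String email) {
        int at = email.indexOf('#');
        if (at <= 3) {
            return email; // covers at == -1 as well as short local parts
        }
        StringBuilder sb = new StringBuilder(email);
        for (int i = 3; i < at; i++) {
            sb.setCharAt(i, '*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskLocalPart("nileshxyzae#gmail.com")); // nil********#gmail.com
        System.out.println(maskLocalPart("ab#gmail.com"));          // ab#gmail.com
    }
}
```

Whether a short local part should be left as-is or masked entirely is a policy choice; this sketch leaves it unchanged.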

Your look-ahead is more complicated than it needs to be. Try this code:
public static void main(String... args) throws Exception {
String s = "nileshkemse#gmail.com";
s= s.replaceAll("(?<=.{3}).(?=.*#)", "*");
System.out.println(s);
}
Output:
nil********#gmail.com

I like this one because I only want to hide 4 characters; it also dynamically decreases the number of hidden characters to 2 if the email address is too short:
public static String maskEmailAddress(final String email) {
final String mask = "*****";
final int at = email.indexOf("#");
if (at > 2) {
final int maskLen = Math.min(Math.max(at / 2, 2), 4);
final int start = (at - maskLen) / 2;
return email.substring(0, start) + mask.substring(0, maskLen) + email.substring(start + maskLen);
}
return email;
}
Sample outputs:
my.email#gmail.com > my****il#gmail.com
info#mail.com > i**o#mail.com

//In Kotlin
val email = "nileshkemse#gmail.com"
val maskedEmail = email.replace(Regex("(?<=.{3}).(?=.*#)"), "*")

public static string GetMaskedEmail(string emailAddress)
{
string _emailToMask = emailAddress;
try
{
if (!string.IsNullOrEmpty(emailAddress))
{
var _splitEmail = emailAddress.Split(Char.Parse("#"));
var _user = _splitEmail[0];
var _domain = _splitEmail[1];
if (_user.Length > 3)
{
var _maskedUser = _user.Substring(0, 3) + new String(Char.Parse("*"), _user.Length - 3);
_emailToMask = _maskedUser + "#" + _domain;
}
else
{
_emailToMask = new String(Char.Parse("*"), _user.Length) + "#" + _domain;
}
}
}
catch (Exception) { }
return _emailToMask;
}

Regex pattern for query string

I need help finding a Java regex pattern to extract one query parameter from a URI.
For instance, the URI here is
"GET /6.2/calculateroute.xml?routeattributes=sm,wp,lg,bb&legattributes=mn&maneuverattributes=ac,po,tt,le,-rn,-sp,-di,no,nu,nr,sh&instructionFormat=html&language=en_US&mode=fastest;car;traffic:default&waypoint0=37.79548,-122.392025&waypoint1=36.0957717,-115.1745167&resolution=786&app_id=D4KnHBzGYyJtbM8lVfYX&token=TRKB7vnBguWLam5rdWshTA HTTP/1.1"
I need to extract 4 values from it, which I have managed to do:
GET
/6.2/calculateroute.xml
routeattributes=sm,wp,lg,bb&legattributes=mn&maneuverattributes=ac,po,tt,le,-rn,-sp,-di,no,nu,nr,sh&instructionFormat=html&language=en_US&mode=fastest;car;traffic:default&waypoint0=37.79548,-122.392025&waypoint1=36.0957717,-115.1745167&resolution=786&app_id=D4KnHBzGYyJtbM8lVfYX&token=TRKB7vnBguWLam5rdWshTA
HTTP/1.1
Now the question is how to write a regex for the app_id value in the query string. Note that app_id does not appear in every request, so the regex should be generic and should not fail if app_id is missing. Please help.

Your question can be simplified to: "How do I extract an optional query parameter from a string". Here's how:
String appId = input.replaceAll("(.*(app_id=(\\w+)).*)|.*", "$3");
The appId variable will contain the app_id value if it's present, or an empty string otherwise.
Here's some test code with the code bundled as a utility method:
public static String getParameterValue(String input, String parameter) {
return input.replaceAll("(.*("+parameter+"=(\\w+)).*)|.*", "$3");
}
public static void main(String[] args) {
String input1 = "foo=bar&app_id=D4KnHBzGYyJtbM8lVfYX&x=y";
String input2 = "foo=bar&XXXXXX=D4KnHBzGYyJtbM8lVfYX&x=y";
System.out.println("app_id1:" + getParameterValue(input1, "app_id"));
System.out.println("app_id2:" + getParameterValue(input2, "app_id"));
}
Output:
app_id1:D4KnHBzGYyJtbM8lVfYX
app_id2:
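If you would rather distinguish "missing" from "present but empty", a Matcher-based variant can return an Optional instead of an empty string. This is just a sketch: the class name QueryParams is hypothetical, the parameter name is quoted with Pattern.quote so that regex metacharacters in it are taken literally, and values are not URL-decoded.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryParams {
    // Finds "<parameter>=<value>" at the start of the input or right after '?' or '&';
    // the value runs up to the next '&' or whitespace.
    public static Optional<String> getParameterValue(String input, String parameter) {
        Pattern p = Pattern.compile("(?:^|[?&])" + Pattern.quote(parameter) + "=([^&\\s]*)");
        Matcher m = p.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String request = "GET /6.2/calculateroute.xml?routeattributes=sm,wp&resolution=786"
                + "&app_id=D4KnHBzGYyJtbM8lVfYX&token=TRKB7vnBguWLam5rdWshTA HTTP/1.1";
        System.out.println(getParameterValue(request, "app_id").orElse("(none)"));  // D4KnHBzGYyJtbM8lVfYX
        System.out.println(getParameterValue(request, "missing").orElse("(none)")); // (none)
    }
}
```

Requiring '?', '&' or the start of the input before the name keeps a parameter such as xapp_id from being mistaken for app_id.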
