Java - regex parse string

I'm trying to parse names out of the following samples:
++++++++++++++++++SELIZABETH+COLLAZO+++++++++++++++++++
+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++
+++++++++++++++++++NOAH+BLAKEMORE++++++++++++++++++++++
I've tried
//++(.*?)+(.*?)//++
but that's way off.
I would like to parse the first and last name into two strings.

You can use the regex (\w+)\+(\w+) or \+{2,}(.*?)\+(.*?)\+{2,} with Pattern like this:
String str = "++++++++++++++++++SELIZABETH+COLLAZO+++++++++++++++++++\n"
+ "+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++\n"
+ "+++++++++++++++++++NOAH+BLAKEMORE++++++++++++++++++++++";
Pattern pattern = Pattern.compile("(\\w+)\\+(\\w+)"); // or instead: "\\+{2,}(.*?)\\+(.*?)\\+{2,}"
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
Outputs
SELIZABETH COLLAZO
PALOMA CORREA
NOAH BLAKEMORE
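The original attempt was off mainly because + is a quantifier and has to be escaped with a backslash (written \\+ inside a Java string literal), not with slashes. As a minimal sketch of getting the first and last name into two separate strings, the helper below handles one line at a time. The class and method names are made up for illustration, and it assumes each line holds exactly two \w-only names separated by a single +:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameParser {
    // Padding of one or more '+', first name, a single '+', last name, padding again.
    private static final Pattern NAME_LINE =
            Pattern.compile("^\\++(\\w+)\\+(\\w+)\\++$");

    /** Returns {first, last}, or null if the line does not have the assumed shape. */
    static String[] parse(String line) {
        Matcher m = NAME_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] name = parse("+++++++++++++++++++PALOMA+CORREA+++++++++++++++++++++++");
        String firstName = name[0];
        String lastName = name[1];
        System.out.println(firstName + " / " + lastName);
    }
}
```

Names containing hyphens, apostrophes, or a middle name would need a wider character class than \w.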

Related

RegEx to extract text between tags in Java

I need to extract the values after :70: in the following text file using RegEx. Value may contain line breaks as well.
My current solution is to extract the string between :70: and : but this always returns only one match, the whole text between the first :70: and last :.
:32B:xxx,
:59:yyy
something
:70:ACK1
ACK2
:21:something
:71A:something
:23E:something
value
:70:ACK2
ACK3
:71A:something
How can I achieve this using Java? Ideally I want to iterate through all values, i.e.
ACK1\nACK2,
ACK2\nACK3
Thanks :)
Edit: What I'm doing right now,
Pattern pattern = Pattern.compile("(?<=:70:)(.*)(?=\n)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.println(matcher.group());
}
Try this.
String data = ""
+ ":32B:xxx,\n"
+ ":59:yyy\n"
+ "something\n"
+ ":70:ACK1\n"
+ "ACK2\n"
+ ":21:something\n"
+ ":71A:something\n"
+ ":23E:something\n"
+ "value\n"
+ ":70:ACK2\n"
+ "ACK3\n"
+ ":71A:something\n";
Pattern pattern = Pattern.compile(":70:(.*?)\\s*:", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find())
System.out.println("found="+ matcher.group(1));
result:
found=ACK1
ACK2
found=ACK2
ACK3
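The attempt in the question returns one big match because .* is greedy and, with DOTALL, runs to the last newline in the input. The reluctant (.*?) fixes that, but stopping at the first : would cut a value short if the value itself contained a colon. A possible variation, sketched below, stops at the next line that starts with a tag instead. It assumes tags always look like :DIGITS-OR-UPPERCASE: at the start of a line, as in the sample; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tag70Extractor {
    // ":70:" at the start of a line, then as little as possible up to
    // (but not including) the next tag line or the end of the input.
    private static final Pattern TAG_70 = Pattern.compile(
            "^:70:(.*?)(?=\\R:[0-9A-Z]+:|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    static List<String> extract(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = TAG_70.matcher(data);
        while (m.find()) {
            // trim() drops a trailing newline when :70: is the last field
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String data = ":59:yyy\nsomething\n:70:ACK1\nACK2\n:21:something\n"
                + ":70:REF: 42\nACK3\n:71A:something\n";
        for (String value : extract(data)) {
            System.out.println("found=" + value);
        }
    }
}
```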
You need a loop to do this.
Pattern p = Pattern.compile(regexPattern);
List<String> list = new ArrayList<String>();
Matcher m = p.matcher(input);
while (m.find()) {
list.add(m.group());
}
As seen in "Create array of regex matches".
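If you are on Java 9 or later, the same collect-all-matches loop can also be written with Matcher.results(), which returns a stream of MatchResult objects. A small sketch (the class and method names are only for illustration):

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AllMatches {
    /** Collects every match of regex in input, in order of appearance. */
    static List<String> findAll(String regex, String input) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)   // whole match, same as group(0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));
    }
}
```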

Java multiple regular expression search

I have a string something like this:
If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda}
My pattern should look for the particular words Password or tmpPwd or TEMP_PASSWORD.
How can I create a pattern for this kind of search?
I think you are looking for the values after these words. You need to set capturing groups to extract those values, e.g.
String content = "If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda} ";
Pattern p = Pattern.compile("\\{Password\\s*:\\s*([^,]+)\\s*,\\s*tmpPwd\\s*:\\s*([^,]+)\\s*,\\s*TEMP_PASSWORD:\\s*([^,]+)\\s*\\}");
Matcher m = p.matcher(content);
while (m.find()) {
System.out.println(m.group(1) + ", " + m.group(2) + ", " + m.group(3));
}
This will output 123456, tesgjadgj, kfnda.
To just find out whether any of the substrings is present, use the contains method:
System.out.println(content.contains("Password") ||
content.contains("tmpPwd") ||
content.contains("TEMP_PASSWORD"));
And if you want a regex-solution for the keywords, here it is:
String str = "If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda} ";
Pattern ptrn = Pattern.compile("Password|tmpPwd|TEMP_PASSWORD");
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println("Match found: " + m.group(0));
}
In the end, this is what I am using for my requirement:
private final static String censoredWords =
"(?i)PASSWORD|pwd";
The (?i) makes it case-insensitive.
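If the goal is to censor the values rather than just detect the keywords (an assumption on my part, going by the name censoredWords), a case-insensitive pattern can be combined with replaceAll. This is only a sketch: the class name, the keyword list, and the rule that a value ends at a comma, closing brace, or whitespace are assumptions based on the sample string:

```java
import java.util.regex.Pattern;

class SecretMasker {
    // group 1: keyword, group 2: colon with optional spaces, group 3: the value
    private static final Pattern SECRET = Pattern.compile(
            "(?i)(password|tmpPwd|temp_password)(\\s*:\\s*)([^,}\\s]+)");

    /** Replaces each value after a sensitive keyword with "****". */
    static String mask(String message) {
        return SECRET.matcher(message).replaceAll("$1$2****");
    }

    public static void main(String[] args) {
        System.out.println(mask(
                "{Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda}"));
    }
}
```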

Java regex comparing group to string

I am trying to do a replacement using regex. The relevant piece of code is as follows:
String msg =" <ClientVerificationResult>\n " +
" <VerificationIDCheck>Y</VerificationIDCheck>\n" +
" </ClientVerificationResult>\n";
String regex = "(<VerificationIDCheck>)([Y|N])(</VerificationIDCheck>)";
String replacedMsg= msg.replaceAll(regex, "$2".matches("Y") ? "$1YES$3" : "$1NO$3") ;
System.out.println(replacedMsg);
The output of this is
<ClientVerificationResult>
<VerificationIDCheck>NO</VerificationIDCheck>
</ClientVerificationResult>
When it should be
<ClientVerificationResult>
<VerificationIDCheck>YES</VerificationIDCheck>
</ClientVerificationResult>
I guess the problem is that "$2".matches("Y") is returning false. I have tried doing "$2".equals("Y"); and weird combinations inside matches() like "[Y]" or "([Y])", but still nothing.
If I print "$2" the output is Y. Any hints on what I am doing wrong?
You cannot use Java code as the replacement argument for replaceAll; it must be a plain string, and the ternary is evaluated once, before replaceAll is ever called. It is better to use the Pattern and Matcher APIs and evaluate matcher.group(2) in your replacement logic.
Suggested Code:
String msg =" <ClientVerificationResult>\n " +
" <VerificationIDCheck>Y</VerificationIDCheck>\n" +
" </ClientVerificationResult>\n";
String regex = "(<VerificationIDCheck>)([YN])(</VerificationIDCheck>)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher( msg );
StringBuffer sb = new StringBuffer();
while (m.find()) {
String repl = m.group(2).matches("Y") ? "YES" : "NO";
m.appendReplacement(sb, m.group(1) + repl + m.group(3));
}
m.appendTail(sb);
System.out.println(sb); // replaced string
You are checking the literal string "$2" to see if it matches "Y". This will never happen.
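On Java 9 or later, the appendReplacement loop can be shortened with the Matcher.replaceAll overload that takes a function, which is called once per match with the actual group values. A sketch, with a made-up class and method name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerificationRewriter {
    private static final Pattern CHECK = Pattern.compile(
            "(<VerificationIDCheck>)([YN])(</VerificationIDCheck>)");

    /** Expands Y to YES and N to NO inside the VerificationIDCheck element. */
    static String expand(String msg) {
        return CHECK.matcher(msg).replaceAll(mr -> {
            String word = "Y".equals(mr.group(2)) ? "YES" : "NO";
            // quoteReplacement guards against '$' or '\' in the built string
            return Matcher.quoteReplacement(mr.group(1) + word + mr.group(3));
        });
    }

    public static void main(String[] args) {
        System.out.println(expand(
                "<VerificationIDCheck>Y</VerificationIDCheck>"));
    }
}
```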

Regex not matching words delimited by whitespace

I have an input string that will follow the pattern /user/<id>?name=<name>, where <id> is alphanumeric but must start with a letter, and <name> is a letter-only string that can have multiple spaces. Some examples of matches would be:
/user/ad?name=a a
/user/one111?name=one ONE oNe
/user/hello?name=world
I came up with the following regex:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
All of the above examples match the regex, but it only looks at the first word in <name>. Shouldn't the sequence \s allow me to have white spaces?
The code that I made to test what it is doing is:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
// Check to see that input matches pattern
if(Pattern.matches(regex, str) == true){
str = str.replaceFirst("/user/", "");
str = str.replaceFirst("name=", "");
String[] tokens = str.split("\\?");
System.out.println("size = " + tokens.length);
System.out.println("tokens[0] = " + tokens[0]);
System.out.println("tokens[1] = " + tokens[1]);
} else
System.out.println("Didn't match.");
So for example, one test might look like:
/user/myID123?name=firstName LastName
size = 2
tokens[0] = myID123
tokens[1] = firstName
whereas the desired output would be
tokens[1] = firstName LastName
How can I change my regex to do this?
Not sure what you think is the problem in your code. tokens[1] will indeed contain firstName LastName in your example.
However, have you considered using capturing groups for the id and the name?
If you write it like
String regex = "/user/(\\w+)\\?name=([a-zA-Z\\s]+)";
Matcher m = Pattern.compile(regex).matcher(input);
you can get hold of myID123 and firstName LastName through m.group(1) and m.group(2), once m.matches() (or m.find()) has returned true.
I don't find any fault in your code, but you can use capturing groups like this:
String str = "/user/myID123?name=firstName LastName ";
String regex = "/user/([a-zA-Z]+\\w*)\\?name=([a-zA-Z\\s]+)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1) + ", " + m.group(2));
}
One suggestion is to make the * reluctant by adding a ?. Strictly speaking, the greedy \w* cannot run past the literal ? anyway (\w does not match ?), so with matches() this version captures the same groups as the greedy one:
List<String> str = Arrays.asList("/user/ad?name=a a", "/user/one111?name=one ONE oNe", "/user/hello?name=world");
String regex = "/user/([a-zA-Z]+\\w*?)\\?name=([a-zA-Z\\s]+)";
for (String s : str) {
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.matches()) {
System.out.println("user: " + matcher.group(1));
System.out.println("name: " + matcher.group(2));
}
}
Output:
user: ad
name: a a
user: one111
name: one ONE oNe
user: hello
name: world
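One thing none of the patterns above enforce strictly is the rule that the id is alphanumeric and starts with a letter: \w also accepts an underscore, and [a-zA-Z\s]+ accepts a name made only of whitespace. A stricter sketch follows. The class and method names are hypothetical, and it assumes names are words separated by plain spaces with no leading or trailing space:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserPath {
    // id: a letter followed by letters or digits; name: words separated by spaces
    private static final Pattern USER = Pattern.compile(
            "/user/([a-zA-Z][a-zA-Z0-9]*)\\?name=([a-zA-Z]+(?: +[a-zA-Z]+)*)");

    /** Returns {id, name}, or null if the input does not match the format. */
    static String[] parse(String input) {
        Matcher m = USER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("/user/myID123?name=firstName LastName");
        System.out.println(parts[0] + ", " + parts[1]);
    }
}
```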

Need help with using regular expression in Java

I am trying to match a pattern like '#(a-zA-Z0-9)+' but not like 'abc#test'.
So this is what I tried:
Pattern MY_PATTERN
= Pattern.compile("\\s#(\\w)+\\s?");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = MY_PATTERN.matcher(data);
StringBuffer sb = new StringBuffer();
boolean result = m.find();
while(result) {
System.out.println (" group " + m.group());
result = m.find();
}
But I can only see '#jytaz', but not #tibuage.
How can I fix my problem? Thank you.
This pattern should work: \B(#\w+)
The \B matches at a non-word boundary in front of the #, so abc#gere is skipped. The \w+ already excludes the trailing space. I've also moved the parentheses so that the # and the + fall inside the group. You should preferably use m.group(1) to get it.
Here's the rewrite:
Pattern pattern = Pattern.compile("\\B(#\\w+)");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = pattern.matcher(data);
while (m.find()) {
System.out.println(" group " + m.group(1));
}
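As for why the pattern in the question skips a tag: its trailing \s? consumes the space after one match, so the \s required in front of the next tag is no longer available. Any pattern that does not consume the surrounding whitespace avoids this. Besides \B, a lookbehind such as (?<!\S) (meaning "not preceded by a non-whitespace character") is another option; unlike \B, it also rejects a # that follows punctuation. The sketch below runs both patterns over the sample data; the class and helper names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagFinder {
    /** Returns every match of regex in data, with surrounding spaces trimmed. */
    static List<String> find(String regex, String data) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            out.add(m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "abc#gere.com #gogasig #jytaz #tibuage";
        // Pattern from the question: consumes the space between tags
        System.out.println(find("\\s#(\\w)+\\s?", data));  // [#gogasig, #tibuage]
        // Lookbehind variant: checks the preceding character without consuming it
        System.out.println(find("(?<!\\S)#\\w+", data));   // [#gogasig, #jytaz, #tibuage]
    }
}
```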
