Any regular expression to extract property from String? - java

I have a String which contains a set of properties like the following one:
"T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK"
Is there a regular expression that can be used in Java to extract the individual properties (such as SURNAME)?
Thanks

This small example shows how to access the property name and its value in your example string. This is for all properties and values in the string.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {
public static void main(String[] args) {
String s = "T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK";
Matcher m = Pattern.compile("(?:, )?([^=]+)\\=([^,]+)").matcher(s);
while (m.find()) {
System.out.println(m.group(1) + " - " + m.group(2));
}
}
}
Explanation of the regex:
(?:, )?([^=]+)\\=([^,]+)
(?:, )? is an optional non-capturing group. It matches the comma-and-space separator between the property-value pairs.
([^=]+) is a group that matches one or more characters until a = appears.
\\= matches the literal =. The backslash is optional here: = is not a special character in a Java regex, so a plain = works the same way.
([^,]+) matches one or more characters up to the next comma, where the next property starts.
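To pull out a single property such as SURNAME, one option is to collect every pair into a map first. The following is a minimal sketch built on the regex explained above; the class name `PropertyParser` and the method `parse` are made up for illustration, and it assumes that values never contain a comma.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class PropertyParser {
    // Same pattern as above, with the optional backslash before '=' left out.
    private static final Pattern PAIR = Pattern.compile("(?:, )?([^=]+)=([^,]+)");

    // Collects every NAME=value pair into a map, keeping the input order.
    static Map<String, String> parse(String s) {
        Map<String, String> props = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            props.put(m.group(1), m.group(2));
        }
        return props;
    }

    public static void main(String[] args) {
        String s = "T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK";
        System.out.println(parse(s).get("SURNAME")); // prints "Doe"
    }
}
```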

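You can use this:
SURNAME=[^,]+
Or, to be safer, you can also use:
SURNAME=.*?(?=,\s)
The second one still works if the surname contains a comma that is not followed by whitespace. Note that it requires a comma and a whitespace character after the value, so it will not match if SURNAME is the last property in the string.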
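A small comparison of the two patterns, as a sketch only. The capturing groups, the sample strings, and the `|$` alternative in the last pattern (which lets the value end at the end of the input) are additions for illustration and not part of the answer above; `SurnameLookup` and `first` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two SURNAME patterns.
class SurnameLookup {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tricky = "SURNAME=Doe,Jr, GIVENNAME=John"; // comma inside the value
        String last = "GIVENNAME=John, SURNAME=Doe";      // SURNAME is the last property

        System.out.println(first("SURNAME=([^,]+)", tricky));          // Doe
        System.out.println(first("SURNAME=(.*?)(?=,\\s)", tricky));    // Doe,Jr
        System.out.println(first("SURNAME=(.*?)(?=,\\s)", last));      // null
        System.out.println(first("SURNAME=(.*?)(?=,\\s|$)", last));    // Doe
    }
}
```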

Related

Split a string by commas but no inside parenthesis [duplicate]

I have a string that looks something like the following:
12,44,foo,bar,(23,45,200),6
I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?
Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):
Pattern regex = Pattern.compile(
", # Match a comma\n" +
"(?! # only if it's not followed by...\n" +
" [^(]* # any number of characters except opening parens\n" +
" \\) # followed by a closing parens\n" +
") # End of lookahead",
Pattern.COMMENTS);
This regex uses a negative lookahead assertion to ensure that the next following parenthesis (if any) is not a closing parenthesis. Only then the comma is allowed to match.
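As a usage sketch, the same lookahead can be passed to `String.split` in its compact, uncommented form. The class name `LookaheadSplit` is made up for illustration, and the no-nested-parentheses assumption from above still applies.

```java
import java.util.Arrays;

// Hypothetical demo class: splits on commas that are not inside parentheses.
class LookaheadSplit {
    static String[] split(String s) {
        // A comma, unless the next parenthesis after it is a closing one.
        return s.split(",(?![^(]*\\))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("12,44,foo,bar,(23,45,200),6")));
        // prints [12, 44, foo, bar, (23,45,200), 6]
    }
}
```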
Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Also the existing solution checks that the comma is not followed by a parenthesis, but that does not guarantee that it is embedded in parentheses.
The regex is very simple:
\(.*?\)|(,)
The left side of the alternation matches complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.
You said you want to match the commas, but you can use the same general idea to split or replace.
To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Program {
public static void main (String[] args) throws java.lang.Exception {
String subject = "12,44,foo,bar,(23,45,200),6";
Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
Matcher regexMatcher = regex.matcher(subject);
List<String> group1Caps = new ArrayList<String>();
// put Group 1 captures in a list
while (regexMatcher.find()) {
if(regexMatcher.group(1) != null) {
group1Caps.add(regexMatcher.group(1));
}
} // end of building the list
// What are all the matches?
System.out.println("\n" + "*** Matches ***");
if(group1Caps.size()>0) {
for (String match : group1Caps) System.out.println(match);
}
} // end main
} // end Program
To use the same technique for splitting or replacing, see the code samples in the article in the reference.
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
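For splitting, one way to apply the same alternation is to walk through the matches and cut the subject only where Group 1 is set. This is a minimal sketch under that reading, not the code from the referenced article; `SplitOutsideParens` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built on the \(.*?\)|(,) idea.
class SplitOutsideParens {
    private static final Pattern REGEX = Pattern.compile("\\(.*?\\)|(,)");

    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = REGEX.matcher(subject);
        int start = 0;
        while (m.find()) {
            // Group 1 is only set for commas outside parentheses;
            // matches of the left alternative are skipped.
            if (m.group(1) != null) {
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }
        }
        parts.add(subject.substring(start)); // text after the last comma
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("12,44,foo,bar,(23,45,200),6"));
        // prints [12, 44, foo, bar, (23,45,200), 6]
    }
}
```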
I don’t understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.
// Drop the parenthesized part: keep the text before '(' and the text after ')'.
String outsideParens = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = outsideParens.indexOf(',');
while (firstComma != -1) {
/* do something. Note that firstComma is an index into outsideParens, not into longString. */
firstComma = outsideParens.indexOf(',', firstComma + 1);
}
(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis somewhere after it.)
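If that assumption is too strict, a different non-regex technique is to scan the string once and track the parenthesis depth. The sketch below is a swapped-in alternative rather than the approach shown above; `DepthSplit` is a hypothetical name, and unbalanced closing parentheses are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on commas at nesting depth 0, nested parentheses included.
class DepthSplit {
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // how many '(' are currently open
        int start = 0; // start index of the current part
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth > 0) depth--; // ignore a stray ')'
            } else if (c == ',' && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("12,44,foo,bar,(23,45,200),6"));
        System.out.println(split("a,(b,(c,d)),e"));
    }
}
```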

Need regex help - How to extract a String using regex?

I am looking for regex to extract a string from another string.
"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED
storeKey=true principal='test#test.net' validate=true serviceName=esaas
keyTab='<some value>' useKeyTab=true;"
How do I extract the string after keyTab=? I want to retrieve the value inside the single quotes.
Use the regex keyTab='(.*?)' and read group 1 of the match. In Java, your code should look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String content = "\"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED \r\n" +
"storeKey=true principal='test#test.net' validate=true serviceName=esaas \r\n" +
"keyTab='<some value>' useKeyTab=true;\"";
Pattern pattern = Pattern.compile("keyTab='(.*?)'");
Matcher matcher = pattern.matcher(content);
if (matcher.find()) { // group(1) is only available after a successful find()
System.out.println(matcher.group(1)); //<some value>
}
}
}
Something that will work in most regex engines is to look for both the thing you want and the thing before it, and to put the thing you want in a capture group.
This regex will put what's between the quotes in capture group 1:
\bkeyTab=\'([^\']*)\'
The \b is a word boundary to make sure keyTab isn't part of a larger word.
You can use this expression to find it:
keyTab='(.*?)'
It matches the whole keyTab='...' text, but captures only what is between the quotes.
[\n\r].*keyTab='\s*([^\n\r]*)'
Your desired match will be in capture group 1.

Regex matching word that is in the middle of any character except a letter

I'd like to know how to detect a word that is surrounded by any characters other than letters of the alphabet. I need this because I'm working on a custom import organizer for Java. This is what I have already tried:
The regex expression:
[^(a-zA-Z)]InitializationEvent[^(a-zA-Z)]
I'm searching for the word "InitializationEvent".
The code snippet I've been testing on:
public void load(InitializationEvent event) {
It looks like adding space before the word helps... is the parenthesis inside of alphabet range?
I tested this in my program and it didn't work. Also I checked it on regexr.com, showing same results - class name not recognized.
Am I doing something wrong? I'm new to regex, so it might be a really basic mistake, or not. Let me know!
Lose the parentheses:
[^a-zA-Z]InitializationEvent[^a-zA-Z]
Inside [], parentheses are taken literally, so the negated class [^(a-zA-Z)] also rejects ( and ). That prevents a match here, because a ( precedes InitializationEvent in your string.
Note, however, that the above regex will only match if InitializationEvent is neither at the beginning nor at the end of the tested string. To allow that, you can use:
(^|[^a-zA-Z])InitializationEvent([^a-zA-Z]|$)
Or, without creating any capturing groups (which is supposed to be cleaner, and perform better):
(?:^|[^a-zA-Z])InitializationEvent(?:[^a-zA-Z]|$)
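A quick check of the difference at the string boundaries, as a sketch only. The test strings and the names `BoundaryDemo` and `found` are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the plain and the anchored variant.
class BoundaryDemo {
    static final String PLAIN = "[^a-zA-Z]InitializationEvent[^a-zA-Z]";
    static final String ANCHORED = "(?:^|[^a-zA-Z])InitializationEvent(?:[^a-zA-Z]|$)";

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(PLAIN, "InitializationEvent"));             // false
        System.out.println(found(ANCHORED, "InitializationEvent"));          // true
        System.out.println(found(ANCHORED, "load(InitializationEvent e)"));  // true
        System.out.println(found(ANCHORED, "MyInitializationEventX"));       // false
    }
}
```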
how to detect word that is between any characters except a letter from alphabet
This is a case where lookarounds come in handy. You can use:
(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
(?<![a-zA-Z]) is a negative lookbehind asserting that the previous character is not a letter
(?![a-zA-Z]) is a negative lookahead asserting that the next character is not a letter
The parentheses are causing the problem, just skip them:
"[^a-zA-Z]InitializationEvent[^a-zA-Z]"
or use the predefined non-word character class, which is slightly different because it also excludes digits and the underscore:
"\\WInitializationEvent\\W"
Since you seem to be matching a class name, this should be fine: the characters \W excludes (ASCII letters, digits and the underscore) are the ones most commonly used in class names, although Java identifiers may also contain $ and non-ASCII letters.
I'm not sure about your application but from a regexp perspective you can use negative lookaheads and negative lookbehinds to define what cannot surround the String to specify a match.
I have added the negative lookahead (?![a-zA-Z]) and the negative lookbehind (?<![a-zA-Z]) in place of your [^(a-zA-Z)] originally supplied to create: (?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
Quick Fiddle I created:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HelloWorld{
public static void main(String []args){
String pattern = "(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])";
String sourceString = "public void load(InitializationEvent event) {";
String sourceString2 = "public void load(BInitializationEventA event) {";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(sourceString);
if (m.find( )) {
System.out.println("Found value of pattern in sourceString: " + m.group(0) );
} else {
System.out.println("NO MATCH in sourceString");
}
Matcher m2 = r.matcher(sourceString2);
if (m2.find( )) {
System.out.println("Found value of pattern in sourceString2: " + m2.group(0) );
} else {
System.out.println("NO MATCH in sourceString2");
}
}
}
output:
sh-4.3$ java -Xmx128M -Xms16M HelloWorld
Found value of pattern in sourceString: InitializationEvent
NO MATCH in sourceString2
You seem really close:
[^(a-zA-Z)]*(InitializationEvent)[^(a-zA-Z)]*
I think this is what you are looking for. The asterisk provides a match for zero or many of the character or group before it.
EDIT/UPDATE
My apologies on the initial response.
[^a-zA-Z]+(InitializationEvent)[^a-zA-Z]+
My regex is a little rusty, but this will match on any non-alphabet character one or many times prior to the InitializationEvent and after.

Explicitly defining the end of the input in a regular expression using $

I have this code using a regular expression to separate an input string into two words, where the second word is optional (I know that I might use String.split() in this particular case, but the actual regular expression is a bit more complex):
package com.example;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Dollar {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(.*?)\\s*(?: (.*))?$"); // Works
//Pattern pattern = Pattern.compile("(.*?)\\s*(?: (.*))?"); // Does not work
Matcher matcher = pattern.matcher("first second");
matcher.find();
System.out.println("first : " + matcher.group(1));
System.out.println("second: " + matcher.group(2));
}
}
With this code, I get the expected output
first : first
second: second
and it also works if the second word is not there.
However, if I use the other regexp (without the dollar sign at the end), I get empty strings / nulls for the capture groups.
My question is: Why do I have to explicitly put a dollar sign at the end of the regexp to match "the end of the input sequence" (as the Javadoc says)? In other words, why is the end of the regular expression not implicitly treated as the end of the input sequence?
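That is due to the lazy quantifier in your regex. Without the $, find() is satisfied by the first match it can get: (.*?) matches the empty string, \s* matches nothing, and the optional group is skipped, so the result is an empty match at position 0 (group 1 is empty and group 2 is null). The $ forces the engine to keep expanding (.*?) until the rest of the pattern reaches the end of the input.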
If you use this better regex:
(\S+)(?: (.*))?
Then it will also work with:
(\S+)(?: (.*))?$
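The behaviour of the variants can be compared side by side. This is a sketch for illustration; `DollarDemo` and `groups` are hypothetical names, and the `matches()` row is an addition that is not discussed above: `Matcher.matches()` requires the whole input to match, so it behaves like the anchored version.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the two-word patterns.
class DollarDemo {
    // Returns {group 1, group 2}, or null if there is no match.
    // wholeInput = true uses matches(), false uses find().
    static String[] groups(String regex, String input, boolean wholeInput) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean ok = wholeInput ? m.matches() : m.find();
        return ok ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String in = "first second";
        // Lazy pattern without $: find() stops at an empty match at position 0.
        System.out.println(Arrays.toString(groups("(.*?)\\s*(?: (.*))?", in, false)));  // [, null]
        // Same pattern with $: the match has to reach the end of the input.
        System.out.println(Arrays.toString(groups("(.*?)\\s*(?: (.*))?$", in, false))); // [first, second]
        // Same pattern without $, but using matches() instead of find().
        System.out.println(Arrays.toString(groups("(.*?)\\s*(?: (.*))?", in, true)));   // [first, second]
        // The (\S+) version works with find() even without $.
        System.out.println(Arrays.toString(groups("(\\S+)(?: (.*))?", in, false)));     // [first, second]
    }
}
```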

how to read string part in java

I have this string :
<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">
How can I get lastStop property value in JAVA?
This regex worked when tested on http://www.myregexp.com/
But when I try it in java I don't see the matched text, here is how I tried :
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SimpleRegexTest {
public static void main(String[] args) {
String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group();
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
}
}
Maybe the problem is that I use escape characters in my test, whereas the real string doesn't contain any escapes?
UPDATE
Does anyone know why this doesn't work when used in java ? or how to make it work?
(?<=lastStop=[\"']?)[^\"]+
The reason it doesn't work as you expect is because of the * in [^\"']*. The lookbehind is matching at the position before the " in lastStop=", which is permitted because the quote is optional: [\"']?. The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters.
If you change that * to a +, the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\"']+ will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:
String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group(1);
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double quotes, single quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.
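One possible shape for the three-group version is sketched below. The exact pattern is an assumption made for illustration (in particular, an unquoted value is taken to end at whitespace, a quote or >), and `AttributeValue` and `lastStop` are hypothetical names. For real XML input, an XML parser is usually a more robust choice than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper with one capturing group per quoting style.
class AttributeValue {
    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    private static final Pattern LAST_STOP =
            Pattern.compile("lastStop=(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    // Returns the lastStop value, or null if the attribute is not found.
    static String lastStop(String text) {
        Matcher m = LAST_STOP.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three groups took part in the match.
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lastStop("<meis id=\"75\" lastStop=\"bendi\" uri=\"x\">")); // bendi
        System.out.println(lastStop("<meis lastStop='two words' id='75'>"));           // two words
        System.out.println(lastStop("<meis lastStop=bendi>"));                         // bendi
    }
}
```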
