I am looking for regex to extract a string from another string.
"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED
storeKey=true principal='test#test.net' validate=true serviceName=esaas
keyTab='<some value>' useKeyTab=true;"
How do I extract the string after keyTab=? I want to retrieve the value inside the single quotes.
Use the regex keyTab='(.*?)' and read capture group 1. In Java, your code should look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String content = "\"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED \r\n" +
"storeKey=true principal='test#test.net' validate=true serviceName=esaas \r\n" +
"keyTab='<some value>' useKeyTab=true;\"";
Pattern pattern = Pattern.compile("keyTab='(.*?)'");
Matcher matcher = pattern.matcher(content);
if (matcher.find()) { // group(1) is only valid after a successful find()
System.out.println(matcher.group(1)); // <some value>
}
}
}
Something that will work in most regex engines is to match both the thing you want and the text before it, and put the thing you want in a capture group.
This regex puts what's between the quotes in capture group 1:
\bkeyTab=\'([^\']*)\'
The \b is a word boundary to make sure keyTab isn't part of a larger word.
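As a minimal sketch of how this pattern could be wrapped in Java, the helper below returns an Optional so that a missing keyTab is handled explicitly instead of throwing. The class name, method name and the sample path /etc/app.keytab are made up for illustration and are not part of the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyTabExtractor {
    // \b keeps "keyTab" from matching inside a longer word; [^']* stops at the closing quote.
    private static final Pattern KEY_TAB = Pattern.compile("\\bkeyTab='([^']*)'");

    /** Returns the quoted keyTab value, or an empty Optional when the key is absent. */
    static Optional<String> extractKeyTab(String config) {
        Matcher matcher = KEY_TAB.matcher(config);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical input; the path is an invented sample value.
        String config = "storeKey=true principal='test#test.net' keyTab='/etc/app.keytab' useKeyTab=true;";
        System.out.println(extractKeyTab(config).orElse("<no keyTab found>"));
    }
}
```

Because of the word boundary, an input such as mykeyTab='x' yields an empty result rather than a false match.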
You can use this expression to find it:
keyTab='(.*?)'
It matches the whole keyTab='...' expression, but captures only what is between the quotes.
[\n\r].*keyTab='\s*([^\n\r]*)'
Your desired match will be in capture group 1.
I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?
Use a regex lookbehind:
(?<=\bid=)[^],]*
See Regex101.
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine
Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();
Using replace and split doesn't take the structure of the data into account.
If you want to use a regex, you can use a capturing group without any lookarounds, where the enumId value can be any characters except =, a comma and ], and the id value can be any characters except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
In Java
String regex = "\\bEnumOption\\[[^][]*\\bid=([^,\\]]+)[^][]*\\]";
A regex can of course be used, but sometimes is less performant, less readable and more bug-prone.
I would advise you not use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));
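A possible variant of the same idea, sketched with named groups and negated character classes instead of the greedy .+ so that each value cannot run past a comma or the closing bracket. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnumOptionParser {
    // Named groups make the call site read as group("id") instead of group(2).
    private static final Pattern ENUM_OPTION =
            Pattern.compile("EnumOption\\[enumId=(?<enumId>[^,\\]]+),id=(?<id>[^\\]]+)\\]");

    /** Returns the id value, or throws if the text does not have the expected shape. */
    static String parseId(String text) {
        Matcher matcher = ENUM_OPTION.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("unexpected input: " + text);
        }
        return matcher.group("id");
    }

    public static void main(String[] args) {
        System.out.println(parseId("EnumOption[enumId=test,id=machine]"));
    }
}
```

Using matches() rather than find() means the whole string has to fit the pattern, so malformed input is reported instead of being partially parsed.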
I am trying to create a regular expression that matches a file name such as
"abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz"
How can I handle the "OR" condition for the extension? This is what I have so far:
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
This handles only the .txt. How can I handle ".txt.gz"?
Thanks
Why not just use endsWith instead of a complex regex?
if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
You can use the below regex to achieve your purpose:
^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$
Explanation of the above regex:
^,$ - Match the start and end of the test string respectively (or of each line when Pattern.MULTILINE is set).
[\w-]+ - Matches a word character or hyphen one or more times.
\d{n} - Matches exactly n digits, where n is the number in the curly braces.
(?:\.gz)? - A non-capturing group matching .gz zero or one time because of the ? quantifier. You could have used | alternation (the OR you were expecting), but this is more legible and more efficient too.
You can find the demo of the above regex here.
IMPLEMENTATION IN JAVA:
import java.util.regex.*;
public class Main
{
private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);
public static void main(String[] args) {
String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
Matcher matcher = pattern.matcher(testString);
while(matcher.find()){
System.out.println(matcher.group(0));
}
}
}
You can find the implementation of the above regex in java in here.
NOTE: If you want to match for valid dates also; please visit this.
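One way to also reject impossible dates, as a rough sketch: let the regex check the shape of the name and hand the captured date to java.time for validation. This assumes the date is in day-month-year order, which the question does not state, and it requires an underscore before the date; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatedFileName {
    // Group 1 captures the date part so it can be validated separately.
    private static final Pattern NAME =
            Pattern.compile("[\\w-]+_(\\d{2}-\\d{2}-\\d{4})\\.txt(?:\\.gz)?");

    // Assumption: dd-MM-yyyy order. STRICT rejects dates such as 31-02-2020.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("dd-MM-uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String fileName) {
        Matcher matcher = NAME.matcher(fileName);
        if (!matcher.matches()) {
            return false; // wrong shape or wrong extension
        }
        try {
            LocalDate.parse(matcher.group(1), DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false; // right shape, but not a real calendar date
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcd_04-04-2020.txt.gz"));
        System.out.println(isValid("abcd_31-02-2020.txt"));
    }
}
```

Keeping the calendar check out of the regex avoids a long, hard-to-read pattern for month lengths and leap years.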
You can replace .[a-zA-Z]{3} with \.txt(\.gz)?
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4})\\.txt(\\.gz)?")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
? will give you the OR you need. Try adding
(\.[a-zA-Z]{2})?
to your original regex (with the dots escaped so that they match only a literal dot):
([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\.[a-zA-Z]{3}(\.[a-zA-Z]{2})?)
A possible way of doing it:
Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");
Then you can run the following test:
String[] fileNames = {
"abcd_04-04-2020.txt",
"abcd_04-04-2020.tar",
"abcd_04-04-2020.txt.gz",
"abcd_04-04-2020.png",
".txt",
".txt.gz",
"04-04-2020.txt"
};
Arrays.stream(fileNames)
.filter(fileName -> pattern.matcher(fileName).find())
.forEach(System.out::println);
// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz
I think what you want (following from the direction you were going) is this:
[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)
At the end, I have an alternation. It has to either match the end of the string ($) OR it has to match a literal dot followed by 2 letters and then the end of the string (\\.[a-zA-Z]{2}$). Remember to escape the ., because in regex . means "match any character".
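To illustrate why the escaped dot matters, here is a small sketch comparing the pattern from the question (unescaped dot) with an escaped version that also allows the optional two-letter suffix. The class name and the sample name abcd_04-04-2020xtxt are invented for the demonstration.

```java
import java.util.regex.Pattern;

class DotEscaping {
    // The question's pattern: the bare dot matches any character.
    static final Pattern UNESCAPED =
            Pattern.compile("[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3}");

    // Escaped dots match only a literal '.', and the optional group allows ".gz".
    static final Pattern ESCAPED =
            Pattern.compile("[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:\\.[a-zA-Z]{2})?");

    public static void main(String[] args) {
        String bogus = "abcd_04-04-2020xtxt"; // no dot before the extension
        System.out.println(UNESCAPED.matcher(bogus).matches()); // accepted by mistake
        System.out.println(ESCAPED.matcher(bogus).matches());   // rejected
        System.out.println(ESCAPED.matcher("abcd_04-04-2020.txt.gz").matches());
    }
}
```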
I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text] and need to get at the text. Strings could look like: something &[text] &[text] anything &[text] etc.
I have tried the following regex but I cannot seem to get it to work: &\[(.*)\]
Any help would be greatly appreciated.
Brackets are a bit tricky regarding escaping. Try this:
Pattern r = Pattern.compile("&\\[([^\\]]*)\\]");
Matcher m = r.matcher("foo &[bla] [foo] &[blub]&[blab]");
while (m.find()) {
System.out.println("Found value: " + m.group(1));
}
I replaced your dot with a character class that matches any character except a closing bracket. The star operator would otherwise greedily match up to the last closing bracket in the string. You could also make the quantifier reluctant with a question mark, which reads even better: "&\\[(.*?)\\]"
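The difference between the greedy and the reluctant quantifier can be seen by collecting the captures for both patterns on the same input. This is only a sketch; the helper class and method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketCaptures {
    /** Collects capture group 1 of every match of regex in input. */
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo &[bla] [foo] &[blub]&[blab]";
        // Greedy: one match that runs to the last ']' in the input.
        System.out.println(captures("&\\[(.*)\\]", input));
        // Reluctant: one match per &[...] occurrence.
        System.out.println(captures("&\\[(.*?)\\]", input));
    }
}
```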
Two things you need to do:
Double escape your square brackets
Prevent the capture group from matching other occurrences of the pattern, by preventing it from matching an opening or a closing bracket
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "&[test] something ] something &[test2]";
Pattern pattern = Pattern.compile("&\\[([^\\[\\]]*)\\]");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println("capture group: " + matcher.group(1));
}
}
}
I have a String which contains a set of properties like the following one:
"T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK"
Is there a Regular Expression which can be used in Java to gather the individual properties (such as the SURNAME) ?
Thanks
This small example shows how to access the property name and its value in your example string. This is for all properties and values in the string.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {
public static void main(String[] args) {
String s = "T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK";
Matcher m = Pattern.compile("(?:, )?([^=]+)\\=([^,]+)").matcher(s);
while (m.find()) {
System.out.println(m.group(1) + " - " + m.group(2));
}
}
}
Explanation of the regex:
(?:, )?([^=]+)\\=([^,]+)
(?:, )? is a non-capturing group that can, but does not have to, occur. It matches the separating comma and space between the property-value pairs.
([^=]+) is a group that matches one or more characters until a = appears.
\\= matches the =. The = is not a special character in a regex, so the escape is harmless but not required; a plain = works too.
([^,]+) matches one or more characters up to the next comma, where the next property starts.
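If individual properties such as SURNAME need to be looked up by name, one option is to collect the matches into a map. The following is a sketch that reuses the regex above; the class and method names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyParser {
    private static final Pattern PAIR = Pattern.compile("(?:, )?([^=]+)=([^,]+)");

    /** Parses "K1=V1, K2=V2, ..." into a map that keeps the original order. */
    static Map<String, String> parse(String s) {
        Map<String, String> properties = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            properties.put(m.group(1), m.group(2));
        }
        return properties;
    }

    public static void main(String[] args) {
        String s = "T=Junior Developer, DNQ=13346057, SURNAME=Doe, GIVENNAME=John, SERIALNUMBER=UK";
        System.out.println(parse(s).get("SURNAME"));
    }
}
```

Note that this simple pattern would split a value that itself contains a comma, so it only suits input shaped like the sample string.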
SURNAME=[^,]+
You can use this. Or, to be safer, you can also use
SURNAME=.*?(?=,\s)
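The second one will work even if the surname contains a comma (as long as that comma is not followed by whitespace), but it requires a comma and whitespace after the value, so it will not match if SURNAME is the last property.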
I have a directory like this and I am trying to extract the word "photon" from just before "photon.exe".
C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running)
My code looks like this:
String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
Pattern pattern = Pattern.compile(".+\\\\(.+).exe");
Matcher matcher = pattern.matcher(path);
System.out.println(matcher.group(1));
No matter what permutations I try I keep getting IllegalStateExceptions etc, despite this regular expression working on http://www.regexplanet.com/simple/index.html.
Thanks in advance for any help. I am super frustrated at this point >.<
You need to actually run the matcher:
if ( matcher.find() ) {
System.out.println(matcher.group(1));
}
Note that I use matcher.find() above instead of matcher.matches() because your regex is not set up to match the entire string (it won't match the (Suspended... part). Since that's the case, you don't really need the preamble to the slash; \\\\(.+).exe should work fine.
Of course, this is mentioned in the documentation for group(int):
Throws:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
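A small sketch of one way to make the "no match" case explicit, so that group(1) is never called on a matcher that has not matched. The class name, method name and the slightly tightened pattern (no backslashes inside the name, escaped dot) are choices made for this example, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExeNameFinder {
    // A backslash, then the name without backslashes, then a literal ".exe".
    private static final Pattern EXE = Pattern.compile("\\\\([^\\\\]+)\\.exe");

    /** Returns the executable's base name, or empty if the text contains no "\name.exe". */
    static Optional<String> find(String text) {
        Matcher matcher = EXE.matcher(text);
        // group(1) is only read after find() has succeeded.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        System.out.println(find(path).orElse("<no executable found>"));
    }
}
```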
You can use the following regular expression: ^.*\\(.*)\.exe.*$ and the file name will be in the first capture group. Here is an example.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main
{
public static void main(final String[] args)
{
final String input = args[0];
final Pattern pattern = Pattern.compile("^.*\\\\(.*)\\.exe.*$");
final Matcher matcher = pattern.matcher(input);
if (matcher.find())
{
System.out.println("matcher.group(1) = " + matcher.group(1));
}
else
{
System.out.format("%s does not match %s\n", input, pattern.pattern());
}
}
}
Run it with C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running) as the input and here is the expected output:
matcher.group(1) = photon
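// Backslashes must be escaped in a Java string literal; getName() splits on '\' only on Windows.
(new java.io.File("C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)")).getName().split("\\.")[0];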
Try this regex: [\\d\\w]+\\.exe
It assumes the executable only has digits and letters.
Another option is to use .+\\.exe to get the full file name and use substring and lastIndexOf('\\') to get the file name.
You can also use new File(fullFilePath).getName(), which is a more correct way to do it as it will save you the substring, but I don't know if it has better performance.
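The substring idea can also be sketched without any regex, using indexOf and lastIndexOf directly. Unlike File.getName(), this does not depend on the platform's path separator. The class and method names below are hypothetical.

```java
import java.util.Optional;

class ExeBaseName {
    /** Returns the text between the last backslash before ".exe" and ".exe" itself. */
    static Optional<String> baseName(String text) {
        int dot = text.indexOf(".exe");
        if (dot < 0) {
            return Optional.empty(); // no executable in the text
        }
        // Search backwards from the ".exe" position; -1 means there is no directory part.
        int slash = text.lastIndexOf('\\', dot);
        return Optional.of(text.substring(slash + 1, dot));
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        System.out.println(baseName(path).orElse("<no executable found>"));
    }
}
```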