Convert this pattern to regex for Pattern.matches(..) - java

Some of my strings may contain a substring that looks like #[alph4Num3ric-alph4Num3ric], where I will find the alphanumeric id and replace it with the corresponding text value mapped to that key in a map.
My first inclination was to check if my string.contains("#["), but I want to be more specific,
so now I am looking at Pattern.matches(..) but am unsure of the regex and the overall expression.
How would I write a regex for #[ ...... - .... ] in the Pattern.matches method? It must also account for dashes, and I'm not sure what needs to be escaped or wildcarded in this syntax.
I am also not 100% sure this is the best method. I want to get a boolean from Pattern.matches first, and then get the real value and modify the string with those values, which seems good enough, but I want to minimize computations.

Please try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String expression = "String contains #[alph4Num3ric-alph4Num3ric] as substring";
        // '[' and ']' must be escaped; each side of the dash is captured in its own group
        Pattern pattern = Pattern
                .compile("\\#\\[([a-zA-Z0-9]+)-([a-zA-Z0-9]+)\\]");
        Matcher matcher = pattern.matcher(expression);
        while (matcher.find()) {
            System.out.println("matched: " + matcher.group());
            System.out.println("group1: " + matcher.group(1));
            System.out.println("group2: " + matcher.group(2));
            // String.replace swaps every occurrence of the group 1 text
            System.out
                    .println("after replace: " + expression.replace(matcher.group(1), "customkey"));
        }
    }
}
output :
matched: #[alph4Num3ric-alph4Num3ric]
group1: alph4Num3ric
group2: alph4Num3ric
after replace: String contains #[customkey-customkey] as substring

Try using this:
/#[(a-zA-Z0-9-)+]/
I haven't given it a try but hope this would help. Also, if it returns an error, then add a backslash between 9 and -, e.g. /#[(a-zA-Z0-9\-)+]/
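On the "minimize computations" point in the question: a single find() loop can detect the token and substitute it in one pass, so a separate Pattern.matches check may not be needed. This is only a minimal sketch. It assumes the whole text between the brackets (both halves joined by the dash) is the map key, and that tokens with unknown keys should be left as they are. The class name TokenReplacer, the method replaceTokens and the sample map are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenReplacer {
    // Compiled once and reused; group 1 is the id between "#[" and "]".
    private static final Pattern TOKEN =
            Pattern.compile("#\\[([a-zA-Z0-9]+-[a-zA-Z0-9]+)\\]");

    // Replaces every #[id-id] token whose id is a key in the map.
    // Assumption: tokens with unknown ids are copied through unchanged.
    public static String replaceTokens(String input, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement stops '$' and '\' in the value from being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("abc123-def456", "some text"); // hypothetical key and value
        System.out.println(
                replaceTokens("String contains #[abc123-def456] as substring", values));
        // prints "String contains some text as substring"
    }
}
```

If a plain boolean is still wanted, matcher.find() on the same compiled pattern gives it; Pattern.matches only returns true when the entire string is the token.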

Related

Need regex help - How to extract a String using regex?

I am looking for regex to extract a string from another string.
"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED
storeKey=true principal='test#test.net' validate=true serviceName=esaas
keyTab='<some value>' useKeyTab=true;"
How do I extract the string after keyTab=? I want to retrieve the value inside the single quotes.
Use the regex keyTab='(.*?)' and match the group 1. In java, your code should look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String content = "\"sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule REQUIRED \r\n" +
"storeKey=true principal='test#test.net' validate=true serviceName=esaas \r\n" +
"keyTab='<some value>' useKeyTab=true;\"";
Pattern pattern = Pattern.compile("keyTab='(.*?)'");
Matcher matcher = pattern.matcher(content);
// group(1) is only valid after a successful find()
if (matcher.find()) {
System.out.println(matcher.group(1)); // <some value>
}
}
}
Something that will work in most regex engine is to look for both the thing you want and the thing before it.
And put the thing you want in a capture group
This regex will put what's between the quotes in capture group \1
\bkeyTab=\'([^\']*)\'
The \b is a word boundary to make sure keyTab isn't part of a larger word.
You can use this expression to find it:
keyTab='(.*?)'
It will find all the values around keyTab='...', but will only capture what is between the quotes.
[\n\r].*keyTab='\s*([^\n\r]*)'
Your desired match will be in capture group 1.
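The word-boundary variant suggested above can be wrapped in a small helper. This is a sketch only: the class name KeyTabExtractor, the method extractKeyTab and the choice to return null when nothing is found are assumptions for illustration, and the sample config is a shortened version of the one in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyTabExtractor {
    // \b keeps "keyTab" from matching inside a longer word;
    // [^']* stops at the closing quote without relying on lazy matching.
    private static final Pattern KEY_TAB = Pattern.compile("\\bkeyTab='([^']*)'");

    // Returns the quoted value, or null when there is no keyTab='...' entry.
    public static String extractKeyTab(String config) {
        Matcher matcher = KEY_TAB.matcher(config);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String config = "storeKey=true validate=true serviceName=esaas\n"
                + "keyTab='<some value>' useKeyTab=true;";
        System.out.println(extractKeyTab(config)); // prints "<some value>"
    }
}
```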

Regex matching word that is in the middle of any character except a letter

I'd like to know how to detect word that is between any characters except a letter from alphabet. I need this, because I'm working on a custom import organizer for Java. This is what I have already tried:
The regex expression:
[^(a-zA-Z)]InitializationEvent[^(a-zA-Z)]
I'm searching for the word "InitializationEvent".
The code snippet I've been testing on:
public void load(InitializationEvent event) {
It looks like adding space before the word helps... is the parenthesis inside of alphabet range?
I tested this in my program and it didn't work. Also I checked it on regexr.com, showing same results - class name not recognized.
Am I doing something wrong? I'm new to regex, so it might be a really basic mistake, or not. Let me know!
Lose the parentheses:
[^a-zA-Z]InitializationEvent[^a-zA-Z]
Inside [], parentheses are taken literally, and by inverting the group (^) you prevent it from matching because a ( is preceding InitializationEvent in your string.
Note, however, that the above regex will only match if InitializationEvent is neither at the beginning nor at the end of the tested string. To allow that, you can use:
(^|[^a-zA-Z])InitializationEvent([^a-zA-Z]|$)
Or, without creating any matching groups (which is supposed to be cleaner, and perform better):
(?:^|[^a-zA-Z])InitializationEvent(?:[^a-zA-Z]|$)
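The difference between the plain character-class version and the version with ^ and $ alternatives only shows up when the word sits at the very start or end of the input. A small sketch to check that, with a made-up class name (BoundaryCheck) and made-up test strings:

```java
import java.util.regex.Pattern;

public class BoundaryCheck {
    // Needs one non-letter character on each side, so it cannot match at the string edges.
    public static final Pattern CONSUMING =
            Pattern.compile("[^a-zA-Z]InitializationEvent[^a-zA-Z]");
    // Accepts either a non-letter or the start/end of the input on each side.
    public static final Pattern ANCHORED =
            Pattern.compile("(?:^|[^a-zA-Z])InitializationEvent(?:[^a-zA-Z]|$)");

    public static void main(String[] args) {
        String inside = "public void load(InitializationEvent event) {";
        String atEdges = "InitializationEvent";
        System.out.println(CONSUMING.matcher(inside).find());   // true
        System.out.println(CONSUMING.matcher(atEdges).find());  // false
        System.out.println(ANCHORED.matcher(atEdges).find());   // true
        System.out.println(ANCHORED.matcher("MyInitializationEventX").find()); // false
    }
}
```

Note that both patterns include the neighbouring characters in the match, so group() returns "(InitializationEvent " for the first string rather than just the class name.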
how to detect word that is between any characters except a letter from alphabet
This is the case where lookarounds come handy. You can use:
(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
(?<![a-zA-Z]) is a negative lookbehind asserting that there is no letter at the previous position
(?![a-zA-Z]) is a negative lookahead asserting that there is no letter at the next position
The parentheses are causing the problem, just skip them:
"[^a-zA-Z]InitializationEvent[^a-zA-Z]"
or use the predefined non-word character class which is slightly different because it also excludes numbers and the underscore:
"\\WInitializationEvent\\W"
But since it seems you want to match a class name, this might be OK, because the characters that \W excludes (letters, digits and the underscore) are the ones typically found in a class name.
I'm not sure about your application but from a regexp perspective you can use negative lookaheads and negative lookbehinds to define what cannot surround the String to specify a match.
I have added the negative lookahead (?![a-zA-Z]) and the negative lookbehind (?<![a-zA-Z]) in place of your [^(a-zA-Z)] originally supplied to create: (?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
Quick Fiddle I created:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HelloWorld{
public static void main(String []args){
String pattern = "(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])";
String sourceString = "public void load(InitializationEvent event) {";
String sourceString2 = "public void load(BInitializationEventA event) {";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(sourceString);
if (m.find( )) {
System.out.println("Found value of pattern in sourceString: " + m.group(0) );
} else {
System.out.println("NO MATCH in sourceString");
}
Matcher m2 = r.matcher(sourceString2);
if (m2.find( )) {
System.out.println("Found value of pattern in sourceString2: " + m2.group(0) );
} else {
System.out.println("NO MATCH in sourceString2");
}
}
}
output:
sh-4.3$ java -Xmx128M -Xms16M HelloWorld
Found value of pattern in sourceString: InitializationEvent
NO MATCH in sourceString2
You seem really close:
[^(a-zA-Z)]*(InitializationEvent)[^(a-zA-Z)]*
I think this is what you are looking for. The asterisk provides a match for zero or many of the character or group before it.
EDIT/UPDATE
My apologies on the initial response.
[^a-zA-Z]+(InitializationEvent)[^a-zA-Z]+
My regex is a little rusty, but this will match on any non-alphabet character one or many times prior to the InitializationEvent and after.

Java Regex is including new line in match

I'm trying to match a regular expression to textbook definitions that I get from a website.
The definition always has the word with a new line followed by the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither") I keep getting the newline character.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that doesn't seem to successfully match the word at all. I've been testing with rubular, http://rubular.com/r/LPEHCnS0ri; which seems to successfully match all my attempts the way I want, despite the fact that Java doesn't.
Here's my snippet
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by trimming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!
Try using the Pattern.MULTILINE option
Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
This causes the regex to recognise line delimiters in your string, otherwise ^ and $ just match the start and end of the string.
Although it makes no difference for this pattern, the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) based on the number you specify. Your pattern specifies one capture group which is what you want captured. If you'd included \s in your Pattern as you wrote you tried, then Matcher.group() would have included that whitespace in its return value.
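A quick way to see the effect of the flag is to run the same pattern with and without it. A minimal sketch, where the class name FirstLineWord, the helper firstWord and the shortened sample text are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstLineWord {
    // Returns the first line made up only of non-whitespace characters, or null if there is none.
    public static String firstWord(String text, int flags) {
        Matcher matcher = Pattern.compile("^(\\S+)$", flags).matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Zither\nDefinition: An instrument of music";
        // Without MULTILINE, $ only matches at the end of the input, so nothing is found.
        System.out.println(firstWord(text, 0));                 // null
        // With MULTILINE, ^ and $ also match at line breaks.
        System.out.println(firstWord(text, Pattern.MULTILINE)); // Zither
    }
}
```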
With regular expressions, group 0 is always the complete match. In your case you want group 1, not group 0.
So changing mtch.group() to mtch.group(1) should do the trick:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\w+)\\s");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group(1);
terms.add(new SearchTerm(result, System.nanoTime()));
}
A late response, but if you are not using Pattern and Matcher, you can use this alternative of DOTALL in your regex string
(?s)[Your Expression]
Basically (?s) also tells dot to match all characters, including line breaks
Detailed information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Just replace:
String result = mtch.group();
By:
String result = mtch.group(1);
This will limit your output to the contents of the capturing group (e.g. (\\w+)) .
Try the next:
/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN =
Pattern.compile("^(\\w+)\\r?\\n(.*)$");
public static void main(String[] args) {
String input = "Zither\nDefinition: An instrument of music";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
); // prints "Zither = Definition: An instrument of music"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1")
); // prints "Zither"
}

How to parse the meta keywords of chinese with java regex?

I have the code below, but it looks like it parses keywords the wrong way for Chinese. How can I change it?
OUTPUT:
keyword:test
keyword:中
keyword:文
keyword:U
keyword:I
keyword:素
keyword:材
Should be below:
keyword:test
keyword:中文
keyword:UI
keyword:素材
This is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class test {
    public static final Pattern KEYWORDS_REGEX =
            Pattern.compile("[^\\s,](?:[^,]+[^\\s,])?");
    /**
     * @param args
     */
    public static void main(String[] args) {
        String keywords = "test, 中文, UI, 素材";
        Matcher matcher = KEYWORDS_REGEX.matcher(keywords);
        while (matcher.find()) {
            String s = matcher.group();
            System.out.println("keyword:" + s);
        }
    }
}
Thanks!
The problem isn't with Chinese characters, the problem is with keywords that are two characters long. (That's why it affects UI as well.) This regex:
[^\s,](?:[^,]+[^\s,])?
allows two possibilities:
[^\s,] <-- exactly one character
[^\s,][^,]+[^\s,] <-- three or more characters
so any keywords with two characters will not match, so they get split into single-character keywords.
You could fix your regex by changing [^,]+ to [^,]*, but I'm inclined to agree with the spirit of Kisaro's comment above; I think you'd be better off using Pattern.split:
private static final Pattern KEYWORD_SPLITTER = Pattern.compile("\\s*,\\s*");
for(final String s : KEYWORD_SPLITTER.split(keywords))
System.out.println("keyword:" + s);
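Both fixes from this answer can be put side by side to check that they agree. This is a sketch under a few assumptions: the class name KeywordParser and the methods byMatching and bySplitting are made up, and the input is trimmed before splitting so that leading or trailing whitespace does not end up in the first or last keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordParser {
    // The question's pattern with [^,]+ relaxed to [^,]* so two-character keywords match.
    private static final Pattern KEYWORD = Pattern.compile("[^\\s,](?:[^,]*[^\\s,])?");
    // The splitting alternative: a comma with optional whitespace on either side.
    private static final Pattern SPLITTER = Pattern.compile("\\s*,\\s*");

    // Collects every match of the corrected keyword pattern.
    public static List<String> byMatching(String keywords) {
        List<String> result = new ArrayList<>();
        Matcher matcher = KEYWORD.matcher(keywords);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Splits on the separators instead of matching the keywords.
    public static List<String> bySplitting(String keywords) {
        return Arrays.asList(SPLITTER.split(keywords.trim()));
    }

    public static void main(String[] args) {
        String keywords = "test, 中文, UI, 素材";
        System.out.println(byMatching(keywords));  // [test, 中文, UI, 素材]
        System.out.println(bySplitting(keywords)); // [test, 中文, UI, 素材]
    }
}
```

The two differ on edge cases such as an empty input, where split returns a single empty string and the matching version returns nothing.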
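Your regex should be \\w+ to match words. Note that in Java \w only matches ASCII letters, digits and the underscore by default, so the pattern has to be compiled with Pattern.UNICODE_CHARACTER_CLASS before it can match the Chinese keywords and generate the desired output.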
Also since someone suggested explode: Apache Commons

Using Java's Regex to extract a word from a path name

I have a directory like this and I am trying to extract the word "photon" from just before "photon.exe".
C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running)
My code looks like this:
String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
Pattern pattern = Pattern.compile(".+\\\\(.+).exe");
Matcher matcher = pattern.matcher(path);
System.out.println(matcher.group(1));
No matter what permutations I try I keep getting IllegalStateExceptions etc, despite this regular expression working on http://www.regexplanet.com/simple/index.html.
Thanks in advance for any help. I am super frustrated at this point >.<
You need to actually run the matcher:
if ( matcher.find() ) {
System.out.println(matcher.group(1));
}
Note that I use matcher.find() above instead of matcher.matches() because your regex is not set up to match the entire string (it won't match the (Suspended... part). Since that's the case, you don't really need the preamble to the slash, but without it find() starts at the first backslash, so the capture group has to exclude backslashes; \\\\([^\\\\]+)\\.exe should work fine.
Of course, this is mentioned in the documentation for group(int):
Throws:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
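The behaviour described in that documentation excerpt is easy to reproduce. A small sketch, assuming the question's pattern with the dot before exe escaped; the class name GroupBeforeFind and the helper exeName are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBeforeFind {
    // Backslashes are doubled once for the Java string literal and once more for the regex.
    public static final Pattern EXE_NAME = Pattern.compile(".+\\\\(.+)\\.exe");

    // Returns the name before ".exe", or null when the pattern is not found.
    public static String exeName(String path) {
        Matcher matcher = EXE_NAME.matcher(path);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        Matcher matcher = EXE_NAME.matcher(path);
        try {
            matcher.group(1); // no match has been attempted yet
        } catch (IllegalStateException e) {
            System.out.println("group(1) before find() threw IllegalStateException");
        }
        System.out.println(exeName(path)); // photon
    }
}
```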
you can use the following regular expression: ^.*\\(.*)\.exe.*$ and the file name will be in the first match group. Here is an example.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main
{
public static void main(final String[] args)
{
final String input = args[0];
final Pattern pattern = Pattern.compile("^.*\\\\(.*)\\.exe.*$");
final Matcher matcher = pattern.matcher(input);
if (matcher.find())
{
System.out.println("matcher.group(1) = " + matcher.group(1));
}
else
{
System.out.format("%s does not match %s\n", input, pattern.pattern());
}
}
}
run it with C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running) as the input and here is the expected output:
matcher.group(1) = photon
(new java.io.File("C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)")).getName().split("\\.")[0];
Try this regex: [\\d\\w]+\\.exe
It assumes the executable only has digits and letters.
Another option is to use .+\\.exe to get the full file name and use substring and lastIndexOf('\\') to get the file name.
You can also use new File(fullFilePath).getName() which is a more correct way to do it as it will save you the substring - but I don't know if it has better performance.
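The substring and lastIndexOf idea can be sketched without any regex at all. One reason to prefer it over File.getName() is that File only treats the backslash as a separator on Windows, so the File-based answers depend on the platform the code runs on. The class name ExeBaseName and the method baseName are made up, and returning null when ".exe" is missing is an assumption.

```java
public class ExeBaseName {
    // Returns the text between the last backslash before ".exe" and the ".exe" itself.
    // If there is no backslash, the text from the start of the string is returned.
    // Assumption: null is returned when the string does not contain ".exe".
    public static String baseName(String path) {
        int exe = path.lastIndexOf(".exe");
        if (exe < 0) {
            return null;
        }
        int slash = path.lastIndexOf('\\', exe); // -1 when there is no backslash
        return path.substring(slash + 1, exe);
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        System.out.println(baseName(path)); // photon
    }
}
```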
