Regular Expression not working in Java - java

I have the string "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')".
I need to return the string between "SELECTOR(" and ")"; the result should be: '(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')'.
My code with the regular expression is:
public static String extractSelector(String txt){
Pattern pattern = Pattern.compile("^SELECTOR\\((.*)\\)$");
Matcher m = pattern.matcher(txt);
String s = null;
while (m.find()) {
s = m.group(1);
}
return s;
}
Where txt = "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')"
But this always returns null. Why is that?

This jdoodle demonstrates that this part of the code works (I only added a declaration for txt and removed the return statement). Are you sure nothing is done to txt before that?
Given your code, the only explanation is that txt does not fully match the pattern: the while loop is never entered, so find() must be returning false.
Perhaps there are spaces between SELECTOR and the opening bracket (, or leading or trailing spaces around the text, which the ^ and $ anchors do not allow. Perhaps SELECTOR is sometimes written in lowercase. To debug this, you can write
System.out.println("\""+txt+"\"");
before matching, or use your IDE's debugger to inspect txt. The quotes make leading and trailing spaces explicit.
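A minimal, self-contained sketch of that check. It reuses the pattern from the question; the class name SelectorDemo and the modified inputs are illustrative assumptions. The exact string from the question matches, while a single leading space (or a lowercase prefix) is enough to make the result null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectorDemo {
    // Same pattern as in the question: anchored at both ends, greedy capture.
    private static final Pattern SELECTOR = Pattern.compile("^SELECTOR\\((.*)\\)$");

    public static String extractSelector(String txt) {
        Matcher m = SELECTOR.matcher(txt);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String ok = "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')";
        // Print with quotes so stray whitespace becomes visible.
        System.out.println("\"" + ok + "\"");
        System.out.println(extractSelector(ok));
        // A single leading space is enough to defeat the ^ anchor: prints null.
        System.out.println(extractSelector(" " + ok));
        // Lowercase "selector" does not match either: prints null.
        System.out.println(extractSelector("selector('x')"));
    }
}
```

If the input really can carry surrounding whitespace, trimming it before matching (txt.trim()) is one possible workaround.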

Related

String equals/contains: neither of them gets what I want

I have a string that can look somewhat like:
NCC_johjon (\users\johanjo\tomcattest\oysters\NCC_johjon, port 16001), utv_johjon (\users\johanjo\tomcattest\oysters\utv_johjon, port 16000)
and there could be a lot of entries like NCC_etskys, NCC_homyis and so on. I want to check whether an entry that is exactly "NCC_joh" already exists somewhere in the string. I tried something like
if(oysters.contains("NCC_joh")){
System.out.println("HEJ HEJ HEJ HALLÅ HALLÅ HALLÅ");
}
but if there is an NCC_johjon in there, it still enters the if block. I only want it to enter if exactly that part exists, not longer and not shorter. And .equals compares against the whole String, which is not what I want either. Does anyone have an idea? It would be easier if I were working with a list of Strings, but I don't have that.
oysterPaths starts out as a Collection:
Collection<TomcatResource> oysterPaths = TomcatResource.listCats(Paths.get(tomcatsPath));
Use regular expressions.
if (oysters.matches("(?s).*\\bNCC_joh\\b.*")) {
where
(?s) = single line mode, DOT-ALL, so . will match a newline too.
. = any char
.* = zero or more occurrences of . (any char)
\b = word boundary
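String.matches matches the pattern against the entire string, hence the need for .* at the beginning and end.
(A word boundary \b sits between a word character and a non-word character, so the text between the two \b must be a whole word. Since letters, digits and _ are all word characters, NCC_johjon does not match \bNCC_joh\b.)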
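A small sketch comparing contains with the word-boundary regex on the sample string from the question. WordBoundaryDemo and containsWord are hypothetical names, and the extra ", NCC_joh (...)" entry appended in the last call is made up for illustration.

```java
public class WordBoundaryDemo {
    // (?s) lets .* cross newlines; \b on both sides requires NCC_joh to be a whole word.
    public static boolean containsWord(String text) {
        return text.matches("(?s).*\\bNCC_joh\\b.*");
    }

    public static void main(String[] args) {
        String oysters = "NCC_johjon (\\users\\johanjo\\tomcattest\\oysters\\NCC_johjon, port 16001), "
                + "utv_johjon (\\users\\johanjo\\tomcattest\\oysters\\utv_johjon, port 16000)";
        System.out.println(oysters.contains("NCC_joh")); // true: substring of NCC_johjon
        System.out.println(containsWord(oysters));       // false: no standalone NCC_joh
        System.out.println(containsWord(oysters + ", NCC_joh (\\users\\x, port 16002)")); // true
    }
}
```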
This is similar to https://stackoverflow.com/a/49879388/2735286, but I would suggest using the find method with this regular expression:
\bNCC_joh\b
Using find simplifies the regular expression, because you only search for the part that is relevant.
Here is the corresponding method you can use:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static boolean superExactMatch(String expression) {
    // MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern
    Pattern p = Pattern.compile("\\bNCC_joh\\b", Pattern.MULTILINE);
    final Matcher matcher = p.matcher(expression);
    final boolean found = matcher.find();
    if (found) {
        // For debugging purposes: shows where the match happened in the expression
        System.out.println(matcher.start() + " " + matcher.end());
    }
    return found;
}
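If the token has to vary, one hedged generalisation is to pass it in as a parameter and wrap it with Pattern.quote so characters such as . or ( are treated literally. ExactTokenSearch and containsExactToken are hypothetical names; MULTILINE is left out because it only affects ^ and $. Note that \b only works as a guard when the token begins and ends with word characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactTokenSearch {
    // Hypothetical variant of superExactMatch with the token as a parameter.
    // Pattern.quote turns any regex metacharacters in the token into literals.
    public static boolean containsExactToken(String text, String token) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(token) + "\\b").matcher(text);
        return m.find();
    }

    public static void main(String[] args) {
        String oysters = "NCC_johjon (\\users\\johanjo\\tomcattest\\oysters\\NCC_johjon, port 16001), "
                + "utv_johjon (\\users\\johanjo\\tomcattest\\oysters\\utv_johjon, port 16000)";
        System.out.println(containsExactToken(oysters, "NCC_joh"));    // false
        System.out.println(containsExactToken(oysters, "NCC_johjon")); // true
    }
}
```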

I can't get the url with Pattern.compile

What I really want is to return the URLs contained in the txt variable. The URLs are different every time, so I don't know whether the regular expression is wrong or whether part of my code is badly written. Sorry, I speak Spanish and am using Google Translate.
//I can't get the url with Pattern.compile
//My code example::::: in the works :(
String txt="sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}],sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}]";
ArrayList<String> getfi = new ArrayList<String>();
Matcher matcher = Pattern.compile("sources: [{file:\"(.*)\"").matcher(txt);
if (matcher.find()) {
while(matcher.find()) {
getfi.add(matcher.group(1));
}
System.out.println(getfi);
} else {
System.exit(1);
}
Pattern.compile("sources: [{file:\"(.*)\"")
Your regex is wrong, since both [ and { are special characters, so they must be escaped. That is why you get PatternSyntaxException: Unclosed character class near index 21, which you didn't mention in your question.
Also, because .* is greedy, the pattern would match the entire string except for the last two characters.
if (matcher.find()) {
while(matcher.find()) {
The find() call in the if statement consumes the first match. Since that match covers the entire text except the last two characters, the find() call in the while loop finds nothing more, so the loop body is never entered.
To make it work, escape the special characters, make .* reluctant (non-greedy) by writing .*?, and fix the loop:
String txt="sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}],sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}]";
Matcher matcher = Pattern.compile("sources: \\[\\{file:\"(.*?)\"").matcher(txt);
ArrayList<String> getfi = new ArrayList<String>();
while (matcher.find()) {
getfi.add(matcher.group(1));
}
if (getfi.isEmpty()) {
System.exit(1);
}
System.out.println(getfi);
WARNING:
Notice that sometimes there is a space after :, and sometimes not. That is perfectly valid in JSON-style text, which may contain whitespace (including newlines) between tokens, so a simple regex is fragile.
Use a JSON parser instead (since the keys here are unquoted, it has to be one that accepts this relaxed, JavaScript-style syntax).
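If a parser is not an option, a slightly more tolerant regex can at least allow optional whitespace (including newlines) around the punctuation. This is a sketch under the assumption that every entry has the shape sources: [{file:"..." ; SourceUrlExtractor and the example URLs are hypothetical, and it is still no substitute for a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceUrlExtractor {
    // [ and { are escaped; \s* allows optional whitespace (including newlines)
    // around the punctuation; (.*?) is reluctant, so it stops at the first closing quote.
    private static final Pattern FILE_URL =
            Pattern.compile("sources:\\s*\\[\\s*\\{\\s*file:\\s*\"(.*?)\"");

    public static List<String> extract(String txt) {
        List<String> urls = new ArrayList<>();
        Matcher m = FILE_URL.matcher(txt);
        // Loop directly on find(): each call advances to the next match.
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical input: one compact entry, one spread over several lines.
        String txt = "sources: [{file:\"http://a.example.com/1.mp4\",label:\"360p\"}],"
                + "sources:\n  [ { file: \"http://b.example.com/2.mp4\", label: \"720p\" } ]";
        System.out.println(extract(txt));
        // [http://a.example.com/1.mp4, http://b.example.com/2.mp4]
    }
}
```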

Replace text with data & matched group contents

I don't believe I came across this while searching for a solution (believe me, I spent a good amount of time searching), so here goes.
Goal:
Match regex in a string and replace it with something that contains the matched value.
Regex used currently:
\b(Connor|charries96|Foo|Bar)\b
For the record, I'm bad at regex, in case this isn't the best way to do it.
My current code (and several other methods I tried) can only replace the text with the first match it encounters if there are multiple matches.
private Pattern regexFromList(List<String> input) {
if(input.size() < 1) {
return "";
}
StringBuilder builder = new StringBuilder();
builder.append("\\b");
builder.append("(");
for(String s : input) {
builder.append(s);
if(!s.equals(input.get(input.size() - 1)))
{
builder.append("|");
}
}
builder.append(")");
builder.append("\\b");
return Pattern.compile(builder.toString(), Pattern.CASE_INSENSITIVE);
}
Example input:
charries96's name is Connor.
Actual result, using TEST as the text to prepend to each match:
TESTcharries96's name is TESTcharries96.
Desired result using example input:
TESTcharries96's name is TESTConnor.
Here is my current code for replacing the text:
if(highlight) {
StringBuilder builder = new StringBuilder();
Matcher match = pattern.matcher(event.getMessage());
String string = event.getMessage();
if (match.find()) {
string = match.replaceAll("TEST" + match.group());
// I do realise I'm using #replaceAll but that's mainly given it gives me the same result as other methods so why not just cut to the chase.
}
builder.append(string);
return builder.toString();
}
EDIT:
Working example of desired result on RegExr
There are a few problems here:
You are taking the user input as is to build the regex:
builder.append(s);
If there are special characters in the user input, they might be recognized as metacharacters and cause unexpected behavior.
Always use Pattern.quote if you want to match a string exactly as it is passed in.
builder.append(Pattern.quote(s));
Matcher.replaceAll is a high-level function which resets the Matcher (starting the match all over again), searches for all the matches and performs the replacement. In your code, "TEST" + match.group() is evaluated only once, after the first find(), so every match gets replaced with TESTcharries96. Instead, refer to the group in the replacement string, which is evaluated per match. In your case, it can be as simple as:
String result = match.replaceAll("TEST$1");
The StringBuilder should be thrown away along with the if statement.
Matcher.find and Matcher.group are lower-level functions for fine-grained control over what you do with each match.
When you perform the replacement with them, you need to build the result with Matcher.appendReplacement and Matcher.appendTail.
A while loop (instead of an if statement) should be used with Matcher.find to search for and replace all matches.
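The points above can be sketched as follows, assuming the names arrive as a List<String> as in the question. HighlightDemo and its method names are hypothetical, and rejecting an empty list with an exception is an assumption of this sketch. StringBuffer is used because the appendReplacement(StringBuffer, String) overload exists in every Java version (the StringBuilder overload was added in Java 9).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightDemo {
    // Builds \b(name1|name2|...)\b, quoting each name so metacharacters are literal.
    public static Pattern regexFromList(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("names must not be empty");
        }
        StringBuilder alternatives = new StringBuilder();
        for (String name : names) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(name));
        }
        return Pattern.compile("\\b(" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // High level: replaceAll visits every match; $1 is group 1 of *that* match.
    public static String highlightAll(Pattern p, String text) {
        return p.matcher(text).replaceAll("TEST$1");
    }

    // Low level: the same result built by hand with appendReplacement/appendTail.
    public static String highlightManually(Pattern p, String text) {
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in the matched text literal.
            m.appendReplacement(out, Matcher.quoteReplacement("TEST" + m.group(1)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = regexFromList(Arrays.asList("Connor", "charries96", "Foo", "Bar"));
        String input = "charries96's name is Connor.";
        System.out.println(highlightAll(p, input));      // TESTcharries96's name is TESTConnor.
        System.out.println(highlightManually(p, input)); // TESTcharries96's name is TESTConnor.
    }
}
```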

Java Regex is including new line in match

I'm trying to match a regular expression to textbook definitions that I get from a website.
The text always consists of the word, then a newline, then the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither"), I keep getting the newline character as well.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that doesn't seem to match the word at all. I've been testing with Rubular (http://rubular.com/r/LPEHCnS0ri), which matches all my attempts the way I want, even though Java doesn't.
Here's my snippet
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by trimming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!
Try using the Pattern.MULTILINE option
Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
This makes ^ and $ match at line terminators inside your string; otherwise they only match at the start and end of the whole input. It also explains why Rubular seemed to work: in Ruby, ^ and $ always match at line boundaries.
Although it makes no difference for this pattern, the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) based on the number you specify. Your pattern specifies one capture group which is what you want captured. If you'd included \s in your Pattern as you wrote you tried, then Matcher.group() would have included that whitespace in its return value.
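A short sketch of the difference MULTILINE makes, using a shortened version of the definition text; MultilineDemo and firstWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Returns group 1 of the first match, or null if there is none.
    public static String firstWord(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Zither\nDefinition: An instrument of music";
        // Without MULTILINE, $ only matches at the end of the input, so nothing matches.
        Pattern plain = Pattern.compile("^(\\S+)$");
        // With MULTILINE, $ also matches just before the first \n.
        Pattern multiline = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
        System.out.println(firstWord(plain, str));     // null
        System.out.println(firstWord(multiline, str)); // Zither
    }
}
```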
Group 0 is always the complete matching string, and your capture group is group 1. In your case you want group 1, not group 0.
So changing mtch.group() to mtch.group(1) should do the trick:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\w+)\\s");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group(1);
terms.add(new SearchTerm(result, System.nanoTime()));
}
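To see the difference between group() and group(1) concretely, here is a small sketch with the pattern ^(\w+)\s from this answer; GroupDemo and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // In Java source, "\\s" is needed so the regex engine receives \s.
    private static final Pattern WORD_THEN_SPACE = Pattern.compile("^(\\w+)\\s");

    // group() is the same as group(0): the whole match, whitespace included.
    public static String wholeMatch(String text) {
        Matcher m = WORD_THEN_SPACE.matcher(text);
        return m.find() ? m.group() : null;
    }

    // group(1): only the text captured by the first pair of parentheses.
    public static String firstGroup(String text) {
        Matcher m = WORD_THEN_SPACE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Zither\nDefinition: An instrument of music";
        System.out.println(wholeMatch(str).equals("Zither\n")); // true
        System.out.println(firstGroup(str));                     // Zither
    }
}
```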
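A late response, but if you are not using Pattern and Matcher directly (for example with String.matches or String.replaceAll), you can use the inline equivalent of DOTALL in your regex string:
(?s)[Your Expression]
(?s) tells . to match all characters, including line breaks. (The inline equivalent of Pattern.MULTILINE, which makes ^ and $ match at line breaks, is (?m).)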
Detailed information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
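A hedged illustration of the inline (?s) flag with a String method instead of Pattern/Matcher. Replacing "word, whitespace, everything after" with just the word via replaceFirst is an approach chosen for this demo, and InlineFlagDemo and its methods are hypothetical names. The sample text has a definition spanning two lines so the effect of (?s) is visible.

```java
public class InlineFlagDemo {
    // (?s): '.' also matches line terminators, so ".*" swallows every remaining line.
    public static String firstWordWithDotall(String text) {
        return text.replaceFirst("(?s)^(\\S+)\\s.*", "$1");
    }

    // Without (?s), ".*" stops at the next line terminator, so later lines survive.
    public static String firstWordWithoutDotall(String text) {
        return text.replaceFirst("^(\\S+)\\s.*", "$1");
    }

    public static void main(String[] args) {
        String str = "Zither\nDefinition: An instrument\nof music";
        System.out.println(firstWordWithDotall(str));    // Zither
        System.out.println(firstWordWithoutDotall(str)); // Zither, then "of music" on the next line
    }
}
```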
Just replace:
String result = mtch.group();
By:
String result = mtch.group(1);
This will limit your output to the contents of the capturing group (e.g. (\\w+)) .
Try the following:
/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN =
Pattern.compile("^(\\w+)\\r?\\n(.*)$");
public static void main(String[] args) {
String input = "Zither\nDefinition: An instrument of music";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
); // prints "Zither = Definition: An instrument of music"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1")
); // prints "Zither"
}

Regular expression, value in between quotes

I'm having a little trouble constructing the regular expression using java.
The constraint is that I need to split a string separated by !. Both strings are enclosed in double quotes.
For example:
"value"!"value"
If I performed a java split() on the string above, I want to get:
value
value
However, the catch is that a value can contain any characters: punctuation, digits, spaces, etc.
So here's a more concrete example. Input:
""he! "l0"!"wor!"d1"
Java's split() should return:
"he! "l0
wor!"d1
Any help is much appreciated. Thanks!
Try this expression: (".*")\s*!\s*(".*")
Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.
String input = "\" \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
if(m.matches())
{
String s1 = m.group(1); //" "he""""! "l0"
String s2 = m.group(2); //"wor!"d1"
}
Edit:
This would not work for all cases, e.g. "he"!"llo" ! "w" ! "orld" would get the wrong groups. In that case it would be really hard to determine which ! should be the separator. That's why rarely used characters are often chosen to separate parts of a string, like @ in email addresses :)
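A sketch of this Pattern/Matcher approach, including the failure case from the edit; TwoGroupDemo and splitInTwo are hypothetical names. With more than two items, the greedy first group runs up to the last separator it can use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoGroupDemo {
    private static final Pattern TWO_ITEMS = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");

    // Returns {group1, group2}, or null when the whole input does not match.
    public static String[] splitInTwo(String input) {
        Matcher m = TWO_ITEMS.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // Two items: works as intended.
        String[] ok = splitInTwo("\"value\"!\"value\"");
        System.out.println(ok[0] + " | " + ok[1]); // "value" | "value"
        // Four items: group 1 greedily takes everything up to the last usable separator.
        String[] wrong = splitInTwo("\"he\"!\"llo\" ! \"w\" ! \"orld\"");
        System.out.println(wrong[0] + " | " + wrong[1]); // "he"!"llo" ! "w" | "orld"
    }
}
```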
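Split on "!" (quote, exclamation mark, quote) instead of just !, after removing the outer quotes:
String REGEX = "\"!\"";
String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";
// Strip the leading and trailing quote, then split on the "!" separator
String inner = INPUT.substring(1, INPUT.length() - 1);
String[] items = Pattern.compile(REGEX).split(inner); // items: ["he! "l0, wor!"d1]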
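A runnable sketch of the split-on-"!" idea, assuming every input starts and ends with a double quote and that no item itself contains the sequence "!". QuotedSplit and splitItems are hypothetical names; the input check and the -1 limit are choices made for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedSplit {
    // Separator between items: closing quote, '!', opening quote.
    private static final Pattern SEPARATOR = Pattern.compile("\"!\"");

    public static String[] splitItems(String input) {
        if (input.length() < 2 || !input.startsWith("\"") || !input.endsWith("\"")) {
            throw new IllegalArgumentException("expected a quoted value: " + input);
        }
        String inner = input.substring(1, input.length() - 1); // drop the outer quotes
        return SEPARATOR.split(inner, -1); // -1 keeps trailing empty items
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitItems("\"value\"!\"value\"")));
        // [value, value]
        System.out.println(Arrays.toString(splitItems("\"\"he! \"l0\"!\"wor!\"d1\"")));
        // ["he! "l0, wor!"d1]
    }
}
```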
It feels like you need to parse on:
DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*
It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested any further), BUT it might be better served by a very simple grammar.
This might work (excuse missing escapes and the like):
^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
Another option would be to match against the first part and then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language pseudocode, I'm just trying to illustrate the idea):
resultList = []
while (string matches /^("(?:[^"]*|"[^"]*")*")(.*)$/) {
    resultList += match(1)
    string = match(2)
    if (string.beginsWith("!")) {
        string = string[1:end]
    } elseif (string.length > 0) {
        // throw an error, since there was no exclamation mark and the string isn't done
    }
}
if (string.length > 0) {
    // throw an exception, since the string isn't done
}
// resultList now holds the items in the string
EDIT: I realized that my answer doesn't really work. An item can contain a lone double quote as well as exclamation marks, so the only thing that can't appear inside an item is the sequence "!" itself. Given that, the approach of 1) stripping the quotes off the ends and 2) splitting on "!" is really the right way to go.
