Split query to match a specific pattern - regex or parser?

I'm working with an API that, in specific scenarios, does not allow "OR" or "AND" in queries.
So I have to split the query (a String) and send the parts one by one. That is fine, but each part has to have three brackets at the start and end of the string, which is causing me trouble.
The string I have to split looks like this:
"WHERE (((City LIKE 'Japan') or (Id IN ('555666','666555, 88811888')))) LIMIT 10000"
The API has built-in methods that I have to use to send a query. The string above should be separated into two and look like:
1. (((City LIKE 'Japan')))
2. (((Id IN ('555666','666555, 88811888'))))
I'm not very familiar with regex, but I did try removing all brackets to get a clean string and then surrounding it with 3 brackets on each side. That obviously does not work well for example 2, because it deletes the brackets that surround the IDs. So I assume regex is not the best solution, but I'm not really sure how to properly create a parser for this. Any help would be nice!
EDIT:
Example of code with regex that is removing brackets:
String condition = query.replaceAll("[\\[\\](){}]","").replace("WHERE", "").trim();
return "(((" + condition + ")))";

In simple scenarios you can use the following code:
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
List<String> splitQuery(String input) {
    Matcher m = Pattern.compile("\\({2}(.+)\\){2}").matcher(input);
    if (m.find()) {
        // \b keeps "or"/"and" from matching inside words such as 'Oregon'
        return Pattern.compile("\\b(?:or|and)\\b", Pattern.CASE_INSENSITIVE)
                // Here m.group(1) extracts the substring between the first "((" and the last "))"
                // Splitting m.group(1) on the operator gives you N tokens
                .splitAsStream(m.group(1))
                // And then you encapsulate each token in "((token))"
                .map(token -> "((" + token.trim() + "))")
                .collect(Collectors.toList());
    } else {
        return Collections.emptyList();
    }
}
Usage:
List<String> result = splitQuery("WHERE (((City LIKE 'Japan') or (Id IN ('555666','666555, 88811888')))) LIMIT 10000");
result.forEach(System.out::println);
Output:
(((City LIKE 'Japan')))
(((Id IN ('555666','666555, 88811888'))))
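If the conditions can contain the words "or"/"and" inside quoted values, or nest more deeply, a small hand-written scanner may hold up better than a regex. The following is only a sketch under assumptions the question does not confirm: string literals use single quotes, brackets are balanced, and the operators to split on sit directly inside the outermost bracket group. The class and method names (TopLevelSplitter, stripOuterBrackets, operatorLength) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {

    /** Index of the ')' that closes the '(' at position open, or -1; brackets inside '...' are ignored. */
    static int matchingClose(String s, int open) {
        int depth = 0;
        boolean inQuote = false;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote && c == '(') {
                depth++;
            } else if (!inQuote && c == ')' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Removes bracket pairs that wrap the whole expression, e.g. "((a) or (b))" -> "(a) or (b)". */
    static String stripOuterBrackets(String s) {
        s = s.trim();
        while (s.startsWith("(") && matchingClose(s, 0) == s.length() - 1) {
            s = s.substring(1, s.length() - 1).trim();
        }
        return s;
    }

    /** Length of an OR/AND keyword starting at i (as a whole word), or 0 if there is none. */
    static int operatorLength(String s, int i) {
        if (i > 0 && Character.isLetterOrDigit(s.charAt(i - 1))) {
            return 0;
        }
        for (String op : new String[] {"or", "and"}) {
            int end = i + op.length();
            if (s.regionMatches(true, i, op, 0, op.length())
                    && (end == s.length() || !Character.isLetterOrDigit(s.charAt(end)))) {
                return op.length();
            }
        }
        return 0;
    }

    static List<String> splitQuery(String query) {
        List<String> parts = new ArrayList<>();
        int open = query.indexOf('(');
        int close = query.lastIndexOf(')');
        if (open < 0 || close < open) {
            return parts; // no bracketed condition found
        }
        String body = stripOuterBrackets(query.substring(open, close + 1));
        int depth = 0;
        int start = 0;
        boolean inQuote = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (inQuote) {
                continue; // brackets and keywords inside string literals do not count
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0) {
                int len = operatorLength(body, i);
                if (len > 0) {
                    parts.add("(((" + stripOuterBrackets(body.substring(start, i)) + ")))");
                    start = i + len;
                    i = start - 1; // the loop increment moves on to the first char after the keyword
                }
            }
        }
        parts.add("(((" + stripOuterBrackets(body.substring(start)) + ")))");
        return parts;
    }

    public static void main(String[] args) {
        splitQuery("WHERE (((City LIKE 'Japan') or (Id IN ('555666','666555, 88811888')))) LIMIT 10000")
                .forEach(System.out::println);
    }
}
```

Because the scanner tracks quotes and bracket depth, a value such as 'a or b' is left alone, which the split-on-keyword regex cannot guarantee.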

Related

How to parse string using regex

I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?

Use a regex lookbehind:
(?<=\bid=)[^],]*
See Regex101.
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
        Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
This results in
machine

Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();

Using the replace and split functions doesn't take the structure of the data into account.
If you want to use a regex, you can just use a capturing group without any lookarounds, where enum can be any value except a ] and comma, and id can be any value except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ] followed by a comma
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
) Close group 1
\] Match ]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
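In Java, escape the square brackets inside the character classes, because an unescaped [ inside a class starts a nested class there:
String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";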
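As a runnable sketch of the bracket-bounded variant: the class and method names below (EnumIdExtractor, extractId) are invented for illustration, and the brackets inside the character classes are written escaped, since Java treats an unescaped [ inside a class as the start of a nested class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnumIdExtractor {
    // Stay inside EnumOption[...] and capture the value that follows "id=".
    private static final Pattern ID = Pattern.compile(
            "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]");

    static Optional<String> extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("EnumOption[enumId=test,id=machine]").orElse("no match"));
        System.out.println(extractId("EnumOption[enumId=test,id=machine,name=x]").orElse("no match"));
    }
}
```

Both calls print machine; the second shows that extra comma-separated values after id do not end up in the capture.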

A regex can of course be used, but it is sometimes less performant, less readable and more bug-prone.
I would advise you not to use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if (!matcher.matches()) {
    throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

How to find a String of last 2 items in colon separated string

I have a string = ab:cd:ef:gh. On this input, I want to return the string ef:gh (third colon intact).
The string apple:orange:cat:dog should return cat:dog (there's always 4 items and 3 colons).
I could have a loop that counts colons and makes a string of characters after the second colon, but I was wondering if there exists some easier way to solve it.

You can use the split() method for your string.
String example = "ab:cd:ef:gh";
String[] parts = example.split(":");
System.out.println(parts[parts.length-2] + ":" + parts[parts.length-1]);

String example = "ab:cd:ef:gh";
String[] parts = example.split(":",3); // create at most 3 Array entries
System.out.println(parts[2]);

The split function might be what you're looking for here. Use the colon, like in the documentation as your delimiter. You can then obtain the last two indexes, like in an array.

Yes, there is an easier way.
The first is to use the split method of the String class:
String txt = "ab:cd:ef:gh";
String[] arr = txt.split(":");
System.out.println(arr[arr.length-2] + ":" + arr[arr.length-1]);
The second is to use the Matcher class.

Use the overloaded version of lastIndexOf(), which takes the index to search backward from as its 2nd parameter:
str.substring(str.lastIndexOf(":", str.lastIndexOf(":") - 1) + 1)
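Spelled out as a small method, the lastIndexOf idea might look like the sketch below. The names LastTwoItems and lastTwo are made up for the example, and returning the input unchanged when it has fewer than two colons is my assumption, not something the question specifies.

```java
class LastTwoItems {

    /** Returns everything after the second-to-last colon, or the whole input if it has fewer than two colons. */
    static String lastTwo(String str) {
        int last = str.lastIndexOf(':');
        // Search backward starting just before the last colon; yields -1 if there is no earlier colon.
        int secondLast = str.lastIndexOf(':', last - 1);
        return str.substring(secondLast + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastTwo("ab:cd:ef:gh"));          // prints ef:gh
        System.out.println(lastTwo("apple:orange:cat:dog")); // prints cat:dog
    }
}
```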

Another solution would be using a Pattern to match your input, something like [^:]+:[^:]+$. Using a pattern would probably be easier to maintain as you can easily change it to handle for example other separators, without changing the rest of the method.
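Using a precompiled pattern can also be more efficient than String.split() when the delimiter is a real regex, as the latter then compiles its parameter to a Pattern internally and does more than what you actually need. For a single plain character such as ":" OpenJDK's split() takes a fast path without a Pattern, so the difference here is probably negligible.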
This would give something like this:
String example = "ab:cd:ef:gh";
Pattern regex = Pattern.compile("[^:]+:[^:]+$");
final Matcher matcher = regex.matcher(example);
if (matcher.find()) {
    // extract the matching group, which is what we are looking for
    System.out.println(matcher.group()); // prints ef:gh
} else {
    // handle invalid input
    System.out.println("no match");
}
Note that you would typically extract regex as a reusable constant to avoid compiling the pattern every time. Using a constant would also make the pattern easier to change without looking at the actual code.
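For illustration, pulling the pattern out into a constant could look roughly like this. The class name LastTwoMatcher, the constant LAST_TWO and the choice of Optional for the "no match" case are my own, not part of the answer above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastTwoMatcher {
    // Compiled once and reused by every call; change the separator here only.
    private static final Pattern LAST_TWO = Pattern.compile("[^:]+:[^:]+$");

    static Optional<String> lastTwo(String input) {
        Matcher m = LAST_TWO.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastTwo("ab:cd:ef:gh").orElse("no match")); // prints ef:gh
        System.out.println(lastTwo("abc").orElse("no match"));         // prints no match
    }
}
```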

How can I check if ArrayMap.keySet() contains a certain variable + Regex?

I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but to no success. The only way I can get this working at this moment, is iterating through all the keys in a for loop, then splitting the key on ' - ', and getting the first value from that, to compare to my variable.
for (String s : testArray) {
    if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
        // Do stuff
    }
}
This seems very devious to me, especially since I only need to know if the keySet contains the variable, that's all I'm interested in. I was thinking about using the contains() method, and put in (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?

You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
    if (variableA.equals(extractTag(s))) {
        // Do stuff
    }
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a group match, which you can retrieve via Matcher.group() - if you only care about tag you could use a Pattern like so:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
    Matcher m = TAG_PATTERN.matcher(s);
    if (m.matches()) {
        return m.group(1);
    }
    throw new IllegalArgumentException(
        "'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split() you just need to define a regular expression that matches the delimiter, in this case -; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
    String[] parts = s.split(" - ");
    if (parts.length > 1) {
        return parts[0];
    }
    throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
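On the original question of whether contains() can do this: it only tests for an exact key, so some loop over the keys is needed either way. If all you need is a yes/no answer, a prefix check avoids the regex altogether. This is a sketch assuming the keys always have the form "tag - randomWord" with exactly one space on each side of the dash; it uses a plain java.util.Map in place of ArrayMap so that it runs anywhere, and the names TagLookup and containsTag are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TagLookup {

    /** True if any key starts with the given tag followed by " - ". */
    static boolean containsTag(Map<String, ?> map, String tag) {
        String prefix = tag + " - ";
        for (String key : map.keySet()) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("color - red", 1);
        map.put("size - large", 2);
        System.out.println(containsTag(map, "color")); // prints true
        System.out.println(containsTag(map, "shape")); // prints false
    }
}
```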

Replace text with data & matched group contents

I don't believe I saw this when searching (believe me, I spent a good amount of time searching for this) for a solution to this so here goes.
Goal:
Match regex in a string and replace it with something that contains the matched value.
Regex used currently:
\b(Connor|charries96|Foo|Bar)\b
For the record, I suck at regex, in case this isn't the best way to do it.
My current code (and several other methods I tried) can only replace the text with the first match it encounters if there are multiple matches.
private Pattern regexFromList(List<String> input) {
    if(input.size() < 1) {
        return "";
    }
    StringBuilder builder = new StringBuilder();
    builder.append("\\b");
    builder.append("(");
    for(String s : input) {
        builder.append(s);
        if(!s.equals(input.get(input.size() - 1)))
        {
            builder.append("|");
        }
    }
    builder.append(")");
    builder.append("\\b");
    return Pattern.compile(builder.toString(), Pattern.CASE_INSENSITIVE);
}
Example input:
charries96's name is Connor.
Example result using TEST as the data to prepend the match with
TESTcharries96's name is TESTcharries96.
Desired result using example input:
TESTcharries96's name is TESTConnor.
Here is my current code for replacing the text:
if(highlight) {
    StringBuilder builder = new StringBuilder();
    Matcher match = pattern.matcher(event.getMessage());
    String string = event.getMessage();
    if (match.find()) {
        string = match.replaceAll("TEST" + match.group());
        // I do realise I'm using #replaceAll but that's mainly given it gives me the same result as other methods so why not just cut to the chase.
    }
    builder.append(string);
    return builder.toString();
}
EDIT:
Working example of desired result on RegExr

There are a few problems here:
You take the user input as is and build the regex from it:
builder.append(s);
If the input contains special characters, they may be interpreted as metacharacters and cause unexpected behavior.
Always use Pattern.quote if you want to match a string exactly as it is passed in:
builder.append(Pattern.quote(s));
Matcher.replaceAll is a high-level function that resets the Matcher (starting the match all over again), searches for all matches, and performs the replacement. Because "TEST" + match.group() is evaluated once, before replaceAll runs, every match is replaced with the text of the first match. In your case, the fix can be as simple as:
String result = match.replaceAll("TEST$1");
The StringBuilder should be thrown away along with the if statement.
Matcher.find and Matcher.group are lower-level functions for fine-grained control over what you do with each match.
If you perform the replacement at that level, you need to build the result with Matcher.appendReplacement and Matcher.appendTail.
A while loop (instead of an if statement) should be used with Matcher.find to find and replace every match.
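Put together, both routes might look like the sketch below. The class name Highlighter and the method names are invented for the example, the name list is assumed to be non-empty, and StringBuffer is used so that appendReplacement also works on Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class Highlighter {

    /** Builds \b(name1|name2|...)\b with every name quoted; assumes a non-empty list. */
    static Pattern regexFromList(List<String> names) {
        String alternatives = names.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    /** High-level route: $1 refers to whatever group 1 captured for each individual match. */
    static String prefixAll(Pattern pattern, String message) {
        return pattern.matcher(message).replaceAll("TEST$1");
    }

    /** Low-level route: find each match and build the result piece by piece. */
    static String prefixWithLoop(Pattern pattern, String message) {
        Matcher m = pattern.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' or '\' in the matched text from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement("TEST" + m.group(1)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = regexFromList(Arrays.asList("Connor", "charries96", "Foo", "Bar"));
        System.out.println(prefixAll(p, "charries96's name is Connor."));
        System.out.println(prefixWithLoop(p, "charries96's name is Connor."));
    }
}
```

Both calls print TESTcharries96's name is TESTConnor. The loop version is only worth the extra code when the replacement has to be computed per match.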

Regular expression, value in between quotes

I'm having a little trouble constructing a regular expression in Java.
The constraint is that I need to split a string separated by !. The two strings will be enclosed in double quotes.
For example:
"value"!"value"
If I performed a java split() on the string above, I want to get:
value
value
However the catch is value can be any characters/punctuations/numerical character/spaces/etc..
So here's a more concrete example. Input:
""he! "l0"!"wor!"d1"
Java's split() should return:
"he! "l0
wor!"d1
Any help is much appreciated. Thanks!

Try this expression: (".*")\s*!\s*(".*")
Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.
String input = "\" \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
if (m.matches()) {
    String s1 = m.group(1); //" "he""""! "l0"
    String s2 = m.group(2); //"wor!"d1"
}
Edit:
This would not work for all cases, e.g. "he"!"llo" ! "w" ! "orld" would get the wrong groups. In that case it would be really hard to determine which ! should be the separator. That's why rarely used characters are often chosen to separate parts of a string, like @ in email addresses :)

Have the value split on "!" instead of !
String REGEX = "\"!\"";
String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";
Pattern p = Pattern.compile(REGEX);
String[] items = p.split(INPUT); // the outer quotes stay on the first and last item

It feels like you need to parse on:
DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*
It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested even further), BUT it might be better served by a very simple grammar.
This might work... excusing missing escapes and the like
^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
Another option would be to match against the first part, then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language, I'm just trying to illustrate the idea):
resultList = []
while (string matches ^"([^"]*|"[^"]*")*"(.*)$ => item, rest) {
    resultList += item
    string = rest
    if (string.beginsWith("!")) {
        string = string[1:end]
    } elseif (string.length > 0) {
        // throw an error, since there was no exclamation and the string isn't done
    }
}
if (string.length > 0) {
    // throw an exception since the string isn't done
}
resultList == the list of items in the string
EDIT: I realized that my answer doesn't really work. You can have a single doublequote inside the strings, as well as exclamation marks. As such, you really CAN'T have "!" inside one of the strings. As such, the idea of 1) pull quotes off the ends, 2) split on '"!"' is really the right way to go.
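A minimal Java sketch of that two-step idea, assuming the whole line is wrapped in double quotes and that the three-character sequence "!" never occurs inside a value (the names QuotedSplit and splitQuoted are made up here):

```java
import java.util.regex.Pattern;

class QuotedSplit {

    /** Strips the outer quotes, then splits on the literal separator "!" (quote, exclamation mark, quote). */
    static String[] splitQuoted(String line) {
        if (line.length() < 2 || !line.startsWith("\"") || !line.endsWith("\"")) {
            throw new IllegalArgumentException("expected a line wrapped in double quotes: " + line);
        }
        String inner = line.substring(1, line.length() - 1);
        // Pattern.quote makes the separator a literal; -1 keeps empty trailing values
        return inner.split(Pattern.quote("\"!\""), -1);
    }

    public static void main(String[] args) {
        for (String part : splitQuoted("\"\"he! \"l0\"!\"wor!\"d1\"")) {
            System.out.println(part);
        }
    }
}
```

For the input from the question this prints "he! "l0 and wor!"d1 on separate lines, which is the result the question asks for.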
