Extracting Capture Group from Non-Capture Group in Java

I have a string, let's call it output, that equals the following:
ltm data-group internal str_testclass {
records {
baz {
data "value 1"
}
foobar {
data "value 2"
}
topaz {}
}
type string
}
I'm trying to extract the substring between the quotes for a given "record" name. So given foobar, I want to extract value 2. The substring I want always comes in the form shown above: the "record" name, a whitespace, an open bracket, a newline, whitespace, the string data, and then the substring to capture between the quotes. The one exception is when there is no value, which always looks like topaz above: after the "record" name there is just an open and a closed bracket, and in that case I'd like to get an empty string. How could I write a line of Java to capture this? So far I have:
String myValue = output.replaceAll("(?:foobar\\s{\n\\s*data "([^\"]*)|()})","$1 $2");
But I'm not sure where to go from here.

Let's start by extracting the "records" structure with the following regex: ltm\s+data-group\s+internal\s+str_testclass\s*\{\s*records\s*\{\s*(?<records>([^\s}]+\s*\{\s*(data\s*"[^"]*")?\s*\}\s*)*)\}\s*type\s*string\s*\}
Then, within the "records" group, find successive matches of [^\s}]+\s*\{\s*(?:data\s*"(?<data>[^"]*)")?\s*\}\s*. The "data" group contains what you're looking for and will be null in the "topaz" case.
Java strings:
"ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}"
"[^\\s}]+\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*"
Demo:
String input =
"ltm data-group internal str_testclass {\n" +
" records {\n" +
" baz {\n" +
" data \"value 1\"\n" +
" }\n" +
" foobar {\n" +
" data \"value 2\"\n" +
" }\n" +
" topaz {}\n" +
" empty { data \"\"}\n" +
" }\n" +
" type string\n" +
"}";
Pattern language = Pattern.compile("ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}");
Pattern record = Pattern.compile("(?<name>[^\\s}]+)\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*");
Matcher lgMatcher = language.matcher(input);
if (lgMatcher.matches()) {
String records = lgMatcher.group("records");
Matcher rdMatcher = record.matcher(records);
while (rdMatcher.find()) {
System.out.printf("%s:%s%n", rdMatcher.group("name"), rdMatcher.group("data"));
}
} else {
System.err.println("Language not recognized");
}
Output:
baz:value 1
foobar:value 2
topaz:null
empty:
Alternatives: as you're parsing a custom language, you could try writing an ANTLR grammar or creating a Groovy DSL.

Your regex shouldn't even compile, because you are not escaping the " inside your regex String, so it is ending your String at the first " inside your regex.
Instead, try this regex:
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
You can check out how it works here on regex101.
Try something like this getRecord() method where key is the record 'name' you're searching for, e.g. foobar, and the input is the string you want to search through.
public static void main(String[] args) {
String input = "ltm data-group internal str_testclass { \n" +
" records { \n" +
" baz { \n" +
" data \"value 1\" \n" +
" } \n" +
" foobar { \n" +
" data \"value 2\" \n" +
" }\n" +
" topaz {}\n" +
" } \n" +
" type string \n" +
"}";
String bazValue = getRecord("baz", input);
String foobarValue = getRecord("foobar", input);
String topazValue = getRecord("topaz", input);
System.out.println("Record data value for 'baz' is '" + bazValue + "'");
System.out.println("Record data value for 'foobar' is '" + foobarValue + "'");
System.out.println("Record data value for 'topaz' is '" + topazValue + "'");
}
private static String getRecord(String key, String input) {
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
//if we find a record with data return it
return matcher.group(1);
} else {
//else see if the key exists with empty {}
final Pattern keyPattern = Pattern.compile(key);
Matcher keyMatcher = keyPattern.matcher(input);
if (keyMatcher.find()) {
//return empty string if key exists with empty {}
return "";
} else {
//else handle error, throw exception, etc.
System.err.println("Record not found for key: " + key);
throw new RuntimeException("Record not found for key: " + key);
}
}
}
Output:
Record data value for 'baz' is 'value 1'
Record data value for 'foobar' is 'value 2'
Record data value for 'topaz' is ''
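The two lookups in getRecord() can also be folded into a single pattern. The following is only a sketch under assumptions: the class name RecordLookup is made up for illustration, each record name is assumed to start its own line, and returning null for a missing record is an arbitrary choice. The optional non-capturing group holds the data "..." part, so group 1 is null for a record written as name {}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordLookup {
    // Returns the quoted data value of the named record, "" when the record
    // body is empty ({}), or null when no such record exists.
    static String getRecord(String key, String input) {
        // Pattern.quote keeps regex metacharacters in the key literal.
        // (?m)^\s* anchors the key at the start of a line, so "bar" does not
        // match inside "foobar".
        Pattern p = Pattern.compile(
            "(?m)^\\s*" + Pattern.quote(key)
            + "\\s*\\{\\s*(?:data\\s*\"([^\"]*)\"\\s*)?\\}");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(1) is null when the optional data part did not participate.
        return m.group(1) == null ? "" : m.group(1);
    }

    public static void main(String[] args) {
        String input = "ltm data-group internal str_testclass {\n"
            + "    records {\n"
            + "        baz {\n"
            + "            data \"value 1\"\n"
            + "        }\n"
            + "        foobar {\n"
            + "            data \"value 2\"\n"
            + "        }\n"
            + "        topaz {}\n"
            + "    }\n"
            + "    type string\n"
            + "}";
        System.out.println(getRecord("foobar", input));          // value 2
        System.out.println("[" + getRecord("topaz", input) + "]"); // []
    }
}
```

Unlike the fallback that searches for the bare key, this variant does not report a hit for words such as records or data, because the closing bracket must follow directly after the optional data part.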

You could try
(?:foobar\s\{\s*data "(.*)")

I think the replaceAll() isn't necessary here. Would something like this work:
String var1 = "foobar";
String regex = "(?:" + var1 + "\\s\\{\\n\\s*data \"([^\"]*)\")";
You can then pass this regex to a Pattern and Matcher to find the substring; the value is in group 1.
You can simply turn this into a function so that you can pass in the record name to search for:
public static String SearchString(String str)
{
// Java strings need double quotes, and the regex backslashes and inner quotes must be escaped
String regex = "(?:" + str + "\\s\\{\\n\\s*data \"([^\"]*)\")";
return regex;
}

Related

Splitting on JSON Payload with Regex to get Value

I am attempting to get a value out of a partial JSON payload using only the "split" method. I can only use this method since this API is very limited. I can get my value using the Pattern and Matcher APIs:
package com.company;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
// write your code here
String myString = "{\n" +
" \"8\": [\n" +
" {\n" +
" \"TEST\": \"LN17ELJ\",\n" +
" \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n" +
" \"ORDER_UNIQUE_ID_REFERENCE\": \"109197634\",\n" +
" \"STATUS\": \"HORLEY\",\n" +
" \"SECONDARY_NAV_CITY\": \"HORLEY\",\n" +
" \"ROUTE\": \"THE STREET 12\",\n";
String myRegexPattern = "\"([ROUTE_UNIQUE_ID_REFERENCE\"]+)\"\\s*:\\s*\"([^\"]+)\",?";
Pattern pattern = Pattern.compile(myRegexPattern);
Matcher matcher = pattern.matcher(myString);
if (matcher.find())
{
System.out.println(matcher.group(2));
} else {
System.out.println("Didn't work!");
}
}
}
However, when I try using String.split, it doesn't work and my value is not in any of the array indexes:
package com.company;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
// write your code here
String myString = "{\n" +
" \"8\": [\n" +
" {\n" +
" \"TEST\": \"LN17ELJ\",\n" +
" \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n" +
" \"ORDER_UNIQUE_ID_REFERENCE\": \"109197634\",\n" +
" \"STATUS\": \"HORLEY\",\n" +
" \"SECONDARY_NAV_CITY\": \"HORLEY\",\n" +
" \"ROUTE\": \"THE STREET 12\",\n";
String myRegexPattern = "\"([ROUTE_UNIQUE_ID_REFERENCE\"]+)\"\\s*:\\s*\"([^\"]+)\",?";
String[] newValue = myString.split(myRegexPattern);
for(int i = 0; i < newValue.length; i++) {
if(newValue[i].equals("2172752")) {
System.out.println("IT'S HERE!");
}
}
}
}
What would be the best way to do this? Is there a better way to get ROUTE_UNIQUE_ID_REFERENCE using just split?

Your regular expression isn't doing what you are expecting. It is not matching "ROUTE_UNIQUE_ID_REFERENCE":"...", but matching any key that starts with any of the letters in ROUTE_UNIQUE_ID_REFERENCE. You could replace it with something like \"ROUTE_UNIQUE_ID_REFERENCE[\"\\s:]+([^\",]+) which will match what you are after in matcher group 1.
The split function doesn't work as you expect. The regular expression you are using in the split is viewed as the delimiter. Thus it is removing the data you are hoping to extract.
Assuming you are looking to get all of the values for ROUTE_UNIQUE_ID_REFERENCE, in the case there is more than one ROUTE_UNIQUE_ID_REFERENCE in your real data, your first example is closer to what you are after.
You need to fix your regular expression and matching group
Use a while loop instead of an if statement to find all the instances
String myRegexPattern = "\"ROUTE_UNIQUE_ID_REFERENCE[\"\\s:]+([^\",]+)";
Pattern pattern = Pattern.compile(myRegexPattern);
Matcher matcher = pattern.matcher(myString);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
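If split really is the only method available, the delimiter behaviour described above can be turned around: split on everything that precedes the value, then split the remainder at the closing quote. This is a minimal sketch, not a general JSON reader. The class and method names are invented for illustration, and the key is assumed to contain no regex metacharacters and the value no escaped quotes.

```java
class SplitOnly {
    // Returns the string value that follows the given key, or null if the key is absent.
    static String valueOf(String json, String key) {
        // First split: the delimiter is the quoted key, the colon and the
        // opening quote of the value, so element 1 begins with the value itself.
        String[] parts = json.split("\"" + key + "\"\\s*:\\s*\"", 2);
        if (parts.length < 2) {
            return null; // delimiter never matched, so the key is not present
        }
        // Second split: cut at the closing quote and keep what precedes it.
        return parts[1].split("\"", 2)[0];
    }

    public static void main(String[] args) {
        String myString = "{\n"
            + " \"8\": [\n"
            + " {\n"
            + " \"TEST\": \"LN17ELJ\",\n"
            + " \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n"
            + " \"ORDER_UNIQUE_ID_REFERENCE\": \"109197634\",\n"
            + " \"ROUTE\": \"THE STREET 12\",\n";
        System.out.println(valueOf(myString, "ROUTE_UNIQUE_ID_REFERENCE")); // 2172752
    }
}
```

The limit argument of 2 keeps each split from cutting the text into more pieces than needed, so only the first occurrence of the key is used.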

Get field from json using regex

I have the following json document:
{
"videoUrl":"",
"available":"true",
"movie":{
"videoUrl":"http..."
},
"account":{
"videoUrl":"http...",
"login":"",
"password":""
}
}
This JSON has a property named videoUrl in several places, and I want to get the first non-empty videoUrl.
My regex:
("videoUrl":)("http.+")
But this regex match the following String
"videoUrl" :"http..."},
"account" : {"videoUrl" : "http...","login" : "","password" : ""
How can I write a regex that finds the first non-empty videoUrl together with its value?
(The result should be "videoUrl":"http...")

Add (?!,) at the end of the regex. This negative lookahead rejects any match that is immediately followed by a comma, without consuming the comma:
public static void main(String[] args) {
String input = "{ \n" +
" \"videoUrl\":\"\",\n" +
" \"available\":\"true\",\n" +
" \"movie\":{ \n" +
" \"videoUrl\":\"http...\"\n" +
" },\n" +
" \"account\":{ \n" +
" \"videoUrl\":\"http...\",\n" +
" \"login\":\"\",\n" +
" \"password\":\"\"\n" +
" }\n" +
"} ";
Pattern pattern = Pattern.compile("(\"videoUrl\":)(\"http.+\")(?!,)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group()); // "videoUrl":"http..."
}
}
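The (?!,) lookahead depends on how the text is laid out, because .+ is greedy and stops only at line ends. A layout-independent variant, shown here as a sketch rather than a definitive fix, replaces .+ with a negated character class that cannot run past the closing quote. The class and method names are made up for illustration, and values containing escaped quotes are assumed not to occur.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstVideoUrl {
    // [^"]+ stops at the first closing quote, and + requires at least one
    // character after "http", so an empty "videoUrl":"" is skipped.
    private static final Pattern VIDEO_URL =
        Pattern.compile("\"videoUrl\"\\s*:\\s*\"(http[^\"]+)\"");

    // Returns the first non-empty videoUrl property as text, or null if there is none.
    static String firstNonEmpty(String json) {
        Matcher m = VIDEO_URL.matcher(json);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Same document as in the question, but on a single line.
        String input = "{\"videoUrl\":\"\",\"available\":\"true\","
            + "\"movie\":{\"videoUrl\":\"http...\"},"
            + "\"account\":{\"videoUrl\":\"http...\",\"login\":\"\",\"password\":\"\"}}";
        System.out.println(firstNonEmpty(input)); // "videoUrl":"http..."
    }
}
```

Group 1 holds just the URL if the property name is not needed in the result.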

It will be more appropriate to use one of JSON parsers, like Gson or Jackson, instead of regex. Something like:
String jsonStr = "...";
Gson gson = new Gson();
JsonObject json = gson.fromJson(jsonStr, JsonObject.class);
String url = json.get("videoUrl").getAsString();

Univocity Parsers: Calling a function from here is not working: parserSettings.selectFields( some_function );

I am using a .csv file and would like to pass a string constructed by a function to: parserSettings.selectFields( function );
During testing, when the string returned by the function is pasted directly into: parserSettings.selectFields( string ); the parsing works fine, however, when the function is used instead, the parse doesn't work, and there is only output of whitespace.
Here is the function:
public String buildColList() {
//Parse the qty col names string, which is a comma separated string
String qtyString = getQtyString();
List<String> qtyCols = Arrays.asList(qtyString.split("\\s*,\\s*"));
String colString = StringUtils.join(qtyCols.toArray(), "\"" + ", " + "\"");
String fullColString;
fullColString = "\"" + getString1() + "\"" + ", " + "\"" + getString2() + "\"" + ", " + "\"" + colString + "\"" + ", " + "\"" + getString4 + "\"";
return fullColString;
}
Here is how it is placed:
parserSettings.selectFields(buildColList());
Any help would be greatly appreciated, thanks.

You need to return an array from your buildColList method, as the parserSettings.selectFields() method won't split a single string. Your current implementation is selecting a single, big header instead of multiple columns. Change your method to do something like this:
public String[] buildColList() {
    // Parse the qty col names string, which is a comma-separated string
    String qtyString = getQtyString();
    List<String> cols = new ArrayList<>();
    cols.add(getString1());
    cols.add(getString2());
    // Add every quantity column as its own element instead of joining them into one string
    cols.addAll(Arrays.asList(qtyString.trim().split("\\s*,\\s*")));
    cols.add(getString4());
    return cols.toArray(new String[0]);
}
And it should work. You might need to adjust my solution to fit your particular scenario as I didn't run this code. Also, I'm not sure why you were appending quotes around the column names, so I removed them.
Hope this helps.
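The difference between passing one string and passing an array can be seen without the parser library at all. The sketch below assumes that selectFields is declared with a String... (varargs) parameter; the selectFields method shown here is a hypothetical stand-in written for this example, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarargsDemo {
    // Hypothetical stand-in for a varargs method such as selectFields(String...):
    // it simply returns the names it received, one list element per argument.
    static List<String> selectFields(String... names) {
        return new ArrayList<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // A single String is a single argument, no matter how many commas
        // and quotes it contains.
        System.out.println(selectFields("\"a\", \"b\", \"c\"").size()); // 1

        // A String[] is spread over the varargs parameter: three names.
        System.out.println(selectFields(new String[] {"a", "b", "c"}).size()); // 3
    }
}
```

Pasting "a", "b", "c" directly into the call works because the compiler sees three separate string literals, whereas a method that returns that same text hands over one string.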

How can i add double quotes to look like json

I have the string below, but I want to add double quotes to it so that it looks like JSON:
[
{
LastName=abc,
FirstName=xyz,
EmailAddress=s#s.com,
IncludeInEmails=false
},
{
LastName=mno,
FirstName=pqr,
EmailAddress=m#m.com,
IncludeInEmails=true
}
]
I want the output below:
[
{
"LastName"="abc",
"FirstName"="xyz",
"EmailAddress"="s#s.com",
"IncludeInEmails"=false
},
{
"LastName"="mno",
"FirstName"="pqr",
"EmailAddress"="m#m.com",
"IncludeInEmails"=true
}
]
I have tried some string regexes but haven't got it to work. Could anyone please help?
String text= jsonString.replaceAll("[^\\{\\},]+", "\"$0\"");
System.out.println(text);
thanks

The regex way, similar to what you have tried:
String jsonString = "[ \n" + "{ \n" + " LastName=abc, \n" + " FirstName=xyz, \n"
+ " EmailAddress=s#s.com, \n" + " IncludeInEmails=false \n" + "}, \n" + "{ \n"
+ " LastName=mno, \n" + " FirstName=pqr, \n" + " EmailAddress=m#m.com, \n" + " Number=123, \n"
+ " IncludeInEmails=true \n" + "} \n" + "] \n";
System.out.println("Before:\n" + jsonString);
jsonString = jsonString.replaceAll("([\\w]+)[ ]*=", "\"$1\" ="); // to quote before = value
jsonString = jsonString.replaceAll("=[ ]*([\\w#\\.]+)", "= \"$1\""); // to quote after = value, add special character as needed to the exclusion list in regex
jsonString = jsonString.replaceAll("=[ ]*\"([\\d]+)\"", "= $1"); // to un-quote decimal value
jsonString = jsonString.replaceAll("\"true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"false\"", "false"); // to un-quote boolean
System.out.println("===============================");
System.out.println("After:\n" + jsonString);

Since there are a lot of corner cases, like character escaping, booleans, numbers, ... a simple regex won't do.
You could split the input string by newline and then handle each key-value-pair separately
for (String line : input.split("\\R")) {
// split by "=" and handle key and value
}
But again, you will have to handle char. escaping, booleans, ... (and btw, = is not a valid JSON key-value separator, only : is).
I'd suggest using GSON since it provides lenient parsing. Using Maven you can add it to your project with this dependency:
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.6.2</version>
</dependency>
You can then parse your input string using
String output = new JsonParser()
.parse(input)
.toString();
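For comparison, the line-by-line idea mentioned above could look roughly like this without any library. It is a sketch under several assumptions: one key=value pair per line, values that contain no quotes or backslashes (nothing is escaped), and : as the separator so that the result is closer to real JSON. The class name is invented for illustration and indentation is dropped.

```java
class QuoteKeyValues {
    // Quotes keys and string values line by line; booleans and plain numbers stay bare.
    static String toJson(String input) {
        StringBuilder out = new StringBuilder();
        for (String line : input.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                // Brackets, braces and "}," lines pass through unchanged.
                out.append(line.trim()).append('\n');
                continue;
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            boolean comma = value.endsWith(",");
            if (comma) {
                value = value.substring(0, value.length() - 1).trim();
            }
            boolean bare = value.equals("true") || value.equals("false")
                || value.matches("-?\\d+(\\.\\d+)?");
            out.append('"').append(key).append("\":");
            out.append(bare ? value : "\"" + value + "\"");
            if (comma) {
                out.append(',');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "[\n{\nLastName=abc,\nEmailAddress=s#s.com,\nIncludeInEmails=false\n}\n]";
        System.out.print(toJson(input));
    }
}
```

Every special case (escaping, nested values, other number formats) needs its own branch here, which is the main argument for the lenient parser.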

Just use this library http://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple/1.1
Here is code for your example:
JSONArray json = new JSONArray();
JSONObject key1 = new JSONObject();
key1.put("LastName", "abc");
key1.put("FirstName", "xyz");
key1.put("EmailAddress", "s#s.com");
key1.put("IncludeInEmails", false);
JSONObject key2 = new JSONObject();
key2.put("LastName", "mno");
key2.put("FirstName", "pqr");
key2.put("EmailAddress", "m#m.com");
key2.put("IncludeInEmails", true);
json.add(key1);
json.add(key2);
System.out.println(json.toString());

Use the code below to get the output you expect:
public class jsonTest {
public static void main(String[] args){
String test="[{ LastName=abc, FirstName=xyz, EmailAddress=s#s.com,IncludeInEmails=false},{ LastName=mno, FirstName=pqr, EmailAddress=m#m.com, IncludeInEmails=true}]";
String reg= test.replaceAll("[^\\{\\},]+", "\"$0\"");
String value=reg.replace("\"[\"{", "[{").replace("=","\"=\"").replace(" ","").replace("}\"]\"","}]").replace("\"true\"", "true").replace("\"false\"", "false");
System.out.println("value :: "+value);
}
}

Java string parsing with different regex to split

str="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;"
I don't have any control over the format in which this string is created.
I tried the code below, but I can't get the values of the first keys, "Tick for symbol", "timestamp_sec" etc.
Beyond this specific string, I was also curious about how to parse a string that needs multiple regex splits. Any help will be appreciated.
String[] s = line.split(";");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++)
{
String[] split = s[i].split("\\s+");
for (String string2 : split)
{
//Adding key value pair. to a map for further usage.
m.put(split[0], split[1]);
}
}
Edit
Desired output into a map:
(Tick for Symbol, .ISEQ-IDX)
(descriptor id, 1)
(timestamp_sec,20130628030105)
(timestamp_usec,384000)
(EXCH_TIME,1372388465384)
(SENDING_TIME,0)
(PRICE, 3957.890000)
(MIC, XDUBIND)

How about the following? You specify a list of key-value pattern pairs. Keys are specified directly as strings, values as regexes. Then you go thru this list and search the text for the key followed by the value pattern, if you find it you extract the value.
I assume the keys can be in any order, not all have to be present, there might be more than one space separating them. If you know the order of the keys, you can always start find on the place where the previous find ended. If you know all keys are obligatory, you can throw an exception if you do not find what you look for.
static String test="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;PRICE 3957.890000;MIC XDUBIND;";
static List<String> patterns = Arrays.asList(
"Tick for symbol", "\\S+",
"descriptor id", "\\d+",
"timestamp_sec", "\\d+",
"timestamp_usec", "\\d+",
"EXCH_TIME", "\\d+",
"SENDING_TIME","\\d+",
"PRICE", "\\d+\\.\\d+",
"MIC", "\\S+"
);
public static void main(String[] args) {
Map<String,String> map = new HashMap<>();
for (int i = 0; i<patterns.size();i+=2) {
String key = patterns.get(i);
String val = patterns.get(i+1);
String pattern = "\\Q" +key + "\\E\\s+(" + val + ")";
Matcher m = Pattern.compile(pattern).matcher(test);
if (m.find()) {
map.put(key, m.group(1));
}
}
System.out.println(map);
}

I don't think a regex will help you here, whoever designed that output String clearly didn't have splitting in mind.
I suggest simply parsing through the String with a loop and doing the whole thing manually. Alternatively, you can just look through the String for substrings (such as "Tick for symbol"), then take whatever word comes after (until the next space), since the second parameter always seems to be one word.
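The substring search described above might be sketched as follows. The class and method names are invented for illustration, the list of keys is assumed to be known in advance, and no key is assumed to occur inside another key or inside a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyScan {
    // For each key, finds "key " in the line and takes the following word,
    // which ends at the next space, semicolon or end of line.
    static Map<String, String> parse(String line, String... keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys) {
            int at = line.indexOf(key + " ");
            if (at < 0) {
                continue; // key not present in this line
            }
            int start = at + key.length() + 1;
            // Skip any extra spaces between key and value.
            while (start < line.length() && line.charAt(start) == ' ') {
                start++;
            }
            int end = start;
            while (end < line.length()
                    && line.charAt(end) != ' ' && line.charAt(end) != ';') {
                end++;
            }
            result.put(key, line.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 "
            + "timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME 0;"
            + "PRICE 3957.890000;MIC XDUBIND;";
        Map<String, String> m = parse(str, "Tick for symbol", "descriptor id",
            "timestamp_sec", "timestamp_usec", "EXCH_TIME", "SENDING_TIME", "PRICE", "MIC");
        m.forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

A LinkedHashMap is used only so that the entries print in the order the keys were given.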

Using the Pattern class from java.util.regex package, described step by step in this java Regex tutorial:
private static final Pattern splitPattern = Pattern.compile("^Tick for symbol (.*) descriptor id (\\d+) timestamp_sec (\\d+) timestamp_usec (\\d+);EXCH_TIME (\\d+);SENDING_TIME ?(\\d+);PRICE (.*);MIC (\\w+);$");
private static String printExtracted(final String str) {
final Matcher m = splitPattern.matcher(str);
if (m.matches()) {
final String tickForSymbol = m.group(1);
final long descriptorId = Long.parseLong(m.group(2), 10);
final long timestampSec = Long.parseLong(m.group(3), 10);
final long timestampUsec = Long.parseLong(m.group(4), 10);
final long exchTime = Long.parseLong(m.group(5), 10);
final long sendingTime = Long.parseLong(m.group(6), 10);
final double price = Double.parseDouble(m.group(7));
final String mic = m.group(8);
return "(Tick for Symbol, " + tickForSymbol + ")\n" +
"(descriptor id, " + descriptorId + ")\n" +
"(timestamp_sec, " + timestampSec + ")\n" +
"(timestamp_usec, " + timestampUsec + ")\n" +
"(EXCH_TIME, " + exchTime + ")\n" +
"(SENDING_TIME, " + sendingTime +")\n" +
"(PRICE, " + price + ")\n" +
"(MIC, " + mic + ")";
} else {
throw new IllegalArgumentException("Argument " + str + " doesn't match pattern.");
}
}
Edit: Using group instead of replaceAll, as it makes more sense and is also faster.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
