extracting specific but unknown values from a string in Java

I am trying to extract values from a MySQL insert command in Java. The insert command is just a string as far as Java is concerned. It will be of the format
INSERT INTO employees VALUES ("John Doe", "45", "engineer");
I need to pull the '45' out of that statement. I can't pinpoint its index because names and job titles will differ. I only need the age. Other than overly complex string manipulation, which I could probably figure out in time, is there a more straightforward way of isolating those characters? I just can't seem to wrap my mind around how to do it, and I am not very familiar with regular expressions.

If this is the specific format of your message, then a regex like this should help:
INSERT INTO employees VALUES \(".*?", "(.*?)", ".*?"\);
Then read the first group of the result and you should get the age. Note that the literal parentheses of the SQL statement are escaped as \( and \), because an unescaped ( opens a group.
In regular expressions (X) defines a matching group that captures X (where X can be any regular expression). This means that if the entire regular expression matches, you can easily find out the value within this matching group (using Matcher.group(int) in Java).
You can also have multiple matching groups in a single regex like this:
INSERT INTO employees VALUES \("(.*?)", "(.*?)", "(.*?)"\);
So your code could look like this:
String sql = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\");";
// \\( and \\) match the literal parentheses; the unescaped pairs are capturing groups
final Pattern pattern = Pattern.compile("INSERT INTO employees VALUES \\(\"(.*?)\", \"(.*?)\", \"(.*?)\"\\);");
final Matcher matcher = pattern.matcher(sql);
if (matcher.matches()) {
    String name = matcher.group(1);
    String age = matcher.group(2);
    String job = matcher.group(3);
    // do stuff ...
}
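For reference, here is a minimal self-contained sketch of the same idea that also tolerates extra whitespace and any table name. The class name InsertAgeExtractor, the method extractAge, and the limit of three digits for the age are assumptions made for illustration; they are not part of the question.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InsertAgeExtractor {
    // \\( and \\) match the literal SQL parentheses; the only capturing
    // group is the one around the digits. \\s* and \\s+ tolerate extra spaces.
    private static final Pattern INSERT = Pattern.compile(
            "INSERT\\s+INTO\\s+\\w+\\s+VALUES\\s*\\(\\s*\"[^\"]*\"\\s*,\\s*"
            + "\"(\\d{1,3})\"\\s*,\\s*\"[^\"]*\"\\s*\\)\\s*;?\\s*");

    // Returns the age, or an empty OptionalInt if the statement has another shape.
    static OptionalInt extractAge(String sql) {
        Matcher m = INSERT.matcher(sql);
        return m.matches()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\");";
        System.out.println(extractAge(sql));
    }
}
```

Returning an OptionalInt rather than a String makes the "statement did not match" case explicit for the caller.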

Assuming that the name doesn't contain any " characters, you can match the statement against the regex .*?".*?".*?"(\d+)".* and group(1) gives you the age.

As far as I understand, your insert command will only ever insert into 3 columns. You can split the string on the comma character (,), take the second element of the array, trim the surrounding whitespace, and then drop the first and last characters (the quotes). That leaves the age. Note that this breaks if the name itself contains a comma. A sketch:
String insertQuery = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\")";
String[] splitQuery = insertQuery.split(",");   // split returns an array
String age = splitQuery[1].trim();              // the age, still wrapped in quotes
age = age.substring(1, age.length() - 1);       // strip the surrounding quotes

If you are sure that there is only one instance of a number in the string, the regular expression you need is very simple:
//Assuming that str contains your insert statement
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(str);
if(m.find()) System.out.println(m.group());

How about String.split by ","?
final String insert = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\"); ";
System.out.println(insert.split(",")[1].trim()); // prints "45", quotes included
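The split-based answers above stop working once a value contains a comma (for example "Doe, John"). A possible alternative, sketched here under the assumption that values never contain an escaped double quote, is to collect every double-quoted value and pick the second one. The class name QuotedValues and the method quotedValues are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValues {
    // A double quote, then any run of non-quote characters (captured), then a double quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects the contents of every double-quoted value, in order of appearance.
    static List<String> quotedValues(String sql) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO employees VALUES (\"Doe, John\", \"45\", \"engineer\");";
        List<String> values = quotedValues(sql);
        // Index 1 is the second value, i.e. the age in the question's format.
        System.out.println(values.get(1));
    }
}
```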

Related

Regex: Remove postfix string in any word after occurrence of any of list of strings in a paragraph

I have a bigger string and a list of strings. I want to change the bigger string such that
- for any occurrence of a string from the list in the bigger string, the suffix that follows it is removed, up to the next space.
Bigger String
WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD') AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id AS PAGE_IMPRESSIONf6beefc4b44e4b FROM full_contents_2
List
TIME_ID_CATEGORY
PAGE_IMPRESSION
...
I need to remove suffixes like e93bc60a0041 and f6beefc4b44e4b, which come after TIME_ID_CATEGORY and PAGE_IMPRESSION.
I expect the following result. I need an efficient, regex-based solution in Java to achieve this.
WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD') AS TIME_ID_CATEGORY,tab_0_0.request_id AS PAGE_IMPRESSION FROM full_contents_2

How about something like this? Essentially matching TIME_ID_CATEGORY or PAGE_IMPRESSION into Group 1, and anything that follows (i.e. suffix) as Group 2.
(TIME_ID_CATEGORY|PAGE_IMPRESSION)(\w+)
Then simply replace the contents of Group 2 with an empty string. Alternatively, replace the whole match with Group 1, which also gets rid of the suffix (see the code snippet below).
Example code snippet:
public static void main(String args[]) throws Exception {
    String line = "WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD') AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id AS PAGE_IMPRESSIONf6beefc154b44e4b FROM full_contents_2";
    Pattern p = Pattern.compile("(TIME_ID_CATEGORY|PAGE_IMPRESSION)(\\w+)");
    Matcher m = p.matcher(line);
    if (m.find()) {
        String output = m.replaceAll("$1");
        System.out.println(output);
        //WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD') AS TIME_ID_CATEGORY,tab_0_0.request_id AS PAGE_IMPRESSION FROM full_contents_2
    }
}
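Since the question mentions a list of strings rather than two fixed names, the alternation can also be built at run time. The following is only a sketch: the class name SuffixStripper and the method stripSuffixes are made up for illustration, and it assumes, like the answer above, that a suffix consists of word characters (\w).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SuffixStripper {
    // Removes the run of word characters that directly follows any of the given prefixes.
    static String stripSuffixes(String text, List<String> prefixes) {
        String alternation = prefixes.stream()
                // Longest first, so that a prefix of another prefix does not win the match.
                .sorted(Comparator.comparingInt(String::length).reversed())
                // Quote each entry so regex metacharacters in it are taken literally.
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Group 1 is the matched prefix; \\w+ is the suffix that gets dropped.
        return text.replaceAll("(" + alternation + ")\\w+", "$1");
    }

    public static void main(String[] args) {
        String line = "AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id "
                + "AS PAGE_IMPRESSIONf6beefc4b44e4b FROM full_contents_2";
        System.out.println(stripSuffixes(line,
                Arrays.asList("TIME_ID_CATEGORY", "PAGE_IMPRESSION")));
    }
}
```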

My guess is that a simple expression such as
[a-f0-9]{14}
replaced with an empty string might work here, but only if every suffix is exactly 14 hexadecimal characters long. In the sample above, f6beefc4b44e4b has 14 characters while e93bc60a0041 has only 12, so a range quantifier such as [a-f0-9]{12,14} would be needed to cover both.
If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.

regex for splitting key value-pair containing comma

I need a regex to split key-value pairs. Key and value are separated by =.
Values can contain a comma (,), but if they do, they must be enclosed in double quotes (""). A quoted value can itself contain further quoted sections with commas in them, so multiple levels of nesting are possible.
A key can be anything except a comma (,), an equals sign (=), or a double quote (").
Example- abc="hi my name is "ayush,nigam"",def="i live at "bangalore",ghi=bangalore is in karnataka,jkl="i am from UP"
Another example - "ayush="piyush="abc,def",bce="asb,dsa"",aman=nigam"
I expect output as ayush="piyush="abc,def",bce="asb,dsa"" and aman=nigam
I am using the following regex code in java.
Pattern abc=Pattern.compile("([^=,]*)=((?:\"[^\"]*\"|[^,\"])*)");
String text2="AssemblyName=(foo.dll),ClassName=\"SomeClassanotherClass=\"a,b\"\"";
Matcher m=abc.matcher(text2);
while(m.find()) {
String kvPair = m.group();
System.out.println(kvPair);
}
I am getting the following kvPair values:
AssemblyName=(foo.dll)
ClassName="SomeClassanotherClass="a
whereas I need to get
AssemblyName=(foo.dll)
ClassName="SomeClassanotherClass="a,b"
Hence commas (,) inside inner double quotes ("") are not being parsed properly. Please help.

Filtering string between double or single quotations with varying spaces

I have these two variations of this string
name='Anything can go here'
name="Anything can go here"
where name= can have spaces like so
name=(text)
name =(text)
name = (text)
I need to extract the text between the quotes. I'm not sure what the best approach is: should I cut the string at the quotes by hand (and if so, is there an example that avoids a lot of case handling), or should I use a regex?

I'm not sure I understand the question exactly, but I'll give it my best shot:
If you just want to assign the text inside the quotation marks to a variable name2, you can strip both kinds of quotes (note that this also removes any quotes inside the text):
String name = "'Anything can go here'";
String name2 = name.replace("'", "");
name2 = name2.replace("\"", "");

You want to get Anything can go here whether it's between single quotes or double quotes. A regex can do this regardless of the spaces before or after the "=" by using the following pattern:
"[\"'](.+)[\"']"
Breakdown:
[\"'] - Character class consisting of a double or single quote
(.+) - One or more of any character (may or may not match line terminators), stored in capture group 1
[\"'] - Character class consisting of a double or single quote
In short, we are trying to capture anything between single or double quotes.
Example:
// requires java.util.* and java.util.regex.*
public static void main(String[] args) {
    List<String> data = new ArrayList<>(Arrays.asList(
        "name='Anything can go here'",
        "name = \"Really! Anything can go here\""
    ));
    for (String d : data) {
        Matcher matcher = Pattern.compile("[\"'](.+)[\"']").matcher(d);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}
Results:
Anything can go here
Really! Anything can go here
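The pattern above accepts any quote on either side, so a value such as "It's here" relies on the greedy (.+) to reach the last quote. A stricter variant, sketched below, uses a backreference so that the closing quote must be the same character as the opening one, and anchors on the name key with optional spaces around =. The class name QuotedText and the method extract are hypothetical, and escaped quotes inside the value are assumed not to occur.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedText {
    // Group 1 captures the opening quote; \\1 requires the same quote to close.
    // Group 2 is the text in between (lazy, so it stops at the first matching quote).
    private static final Pattern NAME_VALUE =
            Pattern.compile("name\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the quoted text after name=, or null if the input has another shape.
    static String extract(String input) {
        Matcher m = NAME_VALUE.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("name='Anything can go here'"));
        System.out.println(extract("name = \"It's here\""));
    }
}
```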

Regular expression matching issue with the following scenario

I am developing an application. The user enters some setting values on the server. When I ask the server for the values through the built-in API, I get them back as a single string, for example:
name={abc};display={xyz};addressname={123}
Here the properties are name, display and addressname, and their respective values are abc, xyz and 123.
I used to split with ; as the first delimiter and = as the second delimiter.
String[] propertyValues = iPropertiesStrings.split(";");
for (int i = 0; i < propertyValues.length; i++) {
    if (isNullEmpty(propertyValues[i]))
        continue;
    String[] propertyValue = propertyValues[i].split("=");
    if (propertyValue.length != 2)
        mPropertyValues.put(propertyValue[0], "");
    else
        mPropertyValues.put(propertyValue[0], propertyValue[1]);
}
Here mPropertyValues is a HashMap that holds each property name and its value.
The problem is that the string can also look like this:
case 1: name={abc};display={ xyz=deno; demo2=pol };addressname={123}
case 2: name=;display={ xyz=deno; demo2=pol };addressname={123}
I want the HashMap to be filled with:
case 1:
name = "abc"
display = "xyz=deno; demo2=pol"
addressname = "123"
case 2:
name = ""
display = "xyz=deno; demo2=pol"
addressname = "123"
I am looking for a regular expression to split these strings.

Assuming that there can't be nested {}, this should do what you need:
String data = "name=;display={ xyz=deno; demo2=pol };addressname={123}";
Pattern p = Pattern.compile("(?<name>\\w+)=(\\{(?<value>[^}]*)\\})?(;|$)");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group("name") + "->" + (m.group("value") == null ? "" : m.group("value").trim()));
}
Output:
name->
display->xyz=deno; demo2=pol
addressname->123
Explanation
(?<name>\\w+)=(\\{(?<value>[^}]*)\\})?(;|$) can be split into parts, where
(?<name>\\w+)= matches XXXX= and places XXXX in the group called name (the property name)
(\\{(?<value>[^}]*)\\})? is the optional {XXXX} part, where XXXX cannot contain }. It places XXXX in the group called value.
(;|$) matches ; or the end of the data (the $ anchor), since each pair has the form name=value; or, for the last pair in the data, name=value.

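The following regex should match your criteria, and uses named capturing groups to get the three values you need. Note that it expects all three values to be wrapped in braces, so it does not cover the empty name= of case 2.
name=\{(?<name>[^}]+)\};display=\{(?<display>[^}]+)\};addressname=\{(?<address>[^}]+)\}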

Assuming your dataset can change, a better parser may be more dynamic, building a Map from whatever is found in that return type.
The regex for this is pretty simple, given the cases you list above (and no nesting of {}, as others have mentioned):
Matcher m = Pattern.compile("(\\w+)=(?:\\{(.*?)\\})?").matcher(source_string);
while (m.find()) {
    // group(2) is null when the {...} part is absent (e.g. "name=;")
    hashMap.put(m.group(1), m.group(2));
}
There are, however, some considerations:
If the {...} part is absent, m.group(2) returns null, so null is stored as the value (you can adjust that with a small amount of logic).
The pattern does not depend on specific property names, so it keeps working if your data changes in the future.
What the regex does:
(\\w+) - Looks for one or more word characters in a row ([a-zA-Z0-9_]) and puts them into a capture group (group(1)).
= - The literal equals sign.
(?:...)? - The ?: makes the grouping non-capturing (it will not be a .group(n)), and the trailing ? makes it optional.
\\{(.*?)\\} - Looks for anything between the literals { and } (note: a stray } inside the value will break this). If this section exists, the contents between {} end up in the second capture group (.group(2)).
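Putting the pieces together, a minimal runnable sketch that fills a map for both cases from the question might look like this. The class name PropertyParser and the method parse are assumptions for illustration. The pattern borrows the (;|$) terminator from the earlier answer, maps a missing {...} part to an empty string, and still assumes no nested braces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyParser {
    // name, '=', an optional {value} without nested braces, then ';' or end of input.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\\{([^}]*)\\})?(?:;|$)");

    // Parses "a={x};b=;c={y}" style strings into an insertion-ordered map.
    static Map<String, String> parse(String source) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            String value = m.group(2);               // null when {...} is absent
            result.put(m.group(1), value == null ? "" : value.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("name={abc};display={ xyz=deno; demo2=pol };addressname={123}"));
        System.out.println(parse("name=;display={ xyz=deno; demo2=pol };addressname={123}"));
    }
}
```

A LinkedHashMap is used here only so that the properties print in the order they appear; a plain HashMap works the same way otherwise.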

Need nested parenthesis for this regex

I have a dataset like this with three columns:
Col1, Col2, Col3
aaa,Arizona DL USTATES,12
bbb,Idaho DL USTATES,35
ccc,Idaho DL USTATES,28
ddd,Wisconsin DL USTATES,11
eeee,Wisconsin DL USTATES,35
I want to extract the first word of the second column (which is a state name) and put it in the first column.
Expected Output:
Arizona,Arizona randam USTATES,12
Idaho,Idaho randam USTATES,35
Idaho,Idaho randam USTATES,28
Wisconsin,Wisconsin random USTATES,11
The regex that I have is
^[^,]+,([^ ]+) [^\n]+$
With my () I can extract the state name, but how do I get the output? What I want is nested parentheses, something like this
^[^,]+,(([^ ]+) [^\n]+)$
and then the output will be
\1,\2
I should point out that I want to do it using regex replace only.
Edit:
I have solved it by using a regex to get all of the state names into a column and then merging it, but I want to know whether a more advanced regex can be used here.

String s = "aaa,Arizona DL USTATES,12";
String st = s.split(",")[1].split(" ")[0];
s = s.replaceFirst("\\w+\\,", st + ",");

Your regex with nested parentheses works fine; you just need to use String's replaceFirst method, and note that Java uses $ for group references in the replacement. Also note that groups are numbered in the order their opening parentheses occur in the regex, so the outer group is $1 because it starts first:
String line = "aaa,Arizona DL USTATES,12";
String result = line.replaceFirst("^[^,]+,(([^ ]+) [^\n]+)$", "$2,$1");
// result is "Arizona,Arizona DL USTATES,12"
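To apply the same replacement to every row of the dataset in one call, the pattern can be compiled with Pattern.MULTILINE so that ^ and $ match at each line boundary. This is a sketch under the assumptions that rows are separated by \n and that the header row is handled separately; the class name StateColumn and the method moveStateToFirstColumn are hypothetical.

```java
import java.util.regex.Pattern;

public class StateColumn {
    // Group 1 (outer) is everything after the first comma; group 2 (inner) is the
    // first word of the second column. \\n is excluded so a match never spans rows.
    private static final Pattern ROW = Pattern.compile(
            "^[^,\\n]+,(([^ \\n]+) [^\\n]+)$", Pattern.MULTILINE);

    // Replaces the first column of every row with the state name.
    static String moveStateToFirstColumn(String rows) {
        return ROW.matcher(rows).replaceAll("$2,$1");
    }

    public static void main(String[] args) {
        String rows = "aaa,Arizona DL USTATES,12\n"
                + "bbb,Idaho DL USTATES,35";
        System.out.println(moveStateToFirstColumn(rows));
    }
}
```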
