Java - extract JSON values from string using multi regex


I am trying to use this multi regex Java library to extract JSON field values from a string.
My JSON looks like this:
{
"field1": "something",
"field2": 13,
"field3": "some"
}
I have created a regex pattern to fit each field, and it is working with Java Regex Pattern by simply doing something like this for each pattern:
Matcher matcher = patternToSearch.matcher(receiveData);
if (matcher.find()) {
return matcher.group(1);
}
I decided to try to improve the code by using multi regex, so that instead of scanning the string three times it scans it only once and extracts all the needed values.
So I came up with something like this:
String[] patterns = new String[]{
"\"field1\":\\s*\"(.*?)\"",
"\"field2\":\\s*(\\d+)(\\.\\d)?",
"\"field3\":\\s*\"(.*?)\"",
};
this.matcher = MultiPattern.of(patterns).matcher();
The matcher has only one method, match, used like this:
int[] match = this.matcher.match(jsonStringToScan);
So I ended up with an array of integers, but I have no idea how to get the JSON values from the string or how those integers help me. The multi regex matcher does not support the group method I used before to get the value.
Any idea of how I can extract multiple json values from string using multi regex? (Scanning string only once)

As mentioned on the GitHub page from your link, match returns the indexes of the patterns that matched. Another point from this page:
The library does not handle groups.
Consider matching the key as a group too. Look at this simple example:
final Pattern p = Pattern.compile("\"(field.)\":((?:\".*?\")|(?:\\d+(?:\\.\\d+)?))");
final Matcher m = p.matcher("{\"field3\":\"hi\",\"field2\":100.0,\"field1\":\"hi\"}");
while (m.find()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.print(m.group(i) + " ");
}
System.out.println();
}
It prints:
field3 "hi"
field2 100.0
field1 "hi"
If you want to avoid quotes in value group, you need more complicated logic. I've stopped at:
final Pattern p = Pattern.compile("\"(field.)\":(?:(?:\"(.*?(?=\"))\")|(\\d+(?:\\.\\d+)?))");
resulting in
field3 hi null
field2 null 100.0
field1 hi null
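The single-pattern idea above can be sketched as a small helper that scans the input once and collects every field into a map. This is only a sketch under assumptions: the class name FieldExtractor and the method extract are made up for illustration, keys are assumed to look like field1, field2 and so on, and string values are assumed to contain no escaped quotes. For arbitrary JSON a real JSON parser is the more robust choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractor {
    // One pattern for every "fieldN" key: group 1 is the key, group 2 a quoted
    // string value (quotes excluded), group 3 an unquoted number.
    private static final Pattern FIELD = Pattern.compile(
            "\"(field\\d+)\"\\s*:\\s*(?:\"(.*?)\"|(\\d+(?:\\.\\d+)?))");

    // Scans the input once and returns key -> value in order of appearance.
    public static Map<String, String> extract(String json) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(json);
        while (m.find()) {
            // Exactly one of group 2 / group 3 took part in the match.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            values.put(m.group(1), value);
        }
        return values;
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"something\", \"field2\": 13, \"field3\": \"some\"}";
        System.out.println(extract(json)); // {field1=something, field2=13, field3=some}
    }
}
```

Picking the non-null group inside the loop avoids the null columns seen in the last output above.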

Related

Replace a string using a regular expression

I have a string that I would like to replace using a regular expression in java but I am not quite sure how to do this.
Let's say I have the code below:
String globalID="60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr="^([A-Z0-9]{3})[A-Z0-9]*|-([A-Z0-9]{3})[A-Z0-9]*$|-([A-Z0-9]{2})[A-Z0-9]*";
What I would like to do is apply regExpr to globalID so that the new string is 60D1E4CAE043. I did it with str.substring(0,3)+..., but I was wondering whether I can do it with the regular expression in Java. I tried replaceAll, but the output was not the one described above.
To be more specific, I would like to turn globalID into a newGlobalID using the regular expression above. The newGlobalID should be 60D1E4CAE043.
Thanks

This is definitely not the best code ever, but you could do something like this:
String globalID = "60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr = "^([A-Z0-9]{3})[A-Z0-9]*|-([A-Z0-9]{3})[A-Z0-9]*$|-([A-Z0-9]{2})[A-Z0-9]*";
Pattern pattern = Pattern.compile(regExpr);
Matcher matcher = pattern.matcher(globalID);
String newGlobalID = "";
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
newGlobalID += matcher.group(i) != null ? matcher.group(i) : "";
}
}
System.out.println(newGlobalID);
You need a Matcher to iterate over all matches in your input, because your regular expression matches only subsequences of the input string. Depending on which substring is matched, a different capturing group will be non-null. You could also use named capturing groups or keep track of where you are in the input, but the code above should work as an example.
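The named capturing group variant mentioned in this answer could look roughly like the sketch below. The group names head, mid and tail and the class name GlobalIdShortener are made up for illustration, and a StringBuilder replaces the string concatenation in the loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlobalIdShortener {
    // Same alternation as in the question, with a name on each capturing group.
    private static final Pattern PART = Pattern.compile(
            "^(?<head>[A-Z0-9]{3})[A-Z0-9]*"
            + "|-(?<tail>[A-Z0-9]{3})[A-Z0-9]*$"
            + "|-(?<mid>[A-Z0-9]{2})[A-Z0-9]*");

    private static final String[] GROUP_NAMES = {"head", "mid", "tail"};

    public static String shorten(String globalId) {
        StringBuilder out = new StringBuilder();
        Matcher m = PART.matcher(globalId);
        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            for (String name : GROUP_NAMES) {
                String part = m.group(name);
                if (part != null) {
                    out.append(part);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shorten("60DC6285-1E71-4C30-AE36-043B3F7A4CA6")); // 60D1E4CAE043
    }
}
```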

Your regexp must match the whole string. Your version tries to match the parts as alternatives, which does not work.
Try this:
String regExpr="^([A-Z0-9]{3})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{3}).*";

The complete code should look like this:
String globalID = "60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr = "^(\\w{3}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{3}).*";
System.out.println(globalID.replaceAll(regExpr, "$1$2$3$4$5"));
The output of println function is
60D1E4CAE043

Parse out specific characters from java string

I have been trying to drop specific values from a String holding JDBC query results and column metadata. The format of the output is:
[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]
I am trying to get it into the following format:
I_Col1=someValue1, I_Col2=someVal2, I_Col3=someVal3
I have tried just dropping everything before the "=", but some of the "someVal" data has "=" in them. Is there any efficient way to solve this issue?
Below is the code I used:
for(int i = 0; i < finalResult.size(); i+=modval) {
String resulttemp = finalResult.get(i).toString();
String [] parts = resulttemp.split(",");
//below is only for
for(int z = 0; z < columnHeaders.size(); z++) {
String replaced ="";
replaced = parts[z].replace("*=", "");
System.out.println("Replaced: " + replaced);
}
}

You don't need any splitting here!
You can use replaceAll() and the power of regular expressions to simply replace all occurrences of those unwanted characters, as in:
someString.replaceAll("[\\[\\]\\{\\}]", "")
When you apply that to your strings, the resulting string should look exactly as required.

You could use a regular expression to replace the square and curly brackets, like this: [\[\]{}]
For example:
String s = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
System.out.println(s.replaceAll("[\\[\\]{}]", ""));
That would produce the following output:
I_Col1=someValue1, I_Col2=someVal2, I_Col3=someVal3
which is what you expect in your post.
A better approach, however, might be to match instead of replace, if you know the character set that will appear in the position of 'someValue'. Then you can design a regex that matches this particular string in such a way that, no matter what separates I_Col1=someValue1 from the rest of the String, you will be able to extract it :-)
EDIT:
With regard to the matching approach, given that the value following I_Col1= consists only of word characters (letters, digits and _), you could use this pattern: (I_Col\d=\w+),?
For example:
String s = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
Matcher m = Pattern.compile("(I_Col\\d=\\w+),?").matcher(s);
while (m.find())
System.out.println(m.group(1));
This will produce:
I_Col1=someValue1
I_Col2=someVal2
I_Col3=someVal3
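The question mentions that some values contain "=" themselves. One way to cope with that, sketched here under the assumption that values contain no commas, brackets or braces, is to split each pair at the first "=" only. The class name PairParser and the method parse are hypothetical names for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PairParser {
    public static Map<String, String> parse(String raw) {
        // Drop the list and map delimiters, keeping only "key=value" pairs.
        String flat = raw.replaceAll("[\\[\\]{}]", "");
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : flat.split(",\\s*")) {
            // Limit 2: split at the first '=' only, so later '=' stay in the value.
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                pairs.put(kv[0], kv[1]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints {I_Col1=a=b, I_Col2=someVal2, I_Col3=someVal3}
        System.out.println(parse("[{I_Col1=a=b, I_Col2=someVal2}, {I_Col3=someVal3}]"));
    }
}
```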

You could do four calls to replaceAll on the string.
String query = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
String queryWithoutBracesAndBrackets = query.replaceAll("\\{", "").replaceAll("\\}", "").replaceAll("\\]", "").replaceAll("\\[", "");
Or you could use a single regexp with alternation if you want the code to be more understandable.
String query = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
String queryWithoutBracesAndBrackets = query.replaceAll("\\[|\\]|\\{|\\}", "");

Java split a string by using a regex [duplicate]

I have a string that has two single quotes in it, the ' character. In between the single quotes is the data I want.
How can I write a regex to extract "the data i want" from the following text?
mydata = "some string with 'the data i want' inside";

Assuming you want the part between single quotes, use this regular expression with a Matcher:
"'(.*?)'"
Example:
String mydata = "some string with 'the data i want' inside";
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Result:
the data i want

You don't need regex for this.
Add apache commons lang to your project (http://commons.apache.org/proper/commons-lang/), then use:
String dataYouWant = StringUtils.substringBetween(mydata, "'");

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
Pattern pattern = Pattern.compile(".*'([^']*)'.*");
String mydata = "some string with 'the data i want' inside";
Matcher matcher = pattern.matcher(mydata);
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
}
}

There's a simple one-liner for this:
String target = mydata.replaceAll("[^']*(?:'(.*?)')?.*", "$1");
By making the matching group optional, this also caters for quotes not being found by returning a blank in that case.
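To make both cases concrete, here is the same one-liner wrapped in a method and called with and without quotes in the input. The class and method names are made up for this sketch, and the input is assumed to be a single line, since . does not match line terminators by default.

```java
public class FirstQuoted {
    // Returns the text between the first pair of single quotes,
    // or an empty string when the input contains no quoted part.
    public static String firstQuoted(String text) {
        return text.replaceAll("[^']*(?:'(.*?)')?.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("some string with 'the data i want' inside")); // the data i want
        System.out.println(firstQuoted("no quotes here").isEmpty());                  // true
    }
}
```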

Since Java 9
As of this version, you can use the new no-argument method Matcher::results, which returns a Stream<MatchResult>. Each MatchResult represents the result of one match operation and gives access to the matched groups and more (the interface has existed since Java 1.5).
String string = "Some string with 'the data I want' inside and 'another data I want'.";
Pattern pattern = Pattern.compile("'(.*?)'");
pattern.matcher(string)
.results() // Stream<MatchResult>
.map(mr -> mr.group(1)) // Stream<String> - the 1st group of each result
.forEach(System.out::println); // print them out (or process in other way...)
The code snippet above results in:
the data I want
another data I want
The biggest advantage is ease of use when one or more results are available, compared to the procedural if (matcher.find()) and while (matcher.find()) checks and processing.

Because you also ticked Scala, a solution without regex which easily deals with multiple quoted strings:
val text = "some string with 'the data i want' inside 'and even more data'"
text.split("'").zipWithIndex.filter(_._2 % 2 != 0).map(_._1)
res: Array[java.lang.String] = Array(the data i want, and even more data)

String dataIWant = mydata.replaceFirst(".*'(.*?)'.*", "$1");

In JavaScript:
mydata.match(/'([^']+)'/)[1]
The actual regexp is: /'([^']+)'/
If you use the non-greedy modifier (as per another post), it looks like this:
mydata.match(/'(.*?)'/)[1]
It is cleaner.

String dataIWant = mydata.split("'")[1];

In Scala,
val ticks = "'([^']*)'".r
ticks findFirstIn mydata match {
case Some(ticks(inside)) => println(inside)
case _ => println("nothing")
}
for (ticks(inside) <- ticks findAllIn mydata) println(inside) // multiple matches
val Some(ticks(inside)) = ticks findFirstIn mydata // may throw exception
val ticks = ".*'([^']*)'.*".r
val ticks(inside) = mydata // safe, shorter, only gets the first set of ticks

Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods.
In your case, the start and end substrings are the same, so just call the following function.
StringUtils.substringBetween(String str, String tag)
Gets the String that is nested in between two instances of the same
String.
If the start and the end substrings are different then use the following overloaded method.
StringUtils.substringBetween(String str, String open, String close)
Gets the String that is nested in between two Strings.
If you want all instances of the matching substrings, then use,
StringUtils.substringsBetween(String str, String open, String close)
Searches a String for substrings delimited by a start and end tag,
returning all matching substrings in an array.
For the example in question to get all instances of the matching substring
String[] results = StringUtils.substringsBetween(mydata, "'", "'");

You can use a while loop to store all matching substrings in a list. If you use
if (matcher.find())
{
System.out.println(matcher.group(1));
}
you will get only the first matching substring, so use this to get all of them:
Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(text);
// Matcher mat = pattern.matcher(text);
ArrayList<String>matchesEmail = new ArrayList<>();
while (m.find()){
String s = m.group();
if(!matchesEmail.contains(s))
matchesEmail.add(s);
}
Log.d(TAG, "emails: "+matchesEmail);

Add the Apache Commons Lang dependency to your pom.xml (StringUtils lives in commons-lang3, not commons-io):
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.12.0</version>
</dependency>
Then the code below works:
String dataYouWant = StringUtils.substringBetween(mydata, "'", "'");

Somehow group(1) didn't work for me, because the pattern below has no capturing group. I used group(0), the whole match, to find the URL version.
Pattern urlVersionPattern = Pattern.compile("\\/v[0-9][a-z]{0,1}\\/");
Matcher m = urlVersionPattern.matcher(url);
if (m.find()) {
return StringUtils.substringBetween(m.group(0), "/", "/");
}
return "v0";
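An alternative that needs no StringUtils is to put the version part inside a capturing group, so that group(1) is available. The following is a minimal sketch with a hypothetical class name and method name; it keeps the "v0" fallback from the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlVersion {
    // The parentheses create group 1, which holds the version without the slashes.
    private static final Pattern VERSION = Pattern.compile("/(v[0-9][a-z]?)/");

    public static String versionOf(String url) {
        Matcher m = VERSION.matcher(url);
        // Fall back to "v0" when the URL carries no version segment.
        return m.find() ? m.group(1) : "v0";
    }

    public static void main(String[] args) {
        System.out.println(versionOf("https://example.com/api/v2/users")); // v2
        System.out.println(versionOf("https://example.com/api/users"));    // v0
    }
}
```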

Smart parsing string java

Is there some kind of rule engine or some smart way to do this?
I have a string like this :
test 1-2-22
so that I can get these values:
name = "test"
part_id = 1
brand_id = 2
count = 22
I have more of these so called rules from which I know the format of string.
I was thinking I can do this with regex, but is there a better way of doing this instead?
Edit:
I see some very good answers. Maybe I should have been more clear.
This is not the only string type that I might have, I could have a string like this :
test 3-brand 15 – 2
Where after parsing it should be :
name = "test"
part_id = 2
brand_id = 3
count = 15
So I can have different strings, and I need to define a rule/pattern for each of them. What would be a good way to do this? Regex is one option for now.

You can split around both spaces and dashes using the following expression:
[ -]
Then you will find the different components at indexes starting from 0.
In Java:
String input = "test 1-2-22";
String[] results = input.split("[ -]");

You can use this Pattern regex:
Pattern pattern = Pattern.compile("^([a-zA-Z]+)\\s*([^-]+)-([^-]+)-([^-]+)$");
Then this code should work:
String line = "test 1-2-22";
Pattern pattern = Pattern.compile("^([a-zA-Z]+)\\s*([^-]+)-([^-]+)-([^-]+)$");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.printf("name:%s, part_id:%s, brand_id:%s, count:%s%n",
matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4) );
}
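For the several formats described in the question's edit, one regex per format can be kept in a small rule table and tried in order. The sketch below assumes only the two formats shown in the question; the class name RuleParser and the group names (name, partId, brandId, count) are chosen for illustration. Java group names cannot contain underscores, hence partId rather than part_id.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleParser {
    // Group names that every rule must define.
    private static final String[] FIELDS = {"name", "partId", "brandId", "count"};

    // One pattern per known format; the first rule that matches the whole input wins.
    private static final Pattern[] RULES = {
        // e.g. "test 1-2-22"
        Pattern.compile("(?<name>[a-zA-Z]+)\\s+(?<partId>\\d+)-(?<brandId>\\d+)-(?<count>\\d+)"),
        // e.g. "test 3-brand 15 – 2" (accepts a hyphen or an en dash, U+2013)
        Pattern.compile("(?<name>[a-zA-Z]+)\\s+(?<brandId>\\d+)-brand\\s+(?<count>\\d+)"
                + "\\s*[-\\u2013]\\s*(?<partId>\\d+)")
    };

    public static Map<String, String> parse(String input) {
        for (Pattern rule : RULES) {
            Matcher m = rule.matcher(input.trim());
            if (m.matches()) {
                Map<String, String> result = new LinkedHashMap<>();
                for (String field : FIELDS) {
                    result.put(field, m.group(field));
                }
                return result;
            }
        }
        throw new IllegalArgumentException("No rule matches: " + input);
    }

    public static void main(String[] args) {
        System.out.println(parse("test 1-2-22")); // {name=test, partId=1, brandId=2, count=22}
    }
}
```

Adding a new format then means adding one pattern to the table, as long as it defines the same four group names.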

In this particular case, suitable split operations (or other manual string processing) is probably going to be easiest, as you have the whitespace and the dashes to look for explicitly.
For more complex patterns you can look into ANTLR for tokenising this into (for example) one identifier and three number tokens and then parsing it, but that seems to be overkill here. (This would give you a 'rule engine', though.)
In general: you may want to read up on parsing and context-free grammars for this.

Something like this:
String s = "test 1-2-22";
String[] vars = s.split("[ -]");
String name = vars[0];
String part_id = vars[1];
String brand_id = vars[2];
String count = vars[3];
This will split the string wherever a space or "-" occurs.
You could then convert the ids and count to int if required.

Extracting a word containing a symbol from a string in Java

The basic idea is that I want to pull out any part of the string with the form "text1.text2". Some examples of the input and output of what I'd like to do would be:
"employee.first_name" ==> "employee.first_name"
"2 * employee.salary AS double_salary" ==> "employee.salary"
Thus far I have just .split(" ") and then found what I needed and .split("."). Is there any cleaner way?

I would go with an actual Pattern and an iterative find, instead of splitting the String.
For instance:
String test = "employee.first_name 2 * ... employee.salary AS double_salary blabla e.s blablabla";
// searching for a number of word characters or punctuation, followed by dot,
// followed by a number of word characters or punctuation
// note also we're avoiding the "..." pitfall
Pattern p = Pattern.compile("[\\w\\p{Punct}&&[^\\.]]+\\.[\\w\\p{Punct}&&[^\\.]]+");
Matcher m = p.matcher(test);
while (m.find()) {
System.out.println(m.group());
}
Output:
employee.first_name
employee.salary
e.s
Note: to simplify the Pattern, you could list only the punctuation allowed in your "."-separated words in the character classes.
For instance:
Pattern p = Pattern.compile("[\\w_]+\\.[\\w_]+");
This way, foo.bar*2 would be matched as foo.bar

You need to use split to break the string into fragments. Then search for . in each of those fragments using the contains method to get the desired fragments.
Here you go:
public static void main(String args[]) {
String str = "2 * employee.salary AS double_salary";
String arr[] = str.split("\\s");
for (int i = 0; i < arr.length; i++) {
if (arr[i].contains(".")) {
System.out.println(arr[i]);
}
}
}

String mydata = "2 * employee.salary AS double_salary";
Pattern pattern = Pattern.compile("(\\w+\\.\\w+)");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}

I'm not an expert in Java, but based on my experience with regex in Python and on internet tutorials, I suggest using r'(\S*)\.(\S*)' as the pattern. I tried it in Python and it worked well on your example.
But it misbehaves when the text contains several dots in a chain. If you try to match something like first.second.third, this pattern returns ('first.second', 'third') as the matched groups, which I think is due to greedy matching.
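In Java, one way to keep chained names such as first.second.third in one piece is to match a word followed by one or more ".word" segments, without splitting the match into groups. This is a sketch with made-up class and method names. Note that \w also matches digits, so a number like 2.5 would be reported too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedNames {
    // A word, then one or more ".word" segments, so a.b and a.b.c each match as a whole.
    private static final Pattern DOTTED = Pattern.compile("\\w+(?:\\.\\w+)+");

    public static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOTTED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("2 * employee.salary AS double_salary")); // [employee.salary]
        System.out.println(find("x first.second.third y"));               // [first.second.third]
    }
}
```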
