Split string by tab and ignore tab in multi double quotes


public static void main(String[] args) {
String text = "hi ravi \"how are you\" when are you coming";
String regex = "\"([^\"]*)\"|(\\S+)";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
if (m.group(1) != null) {
System.out.println("Quoted [" + m.group(1) + "]");
} else{
System.out.println("Plain [" + m.group(0) + "]");
}
}
// getSplits(text);
}
Output:
Plain [hi]
Plain [ravi]
Quoted [how are you]
Plain [when]
Plain [are]
Plain [you]
Plain [coming]
The code above works when the text contains only a single level of double quotes. How can I get the output below for the following input, where quotes are nested?
text = "hi ravi \"\"how are\" you\" when are you coming";
Expected Output:
Plain [hi]
Plain [ravi]
Quoted ["how are" you]
Plain [when]
Plain [are]
Plain [you]
Plain [coming]

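The following regex works for your example input and output. Note that .+ is greedy, so the quoted token runs from the first double quote to the last one in the text. You will need to describe the expected result in more detail, as this may not be what you want for other inputs.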
public static void main(String[] args) {
String text = "hi ravi \"\"how are\" you\" when are you coming";
String regex = "\"(.+)\"|(\\S+)"; // group 1 excludes the outer quotes, as in the expected output
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
if (m.group(1) != null) {
System.out.println("Quoted [" + m.group(1) + "]");
} else{
System.out.println("Plain [" + m.group(0) + "]");
}
}
// getSplits(text);
}

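This will do. It splits on runs of tabs that are followed by an even number of double quotes, i.e. tabs that are outside a quoted section:
[\t]+(?=([^"]*"[^"]*")*[^"]*$)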
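As a minimal sketch of how that pattern could be used from Java, the snippet below passes it to String.split. The class name TabSplitDemo, the method splitOnTabs and the sample line are made up for illustration. It assumes quotes are balanced and not nested; with nested quotes such as ""how are" you" the quote count is still even inside the outer pair, so this approach would split there too.

```java
import java.util.Arrays;

final class TabSplitDemo {
    // Runs of tabs followed by an even number of double quotes up to the
    // end of the input, i.e. tabs that are not inside a quoted section.
    static final String TAB_OUTSIDE_QUOTES = "[\\t]+(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one line on tabs outside double quotes.
    static String[] splitOnTabs(String line) {
        return line.split(TAB_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        // The tab between "how" and "are you" sits inside quotes and is kept.
        String line = "hi\travi\t\"how\tare you\"\tcoming";
        System.out.println(Arrays.toString(splitOnTabs(line)));
    }
}
```

The lookahead rescans the rest of the line for every tab, so this is probably best kept to short lines.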

Related

Java regex extract specific values in long log

I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}

Maybe your pattern should be like this ^FPS\\(FramesPerSecond\\)\\[ValMin: . I've tried it and it works for me.
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
This gives you the offset at which the matched prefix ends. The substring call then returns every character from that offset up to the line length minus 1, which drops the closing ] character.
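To go from that substring to separate min and max values, one option is to split on the comma and cut each part after its colon. This is only a sketch: the class FpsRange and the method parse are hypothetical names, and it assumes the line ends with ] and contains exactly one comma, as in the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FpsRange {
    private static final Pattern PREFIX =
            Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");

    // Returns {min, max} as strings, or null when the line is not an FPS line.
    static String[] parse(String line) {
        Matcher matcher = PREFIX.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        // For the sample line this is " 29.0000, ValMax: 35.000".
        String rest = line.substring(matcher.end(), line.length() - 1);
        String[] parts = rest.split(",");
        String min = parts[0].trim();
        // Keep only what follows "ValMax:".
        String max = parts[1].substring(parts[1].indexOf(':') + 1).trim();
        return new String[] { min, max };
    }

    public static void main(String[] args) {
        String[] range = parse("FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]");
        System.out.println("FPSMin = " + range[0]);
        System.out.println("FPSMax = " + range[1]);
    }
}
```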

The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
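If the values need to be stored rather than printed, the same pattern can fill a map keyed by the metric name. The sketch below assumes that is an acceptable replacement for the separate FPSMin, FPSMax, ... variables in the question; LogRanges and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogRanges {
    // Same expression as above: group 1 = name, group 2 = min, group 3 = max.
    private static final Pattern LINE = Pattern.compile(
            "(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");

    // Maps each metric name to {min, max}; lines that do not match are skipped.
    static Map<String, String[]> parse(String log) {
        Map<String, String[]> ranges = new LinkedHashMap<>();
        for (String line : log.split("\n")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                ranges.put(m.group(1), new String[] { m.group(2), m.group(3) });
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        String log = ".........\n"
                + "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n"
                + "TotalFrames[ValMin: 100000, ValMax:200000]\n"
                + ".........";
        String[] fps = parse(log).get("FPS(FramesPerSecond)");
        System.out.println("FPSMin = " + fps[0] + ", FPSMax = " + fps[1]);
    }
}
```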

Splitting on JSON Payload with Regex to get Value

I am attempting to get a value out of a partial JSON payload using only the "split" method. I can only use this method because the API I am working with is very limited. I can get my value using the Pattern and Matcher APIs:
package com.company;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
// write your code here
String myString = "{\n" +
" \"8\": [\n" +
" {\n" +
" \"TEST\": \"LN17ELJ\",\n" +
" \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n" +
" \"ORDER_UNIQUE_ID_REFERENCE\": \"109197634\",\n" +
" \"STATUS\": \"HORLEY\",\n" +
" \"SECONDARY_NAV_CITY\": \"HORLEY\",\n" +
" \"ROUTE\": \"THE STREET 12\",\n";
String myRegexPattern = "\"([ROUTE_UNIQUE_ID_REFERENCE\"]+)\"\\s*:\\s*\"([^\"]+)\",?";
Pattern pattern = Pattern.compile(myRegexPattern);
Matcher matcher = pattern.matcher(myString);
if (matcher.find())
{
System.out.println(matcher.group(2));
} else {
System.out.println("Didn't work!");
}
}
}
However, when I try using String.split, it doesn't work and my value is not in any of the array elements:
package com.company;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
// write your code here
String myString = "{\n" +
" \"8\": [\n" +
" {\n" +
" \"TEST\": \"LN17ELJ\",\n" +
" \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n" +
" \"ORDER_UNIQUE_ID_REFERENCE\": \"109197634\",\n" +
" \"STATUS\": \"HORLEY\",\n" +
" \"SECONDARY_NAV_CITY\": \"HORLEY\",\n" +
" \"ROUTE\": \"THE STREET 12\",\n";
String myRegexPattern = "\"([ROUTE_UNIQUE_ID_REFERENCE\"]+)\"\\s*:\\s*\"([^\"]+)\",?";
String[] newValue = myString.split(myRegexPattern);
for(int i = 0; i < newValue.length; i++) {
if(newValue[i].equals("2172752")) {
System.out.println("IT'S HERE!");
}
}
}
}
What would be the best way to do this? Is there a better way to get ROUTE_UNIQUE_ID_REFERENCE using only split?

Your regular expression isn't doing what you expect. The square brackets form a character class, so instead of matching "ROUTE_UNIQUE_ID_REFERENCE":"..." it matches any key made up only of characters that occur in ROUTE_UNIQUE_ID_REFERENCE (the "ROUTE" key in your sample would match as well). You could replace it with something like \"ROUTE_UNIQUE_ID_REFERENCE[\"\\s:]+([^\",]+) which will match what you are after in matcher group 1.
The split function doesn't work as you expect either. The regular expression you pass to split is treated as the delimiter, so the data you are hoping to extract is exactly what gets removed.
Assuming you want all of the values for ROUTE_UNIQUE_ID_REFERENCE, in case there is more than one ROUTE_UNIQUE_ID_REFERENCE in your real data, your first example is closer to what you are after. Two changes are needed.
Fix your regular expression and the matching group you read.
Use a while loop instead of an if statement to find all the instances.
String myRegexPattern = "\"ROUTE_UNIQUE_ID_REFERENCE[\"\\s:]+([^\",]+)";
Pattern pattern = Pattern.compile(myRegexPattern);
Matcher matcher = pattern.matcher(myString);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
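If only split is available, one possible workaround is to split twice: first on the key plus the opening quote of its value, then on the closing quote. This is a sketch rather than a general JSON parser; SplitOnlyExtract and valueOf are hypothetical names, and it assumes the key contains no regex metacharacters and the value contains no escaped quotes.

```java
final class SplitOnlyExtract {
    // Returns the string value stored under the given key, or null if the
    // key does not occur. Uses nothing but String.split.
    static String valueOf(String json, String key) {
        // Everything after  "KEY" : "  ends up in element 1.
        String[] afterKey = json.split("\"" + key + "\"\\s*:\\s*\"", 2);
        if (afterKey.length < 2) {
            return null;
        }
        // The value runs up to the next double quote.
        return afterKey[1].split("\"", 2)[0];
    }

    public static void main(String[] args) {
        String myString = "{\n"
                + " \"TEST\": \"LN17ELJ\",\n"
                + " \"ROUTE_UNIQUE_ID_REFERENCE\": \"2172752\",\n"
                + " \"ROUTE\": \"THE STREET 12\",\n";
        System.out.println(valueOf(myString, "ROUTE_UNIQUE_ID_REFERENCE"));
    }
}
```

Written inline, the same idea is myString.split("\"ROUTE_UNIQUE_ID_REFERENCE\"\\s*:\\s*\"")[1].split("\"")[0], which throws an ArrayIndexOutOfBoundsException when the key is missing.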

How to remove text between brackets in multiple lines

I have a big text file and I want to remove everything that is between double curly brackets.
So given the text below:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = Pattern.compile("(?<=\\{\\{).*?\\}\\}", Pattern.DOTALL).matcher(text).replaceAll("");
System.out.println(cleanedText);
I want the output to be:
This is what I want.
I have googled around and tried many different things but I couldn't find anything close to my case and as soon as I change it a little bit everything gets worse.
Thanks in advance

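You can use this. Note that the character class only allows letters and whitespace between the brackets, so the loop will not terminate if the bracketed text contains anything else (digits, punctuation):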
public static void main(String[] args) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = text.replaceAll("\\n", "");
while (cleanedText.contains("{{") && cleanedText.contains("}}")) {
cleanedText = cleanedText.replaceAll("\\{\\{[a-zA-Z\\s]*\\}\\}", "");
}
System.out.println(cleanedText);
}

A regular expression cannot express arbitrarily nested structures; i.e. any syntax that requires a recursive grammar to describe.
If you want to solve this using Java Pattern, you need to do it by repeated pattern matching. Here is one solution:
String res = input;
while (true) {
String tmp = res.replaceAll("\\{\\{[^{}]*\\}\\}", ""); // [^{}] so only innermost pairs match
if (tmp.equals(res)) {
break;
}
res = tmp;
}
This is not very efficient ...
That can be transformed into an equivalent, but more concise form:
String res = input;
String tmp;
while (!(tmp = res.replaceAll("\\{\\{[^{}]*\\}\\}", "")).equals(res)) {
res = tmp;
}
... but I prefer the first version because it is (IMO) a lot more readable.

I am not an expert in regular expressions, so I just wrote a loop that does this for you. If you can't or don't want to use a regex, it may be helpful. Note that it counts single curly brackets, so a lone { or } in the text is treated like a double one.
public static void main(String args[]) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
int openBrackets = 0;
String output = "";
char[] input = text.toCharArray();
for(int i=0;i<input.length;i++){
if(input[i] == '{'){
openBrackets++;
continue;
}
if(input[i] == '}'){
openBrackets--;
continue;
}
if(openBrackets==0){
output += input[i];
}
}
System.out.println(output);
}
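A variant of the same counting idea, sketched below, tracks only double brackets and appends to a StringBuilder instead of concatenating strings. DoubleBraceStripper and strip are hypothetical names; the sketch assumes an unmatched }} outside any block should be kept as ordinary text.

```java
final class DoubleBraceStripper {
    // Removes everything between {{ and }}, including nested blocks.
    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of currently open {{ blocks
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i)); // keep text outside all blocks
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "This is {{\n{{the multiline\ntext}} file }}\nwhat I\n{{ to {{be\nchanged}}\n}} want.";
        // Collapse the leftover line breaks and spaces into single spaces.
        System.out.println(strip(text).replaceAll("\\s+", " "));
    }
}
```

The final replaceAll is what turns the remaining line breaks into the single-line form asked for in the question; leave it out to keep the original line structure.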

My suggestion is to remove anything between curly brackets, starting at the innermost pair:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
Pattern p = Pattern.compile("\\{\\{[^{}]+?}}", Pattern.MULTILINE);
while (p.matcher(text).find()) {
text = p.matcher(text).replaceAll("");
}
resulting in the output
This is
what I
want.
This might fail when the text contains single curly brackets or unbalanced pairs of double brackets, but it could be good enough for your case.

Extracting Capture Group from Non-Capture Group in Java

I have a string, let's call it output, that equals the following:
ltm data-group internal str_testclass {
records {
baz {
data "value 1"
}
foobar {
data "value 2"
}
topaz {}
}
type string
}
And I'm trying to extract the substring between the quotes for a given "record" name. So given foobar I want to extract value 2. The substring I want to extract will always come in the form I have prescribed above, after the "record" name, a whitespace, an open bracket, a new line, whitespace, the string data, and then the substring I want to capture is between the quotes from there. The one exception is when there is no value, which will always happen like I have prescribed above with topaz, in which case after the "record" name there will just be an open and closed bracket and I'd just like to get an empty string for this. How could I write a line of Java to capture this? So far I have ......
String myValue = output.replaceAll("(?:foobar\\s{\n\\s*data "([^\"]*)|()})","$1 $2");
But I'm not sure where to go from here.

Start by extracting the "records" structure with the following regex: ltm\s+data-group\s+internal\s+str_testclass\s*\{\s*records\s*\{\s*(?<records>([^\s}]+\s*\{\s*(data\s*"[^"]*")?\s*\}\s*)*)\}\s*type\s*string\s*\}
Then, within the "records" group, look for successive matches of [^\s}]+\s*\{\s*(?:data\s*"(?<data>[^"]*)")?\s*\}\s*. The "data" group contains what you're looking for and will be null in the "topaz" case.
Java strings:
"ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}"
"[^\\s}]+\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*"
Demo:
String input =
"ltm data-group internal str_testclass {\n" +
" records {\n" +
" baz {\n" +
" data \"value 1\"\n" +
" }\n" +
" foobar {\n" +
" data \"value 2\"\n" +
" }\n" +
" topaz {}\n" +
" empty { data \"\"}\n" +
" }\n" +
" type string\n" +
"}";
Pattern language = Pattern.compile("ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}");
Pattern record = Pattern.compile("(?<name>[^\\s}]+)\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*");
Matcher lgMatcher = language.matcher(input);
if (lgMatcher.matches()) {
String records = lgMatcher.group("records");
Matcher rdMatcher = record.matcher(records);
while (rdMatcher.find()) {
System.out.printf("%s:%s%n", rdMatcher.group("name"), rdMatcher.group("data"));
}
} else {
System.err.println("Language not recognized");
}
Output:
baz:value 1
foobar:value 2
topaz:null
empty:
Alternatives: since you are parsing a custom language, you could try writing an ANTLR grammar or creating a Groovy DSL.

Your regex shouldn't even compile, because you are not escaping the " inside your regex String, so it is ending your String at the first " inside your regex.
Instead, try this regex:
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
You can check out how it works here on regex101.
Try something like this getRecord() method where key is the record 'name' you're searching for, e.g. foobar, and the input is the string you want to search through.
public static void main(String[] args) {
String input = "ltm data-group internal str_testclass { \n" +
" records { \n" +
" baz { \n" +
" data \"value 1\" \n" +
" } \n" +
" foobar { \n" +
" data \"value 2\" \n" +
" }\n" +
" topaz {}\n" +
" } \n" +
" type string \n" +
"}";
String bazValue = getRecord("baz", input);
String foobarValue = getRecord("foobar", input);
String topazValue = getRecord("topaz", input);
System.out.println("Record data value for 'baz' is '" + bazValue + "'");
System.out.println("Record data value for 'foobar' is '" + foobarValue + "'");
System.out.println("Record data value for 'topaz' is '" + topazValue + "'");
}
private static String getRecord(String key, String input) {
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
//if we find a record with data return it
return matcher.group(1);
} else {
//else see if the key exists with empty {}
final Pattern keyPattern = Pattern.compile(key);
Matcher keyMatcher = keyPattern.matcher(input);
if (keyMatcher.find()) {
//return empty string if key exists with empty {}
return "";
} else {
//else handle error, throw exception, etc.
System.err.println("Record not found for key: " + key);
throw new RuntimeException("Record not found for key: " + key);
}
}
}
Output:
Record data value for 'baz' is 'value 1'
Record data value for 'foobar' is 'value 2'
Record data value for 'topaz' is ''
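The two lookups can also be folded into one pattern by making the data part optional, so that an empty record such as topaz {} still matches. This is a sketch under a few assumptions: RecordLookup is a hypothetical class name, the key is quoted with Pattern.quote in case it contains regex metacharacters, and a missing record returns null instead of throwing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordLookup {
    // Returns the quoted data value, "" for an empty record such as
    // "topaz {}", or null when no record with that name exists.
    static String getRecord(String key, String input) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(key)
                + "\\s*\\{\\s*(?:data\\s*\"([^\"]*)\")?\\s*\\}");
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // Group 1 did not participate in the match when the braces were empty.
        return matcher.group(1) == null ? "" : matcher.group(1);
    }

    public static void main(String[] args) {
        String input = "records {\n baz {\n data \"value 1\"\n }\n topaz {}\n}";
        System.out.println("baz   = '" + getRecord("baz", input) + "'");
        System.out.println("topaz = '" + getRecord("topaz", input) + "'");
    }
}
```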

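You could try the following (the { has to be escaped, otherwise Java rejects the pattern with an "Illegal repetition" error):
(?:foobar\s\{\s*data "(.*)")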

I think replaceAll() isn't necessary here. Would something like this work?
String var1 = "foobar";
String regex = "(?:" + var1 + "\\s\\{\\n\\s*data \"([^\"]*)\")";
You can then pass this regex to Pattern and Matcher to find the substring.
You can simply turn this into a function so that you can pass in the record name you are searching for:
public static String SearchString(String str)
{
// Build the regex for the given record name; group 1 captures the quoted value.
return "(?:" + str + "\\s\\{\\n\\s*data \"([^\"]*)\")";
}

regex capture groups returning as null after an OR operator

Matcher matcher = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)").matcher(myString);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Example data:
myString = width:17px;background:#555;float:left; will produce null.
What I wanted:
matcher.group(1) = 17
matcher.group(2) = 555
I've just started using regex on Java, any help?

I would suggest to split things a bit up.
Instead of building one large regex (maybe you want to add more rules into the String?) you should split up the string in multiple sections:
String myString = "width:17px;background:#555;float:left;";
String[] sections = myString.split(";"); // split string in multiple sections
for (String section : sections) {
// check if this section contains a width definition
if (section.matches("width\\s*:\\s*(\\d+)px.*")) {
System.out.println("width: " + section.split(":")[1].trim());
}
// check if this section contains a background definition
if (section.matches("background\\s*:\\s*#[0-9A-Fa-f]+.*")) {
System.out.println("background: " + section.split(":")[1].trim());
}
...
}

Here is a working example. Having | (or) in the regexp is usually confusing so I've added two more matchers to show how I would do it.
public static void main(String[] args) {
String myString = "width:17px;background:#555;float:left";
int matcherOffset = 1;
Matcher matcher = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)").matcher(myString);
while (matcher.find()) {
System.out.println("found something: " + matcher.group(matcherOffset++));
}
matcher = Pattern.compile("width:(\\d+)px").matcher(myString);
if (matcher.find()) {
System.out.println("found width: " + matcher.group(1));
}
matcher = Pattern.compile("background:#([0-9A-Fa-f]+)").matcher(myString); // hex digits, not only 0-9
if (matcher.find()) {
System.out.println("found background: " + matcher.group(1));
}
}
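The reason group(2) is null in the question is that only one branch of an alternation takes part in each match: the first find() matches the width branch, so group 2 has nothing to capture. A minimal sketch that keeps the original pattern and checks which group is set on each match is shown below; CssValues and extract are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssValues {
    private static final Pattern WIDTH_OR_BACKGROUND = Pattern.compile(
            "\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)");

    // Returns {width, background}; an entry stays null if it was not found.
    static String[] extract(String css) {
        String width = null;
        String background = null;
        Matcher matcher = WIDTH_OR_BACKGROUND.matcher(css);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                width = matcher.group(1);      // the width branch matched
            } else {
                background = matcher.group(2); // the background branch matched
            }
        }
        return new String[] { width, background };
    }

    public static void main(String[] args) {
        String[] values = extract("width:17px;background:#555;float:left;");
        System.out.println("width = " + values[0] + ", background = " + values[1]);
    }
}
```

Unlike the incrementing offset above, this does not depend on width appearing before background in the string.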
