regex for splitting key value-pair containing comma

I need a regex to split key-value pairs. Key and value are separated by =.
Values can contain a comma (,), but if they do, they must be enclosed in double quotes (""). A quoted value can itself contain multiple inner quoted strings ("") with commas in them, so multiple levels of nesting are possible.
A key can be anything except a comma (,), an equals sign (=), or a double quote (").
Example- abc="hi my name is "ayush,nigam"",def="i live at "bangalore",ghi=bangalore is in karnataka,jkl="i am from UP"
Another example - "ayush="piyush="abc,def",bce="asb,dsa"",aman=nigam"
I expect output as ayush="piyush="abc,def",bce="asb,dsa"" and aman=nigam
I am using the following regex code in java.
Pattern abc=Pattern.compile("([^=,]*)=((?:\"[^\"]*\"|[^,\"])*)");
String text2="AssemblyName=(foo.dll),ClassName=\"SomeClassanotherClass=\"a,b\"\"";
Matcher m=abc.matcher(text2);
while(m.find()) {
String kvPair = m.group();
System.out.println(kvPair);
}
I am getting the following kvPair values:
AssemblyName=(foo.dll)
ClassName="SomeClassanotherClass="a
Whereas I need to get:
AssemblyName=(foo.dll)
ClassName="SomeClassanotherClass="a,b"
Hence commas (,) inside inner double quotes ("") are not being parsed properly. Please help.
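One possible approach, offered as a sketch rather than a definitive answer: because the opening and closing quotes are the same character, a single regex cannot reliably track the nesting, so a small hand-written scanner may be easier. The sketch below rests on an assumption that is not stated in the question: a quote opens a new nesting level only when it directly follows =, and every other quote closes the current level. Under that assumption, commas are treated as separators only at nesting depth 0. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NestedPairSplitter {

    // Splits the input on commas that sit outside all quoted sections.
    // ASSUMPTION: a '"' opens a level only when the previous character is '=';
    // any other '"' closes the innermost open level.
    static List<String> splitPairs(String text) {
        List<String> pairs = new ArrayList<>();
        int depth = 0;   // current quote nesting depth
        int start = 0;   // start index of the pair being scanned
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                boolean opens = i > 0 && text.charAt(i - 1) == '=';
                if (opens) {
                    depth++;
                } else if (depth > 0) {
                    depth--;
                }
            } else if (c == ',' && depth == 0) {
                pairs.add(text.substring(start, i));
                start = i + 1;
            }
        }
        pairs.add(text.substring(start)); // last pair has no trailing comma
        return pairs;
    }

    public static void main(String[] args) {
        String text2 = "AssemblyName=(foo.dll),ClassName=\"SomeClassanotherClass=\"a,b\"\"";
        for (String kvPair : splitPairs(text2)) {
            System.out.println(kvPair);
        }
    }
}
```

Inputs where an inner quote follows something other than = (as in the first example of the question, where a quote follows a space) do not satisfy the assumption and would need a different rule for deciding which quotes open a level.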

Related

Filtering string between double or single quotations with varying spaces

I have these two variations of this string
name='Anything can go here'
name="Anything can go here"
where name= can have spaces like so
name=(text)
name =(text)
name = (text)
I need to extract the text between the quotes. I'm not sure what the best approach is: should I cut the string at the quotes by hand (and if so, is there an example that doesn't need much case handling), or should I use a regex?

I'm not sure I understand the question exactly but I'll give it my best shot:
If you just want to assign the string inside the quotation marks to a variable name2, you can easily do:
String name = "'Anything can go here'";
String name2= name.replace("'","");
name2 = name2.replace("\"","");

You want to get Anything can go here whether it's between single quotes or double quotes. A regex can do this regardless of the spaces before or after the "=" by using the following pattern:
"[\"'](.+)[\"']"
Breakdown:
[\"'] - Character class consisting of a double or single quote
(.+) - One or more of any character (line terminators are matched only in DOTALL mode), stored in capture group 1
[\"'] - Character class consisting of a double or single quote
In short, we are trying to capture anything between single or double quotes.
Example:
public static void main(String[] args) {
List<String> data = new ArrayList<>(Arrays.asList(
"name='Anything can go here'",
"name = \"Really! Anything can go here\""
));
for (String d : data) {
Matcher matcher = Pattern.compile("[\"'](.+)[\"']").matcher(d);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
}
Results:
Anything can go here
Really! Anything can go here
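The pattern above accepts any quote character at either end, and its greedy (.+) runs to the last quote on the line. A possible refinement, sketched here under the assumption that the value is always preceded by a word-like name and an = sign, is to capture the opening quote and require the same character to close the value with a backreference. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValue {

    // Group 1: the name; group 2: the opening quote; group 3: the text.
    // \2 forces the closing quote to be the same character as the opening one,
    // and \s* allows any amount of whitespace around '='.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)\\s*=\\s*([\"'])(.*?)\\2");

    // Returns the text between the quotes, or null if the line does not match.
    static String extract(String line) {
        Matcher m = PAIR.matcher(line);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("name='Anything can go here'"));
        System.out.println(extract("name = \"It's fine\""));
    }
}
```

With the backreference, an apostrophe inside a double-quoted value does not end the match early.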

Splitting a string with a certain pattern in Java

I am writing a parser for a file containing the following string pattern:
Key : value
Key : value
Key : value
etc...
I am able to retrieve those lines one by one into a list. What I would like to do is to separate the key from the value for each one of those strings. I know there is the split() method that can take a Regex and do this for me, but I am very unfamiliar with them so I don't know what Regex to give as a parameter to the split() function.
Also, while not in the specifications of the file I am parsing, I would like for that Regex to be able to recognize the following patterns as well (if possible):
Key: value
Key :value
Key:value
etc...
So basically, whether or not there is whitespace before the : character, after it, or both, I would like the regex to handle it. What regex achieves this?

In other words, the split method should look for : with zero or more whitespace characters before and after it.
Key: value
^^
Key :value
^^
Key:value
^
Key : value
^^^
In that case split("\\s*:\\s*") should do the trick.
Explanation:
\\s represents any whitespace character
* means zero or more occurrences of the element before it
so \\s* means zero or more whitespace characters.
Alternatively, you may want to match the entire key:value pair and capture the key and the value in separate groups (you can even name the groups using (?<groupName>regex)). In that case you can use
Pattern p = Pattern.compile("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)");
Matcher m = p.matcher(yourData);
while(m.find()){
System.out.println("key = " + m.group("key"));
System.out.println("value = " + m.group("value"));
System.out.println("--------");
}

If you want to use String.split(), you could use this:
String input = "key : value";
String[] s = input.split("\\s*:\\s*");
String key = s[0];
String value = s[1];
This splits the String at the ":" and treats any whitespace around the ":" as part of the delimiter, so the key and the value come back already trimmed.
Explanation:
\\s* matches zero or more whitespace characters; by default \\s is equivalent to [ \\t\\n\\x0B\\f\\r]
The : between the two \\s* means that a : must be present
Note that this solution throws an ArrayIndexOutOfBoundsException if the input line does not follow the key-value format you defined.
If you are not sure whether the line really contains a key-value pair, for example because the file ends with an empty line, you could do it like this:
String input = "key : value";
Matcher m = Pattern.compile("(\\S+)\\s*:\\s*(.+)").matcher(input);
if (m.matches())
{
String key = m.group(1); // note that the count starts by 1 here
String value = m.group(2);
}
Explanation:
\\S+ matches one or more non-whitespace characters, so the key itself cannot contain whitespace. The () around it form a capturing group, so you can get its value with m.group(1).
\\s* matches zero or more whitespace characters; by default \\s is equivalent to [ \\t\\n\\x0B\\f\\r]
The : between the two \\s* means that a : must be present
The last group, .+, matches the rest of the line, including any whitespace.
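The split-based version can also be made safe against malformed lines. The sketch below is one way to do it, assuming that only the first : separates key from value: passing a limit of 2 to String.split keeps any later colons inside the value, and checking the array length avoids the ArrayIndexOutOfBoundsException. The class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class KeyValueLine {

    // Returns the key/value pair, or null when the line contains no ':'.
    static Map.Entry<String, String> parse(String line) {
        // Limit 2: split only at the first ':' so "time : 12:30" keeps "12:30".
        String[] parts = line.trim().split("\\s*:\\s*", 2);
        if (parts.length < 2) {
            return null; // empty line or no separator
        }
        return new SimpleEntry<>(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(parse("Key :value"));   // Key=value
        System.out.println(parse("time : 12:30")); // time=12:30
        System.out.println(parse(""));             // null
    }
}
```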

You can use the split method with ":" as the delimiter.
This splits the string at each ':'; you can then trim the parts to get the key and the value.
String s = " keys : value ";
String keyValuePairs[] = s.split(":");
String key = keyValuePairs[0].trim();
String value = keyValuePairs[1].trim();
You can also make use of regex to simplify it.
String keyValuePairs[] = s.trim().split("[ ]*:[ ]*");
s.trim() removes the spaces before and after the string (if there are any), so the string becomes "keys : value", and
[ ]*:[ ]*
splits the string using a regular expression that means: spaces (zero or more), then :, then spaces (zero or more) as the delimiter.

For a pure regex solution, you can use the following pattern (note the space at the beginning, inside the quotes):
" ?: ?"
See http://regexr.com/39evh

String[] tokensVal = str.split(":");
String key = tokensVal[0].trim();
String value = tokensVal[1].trim();

Replace multiple characters in a string in Java

I have some strings with equations in the following format ((a+b)/(c+(d*e))).
I also have a text file that contains the names of each variable, e.g.:
a velocity
b distance
c time
etc...
What would be the best way for me to write code so that it plugs in velocity everywhere a occurs, and distance for b, and so on?

Don't use String#replaceAll here if there is any chance that a replacement text contains a substring you will want to replace later. For example, "distance" contains a, so if you later replace a with "velocity" you end up with "disvelocityance".
It is the same problem as trying to replace A with B and B with A. For this kind of text manipulation you can use appendReplacement and appendTail from the Matcher class. Here is an example:
String input = "((a+b)/(c+(d*e)))";
Map<String, String> replacementsMap = new HashMap<>();
replacementsMap.put("a", "velocity");
replacementsMap.put("b", "distance");
replacementsMap.put("c", "time");
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("\\b(a|b|c)\\b");
Matcher m = p.matcher(input);
while (m.find())
m.appendReplacement(sb, replacementsMap.get(m.group()));
m.appendTail(sb);
System.out.println(sb);
Output:
((velocity+distance)/(time+(d*e)))
This code finds each occurrence of a, b or c that is not part of a longer word (no word character directly before or after it, which is enforced by \b, the word-boundary anchor). appendReplacement appends to the StringBuffer the text since the last match (or from the beginning, for the first match), with the match itself replaced by the new word (taken from the Map). appendTail appends to the StringBuffer the text after the last match.
To make this code more dynamic, the regex should be generated automatically from the keys used in the Map. You can use this code to do it:
StringBuilder regexBuilder = new StringBuilder("\\b(");
for (String word:replacementsMap.keySet())
regexBuilder.append(Pattern.quote(word)).append('|');
regexBuilder.deleteCharAt(regexBuilder.length()-1);//lets remove last "|"
regexBuilder.append(")\\b");
String regex = regexBuilder.toString();
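On newer JDKs the same single-pass idea can be written more compactly. The following is a sketch assuming Java 9 or later, where Matcher.replaceAll accepts a function from MatchResult to the replacement string; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class VariableSubstitution {

    // Replaces every whole-word occurrence of a map key with its value
    // in a single pass, so replacement text is never re-scanned.
    static String substitute(String input, Map<String, String> names) {
        if (names.isEmpty()) {
            return input; // nothing to replace; avoids building an empty group
        }
        // Build \b(key1|key2|...)\b with each key quoted as a literal.
        String alternatives = names.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern p = Pattern.compile("\\b(" + alternatives + ")\\b");
        // quoteReplacement keeps '$' and '\' in a value from being
        // interpreted as group references.
        return p.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(names.get(mr.group())));
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("a", "velocity");
        names.put("b", "distance");
        names.put("c", "time");
        System.out.println(substitute("((a+b)/(c+(d*e)))", names));
        // prints ((velocity+distance)/(time+(d*e)))
    }
}
```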

I'd make a HashMap mapping the variable names to the descriptions, then iterate through all the characters in the string and replace each occurrence of a recognised key with its mapping.
I would use a StringBuilder to build up the new string.

Using a hashmap and iterating over the string as A Boschman suggested is one good solution.
Another solution would be to do what others have suggested and do a .replaceAll(); however, you would want to use a regular expression to specify that only the words matching the whole variable name and not a substring are replaced. A regex using word boundary '\b' matching will provide this solution.
String variable = "a";
String newVariable = "velocity";
str = str.replaceAll("\\b" + variable + "\\b", newVariable);
See http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html

For string str, use the replaceAll() function:
str = str.toUpperCase(); //Uppercase the variable names so that letters inside the lowercase replacement words are not substituted later
str = str.replaceAll("A", "velocity");
str = str.replaceAll("B", "distance");
//etc.

Stuck in regular expression

I have 3 strings that contain 2 fields and 2 values per string. I need a regular expression for the strings so I can get the data. Here are the 3 strings:
TTextRecordByLanguage{Text=Enter here the amount to transfer from your compulsory book saving account to your compulsory checking account; Id=55; }
TTextRecordByLanguage{Text=Hello World, CaribPayActivity!; Id=2; }
TTextRecordByLanguage{Text=(iphone); Id=4; }
The 2 fields are Text and Id, so I need an expression that gets the data between the Text field and the semi-colon (;). Make sure special symbols and any data are included.
Update:
What I have tried:
Pattern pinPattern = Pattern.compile("Text=([a-zA-Z0-9 \\E]*);");
ArrayList<String> pins = new ArrayList<String>();
Matcher m = pinPattern.matcher(soapObject.toString());
while (m.find()) {
pins.add(m.group(1));
s[i] = m.group(1);
}
Log.i("TAG", "ARRAY=>"+ s[i]);

I suggest a RE like this:
Text=.*?;
e.g. the match returned for the last string would be
Text=(iphone);
Then you can strip Text= and ; from the match, since you want only the content.
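Capture groups can return the content directly, without stripping Text= and ; afterwards. The following is a sketch that assumes every record has the shape shown in the question, with Text followed by "; Id=" and a numeric id; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextRecordParser {

    // Group 1: everything after "Text=" up to the "; Id=" part (lazy match).
    // Group 2: the numeric id.
    private static final Pattern RECORD =
            Pattern.compile("Text=(.*?); Id=(\\d+);");

    // Returns {text, id}, or null if the record does not match.
    static String[] parse(String record) {
        Matcher m = RECORD.matcher(record);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] fields = parse(
                "TTextRecordByLanguage{Text=Hello World, CaribPayActivity!; Id=2; }");
        System.out.println(fields[0]); // Hello World, CaribPayActivity!
        System.out.println(fields[1]); // 2
    }
}
```

Anchoring the end of the text on "; Id=" rather than on the first ; also tolerates a semicolon inside the text itself.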

extracting specific but unknown values from a string in Java

I am trying to extract values from a MySQL insert command in Java. The insert command is just a string as far as Java is concerned. It will be of the format
INSERT INTO employees VALUES ("John Doe", "45", "engineer");
I need to pull the '45' out of that statement. I can't pinpoint its index because names and job titles will be different. I only need the age. Other than overly complex string manipulation, which I could probably figure out in time, is there a more straightforward way of isolating those characters? I just can't seem to wrap my mind around how to do it, and I am not very familiar with regular expressions.

If this is the specific format of your message, then a regex like this should help (the literal parentheses of the statement must be escaped):
INSERT INTO employees VALUES \(".*?", "(.*?)", ".*?"\);
Then read the first group of the result and you should get the age.
In regular expressions (X) defines a matching group that captures X (where X can be any regular expression). This means that if the entire regular expression matches, then you can easily find out the value within this matching group (using Matcher.group() in Java).
You can also have multiple matching groups in a single regex like this:
INSERT INTO employees VALUES \("(.*?)", "(.*?)", "(.*?)"\);
So your code could look like this:
String sql = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\");";
final Pattern pattern = Pattern.compile("INSERT INTO employees VALUES \\(\"(.*?)\", \"(.*?)\", \"(.*?)\"\\);");
final Matcher matcher = pattern.matcher(sql);
if (matcher.matches()) {
String name = matcher.group(1);
String age = matcher.group(2);
String job = matcher.group(3);
// do stuff ...
}

Assuming that the name doesn't contain any ", you can use the regex .*?".*?".*?"(\d+)".* and group(1) gives you the age.

As far as I understand, your insert command inserts into 3 columns only. You can split the string on the comma character (,), take the second element of the array, trim the surrounding whitespace, and then drop its first and last characters (the quotes). That should give you the age. A sketch:
String insertQuery="INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\")";
String[] splitQuery=insertQuery.split(",");
String age=splitQuery[1];
age=age.trim();
age=age.substring(1, age.length()-1);

If you are sure that there is only one instance of a number in the string, the regular expression you need is very simple:
//Assuming that str contains your insert statement
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(str);
if(m.find()) System.out.println(m.group());

How about String.split by ","?
final String insert = "INSERT INTO employees VALUES (\"John Doe\", \"45\", \"engineer\"); ";
System.out.println(insert.split(",")[1].trim());
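Splitting on "," breaks if a name or job title contains a comma, and it leaves the quotes on the result. A sketch of another option, assuming that the values never contain an escaped double quote, is to collect every double-quoted value in order and pick the one you need by position. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsertValues {

    // Matches one double-quoted value; group 1 is the text without the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns all quoted values in the order they appear in the statement.
    static List<String> quotedValues(String sql) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO employees VALUES (\"Doe, John\", \"45\", \"engineer\");";
        List<String> values = quotedValues(sql);
        System.out.println(values.get(1)); // 45
    }
}
```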
