Replace single quote with double quote with Regex - java

I have an app that receives a malformed JSON string like this:
{'username' : 'xirby'}
I need to replace the single quotes ' with double quotes "
With these rules (I think):
A single quote comes after a { with one or more spaces
Comes before one or more spaces and :
Comes after a : with one or more spaces
Comes before one or more spaces and }
So this String {'username' : 'xirby'} or
{ 'username' : 'xirby' }
Would be transformed to:
{"username" : "xirby"}
Update:
Also a possible malformed JSON String:
{ 'message' : 'there's not much to say' }
In this example the single quote inside the message value should not be replaced.

Try this regex:
\s*\'\s*
and replacing each match with " will do the job.

Instead of doing this, you're better off using a JSON parser which can read such malformed JSON and "normalize" it for you. Jackson can do that:
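// Feature here is com.fasterxml.jackson.core.JsonParser.Feature
final ObjectReader reader = new ObjectMapper()
    .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true)
    .reader();
final JsonNode node = reader.readTree(yourMalformedJson);
// node.toString() does the right thing: it emits standard double-quoted JSON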

This regex will capture all appropriate single quotes and associated white spaces while ignoring single quotes inside a message. One can replace the captured characters with double quotes, while preserving the JSON format. It also generalizes to JSON strings with multiple messages (delimited by commas ,).
((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))
I know you tagged your question for java, but I'm more familiar with python. Here's an example of how you can replace the single quotes with double quotes in python:
import re
regex = re.compile('((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))')
s = "{ 'first_name' : 'Shaquille' , 'lastname' : 'O'Neal' }"
regex.sub('"', s)
> '{"first_name":"Shaquille","lastname":"O\'Neal"}'
This method looks for single quotes next to the symbols {},: using look-ahead and look-behind operations.

String test = "{'username' : 'xirby'}";
String replaced = test.replaceAll("'", "\"");

Since your question is tagged Java, I answered in Java.
First, import the classes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Then:
Pattern p = Pattern.compile("((?<=(\\{|\\[|\\,|:))\\s*')|('\\s*(?=(\\}|(\\])|(\\,|:))))");
String s = "{ 'firstName' : 'Malus' , 'lastName' : ' Ms'Malus' , marks:[ ' A+ ', 'B+']}";
String replace = "\"";
String o;
Matcher m = p.matcher(s);
o = m.replaceAll(replace);
System.out.println(o);
Output:
{"firstName":"Malus","lastName":" Ms'Malus", marks:[" A+ ","B+"]}

If you're looking to exactly satisfy all of those conditions, try this:
'{(\s)?\'(.*)\'(\s)?:(\s)?\'(.*)\'(\s)?}'
as your regex. It uses (\s)? to match zero or one whitespace character.
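A minimal Java sketch of how that strict pattern could be applied, assuming the input holds exactly one key/value pair. The class name SinglePairQuoteFixer and the method normalize are made up for illustration, and the unused (\s)? groups are simplified to \s? so that group 1 is the key and group 2 is the value:

```java
public class SinglePairQuoteFixer {
    // The strict pattern from the answer above, in Java string syntax.
    // Assumption: exactly one 'key' : 'value' pair between the braces.
    private static final String PAIR = "\\{\\s?'(.*)'\\s?:\\s?'(.*)'\\s?\\}";

    static String normalize(String malformed) {
        // Rebuild the object with double quotes; an apostrophe inside the
        // captured value is copied through unchanged.
        return malformed.replaceAll(PAIR, "{\"$1\" : \"$2\"}");
    }

    public static void main(String[] args) {
        System.out.println(normalize("{'username' : 'xirby'}"));
        System.out.println(normalize("{ 'message' : 'there's not much to say' }"));
    }
}
```

Because .* is greedy and the pattern is anchored on the braces and the colon, the apostrophe in there's stays inside the value. The sketch does not handle several pairs, nested objects, or values that themselves contain a double quote.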

I recommend using a JSON parser instead of a regex.
String strJson = "{ 'username' : 'xirby' }";
strJson = new JSONObject(strJson).toString();
System.out.println(strJson);

Related

java JSON string formatting with regular expression

For a given plain JSON data do the following formatting:
replace all the special characters in key with underscore
remove the key double quote
replace the : with =
Example:
JSON Data: {"no/me": "139.82", "gc.pp": "\u0000\u000", ...}
After formatting: no_me="139.82", gc_pp="\u0000\u000"
Is it possible with a regular expression? or any other single command execution?
A single regex for the whole changes may be overkill. I think you could code something similar to this:
(NOTE: Since I do not code in Java, my example is in JavaScript, just to give you the idea.)
var json_data = '{"no/me": "139.82", "gc.pp": "0000000", "foo":"bar"}';
console.log(json_data);
var data = JSON.parse(json_data);
var out = '';
for (var x in data) {
    var clean_x = x.replace(/[^a-zA-Z0-9]/g, "_");
    if (out != '') out += ', ';
    out += clean_x + '="' + data[x] + '"';
}
console.log(out);
Basically, you loop through the keys and clean them (replace the unwanted characters), then build a new string in the format you like from the new key and the original value.
Important: Bear in mind overlapping ids. For example, both no/me and no#me will collapse into the same id, no_me. This may not be important, since you are not outputting JSON after all. I mention it just in case.
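For readers who want the same loop in Java, here is a minimal sketch that also guards against the overlapping ids mentioned above. It assumes the JSON has already been parsed into a Map by whatever parser you use; the class name KeyFormatter and the method format are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class KeyFormatter {
    // Formats already-parsed key/value pairs as key="value", key="value", ...
    static String format(Map<String, String> data) {
        Map<String, String> seen = new LinkedHashMap<>(); // cleaned key -> original key
        StringJoiner out = new StringJoiner(", ");
        for (Map.Entry<String, String> entry : data.entrySet()) {
            // Same cleaning rule as the JavaScript example
            String clean = entry.getKey().replaceAll("[^a-zA-Z0-9]", "_");
            String previous = seen.put(clean, entry.getKey());
            if (previous != null) {
                // e.g. no/me and no#me both become no_me
                throw new IllegalArgumentException(
                        "Keys '" + previous + "' and '" + entry.getKey() + "' both map to " + clean);
            }
            out.add(clean + "=\"" + entry.getValue() + "\"");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("no/me", "139.82");
        data.put("gc.pp", "0000000");
        data.put("foo", "bar");
        System.out.println(format(data)); // no_me="139.82", gc_pp="0000000", foo="bar"
    }
}
```

Throwing on a collision is only one possible choice; silently keeping the last value, as the JavaScript loop effectively does, may be acceptable if the output is not consumed as a map.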
I haven't done Java in a long time, but I think you need something like this.
I'm assuming you mean 'all Non-Word characters' by specialchars here.
import java.util.regex.*;
String jsonData = "{\"no/me\": \"139.82\", \"gc.pp\": \"\\u0000\\u000\", ...}";
// remove the surrounding { and }
jsonData = jsonData.substring(1, jsonData.length() - 1);
// find the keys: group(1) is the key without its quotes,
// and the whole match also covers the quotes, the colon and trailing spaces
Pattern regex = Pattern.compile("\"([^\"]+)\"\\s*:\\s*");
Matcher regexMatcher = regex.matcher(jsonData);
StringBuffer result = new StringBuffer();
while (regexMatcher.find()) {
    String key = regexMatcher.group(1).replaceAll("\\W", "_") + "="; // no_me=
    // quoteReplacement keeps any $ or \ in the key from being treated specially
    regexMatcher.appendReplacement(result, Matcher.quoteReplacement(key));
}
regexMatcher.appendTail(result);
System.out.println(result);

Java regex pattern match for []

I have a string output which I need to match and I am using a regex
String schemaName = "Amazon";
String test = "{\"data\": [], \"name\": \"Amazon\", \"title\": \"StoreDataConfig\"}";
String output= method("\\[\\]",schemaName);
Matcher n = Pattern.compile(output).matcher(test);
boolean available = n.find();
System.out.println(available);
I want to validate this by passing the regex to a method, as shown:
private static String method(String data, String schemaName) throws IOException {
System.out.println(data);
return ("{\"data\": " + data + ", \"name\": " + "\"" + schemaName + "\"" + ", \"title\": \"StoreDataConfig\"}");
}
But I am always getting java.util.regex.PatternSyntaxException: Illegal repetition.
Can you let me know what is the mistake?
If I don't use the method and just pass [] directly, I don't get an error.
It looks like you are doing this:
Take a valid regex for matching [].
Embed the regex in some JSON
Attempt to compile the JSON-with-an-embedded-regex as if the whole lot was a valid regex.
That fails ... because the JSON-with-an-embedded-regex is not a valid regex.
For a start, the { character is a regex meta character.
But the real puzzle is .... what are you actually trying to do here?
If you simply want a regex that matches a literal string then this will do it.
Pattern p = Pattern.compile(Pattern.quote(someLiteralString));
And you could build a regex out of sub-regexes and literal strings by using Pattern.quote to escape the literal parts and then concatenating.
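As an illustration of that concatenation approach, here is a minimal sketch applied to the strings from the question. The class name QuotedJsonRegex and the method build are made up for this example; only the dataRegex argument is interpreted as a regex, and everything else is escaped with Pattern.quote:

```java
import java.util.regex.Pattern;

public class QuotedJsonRegex {
    // Builds a pattern in which only dataRegex is treated as regex syntax;
    // the surrounding JSON text is quoted so that {, }, [ and ] are literals.
    static Pattern build(String dataRegex, String schemaName) {
        String regex = Pattern.quote("{\"data\": ")
                + dataRegex
                + Pattern.quote(", \"name\": \"" + schemaName + "\", \"title\": \"StoreDataConfig\"}");
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        String test = "{\"data\": [], \"name\": \"Amazon\", \"title\": \"StoreDataConfig\"}";
        System.out.println(build("\\[\\]", "Amazon").matcher(test).find()); // true
    }
}
```

This still depends on the exact spacing and key order of the JSON text, which is one more reason to prefer a JSON parser for anything beyond a quick check.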
If what you are ultimately trying to do here is to extract information from a JSON string using pattern matching / regexes, then ... don't. The better approach is to use a proper JSON parser, and extract the information you need from the JSON object tree.
It's because you need to escape {} characters like this "\\{"

CSV with tab as quote character

I have tried several CSV parsers for Java, but none of them handled the following line properly:
String str = "\tvalue1\t,,\tv1,",',v3\t,value2"
The format is comma-separated, with TAB as the quote character. Some fields are empty, and some are not quoted.
Can anyone suggest a parser that handles this format well?
For example I would expect that the above string will be parsed as:
value1
null
v1,",',v3
value2
But it's producing the following:
value1
null
v1
"
'
v3
value2
Java Example:
import java.lang.String;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
public class StamMain {
public static void main(String[] args){
String str = "\tvalue1\t,,\tv1,',",v3\t,value2";
System.out.println(str);
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setQuote('\t');
CsvParser parser = new CsvParser(settings);
String[] fields = parser.parseLine(str);
for (String f : fields)
System.out.println(f);
}
}
The best results are achieved if TAB is replaced by a double quote, but then quoting the existing quotes is an interesting task by itself.
Any ideas appreciated.
Apache Commons CSV can handle it just fine.
String str = "\tvalue1\t,,\tv1,\",',v3\t,value2";
CSVFormat csvFormat = CSVFormat.DEFAULT.withQuote('\t');
for (CSVRecord record : CSVParser.parse(str, csvFormat))
    for (String value : record)
        System.out.println(value);
Output
value1
v1,",',v3
value2
You can even add .withNullString("") to get that null value, if you want.
value1
null
v1,",',v3
value2
Very flexible CSV parser.
Just add this line before parsing to get the result you expect:
settings.trimValues(false);
This is required because by default the parser removes white spaces around delimiters, but your "quote" character happens to be a white space. Regardless, this is something the parser should handle. I opened this bug report to have it fixed in the next version of uniVocity-parsers.
Works with Super CSV
ICsvListReader reader = new CsvListReader(
    new FileReader("weird.csv"),
    // arguments: quote character, delimiter, end-of-line symbols
    new CsvPreference.Builder('\t', ',', "\r\n").build()
);
List<String> record = reader.read();
for(String value : record)
System.out.println(value);
Output:
value1
null
v1,",',v3
value2
One option is to:
1) Replace all the double quotes in your string with some "good" replacement string that you know won't be in the actual data (e.g. "REPLACE_QUOTES_TEMP")
2) Replace all tabs with double quotes.
3) Run the parser as normal.
4) Replace back the "REPLACE_QUOTES_TEMP" strings (or whatever you chose), in the individual fields, with the actual double quote.
The String "\tvalue1\t,,\tv1,",',v3\t,value2" is not valid Java. To include " as a character, you need to write \".
For parsing, this code should work:
String st = "\tvalue1\t,,\tv1,\",',v3\t,value2";
String[] arr = st.split("\t");
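Note that splitting on the TAB alone leaves the separating commas attached to the pieces. As a point of comparison, the format can also be handled without a library. The following is a minimal sketch, not taken from any of the parsers mentioned in this thread: a hypothetical TabQuotedSplitter class that toggles a quoted state on each TAB and splits on commas outside quotes. It assumes a TAB never appears inside a field, and it reports every empty field as null:

```java
import java.util.ArrayList;
import java.util.List;

public class TabQuotedSplitter {
    // Splits one line on commas, treating TAB as the quote character.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                inQuotes = !inQuotes;          // opening or closing quote, not part of the value
            } else if (c == ',' && !inQuotes) {
                fields.add(current.length() == 0 ? null : current.toString());
                current.setLength(0);          // start the next field
            } else {
                current.append(c);             // ordinary character, or a comma inside quotes
            }
        }
        fields.add(current.length() == 0 ? null : current.toString());
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("\tvalue1\t,,\tv1,\",',v3\t,value2")) {
            System.out.println(field);
        }
    }
}
```

For real data, one of the CSV libraries above is still the safer choice, since they cover escaped quote characters and multi-line records that this sketch ignores.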

Split string with alternative comma (,)

I know how to tokenize a String, but the problem is that I want to tokenize it as shown below.
String st = "'test1, test2','test3, test4'";
What I've tried is as below:
st.split(",");
This is giving me output as:
'test1
test2'
'test3
test4'
But I want output as:
'test1, test2'
'test3, test4'
How do I do this?
Since single quotes are not mandatory, split will not work, because Java's regex engine does not allow variable-length lookbehind expressions. Here is a simple solution that uses regex to match the content, not the delimiters:
String st = "'test1, test2','test3, test4',test5,'test6, test7',test8";
Pattern p = Pattern.compile("('[^']*'|[^,]*)(?:,?)");
Matcher m = p.matcher(st);
while (m.find()) {
    System.out.println(m.group(1));
}
Demo on ideone.
You can add syntax for escaping single quotes by altering the "content" portion of the quoted substring (currently, it's [^']*, meaning "anything except a single quote, repeated zero or more times").
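One way that change could look is sketched below, assuming a backslash is chosen as the escape character (the answer does not prescribe one). The quoted alternative becomes '(?:[^'\\]|\\.)*', that is, any character other than a quote or backslash, or a backslash followed by any character. The class name EscapedQuoteTokenizer is hypothetical, and the unquoted alternative uses [^,]+ so that empty matches are not reported, which also means empty unquoted fields are skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteTokenizer {
    // Quoted token: ' ... ' where \' (or any backslash pair) may appear inside.
    // Unquoted token: one or more characters that are not a comma.
    private static final Pattern TOKEN =
            Pattern.compile("'(?:[^'\\\\]|\\\\.)*'|[^,]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group()); // quotes and escape characters are kept as-is
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal below contains the text: 'it\'s, ok','test3, test4',test5
        for (String token : tokenize("'it\\'s, ok','test3, test4',test5")) {
            System.out.println(token);
        }
    }
}
```

The tokens are returned with their surrounding quotes and backslashes intact; stripping them is left as a separate step.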
The easiest and most reliable solution would be to use a CSV parser. Maybe Commons CSV would help.
It handles escaping based on CSV rules, so even '' could be used within a value without breaking it.
Sample code (this one uses opencsv's CSVReader) would look like:
ByteArrayInputStream baos = new ByteArrayInputStream("'test1, test2','test3, test4'".getBytes());
CSVReader reader = new CSVReader(new InputStreamReader(baos), ',', '\'');
String[] read = reader.readNext();
System.out.println("0: " + read[0]);
System.out.println("1: " + read[1]);
reader.close();
This would print:
0: test1, test2
1: test3, test4
If you use maven you can just import the dependency:
<dependency>
<groupId>net.sf.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>2.0</version>
</dependency>
And start using it.

Regexp to get the content of a string between two quotes, starting with a given name

I have multiple lines in an xml file.
My lines are like <Blog blogDescription="bla bla bla" description="" date="2010-10-10"/>
I'm working on all lines starting with "<Blog", where I want to:
Set the content of blogDescription field into description field
Remove blogDescription field
So my line would be like :
<Blog description="bla bla bla" date="2010-10-10"/>
I don't know what kind of regexp I can use. So far I only match the line with:
"^<(Blog) .*"
And I remove blogDescription field with :
" blogDescription="
But I don't know how to put the blogDescription value into description value.
If you're already working with XML that is correctly formatted, rather than building a parser yourself via regex, why not just use one of the XML parsers available to you? There are many available to do this.
See this related question:
Parsing XML in Java
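As a rough idea of what the parser route might look like, here is a minimal sketch using the DOM API that ships with the JDK. It handles a single Blog element given as a string; the class name BlogAttributeMover and the method moveDescription are made up for illustration, and reading the real file and writing it back are left out:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BlogAttributeMover {
    // Copies blogDescription into description and removes blogDescription.
    static Element moveDescription(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element blog = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            if (blog.hasAttribute("blogDescription")) {
                blog.setAttribute("description", blog.getAttribute("blogDescription"));
                blog.removeAttribute("blogDescription");
            }
            return blog;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element blog = moveDescription(
                "<Blog blogDescription=\"bla bla bla\" description=\"\" date=\"2010-10-10\"/>");
        System.out.println(blog.getAttribute("description"));   // bla bla bla
        System.out.println(blog.hasAttribute("blogDescription")); // false
    }
}
```

Unlike the regex approaches, this does not depend on the order of the attributes or on the whitespace between them.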
String val = "<Blog blogDescription=\"bla bla bla\" description=\"\" date=\"2010-10-10\"/>";
// group(1) captures the whole blogDescription attribute plus its trailing whitespace
String regex = "^<Blog (blogDescription=\"[^\"]*\"\\s+).*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(val);
if (matcher.matches()) {
    System.out.println(matcher.group(1));
    // remove the captured attribute from the line
    String resultString = val.replace(matcher.group(1), "");
    System.out.println(resultString);
}
You can use it like this:
String str = "<Blog blogDescription=\"bla bla bla\" description=\"\" date=\"2010-10-10\"/>";
System.out.println(str.replaceAll("blogDescription=\"([^\"]+)\"\\s+description=\"[^\"]*\"",
"description=\"$1\""));
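I don't know if there is any newline in the string. Since both [^"] and \s also match line breaks, an input such as
blogDescription="bla \nbla"\n description=
is still matched. The pattern does assume that blogDescription is non-empty and is immediately followed by description, separated only by whitespace.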
