I have tried several CSV parsers for Java, but none of them handled the following line properly:
String str = "\tvalue1\t,,\tv1,",',v3\t,value2"
The format is comma-separated, with TAB used as the quote (escape) character. Some fields are empty, and some are not quoted.
Can anyone suggest a parser that handles this format well?
For example, I would expect the above string to be parsed as:
value1
null
v1,",',v3
value2
But it's producing the following:
value1
null
v1
"
'
v3
value2
Java example:
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class StamMain {
    public static void main(String[] args) {
        String str = "\tvalue1\t,,\tv1,\",',v3\t,value2";
        System.out.println(str);
        CsvParserSettings settings = new CsvParserSettings();
        // Use TAB as the quote character
        settings.getFormat().setQuote('\t');
        CsvParser parser = new CsvParser(settings);
        String[] fields = parser.parseLine(str);
        for (String f : fields)
            System.out.println(f);
    }
}
The best results are achieved if TAB is replaced by a double quote, but then quoting the existing double quotes is an interesting task by itself.
Any ideas appreciated.
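To make the expected parsing rules concrete, here is a minimal hand-rolled splitter using only the JDK. TabQuotedSplitter is a hypothetical helper, not part of any library, and it assumes TAB is the only quote character, that a doubled TAB inside a quoted field stands for a literal TAB, and that an unquoted empty field should come back as null:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from any library): splits one line where
// TAB is the quote character and ',' is the delimiter.
final class TabQuotedSplitter {
    static final char QUOTE = '\t';
    static final char DELIM = ',';

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == QUOTE) {
                    // Assumption: a doubled TAB inside quotes is a literal TAB
                    if (i + 1 < line.length() && line.charAt(i + 1) == QUOTE) {
                        cur.append(QUOTE);
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // commas and " are literal inside quotes
                }
            } else if (c == QUOTE) {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == DELIM) {
                fields.add(toField(cur, wasQuoted));
                cur.setLength(0);
                wasQuoted = false;
            } else {
                cur.append(c);
            }
        }
        fields.add(toField(cur, wasQuoted));
        return fields;
    }

    // An unquoted empty field becomes null, matching the expected output
    private static String toField(StringBuilder sb, boolean wasQuoted) {
        return (sb.length() == 0 && !wasQuoted) ? null : sb.toString();
    }
}
```

On the question's input this yields value1, null, v1,",',v3 and value2. A library parser is still preferable for real files (line breaks inside fields, encodings, and so on).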
Apache Commons CSV can handle it just fine.
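import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

String str = "\tvalue1\t,,\tv1,\",',v3\t,value2";
// TAB is the quote character; ',' stays the delimiter
CSVFormat csvFormat = CSVFormat.DEFAULT.withQuote('\t');
for (CSVRecord record : CSVParser.parse(str, csvFormat))
    for (String value : record)
        System.out.println(value);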
Output (the empty second field prints as a blank line):
value1
v1,",',v3
value2
You can even add .withNullString("") to get a null for that empty field, if you want:
value1
null
v1,",',v3
value2
Very flexible CSV parser.
Just add this line before parsing to get the result you expect:
settings.trimValues(false);
This is required because, by default, the parser removes whitespace around delimiters, and your "quote" character happens to be whitespace. Regardless, this is something the parser should handle, so I opened a bug report to have it fixed in the next version of uniVocity-parsers.
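It works with Super CSV:
import java.io.FileReader;
import java.util.List;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;
ICsvListReader reader = new CsvListReader(
    new FileReader("weird.csv"),
    new CsvPreference.Builder('\t', ',', "\r\n").build() // quote TAB, delimiter ','
);
List<String> record = reader.read();
for (String value : record)
    System.out.println(value);
reader.close();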
Output:
value1
null
v1,",',v3
value2
One option is to:
1) Replace all the double quotes in your string with some "good" replacement string that you know won't be in the actual data (e.g. "REPLACE_QUOTES_TEMP")
2) Replace all tabs with double quotes.
3) Run the parser as normal.
4) Replace back the "REPLACE_QUOTES_TEMP" strings (or whatever you chose), in the individual fields, with the actual double quote.
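Steps 1 to 4 can be sketched in plain Java as below. The placeholder value and the class name are illustrative, and step 3 uses a simple regex split (commas followed by an even number of double quotes) as a stand-in for a real CSV parser; that stand-in does not handle doubled "" escapes:

```java
import java.util.ArrayList;
import java.util.List;

final class PlaceholderDemo {
    // Illustrative placeholder; it must never occur in the real data
    static final String PLACEHOLDER = "REPLACE_QUOTES_TEMP";

    static List<String> parse(String line) {
        if (line.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("placeholder already present in input");
        }
        // 1) Hide the real double quotes behind the placeholder
        String s = line.replace("\"", PLACEHOLDER);
        // 2) Turn the TAB quote characters into ordinary double quotes
        s = s.replace("\t", "\"");
        // 3) Stand-in for a real CSV parser: split on commas that are followed by
        //    an even number of double quotes (i.e. commas outside quoted fields)
        List<String> fields = new ArrayList<>();
        for (String raw : s.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)) {
            String f = raw;
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1); // drop the surrounding quotes
            }
            // 4) Put the original double quotes back in each field
            fields.add(f.replace(PLACEHOLDER, "\""));
        }
        return fields;
    }
}
```

With a real parser in step 3, only the split-and-strip part changes; the replace-before and replace-after steps stay the same.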
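The String "\tvalue1\t,,\tv1,",',v3\t,value2" is not a valid Java string literal: to include " as a character, you need to write \".
For parsing, you can start by splitting on the tab:
String st = "\tvalue1\t,,\tv1,\",',v3\t,value2";
String[] arr = st.split("\t");
// arr is {"", "value1", ",,", "v1,\",',v3", ",value2"}, so the stray commas between fields still need cleaning up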
Related
I'm using Apache Commons CSV's CSVParser. I want to store a String array in a CSV record, for instance:
"\"[\"54bb051e-3d12-11e5-91cd-b8f6b11b7feb\",\"472a9748-3d12-11e5-91cd-b8f6b11b7feb\"]\",Hallo,114058,Leon,31,\" \",8400,bar,FOO";
CSVParser csvParser = CSVFormat.DEFAULT
.withDelimiter(CSV_SEPARATOR).withQuote(null)
.withFirstRecordAsHeader()
.parse(new StringReader(line));
How do I escape the commas inside the String[] array? After the record is read in, the Strings get split into a Java array.
I tried this:
@Test
public void processLine() throws Exception {
String line = "Ids,Info.name,Info.number,address.street,address.number,address.bus,address.postalcode,address.city," +
"address.country\n" +
"\"[\"\"54bb051e-3d12-11e5-91cd-b8f6b11b7feb\"\",\"\"472a9748-3d12-11e5-91cd-b8f6b11b7feb\"\"]\",Hallo,114058,Leon,31,\" \",8400,foo,BAR";
CSVParser csvParser = CSVFormat.DEFAULT
.withDelimiter(CSV_SEPARATOR).withQuote(null)
.withFirstRecordAsHeader()
.parse(new StringReader(line));
The commas in the String[] are still seen as delimiters.
You need to escape the CSV content correctly. Try this:
"\"[\"\"54bb051e-3d12-11e5-91cd-b8f6b11b7feb\"\",\"\"472a9748-3d12-11e5-91cd-b8f6b11b7feb\"\"]\",Hallo,114058,Leon,31,\" \",8400,bar,FOO"
The escaping gets confusing because you are mixing Java escaping and CSV escaping. In Java you need \" to escape a double quote, while in CSV a double quote inside a quoted field is escaped by doubling it (""). So to get "" in the CSV text, you need \"\" in the Java string. The final string looks like: "[""54bb051e-3d12-11e5-91cd-b8f6b11b7feb"",""472a9748-3d12-11e5-91cd-b8f6b11b7feb""]",Hallo,114058,Leon,31," ",8400,bar,FOO. The first value of the CSV is then: ["54bb051e-3d12-11e5-91cd-b8f6b11b7feb","472a9748-3d12-11e5-91cd-b8f6b11b7feb"]
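To see the two escaping layers separately, here is a small JDK-only sketch that prints the CSV text behind the Java literal and then decodes the quoted field the way a CSV reader would. CsvUnquote is a hypothetical helper, not part of Commons CSV, and it assumes the field is fully wrapped in double quotes:

```java
final class CsvUnquote {
    // Hypothetical helper: decodes one CSV field wrapped in double quotes,
    // turning each doubled "" back into a single "
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            String inner = field.substring(1, field.length() - 1);
            return inner.replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }

    public static void main(String[] args) {
        // Java layer: every " in the CSV text is written as \" in the literal
        String csvField = "\"[\"\"54bb051e-3d12-11e5-91cd-b8f6b11b7feb\"\",\"\"472a9748-3d12-11e5-91cd-b8f6b11b7feb\"\"]\"";
        System.out.println(csvField);          // the raw CSV text, with "" inside
        System.out.println(unquote(csvField)); // the value after CSV decoding
    }
}
```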
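Additionally, your string doesn't contain a header, so be careful with withFirstRecordAsHeader(). Also note that withQuote(null) disables quote handling altogether, so the parser needs a quote character (as with withQuote('"') below) for the doubled quotes to be understood.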
This:
CSVParser csvParser = CSVFormat.DEFAULT.withDelimiter(',').withQuote('"').parse(new StringReader(
"\"[\"\"54bb051e-3d12-11e5-91cd-b8f6b11b7feb\"\",\"\"472a9748-3d12-11e5-91cd-b8f6b11b7feb\"\"]\",Hallo,114058,Leon,31,\" \",8400,bar,FOO"));
System.out.println(csvParser.getRecords().get(0).get(0));
Will output the following string:
["54bb051e-3d12-11e5-91cd-b8f6b11b7feb","472a9748-3d12-11e5-91cd-b8f6b11b7feb"]
And this string can then be parsed into a String[].
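If you don't want a JSON library for that last step, a minimal JDK-only sketch could look like the following. BracketArray.toArray is a hypothetical helper, and it assumes the elements never contain commas or escaped quotes, which holds for UUIDs like these:

```java
final class BracketArray {
    // Hypothetical helper: turns ["a","b"] into {"a", "b"}.
    // Assumes no element contains a comma or an escaped quote.
    static String[] toArray(String s) {
        String inner = s.trim();
        if (inner.startsWith("[") && inner.endsWith("]")) {
            inner = inner.substring(1, inner.length() - 1); // drop the brackets
        }
        if (inner.isEmpty()) {
            return new String[0];
        }
        String[] parts = inner.split(",");
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.length() >= 2 && p.startsWith("\"") && p.endsWith("\"")) {
                p = p.substring(1, p.length() - 1); // drop the element quotes
            }
            parts[i] = p;
        }
        return parts;
    }
}
```

For anything less regular than a list of UUIDs, a real JSON parser is the safer choice.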
You should not hand-build your own CSV line for testing; you already have a library that can create it properly. You had the idea of using Apache Commons CSV to read the CSV, but not to write it.
A CSVPrinter will "escape" the delimiter when needed (by escape, I mean it wraps the value in double quotes, as the format allows):
// Get a printer that writes to System.out
CSVPrinter printer = CSVFormat.DEFAULT.withHeader("A", "B").printer();
// Create the pojos
List<POJO> pojos = new ArrayList<>();
pojos.add(new POJO("foo", "bar"));
pojos.add(new POJO("far", "boo"));
pojos.add(new POJO("for", "bao"));
pojos.add(new POJO("test,", "comma"));
for(POJO p : pojos) {
printer.printRecord(p.a, p.b);
}
Output:
A,B
foo,bar
far,boo
for,bao
"test,",comma
This uses the following POJO class:
public class POJO{
String a;
String b;
public POJO(String a, String b) {
this.a = a;
this.b = b;
}
@Override
public String toString() {
return "POJO [a=" + a + " ## b=" + b + "]";
}
}
Note: this is probably not the ideal usage of the library (I have only used it once, just now), but it shows that this can and should be done through the API instead of building your own "CSV" line.
And to show that the values are recovered correctly, let's use an Appendable to store the CSV:
StringBuffer sb = new StringBuffer();
CSVPrinter printer = CSVFormat.DEFAULT.withHeader("A", "B").print(sb);
List<POJO> pojos = new ArrayList<>();
pojos.add(new POJO("foo", "bar"));
pojos.add(new POJO("far", "boo"));
pojos.add(new POJO("for", "bao"));
pojos.add(new POJO("test,", "comma"));
for(POJO p : pojos) {
printer.printRecord(p.a, p.b);
}
System.out.println("PRINTER");
System.out.println(sb.toString());
PRINTER
A,B
foo,bar
far,boo
for,bao
"test,",comma
Then parse that String and create the POJOs back:
CSVParser parser = CSVFormat.DEFAULT
.withFirstRecordAsHeader()
.parse(new StringReader(sb.toString()));
System.out.println("PARSER");
parser.getRecords().stream().map(r -> new POJO(r.get(0), r.get(1))).forEach(System.out::println);
PARSER
POJO [a=foo ## b=bar]
POJO [a=far ## b=boo]
POJO [a=for ## b=bao]
POJO [a=test, ## b=comma]
I have text in csv format like below.
hi="hello",how="are",,,you="",thank="you"
Please help me write a regex that extracts the following from the above text:
hello,are,,,,you
Basically, I want to keep only the values of the key-value pairs and leave the commas as they are (to produce a valid CSV).
Note: I am looking for a pure regex, not Java functions, please.
Thank you
Starting from your string:
String str = "hi=\"hello\",how=\"are\",,,you=\"\",thank=\"you\"";
You can remove every occurrence of word=" and of any remaining ":
str = str.replaceAll("(\\w*=\"|\")", ""); // remove word=" OR "
Another way: this captures the group key="value" and replaces it with value:
str = str.replaceAll("\\w+=\"(.*?)\"", "$1"); // $1 is the value captured in ()
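Both replacements can be checked side by side with a small runnable class (the class and method names are illustrative); on the sample input each one produces hello,are,,,,you:

```java
final class KeyValueRegexDemo {
    static final String INPUT = "hi=\"hello\",how=\"are\",,,you=\"\",thank=\"you\"";

    // First answer: delete every word=" prefix and every remaining "
    static String removeKeysAndQuotes(String s) {
        return s.replaceAll("(\\w*=\"|\")", "");
    }

    // Second answer: replace each key="value" with the captured value
    static String keepCapturedValues(String s) {
        return s.replaceAll("\\w+=\"(.*?)\"", "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeKeysAndQuotes(INPUT)); // hello,are,,,,you
        System.out.println(keepCapturedValues(INPUT));  // hello,are,,,,you
    }
}
```

The difference is scope: the first form removes every double quote anywhere in the text, while the second only rewrites text shaped like key="value".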
I want to write my string to a .csv file, but if the string contains commas, it gets split and moves to the next cell.
String str=resultSet.getString();
Since you're writing a CSV file with comma delimiter and your text happens to have a comma in it as well, you need to wrap your text within double quotes:
String str = "\"hello, world\"";
So, if the string you want to write is str:
String str = ...;
...
str = "\"" + str.replace("\"", "\"\"") + "\""; // wrap in quotes (doubling any embedded quotes) before writing str to the CSV file
This should work.
Double quote the column value if there's an embedded comma:
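Example:
column1_data,column2_data,"column3_data, new_data",column4_data
Here, "column3_data, new_data" is read as a single value in column 3. Note that there is no space before the opening quote: under RFC 4180 a space there becomes part of the field, and the quote would no longer enclose it.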
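The same rule, plus doubling any embedded double quotes (as RFC 4180 requires), can be sketched as a small JDK-only row writer. CsvRowWriter is a hypothetical helper; in real code a library writer such as Apache Commons CSV's CSVPrinter already does this for you:

```java
import java.util.List;
import java.util.StringJoiner;

final class CsvRowWriter {
    // Hypothetical helper: quotes a field only when it contains the delimiter,
    // a double quote, or a line break, and doubles any inner double quotes
    static String field(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins fields with ',' and no extra spaces, so each quote opens its field
    static String row(List<String> values) {
        StringJoiner joiner = new StringJoiner(",");
        for (String v : values) {
            joiner.add(field(v));
        }
        return joiner.toString();
    }
}
```

For the example above, row(...) produces column1_data,column2_data,"column3_data, new_data",column4_data.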
I have an app that received a malformed JSON string like this:
{'username' : 'xirby'}
I need to replace the single quotes ' with double quotes "
with these rules (I think):
A single quote comes after a { with one or more spaces
Comes before one or more spaces and :
Comes after a : with one or more spaces
Comes before one or more spaces and }
So this string {'username' : 'xirby'} or
{ 'username' : 'xirby' }
Would be transformed to:
{"username" : "xirby"}
Update:
Also a possible malformed JSON String:
{ 'message' : 'there's not much to say' }
In this example the single quote inside the message value should not be replaced.
Try this regex:
\s*\'\s*
and replacing each match with " will do the job.
Instead of doing this, you're better off using a JSON parser which can read such malformed JSON and "normalize" it for you. Jackson can do that:
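import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;

final ObjectReader reader = new ObjectMapper()
    .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true)
    .reader();
final JsonNode node = reader.readTree(yourMalformedJson);
// node.toString() does the right thing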
This regex captures all the appropriate single quotes and their surrounding whitespace while ignoring single quotes inside a message. You can replace the captured characters with double quotes while preserving the JSON format. It also generalizes to JSON strings with multiple messages (delimited by commas).
((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))
I know you tagged your question Java, but I'm more familiar with Python. Here's an example of how you can replace the single quotes with double quotes in Python:
import re
regex = re.compile(r"((?<={)\s*'|(?<=,)\s*'|'\s*(?=:)|(?<=:)\s*'|'\s*(?=,)|'\s*(?=}))")
s = "{ 'first_name' : 'Shaquille' , 'lastname' : 'O'Neal' }"
regex.sub('"', s)
> '{"first_name":"Shaquille","lastname":"O\'Neal"}'
This method looks for single quotes next to the symbols {},: using look-ahead and look-behind operations.
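Since the question is about Java, here is the same pattern in a small Java class, a translation of the Python above with a hypothetical class name. Like the Python version, it relies on the quotes sitting next to {, }, a comma or a colon, so an apostrophe inside a value that happens to sit next to one of those characters would also be replaced:

```java
import java.util.regex.Pattern;

final class SingleToDoubleQuotes {
    // Same alternatives as the Python pattern; { and } are escaped so that
    // Java does not try to read them as a repetition quantifier
    private static final Pattern QUOTES = Pattern.compile(
            "((?<=\\{)\\s*'|(?<=,)\\s*'|'\\s*(?=:)|(?<=:)\\s*'|'\\s*(?=,)|'\\s*(?=\\}))");

    static String fix(String json) {
        return QUOTES.matcher(json).replaceAll("\"");
    }

    public static void main(String[] args) {
        // {"first_name":"Shaquille","lastname":"O'Neal"}
        System.out.println(fix("{ 'first_name' : 'Shaquille' , 'lastname' : 'O'Neal' }"));
        // {"message":"there's not much to say"}
        System.out.println(fix("{ 'message' : 'there's not much to say' }"));
    }
}
```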
String test = "{'username' : 'xirby'}";
String replaced = test.replaceAll("'", "\"");
Since your question is tagged Java, I answered in Java.
First, import the classes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Then:
Pattern p = Pattern.compile("((?<=(\\{|\\[|\\,|:))\\s*')|('\\s*(?=(\\}|(\\])|(\\,|:))))");
String s = "{ 'firstName' : 'Malus' , 'lastName' : ' Ms'Malus' , marks:[ ' A+ ', 'B+']}";
String replace = "\"";
String o;
Matcher m = p.matcher(s);
o = m.replaceAll(replace);
System.out.println(o);
Output:
{"firstName":"Malus","lastName":" Ms'Malus", marks:[" A+ ","B+"]}
If you're looking to exactly satisfy all of those conditions, try this:
'{(\s)?\'(.*)\'(\s)?:(\s)?\'(.*)\'(\s)?}'
as your regex. It uses (\s)? to match zero or one whitespace character.
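In Java this pattern can be used through replaceAll, writing the captured key and value (groups 2 and 5) back between double quotes. This is a sketch with a hypothetical class name, assuming the outer single quotes in the answer are string delimiters rather than part of the pattern; it only handles an object with a single key/value pair:

```java
import java.util.regex.Pattern;

final class SinglePairRegex {
    // Java version of the pattern above: groups 2 and 5 capture the key and
    // the value, and each (\s)? allows zero or one whitespace character
    private static final Pattern PAIR = Pattern.compile(
            "\\{(\\s)?'(.*)'(\\s)?:(\\s)?'(.*)'(\\s)?\\}");

    static String fix(String json) {
        // Rebuild the object with double quotes around the captured groups
        return PAIR.matcher(json).replaceAll("{\"$2\":\"$5\"}");
    }
}
```

Because the second (.*) is greedy, the apostrophe in 'there's not much to say' stays inside the captured value.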
I recommend using a JSON parser instead of a regex.
String strJson = "{ 'username' : 'xirby' }";
strJson = new JSONObject(strJson).toString();
System.out.println(strJson);
I'm using the library org.json.
I have a string like this (quotes can't appear in field_n)
{field1=value1, field2=value2} (call it `val`)
This string is obtained from a Hashtable<String, Object>.
I create a JSONObject from that string, obtaining:
{"field1":"value1", "field2":"value2"}
The issue arises when quotes (or newlines and carriage returns) appear in a value value_n.
I've tried to escape the string in this way:
value = value.replace("\\", "\\\\");
value = value.replace("\"", "\\\"");
value = value.replace("\r", "\\r");
value = value.replace("\n", "\\n");
but I always get org.json.JSONException: Expected a ',' or '}' at ... [character ... line 1] when I try to create the JSONObject with:
JSONObject json = new JSONObject(val);
To create JSON from a map, use:
new JSONObject(myMap);
Another related issue:
quotedStr = JSONObject.quote(val.trim());
will quote the value properly. As its documentation puts it:
Produce a string in double quotes with backslash sequences in all the right places
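For context, here is a JDK-only sketch of the kind of escaping such a quote function performs, applied to each key and value while building the object from the map. This is not org.json's code; JsonQuoteSketch, quote and toJson are hypothetical names. Quoting each value as it is written avoids having to repair an already-flattened string such as the output of Hashtable.toString():

```java
import java.util.Map;

final class JsonQuoteSketch {
    // Hypothetical helper in the spirit of JSONObject.quote (not its actual
    // implementation): wraps s in double quotes and escapes what JSON requires
    static String quote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2).append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters get a four-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the object from the map entries, quoting each key and value
    static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }
}
```

With org.json on the classpath, new JSONObject(myMap) does this for you; the sketch only shows why per-value escaping handles quotes and line breaks.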