I have one column that contains the data below, and I want to split it into multiple columns using Java. The problem I'm facing is that some values contain a comma inside double quotes, and the part after that comma falls into another column. I need to split the data as shown under Target. Can anyone help me fix this?
Input:
Column:
abc,"test,data",valid
xyz,"sample,data",invalid
Target:
Col1|Col2|Col3
abc|"test,data"|valid
xyz|"sample,data"|invalid
I highly recommend using a library to handle this instead of doing it yourself.
Your data appears to be in CSV format, so take a look at Apache Commons CSV (commons-csv).
You can solve your problem with a few lines of code:
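// Requires the Apache Commons CSV library (org.apache.commons:commons-csv)
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// CSVParser.parse(...) throws IOException; handle or declare it in the enclosing method
CSVParser records = CSVParser.parse("abc,\"test,data\",valid", CSVFormat.DEFAULT);
for (CSVRecord csvRecord : records) {
    for (String value : csvRecord) {   // a CSVRecord is Iterable<String>
        System.out.println(value);
    }
}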
Output:
abc
test,data
valid
Read more at https://www.baeldung.com/apache-commons-csv
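If adding a dependency is not an option, the core idea (only commas outside double quotes separate fields) can also be sketched in plain Java. This is a minimal sketch, not a full CSV parser: it assumes there are no escaped quotes ("") and no line breaks inside fields, it keeps the quote characters so the result matches the Target layout, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {

    // Splits one line on commas that are not inside double quotes.
    // Quote characters are kept, so "test,data" stays one quoted field.
    // Assumption: no escaped quotes ("") and no newlines inside fields.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // entering or leaving a quoted section
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // a real field boundary
                current.setLength(0);
            } else {
                current.append(c);               // ordinary character or quoted comma
            }
        }
        fields.add(current.toString());          // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String[] input = {"abc,\"test,data\",valid", "xyz,\"sample,data\",invalid"};
        System.out.println("Col1|Col2|Col3");
        for (String line : input) {
            System.out.println(String.join("|", splitCsvLine(line)));
        }
    }
}
```

For real-world CSV files, a library such as Commons CSV handles the edge cases this sketch ignores (escaped quotes, multi-line fields).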
Related
I want to write (append) data to a CSV file column by column, in the following layout:
query1 query2 query3
data_item1 data_item7 data_item12
data_item2 data_item8 data_item13
data_item3 data_item9 data_item14
data_item4 data_item10
data_item5 data_item11
data_item6
I have the data in a HashMap, with the query ID (e.g. query1, query2) as the key and the data items for the corresponding query as the value.
The values (the data items for each query) are stored in a list.
So my hash map looks like this:
HashMap<String,List<String>> hm = new HashMap<String,List<String>>();
How can I write this data column by column to a CSV file, as shown above, using Java?
I tried CSVWriter but couldn't get it to work. Can anyone please help me out?
CSV files are mostly used to persist data structured like a table, i.e. data organized in columns and rows that are closely related.
In your example there seems to be only a very loose connection between query1, 2 and 3, and no horizontal connection between items 1, 7 and 12, or 2, 8 and 13, and so on.
On top of that, writing to files usually happens row by row (line by line): you open the file, write one line, then the next, and so on.
So to write the data column-wise as you ask, you either have to restructure your data in your code beforehand, so that everything that goes into one line is available when you write that line, or run through your CSV file and its lines several times, each time appending another item to a row. The latter option is obviously very time-consuming and would not make much sense.
So if there really is no connection between the data of the 3 queries, I would suggest writing your data into 3 different CSV files: query1.csv, query2.csv and query3.csv.
Or, if there is a horizontal connection, e.g. between items 1, 7 and 12, and so on, write it into one CSV file with the data organized into rows and columns. Something like:
queryNo columnX columnY columnZ
1 item1 item2 item3
2 item7 item8 item9
3 item12 item13 item14
How to do that is well described in this thread: Java - Writing strings to a CSV file.
You can also find other examples here: https://mkyong.com/java/how-to-export-data-to-csv-file-java/
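The first option mentioned in this answer (restructuring the data before writing) could look roughly like this: a minimal sketch that turns the map into lines, padding shorter lists with empty cells. It assumes a LinkedHashMap (a plain HashMap does not guarantee key order, so the columns could come out in any order) and values that need no CSV escaping; the class name and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnsToRows {

    // Turns {query -> values} into CSV lines: a header line with the keys,
    // then one line per index, padding shorter lists with empty cells.
    // Assumption: values contain no commas, quotes or newlines (no escaping done).
    static List<String> toCsvLines(Map<String, List<String>> columns) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", columns.keySet()));

        // The longest list decides how many data rows are needed.
        int maxRows = 0;
        for (List<String> values : columns.values()) {
            maxRows = Math.max(maxRows, values.size());
        }

        for (int row = 0; row < maxRows; row++) {
            List<String> cells = new ArrayList<>();
            for (List<String> values : columns.values()) {
                cells.add(row < values.size() ? values.get(row) : "");
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order: query1, query2, query3.
        Map<String, List<String>> hm = new LinkedHashMap<>();
        hm.put("query1", List.of("data_item1", "data_item2", "data_item3"));
        hm.put("query2", List.of("data_item7", "data_item8"));
        hm.put("query3", List.of("data_item12"));
        toCsvLines(hm).forEach(System.out::println);
    }
}
```

Each line could then be written to a file with a BufferedWriter. With OpenCSV you would instead pass each row's cells as a String[] to CSVWriter.writeNext, which also takes care of quoting.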
After days of tinkering, I finally got it working. Here is the implementation:
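// Needs java.util.ArrayList, java.util.List and java.util.stream.Collectors;
// writer is an OpenCSV CSVWriter, the other variables are described below.
for (int k = 0; k < maxRows; k++) {                      // one CSV row per index k
    List<String> rowValues = new ArrayList<String>();
    for (int i = 0; i < queryIdListArr.length; i++) {     // one column per query
        // one-element sublist holding the i-th value list, flattened to a List<String>
        List<List<String>> subList = qValuesList.subList(i, i + 1);
        List<String> subList2 = subList.stream().flatMap(List::stream).collect(Collectors.toList());
        if (subList2.size() <= k) {
            rowValues.add("");                            // no k-th value for this query
        } else {
            rowValues.add(subList2.get(k));
        }
    }
    String[] rowValuesArr = new String[rowValues.size()];
    rowValuesArr = rowValues.toArray(rowValuesArr);
    // System.out.println(rowValues);
    writer.writeNext(rowValuesArr);
}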
maxRows: the size of the longest value list. I have a list of values for each key; my hash map looks like this:
HashMap<String,List<String>> hm = new HashMap<String,List<String>>();
queryIdListArr: array of all the query IDs (the keys obtained from the hash map).
qValuesList: list of all the value lists.
List<List<String>> qValuesList = new ArrayList<List<String>>();
subList2: the value list of the i-th query, obtained by taking a one-element sublist of qValuesList with the syntax below and flattening it:
qValuesList.subList(i, i+1);
rowValuesArr: an array holding, for row k, the k-th value of each list fetched from qValuesList.
The idea is to fetch the value at each index from all the value lists and write those values as one row. If a list has no value at that index, write an empty string.
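The answer does not show how maxRows, queryIdListArr and qValuesList are built. One plausible way to derive them from the map is sketched below; this is an assumption, not the author's code, the helper names are hypothetical, and a LinkedHashMap is used so that the column order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrepareInputs {

    // Column headers: the query IDs, in the map's iteration order.
    static String[] queryIds(Map<String, List<String>> hm) {
        return hm.keySet().toArray(new String[0]);
    }

    // One value list per query, in the same order as the query IDs.
    static List<List<String>> valueLists(Map<String, List<String>> hm) {
        return new ArrayList<>(hm.values());
    }

    // Number of rows to write: the size of the longest list (0 if there are none).
    static int longestListSize(List<List<String>> qValuesList) {
        return qValuesList.stream().mapToInt(List::size).max().orElse(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> hm = new LinkedHashMap<>();
        hm.put("query1", List.of("data_item1", "data_item2", "data_item3"));
        hm.put("query2", List.of("data_item7", "data_item8"));

        String[] queryIdListArr = queryIds(hm);
        List<List<String>> qValuesList = valueLists(hm);
        int maxRows = longestListSize(qValuesList);

        System.out.println(String.join(",", queryIdListArr)); // query1,query2
        System.out.println(maxRows);                           // 3
        System.out.println(qValuesList.get(1));                // [data_item7, data_item8]
    }
}
```

Since qValuesList.get(i) already returns the i-th value list, it gives the same result as flattening qValuesList.subList(i, i+1), and could replace that step in the loop.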
The following approach reads the records but skips the header:
Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
for (CSVRecord record : records) {
    // here the first record is not the header
}
How can I read the CSV including the header line?
P.S.
The approach
CSVFormat.EXCEL.withHeader().withSkipHeaderRecord(false).parse(in)
doesn't work either; it behaves the same way.
For me, the following all seem to return the header record as the first one (using commons-csv 1.5):
Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
Iterable<CSVRecord> records = CSVFormat.EXCEL.withSkipHeaderRecord().parse(in); //???
Iterable<CSVRecord> records = CSVFormat.EXCEL.withSkipHeaderRecord(false).parse(in);
Iterable<CSVRecord> records = CSVFormat.EXCEL.withSkipHeaderRecord(true).parse(in); //???
And as you have stated, the following does NOT seem to return the header record as the first one:
Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in); //???
It is beyond my understanding why withSkipHeaderRecord() and withSkipHeaderRecord(true) include the header while withHeader() does not; this seems to be the opposite of what the method names suggest.
The withHeader() method tells the parser that the file has a header. Perhaps the method name is confusing.
The withFirstRecordAsHeader() method may also be useful.
From the CSVFormat (Apache Commons CSV 1.8 API) JavaDoc page:
Referencing columns safely
If your source contains a header record, you can simplify your code and safely reference columns, by using withHeader(String...) with no arguments:
CSVFormat.EXCEL.withHeader();
This causes the parser to read the first record and use its values as column names. Then, call one of the CSVRecord get methods that take a String column name argument:
String value = record.get("Col1");
This makes your code impervious to changes in column order in the CSV file.
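The same by-name idea can be illustrated without the library. This plain-Java sketch assumes simple comma-separated lines without quoted fields (for example as returned by Files.readAllLines(path)); the first line serves as the column names and stays available as lines.get(0). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeaderByName {

    // Uses the first line as column names and returns every following line
    // as a column-name -> value map, so values can be looked up by name.
    // Assumption: plain comma separation without quoted fields.
    static List<Map<String, String>> readByName(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records;                          // no header, no records
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);    // -1 keeps trailing empty cells
            Map<String, String> record = new HashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                record.put(header[i], cells[i]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Columns deliberately ordered Col2,Col1: the lookup by name still finds Col1.
        List<String> lines = List.of("Col2,Col1", "valid,abc", "invalid,xyz");
        for (Map<String, String> record : readByName(lines)) {
            System.out.println(record.get("Col1"));  // abc, then xyz
        }
    }
}
```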
I'm using jOOQ and Postgres.
In Postgres I have a column gender:
'gender' AS gender,
(the table itself is a view and the gender column is a placeholder for a value that gets calculated in Java)
In Java when I .fetch() the view, I do some calculations on each record:
for (Record r : skillRecords) {
    idNumber = function(r);   // function(...) stands for my own calculation
    r.set(id, idNumber);
    r.set(gender, getGender(idNumber));
}
Everything looks good, and if I println the values, they're all correct.
However, when I call intoResultSet() on skillRecords, the gender column has an asterisk next to every value, e.g. "*Male".
Then I use the ResultSet as input to an OpenCSV CSVWriter, and when I open the CSV, the gender column comes out as null.
Any suggestions?
UPDATE:
Following Lukas's input regarding the asterisks, I realise the issue is likely with OpenCSV.
My code is as follows:
File tempFile = new File("/tmp/file.csv");
BufferedWriter out = new BufferedWriter(new FileWriter(tempFile));
CSVWriter writer = new CSVWriter(out);
//Code for getting records sits here
for (Record r : skillRecords) {
    idNumber = function(r);
    r.set(id, idNumber);
    r.set(gender, getGender(idNumber));
}
writer.writeAll(skillRecords.intoResultSet(), true);   // true = write column names as the header
writer.close();   // flush the buffered output to the file before returning it
return tempFile;
All the columns in the CSV come back as expected except gender, which has the header "gender" but empty values.
I have the necessary try/catches in the code above but I've excluded them for brevity.
The asterisk in *Male
The asterisk you see in the ResultSet.toString() output (or in Result.toString()) reflects the record's internal Record.changed(Field) flag, i.e. the information on each record saying that it was modified after it was retrieved from the database (which you did).
That is just visual information which you can safely ignore.
Solution:
So I found the solution. It turns out that in Postgres, if I have something like:
'gender' AS gender,
the type of the column is unknown, not text. So the solution was to cast it explicitly:
'gender'::text AS gender
After doing so OpenCSV was happy.
I am trying to read a CSV file that does not contain comma-separated values; its columns hold NASDAQ stock data. I want to read a particular column, say the 3rd, but I don't know how to get that column's items. Is there any method to read column-wise data in Hadoop? Please help.
My CSV File Format is:
exchange stock_symbol date stock_price_open stock_price_high stock_price_low stock_price_close stock_volume stock_price_adj_close
NASDAQ ABXA 12/9/2009 2.55 2.77 2.5 2.67 158500 2.67
NASDAQ ABXA 12/8/2009 2.71 2.74 2.52 2.55 131700 2.55
Edit:
Column A : exchange
Column B : stock_symbol
Column C : date
Column D : stock_price_open
Column E : stock_price_high
and so on.
These are columns, not comma-separated values. I need to read this file column-wise.
In Pig it would look like this:
Q1 = LOAD 'file.csv' USING PigStorage('\t') AS (exchange, stock_symbol, stock_date:chararray, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close);
Q2 = FOREACH Q1 GENERATE stock_date;
DUMP Q2;
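Reading a single column in plain Java comes down to splitting each line on the separator and picking one index. A minimal sketch, assuming tab-separated fields (as PigStorage('\t') above implies) and a header line; if the columns are separated by runs of spaces instead, split on "\\s+". In a Hadoop MapReduce job the same split would go into the mapper's map() method, which this sketch does not show. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnReader {

    // Returns the value at columnIndex (0-based) from every line after the header.
    // Assumption: fields are separated by single tab characters.
    static List<String> readColumn(List<String> lines, int columnIndex) {
        List<String> values = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {        // i = 1 skips the header line
            String[] fields = lines.get(i).split("\t");
            if (fields.length > columnIndex) {          // ignore short or malformed lines
                values.add(fields[columnIndex]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "exchange\tstock_symbol\tdate\tstock_price_open",
                "NASDAQ\tABXA\t12/9/2009\t2.55",
                "NASDAQ\tABXA\t12/8/2009\t2.71");
        // 3rd column -> index 2
        System.out.println(readColumn(lines, 2));       // [12/9/2009, 12/8/2009]
    }
}
```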
You can try to prepare the Excel sheet by concatenating the columns into a single text value with a formula like:
=CONCATENATE(A2,";",B2,";",C2,";",D2,";",E2,";",F2,";",G2,";",H2,";",I2)
This joins the columns with the separator of your choice; I have used ; here, but use whatever you need.
I have a string like "2,345". I want to put it into an Excel cell. I managed to do that, but in my Excel file I got "2,345" as a string. How can I get "2,345" as a number value, but with the same comma-separated format as above?
Thanks in advance.
Remove the comma before inserting it into Excel, cast it to a number before inserting, then format the column to show the comma.
For removing the comma, see String.replace.
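The "remove the comma, then convert to a number" step might look like this in Java, using only the JDK. The resulting double would then be written as a numeric cell with your Excel library (for example Apache POI's Cell.setCellValue(double)) and given the "#,##0" number format. The names here are hypothetical, and the DecimalFormat part only shows how the pattern "#,##0" displays the value.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class CommaNumber {

    // Option 1: strip the grouping comma, then parse.
    static double viaReplace(String s) {
        return Double.parseDouble(s.replace(",", ""));
    }

    // Option 2: let a US-locale NumberFormat handle the grouping separator.
    static double viaNumberFormat(String s) {
        try {
            return NumberFormat.getNumberInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + s, e);
        }
    }

    // Same pattern as the Excel NumberFormat "#,##0": comma grouping, no decimals.
    static String display(double value) {
        return new DecimalFormat("#,##0", DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }

    public static void main(String[] args) {
        double value = viaReplace("2,345");
        System.out.println(value);                    // 2345.0
        System.out.println(viaNumberFormat("2,345")); // 2345.0
        System.out.println(display(value));           // 2,345
    }
}
```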
In Excel VBA, the code to format a Range with commas is:
SomeRange.Style = "Comma" 'or, recorded version
SomeRange.NumberFormat = "_-* #,##0_-;-* #,##0_-;_-* ""-""??_-;_-@_-"
'a simpler version:
SomeRange.NumberFormat = "#,##0"