How to rename Columns via Lambda function - fasterXML


I'm using the FasterXML library to parse my CSV file. The CSV file has the column names in its first line. Unfortunately, I need the columns to be renamed. I have a lambda function for this: I pass in the value read from the CSV file and get the new value back.
My code looks like this, but it does not work:
CsvSchema csvSchema =CsvSchema.emptySchema().withHeader();
ArrayList<HashMap<String, String>> result = new ArrayList<HashMap<String, String>>();
MappingIterator<HashMap<String,String>> it = new CsvMapper().reader(HashMap.class)
.with(csvSchema )
.readValues(new File(fileName));
while (it.hasNext())
result.add(it.next());
System.out.println("changing the schema columns.");
for (int i=0; i < csvSchema.size();i++) {
String name = csvSchema.column(i).getName();
String newName = getNewName(name);
csvSchema.builder().renameColumn(i, newName);
}
csvSchema.rebuild();
When I try to print out the columns later, they are still the same as in the first line of my CSV file.
Additionally, I noticed that csvSchema.size() equals 0. Why?

You could instead use uniVocity-parsers for that. The following solution streams the input rows to the output so you don't need to load everything in memory to then write your data back with new headers. It will be much faster:
public static void main(String ... args) throws Exception{
Writer output = new StringWriter(); // use a FileWriter for your case
CsvWriterSettings writerSettings = new CsvWriterSettings(); //many options here - check the documentation
final CsvWriter writer = new CsvWriter(output, writerSettings);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here as well
parserSettings.setHeaderExtractionEnabled(true); // indicates the first row of the input are headers
parserSettings.setRowProcessor(new AbstractRowProcessor(){
public void processStarted(ParsingContext context) {
writer.writeHeaders("Column A", "Column B", "... etc");
}
public void rowProcessed(String[] row, ParsingContext context) {
writer.writeRow(row);
}
public void processEnded(ParsingContext context) {
writer.close();
}
});
CsvParser parser = new CsvParser(parserSettings);
Reader reader = new StringReader("A,B,C\n1,2,3\n4,5,6"); // use a FileReader for your case
parser.parse(reader); // all rows are parsed and submitted to the RowProcessor implementation of the parserSettings.
System.out.println(output.toString());
//nothing else to do. All resources are closed automatically in case of errors.
}
You can easily select the columns by using parserSettings.selectFields("B", "A") in case you want to reorder/eliminate columns.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
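For comparison, here is a minimal library-free sketch of the same streaming idea, with the new header names produced by a function instead of being hard-coded. The class and method names (HeaderRenamer, renameHeader, copyWithRenamedHeader) are made up for illustration, and the sketch assumes header names contain no quoted commas. As far as I can tell, the schema in the question stays unchanged because CsvSchema instances are immutable and the results of the builder calls are discarded, and emptySchema().withHeader() holds no columns of its own, which would explain size() returning 0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class HeaderRenamer {

    // Applies the renaming function to every name in a comma-separated header line.
    // Assumption: header names contain no quoted commas.
    static String renameHeader(String headerLine, UnaryOperator<String> rename) {
        return Arrays.stream(headerLine.split(",", -1))
                .map(String::trim)
                .map(rename)
                .collect(Collectors.joining(","));
    }

    // Copies the CSV from input to output, rewriting only the first line.
    // Data rows are streamed through unchanged, so nothing is held in memory.
    static void copyWithRenamedHeader(Reader input, Writer output, UnaryOperator<String> rename) {
        try {
            BufferedReader reader = new BufferedReader(input);
            String header = reader.readLine();
            if (header == null) {
                return; // empty input, nothing to write
            }
            output.write(renameHeader(header, rename));
            output.write('\n');
            String line;
            while ((line = reader.readLine()) != null) {
                output.write(line);
                output.write('\n');
            }
            output.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling copyWithRenamedHeader with a FileReader, a FileWriter and the question's getNewName function (for example this::getNewName) would rewrite the header while passing the data rows through untouched.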

Related

mapping particular column of a csv file with particular POJO's field

I have to map particular CSV columns, selected by index, to particular POJO attributes. The mapping is defined in a JSON file that contains a column index and an attribute name, meaning that the value at that index in the CSV file must be mapped to that attribute of the POJO class.
Below is a sample of json file which shows column mapping strategy with Pojo attributes.
[{"index":0,"columnname":"date"},{"index":1,"columnname":"deviceAddress"},{"index":7,"columnname":"iPAddress"},{"index":3,"columnname":"userName"},{"index":10,"columnname":"group"},{"index":5,"columnname":"eventCategoryName"},{"index":6,"columnname":"message"}]
I have tried the OpenCSV library, but the problem I ran into is that I cannot read only some of the columns with it. As you can see in the JSON above, indexes 2 and 4 are skipped when reading from the CSV file. Below is the code using OpenCSV.
public static List<BaseDataModel> readCSVFile(String filePath,List<String> columnListBasedOnIndex) {
List<BaseDataModel> csvDataModels = null;
File myFile = new File(filePath);
try (FileInputStream fis = new FileInputStream(myFile)) {
final ColumnPositionMappingStrategy<BaseDataModel> strategy = new ColumnPositionMappingStrategy<BaseDataModel>();
strategy.setType(BaseDataModel.class);
strategy.setColumnMapping(columnListBasedOnIndex.toArray(new String[0]));
final CsvToBeanBuilder<BaseDataModel> beanBuilder = new CsvToBeanBuilder<>(new InputStreamReader(fis));
beanBuilder.withMappingStrategy(strategy);
csvDataModels = beanBuilder.build().parse();
} catch (Exception e) {
e.printStackTrace();
}
return csvDataModels;
}
List<ColumnIndexMapping> columnIndexMappingList = dataSourceModel.getColumnMappingStrategy();
List<String> columnNameList = columnIndexMappingList.stream().map(ColumnIndexMapping::getColumnname)
.collect(Collectors.toList());
List<BaseDataModel> DataModels = Utility
.readCSVFile(file.getAbsolutePath() + File.separator + fileName, columnNameList);
I have also tried univocity, but how can I map CSV columns to particular attributes with this library? Below is the code:
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically(); //detects the format
settings.getFormat().setLineSeparator("\n");
//extracts the headers from the input
settings.setHeaderExtractionEnabled(true);
settings.selectIndexes(0, 2); //rows will contain only values of columns at position 0 and 2
CsvRoutines routines = new CsvRoutines(settings); // Can also use TSV and Fixed-width routines
routines.parseAll(BaseDataModel.class, new File("/path/to/your.csv"));
List<String[]> rows = new CsvParser(settings).parseAll(new File("/path/to/your.csv"), "UTF-8");
Can someone help me with this?

Author of univocity-parsers here. You can define mappings to your class attributes in code instead of annotations. Something like this:
public class BaseDataModel {
private String a;
private int b;
private String c;
private Date d;
}
Then, in your code, map the attributes to whatever column names you need:
ColumnMapper mapper = routines.getColumnMapper();
mapper.attributeToColumnName("a", "col1");
mapper.attributeToColumnName("b", "col2");
mapper.attributeToColumnName("c", "col3");
mapper.attributeToColumnName("d", "col4");
You can also use mapper.attributeToIndex("d", 3); to map attributes to a given column index.
Hope this helps.
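To make the index-based idea concrete without any library, here is a rough sketch that takes one already-parsed row and an index-to-attribute mapping (like the JSON in the question) and returns the selected values keyed by attribute name. IndexedColumnMapper and mapRow are hypothetical names, and the sketch assumes the JSON has already been loaded into a Map<Integer, String>.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexedColumnMapper {

    // Picks the configured column indexes out of one parsed row and keys them by attribute name.
    // Indexes that fall outside the row (for example in short rows) are mapped to null.
    static Map<String, String> mapRow(String[] row, Map<Integer, String> indexToAttribute) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : indexToAttribute.entrySet()) {
            int index = entry.getKey();
            boolean inRange = index >= 0 && index < row.length;
            values.put(entry.getValue(), inRange ? row[index] : null);
        }
        return values;
    }
}
```

Columns whose index does not appear in the mapping (2 and 4 in the question) are simply never read. The resulting map can then be copied onto the POJO by whatever means you prefer.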

Write multiple classes to one CSV file

I have a list of objects that are instances of a number of sub-classes of a base class. I've been trying to write these objects out together into one CSV file.
Each class contains the fields of the base class and adds a couple of extra fields of its own.
What I am trying to achieve is to write out a csv having the base class fields first and then the columns coming from the rest of the sub-classes. This of course means that the sub-classes that don't contain a particular column name should have that field empty.
I have tried achieving this using OpenCSV and SuperCSV but have not managed to configure them to do this. Looking at the library's code, I am pretty sure OpenCSV will not do this. Using SuperCSV with Dozer, I got multiple classes to write to one file, but I can't get the empty columns in place where a class is missing a particular column field.
I can obviously write my own custom CSV writer to achieve this but I was wondering if anyone could help me reach a solution based off an existing CSV writer library.
Edit: SuperCSV code added below per commenter's request
private static final String[] FIELD_MAPPING = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass1Field", };
private static final String[] FIELD_MAPPING2 = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass2Field1", "childClass2Field2"};
public static void writeWithCsvBeanWriter(PrintWriter writer, List<ParentClass> documents) throws Exception {
CsvDozerBeanWriter beanWriter = null;
try {
beanWriter = new CsvDozerBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
final String[] header = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass1Field", "childClass2Field1", "childClass2Field2"};
beanWriter.configureBeanMapping(ChildClass1.class, FIELD_MAPPING);
beanWriter.configureBeanMapping(ChildClass2.class, FIELD_MAPPING2);
final CellProcessor[] processors = new CellProcessor[] { new Optional(), new Optional(), new Optional(), new Optional() };
final CellProcessor[] processors2 = new CellProcessor[] { new Optional(), new Optional(), new Optional(), new Optional(), new Optional() };
beanWriter.writeHeader(header);
for (final ParentClass document : documents) {
if (document instanceof ChildClass1) {
beanWriter.write(document, processors);
} else {
beanWriter.write(document, processors2);
}
}
} finally {
if (beanWriter != null) {
beanWriter.close();
}
}
}
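One library-independent way to get the empty cells is to compute the union of all column names first and then write every record against that full column list. The following is only a sketch of that idea, not a Super CSV solution: UnionColumnCsv is a made-up name, each object is assumed to have been converted to an ordered map of column name to value (base-class fields first), and values are assumed to need no quoting.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionColumnCsv {

    // Builds CSV text whose header is the union of all keys, in first-seen order.
    // A record that lacks a given column gets an empty cell in that position.
    static String write(List<Map<String, String>> records) {
        Set<String> columns = new LinkedHashSet<>();
        for (Map<String, String> record : records) {
            columns.addAll(record.keySet());
        }
        StringBuilder csv = new StringBuilder(String.join(",", columns)).append('\n');
        for (Map<String, String> record : records) {
            List<String> cells = new ArrayList<>();
            for (String column : columns) {
                String value = record.get(column);
                cells.add(value == null ? "" : value); // missing or null becomes an empty cell
            }
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }
}
```

Because the column set is a LinkedHashSet, the base-class columns come first as long as every map lists them first, and the subclass-specific columns follow in the order they are first encountered.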

Get CSV file header using apache commons

I have spent the past 2 hours looking for a solution to my problem, in vain. I am trying to read a CSV file using Apache Commons. I am able to read the whole file, but my problem is: how do I extract only the header of the CSV into an array?

I looked everywhere and even the solution above didn't work.
For anyone else with this issue, this does.
Iterable<CSVRecord> records;
Reader in = new FileReader(fileLocation);
records = CSVFormat.EXCEL.withHeader().withSkipHeaderRecord(false).parse(in);
Set<String> headers = records.iterator().next().toMap().keySet();
Note that your use of .next() has consumed one row of the CSV.

When the header names are supplied explicitly and the header record is not skipped, the first record returned by CSVParser is the file's header line, e.g. in the example below:
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withHeader(FILE_HEADER_MAPPING);
FileReader fileReader = new FileReader("file");
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List csvRecords = csvFileParser.getRecords();
csvRecords.get(0) will return the header record.

BufferedReader br = new BufferedReader(new FileReader(filename));
CSVParser parser = CSVParser.parse(br, CSVFormat.EXCEL.withFirstRecordAsHeader());
List<String> headers = parser.getHeaderNames();
This worked for me. The last line is what you need: it extracts the headers found by the parser into a List of Strings.

Since Apache Commons CSV v1.9.0, the withSkipHeaderRecord() & the withFirstRecordAsHeader() methods are deprecated. A builder interface is provided. Use it thusly:
CSVFormat.DEFAULT.builder()
.setHeader()
.setSkipHeaderRecord(true)
.build();

In Kotlin:
val reader = File(path).bufferedReader()
val records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
.withIgnoreHeaderCase()
.withTrim()
.parse(reader)
println(records.headerNames)

The code below works for me:
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import org.apache.commons.csv.*;
public static String[] headersInCSVFile (String csvFilePath) throws IOException {
//reading file
CSVFormat csvFileFormat = CSVFormat.DEFAULT;
FileReader fileReader = new FileReader(csvFilePath);
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List csvRecords = csvFileParser.getRecords();
//Obtaining first record and splitting that into an array using delimiters and removing unnecessary text
String[] headers = csvRecords.get(0).toString().split("[,'=\\]\\[]+");
String[] result = new String[headers.length - 6];
for (int i = 6; i < headers.length; i++) {
//.replaceAll("\\s", "") removes spaces
result[i - 6] = headers[i].replaceAll("\\s", "");
}
return result;
}

Jackson CSV's WRAP_AS_ARRAY

According to http://fasterxml.github.io/jackson-dataformat-csv/javadoc/2.0.0/com/fasterxml/jackson/dataformat/csv/CsvParser.Feature.html, WRAP_AS_ARRAY is:
Feature that determines how stream of records (usually CSV lines, but sometimes multiple lines when linefeeds are included in quoted values) is exposed: either as a sequence of Objects (false), or as an array of Objects (true).
What is the difference between a "sequence of Objects" and an "array of Objects"? The description seems the same to me.

Parsing to a sequence of objects: you call readValues() and get a MappingIterator, which gives you the objects one-by-one. Equivalent to input containing multiple JSON objects, one after the other.
Parsing to an array of objects: you call readValue() and get a List of the objects. Equivalent to input containing a JSON array.
Examples:
@Test
public void parses_csv_to_object_list() throws Exception {
String csv = "id,name\n1,Red\n2,Green\n3,Blue";
CsvMapper mapper = new CsvMapper();
CsvSchema schema = mapper.schemaFor(ColourData.class).withHeader();
ObjectReader reader = mapper.readerFor(ColourData.class).with(schema);
try (MappingIterator<ColourData> iter = reader.readValues(csv)) {
assertThat(iter.readAll(),
contains(new ColourData(1, "Red"), new ColourData(2, "Green"), new ColourData(3, "Blue")));
}
}
@Test
public void parses_csv_to_object_list_in_one_read() throws Exception {
String csv = "id,name\n1,Red\n2,Green\n3,Blue";
CsvMapper mapper = new CsvMapper().enable(CsvParser.Feature.WRAP_AS_ARRAY);
CsvSchema schema = mapper.schemaFor(ColourData.class).withHeader();
ObjectReader reader = mapper.readerFor(new TypeReference<List<ColourData>>() {
}).with(schema);
assertThat(reader.readValue(csv),
contains(new ColourData(1, "Red"), new ColourData(2, "Green"), new ColourData(3, "Blue")));
}
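The distinction is much like the one between an Iterator and a List in plain Java. The sketch below is only an analogy written with the standard library, not Jackson code; SequenceVersusArray and its method names are invented for illustration, and rows are split naively on commas.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

class SequenceVersusArray {

    // "Sequence of objects": records are handed out one at a time, on demand.
    static Iterator<String[]> asSequence(String csv) {
        return new BufferedReader(new StringReader(csv)).lines()
                .map(line -> line.split(",", -1))
                .iterator();
    }

    // "Array of objects": the whole input is read and returned as one list.
    static List<String[]> asArray(String csv) {
        return new BufferedReader(new StringReader(csv)).lines()
                .map(line -> line.split(",", -1))
                .collect(Collectors.toList());
    }
}
```

The first form suits large inputs because only the current record has to be in memory. The second is more convenient when the data is small and you want everything at once.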

Supercsv - unable to find method exception

I have the below implementation.
csvReader = new CsvBeanReader(new InputStreamReader(stream), CsvPreference.STANDARD_PREFERENCE);
lastReadIdentity = (T) csvReader.read(Packages.class, Packages.COLS);
In my Packages class, the unitCount property is defined as follows:
public String getUnitCount() {
return unitCount;
}
public void setUnitCount(String unitCount) {
this.unitCount = unitCount;
}
This works fine when the field is a String, but when I change it to an int, it throws the exception below. Please help.
private int unitCount;
public int getUnitCount() {
return unitCount;
}
public void setUnitCount(int unitCount) {
this.unitCount = unitCount;
}
Exception:
org.supercsv.exception.SuperCsvReflectionException: unable to find method setUnitCount(java.lang.String) in class com.directv.sms.data.SubscriberPackages - check that the corresponding nameMapping element matches the field name in the bean, and the cell processor returns a type compatible with the field
context=null
at org.supercsv.util.ReflectionUtils.findSetter(ReflectionUtils.java:139)
at org.supercsv.util.MethodCache.getSetMethod(MethodCache.java:95)

I'm not sure about SuperCsv, but univocity-parsers should be able to handle this without a hitch, not to mention it is at least 3 times faster to parse your input.
Just annotate your class:
public class SubscriberPackages {
@Parsed(defaultNullRead = "0") // if the file contains nulls, then they will be converted to 0.
private int unitCount; // The attribute name will be matched against the column header in the file automatically.
}
To parse the CSV into beans:
// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<SubscriberPackages> rowProcessor = new BeanListProcessor<SubscriberPackages>(SubscriberPackages.class);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here, check the tutorial.
parserSettings.setRowProcessor(rowProcessor); //uses the bean processor to handle your input rows
parserSettings.setHeaderExtractionEnabled(true); // extracts header names from the input file.
CsvParser parser = new CsvParser(parserSettings); //creates a parser with your settings.
parser.parse(new FileReader(new File("/path/to/file.csv"))); //all rows parsed here go straight to the bean processor
// The BeanListProcessor provides a list of objects extracted from the input.
List<SubscriberPackages> beans = rowProcessor.getBeans();
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
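The exception message in the question hints at the underlying cause: the value read from the CSV is still a String, so the reader looks for setUnitCount(java.lang.String), which no longer exists once the setter takes an int. The sketch below reproduces that lookup with plain reflection and shows that converting the cell first makes the int setter usable. SetterLookupDemo and its nested Packages bean are hypothetical stand-ins, not Super CSV code.

```java
import java.lang.reflect.Method;

class SetterLookupDemo {

    // Hypothetical bean mirroring the question's int-typed property.
    static class Packages {
        private int unitCount;
        public int getUnitCount() { return unitCount; }
        public void setUnitCount(int unitCount) { this.unitCount = unitCount; }
    }

    // Returns true if the bean has a public setter with exactly this parameter type.
    static boolean hasSetter(Class<?> beanType, String setterName, Class<?> parameterType) {
        try {
            beanType.getMethod(setterName, parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Converts the raw CSV cell to an int first, then calls the int setter reflectively.
    static void setFromCell(Packages bean, String cell) {
        try {
            Method setter = Packages.class.getMethod("setUnitCount", int.class);
            setter.invoke(bean, Integer.parseInt(cell.trim()));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here hasSetter(Packages.class, "setUnitCount", String.class) is false while the int.class lookup succeeds. In Super CSV terms, the equivalent step is to have a cell processor turn the cell into an integer before the setter is looked up, as the exception text suggests.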
