I have to map particular CSV column based on index with particular POJO attributes. Mapping will be based on a json file which will contain columnIndex and attribute name which means that for a particular columnIndex from csv file you have to map particular attribute from Pojo class.
Below is a sample of json file which shows column mapping strategy with Pojo attributes.
[{"index":0,"columnname":"date"},{"index":1,"columnname":"deviceAddress"},{"index":7,"columnname":"iPAddress"},{"index":3,"columnname":"userName"},{"index":10,"columnname":"group"},{"index":5,"columnname":"eventCategoryName"},{"index":6,"columnname":"message"}]
I have tried with OpenCSV library but the challenges which i faced with that I am not able to read partial column with it. As in above json you can see that we are skipping index 2 and 4 to read from CSV file. Below is the code with openCSV file.
public static List<BaseDataModel> readCSVFile(String filePath,List<String> columnListBasedOnIndex) {
List<BaseDataModel> csvDataModels = null;
File myFile = new File(filePath);
try (FileInputStream fis = new FileInputStream(myFile)) {
final ColumnPositionMappingStrategy<BaseDataModel> strategy = new ColumnPositionMappingStrategy<BaseDataModel>();
strategy.setType(BaseDataModel.class);
strategy.setColumnMapping(columnListBasedOnIndex.toArray(new String[0]));
final CsvToBeanBuilder<BaseDataModel> beanBuilder = new CsvToBeanBuilder<>(new InputStreamReader(fis));
beanBuilder.withMappingStrategy(strategy);
csvDataModels = beanBuilder.build().parse();
} catch (Exception e) {
e.printStackTrace();
}
}
List<ColumnIndexMapping> columnIndexMappingList = dataSourceModel.getColumnMappingStrategy();
List<String> columnNameList = columnIndexMappingList.stream().map(ColumnIndexMapping::getColumnname)
.collect(Collectors.toList());
List<BaseDataModel> DataModels = Utility
.readCSVFile(file.getAbsolutePath() + File.separator + fileName, columnNameList);
I have also tried with univocity but with this library how can i map csv with particular attributes. Below is the code -
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically(); //detects the format
settings.getFormat().setLineSeparator("\n");
//extracts the headers from the input
settings.setHeaderExtractionEnabled(true);
settings.selectIndexes(0, 2); //rows will contain only values of columns at position 0 and 2
CsvRoutines routines = new CsvRoutines(settings); // Can also use TSV and Fixed-width routines
routines.parseAll(BaseDataModel.class, new File("/path/to/your.csv"));
List<String[]> rows = new CsvParser(settings).parseAll(new File("/path/to/your.csv"), "UTF-8");
Please have a look if someone can help me in this case.
Author of univocity-parsers here. You can define mappings to your class attributes in code instead of annotations. Something like this:
public class BaseDataModel {
private String a;
private int b;
private String c;
private Date d;
}
Then on your code, map the attributes to whatever column names you need:
ColumnMapper mapper = routines.getColumnMapper();
mapper.attributeToColumnName("a", "col1");
mapper.attributeToColumnName("b", "col2");
mapper.attributeToColumnName("c", "col3");
mapper.attributeToColumnName("d", "col4");
You can also use mapper.attributeToIndex("d", 3); to map attributes to a given column index.
Hope this helps.
Related
I have a list of objects that are instances of a number of sub-classes of a base class. I've been trying to write these objects out together into one CSV file.
Each class contains the fields of the base class and adds a couple of extra fields of its own.
What I am trying to achieve is to write out a csv having the base class fields first and then the columns coming from the rest of the sub-classes. This of course means that the sub-classes that don't contain a particular column name should have that field empty.
I have tried achieving this using OpenCSV and SuperCSV but have not managed to configure them to do this. Looking at the libraries code I am pretty sure OpenCSV will not do this. Using SuperCSV with Dozer I got multiple classes to write in one file but I can't get the empty columns in place where a class is missing a particular column field.
I can obviously write my own custom CSV writer to achieve this but I was wondering if anyone could help me reach a solution based off an existing CSV writer library.
Edit: SuperCSV code added below per commenter's request
private static final String[] FIELD_MAPPING = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass1Field", };
private static final String[] FIELD_MAPPING2 = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass2Field1", "childClass2Field2"};
public static void writeWithCsvBeanWriter(PrintWriter writer, List<ParentClass> documents) throws Exception {
CsvDozerBeanWriter beanWriter = null;
try {
beanWriter = new CsvDozerBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
final String[] header = new String[] { "documentNumber", "lineOfBusiness", "clientId", "childClass1Field", "childClass2Field1", "childClass2Field2"};
beanWriter.configureBeanMapping(ChildClass1.class, FIELD_MAPPING);
beanWriter.configureBeanMapping(ChildClass2.class, FIELD_MAPPING2);
final CellProcessor[] processors = new CellProcessor[] { new Optional(), new Optional(), new Optional(), new Optional() }
final CellProcessor[] processors2 = new CellProcessor[] { new Optional(), new Optional(), new Optional(), new Optional(), new Optional() }
beanWriter.writeHeader(header);
for (final ParentClass document : documents) {
if (document instanceof ChildClass1) {
beanWriter.write(document, processors);
} else {
beanWriter.write(document, processors2);
}
}
} finally {
if (beanWriter != null) {
beanWriter.close();
}
}
}
I want to send some survey in PDF from java, I tryed different methods. I use with StringBuffer and without, but always see text in PDF in one row.
public void writePdf(OutputStream outputStream) throws Exception {
Paragraph paragraph = new Paragraph();
Document document = new Document();
PdfWriter.getInstance(document, outputStream);
document.open();
document.addTitle("Survey PDF");
ArrayList nameArrays = new ArrayList();
StringBuffer sb = new StringBuffer();
int i = -1;
for (String properties : textService.getAnswer()) {
nameArrays.add(properties);
i++;
}
for (int a= 0; a<=i; a++){
System.out.println("nameArrays.get(a) -"+nameArrays.get(a));
sb.append(nameArrays.get(a));
}
paragraph.add(sb.toString());
document.add(paragraph);
document.close();
}
textService.getAnswer() this - ArrayList<String>
Could you please advise how to separate the text in order each new sentence will be starting from new row?
Now I see like this:
You forgot the newline character \n and your code seems a bit overcomplicated.
Try this:
StringBuffer sb = new StringBuffer();
for (String property : textService.getAnswer()) {
sb.append(property);
sb.append('\n');
}
What about:
nameArrays.add(properties+"\n");
You might be able to fix that by simply appending "\n" to the strings that you collecting in your list; but I think: that very much depends on the PDF library you are using.
You see, "newlines" or "paragraphs" are to a certain degree about formatting. It seems like a conceptual problem to add that "formatting" information to the data that you are processing.
Meaning: you might want to check if your library allows you to provide strings - and then have the library do the formatting for you!
In other words: instead of giving strings with newlines; you should check if you can keep using strings without newlines, but if there is way to have the PDF library add line breaks were appropriate.
Side note on code quality: you are using raw types:
ArrayList nameArrays = new ArrayList();
should better be
ArrayList<String> names = new ArrayList<>();
[ I also changed the name - there is no point in putting the type of a collection into the variable name! ]
This method is for save values in array list into a pdf document. In the mfilePath variable "/" in here you can give folder name. As a example "/example/".
and also for mFileName variable you can use name. I give the date and time that document will created. don't give static name other vice your values are overriding in same pdf.
private void savePDF()
{
com.itextpdf.text.Document mDoc = new com.itextpdf.text.Document();
String mFileName = new SimpleDateFormat("YYYY-MM-DD-HH-MM-SS", Locale.getDefault()).format(System.currentTimeMillis());
String mFilePath = Environment.getExternalStorageDirectory() + "/" + mFileName + ".pdf";
try
{
PdfWriter.getInstance(mDoc, new FileOutputStream(mFilePath));
mDoc.open();
for(int d = 0; d < g; d++)
{
String mtext = answers.get(d);
mDoc.add(new Paragraph(mtext));
}
mDoc.close();
}
catch (Exception e)
{
}
}
I am new to Hindsight & Hadoop map reduce concept. I am trying to merge multiple XML files to a single XML file using map reduce program. My intention is to merge each XML file into a destination XML file by prepending and appending file name as start and end tag.
For eg. the below XML's should be merged into a single XML shown below
Input XML Files
<xml><a></a></xml>
<xml><b></b></xml>
<xml><c></c></xml>
Output XML File
<xml>
<File1Name><xml><a></a></xml><File2Name>
<File2Name><xml><b></b></xml><File3Name>
<File3Name><xml><c></c></xml><File3Name>
<xml>
Question 1: Is it possible to map a XML file to each mapper and create a key value pair, key as a file name and value as an each XML file prepending and appending file name as start and end tags and reducer to merge all XML's to a single context and output to XML shown above.
Question 2: How can i get file name as key in mapper code?
Answer 1:
I don't suggest sending just a single XML to a mapper unless the files are over 1gb a piece. You can send a list of xml locations to your mapper and then in your mapper code open each location and extract the data into your output.
Answer 2:
If using azure blob storage, you could list all the blobs in a container and assign them to the input split.
How to create your list of InputSplits:
ArrayList<InputSplit> ret = new ArrayList<InputSplit>();
/*Do this for each path we receive. Creates a directory of splits in this order s = input path (S1,1),(s2,1)…(sN,1),(s1,2),(sN,2),(sN,3) etc..
*/
for (int i = numMinNameHashSplits; i <= Math.min(numMaxNameHashSplits,numNameHashSplits–1); i++) {
for (Path inputPath : inputPaths) {
ret.add(new ParseDirectoryInputSplit(inputPath.toString(), i));
System.out.println(i + ” “+inputPath.toString());
}
}
return ret;
}
}
Once the List<InputSplits> is assembled, each InputSplit is handed to a Record Reader class where each Key, Value, pair is read then passed to the map task. The initialization of the recordreader class uses the InputSplit, a string representing the location of a “folder” of invoices in blob storage, to return a list of all blobs within the folder, the blobs variable below. The below Java code demonstrates the creation of the record reader for each hashslot and the resulting list of blobs in that location.
Public class ParseDirectoryFileNameRecordReader
extends RecordReader<IntWritable, Text> {
private int nameHashSlot;
private int numNameHashSlots;
private Path myDir;
private Path currentPath;
private Iterator<ListBlobItem> blobs;
private int currentLocation;
public void initialize(InputSplit split, TaskAttemptContext context)
throws IOException, InterruptedException {
myDir = ((ParseDirectoryInputSplit)split).getDirectoryPath();
//getNameHashSlot tells us which slot this record reader is responsible for
nameHashSlot = ((ParseDirectoryInputSplit)split).getNameHashSlot();
//gets the total number of hashslots
numNameHashSlots = getNumNameHashSplits(context.getConfiguration());
//gets the input credientals to the storage account assigned to this record reader.
String inputCreds = getInputCreds(context.getConfiguration());
//break the directory path to get account name
String[] authComponents = myDir.toUri().getAuthority().split(“#”);
String accountName = authComponents[1].split(“\\.”)[0];
String containerName = authComponents[0];
String accountKey = Utils.returnInputkey(inputCreds, accountName);
System.out.println(“This mapper is assigned the following account:”+accountName);
StorageCredentials creds = new StorageCredentialsAccountAndKey(accountName,accountKey);
CloudStorageAccount account = new CloudStorageAccount(creds);
CloudBlobClient client = account.createCloudBlobClient();
CloudBlobContainer container = client.getContainerReference(containerName);
blobs = container.listBlobs(myDir.toUri().getPath().substring(1) + “/”, true,EnumSet.noneOf(BlobListingDetails.class), null,null).iterator();
currentLocation = –1;
return;
}
Once initialized, the record reader is used to pass the next key to the map task. This is controlled by the nextKeyValue method, and it is called every time map task starts. The blow Java code demonstrates this.
//This checks if the next key value is assigned to this task or is assigned to another mapper. If it assigned to this task the location is passed to the mapper, otherwise return false
#Override
public boolean nextKeyValue() throws IOException, InterruptedException {
while (blobs.hasNext()) {
ListBlobItem currentBlob = blobs.next();
//Returns a number between 1 and number of hashslots. If it matches the number assigned to this Mapper and its length is greater than 0, return the path to the map function
if (doesBlobMatchNameHash(currentBlob) && getBlobLength(currentBlob) > 0) {
String[] pathComponents = currentBlob.getUri().getPath().split(“/”);
String pathWithoutContainer =
currentBlob.getUri().getPath().substring(pathComponents[1].length() + 1);
currentPath = new Path(myDir.toUri().getScheme(), myDir.toUri().getAuthority(),pathWithoutContainer);
currentLocation++;
return true;
}
}
return false;
}
The logic in the map function is than simply as follows, with inputStream containing the entire XML string
Path inputFile = new Path(value.toString());
FileSystem fs = inputFile.getFileSystem(context.getConfiguration());
//Input stream contains all data from the blob in the location provided by Text
FSDataInputStream inputStream = fs.open(inputFile);
Resources:
http://www.andrewsmoll.com/3-hacks-for-hadoop-and-hdinsight-clusters/ "Hack 3"
http://blogs.msdn.com/b/mostlytrue/archive/2014/04/10/merging-small-files-on-hdinsight.aspx
I have the below implementation.
csvReader = new CsvBeanReader(new InputStreamReader(stream), CsvPreference.STANDARD_PREFERENCE);
lastReadIdentity = (T) csvReader.read(Packages.class, Packages.COLS);
In my Packages.class
I have set my unitcount variable.
public String getUnitCount() {
return unitCount;
}
public void setUnitCount(String unitCount) {
this.unitCount = unitCount;
}
This works fine when it is taken as a string, but when taken as a integer, it throws the below exception. Please help
private int unitCount;
public int getUnitCount() {
return unitCount;
}
public void setUnitCount(int unitCount) {
this.unitCount = unitCount;
}
Exception:
org.supercsv.exception.SuperCsvReflectionException: unable to find method setUnitCount(java.lang.String) in class com.directv.sms.data.SubscriberPackages - check that the corresponding nameMapping element matches the field name in the bean, and the cell processor returns a type compatible with the field
context=null
at org.supercsv.util.ReflectionUtils.findSetter(ReflectionUtils.java:139)
at org.supercsv.util.MethodCache.getSetMethod(MethodCache.java:95)
I'm not sure about SuperCsv, but univocity-parsers should be able to handle this without a hitch, not to mention it is at least 3 times faster to parse your input.
Just annotate your class:
public class SubscriberPackages {
#Parsed(defaultNullRead = "0") // if the file contains nulls, then they will be converted to 0.
private int unitCount; // The attribute name will be matched against the column header in the file automatically.
}
To parse the CSV into beans:
// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<SubscriberPackages> rowProcessor = new BeanListProcessor<SubscriberPackages>(SubscriberPackages.class);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here, check the tutorial.
parserSettings.setRowProcessor(rowProcessor); //uses the bean processor to handle your input rows
parserSettings.setHeaderExtractionEnabled(true); // extracts header names from the input file.
CsvParser parser = new CsvParser(parserSettings); //creates a parser with your settings.
parser.parse(new FileReader(new File("/path/to/file.csv"))); //all rows parsed here go straight to the bean processor
// The BeanListProcessor provides a list of objects extracted from the input.
List<SubscriberPackages> beans = rowProcessor.getBeans();
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
Im using the FasterXML library to parse my CSV file. The CSV file has the column names in its first line. Unfortunately I need the columns to be renamed. I have a lambda function for this, where I can pass the red value from the csv file in and get the new value.
my code looks like this, but does not work.
CsvSchema csvSchema =CsvSchema.emptySchema().withHeader();
ArrayList<HashMap<String, String>> result = new ArrayList<HashMap<String, String>>();
MappingIterator<HashMap<String,String>> it = new CsvMapper().reader(HashMap.class)
.with(csvSchema )
.readValues(new File(fileName));
while (it.hasNext())
result.add(it.next());
System.out.println("changing the schema columns.");
for (int i=0; i < csvSchema.size();i++) {
String name = csvSchema.column(i).getName();
String newName = getNewName(name);
csvSchema.builder().renameColumn(i, newName);
}
csvSchema.rebuild();
when i try to print out the columns later, they are still the same as in the top line of my CSV file.
Additionally I noticed, that csvSchema.size() equals 0 - why?
You could instead use uniVocity-parsers for that. The following solution streams the input rows to the output so you don't need to load everything in memory to then write your data back with new headers. It will be much faster:
public static void main(String ... args) throws Exception{
Writer output = new StringWriter(); // use a FileWriter for your case
CsvWriterSettings writerSettings = new CsvWriterSettings(); //many options here - check the documentation
final CsvWriter writer = new CsvWriter(output, writerSettings);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here as well
parserSettings.setHeaderExtractionEnabled(true); // indicates the first row of the input are headers
parserSettings.setRowProcessor(new AbstractRowProcessor(){
public void processStarted(ParsingContext context) {
writer.writeHeaders("Column A", "Column B", "... etc");
}
public void rowProcessed(String[] row, ParsingContext context) {
writer.writeRow(row);
}
public void processEnded(ParsingContext context) {
writer.close();
}
});
CsvParser parser = new CsvParser(parserSettings);
Reader reader = new StringReader("A,B,C\n1,2,3\n4,5,6"); // use a FileReader for your case
parser.parse(reader); // all rows are parsed and submitted to the RowProcessor implementation of the parserSettings.
System.out.println(output.toString());
//nothing else to do. All resources are closed automatically in case of errors.
}
You can easily select the columns by using parserSettings.selectFields("B", "A") in case you want to reorder/eliminate columns.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).