Apache Commons CSV : Read Values with comma - java

I am converting CSV files to a Java Bean. I need to maintain the comma inside a value which is enclosed in "".
Here is my code.
public static PPRCV convertContestToObj(String fileName) throws IOException {
PPRCV pprcvHandler = PPRCVFactory.getPPRCVTable(fileName);
CSVFormat csvFileFormat = CSVFormat.DEFAULT.newFormat(',').withEscape('"');
List<PPRCV> pprcvs = new ArrayList<>();
FileReader fileReader = new FileReader(fileName);
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List<CSVRecord> csvRecords = csvFileParser.getRecords();
for (CSVRecord csvRecord : csvRecords) {
pprcvs.add(pprcvHandler.populateDynamicDetails(csvRecord));
}
return pprcvHandler;
}
Sample CSV line:
7080001, XI, ProvinceX, TownX, BRGX, "SHOOL, BRGX", "0054A,0055A,0055B,0055C"
my DTO
private String precintCode;
private String regionName;
private String provinceName;
private String municipalityName;
private String districtName;
private String votingCenter;
private String precint;
My expected output should be
precintCode = "7080001"
regionName = "XI"
provinceName = "ProvinceX"
municipalityName = "TownX"
districtName = "BRGX"
votingCenter = "SCHOOL, BRGX"
precint = "0054A,0055A,0055B,0055C"
However actual output is this
precintCode = "7080001"
regionName = "XI"
provinceName = "ProvinceX"
municipalityName = "TownX"
districtName = "BRGX"
votingCenter = ""SCHOOL"
precint = " , BRGX,"0054A"

You need the withIgnoreSurroundingSpaces() option here. All other settings can remain at their DEFAULT values.
final Reader in = new StringReader("7080001, XI, ProvinceX, TownX, BRGX, \"SHOOL, BRGX\", \"0054A,0055A,0055B,0055C\" ");
final CSVFormat csvFileFormat = CSVFormat.DEFAULT.withIgnoreSurroundingSpaces();
for (CSVRecord record: csvFileFormat.parse(in)) {
for (String field: record) {
System.out.println("\"" + field + "\"");
}
System.out.println();
}
The output is
"7080001"
"XI"
"ProvinceX"
"TownX"
"BRGX"
"SHOOL, BRGX"
"0054A,0055A,0055B,0055C"

I was able to do it using the withQuote function from the library.
CSVFormat.EXCEL.newFormat(',').withQuote('"')

Have you already tried using the CSVFormat.DEFAULT constant? It is meant for CSV files adhering to RFC 4180.

The following way worked for me:
CSVFormat.EXCEL.withQuote('"')
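On the withIgnoreSurroundingSpaces() suggestion above, it may help to see why the space after each comma matters. The following is a simplified plain-Java model of the rule, not the Commons CSV implementation: a quote only opens a quoted field when it is the first character of that field, so in `, "SHOOL, BRGX"` the space comes first and the quote is treated as ordinary data unless surrounding spaces are skipped. The names `QuotedCsvLine` and `split` are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedCsvLine {

    /** Splits one CSV line on commas, honoring double-quoted fields. */
    static List<String> split(String line, boolean ignoreSurroundingSpaces) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (true) {
            if (ignoreSurroundingSpaces) {
                // Skip spaces so that a quote after ", " is seen as the field start.
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // consume the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // a doubled quote is a literal quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        break;
                    } else {
                        field.append(c); // commas inside quotes are plain data
                        i++;
                    }
                }
                // Ignore anything (such as trailing spaces) up to the next delimiter.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
                fields.add(field.toString());
            } else {
                // Unquoted field: a quote found here is kept as ordinary data.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
                String raw = field.toString();
                fields.add(ignoreSurroundingSpaces ? raw.trim() : raw);
            }
            if (i >= n) {
                break;
            }
            i++; // consume the comma
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "7080001, XI, ProvinceX, TownX, BRGX, \"SHOOL, BRGX\", \"0054A,0055A,0055B,0055C\"";
        System.out.println(split(line, true).size());  // 7
        System.out.println(split(line, false).size()); // 11
    }
}
```

With spaces skipped, the sample line yields seven fields; without that, the quoted values are cut at every comma and the line falls apart into eleven pieces.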

Related

How to deserialize multiple objects sequentially using jackson-databind

I am using msgpack to serialize data. I have some code that serializes the data without problems.
public void testJackson() throws Exception {
ByteArrayOutputStream out = new ByteArrayOutputStream();
String data1 = "test data";
int data2 = 10;
List<String> data3 = new ArrayList<String>();
data3.add("list data1");
data3.add("list data1");
ObjectMapper mapper = new ObjectMapper();
mapper.writeValue(out, data1);
mapper.writeValue(out, data2);
mapper.writeValue(out, data3);
// TODO: How to deserialize?
}
But now I don't know how to deserialize the data.
I have not found a solution anywhere. It would be great if anyone could help me with how to proceed.
The problem
I have tried many of the readValue methods, but I can only get the first String; I have no idea how to get the second and third values.
The thing is, Jackson always reads the first value, because the data is neither removed from the input nor did you explicitly tell Jackson that the next value runs from position A to position B.
Solutions
This example works and is similar to your code, but it is not very elegant. Here I explicitly tell Jackson where each value is, but I have to know how it was written, which makes the solution far too specific.
File dataFile = new File("jackson.txt");
if(!dataFile.exists())
dataFile.createNewFile();
FileOutputStream fileOut = new FileOutputStream(dataFile);
ByteArrayOutputStream out = new ByteArrayOutputStream();
FileInputStream fileIn = new FileInputStream(dataFile);
String writeData1 = "test data";
int writeData2 = 10;
List<String> writeData3 = new ArrayList<String>();
writeData3.add("list data1");
writeData3.add("list data1");
ObjectMapper mapper = new ObjectMapper();
byte[] writeData1Bytes = mapper.writeValueAsBytes(writeData1);
out.write(writeData1Bytes);
byte[] writeData2Bytes = mapper.writeValueAsBytes(writeData2);
out.write(writeData2Bytes);
byte[] writeData3Bytes = mapper.writeValueAsBytes(writeData3);
out.write(writeData3Bytes);
out.writeTo(fileOut);
// TODO: How to deserialize?
int pos = 0;
byte[] readData = new byte[1000];
fileIn.read(readData);
String readData1 = mapper.readValue(readData, pos, writeData1Bytes.length, String.class);
pos += writeData1Bytes.length;
Integer readData2 = mapper.readValue(readData, pos, writeData2Bytes.length, Integer.class);
pos += writeData2Bytes.length;
ArrayList readData3 = mapper.readValue(readData, pos, writeData3Bytes.length, ArrayList.class);
pos += writeData3Bytes.length;
System.out.printf("readData1 = %s%n", readData1);
System.out.printf("readData2 = %s%n", readData2);
System.out.printf("readData3 = %s%n", readData3);
The file then looks like this:
"test data"10["list data1","list data1"]
How to do it correctly
A much more elegant way is to encapsulate your data in an object that can be turned into a valid JSON string; Jackson then needs no further information to read it back.
public class JacksonTest {
public static class DataNode {
@JsonProperty("data1")
private String data1;
@JsonProperty("data2")
private int data2;
@JsonProperty("data3")
private List<String> data3;
//needed for Jackson
public DataNode() {
}
public DataNode(String data1, int data2, List<String> data3) {
this.data1 = data1;
this.data2 = data2;
this.data3 = data3;
}
}
public static void main(String[] args) throws Exception {
File dataFile = new File("jackson.txt");
if(!dataFile.exists())
dataFile.createNewFile();
FileOutputStream fileOut = new FileOutputStream(dataFile);
ByteArrayOutputStream out = new ByteArrayOutputStream();
FileInputStream fileIn = new FileInputStream(dataFile);
String writeData1 = "test data";
int writeData2 = 10;
List<String> writeData3 = new ArrayList<String>();
writeData3.add("list data1");
writeData3.add("list data1");
DataNode writeData = new DataNode(writeData1, writeData2, writeData3);
ObjectMapper mapper = new ObjectMapper();
mapper.writeValue(out, writeData);
out.writeTo(fileOut);
// TODO: How to deserialize?
DataNode readData = mapper.readValue(fileIn, DataNode.class);
System.out.printf("readData1 = %s%n", readData.data1);
System.out.printf("readData2 = %s%n", readData.data2);
System.out.printf("readData3 = %s%n", readData.data3);
}
}
The content of the file looks like this:
{"data1":"test data","data2":10,"data3":["list data1","list data1"]}
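If the values have to stay separate rather than wrapped in one object, the offset-based approach only works because the writer's lengths are still in scope. One common way around that, sketched here with plain `java.io` rather than Jackson, is to store each value's byte length in front of it so the reader can find the boundaries on its own. `LengthPrefixedFrames` and its methods are hypothetical names; the payloads could be the byte arrays returned by `mapper.writeValueAsBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthPrefixedFrames {

    /** Writes every payload as a 4-byte length followed by the payload bytes. */
    static byte[] writeFrames(List<byte[]> payloads) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] payload : payloads) {
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the frames back in the order they were written. */
    static List<byte[]> readFrames(byte[] data) {
        List<byte[]> frames = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] frame = new byte[in.readInt()]; // the length prefix tells us how far to read
                in.readFully(frame);
                frames.add(frame);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return frames;
    }

    public static void main(String[] args) {
        List<byte[]> values = Arrays.asList(
                "\"test data\"".getBytes(StandardCharsets.UTF_8),
                "10".getBytes(StandardCharsets.UTF_8),
                "[\"list data1\",\"list data1\"]".getBytes(StandardCharsets.UTF_8));
        for (byte[] frame : readFrames(writeFrames(values))) {
            System.out.println(new String(frame, StandardCharsets.UTF_8));
        }
    }
}
```

Each recovered frame can then be passed to `mapper.readValue(frame, SomeType.class)` on its own; the reader still has to know which type to expect for each frame.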
You'll want to use one of the readValue methods from ObjectMapper - probably one that has a Reader or InputStream as the first parameter.
@Japu_D_Cret Thank you for such a detailed answer!
Actually, I want to use msgpack to transfer data, and I got it working with msgpack. Here is my code:
ByteArrayOutputStream out = new ByteArrayOutputStream();
String data1 = "test data";
int data2 = 10;
List<String> data3 = new ArrayList<String>();
data3.add("list data1");
data3.add("list data1");
MessagePack packer = new MessagePack();
packer.write(out, data1);
packer.write(out, data2);
packer.write(out, data3);
// TODO: How to deserialize?
BufferUnpacker unpacker = packer.createBufferUnpacker(out.toByteArray());
System.out.println(unpacker.readString());
System.out.println(unpacker.readInt());
System.out.println(unpacker.read(Templates.tList(Templates.TString)));
Then I found jackson-databind on the msgpack website; it supports the msgpack format as well.
I ran some tests on the two and found that Jackson's serialization performance is better than msgpack's, so I want to use Jackson instead of msgpack.

How to read long cell values using OpenCSV?

I am importing a CSV file. One cell holds a value such as 1.00E+13, which has to be read as 10023000000000 (the actual expanded value).
I am using OpenCSV.
Note: I achieved this with Apache POI for XLS and XLSX files. Since POI doesn't support CSV, I am using OpenCSV.
POI's solution to above scenario is:
FormulaEvaluator objFormulaEvaluator = new HSSFFormulaEvaluator(hWorkbook);
HSSFCell cellValue = row.getCell(1); //input as 1.00E+13
objFormulaEvaluator.evaluate(cellValue); //result as 10023000000000
How do I achieve this in OpenCSV?
Thanks in advance.
Kenny's answer is dead on if you are parsing the Strings from the csvReader yourself. My only concern is that the csv file is strings so the input will have to be 1.0023E+13 not 1.00E+13.
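For the parse-it-yourself route, here is a minimal sketch of one way to expand such a cell, assuming `java.math.BigDecimal` is acceptable for the job (the answer referred to above is not shown here, so this is not necessarily its method). `ScientificCell` and `toPlainDigits` are hypothetical names. It also illustrates the concern about precision: the text `1.00E+13` can only ever expand to 10000000000000.

```java
import java.math.BigDecimal;

final class ScientificCell {

    /** Expands a numeric CSV cell such as "1.0023E+13" into plain digits. */
    static String toPlainDigits(String cell) {
        // BigDecimal rejects surrounding whitespace, so trim the cell first.
        return new BigDecimal(cell.trim()).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainDigits("1.0023E+13 ")); // 10023000000000
        System.out.println(toPlainDigits("1.00E+13"));    // 10000000000000
    }
}
```

The string returned by `toPlainDigits` can be stored as text or converted with `Long.parseLong` when the value is known to be a whole number within the `long` range.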
All that said, there is another way in opencsv: the CsvToBean class. You create a class that holds the values you want, with public getters and setters, and CsvToBean handles the conversions for you, as the passing test below shows.
public class CsvToBeanDoubleTest {
private static final double DOUBLE_NUMBER = 10023000000000d;
private static final String TEST_STRING = "name,orderNumber,doubleNum\n" +
"kyle,abc123456,10023000000000\n" +
"jimmy,def098765,1.0023E+13 ";
private CSVReader createReader() {
return createReader(TEST_STRING);
}
private CSVReader createReader(String testString) {
StringReader reader = new StringReader(testString);
return new CSVReader(reader);
}
private MockBean createMockBean(String name, String orderNumber, double num) {
MockBean mockBean = new MockBean();
mockBean.setName(name);
mockBean.setOrderNumber(orderNumber);
mockBean.setDoubleNum(num);
return mockBean;
}
@Test
public void parseBeanWithNoAnnotations() {
HeaderColumnNameMappingStrategy<MockBean> strategy = new HeaderColumnNameMappingStrategy<MockBean>();
strategy.setType(MockBean.class);
CsvToBean<MockBean> bean = new CsvToBean<MockBean>();
List<MockBean> beanList = bean.parse(strategy, createReader());
assertEquals(2, beanList.size());
assertTrue(beanList.contains(createMockBean("kyle", "abc123456", DOUBLE_NUMBER)));
assertTrue(beanList.contains(createMockBean("jimmy", "def098765", DOUBLE_NUMBER)));
}
}

How to add CDATA in a XML without the Loss of <br/> tag in java?

I need to wrap the String temp1 in a CDATA section while retaining the <br/> tag.
The program and samples are listed below:
i) program-AddCDATASectionToDOMDocument.java
ii) input xml
iii) required output
i) program-AddCDATASectionToDOMDocument.java
public class AddCDATASectionToDOMDocument {
public static void main(String[] args) throws Exception {
xmlreader xmlr = new xmlreader();
String temp1 = xmlr.xmlFileReader("example.xml", "contentmeta","subtitle");
String temp2 = "<![CDATA[" + temp1 + "]]>";
xmlr.xmlFileWriter("example.xml", "contentmeta", "subtitle", temp2);
}
}
ii) example.xml
iii) required output
How about using regular expressions instead of parsing it with DOM? This code may work with your example:
String input = new String(Files.readAllBytes(Paths.get("file1.xml")));
final Pattern regex = Pattern.compile("<subtitle>(.+?)</subtitle>");
final Matcher matcher = regex.matcher(input);
String modification;
if (matcher.find()) {
modification = "<subtitle><![CDATA["+matcher.group(1)+"]]></subtitle>";
String output = matcher.replaceFirst(modification);
System.out.println(output);
FileOutputStream outputStream = new FileOutputStream("file2.xml");
outputStream.write(output.getBytes());
}
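Two details of the regex approach are easy to trip over, so here is a hedged variant as a small helper. `Pattern.DOTALL` lets the match span line breaks inside the subtitle, and `Matcher.quoteReplacement` stops a `$` or `\` in the subtitle text from being interpreted as a group reference by `replaceFirst`. The sample XML is not included in the post, so the input in `main` is an assumed shape, and `SubtitleCdata` is a made-up name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubtitleCdata {

    // DOTALL lets '.' match line breaks, in case the subtitle spans several lines.
    private static final Pattern SUBTITLE =
            Pattern.compile("<subtitle>(.+?)</subtitle>", Pattern.DOTALL);

    /** Wraps the content of the first subtitle element in a CDATA section. */
    static String wrap(String xml) {
        Matcher matcher = SUBTITLE.matcher(xml);
        if (!matcher.find()) {
            return xml; // nothing to wrap
        }
        String replacement = "<subtitle><![CDATA[" + matcher.group(1) + "]]></subtitle>";
        // quoteReplacement keeps '$' and '\' in the text from being read as group references.
        return matcher.replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Assumed input shape; the original example.xml is not shown in the question.
        String xml = "<contentmeta><subtitle>Price: $5<br/>per unit</subtitle></contentmeta>";
        System.out.println(wrap(xml));
    }
}
```

Because the markup is copied as text rather than parsed, the `<br/>` tag ends up inside the CDATA section unchanged. Text that already contains `]]>` would still need separate handling.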

pass values from java variables to microsoft word (doc and docx) variables

How can I replace the value of a Microsoft Word document variable with the value of a Java variable? I have a .doc or .docx template file in which I have defined some variables.
When the user clicks the download button in my app, the .doc or .docx variables must receive their values from Java variables.
I use docx4j for that purpose:
String inputfilepath = "binding-simple1.docx";
String outputfilepath = "OUT_VariableReplace.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(new java.io.File(inputfilepath));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("subjectId", "E000001");
// Approach 1 (from 3.0.0; faster if you haven't yet caused unmarshalling to occur):
documentPart.variableReplace(mappings);
Docx4J.save(wordMLPackage, new File(outputfilepath));
The variable in the template is written like this: ${subjectId}
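To make the `${name}` convention concrete, here is a plain-Java sketch of the substitution on an ordinary string. It only illustrates the idea of mapping placeholder names to values; it is not how docx4j works internally, and `PlaceholderDemo` and `fill` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderDemo {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${key} that has a mapping; unknown placeholders are left as-is. */
    static String fill(String template, Map<String, String> mappings) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = mappings.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' in values and in untouched placeholders literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("subjectId", "E000001");
        System.out.println(fill("Subject: ${subjectId}", mappings)); // Subject: E000001
    }
}
```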
I am responsible for a Play! intranet that generates word documents using templates that have a .docx extension. To achieve this, we have the following inheritance tree : Document > Word > [someDocument]
The abstract class Word handles replacing variables in Word documents
public abstract class Word extends Document {
public static JAXBContext context = org.docx4j.jaxb.Context.jc;
public Word(String inputfilepath){
super(inputfilepath);
}
public String generer(String outputfilepath) throws Exception {
//String inputfilepath = System.getProperty("user.dir")+"/app/doc/adhesionTemplate.docx";
//String outputfilepath = System.getProperty("user.dir")+ "/test-out.docx";
// Open a document from the file system
// 1. Load the Package
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
// 2. Fetch the document part
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
org.docx4j.wml.Document wmlDocumentEl = (org.docx4j.wml.Document) documentPart.getJaxbElement();
// xml --> string
String xml = XmlUtils.marshaltoString(wmlDocumentEl, true);
//Change the variables using an abstract function getMapping()
HashMap<String, String> mappings = getMapping();
Object obj = XmlUtils.unmarshallFromTemplate(xml, mappings);
// change JaxbElement
documentPart.setJaxbElement((org.docx4j.wml.Document) obj);
//Footers :
List<SectionWrapper> wrappers = wordMLPackage.getDocumentModel().getSections();
for (SectionWrapper sw : wrappers) {
FooterPart footer = sw.getHeaderFooterPolicy().getDefaultFooter();
if (footer != null) {
Ftr footerDoc = footer.getJaxbElement();
String footerXml = XmlUtils.marshaltoString(footerDoc, true);
Object footerObj = XmlUtils.unmarshallFromTemplate(footerXml, mappings);
footer.setJaxbElement( (Ftr) footerObj);
}
}
// Save it
SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
saver.save(outputfilepath);
Console.commande("sudo chmod 660 \"" + outputfilepath + "\"");
System.out.println("Saved output to:" + outputfilepath);
return outputfilepath;
}
}
Then, we have classes that inherit from this Word abstract class:
public class FAC extends Word {
public FAC() {
super(System.getProperty("user.dir") + "/templates/Facture.docx");
}
@Override
public HashMap<String, String> getMapping() {
//Preparing variables
int price = blablabla;
HashMap<String, String> map = new HashMap<String, String>();
map.put("FACDate", Utils.dateConvert(new Date()));
map.put("somePrice", String.valueOf(price));
return map;
}
}
Note: the "Document" superclass is nothing special; it just has a variable "inputFilePath" and the abstract method getMapping().
Hope this helps, either you or future viewers like me :P

How to vectorize text file in mahout?

I have a text file with labels and tweets.
positive,I love this car
negative,I hate this book
positive,Good product.
I need to convert each line into a vector. If I use the seq2sparse command, the whole document gets converted into a vector, but I need each line converted into its own vector, not the whole document.
Example:
key : positive value : vectorvalue(tweet)
How can we achieve this in mahout?
/* Here is what i have done */
StringTokenizer str= new StringTokenizer(line,",");
String label=str.nextToken();
while (str.hasMoreTokens())
{
tweetline =str.nextToken();
System.out.println("Tweetline"+tweetline);
StringTokenizer words = new StringTokenizer(tweetline," ");
while(words.hasMoreTokens()){
featureList.add(words.nextToken());}
}
Vector unclassifiedInstanceVector = new RandomAccessSparseVector(tweetline.split(" ").length);
FeatureVectorEncoder vectorEncoder = new AdaptiveWordValueEncoder(label);
vectorEncoder.setProbes(1);
System.out.println("Feature List: "+featureList);
for (Object feature: featureList) {
vectorEncoder.addToVector((String) feature, unclassifiedInstanceVector);
}
context.write(new Text("/"+label), new VectorWritable(unclassifiedInstanceVector));
Thanks in advance
You can write it to an HDFS path with SequenceFile.Writer:
FS = FileSystem.get(HBaseConfiguration.create());
String newPath = "/foo/mahouttest/part-r-00000";
Path newPathFile = new Path(newPath);
Text key = new Text();
VectorWritable value = new VectorWritable();
SequenceFile.Writer writer = SequenceFile.createWriter(FS, conf, newPathFile,
key.getClass(), value.getClass());
.....
key.set("c/"+label);
value.set(unclassifiedInstanceVector );
writer.append(key,value);
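As a library-free illustration of the per-line encoding discussed in the question, the sketch below turns one `label,tweet` line into a label plus a fixed-size hashed word-count vector. It is not Mahout's FeatureVectorEncoder; a plain `double[]` stands in for RandomAccessSparseVector, and `LineVectorizer`, `encode` and the cardinality of 16 are assumptions made for the example. It differs from the question's code in two ways: the line is split at the first comma only, so commas inside a tweet survive, and every line gets the same vector size.

```java
import java.util.AbstractMap;
import java.util.Locale;
import java.util.Map;

final class LineVectorizer {

    /** Encodes "label,tweet text" as (label, hashed word counts of length cardinality). */
    static Map.Entry<String, double[]> encode(String line, int cardinality) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'label,text' but got: " + line);
        }
        String label = line.substring(0, comma).trim();
        String tweet = line.substring(comma + 1).trim();

        double[] vector = new double[cardinality];
        for (String word : tweet.split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Hashing trick: each word is mapped to a slot by its hash code.
            int index = Math.floorMod(word.toLowerCase(Locale.ROOT).hashCode(), cardinality);
            vector[index] += 1.0;
        }
        return new AbstractMap.SimpleEntry<>(label, vector);
    }

    public static void main(String[] args) {
        Map.Entry<String, double[]> encoded = encode("positive,I love this car", 16);
        System.out.println(encoded.getKey() + " -> " + java.util.Arrays.toString(encoded.getValue()));
    }
}
```

Each returned pair corresponds to one key/value record, which is the shape that `writer.append(key, value)` expects once the array is copied into a Mahout vector.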
