Ignoring delimiter in quote-enclosed field in Apache Commons CSV / OpenCSV? - java

I have to parse a csv file which has fields that can look like the following:
("FOO, BAR BAZ", 42)
And yield the two fields:
FOO, BAR BAZ
42
I'm not sure how to do this succinctly using Apache Commons CSV or OpenCSV, so I'm looking for some guidance. It may just be that I don't fully understand the org.apache.commons.csv.CSVFormat property "quoteChar" which is touched on in the documentation but never clearly explained anywhere I could find. If so, it'd be very helpful if you could point me towards better documentation of that feature.
Here's a brief example that shows my problem as well as what I've tried and the results:
String line = "(\"FOO, BAR BAZ\", 42)";
int numTries = 5;
CSVParser[] tries = new CSVParser[numTries];
tries[0] = CSVParser.parse(line, CSVFormat.DEFAULT.withRecordSeparator("\n"));//BAR BAZ"
tries[1] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote('"'));//BAR BAZ"
tries[2] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote(null));//BAR BAZ"
tries[3] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote('"').withQuoteMode(QuoteMode.NON_NUMERIC));//BAR BAZ"
tries[4] = CSVParser.parse(line, CSVFormat.DEFAULT.withRecordSeparator(")\n("));//BAR BAZ"
for(int i = 0; i < numTries; i++){
CSVRecord record = tries[i].getRecords().get(0);
System.out.println(record.get(1));//.equals("42"));
}
Note that it works fine if you exclude the parentheses from the input.

You can use OpenCSV's CSVReader to read the data and get the data elements as shown below:
public static void main(String[] args) {
try(FileReader fr = new FileReader(new File("C:\\Sample.txt"));
CSVReader csvReader = new CSVReader(fr);) {
String[] data = csvReader.readNext();
for(String data1 : data) {
System.out.println(data1);
}
} catch (IOException e) {
e.printStackTrace();
}
}

You can achieve this with opencsv as follows:
import com.opencsv.CSVReader;
import java.io.FileReader;
import java.io.IOException;
public class NewClass1 {
public static void main(String[] args) throws IOException {
String fileName = "C:\\yourFile.csv";
String [] nextLine;
// use the three-arg constructor to tell the reader which delimiter your file uses (2nd arg: here ',')
// you can change this to '\t' for a tab-separated file, or to ';' or ':' ... whatever your delimiter is
// (3rd arg) '"' if your fields are double-quoted, or '\'' if single-quoted; omit the 3rd arg if the fields are not quoted
CSVReader reader = new CSVReader(new FileReader(fileName), ',' ,'"');
// nextLine[] is an array of values from the line
// each line represented by String[], and each field as an element of the array
while ((nextLine = reader.readNext()) != null) {
System.out.println("nextLine[0]: " +nextLine[0]);
System.out.println("nextLine[1]: " +nextLine[1]);
}
}
}

For me, the default format of commons-csv does the right thing for correctly formatted CSV input:
Reader in = new StringReader("\"FOO, BAR BAZ\", 42");
Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (CSVRecord record : records) {
for(int i = 0;i < record.size();i++) {
System.out.println("At " + i + ": " + record.get(i));
}
}
Leads to:
At 0: FOO, BAR BAZ
At 1: 42
For the specially formatted lines you likely need a bit more handling to remove those parentheses:
BufferedReader lineReader = new BufferedReader(
new StringReader("(\"FOO, BAR BAZ\", 42)\n(\"FOO, BAR FOO\", 44)"));
while(true) {
String line = lineReader.readLine();
if (line == null) {
break;
}
String adjustedLine = line.substring(1, line.length() - 1);
records = CSVFormat.DEFAULT.parse(new StringReader(adjustedLine));
for (CSVRecord record : records) {
for (int i = 0; i < record.size(); i++) {
System.out.println("At " + i + ": " + record.get(i));
}
}
}
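To see why the parentheses matter, here is a minimal hand-written sketch of a quote-aware splitter. It is not the implementation used by Commons CSV or OpenCSV; it only assumes, as CSV parsers conventionally do, that a quote opens an enclosed field only when it is the first character of that field. The names `splitQuoted` and `stripParens` are hypothetical, and the sketch ignores escaped quotes and trims every field.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {

    // Splits one line on commas. A quote is treated as an opening quote only
    // when it is the very first character of a field (assumption of this sketch).
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    inQuotes = false;      // closing quote ends the enclosed part
                } else {
                    current.append(c);     // a comma inside quotes is plain text
                }
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;           // opening quote is not part of the value
            } else if (c == ',') {
                fields.add(current.toString().trim());
                current.setLength(0);
                atFieldStart = true;
                continue;
            } else {
                current.append(c);         // includes quotes that appear mid-field
            }
            atFieldStart = false;
        }
        fields.add(current.toString().trim());
        return fields;
    }

    // Removes one pair of surrounding parentheses, if present.
    static String stripParens(String line) {
        String t = line.trim();
        if (t.length() >= 2 && t.startsWith("(") && t.endsWith(")")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        String line = "(\"FOO, BAR BAZ\", 42)";
        for (String field : splitQuoted(line)) {
            System.out.println("raw:      " + field);
        }
        for (String field : splitQuoted(stripParens(line))) {
            System.out.println("stripped: " + field);
        }
    }
}
```

On the raw line the leading `(` comes before the quote, so the quote is kept as ordinary text and the comma inside it splits the field, giving three fields. After stripping the parentheses the quote is first in its field and the line yields the two expected fields.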

Related

Java: Getting a substring from a string in Text file starting after a special word

I would like to extract the name and age from the text file below. Can someone please help?
The text content :
fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk
Name: Max Pain
Age: 99 Years
and they df;ml dk fdj,nbfdlkn ......
Code:
package myclass;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\Ss\\Desktop\\s.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null)
System.out.println(st.substring(st.lastIndexOf("Name:")));
// System.out.println(st);
}
}
Please try the code below.
public static void main(String[] args)throws Exception
{
File file = new File("/root/test.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null) {
if(st.lastIndexOf("Name:") >= 0 || st.lastIndexOf("Age:") >= 0) {
System.out.println(st.substring(st.lastIndexOf(":")+1));
}
}
}
You can use the replace method of the String class. Because String is immutable, replace returns a new string and leaves the original unchanged.
while ((st = br.readLine()) != null) {
if(st.startsWith("Name:")) {
String name = st.replace("Name:", "").trim();
// the age is expected on the line right after the name
st = br.readLine();
String age="";
if(st!= null && st.startsWith("Age:")) {
age = st.replace("Age:", "").trim();
}
// now you should have the name and the age in those variables
}
}
This will do the job:
public static void main(String[] args) {
String str = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk Name: Max Pain Age: 99 Years and they df;ml dk fdj,nbfdlkn";
String[] split = str.split("(\\b: \\b)");
//\b represents an anchor like caret
// (it is similar to $ and ^)
// matching positions where one side is a word character (like \w) and
// the other side is not a word character
// (for instance it may be the beginning of the string or a space character).
System.out.println(split[1].replace("Age",""));
System.out.println(split[2].replaceAll("\\D+",""));
// remove every non-digit character, leaving only the age
}
Output:
Max Pain
99
If they can occur on the same line and you want a pattern that does not overmatch, you could use a capturing group and a tempered greedy token.
\b(?:Name|Age):\h*((?:.(?!(?:Name|Age):))+)
For example
final String regex = "\\b(?:Name|Age):\\h*((?:.(?!(?:Name|Age):))+)";
final String string = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain\n"
+ "Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......\n\n"
+ "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
Output
Max Pain
99 Years
Max Pain
99 Years

Read specific row in csv

Is there any way to read a specific row from a CSV file based on a value in it? For example, my csv file is:
ProductID,ProductName,price,availability,type
12345,Anaox,300,yes,medicine
23456,Chekmeter,400,yes,testing
I want to get the row whose ProductID is '23456'. I was checking the new CsvReader("D:\products.csv").getRawRecord() method, but it doesn't take any parameters.
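// Sketch assuming JavaCSV's com.csvreader.CsvReader (readHeaders, readRecord, get, getRawRecord)
CsvReader reader = new CsvReader("D:\\products.csv");
reader.readHeaders(); // skip the header row
while (reader.readRecord()) {
// compare strings with equals(), not ==
if ("23456".equals(reader.get(0))) {
System.out.println("Found the row: " + reader.getRawRecord());
}
}
reader.close();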
And regarding your concern about performance:
With 1,000 elements this will still be far faster than you need it to be; start worrying when you get to 1,000,000 elements.
If you want reading the CSV to be performant, you need a performant way of storing your IDs. If you just increment the ID every time a new one is created and never delete an ID, you can use the ID as an index to get the correct line directly.
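One way to get a fast lookup by ID is to read the file once and keep the rows in a HashMap keyed by ProductID. This is only a sketch under stated assumptions: `ProductIndex` and `indexById` are hypothetical names, the first column is taken to be the ID, and the naive `split(",")` assumes no field contains a quoted comma. In practice the lines could come from `Files.readAllLines`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProductIndex {

    // Builds a map from the first column (the ID) to the fields of that row.
    static Map<String, String[]> indexById(List<String> lines) {
        Map<String, String[]> index = new HashMap<>();
        // start at 1 to skip the header row
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue;                            // ignore blank lines
            }
            String[] fields = line.split(",", -1);   // naive split: no quoted commas
            index.put(fields[0], fields);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ProductID,ProductName,price,availability,type",
                "12345,Anaox,300,yes,medicine",
                "23456,Chekmeter,400,yes,testing");
        Map<String, String[]> index = indexById(lines);
        String[] row = index.get("23456");           // hash lookup instead of a scan
        System.out.println("Found the row: " + String.join(", ", row));
    }
}
```

Building the map costs one pass over the file, so it pays off only when many lookups are made against the same data.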
public static ArrayList<String> getSpecificRowData(String csvFile) throws IOException
{
ArrayList<String> arr = new ArrayList<>();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
String line;
while ((line = br.readLine()) != null) {
// replace the placeholder with the value that identifies your row
if(line.contains("write your row name"))
{
arr.add(line);
}
}
}
return arr;
}
//return the row in the form of ArrayList
ArrayList<String> li=getSpecificRowData("FileName12.csv");
System.out.println(li);

How to find similar lines in two text files irrespective of the line number at which they occur

I am trying to open two text files and find similar lines in them.
My code is correctly reading all the lines from both the text files.
I have used nested for loops to compare line1 of first text file with all lines of second text file and so on.
However, it is only detecting similar lines which have same line number,
(eg. line 1 of txt1 is cc cc cc and line 1 of txt2 is cc cc cc, then it correctly finds and prints it),
but it doesn't detect same lines on different line numbers in those files.
import java.io.*;
import java.util.*;
public class FeatureSelection500 {
public static void main(String[] args) throws FileNotFoundException, IOException {
// TODO code application logic here
File f1 = new File("E://implementation1/practise/ComUpdatusPS.exe.hex-04-ngrams-Freq.txt");
File f2 = new File("E://implementation1/practise/top-300features.txt");
Scanner scan1 = new Scanner(f1);
Scanner scan2 = new Scanner(f2);
int i = 1;
List<String> txtFileOne = new ArrayList<String>();
List<String> txtFileTwo = new ArrayList<String>();
while (scan1.hasNext()) {
txtFileOne.add(scan1.nextLine());
}
while (scan2.hasNext())
{
txtFileTwo.add(scan2.nextLine());
}
/*
for(String ot : txtFileTwo )
{
for (String outPut : txtFileOne)
{
// if (txtFileTwo.contains(outPut))
if(outPut.equals(ot))
{
System.out.print(i + " ");
System.out.println(outPut);
i++;
}
}
}
*/
for (int j = 0; j < txtFileTwo.size(); j++) {
String fsl = txtFileTwo.get(j);
// System.out.println(fileContentSingleLine);
for (int z = 0; z < 600; z++) // z < txtFileOne.size()
{
String s = txtFileOne.get(z);
// System.out.println(fsl+"\t \t"+ s);
if (fsl.equals(s)) {
System.out.println(fsl + "\t \t" + s);
// my line
// System.out.println(fsl);
} else {
continue;
}
}
}
}
}
I made your code look nicer, you're welcome :)
Anyway, I don't understand why you get that bug. The code runs through all of list 1 for every line in list 2...
import java.io.*;
import java.util.*;
public class FeatureSelection500 {
public static void main(String[] args) throws FileNotFoundException, IOException {
// TODO code application logic here
File file1 = new File("E://implementation1/practise/ComUpdatusPS.exe.hex-04-ngrams-Freq.txt");
File file2 = new File("E://implementation1/practise/top-300features.txt");
Scanner scan1 = new Scanner(file1);
Scanner scan2 = new Scanner(file2);
List<String> txtFile1 = new ArrayList<String>();
List<String> txtFile2 = new ArrayList<String>();
while (scan1.hasNext()) {
txtFile1.add(scan1.nextLine());
}
while (scan2.hasNext()) {
txtFile2.add(scan2.nextLine());
}
for (int i = 0; i < txtFile2.size(); i++) {
String lineI = txtFile2.get(i);
// compare this line of file 2 against every line of file 1
for (int j = 0; j < txtFile1.size(); j++) {
String lineJ = txtFile1.get(j);
if (lineI.equals(lineJ)) {
System.out.println(lineI + "\t \t" + lineJ);
}
}
}
}
}
I don't see any problem with your code. Even the block you commented out is absolutely fine. Since you are using equals(), make sure the text in the two files is exactly the same (including case) so that the condition can be satisfied.
for(String ot : txtFileTwo )
{
for (String outPut : txtFileOne)
{
if(outPut.equals(ot)) /* Check Here */
{
/* Please note that i is not a line number here;
it only counts the matches found in the two files */
System.out.print(i + " ");
System.out.println(outPut);
i++;
}
}
}
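If equals() finds nothing, invisible differences such as trailing whitespace or letter case are a common cause (an assumption here, since the files are not shown). A minimal sketch that normalises each line and uses a HashSet instead of nested loops; `CommonLines`, `normalize` and `commonLines` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CommonLines {

    // Strips surrounding whitespace and ignores case before comparing.
    static String normalize(String line) {
        return line.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the lines of the second list that occur anywhere in the first list.
    static List<String> commonLines(List<String> first, List<String> second) {
        Set<String> seen = new HashSet<>();
        for (String line : first) {
            seen.add(normalize(line));
        }
        List<String> matches = new ArrayList<>();
        for (String line : second) {
            if (seen.contains(normalize(line))) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("aa aa aa", "cc cc cc ", "dd dd dd");
        List<String> two = Arrays.asList("CC cc cc", "ee ee ee", "aa aa aa");
        System.out.println(commonLines(one, two));
    }
}
```

The set lookup makes the comparison independent of line position and avoids comparing every pair of lines, which matters once the files grow.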

Creating an ArrayList from data in a text file

I am trying to write a program that uses two classes to find the total $ amount from a text file of retail transactions. The first class must read the file, and the second class must perform the calculations. The problem I am having is that in the first class, the ArrayList only seems to get the price of the last item in the file. Here is the input (which is in a text file):
$69.99 3 Shoes
$79.99 1 Pants
$17.99 1 Belt
And here is my first class:
class ReadInputFile {
static ArrayList<Double> priceArray = new ArrayList<>();
static ArrayList<Double> quantityArray = new ArrayList<>();
static String priceSubstring = new String();
static String quantitySubstring = new String();
public void gatherData () {
String s = "C:\\filepath";
try {
FileReader inputFile = new FileReader(s);
BufferedReader bufferReader = new BufferedReader(inputFile);
String line;
String substring = " ";
while ((line = bufferReader.readLine()) != null)
substring = line.substring(1, line.lastIndexOf(" ") + 1);
priceSubstring = substring.substring(0,substring.indexOf(" "));
quantitySubstring = substring.substring(substring.indexOf(" ") + 1 , substring.lastIndexOf(" ") );
double price = Double.parseDouble(priceSubstring);
double quantity = Double.parseDouble(quantitySubstring);
priceArray.add(price);
quantityArray.add(quantity);
System.out.println(priceArray);
} catch (IOException e) {
e.printStackTrace();
}
}
The output and value of priceArray is [17.99], but the desired output is [69.99,79.99,17.99].
Not sure where the problem is, but thanks in advance for any help!
Basically what you have is:
while ((line = bufferReader.readLine()) != null) {
substring = line.substring(1, line.lastIndexOf(" ") + 1);
}
priceSubstring = substring.substring(0,substring.indexOf(" "));
quantitySubstring = substring.substring(substring.indexOf(" ") + 1 , substring.lastIndexOf(" ") );
double price = Double.parseDouble(priceSubstring);
double quantity = Double.parseDouble(quantitySubstring);
priceArray.add(price);
quantityArray.add(quantity);
System.out.println(priceArray);
Without braces, only the single statement that follows the while belongs to the loop. So all the loop does is create a substring of the line just read and then read the next line; only the substring of the last line gets processed by the remaining code.
Wrap the code that you want executed on each iteration of the loop in {...}.
For example...
while ((line = bufferReader.readLine()) != null) {
substring = line.substring(1, line.lastIndexOf(" ") + 1);
priceSubstring = substring.substring(0,substring.indexOf(" "));
quantitySubstring = substring.substring(substring.indexOf(" ") + 1 , substring.lastIndexOf(" ") );
double price = Double.parseDouble(priceSubstring);
double quantity = Double.parseDouble(quantitySubstring);
priceArray.add(price);
quantityArray.add(quantity);
System.out.println(priceArray);
}
This will execute all the code within the {...} block for each line of the file
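Once the loop body is braced, splitting each line on whitespace may be simpler than chaining substring calls. The following is a sketch, not the asker's required two-class design: it assumes every line has the form `$price quantity item`, and `TransactionTotal` and `total` are hypothetical names. It also parses the quantity as an int rather than a double.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TransactionTotal {

    // Sums price * quantity over all lines of the form "$69.99 3 Shoes".
    static double total(List<String> lines) {
        double sum = 0.0;
        for (String line : lines) {
            // at most three parts: price, quantity, item name
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length < 2) {
                continue;                                  // skip blank or malformed lines
            }
            double price = Double.parseDouble(parts[0].substring(1)); // drop the leading '$'
            int quantity = Integer.parseInt(parts[1]);
            sum += price * quantity;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "$69.99 3 Shoes",
                "$79.99 1 Pants",
                "$17.99 1 Belt");
        System.out.println(String.format(Locale.ROOT, "Total: %.2f", total(lines)));
    }
}
```

For real monetary calculations BigDecimal would avoid floating-point rounding, but double keeps the sketch close to the original code.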

array in array list

In the input file, there are 2 columns: 1) stem, 2) affixes. In my code, I treat each column as a token, i.e. tokens[1] and tokens[2]. However, tokens[2] contains several values: ng ny nge
stem affixes
---- -------
nyak ng ny nge
My problem is: how can I declare the contents of tokens[2]? Below is a snippet of my code:
try {
FileInputStream fstream2 = new FileInputStream(file2);
DataInputStream in2 = new DataInputStream(fstream2);
BufferedReader br2 = new BufferedReader(new InputStreamReader(in2));
String str2 = "";
String affixes = " ";
while ((str2 = br2.readLine()) != null) {
System.out.println("Original:" + str2);
tokens = str2.split("\\s");
if (tokens.length < 4) {
continue;
}
String stem = tokens[1];
System.out.println("stem is: " + stem);
// here is my point
affixes = tokens[3].split(" ");
for (int x=0; x < tokens.length; x++)
System.out.println("affix is: " + affixes);
}
in2.close();
} catch (Exception e) {
System.err.println(e);
} //end of try2
You are using tokens as an array (tokens[1]) and assigning the value of a String.split(" ") to it. So it makes things clear that the type of tokens is a String[] array.
Next, you are trying to set the value of affixes by splitting tokens[3]. We know that tokens[3] is a String, so calling split on it yields another String[] array.
So the following is wrong, because you are declaring a String where you need a String[]:
String affixes = " ";
so the correct type should go like this:
String[] affixes = null;
then you can go ahead and assign it an array.
affixes = tokens[3].split(" ");
Are you looking for something like this?
public static void main(String[] args) {
String line = "nyak ng ny nge";
MyObject object = new MyObject(line);
System.out.println("Stem: " + object.stem);
System.out.println("Affixes: ");
for (String affix : object.affixes) {
System.out.println(" " + affix);
}
}
static class MyObject {
public final String stem;
public final String[] affixes;
public MyObject(String line) {
String[] stemSplit = line.split(" +", 2);
stem = stemSplit[0];
affixes = stemSplit[1].split(" +");
}
}
Output:
Stem: nyak
Affixes:
ng
ny
nge
