Java: CSV File Easy Read/Write

I'm working on a program that requires quick access to a CSV comma-delimited spreadsheet file.
So far I've been able to read from it easily using a BufferedReader.
However, now I want to be able to edit the data it reads, then export it BACK to the CSV.
The spreadsheet contains names, phone numbers, email addresses, etc. And the program lists everyone's data, and when you click on them it brings up a page with more detailed information, also pulled from the CSV. On that page you can edit the data, and I want to be able to click a "Save Changes" button, then export the data back to its appropriate line in the CSV--or delete the old one, and append the new.
I'm not very familiar with using a BufferedWriter, or whatever it is I should be using.
What I started to do is create a custom class called FileIO. It contains both a BufferedReader and a BufferedWriter. So far it has a method that returns bufferedReader.readLine(), called read(). Now I want a function called write(String line).
public static class FileIO {
BufferedReader read;
BufferedWriter write;
public FileIO (String file) throws MalformedURLException, IOException {
read = new BufferedReader(new InputStreamReader (getUrl(file).openStream()));
write = new BufferedWriter (new FileWriter (file));
}
public static URL getUrl (String file) throws IOException {
return //new URL (fileServer + file).openStream()));
FileIO.class.getResource(file);
}
public String read () throws IOException {
return read.readLine();
}
public void write (String line) {
String [] data = line.split("\\|");
String firstName = data[0];
// int lineNum = findLineThatStartsWith(firstName);
// write.writeLine(lineNum, line);
}
};
I'm hoping somebody has an idea as to how I can do this?

Rather than reinventing the wheel, you could have a look at OpenCSV, which supports both reading and writing CSV files. Its documentation includes examples of reading and writing.

Please consider Apache Commons CSV.
To understand the API quickly, note its four important classes:
CSVFormat
Specifies the format of a CSV file and parses input.
CSVParser
Parses CSV files according to the specified format.
CSVPrinter
Prints values in a CSV format.
CSVRecord
A CSV record parsed from a CSV file.

The spreadsheet contains names, phone numbers, email addresses, etc. And the program lists everyone's data, and when you click on them it brings up a page with more detailed information, also pulled from the CSV. On that page you can edit the data, and I want to be able to click a "Save Changes" button, then export the data back to its appropriate line in the CSV--or delete the old one, and append the new.
The content of a file is a sequence of bytes. CSV is a text-based file format, i.e. the sequence of bytes is interpreted as a sequence of characters, where lines are delimited by special newline characters.
Consequently, if the length of a line increases, the characters of all following lines need to be moved to make room for the new characters. Likewise, to delete a line you must move the later characters to fill the gap. That is, you cannot update a line in a CSV file (at least not when changing its length) without rewriting all following lines in the file. For simplicity, I'd rewrite the entire file.
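The "rewrite the entire file" approach could look roughly like the sketch below. It uses only the JDK. The class name CsvLineUpdater and the method saveRecord are made up for illustration, and it assumes that the first comma-separated field identifies a record and that no field contains quoted commas (a CSV library handles those cases properly).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class CsvLineUpdater {

    /**
     * Replaces the first line whose first field equals key with newLine,
     * or appends newLine if no such line exists, then rewrites the whole file.
     * Assumption: plain comma-separated lines without quoted commas.
     */
    static void saveRecord(Path csv, String key, String newLine) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(csv, StandardCharsets.UTF_8));
        boolean replaced = false;
        for (int i = 0; i < lines.size(); i++) {
            String firstField = lines.get(i).split(",", 2)[0];
            if (firstField.equals(key)) {
                lines.set(i, newLine);
                replaced = true;
                break;
            }
        }
        if (!replaced) {
            lines.add(newLine);
        }
        // Overwrites the existing file with the updated list of lines.
        Files.write(csv, lines, StandardCharsets.UTF_8);
    }
}
```

A "Save Changes" handler would then call something like CsvLineUpdater.saveRecord(path, firstName, editedLine). For a contact list of ordinary size, reading and rewriting the whole file on every save should be fast enough.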
Since you already have code to write and read the CSV file, adapting it should be straightforward. But before you do that, it might be worth asking yourself if you're using the right tool for the job. If the goal is to keep a list of records and edit individual records in a form, programs such as Microsoft Access or its OpenOffice equivalent (Base) might be a more natural fit. If your UI needs go beyond what these programs provide, using a relational database to keep your data is probably a better fit (more efficient and flexible than a CSV file).

Add the dependency:
implementation 'com.opencsv:opencsv:4.6'
Add the code below in onCreate():
String path = "storage/emulated/0/Android/media/in.bioenabletech.imageProcessing/MLkit/countries_image_crop.csv";
// try-with-resources closes the reader even if parsing fails
try (CSVReader reader = new CSVReader(new FileReader(path))) {
    String[] nextLine;
    int lineNumber = 0;
    while ((nextLine = reader.readNext()) != null) {
        lineNumber++;
        // Column indexes are zero-based: nextLine[0] is the first column, nextLine[2] the third
        Log.e(TAG, "onCreate: " + nextLine[2]);
    }
} catch (Exception e) {
    Log.e(TAG, "onCreate: " + e);
}

I solved it using
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>2.8.6</version>
</dependency>
and
private static final CsvMapper mapper = new CsvMapper();
public static <T> List<T> readCsvFile(MultipartFile file, Class<T> clazz) throws IOException {
InputStream inputStream = file.getInputStream();
CsvSchema schema = mapper.schemaFor(clazz).withHeader().withColumnReordering(true);
ObjectReader reader = mapper.readerFor(clazz).with(schema);
return reader.<T>readValues(inputStream).readAll();
}

Related

OpenCSV reads in additional byte value together with first line's first value together in Java

I was working on a project where we use OpenCSV to read in CSV files and fill a database with them at startup. I noticed something strange: in certain cases a given identifier value cannot be queried. During debugging I found that OpenCSV does not read the CSV correctly.
Let's say that I have the following CSV file:
01;foo
02;bar
...
The first line in the example is the first line in the real CSV file as well. The file is encoded in UTF-8. The following code is used to read in the value:
try (CSVReader csvReader = CSVUtils.createCSVReader(masterDataCSVPath, csvDelimiter)) {
List<String[]> masterData = csvReader.readAll();
}
The code creating the csvReader:
static private CSVParser createCSVParser(String CSVDelimiter) {
return new CSVParserBuilder().withSeparator(CSVDelimiter.charAt(0)).build();
}
static public CSVReader createCSVReader(String CSVPath, String CSVDelimiter) throws FileNotFoundException {
return new CSVReaderBuilder(new FileReader(CSVPath)).withCSVParser(createCSVParser(CSVDelimiter)).build();
}
When I read in the CSV file with the code above, during debugging I get the following byte values for 01:
However if I change my CSV file to (notice the newline at the top):
01;foo
02;bar
...
The read-in data becomes:
In this case "all is good": if I remove the first item in my masterData list, I can read in the values "properly". However, this is not a clean solution:
It raises the question: why does this happen?
Also, I think we should solve the problem rather than work around it. The workaround is only guaranteed to work if there is a newline at the beginning of my source CSV.
So I kindly ask for help: how can this be mitigated?

This is not an OpenCSV-specific problem; rather, FileReader reads in the BOM of the UTF-8-encoded file. This is somewhat unexpected, but it makes sense, as FileReader has no context telling it that it should exclude those bytes.
The solution is to either remove it manually or, as in my case, use a library to make sure it is excluded. I wrote the following utility class:
public class CSVUtils {
private static CSVParser createCSVParser(final String CSVDelimiter) {
return new CSVParserBuilder().withSeparator(CSVDelimiter.charAt(0)).build();
}
private static BOMInputStream versatileBOMInputStreamGenerator(final InputStream inputStream) {
// Each BOM type is listed once; the original listed UTF_16BE twice
return new BOMInputStream(inputStream, ByteOrderMark.UTF_8, ByteOrderMark.UTF_16LE,
ByteOrderMark.UTF_16BE, ByteOrderMark.UTF_32LE, ByteOrderMark.UTF_32BE);
}
public static CSVReader createCSVReaderFromFile(final String CSVPath, final String CSVDelimiter) throws FileNotFoundException {
return new CSVReaderBuilder(new InputStreamReader(
versatileBOMInputStreamGenerator(new FileInputStream(CSVPath)), StandardCharsets.UTF_8))
.withCSVParser(createCSVParser(CSVDelimiter)).build();
}
public static CSVReader createCSVReaderFromString(final String content, final String CSVDelimiter) {
byte[] contentBytes = content.getBytes(StandardCharsets.UTF_8);
return new CSVReaderBuilder(new InputStreamReader(
versatileBOMInputStreamGenerator(new ByteArrayInputStream(contentBytes)), StandardCharsets.UTF_8))
.withCSVParser(createCSVParser(CSVDelimiter)).build();
}
}
All I have to do is use these created CSVReader objects later where needed. As you can see, it uses some dependencies, which can be imported with
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;
These dependencies can be added to the project via the POM as follows:
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
</dependency>
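If adding a dependency is not an option, the "manually remove it" route mentioned above can be sketched with the JDK alone. Java's UTF-8 decoder does not drop the BOM; it hands it to you as the single character U+FEFF at the start of the first line. The class BomDemo and the helper stripBom below are illustrative names, not part of OpenCSV or Commons IO.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, JDK only.
class BomDemo {

    /** Removes one leading U+FEFF, which is how a decoded UTF-8 BOM shows up in a String. */
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a UTF-8 file that starts with the BOM bytes EF BB BF, followed by "01;foo".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '0', '1', ';', 'f', 'o', 'o'};
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8))) {
            String first = in.readLine();
            System.out.println(first.length());           // prints 7: the BOM counts as one char
            System.out.println(stripBom(first).length()); // prints 6
        }
    }
}
```

With a CSV reader, the same check would be applied to the first field of the first record. This only covers the UTF-8 case; BOMInputStream is the more general tool.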

Problem with input from user saved to file by RandomAccessFile methods

I've got a problem with input from the user. I need to save user input into a binary file, and when I read it back and show it on the screen it isn't displayed properly. I don't want to post a few hundred lines of code, so I will try to describe it in a more compact form. The project encoding set in the NetBeans project properties is "UTF-8".
I get input from the user, in the NetBeans console or the cmd console. Then I save it to an object made up of strings, then add it to an ArrayList<Ksiazka>, where Ksiazka is my class (basically a book's properties). Then I save the whole ArrayList to the file baza.bin. I do it by looping through the whole list of Ksiazka objects, taking each String one by one and saving it into baza.bin using the method writeUTF(oneOfStrings). When I try to read the file baza.bin I see question marks instead of special characters (ą, ć, ę, ł, ń, ó, ś, ź). I think the problem is a difference between the encoding of the file and that of the input data, but to be honest I don't have any idea how to solve that.
Those are attributes of my class Ksiazka:
private String id;
private String tytul;
private String autor;
private String rok;
private String wydawnictwo;
private String gatunek;
private String opis;
private String ktoWypozyczyl;
private String kiedyWypozyczona;
private String kiedyDoOddania;
This is method for reading data from user:
static String podajDana(String[] tab, int coPokazac){
System.out.print(tab[coPokazac]);
boolean podawajDalej = true;
String linia = "";
Scanner klawiatura = new Scanner(System.in, "utf-8");
do{
try {
podawajDalej = false;
linia = klawiatura.nextLine();
}
catch(NoSuchElementException e){
System.err.println("Wystąpił błąd w czasie podawania wartości!"
+ " Spróbuj jeszcze raz!");
}
catch(IllegalStateException e){
System.err.println("Wewnętrzny błąd programu typu 2! Zgłoś to jak najszybciej"
+ " razem z tą wiadomością");
}
}while(podawajDalej);
return linia;
}
String[] tab is just an array of strings I want to be able to show on the screen (each array has its own purpose); int coPokazac is the index of the line from the array that I want to show.
And this one saves all data from the ArrayList<Ksiazka> to the file baza.bin:
static void zapiszZmiany(ArrayList<Ksiazka> bazaKsiazek){
try{
RandomAccessFile plik = new RandomAccessFile("baza.bin","rw");
for(int i = 0; i < bazaKsiazek.size(); i++){
plik.writeUTF(bazaKsiazek.get(i).zwrocId());
plik.writeUTF(bazaKsiazek.get(i).zwrocTytul());
plik.writeUTF(bazaKsiazek.get(i).zwrocAutor());
plik.writeUTF(bazaKsiazek.get(i).zwrocRok());
plik.writeUTF(bazaKsiazek.get(i).zwrocWydawnictwo());
plik.writeUTF(bazaKsiazek.get(i).zwrocGatunek());
plik.writeUTF(bazaKsiazek.get(i).zwrocOpis());
plik.writeUTF(bazaKsiazek.get(i).zwrocKtoWypozyczyl());
plik.writeUTF(bazaKsiazek.get(i).zwrocKiedyWypozyczona());
plik.writeUTF(bazaKsiazek.get(i).zwrocKiedyDoOddania());
}
plik.close();
}
catch (FileNotFoundException ex){
System.err.println("Nie znaleziono pliku z bazą książek!");
}
catch (IOException ex){
System.err.println("Błąd zapisu bądź odczytu pliku!");
}
}
I think that there is a problem in one of those two methods (either I do something wrong while reading, or something goes wrong when saving data to the file using writeUTF()), but even though I tried a few things to solve it, none of them worked.
After quick talk with lecturer I got information that I can use at most JDK 8.

You are using different techniques for reading and writing, and they are not compatible.
Despite the name, the writeUTF method of RandomAccessFile does not write a UTF-8 string. From the documentation:
Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
First, two bytes are written to the file, starting at the current file pointer, as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for each character.
writeUTF will write a two-byte length, then write the string as UTF-8, except that '\u0000' characters are written as two UTF-8 bytes and supplementary characters are written as two UTF-8 encoded surrogates, rather than single UTF-8 codepoint sequences.
On the other hand, you are trying to read that data using new Scanner(System.in, "utf-8") and klawiatura.nextLine();. This approach is not compatible because:
The text was not written as a true UTF-8 sequence.
Before the text was written, two bytes indicating its numeric length were written. They are not readable text.
writeUTF does not write a newline. It does not write any terminating sequence at all, in fact.
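To see the length prefix for yourself, here is a small sketch. It uses DataOutputStream.writeUTF, which writes the same format as RandomAccessFile.writeUTF, so the bytes can be inspected in memory. The class name WriteUtfDemo is made up, and the string is written with \u escapes (ą and ć) so the example does not depend on the source file's encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, JDK only.
class WriteUtfDemo {

    /** Returns the exact bytes that writeUTF produces for s. */
    static byte[] writeUtfBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u0105\u0107"; // the two characters ą and ć
        byte[] framed = writeUtfBytes(text);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(framed.length); // prints 6: two length bytes plus four data bytes
        System.out.println(plain.length);  // prints 4: the UTF-8 bytes only
        System.out.println(framed[1]);     // prints 4: the low byte of the length prefix
    }
}
```

Anything that treats the file as plain text will see those two extra bytes in front of every string, and no line breaks between them.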
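The best solution is to remove all usage of RandomAccessFile and replace it with a Writer. Since you are limited to JDK 8, create it with Files.newBufferedWriter (the FileWriter constructors that accept a Charset were only added in Java 11):
// try-with-resources flushes and closes the writer when the block ends
try (Writer plik = Files.newBufferedWriter(Paths.get("baza.bin"), StandardCharsets.UTF_8)) {
    for (int i = 0; i < bazaKsiazek.size(); i++) {
        plik.write(bazaKsiazek.get(i).zwrocId());
        plik.write('\n');
        plik.write(bazaKsiazek.get(i).zwrocTytul());
        plik.write('\n');
        // ...
    }
}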

Univocity - writing out surrounding quotes even if field does not contain delimiter char

I have a file unloaded from a database in such a way that all varchar columns are surrounded by quotes, regardless of the actual content of the column (unfortunately the unload process is out of my control).
Like this:
1,"Alex ,/,awesome/,","chan"
2,"Peter ,boring","pitt"
When using the following code with univocity 2.2.3 in the pom:
public class Sample {
public static void main(String[] args) throws IOException {
BeanListProcessor<Person> rowProcessor = new BeanListProcessor<Person>(Person.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setProcessor(rowProcessor);
parserSettings.getFormat().setDelimiter(',');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('/');
CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader("src/main/resources/person.csv"));
List<Person> beans = rowProcessor.getBeans();
Writer outputWriter = new FileWriter("src/main/resources/personOut.csv", true);
CsvWriterSettings settings = new CsvWriterSettings();
settings.getFormat().setDelimiter(',');
settings.getFormat().setQuote('"');
settings.getFormat().setQuoteEscape('/');
settings.getFormat().setCharToEscapeQuoteEscaping('\0');
settings.setRowWriterProcessor(new BeanWriterProcessor<Person>(Person.class));
CsvWriter writer = new CsvWriter(outputWriter, settings);
for (Person person : beans) {
writer.processRecord(person);
}
writer.close();
}
}
Only the columns containing the delimiter are surrounded by quotes:
1,"Alex ,/,awesome/,",chan
2,"Peter ,boring",pitt
When using settings.setQuoteAllFields(true); on the writer setting, all the fields get surrounded by quotes, but now the non varchar fields are in trouble.
How do I surround only the columns that are surrounded by quotes from the source with quotes regardless of the content of the column (e.g. delimiter is or is not present)?
Desired result:
1,"Alex ,/,awesome/,","chan"
2,"Peter ,boring","pitt"

The CSV writer doesn't provide an explicit mechanism to configure this, but you can do the following:
Parse with this:
parserSettings.setKeepQuotes(true);
parserSettings.setKeepEscapeSequences(true);
These two settings will effectively work as a "split" operation over your input CSV - you will get the entire content between delimiters. Using your sample input, the values will be parsed as:
1 | "Alex ,/,awesome/," | "chan" |
2 | "Peter ,boring" | "pitt" |
I'm using pipes to separate the values above to make it easier to visualize what comes out.
Now, the hacky bit. I can't guarantee this will work with future versions of the library, as it uses internal APIs: the CsvWriter has a processRow method which you can override. As your input values already arrive formatted the way you want them, you can dump them out "as-is" by just joining the values of each row with commas. Just do the following:
CsvWriter writer = new CsvWriter(outputWriter, settings){
@Override
protected void processRow(Object[] row) {
for(int i = 0; i < row.length; i++){
Object value = row[i];
appender.append(value.toString());
if(i + 1 < row.length) { //not the last column
appender.append(',');
}
appendValueToRow();
}
}
};
This will produce the output you expect, but I'm not sure if it's very useful because you simply depend on the input to be properly formatted and making changes over it will complicate things quite a bit.
The appropriate thing to do here is to add an additional configuration option to the library that would allow you to configure whether to quote a given column or not.

How to get rid of "Rogue Chars" in an .txt encoded under UTF-8

My program is reading from a .txt encoded with UTF-8. The reason why I'm using UTF-8 is to handle the characters åäö. The problem I come across is when the lines are read is that there seems to be some "rogue" characters sneaking in to the string which causes problems when I'm trying to store those lines into variables. Here's the code:
public void Läsochlista()
{
String Content = "";
String[] Argument = new String[50];
int index = 0;
Log.d("steg1", "steg1");
try{
InputStream inputstream = openFileInput("text.txt");
if(inputstream != null)
{
Log.d("steg2", "steg2");
//InputStreamReader inputstreamreader = new InputStreamReader(inputstream);
//BufferedReader bufferreader = new BufferedReader(inputstreamreader);
BufferedReader in = new BufferedReader(new InputStreamReader(inputstream, "UTF-8"));
String reciveString = "";
StringBuilder stringbuilder = new StringBuilder();
while ((reciveString = in.readLine()) != null)
{
Argument[index] = reciveString;
index++;
if(index == 6)
{
Log.d(Argument[0], String.valueOf((Argument[0].length())));
AllaPlatser.add(new Platser(Float.parseFloat(Argument[0]), Float.parseFloat(Argument[1]), Integer.parseInt(Argument[2]), Argument[3], Argument[4], Integer.parseInt(Argument[5])));
Log.d("En ny plats skapades", Argument[3]);
Arrays.fill(Argument, null);
index = 0;
}
}
inputstream.close();
Content = stringbuilder.toString();
}
}
catch (FileNotFoundException e){
Log.e("Filen", " Hittades inte");
} catch (IOException e){
Log.e("Filen", " Ej läsbar");
}
}
Now, I'm getting the error
Invalid float: "61.193521"
where the line only contains the chars "61.193521". When I print out the length of the string as read within the program, the output shows "10", which is one more character than the string is supposed to contain. The question: how do I get rid of those invisible "rogue" chars, and why are they there in the first place?

When you save a file as "UTF-8", your editor may be writing a byte-order mark (BOM) at the beginning of the file.
See if there's an option in your editor to save UTF-8 without the BOM.
Apparently the BOM is just a pain in the butt: What's different between UTF-8 and UTF-8 without BOM?
I know you want to be able to have extended characters in your data; however, you may want to pick a different encoding like Latin-1 (ISO 8859-1).
Or you can just read & discard the first three bytes from the input stream before you wrap it with the reader.
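The "read and discard the first three bytes" idea could be sketched as below. Rather than discarding blindly, it checks whether the first three bytes really are the UTF-8 BOM (EF BB BF) and pushes them back if they are not, so files saved without a BOM keep working. Utf8BomSkipper and skipUtf8Bom are made-up names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, JDK only.
class Utf8BomSkipper {

    /** Returns a stream positioned after a UTF-8 BOM if one is present, else at the start. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or end of stream
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '6', '1'};
        InputStream s = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) s.read()); // prints 6
    }
}
```

In the question's code this would wrap the stream before the reader is created, for example new InputStreamReader(Utf8BomSkipper.skipUtf8Bom(inputstream), "UTF-8").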

Unfortunately, you have not provided the sample text file, so I cannot test your exact code. What follows is an educated guess at the cause.
It looks like a BOM-related issue that you will have to handle. Some related detail is given here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
There is more information here: What is XML BOM and how do I detect it?
Basically, there are three situations:
First, we run into issues when we don't read and write using the correct encoding.
Second, we use an editor or reader that doesn't support UTF-8.
Third, we read and write with the correct encoding and see no problem in a text editor, but the problem shows up in some other application or program. I think your issue falls into this third case.
In the third situation, we may have to remove the BOM programmatically or otherwise deal with it according to our context.
Here is a solution you may find interesting:
UTF-8 file reading: the first character issue
You can use the code given in this thread's answer, or use Apache Commons to deal with it:
Byte order mark screws up file reading in Java

Add comment to an ARFF file

This is my first question on this forum.
I'm making a data-mining application in Java with the WEKA API.
I first run a pre-processing stage, and when I save the ARFF file I would like to add a couple of lines (as comments) specifying the pre-processing tasks that I have applied to the file.
The problem is that I don't know how to add comments to an ARFF file from the Java WEKA API.
To save the file I use the class ArffSaver like this:
try {
ArffSaver saver = new ArffSaver();
saver.setInstances(dataPost);
saver.setFile(arffFile);
saver.writeBatch();
return true;
} catch (IOException ex) {
Logger.getLogger(Preprocesamiento.class.getName()).log(Level.SEVERE, null, ex);
return false;
}
I would be really grateful if someone could give me some ideas.
thanks!

You should AVOID writing comments to an .arff file, even more so when writing it from Java. These files are very "parser-sensitive". The Weka API for creating these files is restrictive for this particular reason.
Even so, you can always add your comments manually with the % symbol. That said, I wouldn't recommend writing anything more than instances, attributes and values into an .arff file. ;-)

I don't see a reason to not write comments into the header of an ARFF file. The specification clearly says:
Lines that begin with a % are comments.
So while it is technically valid, it can be difficult if you want to use the ArffSaver#setFile method. This method does a lot of (convenient, but somewhat arbitrary and unspecified) work internally, until it finally calls
setDestination(new FileOutputStream(m_outputFile));
If this is not required, the easiest option is to write directly to an OutputStream, which then can simply be set as the destination for the ArffSaver. This can be wrapped in a small helper method, for example, like this:
static void writeArff(
Instances instances,
List<String> commentLines,
OutputStream outputStream) throws IOException
{
ArffSaver saver = new ArffSaver();
saver.setInstances(instances);
if (commentLines != null && !commentLines.isEmpty())
{
BufferedWriter bw = new BufferedWriter(
new OutputStreamWriter(outputStream));
for (String commentLine : commentLines)
{
bw.write("% " + commentLine + "\n");
}
bw.write("\n");
bw.flush();
}
saver.setDestination(outputStream);
saver.writeBatch();
}
When calling it like this
List<String> comments = Arrays.asList("A comment", "Another one");
writeArff(instances, comments, outputStream);
then the given comments will be inserted at the top of the ARFF file.
