I admit I am not a great Java programmer, and my question is probably pretty dumb. I need to add new columns in different places to an existing CSV file. I'm using the Super CSV library.
My input file looks something like this:
1,2011-5-14 16:30:0.250,A
2,2011-5-14 16:30:21.500,B
3,2011-5-14 16:30:27.000,C
4,2011-5-14 16:30:29.750,B
5,2011-5-14 16:30:34.500,F
As you can see, I have (and need) no header.
I need to add one column at position 2 and another at the end of each row, in order to get:
1,1,2011-5-14 16:30:0.250,A,1
2,1,2011-5-14 16:30:21.500,B,1
3,1,2011-5-14 16:30:27.000,C,1
4,1,2011-5-14 16:30:29.750,B,1
5,1,2011-5-14 16:30:34.500,F,1
From the library documentation I gathered (am I wrong?) that I cannot modify the original file directly, and that the best approach is to read it and write it back. I guess CsvMapReader and CsvMapWriter could be a good choice. But how can I add columns in between existing ones? I would have to read each field of the existing columns separately; I looked for suggestions in the library documentation, but I cannot work out how to do it.
You can do it with the CsvListReader and CsvListWriter classes. Here is a simple example:
CsvListReader reader = new CsvListReader(new FileReader(inputCsv), CsvPreference.STANDARD_PREFERENCE);
CsvListWriter writer = new CsvListWriter(new FileWriter(outputCsv), CsvPreference.STANDARD_PREFERENCE);
List<String> columns;
while ((columns = reader.read()) != null) {
System.out.println("Input: " + columns);
// Add new columns
columns.add(1, "Column_2");
columns.add("Last_column");
System.out.println("Output: " + columns);
writer.write(columns);
}
reader.close();
writer.close();
This is a simple example. In real code you have to handle the exceptions and close both streams in a finally block (or use try-with-resources).
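To make the index arithmetic of the two add calls concrete, here is a minimal sketch of just the list step, using only java.util so it runs without Super CSV. The class name ColumnInserter and the method addColumns are made up for illustration; the values "1" come from the desired output in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Super CSV: isolates the list manipulation
// that happens between reader.read() and writer.write(columns).
final class ColumnInserter {

    /**
     * Returns a copy of the row with 'second' inserted at index 1 and 'last'
     * appended. Assumes the row has at least one column; an empty row would
     * make add(1, ...) throw IndexOutOfBoundsException.
     */
    static List<String> addColumns(List<String> row, String second, String last) {
        List<String> out = new ArrayList<>(row);
        out.add(1, second); // shifts the old index-1 element and everything after it to the right
        out.add(last);      // no index: appended after the current last element
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "2011-5-14 16:30:0.250", "A");
        // prints 1,1,2011-5-14 16:30:0.250,A,1
        System.out.println(String.join(",", addColumns(row, "1", "1")));
    }
}
```

In the loop above, the same two calls are applied directly to the list returned by reader.read().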
I have some data that I want to write.
Code:
private void saveStats(int early, int infected, int recovered, int deads, int notInfected, int vaccinated, int iteration){
try
{
FileWriter txt = new FileWriter("statistic.csv");
txt.write((String.valueOf(early)));
txt.write(";");
txt.write(String.valueOf(infected));
txt.write(";");
txt.write((String.valueOf(recovered)));
txt.write(";");
txt.write((String.valueOf(deads)));
txt.write(";");
txt.write((String.valueOf(notInfected)));
txt.write(";");
txt.write((String.valueOf(vaccinated)));
txt.write("\n");
txt.close();
} catch (IOException ex)
{
ex.printStackTrace();
System.out.println("Error!");
}
}
I will use this function to save the iteration number and some additional data; for example:
Iteration Infected Recovered Dead NotInfected Vaccinated
1 200 300 400 500
2 300 400 600 900
etc
A perfect solution would have the first row of the file hold names for each column, similar to what's written above.
For something like this, it is a good idea to use an existing Java CSV library. One possibility is Apache Commons CSV. "Google is your friend" if you want to find tutorials or other alternatives.
But if you wanted to "roll your own" code, there are various ways to do it. The simplest way to change your code so that it records multiple rows in the CSV would be to change
new FileWriter("statistic.csv");
to
new FileWriter("statistic.csv", true);
That opens the file in "append" mode, and the new row will be added at the end of the file instead of replacing the existing row.
You should also use the Java 7+ try-with-resources statement to manage the FileWriter. That makes sure the FileWriter is always properly closed, even if a write fails.
If you want to get fancy with CSV headers, more efficient file handling, etc, you will need to write your own CSVWriter class. But if you are doing that, you would be better off using a library someone has already designed, written and tested. (See above!)
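If you do stay with the "roll your own" route, a rough sketch of the append-mode change combined with try-with-resources could look like this. The class name StatsCsv, the method names, and the exact header text are assumptions for illustration; the question does not say where the early value belongs in the header, so it is placed second here.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper: appends one row per call and writes the header only
// when the file is still empty.
final class StatsCsv {

    // Assumed column names and order; adjust to whatever the file should contain.
    static final String HEADER = "Iteration;Early;Infected;Recovered;Dead;NotInfected;Vaccinated";

    static String formatRow(int iteration, int early, int infected, int recovered,
                            int deads, int notInfected, int vaccinated) {
        return iteration + ";" + early + ";" + infected + ";" + recovered + ";"
                + deads + ";" + notInfected + ";" + vaccinated;
    }

    static void appendRow(File file, String row) throws IOException {
        // File.length() is 0 for a missing or empty file, so the header goes in once.
        boolean needsHeader = file.length() == 0;
        // 'true' opens the file in append mode; try-with-resources closes the writer.
        try (FileWriter txt = new FileWriter(file, true)) {
            if (needsHeader) {
                txt.write(HEADER + "\n");
            }
            txt.write(row + "\n");
        }
    }
}
```

Calling appendRow(new File("statistic.csv"), formatRow(...)) once per iteration then builds up the file row by row.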
I am using the opencsv library in Java to export CSV, but I have a problem. When a string begins with a zero, like 0123456, the zero is missing after the export and the CSV shows 123456. I tried writing the value this way:
"\"\t"+"0123456"+ "\""; but then the exported CSV shows "0123456", which I don't want. I want 0123456. I don't want to fix it in Excel, because some end users don't know how to do that. How can I export CSV with opencsv and keep the leading zero of the string? Please help.
I think the problem is not really in generating the CSV but in the way Excel treats the data when the file is opened from Explorer.
I tried this code and viewed the CSV in a text editor (not Excel). Notice that the values show up correctly there, but when the file is opened in Excel the leading 0s are lost!
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "0123131#21212#021213".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
//don't apply quotes
writer.writeAll(a,false);
writer.close();
If you are really sure that you want the leading 0s of numeric values to be visible when the user opens the file in Excel, then each cell entry must be in the format ="dataHere"; see the code below:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "=\"0123131\"#=\"21212\"#=\"021213\"".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
writer.writeAll(a);
writer.close();
This is how Excel now displays the file when it is opened from Windows Explorer (by double-clicking):
But if we now look at the CSV in a text editor, with the data modified to "suit" viewing in Excel, it shows as:
Also see the link:
format-number-as-text-in-csv-when-open-in-both-excel-and-notepad
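For illustration, the ="dataHere" wrapping can be done by a small helper instead of being typed into each literal. This is only a sketch with made-up names (ExcelCsv, asExcelText, quoteField), written without opencsv so it runs on its own. The quoteField method imitates what a CSV writer that quotes fields and doubles embedded quotes (the RFC 4180 convention) would be expected to emit.

```java
// Hypothetical helper, not part of opencsv.
final class ExcelCsv {

    /** Wraps a value as the Excel formula ="value" so Excel shows it as text. */
    static String asExcelText(String value) {
        // A quote inside an Excel string literal is written as two quotes.
        return "=\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Quotes a CSV field: surrounds it with quotes and doubles embedded quotes. */
    static String quoteField(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String cell = asExcelText("0123131");
        System.out.println(cell);             // prints ="0123131"
        System.out.println(quoteField(cell)); // prints "=""0123131"""
    }
}
```

The cost of this trick is that the file is no longer plain data: any other program reading the CSV sees the formula text rather than 0123131.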
Have you tried a string like "'"+"0123456"? The ' character marks the number as text when it is parsed by Excel.
For me opencsv works correctly (version 5.6).
For example, my CSV file has a row like the following extract:
"999739059";;;"abcdefgh";"001024";
and opencsv reads the field "001024" correctly as 001024. Of course, I have mapped the field to a String, not to a Double.
But if you still have problems, you can grab a simple yet powerful parser that fully adheres to the RFC 4180 standard:
mkyong.com
Mkyong shows some examples that use opencsv directly and, at the end, writes a simple parser that you can use if you don't want to import opencsv. The parser works very well.
That gives you easy-to-understand source code for a simple parser, which you can modify or customize for your needs.
I'm trying to find the best way to read in data from a file similar to an Excel document. It doesn't necessarily need to be an actual excel document, just any file that allows you to enter data in a grid format.
Something where I would be able to do manipulation similar to this:
String val = file.readString(column,row);
float val2 = file.readFloat(column,row);
I'm sorry, I usually try to do more research before I post a question here but I was having a hard time finding much info. A lot of what I saw was 3rd party libraries that read excel files. I'm really hoping if possible I can avoid downloading libraries and hopefully use built in ones.
So I guess my questions in short are:
What's the most appropriate file format for this?
What's the best way to read data from that file?
The first thing that comes to my mind is CSV. CSV files are just regular text files with the .csv filename extension. Data is stored in this format:
cell,anothercell,athirdcell
anotherrow,anothercellonthenewrow,thirdcellofsecondrow
For more specifics, read the CSV specs here.
Option 1
Store your data in a CSV and read with any kind of reader (e.g. BufferedReader). This might be the easiest and fastest solution, if you want to use Excel/LibreOffice for entering data.
Please check out the answers in these threads for various solutions.
String csvfile = path;
BufferedReader br = null;
String line = "";
String cvsSplitby = ";";
try {
br = new BufferedReader(new FileReader(csvfile));
while ((line = br.readLine()) != null) {
String[] i = line.split(cvsSplitby);
// do stuff
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Hope I didn't miss anything important.
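Building on the reading loop above, here is a minimal sketch of the file.readString(column,row) style of access from the question, using only built-in classes. The class name CsvGrid is made up, and the split is deliberately naive: it assumes no field contains the separator or a line break, so quoted CSV fields are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: loads a whole delimited file into memory as a grid.
final class CsvGrid {
    private final List<String[]> rows = new ArrayList<>();

    CsvGrid(Reader source, String separator) throws IOException {
        // try-with-resources closes the reader, replacing the finally block.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Pattern.quote: split() takes a regex, and separators like "|" are special.
                // Limit -1 keeps trailing empty cells.
                rows.add(line.split(Pattern.quote(separator), -1));
            }
        }
    }

    /** Both indexes are zero-based; column comes first, as in the question. */
    String readString(int column, int row) {
        return rows.get(row)[column];
    }

    float readFloat(int column, int row) {
        return Float.parseFloat(readString(column, row).trim());
    }
}
```

With a file on disk this would be used as new CsvGrid(new FileReader(path), ";"), followed by calls such as grid.readFloat(1, 1).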
Option 2
Use Apache POI.
Option 3
I've had some decent experience with JXL, but I understand that you don't want to include too many external libs. (I just saw that it hasn't been updated in a while. Consider the other options!)
For a project I need to deal with CSV files whose columns I do not know before runtime. The CSV files are perfectly valid; I only need to perform a simple task on several different files over and over again. I do need to analyse the values of the columns, which is why I would need to use a library for working with CSV files. For simplicity, let's assume that I need to do something simple like appending a date column to all files, regardless of how many columns they have. I want to do that with Super CSV, because I use the library for other tasks as well.
What I am struggling with is more of a conceptual issue. I am not sure how to deal with the files if I do not know in advance how many columns there are. I am not sure how I should define POJOs that map arbitrary CSV files, or how I should define the cell processors if I do not know which and how many columns will be in the file. How can I dynamically create cell processors that match the number of columns? How would I define POJOs, for instance, based on the header of the CSV file?
Consider the case where I have two CSV files: product.csv and address.csv. Let's assume I want to append a date column with today's date to both files, without having to write two different methods (e.g. addDateColumnToProduct() and addDateColumnToAddress()) that do the same thing.
product.csv:
name, description, price
"Apple", "red apple from Italy","2.5€"
"Orange", "orange from Spain","3€"
address.csv:
firstname, lastname
"John", "Doe"
"Coole", "Piet"
Based on the header information of the CSV files, how could I define a POJO that maps the product CSV? The same question for Cell Processors? How could I define even a very simple cell processor that just basically has the right amount of parameters for the constructor, e.g. for the product.csv
CellProcessor[] processor = new CellProcessor[] {
null,
null,
null
};
and for the address.csv:
CellProcessor[] processor = new CellProcessor[] {
null,
null
};
Is this even possible? Am I on the wrong track to achieve this?
Edit 1:
I am not looking for a solution that can deal with CSV files that have a variable number of columns within one file. I am trying to figure out whether it is possible to deal with arbitrary CSV files at runtime, i.e. whether I can create POJOs based only on the header information contained in the CSV file, without knowing in advance how many columns a CSV file will have.
Solution
Based on the answer and comments from #baba
private static void readWithCsvListReader() throws Exception {
ICsvListReader listReader = null;
try {
listReader = new CsvListReader(new FileReader(fileName), CsvPreference.TAB_PREFERENCE);
listReader.getHeader(true); // skip the header (can't be used with CsvListReader)
int amountOfColumns=listReader.length();
CellProcessor[] processor = new CellProcessor[amountOfColumns];
List<Object> customerList;
while( (customerList = listReader.read(processor)) != null ) {
System.out.println(String.format("lineNo=%s, rowNo=%s, customerList=%s", listReader.getLineNumber(),
listReader.getRowNumber(), customerList));
}
}
finally {
if( listReader != null ) {
listReader.close();
}
}
}
Maybe a little bit late but could be helpful...
CellProcessor[] processors=new CellProcessor[properties.size()];
for(int i=0; i< properties.size(); i++){
processors[i]=new Optional();
}
return processors;
This is a very common issue and there are multiple tutorials on the internet, including on the Super CSV page:
http://supercsv.sourceforge.net/examples_reading_variable_cols.html
As this line says:
As shown below you can execute the cell processors after calling
read() by calling the executeProcessors() method. Because it's done
after reading the line of CSV, you have an opportunity to check how
many columns there are (using listReader.length()) and supplying the
correct number of processors.
I have more than 10 million JSON documents of the form :
["key": "val2", "key1" : "val", "{\"key\":\"val", \"key2\":\"val2"}"]
in one file.
Importing with the Java driver API took around 3 hours using the following function (which imports one BSON document at a time):
public static void importJSONFileToDBUsingJavaDriver(String pathToFile, DB db, String collectionName) {
// open file
FileInputStream fstream = null;
try {
fstream = new FileInputStream(pathToFile);
} catch (FileNotFoundException e) {
e.printStackTrace();
System.out.println("file not exist, exiting");
return;
}
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
// read it line by line
String strLine;
DBCollection newColl = db.getCollection(collectionName);
try {
while ((strLine = br.readLine()) != null) {
// convert line by line to BSON
DBObject bson = (DBObject) JSON.parse(strLine);
// insert BSONs to database
try {
newColl.insert(bson);
}
catch (MongoException e) {
// duplicate key
e.printStackTrace();
}
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Is there a faster way? Could MongoDB settings influence the insertion speed? For example, adding the key "_id", which would function as the index, so that MongoDB would not have to create an artificial key (and thus an index entry) for each document, or disabling index creation altogether during insertion.
Thanks.
I'm sorry but you're all picking minor performance issues instead of the core one. Separating the logic from reading the file and inserting is a small gain. Loading the file in binary mode (via MMAP) is a small gain. Using mongo's bulk inserts is a big gain, but still no dice.
The whole performance bottleneck is the BSON bson = JSON.parse(line) call. In other words, the problem with the Java driver is that it needs a conversion from JSON to BSON, and this code seems to be awfully slow or badly implemented. A full JSON round trip (encode + decode) via JSON-simple, or especially via JSON-smart, is 100 times faster than the JSON.parse() command.
I know Stack Overflow is telling me right above this box that I should be answering the question, which I'm not, but rest assured that I'm still looking for an answer to this problem. I can't believe that, after all the talk about Mongo's performance, this simple example code fails so miserably.
I've imported a multi-line JSON file with ~250M records. I just used mongoimport < data.txt and it took 10 hours. Compared to your 3 hours for 10M, I think this is considerably faster.
Also, in my experience, writing your own multi-threaded parser speeds things up drastically. The procedure is simple:
Open the file as BINARY (not TEXT!)
Set markers(offsets) evenly across the file. The count of markers depends on the number of threads you want.
Search for '\n' near the markers, calibrate the markers so they are aligned to lines.
Parse each chunk with a thread.
A reminder:
When you want performance, don't use a stream reader or any built-in line-based read methods. They are slow. Just use a binary buffer and search for '\n' to identify a line, and (preferably) do in-place parsing in the buffer without creating a string. Otherwise the garbage collector won't be happy.
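The marker procedure above can be sketched as follows. This is an illustration under simplifying assumptions, not the poster's actual parser: the whole file is assumed to be in a byte[] already (for a very large file each thread would read or map only its own range), counting newlines stands in for real parsing, and a parallel stream on the common fork/join pool stands in for hand-managed threads. All names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

final class ChunkSplitter {

    /**
     * Returns chunks + 1 offsets. Chunk i covers [offsets[i], offsets[i + 1]).
     * Every chunk starts at the beginning of a line.
     */
    static int[] lineAlignedOffsets(byte[] data, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        int[] offsets = new int[chunks + 1];
        for (int i = 1; i < chunks; i++) {
            // Evenly spaced marker, never before the previous chunk's start.
            int pos = (int) ((long) data.length * i / chunks);
            pos = Math.max(pos, offsets[i - 1]);
            // Calibrate: move forward until the previous byte is '\n' or the data ends.
            while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
            offsets[i] = pos;
        }
        offsets[chunks] = data.length;
        return offsets;
    }

    /** Stand-in for parsing: scans the buffer in place, creating no Strings. */
    static int countNewlines(byte[] data, int from, int to) {
        int count = 0;
        for (int i = from; i < to; i++) {
            if (data[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    static int countLinesInParallel(byte[] data, int chunks) {
        int[] offsets = lineAlignedOffsets(data, chunks);
        return IntStream.range(0, chunks).parallel()
                .map(i -> countNewlines(data, offsets[i], offsets[i + 1]))
                .sum();
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\ndddd\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countLinesInParallel(data, 2)); // prints 4
    }
}
```

Because each chunk begins right after a '\n', no line is split between two workers, which is what makes the per-chunk parsing independent.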
You can parse the entire file at once and then insert the whole JSON into a Mongo document. Avoid multiple loops. You need to separate the logic as follows:
1)Parse the file and retrieve the json Object.
2)Once the parsing is over, save the json Object in the Mongo Document.
I've got a slightly faster way (I'm also inserting millions at the moment), insert collections instead of single documents with
insert(List<DBObject> list)
http://api.mongodb.org/java/current/com/mongodb/DBCollection.html#insert(java.util.List)
That said, it's not that much faster. I'm about to experiment with setting WriteConcerns other than ACKNOWLEDGED (mainly UNACKNOWLEDGED) to see if I can speed it up further. See http://docs.mongodb.org/manual/core/write-concern/ for info.
Another way to improve performance, is to create indexes after bulk inserting. However, this is rarely an option except for one off jobs.
Apologies if this sounds slightly woolly, I'm still testing things myself. Good question.
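To make the "insert collections instead of single documents" idea concrete, here is a driver-independent sketch of the batching part. The Batcher class is made up for illustration; the sink is whatever consumes a full batch, which in the question's code would be a call to the insert(List<DBObject> list) overload mentioned above. The batch size is a tuning knob to experiment with, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to the sink in groups.
final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer;

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one item and flushes automatically when the batch is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered items to the sink; call once more after the last add. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
            buffer = new ArrayList<>(batchSize); // fresh list, the sink keeps the old one
        }
    }
}
```

In the read loop, each parsed document would go through add(...) instead of being inserted on its own, with one final flush() after the loop so the last partial batch is not lost.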
You can also remove all the indexes (except for the PK index, of course) and rebuild them after the import.
Use bulk insert/upsert operations. Since Mongo 2.6 you can do bulk updates/upserts. The example below does a bulk update using the C# driver.
MongoCollection<foo> collection = database.GetCollection<foo>(collectionName);
var bulk = collection.InitializeUnorderedBulkOperation();
foreach (FooDoc fooDoc in fooDocsList)
{
var update = new UpdateDocument { {fooDoc.ToBsonDocument() } };
bulk.Find(Query.EQ("_id", fooDoc.Id)).Upsert().UpdateOne(update);
}
BulkWriteResult bwr = bulk.Execute();
You can use a bulk insertion.
You can read the documentation on the Mongo website, and you can also check this Java example on Stack Overflow.