I would like to update a .csv file containing 1 million records. I want to update the same file without creating another file. I am using OpenCSV, but I can't find any way to update the .csv file in place. If anyone can help me out, that would be great.
For clarification, consider this small .csv file:
Initial csv file:
a,b,c,d
1,2,3,4
5,6,7,8
12,13,14,15
Desired csv file:
a,b,c,d,e,f
1,2,3,4, ,17
5,6,7,8,16,
12,13,14,15,
Basically - you cannot do that.
As CSV does not use fixed-length records, most changes require moving data up or down in the file, and that cannot be done without completely rewriting it. For example, given:
1,2,3,4
5,6,7,8
...
changing the 8 to a 10 would require every byte from that position to the end of the file to be shifted along by one.
To achieve the effect of editing the file, the usual approach is to copy it to a new name, making the changes you want during the copy, and then rename the files so that the new one replaces the old.
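A minimal sketch of that copy-and-rename approach using OpenCSV (assuming OpenCSV 5.x; the two appended columns are stand-ins for whatever update you actually need):

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.nio.file.*;
import java.util.Arrays;

public class CsvInPlaceUpdate {
    public static void main(String[] args) throws Exception {
        Path source = Paths.get("data.csv");
        Path temp = Paths.get("data.csv.tmp");
        try (CSVReader reader = new CSVReader(new FileReader(source.toFile()));
             CSVWriter writer = new CSVWriter(new FileWriter(temp.toFile()))) {
            String[] row;
            while ((row = reader.readNext()) != null) {
                // widen each row by two columns; real logic would fill in values
                String[] updated = Arrays.copyOf(row, row.length + 2);
                Arrays.fill(updated, row.length, updated.length, "");
                writer.writeNext(updated);
            }
        }
        // replace the original only after the copy completed successfully
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }
}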
Normally the file system doesn't allow you to update a file like this; you need to rewrite it. Even though you have everything in one file, it is written in multiple segments, and changing it in place would damage those segments, so you need to rewrite the whole thing.
Related
I have to read a sequential file which has over a million records. I have to read each line/record, delete that record/line from the file, and keep on reading.
I can't find any example of how to do that without using a temporary file or creating/recreating a new file of the same name.
These are text files. Each file is about 0.5 GB and has over a million lines/records.
Currently we are copying all the records to memory, as we do not want to re-process any record if anything happens in the middle of processing a file.
Assuming that the file in question is a simple sequential file - you can't. In the Java file model, deleting part of a file implies deleting all of it after the deletion point.
Some alternative approaches are:
In your process copy the file, omitting the parts you want deleted. This is the normal way of doing this.
Overwrite the parts of the file you want deleted with some value that you know never occurs in the file, and then at a later date copy the file, removing the marked parts.
Store the entire file in memory, edit it as required, and write it again. Just because you have a million records doesn't make that impossible. If your files are 0.5GB, as you say, then this approach is almost certainly viable.
Each time you delete some record, copy all of the contents of the file after the deletion to its new position. This will be incredibly inefficient and error-prone.
Unless you can store the file in memory, using a temporary file is the most efficient. That's why everyone does it.
If this is some kind of database, then that's an entirely different question.
EDIT: Since I answered this, comments have indicated that what the user wants is to use deletion to keep track of which records have already been processed. If that is the case, there are much simpler ways of doing it. One good way is to write a file which just contains a count of how many bytes (or records) of the file have been processed. If the processor crashes, update the file by deleting the records that have already been processed and start again.
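A hedged sketch of that progress-file idea (the file names and the checkpoint interval are made up): the processor records how many lines it has consumed, and on restart it simply skips that many lines rather than deleting anything.

import java.io.*;
import java.nio.file.*;
import java.util.List;

public class CheckpointedReader {
    public static void main(String[] args) throws IOException {
        Path data = Paths.get("records.txt");
        Path progress = Paths.get("records.txt.progress");

        // resume from the last recorded position, if any
        long done = Files.exists(progress)
                ? Long.parseLong(Files.readAllLines(progress).get(0).trim())
                : 0;

        try (BufferedReader in = Files.newBufferedReader(data)) {
            for (long i = 0; i < done; i++) in.readLine();   // skip already-processed records
            String line;
            long count = done;
            while ((line = in.readLine()) != null) {
                process(line);
                count++;
                // record progress periodically so a crash loses little work
                if (count % 10_000 == 0)
                    Files.write(progress, List.of(Long.toString(count)));
            }
            Files.deleteIfExists(progress);                   // finished cleanly
        }
    }

    private static void process(String line) { /* application logic goes here */ }
}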
Files are unstructured streams of bytes; there is no record structure. You cannot "delete" a "line" from an unstructured stream of bytes.
The basic algorithm you need to use is this:
1. create temporary file.
2. open input file.
3. if at the end of the file, goto line 7.
4. read a line from the input file.
5. if the line is not to be deleted, write it to the temporary file.
6. goto line 3.
7. close the input file.
8. close the temporary file.
9. delete (or just rename) the input file.
10. rename (or move) the temporary file to have the original name of the input file.
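For concreteness, a minimal Java sketch of that algorithm (try-with-resources handles steps 7 and 8; the deletion test is a placeholder):

import java.io.*;
import java.nio.file.*;

public class DeleteLines {
    public static void main(String[] args) throws IOException {
        Path input = Paths.get("data.txt");
        Path temp = Files.createTempFile(input.toAbsolutePath().getParent(), "data", ".tmp");

        try (BufferedReader reader = Files.newBufferedReader(input);
             BufferedWriter writer = Files.newBufferedWriter(temp)) {
            String line;
            while ((line = reader.readLine()) != null) {   // steps 3-6
                if (!shouldDelete(line)) {                 // step 5
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
        // steps 9-10: replace the original with the temporary file
        Files.move(temp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    private static boolean shouldDelete(String line) {
        return line.contains("DELETE ME");  // placeholder condition
    }
}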
A similar question has been asked: "Java - Find a line in a file and remove".
Basically they all use a temp file, and there is no harm in doing so. So why not just do it? It will not affect your performance much and avoids some errors.
Why not a simple sed -si '/line I want to delete/d' big_file?
I have an Excel document with names populated. I want to read the characteristics of the names from multiple other documents using VLOOKUP. I am using Java POI (just started, so open to other solutions). I also need to do some calculations on some of the populated columns.
So far, I am able to read an Excel file. Any other pointers would be helpful.
Thanks.
Update: More information
I have 2 files:
File 1 columns - Name, detail1,detail2
File 2 columns - Name, detail3,detail4 & so on
So I need to read both files and copy the File 2 columns into File 1, matching rows by Name as a VLOOKUP would. I need to automate this using Java.
I have the code to read and write the files, so I just need to implement the VLOOKUP part.
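No answer was recorded for this question; as a hedged sketch, the usual way to reproduce a VLOOKUP in code is a map-based join. The file names and column positions below are assumptions, and the sheets are assumed well-formed (Name in column 0, text cells, a header row):

import org.apache.poi.ss.usermodel.*;
import java.io.*;
import java.util.*;

public class VlookupJoin {
    public static void main(String[] args) throws Exception {
        // build the lookup table from File 2: Name -> [detail3, detail4]
        Map<String, String[]> lookup = new HashMap<>();
        try (Workbook wb2 = WorkbookFactory.create(new File("file2.xlsx"))) {
            for (Row row : wb2.getSheetAt(0)) {
                if (row.getRowNum() == 0) continue;        // skip header
                String name = row.getCell(0).getStringCellValue();
                lookup.put(name, new String[] {
                        row.getCell(1).toString(),
                        row.getCell(2).toString() });
            }
        }
        // append the looked-up columns to File 1
        try (Workbook wb1 = WorkbookFactory.create(new File("file1.xlsx"))) {
            for (Row row : wb1.getSheetAt(0)) {
                if (row.getRowNum() == 0) continue;
                String[] details = lookup.get(row.getCell(0).getStringCellValue());
                if (details != null) {
                    row.createCell(3).setCellValue(details[0]);
                    row.createCell(4).setCellValue(details[1]);
                }
            }
            // write to a new file rather than over the still-open original
            try (OutputStream out = new FileOutputStream("file1-merged.xlsx")) {
                wb1.write(out);
            }
        }
    }
}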
I want to find a line in a text file and delete it.
Ex:
aaa         aaa
bbb   ->    ccc     (bbb deleted)
ccc
There is an example of how to do that, but it writes to a new file and rewrites everything. How can this be done without writing to another file (and also without rewriting the same file, because it contains a lot of data)?
I think the only way to do this is by writing a new file.
Think about the file system and how the file is stored on the hard drive: most file systems store a file as sequential blocks on the disk. If you just removed those bytes from the file, there would be an empty hole in it, and you would have to shift every byte to the right of the removed bytes leftward. Either way, the file system is limiting you.
Instead you can try to minimize the number of times you write a new file. For example, if you're building a word processor for large files, you could save a file that contains only the edits the user made, and execute those commands when the user closes the file (writing the file just once).
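A toy sketch of that edit-log idea (every name here is invented for illustration): edits cost nothing until close, when a single rewrite applies them all.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class EditLog {
    private final List<String> linesToDelete = new ArrayList<>();
    private final Path file;

    EditLog(Path file) { this.file = file; }

    // queueing an edit costs no I/O
    void deleteLine(String line) { linesToDelete.add(line); }

    // called once when the document is closed: a single rewrite applies all edits
    void close() throws IOException {
        List<String> lines = Files.readAllLines(file);
        lines.removeAll(linesToDelete);   // drops every occurrence of each queued line
        Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        EditLog log = new EditLog(Paths.get("doc.txt"));
        log.deleteLine("bbb");            // queued in memory only
        log.close();                      // the file is written exactly once
    }
}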
Did you try reading the text file's content into a string variable? Here is roughly how I would do that, in Java rather than pseudocode (uses java.nio.file.Files and Paths; note that replacing with "" blanks the line rather than removing it, so include the trailing newline in the search string to drop the line entirely):

String content = new String(Files.readAllBytes(Paths.get("file.txt")));
String newContent = content.replace("line you are looking for", "");
Files.write(Paths.get("file.txt"), newContent.getBytes());
You can't. All software with this feature copies the content of the file into a variable, deletes the line, and writes the content back without it.
So you need to copy the content of the file: read line by line, skip the line you want deleted, and save the content again.
This is how Notepad and other editors work, and it's why most of those editors only accept files up to about 1 GB: with huge files it takes time.
Hope this helps.
I will not post any code because you already said you found a solution for this.
I have a large CSV file, almost 60 MB, containing Object_Name, timestamp, and value.
In the CSV file there are 10 objects, listed in time sequence, but their records are interleaved, such as:
A1,2013-08-24 15:36:47,24.83
A2,2013-08-24 15:36:47,26.56
A3,2013-08-24 15:36:47,25.83
A6,2013-08-24 15:36:47,-40
A8,2013-08-24 15:36:47,-40
A9,2013-08-24 15:36:47,-40
B2,2013-08-24 15:36:47,6
C1,2013-08-24 15:37:18,6
I want to classify those records by Object_Name. If the file were small, I could do it; as it is, I spend 10 minutes just reading the CSV file and can't imagine classifying the data in memory without crashing my laptop. The expected result is 10 lists, each containing only one object with its timestamps and values, such as:
Object_Name,timestamp,val
A1,2013-08-24 15:00:00,26.7
.....
.....
Could someone help me? Basically, I just want to know an effective way to sort this data by object name and split it apart.
BTW, I use OpenCSV to read the CSV file.
Thank you.
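No answer was captured for this question, but one streaming approach, sketched here under my own assumptions about file names, is to keep a single writer per object (there are only 10) and append each record as it is read, so nothing large is ever held in memory. Plain line splitting is used to stay dependency-free; the same loop works with OpenCSV's readNext():

import java.io.*;
import java.util.*;

public class SplitByObject {
    public static void main(String[] args) throws IOException {
        Map<String, PrintWriter> outputs = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("input.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // assumes well-formed rows: the object name precedes the first comma
                String name = line.substring(0, line.indexOf(','));
                // one output file per object, opened lazily on first sight
                PrintWriter out = outputs.computeIfAbsent(name, n -> newWriter(n));
                out.println(line);
            }
        } finally {
            outputs.values().forEach(PrintWriter::close);
        }
    }

    private static PrintWriter newWriter(String name) {
        try {
            return new PrintWriter(new BufferedWriter(new FileWriter(name + ".csv")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}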
I would like to update a specific part of a text file using Java. I would like to be able to scan through the file and select specific lines to be updated, a bit like in a database. For instance, given the file:
ID Value
1 100
2 500
4 20
I would like to insert 3 and update 4, e.g.
ID Value
1 100
2 500
3 80
4 1000
Is there a way to achieve this (seemingly) easy task? I know you can append to a file, but I am more interested in random access.
"I know you can append to a file, but I am more interested in a random access"
You're trying to insert and delete bytes in the middle of a file. You can't do that. File systems simply don't (in general) support that. You can overwrite specific bytes, but you can't insert or delete them.
You could update specific records with random access if your records were fixed-length (in bytes) but it looks like that's not the case.
You could either load the whole file into memory, or read from the original file, writing to a new file with either the old data or the new data as appropriate on a per line basis.
You can do so using RandomAccessFile in Java, which lets you set your current read and write position using the available methods (seek and getFilePointer). You can explore more in the RandomAccessFile documentation.
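A sketch of that, with the caveat from the answers above baked in: it only works for overwriting fixed-length records, and the record length here is an assumption:

import java.io.RandomAccessFile;

public class FixedRecordUpdate {
    static final int RECORD_LEN = 16;  // assumed fixed record size in bytes, newline included

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("records.dat", "rw")) {
            long recordIndex = 3;                          // zero-based record to overwrite
            raf.seek(recordIndex * RECORD_LEN);            // jump straight to the record
            String updated = String.format("%-15s", "4 1000") + "\n";  // exactly 16 bytes
            raf.writeBytes(updated);                       // overwrite in place, same length
        }
    }
}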
Load the file into memory, change your value, and then rewrite the file.
If there's a way to insert into a file without loading it, I haven't heard of it; you have to move the other data out of the way first.
Unless you're dealing with huge files, performance frequently isn't too much of a concern.
As said in the previous answers, it's not possible to do that simply using streams. You could try using Properties, which are key-value pairs that can be saved to and modified in a text file.
For example, you can add a new property to a file with the method
setProperty(String key, String value)
This method adds a new property or, if one with the chosen key already exists, modifies its value.
Obviously, new properties are added at the end of the file, but the lack of ordering is not a problem for performance, because access to the values goes through the getProperty method, which delegates to the underlying Hashtable's get method.
See this tutorial for some examples:
http://docs.oracle.com/javase/tutorial/essential/environment/properties.html
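For illustration, a small hedged example of that approach using the question's ID/Value data (the file name is arbitrary): load the file, update the keys, and store it back, which rewrites the file for you.

import java.io.*;
import java.util.Properties;

public class PropsUpdate {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        File file = new File("values.properties");

        if (file.exists()) {
            try (Reader in = new FileReader(file)) {
                props.load(in);                 // read existing key=value pairs
            }
        }
        props.setProperty("3", "80");           // insert a new "record"
        props.setProperty("4", "1000");         // update an existing one

        try (Writer out = new FileWriter(file)) {
            props.store(out, "ID -> Value");    // rewrites the whole file
        }
    }
}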