I'm trying to load a large CSV file that contains all the bitcoin trade data. First I try to calculate the size of the data. The code does this with a loop that adds one to the size on each iteration and stops when readLine returns null.
The CSV file has 764732 lines of data. My program only reads in the first 100000. The last line it reads matches the line in my text editor (Komodo), which displays another 664732 lines of data after it.
Is there something wrong with my code, or is it that Java cannot handle a really big text file?
The code
BufferedReader br = new BufferedReader(new FileReader(FileName));
for(int size=0; (line=br.readLine())!=null;)
size++;
br.close();
// last line is empty
size--;
Declare the size variable outside the loop and print it afterwards.
BufferedReader br = new BufferedReader(new FileReader(fileName));
int size = 0;
String line;
while ((line = br.readLine()) != null) {
size++;
}
br.close();
System.out.printf("%,d lines%n", size);
The maximum int value (Integer.MAX_VALUE) is 2,147,483,647, so if you plan to read files with more than 2 billion lines (that is lines, not bytes), switch from int to long. Java can read very large files, and BufferedReader.readLine() in particular handles them as long as the longest run of characters between end-of-line markers fits into memory.
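A minimal sketch of the same counting loop with a long counter and try-with-resources, so the reader is closed even if readLine() throws. The class and method names are made up for illustration, and the file is assumed to be UTF-8 (Files.newBufferedReader throws on malformed input).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original post.
class LineCounter {
    // Counts lines with a long counter so the total cannot overflow an int.
    static long countLines(Path file) throws IOException {
        long size = 0;
        // try-with-resources closes the reader even if readLine() throws.
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) {
                size++;
            }
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineCounter trades.csv
        System.out.printf("%,d lines%n", countLines(Paths.get(args[0])));
    }
}
```

readLine() does not return an extra empty string after a trailing newline, so no final size-- adjustment is needed here.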
I had size as an int; it works if I make size a long. Looks like it was reading in all the data.
If you want to calculate number of lines in a file with Java, here's an example
http://www.mkyong.com/java/how-to-get-the-total-number-of-lines-of-a-file-in-java/
I have to modify a text file in Java.
E.g., this is the file before the modification:
line
line
line
line
line
line
and after it should look like:
line
line
this is another
line
line
line
line
So nothing should be overwritten: I only want to add a line between the 2nd and 3rd lines, so that the original 3rd line becomes the new 4th line.
One way is to make a temp file, write every line to it, and make the modification at the point where I want it. Then delete the original and rename the temp file. Or read the temp file and write it back to the original file.
But is there any way to read and modify a file like this using the same class in Java?
thx!
You can read from and write to a file in Java at the same time. The problem here is that you need to insert data: to do that, everything after the new line has to be shuffled down and the file extended.
Depending on exactly what you are trying to do and why, there are a number of ways to do this. The easiest is probably to scan the file, copying it to a new location and inserting the new values as you go. If you need to edit in place, it's more complicated, but essentially you do the same thing: read X characters into a buffer, overwrite those X characters in the file with the new data, then read the next X characters into a second buffer and overwrite them with the contents of the first buffer. Repeat until EOF.
Think of files on disk as arrays - if you want to insert some items into the middle of an array, you need to shift all of them to make room.
The only safe way is to create a new temp file, copy the old file line by line and then rename it, just as you suggested. By updating the same file directly on the disk you risk losing the data if anything goes wrong and you would use a lot of memory.
Try this:
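public void writeAfterNthLine(String filename, String text, int lineno) throws IOException {
    File file = new File(filename);
    // Create the temp file beside the original so renameTo() stays on one file system.
    File temp = File.createTempFile("temp-file-name", ".tmp", file.getAbsoluteFile().getParentFile());
    BufferedReader br = new BufferedReader(new FileReader(file));
    PrintWriter pw = new PrintWriter(new FileWriter(temp));
    String line;
    int lineCount = 0;
    while ((line = br.readLine()) != null) {
        pw.println(line);
        lineCount++;
        // lineno is 1-based: insert right after the lineno-th line.
        if (lineCount == lineno) {
            pw.println(text);
        }
    }
    br.close();
    pw.close();
    // Both calls report failure through their return value, not an exception.
    if (!file.delete() || !temp.renameTo(file)) {
        throw new IOException("Could not replace " + filename);
    }
}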
The code is not tested, but it should work. You can improve it with validation and exception handling.
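The same temp-file approach can also be sketched with java.nio.file, which reports failures as exceptions rather than boolean return values. The class and method names below are made up for illustration. The sketch assumes UTF-8 text and a 1-based line number, and it does not clean up the temp file if copying fails.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answers.
class LineInserter {
    // Copies 'file' to a temp file, adding 'text' after line 'lineNo' (1-based),
    // then replaces the original with the temp file.
    static void insertAfterLine(Path file, String text, int lineNo) throws IOException {
        // Create the temp file next to the original so the final move stays on one file system.
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "insert-", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            int lineCount = 0;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                if (++lineCount == lineNo) {
                    out.write(text);
                    out.newLine();
                }
            }
        }
        // Throws an IOException if the replacement fails, instead of returning false.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For the example in the question, the call would be LineInserter.insertAfterLine(path, "this is another", 2).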
I want to read text files and convert each word to a number, then for each file write the sequence of numbers, instead of the words, to a new file. I use a HashMap to assign exactly one number (identifier) to each word. For instance, if the word apple is assigned the number 10, then whenever I see apple in a text file I write 10 in the sequence. I need a single HashMap so that no word is assigned more than one identifier. I wrote the following code, but it processes files slowly: converting a 165.7 MB text file to a sequence file took 20 hours, and I need to convert 600 text files of the same size. Is there any way to improve the efficiency of my code? The following function is called for each text file.
public void ConvertTextToSequence(File file) {
try{
FileWriter filewriter=new FileWriter(path.keywordDocIdsSequence,true);
BufferedWriter bufferedWriter= new BufferedWriter(filewriter);
String sequence="";
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line = bufferedReader.readLine();
while(line!=null)
{
StringTokenizer tokens = new StringTokenizer(line);
String str;
while (tokens.hasMoreTokens())
{
str = tokens.nextToken();
if(keywordsId.containsKey(str))
sequence= sequence+" "+keywordsId.get(str);
else
{
keywordsId.put(str,id);
sequence= sequence+" "+id;
id++;
}
if(keywordsId.size()%10000==0)
{
bufferedWriter.append(sequence);
sequence="";
start=id;
}
}
line = bufferedReader.readLine();
}
if(start<id)
{
bufferedWriter.append(sequence);
}
bufferedReader.close();
fileReader.close();
bufferedWriter.close();
filewriter.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
The constructor of that class is:
public ConvertTextToKeywordIds(){
path= new LocalPath();
repository= new RepositorySQL();
keywordsId= new HashMap<String, Integer>();
id=1;
start=1;}
I suspect that the speed of your program is tied to the rehashing of the hash map as the number of words grows. Each rehash can incur a significant time penalty as the size of the hash map grows. You could try and estimate the number of unique words you expect and use that to initialize the hash map.
As mentioned by @JB Nizet, you may want to write directly to the buffered writer rather than waiting to accumulate a number of entries, since the buffered writer is already set up to write only when it has accumulated enough changes.
Your most effective performance boost is probably using StringBuilder instead of String for your sequence.
I would also write and flush the sequence each time it exceeds a certain length rather than whenever you've added 10000 words to your map.
This map could get pretty huge - have you considered improving that? If you hit millions of entries you may get better performance using a database.
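The suggestions above can be combined into a rough sketch: pre-size the map from an estimated word count, and write each identifier straight to the BufferedWriter instead of growing a String. The class name, method name, and expectedUniqueWords parameter are made up for illustration. Unlike the original code, this sketch writes one output file per call rather than appending to a shared file, and it assumes UTF-8 input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical class, not the asker's ConvertTextToKeywordIds.
class WordEncoder {
    private final Map<String, Integer> keywordsId;
    private int nextId = 1;

    WordEncoder(int expectedUniqueWords) {
        // Pre-size so the map rarely rehashes (default load factor is 0.75).
        keywordsId = new HashMap<>((int) (expectedUniqueWords / 0.75f) + 1);
    }

    // Reuse one WordEncoder for all files so every word keeps a single identifier.
    void convert(Path input, Path output) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    String word = tokens.nextToken();
                    Integer id = keywordsId.get(word); // one lookup instead of containsKey + get
                    if (id == null) {
                        id = nextId++;
                        keywordsId.put(word, id);
                    }
                    // No String concatenation: the writer buffers the output itself.
                    out.write(' ');
                    out.write(id.toString());
                }
            }
        }
    }
}
```

Repeated sequence = sequence + ... copies the whole accumulated string on every word, which is quadratic in the length of the text; writing to the buffer avoids that copy entirely.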
Is there a way of reading, say, every 10 lines from a .txt file using a BufferedReader? At the moment my BufferedReader is reading every line, splitting the different values and storing them in an array list; which is then used elsewhere in my program.
Use LineNumberReader which is intended for this very purpose:
LineNumberReader reader = new LineNumberReader(fileReader);
ArrayList<String> goodLines = new ArrayList<String>();
String line = null;
while ((line = reader.readLine()) != null) {
// After readLine(), getLineNumber() is the 1-based number of the line just read.
if (reader.getLineNumber() % 10 == 0) {
goodLines.add(line);
}
}
Use a loop to read all the lines you don't want, then read the line you do want.
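BufferedReader br = new BufferedReader(new FileReader(file));
int index = 10;      // 1-based number of the line you want
int lineNumber = 0;  // lines read so far
// Read and discard the lines before the one you want.
while (lineNumber < index - 1 && br.readLine() != null)
{
    lineNumber++;
}
String lineYouWant = br.readLine(); // null if the file has fewer than index lines
if (lineYouWant != null)
{
    // Do stuff with lineYouWant
}
br.close();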
Since all of your lines are the same size, you could look at the skip() method in BufferedReader. You would basically read a line, skip the characters of the lines in between (for every tenth line that is 9 * lineSize, counting the line terminators), then read the next line, and so on.
The purpose of a buffered reader is to make reading logical units like lines easy. Reading multiple lines would complicate your code and not provide a great performance boost since the buffered reader is already reading large blocks of data into its buffer.
Edit: Since your records are fixed size, you could use a lower-level reader and just read the number of bytes required.
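As a sketch of that lower-level idea, RandomAccessFile can seek directly to every Nth record. The class and method names are made up for illustration. The sketch assumes every record occupies exactly recordBytes bytes including its line terminator, and that the text is in a single-byte encoding such as ASCII.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answers.
class FixedRecordReader {
    // Returns records n, 2n, 3n, ... (1-based) from a file of fixed-size records.
    // Assumption: each record is exactly recordBytes bytes, terminator included.
    static List<String> everyNth(Path file, int recordBytes, int n) throws IOException {
        List<String> records = new ArrayList<>();
        byte[] buf = new byte[recordBytes];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long step = (long) recordBytes * n;
            long length = raf.length();
            // Start at the nth record and jump n records at a time.
            for (long pos = step - recordBytes; pos + recordBytes <= length; pos += step) {
                raf.seek(pos);
                raf.readFully(buf);
                // trim() drops the line terminator (and any padding whitespace).
                records.add(new String(buf, StandardCharsets.US_ASCII).trim());
            }
        }
        return records;
    }
}
```

Whether this beats a plain BufferedReader loop depends on the record size: for short lines, most seeks land in a disk block that would have been read anyway.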
I am writing a small Java app which will scan a text file for any instances of a particular word. It needs a feature whereby it can report, for example, that an instance of the word was found as the 14th word in the file, on the third line.
For this I tried the following code, which I thought would check whether the input contained a newline (\n) character and then increment a line variable that I created:
FileInputStream fileStream = new FileInputStream("src/file.txt");
DataInputStream dataStream = new DataInputStream(fileStream);
BufferedReader buffRead = new BufferedReader(new InputStreamReader(dataStream));
String strLine;
String Sysnewline = System.getProperty("line.separator");
CharSequence newLines = Sysnewline;
int lines = 1;
while ((strLine = buffRead.readLine()) != null)
{
if(strLine.contains(newLines))
{
System.out.println("Line Found");
lines++;
}
}
System.out.println("Total Number Of Lines In File: " + lines);
This does not work for me; it simply displays 0 at the end. I know the data is being placed into strLine during the while loop because, if I change the code slightly to output the line, it successfully prints each line from the file.
Would anyone happen to know why the above code does not work?
Read the javadocs for readLine.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
readLine() strips newlines. Just increment every iteration of the loop. Also, you're overcomplicating your file reading code. Just do new BufferedReader(new FileReader("src/file.txt"))
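A minimal sketch of the feature described in the question, counting a line on every loop iteration as suggested. The class and method names are made up for illustration, and words are assumed to be separated by whitespace and matched exactly (case-sensitive, punctuation included).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class WordLocator {
    // Returns one entry per match, giving the word's 1-based position
    // in the whole file and the 1-based line it was found on.
    static List<String> find(BufferedReader reader, String target) throws IOException {
        List<String> hits = new ArrayList<>();
        int lineNo = 0;
        int wordNo = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNo++; // readLine() already stripped the terminator, so count every iteration
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line: split yields a single empty string
                }
                wordNo++;
                if (word.equals(target)) {
                    hits.add("word " + wordNo + ", line " + lineNo);
                }
            }
        }
        return hits;
    }
}
```

It can be called with new BufferedReader(new FileReader("src/file.txt")) as the reader.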