I have a list of log files, and I need to find which one contains the latest occurrence of a specific line. Any number of the files, including all or none, could contain this line.
The lines in the files look like this:
2013/01/06 16:01:00:283 INFO ag.doLog: xxxx xxxx xxxx xxxx
And I need a line like, let's say:
xx/xx/xx xx:xx:xx:xxx INFO ag.doLog: the line i need
I know how to get an array of files, and if I scan backwards I could find the latest matching line in each file (if it exists).
The biggest problem is that the files could be big (2k lines?) and I want to find the line relatively fast (within a few seconds), so I am open to suggestions.
Personal ideas:
If a file has the line at time X, then any file in which the line has not been found by the time the scan goes back past X should not be scanned any further. This would require searching all files at the same time, which I don't know how to do.
At the moment the code breaks, I suppose because of a lack of memory.
Code:
if(files.length>0) { //in case no log files exist
System.out.println("files.length: " + files.length);
for(int i = 0; i < files.length; i++) { ///for each log file look for string
System.out.println("Reading file: " + i + " " + files[i].getName());
RandomAccessFile raf = new RandomAccessFile(files[i].getAbsoluteFile(), "r"); //open log file
long lastSegment = raf.length(); //Finds how long is the files
lastSegment = raf.length()-5; //Sets a point to start looking
String leido = "";
byte array[] = new byte[1024];
/*
* Going back until we find line or file is empty.
*/
while(!leido.contains(lineToSearch)||lastSegment>0) {
System.out.println("leido: " + leido);
raf.seek(lastSegment); //move the to that point
raf.read(array); //Reads 1024 bytes and saves in array
leido = new String(array); //Saves what is read as a string
lastSegment = lastSegment-15; //move the point a little further back
}
if(lastSegment<0) {
raf.seek(leido.indexOf(lineToSearch) - 23); //to make sure we get the date (23 characters long) NOTE: it wont be negative.
raf.read(array); //Reads 1024 bytes and saves in array
leido = new String(array); //make the array into a string
Date date = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH).parse(leido.substring(0, leido.indexOf(" INFO "))); //get only the date part
System.out.println(date);
//if date is bigger than the other save file name
}
}
}
I find the code difficult to verify. One could split the task by introducing a backwards reader, which reads lines from the end of the file to the start, and then use that to parse the dates line by line.
Mind, I am not going for nice code, but something like this:
public class BackwardsReader implements Closeable {
private static final int BUFFER_SIZE = 4096;
private String charset;
private RandomAccessFile raf;
private long position;
private int readIndex;
private byte[] buffer = new byte[BUFFER_SIZE];
/**
* @param file a text file.
* @param charset with bytes '\r' and '\n' (no wide chars).
*/
public BackwardsReader(File file, String charset) throws IOException {
this.charset = charset;
raf = new RandomAccessFile(file, "r");
position = raf.length();
}
public String readLine() throws IOException {
if (position + readIndex == 0) {
raf.close();
raf = null;
return null;
}
String line = "";
for (;;) { // Loop adding blocks without newline '\n'.
// Search line start:
boolean lineStartFound = false;
int lineStartIndex = readIndex;
while (lineStartIndex > 0) {
if (buffer[lineStartIndex - 1] == (byte)'\n') {
lineStartFound = true;
break;
}
--lineStartIndex;
}
String line2;
try {
line2 = new String(buffer, lineStartIndex, readIndex - lineStartIndex,
charset).replaceFirst("\r?\n?", "");
readIndex = lineStartIndex;
} catch (UnsupportedEncodingException ex) {
Logger.getLogger(BackwardsReader.class.getName())
.log(Level.SEVERE, null, ex);
return null;
}
line = line2 + line;
if (lineStartFound) {
--readIndex;
break;
}
// Read a prior block:
int toRead = BUFFER_SIZE;
if (position - toRead < 0) {
toRead = (int) position;
}
if (toRead == 0) {
break;
}
position -= toRead;
raf.seek(position);
raf.readFully(buffer, 0, toRead);
readIndex = toRead;
if (buffer[readIndex - 1] == (byte)'\r') {
--readIndex;
}
}
return line;
}
@Override
public void close() throws IOException {
if (raf != null) {
raf.close();
}
}
}
And a usage example:
public static void main(String[] args) {
try {
File file = new File(args[0]);
BackwardsReader reader = new BackwardsReader(file, "UTF-8");
int lineCount = 0;
for (;;) {
String line = reader.readLine();
if (line == null) {
break;
}
++lineCount;
System.out.println(line);
}
reader.close();
System.out.println("Lines: " + lineCount);
} catch (IOException ex) {
Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
}
}
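Tying this back to the question, here is a minimal sketch of the comparison step. It is not the backwards scan above: for files of a few thousand lines it simply reads each file forwards and keeps the newest matching timestamp. The names LatestLineFinder and findLatest are made up for illustration, and the pattern yyyy/MM/dd HH:mm:ss:SSS is an assumption based on the sample line in the question (the "MMMM d, yyyy" pattern in the question's code does not match that sample).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class LatestLineFinder {
    // Assumed layout of the 23-character timestamp prefix: 2013/01/06 16:01:00:283
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss:SSS");
    private static final int STAMP_LENGTH = 23;

    /** Returns the file holding the newest line that contains needle, or null if none does. */
    static Path findLatest(List<Path> files, String needle) throws IOException {
        Path best = null;
        LocalDateTime bestTime = null;
        for (Path file : files) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.length() < STAMP_LENGTH || !line.contains(needle)) {
                    continue; // too short to carry a timestamp, or not the line we want
                }
                LocalDateTime time;
                try {
                    time = LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
                } catch (DateTimeParseException ex) {
                    continue; // skip lines whose prefix is not a timestamp
                }
                if (bestTime == null || time.isAfter(bestTime)) {
                    bestTime = time;
                    best = file;
                }
            }
        }
        return best;
    }
}
```

If the files grow much larger, the same comparison could be driven by a backwards reader instead, stopping at the first match in each file.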
I am trying to split one file into 4 different files. I divide the file by some value "x": I want to write the contents up to that value into one file, then continue in the next file from there, until the file contents end.
I read the file with a BufferedReader, check whether the content is equal to the x value, and split there.
The splitting happens, but not the way I want: it reads the file and writes up to line number "x". What I need is all the lines up to the point where the "x" value appears in the file.
The file contains times such as a start time in hh:mm:ss, and I compare that hh:mm:ss with my x value and do the splitting like below
// inputs to the below method
// filePath = "//somepath";
// splitlen = 30;
// name = "somename";
public void split(String FilePath, long splitlen, String name) {
long leninfile = 0, leng = 0;
int count = 1, data;
try {
File filename = new File(FilePath);
InputStream infile = new BufferedInputStream(new FileInputStream(filename));
data = infile.read();
BufferedReader br = new BufferedReader(new InputStreamReader(infile));
while (data != -1) {
filename = new File("/Users//Documents/mysrt/" + count + ".srt");
OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename));
String strLine = br.readLine();
String[] atoms = strLine.split(" --> ");
if (atoms.length == 1) {
// outfile.write(Integer.parseInt(strLine + "\n"));
}
else {
String startTS = atoms[0];
String endTS = atoms[1];
System.out.println(startTS + "\n");
System.out.println(endTS + "\n");
String startTime = startTS.replace(",", ".");
String endTime = endTS.replace(",", ".");
System.out.println("startTime" + "\n" + startTime);
System.out.println("endTime" + "\n" + endTime);
String [] arrOfStr = endTime.split(":");
System.out.println("=====arrOfStr=====");
int x = Integer.parseInt(arrOfStr[1]);
System.out.println(arrOfStr[1]);
System.out.println("===x repeat==");
System.out.println(x);
System.out.println("===splitlen repeat==");
System.out.println(splitlen);
System.out.println(data);
System.out.println(br.readLine());
System.out.println(br.read());
while (data != -1 && x < splitlen) {
outfile.write(br.readLine().getBytes());
data = infile.read();
x++;
}
System.out.println("===== out of while x =====");
System.out.println(br.readLine());
System.out.println(x);
leninfile += leng;
leng = 0;
outfile.close();
firstPage = false;
firstPage = true;
count++;
splitlen = splitlen + 30;
System.out.println("=====splitlen after=====" +splitlen);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
I am incrementing the time by some number in order to read the next lines in the file and write them into another file.
Here splitlen is 30, so it writes the first 30 lines to a new file. Then it increments splitlen by 30, i.e. to 60, but it reads the next 60 lines and writes them into the next file.
What I need is to check splitlen against the time given in the file contents and split at that line.
Please suggest where I am going wrong. A snippet would be appreciated.
Thanks.
I think this is what you want
public void split(String filePath, long splitLen, String name) {
File fileSource = new File(filePath);
int count = 0;
boolean endOfFile = false;
String lineSeparator = System.getProperty("line.separator");
int hour = 0; // an accumulator for hours
int min = 0; // an accumulator for minutes
int sec = (int) splitLen; // an accumulator for seconds
int _hour = 0; // hours from the file
int _min = 0; // minutes from the file
int _sec = 0; // seconds from the file
try ( // try with resources to close files automatically
FileReader frSource = new FileReader(fileSource);
BufferedReader buffSource = new BufferedReader(frSource);
) {
String strIn = null;
while(!endOfFile) {
File fileOut = new File("f:\\test\\mysrt\\" + count + ".srt");
try ( // try with resources to close files automatically
FileWriter fwOut = new FileWriter(fileOut);
) {
if (strIn != null) {
// write out the last line read to the new file
fwOut.write(strIn + lineSeparator);
}
for (int i = 0; i < splitLen; i++) {
strIn = buffSource.readLine();
if (strIn == null) {
endOfFile = true; // stop the while loop
break; // exit the for loop
}
if (strIn.indexOf("-->") > 0) {
String endTime = strIn.split("-->")[1];
_hour = extractHours(endTime); // get the hours from the file
_min = extractMinutes(endTime); // get the minutes from the file
_sec = extractSeconds(endTime); // get the seconds from the file
if (_hour >= hour && _min >= min && _sec >= sec) { // if the file time is greater than our accumulators
sec += splitLen; // increment our accumulator seconds
if (sec >= 60) { // if accumulator seconds is greater than 59, we need to convert it to minutes and seconds
min += sec / 60;
sec = sec % 60;
}
if (min >= 60) { // if accumulator minutes is greater than 59, we need to convert it to hours and minutes
hour += min / 60;
min = min % 60;
}
break; // break out of the for loop, which cause the file to be completed and a new file started.
}
}
fwOut.write(strIn + lineSeparator); // write out to the new file
}
fwOut.flush();
}
count++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
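private int extractHours(String time) {
// You need to implement this, I don't know the format of your time
return 0;
}
private int extractMinutes(String time) {
// You need to implement this, I don't know the format of your time
return 0;
}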
private int extractSeconds(String time) {
// You need to implement this, I don't know the format of your time
return 0;
}
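As a starting point for the extract methods, here is a minimal sketch. It assumes the timestamps follow the usual .srt layout HH:mm:ss,SSS --> HH:mm:ss,SSS, which is only a guess based on the " --> " split and the comma replacement in the question's code. The class and method names (SrtTime, toSeconds, endSeconds) are made up for illustration. Converting to a single number of seconds also means one comparison is enough, instead of comparing hours, minutes and seconds separately.

```java
// Hypothetical helper; the HH:mm:ss,SSS layout is an assumption about the input file.
final class SrtTime {
    /** Converts "HH:mm:ss,SSS" (comma or dot before the millis) to whole seconds. */
    static int toSeconds(String stamp) {
        // Split on ':', ',' or '.', giving hours, minutes, seconds and millis.
        String[] parts = stamp.trim().split("[:,.]");
        int hours = Integer.parseInt(parts[0]);
        int minutes = Integer.parseInt(parts[1]);
        int seconds = Integer.parseInt(parts[2]);
        return hours * 3600 + minutes * 60 + seconds; // millis are ignored
    }

    /** Returns the end time, in seconds, of a "start --> end" timestamp row. */
    static int endSeconds(String timestampRow) {
        return toSeconds(timestampRow.split("-->")[1]);
    }
}
```

With this, the split condition could become something like endSeconds(strIn) >= nextSplitSeconds, where nextSplitSeconds grows by splitLen after every split.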
The problem with your code is that the timestamp you're looking at is in HH:mm:ss format, but with the splitlen and x variables you are only working with minutes.
So you need to keep track of both hours and minutes. Maybe this could be done with some DateTime class, but here is a simple int solution:
//somewhere at the top
int hour = 0;
int minutes = 30;
//where you today increase splitlen
minutes += 30;
if (minutes == 60) {
hour++;
minutes = 0;
}
//parse also hours
int y = Integer.parseInt(arrOfStr[0]);
int x = Integer.parseInt(arrOfStr[1]);
//you need to rewrite this to compare x and y against hour and minutes
while (data != -1 && x < splitlen) {
So now you will not be looking for 30, 60, 90, ... minutes but instead for 00:30, 01:00, 01:30 and so on. You must also be prepared to handle the situation where there is no entry for a whole minute, unless you already do so.
checkTime (shown in the update below) is a key method here. It might be a good idea to make the hour and minute of the last split class members, but they could also be passed as parameters from split().
Update
Here is a simplified version of the split method as an example of how to solve this. It is not complete but should be a good starting point. It relies on how a .srt file is structured and uses the logic explained above to determine when to open a new output file.
public void split(String filepath, long splitlen, String name) {
int count = 1;
try {
File filename = new File(filepath);
InputStream infile = new BufferedInputStream(new FileInputStream(filename));
BufferedReader br = new BufferedReader(new InputStreamReader(infile));
FileWriter outfile = createOutputFile(count);
boolean isEndOfFile = false;
while (!isEndOfFile) {
String line = null;
int i = 1;
while ((line = br.readLine()) != null) {
outfile.write(line + System.lineSeparator()); //readLine() strips the line terminator, so add it back
if (line.trim().isEmpty()) { //last line of group
i = 1;
continue;
}
if (i == 2) { //Timestamp row
String[] split = line.split("-->");
if (checkTime(split)) {
count++;
outfile.flush();
outfile.close();
outfile = createOutputFile(count);
}
}
i++;
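}
isEndOfFile = true; //readLine() returned null: the input is exhausted, so leave the outer loop
}
outfile.close();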
} catch (Exception e) {
e.printStackTrace();
}
}
private FileWriter createOutputFile(int index) {
//Create new outputfile and writer
return null;
}
private boolean checkTime(String[] arr) {
//use start or end time in arr to check if an even half or full hour has been passed
return true;
}
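To illustrate the idea behind checkTime, here is a minimal sketch that keeps the next split boundary as a class member. It assumes timestamps of the form HH:mm:ss,SSS; the class name SplitClock and its constructor parameter are made up for illustration. It takes one timestamp string rather than the String[] used above, so the caller would pass something like split[1].

```java
// Hypothetical helper showing one way checkTime could keep its state.
final class SplitClock {
    private final int stepMinutes;    // distance between two splits, e.g. 30
    private int nextBoundaryMinutes;  // next point in time at which to split

    SplitClock(int stepMinutes) {
        this.stepMinutes = stepMinutes;
        this.nextBoundaryMinutes = stepMinutes;
    }

    /** Returns true when the given "HH:mm:ss,SSS" time has reached the next boundary. */
    boolean checkTime(String stamp) {
        String[] parts = stamp.trim().split(":");
        int total = Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
        if (total < nextBoundaryMinutes) {
            return false;
        }
        // Move the boundary past the current time, so a long gap without
        // entries triggers only one split instead of several empty files.
        while (nextBoundaryMinutes <= total) {
            nextBoundaryMinutes += stepMinutes;
        }
        return true;
    }
}
```

Working in total minutes avoids the special case of rolling minutes over into hours.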
I have a project that needs to modify some text in a text file.
For example, BB,BO,BR,BZ,CL,VE-BR
needs to become BB,BO,BZ,CL,VE,
and HU, LT, LV, UA, PT-PT/AR needs to become HU, LT, LV, UA,/AR.
I have written some code, but it fails to loop, and it also fails in this case:
IN/CI, GH, KE, NA, NG, SH, ZW /EE, HU, LT, LV, UA,/AR, BB
"AR, BB,BO,BR,BZ,CL, CO, CR, CW, DM, DO,VE-AR-BR-MX"
I want to delete the AR in the second row, but it only deletes the AR in the first row.
I have no idea how to fix this and am looking for help.
Please
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.util.Scanner;
public class tomy {
static StringBuffer stringBufferOfData = new StringBuffer();
static StringBuffer stringBufferOfData1 = stringBufferOfData;
static String filename = null;
static String input = null;
static String s = "-";
static Scanner sc = new Scanner(s);
public static void main(String[] args) {
boolean fileRead = readFile();
if (fileRead) {
replacement();
writeToFile();
}
System.exit(0);
}
private static boolean readFile() {
System.out.println("Please enter your files name and path i.e C:\\test.txt: ");
filename = "C:\\test.txt";
Scanner fileToRead = null;
try {
fileToRead = new Scanner(new File(filename));
for (String line; fileToRead.hasNextLine()
&& (line = fileToRead.nextLine()) != null;) {
System.out.println(line);
stringBufferOfData.append(line).append("\r\n");
}
fileToRead.close();
return true;
} catch (FileNotFoundException ex) {
System.out.println("The file " + filename + " could not be found! "+ ex.getMessage());
return false;
} finally {
fileToRead.close();
return true;
}
}
private static void writeToFile() {
try {
BufferedWriter bufwriter = new BufferedWriter(new FileWriter(
filename));
bufwriter.write(stringBufferOfData.toString());
bufwriter.close();
} catch (Exception e) {// if an exception occurs
System.out.println("Error occured while attempting to write to file: "+ e.getMessage());
}
}
private static void replacement() {
System.out.println("Please enter the contents of a line you would like to edit: ");
String lineToEdit = sc.nextLine();
int startIndex = stringBufferOfData.indexOf(lineToEdit);
int endIndex = startIndex + lineToEdit.length() + 2;
String getdata = stringBufferOfData.substring(startIndex + 1, endIndex);
String data = " ";
Scanner sc1 = new Scanner(getdata);
Scanner sc2 = new Scanner(data);
String lineToEdit1 = sc1.nextLine();
String replacementText1 = sc2.nextLine();
int startIndex1 = stringBufferOfData.indexOf(lineToEdit1);
int endIndex1 = startIndex1 + lineToEdit1.length() + 3;
boolean test = lineToEdit.contains(getdata);
boolean testh = lineToEdit.contains("-");
System.out.println(startIndex);
if (testh = true) {
stringBufferOfData.replace(startIndex, endIndex, replacementText1);
stringBufferOfData.replace(startIndex1, endIndex1 - 2,
replacementText1);
System.out.println("Here is the new edited text:\n"
+ stringBufferOfData);
} else {
System.out.println("nth" + stringBufferOfData);
System.out.println(getdata);
}
}
}
I wrote a quick method for you that I think does what you want, i.e. remove all occurrences of a token in a line, where that token is embedded in the line and is identified by a leading dash.
The method reads the file and writes each line straight out to a new file after removing the token. This allows you to process a huge file without worrying about memory constraints.
You can simply rename the output file after a successful edit. I'll leave it up to you to work that out.
If you feel you really must use string buffers to do everything in memory, then grab the line-editing logic from my method and modify it to work with string buffers.
static void onePassReadEditWrite(final String inputFilePath, final String outputPath)
{
// the input file
Scanner inputScanner = null;
// output file
FileWriter outputWriter = null;
try
{
// open the input file
inputScanner = new Scanner(new File(inputFilePath));
// open output file
File outputFile = new File(outputPath);
outputFile.createNewFile();
outputWriter = new FileWriter(outputFile);
try
{
for (
String lineToEdit = inputScanner.nextLine();
/*
* NOTE: when this loop attempts to read beyond EOF it will throw the
* java.util.NoSuchElementException exception which is caught in the
* containing try/catch block.
*
* As such there is NO predicate required for this loop.
*/;
lineToEdit = inputScanner.nextLine()
)
// scan all lines from input file
{
System.out.println("START LINE [" + lineToEdit + "]");
// get position of dash in line
int dashInLinePosition = lineToEdit.indexOf('-');
while (dashInLinePosition != -1)
// this line has needs editing
{
// split line on dash
String halfLeft = lineToEdit.substring(0, dashInLinePosition);
String halfRight = lineToEdit.substring(dashInLinePosition + 1);
// get token after dash that is to be removed from whole line
String tokenToRemove = halfRight.substring(0, 2);
// reconstruct line from the 2 halves without the dash
StringBuilder sb = new StringBuilder(halfLeft);
sb.append(halfRight.substring(0));
lineToEdit = sb.toString();
// get position of first token in line
int tokenInLinePosition = lineToEdit.indexOf(tokenToRemove);
while (tokenInLinePosition != -1)
// do for all tokens in line
{
// split line around token to be removed
String partLeft = lineToEdit.substring(0, tokenInLinePosition);
String partRight = lineToEdit.substring(tokenInLinePosition + tokenToRemove.length());
if ((!partRight.isEmpty()) && (partRight.charAt(0) == ','))
// remove prefix comma from right part
{
partRight = partRight.substring(1);
}
// reconstruct line from the left and right parts
sb.setLength(0);
sb = new StringBuilder(partLeft);
sb.append(partRight);
lineToEdit = sb.toString();
// find next token to be removed from line
tokenInLinePosition = lineToEdit.indexOf(tokenToRemove);
}
// handle additional dashes in line
dashInLinePosition = lineToEdit.indexOf('-');
}
System.out.println("FINAL LINE [" + lineToEdit + "]");
// write line to output file
outputWriter.write(lineToEdit);
outputWriter.write("\r\n");
}
}
catch (java.util.NoSuchElementException e)
// end of scan
{
}
finally
// housekeeping
{
outputWriter.close();
inputScanner.close();
}
}
catch(FileNotFoundException e)
{
e.printStackTrace();
}
catch(IOException e)
{
inputScanner.close();
e.printStackTrace();
}
}
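For the rename step left open above, here is a minimal sketch using java.nio.file.Files.move. The class and method names (ReplaceOriginal, replace) are made up for illustration, and it assumes the edited output was written successfully before it is called.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for swapping the edited output in place of the input.
final class ReplaceOriginal {
    /** Moves the edited file over the original, replacing the original's contents. */
    static void replace(Path edited, Path original) throws IOException {
        // REPLACE_EXISTING overwrites the target if it already exists.
        Files.move(edited, original, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Call it only after onePassReadEditWrite has finished without an exception, so a failed edit never overwrites the original.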
My program prints the contents of the file that I need to convert to hex values, but the hex values themselves do not appear in the output. Why isn't my file being converted into hex values?
public class HexUtilityDump {
public static void main(String[] args) {
FileReader myFileReader = null;
try {
myFileReader = new FileReader("src/hexUtility/test.txt");
} catch(Exception ex) {
System.out.println("Error opening file: " + ex.getLocalizedMessage());
}
BufferedReader b = null;
b = new BufferedReader(myFileReader);
//Loop through all the records in the file and print them on the console
while (true){
String myLine;
try {
myLine = b.readLine();
//check for null returned from readLine() and exit loop if so.
if (myLine ==null){break;}
System.out.println(myLine);
} catch (IOException e) {
e.printStackTrace();
//it is time to exit the while loop
break;
}
}
}
Here is the code to pull the file through the conversion
public static void convertToHex(PrintStream out, File myFileReader) throws IOException {
InputStream is = new FileInputStream(myFileReader);
int bytesCounter =0;
int value = 0;
StringBuilder sbHex = new StringBuilder();
StringBuilder sbText = new StringBuilder();
StringBuilder sbResult = new StringBuilder();
while ((value = is.read()) != -1) {
//convert to hex value with "X" formatter
sbHex.append(String.format("%02X ", value));
//If the character is not convertible, just print a dot symbol "."
if (!Character.isISOControl(value)) {
sbText.append((char)value);
} else {
sbText.append(".");
}
//if 16 bytes are read, reset the counter,
//clear the StringBuilder for formatting purpose only.
if(bytesCounter==15) {
sbResult.append(sbHex).append(" ").append(sbText).append("\n");
sbHex.setLength(0);
sbText.setLength(0);
bytesCounter=0;
}else{
bytesCounter++;
}
}
//if still got content
if(bytesCounter!=0){
//add spaces more formatting purpose only
for(; bytesCounter<16; bytesCounter++){
//1 character 3 spaces
sbHex.append(" ");
}
sbResult.append(sbHex).append(" ").append(sbText).append("\n");
}
out.print(sbResult);
is.close();
}
You never call convertToHex. Remove the file reading from your main() method and call it instead. It appears you wanted to do something like:
File f = new File("src/hexUtility/test.txt");
convertToHex(System.out, f);
I am trying to erase the last line in a text file using Java; however, the code below deletes everything.
public void eraseLast()
{
while(reader.hasNextLine()) {
reader.nextLine();
if (!reader.hasNextLine()) {
try {
fWriter = new FileWriter("config/lastWindow.txt");
writer = new BufferedWriter(fWriter);
writer.write("");
writer.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
If you wanted to delete the last line from the file without creating a new file, you could do something like this:
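RandomAccessFile f = new RandomAccessFile(fileName, "rw");
long length = f.length() - 1;
byte b; // declared outside the loop so the while condition can see it
do {
length -= 1;
f.seek(length);
b = f.readByte();
} while (b != 10); // 10 is the linefeed character '\n'
f.setLength(length + 1);
f.close();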
Start off at the second last byte, looking for a linefeed character, and keep seeking backwards until you find one. Then truncate the file after that linefeed.
You start at the second last byte rather than the last in case the last character is a linefeed (i.e. the end of the last line).
You are creating a new file that replaces the old one. You want something like this:
public void eraseLast() {
StringBuilder s = new StringBuilder();
while (reader.hasNextLine()) {
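String line = reader.nextLine(); // reader is a Scanner, which has nextLine() rather than readLine()
if (reader.hasNextLine()) {
s.append(line).append(System.lineSeparator()); // nextLine() strips the terminator, so add it back
}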
}
try {
fWriter = new FileWriter("config/lastWindow.txt");
writer = new BufferedWriter(fWriter);
writer.write(s.toString());
writer.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
The answer above needs to be slightly modified to deal with the case where there is only 1 line left in the file (otherwise you get an IOException for negative seek offset):
RandomAccessFile f = new RandomAccessFile(fileName, "rw");
long length = f.length() - 1;
byte b; // declared outside the loop so the while condition can see it
do {
length -= 1;
f.seek(length);
b = f.readByte();
} while (b != 10 && length > 0);
if (length == 0) {
f.setLength(length);
} else {
f.setLength(length + 1);
}
f.close();
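For reference, the same backwards scan can be wrapped in a self-contained method. This is a minimal sketch under a few assumptions: lines end in '\n', an empty file is left untouched, and the names LastLineEraser and eraseLastLine are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper, not taken from the answers above.
final class LastLineEraser {
    /** Truncates the file so that its last line (and that line's terminator) is removed. */
    static void eraseLastLine(Path path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "rw")) {
            long pos = f.length() - 1; // index of the last byte
            if (pos < 0) {
                return; // empty file: nothing to erase
            }
            pos--; // skip the last byte, which may be the final line's own '\n'
            while (pos >= 0) {
                f.seek(pos);
                if (f.readByte() == '\n') {
                    break; // found the linefeed that ends the previous line
                }
                pos--;
            }
            // pos is the index of that linefeed, or -1 if the file held a single line.
            f.setLength(pos + 1);
        }
    }
}
```

The try-with-resources block closes the file even if a seek or read fails, and starting the scan from pos >= 0 avoids the negative seek offset mentioned above.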
You're opening the file in overwrite mode, so a single write operation wipes the entire contents of the file. To open it in append mode it should be:
fWriter = new FileWriter("config/lastWindow.txt", true);
And besides, it's not going to delete the last line: although the reader has reached the current last line of the file, the writer is after the last line - because we specified above that the append mode should be used.
Take a look at this answer to get an idea of what you'll have to do.
I benefited from others but the code was not working. Here is my working code on android studio.
File file = new File(getFilesDir(), "mytextfile.txt");
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
byte b;
long length = randomAccessFile.length() ;
if (length != 0) {
do {
length -= 1;
randomAccessFile.seek(length);
b = randomAccessFile.readByte();
} while (b != 10 && length > 0);
randomAccessFile.setLength(length);
randomAccessFile.close();
}
This is my solution
private fun removeLastSegment() {
val reader = BufferedReader(FileReader(segmentsFile))
val segments = ArrayList<Segment>()
var line: String?
while (reader.readLine().also { line = it } != null) {
segments.add(Gson().fromJson(line, Segment::class.java))
}
reader.close()
segments.remove(segments.last())
var writer = BufferedWriter(FileWriter(segmentsFile))
writer.write("")
writer = BufferedWriter(FileWriter(segmentsFile, true))
for (segment in segments) {
writer.appendLine(Gson().toJson(segment))
}
writer.flush()
writer.close()
lastAction--
lastSegment--
}
I use huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, linux' wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int readChars = is.read(c);
if (readChars == -1) {
// bail out if nothing to read
return 0;
}
// make it easy for the optimizer to tune this loop
int count = 0;
while (readChars == 1024) {
for (int i=0; i<1024;) {
if (c[i++] == '\n') {
++count;
}
}
readChars = is.read(c);
}
// count remaining characters
while (readChars != -1) {
for (int i=0; i<readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
readChars = is.read(c);
}
return count == 0 ? 1 : count;
} finally {
is.close();
}
}
Benchmark results for a 1.3GB text file, y axis in seconds. I've performed 100 runs with the same file, and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers, while countLinesNew has none, and although it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
I have implemented another solution to the problem, I found it more efficient in counting rows:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}
The accepted answer has an off by one error for multi line files which don't end in newline. A one line file ending without a newline would return 1, but a two line file ending without a newline would return 1 too. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time wise compared to the overall function.
public int count(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean endsWithoutNewLine = false;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
endsWithoutNewLine = (c[readChars - 1] != '\n');
}
if(endsWithoutNewLine) {
++count;
}
return count;
} finally {
is.close();
}
}
With Java 8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
long numOfLines = lines.count();
...
}
The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
I tested the above methods for counting lines and here are my observations for Different methods as tested on my system
File Size : 1.6 Gb
Methods:
Using Scanner : 35s approx
Using BufferedReader : 5s approx
Using Java 8 : 5s approx
Using LineNumberReader : 5s approx
Moreover Java8 Approach seems quite handy :
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type : long]
I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
long linesCount = 0;
File file = new File(fileName);
FileInputStream fileIn = new FileInputStream(file);
try {
Charset encoding = Charset.forName(encodingName);
Reader fileReader = new InputStreamReader(fileIn, encoding);
int bufferSize = 4096;
Reader reader = new BufferedReader(fileReader, bufferSize);
char[] buffer = new char[bufferSize];
int prevChar = -1;
int readCount = reader.read(buffer);
while (readCount != -1) {
for (int i = 0; i < readCount; i++) {
int nextChar = buffer[i];
switch (nextChar) {
case '\r': {
// The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
linesCount++;
break;
}
case '\n': {
if (prevChar == '\r') {
// The current line is terminated by a carriage return immediately followed by a line feed.
// The line has already been counted.
} else {
// The current line is terminated by a line feed.
linesCount++;
}
break;
}
}
prevChar = nextChar;
}
readCount = reader.read(buffer);
}
if (prevChar != -1) {
switch (prevChar) {
case '\r':
case '\n': {
// The last line is terminated by a line terminator.
// The last line has already been counted.
break;
}
default: {
// The last line is terminated by end-of-file.
linesCount++;
}
}
}
} finally {
fileIn.close();
}
return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
Tested on JDK8_u31. But indeed performance is slow compared to this method:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
Tested and very fast.
A straightforward way using Scanner:
static void lineCounter (String path) throws IOException {
int lineCount = 0, commentsCount = 0;
// try-with-resources closes the Scanner (and the underlying file) when done
try (Scanner input = new Scanner(new File(path))) {
while (input.hasNextLine()) {
String data = input.nextLine();
if (data.startsWith("//")) commentsCount++;
lineCount++;
}
}
System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}
I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas' solution, based on LineNumberReader but adding one to the line count, returns non-intuitive results on files where the last line does end with a newline.
I therefore wrote an algorithm that behaves as follows:
@Test
public void empty() throws IOException {
assertEquals(0, count(""));
}
@Test
public void singleNewline() throws IOException {
assertEquals(1, count("\n"));
}
@Test
public void dataWithoutNewline() throws IOException {
assertEquals(1, count("one"));
}
@Test
public void oneCompleteLine() throws IOException {
assertEquals(1, count("one\n"));
}
@Test
public void twoCompleteLines() throws IOException {
assertEquals(2, count("one\ntwo\n"));
}
@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
assertEquals(2, count("one\ntwo"));
}
@Test
public void aFewLines() throws IOException {
assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
char[] buf = new char[8192];
int n, previousN = -1;
//read() returns at least one char until EOF, so buf always holds the last chunk read
while((n = lnr.read(buf)) != -1) {
previousN = n;
}
int ln = lnr.getLineNumber();
if (previousN == -1) {
//No data read at all, i.e file was empty
return 0;
} else {
char lastChar = buf[previousN - 1];
if (lastChar == '\n' || lastChar == '\r') {
//Ends with a line terminator: getLineNumber() already counts the last line
return ln;
}
}
//normal case, return line number + 1
return ln + 1;
}
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas' solution, but don't add one to the result, and retry the skip:
try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
while(lnr.skip(Long.MAX_VALUE) > 0){};
return lnr.getLineNumber();
}
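For reference, the tests above call a count(String) helper that isn't shown. A minimal sketch of what it might look like, assuming UTF-8 input and wrapping the same countLines logic (the class name LineCountDemo and the helper itself are my assumptions, not part of the original code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LineCountDemo {
    // Same logic as countLines above, with an explicit charset (assumed UTF-8).
    static long countLines(InputStream is) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n, previousN = -1;
            while ((n = lnr.read(buf)) != -1) {
                previousN = n; // remember the size of the last chunk read
            }
            int ln = lnr.getLineNumber();
            if (previousN == -1) {
                return 0; // nothing was read: empty input
            }
            char lastChar = buf[previousN - 1];
            // A trailing terminator means the last line is already counted.
            return (lastChar == '\n' || lastChar == '\r') ? ln : ln + 1;
        }
    }

    // Hypothetical helper matching the count(...) calls in the tests.
    static long count(String text) {
        try {
            return countLines(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this in place, count("one\ntwo") and count("one\ntwo\n") both give 2, which is the "intuitive" behaviour the tests describe.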
How about using the Process class from within Java code, and then reading the output of the command?
// ProcessBuilder passes the filename as a single argument, so spaces in the path are safe
Process p = new ProcessBuilder("wc", "-l", yourfilename).start();
int lineCount = 0;
try (BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
String line;
while ((line = b.readLine()) != null) {
System.out.println(line);
// wc prints "<count> <filename>", so parse only the first token
lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
}
p.waitFor();
Need to try it though. Will post the results.
It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
// Read to end of file; getLineNumber() then equals the number of lines read,
// including a last line that has no terminator.
while (count.readLine() != null) {
}
lines = count.getLineNumber();
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to get a stream of the lines, and then the Stream count() method to count them. The result is already the number of rows in the text file, including a last line with no terminator, so nothing needs to be added.
As an example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int)count.lines().count();
count.close();
System.out.println(lines);
This funny solution actually works really well!
public static int countLines(File input) throws IOException {
try (InputStream is = new FileInputStream(input)) {
int count = 1;
for (int aChar = 0; aChar != -1;aChar = is.read())
count += aChar == '\n' ? 1 : 0;
return count;
}
}
On Unix-based systems, use the wc command on the command-line.
The only way to know how many lines there are in a file is to count them. You can of course derive an average line length from your data and then divide the file size by it, but that won't be accurate.
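If an approximation is good enough, the average-length idea can be sketched roughly like this. It samples only the start of the file and extrapolates, so it assumes the sample is representative of the rest; the class name, method names and sample size are all made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LineCountEstimator {
    // Extrapolates from a sample: total bytes / (sample bytes per newline).
    static long estimate(long totalBytes, byte[] sample, int sampleLength) {
        if (totalBytes == 0 || sampleLength <= 0) {
            return 0;
        }
        long newlines = 0;
        for (int i = 0; i < sampleLength; i++) {
            if (sample[i] == '\n') {
                newlines++;
            }
        }
        if (newlines == 0) {
            return 1; // no terminator in the sample: assume a single line
        }
        double avgLineLength = (double) sampleLength / newlines;
        return Math.round(totalBytes / avgLineLength);
    }

    // Reads up to sampleSize bytes from the start of the file, then extrapolates.
    static long estimate(Path file, int sampleSize) throws IOException {
        byte[] sample = new byte[sampleSize];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < sample.length
                    && (n = in.read(sample, filled, sample.length - filled)) != -1) {
                filled += n;
            }
        }
        return estimate(Files.size(file), sample, filled);
    }
}
```

This is only an estimate: a log file whose early lines are much shorter or longer than the rest will throw it off.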
If you don't have any index structures, you can't avoid reading the complete file. But you can optimize it by not reading it line by line and instead using a regex to match all line terminators.
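A rough sketch of the regex idea, assuming the file fits in memory and that you know its charset (the class and method names here are hypothetical). Note it counts terminators, like wc -l, so a last line without a terminator is not counted:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineCounter {
    // \r\n is tried first so a Windows line ending counts once, not twice.
    private static final Pattern TERMINATOR = Pattern.compile("\\r\\n|[\\r\\n]");

    // Counts line terminators in text that is already in memory.
    static long countTerminators(CharSequence text) {
        Matcher m = TERMINATOR.matcher(text);
        long count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Loads the whole file, so this is only suitable for files that fit in memory.
    static long countTerminators(Path file, Charset charset) throws IOException {
        return countTerminators(new String(Files.readAllBytes(file), charset));
    }
}
```

I haven't timed this against the byte-loop versions in the other answers, so treat any speed benefit as unproven.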
Optimized code for multi-line files that have no newline ('\n') character at EOF:
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}
Scanner with regex:
public int getLineCount() {
Scanner fileScanner = null;
int lineCount = 0;
Pattern lineEndPattern = Pattern.compile("(?m)$");
try {
fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
while (fileScanner.hasNext()) {
fileScanner.next();
++lineCount;
}
}catch(FileNotFoundException e) {
e.printStackTrace();
return lineCount;
}
fileScanner.close();
return lineCount;
}
Haven't clocked it.
If you use this:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
you can't handle files with more rows than Integer.MAX_VALUE (about 2.1 billion), because reader.getLineNumber() returns an int. You need a long to count beyond that.
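If the int return type is a concern, one option is to keep your own long counter instead of relying on getLineNumber(). A minimal sketch (the class name LongLineCounter is made up, and IOExceptions are rethrown unchecked just to keep it short):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class LongLineCounter {
    // Counts lines with a long, so the result is not capped at Integer.MAX_VALUE.
    static long countLines(Reader source) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For a file you would call it as countLines(new FileReader(filename)); like readLine() itself, it counts a last line that has no terminator.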