Number of lines in a file in Java - java

I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.

This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int readChars = is.read(c);
if (readChars == -1) {
// bail out if nothing to read
return 0;
}
// make it easy for the optimizer to tune this loop
int count = 0;
while (readChars == 1024) {
for (int i=0; i<1024;) {
if (c[i++] == '\n') {
++count;
}
}
readChars = is.read(c);
}
// count remaining characters
while (readChars != -1) {
for (int i=0; i<readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
readChars = is.read(c);
}
return count == 0 ? 1 : count;
} finally {
is.close();
}
}
Benchmark results for a 1.3GB text file, y axis in seconds. I've performed 100 runs with the same file, and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers, and countLinesNew has none, and while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
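The measurement procedure described above (repeated runs, each timed with System.nanoTime()) could be sketched roughly as follows. This is not the harness the answer actually used; the class name LineCountBenchmark and the methods time and median are made up for illustration.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the original answer.
class LineCountBenchmark {
    /** Runs the task `runs` times and returns each run's elapsed wall-clock time in nanoseconds. */
    static long[] time(LongSupplier task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong();                 // the result is ignored; only elapsed time is recorded
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Middle element of the sorted timings (the upper median for an even number of runs). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }
}
```

Because the counting methods throw IOException, a caller would have to wrap them in a lambda that rethrows it as an unchecked exception. Looking at all timings rather than one average makes early, not-yet-optimized runs visible as outliers.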

I have implemented another solution to the problem, I found it more efficient in counting rows:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}

The accepted answer has an off by one error for multi line files which don't end in newline. A one line file ending without a newline would return 1, but a two line file ending without a newline would return 1 too. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time wise compared to the overall function.
public int count(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean endsWithoutNewLine = false;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
endsWithoutNewLine = (c[readChars - 1] != '\n');
}
if(endsWithoutNewLine) {
++count;
}
return count;
} finally {
is.close();
}
}

With java-8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
long numOfLines = lines.count();
...
}

The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}

I tested the above methods for counting lines. Here are my observations for the different methods, as tested on my system.
File size: 1.6 GB
Methods:
Using Scanner: approx. 35 s
Using BufferedReader: approx. 5 s
Using Java 8: approx. 5 s
Using LineNumberReader: approx. 5 s
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type: long]

I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
long linesCount = 0;
File file = new File(fileName);
FileInputStream fileIn = new FileInputStream(file);
try {
Charset encoding = Charset.forName(encodingName);
Reader fileReader = new InputStreamReader(fileIn, encoding);
int bufferSize = 4096;
Reader reader = new BufferedReader(fileReader, bufferSize);
char[] buffer = new char[bufferSize];
int prevChar = -1;
int readCount = reader.read(buffer);
while (readCount != -1) {
for (int i = 0; i < readCount; i++) {
int nextChar = buffer[i];
switch (nextChar) {
case '\r': {
// The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
linesCount++;
break;
}
case '\n': {
if (prevChar == '\r') {
// The current line is terminated by a carriage return immediately followed by a line feed.
// The line has already been counted.
} else {
// The current line is terminated by a line feed.
linesCount++;
}
break;
}
}
prevChar = nextChar;
}
readCount = reader.read(buffer);
}
if (prevChar != -1) {
switch (prevChar) {
case '\r':
case '\n': {
// The last line is terminated by a line terminator.
// The last line has already been counted.
break;
}
default: {
// The last line is terminated by end-of-file.
linesCount++;
}
}
}
} finally {
fileIn.close();
}
return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).

/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
Tested on JDK8_u31. But indeed performance is slow compared to this method:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
Tested and very fast.

A straight-forward way using Scanner
static void lineCounter (String path) throws IOException {
int lineCount = 0, commentsCount = 0;
Scanner input = new Scanner(new File(path));
while (input.hasNextLine()) {
String data = input.nextLine();
if (data.startsWith("//")) commentsCount++;
lineCount++;
}
System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}

I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution based on LineNumberReader, which adds one to the line count, returned non-intuitive results on files where the last line does end with a newline.
I therefore made an algorithm which behaves as follows:
@Test
public void empty() throws IOException {
assertEquals(0, count(""));
}
@Test
public void singleNewline() throws IOException {
assertEquals(1, count("\n"));
}
@Test
public void dataWithoutNewline() throws IOException {
assertEquals(1, count("one"));
}
@Test
public void oneCompleteLine() throws IOException {
assertEquals(1, count("one\n"));
}
@Test
public void twoCompleteLines() throws IOException {
assertEquals(2, count("one\ntwo\n"));
}
@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
assertEquals(2, count("one\ntwo"));
}
@Test
public void aFewLines() throws IOException {
assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
char[] buf = new char[8192];
int n, previousN = -1;
//Read will return at least one byte, no need to buffer more
while((n = lnr.read(buf)) != -1) {
previousN = n;
}
int ln = lnr.getLineNumber();
if (previousN == -1) {
//No data read at all, i.e file was empty
return 0;
} else {
char lastChar = buf[previousN - 1];
if (lastChar == '\n' || lastChar == '\r') {
//Ending with newline, deduct one
return ln;
}
}
//normal case, return line number + 1
return ln + 1;
}
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
while(lnr.skip(Long.MAX_VALUE) > 0){};
return lnr.getLineNumber();
}

How about using the Process class from within Java code? And then reading the output of the command.
Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
p.waitFor();
BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
System.out.println(line);
// wc -l prints "<count> <filename>", so parse only the first token
lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
Need to try it though. Will post the results.

It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
String line = count.readLine();
if(count.ready())
{
while(line != null) {
lines = count.getLineNumber();
line = count.readLine();
}
lines+=1;
}
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to return a stream of the lines, and then use the Stream count() method to count them. The count is already the number of rows in the text file, so nothing needs to be added to it.
For example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int)count.lines().count();
count.close();
System.out.println(lines);

This funny solution actually works really well!
public static int countLines(File input) throws IOException {
try (InputStream is = new FileInputStream(input)) {
int count = 1;
for (int aChar = 0; aChar != -1;aChar = is.read())
count += aChar == '\n' ? 1 : 0;
return count;
}
}

On Unix-based systems, use the wc command on the command-line.

Only way to know how many lines there are in file is to count them. You can of course create a metric from your data giving you an average length of one line and then get the file size and divide that with avg. length but that won't be accurate.
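The estimate mentioned above (average line length against file size) might look something like this. It is only a sketch under the assumption that the start of the file is representative of the rest; LineCountEstimator, estimate and the sampleBytes parameter are illustrative names, not an existing API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: extrapolates the newline density of a leading sample to the whole file.
class LineCountEstimator {
    static long estimate(Path file, int sampleBytes) throws IOException {
        long size = Files.size(file);
        if (size == 0 || sampleBytes <= 0) {
            return 0;
        }
        byte[] buf = new byte[(int) Math.min(size, sampleBytes)];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until the sample is full
            while (filled < buf.length && (n = in.read(buf, filled, buf.length - filled)) != -1) {
                filled += n;
            }
        }
        if (filled == 0) {
            return 0;
        }
        long newlines = 0;
        for (int i = 0; i < filled; i++) {
            if (buf[i] == '\n') {
                newlines++;
            }
        }
        // Scale the sampled newline count by (file size / sample size).
        return Math.round((double) newlines * size / filled);
    }
}
```

If the sample covers the whole file, the result is the exact newline count; otherwise it is only as good as the assumption that line lengths are similar throughout the file.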

If you don't have any index structures, you'll not get around the reading of the complete file. But you can optimize it by avoiding to read it line by line and use a regex to match all line terminators.
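A minimal sketch of the regex idea, assuming the text is already available as a CharSequence in memory (the class and method names are invented for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts line terminators with a regular expression.
class RegexLineCounter {
    // "\r\n" is listed first so that a Windows line ending is counted once, not twice.
    private static final Pattern TERMINATOR = Pattern.compile("\r\n|\r|\n");

    static long countTerminators(CharSequence text) {
        Matcher m = TERMINATOR.matcher(text);
        long count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

For a file too large to hold in memory, the text would have to be fed in chunks, and a "\r\n" pair split across two chunks would need special handling. That part is left out here.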

Best Optimized code for multi line files having no newline('\n') character at EOF.
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}

Scanner with regex:
public int getLineCount() {
Scanner fileScanner = null;
int lineCount = 0;
Pattern lineEndPattern = Pattern.compile("(?m)$");
try {
fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
while (fileScanner.hasNext()) {
fileScanner.next();
++lineCount;
}
}catch(FileNotFoundException e) {
e.printStackTrace();
return lineCount;
}
fileScanner.close();
return lineCount;
}
Haven't clocked it.

If you use this:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
you cannot handle very large numbers of rows (more than Integer.MAX_VALUE, about 2.1 billion), because the value returned by reader.getLineNumber() is an int. You need a long to count the maximum number of rows.

Related

Is there a way to get the updated part of a file (for example a game log) that continuously updates? [duplicate]

What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?
Below are two functions, one that returns the last non-blank line of a file without loading or stepping through the entire file, and the other that returns the last N lines of the file without stepping through the entire file:
What tail does is zoom straight to the last character of the file, then steps backward, character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop. Reverses what was recorded and throws it into a string and returns. 0xA is the new line and 0xD is the carriage return.
If your line endings are \r\n or crlf or some other "double newline style newline", then you will have to specify n*2 lines to get the last n lines because it counts 2 lines for every line.
public String tail( File file ) {
RandomAccessFile fileHandler = null;
try {
fileHandler = new RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if( filePointer == fileLength ) {
continue;
}
break;
} else if( readByte == 0xD ) {
if( filePointer == fileLength - 1 ) {
continue;
}
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
} finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
/* ignore */
}
}
}
But you probably don't want the last line, you want the last N lines, so use this instead:
public String tail2( File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler =
new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
Invoke the above methods like this:
File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));
Warning
In the wild west of Unicode, this code can cause the output of this function to come out wrong. For example "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters etc. may cause the output to be wrong because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character on reversal. You will have to run a full battery of tests on all languages you plan to use this with.
For more information about this unicode reversal problem read this:
https://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/
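To make the warning concrete, here is a small sketch that contrasts the per-byte cast used by tail() with decoding the bytes in one go. It assumes the file is UTF-8; the class ReverseDemo and its methods are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the answer above.
class ReverseDemo {
    /** Mimics tail(): walk the bytes backwards, cast each one to char, then reverse the builder. */
    static String viaCharCast(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (int i = utf8.length - 1; i >= 0; i--) {
            sb.append((char) utf8[i]);   // each byte of a multi-byte sequence becomes its own char
        }
        return sb.reverse().toString();
    }

    /** Keeps the bytes in file order and decodes them once with an explicit charset. */
    static String viaDecode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

For pure ASCII input both methods return the same string. For something like "Mary\u2019s" (a curly apostrophe, three bytes in UTF-8), only viaDecode gives the original text back.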
Apache Commons has an implementation using RandomAccessFile.
It's called ReversedLinesFileReader.
Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.
Basically it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n as the UTF-8 representation of those characters is just the same as ASCII, and those bytes won't occur in multi-byte character.
So basically, take a buffer of (say) 2K, and progressively read backwards (skip to 2K before you were before, read the next 2K) checking for a line termination. Then skip to exactly the right place in the stream, create an InputStreamReader on the top, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
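The backward scan described above could be sketched like this. It is a simplification: instead of layering an InputStreamReader and a BufferedReader on the stream, it reads the bytes of the last line directly and decodes them. It assumes an encoding such as UTF-8 in which the \r and \n bytes never occur inside a multi-byte character, and a last line small enough to fit in a byte array. LastLineReader and lastLine are made-up names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

// Hypothetical helper: finds the last line by scanning backwards in 2K chunks.
class LastLineReader {
    /** Returns the last line without its terminator, or "" for an empty file. */
    static String lastLine(String path, Charset charset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore one trailing line terminator (\n, \r, or \r\n).
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;

            byte[] chunk = new byte[2048];
            long start = 0;               // if no terminator is found, the line starts at offset 0
            long pos = end;
            search:
            while (pos > 0) {
                int len = (int) Math.min(chunk.length, pos);
                pos -= len;               // step back one chunk
                raf.seek(pos);
                raf.readFully(chunk, 0, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (chunk[i] == '\n' || chunk[i] == '\r') {
                        start = pos + i + 1;   // the line begins right after this terminator
                        break search;
                    }
                }
            }
            byte[] line = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(line);
            return new String(line, charset);
        }
    }

    private static int byteAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        return raf.read();
    }
}
```

Reading a chunk at a time keeps the number of seeks low compared with seeking before every single byte.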
Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.
You can easily change the below code to print the last line.
MemoryMappedFile for printing last 5 lines:
private static void printByMemoryMappedFile(File file) throws FileNotFoundException, IOException{
FileInputStream fileInputStream=new FileInputStream(file);
FileChannel channel=fileInputStream.getChannel();
ByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
buffer.position((int)channel.size());
int count=0;
StringBuilder builder=new StringBuilder();
for(long i=channel.size()-1;i>=0;i--){
char c=(char)buffer.get((int)i);
builder.append(c);
if(c=='\n'){
if(count==5)break;
count++;
builder.reverse();
System.out.println(builder.toString());
builder=null;
builder=new StringBuilder();
}
}
channel.close();
}
RandomAccessFile to print last 5 lines:
private static void printByRandomAcessFile(File file) throws FileNotFoundException, IOException{
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
int lines = 0;
StringBuilder builder = new StringBuilder();
long length = file.length();
length--;
randomAccessFile.seek(length);
for(long seek = length; seek >= 0; --seek){
randomAccessFile.seek(seek);
char c = (char)randomAccessFile.read();
builder.append(c);
if(c == '\n'){
builder = builder.reverse();
System.out.println(builder.toString());
lines++;
builder = null;
builder = new StringBuilder();
if (lines == 5){
break;
}
}
}
}
As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class, which is in "org.apache.commons.io". I have a two-million-line file, and using this class it took me less than one second to find the last line. Here is my code:
LineIterator lineIterator = FileUtils.lineIterator(new File(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
lastLine= lineIterator.nextLine();
}
try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {
String line = null;
System.out.println("======================================");
line = reader.readLine(); //Read Line ONE
line = reader.readLine(); //Read Line TWO
System.out.println("first line : " + line);
//Length of one line if lines are of even length
int len = line.length();
//skip to the end - 3 lines
reader.skip((reqFile.length() - (len*3)));
//Searched to the last line for the date I was looking for.
while((line = reader.readLine()) != null){
System.out.println("FROM LINE : " + line);
String date = line.substring(0,line.indexOf(","));
System.out.println("DATE : " + date); //BAM!!!!!!!!!!!!!!
}
System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
System.out.println("======================================");
} catch (IOException x) {
x.printStackTrace();
}
In C#, you should be able to set the stream's position:
From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file
using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
using(StreamReader sr = new StreamReader(fs))
{
sr.BaseStream.Position = fs.Length - 4;
if(sr.ReadToEnd() == "DONE")
// match
}
}
To avoid the Unicode problems related to reversing the String (or the StringBuilder), as discussed in Eric Leschinski's excellent answer, one can read into a byte list from the end of the file, reverse it into a byte array, and then create the String from the byte array.
Below are the changes to the code from Eric Leschinski's answer, to do it with a byte array. The code changes are below the commented-out lines of code:
static public String tail2(File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler = new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
//StringBuilder sb = new StringBuilder();
List<Byte> sb = new ArrayList<>();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
//sb.add( (char) readByte );
sb.add( (byte) readByte );
}
//String lastLine = sb.reverse().toString();
//Revert byte array and create String
byte[] bytes = new byte[sb.size()];
for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
String lastLine = new String(bytes);
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
Code is 2 lines only
// Please specify correct Charset
ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);
// read last 2 lines
System.out.println(rlf.toString(2));
Gradle:
implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'
Maven:
<dependency>
<groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
</dependency>

How to know bytes read(offset) of BufferedReader?

I want to read file line by line.
BufferedReader is much faster than RandomAccessFile or BufferedInputStream.
But the problem is that I don't know how many bytes I read.
How to know bytes read(offset)?
I tried.
String buffer;
int offset = 0;
while ((buffer = br.readLine()) != null)
offset += buffer.getBytes().length + 1; // 1 is for line separator
It works if the file is small.
But when the file becomes large, the offset becomes smaller than the actual value.
How can I get the offset?
There is no simple way to do this with BufferedReader because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes. On Unix, the line separator is a single byte. BufferedReader will handle both cases without you noticing, so after readLine(), you won't know how many bytes were skipped.
Also buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file accidentally happens to be the same. When using byte[] <-> String conversion of any kind, you should always specify exactly which encoding should be used.
You also can't use a counting InputStream because the buffered readers read data in large chunks. So after reading the first line with, say, 5 bytes, the counter in the inner InputStream would return 4096 because the reader always reads that many bytes into its internal buffer.
You can have a look at NIO for this. You can use a low level ByteBuffer to keep track of the offset and wrap that in a CharBuffer to convert the input into lines.
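As a rough illustration of the NIO route, the following sketch reads the file through a FileChannel into a ByteBuffer and records the byte offset at which each line starts. It only tracks offsets and does not decode the text into lines; it also assumes lines end in \n (optionally preceded by \r). LineOffsets and lineStarts are names invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: records where each line begins, in bytes from the start of the file.
class LineOffsets {
    static List<Long> lineStarts(Path file) throws IOException {
        List<Long> starts = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long offset = 0;              // absolute position of the next byte to inspect
            boolean atLineStart = true;
            while (ch.read(buf) != -1) {
                buf.flip();               // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    byte b = buf.get();
                    if (atLineStart) {
                        starts.add(offset);
                        atLineStart = false;
                    }
                    if (b == '\n') {
                        atLineStart = true;   // the next byte, if any, begins a new line
                    }
                    offset++;
                }
                buf.clear();              // make the buffer ready for the next read
            }
        }
        return starts;
    }
}
```

Because the offsets are counted on raw bytes, they do not depend on the default charset or on how many bytes a line terminator occupies.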
Here's something that should work. It assumes UTF-8, but you can easily change that.
import java.io.*;
class main {
public static void main(final String[] args) throws Exception {
ByteCountingLineReader r = new ByteCountingLineReader(new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")));
String line = null;
do {
long count = r.byteCount();
line = r.readLine();
System.out.println("Line at byte " + count + ": " + line);
} while (line != null);
r.close();
}
static class ByteCountingLineReader implements Closeable {
InputStream in;
long _byteCount;
int bufferedByte = -1;
boolean ended;
// in should be a buffered stream!
ByteCountingLineReader(InputStream in) {
this.in = in;
}
ByteCountingLineReader(File f) throws IOException {
in = new BufferedInputStream(new FileInputStream(f), 65536);
}
String readLine() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
if (ended) return null;
while (true) {
int c = read();
if (ended && baos.size() == 0) return null;
if (ended || c == '\n') break;
if (c == '\r') {
c = read();
if (c != '\n' && !ended)
bufferedByte = c;
break;
}
baos.write(c);
}
return fromUtf8(baos.toByteArray());
}
int read() throws IOException {
if (bufferedByte >= 0) {
int b = bufferedByte;
bufferedByte = -1;
return b;
}
int c = in.read();
if (c < 0) ended = true; else ++_byteCount;
return c;
}
long byteCount() {
return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
}
public void close() throws IOException {
if (in != null) try {
in.close();
} finally {
in = null;
}
}
boolean ended() {
return ended;
}
}
static byte[] toUtf8(String s) {
try {
return s.getBytes("UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static String fromUtf8(byte[] bytes) {
try {
return new String(bytes, "UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static RuntimeException rethrow(Throwable t) {
throw t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
}
}
Try using RandomAccessFile:
RandomAccessFile raf = new RandomAccessFile(filePath, "r");
String curLine;
while ((curLine = raf.readLine()) != null){
System.out.println(curLine);
// get the offset of the byte just after the line that was read
long rowIndex = raf.getFilePointer();
}
To seek by offset, do:
raf.seek(offset);
I wonder what your final solution was. However, I think using a long instead of an int for the offset should cover most situations in your code above.
If you want to read a file line by line, I would recommend this code:
import java.io.*;
class FileRead
{
public static void main(String args[])
{
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
br.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I always used that method in the past, and works great!

How to read file from end to start (in reverse order) in Java?

I want to read a file in the opposite direction, from the end to the start. My file:
[1322110800] LOG ROTATION: DAILY
[1322110800] LOG VERSION: 2.0
[1322110800] CURRENT HOST STATE:arsalan.hussain;DOWN;HARD;1;CRITICAL - Host Unreachable (192.168.1.107)
[1322110800] CURRENT HOST STATE: localhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.06 ms
[1322110800] CURRENT HOST STATE: musewerx-72c7b0;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.27 ms
I use this code to read it:
String strpath="/var/nagios.log";
FileReader fr = new FileReader(strpath);
BufferedReader br = new BufferedReader(fr);
String ch;
int time=0;
String Conversion="";
do {
ch = br.readLine();
out.print(ch+"<br/>");
} while (ch != null);
fr.close();
I would prefer to read it in reverse order using a BufferedReader.
I had the same problem as described here. I want to look at lines in file in reverse order, from the end back to the start (The unix tac command will do it).
However my input files are fairly large so reading the whole file into memory, as in the other examples was not really a workable option for me.
Below is the class I came up with, it does use RandomAccessFile, but does not need any buffers, since it just retains pointers to the file itself, and works with the standard InputStream methods.
It works for my cases, and empty files and a few other things I've tried. Now I don't have Unicode characters or anything fancy, but as long as the lines are delimited by LF, and even if they have a LF + CR it should work.
Basic Usage is :
BufferedReader in = new BufferedReader (new InputStreamReader (new ReverseLineInputStream(file)));
while(true) {
String line = in.readLine();
if (line == null) {
break;
}
System.out.println("X:" + line);
}
Here is the main source:
package www.kosoft.util;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
public class ReverseLineInputStream extends InputStream {
RandomAccessFile in;
long currentLineStart = -1;
long currentLineEnd = -1;
long currentPos = -1;
long lastPosInFile = -1;
public ReverseLineInputStream(File file) throws FileNotFoundException {
in = new RandomAccessFile(file, "r");
currentLineStart = file.length();
currentLineEnd = file.length();
lastPosInFile = file.length() -1;
currentPos = currentLineEnd;
}
public void findPrevLine() throws IOException {
currentLineEnd = currentLineStart;
// There are no more lines, since we are at the beginning of the file and no lines.
if (currentLineEnd == 0) {
currentLineEnd = -1;
currentLineStart = -1;
currentPos = -1;
return;
}
long filePointer = currentLineStart -1;
while ( true) {
filePointer--;
// we are at start of file so this is the first line in the file.
if (filePointer < 0) {
break;
}
in.seek(filePointer);
int readByte = in.readByte();
// We ignore last LF in file. search back to find the previous LF.
if (readByte == 0xA && filePointer != lastPosInFile ) {
break;
}
}
// we want to start at pointer +1 so we are after the LF we found or at 0 the start of the file.
currentLineStart = filePointer + 1;
currentPos = currentLineStart;
}
public int read() throws IOException {
if (currentPos < currentLineEnd ) {
in.seek(currentPos++);
int readByte = in.readByte();
return readByte;
}
else if (currentPos < 0) {
return -1;
}
else {
findPrevLine();
return read();
}
}
}
Apache Commons IO has the ReversedLinesFileReader class for this now (well, since version 2.2).
So your code could be:
String strpath="/var/nagios.log";
ReversedLinesFileReader fr = new ReversedLinesFileReader(new File(strpath));
String ch;
int time=0;
String Conversion="";
// stop at the first null so that "null" is not printed after the last line
while ((ch = fr.readLine()) != null) {
out.print(ch+"<br/>");
}
fr.close();
The ReverseLineInputStream posted above is exactly what I was looking for. The files I am reading are large and cannot be buffered.
There are a couple of bugs:
The file is never closed.
If the last line is not terminated, the last two lines are returned on the first read.
Here is the corrected code:
package www.kosoft.util;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
public class ReverseLineInputStream extends InputStream {
RandomAccessFile in;
long currentLineStart = -1;
long currentLineEnd = -1;
long currentPos = -1;
long lastPosInFile = -1;
int lastChar = -1;
public ReverseLineInputStream(File file) throws FileNotFoundException {
in = new RandomAccessFile(file, "r");
currentLineStart = file.length();
currentLineEnd = file.length();
lastPosInFile = file.length() -1;
currentPos = currentLineEnd;
}
private void findPrevLine() throws IOException {
if (lastChar == -1) {
in.seek(lastPosInFile);
lastChar = in.readByte();
}
currentLineEnd = currentLineStart;
// There are no more lines, since we are at the beginning of the file and no lines.
if (currentLineEnd == 0) {
currentLineEnd = -1;
currentLineStart = -1;
currentPos = -1;
return;
}
long filePointer = currentLineStart -1;
while ( true) {
filePointer--;
// we are at start of file so this is the first line in the file.
if (filePointer < 0) {
break;
}
in.seek(filePointer);
int readByte = in.readByte();
// We ignore last LF in file. search back to find the previous LF.
if (readByte == 0xA && filePointer != lastPosInFile ) {
break;
}
}
// we want to start at pointer +1 so we are after the LF we found or at 0 the start of the file.
currentLineStart = filePointer + 1;
currentPos = currentLineStart;
}
public int read() throws IOException {
if (currentPos < currentLineEnd ) {
in.seek(currentPos++);
int readByte = in.readByte();
return readByte;
} else if (currentPos > lastPosInFile && currentLineStart < currentLineEnd) {
// last line in file (first returned)
findPrevLine();
if (lastChar != '\n' && lastChar != '\r') {
// last line is not terminated
return '\n';
} else {
return read();
}
} else if (currentPos < 0) {
return -1;
} else {
findPrevLine();
return read();
}
}
@Override
public void close() throws IOException {
if (in != null) {
in.close();
in = null;
}
}
}
The proposed ReverseLineInputStream is very slow when you read thousands of lines. On my PC (Intel Core i7, SSD drive) it took about 80 seconds to read 60k lines. Here is an optimized version inspired by it, using buffered reading (as opposed to the one-byte-at-a-time reading in ReverseLineInputStream). A 60k-line log file is read in 400 milliseconds:
public class FastReverseLineInputStream extends InputStream {
private static final int MAX_LINE_BYTES = 1024 * 1024;
private static final int DEFAULT_BUFFER_SIZE = 1024 * 1024;
private RandomAccessFile in;
private long currentFilePos;
private int bufferSize;
private byte[] buffer;
private int currentBufferPos;
private int maxLineBytes;
private byte[] currentLine;
private int currentLineWritePos = 0;
private int currentLineReadPos = 0;
private boolean lineBuffered = false;
public FastReverseLineInputStream(File file) throws IOException {
this(file, DEFAULT_BUFFER_SIZE, MAX_LINE_BYTES);
}
public FastReverseLineInputStream(File file, int bufferSize, int maxLineBytes) throws IOException {
this.maxLineBytes = maxLineBytes;
in = new RandomAccessFile(file, "r");
currentFilePos = file.length() - 1;
in.seek(currentFilePos);
if (in.readByte() == 0xA) {
currentFilePos--;
}
currentLine = new byte[maxLineBytes];
currentLine[0] = 0xA;
this.bufferSize = bufferSize;
buffer = new byte[bufferSize];
fillBuffer();
fillLineBuffer();
}
@Override
public int read() throws IOException {
if (currentFilePos <= 0 && currentBufferPos < 0 && currentLineReadPos < 0) {
return -1;
}
if (!lineBuffered) {
fillLineBuffer();
}
if (lineBuffered) {
if (currentLineReadPos == 0) {
lineBuffered = false;
}
return currentLine[currentLineReadPos--];
}
return 0;
}
private void fillBuffer() throws IOException {
if (currentFilePos < 0) {
return;
}
if (currentFilePos < bufferSize) {
in.seek(0);
in.read(buffer);
currentBufferPos = (int) currentFilePos;
currentFilePos = -1;
} else {
in.seek(currentFilePos);
in.read(buffer);
currentBufferPos = bufferSize - 1;
currentFilePos = currentFilePos - bufferSize;
}
}
private void fillLineBuffer() throws IOException {
currentLineWritePos = 1;
while (true) {
// we've read all the buffer - need to fill it again
if (currentBufferPos < 0) {
fillBuffer();
// nothing was buffered - we reached the beginning of a file
if (currentBufferPos < 0) {
currentLineReadPos = currentLineWritePos - 1;
lineBuffered = true;
return;
}
}
byte b = buffer[currentBufferPos--];
// \n is found - line fully buffered
if (b == 0xA) {
currentLineReadPos = currentLineWritePos - 1;
lineBuffered = true;
break;
// just ignore \r for now
} else if (b == 0xD) {
continue;
} else {
if (currentLineWritePos == maxLineBytes) {
throw new IOException("file has a line exceeding " + maxLineBytes
+ " bytes; use constructor to pickup bigger line buffer");
}
// write the current line bytes in reverse order - reading from
// the end will produce the correct line
currentLine[currentLineWritePos++] = b;
}
}
}}
@Test
public void readAndPrintInReverseOrder() throws IOException {
String path = "src/misctests/test.txt";
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(path));
Stack<String> lines = new Stack<String>();
String line = br.readLine();
while(line != null) {
lines.push(line);
line = br.readLine();
}
while(! lines.empty()) {
System.out.println(lines.pop());
}
} finally {
if(br != null) {
try {
br.close();
} catch(IOException e) {
// can't help it
}
}
}
}
Note that this code reads the whole file into memory and only then starts printing it. This is the only way to do it with a BufferedReader or any other reader that does not support seeking. Keep this in mind: in your case you want to read a log file, and log files can be very big!
If you want to read line by line and print on the fly, you have no alternative but to use a reader that supports seeking, such as java.io.RandomAccessFile, and this is anything but trivial.
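To give an idea of what the seeking approach involves, here is a minimal sketch, not a tuned implementation. The class and method names (BackwardLines, forEachLineBackwards) are made up for illustration, and it assumes the file is UTF-8 with LF or CRLF line endings. It seeks one byte at a time, so it is slow on large files, but it never holds more than one line in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper: hands the lines of a file to a callback, last line first.
final class BackwardLines {

    static void forEachLineBackwards(Path file, Consumer<String> sink) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();            // exclusive end of the current line
            if (end == 0) {
                return;                         // empty file: no lines
            }
            raf.seek(end - 1);
            if (raf.read() == '\n') {
                end--;                          // ignore the terminator that ends the file
            }
            long pos = end - 1;
            while (true) {
                // Walk back to the previous LF, or to just before the start of the file.
                while (pos >= 0) {
                    raf.seek(pos);
                    if (raf.read() == '\n') {
                        break;
                    }
                    pos--;
                }
                long start = pos + 1;
                byte[] line = new byte[(int) (end - start)];
                raf.seek(start);
                raf.readFully(line);
                int len = line.length;
                if (len > 0 && line[len - 1] == '\r') {
                    len--;                      // drop the CR of a CRLF pair
                }
                // Decode whole lines only, so multi-byte characters stay intact.
                sink.accept(new String(line, 0, len, StandardCharsets.UTF_8));
                if (pos < 0) {
                    break;                      // that was the first line of the file
                }
                end = pos;                      // the LF we found ends the previous line
                pos = end - 1;
            }
        }
    }
}
```

In a servlet this could be called as BackwardLines.forEachLineBackwards(path, line -> out.print(line + "<br/>")), assuming out is the response writer.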
As far as I understand, you try to read backwards line by line.
Suppose this is the file you try to read:
line1
line2
line3
And you want to write it to the output stream of the servlet as follows:
line3
line2
line1
Following code might be helpful in this case:
List<String> tmp = new ArrayList<String>();
// Collect every line first; stop at end of file so that null is never stored.
while ((ch = br.readLine()) != null) {
tmp.add(ch);
}
// Print the collected lines from last to first.
for(int i=tmp.size()-1;i>=0;i--) {
out.print(tmp.get(i)+"<br/>");
}
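If holding the whole file in memory is acceptable, the same idea can be written with the standard library alone. This is only a sketch: the class name InMemoryReverse is made up, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: loads every line, then reverses the order in memory.
final class InMemoryReverse {

    static List<String> linesReversed(Path file) throws IOException {
        // Copy into an ArrayList because the list returned by readAllLines
        // is not guaranteed to be modifiable.
        List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        Collections.reverse(lines);
        return lines;
    }
}
```

Like the loop above, this needs memory proportional to the file size, so it is not a good fit for very large log files.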
I had a problem with your solution, @dpetruha, because of this:
Does RandomAccessFile.read() from local file guarantee that exact number of bytes will be read?
Here is my solution: (changed only fillBuffer)
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
public class ReverseLineInputStream extends InputStream {
private static final int MAX_LINE_BYTES = 1024 * 1024;
private static final int DEFAULT_BUFFER_SIZE = 1024 * 1024;
private RandomAccessFile in;
private long currentFilePos;
private int bufferSize;
private byte[] buffer;
private int currentBufferPos;
private int maxLineBytes;
private byte[] currentLine;
private int currentLineWritePos = 0;
private int currentLineReadPos = 0;
private boolean lineBuffered = false;
public ReverseLineInputStream(File file) throws IOException {
this(file, DEFAULT_BUFFER_SIZE, MAX_LINE_BYTES);
}
public ReverseLineInputStream(File file, int bufferSize, int maxLineBytes) throws IOException {
this.maxLineBytes = maxLineBytes;
in = new RandomAccessFile(file, "r");
currentFilePos = file.length() - 1;
in.seek(currentFilePos);
if (in.readByte() == 0xA) {
currentFilePos--;
}
currentLine = new byte[maxLineBytes];
currentLine[0] = 0xA;
this.bufferSize = bufferSize;
buffer = new byte[bufferSize];
fillBuffer();
fillLineBuffer();
}
@Override
public int read() throws IOException {
if (currentFilePos <= 0 && currentBufferPos < 0 && currentLineReadPos < 0) {
return -1;
}
if (!lineBuffered) {
fillLineBuffer();
}
if (lineBuffered) {
if (currentLineReadPos == 0) {
lineBuffered = false;
}
return currentLine[currentLineReadPos--];
}
return 0;
}
private void fillBuffer() throws IOException {
if (currentFilePos < 0) {
return;
}
if (currentFilePos < bufferSize) {
in.seek(0);
buffer = new byte[(int) currentFilePos + 1];
in.readFully(buffer);
currentBufferPos = (int) currentFilePos;
currentFilePos = -1;
} else {
// read the bufferSize bytes that end at currentFilePos (inclusive)
in.seek(currentFilePos - bufferSize + 1);
in.readFully(buffer);
currentBufferPos = bufferSize - 1;
currentFilePos = currentFilePos - bufferSize;
}
}
private void fillLineBuffer() throws IOException {
currentLineWritePos = 1;
while (true) {
// we've read all the buffer - need to fill it again
if (currentBufferPos < 0) {
fillBuffer();
// nothing was buffered - we reached the beginning of a file
if (currentBufferPos < 0) {
currentLineReadPos = currentLineWritePos - 1;
lineBuffered = true;
return;
}
}
byte b = buffer[currentBufferPos--];
// \n is found - line fully buffered
if (b == 0xA) {
currentLineReadPos = currentLineWritePos - 1;
lineBuffered = true;
break;
// just ignore \r for now
} else if (b == 0xD) {
continue;
} else {
if (currentLineWritePos == maxLineBytes) {
throw new IOException("file has a line exceeding " + maxLineBytes
+ " bytes; use constructor to pickup bigger line buffer");
}
// write the current line bytes in reverse order - reading from
// the end will produce the correct line
currentLine[currentLineWritePos++] = b;
}
}
}
}
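The point behind the fillBuffer change is that read(byte[]) is allowed to return fewer bytes than the array holds, while readFully keeps reading until the array is full or the file ends. Roughly, readFully behaves like the loop in this sketch (the names BlockReads and readBlock are made up for illustration):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper showing the loop that a single read(byte[]) call does not do for you.
final class BlockReads {

    // Fills buf completely with the bytes starting at file offset pos.
    static void readBlock(RandomAccessFile raf, long pos, byte[] buf) throws IOException {
        raf.seek(pos);
        int filled = 0;
        while (filled < buf.length) {
            // read may stop early; n is the number of bytes actually delivered
            int n = raf.read(buf, filled, buf.length - filled);
            if (n < 0) {
                throw new EOFException("file ended after " + filled + " bytes");
            }
            filled += n;
        }
    }
}
```

With a plain read(buffer), the tail of the buffer can still hold stale bytes from the previous fill, which then show up as garbage in the output.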

reading from the file and writing to the file in java

I am a beginner with Java.
This is my approach:
I am trying to read two files and then get the union of them. I have to use an array of size 100 (only one array is allowed; reading and writing line by line, ArrayList, or other structures are not allowed).
First, I read all records from file1 and write them to the output, a third file. To do that, I read 100 records at a time and write them to the third file in a loop.
After that, as with the first file, I read the second file 100 records at a time into memory[]. Then I look for the common records: if a record read from file2 is not in file1, I write it to the output file. I do this until reader2.readLine() returns null, and I re-open file1 in each iteration.
This is what I have done so far; it is almost done. Any help would be appreciated.
Edit: OK, now it doesn't throw any exception, but it can't find the differing records and can't write them. I guess the last for loop and the booleans don't work. Why? I really need help. Thanks for your patience.
import java.io.*;
public class FileUnion
{
private static long startTime, endTime;
public static void main(String[] args) throws IOException
{
System.out.println("PROCESSING...");
reset();
startTimer();
String[] memory = new String[100];
int memorySize = memory.length;
File file1 = new File("stdlist1.txt");
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
File file3 = new File("union.txt");
BufferedWriter writer = new BufferedWriter(new FileWriter(file3));
int numberOfLinesFile1 = 0;
String line1 = null;
String line11 = null;
while((line1 = reader1.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line1;
i++;
if(i < memorySize)
{
line1 = reader1.readLine();
}
}
for (int i = 0; i < memorySize; i++)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
reader1.close();
File file2 = new File("stdlist2.txt");
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
String line2 = null;
while((line2 = reader2.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line2;
i++;
if(i < memorySize)
{
line2 = reader2.readLine();
}
}
for (int k = 0; k < memorySize; k++ )
{
boolean found = false;
File f1 = new File("stdlist1.txt");
BufferedReader buff1 = new BufferedReader(new FileReader(f1));
for (int m = 0; m < numberOfLinesFile1; m++)
{
line11 = buff1.readLine();
if (line11.equals(memory[k]) && found == false);
{
found = true;
}
}
buff1.close();
if (found == false)
{
writer.write(memory[k]);
writer.newLine();
}
}
}
reader2.close();
writer.close();
endTimer();
long time = duration();
System.out.println("PROCESS COMPLETED SUCCESSFULLY");
System.out.println("Duration: " + time + " ms");
}
public static void startTimer()
{
startTime = System.currentTimeMillis();
}
public static void endTimer()
{
endTime = System.currentTimeMillis();
}
public static long duration()
{
return endTime - startTime;
}
public static void reset()
{
startTime = 0;
endTime = 0;
}
}
EDIT! Redo.
OK, so to use 100 lines at a time you need to check for null; otherwise, trying to write null to a file could cause errors.
You check whether the file has ended once, and then gather 99 more pieces of info without checking for null.
What if when this line is called:
while((line2 = reader2.readLine()) != null)
there is only 1 line left in the file? Then your memory array contains 99 instances of null, and you try to write null to the file 99 times. That's the worst-case scenario.
I don't really know how much help we are supposed to give to people looking for homework help; on most sites I'm familiar with it's not even allowed.
Here is an example of one way to write the first file.
String line1 = reader1.readLine();
boolean end_of_file1 = (line1 == null);
while(!end_of_file1)
{
for (int i = 0; i < memorySize; i++)
{
// Once the file has ended, line1 is null and the remaining slots are cleared.
memory[i] = line1;
// Always read ahead, so that line1 holds the next unwritten line when the batch is full.
if(line1 != null && (line1 = reader1.readLine()) == null)
{
end_of_file1 = true;
}
}
for (int i = 0; i < memorySize; i++)
{
if(memory[i] != null)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
}
reader1.close();
Once you have that, to make checking for duplicates easier, write a public static boolean method that checks the file for a given item. Calling it will make the code cleaner.
public static boolean isUsed(File f1, String item, int dist) throws IOException
{
// try-with-resources closes the reader on every return path
try (BufferedReader buff1 = new BufferedReader(new FileReader(f1)))
{
for(int i = 0;i<dist;i++)
{
String line = buff1.readLine();
if(line == null){
return false;
}
if(line.equals(item))
{
return true;
}
}
return false;
}
}
Then use the same approach as for writing file 1, only before writing each line check that !isUsed(...) holds.
memory = new String[memorySize];// Reset the memory, erase old data from file1
int numberOfLinesFile2=0;
String line2 = reader2.readLine();
boolean end_of_file2 = (line2 == null);
while(!end_of_file2)
{
for (int i = 0; i < memorySize; i++)
{
memory[i] = line2;
// Always read ahead, so that line2 holds the next unprocessed line when the batch is full.
if(line2 != null && (line2 = reader2.readLine()) == null)
{
end_of_file2 = true;
}
}
for (int i = 0; i < memorySize; i++)
{
if(memory[i] != null)
{
//Check if the current item was used in file 1.
if(!isUsed(file1, memory[i], numberOfLinesFile1)){//If not used already
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile2++;
}
}
}
}
reader2.close();
writer.close();
Hope this helps. Notice that I'm not supplying the full code, because I've learned that pasting complete code makes it more likely that people copy and paste it without understanding it. I hope you find it useful.
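For comparison once you have worked through the steps above, here is one way the pieces could fit together. It is only a sketch under a few assumptions: the names BatchedUnion, fillBatch, contains and union are made up, the files are taken to be UTF-8, and re-scanning file1 for every record (as isUsed does) is assumed to be acceptable. Duplicates inside file2 itself are not removed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the batched union with a single fixed-size array.
final class BatchedUnion {

    // Stores up to memory.length lines and returns how many were stored,
    // so callers never touch a slot that holds null or stale data.
    static int fillBatch(BufferedReader reader, String[] memory) throws IOException {
        int filled = 0;
        String line;
        while (filled < memory.length && (line = reader.readLine()) != null) {
            memory[filled++] = line;
        }
        return filled;
    }

    // Re-opens the file and scans it for item, like isUsed in the answer.
    static boolean contains(Path file, String item) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.equals(item)) {
                    return true;
                }
            }
            return false;
        }
    }

    static void union(Path file1, Path file2, Path out, int batchSize) throws IOException {
        String[] memory = new String[batchSize];
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // Pass 1: copy file1 to the output, one batch at a time.
            try (BufferedReader r1 = Files.newBufferedReader(file1, StandardCharsets.UTF_8)) {
                int n;
                while ((n = fillBatch(r1, memory)) > 0) {
                    for (int i = 0; i < n; i++) {
                        w.write(memory[i]);
                        w.newLine();
                    }
                }
            }
            // Pass 2: append the records of file2 that do not occur in file1.
            try (BufferedReader r2 = Files.newBufferedReader(file2, StandardCharsets.UTF_8)) {
                int n;
                while ((n = fillBatch(r2, memory)) > 0) {
                    for (int i = 0; i < n; i++) {
                        if (!contains(file1, memory[i])) {
                            w.write(memory[i]);
                            w.newLine();
                        }
                    }
                }
            }
        }
    }
}
```

Returning the number of filled slots from fillBatch removes the need for both the end-of-file flag and the null checks, because the write loops only look at the slots that were just filled.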

Quickly read the last line of a text file?

What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?
Below are two functions, one that returns the last non-blank line of a file without loading or stepping through the entire file, and the other that returns the last N lines of the file without stepping through the entire file:
What tail does is zoom straight to the last character of the file, then step backward, character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what was recorded, puts it into a string, and returns it. 0xA is the line feed and 0xD is the carriage return.
If your line endings are \r\n (CRLF) or some other two-character newline style, you will have to ask tail2 for n*2 lines to get the last n lines, because it counts two line breaks for every line.
public String tail( File file ) {
RandomAccessFile fileHandler = null;
try {
fileHandler = new RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if( filePointer == fileLength ) {
continue;
}
break;
} else if( readByte == 0xD ) {
if( filePointer == fileLength - 1 ) {
continue;
}
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
} finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
/* ignore */
}
}
}
But you probably don't want the last line, you want the last N lines, so use this instead:
public String tail2( File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler =
new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
Invoke the above methods like this:
File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));
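If you would rather not double the count for CRLF files, one option is to count only the line feed (0xA) as the end of a line and leave any carriage return in the returned text. The following is a sketch of that variation, not a drop-in replacement: the name TailLines is made up, UTF-8 is assumed, and files that use a bare CR as their terminator are not handled.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical variation of tail2 that treats "\r\n" as a single line break.
final class TailLines {

    // Returns the raw text of the last maxLines lines, terminators included.
    static String tail(File file, int maxLines) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            if (end == 0 || maxLines <= 0) {
                return "";
            }
            long start = 0;                 // default: fewer lines than requested, return everything
            int seen = 0;
            for (long p = end - 1; p >= 0; p--) {
                raf.seek(p);
                // Count LF only; the LF that ends the file does not start a new line.
                if (raf.read() == '\n' && p != end - 1) {
                    seen++;
                    if (seen == maxLines) {
                        start = p + 1;
                        break;
                    }
                }
            }
            byte[] bytes = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(bytes);
            // Decode the whole range at once instead of reversing characters.
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

With this version tail(file, 10) returns ten lines whether the file uses \n or \r\n.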
Warning
In the wild west of Unicode, the output of these functions can come out wrong, for example "Mary?s" instead of "Mary's". Characters with hats or accents, Chinese characters, etc. may come out wrong because accents are added as modifiers after the character, and reversing compound characters changes the identity of the character. You will have to run a full battery of tests on all languages you plan to use this with.
For more information about this unicode reversal problem read this:
https://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/
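Separately from the reversal issue, the functions above cast every byte to a char on its own, which cannot work for characters that UTF-8 encodes as several bytes. The small sketch below shows the difference; the class and method names are made up, and UTF-8 is an assumption about the file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: per-byte casting versus decoding the bytes as UTF-8.
final class ByteCastDemo {

    // Mimics sb.append((char) readByte) from tail and tail2.
    static String castEachByte(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append((char) b);    // bytes of a multi-byte sequence become unrelated chars
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence in one go.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

Plain ASCII text survives both methods, which is why the problem tends to appear only once a log contains curly quotes, accents or non-Latin text.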
Apache Commons has an implementation using RandomAccessFile.
It's called ReversedLinesFileReader.
Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.
Basically, it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n, as the UTF-8 representation of those characters is just the same as in ASCII, and those bytes won't occur in a multi-byte character.
So basically, take a buffer of (say) 2K, and progressively read backwards (skip to 2K before you were before, read the next 2K) checking for a line termination. Then skip to exactly the right place in the stream, create an InputStreamReader on the top, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
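A rough sketch of that recipe follows. The names LastLineReader, lastLine and findLastLineStart are made up, the 2K chunk size is simply the figure mentioned above, and the file is assumed to be UTF-8 (or another encoding in which the byte 0xA only ever means a line feed).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical sketch: scan backwards in chunks, then let BufferedReader decode the line.
final class LastLineReader {

    private static final int CHUNK = 2048;

    // Returns the last line of the file, or null if the file is empty.
    static String lastLine(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length == 0) {
                return null;
            }
            raf.seek(findLastLineStart(raf, length));
            // The channel shares the file pointer with raf, so the reader
            // starts at the offset we just seeked to.
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8));
            return reader.readLine();
        }
    }

    // Finds the offset of the first byte of the last line.
    static long findLastLineStart(RandomAccessFile raf, long length) throws IOException {
        byte[] chunk = new byte[CHUNK];
        long chunkEnd = length;
        while (chunkEnd > 0) {
            long chunkStart = Math.max(0, chunkEnd - CHUNK);
            int n = (int) (chunkEnd - chunkStart);
            raf.seek(chunkStart);
            raf.readFully(chunk, 0, n);
            for (int i = n - 1; i >= 0; i--) {
                long offset = chunkStart + i;
                // The terminator that ends the file does not start a new line.
                if (chunk[i] == '\n' && offset != length - 1) {
                    return offset + 1;
                }
            }
            chunkEnd = chunkStart;
        }
        return 0;   // no earlier line break: the whole file is one line
    }
}
```

Because decoding only starts at a line boundary, the reader never begins in the middle of a multi-byte character.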
Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.
You can easily change the below code to print the last line.
MemoryMappedFile for printing last 5 lines:
private static void printByMemoryMappedFile(File file) throws FileNotFoundException, IOException{
FileInputStream fileInputStream=new FileInputStream(file);
FileChannel channel=fileInputStream.getChannel();
ByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
buffer.position((int)channel.size());
int count=0;
StringBuilder builder=new StringBuilder();
for(long i=channel.size()-1;i>=0;i--){
char c=(char)buffer.get((int)i);
builder.append(c);
if(c=='\n'){
if(count==5)break;
count++;
builder.reverse();
System.out.println(builder.toString());
builder=null;
builder=new StringBuilder();
}
}
channel.close();
}
RandomAccessFile to print last 5 lines:
private static void printByRandomAcessFile(File file) throws FileNotFoundException, IOException{
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
int lines = 0;
StringBuilder builder = new StringBuilder();
long length = file.length();
length--;
randomAccessFile.seek(length);
for(long seek = length; seek >= 0; --seek){
randomAccessFile.seek(seek);
char c = (char)randomAccessFile.read();
builder.append(c);
if(c == '\n'){
builder = builder.reverse();
System.out.println(builder.toString());
lines++;
builder = null;
builder = new StringBuilder();
if (lines == 5){
break;
}
}
}
}
As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class, which is in "org.apache.commons.io". I have a two-million-line file, and using this class it took me less than one second to find the last line. Here is my code:
LineIterator lineIterator = FileUtils.lineIterator(new File(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
lastLine= lineIterator.nextLine();
}
try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {
String line = null;
System.out.println("======================================");
line = reader.readLine(); //Read Line ONE
line = reader.readLine(); //Read Line TWO
System.out.println("first line : " + line);
//Length of one line if lines are of even length
int len = line.length();
//skip to the end - 3 lines
reader.skip((reqFile.length() - (len*3)));
//Searched to the last line for the date I was looking for.
while((line = reader.readLine()) != null){
System.out.println("FROM LINE : " + line);
String date = line.substring(0,line.indexOf(","));
System.out.println("DATE : " + date); //BAM!!!!!!!!!!!!!!
}
System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
System.out.println("======================================");
} catch (IOException x) {
x.printStackTrace();
}
In C#, you should be able to set the stream's position:
From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file
using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
using(StreamReader sr = new StreamReader(fs))
{
sr.BaseStream.Position = fs.Length - 4;
if(sr.ReadToEnd() == "DONE")
// match
}
}
To avoid the Unicode problems related to reversing the string (or the StringBuilder), as discussed in Eric Leschinski's excellent answer, one can read into a byte list from the end of the file, reverse it into a byte array, and then create the String from the byte array.
Below are the changes to the code in Eric Leschinski's answer to do it with a byte array. Each changed line follows the commented-out line it replaces:
static public String tail2(File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler = new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
//StringBuilder sb = new StringBuilder();
List<Byte> sb = new ArrayList<>();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
//sb.add( (char) readByte );
sb.add( (byte) readByte );
}
//String lastLine = sb.reverse().toString();
//Revert byte array and create String
byte[] bytes = new byte[sb.size()];
for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
String lastLine = new String(bytes);
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
The code is only two lines:
// Please specify correct Charset
ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);
// read last 2 lines
System.out.println(rlf.toString(2));
Gradle:
implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'
Maven:
<dependency>
<groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
</dependency>
