How to know bytes read(offset) of BufferedReader? - java

I want to read a file line by line.
BufferedReader is much faster than RandomAccessFile or BufferedInputStream.
But the problem is that I don't know how many bytes I have read.
How can I get the number of bytes read (the offset)?
I tried this:
String buffer;
int offset = 0;
while ((buffer = br.readLine()) != null)
    offset += buffer.getBytes().length + 1; // 1 is for the line separator
It works if the file is small.
But when the file becomes large, offset becomes smaller than the actual value.
How can I get the offset?

There is no simple way to do this with BufferedReader, because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes; on Unix, the line separator is a single byte. BufferedReader handles both cases without you noticing, so after readLine() you won't know how many bytes were skipped.
Also, buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file happen to be the same. When doing byte[] <-> String conversion of any kind, you should always specify exactly which encoding to use.
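For illustration, here is a toy snippet (the class name is made up) showing how the byte count of the same string differs by charset:
import java.nio.charset.StandardCharsets;

class CharsetLengthDemo {
    public static void main(String[] args) {
        String s = "héllo";
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 6 bytes
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 5 bytes
    }
}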
You also can't use a counting InputStream, because the buffered readers read data in large chunks. After reading the first line of, say, 5 bytes, the counter in the inner InputStream would return 4096, because the reader always reads that many bytes into its internal buffer.
You can have a look at NIO for this. You can use a low-level ByteBuffer to keep track of the offset and wrap it in a CharBuffer to convert the input into lines.
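As a rough sketch of that idea (assuming UTF-8 input; the class name and the 4096 buffer size are placeholders), you can let a CharsetDecoder drain a ByteBuffer and observe how many bytes it actually consumed:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DecoderOffsetSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
            ByteBuffer bytes = ByteBuffer.allocate(4096);
            CharBuffer chars = CharBuffer.allocate(4096);
            long byteOffset = 0;
            while (ch.read(bytes) != -1) {
                bytes.flip();                    // switch the buffer to draining mode
                dec.decode(bytes, chars, false); // decode as many whole chars as possible
                byteOffset += bytes.position();  // bytes the decoder actually consumed
                bytes.compact();                 // keep a partial multi-byte sequence for the next round
                chars.flip();
                // ... scan chars for '\n' here to split the text into lines ...
                chars.clear();
            }
            System.out.println("consumed " + byteOffset + " bytes");
            // (a partial character left at EOF is not flushed in this sketch)
        }
    }
}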

Here's something that should work. It assumes UTF-8, but you can easily change that.
import java.io.*;

class main {
    public static void main(final String[] args) throws Exception {
        ByteCountingLineReader r = new ByteCountingLineReader(
                new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")));
        String line = null;
        do {
            long count = r.byteCount();
            line = r.readLine();
            System.out.println("Line at byte " + count + ": " + line);
        } while (line != null);
        r.close();
    }

    static class ByteCountingLineReader implements Closeable {
        InputStream in;
        long _byteCount;
        int bufferedByte = -1;
        boolean ended;

        // in should be a buffered stream!
        ByteCountingLineReader(InputStream in) {
            this.in = in;
        }

        ByteCountingLineReader(File f) throws IOException {
            in = new BufferedInputStream(new FileInputStream(f), 65536);
        }

        String readLine() throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            if (ended) return null;
            while (true) {
                int c = read();
                if (ended && baos.size() == 0) return null;
                if (ended || c == '\n') break;
                if (c == '\r') {
                    c = read();
                    if (c != '\n' && !ended)
                        bufferedByte = c;
                    break;
                }
                baos.write(c);
            }
            return fromUtf8(baos.toByteArray());
        }

        int read() throws IOException {
            if (bufferedByte >= 0) {
                int b = bufferedByte;
                bufferedByte = -1;
                return b;
            }
            int c = in.read();
            if (c < 0) ended = true; else ++_byteCount;
            return c;
        }

        long byteCount() {
            return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
        }

        public void close() throws IOException {
            if (in != null) try {
                in.close();
            } finally {
                in = null;
            }
        }

        boolean ended() {
            return ended;
        }
    }

    static byte[] toUtf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (Exception __e) {
            throw rethrow(__e);
        }
    }

    static String fromUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (Exception __e) {
            throw rethrow(__e);
        }
    }

    static RuntimeException rethrow(Throwable t) {
        throw t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
    }
}

Try using RandomAccessFile:
RandomAccessFile raf = new RandomAccessFile(filePath, "r");
String line;
while ((line = raf.readLine()) != null) {
    System.out.println(line);
    // get the offset
    long offset = raf.getFilePointer();
}
To seek to an offset, do:
raf.seek(offset);
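As a side note, getFilePointer() gives you a byte offset you can save and seek back to later. A minimal sketch (assuming a non-empty text file; RandomAccessFile.readLine() does not decode multi-byte encodings, so this fits ASCII/Latin-1 files best):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Objects;

// Save the byte offset of a line, then seek back and re-read it.
static void reReadFirstLine(String filePath) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(filePath, "r")) {
        long offset = raf.getFilePointer(); // byte offset where the next line starts
        String first = raf.readLine();
        raf.seek(offset);                   // jump back to the saved offset
        String again = raf.readLine();
        System.out.println(Objects.equals(first, again)); // true
    }
}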

I am wondering about your final solution; in any case, I think using a long instead of an int for the offset will cover most situations in the code above.

If you want to read a file line by line, I would recommend this code:
import java.io.*;

class FileRead {
    public static void main(String args[]) {
        try {
            // Open the file that is the first command line parameter
            FileInputStream fstream = new FileInputStream("textfile.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                // Print the content on the console
                System.out.println(strLine);
            }
            // Close the input stream
            br.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
I always used that method in the past, and it works great!
Source: Here

Related

Java very basic encrypting a file

I want to encrypt a file in Java very basically: simply read the file line by line and change the value of the chars with "char += key", where key is an integer.
The problem is that if I use a key larger than or equal to 2, it doesn't work anymore.
public void encryptData(int key) {
    System.out.println("Encrypt");
    try {
        BufferedReader br = new BufferedReader(new FileReader("encrypted.data"));
        BufferedWriter out = new BufferedWriter(new FileWriter("temp_encrypted.data"));
        String str;
        while ((str = br.readLine()) != null) {
            char[] str_array = str.toCharArray();
            // Encrypt one line
            for (int i = 0; i < str.length(); i++) {
                str_array[i] += key;
            }
            // Put the line in the temp file
            str = String.valueOf(str_array);
            out.write(str_array);
        }
        br.close();
        out.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
The decrypt function is the same but with the input/output files interchanged, and instead of adding the key value I subtract it.
I checked char by char and indeed, the header gets messed up when I use a key value > 1. Any ideas? Is it because the maximum value of the char is being exceeded?
You're basically implementing a general-purpose Caesar cipher.
Adding a number to a character can turn it into a newline or another control character, which will break a BufferedReader when reading it back in.
It's best to manipulate the text as a byte stream, which correctly encodes and decodes newlines and any non-ASCII characters.
public void encryptData(int key) {
    System.out.println("Encrypt");
    try {
        BufferedInputStream in = new BufferedInputStream(new FileInputStream("raw-text.data"));
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("temp_encrypted.data"));
        int ch;
        while ((ch = in.read()) != -1) {
            // NOTE: the write(int) method casts the int to a byte
            out.write(ch + key);
        }
        out.close();
        in.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}

public void decryptData(int key) {
    System.out.println("Decrypt");
    try {
        BufferedInputStream in = new BufferedInputStream(new FileInputStream("temp_encrypted.data"));
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("decrypted.data"));
        int ch;
        while ((ch = in.read()) != -1) {
            out.write(ch - key);
        }
        out.close();
        in.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
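Since write(int) keeps only the low 8 bits, the shift is effectively arithmetic modulo 256, so any integer key round-trips cleanly. A quick check (assuming the two methods above live in a hypothetical class called Crypto and raw-text.data exists):
public static void main(String[] args) {
    Crypto c = new Crypto();  // hypothetical class holding the two methods above
    c.encryptData(42);        // raw-text.data       -> temp_encrypted.data
    c.decryptData(42);        // temp_encrypted.data -> decrypted.data
    // decrypted.data should now be byte-identical to raw-text.data
}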

Read file of objects and write it back to a new one a single character at a time

How do I read an entire record from a txt file, get each field separately, and convert each field into a separate character stream? Then write the streams of individual characters (in a loop) to a plain ASCII output text file.
I have my class definition; I just cannot seem to write the output file properly, which has to be one individual plain ASCII text character at a time. I just need a little help. Here is what I have so far:
This is my first question, guys. Sorry if it isn't formatted well :( I'm trying to convert a file of objects to a plain ASCII character text file, which I called "yankees.txt". I read it in with an ObjectInputStream; then I'm supposed to get each field separately, convert each field into a separate character stream, and write the characters one at a time from each field to my "yankees.txt".
import java.io.*;

public class yankeesfilemain {
    public static void main(String[] args) throws EOFException {
        ObjectInputStream is;
        OutputStream os;
        yankees y;
        int i, j, k;
        String name, pos;
        int number;
        File fout;
        try {
            is = new ObjectInputStream(new FileInputStream("yankees.yanks"));
            y = (yankees) is.readObject();
            fout = new File("yankees.txt");
            os = new FileOutputStream(fout);
            while (y != null) {
                name = y.getname();
                pos = y.getpos();
                number = y.getnum();
                for (i = 0; i < name.length(); i++) {}
                for (j = 0; j < pos.length(); j++) {
                    pos = y.getpos();
                }
                for (k = 0; k < String.valueOf(number).length(); k++) {
                    number = y.getnum();
                }
                break;
            }
            os.close();
            is.close();
        } catch (EOFException eof) {
            eof.printStackTrace();
            System.exit(0);
        } catch (NullPointerException npe) {
            npe.printStackTrace();
            System.exit(0);
        } catch (NumberFormatException nfe) {
            nfe.printStackTrace();
            System.exit(0);
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(0);
        }
    }
}
Please refer to the following code:
public static void main(String[] args) throws IOException {
    InputStream in = new FileInputStream("C:\\11.txt");
    OutputStream out = new FileOutputStream("C:\\12.txt", true);
    try {
        byte[] buffer = new byte[1024];
        while (true) {
            int bytesRead = in.read(buffer);
            if (bytesRead == -1)
                break;
            out.write(buffer, 0, bytesRead);
        }
    } finally {
        if (in != null)
            in.close();
        if (out != null) {
            out.close();
        }
    }
}

Find the file with the most up-to-date information

I have a list of log files, and I need to find which one has the latest edition of a specific line; all or none of them could have this line.
The lines in the files look like this:
2013/01/06 16:01:00:283 INFO ag.doLog: xxxx xxxx xxxx xxxx
And I need a line, let's say:
xx/xx/xx xx:xx:xx:xxx INFO ag.doLog: the line i need
I know how to get an array of files, and if I scan backwards I can find the latest line in each file (if it exists).
The biggest problem is that the files can be big (2k lines?) and I want to find the line relatively fast (a few seconds), so I am open to suggestions.
Personal ideas:
If a file has the line at time X, then any file that has not shown the line before time X should not be scanned anymore. This would require searching all the files at the same time, which I don't know how to do.
At the moment the code breaks, I suppose from lack of memory.
Code:
if (files.length > 0) { // in case no log files exist
    System.out.println("files.length: " + files.length);
    for (int i = 0; i < files.length; i++) { // for each log file, look for the string
        System.out.println("Reading file: " + i + " " + files[i].getName());
        RandomAccessFile raf = new RandomAccessFile(files[i].getAbsoluteFile(), "r"); // open the log file
        long lastSegment = raf.length(); // finds how long the file is
        lastSegment = raf.length() - 5; // sets a point to start looking
        String leido = "";
        byte array[] = new byte[1024];
        /*
         * Going back until we find the line or the file is empty.
         */
        while (!leido.contains(lineToSearch) || lastSegment > 0) {
            System.out.println("leido: " + leido);
            raf.seek(lastSegment); // move to that point
            raf.read(array); // reads 1024 bytes and saves them in array
            leido = new String(array); // saves what was read as a string
            lastSegment = lastSegment - 15; // move the point a little further back
        }
        if (lastSegment < 0) {
            raf.seek(leido.indexOf(lineToSearch) - 23); // to make sure we get the date (23 characters long) NOTE: it won't be negative.
            raf.read(array); // reads 1024 bytes and saves them in array
            leido = new String(array); // make the array into a string
            Date date = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH).parse(leido.substring(0, leido.indexOf(" INFO "))); // get only the date part
            System.out.println(date);
            // if the date is bigger than the other, save the file name
        }
    }
}
I find the code difficult to verify. One could split the task into a backwards reader, which reads lines from the end of the file to the start, and use that to parse the dates line by line.
Mind, I am not going for nice code, but something like this:
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UnsupportedEncodingException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BackwardsReader implements Closeable {
    private static final int BUFFER_SIZE = 4096;
    private String charset;
    private RandomAccessFile raf;
    private long position;
    private int readIndex;
    private byte[] buffer = new byte[BUFFER_SIZE];

    /**
     * @param file a text file.
     * @param charset with bytes '\r' and '\n' (no wide chars).
     */
    public BackwardsReader(File file, String charset) throws IOException {
        this.charset = charset;
        raf = new RandomAccessFile(file, "r");
        position = raf.length();
    }

    public String readLine() throws IOException {
        if (position + readIndex == 0) {
            raf.close();
            raf = null;
            return null;
        }
        String line = "";
        for (;;) { // Loop adding blocks without newline '\n'.
            // Search for the line start:
            boolean lineStartFound = false;
            int lineStartIndex = readIndex;
            while (lineStartIndex > 0) {
                if (buffer[lineStartIndex - 1] == (byte) '\n') {
                    lineStartFound = true;
                    break;
                }
                --lineStartIndex;
            }
            String line2;
            try {
                line2 = new String(buffer, lineStartIndex, readIndex - lineStartIndex,
                        charset).replaceFirst("\r?\n?", "");
                readIndex = lineStartIndex;
            } catch (UnsupportedEncodingException ex) {
                Logger.getLogger(BackwardsReader.class.getName())
                        .log(Level.SEVERE, null, ex);
                return null;
            }
            line = line2 + line;
            if (lineStartFound) {
                --readIndex;
                break;
            }
            // Read a prior block:
            int toRead = BUFFER_SIZE;
            if (position - toRead < 0) {
                toRead = (int) position;
            }
            if (toRead == 0) {
                break;
            }
            position -= toRead;
            raf.seek(position);
            raf.readFully(buffer, 0, toRead);
            readIndex = toRead;
            if (buffer[readIndex - 1] == (byte) '\r') {
                --readIndex;
            }
        }
        return line;
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
        }
    }
}
And a usage example:
public static void main(String[] args) {
    try {
        File file = new File(args[0]);
        BackwardsReader reader = new BackwardsReader(file, "UTF-8");
        int lineCount = 0;
        for (;;) {
            String line = reader.readLine();
            if (line == null) {
                break;
            }
            ++lineCount;
            System.out.println(line);
        }
        reader.close();
        System.out.println("Lines: " + lineCount);
    } catch (IOException ex) {
        Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
    }
}

Can I peek on a BufferedReader?

Is there a way to check whether there is something to read in a BufferedReader object? Something like C++'s cin.peek(). Thanks.
You can use a PushbackReader. With it you can read a character and then unread it; this essentially allows you to push it back.
PushbackReader pr = new PushbackReader(reader);
char c = (char) pr.read();
// do something to look at c
pr.unread(c); // pushes the character back into the buffer
You can try the "boolean ready()" method.
From the Java 6 API doc: "A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
BufferedReader r = new BufferedReader(reader);
if (r.ready()) {
    r.read();
}
The following code will look at the first character in the stream; that should act as a peek for you.
BufferedReader bReader = new BufferedReader(new InputStreamReader(inputStream));
bReader.mark(1);
int firstChar = bReader.read();
bReader.reset();
The normal idiom is to check in a loop whether BufferedReader#readLine() returns null. If the end of the stream is reached (e.g. end of file, socket closed, etc.), then it returns null.
E.g.:
BufferedReader reader = new BufferedReader(someReaderSource);
String line = null;
while ((line = reader.readLine()) != null) {
    // ...
}
If you don't want to read in lines (which is, by the way, the major reason a BufferedReader is chosen), then use BufferedReader#ready() instead:
BufferedReader reader = new BufferedReader(someReaderSource);
while (reader.ready()) {
    int data = reader.read();
    // ...
}
BufferedReader br = new BufferedReader(reader);
br.mark(1);
int firstChar = br.read();
br.reset();
You could use a PushbackReader to read a character and then "push it back". That way you know for sure that something was there, without affecting the reader's overall state: a "peek".
The answer from pgmura (relying on the ready() method) is simple and works. But bear in mind that this relies on Sun's implementation of the method, which doesn't really agree with the documentation; I would not rely on it if the behaviour is critical.
See here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4090471
I'd rather go with the PushbackReader option.
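For example, a small helper along those lines might look like this (a sketch; the peek method name is made up):
import java.io.IOException;
import java.io.PushbackReader;

// Look at the next character without consuming it.
static int peek(PushbackReader in) throws IOException {
    int c = in.read();
    if (c != -1) {
        in.unread(c); // push it back so the next read() sees it again
    }
    return c; // -1 means end of stream
}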
My solution was extending BufferedReader and using a queue as the buffer; then you can use the queue's peek method.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class PeekBufferedReader extends BufferedReader {
    private Queue<String> buf;
    private int bufSize;

    public PeekBufferedReader(Reader reader, int bufSize) throws IOException {
        super(reader);
        this.bufSize = bufSize;
        buf = new ArrayBlockingQueue<>(bufSize); // JDK equivalent of Guava's Queues.newArrayBlockingQueue
    }

    /**
     * readAheadLimit is set to 1048576. A line whose length exceeds
     * readAheadLimit will cause an IOException.
     * @throws IOException
     **/
    //public String peekLine() throws IOException {
    //    super.mark(1048576);
    //    String peekedLine = super.readLine();
    //    super.reset();
    //    return peekedLine;
    //}

    /**
     * This method could be implemented with the mark and reset methods, but
     * the performance of this implementation is better (about 2 times) than
     * using mark and reset.
     **/
    public String peekLine() throws IOException {
        if (buf.isEmpty()) {
            while (buf.size() < bufSize) {
                String readLine = super.readLine();
                if (readLine == null) {
                    break;
                } else {
                    buf.add(readLine);
                }
            }
        } else {
            return buf.peek();
        }
        if (buf.isEmpty()) {
            return null;
        } else {
            return buf.peek();
        }
    }

    public String readLine() throws IOException {
        if (buf.isEmpty()) {
            while (buf.size() < bufSize) {
                String readLine = super.readLine();
                if (readLine == null) {
                    break;
                } else {
                    buf.add(readLine);
                }
            }
        } else {
            return buf.poll();
        }
        if (buf.isEmpty()) {
            return null;
        } else {
            return buf.poll();
        }
    }

    public boolean isEmpty() throws IOException {
        if (buf.isEmpty()) {
            while (buf.size() < bufSize) {
                String readLine = super.readLine();
                if (readLine == null) {
                    break;
                } else {
                    buf.add(readLine);
                }
            }
        } else {
            return false;
        }
        if (buf.isEmpty()) {
            return true;
        } else {
            return false;
        }
    }
}
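A quick usage sketch (the file name and the buffer size of 16 are placeholders):
try (PeekBufferedReader r = new PeekBufferedReader(new FileReader("data.txt"), 16)) {
    while (!r.isEmpty()) {
        System.out.println("peek: " + r.peekLine());
        System.out.println("read: " + r.readLine());
    }
}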

Number of lines in a file in Java

I use huge data files; sometimes I only need to know the number of lines in these files. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}
EDIT, 9 1/2 years later: I have practically no Java experience, but anyway I have tried to benchmark this code against the LineNumberReader solution below, since it bothered me that nobody did it. It seems that especially for large files my solution is faster, although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code and produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }
        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i = 0; i < 1024; ) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        // count remaining characters
        while (readChars != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}
Benchmark results for a 1.3GB text file, y-axis in seconds. I've performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none; while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
I have implemented another solution to the problem; I found it more efficient in counting rows:
try (
    FileReader input = new FileReader("input.txt");
    LineNumberReader count = new LineNumberReader(input)
) {
    while (count.skip(Long.MAX_VALUE) > 0) {
        // Loop just in case the file is > Long.MAX_VALUE, or skip() decides not to read the entire file
    }
    result = count.getLineNumber() + 1; // +1 because the line index starts at 0
}
The accepted answer has an off-by-one error for multi-line files which don't end in a newline. A one-line file ending without a newline would return 1, but a two-line file ending without a newline would also return 1. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time-wise compared to the overall function.
public int count(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean endsWithoutNewLine = false;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n')
                    ++count;
            }
            endsWithoutNewLine = (c[readChars - 1] != '\n');
        }
        if (endsWithoutNewLine) {
            ++count;
        }
        return count;
    } finally {
        is.close();
    }
}
With Java 8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
    long numOfLines = lines.count();
    ...
}
The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end: it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
    LineNumberReader reader = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {}
    cnt = reader.getLineNumber();
    reader.close();
    return cnt;
}
I tested the above methods for counting lines; here are my observations for the different methods as tested on my system:
File size: 1.6 GB
Methods:
Using Scanner: approx. 35 s
Using BufferedReader: approx. 5 s
Using Java 8: approx. 5 s
Using LineNumberReader: approx. 5 s
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type: long]
I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
    long linesCount = 0;
    File file = new File(fileName);
    FileInputStream fileIn = new FileInputStream(file);
    try {
        Charset encoding = Charset.forName(encodingName);
        Reader fileReader = new InputStreamReader(fileIn, encoding);
        int bufferSize = 4096;
        Reader reader = new BufferedReader(fileReader, bufferSize);
        char[] buffer = new char[bufferSize];
        int prevChar = -1;
        int readCount = reader.read(buffer);
        while (readCount != -1) {
            for (int i = 0; i < readCount; i++) {
                int nextChar = buffer[i];
                switch (nextChar) {
                    case '\r': {
                        // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                        linesCount++;
                        break;
                    }
                    case '\n': {
                        if (prevChar == '\r') {
                            // The current line is terminated by a carriage return immediately followed by a line feed.
                            // The line has already been counted.
                        } else {
                            // The current line is terminated by a line feed.
                            linesCount++;
                        }
                        break;
                    }
                }
                prevChar = nextChar;
            }
            readCount = reader.read(buffer);
        }
        if (prevChar != -1) {
            switch (prevChar) {
                case '\r':
                case '\n': {
                    // The last line is terminated by a line terminator.
                    // The last line has already been counted.
                    break;
                }
                default: {
                    // The last line is terminated by end-of-file.
                    linesCount++;
                }
            }
        }
    } finally {
        fileIn.close();
    }
    return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {
    try (Stream<String> lines = Files.lines(file.toPath())) {
        return lines.count();
    }
}
Tested on JDK8_u31. Performance is indeed slow compared to this method:
/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {
    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
        byte[] c = new byte[1024];
        boolean empty = true,
                lastEmpty = false;
        long count = 0;
        int read;
        while ((read = is.read(c)) != -1) {
            for (int i = 0; i < read; i++) {
                if (c[i] == '\n') {
                    count++;
                    lastEmpty = true;
                } else if (lastEmpty) {
                    lastEmpty = false;
                }
            }
            empty = false;
        }
        if (!empty) {
            if (count == 0) {
                count = 1;
            } else if (!lastEmpty) {
                count++;
            }
        }
        return count;
    }
}
Tested and very fast.
A straightforward way using Scanner:
static void lineCounter(String path) throws IOException {
    int lineCount = 0, commentsCount = 0;
    Scanner input = new Scanner(new File(path));
    while (input.hasNextLine()) {
        String data = input.nextLine();
        if (data.startsWith("//")) commentsCount++;
        lineCount++;
    }
    System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}
I concluded that wc -l's method of counting newlines is fine but returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution based on LineNumberReader but adding one to the line count returned non-intuitive results on files where the last line does end with a newline.
I therefore made an algo which handles it as follows:
@Test
public void empty() throws IOException {
    assertEquals(0, count(""));
}

@Test
public void singleNewline() throws IOException {
    assertEquals(1, count("\n"));
}

@Test
public void dataWithoutNewline() throws IOException {
    assertEquals(1, count("one"));
}

@Test
public void oneCompleteLine() throws IOException {
    assertEquals(1, count("one\n"));
}

@Test
public void twoCompleteLines() throws IOException {
    assertEquals(2, count("one\ntwo\n"));
}

@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
    assertEquals(2, count("one\ntwo"));
}

@Test
public void aFewLines() throws IOException {
    assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
    try (LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
        char[] buf = new char[8192];
        int n, previousN = -1;
        // read will return at least one char; no need to buffer more
        while ((n = lnr.read(buf)) != -1) {
            previousN = n;
        }
        int ln = lnr.getLineNumber();
        if (previousN == -1) {
            // no data read at all, i.e. the file was empty
            return 0;
        } else {
            char lastChar = buf[previousN - 1];
            if (lastChar == '\n' || lastChar == '\r') {
                // ends with a newline, deduct one
                return ln;
            }
        }
        // normal case: return line number + 1
        return ln + 1;
    }
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try (LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
    while (lnr.skip(Long.MAX_VALUE) > 0) {}
    return lnr.getLineNumber();
}
How about using the Process class from within Java code and then reading the output of the command?
Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
p.waitFor();
BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
    System.out.println(line);
    // wc -l prints "<count> <filename>", so parse the first token
    lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
I still need to try it, though. I'll post the results.
It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
String line = count.readLine();
if (count.ready()) {
    while (line != null) {
        lines = count.getLineNumber();
        line = count.readLine();
    }
    lines += 1;
}
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to return a stream of the lines, and then use the Stream count() method to count all of the elements. Then simply add one to the output to get the number of rows in the text file.
As an example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int) count.lines().count() + 1;
count.close();
System.out.println(lines);
This funny solution actually works really well!
public static int countLines(File input) throws IOException {
    try (InputStream is = new FileInputStream(input)) {
        int count = 1;
        for (int aChar = 0; aChar != -1; aChar = is.read())
            count += aChar == '\n' ? 1 : 0;
        return count;
    }
}
On Unix-based systems, use the wc command on the command-line.
The only way to know how many lines there are in a file is to count them. You can of course create a metric from your data giving you the average length of one line, then get the file size and divide it by the average length, but that won't be accurate.
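A rough sketch of that estimation idea (the +1 per line assumes single-byte characters and \n line endings, which is exactly why the result is approximate):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Sample the first few lines, then extrapolate from the file size.
static long estimateLineCount(File f, int sampleSize) throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader(f))) {
        long sampledBytes = 0;
        int sampled = 0;
        String line;
        while (sampled < sampleSize && (line = r.readLine()) != null) {
            sampledBytes += line.length() + 1; // +1 for the line terminator
            sampled++;
        }
        if (sampled == 0) return 0;
        return f.length() * sampled / sampledBytes; // rough extrapolation, not exact
    }
}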
If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by avoiding reading it line by line and instead using a regex to match all line terminators.
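A sketch of that regex approach (it reads the whole file into memory, so it only suits files that fit in RAM):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Count line terminators (\r\n, \r, or \n) in one pass.
static long countTerminators(String filename) throws IOException {
    String content = new String(Files.readAllBytes(Paths.get(filename)));
    Matcher m = Pattern.compile("\r\n|\r|\n").matcher(content);
    long count = 0;
    while (m.find()) {
        count++;
    }
    return count;
}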
Best optimized code for multi-line files having no newline ('\n') character at EOF:
/**
 * @param filename
 * @return
 * @throws IOException
 */
public static int countLines(String filename) throws IOException {
    int count = 0;
    boolean empty = true;
    FileInputStream fis = null;
    InputStream is = null;
    try {
        fis = new FileInputStream(filename);
        is = new BufferedInputStream(fis);
        byte[] c = new byte[1024];
        int readChars = 0;
        boolean isLine = false;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    isLine = false;
                    ++count;
                } else if (!isLine && c[i] != '\n' && c[i] != '\r') { // handles the line count when no newline character is present at EOF
                    isLine = true;
                }
            }
        }
        if (isLine) {
            ++count;
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (is != null) {
            is.close();
        }
        if (fis != null) {
            fis.close();
        }
    }
    System.out.println("count: " + count);
    return (count == 0 && !empty) ? 1 : count;
}
Scanner with regex:
public int getLineCount() {
    Scanner fileScanner = null;
    int lineCount = 0;
    Pattern lineEndPattern = Pattern.compile("(?m)$");
    try {
        fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
        while (fileScanner.hasNext()) {
            fileScanner.next();
            ++lineCount;
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return lineCount;
    }
    fileScanner.close();
    return lineCount;
}
Haven't clocked it.
If you use this:
public int countLines(String filename) throws IOException {
    LineNumberReader reader = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {}
    cnt = reader.getLineNumber();
    reader.close();
    return cnt;
}
you can't handle really big numbers of rows, because reader.getLineNumber() returns an int; you need the long data type to process the maximum number of rows.
