Save a reader of a file in a database in Java

I have a Reader in Java (Reader read) that comes from a file with 1,000,000 lines.
I need to save each line in my database. I am reading the Reader like this:
int data = read.read();
String line = "";
while (data != -1) {
char dataChar = (char) data;
data = read.read();
if (dataChar != '\n') {
line = line + dataChar;
} else {
i++;
showline(line);
line = "";
}
}
Then I call my DAO for each line:
private static void showline(String line) {
try {
if (line.startsWith(prefix)) {
line = line.substring(prefix.length());
}
ms = new Msisdn(Long.parseLong(line, 10), idList);
ListDAO.createMsisdn(ms);
} catch (Exception e) {
}
}
And my DAO is:
public static void createMsisdn(Msisdn msisdn) {
EntityManager e = DBManager.createEM();
try {
createMsisdn(msisdn, e);
} finally {
if (e != null) {
e.close();
}
}
}
public static void createMsisdn(Msisdn msisdn, EntityManager em) {
em.getTransaction().begin();
em.persist(msisdn);
em.getTransaction().commit();
}
But my problem is that with a file of 1,000,000 lines it takes about 1 hour 30 minutes to complete. How can I make it faster?
(My main problem is calling the DAO 1,000,000 times, which is very slow. The while loop itself is fast: without the call to the DAO the time is less than 1 minute, but with the call to the DAO the time is 2 hours.)

Reading characters and appending them into a String one by one is incredibly inefficient. Using a BufferedReader to read lines of text is much better:
String line;
BufferedReader reader = new BufferedReader(read);
while ((line = reader.readLine()) != null) {
showline(line);
}
This won't have a big effect in your case, though: you are inserting each line in a separate transaction, and every transaction pays a fixed cost for the commit and the round trip to the database. At 90 minutes for 1,000,000 lines, that is roughly 5 ms per line. You should structure your code so that several lines are inserted in a single transaction. For example, you can read blocks of lines like this, but you'll have to replace showline with a showlines method, and change createMsisdn, so that they accept several lines at a time and process them in a single batch:
final int TRANSACTION_SIZE = 500;
int i = 0;
String[] lines = new String[TRANSACTION_SIZE];
BufferedReader reader = new BufferedReader(read);
while ((lines[i] = reader.readLine()) != null) {
i++;
// flush as soon as the array is full, before lines[i] can go out of bounds
if (i == lines.length) {
showlines(lines, i);
i = 0;
}
}
if (i > 0) showlines(lines, i);
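The batching idea above can also be sketched without a fixed-size array. This is only an illustration: the class name `BatchingLineReader`, the method `forEachBatch`, and the `Consumer` callback are assumptions, not part of the original code. The callback is the place where a DAO would begin one transaction, persist every line of the batch, and commit once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original question or answer.
class BatchingLineReader {

    /**
     * Reads all lines from the reader and hands them to the handler in
     * groups of at most batchSize lines. Returns the number of lines read.
     */
    static long forEachBatch(Reader source, int batchSize,
                             Consumer<List<String>> handler) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        BufferedReader reader = new BufferedReader(source);
        List<String> batch = new ArrayList<>(batchSize);
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            total++;
            if (batch.size() == batchSize) {
                handler.accept(new ArrayList<>(batch)); // pass a copy; the buffer is reused
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(new ArrayList<>(batch)); // final, partially filled batch
        }
        return total;
    }
}
```

A caller might write `BatchingLineReader.forEachBatch(read, 500, batch -> saveAll(batch))`, where `saveAll` is a hypothetical DAO method that wraps the whole batch in a single transaction.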

Related

How to read a CSV file with an unspecified number of columns in Java, without changing the code for every additional column

I tried to write code to read a CSV file, and I store the data in an array of objects.
But every time the number of columns changes, I have to read another column and change the code.
I want to use the same class for different CSV files with different numbers of columns, without changing the code for every file.
public class Read_CSV {
public static Object[][]readCSVdata(String csvFilePath){
//String csvFilePath = null;
ArrayList<Object[]>dataList = new ArrayList <Object[]>();
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFilePath))) {
int iteration = 0;
while ((line = br.readLine()) != null) {
if(iteration == 0) {
iteration++;
continue;
}
String[] arri = line.split(cvsSplitBy);
Object[]arri1= {arri[0] , arri[1],arri[2] };
//here after every additional column I should add another cell
dataList.add(arri1);
}
br.close();
return dataList.toArray(new Object[dataList.size()][]);
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
}

You could use the array of strings after splitting each line:
while ((line = br.readLine()) != null) {
if(iteration == 0) {
iteration++;
continue;
}
String[] arri = line.split(cvsSplitBy);
dataList.add(arri);
}
Then dataList will contain arrays of strings.
Or do you have some other requirements?
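As a self-contained sketch of this idea, the following reads rows of any width into a String[][]. The class name `CsvRows` and the choice to take a `Reader` are assumptions made for illustration. It also assumes a plain comma-separated format; quoted fields containing commas are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each row keeps as many cells as the line contains.
class CsvRows {

    /**
     * Reads comma-separated rows, skipping the header line, and returns
     * one String[] per row with as many cells as that row contains.
     */
    static String[][] read(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line = br.readLine(); // header row, discarded
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // limit -1 keeps trailing empty cells such as "a,b,"
                rows.add(line.split(",", -1));
            }
        }
        return rows.toArray(new String[0][]);
    }
}
```

With a file path, the call would look like `CsvRows.read(new FileReader(csvFilePath))`. Because every row is a String[], adding a column to the file requires no code change.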

Java - Read large .txt data file in batch size of 10

I have a large data file say dataset.txt where data is in the format -
1683492079 kyra maharashtra 18/04/2017 10:16:17
1644073389 pam delhi 18/04/2017 10:16:17
.......
The fields are id, name, state, and timestamp.
I have around 50,000 lines of data in the .txt data file.
My requirement is to read the data from this data file in batch size of 10.
So in first batch I need to read from 0 to 9th elements. Next batch from 10th to 19th elements and so on...
Using BufferedReader I have managed to read the whole file:
import java.io.*;
public class ReadDataFile {
public static void main(String args[]) throws IOException {
BufferedReader br = new BufferedReader(new FileReader("dataset.txt"));
String line;
while((line = br.readLine())!= null)
{
System.out.println(line);
}
br.close();
}
}
But my requirement is to read the file in batches of 10. I am new to Java, so I would really appreciate it if someone could help me in simple terms.
As per @GhostCat's answer, this is what I have got:
public class ReadDataFile {
public static void main(String args[]) throws IOException {
BufferedReader br = new BufferedReader(new FileReader("dataSetExample.txt"));
readBatch(br,10);
}
public static void readBatch(BufferedReader reader, int batchSize) throws IOException {
List<String> result = new ArrayList<>();
for (int i = 0; i < batchSize; i++) {
String line = reader.readLine();
if (line != null) {
// result.add(line);
System.out.println(line);
}
}
// return result;
return ;
}
}
The file is read in the readBatch method so how do I know in the main method that the end of file is reached to call the next 10 records? Kindly help.

Your requirements aren't really clear, but here is something simple to get you started:
A) your main method shouldn't do any reading; it just prepares that BufferedReader object
B) you use that reader with a method like:
private static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
List<String> result = new ArrayList<>();
for (int i = 0; i < batchSize; i++) {
String line = reader.readLine();
if (line != null) {
result.add(line);
} else {
return result;
}
}
return result;
}
To be used in your main:
BufferedReader reader = ...
int batchSize = 10;
boolean moreLines = true;
while (moreLines) {
List<String> batch = readBatch(reader, batchSize);
// ... do something with that list
if (batch.size() < batchSize) {
moreLines = false;
}
}
This is meant as a "suggestion" for how you could approach this. Things missing from my answer: you should probably use a distinct class and do the parsing right there (and return a List<DataClass> instead of moving around those raw "line strings").
And of course: 50000 lines isn't really much data. Unless we are talking about an embedded device, there is not much point in reading it "batch style".
And finally: the term batch processing has a very distinct meaning; also in Java, and if you intend to go there, see here for further reading.
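The suggestion of parsing inside the reading class could look roughly like the sketch below. The names `DataBatches` and `Entry` are assumptions, and the parser assumes the line format from the question with single-word names and states; the timestamp is kept as raw text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating "return a List<DataClass>".
class DataBatches {

    /** One parsed line: id, name, state and the raw timestamp text. */
    static final class Entry {
        final long id;
        final String name;
        final String state;
        final String timestamp;

        Entry(long id, String name, String state, String timestamp) {
            this.id = id;
            this.name = name;
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    /** Parses "id name state date time"; assumes single-word names and states. */
    static Entry parse(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        return new Entry(Long.parseLong(parts[0]), parts[1], parts[2], parts[3]);
    }

    /** Returns up to batchSize parsed entries; an empty list means end of input. */
    static List<Entry> readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<Entry> batch = new ArrayList<>(batchSize);
        String line;
        while (batch.size() < batchSize && (line = reader.readLine()) != null) {
            batch.add(parse(line));
        }
        return batch;
    }
}
```

The caller can then loop until an empty batch comes back, for example `while (!(batch = DataBatches.readBatch(reader, 10)).isEmpty()) { ... }`, which also behaves sensibly when the file length is an exact multiple of the batch size.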

For anybody in need of a working example:
// Create a method to read lines (using buffreader) and should accept the batchsize as argument
private static List<String> readBatch(BufferedReader br, int batchSize) throws IOException {
// Create a List object which will contain your Batch Sized lines
List<String> result = new ArrayList<>();
for (int i = 1; i < batchSize; i++) { // reads batchSize - 1 lines; main() has already read the first line of the batch
String line = br.readLine();
if (line != null) {
result.add(line); // add your lines to your (List) result
} else {
return result; // Return your (List) result
}
}
return result; // Return your (List) result
}
public static void main(String[] args) throws IOException {
//input file
BufferedReader br = new BufferedReader(new FileReader("c://ldap//buffreadstream2.csv"));
//output file
BufferedWriter bw = new BufferedWriter(new FileWriter("c://ldap//buffreadstream3.csv"));
// Your Batch size i.e. how many lines you want in your batch
int batchSize = 5; // Define your batchsize here
String line = null;
long batchNumber = 1;
try {
List<String> mylist = null;
while ((line = br.readLine()) != null) { // Do it for your all line in your csv file
bw.write("Batch Number # " + batchNumber + "\n");
System.out.println("Batch Number # " + batchNumber);
bw.write(line + "\n"); // Since br.readLine() reads the next line you have to catch your first line here itself
System.out.println(line); // else you will miss every batchsize number line
// process your First Line here...
mylist = readBatch(br, batchSize); // get/catch your (List) result here as returned from readBatch() method
for (int i = 0; i < mylist.size(); i++) {
System.out.println(mylist.get(i));
// process your lines here...
bw.write(mylist.get(i) + "\n"); // write/process your returned lines
}
batchNumber++;
}
System.out.println("Lines are Successfully copied!");
br.close(); // one you are done .. dont forget to close/flush
br = null; // all
bw.flush(); // your
bw.close(); // BR and
bw = null; // BWs..
} catch (Exception e) {
System.out.println("Exception caught: " + e.getMessage()); // Catch any exception here
}
}

How to know bytes read(offset) of BufferedReader?

I want to read file line by line.
BufferedReader is much faster than RandomAccessFile or BufferedInputStream.
But the problem is that I don't know how many bytes I have read.
How can I find out the number of bytes read (the offset)?
I tried:
String buffer;
int offset = 0;
while ((buffer = br.readLine()) != null)
offset += buffer.getBytes().length + 1; // 1 is for line separator
It works if the file is small.
But when the file becomes large, the offset becomes smaller than the actual value.
How can I get the offset?

There is no simple way to do this with BufferedReader because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes. On Unix, the line separator is a single byte. BufferedReader will handle both cases without you noticing, so after readLine(), you won't know how many bytes were skipped.
Also, buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file happen to be the same. When using byte[] <-> String conversion of any kind, you should always specify exactly which encoding should be used.
You also can't use a counting InputStream because the buffered readers read data in large chunks. So after reading the first line with, say, 5 bytes, the counter in the inner InputStream would return 4096 because the reader always reads that many bytes into its internal buffer.
You can have a look at NIO for this. You can use a low level ByteBuffer to keep track of the offset and wrap that in a CharBuffer to convert the input into lines.
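To make the line-ending effect concrete, here is a small sketch that applies the arithmetic from the question (encoded line length plus one byte per line) to an in-memory "file". The class name `NaiveOffsetDemo` is an assumption, and UTF-8 is chosen explicitly for both directions so that only the line endings can cause a difference.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of why the naive offset falls behind on \r\n files.
class NaiveOffsetDemo {

    /** Sums encoded line lengths plus one byte per line, as in the question. */
    static long naiveOffset(byte[] fileBytes) throws IOException {
        long offset = 0;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips "\n" and "\r\n" alike, so "+ 1" undercounts "\r\n"
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            }
        }
        return offset;
    }
}
```

For the 8 bytes of "ab\r\ncd\r\n" this returns 6, one byte short per line, while for "ab\ncd\n" it matches the real length of 6. The error grows with the number of lines, which fits the observation that the offset drifts on large files.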

Here's something that should work. It assumes UTF-8, but you can easily change that.
import java.io.*;
class main {
public static void main(final String[] args) throws Exception {
ByteCountingLineReader r = new ByteCountingLineReader(new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")));
String line = null;
do {
long count = r.byteCount();
line = r.readLine();
System.out.println("Line at byte " + count + ": " + line);
} while (line != null);
r.close();
}
static class ByteCountingLineReader implements Closeable {
InputStream in;
long _byteCount;
int bufferedByte = -1;
boolean ended;
// in should be a buffered stream!
ByteCountingLineReader(InputStream in) {
this.in = in;
}
ByteCountingLineReader(File f) throws IOException {
in = new BufferedInputStream(new FileInputStream(f), 65536);
}
String readLine() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
if (ended) return null;
while (true) {
int c = read();
if (ended && baos.size() == 0) return null;
if (ended || c == '\n') break;
if (c == '\r') {
c = read();
if (c != '\n' && !ended)
bufferedByte = c;
break;
}
baos.write(c);
}
return fromUtf8(baos.toByteArray());
}
int read() throws IOException {
if (bufferedByte >= 0) {
int b = bufferedByte;
bufferedByte = -1;
return b;
}
int c = in.read();
if (c < 0) ended = true; else ++_byteCount;
return c;
}
long byteCount() {
return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
}
public void close() throws IOException {
if (in != null) try {
in.close();
} finally {
in = null;
}
}
boolean ended() {
return ended;
}
}
static byte[] toUtf8(String s) {
try {
return s.getBytes("UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static String fromUtf8(byte[] bytes) {
try {
return new String(bytes, "UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static RuntimeException rethrow(Throwable t) {
throw t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
}
}

Try using RandomAccessFile:
RandomAccessFile raf = new RandomAccessFile(filePath, "r");
String curLine;
while ((curLine = raf.readLine()) != null) {
System.out.println(curLine);
// offset of the first byte after this line
long offset = raf.getFilePointer();
}
To seek to an offset:
raf.seek(offset);
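A slightly fuller sketch of the same approach records where every line starts and later jumps back to one of those positions. The class and method names (`LineOffsets`, `lineStarts`, `lineAt`) are assumptions. Note that RandomAccessFile.readLine() is unbuffered and turns each byte into one char, so this is slow on large files and only suitable for single-byte encodings.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on RandomAccessFile.getFilePointer() and seek().
class LineOffsets {

    /** Returns the byte offset at which each line of the file starts. */
    static List<Long> lineStarts(File file) throws IOException {
        List<Long> starts = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long start = raf.getFilePointer(); // 0 for a freshly opened file
            while (raf.readLine() != null) {
                starts.add(start);
                start = raf.getFilePointer(); // first byte after the line terminator
            }
        }
        return starts;
    }

    /** Seeks to a recorded offset and reads the line found there. */
    static String lineAt(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            return raf.readLine(); // each byte becomes one char
        }
    }
}
```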

I wonder what your final solution was. However, I think using a long instead of an int for the offset would cover most situations in your code above.

If you want to read a file line by line, I would recommend this code:
import java.io.*;
class FileRead
{
public static void main(String args[])
{
try{
// Open the file (hard-coded here; it could also be taken from args[0])
FileInputStream fstream = new FileInputStream("textfile.txt");
// Read text through a Reader; DataInputStream is meant for binary data, not text
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the reader, which also closes the underlying stream
br.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I have always used that method in the past, and it works great!
Source: Here

Read in N Lines of an Input Stream and print in reverse order without using array or list type structure?

Using the readLine() method of BufferedReader, can you print the first N lines of a stream in reverse order without using a list or an array?

I think you can do it through recursion with something like:
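void printReversed(BufferedReader reader, int n) throws IOException
{
if (n <= 0)
return;
String line = reader.readLine();
if (line == null)
return; // fewer than n lines available
printReversed(reader, n - 1);
System.out.println(line);
}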

How about recursion to reverse the order?
Pseudo code:
reverse(int linesLeft)
if (linesLeft == 0)
return;
String line = readLine();
reverse(linesLeft - 1);
System.out.println(line);
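The pseudo code above translates fairly directly into Java. The following is a minimal sketch; the class name `ReverseLines` and the `PrintStream` parameter (added so the output target can be chosen by the caller) are assumptions. Each pending line is held in a stack frame, so very large values of N could exhaust the call stack.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;

// Hypothetical class: the call stack plays the role of the list.
class ReverseLines {

    /**
     * Prints the first n lines of the reader in reverse order. Each pending
     * line lives in a stack frame, so no list or array is declared.
     */
    static void printReversed(BufferedReader reader, int n, PrintStream out) throws IOException {
        if (n <= 0) {
            return; // requested number of lines already read
        }
        String line = reader.readLine();
        if (line == null) {
            return; // input had fewer than n lines
        }
        printReversed(reader, n - 1, out); // handle the later lines first
        out.println(line);                 // then print this one on the way back
    }
}
```

A typical call would be `ReverseLines.printReversed(reader, 10, System.out)`.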

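Nice question. Here is one solution based on coordinated threads. Although it's heavy on resources (one thread per line of the buffer), it solves your problem within the given constraints. I'm curious to see other solutions.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
public class ReversedBufferPrinter {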
class Worker implements Runnable {
private final CountDownLatch trigger;
private final CountDownLatch release;
private final String line;
Worker(String line, CountDownLatch release) {
this.trigger = new CountDownLatch(1);
this.release = release;
this.line = line;
}
public CountDownLatch getTriggerLatch() {
return trigger;
}
public void run() {
try {
trigger.await();
} catch (InterruptedException ex) { } // handle
work();
release.countDown();
}
void work() {
System.out.println(line);
}
}
public void reversePrint(BufferedReader reader, int lines) throws IOException {
CountDownLatch initialLatch = new CountDownLatch(1);
CountDownLatch triggerLatch = initialLatch;
int count=0;
String line;
while (count++<lines && (line = reader.readLine())!=null) {
Worker worker = new Worker(line, triggerLatch);
triggerLatch = worker.getTriggerLatch();
new Thread(worker).start();
}
triggerLatch.countDown();
try {
initialLatch.await();
} catch (InterruptedException iex) {
// handle
}
}
public static void main(String [] params) throws Exception {
if (params.length<2) {
System.out.println("usage: ReversedBufferPrinter <file to reverse> <#lines>");
return;
}
String filename = params[0];
int lines = Integer.parseInt(params[1]);
File file = new File(filename);
BufferedReader reader = new BufferedReader(new FileReader(file));
ReversedBufferPrinter printer = new ReversedBufferPrinter();
printer.reversePrint(reader, lines);
}
}

Here you have another alternative, based on BufferedReader & StringBuilder manipulations. More manageable in terms of computer resources needed.
public void reversePrint(BufferedReader bufReader, int lines) throws IOException {
BufferedReader resultBufferReader = null;
{
String line;
StringBuilder sb = new StringBuilder();
int count = 0;
while (count++<lines && (line = bufReader.readLine())!=null) {
sb.append('\n'); // restore new line marker for BufferedReader to consume.
sb.append(new StringBuilder(line).reverse());
}
resultBufferReader = new BufferedReader(new StringReader(sb.reverse().toString()));
}
{
String line;
while ((line = resultBufferReader.readLine())!=null) {
System.out.println(line);
}
}
}

It will also require implicit data structures, but you can spawn threads, start them in order, and make each thread read a line and then wait a decreasing amount of time. The result will be that the last thread prints first and the first one prints last, each one printing its line. (The intervals between them have to be large enough to ensure large "safety margins".)
I have no idea how, if at all, this can be done with no explicit or implicit data storage.

Prepend each line you read to a string, and print the string. If you run out of lines to read, you just print what you have.
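Alternatively, if you are certain of the number of lines you have, and you do not wish to use a string: note that LineNumberReader.setLineNumber() only changes the counter and does not reposition the stream, so it cannot be used to jump back. You can instead re-read from a mark, at the cost of O(n^2) line reads:
void printReversed(int n, BufferedReader reader, int readAheadLimit) throws IOException
{
// readAheadLimit must cover the chars of the first n lines, including terminators
reader.mark(readAheadLimit);
for (int i = n - 1; i >= 0; i--)
{
reader.reset();
for (int skip = 0; skip < i; skip++)
reader.readLine();
System.out.println(reader.readLine());
}
}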

Number of lines in a file in Java

I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.

This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int readChars = is.read(c);
if (readChars == -1) {
// bail out if nothing to read
return 0;
}
// make it easy for the optimizer to tune this loop
int count = 0;
while (readChars == 1024) {
for (int i=0; i<1024;) {
if (c[i++] == '\n') {
++count;
}
}
readChars = is.read(c);
}
// count remaining characters
while (readChars != -1) {
for (int i=0; i<readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
readChars = is.read(c);
}
return count == 0 ? 1 : count;
} finally {
is.close();
}
}
Benchmark results for a 1.3GB text file (the original chart plotted time in seconds on the y axis). I performed 100 runs with the same file, and measured each run with System.nanoTime(). countLinesOld has a few outliers and countLinesNew has none; while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.

I have implemented another solution to the problem, which I found more efficient at counting rows:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}

The accepted answer has an off by one error for multi line files which don't end in newline. A one line file ending without a newline would return 1, but a two line file ending without a newline would return 1 too. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time wise compared to the overall function.
public int count(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean endsWithoutNewLine = false;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
endsWithoutNewLine = (c[readChars - 1] != '\n');
}
if(endsWithoutNewLine) {
++count;
}
return count;
} finally {
is.close();
}
}

With Java 8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
long numOfLines = lines.count();
...
}
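As a complete method, the stream approach might look like the sketch below. The class name `StreamLineCount` is an assumption, and UTF-8 is chosen explicitly instead of the platform default; bytes that are not valid in the chosen charset cause an UncheckedIOException while the stream is consumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical wrapper around Files.lines(...).count().
class StreamLineCount {

    /** Counts lines with Files.lines; the stream is closed by try-with-resources. */
    static long countLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }
}
```

A last line without a trailing newline is still counted, and an empty file yields 0.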

The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}

I tested the above methods for counting lines; here are my observations for the different methods as tested on my system:
File Size : 1.6 Gb
Methods:
Using Scanner : 35s approx
Using BufferedReader : 5s approx
Using Java 8 : 5s approx
Using LineNumberReader : 5s approx
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type : long]

I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
long linesCount = 0;
File file = new File(fileName);
FileInputStream fileIn = new FileInputStream(file);
try {
Charset encoding = Charset.forName(encodingName);
Reader fileReader = new InputStreamReader(fileIn, encoding);
int bufferSize = 4096;
Reader reader = new BufferedReader(fileReader, bufferSize);
char[] buffer = new char[bufferSize];
int prevChar = -1;
int readCount = reader.read(buffer);
while (readCount != -1) {
for (int i = 0; i < readCount; i++) {
int nextChar = buffer[i];
switch (nextChar) {
case '\r': {
// The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
linesCount++;
break;
}
case '\n': {
if (prevChar == '\r') {
// The current line is terminated by a carriage return immediately followed by a line feed.
// The line has already been counted.
} else {
// The current line is terminated by a line feed.
linesCount++;
}
break;
}
}
prevChar = nextChar;
}
readCount = reader.read(buffer);
}
if (prevChar != -1) {
switch (prevChar) {
case '\r':
case '\n': {
// The last line is terminated by a line terminator.
// The last line has already been counted.
break;
}
default: {
// The last line is terminated by end-of-file.
linesCount++;
}
}
}
} finally {
fileIn.close();
}
return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).

/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
Tested on JDK8_u31. But indeed performance is slow compared to this method:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
Tested and very fast.

A straight-forward way using Scanner
static void lineCounter (String path) throws IOException {
int lineCount = 0, commentsCount = 0;
Scanner input = new Scanner(new File(path));
while (input.hasNextLine()) {
String data = input.nextLine();
if (data.startsWith("//")) commentsCount++;
lineCount++;
}
System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}

I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution based on LineNumberReader, which adds one to the line count, returns non-intuitive results on files where the last line does end with a newline.
I therefore made an algorithm which behaves as follows:
@Test
public void empty() throws IOException {
assertEquals(0, count(""));
}
@Test
public void singleNewline() throws IOException {
assertEquals(1, count("\n"));
}
@Test
public void dataWithoutNewline() throws IOException {
assertEquals(1, count("one"));
}
@Test
public void oneCompleteLine() throws IOException {
assertEquals(1, count("one\n"));
}
@Test
public void twoCompleteLines() throws IOException {
assertEquals(2, count("one\ntwo\n"));
}
@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
assertEquals(2, count("one\ntwo"));
}
@Test
public void aFewLines() throws IOException {
assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
char[] buf = new char[8192];
int n, previousN = -1;
//Read will return at least one byte, no need to buffer more
while((n = lnr.read(buf)) != -1) {
previousN = n;
}
int ln = lnr.getLineNumber();
if (previousN == -1) {
//No data read at all, i.e file was empty
return 0;
} else {
char lastChar = buf[previousN - 1];
if (lastChar == '\n' || lastChar == '\r') {
//Ending with newline, deduct one
return ln;
}
}
//normal case, return line number + 1
return ln + 1;
}
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
while(lnr.skip(Long.MAX_VALUE) > 0){};
return lnr.getLineNumber();
}

How about using the Process class from within Java code? And then reading the output of the command.
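// The array form avoids problems with spaces in the file name
Process p = Runtime.getRuntime().exec(new String[] {"wc", "-l", yourfilename});
BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
System.out.println(line);
// wc prints "<count> <filename>", so parse only the first token
lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
p.waitFor();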
Need to try it though. Will post the results.

It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
while (count.readLine() != null) {
// readLine() advances the reader's line counter
}
lines = count.getLineNumber();
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to get a stream of the lines, and then use the Stream count() method to count the elements. The stream yields one element per line, so the count is already the number of rows in the text file.
For example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int) count.lines().count();
count.close();
System.out.println(lines);

This funny solution actually works really well!
public static int countLines(File input) throws IOException {
try (InputStream is = new FileInputStream(input)) {
int count = 1;
for (int aChar = 0; aChar != -1;aChar = is.read())
count += aChar == '\n' ? 1 : 0;
return count;
}
}

On Unix-based systems, use the wc command on the command-line.

The only way to know how many lines there are in a file is to count them. You can of course derive an estimate: compute the average line length from a sample of your data, then divide the file size by that average. But that won't be accurate.

If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by not reading it line by line and instead using a regex to match all line terminators.
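One possible reading of the regex suggestion is sketched below. The class name `RegexLineCount` and the pattern are assumptions. The sketch loads the whole file into memory, so it only suits files that fit comfortably in the heap; for huge files, a buffered byte loop is likely the better choice.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts terminators instead of splitting into lines.
class RegexLineCount {

    // Matches \r\n as one terminator, or a lone \r or \n
    private static final Pattern TERMINATOR = Pattern.compile("\r\n|[\r\n]");

    /** Counts line terminators; a final unterminated line is counted as well. */
    static long countLines(Path path, Charset charset) throws IOException {
        String text = new String(Files.readAllBytes(path), charset); // whole file in memory
        Matcher m = TERMINATOR.matcher(text);
        long count = 0;
        int lastEnd = 0;
        while (m.find()) {
            count++;
            lastEnd = m.end();
        }
        if (lastEnd < text.length()) {
            count++; // trailing text without a terminator
        }
        return count;
    }
}
```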

Optimized code for multi-line files that have no newline ('\n') character at EOF:
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}

Scanner with regex:
public int getLineCount() {
Scanner fileScanner = null;
int lineCount = 0;
Pattern lineEndPattern = Pattern.compile("(?m)$");
try {
fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
while (fileScanner.hasNext()) {
fileScanner.next();
++lineCount;
}
}catch(FileNotFoundException e) {
e.printStackTrace();
return lineCount;
}
fileScanner.close();
return lineCount;
}
Haven't clocked it.

If you use this:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
you cannot count extremely large numbers of rows, because reader.getLineNumber() returns an int, which overflows past 2,147,483,647 lines. You need a long counter to handle more rows than that.
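If the int range of getLineNumber() is a concern, the counter can simply be kept by hand in a long. This is a minimal sketch; the class name `LongLineCounter` is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper that counts into a long instead of an int.
class LongLineCounter {

    /** Counts lines into a long, so the result is not limited to the int range. */
    static long countLines(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```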
