InputStream does not work if used multiple times - java

I'm having some problems with an InputStream. I'm writing a little Android application, and part of it has to fetch HTML code from a website. Generally it works fine, but sometimes (usually the second time it's called, though it may take a few tries to reproduce) it just skips over the InputStream: normally the read takes a few seconds while debugging, but whenever it fails, execution immediately jumps to the next line. Any ideas what could be causing this and how to fix it?
private class fetchdata extends AsyncTask<Void, Void, Void> {
    public Activity activity;

    public fetchdata(Activity a)
    {
        activity = a;
    }

    protected Void doInBackground(Void... voids)
    {
        String[] page = new String[16384]; // Number is just for testing, don't worry
        try {
            page = executeHttpGet();
        } catch (Exception e) {
            page[0] = "Error";
        }
        displayFetchedData(page);
        return null;
    }

    public String[] executeHttpGet() throws Exception {
        URL u;
        InputStream is = null;
        DataInputStream dis = null;
        String s;
        int i = 0;
        int hostselection;
        boolean skip;
        String[] page = new String[16384];
        String[] serverurls = new String[2];
        addSecurityException();
        SharedPreferences dataprefs = getSharedPreferences("serverdata", Context.MODE_PRIVATE);
        hostselection = dataprefs.getInt("selectedhost", 0);
        SharedPreferences preferences;
        preferences = PreferenceManager.getDefaultSharedPreferences(activity);
        serverurls[0] = preferences.getString("server01", "");
        serverurls[1] = preferences.getString("server02", "");
        for (int j = 0; j < 2; j++)
        {
            skip = false;
            if (j == 0)
            {
                if (hostselection == 0 || hostselection == 1)
                {
                    Authenticator.setDefault(new MyAuthenticator(activity, false));
                }
                else
                {
                    skip = true;
                }
            }
            if (j == 1)
            {
                if (hostselection == 0 || hostselection == 2)
                {
                    Authenticator.setDefault(new MyAuthenticator(activity, true));
                }
                else
                {
                    skip = true;
                }
            }
            if (skip == false)
            {
                try {
                    u = new URL(serverurls[j]);
                    is = u.openStream(); // LINE IN QUESTION
                    dis = new DataInputStream(new BufferedInputStream(is));
                    while ((s = dis.readLine()) != null)
                    {
                        if (s.length() > 18)
                        {
                            page[i] = s;
                            i++;
                        }
                    }
                }
                catch (IOException ioe)
                {
                    ioe.printStackTrace();
                }
                is.close();
            }
        }
        return page;
    }
}

Create a BufferedInputStream from the input stream you get, then call its mark() method with the input stream's length as the parameter. Call reset() when you need to reuse the stream the next time.
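For example, a minimal sketch of that suggestion (expectedLength and readFrom are illustrative placeholders, not part of the original code):

InputStream raw = u.openStream();
BufferedInputStream in = new BufferedInputStream(raw);
in.mark(expectedLength); // buffer up to expectedLength bytes so they can be re-read
readFrom(in);            // first pass
in.reset();              // rewind to the marked position
readFrom(in);            // second pass over the same bytes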

Unrelated, but you aren't closing the DataInputStream.
Tell us more about the skipping. Is an exception raised? Is it possible that when you run it outside of debug mode it is somehow referencing stale class files? The only thing I can imagine is that your debug and normal classes are somehow different.
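As an aside, here is a minimal sketch of the read loop with the stream closed in a finally block, using BufferedReader.readLine() instead of the deprecated DataInputStream.readLine(). This is a sketch against the question's own variables (u, s, page, i, serverurls), not a tested fix for the skipping:

BufferedReader reader = null;
try {
    u = new URL(serverurls[j]);
    reader = new BufferedReader(new InputStreamReader(u.openStream()));
    while ((s = reader.readLine()) != null) {
        if (s.length() > 18) {
            page[i] = s;
            i++;
        }
    }
} catch (IOException ioe) {
    ioe.printStackTrace();
} finally {
    if (reader != null) {
        try { reader.close(); } catch (IOException ignored) { }
    }
}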

Related

FileInputStream returns null

I have two methods, both using FileInputStream objects.
The first one returns the expected value; it works fine.
But the second method returns nothing, even though the value passed to it is not null.
I need to get the hexadecimal representation of the files passed to these methods.
Why is this happening? Kindly explain.
Here is my code:
public String binaryFile1(File file1) {
    try {
        stringBuilder1 = new StringBuilder();
        is1 = new FileInputStream(file1);
        while (b != -1) {
            counter++;
            b = is1.read();
            String s = Integer.toHexString(b).toUpperCase();
            if (s.length() == 1) {
                stringBuilder1.append('0');
            }
            if (counter % 5 == 0) {
                stringBuilder1.append(s).append("\n");
                counter = 0;
            } else
                stringBuilder1.append(s).append(' ');
        }
        is1.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return stringBuilder1.toString();
}

public String binaryFile2(File file2) {
    try {
        stringBuilder2 = new StringBuilder();
        is2 = new FileInputStream(file2);
        while (b != -1) {
            counter++;
            b = is2.read(); // Here b does not get any content assigned.
            String s = Integer.toHexString(b).toUpperCase();
            if (s.length() == 1) {
                stringBuilder2.append('0');
            }
            if (counter % 5 == 0) {
                stringBuilder2.append(s).append("\n");
                counter = 0;
            } else
                stringBuilder2.append(s).append(' ');
        }
        is2.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return stringBuilder2.toString(); // Here stringBuilder2 is null
}
Since b is shared and you don't reset it after binaryFile1, it's still -1 at the start of binaryFile2. I suggest you use:

int b;
while ((b = is2.read()) != -1) {
    // ...
}
Edit
It is important to close your resources when you're done. I also suggest you try to limit variable scope as much as possible. Using try-with-resources, you could write binaryFile2 like this:
public String binaryFile2(File file) {
    StringBuilder sb = new StringBuilder();
    int counter = 0;
    try (InputStream is = new FileInputStream(file)) {
        int b;
        while ((b = is.read()) != -1) {
            counter++;
            String s = Integer.toHexString(b).toUpperCase();
            if (s.length() == 1) {
                sb.append('0');
            }
            sb.append(s);
            if (counter % 5 == 0) {
                sb.append(System.lineSeparator());
                counter = 0;
            } else {
                sb.append(' ');
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return sb.toString();
}

Save a reader of a file in a database in Java

I have a Reader in Java. The reader (Reader read) comes from a file with 1,000,000 lines, and I need to save each line in my database. I am reading the Reader like this:
int data = read.read();
int i = 0; // line counter (declaration added; missing in the original snippet)
String line = "";
while (data != -1) {
    char dataChar = (char) data;
    data = read.read();
    if (dataChar != '\n') {
        line = line + dataChar;
    } else {
        i++;
        showline(line);
        line = "";
    }
}
Then I am calling my DAO for each line:
private static void showline(String line) {
    try {
        if (line.startsWith(prefix)) {
            line = line.substring(prefix.length());
        }
        ms = new Msisdn(Long.parseLong(line, 10), idList);
        ListDAO.createMsisdn(ms);
    } catch (Exception e) {
    }
}
And my DAO is:
public static void createMsisdn(Msisdn msisdn) {
    EntityManager e = DBManager.createEM();
    try {
        createMsisdn(msisdn, e);
    } finally {
        if (e != null) {
            e.close();
        }
    }
}

public static void createMsisdn(Msisdn msisdn, EntityManager em) {
    em.getTransaction().begin();
    em.persist(msisdn);
    em.getTransaction().commit();
}
But my problem is that with a file of 1,000,000 lines it takes about 1 hour 30 minutes to complete. How can I make it faster?
(My main problem is calling the DAO 1,000,000 times, because it is very slow; the while loop itself is fast. Without the call to the DAO the time is less than 1 minute, but with the call to the DAO it is about 2 hours.)
Reading characters and appending them into a String one by one is incredibly inefficient. Using a BufferedReader to read lines of text is much better:
String line;
BufferedReader reader = new BufferedReader(read);
while ((line = reader.readLine()) != null) {
    showline(line);
}
This won't have a big effect in your case, though: you are inserting each line in a separate transaction, and each transaction can take hundreds of milliseconds to complete. You should structure your code so that several lines are inserted in a single transaction. For example, you can read blocks of lines like this, but you'll have to change the showlines and createMsisdn methods so that they accept several lines at a time and process them in a single batch:
final int TRANSACTION_SIZE = 500;
int i = 0;
String[] lines = new String[TRANSACTION_SIZE];
BufferedReader reader = new BufferedReader(read);
while ((lines[i] = reader.readLine()) != null) {
    if (i == lines.length - 1) { // buffer is full: flush it in one transaction
        showlines(lines, lines.length);
        i = 0;
    } else {
        i++;
    }
}
if (i > 0) showlines(lines, i);
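For instance, a minimal sketch of what a batched DAO method could look like (Msisdn and DBManager.createEM() come from the question; the method name and its List parameter are illustrative assumptions):

public static void createMsisdns(List<Msisdn> msisdns) {
    EntityManager em = DBManager.createEM();
    try {
        em.getTransaction().begin();
        for (Msisdn m : msisdns) {
            em.persist(m); // all rows of the block share one transaction
        }
        em.getTransaction().commit();
    } finally {
        if (em != null) {
            em.close();
        }
    }
}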

reading from the file and writing to the file in java

I am a beginner with Java.
This is my approach:
I am trying to read two files and then get their union. I am supposed to use a single array of size 100 (just one array allowed; reading and writing line by line, ArrayLists, and other structures are not allowed).
First, I read all records from file1 and write them to the output, a third file. For that purpose, I read 100 records at a time and write them to the third file in a loop.
After that, as with the first file, I read the second file 100 records at a time and write them to memory[]. Then I look for common records: if a record read from File2 is not in File1, I write it to the output file. I do this until reader2.readLine() returns null, re-opening file1 in each iteration.
This is what I have done so far; it's almost done. Any help would be appreciated.
Edit: OK, now it doesn't throw any exception, but it can't find the differing records and can't write them. I guess the last for loop and the booleans don't work. Why? I really need help. Thanks for your patience.
import java.io.*;

public class FileUnion
{
    private static long startTime, endTime;

    public static void main(String[] args) throws IOException
    {
        System.out.println("PROCESSING...");
        reset();
        startTimer();
        String[] memory = new String[100];
        int memorySize = memory.length;
        File file1 = new File("stdlist1.txt");
        BufferedReader reader1 = new BufferedReader(new FileReader(file1));
        File file3 = new File("union.txt");
        BufferedWriter writer = new BufferedWriter(new FileWriter(file3));
        int numberOfLinesFile1 = 0;
        String line1 = null;
        String line11 = null;
        while ((line1 = reader1.readLine()) != null)
        {
            for (int i = 0; i < memorySize; )
            {
                memory[i] = line1;
                i++;
                if (i < memorySize)
                {
                    line1 = reader1.readLine();
                }
            }
            for (int i = 0; i < memorySize; i++)
            {
                writer.write(memory[i]);
                writer.newLine();
                numberOfLinesFile1++;
            }
        }
        reader1.close();
        File file2 = new File("stdlist2.txt");
        BufferedReader reader2 = new BufferedReader(new FileReader(file2));
        String line2 = null;
        while ((line2 = reader2.readLine()) != null)
        {
            for (int i = 0; i < memorySize; )
            {
                memory[i] = line2;
                i++;
                if (i < memorySize)
                {
                    line2 = reader2.readLine();
                }
            }
            for (int k = 0; k < memorySize; k++)
            {
                boolean found = false;
                File f1 = new File("stdlist1.txt");
                BufferedReader buff1 = new BufferedReader(new FileReader(f1));
                for (int m = 0; m < numberOfLinesFile1; m++)
                {
                    line11 = buff1.readLine();
                    if (line11.equals(memory[k]) && found == false); // NOTE: this stray ';' terminates the if, so the block below runs every time
                    {
                        found = true;
                    }
                }
                buff1.close();
                if (found == false)
                {
                    writer.write(memory[k]);
                    writer.newLine();
                }
            }
        }
        reader2.close();
        writer.close();
        endTimer();
        long time = duration();
        System.out.println("PROCESS COMPLETED SUCCESSFULLY");
        System.out.println("Duration: " + time + " ms");
    }

    public static void startTimer()
    {
        startTime = System.currentTimeMillis();
    }

    public static void endTimer()
    {
        endTime = System.currentTimeMillis();
    }

    public static long duration()
    {
        return endTime - startTime;
    }

    public static void reset()
    {
        startTime = 0;
        endTime = 0;
    }
}
EDIT! Redo.
OK, so to use 100 lines at a time you need to check for null; otherwise trying to write null to a file could cause errors.
You are checking whether the file is at the end once, and then gathering 99 more pieces of data without checking for null.
What if, when this line is called:
while((line2 = reader2.readLine()) != null)
there is only 1 line left in the file? Then your memory array contains 99 instances of null, and you try to write null to the file 99 times. That's the worst-case scenario.
I don't really know how much help we are supposed to give people looking for homework help; on most sites I'm familiar with it's not even allowed.
Here is an example of one way to write the first file:
String line1 = reader1.readLine();
boolean end_of_file1 = false;
while (!end_of_file1)
{
    for (int i = 0; i < memorySize; )
    {
        memory[i] = line1;
        i++;
        if ((line1 = reader1.readLine()) == null)
        {
            end_of_file1 = true;
            while (i < memorySize)
            {
                memory[i] = null; // clear the unused tail of the chunk so it is skipped below
                i++;
            }
        }
    }
    for (int i = 0; i < memorySize; i++)
    {
        if (memory[i] != null)
        {
            writer.write(memory[i]);
            writer.newLine();
            numberOfLinesFile1++;
        }
    }
}
reader1.close();
Once you have that, to make checking for copies easier, write a public static boolean method that checks the file for the record. You can then call that; it will make the code cleaner.
public static boolean isUsed(String f1, String item, int dist) throws IOException
{
    BufferedReader buff1 = new BufferedReader(new FileReader(f1));
    try
    {
        for (int i = 0; i < dist; i++)
        {
            String line = buff1.readLine();
            if (line == null)
            {
                return false;
            }
            if (line.equals(item))
            {
                return true;
            }
        }
        return false;
    }
    finally
    {
        buff1.close();
    }
}
Then use the same method as for writing file 1, only before writing each line, check !isUsed():
boolean end_of_file2 = false;
memory = new String[memorySize]; // Reset the memory, erase old data from file1
int numberOfLinesFile2 = 0;
String line2 = reader2.readLine();
while (!end_of_file2)
{
    for (int i = 0; i < memorySize; )
    {
        memory[i] = line2;
        i++;
        if ((line2 = reader2.readLine()) == null)
        {
            end_of_file2 = true;
            while (i < memorySize)
            {
                memory[i] = null; // clear the unused tail of the chunk so it is skipped below
                i++;
            }
        }
    }
    for (int i = 0; i < memorySize; i++)
    {
        if (memory[i] != null)
        {
            // Check if the current item was used in file 1.
            if (!isUsed(file1.getPath(), memory[i], numberOfLinesFile1)) // If not used already
            {
                writer.write(memory[i]);
                writer.newLine();
                numberOfLinesFile2++;
            }
        }
    }
}
reader2.close();
writer.close();
Hope this helps. Notice I'm not supplying the full code, because I've learned that simply pasting complete code makes it more likely someone will copy and paste it without understanding it. I hope you find it useful.

How to implement a universal file loader in Java?

This is what I'm trying to do:
public String load(String path) {
    //...
}
load("file:/tmp/foo.txt"); // loads by absolute file name
load("classpath:bar.txt"); // loads from classpath
I think it's possible to do this with the JDK, but I can't find out how exactly.
I can think of two approaches:
Just write plain Java code to extract the "scheme" from those URI-like strings, and then dispatch to the different code to load the file in different ways.
Register a custom URL stream handler to deal with the "classpath" case and then use URL.openStream() to open the stream to read the object.
The package documentation for java.net has some information about how stream handlers are discovered.
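As a rough illustration of the first approach, here is a minimal sketch (assuming Java 7+ NIO and the Java 9+ InputStream.readAllBytes(); the scheme prefixes come from the question, everything else is illustrative, not a definitive implementation):

public String load(String path) throws IOException {
    if (path.startsWith("classpath:")) {
        String name = path.substring("classpath:".length());
        try (InputStream is = getClass().getClassLoader().getResourceAsStream(name)) {
            if (is == null) {
                throw new FileNotFoundException(path);
            }
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
    if (path.startsWith("file:")) {
        return new String(Files.readAllBytes(Paths.get(URI.create(path))), StandardCharsets.UTF_8);
    }
    throw new IllegalArgumentException("Unsupported scheme: " + path);
}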
From my library omino roundabout, here are the two methods you'll need; I need them everywhere. The resource reader is relative to a class, at least to know which jar to read, but the path can start with / to force it back to the top. Enjoy!
(You'll have to write your own top-level wrapper to look for "file:" and "classpath:".)
See also http://code.google.com/p/omino-roundabout/
public static String readFile(String filePath)
{
    File f = new File(filePath);
    if (!f.exists())
        return null;
    String result = "";
    try
    {
        FileReader in = new FileReader(f);
        boolean doing = true;
        char[] bunch = new char[10000];
        int soFar = 0;
        while (doing)
        {
            int got = in.read(bunch, 0, bunch.length);
            if (got <= 0)
                doing = false;
            else
            {
                String k = new String(bunch, 0, got);
                result += k;
                soFar += got;
            }
        }
        in.close();
    } catch (Exception e)
    {
        return null;
    }
    // Strip off the UTF-8 front, if present. We hate this. EF BB BF
    // see http://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker for example.
    // Mysteriously, when I read those 3 chars, they come in as 212,170,248. Fine, empirically, I'll strip that, too.
    if (result != null && result.length() >= 3)
    {
        int c0 = result.charAt(0);
        int c1 = result.charAt(1);
        int c2 = result.charAt(2);
        boolean leadingBom = (c0 == 0xEF && c1 == 0xBB && c2 == 0xBF);
        leadingBom |= (c0 == 212 && c1 == 170 && c2 == 248);
        if (leadingBom)
            result = result.substring(3);
    }
    // And because I'm a dictator, fix up the line feeds.
    result = result.replaceAll("\\r\\n", "\n");
    result = result.replaceAll("\\r", "\n");
    return result;
}
static public String readResource(Class<?> aClass, String srcResourcePath)
{
    if (aClass == null || srcResourcePath == null || srcResourcePath.length() == 0)
        return null;
    StringBuffer resultB = new StringBuffer();
    URL resourceURL = null;
    try
    {
        resourceURL = aClass.getResource(srcResourcePath);
    }
    catch (Exception e) { /* leave result null */ }
    if (resourceURL == null)
        return null; // sorry.
    try
    {
        InputStream is = resourceURL.openStream();
        try
        {
            final int BLOCKSIZE = 13007;
            byte[] bytes = new byte[BLOCKSIZE];
            int bytesRead = 0;
            while (bytesRead >= 0)
            {
                bytesRead = is.read(bytes);
                if (bytesRead > 0)
                {
                    char[] chars = new char[bytesRead];
                    for (int i = 0; i < bytesRead; i++)
                        chars[i] = (char) bytes[i];
                    resultB.append(chars);
                }
            }
        }
        finally
        {
            is.close();
        }
    }
    catch (IOException e)
    {
        return null; // sorry
    }
    String result = resultB.toString();
    return result;
}
(edit -- removed a stray reference to OmString, to keep it standalone here.)

Number of lines in a file in Java

I work with huge data files; sometimes I only need to know the number of lines in these files. Usually I open them and read line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}
EDIT, 9 1/2 years later: I have practically no Java experience, but anyway I have tried to benchmark this code against the LineNumberReader solution below, since it bothered me that nobody did it. It seems that especially for large files my solution is faster, though it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i = 0; i < 1024; ) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters
        while (readChars != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}
Benchmark results for a 1.3GB text file, y axis in seconds: I performed 100 runs with the same file and measured each run with System.nanoTime(). countLinesOld had a few outliers, countLinesNew had none, and while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
I have implemented another solution to the problem; I found it more efficient for counting rows:

try (FileReader input = new FileReader("input.txt");
     LineNumberReader count = new LineNumberReader(input))
{
    while (count.skip(Long.MAX_VALUE) > 0)
    {
        // Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
    }
    int result = count.getLineNumber() + 1; // +1 because line index starts at 0
}
The accepted answer has an off-by-one error for multi-line files which don't end in a newline. A one-line file ending without a newline would return 1, but a two-line file ending without a newline would return 1 too. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time-wise compared to the overall function.
public int count(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean endsWithoutNewLine = false;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n')
                    ++count;
            }
            endsWithoutNewLine = (c[readChars - 1] != '\n');
        }
        if (endsWithoutNewLine) {
            ++count;
        }
        return count;
    } finally {
        is.close();
    }
}
With Java 8, you can use streams:

try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
    long numOfLines = lines.count();
    ...
}
The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
    LineNumberReader reader = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {}

    cnt = reader.getLineNumber();
    reader.close();
    return cnt;
}
I tested the above methods for counting lines; here are my observations for the different methods as tested on my system:
File size: 1.6 GB
Methods:
Using Scanner: approx. 35 s
Using BufferedReader: approx. 5 s
Using Java 8: approx. 5 s
Using LineNumberReader: approx. 5 s
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type: long]
I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
    long linesCount = 0;
    File file = new File(fileName);
    FileInputStream fileIn = new FileInputStream(file);
    try {
        Charset encoding = Charset.forName(encodingName);
        Reader fileReader = new InputStreamReader(fileIn, encoding);
        int bufferSize = 4096;
        Reader reader = new BufferedReader(fileReader, bufferSize);
        char[] buffer = new char[bufferSize];
        int prevChar = -1;
        int readCount = reader.read(buffer);
        while (readCount != -1) {
            for (int i = 0; i < readCount; i++) {
                int nextChar = buffer[i];
                switch (nextChar) {
                    case '\r': {
                        // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                        linesCount++;
                        break;
                    }
                    case '\n': {
                        if (prevChar == '\r') {
                            // The current line is terminated by a carriage return immediately followed by a line feed.
                            // The line has already been counted.
                        } else {
                            // The current line is terminated by a line feed.
                            linesCount++;
                        }
                        break;
                    }
                }
                prevChar = nextChar;
            }
            readCount = reader.read(buffer);
        }
        if (prevChar != -1) {
            switch (prevChar) {
                case '\r':
                case '\n': {
                    // The last line is terminated by a line terminator.
                    // The last line has already been counted.
                    break;
                }
                default: {
                    // The last line is terminated by end-of-file.
                    linesCount++;
                }
            }
        }
    } finally {
        fileIn.close();
    }
    return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {
    try (Stream<String> lines = Files.lines(file.toPath())) {
        return lines.count();
    }
}
Tested on JDK8_u31. But performance is indeed slow compared to this method:
/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {
    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
        byte[] c = new byte[1024];
        boolean empty = true,
                lastEmpty = false;
        long count = 0;
        int read;
        while ((read = is.read(c)) != -1) {
            for (int i = 0; i < read; i++) {
                if (c[i] == '\n') {
                    count++;
                    lastEmpty = true;
                } else if (lastEmpty) {
                    lastEmpty = false;
                }
            }
            empty = false;
        }
        if (!empty) {
            if (count == 0) {
                count = 1;
            } else if (!lastEmpty) {
                count++;
            }
        }
        return count;
    }
}
Tested and very fast.
A straightforward way using Scanner:
static void lineCounter(String path) throws IOException {
    int lineCount = 0, commentsCount = 0;
    Scanner input = new Scanner(new File(path));
    while (input.hasNextLine()) {
        String data = input.nextLine();
        if (data.startsWith("//")) commentsCount++;
        lineCount++;
    }
    System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}
I concluded that wc -l's method of counting newlines is fine but returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution based on LineNumberReader, but adding one to the line count, returned non-intuitive results on files where the last line does end with a newline.
I therefore made an algorithm which handles the cases as follows:
@Test
public void empty() throws IOException {
    assertEquals(0, count(""));
}

@Test
public void singleNewline() throws IOException {
    assertEquals(1, count("\n"));
}

@Test
public void dataWithoutNewline() throws IOException {
    assertEquals(1, count("one"));
}

@Test
public void oneCompleteLine() throws IOException {
    assertEquals(1, count("one\n"));
}

@Test
public void twoCompleteLines() throws IOException {
    assertEquals(2, count("one\ntwo\n"));
}

@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
    assertEquals(2, count("one\ntwo"));
}

@Test
public void aFewLines() throws IOException {
    assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
    try (LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
        char[] buf = new char[8192];
        int n, previousN = -1;
        // Read will return at least one byte, no need to buffer more
        while ((n = lnr.read(buf)) != -1) {
            previousN = n;
        }
        int ln = lnr.getLineNumber();
        if (previousN == -1) {
            // No data read at all, i.e. file was empty
            return 0;
        } else {
            char lastChar = buf[previousN - 1];
            if (lastChar == '\n' || lastChar == '\r') {
                // Ending with newline, deduct one
                return ln;
            }
        }
        // normal case, return line number + 1
        return ln + 1;
    }
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try (LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
    while (lnr.skip(Long.MAX_VALUE) > 0) {}

    return lnr.getLineNumber();
}
How about using the Process class from within Java code and then reading the output of the command?
Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
p.waitFor();

BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
    System.out.println(line);
    lineCount = Integer.parseInt(line);
}
Need to try it though. Will post the results.
It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;

FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);

String line = count.readLine();

if (count.ready())
{
    while (line != null) {
        lines = count.getLineNumber();
        line = count.readLine();
    }
    lines += 1;
}

count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to return a stream of the lines, and then use the Stream count() method to count all of the elements. Then simply add one to the output to get the number of rows in the text file.
As an example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);

int lines = (int) count.lines().count() + 1;

count.close();
System.out.println(lines);
This funny solution actually works really well!
public static int countLines(File input) throws IOException {
    try (InputStream is = new FileInputStream(input)) {
        int count = 1;
        for (int aChar = 0; aChar != -1; aChar = is.read())
            count += aChar == '\n' ? 1 : 0;
        return count;
    }
}
On Unix-based systems, use the wc command on the command-line.
The only way to know how many lines there are in a file is to count them. You can of course create a metric from your data giving you an average length of one line, then get the file size and divide it by the average length, but that won't be accurate.
If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by avoiding reading the file line by line and instead using a regex to match all line terminators.
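A rough sketch of that idea (my own illustration using java.util.regex, not the commenter's code): it counts \r\n, \r, and \n terminators over fixed-size chunks, holding back a trailing '\r' in case it pairs with a '\n' at the start of the next chunk.

static long countLineTerminators(Reader reader) throws IOException {
    Pattern terminators = Pattern.compile("\r\n|\r|\n"); // leftmost alternative wins, so "\r\n" counts once
    long count = 0;
    char[] buf = new char[65536];
    String carry = "";
    int n;
    while ((n = reader.read(buf)) != -1) {
        String chunk = carry + new String(buf, 0, n);
        // hold back a trailing '\r': it may form "\r\n" with the next chunk
        if (chunk.endsWith("\r")) {
            carry = "\r";
            chunk = chunk.substring(0, chunk.length() - 1);
        } else {
            carry = "";
        }
        Matcher m = terminators.matcher(chunk);
        while (m.find()) count++;
    }
    if (carry.equals("\r")) count++; // the file ended with a bare '\r'
    return count;
}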
Optimized code for multi-line files having no newline ('\n') character at EOF:
/**
 *
 * @param filename
 * @return
 * @throws IOException
 */
public static int countLines(String filename) throws IOException {
    int count = 0;
    boolean empty = true;
    FileInputStream fis = null;
    InputStream is = null;
    try {
        fis = new FileInputStream(filename);
        is = new BufferedInputStream(fis);
        byte[] c = new byte[1024];
        int readChars = 0;
        boolean isLine = false;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    isLine = false;
                    ++count;
                } else if (!isLine && c[i] != '\n' && c[i] != '\r') { // Case to handle line count where no New Line character present at EOF
                    isLine = true;
                }
            }
        }
        if (isLine) {
            ++count;
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (is != null) {
            is.close();
        }
        if (fis != null) {
            fis.close();
        }
    }
    LOG.info("count: " + count);
    return (count == 0 && !empty) ? 1 : count;
}
Scanner with regex:
public int getLineCount() {
    Scanner fileScanner = null;
    int lineCount = 0;
    Pattern lineEndPattern = Pattern.compile("(?m)$");
    try {
        fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
        while (fileScanner.hasNext()) {
            fileScanner.next();
            ++lineCount;
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return lineCount;
    }
    fileScanner.close();
    return lineCount;
}
Haven't clocked it.
If you use this:

public int countLines(String filename) throws IOException {
    LineNumberReader reader = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {}

    cnt = reader.getLineNumber();
    reader.close();
    return cnt;
}

you can't handle very large row counts (beyond Integer.MAX_VALUE rows), because reader.getLineNumber() returns an int; you need the long data type to process the maximum number of rows.
