Java: reading UTF-8 file page by page using FileInputStream

I need some code that will allow me to read one page at a time from a UTF-8 file.
I've used the code;
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
System.out.println(str);
}
in.close();
After surrounding it with a try catch block it runs but outputs the entire file!
Is there a way to amend this code to just display ONE PAGE of text at a time?
The file is in UTF-8 format and after viewing it in Notepad++, I can see the file contains FF (form feed) characters to denote the next page.

You will need to look for the form feed character by comparing to 0x0C.
For example:
int c = in.read(); // read() returns an int so that -1 can signal end of stream
while ( c != -1 ) {
    if ( c == 0x0C ) {
        // form feed
    } else {
        // handle displayable character
    }
    c = in.read();
}
EDIT added an example of using a Scanner, as suggested by Boris
Scanner s = new Scanner(new File("a.txt")).useDelimiter("\u000C");
while ( s.hasNext() ) {
String str = s.next();
System.out.println( str );
}
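A minimal sketch of how the form-feed check could be turned into a "give me page N" helper. The class name PageReader and the method readPage are made up for illustration, and pages are assumed to be counted from 0 and separated by a single form feed (U+000C):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PageReader {
    private static final char FORM_FEED = '\f'; // U+000C

    /**
     * Returns the text of the page with the given zero-based index,
     * or null if the input has fewer pages than that.
     */
    public static String readPage(Reader in, int pageIndex) throws IOException {
        StringBuilder page = new StringBuilder();
        int current = 0; // index of the page currently being read
        int c;
        while ((c = in.read()) != -1) {
            if (c == FORM_FEED) {
                if (current == pageIndex) {
                    return page.toString(); // the sought page is complete
                }
                current++;
            } else if (current == pageIndex) {
                page.append((char) c);
            }
        }
        // End of input: the last page has no trailing form feed.
        return current == pageIndex ? page.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        String text = "page one\fpage two\fpage three";
        System.out.println(readPage(new StringReader(text), 1));
    }
}
```

For a real file, the Reader would be the BufferedReader from the question, built on an InputStreamReader with StandardCharsets.UTF_8.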

If the file is valid UTF-8 and the pages are separated by the character U+00FF, i.e. (char) 0xFF, "\u00FF", 'ÿ', then a BufferedReader will do. If the separator is a raw byte 0xFF there is a problem, because the byte 0xFF never occurs in valid UTF-8 and the decoder would treat it as malformed input.
int soughtPageno = ...; // Counted from 0
int currentPageno = 0;
try (BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream(fileDir), StandardCharsets.UTF_8))) {
    String str;
    while ((str = in.readLine()) != null && currentPageno <= soughtPageno) {
        // Search again after every cut, because str shrinks inside the loop.
        for (int pos = str.indexOf('\u00FF'); pos >= 0; pos = str.indexOf('\u00FF')) {
            if (currentPageno == soughtPageno) {
                System.out.println(str.substring(0, pos));
                ++currentPageno;
                break;
            }
            ++currentPageno;
            str = str.substring(pos + 1);
        }
        if (currentPageno == soughtPageno) {
            System.out.println(str);
        }
    }
}
For a byte 0xFF (wrong, hacked UTF-8) use a wrapping InputStream between FileInputStream and the reader:
class PageInputStream extends InputStream {
    private final InputStream in;
    private int pageno; // pages still to skip, counted from 0
    private boolean eof = false;

    PageInputStream(InputStream in, int pageno) {
        this.in = in;
        this.pageno = pageno;
    }

    @Override
    public int read() throws IOException {
        if (eof) {
            return -1;
        }
        // Skip whole pages until the sought one starts.
        while (pageno > 0) {
            int c = in.read();
            if (c == 0xFF) {
                --pageno;
            } else if (c == -1) {
                eof = true;
                in.close();
                return -1;
            }
        }
        int c = in.read();
        if (c == 0xFF) {
            // The sought page ends here; report end of stream.
            c = -1;
            eof = true;
            in.close();
        }
        return c;
    }
}
Take this as an example, a bit more work is to be done.

You can use a Regex to detect form-feed (page break) characters. Try something like this:
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
Regex pageBreak = new Regex("(^.*)(\f)(.*$)")
while ((str = in.readLine()) != null) {
Match match = pageBreak.Match(str);
bool pageBreakFound = match.Success;
if(pageBreakFound){
String textBeforeLineBreak = match.Groups[1].Value;
//Group[2] will contain the form feed character
//Group[3] will contain the text after the form feed character
//Do whatever logic you want now that you know you hit a page boundary
}
System.out.println(str);
}
in.close();
The parentheses around portions of the Regex denote capture groups, which get recorded in the Match object. The \f matches the form feed character.
Edit: Apologies, for some reason I read C# instead of Java, so the Regex and Match calls above are C#, but the core concept is the same. Here's the Regex documentation for Java: http://docs.oracle.com/javase/tutorial/essential/regex/
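For comparison, a hedged sketch of the same capture-group idea using Java's own java.util.regex classes. The class name FormFeedMatch and the method splitAtPageBreak are invented for this example; because the first group is greedy, the split happens at the last form feed in the line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormFeedMatch {
    // Group 1: text before the form feed, group 2: the form feed, group 3: text after it.
    private static final Pattern PAGE_BREAK = Pattern.compile("^(.*)(\\f)(.*)$");

    /**
     * Returns {textBefore, textAfter} around the last form feed in the line,
     * or null if the line contains no form feed.
     */
    public static String[] splitAtPageBreak(String line) {
        Matcher m = PAGE_BREAK.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtPageBreak("end of page 1\fstart of page 2");
        System.out.println("before: " + parts[0]);
        System.out.println("after:  " + parts[1]);
    }
}
```

The method expects a single line as returned by readLine(), since "." does not match line terminators by default.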

Related

Is there a way to get the updated part of a file (for example a game log) that continuously updates? [duplicate]

What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?
Below are two functions: one returns the last non-blank line of a file without loading or stepping through the entire file, and the other returns the last N lines of the file without stepping through the entire file.
What tail does is jump straight to the last byte of the file, then step backward, byte by byte, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what was recorded, puts it into a string and returns it. 0xA is the newline and 0xD is the carriage return.
If your line endings are \r\n (CRLF) or some other two-character line ending, then you will have to specify n*2 lines to get the last n lines, because tail2 counts two line breaks for every line.
public String tail( File file ) {
RandomAccessFile fileHandler = null;
try {
fileHandler = new RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if( filePointer == fileLength ) {
continue;
}
break;
} else if( readByte == 0xD ) {
if( filePointer == fileLength - 1 ) {
continue;
}
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
} finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
/* ignore */
}
}
}
But you probably don't want the last line, you want the last N lines, so use this instead:
public String tail2( File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler =
new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
StringBuilder sb = new StringBuilder();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
sb.append( ( char ) readByte );
}
String lastLine = sb.reverse().toString();
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
Invoke the above methods like this:
File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));
Warning
In the wild west of Unicode this code can cause the output of this function to come out wrong, for example "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters etc. may cause the output to be wrong because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character. You will have to run a full battery of tests on all languages you plan to use this with.
For more information about this Unicode reversal problem read this:
https://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/
Apache Commons has an implementation using RandomAccessFile.
It's called ReversedLinesFileReader.
Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.
Basically it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n, as the UTF-8 representation of those characters is just the same as ASCII, and those bytes won't occur in a multi-byte character.
So basically, take a buffer of (say) 2K, and progressively read backwards (skip to 2K before you were before, read the next 2K) checking for a line termination. Then skip to exactly the right place in the stream, create an InputStreamReader on the top, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.
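The backward-chunk idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked C# answer: the class name LastLine and the 2048-byte chunk size are arbitrary, the file is assumed to be UTF-8 with \n or \r\n line endings, and instead of layering an InputStreamReader and BufferedReader on the stream it simply decodes the bytes of the last line directly.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class LastLine {
    /** Returns the last line of a UTF-8 file (possibly empty), reading backwards in chunks. */
    public static String lastLine(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            // Ignore one trailing line terminator (\n or \r\n).
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;

            byte[] chunk = new byte[2048];
            long start = 0;   // if no '\n' is found, the line starts at byte 0
            long pos = end;
            search:
            while (pos > 0) {
                int len = (int) Math.min(chunk.length, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(chunk, 0, len);
                // The byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
                // so scanning raw bytes for it is safe.
                for (int i = len - 1; i >= 0; i--) {
                    if (chunk[i] == '\n') {
                        start = pos + i + 1;
                        break search;
                    }
                }
            }
            byte[] line = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(line);
            return new String(line, StandardCharsets.UTF_8);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Because the bytes are decoded as one block in forward order, multi-byte characters survive intact, which avoids the reversal problem mentioned in the warning.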
You can easily change the below code to print the last line.
MemoryMappedFile for printing last 5 lines:
private static void printByMemoryMappedFile(File file) throws FileNotFoundException, IOException{
FileInputStream fileInputStream=new FileInputStream(file);
FileChannel channel=fileInputStream.getChannel();
ByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
buffer.position((int)channel.size());
int count=0;
StringBuilder builder=new StringBuilder();
for(long i=channel.size()-1;i>=0;i--){
char c=(char)buffer.get((int)i);
builder.append(c);
if(c=='\n'){
if(count==5)break;
count++;
builder.reverse();
System.out.println(builder.toString());
builder=null;
builder=new StringBuilder();
}
}
channel.close();
}
RandomAccessFile to print last 5 lines:
private static void printByRandomAcessFile(File file) throws FileNotFoundException, IOException{
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
int lines = 0;
StringBuilder builder = new StringBuilder();
long length = file.length();
length--;
randomAccessFile.seek(length);
for(long seek = length; seek >= 0; --seek){
randomAccessFile.seek(seek);
char c = (char)randomAccessFile.read();
builder.append(c);
if(c == '\n'){
builder = builder.reverse();
System.out.println(builder.toString());
lines++;
builder = null;
builder = new StringBuilder();
if (lines == 5){
break;
}
}
}
}
As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class, which is in org.apache.commons.io. I have a two-million-line file, and using this class it took me less than one second to find the last line. Here is my code:
LineIterator lineIterator = FileUtils.lineIterator(new File(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
lastLine= lineIterator.nextLine();
}
try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {
String line = null;
System.out.println("======================================");
line = reader.readLine(); //Read Line ONE
line = reader.readLine(); //Read Line TWO
System.out.println("first line : " + line);
//Length of one line if lines are of even length
int len = line.length();
//skip to the end - 3 lines
reader.skip((reqFile.length() - (len*3)));
//Searched to the last line for the date I was looking for.
while((line = reader.readLine()) != null){
System.out.println("FROM LINE : " + line);
String date = line.substring(0,line.indexOf(","));
System.out.println("DATE : " + date); //BAM!!!!!!!!!!!!!!
}
System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
System.out.println("======================================");
} catch (IOException x) {
x.printStackTrace();
}
In C#, you should be able to set the stream's position:
From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file
using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
using(StreamReader sr = new StreamReader(fs))
{
sr.BaseStream.Position = fs.Length - 4;
if(sr.ReadToEnd() == "DONE")
// match
}
}
To avoid the Unicode problems related to reversing the string (or the StringBuilder), as discussed in Eric Leschinski's excellent answer, one can read into a byte list from the end of the file, reverse it into a byte array and then create the String from the byte array.
Below are the changes to the code in Eric Leschinski's answer to do it with a byte array. The changed lines are placed below the commented-out original lines:
static public String tail2(File file, int lines) {
java.io.RandomAccessFile fileHandler = null;
try {
fileHandler = new java.io.RandomAccessFile( file, "r" );
long fileLength = fileHandler.length() - 1;
//StringBuilder sb = new StringBuilder();
List<Byte> sb = new ArrayList<>();
int line = 0;
for(long filePointer = fileLength; filePointer != -1; filePointer--){
fileHandler.seek( filePointer );
int readByte = fileHandler.readByte();
if( readByte == 0xA ) {
if (filePointer < fileLength) {
line = line + 1;
}
} else if( readByte == 0xD ) {
if (filePointer < fileLength-1) {
line = line + 1;
}
}
if (line >= lines) {
break;
}
//sb.add( (char) readByte );
sb.add( (byte) readByte );
}
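//String lastLine = sb.reverse().toString();
//Reverse the byte list into an array and create the String
byte[] bytes = new byte[sb.size()];
for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
// Assumption: the file is UTF-8; new String(bytes) alone would use the platform default charset
String lastLine = new String(bytes, java.nio.charset.StandardCharsets.UTF_8);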
return lastLine;
} catch( java.io.FileNotFoundException e ) {
e.printStackTrace();
return null;
} catch( java.io.IOException e ) {
e.printStackTrace();
return null;
}
finally {
if (fileHandler != null )
try {
fileHandler.close();
} catch (IOException e) {
}
}
}
With the ReversedLinesFileReader from Apache Commons IO, the code is only two lines:
// Please specify correct Charset
ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);
// read last 2 lines
System.out.println(rlf.toString(2));
Gradle:
implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'
Maven:
<dependency>
<groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
</dependency>

How to parse "SecciÃ³n" to "Sección"? (string accents encoding issue)

I have a string with this value: "SecciÃ³n"
I need to parse it to UTF-8, so the string gets transformed to "Sección".
I tried line = new String(line.getBytes("UTF-8"), "UTF-8"); but this does not work.
Edit
I'm reading the string with this method:
public static String loadLine(InputStream is) {
if (is == null)
return null;
final short TAM_LINE = 256;
String line;
char[] buffer = new char[TAM_LINE];
short i;
int ch;
try {
line = "";
i = 0;
do {
ch = is.read();
if ((ch != '\n') && (ch != -1)) {
buffer[i++] = (char)(ch & 0xFF);
if (i >= TAM_LINE) {
line += new String(buffer, 0, i);
i = 0;
}
}
} while ((ch != '\n') && (ch != -1));
// If we have not read any character at all, return null
if (ch == -1 && i == 0)
return null;
// Append the last chunk of the line that was read
line += new String(buffer, 0, i);
} catch (IOException e) {
e.printStackTrace();
return null;
}
return line;
}
The character ó is encoded as 0xc3 0xb3 in UTF-8. It appears that whichever program read that UTF-8-encoded string in the first place read it assuming the wrong encoding, for example windows-1252, where 0xc3 encodes Ã and 0xb3 encodes ³.
In your case, your edit shows that (as far as I can tell, I don't know Java) you're reading the input byte by byte, building the string one character at a time, one from each byte. This is not a good idea, because UTF-8 uses multiple bytes to encode certain characters such as ó.
You should read the input into a byte array first, then build a String using the correct encoding:
line = new String(byteArray, "UTF-8");
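As a rough illustration of that advice, here is one way the loadLine method from the question might be reworked so that the bytes of a whole line are collected first and decoded once. The class name Utf8LineLoader is made up, and the input is assumed to be UTF-8 with '\n' as the line terminator:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class Utf8LineLoader {
    /**
     * Reads bytes up to the next '\n' (or end of stream) and decodes them as UTF-8.
     * Returns null when the stream is already exhausted.
     */
    public static String loadLine(InputStream is) throws IOException {
        if (is == null) {
            return null;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int b;
        while ((b = is.read()) != -1 && b != '\n') {
            bytes.write(b); // keep the raw byte, do not cast it to char
        }
        if (b == -1 && bytes.size() == 0) {
            return null;
        }
        // Decode the whole line at once so multi-byte sequences stay together.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "Secci\u00f3n" is "Sección"; the ó becomes the two bytes 0xC3 0xB3.
        byte[] data = "Secci\u00f3n\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadLine(new ByteArrayInputStream(data)));
    }
}
```

Wrapping the stream in a BufferedReader over an InputStreamReader with UTF-8 and calling readLine() would be the more usual route. The sketch keeps the shape of the original method only to show where the decoding step belongs.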

Reading ascii file line by line - Java

I am trying to read an ASCII file and find the position of each newline character "\n", so that I know which and how many characters I have in every line. The file size is 538 MB. When I run the code below, it never prints anything.
I searched a lot but didn't find anything for ASCII files. I use NetBeans and Java 8. Any ideas?
Below is my code.
String inputFile = "C:\myfile.txt";
FileInputStream in = new FileInputStream(inputFile);
FileChannel ch = in.getChannel();
int BUFSIZE = 512;
ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);
Charset cs = Charset.forName("ASCII");
while ( (rd = ch.read( buf )) != -1 ) {
buf.rewind();
CharBuffer chbuf = cs.decode(buf);
for ( int i = 0; i < chbuf.length(); i++ ) {
if (chbuf.get() == '\n'){
System.out.println("PRINT SOMETHING");
}
}
}
Method to store the contents of a file to a string:
static String readFile(String path, Charset encoding) throws IOException
{
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
Here's a way to find the occurrences of a character in the entire string:
public static void main(String [] args) throws IOException
{
List<Integer> indexes = new ArrayList<Integer>();
String content = readFile("filetest", StandardCharsets.UTF_8);
int index = content.indexOf('\n');
while (index >= 0)
{
indexes.add(index);
index = content.indexOf('\n', index + 1);
}
}
Found here and here.
The number of characters in a line is the length of the string read by a readLine call:
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
int iLine = 0;
String line;
while ((line = br.readLine()) != null) {
System.out.println( "Line " + iLine + " has " +
line.length() + " characters." );
iLine++;
}
} catch( IOException ioe ){
// ...
}
Note that the (system-dependent) line end marker has been stripped from the string by readLine.
If a very large file contains no newlines, it is indeed possible to run out of memory. Reading character by character will avoid this.
File file = new File( "Z.java" );
Reader reader = new FileReader(file);
int len = 0;
int c;
int iLine = 0;
while( (c = reader.read()) != -1) {
if( c == '\n' ){
iLine++;
System.out.println( "line " + iLine + " contains " +
len + " characters" );
len = 0;
} else {
len++;
}
}
reader.close();
You should use FileReader, which is a convenience class for reading character files.
The FileInputStream Javadoc clearly states:
FileInputStream is meant for reading streams of raw bytes such as
image data. For reading streams of characters, consider using
FileReader.
Try the code below:
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line;
while ((line = br.readLine()) != null) {
for (int pos = line.indexOf("\n"); pos != -1; pos = line.indexOf("\n", pos + 1)) {
System.out.println("\\n at " + pos);
}
}
}

How to know bytes read(offset) of BufferedReader?

I want to read file line by line.
BufferedReader is much faster than RandomAccessFile or BufferedInputStream.
But the problem is that I don't know how many bytes I read.
How to know bytes read(offset)?
I tried.
String buffer;
int offset = 0;
while ((buffer = br.readLine()) != null)
offset += buffer.getBytes().length + 1; // 1 is for line separator
It works if the file is small.
But when the file becomes large, the offset becomes smaller than the actual value.
How can I get the offset?
There is no simple way to do this with BufferedReader because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes. On Unix, the line separator is a single byte. BufferedReader will handle both cases without you noticing, so after readLine(), you won't know how many bytes were skipped.
Also buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file accidentally happens to be the same. When using byte[] <-> String conversion of any kind, you should always specify exactly which encoding should be used.
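A small sketch of why the encoding has to be named explicitly: the same string has a different byte length depending on the charset used to encode it. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class EncodedLength {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the string occupies when encoded as ISO-8859-1. */
    public static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String line = "caf\u00e9"; // "café": 4 characters
        System.out.println("UTF-8:      " + utf8Length(line) + " bytes");   // é takes 2 bytes
        System.out.println("ISO-8859-1: " + latin1Length(line) + " bytes"); // é takes 1 byte
    }
}
```

If the file is UTF-8 but the platform default is a single-byte charset, a plain getBytes() under-counts every non-ASCII character, which is consistent with the offset in the question drifting below the real value on larger files.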
You also can't use a counting InputStream because the buffered readers read data in large chunks. So after reading the first line with, say, 5 bytes, the counter in the inner InputStream would return 4096 because the reader always reads that many bytes into its internal buffer.
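The read-ahead effect can be observed with a small experiment. The CountingInputStream below is a hypothetical helper written for this sketch (not a JDK class); it counts the bytes handed to the reader stack, and after a single readLine() the count is already far beyond the length of the first line:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class CountingDemo {
    /** Hypothetical helper: counts every byte delivered by the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Returns how many bytes were pulled from the source after reading just one line. */
    public static long bytesConsumedByFirstLine(byte[] data) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(counter, StandardCharsets.UTF_8))) {
            br.readLine();
            return counter.count;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("Hello\n");
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        // The first line is only 6 bytes long, but the reader has buffered far more.
        System.out.println("bytes consumed: " + bytesConsumedByFirstLine(data));
    }
}
```

The exact number depends on the internal buffer sizes of the JDK in use, so the sketch prints it rather than assuming a value.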
You can have a look at NIO for this. You can use a low level ByteBuffer to keep track of the offset and wrap that in a CharBuffer to convert the input into lines.
Here's something that should work. It assumes UTF-8, but you can easily change that.
import java.io.*;
class main {
public static void main(final String[] args) throws Exception {
ByteCountingLineReader r = new ByteCountingLineReader(new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")));
String line = null;
do {
long count = r.byteCount();
line = r.readLine();
System.out.println("Line at byte " + count + ": " + line);
} while (line != null);
r.close();
}
static class ByteCountingLineReader implements Closeable {
InputStream in;
long _byteCount;
int bufferedByte = -1;
boolean ended;
// in should be a buffered stream!
ByteCountingLineReader(InputStream in) {
this.in = in;
}
ByteCountingLineReader(File f) throws IOException {
in = new BufferedInputStream(new FileInputStream(f), 65536);
}
String readLine() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
if (ended) return null;
while (true) {
int c = read();
if (ended && baos.size() == 0) return null;
if (ended || c == '\n') break;
if (c == '\r') {
c = read();
if (c != '\n' && !ended)
bufferedByte = c;
break;
}
baos.write(c);
}
return fromUtf8(baos.toByteArray());
}
int read() throws IOException {
if (bufferedByte >= 0) {
int b = bufferedByte;
bufferedByte = -1;
return b;
}
int c = in.read();
if (c < 0) ended = true; else ++_byteCount;
return c;
}
long byteCount() {
return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
}
public void close() throws IOException {
if (in != null) try {
in.close();
} finally {
in = null;
}
}
boolean ended() {
return ended;
}
}
static byte[] toUtf8(String s) {
try {
return s.getBytes("UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static String fromUtf8(byte[] bytes) {
try {
return new String(bytes, "UTF-8");
} catch (Exception __e) {
throw rethrow(__e);
}
}
static RuntimeException rethrow(Throwable t) {
throw t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
}
}
Try using RandomAccessFile:
RandomAccessFile raf = new RandomAccessFile(filePath, "r");
String curLine;
while ((curLine = raf.readLine()) != null) {
    System.out.println(curLine);
    // get offset (position of the first byte after the line just read)
    long rowIndex = raf.getFilePointer();
}
To seek to an offset:
raf.seek(offset);
I am curious what your final solution was. However, I think that using a long instead of an int for the offset would cover most situations in your code above.
If you want to read a file line by line, I would recommend this code:
import java.io.*;
class FileRead
{
public static void main(String args[])
{
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
br.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I always used that method in the past, and works great!
Source: Here
