Why is this "line count" program slow in Java? Using MappedByteBuffer

To try MappedByteBuffer (memory mapped file in Java), I wrote a simple wc -l (text file line count) demo:
int wordCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    int nlines = 0;
    byte newline = '\n';
    for (long i = 0; i < fc.size(); i++) {
        if (mem.get() == newline)
            nlines += 1;
    }
    return nlines;
}
I tried this on a file of about 15 MB (15008641 bytes), and 100k lines. On my laptop, it takes about 13.8 sec. Why is it so slow?
Complete class code is here: http://pastebin.com/t8PLRGMa
For the reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6
It runs in about 28 ms, or 490 times faster.
Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster, which suggests there is definitely something odd going on.
Update: The file is cached by the OS, so there is no disk loading time involved.
I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.

The code is very slow because fc.size() is called in the loop.
The JVM obviously cannot eliminate the fc.size() call, since the file size can change at run time. Querying the file size is relatively slow, because it requires a system call to the underlying file system.
Change this to
long size = fc.size();
for (long i = 0; i < size; i++) {
    ...
}
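Applied to the method from the question, a minimal corrected sketch might look like this (hoisting fc.size() into a local variable; the try-with-resources close is an addition that is not in the original):

int wordCount(String fileName) throws IOException {
    try (FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel()) {
        long size = fc.size();                                    // one system call, outside the loop
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
        int nlines = 0;
        byte newline = '\n';
        for (long i = 0; i < size; i++) {
            if (mem.get() == newline)
                nlines++;
        }
        return nlines;
    }
}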

Related

Most efficient way in Java to continually read small files into an object

tl;dr: I need to keep some values in my app up to date with the values in ~10 small files, but I'm worried that reading the values over and over will have a lot of GC overhead. Do I create a bunch of unbuffered file readers and poll them, or is there any way to "map" the value in a file into a Java double that I can re-read a moment later when the value has (maybe) changed?
Long version: I've got some physical sensors (gyroscope, tachometer) whose current values ev3dev helpfully exposes as small files in a virtual filesystem, like one file called "/sys/bus/lego/drivers/ev3-analog-sensor/angle" that contains 56.26712
Or the next moment it contains 58.9834
And I'd like a value in my app to stay as close in sync with that file as possible. I could have your standard loop containing MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size()); (from here), but that seems like a lot of allocation overhead if I put it in a fast loop.
Maybe something with a Scanner, or
FileChannel inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(1024);
while(inChannel.read(buffer) > 0)...
I haven't found a magic function along the lines of KeepInSyncWithFile(myFloatArray, File("./angle"), MODE.FILE_TO_VALUE, 10, TimeUnits.MS).
Java 8+
Since you are talking about pseudofiles on the /sys virtual filesystem, it's unlikely that the standard WatchService will work for them. In order to get updated values, you need to keep reading these files.
The good news is that you can keep reading in a garbage-free manner, i.e. with no allocation at all. Open the file and allocate the buffer just once, and every time you want to read a value, seek to the beginning of the file and read into the existing preallocated buffer.
Here is the code:
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

public class DeviceReader implements Closeable {
    private final RandomAccessFile file;
    private final byte[] buf = new byte[512];

    public DeviceReader(String fileName) throws IOException {
        this.file = new RandomAccessFile(fileName, "r");
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public synchronized double readDouble() throws IOException {
        file.seek(0);
        int length = file.read(buf);
        if (length <= 0) {
            throw new EOFException();
        }

        int sign = 1;
        long exp = 0;
        long value = 0;

        for (int i = 0; i < length; i++) {
            byte ch = buf[i];
            if (ch == '-') {
                sign = -1;
            } else if (ch == '.') {
                exp = 1;
            } else if (ch >= '0' && ch <= '9') {
                value = (value * 10) + (ch - '0');
                exp *= 10;
            } else if (ch < ' ') {
                break;
            }
        }

        return (double) (sign * value) / Math.max(1, exp);
    }
}
Note that I manually parse a floating point number from a byte[] buffer. It would be much easier to call Double.parseDouble, but in this case you'd have to convert a byte[] to a String, and the algorithm will no longer be allocation free.
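For completeness, a minimal usage sketch (the AngleLoop class name and the 20 ms polling interval are purely illustrative; the sensor path is the one from the question):

import java.io.IOException;

public class AngleLoop {
    public static void main(String[] args) throws IOException, InterruptedException {
        try (DeviceReader angle = new DeviceReader("/sys/bus/lego/drivers/ev3-analog-sensor/angle")) {
            while (true) {
                double value = angle.readDouble();   // re-reads the pseudofile without allocating
                System.out.println(value);           // or update the in-app value here
                Thread.sleep(20);                    // illustrative 20 ms polling interval
            }
        }
    }
}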
I can't vouch for this, but FileObserver might be worth looking into. You can cache the latest values in your app and observe the file via FileObserver to find out whether any modify event occurs. I personally don't have any experience working with it, so I can't say for sure whether it would work with system files. But if it does, then it's a better solution than just repeatedly looking up the file in a loop.

What is the better way to substring large text?

Suppose my file is 2 GB and I want some specific data from one index to another index (say, about 300 MB of data between the two indexes). What is the better way to do that? I tried substring but it throws an out-of-memory exception. Please suggest a better way to do the same.
In general, assuming that the 2 GB file is on disk and you want to read some part of it into memory, you absolutely don't have to read the whole 2 GB into memory first.
The most straightforward solution is using RandomAccessFile.
The point is that it provides the abstraction of a pointer that can be moved back and forth over a big file, and once it is set you can read bytes from the place the pointer points to.
RandomAccessFile file = new RandomAccessFile(path, "r");
file.seek(position);
byte[] bytes = new byte[size];
file.readFully(bytes);   // read() may return fewer bytes than requested; readFully fills the whole array
file.close();
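If even a single 300 MB byte[] is more than you want to allocate, a chunked copy avoids holding the whole slice in memory. A sketch, assuming the two indexes are byte offsets (the copyRange name and the 64 KB chunk size are just illustrative):

// Copies bytes [from, to) of the input file into the output file in 64 KB chunks.
// Requires java.io.*.
static void copyRange(String inPath, String outPath, long from, long to) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(inPath, "r");
         FileOutputStream out = new FileOutputStream(outPath)) {
        in.seek(from);
        byte[] chunk = new byte[64 * 1024];
        long remaining = to - from;
        while (remaining > 0) {
            int n = in.read(chunk, 0, (int) Math.min(chunk.length, remaining));
            if (n < 0) break;   // reached end of file before the requested end index
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }
}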
Reading the file character by character and writing the characters to an output file can also solve the issue, since it won't load the whole file at once.
So the process will be: read the input file character by character, skip ahead to the desired substring start index, then write to the output file until the end of the substring.
If you are getting Exception in thread "main" java.lang.OutOfMemoryError: Java heap space, you can try increasing the heap size, if you really need to read the file at once and you are sure the String size won't exceed the maximum String size limit.
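For example, a hypothetical invocation raising the maximum heap to 4 GB (-Xmx sets the maximum heap size):

java -Xmx4g LargeFileSubstr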
The following snippet shows the character-by-character idea described above:
import java.io.*;

public class LargeFileSubstr {
    public static void main(String[] args) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader("/Users/me/Downloads/big.txt"));
             PrintWriter wr = new PrintWriter(new FileWriter("/Users/me/Downloads/big_substr.txt"))) {
            int startIndex = 100;
            int endIndex = 200;
            int pointer = 0;
            int ch;
            while ((ch = r.read()) != -1) {
                if (pointer > endIndex) {
                    break;
                }
                if (pointer >= startIndex) {
                    wr.print((char) ch);
                }
                pointer++;
            }
        }
    }
}
I have tried this to take a 200 MB substring out of a 2 GB file; it works reasonably fast.

How buffered streams work internally in Java

I'm reading about buffered streams. I searched about them and found many answers that cleared up my concepts, but I still have a few more questions.
After searching, I have come to know that a buffer is temporary memory (RAM) which helps the program read data quickly instead of going to the hard disk, and when the buffer is empty the native input API is called.
After reading a little more, I found this answer:
Reading data from disk byte-by-byte is very inefficient. One way to speed it up is to use a buffer: instead of reading one byte at a time, you read a few thousand bytes at once, and put them in a buffer, in memory. Then you can look at the bytes in the buffer one by one.
I have two points of confusion:
1: How, and by what, are buffers filled? (How does the native API do it?) As in the quote above, who fills those thousands of bytes at once, and won't it take the same amount of time? Suppose I have 5 MB of data, and the 5 MB is loaded into the buffer once in 5 seconds, and then the program uses the data from the buffer in 5 seconds: 10 seconds total. But if I skip buffering, the program gets the data directly from the hard disk at 1 MB per 2 seconds, which is the same 10 seconds total. Please clear up this confusion.
2: The second one is how this line works:
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
As I understand it, FileReader writes data to the buffer, and then BufferedReader reads data from the buffer memory? Please also explain this.
Thanks.
As for the performance of using buffering during read/write, its impact is probably minimal, since the OS will cache too; however, buffering will reduce the number of calls to the OS, which will have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
Your second question said:
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check its buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.
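To see this pull behaviour, here is a minimal sketch (not from the original answers; CountingReader, PullDemo, and the call count are purely illustrative): wrap the FileReader in a Reader that counts how often its bulk read() is invoked, and read the file one character at a time through a BufferedReader.

import java.io.*;

// Counts how often the underlying read(char[], int, int) is called.
class CountingReader extends FilterReader {
    int calls = 0;

    CountingReader(Reader in) { super(in); }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        calls++;                                  // one call per chunk pulled by BufferedReader.fill()
        return super.read(cbuf, off, len);
    }
}

public class PullDemo {
    public static void main(String[] args) throws IOException {
        CountingReader counting = new CountingReader(new FileReader("xanadu.txt"));
        try (BufferedReader in = new BufferedReader(counting)) {   // default 8192-char buffer
            int chars = 0;
            while (in.read() != -1) {             // the caller pulls one char at a time...
                chars++;
            }
            // ...but the FileReader is only asked for data once per buffer refill.
            System.out.println(chars + " chars read, " + counting.calls + " underlying read() calls");
        }
    }
}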
All of this is done by a private method named fill(), which I include here for educational purposes; any Java IDE lets you see the source code yourself:
private void fill() throws IOException {
    int dst;
    if (markedChar <= UNMARKED) {
        /* No mark */
        dst = 0;
    } else {
        /* Marked */
        int delta = nextChar - markedChar;
        if (delta >= readAheadLimit) {
            /* Gone past read-ahead limit: Invalidate mark */
            markedChar = INVALIDATED;
            readAheadLimit = 0;
            dst = 0;
        } else {
            if (readAheadLimit <= cb.length) {
                /* Shuffle in the current buffer */
                // here copy the read chars in a memory buffer named cb
                System.arraycopy(cb, markedChar, cb, 0, delta);
                markedChar = 0;
                dst = delta;
            } else {
                /* Reallocate buffer to accommodate read-ahead limit */
                char ncb[] = new char[readAheadLimit];
                System.arraycopy(cb, markedChar, ncb, 0, delta);
                cb = ncb;
                markedChar = 0;
                dst = delta;
            }
            nextChar = nChars = delta;
        }
    }
    int n;
    do {
        n = in.read(cb, dst, cb.length - dst);
    } while (n == 0);
    if (n > 0) {
        nChars = dst + n;
        nextChar = dst;
    }
}

Java Array Bulk Flush on Disk

I have two arrays (int and long) which contain millions of entries. Until now, I have been writing them using DataOutputStream with a large buffer so that the disk I/O cost stays low (NIO performs more or less the same here, since I use a huge buffer and the I/O access cost is low); specifically, I use:
DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("abc.txt"), 1024 * 1024 * 100));
for (int i = 0; i < 220000000; i++) {
    long l = longarray[i];
    dos.writeLong(l);
}
But it takes a long time (more than 5 minutes) to do that. What I actually want is a bulk flush (some sort of main-memory-to-disk memory mapping). For that, I found a nice approach here and here. However, I can't understand how to use it in my code. Can anybody help me with that, or suggest any other way to do this nicely?
On my machine, a 3.8 GHz i7 with an SSD:
DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("abc.txt"), 32 * 1024));
long start = System.nanoTime();
final int count = 220000000;
for (int i = 0; i < count; i++) {
    long l = i;
    dos.writeLong(l);
}
dos.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
        time / 1e9, count);
prints
Took 11.706 seconds to write 220,000,000 longs
Using memory mapped files
final int count = 220000000;

final FileChannel channel = new RandomAccessFile("abc.txt", "rw").getChannel();
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, count * 8);
mbb.order(ByteOrder.nativeOrder());

long start = System.nanoTime();
for (int i = 0; i < count; i++) {
    long l = i;
    mbb.putLong(l);
}
channel.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
        time / 1e9, count);
// Only works on Sun/HotSpot/OpenJDK to deallocate the buffer.
((DirectBuffer) mbb).cleaner().clean();

final FileChannel channel2 = new RandomAccessFile("abc.txt", "r").getChannel();
MappedByteBuffer mbb2 = channel2.map(FileChannel.MapMode.READ_ONLY, 0, channel2.size());
mbb2.order(ByteOrder.nativeOrder());
assert mbb2.remaining() == count * 8;

long start2 = System.nanoTime();
for (int i = 0; i < count; i++) {
    long l = mbb2.getLong();
    if (i != l)
        throw new AssertionError("Expected " + i + " but got " + l);
}
channel2.close();
long time2 = System.nanoTime() - start2;
System.out.printf("Took %.3f seconds to read %,d longs%n",
        time2 / 1e9, count);
// Only works on Sun/HotSpot/OpenJDK to deallocate the buffer.
((DirectBuffer) mbb2).cleaner().clean();
prints on my 3.8 GHz i7.
Took 0.568 seconds to write 220,000,000 longs
on a slower machine prints
Took 1.180 seconds to write 220,000,000 longs
Took 0.990 seconds to read 220,000,000 longs
Is there any other way not to create that? Because I already have that array in main memory, and I can't allocate more than 500 MB to do that.
This uses less than 1 KB of heap. If you look at how much memory is used before and after this call, you will normally see no increase at all.
Another thing: does this also mean efficient loading with MappedByteBuffer?
In my experience, using a memory mapped file is by far the fastest because you reduce the number of system calls and copies into memory.
Because in some article I found that read(buffer) gives better loading performance. (I checked that one; it is really faster, reading a 220 million int array / float array in 5 seconds.)
I would like to read that article because I have never seen that.
Another issue: readLong gives an error when reading the output file produced by your code.
Part of the performance improvement is storing the values in native byte order. writeLong/readLong always uses big-endian format, which is much slower on Intel/AMD systems, which are natively little-endian.
You can make the byte order big-endian, which will slow it down, or you can use native ordering (DataInput/OutputStream only supports big-endian).
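As an illustration of the two options (a sketch, not from the original answer; it refers to the mbb buffer and the abc.txt file from the write example above):

// Option 1: write big-endian so DataInputStream.readLong() can read the file directly
// (slower to write on little-endian x86 hardware).
mbb.order(ByteOrder.BIG_ENDIAN);

// Option 2: keep native order when writing, and swap the bytes while reading back
// (only needed on little-endian machines such as Intel/AMD).
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("abc.txt"), 64 * 1024))) {
    long first = Long.reverseBytes(dis.readLong());
    System.out.println(first);
}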
I am running it on a server with 16 GB memory and a 2.13 GHz [CPU].
I doubt the problem has anything to do with your Java code.
Your file system appears to be extraordinarily slow (at least ten times slower than what one would expect from a local disk).
I would do two things:
Double check that you are actually writing to a local disk, and not to a network share. Bear in mind that in some environments home directories are NFS mounts.
Ask your sysadmins to take a look at the machine to find out why the disk is so slow. If I were in their shoes, I'd start by checking the logs and running some benchmarks (e.g. using Bonnie++).

Why is System.out.println so slow?

Is this something common to all programming languages? Doing multiple prints followed by a println seems faster, but moving everything to a string and just printing that seems fastest. Why?
EDIT: For example, Java can find all the prime numbers up to 1 million in less than a second, but printing them all out, each with its own println, can take minutes! Up to 10 billion can take hours to print!
EX:
package sieveoferatosthenes;

public class Main {
    public static void main(String[] args) {
        int upTo = 10000000;
        boolean primes[] = new boolean[upTo];
        for (int b = 0; b < upTo; b++) {
            primes[b] = true;
        }
        primes[0] = false;
        primes[1] = false;

        int testing = 1;
        while (testing <= Math.sqrt(upTo)) {
            testing++;
            int testingWith = testing;
            if (primes[testing]) {
                while (testingWith < upTo) {
                    testingWith = testingWith + testing;
                    if (testingWith >= upTo) {
                    } else {
                        primes[testingWith] = false;
                    }
                }
            }
        }

        for (int b = 2; b < upTo; b++) {
            if (primes[b]) {
                System.out.println(b);
            }
        }
    }
}
println itself is not slow; what is slow is the underlying PrintStream that is connected to the console, provided by the host operating system.
You can check it yourself: compare dumping a large text file to the console with piping the same text file into another file:
cat largeTextFile.txt
cat largeTextFile.txt > temp.txt
Reading and writing are similar and proportional to the size of the file (O(n)); the only difference is that the destination is different (console compared to file). And that's basically the same with System.out.
The underlying OS operation (displaying chars on a console window) is slow because
The bytes have to be sent to the console application (should be quite fast)
Each char has to be rendered using (usually) a TrueType font (that's pretty slow; switching off anti-aliasing could improve performance, btw)
The displayed area may have to be scrolled in order to append a new line to the visible area (best case: bit block transfer operation, worst case: re-rendering of the complete text area)
System.out is a static PrintStream field. PrintStream has, among other things, those methods you're probably quite familiar with, like print() and println() and such.
It's not unique to Java that input and output operations take a long time ("long" being relative: printing or writing to a PrintStream takes a fraction of a second, but over 10 billion instances of this print can add up to quite a lot!).
This is why your "moving everything to a String" approach is the fastest. Your huge String is built, but you only print it once. Sure, it's a huge print, but you spend the time on the actual printing, not on the overhead associated with print() or println().
As Dvd Prd has mentioned, Strings are immutable. That means whenever you assign a new String to an old reference, you actually destroy the reference to the old String and create a reference to the new one. So you can make this whole operation go even faster by using the StringBuilder class, which is mutable. This will decrease the overhead associated with building the string you'll eventually print.
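A minimal sketch of that idea, reusing the primes[] array and upTo from the question's code (illustrative only):

StringBuilder sb = new StringBuilder();
for (int b = 2; b < upTo; b++) {
    if (primes[b]) {
        sb.append(b).append('\n');   // accumulate in memory, no console I/O yet
    }
}
System.out.print(sb);                // hand the console one large string instead of one line per prime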
I believe this is because of buffering. A quote from the article:
Another aspect of buffering concerns text output to a terminal window. By default, System.out (a PrintStream) is line buffered, meaning that the output buffer is flushed when a newline character is encountered. This is important for interactivity, where you'd like to have an input prompt displayed before actually entering any input.
A quote explaining buffers from wikipedia:
In computer science, a buffer is a region of memory used to temporarily hold data while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device (such as a Mouse) or just before it is sent to an output device (such as Speakers).
public void println()
Terminate the current line by writing the line separator string. The line separator string is defined by the system property line.separator, and is not necessarily a single newline character ('\n').
So the buffer gets flushed when you do println, which means new memory has to be allocated etc., which makes printing slower. The other methods you specified require less buffer flushing and are thus faster.
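As an illustration of reducing that per-line flushing (a sketch, not taken from any of the answers here, again reusing primes[] and upTo from the question's code): wrap System.out in a PrintWriter over a BufferedWriter with autoflush disabled, so the newline no longer forces a flush and the output leaves the buffer in large chunks.

PrintWriter out = new PrintWriter(
        new BufferedWriter(new OutputStreamWriter(System.out), 64 * 1024), false);  // autoflush off
for (int b = 2; b < upTo; b++) {
    if (primes[b]) {
        out.println(b);   // goes into the 64 KB buffer; no flush per newline
    }
}
out.flush();              // one flush at the end instead of one per line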
Take a look at my System.out.println replacement.
By default, System.out.print() is only line-buffered and does a lot of work related to Unicode handling. Because of its small buffer size, System.out.println() is not well suited to handling many repetitive outputs in batch mode. Each line is flushed right away. If your output is mainly ASCII-based, then removing the Unicode-related activities will improve the overall execution time.
If you're printing to the console window, not to a file, that will be the killer.
Every character has to be painted, and on every line the whole window has to be scrolled.
If the window is partly overlaid with other windows, it also has to do clipping.
That's going to take far more cycles than what your program is doing.
Usually that's not a bad price to pay, since console output is supposed to be for your reading pleasure :)
The problem you have is that displaying to the screen is very expensive, especially if you have a graphical Windows/X-Windows environment (rather than a pure text terminal). Just rendering one digit in a font is far more expensive than the calculations you are doing. When you send data to the screen faster than it can display it, it buffers the data and quickly blocks. Even writing to a file is significant compared to the calculations, but it's 10x - 100x faster than displaying on the screen.
BTW: Math.sqrt() is very expensive, and using a loop is much slower than using the modulus, i.e. %, to determine if a number is a multiple. A BitSet can be 8x more space efficient than boolean[], and faster for operations on multiple bits, e.g. counting or searching bits.
If I dump the output to a file, it is quick, but writing to the console is slow, and if I write to the console the same data that was written to the file, it takes about the same amount of time.
Took 289 ms to examine 10,000,000 numbers.
Took 149 ms to toString primes up to 10,000,000.
Took 306 ms to write to a file primes up to 10,000,000.
Took 61,082 ms to write to a System.out primes up to 10,000,000.
time cat primes.txt
real 1m24.916s
user 0m3.619s
sys 0m12.058s
The code
int upTo = 10 * 1000 * 1000;
long start = System.nanoTime();
BitSet nonprimes = new BitSet(upTo);
for (int t = 2; t * t < upTo; t++) {
    if (nonprimes.get(t)) continue;
    for (int i = 2 * t; i <= upTo; i += t)
        nonprimes.set(i);
}
PrintWriter report = new PrintWriter("report.txt");

long time = System.nanoTime() - start;
report.printf("Took %,d ms to examine %,d numbers.%n", time / 1000 / 1000, upTo);

long start2 = System.nanoTime();
for (int i = 2; i < upTo; i++) {
    if (!nonprimes.get(i))
        Integer.toString(i);
}
long time2 = System.nanoTime() - start2;
report.printf("Took %,d ms to toString primes up to %,d.%n", time2 / 1000 / 1000, upTo);

long start3 = System.nanoTime();
PrintWriter pw = new PrintWriter(new BufferedOutputStream(new FileOutputStream("primes.txt"), 64 * 1024));
for (int i = 2; i < upTo; i++) {
    if (!nonprimes.get(i))
        pw.println(i);
}
pw.close();
long time3 = System.nanoTime() - start3;
report.printf("Took %,d ms to write to a file primes up to %,d.%n", time3 / 1000 / 1000, upTo);

long start4 = System.nanoTime();
for (int i = 2; i < upTo; i++) {
    if (!nonprimes.get(i))
        System.out.println(i);
}
long time4 = System.nanoTime() - start4;
report.printf("Took %,d ms to write to a System.out primes up to %,d.%n", time4 / 1000 / 1000, upTo);
report.close();
Most of the answers here are right, but they don't cover the most important point: system calls. This is the operation that induces the most overhead.
When your software needs to access some hardware resource (your screen, for example), it needs to ask the OS (or hypervisor) whether it can access the hardware. This costs a lot.
Here are some interesting blogs about syscalls, the last one dedicated to syscalls and Java:
http://arkanis.de/weblog/2017-01-05-measurements-of-system-call-performance-and-overhead
http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html
https://blog.packagecloud.io/eng/2017/03/14/using-strace-to-understand-java-performance-improvement/
