I use a FileChannel to write 4 GB files to a spinning disk, and although I have tweaked the buffer size to maximise write speed and I flush the channel every second, closing the file channel can take 200 ms. This is enough time for the queue that I read from to overflow and start dropping packets.
I use a direct byte buffer, but I am struggling to understand what is happening here. The discs are removable and write caching has been disabled, so I would not expect the OS to be buffering the data.
The benchmark speed of the discs is around 80 MB/sec, but I am seeing the long file channel close times even when writing at speeds of ~40 MB/sec.
I appreciate that write performance will decrease as a disc fills, but these discs are empty.
Are there any tweaks I can make to remove the long delay when closing the file channel? Should I be allocating the file space up front, writing the file with a .lock extension, and then renaming it once the file has been completed?
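To make that concrete, this is roughly the preallocate-and-rename approach I have in mind (just a sketch, not tested; dir, name and expectedSize are placeholders, and I realise setLength may only create a sparse file on some filesystems):
File lockFile = new File(dir, name + ".lock");
try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw")) {
    raf.setLength(expectedSize);   // preallocate the file space up front
}
// ... write the data into lockFile ...
Files.move(lockFile.toPath(), new File(dir, name).toPath(),
        StandardCopyOption.ATOMIC_MOVE);   // rename once the file is complete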
Just hoping someone who has done high throughput IO can provide some pointers as to possible options above and beyond what is usually documented when writing files using NIO.
The code is below and I cannot see anything immediately wrong.
public final class DataWriter implements Closeable {
private static final Logger LOG = Logger.getLogger("DataWriter");
private static final long MB = 1024 * 1024;
private final int flushPeriod;
private FileOutputStream fos;
private FileChannel fileChannel;
private long totalBytesWritten;
private long lastFlushTime;
private final ByteBuffer buffer;
private final int bufferSize;
private final long startTime;
private long totalPackets = 0;
private final String fileName;
public DataWriter(File recordFile, int bSize, int flushPeriod) throws IOException {
this.flushPeriod = flushPeriod;
if (!recordFile.createNewFile()) {
throw new IllegalStateException("Record file has not been created");
}
totalBytesWritten = 0;
fos = new FileOutputStream(recordFile);
fileChannel = fos.getChannel();
buffer = ByteBuffer.allocateDirect(bSize);
bufferSize = bSize;
startTime = System.currentTimeMillis();
this.fileName = recordFile.getAbsolutePath();
}
/**
* Appends the supplied ByteBuffer to the main buffer if there is space
* @param packet
* @return
* @throws IOException
*/
public int write(ByteBuffer packet) throws IOException {
int bytesWritten = 0;
totalPackets++;
//If the buffer cannot accommodate the supplied buffer then write straight out
if(packet.limit() > buffer.capacity()) {
bytesWritten = writeBuffer(packet);
totalBytesWritten += bytesWritten;
} else {
//write the currently filled buffer if no space exists to accommodate the current buffer
if(packet.limit() > buffer.remaining()) {
buffer.flip();
bytesWritten = writeBuffer(buffer);
totalBytesWritten += bytesWritten;
}
buffer.put(packet);
}
if(System.currentTimeMillis()-lastFlushTime > flushPeriod) {
fileChannel.force(true);
lastFlushTime=System.currentTimeMillis();
}
return bytesWritten;
}
public long getTotalBytesWritten() {
return totalBytesWritten;
}
/**
* Writes the buffer and then clears it
* @throws IOException
*/
private int writeBuffer(ByteBuffer byteBuffer) throws IOException {
int bytesWritten = 0;
while(byteBuffer.hasRemaining()) {
bytesWritten += fileChannel.write(byteBuffer);
}
//Reset the buffer ready for writing
byteBuffer.clear();
return bytesWritten;
}
@Override
public void close() throws IOException {
//Write the buffer if data is present
if(buffer.position() != 0) {
buffer.flip();
totalBytesWritten += writeBuffer(buffer);
fileChannel.force(true);
}
long time = System.currentTimeMillis() - startTime;
if(LOG.isDebugEnabled()) {
LOG.debug( totalBytesWritten + " bytes written in " + (time / 1000d) + " seconds using ByteBuffer size ["+bufferSize/1024+"] KB");
LOG.debug( (totalBytesWritten / MB) / (time / 1000d) + " MB per second written to file " + fileName);
LOG.debug( "Total packets written ["+totalPackets+"] average packet size ["+totalBytesWritten / totalPackets+"] bytes");
}
if (fos != null) {
fos.close();
fos = null;
}
}
}
Related
I need to measure the storage throughput of my Android device, and I found source code that measures sequential storage read/write throughput in the Android CTS.
FileUtil.java
public static long getFileSizeExceedingMemory(Context context, int bufferSize) {
long freeDisk = SystemUtil.getFreeDiskSize(context);
long memSize = SystemUtil.getTotalMemory(context);
long diskSizeTarget = (2 * memSize / bufferSize) * bufferSize;
final long minimumDiskSize = (512L * 1024L * 1024L / bufferSize) * bufferSize;
final long reservedDiskSize = (50L * 1024L * 1024L / bufferSize) * bufferSize;
if ( diskSizeTarget < minimumDiskSize ) {
diskSizeTarget = minimumDiskSize;
}
if (diskSizeTarget > freeDisk) {
Log.i(TAG, "Free disk size " + freeDisk + " too small");
return 0;
}
if ((freeDisk - diskSizeTarget) < reservedDiskSize) {
diskSizeTarget -= reservedDiskSize;
}
return diskSizeTarget;
}
This function determines the size of a file that will be written from RAM to storage and then read back.
I was just wondering about this line:
long diskSizeTarget = (2 * memSize / bufferSize) * bufferSize;
Why do they need to prepare a file that is around double the RAM size?
I have tried a file size that is half the RAM size (my device has 2 GB of RAM), and the write throughput looks normal but the read throughput is implausibly fast (around 200 MB/s).
The results look fine when I use a file of around 4 GB (double the RAM size) or 2 GB.
(The buffer size parameter is 10 MB for both read and write.)
Here is the read and write code:
SequentialRWTest.java
public void testSingleSequentialRead() throws Exception {
final long fileSize = FileUtil.getFileSizeExceedingMemory(getContext(), BUFFER_SIZE);
if (fileSize == 0) { // not enough space, give up
return;
}
long start = System.currentTimeMillis();
final File file = FileUtil.createNewFilledFile(getContext(),
DIR_SEQ_RD, fileSize);
long finish = System.currentTimeMillis();
String streamName = "test_single_sequential_read";
DeviceReportLog report = new DeviceReportLog(REPORT_LOG_NAME, streamName);
report.addValue("file_size", fileSize, ResultType.NEUTRAL, ResultUnit.NONE);
report.addValue("write_throughput",
Stat.calcRatePerSec((double)fileSize / 1024 / 1024, finish - start),
ResultType.HIGHER_BETTER, ResultUnit.MBPS);
final int NUMBER_READ = 10;
final byte[] data = new byte[BUFFER_SIZE];
double[] times = MeasureTime.measure(NUMBER_READ, new MeasureRun() {
@Override
public void run(int i) throws IOException {
final FileInputStream in = new FileInputStream(file);
long read = 0;
while (read < fileSize) {
in.read(data);
read += BUFFER_SIZE;
}
in.close();
}
});
double[] mbps = Stat.calcRatePerSecArray((double)fileSize / 1024 / 1024, times);
report.addValues("read_throughput", mbps, ResultType.HIGHER_BETTER, ResultUnit.MBPS);
Stat.StatResult stat = Stat.getStat(mbps);
report.setSummary("read_throughput_average", stat.mAverage, ResultType.HIGHER_BETTER,
ResultUnit.MBPS);
report.submit(getInstrumentation());
}
And the createNewFilledFile function in FileUtil.java:
public static File createNewFilledFile(Context context, String dirName, long length)
throws IOException {
final int BUFFER_SIZE = 10 * 1024 * 1024;
File file = createNewFile(context, dirName);
FileOutputStream out = new FileOutputStream(file);
byte[] data = generateRandomData(BUFFER_SIZE);
long written = 0;
while (written < length) {
out.write(data);
written += BUFFER_SIZE;
}
out.flush();
out.close();
return file;
}
I need to limit the file size to 1 GB while writing, preferably using BufferedWriter.
Is it possible using BufferedWriter, or do I have to use other libraries? Something like:
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
//...
writer.write(lines.stream());
}
You can always write your own OutputStream to limit the number of bytes written.
The following assumes you want to throw an exception if the size is exceeded.
public final class LimitedOutputStream extends FilterOutputStream {
private final long maxBytes;
private long bytesWritten;
public LimitedOutputStream(OutputStream out, long maxBytes) {
super(out);
this.maxBytes = maxBytes;
}
@Override
public void write(int b) throws IOException {
ensureCapacity(1);
super.write(b);
}
@Override
public void write(byte[] b) throws IOException {
ensureCapacity(b.length);
super.write(b);
}
@Override
public void write(byte[] b, int off, int len) throws IOException {
ensureCapacity(len);
super.write(b, off, len);
}
private void ensureCapacity(int len) throws IOException {
long newBytesWritten = this.bytesWritten + len;
if (newBytesWritten > this.maxBytes)
throw new IOException("File size exceeded: " + newBytesWritten + " > " + this.maxBytes);
this.bytesWritten = newBytesWritten;
}
}
You will of course now have to set up the Writer/OutputStream chain manually.
final long SIZE_1GB = 1073741824L;
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new LimitedOutputStream(Files.newOutputStream(path), SIZE_1GB),
StandardCharsets.UTF_8))) {
//
}
Writing exactly 1 GB is very difficult when you are writing lines, because each line may contain an unknown number of bytes. I am assuming you want to write the data to the file line by line.
You can either check how many bytes a line has before writing it to the file, or check the file size after writing each line.
The following basic example writes the same line each time. The text This is just a test ! takes 21 bytes on disk in UTF-8 encoding. After 49 writes it reaches 1029 bytes and stops writing.
public class Test {
private static final int ONE_KB = 1024;
public static void main(String[] args) {
File file = new File("D:/test.txt");
try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
while (file.length() < ONE_KB) {
writer.write("This is just a test !");
writer.flush();
}
System.out.println("1 KB Data is written to the file.!");
} catch (IOException e) {
e.printStackTrace();
}
}
}
As you can see, we have already written past the 1 KB limit: the program above writes 1029 bytes, not 1024 bytes or fewer.
The second approach is to check the number of bytes, according to the specific encoding, before writing to the file.
public class Test {
private static final int ONE_KB = 1024;
public static void main(String[] args) throws UnsupportedEncodingException {
File file = new File("D:/test.txt");
String data = "This is just a test !";
int dataLength = data.getBytes("UTF-8").length;
try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
while (file.length() + dataLength < ONE_KB) {
writer.write(data);
writer.flush();
}
System.out.println("1 KB Data written to the file.!");
} catch (IOException e) {
e.printStackTrace();
}
}
}
In this approach we check the length in bytes before writing to the file, so it writes 1008 bytes and then stops.
Problems with both approaches:
Write and check: you may end up with some extra bytes and the file size may cross the limit.
Check and write: you may end up with fewer bytes than the limit if the next line contains a lot of data. You should also be careful about the encoding.
However, there are other ways to do this validation with third-party libraries like Apache Commons IO, and I find them more cumbersome than the conventional Java approaches.
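If you do go the library route, a rough sketch using Commons IO's CountingOutputStream (org.apache.commons.io.output) could look like this; note it is still a write-then-check approach, so it can overshoot by up to one line, and 'path' and 'lines' stand in for your own data:
long maxBytes = 1024L * 1024L * 1024L;   // 1 GB
try (CountingOutputStream counting = new CountingOutputStream(Files.newOutputStream(path));
     BufferedWriter writer = new BufferedWriter(
             new OutputStreamWriter(counting, StandardCharsets.UTF_8))) {
    for (String line : lines) {
        writer.write(line);
        writer.newLine();
        writer.flush();                  // push bytes through so the count is up to date
        if (counting.getByteCount() >= maxBytes) {
            break;                       // stop once the limit is reached
        }
    }
}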
int maxSize = 1_000_000_000;
Charset charset = StandardCharsets.UTF_8;
int size = 0;
int lineCount = 0;
while (lineCount < lines.size()) {
int size2 = size + (lines.get(lineCount) + "\r\n").getBytes(charset).length;
if (size2 > maxSize) {
break;
}
size = size2;
++lineCount;
}
List<String> linesToWrite = lines.subList(0, lineCount);
Path path = Paths.get("D:/test.txt");
Files.write(path, linesToWrite, charset);
Or a bit faster, encoding each line only once:
int lineCount = 0;
try (FileChannel channel = new RandomAccessFile("D:/test.txt", "rw").getChannel()) {
// Note: the mapped file will be maxSize bytes long; truncate it afterwards or
// record the final position if the file must end at the last line written.
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, maxSize);
lineCount = lines.size();
for (int i = 0; i < lines.size(); i++) {
byte[] line = (lines.get(i) + "\r\n").getBytes(charset);
if (line.length > buffer.remaining()) {
lineCount = i;
break;
}
buffer.put(line);
}
}
IIUC, there are various ways to do it.
Keep writing data in chunks and flushing it, and keep checking the file size after every flush.
Use log4j (or some logging framework) which can roll over to a new file after a certain size, time, or some other trigger point; a rough hand-rolled sketch of this rollover idea is shown below.
While BufferedWriter is great, there are some newer APIs in Java which could make it faster. See Fastest way to write huge data in text file Java.
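Something like this hand-rolled rollover, for example (a sketch only; the file names, limit and data source are placeholders):
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class RollingWriteDemo {
    public static void main(String[] args) throws IOException {
        long maxBytes = 1024L * 1024L * 1024L;               // roll to a new file after ~1 GB
        List<String> lines = Arrays.asList("example line");  // stand-in for the real data
        int part = 1;
        long written = 0;
        BufferedWriter out = Files.newBufferedWriter(Paths.get("out.part" + part));
        for (String line : lines) {
            long lineBytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8).length;
            if (written + lineBytes > maxBytes) {             // roll over before crossing the limit
                out.close();
                out = Files.newBufferedWriter(Paths.get("out.part" + ++part));
                written = 0;
            }
            out.write(line);
            out.newLine();
            written += lineBytes;
        }
        out.close();
    }
}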
While transferring images over the network using sockets, I ran into a (for me) strange issue:
When I wrote images to the OutputStream of one socket with ImageIO.write() and read the same images from the InputStream of the other socket with ImageIO.read(), I noticed that 16 more bytes per image were sent than were read.
To be able to send multiple images in a row I had to read these bytes after every call to ImageIO.read(); otherwise I received null because the input could not be parsed.
Does anybody know why this is and what these bytes are?
I have extracted the issue into this piece of code:
public class Test implements Runnable
{
public static final int COUNT = 5;
public void run()
{
try(ServerSocket server = new ServerSocket(3040))
{
Socket client = server.accept();
for(int i = 0; i < COUNT; i++)
{
final BufferedImage image = readImage(client.getInputStream());
System.out.println(image);
}
}
catch(IOException e)
{
e.printStackTrace();
}
}
private BufferedImage readImage(InputStream stream) throws IOException
{
BufferedImage image = ImageIO.read(stream);
dontKnowWhy(stream);
return image;
}
private void dontKnowWhy(InputStream stream) throws IOException
{
stream.read(new byte[16]);
}
public static void main(String... args)
{
new Thread(new Test()).start();
try(Socket server = new Socket("localhost", 3040))
{
for(int i = 0; i < COUNT; i++)
{
BufferedImage image = new BufferedImage(300, 300, BufferedImage.TYPE_INT_ARGB); //
int[] vals = new int[image.getWidth() * image.getHeight()]; //
Arrays.fill(vals, new Random().nextInt()); // Create random image
image.setRGB(0, 0, image.getWidth(), image.getHeight(), vals, 0, 1); //
ImageIO.write(image, "png", server.getOutputStream()); //send image to server
long time = System.currentTimeMillis(); //
while(time + 1000 > System.currentTimeMillis()); //wait a second
}
}
catch(IOException e)
{
e.printStackTrace();
}
}
}
I'd be glad for any answers. Thanks in advance!
The "extra" bytes you see, is not read, simply because they are not needed to correctly decode the image (they are, however, most likely needed to form a fully compliant file in the chosen file format, so they are not just random "garbage" bytes).
For any given ImageIO plugin, the number of bytes left in the stream after a read may be 0, 16or any other number. It might depend on the format, the writer that wrote it, the reader, the number of images in the input, the metadata in the file, etc. In other words, relying on this behavior would be an error.
The easies way to fix this, is to prepend each image with a byte count, containing the length of the output image. This typically means you need to buffer the response on the client, to either a ByteArrayOutputStream (in-memory) or a FileOutputStream (disk).
The client then needs to read the byte count for the image, and make sure you skip any remaining bytes after the read. This can be accomplished by wrapping the input (see FilterInputStream) and keep track of the byte count internally.
(You can also read all the bytes up front, and wrapping them in a ByteArrayInputStream, before passing the data to ImageIO.read(), which is simpler but does more in-memory buffering).
After this, the client is ready do start over, by reading a new byte count, and a new image.
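As for the simpler read-everything-up-front alternative mentioned above, a rough sketch (using the same length prefix, and the 'client' socket from the original code) might be:
DataInputStream in = new DataInputStream(client.getInputStream());
int size = in.readInt();                 // length written by the sender
byte[] data = new byte[size];
in.readFully(data);                      // read exactly 'size' bytes
BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));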
Another approach, if you'd like less buffering on the sending side, could be to implement something like HTTP chunked transfer encoding, where multiple smaller blocks (chunks) are sent for each image, each prefixed with its own byte count. You would need to handle the last chunk of each image specially, or insert special delimiter chunks to mark the end of a stream or the start of a new one.
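Purely as an illustration (this is not part of the code below), such chunked framing on the sending side might look roughly like this, assuming 'socket' and 'encodedImage' (an InputStream over the encoded image bytes) exist:
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
byte[] chunk = new byte[8192];
int n;
while ((n = encodedImage.read(chunk)) != -1) {
    out.writeInt(n);            // chunk length
    out.write(chunk, 0, n);     // chunk payload
}
out.writeInt(0);                // zero-length chunk marks the end of this image
out.flush();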
Code below implements the buffering approach on the server, while using direct reading on the client.
Server:
DataOutputStream stream = new DataOutputStream(server.getOutputStream());
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
for (...) {
buffer.reset();
ImageIO.write(image, "png", buffer);
stream.writeInt(buffer.size());
buffer.writeTo(stream); // Send image to server
}
Client:
DataInputStream stream = new DataInputStream(client.getInputStream());
for (...) {
int size = stream.readInt();
try (InputStream imageData = new SubStream(stream, size)) {
return ImageIO.read(imageData);
}
// Note: imageData implicitly closed using try-with-resources
}
...
// Util class
private static final class SubStream extends FilterInputStream {
private final long length;
private long pos;
public SubStream(final InputStream stream, final long length) {
super(stream);
this.length = length;
}
@Override
public boolean markSupported() {
return false;
}
@Override
public int available() throws IOException {
return (int) Math.min(super.available(), length - pos);
}
@Override
public int read() throws IOException {
if (pos >= length) {
return -1;
}
int read = super.read();
if (read != -1) {
pos++;
}
return read;
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
if (pos >= length) {
return -1;
}
int count = super.read(b, off, (int) Math.min(len, length - pos));
if (count < 0) {
return -1;
}
pos += count;
return count;
}
@Override
public long skip(long n) throws IOException {
if (pos >= length) {
return 0; // skip() reports the number of bytes actually skipped, never -1
}
long skipped = super.skip(Math.min(n, length - pos));
if (skipped < 0) {
return 0;
}
pos += skipped;
return skipped;
}
@Override
public void close() throws IOException {
// Don't close wrapped stream, just consume any bytes left
while (pos < length) {
skip(length - pos);
}
}
}
I've been asked to measure current disk performance, as we are planning to replace local disk with network attached storage on our application servers. Since our applications which write data are written in Java, I thought I would measure the performance directly in Linux, and also using a simple Java test. However I'm getting significantly different results, particularly for reading data, using what appear to me to be similar tests. Directly in Linux I'm doing:
dd if=/dev/zero of=/data/cache/test bs=1048576 count=8192
dd if=/data/cache/test of=/dev/null bs=1048576 count=8192
My Java test looks like this:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class TestDiskSpeed {
private byte[] oneMB = new byte[1024 * 1024];
public static void main(String[] args) throws IOException {
new TestDiskSpeed().execute(args);
}
private void execute(String[] args) throws IOException {
long size = Long.parseLong(args[1]);
testWriteSpeed(args[0], size);
testReadSpeed(args[0], size);
}
private void testWriteSpeed(String filePath, long size) throws IOException {
File file = new File(filePath);
BufferedOutputStream writer = null;
long start = System.currentTimeMillis();
try {
writer = new BufferedOutputStream(new FileOutputStream(file), 1024 * 1024);
for (int i = 0; i < size; i++) {
writer.write(oneMB);
}
writer.flush();
} finally {
if (writer != null) {
writer.close();
}
}
long elapsed = System.currentTimeMillis() - start;
String message = "Wrote " + size + "MB in " + elapsed + "ms at a speed of " + calculateSpeed(size, elapsed) + "MB/s";
System.out.println(message);
}
private void testReadSpeed(String filePath, long size) throws IOException {
File file = new File(filePath);
BufferedInputStream reader = null;
long start = System.currentTimeMillis();
try {
reader = new BufferedInputStream(new FileInputStream(file), 1024 * 1024);
for (int i = 0; i < size; i++) {
reader.read(oneMB);
}
} finally {
if (reader != null) {
reader.close();
}
}
long elapsed = System.currentTimeMillis() - start;
String message = "Read " + size + "MB in " + elapsed + "ms at a speed of " + calculateSpeed(size, elapsed) + "MB/s";
System.out.println(message);
}
private double calculateSpeed(long size, long elapsed) {
double seconds = ((double) elapsed) / 1000L;
double speed = ((double) size) / seconds;
return speed;
}
}
This is being invoked with "java TestDiskSpeed /data/cache/test 8192"
Both of these should be creating 8GB files of zeros, 1MB at a time, measuring the speed, and then reading it back and measuring again. Yet the speeds I'm consistently getting are:
Linux: write - ~650MB/s
Linux: read - ~4.2GB/s
Java: write - ~500MB/s
Java: read - ~1.9GB/s
Can anyone explain the large discrepancy?
When I run this using NIO on my system (Ubuntu 15.04 with an i7-3970X):
public class Main {
static final int SIZE_GB = Integer.getInteger("sizeGB", 8);
static final int BLOCK_SIZE = 64 * 1024;
public static void main(String[] args) throws IOException {
ByteBuffer buffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
File tmp = File.createTempFile("delete", "me");
tmp.deleteOnExit();
int blocks = (int) (((long) SIZE_GB << 30) / BLOCK_SIZE);
long start = System.nanoTime();
try (FileChannel fc = new FileOutputStream(tmp).getChannel()) {
for (int i = 0; i < blocks; i++) {
buffer.clear();
while (buffer.remaining() > 0)
fc.write(buffer);
}
}
long mid = System.nanoTime();
try (FileChannel fc = new FileInputStream(tmp).getChannel()) {
for (int i = 0; i < blocks; i++) {
buffer.clear();
while (buffer.remaining() > 0)
fc.read(buffer);
}
}
long end = System.nanoTime();
long size = tmp.length();
System.out.printf("Write speed %.1f GB/s, read Speed %.1f GB/s%n",
(double) size/(mid-start), (double) size/(end-mid));
}
}
prints
Write speed 3.8 GB/s, read Speed 6.8 GB/s
You may get better performance if you drop the BufferedXxxStream wrappers. They're not helping since you're doing 1 MB reads/writes, and they cause an extra memory copy of the data.
Better yet, you should be using the NIO classes instead of the regular IO classes.
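For illustration, a rough sketch of the write test without the Buffered wrapper (reusing file, size and oneMB from your TestDiskSpeed class; not benchmarked):
try (FileOutputStream out = new FileOutputStream(file)) {
    for (int i = 0; i < size; i++) {
        out.write(oneMB);   // 1 MB writes go straight to the OS, no extra copy
    }
}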
try-finally
You should clean up your try-finally code.
// Original code
BufferedOutputStream writer = null;
try {
writer = new ...;
// use writer
} finally {
if (writer != null) {
writer.close();
}
}
// Cleaner code
BufferedOutputStream writer = new ...;
try {
// use writer
} finally {
writer.close();
}
// Even cleaner, using try-with-resources (since Java 7)
try (BufferedOutputStream writer = new ...) {
// use writer
}
To complement Peter's great answer, I am adding the code below. It compares head-to-head the performance of the good-old java.io with NIO. Unlike Peter, instead of just reading data into a direct buffer, I do a typical thing with it: transfer it into an on-heap byte array. This steals surprisingly little from the performance: where I was getting 7.5 GB/s with Peter's code, here I get 6.0 GB/s.
For the java.io approach I can't have a direct buffer, but instead I call the read method directly with my target on-heap byte array. Note that this array is smallish and has an awkward size of 555 bytes. Nevertheless I retrieve almost identical performance: 5.6 GB/s. The difference is so small that it would evaporate completely in normal usage, and even in this artificial scenario if I wasn't reading directly from the disk cache.
As a bonus I include at the bottom a method which can be used on Linux and Mac to purge the disk caches. You'll see a dramatic turn in performance if you decide to call it between the write and the read step.
public final class MeasureIOPerformance {
static final int SIZE_GB = Integer.getInteger("sizeGB", 8);
static final int BLOCK_SIZE = 64 * 1024;
static final int blocks = (int) (((long) SIZE_GB << 30) / BLOCK_SIZE);
static final byte[] acceptBuffer = new byte[555];
public static void main(String[] args) throws IOException {
for (int i = 0; i < 3; i++) {
measure(new ChannelRw());
measure(new StreamRw());
}
}
private static void measure(RW rw) throws IOException {
File file = File.createTempFile("delete", "me");
file.deleteOnExit();
System.out.println("Writing " + SIZE_GB + " GB " + " with " + rw);
long start = System.nanoTime();
rw.write(file);
long mid = System.nanoTime();
System.out.println("Reading " + SIZE_GB + " GB " + " with " + rw);
long checksum = rw.read(file);
long end = System.nanoTime();
long size = file.length();
System.out.printf("Write speed %.1f GB/s, read Speed %.1f GB/s%n",
(double) size/(mid-start), (double) size/(end-mid));
System.out.println(checksum);
file.delete();
}
interface RW {
void write(File f) throws IOException;
long read(File f) throws IOException;
}
static class ChannelRw implements RW {
final ByteBuffer directBuffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
@Override public String toString() {
return "Channel";
}
@Override public void write(File f) throws IOException {
FileChannel fc = new FileOutputStream(f).getChannel();
try {
for (int i = 0; i < blocks; i++) {
directBuffer.clear();
while (directBuffer.remaining() > 0) {
fc.write(directBuffer);
}
}
} finally {
fc.close();
}
}
@Override public long read(File f) throws IOException {
ByteBuffer buffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
FileChannel fc = new FileInputStream(f).getChannel();
long checksum = 0;
try {
for (int i = 0; i < blocks; i++) {
buffer.clear();
while (buffer.hasRemaining()) {
fc.read(buffer);
}
buffer.flip();
while (buffer.hasRemaining()) {
buffer.get(acceptBuffer, 0, Math.min(acceptBuffer.length, buffer.remaining()));
checksum += acceptBuffer[acceptBuffer[0]];
}
}
} finally {
fc.close();
}
return checksum;
}
}
static class StreamRw implements RW {
final byte[] buffer = new byte[BLOCK_SIZE];
@Override public String toString() {
return "Stream";
}
@Override public void write(File f) throws IOException {
FileOutputStream out = new FileOutputStream(f);
try {
for (int i = 0; i < blocks; i++) {
out.write(buffer);
}
} finally {
out.close();
}
}
@Override public long read(File f) throws IOException {
FileInputStream in = new FileInputStream(f);
long checksum = 0;
try {
for (int i = 0; i < blocks; i++) {
for (int remaining = acceptBuffer.length, read;
(read = in.read(buffer)) != -1 && (remaining -= read) > 0; )
{
in.read(acceptBuffer, acceptBuffer.length - remaining, remaining);
}
checksum += acceptBuffer[acceptBuffer[0]];
}
} finally {
in.close();
}
return checksum;
}
}
public static void purgeCache() throws IOException, InterruptedException {
if (System.getProperty("os.name").startsWith("Mac")) {
new ProcessBuilder("sudo", "purge")
// .inheritIO()
.start().waitFor();
} else {
new ProcessBuilder("sudo", "su", "-c", "echo 3 > /proc/sys/vm/drop_caches")
// .inheritIO()
.start().waitFor();
}
}
}
I'm trying to perform a once-through read of a large file (~4GB) using Java 5.0 x64 (on Windows XP).
Initially the file read rate is very fast, but gradually the throughput slows down substantially, and my machine seems very unresponsive as time goes on.
I've used ProcessExplorer to monitor the File I/O statistics, and it looks like the process initially reads 500MB/sec, but this rate gradually drops to around 20MB/sec.
Any ideas on the best way to maintain file I/O rates, especially when reading large files using Java?
Here's some test code that shows the "interval time" continuing to increase. Just pass Main a file that's at least 500MB.
import java.io.File;
import java.io.RandomAccessFile;
public class MultiFileReader {
public static void main(String[] args) throws Exception {
MultiFileReader mfr = new MultiFileReader();
mfr.go(new File(args[0]));
}
public void go(final File file) throws Exception {
RandomAccessFile raf = new RandomAccessFile(file, "r");
long fileLength = raf.length();
System.out.println("fileLen: " + fileLength);
raf.close();
long startTime = System.currentTimeMillis();
doChunk(0, file, 0, fileLength);
System.out.println((System.currentTimeMillis() - startTime) + " ms");
}
public void doChunk(int threadNum, File file, long start, long end) throws Exception {
System.out.println("Starting partition " + start + " to " + end);
RandomAccessFile raf = new RandomAccessFile(file, "r");
raf.seek(start);
long cur = start;
byte buf[] = new byte[1000];
int lastPercentPrinted = 0;
long intervalStartTime = System.currentTimeMillis();
while (true) {
int numRead = raf.read(buf);
if (numRead == -1) {
break;
}
cur += numRead;
if (cur >= end) {
break;
}
int percentDone = (int)(100.0 * (cur - start) / (end - start));
if (percentDone % 5 == 0) {
if (lastPercentPrinted != percentDone) {
lastPercentPrinted = percentDone;
System.out.println("Thread" + threadNum + " Percent done: " + percentDone + " Interval time: " + (System.currentTimeMillis() - intervalStartTime));
intervalStartTime = System.currentTimeMillis();
}
}
}
raf.close();
}
}
Thanks!
I very much doubt that you're really getting 500MB per second from your disk. Chances are the data is cached by the operating system - and that the 20MB per second is what happens when it really hits the disk.
This will quite possibly be visible in the disk section of the Vista Resource Manager - and a low-tech way to tell is to listen to the disk drive :)
Depending on your specific hardware and what else is going on, you might need to work reasonably hard to do much more than 20MB/sec.
I think perhaps you don't realise how completely off the scale 500 MB/sec is...
What are you hoping for, and have you checked that your specific drive is even theoretically capable of it?
The Java Garbage Collector could be a bottleneck here.
I would make the buffer larger and private to the class so it is reused instead of allocated by each call to doChunk().
public class MultiFileReader {
private byte buf[] = new byte[256*1024];
...
}
You could use JConsole to monitor your app, including memory usage. The 500 MB/sec sounds too good to be true.
Some more information about the implementation and VM arguments used would be helpful.
Check
static void read3() throws IOException {
// read from the file with buffering
// and with direct access to the buffer
MyTimer mt = new MyTimer();
FileInputStream fis =
new FileInputStream(TESTFILE);
cnt3 = 0;
final int BUFSIZE = 1024;
byte buf[] = new byte[BUFSIZE];
int len;
while ((len = fis.read(buf)) != -1) {
for (int i = 0; i < len; i++) {
if (buf[i] == 'A') {
cnt3++;
}
}
}
fis.close();
System.out.println("read3 time = "
+ mt.getElapsed());
}
from http://java.sun.com/developer/JDCTechTips/2002/tt0305.html
The best buffer size might depend on the operating system.
Yours is maybe too small.
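If it helps, here is a small sketch for sweeping buffer sizes (the path and size range are arbitrary, and the timings will be skewed by the OS page cache unless it is dropped between runs):
import java.io.FileInputStream;
import java.io.IOException;

public class BufferSizeSweep {
    public static void main(String[] args) throws IOException {
        String path = args[0];   // file to read, passed on the command line
        for (int size = 4 * 1024; size <= 1024 * 1024; size *= 4) {
            byte[] buf = new byte[size];
            long total = 0;
            long start = System.currentTimeMillis();
            try (FileInputStream in = new FileInputStream(path)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(size / 1024 + " KB buffer: " + total + " bytes in " + elapsed + " ms");
        }
    }
}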