I'm trying to perform a once-through read of a large file (~4GB) using Java 5.0 x64 (on Windows XP).
Initially the file read rate is very fast, but gradually the throughput slows down substantially, and my machine seems very unresponsive as time goes on.
I've used ProcessExplorer to monitor the File I/O statistics, and it looks like the process initially reads 500MB/sec, but this rate gradually drops to around 20MB/sec.
Any ideas on the best way to maintain file I/O rates, especially when reading large files using Java?
Here's some test code that shows the "interval time" continuing to increase. Just pass Main a file that's at least 500MB.
import java.io.File;
import java.io.RandomAccessFile;

public class MultiFileReader {

    public static void main(String[] args) throws Exception {
        MultiFileReader mfr = new MultiFileReader();
        mfr.go(new File(args[0]));
    }

    public void go(final File file) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        long fileLength = raf.length();
        System.out.println("fileLen: " + fileLength);
        raf.close();

        long startTime = System.currentTimeMillis();
        doChunk(0, file, 0, fileLength);
        System.out.println((System.currentTimeMillis() - startTime) + " ms");
    }

    public void doChunk(int threadNum, File file, long start, long end) throws Exception {
        System.out.println("Starting partition " + start + " to " + end);
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        raf.seek(start);

        long cur = start;
        byte buf[] = new byte[1000];
        int lastPercentPrinted = 0;
        long intervalStartTime = System.currentTimeMillis();
        while (true) {
            int numRead = raf.read(buf);
            if (numRead == -1) {
                break;
            }
            cur += numRead;
            if (cur >= end) {
                break;
            }
            int percentDone = (int) (100.0 * (cur - start) / (end - start));
            if (percentDone % 5 == 0) {
                if (lastPercentPrinted != percentDone) {
                    lastPercentPrinted = percentDone;
                    System.out.println("Thread" + threadNum + " Percent done: " + percentDone + " Interval time: " + (System.currentTimeMillis() - intervalStartTime));
                    intervalStartTime = System.currentTimeMillis();
                }
            }
        }
        raf.close();
    }
}
Thanks!
I very much doubt that you're really getting 500MB per second from your disk. Chances are the data is cached by the operating system - and that the 20MB per second is what happens when it really hits the disk.
This will quite possibly be visible in the disk section of the Vista Resource Manager - and a low-tech way to tell is to listen to the disk drive :)
Depending on your specific hardware and what else is going on, you might need to work reasonably hard to do much more than 20MB/sec.
I think perhaps you don't realise how completely off-the-scale 500MB/sec is...
What are you hoping for, and have you checked that your specific drive is even theoretically capable of it?
The Java Garbage Collector could be a bottleneck here.
I would make the buffer larger and private to the class so it is reused instead of being allocated on each call to doChunk().
public class MultiFileReader {
    private byte buf[] = new byte[256 * 1024];
    ...
}
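For illustration only (names taken from the question's code, untested): doChunk() would then drop its local byte buf[] = new byte[1000]; line and read into the shared field, so the buffer is allocated once per reader rather than on every call:

// in doChunk(): no local buffer allocation any more
int numRead;
while ((numRead = raf.read(buf)) != -1) {   // buf is the 256 KB instance field
    cur += numRead;
    if (cur >= end) {
        break;
    }
    // ... progress reporting unchanged ...
}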
You could use JConsole to monitor your app, including memory usage. The 500 MB/sec sounds too good to be true.
Some more information about the implementation and VM arguments used would be helpful.
Check this example:
static void read3() throws IOException {
    // read from the file with buffering
    // and with direct access to the buffer

    MyTimer mt = new MyTimer();
    FileInputStream fis =
        new FileInputStream(TESTFILE);
    cnt3 = 0;

    final int BUFSIZE = 1024;
    byte buf[] = new byte[BUFSIZE];
    int len;
    while ((len = fis.read(buf)) != -1) {
        for (int i = 0; i < len; i++) {
            if (buf[i] == 'A') {
                cnt3++;
            }
        }
    }

    fis.close();
    System.out.println("read3 time = "
        + mt.getElapsed());
}
from http://java.sun.com/developer/JDCTechTips/2002/tt0305.html
The best buffer size might depend on the operating system.
Yours is maybe too small.
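As a rough sketch of the same loop with a larger buffer (the 64 KB figure is an assumption of mine, not a recommendation from the linked tip; the best value depends on the OS and disk):

import java.io.FileInputStream;
import java.io.IOException;

public class BigBufferRead {
    public static void main(String[] args) throws IOException {
        final int BUFSIZE = 64 * 1024;          // assumed size; tune per OS/disk
        byte[] buf = new byte[BUFSIZE];
        FileInputStream fis = new FileInputStream(args[0]);
        long total = 0;
        int len;
        while ((len = fis.read(buf)) != -1) {
            total += len;                       // process buf[0..len) here
        }
        fis.close();
        System.out.println("Read " + total + " bytes");
    }
}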
I am currently making an mp3 player in NetBeans 12.1 and I can't find a way to control the current position of a song.
I have tried using .setMicrosecondPosition(), but it seems it only works with a Clip, not with the line.
Is it even possible for my player to change the current position of the track, or should I change my code?
This is the code of the player.
public void run() {
final File file = new File(filePath);
try (final AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
final AudioFormat outFormat = getOutFormat(in.getFormat());
final Info info = new Info(SourceDataLine.class, outFormat);
try (final SourceDataLine line
= (SourceDataLine) AudioSystem.getLine(info)) {
getLine(line);
line.getMicrosecondPosition();
if (line != null) {
line.open(outFormat);
line.start();
long millis;
AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(file);
Map<?, ?> properties = ((TAudioFileFormat) fileFormat).properties();
String key = "duration";
String title = "title";
Long microseconds = (Long) properties.get(key);
maksimumSekunde = (int)TimeUnit.MICROSECONDS.toSeconds(microseconds);
title1 = (String) properties.get(title);
int mili = (int) (microseconds / 1000);
sec = (mili / 1000) % 60;
min = (mili / 1000) / 60;
setVolumeDown(sliderGlasnoca.getValue());
//STREAM
int n = 0;
final byte[] buffer = new byte[4096];
AudioInputStream inp = getAudioInputStream(outFormat, in);
while (n != -1) {
if (pauza == true) {
break;
}
if (stop == true) {
synchronized (LOCK) {
LOCK.wait();
}
}
n = inp.read(buffer, 0, buffer.length);
if (n != -1) {
line.write(buffer, 0, n);
}
millis = TimeUnit.MICROSECONDS.toMillis(line.getMicrosecondPosition());
trajanjeSekunde = (int)TimeUnit.MICROSECONDS.toSeconds(line.getMicrosecondPosition());
minutes = (millis / 1000) / 60;
seconds = ((millis / 1000) % 60);
//System.out.println(minutes + ":" + seconds + " " + "time = " + min + ":" + sec + " " + title1);
}
//STREAM
line.drain();
line.stop();
Finished();
}
} catch (InterruptedException ex) {
}
} catch (UnsupportedAudioFileException
| LineUnavailableException
| IOException e) {
throw new IllegalStateException(e);
}
}
It's my first time posting here.
I always just counted and discarded frames from bytes being read via the AudioInputStream, but looking anew at the API, I see that one can use the AudioInputStream.skip(...) method to jump forward a given number of bytes. Calculating the number of bytes corresponding to a given amount of time requires knowing the frame size (e.g., 16-bit stereo is 4 bytes per frame) and the sample rate.
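For example (a rough, untested sketch; targetSeconds is a placeholder, in is the AudioInputStream, and the format values come from its AudioFormat):

AudioFormat fmt = in.getFormat();
long targetFrame = (long) (targetSeconds * fmt.getFrameRate());
long bytesToSkip = targetFrame * fmt.getFrameSize();
long skipped = 0;
while (skipped < bytesToSkip) {
    long n = in.skip(bytesToSkip - skipped);   // skip() may skip fewer bytes than requested
    if (n <= 0) {
        break;                                 // give up at end of stream
    }
    skipped += n;
}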
IDK if one can reliably skip backwards. This will depend on whether one can "mark" and "reset" the file being read by the AudioInputStream. If these capabilities are supported, it seems conceivable that one could mark(...) the start of the AudioInputStream. Then, to go backwards, first reset() back to the beginning and then jump forward via skip(...). I haven't tested this. A lot would depend on the number of bytes permitted in the mark(...) method.
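A sketch of that idea, assuming the underlying stream really does support mark/reset (untested; pos is a placeholder byte offset):

if (in.markSupported()) {
    in.mark(Integer.MAX_VALUE);   // mark the very start; the safe read limit is the open question
}
// ... later, to move backwards to absolute byte offset pos:
in.reset();                       // back to the mark (the beginning)
long remaining = pos;
while (remaining > 0) {
    long n = in.skip(remaining);
    if (n <= 0) {
        break;
    }
    remaining -= n;
}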
If starting or stopping in the middle of playing audio, the data that is fed to the SourceDataLine would potentially exhibit "clicks" due to the discontinuity in the signal. To deal with that it might be necessary to convert the starts and stops to PCM and ramp the volume up if starting, or down if stopping. The number of frames required would probably need to be determined by experimenting. I'm guessing 64 frames for 44100fps might be a good first try.
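A very rough sketch of the ramp-up idea for 16-bit little-endian PCM (the method name, frame size handling, and frame count are my assumptions, and this is untested):

// Scale the first rampFrames frames from silence up to full volume, in place.
static void rampUp(byte[] pcm, int frameSize, int rampFrames) {
    for (int f = 0; f < rampFrames && (f + 1) * frameSize <= pcm.length; f++) {
        double gain = (double) f / rampFrames;
        for (int b = f * frameSize; b < (f + 1) * frameSize; b += 2) {
            short s = (short) ((pcm[b] & 0xFF) | (pcm[b + 1] << 8));   // decode little-endian sample
            s = (short) Math.round(s * gain);
            pcm[b] = (byte) s;                                         // re-encode low byte
            pcm[b + 1] = (byte) (s >> 8);                              // re-encode high byte
        }
    }
}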
I use a FileChannel to write 4GB files to a spinning disk, and although I have tweaked the buffer size to maximise write speed and I flush the channel every second, closing the file channel can take 200 ms. This is enough time that the queue I read from overflows and starts dropping packets.
I use a direct byte buffer, but I am struggling to understand what is happening here. I have removable disks, and write caching has been disabled, so I would not expect the OS to be buffering the data.
The benchmark speed of the disks is around 80 MB/sec, but I am seeing the long file channel close times even when writing at speeds of ~40 MB/sec.
I appreciate that as the disks fill up, write performance will decrease, but these disks are empty.
Are there any tweaks I can make to remove the long delay when closing the file channel? Should I be allocating the file space up front, writing the file with a .lock extension, and then renaming it once the file has been completed?
Just hoping someone who has done high throughput IO can provide some pointers as to possible options above and beyond what is usually documented when writing files using NIO.
The code is below and I cannot see anything immediately wrong.
public final class DataWriter implements Closeable {
private static final Logger LOG = Logger.getLogger("DataWriter");
private static final long MB = 1024 * 1024;
private final int flushPeriod;
private FileOutputStream fos;
private FileChannel fileChannel;
private long totalBytesWritten;
private long lastFlushTime;
private final ByteBuffer buffer;
private final int bufferSize;
private final long startTime;
private long totalPackets = 0;
private final String fileName;
public DataWriter(File recordFile, int bSize, int flushPeriod) throws IOException {
this.flushPeriod = flushPeriod;
if (!recordFile.createNewFile()) {
throw new IllegalStateException("Record file has not been created");
}
totalBytesWritten = 0;
fos = new FileOutputStream(recordFile);
fileChannel = fos.getChannel();
buffer = ByteBuffer.allocateDirect(bSize);
bufferSize = bSize;
startTime = System.currentTimeMillis();
this.fileName = recordFile.getAbsolutePath();
}
/**
* Appends the supplied ByteBuffer to the main buffer if there is space
* @param packet
* @return
* @throws IOException
*/
public int write(ByteBuffer packet) throws IOException {
int bytesWritten = 0;
totalPackets++;
//If the buffer cannot accommodate the supplied buffer then write straight out
if(packet.limit() > buffer.capacity()) {
bytesWritten = writeBuffer(packet);
totalBytesWritten += bytesWritten;
} else {
//write the currently filled buffer if no space exists to accommodate the current buffer
if(packet.limit() > buffer.remaining()) {
buffer.flip();
bytesWritten = writeBuffer(buffer);
totalBytesWritten += bytesWritten;
}
buffer.put(packet);
}
if(System.currentTimeMillis()-lastFlushTime > flushPeriod) {
fileChannel.force(true);
lastFlushTime=System.currentTimeMillis();
}
return bytesWritten;
}
public long getTotalBytesWritten() {
return totalBytesWritten;
}
/**
* Writes the buffer and then clears it
* @throws IOException
*/
private int writeBuffer(ByteBuffer byteBuffer) throws IOException {
int bytesWritten = 0;
while(byteBuffer.hasRemaining()) {
bytesWritten += fileChannel.write(byteBuffer);
}
//Reset the buffer ready for writing
byteBuffer.clear();
return bytesWritten;
}
@Override
public void close() throws IOException {
//Write the buffer if data is present
if(buffer.position() != 0) {
buffer.flip();
totalBytesWritten += writeBuffer(buffer);
fileChannel.force(true);
}
long time = System.currentTimeMillis() - startTime;
if(LOG.isDebugEnabled()) {
LOG.debug( totalBytesWritten + " bytes written in " + (time / 1000d) + " seconds using ByteBuffer size ["+bufferSize/1024+"] KB");
LOG.debug( (totalBytesWritten / MB) / (time / 1000d) + " MB per second written to file " + fileName);
LOG.debug( "Total packets written ["+totalPackets+"] average packet size ["+totalBytesWritten / totalPackets+"] bytes");
}
if (fos != null) {
fos.close();
fos = null;
}
}
}
I need to limit the file size to 1 GB while writing, preferably using BufferedWriter.
Is it possible using BufferedWriter, or do I have to use other libraries?
like
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
//...
writer.write(lines.stream());
}
You can always write your own OutputStream to limit the number of bytes written.
The following assumes you want to throw an exception if the size is exceeded.
public final class LimitedOutputStream extends FilterOutputStream {

    private final long maxBytes;
    private long bytesWritten;

    public LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b) throws IOException {
        // write directly to out; FilterOutputStream's default would re-dispatch
        // to the overridden methods and count the bytes more than once
        ensureCapacity(b.length);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureCapacity(len);
        out.write(b, off, len);
    }

    private void ensureCapacity(int len) throws IOException {
        long newBytesWritten = this.bytesWritten + len;
        if (newBytesWritten > this.maxBytes)
            throw new IOException("File size exceeded: " + newBytesWritten + " > " + this.maxBytes);
        this.bytesWritten = newBytesWritten;
    }
}
You will of course now have to set up the Writer/OutputStream chain manually.
final long SIZE_1GB = 1073741824L;
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new LimitedOutputStream(Files.newOutputStream(path), SIZE_1GB),
StandardCharsets.UTF_8))) {
//
}
Getting the file to exactly 1 GB is very difficult when you are writing lines, since each line may contain an unknown number of bytes. I am assuming you want to write data line by line to the file.
However, you can check how many bytes a line has before writing it to the file; another approach is to check the file size after writing each line.
The following basic example writes the same line each time. Here the This is just a test ! text takes 21 bytes on disk in UTF-8 encoding, so after 49 writes it reaches 1029 bytes and stops writing.
public class Test {

    private static final int ONE_KB = 1024;

    public static void main(String[] args) {
        File file = new File("D:/test.txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() < ONE_KB) {
                writer.write("This is just a test !");
                writer.flush();
            }
            System.out.println("1 KB Data is written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
As you can see, we have already written past the 1 KB limit: the program writes 1029 bytes, not 1024 or fewer.
The second approach is to check the number of bytes, according to a specific encoding, before writing to the file.
public class Test {

    private static final int ONE_KB = 1024;

    public static void main(String[] args) throws UnsupportedEncodingException {
        File file = new File("D:/test.txt");
        String data = "This is just a test !";
        int dataLength = data.getBytes("UTF-8").length;
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() + dataLength < ONE_KB) {
                writer.write(data);
                writer.flush();
            }
            System.out.println("1 KB Data written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In this approach we check the length in bytes before writing to the file, so it writes 1008 bytes and then stops.
Problems with both approaches:
Write and check: you may end up with some extra bytes and the file size may cross the limit.
Check and write: you may end up with fewer bytes than the limit if the next line contains a lot of data. You should also be careful about the encoding.
However, there are other ways to do this validation with third-party libraries like Apache Commons IO, and I find them more cumbersome than the conventional Java ways.
int maxSize = 1_000_000_000;
Charset charset = StandardCharsets.UTF_8;
long size = 0;
int lineCount = 0;
while (lineCount < lines.size()) {
    long size2 = size + (lines.get(lineCount) + "\r\n").getBytes(charset).length;
    if (size2 > maxSize) {
        break;
    }
    size = size2;
    ++lineCount;
}
List<String> linesToWrite = lines.subList(0, lineCount);
Path path = Paths.get("D:/test.txt");
Files.write(path, linesToWrite, charset);
Or, a bit faster, while encoding each line only once:
int lineCount = 0;
try (FileChannel channel = new RandomAccessFile("D:/test.txt", "rw").getChannel()) {
    // note: mapping READ_WRITE beyond the current length extends the file to maxSize
    MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, maxSize);
    lineCount = lines.size();
    for (int i = 0; i < lines.size(); i++) {
        byte[] line = (lines.get(i) + "\r\n").getBytes(charset);
        if (line.length > buffer.remaining()) {
            lineCount = i;
            break;
        }
        buffer.put(line);
    }
}
IIUC, there are various ways to do it.
Keep writing data in chunks and flushing it, and keep checking the file size after every flush (see the sketch after this list).
Use log4j (or some logging framework), which can roll over to a new file after a certain size, time, or some other trigger point.
While BufferedWriter is great, there are some newer APIs in Java which could make it faster: Fastest way to write huge data in text file Java
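A rough sketch of the first option (the lines collection, the path, and the 1 GB constant are placeholders of mine; untested):

long maxBytes = 1L << 30;                        // assumed 1 GB limit
Path path = Paths.get("out.txt");                // placeholder path
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
    for (String line : lines) {                  // 'lines' stands in for your data source
        writer.write(line);
        writer.newLine();
        writer.flush();                          // flush so Files.size(path) reflects what was written
        if (Files.size(path) >= maxBytes) {
            break;                               // stop once the limit is reached
        }
    }
}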
I have an InputStream, plus the corresponding file name and size.
I need to access/read some random (increasing) positions in the InputStream. These positions are stored in an integer array (named offsets).
InputStream inputStream = ...
String fileName = ...
int fileSize = (int) ...
int[] offsets = new int[]{...}; // the random (increasing) offsets array
Now, given an InputStream, I've found only two possible solutions to jump to random (increasing) positions of the file.
The first one is to use the skip() method of the InputStream (note that I actually use BufferedInputStream, since I will need to mark() and reset() the file pointer).
//Open a BufferInputStream:
BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
byte[] bytes = new byte[1];
int curFilePointer = 0;
long numBytesSkipped = 0;
long numBytesToSkip = 0;
int numBytesRead = 0;
//Check the file size:
if ( fileSize < offsets[offsets.length-1] ) { // the last (biggest) offset is beyond the file size...
//Debug:
Log.d(TAG, "The file is too small!\n");
return;
}
for (int i=0, k=0; i < offsets.length; i++, k=0) { // for each offset I have to jump...
try {
//Jump to the offset [i]:
while( (curFilePointer < offsets[i]) && (k < 10) ) { // until the correct offset is reached (at most 10 tries)
numBytesToSkip = offsets[i] - curFilePointer;
numBytesSkipped = bufferedInputStream.skip(numBytesToSkip);
curFilePointer += numBytesSkipped; // move the file pointer forward
//Debug:
Log.d(TAG, "FP: " + curFilePointer + "\n");
k++;
}
if ( curFilePointer != offsets[i] ) { // it did NOT jump properly... (what's going on?!)
//Debug:
Log.d(TAG, "InputStream.skip() DID NOT JUMP PROPERLY!!!\n");
break;
}
//Read the content of the file at the offset [i]:
numBytesRead = bufferedInputStream.read(bytes, 0, bytes.length);
curFilePointer += numBytesRead; // move the file pointer forward
//Debug:
Log.d(TAG, "READ [" + curFilePointer + "]: " + bytes[0] + "\n");
}
catch ( IOException e ) {
e.printStackTrace();
break;
}
catch ( IndexOutOfBoundsException e ) {
e.printStackTrace();
break;
}
}
//Close the BufferInputStream:
bufferedInputStream.close();
The problem is that, during my tests, for some (usually big) offsets, it cycled 5 or more times before skipping the correct number of bytes. Is that normal? And, above all, can/should I trust skip()? (That is: are 10 cycles enough to be SURE it will ALWAYS arrive at the correct offset?)
The only alternative way I've found is to create a RandomAccessFile from the InputStream, through File.createTempFile(prefix, suffix, directory) and the following function.
public static RandomAccessFile toRandomAccessFile(InputStream inputStream, File tempFile, int fileSize) throws IOException {
RandomAccessFile randomAccessFile = new RandomAccessFile(tempFile, "rw");
byte[] buffer = new byte[fileSize];
int numBytesRead = 0;
while ( (numBytesRead = inputStream.read(buffer)) != -1 ) {
randomAccessFile.write(buffer, 0, numBytesRead);
}
randomAccessFile.seek(0);
return randomAccessFile;
}
Having a RandomAccessFile is actually a much better solution, but the performance is much worse (above all because I will have more than a single file).
EDIT: Using byte[] buffer = new byte[fileSize] speeds up the RandomAccessFile creation a lot!
//Create a temporary RandomAccessFile:
File tempFile = File.createTempFile(fileName, null, context.getCacheDir());
RandomAccessFile randomAccessFile = toRandomAccessFile(inputStream, tempFile, fileSize);
byte[] bytes = new byte[1];
int numBytesRead = 0;
//Check the file size:
if ( fileSize < offsets[offsets.length-1] ) { // the last (biggest) offset is beyond the file size...
//Debug:
Log.d(TAG, "The file is too small!\n");
return;
}
for (int i=0, k=0; i < offsets.length; i++, k=0) { // for each offset I have to jump...
try {
//Jump to the offset [i]:
randomAccessFile.seek(offsets[i]);
//Read the content of the file at the offset [i]:
numBytesRead = randomAccessFile.read(bytes, 0, bytes.length);
//Debug:
Log.d(TAG, "READ [" + (randomAccessFile.getFilePointer()-4) + "]: " + bytes[0] + "\n");
}
catch ( IOException e ) {
e.printStackTrace();
break;
}
catch ( IndexOutOfBoundsException e ) {
e.printStackTrace();
break;
}
}
//Delete the temporary RandomAccessFile:
randomAccessFile.close();
tempFile.delete();
Now, is there a better (or more elegant) solution for "random" access to an InputStream?
It's a bit unfortunate that you have an InputStream to begin with, but in this situation buffering the stream in a file is of no use if and only if you are always skipping forward. You don't have to count the number of times you have called skip, though; that's not really of interest.
What you do have to check is whether the stream has already ended, to prevent an infinite loop. Checking the source of the default skip() implementation, I'd say you'll have to keep calling skip() until it returns 0. That will indicate the end of the stream has been reached. The JavaDoc is a bit unclear about this for my taste.
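A minimal sketch of that loop (the helper name is mine, not from the question):

static long skipFully(InputStream in, long toSkip) throws IOException {
    long remaining = toSkip;
    while (remaining > 0) {
        long skipped = in.skip(remaining);
        if (skipped == 0) {
            break;                    // treat 0 as end of stream, per the reasoning above
        }
        remaining -= skipped;
    }
    return toSkip - remaining;        // bytes actually skipped
}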
You can't. An InputStream is a stream, that is to say a sequential construct. Your question embodies a contradiction in terms.
I'm trying to do a few performance enhancements and am looking to use memory mapped files for writing data. I did a few tests and surprisingly, MappedByteBuffer seems slower than allocating direct buffers. I'm not able to clearly understand why this would be the case. Can someone please hint at what could be going on behind the scenes? Below are my test results:
I'm allocating 32KB buffers. I had already created the files (3 GB each) before starting the tests, so growing the file isn't the issue.
I'm adding the code that I used for this performance test. Any input / explanation about this behavior is much appreciated.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
public class MemoryMapFileTest {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
for (int i = 0; i < 10; i++) {
runTest();
}
}
private static void runTest() throws IOException {
// TODO Auto-generated method stub
FileChannel ch1 = null;
FileChannel ch2 = null;
ch1 = new RandomAccessFile(new File("S:\\MMapTest1.txt"), "rw").getChannel();
ch2 = new RandomAccessFile(new File("S:\\MMapTest2.txt"), "rw").getChannel();
FileWriter fstream = new FileWriter("S:\\output.csv", true);
BufferedWriter out = new BufferedWriter(fstream);
int[] numberofwrites = {1,10,100,1000,10000,100000};
//int n = 10000;
try {
for (int j = 0; j < numberofwrites.length; j++) {
int n = numberofwrites[j];
long estimatedTime = 0;
long mappedEstimatedTime = 0;
for (int i = 0; i < n ; i++) {
byte b = (byte)Math.random();
long allocSize = 1024 * 32;
estimatedTime += directAllocationWrite(allocSize, b, ch1);
mappedEstimatedTime += mappedAllocationWrite(allocSize, b, i, ch2);
}
double avgDirectEstTime = (double)estimatedTime/n;
double avgMapEstTime = (double)mappedEstimatedTime/n;
out.write(n + "," + avgDirectEstTime/1000000 + "," + avgMapEstTime/1000000);
out.write("," + ((double)estimatedTime/1000000) + "," + ((double)mappedEstimatedTime/1000000));
out.write("\n");
System.out.println("Avg Direct alloc and write: " + estimatedTime);
System.out.println("Avg Mapped alloc and write: " + mappedEstimatedTime);
}
} finally {
out.write("\n\n");
if (out != null) {
out.flush();
out.close();
}
if (ch1 != null) {
ch1.close();
} else {
System.out.println("ch1 is null");
}
if (ch2 != null) {
ch2.close();
} else {
System.out.println("ch2 is null");
}
}
}
private static long directAllocationWrite(long allocSize, byte b, FileChannel ch1) throws IOException {
long directStartTime = System.nanoTime();
ByteBuffer byteBuf = ByteBuffer.allocateDirect((int)allocSize);
byteBuf.put(b);
ch1.write(byteBuf);
return System.nanoTime() - directStartTime;
}
private static long mappedAllocationWrite(long allocSize, byte b, int iteration, FileChannel ch2) throws IOException {
long mappedStartTime = System.nanoTime();
MappedByteBuffer mapBuf = ch2.map(MapMode.READ_WRITE, iteration * allocSize, allocSize);
mapBuf.put(b);
return System.nanoTime() - mappedStartTime;
}
}
You're testing the wrong thing. This is not how to write the code in either case. You should allocate the buffer once, and just keep updating its contents. You're including allocation time in the write time. Not valid.
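For illustration, a sketch of what that would look like for the direct-buffer case (reusing ch1 and n from the test above; untested):

ByteBuffer byteBuf = ByteBuffer.allocateDirect(32 * 1024);   // allocate once, outside the loop
long writeTime = 0;
for (int i = 0; i < n; i++) {
    byteBuf.clear();
    while (byteBuf.hasRemaining()) {
        byteBuf.put((byte) i);                               // refill with whatever payload the test needs
    }
    byteBuf.flip();
    long start = System.nanoTime();
    ch1.write(byteBuf);                                      // only the write is timed
    writeTime += System.nanoTime() - start;
}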
Swapping data to disk is the main reason for MappedByteBuffer being slower than DirectByteBuffer.
The cost of allocation and deallocation is high with direct buffers, including MappedByteBuffer, and this cost is incurred in both examples, so the only difference is writing to disk, which happens with the MappedByteBuffer but not with the direct ByteBuffer.