I am currently looking to write to multiple files simultaneously. The files will hold about 17 million lines of integers.
Currently, I am opening 5 files that can be written to (some will remain empty), and then I perform shifting calculations to get a multiplier for each integer and to decide which file to write to.
My code looks like:
//Make files directory
File tempDir = new File("temp/test/txtFiles");
tempDir.mkdirs();
List<File> files = new ArrayList<>(); //Will hold the Files
List<FileWriter> writers = new ArrayList<>(); //Will hold fileWriter objects for all of the files
File currTxtFile; //Used to create the files
//Create the files
//Unused files will be blank
for(int f = 0; f < 5; f++)
{
currTxtFile = new File(tempDir, "file" + f + ".txt");
currTxtFile.createNewFile();
files.add(currTxtFile);
FileWriter fw = new FileWriter(currTxtFile);
writers.add(fw);
}
int[] multipliers = new int[5]; //will be used to calculate what to write to file
int[] fileNums = new int[5]; //will be used to know which file to write to
int start = 0;
/**
An example of fileNums output would be {0,4,0,1,4}
(i.e. write to file 0, then 4, then 0, then 1, then 4)
An example of multipliers output would be {100,10,5,1,2000}
(i.e. the value uses 100 for file 0, then 10 for file 4, then 5 for file 0, then 1 for file 1, then 2000 for file 4)
*/
for(long c = 0; c < 16980000; c++)
{
//Gets values for the multipliers and fileNums
int numOfMultipliers = getMultiplier(start,multipliers,fileNums);
for(int j = 0; j < numOfMultipliers; j++) // NumOfMultipliers can range from 0-4
{
int val = 30000000 * multipliers[j] + 20000000;
writers.get(fileNums[j]).append(val + "\n");
}
start++;
}
for(FileWriter f: writers)
{
f.close();
}
The code currently takes quite a while to write the files (over 5 hours). It was translated from C++, where the same output took about 10 minutes.
How could I improve upon the code to get the output to write quicker?
Likely flushing issues. In general, writing to multiple files is slower than writing to a single file, not faster. Think about it: a spinning disk doesn't have 5 separate write heads inside it. There's just the one, so writing to a spinning disk is fundamentally 'single threaded'. Trying to write to multiple files simultaneously is in fact orders of magnitude slower, as the write head has to bounce around.
With modern SSDs it doesn't matter nearly as much, but there's still a bottleneck somewhere, and that bottleneck either is the disk or it isn't. There's nothing inherent about SSD design (it doesn't, for example, have multiple pipelines or a bunch of CPUs to deal with incoming writes) that would make writing to multiple files simultaneously any faster.
If the files exist each on a different volume, that's a different story, but from your code that's clearly not the case.
Thus, let's first get rid of this whole 'multiple files' thing. That either doesn't do anything, or makes things (significantly) slower.
So why is it slow in java?
Because of block processing. You need to know how disks work, first.
SSDs
The memory in an SSD can't actually be written to. Instead, entire blocks can be wiped clean and only then can they be written to. That's the only way an SSD can store data: Obliterate an entire block, then write data to it.
If a single block is 64k and your code writes one integer at a time, at roughly 10 bytes a pop, your SSD will be obliterating a block, writing one integer plus a newline to it along with a lot of pointless filler (it writes in blocks; it can't write any smaller, that's just how it works), and then it'll do the exact same thing about 6400 more times.
Instead, you'd want the SSD to wipe that block once and write 6400 integers into it in one go. The reason it doesn't work that way out of the box is that people trip over power cables. A bank is not going to stand for it: if you pull some bills out of an ATM, then a crash happens, and the last couple of transactions were only sitting in memory waiting for a full block's worth of data before being written, oh dear. So if you WANT to flush that stuff to disk, the system will dutifully do it.
Spinning disks
The write head needs to move to the right position and wait for the right sector to spin round before it can write. Even though CPUs are really fast, the disk keeps spinning; it can't stop on a dime. So in the short time it takes the Java code to supply the next integer, the disk has already spun past the write point and has to wait one full rotation, again. It's much better to send a much larger chunk of data to the disk controller so it can write it all in 'one spin', so to speak.
So how do I do that?
Simple. Use a BufferedWriter. It does exactly what you want: it buffers data for quite a while and only actually writes when it's convenient, when you explicitly ask it to (call .flush() on it), or when you close the writer. The downside is that if someone trips over a power cable, your buffered data is gone, but presumably you don't mind: half of such a file is a problem no matter how much of it is there. Incomplete = useless.
Can it be faster?
Certainly. You're storing e.g. the number '123456789' in at least 10 bytes, and the CPU has to convert it into the character sequence [0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x0A]. It's much more efficient to store the bytes exactly as they sit in memory: that only takes 4 bytes per int, and needs no conversion, or at least a much simpler one. The downside is that you won't be able to make any sense of the file unless you use a hex editor.
Example code - write integers in text form
Let's not use obsolete APIs.
Let's properly close resources.
Let's ditch this pointless 'multiple files' thing.
Path tgt = Paths.get("temp/test/txtFiles/out.txt");
try (var out = Files.newBufferedWriter(tgt)) {
for (long c = 0; c < 16980000; c++) {
//Gets values for the multipliers and fileNums
int numOfMultipliers = getMultiplier(start, multipliers, fileNums);
for(int j = 0; j < numOfMultipliers; j++) { // NumOfMultipliers can range from 0-4
int val = 30000000 * multipliers[j] + 20000000;
out.write(val + "\n");
}
start++;
}
}
Example code - write ints directly
Path tgt = Paths.get("temp/test/txtFiles/out.txt");
try (var out = new DataOutputStream(
new BufferedOutputStream(
Files.newOutputStream(tgt)))) {
for (long c = 0; c < 16980000; c++) {
//Gets values for the multipliers and fileNums
int numOfMultipliers = getMultiplier(start, multipliers, fileNums);
for(int j = 0; j < numOfMultipliers; j++) { // NumOfMultipliers can range from 0-4
int val = 30000000 * multipliers[j] + 20000000;
out.writeInt(val);
}
start++;
}
}
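One thing to keep in mind with the binary variant: the file can only be read back the same way it was written. Here's a minimal sketch of reading it back, assuming the same out.txt path as above (readInt uses the same big-endian order that writeInt used):
Path tgt = Paths.get("temp/test/txtFiles/out.txt");
try (var in = new DataInputStream(
        new BufferedInputStream(
            Files.newInputStream(tgt)))) {
    while (true) {
        int val;
        try {
            val = in.readInt();  // 4 bytes per value, in write order
        } catch (EOFException e) {
            break;               // end of file reached
        }
        // use val here
    }
}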
Related
I am dealing with input streams of unknown size that I need to serialize to a byte[] for fail-safe behavior.
I have this code right now, based on IOUtils, but with 5-50 different threads possibly running it, I don't know how reliable it is.
try(final ByteArrayOutputStream output= new ByteArrayOutputStream()){
long free_memory = Runtime.getRuntime().freeMemory() / 5;
final byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
long count = 0;
int n = 0;
while (-1 != (n = input.read(buffer))) {
output.write(buffer, 0, n);
count += n;
free_memory -= n;
if (free_memory < DEFAULT_BUFFER_SIZE) {
free_memory = Runtime.getRuntime().freeMemory();
if (free_memory < (DEFAULT_BUFFER_SIZE * 10)) {
throw new IOException("JVM is low on Memory.");
}
free_memory = free_memory / 5;
}
}
output.flush();
return output.toByteArray();
}
I want to catch an OOM error before it is a problem and kills the thread, and I don't want to save the stream as a file. Is there a better way of making sure you don't use too much memory?
(I'm using Java 8)
To answer your question: given that multiple threads are running the same code, this is a very unreliable approach.
The code asks the system how much memory is available with Runtime.getRuntime().freeMemory(), a value that is already stale the instant it is returned, because other threads will have consumed more memory in the meantime. The IOException that is supposed to be thrown when the remaining memory drops below some rather arbitrary threshold may or may not actually be thrown, and either way it doesn't reliably protect you.
The data is captured in a ByteArrayOutputStream, which grows (and copies) its internal buffer each time it fills up. That growth is not governed by the 'how much memory is left' check at all, so again, multiple threads can be resizing their buffers at the same time, and any of those resizes can fail.
The most fail-safe approach is to store the data on disk, i.e. make a copy there. If the data comes from an outside streaming source you can use Files.copy(). If what you get is a file, you can use the other variant of copy, which I think delegates the work to the OS.
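For example, a minimal sketch of spooling the incoming stream to a temporary file instead of a byte[] (the prefix/suffix passed to createTempFile are just placeholders):
// input is the InputStream from the surrounding code
Path tmp = Files.createTempFile("spool-", ".bin");
long count = Files.copy(input, tmp, StandardCopyOption.REPLACE_EXISTING);
// ... later, read it back with Files.newInputStream(tmp) ...
Files.deleteIfExists(tmp);  // clean up when done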
How can I make this piece of code extremely quick?
It reads a raw image using RandomAccessFile (in) and writes it to a file using DataOutputStream (out).
final int WORD_SIZE = 4;
byte[] singleValue = new byte[WORD_SIZE];
long position = 0;
for (int i=1; i<=100000; i++)
{
out.writeBytes(i + " ");
for(int j=1; j<=17; j++)
{
in.seek(position);
in.read(singleValue);
String str = Integer.toString(ByteBuffer.wrap(singleValue).order(ByteOrder.LITTLE_ENDIAN).getInt());
out.writeBytes(str + " ");
position+=WORD_SIZE;
}
out.writeBytes("\n");
}
The inner for creates a new line in the file every 17 elements
Thanks
I assume that the reason you are asking is because this code is running really slowly. If that is the case, then one reason is that each seek and read call is doing a system call. A RandomAccessFile has no buffering. (I'm guessing that singleValue is a byte[] of length 1.)
So the way to make this go faster is to step back and think about what it is actually doing. If I understand it correctly, it is reading each 4th byte in the file, converting them to decimal numbers and outputting them as text, 17 to a line. You could easily do that using a BufferedInputStream like this:
int b = bis.read(); // read a byte
bis.skip(3); // skip 3 bytes.
(with a bit of error checking ....). If you use a BufferedInputStream like this, most of the read and skip calls will operate on data that has already been buffered, and the number of syscalls will reduce to 1 for every N bytes, where N is the buffer size.
UPDATE - my guess was wrong. You are actually reading alternate words, so ...
bis.read(singleValue);
bis.skip(4);
Every 100000 offsets I have to jump 200000 and then do it again till the end of the file.
Use bis.skip(800000) to do that. It should do a big skip by moving the file position without actually reading any data. One syscall at most. (For a FileInputStream, at least.)
You can also speed up the output side by a roughly equivalent amount by wrapping the DataOutputStream around a BufferedOutputStream.
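Putting the two halves together, here is a rough sketch of the whole loop with buffering on both sides. The file names are placeholders, it assumes the read-one-word, skip-one-word pattern from the update, and a real version should also check the return values of read and skip:
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream("image.raw"));
     DataOutputStream out = new DataOutputStream(
             new BufferedOutputStream(new FileOutputStream("out.txt")))) {
    byte[] singleValue = new byte[4];
    for (int i = 1; i <= 100000; i++) {
        out.writeBytes(i + " ");
        for (int j = 1; j <= 17; j++) {
            bis.read(singleValue);  // read one 4-byte word from the buffer
            bis.skip(4);            // skip the next word
            int v = ByteBuffer.wrap(singleValue).order(ByteOrder.LITTLE_ENDIAN).getInt();
            out.writeBytes(v + " ");
        }
        out.writeBytes("\n");
    }
}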
I'm writing a program in Java
In this program I'm reading and changing an array of data. This is an example of the code:
public double computation() {
char c = 0;
char target = 'a';
int x = 0, y = 1;
for (int i = 0; i < data.length; i++) {
// Read Data
c = data[index[i]];
if (c == target)
x++;
else
y++;
//Change Value
if (Character.isUpperCase(c))
c = Character.toLowerCase(c);
else
c = Character.toUpperCase(c);
//Write Data
data[index[i]] = c;
}
return (double) x / (double) y;
}
BTW, the index array contains the data array's indexes in random order to prevent prefetching. I'm forcing all of my cache accesses to miss by using the random indexes from the index array.
Now I want to check what is the behavior of the CPU cache by collecting information about its hit ratio.
Is there any developed tool for this purpose? If not is there any technique?
On Linux it is possible to collect such information via OProfile. Each CPU has performance event counters; see here for the list of the AMD family 15h events: http://oprofile.sourceforge.net/docs/amd-family15h-events.php
OProfile regularly samples the event counter(s) together with the program counter. After a program run you can analyze how many events happened and at (statistically) which program positions.
OProfile has built-in Java support. It interacts with the Java JIT and creates a synthetic symbol table to look up the Java method name for a piece of generated JIT code.
The initial setup is not quite easy. If interested, I can guide you through or write a little more about it.
I don't think you can reach such low-level information from Java, but someone might know better. You could write the same program with no cache misses and check the difference; this is what I suggested in this other post, for example.
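For example, a rough sketch of that comparison: time computation() once with a sequential index array and once with a shuffled one; the gap is largely due to cache misses. This only shows the effect indirectly, it doesn't give an actual hit ratio, and you'd want to warm up the JIT before timing:
int[] seq = new int[data.length];
for (int i = 0; i < seq.length; i++) seq[i] = i;

int[] shuffled = seq.clone();
java.util.Random r = new java.util.Random(42);
for (int i = shuffled.length - 1; i > 0; i--) {   // Fisher-Yates shuffle
    int j = r.nextInt(i + 1);
    int t = shuffled[i]; shuffled[i] = shuffled[j]; shuffled[j] = t;
}

index = seq;                                      // sequential: cache friendly
long t0 = System.nanoTime();
computation();
long seqNanos = System.nanoTime() - t0;

index = shuffled;                                 // random: cache hostile
long t1 = System.nanoTime();
computation();
long rndNanos = System.nanoTime() - t1;

System.out.println("sequential: " + seqNanos + " ns, shuffled: " + rndNanos + " ns");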
I'm reading in a NetCDF file and I want to read in each array as a float array and then write the float array to a new file. I can make it work if I read in the float array and then iterate over each element in the array (using a DataOutputStream), but this is very, very slow, my NetCDF files are over 1GB.
I tried using an ObjectOutputStream, but this writes extra bytes of information.
So, to recap.
1. Open NetCDF file
2. Read float array x from NetCDF file
3. Write float array x to raw data file in a single step
4. Repeat step 2 with x+1
OK, you have 1 GB to read and 1 GB to write. Depending on your hard drive, you might get about 100 MB/s read and 60 MB/s write speed, which works out to roughly 10 s to read plus 17 s to write, i.e. about 27 seconds in total.
What is the speed of your drive and how much slower than this are you seeing?
If you want to test the speed of your disk without any processing, time how long it takes to copy a file which you haven't accessed recently (i.e. one that is not in the disk cache). This will give you an idea of the minimum delay you can expect to read and then write most of the data in the file (i.e. with no processing or Java involved).
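A quick way to time that from Java itself (both paths here are placeholders):
Path src = Paths.get("some_big_file_not_recently_read.nc");
Path dst = Paths.get("copy_test.tmp");
long t0 = System.nanoTime();
Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
double seconds = (System.nanoTime() - t0) / 1e9;
System.out.printf("copied %d bytes in %.1f s (%.1f MB/s)%n",
        Files.size(src), seconds, Files.size(src) / seconds / 1e6);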
For the benefit of anyone who would like to know how to do a loop-less copy of data, i.e. one that doesn't just call a method which loops for you:
FloatBuffer src = // readable memory mapped file.
FloatBuffer dest = // writeable memory mapped file.
src.position(start);
src.limit(end);
dest.put(src);
If you have mixed types of data you can use ByteBuffer which notionally copies a byte at a time but in reality could use long or wider type to copy 8 or more bytes at a time. i.e. whatever the CPU can do.
For small blocks this will use a loop, but for large blocks it can use page-mapping tricks in the OS. In any case, how it does the copy is not defined in Java, but it's likely to be the fastest way to copy data.
Most of these tricks only make a difference if you are copying a file already in memory to a cached file. As soon as you read a file from disk, or the file is too large to cache, the IO bandwidth of your physical disk is the only thing which really matters.
This is because a CPU can copy data at 6 GB/s to main memory but only 60-100 MB/s to a hard drive. If the copy in the CPU/memory is 2x, 10x or 50x slower than it could be, it will still be waiting for the disk. Note: with no buffering this is entirely possible and worse, but provided you have any simple buffering the CPU will be faster than the disk.
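For completeness, here is a rough sketch of how those two mapped buffers could be obtained. The file names are placeholders, and note that a single mapping cannot exceed 2 GB, so a larger file has to be mapped in pieces:
try (FileChannel in = FileChannel.open(Paths.get("in.dat"), StandardOpenOption.READ);
     FileChannel out = FileChannel.open(Paths.get("out.dat"),
             StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    long bytes = in.size();                                    // must stay under 2 GB per mapping
    FloatBuffer src = in.map(FileChannel.MapMode.READ_ONLY, 0, bytes).asFloatBuffer();
    FloatBuffer dest = out.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asFloatBuffer();
    dest.put(src);                                             // bulk copy, no per-element loop in your code
}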
I ran into the same problem and will dump my solution here just for future reference.
It is very slow to iterate over an array of floats and calling DataOutputStream.writeFloat for each of them. Instead, transform the floats yourself into a byte array and write that array all at once:
Slow:
DataOutputStream out = ...;
for (int i=0; i<floatarray.length; ++i)
out.writeFloat(floatarray[i]);
Much faster
DataOutputStream out = ...;
byte buf[] = new byte[4*floatarray.length];
for (int i=0; i<floatarray.length; ++i)
{
int val = Float.floatToRawIntBits(floatarray[i]);
buf[4 * i] = (byte) (val >> 24);
buf[4 * i + 1] = (byte) (val >> 16) ;
buf[4 * i + 2] = (byte) (val >> 8);
buf[4 * i + 3] = (byte) (val);
}
out.write(buf);
If your array is very large (>100k elements), break it up into chunks to avoid exhausting the heap with the buffer array.
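A sketch of that chunked variant (the chunk size is arbitrary; it uses the same bit-packing as above):
DataOutputStream out = ...;
final int CHUNK = 65536;                           // floats per chunk (arbitrary)
byte[] buf = new byte[4 * CHUNK];
for (int base = 0; base < floatarray.length; base += CHUNK) {
    int n = Math.min(CHUNK, floatarray.length - base);
    for (int i = 0; i < n; i++) {
        int val = Float.floatToRawIntBits(floatarray[base + i]);
        buf[4 * i] = (byte) (val >> 24);
        buf[4 * i + 1] = (byte) (val >> 16);
        buf[4 * i + 2] = (byte) (val >> 8);
        buf[4 * i + 3] = (byte) val;
    }
    out.write(buf, 0, 4 * n);                      // write only the filled part of the buffer
}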
1) when writing, use BufferedOutputStream, you will get a factor of 100 speedup.
2) when reading, read at least 10K per read, probably 100K is better.
3) post your code.
If you are using the Unidata NetCDF library, your problem may not be the writing, but rather the NetCDF library's caching mechanism.
NetcdfFile file = NetcdfFile.open(filename);
Variable variable = file.findVariable(variableName);
for (...) {
    // read data
    variable.invalidateCache();
}
Lateral solution:
If this is a one-off generation (or if you are willing to automate it in an Ant script) and you have access to some kind of Unix environment, you can use NCDUMP instead of doing it in Java. Something like:
ncdump -v your_variable your_file.nc | [awk] > float_array.txt
You can control the precision of the floats with the -p option if you desire. I just ran it on a 3GB NetCDF file and it worked fine. As much as I love Java, this is probably the quickest way to do what you want.
I have a large (3Gb) binary file of doubles which I access (more or less) randomly during an iterative algorithm I have written for clustering data. Each iteration does about half a million reads from the file and about 100k writes of new values.
I create the FileChannel like this...
f = new File(_filename);
_ioFile = new RandomAccessFile(f, "rw");
_ioFile.setLength(_extent * BLOCK_SIZE);
_ioChannel = _ioFile.getChannel();
I then use a private ByteBuffer the size of a double to read from it
private ByteBuffer _double_bb = ByteBuffer.allocate(8);
and my reading code looks like this
public double GetValue(long lRow, long lCol)
{
long idx = TriangularMatrix.CalcIndex(lRow, lCol);
long position = idx * BLOCK_SIZE;
double d = 0;
try
{
_double_bb.position(0);
_ioChannel.read(_double_bb, position);
d = _double_bb.getDouble(0);
}
...snip...
return d;
}
and I write to it like this...
public void SetValue(long lRow, long lCol, double d)
{
long idx = TriangularMatrix.CalcIndex(lRow, lCol);
long offset = idx * BLOCK_SIZE;
try
{
_double_bb.putDouble(0, d);
_double_bb.position(0);
_ioChannel.write(_double_bb, offset);
}
...snip...
}
The time taken for an iteration of my code increases roughly linearly with the number of reads. I have added a number of optimisations to the surrounding code to minimise the number of reads, but I am down to the core set that I feel is necessary without fundamentally altering how the algorithm works, which I want to avoid at the moment.
So my question is whether there is anything in the read/write code or JVM configuration I can do to speed up the reads? I realise I can change hardware, but before I do that I want to make sure that I have squeezed every last drop of software juice out of the problem.
Thanks in advance
As long as your file is stored on a regular harddisk, you will get the biggest possible speedup by organizing your data in a way that gives your accesses locality, i.e. causes as many get/set calls in a row as possible to access the same small area of the file.
This is more important than anything else you can do because accessing random spots on a HD is by far the slowest thing a modern PC does - it takes about 10,000 times longer than anything else.
So if it's possible to work on only a part of the dataset (small enough to fit comfortably into the in-memory HD cache) at a time and then combine the results, do that.
Alternatively, avoid the issue by storing your file on an SSD or (better) in RAM. Even storing it on a simple thumb drive could be a big improvement.
Instead of reading into a ByteBuffer, I would use file mapping, see: FileChannel.map().
Also, you don't really explain how your GetValue(row, col) and SetValue(row, col) access the storage. Are row and col more or less random? The idea I have in mind is the following: sometimes, in image processing, when you have to access pixels like row + 1, row - 1, col - 1, col + 1 to average values, one trick is to organize the data in 8 x 8 or 16 x 16 blocks. Doing so helps keep the different pixels of interest in a contiguous memory area (and hopefully in the cache).
You might transpose this idea to your algorithm (if it applies): you map a portion of your file once, so that the different calls to GetValue(row, col) and SetValue(row, col) work on this portion that's just been mapped.
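A rough sketch of what mapping a portion could look like with the fields from the question. The window position and size below are just example values, and a single MappedByteBuffer cannot cover more than 2 GB, so the 3 GB file has to be covered by more than one window:
long firstIdx = 0;                         // first element index of the mapped window (example value)
int count = 64 * 1024 * 1024;              // doubles in the window, 512 MB here (example value)
MappedByteBuffer region = _ioChannel.map(
        FileChannel.MapMode.READ_WRITE,
        firstIdx * BLOCK_SIZE,             // byte offset of the window
        (long) count * BLOCK_SIZE);        // window size in bytes, must stay under 2 GB
DoubleBuffer doubles = region.asDoubleBuffer();

// for an index idx (from TriangularMatrix.CalcIndex) that falls inside the window:
double d = doubles.get((int) (idx - firstIdx));    // read
doubles.put((int) (idx - firstIdx), d);            // write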
Presumably if we can reduce the number of reads then things will go more quickly.
3 GB isn't huge for a 64-bit JVM, so quite a lot of the file could fit in memory.
Suppose you treat the file as "pages" which you cache. When you read a value, read the whole page around it and keep it in memory. Then when you do further reads, check the cache first; a minimal sketch of such a cache follows below.
Or, if you have the capacity, read the whole thing into memory at the start of processing.
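Here is that page-cache sketch. Page size and capacity are arbitrary, it reuses _ioChannel from the question, and a real version would also write dirty pages back before evicting them and handle a short read at the end of the file:
static final int PAGE_DOUBLES = 4096;              // 32 KB worth of doubles per page (arbitrary)
static final int MAX_PAGES = 8192;                 // cache at most ~256 MB (arbitrary)

Map<Long, double[]> pageCache = new LinkedHashMap<Long, double[]>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
        return size() > MAX_PAGES;                 // evict in LRU order
    }
};

public double GetValue(long lRow, long lCol) throws IOException {
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long page = idx / PAGE_DOUBLES;
    double[] values = pageCache.get(page);
    if (values == null) {
        ByteBuffer bb = ByteBuffer.allocate(PAGE_DOUBLES * 8);
        _ioChannel.read(bb, page * PAGE_DOUBLES * 8L);  // one big read instead of many small ones
        bb.flip();
        values = new double[PAGE_DOUBLES];
        bb.asDoubleBuffer().get(values);
        pageCache.put(page, values);
    }
    return values[(int) (idx % PAGE_DOUBLES)];
}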
Accessing the file byte-by-byte always produces poor performance (not only in Java). Try to read/write bigger blocks (e.g. rows or columns).
How about switching to a database engine for handling such amounts of data? It would handle all the optimizations for you.
You might want to consider using a library which is designed for managing large amounts of data and random reads rather than using raw file access routines.
The HDF file format may be a good fit. It has a Java API but is not pure Java. It's licensed under an Apache-style license.