How to measure write/read disc speed using java, i know about SIGAR library, but i can't found such methods in there. Maybe someone know solution?
The problem is that I need to determine at what rate currently is writing to disk, and at what speed is being read. Furthermore, ideally, the data must be obtained for specific directories. But if you tell me at least how to define the entire disk would be very grateful.
How you obtain this information is entirely dependant on the operating system.
On Linux the simplest tool is to use iostat which will show you how much read/write in blocks and count for each file system.
Performance measurements on a per directory basis are not very meaningful as file systems are not implemented that way. You write to files which can be anywhere on disk and these files might appear in one or more directories. The files are not physically arranged by directory.
Related
I have an issue with fragmentation on my drive. I got a programm that generates over 50000 files in different folders, each file grows over time. Each file will be about 500MB in size and I need to read the files fast.
The issue I am facing is that each file will be spread over the drive and defragmenation would take over 4 weeks.
I heard about a filesystem that will spread each file on the drive so that the gap between each file is the same sice. I searched the internet for that filesystem but i couldn't find anything.
My program is written in Java, maybe there is a way to set the beginning of a file on a specific byte position on the drive.
I would be glad if someone could help me facing this issue.
I heard about a filesystem that will spread each file on the drive so that the gap between each file will be the same sice. I searched in the internet for that filesystem but i coudn't find anything.
Most likely you did not because it does not exist...
But we have RAID systems (Rapid Array of Inexpensive Disks) which could ease your pain...
As Timothy said, you can't get to that level by using Java.
I neither heard that filesystem, it hasn't got much logic though.
Perhaps, in the case that you are storing text, you can use a NoSQL database (like MongoDB) that stores data in binary size. Probably you'll get good speeds, and the Java connector is easy to use.
Use a Linux filesystem like ext4 where disk fragmentation is very low but also make sure you have plenty of disk space left else fragmentation will happen anyway.
I also don't know of a file system that does this. However I have some info that may help-
If you used an SSD, then fragmentation would be less of a concern for reading performance reasons. SSDs store data in chunks - NAND flash pages, 16 KB for instance. These are always stored in scattered order due to the wear-levelling algorithm used. That is very unlike how hard disks work in practice. Pages on SSDs are accessed in a very parallel fashion as well. As a result, you would have much less impact of fragmentation on reading performance with an SSD. Fragmentation would still have some penalty for writes/deletions.
RAID would also allow for higher performance on reads as Timothy mentions.
I'm testing data structure performance with very large data.
As a temporary workaround (see here) I want to write memory to disk.
I want to test with very big datasets - how can I make it so that when the java VM runs out of memory it writes some of it to disk?
Since we're talking about temporary fixes here you could always increase your page file if you need a little extra space (swap file in most linux distros)
Here's a link from Microsoft:
http://windows.microsoft.com/en-us/windows-vista/change-the-size-of-virtual-memory
Linux:
http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/
Now let me say that this isn't a good long term fix, but I understand that sometimes developers just need to make it work. If this is something that will ever see a production environment you may want to look at a tool like Hadoop. It allows you to distribute your data processing over multiple JVM's--a tool built for a "big data" application like the one you're describing
Maybe you can use stream, or some buffered one. I think that will be the best choice for testing such structure. If you can read from disk using stream and that will be not make any additional objects(only that which are necessary) so you can have all jvm memory for your structure. But maybe you can describe your problem more?
Using log4j on Unix, which Appender would perform the best to write 1000Meg :
1) Using RollingFileAppender writing 10 file of 100 Meg
or
2) Using a FileAppender and writing a single 1000Meg file
In other words, using java on unix, does the size matter?
Thank you
There no Java-side performance difference between writing to a small file or writing to a large file. There might be a small difference at the OS level when a file gets big enough that an extra level of index blocks is required (FS dependent), but it is probably not worth worrying about.
There will be a performance cost in implementing the file rolling behavior. The appender has to:
test / remember how big the file is,
close the current one,
rename it,
open a new file.
My gut feeling is that this is not likely to be significant. (However, it would be worth measuring to see if the performance impact should be a concern. Also, you should probably ask yourself if you are not doing too much logging.)
You have to compare all of the above against the advantages of file rolling:
Having a bounded size on log files means that your logging won't fill the disk, causing problems for the application and potentially others on the same machine.
Smaller log files can make it easier / quicker to do searches for events at specific times. (Running less on a 1000Mb file can be painful ...)
They'll both easily write 1000MB files. I don't see why they should perform differently.
You do need the RollingFileAppender though in order to set the total maximum size that the log file(s) can reach. Otherwise you may run out of hard disk space, assuming that you application has got traffic.
I think it's always preferable to use small files than large files because they are more manageable. Also, consider that with large files you may have problems in the case of file-system full because of the risk of having to remove the log file when the process is up and running to free up disk space.
I am writing a media transcoding server in which I would need to move files in the filesystem and till now I am in the dilemma of whether using java renameTo can be replaced by something else that would give me better performance. I was considering using exec("mv file1 file2") but that would be my last bet.
Anyone has had similar experiences or can help me find a solution?
First of all, renameTo is likely just wrapping a system call.
Secondly, moving a file does not involve copying any data from the file itself (at least, in unix). All that happens is that the link from the old directory is removed, and a link from the new directory is added. I don't think you're going to find any performance improvements here.
I don't think that using the default methods for file has a (mentionable) performance penalty as most of this JVMtoOS functions are wrapping native calls already.
The only case where an exec would be needed is if you wanted to do something with different rights than the program or use a special tool to copy/move the file. (e.g. smart-move when ntfs-junctions are involved)
If rename is a significant performance bottleneck, then you need to improve your hardware as this is your main contraint. The software is a trivial portion of the time spent and optimising it will make little difference.
What is your disk confiugration? How is it optimised for writes?
I need to perform a simple grep and other manipulations on large files in Java. I am not that familiar with the Java NIO utilities, but I am assuming that is what I need to use. What resources or helpful tips do you have for reading/writing large files. Also, I am working on a SWT application and need to display parts of that data within a text area on a GUI.
java.io.RandomAccessFile uses long for file-pointer offset so should be able to cope. However, you should read a chunk at a time otherwise overheads will be high. FileInputStream works similarly.
Java NIO shouldn't be too difficult. You don't need to mess around with Selectors or similar. In fact, prior to JDK7 you can't select with files. However, avoid mapping files. There is no unmap, so if you try to do it lots you'll run out of address space on 32-bit systems, or run into other problems (NIO does attempt to call GC, but it's a bit of a hack).
If all you are doing is reading the entire file a chunk at a time, with no special processing, then nio and java.io.RandomAccessFile are probably overkill. Just read and process the content of the file a block at a time. Ensure that you use a BufferedInputStream or BufferedReader.
If you have to read the entire file to do what you are doing, and you read only one file at a time, then you will gain little benefit from nio.
Maybe a little bit off topic: Have a look on VFS by apache. It's originally meant to be a library for hiding the ftp-http-file-whatever system behind a file system facade from your application's point of view. I mentioning it here because I have positive experience with accessing large files (via ftp) for searching, reading, copying etc. (large in that context means > 15MB) with this library.