The Java Paths class has this useful method for creating a Path object:
Path path = Paths.get("a", "b", "c", "d", "e.txt");
However, with this approach the resultant path is relative to the invoking directory. What is the best platform-independent way to get an absolute path (one which transparently incorporates both the Windows \ and the Unix / conventions)?
If you have a Path object that needs to be absolute, just invoke toAbsolutePath() on it. Note that Path objects know what the platform separator is: a Path is already platform independent, so there is no need to manually convert any slashes.
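For example, a minimal sketch (the absolute prefix it prints depends on your current working directory):

import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("a", "b", "c", "d", "e.txt");
// Resolves the relative path against the current working directory,
// using whatever separator the platform uses.
Path abs = path.toAbsolutePath();
System.out.println(abs); // e.g. C:\work\a\b\c\d\e.txt on Windows, /home/user/a/b/c/d/e.txt on Unix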
If you mean: I have a bunch of strings, and on Unix I want to obtain a path representing /a/b/c/d/e.txt but on Windows a path representing C:\a\b\c\d\e.txt, then that's a problem, because C: is not actually something you just get to assume.
Your question is then unanswerable: it turns out that if you want to be fully platform independent, there is no such notion as 'the root directory'. On Windows one can only speak of 'a root directory', and there are as many of those as there are file systems hooked up to drive letters; even that is a tricky abstraction, because underneath, Windows really works with a \\drive-identifier\path model.
Something you may want to investigate:
for (Path p : FileSystems.getDefault().getRootDirectories()) {
    System.out.println("Found a root: " + p);
}
You could, for example, just go off of the first root, whatever it is:
Path root = FileSystems.getDefault().getRootDirectories().iterator().next();
Path p = root.resolve(Paths.get("a", "b", "c", "d", "e.txt"));
System.out.println(p);
Whether that is the root you intended (presumably, C:) - that'll depend on quite a few factors.
The best thing to do is to forget about having separate strings. If you are taking user input representing an absolute path, take one single string. A user on Windows is bound to type something like C:\Users\Sandeep\hello.txt, and Paths.get() has no problem with that input at all. A user on a Mac might type /Users/Sandeep/hello.txt, and this too works just fine when fed to Paths.get(); operations from there on out are entirely platform independent already.
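A minimal sketch of that idea, reading a hypothetical path from standard input (the example file names are just illustrations):

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

Scanner in = new Scanner(System.in);
String input = in.nextLine();  // e.g. C:\Users\Sandeep\hello.txt or /Users/Sandeep/hello.txt
Path p = Paths.get(input);     // parses whichever separator the platform uses
System.out.println(p.toAbsolutePath());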
I've "inherited" an existing Java Selenium & Cucumber framework which was written mostly for use on OS X. I'm using Windows and I'm trying to fix it and run it on Windows.
My first problem is specifying the correct file path; this is how it was written for OS X:
private String getProjectName(Scenario scenario) {
    return Arrays.asList(scenario.getUri().getPath()
            .replace(System.getProperty("user.dir"), "")
            .split("/"))
        .get(5);
}
Error which I'm receiving is:
java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 1
Since on Windows we use backslashes, I tried switching "/" to "\", but the same error appears; after some investigation I also tried "\\\\" (the regex-escaped backslash), but the error remains the same as above.
I'm aware that I'm only providing a portion of my code and that may make this hard, but at first glance can you tell me:
Can that method work on Windows, or should it be completely refactored?
Is System.getProperty("user.dir") the correct solution?
How do I correctly pass backslashes?
Why are they calling .get(5)?
I can guess:
This method extracts the project name, which is likely the name of a certain folder in the folder structure where the scenario file is located.
That is why they took the 5th element: on the 5th level there was the folder which represented the project.
The approach used looks very questionable, not least because of redundant steps like converting to a list.
Now, how should you go about it?
The proper way is to use java.nio.file.Path (available since Java 7), which takes care of the different OS-specific details.
So your code might look like:
private String getProjectName(Scenario scenario) {
    // getName(5) returns a Path element, so convert it to a String
    return Path.of(scenario.getUri()).getName(5).toString();
}
P.S. - of course you have to change 5 to match the actual position of the required folder in your structure.
Given two strings, one representing a directory path and the other representing a file path, what is the most efficient way to check whether the file exists under the given directory (it could be any number of levels deep)?
I started by turning both into File objects and comparing their canonical paths:
// null checks etc. omitted for brevity
File file = new File(filePath);
String fileCanonicalPath = file.getCanonicalPath();
File dir = new File(dirPath);
String dirCanonicalPath = dir.getCanonicalPath();
return fileCanonicalPath.startsWith(dirCanonicalPath);
But I am not convinced this is actually the most accurate way to do it; I would like to rely on what the system considers a parent rather than comparing strings.
I then tried converting both to File objects and recursively calling getParentFile() on the file, comparing each result with the directory (using equals(...)), until I either reach the directory or run out of parents (getParent() returns null).
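For reference, a minimal sketch of that parent-walking approach (file and dir being the two File objects, inside a method returning boolean; exception handling omitted):

File cur = file.getCanonicalFile();
File target = dir.getCanonicalFile();
while (cur != null) {
    if (cur.equals(target)) {
        return true;   // reached the directory: the file is underneath it
    }
    cur = cur.getParentFile();
}
return false;          // ran out of parents without hitting the directory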
But this seems rather inefficient.
Is there a better way - more efficient and more objectively correct - to do this?
There is a topic in the book Java: The Complete Reference (Twelfth Edition), in the "Exploring NIO" section, called "Use walkFileTree() to List a Directory Tree", which explains the Java way to list and do whatever you want with a directory tree.
Try reading about Java's walkFileTree() method; it will simply solve your problem.
Feel free to ask any questions you need to.
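A minimal sketch of that idea, walking the directory and stopping as soon as the file turns up (assumes dir and target are already-normalized Paths for the two inputs):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

static boolean containsFile(Path dir, Path target) throws IOException {
    final boolean[] found = {false};
    Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
            if (file.equals(target)) {
                found[0] = true;
                return FileVisitResult.TERMINATE; // stop walking once the file is found
            }
            return FileVisitResult.CONTINUE;
        }
    });
    return found[0];
}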
Suppose a very simple program that lists out all the subdirectories of a given directory. Sounds simple enough? Except the only way to list all subdirectories in Java is to use FilenameFilter combined with File.list().
This works for the trivial case, but when the folder has, say, 150,000 files and 2 subfolders, it's silly waiting there for 45 seconds iterating through all the files and testing file.isDirectory(). Is there a better way to list subdirectories?
PS. Sorry, please save the lectures on having too many files in the same directory. Our live environment has this as part of the requirement.
As has already been mentioned, this is basically a hardware problem. Disk access is always slow, and most file systems aren't really designed to handle directories with that many files.
If you for some reason have to store all the files in the same directory, I think you'll have to maintain your own cache. This could be done using a local embedded database such as SQLite or HSQLDB. If you want extreme performance, use a Java TreeSet and cache it in memory. This means at the very least that you'll have to read the directory less often, and it could possibly be done in the background. You could reduce the need to refresh the list even further by using your system's native file update notification API (inotify on Linux) to subscribe to changes to the directory.
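In Java, the NIO WatchService wraps those native notification APIs; a minimal sketch of refreshing a cached listing that way (the cache update itself is left as a comment, and this would run in a background thread):

import java.nio.file.*;

static void pollOnce(Path dir) throws Exception {
    WatchService watcher = FileSystems.getDefault().newWatchService();
    dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_DELETE);
    WatchKey key = watcher.take(); // blocks until something in the directory changes
    for (WatchEvent<?> event : key.pollEvents()) {
        // event.context() is the changed file name; update the in-memory cache here
        System.out.println(event.kind() + ": " + event.context());
    }
    key.reset(); // re-arm the key so further events are delivered
}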
This doesn't seem to be possible for you, but I once solved a similar problem by "hashing" the files into subdirectories. In my case, the challenge was to store a couple of million images with numeric ids. I constructed the directory structure as follows:
images/[id - (id % 1000000)]/[id - (id % 1000)]/[id].jpg
This has worked well for us, and it's the solution that I would recommend. You could do something similar for alphanumeric filenames by simply taking the first two letters of the filename, and then the next two letters. I've done this as well once, and it did the job.
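A minimal sketch of that id-based scheme (the method name and "images" root are illustrative):

import java.nio.file.Path;
import java.nio.file.Paths;

static Path imagePath(long id) {
    long level1 = id - (id % 1_000_000); // e.g. 15_023_042 -> 15_000_000
    long level2 = id - (id % 1_000);     // e.g. 15_023_042 -> 15_023_000
    return Paths.get("images", String.valueOf(level1), String.valueOf(level2), id + ".jpg");
}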
Do you know the finite list of possible subdirectory names? If so, use a loop over all possible names and check for directory's existence.
Otherwise, you cannot get ONLY directory names in most underlying OSs (e.g. in Unix, a directory listing is merely reading the contents of the "directory" file, so there's no way to find "just directories" quickly without listing all the files).
However, NIO.2 in Java 7 (see http://java.sun.com/developer/technicalArticles/javase/nio/#3 ) offers a streaming directory listing, so you don't get a full array of file elements cluttering your memory/network.
There's actually a reason why you got the lectures: it's the correct answer to your problem. Here's the background, so that perhaps you can make some changes in your live environment.
First: directories are stored on the filesystem; think of them as files, because that's exactly what they are. When you iterate through the directory, you have to read those blocks from the disk. Each directory entry will require enough space to hold the filename, and permissions, and information on where that file is found on-disk.
Second: directories aren't stored with any internal ordering (at least, not in the filesystems where I've worked with directory files). If you have 150,000 entries and 2 sub-directories, those 2 sub-directory references could be anywhere within the 150,000. You have to iterate to find them, there's no way around that.
So, let's say that you can't avoid the big directory. Your only real option is to try to keep the blocks comprising the directory file in the in-memory cache, so that you're not hitting the disk every time you access them. You can achieve this by regularly iterating over the directory in a background thread -- but this is going to cause undue load on your disks, and interfere with other processes. Alternatively, you can scan once and keep track of the results.
The alternative is to create a tiered directory structure. If you look at commercial websites, you'll see URLs like /1/150/15023.html -- this is meant to keep the number of files per directory small. Think of it as a BTree index in a database.
Of course, you can hide that structure: you can create a filesystem abstraction layer that takes filenames and automatically generates the directory tree where those filenames can be found.
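A minimal sketch of such a layer, using the first letters of the name to pick the tier (purely illustrative; a real implementation would need a policy for short names):

import java.nio.file.Path;

static Path tieredPath(Path root, String fileName) {
    // Guard for names shorter than the tier width.
    String a = fileName.length() >= 2 ? fileName.substring(0, 2) : fileName;
    String b = fileName.length() >= 4 ? fileName.substring(2, 4) : "_";
    return root.resolve(a).resolve(b).resolve(fileName);
}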
The key problem could be the File.isDirectory() function being called in a loop.
File.isDirectory() can be extremely slow. I saw NFS take 10 seconds to process a 200-file directory.
If you can by all means avoid File.isDirectory() calls (e.g. test for extension; no extension == directory), you can improve performance drastically.
Otherwise I would suggest doing JNA/JNI/writing a native script that does this for you.
The jCifs library lets you manipulate Windows network shares more efficiently. I am not aware of a library that would do this for other network file systems.
You could hack it if the 150k files all (or a significant number of them) had a similar naming convention like:
*.jpg
*Out.txt
and only actually create file objects for the ones you are unsure about being a folder.
I don't know if the overhead of shelling out to cmd.exe would eat it up, but one possibility would be something like this:
...
Runtime r = Runtime.getRuntime();
// /c runs the command and exits; with /k the shell stays open and readLine() never sees end-of-stream
Process p = r.exec("cmd.exe /c dir /s /b /ad C:\\folder");
BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
String d;
while ((d = br.readLine()) != null) {
    System.out.println(d);
}
...
/s means search subdirectories
/ad means only return directories
/b means bare format (no headers or summary); combined with /s it prints full pathnames from the root
I came across a similar question when debugging performance in a Java application that enumerates plenty of files. It was using the old approach:
for (File f : new File("C:\\").listFiles()) {
if (f.isDirectory()) {
continue;
}
}
And it appears that each f.isDirectory() is a call into the native file system which, at least on NTFS, is very slow. Java 7 NIO has an additional API, but not all of its methods perform well; I'll just provide the JMH benchmark results here:
Benchmark                  Mode  Cnt  Score    Error  Units
MyBenchmark.dir_listFiles  avgt    5  0.437  ± 0.064   s/op
MyBenchmark.path_find      avgt    5  0.046  ± 0.001   s/op
MyBenchmark.path_walkTree  avgt    5  1.702  ± 0.047   s/op
The numbers come from executing this code:
java -jar target/benchmarks.jar -bm avgt -f 1 -wi 5 -i 5 -t 1
static final String testDir = "C:/Sdk/Ide/NetBeans/src/dev/src/";
static final int nCycles = 50;

public static class Counter {
    int countOfFiles;
    int countOfFolders;
}

@Benchmark
public List<File> dir_listFiles() {
    List<File> files = new ArrayList<>(1000);
    for (int i = 0; i < nCycles; i++) {
        File dir = new File(testDir);
        files.clear();
        for (File f : dir.listFiles()) {
            if (f.isDirectory()) {
                continue;
            }
            files.add(f);
        }
    }
    return files;
}

@Benchmark
public List<Path> path_walkTree() throws Exception {
    final List<Path> files = new ArrayList<>(1000);
    for (int i = 0; i < nCycles; i++) {
        Path dir = Paths.get(testDir);
        files.clear();
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
                files.add(path);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path path, BasicFileAttributes attrs) throws IOException {
                // only descend into the start directory itself, not its subtrees
                return path == dir ? FileVisitResult.CONTINUE : FileVisitResult.SKIP_SUBTREE;
            }
        });
    }
    return files;
}

@Benchmark
public List<Path> path_find() throws Exception {
    final List<Path> files = new ArrayList<>(1000);
    for (int i = 0; i < nCycles; i++) {
        Path dir = Paths.get(testDir);
        files.clear();
        files.addAll(Files.find(dir, 1, (path, attrs)
                -> true /*!attrs.isDirectory()*/).collect(Collectors.toList()));
    }
    return files;
}
If your OS is 'stable', give JNA a try:
opendir/readdir on UNIX
FindFirstFile and related API on Windows
Java7 with NIO2
These are all streaming APIs. They don't force you to allocate a 150k list/array before you start searching. IMHO this is a great advantage in your scenario.
Here's an off-the-wall solution, devoid of any testing at all. It also depends on having a filesystem that supports symbolic links. This isn't a Java solution; I suspect your problem is filesystem/OS-related, not Java-related.
Is it possible to create a parallel directory structure, with subdirectories based on the initial letters of the filenames, which then symbolically link to the real files? An illustration:
/symlinks/a/b/cde
would link to
/realfiles/abcde
(where /realfiles is where your 150,000 files reside)
You'd have to create and maintain this directory structure, and I don't have enough info to determine if that's practical. But the above would create a fast(er) index into your non-hierarchical (and slow) directory.
There is also a recursive parallel scan described at http://blogs.oracle.com/adventures/entry/fast_directory_scanning. Essentially, siblings are processed in parallel. There are also encouraging performance tests there.
Maybe you could write a directory-searching program in C#/C/C++ and use JNI to call it from Java. I don't know if this would improve performance or not.
Well, either JNI, or, if you say your deployment is constant, just run "dir" on Windows or "ls" on *nixes, with appropriate flags to list only directories (Runtime.exec())
In that case you might try some JNA solution: a platform-dependent directory traverser (FindFirst/FindNext on Windows) with the possibility of some iteration pattern. Also, Java 7 will have much better file system support; it's worth checking out the specs (I don't remember any specifics).
Edit: An idea: one option is to hide the slowness of the directory listing from the user's eyes. In a client-side app, you could use some animation while the listing is working to distract the user. It really depends on what else your application does besides the listing.
As of 2020, DirectoryStream does seem to be faster than using File.listFiles() and checking each file with isDirectory().
I learned the answer from here:
https://www.baeldung.com/java-list-directory-files
I'm using Java 1.8 on Windows 10.
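A minimal sketch of the DirectoryStream approach (the folder path is hypothetical):

import java.io.IOException;
import java.nio.file.*;

try (DirectoryStream<Path> stream =
         Files.newDirectoryStream(Paths.get("C:/folder"), Files::isDirectory)) {
    for (Path subDir : stream) {
        System.out.println(subDir); // each entry is a direct subdirectory
    }
} catch (IOException e) {
    e.printStackTrace();
}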
I just wondered if there is any way to let your program find a file you want to use by just giving the name, without writing the whole search path like this:
Scanner betalningsservice = new Scanner(new File("/afs/nada.kth.se/home/i/u1vxrjgi/betalningsservice.txt"));
String line1 = betalningsservice.nextLine();
You can see that it's a pretty long path, and I would like the program to be able to detect the file "betalningsservice.txt" wherever the file is located on the computer (in case the file has been moved somewhere else). Any tips, guys? :)
Thanks in advance
Since there's some debate about what exactly is wanted in this question, I'll post another answer.
If you're using Java 8, finding a file is made somewhat simpler by using the Files.find function. It has the advantage of being able to limit how deep the search goes, keeping search speed under control. Here's an example that sticks the Paths of all matching files into a List. If you find more than one matching file you can have the user choose the right one:
final String SEARCH_FILE = "betalningsservice.txt"; // the file you're looking for
final String SEARCH_ROOT = "/afs/nada.kth.se/home/i/"; // where to start the search (top folder)
final int SEARCH_DEPTH = 4; // how many nested subfolders to delve into
final List<Path> files = new LinkedList<>();
Files.find(Paths.get(SEARCH_ROOT), SEARCH_DEPTH, (p, a) -> p.endsWith(SEARCH_FILE))
.forEach(e -> files.add(e));
It's debatable whether one big-ass statement that does all of the logic of the search is more readable or less readable, but that's Java 8 for you.
If you want to get advanced, you can also append FileVisitOptions to the find function's parameter list (for example, to follow symbolic links).
It's interesting to note that in Java 8, Path has generally replaced File as the way to represent files and folders, hence the List of Paths. Once you've selected the correct Path (we'll say it's in a variable called path), you can use it similarly to how you would use a File:
Scanner betalningsservice = new Scanner(path);
The rest is as before.
A File can also be created using a relative path. Just don't start the file name with a slash. For example, if you run the program from the folder "/afs/nada.kth.se/home/i/u1vxrjgi/", you can just use:
new File("betalningsservice.txt")
...and that will get you the file you want.
There is not a practical/reliable way to just find the file anywhere on the computer.
You can, however, utilize relative file paths if you know your working directory. So if your working directory were /afs/nada.kth.se/home/i/u1vxrjgi, you could refer to the file just as new File("betalningsservice.txt"). Similarly, if your working directory were /afs/nada.kth.se/home/i, you could refer to the file as new File("u1vxrjgi/betalningsservice.txt").
Another option would be to read from the classpath. This can be accomplished by getting the classloader to get a resource.
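A minimal sketch of that classpath approach (assumes betalningsservice.txt has been placed on the classpath, e.g. in a resources folder; the class name MyApp is hypothetical):

import java.io.InputStream;
import java.util.Scanner;

InputStream in = MyApp.class.getClassLoader().getResourceAsStream("betalningsservice.txt");
if (in != null) {
    Scanner betalningsservice = new Scanner(in);
    String line1 = betalningsservice.nextLine();
}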