Fastest way to search a file in many folders - java

I need to search for files, based on a list of filenames, inside a directory that contains 3100+ files and 14 folders, but the search takes hours to complete. Furthermore, this is only one list of filenames; I still have other lists of filenames to search for.
After locating a file, I need to open it and search for words inside it, and then proceed to the next file.
What I am currently doing is based on breadth-first search, but it also takes hours to complete.
Are there any other ways to complete this task much faster?

See comments in the code
public class FileFinderApp {
// Create a list of file names that you want to process
private List<String> fileNames = new ArrayList<String>(Arrays.asList(new String[] {"test.txt","test1.txt","test2.txt"}));
// Create a FolderFilter, this just allows us to find files that are actually folders
private FolderFilter folderFilter = new FolderFilter();
public FileFinderApp() {
// Let the user know we are doing something in case there is no other output
System.out.println("Finding Files");
// Create a File to represent our starting folder
File startFolder = new File("F:\\FileTest");
// Get the list of Files that match our criteria
List<File> foundFiles = findFiles(startFolder, new FileNameFilter());
// loop through our files and do something with them
for (File file : foundFiles) {
// do something with each file
System.out.println(file.toString());
}
// Let the user know we are done.
System.out.println("Done Finding Files");
}
// This filter returns true if the File is a file (not folder, etc.) and matches one of our file names.
class FileNameFilter implements FileFilter {
public boolean accept(File file) {
return fileNames.contains(file.getName()) && file.isFile();
}
}
// This filter only returns folders
class FolderFilter implements FileFilter {
public boolean accept(File file) {
return file.isDirectory();
}
}
// Here's the meat and potatoes
private List<File> findFiles(File folder, FileFilter filter) {
// Create an empty list of Files
List<File> foundFiles = new ArrayList<File>();
// Find all sub-folders
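File[] folders = folder.listFiles(folderFilter);
// Find the files that pass our filter
File[] files = folder.listFiles(filter);
// listFiles returns null if the folder cannot be read, so skip it in that case
if (folders == null || files == null) {
return foundFiles;
}
// add the files to our found files
foundFiles.addAll(Arrays.asList(files));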
// for (File file : files) {
// System.out.println(file.getAbsolutePath());
// }
// loop through our sub-folders and get any files that match our filter there and add them to our list
// This is recursive and will execute as many levels as there are nested folders
for(File subFolder : folders) {
foundFiles.addAll(findFiles(subFolder, filter));
}
return foundFiles;
}
public static void main(String[] args) {
// don't block the UI
SwingUtilities.invokeLater(new Runnable() {
public void run() {
new FileFinderApp();
}
});
}
}
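One likely cost in the code above is that fileNames.contains(...) scans the whole list for every file, and every folder is listed twice (once per filter). A minimal sketch of an alternative, assuming Java 7 or later and that matching on the exact file name is enough: keep the names in a HashSet and walk the tree once with Files.walkFileTree. The class and method names (SinglePassFinder, findAll) are illustrative only and not part of the answer above.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SinglePassFinder {
    // Walks the tree under start exactly once and collects every regular
    // file whose name is in wanted. A HashSet lookup takes constant time
    // on average, so the cost per file does not grow with the list size.
    static List<Path> findAll(Path start, final Set<String> wanted) throws IOException {
        final List<Path> found = new ArrayList<>();
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && wanted.contains(file.getFileName().toString())) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        // The start folder defaults to the current directory (an assumption for this sketch).
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        Set<String> wanted = new HashSet<>(Arrays.asList("test.txt", "test1.txt", "test2.txt"));
        for (Path p : findAll(start, wanted)) {
            System.out.println(p);
        }
    }
}
```

If there are several lists of names, they can all go into the same set (or a map from name to list) so the directory tree is still walked only once.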

See comments :
/**
* @param searchIn
* a valid path to any non null file or directory
* @param searchFor
* any non null name of a file or directory
*/
public static File findFile(String searchIn, String searchFor){
return findFile(new File(searchIn), searchFor);
}
/**
* @param searchIn
* not null file or directory
* @param searchFor
* any non null name of a file or directory
*/
public static File findFile(File searchIn, String searchFor){
if(searchIn.getName().equalsIgnoreCase(searchFor)) {
return searchIn;
}
if(searchIn.isDirectory() && (searchIn.listFiles() != null)){
for (File file : searchIn.listFiles() ){
File f = findFile(file, searchFor);
if(f != null) {
return f;
}
}
}
return null;
}
Tested with :
String sourcePath = "C:\\";
String searchFor = "Google Docs.ico";
File f = findFile(sourcePath, searchFor);
Performance measured :
time:13.596946 seconds, directories:17985 files:116837
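The question also asks about searching for words inside each file once it has been located. A minimal sketch of that step, assuming the files are UTF-8 text (a file with another encoding may throw an exception while reading) and that a plain substring match per line is enough; WordSearch and containsWord are hypothetical names for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WordSearch {
    // Reads the file line by line and returns true as soon as a line
    // contains the word, so the rest of the file is not read.
    static boolean containsWord(Path file, String word) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(word)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Reading line by line keeps memory use flat even for large files, and stopping at the first hit avoids scanning more than necessary.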

Related

File.listFiles with FileNameFilter returning null instead of a list of files with matching file names

I have written a find function which is like this :
public static List<File> find(String path, String fName) {
List<File> list = new ArrayList<>();
File dir = new File(path);
if (dir.isDirectory()) {
for (String aChild : dir.list()) {
list = find(path + File.separator + aChild, fName);
}
}
else {
File[] files = dir.listFiles((d, name) -> name.startsWith(fName) && name.endsWith(".txt"));
for (File fl : files)
list.add(fl);
}
return list;
}
The Directory structure on my Local machine is like C:\Salary with sub directories like January, February etc. Each of the sub directory contains files like 601246_jan_sal.txt or 601246_ feb_sal.txt.
I am calling the find function like
List<File> filePath = Utils.find("C:\\Salary\\", "601246");
And then performing operation on each individual file.
The problem is that in the find method dir.listFiles(FileNameFilter) is returning null value.
What am I doing wrong?
Below is basically the same method, with the exception that it uses regex along with the String#matches() method to determine a file name match. I used regex so that the ? and * wildcard characters can be used within your file name search criteria, for example:
"601246*.txt"
You may find this useful for other searches you might like to carry out.
There is no returned object with this method, you just need to pass the List to it. Here is an example of how you might use it:
List<File> fileList = new ArrayList<>();
String searchCriteria = "601246*.txt";
searchFolder(new File("C:\\Salary"), searchCriteria, fileList);
// Display found files within the Console Window:
if (!fileList.isEmpty()) {
for (File file : fileList) {
System.out.println(file.getAbsolutePath());
}
}
else {
System.out.println("File name (" + searchCriteria + ") can not be found!");
}
This will search the directory (and all its sub-directories) located at C:\Salary within the local file system for all files that start with 601246 and end with .txt. Since your files are in the following format:
601246_jan_sal.txt or 601246_ feb_sal.txt
and you happen to want all the files for February Sales, your search criteria might be: *feb?sal.txt.
Here is the searchFolder() method:
/**
* This method navigates through the supplied directory and any
* sub-directories contained within it for the supplied file name search
* criteria. Anything found is placed within the supplied List object.<br>
*
* @param file (File) The starting point directory (folder) in
* the local file system where the file(s) search
* should begin.<br>
*
* @param searchCriteria (String) The name of the file to search for. The
* wildcard characters '?' and '*' can also be used
* within the search criteria. Using an asterisk (*)
* allows you to replace a string of text. This is
* often useful if you know what kind of file you’re
* looking for but don’t know where it is or what
* certain name parts might be. <br><br>
*
* The wildcard '?' lets you use it to replace any character in a search.
* This means that if you’re looking for a file and you’re not sure how it
* is spelled, you can simply substitute '?' for the characters you don’t
* know. In the following example, we search for files that start with
* “img_2” and ends with “.jpg”:<pre>
*
* img_2???.jpg</pre><br>
*
* @param list (List Interface of type File: {@code List<File>})
* This would be the List of File you pass to this
* method. It will be this list that is filled with
* found files objects.<br>
*
* @param ignoreLetterCase (Optional - Boolean) Default is true. The search
* is not letter case sensitive. If false is
* supplied then the search is letter case
* sensitive.
*/
public static void searchFolder(File file, String searchCriteria, List<File> list, boolean... ignoreLetterCase) {
boolean ignoreCase = true;
if (ignoreLetterCase.length > 0) {
ignoreCase = ignoreLetterCase[0];
}
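// Convert the supplied criteria string to a Regular Expression
// for the String#matches() method. Only '?' and '*' act as wildcards;
// every other character (including '.') is quoted so it matches literally.
StringBuilder sb = new StringBuilder();
for (char c : searchCriteria.toCharArray()) {
sb.append(c == '?' ? "." : c == '*' ? ".*" : java.util.regex.Pattern.quote(String.valueOf(c)));
}
String regEx = sb.toString();
if (ignoreCase) {
regEx = "(?i)" + regEx;
}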
if (file.isDirectory()) {
//System.out.println("Searching directory ... " + file.getAbsoluteFile());
//do you have permission to read this directory?
if (file.canRead()) {
for (File temp : file.listFiles()) {
if (temp.isDirectory()) {
searchFolder(temp, searchCriteria, list, ignoreCase);
}
else {
if (temp.getName().matches(regEx)) {
list.add(temp);
}
}
}
}
else {
System.err.println(file.getAbsoluteFile() + " - PERMISSION DENIED!");
}
}
}
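As an alternative to translating wildcards to regex by hand, Java 7 and later ship a glob matcher that understands ? and * directly. A minimal sketch, assuming Java 8 or later for Files.find; GlobSearch and search are hypothetical names, and whether glob matching is case sensitive depends on the platform's file system implementation.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GlobSearch {
    // Returns every regular file under start whose file name matches the
    // glob pattern, for example "601246*.txt" or "*feb?sal.txt".
    static List<Path> search(Path start, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        // The stream holds open directories, so close it with try-with-resources.
        try (Stream<Path> hits = Files.find(start, Integer.MAX_VALUE,
                (path, attrs) -> attrs.isRegularFile() && matcher.matches(path.getFileName()))) {
            return hits.collect(Collectors.toList());
        }
    }
}
```

The matcher is applied to the file name only (path.getFileName()), so the pattern does not need to account for directory separators. Note that an unreadable sub-directory may surface as an UncheckedIOException while the stream is consumed.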
Since you are using recursion, you should pass the list (of files) as a parameter to method find and not create a new list on each invocation. Hence the method find does not need to return a value.
public static void find(String path, String fName, List<File> fileList) {
File dir = new File(path);
if (dir.isDirectory()) {
for (File aChild : dir.listFiles()) {
if (aChild.isDirectory()) {
find(aChild.getAbsolutePath(), fName, fileList);
}
else {
String name = aChild.getName();
if (name.startsWith(fName) && name.endsWith(".txt")) {
fileList.add(aChild);
}
}
}
}
else {
// path points at a single file, so test that file itself
String name = dir.getName();
if (name.startsWith(fName) && name.endsWith(".txt")) {
fileList.add(dir);
}
}
}
If method parameter path indicates a directory, then list the entries in that directory. For each entry, check whether it is itself a directory and, if it is, recursively call method find with the new directory. Otherwise, check whether the name of the file matches your search criteria and, if it does, add it to the list.
Initially call method find like so
List<File> list = new ArrayList<File>();
find("C:\\Salary\\", "601246", list);
Now list should contain the list of relevant files. So the following line will print the contents of list
System.out.println(list);
I see a fundamental problem in your logic that answers your question. Here are the key lines along with an explanation:
if( dir.isDirectory() ) {
....
}
else {
File[] files = dir.listFiles(...); // <- "dir" is a file, not a directory
for(File fl : files)
list.add(fl) ;
}
You check if File object dir represents a directory. If that test fails (i.e., it's not a directory), then you call dir.listFiles on it and assign the result to files. But if it's not a directory, then by the definition of that function, it will return null. It seems that if the check for a directory fails, you should just add the object to list instead of performing another operation on it.
I think you want this:
if( dir.isDirectory() ) {
....
}
else {
list.add(dir) ;
}
I guess dir isn't really the right name for the variable here, as it isn't always a directory.
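The null result is easy to reproduce. A minimal sketch, using a temporary file created only for the demonstration; ListFilesNullDemo and childrenOrEmpty are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

class ListFilesNullDemo {
    // Returns the children of f, or an empty array when f is not a
    // directory (or cannot be read) and listFiles() therefore returns null.
    static File[] childrenOrEmpty(File f) {
        File[] children = f.listFiles();
        return children != null ? children : new File[0];
    }

    public static void main(String[] args) throws IOException {
        File plainFile = File.createTempFile("demo", ".txt");
        plainFile.deleteOnExit();
        System.out.println(plainFile.listFiles() == null);     // prints true: not a directory
        System.out.println(childrenOrEmpty(plainFile).length); // prints 0
    }
}
```

A small wrapper like childrenOrEmpty lets the for-each loops in the recursive methods run safely without a separate null check at every call site.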
Note that the Files.walkFileTree() method already does what you're trying to do.
public static List<File> find(String path, String fName) throws IOException {
List<File> result = new ArrayList<>();
FileVisitor<Path> visitor = new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
if (attrs.isRegularFile()) {
// getFileName() returns a Path, so convert it before using String methods
String name = file.getFileName().toString();
if (name.startsWith(fName) && name.endsWith(".txt")) {
result.add(file.toFile());
}
}
return FileVisitResult.CONTINUE;
}
};
Files.walkFileTree(Paths.get(path), visitor);
return result;
}
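For comparison, the same search can be written with the stream-returning Files.walk method. This is a sketch assuming Java 8 or later; WalkFind is a hypothetical class name, and the prefix and ".txt" suffix test is taken from the question.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkFind {
    // Collects every regular file under path whose name starts with
    // fName and ends with ".txt".
    static List<File> find(String path, String fName) throws IOException {
        // Files.walk keeps directories open, so the stream must be closed.
        try (Stream<Path> paths = Files.walk(Paths.get(path))) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith(fName) && name.endsWith(".txt");
                    })
                    .map(Path::toFile)
                    .collect(Collectors.toList());
        }
    }
}
```

It would be called the same way as in the question, for example WalkFind.find("C:\\Salary\\", "601246").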

How do I conditionally get code to loop back to specific point in code in java?

I am creating an application that automatically sorts and organizes files into a database. I have written my code to read files within the imported folder one at a time, and process them into the DB. However, I am having trouble looping this process, so that I can process files that are nested in any amount of folders within the original folder that the user wants to input.
I simply need to instruct my program to go back to a specific part of my code and start running from there again.
Another possible way to solve this issue would be to create a way to list out all of the individual files within folder (including all the files within subfolders), and I could easily fit that into my program too.
I tried using labeled continue, return, and break keywords based off of an answer I got online, but I never expected those to succeed in looping my code back to a specific spot.
JFileChooser chooser = new JFileChooser();
chooser.setSelectedFiles(null);
chooser.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
chooser.showOpenDialog(null);
//Getting file paths from within folder
File f = chooser.getSelectedFile();
String file = f.getAbsolutePath();
if (f.isDirectory()) {
//Need to loop back to here
File folder = new File(file);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isDirectory()) {
//Code here is run if there is a folder within a folder. I tested it too
//I want the code here to loop back above where it says "Need to loop back to here"
}
if (listOfFiles[i].isFile()) { //Once I list the files from within the folder, their information gets assigned variable here, and the rest of my program sorts it and saves it to DB accordingly.
//Everything below here is not important, but it might be helpful to see what happens each file with the folders.
System.out.println(listOfFiles[i]);
String filename = (listOfFiles[i].getName()); //For Files
Long filemodified = (listOfFiles[i].lastModified());
String filepath = (listOfFiles[i].getAbsolutePath());
Long filesizeraw = (listOfFiles[i].length());
long filehashcode = (listOfFiles[i].hashCode());
String fileparent = (listOfFiles[i].getParent());
Currently, there is no error message. The program processes any individual files directly in the imported folder (not nested in any folder within it), but doesn't get to any of the files that are in folders within folders.
Another possible way to solve this issue would be to create a way to list out all of the individual files within folder (including all the files within subfolders), and I could easily fit that into my program too
Although this doesn't do the SQLite inserts, the following class builds a list of File objects for every file under a directory, including those in sub-directories (so each file's name and path are available via its File object).
public class FTS {
private ArrayList<File> mFileList; //Resultant list of Files extracted
private String mBaseDirectory; // The Directory to search
private long mSubDirectoryCount; // The count of the subdirectories
//Constructor
public FTS(String directory) {
this.mBaseDirectory = directory;
this.mSubDirectoryCount = 0;
buildFileListing(this.mBaseDirectory);
}
//
private void buildFileListing(String directory) {
// Initialise the ArrayList for the result
if (mFileList == null) {
mFileList = new ArrayList<>();
}
//Get the File (directory to process)
File dir = new File(directory);
// Get the List of the Directories contents
String[] filelist = dir.list();
// If empty (null) then return
if (filelist == null) {
return;
}
// Loop through the directory list
for (String s: filelist) {
//get the current list item as a file
File f = new File(dir.getAbsolutePath() + File.separator + s);
// is it a file or directory?
if (f.isFile() && !f.isDirectory()) {
this.mFileList.add(f); // If a file then add the file to the extracted list
} else {
// If a directory then increment the count of the subdirectories processed
mSubDirectoryCount++;
// and then recursively call this method to process the directory
buildFileListing(f.getAbsolutePath());
}
}
}
// return the list of extracted files
public ArrayList<File> getFileList() {
return this.mFileList;
}
// return the number of sub-directories processed
public long getSubDirectoryCount() {
return this.mSubDirectoryCount;
}
}
An example usage of the above is :-
public class Main {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
FTS fileTreeSearch;
String BaseDirectory = "E:" + File.separator;
List<File> files = (fileTreeSearch = new FTS(BaseDirectory)).getFileList();
System.out.println("Extracted " + String.valueOf(files.size()) + " files, from " + String.valueOf(fileTreeSearch.getSubDirectoryCount()) + " sub-directories of " + BaseDirectory);
/* this commented out code would process all the extracted files
for (File f: files) {
System.out.println("File is " + f.getName() + "\t\t path " + f.getAbsolutePath());
}
*/
}
}
Example output from running the above :-
Extracted 186893 files, from 54006 sub-directories of E:\
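If the goal is literally to "loop back" rather than recurse, the same traversal can be written as a single while loop over a work list of folders that still need to be processed. This is a minimal sketch under that assumption; IterativeLister and listAllFiles are hypothetical names, and the sketch does not guard against directory cycles created by symbolic links.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeLister {
    // Returns every regular file under root, however deeply nested.
    // Returns an empty list if root is not a readable directory.
    static List<File> listAllFiles(File root) {
        List<File> result = new ArrayList<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {         // this is the "loop back" point
            File current = pending.pop();
            File[] children = current.listFiles();
            if (children == null) {
                continue;                    // not a directory, or not readable
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);     // handled on a later pass of the loop
                } else if (child.isFile()) {
                    result.add(child);       // per-file processing would go here
                }
            }
        }
        return result;
    }
}
```

Each folder found inside a folder is pushed onto the work list instead of being processed immediately, which is what sends control back to the top of the loop for it.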

Matching a String with a File Name

I'm writing a program that does various data analysis functions for use with Excel.
I need a way of returning file names of documents so I can search through them and find the ones I want.
I need to be able to take a string, saved as a variable, and use it to return the name of every document in a folder whose file name contains that string.
This will be used to sift through pre-categorized sections of data. Ideally I would save those documents' file names in a string array for later use within other functions.
private List<String> searchForFileNameContainingSubstring( String substring )
{
//This is assuming you pass in the substring from input.
File file = new File("C:/Users/example/Desktop"); //Change this to the directory you want to search in.
List<String> filesContainingSubstring = new ArrayList<String>();
if( file.exists() && file.isDirectory() )
{
String[] files = file.list(); //get the files in String format.
for( String fileName : files )
{
if( fileName.contains( substring ) )
filesContainingSubstring.add( fileName );
}
}
for( String fileName : filesContainingSubstring )
{
System.out.println( fileName ); //or do other operation
}
return filesContainingSubstring; //return the list of filenames containing substring.
}
Using this method, you could pass in the input from the user as the string you want the filename to contain. The only other thing you need to change is where you want in your directory to start searching for files, and this program only looks in that directory.
You could further look recursively within other directories from the starting point, but I won't add that functionality here. You should definitely look into it though.
This also assumes that you are looking for everything within the directory, including other folders and not just files.
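If only regular files should be reported and the result is wanted as a string array, as the question describes, File.list can take a filter directly. A minimal sketch assuming Java 8 or later for the lambda; NameFilter and namesContaining are hypothetical names.

```java
import java.io.File;

class NameFilter {
    // Returns the names of regular files directly inside dir whose name
    // contains substring. Returns an empty array if dir cannot be listed.
    static String[] namesContaining(File dir, String substring) {
        String[] names = dir.list((parent, name) ->
                name.contains(substring) && new File(parent, name).isFile());
        return names != null ? names : new String[0];
    }
}
```

The isFile() check inside the filter is what excludes sub-folders whose names happen to contain the substring.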
You can get the list of all the files in a directory and then store them in an array. Next, using the java.io.File.getName() method, you can get the names of the files. Now you can simply use the .indexOf() method to check whether the string is a substring of the file name. I assume that all the items in the directory of concern are files and not sub directories.
public static void main(String[] args) throws IOException {
File[] files = new File("X:/").listFiles(); // X is the directory
String s = args[0]; // the string you want to check filenames with (taken from the first command-line argument here)
for(File f : files){
if(f.getName().toLowerCase().indexOf(s.toLowerCase()) != -1)
System.out.println(f.getName());
}
}
This should display the names of all those files in the directory X:\ whose names include the String s.
References
This question: How do I iterate through the files in a directory in Java?
The java.io.File.getName() method
Statutory edit info
I have edited this answer simply to replace the previous algorithm, for checking the existence of a substring in a string, with the one that is currently used in the code above.
Here is an answer that searches for the file recursively:
String name; // holds the file name (or part of it) to search for
public String listFolder(File dir) {
int flag;
File[] subDirs = dir.listFiles(new FileFilter() {
@Override
public boolean accept(File pathname) {
return pathname.isDirectory();
}
});
System.out.println("Searching directory: " + dir.getAbsolutePath());
flag = Listfile(dir);
if (flag == 1) { // Listfile returns 1 when a match was found
System.out.println("File found in the directory: " + dir.getAbsolutePath());
Speak("File found in the directory: " + dir.getAbsolutePath()); // Speak(...) is the author's own helper, not shown here
return dir.getAbsolutePath();
}
if (subDirs != null) { // listFiles returns null if dir cannot be read
for (File folder : subDirs) {
String found = listFolder(folder);
if (found != null) {
return found; // pass the result back up instead of discarding it
}
}
}
return null;
}
// Returns 1 if dir directly contains a file whose name contains the search text, otherwise 0
private int Listfile(File dir) {
File[] files = dir.listFiles();
if (files == null) {
return 0;
}
for (File file : files) {
// compare both sides in lower case so the check is case-insensitive
if (file.isFile() && file.getName().toLowerCase().indexOf(name.toLowerCase()) != -1) {
System.out.println(name + " found successfully!");
return 1;
}
}
return 0;
}

Java: How to read directory folder, count and display no of files and copy to another folder?

I have to read a folder, count the number of files in the folder (can be of any type), display the number of files and then copy all the files to another folder (specified).
How would I proceed?
i Have to read a folder, count the number of files in the folder (can
be of any type) display the number of files
You can find all of this functionality in the javadocs for java.io.File
and then copy all the files to another folder (specified)
This is a bit more tricky. Read: Java Tutorial > Reading, Writing and Creating of Files
(note that the mechanisms described there are only available in Java 7 or later. If Java 7 is not an option, refer to one of many previous similar questions, e.g. this one: Fastest way to write to file? )
you have all the sample code here :
http://www.exampledepot.com
http://www.exampledepot.com/egs/java.io/GetFiles.html
File dir = new File("directoryName");
String[] children = dir.list();
if (children == null) {
// Either dir does not exist or is not a directory
} else {
for (int i=0; i<children.length; i++) {
// Get filename of file or directory
String filename = children[i];
}
}
// It is also possible to filter the list of returned files.
// This example does not return any files that start with `.'.
FilenameFilter filter = new FilenameFilter() {
public boolean accept(File dir, String name) {
return !name.startsWith(".");
}
};
children = dir.list(filter);
// The list of files can also be retrieved as File objects
File[] files = dir.listFiles();
// This filter only returns directories
FileFilter fileFilter = new FileFilter() {
public boolean accept(File file) {
return file.isDirectory();
}
};
files = dir.listFiles(fileFilter);
The copying http://www.exampledepot.com/egs/java.io/CopyDir.html :
// Copies all files under srcDir to dstDir.
// If dstDir does not exist, it will be created.
public void copyDirectory(File srcDir, File dstDir) throws IOException {
if (srcDir.isDirectory()) {
if (!dstDir.exists()) {
dstDir.mkdir();
}
String[] children = srcDir.list();
for (int i=0; i<children.length; i++) {
copyDirectory(new File(srcDir, children[i]),
new File(dstDir, children[i]));
}
} else {
// This method is implemented in Copying a File
copyFile(srcDir, dstDir);
}
}
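The copyFile method called above is not shown in this answer. A minimal sketch of one way to write it, assuming Java 7 or later (java.nio.file) is available; CopyHelper is a hypothetical class name, and overwriting an existing destination file is a choice made for this sketch, not something the original states.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class CopyHelper {
    // Copies a single file from src to dst, replacing dst if it already exists.
    static void copyFile(File src, File dst) throws IOException {
        Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Without REPLACE_EXISTING, Files.copy throws FileAlreadyExistsException when the destination is already present, which may be the safer behavior if overwriting is not wanted.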
However, it is very easy to google for this stuff :)
I know this is late, but the code below worked for me. It iterates through each entry in the directory and, if an entry is itself a directory, makes a recursive call. It returns only the count of files in the directory tree.
public static int noOfFilesInDirectory(File directory) {
int noOfFiles = 0;
for (File file : directory.listFiles()) {
if (file.isFile()) {
noOfFiles++;
}
if (file.isDirectory()) {
noOfFiles += noOfFilesInDirectory(file);
}
}
return noOfFiles;
}

List files from directories and sub directories in java including only partial file paths

I need to get the paths of files and their parent directories in java from a given directory but not including it.
So for example, If my method was given the path: /home/user/test as a path it would return the paths of all files in that directory and under it.
So if /home/user/test had the sub folders: /subdir1 and /subdir2 each containing file1.txt and file2.txt then the result of the method would be 2 strings containing /subdir1/file1.txt and /subdir2/file2.txt
And if subdir1 had a directory inside it called subsubdir and inside that file3.txt, then the string created for that file would be /subdir1/subsubdir/file3.txt, and if there are further sub directories that would continue.
The idea is I just want the directory paths above the file but not the absolute path so only the directories AFTER the initial given path.
I know it's a little confusing but I'm sure someone can make sense of it. Right now all I have is a recursive function that prints out file names and their absolute paths.
Any assistance on this?
It would have been nice if you had tried something and asked questions about that...
However...
public class TestFileSearch {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
new TestFileSearch();
}
public TestFileSearch() {
File parentPath = new File("C:/Users/shane/Documents");
List<String> files = list(parentPath);
for (String file : files) {
System.out.println(file);
}
}
protected List<String> list(File parent) {
return listFiles(parent, parent);
}
protected List<String> listFiles(File parent, File folder) {
List<String> lstFiles = new ArrayList<String>(25);
if (folder.isDirectory()) {
File[] files = folder.listFiles();
if (files != null) {
for (File file : files) {
if (file.isDirectory()) {
lstFiles.addAll(listFiles(parent, file));
} else {
String path = file.getPath();
String offset = parent.getPath();
path = path.substring(offset.length());
lstFiles.add(path);
}
}
}
}
return lstFiles;
}
}
You could simply do a normal folder recursion, returning a list of files, and THEN strip off the prefix, but that's up to you.
What about using the absolute path you currently have but removing the prefix from it using String.replace
You said you had the full, absolute path, say in full
then just do
String relative = full.replace(prefix, "");
If you have the input "/home/user/text", all absolute paths to files will start with /home/user/text/. If you're already able to print a list of all files under text/, then all you need to do is take the suitable substring.
The following function should visit all files under pathToDir. In the printFileName function, you can remove the /home/user/text part and print the file names
public static void gotoAllFiles(File pathToDir) {
if (pathToDir.isDirectory()) {
String[] subdirs = pathToDir.list();
for (int i=0; i<subdirs.length; i++) {
gotoAllFiles(new File(pathToDir, subdirs[i]));
}
} else {
printFileName(pathToDir);
}
}
For each file found, print the file.getAbsolutePath().substring(rootPath.length());
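Instead of cutting the prefix off with substring, java.nio.file.Path can compute the relative part itself via relativize. A minimal sketch assuming Java 8 or later; RelativeLister and relativePaths are hypothetical names. Note that the results have no leading separator (for example subdir1/file1.txt rather than /subdir1/file1.txt), so one would have to be prepended if the exact format from the question is required.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RelativeLister {
    // Returns the path of every regular file under root, expressed
    // relative to root and sorted alphabetically.
    static List<String> relativePaths(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Because relativize works on path elements rather than characters, it is not affected by a trailing separator on the root or by the root string appearing again deeper in the path.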
