I have a program that needs to read files. I need to check every 10 seconds whether there are new files.
To do that, I've written this:
ArrayList<File> oldFiles = new ArrayList<File>();
ArrayList<File> files = new ArrayList<File>();
while (!isFinished) {
    files = listFilesForFolder(folder);
    if (oldFiles.size() != files.size()) {
        System.out.println("Here is when a new file(s) is(are) in the folder");
    }
    Thread.sleep(10000);
}
Basically, listFilesForFolder takes a folder destination and collects the files in there.
My problem: on every loop, my reading function runs on every file. I want to run my reading function ONLY on new files.
How can I do something like :
new files - old files = my files I want to read.
Rather than your approach, why not store the DateTime of the last time you checked, then compare that time to each file's File.lastModified value?
The problem with your approach is that the array sizes will differ even if a file is deleted, and will be the same if one file is deleted and another is added.
Rather than comparing old and new files, why not write a method that just returns the recently modified files:
public static ArrayList<File> listLastModifiedFiles(File folder,
        long sleepDuration) throws Exception {
    ArrayList<File> newFileList = new ArrayList<File>();
    for (File fileEntry : folder.listFiles())
        if ((System.currentTimeMillis() - fileEntry.lastModified()) <= sleepDuration)
            newFileList.add(fileEntry);
    return newFileList;
}
// Sample usage:
long sleepDuration = 10000;
ArrayList<File> newFileList;
int counter = 10;
while (counter-- > 0) {
    newFileList = listLastModifiedFiles(folder, sleepDuration);
    for (File file : newFileList)
        System.out.println(file.getName());
    Thread.sleep(sleepDuration);
}
You can use sets: instead of returning an ArrayList, return a Set.
newFiles.removeAll(oldFiles);
would then give you all the files that are not in the old set. I'm not saying that working with the modification date as Scary Wombat has pointed out is a worse idea, I'm just offering another solution.
Additionally, you have to modify your oldFiles to hold all files you've already encountered. I think the following example does what you're trying to achieve.
private static Set<File> findFilesIn(File directory) {
    // Or whatever logic you have for finding files
    return new HashSet<File>(Arrays.asList(directory.listFiles()));
}

public static void main(String[] args) throws Throwable {
    Set<File> allFiles = new HashSet<File>(); // Renamed from oldFiles
    Set<File> newFiles = new HashSet<File>();
    File dir = new File("/tmp/stackoverflow/");
    while (true) {
        allFiles.addAll(newFiles); // Add files from last round to collection of all files
        newFiles = findFilesIn(dir);
        newFiles.removeAll(allFiles); // Remove all the ones we already know.
        System.out.println(String.format("Found %d new files: %s", newFiles.size(), newFiles));
        System.out.println("Sleeping...");
        Thread.sleep(5000);
    }
}
Sets are a more appropriate data structure for your case, since you don't need any ordering in your collection of files and can benefit from faster lookup times (when using a HashSet).
Assuming that you only need to detect new files, not modified ones, and no file will be removed while your code is running:
ArrayList implements removeAll(Collection c), which does exactly what you want:
Removes from this list all of its elements that are contained in the
specified collection.
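As a minimal sketch of that difference (the file names here are made up for the demo), removeAll keeps exactly the entries that were not seen before:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class RemoveAllDemo {
    public static void main(String[] args) {
        List<File> oldFiles = new ArrayList<File>();
        oldFiles.add(new File("a.txt"));
        oldFiles.add(new File("b.txt"));

        List<File> files = new ArrayList<File>();
        files.add(new File("a.txt"));
        files.add(new File("b.txt"));
        files.add(new File("c.txt")); // newly appeared

        // "new files - old files": drop everything already known
        files.removeAll(oldFiles);
        System.out.println(files); // [c.txt]
    }
}
```

This works because java.io.File implements equals/hashCode based on the abstract pathname, so two File objects for the same path compare equal.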
You might want to consider using the Java WatchService API, which uses low-level operating system facilities to notify you of changes to the file system. It's more efficient and faster than repeatedly listing the files in the directory.
There is a tutorial at Watching a Directory for Changes and the API is documented here: Interface WatchService
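A minimal sketch of the idea (the temp directory and file name are made up for the demo; a real program would loop on take() or poll() instead of exiting after one event):

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

public class WatchDemo {
    // Returns the name of the first created entry seen, or null on timeout.
    static String awaitCreate(WatchService watcher, long seconds) throws InterruptedException {
        WatchKey key = watcher.poll(seconds, TimeUnit.SECONDS); // poll() so we cannot block forever
        if (key == null) {
            return null;
        }
        String name = null;
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                name = event.context().toString(); // relative path of the new entry
            }
        }
        key.reset(); // re-arm the key to keep receiving events
        return name;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("watch-demo");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        Files.createFile(dir.resolve("new-file.txt")); // trigger an event
        System.out.println("created: " + awaitCreate(watcher, 15));
        watcher.close();
    }
}
```

Note that on some platforms (e.g. macOS) the default WatchService falls back to polling, so events can take several seconds to arrive.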
Related
I'm using a method from this site to read all the files on the system's hard drives. It works fine, but I need to check whether a certain file exists while searching.
To make the story short, here is the line of code which reads the files:
parseAllFiles(f.getAbsolutePath());
How can I assign the output of this method to a string so I can search inside it for the file I want? Or is there any way to add to or change this statement to get the filename directly in a string?
public static void parseAllFiles(String parentDirectory) {
    File[] filesInDirectory = new File(parentDirectory).listFiles();
    if (filesInDirectory != null) {
        for (File f : filesInDirectory) {
            if (f.isDirectory()) {
                parseAllFiles(f.getAbsolutePath()); // get full path
            }
            System.out.println("Current File -> " + f);
        }
    }
}
Use objects rather than strings since they tend to offer useful functionality that strings don’t offer. In your case pass a File object or a Path object to your recursive method. I take it that you start out from a string, so have your public method accept a string and construct the first object.
public static void parseAllFiles(String parentDirectory) {
    parseAllFiles(new File(parentDirectory));
}

private static void parseAllFiles(File dir) {
    File[] filesInDirectory = dir.listFiles();
    if (filesInDirectory != null) {
        for (File f : filesInDirectory) {
            String fileName = f.getName();
            String fullPathName = f.getAbsolutePath();
            System.out.println("Current File -> " + fileName);
            System.out.println("Current path -> " + fullPathName);
            if (f.isDirectory()) {
                parseAllFiles(f);
            }
        }
    }
}
I didn’t get whether you wanted only the file name or the full path name of the file, but the code shows how to extract each into a string. You may then search inside these two strings for whatever you are looking for.
java.nio since Java 7
I routinely use java.nio.file for file system operations. For everyday purposes I don’t find it better to work with than the older File class, but it does offer a wealth of options that the older class doesn’t. @Shahar Rotshtein in a comment mentioned the FileVisitor interface from java.nio.file. Depending on your exact requirements, Files.walkFileTree with your own FileVisitor may be the best option for you. I have not understood your real requirements well enough to offer a code example.
Links
java.nio.file documentation
Section Walking the file tree in the Oracle tutorial: Basic I/O
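As a generic starting point only (not tailored to the asker's actual requirements), a Files.walkFileTree sketch that collects every regular file under a directory could look like this:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class WalkDemo {
    static List<Path> collectFiles(Path start) throws IOException {
        final List<Path> result = new ArrayList<Path>();
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                result.add(file); // called once for every regular file, recursively
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("walk-demo");
        Files.createFile(dir.resolve("a.txt"));
        System.out.println(collectFiles(dir));
    }
}
```

From there you can search the returned list (or the individual path strings) for whatever file you are looking for.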
I have a few files in a folder. I want to read the first file and perform some operations on it; after the first iteration, I want to read the files from the 2nd one onwards to perform a different set of operations.
How can I do this?
File folder = new File(Folder);
File[] listOfFiles = folder.listFiles();
for (File file : listOfFiles) {
    // Do something
}
// Here I want to read from the 2nd file to do a different set of operations
Get the first file as listOfFiles[0] and do operation 1 on it.
Then use a simple (regular) for loop starting at index 1:
for (int i = 1; i < listOfFiles.length; i++) {
    File currentFile = listOfFiles[i];
    // do operation 2 with currentFile
}
Note from the javadoc of File.listFiles:
There is no guarantee that the name strings in the resulting array will appear in any specific order; they are not, in particular, guaranteed to appear in alphabetical order.
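If you do need a deterministic order, sort the array yourself before iterating. A small sketch that sorts alphabetically by file name:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class SortByName {
    // Sort a listFiles() result alphabetically by file name.
    static void sortByName(File[] files) {
        Arrays.sort(files, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                return a.getName().compareTo(b.getName());
            }
        });
    }

    public static void main(String[] args) {
        File[] files = { new File("b.txt"), new File("a.txt") };
        sortByName(files);
        System.out.println(Arrays.toString(files)); // [a.txt, b.txt]
    }
}
```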
As per your comment, it seems like you don't need the list of files to be sorted in some particular order before you process them. In that case,
File folder = new File(Folder);
File[] listOfFiles = folder.listFiles();
// use a normal for loop to keep track of the index
for (int i = 0; i < listOfFiles.length; i++) {
    // the file in the current index of iteration
    File currentFile = listOfFiles[i];
    if (i == 0) {
        // Do something with the first file
    } else {
        // Here I want to read from the 2nd file to do a different set of operations
    }
}
In the above code, put the operation code for the first file in the if block and the code for the rest of the files in the else block.
Don't use java.io.File anymore; use java.nio, which has been around for a couple of years now! Besides being easier to use (including Java 8 streams), one benefit is that you're not restricted to the system's default file system. You might even use an in-memory file system like Google's JimFS. As pointed out by others, there is no guarantee on the order of files. You may, however, introduce your own sorting:
FileSystem fs = FileSystems.getDefault(); // use a file system of your choice here!
Path folder = fs.getPath(...);
Files.list(folder) // returns a Stream<Path>; note that newDirectoryStream has no stream operations
        .sorted((a, b) -> { ... })
        // this realizes the skipping of the first element you initially requested:
        .skip(1)
        .forEach(f -> { ... });
If you want to perform action A on the first element and action B on the rest, it might get a little trickier: you might define a boolean firstFileHasBeenProcessed as external state that is set to true once you have processed the first file, but I am not sure whether the consumers for all files run strictly sequentially, or whether processing of the first file may be interrupted to start processing the second before that flag could be set.
You can always render the stream to an array ...

final Path[] allFiles = Files.list(folder)
        .sorted((a, b) -> { ... })
        .toArray(Path[]::new);

... or to a list, though, to gain more control:

final List<Path> allFiles = Files.list(folder)
        .sorted((a, b) -> { ... })
        .collect(Collectors.toList());
boolean firstHasBeenProcessed = false;
for (final Path currentFile : allFiles) {
    if (firstHasBeenProcessed) {
        processAsFollowUpFile(currentFile);
    } else {
        processAsFirstFile(currentFile);
        firstHasBeenProcessed = true;
    }
}
I have network-attached storage holding around 5 million txt files relating to around 3 million transactions. The total data size is around 3.5 TB. I have to search that location to find whether the file for each transaction is available or not, and produce two separate CSV reports: "available files" and "not available files". We are still on Java 6. The challenge I am facing: since I have to search the location recursively, a search takes around 2 minutes on average because of the huge size. I am using the Java I/O API to search recursively, like below. Is there any way I can improve the performance?
File searchFile(File location, String fileName) {
    if (location.isDirectory()) {
        File[] arr = location.listFiles();
        for (File f : arr) {
            File found = searchFile(f, fileName);
            if (found != null)
                return found;
        }
    } else {
        if (location.getName().equals(fileName)) {
            return location;
        }
    }
    return null;
}
You should take a different approach: rather than walking the entire directory tree every time you search for a file, create an index, i.e. a mapping from file name to file location.
Essentially:
void buildIndex(Map<String, File> index, File baseDir) {
    if (baseDir.isDirectory()) {
        for (File f : baseDir.listFiles()) {
            buildIndex(index, f);
        }
    } else {
        index.put(baseDir.getName(), baseDir);
    }
}
Now that you've got the index, searching for the files becomes trivial. And with the files in a Map, you can even use set operations to find the intersection:
Map<String, File> index = new HashMap<String, File>();
buildIndex(index, ...);
Set<String> fileSet = index.keySet();
Set<String> transactionSet = ...;
Set<String> intersection = new HashSet<String>(fileSet);
intersection.retainAll(transactionSet);
Optionally, if the index itself is too big to keep in memory, you may want to create the index in an SQLite database.
Searching a directory or network-attached storage is a nightmare; it takes a lot of time when the directory is too big or too deep. As you are on Java 6, you can follow an old-fashioned approach: list all files into a CSV file, like below.
find . -type f -name '*.txt' >> test.csv   (if Unix)
dir /b/s *.txt > test.csv                  (if Windows)
Now load this CSV file into a Map indexed by file name. Loading the file will take some time, as it is huge, but once loaded, lookups in the map (by file name) will be much quicker and will reduce your search time drastically.
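A Java 6-compatible sketch of that loading step (assuming the listing format produced by the commands above: one path per line):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ListingIndex {
    // Maps bare file name -> full path; the listing holds one path per line.
    static Map<String, String> loadIndex(String listingFile) throws IOException {
        Map<String, String> index = new HashMap<String, String>();
        BufferedReader reader = new BufferedReader(new FileReader(listingFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int slash = line.lastIndexOf('/'); // use '\\' for a Windows listing
                index.put(line.substring(slash + 1), line);
            }
        } finally {
            reader.close();
        }
        return index;
    }
}
```

Checking whether a transaction's file exists is then just index.containsKey(fileName).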
You can use the NIO FileVisitor, but note that it is only available since Java 7, not Java 6.
Path findTransactionFile(Path root) throws IOException {
    // holder array: an anonymous inner class cannot assign to a local variable
    final Path[] transactionFile = new Path[1];
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
            if (/* todo dir predicate */ false) {
                return FileVisitResult.SKIP_SUBTREE; // optimization
            }
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            if (/* todo file predicate */ true) {
                transactionFile[0] = file;
                return FileVisitResult.TERMINATE; // found
            }
            return FileVisitResult.CONTINUE;
        }
    });
    return transactionFile[0];
}
I don't know the answer off-hand, but from an algorithmic perspective your program has the worst possible complexity: per lookup for a single transaction it iterates all 5 million files, and you have 3 million transactions.
My suggestion is to iterate the files once (all 5 million) and build an index keyed on the file name, then iterate the transactions and search the index instead of doing a full scan.
Alternatively, there might be free third-party tools that can index a large file system and expose that index to an external application (in this case your Java app). If you cannot find such a tool, it is better to build one yourself, so that you can build the index in an optimal way that suits your requirements.
Hi there I have a problem dealing with some legacy code.
I need a way to get the changed File from the parseFile() method up to the calling doWithFileList() method.
public static void main(String[] args) throws IOException {
    File file1 = File.createTempFile("file1", ".tmp");
    File file2 = File.createTempFile("file2", ".tmp");
    ArrayList<File> fileList = new ArrayList<File>();
    fileList.add(file1);
    fileList.add(file2);
    doWithFileList(fileList);
}

static void doWithFileList(List<File> fileList) {
    for (File file : fileList) {
        String result = parseFile(file);
    }
    // Do something with the (now incorrect) file objects
    for (File file : fileList) {
        // always false here
        if (!file.exists()) {
            System.out.println("File does not exist anymore");
        }
    }
}

private static String parseFile(File file) {
    // 1. Get information from the File
    // 2. Use this information to load an object from the Database
    // 3. return some property of this object
    // 4. depending on another property of the DB object, rename the file
    file.renameTo(new File(file.getAbsoluteFile() + ".renamed"));
    return "valueParsedFromFile";
}
I know that File objects are immutable.
The problem is that in my real-world code the parseFile() method at the moment only does steps 1-3, but I need to add step 4.
The renaming is not a problem, but I need to get the new file name back to the calling method somehow.
In the real-life problem there is a deeper call hierarchy across multiple objects between those methods.
What would be the best way to get the changed name of the file back to the beginning of the call hierarchy, where I can change the object in the list?
My best guess at the moment would be to create a return object that holds both the String to return and the new File object. But then I would have to refactor a bunch of methods on my way up, so I would need to create a bunch of different return objects.
The following possibilities come to mind:
Pass a mutable object, e.g. a new String[1], and set it there. (Mega-ugly, because you have side effects and no longer a pure function. On the other hand: you already have side effects, go figure ;-))
Use a generic return object like String[], a Map, or a Pair implementation that you can find in various utilities (e.g. org.colllib.datastruct.Pair)
Use a hand-crafted return object
Personally, I'd probably go with (2), but it also might be (3)
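A hand-crafted return object (option 3) is only a few lines anyway; the names here are made up:

```java
import java.io.File;

// Hypothetical holder for the parsed value plus the possibly-renamed file.
public class ParseResult {
    private final String value;
    private final File file;

    public ParseResult(String value, File file) {
        this.value = value;
        this.file = file;
    }

    public String getValue() { return value; }
    public File getFile() { return file; }
}
```

parseFile would then return new ParseResult("valueParsedFromFile", renamedFile), and the caller can replace the stale entry in its list with result.getFile().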
Using a return object seems to be the only solution, as far as I know.
My program collects the paths of all files on the computer (OS: Ubuntu) into one Map.
The key in the Map is a file size, and the value is a list of canonical paths of files whose size equals that key.
Map<Long, ArrayList<String>> map = new HashMap<>(100000);
The total number of files on the computer is 281091.
The method that collects the files is recursive:
private void scanner(String path) throws Exception {
    File[] dirs = new File(path).listFiles(new FileFilter() {
        @Override
        public boolean accept(File file) {
            if (file.isFile() && file.canRead()) {
                try {
                    long size = file.length();
                    String canonPath = file.getCanonicalPath();
                    if (map.containsKey(size))
                        map.get(size).add(canonPath);
                    else
                        map.put(size, new ArrayList<>(Arrays.asList(canonPath)));
                } catch (IOException e) {
                    // getCanonicalPath can throw; accept() may not, so handle it here
                    e.printStackTrace();
                }
                return false;
            }
            return file.isDirectory() && file.canRead();
        }
    });
    for (File dir : dirs) {
        scanner(dir.getCanonicalPath());
    }
}
When I start scanning from the root folder "/", I get this exception:
Exception in thread "main" java.lang.StackOverflowError
at java.io.UnixFileSystem.canonicalize0(Native Method)
at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
at java.io.File.getCanonicalPath(File.java:589)
at taskB.FileScanner.setCanonPath(FileScanner.java:49)
at taskB.FileScanner.access$000(FileScanner.java:12)
at taskB.FileScanner$1.accept(FileScanner.java:93)
at java.io.File.listFiles(File.java:1217)
at taskB.FileScanner.scanner(FileScanner.java:85)
at taskB.FileScanner.scanner(FileScanner.java:109)
at taskB.FileScanner.scanner(FileScanner.java:109)
...
But as a test I filled the directory "~/Documents" with more than 400 thousand files and started scanning from there; everything worked fine.
Why do I get the exception when the program starts from the root directory "/", where there are fewer than 300 thousand files? What should I do to prevent this?
A StackOverflowError means that so many nested calls were made that the program ran out of stack space for the function-call information (which is retained until each call returns). In your case I suspect the recursion is entering the same directory more than once, so it never bottoms out.
The most likely explanation is that you have a symbolic link somewhere in the filesystem that creates a cycle (an infinite loop). For example, the following would be a cycle
/home/userid/test/data -> /home/userid
While scanning files you need to ignore symbolic links to directories.
@Jim Garrison was right: this was due to symbolic links. I found the solution to the problem here.
I use the isSymbolicLink(Path) method:
return file.isDirectory() && file.canRead() && !Files.isSymbolicLink(file.toPath());
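Put together, the directory check looks like this (a sketch; Files.isSymbolicLink is from java.nio.file):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkCheck {
    // Recurse only into real, readable directories; skip symlinked ones to avoid cycles.
    static boolean shouldRecurseInto(File file) {
        return file.isDirectory()
                && file.canRead()
                && !Files.isSymbolicLink(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        Path link = dir.resolveSibling(dir.getFileName() + "-link");
        Files.createSymbolicLink(link, dir); // may require privileges on Windows

        System.out.println(shouldRecurseInto(dir.toFile()));  // true
        System.out.println(shouldRecurseInto(link.toFile())); // false
    }
}
```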