I have a directory with 100,000 files and I need to iterate over all of them to read a value. Right now I use listFiles() to load all the files into an array and then iterate over them one by one. Is there a memory-efficient way to do this without loading them into an array?
File[] tFiles = new File(Dir).listFiles();
for (final File tFile : tFiles) {
    // Process files one by one
}
Since Java 7, you can use the file visitor pattern to visit the contents of a directory recursively.
The documentation for the FileVisitor interface is here.
This allows you to iterate over files without creating a large array of File objects.
Simple example to print out your file names:
Path start = Paths.get(new URI("file:///my/folder/"));
Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            throws IOException
    {
        System.out.println(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException e)
            throws IOException
    {
        if (e == null) {
            System.out.println(dir);
            return FileVisitResult.CONTINUE;
        }
        else {
            // directory iteration failed
            throw e;
        }
    }
});
Java 8 lazily loaded stream version:
Files.list(new File("path to directory").toPath()).forEach(path -> {
    File file = path.toFile();
    // process your file
});
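If everything sits in one flat directory, a DirectoryStream is another way to get the entries one at a time instead of as one big array. A minimal sketch, assuming Java 7+; the names LazyDirectoryScan and processAll are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyDirectoryScan {
    // Visits each regular file directly inside dir, one entry at a time,
    // and returns how many files were seen.
    static long processAll(Path dir) throws IOException {
        long count = 0;
        // try-with-resources closes the underlying directory handle
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    // read the value you need from 'entry' here
                    count++;
                }
            }
        }
        return count;
    }
}
```

The stream returned by Files.list holds an open directory handle in the same way, so it is probably worth wrapping in try-with-resources as well.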
If you want to avoid the excessive boilerplate that comes with JDK's FileVisitor, you can use Guava. Files.fileTreeTraverser() gives you a TreeTraverser<File> which you can use for traversing the files in the folder (or even sub-folders):
for (File f : Files.fileTreeTraverser()
        .preOrderTraversal(new File("/parent/folder"))) {
    // do something with each file
}
I have a directory structure of the form:
base_directory / level_one_a, level_one_b, level_one_c /
then within all those directories in level_one_x are a multitude of subsequent directories, i.e.
/ level_one_a_1,level_one_a_2,level_one_a_3...
and so on for level_one_b & level_one_c
then inside of level_one_a_1 we have more still, i.e. level_one_a_1_I,level_one_a_1_II,level_one_a_1_III,level_one_a_1_IV...
Then finally inside of level_one_a_1_IV, and all those on the same level, are the files I want to operate on.
I guess a shorter way to say that would be start/one/two/three/*files*
There are a great many files, and I want to process them all with a simple Java program I wrote:
try
{
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();
    while (line != null)
    {
        sb.append(line);
        sb.append(System.lineSeparator());
        line = br.readLine();
    }
    String everything = sb.toString();
    Document doc = Jsoup.parse(everything);
    String link = doc.select("block.full_text").text();
    System.out.println(link);
}
finally
{
    br.close();
}
It uses jsoup.
I'd like to restructure this program so that it navigates the directory structure on its own, grabs each file and processes it with the code above, presumably using a BufferedReader and a FileReader. How can I do that? I tried implementing this solution but I couldn't get it to work.
Ideally I want to write the output for each file it processes under a unique name, e.g. if the file is named 00001.txt it might be saved as 00001_output.txt, but that's a horse of a different colour.
Just use java.io.File and its method listFiles.
See javadoc File API
Similar question on SO was posted here:
Recursively list files in Java
You can achieve this also by using the Java NIO 2 API.
public class ProcessFiles extends SimpleFileVisitor<Path> {
    static final String OUT_FORMAT = "%-17s: %s%n";
    static final int MAX_DEPTH = 4;
    static final Path baseDirectory = Paths.get("R:/base_directory");

    public static void main(String[] args) throws IOException {
        Set<FileVisitOption> visitOptions = new HashSet<>();
        visitOptions.add(FileVisitOption.FOLLOW_LINKS);
        Files.walkFileTree(baseDirectory, visitOptions, MAX_DEPTH,
                new ProcessFiles()
        );
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
        if (file.getNameCount() <= MAX_DEPTH) {
            System.out.printf(OUT_FORMAT, "skip wrong level", file);
            return FileVisitResult.SKIP_SUBTREE;
        } else {
            // add probably a file name check
            System.out.printf(OUT_FORMAT, "process file", file);
            return FileVisitResult.CONTINUE;
        }
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attr) {
        if (dir.getNameCount() < MAX_DEPTH) {
            System.out.printf(OUT_FORMAT, "walk into dir", dir);
            return FileVisitResult.CONTINUE;
        }
        if (dir.getName(MAX_DEPTH - 1).toString().equals("level_one_a_1_IV")) {
            System.out.printf(OUT_FORMAT, "destination dir", dir);
            return FileVisitResult.CONTINUE;
        } else {
            System.out.printf(OUT_FORMAT, "skip dir name", dir);
            return FileVisitResult.SKIP_SUBTREE;
        }
    }
}
Assuming the following directory/file structure:
base_directory
base_directory/base_directory.file
base_directory/level_one_a
base_directory/level_one_a/level_one_a.file
base_directory/level_one_a/level_one_a_1
base_directory/level_one_a/level_one_a_1/level_one_a_1.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_I
base_directory/level_one_a/level_one_a_1/level_one_a_1_I/level_one_a_1_I.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_II
base_directory/level_one_a/level_one_a_1/level_one_a_1_II/level_one_a_1_II.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_III
base_directory/level_one_a/level_one_a_1/level_one_a_1_III/level_one_a_1_III.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV/level_one_a_1_IV.file
base_directory/someother_a
base_directory/someother_a/someother_a.file
base_directory/someother_a/someother_a_1
base_directory/someother_a/someother_a_1/someother_a_1.file
base_directory/someother_a/someother_a_1/someother_a_1_I
base_directory/someother_a/someother_a_1/someother_a_1_I/someother_a_1_I.file
base_directory/someother_a/someother_a_1/someother_a_1_II
base_directory/someother_a/someother_a_1/someother_a_1_II/someother_a_1_II.file
base_directory/someother_a/someother_a_1/someother_a_1_III
base_directory/someother_a/someother_a_1/someother_a_1_III/someother_a_1_III.file
base_directory/someother_a/someother_a_1/someother_a_1_IV
base_directory/someother_a/someother_a_1/someother_a_1_IV/someother_a_1_IV.file
you would get the following output (for demonstration):
walk into dir : R:\base_directory
skip wrong level : R:\base_directory\base_directory.file
walk into dir : R:\base_directory\level_one_a
skip wrong level : R:\base_directory\level_one_a\level_one_a.file
walk into dir : R:\base_directory\level_one_a\level_one_a_1
skip wrong level : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1.file
skip dir name : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_I
skip dir name : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_II
skip dir name : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_III
destination dir : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV
process file : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV\level_one_a_1_IV.file
walk into dir : R:\base_directory\someother_a
skip wrong level : R:\base_directory\someother_a\someother_a.file
walk into dir : R:\base_directory\someother_a\someother_a_1
skip wrong level : R:\base_directory\someother_a\someother_a_1\someother_a_1.file
skip dir name : R:\base_directory\someother_a\someother_a_1\someother_a_1_I
skip dir name : R:\base_directory\someother_a\someother_a_1\someother_a_1_II
skip dir name : R:\base_directory\someother_a\someother_a_1\someother_a_1_III
skip dir name : R:\base_directory\someother_a\someother_a_1\someother_a_1_IV
Some links to the Oracle tutorial for further reading:
Walking the File Tree
Finding Files
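To tie this back to the question, here is a rough sketch of walking start/one/two/three/*files* and writing one output file per input. It assumes the files sit exactly 'depth' levels below the base directory; TreeProcessor, outputName and processTree are hypothetical names, and the trim() call is only a placeholder for the jsoup step from the question:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TreeProcessor {
    // "00001.txt" -> "00001_output.txt"; names without an extension just get the suffix.
    static String outputName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0
                ? fileName + "_output"
                : fileName.substring(0, dot) + "_output" + fileName.substring(dot);
    }

    // Processes every regular file exactly 'depth' levels below base and
    // writes the result next to it under the derived output name.
    static void processTree(Path base, int depth) throws IOException {
        List<Path> inputs;
        // Collect the inputs first so freshly written output files are not picked up.
        try (Stream<Path> paths = Files.walk(base, depth)) {
            inputs = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> base.relativize(p).getNameCount() == depth)
                    .collect(Collectors.toList());
        }
        for (Path input : inputs) {
            String content = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
            String result = content.trim(); // placeholder for the jsoup parsing step
            Path output = input.resolveSibling(outputName(input.getFileName().toString()));
            Files.write(output, result.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

For the layout in the question the call would be something like TreeProcessor.processTree(Paths.get("R:/base_directory"), 4). Note that a second run would also pick up the *_output files written by the first one, so a file name check is probably still needed.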
I'm curious whether there's a way to define a parent folder and then have a program cycle through all of its files and sub-folders and rename the file extensions.
I know this can be done in the command prompt using the command "*.ext *.newext"; however, that's not a possible solution for me, and I need to rename 2,719 file extensions that are nested inside this folder.
Yes, you can do it. Here's an example:
// java 6
File parentDir = new File("..");
System.out.println(parentDir.getAbsolutePath());
final File[] files = parentDir.listFiles();
System.out.println(Arrays.toString(files));

// java 7+
File parentDir = new File("..");
try {
    Files.walkFileTree(parentDir.toPath(), new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            if (file.toFile().renameTo(new File("othername.txt"))) {
                return FileVisitResult.CONTINUE;
            } else {
                return FileVisitResult.TERMINATE;
            }
        }
    });
} catch (IOException e) {
    e.printStackTrace();
}
The Java 6 version does not go through subdirectories, but it is easy to modify that way; walkFileTree in the Java 7+ version already descends into them.
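For the actual task of swapping an extension, a sketch along these lines might do, assuming Java 7+ and extensions passed without the leading dot. ExtensionRenamer and renameExtensions are made-up names:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class ExtensionRenamer {
    // Renames every "*.oldExt" file under root (sub-folders included) to "*.newExt".
    // Returns the number of renamed files.
    static int renameExtensions(Path root, String oldExt, String newExt) throws IOException {
        final int[] renamed = {0};
        final String suffix = "." + oldExt;
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                String name = file.getFileName().toString();
                if (name.endsWith(suffix)) {
                    String base = name.substring(0, name.length() - suffix.length());
                    // Files.move throws if the target already exists
                    Files.move(file, file.resolveSibling(base + "." + newExt));
                    renamed[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return renamed[0];
    }
}
```

Unlike File.renameTo, which only returns false, Files.move reports the reason for a failure through an exception.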
Here's a simple function that should do the job for you. Sorry, if it's not the most elegant code -- my Java is a little rusty.
By default, it's recursive in its implementation; so be aware that it will affect all files of the specified type in the parent directory!
SwapFileExt params
path the parent directory you want to parse
cExt the extension type that you want to replace
nExt the desired extension type
NOTE: Both cExt and nExt are to be represented without the '.' (e.g. "txt", not ".txt")
public static void SwapFileExt(String path, String cExt, String nExt) {
    File parentDir = new File(path);
    File[] contents = parentDir.listFiles();
    if (contents == null) {
        return; // not a directory, or an I/O error occurred
    }
    for (File item : contents) {
        if (item.isFile()) {
            String name = item.getName();
            // Match the extension only at the end of the name; replaceAll would treat
            // "." as a regex wildcard and also rewrite matches elsewhere in the path.
            if (name.endsWith("." + cExt)) {
                String newName = name.substring(0, name.length() - cExt.length()) + nExt;
                item.renameTo(new File(parentDir, newName));
            }
        } else if (item.isDirectory()) {
            SwapFileExt(item.getPath(), cExt, nExt);
        }
    }
}
I am looking for a way to get the list of files inside a zip file. I created a method to get the list of files inside a directory, but I also want to list the files inside a zip instead of showing just the zip file itself.
Here is my method:
public ArrayList<String> listFiles(File f, String min, String max) {
    try {
        // parse input strings into date format
        Date minDate = sdf.parse(min);
        Date maxDate = sdf.parse(max);
        //
        File[] list = f.listFiles();
        for (File file : list) {
            double bytes = file.length();
            double kilobytes = (bytes / 1024);
            if (file.isFile()) {
                String fileDateString = sdf.format(file.lastModified());
                Date fileDate = sdf.parse(fileDateString);
                if (fileDate.after(minDate) && fileDate.before(maxDate)) {
                    lss.add("'" + file.getAbsolutePath() +
                            "'" + " Size KB:" + kilobytes + " Last Modified: " +
                            sdf.format(file.lastModified()));
                }
            } else if (file.isDirectory()) {
                listFiles(file.getAbsoluteFile(), min, max);
            }
        }
    } catch (Exception e) {
        e.getMessage();
    }
    return lss;
}
After having searched for a better answer for a while, I finally found a better way to do this. You can actually do the same thing in a more generic way using the Java NIO API (Since Java 7).
// this is the URI of the zip file itself; the zip file system provider
// expects the "jar:" scheme, e.g. jar:file:///path/to/archive.zip
URI zipUri = ...;
FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap());
// The path within the zip file you want to start from
Path root = zipFs.getPath("/");
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
        // You can do anything you want with the path here
        System.out.println(path);
        // the BasicFileAttributes object has lots of useful meta data
        // like file size, last modified date, etc...
        return FileVisitResult.CONTINUE;
    }
    // The FileVisitor interface has more methods that
    // are useful for handling directories.
});
This approach has the advantage that you can traverse ANY file system this way: your normal Windows or Unix file system, the file system contained within a zip or a jar, or any other really.
You can then trivially read the contents of any Path via the Files class, using methods like Files.copy(), Files.readAllLines(), Files.readAllBytes(), etc...
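A compact variant that opens the archive from a Path rather than a URI could look like the sketch below. ZipLister and listEntries are hypothetical names; the (Path, ClassLoader) overload of FileSystems.newFileSystem has been available since Java 7:

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ZipLister {
    // Returns the paths of all regular files inside the archive, e.g. "/dir/a.txt".
    static List<String> listEntries(Path zipFile) throws IOException {
        // The cast selects the (Path, ClassLoader) overload unambiguously.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipFile, (ClassLoader) null);
             Stream<Path> paths = Files.walk(zipFs.getPath("/"))) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Closing the FileSystem matters here: a zip file system that is left open keeps the archive file open.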
You can use the ZipFile.entries() method to read the list of files by iterating over it, as below:
File[] fList = directory.listFiles();
for (File file : fList)
{
    // open each archive via try-with-resources so that it is closed again
    try (ZipFile myZipFile = new ZipFile(file))
    {
        Enumeration<? extends ZipEntry> zipEntries = myZipFile.entries();
        while (zipEntries.hasMoreElements())
        {
            System.out.println(zipEntries.nextElement().getName());
            // you can do whatever you want with each zip entry
        }
    }
}
public class Sorter {
    String dir1 = ("C:/Users/Drew/Desktop/test");
    String dir2 = ("C:/Users/Drew/Desktop/");

    public void SortingAlgo() throws IOException {
        // Declare files for moving
        File sourceDir = new File(dir1);
        File destDir = new File(dir2);
        //Get files, list them, grab only mp3 out of the pack, and sort
        File[] listOfFiles = sourceDir.listFiles();
        if(sourceDir.isDirectory()) {
            for(int i = 0; i < listOfFiles.length; i++) {
                //list Files
                System.out.println(listOfFiles[i]);
                String ext = FilenameUtils.getExtension(dir1);
                System.out.println(ext);
            }
        }
    }
}
I am trying to filter out only the .mp3 files in my program. I'm obviously a beginner and tried copying some things off of Google and this website. How can I set a directory (sourceDir) and move those filtered files to their own folder?
File provides the ability to filter the file list as it's being generated.
File[] listOfFiles = sourceDir.listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".mp3");
    }
});
This has a number of benefits, chief among them that you don't need to post-process the list again or hold two lists in memory at the same time.
It is also pluggable: you could, for instance, create an MP3FileFilter class and re-use it.
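Such a reusable filter might look roughly like this. It is only a sketch: the class name follows the suggestion above, and the extra isFile() check is an assumption about what you want (it keeps out directories that happen to be named like an mp3):

```java
import java.io.File;
import java.io.FileFilter;

// Reusable filter: accepts regular files whose name ends in ".mp3" (any letter case).
class MP3FileFilter implements FileFilter {
    @Override
    public boolean accept(File pathname) {
        return pathname.isFile()
                && pathname.getName().toLowerCase().endsWith(".mp3");
    }
}
```

It is then used as sourceDir.listFiles(new MP3FileFilter()).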
I find the NIO.2 approach using globs or a custom filter the cleanest solution. Here is an example using a glob; the link below also covers custom filters:
Path directoryPath = Paths.get("C:", "Program Files/Java/jdk1.7.0_40/src/java/nio/file");
if (Files.isDirectory(directoryPath)) {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(directoryPath, "*.mp3")) {
        for (Path path : stream) {
            System.out.println(path);
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
For more information about content listing and directory filtering visit Listing and filtering directory contents in NIO.2
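The question also asks how to move the filtered files into their own folder. A possible sketch on top of the glob approach, with Mp3Mover and moveMp3s as made-up names; whether the "*.mp3" glob also matches "*.MP3" depends on the platform:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class Mp3Mover {
    // Moves every "*.mp3" file directly inside sourceDir into destDir
    // (created if missing) and returns how many files were moved.
    static int moveMp3s(Path sourceDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        int moved = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir, "*.mp3")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    // overwrite a file of the same name in the destination, if any
                    Files.move(file, destDir.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    moved++;
                }
            }
        }
        return moved;
    }
}
```

REPLACE_EXISTING is a design choice here; drop it if an existing file in the destination should cause an exception instead.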
if (ext.equalsIgnoreCase("mp3")) { // FilenameUtils.getExtension returns the extension without the dot
    // do whatever you want
}
I want to delete all files inside the ABC directory.
When I tried FileUtils.deleteDirectory(new File("C:/test/ABC/")); it also deleted the ABC folder itself.
Is there a one-liner solution that deletes the files inside the directory but not the directory itself?
import org.apache.commons.io.FileUtils;
FileUtils.cleanDirectory(directory);
The same class provides this method. It also recursively deletes all sub-folders and the files under them.
Docs: org.apache.commons.io.FileUtils.cleanDirectory
Do you mean something like this?
for (File file : dir.listFiles())
    if (!file.isDirectory())
        file.delete();
This will only delete files, not directories.
Peter Lawrey's answer is great because it is simple and does not depend on anything special, and it's the way you should do it. If you need something that removes subdirectories and their contents as well, use recursion:
void purgeDirectory(File dir) {
    for (File file : dir.listFiles()) {
        if (file.isDirectory())
            purgeDirectory(file);
        file.delete();
    }
}
To spare subdirectories and their contents (part of your question), modify as follows:
void purgeDirectoryButKeepSubDirectories(File dir) {
    for (File file : dir.listFiles()) {
        if (!file.isDirectory())
            file.delete();
    }
}
Or, since you wanted a one-line solution:
for (File file : dir.listFiles())
    if (!file.isDirectory())
        file.delete();
Using an external library for such a trivial task is not a good idea unless you need this library for something else anyway, in which case it is preferable to use existing code. You appear to be using the Apache library anyway, so use its FileUtils.cleanDirectory() method.
Java 8 Stream
This deletes only the files directly inside ABC (sub-directories are untouched; without the filter, empty sub-directories would be deleted too):
Arrays.stream(new File("C:/test/ABC/").listFiles()).filter(File::isFile).forEach(File::delete);
This deletes only files from ABC and its sub-directories (the directories themselves are kept):
Files.walk(Paths.get("C:/test/ABC/"))
        .filter(Files::isRegularFile)
        .map(Path::toFile)
        .forEach(File::delete);
^ This version requires handling the IOException
Or use this in Java 8:
try {
    Files.newDirectoryStream( directory ).forEach( file -> {
        try { Files.delete( file ); }
        catch ( IOException e ) { throw new UncheckedIOException(e); }
    } );
}
catch ( IOException e ) {
    e.printStackTrace();
}
It's a pity the exception handling is so bulky, otherwise it would be a one-liner ...
import java.io.File;

public class DeleteFile {
    public static void main(String[] args) {
        String path = "D:\\test"; // the backslash must be escaped, "\t" would be a tab
        File file = new File(path);
        File[] files = file.listFiles();
        for (File f : files) {
            if (f.isFile() && f.exists()) {
                f.delete();
                System.out.println("successfully deleted");
            } else {
                System.out.println("cant delete a file due to open or error");
            }
        }
    }
}
rm -rf was much more performant than FileUtils.cleanDirectory.
Not a one-liner solution but after extensive benchmarking, we found that using rm -rf was multiple times faster than using FileUtils.cleanDirectory.
Of course, if you have a small or simple directory, it won't matter but in our case we had multiple gigabytes and deeply nested sub directories where it would take over 10 minutes with FileUtils.cleanDirectory and only 1 minute with rm -rf.
Here's our rough Java implementation to do that:
// Delete directory given and all subdirectories and files (i.e. recursively).
//
public static boolean clearDirectory( File file ) throws IOException, InterruptedException {
    if ( file.exists() ) {
        // Pass the arguments as an array so that paths containing spaces are not split.
        String[] deleteCommand = { "rm", "-rf", file.getAbsolutePath() };
        Runtime runtime = Runtime.getRuntime();
        Process process = runtime.exec( deleteCommand );
        process.waitFor();
        file.mkdirs(); // Since we only want to clear the directory and not delete it, we need to re-create the directory.
        return true;
    }
    return false;
}
Worth trying if you're dealing with large or complex directories.
For deleting all files from a directory, say "C:\Example":
File file = new File("C:\\Example");
String[] myFiles;
if (file.isDirectory()) {
    myFiles = file.list();
    for (int i = 0; i < myFiles.length; i++) {
        File myFile = new File(file, myFiles[i]);
        myFile.delete();
    }
}
Another Java 8 Stream solution to delete all the content of a folder, sub directories included, but not the folder itself.
Usage:
Path folder = Paths.get("/tmp/folder");
CleanFolder.clean(folder);
and the code:
public interface CleanFolder {
    static void clean(Path folder) throws IOException {
        Function<Path, Stream<Path>> walk = p -> {
            try {
                return Files.walk(p);
            } catch (IOException e) {
                return Stream.empty();
            }
        };
        Consumer<Path> delete = p -> {
            try {
                Files.delete(p);
            } catch (IOException e) {
            }
        };
        Files.list(folder)
                .flatMap(walk)
                .sorted(Comparator.reverseOrder())
                .forEach(delete);
    }
}
The problem with every stream solution involving Files.walk or Files.delete is that these methods throw IOException, which is a pain to handle in streams.
I tried to create a solution that is as concise as possible.
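If silently swallowing the exceptions is not acceptable, a FileVisitor avoids the lambda problem, because its callbacks are allowed to throw IOException. A sketch under that assumption, with VisitorCleanFolder as a made-up name:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class VisitorCleanFolder {
    // Deletes everything below folder but keeps folder itself.
    // Any IOException aborts the walk and reaches the caller.
    static void clean(final Path folder) throws IOException {
        Files.walkFileTree(folder, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException e)
                    throws IOException {
                if (e != null) {
                    throw e; // listing this directory failed
                }
                if (!dir.equals(folder)) {
                    Files.delete(dir); // the directory is empty by now
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

It is longer than the stream version, but nothing is swallowed: the first failed delete stops the walk.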
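I think this will work (based on NonlinearFruit's previous answer). The root is compared as a Path, because the string returned by File.getPath() has no trailing slash (and uses backslashes on Windows), so it would never equal "C:/test/ABC/":
Path root = Paths.get("C:/test/ABC/");
Files.walk(root)
        .sorted(Comparator.reverseOrder())
        .filter(path -> !path.equals(root))
        .map(Path::toFile)
        .forEach(File::delete);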
Cheers!
package com;

import java.io.File;

public class Delete {
    public static void main(String[] args) {
        String files;
        File file = new File("D:\\del\\yc\\gh");
        File[] listOfFiles = file.listFiles();
        for (int i = 0; i < listOfFiles.length; i++)
        {
            if (listOfFiles[i].isFile())
            {
                files = listOfFiles[i].getName();
                System.out.println(files);
                if(!files.equalsIgnoreCase("Scan.pdf"))
                {
                    boolean issuccess=new File(listOfFiles[i].toString()).delete();
                    System.err.println("Deletion Success "+issuccess);
                }
            }
        }
    }
}
If you want to delete all files, remove the
if(!files.equalsIgnoreCase("Scan.pdf"))
check and it will work.