I have a directory with files, directories, subdirectories, etc. How can I get the list of absolute paths to all files and directories using the Apache Hadoop API?
Using the HDFS API:
package org.myorg.hdfsdemo;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HdfsDemo {
public static void main(String[] args) throws IOException {
Configuration conf = new Configuration();
conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/core-site.xml"));
conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
System.out.println("Enter the directory name :");
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
Path path = new Path(br.readLine());
displayDirectoryContents(fs, path);
}
private static void displayDirectoryContents(FileSystem fs, Path rootDir) {
// Recursively print every file and directory below rootDir
try {
FileStatus[] status = fs.listStatus(rootDir);
for (FileStatus file : status) {
if (file.isDir()) {
System.out.println("This is a directory:" + file.getPath());
displayDirectoryContents(fs, file.getPath());
} else {
System.out.println("This is a file:" + file.getPath());
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Write a recursive function that takes a file and checks whether it is a directory. If it is, list all the files in it and, in a for loop, check each entry: if it is a directory, call the function recursively; otherwise, add the file to the list that is returned.
Something like the code below, though not exactly the same (here I am returning only .java files):
private static List<File> recursiveDir(File file) {
if (!file.isDirectory()) {
// Not a directory: return an empty list instead of null so callers can use addAll safely
return new ArrayList<File>();
}
List<File> returnList = new ArrayList<File>();
File[] files = file.listFiles();
if (files == null) {
// listFiles() returns null when the directory cannot be read
return returnList;
}
for (File f : files) {
if (!f.isDirectory()) {
if (f.getName().endsWith(".java")) {
returnList.add(f);
}
} else {
returnList.addAll(recursiveDir(f));
}
}
return returnList;
}
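With HDFS, you can also use the shell command hadoop fs -lsr <path> (deprecated in Hadoop 2 and later in favor of hadoop fs -ls -R <path>).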
Related
Please take a look at the code I have so far and if possible explain what I'm doing wrong. I'm trying to learn.
I made a little program to search for a type of file in a directory and all its sub-directories and copy them into another folder.
Code
import java.util.ArrayList;
import java.util.List;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
public class FandFandLoop {
public static void main(String[] args) {
final File folder = new File("C:/Users/ina/src");
List<String> result = new ArrayList<>();
search(".*\\.txt", folder, result);
File to = new File("C:/Users/ina/dest");
for (String s : result) {
System.out.println(s);
File from = new File(s);
try {
copyDir(from.toPath(), to.toPath());
System.out.println("done");
}
catch (IOException ex) {
ex.printStackTrace();
}
}
}
public static void copyDir(Path src, Path dest) throws IOException {
Files.walk(src)
.forEach(source -> {
try {
Files.copy(source, dest.resolve(src.relativize(source)),
StandardCopyOption.REPLACE_EXISTING);
} catch (IOException e) {
e.printStackTrace();
}
});
}
public static void search(final String pattern, final File folder, List<String> result) {
for (final File f : folder.listFiles()) {
if (f.isDirectory()) {
search(pattern, f, result);
}
if (f.isFile()) {
if (f.getName().matches(pattern)) {
result.add(f.getAbsolutePath());
}
}
}
}
}
It works, but what it actually does is take my .txt files and write them into another file named dest, without an extension.
At some point, it also deletes the folder dest.
The deletion happens because of StandardCopyOption.REPLACE_EXISTING, if I understand correctly, but what I wanted was that if several files have the same name, only one copy is kept.
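There is no need to call Files.walk on the matched source files. Each entry in result is already a single file, so Files.walk(src) yields only that file, src.relativize(source) is the empty path, and dest.resolve(...) is dest itself. Files.copy with REPLACE_EXISTING therefore replaces the (empty) dest directory with a file named dest, which every later match then overwrites.
You can improve this code by switching completely to java.nio.file.Path and not mixing string paths and File objects. Additionally, instead of calling File.listFiles() recursively, you can use Files.walk or, even better, Files.find.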
So you could instead use the following:
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.CopyOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Objects;
import java.util.function.BiPredicate;
import java.util.stream.Stream;
public class CopyFiles {
public static void copyFiles(Path src, Path dest, PathMatcher matcher, CopyOption... copyOptions) throws IOException {
// Argument validation
if (!Files.isDirectory(src)) {
throw new IllegalArgumentException("Source '" + src + "' is not a directory");
}
if (!Files.isDirectory(dest)) {
throw new IllegalArgumentException("Destination '" + dest + "' is not a directory");
}
Objects.requireNonNull(matcher);
Objects.requireNonNull(copyOptions);
BiPredicate<Path, BasicFileAttributes> filter = (path, attributes) -> attributes.isRegularFile() && matcher.matches(path);
// Use try-with-resources to close the stream as soon as it is no longer needed
try (Stream<Path> files = Files.find(src, Integer.MAX_VALUE, filter)) {
files.forEach(file -> {
Path destFile = dest.resolve(src.relativize(file));
try {
copyFile(file, destFile, copyOptions);
}
// Stream methods do not allow checked exceptions, have to wrap it
catch (IOException ioException) {
throw new UncheckedIOException(ioException);
}
});
}
// Wrap UncheckedIOException; cannot unwrap it to get actual IOException
// because then information about the location where the exception was wrapped
// will get lost, see Files.find doc
catch (UncheckedIOException uncheckedIoException) {
throw new IOException(uncheckedIoException);
}
}
private static void copyFile(Path srcFile, Path destFile, CopyOption... copyOptions) throws IOException {
Path destParent = destFile.getParent();
// Parent might be null if dest is empty path
if (destParent != null) {
// Create parent directories before copying file
Files.createDirectories(destParent);
}
Files.copy(srcFile, destFile, copyOptions);
}
public static void main(String[] args) throws IOException {
Path srcDir = Paths.get("path/to/src");
Path destDir = Paths.get("path/to/dest");
// Could also use FileSystem.getPathMatcher
PathMatcher matcher = file -> file.getFileName().toString().endsWith(".txt");
copyFiles(srcDir, destDir, matcher);
}
}
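The code above keeps the source directory layout below dest. The question also asked that only one copy be kept when several files share a name. A minimal sketch of that variant follows, assuming the files should land directly in dest and that the first match (in sorted path order) wins. The class name FlatCopy and the method copyFlat are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatCopy {
    // Hypothetical helper: copies every regular file below src whose name ends with
    // suffix directly into dest, ignoring the source sub-directory layout.
    // Returns the paths of the copies that were actually created.
    public static List<Path> copyFlat(Path src, Path dest, String suffix) throws IOException {
        Files.createDirectories(dest);

        // Collect the matches first so the stream is closed before any copying starts.
        List<Path> matches;
        try (Stream<Path> walk = Files.walk(src)) {
            matches = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted() // deterministic order, so "first match wins" is repeatable
                    .collect(Collectors.toList());
        }

        List<Path> copied = new ArrayList<>();
        for (Path file : matches) {
            Path target = dest.resolve(file.getFileName());
            if (Files.exists(target)) {
                continue; // a file with this name is already in dest: keep it
            }
            Files.copy(file, target); // no REPLACE_EXISTING, nothing gets overwritten
            copied.add(target);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: FlatCopy <srcDir> <destDir>");
            return;
        }
        copyFlat(Paths.get(args[0]), Paths.get(args[1]), ".txt").forEach(System.out::println);
    }
}
```

Because the copy is made without REPLACE_EXISTING and existing targets are skipped, the dest directory itself is never replaced.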
Below is the code for copying files from one directory to another.
For example, if the file names are Red-Already over, Red-Already over(slow) and NEFFEX-Destiny, it should create a directory named Red and copy the first two files into it.
For the other artist, it should create a NEFFEX folder and copy the file into it.
The problem is that the directory is created when Files.copy is commented out, but when Files.copy is uncommented the file is copied and no directory is created.
The copied file is not playable because it has no extension (it seems the file is not being copied properly).
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
public class OrgLogic {
String path="C:\\Users\\Fawkes\\Music\\Music\\";
String target_path="C:\\Output\\";
OrgLogic() throws IOException{
File f=new File(path); //loads the input dir path
File dir=new File(target_path); //loads the output dir path
dir.mkdir();//create a new dir name output
File[] total_file=f.listFiles();//get the total number of file
//System.out.println(total_file.length);//prints the total number of the file
for(int i=0;i<total_file.length;i++) {
String name=total_file[i].getName();
String new_name=name.substring(0, name.indexOf("-")-1);
dir=new File(target_path+new_name);
if(dir.exists()) {
//new File(new_path+new_name).mkdir();
Files.copy(total_file[i].toPath(), dir.toPath(),StandardCopyOption.REPLACE_EXISTING);
}
else {
//new File(target_path+new_name).mkdir();
dir.mkdir();
Files.copy(total_file[i].toPath(), dir.toPath(),StandardCopyOption.REPLACE_EXISTING);
}
}
}
public static void main(String[] args) {
try {
new OrgLogic();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The filename path is:
Source:
C:\Users\Fawkes\Music\Music\Red-Already over
C:\Users\Fawkes\Music\Music\Red-Already over(slow)
C:\Users\Fawkes\Music\Music\NEFFEX-Destiny
Destination:
C:\Output\
For example:
C:\Output\Red\Already over
C:\Output\Red\Already over(slow)
C:\Output\NEFFEX\Destiny
These are declared as the variables path and target_path.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
public class OrgLogic {
String path="C:\\Users\\Fawkes\\Music\\Music\\";
String target_path="C:\\Output\\";
OrgLogic() throws IOException{
File f=new File(path); //loads the input dir path
File dir=new File(target_path); //loads the output dir path
dir.mkdir();//create a new dir name output
File[] total_file=f.listFiles();//get the total number of file
//System.out.println(total_file.length);//prints the total number of the file
for(int i=0;i<total_file.length;i++) {
String name=total_file[i].getName();
String new_name=name.substring(0, name.indexOf("-")).trim(); // artist name is everything before the first '-'
dir=new File(target_path+new_name);
if(dir.exists()) {
Path src=Paths.get(total_file[i].getPath());
Path dest=Paths.get(dir.getAbsolutePath().concat("\\").concat(total_file[i].getName()));
Files.copy(src, dest ,StandardCopyOption.REPLACE_EXISTING);
System.out.println("Files copied: "+(i+1)+"/"+total_file.length);
}
else {
dir.mkdir();
Path src=Paths.get(total_file[i].getPath());
Path dest=Paths.get(dir.getAbsolutePath().concat("\\").concat(total_file[i].getName()));// the copy target must be a file path inside the directory, not the directory itself
Files.copy(src, dest ,StandardCopyOption.REPLACE_EXISTING);
System.out.println("Files copied: "+(i+1)+"/"+total_file.length);
}
}
}
public static void main(String[] args) {
try {
new OrgLogic();
} catch (IOException e) {
e.printStackTrace();
}
}
}
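The same idea can be sketched with java.nio.file.Path only. This is an illustration under assumptions, not the poster's code: it assumes file names have the form Artist-Title, splits at the first '-', and writes to outputDir/Artist/Title as in the desired layout from the question. ArtistOrganizer and organize are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ArtistOrganizer {
    // Hypothetical helper: for each regular file named "Artist-Title" directly inside
    // inputDir, copies it to outputDir/Artist/Title. Names without a usable '-' are skipped.
    // Returns the number of files copied.
    public static int organize(Path inputDir, Path outputDir) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            for (Path file : entries) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dash = name.indexOf('-');
                if (dash <= 0 || dash == name.length() - 1) {
                    continue; // no artist part or no title part
                }
                String artist = name.substring(0, dash).trim();
                String title = name.substring(dash + 1).trim();
                if (artist.isEmpty() || title.isEmpty()) {
                    continue;
                }
                // createDirectories also succeeds when the directory already exists
                Path artistDir = Files.createDirectories(outputDir.resolve(artist));
                // The target is a file inside the artist directory, never the directory itself
                Files.copy(file, artistDir.resolve(title), StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ArtistOrganizer <inputDir> <outputDir>");
            return;
        }
        System.out.println("Files copied: " + organize(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Passing the directory itself as the copy target, as in the question, makes Files.copy with REPLACE_EXISTING replace the empty directory with the file, which is why the result had no extension.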
I wrote a script that recursively prints the names of all files and folders to the console. I want to change it to write to an Excel spreadsheet/CSV like so:
Folder
FullPath/Folder/File.extension, Folder, Extension
...
...
Recursively do this for all documents in the folder.
Here is my script that prints to the console:
package test;
import java.io.File;
import java.io.IOException;
public class RecursiveFileDisplay {
public static void main(String[] args) {
File currentDir = new File("."); // current directory
displayDirectoryContents(currentDir);
}
public static void displayDirectoryContents(File dir) {
try {
File[] files = dir.listFiles();
for (File file : files) {
if (file.isDirectory()) {
System.out.println("directory:" + file.getCanonicalPath());
displayDirectoryContents(file);
} else {
System.out.println(" file:" + file.getCanonicalPath());
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
How can I adapt this to print to an Excel spreadsheet or CSV?
Thanks.
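One possible approach, sketched here under assumptions: walk the tree with Files.walk and write one CSV row per regular file, with the columns full path, parent folder name, and extension. The column names, the quoting style, and the names FileListCsv, writeCsv, quote and extensionOf are all hypothetical choices for this sketch. Excel can open the resulting file directly.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileListCsv {
    // Wraps a value in double quotes and doubles embedded quotes, so that commas
    // and quotes inside paths do not break the CSV columns.
    public static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Returns the text after the last '.', or "" if there is none
    // (a leading dot, as in ".gitignore", is not treated as an extension separator).
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1) : "";
    }

    // Writes one row per regular file found below root.
    public static void writeCsv(Path root, Path csvFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(csvFile, StandardCharsets.UTF_8))) {
            out.println("FullPath,Folder,Extension");
            for (Path file : files) {
                Path abs = file.toAbsolutePath().normalize();
                Path parent = abs.getParent();
                String folder = (parent == null || parent.getFileName() == null)
                        ? "" : parent.getFileName().toString();
                String extension = extensionOf(abs.getFileName().toString());
                out.println(quote(abs.toString()) + "," + quote(folder) + "," + quote(extension));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: FileListCsv <rootDir> <csvFile>");
            return;
        }
        writeCsv(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```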
I've created a tool that checks PDF files for errors. The tool uses a file chooser to select a directory, checks whether it contains PDF files, and scans them. However, I want the tool to check the directory recursively. If I try this code:
public class RecursiveFileDisplay {
public static void main(String[] args) {
File currentDir = new File("C://Users//Tommy//Desktop"); // current directory
displayDirectoryContents(currentDir);
}
public static void displayDirectoryContents(File dir) {
try {
File[] files = dir.listFiles();
for (File file : files) {
if (file.isDirectory()) {
displayDirectoryContents(file);
} else {
if (file.getName().endsWith((".pdf"))) {
System.out.println(" file:" + file.getCanonicalPath());
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
the tool shows a separate list for each directory on the Desktop and does not combine all the results into one list:
If you want your program to build a single list of all the files processed, modify your displayDirectoryContents method as follows:
public static List<File> displayDirectoryContents(File dir) {
ArrayList<File> rtnFiles = new ArrayList<File>();
try {
File[] files = dir.listFiles();
for (File file : files) {
if (file.isDirectory()) {
rtnFiles.addAll(displayDirectoryContents(file));
} else {
if (file.getName().endsWith((".pdf"))) {
System.out.println(" file:" + file.getCanonicalPath());
rtnFiles.add(file);
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
return rtnFiles;
}
This way, the method returns all the files it found and processed. If you want it to return only the failed files, check not only that the file is a .pdf but also whether it passed or failed, and add it to the list only if it failed.
If you want to keep a running total, you can create a Java class with two integers, one for the PDF files found and one for the PDF files that failed. You can sum these up recursively in the same way I build the list.
Hope it helps.
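The counting idea could look roughly like the sketch below. The class PdfScanCounts and the hasErrors predicate are hypothetical. The predicate stands in for whatever check the tool actually runs on each PDF, which the question does not show.

```java
import java.io.File;
import java.util.function.Predicate;

class PdfScanCounts {
    int found;   // number of .pdf files seen
    int failed;  // number of .pdf files for which the check reported a problem

    // Adds the totals of a sub-directory to this instance.
    void add(PdfScanCounts other) {
        found += other.found;
        failed += other.failed;
    }

    // Recursively counts the PDF files below dir. hasErrors is a placeholder for
    // the real PDF check and must return true when a file is considered broken.
    public static PdfScanCounts scan(File dir, Predicate<File> hasErrors) {
        PdfScanCounts counts = new PdfScanCounts();
        File[] files = dir.listFiles();
        if (files == null) {
            return counts; // not a directory, or it cannot be read
        }
        for (File file : files) {
            if (file.isDirectory()) {
                counts.add(scan(file, hasErrors)); // sum up the sub-directory totals
            } else if (file.getName().endsWith(".pdf")) {
                counts.found++;
                if (hasErrors.test(file)) {
                    counts.failed++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: PdfScanCounts <dir>");
            return;
        }
        // Illustrative check only: treat empty files as failed.
        PdfScanCounts total = scan(new File(args[0]), f -> f.length() == 0);
        System.out.println("found=" + total.found + " failed=" + total.failed);
    }
}
```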
You can use java.nio.file.Files.walkFileTree:
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
public class FolderRecurs {
public Path pathToStart = Paths.get("The_URI_OF_THE_ROOT_DIRECTORY");
public List<File> listPDF = new ArrayList<>();
public FolderRecurs(Path pathToStart) {
this.pathToStart = pathToStart;
}
public void goRecurs() throws IOException{
Files.walkFileTree(pathToStart, new SimpleFileVisitor<Path>() {
//for folders
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs){
boolean error = false;
//your code
return error?FileVisitResult.TERMINATE : FileVisitResult.CONTINUE;
}
//for files
@Override
public FileVisitResult visitFile(Path filePath, BasicFileAttributes attrs){
boolean error = false;
//Convert path to file
File file = filePath.toFile();
if (file.getName().endsWith((".pdf"))){
listPDF.add(file);
}
//your code
return error?FileVisitResult.TERMINATE : FileVisitResult.CONTINUE;
}
});
}
}
import java.io.File;
import org.apache.commons.io.FilenameUtils;
public class Tester {
public static void main(String[] args) {
String rootPath = "F:\\Java\\Java_Project";
File fRoot = new File(rootPath);
File[] fsSub = fRoot.listFiles();
for (File file : fsSub) {
if(file.isDirectory()) continue;
String fileNewPath = FilenameUtils.removeExtension(file.getPath()) + "\\" + file.getName();
File fNew = new File(fileNewPath);
try {
file.renameTo(fNew);
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
I am trying to move a file into another directory. For instance, if the file path is
"C:\out.txt"
then I want to move it to
"C:\out\out.txt"
If I print the original File and the new one, the values look correct, but the move itself does not succeed.
I suggest trying Java 7 NIO.2:
Files.move(Path source, Path target, CopyOption... options)
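A likely reason the original code fails is that the target folder (for example C:\out) does not exist yet, and File.renameTo only returns false in that case instead of throwing. A minimal sketch with Files.move that creates the folder first is shown below. MoveIntoOwnFolder and moveIntoOwnFolder are hypothetical names, and the sketch assumes the file name has an extension.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class MoveIntoOwnFolder {
    // Hypothetical helper: moves "dir/out.txt" to "dir/out/out.txt" and returns the new path.
    public static Path moveIntoOwnFolder(Path file) throws IOException {
        String fileName = file.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // Without an extension the folder would get the same path as the file itself
            throw new IllegalArgumentException("File name has no extension: " + fileName);
        }
        String baseName = fileName.substring(0, dot);
        Path parent = file.toAbsolutePath().getParent();
        // The target directory must exist before the move
        Path targetDir = Files.createDirectories(parent.resolve(baseName));
        // Unlike File.renameTo, Files.move throws an IOException that explains a failure
        return Files.move(file, targetDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: MoveIntoOwnFolder <file>");
            return;
        }
        System.out.println("moved to " + moveIntoOwnFolder(Paths.get(args[0])));
    }
}
```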