How to move files within the Hadoop HDFS directory? - java

I need to move the files from one HDFS directory to another HDFS directory.
I wanted to check whether there's an easier way (some HDFS API) to achieve this, other than using InputStream/OutputStream.
I've heard of FileSystem.rename(srcDir, destDir), but I am unsure whether this will delete the original source directory.
I don't want to remove the original directory structure, only move the files from one folder to another directory.
e.g
input Dir - /testHDFS/input/*.txt
dest Dir - /testHDFS/destination
After moving the files, directory should look something like this :-
input Dir - /testHDFS/input
dest Dir - /testHDFS/destination/*.txt
PS: I want to do this inside the mapper function, for each file.
Any help would be appreciated.

FileSystem.rename will move the file from source to destination directory. I believe you can use it for your requirement.

The best way to do this is with org.apache.hadoop.fs.FileUtil.copy(), setting the deleteSource parameter to true. People commonly use FileSystem.rename(), but that function can fail silently on problems that are not obvious (such as the source and destination Paths being on different volumes). It returns a boolean, so check the return value if you do use it.

You can also use DistCp programmatically for this.

Related

Is there a way to force a Hadoop job to look into an underscore folder?

I have to process a bunch of files in a folder that begins with "_" (underscore). Is there a way I can force hadoop to look into those folders? Do I need to write my own FileInputFormat?
The easiest way is probably to build the list of input files yourself, for instance with FileSystem.globStatus, and then add them to the job manually with FileInputFormat.addInputPath. FileSystem.globStatus doesn't filter hidden files by default.

Java IO issue when using SVN versioning system

I have a Java program that is supplied a directory name, gets a list of all the files in that directory using dirName.listFiles(), and then iterates through every file, parsing information from each.
The files would normally all just be normal text files, but I am using SVN and there seems to be a directory called .svn in my dirName directory which is causing my program to fail because .svn is a directory and not a text file.
Now, I could implement filters using a FileFilter object, but I would really only expect text files to be in that directory in the final program.
My question is: Is there a way round my issue without using a FileFilter? I also think that my program is ignoring the .svn directory in other programs that I've written, so I'm not sure why it's an issue now.
Thanks in advance.
You would have this issue with many version control systems (not just SVN), as some of them keep files on disk that identify where the working copy comes from (.svn for SVN, view.dat for ClearCase). You really should just implement a FileFilter to exclude those, or use the one from commons-io, FileFilterUtils.makeSVNAware:
FileFilter svnFilter = FileFilterUtils.makeSVNAware(null);
It is null-safe: if you give it null, it simply returns an SVN filter. If you give it another IOFileFilter (a subinterface of FileFilter), it returns a filter that does an AND between the existing filter and the SVN filter.
You could call isDirectory() on each object that listFiles() returns.
Two possible solutions (at least):
FileFilter or FilenameFilter
isFile()
Look here: http://docs.oracle.com/javase/6/docs/api/java/io/File.html
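As a minimal sketch of the isFile()/FileFilter suggestions above, the following keeps only regular files from a directory listing, so a .svn directory is skipped. The class name, method names, and the temporary demo directory are made up for illustration; they are not from the original program.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists only regular files, skipping directories such as .svn.
final class RegularFileLister {

    static List<String> regularFileNames(File dir) {
        FileFilter onlyFiles = File::isFile;      // accepts a File only if it is a regular file
        File[] files = dir.listFiles(onlyFiles);  // null if dir is not a readable directory
        List<String> names = new ArrayList<>();
        if (files != null) {
            for (File f : files) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Builds a throwaway directory containing one text file and one .svn directory.
    static List<String> demo() {
        try {
            File dir = Files.createTempDirectory("svn-demo").toFile();
            Files.createDirectory(dir.toPath().resolve(".svn"));
            Files.createFile(dir.toPath().resolve("notes.txt"));
            return regularFileNames(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [notes.txt]
    }
}
```

This filters by file type only; if the directory may later contain other non-text files, a name-based filter would be needed as well.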
Rather than using Java for file and directory search, I would prefer writing a JNI program that uses C's dirent.h and stat.h to differentiate between files and directories. The JNI program would be much faster.
If dirName is not the root directory of your working copy, you can upgrade to a recent version of SVN (1.7 or later). It no longer keeps a .svn directory in every directory, only in the root of the working copy.

Best way to execute a file internal to a Java project

Suppose my project structure is:
/project
/src
/java
Util.java
/cpp
/bin
a.out
I'd like to execute a.out from within Util.java without hard-coding any absolute paths in my java file. What's the best way to go about doing this?
EDIT -- Here's what I ended up doing: I happen to be using autoconf as most of the code is c++. I defined a substitution variable like AC_SUBST([project_root], [$(pwd)]) in configure.ac and substituted it in a Config.java.in file.
Perhaps using a properties file to be loaded on deployment/running time depending on the nature of your app.
More about its use in this thread: "How to use Java property files?"
Your file path to a.out could be ../bin/a.out. You can then execute the file using that path.
Below is a sketch that might help you.
import java.io.File;
import java.net.URL;
// look for the executable in the current working directory
File executable = new File("a.out");
if (executable.exists()) {
    // launch it; start() throws IOException
    new ProcessBuilder(executable.getAbsolutePath()).start();
} else {
    // getLocation() returns a URL pointing at the .jar file or classes directory
    URL location = YourMainClass.class.getProtectionDomain().getCodeSource().getLocation();
    // write code to form a path name to a.out based on the location of the .jar file
}
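The "form a path name" step could be sketched as follows, assuming the executable sits at ../bin/a.out relative to the directory that holds the .jar or compiled classes. That layout is an assumption, and ExecutableLocator, codeBase, and locate are hypothetical names. If the code source is unavailable, the sketch falls back to the working directory.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

// Hypothetical helper: resolves a path relative to where the code was loaded from.
final class ExecutableLocator {

    // Directory containing the .jar (or the classes root); working directory as a fallback.
    static Path codeBase(Class<?> anchor) {
        try {
            CodeSource source = anchor.getProtectionDomain().getCodeSource();
            if (source != null && source.getLocation() != null) {
                URI uri = source.getLocation().toURI();
                if ("file".equals(uri.getScheme())) {
                    Path location = Paths.get(uri).toAbsolutePath();
                    // A .jar is a file, so use its parent; a classes directory is used as-is.
                    return Files.isDirectory(location) ? location : location.getParent();
                }
            }
        } catch (URISyntaxException | RuntimeException e) {
            // fall through to the working directory
        }
        return Paths.get("").toAbsolutePath();
    }

    static Path locate(Class<?> anchor, String relative) {
        return codeBase(anchor).resolve(relative).normalize();
    }

    public static void main(String[] args) throws Exception {
        Path exe = locate(ExecutableLocator.class, "../bin/a.out"); // assumed layout
        System.out.println("Resolved executable path: " + exe);
        if (Files.isExecutable(exe)) {
            Process process = new ProcessBuilder(exe.toString()).inheritIO().start();
            System.out.println("Exit code: " + process.waitFor());
        }
    }
}
```

ProcessBuilder is used here in place of the original pseudo code's exec call; Runtime.getRuntime().exec(...) would work similarly.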

Rename/move multiple files with a file pattern

I need to move any file matching a pattern defined by an rsync file pattern (used for --include, --exclude).
For example: *.str
I need to move any matching file in /source/ to /archive/ locally using Java. Will a simple File.renameTo method work? From looking at the source code, I don't see how.
What's the best way to do this? Any recommended libraries?
Background: some files and directories are being rsynced to multiple hosts. After it is successfully rsynced to each host, I need to archive the file (move it to the archive dir) locally. It works fine if local/source dir is a file, but when it's a directory and the rsync option --include is given, ONLY those files need to be moved after rsync is successful.
You can use the Ant move task programmatically. Get Ant and use org.apache.tools.ant.taskdefs.Move.
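If pulling in Ant is not desirable, a plain java.nio.file sketch is another option. It assumes the pattern is a simple glob such as *.str; rsync's include/exclude syntax is richer than Java's glob syntax, so more complex patterns would need translating. GlobMover and its methods are illustrative names only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: moves files matching a glob from one local directory to another.
final class GlobMover {

    // Returns the sorted names of the files that were moved.
    static List<String> moveMatching(Path sourceDir, Path archiveDir, String glob) throws IOException {
        Files.createDirectories(archiveDir);
        // Collect the matches first so the directory is not modified while it is being listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir, glob)) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    matches.add(candidate);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path file : matches) {
            Files.move(file, archiveDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            moved.add(file.getFileName().toString());
        }
        Collections.sort(moved);
        return moved;
    }

    // Demo on throwaway directories: two .str files are moved, keep.txt stays behind.
    static List<String> demo() {
        try {
            Path source = Files.createTempDirectory("source");
            Path archive = Files.createTempDirectory("archive");
            Files.createFile(source.resolve("a.str"));
            Files.createFile(source.resolve("b.str"));
            Files.createFile(source.resolve("keep.txt"));
            return moveMatching(source, archive, "*.str");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.str, b.str]
    }
}
```

This only looks at the top level of the source directory; a recursive variant could use Files.walkFileTree with a PathMatcher.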

Path resolution in eclipse package structure

This is a very simple question for many of you reading this, but it's quite new for me.
Here is a screenshot for my eclipse
When I run this program I get java.io.FileNotFoundException: queries.xml (The system cannot find the file specified). I tried ../../../queries.xml, but that does not work either. I don't really understand when to use ../, since it means "go one step up in the directory tree" and it works in some cases. Can anyone explain this? Also, how can I refer to queries.xml here? Thanks
Note: I might even use this code on a linux box
I assume it is compiling your code into a build or classes folder, and running it from there...
Have you tried the traditional Java way for doing this:
def query = new XmlSlurper().parse( GroovySlurping.class.getResourceAsStream( '/queries.xml' ) )
Assuming the build step is copying the xml into the build folder, I believe that should work
I don't use Eclipse though, so can't be 100% sure...
Try
file = new File("src/org/ars/groovy/queries.xml");
To check the actual working directory of eclipse you can use
File f = new File(".");
System.out.println(f.getAbsolutePath());
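On the ../ question: a relative path is resolved against the JVM's working directory, not against the location of the source file, and each ../ removes one directory level. A small sketch, reusing the src/org/ars/groovy path from the answer above (RelativePathDemo is a made-up class name):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo of how relative paths and "../" segments are resolved.
final class RelativePathDemo {

    // Collapses "../" segments without touching the file system.
    static Path collapse(String relative) {
        return Paths.get(relative).normalize();
    }

    // Resolves a relative path against the JVM's working directory.
    static Path absolute(String relative) {
        return Paths.get(relative).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Three "../" steps from src/org/ars/groovy lead back to src.
        System.out.println(collapse("src/org/ars/groovy/../../../queries.xml"));
        // Where the JVM actually looks for a bare "queries.xml".
        System.out.println(absolute("queries.xml"));
    }
}
```

Printing the absolute form of whatever path the program opens is usually the quickest way to see why a file is not found.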
You could try using a property file to store the path of the xml files.
This way you can place the xml files in any location, and simply change the property file.
This will not require a change/recompilation of code.
This would mean you will only need to hardcode the path of the property file.
If you prefer not to hardcode the path of the property file, you could pass it as an argument during startup in your server setup file (web.xml in Tomcat). Every server will have an equivalent setup file where you can specify the path of the property file.
Alternatively you could specify the path of the xml in this same file, if you don't want to use property files.
This link will show you an example of reading from property files.
http://www.zparacha.com/how-to-read-properties-file-in-java/
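A minimal sketch of the property-file approach, assuming a key named queries.xml.path (a made-up key) and a default of queries.xml when the key is missing. XmlPathConfig and its methods are illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: reads the location of the XML file from a properties file.
final class XmlPathConfig {

    static String readXmlPath(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(propertiesFile, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        // "queries.xml.path" is an assumed key; fall back to a file in the working directory.
        return props.getProperty("queries.xml.path", "queries.xml");
    }

    // Demo: writes a throwaway properties file and reads the path back.
    static String demo() {
        try {
            Path file = Files.createTempFile("app", ".properties");
            Files.write(file, "queries.xml.path=/opt/app/conf/queries.xml\n".getBytes(StandardCharsets.UTF_8));
            return readXmlPath(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints /opt/app/conf/queries.xml
    }
}
```

The path to the properties file itself can then come from a command-line argument or a system property, so nothing but the key name is fixed in code.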
