HDFS from Java - Specifying the User

I'm happily connecting to HDFS and listing my home directory:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:8020");
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

FileSystem fs = FileSystem.get(conf);

RemoteIterator<LocatedFileStatus> ri = fs.listFiles(fs.getHomeDirectory(), false);
while (ri.hasNext()) {
    LocatedFileStatus lfs = ri.next();
    log.debug(lfs.getPath().toString());
}
fs.close();
What I want to do now, though, is connect as a specific user (not the whoami user). Does anyone know how to specify which user to connect as?

As far as I can see, this is done through the UserGroupInformation class together with PrivilegedAction or PrivilegedExceptionAction. Here is sample code that connects to a remote HDFS 'as' a different user ('hbase' in this case). I hope this solves your task. If you need a full authentication scheme you will have to improve the user handling, but for the SIMPLE authentication scheme (effectively no authentication) it works just fine.
package org.myorg;

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;

public class HdfsTest {

    public static void main(String args[]) {
        try {
            UserGroupInformation ugi
                = UserGroupInformation.createRemoteUser("hbase");

            ugi.doAs(new PrivilegedExceptionAction<Void>() {

                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");

                    FileSystem fs = FileSystem.get(conf);

                    fs.createNewFile(new Path("/user/hbase/test"));

                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (int i = 0; i < status.length; i++) {
                        System.out.println(status[i].getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

If I understood you correctly, all you want is to get the home directory of the user you specify, not the whoami user.
In your configuration file, set your homedir property to user/${user.name}. Make sure you have a system property named user.name.
This worked in my case.
I hope this is what you want to do; if not, add a comment.
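For illustration, here is a minimal sketch of that idea. The key name home.dir is only a placeholder for whatever property your configuration file actually defines; Hadoop's Configuration resolves ${user.name} from the JVM system property of the same name when the value is read:

// Hypothetical sketch: a property whose value contains "${user.name}".
// In practice the entry would live in your XML configuration file rather than be set in code.
Configuration conf = new Configuration();
conf.set("home.dir", "user/${user.name}");

// conf.get(...) expands the placeholder, so for a JVM running as user "alice"
// this prints "user/alice".
System.out.println(conf.get("home.dir"));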

Related

How to check if a folder exists at a regular interval in Java

My goal is to check whether a Hadoop path exists and, if it does, download a file. I want to check every five minutes whether the file exists at the Hadoop path.
Currently, my code checks whether the file exists, but not on an interval basis.
public boolean checkPathExistance(final org.apache.hadoop.fs.Path hdfsAbsolutePath)
{
    final Configuration configuration = getHdfsConfiguration();
    try
    {
        final FileSystem fileSystems = hdfsAbsolutePath.getFileSystem(configuration);
        if (fileSystems.exists(hdfsAbsolutePath))
        {
            return true;
        }
    }
    catch (final IOException e)
    {
        e.printStackTrace();
    }
    return false;
}
The requirement is that this checkPathExistance method should be called every five minutes to check whether the file exists, and when it returns true, the file should be downloaded.
public void download(final String hdfsLocalPath, final Path outputPath)
{
    final org.apache.hadoop.fs.Path hdfsAbsolutePath = getHdfsFile(hdfsLocalPath).getPath();
    logger.info("path check {}", hdfsAbsolutePath.getName());

    final boolean isPathExist = checkPathExistance(hdfsAbsolutePath);
    downloadFromHDFS(hdfsAbsolutePath, outputPath);
}
Can I please get some help here?
For the file copying (and not folder copying, if I understood your question's context correctly) you can just use the copyToLocalFile method from FileSystem, as seen here, by specifying the boolean that controls whether the source file is deleted, plus the input (HDFS) and output (local) paths.
As for periodically checking whether the file exists in HDFS, you can use a ScheduledExecutorService object (Java 8 docs here) and specify that you want your function to run every 5 minutes.
The following program takes two arguments: the path of the input file in HDFS and the path of the output file locally.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RegularFileCheck
{
    public static boolean checkPathExistence(Path inputPath, Configuration conf) throws IOException
    {
        boolean flag = false;
        FileSystem fs = FileSystem.get(conf);

        if (fs.exists(inputPath))
            flag = true;

        return flag;
    }

    public static void download(Path inputPath, Path outputPath, Configuration conf) throws IOException
    {
        FileSystem fs = FileSystem.get(conf);
        fs.copyToLocalFile(false, inputPath, outputPath);  // don't delete the source input file
        System.out.println("File copied!");
    }

    public static void main(String[] args)
    {
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        Configuration conf = new Configuration();

        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

        Runnable task = () ->
        {
            System.out.println("New Timer!");
            try
            {
                if (checkPathExistence(inputPath, conf))
                    download(inputPath, outputPath, conf);
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        };

        executor.scheduleWithFixedDelay(task, 0, 5, TimeUnit.MINUTES);
    }
}
The console output is of course continuous (in my test, test.txt is the file stored in HDFS and test1.txt is the file to be copied locally). You can additionally modify the code above if you want to stop the re-executions after the file has been found and copied, or if you want to stop checking for the file after a while.
To stop the search and copying, simply replace the Runnable in the code above with the following snippet (note that it also needs java.io.File imported):
Runnable task = () ->
{
    System.out.println("New Timer!");
    try
    {
        if (new File(String.valueOf(outputPath)).exists())
            System.exit(0);
        else if (checkPathExistence(inputPath, conf))
            download(inputPath, outputPath, conf);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
};
And the program will stop after the file has been copied, as the console output confirms.
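If you would rather not terminate the whole JVM, a gentler variant (my own sketch, assuming the same checkPathExistence and download helpers and variables as above) is to shut the scheduler down once the copy has succeeded:

// Hypothetical variant: stop only the scheduler instead of calling System.exit(0).
ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

Runnable task = () ->
{
    try
    {
        if (checkPathExistence(inputPath, conf))
        {
            download(inputPath, outputPath, conf);
            executor.shutdown();   // no further runs are scheduled after a successful copy
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
};

executor.scheduleWithFixedDelay(task, 0, 5, TimeUnit.MINUTES);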

Permission Denied when copying a file from the local system to HDFS from an STS Java program

I am working on HDFS and trying to copy a file from the local system to the HDFS file system using the Configuration and FileSystem classes from the Hadoop conf and fs packages, as follows:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithWrite {

    public static void main(String[] args) {
        String localSrc = "/Users/bng/Documents/hContent/input/ncdc/sample.txt";
        String dst = "hdfs://localhost/sample.txt";
        try {
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            IOUtils.copyBytes(in, out, 4092, true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But running this program gives me an exception as follows:
org.apache.hadoop.security.AccessControlException: Permission denied: user=KV, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6545)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6527)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6479)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2712)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2632)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
The reason is clear: the current user KV does not have write permission for the target directory in HDFS.
Copying the file from the console works fine. I used the following commands:
sudo su
hadoop fs -copyFromLocal /Users/bng/Documents/hContent/input/ncdc/sample.txt hdfs://localhost/sample.txt
I found a lot of search results on Google but none worked for me. How do I solve this issue? How can I run the specific class from STS or Eclipse with sudo permission? Or is there any other option for this?
Granting permissions to the current user in HDFS solved the problem for me.
I added the permissions in HDFS as follows:
hadoop fs -chown -R KV:KV hdfs://localhost
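If changing the ownership is not an option, another possibility (my own sketch, not part of the accepted fix) is to perform the copy under the doAs pattern shown in the first answer on this page; under SIMPLE authentication the supplied user name is not verified. The fragment below assumes the question's imports plus java.security.PrivilegedExceptionAction and org.apache.hadoop.security.UserGroupInformation:

// Hypothetical sketch: run the write as "root", the owner of "/" in the error above.
UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost/"), conf);
        fs.copyFromLocalFile(
                new Path("/Users/bng/Documents/hContent/input/ncdc/sample.txt"),
                new Path("/sample.txt"));
        return null;
    }
});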

How to access hadoop cluster which is Kerberos enabled programmatically?

I have this piece of code which can fetch a file from a Hadoop file system. I set up Hadoop on a single node and ran this code from my local machine to see whether it would be able to fetch a file from the HDFS set up on that node. It worked.
package com.hdfs.test.hdfs_util;

/* Copy file from hdfs to local disk without hadoop installation
 *
 * params are something like
 * hdfs://node01.sindice.net:8020 /user/bob/file.zip file.zip
 *
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSdownloader {

    public static void main(String[] args) throws Exception {
        System.getProperty("java.classpath");
        if (args.length != 3) {
            System.out.println("use: HDFSdownloader hdfs src dst");
            System.exit(1);
        }
        System.out.println(HDFSdownloader.class.getName());
        HDFSdownloader dw = new HDFSdownloader();
        dw.copy2local(args[0], args[1], args[2]);
    }

    private void copy2local(String hdfs, String src, String dst) throws IOException {
        System.out.println("!! Entering function !!");
        Configuration conf = new Configuration();
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("fs.default.name", hdfs);

        FileSystem.get(conf).copyToLocalFile(new Path(src), new Path(dst));
        System.out.println("!! copytoLocalFile Reached!!");
    }
}
Now I took the same code, bundled it into a jar, and tried to run it on another node (say B). This time the code had to fetch a file from a proper distributed Hadoop cluster, which has Kerberos enabled.
The code ran but threw an exception:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2030)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1999)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
at com.hdfs.test.hdfs_util.HDFSdownloader.copy2local(HDFSdownloader.java:49)
at com.hdfs.test.hdfs_util.HDFSdownloader.main(HDFSdownloader.java:35)
Is there a way to make this code run programmatically? For some reason, I can't install kinit on the source node.
Here's a code snippet that works in the scenario you have described, i.e. programmatically accessing a Kerberos-enabled cluster. The important points to note are:
Provide the keytab file location to UserGroupInformation
Provide the Kerberos realm details in the JVM arguments (the krb5.conf file)
Define the Hadoop security authentication mode as kerberos
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHDFSIO {

    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();
        // The following property is enough for a non-kerberized setup
        // conf.set("fs.defaultFS", "localhost:9000");

        // Need the following set of properties to access a kerberized cluster
        conf.set("fs.defaultFS", "hdfs://devha:8020");
        conf.set("hadoop.security.authentication", "kerberos");

        // The location of the krb5.conf file needs to be provided in the VM arguments for the JVM:
        // -Djava.security.krb5.conf=/Users/user/Desktop/utils/cluster/dev/krb5.conf

        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("user@HADOOP_DEV.ABC.COM",
                "/Users/user/Desktop/utils/cluster/dev/.user.keytab");

        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus[] fileStatuses = fs.listStatus(new Path("/user/username/dropoff"));
            for (FileStatus fileStatus : fileStatuses) {
                System.out.println(fileStatus.getPath().getName());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Error executing PigServer in Java

I am trying to run Pig scripts remotely from my Java machine; for that I have written the code below.
Code:
import java.io.IOException;
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecException;

public class Javapig {

    public static void main(String[] args) {
        try {
            Properties props = new Properties();
            props.setProperty("fs.default.name", "hdfs://192.168.x.xxx:8022");
            props.setProperty("mapred.job.tracker", "192.168.x.xxx:8021");

            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            runIdQuery(pigServer, "fact");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using org.apache.hive.hcatalog.pig.HCatLoader();");
        pigServer.registerQuery("B = FILTER A by category == 'Aller';");
        pigServer.registerQuery("DUMP B;");
        System.out.println("Done");
    }
}
But while executing it I am getting the error below.
Error
ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).
I don't know what I am doing wrong.
Well, a self-describing error...
neither hadoop-site.xml nor core-site.xml was found in the classpath
You need both of those files on the classpath of your application.
You would ideally get them from your $HADOOP_CONF_DIR folder and copy them into your Java project's src/main/resources, assuming you have a Maven structure.
Also, with those files, you should rather use a Configuration object for Hadoop:
PigServer(ExecType execType, org.apache.hadoop.conf.Configuration conf)
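As a rough sketch of how that could be wired up (assuming the constructor above is available in your Pig version, and with the extra imports org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.Path; the paths are examples only):

// Sketch only: load the cluster's site files explicitly and hand the resulting
// Configuration to PigServer instead of the raw Properties object.
Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));   // e.g. from $HADOOP_CONF_DIR
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

PigServer pigServer = new PigServer(ExecType.MAPREDUCE, conf);

If the files are instead copied into src/main/resources, they end up on the classpath and the error above goes away as well.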

Copying a directory from the local system to HDFS with Java code

I'm having a problem trying to copy a directory from my local system to HDFS using Java code. I'm able to move individual files but can't figure out a way to move an entire directory with its sub-folders and files. Can anyone help me with that? Thanks in advance.
Just use the FileSystem's copyFromLocalFile method. If the source Path is a local directory it will be copied to the HDFS destination:
...
Configuration conf = new Configuration();
conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));

FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("/home/user/directory/"),
        new Path("/user/hadoop/dir"));
...
Here is the full working code to read and write into HDFS. It takes two arguments:
Input path (local / HDFS)
Output path (HDFS)
I used the Cloudera sandbox.
package hdfsread;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadingAFileFromHDFS {

    public static void main(String[] args) throws IOException {
        String uri = args[0];
        InputStream in = null;
        OutputStream os = null;

        Configuration myConf = new Configuration();
        Path outputPath = new Path(args[1]);
        myConf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");

        FileSystem fSystem = FileSystem.get(URI.create(uri), myConf);
        try {
            in = new BufferedInputStream(new FileInputStream(uri));
            os = fSystem.create(outputPath);
            IOUtils.copyBytes(in, os, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close both streams (the original closed only the unused "in" reference)
            IOUtils.closeStream(in);
            IOUtils.closeStream(os);
        }
    }
}
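Once packaged into a jar, the class could be run on the sandbox with something like the following (the jar name and paths are only placeholders):

hadoop jar hdfsread.jar hdfsread.ReadingAFileFromHDFS /home/cloudera/sample.txt /user/cloudera/sample.txt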
