I have this piece of code which can fetch a file from a Hadoop filesystem. I set up Hadoop on a single node and ran this code from my local machine to see if it could fetch a file from the HDFS instance on that node. It worked.
package com.hdfs.test.hdfs_util;
/* Copy file from hdfs to local disk without hadoop installation
*
* params are something like
* hdfs://node01.sindice.net:8020 /user/bob/file.zip file.zip
*
*/
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HDFSdownloader {

    public static void main(String[] args) throws Exception {
        System.getProperty("java.classpath");
        if (args.length != 3) {
            System.out.println("use: HDFSdownloader hdfs src dst");
            System.exit(1);
        }
        System.out.println(HDFSdownloader.class.getName());
        HDFSdownloader dw = new HDFSdownloader();
        dw.copy2local(args[0], args[1], args[2]);
    }

    private void copy2local(String hdfs, String src, String dst) throws IOException {
        System.out.println("!! Entering function !!");
        Configuration conf = new Configuration();
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("fs.default.name", hdfs);
        FileSystem.get(conf).copyToLocalFile(new Path(src), new Path(dst));
        System.out.println("!! copytoLocalFile Reached !!");
    }
}
Now I took the same code, bundled it into a jar, and tried to run it on another node (say B). This time the code had to fetch a file from a proper distributed Hadoop cluster that has Kerberos enabled.
The code ran but threw an exception:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2030)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1999)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
at com.hdfs.test.hdfs_util.HDFSdownloader.copy2local(HDFSdownloader.java:49)
at com.hdfs.test.hdfs_util.HDFSdownloader.main(HDFSdownloader.java:35)
Is there a way to make this code work programmatically? For some reason, I can't install kinit on the source node.
Here's a code snippet that works in the scenario you have described above, i.e. programmatically accessing a Kerberos-enabled cluster. The important points to note are:
Provide the keytab file location in UserGroupInformation
Provide the Kerberos realm details in the JVM arguments (krb5.conf file)
Define the Hadoop security authentication mode as kerberos
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHDFSIO {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // The following property is enough for a non-kerberized setup
        // conf.set("fs.defaultFS", "localhost:9000");

        // Need the following set of properties to access a kerberized cluster
        conf.set("fs.defaultFS", "hdfs://devha:8020");
        conf.set("hadoop.security.authentication", "kerberos");
        // The location of the krb5.conf file needs to be provided in the VM arguments for the JVM:
        // -Djava.security.krb5.conf=/Users/user/Desktop/utils/cluster/dev/krb5.conf

        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("user@HADOOP_DEV.ABC.COM",
                "/Users/user/Desktop/utils/cluster/dev/.user.keytab");

        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus[] fileStatuses = fs.listStatus(new Path("/user/username/dropoff"));
            for (FileStatus fileStatus : fileStatuses) {
                System.out.println(fileStatus.getPath().getName());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
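If passing JVM arguments is not convenient, the krb5.conf location can also be set programmatically, as long as it happens before the first UserGroupInformation/login call. A minimal sketch, using the same (assumed) path as the comment above:
// Assumed path; must run before any Kerberos/UGI initialization
System.setProperty("java.security.krb5.conf", "/Users/user/Desktop/utils/cluster/dev/krb5.conf");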
Consider a simple Java file which creates a BufferedInputStream to copy a local file 1400-8.txt to Hadoop HDFS and prints some dots as a progress indicator. The example is Example 3-3 from the Hadoop book here.
// cc FileCopyWithProgress Copies a local file to a Hadoop filesystem, and shows progress
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
// vv FileCopyWithProgress
public class FileCopyWithProgress {

    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });

        IOUtils.copyBytes(in, out, 4096, true);
    }
}
// ^^ FileCopyWithProgress
I compile the code and create the JAR file with
hadoop com.sun.tools.javac.Main FileCopyWithProgress.java
jar cf FileCopyWithProgress.jar FileCopyWithProgress.class
The above commands generate the files FileCopyWithProgress.class, FileCopyWithProgress$1.class and FileCopyWithProgress.jar. Then, I try to run it
hadoop jar FileCopyWithProgress.jar FileCopyWithProgress 1400-8.txt hdfs://localhost:9000/user/kostas/1400-8.txt
But I receive the error
Exception in thread "main" java.lang.NoClassDefFoundError:
FileCopyWithProgress$1
To my understanding, the FileCopyWithProgress$1.class is due to the anonymous callback function the program declares. But since the file exists, what is the issue here? Am I running the correct sequence of commands?
I found the issue, so I am just posting it in case it helps someone. I had to include the class FileCopyWithProgress$1.class in the JAR. The correct command should be
jar cf FileCopyWithProgress.jar FileCopyWithProgress*.class
I am working on HDFS and trying to copy a file from the local system to the HDFS file system using the Configuration and FileSystem classes from the Hadoop conf and fs packages as follows:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
public class FileCopyWithWrite {

    public static void main(String[] args) {
        String localSrc = "/Users/bng/Documents/hContent/input/ncdc/sample.txt";
        String dst = "hdfs://localhost/sample.txt";
        try {
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            IOUtils.copyBytes(in, out, 4092, true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But running this program gives me an exception as follows:
org.apache.hadoop.security.AccessControlException: Permission denied: user=KV, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6545)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6527)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6479)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2712)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2632)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
The reason is clear: the current user KV does not have write permission on the HDFS directory being written to.
Copying the file from the console works fine. I tried the following commands from the console:
sudo su
hadoop fs -copyFromLocal /Users/bng/Documents/hContent/input/ncdc/sample.txt hdfs://localhost/sample.txt
I found a lot of search results on Google, but none worked for me. How can I solve this issue? How can I run the specific class from STS or Eclipse with sudo permission? Or is there any other option for this?
Providing the permissions to the current user in HDFS solved the problem for me.
I added the permissions in HDFS as follows:
hadoop fs -chown -R KV:KV hdfs://localhost
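If changing the ownership is not an option, the same write can be performed from code under a different user with the UserGroupInformation.doAs pattern (the same approach shown later on this page). A minimal sketch, assuming a non-secure (SIMPLE authentication) local setup and that "root" is the directory owner from the stack trace; imports are those of the question's class plus java.security.PrivilegedExceptionAction and org.apache.hadoop.security.UserGroupInformation:
UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost/sample.txt"), conf);
        // Same copy as in the question, but executed as "root"
        fs.copyFromLocalFile(new Path("/Users/bng/Documents/hContent/input/ncdc/sample.txt"),
                new Path("hdfs://localhost/sample.txt"));
        return null;
    }
});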
I am trying to run Pig scripts remotely from my Java application; for that I have written the code below.
Code:
import java.io.IOException;
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecException;
public class Javapig {

    public static void main(String[] args) {
        try {
            Properties props = new Properties();
            props.setProperty("fs.default.name", "hdfs://192.168.x.xxx:8022");
            props.setProperty("mapred.job.tracker", "192.168.x.xxx:8021");
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            runIdQuery(pigServer, "fact");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using org.apache.hive.hcatalog.pig.HCatLoader();");
        pigServer.registerQuery("B = FILTER A by category == 'Aller';");
        pigServer.registerQuery("DUMP B;");
        System.out.println("Done");
    }
}
but while executing I am getting the error below.
Error
ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).
I don't know what I am doing wrong.
Well, it's a self-describing error...
neither hadoop-site.xml nor core-site.xml was found in the classpath
You need both of those files in the classpath of your application.
You would ideally get those from your $HADOOP_CONF_DIR folder and copy them into your Java project's src/main/resources, assuming you have a Maven structure.
Also, with those files in place, you should rather use a Configuration object for Hadoop:
PigServer(ExecType execType, org.apache.hadoop.conf.Configuration conf)
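A minimal sketch of that approach. The configuration file paths are assumptions (copy them from your cluster's $HADOOP_CONF_DIR), and the required imports are org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.Path:
Configuration conf = new Configuration();
// Load the cluster's own configuration explicitly instead of relying on the classpath
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, conf);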
I have an application that uses embedded Jetty. When I run this application in the NetBeans IDE, I can browse my site at localhost:8080/.
When I launch the jar file of my application from the command line with java -jar app.jar and then browse localhost:8080/, the Jetty server says "page not found".
What am I missing here? I can't figure out the problem.
EDIT:
The NetBeans project is uploaded to GitHub.
Everything works fine if I run this project in NetBeans.
But when I take the jar file with the lib folder and run it from cmd like this: java -jar EmbeddedJettyJspJstl.jar
Then, navigating to http://localhost:8080/test, I get errors:
org.apache.jasper.JasperException: java.lang.ClassNotFoundException: org.apache.jsp.WEB_002dINF.jstl_jsp
org.apache.jasper.JasperException: The absolute uri: http://java.sun.com/jsp/jstl/core cannot be resolved in either web.xml or the jar files deployed with this application
My JSP page uses JSTL, and it looks like it is not locating the JSTL libraries?
And this is the code that starts the server:
package server;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import org.eclipse.jetty.jmx.MBeanContainer;
import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.handler.AllowSymLinkAliasChecker;
import org.eclipse.jetty.server.handler.DefaultHandler;
import org.eclipse.jetty.server.handler.HandlerList;
import org.eclipse.jetty.webapp.Configuration;
import org.eclipse.jetty.webapp.WebAppContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 *
 * @author lkallas
 */
public class JettyServer {

    // Resource path pointing to where the WEBROOT is
    private static final String WEBROOT = "/webapp/";
    private static final Logger logger = LoggerFactory.getLogger(JettyServer.class);

    public void start() throws IOException, InterruptedException, URISyntaxException {
        Server server = new Server();

        // HTTP connector
        ServerConnector connector = new ServerConnector(server);
        connector.setHost("localhost");
        connector.setPort(8080);
        connector.setIdleTimeout(30000);

        // Set the connector
        server.addConnector(connector);

        // Setup JMX for web applications
        MBeanContainer mbContainer = new MBeanContainer(
                ManagementFactory.getPlatformMBeanServer());
        server.addBean(mbContainer);

        // Setting up the web application
        WebAppContext webapp = new WebAppContext();
        webapp.setAttribute("javax.servlet.context.tempdir", getScratchDir());
        webapp.setDescriptor(WEBROOT + "WEB-INF/web.xml");
        webapp.setResourceBase(getWebRootResourceUri().toASCIIString());
        webapp.setContextPath("/");
        webapp.setWar(getWebRootResourceUri().toASCIIString());
        webapp.addAliasCheck(new AllowSymLinkAliasChecker());

        // For debugging
        logger.info("Descriptor file: {}", webapp.getDescriptor());
        logger.info("Resource base: {}", getWebRootResourceUri().toASCIIString());
        logger.info("WAR location: {}", webapp.getWar());

        HandlerList handlerList = new HandlerList();
        handlerList.setHandlers(new Handler[]{webapp, new DefaultHandler()});

        // This webapp will use JSPs and JSTL. We need to enable the
        // AnnotationConfiguration in order to correctly
        // set up the JSP container.
        Configuration.ClassList classlist = Configuration.ClassList
                .setServerDefault(server);
        classlist.addBefore(
                "org.eclipse.jetty.webapp.JettyWebXmlConfiguration",
                "org.eclipse.jetty.annotations.AnnotationConfiguration");

        // Set the ContainerIncludeJarPattern so that Jetty examines these
        // container-path jars for TLDs, web-fragments etc.
        // If you omit the jar that contains the JSTL .tlds, the JSP engine will
        // scan for them instead.
        webapp.setAttribute("org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern", ".*/[^/]*taglibs.*\\.jar$");

        // A WebAppContext is a ContextHandler as well, so it needs to be set on
        // the server so it is aware of where to send the appropriate requests.
        server.setHandler(handlerList);

        try {
            server.start();
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
        server.dumpStdErr();
    }

    /**
     * Establish the scratch directory for the servlet context (used by JSP
     * compilation).
     */
    private File getScratchDir() throws IOException {
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        File scratchDir = new File(tempDir.toString(), "embedded-jetty");
        if (!scratchDir.exists()) {
            if (!scratchDir.mkdirs()) {
                throw new IOException("Unable to create scratch directory: " + scratchDir);
            }
        }
        return scratchDir;
    }

    /**
     * Get the webroot URI.
     *
     * @return
     * @throws FileNotFoundException
     * @throws URISyntaxException
     */
    private URI getWebRootResourceUri() throws FileNotFoundException, URISyntaxException {
        URL indexUri = this.getClass().getResource(WEBROOT);
        if (indexUri == null) {
            throw new FileNotFoundException("Unable to find resource " + WEBROOT);
        }
        logger.debug("WEBROOT: {}", indexUri.toURI().toASCIIString());
        return indexUri.toURI();
    }
}
I have already looked at http://www.eclipse.org/jetty/documentation/current/advanced-embedding.html
There are a number of reasons and causes that could be affecting you.
However you haven't posted any code to help us identify what the specific cause is.
The Jetty Project maintains an example for this setup, btw.
https://github.com/jetty-project/embedded-jetty-uber-jar
Pay attention to your context.setContextPath() (like @Haider-Ali pointed out), and also your context.setBaseResource().
For JSPs in Embedded Jetty you can look at the other example project
https://github.com/jetty-project/embedded-jetty-jsp
Note the prior answer about embedded Jetty and JSP.
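For reference, a stripped-down sketch of those two calls (the /webapp/ classpath location is borrowed from the question; org.eclipse.jetty.util.resource.Resource is used to resolve it, and the JSP/JSTL setup is omitted):
Server server = new Server(8080);
WebAppContext context = new WebAppContext();
context.setContextPath("/");
// Resolve the base resource from the classpath so it still works from a runnable jar
context.setBaseResource(Resource.newClassPathResource("/webapp/"));
server.setHandler(context);
server.start();
server.join();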
I'm happily connecting to HDFS and listing my home directory:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:8020");
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(fs.getHomeDirectory(), false);
while (ri.hasNext()) {
    LocatedFileStatus lfs = ri.next();
    log.debug(lfs.getPath().toString());
}
fs.close();
What I want to do now, though, is connect as a specific user (not the whoami user). Does anyone know how you specify which user you connect as?
As far as I can see, this is done through the UserGroupInformation class and a PrivilegedAction or PrivilegedExceptionAction. Here is sample code to connect to a remote HDFS 'as' a different user ('hbase' in this case). Hope this solves your task. In case you need the full scheme with authentication, you will need to improve the user handling, but for the SIMPLE authentication scheme (effectively no authentication) it works just fine.
package org.myorg;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
public class HdfsTest {

    public static void main(String args[]) {
        try {
            UserGroupInformation ugi
                    = UserGroupInformation.createRemoteUser("hbase");

            ugi.doAs(new PrivilegedExceptionAction<Void>() {

                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");

                    FileSystem fs = FileSystem.get(conf);

                    fs.createNewFile(new Path("/user/hbase/test"));

                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (int i = 0; i < status.length; i++) {
                        System.out.println(status[i].getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
If I understood you correctly, all you want is to get the home directory of the user you specify, and not the whoami user.
In your configuration file, set your home directory property to user/${user.name}. Make sure you have a system property named user.name.
This worked in my case.
I hope this is what you want to do; if not, add a comment.