I'm attempting to build a Hadoop program whose purpose is to cat files that I've previously uploaded to HDFS. It is based largely on this tutorial, and the program looks like this:
import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReadHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
It seems to me that the tutorial is flawed because, according to my understanding, IOUtils is part of the Apache Commons library. However, even though I added the following line to the program I've been trying to deploy:
import org.apache.commons.compress.utils.IOUtils;
I'm still met with the following errors:
FileSystemCat.java:37: error: cannot find symbol
IOUtils.copyBytes(in, System.out, 4096, false);
^
symbol: method copyBytes(InputStream,PrintStream,int,boolean)
location: class IOUtils
FileSystemCat.java:40: error: cannot find symbol
IOUtils.closeStream(in);
^
symbol: variable in
location: class FileSystemCat
2 errors
I'm compiling it on the NameNode with this command:
javac -cp /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.1.jar:/home/ubuntu/job_program/commons-io-2.5/commons-io-2.5.jar FileSystemCat.java
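The missing symbols are static methods of Hadoop's own IOUtils (org.apache.hadoop.io.IOUtils), not of the commons-compress class, so that is the import the program needs. A minimal sketch of the relevant calls (the wrapper class name IOUtilsCheck is just an illustration, assuming Hadoop 2.x is on the classpath):

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils; // not org.apache.commons.compress.utils.IOUtils

public class IOUtilsCheck {
    // copyBytes(InputStream, OutputStream, int, boolean) and closeStream(Closeable)
    // are defined on org.apache.hadoop.io.IOUtils, which is why the commons import
    // cannot resolve them.
    static void dumpToStdout(InputStream in) throws IOException {
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}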
Necessary addition to ~/.bashrc:
# Classpath for Java
export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
How to compile the program at the bottom:
javac -cp ${HADOOP_CLASSPATH}:commons-io-2.5.jar ReaderHDFS.java
How to generate the jar file for that program:
jar cf rhdfs.jar ReaderHDFS*.class
Run the command:
$HADOOP_HOME/bin/hadoop jar rhdfs.jar ReaderHDFS hdfs://master:9000/input_1/codes.txt
This is the program:
import org.apache.hadoop.io.IOUtils;
//import org.apache.commons.io.IOUtils;
import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
public class ReaderHDFS {
public static void main(String[] args) throws IOException {
String uri = args[0];
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null ;
try
{
in = fs.open(new Path(uri));
IOUtils.copyBytes(in, System.out, 4096, false);
}
finally
{
IOUtils.closeStream(in);
}
}
}
Related
Consider a simple Java file which creates a BufferedInputStream to copy a local file 1400-8.txt to Hadoop HDFS and print some dots as a progress status. The example is Example 3-3 from the Hadoop book here.
// cc FileCopyWithProgress Copies a local file to a Hadoop filesystem, and shows progress
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

// vv FileCopyWithProgress
public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });

        IOUtils.copyBytes(in, out, 4096, true);
    }
}
// ^^ FileCopyWithProgress
I compile the code and create the JAR file with
hadoop com.sun.tools.javac.Main FileCopyWithProgress.java
jar cf FileCopyWithProgress.jar FileCopyWithProgress.class
The above commands generate the files FileCopyWithProgress.class, FileCopyWithProgress$1.class and FileCopyWithProgress.jar. Then, I try to run it
hadoop jar FileCopyWithProgress.jar FileCopyWithProgress 1400-8.txt hdfs://localhost:9000/user/kostas/1400-8.txt
But I receive the error:
Exception in thread "main" java.lang.NoClassDefFoundError:
FileCopyWithProgress$1
To my understanding, FileCopyWithProgress$1.class is generated because of the anonymous callback the program declares. But since the file exists, what is the issue here? Am I running the correct sequence of commands?
I found the issue, so I am just posting it in case it helps someone. I had to include FileCopyWithProgress$1.class in the JAR as well. The correct command should be:
jar cf FileCopyWithProgress.jar FileCopyWithProgress*.class
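As a side note, and purely as a sketch assuming Java 8 or later: Progressable has a single method, so the anonymous class can be replaced by a lambda, which avoids the extra FileCopyWithProgress$1.class file in the first place because the lambda is compiled into the enclosing class rather than into a separate class file. The class name FileCopyWithLambdaProgress is just an illustration:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCopyWithLambdaProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);

        // The lambda implements Progressable; no FileCopyWithLambdaProgress$1.class is produced.
        OutputStream out = fs.create(new Path(dst), () -> System.out.print("."));
        IOUtils.copyBytes(in, out, 4096, true);
    }
}

With this variant, the original single-class jar command would have been sufficient.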
Any idea why this works fine even on a Windows environment? Can you point me to some docs that explain why this works? I tried to look for it, with no results.
package com.my.example;

import org.junit.Test;
import java.io.File;

public class TestPathCreation {
    @Test
    public void testPathCreation() throws Exception {
        final String path = "~/.my/dir";
        System.out.println(path);
        File f = new File(path);
        System.out.println(f.mkdirs()); // true
    }
}
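A likely explanation, offered only as a sketch since I cannot point to a doc that states it outright: java.io.File does not expand "~" (tilde expansion is a shell feature), so mkdirs() simply creates a directory literally named "~" under the current working directory. That behaves the same on Windows and on Unix-like systems, and "/" is accepted as a separator on Windows as well. Printing the absolute path makes this visible (the class name TildeCheck is just an illustration):

import java.io.File;

public class TildeCheck {
    public static void main(String[] args) {
        File f = new File("~/.my/dir");
        System.out.println(f.mkdirs());          // true: creates a directory literally named "~"
        System.out.println(f.getAbsolutePath()); // e.g. <working dir>/~/.my/dir
    }
}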
I am working on HDFS and trying to copy a file from the local system to the HDFS file system, using the Configuration and FileSystem classes from the Hadoop conf and fs packages as follows:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithWrite {
    public static void main(String[] args) {
        String localSrc = "/Users/bng/Documents/hContent/input/ncdc/sample.txt";
        String dst = "hdfs://localhost/sample.txt";
        try {
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            IOUtils.copyBytes(in, out, 4092, true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But running this program gives me an exception as follows:
org.apache.hadoop.security.AccessControlException: Permission denied: user=KV, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6545)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6527)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6479)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2712)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2632)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
The reason is clear: the current user KV does not have write permission to the target directory in HDFS.
Copying the file from the console works fine. I tried the following commands from the console:
sudo su
hadoop fs -copyFromLocal /Users/bng/Documents/hContent/input/ncdc/sample.txt hdfs://localhost/sample.txt
I found a lot of search results on Google, but none worked for me. How do I solve this issue? How can I run this specific class from STS or Eclipse with sudo permissions? Or is there any other option?
Providing the permissions to the current user in HDFS solved the problem for me.
I added the permissions in HDFS as follows:
hadoop fs -chown -R KV:KV hdfs://localhost
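If changing the ownership is not an option, another route is to tell the HDFS client which user to act as, either via the HADOOP_USER_NAME environment variable or by passing the user name to FileSystem.get. This is only a sketch: it assumes simple authentication rather than Kerberos, and the class name FsAsUser and the user "root" are just examples.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsAsUser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The third argument is the user the client connects as; with simple
        // authentication, permission checks run against this user rather than
        // the local OS account, so no sudo is needed in Eclipse or STS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost/"), conf, "root");
        System.out.println(fs.getFileStatus(new Path("/")).getOwner());
    }
}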
I get the following error when I try to run
java -cp 'angus-sdk-java-0.0.2-jar-with-dependencies.jar:.' FaceDetect
I am following a tutorial for face detection at http://angus-doc.readthedocs.io/en/latest/getting-started/java.html. Below is my Java code:
import java.io.IOException;

import org.json.simple.JSONObject;

import ai.angus.sdk.Configuration;
import ai.angus.sdk.Job;
import ai.angus.sdk.ProcessException;
import ai.angus.sdk.Root;
import ai.angus.sdk.Service;
import ai.angus.sdk.impl.ConfigurationImpl;
import ai.angus.sdk.impl.File;

public class FaceDetect {
    public static void main(String[] args) throws IOException, ProcessException {
        Configuration conf = new ConfigurationImpl();
        Root root = conf.connect();
        Service service = root.getServices().getService("age_and_gender_estimation", 1);

        JSONObject params = new JSONObject();
        params.put("image", new File("Downloads/IMG_1060.jpg"));

        Job job = service.process(params);
        System.out.println(job.getResult().toJSONString());
    }
}
I don't understand the problem with it. I have tried all the answers on Stack Overflow, but nothing is working for me.
Remove the single quotes around the classpath:
java -cp angus-sdk-java-0.0.2-jar-with-dependencies.jar:. FaceDetect
I'm having a problem trying to copy a directory from my local system to HDFS using Java code. I'm able to move individual files, but can't figure out a way to move an entire directory with its sub-folders and files. Can anyone help me with that? Thanks in advance.
Just use the FileSystem's copyFromLocalFile method. If the source Path is a local directory it will be copied to the HDFS destination:
...
Configuration conf = new Configuration();
conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("/home/user/directory/"),
new Path("/user/hadoop/dir"));
...
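For completeness, a self-contained sketch of the same idea (the class name CopyDirToHdfs and the paths are just placeholders to adapt to your setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyDirToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        // Recursively copies the local directory, including sub-folders and files, to HDFS.
        fs.copyFromLocalFile(new Path("/home/user/directory/"),
                             new Path("/user/hadoop/dir"));
    }
}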
Here is the full working code to read and write into HDFS. It takes two arguments:
Input path (local / HDFS)
Output path (HDFS)
I used the Cloudera sandbox.
package hdfsread;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadingAFileFromHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Path outputPath = new Path(args[1]);

        Configuration myConf = new Configuration();
        myConf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");

        FileSystem fSystem = FileSystem.get(URI.create(uri), myConf);
        OutputStream os = fSystem.create(outputPath);
        InputStream in = null;
        try {
            in = new BufferedInputStream(new FileInputStream(uri));
            IOUtils.copyBytes(in, os, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // copyBytes was told not to close the streams, so close both here.
            IOUtils.closeStream(in);
            IOUtils.closeStream(os);
        }
    }
}