Copying a directory from the local system to HDFS with Java code

I'm having a problem trying to copy a directory from my local system to HDFS using Java code. I'm able to move individual files but can't figure out a way to move an entire directory with its sub-folders and files. Can anyone help me with that? Thanks in advance.

Just use the FileSystem's copyFromLocalFile method. If the source Path is a local directory, it will be copied to the HDFS destination:
...
Configuration conf = new Configuration();
conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("/home/user/directory/"),
        new Path("/user/hadoop/dir"));
...
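If you also need to control whether the local source is deleted and whether an existing destination is overwritten, FileSystem provides an overload of copyFromLocalFile with those flags. A short sketch reusing the same placeholder paths:

// delSrc = false keeps the local copy; overwrite = true replaces the HDFS target if it already exists
fs.copyFromLocalFile(false, true,
        new Path("/home/user/directory/"),
        new Path("/user/hadoop/dir"));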

Here is the full working code to read from and write to HDFS. It takes two arguments:
Input path (local / HDFS)
Output path (HDFS)
I used the Cloudera sandbox.
package hdfsread;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadingAFileFromHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];                 // source path (first argument)
        Path outputPath = new Path(args[1]);  // destination path on HDFS (second argument)

        Configuration myConf = new Configuration();
        myConf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        FileSystem fSystem = FileSystem.get(URI.create(uri), myConf);

        InputStream is = null;
        OutputStream os = null;
        try {
            is = new BufferedInputStream(new FileInputStream(uri));
            os = fSystem.create(outputPath);
            IOUtils.copyBytes(is, os, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close both the input and the output stream
            IOUtils.closeStream(is);
            IOUtils.closeStream(os);
        }
    }
}
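A hypothetical invocation, assuming the class above is packaged into a jar named hdfsread.jar (the jar name and both paths are placeholders):

hadoop jar hdfsread.jar hdfsread.ReadingAFileFromHDFS /home/cloudera/sample.txt /user/cloudera/sample.txt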

Related

Read and Write Parquet and HDFS files from java

Currently I can read and write HDFS files in Java, but I don't know how to read Apache Parquet files in addition to plain HDFS files. My goal is to be able to read and write both kinds of files in Java.
package com.leerhdfs;

//import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

import java.io.*;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ReadWriteHDFSExample {
    public static void main(String[] args) throws IOException {
        String localsrc = args[0];    // local source file
        String destinosrc = args[1];  // HDFS destination path

        InputStream in = new BufferedInputStream(new FileInputStream(localsrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(destinosrc), conf);

        // create the HDFS file, printing a dot on every progress callback
        OutputStream out = fs.create(new Path(destinosrc), new Progressable() {
            public void progress() {
                System.out.println(".");
            }
        });

        // copy and close both streams (the last argument = true closes them)
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
Please help me!!!
Thanks!!
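As a possible direction (this is only a sketch of one common approach, not something from the original post): Parquet files are usually read through the parquet-avro module, which returns each row as an Avro GenericRecord. The sketch below assumes parquet-avro is on the classpath and that the first argument is a path to a Parquet file on HDFS:

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ReadParquetSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // e.g. an hdfs:// URI pointing at a .parquet file
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(HadoopInputFile.fromPath(path, conf)).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record); // print each row as an Avro record
            }
        }
    }
}

Writing Parquet works the same way through AvroParquetWriter in the same module.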

Hadoop is complaining about a nonexistent anonymous class (NoClassDefFoundError)

Consider a simple Java file which creates a BufferedInputStream to copy a local file 1400-8.txt to Hadoop HDFS and prints some dots as a progress status. The example is Example 3-3 from the Hadoop book here.
// cc FileCopyWithProgress Copies a local file to a Hadoop filesystem, and shows progress
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

// vv FileCopyWithProgress
public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
// ^^ FileCopyWithProgress
I compile the code and create the JAR file with
hadoop com.sun.tools.javac.Main FileCopyWithProgress.java
jar cf FileCopyWithProgress.jar FileCopyWithProgress.class
The above commands generate the files FileCopyWithProgress.class, FileCopyWithProgress$1.class and FileCopyWithProgress.jar. Then, I try to run it
hadoop jar FileCopyWithProgress.jar FileCopyWithProgress 1400-8.txt hdfs://localhost:9000/user/kostas/1400-8.txt
But, I receive the error
Exception in thread "main" java.lang.NoClassDefFoundError:
FileCopyWithProgress$1
To my understanding, the FileCopyWithProgress$1.class is due to the anonymous callback function the program declares. But since the file exists, what is the issue here? Am I running the correct sequence of commands?
I found the issue, so I am just posting in case it helps someone. I had to include the class FileCopyWithProgress$1.class in the JAR. The correct command should be
jar cf FileCopyWithProgress.jar FileCopyWithProgress*.class

Compilation of Hadoop Java program with additional dependencies

I'm attempting to build a Hadoop program whose purpose is to cat files that I've previously uploaded to HDFS, based largely on this tutorial. The program looks like this:
import java.io.*;
import java.net.URI;

import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReadHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
It seems to me that the tutorial is flawed because, as I understand it, IOUtils is part of the Apache Commons library. However, although I added the following line to the program I've been trying to deploy:
import org.apache.commons.compress.utils.IOUtils;
I'm still met with the following errors:
FileSystemCat.java:37: error: cannot find symbol
IOUtils.copyBytes(in, System.out, 4096, false);
^
symbol: method copyBytes(InputStream,PrintStream,int,boolean)
location: class IOUtils
FileSystemCat.java:40: error: cannot find symbol
IOUtils.closeStream(in);
^
symbol: variable in
location: class FileSystemCat
2 errors
I'm executing it on the NameNode with this command:
javac -cp /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.1.jar:/home/ubuntu/job_program/commons-io-2.5/commons-io-2.5.jar FileSystemCat.java
Necessary addition to ~/.bashrc:
# Classpath for Java
# export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
How to compile the program at the bottom:
javac -cp ${HADOOP_CLASSPATH}:commons-io-2.5.jar ReaderHDFS.java
How to generate the jar file for that program:
jar cf rhdfs.jar ReaderHDFS*.class
Run the command:
$HADOOP_HOME/bin/hadoop jar rhdfs.jar ReaderHDFS hdfs://master:9000/input_1/codes.txt
This is the program (note that it imports org.apache.hadoop.io.IOUtils, not the Commons IOUtils):
import org.apache.hadoop.io.IOUtils;
//import org.apache.commons.io.IOUtils;
import java.io.*;
import java.net.URI;

import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReaderHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Permission denied when copying a file from the local system to HDFS from an STS Java program

I am working on HDFS and trying to copy a file from the local system to the HDFS file system using the Configuration and FileSystem classes from the Hadoop conf and fs packages, as follows:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithWrite {
    public static void main(String[] args) {
        String localSrc = "/Users/bng/Documents/hContent/input/ncdc/sample.txt";
        String dst = "hdfs://localhost/sample.txt";
        try {
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            IOUtils.copyBytes(in, out, 4092, true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But running this program gives me an exception as follows:
org.apache.hadoop.security.AccessControlException: Permission denied: user=KV, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6545)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6527)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6479)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2712)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2632)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
The reason is clear: the current user KV does not have write permission for the target directory in HDFS.
I tried copying the file from the console, which works fine. I used the following commands from the console:
sudo su
hadoop fs -copyFromLocal /Users/bng/Documents/hContent/input/ncdc/sample.txt hdfs://localhost/sample.txt
I found a lot of search results on Google, but none of them worked for me. How can I solve this issue? How can I run this specific class from STS or Eclipse with sudo permissions? Or is there any other option?
Providing the permissions to the current user in HDFS solved the problem for me.
I added the permissions in HDFS as follows:
hadoop fs -chown -R KV:KV hdfs://localhost
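Another option that is sometimes used in development setups (this is my own addition, not part of the original answer, and it assumes the cluster runs with simple, non-Kerberos authentication): the Java client can identify itself as a different HDFS user by setting the HADOOP_USER_NAME property before the FileSystem is obtained. "hdfs" below is a placeholder for a user that has write access to the target path:

// Sketch only: works with simple authentication, not with Kerberos
System.setProperty("HADOOP_USER_NAME", "hdfs");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://localhost/"), conf);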

Why doesn't Java Files.createFile with permissions work correctly?

I need to create a file and set permissions (-rwxrw-r) on it; the permissions of the parent dir are (drwxrwxr--). The problem is that the write permission is not set on the created files. The user that ran this application is the owner of the parent dir.
Below is my test class that shows the same problem. When I run this program, the permissions of the generated file are (-rwxr--r--), even though the class sets the permissions to (-rwxrw-rw-). Why is the write permission not set, and why don't I see any exception?
Any idea?
Any idea?
import java.io.BufferedInputStream;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class TestPermission {
    static String parentDir = "/tmp/test/";
    static Set<PosixFilePermission> defaultPosixPermissions = null;

    static {
        defaultPosixPermissions = new HashSet<>();
        defaultPosixPermissions.add(PosixFilePermission.OWNER_READ);
        defaultPosixPermissions.add(PosixFilePermission.OWNER_WRITE);
        defaultPosixPermissions.add(PosixFilePermission.OWNER_EXECUTE);
        defaultPosixPermissions.add(PosixFilePermission.GROUP_READ);
        defaultPosixPermissions.add(PosixFilePermission.GROUP_WRITE);
        // Others have read permission so that an ftp user who doesn't belong to the group can fetch the file
        defaultPosixPermissions.add(PosixFilePermission.OTHERS_READ);
        defaultPosixPermissions.add(PosixFilePermission.OTHERS_WRITE);
    }

    public static void createFileWithPermission(String fileName) throws IOException {
        // File parentFolder = new File(parentDir);
        // PosixFileAttributes attrs = Files.readAttributes(parentFolder.toPath(), PosixFileAttributes.class);
        // System.out.format("parentfolder permissions: %s %s %s%n",
        //         attrs.owner().getName(),
        //         attrs.group().getName(),
        //         PosixFilePermissions.toString(attrs.permissions()));
        // FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions.asFileAttribute(attrs.permissions());
        FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions.asFileAttribute(defaultPosixPermissions);
        File file = new File(fileName);
        Files.createFile(file.toPath(), attr);
    }

    public static void main(String[] args) throws IOException {
        String fileName = parentDir + "testPermission_" + System.currentTimeMillis();
        createFileWithPermission(fileName);
    }
}
I believe the catch here is
The check for the existence of the file and the creation of the new
file if it does not exist are a single operation that is atomic with
respect to all other filesystem activities that might affect the
directory.
as mentioned in the documentation of the Files class.
This might be because of OS operations that happen after the file is created. The following modification to the code should make things work:
File file = new File(fileName);
Files.createFile(file.toPath(), attr);
Files.setPosixFilePermissions(file.toPath(), defaultPosixPermissions); // Re-apply the permissions after the file is created
It turns out that the reason is that my OS has a umask of 0027 (u=rwx, g=rx, o=), which means the application has no way to set permissions for the "others" group at creation time.
Files.createFile(file.toPath(), attr);
In the line above, instead of using Files.createFile, use file.createNewFile(); if it returns true, then the file has been created.
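Putting the two suggestions together, here is a minimal sketch (the path and the permission string are illustrative) that creates the file and then reapplies the desired permissions explicitly, which, unlike the permissions passed to createFile, is not subject to the process umask:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class CreateWithPermissions {
    public static void main(String[] args) throws IOException {
        Path target = Paths.get("/tmp/test/testPermission_" + System.currentTimeMillis());
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxrw-r--");
        Files.createFile(target);                      // creation is restricted by the umask
        Files.setPosixFilePermissions(target, perms);  // explicit chmod afterwards
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(target)));
    }
}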
