I want to read a file from the Hadoop File System. To build the correct path to the file, I need the host name and port of HDFS, so the final path will look something like:
Path path = new Path("hdfs://123.23.12.4344:9000/user/filename.txt");
Now I want to know how to extract the host name ("123.23.12.4344") and the port (9000).
Basically, I want to access the FileSystem on Amazon EMR, but when I use FileSystem fs = FileSystem.get(getConf()); I get:
You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path
So I decided to use a URI (I have to use a URI), but I am not sure how to obtain it.
You can use either of the following two ways to resolve your error.
1. Derive the FileSystem from the Path itself:
String infile = "file.txt";
Path ofile = new Path(infile);
FileSystem fs = ofile.getFileSystem(getConf());
2. Read the default filesystem URI from the configuration:
Configuration conf = getConf();
System.out.println("fs.default.name : - " + conf.get("fs.default.name"));
// Prints the URI, e.g. hdfs://10.214.15.165:9000
String uri = conf.get("fs.default.name");
FileSystem fs = FileSystem.get(URI.create(uri), getConf());
(Note that FileSystem.get expects a java.net.URI, not a String, hence the URI.create.)
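If you also need the host name and port themselves, you can parse them out of that URI with java.net.URI; a minimal sketch (fs.default.name is the deprecated alias of fs.defaultFS, either key works here):
URI uri = URI.create(conf.get("fs.default.name")); // e.g. hdfs://10.214.15.165:9000
String host = uri.getHost(); // "10.214.15.165"
int port = uri.getPort();    // 9000, or -1 if no port was given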
I am writing a small program to load HDFS files using Java. When I run the code, I get the list of files from HDFS, but I want to get only the partition files, e.g. the part-00000 files.
Below is the sample code:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost");
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost"), conf);
RemoteIterator<LocatedFileStatus> fsStatus = hdfs.listFiles(new Path("/hdfs/path"), true);
while (fsStatus.hasNext()) {
    String path = fsStatus.next().getPath().toString();
    System.out.println(path.matches("part-"));
}
I assume you want to print the path, not the fact that it matches. Note also that path here is the full URI string, so it will never start with part-; test the file name instead:
String name = fsStatus.next().getPath().getName();
if (name.startsWith("part-")) {
    System.out.println(name);
}
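Putting it together, a minimal sketch of the whole listing (assuming the same hdfs://localhost setup as in the question):
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost"), conf);
// Recursively list the files and keep only the part-* outputs
RemoteIterator<LocatedFileStatus> it = hdfs.listFiles(new Path("/hdfs/path"), true);
while (it.hasNext()) {
    Path p = it.next().getPath();
    if (p.getName().startsWith("part-")) {
        System.out.println(p);
    }
}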
I'd like to use Google's JIMFS for creating a virtual file system for testing purposes. I have trouble getting started, though.
I looked at this tutorial: http://www.hascode.com/2015/03/creating-in-memory-file-systems-with-googles-jimfs/
However, when I create the file system, it actually gets created in the existing file system, i.e. I cannot do:
Files.createDirectory("/virtualfolder");
because I am denied access.
Am I missing something?
Currently, my code looks something like this:
Test Class:
FileSystem fs = Jimfs.newFileSystem(Configuration.unix());
Path vTargetFolder = fs.getPath("/Store/homes/linux/abc/virtual");
TestedClass test = new TestedClass(vTargetFolder.toAbsolutePath().toString());
Java class somewhere:
targetPath = Paths.get(targetName);
Files.createDirectory(targetPath);
// etc., creating files and writing them to the target directory
However, I created a separate class just to test JIMFS, and there the creation of the directory doesn't fail, but I cannot create a new file like this:
FileSystem fs = Jimfs.newFileSystem(Configuration.unix());
Path data = fs.getPath("/virtual");
Path dir = Files.createDirectory(data);
Path file = Files.createFile(Paths.get(dir + "/abc.txt")); // throws NoSuchFileException
What am I doing wrong?
The problem is a mix-up between the default FileSystem and the new FileSystem.
Problem 1:
Files.createDirectory("/virtualfolder");
This will actually not compile, so I suspect you meant:
Files.createDirectory( Paths.get("/virtualfolder"));
This attempts to create a directory in the root directory of the default filesystem. You need privileges to do that, and you probably should not do it in a test. I suspect you tried to work around this problem by using strings and ran into
Problem 2:
Let's look at your code, with comments:
FileSystem fs = Jimfs.newFileSystem(Configuration.unix());
// now get path in the new FileSystem
Path data = fs.getPath("/virtual");
// create a directory in the new FileSystem
Path dir = Files.createDirectory(data);
// create a file in the default FileSystem
// with a parent that was never created there
Path file = Files.createFile(Paths.get(dir + "/abc.txt")); // throws NoSuchFileException
Let's look at the last line:
dir + "/abc.txt" >> is the string "/virtual/abc.txt"
Paths.get(dir + "/abc.txt") >> is that string as a path in the default filesystem
Remember: the virtual filesystem lives in parallel to the default filesystem. Paths belong to a filesystem and cannot be used in another filesystem; they are not just names.
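To make this concrete, a minimal sketch (assuming Jimfs on the classpath; the paths are illustrative):
FileSystem jimfs = Jimfs.newFileSystem(Configuration.unix());
Path virtualDir = jimfs.getPath("/virtual"); // lives in the Jimfs filesystem
Path defaultDir = Paths.get("/virtual");     // lives in the default filesystem
System.out.println(virtualDir.getFileSystem()); // the Jimfs filesystem
System.out.println(defaultDir.getFileSystem()); // the default (OS) filesystem
// Build child paths against the directory so they stay in the same filesystem:
Files.createDirectory(virtualDir);
Files.createFile(virtualDir.resolve("abc.txt")); // /virtual/abc.txt inside Jimfs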
Notes:
When working with virtual filesystems, avoid the Paths class: it always works in the default filesystem. Files is fine, because you will have created the path in the correct filesystem first.
If your original plan was to work with a virtual filesystem mounted into the default filesystem, you need a bit more. I have a project where I create a WebDAV server based on a virtual filesystem and then use OS built-in methods to mount it as a volume.
In your shell, try ls /; the output should contain the /virtual directory.
If it does not, which I suspect, then the program is masking a
java.nio.file.AccessDeniedException: /virtual/abc.txt
In reality the code should already be failing at Path dir = Files.createDirectory(data);, but for some reason that exception is silent, and the program continues without creating the directory (while thinking it has), then attempts to write to a directory that doesn't exist, leaving a misleading
java.nio.file.NoSuchFileException
I suggest you use memoryfilesystem instead. It has a much more complete implementation than Jimfs; in particular, it supports POSIX attributes when creating a "Linux" filesystem etc.
Using it, your code will actually work:
try (
    final FileSystem fs = MemoryFileSystemBuilder.newLinux().build("testfs")
) {
    // create a directory, a file within this directory, etc.
}
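For instance, a minimal sketch of what the body might look like (file names are illustrative; the usual java.nio.file imports are assumed):
try (FileSystem fs = MemoryFileSystemBuilder.newLinux().build("testfs")) {
    Path dir = Files.createDirectory(fs.getPath("/virtual"));
    Path file = Files.createFile(dir.resolve("abc.txt"));
    Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8)); // prints [hello]
}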
Seems like instead of
Path file = Files.createFile(Paths.get(dir + "/abc.txt"));
you should be doing
Path file = Files.createFile(dir.resolve("abc.txt"));
This way, the context of dir (its filesystem) is not lost. Note the relative "abc.txt": resolving the absolute "/abc.txt" would put the file at the root rather than inside dir.
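A quick illustration of resolve (both results live on dir's filesystem):
Path dir = fs.getPath("/virtual");
dir.resolve("abc.txt");  // -> /virtual/abc.txt (relative: appended under dir)
dir.resolve("/abc.txt"); // -> /abc.txt (absolute: replaces dir entirely)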
For unit tests, I'd like to create an in-memory file system with VFS.
My current code:
final String ROOTPATH = "ram://virtual";
FileSystemManager fsManager = VFS.getManager();
fsManager.createVirtualFileSystem(ROOTPATH);
FileObject testFile = fsManager.resolveFile(ROOTPATH + "/test.txt");
testFile.createFile();
FileObject testFile2 = fsManager.resolveFile(ROOTPATH + "/test2.txt");
testFile2.createFile();
FileObject testFile3 = fsManager.resolveFile(ROOTPATH + "/test3.txt");
testFile3.createFile();
FileObject testFile4 = fsManager.resolveFile(ROOTPATH + "/test4.txt");
testFile4.createFile();
FileObject folder = fsManager.resolveFile(ROOTPATH);
FileObject[] files = folder.getChildren();
for (FileObject file : files) {
    System.out.println(file.getName());
}
My question: is this the correct way to do it? Examples on this topic are sparse.
I still get the log message:
Apr 14, 2015 11:08:17 AM org.apache.commons.vfs2.VfsLog info
INFORMATION: Using "/tmp/vfs_cache" as temporary files store.
Can I ignore this, since I am using the ram URI scheme? I guess it's because I didn't configure the DefaultFileSystemManager.
Thanks for help and tips!
EDIT:
Now with the marschall memoryFileSystem:
I copied the example code from their website.
This is my @Test method:
FileSystem fileSystem = this.rule.getFileSystem();
Path testDirectoryPath = Paths.get("test");
Files.createDirectories(testDirectoryPath);
Path p = fileSystem.getPath("test");
System.out.println(p.getFileName());
System.out.println(p.getFileSystem());
Path testfile = Paths.get("test/text2.txt");
Path test = Files.createFile(testfile);
Path f = fileSystem.getPath("test/text2.txt");
System.out.println(f.getFileName());
System.out.println(f.getFileSystem());
System.out.println(f.toAbsolutePath());
This is the console output:
test
MemoryFileSystem[VirtualTestFileSystem]
text2.txt
MemoryFileSystem[VirtualTestFileSystem]
/test/text2.txt
Looks alright; however, the files and directories actually get created on my hard drive, in the project folder. I thought the whole point of this was to avoid exactly that...? Am I doing something wrong, or do I just not get it?
You are using Paths.get:
Path testDirectoryPath = Paths.get("test");
Paths.get always creates paths on the default file system. You should use
Path testDirectoryPath = fileSystem.getPath("test");
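Applied to the whole test method from your edit, every Paths.get call becomes fileSystem.getPath; a sketch (using the same rule-provided filesystem):
FileSystem fileSystem = this.rule.getFileSystem();
Path testDirectoryPath = fileSystem.getPath("test");  // was Paths.get("test")
Files.createDirectories(testDirectoryPath);
Path testfile = fileSystem.getPath("test/text2.txt"); // was Paths.get("test/text2.txt")
Path test = Files.createFile(testfile);
System.out.println(test.toAbsolutePath()); // /test/text2.txt, in the memory filesystem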
Is it possible to use an instance of the Hadoop FileSystem, created from any valid HDFS URL, to read and write different HDFS URLs? I have tried the following:
String url1 = "hdfs://localhost:54310/file1.txt";
String url2 = "hdfs://localhost:54310/file2.txt";
String url3 = "hdfs://localhost:54310/file3.txt";
//Creating filesystem using url1
FileSystem fileSystem = FileSystem.get(URI.create(url1), conf);
//Using same filesystem with url2 and url3
InputStream in = fileSystem.open(new Path(url2));
OutputStream out = fileSystem.create(new Path(url3));
This works, but will it cause any other issues?
You can certainly create a single FileSystem for your scheme and address and then retrieve it via FileSystem.get:
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:54310");
FileSystem fs = FileSystem.get(conf);
InputStream is = fs.open(new Path("/file1.txt"));
For paths on a different DFS, the create/open methods will fail. Look at the org.apache.hadoop.fs.FileSystem#checkPath method.
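In other words, reusing the instance is fine as long as scheme and authority match; a sketch of what happens otherwise (host names are illustrative):
FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:54310"), conf);
// Same scheme and authority as the FileSystem: works
InputStream in = fs.open(new Path("hdfs://localhost:54310/file2.txt"));
// Different authority: checkPath throws IllegalArgumentException ("Wrong FS: ...")
try {
    fs.open(new Path("hdfs://otherhost:54310/file2.txt"));
} catch (IllegalArgumentException e) {
    System.out.println(e.getMessage());
}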
I am an amateur with Hadoop and related tools. I am trying to access a Hadoop cluster (HDFS) and retrieve the list of files from an Eclipse client. After setting up the required configurations on the Hadoop Java client, I can perform copyFromLocalFile and copyToLocalFile operations, accessing HDFS from the client.
Here's what I am facing: when I call the listFiles() method, I get
org.apache.hadoop.fs.LocatedFileStatus@d0085360
org.apache.hadoop.fs.LocatedFileStatus@b7aa29bf
Main method:
Properties props = new Properties();
props.setProperty("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020");
props.setProperty("mapreduce.jobtracker.address", "<IPOFCLUSTER>:8032");
props.setProperty("yarn.resourcemanager.address", "<IPOFCLUSTER>:8032");
props.setProperty("mapreduce.framework.name", "yarn");
FileSystem fs = FileSystem.get(toConfiguration(props)); // Setting up the required configurations
Path p4 = new Path("/user/myusername/inputjson1/");
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext()) {
    System.out.println(ritr.next().toString());
}
I have also tried FileContext and ended up getting only the FileStatus object's string representation. Is there a way to get just the file names when I iterate over the remote HDFS directory? There is a method called getPath(); is that the only way to retrieve the full path of the files using the Hadoop API, or is there another method so that I can retrieve only the names of the files in a specified directory path? Please help me with this, thanks.
You can indeed use getPath(); it returns a Path object which lets you query the name of the file:
Path p = ritr.next().getPath();
// returns the filename or directory name if directory
String name = p.getName();
The FileStatus object you get can also tell you whether it is a file or a directory.
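Putting it together with your loop, a sketch that prints only the file names (using fs and p4 from your main method):
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext()) {
    LocatedFileStatus status = ritr.next();
    System.out.println(status.getPath().getName()); // just the file name
}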
Here is more API documentation:
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/Path.html
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/FileStatus.html