I want to run the jar file of a MapReduce job. My input and output files are in HDFS. My WordCountJob.jar file is on the Desktop.
Input file (inside HDFS): /rucha/input/sample.txt
Output file (inside HDFS): /rucha/output/result
This is what I tried:
hadoop jar WordCountJob.jar /usr/local/hadoop/input /usr/local/Cellar/hadoop/output/result
So what would be the command for running this jar file so that it takes input from HDFS and stores the result in HDFS?
You need to modify the command as below:
hadoop jar <local path to the jar>/WordCountJob.jar <fully qualified class name> /rucha/input/sample.txt /rucha/output/result
You can find the class name in the main program.
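For example, assuming the jar is on the Desktop and the driver class is mypackage.WordCount (a hypothetical name; substitute the actual class from your main program), the command would look something like:
hadoop jar ~/Desktop/WordCountJob.jar mypackage.WordCount /rucha/input/sample.txt /rucha/output/result
Note that the output directory must not exist yet; Hadoop refuses to write into an existing output path.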
I wanted to read the file directly from HDFS without copying it to the local file system, but I copied the results to the local file system instead:
hduser@ubuntu:/usr/local/hadoop$ mkdir /tmp/gutenberg-output
bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/11/17 21:58:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The error output on Linux is:
getmerge: `/tmp/gutenberg-output': Is a directory
How do I fix this error?
You seem to be trying to output this particular HDFS directory itself instead of the contents inside it.
The good thing about HDFS, though, is that it follows a couple of Unix-based command line conventions, so you can read the contents of a file under this directory (which supposedly holds the output of a job) by using the cat command like this:
hadoop fs -cat output_directory/part-r-00000
Where output_directory is the name of the directory in which your desired output is stored, and part-r-00000 is the name of the file (or the first of a set of files named part-r-00000, part-r-00001, etc., depending on the number of reducers your job defines) containing the results of the job.
If the above command throws an error saying there is no file with that name, then either your job stumbled upon a problem before setting the output key-value pairs, or your version of Hadoop is a bit older and the output file(s) are named something like part-00000, part-00001, and so on.
As an example, the output in the screenshot below is from an executed job whose output was stored under the wc_out directory in HDFS:
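For that example, the command would be something like (wc_out being the job-specific output directory):
hadoop fs -cat wc_out/part-r-00000
As for the original getmerge error: it occurs because the local destination /tmp/gutenberg-output is an existing directory; pointing getmerge at a file path inside it should work, for example:
hadoop fs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output/merged.txt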
Summary: I need a lot of dynamic data for my performance testing, and it is not possible to generate those test data from JMeter itself. Hence, I wrote Java code that generates the dynamic test data and puts it into an Excel file. This Excel file can then be consumed by the JMeter script for the performance testing. Every iteration in JMeter needs a fresh set of test data, which is why I created a bat file that triggers the Maven execution (it is just mvn clean test) and generates a fresh set of test data before each iteration. Everything works fine up to this point. I just need to run the bat file from JMeter to trigger the test data creation before each iteration, and that is the problem I am facing.
Problem: As described in How to run batch file(.bat) from Jmeter and as suggested by user @Dmitry T, I added the OS Process Sampler with the given parameters (see the screenshot below), but it does not start the Maven execution. It hits the bat file (I put a msg command in it to check), but somehow the execution does not start. I also tried the other solution given by the same user, using a Beanshell Sampler to run the command
Runtime.getRuntime().exec("C:/Windows/System32/cmd.exe /c D:/XXXX/XXX/XXXX/GenerateTestData.bat");
This is also not working. Am I missing something here? Is there any solution for this? I would appreciate any help.
The batch file is most likely not designed to work properly when the current directory on execution differs from the directory containing the batch file. The current directory can be any directory: very common are %SystemRoot% (the Windows directory) and %SystemRoot%\System32 or %SystemRoot%\SysWOW64 (the Windows system directory), but any directory can be the current directory when a batch file is run.
A batch file referencing other files or directories relative to the batch file directory should either set the current directory to the batch file directory or reference all directories and files with the full batch file path.
Argument 0 of a batch file is always the batch file itself. The help output of call /?, run in a command prompt window, explains how to reference an argument with a modifier. In this case %~dp0 should be used to get the full path of the batch file.
So the following can be used at the top of the batch file:
@echo off
cd /D "%~dp0"
The second command line sets the current directory to the directory containing the batch file, as long as the batch file is stored on a drive with a drive letter.
There is another method to make the directory of the batch file the current directory, which works even when the batch file is stored on a network resource and executed via its UNC path.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
pushd "%~dp0" || exit /B
rem Other commands accessing files and directories in batch file directory
rem using no path or a path relative to current working directory.
popd
endlocal
The help output of pushd /?, run in a command prompt window, describes why this code works even with a UNC path when command extensions are enabled. The second command line makes sure of that, and together with the first line it completely defines the execution environment for the batch file, without depending on configuration outside of the batch file.
Another solution is to reference all files and directories in the batch file directory with their full path, i.e., by using %~dp0, for example "%~dp0ExcelFile.xlsx".
Note: The path string referenced with %~dp0 always ends with a backslash, which is the directory separator on Windows, as explained in the Microsoft documentation about Naming Files, Paths, and Namespaces. Therefore, concatenating %~dp0 with another string, such as a file/folder name or wildcard pattern, should always be done without an additional backslash to get a 100% correct full file/folder/pattern argument string.
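Putting this together, a minimal sketch of what GenerateTestData.bat could look like, under the assumption that Maven is on the PATH and the pom.xml lives next to the batch file:
@echo off
rem Make the batch file directory the current directory.
cd /D "%~dp0"
rem Run the Maven build that regenerates the test data; call is required
rem because mvn is itself a batch script and would otherwise not return.
call mvn clean test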
In the Command input, provide the full path to cmd.exe
Change the Working directory to the directory where your batch file lives
Use just the batch file name in the Command Parameters
Something like:
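A sketch of the OS Process Sampler configuration under these assumptions (the D:\XXXX\XXX\XXXX path is taken from the question; the /c switch tells cmd.exe to run the given command and then exit):
Command: C:\Windows\System32\cmd.exe
Working Directory: D:\XXXX\XXX\XXXX
Command parameters: /c GenerateTestData.bat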
See the How to Run External Commands and Programs Locally and Remotely from JMeter article for more details.
Alternatively, you can use the Maven Exec Plugin to run your custom command before running the JMeter test.
I have Hadoop set up in stand-alone mode on Ubuntu.
I have a JAR file with a MapReduce program, runner.JAR, in the /home/ubuntu folder.
The package for the JAR file is mypackage3.
I have an input file, demo.csv, in the /home/ubuntu folder.
I want to execute this jar file with demo.csv as the input.
I used the commands below to create an input folder and copy Hadoop's configuration files into it:
mkdir ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
Can you please tell me how to execute this MapReduce program?
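A minimal sketch of what such an invocation could look like, assuming the driver class is mypackage3.Runner (a hypothetical name; use whatever class holds your main method) and recalling that stand-alone mode reads from and writes to the local file system:
hadoop jar /home/ubuntu/runner.JAR mypackage3.Runner /home/ubuntu/demo.csv /home/ubuntu/output
The output folder (here the hypothetical /home/ubuntu/output) must not exist before the run.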
Whenever I run the following code to download a .csv file to my local system from a distributed file system (HDFS), I get the following error:
ERROR: Streaming jar not found
The command I executed is:
dumbo cat <hdfs path for .csv file> -hadoop $HADOOP_INSTALL > <.csv file path in local system>
I want to open the .csv file in LibreOffice Calc. Thank you.
To access files on HDFS, you have various options. From the command line, there are two:
Accessing HDFS via an NFS-mounted drive (then you can use the same commands as with every other file on a mounted drive with a Linux filesystem)
Using the HDFS commands.
In your case, the following should do the job:
hdfs dfs -get hdfs-path localpath
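For example, assuming the file sits at /user/hduser/data/report.csv in HDFS (a hypothetical path), you could fetch it and open it in LibreOffice Calc directly:
hdfs dfs -get /user/hduser/data/report.csv /home/hduser/report.csv
libreoffice --calc /home/hduser/report.csv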
How can I define the path for HDFS inside my shell or Perl script so that it picks up the input files stored in HDFS and executes the script? It executes correctly under the local file system, but I need it to work with HDFS.
For example, I have the part of the script below defined for executing against a local path:
# Define names of folders to be watched
$folderRoot = '/home/local';
# A Java program
$oscmd = "java -classpath /home/local";
#print "forking Java PGM [$thisFile] [$oscmd]\n";
$oscmdResult = `$oscmd`;
print "$oscmdResult\n";
How do I define the HDFS path inside the shell or Perl script?
How do I define the Java classpath in the Java program for HDFS, so that when the shell script is called it invokes the Java program as well?
My objective: the Perl/shell script needs to pick up the input files in HDFS and execute successfully.
Mount HDFS at some mount point and then define a path to that place in the script.
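A sketch of that approach, assuming a Hadoop distribution that ships the hadoop-fuse-dfs tool and a NameNode reachable at namenode:8020 (both are assumptions; adjust to your cluster):
hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs
With HDFS mounted, the script only needs a different root, e.g. in the Perl snippet above:
$folderRoot = '/mnt/hdfs/user/hduser/input';  # hypothetical HDFS folder, now visible as a local path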