I am trying to use Hive 1.2.0 over Hadoop 2.6.0. I have created an employee table. However, when I run the following query:
hive> load data local inpath '/home/abc/employeedetails' into table employee;
I get the following error:
Failed with exception Unable to move source file:/home/abc/employeedetails to destination hdfs://localhost:9000/user/hive/warehouse/employee/employeedetails_copy_1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
What am I doing wrong here? Are there any specific permissions I need to set? Thanks in advance!
As mentioned by Rio, the issue was a lack of permission to load data into the Hive table. I figured out that the following command solves my problem:
hadoop fs -chmod g+w /user/hive/warehouse
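To confirm the group write bit is actually set afterwards (just a sanity check; the -d flag lists the directory itself rather than its contents):
hdfs dfs -ls -d /user/hive/warehouse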
Check the permissions on the HDFS directory:
hdfs dfs -ls /user/hive/warehouse/employee/employeedetails_copy_1
It seems like you may not have permission to load data into the Hive table.
The error might be due to a permission issue on the local filesystem.
Change the permissions on the local directory:
sudo chmod -R 777 /home/abc/employeedetails
Now, run:
hive> load data local inpath '/home/abc/employeedetails' into table employee;
If you face the same error after running the above command in distributed mode, try the command below as a superuser on all nodes:
sudo usermod -a -G hdfs yarn
Note: we got this error after restarting all the YARN services (in Ambari), and this resolved my problem. This is an admin command, so take care when running it.
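If you want to verify the group change before retrying, something like this should do it (assuming yarn is the user actually executing the MoveTask, which may differ on your cluster):
id yarn           # local group membership; hdfs should now appear in the list
hdfs groups yarn  # the group mapping as the NameNode sees it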
I ran into the same problem and searched for two days. Finally I found the reason: the DataNode started for a moment and then shut down.
Steps to solve it:
hadoop fs -chmod -R 777 /home/abc/employeedetails
hadoop fs -chmod -R 777 /user/hive/warehouse/employee/employeedetails_copy_1
Edit hdfs-site.xml and add the following property:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
hdfs --daemon start datanode
Edit hdfs-site.xml and find the locations of 'dfs.datanode.data.dir' and 'dfs.namenode.name.dir'. If they point to the same location, you must change one of them; this was the reason my DataNode could not start.
Under 'dfs.datanode.data.dir'/data/current, edit the VERSION file and copy its clusterID into the VERSION file under 'dfs.namenode.name.dir'/data/current, so that the two clusterIDs match.
start-all.sh
If the above does not solve it, be careful with the steps below because of the safety of your data, but these are the steps that finally solved the problem for me.
stop-all.sh
Delete the data folders under 'dfs.datanode.data.dir' and 'dfs.namenode.name.dir', as well as the tmp folder.
hdfs namenode -format
start-all.sh
This solved the problem for me.
You may also run into another problem like this one:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
Cannot create directory
/opt/hive/tmp/root/1be8676a-56ac-47aa-ab1c-aa63b21ce1fc. Name node is
in safe mode
The fix: hdfs dfsadmin -safemode leave
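Before forcing it out, it is worth confirming that the NameNode really is still in safe mode (and only leave once the DataNodes are back up and reporting blocks):
hdfs dfsadmin -safemode get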
It might be because your hive user does not have access to the HDFS directories.
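If that is the case, a minimal sketch of a fix, assuming the warehouse path from the question and that the hive user should own it (adjust the user and group to your setup):
hdfs dfs -chown -R hive:hadoop /user/hive/warehouse   # hypothetical owner and group
hdfs dfs -chmod -R g+w /user/hive/warehouse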
I am trying to use a tool that, in two steps, analyzes code smells for Android.
In the first step, the tool parses an APK and generates .db files in a directory; these should then be converted to CSV files in the next step. However, whenever I try to run the second step, the console returns the following error:
java.io.IOException: Unable to create directory path [/User/Desktop/db2/logs] for Neo4j store.
I think it is a Neo4J configuration problem.
I am currently running the tool with the following Java configuration:
echo $JAVA_HOME
/home/User/openlogic-openjdk-11.0.15
update-alternatives --config java
* 0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
To be safe, I also checked that Neo4j was running; the service returned the following output:
sudo systemctl status neo4j.service
neo4j.service - Neo4j Graph Database
Loaded: loaded (/lib/systemd/system/neo4j.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-07-06 20:11:04 CEST; 16min ago
Main PID: 1040 (java)
Tasks: 57 (limit: 18901)
Memory: 705.4M
CPU: 16.639s
CGroup: /system.slice/neo4j.service
└─1040 /usr/bin/java -cp "/var/lib/neo4j/plugins:/etc/neo4j:/usr/share/neo4j/lib/*:/var/lib/neo4j/plugins/*" -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -XX:+UnlockExper>.
How can I solve this?
You posted this error:
java.io.IOException: Unable to create directory path [/User/Desktop/db2/logs] for Neo4j store.
From that error, it looks like:
Neo4j was installed at "/User/Desktop/db2"
The permissions on that directory do not include "write" access
I tried to reproduce this locally using Neo4j Community 4.4.5, following the steps below.
I do see an IOException related to "logs", but it's slightly different from what you posted. Perhaps we're on different versions of Neo4j.
Open terminal into install directory: cd neo4j
Verify "neo4j" is stopped: ./bin/neo4j stop
Rename existing "logs" directory: mv logs logs.save
Remove write permission for the Neo4j install: chmod u-w .
Start neo4j in console mode: ./bin/neo4j console
Observe errors in console output
2022-07-08 03:28:38.081+0000 INFO Starting...
ERROR StatusLogger Unable to create file [****************************]/neo4j/logs/debug.log
java.io.IOException: Could not create directory [****************************]/neo4j/logs
...
To fix things, try:
Get a terminal into your Neo4j directory:
cd /User/Desktop/db2
Set write permissions for the entire directory tree:
chmod u+w -R .
Start neo4j in console mode:
./bin/neo4j console
If this works and you're able to run neo4j fine, it points to an issue with user permissions when running neo4j as a system service.
The best steps from there depend on the system, your access, how comfortable you are making changes, and probably other things. An easy, brute-force hammer would be to manually create each directory you discover (such as "/User/Desktop/db2/logs") and grant permissions to all users (chmod ugo+w .), then try re-running the service and see what errors pop up. Repeat that until you're able to run the service without errors.
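As a sketch of the service-side fix, assuming the packaged service runs as a dedicated neo4j user (check the User= line in /lib/systemd/system/neo4j.service), giving that user ownership of the store directory is usually cleaner than opening it up to everyone:
sudo chown -R neo4j:neo4j /User/Desktop/db2   # assumes the service user is 'neo4j'
sudo systemctl restart neo4j.service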
I installed a HDP 2.5 Hadoop/Spark cluster using cloudbreak on Azure.
Everything works except the Spark history server. In the log it says the default URI for the event log, hdfs:///spark-history, is invalid because the hostname is missing.
So I replaced it with a direct reference to the actual location on the Azure blob storage: wasb://<host>:<port>/spark-history. This URI works when used with hdfs dfs -ls, but the Spark history server still won't start. Now it complains about a class not found: Caused by: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails.
So it seems it doesn't load some driver during startup. I did find /usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar, which might be it, but I'm not sure how to make the history server load the jar during startup using the Ambari config editor, or whether this is even the right solution to the original problem.
The strangest thing is that Azure HDInsight uses blob storage and there the spark history server simply runs using the default hdfs:///spark-history setting.
Any suggestions on how to load the azure-storage driver or any other approach to this problem?
Thanks
I'll answer my own question. Someone on the Hortonworks community forum had the answer: the Spark assembly jar contains invalid storage classes. Updating the assembly jar solves the issue:
mkdir -p /tmp/jarupdate && cd /tmp/jarupdate
find /usr/hdp/ -name "azure-storage*.jar"
cp /usr/hdp/2.5.0.1-210/hadoop/lib/azure-storage-2.2.0.jar .
cp /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar .
unzip azure-storage-2.2.0.jar
jar uf spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar com/
mv -f spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar
cd .. && rm -rf /tmp/jarupdate
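As an optional sanity check (assuming unzip is available), the class from the NoClassDefFoundError should now be listed inside the patched assembly jar:
unzip -l /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar | grep BlobListingDetails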
We have a small Greenplum database cluster. When trying to read an external table in it, we get the following error:
proddb=# select count(*) from ext_table;
ERROR: external table gphdfs protocol command ended with error. sh: java: command not found (seg0 slice1 sdw:40000 pid=8675)
DETAIL:
Command: 'gphdfs://path/to/hdfs External table revenuereport_stg0, file gphdfs://Path/to/hdfs
We tried the following:
Checked the Java environment on the Greenplum master host.
Also checked setting up the parameters for GPDB:
[gpadmin#admin ~]$ gpconfig -c gp_hadoop_home -v "'/usr/lib/gphd'"
[gpadmin#admin ~]$ gpconfig -c gp_hadoop_target_version -v "'gphd-2.0'"
But it fails with this error:
[gpadmin#mdw ~]$ gpconfig -c gp_hadoop_home -v "'/usr/lib/gphd'"
20170123:02:02:04:017762 gpconfig:mdw:gpadmin-[ERROR]:-failed updating the postgresql.conf files on host: sdw
20170123:02:02:04:017762 gpconfig:mdw:gpadmin-[ERROR]:-failed updating the postgresql.conf files on host: mdw
20170123:02:02:09:017762 gpconfig:mdw:gpadmin-[ERROR]:-finished with errors
Therefore, the test for HDFS access from the Greenplum host is not working.
We also checked whether HDFS is accessible from the segment servers:
[gpadmin#sdw1 ~]$hdfs dfs -ls hdfs://hdm2:8020/
Any help on this would be much appreciated!
It looks like a path issue to me. Please set the right JAVA_HOME in the hadoop-env.sh file.
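For example, hadoop-env.sh could contain something like the following; the JDK path is only a placeholder, so point it at your actual installation:
export JAVA_HOME=/usr/java/jdk1.8.0   # placeholder path, adjust to your JDK
export PATH=$JAVA_HOME/bin:$PATH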
Also, please have a look at the following articles for a better understanding of configuring gphdfs with GPDB:
https://discuss.pivotal.io/hc/en-us/articles/202635496-How-to-access-HDFS-data-via-GPDB-external-table-with-gphdfs-protocol
https://discuss.pivotal.io/hc/en-us/articles/203083906-Understanding-GPHDFS-Configurations
https://discuss.pivotal.io/hc/en-us/articles/221492507-One-time-HDFS-Protocol-Installation-for-GPHDFS-access-to-HDP-2-x-cluster
Thanks
Pratheesh Nair
export JAVA_HOME=/usr/local/jdk18
export HADOOP_HOME=/opt/apps/hadoop
export GP_JAVA_OPT='-Xmx1000m -XX:+DisplayVMOutputToStderr'
export PATH=$JAVA_HOME/bin:$PATH
export KRB5CCNAME=$GP_SEG_DATADIR/gpdb-gphdfs.krb5cc
JAVA=$JAVA_HOME/bin/java
JAVA_HOME and HADOOP_HOME must be given concrete values and placed at the very top of the file. If you write them as references to environment variables (JAVA_HOME=$JAVA_HOME), the value GP picks up when handling gphdfs requests will be empty.
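To illustrate that point (a sketch only, reusing the placeholder paths from the snippet above):
JAVA_HOME=$JAVA_HOME          # wrong: expands to an empty value when gphdfs runs
JAVA_HOME=/usr/local/jdk18    # right: a concrete path, placed at the top of the file
HADOOP_HOME=/opt/apps/hadoop  # right: a concrete path as well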
I have a problem with HSQLDB V2.3 on Windows. I can't connect with new databases using the HSQLDB Server.
Is there a log or debug option for the server so I can check the properties loaded and file paths, etc?
Is my properties file OK? I wasn't sure how to formulate file paths for Windows.
Can I use quotes on file path names?
Is the connection string I'm using for the tmp db correct?
What's the correct syntax to use the --props server argument?
--props path
--props path/filename
I have set up two environment variables (to keep it simple). These variables don't have any effect except to save my typing. Initially I was loading the server from the HSQLDB folder directly.
HSQLDB_HOME ... home folder for the current HSQLDB
HSQLDB_DATA ... folder for data repository
I am following the steps from:
Running and Using HSQLDB
Every time I connect via the server, it makes a database called "test" instead of letting me connect to either of the two databases specified in server.properties.
%HSQLDB_DATA%/
test.log
test.properties
test.script
test.tmp/ .......... (folder)
test.lck
I made a 'server.properties' file in:
%HSQLDB_HOME%/lib/
where the HSQLDB JAR file is. I want two databases: tmp and dev:
# -- tmp
server.database.0=file:hsqldb/tmp_db/tmp
server.dbname.0=tmp_db
#
####
#
# -- dev
server.database.1=file:r:/.data/hsqldb/dev_db/dev
server.dbname.1=dev_db
I expected the properties file to be enough to set up two databases. When I run the HSQLDB manager, I don't get a connection for tmp, say:
"jdbc:hsqldb:hsql://localhost/tmp"
user: SA, password: ""
I get a pop-up error:
database alias does not exist (Manager)
[Thread[HSQLDB Connection #26827674,5,HSQLDB Connections #372f7a8d]]: database alias=tmp does not exist (Server)
I created these two manually using the cmd-line, e.g. named "tmp":
%JAVA_HOME%\bin\java.exe -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server --database.1 file:r:/.data/hsqldb/tmp_db --dbname.1 tmp_db
And could connect and create tmp:
%HSQLDB_DATA%/tmp_db/
tmp.log
tmp.properties
tmp.script
tmp.tmp/ .......... (folder)
tmp.lck
as described in the documentation. When I start up the HSQLDB Server with the aforesaid 'server.properties' file or when specifying the properties explicitly:
%JAVA_HOME%\bin\java.exe -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server --trace true --props %HSQLDB_DATA%
The server will only let me connect with a database called "test" as described at the beginning of the question.
Because the properties file looks good and the in-process file connection string works,
jdbc:hsqldb:hsqldb/tmp_db/tmp
I am left considering that the server.properties file is in the wrong place or not loading for some reason. It would be wonderful if there's a way for the server to dump the properties file at start-time :-) Thanks in advance for your suggestions ...
I have found the problem. Firstly, thanks to this tutorial:
HSQLDB Installation
After reviewing this I realised my error.
The server.properties file must be in the current folder when the server script runs. I had read that on the Running and Using HSQLDB manual page but misinterpreted its meaning and I put the properties file in my %HSQLDB_HOME%/lib folder. Oops.
When you look at the BAT script, it actually changes the current folder to the %HSQLDB_HOME%/data folder ...
cd ..\data
So the default location for your server.properties file should be your: %HSQLDB_HOME%/data if you want to work with the default runServer.bat script.
For those wanting to separate data from the server software, I made an improvement to the default script using the two environment variables as follows.
HSQLDB_HOME ... home folder for the current HSQLDB
HSQLDB_DATA ... folder for data repository
runServer.bat:
@cd /d %HSQLDB_DATA%
@cd
@echo.
@rem __ @pause
%JAVA_HOME%\bin\java -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server %1 %2 %3 %4 %5 %6 %7 %8 %9
@echo.
@pause
Which now expects my server.properties file in the %HSQLDB_DATA% folder, and that works. Also, since my server is for development/testing, I'm using the --trace=true option. Like a lot of these things, now that I get it, it all makes perfect sense. Hopefully my misunderstanding will assist others who haven't found a simple tutorial before resorting to Stack Overflow.
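An alternative, if you would rather not depend on the current folder at all, is to point the server straight at the file (assuming I read the manual correctly that --props takes the path and filename of the properties file rather than just its folder):
%JAVA_HOME%\bin\java -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server --props %HSQLDB_DATA%\server.properties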
Hey guys, I am trying to run the WordCount.java example provided by Cloudera. I ran the command below and am getting the exception shown after it. Do you have any suggestions on how to proceed? I have gone through all the steps provided by Cloudera.
Thanks in advance.
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount ~/Desktop/input
~/Desktop/output
Error:
ERROR security.UserGroupInformation: PriviledgedActionException
as:root (auth:SIMPLE)
cause:org.apache.hadoop.mapred.InvalidInputException: Input path does
not exist: hdfs://localhost/home/rushabh/Desktop/input
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: hdfs://localhost/home/rushabh/Desktop/input
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1248)
at org.myorg.WordCount.main(WordCount.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Your input and output files should be in HDFS; at the very least, the input should be in HDFS.
Use the following command:
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount hdfs:/input
hdfs:/output
To copy a file from your linux to hdfs use the following command:
hadoop dfs -copyFromLocal ~/Desktop/input hdfs:/
and check your file using :
hadoop dfs -ls hdfs:/
Hope this will help.
The error message says that this file does not exist: "hdfs://localhost/home/rushabh/Desktop/input".
Check that the file does exist at the location you've told it to use.
Check the hostname is correct. You are using "localhost" which most likely resolves to a loopback IP address; e.g. 127.0.0.1. That always means "this host" ... in the context of the machine that you are running the code on.
When I tried to run the wordcount MapReduce code, I was getting this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/hduser/wordcount
I was trying to execute the wordcount MapReduce Java code with the input and output paths /user/hduser/wordcount and /user/hduser/wordcount-output. I just prefixed the paths with the 'fs.default.name' value from core-site.xml and it ran perfectly.
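For example (assuming fs.default.name in core-site.xml is hdfs://localhost:54310; substitute whatever value your file actually contains):
hadoop jar wordcount.jar org.myorg.WordCount hdfs://localhost:54310/user/hduser/wordcount hdfs://localhost:54310/user/hduser/wordcount-output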
The error clearly states that your input path is local. Please specify an input path on HDFS rather than on the local machine. My guess is that
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount ~/Desktop/input
~/Desktop/output
needs to be changed to
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount <hdfs-input-dir>
<hdfs-output-dir>
NOTE: To run a MapReduce job, the input directory must be in HDFS, not on the local filesystem.
Hope this helps.
So I added the input folder to HDFS using the following command
hadoop dfs -put /usr/lib/hadoop/conf input/
Check the ownership of the files in HDFS to ensure that the owner of the job (root) has read privileges on the input files. Cloudera provides an HDFS viewer that you can use to view the filespace; open a web browser to either localhost:50075 or {fqdn}:50075 and click on "Browse the filesystem" to view the input directory and input files. Check the ownership flags, just like on a *nix filesystem.
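A sketch of what that check (and a possible fix) looks like from the command line; the input path here is hypothetical, so use whatever path your job actually reads:
hadoop fs -ls /user/root/input                        # shows owner and permissions, like ls -l
hadoop fs -chown -R root:supergroup /user/root/input  # hypothetical path, user and group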