Debug information with UDFs in Hive - java

I'm trying to get GeoIP working with hive. I found this: http://www.jointhegrid.com/hive-udf-geo-ip-jtg/index.jsp, which seems to be exactly what I want.
I built the jars (I have no Java experience, so I can only hope I did this part right), added them in my Hive session, and got this:
hive> ADD jar hive-udf-geo-ip-jtg.jar;
Added hive-udf-geo-ip-jtg.jar to class path
Added resource: hive-udf-geo-ip-jtg.jar
hive> ADD jar geo-ip-java.jar;
Added geo-ip-java.jar to class path
Added resource: geo-ip-java.jar
hive> ADD file GeoIPCity.dat;
Added resource: GeoIPCity.dat
hive> create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
Is there a way to find out what exactly is going wrong? Return code 1 doesn't tell me much... Is there a log file somewhere?

If you want to see Hive's log output, you can use $HIVE_HOME/bin/hive -hiveconf hive.root.logger=INFO,console. You can also change the level (DEBUG, INFO, WARN, ERROR, or FATAL) to see if you can get enough information.
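For example, a minimal way to replay the failing statements with full DEBUG output on the console (reusing the jars and class name from the question) might look like this:
$HIVE_HOME/bin/hive -hiveconf hive.root.logger=DEBUG,console
hive> ADD jar hive-udf-geo-ip-jtg.jar;
hive> ADD jar geo-ip-java.jar;
hive> create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';
The stack trace hiding behind "return code 1" should then appear on the console.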

Try executing the Hive query with the command below:
hive --hiveconf hive.root.logger=DEBUG,DRFA --hiveconf hive.log.dir=./logs -e "query"
or
hive --hiveconf hive.root.logger=DEBUG,DRFA --hiveconf hive.log.dir=./logs -f queryscript.hql
The logs will be captured in a file under the logs folder in the current directory. Make sure the logs folder exists.
Try adjusting the log level (DEBUG above) to get the right amount of detail.

Related

Corrupt H2 Database. Failed to recover using the Recovery Tool

Today one of my H2 databases failed to connect and presented the following error message:
Unable to obtain connection from database (jdbc:h2:file:C:\Users\Username\.appfiles\db\appdb) for user 'sa': File corrupted while reading record: null. Possible solution: use the recovery tool [90030-200]
SQL State : 90030
Error Code : 90030
Message : File corrupted while reading record: null. Possible solution: use the recovery tool [90030-200]
As suggested, I tried to use the recovery tool as instructed by the documentation. The steps I executed were the following:
Go to your H2 data file directory
java -cp h2-1.4.200.jar org.h2.tools.Recover
Use SQL file generated by the recovery tool to recreate the database
The steps created two files, a .sql and a .txt file, but the SQL generated by the tool didn't contain any data or DDL from the database, just some aliases and a bunch of comments. The contents of the files are linked below, in case they can help shed light on what went wrong during the process.
This is the .sql file output: https://pastebin.com/DFfwPemP
This is the .txt file output: https://pastebin.com/6zwCgqN3
Is there any step I'm doing wrong, or is there anything else I can try to recover this database? Any suggestion is welcome.
Run those files with:
java -cp h2-1.4.200.jar org.h2.tools.RunScript -url jdbc:h2:[path to destination db file]/[db name] -user [user] -password [password] -script [text file/sql file]
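For example, a minimal end-to-end sketch of that flow, assuming the corrupted database file is appdb.mv.db in the directory from the error message (so Recover writes appdb.h2.sql) and that you want the rebuilt database created next to it as appdb-recovered:
cd C:\Users\Username\.appfiles\db
java -cp h2-1.4.200.jar org.h2.tools.Recover
java -cp h2-1.4.200.jar org.h2.tools.RunScript -url jdbc:h2:./appdb-recovered -user sa -script appdb.h2.sql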

Hadoop Hive unable to move source to destination

I am trying to use Hive 1.2.0 over Hadoop 2.6.0. I have created an employee table. However, when I run the following query:
hive> load data local inpath '/home/abc/employeedetails' into table employee;
I get the following error:
Failed with exception Unable to move source file:/home/abc/employeedetails to destination hdfs://localhost:9000/user/hive/warehouse/employee/employeedetails_copy_1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
What am I doing wrong here? Are there any specific permissions that I need to set? Thanks in advance!
As mentioned by Rio, the issue involved a lack of permissions to load data into the Hive table. I figured out that the following command solves my problem:
hadoop fs -chmod g+w /user/hive/warehouse
Check the permissions on the HDFS directory:
hdfs dfs -ls /user/hive/warehouse/employee/employeedetails_copy_1
It seems you may not have permission to load data into the Hive table.
The error might be due to a permission issue on the local filesystem.
Change the permissions on the local filesystem:
sudo chmod -R 777 /home/abc/employeedetails
Now, run:
hive> load data local inpath '/home/abc/employeedetails' into table employee;
If we face the same error after running the above command in distributed mode, we can try the command below as a superuser on all nodes:
sudo usermod -a -G hdfs yarn
Note: we got this error after restarting all of the YARN services (in Ambari), and this resolved my problem. It is an admin command, so take care when running it.
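To verify that the yarn user actually picked up the hdfs group on a node, a quick check (a standard Linux command, nothing Hadoop-specific) is:
id yarn
The hdfs group should appear in the printed group list.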
I met the same problem and searched for two days. Finally I found that the reason was the datanode starting for a moment and then shutting down.
Steps to solve it:
hadoop fs -chmod -R 777 /home/abc/employeedetails
hadoop fs -chmod -R 777 /user/hive/warehouse/employee/employeedetails_copy_1
Edit hdfs-site.xml and add the following property:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
hdfs --daemon start datanode
Edit hdfs-site.xml again and find the locations of 'dfs.datanode.data.dir' and 'dfs.namenode.name.dir'. If they point to the same location, you must change one of them; this was the reason my datanode could not start.
Under 'dfs.datanode.data.dir'/data/current, edit the VERSION file and copy its clusterID into the VERSION file under 'dfs.namenode.name.dir'/data/current, so the two match (see the VERSION sketch after these steps).
start-all.sh
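For reference, a datanode's VERSION file looks roughly like the sketch below (the ID values here are made up); the clusterID line is the one that must be identical to the clusterID in the namenode's VERSION file:
storageID=DS-1234abcd-0000-4c1a-9a8e-000000000000
clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9
cTime=0
storageType=DATA_NODE
layoutVersion=-56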
If the above doesn't solve it, follow the steps below carefully; they delete data, so mind the safety of your data, but these steps are what finally solved the problem for me.
stop-all.sh
Delete the data folder under 'dfs.datanode.data.dir', the data folder under 'dfs.namenode.name.dir', and the tmp folder.
hdfs namenode -format
start-all.sh
That solved the problem.
You may then meet another problem like this one:
Problem:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /opt/hive/tmp/root/1be8676a-56ac-47aa-ab1c-aa63b21ce1fc. Name node is in safe mode
Fix:
hdfs dfsadmin -safemode leave
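You can check whether the namenode is still in safe mode before and after with:
hdfs dfsadmin -safemode get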
It might be because your hive user does not have access to the HDFS directories.

How to ensure HSQLDB properties configuration loaded on Windows?

I have a problem with HSQLDB v2.3 on Windows. I can't connect to new databases using the HSQLDB Server.
Is there a log or debug option for the server so I can check the properties loaded and file paths, etc?
Is my properties file OK? I wasn't sure how to formulate file paths for Windows.
Can I use quotes on file path names?
Is the connection string I'm using for the tmp db correct?
What's the correct syntax for the --props server argument?
--props path
--props path/filename
I have set up two environment variables (to keep it simple). These variables don't have any effect except to save me typing. Initially I was loading the server from the HSQLDB folder directly.
HSQLDB_HOME ... home folder for the current HSQLDB
HSQLDB_DATA ... folder for data repository
I am following the steps from:
Running and Using HSQLDB
Every time I connect via the server it makes a database called "test" instead of letting me connect to either of the two databases specified in server.properties:
%HSQLDB_DATA%/
test.log
test.properties
test.script
test.tmp/ .......... (folder)
test.lck
I made a 'server.properties' file in:
%HSQLDB_HOME%/lib/
where the HSQLDB JAR file is. I want two databases: tmp and dev:
# -- tmp
server.database.0=file:hsqldb/tmp_db/tmp
server.dbname.0=tmp_db
#
####
#
# -- dev
server.database.1=file:r:/.data/hsqldb/dev_db/dev
server.dbname.1=dev_db
I expected the properties file to be enough to set up the two databases. When I run the HSQLDB manager I don't get a connection for tmp, say:
"jdbc:hsqldb:hsql://localhost/tmp"
user: SA, password: ""
I get a pop-up error:
database alias does not exist (Manager)
[Thread[HSQLDB Connection #26827674,5,HSQLDB Connections #372f7a8d]]: database alias=tmp does not exist (Server)
I created these two manually using the command line, e.g. for the one named "tmp":
%JAVA_HOME%\bin\java.exe -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server --database.1 file:r:/.data/hsqldb/tmp_db --dbname.1 tmp_db
And could connect and create tmp:
%HSQLDB_DATA%/tmp_db/
tmp.log
tmp.properties
tmp.script
tmp.tmp/ .......... (folder)
tmp.lck
as forecast in the documentation. When I start up the HSQLDB Server with the aforesaid 'server.properties' file, or specify the properties explicitly:
%JAVA_HOME%\bin\java.exe -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server --trace true --props %HSQLDB_DATA%
The server will only let me connect with a database called "test" as described at the beginning of the question.
Because the properties file looks good and the in-process file connection string works,
jdbc:hsqldb:hsqldb/tmp_db/tmp
I am left considering that the server.properties file is in the wrong place or not loading for some reason. It would be wonderful if there were a way for the server to dump the loaded properties at start time :-) Thanks in advance for your suggestions ...
I have found the problem. Firstly, thanks to this tutorial:
HSQLDB Installation
After reviewing this I realised my error.
The server.properties file must be in the current folder when the server script runs. I had read that on the Running and Using HSQLDB manual page but misinterpreted its meaning and put the properties file in my %HSQLDB_HOME%/lib folder. Oops.
When you look at the BAT script, it actually changes the current folder to the %HSQLDB_HOME%/data folder ...
cd ..\data
So the default location for your server.properties file should be %HSQLDB_HOME%/data if you want to work with the default runServer.bat script.
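In other words, what matters is the server process's working directory. A quick manual check (assuming your server.properties sits in %HSQLDB_HOME%\data) is to change there first and start the server by hand:
cd /d %HSQLDB_HOME%\data
%JAVA_HOME%\bin\java -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server
The server then picks up server.properties from the current folder.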
For those wanting to separate data from the server software, I made an improvement to the default script using the two environment variables as follows.
HSQLDB_HOME ... home folder for the current HSQLDB
HSQLDB_DATA ... folder for data repository
runServer.bat:
@cd /d %HSQLDB_DATA%
@cd
@echo.
@rem __ @pause
%JAVA_HOME%\bin\java -classpath %HSQLDB_HOME%\lib\hsqldb.jar org.hsqldb.server.Server %1 %2 %3 %4 %5 %6 %7 %8 %9
@echo.
@pause
This now expects my server.properties file in the %HSQLDB_DATA% folder, and that works. Also, since my server is for development/testing, I'm using the --trace true option. Like a lot of these things, now that I get it, it all makes perfect sense. Hopefully my misunderstanding will assist others who haven't found a simple tutorial before resorting to Stack Overflow.

SQOOP SQLSERVER Failed to load driver " appropriate connection manager is not being set"

I downloaded sqljdbc4.jar. I'm invoking sqoop like so from the folder (where the jar is stored):
sqoop list-tables --driver com.microsoft.jdbc.sqlserver.SQLServerDriver --connect jdbc:sqlserver://localhost:1433;user=me;password=myPassword; -libjars=./sqljdbc4.jar
I'm getting the following warning & error:
13/10/25 18:38:13 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
13/10/25 18:38:13 INFO manager.SqlManager: Using default fetchSize of 1000
13/10/25 18:38:13 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.microsoft.jdbc.sqlserver.SQLServerDriver
java.lang.RuntimeException: Could not load db driver class: com.microsoft.jdbc.sqlserver.SQLServerDriver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:727)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.listTables(SqlManager.java:418)
at org.apache.sqoop.tool.ListTablesTool.run(ListTablesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
UPDATE
I changed the command line to reflect the comments below, but I get the same error:
sqoop list-databases -libjars=<ABSOLUTE_PATH>/jars/sqljdbc4.jar --connect jdbc:sqlserver://localhost:1433;user=me;password=password
13/10/28 17:00:33 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.microsoft.sqlserver.jdbc.SQLServerDriver
java.lang.RuntimeException: Could not load db driver class: com.microsoft.sqlserver.jdbc.SQLServerDriver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:727)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
When I look at the listing of sqljdbc4.jar, I do see the class at that path... Is it possible that the -libjars option isn't doing what I think it's supposed to do?
In the vast majority of cases, the --driver parameter is not required and, worse, will lead to undesirable behaviour. I would strongly recommend dropping this argument from your command line entirely. Check out the Connectors vs Drivers blog post for more details.
In addition, you are specifying a nonexistent JDBC driver class. The correct one is:
com.microsoft.sqlserver.jdbc.SQLServerDriver
You can see it in the official docs, whereas you are specifying
com.microsoft.jdbc.sqlserver.SQLServerDriver
Notice the different order of the jdbc and sqlserver packages. This is one of the reasons why it's recommended not to use the --driver option at all.
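So the first fix suggested here is simply to drop --driver from the question's command, along the lines of this sketch (the driver jar still has to be visible to Sqoop; the answers below cover that part):
sqoop list-tables --connect "jdbc:sqlserver://localhost:1433;user=me;password=myPassword"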
You need to put sqljdbc4.jar in $SQOOP_HOME/lib and also add sqoop-1.4.4.jar (or whatever version you are using) along with sqljdbc4.jar to $HADOOP_HOME/lib.
I'm using Hadoop 2.2.0, so I put it inside the $HADOOP_HOME/share/hadoop/common/lib directory, and use the following command to do the import:
export HCAT_HOME=/home/Kuntal/BIG_DATA/hive-0.12.0/hcatalog
(Sometimes Hive's HCatalog needs to be exported or set.)
./sqoop-import --connect "jdbc:sqlserver://IP\INSTANCE;port=1433;username=USERNAME;password=PASSWORD;database=DATABASE_NAME" --table TABLE_NAME --target-dir hdfs://localhost:50315/sqoop --m 1
Sometimes you have to specify the port; otherwise the default works. Hope you find it useful.
According to this sqoop documentation, generic options like -libjars must come before tool-specific options:
Generic Hadoop command-line arguments:
(must precede any tool-specific arguments)
...
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
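Applied to the question's command, that ordering would look something like this sketch, with -libjars (space-separated, not joined with =) placed right after the tool name and before --connect:
sqoop list-tables -libjars ./sqljdbc4.jar --connect "jdbc:sqlserver://localhost:1433;user=me;password=myPassword"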
I recently came across this same problem. Even though the documentation says that Sqoop will pick up additional jar files, the problem is, I believe, propagated from the Hadoop jar command-line option: -libjars is not a reliable way to set additional jar file paths.
Instead, use the HADOOP_CLASSPATH variable to set up additional jar files.
In my case, I had multiple different versions of the driver JAR, and -libjars was not correctly picking up the file for me.
To resolve this, I specified:
export HADOOP_CLASSPATH=$SQOOP_HOME/<path_to_driver>.jar
This makes sure that the correct JAR file gets loaded.
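A hedged end-to-end sketch of that approach (the jar path here is an assumption; point it at wherever your driver jar actually lives):
export HADOOP_CLASSPATH=$SQOOP_HOME/lib/sqljdbc4.jar
sqoop list-databases --connect "jdbc:sqlserver://localhost:1433;user=me;password=myPassword"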

Error in installing DBpedia Spotlight while running the server class in jar

I get the following error:
org.dbpedia.spotlight.exceptions.ConfigurationException: Cannot find spotter file ../dist/src/deb/control/data/usr/share/dbpedia-spotlight/spotter.dict
at org.dbpedia.spotlight.model.SpotterConfiguration.<init>(SpotterConfiguration.java:54)
at org.dbpedia.spotlight.model.SpotlightConfiguration.<init>(SpotlightConfiguration.java:143)
at org.dbpedia.spotlight.web.rest.Server.main(Server.java:70)
Usage:
java -jar dbpedia-spotlight.jar org.dbpedia.spotlight.web.rest.Server [config file]
or:
mvn scala:run "-DaddArgs=[config file]"
Quick solution:
wget http://spotlight.dbpedia.org/download/release-0.5/dbpedia-spotlight-quickstart.zip
unzip dbpedia-spotlight-quickstart.zip
cd dbpedia-spotlight-quickstart/
./run.sh
Explanation:
DBpedia Spotlight looks for ~3.5M things of ~320 types in text and tries to disambiguate them to their global unique identifiers in DBpedia. Therefore it needs data files to accompany its jar. A minuscule example is distributed along with the source, but for real use cases you may need the larger files. After you've downloaded the files, you need to modify the configuration in server.properties with the correct path to the files. The error message you got tells you that one of the necessary files (spotter.dict) could not be found in the path you indicated in your server.properties.
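For illustration, the fix is a one-line change in server.properties pointing the spotter entry at the real location of the downloaded file. The property name below is an assumption (it varies between Spotlight versions), so check the names already present in your own config:
org.dbpedia.spotlight.spot.dictionary = /data/dbpedia-spotlight/spotter.dict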
More information available here:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Run-from-a-JAR
