PIG/Hadoop issue: ERROR 2081: Unable to setup the load function [duplicate] - java

This question already has answers here:
how to load files on hadoop cluster using apache pig?
(3 answers)
Closed 3 years ago.
I'm running Pig 0.13.0 and Hadoop 2.5.1, both installed from the Apache distributions; they're not packages from Hortonworks or Cloudera or anything.
I'm working with a tutorial and can get it to work fine when running Pig locally ($> ./pig -x local), but when trying to run it on the Hadoop instance I get an error that I'm having a hard time researching on the internet.
This command:
movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
DUMP movies;
This works fine running locally. When I run it in Hadoop/MapReduce mode, the first line of code also seems to run fine:
grunt> movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
2014-10-29 18:16:26,281 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-29 18:16:26,281 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
But when I try to DUMP movies, it gives me this trace:
grunt> dump movies
2014-10-29 18:17:15,419 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-10-29 18:17:15,420 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-10-29 18:17:15,445 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2014-10-29 18:17:15,469 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2081: Unable to setup the load function.
Details at logfile: /usr/local/pig/pig_1414606194436.log
The ERROR 2081 is what I'm trying to diagnose, but can't find anything that helps point me in the right direction. Any ideas of where to start? I assume it's something to do with my Hadoop installation and not Pig, but I don't know. Any suggestions will be helpful.
Thanks,
Mark
EDIT: Here is the full log output:
ERROR 2081: Unable to setup the load function.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias movies
at org.apache.pig.PigServer.openIterator(PigServer.java:912)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias movies
at org.apache.pig.PigServer.storeEx(PigServer.java:1015)
at org.apache.pig.PigServer.store(PigServer.java:974)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: movies: Store(hdfs://localhost:54310/tmp/temp-1276361014/tmp-2000190966:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:143)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:160)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:275)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
at org.apache.pig.PigServer.storeEx(PigServer.java:1011)
... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
... 21 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/hduser/pig-tutorial-master/movies_data.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
... 22 more
================================================================================

If you are running the Pig commands from the grunt shell on a Hadoop cluster, set the property:
set opt.fetch false;
With this property set to false, dump will run in MapReduce mode; by default the property is set to true, which makes dump use fetch mode.
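For example, a minimal grunt session with fetch disabled. This is only a sketch, and it assumes the CSV still has to be copied into HDFS first (the InvalidInputException in the question suggests the path only exists on the local filesystem):
$ hadoop fs -mkdir -p /home/hduser/pig-tutorial-master
$ hadoop fs -put /home/hduser/pig-tutorial-master/movies_data.csv /home/hduser/pig-tutorial-master/
$ pig    # starts grunt in MapReduce mode
grunt> set opt.fetch false;
grunt> movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
grunt> dump movies;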

If you are working with Hadoop 2.6.0 and Pig 0.14, downgrading Pig to 0.13 may help. This worked for me.

Related

Exception when trying to run corda

I'm trying to run the sample cordapp-example code by cloning the GitHub repository using:
git clone https://github.com/corda/samples
I followed all the steps as mentioned in the documentation for running the application from IntelliJ.
[ERROR] 14:54:18,832 [main] internal.DriverDSLImpl. - Driver shutting down because of exception [errorCode=1crywct, moreInformationAt=https://errors.corda.net/OS/4.3/1crywct]
java.lang.IllegalStateException: Unable to start notaries. A required port might be bound already.
at net.corda.testing.node.internal.DriverDSLImpl.start(DriverDSLImpl.kt:390) ~[corda-node-driver-4.3.jar:?]
at net.corda.testing.node.internal.DriverDSLImplKt.genericDriver(DriverDSLImpl.kt:1048) ~[corda-node-driver-4.3.jar:?]
at net.corda.testing.driver.Driver.driver(Driver.kt:185) ~[corda-node-driver-4.3.jar:?]
at com.example.test.NodeDriverKt.main(NodeDriver.kt:15) ~[test/:?]
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_231]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_231]
at net.corda.core.internal.concurrent.CordaFutureImpl.get(CordaFutureImpl.kt) ~[corda-core-4.3.jar:?]
at net.corda.core.internal.concurrent.CordaFutureImplKt.get(CordaFutureImpl.kt:172) ~[corda-core-4.3.jar:?]
at net.corda.core.utilities.KotlinUtilsKt.getOrThrow(KotlinUtils.kt:134) ~[corda-core-4.3.jar:?]
at net.corda.testing.node.internal.DriverDSLImpl.start(DriverDSLImpl.kt:379) ~[corda-node-driver-4.3.jar:?]
... 3 more
[WARN] 14:54:19,251 [driver-pool-thread-0] internal.InternalTestUtils. - Been polling address localhost:10040 to bind for 60 seconds...
[INFO] 14:54:57,702 [driver-pool-thread-0] internal.RPCClient. - Startup took 10512 msec
[INFO] 14:54:58,015 [driver-pool-thread-1] internal.DriverDSLImpl. - Node handle is ready. NodeInfo: NodeInfo(addresses=[localhost:10040], legalIdentitiesAndCerts=[O=Notary Service, L=Zurich, C=CH], platformVersion=5, serial=1578902078740), WebAddress: localhost:10043
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
I see this description:
[ERROR] 14:13:50,501 [main] internal.DriverDSLImpl. - Driver shutting down because of exception [errorCode=1crywct, moreInformationAt=https://errors.corda.net/OS/4.3/1crywct]
Has anyone else seen this before, and are there any recommendations for fixing the issue or clues as to how we can debug it further?
From the error message, I'd focus on this:
"Unable to start notaries. A required port might be bound already"
which means that the port(s) used by the notary are being used by another application or, most likely, by another running notary.
How to fix?
Open node.conf in your notary folder and check the ports listed, such as
address : "localhost:10006"
then check the port usage on your system; either kill the running process or change the port in the notary's node.conf and run again.
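For example, on a Unix-like system you can check what is holding the port (10006 here is just the port from the node.conf example above):
$ lsof -i :10006        # shows the PID of the process bound to the port
$ kill <PID>            # free the port, or change it in node.conf instead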
Good luck.

Error using Matlab parpool from Java on Linux

I need to use a MATLAB library compiled to a JAR file in my own Java application. That library uses parpool and has some parfor operators. We can use this example as a test.
On Windows it works. On Linux (Ubuntu Xenial) I get an error like this, though not exactly the same:
Starting parallel pool (parpool) using the 'local_mcruserdata' profile ...
Error using parpool (line 104)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile 'local_mcruserdata' in the Cluster Profile Manager.)
Error in sample_pct (line 11)
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 666)
Failed to initialize the interactive session.
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 767)
The interactive communicating job failed with no message.
The error is com.mathworks.toolbox.javabuilder.MWException: Failed to start a parallel pool. (For information in addition to the causing error, validate the profile 'local_mcruserdata' in the Cluster Profile Manager.)
My MATLAB is 9.2.0.538062 (R2017a) and my JDK is 1.8.0_171 x86_64 on both systems.
If I comment out line 11 (the parpool invocation), the error goes away, but the parfor operator does not create extra workers.
Is it a known bug and can it be fixed?
After adding setSchedulerMessageHandler(@disp); setenv('MDCE_DEBUG','true') to the example (as advised in the comments), I got this message:
matlabroot/bin/glnxa64/ctfxlauncher: error while loading shared libraries: libmwmclmcrrt.so.9.2: cannot open shared object file: No such file or directory
find matlabroot -name libmwmclmcrrt.so.9.2 gives matlabroot/runtime/glnxa64/libmwmclmcrrt.so.9.2
Adding the matlabroot/runtime/glnxa64 directory to LD_LIBRARY_PATH helped!
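A minimal sketch of that fix; matlabroot here stands for the actual MATLAB installation directory, and app.jar is a hypothetical name for the Java application that uses the compiled library:
$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:matlabroot/runtime/glnxa64"
$ java -jar app.jar     # run from the same shell so the variable is inherited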

Jmeter plugin execution throws ArrayIndexOutOfBoundsException

I've looked for answers to this problem, but couldn't find any over the internet. Maybe someone here ran into this issue before.
I have a CentOS machine with JMeter 3.1. On this machine everything works fine. I created a new VM and copied the JMeter directory to the new machine with everything set up. Test execution works fine, but when I try to use any of the plugins (cmdrunner-2.0.jar or via JMeterPluginsCMD.sh) I get an exception back, with not much information about what's wrong:
[root@box bin]# java -jar "/opt/apache-jmeter-3.1/lib/cmdrunner-2.0.jar" -n --tool Reporter --input-jtl "/tmp/data.csv" --plugin-type SynthesisReport --generate-csv "/tmp/report.csv"
WARN 2017-10-22 12:41:57.204 [jmeter.u] (): Exception 'null' occurred when fetching String property:'sampleresult.default.encoding', defaulting to:ISO-8859-1
WARN 2017-10-22 12:41:57.224 [jmeter.u] (): Exception 'null' occurred when fetching String property:'jmeterPlugin.prefixPlugins'
INFO 2017-10-22 12:41:57.224 [kg.apc.j] (): Using JMeterPluginsCMD v. N/A
INFO 2017-10-22 12:41:57.229 [jmeter.u] (): Setting Locale to en_US
INFO 2017-10-22 12:41:57.238 [kg.apc.j] (): Loading user properties from: /opt/apache-jmeter-3.1/bin/user.properties
INFO 2017-10-22 12:41:57.238 [kg.apc.j] (): Loading system properties from: /opt/apache-jmeter-3.1/bin/system.properties
ERROR: java.lang.ArrayIndexOutOfBoundsException: 0
*** Problem's technical details go below ***
Home directory was detected as: /opt/apache-jmeter-3.1/lib
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
...
...
ERROR: java.lang.ArrayIndexOutOfBoundsException: 0
That's all I get. The only differences between the two machines are:
Working platform:
kernel 3.10.0-514.26.2.el7.x86_64
java (build 1.8.0_131-b12)
Not working:
kernel 3.10.0-693.2.2.el7.x86_64,
java (build 1.8.0_144-b01)
There are no environment variables missing.
Any suggestions are more than welcome.
This is a Java bug on this particular platform:
https://bugzilla.redhat.com/show_bug.cgi?id=1484079
I can't believe it: yum update java solved my issue. The thing is, I just updated Java last week.
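For reference, a sketch of that fix on CentOS 7 (java-1.8.0-openjdk is the usual package name, but it can differ per setup), followed by re-running the reporter command from the question:
[root@box bin]# yum update java-1.8.0-openjdk
[root@box bin]# java -jar "/opt/apache-jmeter-3.1/lib/cmdrunner-2.0.jar" -n --tool Reporter --input-jtl "/tmp/data.csv" --plugin-type SynthesisReport --generate-csv "/tmp/report.csv"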

GDAL error on CentOS 7

I'm no IT guy, so it's possible that I'm doing something very wrong. But I've been struggling with this issue for days…
Working in a VM with CentOS 7.
When running something in GeoKettle I have this error that points to GDAL.
Native library load failed.
java.lang.UnsatisfiedLinkError: /home/geoairc/QGIS_Install/geokettle/libswt/linux/x86_64/libogrjni.so: liblcms.so.1: cannot open shared object file: No such file or directory
INFO 29-05 10:10:37,639 - OGR Input - wfs xml geodomus.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)
Exception in thread "OGR Input - wfs xml geodomus.0 (Thread-10)" java.lang.UnsatisfiedLinkError: org.gdal.ogr.ogrJNI.RegisterAll()V
at org.gdal.ogr.ogrJNI.RegisterAll(Native Method)
at org.gdal.ogr.ogr.RegisterAll(ogr.java:110)
at org.pentaho.di.core.geospatial.OGRReader.open(OGRReader.java:75)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInputMeta.getOutputFields(OGRFileInputMeta.java:277)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInput.processRow(OGRFileInput.java:172)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInput.run(OGRFileInput.java:342)
Someone pointed out that the error was caused by the lack of GDAL bindings for Java.
So I installed the gdal-java RPM:
https://www.rpmfind.net/linux/RPM/epel/7/x86_64/g/gdal-java-1.11.4-1.el7.x86_64.html
I installed it but get successive dependency errors that I cannot get past (this is the first; when I try to install one of these I get another set of dependency errors):
[root@srvlgis01 tmp]# rpm -Uvh gdal-java-1.11.4-1.el7.x86_64.rpm
error: Failed dependencies:
gdal-libs(x86-64) = 1.11.4-1.el7 is needed by gdal-java-1.11.4-1.el7.x86_64
libgeotiff.so.1.2()(64bit) is needed by gdal-java-1.11.4-1.el7.x86_64
My GDAL version: gdal.x86_64 0:1.11.4-10.rhel7
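For what it's worth, a sketch of pulling the whole dependency chain from EPEL via yum instead of installing the single RPM by hand (this assumes the machine can reach the EPEL repository):
[root@srvlgis01 tmp]# yum install epel-release
[root@srvlgis01 tmp]# yum install gdal-java      # pulls in matching gdal-libs, libgeotiff, etc.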
Thanks in advance,
Pedro

Installing, Configuring, and running Hadoop 2.2.0 on Mac OS X

I've installed hadoop 2.2.0, and set up everything (for a single node) based on this tutorial here: Hadoop YARN Installation. However, I can't get hadoop to run.
I think my problem is that I can't connect to my localhost, but I'm not really sure why. I've spent upwards of 10 hours installing, googling, and hating open-source software installation guides, so I've now turned to the one place that has never failed me.
Since a picture is worth a thousand words, I give you my setup ... in many, many pictures:
Basic profile/setup
I'm running Mac OS X (Mavericks 10.9.5)
For whatever it's worth, here's my /etc/hosts file:
My bash profile:
Hadoop file configurations
The setup for core-site.xml and hdfs-site.xml:
Note: I have created folders in the locations you see above.
The setup for my yarn-site.xml:
Setup for my hadoop-env.sh file:
Side Note
Before I show the results of running start-dfs.sh, start-yarn.sh, and checking what's running with jps, keep in mind that I have a hadoop symlink pointing to hadoop-2.2.0.
Starting up Hadoop
Now, here are the results when I start the daemons up:
For those of you who don't have a microscope (it looks super small in the preview of this post), here's a code chunk of what's shown above:
mrp:~ mrp$ start-dfs.sh
2014-11-08 13:06:05.695 java[17730:1003] Unable to load realm info from SCDynamicStore
14/11/08 13:06:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-mrp-namenode-mrp.local.out
localhost: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-mrp-datanode-mrp.local.out
localhost: 2014-11-08 13:06:10.954 java[17867:1403] Unable to load realm info from SCDynamicStore
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-mrp-secondarynamenode-mrp.local.out
0.0.0.0: 2014-11-08 13:06:16.065 java[17953:1403] Unable to load realm info from SCDynamicStore
2014-11-08 13:06:20.982 java[17993:1003] Unable to load realm info from SCDynamicStore
14/11/08 13:06:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mrp:~ mrp$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-mrp-resourcemanager-mrp.local.out
2014-11-08 13:06:43.765 java[18053:20b] Unable to load realm info from SCDynamicStore
localhost: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-mrp-nodemanager-mrp.local.out
Check to see what's running:
Time Out
OK. So far, I think, so good. At least this looks good based on all the other tutorials and posts. I think.
Before I try to do anything fancy, I just want to see if it's working properly, so I run a simple command like hadoop fs -ls.
Failure
When I run hadoop fs -ls, here's what I get:
Again, in case you can't see that pic, it says:
2014-11-08 13:23:45.772 java[18326:1003] Unable to load realm info from SCDynamicStore
14/11/08 13:23:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From mrp.local/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I've tried to run other commands, and I get the same basic error in the beginning of everything:
Call From mrp.local/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Now, I've gone to that website mentioned, but honestly, everything in that link means nothing to me. I don't get what I should do.
I would very much appreciate any assistance with this. You'll make me the happiest hadooper, ever.
...this should go without saying, but obviously I'd be happy to edit/update with more info if needed. Thanks!
Add these to .bashrc:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
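Then reload the profile and restart the daemons so the settings take effect (a quick sketch, using the start/stop scripts already mentioned in the question):
$ source ~/.bashrc
$ stop-dfs.sh && stop-yarn.sh
$ start-dfs.sh && start-yarn.sh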
Had a very similar problem and found this question while googling for a solution.
Here is how I resolved it (on Mac OS 10.10 with Hadoop 2.5.1); not sure if the question is exactly the same problem. I checked the log file generated by the datanode (/usr/local/hadoop-2.2.0/logs/hadoop-mrp-datanode-mrp.local.out) and found the following entry:
2014-11-09 17:44:35,238 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode:
Exception in namenode join org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:
Directory /private/tmp/hadoop-kthul/dfs/name is in an inconsistent state: storage
directory does not exist or is not accessible.
Based on this, I concluded that something is wrong with the HDFS data on the datanode.
I deleted the directory with the HDFS data and reformatted HDFS:
rm -rf /private/tmp/hadoop-kthul
hdfs namenode -format
Now, I am up and running again. Still wondering if /private/tmp is a good place to keep the HDFS data; looking for options to change this.
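One option (a sketch: hadoop.tmp.dir is the standard property whose default puts HDFS data under a temp directory, and the path below is purely illustrative) is to point it at a permanent location in core-site.xml and then reformat as above:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop_data</value>   <!-- illustrative path -->
</property>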
So I've got Hadoop up and running. I had two problems (I think).
When starting up the NameNode and DataNode, I received the following error: Unable to load realm info from SCDynamicStore.
To fix this, I added the following two lines to my hadoop-env.sh file:
HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf=/dev/null"
I found those two lines in the solution to this post, Hadoop on OSX "Unable to load realm info from SCDynamicStore". The Answer was posted by Matthew L Daniel.
I had formatted the NameNode folder more than once, which apparently screws things up?
I can't verify that this screws things up, because I don't have any errors in any of my log files. However, once I followed Workaround 1 (deleting & recreating the NameNode/DataNode folders, then reformatting) from this post, No data nodes are started, I was able to load up the DataNode and get everything working.
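A rough sketch of that workaround; the /usr/local/hadoop_data/... paths are hypothetical and should be whatever dfs.namenode.name.dir and dfs.datanode.data.dir actually point to in hdfs-site.xml:
$ stop-dfs.sh && stop-yarn.sh
$ rm -rf /usr/local/hadoop_data/namenode /usr/local/hadoop_data/datanode
$ mkdir -p /usr/local/hadoop_data/namenode /usr/local/hadoop_data/datanode
$ hdfs namenode -format
$ start-dfs.sh && start-yarn.sh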
Since the native library isn't supported on Mac, if you want to suppress this warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Add this to the log4j.properties in ${HADOOP_HOME}/libexec/etc/hadoop:
# Turn off native library warning
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
