Trouble running Mahout 0.9 text-processing examples - java

TL;DR: are the Mahout 0.9 examples compatible with Hadoop 2.4?
My problem:
I would like to classify a bunch of documents using Mahout 0.9. To do so, I'm following the example described here.
I'm on Windows and trying to go fully native (i.e., no Cygwin). I already have a local Hadoop 2.4.1 cluster.
I downloaded the Mahout sources and compiled them according to the wiki:
mvn "-Dhadoop2.version=2.4.1" -DskipTests clean install
I then tried to run the example with the following command:
hadoop jar $Env:mahout_home/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver seqdirectory -i Decomposition -o output
It all seems to work at first: the logs show the MapReduce job starting. However, I quickly get the following error:
Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:166)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.<init>(CombineFileRecordReader.java:126)
at org.apache.mahout.text.MultipleTextFileInputFormat.createRecordReader(MultipleTextFileInputFormat.java:43)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:157)
... 10 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at org.apache.mahout.text.WholeFileRecordReader.<init>(WholeFileRecordReader.java:59)
... 15 more
According to the various links I found, this seems to come from code intended for Hadoop 1.0.
Am I missing something, or are the Mahout-provided examples not suited to a Hadoop 2.4 cluster?

The problem is that TaskAttemptContext is an interface in Hadoop 2.4, while the job was compiled expecting a class (as it was in Hadoop 1.1.2). TaskAttemptContext was changed from a class to an interface in Hadoop 2.0.
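To make that concrete, here is a minimal sketch (illustrative only, not Mahout's actual source) of the kind of per-file record reader that CombineFileRecordReader instantiates reflectively, which is where the error surfaces:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Hypothetical sketch of a per-file reader; CombineFileRecordReader creates it
// reflectively through the (CombineFileSplit, TaskAttemptContext, Integer) constructor.
public class WholeFileReaderSketch extends RecordReader<Text, BytesWritable> {

  private final Configuration conf;

  public WholeFileReaderSketch(CombineFileSplit split, TaskAttemptContext context, Integer index)
      throws IOException {
    // Compiled against Hadoop 1.x, this call is emitted as a class-style (invokevirtual)
    // call because TaskAttemptContext was a concrete class there. Loading that bytecode
    // on Hadoop 2.x, where TaskAttemptContext is an interface, triggers
    // java.lang.IncompatibleClassChangeError: Found interface ... but class was expected.
    this.conf = context.getConfiguration();
  }

  @Override public void initialize(InputSplit split, TaskAttemptContext context) {}
  @Override public boolean nextKeyValue() { return false; }
  @Override public Text getCurrentKey() { return null; }
  @Override public BytesWritable getCurrentValue() { return null; }
  @Override public float getProgress() { return 0f; }
  @Override public void close() {}
}

Recompiling the same source against Hadoop 2.x produces interface-style call sites, which is why the jar has to be built against the Hadoop version it actually runs on.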

Related

Hadoop MapReduce Naive Bayes Classifier in NetBeans

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/protobuf/ServiceException
at org.apache.hadoop.ipc.ProtobufRpcEngine.<clinit>(ProtobufRpcEngine.java:73)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
I get the above error when I try to run a Naive Bayes classifier program using MapReduce. I have imported all the necessary jar files. From the answers I found, this is usually caused by a protobuf version mismatch, but as far as I can tell I have the correct version of the jar.
Hadoop version: 2.6.0
Compiled with protoc: 2.5.0
Can anyone help me out? I really need to get this running.
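Since the suspected cause is a protobuf mismatch, it may help to confirm which protobuf jar the Hadoop installation ships and make sure it is on the classpath used to launch the program; a quick check (paths assume a standard Hadoop 2.x binary distribution) is:
hadoop version
ls $HADOOP_HOME/share/hadoop/common/lib/ | grep protobuf
The second command should list something like protobuf-java-2.5.0.jar; if that jar is missing from the classpath your program runs with, com.google.protobuf.ServiceException cannot be resolved.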

Java NoClassDefFoundError: Apache Flink Complex Event Processing

I am trying to understand the Apache Flink CEP program for monitoring rack temperatures in a data center, as described in the official Flink documentation. I followed the steps, created a jar using mvn clean package, and tried to execute the package using the command
java -cp "../cep-monitoring-1.0.jar" org.stsffap.cep.monitoring.CEPMonitoring
but I get the following error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/functions/source/SourceFunction
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.api.functions.source.SourceFunction
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I tried different variations of specifying the classpath as described here, but I keep getting the same error. Can someone point out my mistake in running the program?
To submit a job to the local Flink cluster:
Run Flink.
/path/to/flink-1.4.0/bin/start-local.sh
Submit the job.
/path/to/flink-1.4.0/bin/flink run -c com.package.YourClass /path/to/jar.jar
Alternatively, you can run the job directly from your IDE; in that case the job runs in a local Flink environment.
Check Flink's example: https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
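As a minimal illustration (class name and data are made up, not taken from the CEP project), an IDE-runnable job looks like this:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalJobSketch {
  public static void main(String[] args) throws Exception {
    // Started from the IDE, this creates a local embedded Flink environment and the
    // Flink classes are resolved from the Maven dependencies, so the
    // NoClassDefFoundError for SourceFunction does not occur.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("rack-1", "rack-2", "rack-3").print();
    env.execute("Local sketch job");
  }
}

With plain java -cp, by contrast, you would have to put all of the Flink dependencies on the classpath yourself, which is why submitting through bin/flink run (or running from the IDE) is the simpler route.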
The CEP example uses Flink version 1.3.2. Here are the steps to run it:
Install version 1.3.2 of Apache Flink (wget it from here and extract it).
cd into flink-1.3.2.
Run ./bin/start-local.sh; this will start the Flink cluster. Then cd .. back to the parent directory.
Clone the repo using git clone and cd into it.
Run mvn clean package to build the project. This will create the target directory.
Run ../flink-1.3.2/bin/flink run target/cep-monitoring-1.0.jar to start the process.
In a separate terminal, the output can be followed with tail -f ../flink-1.3.2/log/flink-*-jobmanager-*.out (assuming you are in the same directory as in the previous step; the * parts are user- and host-specific, press Tab to autocomplete them).
Here is the sample output:
rshah9#bn18-20:~/tools/cep-monitoring-master$ tail -f ../flink-1.3.2/log/flink-rshah9-jobmanager-0-bn18-20.dcs.mcnc.org.out
TemperatureWarning(9, 102.45860162626161)
TemperatureWarning(6, 113.21295716135027)
TemperatureWarning(5, 105.46064102697723)
TemperatureWarning(0, 106.44635415722034)
TemperatureWarning(4, 112.07396748089734)
TemperatureWarning(9, 114.53346561628322)
TemperatureWarning(3, 109.05305417712648)
TemperatureWarning(7, 112.3698094257147)
TemperatureWarning(3, 107.78609416982076)
TemperatureWarning(9, 107.34373990230458)
TemperatureWarning(5, 111.46480675461656)

HBase crashes after a few seconds

I have been following the HBase installation instructions as given on this page:
https://hbase.apache.org/book.html#quickstart
I am using HBase version 1.1.3 (stable release), configured in standalone mode, and I have installed OpenJDK 7.
When I try to start HBase, it crashes after a few seconds and I get the following error:
2016-01-29 17:37:04,136 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:219)
at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:155)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:224)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2355)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer
at org.apache.hadoop.hbase.util.Bytes.putInt(Bytes.java:899)
at org.apache.hadoop.hbase.KeyValue.createByteArray(KeyValue.java:1082)
at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:652)
at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:580)
at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:483)
at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:370)
at org.apache.hadoop.hbase.KeyValue.<clinit>(KeyValue.java:267)
at org.apache.hadoop.hbase.HConstants.<clinit>(HConstants.java:978)
at org.apache.hadoop.hbase.HTableDescriptor.<clinit>(HTableDescriptor.java:1488)
at org.apache.hadoop.hbase.util.FSTableDescriptors.<init>(FSTableDescriptors.java:124)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:570)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:365)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:307)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
... 7 more
Can anyone tell me the reason for this error?
Please let me know if you need more info.
The underlying exception is a NoClassDefFoundError, which usually means something is missing from your classpath. In this case you're missing org.apache.hadoop.hbase.util.Bytes, which comes from hbase-common.jar. However, if one class is missing, there are probably others too, which might mean you're starting HBase the wrong way. Have you tried the included start-up scripts?
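For reference, the standalone quickstart starts HBase with the bundled scripts from the unpacked distribution (the path below is illustrative):
cd /path/to/hbase-1.1.3
./bin/start-hbase.sh
./bin/hbase shell
bin/hbase classpath prints the classpath those scripts assemble, which should include hbase-common.jar; if you launch the daemons some other way, make sure the same jars are present.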

Mahout: Cannot convert into sequence file

I'm trying to convert some text files into Mahout sequence files, so I run
mahout seqdirectory -i inputFolder -o outputFolder
But I always get this exception:
java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:164)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.<init>(CombineFileRecordReader.java:126)
at org.apache.mahout.text.MultipleTextFileInputFormat.createRecordReader(MultipleTextFileInputFormat.java:43)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:155)
... 11 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at org.apache.mahout.text.WholeFileRecordReader.<init>(WholeFileRecordReader.java:52)
... 16 more
I'm running Mahout 0.8 on Hadoop 2.2.0
Any ideas?
The previous answers are incorrect. Mahout 0.8 introduced a MapReduce version of seqdirectory as a new feature, and a bug in the MR version is what causes the exception you are seeing.
To run seqdirectory with Mahout 0.8, use the sequential version by adding the -xm sequential option to your command line:
mahout seqdirectory -i inputFolder -o outputFolder -xm sequential
By default, seqdirectory runs the MapReduce version if no execution method is specified.
This issue has since been fixed in Mahout 0.9.
I read somewhere that Mahout 0.8 works with Hadoop 1.2. I only downloaded Mahout (it uses the Hadoop jar from lib/hadoop).

Unable to convert OpenOffice documents using docsplit, resulting in java.lang.NoClassDefFoundError

I have installed the docsplit gem and have been able to convert PDF documents. However, when it comes to splitting OpenOffice documents such as PowerPoint and Word files, I get the following error:
Exception: Command
/usr/local/bin/docsplit pdf /tmp/tmpzuk5gf/dump.ppt --output /tmp/tmpzuk5gf
finished with return code
1
and output:
Exception in thread "main" java.lang.NoClassDefFoundError: /usr/lib/openoffice
Caused by: java.lang.ClassNotFoundException: .usr.lib.openoffice
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: /usr/lib/openoffice. Program will exit.
I have already checked that the /usr/lib/openoffice folder is available.
How do I go about this?
I was using docsplit 0.7.2 with OpenOffice and Ubuntu 10.04. However, version 0.7.0 onward seems to have a preference for LibreOffice over OpenOffice. So I downgraded my docsplit to version 0.6.4 and it works OK with OpenOffice. I can now run the commands successfully.
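If you need to pin that older version, one way to install it (assuming RubyGems) is:
gem install docsplit -v 0.6.4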
I have been using docsplit 0.7.2 with OpenOffice on Ubuntu 12.04 and everything is OK.
Previously, on Ubuntu 10.04, conversions sometimes broke with a similar error, but when I tried again, in most cases the conversion completed successfully.
