Nutch without Solr & with MySQL

I am new to Nutch and just set it up by following this guide: Nutch+MySQL.
The command mentioned in the guide is deprecated and not available in Nutch 2.3, so I ran the following command instead:
bin/crawl urls 1 null 10
Injecting seed URLs
/home/Aayush/NUTCH_HOME/runtime/local/bin/nutch inject urls -crawlId 1
InjectorJob: starting at 2015-03-15 17:14:50
InjectorJob: Injecting urlDir: urls
InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:259)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:93)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:77)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
And then I get the following error:
Error running:
/home/Aayush/NUTCH_HOME/runtime/local/bin/nutch inject urls -crawlId 1
Failed with exit value 255.
I read somewhere that Nutch + MySQL is not supported by Gora 0.5 and that you have to downgrade Gora to 0.2.1. I downgraded the version in ivy.xml, removed the build and runtime directories, and rebuilt the project using ant.
After rebuilding and rerunning, I still get the error above.
Please provide any information/help.
Thank you for reading.
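The trace shows that Nutch looks up its storage class by name at run time (StorageUtils.getDataStoreClass calls Class.forName), so the error means org.apache.gora.sql.store.SqlStore is not on the classpath Nutch runs with, regardless of what ivy.xml says. A minimal sketch of that check follows; StoreClassCheck is a hypothetical helper, not part of Nutch, and pointing it at Nutch's jars assumes they sit under runtime/local/lib.

```java
// Hypothetical diagnostic helper (not part of Nutch): checks whether a class
// can be found by name, the same kind of lookup Class.forName performs in the trace.
public class StoreClassCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and define the class, do not run static initializers
            Class.forName(className, false, StoreClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the Nutch error; other names can be passed as arguments.
        String[] names = args.length > 0 ? args
                : new String[] {"org.apache.gora.sql.store.SqlStore"};
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT on classpath"));
        }
    }
}
```

For example, `java -cp "runtime/local/lib/*:." StoreClassCheck` (the lib path is an assumption about the build layout). If it reports the class as missing, the rebuilt runtime does not contain the gora-sql jar.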

Related

Hadoop MR job - java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser

An Oozie workflow triggers the Java class of a Hadoop MapReduce job. I have added the opencsv-2.3.jar and commons-lang-3-3.1 jar dependencies to my Eclipse project. The project builds successfully; however, when I move it to the Hadoop cluster, I get a ClassNotFoundException even though my project contains the jar.
Since this is an existing, working legacy system, I do not want to change the environment's dependencies. Instead, I tried different combinations of adding libraries to the classpath, without success.
I tried the suggestions in: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader - Upload File Vaadin
I also checked with the MapReduce client Maven dependency org.apache.hadoop:hadoop-mapreduce-client-common:2.6.0-cdh5.4.2.
The legacy jar in the production environment runs fine, but my project's compiled jar throws the following errors:
oozie syslog:
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Running job: job_123213123123_35305
INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1548794054671_35304_m_000000_0 is : 1.0
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Job job_123213123123_35305 running in uber mode : false
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Task Id : attempt_123213123123_35305_m_000001_0, Status : FAILED
oozie stderr:
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Please suggest if I am missing anything and what I can try.
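The opencsv-2.3.jar library had been added through the Eclipse Build Path as an external jar, so it was not packaged into the jar that ran on the cluster. I had to run mvn clean and rebuild the project with Maven. Finally, I used the "*jar-with-dependencies.jar" from the target folder, which fixed the issue.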
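A quick way to confirm this before deploying is to check whether the class file is actually inside the jar you ship. A minimal sketch; JarContainsClass is a hypothetical helper name, and it only inspects the jar's entries, it does not load anything.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: reports whether a jar contains the .class entry for a class name.
public class JarContainsClass {

    /** Converts a binary class name such as a.b.C to its jar entry path a/b/C.class. */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar contains a class file for className. */
    static boolean containsClass(File jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getJarEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarContainsClass <jar> <class-name>");
            System.exit(2);
        }
        boolean found = containsClass(new File(args[0]), args[1]);
        System.out.println(args[1] + (found ? " is" : " is NOT") + " packaged in " + args[0]);
    }
}
```

For example, run it against the built jar with au.com.bytecode.opencsv.CSVParser as the class name: the plain jar built from the Eclipse project would be expected to report NOT packaged, while the jar-with-dependencies should report that it is.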

How do I troubleshoot the installation of Apache Accumulo on Linux?

I am trying to install open source Accumulo on RHEL 7.x. I have two GB of swap space. I have installed Java 1.8, Hadoop 3, and Zookeeper. I have run the bootstrap_config.sh script for Accumulo 1.9.2.
I ran this (and expected it to work):
/bin/accumulo-1.9.2/bin/accumulo init
But I get this error:
[start.Main] ERROR: Uncaught exception
java.util.ServiceConfigurationError: org.apache.accumulo.start.spi.KeywordExecutable: Provider org.apache.accumulo.proxy.Proxy could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.accumulo.start.Main.checkDuplicates(Main.java:237)
at org.apache.accumulo.start.Main.getExecutables(Main.java:228)
at org.apache.accumulo.start.Main.main(Main.java:84)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.accumulo.start.classloader.AccumuloClassLoader$2.loadClass(AccumuloClassLoader.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I used the Accumulo bootstrap_config.sh script to configure Hadoop version 3. How do I get "/bin/accumulo-1.9.2/bin/accumulo init" to work?
Accumulo 1.9.2 expects Hadoop 2 out of the box, but does have a build profile to rebuild a tarball specifically for use with Hadoop 3. You can build Accumulo with the Hadoop 3 profile by downloading the source tarball and doing:
mvn clean package -Dhadoop.profile=3 -DskipTests
If you're not interested in rebuilding from source, it may be possible to simply fix the class path issues by reading the error message, and adjusting your class path accordingly. In this case, it seems you're missing a commons-configuration jar.
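To adjust the class path, you first need to know which jar, if any, provides the missing class. A minimal sketch that scans directories of jars for it; FindClassInJars is a hypothetical name, it is not recursive, and which lib directories to search depends on your installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars in a directory that contain a given class.
public class FindClassInJars {

    /** Returns every *.jar directly inside dir (no subdirectories) that contains className. */
    static List<Path> jarsProviding(Path dir, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getJarEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // The class from the Accumulo error; each argument is a directory to search.
        String className = "org.apache.commons.configuration.Configuration";
        for (String dir : args) {
            for (Path jar : jarsProviding(Paths.get(dir), className)) {
                System.out.println(className + " found in " + jar);
            }
        }
    }
}
```

If nothing is printed for any of the directories you pass in, none of those jars provides the class, and a commons-configuration jar has to be obtained and added to the class path.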

Error while Integrating Apache Nutch 2.3 with Hbase 0.94.14 and Solr 5.2.1

I am integrating Nutch with Hbase and Solr.
After starting the Hadoop and HBase services, I run the following command in the Nutch home directory:
sudo -E bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
I am facing these errors:
Injecting seed URLs
/usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl
InjectorJob: starting at 2016-05-26 15:41:14
InjectorJob: Injecting urlDir: urls/seed.txt
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:114)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
Error running:
/usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl
Failed with exit value 1.
Can anyone tell me what is wrong?
This is a bug in Nutch: it is unable to locate a transitive dependency while executing the crawl script.
A better configuration to use is nutch-2.3.1 with hbase-0.98.8-hadoop2. For more details, refer to the following URL:
https://wiki.apache.org/nutch/Nutch2Tutorial
In addition, add the missing hbase-common-0.98.8-hadoop2.jar transitive dependency to Nutch's ivy.xml (this is a bug in gora-hbase 0.6.1):
<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />
With this, I was able to crawl successfully.
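To confirm that the added hbase-common jar actually ends up on the classpath, you can ask the class loader where it would load org.apache.hadoop.hbase.HBaseConfiguration from. A minimal sketch; WhereIsClass is a hypothetical helper, and it has to be run with the same classpath as the crawl for the answer to be meaningful.

```java
import java.net.URL;

// Hypothetical helper: prints where the system class loader would find each class file.
public class WhereIsClass {

    /** Returns the URL of the class file on the system class path, or null if it is absent. */
    static URL locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        // Defaults to the class from the Nutch/HBase error.
        String[] names = args.length > 0 ? args
                : new String[] {"org.apache.hadoop.hbase.HBaseConfiguration"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "missing" : url));
        }
    }
}
```

Looking up the class file as a resource, instead of loading it, avoids running any static initializers and shows the exact jar it comes from.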

Running jzy3d demos result in ClassNotFoundException

The problem is as follows: during startup of the jzy3d demo ScatterDemo.java, I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/media/opengl/GLProfile
at org.jzy3d.chart.Settings.<init>(Settings.java:19)
at org.jzy3d.chart.Settings.getInstance(Settings.java:48)
at org.jzy3d.analysis.AnalysisLauncher.open(AnalysisLauncher.java:18)
at org.jzy3d.analysis.AnalysisLauncher.open(AnalysisLauncher.java:13)
at org.jzy3d.demos.scatter.ScatterDemo.main(ScatterDemo.java:16)
Caused by: java.lang.ClassNotFoundException: javax.media.opengl.GLProfile
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
Java Result: 1
Configuration is NetBeans 7.3.1, jzy3d-api-0.9.1, jzy3d-swt-0.9.1, log4j-1.2.17, gluegen-rt (latest stable), jogl-all (latest stable), JDK1.7
I am aware of the following change (Moving all of javax.media.opengl to com.jogamp.opengl, https://jogamp.org/bugzilla/show_bug.cgi?id=682). I have gone through the usual process of including .jar files into the project.
The project compiles fine but does not run.
My questions are: Can I somehow redirect javax.media.opengl.* to com.jogamp.opengl.*? What is the correct way to resolve this problem?
As you can see here, even the code on the master branch (0.9.2) isn't based on the latest version of JOGL. Please ask Martin Pernollet to make the necessary changes (replace javax.media.* with com.jogamp.*) or make them yourself. You can then rebuild Jzy3d with the modified import statements and test it. This is the correct way to solve this simple problem.
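The import change described above can be done mechanically across the Jzy3d sources. A minimal sketch; JoglImportRewriter is a hypothetical name, and it only rewrites import statements, so any fully qualified javax.media references inside code bodies still need to be fixed by hand.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites "import javax.media." to "import com.jogamp." line by line.
public class JoglImportRewriter {

    // Matches a normal or static import of a javax.media.* type at the start of a line.
    private static final Pattern OLD_IMPORT =
            Pattern.compile("^(\\s*import\\s+(?:static\\s+)?)javax\\.media\\.", Pattern.MULTILINE);

    /** Replaces the javax.media. prefix in import lines and leaves every other line untouched. */
    static String rewriteImports(String source) {
        return OLD_IMPORT.matcher(source).replaceAll("$1com.jogamp.");
    }

    public static void main(String[] args) {
        String before = "import javax.media.opengl.GLProfile;\n"
                + "import java.util.List;\n";
        System.out.print(rewriteImports(before));
    }
}
```

Applied to each source file before rebuilding, this turns `import javax.media.opengl.GLProfile;` into `import com.jogamp.opengl.GLProfile;` and leaves unrelated imports unchanged.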

Error running spoon on Ubuntu 14.04 64 bit

I have been using the Spoon tool of Pentaho Data Integration for a long time, and it was working fine on my system. But since I moved it to /opt, I am unable to run it. I have Oracle Java 8 installed on my system, and each time I try to run it, I end up with the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/swt/widgets/Composite
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2688)
at java.lang.Class.getMethod0(Class.java:2937)
at java.lang.Class.getMethod(Class.java:1771)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:149)
Caused by: java.lang.ClassNotFoundException: org.eclipse.swt.widgets.Composite
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
Please help me resolve this error; I haven't found any solution yet.
I found a solution to this problem. I removed all the hidden directories generated by Kettle, as well as its copy in /opt. Then I extracted a fresh copy of the new version, added /opt/data-integration to my PATH variable, and tried to run it from my home directory. Although that run was not successful, it generated all the dependent hidden folders required to run it. Then I had to go to the installation directory by issuing
cd /opt/data-integration
and then I was able to run it successfully by issuing
sh spoon.sh
I had to go to that directory because the Pentaho developers set it up that way: the main command in spoon.sh uses a relative path to the launcher folder.
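Why the working directory matters: a relative path in a launch command is resolved against the current working directory (the JVM's user.dir), not against the directory that contains the script. A small Java illustration of that rule; RelativePathDemo is hypothetical, and "launcher" simply stands in for the relative path that spoon.sh uses.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration: the same relative path points to different places
// depending on the directory the process was started from.
public class RelativePathDemo {

    /** Resolves a relative path as if the process had been started in workingDir. */
    static Path resolveFrom(Path workingDir, String relative) {
        return workingDir.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // Resolved against the actual current working directory of this JVM.
        System.out.println("cwd-relative: " + Paths.get("launcher").toAbsolutePath());
        // What the path would mean if started from the install dir vs. the home dir.
        System.out.println("from /opt/data-integration: "
                + resolveFrom(Paths.get("/opt/data-integration"), "launcher"));
        System.out.println("from home: "
                + resolveFrom(Paths.get(System.getProperty("user.home")), "launcher"));
    }
}
```

Only the first resolution points into the installation, which is why spoon.sh has to be started from /opt/data-integration.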
