I'm trying to run Apache Nutch (v2.3) with MongoDB (v2.6), and I am following this tutorial to get things set up. I have already created my seed list, and my gora.properties and nutch-site.xml are set up fine. However, when I run the bin/nutch inject ../urls/test/ command, I keep getting a java.io.IOException:
$ bin/nutch inject ./../../urls/test/
InjectorJob: starting at 2015-05-04 13:53:29
InjectorJob: Injecting urlDir: ../../urls/test
InjectorJob: Using class org.apache.gora.mongodb.store.MongoStore as the Gora storage class.
InjectorJob: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-TColletti\mapred\staging\TColletti1801159571\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
I've read in some places that this could be caused by the wrong version of MongoDB or Gora. It looks as though something is wrong with the permissions on a temp directory for Hadoop (which I'm not using right now). I've looked at this S/O article but cannot find the core-site.xml file anywhere in my 2.3 version of Nutch.
Can someone help me finally run this command?
I'm not sure if this is the official answer or not, but it worked for me. I found another S/O post here describing the same general problem. Other places kept suggesting changes to core-site.xml, which I did not have. However, one of the answers mentions a patch to download and a couple of lines to add to the nutch-site.xml file in my runtime/local/conf directory. I tried it out and it solved my problem. It seems the patch just ignores the errors and works around them. There may be a better solution, but for now it worked.
The first screenshot shows a working run configuration. The second shows a non working one. They are identical module/classpath wise - at least according to the visible info.
Clearly there's a bug in IJ here. Has anyone out there found a workaround? Also, any ideas on what triggers this behavior?
WORKING run configuration
NOT Working run configuration - gives ClassNotFoundException for org.apache.hadoop.fs.PathFilter
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.PathFilter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
Here is the UDFPafDqm module - which DOES include hadoop jar from Hortonworks - so the ClassNotFoundException is bogus
The process above was repeatable: if you copy a run configuration, the copy does not necessarily work and can fail as described. You have to create the configuration from scratch. This is an IJ bug.
I've installed Flume and Hadoop manually (I mean, not CDH) and I'm trying to run the twitter example from Cloudera.
In the apache-flume-1.5.0-SNAPSHOT-bin directory, I start the agent with the following command:
bin/flume-ng agent -c conf -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
My conf/twitter.conf file uses the logger as the sink. The conf/flume-env.sh assigns to CLASSPATH the flume-sources-1.0-SNAPSHOT.jar that contains the definition of the twitter source. The resulting output is:
(...) [ERROR org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
The conflict results from a FilterQuery class that is defined elsewhere in the flume lib and that does not contain the setIncludeEntities method. For me, the file that contains this class is the twitter4j-stream-3.0.3.jar and I cannot exclude the file from the classpath as suggested here.
I believe this experience was quite frustrating for you; for me it certainly was. The main problem is that both files, flume-sources-1.0-SNAPSHOT.jar and twitter4j-stream-3.0.3.jar, contain the same FilterQuery.class. That is why the conflict shows up in the log file.
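To see which classpath entries supply a given class, something like the following may help. It is only a minimal sketch: the class name DuplicateClassFinder and its helper are made up for illustration, and it assumes you run it with the same classpath that the Flume agent uses. It relies on ClassLoader.getResources, which returns every location that provides a resource, not just the first one.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flume or twitter4j.
public class DuplicateClassFinder {

    /** Returns every classpath location that provides the given class. */
    static List<URL> locationsOf(String className, ClassLoader loader) throws IOException {
        // A class is stored as a resource named after its fully qualified name.
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the class from the error message above.
        String className = args.length > 0 ? args[0] : "twitter4j.FilterQuery";
        List<URL> hits = locationsOf(className, DuplicateClassFinder.class.getClassLoader());
        for (URL hit : hits) {
            System.out.println(hit);
        }
        if (hits.size() > 1) {
            System.out.println(className + " is provided by " + hits.size() + " classpath entries");
        }
    }
}
```

If more than one location is printed, the entry that comes first on the classpath is normally the one that gets loaded.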
I am not a Java or Big Data expert, but I can offer a workaround. Download twitter4j-stream-2.2.6.jar or a lower version from here and replace the twitter4j-stream-3.0.3.jar with it. All the 3.X.X versions use this class. After replacing it, everything should work fine. You may, however, get a heap error after downloading a huge number of tweets. Please google the solution, as it was resolved in the 3.X.X files.
-Edit
Also, please don't forget to download and replace all the twitter4j files in the /usr/lib/flume-ng folder, namely twitter4j-media-support-2.2.6.jar, twitter4j-stream-2.2.6.jar and twitter4j-core-2.2.6.jar. Any version mismatch among these files will also cause problems.
As suggested in the post, search-contrib-1.0.0-jar-with-dependencies.jar can be a problematic file too.
You need to recompile flume-sources-1.0-SNAPSHOT.jar from the git repository: https://github.com/cloudera/cdh-twitter-example
Install Maven, then download the cdh-twitter-example repository.
Unzip it, then run the following inside it (as mentioned there):
$ cd flume-sources
$ mvn package
$ cd ..
This problem appeared when twitter4j was updated from 2.2.6 to 3.X: the method setIncludeEntities was removed, and the prebuilt JAR is not up to date.
PS: Do not download the prebuilt version; it is still the old one.
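A quick way to check whether the twitter4j version on your classpath still has the method is a small reflection probe. This is only a sketch: MethodProbe and hasMethod are hypothetical names, and it assumes the twitter4j jars are on the classpath when you run it. The boolean parameter comes from the (Z) in the error message.

```java
// Hypothetical helper, not part of Flume or twitter4j.
public class MethodProbe {

    /** Returns true if the class is loadable and has a public method with this signature. */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // 'false' avoids running static initializers while probing.
            Class<?> type = Class.forName(className, false, MethodProbe.class.getClassLoader());
            type.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Signature taken from the NoSuchMethodError: setIncludeEntities(Z)
        boolean present = hasMethod("twitter4j.FilterQuery", "setIncludeEntities", boolean.class);
        System.out.println("FilterQuery.setIncludeEntities(boolean) present: " + present);
    }
}
```

If it prints false, the source jar was compiled against a twitter4j version that differs from the one being loaded at runtime.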
Simply rename all twitter4j-stream* jar files and rerun Flume. It will work like a charm. :)
I had the same problem and finally solved it by following these steps:
First I renamed the jar files to .jarx: twitter4j-stream-3.0.3.jar -> twitter4j-stream-3.0.3.jarx, ...
This solved the error, but when it tried to establish a connection, I got a 404 error:
(Twitter Stream consumer-1[Establishing connection])
[INFO - Twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:
The URI requested is invalid or the resource requested, such as a user, does not exist.)
After reading this page (https://twittercommunity.com/t/twitter-streaming-api-not-working-with-twitter4j-and-apache-flume/66612/11), I finally solved it by downloading a new version of twitter4j (there's a link on the page).
Probably not the best solution, but it worked for me.
I have done more than an hour of searching while trying to run JUnit on my project. I can see that there is a class missing - LogEntryFormatter. But no matter how hard I tried, I am not able to find the jar file which contains this one. Eclipse shows the below stack trace after running the Test case file.
java.lang.NoClassDefFoundError: weblogic/logging/LogEntryFormatter
at java.lang.ClassLoader.findBootstrapClass(Native Method)
at java.lang.ClassLoader.findBootstrapClass0(ClassLoader.java:892)
at java.lang.ClassLoader.loadClass(ClassLoader.java:302)
at java.lang.ClassLoader.loadClass(ClassLoader.java:300)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at weblogic.logging.commons.LogImpl.(LogImpl.java:14)
at weblogic.logging.commons.LogFactoryImpl.getInstance(LogFactoryImpl.java:21)
at weblogic.logging.commons.LogFactoryImpl.getInstance(LogFactoryImpl.java:18)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.(SpringJUnit4ClassRunner.java:91)
...
This happens when I use @RunWith(SpringJUnit4ClassRunner.class); if I use @RunWith(JUnit4.class) instead, a sample test seems to work.
Any sort of help will be useful. I am using WebLogic server, and all weblogic related jars are available on the classpath.
Ok, so after a lot of effort I am able to run JUnit Tests, although not in the exact way I want. I am also at a loss to explain why this is happening or where weblogic.logging is configured. This could be a possible problem with my project setup. After adding the following jar files to the classpath (after removing everything except jdk), it seems to be working for me.
wlclient
com.bea.core.utils.classloaders
com.bea.core.descriptor
com.bea.core.utils
com.bea.core.management.core
junit 4.5
Thanks to all those who helped by providing valuable comments. The stack traces encountered at each step helped me point in the right direction.
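When trimming a classpath like this, it can help to check whether specific classes are loadable before running the whole test suite. The following is a minimal sketch: ClassProbe and isLoadable are made-up names, and the two class names in main are simply taken from the stack trace in the question.

```java
// Hypothetical helper for checking a classpath; not part of WebLogic or JUnit.
public class ClassProbe {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // 'false' means the class is located but not initialized.
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here too.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "weblogic.logging.LogEntryFormatter",
            "weblogic.logging.commons.LogImpl"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Run it with the same classpath as the failing test to see which of the classes is actually missing.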
How do you add jars to the classpath for Oracle 10.3.5? As I understand it, there is a bug (or incorrect info) in the documentation (readme), which states that any jars placed in the $DOMAIN_HOME/lib directory are added to the classpath dynamically, but the real documentation for 10.3.3 states that these files don't get added to the classpath anymore.
So here I am trying to find out how you add jars to the classpath. I have tried changing commEnv.sh and am currently looking for setDomainEnv.sh (but can't find it as of yet), and none of these things have worked to add this jar to the classpath.
My whole problem is that I added data sources to my server, and I am trying to add the DB2 jar to the environment so that it can be used. The funny thing is that after adding the jar to $DOMAIN_HOME/lib I was able to get rid of a connection error in the admin console when testing the connection to the database. That all seems to work, but now I'm getting a class definition error:
]] Root cause of ServletException.
java.lang.NoClassDefFoundError: com/ibm/db2/jcc/DB2Connection
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:630)
at java.lang.ClassLoader.defineClass(ClassLoader.java:614)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at weblogic.utils.classloaders.GenericClassLoader.defineClass(GenericClassLoader.java:343)
Truncated. see log file for complete stacktrace
Caused By: java.lang.ClassNotFoundException: com.ibm.db2.jcc.DB2Connection
at weblogic.utils.classloaders.GenericClassLoader.findLocalClass(GenericClassLoader.java:297)
at weblogic.utils.classloaders.GenericClassLoader.findClass(GenericClassLoader.java:270)
at java.lang.ClassLoader.loadClass(ClassLoader.java:305)
at java.lang.ClassLoader.loadClass(ClassLoader.java:246)
at weblogic.utils.classloaders.GenericClassLoader.loadClass(GenericClassLoader.java:179)
Truncated. see log file for complete stacktrace
I don't know what else to try. I searched for answers, but seemingly all of them are old and outdated.
$DOMAIN/lib should work fine, but not dynamically. You have to restart. However, handling JAR files for data source drivers is likely different.
Just curious - have you confirmed the jar file(s) contain the class in question?
Also try: http://docs.oracle.com/cd/E17904_01/web.1111/e13753/db2.htm
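To confirm that a jar really contains the class, you can look for the matching entry inside it. This is a minimal sketch using java.util.jar.JarFile; JarContainsClass is a hypothetical name, and the jar path is whatever driver jar you placed on the server (no path is assumed here).

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of WebLogic or the DB2 driver.
public class JarContainsClass {

    /** Returns true if the jar has an entry for the given fully qualified class name. */
    static boolean contains(String jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarContainsClass <jar> [className]");
            return;
        }
        // Defaults to the class named in the stack trace above.
        String className = args.length > 1 ? args[1] : "com.ibm.db2.jcc.DB2Connection";
        System.out.println(className + " in " + args[0] + ": " + contains(args[0], className));
    }
}
```

If the class is present in the jar but still not found at runtime, the jar is probably not on the classpath of the class loader that needs it.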
I ended up finding out the problem was that I was editing the commEnv.sh file on Windows instead of the commEnv.cmd file. Really dumb, but editing that file and adding the jar to the classpath there worked. Bah!
I've been looking for a few days for a solution to an UnresolvedAddressException that I can't figure out!
It seems to be quite a challenging problem, since I couldn't find any other info on the net!
I'm working with OSGi framework on JamVM.
I get this exception when using Date.toString or SimpleDateFormat on a Calendar object. I can't understand why the bundle tries to connect after the getZoneStrings function. It seems it cannot find the locale, but I'm not sure this is the problem.
I tried adding the file /etc/timezone (which was missing), but it didn't solve the problem.
Here's the complete stack trace of the exception:
adsdebian:/usr/local/bundle# org.osgi.framework.BundleException: Activator start error in bundle zApp_RoadPricing [24].
at org.apache.felix.framework.Felix.startBundle(Felix.java:1506)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:774)
at org.apache.felix.shell.impl.StartCommandImpl.execute(StartCommandImpl.java:105)
at org.apache.felix.shell.impl.Activator$ShellServiceImpl.executeCommand(Activator.java:291)
at org.apache.felix.shell.remote.Shell.run(Shell.java:109)
at java.lang.Thread.run(Thread.java:743)
Caused by: java.nio.channels.UnresolvedAddressException
at gnu.java.nio.SocketChannelImpl.connect(SocketChannelImpl.java:160)
at gnu.java.net.PlainSocketImpl.connect(PlainSocketImpl.java:281)
at java.net.Socket.connect(Socket.java:454)
at java.net.Socket.connect(Socket.java:414)
at gnu.java.net.protocol.http.HTTPConnection.getSocket(HTTPConnection.java:719)
at gnu.java.net.protocol.http.HTTPConnection.getOutputStream(HTTPConnection.java:800)
at gnu.java.net.protocol.http.Request.dispatch(Request.java:291)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(HTTPURLConnection.java:219)
at gnu.java.net.protocol.http.HTTPURLConnection.getHeaderField(HTTPURLConnection.java:582)
at java.net.URLConnection.getHeaderFieldInt(URLConnection.java:426)
at java.net.URLConnection.getContentLength(URLConnection.java:302)
at gnu.java.net.loader.RemoteURLLoader.getResource(RemoteURLLoader.java:79)
at java.net.URLClassLoader.findResources(URLClassLoader.java:720)
at java.lang.ClassLoader.getResources(ClassLoader.java:640)
at gnu.classpath.ServiceFactory.lookupProviders(ServiceFactory.java:286)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:163)
at java.text.DateFormatSymbols.getZoneStrings(DateFormatSymbols.java:123)
at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:192)
at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:448)
at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:430)
at crf.opengate.app.roadpricing.RoadPricing.<init>(RoadPricing.java:109)
at java.lang.reflect.Constructor.constructNative(Native Method)
at java.lang.reflect.Constructor.newInstance(Constructor.java:328)
at java.lang.Class.newInstance(Class.java:1154)
at org.apache.felix.framework.Felix.createBundleActivator(Felix.java:3341)
at org.apache.felix.framework.Felix.startBundle(Felix.java:1453)
... 5 more
Is there anyone who could help me, please?
Thanks,
Andrea
Found this link which talks of a similar issue that got resolved by switching to Felix version 2.0.2. Maybe you could try that?
It's hard to be definitive without looking at the JamVM source code, but the stacktrace tells me that inside getZoneStrings there is an attempt to load a class or other file via the classloader (hence the call to ClassLoader and URLClassLoader three layers down the stack).
That attempt at classloading is failing to resolve the address of a URL that is in the classpath. That could be because you have a problem later on in your classpath: since it didn't find the file where you are putting your classes, it went looking in the next place, which produced an UnresolvedAddressException. Or it could just be that this is what it does when it can't find a resource. (By the way, that seems like an odd violation of the spec. The classloader should throw its checked exception. Here it seems that GNU Classpath is letting a runtime exception leak out, which should instead be converted into an exception indicating that the class cannot be found.)
As for what class is not being found, it seems to be one of the ServiceProviders configured, perhaps in GNU Classpath:
at gnu.classpath.ServiceFactory.lookupProviders(ServiceFactory.java:286)
The above is the core of the problem. It is looking up a provider, and attempted to get a resource from the classpath that likely doesn't exist.
It is impossible to say what exactly it is looking for without examining the source code. Fortunately you seem to be using all open source stuff, so it should be fairly easy to find.
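For illustration, a provider lookup of this kind boils down to asking the classloader for every META-INF/services/<service name> resource on the classpath, which is why each classpath entry (including any remote URL) gets contacted. The sketch below shows that step on a standard JVM. ProviderConfigLister is a made-up name, and the choice of java.util.spi.TimeZoneNameProvider as the service being looked up is an assumption based on the getZoneStrings frame, not something confirmed from the GNU Classpath source.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;
import java.util.spi.TimeZoneNameProvider;

// Hypothetical helper for inspecting provider configuration files.
public class ProviderConfigLister {

    /** Lists the provider-configuration resources visible to the loader for a service type. */
    static List<URL> providerConfigs(Class<?> service, ClassLoader loader) throws IOException {
        // Provider lookups read files named after the service's fully qualified name.
        String resource = "META-INF/services/" + service.getName();
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this is the service consulted by getZoneStrings.
        ClassLoader loader = ProviderConfigLister.class.getClassLoader();
        List<URL> configs = providerConfigs(TimeZoneNameProvider.class, loader);
        System.out.println("provider config files found: " + configs.size());
        for (URL url : configs) {
            System.out.println(url);
        }
    }
}
```

Run inside the failing environment, the getResources call is the point where an unreachable classpath URL would surface.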
Thanks JRL, the solution posted in the link you suggested solved the problem!
I would never have expected such a bug in Felix, especially since I thought I had already used the Date and Calendar classes on the same platform and under the same conditions!
Just to sum up the solution:
download a newer version of Felix (at least 2.0.2) - from http://felix.apache.org/site/downloads.cgi get the .jar named Main (i.e. the bundle for the OSGi framework)
save it in the bin/ directory where your current felix.jar resides
rename felix.jar to felix.jar_old (after shutting down the OSGi FX if it's running!)
rename the newer version of Felix (e.g. org.apache.felix.main-2.0.3.jar) to felix.jar
restart the OSGi framework with your app
Hope this helps someone else!
Bye,
Andrea
(Sorry for the second username... I registered quite a long time ago but forgot about the Google identification! :-) )