I am using Nutch 2.3 to crawl data and need to add a plugin to it. I searched the web and found some guides, e.g. on wiki.apache.org, but they are for the older Nutch 1.x.
How do I do this for Nutch 2.3? Please elaborate, as I am new to this field.
Related
I want to install StormCrawler 2.0, but I am unable to locate the appropriate installation instructions. In particular, what versions of Apache Storm, Zookeeper, and Java is StormCrawler 2.0 compatible with?
Thanks!
The POM file for 2.0 contains the information you are after.
Java 8
Apache Storm 2.1 (likely to be upgraded to 2.2 shortly)
ZooKeeper 3.3.3, according to Storm's documentation
You could have a look at Ansible Storm and see whether you can get it to work with Storm 2 with a few changes.
I am working on a project with Apache Nutch 2.3.1 and need to extract specific data from the downloaded HTML pages. I found a plugin (parse-xml, NUTCH-185) that would help with this, but some of the libraries it uses no longer exist or are deprecated. I intend to make the necessary changes so that it is compatible with Nutch 2.3.1.
The following imports cause errors when compiling Nutch. Could you help me find the equivalents for Nutch 2.3.1?
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.parse.ParseData;
Nutch 2.3.1 is not the next version of Nutch 1.x. At any given time Nutch has two major branches: Nutch 1.x (master/trunk) and Nutch 2.x. Nutch 2.x is very different from its sibling: they share many design ideas but implement them differently. In short, you cannot find those classes because they don't exist in Nutch 2.x.
The org.apache.lucene.* classes are not implemented in Nutch; they come directly from the Apache Lucene library.
Nutch 2.x has a very different architecture from Nutch 1.x, so updating that plugin is not only a matter of replacing those imports. You'll need to adapt the code to the new architecture, although the main logic of the plugin should stay roughly the same.
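To illustrate the part that should carry over between branches, here is a minimal sketch of the extraction logic on its own, written against the JDK's XML and XPath APIs only. It is not the code of the parse-xml plugin and it does not touch any Nutch class: the class name XPathFieldExtractor, the method extract and the sample page are hypothetical, and it assumes the input is well-formed XML (real-world HTML would first need a tolerant parser). Wiring this into a Nutch 2.x plugin is the architecture-specific part and is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls text values out of a well-formed document via XPath.
// It depends only on the JDK, so it is independent of the Nutch branch in use.
public class XPathFieldExtractor {

    // Returns the trimmed text content of every node matched by the expression.
    public static List<String> extract(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; a crawled page would come from the fetcher instead.
        String page = "<html><body><h1>Product A</h1>"
                + "<span class=\"price\">42</span></body></html>";
        System.out.println(extract(page, "//h1"));                     // prints [Product A]
        System.out.println(extract(page, "//span[@class='price']"));   // prints [42]
    }
}
```

Keeping the extraction in a plain class like this means only a thin adapter has to change when moving between Nutch 1.x and 2.x.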
I am using GWT 2.4 with Eclipse (on Ubuntu 11.10) and need to use the Google Maps library for GWT. I tried the library with GWT 2.4, but there are a few incompatibilities. After some quick research I decided to install GWT 2.1 instead. But when I try to install it through Eclipse's "Install New Software" dialog using the link that I found here, it tries to reinstall GWT 2.4. Any ideas?
I just want the repository location of GWT 2.1 for Eclipse 3.7.
You have to download the GWT SDK (as a zip file), unpack it and then point Eclipse at it. Look at the Google preferences in Eclipse to "add a new SDK".
See also http://code.google.com/p/google-web-toolkit/issues/detail?id=6204
I am following this to build a JSF project in Eclipse. I am using Eclipse Galileo.
The problem is that the guide says to use Dynamic Web Module version 2.5 for JSF, but Eclipse only offers me versions up to 2.4. How can I upgrade my Dynamic Web Module version?
Have you selected a Servlet 2.4 Target Runtime?
The documentation you link to is for Helios. You would be better off using the Galileo documentation. If you are going to target JSF 2.0, you would probably be better off upgrading from Galileo to Helios.
I am working with Galileo but cannot find the plugin for Hibernate 3.3. Can anybody send me the plugin link, please? Please also send the link for the latest Spring 2.x plugin.
Hibernate Tools for Eclipse is a component of JBoss Tools and you'll find the update site at this location.
For Spring IDE, you can get a recent milestone compatible with Eclipse 3.5 from this update site (the link is correct and works with the update manager).