Building Nutch Plugin: class dependency - java

I wrote some Nutch plugins using different extension points such as Protocol, Parser, etc. These plugins work perfectly inside Eclipse, but to use them on a Hadoop cluster they have to be built with Ant. I also wrote some classes in new packages inside the core folder (src). These classes are shared across the plugins I developed. My problem is that when the plugins are built, Ant cannot find these shared classes, so I am unable to complete the build. For a better understanding of the problem, here is the build.xml of one of my plugins:
<project name="filter-news" default="jar-core">
<import file="../build-plugin.xml"/>
<!-- Build compilation dependencies -->
<target name="deps-jar">
<ant target="jar" inheritall="false" dir="../lib-xml"/>
</target>
<!-- Add compilation dependencies to classpath -->
<path id="plugin.deps">
<fileset dir="${nutch.root}/build">
<include name="**/lib-xml/*.jar" />
</fileset>
</path>
<!-- Deploy Unit test dependencies -->
<!-- for junit test -->
</project>
ivy.xml:
<ivy-module version="1.0">
<info organisation="org.apache.nutch" module="${ant.project.name}">
<license name="Apache 2.0"/>
<ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
<description>
Apache Nutch
</description>
</info>
<configurations>
<include file="../../..//ivy/ivy-configurations.xml"/>
</configurations>
<publications>
<!--get the artifact from our module name-->
<artifact conf="master"/>
</publications>
<dependencies>
<dependency org="mysql" name="mysql-connector-java" rev="5.1.31"/>
<dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
<dependency org="commons-jxpath" name="commons-jxpath" rev="1.3"/>
</dependencies>
</ivy-module>
plugin.xml:
<plugin id="filter-news" name="Apache Nutch XML/HTML Parser/Indexing Filter" version="1.4" provider-name="nutch.org">
<runtime>
<library name="filter-news.jar">
<export name="*"/>
</library>
<library name="ant-1.7.0.jar"/>
<library name="ant-launcher-1.7.0.jar"/>
<library name="jdom-1.1.jar"/>
<library name="commons-jxpath-1.3.jar"/>
<library name="htmlcleaner-2.2.jar"/>
<library name="mysql-connector-java-5.1.31.jar"/>
</runtime>
<requires>
<import plugin="nutch-extensionpoints"/>
</requires>
<extension id="org.apache.nutch.parse" name="Nutch XML/HTML Html parser filter" point="org.apache.nutch.parse.HtmlParseFilter">
<implementation id="com.ictcert.nutch.filter.news.NewsHtmlFilter" class="com.ictcert.nutch.filter.news.NewsHtmlFilter" />
</extension>
<extension id="org.apache.nutch.indexer" name="Nutch XML/HTML Indexing Filter" point="org.apache.nutch.indexer.IndexingFilter">
<implementation id="com.ictcert.nutch.filter.news.NewsIndexingFilter" class="com.ictcert.nutch.filter.news.NewsIndexingFilter"/>
</extension>
</plugin>
When I try to build this plugin, Ant cannot find any of the classes it depends on from the com.ictcert.nutch package, which is located in the core part of Nutch (src). I have no such problem with classes located in org.apache.nutch. Could you please tell me what is wrong with my configuration, so that Ant finds the default packages but not the new ones?

As far as I know from my experience, the implementation id should follow the same package structure as your extension point. Can you try that and see whether it solves your problem?
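Independent of the suggestion above, it may be worth checking whether the compiled core classes are on the plugin's compile classpath. The fragment below is only a sketch for the plugin's build.xml. It assumes that the core build has already been run and that it writes the com.ictcert.nutch classes to ${nutch.root}/build/classes; that location is an assumption about this checkout, not something shown in the question.

```xml
<!-- Sketch only: add the compiled core classes to the plugin's dependency path. -->
<!-- Assumes the core build already ran and wrote its classes to ${nutch.root}/build/classes. -->
<path id="plugin.deps">
  <fileset dir="${nutch.root}/build">
    <include name="**/lib-xml/*.jar" />
  </fileset>
  <pathelement location="${nutch.root}/build/classes" />
</path>
```

If the classes are missing from that directory, the core probably has to be compiled from the Nutch root before the plugin target runs.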

Related

Excluding an Ivy dependency conditionally

I need to build a custom Ant script that builds a project based on CI output. We use Atlassian Bamboo as CI server.
Normally our projects have a dependency on our platform module, managed via Ivy/Artifactory.
Our typical dependencies.xml file contains a transitive dependency on that module, along with other potential dependencies. As an example, our core module depends on lots of Spring packages, but not on Spring Boot. If a project needs Spring Boot too, it will define that dependency in its dependencies.xml file along with <dependency org="com.acme" name="core-platform"...
My goal now is to exclude com.acme#core-platform from resolution, because I am writing a different task that uses a Bamboo output artifact to take the latest build of the core module and its dependencies without going through Artifactory.
This is very important because during a build I may like to change the version of a dependent package (e.g. upgrade Spring 4.3.1 to 4.3.3) and test with the proper Spring. If I simply resolve dependencies to com.acme#core-platform#latest.release, which is released on Artifactory, I won't take 4.3.3 of Spring which was committed to Git and available in core-platform's currently-building dependencies.xml. I hope my explanation is easy to understand.
So let's say I have this dependency list as an example
com.acme#core-platform#${version}
org.hibernate#hibernate-java8#5.1.0.Final
org.springframework.boot#spring-boot-starter-web#1.3.1.RELEASE
commons-collections#commons-collections#3.2.2
.... others
Full dependency is
<dependencies>
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="runtime->runtime" changing="true"/>
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="compile->compile" changing="true"/>
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="provided->provided" changing="true"/>
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="junit->junit" changing="true"/>
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="test->test" changing="true"/>
<!-- https://mvnrepository.com/artifact/org.hibernate/hibernate-java8 -->
<dependency org="org.hibernate" name="hibernate-java8" rev="5.1.0.Final" transitive="false" />
<dependency org="org.springframework.boot" name="spring-boot-starter-web" rev="1.3.1.RELEASE" transitive="false" />
<dependency org="org.springframework.boot" name="spring-boot-starter-tomcat" rev="1.3.1.RELEASE" transitive="false" />
<dependency org="org.springframework.boot" name="spring-boot-starter-validation" rev="1.3.1.RELEASE" transitive="false" />
<dependency org="commons-collections" name="commons-collections" rev="3.2.2" transitive="false" />
<!-- jackson2 libs -->
<dependency org="com.fasterxml.jackson.datatype" name="jackson-datatype-jdk8" rev="2.8.1" transitive="false" conf="runtime->*"/>
<dependency org="com.fasterxml.jackson.datatype" name="jackson-datatype-jsr310" rev="2.8.1" transitive="false" conf="runtime->*"/>
<exclude module="joda-time" />
<exclude module="jackson-datatype-joda" />
</dependencies>
I simply want to take Hibernate's Java 8 module, commons-collections, etc.
Creating a duplicate dependencies.xml is not an option.
I was considering manipulating the dependencies.xml via Ant so that it excludes the acme modules by regex. Feasible but tricky.
Unfortunately I can't combine the ivy:retrieve Ant task's file attribute with an exclude element, which would have helped a lot.
Any ideas?
It is hard to understand your requirement. I suspect that your problem could be solved by creating an additional configuration and using configuration mappings to control the downloads.
Example
This build creates two directories. The first contains the log4j dependency without transitive dependencies, the second includes the remote module's optional dependencies. If you look at the remote POM you'll see they have a different scope.
├── build.xml
├── ivy.xml
├── lib1
│ └── log4j-1.2.17.jar
└── lib2
├── activation-1.1.jar
├── geronimo-jms_1.1_spec-1.0.jar
├── log4j-1.2.17.jar
└── mail-1.4.3.jar
build.xml
<project name="demo" default="resolve" xmlns:ivy="antlib:org.apache.ivy.ant">
<target name="resolve">
<ivy:resolve/>
<ivy:retrieve pattern="lib1/[artifact]-[revision](-[classifier]).[ext]" conf="noDependencies"/>
<ivy:retrieve pattern="lib2/[artifact]-[revision](-[classifier]).[ext]" conf="withDependencies"/>
</target>
</project>
Notes:
Each "retrieve" task creates a directory containing the files that make up the configuration.
ivy.xml
<ivy-module version="2.0">
<info organisation="com.myspotontheweb" module="demo"/>
<configurations>
<conf name="noDependencies" description="File grouping that has no transitive dependencies"/>
<conf name="withDependencies" description="File grouping that contains dependencies"/>
</configurations>
<dependencies>
<dependency org="log4j" name="log4j" rev="1.2.17" conf="noDependencies->master; withDependencies->master,optional"/>
</dependencies>
</ivy-module>
Notes:
Note how the configurations are declared at the top of the ivy file and how the dependency contains two configuration mappings.
Additional
The following answer explains how ivy interprets Maven modules. It creates configurations that can be used to decide which files should be downloaded:
How are maven scopes mapped to ivy configurations by ivy
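Applied to the dependency list from the question, the same idea might look like the sketch below. The configuration names platform and thirdparty and the module name demo-project are hypothetical, and the platform->runtime mapping assumes that core-platform publishes a runtime configuration, as the question's own mappings suggest.

```xml
<ivy-module version="2.0">
  <info organisation="com.acme" module="demo-project"/>
  <configurations>
    <!-- Hypothetical names: one group for the in-house module, one for everything else -->
    <conf name="platform" description="Only the in-house platform module"/>
    <conf name="thirdparty" description="Libraries taken from the repository"/>
  </configurations>
  <dependencies>
    <dependency org="com.acme" name="core-platform" rev="${version}" changing="true" conf="platform->runtime"/>
    <dependency org="org.hibernate" name="hibernate-java8" rev="5.1.0.Final" transitive="false" conf="thirdparty->default"/>
    <dependency org="commons-collections" name="commons-collections" rev="3.2.2" transitive="false" conf="thirdparty->default"/>
  </dependencies>
</ivy-module>
```

With a layout like this, calling <ivy:resolve conf="thirdparty"/> followed by <ivy:retrieve conf="thirdparty" .../> should leave com.acme#core-platform out of the resolution, while a normal build can still ask for both configurations.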
OK, it looks like the replace trick is very easy too.
Add the markers <!-- DEPS START --> and <!-- DEPS END --> (or any others of your choice) around the parts of the dependencies.xml file that should be ignored.
Hack via Ant
<copy file="dependencies.xml" tofile="ci/hacked-dependencies.xml" overwrite="true">
<filterchain>
<!-- "<" and ">" must be escaped inside XML attribute values -->
<replacestring from="&lt;!-- DEPS START --&gt;" to="&lt;!--" />
<replacestring from="&lt;!-- DEPS END --&gt;" to="--&gt;" />
</filterchain>
</copy>
Example
<!-- DEPS START -->
<dependency org="com.acme" name="core-platform" rev="${version}" transitive="true" conf="runtime->runtime"/>
<!-- DEPS END -->
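For completeness, here is a sketch of how the filtered copy could then be fed to Ivy. The project name, the target name and the ci/lib output directory are made up for illustration; the file attribute of ivy:resolve points Ivy at the rewritten file instead of the original dependencies.xml.

```xml
<project name="ci-resolve" default="resolve-without-platform" xmlns:ivy="antlib:org.apache.ivy.ant">
  <target name="resolve-without-platform">
    <!-- Turn the marked region into one XML comment so Ivy never sees it -->
    <copy file="dependencies.xml" tofile="ci/hacked-dependencies.xml" overwrite="true">
      <filterchain>
        <replacestring from="&lt;!-- DEPS START --&gt;" to="&lt;!--"/>
        <replacestring from="&lt;!-- DEPS END --&gt;" to="--&gt;"/>
      </filterchain>
    </copy>
    <!-- Resolve and retrieve against the filtered copy only -->
    <ivy:resolve file="ci/hacked-dependencies.xml"/>
    <ivy:retrieve pattern="ci/lib/[artifact]-[revision].[ext]"/>
  </target>
</project>
```

This only works as long as the marked region contains no XML comments of its own, since comments cannot be nested.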

Publishing a jar file that already exists

I have some problems understanding how publication works. I have to publish a jar file to my web repository, but I have run into problems, probably because I am missing something about artifacts and publication.
These are my three files for the publication:
Build.xml
<project xmlns:ivy="antlib:org.apache.ivy.ant" name="pubblication"
default="pubblication" basedir=".">
<echo>start</echo>
<target name="pubblication" description="--> publish an artifact">
<ivy:settings file="archivaIvySetting.xml" />
<ivy:publish resolver="publish-artifact" conf="publicConf" organisation="bbi"
module="resutil" revision="1.0">
<artifacts pattern="./[artifact]-[revision].[type]"/>
</ivy:publish>
</target>
</project>
Ivy.xml
<ivy-module version="2.0">
<info organisation="org.apache" module="central"/>
<configurations>
<conf name="publicConf" visibility="public" />
</configurations>
<publications>
<artifact name="[organisation]-resutil" ext="jar" conf="publicConf"/>
</publications>
</ivy-module>
archivaIvySetting.xml
<?xml version="1.0" encoding="UTF-8"?>
<ivysettings>
<property name="archiva-internal" value="http://host.com:8080/repository/internal/"/>
<settings defaultResolver="central">
<credentials host="host.com" realm="Repository Archiva Managed internal Repository" username="username" passwd="passwd" />
</settings>
<resolvers >
<ibiblio name="central" m2compatible="true" usepoms="true" root="${archiva-internal}" />
</resolvers>
</ivysettings>
My problem is that when I run the build, Ant says there is no module with that name in the cache. Now the questions:
1) Is the pattern where I specify the jar that I want to publish?
2) If not, how do I do this in practice: take the jar, give it the info parameters and publish it to the repo?
To repeat, the file already exists, and this is a test file.
The pattern in the publish task should match a file that is created locally in your build. Additionally, the publications section of the ivy file must match the files you're attempting to upload.
Hopefully some examples will help:
good ivy tutorial for local repository?
Issues using ivy:publish task
Convert ivy.xml to pom.xml
how to publish 3rdparty artifacts with ivy and nexus
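As a rough sketch of how those two pieces could line up for the files in the question: the layout below assumes an ivy.xml whose info and publications describe the jar, and a local file build/resutil-1.0.jar. The build directory, the artifact name resutil and the target name are assumptions, and the publish-artifact resolver still has to be defined in the settings file (the settings shown in the question only define central).

```xml
<project xmlns:ivy="antlib:org.apache.ivy.ant" name="publication" default="publish" basedir=".">
  <!-- Assumed ivy.xml next to this file:
         <info organisation="bbi" module="resutil"/>
         <publications>
           <artifact name="resutil" type="jar" ext="jar" conf="publicConf"/>
         </publications>
       Assumed local file to upload: build/resutil-1.0.jar -->
  <target name="publish" description="Publish a jar that already exists locally">
    <ivy:settings file="archivaIvySetting.xml"/>
    <!-- Resolving first puts the module descriptor in the cache, where publish looks it up -->
    <ivy:resolve file="ivy.xml"/>
    <!-- [artifact] comes from the publications section, [revision] from pubrevision -->
    <ivy:publish resolver="publish-artifact" pubrevision="1.0" overwrite="true">
      <artifacts pattern="build/[artifact]-[revision].[ext]"/>
    </ivy:publish>
  </target>
</project>
```

Under these assumptions the pattern expands to build/resutil-1.0.jar, so the pattern says where the file is found while the ivy file says which files belong to the module.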

Trying to convert ivy artifacts into maven - missing artifact exception for the pom, even though it is there

Not sure if I'm going about this the right way, but I have some artifacts that I'm trying to convert to maven using ivy ant tasks and push into my maven repo.
The component in question is mystuff.services.common.
First I make the pom...
<ivy:makepom ivyfile="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/ivy-mystuff.services.common.xml" pomfile="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/poms/mystuff.services.common.pom">
<mapping conf="default" scope="compile"/>
<mapping conf="runtime" scope="runtime"/>
</ivy:makepom>
Then a little hackery - I insert an artifact element in the ivy file using xml task. This works ok...
<xmltask source="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/ivy-${resolved.revision}.xml" dest="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/ivy-${resolved.revision}.xml">
<insert path="/ivy-module/publications" >
<![CDATA[
<artifact name="mystuff.services.common" type="pom"/>
]]>
</insert>
</xmltask>
Then I resolve/deliver/publish, as per various docs I've seen on how to do this.
<ivy:resolve file="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/ivy-${resolved.revision}.xml"/>
<!--<echoproperties/>-->
<ivy:deliver conf="*" delivertarget="recursive-deliver"/>
<ivy:publish resolver="myrepo-publish" publishivy="false" overwrite="true">
<artifacts pattern="lib/myorg/[module]/[type]s/[artifact].[ext]"/>
</ivy:publish>
And the error I get:
build.xml:235: impossible to publish artifacts for
myorg#mystuff.services.common;1.0.1: java.io.IOException: missing artifact
myorg#mystuff.services.common;1.0.1!mystuff.services.common.pom
If I leave out the pom from the artifacts in the ivy file, the other artifacts just publish fine.
What am I doing wrong?
This is what the ivy file looks like after inserting the pom entry for artifacts
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="../../ivy-doc.xsl"?>
<ivy-module version="1.0">
<info organisation="myorg" module="mystuff.services.common" revision="1.0.1" status="integration" publication="20130206204156"/>
<configurations>
<conf name="default"/>
<conf name="compile" extends="default"/>
</configurations>
<publications>
<artifact name="services.common" type="jar" conf="compile"/>
<artifact name="services.common~test" type="jar" conf="compile"/>
<artifact name="services.common" type="javadoc-zip" ext="zip" conf="compile"/>
<artifact name="services.common~test" type="javadoc-zip" ext="zip" conf="compile"/>
<artifact name="services.common" type="src-zip" ext="zip" conf="compile"/>
<artifact name="services.common~test" type="src-zip" ext="zip" conf="compile"/>
<artifact name="com.myorg.mystuffservices.common" type="osgi-module" ext="jar" conf="compile"/>
<artifact name="services.common" type="pom"/>
</publications>
<dependencies>
<dependency org="org.testng" name="testng" rev="5.11" conf="compile->compile-15"/>
</dependencies>
</ivy-module>
Your publish does not have an artifact pattern that finds the pom generated by your "makepom" task.
Either change the location or alternatively add an extra artifacts tag to your publish task:
<ivy:publish resolver="myrepo-publish" publishivy="false" overwrite="true">
<artifacts pattern="lib/myorg/[module]/[type]s/[artifact].[ext]"/>
<artifacts pattern="${ivy.lib.dir}/ivy/cache/myorg/mystuff.services.common/poms/mystuff.services.common.pom"/>
</ivy:publish>
I also don't understand why you're inserting a POM entry into your ivy file. Why don't you just list it in your publications section?
For a detailed example see:
how to publish 3rdparty artifacts with ivy and nexus
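The other option mentioned above, changing the location, could be sketched as follows: write the POM to the path that the existing publish pattern already expands to. The ivyfile value ivy.xml is a placeholder, and the artifact name services.common is taken from the ivy file shown in the question.

```xml
<!-- Sketch: generate the POM where lib/myorg/[module]/[type]s/[artifact].[ext] looks for it. -->
<!-- With [module]=mystuff.services.common, [type]=pom, [artifact]=services.common, [ext]=pom -->
<!-- that pattern expands to the pomfile path used here. -->
<ivy:makepom ivyfile="ivy.xml"
             pomfile="lib/myorg/mystuff.services.common/poms/services.common.pom">
  <mapping conf="default" scope="compile"/>
</ivy:makepom>

<!-- Matching entry, listed directly in the publications section of the ivy file -->
<publications>
  <artifact name="services.common" type="jar" conf="compile"/>
  <artifact name="services.common" type="pom" ext="pom"/>
</publications>
```

The point is only that the artifact name, type and extension declared in the ivy file must expand, through the publish pattern, to a file that really exists on disk.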

How to use Ivy to get artifacts other than JARs?

There is a Java library I want to use as part of my build, but it contains an external resources/ directory that has to be visible to the runtime classpath in order to work. I'd like to be able to store it as an artifact inside my Ivy repository, but not sure if Ivy can handle this, and if so, how to rig-up the ivy.xml, ivy-settings.xml files, as well as the repo itself.
My repo is actually an Artifactory server and I store artifacts and their ivy files right next to each other:
http://myrepo.com:8080/artifactory/simple/myrepo/
google/
guice/
3.0/
guice-3.0.jar
ivy.xml
Etc. I guess I'm looking for a similar setup here:
http://myrepo.com:8080/artifactory/simple/myrepo/
fizz/
buzz/
1.7/
buzz-1.7.jar
resources/
ivy.xml
...and somehow pull down both the jar and its resources/ directory as part of the Ivy resolve/retrieve pattern, and then place resources/ where I need it to be from there.
Is this possible? Any ideas? Thanks in advance!
Edit - If the fact that resources/ is a directory causes a problem, I don't mind zipping it up as resources.zip, and then resolving/retrieving it into my project at buildtime, and then unzipping it. That's just more work to do, if Ivy can't handle directory-artifacts out of the box.
You should zip/tar the directory and create the following setup:
http://myrepo.com:8080/artifactory/simple/myrepo/
fizz/
buzz/
1.7/
buzz-1.7.jar
resources-1.7.zip
ivy-1.7.xml
In your ivy.xml you would then declare each file as a publication of this module like this:
<?xml version="1.0" encoding="UTF-8"?>
<ivy-module version="2.0">
<info organisation="fizz"
module="buzz"
revision="1.7"
status="release"
publication="20110531150115"
default="true"
/>
<configurations>
<conf name="default" visibility="public"/>
</configurations>
<publications>
<artifact name="buzz" type="jar" />
<artifact name="resources" type="zip" />
</publications>
</ivy-module>
And if needed you could define separate configurations like:
<configurations>
<conf name="default" extends="jar, resources" visibility="public"/>
<conf name="jar" visibility="public"/>
<conf name="resources" visibility="public"/>
</configurations>
<publications>
<artifact name="buzz" type="jar" conf="jar"/>
<artifact name="resources" type="zip" conf="resources"/>
</publications>
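On the consuming side, a build could then retrieve both files and unpack the zip. This is a sketch under a few assumptions: the consuming ivy.xml declares <dependency org="fizz" name="buzz" rev="1.7"/>, and the project name, target name and the lib and build/resources directories are arbitrary.

```xml
<project name="fetch-resources" default="unpack-resources" xmlns:ivy="antlib:org.apache.ivy.ant">
  <target name="unpack-resources">
    <!-- Resolve the buzz module and copy its published artifacts into lib/ -->
    <ivy:resolve/>
    <ivy:retrieve pattern="lib/[artifact]-[revision].[ext]"/>
    <!-- The zip artifact arrives as lib/resources-1.7.zip; unpack it where the runtime classpath expects it -->
    <unzip src="lib/resources-1.7.zip" dest="build/resources"/>
  </target>
</project>
```

With the separate jar and resources configurations, the retrieve could also be limited to one of them through its conf attribute.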

Ivy appears to fetch javadoc jars only

I'm using Ivy on my project, with the Ivy Eclipse plugin.
It appears that certain jars which are downloaded and added to my project are the javadoc jars, not the jars with the actual code. Note - this doesn't happen with all jars.
For example, adding this to my ivy.xml file:
<dependency org="junit" name="junit" rev="4.8.2"/>
caused the javadocs for junit to be downloaded and added to my classpath.
This breaks compilation for my project, as none of the unit tests are working.
This was working fine until I added a reference to Spring, and everything broke. I've tried removing the reference, and deleting junit from my local cache to force ivy to fetch it again, but the problem persists.
Here's my total dependency block (with spring removed):
<dependencies>
<dependency org="org.hamcrest" name="hamcrest-library" rev="1.3.RC2"/>
<dependency org="junit" name="junit" rev="4.8.2"/>
<dependency org="org.mockito" name="mockito-core" rev="1.8.5"/>
<dependency org="javax.persistence" name="persistence-api" rev="1.0"/>
</dependencies>
Here's my ivysettings.xml for the project:
<ivysettings>
<caches artifactPattern="[organisation]/[module]/[revision]/[artifact].[ext]" />
<settings defaultResolver="local.ibiblio.jboss.java-net.springsource" checkUpToDate="true" />
<resolvers>
<chain name="local.ibiblio.jboss.java-net.springsource">
<filesystem name="libraries">
<artifact pattern="${basedir}/ivy-repo/[artifact]-[revision].[type]" />
</filesystem>
<ibiblio name="ibiblio" m2compatible="true" />
<ibiblio name="jboss" m2compatible="true"
root="https://repository.jboss.org/nexus/content/groups/public-jboss" />
<ibiblio name="java.net" m2compatible="true"
root="https://repository.jboss.org/nexus/content/repositories/java.net-m2/" />
<ibiblio name="java.net" m2compatible="true"
root="http://repository.codehaus.org/" />
<url name="com.springsource.repository.libraries.release">
<ivy pattern="http://repository.springsource.com/ivy/libraries/release/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
<artifact pattern="http://repository.springsource.com/ivy/libraries/release/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
</url>
<url name="com.springsource.repository.libraries.external">
<ivy pattern="http://repository.springsource.com/ivy/libraries/external/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
<artifact pattern="http://repository.springsource.com/ivy/libraries/external/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
</url>
<url name="com.springsource.repository.bundles.release">
<ivy pattern="http://repository.springsource.com/ivy/bundles/release/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
<artifact pattern="http://repository.springsource.com/ivy/bundles/release/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
</url>
<url name="com.springsource.repository.bundles.external">
<ivy pattern="http://repository.springsource.com/ivy/bundles/external/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
<artifact pattern="http://repository.springsource.com/ivy/bundles/external/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]" />
</url>
</chain>
</resolvers>
</ivysettings>
Some open source modules include optional javadoc jars. To leave them out, add a configuration mapping to each of your dependencies:
<dependency org="junit" name="junit" rev="4.8.2" conf="default"/>
For a Maven module, Ivy's default configuration contains the module's main artifact plus its runtime dependencies (which include the compile scope), so optional extras such as the javadoc jars are omitted automatically. (Check their POMs.)
A better approach is to declare your own configurations and the default mapping as follows:
<configurations defaultconfmapping="compile->default">
<conf name="compile" description="Required to compile code"/>
<conf name="test" description="Additional test dependencies" extends="compile" />
</configurations>
Then in your ivy file you only need to declare the non-standard configuration mappings:
<dependencies>
<dependency org="org.hamcrest" name="hamcrest-library" rev="1.3.RC2" conf="test->default"/>
<dependency org="junit" name="junit" rev="4.8.2" conf="test->default"/>
<dependency org="org.mockito" name="mockito-core" rev="1.8.5" conf="test->default"/>
<dependency org="javax.persistence" name="persistence-api" rev="1.0"/>
</dependencies>
In this case we only want the 3 test libraries to appear on the test configuration.
Still confused? The magic of ivy configurations shows when you use them to manage your build's classpath:
<target name='dependencies' description='Resolve project dependencies and set classpaths'>
<ivy:resolve/>
<ivy:cachepath pathid="compile.path" conf="compile"/>
<ivy:cachepath pathid="test.path" conf="test"/>
</target>
This is what Maven is doing when you declare a scope tag on a dependency, for example:
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.8.2</version>
<scope>test</scope>
</dependency>
The scopes in Maven are fixed. In ivy you can have as many as you need.
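To show where the compile.path and test.path references from the dependencies target end up, here is a sketch of a compile target that consumes them. The source and output directories (src/main/java, src/test/java, build/classes, build/test-classes) are assumptions about the project layout.

```xml
<target name="compile" depends="dependencies" description="Compile main and test sources">
  <!-- Main code sees only the compile configuration -->
  <mkdir dir="build/classes"/>
  <javac srcdir="src/main/java" destdir="build/classes" classpathref="compile.path" includeantruntime="false"/>
  <!-- Test code sees the test configuration (which extends compile) plus the compiled main classes -->
  <mkdir dir="build/test-classes"/>
  <javac srcdir="src/test/java" destdir="build/test-classes" includeantruntime="false">
    <classpath>
      <path refid="test.path"/>
      <pathelement location="build/classes"/>
    </classpath>
  </javac>
</target>
```

Because junit, hamcrest and mockito are mapped only to the test configuration, they never reach the classpath used for the main sources.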
