Adding a directory of files to Hadoop using distributed cache? - java

I'm trying to add a bunch of dependencies stored on hdfs to distributed cache. I've been following the advice from this article: http://www.datasalt.com/2011/05/handling-dependencies-and-configuration-in-java-hadoop-projects-efficiently/. My question is: is it possible to add a folder containing the dependencies to the classpath?
DistributedCache.addFileToClassPath(new Path("/tmp/lib/"), job.getConfiguration());
Or would I need to add each dependency individually?
for (Path dependency : dependencies) {
    DistributedCache.addFileToClassPath(dependency, job.getConfiguration());
}
And how would I check that the dependencies were actually added to the classpath on all the slave nodes?
Thanks.

You'll need to iterate the jars and add them one at a time as you suggested. Or you can bundle the jars into a single zip file and then use the DistributedCache.addArchiveToClassPath(Path, Configuration) method.
To check they were added to the classpath, try examining the System property java.class.path in the setup method of a mapper / reducer.
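A minimal sketch of that check, assuming you only want to inspect the java.class.path string. ClasspathCheck and its methods are hypothetical helper names, not part of Hadoop; inside a job you would call them from the mapper's or reducer's setup method and log the result rather than use main.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): inspects a classpath string.
final class ClasspathCheck {

    // Splits a classpath string on the platform path separator (":" or ";").
    static List<String> entries(String classPath) {
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // Returns true if any classpath entry ends with the given jar file name.
    static boolean containsJar(String classPath, String jarName) {
        for (String entry : entries(classPath)) {
            if (entry.endsWith(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In a task, this loop would live in the setup method instead of main.
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.out.println(entry);
        }
    }
}
```

Printing one entry per line makes it easier to spot a missing jar in the task logs of each node.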

Related

Micronaut + Maven - Environment variables

I am working on a Micronaut + Maven project.
I need to parametrize some values of my application.yml such as passwords and connection strings, to avoid committing them.
I know values can be parametrized this way:
secret-value: '${SECRET_VALUE}'
But I can't find any way to set SECRET_VALUE other than exporting it in a shell script such as .bashrc, .profile or .environment.
I would like to use a .env file somehow, in order to commit a .env.example file in git repo.
Any thoughts?
According to the maven-resources-plugin documentation:
https://maven.apache.org/plugins/maven-resources-plugin/examples/filter.html
You could add a filter file in the filters tag of your pom file.
See the passage in the documentation above that begins: we can separate "your.name" from the POM by specifying a filter file my-filter-values.properties.
I may have lost the focus of the question.
The solution was simply to create a .env file with the required values, then run the application using something like
dotenv run ./mvnw mn:run
Regarding @yunandtidus's solution:
That is good only for "fixed" variables, such as application name, as pom values are shared between environments (assuming that we have an application-dev.yml and an application-prod.yml).
Keep in mind that, unlike .env, pom.xml must be committed, as must any application.yml you have.
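To make the idea concrete, here is a rough sketch of what a .env file amounts to: KEY=VALUE lines that end up substituted into placeholders such as ${SECRET_VALUE}. This is only an illustration under simplifying assumptions (no quoting, no export prefix); it is not how Micronaut or the dotenv tool is implemented, and DotEnvSketch is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the dotenv tool's or Micronaut's code.
final class DotEnvSketch {

    // Parses KEY=VALUE lines; blank lines and lines starting with '#' are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE line
            }
            values.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return values;
    }

    // Replaces every "${KEY}" placeholder whose key is present in the map.
    static String resolve(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }
}
```

A committed .env.example would then list the same keys with dummy values, while the real .env stays out of version control.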

How to override files when creating a jar in gradle

We have a java project where we have some default configurations under src/main/resources and there are overwrites under project_root/configDeploy
For our mapred jars we want to copy both configs but allow config/deploy files to overwrite defaults in resources. So we can have myconf.xml in resource and myconf.xml in deploy, but the mapred fat jar generated only has myconf.xml
I have tried two different methods. First, I tried to have the deploy copy overwrite the resources conf:
from 'src/main/resources'
from 'conf/deploy'
but this adds two files with the same name to the jar, so it didn't work.
Then I tried to add only files from src/main/resources that are not in conf/deploy, something like:
into('conf') {
    from {
        'src/main/resources'
    }
    exclude { file('deploy/conf/') }
}
into('conf') {
    from {
        'deploy/conf'
    }
}
but this didn't work either: none of the confs from resources were copied.
So the question is: given two folders containing files, some of which share a name, how can I include both folders in the jar so that, for files present in both, only the version from the second folder is kept?
Thanks for your help!
To avoid the duplicate files in the JAR you can set the duplicates strategy to EXCLUDE on the task.
duplicatesStrategy = DuplicatesStrategy.EXCLUDE
This will cause subsequent attempts to add the file to be ignored. Therefore, if you want files in 'deploy/conf' to take precedence you should define that copy spec first.
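The "first one wins" behaviour described above can be modelled in a few lines of plain Java. This is only a sketch of the semantics, not Gradle's implementation; FirstWinsMerge and the map-of-path-to-content representation are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of an "exclude duplicates" merge; not Gradle code.
final class FirstWinsMerge {

    // Merges sources in the given order. A path that is already present is kept,
    // and later entries with the same path are ignored.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... sourcesInOrder) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> source : sourcesInOrder) {
            for (Map.Entry<String, String> e : source.entrySet()) {
                result.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

Calling merge(deployFiles, defaultFiles) keeps the deploy version of myconf.xml and still picks up defaults that have no override, which is why the order of the copy specs matters.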

Maven configuration: pass file inside a classpath jar as an argument

Several maven plugins need/support passing a java.io.File as a configuration parameter, wherein we specify the relative/absolute location of the file for the plugin to locate and use.
Is there a way I can specify a property file in the plugin configuration where the file has to be found inside a jar on the classpath? I particularly want to use this with the aspectj-maven-plugin, where I can set the Xlintfile value to the location of a custom XlintDefault.properties file. The file, in my case, will be found inside a classpath jar.
I use maven-2.2.1 by the way.
No, not in general; there's no magic that will turn something that's not a file on disk into a java.io.File. Many Maven plugins (e.g., maven-checkstyle-plugin's configLocation) are designed to allow more flexible input for just these cases:
This parameter is resolved as resource, URL, then file. If successfully resolved, the contents of the configuration is copied into the ${project.build.directory}/checkstyle-configuration.xml file before being passed to Checkstyle as a configuration.
As a workaround, if the plugin cannot be changed, dependency:unpack may be a way to get a classpath resource into a local file (see Maven: extract files from jar).
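The underlying idea (copy the classpath resource to a real file, then hand that file on) can be sketched in plain Java as below. This is an illustration of the technique, not the checkstyle plugin's or dependency:unpack's code; ResourceToFile and the temporary file naming are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: turns a classpath resource into a file on disk.
final class ResourceToFile {

    // Copies the named classpath resource to a temporary file and returns its path.
    static Path extract(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found on classpath: " + resourceName);
            }
            Path target = Files.createTempFile("resource-", ".tmp");
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().deleteOnExit(); // clean up when the JVM exits
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path can be converted with toFile() wherever a java.io.File is required.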

Can't add single XML file to Tomcat's classpath

I want to add a file to the classpath of all applications running on my Tomcat 7 server.
When adding
${catalina.base}/conf/myfile.xml
to common.loader in catalina.properties it's not working.
But adding just
${catalina.base}/conf
does the trick.
However, I just want to add a specific file, not the entire directory. The comments in catalina.properties state the following:
[…] Prefixes should be used to define what is the repository type. […]
[…] Examples: "foo/bar.jar": Add bar.jar as a class repository […]
Unfortunately I haven't found out with which prefix I should mark my file. Do you know more about this?
When you add something to the classpath, it is always either a whole directory or a whole JAR file (which you may think of as a packed directory). You can never have a single file as an entry in your classpath.
Proposed solution: Either live with the conf/ directory; or pack your myfile.xml in a JAR file (even if it only contains a single file).
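For the second option, the jar command line tool does the job, but it can also be done from Java. A small sketch using java.util.jar, building the JAR in memory; SingleFileJar and its method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical helper: wraps a single file's content in a JAR.
final class SingleFileJar {

    // Builds an in-memory JAR that contains exactly one entry.
    static byte[] pack(String entryName, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content);
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Lists the entry names, to confirm what ended up in the JAR.
    static List<String> entryNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The resulting bytes could be written with Files.write to a .jar file, and that file is then what you list in common.loader instead of the XML file itself.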

Deploy additional files in Gradle Application Plugin

I have a small Java/Gradle project. I'm using the Application plugin to create a zip distribution (using the distZip task). Using the standard configuration I get the following directories in my zip file:
/bin - The scripts to start the application go in here
/lib - Contains my project code in a JAR file and all dependency JAR files.
The trouble is that I would like a third directory, /conf, where I can put my configuration files (instead of having them packaged inside my application JAR file).
I imagine that this is a pretty common requirement because things like log4j.xml and hibernate.properties would be better placed outside the JAR file. I just can't figure out how I can customise the behavior of the Application plugin to do this however.
I revisited this problem several months later and I finally have an elegant solution. The following code should be added to the gradle file:
distZip {
    into(project.name) {
        from '.'
        include 'conf/*'
    }
}
This adds an additional include to the distZip task. This copies the "conf" directory (including contents) into the Zip distribution.
The generated zip file contains a single directory which is the same as the project name. This is why the "into" part is required.
Actually, create a dist dir under the src dir in your project. Anything in this dir is copied by the application plugin (under applicationDistribution) when installApp or distZip is run.
Or edit applicationDistribution to do other things, if a simple copy is not enough.
For me, a simple
applicationDistribution.from("src/main/config/") {
    into "config"
}
did the job. Of course you need to have your properties loaded correctly from within code. Especially if you move them from src/main/resources where they have been usable via classpath, into the new location. I circumvented this by adding a command line parameter which points to the configuration file.
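A sketch of that last step, assuming the configuration is a .properties file whose path is passed as the first command line argument. ExternalConfig and the db.url key are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical example: loads configuration from a file outside the JAR.
final class ExternalConfig {

    // Reads properties from any character source.
    static Properties read(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExternalConfig <path-to-config.properties>");
            return;
        }
        // The path comes from the command line instead of the classpath.
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Properties props = read(reader);
            System.out.println("db.url = " + props.getProperty("db.url"));
        }
    }
}
```

The start script in /bin could then pass something like config/app.properties as that argument, relative to the installation directory.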
I am not sure whether you can customize the application plugin; I have never used it. There are, however, other ways to achieve what you want.
You may create a /conf directory like this:
def confDir = new File("$buildDir/conf")
You can then copy the files you need into this directory like this:
task copyConfFiles(type: Copy) {
    from _wherever your files reside_
    into confDir
    include('**/*.properties') // your configuration files
}
You may then hook this copy task into the process like this:
distZip.dependsOn copyConfFiles
And lastly, if you do not want your configurations in the final zip, you can do this:
distZip {
    exclude('**/*.properties') // your configuration files
}
Again, there might be a better way. This is a way.
OP's self-answer may be good for his use case, but there are a few things I'd like to improve on:
His answer suggests that he has a directory conf parallel to the build.gradle. There is no such thing in the Maven Standard Directory Layout. The general consensus is to have a src/main/conf, as hinted at in the docs:
If there are other contributing sources to the artifact build, they
would be under other subdirectories: for example src/main/antlr would
contain Antlr grammar definition files.
The target directory name is NOT project.name, as was pointed out in a comment.
If resource filtering is required, and it often is, then having a separate task is desirable. During local development, this task can be run to generate the filtered files. The distribution would merely use the output of this task (and unlike OP's answer, this also makes conf available to the tar distribution).
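import org.apache.tools.ant.filters.ReplaceTokens
// Load the token values used for filtering.
def props = new Properties()
file("src/main/filters/application.properties")
    .withInputStream { props.load(it) }
// Copy src/main/conf into the build directory, replacing @token@ placeholders in YAML files.
task copyConf(type: Copy) {
    from("src/main/conf/")
    into("$buildDir/conf")
    filesMatching("**/*.y*ml") {
        filter(tokens: props, ReplaceTokens)
    }
}
// Add the output of copyConf to the distribution under conf/.
distributions {
    main {
        contents {
            from(copyConf) {
                into("conf")
            }
        }
    }
}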
