I have a scripted Jenkinsfile running in our distributed Jenkins build environment.
I have code performing Kerberos authentication in the Jenkinsfile. That code is based on two small Java programs, both successfully authenticating to Kerberos. Those two Java programs run on both my Windows workstation and a Linux virtual machine guest.
That is: I have a pair of working Java programs that successfully perform Kerberos authentication from Windows and from Linux using a set of Kerberos config files. When I translate the code to my Jenkinsfile, it apparently fails at step 1: finding my carefully constructed krb5.conf (and login.conf) files.
The Kerberos code is in a correctly configured global shared library. I know it is correctly configured because the library is used elsewhere in my Jenkinsfile and I know it has downloaded the correct Kerberos libraries from our repository because I don't get any kind of compilation or class not found errors.
The specific error message, which I have not managed to budge over dozens of different builds while putting the krb5.conf file everywhere I can think Jenkins might look for it, is this:
GSSException: Invalid name provided (Mechanism level: KrbException: Cannot locate default realm)
Yes, there's a longer stack trace, but if you know what's going on, that's all you should need.
I have tried using System.setProperty() from the Jenkinsfile to point at a file that has been checked in to the project, at a file created using Jenkins file credentials, and at a file written directly to the build workspace with the writeFile step (writing a string containing the config file contents). In each case, Jenkins seems to simply not find the krb5.conf file and I get the same "Cannot locate default realm" error.
It's problematic to put the file in /etc for a variety of reasons. Plus, should I really have to put the Kerberos config files there when there is a clearly elucidated algorithm for finding them, and I seem to be following it?
If you have any idea what's going on, any help would be greatly appreciated.
NB: I have successfully authenticated to Kerberos using the krb5.conf and login.conf files at issue here. They work. Kerberos and my configs don't seem to be the issue. Whatever Jenkins is or is not doing seems to be the issue.
Answering my own question, as I did eventually find a resolution both to this and to successfully using Kerberos authentication (after a fashion) with Jenkins Pipeline in part of our build process.
Our Jenkins instance uses a number of executors in our little part of the AWS cloud. Practically speaking, the only part of our pipeline that runs on the executors is the build step: Jenkins checks out workspaces to the build nodes (the executors) and performs builds on those nodes.
Almost everything else, and explicitly everything in Jenkins' so-called global shared libraries including the Kerberos code referenced in my original question, is actually run on master: even when you wrap calls to a function in a global shared library in a node() step in your Jenkinsfile, those calls still run on master.
Because, obviously, right?
What I was trying to do was place the krb5.conf file in all the places it should be on the build nodes. But since my Kerberos code wasn't part of the build (or one of the few other steps, like sh(), that run on nodes in Jenkins), it wasn't happening on the nodes: it was happening on the Jenkins master, even though the calls were wrapped in a node() step. I'm not bitter. It's fine.
Placing the krb5.conf file in the correct location on master resolved this issue, but created other problems. Ultimately, I put the Kerberos logic into a small Java command line utility in a jar, along with the appropriate config files. That was downloaded via curl and executed, all in an sh() step in our pipeline. Not the most elegant solution, but even after discussing the issue with Cloudbees support, this was the solution they recommended for what we were trying to do.
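For what it's worth, the guts of that utility amounted to something like the following minimal sketch (the class name, the JAAS entry name "KrbLogin", and the relative config file paths are illustrative, not our actual code): point the JDK at the config files shipped alongside the jar, then perform a JAAS login before doing the GSS work.

import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;

// Minimal sketch of a standalone Kerberos login utility; names and paths are illustrative.
public class KerberosLoginUtil {
    public static void main(String[] args) throws Exception {
        // Point the JDK's Kerberos/JAAS machinery at the config files shipped next to the jar.
        System.setProperty("java.security.krb5.conf", "krb5.conf");
        System.setProperty("java.security.auth.login.config", "login.conf");

        // "KrbLogin" is a hypothetical entry in login.conf that uses Krb5LoginModule.
        LoginContext lc = new LoginContext("KrbLogin");
        lc.login();
        Subject subject = lc.getSubject();
        System.out.println("Authenticated as: " + subject.getPrincipals());
        lc.logout();
    }
}

Because the jar is fetched and executed inside an sh() step, all of this actually runs on the build node rather than on master.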
The Kerberos implementation inside the JDK uses the system property "java.security.krb5.conf" to locate krb5.conf. I am not sure whether you are using a third-party Kerberos library or not.
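For reference, a minimal sketch of supplying that property (the path here is purely illustrative); it must be set before any Kerberos code runs, or be passed as a JVM flag instead:

// Equivalent to launching the JVM with -Djava.security.krb5.conf=/path/to/krb5.conf
// (set it before the JDK's Kerberos provider first needs to resolve a realm).
System.setProperty("java.security.krb5.conf", "/path/to/krb5.conf");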
Related
I'm running Spark 3.3.0 on Windows 10 using Java 11. I'm not using Hadoop. Every time I run something, it gives errors like this:
java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:735)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:270)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:286)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
First of all, even the link https://wiki.apache.org/hadoop/WindowsProblems in the error message is broken. The updated link is apparently https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems, which basically says that Hadoop needs Winutils. But I'm not using Hadoop. I'm just using Spark to process some CSV files locally.
Secondly, I want my project to build with Maven and run with pure Java, without requiring the user to install some third-party software. If this Winutils stuff needs to be installed, it should be included in some Maven dependency.
Why is all this Hadoop/Winutils stuff needed if I'm not using Hadoop, and how do I get around it so that my project will build in Maven and run with pure Java like a Java project should?
TL;DR
I have created a local implementation of Hadoop FileSystem that bypasses Winutils on Windows (and indeed should work on any Java platform). The GlobalMentor Hadoop Bare Naked Local FileSystem source code is available on GitHub and can be specified as a dependency from Maven Central.
If you have an application that needs Hadoop local FileSystem support without relying on Winutils, import the latest com.globalmentor:hadoop-bare-naked-local-fs library into your project, e.g. in Maven for v0.1.0:
<dependency>
  <groupId>com.globalmentor</groupId>
  <artifactId>hadoop-bare-naked-local-fs</artifactId>
  <version>0.1.0</version>
</dependency>
Then specify that you want to use the Bare Local File System implementation com.globalmentor.apache.hadoop.fs.BareLocalFileSystem for the file scheme. (BareLocalFileSystem internally uses NakedLocalFileSystem.) The following example does this for Spark in Java:
SparkSession spark = SparkSession.builder().appName("Foo Bar").master("local").getOrCreate();
spark.sparkContext().hadoopConfiguration().setClass("fs.file.impl", BareLocalFileSystem.class, FileSystem.class);
Note that you may still get warnings that "HADOOP_HOME and hadoop.home.dir are unset" and "Did not find winutils.exe". This is because the Winutils kludge permeates the Hadoop code and is hard-coded at a low level, executed statically upon class loading, even for code completely unrelated to file access. More explanation can be found on the project page on GitHub. (See also HADOOP-13223: winutils.exe is a bug nexus and should be killed with an axe.)
How Spark uses Hadoop FileSystem
Spark uses the Hadoop FileSystem API as a means for writing output to disk, e.g. for local CSV or JSON output. It pulls in the entire Hadoop client libraries (currently org.apache.hadoop:hadoop-client-api:3.3.2), containing various FileSystem implementations. These implementations use the Java service loader framework to automatically register several implementations for several schemes, including among others:
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.http.HttpFileSystem
org.apache.hadoop.fs.http.HttpsFileSystem
org.apache.hadoop.hdfs.DistributedFileSystem
…
Each of these file systems indicates which scheme it supports. In particular org.apache.hadoop.fs.LocalFileSystem indicates it supports the file scheme, and it is used by default to access the local file system. It in turn uses the org.apache.hadoop.fs.RawLocalFileSystem internally, which is the FileSystem implementation ultimately responsible for requiring Winutils.
But it is possible to override the Hadoop configuration and specify another FileSystem implementation. Spark creates a special Configuration for Hadoop in org.apache.spark.sql.internal.SessionState.newHadoopConf(…) ultimately combining all the sources core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, and __spark_hadoop_conf__.xml, if any are present. Then Hadoop's FileSystem.getFileSystemClass(String scheme, Configuration conf) looks for the FileSystem implementation to use by looking up a configuration for the scheme (in this case file) in the form fs.${scheme}.impl (i.e. fs.file.impl in this case).
Thus if you want to specify another local file system implementation, you'll need to somehow get fs.file.impl into the configuration. If you are accessing Spark programmatically, rather than creating a local configuration file you can set it via the Spark session, as explained in the introduction.
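If you are working with the Hadoop FileSystem API directly rather than through a Spark session, the same override can be made on a plain Hadoop Configuration. A sketch (imports from org.apache.hadoop.conf, org.apache.hadoop.fs, com.globalmentor.apache.hadoop.fs, and java.net.URI assumed):

// Override fs.file.impl on a plain Hadoop Configuration.
Configuration conf = new Configuration();
conf.setClass("fs.file.impl", BareLocalFileSystem.class, FileSystem.class);
// FileSystem.getFileSystemClass("file", conf) now resolves to BareLocalFileSystem.
FileSystem fs = FileSystem.get(URI.create("file:///"), conf);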
Why Winutils
The Hadoop FileSystem API in large part assumes a *nix file system. The current Hadoop local FileSystem implementation uses native *nix libraries or opens shell processes and directly runs *nix commands. The current local FileSystem implementation for Windows limps along with a huge kludge: a set of binary artifacts called Winutils that a Hadoop contributor created, providing a special back-door subsystem on Windows that Hadoop can access instead of *nix libraries and shell commands. (See HADOOP-13223: winutils.exe is a bug nexus and should be killed with an axe.)
However detection and required support of Winutils is actually hard-coded in the Hadoop at a low-level—even in code that has nothing to do with the file system! For example when Spark starts up, even a simple Configuration initialization in the Hadoop code invokes StringUtils.equalsIgnoreCase("true", valueString), and the StringUtils class has a static reference to Shell, which has a static initialization block that looks for Winutils and produces a warning if not found. 🤦♂️ (In fact this is the source of the warnings that were the motivation for this Stack Overflow question in the first place.)
Workaround to use FileSystem without Winutils
Irrespective of the warnings, the larger issue is getting FileSystem to work without needing Winutils. This is paradoxically both a simpler and a much more complex project than it would first appear. On one hand it is not too difficult to use updated Java API calls instead of Winutils to access the local file system; I have done that already in the GlobalMentor Hadoop Bare Naked Local FileSystem. But weeding out Winutils completely is much more complex and difficult. The current LocalFileSystem and RawLocalFileSystem implementations have evolved haphazardly, with halfway-implemented features scattered about, special-case code for ill-documented corner cases, and implementation-specific assumptions permeating the design itself.
The example was already given above of Configuration accessing Shell and trying to pull in Winutils just upon class loading during startup. At the FileSystem level, Winutils-related logic isn't confined to RawLocalFileSystem, which would have allowed it to easily be overridden; instead it relies on the static FileUtil class, which is like a separate file system implementation that relies on Winutils and can't be modified. For example, here is FileUtil code that would need to be updated, unfortunately independently of the FileSystem implementation:
public static String readLink(File f) {
  /* NB: Use readSymbolicLink in java.nio.file.Path once available. Could
   * use getCanonicalPath in File to get the target of the symlink but that
   * does not indicate if the given path refers to a symlink.
   */
  …
  try {
    return Shell.execCommand(
        Shell.getReadlinkCommand(f.toString())).trim();
  } catch (IOException x) {
    return "";
  }
}
Apparently there is a "new Stat based implementation" of many methods, but RawLocalFileSystem instead uses deprecated implementations such as DeprecatedRawLocalFileStatus, which is full of workarounds and special cases, is package-private so it can't be accessed by subclasses, yet can't be removed because of HADOOP-9652. The useDeprecatedFileStatus switch is hard-coded so that it can't be modified by a subclass, forcing a re-implementation of everything it touches. In other words, even the new, less-kludgey approach is switched off in the code, has been for years, and no one seems to be paying it any mind.
Summary
In summary, Winutils is hard-coded at a low-level throughout the code, even in logic unrelated to file access, and the current implementation is a hodge-podge of deprecated and undeprecated code switched on or off by hard-coded flags that were put in place when errors appeared with new changes. It's a mess, and it's been that way for years. No one really cares, and instead keeps building on unstable sand (ViewFs anyone?) rather than going back and fixing the foundation. If Hadoop can't even fix large swathes of deprecated file access code consolidated in one place, do you think they are going to fix the Winutils kludge that permeates multiple classes at a low level?
I'm not holding my breath. Instead I'll be content with the workaround I've written which writes to the file system via the Java API, bypassing Winutils for the most common use case (writing to the local file system without using symlinks and without the need for the sticky bit permission), which is sufficient to get Spark accessing the local file system on Windows.
There is a longstanding JIRA for this... for anyone running Spark standalone on a laptop there's no need to provide those POSIX permissions, is there?
LocalFS to support ability to disable permission get/set; remove need for winutils
This is related to HADOOP-13223
winutils.exe is a bug nexus and should be killed with an axe.
It is only people running Spark on Windows who hit this problem, and nobody is putting in the work to fix it. If someone were, I would help review/nurture it in.
Spark is a replacement execution framework for mapreduce, not a "Hadoop replacement".
Spark uses Hadoop libraries for FileSystem access, including the local filesystem, as shown in your error: org.apache.hadoop.fs.RawLocalFileSystem.
It also uses Winutils as a sort of shim to implement Unix (POSIX?) chown/chmod commands for determining file permissions on top of Windows directories.
"tell Spark to use a different file system implementation than RawLocalFileSystem?"
Yes, use a different URI than the default file://
E.g. spark.read().csv("nfs://path/file.csv")
Or use s3a, or install HDFS, GlusterFS, etc. for a distributed filesystem. After all, Spark is meant to be a distributed processing engine; if you're only handling small local files, it's not the best tool.
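For example, reading CSV data through the s3a scheme instead of file:// might look like the sketch below, assuming spark is an existing SparkSession, the hadoop-aws module is on the classpath, AWS credentials are configured, and the bucket and path are placeholders.

// Read CSV from S3 via the s3a scheme instead of the local file:// scheme.
// Requires hadoop-aws on the classpath and configured AWS credentials.
Dataset<Row> df = spark.read()
    .option("header", "true")
    .csv("s3a://my-bucket/data/file.csv"); // bucket and path are placeholders
df.show();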
I am trying to create a PDF from an XML file using Apache FOP. I can do it in the NetBeans or Eclipse IDEs, but when I try to execute it from SoapUI, the Java Compute node inside IIB throws this Java error:
java.lang.NoClassDefFoundError: org/apache/fop/apps/FopFactory
But I have already added the necessary libraries, without errors, and they are referenced in the project.
I hope you can help me, thank you all.
It depends on how you want to work, but you simply have to put this jar in the shared-classes folder.
This folder exists at the execution group (integration server) level or at the broker (integration node) level. If you plan to reuse the jar later, I would suggest putting it at the broker level, otherwise at the execution group level.
Sample paths (Unix in this case):
/var/mqsi/shared-classes (for all brokers on this VM; NOT recommended)
/var/mqsi/config/"yourBrokerName"/shared-classes (broker level)
/var/mqsi/config/"yourBrokerName"/"yourExecutionGroupName"/shared-classes (execution group level)
If you put it at the execution group level, the execution group needs to be restarted. If you put it at the broker level, you should restart the whole broker.
Feel free to contact me if you have other questions, but with the shared-classes keyword, you should be able to find everything you are looking for.
I've been messing around with Apache Derby inside Eclipse. I've booted up a Network Server, and I've been working with servlets. In my Eclipse project, I have a class called "User", inside the package "base.pack". I have an SQL script open, and I've been trying to convert User, which implements Serializable, into a custom type. When I run the following lines, everything works fine:
CREATE TYPE CARTEBLANCHE.bee
EXTERNAL NAME 'base.pack.User'
LANGUAGE JAVA
This follows the general format they identify here: http://db.apache.org/derby/docs/10.7/ref/rrefsqljcreatetype.html#rrefsqljcreatetype
Now, when I try to create a table using this new type, I get an error. I run the following line:
CREATE TABLE CARTEBLANCHE.TestTabel (ID INTEGER NOT NULL, NAME CARTEBLANCHE.bee, PRIMARY KEY(ID));
And I receive the following error:
The class 'base.pack.User' for column 'NAME' does not exist or is inaccessible. This can happen if the class is not public.
Now, the class is in fact public, and as I noted before, it does implement Serializable. I don't think I'm stating the package name incorrectly, but I could be wrong. I'm wondering, is this an issue with my classpath? If so, how would you suggest I fix this? I admit that I do not know much about the classpath.
Thank you.
(For reference, I have configured my project build path to include derby.jar, derbyclient.jar, derbytools.jar, and derbynet.jar, and I have put these files into my project's lib folder as well).
As politely as I can, may I suggest that if you are uncomfortable with Java's CLASSPATH notion, then writing your own custom data types in Derby is likely to be a challenging project?
In the specific case you describe here, one issue that will arise is that your custom Java code has to be available not only to your client application, but also to the Derby Network Server, which means you will need to be modifying the server's CLASSPATH as well as your application's CLASSPATH.
It's all possible, it's just not a beginner-level project.
To get started with customizing your Derby Network Server, the first topic involves how you are starting it. Here's an overview of the general process: http://db.apache.org/derby/docs/10.11/adminguide/tadmincbdjhhfd.html
Depending on how precisely you are starting the Derby Network Server, you'll possibly be editing the CLASSPATH setting in the startNetworkServer or startNetworkServer.bat script, or you'll be editing the CLASSPATH setting in your own script that you have written to start the server.
If it's a tool like Eclipse or Netbeans which is starting the Derby Network Server, you'll need to dig into the details of that tool to learn more about how to configure its CLASSPATH.
And if you've written a custom Java application to start the Derby Network Server (e.g., as described here: http://db.apache.org/derby/docs/10.11/adminguide/tadminconfig814963.html) then you'd be configuring the CLASSPATH of your custom application.
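For instance, a tiny custom launcher along these lines (a sketch; the host, port, and class name are just examples) starts the Network Server inside a JVM whose classpath you control, so your custom type classes can be placed on that same classpath:

import java.io.PrintWriter;
import java.net.InetAddress;
import org.apache.derby.drda.NetworkServerControl;

// Sketch: start the Derby Network Server from your own JVM, so that custom type
// classes on this JVM's classpath are visible to the server as well.
public class StartDerbyServer {
    public static void main(String[] args) throws Exception {
        NetworkServerControl server =
                new NetworkServerControl(InetAddress.getByName("localhost"), 1527);
        server.start(new PrintWriter(System.out, true)); // server console output to stdout
    }
}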
Regardless, as a basic step, you're going to want to be deploying your custom Java extension classes in the Derby Network Server's classpath, which means you'll want to build them into a .jar file and put that .jar file somewhere that the Derby Network Server has access to, and you'll want to make that build-a-jar-and-copy-it-to-the-right-location process straightforward, so you should integrate it into whatever build tool you're using (Apache Ant?).
And, you'll need to consider Java security policy, because the default security policy will prevent you from trivially loading custom Java classes into your Derby Network Server as that would seem like a malware attack and the Derby Network Server is going to try to prevent that. So study this section of the Security manual: http://db.apache.org/derby/docs/10.11/security/tsecnetservrun.html
I'm adding unit-tests to an existing codebase, and the application itself retrieves data from a server through REST. The URL to the server is hard-coded in the application.
However, developers are obviously not testing new features, bug fixes, etc. on a live environment, but rather against a development server. To accomplish this, the development build has a different server-URL string than the production build.
During development a non-production URL should be enforced, and when creating a production build, a production URL should be enforced instead.
I'm looking for advice on how to implement a neat solution for this, since forgetting to change the URL can currently have devastating consequences.
A Maven build script only tests the production value, not both, and I haven't found any way to make build-specific unit tests. (Technologies used: Java, Git, Git-flow, Maven, JUnit.)
Application configuration is an interesting topic. What you've pointed out here is definitely a very practical need, but even more so: if you need to repackage (and possibly rebuild) between different environments, how do you truly know that what you've deployed is the same thing that was actually tested and verified?
So load the configuration from a resource outside of the application package. A Java option pointing to a file on the filesystem or a JNDI resource are both good options. You can also have defaults for development by committing a config file and reading from it when the Java option is not specified.
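A minimal sketch of that approach (the property name app.config and the file names are made up for illustration): read the file named by a Java option when present, otherwise fall back to a committed development default on the classpath.

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

// Sketch: load config from -Dapp.config=/path/to/app.properties when given,
// otherwise fall back to a committed development default on the classpath.
// (The property and file names are illustrative.)
public final class AppConfig {
    public static Properties load() throws Exception {
        Properties props = new Properties();
        String external = System.getProperty("app.config");
        InputStream in = (external != null)
                ? new FileInputStream(external)
                : AppConfig.class.getResourceAsStream("/config-dev.properties");
        if (in == null) {
            throw new IllegalStateException(
                    "No configuration found: pass -Dapp.config or bundle config-dev.properties");
        }
        try (InputStream config = in) {
            props.load(config);
        }
        return props;
    }
}

The production deployment then always passes the Java option pointing at the production file, so the production URL never has to be baked into the source tree.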
Our build/deploy process is tedious, largely manual, and error-prone. Could you offer proposals for improvement?
So let me describe our deployment strategy and build process.
We are developing a system called Application Server (AS for short). It is essentially a servlet-based web application hosted on a JBoss Web server. AS can be installed in two "environments". Each environment is a directory with the webapp's code. This directory is placed on network storage, which is mounted to several production servers where JBoss instances are installed. The directory is linked to JBoss's webapps directory, so all JBoss instances use the same code for an environment. JBoss configuration is separate from the environment and updated on a per-instance basis.
So we have two types of patches: webapp patches (for different environments) and configuration patches (for per-instance configuration).
A patch is an executable file. In fact it is a bash script with an embedded binary RPM package. Installation is pretty straightforward: you just execute the file and optionally answer some questions. The important point is that a patch is not the system as a whole - it contains only some classes with fixes and/or scripts that modify configuration files. Classes are copied into WEB-INF/classes (AS is deployed as an exploded directory).
The way we build those patches is:
We take some previous patch files and copy them.
We modify the content of the patch. The most important part is the RPM spec. There we change the name of the patch, change its prerequisite RPM packages, and write the actual bash commands for backing up, copying, and modifying files. This is one of the most annoying parts, because we cannot always get the actual change-set. That is especially true for complex new features that span multiple change requests and commits. Also, writing those commands for a change-set is tedious and error-prone.
For webapp patches we also modify the spec for the other environment. Usually they are identical except for the RPM package name.
We put all RPM-related files into VCS.
We modify build.xml by adding a couple of targets for building the new patch. The modification is done by copy-pasting and editing.
We modify CruiseControl's config by copy-pasting a project and changing the Ant targets in it.
Finally, we build the system.
Also, I'm interested in any references on patch preparation and deployment practices, preferably for Java applications. I haven't succeeded in googling that.
The place where I work had similar problems, though perhaps not as complex.
We responded by eliminating the concept of a patch altogether. We stopped patching and started simply installing the whole app (even if we make just a small change).
We now have Cruise Control build complete install kits that happen to contain the build timestamp in the install-kit name. This is a Cruise Control build artifact.
Cruise Control auto-installs them on a test server and runs some automated smoke tests. We then run manual tests on the test server. Then we install the artifact on a staging server, then on a production server.
Getting rid of patching caused some people to splutter, "isn't that wasteful if you're just changing a couple of things?" and "why would you overwrite all the software just to patch something?"
But the truth is that good source control, automated install-kit building, and one-step installation has saved us tons of time. It does take a few seconds longer to install, but we can do it far more repeatedly and with less developer labor.