Use Apache Hadoop JAR files or vendor specific?

I am creating an application for Hadoop that should run on all Hadoop distributions from different vendors, such as Cloudera, MapR, Hortonworks, and Pivotal. The application will be deployed on application servers such as WebLogic or JBoss, and may also be deployed on Tomcat.
So my question is: suppose some version from each of these vendors uses the same underlying Hadoop version, say Hadoop 2.0. Should I use the JAR files supplied by the vendors or the JAR files supplied by Apache Hadoop?
I mean the JAR files that contain the same classes as Apache Hadoop but carry the vendor's name, like blablaCDH5.2blabla.jar. Should I use those or the ones from Apache? That way I could build a single version for Hadoop 2.0 and use it for all vendors. Can that be done, or do I have to build a different flavour of my app for each vendor distribution?
Thanks in advance

One approach, which may vary slightly based on your version control and build systems, would be to have separate build scripts using the dependencies from the different distributions.
Where test cases fail for a given distribution you could have a branch/fork for that distribution or, probably less desirable, have a specific build which does some pre-build magic for that distribution.
This way you should be able to maintain a consistent trunk while still tracking and handling issues that come up in vendor- or version-specific distributions going forward. This is definitely possible with Git and most build systems (e.g. Gradle, Maven, or Ant).

You can create a shim layer that allows your application to run with any Hadoop distribution. Because most distributions ship different Hadoop versions, this problem is difficult to handle directly, so many vendors now create a shim layer that works with any Hadoop distribution. Shim layers have been implemented in applications such as Pentaho, Hive, and Gora.
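To make the shim idea concrete, here is a minimal sketch, not how Pentaho, Hive, or Gora actually implement it. The `HadoopShim` interface, the `MarkerShim` class, and the marker substrings ("cdh", "mapr") are hypothetical names and assumptions about what vendor version strings look like. On a real cluster the version string would typically come from `org.apache.hadoop.util.VersionInfo.getVersion()`; it is hard-coded here so the example runs without Hadoop on the class path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ShimSketch {

    /** Hypothetical contract the application codes against instead of vendor classes. */
    public interface HadoopShim {
        String name();
        boolean matches(String hadoopVersion);
    }

    /** Shim chosen by a marker substring in the Hadoop version string (an assumption). */
    public static final class MarkerShim implements HadoopShim {
        private final String name;
        private final String marker;

        public MarkerShim(String name, String marker) {
            this.name = name;
            this.marker = marker;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public boolean matches(String hadoopVersion) {
            return hadoopVersion.toLowerCase(Locale.ROOT).contains(marker);
        }
    }

    // Vendor shims are checked in order; the markers are illustrative assumptions.
    private static final List<HadoopShim> VENDOR_SHIMS = Arrays.<HadoopShim>asList(
            new MarkerShim("cloudera", "cdh"),
            new MarkerShim("mapr", "mapr"));

    // Fallback when no vendor marker is found: treat the cluster as plain Apache Hadoop.
    private static final HadoopShim APACHE = new MarkerShim("apache", "");

    /** Picks the first vendor shim whose marker appears in the version string. */
    public static HadoopShim select(String hadoopVersion) {
        for (HadoopShim shim : VENDOR_SHIMS) {
            if (shim.matches(hadoopVersion)) {
                return shim;
            }
        }
        return APACHE;
    }

    public static void main(String[] args) {
        System.out.println(select("2.5.0-cdh5.2.0").name()); // prints "cloudera"
        System.out.println(select("2.6.0").name());          // prints "apache"
    }
}
```

In a real shim layer each implementation would live in its own module compiled against that vendor's JARs, and only the interface would be visible to the rest of the application.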

It depends on how deep into the Hadoop API you are treading.
If your application only submits jobs to the cluster, you are probably OK with the vanilla libraries as long as you stick to one specific version. If you are doing advanced stuff and using Hadoop internals, it may be necessary to include the vendor-specific ones.
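If you go the vanilla route, a cheap safeguard is to fail fast when the cluster's Hadoop baseline differs from the one you compiled against. The sketch below is only an illustration under assumptions: `BaselineCheck` and its methods are made-up names, the version strings are examples, and matching major.minor numbers do not guarantee compatibility because vendors backport patches.

```java
public class BaselineCheck {

    /** Returns the leading "major.minor" part, e.g. "2.5" for "2.5.0-cdh5.2.0". */
    public static String majorMinor(String version) {
        String[] parts = version.split("[.\\-]");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a dotted version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both version strings share the same major.minor baseline. */
    public static boolean sameBaseline(String compiledAgainst, String runtime) {
        return majorMinor(compiledAgainst).equals(majorMinor(runtime));
    }

    public static void main(String[] args) {
        // Assumed values: the Apache version used at build time, and a vendor
        // version string as the cluster might report it.
        String compiledAgainst = "2.5.0";
        String runtime = args.length > 0 ? args[0] : "2.5.0-cdh5.2.0";

        if (!sameBaseline(compiledAgainst, runtime)) {
            throw new IllegalStateException(
                    "Built against Hadoop " + compiledAgainst + " but cluster runs " + runtime);
        }
        System.out.println("Baseline matches: " + majorMinor(runtime));
    }
}
```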

Dennis, you can build your application using the JARs provided by Apache Hadoop, because all of the distributions are modified forms of Apache Hadoop. They all share the same baseline structure, so using the JARs provided by Apache Hadoop shouldn't create any problem.
In fact, here is a link for Cloudera showing that they use the JARs provided by Apache Hadoop itself.

Related

Which .JAR do I use to embed Jetty?

I'm at the point in my application where I would like to have an HTTP server embedded into my project that updates the page in real time using AJAJ (similar to AJAX). However, I have no idea where to begin and the number of tutorials on this subject is fairly limited, so I decided to go with a name that I've heard quite a few times before: Jetty.
So, I downloaded Jetty and read through some documentation, and I'm staring at their beginner tutorial asking myself, "Which one of these f*kin jars do I use?" There's like 9,001 of them. Not to mention that there's like 1200 folders that all contain 1500 more jar files each.
Okay, I'm exaggerating, but take a look.
It's fairly, uhm... confusing. This is much different from most libraries, which are a single jar file. This is just... insane.
Anyway, I'm trying to figure out what I need in order to use jQuery, AJAX (AJAJ), and basic HTML features.

I'd suggest starting with this simple tutorial and the jetty-all jar:
Embedding Jetty Webinar recording
Embedding Jetty docs
jetty-all download (different versions)

To follow up on Gas's answer:
jetty-all doesn't have 100% of Jetty.
It used to, hence the name.
However, today it's impossible to have 100% of Jetty in one jar, as many components can conflict with each other.
If you use maven, or gradle, or ant+ivy, then you'll likely want to depend on:
org.eclipse.jetty:jetty-webapp
org.eclipse.jetty.websocket:javax-websocket-server-impl
and let the transitive dependency resolution of those build tools pull in the rest.
This would get you "started" easily enough.
There are also plenty of example projects that use embedded jetty.
See:
Embedded Jetty: with JSP enabled
Embedded Jetty: with various WebSocket configurations
Embedded Jetty: using Servlet 3.0 features
Embedded Jetty: using Servlet 3.1 features
Embedded Jetty: various Logging configurations
Some use 100% embedded jetty (without a war file, or WEB-INF, or web.xml), some use a war file built elsewhere.
Jetty uses Maven so that it can participate in the global central artifact repository, and we also have 2 developers on Jetty who are developers on Maven as well.
If you want to manage the dependencies yourself, then you will need to know intimately the purpose, relationships, and requirements of every jar file that you add to your project (and answering that is way out of scope for Stack Overflow).
You have many build tool options to make managing the dependencies easier:
Apache Maven
Gradle/Grails
Apache Buildr
Apache Ivy (an add-on for Apache ant)
Groovy Grape
Scala SBT (for working with Scala on top of Java)
Leiningen (for working with Clojure on top of Java)
Maven isn't required; you could use any of the above tools.
Tip: Maven and Gradle have the best integration in various IDEs (like Eclipse and IntelliJ).

Can I keep same jars with two different versions in java?

In my web application there is already a lucene-core jar, version 3.6.2. To add different functionality within the same project I now need the latest version of the lucene-core jar, i.e. 4.4.0.
When I replace the previous jar with the latest one, I get compilation errors because Lucene does not maintain backward compatibility.
My newly added functionality doesn't work on version 3.6.2. I know it is not possible to keep both versions of the jar in lib. Please suggest a solution.

Oh, yes the Jar hell!
If possible move your Lucene functionalities to a separate layer such as a webservice and access this service from your web application as a webservice client. Of course, this means some sort of overhead (network etc.).
Another possibility would be to use an OSGi solution such as JBoss Fuse, which can serve web applications. Move your Lucene functionality into separate modules (each one using a different Lucene version) and import the services into your web application. The advantage is that with this solution you can access the services directly, without network overhead.
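The mechanism OSGi builds on is class loader isolation: each module gets its own class loader, so two versions of the same class can coexist. The sketch below shows that mechanism with a plain `URLClassLoader`. It is a different, lower-level technique than the two suggested above, not an OSGi example. The `IsolatedLoaders` class name and the `libs/` jar paths are hypothetical, and `main` only works if those jars really exist at those paths (and are kept out of WEB-INF/lib).

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IsolatedLoaders {

    /** Creates a class loader that sees only the given jar plus the JDK's own classes. */
    public static URLClassLoader forJar(Path jar) {
        try {
            URL[] urls = { jar.toUri().toURL() };
            // Use the loader above the application class path as parent, so classes
            // from the application's own class path are not visible through this loader.
            ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
            return new URLClassLoader(urls, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jar, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical locations; adjust to wherever the two jars are stored.
        try (URLClassLoader oldLucene = forJar(Paths.get("libs/lucene-core-3.6.2.jar"));
             URLClassLoader newLucene = forJar(Paths.get("libs/lucene-core-4.4.0.jar"))) {

            Class<?> v3 = oldLucene.loadClass("org.apache.lucene.util.Version");
            Class<?> v4 = newLucene.loadClass("org.apache.lucene.util.Version");

            // Same class name, but two distinct Class objects from two loaders.
            System.out.println(v3.getName() + " loaded twice, distinct: " + (v3 != v4));
        }
    }
}
```

Classes loaded this way cannot be cast to each other or to types on the application class path, so the calling code has to go through reflection or through interfaces that live in a shared parent loader. That bookkeeping is a large part of what OSGi takes off your hands.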

Multiple versions of a same Weblogic Shared Library active on a same server

I am having an issue while trying to deploy two different versions of the same shared library on the same WebLogic server.
Here is my goal:
I have multiple applications which reference a shared library;
those applications should be able to use different implementation versions of this shared library (in an ideal world);
So I would like to deploy multiple versions of this shared library (e.g. AppB uses version 1.0.0 and AppA uses 1.0.1).
I think (know?) that this is possible (I've seen on a WebLogic exam that multiple versions of the same shared library can be deployed and active at the same time), but until now I have failed...
WebLogic is unhappy with the fact that two shared libraries have the same name... But this 'same name' is required by my applications to use those shared libraries...
My META-INF files are :
Extension-Name: app-local-services-ejb
Implementation-Version: 2.0.2-SNAPSHOT
Specification-Version: 2.0
and
Extension-Name: app-local-services-ejb
Implementation-Version: 2.0.1-SNAPSHOT
Specification-Version: 2.0
In the weblogic-application files for my applications, I reference only the Extension-Name from the META-INF file and the Specification-Version...
I've tried to do the same with two different versions of JSF, but I got the same issue.
Oracle's documentation on shared libraries is not really clear and I didn't find anything useful on Google/Bing.
I don't know if I have to upload those shared libraries to a specific folder (e.g. weblogic/common/deployable-libraries) or if I have to specify something in the WEB-INF/weblogic-application.xml files.
Does anyone have an idea to solve this problem?

Can you separate the applications into two .ear/.war files, each with their own version of the library they need to use? This seems to be by far the easiest way around this problem.

You can only have one version of a library with the same spec version in a WLS at the same time.
To put a new implementation-version up, you'd have to update the existing deployment with the new jar, rather than deploying a new version. This will remove the lower-numbered implementation-version, if WLS can work it out.
This is why it's a very good idea to have spec-versions and implementation-versions be floats, since WLS has a much easier time working out which version is higher than the other if it can be cast to a float.
If, however, you have a library with a different spec number, you can happily upload it simultaneously with other spec numbers. Again, if WLS can work out the version numbers, it can transition apps to the new version automatically, provided they don't have exact-version set in their weblogic-application.xml.
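To illustrate the point about float-like version numbers, here is a small sketch. It is not WebLogic's actual comparison code, which I haven't seen; it only shows that a string such as "2.0" parses cleanly as a number and can be ordered, while something like "2.0.1-SNAPSHOT" (as in the question's manifests) cannot. The `VersionAsFloat` class is a made-up name.

```java
import java.util.Optional;

public class VersionAsFloat {

    /** Returns the version as a float if it parses as one, otherwise empty. */
    public static Optional<Float> asFloat(String version) {
        try {
            return Optional.of(Float.parseFloat(version));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(asFloat("2.0"));            // prints Optional[2.0]
        System.out.println(asFloat("2.1"));            // prints Optional[2.1]
        System.out.println(asFloat("2.0.1-SNAPSHOT")); // prints Optional.empty
    }
}
```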

Batch Java Help

My company is trying to determine the best strategy for implementing batch Java programs. We have a few hundred (and growing) separate Java programs. Most of them are individual Jasper Reports, but some are bigger batch Java jobs. Currently, each Java project is packaged as an independent JAR file using Eclipse's export option. Those JARs are then deployed manually to our Linux server, where they are tested. If they pass testing, they are migrated up through QA and on to Production through a home-grown source code control system.
Is this the best strategy for doing batch Java? Ongoing maintenance can be a hassle since searching Jar files is not easy and different developers are creating new Java Projects (new reports) every week.
Importing existing projects from the Jar files into Eclipse is a tricky process as well. We would like these things to be easier. We have thought about packaging all the code into 1 big project and writing an interface to be able to execute the desired "package" (aka program) maybe using a Web Server.
What are other people/companies doing out there with their batch Java programs? Are there any best practices out there on this stuff? Any help/ideas/working models would be appreciated.

I would say that you should be able to create one web-based app for accessing Jasper reports, rather than a bunch of batch processes. Then, when you need to deploy a new report, just deploy a minor update that accesses a new compiled Jasper report file.
That said, you should be checking your code, not your binaries, into a Subversion or Git repository. Dump the "home grown" source control repository. Life is too short to try to home grow stuff like that. Just use Git or Subversion, they're proven, simple, and functional. When you import a new project, just pull it down from Subversion, don't try to import the JAR file from your Eclipse IDE.
Put your JAR files into a Maven repository such as Nexus, and deploy to QA and Production from there. Create automated builds for every project (be that with Maven or something else). Don't depend on an IDE to export your JAR files. IDEs change, and exporting from an IDE introduces more opportunity for human error. Also, different developers will prefer different IDEs. By standardizing on something like Maven, you're a bit more IDE-agnostic.
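As for the question's idea of one project with a single entry point that runs the desired program by name, a rough sketch could look like this. `BatchLauncher`, `BatchJob`, and the "monthly-report" job are hypothetical names, and a real launcher would probably discover jobs from configuration rather than register them by hand.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class BatchLauncher {

    /** Hypothetical contract every batch program implements; returns an exit code. */
    public interface BatchJob {
        int run(String[] args);
    }

    private final Map<String, BatchJob> jobs = new TreeMap<>();

    /** Registers a job under a name; returns this launcher so calls can be chained. */
    public BatchLauncher register(String name, BatchJob job) {
        jobs.put(name, job);
        return this;
    }

    /** Runs the named job with the given arguments, or fails if the name is unknown. */
    public int launch(String name, String... args) {
        BatchJob job = jobs.get(name);
        if (job == null) {
            throw new IllegalArgumentException(
                    "Unknown job '" + name + "', known jobs: " + jobs.keySet());
        }
        return job.run(args);
    }

    public static void main(String[] args) {
        BatchLauncher launcher = new BatchLauncher()
                .register("monthly-report", jobArgs -> {
                    System.out.println("Rendering report with " + String.join(",", jobArgs));
                    return 0;
                });

        if (args.length == 0) {
            System.err.println("usage: BatchLauncher <job-name> [job-args...]");
            System.exit(64);
        }
        // The first argument selects the job; the rest are passed through to it.
        String[] jobArgs = Arrays.copyOfRange(args, 1, args.length);
        System.exit(launcher.launch(args[0], jobArgs));
    }
}
```

With something like this, a scheduler entry only has to name the job, and all programs ship in one versioned artifact from the repository.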

My company has standardized Java batch execution on IBM WebSphere Extended Deployment.
Here http://www.ibm.com/developerworks/websphere/techjournal/0801_vignola/0801_vignola.html is an article introducing techniques for programming and deploying java batch.
Introduction to batch programming using WebSphere Extended Deployment Compute Grid
Christopher Vignola, WebSphere Architect, IBM
Commonly thought of as a legacy "mainframe" technology, batch processing is showing itself to be a venerable workload style with growing demand in Java™ and distributed environments. This article introduces an exciting new capability for Java batch processing from IBM®, the leader in batch processing systems for the last 40 years. This content is part of the IBM WebSphere Developer Technical Journal.
WebSphere Extended Deployment Compute Grid provides a simple abstraction of a batch job step and its inputs and outputs. The programming model is concise and straightforward to use. The built-in checkpoint/rollback mechanism makes it easy to build robust, restartable Java batch applications.
The Batch Simulator utility provided with this article offers an alternative test environment that runs inside your Eclipse (or Rational Application Developer) development environment. Its xJCL generator can help jump start you to the next phase of testing in the Compute Grid unit test server.
But even if you are not interested in the product, the article is a must read anyway.

Jbilling + Ruby

Are there ready-made solutions (gems, plugins, libraries, etc.) for integrating Ruby (Rails) applications with jBilling?
I couldn't even find an API client for Ruby.
I'd like someone to share their experience with such an integration. jBilling has web services (SOAP, Java RMI, Burlap), but there is no specific gem for easily accessing and editing data via the API.
JRuby 1.6.0 was released yesterday.
From the jBilling manual: "All of the API classes are located in the jbilling_api.jar file located in your jBilling distribution. The API also makes use of several third-party libraries, such as the Log4j library and Commons Logging, which provides a powerful logging infrastructure; Spring, which handles configuration and remoting; CXF, a SOAP library; and Hessian, for Hessian/Burlap support. You'll therefore need to provide the log4j.jar, commons-logging.jar and spring.jar files in your class path, if your project does not already include them."
Is it good practice to include so many jars in a JRuby Rails application?

You can try making your app run on JRuby and using the Java libraries directly. We did an experimental branch of our own app for a similar reason and found some useful projects in the process:
https://github.com/nicksieger/warbler/
https://github.com/calavera/trinidad
In the end we didn't go for JRuby, for various reasons that weren't necessarily of a technical nature.

What version of jBilling are you using? You could use this project as an example for your integration.
You can also build such a project from scratch using the wsdl2java utility. Typically, you can access the jBilling WSDL at localhost:8080/jbilling/services/jbilling?wsdl, assuming jBilling is running locally on port 8080.
