Hadoop client JDK compatibility for communicating with Cloudera cluster - java

At the moment, both our Hadoop cluster (CDH 4.1.2) and the services that communicate with it via hadoop-client run on Java 6. We're planning to move those client components to Java 8 while leaving the Hadoop servers on Java 6, because Cloudera has declared support for JDK 8 only since version 5.3.0 and we're not planning to upgrade Hadoop - details here:
https://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_rn_new_in_530.html
Our concern, therefore, is whether running different Java versions in the cluster (6) and in the client components (8) may lead to problems of any kind. The internet hasn't been of much help, since Hadoop's Java compatibility is mostly discussed in the context of migrating server components, so please share your experience if you have any that's relevant to this matter.
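For reference, here is the kind of minimal smoke test we plan to compile and run on Java 8 with the hadoop-client jars against the Java 6 cluster; the namenode address and path below are placeholders, not our real values.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.util.VersionInfo;

    public class Jdk8ClientSmokeTest {
        public static void main(String[] args) throws Exception {
            // Report the client-side Hadoop version and the JVM we are actually running on.
            System.out.println("hadoop-client version: " + VersionInfo.getVersion());
            System.out.println("java.version: " + System.getProperty("java.version"));

            // Placeholder namenode URI - replace with the real cluster address.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf)) {
                // Listing a directory exercises the RPC path between the Java 8 client
                // and the Java 6 namenode.
                for (FileStatus status : fs.listStatus(new Path("/tmp"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }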

Related

Difference between Java and Oracle Java for Redhat

I want to update my JDK for security reasons on a Red Hat system, and I have successfully updated to JDK 7u79.
Red Hat has published some Java vulnerabilities on their site under the name Oracle Java for RHEL Server.
Do I need to update my JDK as mentioned on the RHEL site? Is the JDK from the Oracle site different from Oracle Java for RHEL Server?
Oracle Java is based on OpenJDK with some proprietary bits added:
– Sometimes those bits are supposed to increase performance (JRockit traces…),
– Sometimes those bits will improve compatibility (because they've been inherited from Sun and app authors have tested against them since the Sun era). A lot of the "stability" attributed to Oracle/Sun Java is just app authors learning to avoid the bugs of the Oracle/Sun proprietary bits, and adding workarounds that trip up on JVMs without those bugs (see also: IE6).
Red Hat Java is based on OpenJDK only:
– Pure OpenJDK is better integrated with the system. The OpenJDK guys try hard to remove residual Java-isms and use the same conventions as other system apps.
– Pure OpenJDK is more forward-looking. Oracle knows that Sun almost killed Java with byzantine combinations of proprietary tech it couldn't afford to maintain. Anything Oracle needs long term will end up in OpenJDK. It is sufficient for the OpenJDK implementation to achieve parity with the proprietary bits for Oracle to kill them – no $$$ in maintaining proprietary tech when similar free tech is available.
– It is very common for Red Hat to backport code written for the next OpenJDK version into the current Red Hat Java when it solves a problem in this version (as long as the current API is preserved), while Oracle will tend to wait for that next OpenJDK version before offering it.
To my knowledge, Oracle has been thoroughly disgusted by the way Sun handled Java 1.6 (it was called Java 1.6, but development was not linear: the desktop/server/Windows/Linux JVMs were all different, with bits added in one version that could not be used in another due to coding shortcuts and complex licensing agreements, and each of them lagged the others one way or another). Oracle intends to maintain a classic linear development pipeline: OpenJDK next → current OpenJDK → Oracle Java.
Whichever version you use, you need to apply the security updates published by its maintainer. It's useless to use Oracle Java as an update to Red Hat Java or vice versa; it's slightly different code with slightly different security bugs. Both companies have capable engineers and share security fixes in the OpenJDK trunk. When the fixed builds are published depends on embargo agreements and security fix policies. Oracle will tend to batch fixes in infrequent pre-planned releases unless there is a critical vulnerability. Red Hat will publish as soon as there is something security-related to fix, be it big or small. Red Hat's build processes are more agile than those Oracle uses: the Linux build processes are 100% automated, while Oracle needs to worry about Windows & co.
Lastly, Oracle Java as published in RHEL is a repackaging of Oracle's files to use native Linux packaging tech and the same path (etc.) conventions as the OpenJDK packages (making it easy to replace one with the other), while Oracle Java as published by Oracle still follows the very strange naming and path conventions the Sun Solaris/Windows people thought appropriate on Linux. It should have no more and no fewer security vulnerabilities than Oracle Java as published by Oracle (for the same version), just be a lot more convenient to deploy. It is designed to be just another Linux package set that can be deployed on many Linux servers using native package deployment systems. When you have hundreds of servers to manage, it is a great help not to have to special-case the JVM.
Each year in February, Red Hat and Oracle's top Java people meet publicly at FOSDEM and present their current priorities. If you're interested, you can consult their past presentations in the FOSDEM public archives.

Rolling Java upgrade in Cassandra

We are currently running Apache Cassandra 1.2.13 on 4 machines with Java 6 and are planning to upgrade Java to version 7.
Is it possible to do a rolling Java upgrade in a running cluster, i.e. node by node?
I know that the Thrift protocols for client-server communication are incompatible between Java versions 6 and 7. How does that apply to the internal server-server communication within a cluster?
I haven't found any mention of this in the documentation or other sources.

Method to remotely access HDFS in Java that works on old and new versions of Hadoop

I am trying to remotely access HDFS from a program written in Java. WebHDFS works well with the most recent versions of Hadoop, but which protocol(s) should I choose to cover the largest number of Hadoop versions?
If possible, I would like to use a single protocol that works on all versions of Hadoop, as long as it isn't much slower than using different protocols for different versions.
LibHDFS is present in both older (1.x) and newer (2.x) releases of Hadoop. It is pure Java and has a pretty stable API.
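For what it's worth, the WebHDFS route mentioned in the question is just a plain REST call over HTTP, so it needs no Hadoop jars on the client side at all. A rough sketch - the namenode host, port, user and path below are placeholders:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebHdfsReadExample {
        public static void main(String[] args) throws Exception {
            // WebHDFS OPEN call; the namenode replies with a redirect to a datanode,
            // which HttpURLConnection follows automatically for GET requests.
            URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/tmp/data.txt"
                    + "?op=OPEN&user.name=hdfs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }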

Java upgrade from 1.4 to 6

I am using JBoss 2.4.11. If I upgrade the JDK from 1.4 to 6, how is the JBoss server going to handle the application? What are the common things I should start investigating while I am in the process of the upgrade? I am looking at Oracle's documentation and other posts on Stack Overflow related to JDK 6 backwards compatibility with 1.4. My question is more specific to the JBoss server. Also, the application uses EJB 1.1.
I'd recommend moving from one consistent system to another. Even JBoss 4 needs a special version for JDK 1.6. Java 5 brought MBeans right into the VM, and older versions of JBoss used MBeans for configuration. As there must not be two MBean servers within a single VM, this was a big issue when migrating to Java 5. As EJB 1.1 is still supported, I'd recommend moving at least to JBoss 4.2, as it is still somewhat similar to the older versions, while JBoss 7 is totally different.
The only thing that will really help you make the migration a little bit smoother is tests - at least a fair number of integration tests.
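To make the MBean point above concrete: since Java 5 the JVM itself exposes a platform MBeanServer, which is exactly the kind of thing an older container that ships its own MBeanServer can clash with. A tiny illustrative sketch (the bean and its name are made up for the example):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class PlatformMBeanDemo {
        // A trivial standard MBean: the interface name must be the class name plus "MBean".
        public interface GreetingMBean {
            String getGreeting();
        }

        public static class Greeting implements GreetingMBean {
            public String getGreeting() {
                return "hello from the platform MBeanServer";
            }
        }

        public static void main(String[] args) throws Exception {
            // Since Java 5 this server is created by the JVM itself; older JBoss releases
            // created and managed their own MBeanServer instead.
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new Greeting(), new ObjectName("demo:type=Greeting"));
            System.out.println("MBeans registered: " + server.getMBeanCount());
        }
    }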

Which version of Hadoop API to use

There are several versions of the Hadoop APIs available as part of the Cloudera and Yahoo distributions. Furthermore, for Cloudera there are versions cdh3u1 through cdh3u4.
I have also seen that the API methods change in how they are named and in the parameters they accept.
Which version of Hadoop API, and from where, can I use that is latest and stable?
The first thing to note is that "latest" and "stable" don't go together. It takes some time for the latest API to become rock solid, with all the bugs found and fixed.
If you are interested in packaged software, go to Cloudera and download a stable or an alpha version and try it out. From Hortonworks you can download HDP 1.0, which is the only version available. Cloudera has been releasing CDH regularly for close to 4 years, so it is more mature than HDP from Hortonworks. CDH includes the next-generation MapReduce, while HDP has the legacy MapReduce architecture.
The above-mentioned packages (CDH and HDP) ship a set of frameworks that are well integrated and tested, so it's a matter of learning how to use them. There is no need to worry about interoperability issues across the different frameworks.
If you really want to learn about Hadoop, I would suggest downloading the software from Apache Hadoop and then going ahead with the installation and configuration. The same applies to Pig, Hive, and other software as well. You might run into some compatibility issues, which you'll have to resolve as you go.
In the Apache Hadoop space, there is the 1.x track, which has the stable legacy MR architecture, and the 2.x track, which has the next-generation MapReduce architecture.
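As a concrete illustration of the API churn mentioned in the question: Hadoop ships both the legacy org.apache.hadoop.mapred API and the newer org.apache.hadoop.mapreduce API, and method signatures differ between the two. A minimal mapper in the newer API looks roughly like this (the class and its logic are just a sketch):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // "New" API (org.apache.hadoop.mapreduce): Mapper is a class you extend and
    // output goes through a Context object. The legacy org.apache.hadoop.mapred API
    // instead defines Mapper as an interface whose map() takes an OutputCollector
    // and a Reporter.
    public class LineLengthMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit each line together with its length in bytes.
            context.write(line, new IntWritable(line.getLength()));
        }
    }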
