Hadoop: How to write to HDFS using a Java application

I am new to Hadoop and trying to learn. I am trying to run the below Hadoop sample code in Eclipse on Ubuntu Linux. I have Hadoop v 2.7.0 and I have the required jars.
Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
Path pt=new Path("hdfs://localhost:9000/myhome/a.txt");
FileSystem fs = FileSystem.get(conf);
When I run the application in Eclipse I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName. The hadoop-common-2.7.0.jar file I am referencing, taken from the Hadoop common folder, does not contain the class the application is looking for.
Any help in resolving this issue will be much appreciated.
If I create a jar file of the class for the above code and run it using hadoop jar <jar file> <class name>, it works. So I am wondering whether it is possible at all to run a Hadoop Java application from Eclipse or the command line without using the hadoop command.

It seems that the JVM doesn't load all required Hadoop artifacts.
If you are a Maven user, please ensure that you have the following dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.client.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.client.version}</version>
</dependency>
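With those artifacts on the classpath, a plain Java client can talk to HDFS without the hadoop launcher. A minimal sketch of writing a file (the class name and file contents are illustrative; the URI and path are taken from the question):
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point straight at the NameNode instead of relying on core-site.xml being on the classpath
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        Path target = new Path("/myhome/a.txt");
        try (FSDataOutputStream out = fs.create(target, true)) {
            out.writeUTF("hello from a plain Java client");
        } finally {
            fs.close();
        }
    }
}
Run it as a normal Java application; the only requirement is that the NameNode at hdfs://localhost:9000 is reachable.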

Related

Trying to run libGDX with Maven and Eclipse

Maybe a newbie question...
I've been working on an LWJGL project, where I use Maven to manage dependencies. In it, I want to use some parts of the libGDX library. So I figured I would first get at least a hello world running with it before adding it to my main project.
So in my pom.xml I have this:
<!-- https://mvnrepository.com/artifact/com.badlogicgames.gdx/gdx-backend-lwjgl -->
<dependency>
    <groupId>com.badlogicgames.gdx</groupId>
    <artifactId>gdx-backend-lwjgl</artifactId>
    <version>1.9.11</version>
    <scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/com.badlogicgames.gdx/gdx-platform -->
<dependency>
    <groupId>com.badlogicgames.gdx</groupId>
    <artifactId>gdx-platform</artifactId>
    <version>1.9.11</version>
    <scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/com.badlogicgames.gdx/gdx -->
<dependency>
    <groupId>com.badlogicgames.gdx</groupId>
    <artifactId>gdx</artifactId>
    <version>1.9.11</version>
</dependency>
The other contents of the file are the same as in a working project and are 100% working.
I tried creating a separate libGDX project before that and... it didn't work. But I saw that the code that was supposed to launch the program was:
public static void main (String[] arg) {
    LwjglApplicationConfiguration config = new LwjglApplicationConfiguration();
    new LwjglApplication(new SomeApplicationListenerFile(), config);
}
So I used that in my Maven project.
When I do "Run As > Java Application", the error is the following:
Exception in thread "main" java.lang.NoClassDefFoundError: com/badlogic/gdx/backends/lwjgl/LwjglApplicationConfiguration
at org.boby.RayTracing.main.Main.main(Main.java:179)
And if I do a Maven Build, it tells me that "package com.badlogic.gdx.backends.lwjgl does not exist"
I looked for that package in the jars Maven downloaded in the "Maven dependencies" folder and I found it in gdx-backend-lwjgl-1.9.11.jar - right where it should be.
The package is apparently there, but Java cannot find it. How can I fix that?
Some additional information:
Windows 10, Eclipse Oxygen, Maven 3.6.0, JRE 1.8.0_191, JDK 8
Thank you in advance! I've been banging my head on this for hours.
Edit: I made some progress. It looks like the <scope>test</scope> entries were messing things up, so I removed them. Now I get the following error:
Exception in thread "main" com.badlogic.gdx.utils.GdxRuntimeException: Couldn't load shared library 'gdx64.dll' for target: Windows 10, 64-bit
at com.badlogic.gdx.utils.SharedLibraryLoader.load(SharedLibraryLoader.java:125)
at com.badlogic.gdx.utils.GdxNativesLoader.load(GdxNativesLoader.java:33)
at com.badlogic.gdx.backends.lwjgl.LwjglNativesLoader.load(LwjglNativesLoader.java:47)
at com.badlogic.gdx.backends.lwjgl.LwjglApplication.<init>(LwjglApplication.java:83)
at com.badlogic.gdx.backends.lwjgl.LwjglApplication.<init>(LwjglApplication.java:71)
at org.boby.RayTracing.main.Main.main(Main.java:178)
It looks like I need to include gdx-natives.jar in my dependencies, but I can't find a Maven repository for it.
I downloaded gdx-natives.jar (I saw it in a forum thread). In there was a file named "gdx-64.dll". As I need "gdx64.dll", I just renamed the dll and now it runs.
You can let Maven do the work if you define the gdx-platform dependency like this:
<dependency>
    <groupId>com.badlogicgames.gdx</groupId>
    <artifactId>gdx-platform</artifactId>
    <version>1.9.11</version>
    <classifier>natives-desktop</classifier>
</dependency>
This will pull in the natives jar, including gdx64.dll, so you don't have to add any external jar to your project's build path.
A side note: if you use the standard Maven directory structure and you load assets with the Gdx.files.internal("fileName") statement, you need to define a folder under src/main/resources with the same name as the package your code is in (i.e. src/main/java/myPackage relates to src/main/resources/myPackage). I struggled a bit with this because I don't normally have to define a package folder in the resources dir.
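To illustrate (the class and asset names are hypothetical): a file placed under src/main/resources is copied onto the runtime classpath, so the internal file handle can resolve it.
import com.badlogic.gdx.ApplicationAdapter;
import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.Texture;

public class AssetCheck extends ApplicationAdapter {
    private Texture logo;

    @Override
    public void create() {
        // hypothetical asset located at src/main/resources/myPackage/logo.png
        logo = new Texture(Gdx.files.internal("myPackage/logo.png"));
    }

    @Override
    public void dispose() {
        logo.dispose();
    }
}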

FOP 2.3: Problems with fo:external-graphic

This happens in my case when I execute my Java jar application on the server, not directly from the IDE.
I have config files in this path: C:\Temp\myuser\myappname\config\xslt. The main files are header.jpg, fopfile.xconf and style.stl.
The application calls the ZXing library to generate a QR code that is attached to a new PDF file. My application runs at C:\Temp\myapp\myapp.jar, so the generated QR code file, in PNG format, is saved in that path with the name qrcode.png.
My XSL-FO file uses the infamous fo:external-graphic tag. For both images I use:
<fo:external-graphic src="url('file:\\C:\Temp\myuser\myappname\config\xslt\header.jpg')"/>
<fo:external-graphic src="url('file:\\C:\Temp\myapp\qrcode.png')"/>
But the jar crashes every time and says GRAVE: image not found. I tried changing the paths and the same error happens.
However, if I run this app from the IDE (VS Code), this problem never happens.
What can we do? I read all the docs in the Apache tutorial, but nothing works.
Note: I generate the jar via mvn clean compile assembly:single -f, so I create one jar with all dependencies embedded.
<!-- Just put the xmlgraphics-commons dependency before the fop dependency -->
<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>xmlgraphics-commons</artifactId>
    <version>2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>fop</artifactId>
    <version>2.3</version>
</dependency>
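For reference, a minimal sketch of embedding FOP 2.3 from Java with the config and stylesheet paths from the question (the PdfGenerator class name and the data.xml/result.pdf file names are illustrative assumptions, not from the original post):
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class PdfGenerator {
    public static void main(String[] args) throws Exception {
        File configDir = new File("C:/Temp/myuser/myappname/config/xslt");
        // The factory is configured from fopfile.xconf; its directory becomes the base URI
        // against which relative fo:external-graphic references are resolved.
        FopFactory fopFactory = FopFactory.newInstance(new File(configDir, "fopfile.xconf"));

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("result.pdf"))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File(configDir, "style.stl")));
            transformer.transform(new StreamSource(new File("data.xml")),
                    new SAXResult(fop.getDefaultHandler()));
        }
    }
}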

java.lang.NoClassDefFoundError: running a Mahout example on a Hadoop cluster

I'm following the Mahout in Action tutorial for k-means clustering; I use the same code found here:
with the same pom.xml as well.
On my local machine using Eclipse everything works fine, so I built the jar file (clustering-0.0.1-SNAPSHOT.jar) and brought it to the cluster (Hortonworks 2.3). When trying to run it using hadoop jar clustering-0.0.1-SNAPSHOT.jar com.digimarket.clustering.App (I named my project differently), I get this error:
java.lang.NoClassDefFoundError: org/apache/mahout/common/distance/DistanceMeasure
I know it's a dependency issue; I found questions asked by users who had this issue before but couldn't understand how they solved it:
here and here
This is the content of mahout directory in my cluster:
ls /usr/hdp/2.3.4.0-3485/mahout/
bin
conf
doc
lib
mahout-examples-0.9.0.2.3.4.0-3485.jar
mahout-examples-0.9.0.2.3.4.0-3485-job.jar
mahout-integration-0.9.0.2.3.4.0-3485.jar
mahout-math-0.9.0.2.3.4.0-3485.jar
mahout-mrlegacy-0.9.0.2.3.4.0-3485.jar
mahout-mrlegacy-0.9.0.2.3.4.0-3485-job.jar
Thanks.
It looks like you have a dependency that is not available to your code on your cluster.
Based on the pom.xml from that project you should be using:
<properties>
    <mahout.version>0.5</mahout.version>
    <mahout.groupid>org.apache.mahout</mahout.groupid>
</properties>
...
<dependencies>
    <dependency>
        <groupId>${mahout.groupid}</groupId>
        <artifactId>mahout-core</artifactId>
        <version>${mahout.version}</version>
    </dependency>
    ...
</dependencies>
The class org.apache.mahout.common.distance.DistanceMeasure is included in mahout-core-0.*.jar. I have mahout-core-0.7.jar and the class is present in there.
You can download that jar and include it with the -libjars flag, or you can put it on the Hadoop classpath, as sketched below.
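Note that -libjars is only honoured when the driver parses Hadoop's generic options, typically by running through ToolRunner. A rough sketch under that assumption (the jar location in the comment is a placeholder):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class App extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // already carries the jars passed via -libjars
        // ... set up and submit the k-means job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // hadoop jar clustering-0.0.1-SNAPSHOT.jar com.digimarket.clustering.App \
        //     -libjars /path/to/mahout-core-0.7.jar <job arguments>
        System.exit(ToolRunner.run(new Configuration(), new App(), args));
    }
}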

How to write logs to a file using Log4j and Storm Framework?

I am having a bit of an issue logging to a file using Log4j in Storm.
Before submitting my topology, i.e. in my main method, I wrote some log statements and configured the logger using:
PropertyConfigurator.configure(myLog4jProperties)
Now when I run my topology using my executable jar in eclipse -
its working fine and log files are being created as supposed.
OR
When i run my executable jar using "java -jar MyJarFile
someOtherOptions", i can see log4j being configured and the files are
formed correctly and logging is done on both files and console (as
defined in my log4j.properties)
BUT when i run the same jar using "storm jar MyJarFile MyMainClass someOtherOptions" it is not being able to create and log
into any of the files except on console.
I am talking about the logs I am printing BEFORE submitting my topology.
Is there any way to log my statements to a file while using Storm? I am not bound to org.apache.log4j.
The Storm framework uses its own logging. Your logs will most likely end up in the logs directory where Storm is installed ({Storm DIR}/logs). You can find the Storm log configuration in {Storm DIR}/logback/cluster.xml. It uses Logback, not Log4j.
I would recommend using SLF4J for your logging within Storm. You probably could get Log4j working, but you would have additional setup to do on each node in the cluster. Since Storm does this for you already, I don't really see the point.
In your project, include these Maven dependencies (slf4j-simple for your unit tests!):
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.5</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.5</version>
    <scope>test</scope>
</dependency>
Then, inside your topologies/spouts/bolts you just get the Logger:
private static final Logger LOG = LoggerFactory.getLogger(MySpout.class);
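Used inside a spout or bolt, that looks roughly like this (the class name and message are illustrative):
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MySpout /* extends BaseRichSpout in a real topology */ {

    private static final Logger LOG = LoggerFactory.getLogger(MySpout.class);

    public void nextTuple() {
        // parameterized messages avoid string concatenation when the level is disabled
        LOG.info("emitting tuple number {}", 42);
    }
}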
In my environment, anything INFO and above is logged to file and, more importantly, visible in the Storm UI. You just need to click on your spout/bolt and then click on the port number to go to the log viewer page.
If you want to get to the actual files then you can gather them off of each node (mine are in /var/log/storm/).

Issue in getImageWritersByFormatName for TIFF: getting an image writer

I am trying to convert PDFs to TIFF images. I use the following code to get the image writers by format:
Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("TIFF");
if (writers == null || !writers.hasNext()) {
    throw new ImageWritingException();
}
This works fine when I run the application standalone in Eclipse. But when I deploy the application to a Tomcat server on Linux, writers == null is false but !writers.hasNext() is true, so the exception is thrown.
I use Maven to build the project into a WAR.
I have the following dependencies in the pom file:
<dependency>
    <groupId>org.icepdf</groupId>
    <artifactId>icepdf-core</artifactId>
</dependency>
<dependency>
    <groupId>com.sun.media</groupId>
    <artifactId>jai_imageio</artifactId>
</dependency>
<dependency>
    <groupId>com.sun.media</groupId>
    <artifactId>jai-codec</artifactId>
</dependency>
<dependency>
    <groupId>javax.media</groupId>
    <artifactId>jai_core</artifactId>
</dependency>
What can be the difference between the two environments? How can I fix this issue?
I met the same issue and found the root cause.
To summarize first: the issue does not occur in Eclipse on the dev machine, but it does occur on the Tomcat server.
The root cause is that ImageIO uses SPI, and there is a basic implementation in the JDK (see rt.jar, which ships plugins for BMP and JPEG), while the plugins we want are in jai_imageio.jar.
With the default configuration, Tomcat scans only rt.jar for plugins while initializing ImageIO. Later, when the application runs, jai_imageio.jar is not scanned again.
As a result the plugins in jai_imageio.jar are not available. When running on the dev machine, jai_imageio.jar is scanned.
There are several solutions, listed below; I would recommend the first one as it fits the design intention of ImageIO.
Without changing the Tomcat default configuration, re-scan for plugins:
static {
    // re-register ImageIO service providers (including the TIFF plugin from jai_imageio.jar)
    // that were not visible when Tomcat initialized ImageIO
    ImageIO.scanForPlugins();
}
Change the Tomcat configuration so Tomcat won't initialize ImageIO: edit conf/server.xml and add appContextProtection="false" as follows:
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" appContextProtection="false" />
With such a configuration, Tomcat won't call ImageIO.getCacheDirectory in JreMemoryLeakPreventionListener, so ImageIO is not initialized until our code runs.
TIFF support is provided by the Java Advanced Imaging ImageIO plugin (jai_imageio.jar).
In order to work correctly, the jar needs to be added to the JVM's ext directory, otherwise it won't register properly.
I got my writer like this:
// TIFFImageWriterSpi here is com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriterSpi from jai_imageio
TIFFImageWriterSpi tiffSpi = new TIFFImageWriterSpi();
ImageWriter imageWriter = tiffSpi.createWriterInstance();
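A minimal usage sketch built on that (the output file name and the in-memory image are illustrative):
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

import com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriterSpi;

public class TiffWriteExample {
    public static void main(String[] args) throws Exception {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);

        // Instantiate the writer directly from its SPI, bypassing the registry lookup
        ImageWriter writer = new TIFFImageWriterSpi().createWriterInstance();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(new File("out.tif"))) {
            writer.setOutput(ios);
            writer.write(image);
        } finally {
            writer.dispose();
        }
    }
}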
