MapReduce error with Parquet format - java

I'm trying to run a MapReduce job. My files are in Parquet format.
I'm getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/thrift/TException
at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:268)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:271)
at parquet.hadoop.ParquetFileReader.readSummaryFile(ParquetFileReader.java:200)
at parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:99)
at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:354)
at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:339)
at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:246)
...
I tried adding the jar that contains TException with --libjars my_path/libthrift-0.9.0.jar, but I still get the same error.

Try setting the HADOOP_CLASSPATH environment variable to point to a libthrift.jar that matches the version you need. The usual reason -libjars is not enough is that it only ships the jars to the map and reduce tasks, while this stack trace (getSplits) comes from the client JVM that submits the job; HADOOP_CLASSPATH is what puts the jar on the client's classpath.
For example:
export HADOOP_CLASSPATH=/var/lib/hdfs/libthrift-0.9.jar
Hope this helps!
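A combined invocation might then look like this; the job jar and class names below are placeholders, and the thrift jar path is the one from your question:
export HADOOP_CLASSPATH=my_path/libthrift-0.9.0.jar
hadoop jar my_job.jar MyJobClass -libjars my_path/libthrift-0.9.0.jar <input_path> <output_path>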

Related

Stanford NLP coreference resolution error: Exception in thread "main" java.lang.IllegalArgumentException: File doesn't exist: example_file.txt

I downloaded the Stanford CoreNLP module version stanford-corenlp-full-2018-02-27 from the download page and unzipped it. I created an example_file.txt file in the directory where it was extracted and added the text "My name is Sam. I want to be an astronaut. I had snacks a while ago." Then I navigated to that folder and tried to run, on the command line, the example given for coreference resolution on the stanfordNLP page:
java -Xmx5g -cp stanford-corenlp-3.9.1.jar:stanford-corenlp-3.9.1-sources.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,coref -coref.algorithm neural -file example_file.txt
I am getting the error message below:
Exception in thread "main" java.lang.IllegalArgumentException: File doesn't exist: example_file.txt
at edu.stanford.nlp.io.FileSequentialCollection$FileSequentialCollectionIterator.primeNextFile(FileSequentialCollection.java:364)
at edu.stanford.nlp.io.FileSequentialCollection$FileSequentialCollectionIterator.<init>(FileSequentialCollection.java:269)
at edu.stanford.nlp.io.FileSequentialCollection.iterator(FileSequentialCollection.java:238)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1166)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1010)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1365)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1427)
Any help on this?
Java looks up resources within the classpath that you defined with the -cp option. The directory that contains example_file.txt should probably be included in it:
-cp ".:stanford-corenlp-3.9.1.jar:stanford-corenlp-3.9.1-sources.jar:*"
The dot added to the classpath means the current directory, which apparently contains your file. Also, the double quotes prevent the shell from expanding the wildcard at the end, which in my opinion should not be there unless the directory contains jars relevant to the app. At most, it could be *.jar.
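Applied to the command from the question, that gives:
java -Xmx5g -cp ".:stanford-corenlp-3.9.1.jar:stanford-corenlp-3.9.1-sources.jar:*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,coref -coref.algorithm neural -file example_file.txt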

MapReduce WordCount example issue

I was trying to run the basic WordCount example of Apache MapReduce 2.7 from here:
https://hadoop.apache.org/docs/r2.7.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0
I put the input files at : /user/hadoopLearning/WordCount/input/
Output path : /user/hadoopLearning/WordCount/output/
then I ran the following command :
hadoop jar wc.jar WordCount /user/hadoopLearning/WordCount/input/file01 /user/hadoopLearning/WordCount/output
However on running I am getting the following error:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://sandbox.hortonworks.com:8020/user/hadoopLearning/WordCount/input/file01 already exists
I haven't written a single line of code myself; I copied everything from the Apache page above.
I understand the error, but if we look at it closely, it says the output directory already exists, yet the path it actually prints is that of the input file.
Can anyone please help? I am a beginner in the field of Hadoop. Thanks in advance.
You're trying to create a file which already exists, and HDFS doesn't allow that.
Replace your output path (/user/hadoopLearning/WordCount/output) with something else and try this command:
hadoop jar wc.jar WordCount /user/hadoopLearning/WordCount/input/file01 /user/hadoopLearning/WordCount/new_output_path
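Alternatively, if you no longer need the previous results, you can delete the old output directory and keep the same path:
hadoop fs -rm -r /user/hadoopLearning/WordCount/output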

java.lang.NoClassDefFoundError with HBase Scan

I am trying to run a MapReduce job to scan an HBase table. I am currently using HBase 0.94.6, which comes with Cloudera 4.4. At some point in my program I use Scan(), and I properly import it with:
import org.apache.hadoop.hbase.client.Scan;
It compiles well and I am able to create a jar file too; I compile by passing the HBase classpath as the value for the -cp option. When running the program, I get the following message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/Scan
I run the code using:
hadoop jar my_program.jar MyJobClass -libjars <list_of_jars>
where list_of_jars contains /opt/cloudera/parcels/CDH/lib/hbase/hbase.jar. Just to double-check, I confirmed that hbase.jar contains Scan. I do it with:
jar tf /opt/cloudera/parcels/CDH/lib/hbase/hbase.jar
And I can see the line:
org/apache/hadoop/hbase/client/Scan.class
in the output. All looks OK to me, so I don't understand why it says that Scan is not defined: I pass the correct jar, and it contains the class.
Any help is appreciated.
Setting the HADOOP_CLASSPATH variable fixed the issue. -libjars only ships the jars to the map and reduce tasks; the NoClassDefFoundError in the "main" thread comes from the client JVM, and HADOOP_CLASSPATH is what puts the HBase classes on its classpath:
export HADOOP_CLASSPATH=`/usr/bin/hbase classpath`
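As a side note, when the job itself drives the scan, HBase's TableMapReduceUtil can also ship the HBase jars with the job so the tasks find Scan. A minimal sketch, where the driver class, mapper class, and table name are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "hbase-scan");       // Job(Configuration, String), as in the CDH4-era API
job.setJarByClass(MyJobClass.class);         // placeholder: your driver class
Scan scan = new Scan();
scan.setCaching(500);                        // larger scanner caching is recommended for MR jobs
scan.setCacheBlocks(false);                  // avoid polluting the region server block cache
TableMapReduceUtil.initTableMapperJob(
    "my_table",                              // placeholder: the table to scan
    scan,
    MyMapper.class,                          // placeholder: your TableMapper subclass
    Text.class,                              // mapper output key
    IntWritable.class,                       // mapper output value
    job);                                    // also adds HBase's dependency jars to the job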

Why am I not able to generate a report when running via a batch file?

I am facing a very strange issue. When I call my .jrxml file from my Java class, everything works fine, but when I call the same Java class from a batch file, the report is not generated and the console shows the error below. Please help me out with this.
The Stacktrace:
2012-12-21 16:15:13,466 net.sf.jasperreports.engine.fill.JRFillSubreport : prepare - Fill 31497899: exception java.lang.NullPointerException
at net.sf.jasperreports.engine.fill.JRFillImage.evaluateImage(JRFillImage.java:1034)
at net.sf.jasperreports.engine.fill.JRFillImage.evaluate(JRFillImage.java:1004)
at net.sf.jasperreports.engine.fill.JRFillElementContainer.evaluate(JRFillElementContainer.java:258)
at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillPageHeader(JRVerticalFiller.java:403)
at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillReportStart(JRVerticalFiller.java:264)
at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillReport(JRVerticalFiller.java:128)
at net.sf.jasperreports.engine.fill.JRFillBand.evaluate(JRFillBand.java:499)
at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillBandNoOverflow(JRVerticalFiller.java:439)
at net.sf.jasperreports.engine.fill.JRBaseFiller.fill(JRBaseFiller.java:946)
at net.sf.jasperreports.engine.fill.JRBaseFiller.fill(JRBaseFiller.java:845)
at net.sf.jasperreports.engine.fill.JRFillSubreport.fillSubreport(JRFillSubreport.java:609)
at net.sf.jasperreports.engine.fill.JRSubreportRunnable.run(JRSubreportRunnable.java:59)
at net.sf.jasperreports.engine.fill.JRThreadSubreportRunner.run(JRThreadSubreportRunner.java:205)
at java.lang.Thread.run(Thread.java:619)
My guess is that it cannot find the image. Try putting the absolute path to the image file, just to debug and see what happens. It is probably resolving the image relative to the wrong working directory when you run it from the batch file.
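One way to test this is to hand the image location to the report as a parameter holding an absolute path, so the fill step no longer depends on the batch file's working directory. A sketch; the parameter name, file paths, and JDBC URL are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.HashMap;
import java.util.Map;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;

// Placeholder JDBC URL; use whatever data source your report already fills from.
Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/reports");
Map<String, Object> params = new HashMap<String, Object>();
params.put("LOGO_PATH", "C:/reports/images/logo.png");   // absolute path to the image
JasperPrint print = JasperFillManager.fillReport(
        "C:/reports/report.jasper", params, connection); // placeholder compiled report path

In the .jrxml, the image expression would then read $P{LOGO_PATH} instead of a relative path.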

Need assistance with running the WordCount.java provided by Cloudera

Hey guys, I am trying to run the WordCount.java example provided by Cloudera. I ran the command below and am getting the exception shown under it. Do you have any suggestions on how to proceed? I have gone through all the steps provided by Cloudera.
Thanks in advance.
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount ~/Desktop/input ~/Desktop/output
Error:
ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost/home/rushabh/Desktop/input
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost/home/rushabh/Desktop/input
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1248)
at org.myorg.WordCount.main(WordCount.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Your input and output paths should be in HDFS; at least the input must be.
Use the following command:
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount hdfs:/input hdfs:/output
To copy a file from your Linux filesystem to HDFS, use the following command:
hadoop dfs -copyFromLocal ~/Desktop/input hdfs:/
and check your file using:
hadoop dfs -ls hdfs:/
Hope this will help.
The error message says that this file does not exist: "hdfs://localhost/home/rushabh/Desktop/input".
Check that the file does exist at the location you've told it to use.
Check the hostname is correct. You are using "localhost" which most likely resolves to a loopback IP address; e.g. 127.0.0.1. That always means "this host" ... in the context of the machine that you are running the code on.
When I tried to run the wordcount MapReduce code, I was getting this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/hduser/wordcount
I was trying to execute the wordcount MapReduce Java code with the input and output paths /user/hduser/wordcount and /user/hduser/wordcount-output. I just prefixed the paths with the 'fs.default.name' value from core-site.xml and it ran perfectly.
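For example, if core-site.xml sets fs.default.name to hdfs://localhost:54310 (the host and port here are just a common single-node example), the two paths become:
hdfs://localhost:54310/user/hduser/wordcount
hdfs://localhost:54310/user/hduser/wordcount-output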
The error clearly states that your input path is local. Please point the input path at something on HDFS rather than on the local machine. My guess is that
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount ~/Desktop/input ~/Desktop/output
needs to be changed to
hadoop jar ~/Desktop/wordcount.jar org.myorg.WordCount <hdfs-input-dir> <hdfs-output-dir>
NOTE: To run a MapReduce job, the input directory must be in HDFS, not local.
Hope this helps.
So I added the input folder to HDFS using the following command
hadoop dfs -put /usr/lib/hadoop/conf input/
Check the ownership of the files in HDFS to ensure that the owner of the job (root) has read privileges on the input files. Cloudera ships the standard HDFS web viewer for this: open a web browser to either localhost:50070 or {fqdn}:50070 and click on "Browse the filesystem" to view the input directory and input files. Check the ownership flags, just like on a *nix filesystem.
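The same check can be done from the shell; assuming the relative input path used in the follow-up above, the owner and permission flags show up in the first columns of:
hadoop dfs -ls input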
