Why do I get this exception java.lang.NoClassDefFoundError? - java

I am trying to use HBase and Hadoop together. When I run the JAR file I get this error. Here is my source code:
public class TwitterTable {
final static Charset ENCODING = StandardCharsets.UTF_8;
final static String FILE_NAME = "/home/hduser/project04/sample.txt";
static class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable>
{
byte[] value;
@Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException
{
value = values.getValue(Bytes.toBytes("text"), Bytes.toBytes(""));
String valueStr = Bytes.toString(value);
System.out.println("GET: " + valueStr);
}
}
public static class Reducer1 extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
}
}
public static void main( String args[] ) throws IOException, ClassNotFoundException, InterruptedException
{
Configuration conf = new Configuration();
@SuppressWarnings("deprecation")
Job job = new Job(conf, "TwitterTable");
job.setJarByClass(TwitterTable.class);
HTableDescriptor ht = new HTableDescriptor( "twitter" );
ht.addFamily( new HColumnDescriptor("text"));
HBaseAdmin hba = new HBaseAdmin( conf );
if(!hba.tableExists("twitter"))
{
hba.createTable( ht );
System.out.println( "Table Created!" );
}
//Read the file and add to the database
TwitterTable getText = new TwitterTable();
Scan scan = new Scan();
String columns = "text";
scan.addColumn(Bytes.toBytes(columns), Bytes.toBytes(""));
TableMapReduceUtil.initTableMapperJob("twitter", scan, Mapper1.class, ImmutableBytesWritable.class,
IntWritable.class, job);
job.waitForCompletion(true);
//getText.readTextFile(FILE_NAME);
}
void readTextFile(String aFileName) throws IOException
{
Path path = Paths.get(aFileName);
try (BufferedReader reader = Files.newBufferedReader(path, ENCODING)){
String line = null;
while ((line = reader.readLine()) != null) {
//process each line in some way
addToTable(line);
}
}
System.out.println("all done!");
}
void addToTable(String line) throws IOException
{
Configuration conf = new Configuration();
HTable table = new HTable(conf, "twitter");
String LineText[] = line.split(",");
String row = "";
String text = "";
row = LineText[0].toString();
row = row.replace("\"", "");
text = LineText[1].toString();
text = text.replace("\"", "");
Put put = new Put(Bytes.toBytes(row));
put.addColumn(Bytes.toBytes("text"), Bytes.toBytes(""), Bytes.toBytes(text));
table.put(put);
table.flushCommits();
table.close();
}
}
I added the classpath to hadoop-env.sh, but still no luck. I don't know what the problem is. Here is my hadoop-env.sh classpath:
export HADOOP_CLASSPATH=
/usr/lib/hbase/hbase-1.0.0/lib/hbase-common-1.0.0.jar:
/usr/lib/hbase/hbase-1.0.0/lib/hbase-client.jar:
/usr/lib/hbase/hbase-1.0.0/lib/log4j-1.2.17.jar:
/usr/lib/hbase/hbase-1.0.0/lib/hbase-it-1.0.0.jar:
/usr/lib/hbase/hbase-1.0.0/lib/hbase-common-1.0.0-tests.jar:
/usr/lib/hbase/hbase-1.0.0/conf:
/usr/lib/hbase/hbase-1.0.0/lib/zookeeper-3.4.6.jar:
/usr/lib/hbase/hbase-1.0.0/lib/protobuf-java-2.5.0.jar:
/usr/lib/hbase/hbase-1.0.0/lib/guava-12.0.1.jar

OK, I found it. It seems you cannot add everything to the classpath this way. In that case, copy all of the HBase libraries into a directory that Hadoop already puts on its classpath (see hadoop-env.sh), for example:
HADOOP_DIR/contrib/capacity-scheduler
It worked for me.
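For what it's worth, HBase can also ship its own dependency jars with the job, which avoids hand-maintaining HADOOP_CLASSPATH for the task side. Below is a minimal, untested sketch of the job setup against the HBase 1.x API; TwitterTableDriver is just a hypothetical name for the driver class, and the client JVM that launches it still needs the HBase jars on its classpath (for example via export HADOOP_CLASSPATH=$(hbase classpath)).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class TwitterTableDriver {
    public static void main(String[] args) throws Exception {
        // HBaseConfiguration.create() also loads hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "TwitterTable");
        job.setJarByClass(TwitterTable.class);

        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("text"), Bytes.toBytes(""));

        // initTableMapperJob wires up the mapper and, by default, calls
        // addDependencyJars(), so the HBase/ZooKeeper/protobuf/Guava jars are
        // packaged with the job and the task JVMs can resolve them.
        TableMapReduceUtil.initTableMapperJob("twitter", scan,
                TwitterTable.Mapper1.class, ImmutableBytesWritable.class,
                IntWritable.class, job);
        TableMapReduceUtil.addDependencyJars(job); // explicit call, harmless to repeat

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}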

Related

How to generate .dot file using Schemacrawler

Using SchemaCrawler, I've generated an HTML file:
public final class ExecutableExample {
public static void main(final String[] args) throws Exception {
// Set log level
new LoggingConfig(Level.OFF);
final LimitOptionsBuilder limitOptionsBuilder = LimitOptionsBuilder.builder()
.includeSchemas(new IncludeAll())
.includeTables(new IncludeAll());
final LoadOptionsBuilder loadOptionsBuilder =
LoadOptionsBuilder.builder()
// Set what details are required in the schema - this affects the
// time taken to crawl the schema
.withSchemaInfoLevel(SchemaInfoLevelBuilder.standard());
final SchemaCrawlerOptions options =
SchemaCrawlerOptionsBuilder.newSchemaCrawlerOptions()
.withLimitOptions(limitOptionsBuilder.toOptions())
.withLoadOptions(loadOptionsBuilder.toOptions());
final Path outputFile = getOutputFile(args);
final OutputOptions outputOptions =
OutputOptionsBuilder.newOutputOptions(TextOutputFormat.html, outputFile);
final String command = "schema";
try (Connection connection = getConnection()) {
final SchemaCrawlerExecutable executable = new SchemaCrawlerExecutable(command);
executable.setSchemaCrawlerOptions(options);
executable.setOutputOptions(outputOptions);
executable.setConnection(connection);
executable.execute();
}
System.out.println("Created output file, " + outputFile);
}
private static Connection getConnection() {
final String connectionUrl = "jdbc:postgresql://localhost:5433/table_accounts";
final DatabaseConnectionSource dataSource = new DatabaseConnectionSource(connectionUrl);
dataSource.setUserCredentials(new SingleUseUserCredentials("postgres", "new_password"));
return dataSource.get();
}
private static Path getOutputFile(final String[] args) {
final String outputfile;
if (args != null && args.length > 0 && !isBlank(args[0])) {
outputfile = args[0];
} else {
outputfile = "./schemacrawler_output.html";
}
final Path outputFile = Paths.get(outputfile).toAbsolutePath().normalize();
return outputFile;
}
}
But I want the output as a .dot file that contains the diagram (graph, nodes, edges, etc.). How can I do that with my code, or in some other way with Java?
Simply change the output format from TextOutputFormat.html to DiagramOutputFormat.scdot.
Sualeh Fatehi, SchemaCrawler
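To make that concrete, here is a hedged sketch of the only lines in ExecutableExample.main() that need to change. The package of DiagramOutputFormat differs between SchemaCrawler releases, so check the version you have; the output file name here is just an example.

// Write Graphviz DOT instead of HTML; everything else in main() stays the same.
final Path outputFile = Paths.get("./schemacrawler_output.dot").toAbsolutePath().normalize();
final OutputOptions outputOptions =
        OutputOptionsBuilder.newOutputOptions(DiagramOutputFormat.scdot, outputFile);

The scdot format emits a plain DOT file with the graph, node and edge statements, which Graphviz can then render, e.g. dot -Tpng schemacrawler_output.dot -o schema.png.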

How do I import and use a class in Mapper in Hadoop?

I have a class PorterStemmer which I would like to use in my Mapper. My driver class contains the Mapper and Reducer too. I tried putting the PorterStemmer class inside the driver class, but Hadoop threw a ClassNotFoundException at runtime. I also tried putting PorterStemmer in a JAR and adding it to the distributed cache, but then I got a compile-time error because PorterStemmer wasn't available to the driver class. Is there any way I can get around this problem?
Here is my Driver class
public class InvertedIndex {
public static class IndexMapper extends Mapper<Object, Text, Text, Text>{
private Text word = new Text();
private Text filename = new Text();
private boolean caseSensitive = false;
public static PorterStemmer stemmer = new PorterStemmer();
String token;
@Override
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String filenameStr = ((FileSplit) context.getInputSplit()).getPath().getName();
filename = new Text(filenameStr);
String line = value.toString();
if (!caseSensitive) {
line = line.toLowerCase();
}
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
token = tokenizer.nextToken();
stemmer.add(token.toCharArray(), token.length());
stemmer.stem();
token =stemmer.toString();
word.set(token);
context.write(word, filename);
}
}
}
public static class IndexReducer extends Reducer<Text,Text,Text,Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
StringBuilder stringBuilder = new StringBuilder();
for (Text value : values) {
stringBuilder.append(value.toString());
if (values.iterator().hasNext()) {
stringBuilder.append(" -> ");
}
}
context.write(key, new Text(stringBuilder.toString()));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "inverted index");
job.addCacheFile(new Path("/invertedindex/lib/stemmer.jar").toUri());
job.setJarByClass(InvertedIndex.class);
/* Field separator for reducer output*/
job.getConfiguration().set("mapreduce.output.textoutputformat.separator", " | ");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(IndexMapper.class);
job.setCombinerClass(IndexReducer.class);
job.setReducerClass(IndexReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path inputFilePath = new Path(args[0]);
Path outputFilePath = new Path(args[1]);
FileInputFormat.addInputPath(job, inputFilePath);
FileOutputFormat.setOutputPath(job, outputFilePath);
/* Delete output filepath if already exists */
FileSystem fs = FileSystem.newInstance(conf);
if (fs.exists(outputFilePath)) {
fs.delete(outputFilePath, true);
}
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Either build a fat JAR with all the dependencies, or distribute the JAR to the nodes using the process below.
Use -libjars to distribute the JAR you depend on to all the nodes. It is then added to the classpath of each task node and picked up by the mapper or reducer:
hadoop jar yourJar.jar com.JobClass -libjars /path/of/stemmer.jar
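One caveat worth adding: -libjars is a generic Hadoop option, and it is only parsed when the driver is launched through ToolRunner (which hands the arguments to GenericOptionsParser). A minimal sketch of that driver skeleton, keeping the rest of the job setup from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InvertedIndex extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars and the other generic options
        Job job = Job.getInstance(getConf(), "inverted index");
        job.setJarByClass(InvertedIndex.class);
        // ... same mapper/reducer/input/output setup as in the question ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new InvertedIndex(), args));
    }
}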

Reduce method in Reducer class is not executing

In the code below, the reduce method inside the reducer class is not executing. Please help me. In my reduce method I want to write the output to multiple files, so I have used MultipleOutputs.
public class DataValidation {
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
int flag = 1;
boolean result;
private HashMap<String, FileConfig> fileConfigMaps = new HashMap<String, FileConfig>();
private HashMap<String, List<LineValidator>> mapOfValidators = new HashMap<String, List<LineValidator>>();
private HashMap<String, List<Processor>> mapOfProcessors = new HashMap<String, List<Processor>>();
protected void setup(Context context) throws IOException {
System.out.println("configure inside map class");
ConfigurationParser parser = new ConfigurationParser();
Config config = parser.parse(new Configuration());
List<FileConfig> file = config.getFiles();
for (FileConfig f : file) {
try {
fileConfigMaps.put(f.getName(), f);
System.out.println("quotes in" + f.isQuotes());
System.out.println("file from xml : " + f.getName());
ValidationBuilder builder = new ValidationBuilder();
// ProcessorBuilder constructor = new ProcessorBuilder();
List<LineValidator> validators;
validators = builder.build(f);
// List<Processor> processors = constructor.build(f);
mapOfValidators.put(f.getName(), validators);
// mapOfProcessors.put(f.getName(),processors);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// String filename = ((FileSplit) context.getInputSplit()).getPath()
// .getName();
FileSplit fs = (FileSplit) context.getInputSplit();
String fileName = fs.getPath().getName();
System.out.println("filename : " + fileName);
String line = value.toString();
String[] csvDataArray = null;
List<LineValidator> lvs = mapOfValidators.get(fileName);
flag = 1;
csvDataArray = line.split(",", -1);
FileConfig fc = fileConfigMaps.get(fileName);
System.out.println("filename inside fileconfig " + fc.getName());
System.out.println("quote values" + fc.isQuotes());
if (fc.isQuotes()) {
for (int i = 0; i < csvDataArray.length; i++) {
csvDataArray[i] = csvDataArray[i].replaceAll("\"", "");
}
}
for (LineValidator lv : lvs) {
if (flag == 1) {
result = lv.validate(csvDataArray, fileName);
if (result == false) {
String write = line + "," + lv.getFailureDesc();
System.out.println("write" + write);
System.out.println("key" + new Text(fileName));
// output.collect(new Text(filename), new Text(write));
context.write(new Text(fileName), new Text(write));
flag = 0;
if (lv.stopValidation(csvDataArray) == true) {
break;
}
}
}
}
}
protected void cleanup(Context context) {
System.out.println("clean up in mapper");
}
}
public static class Reduce extends Reducer<Text, Text, NullWritable, Text> {
protected void reduce(Text key, Iterator<Text> values, Context context)
throws IOException, InterruptedException {
System.out.println("inside reduce method");
while (values.hasNext()) {
System.out.println(" Nullwritable value" + NullWritable.get());
System.out.println("key inside reduce method" + key.toString());
context.write(NullWritable.get(), values.next());
// out.write(NullWritable.get(), values.next(), "/user/hadoop/"
// + context.getJobID() + "/" + key.toString() + "/part-");
}
}
}
public static void main(String[] args) throws Exception {
System.out.println("hello");
Configuration configuration = getConf();
Job job = Job.getInstance(configuration);
job.setJarByClass(DataValidation.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
private static Configuration getConf() {
return new Configuration();
}
}
You have not properly overridden the reduce method: the new API passes an Iterable, not an Iterator. Use this signature:
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException
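As a sketch, the corrected Reduce class from the question would look like this; adding @Override is what makes the compiler catch the wrong signature instead of silently running the default identity reduce:

public static class Reduce extends Reducer<Text, Text, NullWritable, Text> {

    // With Iterator<Text> this method only overloads Reducer.reduce(), so it
    // is never called; Iterable<Text> matches the real signature.
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(NullWritable.get(), value);
        }
    }
}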

Hadoop mapper is never called, custom input format might be the issue

So I am doing a little test program just to get the hang of Hadoop's InputFormat classes. I had a word search already built which took in lines as values and searched for the keyword line by line. I wanted to see if I could get Hadoop to take in values word by word, but Hadoop doesn't seem to like that and keeps giving me results from the default mapper. My mapper's initialize function is never even called.
I do know my record reader is called and that it is doing more or less what it is supposed to, and I'm pretty sure the output of the record reader is what my mapper is searching for, so why does Hadoop decide not to call it?
Here is the relevant code
Input Format Class
public class WordReader extends FileInputFormat<Text, Text> {
@Override
public RecordReader<Text, Text> createRecordReader(InputSplit split,
TaskAttemptContext context) {
return new MyWholeFileReader();
}
}
Record Reader
public class MyWholeFileReader extends RecordReader<Text, Text> {
private long start;
private LineReader in;
private Text key = null;
private Text value = null;
private ArrayList<String> outputvalues;
public void initialize(InputSplit genericSplit,
TaskAttemptContext context) throws IOException {
outputvalues = new ArrayList<String>();
FileSplit split = (FileSplit) genericSplit;
Configuration job = context.getConfiguration();
start = split.getStart();
final Path file = split.getPath();
// open the file and seek to the start of the split
FileSystem fs = file.getFileSystem(job);
FSDataInputStream fileIn = fs.open(split.getPath());
in = new LineReader(fileIn, job);
if (key == null) {
key = new Text();
}
key.set(split.getPath().getName());
if (value == null) {
value = new Text();
}
}
public boolean nextKeyValue() throws IOException {
if (outputvalues.size() == 0) {
Text buffer = new Text();
int i = in.readLine(buffer);
String str = buffer.toString();
for (String vals : str.split(" ")) {
outputvalues.add(vals);
}
if (i == 0 || outputvalues.size() == 0) {
key = null;
value = null;
return false;
}
}
value.set(outputvalues.remove(0));
System.out.println(value.toString());
return true;
}
@Override
public Text getCurrentKey() {
return key;
}
@Override
public Text getCurrentValue() {
return value;
}
/**
*
* Get the progress within the split
*/
public float getProgress() {
return 0.0f;
}
public synchronized void close() throws IOException {
if (in != null) {
in.close();
}
}
}
Mapper
public class WordSearchMapper extends Mapper<Text, Text, OutputCollector<Text,IntWritable>, Reporter> {
static String keyword;
BloomFilter<String> b;
public void configure(JobContext jobConf) {
keyword = jobConf.getConfiguration().get("keyword");
System.out.println("keyword>> " + keyword);
b = new BloomFilter<String>(.01,10000);
b.add(keyword);
System.out.println(b.getExpectedBitsPerElement());
}
public void map(Text key, Text value, OutputCollector<Text,IntWritable> output,
Reporter reporter) throws IOException {
int wordPos;
System.out.println("value.toString()>> " + value.toString());
System.out.println(((FileSplit) reporter.getInputSplit()).getPath()
.getName());
String[] tokens = value.toString().split("[\\p{P} \\t\\n\\r]");
for (String st :tokens) {
if (b.contains(st)) {
if (value.toString().contains(keyword)) {
System.out.println("Found one");
wordPos = ((Text) value).find(keyword);
output.collect(value, new IntWritable(wordPos));
}
}
}
}
}
Driver:
public class WordSearch {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf,"WordSearch");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(WordSearchMapper.class);
job.setInputFormatClass( WordReader.class);
job.setOutputFormatClass(TextOutputFormat.class);
conf.set("keyword", "the");
FileInputFormat.setInputPaths(job, new Path("search.txt"));
FileOutputFormat.setOutputPath(job, new Path("outputs"+System.currentTimeMillis()));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
And I figured it out... this is why Hadoop needs to stop supporting multiple versions of itself, or why I should stop jamming multiple tutorials together. It turns out my mapper needs to be declared like this for the way my mapper and record reader interact:
public class WordSearchMapper extends Mapper<Text, Text, Text, IntWritable> { static String keyword;
I only realized this after looking at my imports and seeing that Reporter came from the org.apache.hadoop.mapred package, as opposed to org.apache.hadoop.mapreduce.
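For reference, a sketch of the mapper written purely against the new org.apache.hadoop.mapreduce API (the Text/IntWritable output types are inferred from the original map body, and the BloomFilter check is left out for brevity):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordSearchMapper extends Mapper<Text, Text, Text, IntWritable> {

    private String keyword;

    // setup(Context) is the new-API replacement for configure(JobConf)
    @Override
    protected void setup(Context context) {
        keyword = context.getConfiguration().get("keyword");
    }

    // Context replaces the old-API OutputCollector/Reporter parameters
    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        if (keyword != null && value.toString().contains(keyword)) {
            int wordPos = value.find(keyword);
            context.write(value, new IntWritable(wordPos));
        }
    }
}

Since the map output value is now IntWritable, the driver also needs job.setMapOutputKeyClass(Text.class) and job.setMapOutputValueClass(IntWritable.class), and the keyword has to be set on the job's own configuration before submission (e.g. job.getConfiguration().set("keyword", "the")), because Job copies the Configuration it is given.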

ClassCast Error while writing to Cassandra from hadoop job

I am running a hadoop job and trying to write the output to Cassandra. I am getting following exception:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.nio.ByteBuffer
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
I modeled my map reduce code on the WordCount example given at https://wso2.org/repos/wso2/trunk/carbon/dependencies/cassandra/contrib/word_count/src/WordCount.java
Here's my MR code:
public class SentimentAnalysis extends Configured implements Tool {
static final String KEYSPACE = "Travel";
static final String OUTPUT_COLUMN_FAMILY = "Keyword_PtitleId";
public static class Map extends Mapper<LongWritable, Text, Text, LongWritable> {
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
Sentiment sentiment = null;
try {
sentiment = (Sentiment) PojoMapper.fromJson(line, Sentiment.class);
} catch(Exception e) {
return;
}
if(sentiment != null && sentiment.isLike()) {
word.set(sentiment.getNormKeyword());
context.write(word, new LongWritable(sentiment.getPtitleId()));
}
}
}
public static class Reduce extends Reducer<Text, LongWritable, ByteBuffer, List<Mutation>> {
private ByteBuffer outputKey;
public void reduce(Text key, Iterator<LongWritable> values, Context context) throws IOException, InterruptedException {
List<Long> ptitles = new ArrayList<Long>();
java.util.Map<Long, Integer> ptitleToFrequency = new HashMap<Long, Integer>();
while (values.hasNext()) {
Long value = values.next().get();
ptitles.add(value);
}
for(Long ptitle : ptitles) {
if(ptitleToFrequency.containsKey(ptitle)) {
ptitleToFrequency.put(ptitle, ptitleToFrequency.get(ptitle) + 1);
}
else {
ptitleToFrequency.put(ptitle, 1);
}
}
byte[] keyBytes = key.getBytes();
outputKey = ByteBuffer.wrap(Arrays.copyOf(keyBytes, keyBytes.length));
for(Long ptitle : ptitleToFrequency.keySet()) {
context.write(outputKey, Collections.singletonList(getMutation(new Text(ptitle.toString()), ptitleToFrequency.get(ptitle))));
}
}
private static Mutation getMutation(Text word, int sum)
{
Column c = new Column();
byte[] wordBytes = word.getBytes();
c.name = ByteBuffer.wrap(Arrays.copyOf(wordBytes, wordBytes.length));
c.value = ByteBuffer.wrap(String.valueOf(sum).getBytes());
c.timestamp = System.currentTimeMillis() * 1000;
Mutation m = new Mutation();
m.column_or_supercolumn = new ColumnOrSuperColumn();
m.column_or_supercolumn.column = c;
return m;
}
}
public static void main(String[] args) throws Exception {
int ret = ToolRunner.run(new SentimentAnalysis(), args);
System.exit(ret);
}
@Override
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "SentimentAnalysis");
job.setJarByClass(SentimentAnalysis.class);
String inputFile = args[0];
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(ByteBuffer.class);
job.setOutputValueClass(List.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
job.setInputFormatClass(TextInputFormat.class);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, OUTPUT_COLUMN_FAMILY);
FileInputFormat.setInputPaths(job, inputFile);
ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
}
If you look at the Reduce class, I am converting the Text key to a ByteBuffer properly.
I would appreciate some pointers on how to fix this.
After some trial and error, I was able to figure out how to solve this particular issue. In my reduce method signature I was using Iterator instead of Iterable, so my reducer was never called; Hadoop instead wrote the mapper output (Text, LongWritable) to Cassandra against the reducer's output key/value classes (ByteBuffer, List), which caused the ClassCastException.
Changing the reduce method signature to Iterable solved the issue.
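For completeness, a sketch of the corrected reduce method for the Reduce class above; only the signature (Iterable plus @Override) and the iteration change, the mutation-building logic is the same as in the question:

// Drop-in replacement for reduce() in Reduce. @Override forces the compiler
// to verify this really overrides Reducer.reduce() instead of overloading it.
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
    java.util.Map<Long, Integer> ptitleToFrequency = new HashMap<Long, Integer>();
    for (LongWritable value : values) {
        Long ptitle = value.get();
        Integer count = ptitleToFrequency.get(ptitle);
        ptitleToFrequency.put(ptitle, count == null ? 1 : count + 1);
    }
    // copy only the valid bytes of the Text key into the ByteBuffer row key
    ByteBuffer outputKey = ByteBuffer.wrap(Arrays.copyOf(key.getBytes(), key.getLength()));
    for (java.util.Map.Entry<Long, Integer> entry : ptitleToFrequency.entrySet()) {
        context.write(outputKey, Collections.singletonList(
                getMutation(new Text(entry.getKey().toString()), entry.getValue())));
    }
}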
