Cannot use XmlInputFormat (extends TextInputFormat) in Java

I am trying to do a WordCount using Hadoop. I want to use XmlInputFormat.class to split the file based on XML tags. The XmlInputFormat.class is here
XmlInputFormat extends TextInputFormat. I use it like this:
Job job = new Job(getConf());
job.setInputFormatClass(XmlInputFormat.class);
It shows the error
The method setInputFormatClass(Class<? extends InputFormat>) in the type Job is not applicable for the arguments (Class<XmlInputFormat>)
But it's OK when I use
Job job = new Job(getConf());
job.setInputFormatClass(TextInputFormat.class);
Why can't we use the subclass? Or did I do something wrong?

That looks like an issue with your Hadoop version. Did you check that the XmlInputFormat class you are using is actually the right one for your Hadoop version?

I think the Hadoop tutorial using the mapred library is outdated; you should take a look at:
http://wiki.apache.org/hadoop/WordCount
I could successfully run XmlInputFormat after a slight modification of the code above.
Please ignore this answer. I think the cause was that I was using the deprecated version of MapReduce, which uses mapred.*.
I had the same problem, and it's resolved when I modified one of imports:
From:
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
To:
import org.apache.hadoop.mapred.TextInputFormat;

Maybe you are importing the wrong XmlInputFormat.class in your code. The same happened to me with TextInputFormat.class; it turned out I was using the wrong import, which Eclipse had pulled in automatically. The correct class to import was:
org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
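To see why the import matters, here is a minimal driver sketch assuming the new (mapreduce) API throughout and a Mahout-style XmlInputFormat on the classpath; the xmlinput.start/xmlinput.end tag values, and the omitted mapper/reducer setup, are assumptions for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class XmlWordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("xmlinput.start", "<page>"); // assumed start tag
        conf.set("xmlinput.end", "</page>"); // assumed end tag
        Job job = new Job(conf, "xml word count");
        job.setJarByClass(XmlWordCountDriver.class);
        // Compiles only if XmlInputFormat extends the *mapreduce* TextInputFormat
        job.setInputFormatClass(XmlInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the XmlInputFormat on your classpath instead extends the old org.apache.hadoop.mapred.TextInputFormat, the setInputFormatClass call fails with exactly the error quoted above.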

Related

InOut.readInt() only working in Windows Java Editor

At school, I write Java programs with Windows’ Java Editor (console mode). There, InOut.readInt() (used for user input) works without problems, and I don’t have to import anything.
Now I have Java homework for the holidays, and I am trying to write Java programs on my Mac. In online console Java editors, the line InOut.readInt() causes this error:
/IntBetween10And100.java:8: error: cannot find symbol
int input = InOut.readInt("Integer --> ");
^
symbol: variable InOut
location: class IntBetween10And100
1 error
I already tried import lines (placed before the class) like:
import java.*
import java.util.*
import java.util.InOut
import java.io.BufferedStreamReader
import java.util.*;

public class IntBetween10And100 {
    public static void main(String[] args) {
        int input = InOut.readInt("Integer --> ");
    }
}
int input = InOut.readInt("Integer --> ");
should produce the prompt
Integer -->
but instead, the error message (seen above) appears.
OK, so you are using the "Java-Editor" tool on Windows for writing your Java code.
It turns out that Java-Editor includes a class called InOut as an example class. (You can see it here: http://javaeditor.org/doku.php?id=en:examples).
For various reasons, it is not suitable for use in production code of any kind:
It is not part of the Java SE class library, or any 3rd-party libraries.
It is a class in the default package.
It has limited functionality, even compared to the real System.in and System.out.
It would interfere with any application or 3rd party library code that uses System.in in the normal way. (It creates its own BufferedReader to wrap System.in. That is liable to capture "type-ahead" input.)
You don't really need to use it for educational purposes either. It is only a wrapper class ...
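For example, the prompt-and-read in the question can be done with the standard java.util.Scanner; a minimal sketch (class name and prompt text mirror the homework example):

import java.util.Scanner;

public class IntBetween10And100 {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.print("Integer --> "); // print the prompt without a newline
        int input = scanner.nextInt();    // read the next int token from stdin
        System.out.println("You typed: " + input);
    }
}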
However, if you want to use InOut outside of the Java-Editor context, you could simply download the source code from the page above and add it to your project. I can't tell you exactly how, but adding classes should be explained in the documentation of the tool you are using now! (If you are serious about learning Java, I would actually recommend that you download and install a real Java JDK and a real Java IDE on your own computer.)
The authors have neglected to include an explicit copyright notice in the InOut.java file. However, the Java-Editor site as a whole is licensed as "CC Attribution-Share Alike 4.0 International".

How to fix NoClassDefFoundError for Kafka Producer Example?

I am getting a NoClassDefFoundError when I try to compile and run the Producer example that comes with Kafka. How can I resolve the error discussed below?
Caveat: I am a C++/C# programmer who is Linux literate and starting to
learn Java. I can follow instructions, but may well ask for some
clarification along the way.
I have a VM sandbox from Hortonworks that is running a Red Hat appliance. On it I have a working kafka server and by following this tutorial I am able to get the desired Producer posting messages to the server.
Now I want to get down to writing my own code, but first I decided to make sure I can compile the example files that Kafka came with. After a day of trial and error, I just cannot seem to get this going.
Here is what I am doing:
I go to the directory where the example files are located and type:
javac -cp $KCORE:$KCLIENT:$SCORE:. ./*.java
$KCORE, $KCLIENT, and $SCORE resolve to the jars for the Kafka core, Kafka client, and Scala libraries respectively. Everything returns just fine with no errors and places all the class files in the current directory; however, when I follow up with
java -cp $KCORE:$KCLIENT:$SCORE:. Producer
I get a NoClassDefFoundError.
The code for the class is
package kafka.examples;

import java.util.Properties;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class Producer extends Thread
{
    private final kafka.javaapi.producer.Producer<Integer, String> producer;
    private final String topic;
    private final Properties props = new Properties();

    public Producer(String topic)
    {
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("metadata.broker.list", "localhost:9092");
        // Use random partitioner. Don't need the key type. Just set it to Integer.
        // The message is of type String.
        producer = new kafka.javaapi.producer.Producer<Integer, String>(new ProducerConfig(props));
        this.topic = topic;
    }

    public void run() {
        int messageNo = 1;
        while (true)
        {
            String messageStr = new String("Message_" + messageNo);
            producer.send(new KeyedMessage<Integer, String>(topic, messageStr));
            messageNo++;
        }
    }
}
Can anybody point me in the right direction to resolve this error? Do the classes need to go in different directories for some reason?
The package name is part of the class name you need to supply on the command line (and you run a compiled class with java, not javac):
java -cp $KCORE:$KCLIENT:$SCORE:. kafka.examples.Producer
Also, you should be standing in the root directory of your class hierarchy, which seems to be two directories up (you're currently standing in kafka/examples). Alternatively, you can use ../.. instead of . in the -cp argument to denote that the root is two directories up.
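Putting that together, a sketch of the whole compile-and-run cycle, assuming you start in the kafka/examples directory and the variables are set as above:

cd ../..
javac -cp $KCORE:$KCLIENT:$SCORE:. kafka/examples/*.java
java -cp $KCORE:$KCLIENT:$SCORE:. kafka.examples.Producer

Because javac without -d writes each .class file next to its source, the compiled classes already sit in a kafka/examples directory matching the kafka.examples package, which is exactly the layout the classloader expects when . is on the classpath.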
You might want to get familiar with using an IDE such as IntelliJ IDEA or Eclipse (and with how to use libraries in those IDEs), as this will make development much easier. Props for doing things the hard way first, though, as you'll get a better understanding of how things are stitched together (an IDE essentially figures out all those console commands for you without you noticing).

Hadoop map reduce whole file input format

I am trying to use the hadoop map reduce, but instead of mapping each line at a time in my Mapper, I would like to map a whole file at once.
So I have found these two classes
(https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/?r=3)
They are supposed to help me do this.
But I got a compilation error that says:
The method setInputFormat(Class<? extends InputFormat>) in the type JobConf is not applicable for the arguments (Class<WholeFileInputFormat>) Driver.java /ex2/src line 33 Java Problem
I changed my Driver class to be
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import forma.WholeFileInputFormat;
/*
 * Driver
 * The Driver class is responsible for creating the job and committing it.
 */
public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);
        conf.setJobName("Get minimum for each month");

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // previously it was
        // conf.setInputFormat(TextInputFormat.class);
        // and it was changed to:
        conf.setInputFormat(WholeFileInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        System.out.println("Starting Job...");
        JobClient.runJob(conf);
        System.out.println("Job Done!");
    }
}
What am I doing wrong?
Make sure your WholeFileInputFormat class has the correct imports. You are using the old MapReduce API in your job driver, and I think you imported the new-API FileInputFormat in your WholeFileInputFormat class. If I am right, you should import org.apache.hadoop.mapred.FileInputFormat in your WholeFileInputFormat class instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
Hope this helps.
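For illustration, here is a minimal sketch of what a whole-file input format looks like in the old (mapred) API, so that conf.setInputFormat(...) accepts it. This follows the common whole-file pattern rather than the exact linked code, and the key/value types are an assumption:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // hand each file to a single mapper, whole
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }

    // Emits exactly one record: the entire contents of the file.
    static class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {
        private final FileSplit split;
        private final JobConf conf;
        private boolean processed = false;

        WholeFileRecordReader(FileSplit split, JobConf conf) {
            this.split = split;
            this.conf = conf;
        }

        public boolean next(NullWritable key, BytesWritable value) throws IOException {
            if (processed) {
                return false;
            }
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        public NullWritable createKey() { return NullWritable.get(); }
        public BytesWritable createValue() { return new BytesWritable(); }
        public long getPos() { return processed ? split.getLength() : 0; }
        public void close() throws IOException { }
        public float getProgress() { return processed ? 1.0f : 0.0f; }
    }
}

With this in place, the mapper receives each file's bytes as a single BytesWritable value.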
The easiest way to do this is to gzip your input file. This will make FileInputFormat.isSplitable() return false.
We too ran into something similar and took an alternative, outside-the-box approach.
Let's say you need to process 100 large files (f1, f2, ..., f100) such that each must be read wholly in the map function. Instead of using the "WholeFileInputFormat" reader approach, we created 10 equivalent text files (p1, p2, ..., p10), each containing either the HDFS URLs or the web URLs of the f1-f100 files.
Thus p1 contains URLs for f1-f10, p2 contains URLs for f11-f20, and so on.
These new files p1 through p10 are then used as input to the mappers. Thus a mapper m1 processing file p1 will open files f1 through f10 one at a time and process each wholly, as sketched below.
This approach allowed us to control the number of mappers and write more exhaustive and complex application logic in the MapReduce application. For example, we could run NLP on PDF files using this approach.
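A rough sketch of such a mapper under the old (mapred) API, assuming each input line holds the HDFS path of one large file (all names here are hypothetical):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UrlListMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private JobConf conf;

    @Override
    public void configure(JobConf conf) {
        this.conf = conf; // keep the config so we can open HDFS files later
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Each line of p1..p10 is the path of one large file; read it whole.
        Path file = new Path(value.toString());
        FileSystem fs = file.getFileSystem(conf);
        byte[] contents = new byte[(int) fs.getFileStatus(file).getLen()];
        FSDataInputStream in = fs.open(file);
        try {
            IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        // ...run the whole-file logic here; emit name and size as a stand-in.
        output.collect(new Text(file.getName()), new Text(Integer.toString(contents.length)));
    }
}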

How can I import UnixFileAttributes?

When I execute this snippet:
FileSystem fs = FileSystems.getDefault();
for (String s : fs.supportedFileAttributeViews())
{
    System.out.println(s);
}
I get this result: "basic owner user unix dos posix"
Then when I actually try to use UnixFileAttributeView, it appears not to exist.
I imported the whole package with import java.nio.file.attribute.*;, but I also tried to import java.nio.file.attribute.UnixFileAttributeView directly, and it appears not to exist.
By contrast, I am able to import all the other attribute views I get out of fs.supportedFileAttributeViews().
Do you know why it happens? And moreover how can I fix it?
Thanks in advance.
From the docs:
PosixFileAttributeView – Extends the basic attribute view with
attributes supported on file systems that support the POSIX family of
standards, such as UNIX. These attributes include file owner, group
owner, and the nine related access permissions.
It does not seem to be possible to import it.
By googling you can find some source for the OpenJDK implementation.
I have found the simplest way to access the data to be:
Files.getAttribute(file.toPath(), "unix:uid")
You have at least these options:
dev
ino
mode
uid
gid
size
atime
mtime
ctime
Of course, you should first check Files.getFileStore(file.toPath()).supportsFileAttributeView("unix").
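To make that concrete, here is a small sketch using only the public java.nio.file API (the path below is a placeholder):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class UnixAttributesDemo {
    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/tmp/example.txt"); // placeholder path
        // The "unix" view only exists on POSIX platforms, so check first.
        if (Files.getFileStore(p).supportsFileAttributeView("unix")) {
            int uid = (Integer) Files.getAttribute(p, "unix:uid"); // single attribute
            System.out.println("uid = " + uid);
            // Or dump every attribute of the view at once:
            Map<String, Object> attrs = Files.readAttributes(p, "unix:*");
            for (Map.Entry<String, Object> e : attrs.entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}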
UnixFileAttributeView is not part of the public API bundled with Java 1.7; however, you can read its attributes as shown in the previous answers.
As an additional note, you can refer to the backport project of JSR 203, where you can find an implementation of it:
https://code.google.com/p/jsr203-backport/source/browse/trunk/src/jsr203/sun/nio/fs/UnixFileAttributeView.java

JavaCV/OpenCV: cvLoadImage not working

I installed the JavaCV/OpenCV libraries, and I'm having a problem with the basic example code.
According to several examples that I have looked at, this code should load an image:
IplImage image = cvLoadImage("C:\\img.jpg");
But when I run that, I get a "cannot find symbol" error.
Since this is my first time using it, I'm not sure if I messed the install up or not.
According to the newest JavaCV readme, I do have the correct version of OpenCV. I also have all the JavaCV jar files imported. As far as I can tell, I also have all the paths set correctly too.
Anyone know what the problem is?
Edit:
Full code:
import com.googlecode.javacv.CanvasFrame;
import com.googlecode.javacv.cpp.opencv_core.IplImage;
import java.io.File;
public class demo {
    public static void main(String[] args)
    {
        IplImage image = cvLoadImage("C:\\img.jpg");
        final CanvasFrame canvas = new CanvasFrame("Demo");
        canvas.showImage(image);
        canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE);
    }
}
Error when I try to run it:
Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: cvLoadImage
at javacv.demo.main(demo.java:17)
Java Result: 1
It seems to be claiming that cvLoadImage doesn't take a String as an argument.
A workaround that I found for you is to load the image with ImageIO and pass it to IplImage afterwards (this needs javax.imageio.ImageIO, java.awt.image.BufferedImage, and java.io.File imported), e.g.:
BufferedImage img = ImageIO.read(new File("C:\\img.jpg"));
IplImage origImg = IplImage.createFrom(img);
This solved my problem: import static org.bytedeco.javacpp.opencv_imgcodecs.*;
You have to add this import statement:
import static org.bytedeco.javacpp.opencv_imgcodecs.cvLoadImage;
This is required so that the static method cvLoadImage can be used without using the class name.
You have to import static com.googlecode.javacv.cpp.opencv_highgui.*;
With JavaCV 0.9 you have to import static org.bytedeco.javacpp.opencv_highgui.*;
I got the same error; then I imported the following package and the problem was solved:
import static com.googlecode.javacv.cpp.opencv_highgui.*;
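For completeness, here is a sketch of the question's demo class with the missing static import added, assuming the googlecode-era JavaCV the question uses:

import static com.googlecode.javacv.cpp.opencv_highgui.cvLoadImage;

import com.googlecode.javacv.CanvasFrame;
import com.googlecode.javacv.cpp.opencv_core.IplImage;

public class demo {
    public static void main(String[] args) {
        // cvLoadImage now resolves thanks to the static import above
        IplImage image = cvLoadImage("C:\\img.jpg");
        final CanvasFrame canvas = new CanvasFrame("Demo");
        canvas.showImage(image);
        canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE);
    }
}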
This might be old, but for those who stumble upon this problem like I just did, here is how I solved it and why:
First, the OP's error: Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: cvLoadImage at javacv.demo.main(demo.java:17)
This indicates that the compiler cannot find the cvLoadImage method that you are trying to call.
cvLoadImage is a static method under JavaCPP.
Specifically, it is a static method of the opencv_imgcodecs class.
To solve this issue, one must first import the opencv_imgcodecs class.
This can be done by adding the import:
import static org.bytedeco.javacpp.opencv_imgcodecs.cvLoadImage;
This in turn makes the opencv_imgcodecs class usable within your class, along with its static methods and other functions.
I hope this helps.
Got the same problem recently.
If you are using javacv-0.10 (the most recent at the moment), manually import this one:
import static org.bytedeco.javacpp.opencv_highgui.*;
Note that the project's JRE source level must be 1.5 or higher, since static imports were introduced in Java 5.
In my case, the problem happened when running in debug mode. Try running in normal mode.
