how to use boilerpipe with a local html file? - java

I have an html file on my local disk and would like to extract text from it using BoilerPipe.
The "getText" method from the class ExtractorBase accepts a reader, so I wrote:
FileReader fr = new FileReader("D:/myHTMLfile");
System.out.println(ArticleExtractor.INSTANCE.getText(fr));
But then I get an error pointing to this second line of code.
Any clue? Thx!
EDIT: the whole error msg is:
Exception in thread "pool-1-thread-1" java.lang.NoClassDefFoundError: org/cyberneko/html/HTMLConfiguration
at de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.<init>(BoilerpipeHTMLParser.java:50)
at de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.<init>(BoilerpipeHTMLParser.java:41)
at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:51)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:69)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:101)
at neuromarket.BoilerPlateExtractor.run(BoilerPlateExtractor.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.cyberneko.html.HTMLConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 9 more
Exception in thread "pool-1-thread-2" java.lang.NoClassDefFoundError: org/cyberneko/html/HTMLConfiguration
at de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.<init>(BoilerpipeHTMLParser.java:50)
at de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.<init>(BoilerpipeHTMLParser.java:41)
at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:51)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:69)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:101)
at neuromarket.BoilerPlateExtractor.run(BoilerPlateExtractor.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
BUILD SUCCESSFUL (total time: 0 seconds)

You need to add nekohtml-1.x.x.jar at classpath.

Related

OpenCV Libraries in Hadoop

I wrote one simple application in which I am using OpenCV. I put all the jar files in /usr/local/hadoop/lib folder. While running the hadoop job, I am getting the following error:
java.lang.Exception: java.lang.NoClassDefFoundError: com/googlecode/javacv/cpp/opencv_core$CvArr
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.lang.NoClassDefFoundError: com/googlecode/javacv/cpp/opencv_core$CvArr
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1486)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1456)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1545)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:686)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.ClassNotFoundException: com.googlecode.javacv.cpp.opencv_core$CvArr
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more
Any idea how to resolve this?
You could try using the option libjars when running the hadoop job. Try the following:
hadoop jar yourjar.jar MainClass -libjars /path/to/external/jar
More information here.

Java cannot find .class file, or class

I have a directory structure thus:
project/
src/
videoserver/
*.java
storeclient/
*.java
bin/
videoserver/
*.class
storeclient/
*.class
My current working directory is project/src/ and I'm using the command java -cp "../bin/" videoserver.Main
This all seems right to me, but Java is telling me it cannot find the class IVideoServer. I can see IVideoServer.class in the correct folder, and it seems able to find the Main class just fine.
What could be causing Java to be unable to find IVideoServer?
The method that appears to be causing the problem is this one
public static void main(String[] args)
{
try
{
VideoServer Server = new VideoServer();
videoserver.IVideoServer stub = (IVideoServer)UnicastRemoteObject.exportObject(Server, 0);
// Bind the remote object's stub in the registry
Registry registry = LocateRegistry.getRegistry("server-url", 9090);
registry.bind("VideoServer", stub);
System.err.println("Server ready");
}
catch (Exception e)
{
System.err.println("Server exception: " + e.toString());
e.printStackTrace();
}
}
I have already compiled all of the classes involved with javac.
This is the full exception and stack trace as it is returned to me
Server exception: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: videoserver.IVideoServer
java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: videoserver.IVideoServer
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:396)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:359)
at sun.rmi.registry.RegistryImpl_Stub.bind(Unknown Source)
at videoserver.Main.main(Main.java:17)
Caused by: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: videoserver.IVideoServer
at sun.rmi.registry.RegistryImpl_Skel.dispatch(Unknown Source)
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:386)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: videoserver.IVideoServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at sun.rmi.server.LoaderHandler.loadProxyInterfaces(LoaderHandler.java:711)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:655)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:592)
at java.rmi.server.RMIClassLoader$2.loadProxyClass(RMIClassLoader.java:628)
at java.rmi.server.RMIClassLoader.loadProxyClass(RMIClassLoader.java:294)
at sun.rmi.server.MarshalInputStream.resolveProxyClass(MarshalInputStream.java:238)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1530)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1492)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
... 12 more
From the src folder, have you tried java -cp . videoserver.Main ??
try this, javap videoserver.Main

RMI, ServerException: RemoteException

I followed this tutorial:
http://download.oracle.com/javase/1.5.0/docs/guide/rmi/hello/hello-world.html#2
And created 3 simle files (interface Hello.java, and classes Server.java and Client.java)
Unfortunately when I try to run Server like this:
start java -classpath "D:\[...]\RMISample\src" example.hello.Server
I get these exceptions:
Server exception: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: example.hello.Hello
java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: example.hello.Hello
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:396)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(Unknown Source)
at sun.rmi.transport.StreamRemoteCall.executeCall(Unknown Source)
at sun.rmi.server.UnicastRef.invoke(Unknown Source)
at sun.rmi.registry.RegistryImpl_Stub.bind(Unknown Source)
at example.hello.Server.main(Server.java:26)
Caused by: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
java.lang.ClassNotFoundException: example.hello.Hello
at sun.rmi.registry.RegistryImpl_Skel.dispatch(Unknown Source)
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:386)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: example.hello.Hello
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at sun.rmi.server.LoaderHandler.loadProxyInterfaces(LoaderHandler.java:711)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:655)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:592)
at java.rmi.server.RMIClassLoader$2.loadProxyClass(RMIClassLoader.java:628)
at java.rmi.server.RMIClassLoader.loadProxyClass(RMIClassLoader.java:294)
at sun.rmi.server.MarshalInputStream.resolveProxyClass(MarshalInputStream.java:238)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1530)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1492)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
... 12 more
What can be wrong?
Greetings
When you start your server application you have to point where RMI should look for classes
you can do it using java.rmi.server.codebase property, so the right command looks like this:
java -classpath "D:\[...]\bin"
-Djava.rmi.server.codebase=file:/D:\[...]\bin/ example.hello.Server

java.rmi.ServerError: java.lang.NoClassDefFoundError: net/sf/jasperreports/engine/JasperPrint

I have a problem with running my server application which registers itself in rmi registry. In Remote interface I have method which returns JasperPrint. The exception is thrown when server calls:
server = (Server)UnicastRemoteObject.exportObject(server, 0);
Registry registry = LocateRegistry.getRegistry();
registry.rebind("server", server); // <-----------------------
I have jasperreports-3.7.5.jar in classpath of the server. Any help would be appreciated.
Stack trace:
java.rmi.ServerError: Error occurred in server thread; nested exception is:
java.lang.NoClassDefFoundError: net/sf/jasperreports/engine/JasperPrint
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:393)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:359)
at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
at server.ServerImpl.main(ServerImpl.java:126)
Caused by: java.lang.NoClassDefFoundError: net/sf/jasperreports/engine/JasperPrint
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
at java.lang.Class.privateGetPublicMethods(Class.java:2547)
at java.lang.Class.getMethods(Class.java:1410)
at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:409)
at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:306)
at java.lang.reflect.Proxy.getProxyClass(Proxy.java:501)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:680)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:669)
at sun.rmi.server.LoaderHandler.loadProxyClass(LoaderHandler.java:592)
at java.rmi.server.RMIClassLoader$2.loadProxyClass(RMIClassLoader.java:628)
at java.rmi.server.RMIClassLoader.loadProxyClass(RMIClassLoader.java:294)
at sun.rmi.server.MarshalInputStream.resolveProxyClass(MarshalInputStream.java:238)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1531)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at sun.rmi.registry.RegistryImpl_Skel.dispatch(Unknown Source)
at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:386)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:250)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: net.sf.jasperreports.engine.JasperPrint
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 30 more
Starting rmiregistry with:
rmiregistry -J-classpath -J"path\to\jasperreports-3.7.5.jar"
solved the problem.
This error can occur because of several reasons, i had this error also several times,
try some refactoring, check your stubs and skeletons are there manually and too many "server" can also create confusion

I put the xmlbeans-2.3.0.jar into path and it says another exception while reading .xslx file using apache poi

I put the xmlbeans-2.3.0.jar into path and it says another exception while reading .xslx file using apache poi.
Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:149)
at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:136)
at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:53)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:176)
at setTargetPlan.setTask(setTargetPlan.java:152)
at userVerification.main(userVerification.java:204)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 10 more
Same as your earlier problem.
For each ClassNotFound exception, search for the jar on the site below which contains this class and put the jar in your classpath
http://www.jarfinder.com/index.php/java/info/org.dom4j.DocumentException shows you need dom4j.jar

Categories

Resources