In my project I need to use NER annotation, so I used NERDemo.java.
It works fine when I create a new project containing only this code, but when I add it to my existing project I keep getting errors. I have edited the paths in my code to point to the specific location of the classifiers.
I added the JAR files.
This is the code:
String serializedClassifier = "/Users/ha/stanford-ner-2014-10-26/classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/english.muc.7class.distsim.crf.ser.gz";
if (args.length > 0) {
    serializedClassifier = args[0];
}
NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, serializedClassifier, serializedClassifier2);
String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/StanfordPOSCode/src/stanfordposcode/input.txt");
List<List<CoreLabel>> out = classifier.classify(fileContents);
int i = 0;
for (List<CoreLabel> lcl : out) {
    i++;
    int j = 0;
    for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
    }
}
But I got this error:
run:
Loading classifier from /Users/ha/stanford-ner-2014-10-26/classifiers/english.all.3class.distsim.crf.ser.gz ... Loading distsim lexicon from /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters ... java.lang.RuntimeException: java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (No such file or directory)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:225)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory.iterator(ReaderIteratorFactory.java:98)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:404)
at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:242)
at edu.stanford.nlp.ie.NERFeatureFactory.initLexicon(NERFeatureFactory.java:471)
at edu.stanford.nlp.ie.NERFeatureFactory.init(NERFeatureFactory.java:379)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.reinit(AbstractSequenceClassifier.java:171)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2630)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1620)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1736)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1679)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1662)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2851)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:189)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:125)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:52)
at stanfordposcode.MultipleNERs.main(MultipleNERs.java:24)
Caused by: java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:131)
at edu.stanford.nlp.io.EncodingFileReader.<init>(EncodingFileReader.java:78)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:192)
... 18 more
Loading classifier from /Users/ha/stanford-ner-2014-10-26/classifiers/english.all.3class.distsim.crf.ser.gz ... Exception in thread "main" java.io.FileNotFoundException
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:199)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:125)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:52)
at stanfordposcode.MultipleNERs.main(MultipleNERs.java:24)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to edu.stanford.nlp.classify.LinearClassifier
at edu.stanford.nlp.ie.ner.CMMClassifier.loadClassifier(CMMClassifier.java:1070)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1620)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1736)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1679)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1662)
at edu.stanford.nlp.ie.ner.CMMClassifier.getClassifier(CMMClassifier.java:1116)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:195)
... 4 more
Java Result: 1
BUILD SUCCESSFUL (total time: 1 second)
You are mixing and matching the code from version 3.4 and the models from version 3.5. I suggest upgrading everything to the latest version.
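For reference, once everything is on the same release, a minimal single-classifier check in the style of NERDemo should load cleanly. This is just a sketch using the paths from the question; the fix is the version alignment, not the code:

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class NERVersionCheck {
    public static void main(String[] args) {
        // The model loaded here must ship with the SAME Stanford NER release
        // as the stanford-ner jar on the classpath (e.g. both 3.5.x).
        String serializedClassifier =
                "/Users/ha/stanford-ner-2014-10-26/classifiers/english.all.3class.distsim.crf.ser.gz";
        AbstractSequenceClassifier<CoreLabel> classifier =
                CRFClassifier.getClassifierNoExceptions(serializedClassifier);
        System.out.println(classifier.classifyToString("Barack Obama was born in Hawaii."));
    }
}

If this loads, the NERClassifierCombiner code from the question should work against the same pair of model files.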
Following the Ignite readme page https://apacheignite.readme.io/docs/cache-queries#section-scan-queries, I am trying to run the example code discussed there.
IgniteCache<Long, Person> cache = ignite.cache("mycache");
// Find only persons earning more than 1,000.
try (QueryCursor<Cache.Entry<Long, Person>> cursor =
        cache.query(new ScanQuery<Long, Person>((k, p) -> p.getSalary() > 1000))) {
    for (Cache.Entry<Long, Person> entry : cursor)
        System.out.println("Key = " + entry.getKey() + ", Value = " + entry.getValue());
}
I am getting the following exception
Caused by: class org.apache.ignite.binary.BinaryInvalidTypeException: examples.model.Person
at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:707)
at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
at org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:798)
at org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinary(CacheObjectUtils.java:177)
at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinaryIfNeeded(CacheObjectUtils.java:39)
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.advance(GridCacheQueryManager.java:3063)
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.onHasNext(GridCacheQueryManager.java:2965)
at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.runQuery(GridCacheQueryManager.java:1141)
at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryRequest(GridCacheDistributedQueryManager.java:234)
at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:109)
at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:107)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
... 3 more
Caused by: java.lang.ClassNotFoundException: examples.model.Person
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8771)
at org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:349)
at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:698)
The Person object is taken from the Ignite example, as is most of the code (available on GitHub: https://gist.github.com/alexterman/075d7e12f470ce873f99d59478260250). I am running it on a Mac.
This means you do not have the Person class on the classpath of your server node(s).
Ignite does not peer-class-load the classes of cache keys and values, so you need to deploy them to all nodes before running any distributed operations that use those types.
Alternatively, you can use withKeepBinary() and work on BinaryObjects, along the lines of:
cache.withKeepBinary().query(new ScanQuery<Long, BinaryObject>(
        (k, p) -> p.<Integer>field("salary") > 1000))
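A fuller sketch of the same idea, assuming ignite is a started Ignite instance and "mycache" is the cache from the question (in the example model salary may be a double rather than an Integer, so adjust the field type accordingly):

IgniteCache<Long, BinaryObject> binCache = ignite.cache("mycache").withKeepBinary();
// The filter and the loop read fields off BinaryObject without ever
// deserializing Person, so the server nodes do not need the Person class.
try (QueryCursor<Cache.Entry<Long, BinaryObject>> cursor =
        binCache.query(new ScanQuery<Long, BinaryObject>(
                (k, p) -> p.<Integer>field("salary") > 1000))) {
    for (Cache.Entry<Long, BinaryObject> entry : cursor)
        System.out.println("Key = " + entry.getKey()
                + ", salary = " + entry.getValue().<Integer>field("salary"));
}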
I get "java.lang.ClassCastException: Cannot cast org.apache.uima.jcas.tcas.Annotation to java.lang.String" exception after updating the ruta version from 2.5.0 to 2.6.1 (Installtion Details attached). How to resolve this exception ? The exception occurs while executing the below script. Initializing Process taking more time.
Script:
FOREACH(hlev) Headinglevel{}
{
    Document{->tagClass="",onlyClass=true};
    hlev{-> ASSIGN(tagClass,tagName + "." + className), Headinglevel.class = tagClass}
        <-{TagName{->MATCHEDTEXT(tagName)} # ClassName{->MATCHEDTEXT(className)};};
    CssDefinition{->onlyClass=false, family = CssDefinition.fontfamily, size = CssDefinition.fontsize, color = CssDefinition.fontcolor, bold = CssDefinition.bold, italic = CssDefinition.italic, underline = CssDefinition.underline, case = CssDefinition.case}
        <-{CssStyles{PARSE(cssStylesStr),IF(contains(cssStylesStr,tagClass))};};
}
Stacktrace:
Aug 02, 2018 12:07:25 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(434)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:318)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
at org.apache.uima.ruta.ide.launching.RutaLauncher.processFile(RutaLauncher.java:242)
at org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:191)
Caused by: java.lang.ClassCastException: Cannot cast org.apache.uima.jcas.tcas.Annotation to java.lang.String
at java.lang.Class.cast(Class.java:3369)
at org.apache.uima.ruta.RutaEnvironment.getVariableValue(RutaEnvironment.java:866)
at org.apache.uima.ruta.expression.string.StringVariableExpression.getStringValue(StringVariableExpression.java:38)
at org.apache.uima.ruta.rule.RutaLiteralMatcher.getMatchingAnnotations(RutaLiteralMatcher.java:51)
at org.apache.uima.ruta.rule.RutaLiteralMatcher.getMatchingAnnotations(RutaLiteralMatcher.java:33)
at org.apache.uima.ruta.rule.RutaRuleElement.getAnchors(RutaRuleElement.java:51)
at org.apache.uima.ruta.rule.RutaRuleElement.startMatch(RutaRuleElement.java:59)
at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:76)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:63)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:54)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:36)
at org.apache.uima.ruta.block.ForEachBlock.apply(ForEachBlock.java:92)
at org.apache.uima.ruta.block.RutaScriptBlock.apply(RutaScriptBlock.java:67)
at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:56)
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
...
Instead of using the FOREACH variable hlev, I used the annotation name directly:
FOREACH(hlev) Headinglevel{}
{
    Document{->tagClass="",onlyClass=true};
    Headinglevel{-> ASSIGN(tagClass,tagName + "." + className), Headinglevel.class = tagClass}
        <-{TagName{->MATCHEDTEXT(tagName)} # ClassName{->MATCHEDTEXT(className)};};
    CssDefinition{->onlyClass=false, family = CssDefinition.fontfamily, size = CssDefinition.fontsize, color = CssDefinition.fontcolor, bold = CssDefinition.bold, italic = CssDefinition.italic, underline = CssDefinition.underline, case = CssDefinition.case}
        <-{CssStyles{PARSE(cssStylesStr),IF(contains(cssStylesStr,tagClass))};};
}
I have a pre-trained model such as Inception-v3, and I want to remove the output layer and use the rest for image recognition, following the example given by TensorFlow.
The Python framework Keras has a method like model.layers.pop() for this. I tried to do the same with the TensorFlow Java API. First I tried dl4j, but when I imported the Keras model, I got an error like this:
2017-06-15 21:15:43 INFO KerasInceptionV3Net:52 - Importing Inception model from data/inception-model.json
2017-06-15 21:15:43 INFO KerasInceptionV3Net:53 - Importing Weights model from data/inception_v3_complete
Exception in thread "main" java.lang.RuntimeException: Unknown exception.
at org.bytedeco.javacpp.hdf5$H5File.allocate(Native Method)
at org.bytedeco.javacpp.hdf5$H5File.<init>(hdf5.java:12713)
at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.<init>(Hdf5Archive.java:61)
at org.deeplearning4j.nn.modelimport.keras.KerasModel$ModelBuilder.weightsHdf5Filename(KerasModel.java:603)
at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:176)
at edu.usc.irds.dl.dl4j.examples.KerasInceptionV3Net.<init>(KerasInceptionV3Net.java:55)
at edu.usc.irds.dl.dl4j.examples.KerasInceptionV3Net.main(KerasInceptionV3Net.java:108)
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
#000: C:\autotest\HDF5110ReleaseRWDITAR\src\H5F.c line 579 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#001: C:\autotest\HDF5110ReleaseRWDITAR\src\H5Fint.c line 1100 in H5F_open(): unable to open file: time = Thu Jun 15 21:15:44 2017,name = 'data/inception_v3_complete', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#002: C:\autotest\HDF5110ReleaseRWDITAR\src\H5FD.c line 812 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#003: C:\autotest\HDF5110ReleaseRWDITAR\src\H5FDsec2.c line 348 in H5FD_sec2_open(): unable to open file: name = 'data/inception_v3_complete', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
major: File accessibilty
minor: Unable to open file
So I went back to TensorFlow. My plan is to modify the model in Keras and convert it to a TensorFlow graph. Here is my conversion script:
input_fld = './'
output_node_names_of_input_network = ["pred0"]
write_graph_def_ascii_flag = True
output_node_names_of_final_network = 'output_node'
output_graph_name = 'test2.pb'
from keras.models import load_model
import tensorflow as tf
import os
import os.path as osp
from keras.applications.inception_v3 import InceptionV3
from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
output_fld = input_fld + 'tensorflow_model/'
if not os.path.isdir(output_fld):
    os.mkdir(output_fld)
net_model = InceptionV3(weights='imagenet', include_top=True)
num_output = len(output_node_names_of_input_network)
pred = [None]*num_output
pred_node_names = [None]*num_output
for i in range(num_output):
    pred_node_names[i] = output_node_names_of_final_network+str(i)
    pred[i] = tf.identity(net_model.output[i], name=pred_node_names[i])
print('output nodes names are: ', pred_node_names)
from keras import backend as K
sess = K.get_session()
if write_graph_def_ascii_flag:
    f = 'only_the_graph_def.pb.ascii'
    tf.train.write_graph(sess.graph.as_graph_def(), output_fld, f, as_text=True)
    print('saved the graph definition in ascii format at: ', osp.join(output_fld, f))
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io
constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), pred_node_names)
graph_io.write_graph(constant_graph, output_fld, output_graph_name, as_text=False)
print('saved the constant graph (ready for inference) at: ', osp.join(output_fld, output_graph_name))
I got the model as a .pb file, but when I put it into the TensorFlow Java LabelImage example, I got this error:
Exception in thread "main" java.lang.IllegalArgumentException: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
[[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:285)
at org.tensorflow.Session$Runner.run(Session.java:235)
at com.dlut.cmh.sheng.LabelImage.executeInceptionGraph(LabelImage.java:98)
at com.dlut.cmh.sheng.LabelImage.main(LabelImage.java:51)
I don't know how to solve this. Can anyone help me? Or do you have another way to do this?
The error message you get from the TensorFlow Java API:
Exception in thread "main" java.lang.IllegalArgumentException: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
[[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
suggests that the model is constructed in a way that requires you to feed a boolean value for the tensor named batch_normalization_1/keras_learning_phase.
So, you'd have to include that in your call to run by changing:
try (Session s = new Session(g);
     Tensor result = s.runner().feed("input", image).fetch("output").run().get(0)) {
to something like:
try (Session s = new Session(g);
     Tensor learning_phase = Tensor.create(false);
     Tensor result = s.runner()
             .feed("input", image)
             .feed("batch_normalization_1/keras_learning_phase", learning_phase)
             .fetch("output")
             .run()
             .get(0)) {
The names of nodes you feed and fetch depend on the model, so it's possible that the names of the 'input' and 'output' nodes are different as well.
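If you are not sure which names the frozen graph uses, you can list every operation after importing the graph definition. A small sketch, where graphDef is assumed to be the byte[] contents of your .pb file:

// Uses org.tensorflow.Graph and org.tensorflow.Operation.
Graph g = new Graph();
g.importGraphDef(graphDef);
java.util.Iterator<Operation> ops = g.operations();
while (ops.hasNext()) {
    System.out.println(ops.next().name()); // prints every node name in the graph
}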
You might also want to consider using the TensorFlow SavedModel format (see also https://github.com/tensorflow/serving/issues/310#issuecomment-297015251).
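Loading a SavedModel from Java could then look roughly like this. A minimal sketch only: the export directory, the "serve" tag, the node names, and the makeImageTensor() helper are all assumptions that depend on how you export the model:

// Uses org.tensorflow.SavedModelBundle.
try (SavedModelBundle bundle = SavedModelBundle.load("/path/to/saved_model", "serve");
     Tensor image = makeImageTensor()) { // makeImageTensor(): your image preprocessing (hypothetical)
    try (Tensor result = bundle.session().runner()
            .feed("input", image)
            .fetch("output")
            .run()
            .get(0)) {
        // use result as in the LabelImage example
    }
}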
Hope that helps
I am trying to use a custom code generator so that Slick produces DateTime types from MySQL instead of Timestamp, but I couldn't make the sbt task run with the custom generator.
The generator class is located at /project-root/app/com/my/name.
val conf = ConfigFactory.parseFile(new File("conf/application.conf")).resolve()

slick <<= slickCodeGenTask

lazy val slick = TaskKey[Seq[File]]("gen-tables")

lazy val slickCodeGenTask = (sourceManaged, dependencyClasspath in Compile, runner in Compile, streams) map { (dir, cp, r, s) =>
  val outputDir = (dir / "slick").getPath
  val url = conf.getString("slick.dbs.default.db.url")
  val jdbcDriver = conf.getString("slick.dbs.default.db.driver")
  val slickDriver = conf.getString("slick.dbs.default.driver").dropRight(1)
  val pkg = "com.my.name"
  val user = conf.getString("slick.dbs.default.db.user")
  val password = conf.getString("slick.dbs.default.db.password")
  toError(r.run(s"$pkg.CustomCodeGenerator", cp.files, Array(slickDriver, jdbcDriver, url, outputDir, pkg, user, password), s.log))
  val fname = outputDir + s"/$pkg/Tables.scala"
  Seq(file(fname))
}
It always gives the same exception below when I try to run sbt gen-tables:
java.lang.ClassNotFoundException: com.my.name.CustomCodeGenerator
at java.lang.ClassLoader.findClass(ClassLoader.java:530)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sbt.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:59)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at sbt.Run.getMainMethod(Run.scala:72)
at sbt.Run.run0(Run.scala:60)
at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Logger$$anon$4.apply(Logger.scala:84)
at sbt.TrapExit$App.run(TrapExit.scala:248)
at java.lang.Thread.run(Thread.java:745)
When I try some built-in Java classes or the default Slick codegen class, just to experiment, it finds the class.
I tried changing the order of this task in build.sbt, but that didn't solve it.
Instead of

dependencyClasspath in Compile

use

fullClasspath in Compile

dependencyClasspath contains only your project's dependencies, not the compiled classes of the project itself, so com.my.name.CustomCodeGenerator is never on it; fullClasspath includes the project's own class output as well. See https://www.scala-sbt.org/1.x/docs/Howto-Classpaths.html
I want to extract data from Wikipedia infoboxes and came upon the code in "Wikipedia infobox extraction in Java", which suggests a method to do so with Java. I am not as handy with Java as I am with Python, so I am using wikixmlj-r43.jar in Eclipse with this code:
import edu.jhu.nlp.wikipedia.*;

public class InfoboxParser {
    public static void main(String[] args) throws Exception {
        WikiXMLParser parser = WikiXMLParserFactory.getSAXParser("/home/siddhartha/Documents/wiki/enwiki-latest-pages-articles.xml");
        parser.setPageCallback(new PageCallbackHandler() {
            public void process(WikiPage page) {
                InfoBox infobox = page.getInfoBox();
                // do something with the infobox
            }
        });
        parser.parse();
    }
}
I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tools/bzip2/CBZip2InputStream
at edu.jhu.nlp.wikipedia.WikiXMLParserFactory.getSAXParser(WikiXMLParserFactory.java:15)
at parser.InfoboxParser.main(InfoboxParser.java:7)
Caused by: java.lang.ClassNotFoundException: org.apache.tools.bzip2.CBZip2InputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 2 more
I have added the JAR in Eclipse under Properties > Java Build Path > Libraries. What I get is that it is not able to find the CBZip2InputStream class.
Please help.
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Connection.Response res = Jsoup.connect("http://en.wikipedia.org/wiki/Carbon")
        .execute();
String html = res.body();
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
Elements tables = body.getElementsByTag("table"); // or: hasClass("infobox bordered")
for (Element table : tables) {
    if (table.className().equalsIgnoreCase("infobox bordered")) {
        System.out.println(table.outerHtml());
        break;
    }
}
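For what it's worth, a slightly more direct variant of the above using Jsoup's CSS selectors; a sketch only, since Wikipedia's infobox class names vary across pages and over time:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Carbon").get();
// Pick the first table whose class attribute contains "infobox".
Element infobox = doc.select("table.infobox").first();
if (infobox != null) {
    System.out.println(infobox.outerHtml());
}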
This might help you: https://code.google.com/p/wikixmlj/source/browse/trunk/tests/InfoBoxTest.java?spec=svn40&r=40
Replace the path in the code (data/newton.xml) with this URL:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=your_title&rvsection=0