I am attempting to upgrade Stanford CoreNLP from v3.5.2 to the latest release, v3.6.0. After compiling all of the new jars needed for v3.6.0, I started up a standalone Stanford CoreNLP server (using Apache Thrift v0.9.3).
In addition, I am using Stanford's Shift Reduce Parser, which can be found at the following link: Stanford Shift Reduce Parser. I believe the latest version of the models was published on 10/23/2014. The model I need in particular is englishSR.beam.ser.gz (the English beam-search shift-reduce model).
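For reference, this is roughly how the beam model gets selected in a pipeline via the standard parse.model property (a minimal sketch, not my exact Thrift server code):

import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class SRParserDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
        // Point the parse annotator at the beam-search shift-reduce model
        // instead of the default PCFG parser.
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println("Pipeline loaded with the shift-reduce beam model.");
    }
}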
Unfortunately, upon running my new server (Stanford CoreNLP v3.6.0 / Apache Thrift v0.9.3), the logs displayed an error:
Reading in configuration from scripts/config...
Initializing Parser...
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.5 sec].
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
May 20, 2016 3:41:00 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
May 20, 2016 3:41:01 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
May 20, 2016 3:41:01 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 25 rules
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.4 sec].
03:41:01.566 [main] ERROR edu.stanford.nlp.io.IOUtils - Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz ...
done [10.4 sec].
Initializing Tokenizer...
The CoreNLP server is running...
For comparison, this is the log from the old server (Stanford CoreNLP v3.5.2 / Thrift v0.9.3):
Reading in configuration from scripts/config...
Initializing Parser...
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.7 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz ... done [9.8 sec].
Adding annotator dcoref
Initializing Tokenizer...
The CoreNLP server is running...
As you can see, the new server logs an error while loading the srparser and never gets to "Adding annotator dcoref". I did not modify any of the other files and am unsure what caused the discrepancy. I am currently looking for a Stanford CoreNLP properties file, but I would appreciate any help with this issue. Thanks in advance!
I ran this command and had no issues:
java -Xmx6g -cp "stanford-corenlp-full-2015-12-09/*:stanford-english-corenlp-2016-01-10-models.jar" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -file sample-text.txt -outputFormat text -parse.model edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz
This is using the distribution for Stanford CoreNLP 3.6.0 and the latest model jars we use.
Make sure to use those jars and only those jars; older versions of things floating around in your CLASSPATH can cause compatibility issues that make things not work.
And just to be clear, the distribution comes with a standard models jar that has the basic resources for using the toolkit. Separately, there is the English models jar, which is HUGE and contains ALL English resources.
The English shift-reduce parser models are now all in that English models jar; that is the recommended way to get English resources not available in the standard jar. I may need to update some pages to reflect this. It's possible the old shift-reduce models jar is not compatible with 3.6.0; I will investigate.
All of these things are available here: http://stanfordnlp.github.io/CoreNLP/download.html
If you are still having issues let me know, and let me know where the jar causing the problem is coming from and I will investigate. But if you run that command with the resources downloaded from the link above it should work fine.
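If it helps, one way to check which jar a given model resource is actually coming from is to resolve it against the classpath; here is a small sketch (plain Java, not a CoreNLP utility):

import java.net.URL;

public class FindModelJar {
    public static void main(String[] args) {
        // Resolves the model path against the current classpath; the jar! prefix
        // of the printed URL tells you which jar the resource is coming from.
        URL url = FindModelJar.class.getClassLoader()
            .getResource("edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz");
        System.out.println(url == null ? "not on classpath" : url);
    }
}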
Did you call ShiftReduceParser.loadModel method?
This method calls IOUtils.readObjectAnnouncingTimingFromURLOrClasspathOrFileSystem method, and its source is as follows:
try {
  Timing timing = new Timing();
  // Note: the load announcement is logged at ERROR level even though nothing has failed.
  logger.error(msg + ' ' + path + " ... ");
  obj = IOUtils.readObjectFromURLOrClasspathOrFileSystem(path);
  timing.done();
} catch (IOException | ClassNotFoundException e) {
  throw new RuntimeIOException(e);
}
return obj;
"logger.error" is a mistake probably. It shoud be "logger.info", I think.
Related
I trained a neural network in Python 3.7 using TensorFlow 1.14 and Sonnet 1.34. After training, I want to export the network so I can use it from Java for predictions. Unfortunately, loading the model in Java always fails with an error saying the file cannot be found.
I double-checked the model path a hundred times and it is correct (including converting slashes to backslashes, because it runs on Windows).
I tried several other export methods in Python (including SavedModelBuilder and ModelSaver).
Below this you can see the Python code to export the model.
tf.saved_model.simple_save(sess,
                           path_graph_model_export_dir,
                           inputs=model_inputs,
                           outputs={'sim': similarity})
This is the structure of the exported directory.
saved_model
---- saved_model.pb
---- variables
-------- variables.data-00000-of-00001
-------- variables.index
And this Java code to load the model
SavedModelBundle model = SavedModelBundle.load(modelPath, "serve");
results in this exception:
2019-09-17 10:59:03.905538: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: saved_model
2019-09-17 10:59:03.905997: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: fail. Took 457 microseconds.
org.tensorflow.TensorFlowException: Could not find SavedModel .pb or .pbtxt at supplied export directory path: saved_model
I cannot figure out why this exception appears when the file is clearly located at this directory. I hope someone can help me with this. Thanks in advance!
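For completeness, here is a minimal, self-contained sketch of the Java loading code I am testing with; the absolute base path is a placeholder for my actual export directory, so the lookup does not depend on the JVM's working directory:

import org.tensorflow.SavedModelBundle;

public class LoadSavedModel {
    public static void main(String[] args) {
        // Placeholder: absolute path to the directory that contains saved_model.pb.
        String modelPath = "C:\\Users\\me\\export\\saved_model";
        try (SavedModelBundle model = SavedModelBundle.load(modelPath, "serve")) {
            System.out.println("SavedModel loaded from " + modelPath);
        }
    }
}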
I am unable to access coreNLP in R on a Mac running High Sierra. I am uncertain what the problem is, but it seems that every time I try again to get coreNLP to work, I am faced with a different error. I have JDK 9.0.4. Please see my code below for what I am attempting to do, and the error that stops me.
On my previous attempt I was able to get initCoreNLP() to run and load some elements of the package, but it would fail on others. When I then attempted to run annotateString(), it would throw the error Error Must initialize with 'int CoreNLP'!.
I have downloaded and re-downloaded the coreNLP Java archive many times and still no luck! See image for contents of my coreNLP R package folder located at /Library/Frameworks/R.framework/Versions/3.4/Resources/library/coreNLP.
Do you know how I can successfully initialize coreNLP?
dyn.load("/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home/lib/server/libjvm.dylib")
library(NLP)
library(coreNLP)
> downloadCoreNLP()
trying URL 'http://nlp.stanford.edu/software//stanford-corenlp-full-2015-12-09.zip'
Content type 'application/zip' length 403157240 bytes (384.5 MB)
==================================================
downloaded 384.5 MB
> initCoreNLP()
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Searching for resource: StanfordCoreNLP.properties
Error in rJava::.jnew("edu.stanford.nlp.pipeline.StanfordCoreNLP", basename(path)) :
edu.stanford.nlp.io.RuntimeIOException: ERROR: cannot find properties file "StanfordCoreNLP.properties" in the classpath!
Per our discussion.
My sense is that this is a Java / R configuration dependency issue: rJava depends on the version of Java used, and coreNLP depends on rJava.
java <- rJava <- coreNLP
Thus we can point dyn.load at a 1.8.x libjvm, uninstall rJava, reinstall rJava, and then reinstall coreNLP.
Set up a particular version of Java in RStudio:
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
remove.packages("rJava")
install.packages("rJava")
# install any missing packages, then load them all
ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
# usage
packages <- c("NLP", "coreNLP", "rJava")
ipak(packages)
.jinit()
.jcall("java/lang/System","S","getProperty","java.version")
# run the following command once
# downloadCoreNLP() # <- Takes a while...
initCoreNLP()
example(getSentiment)
sIn <- "Mother died today. Or, maybe, yesterday; I can't be sure."
annoObj <- annotateString(sIn)
I’m no IT guy, so it’s possible I am doing something very wrong, but I’ve been struggling with this issue for days…
Working in a VM with CentOS 7.
When I run something in GeoKettle I get this error, which points to GDAL.
Native library load failed.
java.lang.UnsatisfiedLinkError: /home/geoairc/QGIS_Install/geokettle/libswt/linux/x86_64/libogrjni.so: liblcms.so.1: cannot open shared object file: No such file or directory
INFO 29-05 10:10:37,639 - OGR Input - wfs xml geodomus.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)
Exception in thread "OGR Input - wfs xml geodomus.0 (Thread-10)" java.lang.UnsatisfiedLinkError: org.gdal.ogr.ogrJNI.RegisterAll()V
at org.gdal.ogr.ogrJNI.RegisterAll(Native Method)
at org.gdal.ogr.ogr.RegisterAll(ogr.java:110)
at org.pentaho.di.core.geospatial.OGRReader.open(OGRReader.java:75)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInputMeta.getOutputFields(OGRFileInputMeta.java:277)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInput.processRow(OGRFileInput.java:172)
at org.pentaho.di.trans.steps.ogrfileinput.OGRFileInput.run(OGRFileInput.java:342)
Someone pointed out that the error was caused by missing GDAL Java bindings,
so I tried to install the gdal-java RPM:
https://www.rpmfind.net/linux/RPM/epel/7/x86_64/g/gdal-java-1.11.4-1.el7.x86_64.html
When I try to install it, I get successive dependency errors that I cannot get past (this is the first; when I try to install one of these dependencies, I get yet another set of dependency errors):
[root@srvlgis01 tmp]# rpm -Uvh gdal-java-1.11.4-1.el7.x86_64.rpm
error: Failed dependencies:
gdal-libs(x86-64) = 1.11.4-1.el7 is needed by gdal-java-1.11.4-1.el7.x86_64
libgeotiff.so.1.2()(64bit) is needed by gdal-java-1.11.4-1.el7.x86_64
My GDAL version: gdal.x86_64 0:1.11.4-10.rhel7
Thanks in advance,
Pedro
I have trained a custom NER model with Stanford-NER. I created a properties file and used the -serverProperties argument with the java command to start my server and load my custom NER model (a direction I followed from another question of mine, seen here), but when the server attempts to load my custom model it fails with this error: java.io.EOFException: Unexpected end of ZLIB input stream
The stderr.log output with error is as follows:
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 4
[main] INFO CoreNLP - Liveness server started at /0.0.0.0:9000
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:80
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:35546] API call w/annotators tokenize,ssplit,pos,lemma,depparse,natlog,ner,openie
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... [pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.297 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.6 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)
at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(ObjectInputStream.java:3333)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1920)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2650)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2963)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:141)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:451)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:154)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:273)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:583)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
I have googled this error, and most of what I read concerns an issue in Java from 2007-2010 where an EOFException is "arbitrarily" thrown. This information is from here.
"When using gzip (via new Deflater(Deflater.BEST_COMPRESSION, true)), for some files, an EOFException is thrown at the end of inflating. Although the file is correct, the bug is the EOFException is thrown inconsistently. For some files it is thrown, other it is not."
Answers to other people's questions about this error state that you have to close the output streams for the gzip...? I am not entirely sure what that means, and I don't know how I would follow that advice, since Stanford-NER is the software creating the gzip file for me.
Question: What actions can I take to eliminate this error? I am hoping this has happened to others in the past. I am also looking for feedback from @StanfordNLPHelp as to whether similar issues have been raised in the past and whether something has been done (or is being done) to the CoreNLP software to eliminate this issue. If there is a solution from CoreNLP, what files do I need to change, where are these files located within the CoreNLP framework, and what changes do I need to make?
ADDED INFO (PER @StanfordNLPHelp comments):
My model was trained using the directions found here. To train the model I used a TSV as outlined in the directions which contained text from around 90 documents. I know this is not a substantial amount of data to train with but we are just in the testing phases and will improve the model as we acquire more data.
With this TSV file and the Stanford-NER software I ran the command below.
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
I then had my model built and was even able to load it and successfully tag a larger corpus of text with the NER GUI that comes with the Stanford-NER software.
While troubleshooting why I was unable to get the model to work, I also tried updating my server.properties file with the file path to the "3 class model" that ships with CoreNLP. Again it failed with the same error.
The fact that both my custom model and the 3 class model work in the Stanford-NER software but fail to load here makes me believe my custom model is not the issue, and that there is some problem with how the CoreNLP software loads these models through the -serverProperties argument. Or it could be something I am completely unaware of.
The properties file I used to train my NER model was similar to the one in the directions, with the train file and the output file name changed. It looks like this:
# location of the training file
trainFile = custom-model-trainingfile.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = custome-ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
My server.properties file contained only one line: ner.model = /path/to/custom_model.ser.gz
I also added /path/to/custom_model to the $CLASSPATH variable in the start-up script, changing the line CLASSPATH="$CLASSPATH:$JAR" to CLASSPATH="$CLASSPATH:$JAR:/path/to/custom_model.ser.gz". I am not sure if this is a necessary step, because I hit the ZLIB error first; I just wanted to include this for completeness.
Attempted to "gunzip" my custom model with the command gunzip custom_model.ser.gz and got a similar error that I get when trying to load the model. It is gzip: custom_model.ser.gz: unexpected end of file
I'm assuming you downloaded Stanford CoreNLP 3.7.0 and have a folder somewhere called stanford-corenlp-full-2016-10-31. For the sake of this example let's assume it's in /Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31 (change this to your specific situation)
Also, just to clarify: when you run a Java program, it looks in the CLASSPATH for compiled code and resources. A common way to set the CLASSPATH is to set the CLASSPATH environment variable with the export command.
Typically Java compiled code and resources are stored in jar files.
If you look at stanford-corenlp-full-2016-10-31 you'll see a bunch of .jar files. One of them is called stanford-corenlp-3.7.0-models.jar. You can look at what's inside a jar file with this command: jar tf stanford-corenlp-3.7.0-models.jar.
You'll notice when you look inside that file that there are (among others) various ner models. For instance you should see this file:
edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
in the models jar.
So a reasonable way for us to get things working is to run the server and tell it to only load 1 model (since by default it will load 3).
Run these commands in one window (in the same directory as the file ner-server.properties):
export CLASSPATH=/Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31/*:
java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties
with ner-server.properties being a 2-line file with these 2 lines:
annotators = tokenize,ssplit,pos,lemma,ner
ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
The export command above is putting EVERY jar in that directory on the CLASSPATH. That is what the * means. So stanford-corenlp-3.7.0-models.jar should be on the CLASSPATH. Thus when the Java code runs, it will be able to find edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.
In a different Terminal window, issue this command:
wget --post-data 'Joe Smith lives in Hawaii.' 'localhost:9000/?properties={"outputFormat":"json"}' -O -
When this runs, you should see in the first window (where the server is running) that only this model is loading edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.
You should note that if you deleted the ner.model from your file and re-did all of these, 3 models would load instead of 1.
Please let me know if that all works or not.
Let's assume I made an NER model called custom_model.ser.gz, and that file is what StanfordCoreNLP output after the training process. Let's say I put it in the folder /Users/stanfordnlphelp/.
If steps 1 and 2 worked, you should be able to alter ner-server.properties to this:
annotators = tokenize,ssplit,pos,lemma,ner
ner.model = /Users/stanfordnlphelp/custom_model.ser.gz
And when you do the same thing, it will show your custom model loading. There should not be any kind of gzip issue. If you are still having a gzip issue, please let me know what kind of system you are running this on (Mac OS X, Unix, Windows, etc.).
And to confirm: you said that you have run your custom NER model with the standalone Stanford NER software, right? If so, that sounds like the model file is fine.
I downloaded the latest version of the examples for chapter 09 of “Mahout in Action”. I can successfully run several of them, but three files, NewsKMeansClustering.java, ReutersToSparseVectors.java, and NewsFuzzyKMeansClustering.java, give similar error messages when run:
Aug 3, 2011 2:03:54 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/user1/workspaceMahout1/recommender/inputDir
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:54)
Regarding the above messages, I do not quite understand what those two warnings mean. Moreover, it looks like that “input path” should already have been created; how can I create this type of input? Thanks.
You can ignore the warnings. The error is that the input directory you have specified does not exist. Does it exist? What is your command line?
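If you want to confirm programmatically, here is a small sketch (using the path from your stack trace) that checks for the directory and creates it if it is missing; you still need to copy your input documents into it before running the example:

import java.io.File;

public class CheckInputDir {
    public static void main(String[] args) {
        // Path taken from the stack trace; adjust to wherever your documents live.
        File inputDir = new File("/home/user1/workspaceMahout1/recommender/inputDir");
        System.out.println(inputDir.getAbsolutePath() + " exists? " + inputDir.exists());
        if (!inputDir.exists()) {
            // The chapter 9 examples expect this directory to contain the documents
            // to vectorize; creating it empty only removes the InvalidInputException.
            System.out.println("created: " + inputDir.mkdirs());
        }
    }
}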
I ran into a similar mismatch. The MiA files at https://github.com/tdunning/MiA have some cases where a .csv file is left in the same dir as the Java source, for example https://github.com/tdunning/MiA/tree/master/src/main/java/mia/recommender/ch02 ... however, via Eclipse, loading it using DataModel model = new FileDataModel(new File("intro.csv")); doesn't find it.
Adding
System.out.println("CWD: "+System.getProperty("user.dir"));
...will reveal where Eclipse is looking (in my case, a couple levels up the filetree, but this might vary depending on how exactly you've set things up).
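For example, here is a self-contained sketch that makes the lookup independent of whatever working directory Eclipse happens to use; the relative path is just my guess at where the csv sits in the MiA tree, so adjust it to your checkout:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class LoadIntroCsv {
    public static void main(String[] args) throws Exception {
        // Print the working directory so you can see where relative paths resolve.
        System.out.println("CWD: " + System.getProperty("user.dir"));
        // Point directly at the csv next to the example source instead of
        // relying on the working directory (path is an assumption).
        File csv = new File("src/main/java/mia/recommender/ch02/intro.csv");
        DataModel model = new FileDataModel(csv);
        System.out.println("Loaded " + model.getNumUsers() + " users");
    }
}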