I have trained a custom NER model with Stanford-NER. I created a properties file and passed it to the java command with the -serverProperties argument to start my server and load my custom NER model (directions I followed from another question of mine). When the server attempts to load my custom model, it fails with this error: java.io.EOFException: Unexpected end of ZLIB input stream
The stderr.log output with error is as follows:
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 4
[main] INFO CoreNLP - Liveness server started at /
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /
[pool-1-thread-3] INFO CoreNLP - [/] API call w/annotators tokenize,ssplit,pos,lemma,depparse,natlog,ner,openie
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... [pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.297 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.6 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)
at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(ObjectInputStream.java:3333)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1920)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2650)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2963)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:141)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:451)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:154)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:273)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:583)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
I have googled this error, and most of what I found concerns a Java issue from 2007-2010 in which an EOFException is "arbitrarily" thrown. The bug report describes it like this:
"When using gzip (via new Deflater(Deflater.BEST_COMPRESSION, true)), for some files, and EOFException is thrown at the end of inflating. Although the file is correct, the bug is the EOFException is thrown inconsistently. For some files it is thrown, other it is not."
Answers to other people's questions about this error say that you have to close the output streams for the gzip. I am not entirely sure what that means, and I don't know how I would act on that advice, since Stanford-NER is the software creating the gzip file for me.
Question: What actions can I take to eliminate this error? I am hoping this has happened to others in the past. I am also looking for feedback from #StanfordNLPHelp on whether similar issues have been raised before and whether something is being done, or has been done, in the CoreNLP software to eliminate this issue. If there is a solution from CoreNLP, which files do I need to change, where are these files located within the CoreNLP framework, and what changes do I need to make?
ADDED INFO (PER #StanfordNLPHelp comments):
My model was trained using the directions found here. To train the model I used a TSV as outlined in the directions which contained text from around 90 documents. I know this is not a substantial amount of data to train with but we are just in the testing phases and will improve the model as we acquire more data.
With this TSV file and the Stanford-NER software I ran the command below.
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
I then had my model built and was even able to load it and successfully tag a larger corpus of text with the NER GUI that comes with the Stanford-NER software.
While troubleshooting why I was unable to get the model to work, I also updated my server.properties file with the file path to the "3 class model" that comes standard in CoreNLP. Again it failed with the same error.
The fact that my custom model and the 3 class model both work in the Stanford-NER software but fail to load in the server makes me believe that my custom model is not the issue and that there is some issue with how the CoreNLP software loads these models through the -serverProperties argument. Or it could be something I am completely unaware of.
The properties file I used to train my NER model was similar to the one in the directions, with the train file and the output file name changed. It looks like this:
# location of the training file
trainFile = custom-model-trainingfile.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = custome-ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
# the last 4 properties deal with word shape features
My server.properties file contained only one line: ner.model = /path/to/custom_model.ser.gz
I also added /path/to/custom_model to the $CLASSPATH variable in the startup script, changing the line CLASSPATH="$CLASSPATH:$JAR to CLASSPATH="$CLASSPATH:$JAR:/path/to/custom_model.ser.gz. I am not sure whether this step is necessary, because I hit the ZLIB error first. I include it for completeness.
I attempted to gunzip my custom model with the command gunzip custom_model.ser.gz and got an error similar to the one I get when trying to load the model: gzip: custom_model.ser.gz: unexpected end of file
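The gunzip message and the Java EOFException are both what you would expect from a .ser.gz file that ends before its gzip stream does. A minimal sketch of the same check, in Python for brevity, is to read the file to the end and see whether decompression finishes. The function name gzip_is_complete is hypothetical and is not part of Stanford-NER or CoreNLP.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end: a truncated file only fails once the data runs out.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the file ends before the gzip end-of-stream marker.
        # OSError / zlib.error: missing file, not a gzip file, or corrupt data.
        return False
    return True
```

Calling gzip_is_complete("custom_model.ser.gz") on the copy that the server reads, and again on the copy that works in the NER GUI, would show whether the two files really are the same intact file.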
I'm assuming you downloaded Stanford CoreNLP 3.7.0 and have a folder somewhere called stanford-corenlp-full-2016-10-31. For the sake of this example let's assume it's in /Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31 (change this to your specific situation).
Also, just to clarify: when you run a Java program, it looks in the CLASSPATH for compiled code and resources. A common way to set the CLASSPATH is to set the CLASSPATH environment variable with the export command.
Typically Java compiled code and resources are stored in jar files.
If you look at stanford-corenlp-full-2016-10-31 you'll see a bunch of .jar files. One of them is called stanford-corenlp-3.7.0-models.jar. You can look at what's inside a jar file with this command: jar tf stanford-corenlp-3.7.0-models.jar.
You'll notice when you look inside that file that there are (among others) various ner models. For instance you should see this file:
in the models jar.
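Since a jar is a zip archive, the jar tf listing can also be filtered programmatically. The sketch below is only an illustration: list_ner_models is a hypothetical helper, and the default prefix is taken from the model path used later in this answer.

```python
import zipfile


def list_ner_models(jar_path, prefix="edu/stanford/nlp/models/ner/"):
    """Return the file entries stored under `prefix` inside the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            # Skip directory entries, which end with a slash.
            if name.startswith(prefix) and not name.endswith("/")
        )
```

For example, list_ner_models("stanford-corenlp-3.7.0-models.jar") would return the NER model paths that can be used as ner.model values when that jar is on the CLASSPATH.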
So a reasonable way for us to get things working is to run the server and tell it to only load 1 model (since by default it will load 3).
Run these commands in one window (in the same directory as the file ner-server.properties):
export CLASSPATH=/Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31/*:
java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties
with ner-server.properties being a 2-line file with these 2 lines:
annotators = tokenize,ssplit,pos,lemma,ner
ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
The export command above is putting EVERY jar in that directory on the CLASSPATH. That is what the * means. So stanford-corenlp-3.7.0-models.jar should be on the CLASSPATH. Thus when the Java code runs, it will be able to find edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.
In a different Terminal window, issue this command:
wget --post-data 'Joe Smith lives in Hawaii.' 'localhost:9000/?properties={"outputFormat":"json"}' -O -
When this runs, you should see in the first window (where the server is running) that only this model is loading edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.
You should note that if you deleted the ner.model from your file and re-did all of these, 3 models would load instead of 1.
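If wget is not available, the same request can be sent from a script. This is a rough sketch using only the Python standard library. It assumes the server from the commands above is listening on localhost:9000, and build_request and annotate are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, base_url="http://localhost:9000/"):
    """Build the same POST that the wget command sends."""
    properties = json.dumps({"outputFormat": "json"})
    query = urllib.parse.urlencode({"properties": properties})
    return urllib.request.Request(
        base_url + "?" + query,
        data=text.encode("utf-8"),  # the text to annotate is the POST body
        method="POST",
    )


def annotate(text, base_url="http://localhost:9000/"):
    """Send `text` to a running server and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling annotate("Joe Smith lives in Hawaii.") while the server is running should trigger the same model loading in the server window as the wget call does.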
Please let me know if that all works or not.
Let's assume I made an NER model called custom_model.ser.gz, and that this file is what Stanford CoreNLP output after the training process. Let's say I put it in the folder /Users/stanfordnlphelp/.
If steps 1 and 2 worked, you should be able to alter ner-server.properties to this:
annotators = tokenize,ssplit,pos,lemma,ner
ner.model = /Users/stanfordnlphelp/custom_model.ser.gz
And when you do the same thing, it will show your custom model loading. There should not be any kind of gzip issue. If you are still having a gzip issue, please let me know what kind of system you are running this on (Mac OS X, Unix, Windows, etc.).
And to confirm, you said that you have run your custom NER model with the standalone Stanford NER software right? If so, that sounds like the model file is fine.
I am unable to access coreNLP in R on a Mac running High Sierra. I am uncertain what the problem is, but it seems that every time I try again to get coreNLP to work, I am faced with a different error. I have JDK 9.0.4. Please see my code below for what I am attempting to do, and the error that stops me.
In my previous attempt I was able to get initCoreNLP() to run and load some elements of the packages, but it would fail on others. When I then attempted to run annotateString(), it would throw the error Must initialize with 'initCoreNLP'!.
I have downloaded and re-downloaded the coreNLP Java archive many times and still no luck! See image for contents of my coreNLP R package folder located at /Library/Frameworks/R.framework/Versions/3.4/Resources/library/coreNLP.
Do you know how I can successfully initialize coreNLP?
> downloadCoreNLP()
trying URL 'http://nlp.stanford.edu/software//stanford-corenlp-full-2015-12-09.zip'
Content type 'application/zip' length 403157240 bytes (384.5 MB)
downloaded 384.5 MB
> initCoreNLP()
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Searching for resource: StanfordCoreNLP.properties
Error in rJava::.jnew("edu.stanford.nlp.pipeline.StanfordCoreNLP", basename(path)) :
edu.stanford.nlp.io.RuntimeIOException: ERROR: cannot find properties file "StanfordCoreNLP.properties" in the classpath!
Per our discussion.
My sense is that this is a Java / R configuration dependency issue: rJava depends on the version of Java in use, and coreNLP depends on rJava.
java <- rJava <- coreNLP
Thus we can set the dlynlib version to 1.8.X, uninstall rJava, reinstall rJava, and then reinstall coreNLP.
Setup a particular version of java in RStudio
ipak <- function(pkg){
  # install any packages that are missing, then load them all
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
# usage
packages <- c("NLP", "coreNLP", "rJava")
ipak(packages)
# run the following command once
# downloadCoreNLP() # <- Takes a while...
# initialize the pipeline before annotating
initCoreNLP()
sIn <- "Mother died today. Or, maybe, yesterday; I can't be sure."
annoObj <- annotateString(sIn)
I am attempting to upgrade my version of Stanford CoreNLP to the latest edition (was previously on v3.5.2, tried to upgrade to v3.6.0). After compiling all of the new jars necessary for v3.6.0, I started up a standalone Stanford CoreNLP server (using Apache Thrift v0.9.3).
In addition, I am using Stanford's Shift Reduce Parser, which can be found at the following link: Stanford Shift Reduce Parser. I believe the latest version of the model was published on 10/23/2014. The model I need in particular is the englishSR.beam.ser.gz (English Beam Search Shift Reduce Model).
Unfortunately, upon running my new server (Stanford CoreNLP v3.6.0 / Apache Thrift v0.9.3), the logs displayed an error:
Reading in configuration from scripts/config...
Initializing Parser...
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.5 sec].
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
May 20, 2016 3:41:00 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
May 20, 2016 3:41:01 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
May 20, 2016 3:41:01 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 25 rules
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.4 sec].
03:41:01.566 [main] ERROR edu.stanford.nlp.io.IOUtils - Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz ...
done [10.4 sec].
Initializing Tokenizer...
The CoreNLP server is running...
Conversely, this is the log that is seen when running the old server (Stanford CoreNLP v3.5.2 / Thrift v0.9.3):
Reading in configuration from scripts/config...
Initializing Parser...
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.7 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz ... done [9.8 sec].
Adding annotator dcoref
Initializing Tokenizer...
The CoreNLP server is running...
As you can see, the new server errors out while trying to load in the srparser and does not end up "Adding annotator dcoref". I did not modify any of the other files and am unsure what could have caused the discrepancy. Currently looking for a Stanford Core NLP properties file, but I would appreciate any help regarding this issue. Thanks in advance!
I ran this command and had no issues:
java -Xmx6g -cp "stanford-corenlp-full-2015-12-09/*:stanford-english-corenlp-2016-01-10-models.jar" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -file sample-text.txt -outputFormat text -parse.model edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz
This is using the distribution for Stanford CoreNLP 3.6.0 and the latest model jars we use.
Make sure to use those jars and only those jars. If you have older versions of things floating around in your CLASSPATH, that could cause compatibility issues and stop things from working.
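One way to spot stray older jars is to scan the CLASSPATH for more than one CoreNLP version. The sketch below is an assumption-laden illustration: corenlp_versions is a hypothetical helper, and it assumes the jars keep the stanford-corenlp-<version> naming used by the official distribution.

```python
import glob
import os
import re

# Matches names such as stanford-corenlp-3.6.0.jar or stanford-corenlp-3.6.0-models.jar
VERSION_RE = re.compile(r"^stanford-corenlp-(\d+(?:\.\d+)+)")


def corenlp_versions(classpath, sep=os.pathsep):
    """Map each CoreNLP version found on `classpath` to the jars that carry it."""
    found = {}
    for entry in filter(None, classpath.split(sep)):
        # The Java launcher expands an entry like "dir/*" to every jar in dir.
        if os.path.basename(entry) == "*":
            jars = glob.glob(os.path.join(os.path.dirname(entry), "*.jar"))
        else:
            jars = [entry]
        for jar in jars:
            match = VERSION_RE.match(os.path.basename(jar))
            if match:
                found.setdefault(match.group(1), []).append(jar)
    return found
```

If corenlp_versions(os.environ.get("CLASSPATH", "")) returns more than one key, jars from different releases are visible to the same JVM.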
And just to be clear the distribution comes with a standard models jar which has some basic things to use the toolkit. Then separately there is the English models jar which is HUGE and contains ALL English resources.
The English shift reduce parser models are all in the English models jar we distribute now which has ALL English resources. That is the recommended way to get English resources not available in the standard jar. I may need to update some pages to reflect this information. It's possible the old shift reduce models jar is not compatible with 3.6.0, I will investigate.
All of these things are available here: http://stanfordnlp.github.io/CoreNLP/download.html
If you are still having issues let me know, and let me know where the jar causing the problem is coming from and I will investigate. But if you run that command with the resources downloaded from the link above it should work fine.
Did you call the ShiftReduceParser.loadModel method?
This method calls the IOUtils.readObjectAnnouncingTimingFromURLOrClasspathOrFileSystem method, whose source is as follows:
try {
  Timing timing = new Timing();
  logger.error(msg + ' ' + path + " ... ");
  obj = IOUtils.readObjectFromURLOrClasspathOrFileSystem(path);
} catch (IOException | ClassNotFoundException e) {
  throw new RuntimeIOException(e);
}
return obj;
"logger.error" is probably a mistake. It should be "logger.info", I think.
I have downloaded the ActiveMQ zip file on my Windows system and extracted it. I then tried to run the activemq.bat file, but ActiveMQ does not start. It prints the log below. Can anyone tell me what the issue is and what needs to be done to start ActiveMQ?
Java Runtime: Oracle Corporation 1.7.0_51 C:\Program Files\Java\jdk1.7.0_51\jre
Heap sizes: current=1005568k free=995061k max=1005568k
JVM args: -Dcom.sun.management.jmxremote -Xms1G -Xmx1G -Djava.util.logging.c
onfig.file=logging.properties -Djava.security.auth.login.config=D:\apache-active
mq-5.11.1\bin\..\conf\login.config -Dactivemq.classpath=D:\apache-activemq-5.11.
n\../conf; -Dactivemq.home=D:\apache-activemq-5.11.1\bin\.. -Dactivemq.base=D:\a
pache-activemq-5.11.1\bin\.. -Dactivemq.conf=D:\apache-activemq-5.11.1\bin\..\co
nf -Dactivemq.data=D:\apache-activemq-5.11.1\bin\..\data -Djava.io.tmpdir=D:\apa
Extensions classpath:
ACTIVEMQ_HOME: D:\apache-activemq-5.11.1\bin\..
ACTIVEMQ_BASE: D:\apache-activemq-5.11.1\bin\..
ACTIVEMQ_CONF: D:\apache-activemq-5.11.1\bin\..\conf
ACTIVEMQ_DATA: D:\apache-activemq-5.11.1\bin\..\data
Usage: Main [--extdir <dir>] [task] [task-options] [task data]
browse - Display selected messages in a specified destinat
bstat - Performs a predefined query that displays useful
statistics regarding the specified broker
create - Creates a runnable broker instance in the specifi
ed path.
decrypt - Decrypts given text
dstat - Performs a predefined query that displays useful
tabular statistics regarding the specified destination type
encrypt - Encrypts given text
export - Exports a stopped brokers data files to an archiv
e file
list - Lists all available brokers in the specified JMX
purge - Delete selected destination's messages that match
es the message selector
query - Display selected broker component's attributes an
d statistics.
start - Creates and starts a broker using a configuration
file, or a broker URI.
stop - Stops a running broker specified by the broker na
Task Options (Options specific to each task):
--extdir <dir> - Add the jar files in the directory to the classpath.
--version - Display the version information.
-h,-?,--help - Display this help information. To display task specific he
lp, use Main [task] -h,-?,--help
Task Data:
- Information needed by each specific task.
JMX system property options:
-Dactivemq.jmx.url=<jmx service uri> (default is: 'service:jmx:rmi:///jndi/r
-Dactivemq.jmx.user=<user name>
You must start ActiveMQ with the command:
activemq-admin.bat start
activemq.bat is for management, which is why you have to pass it arguments.
I have a Pentaho job which runs successfully in Pentaho, but if I try to run the same job from the command line I get the error
Kitchen can't continue because the job could not be loaded.
D:\data-integration>kitchen.bat /file:D:\PENTAHO\pentahojobsNtrans_1\jobs\vws_sync_job_2.kjb /level:Basic
DEBUG: _PENTAHO_JAVA_HOME=C:\Program Files\Java\jre7
DEBUG: _PENTAHO_JAVA=C:\Program Files\Java\jre7\bin\java.exe
2014/08/25 12:44:33 - Kitchen - Logging is at level : Basic logging
2014/08/25 12:44:33 - Kitchen - Start of run.
ERROR: Kitchen can't continue because the job couldn't be loaded.
What am I doing wrong?
Please help.
The most common mistake that beginners make (myself included) is to use "\" in the path of the transformation or the job. Replace it with "/".
I got the same error on Windows (maybe the same solution works on Linux) because I have spaces in file names. If that is your case, wrap the entire file path in quotation marks.
For example:
"C:\Program Files\Pentaho Data Integration - Kettle\kitchen.bat" /file:"C:/Users/Username/Documents/Pentaho Projects/Job - System Integration.kjb" /level:Basic
Another important thing is to follow the instruction given by a_horse_with_no_name (funny username by the way) and use the forward slash.
Avoid the backslash form used in the example in the Kitchen documentation (http://wiki.pentaho.com/display/EAI/Kitchen+User+Documentation): kitchen.bat /file:D:\Jobs\updateWarehouse.kjb /level:Basic
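If the command is generated by a script, both pieces of advice (forward slashes and quoted paths) can be applied mechanically. A small sketch follows, where kitchen_command is a hypothetical helper and the output format simply mirrors the example above.

```python
from pathlib import PureWindowsPath


def kitchen_command(kitchen_bat, job_file, level="Basic"):
    """Build a kitchen.bat command line with a quoted, forward-slash job path."""
    # as_posix() turns D:\Jobs\x.kjb into D:/Jobs/x.kjb without touching spaces.
    job = PureWindowsPath(job_file).as_posix()
    return f'"{kitchen_bat}" /file:"{job}" /level:{level}'
```

Quoting both paths unconditionally keeps the command valid whether or not the folder or job names contain spaces.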
According to the logs, the current SAX parser takes a lot of time (20 minutes) and heap memory (around 400 MB) to deserialize the response coming from the SOAP server. Our response XMLs are 4 MB on average.
Part of the log from when the application runs out of heap is below:
DEBUG (org.apache.axis.encoding.DeserializationContext) Pushing handler org.apache.axis.message.SOAPHandler#16d22f1
DEBUG (org.apache.axis.i18n.ProjectResourceBundle) org.apache.axis.i18n.resource::handleGetObject(newElem00)
DEBUG (org.apache.axis.message.MessageElement) New MessageElement (org.apache.axis.message.MessageElement#112c22) named {}name
DEBUG (org.apache.axis.encoding.DeserializationContext) Pushing element name
DEBUG (org.apache.axis.utils.NSStack) NSPush (32)
DEBUG (org.apache.axis.encoding.DeserializationContext) Exit: DeserializationContext::startElement()
DEBUG (org.apache.axis.encoding.DeserializationContext) Enter: DeserializationContext::endElement(, name)
DEBUG (org.apache.axis.i18n.ProjectResourceBundle) org.apache.axis.i18n.resource::handleGetObject(popHandler00)
DEBUG (org.apache.axis.encoding.DeserializationContext) Popping handler org.apache.axis.message.SOAPHandler#16d22f1
DEBUG (org.apache.axis.utils.NSStack) NSPop (32)
DEBUG (org.apache.axis.encoding.DeserializationContext) Popped element stack to org.apache.axis.message.MessageElement:property
DEBUG (org.apache.axis.encoding.DeserializationContext) Exit: DeserializationContext::endElement()
DEBUG (org.apache.axis.encoding.DeserializationContext) Enter: DeserializationContext::startElement(, value)
DEBUG (org.apache.axis.i18n.ProjectResourceBundle) org.apache.axis.i18n.resource::handleGetObject(pushHandler00)
DEBUG (org.apache.axis.encoding.DeserializationContext) Pushing handler org.apache.axis.message.SOAPHandler#16880ba
DEBUG (org.apache.axis.i18n.ProjectResourceBundle) org.apache.axis.i18n.resource::handleGetObject(newElem00)
DEBUG (org.apache.axis.message.MessageElement) New MessageElement (org.apache.axis.message.MessageElement#1db74af) named {}value
DEBUG (org.apache.axis.encoding.DeserializationContext) Pushing element value
DEBUG (org.apache.axis.utils.NSStack) NSPush (32)
DEBUG (org.apache.axis.encoding.DeserializationContext) Exit: DeserializationContext::startElement()
DEBUG (org.apache.axis.encoding.DeserializationContext) Enter: DeserializationContext::endElement(, value)
DEBUG (org.apache.axis.i18n.ProjectResourceBundle) org.apache.axis.i18n.resource::handleGetObject(popHandler00)
DEBUG (org.apache.axis.encoding.DeserializationContext) Popping handler org.apache.axis.message.SOAPHandler#16880ba
DEBUG (org.apache.axis.utils.NSStack) NSPop (32)
I cannot use Axis2 because of technical reasons.
I have tried using HTTP Commons client instead of HTTP client but the response time remains the same.
How can I link a different parser (for example Xerces 2.10.0 or XStream 1.3.1) to the Axis 1.4 framework in this context, so that memory usage and response time improve?
From the Axis installation instructions:
In the Axis directory, you will find a WEB-INF sub-directory. This directory contains some basic configuration information, but can also be used to contain the dependencies and web services you wish to deploy.
Axis needs to be able to find an XML parser. If your application server or Java runtime does not make one visible to web applications, you need to download and add it. Java 1.4 includes the Crimson parser, so you can omit this stage, though the Axis team prefer Xerces.
To add an XML parser, acquire the JAXP 1.1 XML compliant parser of your choice. We recommend Xerces jars from the xml-xerces distribution, though others mostly work. Unless your JRE or app server has its own specific requirements, you can add the parser's libraries to axis/WEB-INF/lib. The examples in this guide use Xerces. This guide adds xml-apis.jar and xercesImpl.jar to the AXISCLASSPATH so that Axis can find the parser (see below).
If you get ClassNotFound errors relating to Xerces or DOM then you do not have an XML parser installed, or your CLASSPATH (or AXISCLASSPATH) variables are not correctly configured.
In order for these examples to work, java must be able to find axis.jar, commons-discovery.jar, commons-logging.jar, jaxrpc.jar, saaj.jar, log4j-1.2.8.jar (or whatever is appropriate for your chosen logging implementation), and the XML parser jar file or files (e.g., xerces.jar). These examples do this by adding these files to AXISCLASSPATH and then specifying the AXISCLASSPATH when you run them. Also for these examples, we have copied the xml-apis.jar and xercesImpl.jar files into the AXIS_LIB directory. An alternative would be to add your XML parser's jar file directly to the AXISCLASSPATH variable or to add all these files to your CLASSPATH variable.
On Windows, this can be done via the following. For this document we assume that you have installed Axis in C:\axis. To store this information permanently in WinNT/2000/XP you will need to right click on "My Computer" and select "Properties". Click the "Advanced" tab and create the new environmental variables. It is often better to use WordPad to create the variable string and then paste it into the appropriate text field.
set AXIS_HOME=c:\axis
set AXISCLASSPATH=%AXIS_LIB%\axis.jar;%AXIS_LIB%\commons-discovery.jar;
Unix users have to do something similar. Below we have installed AXIS into /usr/axis and are using the bash shell. See your shell's documentation for differences. To make variables permanent you will need to add them to your shell's startup (dot) files. Again, see your shell's documentation.
set AXIS_HOME=/usr/axis
set AXISCLASSPATH=$AXIS_LIB/axis.jar:$AXIS_LIB/commons-discovery.jar:
To use Axis client code, you can select AXISCLASSPATH when invoking Java by entering
java -cp %AXISCLASSPATH% ...