Creating a threshold file with weka's command line

Creating a threshold file with weka's command line - java

I need to get the threshold curve from my trained classifiers automatically, so I'm figuring out how to do this with the command line (currently using Weka's SimpleCLI). Following the output from java weka.classifiers.functions.Logistic -h I'm trying to use the -threshold-file argument, described as:
-threshold-file <file>
The file to save the threshold data to.
The format is determined by the extensions, e.g., '.arff' for ARFF
format or '.csv' for CSV.
This is the line I'm trying to execute from the SimpleCLI (broken in 2 parts for ease of read):
java weka.classifiers.functions.Logistic -t ".\data\iris.arff" -no-cv -R 1.0E-8 -M -1 \
-threshold-file "C:\Temp\somefile.csv"
Which gives me either this:
java.lang.NullPointerException
or this (or both):
weka.classifiers.evaluation.ThresholdCurve.getCurve(ThresholdCurve.java:125)
weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1739)
weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:359)
weka.classifiers.functions.Logistic.main(Logistic.java:1134)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
java.lang.reflect.Method.invoke(Unknown Source)
weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:199)
at weka.classifiers.evaluation.ThresholdCurve.getCurve(ThresholdCurve.java:125)
at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1739)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:359)
at weka.classifiers.functions.Logistic.main(Logistic.java:1134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:199)
(Executing this from Windows cmd.exe gives me roughly the same messages. Note that I'm using Weka 3.7.11 in a Windows 7 machine (64 bits) and Java 7 (Update 55).)
Note that deleting the last part will make the command work ok, although not creating the desired threshold file.
I've tried many variants of this line, with the same result. I'm not familiar with java. What I need is to know how I'm doing this wrong.
Thanks in advance.
Update: it was pointed out to me that the line weka.classifiers.evaluation.ThresholdCurve.getCurve(ThresholdCurve.java:125) is:
if ((predictions.size() == 0)
|| (((NominalPrediction) predictions.get(0)).distribution().length <= classIndex)) {
return null;
}
(source)
So it seems that the problem arises because there are no predictions made at all. I don't know why would this happen and how I can reverse it.

The code you are using to invoke the simple cli classifier does not generate any evaluation result, based upon which you can get the threshold curve.
You can do the following.
Remove the -no-cv parameter.
Specify a test file using -T option.

Related

Exception while trying to load openNLP POS models

I have been trying to use POS Models for POS tagging, but while loading the Models I get the following exception, and this happens for both maxent as well as perceptron models:
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at java.util.zip.ZipInputStream.read(Unknown Source)
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at java.io.DataInputStream.readDouble(Unknown Source)
at opennlp.model.BinaryFileDataReader.readDouble(BinaryFileDataReader.java:53)
at opennlp.model.AbstractModelReader.readDouble(AbstractModelReader.java:75)
at opennlp.model.AbstractModelReader.getParameters(AbstractModelReader.java:146)
at opennlp.perceptron.PerceptronModelReader.constructModel(PerceptronModelReader.java:69)
at opennlp.model.GenericModelReader.constructModel(GenericModelReader.java:59)
at opennlp.model.AbstractModelReader.getModel(AbstractModelReader.java:87)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:35)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:31)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:231)
at opennlp.tools.util.model.BaseModel.(BaseModel.java:190)
at opennlp.tools.postag.POSModel.(POSModel.java:86)
at nlpcheck.NlpPOC.POSTag(NlpPOC.java:54)
at nlpcheck.NlpPOC.main(NlpPOC.java:86)
I have tried loading the tokenizaton model (en-token.bin) and Its loading and working fine.
Following is java snippet that I am using to load Model:
InputStream is = new FileInputStream(MODEL_PATH);
POSModel model = new POSModel(is);
I have downloaded the models (en-pos-perceptron.bin, en-pos-maxent.bin) from http://www.opennlp.org/models-1.5/.

It turns out the model file hosted on site mentioned above were corrupt, I was trying a different tool namely GATE(General architecture for Text Engineering) which was using the same model files so I copied them and put them on build path and it worked.

What causes an InternalError to be thrown by sun.awt.shell.Win32ShellFolder2.initSpecial()?

Some of our Windows users get this stack trace shortly after launching our app:
java.lang.InternalError: Could not bind shell folder to interface
at sun.awt.shell.Win32ShellFolder2.initSpecial(Native Method) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2.access$300(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2$1.call(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2$1.call(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolderManager2$ComInvoker.invoke(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.ShellFolder.invoke(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2.<init>(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolderManager2.getNetwork(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2.getFileSystemPath(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2.access$400(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2$10.call(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolder2$10.call(Unknown Source) ~[na:1.7.0_25]
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[na:1.7.0_25]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_25]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.7.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.7.0_25]
at sun.awt.shell.Win32ShellFolderManager2$ComInvoker$3.run(Unknown Source) ~[na:1.7.0_25]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_25]
Observations:
Every frame in the stack trace is something in the JDK, not in our code.
This happens only on Windows, but we've had reports of it on both Vista and Windows 7.
This happens with various versions of Java: 1.6.0_19, 1.6.0_21, 1.7.0_11, 1.7.0_25.
The problem happens to only a handful of our users, but is 100% repeatable for those users. Unfortunately, we haven't been able to see anything which those users' systems have in common other than exhibiting this issue, and none of our developers has ever experienced it themselves.
My search of Oracle's bug database turned up no bugs with the same or a similar stack trace.
There seem to be a lot of posts on the net about this particular issue without anyone having any idea what causes it.
I'm not holding out any hope of Oracle fixing whatever the problem is, if it is indeed a JDK bug---but if we knew what triggered the bug, we could at least help our users afflicted by it. Can anyone shed light on what causes this to happen?
Edit: The relevant native function is this one, from ShellFolder2.cpp:
JNIEXPORT void JNICALL Java_sun_awt_shell_Win32ShellFolder2_initSpecial
(JNIEnv* env, jobject folder, jlong desktopIShellFolder, jint folderType)
{
// Get desktop IShellFolder interface
IShellFolder* pDesktop = (IShellFolder*)desktopIShellFolder;
if (pDesktop == NULL) {
JNU_ThrowInternalError(env, "Desktop shell folder missing");
return;
}
// Get special folder relative PIDL
LPITEMIDLIST relPIDL;
HRESULT res = fn_SHGetSpecialFolderLocation(NULL, folderType,
&relPIDL);
if (res != S_OK) {
JNU_ThrowIOException(env, "Could not get shell folder ID list");
return;
}
// Set field ID for relative PIDL
env->CallVoidMethod(folder, MID_relativePIDL, (jlong)relPIDL);
// Get special folder IShellFolder interface
IShellFolder* pFolder;
res = pDesktop->BindToObject(relPIDL, NULL, IID_IShellFolder,
(void**)&pFolder);
if (res != S_OK) {
JNU_ThrowInternalError(env,
"Could not bind shell folder to interface");
return;
}
// Set field ID for pIShellFolder
env->CallVoidMethod(folder, MID_pIShellFolder, (jlong)pFolder);
}
In order to reach the "Could not bind" exception, it looks like pDesktop != NULL and relPIDL is retrieved successfully, but pDesktop->BindToObject() returns something other than S_OK. pDesktop is an IShellFolder*, which is apparently defined in Windows's <shellapi.h>. Aggravatingly, Java throws away the error code returned by IShellFolder::BindToObject.
So, I guess the question reduces to: What can cause IShellFolder::BindToObject to fail?
Edit 2: Since Win32ShellFolderManager2.getNetwork() is what's calling the Win32ShellFolder2 ctor at Win32ShellFolderManager2.java:181, we can see that the last argument to Win32ShellFolder2.initSpecial must be Win32ShellFolder2.NETWORK. So, is something is wrong with the user's Network Neighborhood folder, perhaps?

Well, there are several reports similar to yours (like this one from jEdit, this one from codenameone and this one - in german - which seems like a JFileChooser bug with a stack trace very close to yours in Windows 7 and Java 6). All seem related to JFileChooser and/or File Browsing in one way or another.
So I would approach this in one of two ways:
Either go for the time-consuming / non speculative road and take dumps of the affected installations with tools such as jstack, VisualVM or JConsole until you are able to isolate the root cause of the problem (which may or may not be the JFileChooser). If you choose that path, remember that either remote access to one of the affected installations or help from a technical savvy user is a must.
Or try to take a shortcut, assume that the problem is indeed the JFileChooser (as anecdotal evidence shows) and release a custom version replacing the JFileChooser by FileDialog. If it runs as expected on the affected machines case is closed; else give yourself a tap on the back for trying and take "The Long and Winding Road".

Had same problem and the answer was
java.awt.FileDialog
If you look up VbScript solutions for "Open File Dialog", it seems that there is no COM class doing the job for most of the windows platforms

IndexOutOfBoundsException in Mahout

I'm trying to run mahout SGD classifier on a CSV file, and I'm getting this error -
[vineet#localhost bin]$ ./mahout trainlogistic --input ./filtered.csv --output model --target target --categories 33 \
--features 200 --passes 10 --predictors subject --types text --rate 50
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
at java.util.ArrayList.rangeCheck(ArrayList.java:604)
at java.util.ArrayList.get(ArrayList.java:382)
at org.apache.mahout.classifier.sgd.CsvRecordFactory.processLine(CsvRecordFactory.java:245)
at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:85)
at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
The CSV file contains unicode text, and large text fields enclosed by quote characters.
I've tried the classifier on the sample donut.csv, and it works fine.
I also tried changing my header row to make it like "id","subject","field2",etc.., but it still doesn't work.
What am I doing wrong?

some lines may dirty - only have 4 attributes instead of 6. check your data again or try to feed only one line of data to validate my guess.

InvalidClassException : class invalid for deserialization

I am using SSA parser library in my project. When I invoke main method of one of it's class using command prompt it works fine on my machine.
I execute following command from command prompt :
java -Xmx800M -cp %1 edu.stanford.nlp.parser.lexparser.LexicalizedParser -retainTMPSubcategories -outputFormat "penn,typedDependenciesCollapsed" englishPCFG.ser.gz %2
But when I tried to use the same class in my java program, I am getting Caused by: java.io.InvalidClassException: edu.stanford.nlp.stats.Counter; edu.stanford.nlp.stats.Counter; class invalid for deserialization exception.
Following line throws error :
LexicalizedParser _parser = new LexicalizedParser("C:\englishPCFG.ser.gz");
This englishPCFG.ser.gz file contains some classes or information which gets loaded when creating object of type LexicalizedParser.
Following is the stacktrace :
Loading parser from serialized file C:\englishPCFG.ser.gz ...
Exception in thread "main" java.lang.RuntimeException: Invalid class in file: C:\englishPCFG.ser.gz
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserDataFromSerializedFile(LexicalizedParser.java:822)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserDataFromFile(LexicalizedParser.java:603)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.<init>(LexicalizedParser.java:168)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.<init>(LexicalizedParser.java:154)
at com.tcs.srl.ssa.SSAInvoker.<init>(SSAInvoker.java:21)
at com.tcs.srl.ssa.SSAInvoker.main(SSAInvoker.java:53)
Caused by: java.io.InvalidClassException: edu.stanford.nlp.stats.Counter; edu.stanford.nlp.stats.Counter; class invalid for deserialization
at java.io.ObjectStreamClass.checkDeserialize(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserDataFromSerializedFile(LexicalizedParser.java:814)
... 5 more
Caused by: java.io.InvalidClassException: edu.stanford.nlp.stats.Counter; class invalid for deserialization
at java.io.ObjectStreamClass.initNonProxy(Unknown Source)
at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
at java.io.ObjectInputStream.readClassDesc(Unknown Source)
... 17 more
I am new to Java world so I dont to why this error is coming and what should I do to avoid it.
I googled for this error then I found out that this error comes because of some version mismatch which I think is something similar to dll hell of windows API. Am I correct?
Anyone knows why this kind of error comes? and what should we do to avoid it?
Please enlighten !!!

It could be because the serialVersionUID of the classe has changed, and you are trying to read an object that was written with another version of the class.
You can force the version number by déclaring a serialVersionUID in your serializable class:
private static final long serialVersionUID = 1L;

The java word for dll hell is classpath hell ;-) But that's not your hell anyway.
Object serialization is a process of persisting java objects to files (or streams). The output format is binary. Deserialization (iaw: making java objects from serialized data) requires the same versions of the classes.
So it is possible, that you simply use an older or newer version of that Counter class. This input file should be shipped with a documentation that clearly says, which version of the parser is required. I'd investigate in that direction first.

OT: For the sake of completeness I ran into InvalidClassException ... class invalid for deserialization (and this question) when solving another problem.
(Since edu.stanford.nlp.stats.Counter is not anonymous, the case in this question is certainly not the same case as mine.)
I was sending a serialized class from server to client, the class had two anonymous classes. The jar with these classes was shared among server and client but for server it was compiled in Eclipse JDT, for client in javac. Compilers generated different ordering of names $1, $2 for anonymous classes, hence instance of $1 was sent by server but could not be received as $1 at client side. More info in blogpost (in Czech, though example is obvious).

Try using serialVer to generate the serialID of your old classes that you're trying to de-serialize and add it explicitly (private static final long serialVersionUID = (insert number from serialVer here)L;) in the new versions of the class. If you change anything in a class serialized and you haven't setted the serialID, java thinks the class you've serialized isn't compatible with the new one.

This error suggests that the serialized objects within C:\englishPCFG.ser.gz were serialized with using a older or newer definition of the class which unfortunately is different in such a way that it breaks the terms of compatible serialization from one version to another.
Please see http://download.oracle.com/javase/1.4.2/docs/api/java/io/InvalidClassException.html
Can you check to see when this file was produced and then if possible locate the version of the SSAParser library at the time of it's creation?

java.lang.ArrayIndexOutOfBoundsException when creating a connection to an Oracle database

It appears that Oracle's java client has a bug - if the tnsnames.ora file has misplaced spaces/tabs/new-lines in particular places, you get an exception with the following trace:
java.lang.ArrayIndexOutOfBoundsException: <some number>
at oracle.net.nl.NVTokens.parseTokens(Unknown Source)
at oracle.net.nl.NVFactory.createNVPair(Unknown Source)
at oracle.net.nl.NLParamParser.addNLPListElement(Unknown Source)
at oracle.net.nl.NLParamParser.initializeNlpa(Unknown Source)
at oracle.net.nl.NLParamParser.<init>(Unknown Source)
at oracle.net.resolver.TNSNamesNamingAdapter.loadFile(Unknown Source)
at oracle.net.resolver.TNSNamesNamingAdapter.checkAndReload(Unknown Source)
at oracle.net.resolver.TNSNamesNamingAdapter.resolve(Unknown Source)
at oracle.net.resolver.NameResolver.resolveName(Unknown Source)
at oracle.net.resolver.AddrResolution.resolveAndExecute(Unknown Source)
at oracle.net.ns.NSProtocol.establishConnection(Unknown Source)
at oracle.net.ns.NSProtocol.connect(Unknown Source)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1037)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:282)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:468)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:165)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:35)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:839)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
If you take a C++ application and try to connect with it to the database with the same tnsnames.ora in use - it works fine. Same goes for sqlplus. Also tnsping which should parse this file has no problem resolving any service name. Seems like Oracle were too lazy to .trim() the values or something - and it is the same problem with Oracle client versions 9, 10 and 11.
Any idea why this problem exists and what is the exact problem with the tnsnames.ora format? (I just remove all white-spaces to resolve it)

I tried the advice from GriffeyDog but unfortunately it did not solve the problem - so eventually I too the check for your self approach:
Oracle's documentation states that the structure of a record in the tnsnames.ora file should be as such:
net_service_name=
(DESCRIPTION=
(ADDRESS=...)
(ADDRESS=...)
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com)))
Ours was:
net_service_name=
(DESCRIPTION=
(ADDRESS=...)
(ADDRESS=...)
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com)))
Apparently the indentation is crucial - if any of the lines in the block of a single net_service_name start at index 1 - this exception is thrown.
Only once you add indentation to all (can be spaces or tab) - it works. It doesn't have to look good, but has to have an offset of some sort.
Important note - the only problem is with '(', indentation rules don't apply to ')'.
E.g. the below example is perfectly fine:
net_service_name=
(DESCRIPTION=
(ADDRESS=...
)
(ADDRESS=...
)
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com))
)
After searching for this issue to be documented - I finally found out that indeed it is documented at http://download.oracle.com/docs/cd/A57673_01/DOC/net/doc/NWUS233/apb.htm
And here is the important excerpt:
Even if you do not choose to indent your files in this way, you must indent a wrapped line by at least one space, or it will be misread as a new parameter. The following layout is acceptable:
(ADDRESS=(COMMUNITY=tcpcom.world)(PROTOCOL=tcp)
(HOST=max.world)(PORT=1521))
The following layout is not acceptable:
(ADDRESS=(COMMUNITY=tcpcom.world)(PROTOCOL=tcp)
(HOST=max.world)(PORT=1521))

I've seen similar issues arise when a text file is saved with Unix-style end-of-lines (LF) versus DOS/Windows-style (CR/LF), or vice-versa. You might try opening your tnsnames.ora file with an editor that supports saving in both formats to see if you can correct the problem that way.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Creating a threshold file with weka's command line - java

The code you are using to invoke the simple cli classifier does not generate any evaluation result, based upon which you can get the threshold curve. You can do the following. Remove the -no-cv parameter. Specify a test file using -T option.

Related

Exception while trying to load openNLP POS models

What causes an InternalError to be thrown by sun.awt.shell.Win32ShellFolder2.initSpecial()?

IndexOutOfBoundsException in Mahout

InvalidClassException : class invalid for deserialization

java.lang.ArrayIndexOutOfBoundsException when creating a connection to an Oracle database

Categories

Resources