I recently came across sklearn2pmml and jpmml-sklearn when looking for a way to convert scikit-learn models to PMML. However, I've been hitting errors with the basic usage examples that I'm unable to figure out.
When running the usage example from sklearn2pmml, I get the following error about casting a long to an int:
Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
at numpy.core.NDArrayUtil.getShape(NDArrayUtil.java:66)
at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:92)
at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:76)
at sklearn.linear_model.BaseLinearClassifier.getCoefShape(BaseLinearClassifier.java:144)
at sklearn.linear_model.BaseLinearClassifier.getNumberOfFeatures(BaseLinearClassifier.java:56)
at sklearn.Classifier.createSchema(Classifier.java:50)
at org.jpmml.sklearn.Main.run(Main.java:104)
at org.jpmml.sklearn.Main.main(Main.java:87)
Traceback (most recent call last):
File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpmxyp2y.pkl'
Any suggestions as to what is going on here?
Usage code:
#
# Step 1: feature engineering
#
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import pandas
import sklearn_pandas
iris = load_iris()
iris_df = pandas.concat((
    pandas.DataFrame(iris.data[:, :], columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]),
    pandas.DataFrame(iris.target, columns = ["Species"])
), axis = 1)
iris_mapper = sklearn_pandas.DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], PCA(n_components = 3)),
    ("Species", None)
])
iris = iris_mapper.fit_transform(iris_df)
#
# Step 2: training a logistic regression model
#
from sklearn.linear_model import LogisticRegressionCV
iris_X = iris[:, 0:3]
iris_y = iris[:, 3]
iris_classifier = LogisticRegressionCV()
iris_classifier.fit(iris_X, iris_y)
#
# Step 3: conversion to PMML
#
from sklearn2pmml import sklearn2pmml
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
EDIT 12/6:
After the new update, the same issue comes up farther down the line:
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Updating 1 target field and 3 active field(s)
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping target field y to Species
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping active field(s) [x1, x2, x3] to [Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
Traceback (most recent call last):
File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpqeblat.pkl'
JPMML-SkLearn expected ndarray.shape to be a tuple of i4 values (mapped to java.lang.Integer by the Pyrolite library). However, in this case it was a tuple of i8 values (mapped to java.lang.Long), hence the cast exception.
This issue has been addressed in JPMML-SkLearn commit f7c16ac2fb.
If you encounter another exception (data translation between platforms can be tricky), please open a JPMML-SkLearn issue about it as well.
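As an aside, this kind of width mismatch is easy to reproduce with plain Python's struct module. The following is only a hedged illustration of an i4 reader meeting i8 data, not the actual Pyrolite wire format:

```python
import struct

# A 64-bit writer serializes the shape (150, 4) as two 8-byte
# little-endian integers ("i8").
shape = (150, 4)
buf = struct.pack("<2q", *shape)

# A reader that assumes 4-byte integers ("i4") splits the same bytes
# into four values, so the shape information is mangled.
misread = struct.unpack("<4i", buf)   # (150, 0, 4, 0)

# Reading with the width the writer actually used recovers the shape.
correct = struct.unpack("<2q", buf)   # (150, 4)
```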
We are trying to add in automated metrics to our Java Application, with Dropwizard metrics. So far, the config.yml file looks like this:
metrics:
  reporters:
    - type: log
      logger: metrics
      frequency: 5 minute
      includes: "io.dropwizard.jetty.MutableServletContextHandler.active-requests","io.dropwizard.jetty.MutableServletContextHandler.active-dispatches","io.dropwizard.jetty.MutableServletContextHandler.active-suspended"
When running this project, we get an error stating that the yaml file is malformed:
io.dropwizard.configuration.ConfigurationParsingException: test/config.yml has an error:
* Malformed YAML at line: 24, column: 82; while parsing a block mapping
in 'reader', line 20, column 5:
- type: log
^
expected <block end>, but found FlowEntry
in 'reader', line 23, column 81:
... tContextHandler.active-requests","io.dropwizard.jetty.MutableSer ...
^
What exactly is wrong with the way the YAML is written here? Our understanding was that the indentation, the spacing, and keeping commas outside of the quotes were all correct, and we're not able to find any other issues.
Just change line 6 (the includes entry) to a YAML flow sequence:
includes: [io.dropwizard.jetty.MutableServletContextHandler.active-requests,io.dropwizard.jetty.MutableServletContextHandler.active-dispatches,io.dropwizard.jetty.MutableServletContextHandler.active-suspended]
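If you prefer one metric per line, a YAML block sequence for the same includes entry should also parse (same metric names as in the question):

```yaml
includes:
  - io.dropwizard.jetty.MutableServletContextHandler.active-requests
  - io.dropwizard.jetty.MutableServletContextHandler.active-dispatches
  - io.dropwizard.jetty.MutableServletContextHandler.active-suspended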
I am trying to get this example for importing TensorFlow models in Java to run: https://github.com/deeplearning4j/tf-import
I have managed to get the Java import in Mnist.java to work after some modifications.
However, I am unable to get mnist_jumpy.py to work in order to use the model in DeepLearning4J from Python. I got it to run with the modifications below, but I get this exception when loading the model:
log4j:WARN No appenders could be found for logger (org.nd4j.linalg.factory.Nd4jBackend).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.rits.cloning.Cloner (file:/home/micha/.deeplearning4j/pydl4j/pydl4j-1.0.0-SNAPSHOT-cpu-core-datavec-spark2-2.11/pydl4j-1.0.0-SNAPSHOT-bin.jar) to field java.util.TreeSet.m
WARNING: Please consider reporting this to the maintainers of com.rits.cloning.Cloner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Traceback (most recent call last):
File "/home/micha/Documents/01_work/git/tf_java_import_testing/tf-import/mnist/mnist_jumpy.py", line 17, in <module>
tf_model = jp.TFModel(path + '/mnist.pb')
File "/home/micha/Documents/01_work/git/tf_java_import_testing/tf-import/venv-deepmom/lib/python3.7/site-packages/jumpy/tf_model.py", line 22, in __init__
self.sd = TFGraphMapper.getInstance().importGraph(filepath)
File "jnius/jnius_export_class.pxi", line 906, in jnius.JavaMultipleMethod.__call__
File "jnius/jnius_export_class.pxi", line 638, in jnius.JavaMethod.__call__
File "jnius/jnius_export_class.pxi", line 715, in jnius.JavaMethod.call_method
File "jnius/jnius_utils.pxi", line 93, in jnius.check_exception
jnius.JavaException: JVM exception occurred: class java.lang.String cannot be cast to class org.tensorflow.framework.GraphDef (java.lang.String is in module java.base of loader 'bootstrap'; org.tensorflow.framework.GraphDef is in unnamed module of loader 'app')
From what I understand I am getting the exception jnius.JavaException, because the classes java.lang.String and org.tensorflow.framework.GraphDef live in different contexts, but I don't know how to resolve this (I am completely unfamiliar with jnius).
Any help would be greatly appreciated.
This is my version of mnist_jumpy.py:
import os
try:
    # from jnius import autoclass
    import jumpy as jp
except KeyError:
    os.environ['JDK_HOME'] = "/usr/lib/jvm/java-11-openjdk-amd64"
    os.environ['JAVA_HOME'] = "/usr/lib/jvm/java-11-openjdk-amd64"
    import jumpy as jp
from scipy import ndimage
# import numpy as np
import os
path = os.path.dirname(os.path.abspath(__file__))
# load tensorflow model
tf_model = jp.TFModel(path + '/mnist.pb')
# load jpg to numpy array
image = ndimage.imread(path + '/img (1).jpg').reshape((1, 28, 28))
# inference - uses nd4j
prediction = tf_model(image) # prediction is a jumpy array
# get label from prediction using argmax
label = jp.argmax(prediction.reshape((10,)))
print(label)
I am trying to train a classifier on labeled data (data with the outcome vector included) and run predictions on unlabeled data using the Weka library in Java.
I've investigated every case I can find online of someone receiving the error
Exception in thread "main" java.lang.IndexOutOfBoundsException
when doing this, and they all seem to be caused by either
a mismatch between the training and prediction data structures, or
improperly handled missing values.
The solution to the first cause is to make the data structures match and verify this with train.equalHeaders(test). As far as I can tell the data structures are an exact match, and the result of equalHeaders is true. There is no missing data in the data sets I've been using for development / testing.
My training data is the famous iris data set, which I produced by using the copy that's built into R by calling data(iris); write.csv(iris, "iris.csv", row.names = F). My prediction (test) data is the exact same data set, with the last column removed to simulate unlabeled test data. I've tried reading these files as .csv and from SQL Server tables and have encountered the same result.
I have tried 2 different ways of running the predictions: the way that's currently uncommented, and the .evaluateModel method; both produce the same error.
I have also tried changing the algorithm but this does not affect the error.
I have also printed the data to the screen and examined all of the available summary / diagnostic methods, all of which look as they should.
The key part of my code is as follows. Originally I posted the entire code, so if you'd like to see that it's available in the edit history.
// Add dummy outcome attribute to make shape match training
Add filter1;
filter1 = new Add();
filter1.setAttributeIndex("last");
filter1.setNominalLabels("'\"setosa\"','\"versicolor\"','\"virginica\"'");
filter1.setAttributeName("\"Species\"");
filter1.setInputFormat(test);
test = Filter.useFilter(test, filter1);

Instances newTest = Filter.useFilter(test, filter); // create new test set
// set class attribute
newTest.setClassIndex(newTest.numAttributes() - 1);
// create copy
Instances labeled = new Instances(newTest);
System.out.println("check headers: " + newTrain.equalHeaders(newTest));
System.out.println(newTest); // throws the error if included
// label instances
for (int i = 0; i < newTest.numInstances(); i++) { // properly indexed
    System.out.println(i);
    double clsLabel = rf.classifyInstance(newTest.instance(i)); // throws the error if the earlier print is not included
    labeled.instance(i).setClassValue(clsLabel);
}
The full error is:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 22, Size: 22
java.util.ArrayList.rangeCheck(ArrayList.java:653)
java.util.ArrayList.get(ArrayList.java:429)
weka.core.Attribute.value(Attribute.java:735)
weka.core.AbstractInstance.stringValue(AbstractInstance.java:668)
weka.core.AbstractInstance.stringValue(AbstractInstance.java:644)
weka.core.AbstractInstance.toString(AbstractInstance.java:756)
weka.core.DenseInstance.toStringNoWeight(DenseInstance.java:330)
weka.core.AbstractInstance.toStringMaxDecimalDigits(AbstractInstance.java:692)
weka.core.AbstractInstance.toString(AbstractInstance.java:712)
java.lang.String.valueOf(String.java:2981)
java.lang.StringBuffer.append(StringBuffer.java:265)
weka.core.Instances.stringWithoutHeader(Instances.java:1734)
weka.core.Instances.toString(Instances.java:1718)
java.lang.String.valueOf(String.java:2981)
java.io.PrintStream.println(PrintStream.java:821)
weka.core.Tee.println(Tee.java:484)
myWeka.myWeka.main(myWeka.java:262)
C:\Users\eggwhite\AppData\Local\NetBeans\Cache\8.1\executor-snippets\run.xml:53:
Java returned: 1
BUILD FAILED (total time: 23 seconds)
The error is thrown by double clsLabel = rf.classifyInstance(newTest.instance(i)); unless I include the line System.out.println(newTest); for diagnostic purposes, in which case the same error is thrown by that line.
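The stack trace points at weka.core.Attribute.value, which is essentially an index lookup into the list of nominal values backing the attribute. As a toy analogue of that failure mode (illustrative Python, not Weka's code), the exception corresponds to an instance carrying a value index that the attribute's value list does not contain:

```python
# A nominal attribute stores its distinct values in a list; looking up a
# value by index fails as soon as the index reaches the list's size.
values = ['"setosa"', '"versicolor"', '"virginica"']

def string_value(index):
    # analogue of weka.core.Attribute.value(int) backed by an ArrayList
    return values[index]

ok = string_value(1)        # a valid index works
try:
    string_value(3)         # index == size, like "Index: 22, Size: 22"
    failed = False
except IndexError:
    failed = True
```

One possibility worth checking is whether the nominal labels were built identically on both sides, e.g. with the embedded quote characters ('"setosa"') in one place but plain values (setosa) in the other.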
I need to encode and decode some text using Reed-Solomon error correction codes. The implementation should be in Java.
I have gone through Sean Owen's implementation classes but was not able to construct these classes into a working example.
Can somebody please post a working example of Reed-Solomon error correction codes, or any reference links?
This is a bit late, but there is a fully working example in Java on GitHub here:
https://github.com/alexbeutel/Error-Correcting-Codes/tree/master/src
It features the following classes:
Decoder.java <== R-S Decoder class
Encoder.java <== R-S Encoder class
ErrorCodesMain.java <== Fully working example
GF257.java <== Galois Fields(257) class
GF28.java <== Galois Fields(2^8) class
To build the project from the command line:
javac ErrorCodesMain.java Decoder.java Encoder.java GF257.java GF28.java
To run it:
java ErrorCodesMain
Here is the program's output:
# of Generators of GF(2^8): 128
# of Generators of GF(257): 128
Generator: 206
Erasures: 38, 1, 7, 15, 28, 16, 29, 28, 7, 8,
OUTPUT FROM O(nk) IN GF(2^8): Hello, my name is Alex Beutel.
FFT OUTPUT DECODED: Hello, my name is Alex Beutel.
OUTPUT FROM O(nk) IN GF(257): Hello, my name is Alex Beutel.
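For readers who want something self-contained before digging into that repository, the encoding side can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions (GF(2^8) with the common 0x11d primitive polynomial, generator element 2, systematic encoding), not the code from the linked project:

```python
def gf_mul(x, y, prim=0x11d):
    """Carry-less multiply in GF(2^8), reducing by the primitive polynomial."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= prim
    return r

def gf_pow(x, n):
    """x raised to the n-th power in GF(2^8)."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, x)
    return r

def poly_mul(p, q):
    """Multiply polynomials with GF(2^8) coefficients (highest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

def rs_generator_poly(nsym):
    """Generator polynomial with roots alpha^0 .. alpha^(nsym-1), alpha = 2."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, gf_pow(2, i)])
    return g

def rs_encode(msg, nsym):
    """Append nsym parity bytes: remainder of msg * x^nsym divided by g(x)."""
    gen = rs_generator_poly(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = rem[i]
        if coef:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + rem[len(msg):]

def poly_eval(p, x):
    """Evaluate a polynomial at x (Horner's scheme in GF(2^8))."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y
```

A valid codeword evaluates to zero at every root of the generator polynomial, which gives a quick self-check. Decoding (syndromes, error locator, error correction) is where most of the work lives; the repository above, or the ZXing project's reedsolomon package (Sean Owen's implementation mentioned in the question), are good references for that part.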
I am running a JAR file on Mac OS. It generates the following error:
9/2/09 1:17:54 PM [0x0-0x30c30c].com.apple.JarLauncher[11128] at
content.Main.(Main.java:18)
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130]
SystemFlippers: didn't consume all data for long ID 0 (pBase =
0x10012ecc0, p = 0x10012ecc4, pEnd = 0x10012ecc8)
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130]
SystemFlippers: didn't consume all data for long ID 0 (pBase =
0x100110140, p = 0x100110144, pEnd = 0x100110148)
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130]
SystemFlippers: didn't consume all data for long ID 0 (pBase =
0x100110140, p = 0x100110144, pEnd = 0x100110148)
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130]
Exception in thread "main"
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130]
java.lang.NoClassDefFoundError: javax/swing/GroupLayout$Group
9/2/09 1:18:06 PM [0x0-0x30d30d].com.apple.JarLauncher[11130] at
content.Main.(Main.java:18)
Are there required Java libraries I need on my Mac?
Thanks in advance.
GroupLayout was introduced in Java 1.6; Mac OS X 10.4 and 10.5 still use Java 1.5 by default. Even when 1.6 is installed, you have to manually set OS X to use 1.6. Alternatively, if you are the developer of the application, there are separate JARs for GroupLayout; you can bundle one with your application and use GroupLayout with Java 1.5.
Strange error.
From this line it looks like you are missing Swing:
java.lang.NoClassDefFoundError: javax/swing/GroupLayout$Group
It may be that you are using gcj. Try downloading the latest version of Java and see if that improves things.
You can check which version you are currently using with:
java -version