"No Operation named [input] in the Graph" in Java - java

Following this Colab exeercise from Google's ML Crash Course, I generated a model in Python for the MNIST database. The code looks as follows:
import pandas as pd
import tensorflow as tf
def create_model(my_learning_rate):
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(28, 28), name='input'))
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(units=256, activation='relu'))
model.add(tf.keras.layers.Dense(units=128, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=0.2))
model.add(tf.keras.layers.Dense(units=10, activation='softmax', name='output'))
model.compile(optimizer=tf.keras.optimizers.Adam(lr=my_learning_rate),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
def train_model(model, train_features, train_label, epochs,
batch_size=None, validation_split=0.1):
history = model.fit(x=train_features, y=train_label, batch_size=batch_size,
epochs=epochs, shuffle=True,
validation_split=validation_split)
epochs = history.epoch
hist = pd.DataFrame(history.history)
return epochs, hist
if __name__ == '__main__':
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train_normalized = x_train / 255.0
x_test_normalized = x_test / 255.0
learning_rate = 0.003
epochs = 50
batch_size = 4000
validation_split = 0.2
my_model = create_model(learning_rate)
epochs, hist = train_model(my_model, x_train_normalized, y_train,
epochs, batch_size, validation_split)
my_model.save('my_model')
The model is saved to the "my_model" folder, as it should. Now I load it again in my Java program:
public class HelloTensorFlow {
public static void main(final String[] args) {
final String filePath = Paths.get("my_model").toAbsolutePath().toString();
try (final SavedModelBundle b = SavedModelBundle.load(filePath, "serve")) {
final Session sess = b.session();
final Tensor<Float> x = Tensor.create(new float[1][28 * 28], Float.class);
final List<Tensor<?>> run = sess.runner()
.feed("input", x)
.fetch("output")
.run();
final float[] y = run.get(0).copyTo(new float[1]);
System.out.println(y[0]);
}
}
}
The model is loaded but the runner does not work. When I execute the program, I get "No Operation named [input] in the Graph", even though my Input has this name. What am I doing wrong. I have the newest TensorFlow versions: 2.3.0 (Python) and 1.15.0 (Java).

I solved it. TensorFlow 2 seems to have odd naming schemes, but using the MetaGraphDef, this can be deciphered. First, you need the org.tensorflow.proto dependency. Then, you can extract the information from the meta graph like so:
final MetaGraphDef metaGraphDef = MetaGraphDef.parseFrom(bundle.metaGraphDef());
final SignatureDef signatureDef = metaGraphDef.getSignatureDefMap().get("serving_default");
final TensorInfo inputTensorInfo = signatureDef.getInputsMap()
.values()
.stream()
.filter(Objects::nonNull)
.findFirst()
.orElseThrow(() -> ...);
final TensorInfo outputTensorInfo = signatureDef.getOutputsMap()
.values()
.stream()
.filter(Objects::nonNull)
.findFirst()
.orElseThrow(() -> ...);
Now you can feed the tensor you created into the name from inputTensorInfo.getName() and fetch the results from outputTensorInfo.getName().

Related

Java flight recorder profiling summary extraction

I would like to be able to extract, in a textual format, some kind of a summary of the 'hot spots' detected by analyzing a JFR recording. To be precise, I would like to extract the hot-spots resulting from running some specific functions or some specific class. Something like this:
<some tool> myfile.jfr --look-at-class='org/me/MyClass' --limit 10 --order-by self-time
And obtain a table of the most time-consuming 10 methods methods called from MyClass in org.me. I tried looking at the jfr command-line tool, but it does not have such functionality. Alternatively, JMC only has a graphical interface, but not a command-line interface. Is there another way to obtain such a result?
It's easy to create such a tool using JFR Parsing API.
import jdk.jfr.consumer.RecordingFile;
import java.nio.file.Paths;
import java.util.HashMap;
public class JfrTop {
public static void main(String[] args) throws Exception {
var fileName = args[0];
var packageName = args.length > 1 ? args[1] : "";
var top = args.length > 2 ? Integer.parseInt(args[2]) : 10;
var hotMethods = new HashMap<String, Long>();
long total = 0;
try (var recording = new RecordingFile(Paths.get(fileName))) {
while (recording.hasMoreEvents()) {
var event = recording.readEvent();
if (event.getEventType().getName().equals("jdk.ExecutionSample")) {
var stackTrace = event.getStackTrace();
if (stackTrace != null && stackTrace.getFrames().size() > 0) {
var method = stackTrace.getFrames().get(0).getMethod();
var className = method.getType().getName();
if (className.startsWith(packageName)) {
var fullName = className + '.' + method.getName() + method.getDescriptor();
hotMethods.compute(fullName, (key, val) -> val == null ? 1L : (val + 1));
}
}
total++;
}
}
}
double percent = 100.0 / total;
hotMethods.entrySet().stream()
.sorted((e1, e2) -> Long.compare(e2.getValue(), e1.getValue()))
.limit(top)
.forEachOrdered(e -> System.out.printf("%5.2f%% %s\n", e.getValue() * percent, e.getKey()));
}
}
How to run:
java JfrTop idea.jfr com.intellij.openapi.editor 10
Sample output:
20,35% com.intellij.openapi.editor.impl.RangeHighlighterTree$RHNode.recalculateRenderFlags()V
4,20% com.intellij.openapi.editor.impl.IntervalTreeImpl.maxEndOf(Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;I)I
3,19% com.intellij.openapi.editor.impl.IntervalTreeImpl.assertAllDeltasAreNull(Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;)V
2,19% com.intellij.openapi.editor.impl.IntervalTreeImpl$IntervalNode.computeDeltaUpToRoot()I
1,94% com.intellij.openapi.editor.impl.IntervalTreeImpl.pushDelta(Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;)Z
1,63% com.intellij.openapi.editor.impl.IntervalTreeImpl$IntervalNode.hasAliveKey(Z)Z
1,50% com.intellij.openapi.editor.impl.IntervalTreeImpl.correctMax(Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;I)V
1,31% com.intellij.openapi.editor.impl.IntervalTreeImpl$1.hasNext()Z
0,88% com.intellij.openapi.editor.impl.IntervalTreeImpl$IntervalNode.tryToSetCachedValues(IZI)Z
0,63% com.intellij.openapi.editor.impl.IntervalTreeImpl.findOrInsert(Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;)Lcom/intellij/openapi/editor/impl/IntervalTreeImpl$IntervalNode;

The i/p col features must be either string or numeric type, but got org.apache.spark.ml.linalg.VectorUDT

I am very new to Spark Machine Learning just an 3 day old novice and I'm basically trying to predict some data using Logistic Regression algorithm in spark via Java. I have referred few sites and documentation and came up with the code and i am trying to execute it but facing an issue.
So i have pre-processed the data and have used vector assembler to club all the relevant columns into one and i am trying to fit the model and facing an issue.
public class Sparkdemo {
static SparkSession session = SparkSession.builder().appName("spark_demo")
.master("local[*]").getOrCreate();
#SuppressWarnings("empty-statement")
public static void getData() {
Dataset<Row> inputFile = session.read()
.option("header", true)
.format("csv")
.option("inferschema", true)
.csv("C:\\Users\\WildJasmine\\Downloads\\NKI_cleaned.csv");
inputFile.show();
String[] columns = inputFile.columns();
int beg = 16, end = columns.length - 1;
String[] featuresToDrop = new String[end - beg + 1];
System.arraycopy(columns, beg, featuresToDrop, 0, featuresToDrop.length);
System.out.println("rows are\n " + Arrays.toString(featuresToDrop));
Dataset<Row> dataSubset = inputFile.drop(featuresToDrop);
String[] arr = {"Patient", "ID", "eventdeath"};
Dataset<Row> X = dataSubset.drop(arr);
X.show();
Dataset<Row> y = dataSubset.select("eventdeath");
y.show();
//Vector Assembler concept for merging all the cols into a single col
VectorAssembler assembler = new VectorAssembler()
.setInputCols(X.columns())
.setOutputCol("features");
Dataset<Row> dataset = assembler.transform(X);
dataset.show();
StringIndexer labelSplit = new StringIndexer().setInputCol("features").setOutputCol("label");
Dataset<Row> data = labelSplit.fit(dataset)
.transform(dataset);
data.show();
Dataset<Row>[] splitsX = data.randomSplit(new double[]{0.8, 0.2}, 42);
Dataset<Row> trainingX = splitsX[0];
Dataset<Row> testX = splitsX[1];
LogisticRegression lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8);
LogisticRegressionModel lrModel = lr.fit(trainingX);
Dataset<Row> prediction = lrModel.transform(testX);
prediction.show();
}
public static void main(String[] args) {
getData();
}}
Below image is my dataset,
dataset
Error message:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The input column features must be either string type or numeric type, but got org.apache.spark.ml.linalg.VectorUDT#3bfc3ba7.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.feature.StringIndexerBase$class.validateAndTransformSchema(StringIndexer.scala:86)
at org.apache.spark.ml.feature.StringIndexer.validateAndTransformSchema(StringIndexer.scala:109)
at org.apache.spark.ml.feature.StringIndexer.transformSchema(StringIndexer.scala:152)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
at org.apache.spark.ml.feature.StringIndexer.fit(StringIndexer.scala:135)
My end result is I need a predicted value using the features column.
Thanks in advance.
That error occurs when the input field of your dataframe for which you want to apply the StringIndexer transformation is a Vector. In the Spark documentation https://spark.apache.org/docs/latest/ml-features#stringindexer you can see that the input column is a string. This transformer performs a distinct to that column and creates a new column with integers that correspond to each different string value. It does not work for vectors.

Spark ml streaming predictOnValues how to save results?

I have following code:
StreamingLinearRegressionWithSGD regressionWithSGD =
new StreamingLinearRegressionWithSGD()
.setInitialWeights(Vectors.zeros(featuresNumber));
JavaDStream<LabeledPoint> trainingData = streamingContext.textFileStream(model.getTrainPath()).map(LabeledPoint::parse).cache();
JavaDStream<LabeledPoint> testData = streamingContext.textFileStream(model.getPredictPath()).map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);
regressionWithSGD.predictOnValues(testData.mapToPair(lp -> new Tuple2<>(lp.label(), lp.features()))).print();
I would like to put results to some file/db/queue and so on instead of print() is it possible?
I have figured it out
StreamingLinearRegressionWithSGD regressionWithSGD =
new StreamingLinearRegressionWithSGD()
.setInitialWeights(Vectors.zeros(featuresNumber));
JavaDStream<LabeledPoint> trainingData = streamingContext.textFileStream(model.getTrainPath()).map(LabeledPoint::parse).cache();
JavaDStream<LabeledPoint> testData = streamingContext.textFileStream(model.getPredictPath()).map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);
JavaDStream<Double> doubleJavaDStream=regressionWithSGD.predictOn(testData.map(labeledPoint -> labeledPoint.features()));
doubleJavaDStream.dstream().saveAsTextFiles("result","out");
So as a result we are getting result-{timestamp}.out folders.

Getting a Tensors value in Java

I habe trained a recurrent neural network with tensorflow using python. I saved the model and restored it in a Java-Application. This is working. Now i feed my input-Tensors to the pretrained modell and fetch the output. My problem now is, that the output is a Tensor and I don´t know hot to get the Tensors value (it is a simple integer-tensor of shape 1).
The python code looks like this:
sess = tf.InteractiveSession()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs], name="input_x")
y = tf.placeholder(tf.int32, [ None])
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
basic_cell = tf.contrib.rnn.OutputProjectionWrapper(tf.contrib.rnn.BasicRNNCell(num_units=n_neurons),output_size=n_outputs)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
logits = tf.layers.dense(states, n_outputs, name="logits")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, y,1, name="correct")
pred = tf.argmax(logits, 1, name="prediction")
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()
def train_and_save_rnn():
# create a Saver object as normal in Python to save your variables
saver = tf.train.Saver()
# Use a saver_def to get the "magic" strings to restore
saver_def = saver.as_saver_def()
print (saver_def.filename_tensor_name)
print (saver_def.restore_op_name)
# Loading the Train-DataSet
data_train, labels_train = load_training_data("Train.csv")
data_test, labels_test = load_training_data("Test.csv")
#labels_train=reshape_labels_to_sequences(labels_train)
#labels_test=reshape_labels_to_sequences(labels_test)
dt_train = reshape_data(data_train)
dt_test = reshape_data(data_test)
X_test = dt_test
X_test = X_test.reshape((-1, n_steps, n_inputs))
y_test = labels_test-1
sess.run(tf.global_variables_initializer())
# START TRAINING ...
for epoch in range(n_epochs):
for iteration in range(dt_train.shape[0]-1):
X_batch, y_batch = dt_train[iteration], labels_train[iteration]-1
X_batch = X_batch.reshape((-1, n_steps, n_inputs))
y_batch = y_batch.reshape((1))
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
# SAVE THE TRAINED MODEL ...
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save(True) #true for human-readable
What I do in Java is:
byte[] graphDef = readAllBytesOrExit(Paths.get(IMPORT_DIRECTORY, "/saved_model.pbtxt"));
/*List<String> labels =
readAllLinesOrExit(Paths.get(IMPORT_DIRECTORY, "trained_model.txt"));
*/
try (SavedModelBundle b = SavedModelBundle.load(IMPORT_DIRECTORY, "serve")) {
// create the session from the Bundle
Session sess = b.session();
s = sess;
g = b.graph();
// This is just a sample Tensor for debugging:
Tensor t = Tensor.create(new float[][][] {{{(float)0.8231331,(float)-5.2657013,(float)-1.1111984,(float)0.0074825287,(float)0.075252056,(float)0.07835889,(float)-0.035752058,(float)-0.035610847,(float)0.045247793,(float)1.5594741,(float)57.78549,(float)-0.21489286,(float)0.011989355,(float)0.15965772,(float)13.370155,(float)3.4708557,(float)3.7776794,(float)-1.1115816,(float)0.72939104,(float)-0.44342846,(float)11.001129,(float)10.549805,(float)-50.719162,(float)-0.8261242,(float)0.71805984,(float)-0.1849739,(float)9.334606,(float)3.0003967,(float)-52.456577,(float)-0.1875816,(float)0.19306469,(float)0.004947722,(float)5.4054375,(float)-0.8630371,(float)-24.599575,(float)1.3387873,(float)-1.1488495,(float)-2.8362968,(float)22.174248,(float)-32.095154,(float)10.069847}}});
runTensor(t);
}
public static void runTensor(Tensor inputTensor) throws IOException, FileNotFoundException {
try (Graph graph = g;
Session sess = s;) {
Integer gesture = null;
Tensor y_ph = Tensor.create(new int[]{0});
Tensor result = sess.runner()
.feed("input_x", inputTensor)
.feed("Placeholder", y_ph)
.fetch("pred")
.run().get(0);
System.out.println(result);
} catch (Exception e) {
e.printStackTrace();
}
}
The output should (I´m not sure if it´s working) be an Integer between 0 and 10 for the predicted class. How can I extract the Integer in Java from the Tensor?
Thank you in advance.
Use Tensor.intValue() if it is a scalar, or Tensor.copyTo() if it is not. (So System.out.println(result.intValue());)

Building decision tree pipeline for kyro-encoded Datasets in spark 2.0.2 with Java

I'm trying to build a version of the decision tree classification example from Spark 2.0.2 org.apache.spark.examples.ml.JavaDecisionTreeClassificationExample. I can't use this directly because it uses libsvm-encoded data. I need to avoid libsvm (undocumented AFAIK) to classify ordinary datasets more easily. I'm trying to adapt the example to use a kyro-encoded dataset instead.
The issue originates in the map call below, particularly the consequences of using Encoders.kyro as the encoder as instructed by SparkML feature vectors and Spark 2.0.2 Encoders in Java
public SMLDecisionTree(Dataset<Row> incomingDS, final String label, final String[] features)
{
this.incomingDS = incomingDS;
this.label = label;
this.features = features;
this.mapSet = new StringToDoubleMapperSet(features);
this.sdlDS = incomingDS
.select(label, features)
.filter(new FilterFunction<Row>()
{
public boolean call(Row row) throws Exception
{
return !row.getString(0).equals(features[0]); // header
}
})
.map(new MapFunction<Row, LabeledFeatureVector>()
{
public LabeledFeatureVector call(Row row) throws Exception
{
double labelVal = mapSet.addValue(0, row.getString(0));
double[] featureVals = new double[features.length];
for (int i = 1; i < row.length(); i++)
{
Double val = mapSet.addValue(i, row.getString(i));
featureVals[i - 1] = val;
}
return new LabeledFeatureVector(labelVal, Vectors.dense(featureVals));
}
// https://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset
}, Encoders.kryo(LabeledFeatureVector.class));
Dataset<LabeledFeatureVector>[] splits = sdlDS.randomSplit(new double[] { 0.7, 0.3 });
this.trainingDS = splits[0];
this.testDS = splits[1];
}
This impacts the StringIndexer and VectorIndexer from the original spark example which are unable to handle the resulting kyro-encoded dataset. Here is the pipeline building code taken from the spark decision tree example code:
public void run() throws IOException
{
sdlDS.show();
StringIndexerModel labelIndexer = new StringIndexer()
.setInputCol("label")
.setOutputCol("indexedLabel")
.fit(df);
VectorIndexerModel featureIndexer = new VectorIndexer()
.setInputCol("features")
.setOutputCol("indexedFeatures")
.setMaxCategories(4) // treat features with > 4 distinct values as continuous.
.fit(df);
DecisionTreeClassifier classifier = new DecisionTreeClassifier()
.setLabelCol("indexedLabel")
.setFeaturesCol("indexedFeatures");
IndexToString labelConverter = new IndexToString()
.setInputCol("prediction")
.setOutputCol("predictedLabel")
.setLabels(labelIndexer.labels());
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]
{ labelIndexer, featureIndexer, classifier, labelConverter });
This code apparently expects a dataset with "label" and "features" columns with the label and a Vector of double-encoded features. The problem is that kyro produces a single column named "values" that seems to hold a byte array. I know of no documentation for how to convert this to what the original StringIndexer and VectorIndexer expect. Can someone help? Java please.
Don't use Kryo encoder in the first place. It is very limited in general and not applicable here at all. The simplest solution here is to drop custom class and use Row encoder. First you'll need a bunch of imports:
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.ml.linalg.*;
and a schema:
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("label", DoubleType, false));
fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
StructType schema = DataTypes.createStructType(fields);
Encoder can be defined like this:
Encoder<Row> encoder = RowEncoder.apply(schema);
and use as shown below:
Dataset<Row> inputDs = spark.read().json(sc.parallelize(Arrays.asList(
"{\"lablel\": 1.0, \"features\": \"foo\"}"
)));
inputDs.map(new MapFunction<Row, Row>() {
public Row call(Row row) {
return RowFactory.create(1.0, Vectors.dense(1.0, 2.0));
}
}, encoder);

Categories

Resources