Passing data to a TensorFlow model in Java

I'm trying to use a TensorFlow model that I trained in Python to score data in Scala (using the TF Java API). For the model, I've used this regression example, with the only change being that I dropped asText=True from export_savedmodel.
My Scala snippet:
import org.tensorflow.{SavedModelBundle, Tensor}
val b = SavedModelBundle.load("/tensorflow/tf-estimator-tutorials/trained_models/reg-model-01/export/1531933435/", "serve")
val s = b.session()
// output = predictor_fn({'csv_rows': ["0.5,1,ax01,bx02", "-0.5,-1,ax02,bx02"]})
val input = "0.5,1,ax01,bx02"
val inputTensor = Tensor.create(input.getBytes("UTF-8"))
val result = s.runner()
  .feed("csv_rows", inputTensor)
  .fetch("dnn/logits/BiasAdd")
  .run()
  .get(0)
When I run, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Input to reshape is a tensor with 2 values, but the requested shape has 4
[[Node: dnn/input_from_feature_columns/input_layer/alpha_indicator/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _output_shapes=[[?,2]], _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/input_from_feature_columns/input_layer/alpha_indicator/Sum, dnn/input_from_feature_columns/input_layer/alpha_indicator/Reshape/shape)]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:298)
at org.tensorflow.Session$Runner.run(Session.java:248)
I figure that there's a problem with how I've prepared my input Tensor, but I'm stuck on how to best debug this.

The error message suggests that the shape of the input tensor in some operation isn't what is expected.
Looking at the Python notebook you linked to (particularly sections 8a and 8c), it seems that the input tensor is expected to be a "batch" of string tensors, not a single string tensor.
You can observe this by comparing the shapes of the tensors in your Scala and Python programs (inputTensor.shape() in Scala vs. the shape of csv_rows provided to predict_fn in the Python notebook).
From that, it seems what you want is for inputTensor to be a vector of strings, not a single scalar string. To do that, you'd want to do something like:
val input = Array("0.5,1,ax01,bx02")
val inputTensor = Tensor.create(input.map(x => x.getBytes("UTF-8")))
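The same fix from Java means building a byte[][] (one byte[] per CSV row) instead of a single byte[]. A single byte[] is treated as a scalar string, while byte[N][] is a vector of N strings. The sketch below is a minimal, library-free helper. The class name CsvBatch is hypothetical, and the idea that you would pass its result to Tensor.create (or Tensors.create) and feed it as csv_rows is an assumption, not something the answer states.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: turns CSV rows into the byte[][] layout used for a
 * rank-1 (vector) string tensor. One row gives byte[1][], i.e. a batch of one.
 */
final class CsvBatch {
    private CsvBatch() {}

    static byte[][] encode(String... rows) {
        byte[][] batch = new byte[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // Each row becomes one string element of the batch, UTF-8 encoded
            batch[i] = rows[i].getBytes(StandardCharsets.UTF_8);
        }
        return batch;
    }
}
```

For example, CsvBatch.encode("0.5,1,ax01,bx02", "-0.5,-1,ax02,bx02") produces the two-row batch from the Python comment in the question.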
Hope that helps

Related

How to get accuracy value with deeplearning4j from 1 dataset

I created a Keras model in Python and imported it into Java:
model = KerasModelImport.importKerasSequentialModelAndWeights(aiModelPath);
Then I passed an array of values to the model and asked for a prediction (or, more precisely, a classification):
INDArray x_2d = Nd4j.createFromArray(sensorValue);
INDArray prediction = model.output(x_2d);
My output looks correct:
System.out.println(prediction);
[[ 0.9773, 0.0227]]
What I need is the loss and the accuracy for these predicted values:
model.output(x_2d);
How can I get these values?
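You can use DL4J's evaluation support. Note that accuracy and loss are measured against the true labels, so you need labeled test data, not just the model's output:
https://deeplearning4j.konduit.ai/tuning-and-training/evaluation
To summarize the relevant parts of that page:
The basic idea is to use a DataSetIterator, which iterates through the data in minibatches:
DataSetIterator myTestData = ...
Evaluation eval = model.evaluate(myTestData);
Or you can use the in-memory approach:
Evaluation eval = new Evaluation(2); // number of classes (the model above outputs 2)
INDArray output = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), output);
log.info(eval.stats());
eval.stats() automatically calculates many metrics, including accuracy, precision, recall and F1. Evaluation does not report the loss. For that, MultiLayerNetwork.score(DataSet) returns the network's score (the loss, plus any regularization terms) on a labeled DataSet.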
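To show what those two numbers mean for a single softmax output such as [[0.9773, 0.0227]], here is a plain-Java sketch that does not depend on DL4J. It assumes one-hot labels and categorical cross-entropy as the loss. That is an assumption: the real loss is whatever the Keras model was compiled with. The class name ClassificationMetrics is hypothetical.

```java
/**
 * Plain-Java sketch of accuracy and loss for a softmax classifier.
 * Assumes one-hot labels and categorical cross-entropy as the loss.
 */
final class ClassificationMetrics {
    private ClassificationMetrics() {}

    /** Index of the largest value in a row (the predicted or true class). */
    static int argMax(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) best = i;
        }
        return best;
    }

    /** Fraction of rows whose predicted class matches the one-hot label. */
    static double accuracy(double[][] predictions, double[][] labels) {
        int correct = 0;
        for (int i = 0; i < predictions.length; i++) {
            if (argMax(predictions[i]) == argMax(labels[i])) correct++;
        }
        return (double) correct / predictions.length;
    }

    /** Mean categorical cross-entropy: -sum(label * ln(prediction)) per row. */
    static double crossEntropy(double[][] predictions, double[][] labels) {
        final double eps = 1e-12; // guards against ln(0)
        double total = 0.0;
        for (int i = 0; i < predictions.length; i++) {
            for (int j = 0; j < predictions[i].length; j++) {
                total -= labels[i][j] * Math.log(Math.max(predictions[i][j], eps));
            }
        }
        return total / predictions.length;
    }
}
```

With the prediction above and a true label of [1, 0], the accuracy is 1.0 and the loss is -ln(0.9773), roughly 0.023. Without the label, neither value can be computed.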

TensorFlow: How to use a speech recognition model trained in Python in Java

I have a TensorFlow model trained in Python by following this article. After training, I generated the frozen graph. Now I need to use this graph to run recognition in a Java-based application.
For this I was looking at the following example. However, I failed to understand how to collect my output. I know that I need to provide 3 inputs to the graph.
From the official tutorial, I have read the following Python code:
def run_graph(wav_data, labels, input_layer_name, output_layer_name,
              num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        # Feed the audio data as input to the graph.
        # predictions will contain a two-dimensional array, where one
        # dimension represents the input image count, and the other has
        # predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name(output_layer_name)
        predictions, = sess.run(softmax_tensor, {input_layer_name: wav_data})
        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
        return 0
Can someone help me to understand the tensorflow java api?
The literal translation of the Python code you listed above would be something like this:
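import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;
public static float[][] getPredictions(Session sess, byte[] wavData, String inputLayerName, String outputLayerName) {
    // Feed the raw WAV bytes as a scalar string tensor and fetch the softmax output
    try (Tensor<String> wavDataTensor = Tensors.create(wavData);
         Tensor<Float> predictionsTensor = sess.runner()
             .feed(inputLayerName, wavDataTensor)
             .fetch(outputLayerName)
             .run()
             .get(0)
             .expect(Float.class)) {
        // Tensor.shape() returns a long[] with the size of each dimension
        long[] shape = predictionsTensor.shape();
        float[][] predictions = new float[(int) shape[0]][(int) shape[1]];
        predictionsTensor.copyTo(predictions);
        return predictions;
    }
}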
The returned predictions array will hold the "confidence" value of each prediction. You'll have to compute the "top K" on it yourself, just as the Python code uses numpy (.argsort()) on what sess.run() returned.
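Here is a minimal Java counterpart of predictions.argsort()[-k:][::-1], applied to one row of the returned array. It uses only the JDK. The class name TopK is hypothetical. Ties keep the lower index first, which can differ from numpy's ordering for equal scores.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of numpy's argsort()[-k:][::-1]: indices of the k highest scores. */
final class TopK {
    private TopK() {}

    /** Returns the indices of the k highest scores, highest first. */
    static int[] topK(float[] scores, int k) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort the indices by descending score
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        int n = Math.min(k, idx.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = idx[i];
        return result;
    }
}
```

Usage would be along the lines of `int[] best = TopK.topK(predictions[0], 3);`, followed by looking up `labels[best[i]]` for each index, as the Python loop does.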
From a cursory reading of the tutorial page and code, it seems that predictions will have 1 row and 12 columns (one for each hotword). I got this from the following Python code:
import tensorflow as tf
graph_def = tf.GraphDef()
with open('/tmp/my_frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
output_layer_name = 'labels_softmax:0'
tf.import_graph_def(graph_def, name='')
print(tf.get_default_graph().get_tensor_by_name(output_layer_name).shape)
Hope that helps.

Spark - word count using java

I'm quite new to Spark and I would like to extract features (basically count of words) from a text file using the Dataset class. I have read the "Extracting, transforming and selecting features" tutorial on Spark but every example reported starts from a bag of words defined "on the fly". I have tried several times to generate the same kind of Dataset starting from a text file but I have always failed. Here is my code:
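import java.util.Arrays;
import org.apache.spark.ml.feature.Word2Vec;
import org.apache.spark.ml.feature.Word2VecModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
SparkSession spark = SparkSession
    .builder()
    .appName("Simple application")
    .config("spark.master", "local")
    .getOrCreate();
Dataset<String> textFile = spark.read()
    .textFile("myFile.txt")
    .as(Encoders.STRING());
// Split each line into words; the pattern is lowercase because the line is lowercased first
Dataset<Row> words = textFile.flatMap(s -> {
    return Arrays.asList(s.toLowerCase().split("ag")).iterator();
}, Encoders.STRING()).filter(s -> !s.isEmpty()).toDF();
Word2Vec word2Vec = new Word2Vec()
    .setInputCol("value")
    .setOutputCol("result")
    .setVectorSize(16)
    .setMinCount(0);
Word2VecModel model = word2Vec.fit(words);
Dataset<Row> result = model.transform(words);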
I get this error message: Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column value must be of type equal to one of the following types: [ArrayType(StringType,true), ArrayType(StringType,false)] but was actually of type StringType.
I think I have to convert each line into a Row using something like:
RowFactory.create(0.0, line)
but I cannot figure out how to do that.
Basically, I was trying to train a classification system based on word counts of strings generated from a long sequence of characters. My text file contains one sequence per line, so I need to split and count them for each row (the substrings are called k-mers, and a general description can be found here). Depending on the length of the k-mers, I could have more than 4^32 different strings, so I was looking for a scalable machine learning framework like Spark.
If you just want to count occurrences of words, you can do:
// Split on lowercase "ag", since each line is lowercased before splitting
Dataset<String> words = textFile.flatMap(s -> {
    return Arrays.asList(s.toLowerCase().split("ag")).iterator();
}, Encoders.STRING()).filter(s -> !s.isEmpty());
Dataset<Row> counts = words.toDF("word").groupBy(col("word")).count();
Word2Vec is a much more powerful ML algorithm, and in your case it isn't necessary. It also expects each row to hold an array of words (ArrayType(StringType)), which is why it rejected your column of plain strings. Remember to add import static org.apache.spark.sql.functions.*; at the beginning of the file.
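If the goal is k-mer counts rather than splitting on "AG", the per-line logic is a sliding window. The following is a local, library-free sketch. It assumes overlapping k-mers of a fixed length k. The class name KmerCounter is hypothetical. Inside Spark, you would emit the same window substrings from flatMap and let the groupBy/count above do the counting.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: counts overlapping substrings of length k (k-mers). */
final class KmerCounter {
    private KmerCounter() {}

    static Map<String, Integer> count(String sequence, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of width k one character at a time
        for (int i = 0; i + k <= sequence.length(); i++) {
            counts.merge(sequence.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }
}
```

For "ACGTAC" with k = 2, the windows are AC, CG, GT, TA, AC, so AC is counted twice.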

How To Evaluate Predictive Neural Network in Encog

We're creating a neural network for predicting typhoon occurrence using several typhoon parameters as input. So far, we have been able to generate data and train the neural network using Encog 3.2. Now we need to evaluate the results of the training.
We're using the ForestCover project (in the Encog 3.2 examples) as a reference. However, that project's evaluation code is for a classification neural network, so we can't evaluate our neural network by following its code.
We also checked the PredictMarket project (in the Encog 3.2 examples), since it is a predictive neural network, but we're having difficulty using MLData:
MLData output = network.compute(inputData);
We want to extract the contents of output and compare it to the contents of the evaluation.csv for neural-network evaluation.
Is there a way we can extract/convert the output variable into a normalized value which we can then compare to the normalized evaluation.csv?
or
Is there a way we can modify the ForestCover Evaluate.java file to be able to evaluate a predictive neural network?
Thank you.
Here is a C# example (Java should be similar) which writes out a .csv file (TestResultsFile) with both the de-normalized expected and actual results so you can compare them with an Excel graph.
var evaluationSet = (BasicMLDataSet)EncogUtility.LoadCSV2Memory(Config.EvaluationNormalizedFile.ToString(),
    network.InputCount, network.OutputCount, true, CSVFormat.English, false);
var analyst = new EncogAnalyst();
analyst.Load(Config.NormalizationAnalystFile);
// Change this to whatever your output field index is
int outputFieldIndex = 29;
using (var resultsFile = new System.IO.StreamWriter(Config.TestResultsFile.ToString()))
{
    foreach (var item in evaluationSet)
    {
        var normalizedActualOutput = (BasicMLData)network.Compute(item.Input);
        var actualOutput = analyst.Script.Normalize.NormalizedFields[outputFieldIndex].DeNormalize(normalizedActualOutput.Data[0]);
        var idealOutput = analyst.Script.Normalize.NormalizedFields[outputFieldIndex].DeNormalize(item.Ideal[0]);
        var resultLine = String.Format("{0},{1}", idealOutput, actualOutput);
        resultsFile.WriteLine(resultLine);
    }
}
Much of this is based on ideas from Abishek Kumar's Pluralsight course.
If you really want to compare normalized values, just remove the two calls to DeNormalize.
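Once ideal and actual values sit side by side, a predictive (regression) network is usually scored with an error measure rather than the classification accuracy that ForestCover computes. The following plain-Java sketch assumes mean absolute error and root mean squared error are suitable metrics for the typhoon predictions. That choice is mine, not something Encog or the answer prescribes. The class name RegressionError is hypothetical.

```java
/** Sketch of regression error metrics over paired ideal/actual outputs. */
final class RegressionError {
    private RegressionError() {}

    private static void checkLengths(double[] ideal, double[] actual) {
        if (ideal.length != actual.length || ideal.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
    }

    /** Mean absolute error: average of |ideal - actual|. */
    static double mae(double[] ideal, double[] actual) {
        checkLengths(ideal, actual);
        double sum = 0.0;
        for (int i = 0; i < ideal.length; i++) sum += Math.abs(ideal[i] - actual[i]);
        return sum / ideal.length;
    }

    /** Root mean squared error: sqrt of the average squared difference. */
    static double rmse(double[] ideal, double[] actual) {
        checkLengths(ideal, actual);
        double sum = 0.0;
        for (int i = 0; i < ideal.length; i++) {
            double d = ideal[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / ideal.length);
    }
}
```

These metrics can be computed on either the de-normalized or the normalized columns. On de-normalized values, the errors are expressed in the original units of the output field.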

How to get Spark MLlib RandomForestModel.predict response as text value YES/NO?

I am trying to implement the RandomForest algorithm using Apache Spark MLlib. I have the dataset in CSV format with the following features:
DayOfWeek(int),AlertType(String),Application(String),Router(String),Symptom(String),Action(String)
0,Network1,App1,Router1,Not reachable,YES
0,Network1,App2,Router5,Not reachable,NO
I want to use MLlib's RandomForest to predict the last field, Action, and I want the response as YES/NO.
I am following code from GitHub to create the RandomForest model. Since all my features are categorical except one int feature, I have used the following code to convert them into JavaRDD<LabeledPoint>. Is any of that wrong?
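import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.regression.LabeledPoint;
// Load and parse the data file.
JavaRDD<String> data = jsc.textFile("/tmp/xyz/data/training-dataset.csv");
// I have 14 features so giving 14 as arg to the following
// (note: HashingTF's argument is the number of hash buckets)
final HashingTF tf = new HashingTF(14);
// Create LabeledPoint datasets for Actionable and nonactionable
JavaRDD<LabeledPoint> labledData = data.map(new Function<String, LabeledPoint>() {
    @Override
    public LabeledPoint call(String alert) {
        List<String> featureList = Arrays.asList(alert.trim().split(","));
        String actionType = featureList.get(featureList.size() - 1).toLowerCase();
        // actionType is lowercased, so compare with "yes"; exclude the Action label from the features
        return new LabeledPoint(actionType.equals("yes") ? 1 : 0,
            tf.transform(featureList.subList(0, featureList.size() - 1)));
    }
});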
Similarly, I create testData and use it in the following code to make predictions:
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
JavaPairRDD<Double, Double> predictionAndLabel =
    testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override
        public Tuple2<Double, Double> call(LabeledPoint p) {
            return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
        }
    });
How do I get predictions for my last field, Action, so that they come out as YES/NO? The current predict method returns a double, and I don't understand how to implement this. Also, am I following the correct approach to convert categorical features into a LabeledPoint? I am new to machine learning and Spark MLlib.
I am more familiar with the scala version but I'll try to help.
You need to map the target variable (Action) and all categorical features into levels starting in 0 like 0,1,2,3... For example router1, router2, ... router5 into 0,1,2...4. The same with your target variable which I think was the only one you actually mapped, yes/no to 1/0 (I am not sure what your tf.transform(featureList) is actually doing).
Once you have done this, you can train your RandomForest classifier, specifying the map of categorical features. Basically, you need to tell it which features are categorical and how many levels they have. This is the Scala version, but you can easily translate it into Java:
val categoricalFeaturesInfo = Map[Int, Int]((2,2),(3,5))
This is basically saying that in your list of features, the 3rd one (index 2) has 2 levels, (2,2), and the 4th one (index 3) has 5 levels, (3,5). The rest are treated as continuous (double) features.
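The level mapping described above can be sketched in Java with one indexer per categorical column. Each indexer assigns 0, 1, 2, ... in order of first appearance, and its level count is what goes into categoricalFeaturesInfo. The class name CategoryIndexer is hypothetical. I'm also assuming you reuse the same indexers for the test data, so that a value like Router5 gets the same level in both sets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps each distinct category value to a level 0, 1, 2, ... */
final class CategoryIndexer {
    private final Map<String, Integer> levels = new LinkedHashMap<>();

    /** Returns the level for a value, assigning the next free level if the value is new. */
    int indexOf(String value) {
        Integer level = levels.get(value);
        if (level == null) {
            level = levels.size();
            levels.put(value, level);
        }
        return level;
    }

    /** Number of distinct levels seen so far (the value for categoricalFeaturesInfo). */
    int numLevels() {
        return levels.size();
    }
}
```

After encoding the training rows, you would put featureIndex -> indexer.numLevels() into a java.util.Map<Integer, Integer> for each categorical column and pass it to the Java-friendly RandomForest.trainClassifier overload.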
Now you pass the categoricalFeaturesInfo when training the classifier together with the other parameters as:
val modelRF = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
Now when you evaluate it, the predict function will return a double, 0.0 or 1.0, and you can use that to compute accuracy, precision or any other metric you need.
This is the example (sorry scala again) if you have a testData where you did the same transformations as before:
val predictionAndLabels = testData.map { point =>
  val prediction = modelRF.predict(point.features)
  (point.label, prediction)
}
Here your results are clear: the label is 1/0 and the predicted value is also 1/0, so computing accuracy, precision and recall is straightforward.
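In Java, with the (label, prediction) pairs collected into two arrays, those metrics reduce to counting true and false positives and negatives. The plain-Java sketch below treats 1.0 (YES) as the positive class. The class name BinaryMetrics is hypothetical. Spark's own org.apache.spark.mllib.evaluation.MulticlassMetrics can compute similar metrics directly on an RDD of (prediction, label) pairs.

```java
/** Sketch: accuracy, precision and recall for binary 0.0/1.0 labels and predictions. */
final class BinaryMetrics {
    final double accuracy;
    final double precision;
    final double recall;

    BinaryMetrics(double[] labels, double[] predictions) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < labels.length; i++) {
            boolean actual = labels[i] == 1.0;      // 1.0 is the positive (YES) class
            boolean predicted = predictions[i] == 1.0;
            if (predicted && actual) tp++;
            else if (predicted) fp++;
            else if (actual) fn++;
            else tn++;
        }
        accuracy = (double) (tp + tn) / labels.length;
        // Precision and recall are defined as 0 when their denominators are empty
        precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }
}
```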
I hope it helps!!
You're heading in the correct direction, and you've already managed to train a model, which is great.
For binary classification, predict will return either 0.0 or 1.0, and it's up to you to map this back to your string values.
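Mapping back is a one-line decision. This sketch assumes the training code encoded YES as 1.0 and NO as 0.0, as in the question's LabeledPoint construction. The class name ActionLabels is hypothetical.

```java
/** Maps a binary RandomForest prediction back to the original Action string. */
final class ActionLabels {
    private ActionLabels() {}

    static String toAction(double prediction) {
        // Assumes YES was encoded as 1.0 and NO as 0.0 when building the LabeledPoints
        return prediction == 1.0 ? "YES" : "NO";
    }
}
```

In the prediction loop, this would look like `String action = ActionLabels.toAction(model.predict(p.features()));`.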
