I have the following code:
StreamingLinearRegressionWithSGD regressionWithSGD =
        new StreamingLinearRegressionWithSGD()
                .setInitialWeights(Vectors.zeros(featuresNumber));
JavaDStream<LabeledPoint> trainingData = streamingContext
        .textFileStream(model.getTrainPath())
        .map(LabeledPoint::parse)
        .cache();
JavaDStream<LabeledPoint> testData = streamingContext
        .textFileStream(model.getPredictPath())
        .map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);
regressionWithSGD.predictOnValues(testData.mapToPair(lp -> new Tuple2<>(lp.label(), lp.features()))).print();
I would like to write the results to a file/DB/queue and so on instead of calling print(). Is that possible?
I have figured it out:
StreamingLinearRegressionWithSGD regressionWithSGD =
        new StreamingLinearRegressionWithSGD()
                .setInitialWeights(Vectors.zeros(featuresNumber));
JavaDStream<LabeledPoint> trainingData = streamingContext
        .textFileStream(model.getTrainPath())
        .map(LabeledPoint::parse)
        .cache();
JavaDStream<LabeledPoint> testData = streamingContext
        .textFileStream(model.getPredictPath())
        .map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);
JavaDStream<Double> doubleJavaDStream = regressionWithSGD.predictOn(testData.map(LabeledPoint::features));
doubleJavaDStream.dstream().saveAsTextFiles("result", "out");
As a result, we get result-{timestamp}.out folders.
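For a database or queue sink instead of files, the usual pattern is foreachRDD. A minimal sketch, where sendToSink is a hypothetical helper standing in for your DB or queue client:

doubleJavaDStream.foreachRDD(rdd ->
        rdd.foreachPartition(iterator -> {
            // sendToSink is a placeholder for e.g. a JDBC insert or a message publish;
            // open connections once per partition rather than once per record
            while (iterator.hasNext()) {
                sendToSink(iterator.next());
            }
        })
);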
Following this Colab exercise from Google's ML Crash Course, I generated a model in Python for the MNIST database. The code looks as follows:
import pandas as pd
import tensorflow as tf

def create_model(my_learning_rate):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=(28, 28), name='input'))
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense(units=256, activation='relu'))
    model.add(tf.keras.layers.Dense(units=128, activation='relu'))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Dense(units=10, activation='softmax', name='output'))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
def train_model(model, train_features, train_label, epochs,
                batch_size=None, validation_split=0.1):
    history = model.fit(x=train_features, y=train_label, batch_size=batch_size,
                        epochs=epochs, shuffle=True,
                        validation_split=validation_split)
    epochs = history.epoch
    hist = pd.DataFrame(history.history)
    return epochs, hist
if __name__ == '__main__':
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train_normalized = x_train / 255.0
    x_test_normalized = x_test / 255.0

    learning_rate = 0.003
    epochs = 50
    batch_size = 4000
    validation_split = 0.2

    my_model = create_model(learning_rate)
    epochs, hist = train_model(my_model, x_train_normalized, y_train,
                               epochs, batch_size, validation_split)
    my_model.save('my_model')
The model is saved to the "my_model" folder, as it should. Now I load it again in my Java program:
public class HelloTensorFlow {

    public static void main(final String[] args) {
        final String filePath = Paths.get("my_model").toAbsolutePath().toString();
        try (final SavedModelBundle b = SavedModelBundle.load(filePath, "serve")) {
            final Session sess = b.session();
            final Tensor<Float> x = Tensor.create(new float[1][28 * 28], Float.class);
            final List<Tensor<?>> run = sess.runner()
                    .feed("input", x)
                    .fetch("output")
                    .run();
            final float[] y = run.get(0).copyTo(new float[1]);
            System.out.println(y[0]);
        }
    }
}
The model is loaded, but the runner does not work. When I execute the program, I get "No Operation named [input] in the Graph", even though my input layer has this name. What am I doing wrong? I have the newest TensorFlow versions: 2.3.0 (Python) and 1.15.0 (Java).
I solved it. TensorFlow 2 seems to have odd naming schemes, but using the MetaGraphDef, this can be deciphered. First, you need the org.tensorflow.proto dependency. Then, you can extract the information from the meta graph like so:
final MetaGraphDef metaGraphDef = MetaGraphDef.parseFrom(bundle.metaGraphDef());
final SignatureDef signatureDef = metaGraphDef.getSignatureDefMap().get("serving_default");
final TensorInfo inputTensorInfo = signatureDef.getInputsMap()
        .values()
        .stream()
        .filter(Objects::nonNull)
        .findFirst()
        .orElseThrow(() -> ...);
final TensorInfo outputTensorInfo = signatureDef.getOutputsMap()
        .values()
        .stream()
        .filter(Objects::nonNull)
        .findFirst()
        .orElseThrow(() -> ...);
Now you can feed the tensor you created into the name from inputTensorInfo.getName() and fetch the results from outputTensorInfo.getName().
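Putting it together, a minimal sketch of the run call using the resolved names; the [1][28][28] input shape and the 10-class output shape are assumptions derived from the Keras model above, not from the TF Java API:

// Sketch: feed/fetch via the signature names instead of the hard-coded "input"/"output".
// Shapes assumed from the Keras model: input (28, 28), softmax output over 10 classes.
final Tensor<Float> x = Tensor.create(new float[1][28][28], Float.class);
final List<Tensor<?>> run = bundle.session().runner()
        .feed(inputTensorInfo.getName(), x)
        .fetch(outputTensorInfo.getName())
        .run();
final float[][] y = run.get(0).copyTo(new float[1][10]);
System.out.println(Arrays.toString(y[0]));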
I have a custom metric in AWS CloudWatch, and I am putting data into it through the AWS Java API.
for (int i = 0; i < collection.size(); i++) {
    String[] cell = collection.get(i).split("\\|\\|");

    List<Dimension> dimensions = new ArrayList<>();
    dimensions.add(new Dimension().withName(dimension[0]).withValue(cell[0]));
    dimensions.add(new Dimension().withName(dimension[1]).withValue(cell[1]));

    MetricDatum datum = new MetricDatum().withMetricName(metricName)
            .withUnit(StandardUnit.None)
            .withValue(Double.valueOf(cell[2]))
            .withDimensions(dimensions);
    PutMetricDataRequest request = new PutMetricDataRequest()
            .withNamespace(namespace + "_" + cell[3])
            .withMetricData(datum);
    String response = String.valueOf(cw.putMetricData(request));

    GetMetricDataRequest res = new GetMetricDataRequest().withMetricDataQueries();
    com.amazonaws.services.cloudwatch.model.Metric m = new com.amazonaws.services.cloudwatch.model.Metric();
    m.setMetricName(metricName);
    m.setDimensions(dimensions);
    m.setNamespace(namespace);

    MetricStat ms = new MetricStat().withMetric(m);
    MetricDataQuery metricDataQuery = new MetricDataQuery();
    metricDataQuery.withMetricStat(ms);
    metricDataQuery.withId("m1");

    List<MetricDataQuery> mqList = new ArrayList<MetricDataQuery>();
    mqList.add(metricDataQuery);
    res.withMetricDataQueries(mqList);
    GetMetricDataResult result1 = cw.getMetricData(res);
}
Now I want to be able to fetch the latest data entered for a particular namespace, metric name, and dimension combination through the Java API. I am not able to find appropriate documentation from AWS regarding this. Can anyone please help me?
I got the results from CloudWatch with the code below:
GetMetricDataRequest getMetricDataRequest = new GetMetricDataRequest().withMetricDataQueries();
Integer period = 300;

Iterator<Map.Entry<String, String>> entries = dimensions.entrySet().iterator();
List<Dimension> dList = new ArrayList<Dimension>();
while (entries.hasNext()) {
    Map.Entry<String, String> entry = entries.next();
    dList.add(new Dimension().withName(entry.getKey()).withValue(entry.getValue()));
}

com.amazonaws.services.cloudwatch.model.Metric metric = new com.amazonaws.services.cloudwatch.model.Metric();
metric.setNamespace(namespace);
metric.setMetricName(metricName);
metric.setDimensions(dList);

MetricStat ms = new MetricStat().withMetric(metric)
        .withPeriod(period)
        .withUnit(StandardUnit.None)
        .withStat("Average");
MetricDataQuery metricDataQuery = new MetricDataQuery().withMetricStat(ms)
        .withId("m1");

List<MetricDataQuery> mqList = new ArrayList<>();
mqList.add(metricDataQuery);
getMetricDataRequest.withMetricDataQueries(mqList);

long timestamp = 1536962700000L;
long timestampEnd = 1536963000000L;
getMetricDataRequest.withStartTime(new Date(timestamp));
getMetricDataRequest.withEndTime(new Date(timestampEnd));

GetMetricDataResult result1 = cw.getMetricData(getMetricDataRequest);
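To actually read the fetched values, iterate over the returned MetricDataResult entries; a minimal sketch:

// Each MetricDataResult pairs a list of timestamps with a list of values.
for (MetricDataResult r : result1.getMetricDataResults()) {
    for (int j = 0; j < r.getValues().size(); j++) {
        System.out.println(r.getTimestamps().get(j) + " -> " + r.getValues().get(j));
    }
}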
I create a NatTable the following way, but I can only access the cells through the getters and setters of my Student class. How else can I access cells? Should I create my own BodyDataProvider or use IDataProvider? If so, could someone give some examples of implementing such providers?
final ColumnGroupModel columnGroupModel = new ColumnGroupModel();
ColumnHeaderLayer columnHeaderLayer;

String[] propertyNames = { "name", "groupNumber", "examName", "examMark" };

Map<String, String> propertyToLabelMap = new HashMap<String, String>();
propertyToLabelMap.put("name", "Full Name");
propertyToLabelMap.put("groupNumber", "Group");
propertyToLabelMap.put("examName", "Name");
propertyToLabelMap.put("examMark", "Mark");

DefaultBodyDataProvider<Student> bodyDataProvider = new DefaultBodyDataProvider<Student>(students,
        propertyNames);
ColumnGroupBodyLayerStack bodyLayer = new ColumnGroupBodyLayerStack(new DataLayer(bodyDataProvider),
        columnGroupModel);

DefaultColumnHeaderDataProvider defaultColumnHeaderDataProvider = new DefaultColumnHeaderDataProvider(
        propertyNames, propertyToLabelMap);
DefaultColumnHeaderDataLayer columnHeaderDataLayer = new DefaultColumnHeaderDataLayer(
        defaultColumnHeaderDataProvider);
columnHeaderLayer = new ColumnHeaderLayer(columnHeaderDataLayer, bodyLayer, bodyLayer.getSelectionLayer());

ColumnGroupHeaderLayer columnGroupHeaderLayer = new ColumnGroupHeaderLayer(columnHeaderLayer,
        bodyLayer.getSelectionLayer(), columnGroupModel);
columnGroupHeaderLayer.addColumnsIndexesToGroup("Exams", 2, 3);
columnGroupHeaderLayer.setGroupUnbreakable(2);

final DefaultRowHeaderDataProvider rowHeaderDataProvider = new DefaultRowHeaderDataProvider(bodyDataProvider);
DefaultRowHeaderDataLayer rowHeaderDataLayer = new DefaultRowHeaderDataLayer(rowHeaderDataProvider);
ILayer rowHeaderLayer = new RowHeaderLayer(rowHeaderDataLayer, bodyLayer, bodyLayer.getSelectionLayer());

final DefaultCornerDataProvider cornerDataProvider = new DefaultCornerDataProvider(
        defaultColumnHeaderDataProvider, rowHeaderDataProvider);
DataLayer cornerDataLayer = new DataLayer(cornerDataProvider);
ILayer cornerLayer = new CornerLayer(cornerDataLayer, rowHeaderLayer, columnGroupHeaderLayer);

GridLayer gridLayer = new GridLayer(bodyLayer, columnGroupHeaderLayer, rowHeaderLayer, cornerLayer);
NatTable table = new NatTable(shell, gridLayer, true);
As answered in your previous question, How do I fix NullPointerException and putting data into NatTable, this is explained in the NatTable Getting Started Tutorial. If you need some sample code, try the NatTable Examples Application.

And as your previous question shows, your data structure does not work in a table: you have nested objects where the child objects are stored in an array, so it is more a tree than a table.
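That said, implementing IDataProvider yourself is straightforward if you want full control over cell access. A minimal sketch against the Student list from the question; the getter names are assumptions derived from the propertyNames above:

// Sketch of a hand-written IDataProvider; Student getters are assumed to
// mirror the propertyNames used with DefaultBodyDataProvider.
public class StudentDataProvider implements IDataProvider {

    private final List<Student> students;

    public StudentDataProvider(List<Student> students) {
        this.students = students;
    }

    @Override
    public Object getDataValue(int columnIndex, int rowIndex) {
        Student s = students.get(rowIndex);
        switch (columnIndex) {
            case 0: return s.getName();
            case 1: return s.getGroupNumber();
            case 2: return s.getExamName();
            case 3: return s.getExamMark();
            default: throw new IllegalArgumentException("Unknown column: " + columnIndex);
        }
    }

    @Override
    public void setDataValue(int columnIndex, int rowIndex, Object newValue) {
        // left empty in this sketch; wire up the matching setters if cells are editable
    }

    @Override
    public int getColumnCount() {
        return 4;
    }

    @Override
    public int getRowCount() {
        return students.size();
    }
}

It can then replace the DefaultBodyDataProvider when building the DataLayer of the body stack.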
How to pass numerical and categorical features to RandomForestRegressor in Apache Spark: MLlib in Java?
I'm able to do it with numerical or categorical features alone, but I don't know how to combine them.

My working code is as follows (only numerical features are used for prediction):
String[] featureNumericalCols = new String[]{
        "squareM",
        "timeTimeToPragueCityCenter",
};

String[] featureStringCols = new String[]{ // not used
        "type",
        "floor",
        "disposition",
};

VectorAssembler assembler = new VectorAssembler().setInputCols(featureNumericalCols).setOutputCol("features");
Dataset<Row> numericalData = assembler.transform(data);
numericalData.show();

RandomForestRegressor rf = new RandomForestRegressor().setLabelCol("price")
        .setFeaturesCol("features");

// Chain assembler and forest in a Pipeline
Pipeline pipeline = new Pipeline()
        .setStages(new PipelineStage[]{assembler, rf});

// Train model. This also runs the assembler.
PipelineModel model = pipeline.fit(trainingData);

// Make predictions.
Dataset<Row> predictions = model.transform(testData);
For anyone out there, this is the solution:
StringIndexer typeIndexer = new StringIndexer()
        .setInputCol("type")
        .setOutputCol("typeIndex");
preparedData = typeIndexer.fit(preparedData).transform(preparedData);

StringIndexer floorIndexer = new StringIndexer()
        .setInputCol("floor")
        .setOutputCol("floorIndex");
preparedData = floorIndexer.fit(preparedData).transform(preparedData);

StringIndexer dispositionIndexer = new StringIndexer()
        .setInputCol("disposition")
        .setOutputCol("dispositionIndex");
preparedData = dispositionIndexer.fit(preparedData).transform(preparedData);

String[] featureCols = new String[]{
        "squareM",
        "timeTimeToPragueCityCenter",
        "typeIndex",
        "floorIndex",
        "dispositionIndex"
};

VectorAssembler assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features");
preparedData = assembler.transform(preparedData);

// ... some more implementation details

RandomForestRegressor rf = new RandomForestRegressor().setLabelCol("price")
        .setFeaturesCol("features");
return rf.fit(preparedData);
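The indexers, the assembler, and the regressor can also be chained into a single Pipeline, which keeps the fit/transform bookkeeping in one place; a minimal sketch reusing the stages defined above:

// Sketch: one Pipeline instead of fitting each StringIndexer by hand.
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{
        typeIndexer, floorIndexer, dispositionIndexer, assembler, rf});
PipelineModel model = pipeline.fit(preparedData);
Dataset<Row> predictions = model.transform(preparedData);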
I am doing a classification with Weka. I tried to use the RemoveUseless filter, but with the same ARFF file I found some differences between using it in the code and in the GUI. In the code I invoked it this way:
Normalize norm = new Normalize();
norm.setInputFormat(train);
Instances train_norm = Filter.useFilter(train, norm);
RemoveUseless ru = new RemoveUseless();
ru.setInputFormat(train_norm);
Instances train_new = Filter.useFilter(train_norm, ru);
Ranker rank = new Ranker();
InfoGainAttributeEval eval = new InfoGainAttributeEval();
eval.buildEvaluator(train_new);
The result is "strange" because the filter deleted a lot of attributes which the GUI kept as informative for the classification. (The filter in the GUI worked very well). What is the problem? Am I using it well in the code?
I solved it like this:
Normalize norm = new Normalize();
norm.setInputFormat(train);
train = Filter.useFilter(train, norm);
RemoveUseless ru = new RemoveUseless();
ru.setInputFormat(train);
train = Filter.useFilter(train, ru);
Ranker rank = new Ranker();
InfoGainAttributeEval eval = new InfoGainAttributeEval();
eval.buildEvaluator(train);
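As a side note, if RemoveUseless still drops attributes you want to keep, its variance threshold can be tuned in code just like the -M option in the GUI; a minimal sketch (99.0 is the documented default):

RemoveUseless ru = new RemoveUseless();
// Nominal attributes whose number of distinct values exceeds this percentage
// of instances are removed; raise the threshold to keep more attributes.
ru.setMaximumVariancePercentageAllowed(99.0);
ru.setInputFormat(train);
train = Filter.useFilter(train, ru);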