Nullpointer exception while using Weka classifier - java

I am using Weka to create a classifier via the Java API.
The instances are created in Java code.
The classifier is also created in code, from the following options:
String args[]=" -x 10 -s 1 -W weka.classifiers.functions.Logistic".split(" ");
String classname;
String[] tmpOptions = Utils.splitOptions(Utils.getOption("W", args));
classname = tmpOptions[0];
System.out.println(classname);
Classifier cls = (Classifier) Utils.forName(Classifier.class, classname, tmpOptions);
This works fine and cross-validation runs.
After that I load my training instances again, set their class label to ?,
and pass them to the classifier using
for (int index = 0; index < postDatas.size(); index++) {
Instance instance = nominal.instance(index);
double label = classifier.classifyInstance(instance);
System.out.println(label);
}
The call classifier.classifyInstance(instance) throws the following exception:
java.lang.NullPointerException
at weka.classifiers.functions.Logistic.distributionForInstance(Logistic.java:710)
Any clues as to where I am going wrong?

Since you didn't provide all relevant information, I'll take a shot in the dark:
I'm assuming that you're using Weka version 3.7.5; I found the following source code for Logistic.java online:
public double [] distributionForInstance(Instance instance) throws Exception {
// line 710
m_ReplaceMissingValues.input(instance);
instance = m_ReplaceMissingValues.output();
...
}
Assuming you didn't pass null for instance, this only leaves m_ReplaceMissingValues. That member is initialized when the method Logistic.buildClassifier(Instances train) is called:
public void buildClassifier(Instances train) throws Exception {
...
// missing values
m_ReplaceMissingValues = new ReplaceMissingValues();
m_ReplaceMissingValues.setInputFormat(train);
train = Filter.useFilter(train, m_ReplaceMissingValues);
...
}
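It looks like you never trained your Logistic classifier on any data after you created the object in the line
Classifier cls = (Classifier) Utils.forName(Classifier.class, classname, tmpOptions);
Cross-validation does not do this for you: if it ran through Evaluation.crossValidateModel, that method trains copies of the classifier on each fold and leaves the original object untrained. Call cls.buildClassifier(...) with your training instances once before you call classifyInstance.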

Automatically handling / ignoring NameError in Jython

I have a setup where I execute Jython scripts from a Java application. The Java application feeds the Jython script with variables coming from the command line, so that a user can write the following code in their Jython script:
print("Hello, %s" % foobar)
And will call the java program with this:
$ java -jar myengine.jar script.py --foobar=baz
Hello, baz
My Java application parses the command line and creates a variable of that name with the given value, which it hands to the Jython scripting environment. All is well so far.
My issue is that when the user does not provide the foobar command-line parameter, I'd like to be able to easily provide a fallback in my script. For now, the user needs to write this sort of code to handle the situation where the foobar parameter is missing from the command line:
try: foobar
except NameError: foobar = "some default value"
But this is cumbersome, especially as the number of parameters grows. Is there a better way to handle this from the script user's point of view?
I was thinking of catching the jython NameError in the Java code, initializing the variable causing the exception to a default value if the variable causing the exception "looks like" a parameter (adding a naming convention is OK), and restarting where the exception occurred. Alternatively, I can require the script user to write code such as this:
parameter(foobar, "some default value")
Or something equivalent.
Well, this is one ugly workaround I have found so far. Be careful: it re-runs the script from the start for every missing variable, so it is O(n^2).
private void callScriptLoop(String scriptfile) {
PythonInterpreter pi = new PythonInterpreter();
pi.set("env", someEnv);
int nloop = 0;
boolean shouldRestart;
do {
shouldRestart = false;
try {
pi.execfile(scriptfile);
} catch (Throwable e) {
if (e instanceof PyException) {
PyException pe = (PyException) e;
String typ = pe.type.toString();
String val = pe.value.toString();
Matcher m = Pattern.compile("^name '(.*)' is not defined")
.matcher(val);
if (typ.equals("<type 'exceptions.NameError'>")
&& m.find()) {
String varname = m.group(1);
pi.set(varname, Py.None);
System.out.println(
"Initializing missing parameter '"
+ varname + "' to default value (None).");
shouldRestart = true;
nloop++;
if (nloop > 100)
throw new RuntimeException(
"NameError handler infinite loop detected: bailing-out.");
}
}
if (!shouldRestart)
throw e;
}
} while (shouldRestart);
}
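The name extraction at the heart of the workaround above can be tried in isolation, without Jython on the classpath. This is a minimal sketch that reuses the same regular expression; the class name NameErrorParser and the method missingName are made up for illustration, and the message text is assumed to have the "name '...' is not defined" form matched above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameErrorParser {
    // Same pattern as in the workaround, compiled once instead of on every failure.
    private static final Pattern UNDEFINED_NAME =
            Pattern.compile("^name '(.*)' is not defined");

    /** Returns the undefined variable name, or empty if the message has a different form. */
    static Optional<String> missingName(String message) {
        Matcher m = UNDEFINED_NAME.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(missingName("name 'foobar' is not defined"));
        System.out.println(missingName("integer division or modulo by zero"));
    }
}
```

Returning an Optional keeps the "is this a missing parameter?" decision in one place, so a naming convention for parameters could be checked there as well.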

Java gRPC client predict call to half_plus_two example model

I'm trying to make a call from a Java client to TensorFlow Serving. The running model is the half_plus_two example model. I can make a REST call successfully, but I cannot make the equivalent gRPC call.
I have tried passing a string as model input and also an array of floats into tensor proto builder. The tensor proto seems to contain correct data when I print it out:
[1.0, 2.0, 5.0]
String host = "localhost";
int port = 8500;
// the model's name.
String modelName = "half_plus_two";
// model's version
long modelVersion = 123;
// assume this model takes input of free text, and make some sentiment prediction.
// String modelInput = "some text input to make prediction with";
String modelInput = "{\"instances\": [1.0, 2.0, 5.0]";
// create a channel
ManagedChannel channel = ManagedChannelBuilder.forAddress(host, port).usePlaintext().build();
tensorflow.serving.PredictionServiceGrpc.PredictionServiceBlockingStub stub = tensorflow.serving.PredictionServiceGrpc.newBlockingStub(channel);
// create a modelspec
tensorflow.serving.Model.ModelSpec.Builder modelSpecBuilder = tensorflow.serving.Model.ModelSpec.newBuilder();
modelSpecBuilder.setName(modelName);
modelSpecBuilder.setVersion(Int64Value.of(modelVersion));
modelSpecBuilder.setSignatureName("serving_default");
Predict.PredictRequest.Builder builder = Predict.PredictRequest.newBuilder();
builder.setModelSpec(modelSpecBuilder);
// create the TensorProto and request
float[] floatData = new float[3];
floatData[0] = 1.0f;
floatData[1] = 2.0f;
floatData[2] = 5.0f;
org.tensorflow.framework.TensorProto.Builder tensorProtoBuilder = org.tensorflow.framework.TensorProto.newBuilder();
tensorProtoBuilder.setDtype(DataType.DT_FLOAT);
org.tensorflow.framework.TensorShapeProto.Builder tensorShapeBuilder = org.tensorflow.framework.TensorShapeProto.newBuilder();
tensorShapeBuilder.addDim(org.tensorflow.framework.TensorShapeProto.Dim.newBuilder().setSize(3));
tensorProtoBuilder.setTensorShape(tensorShapeBuilder.build());
// Set the float_val field.
for (int i = 0; i < floatData.length; i++) {
tensorProtoBuilder.addFloatVal(floatData[i]);
}
org.tensorflow.framework.TensorProto tp = tensorProtoBuilder.build();
System.out.println(tp.getFloatValList());
builder.putInputs("inputs", tp);
Predict.PredictRequest request = builder.build();
Predict.PredictResponse response = stub.predict(request);
When I print the request, it looks like this:
model_spec {
name: "half_plus_two"
version {
value: 123
}
signature_name: "serving_default"
}
inputs {
key: "inputs"
value {
dtype: DT_FLOAT
tensor_shape {
dim {
size: -1
}
dim {
size: 1
}
}
float_val: 1.0
float_val: 2.0
float_val: 5.0
}
}
I get this exception:
Exception in thread "main" io.grpc.StatusRuntimeException: INVALID_ARGUMENT: input tensor alias not found in signature: inputs. Inputs expected to be in the set {x}.
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
at tensorflow.serving.PredictionServiceGrpc$PredictionServiceBlockingStub.predict(PredictionServiceGrpc.java:446)
at com.avaya.ccml.grpc.GrpcClient.main(GrpcClient.java:72)
Edit:
Still working on this.
It looks like the tensor proto I'm supplying is not correct.
Did an inspect with saved_model_cli and it shows the correct shape:
The given SavedModel SignatureDef contains the following input(s):
inputs['x'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: x:0
The given SavedModel SignatureDef contains the following output(s):
outputs['y'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: y:0
Method name is: tensorflow/serving/predict
So next I need to figure out how to create a tensor proto with this structure.
Current
I figured this out.
The answer was staring me in the face the whole time.
The exception states that the input name must be 'x':
Exception in thread "main" io.grpc.StatusRuntimeException: INVALID_ARGUMENT: input tensor alias not found in signature: inputs. Inputs expected to be in the set {x}.
The output of the CLI also shows 'x' as the input name:
The given SavedModel SignatureDef contains the following input(s):
inputs['x'] tensor_info:
So I changed the line
requestBuilder.putInputs("inputs", proto);
to
requestBuilder.putInputs("x", proto);
Full working code:
import com.google.protobuf.Int64Value;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.tensorflow.framework.DataType;
import tensorflow.serving.Predict;
public class GrpcClient {
public static void main(String[] args) {
String host = "localhost";
int port = 8500;
// the model's name.
String modelName = "half_plus_two";
// model's version
long modelVersion = 123;
// create a channel
ManagedChannel channel = ManagedChannelBuilder.forAddress(host, port).usePlaintext().build();
tensorflow.serving.PredictionServiceGrpc.PredictionServiceBlockingStub stub = tensorflow.serving.PredictionServiceGrpc.newBlockingStub(channel);
// create PredictRequest
Predict.PredictRequest.Builder requestBuilder = Predict.PredictRequest.newBuilder();
// create ModelSpec
tensorflow.serving.Model.ModelSpec.Builder modelSpecBuilder = tensorflow.serving.Model.ModelSpec.newBuilder();
modelSpecBuilder.setName(modelName);
modelSpecBuilder.setVersion(Int64Value.of(modelVersion));
modelSpecBuilder.setSignatureName("serving_default");
// set model for request
requestBuilder.setModelSpec(modelSpecBuilder);
// create TensorProto with 3 floats
org.tensorflow.framework.TensorProto.Builder tensorProtoBuilder = org.tensorflow.framework.TensorProto.newBuilder();
tensorProtoBuilder.setDtype(DataType.DT_FLOAT);
tensorProtoBuilder.addFloatVal(1.0f);
tensorProtoBuilder.addFloatVal(2.0f);
tensorProtoBuilder.addFloatVal(5.0f);
// create TensorShapeProto
org.tensorflow.framework.TensorShapeProto.Builder tensorShapeBuilder = org.tensorflow.framework.TensorShapeProto.newBuilder();
tensorShapeBuilder.addDim(org.tensorflow.framework.TensorShapeProto.Dim.newBuilder().setSize(3));
// set shape for proto
tensorProtoBuilder.setTensorShape(tensorShapeBuilder.build());
// build proto
org.tensorflow.framework.TensorProto proto = tensorProtoBuilder.build();
// set proto for request
requestBuilder.putInputs("x", proto);
// build request
Predict.PredictRequest request = requestBuilder.build();
System.out.println("Printing request \n" + request.toString());
// run predict
Predict.PredictResponse response = stub.predict(request);
System.out.println(response.toString());
}
}
The half_plus_two example here uses the label instances for the input values: https://www.tensorflow.org/tfx/serving/docker#serving_example
Could you try setting the input name to instances, like this?
builder.putInputs("instances", tp);
I also believe the dtype can be problematic. Instead of DT_STRING, I think you should use DT_FLOAT, as the inspection result shows:
tensorProtoBuilder.setDtype(DataType.DT_FLOAT);
Edit
I work with Python and couldn't spot the mistake in your code, but this is how we send a predict request (with a PredictRequest proto). Maybe you can try the Predict proto this way, or compare the two and spot a difference that I am missing:
request = predict_pb2.PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = signature_name
request.inputs['x'].dtype = types_pb2.DT_FLOAT
request.inputs['x'].float_val.append(2.0)
channel = grpc.insecure_channel(model_server_address)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
result = stub.Predict(request, RPC_TIMEOUT)

BigQueryIO: Query configured via options, but "Value only available at runtime"

Apache Beam 2.9.0
I have set up a pipeline that pulls data from BigQuery and does a series of transforms on it. The options have a start date attached to them using a ValueProvider:
ValueProvider<String> getStartTime();
void setStartTime(ValueProvider<String> startTime);
I then go to pull the data with BigQueryIO (changing things around a bit for the sake of making it explicit what is going on):
BigQueryIO.read(
(SerializableFunction<SchemaAndRecord, AggregatedRowRecord>)
input -> new BigQueryParser().apply(input.getRecord()))
.withoutValidation()
.withTemplateCompatibility()
.fromQuery(
ValueProvider.NestedValueProvider.of(
opts.getStartTime(),
(SerializableFunction<String, String>)
input -> {
Instant instant = Instant.parse(input);
return String.format(
<large SQL statement with a %s in it>,
String.format(
"%d_%d_%d",
instant.get(ChronoField.YEAR),
instant.get(ChronoField.MONTH_OF_YEAR),
instant.get(ChronoField.DAY_OF_MONTH)));
}))
.withCoder(<coder for AggregatedRowRecords>)
.usingStandardSql()
This is then added to a pipeline normally (p.apply(<above>)).
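A side note on the date formatting in the lambda above, separate from the ValueProvider error: assuming Instant there is java.time.Instant (the question does not show the import), reading ChronoField.YEAR directly from it throws UnsupportedTemporalTypeException, because an Instant carries no calendar fields. Below is a minimal sketch of the same formatting with an explicit UTC zone. The class and method names are made up for illustration, and UTC is an assumption about the intended zone.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoField;

class DateSuffix {
    /** Formats an ISO-8601 instant as year_month_day, evaluated in UTC. */
    static String fromIsoInstant(String isoInstant) {
        // Attach a zone first: calendar fields exist on ZonedDateTime, not on Instant.
        ZonedDateTime utc = Instant.parse(isoInstant).atZone(ZoneOffset.UTC);
        return String.format(
                "%d_%d_%d",
                utc.get(ChronoField.YEAR),
                utc.get(ChronoField.MONTH_OF_YEAR),
                utc.get(ChronoField.DAY_OF_MONTH));
    }

    public static void main(String[] args) {
        System.out.println(fromIsoInstant("2019-01-15T10:30:00Z")); // prints 2019_1_15
    }
}
```

The same body could be dropped into the SerializableFunction passed to NestedValueProvider, since it only depends on the input string.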
Now I run it:
--project=<project> \
--tempLocation=<directory> \
--stagingLocation=<directory> \
--network=dataflow \
--subnetwork=<subnetwork> \
--defaultWorkerLogLevel=DEBUG \
--appName=<name> \
--runner=DirectRunner
This causes the following error:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=startTime, default=null}
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
at <class>.main(<class>.java:<>)
Caused by: java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=startTime, default=null}
at org.apache.beam.sdk.options.ValueProvider$RuntimeValueProvider.get(ValueProvider.java:228)
at org.apache.beam.sdk.options.ValueProvider$NestedValueProvider.get(ValueProvider.java:131)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.createBasicQueryConfig(BigQueryQuerySource.java:230)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.dryRunQueryIfNeeded(BigQueryQuerySource.java:175)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:115)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:102)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead$2.processElement(BigQueryIO.java:783)
The use of NestedValueProvider comes from this example on setting up templates:
The user provides a substring for a BigQuery query, such as a specific date. The transform uses the substring to create the full query. Calling .get() returns the full query.
Removing the NestedValueProvider logic doesn't seem to help, however. Removing the ValueProvider entirely from the fromQuery section works fine, but defeats the purpose of being able to set it via options.
The exception explains the issue: Apache Beam first builds the pipeline and its classes, and only then starts running data through it. At that stage you can't access the options; they are just metadata for building the pipeline.
The way to overcome this is to create a ParDo function or PTransform that takes the options you need as constructor parameters, so that it can use them in its logic.
See this example (from my own use case; I faced the same issue a few days ago).
The pipeline:
HistoryProcessingOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
.as(HistoryProcessingOptions.class);
Pipeline pipeline = Pipeline.create(options);
pipeline.apply(SourceRead.of(options.getSourceBigQueryTable().get(),
options.getSourceBigQueryDataset().get(),
options.getSourceBigQueryProject().get(),
options.getFromDate().get(),
options.getToDate().get()
))
The transformer itself:
public class SourceRead extends PTransform<PBegin, PCollection<TableRow>> {
private String sourceBigQueryTable;
private String sourceBigQueryDataset;
private String sourceBigQueryProject;
private String formDate;
private String toDate;
private static Logger logger = LoggerFactory.getLogger(SourceRead.class);
public SourceRead(String sourceBigQueryTable, String sourceBigQueryDataset, String sourceBigQueryProject, String formDate, String toDate) {
this.sourceBigQueryTable = sourceBigQueryTable;
this.sourceBigQueryDataset = sourceBigQueryDataset;
this.sourceBigQueryProject = sourceBigQueryProject;
this.formDate = formDate;
this.toDate = toDate;
}
public static SourceRead of(String sourceBigQueryTable, String sourceBigQueryDataset, String sourceBigQueryProject, String yearToLoad, String dateToLoad) {
return new SourceRead(sourceBigQueryTable, sourceBigQueryDataset, sourceBigQueryProject, yearToLoad, dateToLoad);
}
@Override
public PCollection<TableRow> expand(PBegin input) {
String query = "SELECT * FROM TABLE_DATE_RANGE([" + sourceBigQueryProject + ":"+sourceBigQueryDataset+"."+sourceBigQueryTable+"],"
+ "TIMESTAMP('" + formDate + "'),"
+ "TIMESTAMP('" + toDate + "'))";
logger.info("query is"+ query);
return input.apply(BigQueryIO.readTableRows()
.fromQuery(query));
}
}

testing OpenNLP classifier model

I'm currently training a model for a classifier. Yesterday I found out that the result will be more accurate if you also test the trained model. I tried searching the internet for how to test a model (testing openNLP model), but I can't get it to work. I think the reason is that I'm using OpenNLP version 1.8.3 instead of 1.5. Could anyone explain how to properly test my model in this version of OpenNLP?
Thanks in advance.
Below is the way I'm training my model:
public static DoccatModel trainClassifier() throws IOException
{
// read the training data
final int iterations = 100;
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/trainingssetTest.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
// define the training parameters
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations+"");
params.put(TrainingParameters.CUTOFF_PARAM, 0+"");
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
// create a model from training data
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
return model;
}
I can think of two ways to test your model. Either way, you will need annotated documents (and by annotated I really mean expert-classified).
The first way involves using the opennlp DoccatEvaluator tool. The syntax would be something akin to
opennlp DoccatEvaluator -model model -data sampleData
The format of your sampleData should be
OUTCOME <document text....>
with documents separated by the newline character.
The second way involves creating a DocumentCategorizer, something like this
(the model is the DoccatModel from your question):
DocumentCategorizer categorizer = new DocumentCategorizerME(model);
// could also use: Tokenizer tokenizer = new TokenizerME(tokenizerModel)
Tokenizer tokenizer = WhitespaceTokenizer.INSTANCE;
// lineStream is an ObjectStream<String> like the one in your question...
for (String sample = lineStream.read(); sample != null; sample = lineStream.read()) {
String[] tokens = tokenizer.tokenize(sample);
double[] outcomeProb = categorizer.categorize(tokens);
String sampleOutcome = categorizer.getBestCategory(outcomeProb);
// check if the outcome is right...
// keep track of # right and wrong...
}
// calculate agreement metric of your choice
Since I typed the code here, there may be a syntax error or two (either I or the SO community can fix them), but the idea is this: run through your data, tokenize each document, pass it through the document categorizer, and keep track of the results. That is how you want to evaluate your model.
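The bookkeeping step ("keep track of # right and wrong", then compute a metric) can be sketched without OpenNLP at all. This minimal example assumes you have collected the expected labels and the labels returned by getBestCategory into two lists of equal length; the class name AccuracyTally and the sample labels are made up for illustration, and plain accuracy is just one possible metric.

```java
import java.util.Arrays;
import java.util.List;

class AccuracyTally {
    /** Fraction of positions where the predicted label equals the expected label. */
    static double accuracy(List<String> expected, List<String> predicted) {
        if (expected.isEmpty() || expected.size() != predicted.size()) {
            throw new IllegalArgumentException("need two non-empty lists of equal length");
        }
        int right = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).equals(predicted.get(i))) {
                right++;
            }
        }
        return (double) right / expected.size();
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("pos", "neg", "pos", "neg");
        List<String> predicted = Arrays.asList("pos", "neg", "neg", "neg");
        System.out.println(accuracy(expected, predicted)); // 3 of 4 match: prints 0.75
    }
}
```

For this to be a meaningful test, the documents should be held out from the training set.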
Hope it helps...

Getting JSON from System.out.println

I am trying to do sentiment analysis and project the values on Google Visualization.
I am calling this Python script from my Java program.
Code snippet (for AlchemyAPI):
https://github.com/AlchemyAPI/alchemyapi-twitter-python
I wrote a Java program to call the Python script.
import java.io.*;
public class twitmain {
public String twittersentiment(String[] args) throws IOException {
// set up the command and parameter
String pythonScriptPath = "/twitter/analyze.py"; // I'm calling AlchemyAPI
String[] cmd = new String[2 + args.length];
cmd[0] = "C:\\Python27\\python.exe";
cmd[1] = pythonScriptPath;
for (int i = 0; i < args.length; i++) {
cmd[i + 2] = args[i];
}
// create runtime to execute external command
Runtime rt = Runtime.getRuntime();
Process pr = rt.exec(cmd);
// retrieve output from python script
BufferedReader bfr = new BufferedReader(new InputStreamReader(
pr.getInputStream()));
String line = ""; int i=0;
while ((line = bfr.readLine()) != null) {
System.out.println(line);
}
return line;
}
}
Output:
I am getting tweets and final statistics as follows:
##########################################################
# The Tweets #
##########################################################
@uDiZnoGouD
Date: Mon Apr 07 05:07:19 +0000 2014
To enjoy in case you win!
To help you sulk in case you loose!
#IndiavsSriLanka #T20final http://t.co/hRAsIa19zD
Document Sentiment: positive (Score: 0.261738)
##########################################################
# The Stats #
##########################################################
Document-Level Sentiment:
Positive: 3 (60.00%)
Negative: 1 (20.00%)
Neutral: 1 (20.00%)
Total: 5 (100.00%)
Problem (Question):
How do I scrape just the Positive, Negative and Neutral values and send them to Google Visualization (i.e. build a JSON object)?
Any help would be really appreciated.
Shoot, I just realized you were asking the other way around: you want to write the parsing app in Java.
The idea would be the same anyway; only the language differs.
That also means you have access to the source of the Python app, so you could dig around in there and have it dump the result object to the console as JSON.
Original answer in python:
You should identify the types of lines, parse them, and construct the JSON object yourself.
For every line, something like:
import re
json_obj = {}
# \d+ so that multi-digit counts and 100.00% also match
pattern = r"^(\w+): (\d+) \((\d+\.\d{2}%)\)$"
match = re.match(pattern, line)
if match:
    prop_obj = {"value": match.group(2), "percent": match.group(3)}
    json_obj[match.group(1)] = prop_obj
This would transform the line:
Positive: 3 (60.00%)
into
{
  "Positive": {
    "value": "3",
    "percent": "60.00%"
  }
}
Taking this idea further, the parsing rules could be a dictionary that maps each pattern to an extractor function:
parse_rules = {
    r"^(\w+): (\d+) \((\d+\.\d{2}%)\)$":
        lambda match: {match.group(1): {"value": match.group(2), "percent": match.group(3)}},
    # ... more pattern/extractor pairs
}
For each line you would test against the parse rules and call the extractor when a pattern matches; the result of the extractor is then merged into the JSON result object.
This is a lot of work (depending on the complexity of the output), but I'd go this way if the app producing the output cannot be modified.
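Since the question asks for Java, here is a minimal sketch of the same idea using only the standard library. It assumes the stat lines look exactly like the ones shown in the question's output. The class name StatsToJson is made up for illustration, and the JSON is assembled by hand, so it is only safe for the simple word keys matched here. Unlike the Python version it emits numbers rather than strings, which is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatsToJson {
    // Matches lines such as "Positive: 3 (60.00%)"; the line is trimmed before matching.
    private static final Pattern STAT_LINE =
            Pattern.compile("^(\\w+):\\s*(\\d+)\\s*\\((\\d+\\.\\d+)%\\)$");

    /** Builds one JSON object from the stat lines; lines that do not match are skipped. */
    static String toJson(List<String> lines) {
        List<String> members = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STAT_LINE.matcher(line.trim());
            if (m.matches()) {
                members.add(String.format(
                        "\"%s\":{\"value\":%s,\"percent\":%s}",
                        m.group(1), m.group(2), m.group(3)));
            }
        }
        return "{" + String.join(",", members) + "}";
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        output.add("Document-Level Sentiment:");
        output.add("Positive: 3 (60.00%)");
        output.add("Negative: 1 (20.00%)");
        output.add("Neutral: 1 (20.00%)");
        System.out.println(toJson(output));
    }
}
```

In the question's program, the lines read from bfr.readLine() could be collected into a list and passed to toJson instead of only being printed.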
Regex explanation & example
