Are there any restrictions on Amazon Neptune when using Gremlin syntax? - java

I am using a graph database for my project, Amazon Neptune. Neptune uses Gremlin syntax for graph queries. I was trying to execute a scenario where I have to filter the outgoing edges of a vertex based on a property on the edge. Let's call that property 'x'. Its value is of the form 'abc::xyz::ref'; this encodes multiple values on one edge, as Neptune does not allow multi-valued properties on edges. I have to do a contains check with three combinations plus an exact match:
'abc::'
'::abc'
'::abc::'
Exact match with 'abc'
I was trying to use the filter step in Gremlin from my Java code. The code below works fine with an in-memory TinkerGraph, but when I connect to Neptune and run the same query it throws a parsing exception.
String valueToCheck = "abc";
List<String> listOfValuesToCheck = new ArrayList<>();
listOfValuesToCheck.add("::abc");
listOfValuesToCheck.add("abc::");
listOfValuesToCheck.add("::abc::");
GraphTraversal<Vertex, Map<Object, Object>> gt24 = g.V().outE().has("x").filter(it -> {
    String value = String.valueOf(it.get().value("x"));
    if (value.equals(valueToCheck)) {
        return true;
    }
    for (String s : listOfValuesToCheck) {
        if (value.contains(s)) {
            return true;
        }
    }
    return false;
}).valueMap().with(WithOptions.tokens);
while (gt24.hasNext()) {
    System.out.println(gt24.next());
}
Does someone know why this is happening with Neptune? And is there a better way to do it that works with Neptune?
I have seen one more instance where Neptune did not throw an error but gave back no results, while the same query works with TinkerGraph:
y - property on Edge
z - property on Vertex
GraphTraversal<Vertex, Map<String, Object>> gt13 = g.V(1, 2).project("id", "summary").by(T.id)
.by(__.outE().has("y", "e").inV().group().by("z"));

Neptune doesn't support lambda steps.
You should try to replace your lambda code with other Gremlin steps.
g.V().outE().has("x").or(
    __.has('x', 'abc'),
    __.has('x', TextP.containing('abc::')),
    __.has('x', TextP.containing('::abc'))
).valueMap().with(WithOptions.tokens)
example: https://gremlify.com/sjrd9lrwfr

It looks like you are trying to include inline code (a lambda) in your query. Amazon Neptune does not support that. However, Gremlin includes text predicates such as TextP.containing which you can use instead. The small number of Neptune differences are documented at https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html
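For reference, a lambda-free Java version of that traversal might look like the sketch below (it assumes the property key "x" and value "abc" from the question; TextP requires TinkerPop 3.4+):
import org.apache.tinkerpop.gremlin.process.traversal.TextP;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.WithOptions;

// Exact match on "abc", or substring match on the delimited forms;
// containing("abc::") and containing("::abc") together also cover "::abc::".
GraphTraversal<Vertex, Map<Object, Object>> gt = g.V().outE().has("x")
        .or(__.has("x", "abc"),
            __.has("x", TextP.containing("abc::")),
            __.has("x", TextP.containing("::abc")))
        .valueMap().with(WithOptions.tokens);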

Related

Map original data from the dataset to new data using the DataVec library and store it in Spark RDDs

I have a dataset that contains latitude and longitude values written like 20.55E and 30.11N. I want to strip these direction suffixes and apply a minus sign where required. So basically, I'll map based on a condition and change the value.
Currently, I have a Schema and I'm trying to sort out the TransformProcess.
My Schema is like this:
new Schema.Builder()
.addColumnTime("dt", DateTimeZone.UTC)
.addColumnsDouble("AverageTemperature" , "AverageTemperatureUncertainty")
.addColumnsInteger("City" , "Country")
.addColumnsFloat("Latitude" , "Longitude")
.build();
And I'm stuck with my TransformProcess like this:
new TransformProcess.Builder(schema)
    .filter(new FilterInvalidValues("AverageTemperature", "AverageTemperatureUncertainty"))
    .stringToTimeTransform("dt", "yyyy-MM-dd", DateTimeZone.UTC)
    . // map currentLatitude -> remove direction string and apply the sign
I am trying to follow this code from a tutorial and after the TransformProcess, I'll do the Spark stuff and save the data.
My question is:
How can I perform the mapping?
From the API docs of TransformProcess, I cannot make sense of anything that will help me solve my problem.
I am using the DataVec library from Deeplearning4j.
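There is no posted answer here, but the sign conversion itself is easy to sketch in plain Java, independent of the TransformProcess API (the N/S/E/W sign convention and the pre-processing approach are assumptions, not part of the question):
// Convert a coordinate like "30.11N" or "20.55W" into a signed float:
// S (south) and W (west) become negative; N and E stay positive.
static float toSignedCoordinate(String raw) {
    raw = raw.trim();
    char direction = Character.toUpperCase(raw.charAt(raw.length() - 1));
    float magnitude = Float.parseFloat(raw.substring(0, raw.length() - 1));
    return (direction == 'S' || direction == 'W') ? -magnitude : magnitude;
}
Applied to each row before the data reaches DataVec, the Latitude and Longitude columns can then be declared as plain floats exactly as in the schema above.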

Redisearch query with "begins with" instead of "contains"

I am trying to understand how to perform queries in Redisearch strictly with "begins with", but I keep getting "contains" behaviour.
For example, if I have fields with values like 'football', 'myfootball', 'greenfootball' and I provide a search term like this:
> FT.SEARCH myIdx #myfield:foot*
I want just to get 'football' but I keep getting other fields that contain the word instead of beginning with that word.
Is there a way to avoid this?
I was trying to use VERBATIM and things like #myfield:^foot* but nothing worked.
I am using JRedisearch as a client, but eventually I had to go into the DB and run these queries manually in order to figure out what's happening. That being said, is this possible to do with this client at the moment?
Thanks
EDIT
A sample of my index setup:
Client client = new Client(INDEX_NAME, url, PORT);
Schema sc = new Schema().addSortableTextField("url", 1.0); // using this field for query
client.dropIndex(true);
client.createIndex(sc, Client.IndexOptions.Default());
return client;
Sample document:
id: // random uuid
urlPath: myfootbal
application: web
market: Europe
After checking the RDB provided, I see that when searching foot* you are not getting myfootbal. The replies look like this: /dot-com/plp/football/x/index.html. You are getting those replies because the URL is tokenized, and '/' is one of the tokenization characters. If you do not want those URLs to be tokenized, you need to declare the field as TAG and not as TEXT. This way the entire URL is indexed as-is, and when you search for foot* it will not appear in the results.
For more information about TAGS see the FT.CREATE documentation: https://oss.redislabs.com/redisearch/Commands.html
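A rough sketch of that change with JRedisearch, assuming the same index setup as in the question (addTagField is the JRedisearch call for declaring a TAG field):
import io.redisearch.Schema;
import io.redisearch.client.Client;

Client client = new Client(INDEX_NAME, url, PORT);
// TAG fields are indexed as a single token rather than being split on
// '/', '.', etc., so foot* no longer matches mid-URL occurrences.
Schema sc = new Schema().addTagField("url");
client.dropIndex(true);
client.createIndex(sc, Client.IndexOptions.Default());
Queries against a TAG field then use brace syntax, e.g. FT.SEARCH myIdx @url:{football}.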

Converting LINQ queries to Java 8

I'm translating an old enterprise app that uses C# with LINQ queries to Java 8. There are some queries I'm not able to reproduce using lambdas, as I don't know how C# handles them.
For example, take this LINQ query:
(from register in registers
 group register by register.muleID into groups
 select new Petition
 {
     Data = new PetitionData
     {
         UUID = groups.Key
     },
     Registers = groups.ToList<AuditRegister>()
 }).ToList<Petition>()
I understand this as a groupingBy in Java 8, but what is the "select new Petition" inside the query? I don't know how to code it in Java.
This is what I have at the moment:
Map<String, List<AuditRegister>> groupByMuleId =
registers.stream().collect(Collectors.groupingBy(AuditRegister::getMuleID));
Thank you and regards!
The select LINQ operation is similar to the map method of Stream in Java. They both transform each element of the sequence into something else.
collect(Collectors.groupingBy(AuditRegister::getMuleID)) returns a Map<String, List<AuditRegister>> as you know. But the groups variable in the C# version is an IEnumerable<IGrouping<string, AuditRegister>>. They are quite different data structures.
What you need is the entrySet method of Map. It turns the map into a Set<Map.Entry<String, List<AuditRegister>>>. Now, this data structure is more similar to IEnumerable<IGrouping<string, AuditRegister>>. This means that you can create a stream from the entry set, call map, and transform each element into a Petition.
groups.Key is simply entry.getKey(), and groups.ToList() is simply entry.getValue(). It should be easy.
I suggest you create a separate method to pass into the map method:
// you can probably come up with a more meaningful name
public static Petition mapEntryToPetition(Map.Entry<String, List<AuditRegister>> entry) {
Petition petition = new Petition();
PetitionData data = new PetitionData();
data.setUUID(entry.getKey());
petition.setData(data);
petition.setRegisters(entry.getValue());
return petition;
}
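Putting the pieces together, the whole pipeline might look something like this (a sketch; it assumes the helper above lives in a class named PetitionMapper, which is a placeholder name):
List<Petition> petitions = registers.stream()
        .collect(Collectors.groupingBy(AuditRegister::getMuleID))
        .entrySet()
        .stream()
        .map(PetitionMapper::mapEntryToPetition)
        .collect(Collectors.toList());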

How To Evaluate Predictive Neural Network in Encog

We're creating a neural network for predicting typhoon occurrence using several typhoon parameters as input. So far, we have been able to generate data and train the neural network using Encog 3.2. Right now, we need to evaluate the results of the training.
We're using the ForestCover project (in the Encog 3.2 examples) as a reference; however, that project's evaluation code is for a classification neural network, so we can't evaluate our network by following its code.
We also checked the PredictMarket project (in the Encog 3.2 examples), since it is a predictive neural network, but we're having difficulty using the MLData.
MLData output = network.compute(inputData);
We want to extract the contents of output and compare it to the contents of the evaluation.csv for neural-network evaluation.
Is there a way we can extract/convert the output variable into a normalized value which we can then compare to the normalized evaluation.csv?
or
Is there a way we can modify the ForestCover Evaluate.java file to be able to evaluate a predictive neural network?
Thank you.
Here is a C# example (Java should be similar) that writes out a .csv file (TestResultsFile) with both the de-normalized expected and actual results, so you can compare them in an Excel graph.
var evaluationSet = (BasicMLDataSet)EncogUtility.LoadCSV2Memory(
    Config.EvaluationNormalizedFile.ToString(),
    network.InputCount, network.OutputCount, true, CSVFormat.English,
    false);
var analyst = new EncogAnalyst();
analyst.Load(Config.NormalizationAnalystFile);
// Change this to whatever your output field index is
int outputFieldIndex = 29;
using (var resultsFile = new System.IO.StreamWriter(Config.TestResultsFile.ToString()))
{
    foreach (var item in evaluationSet)
    {
        var normalizedActualOutput = (BasicMLData)network.Compute(item.Input);
        var actualOutput = analyst.Script.Normalize.NormalizedFields[outputFieldIndex].DeNormalize(normalizedActualOutput.Data[0]);
        var idealOutput = analyst.Script.Normalize.NormalizedFields[outputFieldIndex].DeNormalize(item.Ideal[0]);
        var resultLine = String.Format("{0},{1}", idealOutput, actualOutput);
        resultsFile.WriteLine(resultLine);
    }
}
Much of this came from ideas in Abishek Kumar's Pluralsight course.
If you really want to compare normalized values, just remove the two calls to DeNormalize.
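Since the question uses Java, a rough Java translation of the same loop might look like this (a sketch only: the file names and the output field index are placeholders, and it assumes the Encog 3.x analyst API):
import java.io.File;
import java.io.PrintWriter;
import org.encog.app.analyst.EncogAnalyst;
import org.encog.app.analyst.script.normalize.AnalystField;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.util.csv.CSVFormat;
import org.encog.util.simple.EncogUtility;

MLDataSet evaluationSet = EncogUtility.loadCSV2Memory(
        "evaluation-normalized.csv",               // placeholder path
        network.getInputCount(), network.getOutputCount(),
        true, CSVFormat.ENGLISH, false);

EncogAnalyst analyst = new EncogAnalyst();
analyst.load(new File("normalization.ega"));       // placeholder path

int outputFieldIndex = 29; // change to your output field's index
AnalystField outputField =
        analyst.getScript().getNormalize().getNormalizedFields().get(outputFieldIndex);

try (PrintWriter results = new PrintWriter("test-results.csv")) {
    for (MLDataPair pair : evaluationSet) {
        MLData actual = network.compute(pair.getInput());
        double actualOutput = outputField.deNormalize(actual.getData(0));
        double idealOutput = outputField.deNormalize(pair.getIdeal().getData(0));
        results.println(idealOutput + "," + actualOutput);
    }
}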

How to get Spark MLlib RandomForestModel.predict response as text value YES/NO?

I am trying to implement the RandomForest algorithm using Apache Spark MLlib. I have the dataset in CSV format with the following features:
DayOfWeek(int),AlertType(String),Application(String),Router(String),Symptom(String),Action(String)
0,Network1,App1,Router1,Not reachable,YES
0,Network1,App2,Router5,Not reachable,NO
I want to use MLlib's RandomForest to do prediction on the last field, Action, and I want the response as YES/NO.
I am following code from GitHub to create the RandomForest model. Since I have all categorical features except one int feature, I have used the following code to convert them into a JavaRDD<LabeledPoint> - is any of that wrong?
// Load and parse the data file.
JavaRDD<String> data = jsc.textFile("/tmp/xyz/data/training-dataset.csv");
// I have 14 features so giving 14 as arg to the following
final HashingTF tf = new HashingTF(14);
// Create LabeledPoint datasets for actionable and non-actionable alerts
JavaRDD<LabeledPoint> labeledData = data.map(new Function<String, LabeledPoint>() {
    @Override
    public LabeledPoint call(String alert) {
        List<String> featureList = Arrays.asList(alert.trim().split(","));
        String actionType = featureList.get(featureList.size() - 1).toLowerCase();
        return new LabeledPoint(actionType.equals("yes") ? 1 : 0, tf.transform(featureList));
    }
});
Similarly to the above, I create test data and use it in the following code to do the prediction:
JavaPairRDD<Double, Double> predictionAndLabel =
    testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override
        public Tuple2<Double, Double> call(LabeledPoint p) {
            return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
        }
    });
How do I get a prediction based on my last field, Action, with the prediction coming back as YES/NO? The predict method currently returns a double, and I don't understand how to map it back. Also, am I following the correct approach for turning categorical features into a LabeledPoint? I am new to machine learning and Spark MLlib.
I am more familiar with the Scala version, but I'll try to help.
You need to map the target variable (Action) and all categorical features into levels starting at 0, like 0,1,2,3... For example router1, router2, ... router5 into 0,1,2,...,4. The same goes for your target variable, which I think is the only one you actually mapped: yes/no to 1/0 (I am not sure what your tf.transform(featureList) is actually doing).
Once you have done this you can train your RandomForest classifier, specifying the map of categorical features. Basically you need to tell it which features are categorical and how many levels they have. This is the Scala version, but you can easily translate it into Java:
val categoricalFeaturesInfo = Map[Int, Int]((2,2),(3,5))
This is basically saying that in your list of features the 3rd one (index 2) has 2 levels ((2,2)) and the 4th one (index 3) has 5 levels ((3,5)). The rest are treated as Doubles.
Now you pass the categoricalFeaturesInfo when training the classifier, together with the other parameters:
val modelRF = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
Now when you need to evaluate it, the predict function will return a double (0 or 1), and you can use that to compute accuracy, precision, or any other metric needed.
This is an example (sorry, Scala again), assuming you have a testData set on which you did the same transformations as before:
val predictionAndLabels = testData.map { point =>
  val prediction = modelRF.predict(point.features)
  (point.label, prediction)
}
Here your results are clear: the label is 1/0 and the predicted value is also 1/0, so any computation of accuracy, precision, and recall is straightforward.
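In the Java code from the question, the equivalent accuracy computation over predictionAndLabel might be sketched as follows (assuming Spark's Java 8 lambda support):
// Fraction of (prediction, label) pairs that agree
double accuracy = predictionAndLabel.filter(pl -> pl._1().equals(pl._2())).count()
        / (double) testData.count();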
I hope it helps!!
You're heading in the correct direction, and you've already managed to train a model, which is great.
For binary classification it will return either a 0.0 or a 1.0, and it's up to you to map this back to your string values.
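For completeness, a Java sketch of the Scala training snippet above plus the YES/NO mapping (the numeric parameters here are placeholder assumptions; tune numTrees, maxDepth, etc. for your data):
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.RandomForestModel;

Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>();
categoricalFeaturesInfo.put(2, 2); // 3rd feature has 2 levels
categoricalFeaturesInfo.put(3, 5); // 4th feature has 5 levels

final RandomForestModel model = RandomForest.trainClassifier(
        trainingData, 2 /* numClasses */, categoricalFeaturesInfo,
        10 /* numTrees */, "auto", "gini", 5 /* maxDepth */, 32 /* maxBins */, 12345 /* seed */);

// predict() returns 0.0 or 1.0; map it back to the original label
String action = model.predict(point.features()) == 1.0 ? "YES" : "NO";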
