I use Twitter's anomaly detection algorithm in my project. For this purpose I use the Rserve library to run R code in my Java application.
My Java code:
RConnection connection = new RConnection();
connection.voidEval("library(AnomalyDetection)");
connection.eval("res <- AnomalyDetectionTs(data.frame(/*list of timestamps*/,/*list of values*/), direction='both', plot=FALSE, longterm=TRUE)");
And, as a result, I got this output:
$anoms
timestamp anoms
1 1980-09-25 16:05:00 21.3510
2 1980-09-29 06:40:00 193.1036
3 1980-09-29 21:44:00 148.1740
To get the results, I am currently using this clumsy workaround:
connection.eval("write.csv(res[['anoms']],file='anom.csv')");
Then I open this file in Java and parse the results.
So, how can I get the resulting data.frame into Java using Rserve itself?
Simply write the R command such that it returns the desired result back to Java:
RList l = c.eval("AnomalyDetectionTs(data, direction='both', "
+ "plot=FALSE, longterm=TRUE)$anoms").asList();
What you get is the data frame (as a list) with the two variables timestamp and anoms.
However, AnomalyDetectionTs returns dates in a horribly annoying and inefficient format so you may want to actually return a more sane result which is easier to work with in Java, e.g.:
RList l = c.eval("{ res <- AnomalyDetectionTs(data, direction='both', plot=FALSE, "
+ "longterm=TRUE)$anoms; "
+ "list(as.POSIXct(res$timestamp), res$anoms) }").asList();
double ts[] = l.at(0).asDoubles();
double anom[] = l.at(1).asDoubles();
for (int i = 0; i < ts.length; i++)
System.out.println(new java.util.Date((long)(ts[i]*1000.0)) + ": " + anom[i]);
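If `java.util.Date` is not a good fit, the same doubles can be turned into `java.time` values. The following is a minimal sketch, assuming the values are POSIXct seconds since 1970-01-01 UTC as produced by the `as.POSIXct` call above. The class and method names are illustrative only (they are not part of Rserve), and the sample value is simply the first timestamp from the question interpreted as UTC, which may not match the time zone of the R session.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Rserve.
final class PosixctTimes {
    // Assumption: timestamps should be displayed in UTC.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Converts POSIXct seconds (possibly fractional) to an Instant.
    static Instant toInstant(double posixSeconds) {
        return Instant.ofEpochMilli(Math.round(posixSeconds * 1000.0));
    }

    // Formats POSIXct seconds as "yyyy-MM-dd HH:mm:ss" in UTC.
    static String format(double posixSeconds) {
        return FMT.format(toInstant(posixSeconds));
    }

    public static void main(String[] args) {
        // Stand-ins for l.at(0).asDoubles() and l.at(1).asDoubles().
        double[] ts = {338745900.0};
        double[] anom = {21.3510};
        for (int i = 0; i < ts.length; i++) {
            System.out.println(format(ts[i]) + ": " + anom[i]);
        }
    }
}
```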
PS: the right place for Rserve questions is the stats-rosuda-devel mailing list which will give you answers much faster.
I'm trying to reproduce the exercise about halfway down this page:
https://d2l.ai/chapter_recurrent-neural-networks/sequence.html
The exercise uses a sine function to create 1000 data points between -1 and 1 and uses a recurrent network to approximate the function.
Below is the code I used. I'm going back to study why this isn't working, because it doesn't make much sense to me when I was easily able to approximate this function with a feed-forward network.
//get data
ArrayList<DataSet> list = new ArrayList<>();
DataSet dss = DataSetFetch.getDataSet(Constants.DataTypes.math, "sine", 20, 500, 0, 0);
DataSet dsMain = dss.copy();
if (!dss.isEmpty()){
list.add(dss);
}
if (list.isEmpty()){
return;
}
//format dataset
list = DataSetFormatter.formatReccurnent(list, 0);
//get network
int history = 10;
ArrayList<LayerDescription> ldlist = new ArrayList<>();
LayerDescription l = new LayerDescription(1,history, Activation.RELU);
ldlist.add(l);
LayerDescription ll = new LayerDescription(history, 1, Activation.IDENTITY, LossFunctions.LossFunction.MSE);
ldlist.add(ll);
ListenerDescription ld = new ListenerDescription(20, true, false);
MultiLayerNetwork network = Reccurent.getLstm(ldlist, 123, WeightInit.XAVIER, new RmsProp(), ld);
//train network
final List<DataSet> lister = list.get(0).asList();
DataSetIterator iter = new ListDataSetIterator<>(lister, 50);
network.fit(iter, 50);
network.rnnClearPreviousState();
//test network
ArrayList<DataSet> resList = new ArrayList<>();
DataSet result = new DataSet();
INDArray arr = Nd4j.zeros(lister.size()+1);
INDArray holder;
if (list.size() > 1){
//test on training data
System.err.println("oops");
}else{
//test on original or scaled data
for (int i = 0; i < lister.size(); i++) {
holder = network.rnnTimeStep(lister.get(i).getFeatures());
arr.putScalar(i,holder.getFloat(0));
}
}
//add originaldata
resList.add(dsMain);
//result
result.setFeatures(dsMain.getFeatures());
result.setLabels(arr);
resList.add(result);
//display
DisplayData.plot2DScatterGraph(resList);
Can you explain the code I would need for an LSTM network with 1 input, 10 hidden units and 1 output to approximate a sine function?
I'm not using any normalization, as the function is already in the range -1 to 1. I'm using each Y value as the feature and the following Y value as the label to train the network.
As you can see, I am building a class that allows for easier construction of nets. I have tried throwing many changes at the problem, but I am sick of guessing.
Here are some examples of my results. Blue is the data, red is the result.
This is one of those times where you go from wondering why something was not working to wondering how in the hell the original results were as good as they were.
My failing was not understanding the documentation clearly and also not understanding BPTT.
With feed-forward networks, each iteration is stored as a row and each input as a column. An example is [dataset.size, networkinputs.size].
With recurrent input, however, it is reversed: each row is an input and each column is a step in time needed to build up the state of the LSTM chain. At minimum my input needed to be [1, networkinputs.size, dataset.size], but it could also be [dataset.size, networkinputs.size, statelength.size].
In my previous example I was training the network with data in the format [dataset.size, networkinputs.size, 1]. So, from my low-resolution understanding, the LSTM network should never have worked at all, but it somehow produced at least something.
There may also have been some issue with converting the dataset to a list, as I also changed how I feed the network, but I think the bulk of the issue was the data structure.
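The shape difference described above can be sketched in plain Java. This is only an illustration under stated assumptions: it uses nested arrays rather than DL4J's `INDArray`, the class `SineWindows` and its methods are hypothetical, and it follows the [examples, inputs, time steps] ordering that DL4J recurrent layers expect. How the labels are then presented to the output layer is not covered here.

```java
// Hypothetical helper for illustration only; it is not part of DL4J.
final class SineWindows {

    // Samples sin(x) at n points spaced dx apart, starting at x = 0.
    static double[] sine(int n, double dx) {
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            y[i] = Math.sin(i * dx);
        }
        return y;
    }

    // Builds features shaped [numExamples][1][history]:
    // one input feature and `history` consecutive time steps per example.
    static double[][][] features(double[] series, int history) {
        checkArgs(series, history);
        int numExamples = series.length - history;
        double[][][] out = new double[numExamples][1][history];
        for (int i = 0; i < numExamples; i++) {
            for (int t = 0; t < history; t++) {
                out[i][0][t] = series[i + t];
            }
        }
        return out;
    }

    // Builds one label per example: the value that follows each window.
    static double[] labels(double[] series, int history) {
        checkArgs(series, history);
        int numExamples = series.length - history;
        double[] out = new double[numExamples];
        for (int i = 0; i < numExamples; i++) {
            out[i] = series[i + history];
        }
        return out;
    }

    private static void checkArgs(double[] series, int history) {
        if (history <= 0 || series.length <= history) {
            throw new IllegalArgumentException("need 0 < history < series.length");
        }
    }

    public static void main(String[] args) {
        double[] series = sine(500, 0.05);
        double[][][] x = features(series, 10);
        double[] y = labels(series, 10);
        System.out.println("features: [" + x.length + ", " + x[0].length + ", " + x[0][0].length + "]");
        System.out.println("labels:   [" + y.length + "]");
    }
}
```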
Below are my new results
Hard to tell what is going on without seeing the full code. For a start, I don't see an RnnOutputLayer specified. You could take a look at this, which shows you how to build an RNN in DL4J.
If your RNN setup is correct, this could be a tuning issue. You can find more on tuning here. Adam is probably a better choice for an updater than RMSProp, and tanh is probably a good choice for the activation of your output layer, since its range is (-1, 1). Other things to check or tweak: learning rate, number of epochs, and the setup of your data (for example, are you trying to predict too far out?).
I am trying to use an Adaptive Algo from Interactive Brokers. It seems the IBrokers package for R (https://cran.r-project.org/web/packages/IBrokers/IBrokers.pdf - pg37 and 38) was never completed, as my order does not go through when I execute the code below.
tws <- twsConnect()
stockEquity <- twsEquity("AAPL")
parentLongId <- reqIds(tws)
parentLongOrder <- twsOrder(parentLongId, action="BUY", totalQuantity = 100, orderType = "MKT", transmit=TRUE,
algoStrategy ="Adaptive", algoParams = "Normal")
I found the API guide on GitHub (http://interactivebrokers.github.io/tws-api/ibalgos.html) for Java, Python, C# and C++. I was wondering if anyone knows how to convert the code into R.
Java example:
Order baseOrder = OrderSamples.LimitOrder("BUY", 1000, 1);
AvailableAlgoParams.FillAdaptiveParams(baseOrder, "Normal");
client.placeOrder(nextOrderId++, ContractSamples.USStockAtSmart(), baseOrder);
public static void FillAdaptiveParams(Order baseOrder, String priority) {
baseOrder.algoStrategy("Adaptive");
baseOrder.algoParams(new ArrayList<>());
baseOrder.algoParams().add(new TagValue("adaptivePriority", priority));
}
I think the easiest way for R users to access all of IBALGO is to install the reticulate package, which allows you to use Python from R, and then install the ib_insync Python module. Thanks to reticulate, you can then use R to manage the IB API in almost every way as if you were working natively in Python.
Just remember that the R equivalent of Python's TagValue syntax looks like this:
algoStrategy = 'Adaptive',
algoParams = list(insync$TagValue('adaptivePriority', 'Normal')))
I've searched all over Stack Overflow and Google for this kind of prediction but found nothing for IBk, KStar or LWL. I need single-instance predictions from any of these three classifiers. I am doing this in Android Studio.
I've found ways of getting predictions from other classifiers like these:
for J48: from Here
double[] prediction=j48.distributionForInstance(test.get(s1));
//output predictions
for(int i=0; i<prediction.length; i=i+1)
{
System.out.println("Probability of class "+
test.classAttribute().value(i)+
" : "+Double.toString(prediction[i]));
}
For Bayesnet: from Here
Evaluation eTest = new Evaluation(trainingInstance);
eTest.evaluateModelOnce(bayes_Classifier, testInstance);
For NaiveBayes: from Here
NaiveBayes naiveBayes = new NaiveBayes();
naiveBayes.buildClassifier(train);
// this does the trick
double label = naiveBayes.classifyInstance(test.instance(0));
test.instance(0).setClassValue(label);
System.out.println(test.instance(0).stringValue(4));
but I couldn't use them because my classifiers don't have the same methods...or I can't find a way
My code:
// Code before this point is omitted because it is too long,
// but the data is definitely inside *instances* (I checked with the debugger)
instances.setClassIndex(instances.numAttributes()-1);
// (I tried the same with KStar, LWL, AdditiveRegression, RandomCommittee)
IBk ibk = new IBk();
// I want a prediction for this instance, for the third attribute (attribute3)
Instance testInst = new DenseInstance(3);
testInst.setValue(attribute1, 3);
testInst.setValue(attribute2, 16);
testInst.setValue(attribute3, 0);
// I was hoping for some simple way like this (but this returns nothing):
double rez = 0;
String var = "";
try {
ibk.buildClassifier(instances);
rez = ibk.classifyInstance(testInst);
}
catch (Exception ex)
{
// log the actual exception message rather than a literal string
Log.e("Error", String.valueOf(ex.getMessage()));
}
Log.w("GIMME RESULTS:", String.valueOf(rez));
Even other classifiers would be okay, like AdditiveRegression, Bagging, RandomCommittee and DecisionTable. They make good predictions in the Weka GUI, but I need predictions in Java.... :)
Found it by trying all of its methods:
ibk.buildClassifier(dataSet);
double[] rez2 = ibk.distributionForInstance(i2); // distribution
int result = (int) rez2[0];
// it works the same with KStar
I came to realize that classifiers in Weka normally work with discrete data (equal steps from min to max), and my data is not all discrete. IBk and KStar are able to handle data that is not discrete, which is why I can use only these two with my data.
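For context on why element 0 is the value of interest here: in Weka, `distributionForInstance` returns one probability per class value when the class attribute is nominal, and a single-element array holding the predicted value when the class attribute is numeric. The sketch below shows how either kind of result could be read. It is plain Java operating on a `double[]`; the class `DistributionReader` and its methods are hypothetical helpers, not Weka API.

```java
// Hypothetical helper for illustration; not part of Weka.
final class DistributionReader {

    // Nominal class: returns the index of the largest probability.
    static int mostLikelyClass(double[] distribution) {
        if (distribution.length == 0) {
            throw new IllegalArgumentException("empty distribution");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    // Numeric class: the single element is the predicted value.
    static double numericPrediction(double[] distribution) {
        if (distribution.length != 1) {
            throw new IllegalArgumentException("expected exactly one element for a numeric class");
        }
        return distribution[0];
    }

    public static void main(String[] args) {
        // Made-up arrays standing in for ibk.distributionForInstance(...) results.
        System.out.println(mostLikelyClass(new double[] {0.1, 0.7, 0.2}));
        System.out.println(numericPrediction(new double[] {12.5}));
    }
}
```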
I'm trying to create an "automated training" using Weka's Java API, but I guess I'm doing something wrong. Whenever I test my ARFF file via Weka's interface using MultilayerPerceptron with 10-fold cross-validation or a 66% percentage split, I get satisfactory results (around 90%), but when I test the same file via Weka's API, every test returns basically a 0% match (every row returns false).
Here's the output from Weka's GUI:
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances 78 91.7647 %
Incorrectly Classified Instances 7 8.2353 %
Kappa statistic 0.8081
Mean absolute error 0.0817
Root mean squared error 0.24
Relative absolute error 17.742 %
Root relative squared error 51.0603 %
Total Number of Instances 85
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.885 0.068 0.852 0.885 0.868 0.958 1
0.932 0.115 0.948 0.932 0.94 0.958 0
Weighted Avg. 0.918 0.101 0.919 0.918 0.918 0.958
=== Confusion Matrix ===
a b <-- classified as
23 3 | a = 1
4 55 | b = 0
And here's the code I'm using in Java (actually it's .NET using IKVM):
var cl = new weka.classifiers.functions.MultilayerPerceptron();
cl.setOptions(weka.core.Utils.splitOptions("-L 0.7 -M 0.3 -N 75 -V 0 -S 0 -E 20 -H a")); //these are the same options (the default options) when the test is run under weka gui
string trainingFile = Properties.Settings.Default.WekaTrainingFile; //the path to the same file I use to test on weka explorer
weka.core.Instances data = null;
data = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(trainingFile))); //loads the file
data.setClassIndex(data.numAttributes() - 1); //set the last column as the class attribute
cl.buildClassifier(data);
var tmp = System.IO.Path.GetTempFileName(); //creates a temp file to create an arff file with a single row with the instance I want to test taken from the arff file loaded previously
using (var f = System.IO.File.CreateText(tmp))
{
//long code to read data from db and regenerate the line, simulating data coming from the source I really want to test
}
var dataToTest = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(tmp)));
dataToTest.setClassIndex(dataToTest.numAttributes() - 1);
double prediction = 0;
for (int i = 0; i < dataToTest.numInstances(); i++)
{
weka.core.Instance curr = dataToTest.instance(i);
weka.core.Instance inst = new weka.core.Instance(data.numAttributes());
inst.setDataset(data);
for (int n = 0; n < data.numAttributes(); n++)
{
weka.core.Attribute att = dataToTest.attribute(data.attribute(n).name());
if (att != null)
{
if (att.isNominal())
{
if ((data.attribute(n).numValues() > 0) && (att.numValues() > 0))
{
String label = curr.stringValue(att);
int index = data.attribute(n).indexOfValue(label);
if (index != -1)
inst.setValue(n, index);
}
}
else if (att.isNumeric())
{
inst.setValue(n, curr.value(att));
}
else
{
throw new InvalidOperationException("Unhandled attribute type!");
}
}
}
prediction += cl.classifyInstance(inst);
}
//prediction is always 0 here, my ARFF file has two classes: 0 and 1, 92 zeroes and 159 ones
It's funny, because if I change the classifier to, say, NaiveBayes, the results match the test made via Weka's GUI.
You are using a deprecated way of reading in ARFF files. See this documentation. Try this instead:
import weka.core.converters.ConverterUtils.DataSource;
...
DataSource source = new DataSource("/some/where/data.arff");
Instances data = source.getDataSet();
Note that that documentation also shows how to connect to a database directly, and bypass the creation of temporary ARFF files. You could, additionally, read from the database and manually create instances to populate the Instances object with.
Finally, if simply changing the classifier type at the top of the code to NaiveBayes solved the problem, then check the options in your weka gui for MultilayerPerceptron, to see if they are different from the defaults (different settings can cause the same classifier type to produce different results).
Update: it looks like you're using different test data in your code than in your weka GUI (from a database vs a fold of the original training file); it might also be the case that the particular data in your database actually does look like class 0 to the MLP classifier. To verify whether this is the case, you can use the weka interface to split your training arff into train/test sets, and then repeat the original experiment in your code. If the results are the same as the gui, there's a problem with your data. If the results are different, then we need to look more closely at the code. The function you would call is this (from the Doc):
public Instances trainCV(int numFolds, int numFold)
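When comparing the code's output with the GUI, it may also help to remember that, for a nominal class, `classifyInstance` returns the index of the predicted value within the class attribute's declared values, not the label itself. The confusion matrix in the question lists `a = 1, b = 0`, which suggests the class attribute is declared in that order, so a return value of 0.0 would stand for the label `1`. This does not by itself explain a mismatch with the GUI, but it matters when interpreting a sum of return values. The sketch below is plain Java with a hypothetical `labelFor` helper and a hard-coded label order assumed from that matrix; with Weka itself the lookup would be `data.classAttribute().value((int) prediction)`.

```java
// Hypothetical helper for illustration; not part of Weka.
final class ClassIndexDemo {

    // Maps the double returned by classifyInstance to its nominal label.
    static String labelFor(String[] classValues, double prediction) {
        return classValues[(int) prediction];
    }

    public static void main(String[] args) {
        // Assumption: order taken from the confusion matrix (a = 1, b = 0).
        String[] classValues = {"1", "0"};
        System.out.println("index 0.0 -> label " + labelFor(classValues, 0.0));
        System.out.println("index 1.0 -> label " + labelFor(classValues, 1.0));
    }
}
```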
I had the same problem.
Weka gave me different results in the Explorer compared to a cross-validation in Java.
Something that helped:
Instances dataSet = ...;
dataSet.stratify(numOfFolds); // use this
//before splitting the dataset into train and test set!
What is the quickest way to export a very large string (several megabytes) from a Java app to on-page javascript? At the moment it's taking so long the browser grinds to a halt.
Here is the code I'm using to modify the DOM:
window = JSObject.getWindow(this);
document = (JSObject) window.getMember("document");
for (int i = 0; i < encHexFileUploadStr.length(); i++){
char c = encHexFileUploadStr.charAt(i);
document.eval("document.getElementById('encOutgoingData').value += '"+c+"';");
if (i % 100 == 0) document.eval("console.log("+i+");");
}
Before this I tried to just assign the encHexFileUploadStr variable directly in one go, but that was no better.
Is there any good way to do this that isn't so slow?
Thanks!
I have not attempted to transfer the amount of data you are mentioning from a Java application to a webpage, however, if you are using document.eval the performance problem you are having is most likely due to the processing required on an eval statement. An alternative approach is to directly invoke a JavaScript method which performs the actual data update work. Here is an example of how you would go about using this approach:
Java
JSObject jso = JSObject.getWindow(this);
// invoke JavaScript method updateData with parameter encHexFileUploadStr
jso.call("updateData", new String[] { encHexFileUploadStr });
JavaScript
function updateData(s) {
document.getElementById('encOutgoingData').value = s;
}
I would combine Kris' suggestion with breaking the string into chunks of 1024 bytes or so, appending each chunk to a string in JavaScript, and at the end copying that string to the field, also in JavaScript.
How do I optimize this method for breaking a string in chunks?
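The chunking idea can be sketched on the Java side as follows. This is a minimal example, not a tuned implementation: `StringChunker` is a hypothetical class, the chunk size is counted in characters rather than bytes, and the JavaScript function named in the comment (`appendData`) is an assumed name that would have to exist on the page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
final class StringChunker {

    // Splits s into consecutive pieces of at most `size` characters.
    static List<String> chunks(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> out = new ArrayList<>((s.length() + size - 1) / size);
        for (int start = 0, n = s.length(); start < n; start += size) {
            out.add(s.substring(start, Math.min(n, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String chunk : chunks("0123456789abcdef", 6)) {
            // In the applet, each chunk would be passed to the page here, e.g.
            // jso.call("appendData", new String[] { chunk });  // appendData is an assumed JS function
            System.out.println(chunk);
        }
    }
}
```

Each chunk costs one call into the page, so larger chunks mean fewer calls; the right size is something to measure rather than assume.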
Also in the loop, make it a habit to do
for (int i = 0, n=something.length();i<n; i++){
rather than
for (int i = 0; i < something.length(); i++){