Lag a value with DataVec transform - Java
I'm trying to figure out how to get a lagged value of a field as part of a DataVec transform step.
Here is a little example built off the dl4j examples:
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;
import org.nd4j.linalg.io.ClassPathResource;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class myExample {
    public static void main(String[] args) throws Exception {
        Schema inputDataSchema = new Schema.Builder()
                .addColumnString("DateTimeString")
                .addColumnsString("CustomerID", "MerchantID")
                .addColumnInteger("NumItemsInTransaction")
                .addColumnCategorical("MerchantCountryCode", Arrays.asList("USA", "CAN", "FR", "MX"))
                .addColumnDouble("TransactionAmountUSD", 0.0, null, false, false) // $0.0 or more, no maximum limit, no NaN and no Infinite values
                .addColumnCategorical("FraudLabel", Arrays.asList("Fraud", "Legit"))
                .build();

        TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
                .removeAllColumnsExceptFor("DateTimeString", "TransactionAmountUSD")
                .build();

        File inputFile = new ClassPathResource("BasicDataVecExample/exampledata.csv").getFile();

        // Define the input reader:
        RecordReader rr = new CSVRecordReader(1, ',');
        rr.initialize(new FileSplit(inputFile));

        // Process the data:
        List<List<Writable>> originalData = new ArrayList<>();
        while (rr.hasNext()) {
            originalData.add(rr.next());
        }
        List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);

        int numRows = 5;
        System.out.println("=== BEFORE ===");
        for (int i = 0; i <= numRows; i++) {
            System.out.println(originalData.get(i));
        }
        System.out.println("=== AFTER ===");
        for (int i = 0; i <= numRows; i++) {
            System.out.println(processedData.get(i));
        }
    }
}
I'm looking to get a lagged value (ordered by DateTimeString) of TransactionAmountUSD.
I was looking at sequenceMovingWindowReduce in the docs but couldn't figure it out, and I couldn't find anything in the examples repo that does something similar to this.
Thanks to some help from Alex Black on the dl4j Gitter channel, I can post my own answer.
A tip for anyone new to dl4j: there are lots of good things to look at in the test code too, in addition to the examples and tutorials.
Here is my updated toy example code:
package org.datavec.transform.basic;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.transform.sequence.comparator.NumericalColumnComparator;
import org.datavec.api.transform.transform.sequence.SequenceOffsetTransform;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;
import org.joda.time.DateTimeZone;
import org.nd4j.linalg.io.ClassPathResource;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class myExample {
    public static void main(String[] args) throws Exception {
        Schema inputDataSchema = new Schema.Builder()
                .addColumnString("DateTimeString")
                .addColumnsString("CustomerID", "MerchantID")
                .addColumnInteger("NumItemsInTransaction")
                .addColumnCategorical("MerchantCountryCode", Arrays.asList("USA", "CAN", "FR", "MX"))
                .addColumnDouble("TransactionAmountUSD", 0.0, null, false, false) // $0.0 or more, no maximum limit, no NaN and no Infinite values
                .addColumnCategorical("FraudLabel", Arrays.asList("Fraud", "Legit"))
                .build();

        TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
                .removeAllColumnsExceptFor("CustomerID", "DateTimeString", "TransactionAmountUSD")
                // Parse the date/time string into a time column so it can be sorted on
                .stringToTimeTransform("DateTimeString", "YYYY-MM-DD HH:mm:ss.SSS", DateTimeZone.UTC)
                // Group rows into one sequence per customer, ordered by transaction time
                .convertToSequence(Arrays.asList("CustomerID"), new NumericalColumnComparator("DateTimeString"))
                // Add a new column holding the TransactionAmountUSD value from 1 step earlier
                .offsetSequence(Arrays.asList("TransactionAmountUSD"), 1, SequenceOffsetTransform.OperationType.NewColumn)
                .build();

        File inputFile = new ClassPathResource("BasicDataVecExample/exampledata.csv").getFile();

        // Define the input reader:
        RecordReader rr = new CSVRecordReader(0, ',');
        rr.initialize(new FileSplit(inputFile));

        // Process the data:
        List<List<Writable>> originalData = new ArrayList<>();
        while (rr.hasNext()) {
            originalData.add(rr.next());
        }
        List<List<List<Writable>>> processedData = LocalTransformExecutor.executeToSequence(originalData, tp);

        System.out.println("=== BEFORE ===");
        for (int i = 0; i < originalData.size(); i++) {
            System.out.println(originalData.get(i));
        }
        System.out.println("=== AFTER ===");
        for (int i = 0; i < processedData.size(); i++) {
            System.out.println(processedData.get(i));
        }
    }
}
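If you want to check the column names the transform produces (including the new offset column), TransformProcess exposes the output schema via getFinalSchema(). A minimal addition to the example above, using the tp variable built in the code above (the exact generated name of the offset column may differ, so check the printed schema):

        // Print the schema after all transform steps have been applied,
        // to see the new column added by offsetSequence:
        System.out.println(tp.getFinalSchema());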
This should give output like below, where you can see a new column added with the previous transaction amount for each customer ID. Note the two empty sequences at the end: those are the customers with only a single transaction, so with an offset of 1 there is no earlier value, and the whole length-1 sequence gets trimmed away.
"C:\Program Files\Java\jdk1.8.0_201\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\lib\idea_rt.jar=56103:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_201\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_201\jre\lib\rt.jar;C:\Users\amaguire\Documents\java_learning\dl4j-examples\datavec-examples\target\classes;C:\Users\amaguire\.m2\repository\org\datavec\datavec-api\1.0.0-beta3\datavec-api-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\jetbrains\annotations\13.0\annotations-13.0.jar;C:\Users\amaguire\.m2\repository\org\apache\commons\commons-lang3\3.6\commons-lang3-3.6.jar;C:\Users\amaguire\.m2\repository\commons-io\commons-io\2.5\commons-io-2.5.jar;C:\Users\amaguire\.m2\repository\commons-codec\commons-codec\1.10\commons-codec-1.10.jar;C:\Users\amaguire\.m2\repository\org\slf4j\slf4j-api\1.7.21\slf4j-api-1.7.21.jar;C:\Users\amaguire\.m2\repository\joda-time\joda-time\2.2\joda-time-2.2.jar;C:\Users\amaguire\.m2\repository\org\yaml\snakeyaml\1.12\snakeyaml-1.12.jar;C:\Users\amaguire\.m2\repository\org\nd4j\jackson\1.0.0-beta3\jackson-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\codehaus\woodstox\stax2-api\3.1.4\stax2-api-3.1.4.jar;C:\Users\amaguire\.m2\repository\org\freemarker\freemarker\2.3.23\freemarker-2.3.23.jar;C:\Users\amaguire\.m2\repository\org\nd4j\nd4j-common\1.0.0-beta3\nd4j-common-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\nd4j\nd4j-api\1.0.0-beta3\nd4j-api-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\com\google\flatbuffers\flatbuffers-java\1.9.0\flatbuffers-java-1.9.0.jar;C:\Users\amaguire\.m2\repository\com\github\os72\protobuf-java-shaded-351\0.9\protobuf-java-shaded-351-0.9.jar;C:\Users\amaguire\.m2\repository\com\github\os72\protobuf-java-util-shaded-351\0.9\protobuf-java-util-shaded-351-0.9.jar;C:\Users\amaguire\.m2\repository\com\google\code\gson\gson\2.7\gson-2.7.jar;C:\Users\amaguire\.m2\repository\org\objenesis\objenesis\2.6\objenesis-2.6.jar;C:\Users\amaguire\.m2\repository\uk\com\robust-it\cloning\1.9.3\cloning-1.9.3.jar;C:\Users\amaguire\.m2\repository\org\nd4j\nd4j-buffer\1.0.0-beta3\nd4j-buffer-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\bytedeco\javacpp\1.4.3\javacpp-1.4.3.jar;C:\Users\amaguire\.m2\r
epository\org\nd4j\nd4j-context\1.0.0-beta3\nd4j-context-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\net\ericaro\neoitertools\1.0.0\neoitertools-1.0.0.jar;C:\Users\amaguire\.m2\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;C:\Users\amaguire\.m2\repository\net\sf\opencsv\opencsv\2.3\opencsv-2.3.jar;C:\Users\amaguire\.m2\repository\com\tdunning\t-digest\3.2\t-digest-3.2.jar;C:\Users\amaguire\.m2\repository\it\unimi\dsi\fastutil\6.5.7\fastutil-6.5.7.jar;C:\Users\amaguire\.m2\repository\org\datavec\datavec-spark_2.11\1.0.0-beta3_spark_1\datavec-spark_2.11-1.0.0-beta3_spark_1.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\scala-library\2.11.12\scala-library-2.11.12.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\scala-reflect\2.11.12\scala-reflect-2.11.12.jar;C:\Users\amaguire\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;C:\Users\amaguire\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-sql_2.11\1.6.3\spark-sql_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-core_2.11\1.6.3\spark-core_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\apache\avro\avro-mapred\1.7.7\avro-mapred-1.7.7-hadoop2.jar;C:\Users\amaguire\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7.jar;C:\Users\amaguire\.m2\repository\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;C:\Users\amaguire\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7-tests.jar;C:\Users\amaguire\.m2\repository\com\twitter\chill_2.11\0.5.0\chill_2.11-0.5.0.jar;C:\Users\amaguire\.m2\repository\com\esotericsoftware\kryo\kryo\2.21\kryo-2.21.jar;C:\Users\amaguire\.m2\repository\com\esotericsoftware\reflectasm\reflectasm\1.07\reflectasm-1.07-shaded.jar;C:\Users\amaguire\.m2\repository\com\esotericsoftware\minlog\minlog\1.2\minlog-1.2.jar;C:\Users\amaguire\.m2\repository\com\twitter\chill-java\0.5.0\chill-java-0.5.0.jar;C:\Users\amaguire\.m2\repository\org\apache\xbean\xbean-asm5-shaded\4.4\xbean-asm5-shaded-4.4.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-client\2.2.0\hadoop-client-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-common\2.2.0\hadoop-common-2.2.0.jar;C:\Users\amaguire\.m2\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Users\amaguire\.m2\repository\org\apache\commons\commons-math\2.1\commons-math-2.1.jar;C:\Users\amaguire\.m2\repository\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;C:\Users\amaguire\.m2\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;C:\Users\amaguire\.m2\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;C:\Users\amaguire\.m2\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;C:\Users\amaguire\.m2\repository\commons-beanutils\commons-beanutils-core\1.8.0\commons-beanutils-core-1.8.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-auth\2.2.0\hadoop-auth-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-hdfs\2.2.0\hadoop-hdfs-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.2.0\hadoop-mapreduce-client-app-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.2.0\hadoop-mapreduce-client-common-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-yarn-client\2.2.0\hadoop
-yarn-client-2.2.0.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-test-framework\jersey-test-framework-grizzly2\1.9\jersey-test-framework-grizzly2-1.9.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-test-framework\jersey-test-framework-core\1.9\jersey-test-framework-core-1.9.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-client\1.9\jersey-client-1.9.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-grizzly2\1.9\jersey-grizzly2-1.9.jar;C:\Users\amaguire\.m2\repository\org\glassfish\grizzly\grizzly-http\2.1.2\grizzly-http-2.1.2.jar;C:\Users\amaguire\.m2\repository\org\glassfish\grizzly\grizzly-framework\2.1.2\grizzly-framework-2.1.2.jar;C:\Users\amaguire\.m2\repository\org\glassfish\gmbal\gmbal-api-only\3.0.0-b023\gmbal-api-only-3.0.0-b023.jar;C:\Users\amaguire\.m2\repository\org\glassfish\external\management-api\3.0.0-b012\management-api-3.0.0-b012.jar;C:\Users\amaguire\.m2\repository\org\glassfish\grizzly\grizzly-http-server\2.1.2\grizzly-http-server-2.1.2.jar;C:\Users\amaguire\.m2\repository\org\glassfish\grizzly\grizzly-rcm\2.1.2\grizzly-rcm-2.1.2.jar;C:\Users\amaguire\.m2\repository\org\glassfish\grizzly\grizzly-http-servlet\2.1.2\grizzly-http-servlet-2.1.2.jar;C:\Users\amaguire\.m2\repository\org\glassfish\javax.servlet\3.1\javax.servlet-3.1.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-json\1.9\jersey-json-1.9.jar;C:\Users\amaguire\.m2\repository\org\codehaus\jettison\jettison\1.1\jettison-1.1.jar;C:\Users\amaguire\.m2\repository\stax\stax-api\1.0.1\stax-api-1.0.1.jar;C:\Users\amaguire\.m2\repository\org\codehaus\jackson\jackson-jaxrs\1.8.3\jackson-jaxrs-1.8.3.jar;C:\Users\amaguire\.m2\repository\org\codehaus\jackson\jackson-xc\1.8.3\jackson-xc-1.8.3.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\contribs\jersey-guice\1.9\jersey-guice-1.9.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-yarn-server-common\2.2.0\hadoop-yarn-server-common-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.2.0\hadoop-mapreduce-client-shuffle-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-yarn-api\2.2.0\hadoop-yarn-api-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.2.0\hadoop-mapreduce-client-core-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-yarn-common\2.2.0\hadoop-yarn-common-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.2.0\hadoop-mapreduce-client-jobclient-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\hadoop\hadoop-annotations\2.2.0\hadoop-annotations-2.2.0.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-launcher_2.11\1.6.3\spark-launcher_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-network-common_2.11\1.6.3\spark-network-common_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-network-shuffle_2.11\1.6.3\spark-network-shuffle_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-unsafe_2.11\1.6.3\spark-unsafe_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\net\java\dev\jets3t\jets3t\0.7.1\jets3t-0.7.1.jar;C:\Users\amaguire\.m2\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;C:\Users\amaguire\.m2\repository\org\eclipse\jetty\orbit\javax.servlet\3.0.0.v201112011016\javax.servlet-3.0.0.v201112011016.jar;C:\Users\amaguire\.m2\repository\co
m\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;C:\Users\amaguire\.m2\repository\org\slf4j\jul-to-slf4j\1.7.10\jul-to-slf4j-1.7.10.jar;C:\Users\amaguire\.m2\repository\org\slf4j\jcl-over-slf4j\1.7.10\jcl-over-slf4j-1.7.10.jar;C:\Users\amaguire\.m2\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\Users\amaguire\.m2\repository\org\slf4j\slf4j-log4j12\1.7.10\slf4j-log4j12-1.7.10.jar;C:\Users\amaguire\.m2\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;C:\Users\amaguire\.m2\repository\org\xerial\snappy\snappy-java\1.1.2.6\snappy-java-1.1.2.6.jar;C:\Users\amaguire\.m2\repository\net\jpountz\lz4\lz4\1.3.0\lz4-1.3.0.jar;C:\Users\amaguire\.m2\repository\org\roaringbitmap\RoaringBitmap\0.5.11\RoaringBitmap-0.5.11.jar;C:\Users\amaguire\.m2\repository\org\json4s\json4s-jackson_2.11\3.2.10\json4s-jackson_2.11-3.2.10.jar;C:\Users\amaguire\.m2\repository\org\json4s\json4s-core_2.11\3.2.10\json4s-core_2.11-3.2.10.jar;C:\Users\amaguire\.m2\repository\org\json4s\json4s-ast_2.11\3.2.10\json4s-ast_2.11-3.2.10.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\scalap\2.11.0\scalap-2.11.0.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\scala-compiler\2.11.0\scala-compiler-2.11.0.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\modules\scala-xml_2.11\1.0.1\scala-xml_2.11-1.0.1.jar;C:\Users\amaguire\.m2\repository\org\scala-lang\modules\scala-parser-combinators_2.11\1.0.1\scala-parser-combinators_2.11-1.0.1.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-server\1.9\jersey-server-1.9.jar;C:\Users\amaguire\.m2\repository\asm\asm\3.1\asm-3.1.jar;C:\Users\amaguire\.m2\repository\com\sun\jersey\jersey-core\1.9\jersey-core-1.9.jar;C:\Users\amaguire\.m2\repository\org\apache\mesos\mesos\0.21.1\mesos-0.21.1-shaded-protobuf.jar;C:\Users\amaguire\.m2\repository\io\netty\netty-all\4.0.29.Final\netty-all-4.0.29.Final.jar;C:\Users\amaguire\.m2\repository\io\dropwizard\metrics\metrics-core\3.1.2\metrics-core-3.1.2.jar;C:\Users\amaguire\.m2\repository\io\dropwizard\metrics\metrics-jvm\3.1.2\metrics-jvm-3.1.2.jar;C:\Users\amaguire\.m2\repository\io\dropwizard\metrics\metrics-json\3.1.2\metrics-json-3.1.2.jar;C:\Users\amaguire\.m2\repository\io\dropwizard\metrics\metrics-graphite\3.1.2\metrics-graphite-3.1.2.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\module\jackson-module-scala_2.11\2.5.1\jackson-module-scala_2.11-2.5.1.jar;C:\Users\amaguire\.m2\repository\com\thoughtworks\paranamer\paranamer\2.6\paranamer-2.6.jar;C:\Users\amaguire\.m2\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;C:\Users\amaguire\.m2\repository\oro\oro\2.0.8\oro-2.0.8.jar;C:\Users\amaguire\.m2\repository\org\tachyonproject\tachyon-client\0.8.2\tachyon-client-0.8.2.jar;C:\Users\amaguire\.m2\repository\org\tachyonproject\tachyon-underfs-hdfs\0.8.2\tachyon-underfs-hdfs-0.8.2.jar;C:\Users\amaguire\.m2\repository\org\tachyonproject\tachyon-underfs-s3\0.8.2\tachyon-underfs-s3-0.8.2.jar;C:\Users\amaguire\.m2\repository\org\tachyonproject\tachyon-underfs-local\0.8.2\tachyon-underfs-local-0.8.2.jar;C:\Users\amaguire\.m2\repository\net\razorvine\pyrolite\4.9\pyrolite-4.9.jar;C:\Users\amaguire\.m2\repository\net\sf\py4j\py4j\0.9\py4j-0.9.jar;C:\Users\amaguire\.m2\repository\org\apache\spark\spark-catalyst_2.11\1.6.3\spark-catalyst_2.11-1.6.3.jar;C:\Users\amaguire\.m2\repository\org\codehaus\janino\janino\2.7.8\janino-2.7.8.jar;C:\Users\amaguire\.m2\repository\org\codehaus\janino\commons-compiler\2.7.8\commons-compiler-2.7.8.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-column\1.7.0\parquet-column
-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-common\1.7.0\parquet-common-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-encoding\1.7.0\parquet-encoding-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-generator\1.7.0\parquet-generator-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-hadoop\1.7.0\parquet-hadoop-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-format\2.3.0-incubating\parquet-format-2.3.0-incubating.jar;C:\Users\amaguire\.m2\repository\org\apache\parquet\parquet-jackson\1.7.0\parquet-jackson-1.7.0.jar;C:\Users\amaguire\.m2\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar;C:\Users\amaguire\.m2\repository\com\google\guava\guava\20.0\guava-20.0.jar;C:\Users\amaguire\.m2\repository\com\google\inject\guice\4.0\guice-4.0.jar;C:\Users\amaguire\.m2\repository\javax\inject\javax.inject\1\javax.inject-1.jar;C:\Users\amaguire\.m2\repository\aopalliance\aopalliance\1.0\aopalliance-1.0.jar;C:\Users\amaguire\.m2\repository\com\google\protobuf\protobuf-java\2.6.1\protobuf-java-2.6.1.jar;C:\Users\amaguire\.m2\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;C:\Users\amaguire\.m2\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;C:\Users\amaguire\.m2\repository\commons-net\commons-net\3.1\commons-net-3.1.jar;C:\Users\amaguire\.m2\repository\com\sun\xml\bind\jaxb-core\2.2.11\jaxb-core-2.2.11.jar;C:\Users\amaguire\.m2\repository\com\sun\xml\bind\jaxb-impl\2.2.11\jaxb-impl-2.2.11.jar;C:\Users\amaguire\.m2\repository\com\typesafe\akka\akka-actor_2.11\2.3.16\akka-actor_2.11-2.3.16.jar;C:\Users\amaguire\.m2\repository\com\typesafe\akka\akka-remote_2.11\2.3.16\akka-remote_2.11-2.3.16.jar;C:\Users\amaguire\.m2\repository\org\uncommons\maths\uncommons-maths\1.2.2a\uncommons-maths-1.2.2a.jar;C:\Users\amaguire\.m2\repository\com\typesafe\akka\akka-slf4j_2.11\2.3.16\akka-slf4j_2.11-2.3.16.jar;C:\Users\amaguire\.m2\repository\io\netty\netty\3.10.4.Final\netty-3.10.4.Final.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.5.1\jackson-core-2.5.1.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.5.1\jackson-databind-2.5.1.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.5.1\jackson-annotations-2.5.1.jar;C:\Users\amaguire\.m2\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;C:\Users\amaguire\.m2\repository\org\apache\commons\commons-compress\1.16.1\commons-compress-1.16.1.jar;C:\Users\amaguire\.m2\repository\org\apache\commons\commons-math3\3.5\commons-math3-3.5.jar;C:\Users\amaguire\.m2\repository\org\apache\curator\curator-recipes\2.8.0\curator-recipes-2.8.0.jar;C:\Users\amaguire\.m2\repository\org\apache\curator\curator-framework\2.8.0\curator-framework-2.8.0.jar;C:\Users\amaguire\.m2\repository\org\apache\curator\curator-client\2.8.0\curator-client-2.8.0.jar;C:\Users\amaguire\.m2\repository\org\apache\zookeeper\zookeeper\3.4.6\zookeeper-3.4.6.jar;C:\Users\amaguire\.m2\repository\jline\jline\0.9.94\jline-0.9.94.jar;C:\Users\amaguire\.m2\repository\com\typesafe\config\1.3.0\config-1.3.0.jar;C:\Users\amaguire\.m2\repository\org\datavec\datavec-hadoop\1.0.0-beta3\datavec-hadoop-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\datavec\datavec-local\1.0.0-beta3\datavec-local-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\com\codepoetics\protonpack\1.15\protonpack-1.15.jar;C:\Users\amaguire\.m2\repository\or
g\datavec\datavec-arrow\1.0.0-beta3\datavec-arrow-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\org\nd4j\nd4j-arrow\1.0.0-beta3\nd4j-arrow-1.0.0-beta3.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\dataformat\jackson-dataformat-yaml\2.6.5\jackson-dataformat-yaml-2.6.5.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\dataformat\jackson-dataformat-xml\2.6.5\jackson-dataformat-xml-2.6.5.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\module\jackson-module-jaxb-annotations\2.6.5\jackson-module-jaxb-annotations-2.6.5.jar;C:\Users\amaguire\.m2\repository\com\fasterxml\jackson\datatype\jackson-datatype-joda\2.6.5\jackson-datatype-joda-2.6.5.jar;C:\Users\amaguire\.m2\repository\com\carrotsearch\hppc\0.8.1\hppc-0.8.1.jar;C:\Users\amaguire\.m2\repository\org\apache\arrow\arrow-vector\0.11.0\arrow-vector-0.11.0.jar;C:\Users\amaguire\.m2\repository\io\netty\netty-buffer\4.1.22.Final\netty-buffer-4.1.22.Final.jar;C:\Users\amaguire\.m2\repository\io\netty\netty-common\4.1.22.Final\netty-common-4.1.22.Final.jar;C:\Users\amaguire\.m2\repository\org\apache\arrow\arrow-memory\0.11.0\arrow-memory-0.11.0.jar;C:\Users\amaguire\.m2\repository\org\apache\arrow\arrow-format\0.11.0\arrow-format-0.11.0.jar" org.datavec.transform.basic.myExample
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
=== BEFORE ===
[2016-01-01 17:00:00.000, 830a7u3, u323fy8902, 1, USA, 100.00, Legit]
[2016-01-01 18:03:01.256, 830a7u3, 9732498oeu, 3, FR, 73.20, Legit]
[2016-01-03 02:53:32.231, 78ueoau32, w234e989, 1, USA, 1621.00, Fraud]
[2016-01-03 09:30:16.832, t842uocd, 9732498oeu, 4, USA, 43.19, Legit]
[2016-01-04 23:01:52.920, t842uocd, cza8873bm, 10, MX, 159.65, Legit]
[2016-01-05 02:28:10.648, t842uocd, fgcq9803, 6, CAN, 26.33, Fraud]
[2016-01-05 10:15:36.483, rgc707ke3, tn342v7, 2, USA, -0.90, Legit]
=== AFTER ===
[[1451948512920, t842uocd, 159.65, 43.19], [1451960890648, t842uocd, 26.33, 159.65]]
[[1451671381256, 830a7u3, 73.20, 100.00]]
[]
[]
Process finished with exit code 0
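If you need flat rows again downstream (for example, to write the result back out to CSV), one simple approach is to loop over the sequences and emit each time step as a row. A minimal sketch, continuing from the processedData variable in the example above:

        for (List<List<Writable>> sequence : processedData) {
            for (List<Writable> timeStep : sequence) {
                // Each time step is one row: [transaction time, CustomerID,
                // TransactionAmountUSD, previous TransactionAmountUSD]
                System.out.println(timeStep);
            }
        }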