How can I make the fixedRate parameter of the @Scheduled annotation dynamic? - java

I have a scheduled job and I want to set the fixedRate dynamically, but I couldn't work out how to do it.
fixedRate takes its value in milliseconds, but I want to give the time in hours. I also tried reading the parameter from a property file and multiplying it, but it did not work. How can I do this?
package com.ipera.communicationsuite.scheduleds;
import com.ipera.communicationsuite.models.FreeDbSize;
import com.ipera.communicationsuite.repositories.interfaces.IFreeDbSizeRepository;
import com.ipera.communicationsuite.repositories.interfaces.settings.IPropertiesRepository;
import com.ipera.communicationsuite.utilities.mail.SMTPConnection;
import lombok.AllArgsConstructor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.PropertySource;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
@Component
@AllArgsConstructor
@PropertySource("classpath:scheduled.properties")
public class KeepAlive {
private static Logger logger = LoggerFactory.getLogger(KeepAlive.class);
private IFreeDbSizeRepository freeDbSizeRepository;
private SMTPConnection smtpConnection;
private IPropertiesRepository propertiesRepository;
@Scheduled(fixedRateString = "${keepAlive.timer}")
public void keepAliveMailSender() {
StringBuilder content = new StringBuilder();
ArrayList<File> files = getDrivers();
List<FreeDbSize> list = freeDbSizeRepository.getFreeDbSize();
FreeDbSize dbDiskInfo = freeDbSizeRepository.dbDiskSize();
content.append("DB file size is: ").append(list.get(0).getType().equals("mdf") ? list.get(0).getFileSize() : list.get(1).getFileSize()).append(" MB\n")
.append("DB log size is: ").append(list.get(0).getType().equals("ldf") ? list.get(0).getFileSize() : list.get(1).getFileSize()).append(" MB\n");
propertiesRepository.updateByKey("DatabaseSize", list.get(0).getType().equals("mdf") ? list.get(0).getFileSize().toString() : list.get(1).getFileSize().toString());
propertiesRepository.updateByKey("DatabaseLogSize", list.get(0).getType().equals("ldf") ? list.get(0).getFileSize().toString() : list.get(1).getFileSize().toString());
propertiesRepository.updateByKey("FreeDiskSpaceForDb", dbDiskInfo.getFreeSpace().toString());
for (int i = 0; i < files.size(); i++) {
content.append("Free size for driver ").append(files.get(i)).append(" is ").append(files.get(i).getFreeSpace() / (1024 * 1024)).append(" MB\n");
propertiesRepository.createIfNotExistOrUpdate(("FreeSpaceInDisk".concat(Character.toString(files.get(i).toString().charAt(0)))), Long.toString(files.get(i).getFreeSpace() / (1024 * 1024)));
}
if (dbDiskInfo.getName().equals("-1")) {
content.append("This application has not permission to run query for calculate free size of disk.");
} else {
content.append("Free size of disk which contains Db is: ").append(dbDiskInfo.getFreeSpace());
}
smtpConnection.sendMail(content.toString(), "Server Is Up!!!", "fkalabalikoglu@iperasolutions.com", "", "", "", "");
logger.info("KeepAlive has run.");
}
public ArrayList<File> getDrivers() {
ArrayList<File> list = new ArrayList<>();
File[] drives = File.listRoots();
if (drives != null && drives.length > 0) {
for (File aDrive : drives) {
list.add(aDrive);
}
}
return list;
}
}
And also my property file is here:
keepAlive.timer=86400000

You could use SpEL in your annotation like:
@Scheduled(fixedRateString = "#{new Long('${keepAlive.timer}') * 1000 * 3600}")
to have the expression evaluated. keepAlive.timer would then be the number of hours.
But in my opinion that would be an ugly solution. I would rather keep the value in the properties file as you have it now and just add a comment like:
# 24 hours is: 1000 * 3600 * 24
keepAlive.timer=86400000
Another way to work in hours would be to use the cron attribute, which gives you more flexibility but might need some study before using:
In your code:
@Scheduled(cron = "${keepAlive.timer}")
and the cron expression in your properties - for example - like:
keepAlive.timer="*/60 00 21 * * ?"
This would run every day at 21:00.
Note the "*/60": it should also accept "0" here, but in my case it did not.
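Depending on your Spring version there may be a third option: newer Spring Framework releases (5.3 and later, if I remember correctly) let fixedRateString parse an ISO-8601 duration directly, so the property itself can be written in hours. A hedged sketch, assuming such a version:
# scheduled.properties - 24 hours expressed as an ISO-8601 duration
keepAlive.timer=PT24H
@Scheduled(fixedRateString = "${keepAlive.timer}")
public void keepAliveMailSender() {
// same body as before
}
Verify that your Spring version actually accepts duration strings here before relying on this.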

Related

Powershell: Write-Host is very slow

I have a Java application which has a functionality to take a screenshot. It does so by running a PowerShell script:
Add-Type -AssemblyName System.Windows.Forms,System.Drawing
$screens = [Windows.Forms.Screen]::AllScreens
$top = ($screens.Bounds.Top | Measure-Object -Minimum).Minimum
$left = ($screens.Bounds.Left | Measure-Object -Minimum).Minimum
$width = ($screens.Bounds.Right | Measure-Object -Maximum).Maximum
$height = ($screens.Bounds.Bottom | Measure-Object -Maximum).Maximum
$bounds = [Drawing.Rectangle]::FromLTRB($left, $top, $width, $height)
$bmp = New-Object System.Drawing.Bitmap ([int]$bounds.width), ([int]$bounds.height)
$graphics = [Drawing.Graphics]::FromImage($bmp)
$graphics.CopyFromScreen($bounds.Location, [Drawing.Point]::Empty, $bounds.size)
$memStream = New-Object System.IO.MemoryStream
$bmp.Save($memStream, [Drawing.Imaging.ImageFormat]::Jpeg)
Write-Host $memStream.ToArray()
$graphics.Dispose()
$bmp.Dispose()
$memStream.Dispose()
The Java application listens to its output and does some operations on it. The problem is that Write-Host $memStream.ToArray() sometimes takes too much time (sometimes 2 minutes, sometimes 3, or even 5). I'm not familiar with PowerShell. Is there any analog of Write-Host which is faster? Or maybe I can take the screenshot faster using some other functionality? Thanks
You stated a solution using other functionality would be acceptable, so why not perform the screen capture directly with the Java application instead? Java is fully capable of this natively:
import java.awt.Robot;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.AWTException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
// Wrapper class (name is just illustrative) so the snippet compiles and runs standalone
public class ScreenCapture {
public static void main(String[] args) throws AWTException, IOException {
// Set up Robot and other vars
Robot robot = new Robot();
String imgFormat = "jpg";
BufferedImage screenBuffer;
Rectangle screenBounds;
// Enumerate all screens
GraphicsEnvironment graphEnv = GraphicsEnvironment.getLocalGraphicsEnvironment();
GraphicsDevice[] screens = graphEnv.getScreenDevices();
// Variables only used for generating filename
String fnameFormat = "%s-%s-screencap.%s";
String dtNowString = new SimpleDateFormat("yyyyMMddHHmmss").format(Calendar.getInstance().getTime());
String filename = String.format(fnameFormat, dtNowString, "all", imgFormat);
Rectangle allScreenBounds = new Rectangle();
int num = 0;
for(GraphicsDevice screen : screens) {
screenBounds = screen.getDefaultConfiguration().getBounds();
allScreenBounds.x = Math.min(allScreenBounds.x, screenBounds.x);
allScreenBounds.y = Math.min(allScreenBounds.y, screenBounds.y);
// Make sure we only add extra pixels to the total width and height, subtracting overlapping dimensions
// Does not take into account non-continuous display area, normally impossible on Windows
allScreenBounds.width += Math.abs(allScreenBounds.width - (screenBounds.width + screenBounds.x));
allScreenBounds.height += Math.abs(allScreenBounds.height - (screenBounds.height + screenBounds.y));
System.out.println(String.format("Display %d: X=%d, Y=%d, Height=%d, Width=%d", num++, screenBounds.x, screenBounds.y, screenBounds.height, screenBounds.width));
}
System.out.println(String.format("Screen Area: X=%d, Y=%d, Height=%d, Width=%d", allScreenBounds.x, allScreenBounds.y, allScreenBounds.height, allScreenBounds.width));
screenBuffer = robot.createScreenCapture(allScreenBounds);
// Save the screencap to file
ImageIO.write(screenBuffer, imgFormat, new File(filename));
}
}
There is file-writing code in there for testing, but if this is performed by your application you can remove the filename variables, the javax.imageio.ImageIO import, and the ImageIO.write call, as you'll have the screenshot data in screenBuffer instead.
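If you still need the raw JPEG bytes in memory (what the PowerShell script produced via $memStream.ToArray()), a minimal sketch of encoding the capture without touching disk might look like this; the class and method names are just illustrative:
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
public class InMemoryJpeg {
// Encodes the captured image as JPEG and returns the bytes instead of writing a file
static byte[] toJpegBytes(BufferedImage screenBuffer) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
ImageIO.write(screenBuffer, "jpg", out);
return out.toByteArray();
}
}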

Java Apache Spark flatMaps & Data Wrangling

I have to pivot the data in a file and then store it in another file. I am having some difficulty pivoting the data.
I have multiple files that contain data which looks somewhat like what I show below. The columns are of variable length. I am trying to merge the files first, but for some reason the output is not correct. I haven't tried the pivot method yet, and am not sure how to use it either.
How can this be achieved?
File 1:
0,26,27,30,120
201008,100,1000,10,400
201009,200,2000,20,500
201010,300,3000,30,600
File 2:
0,26,27,30,120,145
201008,100,1000,10,400,200
201009,200,2000,20,500,100
201010,300,3000,30,600,150
File 3:
0,26,27,120,145
201008,100,10,400,200
201009,200,20,500,100
201010,300,30,600,150
Output:
201008,26,100
201008,27,1000
201008,30,10
201008,120,400
201008,145,200
201009,26,200
201009,27,2000
201009,30,20
201009,120,500
201009,145,100
.....
I am not very familiar with Spark, but am trying to use flatMap and flatMapValues. I am not sure how to use them yet, but would appreciate some guidance.
import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.SparkSession;
import lombok.extern.slf4j.Slf4j;
@Slf4j
public class ExecutionTest {
public static void main(String[] args) {
Logger.getLogger("org.apache").setLevel(Level.WARN);
Logger.getLogger("org.spark_project").setLevel(Level.WARN);
Logger.getLogger("io.netty").setLevel(Level.WARN);
log.info("Starting...");
// Step 1: Create a SparkContext.
boolean isRunLocally = Boolean.valueOf(args[0]);
String filePath = args[1];
SparkConf conf = new SparkConf().setAppName("Variable File").set("serializer",
"org.apache.spark.serializer.KryoSerializer");
if (isRunLocally) {
log.info("System is running in local mode");
conf.setMaster("local[*]").set("spark.executor.memory", "2g");
}
SparkSession session = SparkSession.builder().config(conf).getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(session.sparkContext());
jsc.textFile(filePath, 2)
.map(new Function<String, String[]>() {
private static final long serialVersionUID = 1L;
@Override
public String[] call(String v1) throws Exception {
return StringUtils.split(v1, ",");
}
})
.foreach(new VoidFunction<String[]>() {
private static final long serialVersionUID = 1L;
@Override
public void call(String[] t) throws Exception {
for (String string : t) {
log.info(string);
}
}
});
}
}
A solution in Scala, as I am not a Java person; you should be able to adapt it, and add sorting, caching, etc.
The data is as follows, 3 files with a duplicate entry evident; get rid of that if you do not want it.
0, 5, 10, 15, 20
202008, 5,10, 15, 20
202009,10,20,100,200
8 rows generated above.
0,888,999
202008, 5, 10
202009, 10, 20
4 rows generated above.
0, 5
202009,10
1 row, which is a duplicate.
// Bit lazy with column names, but anyway.
import org.apache.spark.sql.functions.input_file_name
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._
val inputPath: String = "/FileStore/tables/g*.txt"
val rdd = spark.read.text(inputPath)
.select(input_file_name, $"value")
.as[(String, String)]
.rdd
val rdd2 = rdd.zipWithIndex
val rdd3 = rdd2.map(x => (x._1._1, x._2, x._1._2.split(",").toList.map(_.toInt)))
val rdd4 = rdd3.map { case (pfx, pfx2, list) => (pfx,pfx2,list.zipWithIndex) }
val df = rdd4.toDF()
df.show(false)
df.printSchema()
val df2 = df.withColumn("rankF", row_number().over(Window.partitionBy($"_1").orderBy($"_2".asc)))
df2.show(false)
df2.printSchema()
val df3 = df2.withColumn("elements", explode($"_3"))
df3.show(false)
df3.printSchema()
val df4 = df3.select($"_1", $"rankF", $"elements".getField("_1"), $"elements".getField("_2")).toDF("fn", "line_num", "val", "col_pos")
df4.show(false)
df4.printSchema()
df4.createOrReplaceTempView("df4temp")
val df51 = spark.sql("""SELECT hdr.fn, hdr.line_num, hdr.val AS pfx, hdr.col_pos
FROM df4temp hdr
WHERE hdr.line_num <> 1
AND hdr.col_pos = 0
""")
df51.show(100,false)
val df52 = spark.sql("""SELECT t1.fn, t1.val AS val1, t1.col_pos, t2.line_num, t2.val AS val2
FROM df4temp t1, df4temp t2
WHERE t1.col_pos <> 0
AND t1.col_pos = t2.col_pos
AND t1.line_num <> t2.line_num
AND t1.line_num = 1
AND t1.fn = t2.fn
""")
df52.show(100,false)
df51.createOrReplaceTempView("df51temp")
df52.createOrReplaceTempView("df52temp")
val df53 = spark.sql("""SELECT DISTINCT t1.pfx, t2.val1, t2.val2
FROM df51temp t1, df52temp t2
WHERE t1.fn = t2.fn
AND t1.line_num = t2.line_num
""")
df53.show(false)
returns:
+------+----+----+
|pfx |val1|val2|
+------+----+----+
|202008|888 |5 |
|202009|999 |20 |
|202009|20 |200 |
|202008|5 |5 |
|202008|10 |10 |
|202009|888 |10 |
|202008|15 |15 |
|202009|5 |10 |
|202009|10 |20 |
|202009|15 |100 |
|202008|20 |20 |
|202008|999 |10 |
+------+----+----+
What we see is data wrangling: massaging the data into temp views and JOINing them appropriately with SQL.
The key here is to know how to massage the data to make things easy. Note there is no groupBy etc. Per file, with rows of varying length, JOINing was not attempted at the RDD level; it is too inflexible. The rank shows the line number, so you know which line is the "0" header row.
This is what we call data wrangling. It is also what we call hard work for a few points on SO. This is one of my best efforts, and also one of the last of such efforts.
A weakness of the solution is that a lot of work goes into getting the first record of a file; there are alternatives (https://www.cyberciti.biz/faq/unix-linux-display-first-line-of-file/), and preprocessing is what I would realistically consider.
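For completeness, a hedged Java sketch of the same idea without Spark SQL: read each file, take its first line as the list of column ids, and flatMap every other line into (month, columnId, value) triples. The file paths are hypothetical, and it assumes the header row is the one starting with "0,":
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
public class PivotSketch {
public static void main(String[] args) {
SparkSession session = SparkSession.builder().appName("Pivot Sketch").master("local[*]").getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(session.sparkContext());
List<String> rows = new ArrayList<>();
for (String path : Arrays.asList("file1.csv", "file2.csv", "file3.csv")) { // hypothetical paths
JavaRDD<String> lines = jsc.textFile(path);
String[] header = lines.first().split(","); // e.g. "0,26,27,30,120"
rows.addAll(lines
.filter(line -> !line.startsWith("0,")) // drop the header row
.flatMap(line -> {
String[] cols = line.split(",");
List<String> out = new ArrayList<>();
for (int i = 1; i < cols.length; i++) {
out.add(cols[0] + "," + header[i] + "," + cols[i]); // month, column id, value
}
return out.iterator();
})
.collect());
}
rows.forEach(System.out::println); // e.g. 201008,26,100
jsc.close();
}
}
Deduplication or sorting could then be added on the collected rows, or the per-file RDDs could be unioned and distinct() applied before collecting.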

Lag a value with Datavec transform

I'm trying to figure out how to get a lagged value of a field as part of a datavec transform step.
Here is a little example built off the dl4j examples:
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;
import org.nd4j.linalg.io.ClassPathResource;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class myExample {
public static void main(String[] args) throws Exception {
Schema inputDataSchema = new Schema.Builder()
.addColumnString("DateTimeString")
.addColumnsString("CustomerID", "MerchantID")
.addColumnInteger("NumItemsInTransaction")
.addColumnCategorical("MerchantCountryCode", Arrays.asList("USA","CAN","FR","MX"))
.addColumnDouble("TransactionAmountUSD",0.0,null,false,false) //$0.0 or more, no maximum limit, no NaN and no Infinite values
.addColumnCategorical("FraudLabel", Arrays.asList("Fraud","Legit"))
.build();
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
.removeAllColumnsExceptFor("DateTimeString","TransactionAmountUSD")
.build();
File inputFile = new ClassPathResource("BasicDataVecExample/exampledata.csv").getFile();
//Define input reader and output writer:
RecordReader rr = new CSVRecordReader(1, ',');
rr.initialize(new FileSplit(inputFile));
//Process the data:
List<List<Writable>> originalData = new ArrayList<>();
while(rr.hasNext()){
originalData.add(rr.next());
}
List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);
int numRows = 5;
System.out.println("=== BEFORE ===");
for (int i=0;i<=numRows;i++) {
System.out.println(originalData.get(i));
}
System.out.println("=== AFTER ===");
for (int i=0;i<=numRows;i++) {
System.out.println(processedData.get(i));
}
}
}
I'm looking to get a lagged value (ordered by DateTimeString) of TransactionAmountUSD
I was looking at sequenceMovingWindowReduce from the docs but could not figure it out. Also could not really find any examples in the examples repo that seemed to do anything similar to this.
Thanks to some help from Alex Black on the dl4j gitter channel I can post my own answer.
Tip to anyone new to dl4j - there are lots of good things to look at in the test code too, in addition to the examples and tutorials.
Here is my updated toy example code:
package org.datavec.transform.basic;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.transform.sequence.comparator.NumericalColumnComparator;
import org.datavec.api.transform.transform.sequence.SequenceOffsetTransform;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;
import org.joda.time.DateTimeZone;
import org.nd4j.linalg.io.ClassPathResource;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class myExample {
public static void main(String[] args) throws Exception {
Schema inputDataSchema = new Schema.Builder()
.addColumnString("DateTimeString")
.addColumnsString("CustomerID", "MerchantID")
.addColumnInteger("NumItemsInTransaction")
.addColumnCategorical("MerchantCountryCode", Arrays.asList("USA","CAN","FR","MX"))
.addColumnDouble("TransactionAmountUSD",0.0,null,false,false) //$0.0 or more, no maximum limit, no NaN and no Infinite values
.addColumnCategorical("FraudLabel", Arrays.asList("Fraud","Legit"))
.build();
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
.removeAllColumnsExceptFor("CustomerID", "DateTimeString","TransactionAmountUSD")
.stringToTimeTransform("DateTimeString","YYYY-MM-DD HH:mm:ss.SSS", DateTimeZone.UTC)
.convertToSequence(Arrays.asList("CustomerID"), new NumericalColumnComparator("DateTimeString"))
.offsetSequence(Arrays.asList("TransactionAmountUSD"),1, SequenceOffsetTransform.OperationType.NewColumn)
.build();
File inputFile = new ClassPathResource("BasicDataVecExample/exampledata.csv").getFile();
//Define input reader and output writer:
RecordReader rr = new CSVRecordReader(0, ',');
rr.initialize(new FileSplit(inputFile));
//Process the data:
List<List<Writable>> originalData = new ArrayList<>();
while(rr.hasNext()){
originalData.add(rr.next());
}
List<List<List<Writable>>> processedData = LocalTransformExecutor.executeToSequence(originalData, tp);
System.out.println("=== BEFORE ===");
for (int i=0;i<originalData.size();i++) {
System.out.println(originalData.get(i));
}
System.out.println("=== AFTER ===");
for (int i=0;i<processedData.size();i++) {
System.out.println(processedData.get(i));
}
}
}
This should give some output like below, where you can see a new column with the last transaction amount for each customer id has been added.
"C:\Program Files\Java\jdk1.8.0_201\bin\java.exe" -Dfile.encoding=UTF-8 -classpath "... (long IDE-generated classpath of JDK, DataVec/ND4J, Spark and Hadoop jars trimmed for brevity) ..." org.datavec.transform.basic.myExample
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
=== BEFORE ===
[2016-01-01 17:00:00.000, 830a7u3, u323fy8902, 1, USA, 100.00, Legit]
[2016-01-01 18:03:01.256, 830a7u3, 9732498oeu, 3, FR, 73.20, Legit]
[2016-01-03 02:53:32.231, 78ueoau32, w234e989, 1, USA, 1621.00, Fraud]
[2016-01-03 09:30:16.832, t842uocd, 9732498oeu, 4, USA, 43.19, Legit]
[2016-01-04 23:01:52.920, t842uocd, cza8873bm, 10, MX, 159.65, Legit]
[2016-01-05 02:28:10.648, t842uocd, fgcq9803, 6, CAN, 26.33, Fraud]
[2016-01-05 10:15:36.483, rgc707ke3, tn342v7, 2, USA, -0.90, Legit]
=== AFTER ===
[[1451948512920, t842uocd, 159.65, 43.19], [1451960890648, t842uocd, 26.33, 159.65]]
[[1451671381256, 830a7u3, 73.20, 100.00]]
[]
[]
Process finished with exit code 0

Weka output predictions

I've used the Weka GUI for training and testing a file (making predictions), but I can't do the same with the API. The error I'm getting says there is a different number of attributes in the train and test files. In the GUI this can be solved by checking "Output predictions".
How can I do something similar using the API? Do you know of any samples out there?
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Remove;
public class WekaTutorial
{
public static void main(String[] args) throws Exception
{
DataSource trainSource = new DataSource("/tmp/classes - edited.arff"); // training
Instances trainData = trainSource.getDataSet();
DataSource testSource = new DataSource("/tmp/classes_testing.arff");
Instances testData = testSource.getDataSet();
if (trainData.classIndex() == -1)
{
trainData.setClassIndex(trainData.numAttributes() - 1);
}
if (testData.classIndex() == -1)
{
testData.setClassIndex(testData.numAttributes() - 1);
}
String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 "
+ "-tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
Remove remove = new Remove();
remove.setOptions(options);
remove.setInputFormat(trainData);
NominalToBinary filter = new NominalToBinary();
NaiveBayes nb = new NaiveBayes();
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(filter);
fc.setClassifier(nb);
// train and make predictions
fc.buildClassifier(trainData);
for (int i = 0; i < testData.numInstances(); i++)
{
double pred = fc.classifyInstance(testData.instance(i));
System.out.print("ID: " + testData.instance(i).value(0));
System.out.print(", actual: " + testData.classAttribute().value((int) testData.instance(i).classValue()));
System.out.println(", predicted: " + testData.classAttribute().value((int) pred));
}
}
}
Error:
Exception in thread "main" java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2 != 17152
This was not an issue in the GUI.
You need to ensure that the categories in the train and test sets are compatible. Try to:
combine the train and test sets
preprocess them
save them as arff
open two empty files
copy the header, from the top down to the "@data" line, into both
copy the training set into the first file and the test set into the second file
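As a hedged alternative (not what this answer describes), Weka's usual "batch filtering" idiom avoids merging the files by hand: initialize a single StringToWordVector on the training data, then apply the same filter instance to the test data, so both sets end up with the same attribute dictionary. A minimal sketch, assuming the string attributes are what cause the mismatch:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;
public class BatchFilterSketch {
// Returns {trainVec, testVec} with identical attribute sets
static Instances[] vectorize(Instances trainData, Instances testData) throws Exception {
StringToWordVector s2wv = new StringToWordVector();
s2wv.setInputFormat(trainData); // dictionary is learned from the training data only
Instances trainVec = Filter.useFilter(trainData, s2wv);
Instances testVec = Filter.useFilter(testData, s2wv); // same dictionary applied to the test data
return new Instances[]{trainVec, testVec};
}
}
The FilteredClassifier in the question achieves a similar effect when its filter is the StringToWordVector itself, since the filter fitted during buildClassifier is applied to each test instance at classification time.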

cannot access neo4j database with py2neo after creating it with the java BatchInserter

SOLVED
OK, I had just messed up the neo4j-server.properties config file: I shouldn't have wrapped the db path in quotes.
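For reference, a minimal sketch of what the corrected line in conf/neo4j-server.properties might look like (the path below is only a placeholder for your actual store directory), with no quotes around the value:
org.neo4j.server.database.location=/path/to/neo4j-batchinserter-store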
I have created a neo4j database using Java's BatchInserter and I am trying to access it with py2neo. Here's my Java code:
///opt/java/64/jdk1.6.0_45/bin/javac -classpath $HOME/opt/usr/neo4j-community-1.8.2/lib/*:. neo_batch.java
///opt/java/64/jdk1.6.0_45/bin/java -classpath $HOME/opt/usr/neo4j-community-1.8.2/lib/*:. neo_batch
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.lang.Long;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterImpl;
import org.neo4j.unsafe.batchinsert.BatchInserters;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
public class neo_batch{
private static final String KEY = "id";
public static void main(String[] args) {
//create & connect 2 neo db folder
String batch_dir = "neo4j-batchinserter-store";
BatchInserter inserter = BatchInserters.inserter( batch_dir );
//set up neo index
BatchInserterIndexProvider indexProvider =
new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex OneIndex =
indexProvider.nodeIndex( "one", MapUtil.stringMap( "type", "exact" ) );
OneIndex.setCacheCapacity( "id", 100000 );
//batchin graph, index and relationships
RelationshipType linked = DynamicRelationshipType.withName( "linked" );
for (int i=0;i<10;i++){
System.out.println(i);
long Node1 = createIndexedNode(inserter, OneIndex, i);
long Node2 = createIndexedNode(inserter, OneIndex, i+10);
inserter.createRelationship(Node1, Node2, linked, null);
}
indexProvider.shutdown();
inserter.shutdown();
}
// START SNIPPET: helperMethods
private static long createIndexedNode(BatchInserter inserter,BatchInserterIndex OneIndex,final long id)
{
Map<String, Object> properties = new HashMap<String, Object>();
properties.put( KEY, id );
long node = inserter.createNode( properties );
OneIndex.add( node, properties);
return node;
}
// END SNIPPET: helperMethods
}
Then I modify the neo4j-server.properties config file accordingly and run neo4j start.
The following Python code suggests the graph is empty:
from py2neo import neo4j
graph = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
graph.size()
Out[8]: 0
graph.get_indexed_node("one",'id',1)
What's wrong with my approach? Thanks
EDIT
Nor can I count the nodes with a Cypher query:
neo4j-sh (?)$ START n=node(*)
> return count(*);
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row
0 ms
EDIT 2
I can check with the Java API that the indexes and nodes exist:
private static void query_batched_db(){
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( batch_dir);
IndexManager indexes = graphDb.index();
boolean oneExists = indexes.existsForNodes("one");
System.out.println("Does the 'one' index exists: "+oneExists);
System.out.println("list indexes: "+graphDb.index().nodeIndexNames());
//search index 'one'
Index<Node> oneIndex = graphDb.index().forNodes( "one" );
for (int i=0;i<25;i++){
IndexHits<Node> hits = oneIndex.get( KEY, i );
System.out.println(hits.size());
}
graphDb.shutdown();
}
Where the output is
Does the 'one' index exists: true
list indexes: [Ljava.lang.String;@26ae533a
1
1
...
1
1
0
0
0
0
0
Now if I populate the graph using Python, I won't be able to access those nodes with the previous Java method (it will count 20 again):
from py2neo import neo4j
graph = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
idx=graph.get_or_create_index(neo4j.Node,"idx")
for k in range(100):
graph.get_or_create_indexed_node('idx','id',k,{'id':k})
EDIT 3
Now I delete the store I created with the BatchInserter, namely neo4j-test-store, while the neo4j-server.properties config file continues to point to the deleted store, namely org.neo4j.server.database.location="{some_path}/neo4j-test-store".
Now if I run a cypher count, I get 100, 100 being the number of nodes I inserted using py2neo.
I am going crazy with this stuff!
