Sample code use for (moa PairedLearners) - java

Hello, I am new to MOA and WEKA.
I need to test the paired learners concept. I have located the code, but I cannot find any example online, and
I am having a hard time figuring out how to pass my data into the code, run a test, and see my results.
Can anyone point me in the right direction or give me a few pointers on how to implement this?
moa/moa/src/main/java/moa/classifiers/meta/PairedLearners.java
I am trying to use code similar to this:
https://groups.google.com/forum/#!topic/moa-development/3IKcguR2kOk
Best Regards.
//Sample code below
import moa.classifiers.meta.pairedLearner;
Public class SamplePairedlearner{
public static void main(String[] args) {
FileStream fStream = new FileStream();
fStream.arffFileOption.setValue("test.arff");// set the ARFF file name
fStream.normalizeOption.setValue(false);// set normalized to be true or false
fStream.prepareForUse();
int numLines = 0;
PairedLearner learners = PairedLearners();
learners.resetLearning();
learners.resetLearningImpl(); //this is where i get an error message
ClusteringStream stream = fStream;
while (stream.hasMoreInstances()) {
Instance curr = stream.nextInstance().getData();
learners.trainOnInstanceImpl(curr)//this line also generates an error
numLines++;
}
Clustering resDstream = dstream.getClusteringResult();
dstream.getMicroClusteringResult();
System.out.println("Size of result from Dstream: " + resDstream.size());
System.out.println(numLines + " lines have been read");
}
}

I could fix the code that you have there, but it wouldn't do you much good. MOA has its own selection of tasks and evaluators for running these experiments at a much higher level. This is how to run evaluations properly without diving too deeply into the code. I'll assume a few things:
We use PairedLearners as our classifier.
We evaluate stream classification performance.
We evaluate in predictive sequential (prequential) fashion, i.e. test on each example in the sequence, then train on it.
Therefore, we can define our task quite simply, as follows.
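// Package names follow the MOA release current when this was written;
// in newer releases LearningCurve may live in moa.evaluation.preview instead.
import moa.classifiers.meta.PairedLearners;
import moa.evaluation.BasicClassificationPerformanceEvaluator;
import moa.evaluation.LearningCurve;
import moa.streams.ArffFileStream;
import moa.tasks.EvaluatePrequential;
public class PairedLearnersExample {
    public static void main(String[] args) {
        // abalone.arff must be on the classpath next to this class;
        // -1 selects the last attribute as the class attribute.
        ArffFileStream fs = new ArffFileStream(PairedLearnersExample.class.getResource("abalone.arff").getFile(), -1);
        fs.prepareForUse();
        PairedLearners learners = new PairedLearners();
        BasicClassificationPerformanceEvaluator evaluator = new BasicClassificationPerformanceEvaluator();
        // Wire the learner, stream and evaluator into a prequential evaluation task.
        EvaluatePrequential task = new EvaluatePrequential();
        task.learnerOption.setCurrentObject(learners);
        task.streamOption.setCurrentObject(fs);
        task.evaluatorOption.setCurrentObject(evaluator);
        task.prepareForUse();
        LearningCurve le = (LearningCurve) task.doTask();
        System.out.println(le);
    }
}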
If you want to do other tasks, you can quite happily swap out the evaluator, stream and learner to achieve whatever it is you want to do.
If you refer to the MOA Manual you'll see that all I'm doing is imitating the command line commands - you could equally perform this evaluation at the command line if you wished.
For example,
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluatePrequential -l PairedLearners \
-e BasicClassificationPerformanceEvaluator \
-s (ArffFileStream -f abalone.arff) \
-i 100000000 -f 1000000" > plresult_abalone.csv


Can ABCL's Interpreter load Lisp source from an InputStream?

I've just started looking at ABCL to mix some Lisp into Java. For now, loading some Lisp from a file will be sufficient, and I've been looking at the examples. In every case, the pattern is:
Interpreter interpreter = Interpreter.createInstance();
interpreter.eval("(load \"lispfunctions.lisp\")");
But say I'm building a Maven project with a view to packaging as a JAR: how can I load lispfunctions.lisp from src/main/resources? I can easily get an InputStream—can I go somewhere with that? Or is there another idiom I'm missing here for loading Lisp source from a resource like this?
I've gotten the following to work. I am working with ABCL 1.7.0 on MacOS, although I'm pretty sure this isn't version-specific.
/* load_lisp_within_jar.java -- use ABCL to load Lisp file as resource in jar
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*/
/* To run this example:
$ javac -cp /path/to/abcl.jar -d . load_lisp_within_jar.java
$ cat << EOF > foo.lisp
(defun f (x) (1+ x))
EOF
$ jar cvf load_lisp_within_jar.jar load_lisp_within_jar.class foo.lisp
$ java -cp load_lisp_within_jar.jar:/path/to/abcl.jar load_lisp_within_jar
*
* Expected output:
(F 100) => 101
*/
import org.armedbear.lisp.*;
import java.io.*;
public class load_lisp_within_jar {
public static void main (String [] args) {
try {
// It appears that interpreter instance is required even though
// it isn't used directly; I guess it arranges global resources.
Interpreter I = Interpreter.createInstance ();
LispObject LOAD_function = Symbol.LOAD.getSymbolFunction ();
// Obtain an input stream for Lisp source code in jar.
ClassLoader L = load_lisp_within_jar.class.getClassLoader ();
InputStream f = L.getResourceAsStream ("foo.lisp");
Stream S = new Stream (Symbol.SYSTEM_STREAM, f, Symbol.CHARACTER);
// Call COMMON-LISP:LOAD with input stream as argument.
LOAD_function.execute (S);
// Verify that function F has been defined.
Symbol F = Packages.findPackage ("COMMON-LISP-USER").findAccessibleSymbol ("F");
LispObject F_function = F.getSymbolFunction ();
LispObject x = F_function.execute (LispInteger.getInstance (100));
System.out.println ("(F 100) => " + x.javaInstance ());
}
catch (Exception e) {
System.err.println ("oops: " + e);
e.printStackTrace ();
}
}
}
As you can see, the program first gets the function associated with the symbol LOAD. (For convenience, many, maybe all of COMMON-LISP symbols have static definitions, so you can just say Symbol.LOAD instead of looking up the symbol via findAccessibleSymbol.) Then the input stream is supplied to the load function. Afterwards we verify that our function F is indeed defined.
I know this stuff can be kind of obscure; I'll be happy to try to answer any questions.

How do you access an array in python from Java

I want to do something like this:
Python Code:
nums = [1,2,3]
Java Code:
nums_Java[] = nums //from python
System.out.println(nums_Java[0])
Output:
1
I have been looking over jython but I just can't seem to find the answer. It seems like it should be very simple but I'm lost. Thanks!
If I understand the question correctly, you'd like to run some embedded Python code from a Java program and get the value of a Python variable.
Based on http://www.jython.org/archive/21/docs/embedding.html , I wrote a small program that might help:
import org.python.util.PythonInterpreter;
import org.python.core.*;
public class SimpleEmbedded {
public static void main(String[] args) throws PyException {
PythonInterpreter interp = new PythonInterpreter();
interp.exec("nums = [1,2,3]");
PyObject nums = interp.get("nums");
System.out.println("nums: " + nums);
System.out.println("nums is of type: " + nums.getClass());
}
}
Unfortunately, I don't have Jython installed at the moment, so the above code is untested. I'm also not sure what type you will get back from the interpreter, or how to convert it to a Java array or access its items. But the program should get you started and give you some more information.

org.python.util.PythonInterpreter: cannot set a python variable to a String

I have an odd problem. So I am writing a program that uses Python for a simple user scripting interface. But to keep my question simple...I am trying to use PythonInterpreter.set to set a variable in the Python interpreter to a String value. But when I set it I get this exception:
LookupError: no codec search functions registered: can't find encoding
The following is my code:
package demo;
import org.python.util.PythonInterpreter;
public class Application {
public static void main(String[] args) {
PythonInterpreter pi = new PythonInterpreter();
String greeting = "Jesus Christ";
Integer times = 6;
pi.exec("actor = 'Lucy'");
pi.set("greeting", greeting);
pi.set("times", times);
pi.exec("print '%s %s' % (greeting, actor)");
pi.exec("print \"%s %s \\n\" % (greeting, actor) * times");
System.out.println("RESULT: " + pi.eval("actor == 'Lucy'"));
System.out.println("ACTOR: " + pi.get("actor"));
}
}
If you need to see the pom file for my project, I can include it, but really I just have the Jython 2.5.0 library installed. I am wondering if I needed to install something else on my system other than having maven install this library for me. I do have Python installed on this computer, and PYTHON_HOME setup in the environment variables. What am I doing wrong?
EDIT: The line:
pi.set("times", times);
...works just fine.
But...
pi.set("greeting", greeting);
does not. I imagine it has something to do with times being a primitive data type and greeting being a String.

Getting the results of a Haskell script from Java

I'm trying to create a program to compare the amount of time it takes various haskell scripts to run, which will later be used to create graphs and displayed in a GUI. I've tried to create said GUI using Haskell libraries but I haven't had much luck, especially since I'm having trouble finding up to date GUI libraries for Windows. I've tried to use Java to get these results but either get errors returned or simply no result.
I've constructed a minimal example to show roughly what I'm doing at the moment:
import java.io.*;
public class TestExec {
public static void main(String[] args) {
try {
Process p = Runtime.getRuntime().exec("ghc test.hs 2 2");
BufferedReader in = new BufferedReader(
new InputStreamReader(p.getInputStream()));
String line = null;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
And here is the Haskell script this is calling, in this case a simple addition:
test x y = x + y
Currently there simply isn't any result stored or printed. Anyone have any ideas?
Since you're attempting to run this as an executable, you need to provide a main. In your case it should look something like this:
import System.Environment
test :: Integer -> Integer -> Integer
test = (+)
main = do
[x, y] <- map read `fmap` getArgs
print $ x `test` y
This just reads the command line arguments, adds them, then prints the result. Though I did something like this a while ago, it's much easier to do the benchmarking/testing in Haskell, dump the output data to a text file in a more structured format, then parse/display it in Java or whatever language you like.
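As a rough sketch of the "dump structured output, then parse it in Java" idea, the snippet below assumes the Haskell side writes one line per benchmark in a hypothetical `name,seconds` format. The file layout and the class name `BenchmarkResults` are illustrative assumptions, not something prescribed by GHC or by this answer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses lines of the assumed form "name,seconds".
class BenchmarkResults {
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> timings = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the last comma so benchmark names may contain commas.
            int comma = trimmed.lastIndexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("Expected name,seconds but got: " + line);
            }
            String name = trimmed.substring(0, comma).trim();
            double seconds = Double.parseDouble(trimmed.substring(comma + 1).trim());
            timings.put(name, seconds);
        }
        return timings;
    }

    public static void main(String[] args) {
        // In real use the lines would come from something like Files.readAllLines(...).
        Map<String, Double> timings = parse(List.of("add 2 2,0.0012", "fac 30,0.0450"));
        timings.forEach((name, seconds) -> System.out.println(name + " took " + seconds + " s"));
    }
}
```

The resulting map can then be handed to whatever charting or GUI code displays the timings.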
This is mostly a Java question. Search for Runtime.getRuntime().exec().
On the Haskell side, you need to write a stand-alone Haskell script. The one by @jozefg is OK. You should be able to run it as
runghc /path/to/script.hs 1 2
from the command line.
Calling it from Java is no different than running any other external process in Java. In Clojure (a JVM language, I use it for brevity) I do:
user=> (def p (-> (Runtime/getRuntime) (.exec "/usr/bin/runghc /tmp/test.hs 1 2")))
#'user/p
user=> (-> p .getInputStream input-stream reader line-seq)
("3")
Please note that I use runghc to run a script (not ghc). Full paths are not necessary, but could be helpful. Your Java program can be modified this way:
--- TestExec.question.java
+++ TestExec.java
@@ -2,7 +2,7 @@
public class TestExec {
public static void main(String[] args) {
try {
- Process p = Runtime.getRuntime().exec("ghc test.hs 2 2");
+ Process p = Runtime.getRuntime().exec("/usr/bin/runghc /tmp/test.hs 2 2");
BufferedReader in = new BufferedReader(
new InputStreamReader(p.getInputStream()));
String line = null;
The modified version runs the Haskell script just fine. You may have to change the paths to your runghc and test.hs locations.
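One likely reason the original program printed nothing is that error messages go to the process's standard error, which the code never reads. The following is a minimal sketch, not a definitive implementation: it merges stderr into stdout with `ProcessBuilder.redirectErrorStream` and reports a non-zero exit code. The class name `RunScript` is made up for illustration, and `main` assumes `runghc` is on the PATH with `test.hs` in the working directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.List;

class RunScript {
    // Runs the command, merges stderr into stdout, and returns everything it printed.
    static String run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // make compiler/runtime errors visible
        StringBuilder output = new StringBuilder();
        try {
            Process process = builder.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                output.append("[exit code ").append(exitCode).append("]\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for process", e);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Assumption: runghc is on the PATH and test.hs is in the working directory.
        try {
            System.out.print(run(List.of("runghc", "test.hs", "2", "2")));
        } catch (UncheckedIOException e) {
            System.err.println("Could not start process: " + e.getMessage());
        }
    }
}
```

Passing the command as a list avoids the whitespace splitting that `Runtime.exec(String)` performs on a single command string.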
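First, a note on reading the output: InputStreamReader(p.getInputStream()) is the right call here, because getInputStream() is connected to the process's standard output, while getOutputStream() writes to its standard input.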
As I said in a comment, such a benchmark is simply incorrect. While benchmarking, one should eliminate as many side costs as possible. The best solution is to use the criterion package, which produces the kind of graphical output you are after.
Small example:
import Criterion
import Criterion.Main
import Criterion.Config
fac 1 = 1
fac n = n * (fac $ n-1)
myConfig = defaultConfig {
cfgReport = ljust "report.html"
}
main = defaultMainWith myConfig (return ()) [
bench "fac 30" $ whnf fac 30
]
After execution it produces a file "report.html" with neat interactive plots.

Counting the number of files in a directory using Java

How do I count the number of files in a directory using Java? For simplicity, let's assume that the directory doesn't have any sub-directories.
I know the standard method of :
new File(<directory path>).listFiles().length
But this will effectively go through all the files in the directory, which might take long if the number of files is large. Also, I don't care about the actual files in the directory unless their number is greater than some fixed large number (say 5000).
I am guessing, but doesn't the directory (or its i-node in case of Unix) store the number of files contained in it? If I could get that number straight away from the file system, it would be much faster. I need to do this check for every HTTP request on a Tomcat server before the back-end starts doing the real processing. Therefore, speed is of paramount importance.
I could run a daemon every once in a while to clear the directory. I know that, so please don't give me that solution.
Ah... the rationale for not having a straightforward method in Java to do that is file storage abstraction: some filesystems may not have the number of files in a directory readily available... that count may not even have any meaning at all (see for example distributed, P2P filesystems, fs that store file lists as a linked list, or database-backed filesystems...).
So yes,
new File(<directory path>).list().length
is probably your best bet.
Since Java 8, you can do that in three lines:
try (Stream<Path> files = Files.list(Paths.get("your/path/here"))) {
long count = files.count();
}
Regarding the 5000 child nodes and inode aspects:
This method will iterate over the entries but as Varkhan suggested you probably can't do better besides playing with JNI or direct system commands calls, but even then, you can never be sure these methods don't do the same thing!
However, let's dig into this a little:
Looking at JDK8 source, Files.list exposes a stream that uses an Iterable from Files.newDirectoryStream that delegates to FileSystemProvider.newDirectoryStream.
On UNIX systems (decompiled sun.nio.fs.UnixFileSystemProvider.class), it loads an iterator: A sun.nio.fs.UnixSecureDirectoryStream is used (with file locks while iterating through the directory).
So, there is an iterator that will loop through the entries here.
Now, let's look to the counting mechanism.
The actual count is performed by the count/sum reducing API exposed by Java 8 streams. In theory, this API can perform parallel operations without much effort (with multithreading). However, the stream is created with parallelism disabled, so it's a no-go.
The good side of this approach is that it won't load the array in memory as the entries will be counted by an iterator as they are read by the underlying (Filesystem) API.
Finally, for information: conceptually, a directory node in a filesystem is not required to hold the number of files it contains; it can just contain the list of its child nodes (list of inodes). I'm not an expert on filesystems, but I believe that UNIX filesystems work just like that. So you can't assume there is a way to get this information directly (i.e. there can always be some list of child nodes hidden somewhere).
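Since the question only needs to know whether the directory holds more than some threshold (5000 in the question), the Files.list stream from this answer can be capped with `limit` so that counting stops early. This is only a sketch under that assumption; the class and method names (`DirectoryThreshold`, `hasMoreThan`) are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class DirectoryThreshold {
    // Returns true if dir has more than maxEntries entries,
    // consuming at most maxEntries + 1 entries from the stream.
    static boolean hasMoreThan(Path dir, long maxEntries) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.limit(maxEntries + 1).count() > maxEntries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " has more than 5000 entries: " + hasMoreThan(dir, 5000));
    }
}
```

The entries are still read one by one from the filesystem, but a directory with a million files costs no more than one with 5001.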
Unfortunately, I believe that is already the best way (although list() is slightly better than listFiles(), since it doesn't construct File objects).
This might not be appropriate for your application, but you could always try a native call (using jni or jna), or exec a platform-specific command and read the output before falling back to list().length. On *nix, you could exec ls -1a | wc -l (note - that's dash-one-a for the first command, and dash-lowercase-L for the second). Not sure what would be right on windows - perhaps just a dir and look for the summary.
Before bothering with something like this I'd strongly recommend you create a directory with a very large number of files and just see if list().length really does take too long. As this blogger suggests, you may not want to sweat this.
I'd probably go with Varkhan's answer myself.
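The suggestion to measure list().length on a large directory before optimizing can be tried with something like the sketch below. It creates a temporary directory with a configurable number of empty files (10,000 by default, an arbitrary choice) and times a single File.list() call. The class name `ListTiming` is hypothetical, and a single unwarmed measurement like this is only a rough indication.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ListTiming {
    // Creates fileCount empty files in a fresh temp directory and returns that directory.
    static File createPopulatedDirectory(int fileCount) {
        try {
            Path dir = Files.createTempDirectory("list-timing");
            for (int i = 0; i < fileCount; i++) {
                Files.createFile(dir.resolve("file" + i + ".tmp"));
            }
            return dir.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts entries with File.list(); returns -1 if the path is not a readable directory.
    static int countEntries(File dir) {
        String[] names = dir.list();
        return names == null ? -1 : names.length;
    }

    public static void main(String[] args) {
        int fileCount = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        File dir = createPopulatedDirectory(fileCount);
        long start = System.nanoTime();
        int counted = countEntries(dir);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("Counted " + counted + " entries in " + elapsedMicros + " microseconds");
        // Clean up so repeated runs do not fill the temp directory.
        File[] leftovers = dir.listFiles();
        if (leftovers != null) {
            for (File f : leftovers) {
                f.delete();
            }
        }
        dir.delete();
    }
}
```

Running it with a few different sizes on the target filesystem gives a feel for whether the per-request cost actually matters.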
Since you don't really need the total number, and in fact want to perform an action after a certain number (in your case 5000), you can use java.nio.file.Files.newDirectoryStream. The benefit is that you can exit early instead of having to go through the entire directory just to get a count.
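// Assumes MAX_FILES is a constant defined elsewhere in the class, e.g. 5000.
public boolean isOverMax() {
    Path dir = Paths.get("C:/foo/bar");
    int count = 0; // start at 0 so that exactly MAX_FILES entries is not "over"
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path p : stream) {
            // more than MAX_FILES entries seen, exit early
            if (++count > MAX_FILES) {
                return true;
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return false;
}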
The interface doc for DirectoryStream also has some good examples.
If you have directories containing a really large number of files (>100,000), here is a (non-portable) way to go:
String directoryPath = "a path";
// The -f flag is important: it stops ls from sorting its output,
// which is much faster. It also lists "." and "..", hence the "- 2" below.
String[] params = { "/bin/sh", "-c",
"ls -f " + directoryPath + " | wc -l" };
Process process = Runtime.getRuntime().exec(params);
BufferedReader reader = new BufferedReader(new InputStreamReader(
process.getInputStream()));
// Parse the count before subtracting; a String cannot be decremented directly.
int fileCount = Integer.parseInt(reader.readLine().trim()) - 2; // accounting for .. and .
reader.close();
System.out.println(fileCount);
Using sigar should help. Sigar has native hooks to get the stats
new Sigar().getDirStat(dir).getTotal()
This method works for me very well.
// Recursive method to recover files and folders and to print the information
public static void listFiles(String directoryName) {
File file = new File(directoryName);
File[] fileList = file.listFiles(); // List files inside the main dir
int j;
String extension;
String fileName;
if (fileList != null) {
for (int i = 0; i < fileList.length; i++) {
extension = "";
if (fileList[i].isFile()) {
fileName = fileList[i].getName();
if (fileName.lastIndexOf(".") != -1 && fileName.lastIndexOf(".") != 0) {
extension = fileName.substring(fileName.lastIndexOf(".") + 1);
System.out.println("THE " + fileName + " has the extension = " + extension);
} else {
extension = "Unknown";
System.out.println("extension2 = " + extension);
}
filesCount++;
allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
fileList[i].getParent()));
} else if (fileList[i].isDirectory()) {
filesCount++;
extension = "";
allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
fileList[i].getParent()));
listFiles(String.valueOf(fileList[i]));
}
}
}
}
Unfortunately, as mmyers said, File.list() is about as fast as you are going to get using Java. If speed is as important as you say, you may want to consider doing this particular operation using JNI. You can then tailor your code to your particular situation and filesystem.
public void shouldGetTotalFilesCount() {
Integer reduce = of(listRoots()).parallel().map(this::getFilesCount).reduce(0, ((a, b) -> a + b));
}
private int getFilesCount(File directory) {
File[] files = directory.listFiles();
return Objects.isNull(files) ? 1 : Stream.of(files)
.parallel()
.reduce(0, (Integer acc, File p) -> acc + getFilesCount(p), (a, b) -> a + b);
}
Count files in directory and all subdirectories.
var path = Path.of("your/path/here");
long count;
try (var paths = Files.walk(path)) { // close the stream to release directory handles
    count = paths.filter(Files::isRegularFile).count();
}
In Spring Batch I did the following:
private int getFilesCount() throws IOException {
ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
Resource[] resources = resolver.getResources("file:" + projectFilesFolder + "/**/input/splitFolder/*.csv");
return resources.length;
}
