I've just started looking at ABCL to mix some Lisp into Java. For now, loading some Lisp from a file will be sufficient, and I've been looking at the examples. In every case, the pattern is:
Interpreter interpreter = Interpreter.createInstance();
interpreter.eval("(load \"lispfunctions.lisp\")");
But say I'm building a Maven project with a view to packaging as a JAR: how can I load lispfunctions.lisp from src/main/resources? I can easily get an InputStream—can I go somewhere with that? Or is there another idiom I'm missing here for loading Lisp source from a resource like this?
I've gotten the following to work. I am working with ABCL 1.7.0 on macOS, although I'm pretty sure this isn't version-specific.
/* load_lisp_within_jar.java -- use ABCL to load Lisp file as resource in jar
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*/
/* To run this example:
$ javac -cp /path/to/abcl.jar -d . load_lisp_within_jar.java
$ cat << EOF > foo.lisp
(defun f (x) (1+ x))
EOF
$ jar cvf load_lisp_within_jar.jar load_lisp_within_jar.class foo.lisp
$ java -cp load_lisp_within_jar.jar:/path/to/abcl.jar load_lisp_within_jar
*
* Expected output:
(F 100) => 101
*/
import org.armedbear.lisp.*;
import java.io.*;

public class load_lisp_within_jar {
    public static void main (String [] args) {
        try {
            // It appears that an interpreter instance is required even though
            // it isn't used directly; I guess it arranges global resources.
            Interpreter I = Interpreter.createInstance ();
            LispObject LOAD_function = Symbol.LOAD.getSymbolFunction ();
            // Obtain an input stream for the Lisp source code in the jar.
            ClassLoader L = load_lisp_within_jar.class.getClassLoader ();
            InputStream f = L.getResourceAsStream ("foo.lisp");
            Stream S = new Stream (Symbol.SYSTEM_STREAM, f, Symbol.CHARACTER);
            // Call COMMON-LISP:LOAD with the input stream as its argument.
            LOAD_function.execute (S);
            // Verify that the function F has been defined.
            Symbol F = Packages.findPackage ("COMMON-LISP-USER").findAccessibleSymbol ("F");
            LispObject F_function = F.getSymbolFunction ();
            LispObject x = F_function.execute (LispInteger.getInstance (100));
            System.out.println ("(F 100) => " + x.javaInstance ());
        }
        catch (Exception e) {
            System.err.println ("oops: " + e);
            e.printStackTrace ();
        }
    }
}
As you can see, the program first gets the function associated with the symbol LOAD. (For convenience, many, maybe all, of the COMMON-LISP symbols have static definitions, so you can just say Symbol.LOAD instead of looking up the symbol via findAccessibleSymbol.) Then the input stream is supplied to the load function. Afterwards, we verify that our function F is indeed defined.
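The same approach carries over directly to a Maven project: anything under src/main/resources ends up on the runtime classpath, so getResourceAsStream finds it inside the packaged JAR too. Here is a minimal sketch; the resource name lispfunctions.lisp comes from the question, and the helper method is my own invention, not part of ABCL's API:

import org.armedbear.lisp.*;
import java.io.InputStream;

public class LispLoader {
    // Hypothetical helper: load Lisp source from the classpath, e.g. a file
    // kept under src/main/resources in a Maven project.
    static void loadResource (String name) {
        InputStream in = LispLoader.class.getClassLoader ().getResourceAsStream (name);
        if (in == null) {
            throw new IllegalArgumentException ("resource not found: " + name);
        }
        Stream s = new Stream (Symbol.SYSTEM_STREAM, in, Symbol.CHARACTER);
        Symbol.LOAD.getSymbolFunction ().execute (s);
    }

    public static void main (String [] args) {
        Interpreter.createInstance ();       // set up ABCL's global state first
        loadResource ("lispfunctions.lisp"); // resolved from the jar at runtime
    }
}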
I know this stuff can be kind of obscure; I'll be happy to try to answer any questions.
Hello, I am new to using MOA and WEKA.
I need to test the paired learners concept using this code. I have been able to locate the code, but I cannot find any example online,
and I am having a hard time figuring out how to pass my data into the code, run a test, and see my results.
Please can anyone point me in the right direction or give me a few pointers that I could follow to implement this?
moa/moa/src/main/java/moa/classifiers/meta/PairedLearners.java
I am trying to use code similar to this:
https://groups.google.com/forum/#!topic/moa-development/3IKcguR2kOk
Best regards.
//Sample code below
import moa.classifiers.meta.PairedLearners;

public class SamplePairedLearner {
    public static void main(String[] args) {
        FileStream fStream = new FileStream();
        fStream.arffFileOption.setValue("test.arff"); // set the ARFF file name
        fStream.normalizeOption.setValue(false);      // set normalized to be true or false
        fStream.prepareForUse();
        int numLines = 0;
        PairedLearners learners = new PairedLearners();
        learners.resetLearning();
        learners.resetLearningImpl(); // this is where I get an error message
        ClusteringStream stream = fStream;
        while (stream.hasMoreInstances()) {
            Instance curr = stream.nextInstance().getData();
            learners.trainOnInstanceImpl(curr); // this line also generates an error
            numLines++;
        }
        Clustering resDstream = dstream.getClusteringResult();
        dstream.getMicroClusteringResult();
        System.out.println("Size of result from Dstream: " + resDstream.size());
        System.out.println(numLines + " lines have been read");
    }
}
I could fix the code that you have there, but it wouldn't do you much good. MOA has its own selection of tasks and evaluators for running these experiments at a much higher level. That is the proper way to run evaluations, without diving too deeply into the code. I'll assume a few things:
We use PairedLearners as our classifier.
We evaluate stream classification performance.
We evaluate in predictive sequential (prequential) fashion, i.e. test, then train on each example in the sequence.
Therefore, we can define our task quite simply, as follows.
// Imports assumed for a classic MOA distribution.
import moa.classifiers.meta.PairedLearners;
import moa.evaluation.BasicClassificationPerformanceEvaluator;
import moa.evaluation.LearningCurve;
import moa.streams.ArffFileStream;
import moa.tasks.EvaluatePrequential;

public class PairedLearnersExample {
    public static void main(String[] args) {
        ArffFileStream fs = new ArffFileStream(PairedLearnersExample.class.getResource("abalone.arff").getFile(), -1);
        fs.prepareForUse();
        PairedLearners learners = new PairedLearners();
        BasicClassificationPerformanceEvaluator evaluator = new BasicClassificationPerformanceEvaluator();
        EvaluatePrequential task = new EvaluatePrequential();
        task.learnerOption.setCurrentObject(learners);
        task.streamOption.setCurrentObject(fs);
        task.evaluatorOption.setCurrentObject(evaluator);
        task.prepareForUse();
        LearningCurve le = (LearningCurve) task.doTask();
        System.out.println(le);
    }
}
If you want to do other tasks, you can quite happily swap out the evaluator, stream and learner to achieve whatever it is you want to do.
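For example, switching to a windowed view of accuracy is a one-object change. A sketch, assuming MOA's WindowClassificationPerformanceEvaluator and its widthOption (the window size is an arbitrary illustrative value):

// Same task wiring as above, but performance is computed over a sliding window.
WindowClassificationPerformanceEvaluator evaluator =
        new WindowClassificationPerformanceEvaluator();
evaluator.widthOption.setValue(1000); // hypothetical window width
task.evaluatorOption.setCurrentObject(evaluator);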
If you refer to the MOA Manual you'll see that all I'm doing is imitating the command line commands - you could equally perform this evaluation at the command line if you wished.
For example,
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluatePrequential -l PairedLearners \
-e BasicClassificationPerformanceEvaluator \
-s (ArffFileStream -f abalone.arff) \
-i 100000000 -f 1000000" > plresult_abalone.csv
Say I have the following Java API, all packaged up as blocks.jar:
public class Block {
    private String name;
    private int xCoord;
    private int yCoord;

    // Getters, setters, ctors, etc.

    public void setCoords(int x, int y) {
        setXCoord(x);
        setYCoord(y);
    }
}
public class BlockController {
    public static void moveBlock(Block block, int newXCoord, int newYCoord) {
        block.setCoords(newXCoord, newYCoord);
    }

    public static void stackBlocks(Block under, Block onTop) {
        // Stack "onTop" on top of "under".
        // Don't worry about the math here, this is just for an example.
        onTop.setCoords(under.getXCoord() + onTop.getXCoord(), under.getYCoord());
    }
}
Again, don't worry about the math and the fact that (x,y) coordinates don't accurately represent blocks in 3D space. The point is that we have Java code, compiled as a JAR, that performs operations on blocks. I now want to build a lightweight scripting language that allows a non-programmer to invoke the various block API methods and manipulate blocks, and I want to implement its interpreter with ANTLR (latest version is 4.3).
The scripting language, we'll call it BlockSpeak, might look like this:
block A at (0, 10) # Create block "A" at coordinates (0, 10)
block B at (0, 20) # Create block "B" at coordinates (0, 20)
stack A on B # Stack block A on top of block B
This might be equivalent to the following Java code:
Block A, B;
A = new Block(0, 10);
B = new Block(0, 20);
BlockController.stackBlocks(B, A);
So the idea is that the ANTLR-generated interpreter would take a *.blockspeak script as input, and use the commands in this script to invoke blocks.jar API operations. I read the excellent Simple Example which creates a simple calculator using ANTLR. However in that link, there is an ExpParser class with an eval() method:
ExpParser parser = new ExpParser(tokens);
parser.eval();
The problem here is that, in the case of the calculator, the tokens represent a mathematical expression to evaluate, and eval() returns the evaluation of the expression. In the case of an interpreter, the tokens would represent my BlockSpeak script, but calling eval() shouldn't evaluate anything; instead, it should know how to map the various BlockSpeak commands to Java code:
BlockSpeak Command: Java code:
==========================================
block A at (0, 10) ==> Block A = new Block(0, 10);
block B at (0, 20) ==> Block B = new Block(0, 20);
stack A on B ==> BlockController.stackBlocks(B, A);
So my question is, where do I perform this "mapping"? In other words, how do I instruct ANTLR to call various pieces of code (packaged inside blocks.jar) when it encounters particular grammars in the BlockSpeak script? More importantly, can someone give me a pseudo-code example?
I would simply evaluate the script on the fly, not generate Java source files which need to be compiled themselves again.
With ANTLR 4 it is highly recommended to keep the grammar and target-specific code separate from each other and to put any target-specific code inside a tree listener or visitor.
I will give a quick demo how to use a listener.
A grammar for your example input could look like this:
File: blockspeak/BlockSpeak.g4
grammar BlockSpeak;
parse
: instruction* EOF
;
instruction
: create_block
| stack_block
;
create_block
: 'block' NAME 'at' position
;
stack_block
: 'stack' top=NAME 'on' bottom=NAME
;
position
: '(' x=INT ',' y=INT ')'
;
COMMENT
: '#' ~[\r\n]* -> skip
;
INT
: [0-9]+
;
NAME
: [a-zA-Z]+
;
SPACES
: [ \t\r\n] -> skip
;
Some supporting Java classes:
File: blockspeak/Main.java
package blockspeak;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws Exception {
        Scanner keyboard = new Scanner(System.in);
        // Some initial input to let the parser have a go at.
        String input = "block A at (0, 10) # Create block \"A\" at coordinates (0, 10)\n" +
                "block B at (0, 20) # Create block \"B\" at coordinates (0, 20)\n" +
                "stack A on B # Stack block A on top of block B";
        EvalBlockSpeakListener listener = new EvalBlockSpeakListener();
        // Keep asking for input until the user presses 'q'.
        while (!input.equals("q")) {
            // Create a lexer and parser for `input`.
            BlockSpeakLexer lexer = new BlockSpeakLexer(new ANTLRInputStream(input));
            BlockSpeakParser parser = new BlockSpeakParser(new CommonTokenStream(lexer));
            // Now parse the `input` and attach our listener to it. We want to reuse
            // the same listener because it will hold our Blocks map.
            ParseTreeWalker.DEFAULT.walk(listener, parser.parse());
            // Let's see if the user wants to continue.
            System.out.print("Type a command and press return (q to quit) $ ");
            input = keyboard.nextLine();
        }
        System.out.println("Bye!");
    }
}
// You can place this Block class inside Main.java as well.
class Block {
    final String name;
    int x;
    int y;

    Block(String name, int x, int y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }

    void onTopOf(Block that) {
        // TODO
    }
}
This main class is pretty self-explanatory given the inline comments. The tricky part is what the listener is supposed to look like. Well, here it is:
File: blockspeak/EvalBlockSpeakListener.java
package blockspeak;

import org.antlr.v4.runtime.misc.NotNull;

import java.util.HashMap;
import java.util.Map;

/**
 * A class extending the `BlockSpeakBaseListener` (which will be generated
 * by ANTLR) in which we override the methods in which to create blocks, and
 * in which to stack blocks.
 */
public class EvalBlockSpeakListener extends BlockSpeakBaseListener {

    // A map that keeps track of our Blocks.
    private final Map<String, Block> blocks = new HashMap<String, Block>();

    @Override
    public void enterCreate_block(@NotNull BlockSpeakParser.Create_blockContext ctx) {
        String name = ctx.NAME().getText();
        Integer x = Integer.valueOf(ctx.position().x.getText());
        Integer y = Integer.valueOf(ctx.position().y.getText());
        Block block = new Block(name, x, y);
        System.out.printf("creating block: %s\n", name);
        blocks.put(block.name, block);
    }

    @Override
    public void enterStack_block(@NotNull BlockSpeakParser.Stack_blockContext ctx) {
        Block bottom = this.blocks.get(ctx.bottom.getText());
        Block top = this.blocks.get(ctx.top.getText());
        if (bottom == null) {
            System.out.printf("no such block: %s\n", ctx.bottom.getText());
        } else if (top == null) {
            System.out.printf("no such block: %s\n", ctx.top.getText());
        } else {
            System.out.printf("putting %s on top of %s\n", top.name, bottom.name);
            top.onTopOf(bottom);
        }
    }
}
The listener above has 2 methods defined that map to the following parser rules:
create_block
: 'block' NAME 'at' position
;
stack_block
: 'stack' top=NAME 'on' bottom=NAME
;
Whenever the parser "enters" such a parser rule, the corresponding method inside the listener is called. So when enterCreate_block is called (the parser enters the create_block rule), we create and save a block, and when enterStack_block is called, we retrieve the two blocks involved in the operation and stack one on top of the other.
To see the 3 classes above in action, download ANTLR 4.4 into the directory that holds the blockspeak/ directory containing the .g4 and .java files.
Open a console and perform the following 3 steps:
1. generate the ANTLR files:
java -cp antlr-4.4-complete.jar org.antlr.v4.Tool blockspeak/BlockSpeak.g4 -package blockspeak
2. compile all Java sources files:
javac -cp ./antlr-4.4-complete.jar blockspeak/*.java
3. Run the main class:
3.1. Linux/Mac
java -cp .:antlr-4.4-complete.jar blockspeak.Main
3.2. Windows
java -cp .;antlr-4.4-complete.jar blockspeak.Main
Here is an example session of running the Main class:
bart@hades:~/Temp/demo$ java -cp .:antlr-4.4-complete.jar blockspeak.Main
creating block: A
creating block: B
putting A on top of B
Type a command and press return (q to quit) $ block X at (0,0)
creating block: X
Type a command and press return (q to quit) $ stack Y on X
no such block: Y
Type a command and press return (q to quit) $ stack A on X
putting A on top of X
Type a command and press return (q to quit) $ q
Bye!
bart@hades:~/Temp/demo$
More info on tree listeners: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners
I would personally write a grammar to generate a Java program for each script that you could then compile (along with your jar) and run independently... i.e., a 2-step process.
For example, with something like the following simple grammar (which I haven't tested, and which I am sure you would need to extend and adapt), you could replace the parser.eval() statement in that example with parser.program(); (also substituting "BlockSpeak" for "Exp" throughout). It should then spit out Java code matching the script to stdout, which you could redirect into a .java file, compile (together with the jar), and run.
BlockSpeak.g:
grammar BlockSpeak;
program
@init { System.out.println("//import com.whatever.stuff;\n\npublic class BlockProgram {\n public static void main(String[] args) {\n\n"); }
@after { System.out.println("\n } // main()\n} // class BlockProgram\n\n"); }
: inss=instructions { if (null != $inss.insList) for (String ins : $inss.insList) { System.out.println(ins); } }
;
instructions returns [ArrayList<String> insList]
@init { $insList = new ArrayList<String>(); }
: (instruction { $insList.add($instruction.ins); })*
;
instruction returns [String ins]
: ( create { $ins = $create.ins; } | move { $ins = $move.ins; } | stack { $ins = $stack.ins; } ) ';'
;
create returns [String ins]
: 'block' id=BlockId 'at' c=coordinates { $ins = " Block " + $id.text + " = new Block(" + $c.coords + ");\n"; }
;
move returns [String ins]
: 'move' id=BlockId 'to' c=coordinates { $ins = " BlockController.moveBlock(" + $id.text + ", " + $c.coords + ");\n"; }
;
stack returns [String ins]
: 'stack' id1=BlockId 'on' id2=BlockId { $ins = " BlockController.stackBlocks(" + $id1.text + ", " + $id2.text + ");\n"; }
;
coordinates returns [String coords]
: '(' x=PosInt ',' y=PosInt ')' { $coords = $x.text + ", " + $y.text; }
;
BlockId
: ('A'..'Z')+
;
PosInt
: ('0'..'9') ('0'..'9')*
;
WS
: (' ' | '\t' | '\r'| '\n') -> channel(HIDDEN)
;
(Note that for simplicity this grammar requires semi-colons to separate each instruction.)
There are of course other ways to do this sort of thing, but this seems like the simplest to me.
Good luck!
Update
So I went ahead and "finished" my original post (fixing a few bugs in the above grammar) and tested it on a simple script.
Here is the .java file I used to test the above grammar (taken from the code stubs you posted above). Note that in your situation, you would probably want to make the script filename (in my code "script.blockspeak") into a command line parameter. Also, of course the Block and BlockController classes would instead come from your jar.
BlocksTest.java:
import org.antlr.v4.runtime.*;

class Block {
    private String name;
    private int xCoord;
    private int yCoord;

    // Other getters, setters, ctors, etc.

    public Block(int x, int y) { xCoord = x; yCoord = y; }
    public int getXCoord() { return xCoord; }
    public int getYCoord() { return yCoord; }
    public void setXCoord(int x) { xCoord = x; }
    public void setYCoord(int y) { yCoord = y; }

    public void setCoords(int x, int y) {
        setXCoord(x);
        setYCoord(y);
    }
}

class BlockController {
    public static void moveBlock(Block block, int newXCoord, int newYCoord) {
        block.setCoords(newXCoord, newYCoord);
    }

    public static void stackBlocks(Block under, Block onTop) {
        // Stack "onTop" on top of "under".
        // Don't worry about the math here, this is just for an example.
        onTop.setCoords(under.getXCoord() + onTop.getXCoord(), under.getYCoord());
    }
}

public class BlocksTest {
    public static void main(String[] args) throws Exception {
        ANTLRFileStream in = new ANTLRFileStream("script.blockspeak");
        BlockSpeakLexer lexer = new BlockSpeakLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        BlockSpeakParser parser = new BlockSpeakParser(tokens);
        parser.program();
    }
}
And here are the command lines I used (on my MacBook Pro):
> java -jar antlr-4.4-complete.jar BlockSpeak.g
> javac -cp .:antlr-4.4-complete.jar *.java
> java -cp .:antlr-4.4-complete.jar BlocksTest > BlockProgram.java
This was the input script:
script.blockspeak:
block A at (0, 10);
block B at (0, 20);
stack A on B;
And this was the output:
BlockProgram.java:
//import com.whatever.stuff;
public class BlockProgram {
public static void main(String[] args) {
Block A = new Block(0, 10);
Block B = new Block(0, 20);
BlockController.stackBlocks(A, B);
} // main()
} // class BlockProgram
You would of course then have to compile and run BlockProgram.java for each script.
In answer to one of the questions in your comment (#3), there are a couple more complex options I first contemplated that might streamline your "user experience".
(A) Instead of using the grammar to generate a java program that you then have to compile and run, you could embed the calls to the BlockController directly into the ANTLR actions. Where I created strings and passed them up from one non-terminal to the next, you could have java code there directly doing your Block commands whenever an instruction rule is recognized. This would require a bit more complexity with respect to the ANTLR grammar and imports, but it's technically doable.
(B) If you were to do option A, you could then go a step further and create an interactive interpreter ("shell"), where the user is presented with a prompt and just types in "blockspeak" commands at the prompt, which are then parsed and executed directly, displaying the results back to the user.
Neither of these options is all that much harder to accomplish in terms of complexity, but each requires a lot more coding, which would be beyond the scope of a Stack Overflow answer. That's why I opted to present a "simpler" solution here.
The eval() in ExpParser is implemented through method calls; it's just that the calls have shortcut syntax in the form of operators.
As an exercise, change ExpParser by adding a Calculator class with (unimplemented) methods for the mathematical operators (add(), multiply(), divide(), and so on), and then change the rules to use those methods instead of the operators. That way, you'll understand the basis of what you need to do for your BlockSpeak interpreter.
additionExp returns [double value]
: m1=multiplyExp {$value = $m1.value;}
( '+' m2=multiplyExp {$value = Calculator.add($value, $m2.value);}
| '-' m2=multiplyExp {$value = Calculator.subtract($value, $m2.value);}
)*
;
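For reference, a minimal sketch of such a Calculator class; the class and method names are just the ones suggested above, not part of any library:

// Hypothetical helper invoked from the grammar actions above.
public class Calculator {
    public static double add(double a, double b)      { return a + b; }
    public static double subtract(double a, double b) { return a - b; }
    public static double multiply(double a, double b) { return a * b; }
    public static double divide(double a, double b)   { return a / b; }
}

Your BlockSpeak rules would then call into the blocks.jar API at exactly the spots where these calculator rules call the helper methods.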
I have an odd problem. So I am writing a program that uses Python for a simple user scripting interface. But to keep my question simple...I am trying to use PythonInterpreter.set to set a variable in the Python interpreter to a String value. But when I set it I get this exception:
LookupError: no codec search functions registered: can't find encoding
The following is my code:
package demo;

import org.python.util.PythonInterpreter;

public class Application {
    public static void main(String[] args) {
        PythonInterpreter pi = new PythonInterpreter();
        String greeting = "Jesus Christ";
        Integer times = 6;
        pi.exec("actor = 'Lucy'");
        pi.set("greeting", greeting);
        pi.set("times", times);
        pi.exec("print '%s %s' % (greeting, actor)");
        pi.exec("print \"%s %s \\n\" % (greeting, actor) * times");
        System.out.println("RESULT: " + pi.eval("actor == 'Lucy'"));
        System.out.println("ACTOR: " + pi.get("actor"));
    }
}
If you need to see the pom file for my project, I can include it, but really I just have the Jython 2.5.0 library installed. I am wondering if I needed to install something else on my system other than having maven install this library for me. I do have Python installed on this computer, and PYTHON_HOME setup in the environment variables. What am I doing wrong?
EDIT: The line:
pi.set("times", times);
...works just fine.
But...
pi.set("greeting", greeting);
does not. I imagine it has something to do with times being a primitive data type and greeting being a String.
I'm trying to create a program to compare the amount of time it takes various Haskell scripts to run, which will later be used to create graphs and displayed in a GUI. I've tried to create said GUI using Haskell libraries, but I haven't had much luck, especially since I'm having trouble finding up-to-date GUI libraries for Windows. I've tried to use Java to get these results, but I either get errors returned or simply no result.
I've constructed a minimal example to show roughly what I'm doing at the moment:
import java.io.*;

public class TestExec {
    public static void main(String[] args) {
        try {
            Process p = Runtime.getRuntime().exec("ghc test.hs 2 2");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(p.getInputStream()));
            String line = null;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
And here is the Haskell script this is calling, in this case a simple addition:
test x y = x + y
Currently there simply isn't any result stored or printed. Anyone have any ideas?
Since you're attempting to run this as an executable, you need to provide a main. In your case it should look something like:
import System.Environment

test :: Integer -> Integer -> Integer
test = (+)

main = do
  [x, y] <- map read `fmap` getArgs
  print $ x `test` y
This just reads the command-line arguments, adds them, then prints the result. Having done something like this a while ago, I'd say it's much easier to do the benchmarking/testing in Haskell and dump the output data to a text file in a more structured format, then parse/display it in Java or whatever language you like.
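For instance, if the Haskell program wrote one name,milliseconds line per benchmark (a CSV format I'm inventing purely for illustration), the Java side could be as simple as:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadResults {
    public static void main(String[] args) throws IOException {
        // Each line is assumed to look like: benchmarkName,12.34
        for (String line : Files.readAllLines(Paths.get("results.csv"))) {
            String[] parts = line.split(",");
            System.out.printf("%s took %s ms%n", parts[0], parts[1]);
        }
    }
}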
This is mostly a Java question. Search for Runtime.getRuntime().exec().
On the Haskell side, you need to write a stand-alone Haskell script. The one by @jozefg is OK. You should be able to run it as
runghc /path/to/script.hs 1 2
from the command line.
Calling it from Java is no different than running any other external process in Java. In Clojure (a JVM language, I use it for brevity) I do:
user=> (def p (-> (Runtime/getRuntime) (.exec "/usr/bin/runghc /tmp/test.hs 1 2")))
#'user/p
user=> (-> p .getInputStream input-stream reader line-seq)
("3")
Please note that I use runghc to run a script (not ghc). Full paths are not necessary, but could be helpful. Your Java program can be modified this way:
--- TestExec.question.java
+++ TestExec.java
@@ -2,7 +2,7 @@
public class TestExec {
public static void main(String[] args) {
try {
- Process p = Runtime.getRuntime().exec("ghc test.hs 2 2");
+ Process p = Runtime.getRuntime().exec("/usr/bin/runghc /tmp/test.hs 2 2");
BufferedReader in = new BufferedReader(
new InputStreamReader(p.getInputStream()));
String line = null;
The modified version runs the Haskell script just fine. You may have to change the paths to your runghc and test.hs locations.
Also note that, to see why nothing is printed, you should read the process's error stream as well: GHC writes its diagnostics to stderr, which you can reach with a second InputStreamReader on p.getErrorStream().
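Here is a minimal sketch using ProcessBuilder, which can merge the two streams for you; the command and file names are placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RunScript {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Merge stderr into stdout so compiler/runtime errors are not silently lost.
        ProcessBuilder pb = new ProcessBuilder("runghc", "test.hs", "2", "2");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        System.out.println("exit code: " + p.waitFor());
    }
}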
As I said in a comment, such a benchmark is simply incorrect. When benchmarking, one should eliminate as many side costs as possible. The best solution is to use the criterion package. It produces the nice graphical output you desire.
Small example:
import Criterion
import Criterion.Main
import Criterion.Config
fac 1 = 1
fac n = n * (fac $ n-1)
myConfig = defaultConfig {
             cfgReport = ljust "report.html"
           }

main = defaultMainWith myConfig (return ()) [
         bench "fac 30" $ whnf fac 30
       ]
After execution it produces a file "report.html" with neat interactive plots.
How do I count the number of files in a directory using Java? For simplicity, let's assume that the directory doesn't have any sub-directories.
I know the standard method of :
new File(<directory path>).listFiles().length
But this will effectively go through all the files in the directory, which might take a long time if the number of files is large. Also, I don't care about the actual files in the directory unless their number is greater than some fixed large number (say 5000).
I am guessing, but doesn't the directory (or its i-node in case of Unix) store the number of files contained in it? If I could get that number straight away from the file system, it would be much faster. I need to do this check for every HTTP request on a Tomcat server before the back-end starts doing the real processing. Therefore, speed is of paramount importance.
I could run a daemon every once in a while to clear the directory. I know that, so please don't give me that solution.
Ah... the rationale for not having a straightforward method in Java to do that is file storage abstraction: some filesystems may not have the number of files in a directory readily available... that count may not even have any meaning at all (see for example distributed, P2P filesystems, fs that store file lists as a linked list, or database-backed filesystems...).
So yes,
new File(<directory path>).list().length
is probably your best bet.
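For instance (a sketch; the path is a placeholder, and the null check covers I/O errors and non-directories):

String[] names = new File("/some/dir").list();
int count = (names == null) ? 0 : names.length; // list() avoids building File objects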
Since Java 8, you can do that in three lines:
try (Stream<Path> files = Files.list(Paths.get("your/path/here"))) {
long count = files.count();
}
Regarding the 5000 child nodes and inode aspects:
This method will iterate over the entries, but as Varkhan suggested, you probably can't do better short of playing with JNI or direct system-command calls; and even then, you can never be sure these methods don't do the same thing!
However, let's dig into this a little:
Looking at the JDK 8 source, Files.list exposes a stream backed by an Iterable from Files.newDirectoryStream, which delegates to FileSystemProvider.newDirectoryStream.
On UNIX systems (decompiled sun.nio.fs.UnixFileSystemProvider.class), it loads an iterator: A sun.nio.fs.UnixSecureDirectoryStream is used (with file locks while iterating through the directory).
So, there is an iterator that will loop through the entries here.
Now, let's look to the counting mechanism.
The actual count is performed by the count/sum reducing API exposed by Java 8 streams. In theory, this API can perform parallel operations without much effort (with multithreading). However, the stream is created with parallelism disabled, so it's a no-go...
The good side of this approach is that it won't load the array in memory as the entries will be counted by an iterator as they are read by the underlying (Filesystem) API.
Finally, for the information: conceptually, in a filesystem a directory node is not required to hold the number of files it contains; it can just contain the list of its child nodes (a list of inodes). I'm not an expert on filesystems, but I believe UNIX filesystems work just like that. So you can't assume there is a way to get this information directly (i.e., there can always be some list of child nodes hidden somewhere).
Unfortunately, I believe that is already the best way (although list() is slightly better than listFiles(), since it doesn't construct File objects).
This might not be appropriate for your application, but you could always try a native call (using jni or jna), or exec a platform-specific command and read the output before falling back to list().length. On *nix, you could exec ls -1a | wc -l (note - that's dash-one-a for the first command, and dash-lowercase-L for the second). Not sure what would be right on windows - perhaps just a dir and look for the summary.
Before bothering with something like this I'd strongly recommend you create a directory with a very large number of files and just see if list().length really does take too long. As this blogger suggests, you may not want to sweat this.
I'd probably go with Varkhan's answer myself.
Since you don't really need the total number, and in fact want to perform an action after a certain number (in your case 5000), you can use java.nio.file.Files.newDirectoryStream. The benefit is that you can exit early instead of having to go through the entire directory just to get a count.
public boolean isOverMax() {
    Path dir = Paths.get("C:/foo/bar");
    int i = 0;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path p : stream) {
            // larger than max files, exit early
            if (++i > MAX_FILES) {
                return true;
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return false;
}
The interface doc for DirectoryStream also has some good examples.
If you have directories containing really many (>100,000) files, here is a (non-portable) way to go:
String directoryPath = "a path";
// The -f flag is important: it keeps ls from sorting its output,
// which is way faster.
String[] params = { "/bin/sh", "-c",
        "ls -f " + directoryPath + " | wc -l" };
Process process = Runtime.getRuntime().exec(params);
BufferedReader reader = new BufferedReader(new InputStreamReader(
        process.getInputStream()));
int fileCount = Integer.parseInt(reader.readLine().trim()) - 2; // accounting for .. and .
reader.close();
System.out.println(fileCount);
Using Sigar should help. Sigar has native hooks to get the stats:
new Sigar().getDirStat(dir).getTotal()
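A slightly fuller sketch, assuming the Sigar jar and its native library are available; getDirStat and getTotal are the calls shown above, the rest is glue:

import org.hyperic.sigar.Sigar;
import org.hyperic.sigar.SigarException;

public class DirCount {
    public static void main(String[] args) throws SigarException {
        Sigar sigar = new Sigar();
        try {
            // getDirStat gathers entry counts natively, without listing in Java.
            long total = sigar.getDirStat("/tmp").getTotal();
            System.out.println("entries: " + total);
        } finally {
            sigar.close(); // release native resources
        }
    }
}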
This method works very well for me:
// Recursive method to recover files and folders and to print the information.
// Note: filesCount and allStats are assumed to be fields of the enclosing class.
public static void listFiles(String directoryName) {
    File file = new File(directoryName);
    File[] fileList = file.listFiles(); // List files inside the main dir
    String extension;
    String fileName;
    if (fileList != null) {
        for (int i = 0; i < fileList.length; i++) {
            extension = "";
            if (fileList[i].isFile()) {
                fileName = fileList[i].getName();
                if (fileName.lastIndexOf(".") != -1 && fileName.lastIndexOf(".") != 0) {
                    extension = fileName.substring(fileName.lastIndexOf(".") + 1);
                    System.out.println("THE " + fileName + " has the extension = " + extension);
                } else {
                    extension = "Unknown";
                    System.out.println("extension2 = " + extension);
                }
                filesCount++;
                allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
                        fileList[i].getParent()));
            } else if (fileList[i].isDirectory()) {
                filesCount++;
                extension = "";
                allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
                        fileList[i].getParent()));
                listFiles(String.valueOf(fileList[i]));
            }
        }
    }
}
Unfortunately, as mmyers said, File.list() is about as fast as you are going to get using Java. If speed is as important as you say, you may want to consider doing this particular operation using JNI. You can then tailor your code to your particular situation and filesystem.
// Assumes: import static java.io.File.listRoots; import static java.util.stream.Stream.of;
public void shouldGetTotalFilesCount() {
    Integer reduce = of(listRoots()).parallel().map(this::getFilesCount).reduce(0, (a, b) -> a + b);
}

private int getFilesCount(File directory) {
    File[] files = directory.listFiles();
    return Objects.isNull(files) ? 1 : Stream.of(files)
            .parallel()
            .reduce(0, (Integer acc, File p) -> acc + getFilesCount(p), (a, b) -> a + b);
}
Count files in the directory and all subdirectories:
var path = Path.of("your/path/here");
try (var stream = Files.walk(path)) {
    var count = stream.filter(Files::isRegularFile).count();
}
In Spring Batch I did the below:
private int getFilesCount() throws IOException {
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources("file:" + projectFilesFolder + "/**/input/splitFolder/*.csv");
    return resources.length;
}