Generate software configuration - Java

I am working on a project which has some .properties configuration files for the datasource, MQ and some other things. We also have launch shell scripts and user profile scripts. The problem I am facing is that we deploy this software on 5 different environments, and of course the configuration is different for each of them. It is becoming a bit hard to maintain about 30 plain text files with the configuration. Most of them are nearly identical, like shell scripts that differ only in a few path references.
Do you guys know any kind of tool I could integrate into our build script that might grab these properties from a single file or an embedded database and then generate the proper environment configuration? If it could also generate the scripts, that would be even more interesting.
Thanks

Maven provides this out-of-the-box: http://maven.apache.org/guides/mini/guide-building-for-different-environments.html.

I am the maintainer of Config4*, which is a configuration-file parser library available in C++ and Java flavours. Most of the contents of a Config4* configuration file are name=value statements, but you can also reference environment variables and the standard output of external commands such as hostname. You can also have if-then-else statements in a configuration file. For example (keywords are prefixed with "#"):
#if (exec("hostname") #in ["host1", "host2", "host3"]) {
... # set variables to values for production environment
} #elseIf (exec("hostname") #in ["host4", "host5", "host6"]) {
... # set variables to values for staging environment
} #else {
#error "Unknown host";
}
I call this adaptable configuration because a single configuration file can adapt its contents for a variety of hosts, user names, and so on. Config4* provides a trivial way to integrate command-line options with a configuration file, so it is possible to have a configuration file that adapts its contents based on the presence of a command-line option such as -env production or -env staging. For example:
env ?= ""; # set by a command-line option
if (env == "production") {
... # set variables to values for production environment
} #elseIf (env == "staging") {
... # set variables to values for staging environment
} #else {
#error "You must specify '-env production' or '-env staging' as a command-line option";
}
I can think of two possible ways that Config4* might be of help to you.
One option is for you to embed the Config4* parser in your applications. However, although I think that is a good approach when developing new applications, I think it might be tedious to retrofit Config4* to an existing application (not because Config4* is difficult to use, but because you would be modifying existing code that uses, say, the Java properties API or an XML API to use a different API, and such modifications tend to be tedious).
The second option better fits with the specifics of your question. You write template versions of your shell scripts and property files. These template files will use a particular syntax, such as '${variable.name}' to specify where values from a configuration file should be used. You then write a small utility application that reads a template file and a configuration file, performs the required substitutions, and then writes the transformed file to disk. You could run that utility application from your build system.
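For illustration, here is a minimal sketch of such a utility that uses only java.util.Properties rather than Config4* (the command-line arguments and the ${name} placeholder syntax are just assumptions for the example):
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Properties;

// Minimal template instantiator: reads name=value pairs from a properties
// file (args[0]) and replaces ${name} placeholders in a template (args[1]),
// writing the result to args[2]. Error handling is kept minimal for brevity.
public class TemplateFiller {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(args[0])) {
            props.load(in); // e.g. production.properties
        }
        StringBuilder sb = new StringBuilder();
        try (FileReader reader = new FileReader(args[1])) { // the template file
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        String result = sb.toString();
        for (String name : props.stringPropertyNames()) {
            result = result.replace("${" + name + "}", props.getProperty(name));
        }
        try (FileWriter writer = new FileWriter(args[2])) { // the output file
            writer.write(result);
        }
    }
}
You would run it once per template per environment, for example: java TemplateFiller production.properties hello-template.sh hello.sh.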

You could have a look at the newly announced tools4j-config, which lets you handle configuration at runtime rather than at build time.

In a previous answer, I outlined how Config4* could satisfy your needs. I decided to eat my own dog food, so I knocked up a ready-to-compile-and-run Config4*-based application that will do what you want. I am providing the code inline in this answer. Rather than reading the code via the StackOverflow webpage, you might find it easier to copy-and-paste the code into files so you can view it with a text editor.
First, we need a configuration file that defines three variables:
deploymentType (specified as a command-line argument to have the value dev, staging or prod);
files (pairs of template files and output files);
searchAndReplace (pairs of search and replace strings to be applied to the template files to produce the output files). The pairs of strings used depend on the value of deploymentType.
Here is an example of such a file (copy-and-paste this into templates.cfg):
deploymentType ?= ""; # specified with a command-line argument

files = [
    # template file                 output file
    # ---------------------------------------------------
    "log4j-template.properties",    "log4j.properties",
    "hello-template.sh",            "hello.sh",
];

#if (deploymentType == "dev") {
    searchAndReplace = [
        "${db.host}",       "localhost",
        "${db.user}",       "guest",
        "${db.log.level}",  "2",
    ];
} #elseIf (deploymentType == "staging") {
    searchAndReplace = [
        "${db.host}",       exec("hostname"),
        "${db.user}",       getenv("USERNAME"),
        "${db.log.level}",  "0",
    ];
} #elseIf (deploymentType == "prod") {
    searchAndReplace = [
        "${db.host}",       "production.example.com",
        "${db.user}",       getenv("USERNAME"),
        "${db.log.level}",  "0",
    ];
} #else {
    #error "deploymentType must be 'dev', 'staging' or 'prod'";
}
Here is the main line of the application. Cut-and-paste the following into InstantiateTemplateFiles.java:
import java.io.IOException;

import org.config4j.Configuration;
import org.config4j.ConfigurationException;
import org.config4j.SchemaValidator;

public class InstantiateTemplateFiles
{
    public static void main(String[] args)
    {
        Configuration cfg = Configuration.create();
        SchemaValidator sv = new SchemaValidator();
        String[] searchAndReplace;
        String[] files;
        int i;
        String[] schema = new String[] {
            "deploymentType = string",
            "searchAndReplace=table[string,search, string,replace]",
            "files=table[string,template-file, string,output-file]",
        };

        if (args.length != 2) {
            System.err.println("\nusage: java InstantiateTemplateFiles"
                    + " meta-config-file.cfg deploymentType\n");
            System.exit(1);
        }
        try {
            //--------
            // Parse the configuration file, perform schema validation
            // and retrieve the required configuration variables.
            //--------
            cfg.insertString("", "deploymentType", args[1]);
            cfg.parse(args[0]);
            sv.parseSchema(schema);
            sv.validate(cfg, "", "");
            searchAndReplace = cfg.lookupList("", "searchAndReplace");
            files = cfg.lookupList("", "files");

            //--------
            // Do the real work
            //--------
            for (i = 0; i < files.length; i += 2) {
                Util.searchAndReplaceInFile(files[i + 0], files[i + 1],
                        searchAndReplace);
            }
        } catch (IOException ex) {
            System.err.println("\n" + ex.getMessage() + "\n");
            System.exit(1);
        } catch (ConfigurationException ex) {
            System.err.println("\n" + ex.getMessage() + "\n");
            System.exit(1);
        }
    }
}
Finally, here is the code that performs the search-and-replace on files. This code is independent of Config4*, so you might find it useful even if you decide to build a non-Config4*-based utility. Cut-and-paste this code into Util.java:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class Util
{
    public static void searchAndReplaceInFile(
            String inputFile,
            String outputFile,
            String[] searchAndReplacePairs) throws IOException
    {
        String contents = Util.readTextFile(inputFile);
        String modifiedContents = Util.replace(contents, searchAndReplacePairs);
        Util.writeTextFile(outputFile, modifiedContents);
    }

    public static String readTextFile(String fileName) throws IOException
    {
        StringBuffer result = new StringBuffer();
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = in.readLine()) != null) {
            result.append(line).append("\n");
        }
        in.close();
        return result.toString();
    }

    public static void writeTextFile(String fileName, String contents)
            throws IOException
    {
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(fileName)));
        out.print(contents);
        out.close();
    }

    public static String replace(
            String origStr,
            String searchStr,
            String replacementStr)
    {
        StringBuffer result = new StringBuffer();
        int searchStrLen = searchStr.length();
        int currStart = 0;
        int pIndex = origStr.indexOf(searchStr, currStart);
        while (pIndex != -1) {
            result.append(origStr.substring(currStart, pIndex));
            result.append(replacementStr);
            currStart = pIndex + searchStrLen;
            pIndex = origStr.indexOf(searchStr, currStart);
        }
        result.append(origStr.substring(currStart));
        return result.toString();
    }

    public static String replace(
            String origStr,
            String[] searchAndReplacePairs)
    {
        int i;
        SearchAndReplacePair[] pairs =
                new SearchAndReplacePair[searchAndReplacePairs.length / 2];
        for (i = 0; i < searchAndReplacePairs.length; i += 2) {
            pairs[i / 2] = new SearchAndReplacePair(origStr,
                    searchAndReplacePairs[i + 0],
                    searchAndReplacePairs[i + 1]);
        }

        StringBuffer result = new StringBuffer();
        int currIndex = 0;
        SearchAndReplacePair nextPair = findNextPair(origStr, currIndex, pairs);
        while (nextPair != null) {
            String subStr = origStr.substring(currIndex, nextPair.indexOf);
            result.append(subStr);
            result.append(nextPair.replace);
            currIndex = nextPair.indexOf + nextPair.length;
            for (i = 0; i < pairs.length; i++) {
                pairs[i].findNext(currIndex);
            }
            nextPair = findNextPair(origStr, currIndex, pairs);
        }
        result.append(origStr.substring(currIndex));
        return result.toString();
    }

    private static SearchAndReplacePair findNextPair(
            String origStr,
            int currIndex,
            SearchAndReplacePair[] pairs)
    {
        SearchAndReplacePair bestSoFar = null;
        for (int i = 0; i < pairs.length; i++) {
            SearchAndReplacePair item = pairs[i];
            if (item.indexOf == -1) {
                continue;
            }
            if (bestSoFar == null) {
                bestSoFar = item;
                continue;
            }
            if (bestSoFar.indexOf < item.indexOf) {
                continue;
            }
            if (bestSoFar.indexOf > item.indexOf) {
                bestSoFar = item;
                continue;
            }
            if (bestSoFar.length < item.length) {
                bestSoFar = item;
            }
        }
        return bestSoFar;
    }
}

class SearchAndReplacePair
{
    String source;
    String search;
    String replace;
    int length;
    int indexOf;
    int sourceLength;

    public SearchAndReplacePair(String source, String search, String replace)
    {
        this.source = source;
        this.sourceLength = source.length();
        this.search = search;
        this.replace = replace;
        this.length = search.length();
        this.indexOf = source.indexOf(search);
    }

    public void findNext(int fromIndex)
    {
        if (indexOf == -1 || indexOf + 1 == sourceLength) {
            indexOf = -1;
        } else {
            indexOf = source.indexOf(search, fromIndex);
        }
    }
}
Assuming you have downloaded and installed Config4J (from the Config4* website), you can compile the utility with the following:
CLASSPATH=.:/path/to/config4j.jar
export CLASSPATH
javac *.java
Here is an example of running it:
java InstantiateTemplateFiles templates.cfg prod
If the file hello-template.sh looks like:
#!/bin/sh
DB_HOST=${db.host}
DB_USER=${db.user}
DB_LOG_LEVEL=${db.log.level}
echo Hello from $DB_USER at log level $DB_LOG_LEVEL on host $DB_HOST
then the generated hello.sh file will look like:
#!/bin/sh
DB_HOST=production.example.com
DB_USER=cjmchale
DB_LOG_LEVEL=0
echo Hello from $DB_USER at log level $DB_LOG_LEVEL on host $DB_HOST


Compare contents of pdf in Java [duplicate]

How would you write a Java function boolean sameContent(Path file1, Path file2) which determines if the two given paths point to files which store the same content? Of course, first, I would check if the file sizes are the same; this is a necessary condition for storing the same content. But then I'd like to hear your approaches. If the two files are stored on the same hard drive (as in most of my cases), it's probably not the best idea to jump back and forth too many times between the two streams.
This is exactly what the FileUtils.contentEquals method of Apache Commons IO does; the API is here.
Try something like:
File file1 = new File("file1.txt");
File file2 = new File("file2.txt");
boolean isTwoEqual = FileUtils.contentEquals(file1, file2);
It does the following checks before actually comparing the contents:
both files exist
both arguments point to files, not directories
the lengths in bytes are the same (otherwise the contents cannot be equal)
the two paths do not refer to one and the same file
Then it compares the contents.
If you don't want to use any external libraries, then simply read the files into byte arrays and compare them (requires Java 7+):
byte[] f1 = Files.readAllBytes(file1);
byte[] f2 = Files.readAllBytes(file2);
by using Arrays.equals.
If the files are large, then instead of reading the entire files into arrays, you should use BufferedInputStream and read the files chunk-by-chunk as explained here.
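A minimal sketch of that idea, comparing byte-by-byte over buffered streams (the buffering does the chunked reading under the hood; BufferedInputStream's default 8 KB buffer is an arbitrary choice here):
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public static boolean streamsEqual(String path1, String path2) throws IOException {
    try (BufferedInputStream in1 = new BufferedInputStream(new FileInputStream(path1));
         BufferedInputStream in2 = new BufferedInputStream(new FileInputStream(path2))) {
        int b1, b2;
        do {
            b1 = in1.read();
            b2 = in2.read();
            if (b1 != b2) {
                return false; // first differing byte (also catches unequal lengths)
            }
        } while (b1 != -1);
        return true;
    }
}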
Since Java 12 there is the method Files.mismatch, which returns -1 if there is no mismatch in the content of the files. Thus the function would look like the following:
private static boolean sameContent(Path file1, Path file2) throws IOException {
    return Files.mismatch(file1, file2) == -1;
}
If the files are small, you can read both into memory and compare the byte arrays.
If the files are not small, you can either compute the hashes of their content (e.g. MD5 or SHA-1) one after the other and compare the hashes (but this still leaves a very small chance of error), or you can compare their content directly, but for this you have to read the two streams alternately.
Here is an example:
boolean sameContent(Path file1, Path file2) throws IOException {
    final long size = Files.size(file1);
    if (size != Files.size(file2))
        return false;

    if (size < 4096)
        return Arrays.equals(Files.readAllBytes(file1), Files.readAllBytes(file2));

    try (InputStream is1 = Files.newInputStream(file1);
         InputStream is2 = Files.newInputStream(file2)) {
        // Compare byte-by-byte.
        // Note that this can be sped up drastically by reading large chunks
        // (e.g. 16 KBs) but care must be taken as InputStream.read(byte[])
        // does not necessarily read a whole array!
        int data;
        while ((data = is1.read()) != -1)
            if (data != is2.read())
                return false;
    }
    return true;
}
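For the hashing alternative mentioned above, a sketch using MessageDigest could look like the following (SHA-1 is an arbitrary choice; as noted, equal digests are overwhelmingly likely, but not absolutely guaranteed, to mean equal content):
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

static boolean sameContentByHash(Path file1, Path file2)
        throws IOException, NoSuchAlgorithmException {
    return Arrays.equals(digest(file1), digest(file2));
}

// Hashes one file at a time, so the two files are never read simultaneously.
static byte[] digest(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    try (InputStream in = Files.newInputStream(file)) {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
    }
    return md.digest();
}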
This should help you with your problem:
package test;

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class CompareFileContents {

    public static void main(String[] args) throws IOException {
        File file1 = new File("test1.txt");
        File file2 = new File("test2.txt");
        File file3 = new File("test3.txt");

        boolean compare1and2 = FileUtils.contentEquals(file1, file2);
        boolean compare2and3 = FileUtils.contentEquals(file2, file3);
        boolean compare1and3 = FileUtils.contentEquals(file1, file3);

        System.out.println("Are test1.txt and test2.txt the same? " + compare1and2);
        System.out.println("Are test2.txt and test3.txt the same? " + compare2and3);
        System.out.println("Are test1.txt and test3.txt the same? " + compare1and3);
    }
}
If it is for a unit test, then AssertJ provides a method named hasSameContentAs. An example:
Assertions.assertThat(file1).hasSameContentAs(file2)
I know I'm pretty late to the party on this one, but memory-mapped IO is a pretty simple way to do this if you want to use straight Java APIs and no third-party dependencies. It takes only a few calls to open the files, map them, and then compare them using ByteBuffer.equals(Object).
This is probably going to give you the best performance if you expect the particular file to be large because you're offloading a majority of the IO legwork onto the OS and the otherwise highly optimized bits of the JVM (assuming you're using a decent JVM).
Straight from the FileChannel JavaDoc:
For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MemoryMappedCompare {
    public static boolean areFilesIdenticalMemoryMapped(final Path a, final Path b) throws IOException {
        try (final FileChannel fca = FileChannel.open(a, StandardOpenOption.READ);
             final FileChannel fcb = FileChannel.open(b, StandardOpenOption.READ)) {
            final MappedByteBuffer mbba = fca.map(FileChannel.MapMode.READ_ONLY, 0, fca.size());
            final MappedByteBuffer mbbb = fcb.map(FileChannel.MapMode.READ_ONLY, 0, fcb.size());
            return mbba.equals(mbbb);
        }
    }
}
It's >= Java 6 compatible, library-free, and doesn't read the whole content into heap memory at once.
public static boolean sameFile(File a, File b) {
    if (a == null || b == null) {
        return false;
    }
    if (a.getAbsolutePath().equals(b.getAbsolutePath())) {
        return true;
    }
    if (!a.exists() || !b.exists()) {
        return false;
    }
    if (a.length() != b.length()) {
        return false;
    }

    boolean eq = true;
    FileChannel channelA = null;
    FileChannel channelB = null;
    try {
        channelA = new RandomAccessFile(a, "r").getChannel();
        channelB = new RandomAccessFile(b, "r").getChannel();
        long channelsSize = channelA.size();
        ByteBuffer buff1 = channelA.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
        ByteBuffer buff2 = channelB.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
        for (int i = 0; i < channelsSize; i++) {
            if (buff1.get(i) != buff2.get(i)) {
                eq = false;
                break;
            }
        }
    } catch (FileNotFoundException ex) {
        Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        // Close the channels (this also closes the underlying RandomAccessFiles).
        try {
            if (channelA != null) channelA.close();
            if (channelB != null) channelB.close();
        } catch (IOException ignored) {
        }
    }
    return eq;
}
package test;

import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

import static org.junit.Assert.assertEquals;

public class CSVResultDIfference {

    @Test
    public void csvDifference() throws IOException {
        Path file_F = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestX", "yolo2.csv");
        long size_F = Files.size(file_F);

        Path file_I = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestZ", "yolo2.csv");
        long size_I = Files.size(file_I);

        assertEquals(size_F, size_I);
    }
}
it worked for me :)

Get a list of disks to read free space in Java, using Sigar

I need to get the free available disk space for all disks in the system, or all partitions; I don't mind which. (I don't have to use Sigar, but I am already using it on the project for some other processes, so I might as well use it for this too.)
I am using Sigar API and got this
public double getFreeHdd() throws SigarException {
    FileSystemUsage f = sigar.getFileSystemUsage("/");
    return f.getAvail();
}
But this only gives me the system partition (root). How can I get a list of all partitions and loop over them to get their free space?
I tried this
FileSystemView fsv = FileSystemView.getFileSystemView();
File[] roots = fsv.getRoots();
for (int i = 0; i < roots.length; i++) {
    System.out.println("Root: " + roots[i]);
}
But it only returns the root dir
Root: /
Thanks
Edit:
It seems that I could use
FileSystem[] fslist = sigar.getFileSystemList();
but the results I am getting do not match the ones I get from the terminal. On the other hand, on the system I am working on I have 3 disks with a total of 12 partitions, so I might be missing something there. I will try it on some other system in case I can make something useful out of the results.
We use SIGAR extensively for cross-platform monitoring. This is the code we use to get the file system list:
/**
 * @return a list of directory path names of file systems that are local or network - not removable media
 */
public static Set<String> getLocalOrNetworkFileSystemDirectoryNames() {
    Set<String> ret = new HashSet<String>();
    try {
        FileSystem[] fileSystemList = getSigarProxy().getFileSystemList();
        for (FileSystem fs : fileSystemList) {
            if ((fs.getType() == FileSystem.TYPE_LOCAL_DISK) || (fs.getType() == FileSystem.TYPE_NETWORK)) {
                ret.add(fs.getDirName());
            }
        }
    }
    catch (SigarException e) {
        // log or rethrow as appropriate
    }
    return ret;
}
You can then use that as the input to other SIGAR methods:
FileSystemUsage usageStats = getSigarProxy().getFileSystemUsage(fileSystemDirectoryPath);
The getSigarProxy() is just a convenience base method:
// The Humidor handles thread safety for a single instance of a Sigar object
private static final SigarProxy sigarProxy = Humidor.getInstance().getSigar();

protected static final SigarProxy getSigarProxy() {
    return sigarProxy;
}
You can use java.nio.file.FileSystems to get a list of java.nio.file.FileStores and then see the usable/available space. For instance (the java.nio.file API requires Java 7+; the example below also uses java.util.function.Consumer, which requires Java 8):
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.util.function.Consumer;

public static void main(String[] args) {
    FileSystem fs = FileSystems.getDefault();
    fs.getFileStores().forEach(new Consumer<FileStore>() {
        @Override
        public void accept(FileStore store) {
            try {
                System.out.println(store.getTotalSpace());
                System.out.println(store.getUsableSpace());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    });
}
Also, keep in mind that FileStore.getUsableSpace() returns the size in bytes. See the docs for more information.
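If you only need rough per-root numbers and cannot use Sigar, plain java.io.File also offers this since Java 6; note that on Linux, File.listRoots() typically returns only /, so the FileStore approach above gives a more complete picture:
import java.io.File;

public class DiskFree {
    public static void main(String[] args) {
        // Iterate the filesystem roots known to java.io.File.
        for (File root : File.listRoots()) {
            System.out.println(root + "  total=" + root.getTotalSpace()
                    + "  free=" + root.getFreeSpace()
                    + "  usable=" + root.getUsableSpace());
        }
    }
}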

Apache Library to Standard Java

I have a problem. I wrote this code that reads a string from a txt file: the first method extracts an int and the second extracts a particular string. The code already works, but it uses the Apache library; now I want to rewrite it using only standard Java libraries. I have tried, but I ran into problems. Could someone help me? Thank you very much.
package ausiliare;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.*;

public class Read {

    public static int getInt() throws IOException {
        File folder = new File("C:\\Solution.txt");
        String content = FileUtils.readFileToString(folder) + "\n";
        int outside = Integer.parseInt(content.substring(0,
                content.indexOf("[")).trim());
        return outside;
    }

    public static String getString() throws IOException {
        File folder = new File("C:\\Solution.txt");
        String content = FileUtils.readFileToString(folder) + "\n";
        String remainingString = content.substring(content.indexOf(" ["),
                content.lastIndexOf("]") + 1);
        // System.out.println(remainingString);
        return remainingString;
    }

    public static String[] arg() throws IOException {
        String[] strArray = getString().split(" ");
        // System.out.println(Arrays.toString(strArray));
        return strArray;
    }
}
Ps: The input file is txt (for example):
50 [8,24,-22] [-8,34,12] [19,14,47] [-49,32,44] [-41,16,-6] [-49,-11,43]
where the first method extracts the int 50 and the second method extracts the remaining bracketed groups.
content = new String(Files.readAllBytes(folder.toPath()), StandardCharsets.UTF_8);
The missing piece is knowledge of the java.nio.file.Files class. There is also a List<String> readAllLines method. The charset parameter of the String constructor is optional and defaults to the operating system's default encoding, which is not very portable to other computers.
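Putting that together, a sketch of the class rewritten without Apache Commons might look like this (same file path and parsing logic as the original; UTF-8 is assumed as the file encoding):
package ausiliare;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Read {

    // Mirrors FileUtils.readFileToString, but with an explicit charset.
    private static String readContent() throws IOException {
        return new String(Files.readAllBytes(Paths.get("C:\\Solution.txt")),
                StandardCharsets.UTF_8) + "\n";
    }

    public static int getInt() throws IOException {
        String content = readContent();
        return Integer.parseInt(content.substring(0, content.indexOf("[")).trim());
    }

    public static String getString() throws IOException {
        String content = readContent();
        return content.substring(content.indexOf(" ["), content.lastIndexOf("]") + 1);
    }

    public static String[] arg() throws IOException {
        return getString().split(" ");
    }
}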

Problems with the import statement

I've created the following java file:
import java.awt.*;

public class Text {
    public static void main(String[] args) {
        String str = "I AM A SENTENCE";
        String[] lines = wrap(str, 5);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i] != null) System.out.println(lines[i]);
        }
        Font myFont = new Font("Impact", Font.BOLD, 36);
        // charToPixel is defined elsewhere in the class; not shown in the question.
        System.out.println(String.valueOf(charToPixel(str, myFont)));
    }

    public static String[] wrap(String str, int w) {
        char[] string = str.toCharArray();
        System.out.println("string.length: " + String.valueOf(string.length));
        int charCounter = 0;
        String[] line = new String[20];
        String work = "";
        int x = 0;
        for (int i = 0; i < string.length; i++) {
            charCounter++;
            System.out.println("charCounter: " + String.valueOf(charCounter));
            System.out.println("i: " + string[i]);
            if (charCounter > w) {
                charCounter = 0;
                System.out.println(String.valueOf(x));
                line[x] = work;
                x++;
                work = "";
                i--;
            }
            else {
                work += string[i];
            }
        }
        line[x] = work;
        return line;
    }
}
Now, I also created a simple applet that I want to use to receive the String[] and then output its elements one by one using Graphics.drawString().
I created a .jar file using the default manifest and the previous class file. The class file's directory is as follows within the jar: Dennis\Text.class.
I added my jar into the CLASSPATH.
I used the import statement as follows: import Dennis.*;
However, when I compile the applet (btw, Text.class had compiled perfectly), I get the following compilation error:
bad class file: B:\Apps\Java\JDK\lib\Text.jar(Dennis/Text.class)
class file contains wrong class: Text
Please remove or make sure it appears in the correct subdirectory of the classpath.
As far as I can tell, I put everything in the right place and the import statement was successful.
So what am I doing wrong?
The class file's directory is as follows within the jar: Dennis\Text.class.
It shouldn't be. It's not in any package, so it should just be directly within the root directory of the jar file. Ideally put it within a package (not Dennis, which violates Java naming conventions) and then make your jar file structure match the package structure.
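For example, the conventional layout would be something like this (the lowercase package name dennis is just an illustration):
// File: dennis/Text.java - the source directory now matches the package,
// so the jar contains dennis/Text.class and callers can write: import dennis.Text;
package dennis;

public class Text {
    // ... the wrap(...) and main(...) methods from the question go here unchanged ...
}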

Is there a workaround for Java's poor performance on walking huge directories?

I am trying to process files one at a time that are stored over a network. Reading the files is fast thanks to buffering, so that is not the issue. The problem I have is just listing the directories in a folder. I have at least 10k files per folder, over many folders.
Performance is super slow since File.list() returns an array instead of an iterable: Java goes off and collects all the names in a folder and packs them into an array before returning.
The bug entry for this is http://bugs.sun.com/view_bug.do?bug_id=4285834 and doesn't have a workaround. They just say this has been fixed for JDK7.
A few questions:
Does anybody have a workaround to this performance bottleneck?
Am I trying to achieve the impossible? Is performance still going to be poor even if it just iterates over the directories?
Could I use the beta JDK7 builds that have this functionality without having to build my entire project on it?
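For reference, the JDK7 API that addresses this is java.nio.file.DirectoryStream, which hands back entries lazily instead of materializing the whole array. A minimal sketch:
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LazyList {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        // Entries are fetched incrementally as the iterator advances,
        // so the full 10k-name array is never built up front.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                System.out.println(entry.getFileName());
            }
        }
    }
}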
Although it's not pretty, I solved this kind of problem once by piping the output of dir/ls to a file before starting my app, and passing in the filename.
If you needed to do it within the app, you could just use Runtime.exec(), but it would create some nastiness.
You asked. The first form is going to be blazingly fast, the second should be pretty fast as well.
Be sure to do the one item per line (bare, no decoration, no graphics), full path and recurse options of your selected command.
EDIT:
30 minutes just to get a directory listing, wow.
It just struck me that if you use exec(), you can get its stdout redirected into a pipe instead of writing it to a file.
If you did that, you should start getting the files immediately and be able to begin processing before the command has completed.
The interaction may actually slow things down, but maybe not--you might give it a try.
Wow, I just went to find the syntax of the .exec command for you and came across this, possibly exactly what you want (it lists a directory using exec and "ls" and pipes the result into your program for processing): good link in wayback (Jörg provided in a comment to replace this one from sun that Oracle broke)
Anyway, the idea is straightforward but getting the code right is annoying. I'll go steal some codes from the internets and hack them up--brb
/**
 * Note: Only use this as a last resort! It's specific to Windows and even
 * at that it's not a good solution, but it should be fast.
 *
 * To use it, extend FileProcessor and call processFiles("...") with a list
 * of options if you want them, like /s... I highly recommend /b
 *
 * Override processFile and it will be called once for each line of output.
 */
import java.io.*;

public abstract class FileProcessor
{
    public void processFiles(String dirOptions)
    {
        Process theProcess = null;
        BufferedReader inStream = null;

        // invoke the dir command
        try
        {
            theProcess = Runtime.getRuntime().exec("cmd /c dir " + dirOptions);
        }
        catch (IOException e)
        {
            System.err.println("Error on exec() method");
            e.printStackTrace();
        }

        // read from the called program's standard output stream,
        // one line (i.e. one file name) at a time
        try
        {
            inStream = new BufferedReader(
                    new InputStreamReader(theProcess.getInputStream()));
            String line;
            while ((line = inStream.readLine()) != null)
            {
                processFile(line);
            }
        }
        catch (IOException e)
        {
            System.err.println("Error on inStream.readLine()");
            e.printStackTrace();
        }
    } // end method

    /** Override this method--it will be called once for each file */
    public abstract void processFile(String filename);
} // end class
And thank you code donor at IBM
How about using the File.list(FilenameFilter filter) method and implementing FilenameFilter.accept(File dir, String name) to process each file and return false?
I ran this on a Linux VM for a directory with 10K+ files and it took <10 seconds.
import java.io.File;
import java.io.FilenameFilter;

public class Temp {
    private static void processFile(File dir, String name) {
        File file = new File(dir, name);
        System.out.println("processing file " + file.getName());
    }

    private static void forEachFile(File dir) {
        String[] ignore = dir.list(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                processFile(dir, name);
                return false;
            }
        });
    }

    public static void main(String[] args) {
        long before, after;
        File dot = new File(".");
        before = System.currentTimeMillis();
        forEachFile(dot);
        after = System.currentTimeMillis();
        System.out.println("after call, delta is " + (after - before));
    }
}
An alternative is to have the files served over a different protocol. As I understand it, you're using SMB and Java is just trying to list them as regular files.
The problem here might not be Java alone (how does it behave when you open that directory with Windows Explorer, x:\shared? In my experience it also takes a considerable amount of time).
You can change the protocol to something like HTTP, just to fetch the file names. This way you can retrieve the list of files over HTTP (10k lines shouldn't be too much) and let the server deal with the file listing. This would be very fast, since it will run with local resources (those on the server).
Then, when you have the list, you can process the files one by one exactly the way you're doing right now.
The key point is to have a helper mechanism on the other side of the connection.
Is this feasible?
Today:
File[] content = new File("X:\\remote\\dir").listFiles();
for (File f : content) {
    process(f);
}
Proposed:
String[] content = fetchViaHttpTheListNameOf("x:\\remote\\dir");
for (String fileName : content) {
    process(new File(fileName));
}
The HTTP server could be very small and simple.
If this is the way you have it right now, what you're doing is fetching all the information about the 10k files to your client machine (I don't know how much of that info) when you only need the file names for later processing.
If the processing is very fast right now, it may be slowed down a bit, because the information that used to be prefetched is no longer available.
Give it a try.
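A sketch of what the hypothetical fetchViaHttpTheListNameOf helper could look like, assuming a server-side script that prints one file name per line (the URL here is made up):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

static String[] fetchViaHttpTheListNameOf(String dir) throws IOException {
    // Hypothetical endpoint on the file server that does the listing locally.
    URL url = new URL("http://fileserver/list?dir=" + dir);
    List<String> names = new ArrayList<String>();
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    try {
        String line;
        while ((line = in.readLine()) != null) {
            names.add(line);
        }
    } finally {
        in.close();
    }
    return names.toArray(new String[names.size()]);
}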
A non-portable solution would be to make native calls to the operating system and stream the results.
For Linux
You can look at something like readdir. You can walk the directory structure like a linked list and return results in batches or individually.
For Windows
In Windows the behavior would be fairly similar using the FindFirstFile and FindNextFile APIs.
I doubt the problem is related to the bug report you referenced.
The issue there is "only" memory usage, not necessarily speed.
If you have enough memory, the bug is not relevant to your problem.
You should measure whether your problem is memory-related or not. Turn on your garbage collector log and use, for example, gcviewer to analyze your memory usage.
I suspect that it has to do with the SMB protocol causing the problem.
You can try to write a test in another language and see if it's faster, or you can try to get the list of filenames through some other method, such as described here in another post.
If you need to eventually process all files, then having Iterable over String[] won't give you any advantage, as you'll still have to go and fetch the whole list of files.
If you're on Java 1.5 or 1.6, shelling out "dir" commands and parsing the standard output stream on Windows is a perfectly acceptable approach. I've used this approach in the past for processing network drives and it has generally been a lot faster than waiting for the native java.io.File listFiles() method to return.
Of course, a JNI call should be faster and potentially safer than shelling out "dir" commands. The following JNI code can be used to retrieve a list of files/directories using the Windows API. This function can easily be refactored into a new class so the caller can retrieve file paths incrementally (i.e. get one path at a time). For example, you can refactor the code so that FindFirstFileW is called in a constructor and have a separate method to call FindNextFileW.
JNIEXPORT jstring JNICALL Java_javaxt_io_File_GetFiles(JNIEnv *env, jclass, jstring directory)
{
    HANDLE hFind;
    try {
        // Convert jstring to wstring
        const jchar *_directory = env->GetStringChars(directory, 0);
        jsize x = env->GetStringLength(directory);
        wstring path; // L"C:\\temp\\*";
        path.assign(_directory, _directory + x);
        env->ReleaseStringChars(directory, _directory);
        if (x < 2) {
            jclass exceptionClass = env->FindClass("java/lang/Exception");
            env->ThrowNew(exceptionClass, "Invalid path, less than 2 characters long.");
        }

        wstringstream ss;
        BOOL bContinue = TRUE;
        WIN32_FIND_DATAW data;
        hFind = FindFirstFileW(path.c_str(), &data);
        if (INVALID_HANDLE_VALUE == hFind) {
            jclass exceptionClass = env->FindClass("java/lang/Exception");
            env->ThrowNew(exceptionClass, "FindFirstFileW returned invalid handle.");
        }

        //HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
        //DWORD dwBytesWritten;

        // If we have no error, loop thru the files in this dir
        while (hFind && bContinue) {
            /*
            // Debug print statement. DO NOT DELETE! cout and wcout do not print unicode correctly.
            WriteConsole(hStdOut, data.cFileName, (DWORD)_tcslen(data.cFileName), &dwBytesWritten, NULL);
            WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);
            */

            // Check if this entry is a directory
            if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                // Make sure this dir is not . or ..
                if (wstring(data.cFileName) != L"." &&
                    wstring(data.cFileName) != L"..")
                {
                    ss << wstring(data.cFileName) << L"\\" << L"\n";
                }
            }
            else {
                ss << wstring(data.cFileName) << L"\n";
            }
            bContinue = FindNextFileW(hFind, &data);
        }
        FindClose(hFind); // Free the dir structure

        wstring cstr = ss.str();
        int len = cstr.size();
        //WriteConsole(hStdOut, cstr.c_str(), len, &dwBytesWritten, NULL);
        //WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);

        jchar* raw = new jchar[len];
        memcpy(raw, cstr.c_str(), len * sizeof(wchar_t));
        jstring result = env->NewString(raw, len);
        delete[] raw;
        return result;
    }
    catch (...) {
        FindClose(hFind);
        jclass exceptionClass = env->FindClass("java/lang/Exception");
        env->ThrowNew(exceptionClass, "Exception occured.");
    }
    return NULL;
}
Credit:
https://sites.google.com/site/jozsefbekes/Home/windows-programming/miscellaneous-functions
Even with this approach, there are still efficiencies to be gained. If you serialize the path to a java.io.File, there is a huge performance hit - especially if the path represents a file on a network drive. I have no idea what Sun/Oracle is doing under the hood, but if you need additional file attributes other than the file path (e.g. size, mod date, etc.), I have found that the following JNI function is much faster than instantiating a java.io.File object on a network path.
JNIEXPORT jlongArray JNICALL Java_javaxt_io_File_GetFileAttributesEx(JNIEnv *env, jclass, jstring filename)
{
    // Convert jstring to wstring
    const jchar *_filename = env->GetStringChars(filename, 0);
    jsize len = env->GetStringLength(filename);
    wstring path;
    path.assign(_filename, _filename + len);
    env->ReleaseStringChars(filename, _filename);

    // Get attributes
    WIN32_FILE_ATTRIBUTE_DATA fileAttrs;
    BOOL result = GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &fileAttrs);
    if (!result) {
        jclass exceptionClass = env->FindClass("java/lang/Exception");
        env->ThrowNew(exceptionClass, "Exception Occurred");
    }

    // Create an array to store the WIN32_FILE_ATTRIBUTE_DATA
    jlong buffer[6];
    buffer[0] = fileAttrs.dwFileAttributes;
    buffer[1] = date2int(fileAttrs.ftCreationTime);
    buffer[2] = date2int(fileAttrs.ftLastAccessTime);
    buffer[3] = date2int(fileAttrs.ftLastWriteTime);
    buffer[4] = fileAttrs.nFileSizeHigh;
    buffer[5] = fileAttrs.nFileSizeLow;

    jlongArray jLongArray = env->NewLongArray(6);
    env->SetLongArrayRegion(jLongArray, 0, 6, buffer);
    return jLongArray;
}
You can find a full working example of this JNI-based approach in the javaxt-core library. In my tests using Java 1.6.0_38 with a Windows host hitting a Windows share, I found this JNI approach approximately 10x faster than calling java.io.File listFiles() or shelling out "dir" commands.
I wonder why there are 10k files in a directory. Some file systems do not work well with so many files. There are specific limitations for file systems, like the maximum number of files per directory and the maximum depth of subdirectories.
I solved a similar problem with an iterator solution.
I needed to walk across huge directories and several levels of directory tree recursively.
I tried FileUtils.iterateFiles() of Apache Commons IO, but it implements the iterator by adding all the files to a List and then returning List.iterator(). That's very bad for memory.
So I preferred to write something like this:
private static class SequentialIterator implements Iterator<File> {
    private DirectoryStack dir = null;
    private File current = null;
    private long limit;
    private FileFilter filter = null;
    private long count = 0;
    private boolean isMore = true;

    public SequentialIterator(String path, long limit, FileFilter ff) {
        current = new File(path);
        this.limit = limit;
        filter = ff;
        dir = DirectoryStack.getNewStack(current);
    }

    public boolean hasNext() {
        while (walkOver());
        return isMore && (limit > count || limit < 0) && dir.getCurrent() != null;
    }

    public File next() {
        File aux = dir.getCurrent();
        dir.advancePostition();
        count++;
        return aux;
    }

    private boolean walkOver() {
        if (dir.isOutOfDirListRange()) {
            if (dir.isCantGoParent()) {
                isMore = false;
                return false;
            } else {
                dir.goToParent();
                dir.advancePostition();
                return true;
            }
        } else {
            if (dir.isCurrentDirectory()) {
                if (dir.isDirectoryEmpty()) {
                    dir.advancePostition();
                } else {
                    dir.goIntoDir();
                }
                return true;
            } else {
                if (filter.accept(dir.getCurrent())) {
                    return false;
                } else {
                    dir.advancePostition();
                    return true;
                }
            }
        }
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
Note that the iterator stops after a given number of files iterated, and it takes a FileFilter as well.
And DirectoryStack is:
public class DirectoryStack {

    private class Element {
        private File files[] = null;
        private int currentPointer;

        public Element(File current) {
            currentPointer = 0;
            if (current.exists()) {
                if (current.isDirectory()) {
                    files = current.listFiles();
                    Set<File> set = new TreeSet<File>();
                    for (int i = 0; i < files.length; i++) {
                        File file = files[i];
                        set.add(file);
                    }
                    set.toArray(files);
                } else {
                    throw new IllegalArgumentException("File current must be directory");
                }
            } else {
                throw new IllegalArgumentException("File current not exist");
            }
        }

        public String toString() {
            return "current=" + getCurrent().toString();
        }

        public int getCurrentPointer() {
            return currentPointer;
        }

        public void setCurrentPointer(int currentPointer) {
            this.currentPointer = currentPointer;
        }

        public File[] getFiles() {
            return files;
        }

        public File getCurrent() {
            File ret = null;
            try {
                ret = getFiles()[getCurrentPointer()];
            } catch (Exception e) {
            }
            return ret;
        }

        public boolean isDirectoryEmpty() {
            return !(getFiles().length > 0);
        }

        public Element advancePointer() {
            setCurrentPointer(getCurrentPointer() + 1);
            return this;
        }
    }

    private DirectoryStack(File first) {
        getStack().push(new Element(first));
    }

    public static DirectoryStack getNewStack(File first) {
        return new DirectoryStack(first);
    }

    public String toString() {
        String ret = "stack:\n";
        int i = 0;
        for (Element elem : stack) {
            ret += "level " + i++ + elem.toString() + "\n";
        }
        return ret;
    }

    private Stack<Element> stack = null;

    private Stack<Element> getStack() {
        if (stack == null) {
            stack = new Stack<Element>();
        }
        return stack;
    }

    public File getCurrent() {
        return getStack().peek().getCurrent();
    }

    public boolean isDirectoryEmpty() {
        return getStack().peek().isDirectoryEmpty();
    }

    public DirectoryStack downLevel() {
        getStack().pop();
        return this;
    }

    public DirectoryStack goToParent() {
        return downLevel();
    }

    public DirectoryStack goIntoDir() {
        return upLevel();
    }

    public DirectoryStack upLevel() {
        if (isCurrentNotNull())
            getStack().push(new Element(getCurrent()));
        return this;
    }

    public DirectoryStack advancePostition() {
        getStack().peek().advancePointer();
        return this;
    }

    public File[] peekDirectory() {
        return getStack().peek().getFiles();
    }

    public boolean isLastFileOfDirectory() {
        return getStack().peek().getFiles().length <= getStack().peek().getCurrentPointer();
    }

    public boolean gotMoreLevels() {
        return getStack().size() > 0;
    }

    public boolean gotMoreInCurrentLevel() {
        return getStack().peek().getFiles().length > getStack().peek().getCurrentPointer() + 1;
    }

    public boolean isRoot() {
        return !(getStack().size() > 1);
    }

    public boolean isCurrentNotNull() {
        if (!getStack().isEmpty()) {
            int currentPointer = getStack().peek().getCurrentPointer();
            int maxFiles = getStack().peek().getFiles().length;
            return currentPointer < maxFiles;
        } else {
            return false;
        }
    }

    public boolean isCurrentDirectory() {
        return getStack().peek().getCurrent().isDirectory();
    }

    public boolean isLastFromDirList() {
        return getStack().peek().getCurrentPointer() == (getStack().peek().getFiles().length - 1);
    }

    public boolean isCantGoParent() {
        return !(getStack().size() > 1);
    }

    public boolean isOutOfDirListRange() {
        return getStack().peek().getFiles().length <= getStack().peek().getCurrentPointer();
    }
}
Using an Iterable doesn't imply that the files will be streamed to you. In fact it's usually the opposite, so an array is typically faster than an Iterable.
Are you sure it's due to Java, and not just a general problem with having 10k entries in one directory, particularly over the network?
Have you tried writing a proof-of-concept program to do the same thing in C using the Win32 FindFirstFile/FindNextFile functions to see whether it's any faster?
I don't know the ins and outs of SMB, but I strongly suspect that it needs a round trip for every file in the list - which is not going to be fast, particularly over a network with moderate latency.
Having 10k strings in an array sounds like something which should not tax a modern Java VM too much either.
