I have an ArrayList of filenames (Java) and I want to delete these files using rm:

for (String check : checks) {
    // here I am removing each file by spawning rm
}

but it is time-consuming. Can I do batching using xargs, or is there something else that can help delete the files faster?
Don’t use rm. Use Java.
As others have pointed out, spawning a process is much slower than doing it in your program. Also, it’s just bad design to use a system command for something that can be done in code.
And finally, by using Files.delete, you gain platform independence. Your code is now write once, run anywhere!
Here’s what your loop would look like:
import java.nio.file.Files;
import java.nio.file.Path;

for (String check : checks) {
    Files.delete(Path.of(check)); // throws IOException if the file can't be deleted
}
This is essentially the same as what rm would do anyway.
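If some of the files might already be gone, a variant with Files.deleteIfExists avoids the NoSuchFileException; a minimal sketch (the helper name is mine, not from the answer):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Deletes every file in the list; deleteIfExists simply returns false
// for entries that no longer exist instead of throwing.
static void deleteAll(List<String> checks) throws IOException {
    for (String check : checks) {
        Files.deleteIfExists(Path.of(check));
    }
}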
I need some coding advice. I've written a simple Java program which runs within one of my company's internal websites. It's a command-line program where users enter commands like copy file1 to remote1, compare file1 to file2 archive diff, and so on. My colleagues like it and use it frequently.
The problem is the users' commands tend to be long and repetitive. They have asked me if it is possible to implement a command history in the program. This way, they can use the up arrow or something to scroll through, edit, and resubmit previous commands. There's no need to remember commands entered in previous sessions, only the current session.
I've thought about it, and I think I could come up with a solution from scratch... but it would take a few weeks to develop. I'd much rather implement an available package or module, if one exists and isn't too much trouble. I know this is an open-ended question, but can anyone recommend such a resource?
I don't see why it should take so long to develop. Here's a rudimentary solution:
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

class CommandSession {

    private final List<Command> commands = new ArrayList<>();
    private ListIterator<Command> scroller;

    public void execute(Command command) {
        scroller = null;                // a new command resets any scrolling in progress
        commands.add(command);
        command.execute();
    }

    public Command scrollUp() {
        if (scroller == null) {
            // start scrolling from just past the most recent command
            scroller = commands.listIterator(commands.size());
        }
        if (scroller.hasPrevious()) {
            return scroller.previous();
        }
        return null;                    // reached the top of the history
    }
}
You could tweak this in various ways for more advanced functionality, but overall it's a pretty basic concept.
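For instance, assuming Command is a one-method interface (the answer leaves it undefined), a quick usage sketch might look like:

// Hypothetical Command interface; any single-method shape works here.
interface Command {
    void execute();
}

public class HistoryDemo {
    public static void main(String[] args) {
        CommandSession session = new CommandSession();
        session.execute(() -> System.out.println("copy file1 to remote1"));
        session.execute(() -> System.out.println("compare file1 to file2"));

        Command previous = session.scrollUp(); // most recent command first
        if (previous != null) {
            previous.execute();                // resubmit it
        }
    }
}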
If you are using a *nix environment, then rlwrap may be what you're looking for.
rlwrap tries to be completely transparent - you (or your shell) shouldn't notice any difference between command and rlwrap command - except the added readline functionality, of course. This should even hold true when you are re-directing, piping and sending signals from and to command, or when command manipulates its terminal settings.

There are many options to add (programmable) completion, handle multi-line input, colour and re-write prompts. If you don't need them (and you probably don't), you can skip the rest of this manpage.
If not, you can use Cygwin to get access to it.
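Either way, usage is just a matter of prefixing your normal launch command. For example, if the tool is started as java -jar mytool.jar (a hypothetical jar name), you would run:

rlwrap java -jar mytool.jar

and get up-arrow history for free.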
I've used CRaSH in my project. It's basically an SSH shell that the user connects to (it supports username/password authentication too) and yes, it supports command history. Commands are written in Groovy.
I know that I can either catch the NameNotFoundException from a call to PackageManager.getPackageInfo or loop through the PackageInfo list returned by PackageManager.getInstalledPackages to know whether a particular package is installed, but both of these seem either long-winded or ugly. On my personal phone, I have more than 300 packages installed, so I'd hate to have to do that operation every time I need to check. And catching an exception as a means of performing application logic just makes me feel wrong all over. Am I missing an isPackageInstalled method somewhere, or do I just need to implement it myself using one of the above-mentioned techniques? And if the latter, which would be considered the faster and less resource-intensive option?
Since PackageManager.getInstalledPackages() returns a List, you don't need to loop through it manually. You can use List.contains() or List.containsAll() to accomplish the task in one line of code. Of course, this doesn't change the efficiency since both methods likely contain a loop themselves.
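Note that getInstalledPackages() returns a List<PackageInfo>, not a list of names, so a contains() check has to compare the right element type; in practice you would match on the packageName field. A sketch (pm is a PackageManager obtained from a Context, the package name is illustrative, and streams require API level 24+):

// True if any installed package has the given name.
boolean installed = pm.getInstalledPackages(0).stream()
        .anyMatch(info -> "com.example.app".equals(info.packageName));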
If using the API really bugs you, then you might look into a hack involving the following:
Bash shell expression that gets the PM list
Java Runtime expression
Java Pipes and buffers and streams
Java NIO
Java grep
So the bash expression would be:
pm list packages -f | sed 's/^package.//' | awk -F"=" ' { print $2" "$1 } ' | sort
and here is a list of references for handling stdout from 'pm list' in a way that might wind up being faster:
PipedBuffers
NIO/grep
Runtime/streams
Handling a NameNotFoundException should not make you feel "wrong all over", IMHO. According to the documentation, this exception has been thrown when the package does not exist ever since API level 1. Using a try/catch statement here is very similar to using an if/then statement to test for a null value.
In this case it should not be considered a workaround or a hack as you are using the documented and expected return value of an exception to determine if a package exists.
I would assume this method to be faster than iterating through the List returned by getInstalledPackages(). However, I don't know what steps Android takes prior to returning a NameNotFoundException. This would make an interesting benchmark test.
I'm not aware of any other practical method to test for an installed package.
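For reference, the usual helper built on that approach looks something like this (the method name isPackageInstalled is the one the question wished existed):

import android.content.Context;
import android.content.pm.PackageManager;

// Minimal sketch: probe the PackageManager and translate the
// documented NameNotFoundException into a boolean.
public static boolean isPackageInstalled(Context context, String packageName) {
    try {
        context.getPackageManager().getPackageInfo(packageName, 0);
        return true;
    } catch (PackageManager.NameNotFoundException e) {
        return false;
    }
}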
I wrote some benchmarks and tested catching the exception vs. a few different ways of fetching the installed packages and looping through them. Here are my results:
Calling PackageManager.getPackageInfo and catching the NameNotFoundException took between 1 and 4 ms in all cases, whether the requested package was installed or not. I made sure to include both the first call to the PackageManager in a particular run of the app and subsequent calls, just in case the framework caches this information per app launch.
Calling PackageManager.getInstalledPackages and looping through the list took between 1 and 1.5 seconds in all cases.
Calling getPackageInfo and catching the exception to determine if the package isn't installed is by far the faster way to check.
I'm trying to call a Java program (Stanford Chinese Word Segmenter) from within python. The Java program needs to load a large (100M) dictionary file (word list to assist segmentation) which takes 12+ seconds. I was wondering if it is possible to speed up the loading process, and more importantly, how to avoid loading it repeatedly when I need to call the python script multiple times?
Here's the relevant part of the code:
op = subprocess.Popen(['java',
                       '-mx2g',
                       '-cp',
                       'seg.jar',
                       'edu.stanford.nlp.ie.crf.CRFClassifier',
                       '-sighanCorporaDict',
                       'data',
                       '-testFile',
                       filename,
                       '-inputEncoding',
                       'utf-8',
                       '-sighanPostProcessing',
                       'true',
                       'ctb',
                       '-loadClassifier',
                       './data/ctb.gz',
                       '-serDictionary',
                       './data/dict-chris6.ser.gz',
                       '0'],
                      stdout=subprocess.PIPE,
                      stdin=subprocess.PIPE,
                      stderr=subprocess.STDOUT,
                      )
In the above code, './data/ctb.gz' is where the large word list file is loaded from. I think the answer might have something to do with how the process is spawned, but I don't know much about it.
You might be able to use an OS-specific solution here. Most modern operating systems have the ability to mount a partition in memory. For example, on Linux you could do:
mkfs -q /dev/ram1 8192
mkdir -p /ramcache
mount /dev/ram1 /ramcache
Moving the file to that directory would greatly speed up I/O.
There might be many ways to speed up the loading of the word list, but it depends on the details. If IO (disk read speed) is the bottleneck, then a simple way might be to zip the file and use a ZipInputStream to read it - but you would need to benchmark this.
To avoid multiple loading, you probably need to keep the Java process running, and communicate with it from Python via files or sockets, to send it commands, rather than actually launching the Java process each time from Python.
However, both of these require modifying the Java code.
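As a rough illustration of the keep-it-running idea, the Java side could become a small stdin/stdout loop that does the expensive load once. A sketch only; the class name is mine, and the load/segment calls are placeholders since they depend on the segmenter's API:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class SegmenterServer {
    public static void main(String[] args) throws Exception {
        // Expensive one-time setup happens here, e.g. loading ./data/ctb.gz
        // (placeholder: call the segmenter's load routine once).

        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            // Placeholder: segment `line` with the already-loaded model.
            System.out.println(line); // echoed back for illustration
            System.out.flush();
        }
    }
}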
If the Java program produces output as soon as it receives input from the filename named pipe, and you can't change the Java program, then you could keep your Python script running instead and communicate with it via files/sockets, as @DNA suggested for the Java process (the same idea, but the Python program keeps running).
import os
from subprocess import Popen, PIPE

# ...
os.mkfifo(filename)
p = Popen([..., filename, ...], stdout=PIPE)

with open(filename, 'w') as f:
    while True:
        indata = read_input()  # read text to segment from files/sockets, etc.
        f.write(indata)
        # read response from the java process
        outdata = p.stdout.readline()  # you need to figure out when to stop reading
        write_output(outdata)  # write response via files/sockets, etc.
You can run a single instance of the JVM and use named pipes to allow the Python script to communicate with the JVM. This will work assuming that the program executed by the JVM is stateless and responds on its stdout (and perhaps stderr) to requests arriving via its stdin.
Why not track whether the file has already been read on the Python side? I'm not a Python whiz, but I'm sure you could keep some list or map/dictionary of all the files that have been opened so far.
The question is pretty much what is asked in the title.
I have a lot of PNG files created by MapTiler. 24083 files, to be exact. They sit in many folders which are in many folders, i.e. a tree of folders, duh. Thing is, it's a huge waste of time to run PNGCrush on all of those manually.
Does anyone have an algorithm to share for me please? One that could recursively crush all these PNGs?
I have a Windows PC and would prefer to have it in Java or PHP rather than another language (since I already know those well), but something else might be fine too.
Thanks!
You don't need anything special for this; just use the FOR command in the Windows Command Prompt.
Use this line:
FOR /R "yourdir" %f IN (*.png) DO pngcrush "%f" "%f.crushed.png"
The "yourdir" is the root-directory where the input files are stored.
The two %f's at the end:
The first one is the input filename
The second one is the output filename
The -ow option, added in pngcrush 1.7.22, makes the operation in-place:
FOR /R "yourdir" %f IN (*.png) DO pngcrush -ow "%f"
See this page for more information on FOR.
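One caveat (not in the original answer): when this line goes into a .bat script instead of being typed at the prompt, the loop variable must be doubled:

FOR /R "yourdir" %%f IN (*.png) DO pngcrush -ow "%%f"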
The program 'sweep' (http://users.csc.calpoly.edu/~bfriesen/software/files/sweep32.zip) lets you run the same command on all files in a directory recursively.
See: RecursiveIteratorIterator with RecursiveDirectoryIterator and exec (or similar)
With that you can use:
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('%your-top-directory%'));
foreach ($it as $entry) {
    if (strtolower($entry->getExtension()) == 'png') {
        // e.g. crush the file in place (assumes pngcrush is on the PATH)
        exec('pngcrush -ow ' . escapeshellarg($entry->getPathname()));
    }
}
I deal with very large binary files (several GB to multiple TB per file). These files exist in a legacy format, and upgrading requires writing a header to the FRONT of the file. I can create a new file and rewrite the data, but sometimes this can take a long time. I'm wondering if there is any faster way to accomplish this upgrade. The platform is limited to Linux, and I'm willing to use low-level functions (ASM, C, C++) / file-system tricks to make this happen. The primary codebase is Java, and JNI is completely acceptable.
There's no general way to do this natively.
Maybe some file systems provide functions for it (I can't point to a specific one), but your code would then be file-system dependent.
A solution could be that of simulating a file-system: you could store your data on a set of several files, and then provide some functions to open, read and write data as if it was a single file.
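In Java, the reading half of that idea can be sketched with SequenceInputStream, which presents several streams as one logical stream (the file names here are illustrative):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

public class VirtualFileDemo {
    public static void main(String[] args) throws IOException {
        // Read "header.bin" followed by "legacy.dat" as if they were one file,
        // without ever rewriting the multi-TB data file.
        try (InputStream combined = new SequenceInputStream(
                new FileInputStream("header.bin"),
                new FileInputStream("legacy.dat"))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = combined.read(buffer)) != -1) {
                // process the logical file's bytes here
            }
        }
    }
}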
Sounds crazy, but you can store the file data in reverse order, if it is possible to change the function that reads data from the file. In that case you can append new data (in reverse order) at the end of the file. It is just a general idea, so I can't recommend anything in particular.

The code for reversing the current file could look like this:
std::string records;                            // data to append, already in memory
std::ofstream out("data.bin", std::ios::app);   // hypothetical target file, opened for append
// write the bytes in reverse order
std::copy(records.rbegin(), records.rend(), std::ostream_iterator<char>(out));
It depends on what you mean by "filesystem tricks". If you're willing to get down-and-dirty with the filesystem's on-disk format, and the size of the header you want to add is a multiple of the filesystem block size, then you could write a program to directly manipulate the filesystem's on-disk structures (with the filesystem unmounted).
This enterprise is about as hairy as it sounds though - it'd likely only be worth it if you had hundreds of these giant files to process.
I would just use the standard Linux tools to do it.
Writing another application to do it seems like it would be sub-optimal.
cat headerFile oldFile > tmpFile && mv tmpFile oldFile
I know this is an old question, but I hope this helps someone in the future. Similar to simulating a filesystem, you could simply use a named pipe:
mkfifo /path/to/file_to_be_read
{ echo "HEADER"; cat /path/to/source_file; } > /path/to/file_to_be_read
Then, you run your legacy program against /path/to/file_to_be_read, and the input would be:
HEADER
contents of /path/to/source_file
...
This will work as long as the program reads the file sequentially and doesn't do mmap() or rewind() past the buffer.