Compare files list to object List to delete files


I have a list of plans, and each plan has a PDF in "/web/managed/".
I wasn't deleting the files when I deleted a plan, so now I'm trying to add a function that deletes all files whose IDs are not in my plan list.
The file name always contains the plan ID.
Example: 6365_Test-LVLD.pdf
Here is how I get the list of objects:
#Transaction
public List<StorePlan> getPlans() {
List<StorePlan> list = getCurrentSession().createCriteria(StorePlan.class).list();
return list;
}
Then I get all the files from my folder:
protected File[] getPDFs() {
return new File("/web/managed/").listFiles();
}
Here's my purge function:
protected void getPlanIds() {
int count = 0;
for(StorePlan plan : storePlanDao.getPlans()) {
for (File file : getPDFs()) {
String planFileId = file.getName().substring(0, 4);
if(plan.getId() != Integer.valueOf(planFileId)) {
file.delete();
count++;
}
}
}
}
With my code, everything in my folder gets deleted, but I want to keep the files whose IDs are still in the plan list.

If I understood your question correctly, this should work:
int count = 0;
List<Integer> planIds = Lists.newArrayList(); // Guava; new ArrayList<>() works too
for(StorePlan plan : storePlanDao.getPlans()){
planIds.add(plan.getId());
}
for (File file : getPDFs()) {
Integer planFileId = Integer.valueOf(file.getName().substring(0, 4));
if(!planIds.contains(planFileId)) {
file.delete();
count++;
}
}
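A sketch of the same idea that does not depend on the ID being exactly four characters long. It assumes the ID is everything before the first underscore in the file name (as in 6365_Test-LVLD.pdf). The class and method names are hypothetical, and the plan IDs are passed in as a plain collection instead of being loaded through storePlanDao.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original code.
class OrphanPdfFinder {

    // Returns the numeric prefix before the first '_', or null when the
    // name does not follow the assumed "<number>_<rest>" pattern.
    static Integer parsePlanId(String fileName) {
        int sep = fileName.indexOf('_');
        if (sep <= 0) {
            return null;
        }
        try {
            return Integer.valueOf(fileName.substring(0, sep));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Collects the files whose ID is not in planIds. Files without a
    // parsable ID are left alone rather than reported for deletion.
    static List<File> findOrphans(File[] files, Collection<Integer> planIds) {
        Set<Integer> known = new HashSet<>(planIds); // constant-time lookups
        List<File> orphans = new ArrayList<>();
        if (files == null) { // listFiles() returns null for an unreadable folder
            return orphans;
        }
        for (File file : files) {
            Integer id = parsePlanId(file.getName());
            if (id != null && !known.contains(id)) {
                orphans.add(file);
            }
        }
        return orphans;
    }
}
```

The caller can then loop over the returned list, call delete() on each file, and count the successful deletions.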

I think I see the problem. Instead of deleting the file inside the inner loop, set a boolean to true when a match is found and break out of the loop. After the inner loop, an if statement deletes the file if no match was found. For this to work, the outer loop has to run over the files and the inner loop over the plans. So:
protected void getPlanIds() {
int count = 0;
List<StorePlan> plans = storePlanDao.getPlans();
for (File file : getPDFs()) {
int planFileId = Integer.parseInt(file.getName().substring(0, 4));
boolean found = false;
for (StorePlan plan : plans) {
if (plan.getId() == planFileId) {
found = true;
break;
}
}
if (!found) {
// no plan has this ID, so the file is orphaned
file.delete();
count++;
}
}
}
I apologize for bad formatting. I'm on mobile and passing time at work. xD

Related

Value inserted into a Map does not remain there in the second loop of a cycle

The function below copies all files with a given extension from rootDirectory into the given destination. It works well when the file names differ, but when there are two files with the same name (see the recursive call: one can be in a subdirectory), it does not do what it should. If there are several files with the same name, it should copy all of them and rename the later ones (adding _1, _2,... to the name).
I see there might be a problem with the Map I am using. Every time a file is copied, I want to save its name and a counter that records how many times it has been copied (so the appropriate number can be added to its name). Could you please help me fix the problem?
void copy(File rootDirectory, String destination, String fileExtension) {
File destFile = new File(destination);
HashMap<String, Integer> counter = new HashMap<>();
for (File file : rootDirectory.listFiles()) {
try {
if (file.isDirectory()) { copy(file, destination, fileExtension);
} else if (getExtension(file.getPath().toLowerCase()).equals(fileExtension.toLowerCase())) {
if (!destFile.exists()) { destFile.mkdirs();}
String fileName = file.getName();
if(counter.containsKey(fileName)){ // <<-- IS NEVER TRUE
int count = counter.get(fileName);
count++;
counter.put(fileName, count);
int i = fileName.contains(".") ? fileName.lastIndexOf('.') : fileName.length();
fileName = fileName.substring(0, i) + "_" + count + fileName.substring(i);
} else{ counter.put(fileName, 0);
}
Files.copy(file.toPath(), Paths.get(destination + "\\" + fileName), StandardCopyOption.REPLACE_EXISTING);
}
} catch (IOException e) {
//...
}
}
}

You are using recursion. In other words you always start from a new empty Map. Put the map outside of your method and that will solve your problem.
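A minimal sketch of that fix. Instead of a field, it creates the map once in an entry method and passes the same instance down through the recursion, which has the same effect. The class name FlatCopier and the helper uniqueName are hypothetical, and the extension is assumed to be given without a leading dot because the original getExtension method is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; a sketch, not the asker's actual code.
class FlatCopier {

    // Entry point: the counter is created once and shared by every recursive call.
    static void copy(File rootDirectory, Path destination, String fileExtension) throws IOException {
        Files.createDirectories(destination);
        copy(rootDirectory, destination, fileExtension.toLowerCase(), new HashMap<>());
    }

    private static void copy(File dir, Path destination, String ext,
                             Map<String, Integer> counter) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File file : children) {
            if (file.isDirectory()) {
                copy(file, destination, ext, counter); // same map, not a new one
            } else if (file.getName().toLowerCase().endsWith("." + ext)) {
                String target = uniqueName(file.getName(), counter);
                Files.copy(file.toPath(), destination.resolve(target),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // The first occurrence keeps its name; later ones get _1, _2, ... before the extension.
    static String uniqueName(String fileName, Map<String, Integer> counter) {
        int seenBefore = counter.merge(fileName, 1, Integer::sum) - 1;
        if (seenBefore == 0) {
            return fileName;
        }
        int dot = fileName.lastIndexOf('.');
        int i = dot >= 0 ? dot : fileName.length();
        return fileName.substring(0, i) + "_" + seenBefore + fileName.substring(i);
    }
}
```

Passing the map as a parameter keeps the method free of shared state between separate top-level calls, which a static or instance field would not.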

Java - Method for batch processing text files is much slower than the same action individually the same amount of times

I wrote a method processTrainDirectory which is supposed to import and process all the text files from a given directory. Individually processing the files takes about the same time for each file (90ms), but when I use the method for batch importing a given directory, the time per file increases incrementally (from 90ms to over 4000ms after 300 files). The batch importing method is as follows:
public void processTrainDirectory(String folderPath, Category category) {
File folder = new File(folderPath);
File[] listOfFiles = folder.listFiles();
if (listOfFiles != null) {
for (File file : listOfFiles) {
if (file.isFile()) {
processTrainText(file.getPath(), category);
}
}
}
else {
System.out.println(foo);
}
}
As I said, the method processTrainText is called per text file in the directory. This method takes incrementally longer when used inside processTrainDirectory. The method processTrainText is as follows:
public void processTrainText(String path, Category category){
trainTextAmount++;
Map<String, Integer> text = prepareText(path);
update(text, category);
}
I called processTrainText 200 times on 200 different texts manually, and that took 200 * 90ms. But when I have a directory of 200 files and use processTrainDirectory, the time per file goes 90-92-96-104....3897-3940-4002ms, which is WAY longer.
The problem persists when I call processTrainText a second time; it does not reset. Do you have any idea what the cause is and how I can solve it?
Any help is greatly appreciated!
EDIT: somebody asked what the other called methods do, so here are all the methods used from my class BayesianClassifier (all others are removed for clarity). Underneath you can find the class Category:
public class BayesianClassifier {
private Map<String, Integer> vocabulary;
private List<Category> categories;
private int trainTextAmount;
private int testTextAmount;
private GUI gui;
public Map<String, Integer> prepareText(String path) {
String text = readText(path);
String normalizedText = normalizeText(text);
String[] tokenizedText = tokenizeText(normalizedText);
return countText(tokenizedText);
}
public String readText(String path) {
BufferedReader br;
String result = "";
try {
br = new BufferedReader(new FileReader(path));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
result = sb.toString();
br.close();
} catch (IOException e) {
e.printStackTrace();
}
return result;
}
public Map<String, Integer> countText(String[] words){
Map<String, Integer> result = new HashMap<>();
for(int i=0; i < words.length; i++){
if (!result.containsKey(words[i])){
result.put(words[i], 1);
}
else {
result.put(words[i], result.get(words[i]) + 1);
}
}
return result;
}
public void processTrainText(String path, Category category){
trainTextAmount++;
Map<String, Integer> text = prepareText(path);
update(text, category);
}
public void update(Map<String, Integer> text, Category category) {
category.addText();
for (Map.Entry<String, Integer> entry : text.entrySet()){
if(!vocabulary.containsKey(entry.getKey())){
vocabulary.put(entry.getKey(), entry.getValue());
category.updateFrequency(entry);
category.updateProbability(entry);
category.updatePrior();
}
else {
vocabulary.put(entry.getKey(), vocabulary.get(entry.getKey()) + entry.getValue());
category.updateFrequency(entry);
category.updateProbability(entry);
category.updatePrior();
}
for(Category cat : categories){
if (!cat.equals(category)){
cat.addWord(entry.getKey());
cat.updatePrior();
}
}
}
}
public void processTrainDirectory(String folderPath, Category category) {
File folder = new File(folderPath);
File[] listOfFiles = folder.listFiles();
if (listOfFiles != null) {
for (File file : listOfFiles) {
if (file.isFile()) {
processTrainText(file.getPath(), category);
}
}
}
else {
System.out.println(foo);
}
}
This is my Category class (all the methods that are not needed are removed for clarity):
public class Category {
private String categoryName;
private double prior;
private Map<String, Integer> frequencies;
private Map<String, Double> probabilities;
private int textAmount;
private BayesianClassifier bc;
public Category(String categoryName, BayesianClassifier bc){
this.categoryName = categoryName;
this.bc = bc;
this.frequencies = new HashMap<>();
this.probabilities = new HashMap<>();
this.textAmount = 0;
this.prior = 0.00;
}
public void addWord(String word){
this.frequencies.put(word, 0);
this.probabilities.put(word, 0.0);
}
public void updateFrequency(Map.Entry<String, Integer> entry){
if(!this.frequencies.containsKey(entry.getKey())){
this.frequencies.put(entry.getKey(), entry.getValue());
}
else {
this.frequencies.put(entry.getKey(), this.frequencies.get(entry.getKey()) + entry.getValue());
}
}
public void updateProbability(Map.Entry<String, Integer> entry){
double chance = ((double) this.frequencies.get(entry.getKey()) + 1) / (sumFrequencies() + bc.getVocabulary().size());
this.probabilities.put(entry.getKey(), chance);
}
public Integer sumFrequencies(){
Integer sum = 0;
for (Integer integer : this.frequencies.values()) {
sum = sum + integer;
}
return sum;
}
}

It looks like the time per file is growing linearly and the total time quadratically. This means that with each file you're processing the data of all previous files. Indeed, you are:
updateProbability calls sumFrequencies, which runs through the entire frequencies map, which grows with each file. That's the culprit. Simply create a field int sumFrequencies and update it in updateFrequency.
As a further improvement, consider using Guava Multiset, which does the counting in a simpler and more efficient way (no autoboxing). After fixing your code, consider having it reviewed on Code Review; there are quite a few minor problems with it.
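The running-sum idea can be sketched as follows. FrequencyTable is a hypothetical stand-in for the relevant part of Category, and the vocabulary size is passed in as a parameter rather than read from the BayesianClassifier; the probability formula is the one from updateProbability in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the frequency bookkeeping in Category.
class FrequencyTable {
    private final Map<String, Integer> frequencies = new HashMap<>();
    private long totalFrequency = 0; // running sum of all values in frequencies

    // Adds count to the word's frequency and keeps the running sum in step.
    void updateFrequency(String word, int count) {
        frequencies.merge(word, count, Integer::sum);
        totalFrequency += count;
    }

    // Constant time, no matter how many words have been stored.
    long sumFrequencies() {
        return totalFrequency;
    }

    // Same smoothing as the original: (freq + 1) / (sum + vocabulary size).
    double probability(String word, int vocabularySize) {
        int freq = frequencies.getOrDefault(word, 0);
        return (freq + 1.0) / (totalFrequency + vocabularySize);
    }
}
```

With this change each call to updateProbability costs the same regardless of how many files came before, so the per-file time should stop growing for this reason.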

What is this method doing?
update(text, category);
If it is doing what I am guessing it does, then this may be your bottleneck.
If you call it on its own, without additional context, and it updates some general data structure, then yes, it will always take the same time.
If it updates something that holds data from your past iterations, then I am pretty sure it will take more and more time. Check the complexity of the update() method and reduce your bottleneck.
Update:
Your method updateProbability works on all the data you have gathered so far when it calculates the sum of frequencies, so it takes more and more time the more files you process. This is your bottleneck.
There is no need to calculate it every time. Just save it and update it whenever something changes to minimize the amount of calculation.

Finding a file in a directory and returning its full path and filename

This should be easy. This question (Java - Search for files in a directory) seemed to take me 99% of the way to where I needed to be, but that missing 1% is being a real SOB.
I need to find a specific file in a directory and return the full path and filename as a string. If there's more than one matching file, that's fine, I just need the first match.
The code below works inasmuch as it will recursively traverse a directory structure and return all matches -- I can see it happening when I put sysouts into the various parts of the method -- but I can't for the life of me make it stop when it finds a match and return me the value of the match.
I've tried substituting the FOR statement with a WHILE statement controlled by the value of the foundfile variable, as well as half a dozen other approaches, but they all come down to the same end: when I find the matching file and set it to the foundfile variable in the "else if" clause, the for loop just keeps on iterating and overwrites the value of the foundfile variable with the "" value on the next loop. I would have thought that calling the setOutput method from within the "else if" clause would have set the value successfully until the list array was empty, but evidently not.
Clearly there is something about recursion and the persistence of parameters that I'm fundamentally misunderstanding. Can anyone illuminate?
package app;
import java.io.*;
import java.util.*;
class FindFile {
public String setOutput(String name, File file, String fileloc) {
String foundfile = fileloc;
File[] list = file.listFiles();
if (list != null)
for (File fil : list) {
if (fil.isDirectory()) {
setOutput(name, fil, foundfile);
} else if (fil.getName().contains(name)) {
foundfile = (fil.getParentFile() + "\\" + fil.getName());
setOutput(name, fil, foundfile);
}
}
return foundfile;
}
public static void main(String[] args) {
FindFile ff = new FindFile();
String thisstring = ff.setOutput(".jar", new File("/Temp/df384b41-198d-4fee-8704-70952d28cbde"), "");
System.out.println("output: " + thisstring);
}
}

You can return the file path when you find it. No need to check the other files if you are only interested in the first match:
Here is an example (not tested):
public String setOutput(String name, File file) {
File[] list = file.listFiles();
if (list != null) {
for (File fil : list) {
String path = null;
if (fil.isDirectory()) {
path = setOutput(name, fil);
if (path != null) {
return path;
}
} else if (fil.getName().contains(name)) {
path = fil.getAbsolutePath();
if (path != null) {
return path;
}
}
}
}
return null; // nothing found
}
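As an alternative to hand-written recursion (a different technique from the answer above), the java.nio.file API can do the traversal. This is a sketch with hypothetical names; Files.walk produces a lazy stream, so findFirst stops the search at the first match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical class name; a sketch of a stream-based search.
class FirstMatchFinder {

    // Returns the absolute path of the first regular file under root whose
    // name contains namePart, or an empty Optional if there is none.
    static Optional<String> findFirst(Path root, String namePart) throws IOException {
        // try-with-resources closes the directory handles held by the stream
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains(namePart))
                    .map(p -> p.toAbsolutePath().toString())
                    .findFirst();
        }
    }
}
```

Returning an Optional makes the "nothing found" case explicit instead of relying on null or an empty string. Note that Files.walk can throw an UncheckedIOException during iteration if a subdirectory cannot be read.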

Undo effects of java mkdirs()

I have a situation where I need to run a "pre-check" to see if a directory is "createable". This is not a problem, just run a file.mkdirs() and see if it returns true.
The problem is that I would like to clean up after this check. This is a bit tricky, because I want to delete only those folders and subfolders that mkdirs() actually created.
Can anyone think of a clever way to do this?

I think this method does the job without you having to call mkdirs:
public static boolean canMkdirs(File dir) {
if(dir == null || dir.exists())
return false;
File parent = null;
try {
parent = dir.getCanonicalFile().getParentFile();
while(!parent.exists())
parent = parent.getParentFile();
} catch(NullPointerException | IOException e) {
return false;
}
return parent.isDirectory() && parent.canWrite();
}

Keep an array that holds the names of those directories. When you want to clean up, you can use the entries in that array to delete exactly those directories.
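One way to read this suggestion, as a sketch with hypothetical names: before calling mkdirs(), walk up from the target and record every ancestor that does not exist yet, then delete exactly those afterwards, deepest first. It assumes no other process touches the same path in between.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name; a sketch, not a hardened implementation.
class MkdirsProbe {

    // Lists the directories on the path to dir that do not exist yet,
    // ordered deepest first (dir itself comes first).
    static Deque<File> missingDirs(File dir) {
        Deque<File> missing = new ArrayDeque<>();
        for (File f = dir.getAbsoluteFile(); f != null && !f.exists(); f = f.getParentFile()) {
            missing.addLast(f);
        }
        return missing;
    }

    // Tries to create dir, then removes whatever was created by the attempt.
    static boolean canCreate(File dir) {
        Deque<File> missing = missingDirs(dir);
        boolean created = dir.mkdirs();
        // Deepest first, so each directory is empty when it is deleted.
        // File.delete() refuses to remove a non-empty directory, which
        // limits the damage if something else wrote into it meanwhile.
        for (File f : missing) {
            f.delete();
        }
        return created;
    }
}
```

Because the list is built before mkdirs() runs, a partially successful mkdirs() is cleaned up as well.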

A bit dangerous:
if (file.mkdirs()) {
long t0 = file.lastModified();
for (;;) {
long t = file.lastModified();
if (t < t0 - 1000L) { // Modified more than 1 s before its child?
break;
}
t0 = t;
file.delete();
file = file.getParentFile();
}
}

If my assumption that permissions are inherited in the file structure is correct, something like this should do it:
File f = new File("C:\\doesntExist\\Nope\\notHere");
File tempFile = f;
while (!tempFile.exists())
tempFile = tempFile.getParentFile();
if (!tempFile.isDirectory()
|| !tempFile.canWrite()) // Copied this line from Lone nebula's answer (don't tell anyone, ok?)
System.out.println("Can't write!");
else
{
f.mkdirs();
...
}

Judging by the mkdirs() source code:
public boolean mkdirs() {
if (exists()) {
return false;
}
if (mkdir()) {
return true;
}
File canonFile = null;
try {
canonFile = getCanonicalFile();
} catch (IOException e) {
return false;
}
File parent = canonFile.getParentFile();
return (parent != null && (parent.mkdirs() || parent.exists()) &&
canonFile.mkdir());
}
Unless I've missed something, you have two options:
remember the state of the files on disk before calling mkdirs(), compare it with the state after mkdirs(), and handle any differences
extend the File class and override the mkdirs() method to remember exactly which directories were created. If any were created, handle them.
The latter seems like the more elegant solution and will need less code.
UPDATE:
I strongly recommend taking david a.'s comment into consideration.

Recursively Deleting a Directory

I have this section of code:
public static void delete(File f) throws IOException
{
if (f.isDirectory())
{
for (File c : f.listFiles())
{
delete(c);
}
}
else if (!f.delete())
{
throw new FileNotFoundException("Failed to delete file: " + f);
}
}
public static void traverseDelete(File directory) throws FileNotFoundException, InterruptedException
{
//Get all files in directory
File[] files = directory.listFiles();
for (File file : files)
{
if (file.getName().equalsIgnoreCase("word"))
{
boolean containsMedia = false;
File[] filesInWordFolder = file.listFiles();
for ( File file2 : filesInWordFolder )
{
if ( file2.getName().contains("media"))
{
containsMedia = true;
break;
}
}
if (containsMedia == false)
{
try
{
delete(file.getParentFile());
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
else if (file.isDirectory())
{
traverseDelete(file);
}
}
}
Sorry for the lack of comments, but it's pretty self-explanatory, I think. Essentially, the code is supposed to traverse the files in a given directory; if it encounters a directory named "word", it should list the contents of word, and if a directory called "media" does NOT exist there, recursively delete everything from the parent directory of "word" down.
My main concern comes from this conditional:
if(!filesInWordFolder.toString().contains("media"))
Is that the correct way to say: if the files in that array do not contain an instance of "media", go ahead and delete?

That won't work.
File[] filesInWordFolder = file.listFiles();
if(!filesInWordFolder.toString().contains("media"))
calls toString() on the File array itself, which gives you the array's type name and hash code, not the names of the files in it.
You have to iterate through the files to find out whether any of them has a name containing the word media.
boolean containsMedia = false;
for ( File child : filesInWordFolder ) {
if ( child.getName().contains("media") ){
containsMedia = true;
break;
}
}
// now check your boolean
if ( !containsMedia ) {
// no "media" entry found, so it is safe to delete
}

Calling toString() on a single File gives you its path, which includes the file name. If your purpose is to check whether any file in the directory has "media" in its name, that is the kind of string you need to test.
In your example, though, you are calling toString() on the File array itself. Instead, you should iterate through the File array and check each individual File, like so:
for (int i = 0; i < file_array.length; i++) {
if (file_array[i].getName().contains("your_search_term")) {
// The file name contains your search term
} else {
// Doesn't contain the search term.
}
}
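The same check can be written more compactly with a stream. This is a sketch with a hypothetical class name; it also guards against listFiles() returning null, which happens when the path is not a directory or cannot be read.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical class name; a compact version of the loop above.
class MediaCheck {

    // True if any entry in files has a name containing "media".
    static boolean containsMedia(File[] files) {
        return files != null
                && Arrays.stream(files).anyMatch(f -> f.getName().contains("media"));
    }
}
```

In traverseDelete this would replace the inner loop: if (!MediaCheck.containsMedia(file.listFiles())) { ... }.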
