sampling files from a folder - java

I have a folder containing 100000 files and need to pick 1000 of them by random sampling. Is there a sampling function I can use for this? In addition, how do I copy the sampled files to another folder?

Random selection could follow along the following lines
File files[] = new File("/path/to/files").listFiles();
Map<Integer, File> selection = new HashMap<Integer, File>(1000);
while (selection.size() < 1000) {
// Truncate rather than round: Math.round could produce files.length, which is out of bounds
int value = (int) (Math.random() * files.length);
if (!selection.containsKey(value)) {
selection.put(value, files[value]);
}
}
for (File file : selection.values()) {
System.out.println(file);
}
Essentially, you need to grab a list of the available files and then randomly pick through the list until you have enough of a sample. Check out java.io.File
There are plenty of examples of file copying on the net (and on SO). If you're really stuck, you could have a look at the IO Trail or Apache Commons IO, which I believe has a utility class capable of copying files
UPDATED
As suggested by Andrew, you could simply shuffle the file list and pull the first 1000 elements...
File files[] = new File("/path/to/files").listFiles();
List<File> selection = null;
List<File> fileList = new ArrayList<File>(Arrays.asList(files));
Collections.shuffle(fileList);
selection = fileList.subList(0, Math.min(1000, fileList.size()));
for (File file : selection) {
System.out.println(file);
}
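The question also asks how to copy the sampled files to another folder. As a minimal sketch, assuming Java 7 or later so that java.nio.file.Files.copy is available, the shuffle approach above could be combined with a copy step. The class name SampleAndCopy and the method sampleAndCopy are made up for illustration; files with the same name in the target folder are overwritten.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of any library.
class SampleAndCopy {
    // Copies up to sampleSize randomly chosen regular files from sourceDir
    // into targetDir and returns the paths of the copies.
    static List<Path> sampleAndCopy(File sourceDir, Path targetDir, int sampleSize)
            throws IOException {
        // listFiles returns null if sourceDir is not a readable directory
        File[] files = sourceDir.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Cannot list " + sourceDir);
        }
        List<File> fileList = new ArrayList<>(Arrays.asList(files));
        Collections.shuffle(fileList);
        List<File> selection = fileList.subList(0, Math.min(sampleSize, fileList.size()));

        Files.createDirectories(targetDir);
        List<Path> copied = new ArrayList<>();
        for (File file : selection) {
            Path target = targetDir.resolve(file.getName());
            // Overwrites an existing file of the same name in targetDir
            Files.copy(file.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            copied.add(target);
        }
        return copied;
    }
}
```

It could be called as SampleAndCopy.sampleAndCopy(new File("/path/to/files"), Paths.get("/path/to/sample"), 1000), where both paths are placeholders.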

Please try this
public static void main(String args[]) throws Exception
{
File f= new File("E:/Eclipse-Leo/Test/src/test/Desktop1");
List<File> randomFiles = new ArrayList<File>();
List<Integer> randNumber = new ArrayList<Integer>();
if(f != null && f.isDirectory()){
File[] files = f.listFiles();
Random randomGenerator = new Random();
int idx = 1;
// Pick 1000 distinct files; this assumes the folder holds at least 1000 entries
while(idx <= 1000)
{
int randTemp = randomGenerator.nextInt(files.length);
if(!randNumber.contains(randTemp))
{
randNumber.add(randTemp);
randomFiles.add(files[randTemp]);
idx++;
}
}
}
}

File[] files = dir.listFiles();
Then just use files.length and a random number generator to index into the array.
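A minimal sketch of that idea, assuming the sample must not contain the same file twice: a partial Fisher-Yates shuffle swaps a randomly indexed element into each of the first k positions, so no index is picked twice and no retry loop is needed. The class name RandomIndexSample and the method pick are made up for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class, not part of any library.
class RandomIndexSample {
    // Returns min(k, files.length) distinct elements of files, chosen uniformly at random.
    static File[] pick(File[] files, int k, Random rng) {
        File[] pool = files.clone(); // leave the caller's array untouched
        int n = Math.min(k, pool.length);
        for (int i = 0; i < n; i++) {
            // Random index in [i, pool.length): only not-yet-chosen elements
            int j = i + rng.nextInt(pool.length - i);
            File tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, n);
    }
}
```

For the question above, this would be called as RandomIndexSample.pick(dir.listFiles(), 1000, new Random()).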

Related

Take input files same in the order as in directory

I'm writing a program that takes a bunch of files from a directory as input.
The program works fine, but the problem is the order in which the files are picked. When I sort my directory, the first file shown is file5521.3, but in my program the first file picked up is file5521.100. This is pretty confusing.
I've also tried using Arrays.sort(list, NameFileComparator.NAME_COMPARATOR), but it gives the same result as before.
Below is my code.
void countFilesInDirectory(File directory, String inputPath) throws IOException {
File[] list = directory.listFiles();
Arrays.sort(list, NameFileComparator.NAME_COMPARATOR);
for (int i = 0; i < list.length; i++) {
System.out.println(list[i]);
}
tempPath = inputPath.substring(0, inputPath.lastIndexOf("\\") + 1) + "OP\\";
File outPath = new File(tempPath);
if (!outPath.exists()) {
outPath.mkdir();
}
File temp = new File(tempPath + "temp.txt");
FileOutputStream fos = new FileOutputStream(temp);
if (!temp.exists()) {
temp.createNewFile();
}
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos));
for (int i = 0; i < list.length; i++) {
System.out.println(list[i]);
setStatusText(i);
GenerateFiles(list[i].getAbsoluteFile().toString(), bw);
}
bw.write("</body>");
bw.close();
File newFile = new File(temp.getParent(), "Index.html");
Files.move(temp.toPath(), newFile.toPath());
}
Please let me know how I can do this.
Working Solution with Last Modified Date
Arrays.sort(list, new Comparator<File>() {
public int compare(File f1, File f2) {
return Long.compare(f1.lastModified(), f2.lastModified());
}
});
The Comparator works fine with the last modified date, but when I try it with the code below, the result is the same as before.
Arrays.sort(list, new Comparator<File>() {
@Override
public int compare(File o1, File o2) {
return o1.getName().compareTo(o2.getName());
}
});
In my Windows Explorer It looks like below. I've sorted the filenames.
And my console output shows the below.
Thanks
If your OS is Windows XP or later, your files will be sorted using Numerical File Name Sorting.
Since you want to match the file order both in the program and in your File Explorer, you can either:
Follow the steps in the link to use Classical Literal Sorting which will change the way File Explorer displays files.
Create a custom Comparator that you pass into sort that will sort based on the actual VALUE of the number (e.g. 3 would come before 10 because 3 < 10)
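As a rough sketch of the second option, the comparator below walks two names in parallel and compares each run of digits by numeric value instead of character by character. The class name NaturalOrder is made up for illustration, and the ordering is only an approximation: it is not guaranteed to match Windows Explorer in every case (for example with leading zeros or special characters).

```java
import java.io.File;
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical helper class, not part of any library.
class NaturalOrder {
    // Compares files by name, treating digit runs as numbers.
    static final Comparator<File> BY_NAME =
            (a, b) -> compareNatural(a.getName(), b.getName());

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static int compareNatural(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isAsciiDigit(ca) && isAsciiDigit(cb)) {
                // Consume the whole digit run in both names and compare by value
                int si = i, sj = j;
                while (i < a.length() && isAsciiDigit(a.charAt(i))) i++;
                while (j < b.length() && isAsciiDigit(b.charAt(j))) j++;
                int cmp = new BigInteger(a.substring(si, i))
                        .compareTo(new BigInteger(b.substring(sj, j)));
                if (cmp != 0) return cmp;
            } else {
                // Ordinary characters are compared case-insensitively
                int cmp = Character.compare(Character.toLowerCase(ca), Character.toLowerCase(cb));
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }
        // The name with characters left over sorts last; fall back to plain order on a tie
        int rest = (a.length() - i) - (b.length() - j);
        return rest != 0 ? rest : a.compareTo(b);
    }
}
```

With this in place, Arrays.sort(list, NaturalOrder.BY_NAME) would put file5521.3 before file5521.100, because 3 < 100.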
You can use Apache Commons IO to sort files using many predefined Comparators or even combine these using CompositeFileComparator
Here is an example:
import org.apache.commons.io.comparator.*;
File dir = new File(".");
File[] files = dir.listFiles();
File[] filesSorted = DefaultFileComparator.DEFAULT_COMPARATOR.sort(files);
File[] filesReversed = DefaultFileComparator.DEFAULT_REVERSE.sort(files);
... and here is the list of available comparators

File[] can not handle multiple files

I want to read multiple files from multiple directories. So far my program can read multiple files only until certain limit.
public void filesTobeCollected(String validpath) throws Exception {
this.folder = new File(validpath);
// this.file = new ArrayList<>();
File[] fileEntry = folder.listFiles();
System.out.println("directory path" + fileEntry.length);
/*****File upload read****/
for (int i = 0; i < fileEntry.length; i++) {
/*this line weeds out other directories/folders*/
if (fileEntry[i].exists()) {
file.add(String.valueOf(fileEntry[i]));
}
}
}
and I pass the directories to it through this function:
public List<String> filesTobeTested(ArrayList<String> file) throws
Exception {
Set<String> test = new HashSet<>();
test.add("src/test/resources/gfd");
test.add("src/test/resources/rds");
test.add("src/test/resources/oiu");
//test.add("src/test/resources/pol");
System.out.println("directory path 2" + test);
for (int i = 0; i < 1; i++) {
for (Iterator<String> it = test.iterator(); it.hasNext(); ) {
String f = it.next();
filesTobeCollected(f);
System.out.println("Found" + f);
}
Here File[] fileEntry seems to have some limit: as I add more directories using test.add("src/test/resources/obj"), File[] fileEntry can't handle any more files. Is there any alternative to File[] fileEntry in case of an unlimited or unknown number of files?
Is there any alternative to File[] fileEntry in case of an unlimited or unknown number of files?
Yes, use ArrayList<File>, which will handle the size automatically for you.
But this does not seem to be the problem: the array is created by File[] fileEntry = folder.listFiles();, so its size always matches the number of entries in the directory.
Remove this line
for (int i = 0; i < 1; i++) {
which seems useless... If the problem remains, post your stack trace.
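As a hedged sketch of the ArrayList<File> suggestion: the helper below gathers the regular files of several directories into one list. It also checks for null, because File.listFiles() returns null when the path is not a directory or cannot be read; whether that is what happens with the extra directory in the question is only a guess. The class name MultiDirCollector and the method collect are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class MultiDirCollector {
    // Collects the regular files found directly inside each of the given directories.
    static List<File> collect(List<String> directories) {
        List<File> result = new ArrayList<>(); // grows as needed, no fixed limit
        for (String dir : directories) {
            File[] entries = new File(dir).listFiles();
            if (entries == null) {
                // Not a directory, or an I/O error occurred while listing it
                System.err.println("Skipping " + dir);
                continue;
            }
            for (File entry : entries) {
                if (entry.isFile()) { // weeds out sub-directories
                    result.add(entry);
                }
            }
        }
        return result;
    }
}
```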
If I understand you correctly, you want to combine those files into one.
In that case I would first collect the files (or their paths) in an ArrayList<File>, then read each file and collect the contents in an ArrayList<byte[]>. If you later want a single file, you can write the ArrayList<byte[]> object out with an ObjectOutputStream.
While doing this, take care that the bytes you read and write do not exceed your Java heap space.

Java File Name Printing

I am trying to print the names of files from two folders. This code compiles, but it prints nothing when I run it.
The main goal is to find files with the same name in two folders. I have stored the file names in two arrays; next I will sort them and find the common files.
package javaapplication13;
import java.io.File;
import java.util.*;
public class ListFiles1
{
public static void main(String[] args)
{
String path1 = "C:/";
String path2 = "D:/";
File folder1 = new File(path1);
File folder2 = new File(path2);
String[] f1=folder1.list();
File[] listOfFiles1 = folder1.listFiles();
File[] listOfFiles2 = folder2.listFiles();
ArrayList<String> fileNames1 = new ArrayList<>();
ArrayList<String> fileNames2 = new ArrayList<>();
for (int i = 0; i < listOfFiles1.length; i++)
{
if (listOfFiles1[i].isFile())
{
fileNames1.add(listOfFiles1[i].getName());//wow
System.out.println(listOfFiles1[i].getName());
}
}
for (int i = 0; i < listOfFiles2.length; i++)
{
if (listOfFiles2[i].isFile())
{
fileNames2.add(listOfFiles2[i].getName());//seriously wow
}
}
}
}
Loop through both the ArrayLists you have. Each ArrayList contains the file names as it is. You'll need a nested loop (a loop inside a loop). In the core of the nested loops, you want to do a compare between current position of each ArrayList. You can use .equals() method for this. The pseudo code is something like:
//create a new ArrayList called "commonNameList"
// loop through fileNames1 with position variable "i"
//loop through fileNames2 with position variable "j"
//tempFileName1 = fileNames1.get(i)
//tempFileName2 = fileNames2.get(j)
//if tempFileName1 equals tempFileName2
//commonNameList.add(tempFileName1)
Check these out:
http://mathbits.com/MathBits/Java/Looping/NestedFor.htm
Simple nested for loop example
How do I compare strings in Java?
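For reference, the pseudo code above could be written out roughly as follows. This is only a sketch: the class name CommonNames and the method commonNames are made up, and the two lists are assumed to be filled the way the question's code fills fileNames1 and fileNames2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class CommonNames {
    // Returns the names that occur in both lists, in the order of fileNames1.
    static List<String> commonNames(List<String> fileNames1, List<String> fileNames2) {
        List<String> commonNameList = new ArrayList<>();
        for (int i = 0; i < fileNames1.size(); i++) {
            for (int j = 0; j < fileNames2.size(); j++) {
                String tempFileName1 = fileNames1.get(i);
                String tempFileName2 = fileNames2.get(j);
                // Compare contents with equals(), not references with ==
                if (tempFileName1.equals(tempFileName2)) {
                    commonNameList.add(tempFileName1);
                }
            }
        }
        return commonNameList;
    }
}
```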
The main goal is to find files with the same name in two folders. I have stored the file names in two arrays; next I will sort them and find the common files.
I don't like the two array thing. Also, a duplicate file should probably have the same length as well as the same name. If you are really going for just names, you can remove the f.length() == temp.length() part of the condition.
private static void findDups(String dirName1, String dirName2){
File dir1 = new File(dirName1);
File dir2 = new File(dirName2);
Map<String,File> fileMap = new HashMap<String,File>();
File[] files = dir1.listFiles();
for(File f : files){
fileMap.put(f.getName(),f);
}
files = dir2.listFiles();
StringBuilder sb = new StringBuilder(100);
for(File f : files){
File temp = fileMap.get(f.getName());
if(temp != null && f.length() == temp.length()){
sb.append("Found duplicate files: ")
.append(temp.getAbsolutePath())
.append(" and ")
.append(f.getAbsolutePath());
System.out.println(sb);
sb.delete(0,sb.length());
}
}
}

Is it possible to store files of a folder in dynamic array in Java [duplicate]

This question already has answers here:
Finding common Files from two arrays
(2 answers)
Closed 9 years ago.
I am trying to find files with the same name in two folders.
I used File and listed the file names in two ArrayLists.
Then I added the names common to both folders to a new ArrayList, and I am going to diff those files to find out whether they differ.
As I have only stored the names of the files in the ArrayLists, I can't operate on the files directly.
Someone told me that files can be stored in a dynamic array in Java...
My code so far, with the help of some great friends:
import java.io.File;
import java.util.*;
public class ListFiles1
{
public static void main(String[] args)
{
String path1 = "C:\\Users\\hi\\Downloads\\IIT Typing\\IIT Typing";
String path2 = "C:\\Users\\hi\\Downloads\\IIT Typing\\IIT Typing";
File folder1 = new File(path1);
File folder2 = new File(path2);
String[] f1=folder1.list();
File[] listOfFiles1 = folder1.listFiles();
File[] listOfFiles2 = folder2.listFiles();
ArrayList<String> fileNames1 = new ArrayList<>();
ArrayList<String> fileNames2 = new ArrayList<>();
for (int i = 0; i < listOfFiles1.length; i++)
{
if (listOfFiles1[i].isFile())
{
fileNames1.add(listOfFiles1[i].getName());
// System.out.println(f1[i] + " is a file");
}
}
for (int j = 0; j < listOfFiles2.length; j++)
{
if (listOfFiles2[j].isFile())
{
fileNames2.add(listOfFiles2[j].getName());
}
}
ArrayList<String> commonfiles = new ArrayList<>();
for (int i = 0; i < fileNames1.size(); i++)
{
for (int j = 0; j < fileNames2.size(); j++)
{
String tempfilename1;
String tempfilename2;
tempfilename1=fileNames1.get(i);
tempfilename2 = fileNames2.get(j);
if(tempfilename1.equals(tempfilename2))
{
commonfiles.add(tempfilename1);
System.out.println(commonfiles);
}
}
}
}
}
Rather than having an ArrayList of Strings holding the file names, just have an ArrayList of Files:
List<File> filesList1 = Arrays.asList(folder1.listFiles());
List<File> filesList2 = Arrays.asList(folder2.listFiles());
Then, when comparing, check that each entry is a file and that the names match. Because you hold a reference to the File object and not just its name, you can go on to read the files and see whether their contents are the same.
for (File f1 : filesList1)
{
if(f1.isFile())
{
for (File f2 : filesList2)
{
if(f2.isFile() && f1.getName().equals(f2.getName()))
{
commonfiles.add(f1.getName());
System.out.println(f1.getName());
}
}
}
}
This could be done way more efficiently with sets though
I'm not sure what your question actually is here, but I suggested that you use:
Set<String> dir1Files = new HashSet<String>();
Set<String> dir2Files = new HashSet<String>();
// load the sets with the filenames by iterating the File.listFiles() value, and using File.isFile() and File.getName() - just like your existing code
dir1Files.retainAll(dir2Files);
// now dir1Files contains the filenames that are the same in both directories
And if you need to work with the file itself, just recreate the File object:
File dir1File = new File(folder1, filename);
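Filled in, the set-based suggestion might look like the sketch below. The class name CommonFilesBySet and its methods are made up for illustration; the null check is there because File.listFiles() returns null for a path that is not a readable directory.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class CommonFilesBySet {
    // Names of the regular files directly inside dir (empty if dir cannot be listed).
    static Set<String> fileNames(File dir) {
        Set<String> names = new HashSet<>();
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile()) {
                    names.add(f.getName());
                }
            }
        }
        return names;
    }

    // Names that occur in both folders.
    static Set<String> common(File folder1, File folder2) {
        Set<String> dir1Files = fileNames(folder1);
        // retainAll keeps only the elements that are also in the other set
        dir1Files.retainAll(fileNames(folder2));
        return dir1Files;
    }
}
```

Each lookup in a HashSet takes roughly constant time, so this avoids comparing every name in one folder against every name in the other.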
import java.io.File;
public class FileNameMatcher
{
public static void main(String[] args)
{
File folder1 = new File("C:/Users/pappu/Downloads");
File folder2 = new File("C:/Users/pappu");
for(String fileFromFolderOne:folder1.list())
{
for(String fileFromFolderTwo:folder2.list())
{
if(fileFromFolderOne.equals(fileFromFolderTwo))
{
System.out.println("match found");
System.out.println("file name is ===>>>"+fileFromFolderOne);
}
}
}
}
}

How do I incorporate the code below into my existing code?

I was tasked with allocating only 1GB of space for storing my videos in a particular directory, which should automatically delete its oldest video file once it is about to reach 1GB.
I eventually found the code below, but I don't know how to incorporate examples 1 and 2 into my existing mainActivity.java file, because names such as "dirlist" and "TempFile" in my code differ from the names used in the examples for size checking and deleting.
Sorry, I'm fairly new to Android/Java, so I don't really know which fields to change to suit my code. Can someone help me combine these pieces into a single set of code that performs the functions mentioned above?
My Current existing mainActivity.java
File dirlist = new File(Environment.getExternalStorageDirectory() + "/VideoList");
if(!(dirlist.exists()))
dirlist.mkdir();
File TempFile = new File(Environment.getExternalStorageDirectory()
+ "/VideoList", dateFormat.format(date) + fileFormat);
mediaRecorder.setOutputFile(TempFile.getPath());
(Example 1) code for summing up directory file size in a given folder..
private static long dirSize(File dir) {
long result = 0;
Stack<File> dirlist= new Stack<File>();
dirlist.clear();
dirlist.push(dir);
while(!dirlist.isEmpty())
{
File dirCurrent = dirlist.pop();
File[] fileList = dirCurrent.listFiles();
for (int i = 0; i < fileList.length; i++) {
if(fileList[i].isDirectory())
dirlist.push(fileList[i]);
else
result += fileList[i].length();
}
}
return result;
}
(Example 2) Code for getting all the files in an array and sorting them by their modified date. The first file in the array is then the oldest file, which gets deleted.
// no idea what are the parameters i should enter
// here for my case in mainActivity??
File directory = new File((**String for absolute path to directory**);
File[] files = directory.listFiles();
Arrays.sort(files, new Comparator<File>() {
@Override
public int compare(File f1, File f2)
{
return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
}});
files[0].delete();
This is a reference to your previous question: How do I put a capped maximum directory storage space in SD?. In the future, you should keep discussions about the same topic in the same question, rather than create 2 new identical questions.
In your Activity class, let's say you define these 2 methods:
private void deleteOldestFile(File directory)
{
File[] files = directory.listFiles();
Arrays.sort(files, new Comparator<File>() {
@Override
public int compare(File f1, File f2)
{
return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
}});
files[0].delete();
}
private static long dirSize(File dir) {
long result = 0;
File[] fileList = dir.listFiles();
for(int i = 0; i < fileList.length; i++) {
if(fileList[i].isDirectory()) {
result += dirSize(fileList [i]);
} else {
// Sum the file size in bytes
result += fileList[i].length();
}
}
return result;
}
You can now do this with your code:
File dirlist = new File(Environment.getExternalStorageDirectory() + "/VideoList");
if(!(dirlist.exists()))
dirlist.mkdir();
long directorySize = dirSize(dirlist);
if (directorySize > 1073741824) // this is 1GB in bytes
{
deleteOldestFile(dirlist);
}
File TempFile = new File(Environment.getExternalStorageDirectory()
+ "/VideoList", dateFormat.format(date) + fileFormat);
mediaRecorder.setOutputFile(TempFile.getPath());
So before setting the output file in that folder, it checks if the folder is > 1GB, and if so, deletes the oldest file first.
To be honest though, deleting the oldest file may not necessarily bring the directory size below 1GB, so I would use a while loop to ensure that it is < 1GB, like so:
while (directorySize > 1073741824)
{
deleteOldestFile(dirlist);
directorySize = dirSize(dirlist);
}
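The loop above re-lists and re-sorts the directory on every pass and would fail on files[0] if the directory were empty. A minimal alternative sketch, in plain Java without Android APIs: it assumes the videos sit directly in the directory (no sub-folders) and that Java 8 language features such as method references and Comparator.comparingLong are available in your build. The class name DirectoryQuota and the method enforce are made up for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class, not part of any library.
class DirectoryQuota {
    // Deletes the oldest regular files in dir until their total size is
    // at most maxBytes. Returns the number of files deleted.
    static int enforce(File dir, long maxBytes) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            return 0; // dir is not a readable directory
        }
        // Oldest first
        Arrays.sort(files, Comparator.comparingLong(File::lastModified));
        long total = 0;
        for (File f : files) {
            total += f.length();
        }
        int deleted = 0;
        for (File f : files) {
            if (total <= maxBytes) {
                break;
            }
            long size = f.length(); // read the size before the file is gone
            if (f.delete()) {
                total -= size;
                deleted++;
            }
        }
        return deleted;
    }
}
```

In the code above it would replace the size check, for example DirectoryQuota.enforce(dirlist, 1073741824L) before setting the output file.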
