Can anyone help me tune this method? Logging "files" takes only around 5 seconds, but it takes more than 10 minutes before "fileInfo" is returned.
// fileSystem is HDFS
// dateNow = java.util.Date
// basePath = new Path("/")
// filePattern = "*.sf"
private Map<String, Long> listFiles(final Date dateNow, final Path basePath,
final String filePattern) throws IOException {
RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);
_LOG.info("files=" + files);
// map containing <filename, filesize>
Map<String, Long> fileInfo = new HashMap<String, Long>();
String regex = RegexUtil.convertGlobToRegex(filePattern);
Pattern pattern = Pattern.compile(regex);
if (files != null) {
while (files.hasNext()) {
LocatedFileStatus file = files.next();
Path filePath = file.getPath();
// Get only the files with created date = current date
if (DateUtils.truncate(new Date(file.getModificationTime()),
java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
if (pattern.matcher(filePath.getName()).matches()) {
fileInfo.put(file.getPath().getName(), file.getLen());
}
}
}
}
_LOG.info("fileInfo =" + fileInfo);
return fileInfo;
}
You said:
When I log the "files" - it only takes around 5 seconds
RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);
Yes, because this call only hands back an iterator over the file statuses under that path; it does not look at what the files contain or how much data is in them.
Now if you look into this part of code
while (files.hasNext()) {
LocatedFileStatus file = files.next();
Path filePath = file.getPath();
// Get only the files with created date = current date
if (DateUtils.truncate(new Date(file.getModificationTime()),
java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
if (pattern.matcher(filePath.getName()).matches()) {
fileInfo.put(file.getPath().getName(), file.getLen());
}
}
}
you can see that it walks every entry the iterator yields. As far as I know, the iterator returned by listFiles is lazy: the recursive listing of everything under basePath actually happens inside hasNext() and next(), as the statuses are fetched from the NameNode in batches. So this loop will definitely take more time than creating the iterator did.
No file content is read here, only metadata, so the time depends on how many files and directories sit under the path (in your case the root, "/") rather than on how large the files are. The more entries there are under that path, the longer this loop will take.
Use listStatus with a PathFilter. This does much of the work on the server side and returns the accumulated results.
I am trying to create a program that uploads multiple files and stores their name and BPM tag into an ArrayList ready for comparison between the files. I have found two functions to help me but I am unable to combine them to get the function that I need.
The first function takes a singular mp3 file and outputs its data into the console (using mp3agic library):
File file = new File(dataPath("") + "/Song.mp3");
Mp3File mp3file = new Mp3File(file.getPath());
if (mp3file.hasId3v2Tag()) {
ID3v2 id3v2Tag = mp3file.getId3v2Tag();
println("Track: " + id3v2Tag.getTrack());
println("Artist: " + id3v2Tag.getArtist());
println("BPM: " + id3v2Tag.getBPM());
println("Album artist: " + id3v2Tag.getAlbumArtist());
}
The second function takes a directory path and prints the names of, and information about, the files in that folder:
void setup() {
String path = "Desktop/mp3folder";
println("Listing all filenames in a directory: ");
String[] filenames = listFileNames(path);
printArray(filenames);
println("\nListing info about all files in a directory: ");
File[] files = listFiles(path);
for (int i = 0; i < files.length; i++) {
File f = files[i];
println("Name: " + f.getName());
println("Is directory: " + f.isDirectory());
println("-----------------------");
}
}
// This function returns all the files in a directory as an array of Strings
String[] listFileNames(String dir) {
File file = new File(dir);
if (file.isDirectory()) {
String names[] = file.list();
return names;
} else {
// If it's not a directory
return null;
}
}
// This function returns all the files in a directory as an array of File objects
// This is useful if you want more info about the file
File[] listFiles(String dir) {
File file = new File(dir);
if (file.isDirectory()) {
File[] files = file.listFiles();
return files;
} else {
// If it's not a directory
return null;
}
}
The function I am trying to create combines the two. I need the Artist, Track and BPM from the first function to work with an array list of files from a directory.
Any guidance would be appreciated. Any advice on another way to go about it would also be appreciated.
One way to approach this is to use classes to encapsulate the data you want to track.
For example, here's a simplified class that contains information about artist, track, and bpm:
public class TrackInfo{
private String artist;
private String track;
int bpm;
}
I would also take a step back, break your problem down into smaller steps, and then take those pieces on one at a time. Can you create a function that takes a File argument and prints out the MP3 data of that File?
void printMp3Info(File file){
// print out data about file
}
Get that working perfectly before moving on. Try calling it with hard-coded File instances before you try to use it with an ArrayList of multiple File instances.
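Once the single-file step works, the two pieces could be combined roughly like this. This is only a sketch in plain Java 8 under some assumptions: the constructor on TrackInfo, the TrackCollector class and the collect method are names I made up for illustration, and the tag reading is passed in as a function so that you can plug in the Mp3File / ID3v2 calls from your question. The main method below uses an obvious placeholder reader instead of real tag data.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class TrackCollector {

    /** Value holder for the three fields the question asks about. */
    static final class TrackInfo {
        final String artist;
        final String track;
        final int bpm;

        TrackInfo(String artist, String track, int bpm) {
            this.artist = artist;
            this.track = track;
            this.bpm = bpm;
        }
    }

    /**
     * Applies the given reader to every .mp3 file directly inside dir and
     * collects the results. The reader is where the tag-reading code goes;
     * it may return null for files that have no usable tag.
     */
    static List<TrackInfo> collect(File dir, Function<File, TrackInfo> reader) {
        List<TrackInfo> result = new ArrayList<>();
        File[] files = dir.listFiles(); // null if dir is not a readable directory
        if (files == null) {
            return result;
        }
        for (File f : files) {
            boolean isMp3 = f.getName().toLowerCase(Locale.ROOT).endsWith(".mp3");
            if (f.isFile() && isMp3) {
                TrackInfo info = reader.apply(f);
                if (info != null) {
                    result.add(info);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        // Placeholder reader: replace with code that reads the ID3v2 tag.
        List<TrackInfo> tracks =
                collect(dir, f -> new TrackInfo("unknown", f.getName(), 0));
        for (TrackInfo t : tracks) {
            System.out.println(t.artist + " / " + t.track + " / " + t.bpm);
        }
    }
}
```

Keeping the reader separate means you can test the directory loop with hard-coded values first, which is the same "one piece at a time" idea as above.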
Then if you get stuck, you can post a MCVE along with a specific technical question. Good luck.
What is the way to list files created, for example, between two timestamps? I'd like to list all newly created files and then move them to a different directory.
I'm working on Windows
public void afterDate() throws IOException {
final String pathToDirectory = "/path/to/directory";
final long afterDate = new Date().getTime();
final List<Path> paths = new ArrayList<>();
final Path directory = Paths.get(pathToDirectory);
try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(directory)) {
for (Path path : directoryStream) {
final BasicFileAttributes attr = Files.readAttributes(path, BasicFileAttributes.class);
final long creationTime = attr.creationTime().toMillis();
if (creationTime >= afterDate) {
paths.add(path);
}
}
}
for (final Path path : paths) {
System.out.println(path.getFileName());
}
}
If you want to actively watch for newly created files, you can use a WatchService: http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html
If you are only looking for files that are already there, you could, for example, use the methods of the Apache Commons FileUtils class to list all files modified/created between two specific dates.
The logic is as follows:
Use the File class.
Iterate through all the entries in the directory. This can be done using the file.listFiles() method, which returns everything in the directory (files as well as subdirectories).
Then, for each File object, get the timestamp using file.lastModified() and check whether it lies between the timestamps you have specified:
startTimestamp < file.lastModified() < endTimestamp
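The steps above could be sketched like this. Note that File.lastModified() gives the last-modified time, not the creation time, and that the class and method names here (ModifiedBetween, listModifiedBetween) are just made up for the example. Both bounds are exclusive, matching the comparison above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class ModifiedBetween {

    /**
     * Returns the regular files directly inside dir whose last-modified time
     * (milliseconds since the epoch) lies strictly between the two bounds.
     */
    static List<File> listModifiedBetween(File dir, long startMillis, long endMillis) {
        List<File> matches = new ArrayList<>();
        File[] entries = dir.listFiles(); // null if dir is not a readable directory
        if (entries == null) {
            return matches;
        }
        for (File entry : entries) {
            long modified = entry.lastModified();
            if (entry.isFile() && startMillis < modified && modified < endMillis) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long now = System.currentTimeMillis();
        long oneDayAgo = now - 24L * 60 * 60 * 1000;
        // Everything modified within the last 24 hours.
        for (File f : listModifiedBetween(dir, oneDayAgo, now + 1)) {
            System.out.println(f.getName());
        }
    }
}
```

From there, moving the matches to another directory is a loop over the returned list, for example with Files.move(f.toPath(), targetDir.resolve(f.getName())).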
I have the following dir tree
C:\folder1\folder2\SPECIALFolders1\folder3\file1.img
C:\folder1\folder2\SPECIALFolders2\folder3\file2.img
C:\folder1\folder2\SPECIALFolders3\folder3\file3.img
C:\folder1\folder2\SPECIALFolders4\folder3\file4.img
C:\folder1\folder2\SPECIALFolders5\folder3\file5.img
I want to get to folder2, list all dirs in it (SpecialFolders) then retrieve the paths of those folders while adding (folder3) to their path
The reason I'm doing this is I want later to pass this path (paths) to a method to retrieve last modified files in folder3. I know there are way easier ways to do it but this is a very particular case.
I'm also trying to retrieve those folders within a specific time range so I used a while loop for that
Date first = dateFormat.parse("2015-6-4");
Calendar ystr = Calendar.getInstance();
ystr.setTime(first);
Date d = dateFormat.parse("2015-6-1");
Calendar last = Calendar.getInstance();
last.setTime(d);
while(last.before(ystr))
{
//fullPath here = "C:\folder1\folder2\"
File dir = (new File(fullPath));
File[] files = dir.listFiles();
for (File file : files)
{
//Retrieve Directories only (skip files)
if (file.isDirectory())
{
fullPath = file.getPath();
//last.add(Calendar.DATE, 1);
System.out.println("Loop " + fullPath);
}
}
}
fullPath += "\\folder3\\";
return fullPath;
The problem with my code is that it only returns one path (the last one in the loop), which makes sense, but I want to return all of the paths, like this:
C:\folder1\folder2\SPECIALFolders1\folder3\
C:\folder1\folder2\SPECIALFolders2\folder3\
C:\folder1\folder2\SPECIALFolders3\folder3\
C:\folder1\folder2\SPECIALFolders4\folder3\
C:\folder1\folder2\SPECIALFolders5\folder3\
I appreciate your input in advance
Instead of the fullPath String, use, for example, an ArrayList<String> to store all the paths. Then instead of:
fullPath = file.getPath();
use:
yourArrayList.add(file.getPath());
Your method will then return an ArrayList with all the paths, and you will need to write code to retrieve the paths from it.
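A minimal sketch of that idea in plain Java, which also appends the folder3 segment from the question to each directory. The names SubfolderPaths and childPaths are made up for illustration, and the date-range check is left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class SubfolderPaths {

    /**
     * For every directory directly inside parent, returns the path of
     * childName below it, e.g. parent/SPECIALFolders1/folder3.
     * The child folder is not required to exist.
     */
    static List<String> childPaths(File parent, String childName) {
        List<String> paths = new ArrayList<>();
        File[] entries = parent.listFiles(); // null if parent is not a readable directory
        if (entries == null) {
            return paths;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) { // skip plain files
                paths.add(new File(entry, childName).getPath());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        File parent = new File(args.length > 0 ? args[0] : ".");
        for (String path : childPaths(parent, "folder3")) {
            System.out.println(path);
        }
    }
}
```

Building each path with new File(entry, childName) avoids hard-coding the "\\" separator, so the same code works on Windows and Linux.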
I'm working on a program. However, the problem is that I'm on a Linux-based operating system, and it wants exact case-matched names for all of the files. The artist named some files with caps and some without; some end in ".png", some in ".Png", some in ".PNG", and so on, so this is becoming a very difficult task. There are a little over a thousand sprites; otherwise renaming them wouldn't be a problem. This is for a 2D RPG hobby project that I've been working on for a while now, for learning purposes.
Anyhow, my question is whether we can make the 'compiler' (I think that is the right way to word this) ignore the character casing of the file extension. If I want to load the following items
1.jpg
2.Jpg
3.JPg
4.JPG
5.jpG
I would like to be able to do it in a single line.
You cannot make the compiler ignore case; this is a filesystem characteristic. Note that NTFS is case-insensitive but it is case-preserving nonetheless.
Using Java 7 you can use a DirectoryStream.Filter<Path> to collect the relevant paths; then rename if appropriate:
final DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>()
{
@Override
public boolean accept(final Path entry)
{
return Files.isRegularFile(entry)
&& entry.getFileName().toString().toLowerCase().endsWith(".jpg");
}
};
final List<Path> collected = new ArrayList<Path>();
try (
final DirectoryStream<Path> entries = Files.newDirectoryStream(dir, filter);
) {
for (final Path entry: entries)
collected.add(entry);
}
Path dst;
String targetName;
for (final Path src: collected) {
targetName = src.getFileName().toString().toLowerCase();
dst = src.resolveSibling(targetName);
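// Compare the paths instead of calling Files.isSameFile(): on a case-sensitive
// filesystem that call throws NoSuchFileException when dst does not exist yet
if (!src.equals(dst))
Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);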
}
With Java 8 you would probably use Files.walk() and lambdas instead.
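For what it's worth, a Java 8 version of the collecting step might look like the sketch below. It uses Files.walk(), so it also descends into subdirectories; the class and method names (CaseInsensitiveListing, findByExtension) are my own, not from any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CaseInsensitiveListing {

    /**
     * Returns all regular files under dir whose name ends with the given
     * extension, compared without regard to case (".jpg" matches ".JPG").
     */
    static List<Path> findByExtension(Path dir, String extension) throws IOException {
        String suffix = extension.toLowerCase(Locale.ROOT);
        // The stream holds open directory handles, so close it when done.
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(suffix))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findByExtension(dir, ".jpg")) {
            System.out.println(p);
        }
    }
}
```

Lower-casing with Locale.ROOT keeps the comparison independent of the machine's default locale.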
If you know the exact directory for the file, you could use File.list() to get a String[] of all files in this directory. By iterating over those and using toLowerCase() on the filenames you can find your desired file.
String filename = "1.jpg";
String targetFilename = null;
File directory = new File("/some/path");
for(String maybeTargetName : directory.list()) {
if(filename.equals(maybeTargetName.toLowerCase())) {
targetFilename = maybeTargetName;
break;
}
}
if(targetFilename != null) {
File targetFile = new File(directory, targetFilename);
}
I am looking for a way to get the list of files inside a zip file. I created a method to get the list of files inside a directory, but I would also like to list the files inside a zip instead of showing just the zip file itself.
Here is my method:
public ArrayList<String> listFiles(File f, String min, String max) {
try {
// parse input strings into date format
Date minDate = sdf.parse(min);
Date maxDate = sdf.parse(max);
//
File[] list = f.listFiles();
for (File file : list) {
double bytes = file.length();
double kilobytes = (bytes / 1024);
if (file.isFile()) {
String fileDateString = sdf.format(file.lastModified());
Date fileDate = sdf.parse(fileDateString);
if (fileDate.after(minDate) && fileDate.before(maxDate)) {
lss.add("'" + file.getAbsolutePath() +
"'" + " Size KB:" + kilobytes + " Last Modified: " +
sdf.format(file.lastModified()));
}
} else if (file.isDirectory()) {
listFiles(file.getAbsoluteFile(), min, max);
}
}
} catch (Exception e) {
e.getMessage();
}
return lss;
}
After having searched for a better answer for a while, I finally found a better way to do this. You can actually do the same thing in a more generic way using the Java NIO API (Since Java 7).
// this is the URI of the Zip file itself
URI zipUri = ...;
FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap());
// The path within the zip file you want to start from
Path root = zipFs.getPath("/");
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
// You can do anything you want with the path here
System.out.println(path);
// the BasicFileAttributes object has lots of useful meta data
// like file size, last modified date, etc...
return FileVisitResult.CONTINUE;
}
// The FileVisitor interface has more methods that
// are useful for handling directories.
});
This approach has the advantage that you can traverse ANY file system this way: your normal Windows or Unix filesystem, the file system contained within a zip or a jar, or any other really.
You can then trivially read the contents of any Path via the Files class, using methods like Files.copy(), Files.readAllLines(), Files.readAllBytes(), etc.
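Here is a self-contained sketch of the same approach that collects the entry names into a list. It assumes the zip is a local file and builds the URI as "jar:" plus the file's own URI; the class and method names (ZipListing, listEntries) are made up for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ZipListing {

    /** Returns the paths of all files inside the zip, e.g. "/dir/b.txt". */
    static List<String> listEntries(Path zipFile) throws IOException {
        // The zip file system provider is addressed with a jar: URI.
        URI zipUri = URI.create("jar:" + zipFile.toUri());
        final List<String> entries = new ArrayList<>();
        // try-with-resources closes the zip file system when the walk is done
        try (FileSystem zipFs = FileSystems.newFileSystem(
                zipUri, Collections.<String, Object>emptyMap())) {
            Files.walkFileTree(zipFs.getPath("/"), new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) {
                    // attrs.size() and attrs.lastModifiedTime() are available here too
                    entries.add(path.toString());
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ZipListing <file.zip>");
            return;
        }
        for (String entry : listEntries(Paths.get(args[0]))) {
            System.out.println(entry);
        }
    }
}
```

In the recursive method from the question, a call like this could be made whenever file.getName() ends with ".zip", adding the returned names to the same result list.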
You can use the ZipFile.entries() method to iterate over the files inside each zip, as below:
File[] fList = directory.listFiles();
for (File file : fList)
{
// open each file in the directory as a zip archive (this assumes they all are zips);
// try-with-resources closes the archive again
try (ZipFile myZipFile = new ZipFile(file))
{
Enumeration<? extends ZipEntry> zipEntries = myZipFile.entries();
while (zipEntries.hasMoreElements())
{
System.out.println(zipEntries.nextElement().getName());
// you can do whatever you want with each zip entry
}
}
}